MYCSS

2022-12-02

Installing JAVA, HADOOP, ELASTICSEARCH and the NUTCH CRAWLER in DEPLOY MODE. Problem: Storing the INDEX in ELASTICSEARCH (PART I)

The task is to build a crawler for HTTP sites with Apache Nutch running on multiple servers, and to store the resulting index on an Elasticsearch server. Hadoop, with HDFS as its file system, runs and manages the Java jobs.
Nutch runs on Hadoop through its distributed build, the .job file.
For now everything is tested on a single Ubuntu 22.04 LTS server, 'el-mix', with IP 10.110.6.77.

Based on this guide: https://phoenixnap.com/kb/install-hadoop-ubuntu

JAVA

root@el-mix:/home/developer# java -version; javac -version
bash: /usr/bin/java: No such file or directory
Command 'javac' not found, but can be installed with:
apt install default-jdk              # version 2:1.11-72build2, or
apt install openjdk-11-jdk-headless  # version 11.0.17+8-1ubuntu2~22.04
apt install openjdk-17-jdk-headless  # version 17.0.3+7-0ubuntu0.22.04.1
apt install ecj                      # version 3.16.0-1
apt install openjdk-18-jdk-headless  # version 18~36ea-1
apt install openjdk-8-jdk-headless   # version 8u312-b07-0ubuntu1
root@el-mix:/home/developer# apt install openjdk-11-jdk-headless
root@el-mix:/home/developer# java -version; javac -version
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (build 11.0.17+8-post-Ubuntu-1ubuntu222.04)
OpenJDK 64-Bit Server VM (build 11.0.17+8-post-Ubuntu-1ubuntu222.04, mixed mode, sharing)
javac 11.0.17
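
The version check above can be scripted, which helps when the same setup is repeated on more servers. A minimal sketch: the function name java_major_version is made up for this note, and it only parses the first line that `java -version` prints. Hadoop 3.3.x is documented as running on Java 8 and Java 11, so the major number is the part worth checking.

```shell
#!/usr/bin/env bash
# Extract the major Java version from the first line of `java -version` output.
# Handles the legacy "1.8.0_312" scheme and the modern "11.0.17" scheme.
# java_major_version is a hypothetical helper, not part of any JDK tooling.
java_major_version() {
  local line="$1" ver
  # Pull out the quoted version string, e.g. 11.0.17
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    "")  return 1 ;;                                   # no version string found
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;  # 1.8.0_312 -> 8
    *)   printf '%s\n' "${ver%%[.-]*}" ;;              # 11.0.17 -> 11
  esac
}

# Usage with a live JVM (java -version writes to stderr):
#   java_major_version "$(java -version 2>&1 | head -n 1)"
```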

ADD USER HDOOP

root@el-mix:sudo adduser hdoop
Adding user `hdoop' ...
Adding new group `hdoop' (1001) ...
Adding new user `hdoop' (1001) with group `hdoop' ...
Creating home directory `/home/hdoop' ...
Copying files from `/etc/skel' ...
New password:

root@el-mix:su - hdoop
hdoop@el-mix

hdoop@el-mix:~$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/home/hdoop/.ssh'.
Your identification has been saved in /home/hdoop/.ssh/id_rsa
Your public key has been saved in /home/hdoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:HMWsdT22q3UnXZ0ngrQ1RfUpww2gh9sajj9tp6kPmk0n/I hdoop@el-mix
The key's randomart image is:
+---[RSA 3072]----+
|         o. ++=O=|
|         .+.o=*oB|
|        .ooo=o.oo|
|       ..+.+++ . |
|        S.oo..o  |
|        . .o o   |
|         .. =.o  |
|           =o+.  |
|          oo=E   |
+----[SHA256]-----+

hdoop@el-mix:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

hdoop@el-mix:~$ chmod 0600 ~/.ssh/authorized_keys
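
Running the `cat ... >> authorized_keys` step twice appends the same key twice. A small sketch of an idempotent variant follows; authorize_key is a hypothetical helper name, and the directory arguments are placeholders to adapt.

```shell
#!/usr/bin/env bash
# Append a public key to authorized_keys only if it is not already there,
# then tighten permissions the way the manual steps do.
# authorize_key is a hypothetical helper written for this note.
authorize_key() {
  local pubkey_file="$1" ssh_dir="$2"
  mkdir -p "$ssh_dir" || return 1
  chmod 700 "$ssh_dir" || return 1
  touch "$ssh_dir/authorized_keys" || return 1
  # -x whole-line match, -F fixed string, -f patterns from file, -q quiet
  if ! grep -qxFf "$pubkey_file" "$ssh_dir/authorized_keys"; then
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Usage for the hdoop user:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh
```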

hdoop@el-mix:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ED25519 key fingerprint is SHA256:iibThntPsyKZ+dD5LynvKrVoei7K/ydr7AWBg8kRGqg.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-53-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Tue Nov 22 11:21:19 PM UTC 2022

  System load:  0.00341796875      Processes:              133
  Usage of /:   16.1% of 58.75GB   Users logged in:        1
  Memory usage: 59%                IPv4 address for ens18: 10.110.6.77
  Swap usage:   0%

 * Strictly confined Kubernetes makes edge and IoT secure. Learn how MicroK8s
   just raised the bar for easy, resilient and secure K8s cluster deployment.

   https://ubuntu.com/engage/secure-kubernetes-at-the-edge

0 updates can be applied immediately.



The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

hdoop@el-mix:~$

INSTALL HADOOP

hdoop@el-mix:~$ wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
--2022-11-22 23:25:03--  https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132, 2a04:4e42::644
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 695457782 (663M) [application/x-gzip]
Saving to: ‘hadoop-3.3.4.tar.gz.1’

hadoop-3.3.4.tar.gz.1           100%[====================================================>] 663.24M   160MB/s    in 3.8s

2022-11-22 23:25:34 (176 MB/s) - ‘hadoop-3.3.4.tar.gz’ saved [695457782/695457782]
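
Before unpacking a 663M archive it is worth comparing its SHA-512 digest with the one published by the Apache project. A minimal sketch, assuming the expected digest has been copied by hand from the project's checksum file; verify_sha512 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-512 digest with an expected hex string.
# The comparison ignores upper/lower case in the expected value.
# verify_sha512 is a hypothetical helper; sha512sum comes from GNU coreutils.
verify_sha512() {
  local file="$1" expected="$2" actual
  actual=$(sha512sum "$file" 2>/dev/null | awk '{print $1}')
  [ -n "$actual" ] || return 1          # file missing or unreadable
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  [ "$actual" = "$expected" ]
}

# Usage (digest value is a placeholder to fill in from the published checksum):
#   verify_sha512 hadoop-3.3.4.tar.gz "<published sha512 digest>" && tar xzf hadoop-3.3.4.tar.gz
```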

hdoop@el-mix:~$ tar xzf hadoop-3.3.4.tar.gz
hdoop@el-mix:~$ nano .bashrc
#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.3.4
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

hdoop@el-mix:~$ source ~/.bashrc
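
A mistyped path in .bashrc does not produce an error when the file is sourced, so a quick existence check on the directories it names can save time later. A small sketch; check_dirs is a hypothetical helper, and the paths in the usage line are the ones exported above.

```shell
#!/usr/bin/env bash
# Report every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
# check_dirs is a hypothetical helper written for this note.
check_dirs() {
  local rc=0 d
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Usage after sourcing .bashrc:
#   check_dirs "$HADOOP_HOME" "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin" "$HADOOP_COMMON_LIB_NATIVE_DIR"
```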

hdoop@el-mix:~$ dirname $(dirname $(readlink -f $(which javac)))
/usr/lib/jvm/java-11-openjdk-amd64
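
The one-liner above resolves the javac symlink chain and strips the trailing bin/javac. The same idea with quoting, so that paths containing spaces survive, might look like the sketch below; java_home_from_binary is a hypothetical name.

```shell
#!/usr/bin/env bash
# Derive a JAVA_HOME candidate from the path of a JDK binary:
# resolve all symlinks, then drop the file name and the bin/ directory.
# java_home_from_binary is a hypothetical helper written for this note.
java_home_from_binary() {
  local bin="$1" real
  real=$(readlink -f "$bin") || return 1
  dirname "$(dirname "$real")"
}

# Usage:
#   java_home_from_binary "$(command -v javac)"
```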

hdoop@el-mix:~$ mkdir -p /home/hdoop/tmpdata
hdoop@el-mix:~$ mkdir -p /home/hdoop/dfsdata/namenode
hdoop@el-mix:~$ mkdir -p /home/hdoop/dfsdata/datanode

HADOOP CONF

hdoop@el-mix:~$ nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

hdoop@el-mix:~$ nano $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdoop/tmpdata</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
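
To confirm what a site file actually contains after editing, a property can be read back from the shell. A minimal sketch that assumes the simple layout used here, with each <name> followed by its <value>; get_property is a hypothetical helper and not a Hadoop command.

```shell
#!/usr/bin/env bash
# Print the <value> of the first property whose <name> equals the given key.
# Assumes each <name> element appears before its <value>, as in the files here.
# get_property is a hypothetical helper; it is not a full XML parser.
get_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == key) { print v; exit } }
  ' "$file"
}

# Usage:
#   get_property "$HADOOP_HOME/etc/hadoop/core-site.xml" hadoop.tmp.dir
```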

hdoop@el-mix:~$ nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
</configuration> 
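
A property name that appears twice in one site file is easy to miss, and as far as I know Hadoop then simply keeps the later value. A small sketch that lists repeated names; duplicate_property_names is a hypothetical helper and assumes one <name> element per line.

```shell
#!/usr/bin/env bash
# List every property name that occurs more than once in a Hadoop XML config.
# Prints nothing when all names are unique.
# duplicate_property_names is a hypothetical helper written for this note.
duplicate_property_names() {
  sed -n 's:.*<name>\(.*\)</name>.*:\1:p' "$1" | sort | uniq -d
}

# Usage:
#   duplicate_property_names "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"
```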

hdoop@el-mix:~$ nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration> 

 

HADOOP DFS FORMAT

hdoop@el-mix:~$ hdfs namenode -format
WARNING: /home/hdoop/hadoop-3.3.4/logs does not exist. Creating.
2022-11-23 00:05:12,583 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = el-mix/10.110.6.77
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.3.4
STARTUP_MSG: classpath = /home/hdoop/hadoop-3.3.4/etc/hadoop:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/jetty-io-9.4.43.v20210629.jar:[... remaining jar paths under /home/hdoop/hadoop-3.3.4/share/hadoop/{common,hdfs,mapreduce,yarn} omitted ...]:/home/hdoop/hadoop-3.3.4/share/hadoop/yarn/hadoop-yarn-server-nodemanager-3.3.4.jar
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb; compiled by 'stevel' on 2022-07-29T12:32Z
STARTUP_MSG: java = 11.0.17
************************************************************/
2022-11-23 00:05:12,592 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2022-11-23 00:05:12,717 INFO namenode.NameNode: createNameNode [-format]
2022-11-23 00:05:12,869 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-11-23 00:05:13,215 INFO namenode.NameNode: Formatting using clusterid: CID-494e8f94-14a1-4ee4-a5aa-a79dbda4e8fa
2022-11-23 00:05:13,245 INFO namenode.FSEditLog: Edit logging is async:true
2022-11-23 00:05:13,274 INFO namenode.FSNamesystem: KeyProvider: null
2022-11-23 00:05:13,276 INFO namenode.FSNamesystem: fsLock is fair: true
2022-11-23 00:05:13,276 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2022-11-23 00:05:13,294 INFO namenode.FSNamesystem: fsOwner = hdoop (auth:SIMPLE)
2022-11-23 00:05:13,294 INFO namenode.FSNamesystem: supergroup = supergroup
2022-11-23 00:05:13,294 INFO namenode.FSNamesystem: isPermissionEnabled = true
2022-11-23 00:05:13,294 INFO namenode.FSNamesystem: isStoragePolicyEnabled = true
2022-11-23 00:05:13,294 INFO namenode.FSNamesystem: HA Enabled: false
2022-11-23 00:05:13,336 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2022-11-23 00:05:13,346 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2022-11-23 00:05:13,346 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2022-11-23 00:05:13,350 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2022-11-23 00:05:13,350 INFO blockmanagement.BlockManager: The block deletion will start around 2022 Nov 23 00:05:13
2022-11-23 00:05:13,352 INFO util.GSet: Computing capacity for map BlocksMap
2022-11-23 00:05:13,352 INFO util.GSet: VM type = 64-bit
2022-11-23 00:05:13,353 INFO util.GSet: 2.0% max memory 1.9 GB = 39.0 MB
2022-11-23 00:05:13,353 INFO util.GSet: capacity = 2^22 = 4194304 entries
2022-11-23 00:05:13,392 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
2022-11-23 00:05:13,393 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2022-11-23 00:05:13,399 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.999
2022-11-23 00:05:13,399 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2022-11-23 00:05:13,399 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: defaultReplication = 1
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: maxReplication = 512
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: minReplication = 1
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: redundancyRecheckInterval = 3000ms
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: encryptDataTransfer = false
2022-11-23 00:05:13,400 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
2022-11-23 00:05:13,421 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
2022-11-23 00:05:13,421 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
2022-11-23 00:05:13,421 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
2022-11-23 00:05:13,422 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
2022-11-23 00:05:13,432 INFO util.GSet: Computing capacity for map INodeMap
2022-11-23 00:05:13,432 INFO util.GSet: VM type = 64-bit
2022-11-23 00:05:13,433 INFO util.GSet: 1.0% max memory 1.9 GB = 19.5 MB
2022-11-23 00:05:13,433 INFO util.GSet: capacity = 2^21 = 2097152 entries
2022-11-23 00:05:13,449 INFO namenode.FSDirectory: ACLs enabled? true
2022-11-23 00:05:13,449 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2022-11-23 00:05:13,449 INFO namenode.FSDirectory: XAttrs enabled? true
2022-11-23 00:05:13,449 INFO namenode.NameNode: Caching file names occurring more than 10 times
2022-11-23 00:05:13,454 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2022-11-23 00:05:13,455 INFO snapshot.SnapshotManager: SkipList is disabled
2022-11-23 00:05:13,459 INFO util.GSet: Computing capacity for map cachedBlocks
2022-11-23 00:05:13,459 INFO util.GSet: VM type = 64-bit
2022-11-23 00:05:13,459 INFO util.GSet: 0.25% max memory 1.9 GB = 4.9 MB
2022-11-23 00:05:13,459 INFO util.GSet: capacity = 2^19 = 524288 entries
2022-11-23 00:05:13,471 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2022-11-23 00:05:13,471 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2022-11-23 00:05:13,471 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2022-11-23 00:05:13,476 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2022-11-23 00:05:13,476 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2022-11-23 00:05:13,477 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2022-11-23 00:05:13,478 INFO util.GSet: VM type = 64-bit
2022-11-23 00:05:13,478 INFO util.GSet: 0.029999999329447746% max memory 1.9 GB = 599.7 KB
2022-11-23 00:05:13,478 INFO util.GSet: capacity = 2^16 = 65536 entries
2022-11-23 00:05:15,467 INFO namenode.FSImage: Allocated new BlockPoolId: BP-22364112-10.110.6.77-1669161915457
2022-11-23 00:05:15,650 INFO common.Storage: Storage directory /home/hdoop/tmpdata/dfs/name has been successfully formatted.
2022-11-23 00:05:15,689 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hdoop/tmpdata/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2022-11-23 00:05:15,841 INFO namenode.FSImageFormatProtobuf: Image file /home/hdoop/tmpdata/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 400 bytes saved in 0 seconds .
2022-11-23 00:05:15,885 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2022-11-23 00:05:15,919 INFO namenode.FSNamesystem: Stopping services started for active state
2022-11-23 00:05:15,920 INFO namenode.FSNamesystem: Stopping services started for standby state
2022-11-23 00:05:15,924 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2022-11-23 00:05:15,925 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at el-mix/10.110.6.77
************************************************************/

HADOOP RUN
hdoop@el-mix:~$ cd hadoop-3.3.4/sbin/
hdoop@el-mix:~/hadoop-3.3.4/sbin$
hdoop@el-mix:~/hadoop-3.3.4/sbin$ ./start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [el-mix]
el-mix: Warning: Permanently added 'el-mix' (ED25519) to the list of known hosts.
2022-11-23 00:15:01,577 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(About this NativeCodeLoader warning: https://sparkbyexamples.com/hadoop/hadoop-unable-to-load-native-hadoop-library-for-your-platform-warning/)

hdoop@el-mix:~$ nano /home/hdoop/.bashrc
#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.3.4
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"

hdoop@el-mix:~$ source /home/hdoop/.bashrc
hdoop@el-mix:~$ cd hadoop-3.3.4/sbin
hdoop@el-mix:~/hadoop-3.3.4/sbin$ ./start-dfs.sh    # now OK
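
The exports above point java.library.path at $HADOOP_HOME/lib/native. A minimal sketch for checking that a native library is actually there, assuming it is named libhadoop.so* as in the standard binary tarball; the function name has_native_hadoop is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if a libhadoop shared library exists in the
# native directory (default: $HADOOP_HOME/lib/native, as exported above).
has_native_hadoop() {
    local dir="${1:-$HADOOP_HOME/lib/native}"
    local f
    for f in "$dir"/libhadoop.so*; do
        # The glob stays unexpanded when nothing matches, so test the file.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on the server:
#   has_native_hadoop && echo "native library found" || echo "native library missing"
```

Hadoop also ships its own report for this: hadoop checknative -a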

hdoop@el-mix:~/hadoop-3.3.4/sbin$ ./start-yarn.sh
Starting resourcemanager
Starting nodemanagers

hdoop@el-mix:~/hadoop-3.3.4/sbin$ jps
6273 NameNode
7012 NodeManager
6858 ResourceManager
6651 SecondaryNameNode
6412 DataNode
7374 Jps
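
The jps listing above shows the five daemons expected on this single-node setup. A small sketch for checking that automatically, assuming the same daemon names; the function name missing_daemons is hypothetical and it reads jps output from standard input:

```shell
#!/bin/bash
# Hypothetical helper: print each expected Hadoop daemon that is absent
# from jps-style output ("<pid> <ClassName>" per line) read from stdin.
missing_daemons() {
    local input expected name
    input=$(cat)
    expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
    for name in $expected; do
        # Compare the whole second field so that "NameNode" does not
        # match "SecondaryNameNode".
        if ! printf '%s\n' "$input" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage on the server (assumes jps is on PATH):
#   jps | missing_daemons
```

Empty output means all five daemons are running.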

HADOOP WEB PAGES
http://el-mix:9870
http://el-mix:9870/dfshealth.html#tab-overview

http://el-mix:8088/cluster
http://el-mix:8042/node
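
The pages can also be probed from the command line. A sketch, assuming the Hadoop 3 default ports (9870 NameNode, 8088 ResourceManager, 8042 NodeManager) and that curl is installed; both function names are hypothetical:

```shell
#!/bin/bash
# Print the default Hadoop 3 web UI URLs for a host (default ports assumed).
hadoop_ui_urls() {
    local host="$1"
    printf 'http://%s:9870/dfshealth.html\n' "$host"   # NameNode
    printf 'http://%s:8088/cluster\n' "$host"          # ResourceManager
    printf 'http://%s:8042/node\n' "$host"             # NodeManager
}

# Print "<status> <url>" for each UI; status 000 means no HTTP response.
probe_hadoop_uis() {
    local url code
    hadoop_ui_urls "$1" | while read -r url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" || true)
        printf '%s %s\n' "$code" "$url"
    done
}

# Usage:
#   probe_hadoop_uis el-mix
```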

Apache NUTCH

As the base user developer:
cd ~
developer@el-mix:~$ git clone https://github.com/apache/nutch.git
Cloning into 'nutch'...
remote: Enumerating objects: 68010, done.
remote: Counting objects: 100% (1843/1843), done.
remote: Compressing objects: 100% (755/755), done.
remote: Total 68010 (delta 639), reused 1599 (delta 516), pack-reused 66167
Receiving objects: 100% (68010/68010), 133.53 MiB | 40.00 MiB/s, done.
Resolving deltas: 100% (32563/32563), done.

NUTCH CONFIG
developer@el-mix:~$ cd nutch/conf
developer@el-mix:~/nutch/conf$
developer@el-mix:~/nutch/conf$ cp index-writers.xml.template index-writers.xml
developer@el-mix:~/nutch/conf$ cp nutch-site.xml.template nutch-site.xml
developer@el-mix:~/nutch/conf$ nano nutch-site.xml
<configuration>
<property>
<name>http.agent.name</name>
<value>My Spider</value>
</property>

<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic|indexer-elastic</value>
</property>

<property>
<name>db.ignore.external.links</name>
<value>false</value>
<description>If true, outlinks leading from a page to external hosts or domain
will be ignored. This is an effective way to limit the crawl to include
only initially injected hosts or domains, without creating complex URLFilters.
See 'db.ignore.external.links.mode'.
</description>
</property>


<property>
<name>elastic.host</name>
<value>localhost</value>
<description>The hostname to send documents to using TransportClient.
Either host and port must be defined or cluster.
</description>
</property>

<property>
<name>elastic.index</name>
<value>nutch</value>
<description>
The name of the elasticsearch index. Will normally be autocreated if it
doesn't exist.
</description>
</property>
</configuration>
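
To confirm which values ended up in the file, a property can be read back from the shell. This is a line-based sketch, not a real XML parser: it assumes one tag per line, as in the listing above, and the function name conf_value is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the <value> that follows <name>KEY</name> in a
# Hadoop-style configuration file. Assumes <name> and <value> each sit on
# their own line.
conf_value() {
    local key="$1" file="$2"
    awk -v key="$key" '
        index($0, "<name>" key "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$file"
}

# Usage:
#   conf_value elastic.host  ~/nutch/conf/nutch-site.xml
#   conf_value elastic.index ~/nutch/conf/nutch-site.xml
```

An empty result means the property is not set in that file.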

BUILD
developer@el-mix:~/nutch/conf$ cd ~/nutch
developer@el-mix:~/nutch$

developer@el-mix:~/nutch$ ant
Command 'ant' not found, but can be installed with:
sudo snap install ant # version 1.10.12, or
sudo apt install ant # version 1.10.12-1
See 'snap info ant' for additional versions.

developer@el-mix:~/nutch$ sudo apt install ant

developer@el-mix:~/nutch$ ant
Buildfile: /home/developer/nutch/build.xml
Trying to override old definition of task javac

ivy-probe-antlib:

ivy-download:

ivy-download-unchecked:
[get] Getting: https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.0/ivy-2.5.0.jar
[get] To: /home/developer/nutch/ivy/ivy-2.5.0.jar

ivy-init-antlib:

ivy-init:

init:
[mkdir] Created dir: /home/developer/nutch/build
[mkdir] Created dir: /home/developer/nutch/build/classes
[mkdir] Created dir: /home/developer/nutch/build/release
[mkdir] Created dir: /home/developer/nutch/build/test
[mkdir] Created dir: /home/developer/nutch/build/test/classes
[mkdir] Created dir: /home/developer/nutch/build/test/lib
[copy] Copying 27 files to /home/developer/nutch/conf
[copy] Copying /home/developer/nutch/conf/adaptive-mimetypes.txt.template to /home/developer/nutch/conf/adaptive-mimetypes.txt
[copy] Copying /home/developer/nutch/conf/automaton-urlfilter.txt.template to /home/developer/nutch/conf/automaton-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/contenttype-mapping.txt.template to /home/developer/nutch/conf/contenttype-mapping.txt
[copy] Copying /home/developer/nutch/conf/cookies.txt.template to /home/developer/nutch/conf/cookies.txt
[copy] Copying /home/developer/nutch/conf/date-styles.txt.template to /home/developer/nutch/conf/date-styles.txt
[copy] Copying /home/developer/nutch/conf/db-ignore-external-exemptions.txt.template to /home/developer/nutch/conf/db-ignore-external-exemptions.txt
[copy] Copying /home/developer/nutch/conf/domain-suffixes.xml.template to /home/developer/nutch/conf/domain-suffixes.xml
[copy] Copying /home/developer/nutch/conf/domain-urlfilter.txt.template to /home/developer/nutch/conf/domain-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/domaindenylist-urlfilter.txt.template to /home/developer/nutch/conf/domaindenylist-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/exchanges.xml.template to /home/developer/nutch/conf/exchanges.xml
[copy] Copying /home/developer/nutch/conf/fast-urlfilter.txt.template to /home/developer/nutch/conf/fast-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/host-protocol-mapping.txt.template to /home/developer/nutch/conf/host-protocol-mapping.txt
[copy] Copying /home/developer/nutch/conf/host-urlnormalizer.txt.template to /home/developer/nutch/conf/host-urlnormalizer.txt
[copy] Copying /home/developer/nutch/conf/httpclient-auth.xml.template to /home/developer/nutch/conf/httpclient-auth.xml
[copy] Copying /home/developer/nutch/conf/mimetype-filter.txt.template to /home/developer/nutch/conf/mimetype-filter.txt
[copy] Copying /home/developer/nutch/conf/naivebayes-train.txt.template to /home/developer/nutch/conf/naivebayes-train.txt
[copy] Copying /home/developer/nutch/conf/naivebayes-wordlist.txt.template to /home/developer/nutch/conf/naivebayes-wordlist.txt
[copy] Copying /home/developer/nutch/conf/parse-plugins.xml.template to /home/developer/nutch/conf/parse-plugins.xml
[copy] Copying /home/developer/nutch/conf/prefix-urlfilter.txt.template to /home/developer/nutch/conf/prefix-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/protocols.txt.template to /home/developer/nutch/conf/protocols.txt
[copy] Copying /home/developer/nutch/conf/regex-normalize.xml.template to /home/developer/nutch/conf/regex-normalize.xml
[copy] Copying /home/developer/nutch/conf/regex-parsefilter.txt.template to /home/developer/nutch/conf/regex-parsefilter.txt
[copy] Copying /home/developer/nutch/conf/regex-urlfilter.txt.template to /home/developer/nutch/conf/regex-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/stopwords.txt.template to /home/developer/nutch/conf/stopwords.txt
[copy] Copying /home/developer/nutch/conf/subcollections.xml.template to /home/developer/nutch/conf/subcollections.xml
[copy] Copying /home/developer/nutch/conf/suffix-urlfilter.txt.template to /home/developer/nutch/conf/suffix-urlfilter.txt
[copy] Copying /home/developer/nutch/conf/tika-config.xml.template to /home/developer/nutch/conf/tika-config.xml

clean-default-lib:

resolve-default:
[ivy:resolve] :: Apache Ivy 2.5.0 - 20191020104435 :: https://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /home/developer/nutch/ivy/ivysettings.xml

...

compile:

job:
[jar] Building jar: /home/developer/nutch/build/apache-nutch-1.20-SNAPSHOT.job

runtime:
[mkdir] Created dir: /home/developer/nutch/runtime
[mkdir] Created dir: /home/developer/nutch/runtime/local
[mkdir] Created dir: /home/developer/nutch/runtime/deploy
[copy] Copying 1 file to /home/developer/nutch/runtime/deploy
[copy] Copying 2 files to /home/developer/nutch/runtime/deploy/bin
[copy] Copying 1 file to /home/developer/nutch/runtime/local/lib
[copy] Copying 1 file to /home/developer/nutch/runtime/local/lib/native
[copy] Copying 36 files to /home/developer/nutch/runtime/local/conf
[copy] Copying 2 files to /home/developer/nutch/runtime/local/bin
[copy] Copying 212 files to /home/developer/nutch/runtime/local/lib
[copy] Copying 649 files to /home/developer/nutch/runtime/local/plugins
[copy] Copied 3 empty directories to 3 empty directories under /home/developer/nutch/runtime/local/test

BUILD SUCCESSFUL
Total time: 8 minutes 43 seconds
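
The build creates two runtimes: runtime/local and runtime/deploy, the latter holding the .job file for Hadoop. A sketch for checking that the expected files are in place, assuming the bin directories contain the nutch and crawl scripts; the function name check_nutch_runtime is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: report whether the files that the deploy and local
# runtimes rely on exist under a Nutch source tree after "ant" has finished.
check_nutch_runtime() {
    local root="$1" status=0 f job=""
    # The .job archive name carries the version, so match it with a glob.
    for f in "$root"/runtime/deploy/apache-nutch-*.job; do
        if [ -e "$f" ]; then
            job="$f"
        fi
    done
    if [ -n "$job" ]; then
        echo "ok      $job"
    else
        echo "missing $root/runtime/deploy/apache-nutch-*.job"
        status=1
    fi
    for f in runtime/deploy/bin/nutch runtime/deploy/bin/crawl \
             runtime/local/bin/nutch runtime/local/bin/crawl; do
        if [ -e "$root/$f" ]; then
            echo "ok      $root/$f"
        else
            echo "missing $root/$f"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_nutch_runtime /home/developer/nutch
```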

CRAWLING RUNTIME SCRIPT
developer@el-mix:~$ mkdir crawler
developer@el-mix:~$ cd crawler/
developer@el-mix:~/crawler$
developer@el-mix:~/crawler$ mkdir urls
developer@el-mix:~/crawler$ nano urls/seed.txt
https://cwiki.apache.org


developer@el-mix:~/crawler$ nano cr-local.sh
#!/bin/bash
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_HOME=/home/hdoop/hadoop-3.3.4
export HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.3.4
export PATH="${PATH}:$HADOOP_HOME/bin"
NUTCH=/home/developer/nutch/runtime/local
$NUTCH/bin/nutch inject crawl/crawldb urls
$NUTCH/bin/crawl --size-fetchlist 10 -i crawl 1
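
The JAVA_HOME line in the script resolves the /usr/bin/java symlink and cuts the trailing bin/java off the result. A sketch of the string step alone; the function name java_home_from is hypothetical, and the example path is an assumption (typical for OpenJDK 11 on Ubuntu amd64), not taken from this server:

```shell
#!/bin/bash
# Strip "bin/java" from a resolved java path, as cr-local.sh does with sed.
java_home_from() {
    printf '%s\n' "$1" | sed "s:bin/java::"
}

# Example path is an assumption, see the note above.
java_home_from /usr/lib/jvm/java-11-openjdk-amd64/bin/java
# prints /usr/lib/jvm/java-11-openjdk-amd64/
```

The result keeps its trailing slash, which is harmless when JAVA_HOME is used to build paths.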

CRAWLING RUNTIME RUN
developer@el-mix:~/crawler$ ./cr-local.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:03,066 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:03,268 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:03,269 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:03,269 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:03,269 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:03,269 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:03,270 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:03,270 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:03,270 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:03,270 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:03,270 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:03,271 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:03,271 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:03,271 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:03,271 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:03,272 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:03,272 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:03,272 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:03,272 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:03,272 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:03,273 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:03,273 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:03,273 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:03,273 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:03,273 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:03,274 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:03,278 INFO o.a.n.c.Injector [main] Injector: starting at 2022-11-24 22:35:03
2022-11-24 22:35:03,278 INFO o.a.n.c.Injector [main] Injector: crawlDb: crawl/crawldb
2022-11-24 22:35:03,278 INFO o.a.n.c.Injector [main] Injector: urlDir: urls
2022-11-24 22:35:03,278 INFO o.a.n.c.Injector [main] Injector: Converting injected urls to crawl db entries.
2022-11-24 22:35:03,604 INFO o.a.n.c.Injector [main] Injecting seed URL file file:/home/developer/crawler/urls/seed.txt
2022-11-24 22:35:04,444 INFO o.a.n.n.u.r.RegexURLNormalizer [LocalJobRunner Map Task Executor #0] can't find rules for scope 'inject', using default
2022-11-24 22:35:04,591 INFO o.a.n.c.Injector [pool-5-thread-1] Injector: overwrite: false
2022-11-24 22:35:04,591 INFO o.a.n.c.Injector [pool-5-thread-1] Injector: update: false
2022-11-24 22:35:05,241 INFO o.a.n.c.Injector [main] Injector: Total urls rejected by filters: 0
2022-11-24 22:35:05,241 INFO o.a.n.c.Injector [main] Injector: Total urls injected after normalization and filtering: 1
2022-11-24 22:35:05,242 INFO o.a.n.c.Injector [main] Injector: Total urls injected but already in CrawlDb: 0
2022-11-24 22:35:05,242 INFO o.a.n.c.Injector [main] Injector: Total new urls injected: 1
2022-11-24 22:35:05,245 INFO o.a.n.c.Injector [main] Injector: finished at 2022-11-24 22:35:05, elapsed: 00:00:01
Thu Nov 24 10:35:05 PM UTC 2022 : Iteration 1 of 1
Generating a new segment
/home/developer/nutch/runtime/local/bin/nutch generate -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true crawl/crawldb crawl/segments -topN 10 -numFetchers 1 -noFilter
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:06,666 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:06,857 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:06,858 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:06,858 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:06,858 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:06,859 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:06,859 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:06,859 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:06,859 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:06,859 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:06,860 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:06,860 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:06,860 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:06,860 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:06,860 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:06,861 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:06,862 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:06,862 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:06,862 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:06,862 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:06,862 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:06,863 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:06,863 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:06,863 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:06,863 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:06,863 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:07,136 INFO o.a.n.c.Generator [main] Generator: starting at 2022-11-24 22:35:07
2022-11-24 22:35:07,136 INFO o.a.n.c.Generator [main] Generator: Selecting best-scoring urls due for fetch.
2022-11-24 22:35:07,137 INFO o.a.n.c.Generator [main] Generator: filtering: false
2022-11-24 22:35:07,137 INFO o.a.n.c.Generator [main] Generator: normalizing: true
2022-11-24 22:35:07,138 INFO o.a.n.c.Generator [main] Generator: topN: 10
2022-11-24 22:35:07,985 INFO o.a.n.c.FetchScheduleFactory [LocalJobRunner Map Task Executor #0] Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2022-11-24 22:35:07,986 INFO o.a.n.c.AbstractFetchSchedule [LocalJobRunner Map Task Executor #0] defaultInterval=2592000
2022-11-24 22:35:07,987 INFO o.a.n.c.AbstractFetchSchedule [LocalJobRunner Map Task Executor #0] maxInterval=7776000
2022-11-24 22:35:07,993 INFO o.a.n.n.u.r.RegexURLNormalizer [LocalJobRunner Map Task Executor #0] can't find rules for scope 'partition', using default
2022-11-24 22:35:08,167 INFO o.a.n.n.u.r.RegexURLNormalizer [pool-5-thread-1] can't find rules for scope 'generate_host_count', using default
2022-11-24 22:35:08,763 INFO o.a.n.c.Generator [main] Generator: number of items rejected during selection:
2022-11-24 22:35:08,770 INFO o.a.n.c.Generator [main] Generator: Partitioning selected urls for politeness.
2022-11-24 22:35:09,771 INFO o.a.n.c.Generator [main] Generator: segment: crawl/segments/20221124223509
2022-11-24 22:35:10,954 INFO o.a.n.c.Generator [main] Generator: finished at 2022-11-24 22:35:10, elapsed: 00:00:03
Operating on segment : 20221124223509
Fetching : 20221124223509
/home/developer/nutch/runtime/local/bin/nutch fetch -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D fetcher.timelimit.mins=180 crawl/segments/20221124223509 -threads 50
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:12,444 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:12,648 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:12,649 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:12,649 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:12,649 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:12,649 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:12,649 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:12,650 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:12,650 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:12,650 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:12,650 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:12,651 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:12,651 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:12,651 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:12,651 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:12,651 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:12,652 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:12,653 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:12,653 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:12,653 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:12,653 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:12,653 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:12,654 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:12,654 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:12,654 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:12,654 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:12,662 INFO o.a.n.f.Fetcher [main] Fetcher: starting at 2022-11-24 22:35:12
2022-11-24 22:35:12,662 INFO o.a.n.f.Fetcher [main] Fetcher: segment: crawl/segments/20221124223509
2022-11-24 22:35:12,662 INFO o.a.n.f.Fetcher [main] Fetcher Timelimit set for : 1669340112662  (2022-11-25 01:35:12)
2022-11-24 22:35:13,785 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map Task Executor #0] Using queue mode : byHost
2022-11-24 22:35:13,786 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Fetcher: threads: 50
2022-11-24 22:35:13,823 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Fetcher: time-out divisor: 2
2022-11-24 22:35:13,833 INFO o.a.n.f.QueueFeeder [QueueFeeder] QueueFeeder finished: total 1 records
2022-11-24 22:35:13,833 INFO o.a.n.f.QueueFeeder [QueueFeeder] QueueFeeder queuing status:
2022-11-24 22:35:13,833 INFO o.a.n.f.QueueFeeder [QueueFeeder]  1       SUCCESSFULLY_QUEUED
2022-11-24 22:35:13,833 INFO o.a.n.f.QueueFeeder [QueueFeeder]  0       ERROR_CREATE_FETCH_ITEM
2022-11-24 22:35:13,834 INFO o.a.n.f.QueueFeeder [QueueFeeder]  0       ABOVE_EXCEPTION_THRESHOLD
2022-11-24 22:35:13,834 INFO o.a.n.f.QueueFeeder [QueueFeeder]  0       HIT_BY_TIMELIMIT
2022-11-24 22:35:13,852 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:13,879 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:13,880 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 51 fetching https://cwiki.apache.org/ (queue crawl delay=5000ms)
2022-11-24 22:35:13,890 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,036 INFO o.a.n.p.h.Http [FetcherThread] http.proxy.host = null
2022-11-24 22:35:14,036 INFO o.a.n.p.h.Http [FetcherThread] http.proxy.port = 8080
2022-11-24 22:35:14,037 INFO o.a.n.p.h.Http [FetcherThread] http.proxy.exception.list = false
2022-11-24 22:35:14,037 INFO o.a.n.p.h.Http [FetcherThread] http.timeout = 10000
2022-11-24 22:35:14,037 INFO o.a.n.p.h.Http [FetcherThread] http.content.limit = 1048576
2022-11-24 22:35:14,038 INFO o.a.n.p.h.Http [FetcherThread] http.agent = My Spider/Nutch-1.20-SNAPSHOT
2022-11-24 22:35:14,038 INFO o.a.n.p.h.Http [FetcherThread] http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2022-11-24 22:35:14,039 INFO o.a.n.p.h.Http [FetcherThread] http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2022-11-24 22:35:14,039 INFO o.a.n.p.h.Http [FetcherThread] http.enable.cookie.header = true
2022-11-24 22:35:14,040 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,042 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 52 has no more work available
2022-11-24 22:35:14,043 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 52 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,051 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,051 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,053 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 53 has no more work available
2022-11-24 22:35:14,054 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 53 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,062 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,063 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,064 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 54 has no more work available
2022-11-24 22:35:14,065 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 54 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,074 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,075 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,076 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 55 has no more work available
2022-11-24 22:35:14,076 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 55 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,086 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,087 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,087 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 56 has no more work available
2022-11-24 22:35:14,088 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 56 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,097 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,098 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,100 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 57 has no more work available
2022-11-24 22:35:14,100 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 57 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,110 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,112 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,113 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 58 has no more work available
2022-11-24 22:35:14,113 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 58 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,123 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,124 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,125 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 59 has no more work available
2022-11-24 22:35:14,126 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 59 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,135 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,136 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,137 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 60 has no more work available
2022-11-24 22:35:14,138 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 60 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,147 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,148 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,150 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 61 has no more work available
2022-11-24 22:35:14,150 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 61 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,160 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,161 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,162 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 62 has no more work available
2022-11-24 22:35:14,162 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 62 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,172 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,173 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,173 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 63 has no more work available
2022-11-24 22:35:14,174 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 63 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,183 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,184 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,185 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 64 has no more work available
2022-11-24 22:35:14,185 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 64 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,195 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,195 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,196 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 65 has no more work available
2022-11-24 22:35:14,196 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 65 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,206 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,207 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,208 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 66 has no more work available
2022-11-24 22:35:14,208 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 66 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,217 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,218 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,219 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 67 has no more work available
2022-11-24 22:35:14,219 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 67 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,229 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,230 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,234 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 68 has no more work available
2022-11-24 22:35:14,235 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 68 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,244 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,245 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,255 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 69 has no more work available
2022-11-24 22:35:14,255 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 69 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,262 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,263 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,265 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 70 has no more work available
2022-11-24 22:35:14,265 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 70 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,275 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,286 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,289 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 71 has no more work available
2022-11-24 22:35:14,289 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 71 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,296 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,297 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,302 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 72 has no more work available
2022-11-24 22:35:14,302 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 72 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,310 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,310 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,318 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 73 has no more work available
2022-11-24 22:35:14,318 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 73 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,321 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,322 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,323 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 74 has no more work available
2022-11-24 22:35:14,323 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 74 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,333 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,333 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,337 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 75 has no more work available
2022-11-24 22:35:14,337 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 75 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,344 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,344 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,347 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 76 has no more work available
2022-11-24 22:35:14,348 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 76 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,355 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,356 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,364 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 77 has no more work available
2022-11-24 22:35:14,364 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 77 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,370 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,370 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,373 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 78 has no more work available
2022-11-24 22:35:14,374 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 78 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,381 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,382 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,386 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 79 has no more work available
2022-11-24 22:35:14,387 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 79 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,393 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,394 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,397 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 80 has no more work available
2022-11-24 22:35:14,397 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 80 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,406 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,407 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,408 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 81 has no more work available
2022-11-24 22:35:14,409 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 81 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,418 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,419 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,421 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 82 has no more work available
2022-11-24 22:35:14,422 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 82 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,430 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,431 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,436 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 83 has no more work available
2022-11-24 22:35:14,436 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 83 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,445 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,446 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,447 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 84 has no more work available
2022-11-24 22:35:14,447 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 84 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,457 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,458 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,467 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 85 has no more work available
2022-11-24 22:35:14,472 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,473 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,472 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 85 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,474 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 86 has no more work available
2022-11-24 22:35:14,474 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 86 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,484 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,484 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,485 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 87 has no more work available
2022-11-24 22:35:14,485 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 87 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,495 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,496 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,496 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 88 has no more work available
2022-11-24 22:35:14,496 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 88 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,506 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,507 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,507 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 89 has no more work available
2022-11-24 22:35:14,508 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 89 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,517 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,518 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,518 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 90 has no more work available
2022-11-24 22:35:14,519 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 90 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,529 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,529 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,530 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 91 has no more work available
2022-11-24 22:35:14,530 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 91 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,540 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,541 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,541 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 92 has no more work available
2022-11-24 22:35:14,541 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 92 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,551 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,552 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,552 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 93 has no more work available
2022-11-24 22:35:14,552 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 93 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,562 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,563 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,563 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 94 has no more work available
2022-11-24 22:35:14,563 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 94 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,573 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,574 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,574 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 95 has no more work available
2022-11-24 22:35:14,574 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 95 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,584 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,585 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,585 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 96 has no more work available
2022-11-24 22:35:14,586 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 96 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,595 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,596 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,596 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 97 has no more work available
2022-11-24 22:35:14,597 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 97 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,607 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,607 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,608 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 98 has no more work available
2022-11-24 22:35:14,608 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 98 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,618 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,618 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,619 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 99 has no more work available
2022-11-24 22:35:14,619 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 99 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:14,629 INFO o.a.n.n.URLExemptionFilters [LocalJobRunner Map Task Executor #0] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:14,630 INFO o.a.n.f.FetcherThread [LocalJobRunner Map Task Executor #0] FetcherThread 45 Using queue mode : byHost
2022-11-24 22:35:14,631 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Fetcher: throughput threshold: -1
2022-11-24 22:35:14,633 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Fetcher: throughput threshold retries: 5
2022-11-24 22:35:14,633 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] fetcher.maxNum.threads can't be < than 50 : using 50 instead
2022-11-24 22:35:14,634 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 100 has no more work available
2022-11-24 22:35:14,634 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 100 -finishing thread FetcherThread, activeThreads=1
2022-11-24 22:35:15,647 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
2022-11-24 22:35:16,592 INFO o.a.n.n.u.r.RegexURLNormalizer [FetcherThread] can't find rules for scope 'fetcher', using default
2022-11-24 22:35:16,624 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 51 has no more work available
2022-11-24 22:35:16,624 INFO o.a.n.f.FetcherThread [FetcherThread] FetcherThread 51 -finishing thread FetcherThread, activeThreads=0
2022-11-24 22:35:16,648 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=0
2022-11-24 22:35:16,649 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=0
2022-11-24 22:35:17,624 INFO o.a.n.f.Fetcher [main] Fetcher: finished at 2022-11-24 22:35:17, elapsed: 00:00:04
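Most of the fetch output above is noise: only one URL was queued, so FetcherThread 51 does the single fetch while threads 52 to 100 each report that they have no work and exit. A minimal sketch for condensing such output, assuming it was saved to a file first. The function name summarize_fetch_log and the file name are hypothetical and not part of Nutch; the grep patterns are copied from the log lines above.

```shell
# Hypothetical helper (not part of Nutch): summarise a saved fetch log.
# Usage: summarize_fetch_log FILE
summarize_fetch_log() {
    log_file="$1"
    # Lines where a thread actually requested a URL,
    # e.g. "FetcherThread 51 fetching https://cwiki.apache.org/ ...".
    fetching=$(grep -c 'FetcherThread [0-9]* fetching ' "$log_file" || true)
    # Lines where a thread shut down; -e is needed because the pattern starts with "-".
    finished=$(grep -c -e '-finishing thread FetcherThread' "$log_file" || true)
    echo "fetching=$fetching finished=$finished"
}
```

Example call, assuming the output was redirected to fetch.log: summarize_fetch_log fetch.log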
Parsing : 20221124223509
/home/developer/nutch/runtime/local/bin/nutch parse -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true -D mapreduce.task.skip.start.attempts=2 -D mapreduce.map.skip.maxrecords=1 crawl/segments/20221124223509
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:19,083 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:19,325 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:19,326 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:19,326 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:19,327 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:19,327 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:19,327 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:19,328 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:19,328 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:19,328 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:19,328 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:19,329 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:19,329 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:19,329 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:19,329 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:19,330 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:19,330 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:19,330 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:19,330 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:19,331 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:19,331 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:19,331 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:19,332 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:19,332 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:19,332 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:19,332 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:19,333 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:19,333 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:19,333 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:19,333 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:19,334 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:19,607 INFO o.a.n.p.ParseSegment [main] ParseSegment: starting at 2022-11-24 22:35:19
2022-11-24 22:35:19,607 INFO o.a.n.p.ParseSegment [main] ParseSegment: segment: crawl/segments/20221124223509
2022-11-24 22:35:20,615 INFO o.a.n.n.URLExemptionFilters [pool-5-thread-1] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:20,706 INFO o.a.n.n.URLExemptionFilters [pool-5-thread-1] Found 0 extensions at point:'org.apache.nutch.net.URLExemptionFilter'
2022-11-24 22:35:21,243 INFO o.a.n.p.ParseSegment [main] ParseSegment: finished at 2022-11-24 22:35:21, elapsed: 00:00:01
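Each step starts with the same SLF4J "multiple bindings" warning. It is informational here: SLF4J reports which binding it picked (Log4jLoggerFactory) and the step still runs to completion. A small sketch for listing the binding jars in a lib directory. The function name list_slf4j_bindings is hypothetical, and the two file name patterns are taken only from the jars named in the warning, so other bindings would not be found.

```shell
# Hypothetical helper: list the SLF4J binding jars named in the warning.
# Usage: list_slf4j_bindings DIR
list_slf4j_bindings() {
    lib_dir="$1"
    # Match only the two jar families that appear in the warning text.
    find "$lib_dir" -maxdepth 1 -type f \
        \( -name 'log4j-slf4j-impl-*.jar' -o -name 'slf4j-reload4j-*.jar' \) \
        -exec basename {} \; | sort
}
```

Example call for the local runtime used above: list_slf4j_bindings /home/developer/nutch/runtime/local/lib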
CrawlDB update
/home/developer/nutch/runtime/local/bin/nutch updatedb -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true crawl/crawldb crawl/segments/20221124223509
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:22,650 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:22,861 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:22,862 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:22,862 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:22,863 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:22,863 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:22,863 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:22,863 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:22,864 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:22,864 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:22,864 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:22,864 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:22,865 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:22,865 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:22,865 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:22,865 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:22,865 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:22,866 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:22,866 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:22,866 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:22,866 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:22,866 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:22,867 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:22,867 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:22,867 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:22,867 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:22,868 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:22,868 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:22,868 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:22,868 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:22,868 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:23,148 INFO o.a.n.c.CrawlDb [main] CrawlDb update: starting at 2022-11-24 22:35:23
2022-11-24 22:35:23,148 INFO o.a.n.c.CrawlDb [main] CrawlDb update: db: crawl/crawldb
2022-11-24 22:35:23,148 INFO o.a.n.c.CrawlDb [main] CrawlDb update: segments: [crawl/segments/20221124223509]
2022-11-24 22:35:23,149 INFO o.a.n.c.CrawlDb [main] CrawlDb update: additions allowed: true
2022-11-24 22:35:23,149 INFO o.a.n.c.CrawlDb [main] CrawlDb update: URL normalizing: false
2022-11-24 22:35:23,149 INFO o.a.n.c.CrawlDb [main] CrawlDb update: URL filtering: false
2022-11-24 22:35:23,149 INFO o.a.n.c.CrawlDb [main] CrawlDb update: 404 purging: false
2022-11-24 22:35:23,151 INFO o.a.n.c.CrawlDb [main] CrawlDb update: Merging segment data into db.
2022-11-24 22:35:24,384 INFO o.a.n.c.FetchScheduleFactory [pool-5-thread-1] Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2022-11-24 22:35:24,385 INFO o.a.n.c.AbstractFetchSchedule [pool-5-thread-1] defaultInterval=2592000
2022-11-24 22:35:24,385 INFO o.a.n.c.AbstractFetchSchedule [pool-5-thread-1] maxInterval=7776000
2022-11-24 22:35:24,435 INFO o.a.n.c.FetchScheduleFactory [pool-5-thread-1] Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2022-11-24 22:35:24,436 INFO o.a.n.c.AbstractFetchSchedule [pool-5-thread-1] defaultInterval=2592000
2022-11-24 22:35:24,436 INFO o.a.n.c.AbstractFetchSchedule [pool-5-thread-1] maxInterval=7776000
2022-11-24 22:35:24,850 INFO o.a.n.c.CrawlDb [main] CrawlDb update: finished at 2022-11-24 22:35:24, elapsed: 00:00:01
HostDB update
Link inversion
/home/developer/nutch/runtime/local/bin/nutch invertlinks -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true crawl/linkdb crawl/segments/20221124223509 -noNormalize -nofilter
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:26,355 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:26,580 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:26,581 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:26,581 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:26,582 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:26,582 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:26,582 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:26,582 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:26,583 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:26,583 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:26,583 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:26,583 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:26,584 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:26,584 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:26,584 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:26,584 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:26,584 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:26,585 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:26,585 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:26,585 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:26,585 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:26,586 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:26,586 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:26,586 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:26,586 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:26,587 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:26,587 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:26,587 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:26,587 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:26,588 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:26,588 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:26,936 INFO o.a.n.c.LinkDb [main] LinkDb: starting at 2022-11-24 22:35:26
2022-11-24 22:35:26,937 INFO o.a.n.c.LinkDb [main] LinkDb: linkdb: crawl/linkdb
2022-11-24 22:35:26,937 INFO o.a.n.c.LinkDb [main] LinkDb: URL normalize: false
2022-11-24 22:35:26,937 INFO o.a.n.c.LinkDb [main] LinkDb: URL filter: false
2022-11-24 22:35:26,938 INFO o.a.n.c.LinkDb [main] LinkDb: internal links will be ignored.
2022-11-24 22:35:26,938 INFO o.a.n.c.LinkDb [main] LinkDb: adding segment: crawl/segments/20221124223509
2022-11-24 22:35:28,609 INFO o.a.n.c.LinkDb [main] LinkDb: finished at 2022-11-24 22:35:28, elapsed: 00:00:01
Dedup on crawldb
/home/developer/nutch/runtime/local/bin/nutch dedup -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true crawl/crawldb -group none
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:30,058 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:30,260 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:30,261 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:30,261 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:30,262 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:30,262 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:30,262 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:30,262 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:30,262 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:30,263 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:30,264 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:30,264 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:30,264 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:30,264 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:30,264 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:30,265 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:30,265 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:30,265 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:30,265 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:30,265 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:30,266 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:30,266 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:30,266 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:30,266 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:30,266 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:30,267 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:30,271 INFO o.a.n.c.DeduplicationJob [main] DeduplicationJob: starting at 2022-11-24 22:35:30
2022-11-24 22:35:32,219 INFO o.a.n.c.DeduplicationJob [main] Deduplication: 0 documents marked as duplicates
2022-11-24 22:35:32,219 INFO o.a.n.c.DeduplicationJob [main] Deduplication: Updating status of duplicate urls into crawl db.
2022-11-24 22:35:33,425 INFO o.a.n.c.DeduplicationJob [main] Deduplication finished at 2022-11-24 22:35:33, elapsed: 00:00:03
Indexing 20221124223509 to index
/home/developer/nutch/runtime/local/bin/nutch index -Dmapreduce.job.reduces=2 -Dmapreduce.reduce.speculative=false -Dmapreduce.map.speculative=false -Dmapreduce.map.output.compress=true crawl/crawldb -linkdb crawl/linkdb crawl/segments/20221124223509 -deleteGone
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/developer/nutch/runtime/local/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-11-24 22:35:34,877 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /home/developer/nutch/runtime/local/plugins
2022-11-24 22:35:35,080 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2022-11-24 22:35:35,081 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2022-11-24 22:35:35,081 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter (urlfilter-regex)
2022-11-24 22:35:35,081 INFO o.a.n.p.PluginRepository [main]    Html Parse Plug-in (parse-html)
2022-11-24 22:35:35,081 INFO o.a.n.p.PluginRepository [main]    HTTP Framework (lib-http)
2022-11-24 22:35:35,082 INFO o.a.n.p.PluginRepository [main]    the nutch core extension points (nutch-extensionpoints)
2022-11-24 22:35:35,082 INFO o.a.n.p.PluginRepository [main]    Basic Indexing Filter (index-basic)
2022-11-24 22:35:35,082 INFO o.a.n.p.PluginRepository [main]    Anchor Indexing Filter (index-anchor)
2022-11-24 22:35:35,082 INFO o.a.n.p.PluginRepository [main]    Tika Parser Plug-in (parse-tika)
2022-11-24 22:35:35,082 INFO o.a.n.p.PluginRepository [main]    Basic URL Normalizer (urlnormalizer-basic)
2022-11-24 22:35:35,083 INFO o.a.n.p.PluginRepository [main]    Regex URL Filter Framework (lib-regex-filter)
2022-11-24 22:35:35,083 INFO o.a.n.p.PluginRepository [main]    Regex URL Normalizer (urlnormalizer-regex)
2022-11-24 22:35:35,083 INFO o.a.n.p.PluginRepository [main]    CyberNeko HTML Parser (lib-nekohtml)
2022-11-24 22:35:35,083 INFO o.a.n.p.PluginRepository [main]    OPIC Scoring Plug-in (scoring-opic)
2022-11-24 22:35:35,083 INFO o.a.n.p.PluginRepository [main]    Pass-through URL Normalizer (urlnormalizer-pass)
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main]    Http Protocol Plug-in (protocol-http)
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main]    ElasticIndexWriter (indexer-elastic)
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main]     (Nutch Content Parser)
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Filter)
2022-11-24 22:35:35,084 INFO o.a.n.p.PluginRepository [main]     (HTML Parse Filter)
2022-11-24 22:35:35,085 INFO o.a.n.p.PluginRepository [main]     (Nutch Scoring)
2022-11-24 22:35:35,085 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Normalizer)
2022-11-24 22:35:35,085 INFO o.a.n.p.PluginRepository [main]     (Nutch Publisher)
2022-11-24 22:35:35,085 INFO o.a.n.p.PluginRepository [main]     (Nutch Exchange)
2022-11-24 22:35:35,085 INFO o.a.n.p.PluginRepository [main]     (Nutch Protocol)
2022-11-24 22:35:35,086 INFO o.a.n.p.PluginRepository [main]     (Nutch URL Ignore Exemption Filter)
2022-11-24 22:35:35,086 INFO o.a.n.p.PluginRepository [main]     (Nutch Index Writer)
2022-11-24 22:35:35,086 INFO o.a.n.p.PluginRepository [main]     (Nutch Segment Merge Filter)
2022-11-24 22:35:35,086 INFO o.a.n.p.PluginRepository [main]     (Nutch Indexing Filter)
2022-11-24 22:35:35,334 INFO o.a.n.s.SegmentChecker [main] Segment dir is complete: crawl/segments/20221124223509.
2022-11-24 22:35:35,335 INFO o.a.n.i.IndexingJob [main] Indexer: starting at 2022-11-24 22:35:35
2022-11-24 22:35:35,343 INFO o.a.n.i.IndexingJob [main] Indexer: deleting gone documents: true
2022-11-24 22:35:35,343 INFO o.a.n.i.IndexingJob [main] Indexer: URL filtering: false
2022-11-24 22:35:35,343 INFO o.a.n.i.IndexingJob [main] Indexer: URL normalizing: false
2022-11-24 22:35:35,344 INFO o.a.n.i.IndexerMapReduce [main] IndexerMapReduce: crawldb: crawl/crawldb
2022-11-24 22:35:35,347 INFO o.a.n.i.IndexerMapReduce [main] IndexerMapReduces: adding segment: crawl/segments/20221124223509
2022-11-24 22:35:35,348 INFO o.a.n.i.IndexerMapReduce [main] IndexerMapReduce: linkdb: crawl/linkdb
2022-11-24 22:35:36,897 INFO o.a.n.i.IndexWriters [pool-5-thread-1] Index writer org.apache.nutch.indexwriter.elastic.ElasticIndexWriter identified.
2022-11-24 22:35:36,961 WARN o.a.n.e.Exchanges [pool-5-thread-1] No exchange was configured. The documents will be routed to all index writers.
2022-11-24 22:35:37,426 INFO o.a.n.i.IndexerOutputFormat [pool-5-thread-1] Active IndexWriters :
ElasticIndexWriter:
┌───────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┬─────────┐
│host                       │Comma-separated list of hostnames                                                   │localhost│
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│port                       │The port to connect to elastic server.                                              │9200     │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│scheme                     │The scheme (http or https) to connect to elastic server.                            │http     │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│index                      │Default index to send documents to.                                                 │nutch    │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│username                   │Username for auth credentials                                                       │elastic  │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│password                   │Password for auth credentials                                                       │         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│max.bulk.docs              │Maximum size of the bulk in number of documents.                                    │250      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│max.bulk.size              │Maximum size of the bulk in bytes.                                                  │2500500  │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│exponential.backoff.millis │Initial delay for the BulkProcessor exponential backoff policy.                     │100      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│exponential.backoff.retries│Number of times the BulkProcessor  exponential  backoff  policy  should  retry  bulk│10       │
│                           │operations.                                                                         │         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│bulk.close.timeout         │Number of seconds allowed for the BulkProcessor to complete its last operation.     │600      │
└───────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┴─────────┘


2022-11-24 22:35:37,436 INFO o.a.n.i.a.AnchorIndexingFilter [pool-5-thread-1] Anchor deduplication is: off
log4j:WARN No appenders could be found for logger (org.apache.http.impl.nio.client.MainClientExec).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2022-11-24 22:35:37,798 ERROR o.a.n.i.e.ElasticIndexWriter [I/O dispatcher 1] Elasticsearch indexing failed:
java.io.IOException: Unable to parse response body for Response{requestLine=POST /_bulk?timeout=1m HTTP/1.1, host=http://localhost:9200, response=HTTP/1.1 200 OK}
        at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1805) ~[?:?]
        at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:636) ~[?:?]
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:376) ~[?:?]
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:370) ~[?:?]
        at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122) ~[httpcore-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) ~[?:?]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.client.InternalRequestExecutor.inputReady(InternalRequestExecutor.java:83) ~[?:?]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[?:?]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.14.jar:4.4.14]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.NullPointerException
        at java.util.Objects.requireNonNull(Objects.java:221) ~[?:?]
        at org.elasticsearch.action.DocWriteResponse.<init>(DocWriteResponse.java:116) ~[?:?]
        at org.elasticsearch.action.delete.DeleteResponse.<init>(DeleteResponse.java:42) ~[?:?]
        at org.elasticsearch.action.delete.DeleteResponse.<init>(DeleteResponse.java:27) ~[?:?]
        at org.elasticsearch.action.delete.DeleteResponse$Builder.build(DeleteResponse.java:94) ~[?:?]
        at org.elasticsearch.action.delete.DeleteResponse$Builder.build(DeleteResponse.java:90) ~[?:?]
        at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:148) ~[?:?]
        at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:177) ~[?:?]
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1933) ~[?:?]
        at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1721) ~[?:?]
        at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1803) ~[?:?]
        ... 19 more
2022-11-24 22:35:37,878 INFO o.a.n.i.IndexerOutputFormat [pool-5-thread-1] Active IndexWriters :
ElasticIndexWriter:
┌───────────────────────────┬────────────────────────────────────────────────────────────────────────────────────┬─────────┐
│host                       │Comma-separated list of hostnames                                                   │localhost│
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│port                       │The port to connect to elastic server.                                              │9200     │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│scheme                     │The scheme (http or https) to connect to elastic server.                            │http     │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│index                      │Default index to send documents to.                                                 │nutch    │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│username                   │Username for auth credentials                                                       │elastic  │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│password                   │Password for auth credentials                                                       │         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│max.bulk.docs              │Maximum size of the bulk in number of documents.                                    │250      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│max.bulk.size              │Maximum size of the bulk in bytes.                                                  │2500500  │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│exponential.backoff.millis │Initial delay for the BulkProcessor exponential backoff policy.                     │100      │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│exponential.backoff.retries│Number of times the BulkProcessor  exponential  backoff  policy  should  retry  bulk│10       │
│                           │operations.                                                                         │         │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────────┼─────────┤
│bulk.close.timeout         │Number of seconds allowed for the BulkProcessor to complete its last operation.     │600      │
└───────────────────────────┴────────────────────────────────────────────────────────────────────────────────────┴─────────┘


2022-11-24 22:35:38,078 INFO o.a.n.i.IndexingJob [main] Indexer: number of documents indexed, deleted, or skipped:
2022-11-24 22:35:38,086 INFO o.a.n.i.IndexingJob [main] Indexer:      1  deleted (redirects)
2022-11-24 22:35:38,090 INFO o.a.n.i.IndexingJob [main] Indexer: finished at 2022-11-24 22:35:38, elapsed: 00:00:02
Thu Nov 24 10:35:38 PM UTC 2022 : Finished loop with 1 iterations
developer@el-mix:~/crawler$
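The indexing step in the log above got HTTP 200 OK from /_bulk, but the client then failed with a NullPointerException while building a DeleteResponse from the reply. One possible explanation, which this log alone does not confirm, is that the Elasticsearch server is a newer major version than the RestHighLevelClient shown in the stack trace. A first thing to check is the server version. The sketch below is not part of the original setup: the function names es_major_version and check_es_compat are made up for illustration, and the client major version passed in (7) is an assumption.

```shell
#!/usr/bin/env bash

# Read the JSON returned by the Elasticsearch root endpoint (GET /) on stdin
# and print the major part of "version.number", e.g. 8 for "8.5.2".
es_major_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Compare the server major version ($1) with the client major version ($2).
# Returns 1 and prints a warning when the server is newer than the client.
check_es_compat() {
    local server_major="$1" client_major="$2"
    if [ "$server_major" -gt "$client_major" ]; then
        echo "server ${server_major}.x is newer than client ${client_major}.x: bulk responses may fail to parse"
        return 1
    fi
    echo "server ${server_major}.x is not newer than client ${client_major}.x"
}

# Usage against the server from the index writer table (localhost:9200, user elastic).
# The client major version 7 is an assumption, adjust it to the bundled client:
#   server="$(curl -s -u elastic http://localhost:9200 | es_major_version)"
#   check_es_compat "$server" 7
```

If the check reports a newer server, the mismatch is at least a candidate cause worth investigating before touching the Nutch configuration.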
DEPLOY SCRIPT
developer@el-mix:~$ mkdir crawler
developer@el-mix:~$ cd crawler
developer@el-mix:~/crawler$ mkdir urls
developer@el-mix:~/crawler$ echo https://cwiki.apache.org >> urls/seed.txt
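The seed list is a plain text file with one URL per line, so a typo there only shows up later as an empty crawl. A small sanity check before starting can help. This is a sketch and not part of the original setup: the function name invalid_seeds is made up, and it assumes that blank lines and lines starting with '#' can be skipped and that anything after the first whitespace on a line is per-URL metadata.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every seed line that does not look like an
# http(s) URL. Blank lines and '#' comment lines are skipped; text after the
# first whitespace is treated as metadata and not validated.
# Exit status follows grep: 0 when invalid lines were printed, 1 when none.
invalid_seeds() {
    local seed_file="$1"
    grep -Ev '^[[:space:]]*(#.*)?$' "$seed_file" \
        | grep -Ev '^https?://[^[:space:]]+([[:space:]].*)?$'
}

# Usage from the crawler directory:
#   if invalid_seeds urls/seed.txt; then
#       echo "fix the lines listed above before crawling"
#   fi
```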

developer@el-mix:~$ nano format_dfs.sh

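#!/bin/bash
# format_dfs.sh: re-format HDFS and prepare a home directory for the calling user.
# WARNING: "namenode -format" erases the existing HDFS metadata.

export HADOOP_HOME=/home/hdoop/hadoop-3.3.4
export HADOOP_COMMON_HOME=/home/hdoop/hadoop-3.3.4

# Stop HDFS, format the NameNode, start HDFS again (all as the hdoop user).
sudo -u hdoop "$HADOOP_HOME/sbin/stop-dfs.sh"
sudo -u hdoop "$HADOOP_HOME/bin/hdfs" namenode -format
sudo -u hdoop "$HADOOP_HOME/sbin/start-dfs.sh"

# Open the HDFS root to all users (acceptable on a test server only),
# then create /user/$USER as the current user.
sudo -u hdoop "$HADOOP_HOME/bin/hdfs" dfs -chmod 777 /
"$HADOOP_HOME/bin/hdfs" dfs -mkdir -p "/user/$USER"

# Show the resulting layout.
"$HADOOP_HOME/bin/hdfs" dfs -ls /
"$HADOOP_HOME/bin/hdfs" dfs -ls /user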
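
Once format_dfs.sh has run, it may be worth confirming that the directories it was supposed to create are really there, instead of reading the -ls output by eye. A minimal sketch, assuming the same HADOOP_HOME as in the script: check_hdfs_layout, hdfs_dir_exists and the HDFS override variable are hypothetical names introduced here, while "hdfs dfs -test -d" is the stock HDFS shell test that returns 0 for an existing directory.

```shell
#!/usr/bin/env bash

# Assumed defaults, taken from format_dfs.sh; HDFS can be overridden.
HADOOP_HOME="${HADOOP_HOME:-/home/hdoop/hadoop-3.3.4}"
HDFS="${HDFS:-$HADOOP_HOME/bin/hdfs}"

# Return 0 when the given path exists in HDFS and is a directory.
hdfs_dir_exists() {
    "$HDFS" dfs -test -d "$1"
}

# Check every path given as an argument; print one status line per path
# and return 1 if at least one directory is missing.
check_hdfs_layout() {
    local dir status=0
    for dir in "$@"; do
        if hdfs_dir_exists "$dir"; then
            echo "OK       $dir"
        else
            echo "MISSING  $dir"
            status=1
        fi
    done
    return "$status"
}

# Usage after ./format_dfs.sh:
#   check_hdfs_layout /user "/user/$USER"
```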


developer@el-mix:~$ ./format_dfs.sh
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [el-mix]
2022-11-29 22:30:30,818 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = el-mix/10.110.6.77
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.3.4
STARTUP_MSG: classpath = /home/hdoop/hadoop-3.3.4/etc/hadoop:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/jetty-io-9.4.43.v20210629.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/jetty-xml-9.4.43.v20210629.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/j2objc-annotations-1.1.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/failureaccess-1.0.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/hadoop-annotations-3.3.4.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/jsch-0.1.55.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/animal-sniffer-annotations-1.17.jar:/home/hdoop/hadoop-3.3.4/share/hadoop/common/lib/nimbus-
...
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r a585a73c3e02ac62350c136643a5e7f6095a3dbb; compiled by 'stevel' on 2022-07-29T12:32Z
STARTUP_MSG: java = 11.0.17
************************************************************/
2022-11-29 22:30:30,827 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2022-11-29 22:30:30,936 INFO namenode.NameNode: createNameNode [-format]
2022-11-29 22:30:31,411 INFO namenode.NameNode: Formatting using clusterid: CID-e3618a83-12fc-4fa4-9313-23bae06b84d0
...
2022-11-29 22:30:31,631 INFO namenode.FSDirectory: XAttrs enabled? true
2022-11-29 22:30:31,631 INFO namenode.NameNode: Caching file names occurring more than 10 times
2022-11-29 22:30:31,637 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2022-11-29 22:30:31,639 INFO snapshot.SnapshotManager: SkipList is disabled
2022-11-29 22:30:31,643 INFO util.GSet: Computing capacity for map cachedBlocks
2022-11-29 22:30:31,643 INFO util.GSet: VM type = 64-bit
2022-11-29 22:30:31,644 INFO util.GSet: 0.25% max memory 1.9 GB = 4.9 MB
2022-11-29 22:30:31,644 INFO util.GSet: capacity = 2^19 = 524288 entries
2022-11-29 22:30:31,654 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2022-11-29 22:30:31,654 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2022-11-29 22:30:31,654 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2022-11-29 22:30:31,659 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2022-11-29 22:30:31,659 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2022-11-29 22:30:31,661 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2022-11-29 22:30:31,661 INFO util.GSet: VM type = 64-bit
2022-11-29 22:30:31,661 INFO util.GSet: 0.029999999329447746% max memory 1.9 GB = 599.7 KB
2022-11-29 22:30:31,661 INFO util.GSet: capacity = 2^16 = 65536 entries
2022-11-29 22:30:31,685 INFO namenode.FSImage: Allocated new BlockPoolId: BP-487432392-10.110.6.77-1669761031677
2022-11-29 22:30:31,740 INFO common.Storage: Storage directory /home/hdoop/tmpdata/dfs/name has been successfully formatted.
2022-11-29 22:30:31,773 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hdoop/tmpdata/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2022-11-29 22:30:31,900 INFO namenode.FSImageFormatProtobuf: Image file /home/hdoop/tmpdata/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 397 bytes saved in 0 seconds .
2022-11-29 22:30:31,932 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2022-11-29 22:30:31,961 INFO namenode.FSNamesystem: Stopping services started for active state
2022-11-29 22:30:31,961 INFO namenode.FSNamesystem: Stopping services started for standby state
2022-11-29 22:30:31,965 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2022-11-29 22:30:31,967 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at el-mix/10.110.6.77
************************************************************/
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [el-mix]
Found 1 items
drwxr-xr-x - developer supergroup 0 2022-11-29 22:30 /user
Found 1 items
drwxr-xr-x - developer supergroup 0 2022-11-29 22:30 /user/developer
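
Instead of reading the listing by eye, the same check can be scripted with hdfs dfs -test -d, which returns exit status 0 when the directory exists. A minimal sketch follows. The function name check_hdfs_home and the overridable HDFS variable are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: fail early if the HDFS home directory is missing.
# HDFS can be overridden, e.g. to point at another Hadoop installation.
HDFS="${HDFS:-${HADOOP_HOME:-/home/hdoop/hadoop-3.3.4}/bin/hdfs}"

# Hypothetical helper: check that /user/<name> exists in HDFS.
check_hdfs_home() {
    local dir="/user/$1"
    # "dfs -test -d" exits with 0 only when the path is an existing directory.
    if "$HDFS" dfs -test -d "$dir"; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir" >&2
        return 1
    fi
}

# Usage (requires a running HDFS):
#   check_hdfs_home developer || exit 1
```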

developer@el-mix:~$ nano cr-deploy.sh
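#!/bin/bash

# Derive JAVA_HOME from the real location of the java binary
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export HADOOP_HOME=/home/hdoop/hadoop-3.3.4
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH="${PATH}:$HADOOP_HOME/bin"
# Nutch deploy runtime (contains the .job file that is submitted to Hadoop)
NUTCH=/home/developer/nutch/runtime/deploy

# Upload the local seed list "urls" to the HDFS home directory /user/$USER
$HADOOP_HOME/bin/hdfs dfs -put urls
# Remove a stale crawldb lock left by an interrupted run; -f ignores a missing file
$HADOOP_HOME/bin/hdfs dfs -rm -f crawl/crawldb/.locked
$HADOOP_HOME/bin/hdfs dfs -ls

# Inject the seed URLs into the crawldb, then run one crawl round with indexing (-i)
$NUTCH/bin/nutch inject crawl/crawldb urls
$NUTCH/bin/crawl -i crawl 1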
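
The script deletes the crawldb lock file on every run. A slightly more careful variant first checks whether the lock is there and reports what it does. This is only a sketch: the function name remove_stale_lock and the overridable HDFS and LOCK variables are assumptions, and removing the lock is presumably safe only when no other Nutch job is working on the same crawldb.

```shell
#!/bin/bash
# Sketch: remove a stale Nutch crawldb lock only when it is actually present.
HDFS="${HDFS:-${HADOOP_HOME:-/home/hdoop/hadoop-3.3.4}/bin/hdfs}"
LOCK="${LOCK:-crawl/crawldb/.locked}"

# Hypothetical helper: report and remove the lock file if it exists.
remove_stale_lock() {
    # "dfs -test -e" exits with 0 when the path exists.
    if "$HDFS" dfs -test -e "$LOCK"; then
        echo "removing stale lock $LOCK"
        "$HDFS" dfs -rm "$LOCK"
    else
        echo "no lock at $LOCK"
    fi
}

# Usage (requires a running HDFS), before "nutch inject":
#   remove_stale_lock
```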

PART II
