Running MapReduce in pseudo-distributed mode (cluster configuration, logs and NameNode formatting, cluster operations)
阿新 · Published: 2018-11-15
Starting and configuring the cluster
#1. Go to /opt/module/hadoop-2.7.2/etc/hadoop and edit hadoop-env.sh:
[isea@hadoop104 hadoop]$ vim hadoop-env.sh
...
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use.
export JAVA_HOME=/opt/module/jdk1.8.0_144
...
#2. Edit core-site.xml:
[isea@hadoop104 hadoop]$ vim core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop104:9000</value>
</property>
<!-- Storage directory for files Hadoop generates at run time -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
</configuration>
#3. Edit hdfs-site.xml:
[isea@hadoop104 hadoop]$ vim hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
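If you want to double-check a configured value without starting the cluster, you can pull it straight out of the XML. The sketch below is self-contained: it writes a throwaway copy of the property to /tmp (standing in for etc/hadoop/hdfs-site.xml) and extracts it with awk. On a running cluster, `hdfs getconf -confKey dfs.replication` reports the effective value directly.

```shell
# Throwaway copy standing in for etc/hadoop/hdfs-site.xml, so this runs anywhere.
cat > /tmp/hdfs-site.xml <<'EOF'
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
# Find the <name> line, read the following line, and print the text
# between <value> and </value> (fields are split on < and >).
awk -F'[<>]' '/<name>dfs\.replication<\/name>/ { getline; print $3 }' /tmp/hdfs-site.xml
```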
#4. Format the NameNode (only before the first start; do not format again afterwards):
[isea@hadoop104 hadoop]$ hdfs namenode -format
18/11/14 20:07:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hadoop104/192.168.1.104
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.2
...
18/11/14 20:07:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop104/192.168.1.104
************************************************************/
#5. Start the NameNode and DataNode separately, then check that they are running:
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-namenode-hadoop104.out
[isea@hadoop104 hadoop]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-isea-datanode-hadoop104.out
[isea@hadoop104 hadoop]$ jps
3427 NameNode
3517 DataNode
3598 Jps
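The `jps` check above is easy to script. A minimal sketch (`check_daemon` is a hypothetical helper name, not part of Hadoop):

```shell
# Hypothetical helper: report whether a given Hadoop daemon appears in jps output.
check_daemon() {
    if jps 2>/dev/null | grep -q "$1"; then
        echo "$1 running"
    else
        echo "$1 NOT running"
    fi
}
check_daemon NameNode
check_daemon DataNode
```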
This completes cluster configuration and startup.
Next, open the NameNode web UI in a browser to confirm the cluster is up.
The logs, and why the NameNode must not be formatted repeatedly
#1. The log files:
[isea@hadoop104 logs]$ pwd
/opt/module/hadoop-2.7.2/logs
[isea@hadoop104 logs]$ ll
total 60
-rw-rw-r--. 1 isea isea 23848 Nov 14 20:10 hadoop-isea-datanode-hadoop104.log
-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-datanode-hadoop104.out
-rw-rw-r--. 1 isea isea 27519 Nov 14 20:10 hadoop-isea-namenode-hadoop104.log
-rw-rw-r--. 1 isea isea 715 Nov 14 20:10 hadoop-isea-namenode-hadoop104.out
-rw-rw-r--. 1 isea isea 0 Nov 14 20:10 SecurityAuth-isea.audit
Starting the NameNode and DataNode creates a logs directory under the Hadoop home. It holds the .log files, the matching .out files, and a security-audit file.
#2. Why must the NameNode not be formatted repeatedly?
[isea@hadoop104 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/data/current
[isea@hadoop104 current]$ ll
total 8
drwx------. 4 isea isea 4096 Nov 14 20:10 BP-847571129-192.168.1.104-1542197248436
-rw-rw-r--. 1 isea isea 229 Nov 14 20:10 VERSION
[isea@hadoop104 current]$ cat VERSION
#Wed Nov 14 20:10:52 CST 2018
storageID=DS-305b15b0-96c1-407c-b58e-1beb65922151
clusterID=CID-8eeb5d53-e49f-4de6-9e05-387a7eb1472f
cTime=0
datanodeUuid=ea5794eb-6929-40b7-b8c3-aad970d72c29
storageType=DATA_NODE
layoutVersion=-56
[isea@hadoop104 current]$
Formatting the NameNode generates a new cluster ID. The DataNode still carries the old one, so the IDs no longer match and the cluster cannot find its previous data.
Therefore, before formatting the NameNode again, always delete the data directory and the logs first, and only then format.
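The safe procedure can be sketched as a small script. Paths follow this tutorial's layout; the `run` helper and `DRY_RUN` switch are illustrative additions, and `DRY_RUN` defaults to on so the script only prints what it would do:

```shell
# Sketch of a safe NameNode reformat: stop the daemons, wipe data and logs, format.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
HADOOP_HOME=${HADOOP_HOME:-/opt/module/hadoop-2.7.2}
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}
run hadoop-daemon.sh stop datanode
run hadoop-daemon.sh stop namenode
run rm -rf "$HADOOP_HOME/data" "$HADOOP_HOME/logs"
run hdfs namenode -format
```

Run it with `DRY_RUN=0` only once you are sure you want to discard all existing HDFS data.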
Operating the cluster (upload, download, running MapReduce, queries)
#1. Create an input directory on HDFS and prepare the data to upload:
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/isea/input
[isea@hadoop104 hadoop-2.7.2]$ vim wcinput/wc.input
you know that i sea you
sea you
isea you
isea
i sea you
#2. Upload the test data to HDFS and check that the upload succeeded:
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -put wcinput/wc.input /user/isea/input/
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -ls /user/isea/input/
Found 1 items
-rw-r--r-- 1 isea supergroup 57 2018-11-14 20:45 /user/isea/input/wc.input
#3. Run the MapReduce job and check the result:
[isea@hadoop104 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/isea/input/ /user/isea/output
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -cat /user/isea/output/*
i 2
isea 2
know 1
sea 3
that 1
you 5
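As a sanity check, the same counts can be reproduced locally with standard Unix tools. The sketch below recreates the input as a throwaway file under /tmp and counts words the shell way:

```shell
# Recreate the tutorial's wc.input locally.
printf '%s\n' 'you know that i sea you' 'sea you' 'isea you' 'isea' 'i sea you' > /tmp/wc.input
# Split into one word per line, count duplicates, and print "word count"
# sorted by word, matching the wordcount output format.
tr -s ' ' '\n' < /tmp/wc.input | sort | uniq -c | awk '{ print $2, $1 }' | sort
```

This prints the same table as the MapReduce job above, which is a handy way to spot-check small inputs.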
Next, continue exercising the cluster: download a file from HDFS, then delete the HDFS output directory.
[isea@hadoop104 hadoop-2.7.2]$ mkdir wcoutput
[isea@hadoop104 hadoop-2.7.2]$ hdfs dfs -get /user/isea/output/part-r-00000 ./wcoutput/
[isea@hadoop104 hadoop-2.7.2]$ cd wcoutput/
[isea@hadoop104 wcoutput]$ ll
total 4
-rw-r--r--. 1 isea isea 37 Nov 14 21:21 part-r-00000
[isea@hadoop104 wcoutput]$ cat part-r-00000
i 2
isea 2
know 1
sea 3
that 1
you 5
[isea@hadoop104 wcoutput]$ hdfs dfs -rm -r /user/isea/output
18/11/14 21:26:27 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/isea/output
The result can also be verified from the NameNode web UI in a browser.