Installing a Hadoop Pseudo-Distributed Cluster on CentOS 7
1. Get Hadoop
Downloading from a mirror inside China is much faster. Tsinghua mirror: Index of /apache (tsinghua.edu.cn)
Open the hadoop directory, click common, and pick the version you need.
This guide uses hadoop-3.2.2.
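The download can also be done from the shell. A minimal sketch, assuming the mirror keeps current releases under apache/hadoop/common (older releases move off to archive.apache.org, so the path is an assumption, not guaranteed):

```shell
# Build the download URL for a given Hadoop release on the Tsinghua mirror.
# The path layout is an assumption based on the mirror's usual structure.
mirror_url() {
  local version="$1"
  echo "https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-${version}/hadoop-${version}.tar.gz"
}

# Usage: wget "$(mirror_url 3.2.2)"
```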
2. Because the environment is limited, use a pseudo-distributed installation
Configure a static IP:
[root@bigdata01 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
Change this parameter:
BOOTPROTO="static"
and add:
IPADDR=192.168.184.128
GATEWAY=192.168.184.2
DNS1=192.168.184.2
Save, then restart the network:
[root@192 ~]# service network restart
Restarting network (via systemctl): [ OK ]
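To confirm the static address took effect, the output of `ip -4 addr show` can be parsed. A sketch (the interface name ens33 comes from this guide's config file; the helper is illustrative, not a standard tool):

```shell
# Extract the first IPv4 address from `ip -4 addr` style output on stdin.
get_ipv4() {
  sed -n 's/.*inet \([0-9.]*\)\/.*/\1/p' | head -n 1
}

# Usage: ip -4 addr show ens33 | get_ipv4   # expect 192.168.184.128
```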
3. Set a permanent hostname
Edit /etc/hostname and make the new hostname (bigdata01 in this guide) the file's only line:
[root@192 ~]# vi /etc/hostname
4. Disable the firewall
Stop the firewall temporarily:
[root@192 ~]# systemctl stop firewalld
Check the firewall status:
[root@bigdata01 ~]# systemctl status firewalld
Disable the firewall permanently:
[root@bigdata01 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
5. Set up passwordless SSH login
First run the following command on the server. "rsa" is the key algorithm. After running it, press Enter four times to accept the defaults; nothing needs to be typed.
[root@bigdata01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:J1ZsjlwvB9MZBUylhVPAjj6gQJgshpCs14zETqQmLhM root@bigdata01
The key's randomart image is:
+---[RSA 2048]----+
|==.o +=B= |
|++B . . .== |
|E* = Bo+. |
|=.+ + ..*.+. |
|oo . .So+ o |
|.. .. ooo |
| . |
| |
| |
+----[SHA256]-----+
This generates the matching public and private key files under ~/.ssh:
[root@bigdata01 ~]# ll ~/.ssh
total 12
-rw-------. 1 root root 1679 Jul 3 17:42 id_rsa
-rw-r--r--. 1 root root 396 Jul 3 17:42 id_rsa.pub
-rw-r--r--. 1 root root 203 Jul 3 17:41 known_hosts
The .pub file above is the public key.
Next, copy the public key to every machine you want to log in to without a password.
Since this example has a single server, append it to the local authorized_keys file:
[root@bigdata01 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
>> appends to the target file via output redirection.
Passwordless SSH login now works:
[root@bigdata01 ~]# ssh bigdata01
Last login: Sat Jul 3 17:52:45 2021 from fe80::3c67:9a76:5a25:c4d9%ens33
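If passwordless login still prompts for a password, the usual culprit is file permissions: with the default StrictModes setting, sshd ignores keys in directories or files that are group/world writable. A sketch of the fix (the helper and path are illustrative):

```shell
# Tighten .ssh permissions to what sshd expects.
fix_ssh_perms() {
  local dir="$1"                         # e.g. /root/.ssh
  chmod 700 "$dir"
  [ -f "$dir/authorized_keys" ] && chmod 600 "$dir/authorized_keys"
}

# Usage: fix_ssh_perms /root/.ssh
```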
6. Install the JDK
Create the /data/soft directory to hold installation files:
[root@bigdata01 /]# mkdir -p /data/soft
Upload the JDK archive to /data/soft:
[root@bigdata01 soft]# ll
total 190424
-rw-r--r--. 1 root root 194990602 Jul 3 18:04 jdk-8u211-linux-x64.tar.gz
Extract the JDK archive:
[root@bigdata01 soft]# tar -zxvf jdk-8u211-linux-x64.tar.gz
The extracted directory name is a bit long; rename it:
[root@bigdata01 soft]# mv jdk1.8.0_211/ jdk1.8
Configure the JAVA_HOME environment variable:
[root@bigdata01 soft]# vi /etc/profile
.....
export JAVA_HOME=/data/soft/jdk1.8
export PATH=.:$JAVA_HOME/bin:$PATH
Apply the change:
[root@bigdata01 soft]# source /etc/profile
Verify the installation:
[root@bigdata01 soft]# java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)
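For scripting the check above, note that `java -version` prints to stderr. A sketch of extracting the major version, assuming the legacy "1.x" numbering scheme that JDK 8 and earlier use:

```shell
# Pull the major Java version out of `java -version` output on stdin.
java_major() {
  sed -n 's/.*version "1\.\([0-9]*\)\..*/\1/p' | head -n 1
}

# Usage: java -version 2>&1 | java_major   # expect 8 for JDK 8
```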
7. Install Hadoop
Upload the archive to /data/soft.
Check the files:
[root@bigdata01 soft]# ll
total 576608
-rw-r--r--. 1 root root 395448622 Jul 3 18:22 hadoop-3.2.2.tar.gz
drwxr-xr-x. 7 10 143 245 Apr 2 2019 jdk1.8
-rw-r--r--. 1 root root 194990602 Jul 3 18:04 jdk-8u211-linux-x64.tar.gz
Extract the archive:
[root@bigdata01 soft]# tar -zxvf hadoop-3.2.2.tar.gz
List the directory:
[root@bigdata01 soft]# ll
total 576608
drwxr-xr-x. 9 1000 1000 149 Jan 3 18:11 hadoop-3.2.2
-rw-r--r--. 1 root root 395448622 Jul 3 18:22 hadoop-3.2.2.tar.gz
drwxr-xr-x. 7 10 143 245 Apr 2 2019 jdk1.8
-rw-r--r--. 1 root root 194990602 Jul 3 18:04 jdk-8u211-linux-x64.tar.gz
[root@bigdata01 soft]# cd hadoop-3.2.2
[root@bigdata01 hadoop-3.2.2]# ll
total 184
drwxr-xr-x. 2 1000 1000 203 Jan 3 18:11 bin
drwxr-xr-x. 3 1000 1000 20 Jan 3 17:29 etc
drwxr-xr-x. 2 1000 1000 106 Jan 3 18:11 include
drwxr-xr-x. 3 1000 1000 20 Jan 3 18:11 lib
drwxr-xr-x. 4 1000 1000 4096 Jan 3 18:11 libexec
-rw-rw-r--. 1 1000 1000 150569 Dec 5 2020 LICENSE.txt
-rw-rw-r--. 1 1000 1000 21943 Dec 5 2020 NOTICE.txt
-rw-rw-r--. 1 1000 1000 1361 Dec 5 2020 README.txt
drwxr-xr-x. 3 1000 1000 4096 Jan 3 17:29 sbin
drwxr-xr-x. 4 1000 1000 31 Jan 3 18:46 share
Two directories under the Hadoop home matter most: bin and sbin.
First, the bin directory:
[root@bigdata01 hadoop-3.2.2]# cd bin
[root@bigdata01 bin]# ll
total 1032
-rwxr-xr-x. 1 1000 1000 442728 Jan 3 17:54 container-executor
-rwxr-xr-x. 1 1000 1000 8707 Jan 3 17:28 hadoop
-rwxr-xr-x. 1 1000 1000 11265 Jan 3 17:28 hadoop.cmd
-rwxr-xr-x. 1 1000 1000 11274 Jan 3 17:32 hdfs
-rwxr-xr-x. 1 1000 1000 8081 Jan 3 17:32 hdfs.cmd
-rwxr-xr-x. 1 1000 1000 6237 Jan 3 17:57 mapred
-rwxr-xr-x. 1 1000 1000 6311 Jan 3 17:57 mapred.cmd
-rwxr-xr-x. 1 1000 1000 29184 Jan 3 17:54 oom-listener
-rwxr-xr-x. 1 1000 1000 485312 Jan 3 17:54 test-container-executor
-rwxr-xr-x. 1 1000 1000 12112 Jan 3 17:54 yarn
-rwxr-xr-x. 1 1000 1000 12840 Jan 3 17:54 yarn.cmd
It holds the hdfs, yarn, and related scripts that are used later to operate the HDFS and YARN components of the cluster.
Next, the sbin directory, full of start-*/stop-* scripts responsible for starting and stopping cluster components:
[root@bigdata01 hadoop-3.2.2]# cd sbin
[root@bigdata01 sbin]# ll
total 108
-rwxr-xr-x. 1 1000 1000 2756 Jan 3 17:32 distribute-exclude.sh
drwxr-xr-x. 4 1000 1000 36 Jan 3 17:54 FederationStateStore
-rwxr-xr-x. 1 1000 1000 1983 Jan 3 17:28 hadoop-daemon.sh
-rwxr-xr-x. 1 1000 1000 2522 Jan 3 17:28 hadoop-daemons.sh
-rwxr-xr-x. 1 1000 1000 1542 Jan 3 17:33 httpfs.sh
-rwxr-xr-x. 1 1000 1000 1500 Jan 3 17:29 kms.sh
-rwxr-xr-x. 1 1000 1000 1841 Jan 3 17:57 mr-jobhistory-daemon.sh
-rwxr-xr-x. 1 1000 1000 2086 Jan 3 17:32 refresh-namenodes.sh
-rwxr-xr-x. 1 1000 1000 1779 Jan 3 17:28 start-all.cmd
-rwxr-xr-x. 1 1000 1000 2221 Jan 3 17:28 start-all.sh
-rwxr-xr-x. 1 1000 1000 1880 Jan 3 17:32 start-balancer.sh
-rwxr-xr-x. 1 1000 1000 1401 Jan 3 17:32 start-dfs.cmd
-rwxr-xr-x. 1 1000 1000 5170 Jan 3 17:32 start-dfs.sh
-rwxr-xr-x. 1 1000 1000 1793 Jan 3 17:32 start-secure-dns.sh
-rwxr-xr-x. 1 1000 1000 1571 Jan 3 17:54 start-yarn.cmd
-rwxr-xr-x. 1 1000 1000 3342 Jan 3 17:54 start-yarn.sh
-rwxr-xr-x. 1 1000 1000 1770 Jan 3 17:28 stop-all.cmd
-rwxr-xr-x. 1 1000 1000 2166 Jan 3 17:28 stop-all.sh
-rwxr-xr-x. 1 1000 1000 1783 Jan 3 17:32 stop-balancer.sh
-rwxr-xr-x. 1 1000 1000 1455 Jan 3 17:32 stop-dfs.cmd
-rwxr-xr-x. 1 1000 1000 3898 Jan 3 17:32 stop-dfs.sh
-rwxr-xr-x. 1 1000 1000 1756 Jan 3 17:32 stop-secure-dns.sh
-rwxr-xr-x. 1 1000 1000 1642 Jan 3 17:54 stop-yarn.cmd
-rwxr-xr-x. 1 1000 1000 3083 Jan 3 17:54 stop-yarn.sh
-rwxr-xr-x. 1 1000 1000 1982 Jan 3 17:28 workers.sh
-rwxr-xr-x. 1 1000 1000 1814 Jan 3 17:54 yarn-daemon.sh
-rwxr-xr-x. 1 1000 1000 2328 Jan 3 17:54 yarn-daemons.sh
Because scripts from both bin and sbin will be used, add them to PATH for convenience:
[root@bigdata01 sbin]# vi /etc/profile
.....
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_HOME=/data/soft/hadoop-3.2.2
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
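After sourcing /etc/profile, it is worth confirming the key commands actually resolve from PATH. A small sketch (the helper name is illustrative):

```shell
# Check that every named command resolves from PATH; report the first miss.
check_path() {
  for cmd in "$@"; do
    command -v "$cmd" > /dev/null || { echo "missing: $cmd" >&2; return 1; }
  done
}

# Usage: check_path hadoop hdfs yarn start-all.sh
```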
[root@bigdata01 sbin]# source /etc/profile
Modify the Hadoop configuration files
Enter the directory holding the configuration files:
[root@bigdata01 hadoop-3.2.2]# cd etc/hadoop/
The files to modify are:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
workers
First edit hadoop-env.sh and append these environment variables at the end of the file:
[root@bigdata01 hadoop]# vi hadoop-env.sh
.......
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop
Edit core-site.xml.
Note that the hostname in the fs.defaultFS property must match the hostname you configured. Add the properties inside the <configuration> element:
[root@bigdata01 hadoop]# vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop_repo</value>
    </property>
</configuration>
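To sanity-check edits like this from the shell, one property can be read back out of a *-site.xml file. A sketch using plain text tools, not a real XML parser, so it assumes the simple one-name-one-value layout shown above:

```shell
# Print the <value> belonging to the given <name> in a Hadoop *-site.xml
# file read from stdin. Text-based approximation; unique names assumed.
get_hadoop_prop() {
  local name="$1"
  tr -d '\n' | sed -n "s/.*<name>${name}<\/name>[^<]*<value>\([^<]*\)<\/value>.*/\1/p"
}

# Usage: get_hadoop_prop fs.defaultFS < core-site.xml
```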
Edit hdfs-site.xml and set the HDFS replication factor to 1, since this pseudo-distributed cluster has only one node:
[root@bigdata01 hadoop]# vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Edit mapred-site.xml and set YARN as the resource scheduling framework for MapReduce:
[root@bigdata01 hadoop]# vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Edit yarn-site.xml and configure the auxiliary service YARN runs plus the environment-variable whitelist:
[root@bigdata01 hadoop]# vi yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Edit workers and list the hostnames of the cluster's worker nodes. This cluster has only one machine, so bigdata01 is enough:
[root@bigdata01 hadoop]# vi workers
bigdata01
The configuration is now done, but the cluster cannot be started yet: HDFS is a distributed file system and must be formatted before first use, just as a newly bought disk must be formatted before an operating system can be installed on it.
Format HDFS:
[root@bigdata01 hadoop]# cd /data/soft/hadoop-3.2.2
[root@bigdata01 hadoop-3.2.2]# bin/hdfs namenode -format
WARNING: /data/hadoop_repo/logs/hadoop does not exist. Creating.
2021-07-03 20:52:59,441 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = 192.168.184.128/192.168.184.128
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.2.2
STARTUP_MSG: classpath = /data/soft/hadoop-3.2.2/etc/hadoop:/data/soft/hadoop-3.2.2/share/hadoop/common/lib/jetty-security-9.4.20.v20190813.jar:
.......
2021-07-03 20:53:01,097 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.
2021-07-03 20:53:01,149 INFO namenode.FSImageFormatProtobuf: Saving image file /data/hadoop_repo/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-07-03 20:53:01,277 INFO namenode.FSImageFormatProtobuf: Image file /data/hadoop_repo/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2021-07-03 20:53:01,291 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-07-03 20:53:01,301 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
2021-07-03 20:53:01,302 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at 192.168.184.128/192.168.184.128
************************************************************/
Seeing "successfully formatted" means the format succeeded. Errors at this stage are usually caused by the configuration files; analyze the specific message that is reported.
If a configuration error forces you to repeat the format, first delete everything under /data/hadoop_repo:
[root@bigdata01 data]# cd /data
[root@bigdata01 data]# ls
hadoop_repo  soft
[root@bigdata01 data]# rm -r hadoop_repo/
then run the format step again as before.
Once the format has succeeded, do not format again: a re-format gives the NameNode a new cluster ID that no longer matches the ID the DataNodes have stored, and the cluster will break.
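A simple guard can prevent an accidental re-format. A sketch, assuming this guide's hadoop.tmp.dir layout, where a formatted NameNode keeps its metadata under dfs/name/current:

```shell
# Return success only if the NameNode directory has never been formatted.
safe_to_format() {
  local name_dir="$1"                  # e.g. /data/hadoop_repo/dfs/name
  [ ! -d "${name_dir}/current" ]       # "current" exists => already formatted
}

# Usage: safe_to_format /data/hadoop_repo/dfs/name && bin/hdfs namenode -format
```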
Start the pseudo-distributed cluster
Run the start-all.sh script under sbin:
[root@bigdata01 hadoop-3.2.2]# sbin/start-all.sh
Starting namenodes on [bigdata01]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [bigdata01]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
The output is full of ERROR lines complaining that the HDFS and YARN daemon user variables are not defined.
The fix: edit the start-dfs.sh and stop-dfs.sh scripts in sbin and add the variables below near the top of each file.
In start-dfs.sh, add the following before the line "# Start hadoop dfs daemons":
[root@bigdata01 sbin]# vi start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In stop-dfs.sh, add the following before the line "# Stop hadoop dfs daemons":
[root@bigdata01 sbin]# vi stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Likewise edit the start-yarn.sh and stop-yarn.sh scripts in sbin.
In start-yarn.sh, add the following before the line "## @description  usage info":
[root@bigdata01 sbin]# vi start-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
In stop-yarn.sh, add the following before the line "## @description  usage info":
[root@bigdata01 sbin]# vi stop-yarn.sh
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
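An alternative worth knowing (stated here as an assumption, not something this guide's steps confirm): since the start/stop scripts all source etc/hadoop/hadoop-env.sh, the daemon users can be defined once there instead of patching four scripts:

```shell
# Sketch: define the daemon users once in etc/hadoop/hadoop-env.sh.
# (Assumption: hadoop-env.sh is read by all of the sbin start/stop scripts.)
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```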
Start the pseudo-distributed cluster again:
[root@bigdata01 sbin]# cd /data/soft/hadoop-3.2.2
[root@bigdata01 hadoop-3.2.2]# sbin/start-all.sh
Starting namenodes on [bigdata01]
Last login: Sat Jul 3 17:53:46 CST 2021 from fe80::3c67:9a76:5a25:c4d9%ens33 on pts/3
Starting datanodes
Last login: Sat Jul 3 21:25:38 CST 2021 on pts/2
Starting secondary namenodes [bigdata01]
Last login: Sat Jul 3 21:25:40 CST 2021 on pts/2
Starting resourcemanager
Last login: Sat Jul 3 21:25:48 CST 2021 on pts/2
Starting nodemanagers
Last login: Sat Jul 3 21:25:58 CST 2021 on pts/2
Verify the cluster processes.
Run jps to inspect the cluster's processes; apart from the Jps process itself, five daemons must be present for the cluster to count as started correctly:
[root@bigdata01 hadoop-3.2.2]# jps
2561 DataNode
2403 NameNode
3448 Jps
3115 NodeManager
2751 SecondaryNameNode
2991 ResourceManager
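The five-daemon check can be scripted. A sketch that reads jps output from stdin and reports whichever expected daemon is missing (the helper is illustrative):

```shell
# Verify all five Hadoop daemons appear in `jps` output read from stdin.
check_daemons() {
  local out missing=0
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    echo "$out" | grep -qw "$d" || { echo "not running: $d"; missing=1; }
  done
  return "$missing"
}

# Usage: jps | check_daemons
```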
The services can also be verified through their web UIs:
HDFS web UI: http://192.168.184.128:9870
YARN web UI: http://192.168.184.128:8088
To access the UIs by hostname instead of IP, edit the hosts file on the Windows machine.
The file is at C:\Windows\System32\drivers\etc\hosts. Add the line below, which maps the Linux VM's IP to its hostname; once the mapping exists, the VM is reachable by hostname from Windows:
192.168.184.128 bigdata01
Stop the cluster
If you changed the configuration files, or need to stop the cluster for any other reason, use:
[root@bigdata01 hadoop-3.2.2]# sbin/stop-all.sh
Stopping namenodes on [bigdata01]
Last login: Tue Apr 7 17:59:40 CST 2020 on pts/0
Stopping datanodes
Last login: Tue Apr 7 18:06:09 CST 2020 on pts/0
Stopping secondary namenodes [bigdata01]
Last login: Tue Apr 7 18:06:10 CST 2020 on pts/0
Stopping nodemanagers
Last login: Tue Apr 7 18:06:13 CST 2020 on pts/0
Stopping resourcemanager
Last login: Tue Apr 7 18:06:16 CST 2020 on pts/0