Fully Distributed Cluster (HA)
Environment Preparation
Set static IP addresses
Set the hostnames and the hostname-to-IP mappings
Disable the firewall
Set up passwordless SSH login
Install the JDK and configure the environment variables
Cluster Planning
| Node name   | NN       | JN          | DN       | ZK/FC | ZK        | RM              | NM          |
|-------------|----------|-------------|----------|-------|-----------|-----------------|-------------|
| spark-node1 | NameNode | JournalNode | DataNode | ZK/FC | ZooKeeper |                 | NodeManager |
| spark-node2 | NameNode | JournalNode | DataNode | ZK/FC | ZooKeeper | ResourceManager | NodeManager |
| spark-node3 |          | JournalNode | DataNode |       | ZooKeeper | ResourceManager | NodeManager |
Install the ZooKeeper Cluster
For a detailed walkthrough, see: CentOS 7.5 搭建Zookeeper集群与命令行操作 (setting up a ZooKeeper cluster and command-line operations on CentOS 7.5).
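As a minimal reminder of what that setup involves (the install path below is an assumption; follow the referenced article for details), every node shares the same zoo.cfg listing all three servers, and each node writes its own id into the myid file:

# /opt/zookeeper/conf/zoo.cfg (assumed install path)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=spark-node1:2888:3888
server.2=spark-node2:2888:3888
server.3=spark-node3:2888:3888

# each node writes its own id into dataDir/myid, e.g. on spark-node1:
echo 1 > /opt/zookeeper/data/myid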
Set Up Passwordless SSH
Passwordless SSH must be configured between every pair of hosts, and from each host to itself as well. This can be done as the admin user; once finished, the public key is at /home/xxx/.ssh/id_rsa.pub.
[xxx@spark-node1 ~]# ssh-keygen -t rsa
[xxx@spark-node1 ~]# ssh-copy-id spark-node1
[xxx@spark-node1 ~]# ssh-copy-id spark-node2
[xxx@spark-node1 ~]# ssh-copy-id spark-node3
spark-node1 and spark-node2 are the NameNode hosts, so they must be able to reach each other without a password (HDFS HA):
[xxx@spark-node2 ~]# ssh-keygen -t rsa
[xxx@spark-node2 ~]# ssh-copy-id spark-node2
[xxx@spark-node2 ~]# ssh-copy-id spark-node1
[xxx@spark-node2 ~]# ssh-copy-id spark-node3
spark-node2 and spark-node3 are the ResourceManager hosts, so they must be able to reach each other without a password (YARN HA):
[xxx@spark-node3 ~]# ssh-keygen -t rsa
[xxx@spark-node3 ~]# ssh-copy-id spark-node3
[xxx@spark-node3 ~]# ssh-copy-id spark-node1
[xxx@spark-node3 ~]# ssh-copy-id spark-node2
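A quick sanity check is to log in from each node to the others; with the keys in place, the remote hostname should come back without a password prompt:

[xxx@spark-node1 ~]$ ssh spark-node2 hostname
spark-node2
[xxx@spark-node1 ~]$ ssh spark-node3 hostname
spark-node3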
Install and Configure the Hadoop Cluster
Extract hadoop-2.7.6.tar.gz into the /opt/ directory:
sudo tar zxvf hadoop-2.7.6.tar.gz -C /opt/
Create a symbolic link:
sudo ln -s /opt/hadoop-2.7.6 /opt/hadoop
Configure the Hadoop cluster. All configuration files live under /opt/hadoop/etc/hadoop/. Set the JAVA_HOME environment variable in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_191-amd64
Edit core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/data/ha/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>spark-node1:2181,spark-node2:2181,spark-node3:2181</value>
  </property>
</configuration>
Edit hdfs-site.xml:
<configuration>
  <!-- Number of block replicas -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Logical name of the HA nameservice -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes of the nameservice -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>spark-node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>spark-node2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>spark-node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>spark-node2:50070</value>
  </property>
  <!-- JournalNode quorum that stores the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://spark-node1:8485;spark-node2:8485;spark-node3:8485/mycluster</value>
  </property>
  <!-- Fence the old active NameNode over SSH during failover -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/admin/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop/data/ha/jn</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Client-side provider that resolves the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
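The data directories referenced above (hadoop.tmp.dir and dfs.journalnode.edits.dir) live under /opt/hadoop, which was extracted with sudo. As an optional precaution not covered by the original steps, they can be created and handed to the working user (the xxx placeholder used throughout) up front on every node:

[xxx@spark-node1 ~]$ sudo mkdir -p /opt/hadoop/data/ha/tmp /opt/hadoop/data/ha/jn
[xxx@spark-node1 ~]$ sudo chown -R xxx:xxx /opt/hadoop-2.7.6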
Edit mapred-site.xml:
mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>spark-node1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>spark-node1:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>20000</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/tmp/hadoop-yarn/staging</value>
  </property>
</configuration>
Edit slaves:
vim slaves
spark-node1
spark-node2
spark-node3
Edit yarn-site.xml:
<configuration>
  <!-- How reducers obtain data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Declare the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>rmCluster</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>spark-node2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>spark-node3</value>
  </property>
  <!-- Address of the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>spark-node1:2181,spark-node2:2181,spark-node3:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
Copy Hadoop to the other nodes:
scp -r hadoop-2.7.6/ xxx@spark-node2:/opt/
scp -r hadoop-2.7.6/ xxx@spark-node3:/opt/
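The /opt/hadoop symlink created earlier exists only on spark-node1, but the configuration paths assume it everywhere; recreating it on the other nodes keeps those paths valid (a small step assumed here, not shown in the original commands):

[xxx@spark-node2 ~]$ sudo ln -s /opt/hadoop-2.7.6 /opt/hadoop
[xxx@spark-node3 ~]$ sudo ln -s /opt/hadoop-2.7.6 /opt/hadoop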
Configure the Hadoop environment variables:
sudo vim /etc/profile

#
export HADOOP_HOME=/opt/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
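Reload the profile so the variables take effect in the current shell; /etc/profile needs the same edit on spark-node2 and spark-node3 as well:

[xxx@spark-node1 ~]$ source /etc/profile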
Start the Cluster
On each JournalNode host, start the journalnode service with the following command (prerequisite: the ZooKeeper cluster is already running):
[xxx@spark-node1 ~]$ hadoop-daemon.sh start journalnode
[xxx@spark-node2 ~]$ hadoop-daemon.sh start journalnode
[xxx@spark-node3 ~]$ hadoop-daemon.sh start journalnode
Starting the JournalNodes creates /data/ha/jn; at this point the jn directory is still empty.
On [nn1], format the NameNode and then start it:
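The format itself is the standard one-time command, run on nn1 only (a sketch, since only its effects are described below):

[xxx@spark-node1 ~]$ hdfs namenode -format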
Formatting the NameNode populates the jn directory with the cluster ID and related information.
In addition, /data/ha/tmp is populated as well.
Start the NameNode on nn1:
[xxx@spark-node1 current]$ hadoop-daemon.sh start namenode
starting namenode, logging to /opt/hadoop-2.7.6/logs/hadoop-admin-namenode-node21.out
On [nn2], synchronize nn1's metadata:
[xxx@spark-node2 ~]$ hdfs namenode -bootstrapStandby
Start [nn2]:
[xxx@spark-node2 ~]$ hadoop-daemon.sh start namenode
On [nn1], start all DataNodes:
[xxx@spark-node1 ~]$ hadoop-daemons.sh start datanode
Check the web UIs (spark-node1:50070 and spark-node2:50070); at this point both NameNodes are in standby state.
Manual failover: start the DFSZKFailoverController on each NameNode host. Whichever machine starts it first has its NameNode become the Active NameNode:
[xxx@spark-node1 ~]$ hadoop-daemon.sh start zkfc
[xxx@spark-node2 ~]$ hadoop-daemon.sh start zkfc
Alternatively, force one of the NameNodes to become Active manually:
[xxx@spark-node2 data]$ hdfs haadmin -transitionToActive nn1 --forcemanual
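Either way, the resulting state can also be confirmed from the command line with haadmin (the output shown is the expected result):

[xxx@spark-node1 ~]$ hdfs haadmin -getServiceState nn1
active
[xxx@spark-node1 ~]$ hdfs haadmin -getServiceState nn2
standby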
Check the web UI again; nn1 should now be active.
Automatic failover requires initializing the HA state in ZooKeeper. First stop the HDFS services, then run the following on any node where ZooKeeper is installed:
[xxx@spark-node1 current]$ hdfs zkfc -formatZK
Check ZooKeeper: a hadoop-ha znode has now been created.
[root@spark-node2 ~]# zkCli.sh
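Inside the ZooKeeper shell, listing the root should now show the hadoop-ha znode (the output below is illustrative):

[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster]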
Start the HDFS services and check the NameNode states:
[xxx@spark-node1 ~]$ start-dfs.sh
Verification
Kill the Active NameNode process
Disconnect the Active NameNode machine from the network
Both checks confirm that failover to the standby NameNode occurs (see the sketch below).
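A minimal sketch of the first scenario, assuming nn1 is currently active (the PID and output are illustrative):

# on the active NameNode machine, find and kill the NameNode process
[xxx@spark-node1 ~]$ jps | grep NameNode
3407 NameNode
[xxx@spark-node1 ~]$ kill -9 3407

# within a few seconds the other NameNode should report active
[xxx@spark-node2 ~]$ hdfs haadmin -getServiceState nn2
active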
Start YARN
On spark-node2, run:
[xxx@spark-node2 ~]$ start-yarn.sh
On spark-node3, run:
[xxx@spark-node3 ~]$ yarn-daemon.sh start resourcemanager
Check the service states:
[xxx@spark-node2 ~]$ yarn rmadmin -getServiceState rm1
active
[xxx@spark-node2 ~]$ yarn rmadmin -getServiceState rm2
standby
Test the Cluster
Check the processes:
[xxx@spark-node1 ~]$ start-dfs.sh
[xxx@spark-node2 ~]$ start-yarn.sh
[xxx@spark-node3 ~]$ yarn-daemon.sh start resourcemanager
# xxx @ spark-node1 in ~ [10:29:20]
$ jps
3220 JournalNode
3588 DataNode
4967 Jps
4378 NodeManager
3725 DFSZKFailoverController
3407 NameNode

# xxx @ spark-node2 in ~ [10:28:32]
$ jps
3939 ResourceManager
3380 NameNode
3508 DataNode
4040 NodeManager
3660 DFSZKFailoverController
4621 Jps
3182 JournalNode

# xxx @ spark-node3 in ~ [10:28:09]
$ jps
3188 JournalNode
3989 Jps
3784 ResourceManager
3641 NodeManager
3371 DataNode
Submit a Job
Upload a file to the cluster:
[xxx@spark-node1 ~]$ hadoop fs -mkdir -p /user/galudisu/input
[xxx@spark-node1 ~]$ mkdir -p /opt/wcinput/
[xxx@spark-node1 ~]$ vi /opt/wcinput/wc.txt
[xxx@spark-node1 ~]$ hadoop fs -put /opt/wcinput/wc.txt /user/galudisu/input
The contents of wc.txt:
hadoop spark storm
hbase hive sqoop
hadoop flink flume
spark hadoop
After uploading, check where the file block is physically stored:
# file storage path
[xxx@spark-node1 subdir0]$ pwd
/opt/hadoop/data/ha/tmp/dfs/data/current/BP-1244373306-192.168.100.21-1527653416622/current/finalized/subdir0/subdir0
# view the file contents
[xxx@spark-node1 subdir0]$ cat blk_1073741825
hadoop spark storm
hbase hive sqoop
hadoop flink flume
spark hadoop
Download the file:
[xxx@spark-node1 opt]$ hadoop fs -get /user/galudisu/input/wc.txt
Run the wordcount example:
[xxx@spark-node1 ~]$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount /user/galudisu/input /user/galudisu/output
Execution log:
18/10/22 10:44:06 INFO input.FileInputFormat: Total input paths to process : 1
18/10/22 10:44:06 INFO mapreduce.JobSubmitter: number of splits:1
18/10/22 10:44:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540218481613_0001
18/10/22 10:44:07 INFO impl.YarnClientImpl: Submitted application application_1540218481613_0001
18/10/22 10:44:07 INFO mapreduce.Job: The url to track the job: http://spark-node2:8088/proxy/application_1540218481613_0001/
18/10/22 10:44:07 INFO mapreduce.Job: Running job: job_1540218481613_0001
18/10/22 10:44:19 INFO mapreduce.Job: Job job_1540218481613_0001 running in uber mode : false
18/10/22 10:44:19 INFO mapreduce.Job:  map 0% reduce 0%
18/10/22 10:44:31 INFO mapreduce.Job:  map 100% reduce 0%
18/10/22 10:44:39 INFO mapreduce.Job:  map 100% reduce 100%
18/10/22 10:44:40 INFO mapreduce.Job: Job job_1540218481613_0001 completed successfully
18/10/22 10:44:40 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=102
                FILE: Number of bytes written=250893
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=178
                HDFS: Number of bytes written=64
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9932
                Total time spent by all reduces in occupied slots (ms)=4401
                Total time spent by all map tasks (ms)=9932
                Total time spent by all reduce tasks (ms)=4401
                Total vcore-milliseconds taken by all map tasks=9932
                Total vcore-milliseconds taken by all reduce tasks=4401
                Total megabyte-milliseconds taken by all map tasks=10170368
                Total megabyte-milliseconds taken by all reduce tasks=4506624
        Map-Reduce Framework
                Map input records=4
                Map output records=11
                Map output bytes=112
                Map output materialized bytes=102
                Input split bytes=108
                Combine input records=11
                Combine output records=8
                Reduce input groups=8
                Reduce shuffle bytes=102
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=186
                CPU time spent (ms)=1950
                Physical memory (bytes) snapshot=291643392
                Virtual memory (bytes) snapshot=4170993664
                Total committed heap usage (bytes)=141291520
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=70
        File Output Format Counters
                Bytes Written=64
Download and view the result:
# xxx @ spark-node1 in ~ [10:52:01]
$ hadoop fs -get /user/galudisu/output/part-r-00000

# galudisu @ spark-node1 in ~ [10:52:06]
$ cat part-r-00000
flink   1
flume   1
hadoop  3
hbase   1
hive    1
spark   2
sqoop   1
storm   1
Adding Hadoop as a systemd Service
Similar to ZooKeeper, register Hadoop with systemd so that it starts automatically at boot.
First, create a startup script in the hadoop directory on each node:
#!/bin/bash

start() {
    source "/etc/profile"
    /opt/hadoop/sbin/start-dfs.sh
    /opt/hadoop/sbin/start-yarn.sh
}

stop() {
    source "/etc/profile"
    /opt/hadoop/sbin/stop-yarn.sh
    /opt/hadoop/sbin/stop-dfs.sh
}

case $1 in
    start|stop)
        "$1"
        ;;
esac

exit 0
Add it to systemd so it starts at boot. Note the After= line below and remove any units that do not exist on your system:
[Unit]
Description=Hadoop DFS namenode and datanode
After=syslog.target network.target remote-fs.target nss-lookup.target network-online.target
Requires=network-online.target
Wants=zookeeper.target

[Service]
User=spark
Group=spark
Type=forking
ExecStart=/opt/hadoop/hadoop-service.sh start
ExecStop=/opt/hadoop/hadoop-service.sh stop
RemainAfterExit=yes
Environment=JAVA_HOME=/usr/java/jdk1.8.0_191-amd64
Environment=HADOOP_HOME=/opt/hadoop

[Install]
WantedBy=multi-user.target
Following the same pattern, adjust hadoop-service.sh on each node individually.
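To wire everything up, make the script executable and register the unit with systemd; saving the unit as /etc/systemd/system/hadoop.service is an assumption here:

[xxx@spark-node1 ~]$ chmod +x /opt/hadoop/hadoop-service.sh
[xxx@spark-node1 ~]$ sudo systemctl daemon-reload
[xxx@spark-node1 ~]$ sudo systemctl enable hadoop
[xxx@spark-node1 ~]$ sudo systemctl start hadoop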