How do you deploy Apache Hadoop YARN on Oracle Linux 10.x?
- By : Will
- Category : Apache-Hadoop
1 Background
2 Best practices
2.1 Deployment environment
2.1.1 Basic information for the Hadoop HDFS roles
IP Address = 10.168.0.10[1-3]
OS = Oracle Linux 10.x x86_64
Host Name = hd0[1-3].cmdschool.org
The roles are distributed as follows:
Apache Hadoop HDFS NameNode(hdfs-nn) = hd01.cmdschool.org
Apache Hadoop HDFS SecondaryNameNode(hdfs-snn) = hd02.cmdschool.org
Apache Hadoop HDFS DataNode(hdfs-dn) = hd0[1-3].cmdschool.org
If you have not configured HDFS yet, set it up first; this guide builds on a working HDFS deployment.
2.1.2 Basic information for the Apache Hadoop YARN roles
hostname = hd0[1-3].cmdschool.org
ipaddress = 10.168.0.10[1-3]
OS = Oracle Linux 10.x x86_64
The roles are distributed as follows:
Apache Hadoop YARN ResourceManager(yarn-rm) = hd01.cmdschool.org
Apache Hadoop YARN NodeManager(yarn-nm) = hd0[1-3].cmdschool.org
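The role layout above can be written down as a small lookup, which is handy when scripting the later steps per host. This is only a sketch: the host names and roles come from the tables above, while the function names `cluster_hosts` and `yarn_roles_for` are made up for illustration.

```shell
#!/bin/sh
# Host names and roles are taken from the tables above.
# The function names are illustrative, not part of Hadoop.
DOMAIN=cmdschool.org

# Print the fully qualified names hd01..hd03, one per line.
cluster_hosts() {
    for i in 1 2 3; do
        echo "hd0${i}.${DOMAIN}"
    done
}

# Print the YARN roles carried by the given host.
yarn_roles_for() {
    case "$1" in
        hd01.${DOMAIN}) echo "yarn-rm yarn-nm" ;;
        hd02.${DOMAIN}|hd03.${DOMAIN}) echo "yarn-nm" ;;
        *) echo "none" ;;
    esac
}

# Show the whole layout.
for h in $(cluster_hosts); do
    echo "$h: $(yarn_roles_for "$h")"
done
```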
2.2 Basic Hadoop YARN configuration
2.2.1 Create the service user
In hd0[1-3],
groupadd yarn
useradd -g yarn -G hadoop -d /var/lib/hadoop-yarn/ yarn
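2.2.2 Allow yarn to run rsync as hdfs
In hd0[1-3],
visudo
Below the existing first line shown here, add the second line:
root ALL=(ALL) ALL
yarn ALL=(hdfs) NOPASSWD: /usr/bin/rsync
Note: this lets the yarn user run rsync as hdfs without a password, which the synchronization step in the daemon script relies on.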
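The rule has a fixed shape (user, run-as user, command), so it can be generated and syntax-checked before it goes live. A minimal sketch, assuming you prefer a drop-in file over editing /etc/sudoers directly; the helper name `sudoers_rule` and the file name `/etc/sudoers.d/yarn-rsync` are hypothetical.

```shell
#!/bin/sh
# Build a sudoers rule of the form used above:
#   USER ALL=(RUNAS) NOPASSWD: COMMAND
# The helper name is illustrative.
sudoers_rule() {
    printf '%s ALL=(%s) NOPASSWD: %s\n' "$1" "$2" "$3"
}

# Print the rule from this section.
sudoers_rule yarn hdfs /usr/bin/rsync
```

As root, the output could be redirected to a drop-in file such as /etc/sudoers.d/yarn-rsync and checked with `visudo -cf /etc/sudoers.d/yarn-rsync`; `sudo -l -U yarn` then lists what the yarn user is allowed to run.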
2.2.3 Modify the rsync call in the daemon script
In hd0[1-3],
vim /usr/hadoop-2.6.0/sbin/yarn-daemon.sh
Comment out the original line and add the second line shown here:
# rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $YARN_MASTER/ "$HADOOP_YARN_HOME"
sudo -u hdfs rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $YARN_MASTER/ "$HADOOP_YARN_HOME"
Note: with this change the yarn script synchronizes the configuration under the hdfs identity.
2.2.4 Configure environment variables
In hd0[1-3],
vim /etc/profile.d/hadoop.sh
Append the following to the existing configuration:
export YARN_CONF_DIR=/etc/hadoop
export YARN_LOG_DIR=/var/log/hadoop-yarn
export YARN_MASTER=hd01:${HADOOP_YARN_HOME}
export YARN_PID_DIR=/var/run/hadoop-yarn
export YARN_IDENT_STRING=$USER
export YARN_NICENESS=0
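The settings are explained below; pay particular attention to the third line:
– Line 1 sets the location of the YARN configuration files
– Line 2 sets the location of the YARN log files
– Line 3 tells the non-master servers to synchronize their configuration from the "/usr/hadoop-2.6.0" directory on hd01 using rsync (this is the key setting)
– Line 4 sets the location of the YARN PID files
– Line 5 sets the identity string to the current user ("$USER")
– Line 6 sets the process priority (niceness); the default is "0"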
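Because the daemon scripts depend on these six variables, it helps to confirm that they are actually set in the current shell. The following is a sketch only; the function name `missing_yarn_vars` is made up, and the variable list simply mirrors the six exports above.

```shell
#!/bin/sh
# Print the name of every expected YARN variable that is unset or empty.
# The function name is illustrative.
missing_yarn_vars() {
    for name in YARN_CONF_DIR YARN_LOG_DIR YARN_MASTER \
                YARN_PID_DIR YARN_IDENT_STRING YARN_NICENESS; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

missing=$(missing_yarn_vars)
if [ -n "$missing" ]; then
    echo "unset variables:" $missing
else
    echo "all YARN variables are set"
fi
```

Run it after sourcing /etc/profile.d/hadoop.sh; any name it prints points at a line missing from that file.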
To view the complete file, run:
cat /etc/profile.d/hadoop.sh
It should contain the following:
export HADOOP_HOME=/usr/hadoop-2.6.0
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=${HADOOP_HOME}/bin:$PATH
export PATH=${HADOOP_HOME}/sbin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop
export HADOOP_LOG_DIR=/var/log/hadoop-hdfs
export HADOOP_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_MASTER=hd01:${HADOOP_HOME}
export HADOOP_IDENT_STRING=$USER
export HADOOP_NICENESS=0
export YARN_CONF_DIR=/etc/hadoop
export YARN_LOG_DIR=/var/log/hadoop-yarn
export YARN_MASTER=hd01:${HADOOP_YARN_HOME}
export YARN_PID_DIR=/var/run/hadoop-yarn
export YARN_IDENT_STRING=$USER
export YARN_NICENESS=0
Create the declared directories and set their ownership and permissions:
mkdir -p /var/log/hadoop-yarn
chown yarn:yarn /var/log/hadoop-yarn
chmod 775 /var/log/hadoop-yarn
mkdir -p /var/run/hadoop-yarn
chown yarn:yarn /var/run/hadoop-yarn
chmod 775 /var/run/hadoop-yarn
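To confirm that both directories ended up with the intended mode, a small check like the one below can be used. It is a sketch under the assumption that GNU `stat` (with `-c`) is available, as it is on Oracle Linux; the function name `dir_has_mode` is illustrative.

```shell
#!/bin/sh
# Succeed only if DIR exists and its permission bits equal MODE (e.g. 775).
# Uses GNU stat's -c option; the function name is illustrative.
dir_has_mode() {
    [ -d "$1" ] && [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Check the two directories created in this step.
for d in /var/log/hadoop-yarn /var/run/hadoop-yarn; do
    if dir_has_mode "$d" 775; then
        echo "$d: ok"
    else
        echo "$d: missing or wrong mode"
    fi
done
```

Ownership can be inspected the same way with `stat -c '%U:%G'`, which should print `yarn:yarn` for both directories.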
When the configuration is complete, reload the environment variables:
source /etc/profile.d/hadoop.sh
2.2.5 Verify the YARN installation
In hd0[1-3],
yarn version
The output should look like this:
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
2.3 Configure the ResourceManager node (master)
In hd01,
2.3.1 Define the mapred-site.xml configuration file
cp /etc/hadoop/mapred-site.xml.template /etc/hadoop/mapred-site.xml
vim /etc/hadoop/mapred-site.xml
Change the configuration as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Note: setting the "mapreduce.framework.name" property to "yarn" makes MapReduce programs run on YARN.
2.3.2 Define the yarn-site.xml configuration file
cp /etc/hadoop/yarn-site.xml /etc/hadoop/yarn-site.xml.default
vim /etc/hadoop/yarn-site.xml
Change the configuration as follows:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd01</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/yarn/nm</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/data/yarn/container-logs</value>
</property>
</configuration>
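Notes:
– The "yarn.resourcemanager.hostname" property sets the host name of the ResourceManager
– The "yarn.nodemanager.local-dirs" property sets the NodeManager's local storage directory
– The "yarn.nodemanager.log-dirs" property sets the NodeManager's local log directory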
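After editing, the values can be read back from the file to catch typos before a daemon is started. The helper below is a rough sketch, not a real XML parser: it assumes each `<name>` and `<value>` tag sits on its own line, as in the listing above, and the function name `hadoop_property` is made up.

```shell
#!/bin/sh
# Print the <value> that follows <name>NAME</name> in a Hadoop *-site.xml file.
# Assumes one tag per line; this is not a general XML parser.
# Usage: hadoop_property FILE NAME
hadoop_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Report the three properties set in this section, if the file is readable.
conf=/etc/hadoop/yarn-site.xml
if [ -r "$conf" ]; then
    for p in yarn.resourcemanager.hostname \
             yarn.nodemanager.local-dirs \
             yarn.nodemanager.log-dirs; do
        echo "$p = $(hadoop_property "$conf" "$p")"
    done
fi
```

An empty value after the equals sign means the property was not found in the file.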
Create the directories named in the configuration file:
mkdir -p /data/yarn/nm
chown yarn:yarn /data/yarn/nm
chmod 775 /data/yarn/nm
mkdir -p /data/yarn/container-logs
chown yarn:yarn /data/yarn/container-logs
chmod 775 /data/yarn/container-logs
2.3.3 Start the daemon
su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start resourcemanager'
The output should look like this:
rsync from hd01:/usr/hadoop-2.6.0
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-hd01.cmdschool.org.out
Use the following command to confirm that the process is running:
pgrep -u yarn -a java
The output should look like this:
102742 /usr/java/jdk1.8.0_121/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.home.dir=/usr/hadoop-2.6.0 -Dhadoop.home.dir=/usr/hadoop-2.6.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -classpath /etc/hadoop:/etc/hadoop:/etc/hadoop:/usr/hadoop-2.6.0/share/hadoop/common/lib/*:/usr/hadoop-2.6.0/share/hadoop/common/*:/usr/hadoop-2.6.0/share/hadoop/hdfs:/usr/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/usr/hadoop-2.6.0/share/hadoop/hdfs/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/*:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
Use the following command to confirm that the daemon is listening:
ss -antp | grep -f <(pgrep -u yarn java)
The output should look like this:
LISTEN 0 128 [::ffff:10.168.0.101]:8033 *:* users:(("java",pid=102742,fd=232))
LISTEN 0 128 [::ffff:10.168.0.101]:8032 *:* users:(("java",pid=102742,fd=218))
LISTEN 0 128 [::ffff:10.168.0.101]:8031 *:* users:(("java",pid=102742,fd=197))
LISTEN 0 128 [::ffff:10.168.0.101]:8030 *:* users:(("java",pid=102742,fd=208))
LISTEN 0 128 [::ffff:10.168.0.101]:8088 *:* users:(("java",pid=102742,fd=228))
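Rather than reading the listing by eye, the five ResourceManager ports seen above (8030 to 8033 and 8088) can be checked in one pass. This is a sketch that only parses `ss`-style text; the function name `missing_listen_ports` is illustrative.

```shell
#!/bin/sh
# Read `ss -ntl`-style output on stdin and print each expected port
# (given as arguments) that has no LISTEN line. The name is illustrative.
missing_listen_ports() {
    listing=$(cat)
    for port in "$@"; do
        printf '%s\n' "$listing" |
            grep -Eq "^LISTEN[[:space:]].*:${port}[[:space:]]" || echo "$port"
    done
}

# On hd01, check the ResourceManager ports if ss is installed.
if command -v ss >/dev/null 2>&1; then
    ss -ntl | missing_listen_ports 8030 8031 8032 8033 8088
fi
```

No output means every port is being listened on; any number printed is a port that is not yet open.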
Once this test succeeds, stop the service with the following command:
su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop resourcemanager'
2.3.4 Configure the systemd service unit
vim /usr/lib/systemd/system/yarn-rm.service
Add the following configuration:
[Unit]
Description=Apache YARN resource manager
Wants=network.target
After=network.target
Documentation=https://hadoop.apache.org/docs/

[Service]
Type=forking
User=yarn
Group=yarn
Environment="JAVA_HOME=/usr/java/jdk1.8.0_121"
Environment="HADOOP_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_PREFIX=/usr/hadoop-2.6.0"
Environment="HADOOP_YARN_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_CONF_DIR=/etc/hadoop"
Environment="HADOOP_LOG_DIR=/var/log/hadoop-hdfs"
Environment="HADOOP_PID_DIR=/var/run/hadoop-hdfs"
Environment="HADOOP_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="HADOOP_IDENT_STRING=hdfs"
Environment="HADOOP_NICENESS=0"
Environment="YARN_CONF_DIR=/etc/hadoop"
Environment="YARN_LOG_DIR=/var/log/hadoop-yarn"
Environment="YARN_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="YARN_PID_DIR=/var/run/hadoop-yarn"
Environment="YARN_IDENT_STRING=yarn"
Environment="YARN_NICENESS=0"
ExecStartPre=+/bin/sh -c 'mkdir -p /var/run/hadoop-yarn;chown yarn:yarn /var/run/hadoop-yarn;chmod 775 /var/run/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /var/log/hadoop-yarn;chown yarn:yarn /var/log/hadoop-yarn;chmod 775 /var/log/hadoop-yarn'
ExecStart=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start resourcemanager
ExecStop=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop resourcemanager
PIDFile=/var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid
Restart=on-success

[Install]
WantedBy=multi-user.target
After editing the unit file, reload systemd:
systemctl daemon-reload
Start the service, check its status, and enable it at boot:
systemctl start yarn-rm.service
systemctl status yarn-rm.service
systemctl enable yarn-rm.service
2.3.5 Open the node's ports
firewall-cmd --permanent --add-port 8088/tcp --add-port 8030-8033/tcp
firewall-cmd --reload
firewall-cmd --list-all
2.3.6 Test in a browser
In Windows Client,
http://10.168.0.101:8088/
The ResourceManager web interface should be displayed.
Once all nodes are configured, clicking [Nodes] lists them.
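The node list can also be checked from a shell instead of a browser. The sketch below assumes the ResourceManager REST endpoint `/ws/v1/cluster/nodes` is reachable on port 8088 and returns JSON in which each node carries a `"state"` field; the function names are illustrative, and counting with `grep` is a shortcut, not proper JSON parsing.

```shell
#!/bin/sh
# Count occurrences of "state":"RUNNING" in the JSON read from stdin.
# A rough text match, not a JSON parser; the name is illustrative.
count_running_nodes() {
    grep -o '"state"[[:space:]]*:[[:space:]]*"RUNNING"' | wc -l | tr -d ' '
}

# Fetch the node list from the ResourceManager at HOST and count running nodes.
# Assumes the /ws/v1/cluster/nodes REST endpoint on port 8088.
running_nodes_on() {
    curl -s --max-time 5 "http://$1:8088/ws/v1/cluster/nodes" | count_running_nodes
}
```

For example, `running_nodes_on 10.168.0.101` should print 3 once all three NodeManagers have registered. The `yarn node -list` command gives a similar view from the command line.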

2.4 Configure the NodeManager nodes (workers)
In hd0[1-3],
2.4.1 Start the daemon
su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start nodemanager'
The output should look like this:
rsync from hd01:/usr/hadoop-2.6.0
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-hd01.cmdschool.org.out
Use the following command to confirm that the process is running:
pgrep -u yarn -a java | grep -i proc_nodemanager
The output should look like this:
104820 /usr/java/jdk1.8.0_121/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.home.dir=/usr/hadoop-2.6.0 -Dhadoop.home.dir=/usr/hadoop-2.6.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -classpath /etc/hadoop:/etc/hadoop:/etc/hadoop:/usr/hadoop-2.6.0/share/hadoop/common/lib/*:/usr/hadoop-2.6.0/share/hadoop/common/*:/usr/hadoop-2.6.0/share/hadoop/hdfs:/usr/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/usr/hadoop-2.6.0/share/hadoop/hdfs/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/*:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager
Alternatively, the following command is more concise:
jps
The output should look like this:
104208 ResourceManager
1505 DataNode
1506 NameNode
104820 NodeManager
104950 Jps
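The `jps` listing can be compared against the daemons expected on a host. A minimal sketch: it reads "PID ClassName" lines and reports names that are absent; the function name `missing_daemons` is made up. It also assumes `jps` is run by a user who can see the JVMs of both hdfs and yarn (typically root), as in the output above.

```shell
#!/bin/sh
# Read `jps` output ("PID ClassName" per line) on stdin and print each
# expected daemon name (given as arguments) that is absent.
missing_daemons() {
    listing=$(cat)
    for daemon in "$@"; do
        printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' ||
            echo "$daemon"
    done
}

# On hd01 both YARN daemons are expected; skip if jps is not on the PATH.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons ResourceManager NodeManager
fi
```

On hd02 and hd03 only `NodeManager` would be passed, since the ResourceManager runs on hd01 alone.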
Use the following command to confirm that the daemon is listening:
ss -antp | grep -f <(pgrep -u yarn -a java | grep -i proc_nodemanager | awk '{print $1}')
The output should look like this:
LISTEN 0 128 *:42891 *:* users:(("java",pid=104820,fd=197))
LISTEN 0 128 *:8040 *:* users:(("java",pid=104820,fd=208))
LISTEN 0 128 *:8042 *:* users:(("java",pid=104820,fd=218))
ESTAB 0 0 [::ffff:10.168.0.101]:34842 [::ffff:10.168.0.101]:8031 users:(("java",pid=104820,fd=222))
Once this test succeeds, stop the service with the following command:
su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop nodemanager'
2.4.2 Configure the systemd service unit
vim /usr/lib/systemd/system/yarn-nm.service
Add the following configuration:
[Unit]
Description=Apache YARN nodemanager
Wants=network.target
After=network.target
Documentation=https://hadoop.apache.org/docs/

[Service]
Type=forking
User=yarn
Group=yarn
Environment="JAVA_HOME=/usr/java/jdk1.8.0_121"
Environment="HADOOP_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_PREFIX=/usr/hadoop-2.6.0"
Environment="HADOOP_YARN_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_CONF_DIR=/etc/hadoop"
Environment="HADOOP_LOG_DIR=/var/log/hadoop-hdfs"
Environment="HADOOP_PID_DIR=/var/run/hadoop-hdfs"
Environment="HADOOP_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="HADOOP_IDENT_STRING=hdfs"
Environment="HADOOP_NICENESS=0"
Environment="YARN_CONF_DIR=/etc/hadoop"
Environment="YARN_LOG_DIR=/var/log/hadoop-yarn"
Environment="YARN_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="YARN_PID_DIR=/var/run/hadoop-yarn"
Environment="YARN_IDENT_STRING=yarn"
Environment="YARN_NICENESS=0"
ExecStartPre=+/bin/sh -c 'mkdir -p /var/run/hadoop-yarn;chown yarn:yarn /var/run/hadoop-yarn;chmod 775 /var/run/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /var/log/hadoop-yarn;chown yarn:yarn /var/log/hadoop-yarn;chmod 775 /var/log/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /data/yarn/nm;chown yarn:yarn /data/yarn/nm;chmod 775 /data/yarn/nm'
ExecStartPre=+/bin/sh -c 'mkdir -p /data/yarn/container-logs;chown yarn:yarn /data/yarn/container-logs;chmod 775 /data/yarn/container-logs'
ExecStart=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start nodemanager
ExecStop=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop nodemanager
PIDFile=/var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
Restart=on-success

[Install]
WantedBy=multi-user.target
After editing the unit file, reload systemd:
systemctl daemon-reload
Start the service, check its status, and enable it at boot:
systemctl start yarn-nm.service
systemctl status yarn-nm.service
systemctl enable yarn-nm.service
2.4.3 Open the node's ports
firewall-cmd --permanent --add-port 8040/tcp --add-port 8042/tcp --add-port 13562/tcp --add-port 35000-50000/tcp
firewall-cmd --reload
firewall-cmd --list-all
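When the same ports have to be opened on several nodes, the argument list can be generated from a plain port list. The sketch below only prints the command rather than running it; the function name `add_port_args` is illustrative. The comments on what each port is used for reflect my understanding of the Hadoop defaults and are not stated in the original text.

```shell
#!/bin/sh
# Turn a list of ports or port ranges into firewall-cmd --add-port arguments.
# The function name is illustrative.
add_port_args() {
    args=""
    for p in "$@"; do
        # Prepend a separating space only when args is already non-empty.
        args="${args}${args:+ }--add-port ${p}/tcp"
    done
    printf '%s\n' "$args"
}

# NodeManager ports from this section (purposes are assumptions):
#   8040        resource localizer
#   8042        NodeManager web interface
#   13562       MapReduce shuffle
#   35000-50000 range opened for the dynamically chosen NodeManager port
echo "firewall-cmd --permanent $(add_port_args 8040 8042 13562 35000-50000)"
```

The printed line matches the first command of this step and can be reviewed before it is executed as root.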
2.4.4 Test in a browser
In Windows Client,
http://10.168.0.101:8042/
The NodeManager web interface should be displayed.
