How to Deploy Apache Hadoop YARN on Oracle Linux 10.x?


1 Background

How do you deploy an Apache Hadoop YARN cluster?

2 Best Practices

2.1 Deployment environment

2.1.1 Apache Hadoop HDFS roles

IP Address = 10.168.0.10[1-3] OS = Oracle Linux 10.x x86_64
Host Name = hd0[1-3].cmdschool.org
The roles are distributed as follows:
Apache Hadoop HDFS NameNode(hdfs-nn) = hd01.cmdschool.org
Apache Hadoop HDFS SecondaryNameNode(hdfs-snn) = hd02.cmdschool.org
Apache Hadoop HDFS DataNode(hdfs-dn) = hd0[1-3].cmdschool.org
If you have not set up HDFS yet, see the following article first:

How to Deploy an Apache Hadoop 2.6.0 HDFS Cluster on Oracle Linux 10.x?

2.1.2 Apache Hadoop YARN roles

IP Address = 10.168.0.10[1-3] OS = Oracle Linux 10.x x86_64
Host Name = hd0[1-3].cmdschool.org
The roles are distributed as follows:
Apache Hadoop YARN ResourceManager(yarn-rm) = hd01.cmdschool.org
Apache Hadoop YARN NodeManager(yarn-nm) = hd0[1-3].cmdschool.org
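The configuration below refers to hd01 by its short name (yarn.resourcemanager.hostname = hd01, YARN_MASTER=hd01:...), so every node must be able to resolve hd01, hd02 and hd03. If DNS does not already provide this, the sketch below prints matching /etc/hosts lines. It assumes the mapping listed above (hd0N.cmdschool.org has address 10.168.0.10N); the function name gen_hosts_entries is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts lines for hd01-hd03.
# Assumes hd0N.cmdschool.org has the address 10.168.0.10N, as in the role lists above.
gen_hosts_entries() {
  local domain="cmdschool.org" i
  for i in 1 2 3; do
    # IP, FQDN, then the short name used in yarn-site.xml and YARN_MASTER
    printf '10.168.0.10%s\thd0%s.%s\thd0%s\n' "$i" "$i" "$domain" "$i"
  done
}

# Usage: review the output, then append it on each node (skipping lines already present):
# gen_hosts_entries | sudo tee -a /etc/hosts
```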

2.2 Basic Hadoop YARN configuration

2.2.1 Create the service user

In hd0[1-3],

groupadd yarn
useradd -g yarn -G hadoop -d /var/lib/hadoop-yarn/ yarn
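The `-G hadoop` option only succeeds if the `hadoop` group already exists; it is assumed here to have been created during the HDFS setup. A quick way to confirm the result on each node is a membership check like the one below. user_in_group is a hypothetical helper built only on the standard `id` command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if user $1 exists and is a member of group $2.
user_in_group() {
  local user="$1" group="$2"
  id "$user" >/dev/null 2>&1 || return 1             # the user must exist
  id -nG "$user" | tr ' ' '\n' | grep -qxF "$group"  # exact match against its group names
}

# Usage on hd0[1-3]:
# user_in_group yarn hadoop && echo "yarn is in hadoop" || echo "check groupadd/useradd"
```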

2.2.2 Allow yarn to run rsync as hdfs

In hd0[1-3],

visudo

Below the first line shown here, add the second line:

root    ALL=(ALL)       ALL
yarn    ALL=(hdfs)      NOPASSWD: /usr/bin/rsync

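Note: this lets the yarn user run /usr/bin/rsync as the hdfs user without a password, which the daemon script needs to synchronize files.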
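Editing /etc/sudoers directly works, but a syntax error there can break sudo for everyone. An alternative, assuming /etc/sudoers includes /etc/sudoers.d (as the stock file on RHEL-compatible systems such as Oracle Linux does), is to put the same rule in a drop-in file and let `visudo -c -f` validate it before it is installed. The file name /etc/sudoers.d/yarn-rsync and both function names are illustrative.

```shell
#!/usr/bin/env bash
# The rule from this section, unchanged.
sudoers_rule() {
  printf '%s\n' 'yarn    ALL=(hdfs)      NOPASSWD: /usr/bin/rsync'
}

# Hypothetical installer (run as root): validate with visudo, then install with mode 0440.
install_sudoers_rule() {
  local target="${1:-/etc/sudoers.d/yarn-rsync}" tmp rc=0
  tmp="$(mktemp)" || return 1
  sudoers_rule > "$tmp"
  if visudo -c -f "$tmp" >/dev/null; then
    install -m 0440 "$tmp" "$target" || rc=1
  else
    echo "sudoers syntax check failed; nothing installed" >&2
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

After `install_sudoers_rule`, running `sudo -l -U yarn` as root should list the rsync rule.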

2.2.3 Modify the rsync call in the daemon script

In hd0[1-3],

vim /usr/hadoop-2.6.0/sbin/yarn-daemon.sh

Comment out the original line and add the second line below it:

# rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $YARN_MASTER/ "$HADOOP_YARN_HOME"
sudo -u hdfs rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $YARN_MASTER/ "$HADOOP_YARN_HOME"

Note: with this change, yarn-daemon.sh performs the synchronization under the hdfs identity.
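Making the same hand edit on three nodes is easy to get wrong. The sed sketch below applies it non-interactively. It assumes GNU sed (for `-E` and `\n` in the replacement) and that the original line begins with `rsync -a -e ssh` after optional indentation, as shown above; patch_yarn_daemon is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the plain rsync line in yarn-daemon.sh and add a
# "sudo -u hdfs" copy below it. Lines already commented or patched are not matched,
# so running it twice does not duplicate the change.
patch_yarn_daemon() {
  local script="${1:-/usr/hadoop-2.6.0/sbin/yarn-daemon.sh}"
  [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
  # keep a pristine copy from the first run only
  [ -e "$script.orig" ] || cp -p "$script" "$script.orig"
  sed -i -E \
    's|^([[:space:]]*)(rsync -a -e ssh .*)$|\1# \2\n\1sudo -u hdfs \2|' \
    "$script"
}
```

Afterwards, `grep -n 'rsync -a' /usr/hadoop-2.6.0/sbin/yarn-daemon.sh` should show one commented line and one `sudo -u hdfs` line.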

2.2.4 Configure environment variables

In hd0[1-3],

vim /etc/profile.d/hadoop.sh

Append the following after the existing settings:

export YARN_CONF_DIR=/etc/hadoop
export YARN_LOG_DIR=/var/log/hadoop-yarn
export YARN_MASTER=hd01:${HADOOP_YARN_HOME}
export YARN_PID_DIR=/var/run/hadoop-yarn
export YARN_IDENT_STRING=$USER
export YARN_NICENESS=0

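The settings are explained below; the third line deserves the closest attention:
– Line 1 sets the location of the YARN configuration files
– Line 2 sets the location of the YARN log files
– Line 3 makes the nodes sync from the "/usr/hadoop-2.6.0" directory on hd01 via rsync when a daemon starts (this is the key setting)
– Line 4 sets the location of the YARN PID files
– Line 5 sets the identity string used in the log and PID file names ("$USER" is the current user)
– Line 6 sets the scheduling priority (niceness) of the daemons; the default is "0"
To view the complete file, run:

cat /etc/profile.d/hadoop.sh

It should contain: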

export HADOOP_HOME=/usr/hadoop-2.6.0
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export PATH=${HADOOP_HOME}/bin:$PATH
export PATH=${HADOOP_HOME}/sbin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop
export HADOOP_LOG_DIR=/var/log/hadoop-hdfs
export HADOOP_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_MASTER=hd01:${HADOOP_HOME}
export HADOOP_IDENT_STRING=$USER
export HADOOP_NICENESS=0

export YARN_CONF_DIR=/etc/hadoop
export YARN_LOG_DIR=/var/log/hadoop-yarn
export YARN_MASTER=hd01:${HADOOP_YARN_HOME}
export YARN_PID_DIR=/var/run/hadoop-yarn
export YARN_IDENT_STRING=$USER
export YARN_NICENESS=0

Create the declared directories and set their ownership and permissions:

mkdir -p /var/log/hadoop-yarn
chown yarn:yarn /var/log/hadoop-yarn
chmod 775 /var/log/hadoop-yarn
mkdir -p /var/run/hadoop-yarn
chown yarn:yarn /var/run/hadoop-yarn
chmod 775 /var/run/hadoop-yarn
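Each mkdir/chown/chmod triplet above (and the ones later in 2.3.2) can be collapsed into a single `install -d` call from GNU coreutils, which creates the directory and sets owner, group and mode in one step and is safe to repeat. make_owned_dir is a hypothetical wrapper. Note that on systemd-based systems /var/run is a symlink to the tmpfs /run, so /var/run/hadoop-yarn disappears at reboot; that is presumably why the unit files in 2.3.4 and 2.4.2 recreate it in ExecStartPre.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create a directory (with parents) owned by $2:$3 with mode $4.
make_owned_dir() {
  local dir="$1" owner="$2" group="$3" mode="$4"
  install -d -o "$owner" -g "$group" -m "$mode" "$dir"
}

# Equivalent of the commands above (run as root):
# make_owned_dir /var/log/hadoop-yarn yarn yarn 775
# make_owned_dir /var/run/hadoop-yarn yarn yarn 775
```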

When you are done, reload the environment variables in the current shell:

source /etc/profile.d/hadoop.sh
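Because YARN_MASTER is built from ${HADOOP_YARN_HOME}, the YARN lines only expand correctly if they come after the HADOOP_* lines, as in the file above. After sourcing, a check such as the hypothetical check_yarn_env below lists any variable that is still empty. It relies on bash's `${!name}` indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical check: print each YARN-related variable that is unset or empty;
# return non-zero if any is missing.
check_yarn_env() {
  local v missing=0
  for v in HADOOP_YARN_HOME YARN_CONF_DIR YARN_LOG_DIR YARN_MASTER \
           YARN_PID_DIR YARN_IDENT_STRING YARN_NICENESS; do
    if [ -z "${!v:-}" ]; then      # ${!v} = value of the variable whose name is in $v
      echo "unset: $v"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: source /etc/profile.d/hadoop.sh && check_yarn_env && echo "$YARN_MASTER"
```

With the file above, the last command should print hd01:/usr/hadoop-2.6.0.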

2.2.5 Verify the YARN installation

In hd0[1-3],

yarn version

Expected output:

Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
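When comparing the three nodes, it helps to reduce this output to the version number. The hypothetical hadoop_version filter below relies only on the first line having the form `Hadoop <version>`, as above.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `yarn version` output on stdin, print the version number.
hadoop_version() {
  awk 'NR == 1 && $1 == "Hadoop" { print $2 } NR == 1 { exit }'
}

# Usage, assuming passwordless SSH to each node and a login shell that sources /etc/profile.d:
# for h in hd01 hd02 hd03; do
#   printf '%s %s\n' "$h" "$(ssh "$h" "bash -lc 'yarn version'" | hadoop_version)"
# done
```

Each node should report 2.6.0.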

2.3 Configure the ResourceManager node (master)

In hd01,

2.3.1 Configure mapred-site.xml

cp /etc/hadoop/mapred-site.xml.template /etc/hadoop/mapred-site.xml
vim /etc/hadoop/mapred-site.xml

Set the following:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Note: setting the "mapreduce.framework.name" property to "yarn" makes MapReduce programs run on YARN.
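If you prefer not to edit the file interactively, the same content can be written with a here-document. This is a sketch: it omits the license comment and stylesheet line from the template, and write_mapred_site is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal mapred-site.xml selecting YARN as the MapReduce framework.
write_mapred_site() {
  local target="${1:-/etc/hadoop/mapred-site.xml}"
  cat > "$target" <<'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF
}

# Usage (as root on hd01): write_mapred_site
```

If xmllint (from libxml2) is installed, `xmllint --noout /etc/hadoop/mapred-site.xml` confirms the result is well-formed XML.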

2.3.2 Configure yarn-site.xml

cp /etc/hadoop/yarn-site.xml /etc/hadoop/yarn-site.xml.default
vim /etc/hadoop/yarn-site.xml

Set the following:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hd01</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/yarn/nm</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/yarn/container-logs</value>
    </property>
</configuration>

Notes:
– "yarn.resourcemanager.hostname" is the host name of the ResourceManager
– "yarn.nodemanager.local-dirs" is the NodeManager's local storage directory
– "yarn.nodemanager.log-dirs" is the NodeManager's local directory for container logs
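To double-check what ended up in yarn-site.xml before starting the daemons, a small awk reader is enough for files laid out like the one above. It is a sketch, not an XML parser: it assumes each <name> and <value> sits on its own line, and get_site_prop is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical reader: print the <value> of property $2 from Hadoop site file $1.
# Assumes one <name>...</name> and one <value>...</value> per line, as in yarn-site.xml above.
get_site_prop() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/  { gsub(/.*<name>|<\/name>.*/, "");  name = $0 }
    /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); if (name == key) { print; exit } }
  ' "$file"
}

# Usage: get_site_prop /etc/hadoop/yarn-site.xml yarn.nodemanager.local-dirs
```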
Create the directories referenced in the configuration:

mkdir -p /data/yarn/nm
chown yarn:yarn /data/yarn/nm
chmod 775 /data/yarn/nm

mkdir -p /data/yarn/container-logs
chown yarn:yarn /data/yarn/container-logs
chmod 775 /data/yarn/container-logs

2.3.3 Start the daemon

su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start resourcemanager'

Expected output:

rsync from hd01:/usr/hadoop-2.6.0
starting resourcemanager, logging to /var/log/hadoop-yarn/yarn-yarn-resourcemanager-hd01.cmdschool.org.out

Confirm that the process is running:

pgrep -u yarn -a java

Expected output:

102742 /usr/java/jdk1.8.0_121/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-resourcemanager-hd01.cmdschool.org.log -Dyarn.home.dir=/usr/hadoop-2.6.0 -Dhadoop.home.dir=/usr/hadoop-2.6.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -classpath /etc/hadoop:/etc/hadoop:/etc/hadoop:/usr/hadoop-2.6.0/share/hadoop/common/lib/*:/usr/hadoop-2.6.0/share/hadoop/common/*:/usr/hadoop-2.6.0/share/hadoop/hdfs:/usr/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/usr/hadoop-2.6.0/share/hadoop/hdfs/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/*:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop/rm-config/log4j.properties org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

Confirm that the daemon is listening on its ports:

ss -antp | grep -f <(pgrep -u yarn java)

Expected output:

LISTEN 0      128    [::ffff:10.168.0.101]:8033               *:*     users:(("java",pid=102742,fd=232))
LISTEN 0      128    [::ffff:10.168.0.101]:8032               *:*     users:(("java",pid=102742,fd=218))
LISTEN 0      128    [::ffff:10.168.0.101]:8031               *:*     users:(("java",pid=102742,fd=197))
LISTEN 0      128    [::ffff:10.168.0.101]:8030               *:*     users:(("java",pid=102742,fd=208))
LISTEN 0      128    [::ffff:10.168.0.101]:8088               *:*     users:(("java",pid=102742,fd=228))
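Instead of reading the ss output by eye, the expected ports can be checked in a script. The hypothetical missing_listen_ports below reads `ss -ltn` output on stdin and prints every requested port that has no socket in LISTEN state. It assumes the column order shown above (State, Recv-Q, Send-Q, Local Address:Port, ...), and the port list 8030-8033 and 8088 is taken from that output.

```shell
#!/usr/bin/env bash
# Hypothetical check: read `ss -ltn` output on stdin and print each port given as an
# argument that has no LISTEN socket. Column 4 is "Local Address:Port", as in the output above.
missing_listen_ports() {
  local ss_out port
  ss_out="$(cat)"
  for port in "$@"; do
    printf '%s\n' "$ss_out" |
      awk -v p="$port" '$1 == "LISTEN" && $4 ~ (":" p "$") { found = 1 } END { exit !found }' ||
      echo "$port"
  done
}

# Usage on hd01 (no output means all ports are listening):
# ss -ltn | missing_listen_ports 8030 8031 8032 8033 8088
```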

Once the test succeeds, stop the daemon:

su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop resourcemanager'

2.3.4 Create the systemd service unit

vim /usr/lib/systemd/system/yarn-rm.service

Add the following:

[Unit]
Description=Apache YARN resource manager
Wants=network-online.target
After=network-online.target
Documentation=https://hadoop.apache.org/docs/

[Service]
Type=forking
User=yarn
Group=yarn

Environment="JAVA_HOME=/usr/java/jdk1.8.0_121"
Environment="HADOOP_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_PREFIX=/usr/hadoop-2.6.0"
Environment="HADOOP_YARN_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_CONF_DIR=/etc/hadoop"
Environment="HADOOP_LOG_DIR=/var/log/hadoop-hdfs"
Environment="HADOOP_PID_DIR=/var/run/hadoop-hdfs"
Environment="HADOOP_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="HADOOP_IDENT_STRING=hdfs"
Environment="HADOOP_NICENESS=0"

Environment="YARN_CONF_DIR=/etc/hadoop"
Environment="YARN_LOG_DIR=/var/log/hadoop-yarn"
Environment="YARN_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="YARN_PID_DIR=/var/run/hadoop-yarn"
Environment="YARN_IDENT_STRING=yarn"
Environment="YARN_NICENESS=0"

ExecStartPre=+/bin/sh -c 'mkdir -p /var/run/hadoop-yarn;chown yarn:yarn /var/run/hadoop-yarn;chmod 775 /var/run/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /var/log/hadoop-yarn;chown yarn:yarn /var/log/hadoop-yarn;chmod 775 /var/log/hadoop-yarn'
ExecStart=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start resourcemanager
ExecStop=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop resourcemanager
PIDFile=/var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid
Restart=on-success

[Install]
WantedBy=multi-user.target

After editing the unit file, reload systemd:

systemctl daemon-reload

Start the service, check its status, and enable it at boot:

systemctl start yarn-rm.service
systemctl status yarn-rm.service
systemctl enable yarn-rm.service
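`systemctl status` reports what systemd believes. Because the unit uses Type=forking with a PIDFile, it can also be useful to confirm that the PID recorded there belongs to a live process. pidfile_alive is a hypothetical helper; it checks /proc rather than using `kill -0`, so it also works for a non-root user inspecting the yarn user's process.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the PID stored in file $1 refers to a running process.
pidfile_alive() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 1
  read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;     # not a plain number
  esac
  [ -d "/proc/$pid" ]
}

# Usage on hd01 (the NodeManager equivalent is yarn-yarn-nodemanager.pid):
# pidfile_alive /var/run/hadoop-yarn/yarn-yarn-resourcemanager.pid && echo "resourcemanager running"
```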

2.3.5 Open the firewall ports

firewall-cmd --permanent --add-port 8088/tcp --add-port 8030-8033/tcp
firewall-cmd --reload
firewall-cmd --list-all
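`firewall-cmd --list-all` prints the whole zone. To check a single port from a script, the output of `firewall-cmd --list-ports` (entries such as `8088/tcp` or `8030-8033/tcp`) can be searched directly. port_open_in_list is a hypothetical helper; it only looks at explicitly opened ports, not at services such as ssh.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if TCP port $1 is covered by the port list $2,
# e.g. "8088/tcp 8030-8033/tcp" as printed by `firewall-cmd --list-ports`.
port_open_in_list() {
  local port="$1" entry range lo hi
  for entry in $2; do                      # intentional word splitting on spaces
    [ "${entry##*/}" = "tcp" ] || continue # skip udp/sctp entries
    range="${entry%/*}"
    lo="${range%-*}"                       # "8030-8033" -> 8030; "8088" -> 8088
    hi="${range#*-}"                       # "8030-8033" -> 8033; "8088" -> 8088
    if [ "$port" -ge "$lo" ] && [ "$port" -le "$hi" ]; then
      return 0
    fi
  done
  return 1
}

# Usage on hd01:
# for p in 8030 8031 8032 8033 8088; do
#   port_open_in_list "$p" "$(firewall-cmd --list-ports)" || echo "port $p not open"
# done
```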

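2.3.6 Test in a browser

On a Windows client, open:
http://10.168.0.101:8088/
The ResourceManager web UI should be displayed.

Once all nodes are configured, clicking [Nodes] lists the registered NodeManagers.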
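The same check can be done from a shell on hd01 with `yarn node -list`, which lists the NodeManagers registered with the ResourceManager; the REST endpoint http://10.168.0.101:8088/ws/v1/cluster/nodes returns the same information as JSON. The hypothetical count_running_nodes below counts rows whose second column is RUNNING. It assumes the Node-Id / Node-State column order that `yarn node -list` prints in Hadoop 2.x.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `yarn node -list` output on stdin and print how many
# nodes are in the RUNNING state (column 2 = Node-State).
count_running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Usage on hd01 (3 is expected once hd01-hd03 are all up):
# yarn node -list | count_running_nodes
```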

2.4 Configure the NodeManagers (worker nodes)

In hd0[1-3],

2.4.1 Start the daemon

su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start nodemanager'

Expected output:

rsync from hd01:/usr/hadoop-2.6.0
starting nodemanager, logging to /var/log/hadoop-yarn/yarn-yarn-nodemanager-hd01.cmdschool.org.out

Confirm that the process is running:

pgrep -u yarn -a java | grep -i proc_nodemanager

Expected output:

104820 /usr/java/jdk1.8.0_121/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/var/log/hadoop-yarn -Dyarn.log.dir=/var/log/hadoop-yarn -Dhadoop.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.log.file=yarn-yarn-nodemanager-hd01.cmdschool.org.log -Dyarn.home.dir=/usr/hadoop-2.6.0 -Dhadoop.home.dir=/usr/hadoop-2.6.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/usr/hadoop-2.6.0/lib/native -classpath /etc/hadoop:/etc/hadoop:/etc/hadoop:/usr/hadoop-2.6.0/share/hadoop/common/lib/*:/usr/hadoop-2.6.0/share/hadoop/common/*:/usr/hadoop-2.6.0/share/hadoop/hdfs:/usr/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/usr/hadoop-2.6.0/share/hadoop/hdfs/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/usr/hadoop-2.6.0/share/hadoop/mapreduce/*:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/usr/hadoop-2.6.0/share/hadoop/yarn/*:/usr/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop/nm-config/log4j.properties org.apache.hadoop.yarn.server.nodemanager.NodeManager

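Alternatively, jps gives a more concise view:

jps

Expected output (on hd01, which also hosts the HDFS and ResourceManager daemons):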

104208 ResourceManager
1505 DataNode
1506 NameNode
104820 NodeManager
104950 Jps

Confirm that the daemon is listening on its ports:

ss -antp | grep -f <(pgrep -u yarn -a java | grep -i proc_nodemanager | awk '{print $1}')

Expected output:

LISTEN 0      128                         *:42891                      *:*     users:(("java",pid=104820,fd=197))                                     
LISTEN 0      128                         *:8040                       *:*     users:(("java",pid=104820,fd=208))                                     
LISTEN 0      128                         *:8042                       *:*     users:(("java",pid=104820,fd=218))                                     
ESTAB  0      0      [::ffff:10.168.0.101]:34842 [::ffff:10.168.0.101]:8031  users:(("java",pid=104820,fd=222))  

Once the test succeeds, stop the daemon:

su - yarn -c '/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop nodemanager'

2.4.2 Create the systemd service unit

vim /usr/lib/systemd/system/yarn-nm.service

Add the following:

[Unit]
Description=Apache YARN nodemanager
Wants=network-online.target
After=network-online.target
Documentation=https://hadoop.apache.org/docs/

[Service]
Type=forking
User=yarn
Group=yarn

Environment="JAVA_HOME=/usr/java/jdk1.8.0_121"
Environment="HADOOP_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_PREFIX=/usr/hadoop-2.6.0"
Environment="HADOOP_YARN_HOME=/usr/hadoop-2.6.0"
Environment="HADOOP_CONF_DIR=/etc/hadoop"
Environment="HADOOP_LOG_DIR=/var/log/hadoop-hdfs"
Environment="HADOOP_PID_DIR=/var/run/hadoop-hdfs"
Environment="HADOOP_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="HADOOP_IDENT_STRING=hdfs"
Environment="HADOOP_NICENESS=0"

Environment="YARN_CONF_DIR=/etc/hadoop"
Environment="YARN_LOG_DIR=/var/log/hadoop-yarn"
Environment="YARN_MASTER=hd01:/usr/hadoop-2.6.0"
Environment="YARN_PID_DIR=/var/run/hadoop-yarn"
Environment="YARN_IDENT_STRING=yarn"
Environment="YARN_NICENESS=0"

ExecStartPre=+/bin/sh -c 'mkdir -p /var/run/hadoop-yarn;chown yarn:yarn /var/run/hadoop-yarn;chmod 775 /var/run/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /var/log/hadoop-yarn;chown yarn:yarn /var/log/hadoop-yarn;chmod 775 /var/log/hadoop-yarn'
ExecStartPre=+/bin/sh -c 'mkdir -p /data/yarn/nm;chown yarn:yarn /data/yarn/nm;chmod 775 /data/yarn/nm'
ExecStartPre=+/bin/sh -c 'mkdir -p /data/yarn/container-logs;chown yarn:yarn /data/yarn/container-logs;chmod 775 /data/yarn/container-logs'
ExecStart=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh start nodemanager
ExecStop=/usr/hadoop-2.6.0/sbin/yarn-daemon.sh stop nodemanager
PIDFile=/var/run/hadoop-yarn/yarn-yarn-nodemanager.pid
Restart=on-success

[Install]
WantedBy=multi-user.target

After editing the unit file, reload systemd:

systemctl daemon-reload

Start the service, check its status, and enable it at boot:

systemctl start yarn-nm.service
systemctl status yarn-nm.service
systemctl enable yarn-nm.service

2.4.3 Open the firewall ports

firewall-cmd --permanent --add-port 8040/tcp --add-port 8042/tcp --add-port 13562/tcp --add-port 35000-50000/tcp
firewall-cmd --reload
firewall-cmd --list-all
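The 35000-50000 range is there for ports that are not fixed, such as the NodeManager's container-manager port (42891 in the ss output above): by default yarn.nodemanager.address uses port 0, so the kernel picks a port from its ephemeral range. The kernel default for that range is typically 32768-60999, which is wider than 35000-50000. The hypothetical ephemeral_range_covered below compares the two; if they differ, either widen the firewall rule or pin yarn.nodemanager.address to a fixed port in yarn-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical check: is the kernel's ephemeral port range inside the firewall range $1-$2?
# $3 defaults to the live sysctl file; it can point at a copy for testing.
ephemeral_range_covered() {
  local fw_lo="$1" fw_hi="$2" file="${3:-/proc/sys/net/ipv4/ip_local_port_range}" lo hi
  read -r lo hi < "$file" || return 2
  if [ "$lo" -ge "$fw_lo" ] && [ "$hi" -le "$fw_hi" ]; then
    echo "covered: $lo-$hi"
  else
    echo "not covered: kernel range $lo-$hi, firewall $fw_lo-$fw_hi"
    return 1
  fi
}

# Usage on each NodeManager node:
# ephemeral_range_covered 35000 50000
```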

2.4.4 Test in a browser

On a Windows client, open:
http://10.168.0.101:8042/
The NodeManager web UI should be displayed.
