How to Deploy Hadoop Cloudera 5.10.1 (CDH)?

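1 Introduction

Cloudera repackages and enhances the native Apache Hadoop components, and simplifies their deployment.

2 Background

2.1 Software components to deploy

1) Oracle JDK
2) Cloudera Manager Server and Agent packages
3) Supporting database software
4) CDH and managed service software

2.2 Installation methods and deployment steps

2.2.1 Installation methods

A) Cloudera Manager installer (easy)
B) Installation from yum repositories (moderate)
C) Installation from source (hard)
Note: this tutorial uses method B.

2.2.2 Deployment steps

1) Install the JDK
2) Install and configure the database
3) Install the Cloudera Manager Server
4) Install the Cloudera Manager Agent
5) Install CDH and the managed service software
6) Create, configure, and start the CDH cluster and its managed services

2.3 Files installed by the Cloudera Manager Server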

rpm -ql cloudera-manager-server

Output:

/etc/cloudera-scm-server
/etc/cloudera-scm-server/db.properties
/etc/cloudera-scm-server/log4j.properties
/etc/default/cloudera-scm-server
/etc/rc.d/init.d/cloudera-scm-server
/opt/cloudera/csd
/opt/cloudera/parcel-repo
/usr/sbin/cmf-server
/var/log/cloudera-scm-server
/var/run/cloudera-scm-server

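The files and directories serve the following purposes:
1) Lines 2-4 of the output (under /etc/) are the Cloudera Manager Server configuration files
2) /opt/cloudera/parcel-repo stores the downloaded parcels (installation packages)

3 Hands-on deployment

3.1 Basic information

Hostname=HD0[1-5].cmdschool.org
Ipaddress=10.168.0.2[4-8]
OS Version=CentOS 7.3

Recommended disk space: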
/ > 20G
/tmp > 20G
/opt > 20G
/var > 100G
/data > 500G

3.2 Environment configuration

3.2.1 Configure the IP address

In HD01-05:

nmcli connection delete "Wired connection 1"
nmcli connection show
nmcli device show
nmcli connection add ifname ens192 con-name ens192 type ethernet
nmcli connection modify ens192 ipv4.address "10.168.0.XX/24"
nmcli connection modify ens192 ipv4.gateway "10.168.0.1"
nmcli connection modify ens192 ipv4.dns "202.96.128.86 202.96.128.166"
nmcli connection modify ens192 ipv4.method manual
nmcli connection modify ens192 ipv6.method ignore
nmcli connection modify ens192 connection.autoconnect yes
nmcli connection up ens192

Note: replace "XX" in the fifth command with the host number of each machine (24-28).

3.2.2 Configure the hostname

In HD01-05:

hostnamectl set-hostname HDXX.cmdschool.org

Note: replace "XX" with the number of each host (01-05).

3.2.3 Disable SELinux

In HD01-05:

getenforce

If the output is:

Enforcing

then run:

setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
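
The sed command above only rewrites SELINUX=enforcing, so a host currently set to permissive would be left unchanged. Below is a minimal sketch of a filter that handles both values. The function name selinux_disabled_config is hypothetical, and it writes to stdout so the result can be reviewed before /etc/selinux/config is replaced.

```shell
# Hypothetical helper: read an SELinux config on stdin and print it with
# SELINUX set to disabled, whether it was enforcing or permissive.
# SELINUXTYPE= lines are untouched because the pattern requires "SELINUX=".
selinux_disabled_config() {
    sed -E 's/^SELINUX=(enforcing|permissive)[[:space:]]*$/SELINUX=disabled/'
}

# Demonstration on sample text (not the real file)
printf 'SELINUX=permissive\nSELINUXTYPE=targeted\n' | selinux_disabled_config
```

On a real host, the filter could be fed from /etc/selinux/config and its output compared with the original before the file is overwritten.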

3.2.4 Configure name resolution

In HD01-05:

echo '10.168.0.24 hd01.cmdschool.org' >> /etc/hosts
echo '10.168.0.25 hd02.cmdschool.org' >> /etc/hosts
echo '10.168.0.26 hd03.cmdschool.org' >> /etc/hosts
echo '10.168.0.27 hd04.cmdschool.org' >> /etc/hosts
echo '10.168.0.28 hd05.cmdschool.org' >> /etc/hosts
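
The five entries follow a regular pattern (host N maps to 10.168.0.(23+N)), so they can also be generated in a loop. This is only a sketch; the function name gen_hosts is hypothetical, and it prints to stdout so the lines can be checked before they are appended to /etc/hosts.

```shell
# Hypothetical helper: print the five host entries used in this tutorial.
gen_hosts() {
    local i
    for i in 1 2 3 4 5; do
        # hd01 -> 10.168.0.24 ... hd05 -> 10.168.0.28
        printf '10.168.0.%d hd%02d.cmdschool.org\n' "$((23 + i))" "$i"
    done
}

gen_hosts                   # review the output first
# gen_hosts >> /etc/hosts   # then append it as root
```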

3.2.5 Configure the time zone

In HD01-05:

timedatectl set-timezone Asia/Shanghai

3.2.6 Stop the firewall and disable it at boot

In HD01-05:

systemctl stop firewalld
systemctl disable firewalld

3.2.7 Tune swappiness

In HD01-05:
1) Check the current swappiness

cat /proc/sys/vm/swappiness

Output:

 30

2) Lower swappiness temporarily

sysctl vm.swappiness=1

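3) Lower swappiness permanently

echo 'vm.swappiness = 1' > /etc/sysctl.d/swappiness.conf

Then run the following command to apply it (without a file argument, sysctl -p only reads /etc/sysctl.conf):

sysctl -p /etc/sysctl.d/swappiness.conf

Note: some versions have a bug, so do not set the value to "0".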

3.2.8 Disable transparent huge pages

In HD01-05:
1) Check the transparent huge page settings

cat /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled

If the output is:

[always] madvise never

2) Disable transparent huge pages temporarily

echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Confirm that the setting took effect:

cat /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled

The output should be:

always madvise [never]
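
The active value is the word inside the square brackets. For scripted checks across several hosts, it can be extracted as sketched below; the function name thp_active is hypothetical.

```shell
# Hypothetical helper: print the active value from a THP status line,
# i.e. the word enclosed in square brackets.
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_active 'always madvise [never]'
# On a real host:
# thp_active "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```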

3) Apply the setting automatically at boot

echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' >> /etc/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
chmod +x /etc/rc.d/rc.local

3.2.9 Install the JDK

In HD01-05:
1) Check for Java packages installed via yum

rpm -qa | grep java

Note: Java RPM packages installed via yum may prevent Sqoop from starting.
2) Install JDK 1.8

mkdir /usr/java
cd /usr/java
wget http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/jdk-8u121-linux-x64.tar.gz
tar -xf jdk-8u121-linux-x64.tar.gz

3) Configure the JDK environment variables

echo 'export JAVA_HOME=/usr/java/jdk1.8.0_121' > /etc/profile.d/jdk.sh
echo 'export JRE_HOME=${JAVA_HOME}/jre' >> /etc/profile.d/jdk.sh
echo 'export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib' >> /etc/profile.d/jdk.sh
echo 'export PATH=${JAVA_HOME}/bin:$PATH' >> /etc/profile.d/jdk.sh

4) Load the Java environment variables

source /etc/profile

5) Test the JDK configuration

java -version
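
If a different JDK build is unpacked, only the JAVA_HOME line changes. The sketch below generates the same profile script for an arbitrary JDK directory; the function name gen_jdk_profile is hypothetical and the default path is the one used in this tutorial.

```shell
# Hypothetical helper: print the contents of /etc/profile.d/jdk.sh for a
# given JDK directory (default: the path used in this tutorial).
gen_jdk_profile() {
    printf 'export JAVA_HOME=%s\n' "${1:-/usr/java/jdk1.8.0_121}"
    # Quoted heredoc delimiter: the ${...} references stay literal, so they
    # are expanded at login time, not when the file is generated.
    cat <<'EOF'
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
}

gen_jdk_profile
# As root: gen_jdk_profile > /etc/profile.d/jdk.sh
```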

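3.3 Install and configure the yum repositories

In HD01-05:

3.3.1 Configure the yum repositories

1) Add the MySQL repository
In HD01:

yum install -y https://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm

Note: this assumes the system's default yum repositories are configured and reachable online.
2) Enable the 5.6 repository
In HD01:

vim /etc/yum.repos.d/mysql-community.repo

Enable the 5.6 repository by editing the file as follows: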

[mysql56-community]
name=MySQL 5.6 Community Server
baseurl=http://repo.mysql.com/yum/mysql-5.6-community/el/7/$basearch/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql

[mysql57-community]
name=MySQL 5.7 Community Server
baseurl=http://repo.mysql.com/yum/mysql-5.7-community/el/7/$basearch/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql

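Note: enabled=1 turns a repository on and enabled=0 turns it off (set all other repositories to 0).
3) Add the Cloudera repository
In HD01-05:

curl https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo > /etc/yum.repos.d/cloudera-manager.repo

Note: this assumes the system's default yum repositories are reachable online, and the downloaded repo file points to the latest version. To pin a specific version, edit the file as follows:

vim /etc/yum.repos.d/cloudera-manager.repo

The following configures the 5.10.1 repository: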

[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name=Cloudera Manager
baseurl=https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.10.1/
gpgkey =https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck = 1
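
To avoid hand-editing the file on all five hosts, the pinned repo definition can be generated from the version number. This is a sketch only: the function name gen_cm_repo is hypothetical, and the URLs are the ones shown above with the version directory substituted.

```shell
# Hypothetical helper: print a cloudera-manager.repo pinned to one version.
gen_cm_repo() {
    # $1: Cloudera Manager version directory, e.g. 5.10.1
    cat <<EOF
[cloudera-manager]
name=Cloudera Manager
baseurl=https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/$1/
gpgkey=https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck=1
EOF
}

gen_cm_repo 5.10.1
# As root: gen_cm_repo 5.10.1 > /etc/yum.repos.d/cloudera-manager.repo
```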

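3.3.2 Install basic tools

In HD01-05:
1) Install configuration tools

yum install -y vim wget openssh-clients

2) Install Python

yum install -y python

3) Install the time synchronization service (chrony)

yum install -y chrony

3.3.3 Install packages on the Cloudera Manager server

In HD01:
1) Install the Cloudera Manager packages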

yum install -y cloudera-manager-daemons cloudera-manager-server

2) Install MySQL

yum install -y mysql-community-server mysql-community-devel mysql-community-client mysql-community-libs mysql-community-common mysql-community-libs-compat

3.3.4 Install packages on the Cloudera Manager Agent hosts

In HD01-05:
Install the Cloudera Manager Agent packages:

yum install -y cloudera-manager-agent cloudera-manager-daemons

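3.4 Pre-configuration preparation

3.4.1 Check permissions (single-user mode only; optional)

In HD01-05:
Check that the cloudera-scm user has full permissions on the following directories.
Check the permissions on the directory itself:

ls -ld /opt/cloudera/

Output:

drwxr-xr-x. 4 cloudera-scm cloudera-scm 36 Apr 15 19:35 /opt/cloudera/

Check the permissions on the subdirectories:

ls -lR /opt/cloudera/

Output: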

/opt/cloudera/:
total 0
drwxr-xr-x. 2 cloudera-scm cloudera-scm 6 Mar 19 23:26 csd
drwxr-xr-x. 2 cloudera-scm cloudera-scm 6 Mar 19 23:26 parcel-repo

/opt/cloudera/csd:
total 0

/opt/cloudera/parcel-repo:
total 0

Likewise, check the permissions on the server and agent directories:

ls -ld /var/log/cloudera-scm-server/
ls -lR /var/log/cloudera-scm-server/
ls -ld /var/lib/cloudera-scm-agent/
ls -lR /var/lib/cloudera-scm-agent/

3.4.2 Check the resource limits configuration

In HD01-05:

vim /etc/security/limits.d/cloudera-scm.conf

Set the configuration as follows:

#
# (c) Copyright 2014 Cloudera, Inc.
#
cloudera-scm    soft  nofile  32768
cloudera-scm    soft  nproc   65536
cloudera-scm    hard  nofile  1048576
cloudera-scm    hard  nproc   unlimited
cloudera-scm    hard  memlock unlimited
cloudera-scm    soft  memlock unlimited

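3.4.3 Specify the user for single-user mode (single-user mode only; not configured in this tutorial)

In HD01-05:

vim /etc/default/cloudera-scm-agent

and uncomment the following line:

USER="cloudera-scm"

3.4.4 Create the parcel directory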

In HD01-05:

mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels

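3.4.5 Configure passwordless sudo (for non-default single-user mode; optional)

In HD01-05:

visudo

Add the following group entry:

%cloudera-scm ALL=(ALL) NOPASSWD: ALL

Confirm that the file contains the following line:

Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin

3.4.6 Apply limits to su sessions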

In HD01-05:

vim /etc/pam.d/su

Add the following line:

session         required        pam_limits.so

3.4.7 Configure NTP

In HD01-05:
1) Check the current configuration:

cat /etc/chrony.conf

It should contain the following lines:

server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst

2) Start the chronyd service and enable it at boot

systemctl restart chronyd
systemctl enable chronyd

3) Check the time sources to confirm synchronization

chronyc sources

3.4.8 Install the MySQL JDBC driver

In HD01-05:

cd ~
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.39.tar.gz
tar zxvf mysql-connector-java-5.1.39.tar.gz
mkdir /usr/share/java/
cp mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar /usr/share/java/mysql-connector-java.jar

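3.4.9 Configure public key authentication

In HD01:
1) Generate the key pair

ssh-keygen -t rsa

Note: press Enter at every prompt.
2) Copy the public key to every server that will be logged in to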

ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.0.24
ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.0.25
ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.0.26
ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.0.27
ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.0.28

3) Test passwordless login

ssh 10.168.0.24
ssh 10.168.0.25
ssh 10.168.0.26
ssh 10.168.0.27
ssh 10.168.0.28

Note: if each login succeeds without a password prompt, the setup works.
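
The per-host commands above differ only in the last octet, so they can be driven from a loop. The sketch below adds a dry-run wrapper that prints each command by default; the names run, copy_keys, and the DRY_RUN variable are hypothetical conventions, not part of ssh or Cloudera.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Host numbers 24-28 are the addresses listed in section 3.1.
copy_keys() {
    local n
    for n in 24 25 26 27 28; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@10.168.0.$n"
    done
}

copy_keys                # prints the five commands without running them
# DRY_RUN=0 copy_keys    # actually runs them
```

Printing first makes it easy to confirm the target list before any password prompts appear.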

3.5 Install and configure Cloudera Manager

In HD01:

3.5.1 Modify the MySQL parameters

1) Stop the database

systemctl stop mysqld

2) Back up the ib_logfile files

mkdir /var/lib/backup
cd /var/lib/mysql/
mv ib_logfile* /var/lib/backup/

3) Modify the MySQL configuration

cp /etc/my.cnf /etc/my.cnf.default
vim /etc/my.cnf

Set the parameters as follows:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql

# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
server-id=1

# For MySQL version 5.1.8 or later. Comment out binlog_format for older versions.
binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES

3.5.2 Start MySQL and enable it at boot

systemctl start mysqld
systemctl enable mysqld

3.5.3 Initialize the database

mysql_secure_installation

The wizard proceeds as follows:

[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] y
[...]
Disallow root login remotely? [Y/n] n
[...]
Remove test database and access to it [Y/n] y
[...]
Reload privilege tables now? [Y/n] y
All done!

3.5.4 Prepare the scm database

Configure the database:

mysql -uroot -p
create database scm default character set utf8;
grant all privileges on *.* to scm@'hd01.cmdschool.org' identified by 'scm';
grant all privileges on *.* to scm@'hd01' identified by 'scm';
flush privileges;

Test the database connection:

mysql -uscm -pscm -hhd01.cmdschool.org

Update the database connection parameters:

cp /etc/cloudera-scm-server/db.properties /etc/cloudera-scm-server/db.properties.default
vim /etc/cloudera-scm-server/db.properties

Set the following parameters:

com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=hd01.cmdschool.org
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
com.cloudera.cmf.db.password=scm
com.cloudera.cmf.db.setupType=EXTERNAL
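
After editing, it can help to read individual values back from the file to confirm that no line is still commented out. A minimal sketch follows; the function name get_prop is hypothetical, and the demonstration uses a scratch file rather than the real db.properties.

```shell
# Hypothetical helper: print the value of one key from a properties file.
# The key must match the text before the first "=" exactly, so commented
# lines such as "#com.cloudera.cmf.db.host=..." are ignored.
get_prop() {
    # $1: file, $2: key
    awk -F= -v k="$2" '$1 == k { print substr($0, length(k) + 2) }' "$1"
}

# Demonstration on a scratch file
tmp=$(mktemp)
printf '#com.cloudera.cmf.db.host=localhost\ncom.cloudera.cmf.db.host=hd01.cmdschool.org\n' > "$tmp"
get_prop "$tmp" com.cloudera.cmf.db.host
rm -f "$tmp"
# On the server: get_prop /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.host
```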

3.5.5 Create additional databases (optional)

1) List of additional databases

Role                                Database   User    Password
Activity Monitor                    amon       amon    amon_password
Reports Manager                     rman       rman    rman_password
Hive Metastore Server               metastore  hive    hive_password
Sentry Server                       sentry     sentry  sentry_password
Cloudera Navigator Audit Server     nav        nav     nav_password
Cloudera Navigator Metadata Server  navms      navms   navms_password

2) Create the databases and set up their accounts and passwords

mysql -uroot -p
create database amon default character set utf8;
grant all privileges on amon.* to 'amon'@'%' identified by 'amon_password';

create database rman default character set utf8;
grant all privileges on rman.* to 'rman'@'%' identified by 'rman_password';

create database metastore default character set utf8;
grant all privileges on metastore.* to 'hive'@'%' identified by 'hive_password';

create database sentry default character set utf8;
grant all privileges on sentry.* to 'sentry'@'%' identified by 'sentry_password';

create database nav default character set utf8;
grant all privileges on nav.* to 'nav'@'%' identified by 'nav_password';

create database navms default character set utf8;
grant all privileges on navms.* to 'navms'@'%' identified by 'navms_password';

flush privileges;
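
The statements above repeat one pattern per database, so they can be generated from the database, user, and password columns of the table. This is a sketch under that assumption; the function name gen_db_sql is hypothetical and the placeholder passwords should be replaced before real use.

```shell
# Hypothetical helper: print the create/grant statements for each
# "database user password" triple listed in the heredoc.
gen_db_sql() {
    local db user pass
    while read -r db user pass; do
        printf 'create database %s default character set utf8;\n' "$db"
        printf "grant all privileges on %s.* to '%s'@'%%' identified by '%s';\n" \
            "$db" "$user" "$pass"
    done <<'EOF'
amon amon amon_password
rman rman rman_password
metastore hive hive_password
sentry sentry sentry_password
nav nav nav_password
navms navms navms_password
EOF
    printf 'flush privileges;\n'
}

gen_db_sql                       # review the SQL first
# gen_db_sql | mysql -uroot -p   # then feed it to MySQL
```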

3.5.6 Configure the Oozie database (optional)

1) Configure the database privileges

mysql -uroot -p
create database oozie default character set utf8;
grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
flush privileges;

2) Create the symbolic link to the JDBC driver that Oozie needs

cd /opt/cloudera/parcels/CDH/lib/oozie/lib/
ln -s /usr/share/java/mysql-connector-java.jar mysql-connector-java.jar

3.5.7 Start the service and enable it at boot

/etc/init.d/cloudera-scm-server start
chkconfig cloudera-scm-server on

3.5.8 Troubleshooting

tail -f /var/log/cloudera-scm-server/cloudera-scm-server.out

3.6 Install the Cloudera Manager Agent

In HD01-05:

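3.6.1 Create the parcel directory

mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels

3.6.2 Specify the management server and the parcel directory

vim /etc/cloudera-scm-agent/config.ini

Make sure the following parameters are set and uncommented: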

server_host=hd01.cmdschool.org
server_port=7182
parcel_dir=/opt/cloudera/parcels
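
Because the same edit is needed on all five hosts, the server_host line can also be rewritten non-interactively. The sketch below is a stdin-to-stdout filter; the function name set_server_host is hypothetical, and it assumes the line is already present and uncommented and that the hostname contains no "/" or "&" characters.

```shell
# Hypothetical helper: rewrite the server_host line of an agent config.
set_server_host() {
    # $1: Cloudera Manager server hostname; config text arrives on stdin
    sed "s/^server_host=.*/server_host=$1/"
}

# Demonstration on sample text (not the real file)
printf 'server_host=localhost\nserver_port=7182\n' | set_server_host hd01.cmdschool.org
```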

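3.6.3 Specify the user for single-user mode (single-user mode only; not configured in this tutorial)

vim /etc/default/cloudera-scm-agent

Uncomment the following line:

USER="cloudera-scm"

3.6.4 Start the service and enable it at boot

/etc/init.d/cloudera-scm-agent start
chkconfig cloudera-scm-agent on

3.6.5 Troubleshooting

Use the following command to monitor errors logged while the service starts:

tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.out

3.7 Log in and configure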

In HD01:
http://10.168.0.24:7180/cmf/login

Note: for the web interface, follow the wizard; it is relatively straightforward, so it is not covered in detail here.

References
============================
Overview:
https://www.cloudera.com/documentation/enterprise/latest/topics/installation_installation.html

Managed Service Database:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_installing_configuring_dbs.html

Operating system downloads:
https://www.centos.org/download/mirrors/

CDH repository:
https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/

Java downloads:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Supported platforms reference:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html
