corosync + pacemaker + postgres_streaming_replication
Overview:
This document describes how to implement automatic failover for PostgreSQL streaming replication using corosync + pacemaker. It covers a summary of corosync/pacemaker concepts, the complete environment setup, and the handling of problems encountered along the way.
1. Introduction
Corosync
Corosync is a project that split off from the OpenAIS project: the part that implements HA message transport became Corosync, and roughly 60% of Corosync's code comes from OpenAIS.
Like Heartbeat, Corosync belongs to the cluster messaging layer: it transports cluster membership information and heartbeats between nodes. A pure messaging layer has no ability to manage resources itself; it relies on a CRM running on top of it for that. The best-known resource manager today is Pacemaker, which makes Corosync + Pacemaker the most popular combination for high-availability clusters.
Pacemaker
Pacemaker, the Cluster Resource Manager (CRM), manages the whole HA stack; clients manage and monitor the entire cluster through Pacemaker.
Commonly used cluster management tools:
(1) Command-line:
crm shell/pcs
(2) Graphical:
pygui/hawk/lcmc/pcs
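For example, the crm shell can inspect and modify the cluster interactively (a small illustration; subcommand spelling varies slightly across crmsh versions):
- # crm status                // show node and resource status
- # crm configure show        // dump the current cluster configuration
- # crm ra list ocf           // list the available OCF resource agents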
Pacemaker's internal components and module relationships: (diagram not reproduced in this text version)
2. Environment
2.1 OS
- # cat /etc/issue
- CentOS release 6.4 (Final)
- Kernel \r on an \m
- # uname -a
- Linux node1 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
2.2 IP
node1:
eth0 192.168.100.201/24 GW 192.168.100.1 ---public address
eth1 10.10.10.1/24 ---heartbeat address
eth2 192.168.1.1/24 ---streaming replication address
node2:
eth0 192.168.100.202/24 GW 192.168.100.1 ---public address
eth1 10.10.10.2/24 ---heartbeat address
eth2 192.168.1.2/24 ---streaming replication address
Virtual addresses:
eth0:0 192.168.100.213/24 ---vip-master
eth0:0 192.168.100.214/24 ---vip-slave
eth2:0 192.168.1.3/24 ---vip-rep
2.3 Software versions
- # rpm -qa | grep corosync
- corosync-1.4.5-2.3.x86_64
- corosynclib-1.4.5-2.3.x86_64
- # rpm -qa | grep pacemaker
- pacemaker-libs-1.1.10-14.el6_5.2.x86_64
- pacemaker-cli-1.1.10-14.el6_5.2.x86_64
- pacemaker-1.1.10-14.el6_5.2.x86_64
- pacemaker-cluster-libs-1.1.10-14.el6_5.2.x86_64
- # rpm -qa | grep crmsh
- crmsh-1.2.6-6.1.x86_64
PostgreSQL version: 9.1.4
3. Installation
3.1 Configure the YUM repository
- # cat /etc/yum.repos.d/ha-clustering.repo
- [haclustering]
- name=HA Clustering
- baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
- enabled=1
- gpgcheck=0
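To confirm the repository is picked up before installing (a quick sanity check):
- # yum repolist enabled | grep -i haclustering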
3.2 Install pacemaker/corosync/crmsh
- # yum install pacemaker corosync crmsh
After installation, the corresponding OCF resource scripts appear under /usr/lib/ocf/resource.d:
- # cd /usr/lib/ocf/resource.d/
- [root@node1 resource.d]# ls
- heartbeat  pacemaker  redhat
List the resource scripts with the following command:
- [root@node1 resource.d]# crm ra list ocf
- ASEHAagent.sh AoEtarget AudibleAlarm CTDB ClusterMon Delay Dummy
- EvmsSCC Evmsd Filesystem HealthCPU HealthSMART ICP IPaddr
- IPaddr2 IPsrcaddr IPv6addr LVM LinuxSCSI MailTo ManageRAID
- ManageVE Pure-FTPd Raid1 Route SAPDatabase SAPInstance SendArp
- ServeRAID SphinxSearchDaemon Squid Stateful SysInfo SystemHealth VIPArip
- VirtualDomain WAS WAS6 WinPopup Xen Xinetd anything
- apache apache.sh asterisk clusterfs.sh conntrackd controld db2
- dhcpd drbd drbd.sh eDir88 ethmonitor exportfs fio
- fs.sh iSCSILogicalUnit iSCSITarget ids ip.sh iscsi jboss
- ldirectord lvm.sh lvm_by_lv.sh lvm_by_vg.sh lxc mysql mysql-proxy
- mysql.sh named named.sh netfs.sh nfsclient.sh nfsexport.sh nfsserver
- nfsserver.sh nginx ocf-shellfuncs openldap.sh oracle oracledb.sh orainstance.sh
- oralistener.sh oralsnr pgsql ping pingd portblock postfix
- postgres-8.sh pound proftpd remote rsyncd rsyslog samba.sh
- script.sh scsi2reservation service.sh sfex slapd smb.sh svclib_nfslock
- symlink syslog-ng tomcat tomcat-5.sh tomcat-6.sh varnish vm.sh
- vmware zabbixserver
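To inspect the parameters, operations, and advised timeouts of an individual agent, for example the pgsql agent used later, query its metadata:
- [root@node1 resource.d]# crm ra info ocf:heartbeat:pgsql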
Start corosync:
- [root@node1 ~]# service corosync start
- Starting Corosync Cluster Engine (corosync): [OK]
- [root@node2 ~]# service corosync start
- Starting Corosync Cluster Engine (corosync): [OK]
- [root@node2 ~]# crm status
- Last updated: Sat Jan 18 07:00:34 2014
- Last change: Sat Jan 18 06:58:11 2014 via crmd on node1
- Stack: classic openais (with plugin)
- Current DC: node1 - partition with quorum
- Version: 1.1.10-14.el6_5.2-368c726
- 2 Nodes configured, 2 expected votes
- 0 Resources configured
- Online: [ node1 node2 ]
If errors like the following appear, STONITH can be disabled for now; they occur because no STONITH resources have been configured:
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[4921]: 2014/01/10_07:34:34 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Disable STONITH (this only needs to be run on one node):
- [root@node1 ~]# crm configure property stonith-enabled=false
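To confirm the errors are gone, re-validate the live configuration (crm_verify ships with pacemaker):
- [root@node1 ~]# crm_verify -L -V    // should now report no errors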
3.3 Install PostgreSQL
The installation directory is /opt/pgsql.
{installation steps omitted; a typical source build is sketched below}
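For reference, a source build into that prefix might look like this (a sketch only; the configure flags and directory ownership are assumptions, adjust to your environment):
- # tar xjf postgresql-9.1.4.tar.bz2 && cd postgresql-9.1.4
- # ./configure --prefix=/opt/pgsql
- # make && make install
- # mkdir /opt/pgsql/data && chown -R postgres:postgres /opt/pgsql
- # su - postgres -c '/opt/pgsql/bin/initdb -D /opt/pgsql/data'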
Configure environment variables for the postgres user:
- [postgres@node1 ~]$ cat .bash_profile
- # .bash_profile
- # Get the aliases and functions
- if [ -f ~/.bashrc ]; then
-         . ~/.bashrc
- fi
- # User specific environment and startup programs
- export PATH=/opt/pgsql/bin:$PATH:$HOME/bin
- export PGDATA=/opt/pgsql/data
- export PGUSER=postgres
- export PGPORT=5432
- export LD_LIBRARY_PATH=/opt/pgsql/lib:$LD_LIBRARY_PATH
4. Configuration
4.1 hosts configuration
- # vim /etc/hosts
- 192.168.100.201  node1
- 192.168.100.202  node2
4.2 Configure corosync
- [root@node1 ~]# cd /etc/corosync/
- [root@node1 corosync]# ls
- corosync.conf.example  corosync.conf.example.udpu  service.d  uidgid.d
- [root@node1 corosync]# cp corosync.conf.example corosync.conf
- [root@node1 corosync]# vim corosync.conf
- compatibility: whitetank              // compatibility with older versions
- totem {                               // inter-node communication protocol
-     version: 2
-     secauth: on                       // enable secure authentication
-     threads: 0
-     interface {                       // heartbeat interface
-         ringnumber: 0
-         bindnetaddr: 10.10.10.0       // network to bind to
-         mcastaddr: 226.94.1.1         // outbound multicast address
-         mcastport: 5405               // multicast port
-         ttl: 1
-     }
- }
- logging {                             // logging settings
-     fileline: off
-     to_stderr: no                     // send errors to stderr?
-     to_logfile: yes                   // write to a logfile?
-     to_syslog: yes                    // write to syslog?
-     logfile: /var/log/cluster/corosync.log   // note: /var/log/cluster must exist
-     debug: off
-     timestamp: on                     // timestamp log entries?
-     logger_subsys {
-         subsys: AMF
-         debug: off
-     }
- }
- amf {
-     mode: disabled
- }
- service {
-     ver: 0
-     name: pacemaker                   // start pacemaker from corosync
- }
- aisexec {
-     user: root
-     group: root
- }
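As the comment on the logfile line notes, the log directory must exist before corosync starts, so create it on both nodes:
- # mkdir -p /var/log/cluster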
4.3 Generate the authentication key
{By default the key is generated from /dev/random; if the system runs short of entropy this can take a very long time, in which case /dev/urandom can be substituted for /dev/random}
- [root@node1 corosync]# mv /dev/random /dev/random.bak
- [root@node1 corosync]# ln -s /dev/urandom /dev/random
- [root@node1 corosync]# corosync-keygen
- Corosync Cluster Engine Authentication key generator.
- Gathering 1024 bits for key from /dev/random.
- Press keys on your keyboard to generate entropy.
- Writing corosync key to /etc/corosync/authkey.
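Once the key has been written, the original random device can be restored, since the symlink was only a workaround for slow entropy gathering:
- [root@node1 corosync]# rm /dev/random
- [root@node1 corosync]# mv /dev/random.bak /dev/random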
4.4 Configure SSH mutual trust
node1 -> node2 :
- [root@node1 ~]# cd .ssh/
- [root@node1 .ssh]# ssh-keygen -t rsa
- Generating public/private rsa key pair.
- Enter file in which to save the key (/root/.ssh/id_rsa):
- Enter passphrase (empty for no passphrase):
- Enter same passphrase again:
- Your identification has been saved in /root/.ssh/id_rsa.
- Your public key has been saved in /root/.ssh/id_rsa.pub.
- The key fingerprint is:
- 2c:ed:1e:a6:a7:cd:e3:b2:7c:de:aa:ff:63:28:9a:19 root@node1
- The key's randomart image is:
- (randomart image omitted)
- [root@node1 .ssh]# ssh-copy-id -i id_rsa.pub node2
- The authenticity of host 'node2 (192.168.100.202)' can't be established.
- RSA key fingerprint is be:76:cd:29:af:59:76:11:6a:c7:7d:72:27:df:d1:02.
- Are you sure you want to continue connecting (yes/no)? yes
- Warning: Permanently added 'node2,192.168.100.202' (RSA) to the list of known hosts.
- root@node2's password:
- Now try logging into the machine, with "ssh 'node2'", and check in:
-   .ssh/authorized_keys
- to make sure we haven't added extra keys that you weren't expecting.
- [root@node1 .ssh]# ssh node2 date
- Sat Jan 18 06:36:21 CST 2014
node2 -> node1 :
- [root@node2 ~]# cd .ssh/
- [root@node2 .ssh]# ssh-keygen -t rsa
- [root@node2 .ssh]# ssh-copy-id -i id_rsa.pub node1
- [root@node2 .ssh]# ssh node1 date
- Sat Jan 18 06:37:31 CST 2014
4.5 Synchronize the configuration to node2
- [root@node1 corosync]# scp authkey corosync.conf node2:/etc/corosync/
- authkey                 100%  128   0.1KB/s   00:00
- corosync.conf           100% 2808   2.7KB/s   00:00
4.6 Download the replacement resource script
Although installing the packages above provides a pgsql resource script, it is outdated and cannot perform automatic failover. After installing pacemaker/corosync, download the current script from the upstream repository and replace the bundled one:
https://github.com/ClusterLabs/resource-agents/tree/master/heartbeat
Download pgsql and ocf-shellfuncs.in.
Replace:
- # cp pgsql /usr/lib/ocf/resource.d/heartbeat/
- # cp ocf-shellfuncs.in /usr/lib/ocf/lib/heartbeat/ocf-shellfuncs
{Note: ocf-shellfuncs.in must end up named ocf-shellfuncs, otherwise pgsql may fail to find the functions it needs. The newly downloaded file adds several new helper functions, such as ocf_local_nodename.}
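One possible way to fetch the two files (the raw URLs are assumptions based on the repository layout above; consider pinning a revision known to work with your pacemaker version):
- # wget https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/pgsql
- # wget https://raw.githubusercontent.com/ClusterLabs/resource-agents/master/heartbeat/ocf-shellfuncs.in
- # chmod 755 /usr/lib/ocf/resource.d/heartbeat/pgsql    // the agent must be executable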
Features of the pgsql resource script:
● Failover on master failure
When the master goes down, the RA detects the failure, marks the master as stopped, and promotes the slave to be the new master.
● Switching between synchronous and asynchronous replication
If the slave goes down or the LAN has problems, transactions containing writes will hang under synchronous replication, which effectively stops the service. To prevent that, the RA dynamically switches replication from synchronous to asynchronous.
● Automatic detection of the newest data at initial startup
When Pacemaker starts on two or more nodes at about the same time, the RA compares the latest replay location of each node to find the one with the newest data, and that node becomes the master. If Pacemaker is started on only one node, or is started first on one node, that node becomes the master; the RA decides based on the data state before the last shutdown.
● Read load balancing
Since slave nodes can serve read-only transactions, reads can be load-balanced through an additional virtual IP.
4.7 Start corosync
Start:
- [root@node1 ~]# service corosync start
- [root@node2 ~]# service corosync start
Check the status:
- [root@node1 ~]# crm status
- Last updated: Tue Jan 21 23:55:13 2014
- Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
- Stack: classic openais (with plugin)
- Current DC: node1 - partition with quorum
- Version: 1.1.10-14.el6_5.2-368c726
- 2 Nodes configured, 2 expected votes
{corosync started successfully}
4.8 Configure streaming replication
Configure postgresql.conf and pg_hba.conf on node1/node2:
postgresql.conf :
listen_addresses = '*'
port = 5432
wal_level = hot_standby
archive_mode = on
archive_command = 'test ! -f /opt/archivelog/%f && cp %p /opt/archivelog/%f'
max_wal_senders = 4
wal_keep_segments = 50
hot_standby = on
pg_hba.conf :
host replication postgres 192.168.1.0/24 trust
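Note that the archive_command above assumes /opt/archivelog exists and is writable by postgres on both nodes; create it first if needed:
# mkdir -p /opt/archivelog && chown postgres:postgres /opt/archivelog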
Run the base backup on node2:
[postgres@node2 data]$ pg_basebackup -h 192.168.1.1 -U postgres -D /opt/pgsql/data -P
To test whether streaming replication works at this point, you can write recovery.conf by hand (when corosync later starts the database this file is generated automatically, and an existing one will be overwritten):
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.1 port=5432 user=postgres application_name=node2 keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
restore_command = 'cp /opt/archivelog/%f %p'
recovery_target_timeline = 'latest'
[postgres@node2 data]$ pg_ctl start
[postgres@node1 pgsql]$ psql
postgres=# select client_addr, sync_state from pg_stat_replication;
 client_addr | sync_state
-------------+------------
 192.168.1.2 | sync
(1 row)
Stop both databases:
[postgres@node2 ~]$ pg_ctl stop -m f
[postgres@node1 ~]$ pg_ctl stop -m f
4.9 Configure pacemaker
{pacemaker can be configured in several ways, such as crmsh, hb_gui, or pcs; this setup uses crmsh}
Write the crm configuration script:
[root@node1 ~]# cat pgsql.crm
property \                                    // global cluster properties
    no-quorum-policy="ignore" \               // ignore loss of quorum (enable for 3+ nodes)
    stonith-enabled="false" \                 // disable STONITH
    crmd-transition-delay="0s"
rsc_defaults \                                // resource defaults
    resource-stickiness="INFINITY" \          // how strongly a resource prefers to stay put; INFINITY = never move voluntarily
    migration-threshold="1"                   // failures after which a node may no longer run the resource
ms msPostgresql pgsql \                       // master/slave set for pgsql
    meta \
        master-max="1" \
        master-node-max="1" \
        clone-max="2" \
        clone-node-max="1" \
        notify="true"
clone clnPingCheck pingCheck                  // clone resource
group master-group \                          // resource group
    vip-master \
    vip-rep
primitive vip-master ocf:heartbeat:IPaddr2 \  // vip-master resource
    params \
        ip="192.168.100.213" \
        nic="eth0" \
        cidr_netmask="24" \
    op start timeout="60s" interval="0s" on-fail="stop" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"
primitive vip-rep ocf:heartbeat:IPaddr2 \     // vip-rep resource
    params \
        ip="192.168.1.3" \
        nic="eth2" \
        cidr_netmask="24" \
    meta \
        migration-threshold="0" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"
primitive vip-slave ocf:heartbeat:IPaddr2 \   // vip-slave resource
    params \
        ip="192.168.100.214" \
        nic="eth0" \
        cidr_netmask="24" \
    meta \
        resource-stickiness="1" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="block"
primitive pgsql ocf:heartbeat:pgsql \         // pgsql resource
    params \                                  // agent parameters
        pgctl="/opt/pgsql/bin/pg_ctl" \
        psql="/opt/pgsql/bin/psql" \
        pgdata="/opt/pgsql/data/" \
        start_opt="-p 5432" \
        rep_mode="sync" \
        node_list="node1 node2" \
        restore_command="cp /opt/archivelog/%f %p" \
        primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        master_ip="192.168.1.3" \
        stop_escalate="0" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="7s" on-fail="restart" \
    op monitor timeout="60s" interval="2s" on-fail="restart" role="Master" \
    op promote timeout="60s" interval="0s" on-fail="restart" \
    op demote timeout="60s" interval="0s" on-fail="stop" \
    op stop timeout="60s" interval="0s" on-fail="block" \
    op notify timeout="60s" interval="0s"
primitive pingCheck ocf:pacemaker:ping \      // pingCheck resource
    params \
        name="default_ping_set" \
        host_list="192.168.100.1" \
        multiplier="100" \
    op start timeout="60s" interval="0s" on-fail="restart" \
    op monitor timeout="60s" interval="10s" on-fail="restart" \
    op stop timeout="60s" interval="0s" on-fail="ignore"
location rsc_location-1 vip-slave \           // where vip-slave may run
    rule 200: pgsql-status eq "HS:sync" \
    rule 100: pgsql-status eq "PRI" \
    rule -inf: not_defined pgsql-status \
    rule -inf: pgsql-status ne "HS:sync" and pgsql-status ne "PRI"
location rsc_location-2 msPostgresql \        // where msPostgresql may run
    rule -inf: not_defined default_ping_set or default_ping_set lt 100
colocation rsc_colocation-1 inf: msPostgresql clnPingCheck       // resources that must share a node
colocation rsc_colocation-2 inf: master-group msPostgresql:Master
order rsc_order-1 0: clnPingCheck msPostgresql                   // operation ordering
order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
Note: this script slightly adapts configurations found online, which target pacemaker-1.0.*, whereas this setup uses pacemaker-1.1.10.
Load the configuration script:
[root@node1 ~]# crm configure load update pgsql.crm
WARNING: pgsql: specified timeout 60s for stop is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for start is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for notify is smaller than the advised 90
WARNING: pgsql: specified timeout 60s for demote is smaller than the advised 120
WARNING: pgsql: specified timeout 60s for promote is smaller than the advised 120
After a while, check the HA status:
[root@node1 ~]# crm_mon -Afr -1
Last updated: Tue Jan 21 23:37:20 2014
Last change: Tue Jan 21 23:37:36 2014 via crm_attribute on node1
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.10-14.el6_5.2-368c726
2 Nodes configured, 2 expected votes
7 Resources configured
Online: [ node1 node2 ]
Full list of resources:
vip-slave      (ocf::heartbeat:IPaddr2):   Started node2
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):  Started node1
     vip-rep    (ocf::heartbeat:IPaddr2):  Started node1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ node1 node2 ]
Node Attributes:
* Node node1:
    + default_ping_set        : 100
    + master-pgsql            : 1000
    + pgsql-data-status       : LATEST
    + pgsql-master-baseline   : 0000000006000078
    + pgsql-status            : PRI
* Node node2:
    + default_ping_set        : 100
    + master-pgsql            : 100
    + pgsql-data-status       : STREAMING|SYNC
    + pgsql-status            : HS:sync
Migration summary:
* Node node2:
* Node node1:
Note: right after startup both nodes come up as slaves; after a short while node1 is automatically promoted to master.
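Once both nodes are up, a rough failover smoke test (a suggested check, not part of the original walkthrough) is to kill PostgreSQL on the master and watch the cluster promote the slave:
[root@node1 ~]# killall -9 postgres     // simulate a master crash
[root@node2 ~]# crm_mon -Afr -1         // node2 should switch to PRI and acquire vip-master/vip-rep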