ReplSet Replicatin
master的IP: 10.0.1.9 slave IP: 10.0.106.2、10.0.106.6
以replSet形式启动master,replSet名称设置为rs0
mongod --dbpath /home/www/data/ --replSet rs0
连接master,将自己做为一个member添加进去
> rs.initiate({_id:"rs0",members:[{_id:0,host:"10.0.1.9:27017"}]})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
rs0:PRIMARY>
初始化成功后,shell会发生改变为 rs0:PRIMARY,标志着replSet的名称和主server;
添加成员
rs0:PRIMARY> rs.add("10.0.106.2:27017")
{ "ok" : 1 }
rs0:PRIMARY> rs.add("10.0.106.6:27017")
{ "ok" : 1 }
此时master的日志里面会出现以下log信息,说明已经有成员连接进来
2017-11-15T10:48:43.995+0800 [rsHealthPoll] replSet member 10.0.106.2:27017 is now in state SECONDARY
而slave的日志如下,开始连接master,然后并开始同步数据
2017-11-15T10:48:27.911+0800 I NETWORK [initandlisten] connection accepted from 10.0.1.9:54292 #1 (1 connection now open)
2017-11-15T10:48:27.913+0800 I NETWORK [initandlisten] connection accepted from 10.0.1.9:54293 #2 (2 connections now open)
2017-11-15T10:48:27.917+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.0.1.9:27017
2017-11-15T10:48:27.947+0800 I NETWORK [conn1] end connection 10.0.1.9:54292 (1 connection now open)
2017-11-15T10:48:27.950+0800 I REPL [replExecDBWorker-0] Starting replication applier threads
2017-11-15T10:48:27.950+0800 I REPL [ReplicationExecutor] New replica set config in use: { _id: "rs0", version: 2, members: [ { _id: 0, host: "10.0.1.9:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, ta
gs: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "10.0.106.2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMill
is: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2017-11-15T10:48:27.950+0800 I REPL [ReplicationExecutor] This node is 10.0.106.2:27017 in the config
2017-11-15T10:48:27.950+0800 I REPL [ReplicationExecutor] transition to STARTUP2
2017-11-15T10:48:27.951+0800 I REPL [rsSync] ******
2017-11-15T10:48:27.951+0800 I REPL [rsSync] creating replication oplog of size: 990MB...
2017-11-15T10:48:27.952+0800 I REPL [ReplicationExecutor] Member 10.0.1.9:27017 is now in state PRIMARY
2017-11-15T10:48:27.956+0800 I STORAGE [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
2017-11-15T10:48:27.957+0800 I STORAGE [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2017-11-15T10:48:27.957+0800 I STORAGE [rsSync] Scanning the oplog to determine where to place markers for truncation
2017-11-15T10:48:27.996+0800 I REPL [rsSync] ******
2017-11-15T10:48:27.996+0800 I REPL [rsSync] initial sync pending
2017-11-15T10:48:28.012+0800 I REPL [ReplicationExecutor] syncing from: 10.0.1.9:27017
2017-11-15T10:48:28.017+0800 I REPL [rsSync] initial sync drop all databases
2017-11-15T10:48:28.017+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 1
2017-11-15T10:48:28.017+0800 I REPL [rsSync] initial sync clone all databases
mongo shell连接至slave的时候会发现shell变成了如下样式;标志着此server为rs0的replSet,为第二个节点
rs0:SECONDARY>
查询配置信息
rs0:SECONDARY> rs.conf();
{ "_id" : "rs0",
"version" : 11,
"members" : [
{
"_id" : 0,
"host" : "10.0.1.9:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "10.0.106.2:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 0.8,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "10.0.106.6:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
查看状态
rs0:SECONDARY> rs.status()
{
"set" : "rs0",
"date" : ISODate("2017-11-15T03:20:32.831Z"),
"myState" : 2,
"term" : NumberLong(-1),
"heartbeatIntervalMillis" : NumberLong(2000),
"members" : [
{
"_id" : 0,
"name" : "10.0.1.9:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 973,
"optime" : Timestamp(1510714107, 1),
"optimeDate" : ISODate("2017-11-15T02:48:27Z"),
"lastHeartbeat" : ISODate("2017-11-15T03:20:32.411Z"),
"lastHeartbeatRecv" : ISODate("2017-11-15T03:20:32.129Z"),
"pingMs" : NumberLong(0),
"electionTime" : Timestamp(1510715061, 1),
"electionDate" : ISODate("2017-11-15T03:04:21Z"),
"configVersion" : 2
},
{
"_id" : 1,
"name" : "10.0.106.2:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 974,
"optime" : Timestamp(1510714107, 1),
"optimeDate" : ISODate("2017-11-15T02:48:27Z"),
"infoMessage" : "could not find member to sync from",
"configVersion" : 2,
"self" : true
}
],
"ok" : 1
}
遇到的问题
> rs.initiate({_id:"xmdb",members:[{_id:0,host:"10.0.1.9:27017"}]})
{
"ok" : 0,
"errmsg" : "local.oplog.rs is not empty on the initiating member. cannot initiate."
}
原因是因为本地已经存在了 replSet的opLog,所以需要去掉,再重新initiate
> use local
switched to db local
> db.dropDatabase()
{ "dropped" : "local", "ok" : 1 }
再重新启动,可以考虑换一个新的replSet的名称
mongod --dbpath /home/www/data/ --replSet rs0
再重新初始化即可
> rs.initiate({_id:"rs0",members:[{_id:0,host:"10.0.1.9:27017"}]})
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
rs0:PRIMARY>
或者直接强制初始化
rs.reconfig(config, {force: true})
设置权重
rs0:PRIMARY> cfg = rs.conf();
{
"_id" : "rs0",
"version" : 2,
"members" : [
{
"_id" : 0,
"host" : "10.0.1.9:27017"
"uth_id" : 1,
"host" : "10.0.106.2:27017"
}
]
}
rs0:PRIMARY> cfg.members[0].priority=2
2
rs0:PRIMARY> cfg.members[1].priority=1
1
rs0:PRIMARY> cfg.members[2].priority=1
1
rs0:PRIMARY> rs.reconfig(cfg)
注意事项
- 权重的设置只能在Master当中
rs0:SECONDARY> rs.reconfig(cfg)
{
"ok" : 0,
"errmsg" : "replSetReconfig command must be sent to the current replica set primary."
}
- 三个副本的replSet的名称必须一致
- 只有两个副本集的时候,PRIMARY挂掉之后,SECONDARY是不会成为PRIMARY的,必须三个副本集以上,最多50个副本集,并且只允许7个可投票进行选举的成员
- priority设置为0的成员是不参与投票的
- 当PRIMARY挂掉的时候,权重高的会被设置为PRIMARY;
- 非PRIMARY的副本只允许查询,不允许其他的操作;
使用Springboot进行操作
查看rs设置
rs0:PRIMARY> rs.conf()
{
"_id" : "rs0",
"version" : 12,
"members" : [
{
"_id" : 0,
"host" : "10.0.1.9:27017",
"priority" : 2
},
{
"_id" : 1,
"host" : "10.0.106.2:27017",
"priority" : 0.8
},
{
"_id" : 2,
"host" : "10.0.106.6:27017",
"priority" : 0.5
}
],
"settings" : {
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
}
}
}
springboot application.properties文件配置
mongo.replicaSet=mongodb://10.0.1.9:27017,10.0.106.2:27017,10.0.106.6:27017
PRIMARY: 10.0.1.9 SECONDARY: 10.0.106.2, 10.0.106.6
当工程启动后,实验过程和结论如下
- 关闭10.0.1.9
- 10.0.106.2成为PRIMARY
- 10.0.106.2与10.0.106.6都在尝试重连10.0.1.9
- 此时应用操作正常,后台无连数据不上的问题
- 关闭10.0.106.2
- 此时没有PRIMARY,10.0.106.6仍然是SECONDARY节点
- 此时应用操作不正常,后台过了最大尝试重连时间后直接异常提示
org.springframework.dao.DataAccessResourceFailureException: Timed out after 30000 ms while waiting for a server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}. Client view of cluster state is {type=ReplicaSet, servers=[{address=10.0.1.9:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused}}, {address=10.0.106.2:27017, type=ReplicaSetSecondary, averageLatency=11.9 ms, state=Connected}, {address=10.0.106.6:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused}}]; nested exception is com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}. Client view of cluster state is {type=ReplicaSet, servers=[{address=10.0.1.9:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused}}, {address=10.0.106.2:27017, type=ReplicaSetSecondary, averageLatency=11.9 ms, state=Connected}, {address=10.0.106.6:27017, type=Unknown, state=Connecting, exception={com.mongodb.MongoException$Network: Exception opening the socket}, caused by {java.net.ConnectException: Connection refused}}]
-
重新启动10.0.106.2
- 节点10.0.106.2变为PRIMARY
- 此时应用开始恢复,正常操作
-
重新启动10.0.1.9
- 节点10.0.106.6变为SECONDARY,10.0.1.9变为PRIMARY
- 此时应用正常操作
-
当没有PRIMARY的时候,应用才会宕机
-
当三个节点组成replica set的时候,集群宕机到只剩下一台机器的时候,就没有PRIMARY节点,应用也就无法正常运转
-
spring配置文件当中的配置顺序没有关系