Re-establishing Admin Process Quorum

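As described in Admin Process Quorum, a quorum of the Admin Processes (APs) in the domain is required to make any change to the durable domain configuration. In the unlikely event that connectivity has been lost to so many hosts that a quorum cannot be established, perform the following steps to re-establish AP quorum.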

The commands required to re-establish AP quorum are issued using NuoDB Command (nuocmd). For more information on NuoDB Command and other command line tools, see Command Line Tools.

1. First confirm that action is required and obtain information on the servers that need to be removed from the domain. Run the show domain command to confirm that quorum has been lost and confirm the IDs of the disconnected admin servers. In the example below, servers r0db2, r0db3 and r0db4 are disconnected from r0db0 and r0db1 so quorum is not established and no leader is identified.

It is possible that the admin servers showing as disconnected are in fact still running and hold a majority partition of the domain. For example, this could happen if a network partition left two hosts (r0db0 and r0db1) on one side and three hosts (r0db2, r0db3 and r0db4) on the other. If show domain is run against one of the second set of hosts (r0db2, r0db3 or r0db4), it would show three connected admin servers, which constitute a majority of the domain.

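Before proceeding, independently confirm that the admin servers on the disconnected hosts cannot be started or restarted, and that quorum needs to be re-established on the remaining hosts. If the disconnected admin servers are still running and have a valid quorum, you should not re-establish AP quorum using the minority of admin servers.

nuocmd show domain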
...
Servers:
  [r0db0] 172.31.45.7:48005 [last_ack = 9.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=<NO VALUE>, log=5/99/100) Connected *
  [r0db1] 172.31.44.101:48005 [last_ack = 9.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=<NO VALUE>, log=5/99/100) Connected
  [r0db2] 172.31.42.100:48005 [last_ack = 119.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db4, log=5/98/98) Disconnected
  [r0db3] 172.31.47.31:48005 [last_ack = 59.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db4, log=5/98/98) Disconnected
  [r0db4] 172.31.47.176:48005 [last_ack = 29.32] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=r0db4, log=5/99/99) Disconnected

In this example the servers r0db2, r0db3 and r0db4 are disconnected. It is determined that r0db2 might be restartable in the future, but not soon enough to form a valid quorum quickly. The admin service cannot be started on r0db3 and r0db4, so they are to be removed from the domain.
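The check in step 1 can be sketched as a small script. This is a minimal illustration, not part of NuoDB: it assumes the "Servers:" lines keep the layout shown in the example output, and the names `SERVER_LINE`, `parse_servers` and `has_quorum` are hypothetical helpers. It only reflects the view of the admin server that was queried, so it cannot rule out that the disconnected servers hold a majority on the other side of a partition.

```python
import re

# Matches one line of the "Servers:" section, assuming the layout shown
# in the example output above (an assumption of this sketch).
SERVER_LINE = re.compile(
    r"\[(?P<id>[^\]]+)\]\s+(?P<addr>\S+)\s+.*?"
    r"\((?P<role>[^,]+), Leader=(?P<leader>[^,]+), log=[^)]*\)\s+"
    r"(?P<status>Connected|Disconnected)"
)

# Output copied from the example in step 1.
SAMPLE = """\
Servers:
  [r0db0] 172.31.45.7:48005 [last_ack = 9.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=<NO VALUE>, log=5/99/100) Connected *
  [r0db1] 172.31.44.101:48005 [last_ack = 9.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=<NO VALUE>, log=5/99/100) Connected
  [r0db2] 172.31.42.100:48005 [last_ack = 119.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db4, log=5/98/98) Disconnected
  [r0db3] 172.31.47.31:48005 [last_ack = 59.32] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db4, log=5/98/98) Disconnected
  [r0db4] 172.31.47.176:48005 [last_ack = 29.32] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=r0db4, log=5/99/99) Disconnected
"""


def parse_servers(text):
    """Return one dict per server line found in `text`."""
    servers = []
    for line in text.splitlines():
        match = SERVER_LINE.search(line)
        if match:
            servers.append({
                "id": match.group("id"),
                "role": match.group("role"),
                "leader": match.group("leader"),
                "connected": match.group("status") == "Connected",
            })
    return servers


def has_quorum(servers):
    """True if strictly more than half of the listed servers are connected."""
    connected = sum(1 for s in servers if s["connected"])
    return connected > len(servers) // 2


servers = parse_servers(SAMPLE)
disconnected = [s["id"] for s in servers if not s["connected"]]
print("quorum:", has_quorum(servers), "disconnected:", disconnected)
```

For the sample above, two of five servers are connected, so the majority test fails and the three disconnected IDs are the candidates to investigate.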

2. To re-establish an AP quorum, the disconnected servers are temporarily removed from voting for leadership. This is done by restarting the surviving admin servers with the --evicted-servers option and a comma-separated list of servers to exclude from voting. This must be done on all of the surviving admin servers. In this example we restart the servers on r0db0 and r0db1 with the following command:

service nuoadmin restart --evicted-servers r0db3,r0db4

This removes r0db3 and r0db4 from the voting for AP quorum but will not remove them from the durable domain.
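The arithmetic behind the choice of servers to evict can be sketched as follows. This assumes a simple majority rule (more than half of the voting members must be reachable), which matches the five-server example here but is an assumption of the sketch; `majority`, `can_form_quorum` and `evicted_servers_arg` are hypothetical helper names, not NuoDB functions.

```python
def majority(voters):
    """Smallest number of votes that is more than half of `voters`."""
    return voters // 2 + 1


def can_form_quorum(all_servers, evicted, connected):
    """True if the connected, non-evicted servers form a majority of voters."""
    voters = [s for s in all_servers if s not in evicted]
    live_voters = [s for s in voters if s in connected]
    return len(live_voters) >= majority(len(voters))


def evicted_servers_arg(evicted):
    """Comma-separated list in the form passed to --evicted-servers above."""
    return ",".join(evicted)


all_servers = ["r0db0", "r0db1", "r0db2", "r0db3", "r0db4"]
connected = {"r0db0", "r0db1"}

# No eviction: 5 voters, 3 needed, only 2 connected.
print(can_form_quorum(all_servers, [], connected))
# Evicting only r0db4: 4 voters, 3 needed, still only 2 connected.
print(can_form_quorum(all_servers, ["r0db4"], connected))
# Evicting r0db3 and r0db4: 3 voters, 2 needed, 2 connected.
print(can_form_quorum(all_servers, ["r0db3", "r0db4"], connected))
print(evicted_servers_arg(["r0db3", "r0db4"]))
```

Under this assumption, evicting a single server is not enough in the example, while evicting r0db3 and r0db4 lets r0db0 and r0db1 outvote the still-absent r0db2.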

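3. Because there are now three voting members rather than the original five, the remaining admin servers form a majority of two out of three. This can be verified by the presence of a new leader that is one of the connected admin servers: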

nuocmd show domain
...
Servers:
  [r0db0] 172.31.45.7:48005 [last_ack = 4.89] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=r0db0, log=46/103/103) Connected *
  [r0db1] 172.31.44.101:48005 [last_ack = 4.61] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db0, log=46/103/103) Connected
  [r0db2] 172.31.42.100:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected
  [r0db3] 172.31.47.31:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected
  [r0db4] 172.31.47.176:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected
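The verification in step 3 amounts to checking that every connected server reports the same leader and that this leader is itself connected. A minimal sketch, with the values transcribed by hand from the two show domain outputs above; `leader_is_established` is a hypothetical helper, not a NuoDB function.

```python
def leader_is_established(servers):
    """servers: list of (server_id, reported_leader, connected) tuples.

    True only if all connected servers agree on one leader and that
    leader is one of the connected servers.
    """
    connected = [s for s in servers if s[2]]
    leaders = {reported for _, reported, _ in connected}
    if len(leaders) != 1:
        return False
    leader = next(iter(leaders))
    return leader in {server_id for server_id, _, _ in connected}


# Transcribed from the step 1 output (before eviction).
before = [
    ("r0db0", "<NO VALUE>", True),
    ("r0db1", "<NO VALUE>", True),
    ("r0db2", "r0db4", False),
    ("r0db3", "r0db4", False),
    ("r0db4", "r0db4", False),
]

# Transcribed from the step 3 output (after restarting with --evicted-servers).
after = [
    ("r0db0", "r0db0", True),
    ("r0db1", "r0db0", True),
    ("r0db2", "<NO VALUE>", False),
    ("r0db3", "<NO VALUE>", False),
    ("r0db4", "<NO VALUE>", False),
]

print(leader_is_established(before), leader_is_established(after))
```

The check deliberately fails when the connected servers name a leader that is disconnected, since that would indicate stale state rather than a newly elected leader.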

4. The disconnected servers can now be permanently removed from the durable domain using the delete server command.

nuocmd delete server --server-id r0db4
nuocmd show domain
...
Servers:
  [r0db0] 172.31.45.7:48005 [last_ack = 5.35] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=r0db0, log=46/106/106) Connected *
  [r0db1] 172.31.44.101:48005 [last_ack = 5.35] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db0, log=46/106/106) Connected
  [r0db2] 172.31.42.100:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected
  [r0db3] 172.31.47.31:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected

nuocmd delete server --server-id r0db3
nuocmd show domain
...
Servers:
  [r0db0] 172.31.45.7:48005 [last_ack = 9.55] [member = ADDED] [raft_state = ACTIVE] (LEADER, Leader=r0db0, log=46/109/109) Connected *
  [r0db1] 172.31.44.101:48005 [last_ack = 9.55] [member = ADDED] [raft_state = ACTIVE] (FOLLOWER, Leader=r0db0, log=46/106/106) Connected
  [r0db2] 172.31.42.100:48005 [last_ack = NEVER] [member = ADDED] [raft_state = <NO VALUE>] (<NO VALUE>, Leader=<NO VALUE>, log=?/?/?) Disconnected

A deleted server may not re-enter the domain using its existing server ID.

5. Restart the admin servers without the --evicted-servers option.

It is very important to restart the admin servers immediately. Failure to do so may cause problems with future voting in the domain.

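If the problem that caused a server ID to be removed from the domain is resolved, it may be desirable to start the admin server on that host again.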