Replica Set Tutorial

솔라리스™ 2011. 7. 8. 15:55

This tutorial walks through the basic configuration of a replica set. To keep the example easy to try, it runs several mongod processes on a single machine; a real deployment would use several machines. If you plan to deploy replica sets in production, be sure to read the replica set documentation. Replica sets are available in MongoDB v1.6 and later.

Introduction

A replica set is a group of n mongod nodes (members) that work together. The goal is that each member of the set holds a complete copy (replica) of the data from the other nodes.

Setting up a replica set is a two-step process that requires starting each mongod process and then formally initiating the set. Here, we'll be configuring a set of three nodes, which is standard.

Once the mongod processes are started, we will issue a command to initialize the set. After a few seconds, one node will be elected primary (master), and you can begin writing to and querying the set.

Starting the nodes

First, create a separate data directory for each of the nodes in the set. In a real environment with multiple servers we could use the default /data/db directory if we wanted to, but on a single machine we will have to set up non-defaults:

$ mkdir -p /data/r0
$ mkdir -p /data/r1
$ mkdir -p /data/r2

Next, start each mongod process with the --replSet parameter. The parameter takes a logical name for the replica set. Let's call our replica set "foo". We'll launch our first node like so:

$ mongod --replSet foo --port 27017 --dbpath /data/r0

Let's now start the second and third nodes:

$ mongod --replSet foo --port 27018 --dbpath /data/r1
$ mongod --replSet foo --port 27019 --dbpath /data/r2

You should now have three nodes running. At this point, each node should be printing the following warning:

Mon Aug 2 11:30:19 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)

We can't use the replica set until we've initiated it, which we'll do next.
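
The three startup commands above differ only in port and data directory, so they can be generated rather than typed. The following is a minimal sketch in plain JavaScript; the helper names mongodArgs and mongodCommands and their parameters are made up for illustration, and only the --replSet, --port and --dbpath flags come from the tutorial.

```javascript
// Build the argument list for one member of a local test replica set.
// setName, index, basePort and baseDir are illustrative parameters of this
// sketch, not MongoDB settings.
function mongodArgs(setName, index, basePort, baseDir) {
  return [
    "--replSet", setName,
    "--port", String(basePort + index),
    "--dbpath", baseDir + "/r" + index
  ];
}

// Build one full command line per member.
function mongodCommands(setName, count, basePort, baseDir) {
  var commands = [];
  for (var i = 0; i < count; i++) {
    commands.push("mongod " + mongodArgs(setName, i, basePort, baseDir).join(" "));
  }
  return commands;
}

// The three members used in this tutorial.
var commands = mongodCommands("foo", 3, 27017, "/data");
```

Each entry of commands is a string that could be pasted into its own terminal; the sketch only builds the strings and does not start any process.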

Initiating the Set

We can initiate the replica set by connecting to one of the members and running the replSetInitiate command (that is, rs.initiate() in the mongo shell). This command takes a configuration object that specifies the name of the set and each of its members.

The replSetInitiate command may be sent to any member of an uninitiated set. However, only the member performing the initiation may have any existing data. This data becomes the initial data for the set. The other members will begin synchronizing and receiving that data (if present; starting empty is fine too). This is called the "initial sync". Secondaries will not be online for reads (in state 2, "SECONDARY") until their initial sync completes.

Note: the replication oplog (in the local database) is allocated at initiation time. The oplog can be quite large, thus initiation may take some time.

$ mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.help(); // if you are curious run this (optional)
>
> config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019'}]
           }
> rs.initiate(config);
{
   "info" : "Config now saved locally.  Should come online in about a minute.",
   "ok" : 1
}

We specify the config object and then pass it to rs.initiate(). Then, if everything is in order, we get a response saying that the replica set will be online in a minute. During this time, one of the nodes will be elected master.
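
The config object has a regular shape: the set name as _id, plus one member entry per host with a numeric _id. As a rough sketch, it could be built from a list of hosts like this; buildReplSetConfig is a hypothetical helper written for this example, not part of the mongo shell.

```javascript
// Build a replica set config object from a set name and a list of hosts.
// Member _id values are simply the position of each host in the list.
function buildReplSetConfig(setName, hosts) {
  var members = [];
  for (var i = 0; i < hosts.length; i++) {
    members.push({ _id: i, host: hosts[i] });
  }
  return { _id: setName, members: members };
}

var config = buildReplSetConfig("foo", [
  "localhost:27017",
  "localhost:27018",
  "localhost:27019"
]);
// In the mongo shell, this object could then be passed to rs.initiate(config).
```

The result has the same structure as the config typed by hand in the session above.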

To check the status of the set, run rs.status():

> rs.status()
{
	"set" : "foo",
	"date" : "Mon Aug 02 2010 11:39:08 GMT-0400 (EDT)",
	"myState" : 1,
	"members" : [
		{
			"name" : "arete.local:27017",
			"self" : true,
		},
		{
			"name" : "localhost:27019",
			"health" : 1,
			"uptime" : 101,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		},
		{
			"name" : "localhost:27018",
			"health" : 1,
			"uptime" : 107,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		}
	],
	"ok" : 1
}

You'll see that the other members of the set are up. You may also notice that the myState value is 1, indicating that we're connected to the member which is currently primary; a value of 2 indicates a secondary.
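
A script that reads the status document may want names instead of numeric codes. The sketch below maps only the two values mentioned here (1 and 2); other state codes exist but are deliberately left unmapped, and the helper names describeState and connectedToPrimary are invented for this example.

```javascript
// State codes taken from this tutorial: 1 is primary, 2 is secondary.
var STATE_NAMES = { 1: "PRIMARY", 2: "SECONDARY" };

// Turn a numeric myState value into a readable label.
// Codes not covered by the tutorial are reported as "OTHER (<code>)".
function describeState(myState) {
  if (Object.prototype.hasOwnProperty.call(STATE_NAMES, myState)) {
    return STATE_NAMES[myState];
  }
  return "OTHER (" + myState + ")";
}

// True when the status document comes from the member that is currently primary.
function connectedToPrimary(status) {
  return status.myState === 1;
}

// A trimmed-down status document shaped like the rs.status() output above.
var status = { set: "foo", myState: 1, ok: 1 };
var label = describeState(status.myState);
```

With the status document shown above, label is "PRIMARY" and connectedToPrimary(status) is true.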

You can also check the set's status in the HTTP Admin UI.

Replication

Go ahead and write something to the master node:

db.messages.insert({name: "ReplSet Tutorial"});

If you look at the logs on the secondary nodes, you'll see the write replicated.

Failover

The purpose of a replica set is to provide automated failover. This means that, if the primary node goes down, a secondary node can take over. When this occurs, the set members that are still up hold an election to select a new primary. To see how this works in practice, go ahead and kill the primary node with Control-C (^C); if you are running with --journal, kill -9 would be fine too:

^CMon Aug  2 11:50:16 got kill or ctrl c or hup signal 2 (Interrupt), will terminate after current cmd ends
Mon Aug  2 11:50:16 [interruptThread] now exiting
Mon Aug  2 11:50:16  dbexit:

If you look at the logs on the secondaries, you'll see a series of messages indicating failover. On our first secondary, we see this:

Mon Aug  2 11:50:16 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [conn1] replSet info voting yea for 2
Mon Aug  2 11:50:17 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
Mon Aug  2 11:50:27 [rs_sync] replSet SECONDARY

And on the second, this:

Mon Aug  2 11:50:17 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [rs Manager] replSet info electSelf 2
Mon Aug  2 11:50:17 [rs Manager] replSet PRIMARY
Mon Aug  2 11:50:27 [initandlisten] connection accepted from 127.0.0.1:61263 #5

Both nodes notice that the primary has gone down and, as a result, a new primary node is elected. In this case, the node at port 27019 is promoted. If we bring the failed node on 27017 back online, it will come back up as a secondary.

Changing the replica set configuration

There are times when you'll want to change the replica set configuration. Suppose, for instance, that you want to give a member priority zero, indicating that the member should never become primary. To do this, you need to pass a new configuration object to the database's replSetReconfig command. The shell's rs.reconfig() helper makes this easier.

One note: the reconfig command must be sent to the current primary of the set. This implies that you need a majority of the set up to perform a reconfiguration.

> // we should be primary.  can be checked with rs.status() or with:
> rs.isMaster();
> var c = rs.conf();
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019'}]
}
> c.members[2].priority = 0;
> c
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019', priority: 0}]
}
> rs.reconfig(c);
> // done. to see the new config and the new status:
> rs.conf()
> rs.status()
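
The session above edits the config object in place. If you would rather keep the original object untouched, for example to compare old and new settings, a copy can be modified instead. This is only a sketch: shallowCopy and withPriority are hypothetical helpers, not shell functions.

```javascript
// Copy the own properties of an object into a new object (one level deep).
function shallowCopy(obj) {
  var copy = {};
  for (var key in obj) {
    if (Object.prototype.hasOwnProperty.call(obj, key)) {
      copy[key] = obj[key];
    }
  }
  return copy;
}

// Return a new config in which the member with the given _id has the given
// priority. The input config and its member entries are not modified.
function withPriority(config, memberId, priority) {
  var result = shallowCopy(config);
  result.members = [];
  for (var i = 0; i < config.members.length; i++) {
    var member = shallowCopy(config.members[i]);
    if (member._id === memberId) {
      member.priority = priority;
    }
    result.members.push(member);
  }
  return result;
}

var original = { _id: "foo", members: [
  { _id: 0, host: "localhost:27017" },
  { _id: 1, host: "localhost:27018" },
  { _id: 2, host: "localhost:27019" }
] };
var updated = withPriority(original, 2, 0);
// In the mongo shell, updated could then be passed to rs.reconfig(updated).
```

Here updated.members[2] carries priority 0, while original is left as it was.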

Running with two nodes

Suppose you want to run a replica set with just two database servers (that is, a replication factor of two). This is possible, but because replica sets elect a primary by majority, a majority here would be 2 out of 2, so losing either server leaves the set unable to elect a primary. In this situation one normally also runs an arbiter on a separate server. An arbiter is a set member that holds no data but votes in elections; here it acts as the tie breaker. Arbiters are very lightweight and can be run anywhere, for example on an app server or a micro VM. With an arbiter in place, the replica set behaves appropriately, recovering automatically from both network partitions and node failures (e.g., machine crashes).
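
The majority arithmetic behind this can be written down in a few lines. The sketch below assumes every member has exactly one vote, which matches the simple configs in this tutorial; the function names majority and canElectPrimary are invented for the example.

```javascript
// Smallest number of votes that is more than half of all voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// True when the members that can still reach each other form a majority.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majority(votingMembers);
}

// Two data nodes, one of them down: 1 vote out of 2 is not a majority.
var twoNodesOneDown = canElectPrimary(2, 1);

// Two data nodes plus an arbiter, one data node down: 2 votes out of 3 is.
var withArbiterOneDown = canElectPrimary(3, 2);
```

Here twoNodesOneDown is false and withArbiterOneDown is true, which is why the third vote matters.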

You start up an arbiter just as you would a standard replica set node, as a mongod process with the --replSet option. However, when initiating, you need to include the arbiterOnly option in the config document.

With an arbiter, the configuration presented above would look like this instead:

config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019', arbiterOnly: true}]
           }

Drivers

Most of the MongoDB drivers are replica set aware. When connecting, the driver takes a list of seed hosts from the replica set and then discovers which host is primary and which are secondary (the driver uses the isMaster command internally for this).

With this complete set of potential master nodes, the driver can automatically find the new master if the current master fails. See your driver's documentation for specific details.
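
To give a rough idea of that discovery step, here is a simplified sketch that picks the primary out of a list of isMaster replies. It is not how any particular driver is implemented. The input shape (an array of {host, reply} pairs) and the name findPrimary are made up for this example, and the sketch assumes each reply carries an ismaster flag and, on secondaries, a primary field naming the current primary.

```javascript
// Given isMaster replies collected from the seed hosts, return the address
// of the primary, or null if none of the replies identifies one.
function findPrimary(replies) {
  var i;
  // First choice: a member that reports itself as primary.
  for (i = 0; i < replies.length; i++) {
    if (replies[i].reply.ismaster === true) {
      return replies[i].host;
    }
  }
  // Otherwise, fall back to the primary named by another member.
  for (i = 0; i < replies.length; i++) {
    if (replies[i].reply.primary) {
      return replies[i].reply.primary;
    }
  }
  return null;
}

// Replies as they might look after the failover in this tutorial,
// with 27017 back online as a secondary.
var replies = [
  { host: "localhost:27017", reply: { ismaster: false, secondary: true, primary: "localhost:27019" } },
  { host: "localhost:27018", reply: { ismaster: false, secondary: true, primary: "localhost:27019" } },
  { host: "localhost:27019", reply: { ismaster: true, secondary: false } }
];
var primary = findPrimary(replies);
```

With these sample replies, primary is "localhost:27019". A real driver also handles timeouts, retries and changes in set membership, which this sketch leaves out.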

If you happen to be using the Ruby driver, you may want to check out Replica Sets in Ruby.

References: http://mongodb.onconfluence.com/display/DOCSKR/Home (Korean manual)
http://groups.google.com/group/mongodb-kr (Korean MongoDB user group)