SIWOO.FREE.SOUL

380 posts in 'All posts'

2011.07.08 Setting up replica sets with MongoDB 1.6
2011.07.08 Replica Set Tutorial
2011.04.22 Revisiting the Lee Ji-ah alien theory 1
2011.04.12 How FriendFeed uses MySQL to store schema-less data
2011.04.12 The story of Twitter and FriendFeed

Setting up replica sets with MongoDB 1.6

NoSQL/mongoDB 2011. 7. 8. 17:35 |

Introduction

MongoDB 1.6 was released today, and among other things it includes support for the incredibly sexy replica sets feature – basically master/slave replication on crack, with automatic failover and the like. I’m setting it up, and figured I’d document the pieces as I walk through them.

My test deploy is going to consist of two nodes and one arbiter; production will have several more potential nodes. We aren’t worrying about sharding at this point, but 1.6 brings automatic sharding with it, as well, so we can enable that at a later point if we need to.

Installation

Installation is very easy. 10gen offers a yum repo, so it’s as easy as adding the repo to /etc/yum.repos.d and then running yum install mongo-stable mongo-server-stable.

Once installed, mongo --version confirms that we’re on 1.6. Time to boot up our nodes.

Configuration

For staging, we’re going to run both replica nodes and the arbiter on a single machine. This means 3 configs.

I have 3 config files in /etc/mongod/ – mongo.node1.conf, mongo.node2.conf, and mongo.arbiter.conf. As follows:

# mongo.node1.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node1.log
port = 27017
logappend=true
dbpath=/var/lib/mongo/node1
fork = true
rest = true

# mongo.node2.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node2.log
port = 27018
logappend=true
dbpath=/var/lib/mongo/node2
fork = true

# mongo.arbiter.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.arbiter.log
port = 27019
logappend=true
dbpath=/var/lib/mongo/arbiter
fork = true
oplogSize = 1

Starting it up

Then we just fire up our daemons:

mongod -f /etc/mongod/mongo.node1.conf
mongod -f /etc/mongod/mongo.node2.conf
mongod -f /etc/mongod/mongo.arbiter.conf

Once we spin up the servers, they need a bit to allocate files and start listening. I tried to connect a bit too early, and got the following:

[root@261668-db3 mongo]# mongo
MongoDB shell version: 1.6.0
connecting to: test
Fri Aug  6 03:48:40 Error: couldn't connect to server 127.0.0.1} (anon):1137
exception: connect failed
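
One way to avoid connecting too early is to poll the port until something is listening. The following is a minimal Python sketch, not part of the original walkthrough; the function name wait_for_port and its timeout and interval defaults are assumptions made for illustration. It only checks that a TCP connection is accepted, not that mongod has finished allocating its files.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it proves only that something is listening,
    not that the server behind the port is ready to serve queries.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling wait_for_port("127.0.0.1", 27017) before launching the shell, and likewise for 27018 and 27019, would cover the three daemons configured above.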

Configuring replica set members

Once you can connect to the mongo console, we need to set up the replica set. If you have a compliant configuration, then you can just call rs.initiate() and everything will get spun up. If you don’t, though, you’ll need to specify your initial configuration.

This is where I hit my first problem; the hostname as the system defines it didn’t resolve. This was resulting in the following:

[root@261668-db3 init.d]# mongo --port 27017
MongoDB shell version: 1.6.0
connecting to: 127.0.0.1:27017/test
> rs.initiate();
{
        "info2" : "no configuration explicitly specified -- making one",
        "errmsg" : "couldn't initiate : need members up to initiate, not ok : 261668-db3.db3.domain.com:27017",
        "ok" : 0
}

The solution, then, is to specify the members, and to use a resolvable internal name. Note that you do NOT include the arbiter’s information; you don’t want to add it to the replica set early as a full-fledged member.

> cfg = {_id: "my_replica_set", members: [{_id: 0, host: "db3:27017"}, {_id: 1, host: "db3:27018"}] }
> rs.initiate(cfg);
{
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}

Bingo. We’re in business.

Configuring the replica set arbiter

If the replica set master fails, a new master is elected. To be elected, a candidate needs at least floor(n / 2) + 1 votes, where n is the number of voting members in the set. In a paired setup, if the master were to fail, the remaining slave wouldn’t be able to elect itself as the new master, since it would hold only 1 of the 2 votes it needs. Thus, we run an arbiter, which is a special lightweight node that holds no data and whose only job is to be a tiebreaker. It will vote with the orphaned slave and elect it as the new master, so that the slave can continue duties while the old master is offline.
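
The vote arithmetic can be checked with a few lines of Python. This is an illustrative sketch only: the function names are made up here, and it assumes n counts every voting member of the set, including the arbiter and any member that is currently down.

```python
def votes_needed(voting_members):
    """Smallest strict majority of a set with this many voting members."""
    return voting_members // 2 + 1


def can_elect(voting_members, reachable_voters):
    """True if the reachable voters (candidate included) form a majority."""
    return reachable_voters >= votes_needed(voting_members)


# Two data nodes, no arbiter: the survivor holds 1 of 2 votes.
print(can_elect(voting_members=2, reachable_voters=1))  # False
# Two data nodes plus an arbiter: survivor and arbiter hold 2 of 3 votes.
print(can_elect(voting_members=3, reachable_voters=2))  # True
```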

> rs.addArb("db3:27019")
{
        "startupStatus" : 6,
        "errmsg" : "Received replSetInitiate - should come online shortly.",
        "ok" : 0
}

Updated driver usage

Once we’re set up, the Ruby Mongo connection code is updated to connect to a replica set rather than a single server.

Before:

MongoMapper.connection = Mongo::Connection.new("db3", 27017)

After

MongoMapper.connection = Mongo::Connection.multi([["db3", 27017], ["db3", 27018]])

This will attempt to connect to each of the defined servers, and get a list of all the visible nodes, then find the master. Since you don’t have to specify the full list, you don’t have to update your connection info each time you change the machines in the set. All it needs is at least one connectable server (even a slave) and the driver will figure out the master from there.

Conclusion

That’s about all there is to it! We’re now up and running with a replica set. We can add new slaves to the replica set, force a new master, take nodes in the cluster down, and all that jazz without impacting your app. You can even set up replica slaves in other data centers for zero-effort offsite backup. If your DB server exploded, you could point your app at the external datacenter’s node and keep running while you replace your local database server. Once your new server is up, just bring it online and re-add its node back into your replica set. Data will be transparently synched back to your local node. Once the sync is complete, you can re-elect your local node as the master, and all is well again.

Congratulations – enjoy your new replica set!

Source: http://www.coffeepowered.net/2010/08/06/setting-up-replica-sets-with-mongodb-1-6/

Posted by 솔라리스™

Replica Set Tutorial

NoSQL/mongoDB 2011. 7. 8. 15:55 |

This tutorial will guide you through the basic configuration of a replica set. Given the tutorial is an example and should be easy to try, it runs several mongod processes on a single machine (in the real world one would use several machines). If you are attempting to deploy replica sets in production, be sure to read the replica set documentation. Replica sets are available in MongoDB V1.6+.

Introduction

A replica set is a group of n mongod nodes (members) that work together. The goal is that each member of the set has a complete copy (replica) of the data from the other nodes.

Setting up a replica set is a two-step process that requires starting each mongod process and then formally initiating the set. Here, we'll be configuring a set of three nodes, which is standard.

Once the mongod processes are started, we will issue a command to initialize the set. After a few seconds, one node will be elected master, and you can begin writing to and querying the set.

Starting the nodes

First, create a separate data directory for each of the nodes in the set. In a real environment with multiple servers we could use the default /data/db directory if we wanted to, but on a single machine we will have to set up non-defaults:

$ mkdir -p /data/r0
$ mkdir -p /data/r1
$ mkdir -p /data/r2

Next, start each mongod process with the --replSet parameter. The parameter requires that you specify a logical name for our replica set. Let's call our replica set "foo". We'll launch our first node like so:

$ mongod --replSet foo --port 27017 --dbpath /data/r0

Let's now start the second and third nodes:

$ mongod --replSet foo --port 27018 --dbpath /data/r1
$ mongod --replSet foo --port 27019 --dbpath /data/r2

You should now have three nodes running. At this point, each node should be printing the following warning:

Mon Aug 2 11:30:19 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)

We can't use the replica set until we've initiated it, which we'll do next.

Initiating the Set

We can initiate the replica set by connecting to one of the members and running the replSetInitiate command (that is, rs.initiate() in the mongo shell). This command takes a configuration object that specifies the name of the set and each of the members.

The replSetInitiate command may be sent to any member of an uninitiated set. However, only the member performing the initiation may have any existing data. This data becomes the initial data for the set. The other members will begin synchronizing and receiving that data (if present; starting empty is fine too). This is called the "initial sync". Secondaries will not be online for reads (in state 2, "SECONDARY") until their initial sync completes.

Note: the replication oplog (in the local database) is allocated at initiation time. The oplog can be quite large, thus initiation may take some time.

$ mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.help(); // if you are curious run this (optional)
>
> config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019'}]
           }
> rs.initiate(config);
{
   "info" : "Config now saved locally.  Should come online in about a minute.",
   "ok" : 1
}

We specify the config object and then pass it to rs.initiate(). Then, if everything is in order, we get a response saying that the replica set will be online in a minute. During this time, one of the nodes will be elected master.

To check the status of the set, run rs.status():

> rs.status()
{
	"set" : "foo",
	"date" : "Mon Aug 02 2010 11:39:08 GMT-0400 (EDT)",
	"myState" : 1,
	"members" : [
		{
			"name" : "arete.local:27017",
			"self" : true,
		},
		{
			"name" : "localhost:27019",
			"health" : 1,
			"uptime" : 101,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		},
		{
			"name" : "localhost:27018",
			"health" : 1,
			"uptime" : 107,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		}
	],
	"ok" : 1
}

You'll see that the other members of the set are up. You may also notice that the myState value is 1, indicating that we're connected to the member which is currently primary; a value of 2 indicates a secondary.

You can also check the set's status in the HTTP Admin UI.

Replication

Go ahead and write something to the master node:

db.messages.insert({name: "ReplSet Tutorial"});

If you look at the logs on the secondary nodes, you'll see the write replicated.

Failover

The purpose of a replica set is to provide automated failover. This means that, if the primary node goes down, a secondary node can take over. When this occurs the set members which are up perform an election to select a new primary. To see how this works in practice, go ahead and kill the master node with Control-C (^C) (or if running with --journal, kill -9 would be ok too):

^CMon Aug  2 11:50:16 got kill or ctrl c or hup signal 2 (Interrupt), will terminate after current cmd ends
Mon Aug  2 11:50:16 [interruptThread] now exiting
Mon Aug  2 11:50:16  dbexit:

If you look at the logs on the secondaries, you'll see a series of messages indicating fail-over. On our first slave, we see this:

Mon Aug  2 11:50:16 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [conn1] replSet info voting yea for 2
Mon Aug  2 11:50:17 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
Mon Aug  2 11:50:27 [rs_sync] replSet SECONDARY

And on the second, this:

Mon Aug  2 11:50:17 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [rs Manager] replSet info electSelf 2
Mon Aug  2 11:50:17 [rs Manager] replSet PRIMARY
Mon Aug  2 11:50:27 [initandlisten] connection accepted from 127.0.0.1:61263 #5

Both nodes notice that the master has gone down and, as a result, a new primary node is elected. In this case, the node at port 27019 is promoted. If we bring the failed node on 27017 back online, it will come back up as a secondary.

Changing the replica set configuration

There are times when you'll want to change the replica set configuration. Suppose, for instance, that you want to make a member have priority zero, indicating the member should never be primary. To do this, you need to pass a new configuration object to the database's replSetReconfig command. The shell rs.reconfig() helper makes this easier.

One note: the reconfig command must be sent to the current primary of the set. This implies that you need a majority of the set up to perform a reconfiguration.

> // we should be primary.  can be checked with rs.status() or with:
> rs.isMaster();
> var c = rs.conf();
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019'}]
}
> c.members[2].priority = 0;
> c
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019', priority: 0}]
}
> rs.reconfig(c);
> //done. to see new config,and new status:
> rs.conf()
> rs.status()

Running with two nodes

Suppose you want to run replica sets with just two database servers (that is, have a replication factor of two). This is possible, but as replica sets perform elections, here a majority would be 2 out of 2, which is not helpful. Thus in this situation one normally also runs an arbiter on a separate server. An arbiter is a set member which has no data but gets to vote in elections. In the case here, the arbiter is the tie breaker in elections. Arbiters are very lightweight and can be run anywhere – say, on an app server or a micro vm. With an arbiter in place, the replica set will behave appropriately, recovering automatically during both network partitions and node failures (e.g., machine crashes).

You start up an arbiter just as you would a standard replica set node, as a mongod process with the --replSet option. However, when initiating, you need to include the arbiterOnly option in the config document.

With an arbiter, the configuration presented above would look like this instead:

config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019', arbiterOnly: true}]
           }

Drivers

Most of the MongoDB drivers are replica set aware. The driver when connecting takes a list of seed hosts from the replica set and can then discover which host is primary and which are secondary (the isMaster command is used internally by the driver for this).

With this complete set of potential master nodes, the driver can automatically find the new master if the current master fails. See your driver's documentation for specific details.
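
The discovery idea can be illustrated without any network access. The sketch below is a simplified simulation, not how any particular driver is implemented: find_primary, fake_ask, and the hand-written replies are assumptions for illustration, and the replies only mimic the ismaster, primary, and hosts fields of an isMaster response.

```python
def find_primary(seeds, ask):
    """Walk outward from the seed hosts until some member claims to be primary.

    `ask(host)` is a hypothetical callable that returns an isMaster-style
    dict, or raises OSError if the host is unreachable.
    """
    to_visit = list(seeds)
    seen = set()
    while to_visit:
        host = to_visit.pop(0)
        if host in seen:
            continue
        seen.add(host)
        try:
            reply = ask(host)
        except OSError:
            continue  # host is down; move on to the next candidate
        if reply.get("ismaster"):
            return host
        # Try the advertised primary first, then every other listed member.
        hint = reply.get("primary")
        if hint:
            to_visit.insert(0, hint)
        to_visit.extend(reply.get("hosts", []))
    return None


ALL_HOSTS = ["localhost:27017", "localhost:27018", "localhost:27019"]
# Hand-written replies standing in for live members; 27017 is "down".
REPLIES = {
    "localhost:27018": {"ismaster": False, "primary": "localhost:27019",
                        "hosts": ALL_HOSTS},
    "localhost:27019": {"ismaster": True, "hosts": ALL_HOSTS},
}


def fake_ask(host):
    if host not in REPLIES:
        raise OSError("unreachable: " + host)
    return REPLIES[host]


print(find_primary(["localhost:27017", "localhost:27018"], fake_ask))
# localhost:27019
```

The seed list here does not include the primary at all, which is the point: one reachable member is enough to learn about the rest.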

If you happen to be using the Ruby driver, you may want to check out Replica Sets in Ruby.

See also: http://mongodb.onconfluence.com/display/DOCSKR/Home (Korean manual)
http://groups.google.com/group/mongodb-kr (Korean MongoDB user group)

Posted by 솔라리스™

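Revisiting the Lee Ji-ah alien theory

Today's news 2011. 4. 22. 10:19 |

News reports say Seo Taiji and Lee Ji-ah married in 1997 and were husband and wife for 14 years,

and that they are now in the middle of a divorce suit.

I never believed the Lee Ji-ah vampire theory or alien theory, but..

Somehow this now goes beyond mystique and is downright scary.

★Revisiting the Lee Ji-ah alien theory★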

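Across the whole history of the Korean internet there are a few puzzles that have never been solved, such as "Strawberry Girl", who held a strawberry with a suggestive expression, and "Dog Poop Girl", who got off the subway without cleaning up after her dog. Strawberry Girl was more or less concluded to be a foreigner, and Dog Poop Girl is a private citizen, so little information is public and she is very hard to find. With celebrities, public figures, and entertainers, things are different. Because the whole country knows them, there is always someone who went to school with them, lives next door, or knows their parents, and those rumors are bound to spread.
But one problem remains permanently unsolved, and that is Lee Ji-ah. She is the only entertainer that DC Inside failed to track down. None of her basic personal details, such as real name, nationality, or education, have been made public. Her agency did release a profile at first, but when the school was asked, it said it had no such graduate, so the profile was judged false. A cable TV program tracked this down itself and confirmed it through school officials. What is funnier is that even after the education was judged fake, her Naver profile still lists that school, the agency has taken no follow-up action and has not re-confirmed her education, and when the wave of fake-degree scandals broke, Lee Ji-ah's case quietly slipped by. Heh.

What is even more baffling is that there is nothing Lee Ji-ah cannot do. She shows outstanding ability in languages in particular; below are her Japanese and her English. Both are at an advanced level, and however you look at it this is not what a year of language study abroad produces. Her pronunciation is so fluent that you would believe she was Japanese, or that she emigrated to the US early. (She does in fact claim that she emigrated to the US, and that she learned Japanese from Japanese friends she met there. Reaching this level by learning from Japanese kids in the US?) And she is not simply reciting: her expressions, the timing of her breathing, and her laughter are perfect.

You could think she simply lived in Japan and the US for a while. Just living there is not enough; this level takes very hard study. Still, you could think it possible if she lived in the US as a child and then lived and worked in Japan for a few years. But here another mystery is added: her musical ability.
And below is Lee Ji-ah playing bass guitar.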

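She actually joins in on a Dr.Core 911 song and plays bass. How many female players in Korea could do that? This is not a level you reach with a few months of practice...
Beyond that, she also sings and writes lyrics.
"Vampire Romance", a dreamy rock ballad composed by Showgy, who is active as the leader of the rock bands Dr.Core 911 and Sangsang Band, is a song whose lyrics stand out for carrying Lee Ji-ah's sincere feelings. Lee Ji-ah also played bass on it herself, taking part as a session musician.

Of Lee Ji-ah's appealing, husky voice, composer Showgy offered high praise: "She expresses restrained emotion in a languid voice, yet you also feel an intensity like the desert sun and a storm."

Lee Ji-ah, who released "Love Virus" in 2008 to raise funds for underprivileged children, last year wrote the lyrics for and sang "Cupcake and the Alien", a song featured in the telecinema "Nae Nune Kongkkakji" (내 눈에 콩깍지). Last November she sang at a combined fan meeting and exhibition and played bass guitar at a Dr.Core 911 solo concert, showing off her hidden musical talent.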

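She is said to have various other hobbies as well. She once said, in an account that was never verified, that she originally majored in design, and she reportedly does clothing and graphic design in practice.
This is said to be an outfit Lee Ji-ah designed herself..

This is said to be a web page Lee Ji-ah designed herself.

Beyond this, she reportedly picked up horseback riding, violin, and more because of her drama roles. Of course she is probably not at an expert level in these, but it is a remarkable ability all the same.

Lee Ji-ah made her debut in 2004 in an LG Telecom commercial, appearing opposite Bae Yong-joon. A debut in a telecom commercial, and opposite Bae Yong-joon at that: it is hard to imagine a more glamorous debut.

She also acts quite well and won an MBC new actress award in 2007. She has appeared in Taewangsasingi, Beethoven Virus, and Style, and is now busy filming Athena, the follow-up to Iris. She appeared in all of them in lead-level roles, so she has effectively played leads since her debut. She claims to have been born in 1981, but how is all of this possible? Does she spend her entire life learning things without ever eating or playing? Her face does not look older than someone born in 1981, and if she were older there would seem to be no reason for Bae Yong-joon to keep her on. So if she really was born in 1981, what on earth is she?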

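So squads of netizen detectives dug for several years, but could learn nothing. You would expect comments on online articles about Lee Ji-ah, yet there is no one who says they went to school with her, played with her as a child, or is a distant relative. There is not even the proverbial "church oppa", no elementary, middle, or high school teacher; the only thing known is that she is Bae Yong-joon's girlfriend. Even that was met with an uproar from the agency, which threatened lawsuits and denied it when the dating rumors and date sightings first appeared, and it is still officially denied. At some point, tracing from her language skills, a theory emerged that she had been working at a high-end hostess bar in Japan and was picked up by the agency's president, but that is not very convincing either. No one says they worked with her, and even if it were true there should be at least one friend who knows her, and there is no such person. Even allowing for plastic surgery, how can this be?
The interview in which Lee Ji-ah denies the dating rumors and says she emigrated after finishing elementary school can be seen below.

Interviews 2, 3, and 4 follow in the series, so give them a read. Especially number 4, the current-affairs quiz, where she also showed no small ability. See below. Honestly, for "liver cancer" I would have answered "stomach cancer" myself, and as for the "Battle of Myeongnyang", considering that she went abroad without studying Korean history in Korea, it is impressive that she even wrote "Battle of Hansan".

Pretty, smart, a good actress, fluent in English and Japanese, rides horses, and reportedly gifted at art too. How can such a perfect "mom's friend's daughter" exist in real life...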

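And so in the end netizens gave up the chase and began to define the identity of Lee Ji-ah, the woman sent from heaven, as follows.
1. The Bae Yong-joon-in-drag theory (look closely and she does resemble Bae Yong-joon a little.)
2. The avatar theory
3. The computer graphics theory (she has no real identity and simply appears on screen as graphics..)
4. The vampire theory (she has lived for hundreds of years, learning this and that along the way)
5. The Man from Earth theory (she is of the same kind as the man who has lived 20,000 years in the film The Man from Earth; only that would leave time to learn all of this)

Anyway, the conclusion is that Lee Ji-ah's identity is not known to the public. Some netizens are still searching, but I think all the information that will ever reach the internet is already there. And she remains shrouded in a veil without a single gap.

Lee Ji-ah.... just what are you?

Source: http://bbs1.telzone.daum.net/gaia/do/board/photo/read?bbsId=A000010&articleId=442854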

Posted by 솔라리스™

How FriendFeed uses MySQL to store schema-less data

Company stories?? 2011. 4. 12. 10:40 |

By Bret Taylor · February 27, 2009

Background

We use MySQL for storing all of the data in FriendFeed. Our database has grown a lot as our user base has grown. We now store over 250 million entries and a bunch of other data, from comments and "likes" to friend lists.

As our database has grown, we have tried to iteratively deal with the scaling issues that come with rapid growth. We did the typical things, like using read slaves and memcache to increase read throughput and sharding our database to improve write throughput. However, as we grew, scaling our existing features to accommodate more traffic turned out to be much less of an issue than adding new features.

In particular, making schema changes or adding indexes to a database with more than 10 - 20 million rows completely locks the database for hours at a time. Removing old indexes takes just as much time, and not removing them hurts performance because the database will continue to read and write to those unused blocks on every INSERT, pushing important blocks out of memory. There are complex operational procedures you can do to circumvent these problems (like setting up the new index on a slave, and then swapping the slave and the master), but those procedures are so error prone and heavyweight, they implicitly discouraged our adding features that would require schema/index changes. Since our databases are all heavily sharded, the relational features of MySQL like JOIN have never been useful to us, so we decided to look outside of the realm of RDBMS.

Lots of projects exist designed to tackle the problem of storing data with flexible schemas and building new indexes on the fly (e.g., CouchDB). However, none of them seemed widely-used enough by large sites to inspire confidence. In the tests we read about and ran ourselves, none of the projects were stable or battle-tested enough for our needs (see this somewhat outdated article on CouchDB, for example). MySQL works. It doesn't corrupt data. Replication works. We understand its limitations already. We like MySQL for storage, just not RDBMS usage patterns.

After some deliberation, we decided to implement a "schema-less" storage system on top of MySQL rather than use a completely new storage system. This post attempts to describe the high-level details of the system. We are curious how other large sites have tackled these problems, and we thought some of the design work we have done might be useful to other developers.

Overview

Our datastore stores schema-less bags of properties (e.g., JSON objects or Python dictionaries). The only required property of stored entities is id, a 16-byte UUID. The rest of the entity is opaque as far as the datastore is concerned. We can change the "schema" simply by storing new properties.

We index data in these entities by storing indexes in separate MySQL tables. If we want to index three properties in each entity, we will have three MySQL tables - one for each index. If we want to stop using an index, we stop writing to that table from our code and, optionally, drop the table from MySQL. If we want a new index, we make a new MySQL table for that index and run a process to asynchronously populate the index without disrupting our live service.

As a result, we end up having more tables than we had before, but adding and removing indexes is easy. We have heavily optimized the process that populates new indexes (which we call "The Cleaner") so that it fills new indexes rapidly without disrupting the site. We can store new properties and index them in a day's time rather than a week's time, and we don't need to swap MySQL masters and slaves or do any other scary operational work to make it happen.

Details

In MySQL, our entities are stored in a table that looks like this:

CREATE TABLE entities (
    added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;

The added_id column is present because InnoDB stores data rows physically in primary key order. The AUTO_INCREMENT primary key ensures new entities are written sequentially on disk after old entities, which helps for both read and write locality (new entities tend to be read more frequently than old entities since FriendFeed pages are ordered reverse-chronologically). Entity bodies are stored as zlib-compressed, pickled Python dictionaries.
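
As a rough illustration of that body encoding, here is a minimal Python 3 sketch. It is not FriendFeed's code: the names encode_body and decode_body and the choice of pickle protocol are assumptions, and the post does not say which protocol or compression level was used.

```python
import pickle
import zlib


def encode_body(entity):
    """Pickle the property dict, then compress it for the body column."""
    return zlib.compress(pickle.dumps(entity, protocol=pickle.HIGHEST_PROTOCOL))


def decode_body(blob):
    """Reverse of encode_body: decompress, then unpickle."""
    return pickle.loads(zlib.decompress(blob))


entity = {
    "title": "We just launched a new backend system for FriendFeed!",
    "published": 1235697046,
}
blob = encode_body(entity)      # bytes suitable for a MEDIUMBLOB column
assert decode_body(blob) == entity
```

Unpickling can execute arbitrary code, so this format is only appropriate for blobs the application wrote itself.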

Indexes are stored in separate tables. To create a new index, we create a new table storing the attributes we want to index on all of our database shards. For example, a typical entity in FriendFeed might look like this:

{
    "id": "71f0c4d2291844cca2df6f486e96e37c",
    "user_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "feed_id": "f48b0440ca0c4f66991c4d5f6a078eaf",
    "title": "We just launched a new backend system for FriendFeed!",
    "link": "http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}

We want to index the user_id attribute of these entities so we can render a page of all the entities a given user has posted. Our index table looks like this:

CREATE TABLE index_user_id (
    user_id BINARY(16) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (user_id, entity_id)
) ENGINE=InnoDB;

Our datastore automatically maintains indexes on your behalf, so to start an instance of our datastore that stores entities like the structure above with the given indexes, you would write (in Python):

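import binascii

import friendfeed.datastore  # FriendFeed's internal datastore module

# One Index object per index table; shard_on names the property that picks the shard.
user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")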
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index])

new_entity = {
    "id": binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"),
    "user_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "feed_id": binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"),
    "title": u"We just launched a new backend system for FriendFeed!",
    "link": u"http://friendfeed.com/e/71f0c4d2-2918-44cc-a2df-6f486e96e37c",
    "published": 1235697046,
    "updated": 1235697046,
}
# Write the entity, fetch it back by ID, then query it through the index.
datastore.put(new_entity)
entity = datastore.get(binascii.a2b_hex("71f0c4d2291844cca2df6f486e96e37c"))
entities = user_id_index.get_all(datastore, user_id=binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf"))

The Index class above looks for the user_id property in all entities and automatically maintains the index in the index_user_id table. Since our database is sharded, the shard_on argument is used to determine which shard the index gets stored on (in this case, entity["user_id"] % num_shards).
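
Since user_id is a 16-byte binary value rather than a number, the modulo needs the bytes turned into an integer first. The sketch below shows one way that could work; shard_for is a hypothetical name, and reading the bytes as a big-endian integer is an assumption, since the post does not say how the conversion is done.

```python
import binascii


def shard_for(value, num_shards):
    """Map a 16-byte ID to a shard number by reading the bytes as one big integer.

    Assumption: big-endian byte order. Any fixed convention works as long
    as every reader and writer agrees on it.
    """
    return int.from_bytes(value, "big") % num_shards


user_id = binascii.a2b_hex("f48b0440ca0c4f66991c4d5f6a078eaf")
print(shard_for(user_id, 2))  # 1, because the last byte (0xaf) is odd
```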

You can query an index using the index instance (see user_id_index.get_all above). The datastore code does the "join" between the index_user_id table and the entities table in Python, by first querying the index_user_id tables on all database shards to get a list of entity IDs and then fetching those entity IDs from the entities table.

To add a new index, e.g., on the link property, we would create a new table:

CREATE TABLE index_link (
    link VARCHAR(735) NOT NULL,
    entity_id BINARY(16) NOT NULL UNIQUE,
    PRIMARY KEY (link, entity_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

We would change our datastore initialization code to include this new index:

user_id_index = friendfeed.datastore.Index(
    table="index_user_id", properties=["user_id"], shard_on="user_id")
link_index = friendfeed.datastore.Index(
    table="index_link", properties=["link"], shard_on="link")
datastore = friendfeed.datastore.DataStore(
    mysql_shards=["127.0.0.1:3306", "127.0.0.1:3307"],
    indexes=[user_id_index, link_index])

And we could populate the index asynchronously (even while serving live traffic) with:

./rundatastorecleaner.py --index=index_link

Consistency and Atomicity

Since our database is sharded, and indexes for an entity can be stored on different shards than the entities themselves, consistency is an issue. What if the process crashes before it has written to all the index tables?

Building a transaction protocol was appealing to the most ambitious of FriendFeed engineers, but we wanted to keep the system as simple as possible. We decided to loosen constraints such that:

The property bag stored in the main entities table is canonical
Indexes may not reflect the actual entity values

Consequently, we write a new entity to the database with the following steps:

1. Write the entity to the entities table, using the ACID properties of InnoDB
2. Write the indexes to all of the index tables on all of the shards

When we read from the index tables, we know they may not be accurate (i.e., they may reflect old property values if writing has not finished step 2). To ensure we don't return invalid entities based on the constraints above, we use the index tables to determine which entities to read, but we re-apply the query filters on the entities themselves rather than trusting the integrity of the indexes:

1. Read the entity_id from all of the index tables based on the query
2. Read the entities from the entities table from the given entity IDs
3. Filter out (in Python) all of the entities that do not match the query conditions based on the actual property values
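
The read steps above can be sketched with plain Python data structures standing in for the MySQL tables. This is an illustration under stated assumptions, not the datastore's actual code: query_by_user is a made-up name, the index is a list of (user_id, entity_id) tuples, and the entities table is a dict keyed by entity ID.

```python
def query_by_user(user_id, index_rows, entities):
    """Return entities whose current user_id matches, using the index only as a hint."""
    # Read candidate entity IDs from the (possibly stale) index.
    candidate_ids = [eid for uid, eid in index_rows if uid == user_id]
    # Fetch the canonical entities; a stale row may point at nothing.
    fetched = [entities[eid] for eid in candidate_ids if eid in entities]
    # Re-apply the query condition against the real property values.
    return [e for e in fetched if e.get("user_id") == user_id]


entities = {
    "e1": {"id": "e1", "user_id": "alice"},
    "e2": {"id": "e2", "user_id": "bob"},  # used to be alice's; index row is stale
}
index_rows = [("alice", "e1"), ("alice", "e2"), ("alice", "e3")]  # e3 no longer exists

print([e["id"] for e in query_by_user("alice", index_rows, entities)])  # ['e1']
```

The stale rows cost an extra fetch but never produce a wrong result, which is the trade the loosened constraints make.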

To ensure that indexes are not missing perpetually and inconsistencies are eventually fixed, the "Cleaner" process I mentioned above runs continuously over the entities table, writing missing indexes and cleaning up old and invalid indexes. It cleans recently updated entities first, so inconsistencies in the indexes get fixed fairly quickly (within a couple of seconds) in practice.
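
A single repair pass of that kind might look like the following sketch. It is a simplification made for illustration: clean_index is a hypothetical name, the tables are in-memory stand-ins, and it scans everything at once instead of visiting recently updated entities first as the real Cleaner is described as doing.

```python
def clean_index(entities, index_rows, prop):
    """One full pass: compare index rows for `prop` against the canonical entities.

    Returns (rows_to_add, rows_to_delete) as sets of (value, entity_id) tuples.
    """
    wanted = {(e[prop], eid) for eid, e in entities.items() if prop in e}
    current = set(index_rows)
    return wanted - current, current - wanted


entities = {
    "e1": {"user_id": "alice"},
    "e2": {"user_id": "bob"},    # re-assigned after the index row was written
    "e3": {"user_id": "carol"},  # index write never happened
}
index_rows = [("alice", "e1"), ("alice", "e2")]

to_add, to_delete = clean_index(entities, index_rows, "user_id")
print(sorted(to_add))     # [('bob', 'e2'), ('carol', 'e3')]
print(sorted(to_delete))  # [('alice', 'e2')]
```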

Performance

We have optimized our primary indexes quite a bit in this new system, and we are quite pleased with the results. Here is a graph of FriendFeed page view latency for the past month (we launched the new backend a couple of days ago, as you can tell by the dramatic drop):

In particular, the latency of our system is now remarkably stable, even during peak mid-day hours. Here is a graph of FriendFeed page view latency for the past 24 hours:

Compare this to one week ago:

The system has been really easy to work with so far. We have already changed the indexes a couple of times since we deployed the system, and we have started converting some of our biggest MySQL tables to use this new scheme so we can change their structure more liberally going forward.

Posted by 솔라리스™

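The story of Twitter and FriendFeed

Company stories?? 2011. 4. 12. 10:11 |

It may not be well known in Korea, but overseas a "social messaging" service called Twitter is very popular. In Korea it is known as the original behind Playtalk and Me2day, which are called "micro-blogs", but I have a different reason for calling it "social messaging".

I have previously interpreted Web 2.0 as the cultural result of people participating directly on the internet thanks to broadband. Accordingly, the (internet) cultural phenomena that appeared in Korea are appearing overseas as well. I see MySpace and Facebook as driven by the same forces as I Love School and Cyworld, and the spread of blogs is likewise the result of personal participatory media such as OhmyNews.

So what is Twitter? It is not a blog knock-off, and I was quite curious how it would evolve. As it turns out, I think it is being used as a real-time messaging platform. Frankly, in Korea too, email use dropped sharply as SMS and messenger use grew, and there was a cultural shift into an era of accelerated communication. Now that leaving a messenger or SMS message unanswered has its own slang ("ssipneunda", roughly "to blank someone"), life has become that much harsher.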

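The limits of a social messaging platform
People keep Twitter open instead of email and stay in touch by trading replies. That is not to say they do not use messengers, but because web access has become so easy, they frankly prefer web-based services to closed buddy networks that come with the hassle of installing a program.

But once Twitter began to be used as a messaging platform, it ran into difficulty. It fell into the dilemma of what happens when people all over the world talk together. When messaging is combined with a web service, the traffic demands data-processing capacity beyond imagination. It is a problem no one had faced before.

According to provisional figures compiled by TechCrunch in March of this year, there are several million registered users, more than 200,000 people post at least one message every week, and 3 million messages go back and forth per day.

Frankly, this kind of surge is more than Twitter, a startup service that became a surprise hit, is built to bear. Only a few months ago there was even a rumor(?) that Twitter's DB setup was so fragile that someone always had to sit at the machine and manually bring up the slave DB whenever the master DB failed. In fact Twitter's outages were so frequent that they truly infuriated people.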

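Is opening up the alternative? But…
From late last year through the first half of this year Twitter went through countless outages, feature shutdowns, and restarts; its technical architect Blaine Cook left the company, and it is now working hard to solve the problem. But people are already turning away.

Technology advisers say Twitter should change to a decentralized instant messaging system, and some go as far as arguing it should be turned into a public good like the domain name service.

In general, once a service like Twitter reaches a certain critical mass of users, network effects kick in and the influx of new sign-ups inevitably causes outages. Faraday Media and Chris Saad of DataPortability advised, as a fix, decentralizing the external third-party applications (Twitterific, AlertThingy, Twhirl, and so on), messengers, SMS, and the like, which account for a large share of Twitter traffic.

On top of that, the identi.ca service has gone ahead and released a Twitter-like service as open source software called laconi.ca. Install this program and you can easily build a social messaging site like Twitter. It currently uses a Creative Commons license, so many well-known people involved with Creative Commons use it.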

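FriendFeed slips into the gap
At times like this people look for alternatives. One person's misfortune is another's opportunity, and this is where FriendFeed comes in. The service is a kind of social service syndication built by former Google developers, implementing an idea that already existed at places like Plaxo.

It lists the data from the web services you are active on, such as your blog, Twitter, Flickr, and YouTube, in one place so that you and your friends can share what each of you is up to. Frankly, when it first came out it got almost no attention.

Then FriendFeed added the ability to comment directly on each item pulled in from outside, layering messaging on top of syndication, and controversy followed. The dispute was whether a comment I write belongs to FriendFeed or should have gone to the original service.

Frankly, a syndication service should stick to syndication; letting people communicate there is like Allblog running its own comment service, so it is a problem. The service most directly affected was Twitter, since 40% of FriendFeed's content was messages pulled from Twitter.

In fact, FriendFeed usage has been rising rapidly since the first half of this year, when Twitter was frequently down and FriendFeed began offering replies. Notably, time spent on the site is increasing, and there is no disagreement that most of these users are active users who came from Twitter.

It is not yet the case that FriendFeed has fully replaced Twitter, or that there is a rush of Twitter users leaving. Thanks to its fame, users keep flowing into Twitter despite the unstable service. It is especially popular in Japan as well.

What is true, though, is that competition among social network sites is becoming extremely fierce. Facebook recently redesigned its pages, and the result looks almost like FriendFeed. Looking at the new FriendFeed beta released today, it closely resembles Facebook. With everyone copying everyone else, the UI standard for social messaging, or social syndication, should be considered more or less settled.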

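In the end, a strong technical foundation matters
Frankly, Web 2.0 talk of openness and participation is all nonsense, and the US social network business is now in a state of cutthroat competition where everyone is itching to kill everyone else. These companies, taking Web 2.0's success as their model, have done well at grafting on the technical success factor of "becoming an open platform", but the fierce competition is dimming it.

Successful young founders pay little mind to things like OpenID or DataPortability. In the end, the services that succeed will write their own history.

Even Google, which successfully turned the web into a platform with search technology no one can imitate, must find these inward-looking (closed) services disagreeable. It also feels compelled to claim some place in the explosively growing social network market, and a fine philosophy like OpenSocial, which came out of that, looks as if it is being buried by services that resemble the thoughtless pranks of youngsters. They even look as frustrating as kids who will not listen to their elders.

Service rivalries like this, six of one and half a dozen of the other, where sparkling ideas meet cultural codes and try to strike a spark, are an everyday occurrence in the world of "product planning". But the answer to the question raised by the trouble between Twitter and FriendFeed, "what would it take for people all over the world to exchange messages at the same time?", does not lie in planning. It lies in technology.

Microsoft building an operating system used by people all over the world, Google gathering the world's information, and Amazon offering to rent computers to the whole world all rest on the competitiveness of their own technology. This shows that the ability to secure technology is truly important, not only for short-term service expansion but for the long-term survival of an internet company.

As the Twitter example shows, we can see once again how much technical capability matters at the stage where an idea is implemented and released as a global service. We should also reflect on our own engineering, where instead of answering the fundamental question we just throw money (hardware) at a problem when things go wrong. And if there are developers around you, a word of encouragement would be welcome.

Continued in… Twitter and FriendFeed, part two: Will Scoble come back to blogging?

I am writing this in the small hours, so it rambles a bit. Please do not take it too seriously; read it as a story about what is going on in the wider world.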

http://channy.creation.net/blog/550

Posted by 솔라리스™
