Posts filed under 'NoSQL' (3)

  1. 2011-08-10  Web Technology Stack [Analysis]
  2. 2011-07-08  Setting up replica sets with MongoDB 1.6
  3. 2011-07-08  Replica Set Tutorial

Web Technology Stack [Analysis]

NoSQL | 2011-08-10 13:25

Ever wondered what technologies are used by large web applications that have millions of unique visitors and handle thousands of requests per second? Which programming languages make it happen and carry that peak load? We were curious and thought we'd figure out what sits beneath the nice slick interface and who is handling the business logic efficiently. Here is what we found: a compiled list of the technology stacks used at various web applications.

 
Product stacks (front end / back end / database / others):

Twitter
  Front end: Ruby on Rails (RoR), JavaScript, jQuery, LabJS, Modernizr, JSON-P, oEmbed
  Back end: Scala
  Database: Cassandra
  Others: Java, C, Python, Mustache templating language

Facebook
  Front end: PHP, XHP, HipHop for PHP, JavaScript
  Back end: C, C++, Java
  Database: Cassandra, MySQL
  Others: Python, Erlang

LinkedIn
  Front end: JSP, Apache Coyote web server
  Back end: Spring MVC, LinkedIn Spring, Grails
  Database: Oracle and MySQL
  Others: ActiveMQ for JMS, Lucene as a foundation for search, DWR, Jetty, Ehcache, Quartz, Spring remoting

Yahoo! Mail
  Front end: HTML, CSS, JavaScript (with YUI 3)
  Back end: PHP
  Database: MySQL
  Others: Apache Traffic Server (formerly known as Yahoo! Traffic Server)

Google+
  Front end: Closure framework, including Closure's JavaScript compiler and template system; HTML5 History API
  Back end: Closure templates server-side, C++, Java, or Python
  Database: BigTable and Colossus/GFS
  Others: MapReduce

FourSquare
  Front end: Scala (Lift framework)
  Back end: Scala; Lift, a web framework written in Scala
  Database: MongoDB
  Others: Amazon S3 for hosting (the /img/ folder is served by nginx directly); load balancer(s): nginx/0.8.52

YouTube
  Front end: Python
  Back end: psyco, a dynamic Python-to-C compiler
  Database: MySQL

Quora
  Front end: Python and JavaScript
  Back end: LiveNode/webnode2, Thrift (to communicate with the back end)
  Database: MySQL + memcached
  Others: C++; Amazon EC2 and S3 for hosting; load balancing: nginx in front of HAProxy

Viddler
  Front end: PHP, Python
  Back end: Rails 2.x, ffmpeg/mencoder/x264lib, Java 1.6 / Spring / Hibernate / Ehcache, Erlang
  Database: MySQL 5.0, Hadoop HDFS (distributed video source storage)
  Others: Amazon EC2 and S3 for hosting; Nginx/Keepalived (load balancers for all web traffic); Wowza (RTMP video recording server); MediaInfo (reporting video metadata); Yamdi (metadata injector for Flash videos); Puppet (configuration management); Logcheck (log scanning); Duplicity (backup)

Stack Overflow
  Front end: jQuery, ASP.NET
  Back end: C#, Microsoft ASP.NET (version 4.0), ASP.NET MVC 3, Razor
  Database: LINQ to SQL, some raw SQL Server
  Others: HAProxy (load balancing), Bacula (backup), Redis (caching layer)

Disqus
  Front end: jQuery, EasyXDM, Sammy, Flot, Raphaël, JSHint
  Back end: Python scripts, Django, Celery, South
  Database: PostgreSQL, memcached
  Others: HAProxy + heartbeat (load balancing)

 

In Short..

[Chart: Database distribution]

[Chart: Back-end technology distribution]

Conclusion

The current trends lean mainly toward jQuery for front-end development, alongside Python and Scala. Some companies use Microsoft technologies, but the percentage of such companies is small. MySQL and Cassandra have been the favorite databases, although a few sites are moving to MongoDB, a NoSQL database. The back ends of these high-scale sites run on a variety of technologies such as Django, Python, RoR, Closure, C++, Java, and Scala.

I've written this post to give insight into which technology combinations are preferred by websites with a high user base; you might want to consider these points while choosing platforms and programming languages for your start-up.

Like this article, or have something to say? Share your opinions in the comments.


http://www.tutkiun.com/2011/07/web-technology-stack-analysis.html

Posted by 솔라리스™

Setting up replica sets with MongoDB 1.6

Introduction

MongoDB 1.6 was released today, and it includes, among other things, support for the incredibly sexy replica sets feature: basically master/slave replication on crack, with automatic failover and the like. I'm setting it up, and figured I'd document the pieces as I walk through them.

My test deploy is going to consist of two nodes and one arbiter; production will have several more potential nodes. We aren’t worrying about sharding at this point, but 1.6 brings automatic sharding with it, as well, so we can enable that at a later point if we need to.

Installation

Installation is very easy. 10gen offers a yum repo, so it’s as easy as adding the repo to /etc/yum.repos.d and then running yum install mongo-stable mongo-server-stable.

Once installed, mongo --version confirms that we’re on 1.6. Time to boot up our nodes.

Configuration

For staging, we’re going to run both replica nodes and the arbiter on a single machine. This means 3 configs.

I have 3 config files in /etc/mongod/: mongo.node1.conf, mongo.node2.conf, and mongo.arbiter.conf, as follows:

# mongo.node1.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node1.log
port = 27017
logappend=true
dbpath=/var/lib/mongo/node1
fork = true
rest = true
 
# mongo.node2.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node2.log
port = 27018
logappend=true
dbpath=/var/lib/mongo/node2
fork = true
 
# mongo.arbiter.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.arbiter.log
port = 27019
logappend=true
dbpath=/var/lib/mongo/arbiter
fork = true
oplogSize = 1

Starting it up

Then we just fire up our daemons:

mongod -f /etc/mongod/mongo.node1.conf
mongod -f /etc/mongod/mongo.node2.conf
mongod -f /etc/mongod/mongo.arbiter.conf

Once we spin up the servers, they need a bit to allocate files and start listening. I tried to connect a bit too early, and got the following:

[root@261668-db3 mongo]# mongo
MongoDB shell version: 1.6.0
connecting to: test
Fri Aug  6 03:48:40 Error: couldn't connect to server 127.0.0.1} (anon):1137
exception: connect failed

Configuring replica set members

Once you can connect to the mongo console, we need to set up the replica set. If you have a compliant configuration, you can just call rs.initiate() and everything will get spun up. If you don't, though, you'll need to specify your initial configuration.

This is where I hit my first problem: the hostname as the system defines it didn't resolve. This resulted in the following:

[root@261668-db3 init.d]# mongo --port 27017
MongoDB shell version: 1.6.0
connecting to: 127.0.0.1:27017/test
> rs.initiate();
{
        "info2" : "no configuration explicitly specified -- making one",
        "errmsg" : "couldn't initiate : need members up to initiate, not ok : 261668-db3.db3.domain.com:27017",
        "ok" : 0
}

The solution, then, is to specify the members, and to use a resolvable internal name. Note that you do NOT include the arbiter's information here; you don't want it added to the replica set this early as a full-fledged member.

> cfg = {_id: "my_replica_set", members: [{_id: 0, host: "db3:27017"}, {_id: 1, host: "db3:27018"}] }
> rs.initiate(cfg);
{
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}

Bingo. We’re in business.

Configuring the replica set arbiter

If the replica set master fails, a new master is elected. To be elected, a member needs at least floor(n / 2) + 1 votes, where n is the number of active nodes in the cluster. In a paired setup, if the master were to fail, the remaining slave wouldn't be able to elect itself as the new master, since it would only have 1 of the 2 required votes. Thus, we run an arbiter: a special lightweight node that holds no data and whose only job is to be a tiebreaker. It will vote with the orphaned slave and elect it as the new master, so that the slave can continue its duties while the old master is offline.

> rs.addArb("db3:27019")
{
        "startupStatus" : 6,
        "errmsg" : "Received replSetInitiate - should come online shortly.",
        "ok" : 0
}

Updated driver usage

Once we’re set up, the Ruby Mongo connection code is updated to connect to a replica set rather than a single server.

Before:

MongoMapper.connection = Mongo::Connection.new("db3", 27017)

After:

MongoMapper.connection = Mongo::Connection.multi([["db3", 27017], ["db3", 27018]])

This will attempt to connect to each of the defined servers, and get a list of all the visible nodes, then find the master. Since you don’t have to specify the full list, you don’t have to update your connection info each time you change the machines in the set. All it needs is at least one connectable server (even a slave) and the driver will figure out the master from there.
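Under the hood, that discovery relies on the isMaster command. If you're curious, you can ask any member for the same information yourself from the mongo shell; a quick illustration (purely optional, nothing the driver requires you to do):

> // Ask this member which node is primary and which hosts belong to the set.
> // The reply includes fields such as "ismaster", "hosts", and "primary".
> db.isMaster()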

Conclusion

That’s about all there is to it! We’re now up and running with a replica set. We can add new slaves to the replica set, force a new master, take nodes in the cluster down, and all that jazz without impacting your app. You can even set up replica slaves in other data centers for zero-effort offsite backup. If your DB server exploded, you could point your app at the external datacenter’s node and keep running while you replace your local database server. Once your new server is up, just bring it online and re-add its node back into your replica set. Data will be transparently synched back to your local node. Once the sync is complete, you can re-elect your local node as the master, and all is well again.
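For reference, here's a rough sketch of the shell helpers behind those maintenance operations; the host name below is hypothetical, and the exact helpers available can vary with your MongoDB version:

> rs.add("db4:27017")    // bring a new member (hypothetical host) into the set
> rs.stepDown()          // ask the current primary to step down, forcing a new election
> rs.status()            // watch member states while nodes join, leave, or fail over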

Congratulations – enjoy your new replica set!

Source: http://www.coffeepowered.net/2010/08/06/setting-up-replica-sets-with-mongodb-1-6/

Posted by 솔라리스™

Replica Set Tutorial

NoSQL/mongoDB | 2011-07-08 15:55

This tutorial will guide you through the basic configuration of a replica set. Since the tutorial is meant as an example that is easy to try, it runs several mongod processes on a single machine (in the real world you would use several machines). If you are attempting to deploy replica sets in production, be sure to read the replica set documentation. Replica sets are available in MongoDB v1.6+.

Introduction

A replica set is a group of n mongod nodes (members) that work together. The goal is that each member of the set has a complete copy (replica) of the data from the other nodes.

Setting up a replica set is a two-step process that requires starting each mongod process and then formally initiating the set. Here, we'll be configuring a set of three nodes, which is standard.

Once the mongod processes are started, we will issue a command to initialize the set. After a few seconds, one node will be elected master, and you can begin writing to and querying the set.

Starting the nodes

First, create a separate data directory for each of the nodes in the set. In a real environment with multiple servers we could use the default /data/db directory if we wanted to, but on a single machine we will have to set up non-defaults:

$ mkdir -p /data/r0
$ mkdir -p /data/r1
$ mkdir -p /data/r2

Next, start each mongod process with the --replSet parameter. The parameter requires that you specify a logical name for our replica set. Let's call our replica set "foo". We'll launch our first node like so:

$ mongod --replSet foo --port 27017 --dbpath /data/r0

Let's now start the second and third nodes:

$ mongod --replSet foo --port 27018 --dbpath /data/r1
$ mongod --replSet foo --port 27019 --dbpath /data/r2

You should now have three nodes running. At this point, each node should be printing the following warning:

Mon Aug  2 11:30:19 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)

We can't use the replica set until we've initiated it, which we'll do next.

Initiating the Set

We can initiate the replica set by connecting to one of the members and running the replSetInitiate command (that is, rs.initiate() in the mongo shell). This command takes a configuration object that specifies the name of the set and each of the members.

The replSetInitiate command may be sent to any member of an uninitiated set. However, only the member performing the initiation may have any existing data. This data becomes the initial data for the set. The other members will begin synchronizing and receiving that data (if present; starting empty is fine too). This is called the "initial sync". Secondaries will not be online for reads (in state 2, "SECONDARY") until their initial sync completes.

Note: the replication oplog (in the local database) is allocated at initiation time. The oplog can be quite large, thus initiation may take some time.

$ mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.help(); // if you are curious run this (optional)
>
> config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019'}]
           }
> rs.initiate(config);
{
   "info" : "Config now saved locally.  Should come online in about a minute.",
   "ok" : 1
}

We specify the config object and then pass it to rs.initiate(). Then, if everything is in order, we get a response saying that the replica set will be online in a minute. During this time, one of the nodes will be elected master.
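Because the oplog is allocated at initiation time (see the note above), this is also a convenient moment to check how much space it was given; a minimal sketch, assuming your shell provides the printReplicationInfo() helper:

> db.printReplicationInfo()   // prints the configured oplog size and the time window it currently covers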

To check the status of the set, run rs.status():

> rs.status()
{
	"set" : "foo",
	"date" : "Mon Aug 02 2010 11:39:08 GMT-0400 (EDT)",
	"myState" : 1,
	"members" : [
		{
			"name" : "arete.local:27017",
			"self" : true,
		},
		{
			"name" : "localhost:27019",
			"health" : 1,
			"uptime" : 101,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		},
		{
			"name" : "localhost:27018",
			"health" : 1,
			"uptime" : 107,
			"lastHeartbeat" : "Mon Aug 02 2010 11:39:07 GMT-0400",
		}
	],
	"ok" : 1
}

You'll see that the other members of the set are up. You may also notice that the myState value is 1, indicating that we're connected to the member which is currently primary; a value of 2 indicates a secondary.

You can also check the set's status in the HTTP admin UI, which each mongod serves on its port number plus 1000 (for example, http://localhost:28017/ for the node on 27017) when the HTTP interface is enabled.

Replication

Go ahead and write something to the master node:

  db.messages.insert({name: "ReplSet Tutorial"});

If you look at the logs on the secondary nodes, you'll see the write replicated.
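You can also confirm the write from a secondary directly rather than reading the logs; a minimal sketch (port 27018 is one of this tutorial's secondaries, and the connection must be marked slave-ok before it will serve reads):

$ mongo localhost:27018
> db.getMongo().setSlaveOk()   // allow reads on this secondary connection
> db.messages.find()           // the inserted document should appear once replicated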

Failover

The purpose of a replica set is to provide automated failover. This means that, if the primary node goes down, a secondary node can take over. When this occurs the set members which are up perform an election to select a new primary. To see how this works in practice, go ahead and kill the master node with Control-C (^C) (or if running with --journal, kill -9 would be ok too):

^CMon Aug  2 11:50:16 got kill or ctrl c or hup signal 2 (Interrupt), will terminate after current cmd ends
Mon Aug  2 11:50:16 [interruptThread] now exiting
Mon Aug  2 11:50:16  dbexit: 

If you look at the logs on the secondaries, you'll see a series of messages indicating fail-over. On our first slave, we see this:

Mon Aug  2 11:50:16 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [conn1] replSet info voting yea for 2
Mon Aug  2 11:50:17 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
Mon Aug  2 11:50:27 [rs_sync] replSet SECONDARY

And on the second, this:

Mon Aug  2 11:50:17 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug  2 11:50:17 [rs Manager] replSet info electSelf 2
Mon Aug  2 11:50:17 [rs Manager] replSet PRIMARY
Mon Aug  2 11:50:27 [initandlisten] connection accepted from 127.0.0.1:61263 #5

Both nodes notice that the master has gone down and, as a result, a new primary node is elected. In this case, the node at port 27019 is promoted. If we bring the failed node on 27017 back online, it will come back up as a secondary.

Changing the replica set configuration

There are times when you'll want to change the replica set configuration. Suppose, for instance, that you want to make a member have priority zero, indicating the member should never be primary. To do this, you need to pass a new configuration object to the database's replSetReconfig command. The shell rs.reconfig() helper makes this easier.

One note: the reconfig command must be sent to the current primary of the set. This implies that you need a majority of the set up to perform a reconfiguration.

> // we should be primary.  can be checked with rs.status() or with:
> rs.isMaster();
> var c = rs.conf();
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019'}]
}
> c.members[2].priority = 0;
> c
{_id: 'foo', members: [
                       {_id: 0, host: 'localhost:27017'},
                       {_id: 1, host: 'localhost:27018'},
                       {_id: 2, host: 'localhost:27019', priority: 0}]
}
> rs.reconfig(c);
> // done. To see the new config and new status:
> rs.conf()
> rs.status()

Running with two nodes

Suppose you want to run a replica set with just two database servers (that is, have a replication factor of two). This is possible, but since replica sets perform elections, a majority here would be 2 out of 2, which is not helpful. Thus in this situation one normally also runs an arbiter on a separate server. An arbiter is a set member which has no data but gets to vote in elections; in this case, the arbiter is the tiebreaker. Arbiters are very lightweight and can be run anywhere, say, on an app server or a micro VM. With an arbiter in place, the replica set will behave appropriately, recovering automatically during both network partitions and node failures (e.g., machine crashes).

You start up an arbiter just as you would a standard replica set node, as a mongod process with the --replSet option. However, when initiating, you need to include the arbiterOnly option in the config document.

With an arbiter, the configuration presented above would look like this instead:

config = {_id: 'foo', members: [
                          {_id: 0, host: 'localhost:27017'},
                          {_id: 1, host: 'localhost:27018'},
                          {_id: 2, host: 'localhost:27019', arbiterOnly: true}]
           }

Drivers

Most of the MongoDB drivers are replica set aware. When connecting, the driver takes a list of seed hosts from the replica set and can then discover which host is the primary and which are secondaries (the isMaster command is used internally by the driver for this).

With this complete set of potential master nodes, the driver can automatically find the new master if the current master fails. See your driver's documentation for specific details.

If you happen to be using the Ruby driver, you may want to check out Replica Sets in Ruby.

References: http://mongodb.onconfluence.com/display/DOCSKR/Home (Korean-language manual)
            http://groups.google.com/group/mongodb-kr (Korean MongoDB user group)
 



Posted by 솔라리스™