Ever wondered what technologies power large web applications that
have millions of unique visitors and handle thousands of requests per
second? Which programming languages make that possible under peak
load? We were curious about what lies beneath the slick interfaces and
what handles the business logic efficiently. Here is what we found: a
compiled list of the technology stacks used by various web applications.
Product: Twitter
Front end: Ruby on Rails (RoR), JavaScript, jQuery, LabJS, Modernizr, JSON-P, oEmbed
Back end: Scala
Database: Cassandra
Others: Java, C, Python, Mustache templating language

Product: Facebook
Front end: PHP, XHP, HipHop for PHP, JavaScript
Back end: C, C++, Java
Database: Cassandra, MySQL
Others: Python, Erlang

Product: LinkedIn
Front end: JSP, Apache Coyote web server
Back end: Spring MVC, LinkedIn Spring, Grails
Database: Oracle and MySQL
Others: ActiveMQ for JMS, Lucene as a foundation for search, DWR, Jetty, Ehcache, Quartz, Spring Remoting

Product: Yahoo! Mail
Front end: HTML, CSS, JavaScript (with YUI 3)
Back end: PHP
Database: MySQL
Others: Apache Traffic Server (formerly known as Yahoo! Traffic Server)

Product: Google+
Front end: Closure framework, including Closure's JavaScript compiler and template system, HTML5 History API
Back end: Closure templates server-side, C++, Java, or Python
Database: BigTable and Colossus/GFS
Others: MapReduce

Product: Foursquare
Front end: Scala (Lift framework)
Back end: Scala
Database: MongoDB
Others: Amazon S3 for hosting; /img/ folder served directly by nginx; load balancer(s): nginx/0.8.52; Lift, a web framework written in Scala

Product: YouTube
Front end: Python
Back end: psyco, a dynamic Python-to-C compiler
Database: MySQL

Product: Quora
Front end: Python and JavaScript
Back end: LiveNode/webnode2, Thrift (communication with the back end)
Database: Hadoop HDFS (distributed video source storage)
Others: Nginx/Keepalived (load balancers for all web traffic), Wowza (RTMP video recording server), Mediainfo (reporting video metadata), Yamdi (metadata injector for Flash videos), Puppet (configuration management), Logcheck (log scanning), Duplicity (backup)

Product: Stack Overflow
Front end: jQuery, ASP.NET
Back end: C#, Microsoft ASP.NET (version 4.0), ASP.NET MVC 3, Razor
The current trends in front-end development are mainly jQuery,
Python, and Scala. Some companies use Microsoft technologies, but their
share is very small. MySQL and Cassandra have been the favorite
databases, though a few companies are moving to MongoDB, which is a
NoSQL database. The back ends of these high-scale sites run on a variety
of technologies such as Django, Python, RoR, Closure, C++, Java, and
Scala.
I've written this post to give some insight into which technology
combinations are preferred by websites with a large user base. You might
want to consider these points while choosing platforms and programming
languages for your start-up.
Like this article, or have something to say? Share your opinions in the comments.
MongoDB 1.6 was released today, and among other things it includes support for the incredibly sexy replica sets feature:
basically master/slave replication on crack, with automatic failover
and the like. I'm setting it up, and figured I'd document the pieces as I
walk through them.
My test deploy is going to consist of two nodes and one arbiter;
production will have several more potential nodes. We aren’t worrying
about sharding at this point, but 1.6 brings automatic sharding with it,
as well, so we can enable that at a later point if we need to.
Installation
Installation is very easy. 10gen offers a yum repo, so it’s as easy as adding the repo to /etc/yum.repos.d and then running yum install mongo-stable mongo-server-stable.
Once installed, mongo --version confirms that we’re on 1.6. Time to boot up our nodes.
Configuration
For staging, we’re going to run both replica nodes and the arbiter on a single machine. This means 3 configs.
I have 3 config files in /etc/mongod/ – mongo.node1.conf, mongo.node2.conf, and mongo.arbiter.conf. As follows:
# mongo.node1.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node1.log
port = 27017
logappend=true
dbpath=/var/lib/mongo/node1
fork = true
rest = true
# mongo.node2.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.node2.log
port = 27018
logappend=true
dbpath=/var/lib/mongo/node2
fork = true
# mongo.arbiter.conf
replSet=my_replica_set
logpath=/var/log/mongo/mongod.arbiter.log
port = 27019
logappend=true
dbpath=/var/lib/mongo/arbiter
fork = true
oplogSize = 1
Starting it up
Then we just fire up our daemons:
mongod -f /etc/mongod/mongo.node1.conf
mongod -f /etc/mongod/mongo.node2.conf
mongod -f /etc/mongod/mongo.arbiter.conf
Once we spin up the servers, they need a bit to allocate files and
start listening. I tried to connect a bit too early, and got the
following:
[root@261668-db3 mongo]# mongo
MongoDB shell version: 1.6.0
connecting to: test
Fri Aug 6 03:48:40 Error: couldn't connect to server 127.0.0.1} (anon):1137
exception: connect failed
Configuring replica set members
Once you can connect to the mongo console, you need to set up the
replica set. If you have a compliant configuration, you can just
call rs.initiate() and everything will get spun up. If you don't, though, you'll need to specify your initial configuration.
This is where I hit my first problem; the hostname as the system defines it didn’t resolve. This was resulting in the following:
[root@261668-db3 init.d]# mongo --port 27017
MongoDB shell version: 1.6.0
connecting to: 127.0.0.1:27017/test
> rs.initiate();
{
"info2": "no configuration explicitly specified -- making one",
"errmsg": "couldn't initiate : need members up to initiate, not ok : 261668-db3.db3.domain.com:27017",
"ok": 0
}
The solution, then, is to specify the members, and to use a
resolvable internal name. Note that you do NOT include the arbiter’s
information; you don’t want to add it to the replica set early as a
full-fledged member.
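Here's a minimal sketch of what that explicit configuration can look like in the mongo shell. The set name matches the replSet value from the config files above; the host name db3 is an assumption (a short internal name, the same one used for the arbiter further down), so swap in whatever name actually resolves for you. The variable name initialConfig is just for illustration.

```javascript
// Explicit replica set configuration: two data-bearing members, no arbiter yet.
// "db3" is an assumed, internally resolvable host name; replace it with your own.
var initialConfig = {
  _id: "my_replica_set",            // must match replSet= in the config files
  members: [
    { _id: 0, host: "db3:27017" },  // node1
    { _id: 1, host: "db3:27018" }   // node2
  ]
};

// rs and printjson only exist inside the mongo shell; the guard lets the
// snippet load in other JavaScript environments without failing.
if (typeof rs !== "undefined") {
  printjson(rs.initiate(initialConfig));
}
```

With a resolvable member list, rs.initiate() reports success, as in the response that follows.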
"info": "Config now saved locally. Should come online in about a minute.",
"ok": 1
}
Bingo. We’re in business.
Configuring the replica set arbiter
If the replica set master fails, a new master is elected. To be elected, a node needs at least floor(n / 2) + 1 votes, where n
is the number of voting members configured in the set, whether or not
they are currently up. In a paired setup, if the master were to fail,
the remaining slave wouldn't be able to elect itself as the new master,
since it would have only 1 of the 2 votes it needs. Thus, we run
an arbiter, a special lightweight node that holds no data and
whose only job is to be a tiebreaker. It will vote with the orphaned
slave and elect it as the new master, so that the slave can continue
duties while the old master is offline.
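To make the arithmetic concrete, here's a small sketch in plain JavaScript. It assumes n counts every voting member configured in the set, including ones that are currently down; the function names votesNeeded and canElect are made up for this example.

```javascript
// Votes needed to win an election: a strict majority of all voting members
// in the set's configuration, counting members that are currently down.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still up elect a master among themselves?
function canElect(votingMembers, membersUp) {
  return membersUp >= votesNeeded(votingMembers);
}

// Pair without an arbiter: 2 voters, master down, 1 left, 2 votes needed.
var pairAlone = canElect(2, 1);        // false
// Pair plus arbiter: 3 voters, master down, 2 left, 2 votes needed.
var pairWithArbiter = canElect(3, 2);  // true
```

The arbiter adds a vote without adding a copy of the data, which is why it is enough to get a two-node setup through the loss of one data node.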
> rs.addArb("db3:27019")
{
"startupStatus": 6,
"errmsg": "Received replSetInitiate - should come online shortly.",
"ok": 0
}
Updated driver usage
Once we're set up, the Ruby Mongo connection code has to be updated to connect to a replica set rather than a single server, by giving the driver a list of servers instead of one host.
The driver will attempt to connect to each of the defined servers, get a
list of all the visible nodes, then find the master. Since you don't
have to specify the full list, you don't have to update your connection
info each time you change the machines in the set. All it needs is at
least one connectable server (even a slave) and the driver will figure
out the master from there.
Conclusion
That's about all there is to it! We're now up and running with a
replica set. You can add new slaves to the replica set, force a new
master, take nodes in the cluster down, and all that jazz without
impacting your app. You can even set up replica slaves in other data
centers for zero-effort offsite backup. If your DB server exploded, you
could point your app at the external datacenter’s node and keep running
while you replace your local database server. Once your new server is
up, just bring it online and re-add its node back into your replica set.
Data will be transparently synched back to your local node. Once the
sync is complete, you can re-elect your local node as the master, and
all is well again.
This tutorial will guide you through the
basic configuration of a replica set. Given the tutorial is an example and
should be easy to try, it runs several mongod processes on a single machine (in
the real world one would use several machines). If you are attempting to deploy
replica sets in production, be sure to read the replica set documentation. Replica sets are
available in MongoDB V1.6+.
Introduction
A replica set is a group of n mongod nodes (members) that work
together. The goal is that each member of the set has a complete copy (replica)
of the data from the other nodes.
Setting up a replica set is a two-step
process that requires starting each mongod process and then formally initiating
the set. Here, we'll be configuring a set of three nodes, which is standard.
Once the mongod processes are started, we will
issue a command to initialize the set. After a few seconds, one node will be
elected master, and you can begin writing to and querying the set.
Starting the nodes
First, create a separate data directory
for each of the nodes in the set. In a real environment with multiple servers
we could use the default /data/db directory if we wanted to, but on a single
machine each node needs its own non-default directory.
Next, start each mongod process with the --replSet parameter. The parameter requires
that you specify a logical name for the replica set. Let's call our replica set
"foo". Each node is launched with that name, its own port (27017, 27018, and
27019 in this tutorial), and its own data directory.
You should now have three nodes running.
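As a rough sketch of the two steps above, the following plain JavaScript (run it with Node.js) prints the shell commands for a three-node set. The --replSet, --port, and --dbpath flags are standard mongod options; the /data/r0 through /data/r2 directories and the helper names are assumptions chosen for illustration, and the ports match the ones used in the rest of this tutorial.

```javascript
// Build the shell commands for a three-node set named "foo" on one machine.
// The /data/r0 ... /data/r2 directories are hypothetical; any writable paths will do.
var SET_NAME = "foo";
var PORTS = [27017, 27018, 27019];

function dataDir(index) {
  return "/data/r" + index;
}

function mongodCommand(index) {
  return "mongod --replSet " + SET_NAME +
         " --port " + PORTS[index] +
         " --dbpath " + dataDir(index);
}

// First command creates the data directories, the rest start one node each.
var commands = ["mkdir -p " + PORTS.map(function (_, i) { return dataDir(i); }).join(" ")];
PORTS.forEach(function (_, i) { commands.push(mongodCommand(i)); });

// Print one command per line; run each mongod in its own terminal.
commands.forEach(function (line) { console.log(line); });
```

Generating the commands from one list of ports keeps the port and data directory of each node paired up, which is the part that is easy to get wrong when typing them by hand.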
At this point, each node should be printing the following warning:
Mon Aug 2 11:30:19 [startReplSets] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
We can't use the replica set until we've initiated it,
which we'll do next.
Initiating the Set
We can initiate the replica set by
connecting to one of the members and running the replSetInitiate command (that
is, rs.initiate() in the mongo shell).
This command takes a configuration object that specifies the name of the set
and each of the members.
The replSetInitiate command may be sent
to any member of an uninitiated set. However, only the member performing the
initiation may have any existing data. This data becomes the initial data for
the set. The other members will begin synchronizing and receiving that data (if
present; starting empty is fine too). This is called the "initial
sync". Secondaries will not be online for reads (in state 2,
"SECONDARY") until their initial sync completes.
Note: the replication oplog (in
the local database) is allocated at initiation time. The oplog can be quite
large, thus initiation may take some time.
$ mongo localhost:27017
MongoDB shell version: 1.5.7
connecting to: localhost:27017/test
> rs.help(); // if you are curious run this (optional)
>
> config = {_id: 'foo', members: [
{_id: 0, host: 'localhost:27017'},
{_id: 1, host: 'localhost:27018'},
{_id: 2, host: 'localhost:27019'}]
}
> rs.initiate(config);
{
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
We specify the config object and then
pass it to rs.initiate(). Then, if everything is in order, we get a response saying that the
replica set will be online in a minute. During this time, one of the nodes will
be elected master.
Once the set is online, rs.status() should show that the other members of the
set are up. You may also notice that the myState value is 1, indicating that we're
connected to the member which is currently primary; a value of 2 indicates a
secondary.
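As an illustration of reading these values, here is a small plain-JavaScript sketch that summarizes a status document. The sample object is hand-written rather than real output, and it assumes the document exposes myState plus a members array with name, health, and state fields; only the two state codes named above are mapped.

```javascript
// State codes mentioned in the text; other codes exist but are not mapped here.
var STATE_NAMES = { 1: "PRIMARY", 2: "SECONDARY" };

function stateName(code) {
  return STATE_NAMES[code] || "OTHER(" + code + ")";
}

// Summarize a status document shaped like the result of rs.status().
function summarize(status) {
  var lines = ["connected to a " + stateName(status.myState)];
  status.members.forEach(function (m) {
    lines.push(m.name + ": " + stateName(m.state) +
               (m.health === 1 ? " (up)" : " (down)"));
  });
  return lines;
}

// Hand-written sample for illustration, not real output.
var sampleStatus = {
  set: "foo",
  myState: 1,
  members: [
    { _id: 0, name: "localhost:27017", health: 1, state: 1 },
    { _id: 1, name: "localhost:27018", health: 1, state: 2 },
    { _id: 2, name: "localhost:27019", health: 1, state: 2 }
  ]
};
var summary = summarize(sampleStatus);
```

In the mongo shell the same helper could be fed the live document, for example summarize(rs.status()), assuming the fields line up with the sample.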
You can also check the set's status in
the HTTP Admin UI.
Replication
Go ahead and write something to the
master node:
db.messages.insert({name: "ReplSet Tutorial"});
If you look at the logs on the secondary
nodes, you'll see the write replicated.
Failover
The purpose of a replica set is to
provide automated failover. This means that, if the primary node goes down, a
secondary node can take over. When this occurs the set members which are up
perform an election to
select a new primary. To see how this works in practice, go ahead and kill the
master node with Control-C (^C) (or if running with --journal, kill -9 would be ok too):
^CMon Aug 2 11:50:16 got kill or ctrl c or hup signal 2 (Interrupt), will terminate after current cmd ends
Mon Aug 2 11:50:16 [interruptThread] now exiting
Mon Aug 2 11:50:16 dbexit:
If you look at the logs on the
secondaries, you'll see a series of messages indicating fail-over. On our first
slave, we see this:
Mon Aug 2 11:50:16 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug 2 11:50:17 [conn1] replSet info voting yea for 2
Mon Aug 2 11:50:17 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
Mon Aug 2 11:50:27 [rs_sync] replSet SECONDARY
And on the second, this:
Mon Aug 2 11:50:17 [ReplSetHealthPollTask] replSet info localhost:27017 is now down (or slow to respond)
Mon Aug 2 11:50:17 [rs Manager] replSet info electSelf 2
Mon Aug 2 11:50:17 [rs Manager] replSet PRIMARY
Mon Aug 2 11:50:27 [initandlisten] connection accepted from 127.0.0.1:61263 #5
Both nodes notice that the master has
gone down and, as a result, a new primary node is elected. In this case, the
node at port 27019 is promoted. If we bring the failed node on 27017 back
online, it will come back up as a secondary.
Changing the replica set configuration
There are times when you'll want to
change the replica set configuration. Suppose, for instance, that you want to
make a member have priority zero, indicating the member should never be
primary. To do this, you need to pass a new configuration object to the
database's replSetReconfig command. The shell rs.reconfig() helper makes this easier.
One note: the reconfig command must be
sent to the current primary of the set. This implies that you need a majority
of the set up to perform a reconfiguration.
> // we should be primary. can be checked with rs.status() or with:
> rs.isMaster();
> var c = rs.conf();
{_id: 'foo', members: [
{_id: 0, host: 'localhost:27017'},
{_id: 1, host: 'localhost:27018'},
{_id: 2, host: 'localhost:27019'}]
}
> c.members[2].priority = 0;
> c
{_id: 'foo', members: [
{_id: 0, host: 'localhost:27017'},
{_id: 1, host: 'localhost:27018'},
{_id: 2, host: 'localhost:27019', priority: 0}]
}
> rs.reconfig(c);
> //done. to see new config,and new status:
> rs.conf()
> rs.status()
Running with two nodes
Suppose you want to run replica sets
with just two database servers (that is, have a replication factor of two).
This is possible, but as replica sets perform elections, here a majority would
be 2 out of 2 which is not helpful. Thus in this situation one normally also
runs an arbiter on a separate server. An arbiter is a set
member which has no data but gets to vote in elections. In the case here, the
arbiter is the tie breaker in elections. Arbiters are very lightweight and can
be run anywhere, say, on an app server or a micro VM. With an arbiter in
place, the replica set will behave appropriately, recovering automatically
during both network partitions and node failures (e.g., machine crashes).
You start up an arbiter just as you
would a standard replica set node, as a mongod process with the --replSet option. However, when initiating,
you need to include the arbiterOnly option in the config document.
With an arbiter, the configuration
presented above would look like this instead:
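A sketch of such a configuration in the mongo shell, assuming the same three local ports as before with the third process acting as the arbiter (that choice, and the variable names, are assumptions for illustration):

```javascript
// Two data-bearing members plus one arbiter that only votes.
var arbiterConfig = {
  _id: "foo",
  members: [
    { _id: 0, host: "localhost:27017" },
    { _id: 1, host: "localhost:27018" },
    { _id: 2, host: "localhost:27019", arbiterOnly: true }  // holds no data
  ]
};

// Members that actually store a copy of the data.
var dataMembers = arbiterConfig.members.filter(function (m) {
  return !m.arbiterOnly;
});

// rs and printjson only exist inside the mongo shell; the guard lets the
// snippet load in other JavaScript environments without failing.
if (typeof rs !== "undefined") {
  printjson(rs.initiate(arbiterConfig));
}
```

The set still has three voters, so a majority is 2, but only two of the three members keep a replica of the data.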
Most of the MongoDB drivers are replica
set aware. When connecting, the driver takes a list of seed hosts from the
replica set and can then discover which host is primary and which are secondary
(the isMaster command is used internally by the driver for this).
With this complete set of potential
master nodes, the driver can automatically find the new master if the current
master fails. See your driver's documentation for specific details.
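The discovery step can be illustrated with a small simulation in plain JavaScript. This is a simplified sketch, not how any particular driver is implemented: the isMaster replies are hand-written objects (using the ismaster, secondary, hosts, and primary fields), and findPrimary and the ask callback are hypothetical names.

```javascript
// Simulated isMaster replies keyed by host. Hand-written for illustration;
// a real driver gets these by sending the isMaster command over the network.
var ALL_HOSTS = ["localhost:27017", "localhost:27018", "localhost:27019"];
var replies = {
  "localhost:27018": { ismaster: false, secondary: true,
                       hosts: ALL_HOSTS, primary: "localhost:27019" },
  "localhost:27019": { ismaster: true, secondary: false, hosts: ALL_HOSTS }
  // localhost:27017 is absent: it is down and does not answer.
};

// Walk the seed list, learn further hosts from each reply, return the primary.
function findPrimary(seeds, ask) {
  var queue = seeds.slice();
  var seen = {};
  while (queue.length > 0) {
    var host = queue.shift();
    if (seen[host]) { continue; }
    seen[host] = true;
    var reply = ask(host);
    if (!reply) { continue; }            // unreachable host, try the next one
    if (reply.ismaster) { return host; }
    (reply.hosts || []).forEach(function (h) { queue.push(h); });
  }
  return null;                           // no primary visible right now
}

function askSimulated(host) { return replies[host]; }

// A single reachable secondary is enough as a seed.
var primary = findPrimary(["localhost:27018"], askSimulated);
```

Starting from one secondary, the walk learns the other hosts from its reply, skips the node that is down, and ends at the member reporting itself as master.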
If you happen to be using the Ruby
driver, you may want to check out Replica Sets in Ruby.
See also: http://mongodb.onconfluence.com/display/DOCSKR/Home (Korean manual)
http://groups.google.com/group/mongodb-kr (Korean MongoDB user group)