Software stack of five hot startups running on Django!
Before getting to the core of the matter, let’s take a short walk through the history of Django. Django was born in 2003, when Adrian Holovaty and Simon Willison, web programmers at the Lawrence Journal-World newspaper, began using Python to build applications. By the time it was released publicly, the first big wave of Rails hype was ramping up, so it was immediately portrayed as "Python's answer to Rails". Since 2008, its maintenance has been in the safe hands of the Django Software Foundation (DSF).
Django came out of the blocks with excellent documentation, far above the standard one would expect from a newly born framework back then. Even now, it is maintained so well that it combines tutorial material with thorough API reference and examples. This has made life much easier for aspiring developers, who get a nice, smooth learning curve to follow.
Coming to the core matter, it is no surprise that many hot and upcoming startups tend to adopt Django as their primary stack. Here is a look at five of the most popular tech startups built with Django.
Instagram, the hugely popular online photo-sharing and social networking service, has more than 400 million active users six years after its launch.
On the OS and hosting side, Instagram runs Ubuntu Linux on Amazon EC2. Every request to Instagram’s servers goes through load-balancing machines: Amazon’s Elastic Load Balancer, with a couple of NGINX instances behind it. The application servers run Django on Amazon High-CPU Extra-Large machines, more than 30 of them, because usage keeps growing rapidly. Gunicorn serves as the WSGI server. The team once used mod_wsgi and Apache, but found Gunicorn much easier to configure and less CPU-intensive.
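To make the "WSGI server" role concrete, here is a minimal sketch of the contract between a server such as Gunicorn and a Python web application. It is not Instagram's code: the `application` callable and the `call_app` helper are hypothetical names chosen for illustration, and the helper only imitates what a WSGI server does on each request, without opening a socket.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable; a WSGI server invokes it once per request."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Hypothetical helper: call the app the way a WSGI server would."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible request environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(response_headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_app(application))
```

Assuming the code is saved in a module named `hello.py` (a made-up name), Gunicorn could serve it with `gunicorn hello:application`. A Django project exposes a callable of the same shape, which is why the server underneath can be swapped, as happened with mod_wsgi and Gunicorn.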
Most of Instagram’s data (users, photo metadata, tags, etc.) resides in PostgreSQL. The photos themselves go straight to Amazon S3, which currently stores several terabytes of photo data. Amazon CloudFront is used as the CDN, which helps with image load times for users around the world. Redis is used extensively, and caching is handled by a large number of Memcached instances. For Python error reporting, Instagram uses Sentry, an awesome open-source Django app written by Disqus.
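The source does not say how the cache sits in front of PostgreSQL, but a common arrangement is the cache-aside pattern, sketched below. Everything here is an assumption for illustration: `TinyCache` is an in-process stand-in for a Memcached client, and `fetch_user` and `load_from_db` are hypothetical names.

```python
import time


class TinyCache:
    """In-process stand-in for a Memcached client (get/set with expiry)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # entry is stale, treat it as a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)


def fetch_user(user_id, cache, load_from_db, ttl=300):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = load_from_db(user_id)  # only reached on a cache miss
    cache.set(key, row, ttl)
    return row


if __name__ == "__main__":
    db_calls = []

    def load_from_db(user_id):
        db_calls.append(user_id)
        return {"id": user_id, "name": f"user{user_id}"}

    cache = TinyCache()
    fetch_user(7, cache, load_from_db)
    fetch_user(7, cache, load_from_db)
    print("database lookups:", len(db_calls))
```

The point of the pattern is that repeated reads of the same record reach the database only once per expiry window, which is the kind of load relief a cache tier is there to provide.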
Disqus makes it easy and rewarding for people to interact on websites through its commenting system. From a platform perspective, Disqus uses Django alongside a Redis queue. It uses the Nginx Push Stream Module, a pure stream HTTP push technology for the Nginx setup. It also uses Gevent, a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API, and Sentry for error logging. Disqus runs on bare metal, not EC2.
Disqus uses a pipelined architecture where messages pass from queue to queue while being acted upon by filters. The current workflow looks like this:
New Posts -> Disqus -> redis queue -> “python glue” Gevent formatting server (2 servers) -> http post -> nginx pub endpoint -> nginx + push stream module (5 servers) <- clients.
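The queue-to-queue idea in that workflow can be sketched with the standard library alone. This is an illustrative toy, not Disqus's implementation: in-memory `queue.Queue` objects stand in for the Redis queue, threads stand in for the Gevent servers, and the stage functions `drop_empty` and `format_for_push` are hypothetical.

```python
import json
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(inbox, outbox, transform):
    """Read from inbox, apply transform, forward to outbox; None means drop."""
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        result = transform(message)
        if result is not None:
            outbox.put(result)


def drop_empty(post):
    """Filter step: discard posts with no text."""
    return post if post.get("text", "").strip() else None


def format_for_push(post):
    """Formatting step: serialize the post for delivery to clients."""
    payload = {"thread": post["thread"], "text": post["text"].strip()}
    return json.dumps(payload, sort_keys=True)


def run_pipeline(posts):
    """Push posts through both stages and return what would be published."""
    new_posts, filtered, formatted = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(new_posts, filtered, drop_empty)),
        threading.Thread(target=stage, args=(filtered, formatted, format_for_push)),
    ]
    for worker in workers:
        worker.start()
    for post in posts:
        new_posts.put(post)
    new_posts.put(STOP)
    for worker in workers:
        worker.join()
    delivered = []
    while True:
        item = formatted.get()
        if item is STOP:
            return delivered
        delivered.append(item)


if __name__ == "__main__":
    sample = [{"thread": 1, "text": " hi "}, {"thread": 1, "text": "  "}]
    print(run_pipeline(sample))
```

Because each stage only knows its inbox and outbox, stages can be added, removed, or scaled independently, which is the main appeal of a pipelined design.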
The content-sharing service Pinterest is reportedly striving to build a platform with the scale and engagement of Facebook and the purchasing intent of Google. Objects are stored in S3, which holds thousands of terabytes of user data. The number of EC2 instances has grown 8x. Most traffic arrives in the afternoons and evenings, so the number of instances is reduced by 40% at night. At peak traffic, $52 an hour is spent on EC2. As of 2012, 90 instances were used for in-memory caching, which takes load off the databases, and 35 instances were used for internal purposes. Those numbers have presumably grown considerably since then, given the rapid rise in Pinterest users. Several master databases are used alongside a parallel set of backup databases in different regions around the world for redundancy.
And yes, Pinterest is written in Python and Django!
ELB (Elastic Load Balancing) is used to balance load across instances. The ELB API makes it easy to move instances in and out of production. Hadoop-based Elastic MapReduce is used for data analysis and costs only a few hundred dollars a month. Memcached and Membase/Redis are used for object-level and logical caching, RabbitMQ is used as the message queue, and data is stored in MySQL.
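As a rough illustration of what moving instances in and out of production means for a load balancer, here is a toy round-robin pool. It does not use or imitate the real ELB API; the class `RoundRobinPool` and its `register`, `deregister`, and `pick` methods are hypothetical names for this sketch.

```python
class RoundRobinPool:
    """Toy load balancer: hands out registered instances in rotation."""

    def __init__(self):
        self._instances = []
        self._next = 0  # index of the instance that receives the next request

    def register(self, instance_id):
        """Put an instance into production (no-op if already present)."""
        if instance_id not in self._instances:
            self._instances.append(instance_id)

    def deregister(self, instance_id):
        """Take an instance out of production (no-op if absent)."""
        if instance_id in self._instances:
            index = self._instances.index(instance_id)
            self._instances.remove(instance_id)
            # Keep the cursor pointing at the same upcoming instance.
            if index < self._next:
                self._next -= 1

    def pick(self):
        """Return the instance that should handle the next request."""
        if not self._instances:
            raise RuntimeError("no instances in production")
        self._next %= len(self._instances)
        instance = self._instances[self._next]
        self._next += 1
        return instance


if __name__ == "__main__":
    pool = RoundRobinPool()
    for name in ("web-1", "web-2", "web-3"):
        pool.register(name)
    print([pool.pick() for _ in range(4)])
    pool.deregister("web-2")  # e.g. scaling down at night
    print([pool.pick() for _ in range(3)])
```

Requests keep flowing to the remaining instances after one is removed, which is what makes scheduled scale-downs practical.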
Eventbrite, the popular event-management platform, is hosted on Amazon EC2. The code base is predominantly Python, using Django. Working from the top of the stack, Eventbrite uses HAProxy/Nginx for load balancing and SSL encryption/decryption. The site is served using nginx/uwsgi. For caching, both Memcached and Redis are used. For datastores, HBase, MongoDB and MySQL are employed. Hadoop is used for data storage and MapReduce for processing, with Hive for querying the data.
In the fifth stack, Django powers the website; Git, libgit2 and pygit perform Git operations; and HAProxy handles load balancing. Gunicorn, along with Nginx, runs the application servers. Redis powers the newsfeed, and RabbitMQ queues the background jobs, which Celery delivers from the main application to RabbitMQ. Postgres is used for data storage, Memcached for in-app caching, and New Relic for application and service monitoring.
A special mention for a non-startup website: NASA!
The National Aeronautics and Space Administration’s official website is powered by Django, and is the place to find news, pictures, and videos about its ongoing space exploration. The website is rich with visual content, and remains stable and user friendly. It is not as popular as Pinterest or Instagram, but still serves 2 million visitors monthly. No wonder Django was chosen to build the major functional elements of the site. After all, the government always chooses the ones it can trust!
The list goes on. Endless are the Django success stories. In fact, you might want to check this list out: AMD, Discovery, HP, IBM, Intel, LexisNexis, Mozilla, National Geographic, The New York Times, Orbitz, PBS, Rdio, VMware, Walt Disney, and the Washington Post.
Not bad for a small bunch of geeks who started off their work in Lawrence, Kansas, eh? :)