Nginx configuration for Django

Django is a powerful framework for building websites. In production, a Django project usually runs behind an application server, with nginx in front of it doing two basic things:

  • Forward requests from the web port to the application server port (reverse proxy)
  • Serve static and media files

The application server used in this example is gunicorn, the one Instagram chose in its early days, but it can be anything listening on port 9999. Change the port number in the example as required.

The following nginx configuration was adapted from this one, with some additions. It contains:

  • a commented-out redirect from the non-www to the www domain
  • gzip for javascript, json, css and proxy routes
  • media files with etag and a 1 year expiry
  • static files with etag and a 1 minute expiry
  • a host-based favicon distributor (reusable as is)
  • a commented-out basic auth to make a website private
  • reverse proxy to gunicorn
  • a simple block for a common type of malicious activity

It works fine with Django 1 and 2.

# uncomment for redirect
# server {
#    # redirect WITH www from and
#    listen 80;
#    server_name;
#    return 301$request_uri;
# }

server {
    listen	80;
    # the domain name it will serve for
    charset     utf-8;

    # max upload size
    client_max_body_size 75M;

    # enable gzip for proxy requests
    gzip on;
    gzip_proxied any;
    gzip_vary on;
    gzip_http_version 1.1;
    gzip_types application/javascript application/json text/css text/xml;
    gzip_comp_level 4;

    # @see

    # Django media
    location /media  {
        etag on;
        expires 365d;
        alias /path/to/media_root;  # your Django project's media files - amend as required
    }

    location /static {
        etag on;
        expires 1m;
        alias /path/to/static_root; # your Django project's static files - amend as required
    }

    location /favicon.ico {
        # all favicons are stored inside the /path/to/favicons/ directory
        # notation:
        alias /path/to/favicons/$host.ico;
    }

    location / {
        # an HTTP header important enough to have its own Wikipedia entry:
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # enable this if and only if you use HTTPS, this helps Rack
        # set the proper protocol for doing redirects:
        # proxy_set_header X-Forwarded-Proto https;

        # pass the Host: header from the client right along so redirects
        # can be set properly within the Rack application
        proxy_set_header Host $http_host;

        # we don't want nginx trying to do something clever with
        # redirects, we set the Host: header above already.
        proxy_redirect off;

        # set "proxy_buffering off" *only* for Rainbows! when doing
        # Comet/long-poll stuff.  It's also safe to set if you're
        # using only serving fast clients with Unicorn + nginx.
        # Otherwise you _want_ nginx to buffer responses to slow
        # clients, really.
        # proxy_buffering off;

        # Uncomment for maintenance
        ### auth_basic "Insert password here";
        ### auth_basic_user_file /path/to/.htpasswd;

        proxy_connect_timeout       30000;
        proxy_send_timeout          30000;
        proxy_read_timeout          30000;
        send_timeout                30000;

        # @see and
        if ($http_user_agent ~ "libwww-perl") {
          return 403;
        }

        # Try to serve static files from nginx, no point in making an
        # *application* server like Unicorn/Rainbows! serve static files.
        if (!-f $request_filename) {
            proxy_pass http://localhost:9999;
        }
    }
}

Run nginx -t to check and then systemctl reload nginx to apply.

This is an HTTP-only configuration: to configure the website for HTTPS follow this howto.
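On the Django side, the paths and headers used in the nginx example have counterparts in the project settings. The following is only a sketch of a settings fragment, not a complete file: the directories mirror the placeholders of the nginx configuration and the host name is hypothetical.

```python
# settings.py fragment (sketch): values mirror the nginx example.

# URL prefixes served directly by nginx (location /static and location /media)
STATIC_URL = '/static/'
MEDIA_URL = '/media/'

# Same directories used by the nginx "alias" directives; amend as required.
# "manage.py collectstatic" copies the static files into STATIC_ROOT.
STATIC_ROOT = '/path/to/static_root'
MEDIA_ROOT = '/path/to/media_root'

# Hypothetical host name: use the one declared in your server_name directive.
ALLOWED_HOSTS = ['www.example.com']

# Only for HTTPS setups where nginx sets "X-Forwarded-Proto https":
# it lets Django know that the original request was secure.
# SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
```

gunicorn can then be bound to the port used by proxy_pass, for example with gunicorn --bind 127.0.0.1:9999 myproject.wsgi:application, where myproject is a hypothetical project name.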


Reduce Time to the First Byte – TTFB on web applications

How can you reduce the time to first byte, and what causes a long TTFB? The main causes are either network-related or server-side, and I will focus on the server-side ones. I’m not covering any CMS here, but you can still apply some of these techniques, starting from how to read the browser Timing panel.

Get reliable timing

Take a website with cache enabled: by the 9th visit to a page you can be sure that the page is in cache, the connection with the webserver is alive, the SSL/TLS connection is established, the SQL queries are cached and so on. Open the network tab and enjoy your site's speed. Unfortunately, very few real users will experience that speed.

Here is a comparison between a first-time, uncached connection to an nginx webserver, explored with Chrome (F12 > Network > Timing), and a second request for the same page, refreshed right after the first:


I measured +420% on the first-time request compared with the connected-and-cached case. To obtain a reliable result (1st figure) you should usually:

  • Wait several seconds after the previous call before doing anything, so that the webserver closes the connection with the client
  • Add a ?string to the URL of the page you’re visiting. Change the string every time you want a fresh page.
  • Press Ctrl+Shift+R to reload the page

This technique bypasses the Django view cache and similar cache systems in other frameworks. To check the impact of the framework cache, press Ctrl+Shift+R right after the first request: you should get a result similar to the 2nd figure. There are better ways to do the same, but this is the easiest.
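The same cache-busting trick can be scripted. The sketch below is written for Python 3 and uses only the standard library; the nocache parameter name and both function names are made up for the example. Note that the measured time covers the whole wait until the first byte (DNS, connection, TLS and server time together), so it is not the same number as the Waiting (TTFB) row of the browser.

```python
import time
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def cache_busting_url(url, token=None):
    """Append a throwaway query parameter so that the page looks new."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # "nocache" is an arbitrary name: any parameter the site ignores will do
    query.append(("nocache", token or uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


def time_to_first_byte(url, timeout=30):
    """Seconds elapsed from the start of the request to the first body byte."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as response:
        response.read(1)
        return time.perf_counter() - start
```

A call like time_to_first_byte(cache_busting_url("https://www.example.com/")) opens a new connection every time, so each measurement is closer to what a first-time visitor gets.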

Break up the time report

Unpack the time report of the first-time request:

  • Connection setup (15% of the elapsed time in the example)
    • Queueing: negligible, nothing to do.
    • Stalled: negligible, nothing to do.
    • DNS lookup: negligible, nothing to do.
    • Initial connection: significant, skip for now.
    • SSL: significant, the client establishes an SSL/TLS connection with the webserver. Disabling ciphers or tuning SSL can reduce the time, but the priority here is the best security for the visitor, not pure speed. However, take a look at this case study if you want to tune SSL/TLS for speed.
  • Request / response (85% of the elapsed time in the example)
    • Request sent: negligible, browser-related, nothing to do.
    • Waiting (TTFB): significant, the time to first byte is how long the user waits after the request has been sent to the web server. The waiting time includes:
      • Framework processing.
      • Database queries.
    • Content Download: significant; it depends on page size, network, server and client. To speed up the download of an HTML page you should add compression: here is a howto for nginx and one for Apache webservers. They cover proxy servers; applying compression directly to a virtual host is even simpler, and the performance gain is huge.

Not surprisingly, most of the time of a first-time request is spent in Request / response rather than in connection setup. Within Request / response, Waiting (TTFB) is the most prominent part. Luckily it is the same segment covered by the cache mechanisms of the framework, and consequently it is the one that shrinks the most when passing from the first figure (not cached) to the second (cached by the framework). To reduce the TTFB, database queries and processing must be optimized.

Optimize processing: program optimization

When Google, the web giant behind the most used web search engine in history, tried to suggest some tips for optimizing PHP, programmers reacted badly, from everyday developers up to the PHP team itself.

In a long response, the PHP team lectured Google on how to program for the web, offering “some thoughts aimed at debunking these claims” with statements like “Depending on the way PHP is set up on your host, echo can be slower than print in some cases”, a totally confusing comment for a real-world programmer.

Google took the PHP performance page offline. It can be misleading, but it still contains valid optimization tips, especially if you compare it with some of the comments on it. Google has an interest in speed and code optimization, and the writer has the know-how to talk about it; the PHP team here just wanted to be right and defend their language and, although starting from good points, crossed the line of scientific debate.

Program optimization mottos are:

Look for the language that best suits your work and the best tools you can find, and look for real-world programmers sharing their approaches to program optimization.

The PHP team’s whining will not change the fact that avoiding SQL inside a loop, as the Google employee suggested, is the right thing to do to improve performance. This leads to database optimization.
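To make the "SQL inside a loop" problem concrete, here is a small self-contained sketch using Python's sqlite3 module and an in-memory database. The author and article tables are invented for the example; the point is only that both functions return the same rows while issuing a very different number of queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO article VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")


def titles_with_loop(conn):
    """One query for the articles plus one query per article (N + 1)."""
    queries = 1
    rows = conn.execute(
        "SELECT author_id, title FROM article ORDER BY id").fetchall()
    result = []
    for author_id, title in rows:
        # SQL inside the loop: the database is hit once per article
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_join(conn):
    """Same data with a single query: the database does the matching."""
    rows = conn.execute(
        "SELECT article.title, author.name FROM article "
        "JOIN author ON author.id = article.author_id "
        "ORDER BY article.id").fetchall()
    return rows, 1


loop_rows, loop_queries = titles_with_loop(conn)
join_rows, join_queries = titles_with_join(conn)
```

With three articles the loop version runs four queries against one for the join, and the gap grows linearly with the number of rows.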

Dude, where is my data?

The standard web application nowadays has this structure:

A typical web application: the application server runs the application so from now on, oversimplifying, I will treat application and application server as synonyms.

After the client requests pass through the firewall, the webserver serves static files and asks the application server for the dynamic content.

The cache server can serve the application or the web server; in this example the former is in control. An example of cache controlled by the application is in the Django docs about Memcached; examples of cache controlled by the web server are the HTTP Redis module and the standard use of Varnish cache.

The database server (DBMS) stores the structured data for the application. In standard use cases a DBMS can be optimized with little effort. It is more difficult to optimize the way the web application gets the data from the database.

Database query optimization: prefetch and avoid duplicates

To optimize database queries you have to check the timing, again. Depending on the language and framework you are using there are tools to get information about queries to optimize:

Since I’m using Python I go with Django Debug Toolbar, a de facto standard for application profiling. Here is a sample of SQL query timing on a PostgreSQL database:

Timing of SQL queries on Django Debug Toolbar.


The total time spent on queries is 137.07 milliseconds and the total number of queries executed is 90. Among these, 85 are duplicates. Below each query you’ll find how many times the same query is executed. The objective is to reduce the number of queries executed.
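If you want the same kind of count outside the toolbar, a rough sketch follows. It works on a plain list of SQL strings; in Django, with DEBUG enabled, such a list can be built from the "sql" key of the entries in django.db.connection.queries. The function name and the way duplicates are counted are choices made for this example, so the numbers may not match the toolbar exactly.

```python
from collections import Counter


def duplicate_report(sql_statements):
    """Summarize how many statements were executed more than once."""
    counts = Counter(sql_statements)
    repeated = {sql: n for sql, n in counts.items() if n > 1}
    return {
        "total": len(sql_statements),
        "distinct": len(counts),
        # executions belonging to a statement that ran at least twice
        "duplicated_executions": sum(repeated.values()),
        "repeated": repeated,
    }


sample = [
    "SELECT * FROM party WHERE organizer_id = 1",
    "SELECT * FROM photo WHERE party_id = 1",
    "SELECT * FROM photo WHERE party_id = 2",
    "SELECT * FROM photo WHERE party_id = 1",
]
report = duplicate_report(sample)
```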

If you’re using Django, create a manager for your models like this:

from django.db import models

class GenericManager(models.Manager):
    # prefetch_related: join via ORM
    # select_related: join via database
    # names of the relations as seen from Party: the organizer foreign key
    # and the reverse relation from Photo
    related_models = ['organizer', 'photo_set']

    def per_organizer(self, orgz, **kwargs):
        # all the parties of an organizer, nothing prefetched yet
        ret = self.filter(organizer=orgz)
        return ret

class People(models.Model):
    name = models.CharField(max_length=50)

class Party(models.Model):
    organizer = models.ForeignKey('People')
    objects = GenericManager()

class Photo(models.Model):
    party = models.ForeignKey('Party')
Then, in your view, call your custom method on GenericManager:

from django.shortcuts import render

def all_parties(request, organizer_name):
    party_organizer = People.objects.get(name=organizer_name)
    parties = Party.objects.per_organizer(party_organizer)
    return render(request, 'myfunnywebsite/parties.html', {
        'parties': parties
    })

When you want to optimize data retrieval for Party, instead of combing through every objects.filter() call in your views you only have to fix the per_organizer method, like this:

class GenericManager(models.Manager):
    # prefetch_related: join via ORM
    # select_related: join via database
    related_models = ['organizer', 'photo_set']

    def per_organizer(self, orgz, **kwargs):
        ret = self.filter(organizer=orgz)
        # fetch the related objects with one extra query per relation
        return ret.prefetch_related(*self.related_models)

With prefetch_related the queries are grouped by the ORM and all the objects are available, avoiding many duplicate queries. Here is the result of this first optimization:


  • The number of queries dropped from 90 to 45
  • Query execution time dropped from 137.07 to 80.80 milliseconds (-41%)
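To see what "join via ORM" means in practice, the following sketch reproduces the idea by hand with sqlite3: one query for the parents, one query for all their children, and the matching done in Python. It is only an illustration of the technique with invented tables, not Django's actual implementation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, party_id INTEGER, caption TEXT);
    INSERT INTO party VALUES (1, 'Birthday'), (2, 'Wedding');
    INSERT INTO photo VALUES (1, 1, 'cake'), (2, 1, 'candles'), (3, 2, 'rings');
""")


def parties_with_photos(conn):
    """Two queries in total, however many parties there are."""
    parties = conn.execute("SELECT id, name FROM party ORDER BY id").fetchall()
    ids = [party_id for party_id, _ in parties]
    placeholders = ", ".join("?" for _ in ids)
    # second query: every photo of every selected party at once
    photos = conn.execute(
        "SELECT party_id, caption FROM photo "
        "WHERE party_id IN (%s) ORDER BY id" % placeholders, ids).fetchall()
    # the "join" happens here, in application memory
    by_party = defaultdict(list)
    for party_id, caption in photos:
        by_party[party_id].append(caption)
    return [(name, by_party[party_id]) for party_id, name in parties]


result = parties_with_photos(conn)
```

Accessing the photos of each party inside a loop would instead cost one query per party, which is where most duplicate queries come from.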

An alternative method is select_related, but in this case the ORM will produce a join and the above code will raise an error because photo_set is not accessible this way. If your models are structured in a way that gives you better performance with select_related, go with it, but remember this limitation. In this use case the results of select_related are worse than those of prefetch_related.


  • A long TTFB can be a symptom of server-side inefficiency, but you have to profile your application server-side to find out
  • Check SQL timing
  • Reduce the number of queries
  • Optimize application code
  • Use cache systems: memory-based ones (Redis, Memcached) are the fastest

In my experience, inefficient code plus a lot of caching is a fragile solution compared with the right balance between caching and query + program optimization.

If you’ve tried everything and the application is still slow, consider rewriting it or even changing the framework you’re using, if speed is critical. When every optimization failed, I went from Drupal 6 to a fresh Django 1.8 installation, and Google noticed the difference in the milliseconds spent downloading the pages during indexing:


Since you can’t win a fight with windmills, a fresh start may be the only effective option on the table.

From Drupal to Django: how to migrate contents

In a recent article I explained the motivations for an upgrade from a no longer maintained Drupal 6 installation to Django 1.8. I will now cover the migration techniques adopted in the upgrade in more detail and take a deeper look at the models and the relationships.


If you’re a drupaler, you’re familiar with the node/NID/edit and the node/add/TYPE pages:


Here we have two visible fields: Title and Body. One is a text input and the other a textarea. The good Form API provided by Drupal calls these two types textfield and textarea. However, if you use the Content type creation interface you don’t see any of these: you just declare some field types and the form is populated with the new fields after the addition.

It’s similar in Django, but you don’t have to go through a graphical interface to do this: the structure is code-driven and, as a side effect, you can put almost anything under revision control. You can choose between different field types that will be reflected in the database and in the user interface.

Here is what the Drupal Body and Title fields look like in a model called Article:

from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')

The TinyMCE part requires the TinyMCE app installed and configured. If you’re new to Django, read and follow the great Writing your first Django app tutorial to understand the basics, e.g. the difference between a project and an app, or the following sections will sound pretty obscure.

After editing your projectname/appname/ file you can now apply the changes in your app via makemigrations (which creates a migration file for the changes in the database) and migrate (which applies the migrations inside the migration files).

In a real-world scenario these two fields alone aren’t enough, not even in Drupal 6. This information is presented by default in any content type on Drupal 6:


Drupal 6 treats authors as entities you can search through an autocomplete field, and the date as a pseudo-ISO 8601 date field. The author field is a link to the User table in Drupal. In Django a similar user model exists, but if you want to decouple access to the admin backend from authorship it’s simpler to create a custom author model and later associate it with the real user model.


E-R diagram of the app we migrate the Drupal contents to.

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')

As you can see in the Entity-Relationship diagram, one Article must have one and only one Author, but many Articles can have the same Author. This is called a many-to-one relationship, and it’s represented in Django as a foreign key from the “many” model (e.g. Article) to the “one” model (Author).

The Article.publishing_date field is where the publishing date and time are stored. Clicking on the text field opens a calendar popup to choose the day and hour, with a useful “now” shortcut to populate the field with the current time.


How a calendar is represented in a DateTime field.

Now that the basic fields are in the right place you can run makemigrations / migrate again to update your app, restarting the webserver to apply the changes.

Attachments and images

Drupal ships with the ability to upload files and images to nodes. Django has two different fields for this: FileField and ImageField. Before continuing we have to rethink our E-R model to allow attachments.



The code is:

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')
# Attachments
class Attachments(models.Model):
    description = models.CharField(max_length=255, default='', blank=True)
    list = models.BooleanField(default=True)
    file = models.FileField(upload_to='attachments_directory', max_length=255)

Images are similar: if you want to enrich your model with images you can create another model like Attachments but with an ImageField instead. Remember to use a different upload_to directory in order to keep the attachments and images separated.

One last field is missing to complete our models: path. Django comes with a useful SlugField that, as of Django 1.8, allows only ASCII characters and can be mapped to another field, the title for example.

from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
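    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')
    # the slug may be empty and is not unique yet (the exact options are an assumption)
    path        = models.SlugField(blank=True, default='')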

Keep in mind that a SlugField differs from a Drupal path field because it doesn’t allow slashes. Consider a path like this:


In Drupal you will have either A) a view with the path news and the news title as argument, or B) a fake path generated by pathauto or similar modules. After years of Drupal development, I can say that option B is the typical easy way that turns into a maintenance nightmare. Django core, as far as I know, allows only option A, so if you want a news view you have to declare the view and then its URL, as stated in the official documentation.

  • news/: the news root path, coupled with the view
  • news-title: the argument passed to the view and the SlugField content for an article. It must be unique to be used as a key to retrieve an article, but since it can be empty we cannot force it to have a value or to be unique at first. When all the data are imported and fixed we can change this field to unique to improve database retrieval performance.
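Since a slug cannot contain slashes, a Drupal path has to be reduced to its last segment before it can be stored. The sketch below shows one possible way to do it in plain Python 3; both function names are made up for the example, and ascii_slug is only a rough stand-in for django.utils.text.slugify, which you would normally use inside a Django project.

```python
import re
import unicodedata


def ascii_slug(text):
    """Rough ASCII-only slug: lowercase words joined by hyphens."""
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s-]", "", text).strip().lower()
    return re.sub(r"[-\s]+", "-", text)


def slug_from_drupal_path(nodepath, title):
    """Keep the last segment of the Drupal path, or fall back to the title."""
    last_segment = nodepath.strip("/").split("/")[-1]
    return ascii_slug(last_segment or title)
```

For instance a path like news/my-news-title becomes my-news-title, while a node without a path gets a slug derived from its title.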


And what about categories? If you have a category named Section, and an article can be associated with only one Section, you have to create a many-to-one relationship. As you saw before, you have to put the foreign key on the N side of the relation, in this case Article, so the Article model will have a ForeignKey field referencing a specific section.

On the other hand, if you have tags to associate with your article you have to create a Tag model with a many-to-many relationship to Article. Django will create an intermediate model storing the Article-Tag relationships.

Do not abuse M2M relationships, because each relation needs a separate table and the number of JOINs on database tables will increase, with side effects on performance that are not even perceivable at first since the Django ORM is very efficient. Event handling will also be more difficult for a beginner, since many-to-many events occur only after the parent models are saved, and this requires some experience if you need to add a custom action to an M2M event. If you design your E-R model wisely you have nothing to be scared of.

Migration techniques

Now that we have the destination models, fields and relationships we can import the content from Drupal. In the previous article I suggested using the Views Datasource module to create a JSON view to export content. Please read the Exporting the data from Drupal section of that article before continuing.

The obtained row is something like:

      {nid: '30004',
      domainsourceid: '2',
      nodepath: '',
      postdate: '2014-09-17T22:18:42+0200',
      nodebody: 'HTML TEXT HERE',
      nodetype: 'drupal type',
      nodetitle: 'Title here',
      nodeauthor: 'monty',
      nodetags: 'Drupal, dragonball, paintball'
      }

If you don’t have a multi-site Drupal you can ignore the domainsourceid field. nodetags lists some Tag names of a many-to-many relationship not covered here.

All the other values are useful for the import:

  • nid: the original content id, used for pagination and retrieval
    Destination: parsing
  • nodepath: content path
    Destination: Article.path
  • nodebody: content body
    Destination: Article.body
  • nodetype: type of the node
    Destination: parsing
  • nodetitle: title of the node
    Destination: Article.title
  • nodeauthor: author of the content
    Destination: -> Author.alias
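The mapping above can be written down as a small function that turns one exported row into values ready for the models. This is a sketch for Python 3 (the %z directive used for the time zone is not available in Python 2's strptime); the function name, the shape of the returned dictionary and the choice of sending postdate to Article.publishing_date are assumptions made for the example.

```python
from datetime import datetime


def map_drupal_row(row):
    """Translate one exported Drupal row into values for the Django models."""
    return {
        # kept only to resume the import and to look up the source node
        "source_nid": int(row["nid"]),
        # used to find or create the Author
        "author_alias": row["nodeauthor"],
        "article": {
            "title": row["nodetitle"],
            "body": row["nodebody"],
            "path": row["nodepath"],
            # assumption: postdate feeds Article.publishing_date
            "publishing_date": datetime.strptime(
                row["postdate"], "%Y-%m-%dT%H:%M:%S%z"),
        },
    }


sample_row = {
    "nid": "30004",
    "domainsourceid": "2",
    "nodepath": "",
    "postdate": "2014-09-17T22:18:42+0200",
    "nodebody": "HTML TEXT HERE",
    "nodetype": "drupal type",
    "nodetitle": "Title here",
    "nodeauthor": "monty",
    "nodetags": "Drupal, dragonball, paintball",
}
mapped = map_drupal_row(sample_row)
```

Keeping the mapping in one place makes it easier to adjust when a field turns out to need cleaning before it is saved.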

In the previous article you found how to make the View on Drupal (the source), and now you have a rough idea of the field mapping. How do we fetch the data from Django?

Management command and paged view

To start a one-time import you can write a custom management command for your Django application named project/app/management/commands/

from __future__ import unicode_literals
from django.core.management.base import BaseCommand, CommandError
from django.core.exceptions import ValidationError, MultipleObjectsReturned, ObjectDoesNotExist
import json, urllib
import urlparse
from shutil import copyfile
from django.conf import settings
from os import sep
from django.core.files.storage import default_storage
from django.utils.text import slugify
import requests
import grequests
import time
from md5 import md5

class Command(BaseCommand):
    help = 'Import data from Drupal 6 Json view'

    def add_arguments(self, parser):
        # Positional arguments: starting nid and type of content to import
        parser.add_argument('start', nargs=1, type=int)
        parser.add_argument('importtype', nargs=1)
        # Named (optional) arguments
        # Crawl
        parser.add_argument('--crawl',
            action='store_true',
            dest='crawl',
            default=False,
            help='Crawl data.')

    def handle(self, *args, **options):
        # process data
        pass

This management command can be launched with

python manage.py myimport 0 article --crawl

Where 0 is the nid the import starts after (the first imported item is start + 1), “article” is the type of content to import (e.g. the destination model) and --crawl is the import option. Let’s add the import logic to the Command.handle method:

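def handle(self, *args, **options):
    try:
        assert options['crawl'] and options['importtype']
        # start to import or store data
        sid = int(options['start'].pop())
        reading = True
        while reading:
            importazioni = []
            articoli = []
            # placeholder: the original URL is missing from this copy of the
            # article; example.com and the nid parameter name are assumptions
            url = 'http://example.com/json-path-verylongkey?nid=%d' % (sid,)
            print url
            response = urllib.urlopen(url)
            data = json.loads(response.read())
            # root key of the JSON view (missing in this copy of the article)
            data = data['']
            # no data received, quit
            if not data:
                reading = False
            for n, record in enumerate(data):
                # row key of the JSON view (missing in this copy of the article)
                sid = int(record['']['nid'])
                title = record['']['nodetitle']
                # continue to process data, row after row
                # ...
    except AssertionError:
        raise CommandError('Invalid import command')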

This example will fetch /json-path-verylongkey starting from the nid passed on the command line + 1. Then it will process the JSON row after row, keeping in memory the id of the last item. When no more content is available, the loop stops. It’s a common method and it’s lightweight on the source server, because only one request at a time is sent and then the response is processed. However, this method can also be slow because the waiting times add up: (request 1 + response 1 + parse 1) + (request 2 + response 2 + parse 2) etc.
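The control flow of this sequential import can be isolated from the network code and tried on fake data. In the Python 3 sketch below, fetch_page is a hypothetical callable standing in for the request to the JSON view, and FAKE_NODES replaces the Drupal source; only the loop logic mirrors the command.

```python
def import_pages(fetch_page, start_nid):
    """Fetch pages one after the other until the source returns no rows.

    fetch_page(nid) is a hypothetical callable returning the rows whose
    nid is greater than the given one, e.g. a wrapper around the JSON view.
    """
    last_nid = start_nid
    imported = []
    while True:
        rows = fetch_page(last_nid)
        if not rows:
            # no data received, quit
            break
        for row in rows:
            imported.append(row["nodetitle"])
            # remember where to restart from on the next request
            last_nid = int(row["nid"])
    return imported, last_nid


# In-memory stand-in for the Drupal view, serving two rows per page
FAKE_NODES = [{"nid": str(n), "nodetitle": "Title %d" % n}
              for n in (3, 5, 8, 13, 21)]


def fake_fetch_page(nid, per_page=2):
    return [row for row in FAKE_NODES if int(row["nid"]) > nid][:per_page]


titles, last = import_pages(fake_fetch_page, 0)
```

Each call to fetch_page waits for the previous page to be fully processed, which is exactly why the total time is the sum of all the request, response and parse times.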

Multiple, asynchronous requests

We can speed up the retrieval by using grequests. First you have to find the last element: clone the Drupal data source JSON view so that it shows only the last item, then fetch its id.

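def handle(self, *args, **options):
    assert options['crawl'] and options['importtype']
    # start to import or store data
    sid = int(options['start'].pop())
    # find last node id to create an url list
    # URL of the cloned view showing only the last item
    # (missing in this copy of the article)
    url = ''
    response = requests.get(url, timeout=50)
    r = response.json()
    # root and row keys of the JSON view are missing in this copy too
    last_nid = int(r[''].pop()['']['nid'])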

You can then create a from-to range starting from the first element passed by command line to the last.

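# placeholder: the original URL is missing from this copy of the article;
# example.com is an assumption, fromnid and tonid are the view filters
url_pattern = "http://example.com/json-path-verylongkey?fromnid=%d&tonid=%d"
urls = []
per_page = 20
# e.g. [0, 20, 40, 60]
relements = list(range(0, last_nid, per_page))
if relements[-1] < last_nid:
    relements.append(last_nid + 1)
for fromx, toy in zip(relements, relements[1:]):
    u = url_pattern % (fromx, toy)
    urls.append(u)

rs = (grequests.get(u) for u in urls)
# blocking request: stay here until the last response is received
async_responses = grequests.map(rs)
# all responses fetched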
# all responses fetched

per_page is the number of elements per page specified in the Drupal JSON view. Instead of a single nid parameter, fromnid and tonid are the “greater than” and “less than or equal to” parameters specified in the Drupal view.

The core of the asynchronous, multiple requests is grequests.map. It takes the prepared requests and sends them all. The responses will arrive in random order, but async_responses will be populated with all of them.
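The from-to arithmetic is easy to get wrong by one, so it can be worth checking it in isolation. The Python 3 sketch below builds the list of page URLs; the function name and the example.com address are hypothetical. Unlike the snippet above it closes the last range at last_nid itself, which is enough when tonid is a "less than or equal to" filter.

```python
def build_page_urls(url_pattern, start_nid, last_nid, per_page=20):
    """Return one URL per (fromnid, tonid] range between the two ids."""
    # range() is lazy in Python 3: turn it into a list before appending
    bounds = list(range(start_nid, last_nid, per_page))
    if not bounds or bounds[-1] < last_nid:
        bounds.append(last_nid)
    # pair each bound with the next one: (0, 20), (20, 40), ...
    return [url_pattern % (low, high)
            for low, high in zip(bounds, bounds[1:])]


# Hypothetical address of the JSON view
PATTERN = "http://example.com/view?fromnid=%d&tonid=%d"
urls = build_page_urls(PATTERN, 0, 50)
```

With 50 as last id and 20 items per page this gives three ranges: 0-20, 20-40 and 40-50.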

At that point you can treat the response list like before, parsing the response.json() of each element of the list.
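If you prefer not to depend on grequests, a similar result can be sketched with the thread pool of the Python 3 standard library. This is a different technique from the one used in the article, shown only as an alternative. Here fetch is any callable that downloads and decodes one URL; fake_fetch replaces the network for the example, and the "nodes" and "node" keys are assumptions about how the JSON view names its root and row objects.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=5):
    """Run fetch(url) for every URL on a small pool of threads.

    The results are returned in the same order as the URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def rows_from_payloads(payloads, root_key, row_key):
    """Flatten the decoded JSON pages into a single list of rows."""
    rows = []
    for payload in payloads:
        for record in payload.get(root_key, []):
            rows.append(record[row_key])
    return rows


def fake_fetch(url):
    # stand-in for something like requests.get(url).json()
    return {"nodes": [{"node": {"nid": url.split("=")[-1]}}]}


payloads = fetch_all(
    ["http://example.com/view?tonid=20", "http://example.com/view?tonid=40"],
    fake_fetch)
rows = rows_from_payloads(payloads, "nodes", "node")
```

Keep max_workers low: the goal is to overlap the waiting times, not to flood the source server with requests.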

With these hints you can now create JSON views within Drupal, ready to be fetched and parsed in Django. In a future article I will cover the conversion between the data and Django using the Django ORM.

Guide to migrate a Drupal website to Django after the release of Drupal 8

I have maintained a news website written in Drupal since 2007. It runs on Drupal 6; before that it was on Drupal 5. I have made many Drupal 7 installations over the years and I went to three local Drupal conventions. This is a guide on how to abandon Drupal if you already know some basics of Django and Python.

Drupal on LAMP: lessons learned

  • PHP is for (not so) fast development, but maintainability can be a pain.
  • Drupal tries to overcome PHP's limits, with mixed results.
  • Apache cannot stand heavy traffic without an accelerator like Varnish and time-consuming ad hoc configurations. If traffic increases, Apache cannot stand it at all.
  • Drupal contrib modules are a mix of high-quality tools (like Webform or Views Datasource) and badly written projects. The more modules are enabled, the more maintainability the project loses. This is not so evident if you have never looked at any other open source project.

This is not the one and only truth; this is my experience over these 8 years. I feel I am a more confident Python programmer than PHP programmer, having spent less than one-third of the years working with it. At the end of the article I cite a list of articles written by programmers who feel the same uneasiness as I do when working with PHP and Drupal after trying other tools.

Django experiences

In recent years, with Drupal still paying most of my bills, I used Django, the MVC framework written in Python, for three projects: an e-mail application, a real estate catalog and a custom-made CRM. One of these is a port of something written in PHP on Drupal 5. In all three projects I was very happy with the maintainability, the clarity of the code and the high-level, well-written packages I found while exploring it, like Tastypie and the many Python packages found on the Cheese Shop.

Even considering that I’m the only developer of these, I haven’t experienced the frustration I feel with Drupal when trying to make something work as designed, or trying to fix some code I wrote some time ago. I know that a CMS sits at a higher level than a framework; simply, some projects are not suited to Drupal, and I feel more comfortable with Python than with PHP these days.

At the time of writing Drupal 8 is out as a Release Candidate. I have migrated some websites from 5 to 6 and from 6 to 7 in the past. Migrating to a new major version is not a science, it’s a sort of mystical art. According to the Drupal announcement, Drupal 6 will become unsupported 3 months after Drupal 8 is released, since only the current and the previous versions are supported (8.x and 7.x once 8 is out). Keeping a Drupal 6 site running after that term will be risky.

Choosing the stack

Back to the news website I maintain, the choice is between a platform I already know well, which has proved stable and maintainable for a small or one-person team, and another I would have to learn. Plus, Django is the natural choice to avoid the problems I’ve listed above and to reuse the solutions I used in past Django projects, exploring new tools in the meantime.

Here are the choices I made:

I decided to use gunicorn because it’s very easy to run and maintain for a Django project and you don’t have to make WSGI run on nginx. Nginx sits in front of gunicorn, serving static files and forwarding the right requests to it. Memcached is used inside Django and stores cached pages from views in volatile memory, avoiding a database read any time a page is requested. I try to avoid using Varnish, even if it is a very good tool, because I want to keep the stack as simple as I can and I’m confident nginx and Memcached will speed up the website enough. Now is the time to rewrite the Drupal-hosted website into a Django application.

Write the E-R model

If you are here you probably have a running Drupal website you want to port to Django. Browse it like a user, and then open your Content types list to identify the Entities and the Relationships, as the E-R model suggests. If your website has been running for a long time you probably want to redesign some parts, adding, removing or merging entities.

Take my news website for example. I have 15 content types + 12 vocabularies (27 entities) on Drupal. After rewriting the E-R model I have 14 models (entities), including the core ones. On the database side this translates into 199 tables for Drupal and 25 for Django, since Django usually turns an entity property into a database column. I trashed some entities and merged 4 entities into one.

From entities to models: understanding relationships

When you establish relations between your redesigned entities you can have N:1 relations, N:N relations and 1:1 relations. A Drupal node “Article” that accepts a single term from a vocabulary named “Cheese type” translates into an N:1 relationship between the Article model (N) and the CheeseType model (1). It is a simple case, since you can translate it into a ForeignKey field on your model. In the same way, in the code below Article gets a ForeignKey field named author referencing the Author model.

from django.db import models
from tinymce import models as tinymce_models


# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)


# Articles
class Article(models.Model):
    # on_delete is mandatory since Django 2.0; CASCADE was the implicit default before
    author      = models.ForeignKey('Author', on_delete=models.CASCADE)
    title       = models.CharField(max_length=250, null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')


# Attachments to an Article
class Attachment(models.Model):
    article       = models.ForeignKey('Article', on_delete=models.CASCADE, blank=True, null=True)
    file          = models.FileField(upload_to='attachment_dir', max_length=255, blank=True, null=True)
    description   = models.TextField(null=True, blank=True)
    weight        = models.PositiveSmallIntegerField()

In the case of a list of attachments to an Article, you have a 1:N relationship between the Article model (1) and the Attachment model (N). Since the relationship is reversed, the default Django admin interface does not show the attachments on the article page: you have to create an Attachment and then pick, from a dropdown, the article to attach it to.

For this case, Django provides a handy administration feature called inline, which lets you edit the entities of a reverse relationship on the parent’s page. This approach fixes by design something that costs a lot of effort in the Drupal world, with dozens of modules like Field Collection or workarounds like the one I wrote about in the past, and it keeps your E-R design aligned with your models. Plus, a list of all Attachments is available for free.

Exporting the data from Drupal

JSON is a pretty good interchange format: very fast to encode and decode, and very well supported. I’m fascinated by the YAML format, but since I have to export thousands of articles I need pure speed and solid import/export modules on both the Django and the Drupal side.

There are many export modules in the Drupal world. I’m very fond of Views Datasource, and here is how I used it:

  1. Install Views Json (part of Views Datasource): it is available for Drupal 6 and 7 and very solid
  2. Create a new view with your published nodes with the JSON Data style
    1. Field output: Normal
    2. Without Plain text (you need HTML)
    3. Json data format: Simple
    4. Without Views API mode
    5. application/json as Mime type
    6. Remove all parent / children tag name so you will have only arrays and objects
  3. Choose a path for your view
  4. Limit the view to a large number of elements, e.g. 1000
  5. Sort by node id, ascending
  6. Add an exposed filter “greater than” Nid with a custom Filter identifier (e.g. nid)
  7. Add any field you need to import and any filter you need to limit the results
  8. Avoid caching the view
  9. Limit the access to the view if you don’t want to expose sensitive content (optional)
  10. Install a plugin like JsonView (Chrome) or JsonView (Firefox) to look at the data in your browser

You will get something like this:

      {nid: "30004",
      domainsourceid: "1",
      nodepath: "",
      postdate: "2014-09-17T22:18:42+0200",
      nodebody: "HTML TEXT HERE",
      nodetype: "drupal type",
      nodetitle: "Title here",
      nodeauthor: "monty",
      nodetags: "Drupal, basketball, paintball"}

Now you can reach the view by appending ?nid=0 to your path. It means that any node with an id greater than 0 will be listed. With nid=0 a maximum of 1000 elements are listed. To get the next nodes, simply take the nid from the last record (e.g. 2478) and use it as the value for the nid parameter, obtaining a path that ends with ?nid=2478.

Try it in your browser, simulating what a procedure will do for you: check the response size and adapt the number of elements (step 4) accordingly, to avoid overloading your server, hitting the timeout or simply holding too much data in memory when parsing. When the view response is empty, you’ve listed all nodes matching your filters and the parsing is complete.

In this example I’ve talked about nodes, but you can do the same with files, using fid as the id to pass as a parameter and to sort your rows. In the case of files you have to move the files as well, but it’s pretty simple to import them into a custom model in Django, as you will see.

Importing data to Django

Django comes with some nice export (dumpdata) and import (loaddata) commands. I’ve used the YAML format a lot to migrate and back up data from models, but JSON and XML are other supported formats you can try. However, for this migration I chose a custom admin command to do the job. It’s fast: in less than 10 minutes the procedure imported 15k+ articles, writing logging information on both errors and successes to a custom model.

All the import code in my case, comments and imports included, is about 300 lines of Python code. The core of the import function for the nodes that will become Articles is this:

import json, urllib
# ...
sid = int(options['start'].pop())
reading = True
while reading:
    url = "" % (sid,)
    print url
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    data = data['']
    # no data received, empty view result, quit
    if not data:
        reading = False
    for n, record in enumerate(data):
        sid = int(record['']['nid'])
        # ... do something with data ...

In this loop, sid is the start argument passed to the admin command on the command line. After each record, sid is set to the nid of the last record read, so when the records of a response are exhausted a new request to the view is made, starting from the last element read.
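As a rough illustration of the loop above, here is a minimal Python 3 sketch of the same paging logic, with the HTTP call replaced by a function argument so it can be tried without a server. The names iter_records and fake_view and the in-memory NODES list are hypothetical, and a real response from the view wraps each record differently.

```python
def iter_records(fetch_page, start=0):
    """Yield every record, asking each time for the nodes with nid > sid."""
    sid = start
    while True:
        page = fetch_page(sid)
        if not page:
            # empty view result: every matching node has been listed
            return
        for record in page:
            # remember the last nid read: the next request starts after it
            sid = int(record["nid"])
            yield record


# Hypothetical stand-in for the Drupal view: 7 nodes, at most 3 per request.
NODES = [{"nid": str(n), "nodetitle": "Title %d" % n} for n in range(1, 8)]


def fake_view(sid, limit=3):
    """Return up to `limit` nodes with nid greater than sid, sorted by nid."""
    return [node for node in NODES if int(node["nid"]) > sid][:limit]


if __name__ == "__main__":
    for record in iter_records(fake_view):
        print(record["nid"], record["nodetitle"])
```

Keeping the fetch step separate from the loop also makes it easy to test the stop condition (an empty page) before pointing the import at the real website.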

All input and output is UTF-8 in my case. Views JSON escapes strings as HTML entities, and you have to unescape them before saving them in Django:

import HTMLParser
from django.core.exceptions import ValidationError
from myapp.models import Article, Author

hp = HTMLParser.HTMLParser()
authors = Author.objects.all()
for n, record in enumerate(data):
    try:
        art = Article(
            title = hp.unescape(record['']['nodetitle']),
            body = record['']['nodebody'],
            author = authors.get(alias=record['']['nodeauthor'])
        )
        # run the same validation of an admin interface submit
        art.full_clean()
        art.save()
    except ValidationError as e:
        # cannot save the element
        # inside e all the error data you can save into
        # a custom log model or print to screen
        pass
    except Exception as e:
        # any other exception
        pass

Inside the loop a new Article is declared. The title in the JSON source is named nodetitle: it is unescaped and assigned to the title CharField of Article. The nodebody is set as it is, since the destination field is a text field holding HTML. Finally, the nodeauthor username from the JSON is used as the key to associate the already imported author with the ForeignKey field author, where the username is saved as Author.alias.
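In Python 3 the HTMLParser module was renamed and its unescape method was later removed; html.unescape from the standard library does the same job. The sketch below is only an illustration of how one exported record could be prepared before building the model: clean_record is a hypothetical helper, SAMPLE is invented data, and the field names are taken from the sample record shown earlier.

```python
import html
from datetime import datetime


def clean_record(record):
    """Return a dict with the values of one exported record, ready for the models."""
    return {
        "nid": int(record["nid"]),
        # titles arrive with HTML entities such as &amp; or &quot;
        "title": html.unescape(record["nodetitle"]),
        # the body is stored as HTML, so it is left untouched
        "body": record["nodebody"],
        "author_alias": record["nodeauthor"],
        # "Drupal, basketball" becomes ["Drupal", "basketball"]
        "tags": [t.strip() for t in record["nodetags"].split(",") if t.strip()],
        # %z accepts UTC offsets written as +0200
        "posted": datetime.strptime(record["postdate"], "%Y-%m-%dT%H:%M:%S%z"),
    }


# Invented record, shaped like the sample output of the view.
SAMPLE = {
    "nid": "30004",
    "postdate": "2014-09-17T22:18:42+0200",
    "nodebody": "<p>HTML TEXT HERE</p>",
    "nodetitle": "Tom &amp; &quot;Jerry&quot;",
    "nodeauthor": "monty",
    "nodetags": "Drupal, basketball, paintball",
}

if __name__ == "__main__":
    print(clean_record(SAMPLE))
```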

Performance gains

Here is the download time graph from Google Search Console after some months:

You can clearly see the results in speed, expressed in milliseconds, between 2015 (old Drupal 6 platform) and 2016 (new Django platform).


These are the very basics of how to prepare a migration from Drupal to Django using the Views Datasource module and a custom admin command. I described why I chose Django after years of Drupal development for this migration, suggesting some tools to do the job and introducing some basic concepts for Drupal developers who want to try Django.

I’ve read about Drupal enthusiasts who suffer the same uneasiness as mine after long-time Drupal / PHP development. I talk about reasons to leave Drupal in another post.


  • I quit my Drupal job and I’m programming mostly with Python.
  • In October 2016 Django (Software) surpassed Drupal (Software) in Google Trends. Django has gained 4 points since then, while Drupal has lost 2 points, continuing its decline in popularity on Google search.

    Django vs Drupal on Google Trends. Django surpassed Drupal in October 2016.

How to enable gzip on proxy servers on Apache

I’m starting to use the gunicorn Django app, launched through supervisord. Here is my configuration:

  • Varnish: port 80
  • Apache: port 8080
  • gunicorn: port 4180 (/path/to/my/ run_gunicorn localhost:4180)

Only port 80 is exposed to clients other than localhost. The Varnish default backend is Apache (localhost:8080). I have a Drupal installation and a Django installation on the same machine: since I want to expose Django on the same domain at a defined location, I add this location to Apache:

ProxyRequests Off
ProxyPreserveHost On

<Proxy *>
    Order deny,allow
    Allow from all
</Proxy>

# on port 4180 gunicorn is running
# @see /etc/supervisor.conf
ProxyPass /foo http://localhost:4180/
ProxyPassReverse /foo http://localhost:4180/
<Location /foo>
    Order allow,deny
    Allow from all
    AddOutputFilterByType DEFLATE text/html
</Location>

You can omit AddOutputFilterByType DEFLATE text/html. With it, Apache takes the response from gunicorn, compresses it and then serves it to the client in this way:

(client) -> varnish -> apache -> gunicorn

                (X-Varnish-Cache: MISS) 

Here is an example of what I get:

It’s a big page, but using gzip from 2.2 MB of the uncompressed page I get 417 KB gzipped text/html, less than 1/4 of the original!
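To get a feeling for why text/html compresses so well, here is a small Python sketch that gzips a repetitive HTML page in memory and compares the sizes. The page content is made up for the example (sample_page and gzip_sizes are hypothetical helpers), the ratio depends entirely on the input, and Apache’s mod_deflate may use a different compression level.

```python
import gzip


def sample_page(rows=5000):
    """Build a made-up HTML page: the markup repeats a lot, which gzip exploits."""
    row = "<tr><td class='title'>Article %d</td><td>monty</td></tr>\n"
    body = "".join(row % n for n in range(rows))
    return ("<table>\n" + body + "</table>\n").encode("utf-8")


def gzip_sizes(payload, level=6):
    """Return (original size, gzipped size) in bytes for a bytes payload."""
    compressed = gzip.compress(payload, compresslevel=level)
    return len(payload), len(compressed)


if __name__ == "__main__":
    original, compressed = gzip_sizes(sample_page())
    print("%d bytes -> %d bytes gzipped" % (original, compressed))
```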

Read also on the same topic: How to enable gzip on proxy servers on nginx

Autolaunch a command and restart it on quit on CentOS

I have a django-admin command running as a server thanks to gevent. I want this server to run on boot and restart automatically when it quits. StackOverflow gave me a hint: use Supervisor.

On a CentOS 5 distro:

# find supervisor for your distro...
yum search supervisor
# ...and install it
yum install supervisor.noarch
nano /etc/supervisord.conf

At the end of the file, add a new program:

[program:myfunnydjangocommand]
command=/usr/bin/env python26 /usr/local/etc/django-apps/foo/ tcpapi 4114
priority=999                ; the relative start priority (default 999)
autostart=true              ; start at supervisord start (default: true)
autorestart=true            ; restart at unexpected quit (default: true)
; startsecs=-1                ; number of secs prog must stay running (def. 10)
; startretries=3              ; max # of serial start failures (default 3)
exitcodes=0,2               ; 'expected' exit codes for process (default 0,2)
stopsignal=QUIT             ; signal used to kill process (default TERM)
; stopwaitsecs=10             ; max num secs to wait before SIGKILL (default 10)
; user=root                   ; setuid to this UNIX account to run the program
log_stdout=true             ; if true, log program stdout (default true)
log_stderr=true             ; if true, log program stderr (def false)
logfile=/var/log/myfunnydjangocommand.log    ; child log path, use NONE for none; default AUTO
logfile_maxbytes=1MB        ; max # logfile bytes b4 rotation (default 50MB)
logfile_backups=10          ; # of logfile backups (default 10)

Then, start supervisord.

service supervisord start

Take a look at the supervisord log file:

less +G /var/log/supervisor/supervisord.log

You’ll see something like this:

2013-06-07 11:54:16,559 CRIT Supervisor running as root (no user in config file)
2013-06-07 11:54:16,576 INFO /var/tmp/supervisor.sock:Medusa (V1.1.1.1) started at Fri Jun  7 11:54:16 2013
        Hostname: <unix domain socket>
2013-06-07 11:54:16,645 CRIT Running without any HTTP authentication checking
2013-06-07 11:54:16,654 INFO daemonizing the process
2013-06-07 11:54:16,657 INFO supervisord started with pid 19316
2013-06-07 11:54:16,666 INFO spawned: 'myfunnydjangocommand' with pid 19318
2013-06-07 11:54:17,670 INFO success: myfunnydjangocommand entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Read the documentation about the configuration options, but keep your Supervisor version in mind. I don’t use supervisorctl because of this bug; if you get an error, simply go with service supervisord… If you have a newer version, this should already be fixed.

Note: myfunnydjangocommand.log doesn’t contain anything useful in my experience, but maybe that’s related to how I write the output, since I wrote the command to be used interactively, printing lines directly to the user. I’ll update this post if I find out how to solve this issue.
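One possible explanation, not verified on this setup, is output buffering: when Python’s stdout is not a terminal it is block-buffered, so printed lines can sit in memory instead of reaching the log file. A minimal sketch of a print helper that flushes after every line follows; emit is a hypothetical name and the message is invented. Running the interpreter with the -u option is an alternative that needs no code change.

```python
import sys


def emit(line, stream=None):
    """Write one line and flush at once, so it reaches a log file immediately."""
    if stream is None:
        stream = sys.stdout
    stream.write(line + "\n")
    # without this, output sent to a file or a pipe may wait in the buffer
    stream.flush()


if __name__ == "__main__":
    emit("server listening on port 4114")
```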

Django and Drupal integration using drush via SSH

Some months ago I talked about how to achieve a unified login from Django to Drupal using drush. The basic assumption was that both Drupal and Django are on the same server. What if the two components are on different servers?

Paramiko is an SSH2 protocol library that provides simple classes to make SSH connections. Let’s see how the code that calls drush on the command line changes.


  • paramiko
  • on your app add:
  • DRUPAL_SERVER_SSH_HOST     = '' # Your host here
    DRUPAL_SERVER_SSH_USERNAME = 'YourRemoteServerUserHere'
    DRUPAL_SERVER_SSH_PASSWORD = 'YourRemoteServerPasswordHere'

    And then:

    assert request.user.drupal_id > 0
    # user id to log in
    drupal_id = str(request.user.drupal_id)
    output = ""
    # a list with command as first element and arguments following
    get_password_recovery_url = ["drush", "-r", settings.DRUPAL_SITE_PATH, "-l", settings.DRUPAL_SITE_NAME, "user-login", drupal_id]
    # via ssh
    ssh = paramiko.SSHClient()
    # add to known_host the remote server key if it's not already stored
    # @see
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(settings.DRUPAL_SERVER_SSH_HOST, username=settings.DRUPAL_SERVER_SSH_USERNAME, password=settings.DRUPAL_SERVER_SSH_PASSWORD)
    ssh_stdin, output, ssh_stderr = ssh.exec_command(" ".join(get_password_recovery_url))
    output_lines = output.readlines()
    # taking only the first line of the output:
    # e.g. ''
    # @todo additional statements here
    if ssh:
        ssh.close()
    if output_lines:
        drupal_login_url = output_lines[0].replace("", "http://%s/" % settings.DRUPAL_SITE_URL).strip()
        destination = "%s?destination=%s" % (drupal_login_url, settings.DRUPAL_LOGIN_DESTINATION)
        return redirect(destination)
    return HttpResponse('<h1>Wrong request</h1>')

    This is the same code as in the previous howto, with the difference that drush is now running on a different server from Django. You can use the same method to do anything you need with drush; every time you call this piece of code, an SSH connection is opened.
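One detail worth noting: exec_command receives a single string that the remote shell parses, so an argument containing spaces (a site path, for instance) would be split into several words by a plain " ".join. A hedged sketch of a safer way to build the command with the standard library follows; build_drush_command is a hypothetical helper and the path and site name in the usage example are invented.

```python
import shlex


def build_drush_command(site_path, site_name, drupal_id):
    """Return the drush user-login command as one shell-safe string."""
    args = ["drush", "-r", site_path, "-l", site_name, "user-login", str(drupal_id)]
    # quote every argument so the remote shell sees each one as a single word
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(build_drush_command("/var/www/my site", "example.org", 42))
```

The resulting string can be passed to exec_command in place of the joined list.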

    See also: