Category Archives: Coding

It’s all about programming!


Reduce Time to the First Byte – TTFB on web applications

How can you speed up the time to first byte (TTFB), and what causes a long TTFB? The main causes are network-related and server-side; I will focus on the server-side ones. I’m not covering any CMS here, but you can try to apply some of these techniques to one, starting from how to interpret the browser Timing panel.

Get reliable timing

Take a website with cache enabled: by the 9th visit to a page you can be sure that the page is in cache, the connection with the webserver is alive, the SSL/TLS connection is established, the SQL queries are cached and so on. Open the network tab and enjoy your site speed; very few real users will experience that speed.

Here is a comparison of a first-time, no-cache connection to an nginx webserver, explored with Chrome (F12 > Network > Timing), and a second request for the same page refreshed right after the first:

performance-01

The first-time request took 420% more time than the connected-and-cached one. To obtain a reliable result (1st figure) you should usually:

  • Wait several seconds after the previous call before doing anything, so that the webserver closes the connection with the client
  • Add a ?string to the URL of the page you’re visiting. Change the string every time you want a fresh page.
  • Press Ctrl+Shift+R to reload the page

This technique bypasses the Django view cache and similar cache systems in other frameworks. To check the impact of the framework cache, press Ctrl+Shift+R right after the first request: you will get a result similar to the 2nd figure. There are better ways to do the same; this is the easiest.
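The ?string trick can also be scripted. Below is a minimal Python 3 sketch, not a replacement for the browser Timing panel: cache_busting_url and seconds_to_first_byte are hypothetical helper names, the nocache parameter name is arbitrary, and the measured time includes DNS, connection and TLS setup, so it is closer to the whole first-time request than to the Waiting (TTFB) segment alone.

```python
import time
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def cache_busting_url(url, param="nocache"):
    """Return url with a fresh random query parameter appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


def seconds_to_first_byte(url, timeout=30):
    """Seconds from opening the connection to reading the first body byte."""
    start = time.perf_counter()
    with urlopen(url, timeout=timeout) as response:
        response.read(1)
    return time.perf_counter() - start
```

Calling seconds_to_first_byte(cache_busting_url("https://example.com/page")) a few times, with a pause between calls, gives a rough idea of what a first-time visitor waits for.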

Break up the time report

Unpack the time report of the first-time request:

  • Connection setup (15% of the elapsed time in the example)
    • Queueing: slight, nothing to do.
    • Stalled: slight, nothing to do.
    • DNS lookup: slight, nothing to do.
    • Initial connection: significant, skip for now.
    • SSL: significant, the client establishes an SSL/TLS connection with the webserver. Disabling ciphers or tuning SSL can reduce the time, but the priority here is the best security for the visitor, not pure speed. However, take a look at this case study if you want to tune SSL/TLS for speed.
  • Request / response (85% of the elapsed time in the example)
    • Request sent: slight, browser-related, nothing to do.
    • Waiting (TTFB): significant, time to first byte is the time the user waits after the request is sent to the web server. The waiting time includes:
      • Framework processing.
      • Database queries.
    • Content Download: significant, related to page size, network, server and client. To speed up the content download of an HTML page you should add compression: here is a howto for nginx and one for Apache webservers. These cover proxy servers; applying compression directly on a virtualhost is even simpler, and the performance gain is huge.

Not surprisingly, most of the time of a first-time request is spent on Request / response rather than on connection setup. Within Request / response, Waiting (TTFB) is the dominant part. Luckily, it is the same segment covered by the cache mechanisms of the framework, and consequently it shrinks the most going from the first figure (not cached) to the second (cached by the framework). To reduce the TTFB, database queries and application processing must be optimized.
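To get a feeling for the compression gain mentioned for Content Download, you can gzip a page body offline. This is only a rough Python 3 sketch: the sample HTML is made up and very repetitive, so a real page will compress less, and gzip_saving is a hypothetical helper name.

```python
import gzip


def gzip_saving(html_bytes, level=6):
    """Return (original_size, compressed_size) for a response body."""
    compressed = gzip.compress(html_bytes, compresslevel=level)
    return len(html_bytes), len(compressed)


# A made-up, repetitive HTML body standing in for a real page.
sample = b"<html><body>" + b"<p class='row'>Hello, world!</p>" * 500 + b"</body></html>"
original, compressed = gzip_saving(sample)
print("original: %d bytes, gzipped: %d bytes" % (original, compressed))
```

Feeding the function the saved HTML of one of your own pages gives a more honest estimate.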

Optimize processing: program optimization

When Google, the web giant behind the most used web search engine in history, tried to suggest some tips to programmers for optimizing PHP, they reacted badly, from everyday programmers all the way up to the PHP team.

In a long response, the PHP team lectured Google on how to program the web, offering “some thoughts aimed at debunking these claims” with statements like “Depending on the way PHP is set up on your host, echo can be slower than print in some cases”, a totally confusing comment for a real-world programmer.

Google took the PHP performance page offline. It could be misleading, but it still contained valid optimization tips, especially compared with some of the comments on php.net itself. Google has an interest in speed and code optimization, and the writer had the know-how to talk about it; the PHP team here just wanted to be right and defend their language and, starting from good points, crossed the line of scientific debate.

My program optimization mottos are: look for the language that best suits your work and the best tools you can find, and look for real-world programmers who share their approaches to program optimization.

The PHP team’s whining will not change the fact that avoiding SQL inside a loop, as the Google employee suggested, is the right thing to do to enhance performance. This leads to database optimization.
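The "no SQL inside a loop" advice can be illustrated with a small Python 3 sketch on an in-memory SQLite database. The table, the rows and the two function names are invented for the example; the point is only that the second function sends one query instead of one per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cleo")])
wanted_ids = [1, 3]


def names_in_loop(conn, ids):
    """One query per id: the pattern to avoid."""
    names = []
    for person_id in ids:
        row = conn.execute("SELECT name FROM people WHERE id = ?",
                           (person_id,)).fetchone()
        if row:
            names.append(row[0])
    return names


def names_in_one_query(conn, ids):
    """A single query for all ids, with one placeholder per id."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = "SELECT name FROM people WHERE id IN (%s) ORDER BY id" % placeholders
    return [row[0] for row in conn.execute(sql, list(ids))]
```

Both functions return the same names; with a remote database the loop pays one network round trip per id, the single query only one.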

Dude, where is my data?

The standard web application nowadays has this structure:

A typical web application

A typical web application: the application server runs the application, so from now on (oversimplifying) I will treat application and application server as synonyms.

After the client requests pass through the firewall, the webserver serves static files and asks the application server for the dynamic content.

The cache server can serve the application or the web server; in this example the former has the control. An example of a cache controlled by the application is in the Django docs about Memcached; examples of a cache controlled by the web server are the HTTP Redis module or the standard use of Varnish cache.

The database server (DBMS) stores the structured data for the application. In standard use cases a DBMS can be optimized with little effort. It is more difficult to optimize the way the web application gets the data from the database.

Database query optimization: prefetch and avoid duplicates

To optimize database queries you have to check the timing, again. Depending on the language and framework you are using, there are tools to get information about the queries to optimize.

Since I’m using Python I go with Django Debug Toolbar, a de facto standard for application profiling. Here is a sample of SQL query timing on a PostgreSQL database:

Timing of SQL queries on Django Debug Toolbar.


The total time spent on queries is 137.07 milliseconds and the total number of queries executed is 90. Among these, 85 are duplicates. Below each query you’ll find how many times the same query is executed. The objective is to reduce the number of queries executed.
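If you want the same kind of report without the toolbar, a list of executed SQL strings is enough. The Python 3 sketch below is a hypothetical helper, not the toolbar's own logic: it counts every execution of a statement that ran more than once, which may differ from how Django Debug Toolbar counts duplicates. In Django, with DEBUG = True, django.db.connection.queries holds such a list (each entry has an 'sql' key).

```python
from collections import Counter


def duplicate_report(queries):
    """Return (total, duplicates, counts) for a list of SQL strings.

    'duplicates' counts every execution of a statement that ran more than once.
    Whitespace is normalized so that differently indented copies match.
    """
    counts = Counter(" ".join(q.split()) for q in queries)
    duplicates = sum(n for n in counts.values() if n > 1)
    return len(queries), duplicates, counts


total, duplicates, counts = duplicate_report([
    "SELECT * FROM party WHERE organizer_id = 1",
    "SELECT * FROM photo WHERE party_id = 7",
    "SELECT * FROM photo  WHERE party_id = 7",
])
```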

If you’re using Django, create a manager for your models.py to use like this:

from django.db import models

class GenericManager(models.Manager):
    """
    prefetch_related: join via ORM
    select_related: join via database
    """
    # relation names as seen from Party: the organizer FK and the reverse Photo FK
    related_models = ['organizer', 'photo_set']
    def per_organizer(self, orgz, **kwargs):
        ret = self.filter(organizer=orgz)
        return ret

class People(models.Model):
    name = models.CharField(max_length=50)
    ...

class Party(models.Model):
    organizer = models.ForeignKey('People')
    objects   =  GenericManager()

class Photo(models.Model):
    party = models.ForeignKey('Party')
    ...

Then in views.py call your custom method on GenericManager:

def all_parties(request, organizer_name):
    party_organizer = People.objects.get(name=organizer_name)
    all_parties = Party.objects.per_organizer(party_organizer)
    return render(request, 'myfunnywebsite/parties.html', {
        'parties' : all_parties
    })

When you want to optimize data retrieval for Party, instead of combing through the objects.filter() calls in views.py you only have to fix the per_organizer method, like this:

class GenericManager(models.Manager):
    """
    prefetch_related: join via ORM
    select_related: join via database
    """
    # relation names as seen from Party: the organizer FK and the reverse Photo FK
    related_models = ['organizer', 'photo_set']
    def per_organizer(self, orgz, **kwargs):
        ret = self.filter(organizer=orgz)
        return ret.prefetch_related(*self.related_models)

Using prefetch_related, the queries are grouped by the ORM and all the related objects are available, avoiding many duplicate queries. Here is the result of this first optimization:

django_sql_query_debug_toolbar_2

  • The number of queries dropped from 90 to 45
  • Query execution time dropped from 137.07 to 80.80 milliseconds (-41%)

An alternative method is select_related, but in that case the ORM produces a JOIN, and the code above will raise an error because the reverse relation photo_set cannot be followed this way. If your models are structured so that select_related performs better, go with it, but remember this limitation. In this use case the results of select_related are worse than those of prefetch_related.

Recap:

  • TTFB can be a symptom of server-side inefficiency but you have to profile your application server-side to find out
  • Check SQL timing
  • Reduce the number of queries
  • Optimize application code
  • Use cache systems; the memory-based ones (redis, memcached) are the fastest

In my experience, inefficient code plus a lot of cache is a frail solution compared with the right balance between caching and query + program optimization.

If you’ve tried everything and the application is still slow, consider rewriting it or even changing the framework you’re using, if speed is critical. When every optimization failed, I went from Drupal 6 to a fresh Django 1.8 installation, and Google noticed the difference in the milliseconds spent downloading the pages during indexing:

downloadtime

Since you can’t win a fight with windmills, a fresh start may be the only effective option on the table.


How to start programming in Python on Windows

Developing in Django can be confusing for a new Python developer, and using Windows to develop in Django can be a major obstacle too.

How do you choose the right IDE for Windows, and how do you find and install Python libraries? Below are six fundamental resources for programming with Python on Windows.

Bitnami Django Stack

For developers using Windows, Bitnami Django Stack is a life-saver. It relieves you of the need to install and configure many libraries and simply creates a Python / Django environment on your system. Even if you don’t want to use Django, it can be a great starting point to install Python and the fundamental libraries, which you can extend via PyCharm.

PyCharm

complexlook2x

Screenshot: official website

JetBrains’ PyCharm is the multiplatform IDE for developing in Python. You can forget about indentation issues and focus on programming. The autocomplete dropdown, the Python console, the easy management of DVCS systems (Git, Mercurial) and the easy access to Python package repositories make it the tool for Python programming, especially on Windows, where there are fewer alternatives than on Linux. On Windows, rely on the Bitnami Django Stack you’re using to load the right libraries.

PyPI – Cheese Shop

PyPI is the repository of Python packages. Since PyPI is nearly unpronounceable, you can call it the Cheese Shop. Python was named by Guido van Rossum after the British comedy group Monty Python, and the Cheese Shop is one of their sketches.

Unlike the poor guy in the sketch, you will find every sort of cheese you need in the Cheese Shop.

Pip

Pip is the definitive tool for installing Python packages from the Cheese Shop into your environment. Run pip install package-name and you’ll get the package ready to use. Even more interesting is the pip install -r requirements.txt feature: it installs all the packages listed in the requirements.txt text file, usually shipped with a package that has dependencies.

PgAdmin

pgadmin4-properties.png

Screenshot: official website

Django and the PostgreSQL DBMS are a powerful couple. If you have to use a PostgreSQL database, the best interface you can use is PgAdmin.

Django Packages

Django Packages is the Hitchhiker’s Guide to the Cheese Shop. Do you have to choose a REST framework but don’t want to marry an unreliable partner? Do you need a good photo gallery and want the best Django app for your Django application? Django Packages will guide you to the best solution for your needs.

django-packages

Every feature has a comparison matrix, where the projects are listed in columns and compared on these criteria, derived from GitHub:

  • Project status (production, beta, alpha)
  • Commit frequency in the repository
  • How many times the project was forked
  • Who works on the project
  • Link to online documentation
  • Features comparison

If you’re coming from a CMS like Drupal, here are some tips on how to approach a Model-View-Controller framework like Django, starting from the Entity-Relationship model.

Personal note: back in 1998 I started developing applications for the web using ASP and PHP, and dependencies weren’t an issue since these languages are made for the web. Developing in Python is more challenging and really more fun than programming in PHP. You have a powerful multipurpose language with a ton of libraries, competing in a far larger arena than web development. Not surprisingly, Google uses this language extensively, as do popular web services like Pinterest and Instagram: these last two use Django.

Read also on the same topic: Django development on Virtualbox: step by step setup

From Drupal to Django: how to migrate contents

In a recent article I explained the motivations for upgrading from a no longer maintained Drupal 6 installation to Django 1.8. I will now cover in more detail the migration techniques adopted in the upgrade, and I’ll go deeper into the models and the relationships.

Structure

If you’re a drupaler, you’re familiar with the node/NID/edit and the node/add/TYPE pages:

A-New-Page-Drupal-6-Sandbox

Here we have two visible fields: Title and Body. One is a text input and the other a textarea. The good Form API provided by Drupal calls these two types textfield and textarea. However, if you use the Content type creation interface you don’t see any of this: you just declare some field types and the form is populated with the new fields after the addition.

It’s similar in Django, but you don’t have to go through a graphical interface to do this: the structure is code-driven, and a side effect is the ability to put almost anything under revision control. You can choose between different field types that will be reflected in the database and in the user interface.

Here is what the Drupal Body and Title fields look like in a model called Article:

# models.py
from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')

The TinyMCE part requires the TinyMCE app to be installed and configured. If you’re new to Django, read and follow the great Writing your first Django app tutorial to understand the basics, e.g. the difference between a project and an app, or the following sections will sound pretty obscure.

After editing your projectname/appname/models.py file you can apply the changes to your app via makemigrations (create a migration file for the changes in the database) and migrate (apply the migrations inside the migration files).

In a real-world scenario these two fields alone aren’t enough, not even in Drupal 6. This information is presented by default for any content type in Drupal 6:

authoring-info

Drupal 6 treats authors as entities you can search through an autocomplete field, and the date as a pseudo-ISO 8601 date field. The author field is a link to the User table in Drupal. In Django a similar user model exists, but if you want to decouple access to the admin backend from authorship, it’s simpler to create a custom author model and later associate it with the real user model.

sport3_uml

E-R diagram of the app the Drupal contents are migrated to.

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')

As you can see in the Entity-Relationship diagram, one Article must have one and only one Author, but many Articles can have the same Author. This is called a Many-to-one relationship, and it’s represented in Django as a foreign key from the “many” model (e.g. Article) to the “one” model (Author).

The Article.publishing_date field is where publishing date and time are stored and, clicking on the text field, a calendar popup is presented to choose the day and hour, with a useful “now” shortcut to populate the field with the current time.

calendario

How a calendar is represented in a DateTime field.

Now that the basic fields are in place you can run makemigrations / migrate again to update your app, then restart the webserver to apply the changes.

Attachments and images

Drupal ships with the ability to upload files and images to nodes. Django has two different fields for this: FileField and ImageField. Before continuing we have to rethink our E-R model to allow attachments.

sport3_uml

The models.py code is:

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')
# Attachments
class Attachments(models.Model):
    description = models.CharField(max_length=255, default='', blank=True)
    list = models.BooleanField(default=True)
    file = models.FileField(upload_to='attachments_directory', max_length=255)

Images are similar: if you want to enrich your model with images you can create another model like Attachments but with an ImageField instead. Remember to use a different upload_to directory in order to keep the attachments and images separated.

One last field is missing to complete our models: path. Django comes with a useful SlugField that, as of Django 1.8, allows only ASCII characters and can be mapped to another field, the title for example.

from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')
    # the path field was missing from this listing; the options below are an assumption:
    # it can be empty during the import, so it is neither required nor unique yet
    path        = models.SlugField(max_length=250, blank=True, default='')

Keep in mind that a SlugField differs from a Drupal path field because it doesn’t allow slashes. Consider a path like this:

news/news-title

In Drupal you will have either A) a view with the path news and the argument news-title or B) a fake path generated by pathauto or similar modules. After years of Drupal development, I can affirm that option B is the typical easy way that turns into a maintenance nightmare. Django core, as far as I know, allows only option A, so if you want a news view you have to declare it in urls.py and then in views.py, as stated in the official documentation.

  • news/: the news root path, coupled with the view
  • news-title: the argument passed to the view and the SlugField content for an article. It must be unique to be used as a key to retrieve an article, but since it can be empty we cannot force it to have a value or to be unique at first. When all the data are imported and fixed we can change this field to unique, which also improves database retrieval performance.

Categories

And what about categories? If you have a category named Section, and an article can be associated with only one Section, you have to create a Many-to-one relationship. As you saw before, you have to put the foreign key on the N side of the relationship, in this case Article, so the Article model will have a ForeignKey field referencing a specific section.

On the other hand, if you have tags to associate with your articles you have to create a Tag model with a Many-to-many relationship to Article. Django will create an intermediate model storing the Article-Tag relationships.

Do not overuse M2M relationships, because each relationship needs a separate table and the number of JOINs between database tables will increase, with side effects on performance that may not even be perceivable at first since the Django ORM is very efficient. Event handling will also be more difficult for a beginner, since the many-to-many events occur only when the parent models are saved, and it takes some experience to add a custom action to an M2M event. If you design your E-R model wisely you have nothing to be scared of.

Migration techniques

Now that we have the destination models, fields and relationships we can import the content from Drupal. In the previous article I suggested using the Views Datasource module to create a JSON view to export the content. Please read the Exporting the data from Drupal section of that article before continuing.

The resulting output is something like:

{
  "": [
    {
      "": {
        "nid": "30004",
        "domainsourceid": "2",
        "nodepath": "http://example.com/path/here",
        "postdate": "2014-09-17T22:18:42+0200",
        "nodebody": "HTML TEXT HERE",
        "nodetype": "drupal type",
        "nodetitle": "Title here",
        "nodeauthor": "monty",
        "nodetags": "Drupal, dragonball, paintball"
      }
    },
    ...
  ]
}

If you don’t have a multi-site Drupal you can ignore the domainsourceid field. The nodetags field lists some Tag names of a Many-to-many relationship not covered here.

All the other values are useful for the import:

  • nid: the original content id, used for pagination and retrieval
    Destination: parsing
  • nodepath: content path
    Destination: Article.path
  • nodebody: content body
    Destination: Article.body
  • nodetype: type of the node
    Destination: parsing
  • nodetitle: title of the node
    Destination: Article.title
  • nodeauthor: author of the content
    Destination: Article.author -> Author.alias
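The mapping above can be sketched as a plain Python 3 function that turns one exported row into values for the Article fields. Several choices here are assumptions and not part of the original import: the function name map_drupal_row, the author_alias key (the alias still has to be resolved to an Author instance), taking the last segment of nodepath as the slug, and sending postdate to Article.publishing_date.

```python
from datetime import datetime
from urllib.parse import urlsplit


def map_drupal_row(row):
    """Translate one exported Drupal row into values for the Article fields."""
    # Last non-empty path segment, e.g. 'news-title' from '/news/news-title'.
    segments = [s for s in urlsplit(row["nodepath"]).path.split("/") if s]
    return {
        "path": segments[-1] if segments else "",
        "title": row["nodetitle"],
        "body": row["nodebody"],
        "author_alias": row["nodeauthor"],
        # '2014-09-17T22:18:42+0200' becomes a timezone-aware datetime
        "publishing_date": datetime.strptime(row["postdate"], "%Y-%m-%dT%H:%M:%S%z"),
    }


sample_row = {
    "nid": "30004",
    "nodepath": "http://example.com/path/here",
    "postdate": "2014-09-17T22:18:42+0200",
    "nodebody": "HTML TEXT HERE",
    "nodetype": "drupal type",
    "nodetitle": "Title here",
    "nodeauthor": "monty",
}
mapped = map_drupal_row(sample_row)
```

Keeping the mapping in one function makes it easy to test against a few saved rows before running the real import.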

In the previous article you can find how to build the View on Drupal (the source), and now you have a rough idea of the field mapping. How do you fetch the data from Django?

Management command and paged view

To start a one-time import you can write a custom management command for your Django application named project/app/management/commands/myimport.py.

from __future__ import unicode_literals
from django.core.management.base import BaseCommand, CommandError
from django.core.exceptions import ValidationError, MultipleObjectsReturned, ObjectDoesNotExist
import json, urllib
import urlparse
from shutil import copyfile
from django.conf import settings
from os import sep
from django.core.files.storage import default_storage
from django.utils.text import slugify
import requests
import grequests
import time
from md5 import md5

class Command(BaseCommand):
    help = 'Import data from Drupal 6 Json view'
    def add_arguments(self, parser):
        parser.add_argument('start', nargs=1, type=int)
        parser.add_argument('importtype', nargs=1)
        # Named (optional) arguments
        # Crawl
        parser.add_argument('--crawl',
            action='store_true',
            dest='crawl',
            default=False,
            help='Crawl data.')
    def handle(self, *args, **options):
        # process data
        pass
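Since Django 1.8, add_arguments receives a standard argparse parser, so the behaviour of these arguments can be tried outside Django. The Python 3 sketch below rebuilds the same three arguments on a plain ArgumentParser (the prog name is only cosmetic). It shows that nargs=1 stores a one-element list, which is why the import code has to take the value out of the list before using it.

```python
import argparse

parser = argparse.ArgumentParser(prog="myimport")
parser.add_argument("start", nargs=1, type=int)
parser.add_argument("importtype", nargs=1)
parser.add_argument("--crawl", action="store_true", dest="crawl", default=False,
                    help="Crawl data.")

# Same command line as: python manage.py myimport 0 article --crawl
options = vars(parser.parse_args(["0", "article", "--crawl"]))
print(options)
```

Dropping nargs=1 would store a plain int and a plain string instead of lists; the original command keeps the lists.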

This management command can be launched with

python manage.py myimport 0 article --crawl

Here 0 is the nid to start after (the import begins from the following item), “article” is the type of content to import (e.g. the destination model) and --crawl is the import option. Let’s add the import logic to the Command.handle method:

def handle(self, *args, **options):
    try:
        assert options['crawl'] and options['importtype']
        # start to import or store data
        sid = int(options['start'].pop())
        reading = True
        while reading:
            importazioni = []
            articoli = []
            url = 'http://www.example.com/json-path-verylongkey?nid=%d' % (sid,)
            print url
            response = urllib.urlopen(url)
            data = json.loads(response.read())
            data = data['']
            # no data received, quit
            if not data:
                reading = False
                break
            for n, record in enumerate(data):
                sid = int(record['']['nid'])
                title = record['']['nodetitle']
                # continue to process data, row after row
                # ...

    except AssertionError:
        raise CommandError('Invalid import command')

This example fetches /json-path-verylongkey starting from the nid passed on the command line + 1. Then it processes the JSON row by row and keeps in memory the id of the last item. When no more content is available, the loop stops. It’s a common method and it’s lightweight on the source server, because only one request at a time is sent and then the response is processed. However, this method can also be slow, because the waiting times add up: (request 1 + response 1 + parse 1) + (request 2 + response 2 + parse 2) etc.
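The same paging logic can be isolated in a generator, which makes it testable without a Drupal site. This is a Python 3 sketch under stated assumptions: fetch_page is a hypothetical callable standing in for the HTTP request, and the records are flattened (no nesting under the empty key) to keep the example short.

```python
def iter_records(fetch_page, start_nid=0):
    """Yield records page after page until the source returns an empty page.

    fetch_page(nid) is a hypothetical callable returning the list of records
    whose nid is greater than the given one.
    """
    last_nid = start_nid
    while True:
        page = fetch_page(last_nid)
        if not page:
            return
        for record in page:
            # remember the last id seen: the next page starts after it
            last_nid = int(record["nid"])
            yield record


# Stand-in for the remote JSON view: five records, two per page.
FAKE_NODES = [{"nid": str(n), "nodetitle": "Title %d" % n} for n in range(1, 6)]


def fake_fetch_page(nid, per_page=2):
    newer = [r for r in FAKE_NODES if int(r["nid"]) > nid]
    return newer[:per_page]


titles = [r["nodetitle"] for r in iter_records(fake_fetch_page)]
```

In the real command, fetch_page would wrap the URL request and the JSON parsing shown above.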

Multiple, asynchronous requests

We can speed up the retrieval by using grequests. First you have to find out which is the last element, by cloning the Drupal data source JSON view so that it shows only the last item, and then fetching its id.

def handle(self, *args, **options):
    try:
        assert options['crawl'] and options['importtype']
        # start to import or store data
        sid = int(options['start'].pop())
        # find last node id to create an url list
        url = 'http://www.example.com/json-path-verylongkey-last-nid'
        response = requests.get(url, timeout = 50)
        r = response.json()
        last_nid = int(r[''].pop()['']['nid'])

You can then create a from-to range starting from the first element passed by command line to the last.

url_pattern = "http://www.example.com/json-path-verylongkey-last-nid?fromnid=%d&tonid=%d"
urls = []
per_page = 20
# e.g. [0, 20, 40, 60]
relements = list(range(0, last_nid, per_page))
# make sure the last range reaches the last nid
if relements[-1] < last_nid:
    relements.append(last_nid + 1)
# pair each boundary with the next one: (0, 20), (20, 40), ...
for fromx, toy in zip(relements, relements[1:]):
    u = url_pattern % (fromx, toy)
    urls.append(u)

# urls is the local list built above
rs = (grequests.get(u) for u in urls)
# blocking request: stay here until the last response is received
async_responses = grequests.map(rs)
# all responses fetched

per_page is the number of elements per page specified in the Drupal JSON view. Instead of a single nid parameter, fromnid and tonid are the “greater than” and “less than or equal to” parameters specified in the Drupal view.
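The from-to pairs can also be computed by a small function that is easy to check on its own. This Python 3 sketch is a variation, not the original code: build_ranges is a hypothetical name, it starts from an explicit first nid, and it closes the last range on last_nid itself, relying on tonid being a “less than or equal to” filter.

```python
def build_ranges(first_nid, last_nid, per_page):
    """Return (fromnid, tonid) pairs covering first_nid < nid <= last_nid."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    pairs = []
    lower = first_nid
    while lower < last_nid:
        upper = min(lower + per_page, last_nid)
        pairs.append((lower, upper))
        lower = upper
    return pairs


url_pattern = "http://www.example.com/json-path-verylongkey-last-nid?fromnid=%d&tonid=%d"
urls = [url_pattern % pair for pair in build_ranges(0, 45, 20)]
```

With a last nid of 45 and 20 elements per page this produces three URLs, the last one covering the nids from 41 to 45.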

The core of the asynchronous, multiple requests is grequests.map(). It takes a collection of prepared requests and sends them concurrently. The responses can arrive in any order, but async_responses will contain all of them, in the same order as the requests.

At that point you can treat the response list like before, parsing the response.json() of each element of the list.
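If you prefer not to depend on grequests, the same fan-out can be sketched with the standard library. This is a swapped-in technique, not what the import above uses: a thread pool from concurrent.futures in Python 3, with fake_fetch as a hypothetical stand-in for an HTTP GET that returns parsed JSON.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=5):
    """Call fetch(url) for every url concurrently; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    """Stand-in for an HTTP GET returning parsed JSON (hypothetical payload)."""
    return {"url": url, "records": []}


pages = fetch_all(["page-1", "page-2", "page-3"], fake_fetch)
```

A low max_workers value keeps the load on the source server under control, which matters when the source is an old production site.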

With these hints you can now create JSON views within Drupal, ready to be fetched and parsed in Django. In a future article I will cover the conversion of the data into Django objects using the Django ORM.

How to display a custom cover embedding a youtube video and when stopped display the cover again

I need to display a custom image cover in front of an embedded Youtube video.

After the video has stopped, I need to display again the clickable cover.

For a better graphical result I’ve added a hover image for the cover and a fade-in of the cover when the video ends. To do this I’ve used the Youtube iframe API.

This code is for jQuery 1.4.4. If you have a newer version of jQuery and live() is not working change live() to on().

Here is the HTML:

<a id="idcover" href="#" 
style="display: block; width: 100%;">
<img src="/path/to/cover/off.jpg" alt="Video"></a>

Here is the JS:

// include youtube API
$.getScript("http://www.youtube.com/player_api");
var myselector = "#idcover";
// preload the image displayed on hover to avoid glitches: 900 width, 500 height
var overimg = new Image(900, 500);
overimg.src = '/path/to/cover/on/hover.jpg';
var offimg_src = overimg.src;
$(myselector).live('mouseover', function (e) {
    offimg_src = $(this).find('img:first').attr('src');
    $(this).find('img:first').attr('src', overimg.src);
});
$(myselector).live('mouseout', function (e) {
    $(this).find('img:first').attr('src', offimg_src);
});
$(myselector).live('click', function (e) {
    e.preventDefault();
    // add video player container
    var playerid = 'yourplayercontainerid';
    $(myselector).after('<div style="display: none;" id="' + playerid + '"></div>');
    // I suppose the framework is loaded before the click, so this is not strictly necessary
    // function onYouTubeIframeAPIReady() {
    window.player = new YT.Player(playerid, {
        width: '100%',
        height: 720,
        videoId: '7W2vjTgzucA', // your youtube code here
        playerVars: { 'autoplay': 1, 'controls': 1, 'rel': 0 },
        events: {
            'onReady': onPlayerReady,
            'onStateChange': onPlayerStateChange
            // 'onError': onPlayerError
        }
    });
    // }
    function onPlayerReady(event) {
        // hide cover
        $(myselector).hide();
        // view the player
        $('#' + playerid).show();
    }

    function onPlayerStateChange(e) {
        // if stopped (end reached), put the cover back and destroy the video player
        if (e.data == 0) {
            $(myselector).fadeIn(500);
            // destroy iframe player
            window.player.destroy();
            // destroy player container
            $('#' + playerid).remove();
            // now the cover is ready for another click, and the whole
            // process will restart when the user clicks on the cover
        }
    }
});

Clear Varnish cache via PHP: a Drupal 7 proof of concept

Using Varnish as a reverse proxy or proxy is a useful approach to reduce the load on webservers like Apache.

In Drupal 7 I have to clear the Varnish cache of a specific domain when the Drupal caches are globally cleared. Drupal has the right hook, invoked when the caches are cleared:

function clearcachevarnish_flush_caches() {
  $filename = '/var/www/varnishdomains2cleardir/varnishdomains2clear';
  // each domain on a separate line: append to the end of the file
  $myfile = fopen($filename, "a");
  $h = $_SERVER['HTTP_HOST'];
  $txt = $h . "\n";
  fwrite($myfile, $txt);
  fclose($myfile);
  drupal_set_message('Varnish cache queued to be cleared. Please wait 1 minute before checking.');
  // no cache table should be cleared
  return array();
}

This piece of code simply appends the current domain to an ASCII text file at /var/www/varnishdomains2cleardir/varnishdomains2clear.

Preparing the file for writing

On CentOS you have to add /var/www/varnishdomains2cleardir to the httpd-writable directories list using:

mkdir /var/www/varnishdomains2cleardir;
chcon -v --type=httpd_sys_content_t /var/www/varnishdomains2cleardir;
chown myuser:mygroup /var/www/varnishdomains2cleardir;
chmod -R 777 /var/www/varnishdomains2cleardir;
touch /var/www/varnishdomains2cleardir/varnishdomains2clear;

Now the empty file is ready to be written by your hook_flush_caches() implementation. Enable the clearcachevarnish module and clear the cache to write the current domain name to the file.

The clear varnish cache script

To clear the varnish cache you usually have to be logged in as root and use the varnishadm command. Here is a script that reads the domains file written above, clears the varnish cache for each domain and then removes the domain lines.

#!/bin/bash
# only root is allowed to run this script
callinguser=$(whoami)
if [ "root" != "$callinguser" ]
then
  echo "Only root can run this command."
  exit 1
fi
cd /path/to/clear/cache/command/

date=$(date +%Y-%m-%d_%H:%M:%S)

# check lock
# prevent the script from being run more than once
if [ -f /tmp/clearcachevarnish-lock ]; then
  echo "Script clearcachevarnish is already running. You can rm /tmp/clearcachevarnish-lock to break the lock manually."
  exit 1
fi
touch /tmp/clearcachevarnish-lock
# dominidapulire = "domains to clean": the queued domains, one per line
dominidapulire=$(cat /var/www/varnishdomains2cleardir/varnishdomains2clear)
while [[ -n $dominidapulire ]]
do
  # take the last line as the current domain, then drop it from the list
  dominio=$(echo "$dominidapulire" | sed -n '$p')
  echo "$dominio"
  dominidapulire=$(echo "$dominidapulire" | sed '$d')
  if [ "" != "$dominio" ]
  then
    varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret ban req.http.host == "$dominio"
    echo "varnish cleared on $dominio"
  fi
done
# remove all domains lines
truncate --size 0 /var/www/varnishdomains2cleardir/varnishdomains2clear

# remove lock
rm /tmp/clearcachevarnish-lock

Save this script as an .sh file and make it executable with chmod a+x. When you run it, the Varnish cache is cleared for every domain on the list. Running it by hand isn't practical when you work from the Drupal UI, so we should schedule the task to run periodically, e.g. every minute.
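
Since the script is meant to run every minute, a stale lock matters: if the script dies between the touch and the rm, the lock file stays in place and every later run refuses to start. Here is a minimal sketch of a sturdier lock, not part of the original script. It assumes a POSIX shell (so it also works in Bash); the names acquire_lock and LOCKDIR are hypothetical. mkdir either creates the directory or fails, so checking and taking the lock is a single step, and a trap removes the lock when the shell exits.

```shell
# Hypothetical lock location; set LOCKDIR beforehand to override it.
LOCKDIR="${LOCKDIR:-/tmp/clearcachevarnish-lock.d}"

# Try to take the lock. Returns 0 on success, 1 if another run holds it.
acquire_lock() {
  if mkdir "$LOCKDIR" 2>/dev/null; then
    # Release the lock whenever this shell exits, even after an error.
    trap 'rmdir "$LOCKDIR" 2>/dev/null' EXIT
    return 0
  fi
  return 1
}

# Possible use at the top of the script:
#   acquire_lock || { echo "clearcachevarnish is already running."; exit 1; }
```

With this variant the final rm of the lock is no longer needed, because the trap does the cleanup.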

Scheduling the varnish clear cache

Here is the crontab entry that executes the script every minute (system crontab format with a user field, as used in /etc/crontab):


* * * * * root /path/to/clear/cache/command/clearcachevarnish.sh

The steps

  1. The user clears the Drupal cache
  2. hook_flush_caches() is invoked: the domains list file is written
  3. clear varnish cache script is launched by root every minute
  4. for each domain in the list, varnish cache is cleared
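
In steps 2 to 4 the PHP hook and the root script share one file, so a domain appended between the moment the script reads the file and the moment it truncates it would be lost, and a domain cleared twice within a minute is banned twice. The sketch below shows one way to narrow that window. It is an assumption of mine, not the original script: the function name drain_queue and the .processing suffix are hypothetical, and the chmod mirrors the world-writable setup used earlier.

```shell
# Print the queued domains once each and leave an empty queue behind.
# $1 is the queue file written by hook_flush_caches().
drain_queue() {
  dq_queue="$1"
  dq_snapshot="$dq_queue.processing"

  # Nothing to do when the queue is missing or empty.
  [ -s "$dq_queue" ] || return 0

  # Renaming is atomic within one filesystem: a writer that opens the
  # queue after this point appends to a fresh file, not to the snapshot.
  mv "$dq_queue" "$dq_snapshot" || return 1
  # Recreate the queue so it exists with the expected permissions.
  touch "$dq_queue"
  chmod a+rw "$dq_queue"

  # Drop empty lines, then print each domain only once.
  sed '/^$/d' "$dq_snapshot" | sort -u
  rm -f "$dq_snapshot"
}

# Possible use in place of the read-and-truncate logic:
#   drain_queue /var/www/varnishdomains2cleardir/varnishdomains2clear |
#   while read -r dominio; do
#     varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret ban req.http.host == "$dominio"
#   done
```

A writer that already had the old file open during the rename can still slip through, so this reduces the race rather than removing it.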

This is the end of this proof of concept. The code hasn't been tested against attacks, so please comment if you have any suggestions to improve it. I'm not very fond of having a PHP script write something that a bash script then reads, but this is the least problematic solution I found for this case.
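
One concrete risk is that the Host header is chosen by the client, so every line in the domains file is untrusted input that ends up on a command line run by root. As a minimal sketch, and only an assumption about what a fix could look like, the script could reject anything that does not look like a plain host name before calling varnishadm. The function name is_valid_domain is hypothetical, and the allowed character set (letters, digits, dots, hyphens, and a colon for an optional port) is deliberately strict.

```shell
# Return 0 only when $1 is made of host name characters.
is_valid_domain() {
  case "$1" in
    # Reject the empty string and any character outside the allowed set.
    "" | *[!A-Za-z0-9.:-]*) return 1 ;;
    # Reject values that start or end with punctuation.
    [.:-]* | *[.:-]) return 1 ;;
  esac
  return 0
}

# Possible use inside the loop of the script:
#   if is_valid_domain "$dominio"; then
#     varnishadm -T 127.0.0.1:6082 -S /etc/varnish/secret ban req.http.host == "$dominio"
#   fi
```

The check does not prove that the domain belongs to the site; it only keeps quotes, spaces and shell or ban-expression operators out of the command.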

Web fonts and dynamic height calculation issues on jQuery

Nowadays we have nice fonts on web pages thanks to Google Fonts and other web fonts. Take this case: you have to set two divs to the same height. One (div.funny) contains some text set in a Google Font, the other (div.very) is empty.

In the Chrome console you type something like:

jQuery(".very", ".myview").height(function () {
  // return the new height: the height of .funny inside the same .myview
  return jQuery(this).parent(".myview").find('.funny').height();
});

Div.very and div.funny are now at the same height.

Now, if you try to do the same on jQuery document ready, you get elements with different heights. Why?

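Because the calculation happens on document ready, before the fonts are loaded. The solution is to wrap the code in a window load handler. The $(window).load() shorthand was removed in jQuery 3.0, so the handler is attached here with $(window).on('load', ...), which is available from jQuery 1.7 onwards.

$(window).on('load', function () {
  $(".very", ".myview").height(function () {
    // return the new height: the height of .funny inside the same .myview
    return $(this).parent(".myview").find('.funny').height();
  });
});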

Now .very and .funny are at the same height.

See also:
Calculate Container’s Height After The Font File Loads
