PIL: ord() expected a character, but string of length 0 found (SOLVED)

While using Django and easy_thumbnails together with Pillow, I stumbled upon this error raised in PIL.ImageFile from PIL/_binary.py:

ord() expected a character, but string of length 0 found

This Python error was so frequent that I did some research, but came up with nothing.

I’ve checked current Pillow version with:

pip freeze | grep Pillow

Getting:

Pillow==3.0.0

Then I’ve upgraded the Pillow package with this:

pip install Pillow --upgrade

The software was updated to the latest version (5.0.0) without any issue on the easy_thumbnails backend or frontend.

Consequently, pip freeze returned:

Pillow==5.0.0

and the errors were gone.
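If you maintain several environments, the same check can be scripted. This is only a sketch: `parse_requirement` and `needs_upgrade` are hypothetical helpers, and they assume purely numeric versions like the ones printed by pip freeze above (for anything more complex, a real version parser such as the `packaging` library is a safer choice).

```python
def parse_requirement(line):
    """Split a `pip freeze` line such as 'Pillow==3.0.0' into (name, version tuple)."""
    name, _, version = line.strip().partition("==")
    return name, tuple(int(part) for part in version.split("."))


def needs_upgrade(freeze_line, minimum=(5, 0, 0)):
    """True when the pinned version is older than `minimum`."""
    _, version = parse_requirement(freeze_line)
    return version < minimum


old_pin_needs_upgrade = needs_upgrade("Pillow==3.0.0")
new_pin_needs_upgrade = needs_upgrade("Pillow==5.0.0")
```

Tuples are compared element by element, so (3, 0, 0) sorts before (5, 0, 0) without any string tricks.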

tl;dr: Update Pillow from 3.0.0 to the latest version (5.0.0 at the time of writing)

Note: this error can be accompanied by others:

  • unpack requires a string argument of length 2
  • string index out of range

Upgrading Pillow also fixes these.

Why you should not use Drupal anymore

I started using Drupal in 2007. For about 9 years I visited drupal.org almost daily, released a module, suggested some patches, participated in local events and so on. I started working with Drupal 5 and ended on Drupal 7, after a long time on Drupal 6.

In the meantime, Acquia was created to support Drupal development and to make money from the project and for the project, in a typical open source scheme: free software accompanied by paid services.

This is my report based on those years.

Flaws

On my journey to Drupal and beyond I enjoyed the community, but I saw many issues over those years:

  • Too many security flaws in core and contrib.
  • Too many bugfix releases to patch these flaws: maintaining dozens of websites without Acquia’s automatic update services becomes a challenge (and that’s the reason you want to pay for them).
  • Contrib modules are frequently poorly designed and maintained.
  • Issues remain unfixed forever, or are automatically closed without actually being fixed, in both core and contrib.
  • Any major release can be totally different from the previous one, requiring extra effort for nothing.
  • Drupal inherits all the PHP problems and tries to overcome them with internal functions that replace some of those from the language, accompanied by the good Drupal API documentation.

The result of these issues together is that you cannot use Drupal without a dedicated Drupal team taking care of fixing components. Even two people are not enough to develop and follow a couple of Drupal websites, not to mention a lone developer.

I’ve talked with other developers who used Drupal and other applications to design and publish websites, and their conclusion is simple: if you have a small team, you cannot use Drupal. Not anymore. Maintenance would be overwhelming.

Following this fondness for big teams, the release of Drupal 8 confirmed how weak the power of the community has become compared to that of core contributors and Acquia (where many of them now work). They deliberately moved this open source software from a multipurpose Content Management System usable by small and big firms to a software that has the Enterprise world in mind and forgets the others.

This was a shock for many small developers and enthusiasts.

GTFO

I’ve read about Drupal enthusiasts who suffer the same uneasiness as mine after long-time Drupal or PHP development.

Here’s a list of their experiences along with the number of years they used this software:

Some of them migrated to Django like me. During this journey I discovered a wonderful language, Python, with its tools and documentation. I also discovered how deeply you can customize the framework to do the job, concentrating on the important things.

You cannot tell the difference without using another framework like Django or WordPress, just to pick some very different beasts. You need to compare Drupal with others to feel the difference.

Try and choose

While Drupal tries to overcome the PHP language, Django uses only a fraction of the power of Python, and it’s not the best tool on Earth for building websites, as I supposed Drupal was.

This means that I can move for example to Flask when I have to build small or focused web applications, or to Kivy when I have to make a desktop and mobile app, using the same language and the same Python packages, and building my own classes to share when needed.

This is actually a change of perspective: choosing a language before the framework, so that you can easily switch from one framework to another in case the project goes wild.

Upgrading an existing application built on a well-designed framework is straightforward compared to migrating between major versions of Drupal. During the last 7 years, Django has preserved much of its structure, making the maintenance and upgrade of websites simple. Virtualenvs surely help, but the whole design supports the developer in his duties. This is not a unique feature of Python and Django, but it’s what PHP/Drupal lacks.

Here’s an example of how a framework built in Python can scale, and of how to migrate between major versions of the language. Even with a very big website this can be feasible when the framework design works on your side:

Decreasing popularity

Another reason to think about leaving Drupal is its decreasing popularity. It may seem naive, but it’s a very important matter for open source software, since a weak community leaves bugs unfixed and creates fewer contrib modules.

Here’s the popularity in Google Trends of Drupal compared to Django between May 1st, 2007 and December 31st, 2017:

drupal-django-trends-2007-2017

The golden age of Drupal was a long time ago, at the time of Drupal 6 and early Drupal 7. The decline followed the effort to build Drupal 8 in 2012, and the outcome of this transition is better described by this graph:

drupal-wordpress-trends-2007-2017

Drupal was a credible WordPress competitor back in 2008, scoring a 1:3 ratio in Trends at the time of Drupal 5. In January 2018, with Drupal 8 as the major version, the ratio had dropped to 1:10.

Drupal failed to become the alternative to WordPress and was actually surpassed by niche, low-level alternatives like Django. Because by now:

  • Those who want to build web applications use Python/Django, Ruby on Rails, JavaScript/Node.js, Java, C# and so on.
  • Those who want to build websites with a great backend while still using PHP use WordPress.

PHP as a declining language

If that’s not enough, here’s the TIOBE index for the PHP language compared to Python from 2002 to 2018:

tiobe-python-php

These are simple indexes, but you can find other evidence of the usage crisis of the Drupal/PHP pair. Take another index like PYPL, read some of the experiences listed above or read this good picture of the Drupal development cycle. Find as many sources as you want, the conclusion is the same: Drupal is now a declining framework written in a declining language. Those who still use PHP go for the winner, WordPress.

What is to be done?

If you’re reading this, you’re probably asking yourself how to leave Drupal. Start using another language and framework that suit your needs, and then feel the difference. Even if you still want to use Drupal, using another language and framework will surely help you write better code.

A new language for Social Media managers on Twitter

In recent months I looked for a tool to shape my community on Twitter, to follow interesting profiles and to increase my followers.

I had bad experiences with third-party integrations (apps), so I wanted this tool to let me create my own app on Twitter, without third-party involvement, for better security and privacy.

Since I wanted real new followers and I didn’t want to violate Twitter policies, I looked for a tool able to select and filter users from my network, choosing only those who are relevant to follow.

I wasn’t looking for a new, fancy SaaS website with a monthly fee to pay, and I wanted to handle many accounts at the same time without additional costs.

I wanted a tool able to run on my own PC, with full access to the source code, to keep my data from being stolen and to understand its inner mechanics.

Well, that tool didn’t exist at the time.

So I decided to write my own.

The fancy app

At first I started to develop a full-featured Twitter desktop app with a user interface, running on my local machine. I added icons for the actions, tables to list users, buttons to perform actions, a lot of checkboxes to select users and so on.

The result was a good-looking app that worked. But when I tried to filter and select users using some criteria, it all became clumsy.

As a programmer, I started to feel that my own application was a cage.

I couldn’t search for users in my network by multiple or complex criteria. I couldn’t merge, diff or intersect different sets of users.
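To give an idea of the operations I was missing, here is how merge, diff and intersect look on plain Python sets. The account names and the three sets are made up for illustration, and this is not the .ows syntax.

```python
followers = {"alice", "bob", "carol"}      # hypothetical accounts following me
friends = {"bob", "carol", "dave"}         # hypothetical accounts I follow
muted = {"carol"}                          # hypothetical accounts I want to leave out

mutual = followers & friends               # intersect: we follow each other
not_following_back = friends - followers   # diff: I follow them, they don't follow me
everyone = followers | friends             # merge: the whole network
candidates = (followers - friends) - muted # followers I don't follow yet, minus muted
```

Each line is trivial in a language, and painful to reproduce with checkboxes and tables.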

Then I realized that what I was really looking for wasn’t an app but a brand new scripting language to manage social networks.

The programming language

I started to look for open source solutions for creating this new programming language for social media management using Python 3. There is only a small bunch of instructions to add to this programming language, but I wanted it to be efficient and well-designed.

Here textX by professor Igor Dejanović comes to help: a tool built on top of the Arpeggio PEG parser. Among all formal grammars and parsers, Parsing Expression Grammar (PEG) seems to me the best for my purpose and the most modern approach to parsing.

The grammar of this new language is written in a single file and can be graphically represented using the DOT language. textX then builds a meta-model from the grammar and uses it to parse the language: this is where the language comes to life.

.ows language grammar

To handle Twitter I used the solid Tweepy by Joshua Roesslein, and to query the social network, SQLAlchemy and SQLite.
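As a rough idea of what querying the network from a local database means, here is a sketch that uses the standard library sqlite3 module in place of SQLAlchemy. The `users` table, its columns and the sample rows are assumptions made up for this example, not the schema used by the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (screen_name TEXT PRIMARY KEY, followers INTEGER, "
    "follows_me INTEGER, i_follow INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("alice", 1200, 1, 0),
        ("bob", 80, 1, 1),
        ("carol", 5400, 0, 1),
        ("dave", 30, 0, 0),
    ],
)

# Accounts that follow me, that I do not follow yet, with at least 100 followers.
rows = conn.execute(
    "SELECT screen_name FROM users "
    "WHERE follows_me = 1 AND i_follow = 0 AND followers >= ? "
    "ORDER BY followers DESC",
    (100,),
).fetchall()
to_follow = [name for (name,) in rows]
conn.close()
```

Once the network is stored locally, criteria can be combined freely without any further call to Twitter.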

Scripts are launched from the command line, and an interactive console with history, built with the Python Prompt Toolkit by Jonathan Slenders, is available to manage your Twitter account using the scripting language.

JournaKit Followship .ows

All these free software / open source tools, among others, are the building blocks used to create this bare-bones tool for social media managers on Twitter.

It is a simple scripting language to manage and expand your network of followers and friends with complex queries, running on your own PC, registered as a custom application on Twitter and usable on many accounts at the same time.

Its name is JournaKit Followship .ows and it’s available on Gumroad. The complete application, the source code and a comprehensive user manual are provided, allowing you to master the .ows language.

This is the first application of the JournaKit suite, which aims to help journalists and writers who use the web to share their work and to discover new sources and contacts.

Comment on this article or contact me privately if you want to know more about it.

Linux: MySQLdb on virtualenv with --no-site-packages

In the past it was difficult to get MySQL working in a virtualenv without using system packages. Now you can have a truly separate environment in a few simple steps:

  1. Follow this guide to install virtualenv, then create the environment with this command:
    virtualenv myproject --no-site-packages

    This command will create a new virtualenv inside a new directory, myproject, created by the command itself.

  2. Activate virtualenv:
    source myproject/bin/activate
  3. Upgrade pip
    pip install pip --upgrade
  4. You can now install MySQLdb, inside the package MySQL-python:
    pip install MySQL-python
  5. Now do a simple test trying to connect to an existing database:
    python
    import MySQLdb
    conn = MySQLdb.connect(host="localhost",     # your host, usually localhost
                           user="chirale",       # your username
                           passwd="ITSASECRET",  # your password
                           db="chiraledb")       # name of the database
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")
    row = cursor.fetchone()
    print("server version: %s" % row[0])
    cursor.close()
    conn.close()
    

Tested on CentOS 7, Python 2.7
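If you want to be sure the interpreter you are testing with really belongs to the virtualenv and not to the system, a small check like the following can help. `in_virtualenv` is a hypothetical helper written for this post, and it is a best-effort check, not an official API.

```python
import sys


def in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtual environment."""
    # Classic virtualenv releases set sys.real_prefix inside the environment.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and recent virtualenv releases make sys.prefix differ from sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


inside = in_virtualenv()
```

Run it after activating myproject: if `inside` is False you are still using the system Python.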

Tip: if you are starting to create a database and doing all the dirty work alone, you should give SQLAlchemy a try. You can use it as an ORM or at a lower level, as you wish.

See also

The Hitchhiker’s Guide to Python
Simple MySQLdb connection tutorial

About the same topic

Python: MySQLdb on Windows virtualenv (w. figures)

Python: MySQLdb on Windows virtualenv (w. figures)

If you have a virtualenv on Windows and you want to add MySQLdb support via mysql-python, read this before spending hours of your life figuring out why it doesn’t work and never will.

1) Install MySQL for Python, selecting the same Python version as the virtualenv

python-mysql-win

2) From the site-packages directory above, copy the selected files into the site-packages directory of your virtualenv:

python-mysql-mysqldb

3) (optional) On PyCharm, look for virtualenv site-packages inside the path marked with the arrow:

pycharm-virtualenv

4) Open your virtualenv console and run:

import MySQLdb

MySQL for Python is now installed on your virtualenv.
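The same check can be scripted, for example to verify several virtualenvs in a row. A minimal sketch, where `module_available` is a hypothetical helper and not part of MySQLdb:

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


json_found = module_available("json")  # a standard library module, always importable
```

From the virtualenv console, `module_available("MySQLdb")` should return True once the files are copied in the right place.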

About the same topic
How to Install MySQLdb in PyCharm, Windows
Linux: MySQLdb on virtualenv with --no-site-packages

 

Memory Error on pip install (SOLVED)

A Memory Error when using pip install can emerge both from the command line and from an IDE like PyCharm, usually when the package is big.

When you try to install a Python package with pip install packagename and it fails due to a Memory Error, you can fix it this way:

  1. Go to your console
  2. Optional: if your application is in a virtual environment, activate it
  3. pip install packagename --no-cache-dir

The package will now be downloaded with the cache disabled (see pip --help).
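If the installation is launched from a script instead of the console, the same flag can be passed through subprocess. This is a sketch: `pip_install_command` and `pip_install` are hypothetical helpers, while `--no-cache-dir` is the real pip option used above.

```python
import subprocess
import sys


def pip_install_command(package, no_cache=True):
    """Build the argument list for installing `package` with the current interpreter's pip."""
    command = [sys.executable, "-m", "pip", "install"]
    if no_cache:
        command.append("--no-cache-dir")
    command.append(package)
    return command


def pip_install(package):
    """Run the install and raise CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package))


command = pip_install_command("packagename")
```

Using `sys.executable -m pip` installs into the interpreter that runs the script, which is the active virtual environment if there is one.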

Thanks to David Wolever

Reduce Time to the First Byte – TTFB on web applications

How do you speed up the time to first byte, and what are the causes of a long TTFB? The main causes are network-related and server-side; I will focus on the server-side ones. I’m not covering any CMS here, but you can try to apply some of these techniques, starting from how to interpret the browser Timing.

Get reliable timing

Take a website with cache enabled: by the 9th visit to a page you can be sure that the page is in cache, the connection with the webserver is alive, the SSL/TLS connection is established, the SQL queries are cached and so on. Open the network tab and enjoy your site speed: well, very few real users will experience that speed.

Here is a comparison between a first-time, no-cache connection to an nginx webserver, explored with Chrome (F12 > Network > Timing), and a second request for the same page, refreshed right after the first:

performance-01

I got a +420% time on the first-time request compared with the connected-and-cached case. To obtain a reliable result (1st figure) you should usually:

  • Wait several seconds after the previous call before doing anything, so that the webserver closes the connection with the client
  • Add a ?string to the URL of the page you’re visiting. Change the string every time you want a fresh page.
  • Press Ctrl+Shift+R to reload the page

This technique bypasses the Django view cache and similar cache systems in other frameworks. To check the impact of the framework cache, do a Ctrl+Shift+R just after the first request: you’ll get a result similar to the 2nd figure. There are better ways to do the same; this is the easiest.
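When the check is repeated often, the ?string can be generated instead of typed. A small sketch with the standard library, where `cache_busted` and the `nocache` parameter name are assumptions chosen for this example:

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, param="nocache"):
    """Return `url` with a random query parameter so every call yields a distinct URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)  # keep existing parameters
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


first = cache_busted("https://example.com/page?lang=en")
second = cache_busted("https://example.com/page?lang=en")
```

The two URLs point to the same page but differ in the query string, so a cache keyed on the full URL treats each one as a new page.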

Break up the time report

Unpack the time report of the first-time request:

  • Connection setup (15% of the elapsed time in the example)
    • Queueing: slight, nothing to do.
    • Stalled: slight, nothing to do.
    • DNS lookup: slight, nothing to do.
    • Initial connection: significant, skip for now.
    • SSL: significant, the client establishes an SSL/TLS connection with the webserver. Disabling ciphers or tuning SSL can reduce the time, but the priority here is the best security for the visitor, not pure speed. However, take a look at this case study if you want to tune SSL/TLS for speed.
  • Request / response (85% of the elapsed time in the example)
    • Request sent: slight, browser-related, nothing to do.
    • Waiting (TTFB): significant, time to first byte is the time the user waits after the request is sent to the web server. The waiting time includes:
      • Framework elaboration.
      • Database queries.
    • Content Download: significant; related to page size, network, server and client. To speed up the content download of an HTML page you should add compression: here is a how-to for nginx and one for Apache webservers. These cover proxy servers; applying compression directly on a virtualhost is even simpler, and the performance gain is huge.

Not surprisingly, most of the time of a first-time request is spent in Request / response rather than in connection setup. Among the Request / response times, Waiting (TTFB) is the prominent one. Luckily, it is the same segment covered by the cache mechanics of the framework, and consequently it is the one eroded the most when passing from the first figure (not cached) to the second (cached by the framework). To erode the TTFB, database queries and elaboration must be optimized.
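The Waiting segment can also be measured outside the browser. The following is a minimal sketch with the standard library: `measure_ttfb` is a hypothetical helper, and it approximates the TTFB as the time between sending the request and receiving the status line and headers, which is slightly more than the very first byte. The numbers will not match the Chrome Timing tab exactly.

```python
import http.client
import time


def measure_ttfb(host, port=None, path="/", use_tls=False, timeout=10.0):
    """Return (connect_seconds, ttfb_seconds) for a single GET request."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.connect()                  # connection setup: TCP and, if enabled, TLS
        connected = time.perf_counter()
        conn.request("GET", path)
        response = conn.getresponse()   # returns once status line and headers arrived
        first_byte = time.perf_counter()
        response.read()                 # content download, not included in the result
        return connected - start, first_byte - connected
    finally:
        conn.close()
```

For example, `measure_ttfb("www.example.com", use_tls=True, path="/?nocache=1")` returns the two durations in seconds. A new connection is opened on every call, so each measurement is a first-time request from the point of view of the connection.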

Optimize elaboration: program optimization

When Google, the web giant behind the most used web search engine in history, tried to suggest some tips to programmers for optimizing PHP, the reaction was bad, from everyday programmers up to the PHP team itself.

In a long response, the PHP team taught Google how to program the web, offering unsolicited advice and “some thoughts aimed at debunking these claims”, with statements like “Depending on the way PHP is set up on your host, echo can be slower than print in some cases”, a totally confusing comment for a real-world programmer.

Google took the PHP performance page offline: it could be misleading, but it still contained valid optimization tips, especially compared with some of the comments on php.net itself. Google has an interest in speed and code optimization, and the writer had the know-how to talk about it; the PHP team here just wanted to be right and defend their language and, starting from good points, crossed the line of scientific debate.

Program optimization mottos are:

Look for the language that best suits your work and the best tools you can find, and look for real-world programmers sharing their approaches to program optimization.

The PHP team’s whining will not change the fact that avoiding SQL inside a loop, as the Google employee suggested, is the right thing to do to enhance performance. This leads to database optimization.
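To make the point about SQL inside a loop concrete, here is a sketch with the standard library sqlite3 module. The tables and rows are invented for the example; the trace callback is only there to count the statements sent to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, party_id INTEGER, title TEXT);
    INSERT INTO party VALUES (1, 'birthday'), (2, 'graduation'), (3, 'wedding');
    INSERT INTO photo VALUES (1, 1, 'cake'), (2, 1, 'candles'), (3, 3, 'rings');
""")

statements = []
conn.set_trace_callback(statements.append)  # record every statement sent to SQLite


def photos_in_a_loop():
    """One query for the parties plus one query per party."""
    counts = {}
    for party_id, name in conn.execute("SELECT id, name FROM party").fetchall():
        row = conn.execute(
            "SELECT COUNT(*) FROM photo WHERE party_id = ?", (party_id,)
        ).fetchone()
        counts[name] = row[0]
    return counts


def photos_in_one_query():
    """The same result from a single grouped query."""
    rows = conn.execute(
        "SELECT party.name, COUNT(photo.id) FROM party "
        "LEFT JOIN photo ON photo.party_id = party.id GROUP BY party.id"
    ).fetchall()
    return dict(rows)


statements.clear()
loop_result = photos_in_a_loop()
loop_queries = len(statements)

statements.clear()
single_result = photos_in_one_query()
single_queries = len(statements)
```

Both functions return the same dictionary, but the loop sends one statement for the list plus one for every party, so its cost grows with the number of rows while the grouped query stays at one.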

Dude, where is my data?

The standard web application nowadays has this structure:

A typical web application

A typical web application: the application server runs the application, so from now on, oversimplifying, I will treat application and application server as synonyms.

After the client requests pass through the firewall, the webserver serves static files and asks the application server for the dynamic content.

The cache server can serve the application or the web server, but in this example the former has control: an example of a cache controlled by the application is in the Django docs about Memcached; examples of a cache controlled by the web server are the HTTP Redis module or the standard use of Varnish cache.

The database server (DBMS) stores the structured data for the application. In standard use cases the DBMS can be optimized with little effort. It is more difficult to optimize the way the web application gets the data from the database.

Database query optimization: prefetch and avoid duplicates

To optimize database queries you have to check the timing, again. Depending on the language and framework you are using there are tools to get information about queries to optimize:

Since I’m using Python I go with Django Debug Toolbar, a de facto standard for application profiling. Here is a sample of SQL query timing on a PostgreSQL database:

Timing of SQL queries on Django Debug Toolbar.

The total time spent on queries is 137.07 milliseconds, and the total number of queries executed is 90. Among these, 85 are duplicates. Below each query you’ll find how many times the same query is executed. The objective is to reduce the number of queries executed.
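The same kind of count can be reproduced from any list of executed SQL statements. In the sketch below `duplicate_report` is a hypothetical helper and the sample log is invented; in Django, with DEBUG enabled, `django.db.connection.queries` provides the executed statements as dictionaries with an `sql` key, which could feed a function like this one.

```python
from collections import Counter


def duplicate_report(queries):
    """Return (total, duplicated, counts) for a list of SQL strings."""
    # Collapse whitespace so that formatting differences do not hide duplicates.
    counts = Counter(" ".join(query.split()) for query in queries)
    duplicated = sum(n for n in counts.values() if n > 1)
    return len(queries), duplicated, counts


sample_log = [
    "SELECT * FROM party WHERE organizer_id = 1",
    "SELECT * FROM photo WHERE party_id = 1",
    "SELECT * FROM photo  WHERE party_id = 1",
    "SELECT * FROM photo WHERE party_id = 1",
]
total, duplicated, counts = duplicate_report(sample_log)
```

Here `duplicated` counts every statement that appears more than once, which is my reading of how the toolbar labels duplicates.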

If you’re using Django, create a manager in your models.py to use like this:

from django.db import models


class GenericManager(models.Manager):
    """
    prefetch_related: join via ORM
    select_related: join via database
    """
    # Relations of Party to prefetch: the organizer and the reverse Photo relation.
    related_models = ['organizer', 'photo_set']

    def per_organizer(self, orgz, **kwargs):
        # Single place where the Party queryset is built, so it can be tuned later.
        return self.filter(organizer=orgz)

class People(models.Model):
    name = models.CharField(max_length=50)
    ...

class Party(models.Model):
    organizer = models.ForeignKey('People')
    objects   =  GenericManager()

class Photo(models.Model):
    party = models.ForeignKey('Party')
    ...

Then in views.py call your custom method on GenericManager:

from django.shortcuts import render

from .models import Party, People


def all_parties(request, organizer_name):
    party_organizer = People.objects.get(name=organizer_name)
    parties = Party.objects.per_organizer(party_organizer)
    return render(request, 'myfunnywebsite/parties.html', {
        'parties': parties
    })

When you want to optimize data retrieval for Party, instead of combing through the objects.filter() calls in views.py you only have to fix the per_organizer method, like this:

class GenericManager(models.Manager):
    """
    prefetch_related: join via ORM
    select_related: join via database
    """
    # Relations of Party to prefetch: the organizer and the reverse Photo relation.
    related_models = ['organizer', 'photo_set']

    def per_organizer(self, orgz, **kwargs):
        ret = self.filter(organizer=orgz)
        return ret.prefetch_related(*self.related_models)

With prefetch_related, queries are grouped by the ORM and all the related objects are available, avoiding many duplicate queries. Here is the result of this first optimization:

django_sql_query_debug_toolbar_2

  • The number of queries dropped from 90 to 45
  • Query execution time dropped from 137.07 to 80.80 milliseconds (-41%)

An alternative method is select_related, but in this case the ORM will produce a join, and the code above will raise an error because photo_set is not accessible this way. If your models are structured so that you get better performance with select_related, go with it, but remember this limitation. In this use case the results of select_related are worse than those of prefetch_related.
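To picture what “join via ORM” means, here is a rough emulation in plain sqlite3: one query for the parties, one query for all their photos, and the grouping done in Python. This is only an illustration of the idea with invented tables, not the code Django actually runs.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photo (id INTEGER PRIMARY KEY, party_id INTEGER, title TEXT);
    INSERT INTO party VALUES (1, 'birthday'), (2, 'wedding');
    INSERT INTO photo VALUES (1, 1, 'cake'), (2, 1, 'candles'), (3, 2, 'rings');
""")

# Query 1: the parties themselves.
parties = conn.execute("SELECT id, name FROM party ORDER BY id").fetchall()

# Query 2: every photo of those parties, fetched at once with IN (...).
ids = [party_id for party_id, _ in parties]
placeholders = ", ".join("?" for _ in ids)
photos = conn.execute(
    "SELECT party_id, title FROM photo WHERE party_id IN (%s) ORDER BY id" % placeholders,
    ids,
).fetchall()

# The "join" happens in Python: group the photos by their party.
photos_by_party = defaultdict(list)
for party_id, title in photos:
    photos_by_party[party_id].append(title)

result = {name: photos_by_party[party_id] for party_id, name in parties}
conn.close()
```

Two queries are enough whatever the number of parties, and a one-to-many relation like the photos fits naturally, which is the case where a database join would repeat the party row for every photo.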

Recap:

  • A long TTFB can be a symptom of server-side inefficiency, but you have to profile your application server-side to find out
  • Check SQL timing
  • Reduce the number of queries
  • Optimize application code
  • Use cache systems: memory-based ones (Redis, Memcached) are the fastest

In my experience, inefficient code plus a lot of cache is a frail solution compared with the right balance between caching and query + program optimization.

If you’ve tried everything and the application is still slow, consider rewriting it, or even changing the framework you’re using, if speed is critical. When every optimization failed, I went from Drupal 6 to a fresh Django 1.8 installation, and Google noticed the difference in the milliseconds spent downloading the pages during indexing:

downloadtime

Since you can’t win a fight with windmills, a fresh start may be the only effective option on the table.