Create nice unicode PDF using Python

Today I started one of the least motivating activities in Python 2.x: dealing with encodings.

In Python 3 unicode is everywhere, but one of my servers still runs Python 2.6, so I have to endure.

Objective: get data from a UTF-8 encoded json and print a nice PDF.

Tools: json, urllib2, fpdf, cgi

What you need:
pyfpdf: https://code.google.com/p/pyfpdf/downloads/list

  • Download fpdf-1.7.hg.zip or more recent
  • Unzip, enter the directory and run python setup.py install
  • Run locate fpdf to find where the package was installed
  • cd /usr/lib/python2.6/site-packages/fpdf (or the directory you got from locate)
  • Download unicode fonts for fpdf
  • Unzip and copy the fonts folder in the fpdf directory

Now you have a working FPDF with unicode support and unicode fonts. Start writing your script. I assume you’re using Python 2.6; if not, change python2.6 in the first line of the script to your Python version (e.g. python2.7) or remove the version number (just python). As of now, FPDF works with Python 2.5 to 2.7.

Here I write a simple cgi-bin script, so you have to put it in the /var/www/cgi-bin directory (CentOS) or in /usr/lib/cgi-bin (Debian).

#!/usr/bin/env python2.6
# -*- coding: utf-8 -*-
from fpdf import FPDF
import json
import urllib2
import cgi
import sys
# make UTF-8 the default encoding for implicit str/unicode conversions
# (reload is needed because setdefaultencoding is removed from sys at startup)
reload(sys)
sys.setdefaultencoding("utf-8")
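Changing the default encoding affects the whole process. A narrower alternative is to decode explicitly at the point where the bytes arrive. This is only a minimal sketch, assuming the service returns UTF-8 bytes; the function name parse_utf8_json and the sample payload are made up for illustration.

```python
# -*- coding: utf-8 -*-
import json


def parse_utf8_json(raw_bytes):
    """Decode UTF-8 bytes explicitly, then parse the resulting JSON text."""
    text = raw_bytes.decode("utf-8")
    return json.loads(text)


# hypothetical payload, shaped like the JSON described in this post
sample = u'{"title": "Caff\u00e8", "user": {"name": "Zo\u00eb"}}'.encode("utf-8")
data = parse_utf8_json(sample)
title = data["title"]
```

The values in data are unicode strings, so they can be passed on without relying on the interpreter's default encoding.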

Now read some arguments from the URL. They are used to build a query to an external JSON service.

# e.g. http://example.com/cgi-bin/myscript.py?lang=en&sid=2
# parse the query string of the CGI request
arguments = cgi.FieldStorage()
sid = arguments.getlist('sid')[0]
lang = arguments.getlist('lang')[0]
# build a request to get a particular element from the external json service
dataurl = "http://example.com/external-json-source?lang=%s&sid=%s" % (lang, sid)
# load json from dataurl and convert it into python objects
data = json.load(urllib2.urlopen(dataurl))
# the json has a user attribute: the user attribute has name and surname attributes as strings
user = data['user']
# title is a simple string
title = data['title']
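The %s formatting above puts lang and sid into the URL exactly as they arrive. If the values may contain spaces or an ampersand, the standard library can escape them. A small sketch, where build_data_url and BASE_URL are names I made up for the example:

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# same example endpoint used in this post
BASE_URL = "http://example.com/external-json-source"


def build_data_url(lang, sid):
    """Return the service URL with lang and sid safely escaped."""
    query = urlencode([("lang", lang), ("sid", sid)])
    return "%s?%s" % (BASE_URL, query)


dataurl = build_data_url("en", "2")
```

For plain values like en and 2 the result is the same URL as before; the difference only shows up with special characters.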

The JSON is now loaded (it must be encoded in UTF-8), so the PDF can be built:

# A4 sides in millimetres: lato_lungo is the long side, lato_corto the short one
lato_lungo = 297
lato_corto = 210
pdf = FPDF('L','mm','A4')
# add unicode font
pdf.add_font('DejaVu','','DejaVuSansCondensed.ttf',uni=True)
pdf.add_page()
# a font must be selected before any text is written
pdf.set_font('DejaVu','',12)
pdf.cell(w=lato_lungo,h=9,txt=title,border=0,ln=1,align='L',fill=0)
# paragraphs rendered as MultiCell
# @see https://code.google.com/p/pyfpdf/wiki/MultiCell
# print "key: value" for each attribute of the user dictionary
for val in user.iteritems():
    pdf.multi_cell(w=0,h=5,txt="%s: %s" % val)
# finally send the CGI header followed by the pdf
sys.stdout.write("Content-Type: application/pdf\r\n\r\n")
sys.stdout.write(pdf.output(dest='S'))
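A CGI response needs HTTP headers and a blank line before the body, otherwise the web server cannot tell the browser it is receiving a PDF. Here is a rough sketch of how the response could be framed. The helper build_pdf_response and the file name report.pdf are my own inventions, and the fake body stands in for the string returned by pdf.output(dest='S').

```python
def build_pdf_response(pdf_bytes, filename="report.pdf"):
    """Return CGI headers plus the PDF body as one byte string."""
    headers = [
        "Content-Type: application/pdf",
        'Content-Disposition: inline; filename="%s"' % filename,
        "Content-Length: %d" % len(pdf_bytes),
    ]
    # headers are separated from the body by an empty line
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")
    return head + pdf_bytes


# placeholder body used only for the example
response = build_pdf_response(b"%PDF-1.3 fake body")
```

In Python 2 the result can be written with sys.stdout.write(response); in Python 3 it would go to sys.stdout.buffer instead, since it is bytes.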

Now:

  1. Open your browser and visit http://example.com/cgi-bin/myscript.py?lang=en&sid=2
  2. The external source http://example.com/external-json-source?lang=en&sid=2 is fetched and converted into a Python data structure. Both the source and the destination are encoded in UTF-8.
  3. The data from the external source is used to create the PDF.

You can use as many fonts as you have in the fpdf/font directory: just register each one with pdf.add_font().


Scrapy on Debian 6

Debian 6 ships Scrapy 0.8 as an apt package. Here is a quick howto to get a spider working on Debian 6.

  1. sudo apt-get install python-scrapy
  2. cd
  3. mkdir mydir
  4. cd mydir
  5. scrapy-ctl startproject anime
  6. export SCRAPY_SETTINGS_MODULE=anime.settings
  7. export PYTHONPATH=/home/YOURHOMEHERE/mydir
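Points 6 and 7 are the ones that are easy to get wrong, so it can help to check the environment before running the spider. This is just a sketch: check_scrapy_env is a hypothetical helper of mine, not part of Scrapy, and it only looks at the two variables set above.

```python
import os


def check_scrapy_env(env, settings_module="anime.settings"):
    """Return a list of problems found in env; an empty list means it looks fine."""
    problems = []
    if env.get("SCRAPY_SETTINGS_MODULE") != settings_module:
        problems.append("SCRAPY_SETTINGS_MODULE should be %s" % settings_module)
    # PYTHONPATH may hold several directories separated by os.pathsep
    paths = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if not paths:
        problems.append("PYTHONPATH is empty")
    for path in paths:
        if not os.path.isdir(path):
            problems.append("PYTHONPATH entry is not a directory: %s" % path)
    return problems


# check the current shell environment
problems = check_scrapy_env(os.environ)
```

An empty list means both variables are set as in the steps above; otherwise each entry describes what to fix.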

If you already have a bot, points 6 and 7 let you run your spider by simply typing:

scrapy-ctl crawl example.com

Otherwise, you can now follow the tutorial section of the Scrapy 0.8 documentation or this awesome howto by Pravin Paratey to write your own bot. Remember to use the scrapy-ctl command instead of the .py version, and to make sure SCRAPY_SETTINGS_MODULE and PYTHONPATH cover all your spiders.

To list your available (and correctly configured) spiders, just type:

scrapy-ctl list

If a bot doesn’t appear here, you have an issue with point 6 or 7, or a misconfigured spider. For example, I had forgotten the SPIDER part at the bottom of my spider and was using domain instead of domain_name in my script. See Pravin’s howto to write correct Scrapy 0.8 code.

Django development on Virtualbox: step by step setup

I had a bad morning trying to repair my Cygwin installation after a virtualenv mess. It’s time to install Debian on VirtualBox for my new Django project!

  • Windows: host
  • Debian: guest

Choosing the distro: what I want

  • Python 2.6
  • Django 1.4
  • Apache + Mysql

I’ve been a Debian fan for years, so I go to the Debian website and download the Wheezy netinst ISO (32 bit, since I’m on a 32 bit OS and I want to use more than one core): Wheezy meets all the requirements above.

I already have VirtualBox, so I add a new virtual disk and add the Wheezy netinst ISO to the CD/DVD images. Then I create a new Debian machine (32 bit) with two cores. I set the ISO image to be mounted at startup, so the Debian setup process starts on boot.

As the network device I choose the Bridge option, so I can access the machine later from my Windows host.

Installing the system

When you turn the machine on, the installer prompts you with many choices. I select the web server (Apache) from the list, deselect SQL server and print server, and leave the desktop and the other defaults selected. After a few minutes Debian is installed and I can log in with the credentials I specified during installation.

When asked, use WORKGROUP as the network name if you’re running a Windows host.

Install django packages

Under the Applications menu, find the Debian package management tools to install what you want. Following the requirements listed above, I search for and install these packages:

  • python-django (1.4.1-2)
  • libapache2-mod-uwsgi
  • libapache2-mod-wsgi
  • mysql-server
  • samba

Later you can install more useful packages like virtualenv and phpmyadmin.

After you’ve installed those packages, you can run some tests. Open a shell (Accessories > Terminal) and type these commands:

Which version of Python am I running?

$ python

Python 2.7.3rc2 (default, Apr 22 2012, 22:35:38)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

So I have Python 2.7. Good!

>>> import django
>>> django.VERSION
(1, 4, 1, 'final', 0)

And I have Django 1.4.1.
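The same check can be scripted, because version tuples compare element by element. A small sketch: meets_minimum is a name I made up, and the tuple is copied from the output above, so it runs even where Django is not installed.

```python
def meets_minimum(version, minimum):
    """Compare the leading numbers of a VERSION-style tuple with a minimum."""
    # keep only as many leading elements as the minimum has, e.g. (1, 4)
    return tuple(version[:len(minimum)]) >= tuple(minimum)


# value printed by django.VERSION in the session above
django_version = (1, 4, 1, "final", 0)
ok = meets_minimum(django_version, (1, 4))
```

With a real installation, django.VERSION can be passed in place of the copied tuple.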

Share your code to the Windows network (workgroup)

Now I want to access the code on one machine from the other. I choose a Samba server to read and write files between the virtual machine and Windows. It is useful since I have a complete Eclipse + PyDev IDE on Windows and I love working with it.

I open a Root terminal and type:

# ifconfig

If you chose the Bridge network interface during installation, you will get something like this:

eth0 Link encap:Ethernet HWaddr ??????
inet addr:192.168.0.104 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: ???????????/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:49280 errors:0 dropped:0 overruns:0 frame:0
TX packets:19777 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:45400047 (43.2 MiB) TX bytes:1887849 (1.8 MiB)
Interrupt:19 Base address:0xd020

The inet addr value (192.168.0.104) is the local network address of my virtual machine. If I type this address into Chrome running on Windows (the host), I get the “It works!” page from Apache on the virtual machine. If you can’t see anything, left-click the network icon at the bottom of the VirtualBox window, click the menu item and choose the Bridge option. Then run ifconfig again as above.
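If you want to pick the address out of the ifconfig output automatically, a regular expression is enough. This is a sketch that assumes the older "inet addr:" output format shown above; the function name and the loopback lines in the sample are mine.

```python
import re

# matches the dotted IPv4 address that follows "inet addr:"
INET_RE = re.compile(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})")


def find_ipv4_addresses(ifconfig_output):
    """Return all IPv4 addresses in the output, skipping the loopback one."""
    return [a for a in INET_RE.findall(ifconfig_output) if a != "127.0.0.1"]


sample = """eth0 Link encap:Ethernet HWaddr ??????
inet addr:192.168.0.104 Bcast:192.168.0.255 Mask:255.255.255.0
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
"""
addresses = find_ipv4_addresses(sample)
```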

Samba tuning

Create a directory to store your Django code (inside the current user’s home folder). Open the terminal as a normal user:

$ cd
$ mkdir my-django-code

Then share this folder with samba. To do this, let’s create a new user without a password:

adduser guest --home=/home/public --shell=/bin/false --disabled-password

Then add these lines to /etc/samba/smb.conf in the “## Authentication ##” section:

security = share
guest account = guest
invalid users = root

obey pam restrictions = yes

And then after the [cdrom] commented text:

[my-django-code]
comment=Django-code
read only = no
locking = no
path = /home/myuser/my-django-code
guest ok = yes
force user = myuser

Here myuser is my (normal) user name. The lines above tell Samba something like this:

  • Let a guest user access without a password
  • …to the path /home/myuser/my-django-code
  • …“masquerading” as myuser

The “masquerade” part gives the guest user on the host the right to write files created by myuser.

When I browse my Workgroup on Windows, I find the machine name I chose during installation, and inside it the my-django-code directory. I try reading and writing files from the host (Windows) and from the guest (Debian) and it all works.
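Since smb.conf uses an ini-like syntax, the share definition can be read back with Python's standard config parser to double-check what was typed. This is only a sketch: read_share is a hypothetical helper, and the text is the share section from this post rather than the real file.

```python
try:
    from configparser import RawConfigParser  # Python 3
except ImportError:
    from ConfigParser import RawConfigParser  # Python 2
import io

# the share section shown above, pasted as a string for the example
SHARE = u"""[my-django-code]
comment=Django-code
read only = no
locking = no
path = /home/myuser/my-django-code
guest ok = yes
force user = myuser
"""


def read_share(text, name):
    """Parse ini-style text and return the options of one share as a dict."""
    parser = RawConfigParser()
    if hasattr(parser, "read_string"):
        parser.read_string(text)
    else:
        parser.readfp(io.StringIO(text))  # older Python 2 API
    return dict(parser.items(name))


share = read_share(SHARE, "my-django-code")
```

The dictionary keys keep their spaces, so share["guest ok"] and share["force user"] give the values that make the guest access work.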

Django, finally!

If you are new to Django development, this howto for beginners will help you a lot. Since I installed the python-django package from Debian, starting a project is as simple as typing:

$ cd
$ cd my-django-code
$ django-admin startproject django_unchained
$ cd django_unchained
$ python manage.py runserver 192.168.0.104:8000

Here 192.168.0.104 is the virtual machine’s local network address from above and 8000 is the port of the Django testing web server.
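Before opening the browser on the host, you can check whether something is actually listening on that address and port. A minimal sketch with a helper name of my own (is_listening); the address and port are the ones used in this post.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# address and port of the Django testing web server from above
HOST, PORT = "192.168.0.104", 8000
```

Calling is_listening(HOST, PORT) from the Windows host returns False when the bridge network or the runserver address is wrong, which narrows down where to look.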

I type:

http://192.168.0.104:8000

in Chrome (host: Windows) and I get the hello page from Django. Perfect!

Then, I can just follow the django howto to do the right things during the creation of my new app django_unchained!

You can also explore the must-have list of tools and sites for Python developers.

Installing Plone on Debian

A little howto to quickly install and try Plone (a GPL’d CMS based on Zope) on your linux box. Well, the installer seems to do the job nicely. 🙂

Tested on Plone 3.* version, Debian “Lenny”.

    • apt-get install g++
    • Download latest version of Plone (Unified Installer)
    • Execute:
      tar zxvf Plone-YOURVERSION-UnifiedInstaller.tgz
      cd Plone-YOURVERSION-UnifiedInstaller
      ./install.sh standalone
      gedit /usr/local/Plone/zinstance/README.txt &
      gedit /usr/local/Plone/zinstance/buildout.cfg &
      /usr/local/Plone/zinstance/bin/plonectl start
      less /usr/local/Plone/zinstance/adminPassword.txt

Read the README to follow the installation instructions, then modify the Plone configuration in buildout.cfg if needed, and finally start Plone. In adminPassword.txt you’ll find the Plone passwords to use for administrative purposes.

  • To run Plone at every server restart, add /usr/local/Plone/zinstance/bin/plonectl start to /etc/rc.local before exit 0 (Red Hat), or create a script in /etc/init.d/ (Debian).
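The "before exit 0" detail matters, because anything after that line in rc.local never runs. Here is a sketch of the edit as a text transformation. The function add_before_exit is my own, it works on the file content as a string, and writing the result back to /etc/rc.local is left out.

```python
PLONE_START = "/usr/local/Plone/zinstance/bin/plonectl start"


def add_before_exit(rc_local_text, command):
    """Insert command before the last 'exit 0' line, unless already present."""
    lines = rc_local_text.splitlines()
    if command in lines:
        return rc_local_text
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "exit 0":
            lines.insert(i, command)
            break
    else:
        # no 'exit 0' line found: append the command at the end
        lines.append(command)
    return "\n".join(lines) + "\n"


# made-up rc.local content for the example
original = "#!/bin/sh -e\n# local startup commands\nexit 0\n"
updated = add_before_exit(original, PLONE_START)
```

Running it twice leaves the text unchanged, so the start command is never added more than once.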

Now you can test this Python-based CMS (I tested it 4 years ago; it probably can hardly replace Drupal, but you can give it a try 😉).