Tag Archives: pdf

How to shrink a scanned PDF on Linux

When you want to reduce the file size of a PDF document, this quick command using ImageMagick's convert will shrink the original PDF file.

convert -density 150x150 -quality 60 -compress jpeg -colorspace Gray original.pdf new.pdf

This command is particularly useful for scanned documents: the pages are rendered at 150dpi and compressed as JPEG with a quality of 60%.

Converting an original 300dpi / color PDF to a 150dpi, greyscale PDF can reduce file size by up to 50%. There will be some quality loss, but this way you can shrink scanned documents of dozens of pages enough to send them via e-mail without using third-party services.
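
To apply the same settings to a whole folder of scans, the command can be wrapped in a small script. What follows is only a sketch in Python 3, assuming ImageMagick's convert is on the PATH; the function names shrink_command and shrink_all and the "-small" suffix are made up for this example.

```python
import subprocess
from pathlib import Path


def shrink_command(src, dst, dpi=150, quality=60, gray=True):
    """Build the convert command line from the post as an argument list."""
    cmd = ["convert", "-density", "%dx%d" % (dpi, dpi),
           "-quality", str(quality), "-compress", "jpeg"]
    if gray:
        cmd += ["-colorspace", "Gray"]
    return cmd + [str(src), str(dst)]


def shrink_all(folder):
    """Write NAME-small.pdf next to every PDF in folder (hypothetical naming)."""
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith("-small"):
            continue  # skip files produced by an earlier run
        dst = src.with_name(src.stem + "-small.pdf")
        subprocess.run(shrink_command(src, dst), check=True)


if __name__ == "__main__":
    # Show the command that would be run for a single file.
    print(" ".join(shrink_command("original.pdf", "new.pdf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.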


Create nice unicode PDF using Python

Today I started one of the less motivating activities in Python 2.x: encoding.

In Python 3 unicode will be everywhere, but with the 2.6 version I have on one of my servers, I have to endure.

Objective: get data from a UTF-8 encoded json and print a nice PDF.

Tools: json, urllib2, fpdf, cgi

What you need:
pyfpdf: https://code.google.com/p/pyfpdf/downloads/list

  • Download fpdf-1.7.hg.zip or more recent
  • Unzip, enter the directory and run python setup.py install
  • Run locate fpdf
  • cd /usr/lib/python2.6/site-packages/fpdf (or the directory name you got with locate)
  • Download unicode fonts for fpdf
  • Unzip and copy the fonts folder into the fpdf directory

Now you have a working FPDF with unicode support and unicode fonts, and you can start writing your script. I assume you're using Python 2.6; if not, change python2.6 in the first line to your Python version (e.g. python2.7) or remove the version number (just python). As of now, FPDF works with Python 2.5 to 2.7.

Here I write a simple cgi-bin script, so you have to put it in the /var/www/cgi-bin directory (CentOS) or in /usr/lib/cgi-bin (Debian).

#!/usr/bin/env python2.6
#-*- coding: utf-8 -*-
from fpdf import FPDF
import json
import urllib2
import os
import cgi
import sys
# set system encoding to unicode (Python 2 only idiom)
reload(sys)
sys.setdefaultencoding('utf-8')

Now get some arguments from the URL. These will be used to compile a query to an external json service.

# e.g. http://example.com/cgi-bin/myscript.py?lang=en&sid=2
# parse the query string of the request
arguments = cgi.FieldStorage()
sid = arguments.getlist('sid')[0]
lang = arguments.getlist('lang')[0]
# compile a request to get a particular element from an external json
dataurl = "http://example.com/external-json-source?lang=%s&sid=%s" % (lang, sid)
# load json from dataurl and convert into python elements
data = json.load(urllib2.urlopen(dataurl))
# the json has a user attribute: the user attribute has name and surname attributes as strings
user = data['user']
# title is a simple string
title = data['title']

With the json loaded from the external source (it must be encoded in UTF-8), you can create the PDF:

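# A4 page size in mm: long side and short side
lato_lungo = 297
lato_corto = 210
pdf = FPDF('L','mm','A4')
pdf.add_page()
# add unicode font (DejaVu is only an example, not from the original post:
# use any TTF font you copied into the fpdf font directory)
pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
pdf.set_font('DejaVu', '', 12)
# paragraphs rendered as MultiCell
# @see https://code.google.com/p/pyfpdf/wiki/MultiCell
# print key: value for each attribute of the user dictionary
for val in user.iteritems():
    pdf.multi_cell(w=0,h=5,txt="%s: %s" % val)
# finally send the HTTP header and print the pdf
print "Content-Type: application/pdf\n"
print pdf.output(dest='S')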


  1. Open your browser and visit http://example.com/cgi-bin/myscript.py?lang=en&sid=2
  2. The external source http://example.com/external-json-source?lang=en&sid=2 is fetched and converted into a python data structure. Both source and destination encoding are unicode utf-8.
  3. Data from the external source is used to create the pdf.
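
Steps 2 and 3 can be tried without a web server or FPDF. The sketch below is written for Python 3, where the encoding tricks above are no longer needed; the function name pdf_lines and the sample payload are invented for illustration and stand in for the external json source.

```python
import json


def pdf_lines(raw_bytes):
    """Decode a UTF-8 JSON payload; return the title and one line per user field."""
    data = json.loads(raw_bytes.decode("utf-8"))
    # Same "key: value" formatting as in the CGI script, sorted for stable order.
    lines = ["%s: %s" % item for item in sorted(data["user"].items())]
    return data["title"], lines


# Made-up payload with non-ASCII characters, encoded as UTF-8 bytes
# the way an HTTP response body would arrive.
sample = json.dumps(
    {"title": "Scheda", "user": {"name": "Niccol\u00f2", "surname": "M\u00fcller"}},
    ensure_ascii=False,
).encode("utf-8")

title, lines = pdf_lines(sample)
# Each entry of `lines` is what the script would pass to pdf.multi_cell().
```

Decoding the bytes explicitly makes the UTF-8 assumption visible in one place instead of relying on a system-wide default.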

You can use as many fonts as you have in the fpdf/font directory; just add them with pdf.add_font().


How to edit a PDF file with Open Office

Some months ago I looked for a decent PDF editor for Linux. The result? Only an application called PDFedit was interesting enough.

Now an extension (plugin) for the cross-platform suite Open Office, called PDF Import, does the magic with a nice PDF import for Open Office Draw.

I’ve tested it on a simple PDF document (v. 1.0.1) and the result is amazing. With Open Office, you can rewrite a PDF, save it as a Draw document and export the modified version as PDF with the handy PDF conversion tool.

Since PDF is a widely used format, you can use a tool like this on downloaded documents that require some changes before printing (e.g. a paper form), without awful cut-and-paste into an editor.


Happy GNU Year to all readers, I’m glad of all the 100k visits to this little blog!

PDF cover thumbnails for attached files

This howto has been superseded by http://drupal.org/node/815816 (patch for Upload preview module).
Tested on:

  • Drupal 5.x
  • Content Templates module
  • ImageMagick 6.3.7
  • GhostScript 8.15.3

Using ImageMagick + GhostScript you can convert the first page of a PDF into a thumbnail image that links to the file. You can cut and paste this function into your theme's main script; you have to create the thumbnail directory manually and make it writable by scripts. Also check the path to the ImageMagick convert command.

function pdf_thumb_attachments($files){
  # $files: files attached to the node (assumed to be passed in by the caller, e.g. $node->files)
  $allowed_mime = array("application/pdf");
  $output = "";
  foreach($files as $file){
    # only PDF files whose description contains "classifica": adapt or remove this filter
    if(in_array(strtolower($file->filemime),$allowed_mime) && strstr(strtolower($file->description),"classifica")){
      $title = $file->description;
      # create link title from file name (Transliteration module suggested)
      $title = str_replace("_"," ",substr($title,0,strlen($title)-4));
      $img = "";
      $local_src_path = getcwd() . "/" . $file->filepath;
      $destfilename = substr($file->filename,0,strlen($file->filename)-4) . ".gif";
      # YOUR-FILES-DIRECTORY/pdfgen will host the generated thumbnails
      $local_dest_path = getcwd() . "/" . file_directory_path() . "/pdfgen/" . $destfilename;
      $imageurl = base_path() . file_directory_path() . "/pdfgen/" . $destfilename;
      /* create thumbnail on node reading if file doesn't exist */
      if(!file_exists($local_dest_path)){
        # choose the first page of the PDF, scale to 90x90px and convert to GIF
        $exec = '/usr/bin/convert -scale 90x90 "'.$local_src_path.'"[0] "'.$local_dest_path . '"';
        exec($exec);
        /* enable the next two lines if you cannot read the generated file
        $exec = "chmod 777 ".escapeshellarg($local_dest_path);
        exec($exec);
        */
      }
      $img = '<img src="'.$imageurl.'" alt="'.$title.'" />';
      $output .= "<li>". l($img." ".$title,$file->filepath, $attributes = array("title"=>$title), $query = NULL, $fragment = NULL, $absolute = TRUE, $html = TRUE) . "</li>";
    }
  }
  $output = '<div class="my-attachments"><ul>' . $output . "</ul></div>";
  return $output;
}
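
The naming and conversion logic of the function can also be tested outside Drupal. This is a rough Python 3 sketch of the same idea, not part of the original theme code: thumb_command and make_thumb are hypothetical names, and the convert path is the one used in the function, so adjust it to your system.

```python
import os
import subprocess


def thumb_command(pdf_path, thumb_dir, size="90x90", convert="/usr/bin/convert"):
    """Return (thumbnail path, convert argument list) for page 1 of pdf_path."""
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    dest = os.path.join(thumb_dir, base + ".gif")
    # "[0]" selects the first page of the PDF
    return dest, [convert, "-scale", size, pdf_path + "[0]", dest]


def make_thumb(pdf_path, thumb_dir):
    """Create the thumbnail only if it does not exist yet; return its path."""
    dest, cmd = thumb_command(pdf_path, thumb_dir)
    if not os.path.exists(dest):
        subprocess.run(cmd, check=True)
    return dest
```

For example, thumb_command("files/classifica_2010.pdf", "files/pdfgen") yields a GIF path inside files/pdfgen and the matching convert arguments, without running anything.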


Alternatively, you can use the ImageMagick functions provided by image.module (the image.imagemagick.inc file in drupal/includes):

function file_preview_path($filename, $filepath = 'filepreview') {
  /** generate jpg thumbnails **/
  return $filepath . "/" . substr($filename,0,strlen($filename)-4) . ".jpg";
}

function file_preview(&$file, $pages = Array(1)) {
  $allowed_mime = array("application/pdf");
  if(in_array(strtolower($file->filemime), $allowed_mime)){
    $local_src_path = getcwd() . "/" . $file->filepath;
    $local_dest_path = getcwd() . "/" . file_directory_path() . "/" . file_preview_path($file->filename);
    # create thumbnail only when needed
    $imagemagick = getcwd() . '/includes/' . 'image.imagemagick.inc';
    if(!file_exists($local_dest_path) && file_exists($imagemagick)){
      require_once($imagemagick);
      # NOTE: the beginning of this call is missing from the original listing;
      # the function name and the page selector are assumptions
      _image_imagemagick_convert($local_src_path . "[" . ($pages[0] - 1) . "]",
                                 $local_dest_path, array('-colorspace RGB'));
      # all can read thumbnail files
      if(file_exists($local_dest_path)){
        chmod($local_dest_path, 0777);
      }
    }
    return file_preview_path($file->filename);
  }
  return FALSE;
}

Create nice pdf with ps2pdf and any word processing utility

Applies to: any GNU/Linux distribution

Applications like OpenOffice allow you to export documents in PDF format. However, sometimes the result does not look very professional. To get the best from your document in print, you can follow a two-step conversion that works with any word processing utility.

  1. Use the option “Print to file” to convert your document to PostScript format (e.g. my_document.ps).
  2. Convert the generated PostScript file to PDF using ps2pdf.

Conversion tips:

To generate a document for printing (a heavier file), with embedded fonts, the best image rendering, etc., you can use:

ps2pdf -dPDFSETTINGS=/prepress my_document.ps

The “prepress” distiller parameter automatically chooses the best settings for print, but you can override them.
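
Ghostscript accepts other presets besides prepress (screen, ebook, printer and default), so the choice is easy to make configurable. Here is a small Python 3 sketch that builds the command line; the function name ps2pdf_command is made up, and the script assumes ps2pdf is installed and on the PATH when you actually run the command.

```python
import subprocess

# Presets accepted by Ghostscript's -dPDFSETTINGS switch.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def ps2pdf_command(ps_file, pdf_file=None, preset="prepress"):
    """Build a ps2pdf argument list; pdf_file is optional, as in ps2pdf itself."""
    if preset not in PRESETS:
        raise ValueError("unknown preset: %s" % preset)
    cmd = ["ps2pdf", "-dPDFSETTINGS=/" + preset, ps_file]
    if pdf_file:
        cmd.append(pdf_file)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; to convert for real use
    # subprocess.run(ps2pdf_command("my_document.ps"), check=True)
    print(" ".join(ps2pdf_command("my_document.ps")))
```

A lighter preset such as ebook gives smaller files for on-screen reading, while prepress keeps the quality needed for print.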

See also:
How to use ps2pdf advanced options