PDFTK: How to Merge Two PDF Documents


PDFs are an easy way to share documents across platforms without worrying about compatibility.

While this is clearly an advantage, PDFs lack the ease of modification found in other document types, such as Word documents. Essentially, PDFs are easy to read but difficult to modify.

Fortunately, in Linux, there are many tricks to make the job easier. A piece of software called pdftk allows extensive manipulation of PDFs.

Installation

Using Fedora:

$ sudo yum install pdftk

Using Ubuntu:

$ sudo apt-get install pdftk

A few tricks to warm up

Pdftk can merge or split PDFs, rotate pages, apply watermarks, attach files to a PDF, burst a PDF into single pages, compress it and, in some cases, repair it if it is corrupted.

In this post I will focus on how to merge PDF documents, but there is extensive documentation on how to perform many other operations. It can be found on the PDFLabs website linked at the bottom of the page, or in the man pages.

The syntax to merge PDF files is relatively simple:

$ pdftk [input pdfs] cat output [output file name]

Its real power, however, comes from the ability to use handles and wildcards. I’ll provide a few examples.

To merge two PDF files into a file called combined.pdf:

$ pdftk file1.pdf file2.pdf cat output combined.pdf

If you want to merge all PDF files in the current directory, you can use the wildcard *:

$ pdftk *.pdf cat output combined.pdf
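
The shell expands *.pdf before pdftk runs, so the files are merged in the shell's sort order (file10.pdf sorts before file2.pdf), and a combined.pdf left over from an earlier run would match the pattern too. Below is a minimal sketch for previewing the input list first. The function name list_merge_inputs is made up for this example and is not part of pdftk.

```shell
#!/bin/bash
# Hypothetical helper (not part of pdftk): print the PDFs that "*.pdf" would
# hand to pdftk, in the same order, skipping the intended output file.
list_merge_inputs() {
    local output_name=$1
    local f
    for f in *.pdf; do
        [ -e "$f" ] || continue                 # the pattern matched nothing
        [ "$f" = "$output_name" ] && continue   # do not feed the output back in
        printf '%s\n' "$f"
    done
}

# Preview what a merge into combined.pdf would read from this directory
list_merge_inputs combined.pdf
```

If the order is not the one you want, renaming the files with zero-padded numbers (file01.pdf, file02.pdf, ...) is one way to make the wildcard order match the numeric order.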

If you want to merge only specific pages of a PDF file with pages of another PDF file (or of the same one), handles come in handy:

$ pdftk A=file1.pdf B=file2.pdf cat A1 A2-10 B2 A3-7odd output combined.pdf

This may seem a bit complicated, but I will explain: the input files are now assigned to two handles, A and B. The page ranges listed after cat are the rules for merging the two files, and this is where the power of pdftk lies. It merges the pages from each file in the order we provide them. In this case:

A1 = the first page of file1.pdf
A2-10 = the pages from 2 to 10 of file1.pdf
B2 = the second page of file2.pdf
A3-7odd = the odd pages in the range from 3 to 7 of file1.pdf (i.e. 3, 5, 7)

The even qualifier can also be used instead of odd, to take only the even pages.
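
To make the range notation concrete, here is a small bash sketch that lists the page numbers a range with an odd or even qualifier selects. It is only an illustration of the rule described above: expand_pages is a made-up name, pdftk does this internally, and the sketch handles ascending ranges only.

```shell
#!/bin/bash
# Illustrative only: mimics how a range such as "3-7odd" picks pages.
# expand_pages is a hypothetical helper, not a pdftk command.
expand_pages() {
    local first=$1 last=$2 qualifier=${3:-}
    local pages=()
    local page
    for ((page = first; page <= last; page++)); do
        case $qualifier in
            odd)  ((page % 2 == 1)) || continue ;;   # keep odd page numbers
            even) ((page % 2 == 0)) || continue ;;   # keep even page numbers
        esac
        pages+=("$page")
    done
    echo "${pages[*]}"
}

expand_pages 3 7 odd     # prints: 3 5 7
expand_pages 2 10 even   # prints: 2 4 6 8 10
```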

Main Problem and Solution

I came across pdftk when I needed to solve this problem:

I had scanned a set of papers with a fast document scanner, but since they were written on both sides, I found myself with two big PDFs: the first containing only the odd pages, and the second containing only the even pages, in reverse order!

I wanted a single file with the pages in the correct order, but I could not find an easy way to do it with a single command, even using pdftk.

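*** Update: This command would do the job for pdftk versions higher than 1.44:

$ pdftk file2.pdf cat end-1 output - | pdftk file1.pdf - shuffle output combined.pdf

The first pdftk reverses the even pages (end-1) and writes them to standard output; the second reads them from standard input (-) and interleaves them with the pages of file1.pdf.

Thanks to Andrew Peace ***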

After some thinking I surrendered to the power of bash!

I decided to create a small bash script that builds the command I needed to merge the pages and put them in order automatically in a single file. Those who have done some basic bash will immediately recognize the logic behind the script. For 50-page PDFs:

#!/bin/bash
# Print the pdftk command that interleaves file1.pdf (odd pages, in order)
# with file2.pdf (even pages, in reverse order).

max=50   # number of pages in each scanned PDF

printf 'pdftk A=file1.pdf B=file2.pdf cat '
for ((page = 1; page <= max; page++)); do
    # Page N of A, then the matching page of B counted from the end
    printf 'A%d B%d ' "$page" "$((max - page + 1))"
done
printf 'output combined.pdf\n'

Or you can download the script here and run it by simply typing:

$ ./pdftk_merge.sh

from the folder in which you downloaded the file.

The script automatically prints to the screen the command that you need to type (or better, copy) to merge the two sets of PDFs! In this case the result is:

pdftk A=file1.pdf B=file2.pdf cat A1 B50 A2 B49 A3 B48 A4 B47 A5 B46 A6 B45 A7 B44 A8 B43 A9 B42 A10 B41 A11 B40 A12 B39 A13 B38 A14 B37 A15 B36 A16 B35 A17 B34 A18 B33 A19 B32 A20 B31 A21 B30 A22 B29 A23 B28 A24 B27 A25 B26 A26 B25 A27 B24 A28 B23 A29 B22 A30 B21 A31 B20 A32 B19 A33 B18 A34 B17 A35 B16 A36 B15 A37 B14 A38 B13 A39 B12 A40 B11 A41 B10 A42 B9 A43 B8 A44 B7 A45 B6 A46 B5 A47 B4 A48 B3 A49 B2 A50 B1  output combined.pdf

If executed from a terminal, this command will correctly merge the two files in the right page order.
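
The page count of 50 is hard-coded in the script above. As a minimal sketch, the same command can be built for any page count and any file names. The function build_collate_cmd and its argument order are assumptions made for this example, and it assumes both scans have the same number of pages.

```shell
#!/bin/bash
# Sketch: build the interleaving command for any page count.
# build_collate_cmd is a hypothetical helper; arguments are
# <odd-pages pdf> <reversed even-pages pdf> <pages per file> <output pdf>.
build_collate_cmd() {
    local odd_pdf=$1 even_pdf=$2 pages=$3 out_pdf=$4
    local cmd=(pdftk "A=$odd_pdf" "B=$even_pdf" cat)
    local n
    for ((n = 1; n <= pages; n++)); do
        # Page n of the odd scan, then the matching page of the even scan,
        # counted from the end because that scan is in reverse order
        cmd+=("A$n" "B$((pages - n + 1))")
    done
    cmd+=(output "$out_pdf")
    echo "${cmd[*]}"
}

build_collate_cmd file1.pdf file2.pdf 3 combined.pdf
# prints: pdftk A=file1.pdf B=file2.pdf cat A1 B3 A2 B2 A3 B1 output combined.pdf
```

The page count does not have to be counted by hand: pdftk file1.pdf dump_data prints a NumberOfPages line that can be used for the third argument.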

Useful links

Pdftk is a PDFLabs product: http://www.pdflabs.com. Many more examples of how to use pdftk can be found here.