Posts Tagged ‘scripting’

Extracting pages with a specified string from a PDF file with Python

Thursday, January 15th, 2009

Yesterday I had a simple problem. I had one big PDF file with 6000 pages in it, and I wanted to prepare it for a mail house. What they needed to get the job done was two PDF files: one with the single-page documents and one with the multi-page ones.

Luckily, all the single-page documents had the string “Page: 1/1” (or the same in Finnish) at the top of the page, so writing a small Python script to do the job was easy. What I needed in addition to Python was the pdftotext binary (here is a dmg for OS X) and pyPdf. So here is the code:
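What follows is a minimal sketch of the approach, not the original script. It makes a few assumptions: pdftotext is on the PATH, the maintained pypdf package stands in for the older pyPdf, and the function names (`page_texts`, `partition_pages`, `split_pdf`) are made up for illustration. It matches only the English marker, so the Finnish wording would have to be added to the pattern.

```python
import re
import subprocess

# "Page: 1/1" marks a single-page document. The lookahead keeps
# "Page: 1/10" (first page of a ten-page document) from matching.
SINGLE_PAGE = re.compile(r"Page:\s*1/1(?!\d)")


def page_texts(pdf_path):
    """Return the text of every page, extracted with the pdftotext binary."""
    # "-" as the output file sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    # pdftotext ends each page with a form feed character.
    pages = text.split("\f")
    # Drop the empty chunk left after the final form feed.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def partition_pages(texts, pattern=SINGLE_PAGE):
    """Split page indices into (single-page documents, everything else)."""
    single, multi = [], []
    for index, text in enumerate(texts):
        if pattern.search(text):
            single.append(index)
        else:
            multi.append(index)
    return single, multi


def split_pdf(src, single_out, multi_out):
    """Write the single-page and multi-page documents to two PDF files."""
    # Imported here so the helpers above work without pypdf installed.
    # Assumption: pypdf replaces the pyPdf module named in the post.
    from pypdf import PdfReader, PdfWriter

    single, multi = partition_pages(page_texts(src))
    reader = PdfReader(src)
    for indices, out_path in ((single, single_out), (multi, multi_out)):
        writer = PdfWriter()
        for i in indices:
            writer.add_page(reader.pages[i])
        with open(out_path, "wb") as fh:
            writer.write(fh)
```

Calling `split_pdf("big.pdf", "single.pdf", "multi.pdf")` (file names are placeholders) would then produce the two files. Running pdftotext once and splitting on form feeds avoids starting the binary 6000 times. The sketch searches the whole page text for the marker instead of only the top of the page, which is good enough as long as the string does not appear elsewhere.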