Posts Tagged ‘python’

Extracting email addresses from any text file with Python

Saturday, June 13th, 2009

OK, this might sound like we are in the spamming business now. Well, we are not. The point is that an email address is typically the only per-person unique key in CRM data. These few lines of Python will extract email addresses from any text file, e.g. an HTML file. The script also makes the list unique, so if the same email address appears many times in the original data, it appears only once in the output. Enjoy:

#!/usr/bin/env python
# coding: utf-8

import os
import re
import sys

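def grab_email(path):
    """Print each unique email address found in the given file, once."""
    # Note: {2,4} limits top-level domains to two to four letters.
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                               re.IGNORECASE)
    found = set()  # a set keeps each address only once
    if os.path.isfile(path):
        # The with-block closes the file even if reading fails.
        with open(path, 'r') as handle:
            for line in handle:
                found.update(email_pattern.findall(line))
    for email_address in found:
        print(email_address)  # parenthesized form works on Python 2 and 3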

if __name__ == '__main__':
    grab_email(sys.argv[1])
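
To try the pattern without a file on disk, here is a minimal sketch that runs the same regular expression over an in-memory string. The names extract_emails and sample are made up for this illustration and are not part of the script above.

```python
import re

# Same pattern as in the script above.
EMAIL_PATTERN = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                           re.IGNORECASE)


def extract_emails(text):
    """Return the unique addresses found in text, sorted for stable output."""
    # set() drops duplicates; sorted() makes the order predictable.
    return sorted(set(EMAIL_PATTERN.findall(text)))


# Hypothetical HTML snippet in which one address appears twice.
sample = '<a href="mailto:jane@example.com">jane@example.com</a>, bob@example.org'
print(extract_emails(sample))
# → ['bob@example.org', 'jane@example.com']
```

One thing to keep in mind: because of the {2,4} part, addresses under top-level domains longer than four letters (such as .museum) are not matched at all.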

Using SugarCRM’s SOAP-API with Python and SOAPpy

Sunday, February 15th, 2009

SugarCRM is nowadays a very widely used CRM system, so I will cover it in my posts every now and then.

The API is not very well documented, but you can browse its capabilities by opening http://URL-TO-SUGAR/soap.php in your browser.

At work we use it ourselves to handle some processes, and we have worked with customers who use it. There are things I like and things I don’t like in SugarCRM. It is a bit bloated, and the code is not the most beautiful I have seen. On the other hand, it does its job quite well and is easy to configure without touching a line of code.

If you use a CRM in your business for a while, you will most likely find that you would like to do some integration. At work we have done various kinds of integration tasks: we have integrated a phone system to automatically make entries in SugarCRM when inbound or outbound calls are made, connected the registration of a web application to create new accounts in the CRM, connected a LIMS to it in various ways, and so on. The reason I found the SOAP API to be the best way in most of these cases is that using it won’t break anything, and your code is likely to work with future releases too. Of course there is a performance penalty, and the SOAP API is not a good option when performance is an issue. So let’s get down to business.
(more…)

Extracting pages with specified string from PDF-file with Python

Thursday, January 15th, 2009

Yesterday I had a simple problem. I had one big PDF file with 6000 pages in it, and I wanted to prepare it for a mail house. What they needed to get the job done was two PDF files: one with the single-page documents and one with the multi-page ones.

Luckily all the single-page documents had the string “Page: 1/1” (or the same in Finnish) at the top of the page, so writing a small Python script to do the job was easy. What I needed in addition to Python was the pdftotext binary (here is a dmg for OS X) and pyPdf. So here is the code:
(more…)
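
The full script is behind the link above. As a rough sketch of the approach described (not the original code), the sorting step could look something like this. The function names page_text and split_by_marker are made up for this illustration, and the example needs Python 3.7 or newer. It assumes the pdftotext binary is on the PATH and uses only its -f and -l page-range options.

```python
import re
import subprocess


def page_text(pdf_path, page_number):
    """Return the text of one page (1-based) using the pdftotext binary."""
    # -f/-l select the first and last page; "-" sends the text to stdout.
    result = subprocess.run(
        ['pdftotext', '-f', str(page_number), '-l', str(page_number),
         pdf_path, '-'],
        capture_output=True, check=True)
    return result.stdout.decode('utf-8', errors='replace')


def split_by_marker(page_texts, marker='Page: 1/1'):
    """Partition 0-based page indices into (single_paged, multi_paged)."""
    # The lookahead keeps "Page: 1/10" from being mistaken for "Page: 1/1".
    pattern = re.compile(re.escape(marker) + r'(?!\d)')
    single, multi = [], []
    for index, text in enumerate(page_texts):
        (single if pattern.search(text) else multi).append(index)
    return single, multi


# Made-up page texts standing in for real pdftotext output.
texts = ['Invoice\nPage: 1/1', 'Report\nPage: 1/2',
         'Report\nPage: 2/2', 'Letter\nPage: 1/1']
single, multi = split_by_marker(texts)
print(single, multi)
# → [0, 3] [1, 2]
```

The two index lists could then be used to copy the matching pages from the big file into two separate output files with pyPdf.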