Extracting email addresses from any text file with Python

June 13th, 2009 by Tuomas Rasila

OK, this might sound as if we are in the spamming business now. Well, we are not. The point is that an email address is typically the only per-person unique key in CRM data. These few lines of Python extract email addresses from any text file, e.g. an HTML file. The script also removes duplicates, so an address that appears many times in the original data is printed only once in the output. Enjoy:

#!/usr/bin/env python
# coding: utf-8

import os
import re
import sys

def grab_email(path):
    """Print every unique email address found in the given file."""
    # Note: {2,4} limits the top-level domain to two to four letters.
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                               re.IGNORECASE)
    found = set()  # a set keeps each address only once
    if os.path.isfile(path):
        # The with-block closes the file even if reading fails.
        with open(path, 'r') as handle:
            for line in handle:
                found.update(email_pattern.findall(line))
    for email_address in found:
        print(email_address)

if __name__ == '__main__':
    grab_email(sys.argv[1])
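
If you would rather get the addresses back as a list than have them printed, the same idea can be wrapped in a function that works on a string. This is only a sketch: the name extract_emails is made up for this example, the pattern is the one from the script above except that the top-level domain may be two or more letters (an assumption, so that longer ones such as .museum match), and lowercasing before deduplication assumes you want Jane@example.com and jane@example.com treated as the same address.

```python
import re

# Pattern from the script above, with {2,4} relaxed to {2,} (assumption).
EMAIL_PATTERN = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b',
                           re.IGNORECASE)

def extract_emails(text):
    """Return the unique addresses found in text, lowercased and sorted."""
    # Lowercasing first means the set also merges case variants (assumption).
    return sorted({match.lower() for match in EMAIL_PATTERN.findall(text)})

sample = ('Contact <a href="mailto:Jane.Doe@example.com">Jane</a> '
          'or jane.doe@example.com, bob@example.org.')
print(extract_emails(sample))
```

Returning a sorted list instead of printing makes the result easy to reuse, for instance as lookup keys when matching against CRM records.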

4 Responses to “Extracting email addresses from any text file with Python”

  1. garijon says:

    Thanks a lot, it was just what I was looking for.

  2. Carlos says:

    What do I have to do if I want to extract all emails except those that begin with postmaster?

    thanks

  3. Tuomas Rasila says:

    You can grep the file. Say the file with the extracted emails (one address per line, as the script prints them) is foo.txt; run the following on the command line:

    grep -v postmaster@ foo.txt > new_file.txt

  4. John Kosty says:

    Echo Garijon. Needed a good Python how-to example and was lucky enough to find this.

    Thanks, big time!
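
For the postmaster question above, the filtering can also be done in Python instead of grep. A minimal sketch, assuming the addresses are already in a list; the helper name drop_local_part is made up for this example. Unlike the grep command, it ignores case and only drops addresses whose part before the @ is exactly the given name.

```python
def drop_local_part(addresses, local_part='postmaster'):
    """Return the addresses whose part before '@' is not local_part."""
    unwanted = local_part.lower() + '@'
    # Compare in lowercase so 'Postmaster@...' is dropped as well.
    return [address for address in addresses
            if not address.lower().startswith(unwanted)]

emails = ['postmaster@example.com', 'jane@example.com',
          'Postmaster@example.org', 'postmaster.fan@example.net']
print(drop_local_part(emails))
```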
