<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rasila Garage &#187; Programming</title>
	<atom:link href="http://rasilagarage.com/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://rasilagarage.com</link>
	<description>Tuomas Rasila's blog about software and entrepreneurship</description>
	<lastBuildDate>Sun, 07 Mar 2010 09:11:27 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Extracting email addresses from any text file with Python</title>
		<link>http://rasilagarage.com/2009/06/extracting-email-addresses-from-any-text-file-with-python/</link>
		<comments>http://rasilagarage.com/2009/06/extracting-email-addresses-from-any-text-file-with-python/#comments</comments>
		<pubDate>Sat, 13 Jun 2009 09:05:48 +0000</pubDate>
		<dc:creator>Tuomas Rasila</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://rasilagarage.com/?p=215</guid>
		<description><![CDATA[OK, this might sound like we are in the spamming business now. Well, we are not. The point is that an email address is typically the only per-person unique key in CRM data. These few lines of Python will extract email addresses from any text file, e.g. an HTML file. This script will also make [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-medium wp-image-217" title="picture-11" src="http://rasilagarage.com/wp-content/uploads/2009/06/picture-11-300x190.png" alt="picture-11" width="300" height="190" />OK, this might sound like we are in the spamming business now. Well, we are not. The point is that an email address is typically the only per-person unique key in CRM data. These few lines of Python will extract email addresses from any text file, e.g. an HTML file. The script also makes the list unique, so if the same email address appears many times in the original data, it appears only once in the output. Enjoy:</p>
<pre class="brush:python">
#!/usr/bin/env python
# coding: utf-8

import os
import re
import sys

def grab_email(path):
    """Print every unique email address found within the given file."""
    # Note: {2,4} only matches top-level domains of two to four letters.
    email_pattern = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                               re.IGNORECASE)
    found = set()  # a set keeps each address only once
    if os.path.isfile(path):
        with open(path, 'r') as source:
            for line in source:
                found.update(email_pattern.findall(line))
    for email_address in found:
        print(email_address)

if __name__ == '__main__':
    grab_email(sys.argv[1])
</pre>
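<p>The same idea can be sketched as a function that works on a string instead of a file, which makes it easier to try out. The name extract_emails and the sample text are only assumptions for illustration; the pattern is the one from the script above. Addresses that differ only in letter case are still treated as different here.</p>

```python
import re

# Same pattern as in the script above; {2,4} limits the top-level domain to 2-4 letters.
EMAIL_PATTERN = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b',
                           re.IGNORECASE)


def extract_emails(text):
    """Return the unique email addresses found in text, sorted alphabetically."""
    return sorted(set(EMAIL_PATTERN.findall(text)))


if __name__ == '__main__':
    # Hypothetical sample input, not taken from real data.
    sample = 'Write to alice@example.com or bob@example.org; alice@example.com is preferred.'
    for address in extract_emails(sample):
        print(address)
```

<p>Returning a sorted list instead of printing straight from the set gives the same output order on every run.</p>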
]]></content:encoded>
			<wfw:commentRss>http://rasilagarage.com/2009/06/extracting-email-addresses-from-any-text-file-with-python/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Using SugarCRM&#8217;s SOAP-API with Python and SOAPpy</title>
		<link>http://rasilagarage.com/2009/02/using-sugarcrms-soap-api-with-python-and-soappy/</link>
		<comments>http://rasilagarage.com/2009/02/using-sugarcrms-soap-api-with-python-and-soappy/#comments</comments>
		<pubDate>Sun, 15 Feb 2009 18:13:32 +0000</pubDate>
		<dc:creator>Tuomas Rasila</dc:creator>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[sugarcrm]]></category>

		<guid isPermaLink="false">http://rasilagarage.com/?p=195</guid>
		<description><![CDATA[SugarCRM is nowadays a very widely used CRM system, so I will cover it in my posts every now and then. At work we use it ourselves to handle some processes and have been working with customers who use it. There are things I like and things I don&#8217;t like in SugarCRM. It is a bit bloated [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sugarcrm.com/" onclick="pageTracker._trackPageview('/outgoing/www.sugarcrm.com/?referer=');">SugarCRM</a> is nowadays a very widely used CRM system, so I will cover it in my posts every now and then.</p>
<div id="attachment_206" class="wp-caption alignright" style="width: 310px"><a href="http://rasilagarage.com/wp-content/uploads/2009/02/picture-1.png"><img class="size-medium wp-image-206" title="Sugarsoap" src="http://rasilagarage.com/wp-content/uploads/2009/02/picture-1-300x139.png" alt="The API is not very well documented but you can browse its capabilities by opening http://URL-TO-SUGAR/soap.php in your browser" width="300" height="139" /></a><p class="wp-caption-text">The API is not very well documented but you can browse its capabilities by opening http://URL-TO-SUGAR/soap.php in your browser</p></div>
<p>At work we use it ourselves to handle some processes and have been working with customers who use it. There are things I like and things I don&#8217;t like in SugarCRM. It is a bit bloated and the code is not the most beautiful I have seen. On the other hand, it does its job quite well and it is easy to configure without touching a line of code.</p>
<p>If you use a CRM in your business for a while, you will most likely find that you would like to integrate it with other systems. At work we have done various kinds of integration tasks. We have integrated a phone system to automatically make entries in SugarCRM when inbound or outbound calls are made, connected the registration of a web application to create new accounts in the CRM, connected a <a href="http://en.wikipedia.org/wiki/Laboratory_Information_Management_System" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Laboratory_Information_Management_System?referer=');">LIMS</a> to it in various ways, and so on. The reason I found the SOAP API to be the best way in most of these cases is that using it won&#8217;t break anything, and it is likely that your code will work with future releases too. Of course there is a performance penalty, so the SOAP API is not a good option when performance is an issue. So let&#8217;s get down to business.<br />
<span id="more-195"></span><br />
<strong>Installing SOAPpy</strong><br />
I assume that installing a Python module with easy_install is not a problem. However, the latest version of <a href="http://pywebsvcs.sourceforge.net/" onclick="pageTracker._trackPageview('/outgoing/pywebsvcs.sourceforge.net/?referer=');">SOAPpy</a> was not compatible with Python 2.5. To install SOAPpy I would suggest the following:</p>
<p>1. Download and extract SOAPpy <a href="http://sourceforge.net/project/showfiles.php?group_id=26590&amp;package_id=18246" onclick="pageTracker._trackPageview('/outgoing/sourceforge.net/project/showfiles.php?group_id=26590_amp_package_id=18246&amp;referer=');">http://sourceforge.net/project/showfiles.php?group_id=26590&amp;package_id=18246</a></p>
<p>2. Move every &#8220;from __future__ import&#8221; line to the beginning of the file it is in, because Python requires these imports to come before any other statements. The listing below shows the affected files.</p>
<pre class="brush:bash">
localhost:SOAPpy-0.12.0 tuomas$ grep -R __future__ SOAPpy/*.py
SOAPpy/Client.py:from __future__ import nested_scopes
SOAPpy/GSIServer.py:from __future__ import nested_scopes
SOAPpy/NS.py:from __future__ import nested_scopes
SOAPpy/Server.py:from __future__ import nested_scopes
SOAPpy/Types.py:from __future__ import nested_scopes
</pre>
<p>3. Install SOAPpy</p>
<pre class="brush:bash">
sudo python setup.py install
</pre>
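<p>Step 2 can be done by hand, but as a rough sketch it could also be scripted. The function name hoist_future_imports is an assumption made for this illustration, and the sketch is deliberately naive: it only looks at lines that start with the import statement, keeps leading comments and blank lines in front, and places the imports before a module docstring if there is one.</p>

```python
FUTURE_PREFIX = 'from __future__ import'


def hoist_future_imports(source):
    """Return source with top-level 'from __future__ import' lines moved to the top.

    Leading comment lines (shebang, coding line) and blank lines stay first.
    """
    lines = source.splitlines()
    future = [line for line in lines if line.startswith(FUTURE_PREFIX)]
    rest = [line for line in lines if not line.startswith(FUTURE_PREFIX)]
    # Count the leading comment and blank lines that should stay in front.
    head = 0
    while head < len(rest) and (rest[head].startswith('#') or not rest[head].strip()):
        head += 1
    return '\n'.join(rest[:head] + future + rest[head:]) + '\n'


if __name__ == '__main__':
    # Hypothetical file content, only to show the effect.
    example = "#!/usr/bin/env python\nident = 'x'\nfrom __future__ import nested_scopes\nimport os\n"
    print(hoist_future_imports(example))
```

<p>To apply it, read each file listed above, pass its content through the function and write the result back. Keep a copy of the original files in case the result needs checking by hand.</p>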
<p><strong>The simplest possible example of SugarCRM + SOAPpy</strong><br />
The code below logs in to SugarCRM, asks for the version of the server and prints it:</p>
<pre class="brush:python">
import md5
import SOAPpy
USERNAME= "user"
PASSWORD= "pass"

auth = {'user_name': USERNAME, 'password': md5.new(PASSWORD).hexdigest(), "version": "1.1"}

# Url of SugarCRM + soap.php?wsdl
SUGAR_URL = 'http://192.160.0.1/soap.php?wsdl'

sugar = SOAPpy.WSDL.Proxy(SUGAR_URL)
session = sugar.login(auth, "foobar")

try:
    response = sugar.get_server_version()
except StandardError, err:
    print '\nError in information retrieval from SugarCRM:' + str(err)
else:
    # Only print the response when the call succeeded
    print response
</pre>
<p>The API is not documented very well, but you can list all of its capabilities by opening http://URL-TO-SUGAR/soap.php in your browser.</p>
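<p>The md5 module used above was deprecated in favour of hashlib, which has been in the standard library since Python 2.5. As a minimal sketch, the login structure could be built like this; the helper name build_auth is an assumption for illustration and not part of SOAPpy or SugarCRM.</p>

```python
import hashlib


def build_auth(user_name, password, version='1.1'):
    """Build the login structure used above, with the password sent as an MD5 hex digest."""
    digest = hashlib.md5(password.encode('utf-8')).hexdigest()
    return {'user_name': user_name, 'password': digest, 'version': version}


if __name__ == '__main__':
    # Placeholder credentials, same as in the example above.
    print(build_auth('user', 'pass'))
```

<p>The resulting dictionary has the same keys as the auth variable in the example, so it could be passed to the login call in the same way.</p>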
<p><strong>Advanced example</strong><br />
This script will log into SugarCRM and create a new Account.</p>
<pre class="brush:python">
import md5
import SOAPpy
USERNAME= "user"
PASSWORD= "pass"

auth = {'user_name': USERNAME, 'password': md5.new(PASSWORD).hexdigest(), "version": "1.1"}

# Url of SugarCRM + soap.php?wsdl
SUGAR_URL = 'http://192.160.0.1/soap.php?wsdl'

sugar = SOAPpy.WSDL.Proxy(SUGAR_URL)
session = sugar.login(auth, "foobar")
module = "Accounts"

adata = [{'name': 'name', 'value': "Test Account"},
           {'name': 'shipping_address_street', 'value': "Address test"}]
try:
    response = sugar.set_entry(session['id'], module, adata)
except StandardError, err:
    print '\nError in storing information to SugarCRM:' + str(err)
else:
    # Only print the response when the call succeeded
    print response
</pre>
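<p>The list of name/value pairs passed to set_entry gets tedious to write by hand when there are many fields. A small helper can build it from an ordinary dictionary. The function name to_name_value_list is an assumption for this sketch; only the name and value keys come from the example above.</p>

```python
def to_name_value_list(fields):
    """Convert a dict of field values into a list of name/value pairs."""
    # Sorting by field name keeps the result the same on every run.
    return [{'name': name, 'value': value} for name, value in sorted(fields.items())]


if __name__ == '__main__':
    adata = to_name_value_list({'name': 'Test Account',
                                'shipping_address_street': 'Address test'})
    print(adata)
```

<p>The result has the same shape as the adata list in the script above.</p>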
<p><strong>UPDATE:</strong> It appears that the Sugar team is working on the <a href="http://developers.sugarcrm.com/wordpress/2009/02/11/draft-2-of-the-web-services-documentation-posted/" onclick="pageTracker._trackPageview('/outgoing/developers.sugarcrm.com/wordpress/2009/02/11/draft-2-of-the-web-services-documentation-posted/?referer=');">API and the documentation</a> at the moment, so I&#8217;ll take back what I said about missing documentation. Thanks, SugarCRM devs!</p>
]]></content:encoded>
			<wfw:commentRss>http://rasilagarage.com/2009/02/using-sugarcrms-soap-api-with-python-and-soappy/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How to write a simple bot that updates status to Laconi.ca</title>
		<link>http://rasilagarage.com/2009/01/how-to-write-a-simple-bot-that-updates-to-laconica/</link>
		<comments>http://rasilagarage.com/2009/01/how-to-write-a-simple-bot-that-updates-to-laconica/#comments</comments>
		<pubDate>Sun, 25 Jan 2009 10:36:43 +0000</pubDate>
		<dc:creator>Tuomas Rasila</dc:creator>
				<category><![CDATA[Business]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[laconi.ca]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://rasilagarage.com/?p=73</guid>
		<description><![CDATA[In my last post I wrote about Laconi.ca and mentioned that we have many bots updating Laconi.ca too. The bots become active when some condition is met. Example: make a server that is running out of disk space post an update to Laconi.ca. This is actually quite useful. Sure, you can tell every service to send [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_91" class="wp-caption alignleft" style="width: 220px"><a href="http://rasilagarage.com/wp-content/uploads/2009/01/419970_front_rack_server.jpg"><img class="size-full wp-image-91 " title="Rack server" src="http://rasilagarage.com/wp-content/uploads/2009/01/419970_front_rack_server.jpg" alt="A full disk is a lousy reason to wake up in the middle of the night" width="210" height="158" /></a><p class="wp-caption-text">A full disk is a lousy reason to wake up in the middle of the night</p></div>
<p>In my <a href="http://rasilagarage.com/2009/01/laconica-an-open-source-twitter-clone-behind-a-password-and-your-firewall/">last post</a> I wrote about Laconi.ca and mentioned that we have many bots updating Laconi.ca too. The bots become active when some condition is met.</p>
<p><strong>Example: make a server that is running out of disk space post an update to Laconi.ca</strong></p>
<p>This is actually quite useful. Sure, you can tell every service to send an email when something unwanted happens, but then you will have to read that mailbox too. Sending the email to someone else doesn’t solve the problem. Hmm, updating to Laconi.ca doesn’t solve the problem either, but it makes servers easier to follow and subscribe to.</p>
<p>So here is a simple shell script:</p>
<p><span id="more-73"></span></p>
<p><code>#!/bin/sh<br />
# Mount points of local filesystems, skipping CD-ROM drives<br />
fs=`mount|egrep '^/dev'|grep -iv cdrom| awk '{print $3}'`<br />
thresh=96<br />
warn=98<br />
for i in $fs<br />
do<br />
used=`df -k $i|tail -n 1|awk '{print $5}'|cut -d "%" -f 1`<br />
if [ "$used" -ge "$warn" ]; then<br />
laconica.sh "1.2.3.4 server CRITICAL: filesystem $i is $used% full"<br />
elif [ "$used" -ge "$thresh" ]; then<br />
laconica.sh "1.2.3.4 server WARNING: filesystem $i is $used% full"<br />
fi<br />
done</code></p>
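<p>For comparison, the threshold logic can be sketched in Python as well. This is only an illustration under a few assumptions: the names classify and percent_used are made up for it, it needs Python 3.3 or later for shutil.disk_usage, and here each usage value maps to at most one level. The percentage can differ slightly from what df reports, because df leaves reserved blocks out of its calculation.</p>

```python
import shutil

WARN_PERCENT = 96      # corresponds to thresh in the shell script
CRITICAL_PERCENT = 98  # corresponds to warn in the shell script


def classify(percent):
    """Return 'CRITICAL', 'WARNING' or None for a usage percentage."""
    if percent >= CRITICAL_PERCENT:
        return 'CRITICAL'
    if percent >= WARN_PERCENT:
        return 'WARNING'
    return None


def percent_used(path):
    """Return how full the filesystem containing path is, rounded to a whole percentage."""
    usage = shutil.disk_usage(path)
    return int(round(100.0 * usage.used / usage.total))


if __name__ == '__main__':
    percent = percent_used('/')
    level = classify(percent)
    if level:
        # The message text follows the shell script; posting it is left out here.
        print('1.2.3.4 server %s: filesystem / is %d%% full' % (level, percent))
```

<p>Keeping the classification in its own function makes the thresholds easy to check without having to fill up a disk first.</p>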
<p>I didn’t want to use cURL directly to talk to Laconi.ca, so I downloaded laconica.sh from http://downloads.guillermoamaral.com/misc/laconica. Sure, it uses cURL too, but if I change the password I only need to change it in this wrapper.</p>
<p>I saved the script as diskfull.sh. So let’s first make it executable and then try it:</p>
<p><code>chmod 755 diskfull.sh<br />
./diskfull.sh</code></p>
<p>Laconi.ca gets an update as follows:</p>
<div id="attachment_75" class="wp-caption alignnone" style="width: 310px"><a href="http://rasilagarage.com/wp-content/uploads/2009/01/picture-3.png"><img class="size-medium wp-image-75" title="Laconi.ca bot in action" src="http://rasilagarage.com/wp-content/uploads/2009/01/picture-3-300x49.png" alt="Screenshot of Laconi.ca" width="300" height="49" /></a><p class="wp-caption-text">Screenshot of Laconi.ca</p></div>
<p>You need to schedule cron to run the script. To do so, edit /etc/crontab and add the following line:</p>
<p><code>31 8  * * *     root    /opt/diskfull.sh</code></p>
<p>This line will run it daily at 08:31 as root. You might want to change it. If you are not familiar with cron, <a href="http://guerrillatech.wordpress.com/2008/01/25/howto-use-cron-to-schedule-tasks/" onclick="pageTracker._trackPageview('/outgoing/guerrillatech.wordpress.com/2008/01/25/howto-use-cron-to-schedule-tasks/?referer=');">here is some help</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://rasilagarage.com/2009/01/how-to-write-a-simple-bot-that-updates-to-laconica/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Extracting pages with specified string from PDF-file with Python</title>
		<link>http://rasilagarage.com/2009/01/extracting-pdf-with-python/</link>
		<comments>http://rasilagarage.com/2009/01/extracting-pdf-with-python/#comments</comments>
		<pubDate>Thu, 15 Jan 2009 20:43:44 +0000</pubDate>
		<dc:creator>Tuomas Rasila</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scripting]]></category>

		<guid isPermaLink="false">http://rasilagarage.com/?p=1</guid>
		<description><![CDATA[Yesterday I had a simple problem. I had one big PDF file with 6000 pages in it and I wanted to prepare it for a mail house. What they needed to get the job done was two PDF files, one with single-page documents and one with multi-page ones. Luckily all the single-page documents had the string &#8220;Page: 1/1&#8221; (or [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday I had a simple problem. I had one big PDF file with 6000 pages in it and I wanted to prepare it for a mail house. What they needed to get the job done was two PDF files, one with single-page documents and one with multi-page ones.</p>
<p>Luckily all the single-page documents had the string &#8220;Page: 1/1&#8221; (or the same in Finnish) at the top of the page. So writing a small Python script to do the job was easy. What I needed in addition to Python was the <a href="http://en.wikipedia.org/wiki/Pdftotext" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Pdftotext?referer=');">pdftotext</a> binary (<a href="http://www.bluem.net/downloads/pdftotext_en/" onclick="pageTracker._trackPageview('/outgoing/www.bluem.net/downloads/pdftotext_en/?referer=');">here is a dmg for OS X</a>) and <a href="http://pybrary.net/pyPdf/" onclick="pageTracker._trackPageview('/outgoing/pybrary.net/pyPdf/?referer=');">pyPdf</a>. So here is the code:<br />
<span id="more-1"></span></p>
<pre><code>
#!/usr/bin/python
'''This script needs the pyPdf module and the pdftotext binary'''
import sys
import os
from pyPdf import PdfFileReader, PdfFileWriter

def findstr(lookup, filename):
    '''Return True if lookup occurs anywhere in the given file.'''
    textfile = open(filename, 'rb')
    text = textfile.read()
    textfile.close()
    return lookup in text

try:
    searchstr = sys.argv[1]
    searchstr2 = sys.argv[2]
    pdffile = PdfFileReader(file(sys.argv[3], "rb"))
    numpages = pdffile.getNumPages()
    singlefile = sys.argv[4]
    multifile = sys.argv[5]
except IndexError:
    print "Usage: getmulti.py [searchstring1] [searchstring2] [sourcefile] [destinationfile-single] [destinationfile-multi]"
    # Stop here: the rest of the script needs all five arguments
    sys.exit(1)

print "****************"
print "Extracting multipaged and singlepaged files from " + sys.argv[3] + " (%s pages)" % numpages
print "Outputting multipaged to " + multifile + " and singlepaged to " + singlefile

singleoutput = PdfFileWriter()
multioutput = PdfFileWriter()
for i in xrange(numpages):
    os.system("pdftotext -f %s -l %s %s /tmp/foo.txt" % (i+1, i+1, sys.argv[3]))
    print "pdftotext -f %s -l %s %s /tmp/foo.txt" % (i+1, i+1, sys.argv[3])
    if findstr(searchstr, "/tmp/foo.txt") or findstr(searchstr2, "/tmp/foo.txt"):
        print "got it"
        singleoutput.addPage(pdffile.getPage(i))
    else:
        print "not got"
        multioutput.addPage(pdffile.getPage(i))

multioutputStream = file(multifile, "wb")
singleoutputStream = file(singlefile, "wb")

multioutput.write(multioutputStream)
singleoutput.write(singleoutputStream)
multioutputStream.close()
singleoutputStream.close()
</code>
</pre>
<p>So now I can just say:<br />
<code><br />
./getmulti.py "Page: 1/ 1" "Sivu: 1/ 1" orig.pdf single.pdf multi.pdf<br />
</code></p>
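<p>The decision the script makes for each page can be sketched separately from the PDF handling. In this illustration the page texts are plain strings, as if they had already been produced by pdftotext, and the function name split_pages is an assumption made for the example.</p>

```python
def split_pages(page_texts, markers):
    """Split page indexes into (single, multi) by whether any marker occurs on the page."""
    single, multi = [], []
    for index, text in enumerate(page_texts):
        if any(marker in text for marker in markers):
            single.append(index)
        else:
            multi.append(index)
    return single, multi


if __name__ == '__main__':
    # Hypothetical page texts, only to show the split.
    pages = ['Page: 1/ 1 invoice', 'Page: 1/ 2 letter', 'Page: 2/ 2 letter', 'Sivu: 1/ 1 lasku']
    print(split_pages(pages, ['Page: 1/ 1', 'Sivu: 1/ 1']))
```

<p>Note that plain substring matching would also treat a first page labelled &#8220;Page: 1/ 10&#8221; as a single-page document, so documents with ten or more pages would need a stricter check.</p>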
]]></content:encoded>
			<wfw:commentRss>http://rasilagarage.com/2009/01/extracting-pdf-with-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
