<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Braydon Fuller &#187; Python</title>
	<atom:link href="http://aweplanet.com/braydon/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://aweplanet.com/braydon</link>
	<description>A Micro-Newspaper for Computing Liberties</description>
	<lastBuildDate>Fri, 18 May 2012 18:36:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python Performance Part 3: Python 3000 and Transforming Large Lists into Separate Smaller Lists</title>
		<link>http://aweplanet.com/braydon/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/</link>
		<comments>http://aweplanet.com/braydon/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 13:30:00 +0000</pubDate>
		<dc:creator>AWE Planet</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://aweplanet.com/braydon/blog/?p=125</guid>
		<description><![CDATA[Preface This is a redux of Python Performance Part 1, where the fastest method was using the reduce builtin function in Python2.5. December 3rd, Python 3000 final was released so I have downloaded it and gone over some of these scripts again. In Python 3000 the reduce function is no longer a builtin, and has [...]]]></description>
			<content:encoded><![CDATA[<div id="extended" class="text">
<h2>Preface</h2>
<p>This is a redux of <a href="/blog/2009/2/11/python-performance-part-1">Python Performance Part 1</a>, where the fastest method used the reduce builtin function in Python 2.5. The final release of Python 3000 came out on December 3rd, so I downloaded it and went over some of these scripts again. In Python 3000 the reduce function is no longer a builtin; it has moved to the functools module. In some general comparisons between Python 2.5 and Python 3000, the latter always seemed to run slightly slower. I was told by Crys_ in the #python channel that this is due to the new IO system and unicode identifiers. It was also recommended that I compare my tests against a <a href="http://en.wikipedia.org/wiki/List_comprehension">list comprehension</a>, which is new to me.</p>
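<p>As a minimal sketch of the import change (not one of the timed scripts), the reduce approach could look like this under Python 3000. The names separate and sample and the sub-list length of 3 are made up for illustration; sample stands in for the real oids list.</p>

```python
# Python 3: reduce lives in functools rather than in the builtins.
from functools import reduce

def separate(acc, item, length=3):
    # Start a new sub-list when there is none yet or the last one is full.
    if not acc or len(acc[-1]) == length:
        acc.append([item])
    else:
        acc[-1].append(item)
    return acc

# Hypothetical sample data standing in for the large oids list.
sample = list(range(7))

# Passing [] as the initial value avoids special-casing the first element.
chunks = reduce(separate, sample, [])
print(chunks)
```

<p>Seeding reduce with an empty list means the first call already receives a list, so no try/except is needed to bootstrap the result.</p>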
<h2>List Comprehension</h2>
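<pre>from oids import oids as a  # the large list to split up
c = 3  # size of each sub-list
# the inner comprehension builds the start offsets; the outer one slices at each offset
res = [a[x:x+c] for x in [c*x for x in range(int(round(len(a)/c)))]]</pre>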
<h3>Python 3k Times</h3>
<pre>real  0m0.218s
user  0m0.180s
sys   0m0.016s

real  0m0.262s
user  0m0.204s
sys   0m0.032s

real  0m0.287s
user  0m0.244s
sys   0m0.008s</pre>
<h3>Python 2.5 Times</h3>
<pre>real  0m0.244s
user  0m0.220s
sys   0m0.016s

real  0m0.229s
user  0m0.208s
sys   0m0.020s

real  0m0.251s
user  0m0.236s
sys   0m0.020s</pre>
<p>With wall-clock times of roughly 0.22s to 0.29s, this is the fastest method so far, beating the previous best of 0.34s. Even better, there is little difference between Python 2.5 and Python 3000 here.</p>
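<p>The list comprehension above derives its start offsets from int(round(len(a)/c)), which can drop trailing items when the length of the list is not a multiple of c. A possible variant, sketched here rather than timed, steps through the offsets with range directly and keeps a shorter final sub-list. The function name chunk and the sample data are hypothetical.</p>

```python
def chunk(items, size):
    # range(start, stop, step) yields 0, size, 2*size, ... below len(items);
    # the last slice is simply shorter when the items do not divide evenly.
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical sample data standing in for the large oids list.
sample = list(range(8))
print(chunk(sample, 3))
```

<p>This form behaves the same under Python 2.5 and Python 3000, since it involves no division at all.</p>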
</div>
]]></content:encoded>
			<wfw:commentRss>http://aweplanet.com/braydon/2008/12/python-performance-part-3-python-3000-and-transforming-large-lists-into-seperate-smaller-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 2 Redux: Split &amp; Reduce Large Strings for &#8216;A Href&#8217; Hypertext</title>
		<link>http://aweplanet.com/braydon/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/</link>
		<comments>http://aweplanet.com/braydon/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 13:37:21 +0000</pubDate>
		<dc:creator>AWE Planet</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://aweplanet.com/braydon/blog/?p=129</guid>
		<description><![CDATA[split_2.py def get_value(a): return a[1:a.find("&#62;")-1] hrefs = map(get_value,open("hypertext.html","r").read().split("&#60;a href=")) Timing Comparison: ~ 300% Performance Improvement Note: hypertext.html is 48MB. braydon@bgf:~/python_tests/extract$ time python split.py real 0m1.263s user 0m1.112s sys 0m0.156s braydon@bgf:~/python_tests/extract$ time python split_2.py real 0m0.392s user 0m0.268s sys 0m0.120s split.py Previously, I had found the best solution to my problem was to split() the large [...]]]></description>
			<content:encoded><![CDATA[<div id="extended" class="text">
<h2>split_2.py</h2>
<pre>def get_value(a):
    return a[1:a.find("&gt;")-1]
hrefs = map(get_value,open("hypertext.html","r").read().split("&lt;a href="))</pre>
<h2>Timing Comparison: Roughly 3x Faster</h2>
<p>Note: hypertext.html is 48MB.</p>
<pre>braydon@bgf:~/python_tests/extract$ time python split.py 

real    0m1.263s
user    0m1.112s
sys     0m0.156s

braydon@bgf:~/python_tests/extract$ time python split_2.py 

real    0m0.392s
user    0m0.268s
sys     0m0.120s</pre>
<h2>split.py</h2>
<p>Previously, I had found that the best solution to my problem was to split() the large string on the &#8220;&gt;&#8221; character and then reduce the pieces to a list of hyperlinks.</p>
<pre>def is_ahref(a,b):
    y = b.find("&lt;a href=")
    if y != -1: a.append(b[y+9:-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

hrefs = preduce(is_ahref,open("hypertext.html","r").read().split("&gt;"),[])</pre>
<p>There is a better solution. Splitting the text on the &#8220;&gt;&#8221; character is wasteful: there are many &#8220;&gt;&#8221;s in HTML, and most of the resulting pieces will not contain a hyperlink. If we instead split the string on &#8220;&lt;a href=&#8221;, every item in the list (all but the text before the first link) will start with an href, so we don&#8217;t even need to check for one before reducing as before.</p>
<pre>def is_ahref(a,b):
    z = b.find("&gt;")
    a.append(b[1:z-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

fc = open("hypertext.html","r").read().split("&lt;a href=")
hrefs = preduce(is_ahref,fc,[])</pre>
<p>However, because the resulting list has exactly as many items as the list we started with, we don&#8217;t need reduce() at all. We can simply map() a function over the entire list, &#8216;reducing&#8217; each item to just its hyperlink.</p>
<h2>split_2.py</h2>
<pre>def get_value(a):
    return a[1:a.find("&gt;")-1]
hrefs = map(get_value,open("hypertext.html","r").read().split("&lt;a href="))</pre>
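<p>For readers on Python 3000, here is a hedged sketch of the same idea. It assumes, as split_2.py does, that every href value is double-quoted and immediately followed by &gt;. It also skips the first piece of the split, which holds the text before the first link rather than an href. The function name extract_hrefs and the sample string are made up for illustration.</p>

```python
def get_value(piece):
    # Each piece starts with the opening quote of the href value;
    # keep what sits between that quote and the closing quote before ">".
    return piece[1:piece.find(">") - 1]

def extract_hrefs(text):
    # The first piece is whatever precedes the first link, so drop it.
    pieces = text.split("<a href=")[1:]
    # map() is lazy in Python 3; list() materializes the results.
    return list(map(get_value, pieces))

# Hypothetical sample standing in for hypertext.html.
sample = '<p><a href="/tree">tree</a> <a href="http://world.org">world</a></p>'
print(extract_hrefs(sample))
```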
</div>
]]></content:encoded>
			<wfw:commentRss>http://aweplanet.com/braydon/2008/06/python-performance-part-2-redux-split-reduce-large-strings-for-a-href-hypertext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 2: Parsing Large Strings for &#8216;A Href&#8217; Hypertext</title>
		<link>http://aweplanet.com/braydon/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/</link>
		<comments>http://aweplanet.com/braydon/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 13:39:46 +0000</pubDate>
		<dc:creator>AWE Planet</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://aweplanet.com/braydon/blog/?p=133</guid>
		<description><![CDATA[Goal Write a fast Python script that will take a large string and reduce it to a list of all of the hyperlinks in the html string; such as [”http://world.org”,”/tree”]. Attempt 1: Self-Recursion f = open('hypertext_sm.html','r') ahrefs = [] count = [] def find_ahref(h): a = h.find("&#60;a href=") if a != -1: a = a+9 [...]]]></description>
			<content:encoded><![CDATA[<div id="extended" class="text">
<h2>Goal</h2>
<p>Write a fast Python script that will take a large string of HTML and reduce it to a list of all of the hyperlinks in it, such as ["http://world.org","/tree"].</p>
<h2>Attempt 1: Self-Recursion</h2>
<pre>f = open('hypertext_sm.html','r')
ahrefs = []
count = []
def find_ahref(h):
    a = h.find("&lt;a href=")
    if a != -1:
        a = a+9
        b = a + h[a:-1].find("&gt;")-1
        ahrefs.append(h[a:b])
        find_ahref(h[b:-1])

find_ahref(f.read())
f.close()</pre>
<h3>Summary</h3>
<p>Fairly fast with small strings. With large strings, however, every recursive call slices off a fresh copy of the remaining text, so the large string ends up stored many times over; memory is exhausted and the script fails.</p>
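<p>One way to avoid both the recursion and the repeated copies is to keep a single string and pass a start offset to str.find. This is a sketch of that alternative, not one of the timed attempts. It assumes, like the scripts here, that href values are double-quoted and directly followed by &gt;. The function name find_hrefs and the sample string are hypothetical.</p>

```python
def find_hrefs(text, marker="<a href="):
    hrefs = []
    pos = text.find(marker)
    while pos != -1:
        # Skip past the marker and the opening quote.
        start = pos + len(marker) + 1
        # Search from start onward; the string itself is never sliced up.
        end = text.find(">", start)
        if end == -1:
            break
        # end - 1 is the closing quote, so stop just before it.
        hrefs.append(text[start:end - 1])
        pos = text.find(marker, end)
    return hrefs

# Hypothetical sample standing in for hypertext_sm.html.
sample = '<p><a href="/tree">tree</a> <a href="http://world.org">world</a></p>'
print(find_hrefs(sample))
```

<p>Only the short href substrings are copied, so memory use no longer grows with the number of links already found.</p>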
<h2>Attempt 2: Reduce</h2>
<pre>f = open('hypertext_sm.html','r')
def find_ahref(a,b):
    try:
        c = a[1] + str(b)
        x = c.find("&lt;a href=")
        y = c[x:-1].find("&gt;")
        if x != -1 and y != -1:
            a[0].append(c[x+9:x+y-1])
            return (a[0],"")
        else:
            return (a[0],c)
    except:
        return ([],str(a)+str(b))

hrefs = reduce(find_ahref,f.read())[0]
f.close()</pre>
<h3>Summary</h3>
<p>Not as fast as the previous attempt with smaller strings; however, it does not overload memory, and it actually finished parsing the larger string (43MB). Because that run took nearly 7 minutes, though, it is hard to call it a solution.</p>
<h2>Attempt 3: While Readline</h2>
<pre>f = open('hypertext.html','r')

ahrefs = []
def find_ahref(h):
    a = h.find("&lt;a href=")
    if a != -1:
        a = a+9
        b = a + h[a:-1].find("&gt;")-1
        ahrefs.append(h[a:b])
        find_ahref(h[b:-1])

while True:
    line = f.readline()
    if line:
        find_ahref(line)
    else:
        break

f.close()</pre>
<h3>Summary</h3>
<p>This is pretty fast with both small and large strings, as long as they have many lines. However, if it were handed a single very large line, it would crumble just as Attempt 1: Self-Recursion did.</p>
<h2>Attempt 4: Map/Reducer Readlines</h2>
<pre>f = open('hypertext.html','r')
def ahref_reducer(a,b):
    try:
        c = a[1] + str(b)
        x = c.find("&lt;a href=")
        y = c[x:-1].find("&gt;")
        if x != -1 and y != -1:
            a[0].append(c[x+9:x+y-1])
            return (a[0],"")
        else:
            return (a[0],c)
    except:
        return ([],str(a)+str(b))

def get_ahrefs(line):
    return reduce(ahref_reducer,line)[0]

lines = f.readlines()
hrefs = map(get_ahrefs,lines)
f.close()</pre>
<h3>Summary</h3>
<p>Slower, but it doesn’t destroy memory. However, it doesn’t really meet the goal either: it returns a nasty nested list, one entry per line, with many empty parts.</p>
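<p>If the per-line results were kept, one possible clean-up step is to flatten them with itertools.chain.from_iterable. The sketch below assumes each line yields a list of links, empty when the line has none; the per_line data is invented for illustration and is not output from the script above.</p>

```python
from itertools import chain

# Hypothetical per-line results: one list per line, empty where no link was found.
per_line = [["/tree"], [], ["http://world.org", "/about"], []]

# chain.from_iterable walks each inner list in turn, so empty ones contribute nothing.
hrefs = list(chain.from_iterable(per_line))
print(hrefs)
```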
<h2>Attempt 5: Reduce Recurse Readlines</h2>
<pre>f = open('hypertext.html','r')

def find_ahref(z,htxt):
    a = htxt.find("&lt;a href=")
    if a != -1:
        a = a+9
        b = a+htxt[a:-1].find("&gt;")-1
        href = htxt[a:b]
        z.append(href)
        return find_ahref(z,htxt[b:-1])
    else:
        return z

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

lines = f.readlines()
hrefs = preduce(find_ahref,lines,[])

f.close()</pre>
<h3>Summary</h3>
<p>Fast, although it relies on there being many lines.</p>
<h2>Attempt 6: Something Completely Different</h2>
<p>Instead of looking for the “a href” first, we will look for the closing “&gt;” and then search for the “a href” in the text up to that point.</p>
<pre>hrefs = []

f = open("hypertext.html","r")
f_str = f.read()
f.close()

while True:
    b = f_str.find("&gt;")
    if b == -1:
        break
    a = f_str[0:b].find("&lt;a href=")
    if a != -1:
        hrefs.append(f_str[a+9:b-1])
    f_str = f_str[b+1:-1]</pre>
<h3>Summary</h3>
<p>While good in theory, it does not handle large strings well; and by well I mean not at all. However it did get the correct hrefs from a smaller list.</p>
<h2>Attempt 7: Split &amp; (P)Reduce</h2>
<p>Rather than breaking the large string up by line, we will split it on the “&gt;” character.</p>
<pre>def is_ahref(a,b):
    y = b.find("&lt;a href=")
    if y != -1: a.append(b[y+9:-1])
    return a

def preduce(fn,ls,a):
    ls.insert(0,a)
    return reduce(fn,ls)

hrefs = preduce(is_ahref,open("hypertext.html","r").read().split("&gt;"),[])</pre>
<h3>Summary</h3>
<p>This is the fastest of them all, slightly faster than Reduce Recurse Readlines, and can even handle large single string lines quickly.</p>
<h2>Conclusion</h2>
<ul>
<li>Self-recursion is a poor fit when every call is passed another copy of what remains of the same large string.</li>
<li>Concatenating a string one character at a time and checking for “a href” at each step does not destroy memory, but it is very slow.</li>
<li>Breaking a large string into smaller ones is faster and doesn’t destroy memory.</li>
<li>Reading by line is only one way to break a large string (or file) into smaller strings.</li>
</ul>
</div>
]]></content:encoded>
			<wfw:commentRss>http://aweplanet.com/braydon/2008/06/python-performance-part-2-parsing-large-strings-for-a-href-hypertext/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Performance Part 1: Transforming Large Lists into Separate Smaller Lists</title>
		<link>http://aweplanet.com/braydon/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/</link>
		<comments>http://aweplanet.com/braydon/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/#comments</comments>
		<pubDate>Sat, 14 Jun 2008 13:39:26 +0000</pubDate>
		<dc:creator>AWE Planet</dc:creator>
				<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://aweplanet.com/braydon/blog/?p=138</guid>
		<description><![CDATA[Goal Write a fast Python script that will take a large list and break it up into smaller sub-lists based on a set size; such as transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]]. Attempt 1: Map/Reduce (0.93s) #import a list of 247,213 integers from oids import oids def pre(a): return (list(), a, 0) def make_sets(a,b): set_size = 8 [...]]]></description>
			<content:encoded><![CDATA[<div id="extended" class="text">
<h2>Goal</h2>
<p>Write a fast Python script that will take a large list and break it up into smaller sub-lists based on a set size; such as transforming [a,b,c,d,e,f] into [[a,b],[c,d],[e,f]].</p>
<h2>Attempt 1: Map/Reduce (0.93s)</h2>
<pre>#import a list of 247,213 integers
from oids import oids

def pre(a):
    return (list(), a, 0)

def make_sets(a,b):
    set_size = 8
    if a[2] == set_size - 1 or a[0] == list():
        a[0].append([a[1]])
        return (a[0],b[1],0)
    else:
        a[0][-1].append(a[1])
        return (a[0],b[1],a[2]+1)

reduce(make_sets,map(pre,oids))</pre>
<h3>Times</h3>
<pre>real    0m0.935s
user    0m0.896s
sys     0m0.032s

real    0m0.948s
user    0m0.920s
sys     0m0.028s

real    0m0.929s
user    0m0.900s
sys     0m0.024s</pre>
<h2>Attempt 2: For-loop (0.41s)</h2>
<pre>from oids import oids

output = list()
count = 0
set_size = 8
for oid in oids:
    # count tracks the size of the current sub-list, so each one holds set_size items
    if count == set_size or not output:
        output.append([oid])
        count = 1
    else:
        output[-1].append(oid)
        count = count + 1</pre>
<h3>Times</h3>
<pre>real    0m0.429s
user    0m0.404s
sys     0m0.024s

real    0m0.396s
user    0m0.384s
sys     0m0.012s

real    0m0.410s
user    0m0.396s
sys     0m0.012s</pre>
<h2>Attempt 3: Map (0.48s)</h2>
<pre>from oids import oids

output = list()
set_size = 8
count = [0]

def break_apart(a):
    # count[-1] is the size of the current sub-list, so each one holds set_size items
    if count[-1] == set_size or not output:
        output.append([a])
        count.append(1)
    else:
        output[-1].append(a)
        count.append(count[-1] + 1)

map(break_apart,oids)</pre>
<h3>Timing</h3>
<pre>real    0m0.484s
user    0m0.476s
sys     0m0.012s

real    0m0.483s
user    0m0.464s
sys     0m0.016s

real    0m0.482s
user    0m0.452s
sys     0m0.028s</pre>
<h2>Attempt 4: Reduce (0.34s)</h2>
<pre>from oids import oids

def seperate(a,b,length=8):
    try:
        if len(a[-1]) == length:
            a.append([b])
            return a
        else:
            a[-1].append(b)
            return a
    except:
        return [[a,b]]

oids = reduce(seperate,oids)</pre>
<h3>Timing</h3>
<pre>real    0m0.323s
user    0m0.308s
sys     0m0.016s

real    0m0.329s
user    0m0.300s
sys     0m0.028s

real    0m0.353s
user    0m0.332s
sys     0m0.020s</pre>
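<p>The real/user/sys figures above are in the format printed by the shell's time command. Assuming that is how they were gathered, they also include interpreter start-up and the import of oids. A rough sketch of timing only the transformation with the standard timeit module follows. It is written for Python 3, the data is a made-up list with the same length as oids, and the names with_loop, with_reduce and timings are hypothetical.</p>

```python
import timeit
from functools import reduce

# Made-up data with the same length as the oids list.
data = list(range(247213))

def with_loop(items, size=8):
    # For-loop approach: grow the last sub-list until it is full.
    output = []
    for item in items:
        if not output or len(output[-1]) == size:
            output.append([item])
        else:
            output[-1].append(item)
    return output

def with_reduce(items, size=8):
    # Reduce approach: the accumulator is the list of sub-lists.
    def step(acc, item):
        if not acc or len(acc[-1]) == size:
            acc.append([item])
        else:
            acc[-1].append(item)
        return acc
    return reduce(step, items, [])

# Time three runs of each function; only the transformation is measured.
timings = {}
for fn in (with_loop, with_reduce):
    timings[fn.__name__] = timeit.timeit(lambda: fn(data), number=3)
    print(fn.__name__, round(timings[fn.__name__], 3))
```

<p>Numbers from a harness like this are not directly comparable to the figures above, since they leave out process start-up and run on a different machine and Python version.</p>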
</div>
]]></content:encoded>
			<wfw:commentRss>http://aweplanet.com/braydon/2008/06/python-performance-part-1-transforming-large-lists-into-seperate-smaller-lists/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

