sjh - mountain biking running linux vegan geek spice - mtb / vegan / running / linux / canberra / cycling / etc

Steven Hanley
About

email: sjh@svana.org

web: https://svana.org/sjh
instagram: https://instagram.com/sjhmtb

Mon, 01 Jun 2009

An interesting languages comparison - 15:45
I got the link to this from Tony and it is interesting to see the results of these tests. The post, "The speed, size and dependability of programming languages", uses code from the Computer Language Benchmarks Game to generate some information comparing many (72) different languages.

Back in 1999 and 2000 I put a pretty trivial example of a single problem being solved in multiple languages online, in this case scanning HTML for entities. I did it largely because I was mildly interested in how different languages, and the different implementations of them, would solve the same problem and how long each would take. I say mildly interested because it is such a trivial example and because I did not put much effort in. (I was amazed a few weeks ago to get an email from someone rerunning these to see if recent Java implementations had caught up to C yet.)

The person who wrote this speed, size and dependability post put a lot more effort in and actually was able to draw some interesting conclusions about languages and how they work and develop over time. For the geeks out there I recommend having a look.

[/comp/prog] link

Thu, 08 May 2008

Move a little thing to python - 13:44
At ANU there is an online (web page) searchable phone database for all ANU phone numbers. A few years ago (July 2002, according to the version control dates) I spent an hour or two writing a command line program in perl that queries this and prints the results. I find it much easier to use a command line application than to open a tab in a web browser, find the appropriate page and enter a query when all I want is a simple bit of information back. I suspect most of the staff in this department (Computer Science) are similar.

Sometime last year I realised that, though the URL I was using on the ANU Internal Web still worked, it seemed not to interface with the latest phone database for the uni. Sometimes it did not match people I knew worked on campus, other times it returned out of date numbers. However there were other important uses for my time, so I did not bother looking too closely into updating it while the old results were still good enough most of the time.

Finally this week Bob noticed there were no matches coming back; it seems the old interface no longer connected to the database correctly. So I opened the program and had a look at updating it. The old program used LWP to fetch the page with a GET request. The newer interface now on ANU Web works properly with a POST request. Also the result page is more complex to parse than the old one (more complex regular expressions, or maybe a small state machine, needed). Still, it did not look too hard: an hour or so fixing up the old perl code to get the new page and parse it properly for the desired results.

However I hit a snag: for some reason LWP did not fetch the entire result from the web server, which was returning the data in chunks. A tcpdump session showed it simply closed the request rather than fetching all the data. At this point I could have debugged the perl code and fixed it; after all there is no good reason LWP should not work. However I thought to myself, I have been keen to write a bit of python for a while. Bob bought the Mark Lutz Programming Python book for my office and I have read through about half of it. So why not rewrite the program in python and see how a perl hacker can transfer to using python, at least for a small program?

I am happy to say that page fetching in python made even perl look complex. The code that did the job (and worked, handling the POST request fine) was

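   import sys
   import urllib   # Python 2 module layout

   # searchuri holds the directory search URL, set elsewhere in the script
   name = ' '.join(sys.argv[1:])
   params = urllib.urlencode({'stype': 'Staff Directory', 'button': 'Search', 'querytext': name})
   f = urllib.urlopen(searchuri, params)
   r = f.read()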
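
The snippet above is Python 2. As a rough sketch of the same POST request on Python 3, assuming the form fields are unchanged: urlencode now lives in urllib.parse and the request body has to be bytes. SEARCH_URI, build_request and search are names made up for this illustration, and the URL is a placeholder rather than the real ANU address.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder only: the real directory URL is not given in this post.
SEARCH_URI = "http://example.invalid/directory/search"


def build_request(name, uri=SEARCH_URI):
    """Build (but do not send) the POST request for a directory search."""
    params = urlencode({
        "stype": "Staff Directory",
        "button": "Search",
        "querytext": name,
    })
    # Supplying data makes urllib issue a POST; Python 3 wants bytes here.
    return Request(uri, data=params.encode("ascii"))


def search(name, uri=SEARCH_URI):
    """Send the request and return the response body as text."""
    with urlopen(build_request(name, uri)) as f:
        return f.read().decode("utf-8", errors="replace")


# Typical use from a script: print(search(" ".join(sys.argv[1:])))
```

Keeping the request construction separate from the network call means the encoding step can be checked without touching the server.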

Cool, I thought, this is hell easy. What a fantastic language; I will forever give up my perl ways if everything is this easy and obvious. Obviously this was not going to last, partly I guess because my brain meshes well with perl after so many years and I am used to perl associative arrays, classes, modules and regular expressions. Anyway, I now had my result from the search and all I had to do was parse it and extract the data in a form that can be printed nicely on a terminal.

First I tried python's regular expression matching and needed some hideous regexp to match the data returned. I also discovered that when a search matches more than about 2 people the data comes back in a different format. Fortunately that second format is really easy to match with a regexp. Even though the regexp language is similar or identical to perl's, I was still getting my head around the documentation and could not at first construct a sensible regexp to parse the first sort of data. So I decided to use an HTML parser and extract the data I wanted without the crap in the tags.

My first attempt used the HTMLParser module, however I soon found that it threw an exception whenever I fed it the page from the uni with the matches in it. I tried except: pass in the hope it would keep going, but parsing stopped there and the rest of the page was not processed. So I changed to htmllib.HTMLParser, which was almost identically easy to use and managed to process the entire page.
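
For anyone trying this today: htmllib is gone in Python 3, and html.parser.HTMLParser is the standard library module that remains. A minimal sketch of pulling cell text out of a results table, assuming Python 3 and a made-up page layout. The SAMPLE markup and the CellText class are hypothetical; they are not the ANU page or the script described here.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being read
        self._cell = None     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical markup, only to exercise the parser.
SAMPLE = """
<table>
<tr><td><b>A. Person</b></td><td>x1234</td><td>Building 1</td></tr>
<tr><td>B. Person</td><td>x5678</td><td>Building 2</td></tr>
</table>
"""

parser = CellText()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(row)
```

The three handler methods amount to the small state machine mentioned earlier: the parser only collects text while it is inside a cell.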

Next I wanted to store the data until all matches were found. In perl this would be trivial using a multiple level hash or an array of hashes. Of course the most obvious way to do this in python, now I think about it, is a list of dicts. However I had my brain stuck on using a multi level hash. I found this more difficult in python, as you need to initialise dict entries and cannot simply assign arbitrarily deep into them when you need to. I needed to use the following construct.

if key1 not in D:
   D[key1] = {}

if key2 not in D[key1]:
   D[key1][key2] = ''

s = D[key1][key2]
D[key1][key2] = s + data

Which is obviously a bit more verbose than the perl vernacular of $H{key1}{key2} = $s; I think it is a problem that dicts do not yet work this easily, however someone has assured me that future python releases will have dicts that work as easily as a perl hacker would expect. Anyway, rather than go on to the list of dicts that seems obvious now that I think about it, I was still stuck on the idea of using a pair of keys to access a value, so a tuple seemed the obvious key for keeping the data in a dict. However this meant that when I extracted the values I could not simply use len on the dict, as there is one entry per field rather than one per record and so it does not reflect the number of records.
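
For what it is worth, the standard library can already do most of this. A minimal sketch, assuming Python 2.5 or later for collections.defaultdict (the keys and data below are made up): missing entries are created on first access, which is close to what a perl hash does.

```python
from collections import defaultdict

# Outer dict creates an inner defaultdict on first access;
# inner dict creates an empty string on first access.
D = defaultdict(lambda: defaultdict(str))

# Hypothetical (record, field, text) fragments, as a parser might deliver them.
fragments = [(0, "Name", "A. "), (0, "Name", "Person"), (0, "Phone", "x1234")]
for key1, key2, data in fragments:
    D[key1][key2] += data      # no membership checks needed

# Plain dicts can do the same with setdefault and get, at the cost of some noise.
P = {}
for key1, key2, data in fragments:
    inner = P.setdefault(key1, {})
    inner[key2] = inner.get(key2, "") + data

print(D[0]["Name"])
print(P)
```

One thing to watch with defaultdict is that merely reading a missing key also creates it, so membership tests are safer with `in` than with indexing.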

Which of course was the perfect chance to go and learn how to use map and lambda in python. After all, I use map in perl often and it really is lovely to have functional capabilities in a language you program in. Using a number as one of the record keys I was then able to write constructs such as the following. (After refactoring to a list of dicts I no longer needed the high = expression and modified the second expression slightly.)

high = max (map (lambda k: k[0], D.keys()))

and

name, phone, address = map (lambda k: D[(i,k)],['Name', 'Phone', 'Address'])

The first finds the highest record number (and so the number of records) from the numeric key, and the second extracts the information I was interested in printing. The second form especially is one I use often in perl, with a [0..N] (or range(N)) sort of thing, to gather the results of multiple function calls into a list. Such as the perl expression

my @emails = map { $res->getvalue ($_,0); } (0..$res->ntuples-1);
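
The two python expressions can also be written without map and lambda. A small sketch with made-up records, showing the tuple-keyed dict next to the list of dicts mentioned earlier (the data and the fields, rows and records names are only for illustration):

```python
# Hypothetical data keyed by (record number, field name) tuples.
D = {
    (0, "Name"): "A. Person", (0, "Phone"): "x1234", (0, "Address"): "Building 1",
    (1, "Name"): "B. Person", (1, "Phone"): "x5678", (1, "Address"): "Building 2",
}
fields = ["Name", "Phone", "Address"]

# len(D) counts every (record, field) pair, so take the largest record number.
high = max(i for i, _ in D)

rows = []
for i in range(high + 1):
    name, phone, address = [D[(i, k)] for k in fields]
    rows.append((name, phone, address))

# With a list of dicts, len() is the record count and no max() is needed.
records = [dict((k, D[(i, k)]) for k in fields) for i in range(high + 1)]

print(len(D), high + 1, len(records))
print(rows[1])
```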

The final problem I had was printing the data. In perl and C I can do

printf ("%-20s %-12s %46s", name, phone, address)

However in python the string formatting in print did not justify or cut off arguments as expected. Also string.rjust and string.ljust did not limit the size of strings if they were larger than the field size. So I needed to do the following.

   print("%s %s %s" % (name[0:30].ljust(30),
                       phone.rjust(12),
                       address[0:45].rjust(45)))
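
As a sketch of an alternative: the % operator follows printf rules, where a width only pads and a precision (the number after the dot) is what truncates. Assuming the same column sizes as above and some made-up values:

```python
# Hypothetical values; the address is deliberately longer than its column.
name = "A. Person"
phone = "x1234"
address = "Room 101, Some Very Long Building Name, Some Long Road"

# "%-30.30s": left-justify in 30 columns and cut off after 30 characters.
# "%12s":     right-justify in 12 columns (no truncation requested).
# "%45.45s":  right-justify in 45 columns and cut off after 45 characters.
line = "%-30.30s %12s %45.45s" % (name, phone, address)

# The slicing-and-justifying version, for comparison.
manual = "%s %s %s" % (name[0:30].ljust(30),
                       phone.rjust(12),
                       address[0:45].rjust(45))

print(line)
print(line == manual)
```

Both versions produce the same fixed-width line, so which one reads better is a matter of taste.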

That final concern is not really a problem, and the result is arguably clearer about what is going on than the printf formatting a C programmer is used to. Anyway, if anyone who works at ANU wants to use this from a command line, or anyone wants to see it, I have it online for download/viewing. There may be a few places I can clean it up further, and the version online is stripped of comments. I can understand why people like the way python works. The code really is almost like pseudo code in many ways, and most of the time it works the way you expect it to. It is a little hard to wrap my perl oriented brain around, however I expect that will not take long to get past. Also, to anyone complaining about whitespace formatting in python: IMO you are deranged, needing to use whitespace for program layout really is not an issue.

[/comp/prog] link
