Steven email: sjh@svana.org
web: https://svana.org/sjh Other online diaries:
Aaron Broughton, Links:
Linux Weekly News, Canberra Weather: forecast, radar.
Categories:
|
Thu, 08 May 2008
Yet Another Cycling Jersey - feeding the addiction - 16:33
Move a little thing to python - 13:44
Sometime last year I realised that though the URL I was using on the ANU Internal Web still worked it seemed not to interface with the latest phone database for the uni so it sometimes did not match people I knew worked on campus, other times it contained out of date numbers for people. However there were other important uses for my time so I did not bother looking too closely into updating it when most of the time the old results were still good enough. Finally this week Bob noticed there were no matches coming back, it seems the old interface no longer connected to the database correctly. Thus I opened the program and had a look at updating it. The old program used LWP to fetch the page with a GET request. The newer interface now on ANU Web works properly with a POST request. Also the result page is more complex to parse than the old one (more complex regular expressions, or maybe a small state machine needed). Still it did not look too hard to spend an hour or so fixing the old perl code up to get the new page and parse it properly for the desired results. However I hit a snag when for some reason LWP did not fetch the entire result from the web server that was returning the data in chunks. A tcpdump session showed it simply closed the request rather then fetch all the data. At this point I could have debugged the perl code and fixed, after all there is no good reason LWP should not work. However I thought to myself, I have been keen to write python a bit for a while. Bob bought the Mark Lutz Programming Python book for my office and I read through about half of it. So why not rewrite the program in python. See how a perl hacker can transfer to using python at least for a small program. I am happy to say that the page fetching in python even made perl look complex, the code that did the job (and worked, doing a post request fine) was
name = ' '.join(sys.argv[1:]) params = urllib.urlencode({'stype': 'Staff Directory', 'button': 'Search', 'querytext': name}) f = urllib.urlopen(searchuri, params) r = f.read() Cool I thought, this is hell easy, what a fantastic language, I will forever give up my perl ways if everything is this easy and obvious. Obviously this was not going to last, I guess partly because my brain meshes with perl well after so many years, and I am used to perl associative arrays, classes, modules, and regular expressions. Anyway I now had my result from the search and all I had to do was parse it and extract a form that can be printed on a terminal nicely. First I tried using the python regular expression matching and needed to create some hideous regexp to match the data returned. I also discovered that when a search matches more than about 2 people the data is returned in a different format. Fortunately in this second case the format is really easy to match against with a regexp. Even though the regexp language is similar/identical to perl I was still getting my head around the documentation for all of what I was doing and could not at first construct a regexp that made sense to parse the first sort of data. So I decided to get a HTMLParser and extract the data I wanted without the crap in the tags. My first attempt was to use the HTMLParser module, however I soon found that this threw an exception when ever I fed it the page from the uni with the matches in it. I tried except: pass in the hopes it would keep on going, however it stopped there and did not process the rest of the page. So I had to change to using the htmllib.HTMLParser which was almost identically easy to use and managed to process the entire page. Next I wanted to store the data until all matches were found, in perl this would be trivial using a multiple level hash or an array of hashes. Of course the most obvious way to do this in python now I think about it is using a list of dicts. However I had my brain stuck on using a multi level hash. I found this was most difficult in python as you need to initialise dict entries and can not simply assign arbitrarily into them when you need. I needed to use the following construct.
if (D.has_key (key1) == 0): (D[key1]) = {} if ((D[key1]).has_key (key2) == 0): D[key1][key2] = '' s = D[key1][key2] D[key1][key2] = s + data Which is obviously a bit more verbose than the perl vernacular of $H{key1}{key2} = $s; I think that dicts do not yet work this easily is a problem, however someone has assured me that future python releases will have dicts that can work as easily as a perl hacker would expect. Anyway rather than next go on to the now obvious that I thought about it list of dicts I was still stuck on the idea of using a pair of keys to access some value, thus a tuple seemed obvious to store the data in a dict still. However this meant that when I extract the values from the dict I can not simply use len on the dict collection as it does not accurately reflect the number of records. Which of course was the perfect chance to go and learn how to use map and lambda in python, after all I use map in perl often and it really is lovely to have functional capabilities in a language you program in. Using a number as one of the record keys I was then able to have constructs such as (after refactoring to list of dicts I did not need the high = expression and modified the second expression slightly)
high = max (map (lambda k: k[0], D.keys()))and name, phone, address = map (lambda k: D[(i,k)],['Name', 'Phone', 'Address']) The first to find the number of records from the numeric key and the second to extract the information I was interested in printing. The second especially is often used in perl to extract matches with a [0..N] or range(N) sort of thing when you get things with multiple function calls into a list. Such as the perl expression my @emails = map { $res->getvalue ($_,0); } (0..$res->ntuples-1); The final problem I had was when printing the data, in perl and c I can do printf ("%-20s %-12s %46s", name, phone, address)However in python the string formatting in print did not justify or cut off arguments as expected. Also string.rjust and string.ljust did not limit the size of strings if they were larger than the field size. So I needed to do the following.
print "%s %s %s" % (name[0:30].ljust(30), \ phone.rjust(12), \ address[0:45].rjust(45)) That final concern is not really a problem, and arguably clearer as to what is going on than using printf formatting as a c programmer is used to. Anyway if anyone who works at ANU wants to use this from a command line or anyone wants to see it I have it online for download/viewing. There may be a few places I can clean this up better, and the version online is stripped of comments. I can understand how people like the way python works, the code really is almost like pseudo code in many ways, it does most of the time work the way you expect it to, it is a little hard to wrap my perl oriented brain around, however that does not take long to work around I expect. Also anyone complaining about whitespace formatting in python, IMO you are deranged, it really is not an issue needing to use whitespace for program layout. |