Only Python

Tuesday, January 17, 2006

Rur-ple update

As the few readers of this blog know, my programming work had stalled for a few months. However, I have resumed work on rur-ple over the last few weeks and some good news should come out soon.

I have revised all existing lessons and changed from using hard-coded html information (like font-colors and such) to using cascading style sheets. This will give me more freedom in trying various visual designs for the lessons. However, rur-ple uses wxPython, and its browser is based on wxHtml, which does not recognize style sheets, so I had to write a converter that takes an xHtml file with its associated .css and converts it into plain Html. This work has taken a few (long) nights to complete but it is now essentially done; I can finally resume writing new lessons.
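As a rough illustration of the kind of conversion involved (this is not the rur-ple converter itself), the sketch below inlines a small style table into old-style font tags that a renderer without css support can display. It is written for Python 3; the STYLES table, the inline_styles name and the restriction to non-nested span elements with a single class are all assumptions made for the example.

```python
import re

# Hypothetical style table standing in for a parsed .css file:
# class name -> font colour.
STYLES = {"keyword": "#0000ff", "comment": "#808080"}

SPAN = re.compile(r'<span class="(\w+)">(.*?)</span>', flags=re.S)


def inline_styles(xhtml, styles=STYLES):
    """Replace <span class="name">...</span> by <font color="...">...</font>.

    Spans whose class is not in the style table are simply unwrapped.
    Nested spans are not handled in this sketch.
    """
    def replace(match):
        color = styles.get(match.group(1))
        if color is None:
            return match.group(2)
        return '<font color="%s">%s</font>' % (color, match.group(2))

    return SPAN.sub(replace, xhtml)


print(inline_styles('<span class="keyword">def</span> turn_right():'))
```

A real converter would also have to read the .css file and deal with nesting, which is presumably where the long nights went.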

While revising the existing lessons, I wrote my own solutions to all suggested exercises. This led to some reorganisation of the material, some new minor additions and, more importantly, some deletions of less pedagogical examples.

I have also received some feedback (bug reports!) from a MacOS user. I had to make a few changes to the basic code and it should now be more robust on all platforms. Unfortunately, this also led me to change the default size on opening to 900x660; I just hope that this relatively large size (greater than 800x600) will not create problems for prospective users.

The lessons are now divided into 5 parts:

Welcome to Reeborg's world. This includes 9 lessons and introduces various robot commands.
Reeborg knows Python. 15 lessons, which include an introduction to the following Python keywords: def, if, else, elif, while, not, pass. Notions of "True" and "False" are also introduced.
Python the interpreter. This will be more akin to a "standard" introduction to Python, although with some side-excursions back into Reeborg's world. I have written a few lessons but they will need some polishing. There should be about 10-15 lessons in total covering numbers, strings, tuples, lists, dicts as well as the following keywords: print, for, in, from, import, return, and the pseudo-keyword "as". Variable assignment is also going to be covered in this part.
Learning about objects. This will include lessons going back and forth between "typical" Python examples and examples in Reeborg's world. I have written four lessons and plan for about a dozen. These lessons have been the most fun to write - hopefully, they will be fun to read as well.
Making games with Pygame. I got sidetracked with learning about Pygame a few months ago. I have first drafts of 7 lessons which were originally written independently of rur-ple and that I will have to integrate into it. I am planning to have about 20 simple lessons that will eventually lead to some separate lessons on "major projects".

I don't know if I will delay the release of version 1.0 of rur-ple until after these 5 parts (excluding the major Pygame projects) are completed. The next release will probably occur once I have completed the first three parts. Hopefully this will be done within a month. At that time, I should probably release a first version (0.1?) of my "xHtml/css to Html" converter.

Finally, as some users have suggested, I might investigate the possibility of turning the lessons into a book. However, I should probably get some more feedback from users before attempting to do so.

Saturday, January 07, 2006

Note to self: about site customization

This post is probably of no interest to anyone but myself. I just installed Python on a new Windows XP (English) machine, with André as my usual user name, and SPE wouldn't work, due to some encoding error. The solution (which I only found in Python in a Nutshell - and only because I vaguely remembered what to look for [other Python books did not cover that topic :-( ] ) was to create a file named sitecustomize.py in Lib with the following:


import sys
sys.setdefaultencoding('iso-8859-1')

I have to remember this...

Friday, January 06, 2006

117 Outdone by a reader!

Markus Schramm from Germany has left a comment on my post Journey to 117 showing how one could shave yet one more character from the 117 character solution to the PyContest challenge. So, 116 it is!

Wednesday, January 04, 2006

css, wxHtmlWindow and going back to RUR-PLE

About a month ago, my second computer (actually, my main computer as the kids had taken over the better one) died. I had saved the hard-drive and managed tonight to recover the data on it. I found the beginning of a "standard" tutorial (in French) that I had started to write so that my kids could learn Python. That was in August 2004. Shortly afterwards, I discovered Guido van Robot and got inspired to create RUR-PLE. I worked fairly steadily on it for a year and then, somehow, haven't been able to find the time to work on it for the past few months.

While I have been quite pleased with RUR-PLE in general, I have found one thing less than satisfactory: RUR-PLE uses wxPython and wxHtmlWindow to display its tutorial.

Unfortunately, wxHtmlWindow does not support style sheets, but only simple html tags. My French tutorial made use of all kinds of css tricks in an attempt to have a pleasant look. I had to give that up and hard-code font information in the tutorial that has been included with RUR-PLE.

I have found the result with wxHtmlWindow and hard-coded fonts to be somewhat disappointing. So, today I decided that I had to try and implement some work-around so that at least I could use style-sheets to colorize the source code and headings. I have made some limited progress and I hope to have something working soon, so that I can 1. change the existing lessons so that they use some (limited) styles instead of hard-coded font information, and 2. go back to writing some more lessons and get to version 1.0!

Note: I know of the existence of wxMozilla, but wasn't able to get it to work the last time I tried. RUR-PLE, which is geared towards beginners (or teachers!), already requires three separate installs (Python, wxPython, and itself!). I find this is already a bit much. Besides, if I haven't been able to get wxMozilla to work on my computer, how could I expect a complete beginner to do it?...

Saturday, December 31, 2005

Journey to 117

This blog entry is dedicated to Michael Spencer whose 120 character-long entry in the PyContest was submitted less than 90 minutes after the official beginning of submissions. Michael's entry gave a target to aim for which few people managed to reach. I strongly suspect that, in large part because of this, various hints have been posted on the web, enabling more people to slowly crawl towards ever-shorter solutions. I must admit that without the various hints I have read, I would never have been able to challenge Michael's solution.

After being asked a few questions here and elsewhere, I've decided that I should try and document the reasoning that eventually led me to my final solution to the PyContest. Of course, it is going to be more linear here (and with fewer dead-end explorations) than it was in real life. I definitely welcome comments from others' journeys.

First solution

After writing on paper the desired output for an input that included all digits, I quickly wrote down the following program to produce the desired output:


def seven_seg(i):
~~n=[[" _ ","   "," _ "," _ ","   "," _ "," _ "," _ "," _ "," _ "],
~~~~~["| |","  |"," _|"," _|","|_|","|_ ","|_ ","  |","|_|","|_|"],
~~~~~["|_|","  |","|_ "," _|","  |"," _|","|_|","  |","|_|"," _|" ]]
~~m=''
~~for j in 0,1,2:
~~~~for d in i:
~~~~~~~~m+=n[j][int(d)]
~~~~m+='\n'
~~return m
print seven_seg('0123456789')

In an attempt to preserve the visual appearance, I have replaced all the leading spaces by "~". In the future, whenever you see a "~", take it to be a space. After removing the print statement, the number of characters is 310, which is quite a ways from 117. One can reduce the number of characters by removing many unnecessary spaces but I will not do that until we are much closer to the "final" solution. Furthermore, I will keep the explicit for loops rather than using generators [with the ''.join() method] until the very end as I find the code easier to read that way. I will, however, move the [m=''] assignment within the function argument, thus removing a line (and at least two extra characters!) from the solution.

My next step was to look at the various "triplets" (three-character combinations) that appeared in "n". Quite a few are repeated. In fact, there are 7 unique triplets; they are:


"~~~", "~_~", "~_|", "|_~", "|_|", "~~|", "|~|"

With this ordering, and numbering the triplets from 0 to 6, each input digit can be described by the following three-number combination (top, middle and bottom row):
0=164, 1=055, 2=123, 3=122, 4=045, 5=132, 6=134, 7=155, 8=144, 9=142.

Note that the first digit of each triplet is either a 0 or a 1; this will be important later. Using these triplets leads to a second, shorter, solution:


def seven_seg(i, m=''):
~~n=["~~~","~_~","~_|","|_~","|_|","~~|","|~|"]
~~for j in '1011011111', '6522433544', '4532524542':
~~~~for d in i:
~~~~~~m+= n[int(j[int(d)])]
~~~~m+= '\n'
~~return m

Note that the string '1011011111' represents the first digit of each triplet, and that the order in which they appear is '0123456789'.
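The per-digit codes and the three row strings can be derived mechanically from the table used in the first solution, which is a handy way to check them. The following is a minimal sketch written for Python 3 (the code in this post is Python 2); the names ROWS, TRIPLETS, row_strings and codes are mine, not part of any submission.

```python
# Table from the first solution: one list per output row, one entry per digit.
ROWS = [
    [" _ ", "   ", " _ ", " _ ", "   ", " _ ", " _ ", " _ ", " _ ", " _ "],
    ["| |", "  |", " _|", " _|", "|_|", "|_ ", "|_ ", "  |", "|_|", "|_|"],
    ["|_|", "  |", "|_ ", " _|", "  |", " _|", "|_|", "  |", "|_|", " _|"],
]

# The seven unique triplets, in the order chosen in the text (indices 0..6).
TRIPLETS = ["   ", " _ ", " _|", "|_ ", "|_|", "  |", "| |"]

# One string per row: the triplet index used by each digit 0..9.
row_strings = ["".join(str(TRIPLETS.index(cell)) for cell in row)
               for row in ROWS]

# One three-digit code per input digit: read the row strings column-wise.
codes = ["".join(column) for column in zip(*row_strings)]

print(row_strings)  # ['1011011111', '6522433544', '4532524542']
print(codes)        # ['164', '055', '123', '122', '045', '132', '134', '155', '144', '142']
```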

By moving the body of the inner for loop (5th line) to the end of the 4th line, using a single space for the first indentation and a tab character (expanded by Python to 8 spaces) for the second level of indentation, and removing other non-required spaces, it is possible to reduce the length of the code to 171 characters -- still a long way from our final goal.

The next step is to eliminate the variable "n" altogether; we can do this in two steps.
The first step is simply to use the list as is, directly in the line where it is needed:


def seven_seg(i, m=''):
~~for j in '1011011111', '6522433544', '4532524542':
~~~~for d in i:
~~~~~~m+= ["~~~","~_~","~_|","|_~","|_|","~~|","|~|"][int(j[int(d)])]
~~~~m+= '\n'
~~return m

The second step is to try to replace the list of strings by a single string and extract, when assigning, a 3-character substring. The shortest such string that we can construct is 14 characters long. There are many such strings; here is one of many examples:


'~_~_|_~~~|~|_|'
~01234567890123

Using this string, we see that the last 3-character substring, '|_|', starts at index 11. To encode the position of each beginning index, we would need at least 2 digits for some. A better way appears to be to lengthen the string a bit so that each required 3-character substring starts on an even index -- the last character of one substring being the first character of the next. Here is one such solution -- which, other than for some unnecessary spaces, is the exact first solution that I submitted at the exact opening of the contest at 14:00 UTC; for one minute it was in first place. :-)


def seven_seg(i, m=''):
~for s in '5455455555', '2600133611', '1630601610':
~~for k in i:
~~~~j=int(s[int(k)])*2;
~~~~m+='~_|_|~|_~~~_~~|'[j:j+3]
~~m+='\n'
~return m
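The claims about the two strings can be checked with a few lines. This is a small sketch for Python 3; the helper name start_indices is hypothetical, and "~" is replaced by a real space before searching.

```python
TRIPLETS = [t.replace("~", " ") for t in
            ("~~~", "~_~", "~_|", "|_~", "|_|", "~~|", "|~|")]


def start_indices(segments):
    """Map each triplet to the first index where it occurs (-1 if absent)."""
    return {t: segments.find(t) for t in TRIPLETS}


short = "~_~_|_~~~|~|_|".replace("~", " ")    # the 14-character string
padded = "~_|_|~|_~~~_~~|".replace("~", " ")  # the 15-character string

short_idx = start_indices(short)
padded_idx = start_indices(padded)

print(sorted(short_idx.values()))   # [0, 2, 4, 6, 7, 9, 11]
print(sorted(padded_idx.values()))  # [0, 2, 4, 6, 8, 10, 12]
```

With the padded string every start index is even, so index/2 fits in a single digit from 0 to 6, which is what the three long numbers in the submission store.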

Not having done any programming for a while, I didn't realise at the time that 'some_string'[j:j+3] could be written as 'some_string'[j:][:3], thereby saving one assignment (to j) and some precious characters all around.
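A quick check of that slicing identity, as a sketch (the variable name segments is mine). Note that the colon after the index matters: indexing with [j] alone returns a single character.

```python
segments = "~_|_|~|_~~~_~~|"

# Slicing from j to the end and then keeping 3 characters gives the same
# result as slicing j:j+3, without having to name j twice.
for j in range(0, len(segments), 2):
    assert segments[j:j + 3] == segments[j:][:3]

three_chars = segments[4:][:3]   # '|~|'
one_char = segments[4][:3]       # '|', not what we want
print(three_chars, one_char)
```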

The next step is to try to shorten the three long numbers: '5455455555', '2600133611' and '1630601610'. This can be done by encoding them in a base other than 10 and extracting the result when needed. A better way however is to re-arrange the numbers as a series of triplets and then encode the result. For example, the three long numbers just mentioned can be re-ordered as follows:


0=521, 1=466, 2=503, 3=500, 4=416, 5=530, 6=531, 7=566, 8=511, 9=510

This gives rather large integers (ex: 521) to try to encode in a base different from 10. A better choice would be to use the ordering


0=164, 1=055, 2=123, 3=122, 4=045, 5=132, 6=134, 7=155, 8=144, 9=142

that we had mentioned earlier, as it gives smaller integers. Now, Python's built-in function int() allows conversion to base 10 from other bases up to 36. Still, that would require a 2 character encoding from each triplet. A better choice is to take each triplet to be the decimal representation of an ascii character. As each triplet is a number less than 255, this is certainly possible (barring some unfortunate coincidence when a given triplet would correspond to a newline character.)

Actually, since none of the individual digits in the triplets is greater than 6, we can take them to be numbers in a base less than 10; from what I gather, most people chose 8 (as they try to do stuff with bit shifts and the like) and I chose 7 (for one of my submissions). Thus, a number like 164 is taken to mean

164 (in base 7)= 1*49 + 6*7 + 4 = 95 (in base 10). This will help further reduce the number of characters. Let's assume that everything is correct with these choices (i.e. still no newline character) and that the encoding of our triplets can be represented as


0=z, 1=o, 2=t, 3=T, 4=f, 5=F, 6=s, 7=S, 8=e, 9=n

where I have used the first letter of each number as a representation of its encoding.
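To make the base-7 idea concrete, here is a small sketch (Python 3) that turns each three-digit code into a single character and checks that none of them is a newline. The codes are recomputed from the table in the first solution; the variable names are mine.

```python
# Three-digit codes for the digits 0..9 (top, middle, bottom triplet index).
codes = ["164", "055", "123", "122", "045", "132", "134", "155", "144", "142"]

# Read each code as a base-7 number, e.g. 164 -> 1*49 + 6*7 + 4 = 95.
values = [int(code, 7) for code in codes]
encoded = "".join(chr(v) for v in values)

print(values[0])  # 95
print(encoded)    # _(BA!HJYQO
assert "\n" not in encoded
```

With this particular ordering every encoded character happens to be printable; a different ordering of the triplets gives a different string.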

As an aside, some useful comments about this (and other aspects of this problem) can be found on the following two blog entries (including comments):

Guyon Morée's blog

Roberto Alsina's blog

The corresponding program, still written with the explicit for-loops, can be written schematically as follows:


def seven_seg(i, m=''):
~for a in 49, 7, 1:
~~for k in i:
~~~~m+='~_|_|~|_~~~_~~|'[ord("zotTfFsSen"[int(k)])/a%7*2:][:3]
~~m+='\n'
~return m

Let's examine what the innermost loop does. We convert each character in the input string into a digit, extract the corresponding ascii-encoded triplet, decode it into an actual number, divide by the appropriate power of our base and take the modulo in that base, thus extracting the relevant digit in the triplet; the result is multiplied by 2 to find the relevant index in our string of "~|_". So you can see the advantage of using a base less than 10 as it takes fewer characters to write.
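The program above is schematic, so here is a runnable variant for Python 3 (where integer division is written //). It is a sketch under my own assumptions rather than any of the submissions: it uses a 15-character string, chosen for this example, in which the two top-row triplets sit at indices 0 and 2 so that every code stays below 256, and it builds the encoded string with a hypothetical helper named encode instead of pasting it in.

```python
ROWS = [
    [" _ ", "   ", " _ ", " _ ", "   ", " _ ", " _ ", " _ ", " _ ", " _ "],
    ["| |", "  |", " _|", " _|", "|_|", "|_ ", "|_ ", "  |", "|_|", "|_|"],
    ["|_|", "  |", "|_ ", " _|", "  |", " _|", "|_|", "  |", "|_|", " _|"],
]

# Every triplet starts on an even index; "~" stands for a space.
SEGS = "~~~_~~|_~_|~|_|".replace("~", " ")


def encode(rows=ROWS, segs=SEGS, base=7):
    """Pack the three triplet numbers of each digit into one character."""
    chars = []
    for column in zip(*rows):            # one column per digit 0..9
        value = 0
        for cell in column:
            value = value * base + segs.find(cell) // 2
        chars.append(chr(value))
    return "".join(chars)


def seven_seg(digits, encoded, segs=SEGS):
    out = ""
    for power in (49, 7, 1):             # top, middle, bottom row
        for d in digits:
            # Decode one base-7 digit, then double it to get the index.
            index = ord(encoded[int(d)]) // power % 7 * 2
            out += segs[index:][:3]
        out += "\n"
    return out


ENCODED = encode()
print(seven_seg("0123456789", ENCODED))
```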

To make the solution as short as possible, we rewrite it as a lambda and use join() and generators instead of explicit for loops. This also eliminates the use of an extra variable ("m"). The result can be written as:


j=''.join
seven_seg=lambda i:j(
j('~_|_|~|_~~~_~~|'[ord("zotTfFsSen"[int(k)])/a%7*2:][:3]for k in i)
+'\n' for a in(49,7,1))

When all the extra spaces are removed, this gives a 121-character long solution. I'll refer you to the already mentioned blogs for a 120 character long solution!

To get to 117 characters, we need a slightly different approach.

We need to go back to some assumptions we made earlier. Do we really need to have even-numbered indexes? If not, we could use a shorter "~|_" type string which included all relevant 3-character substrings, thereby possibly saving one character. Let us look at such a possible string and the various indices. I will take the string of my posted 117 character long solution '~_~~~|_|_~_|~|', and extract the location of the indices of the relevant substrings:
"~~~":2, "~_~":0, "~_|":9, "|_~":7, "|_|":5, "~~|":3, "|~|":11.
With these 3-character substrings, our triplets are:


0=0-11-5, 1=233, 2=097, 3=099, 4=253, 5=079, 6=075, 7=033, 8=055, 9=059

For each triplet, instead of encoding it in a given base, we will look at a number which, when taking the modulo with three different bases, will give us back the relevant digit. By this, I mean something like the following:


165%3 = 0
165%14 = 11
165%10 = 5

i.e., the triplet for "0" could be represented by the integer 165 when taking the modulo with 3, 14 and 10 respectively - as I have done in the 117 solution. Using brute force, which is what one does when one is too tired, one can loop over the integers less than 255 and find out if all the relevant triplets can be found. Or, you notice that 11*5*3 = 165 [11%14=11, 5%10=5, 3%3=0], and you proceed from there. With this information (and the comments on the previous blog entry), you should now have all the relevant information to understand how this solution was obtained.
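The brute-force search mentioned above can be sketched as follows (Python 3). The function name candidates and the constant names are mine; the string and the three moduli are the ones from the 117-character solution.

```python
ROWS = [
    [" _ ", "   ", " _ ", " _ ", "   ", " _ ", " _ ", " _ ", " _ ", " _ "],
    ["| |", "  |", " _|", " _|", "|_|", "|_ ", "|_ ", "  |", "|_|", "|_|"],
    ["|_|", "  |", "|_ ", " _|", "  |", " _|", "|_|", "  |", "|_|", " _|"],
]
SEGS = "~_~~~|_|_~_|~|".replace("~", " ")   # "~" stands for a space
MODS = (3, 14, 10)                          # top, middle, bottom row


def candidates(column, segs=SEGS, mods=MODS):
    """All byte values whose remainders give the three start indices."""
    target = tuple(segs.find(cell) for cell in column)
    return [x for x in range(256)
            if tuple(x % m for m in mods) == target]


all_candidates = [candidates(column) for column in zip(*ROWS)]
for digit, found in enumerate(all_candidates):
    print(digit, found)
```

Since the remainders repeat every 210 (the least common multiple of 3, 14 and 10), a digit whose smallest candidate is below 46 has a second candidate below 256; for the digit 7, for instance, the search finds 213 as well as 3.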

On that final note, I wish you a very Happy New Year!

Friday, December 30, 2005

pyContest challenge: 117 character-long solution

It has been quite a while since I did any programming; the absence of posts on this blog (nothing since the end of July) is a reflection of this. Still, with a welcome break during the Xmas holidays, I got spurred back into action with the pyContest challenge (http://www.pycontest.net/). It has been fun.

On to serious business. A 117 byte solution to the challenge is given by:


j=''.join;seven_seg=lambda z:j(j(' _   |_|_ _| |'
[ord("\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f"[int(a)])%u:][:3]for a in z)
+"\n"for u in(3,14,10))

where I have written the encoded string in a way that blogger would not complain about and added some line breaks as blogger breaks it up at inappropriate places. Last night, I couldn't figure out how to generate the string with the \x03 character included. (Today, after the challenge was over, I simply did print chr(3) in a Python interpreter and cut and pasted the appropriate character in the string.) My attempts at generating the encoded string and printing to a file with the \x03 character included always resulted in an empty file.

So, what I did was to subtract one from each character before writing to the file ... and then change "ord" to "-~ord" to shift back the values to the right ones ... but thereby adding two extra characters. As this was already a solution shorter than those posted, I decided it was time to give up and try to get some much needed sleep.
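The "-~ord" trick relies on the fact that, for integers, ~x equals -x-1, so -~x equals x+1. A minimal sketch of the shift and its undoing, written for Python 3 (the names shifted and restored are mine):

```python
dec_list = (165, 143, 177, 219, 173, 189, 105, 3, 75, 159)

# Subtract one from every value before writing the string out ...
shifted = bytes(v - 1 for v in dec_list)

# ... and apply -~ when reading it back, which adds the one again.
restored = [-~b for b in shifted]

print(restored == list(dec_list))  # True
print(3 in shifted)                # False: the \x03 byte is gone
```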

Note that the string ' _   |_|_ _| |' (14 characters) is shorter than the "usual one" where the 3-character strings start on an even index.

Here's how to generate the encoded string (with 33 replacing 3; replace with cut-and-paste after):

dec_list=(165,143,177,219,173,189,105,33,75,159)
encoded = ''.join(chr(i) for i in dec_list)

These decimal numbers encode the following:
>>> 165%3
0
>>> 165%14
11
>>> 165%10
5

Thus the number '0' is represented by (0, 11, 5) which are the beginning indices of 3-character strings in ' _   |_|_ _| |'. Thus,
'0' = (' _ ', '| |', '|_|').
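For readers on Python 3, where indexing a bytes object already yields integers, the same idea can be written as below. This is an un-golfed port for illustration, not the contest entry; the constant names are mine.

```python
# The ten encoded values, including the troublesome \x03 for the digit 7.
ENCODED = b"\xa5\x8f\xb1\xdb\xad\xbdi\x03K\x9f"

# 14-character segment string; "~" stands for a space.
SEGS = "~_~~~|_|_~_|~|".replace("~", " ")


def seven_seg(digits):
    # For each row, the remainder modulo 3, 14 or 10 is the start index.
    return "".join(
        "".join(SEGS[ENCODED[int(d)] % m:][:3] for d in digits) + "\n"
        for m in (3, 14, 10)
    )


print(seven_seg("0123456789"))
```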

I was going to wait until the official announcement to post this but thinking of how curious I was earlier on to eventually find out what I thought would be the winning solution (120 character), I decided it was time to do it.

Note: For a more detailed explanation, see the next entry, Journey to 117.

Friday, July 29, 2005

Poor man's i18n

Internationalization (i18n) of a [Python] program appears to be a daunting task. After looking on the web, I haven't been able to find a simple tutorial explaining the steps to be taken; the closest that I have found is the wxPython one, on the wxPython wiki. The procedure described appears to be rather involved, even if one only deals with translations of text (and not localization of dates, currencies, etc.), which is what I will focus on.

The first step is the replacement of all strings of the form:
"This needs to be translated"
by the following call (interpreted to be a C macro in GNU's gettext)
_("This needs to be translated")
which is very simple.

The standard way then requires the use of gettext and results in the creation of ".pot" files (Portable Object Templates) to be copied and translated in ".po" (Portable Object) files by a human translator; these are then converted into ".mo" (Machine Object) files by a compiler. Yet a few more steps, mostly dealing with directory structures and setting up locales, are needed before one can conclude the process.

I present here a simpler way to proceed which, at a later time, can easily be converted to the standard gettext method as it basically uses the same "_()" notation required by gettext. This was inspired by a comp.lang.python post by Martin v. Löwis to a question I asked almost a year ago.

Consider the following simple (English) program [Note: "." are used to indicate indentation as Blogger appears to eat up leading spaces]

def name():
....print "My name is Andre"

if __name__ == '__main__':
....name()

The French translation of this program would be

# -*- coding: latin-1 -*-
def name():
....print u"Je m'appelle André"

if __name__ == '__main__':
....name()

Without further ado, here's the internationalized version using a poor man's i18n method and demonstrating how one can easily switch between languages:

from translate import _, select

def name():
....print _("My name is Andre")

if __name__ == '__main__':
....name()
....select('fr')
....name()
....select('en')
....name()

The key here is the creation of the simple translate.py module:

__language = 'en'  # leading double underscore to avoid namespace collisions

def select(lang):
....global  __language
....__language = lang
....if lang == 'fr':
........global fr
........import fr
def _(message):
....if  __language == 'en':
........return message
....elif  __language == 'fr':
........return fr.translation[message]

together with the language-specific fr.py module containing a single dictionary whose keys are the original English strings:

# -*- coding: latin-1 -*-

translation = {
"My name is Andre" : u"Je m'appelle André"
}

That's it! Try it out!

In conclusion, if you want to make your programs translation friendly, all you have to do is:

replace all "strings" by _("strings")
include the statement "from translate import _, select" at the beginning of your program.
create a file named "translate.py" containing the following:

def select(lang):
....pass

def _(message):
....return message

and leave the rest to the international users/programmers; you will have done them a huge favour!