Friday, October 31, 2008

Efficient encoding of URL arguments

I would like to have URLs which look like:
"http://web.host/queryhandler/Tnidf3445ssx34" or something similar. That last part should be some reasonably space-efficient encoding of something like "SELECT * from FOO where CONDITION"

I have found I can get part-way using Python's encode and decode functions, e.g.
encoded = "SELECT * from FOO where CONDITION".encode("hex")
decoded = encoded.decode("hex")

However, the resulting values are still a bit long. This method quite nearly removes nonprintable characters, spaces etc from the URL. However, I have this nagging feeling that I could be doing some compression along the way to make the encoded part shorter. For example, HEX uses only part of the alphanumeric space. I'd like to use the digits 0-9 and letters a-z and A-Z ideally.

Let's just make this clear -- for no very good reason that I can justify, I want to encode a query as the shortest possible string using printable, alphanumeric characters (and no spaces) and be able to simply decode the string also.

"The Internet" has told me about md5 hashing, but I need a two-way function here. The help on string.encode in Python simply says I can specify the name of any encoder, but I can't find a list of the default encoders available.

Any ideas, internet?

Cheers and thanks in advance,
-T