This part is less about number bases and more about optimizing the code I did to handle number bases.
When I first got the code running, it was building two dictionaries every time it was called – one that mapped the characters to their values and another that mapped the values to their characters. This was fine at the time, but I saw that it was slow:
>>> speedTest() calling baser 131,071 times... 6 minutes, 27.36 seconds
(Here is what the speedTest() is – the Timer def it refers to is the DeltaT mentioned here)
def speedTest():
from sys import maxunicode as maxU
from time import time
b=Baser()
print 'calling baser {:,} times...'.format(2*maxU+1)
t1=time()
for i in range(-maxU,maxU+1):#builds largest str, dict on first call
I=i
if i in [-1,1]:I=2
b.Rebase('This is a test of a bunch of things',244,I)
print Timer(time()-t1)
It was slow, but it gave an error that I wanted to have:
>>> b.Rebase('this is a test',12,11)
KeyError: u't'
This error happened when the Rebaser was given a string that contained a character that has a higher value than the base. It meant that you gave it a number that was invalid for the given base. Base 12 above would only have the characters 0123456789AB, not any of the characters in the string ‘this is a test’; so it’s invalid for base 12.
I could’ve just left it at that, but I really wanted to speed it up. So I changed it so that the code only built the two mapping dictionaries when it needed to. So if you gave it a base that was larger than what it currently had mapped, it would map new characters to values and store the updated dictionaries – if you gave it a base smaller than what was currently mapped it would use the dictionaries it had on hand, no need to rebuild them since they contained everything needed for that base.
>>> speedTest() calling baser 131,071 times... 11.46 seconds
This was much much faster. But it came at the price of removing that error. Now you could give the Rebaser something invalid and it wouldn’t have an issue with it:
>>> b.Rebase('This is a test',12,11)
'930734A70932310'
This isn’t something I wanted to happen; I wanted to make sure that if the user mistakenly put in an invalid number for a base, the code would give an error instead of returning something that was wrong.
So my first thought was to “split” the dictionaries that were already made into smaller ones just for the base that was given. But you can’t split a dictionary into a smaller dictionary. A dictionary in Python is an unsorted set, whereas a list in Python is a sorted set. A list you can split into smaller pieces – like a stick. When you break a stick in half you know what you’ll get: two smaller sticks. But splitting a dictionary would be like splitting a box full of Christmas ornaments (the delicate ones) as you’re carrying it up into the attic. You’d just have a mess all over the stairs and the floor. As such you can’t split a dictionary (all those 1’s and 0’s spilling all over your computer’s RAM would be mayhem). The best you can do is build a smaller dictionary out of the larger one – like pulling some items out of the larger box and putting them into a smaller box. So that’s what I did; it looked like this:
{i[0]:i[1] for i in self._DictB.items() if i[0]<B}
But this turned out to be worse than building two dictionaries every time the Rebaser is called.
>>> speedTest() calling baser 131,071 times... 1 hour, 18 minutes, 48.77 seconds
Much worse. But it could give me the error I wanted!
This clearly wasn’t the way to go. I ended up having the dictionaries being built as needed, but explicitly checking to see if there weren’t invalid characters in the input.
>>> speedTest() calling baser 131,071 times... 14.37 seconds
Success! And as an added bonus for when checking for invalid characters I could have the error tell you exactly why the code stopped:
>>> b.Rebase('testing',12,11)
TypeError: Unsupported character for base 12: t
So to recap, I started out with code that took an average of 2.95 milliseconds to process and gave an unhelpful, vague error on invalid inputs. When I finished I had code that took an average of 0.1 milliseconds to process and gave an obvious, helpful error on invalid inputs. I think it turned out pretty well.
But it did take three weeks.
Leave a comment