Problem: How do you sort and store 1 million 8 digit numbers into 1 MB of ram
Initial thoughts:
7 digit number = 0-99,999,999 or in binary 0-0b101111101011110000011111111 (27 bits)
storing a million numbers as ints(32 bits) would require...
32 bits * 1 million = 3.81469727 megabytes
So some sort of compression is going to be needed. If you rearrange the formula above and solve for the number of bits required for roughly 1 MB of Ram you get the following.
1 million * 8bits = 976.5625 kilobytes
so we have roughly 1 byte to store a single number 2^8=256 The problem with binning is that you would have to have a whole lot of bins.
27bits-8bits = 19bits worth of bins 2^19 = 524288.
Worse than that is the average number of numbers per bin
1 million / 2^19 = 1.907 numbers per bin.
Due to the random variability of the input it is almost garunteed that the bins will not have an equal number of numbers per bin even if the bins had only 3 spots in them that is 3*2^19*8 bits = 1.5 MBs. This is also a mathematical statement of how dense the numbers are. There are 99,999,999 possible numbers so the average spacing between the numbers is 99. Maybe a better way of storing the numbers is not by bins and the absolute value of the numbers but by their relative spacing from one another.
Differential Encoding
Take for example an "array" with a couple numbers in it. Let's say we shall use the numbers...
5,10,30,45
If we were to store them as just integers we would need 32bits per int. in stead of representing their total value lets use the spacing between them instead.
0 |==5==| 5 |==5==| 10 |==20==| 30 |==15==| 45
we can now repack our numbers as the difference between our original numbers
5,5,20,15
While at first glance it may seem as though we haven't really made our numbers any easier to store the difference becomes more apparant with larger numbers
==| 1111111111105 |==5==| 1111111111110 |==20==| 1111111111130 |==15==| 1111111111145
Translates
1111111111105,1111111111110,1111111111130,1111111111145
Into
~,5,20,15
This is a great way to pack densely spaced numbers together. There are several other neat little advantages to this type of encoding. First and foremost is that there is almost no memory overhead involved with storing the data structure. It's only a flat array with no trees or anything like that.
I had to make a couple tweaks to our model to make it work on our system. First off we can't use variable length integers since that would require extra space to store the size of the integer. We only have 8 bits to store our 27 bit number but the problem with 8 bit numbers is that would set a maximum distance between numbers as being only 255. Obviously there are going to be cases where the distance between adjacent numbers is going to be significantly more than 255 (since the max size of our numbers is 9999999). We get around this by setting a rule where if the current by is 255 it doesn't contain a number.
This is our first major limitation of this type of encoding. If the numbers inserted into the array are greater than 255 we have to waste bytes as "spacers". Fortunately we know from one of the above calculations that there should be roughly 1 number per 99 worth of range.
Another limitation of this type of encoding could be the limit of how large of a number we could theoretically represent. a quick calculation shows that the max number that can be represented by a million bytes of our encoding is....
1 million * 255 ~ 255,000,000 max range that can be represented by an array like this.
but that is with an unoccupied array. What about an array where there is an average of 99 between the numbers?
1 million * 99 ~ 99,000,000
That's cutting it close but it should work.
Recursion vs. Loops: so I've run into performance problems/scaling problems. While my simple recursive version of the insert function worked for a small number of bytes it didn't scale much beyond that. First being that using recursive functions causes stack overflows.
A recursive function is a function that calls it self such as seen below.(note that it calls itself)
RecursiveFunction(){
//do something
RecursiveFunction();
}
The problem with this type of a function is that the compiler has to keep track of all active function calls so if our function calls it self a million times the compiler has to keep track of 1 million active function calls, which it can't keep track of so it crashes. So first step was to replace those loops with for loops. this replaces the recursive call with a repeating loop so that no extra calls are made.
LoopFunction(){
Loop{
//do something
}
}
Bins
Next Problem I'm facing is that insertions take ~O(n). Which is fine with a couple thousand bytes but in the array but not when there's a million or so bytes insertions will take forever. Currently with a million byte array and 10,000 numbers to add it take's ~42 seconds. So to sidestep this dilemma I'm going to use bins to break the array into chunks (so much for saying bins wouldn't work). so I'm just going to say that I'm going to want bins with roughly 50,000 numbers in them each. so
1,000,000/50,000 = 20 bins
|=========================|
| one giant array |
|=========================|
gets turned into....
|======|======|======|======|
| array | array |array | array |
|======|======|======|======|
Multithreading : So performance is still a problem. Inserting a million numbers into a ~million byte array takes~6 minutes so I've decided to multithread the code. Clearly any microcontroller with only 1 meg of ram is going to have a few cores laying around to speed things up. The reason for leaving this optimization for near last is that it is tricky to get right and oftentimes extremely difficult to get right. The trickiest part of getting multithreading working is to keep all the workers from stepping on each others toes. It is really easy to get into a situation where one thread is modifying a value that another is working on or reading. Fortunately our decision to use bins earlier makes this a whole lot easier. Each bin is fully independent of the other bins so each thread can be assigned to a single bin and there's no worries that one thread is going to get in the way of another.
I've implemented this in code by having one thread distribute numbers to the bins via a series of queues. once numbers start arriving in the queues the bin thread takes the number and inserts it into its own array.
|incoming numbers
|
|
|=thread distributing numbers=|
|| || || ||
|| || || || Queues
|| || || ||
|======|======|======|======|
| array | array |array | array |
|======|======|======|======|
This approach also allows the program to gracefully scale up to the number of bins(currently 40).
If you are feeling adventurous you can download and run the program (it's an exe (I know))
Currently on my 4 core 3.0ghz AMD Athlon x4 I can insert 1 million ints in 2:07.2802065.
https://docs.google.com/open?id=0BzYhy424geZ2RHZzenVxRnJBWm8
edit: Unfortunately the forum software mangled my pretty asci art. But I think the point still gets across.
Final Output From The Program
Creating new Array
#threads used: 40
added: 0
added: 50000
added: 100000
~~~~~~~
added: 900000
added: 950000
waiting for thread to finish
finished sorting
Time elapsed: 00:02:07.2802065
numberofints: 1000000
Total Memory in kbytes: 1113
finished testing