## Friday, January 16, 2015

### It is by caffeine alone I set my mind in motion...

 “It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shakes, the shakes become a warning. It is by caffeine alone I set my mind in motion.” - The Programmer's Mantra
In my previous post regarding MathLibrary.cmd, I pointed out that division was well-night unusable - dividing a 101-digit number by itself took over 22 minutes. This has led me, in fits and spurts, to find some ways to improve performance and streamline the code since I first started working on this over a month ago; unfortunately, most of my results had only provided incremental improvements at best.

I made a mistake yesterday that changed all that.

While fixing my subtraction code, which actually had a fairly subtle error that produced extra significant digits if the answer had fewer digits than either the minuend or the subtrahend, I accidentally created some code with an O(n^2) algorithm, which produced performance results that looked an awful lot like the results I was getting with my division algorithm. This led me to wonder if perhaps that might be the key to understanding where my problem was - was I iterating through my string, character by character, so often that I was killing my code? Or was something else holding it back?

The answer to both questions turned out to be yes.

First, a bit of technical background. When I first started working on MathLibrary.cmd, I explicitly wrote it in mind with the idea that I'd copy and paste blocks of it as needed for my Project Euler solutions. For example, if I was working on a problem that only needed addition, I'd just copy the part of the script that dealt with adding large numbers (:ExtAdd), along with whatever dependencies (:ExtMatchPad, :ExtDim). Consequently, I specifically built each portion of the script with the assumption that it would run independently. Meanwhile, in order to facilitate the elementary school style math (e.g. iterate through each digit of a large number, do some math against it, carry it to the next one, etc. - we're talking pre-Common Core algorithms here), I had to make sure that both numbers were more or less identically sized - otherwise I'd be adding, say, 9 to absolute emptiness, which would lead to all sorts of unpleasantness. So, to accommodate that, I'd get the length of each number string, then append 0's in either direction as needed.

The mistake I made yesterday was that I got the length of my number string over and over and over again.

Well shoot - why do that? What if I could get it once, store it, pass it to each subroutine, and have it arithmetically manipulated as needed instead of manually checking the length of the string each time? So, last night, that's what I tried - and it worked! I got about a 50% performance improvement from my script by doing that.

Not bad - but still not good enough.

Trouble is, 50% of 22 minutes is still over 10 minutes, which is just not practical if I need to run repeated division operations. I didn't need to drop my division runtime by 50% - I needed to drop it by at least an order of magnitude. What else could I try?

This morning, I had a rather large, rather strong cup of gas station coffee. A bit later, I then had a flash of inspiration. Since multiplication is slow, I smartly decided when I first wrote the division algorithm to do it as rarely as possible - I'd multiply the divisor from 1 to 10, store the values in an array, then compare against those pre-calculated values to find the answer.

What if I didn't multiply at all?

What I was really trying to do was add the divisor to itself ten times, and addition is way faster than multiplication. So, what if I just did that instead?
Testing division operations...
/01 OK 14:48:57.25 14:48:56.64 0:0:0.61
/02 OK 14:48:58.00 14:48:57.26 0:0:0.74
/03 OK 14:48:58.94 14:48:58.01 0:0:0.93
/04 OK 14:49:00.08 14:48:58.94 0:0:1.14
/05 OK 14:49:01.40 14:49:00.09 0:0:1.31
/06 OK 14:49:03.71 14:49:01.41 0:0:2.30
/07 OK 14:49:07.30 14:49:03.72 0:0:3.58
/08 OK 14:49:17.23 14:49:07.32 0:0:9.91
/09 OK 14:49:36.92 14:49:17.24 0:0:19.68
/10 OK 14:49:56.90 14:49:36.94 0:0:19.96
Ah - much better. Now, instead of taking over 20 minutes, division takes less than 20 seconds. Much more livable, especially since this means I can now meaningfully check and troubleshoot it and find little issues like this:
MathLibrary.cmd 1 / 0.5
10.000000000000
Oops - well, looks like I still have some work to do.