Saturday, January 24, 2015

Make Friends with the personal - not the political

Moloch by Karl-Ludwig Poggemann is licensed CC BY 2.0.

When I first ran across Slate's moralist harangue against Friends yesterday, my initial thought was, "Ah, this is obvious clickbait - pick something relatively non-offensive, choose a contrarian position against it, then let the clicks of outrage flow in". It looked like a classic toxoplasma, an intentional summoning of Moloch himself - intentional enough, in fact, that I assumed everyone else would see through it for what it was and let it silently die into obscurity.

I was wrong.

In my defense, I travel frequently in libertarian circles, where this sort of thing is such a predominant business plan among several notorious bloggers that even sympathetic libertarians are getting sick and tired of it - so what seemed obvious to me might be less so to others. Even so, I didn't seriously expect the idea that Friends needed some sort of moral and political deconstruction to receive anything resembling widespread support. And yet so many bloggers pursued this line of reasoning that this was the most "positive" conclusion Salon could finish its roundup with:
As Vulture’s Margaret Lyons wrote in her “Stay Tuned” TV advice column, in response to a reader expressing discomfort with the show’s homophobia: “You can still love ‘Friends,’ but why would you want to love it like you did before? Love it the way you see it now, with the things you know now and the values you have now. I love ‘Friends,’ but I do not love its body or queer politics. Those things can be true at the same time.”
Indeed they can, assuming all entertainment and life experiences must be filtered through a political or ideological lens. That, however, is a very bad idea - and a very old one.


For whatever reason, I've been on a Soviet history reading kick lately. This has included Red Plenty, which came to my attention via Scott Alexander, and my current read, Anne Applebaum's Gulag: A History. One recurring theme of these books is how thoroughly the Soviet state and its ideology inserted itself into daily life, from scientific discovery to entertainment and beyond. From the Soviet Union's perspective, the New Soviet Man had to be consciously crafted - all bourgeois attitudes and mannerisms had to be cast out and replaced with a properly socialist outlook. According to Soviet dogma, though it was never phrased quite this succinctly, the personal was political - as Trotsky put it, "You may not be interested in the dialectic, but the dialectic is interested in you." In order for the New Soviet Man to overcome his selfish, bourgeois habits, he needed to be reeducated so that he might better serve his fellow man according to his ability without requiring more than his need.

To that end, artistic expression was tightly policed - so tightly, in fact, that the CIA secretly sponsored abstract art to undermine the Soviet government. Soviet biology, meanwhile, was thoroughly destroyed by Lysenkoism, which asserted, among other things, that plants behaved according to communist principles and that genetics (a rather dangerous concept for an ideology that requires people to be perfectly malleable, like clay) was a "bourgeois" invention meant to undermine the revolution. Soviet plays and shows had to express uplifting, socialist attitudes, as did Soviet music. The approach spread to other Soviet-aligned regimes - Kim Jong-Il famously wrote about opera, cinema, music, and architecture, focusing on how best to align each discipline with the greater socialist project.

In short, everything - everything - had to advance the cause of communism and the revolution. Everything must show more concern for society, as defined by communism, than for the individual. Nothing could be enjoyed solely on its merits; everything must be enjoyed solely in the context of how it might better improve society.


The Soviets and their Communist allies weren't the only ones, of course. Just before Friends came out, then-Vice President Dan Quayle famously blasted Murphy Brown because it failed to display sufficient concern for society:
It doesn’t help matters when prime time TV has Murphy Brown – a character who supposedly epitomizes today’s intelligent, highly paid, professional woman – mocking the importance of a father, by bearing a child alone, and calling it just another “lifestyle choice.”

I know it is not fashionable to talk about moral values, but we need to do it. Even though our cultural leaders in Hollywood, network TV, the national newspapers routinely jeer at them, I think that most of us in this room know that some things are good, and other things are wrong. Now it’s time to make the discussion public.
This wasn't a new theme. In 1985, Tipper Gore - then married to Al Gore, at the time a senator and later Vice President - co-founded the Parents Music Resource Center, which sought to give parents greater control over music that wasn't sufficiently deferential to the concerns of society. To that end, the PMRC secured a Senate hearing, during which American government officials grilled various artists the PMRC had deemed "objectionable" about the content of their music. This led one targeted artist, Frank Zappa, to deliver a spirited defense of individual expression against the concerns of society. I encourage you to stop right now and read the whole thing, but if you're feeling impatient, here's the start of it:
The First thing I would like to do, because I know there is some foreign press involved here and they might not understand what the issue is about, one of the things the issue is about is the First Amendment to the Constitution, and it is short and I would like to read it so they will understand. It says:

Congress shall make no law respecting an establishment of religion or prohibiting the free exercise thereof, or abridging the freedom of speech or of the press or the right of the people peaceably to assemble and to petition the government for a redress of grievances.

That is for reference. 
These are my personal observations and opinions. They are addressed to the PMRC [Parents’ Music Resource Centre] as well as this committee. I speak on behalf of no group or professional organization. 
The PMRC proposal is an ill-conceived piece of nonsense which fails to deliver any real benefits to children, infringes the civil liberties of people who are not children, and promises to keep the courts busy for years, dealing with the interpretational and enforcemental problems inherent in the proposal’s design. 
It is my understanding that, in law, First Amendment Issues are decided with a preference for the least restrictive alternative. In this context, the PMRC’s demands are the equivalent of treating dandruff by decapitation.
Zappa then proceeds to explain not only why the PMRC's proposal is wholly inappropriate to a culture that ostensibly prizes freedom of expression, but also how the proposal obscures a hidden issue - a tax on blank tapes, pushed by the recording industry to discourage copying of music while simultaneously serving as a handout to politically connected music companies. Whether the politicians of the time were opportunistically using the "cover fire" generated by the PMRC's proposals to sneak in a measure for their friends, or whether the politicians themselves consciously generated that "cover fire," is certainly open to debate. Whichever way it happened, though, it was crystal clear that the call for censorship had far less to do with any concerns for society and far more to do with the concerns of particular politically connected individuals.

This, it should go without saying, mirrored the experience in every other country that insisted that all speech is political and must be controlled accordingly.


Again, however, this isn't a new idea. Even by the time of the October Revolution, European civilization had hundreds of years of experience with treating the personal as political. As the French Revolution spiraled out of control, the Reign of Terror left no corner of society untouched. The French people were stripped of their religion and even their calendar. Even by then, however, this was still old hat. Puritans famously banned several forms of entertainment on the grounds that such pleasures interfered with their ideological struggle against sin, and personal enjoyment of such pleasures lacked sufficient concern for society - "idle hands are the Devil's playthings" and all that.

Speaking of "idle hands", who came up with that saying? Why, it was St. Jerome in the 4th century.

What's old is new again.

As the saying often attributed to Mark Twain goes, "History doesn't repeat itself, but it does rhyme."


Interestingly, though you wouldn't know it by reading that Salon article today, quite a few people on the left are starting to realize that politicizing the personal really isn't a good idea. Fredrik deBoer noted a few months back that this tactic is a two-edged sword (emphasis added by yours truly):
I’ve really been genuinely disturbed by #GamerGate. Obviously, some of that is just the threats and harassment of women online. But it’s also disturbing how successful they’ve been in pressuring advertisers, and in getting parts of the media to credulously accept much of their narrative. To me, it’s indicative of the problems that come about when there’s no limit to how much we politicize the personal.

We on the left have argued for ages that “the personal is political.” We’ve told people that they should look for political resonance in every aspect of their personal lives, in order to see the hand of various oppressions at play in microcosm. And we’ve incentivized that behavior in the way we always do, by treating the deployment of that kind of argument as a trump card against those who you’re arguing with. We have this magic words theory of argument where if you deploy certain terms like “tone policing,” the expectation is that you’ve won the argument and the other side has to stop arguing immediately. But those tactics don’t occur in a vacuum. Campus conservatives, for example, have succeeded in so many of their provocations because they have very deftly adopted the tactics and vocabulary of the academic left and employed them for their own purposes. And now we’re seeing the same thing from the GamerGate crew: this isn’t a fashion or hobby for me, it’s an identity. Your criticisms aren’t criticisms, they’re bullying. I’m not being blamed for bad behavior, I’m being oppressed.
And by two-edged sword, I mean a giant sinkhole that swallows all who dare stare into it:
Argument is like all other human behaviors: subject to conditioning through reward and punishment. And we’ve created these incentives on the left: always politicize; always escalate; always ridicule. We’re living with the consequences of those tendencies now. Unfortunately, I don’t know how we build a new left discourse, given that the two current modes of left-wing expression appear to be a) showily condescending ridicule and b) utter fury. I mean you can guess what the response by some will be to this essay: deBoer doesn’t think racism is real, he doesn’t think sexism is real, he wants people to just get over it when they’re the victims of sexism and racism. None of that is true. I write about the structural racism of our society constantly. I believe that we’re still a deeply, inherently sexist culture. (For example, you may have heard of #GamerGate.) And I absolutely believe that there are tons of daily encounters that demonstrate these problems, and that the victims of them should feel comfortable speaking out.

I just also think that we have to be able to say “you know, I don’t think that your particular political critique here is correct” without being accused of failing to oppose racism and sexism in general. And I think that we have to recognize that, by treating claims of oppression as immediate conversation winners, without the expectation that people actually have to defend and support those claims with evidence, we make the appropriation of these techniques that we’re seeing with GamerGate inevitable.
History has shown time and again what happens when you politicize the personal. When all personal action becomes a political statement, any personal action might command a political response. At best, this might involve a bit of hand wringing and calls for better "standards" and "taste". At worst, the ultimate embodiment of the political - the state - exercises its muscles, its guns, and its soldiers and provides the ultimate political response.

This still doesn't stop people from trying. It never will.


What really concerns me about the attempted politicization of Friends - beyond the raw ad-driven avarice powering it, which, by commenting on it, I'm contributing to - is that at least the Puritans and Tipper Gore and Dan Quayle were fighting for a fairly constant set of well-known rules. Oh sure, there are variations and oddities here and there in interpretation and in how literally one should read the Bible, but at least nobody's adding new chapters to the Bible on the fly (well, almost nobody). The basic rules that the Pilgrims and Tipper and Dan were fighting for are more or less the same, though they might disagree individually on degree - choose activities that glorify God and family and shun just about everything else. They're not rules I agree with - I'm an atheist, personally - but they are rules, they are written down, more or less, and just about everybody in our cultural corner is generally familiar with them. Granted, they're rules almost explicitly designed to guarantee that you'll be guilty of something - just looking at someone and experiencing some sort of sexual attraction is technically a sin, after all, and if you're still full but keep eating, well, that's one too - but they're there.

The rules being applied to Friends, however, are nowhere near that constant. Take Chandler's homophobia, for example. The show established pretty thoroughly that Chandler was an insecure loser (he worked in data entry, had trouble dating, and so on) who used sarcasm and humor as a defense mechanism and held exactly the attitudes you'd expect from someone with deep-seated insecurities about his place in the world. Of course he was homophobic - that was the joke. He was so insecure about his own ability to find love that he took it out on pretty much everyone else - fat people, homosexuals, his father, sometimes even his friends.
That's why it was okay to laugh at him - he was clearly written as a flawed, insecure, occasionally horrible human being whom we could all relate to from time to time and poke fun at. He was a younger, hipper, somewhat less reactionary Archie Bunker. By laughing at him, we were laughing at our worst selves. It's the same reason Homer Simpson and Peter Griffin are funny. It's why people laugh at Stan Marsh.

But that's no longer enough. 

It was in the '90s, when Friends came out - even in that dark, almost medieval age of misogyny, trans-erasure, and structural racism, it was pretty clear that homophobia was the product of a reactionary, insecure mind, one that deserved derisive laughter and jest. However, just as the Puritans slowly turned the ratchet, then quickly; just as the Jacobins slowly turned the ratchet, then quickly; just as the Bolsheviks slowly turned the ratchet, then quickly; just as Mao's followers and their like in Cambodia and Vietnam slowly turned the ratchet, then quickly - now it is time to quickly turn the ratchet on the heretics and to truly identify the faithful - the followers of whatever Revolution we're celebrating and pushing today - and weed out the rest. To do so, we must cast away the past - if we do not, people might actually learn something from it, and we can't have that. We must continue to change what's acceptable. Homophobia is no longer a laughing matter since it is proof of dissent, and dissent is most certainly no laughing matter. Transphobia must be redefined from "fearing trans-gendered people" to "speaking openly of vaginas". We must fight the patriarchy until it becomes clear that fighting the patriarchy no longer suitably identifies the truly faithful, at which point we shall fight the kyriarchy. We shall demand that businesses advertise to the revolution, then, once they start doing so, tell them that the revolution is not for sale and how dare they co-opt our cause.

Humor is potentially subversive and subtle, so of course it must be crushed. Socialist realism abhors subtlety. Subtlety breeds variation and dissent. Can't have that.


Before anyone gets the wrong idea, conservatives shouldn't pat themselves on the back too hard right now. You had your turn - oh, believe me, you had your turn. America knows what happens when you get to call the cultural shots:

An illustrative Dune quote rendered by Calvin & Muad'Dib
Believe me, given the choice between arguing with a few semi-obscure left-wing giblet-heads on the Internet and dealing with what happens when you get the keys... look, we're a long, long, long way from getting from "Let's politicize Friends for ad revenue!" to "Let's send anyone who finds Friends acceptable to a kolkhoz in Manitoba!" We are nowhere near far enough from 90% of the legislation, attempted or ratified, that you've thrown at this country through the years. America fought long and hard to actually take "freedom of speech" seriously - don't think for a moment that we've forgotten how hard you fought us every step of the way.

With that, I think I'm done here for the moment.

Friday, January 16, 2015

It is by caffeine alone I set my mind in motion...

“It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shakes, the shakes become a warning. It is by caffeine alone I set my mind in motion.” - The Programmer's Mantra
In my previous post regarding MathLibrary.cmd, I pointed out that division was well-nigh unusable - dividing a 101-digit number by itself took over 22 minutes. Since I first started working on this over a month ago, that has led me, in fits and starts, to look for ways to improve performance and streamline the code; unfortunately, most of my changes provided only incremental improvements at best.

I made a mistake yesterday that changed all that.

While fixing my subtraction code, which actually had a fairly subtle error that produced extra significant digits if the answer had fewer digits than either the minuend or the subtrahend, I accidentally created some code with an O(n^2) algorithm, which produced performance results that looked an awful lot like the results I was getting with my division algorithm. This led me to wonder if perhaps that might be the key to understanding where my problem was - was I iterating through my string, character by character, so often that I was killing my code? Or was something else holding it back?

The answer to both questions turned out to be yes.

First, a bit of technical background. When I first started working on MathLibrary.cmd, I explicitly wrote it with the idea in mind that I'd copy and paste blocks of it as needed for my Project Euler solutions. For example, if I was working on a problem that only needed addition, I'd just copy the part of the script that dealt with adding large numbers (:ExtAdd), along with its dependencies (:ExtMatchPad, :ExtDim). Consequently, I built each portion of the script with the assumption that it would run independently. Meanwhile, in order to facilitate the elementary-school-style math (iterate through each digit of a large number, do some math against it, carry to the next digit, and so on - we're talking pre-Common Core algorithms here), I had to make sure that both numbers were identically sized - otherwise I'd be adding, say, 9 to absolute emptiness, which would lead to all sorts of unpleasantness. So, to accommodate that, I'd get the length of each number string, then append 0's in either direction as needed.
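To make the padding step concrete, here's the same idea sketched in Python (the script itself is pure batch, and these function names are mine, not the script's):

```python
def pad_pair(a: str, b: str) -> tuple[str, str]:
    # Zero-pad two integer strings to the same length so they can be
    # added digit by digit, schoolbook style. (A rough stand-in for
    # :ExtMatchPad; the real script also has to handle decimal points.)
    width = max(len(a), len(b))
    return a.zfill(width), b.zfill(width)

def ext_add(a: str, b: str) -> str:
    # Schoolbook addition over digit strings: right to left, with carry.
    a, b = pad_pair(a, b)
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))
```

Without the padding, the shorter number would run out of digits mid-loop - the "adding 9 to absolute emptiness" problem.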

The mistake I made yesterday was that I got the length of my number string over and over and over again.

Well shoot - why do that? What if I could get it once, store it, pass it to each subroutine, and have it arithmetically manipulated as needed instead of manually checking the length of the string each time? So, last night, that's what I tried - and it worked! I got about a 50% performance improvement from my script by doing that.

Not bad - but still not good enough.

Trouble is, 50% of 22 minutes is still over 10 minutes, which is just not practical if I need to run repeated division operations. I didn't need to drop my division runtime by 50% - I needed to drop it by at least an order of magnitude. What else could I try?

This morning, I had a rather large, rather strong cup of gas station coffee. A bit later, I had a flash of inspiration. Since multiplication is slow, when I first wrote the division algorithm I had (smartly, I thought) decided to do it as rarely as possible - I'd multiply the divisor by every number from 1 to 10, store the values in an array, then compare against those pre-calculated values to find each digit of the answer.

What if I didn't multiply at all?

What I was really trying to do was add the divisor to itself ten times, and addition is way faster than multiplication. So, what if I just did that instead?
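The whole scheme, sketched in Python rather than batch (Python ints stand in for the script's digit strings, and every name here is mine): build the table of divisor multiples with nothing but addition, then long-divide against it.

```python
def multiples_table(divisor: int) -> list[int]:
    # divisor*1 through divisor*10, built purely by repeated addition -
    # no multiplication anywhere.
    table = [divisor]
    for _ in range(9):
        table.append(table[-1] + divisor)
    return table

def long_divide(dividend: int, divisor: int) -> int:
    # Schoolbook long division: bring down one digit at a time and find
    # each quotient digit by comparing against the precomputed table.
    table = multiples_table(divisor)
    quotient, remainder = [], 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)  # "bring down" a digit
        q = 0
        while q < 9 and table[q] <= remainder:   # table[q] == divisor*(q+1)
            q += 1
        quotient.append(str(q))
        if q:
            remainder -= table[q - 1]
    return int("".join(quotient))
```

(In the batch version, "bringing down" a digit is string concatenation rather than the `remainder * 10` shown here.)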
Testing division operations...
/01 OK 14:48:57.25 14:48:56.64 0:0:0.61
/02 OK 14:48:58.00 14:48:57.26 0:0:0.74
/03 OK 14:48:58.94 14:48:58.01 0:0:0.93
/04 OK 14:49:00.08 14:48:58.94 0:0:1.14
/05 OK 14:49:01.40 14:49:00.09 0:0:1.31
/06 OK 14:49:03.71 14:49:01.41 0:0:2.30
/07 OK 14:49:07.30 14:49:03.72 0:0:3.58
/08 OK 14:49:17.23 14:49:07.32 0:0:9.91
/09 OK 14:49:36.92 14:49:17.24 0:0:19.68
/10 OK 14:49:56.90 14:49:36.94 0:0:19.96
Ah - much better. Now, instead of taking over 20 minutes, division takes less than 20 seconds. Much more livable, especially since this means I can now meaningfully check and troubleshoot it and find little issues like this:
MathLibrary.cmd 1 / 0.5
Oops - well, looks like I still have some work to do.

Thursday, January 15, 2015

Experimenting with string manipulation

Division using my MathLibrary.cmd script has always been slow. Really, really slow:
/01 OK 11:21:39.08 11:21:37.99 0:0:1.9
/02 OK 11:21:40.89 11:21:39.09 0:0:1.80
/03 OK 11:21:43.92 11:21:40.90 0:0:3.2
/04 OK 11:21:48.57 11:21:43.93 0:0:4.64
/05 OK 11:21:54.78 11:21:48.58 0:0:6.20
/06 OK 11:22:12.96 11:21:54.79 0:0:18.17
/07 OK 11:22:53.36 11:22:12.98 0:0:40.38
/08 OK 11:28:33.39 11:22:53.37 0:5:40.2
/09 OK 11:50:13.04 11:28:33.40 0:21:39.64
/10 OK 12:12:14.42 11:50:13.05 0:22:1.37
The numbers in the final column are how long each unit test (/01, /02, etc.) took to complete, in hours:minutes:seconds format. Each test divides a number by itself (meaning the answer should always be "1"), in increasing size - test /01, for example, is 2/2, while test /10 - which took over 22 minutes - divides a 101-digit number by itself.

This is a problem.

It's a problem because, at the end of the day, the goal of this script is to solve the third Project Euler problem - find the largest prime factor of a number large enough for CMD to choke on - and in order to do that, I need to calculate the square root of that number so I have an upper bound on the prime factors to consider. To determine a square root from scratch, however, the most straightforward method I can wrap my mind around uses iterative floating point division to get closer and closer to the root until it's found. Obviously, an algorithm that takes over 20 minutes to divide two numbers isn't going to work if I have to divide two numbers over and over again.
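For the record, the method I have in mind is the classic Babylonian (Newton's-method) iteration, which homes in on a square root using nothing but division and averaging - sketched here with Python floats, though the script would have to use its own division routine:

```python
def babylonian_sqrt(n: float, tolerance: float = 1e-9) -> float:
    # Repeatedly refine a guess x via x = (x + n/x) / 2; each pass
    # roughly doubles the number of correct digits.
    x = n if n >= 1 else 1.0  # any positive starting guess converges
    while True:
        nxt = (x + n / x) / 2
        if abs(nxt - x) < tolerance:
            return nxt
        x = nxt
```

One division per iteration, and typically a few dozen iterations at most - which is exactly why a 20-minute division routine is a non-starter.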

While running my unit tests, however, I discovered something curious:
 Testing subtraction operations...
-01 OK 10:55:41.34 10:55:41.16 0:0:0.18
-02 OK 10:55:41.82 10:55:41.35 0:0:0.47
-03 OK 10:55:43.21 10:55:41.82 0:0:1.39
-04 OK 10:55:46.00 10:55:43.22 0:0:2.78
-05 OK 10:55:59.50 10:55:46.02 0:0:13.48
-06 OK 10:55:59.61 10:55:59.52 0:0:0.9
-07 OK 10:55:59.88 10:55:59.63 0:0:0.25
-08 OK 10:56:00.93 10:55:59.88 0:0:1.5
-09 OK 10:56:02.84 10:56:00.94 0:0:1.90
-10 OK 10:56:05.72 10:56:02.84 0:0:2.88
-11 OK 10:56:05.82 10:56:05.72 0:0:0.10
-12 OK 10:56:06.07 10:56:05.82 0:0:0.25
-13 OK 10:56:07.13 10:56:06.09 0:0:1.4
-14 OK 10:56:09.04 10:56:07.13 0:0:1.91
-15 OK 10:56:11.93 10:56:09.04 0:0:2.89
-16 NO 10:56:12.05 10:56:11.93 0:0:0.12
-17 NO 10:56:12.24 10:56:12.05 0:0:0.19
-18 NO 10:56:13.56 10:56:12.25 0:0:1.31
-19 NO 10:56:16.14 10:56:13.58 0:0:2.56
-20 NO 10:56:27.52 10:56:16.16 0:0:11.36
Note the "NO" results on the last five tests - these are tests that subtract two identical floating point numbers, which should return 0. Well, it turns out it wasn't quite doing that...

C:\Users\ \Documents\GitHub\MathLibrary.cmd>MathLibrary.cmd 0.2 - 0.2

C:\Users\ \Documents\GitHub\MathLibrary.cmd>MathLibrary.cmd 14.320 - 14.320

Notice that the results aren't actually "0" - they're "0." followed by as many digits as were to the right of the decimal point. This led me to put together a quick "strip trailing zeroes" routine, near-identical to the one used by the division subroutine:

        :ExtSubtractStripTrailingZeroes
        CALL :ExtDim %_SubtractResult% _SubtractResLen _SubtractResDec
        IF %_SubtractResDec% GTR %_SubtractResLen% GOTO ExtSubtractDoneStrippingTrailingZeroes
        SET _SubtractRInt=%_SubtractResult:~-1%
        IF %_SubtractRInt%==. (
            SET _SubtractResult=%_SubtractResult:~0,-1%
            GOTO ExtSubtractDoneStrippingTrailingZeroes
        )
        IF NOT %_SubtractRInt%==0 GOTO ExtSubtractDoneStrippingTrailingZeroes
        SET _SubtractResult=%_SubtractResult:~0,-1%
        GOTO ExtSubtractStripTrailingZeroes

Then I re-ran my tests:
-16 OK 13:45:15.31 13:45:15.15 0:0:0.16
-17 OK 13:45:15.57 13:45:15.31 0:0:0.26
-18 OK 13:45:18.60 13:45:15.59 0:0:3.1
-19 OK 13:45:25.90 13:45:18.61 0:0:7.29
-20 OK 13:51:18.70 13:45:25.90 0:5:52.80
The good news is they all passed. The bad news is they started taking longer - much longer in the case of the last test, which went from taking an already slow 11 seconds to taking close to 6 minutes. Something needed to be done. So, instead of looping via GOTO and truncating the string one digit at a time, I decided I'd try finding the position of the first trailing zero and then cut all of them at once:

        CALL :ExtDim %_SubtractResult% _SubtractResLen _SubtractResDec
        IF %_SubtractResDec% GTR %_SubtractResLen% GOTO ExtSubtractDoneStrippingTrailingZeroes
        SET /A _SubStripMax=_SubtractResLen-_SubtractResDec+1
        SET _SubStripOff=0
        FOR /L %%G IN (1,1,%_SubStripMax%) DO (
            SET _SubStripPos=%%G
            CALL SET _SubZChk=%%_SubtractResult:~-!_SubStripPos!,%_One%%%
            IF NOT !_SubZChk!==0 (
                GOTO ExtSubtractFoundTrailingZeroes
            ) ELSE SET _SubStripOff=%%G
        )
        :ExtSubtractFoundTrailingZeroes
        IF %_SubZChk%==. SET /A _SubStripOff+=1
        CALL SET _SubtractResult=%%_SubtractResult:~%_Zero%,-%_SubStripOff%%%
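For anyone who would rather not parse delayed expansion, here's the same single-slice logic in Python (names mine):

```python
def strip_trailing_zeroes(s: str) -> str:
    # Count strippable characters from the right, then cut them all in
    # one slice - plus the decimal point itself if nothing useful is
    # left after it ("1.000" -> "1").
    if "." not in s:
        return s
    cut = 0
    while s[len(s) - 1 - cut] == "0":
        cut += 1
    if s[len(s) - 1 - cut] == ".":
        cut += 1  # the point is now trailing too, so cut it as well
    return s[:len(s) - cut] if cut else s
```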

The result?
-16 OK 15:12:51.19 15:12:51.05 0:0:0.14
-17 OK 15:12:51.41 15:12:51.19 0:0:0.22
-18 OK 15:12:52.83 15:12:51.42 0:0:1.41
-19 OK 15:12:55.59 15:12:52.84 0:0:2.75
-20 OK 15:13:08.87 15:12:55.61 0:0:13.26
Much better. Granted, it's a little slower than when I wasn't cutting the zeroes out at all, but this was at least manageable. Then I took a second look at the first line of code from the original algorithm:

        CALL :ExtDim %_SubtractResult% _SubtractResLen _SubtractResDec

ExtDim iterates through each character of the string, manually counting its length and decimal position along the way. Realistically, we only need to run that once, not once per character. Moving things around a bit and reverting to the original algorithms gave me this:
-16 OK 16:55:20.62 16:55:20.47 0:0:0.15
-17 OK 16:55:20.85 16:55:20.62 0:0:0.23
-18 OK 16:55:22.49 16:55:20.87 0:0:1.62
-19 OK 16:55:25.67 16:55:22.51 0:0:3.16
-20 OK 16:55:41.43 16:55:25.68 0:0:15.75

So, even with my original algorithm fixed, the For loop method remained faster - though nowhere near as much faster as I'd hoped, since the division's zero-stripping algorithm was never as broken as my original subtraction zero-stripping algorithm.

I have some other ideas to move division along, though. Stay tuned...

Wednesday, January 14, 2015

Performance Testing CMD

I've remarked a couple of times in the past about the inconvenience of CMD's insistence that all numbers that begin with a zero shall be treated as octal numbers. Since my MathLibrary.cmd project uses string manipulation to break down large and floating point numbers into chunks that CMD can actually handle, it'd be nice if I could break strings down into larger chunks and handle them "natively" instead of one digit at a time like I'm currently doing. However, every attempt I've made at doing so has actually led to a performance penalty since the overhead of "de-octalifying" - checking if a digit starts with a 0 and converting it to a decimal number - consumes any time I might otherwise save.

This led me to consider what the most efficient method of "de-octalifying" a number in CMD might be. Three approaches that immediately came to mind were:
  1. Append a "1" to the beginning of every number and apply the modulus operator against a known power of 10 to extract the desired number. For example, if I'm trying to de-octalify 0003, I'd append a 1 (10003), then modulus* that number against 10000. This is the traditional method seen in most Windows batch script examples that deal with this sort of problem.
  2. Apply a series of If/Then statements that checks for the presence of leading 0's, then return the truncated result.
  3. Similar logic as #2, only using a For loop to iterate through the number string.
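Approach #1 is easy to model in Python (Python's int() doesn't share CMD's octal quirk, so this just demonstrates the arithmetic):

```python
def deoctalify(chunk: str) -> str:
    # "0003" -> "10003" -> 10003 % 10000 -> 3. Prepending the 1 keeps
    # SET /A from ever seeing a leading zero; the modulus then throws
    # the 1 away along with the leading zeroes.
    return str(int("1" + chunk) % 10 ** len(chunk))
```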
I then decided to write a script to test these algorithms, which can be found here. A sample of the algorithms themselves:


    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter:
    :: %1 - The digit to be de-octalized (passed by reference)
    SET _TestMod=1!%1!
    SET /A _TestMod=_TestMod %% 1000000000

    ENDLOCAL & SET %1=%_TestMod%

    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter:
    :: %1 - The digit to be de-octalized (passed by reference)
    SET _TestIf=!%1!
    IF %_TestIf:~0,1%==0 (
        IF %_TestIf:~1,1%==0 (
            IF %_TestIf:~2,1%==0 (
                IF %_TestIf:~3,1%==0 (
                    IF %_TestIf:~4,1%==0 (
                        IF %_TestIf:~5,1%==0 (
                            IF %_TestIf:~6,1%==0 (
                                IF %_TestIf:~7,1%==0 (
                                    SET _TestIf=%_TestIf:~-1%
                                ) ELSE SET _TestIf=%_TestIf:~-2%
                            ) ELSE SET _TestIf=%_TestIf:~-3%
                        ) ELSE SET _TestIf=%_TestIf:~-4%
                    ) ELSE SET _TestIf=%_TestIf:~-5%
                ) ELSE SET _TestIf=%_TestIf:~-6%
            ) ELSE SET _TestIf=%_TestIf:~-7%
    ) ELSE SET _TestIf=%_TestIf:~-8%
    )
    ENDLOCAL & SET %1=%_TestIf%

    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter
    :: %1 - The digit to be deoctalized (passed by reference)
    SET _TestFor=!%1!
    FOR /L %%G IN (1,1,8) DO (
        SET _TestZero=!_TestFor:~0,1!
        IF !_TestZero!==0 (
            SET _TestFor=!_TestFor:~1!
        ) ELSE GOTO TestForReturn
    )
    :TestForReturn
    ENDLOCAL & SET %1=%_TestFor%


Note that the modulus approach is certainly the tersest of the three, which is arguably why it's the most popular. After testing, it turns out there's another advantage:

It's faster.

My test script went through and truncated eight sets of numbers, plus processed a set that didn't need to be truncated at all. I chose this set because, owing to the Windows Batch Script environment's 32-bit signed integer limit, the largest number CMD can process is a 10-digit number (2,147,483,647, to be specific). Consequently, the largest block I could subdivide addition and subtraction operations into is a nine-digit block (999,999,999 + 999,999,999 = 1,999,999,998, which is less than CMD's limit of 2,147,483,647), so that is the largest block I would ever need to process. I then iterated through each digit set a thousand times, which I figured would be long enough to highlight any performance differences, and viewed the results:

Digits   Mod    If     For
   1     3.1    3.15   3.66
   2     2.85   3.7    3.53
   3     2.8    2.99   3.39
   4     2.68   2.9    3.23
   5     2.59   2.82   3.1
   6     2.5    2.72   2.92
   7     2.42   2.63   2.77
   8     2.31   2.54   2.61
   9     2.22   2.41   2.45

The first column is how many digits would be returned - for example, "9" corresponds to 123456789, which wouldn't have any 0s removed at all, while "1" corresponds to 000000001, which would have eight 0s removed. The remaining columns are the times, in seconds, for each algorithm: the modulo operator, cascading IF/THEN statements, or a FOR loop. Notice that the modulus method of 0 truncation was consistently the fastest in every case, with IF/THEN being faster than the FOR loop in most cases.

The takeaway from all of this? Sometimes the simplest methods really are the best, and sometimes traditions happen for a reason.

* The modulo operator, usually expressed as "%", divides a numerator by a denominator, then returns the remainder. For example, 11 % 2 gives 1, which is the remainder after 11 / 2.

Saturday, January 3, 2015

Extending CMD's Math Abilities

Sample output from MathLibrary.cmd
A while back, I started working on writing Project Euler scripts using Windows Batch Script. I didn't get very far before I ran into some pretty clear limitations with my chosen development environment:
  • Windows Batch Script only supports 32-bit signed integers. This bounds its native math functionality from -2,147,483,648 to 2,147,483,647, inclusively.
  • Windows Batch Script does not handle decimal points at all.
  • Windows Batch Script's native arithmetic functions are extremely limited.
  • Windows Batch Script has almost no error handling capabilities at all. They're so bad, in fact, that you can oftentimes run afoul of each of the above limitations and never receive an error for doing so - you'll just get undesired results.
To that end, I started working on a way to broaden its capabilities a bit, and in the spirit of the original goal of implementing Project Euler solutions using nothing but Windows Batch Scripting, I decided to keep my solution confined to Windows Batch Scripting as well. At first I did what any good sysadmin would do and tried to Google the problem away, which led to some fruitful results. Someone found a way to calculate Pi using Batch Script, which was encouraging - that proved floating point math was possible. I also found solutions that allowed arbitrary length integers to be calculated, which were also promising. Unfortunately, when I attempted to apply those solutions to my particular problem, I found that the code wasn't particularly portable (the division subroutine in the Pi script, for example, only works as long as one of the digits is within the 32-bit signed limit) or otherwise useful for my needs. Consequently, it looked like I would have to write my own.
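The core idea behind all of these arbitrary-precision scripts is the same: keep each number as a string of digits and do schoolbook arithmetic on it one piece at a time, so the 32-bit limit only ever applies to a single digit (or block) plus a carry. Here's a minimal Python sketch of that idea - an illustration of the general technique, not a translation of any particular script:

```python
def add_big(a: str, b: str) -> str:
    # Schoolbook addition on digit strings, right to left with a carry.
    # Length is limited only by memory, never by a 32-bit integer.
    width = max(len(a), len(b))
    result, carry = [], 0
    for x, y in zip(a[::-1].ljust(width, "0"), b[::-1].ljust(width, "0")):
        digit = int(x) + int(y) + carry   # at most 9 + 9 + 1 = 19
        result.append(str(digit % 10))
        carry = digit // 10
    if carry:
        result.append(str(carry))
    return "".join(result)[::-1]
```

Every intermediate value here fits comfortably in a 32-bit integer, which is exactly what makes the approach viable in CMD.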

Now, before anyone gets excited, it's not done yet. I'm publishing it now because 2015 is here, I've been fiddling with this on and off for the past two months, and I decided it was high time to just get it out the door for the rest of the world. With that in mind, here are the current limitations of MathLibrary.cmd:
  • Negative numbers are only supported for addition, subtraction, and comparison. Multiplication and division are coming soon.
  • The algorithms are "conceptual", meaning that, if you were a 10-year-old elementary school student who could read Windows Batch Script, you would understand what was going on perfectly. I'm not taking advantage of quads and octs yet (I'll explain why below).
  • It's slow. Two 500 digit numbers take about 10 seconds to add on my laptop (AMD Phenom II N660 3.0 GHz) and about 25 seconds to subtract. Two 50 digit numbers take about a minute and a half to multiply together. Division is unspeakably atrocious.
That said, it does successfully add, subtract, multiply, divide, and compare arbitrary-length floating point numbers, which is pretty cool. It can be done.

So, let's talk about speed for a bit. While working on the script, I decided to experiment with refactoring the addition subroutine (:ExtAdd) to use eight-digit blocks instead of the single-digit blocks you see on GitHub. What I discovered was that I had to check for blocks that started with a 0 - or several 0's - and filter them out; otherwise, Windows Batch Script would assume I wanted to add two octal numbers together. Consequently, whatever performance gains I got out of adding in 8-digit blocks instead of single digits were devoured by the time it took to strip the leading 0's out of the blocks so they would add together cleanly.
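To make the pitfall concrete, here's a hypothetical Python sketch of the eight-digit block approach. Python's int() simply ignores leading zeros rather than treating them as octal, but the sketch applies the same 1-prefix-then-modulus "deoctalization" that CMD's SET /A would require for every block sliced from a digit string - that per-block laundering step is exactly where the time went:

```python
BLOCK = 8
BASE = 10 ** BLOCK

def add_blocks(a: str, b: str) -> str:
    # Pad both numbers to a whole number of 8-digit blocks
    width = ((max(len(a), len(b)) // BLOCK) + 1) * BLOCK
    a, b = a.zfill(width), b.zfill(width)
    out, carry = [], 0
    for i in range(width - BLOCK, -1, -BLOCK):
        # "Deoctalize" each block: prefix a 1 so leading zeros are harmless,
        # then mod away the prefix - mirroring what CMD would need
        x = int("1" + a[i:i + BLOCK]) % BASE
        y = int("1" + b[i:i + BLOCK]) % BASE
        total = x + y + carry
        out.append(str(total % BASE).zfill(BLOCK))
        carry = total // BASE
    return "".join(reversed(out)).lstrip("0") or "0"
```

In CMD, each of those deoctalization steps is its own round trip through the string-substitution machinery, which is how the block-based gains evaporated.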

Another odd note - take a look at the code used in the Pi script for adding two digits together:

set /a Add_Digit = Add_Carry + %1_%%i + %2_%%i
set /a %1_%%i = Add_Digit %% 10000
set /a Add_Carry = Add_Digit / 10000

Now, let's compare to my equivalent code:

SET /A _AddByPos=_AddInt1+_AddInt2+_AddCarry
IF !_AddByPos! GEQ 10 (
    SET _AddCarry=1
    SET /A _AddByPos-=10
) ELSE (
    SET _AddCarry=0
)
What's the difference? The Pi script algorithm is conceptually pretty clever - it takes advantage of the fact that Windows Batch Script's division operator only returns the integer part of a divide operation (i.e. everything to the left of the decimal point in the result). Consequently, if you divide any number between 10 and 19, inclusive, by 10, you will get 1 instead of 1.1, 1.2, 1.3, and so on. Then, you can use the modulus operator to get the remainder, which, when dividing by 10, is the ones digit (e.g. 11 %% 10 = 1, 12 %% 10 = 2, and so on).
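Reduced to per-digit Python (matching my single-digit version rather than the Pi script's base-10000 blocks), the two carry strategies look like this - the function names are mine:

```python
def carry_divmod(d1, d2, carry):
    # Pi-script style: unconditionally divide and take the modulus
    total = d1 + d2 + carry
    return total % 10, total // 10

def carry_compare(d1, d2, carry):
    # My style: compare, and only subtract when a carry actually occurred
    total = d1 + d2 + carry
    if total >= 10:
        return total - 10, 1
    return total, 0
```

Both return the same (digit, carry) pair for every input; the difference is purely in which CMD arithmetic operations each one leans on, and how often.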

It's also slower.

Using my UnitTest.cmd script, which I use to make sure that improvements don't fundamentally break the script, here's the times I got for performing the first 10 addition tests using my method (time elapsed in bold):

01 OK 21:01:50.42 21:01:50.32 0:0:0.10
02 OK 21:01:50.51 21:01:50.44 0:0:0.7
03 OK 21:01:50.61 21:01:50.52 0:0:0.9
04 OK 21:01:50.73 21:01:50.62 0:0:0.11
05 OK 21:01:50.87 21:01:50.74 0:0:0.13
06 OK 21:01:51.19 21:01:50.88 0:0:0.31
07 OK 21:01:51.57 21:01:51.21 0:0:0.36
08 OK 21:01:52.91 21:01:51.59 0:0:1.32
09 OK 21:01:55.01 21:01:52.92 0:0:2.9
10 OK 21:02:05.95 21:01:55.08 0:0:10.87

And here's what I got when I implemented the Pi script algorithm:

01 OK 21:48:31.13 21:48:31.02 0:0:0.11
02 OK 21:48:31.24 21:48:31.15 0:0:0.9
03 OK 21:48:31.37 21:48:31.26 0:0:0.11
04 OK 21:48:31.58 21:48:31.40 0:0:0.18
05 OK 21:48:31.77 21:48:31.60 0:0:0.17
06 OK 21:48:32.11 21:48:31.79 0:0:0.32
07 OK 21:48:32.63 21:48:32.13 0:0:0.50
08 OK 21:48:34.21 21:48:32.65 0:0:1.56
09 OK 21:48:37.11 21:48:34.22 0:0:2.89
10 OK 21:48:50.02 21:48:37.12 0:0:12.90

Note that most of the tests are considerably slower - about 10-20% slower, give or take. Why? Part of the problem is that Windows Batch Script's internal math functions are unevenly implemented - addition and subtraction, which my method uses, are faster than division or the modulus operator. Part of it is that the Pi script algorithm applies the same three steps to every addition operation - add, divide, modulus - while mine oftentimes (especially in the first 10 unit tests, which are solely large-number addition with no carrying) only uses two operations (add, set), with a third (subtract by 10) used only when there's a carry in play.

Ah... but what about that IF statement? Doesn't that count as an operation?

It does, actually, which is why my method is only 10-20% faster instead of the 33% faster you'd expect under ideal conditions, which the first ten unit tests simulate nicely. However, every call into the Windows Batch Script arithmetic engine noticeably hurts performance. How noticeably? When I first started this script, I intentionally set all of my "integers" using SET /A to help me differentiate the strings and substrings I was manipulating from the actual numbers Windows Batch Script could operate on. While working on it, however, I read somewhere that SET /A adds a step in the interpreter over plain SET, so I put it to the test and converted all of my SET /A declarations to SET. The result was a 5-10% performance improvement.

So there you have it. Like I said, I'm not done fiddling with this - negative number support will be coming to multiplication and division over the next week or two, depending on time and motivation. After that, I'm probably going to mix up the UnitTest script a bit so I can test individual components in a hurry instead of testing ALL THE THINGS all the time - the multiplication tests in particular really slow me down - and then I'm going to start playing with performance tuning the script as much as possible. Assuming I can get division into something resembling "usable condition" (it takes over 10 seconds to divide 1 by 3), I'll then start implementing a square root subroutine, which will then clear the way for me to get back to work on Project Euler solutions once more.

It'll get there, one line at a time. I just have to make this script suck a little bit less each week, that's all. It's a good metaphor for life.