Wednesday, January 14, 2015

Performance Testing CMD


I've remarked a couple of times in the past about the inconvenience of CMD's insistence that all numbers that begin with a zero shall be treated as octal numbers. Since my MathLibrary.cmd project uses string manipulation to break down large and floating point numbers into chunks that CMD can actually handle, it'd be nice if I could break strings down into larger chunks and handle them "natively" instead of one digit at a time like I'm currently doing. However, every attempt I've made at doing so has actually led to a performance penalty since the overhead of "de-octalifying" - checking if a digit starts with a 0 and converting it to a decimal number - consumes any time I might otherwise save.

This led me to consider what the most efficient method of "de-octalifying" a number in CMD might be. Three approaches that immediately came to mind were:
  1. Append a "1" to the beginning of every number and apply the modulus operator against a known power of 10 to extract the desired number. For example, if I'm trying to de-octalify 0003, I'd append a 1 (10003), then modulus* that number against 10000. This is the traditional method seen in most Windows batch script examples that deal with this sort of problem.
  2. Apply a series of If/Then statements that checks for the presence of leading 0's, then return the truncated result.
  3. Similar logic as #2, only using a For loop to iterate through the number string.
I then decided to write a script to test these algorithms, which can be found here. A sample of the algorithms themselves:

***

:TestModulus
    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter:
    :: %1 - The digit to be de-octalized (passed by reference)
   
    SET _TestMod=1!%1!
    SET /A _TestMod=_TestMod %% 1000000000

    ENDLOCAL & SET %1=%_TestMod%
GOTO :EOF

:TestIf
    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter:
    :: %1 - The digit to be de-octalized (passed by reference)
   
    SET _TestIf=!%1!
   
    IF %_TestIf:~0,1%==0 (
        IF %_TestIf:~1,1%==0 (
            IF %_TestIf:~2,1%==0 (
                IF %_TestIf:~3,1%==0 (
                    IF %_TestIf:~4,1%==0 (
                        IF %_TestIf:~5,1%==0 (
                            IF %_TestIf:~6,1%==0 (
                                IF %_TestIf:~7,1%==0 (
                                    SET _TestIf=%_TestIf:~-1%
                                ) ELSE SET _TestIf=%_TestIf:~-2%
                            ) ELSE SET _TestIf=%_TestIf:~-3%
                        ) ELSE SET _TestIf=%_TestIf:~-4%
                    ) ELSE SET _TestIf=%_TestIf:~-5%
                ) ELSE SET _TestIf=%_TestIf:~-6%
            ) ELSE SET _TestIf=%_TestIf:~-7%
        ) ELSE SET _TestIf=%_TestIf:~-8%
    )
   
    ENDLOCAL & SET %1=%_TestIf%
GOTO :EOF

:TestFor
    SETLOCAL EnableDelayedExpansion
    :: Accepts one parameter
    :: %1 - The digit to be deoctalized (passed by reference)
   
    SET _TestFor=!%1!
    FOR /L %%G IN (1,1,8) DO (
        SET _TestZero=!_TestFor:~0,1!
        IF !_TestZero!==0 (
            SET _TestFor=!_TestFor:~1!
        ) ELSE GOTO TestForReturn
    )
   
    :TestForReturn
    ENDLOCAL & SET %1=%_TestFor%
GOTO :EOF


***

Note that, of the three, the modulus approach is certainly the tersest of the bunch, which is arguably why it's the most popular. After testing it, there's another advantage:

It's faster.

My test script went through and truncated eight sets of numbers, plus processed a set that didn't need to be truncated at all. I chose this set because, owing to the Windows Batch Script environment's 32-bit signed integer limit, the largest number that can be processed by CMD is a 10-digit number (2147483647, to be more specific) - consequently, the largest block that I could subdivide addition and subtraction operations into would be a nine digit block (999999999 + 999999999 = 1,999,999,998, which is less than CMD's limit of 2,147,483,647), so that would be the largest block I would ever process. I then iterated through each digit in that set a thousand times, which I figured would be long enough to highlight any performance differences, and viewed the results:



Mod If For
1 3.1 3.15 3.66
2 2.85 3.7 3.53
3 2.8 2.99 3.39
4 2.68 2.9 3.23
5 2.59 2.82 3.1
6 2.5 2.72 2.92
7 2.42 2.63 2.77
8 2.31 2.54 2.61
9 2.22 2.41 2.45
The y-axis in that table is how many digits would be returned - for example, "9" corresponds to 123456789, which wouldn't have any 0s removed at all, while "1" corresponds to 000000001, which would have eight 0s removed.  The x-axis is the algorithm chosen for each iteration - either using the modulo operator, cascading If/Then statements, or a For loop. Time was measured in seconds. Notice that the modulus method of 0 truncation was consistently the fastest in each example, with If/Then being faster in most use cases than the For loop method.

The takeaway from all of this? Sometimes the simplest methods really are the best, and sometimes traditions happen for a reason.

*****
* The modulo operator, usually expressed as "%" divides a numerator by a denominator, then returns the remainder. For example, 11 % 2 would give 1, which is the remainder after 11 / 2.

No comments:

Post a Comment