Thursday, February 4, 2010

Python Tracing Performance

Looking back at my introductory post, I realize I didn't mention that I am currently working on my M.S. degree in Computer Science at UCSB. Occasionally, my academic work leads me to discover something immensely applicable to my industrial work -- I recently encountered one such case that was fairly game changing in terms of how I intend to use Python for large processing tasks.

The details of the particular assignment I was working on aren't pertinent to this discussion. Suffice it to say that I needed to perform some fairly complex processing on a fairly large graph of nodes. For the purposes of this discussion, I was particularly interested in some of the relationships between many of the nodes.

Once the majority of the code was written and tested, I attempted to run it on the large dataset, and found that while progress was being made, it was being made very slowly, taking upwards of 10 minutes to complete one particular portion of the processing. Naturally, I dug back into the relevant code, searching for where I had allowed some O(N^2) code to be introduced into what was supposed to be a O(N log N) algorithm. Much to my dismay, I couldn't find any area where I had allowed such an obvious mistake to enter my code (well, partial dismay -- I was actually glad I didn't screw it up).

Once I was satisfied that there were no major algorithmic mistakes, I took steps to begin optimizing out any of the constant-time operations that could possibly be removed. I pushed many of the operations into other areas of the code where I was already manipulating the object in question, improved some of the layout to avoid re-examining the same object, etc. However, none of these optimizations resulted in any worthwhile performance gains.

Up until now, I hadn't given my trace routines much thought, as I was simply using them to help me debug the problems in question, but even with the trace disabled I was still finding the code extremely slow to run... then it hit me -- even when my trace function was "disabled", I still had to call the function in question. So on a whim I commented out all of the calls to my trace function, and ran the algorithm again... and before I could lean back in my chair to wait, the results popped up on my screen. I was shocked -- from 10 minutes to less than a second, just by removing my "disabled" trace lines.

Now, I've been writing Python code for a while -- I'm fairly consistent in sticking to the performance patterns mentioned in the Python wiki and working with strings efficiently but I was amazed by how much this one little change affected my program.

As I thought about the problem in particular, I realized it made sense: function calls in Python are fairly expensive, so even if you're only doing O(N) of them, if N is large enough, that still adds up to a lot of function calls. So, I decided to develop a test module to demonstrate the various simple tracing routines I could quickly write down. Here's the code for the testing module:


'''
Demonstrate the performance impacts of various log/tracing techniques.

The tests in this module do not test the performance of various logging
methods while logging is enabled, as we likely don't mind the overhead 
of checking to see whether the statement is enabled or not when we're 
actually writing to sys.stdout or a file (the I/O overhead is ususally 
going to be much larger than even a few function calls).

Instead, these tests focus on the running code while it contains 
*disabled* log statements to see how that unused code is ultimately
affecting program performance.
'''
from __future__ import print_function
from logging import getLogger
import timeit

TRACE_MESSAGE_ARGS = tuple(range(5))
TRACE_MESSAGE_FORMAT = 'Traced message: %s' % ', '.join('%s' for _i in TRACE_MESSAGE_ARGS)



def method_0():
    '''Provide a baseline that simply comments out unused trace.'''
#    print TRACE_MESSAGE_FORMAT % TRACE_MESSAGE_ARGS
    pass


METHOD_1_ENABLED_LEVELS = set([
    'FOO',
    'BAR',
])



def method_1a():
    '''Use a trace method that checks for an enabled level.
    
    This is the typical simple trace mechanism. Using a set() for the
    level definition helps avoid unnecessary list scans, but two things
    are wrong with this approach:
        
        1. The string interpolation must be performed every iteration.
        2. We have to call the method_1_trace function every iteration. 
    '''
    method_1a_trace('NOT_ENABLED', TRACE_MESSAGE_FORMAT % TRACE_MESSAGE_ARGS)

def method_1a_trace(level, message):
    if level in METHOD_1_ENABLED_LEVELS:
        print(message)



def method_1b():
    '''Similar to method 1a, but don't force the string interpolation every iteration.'''
    method_1b_trace('NOT_ENABLED', TRACE_MESSAGE_FORMAT, *TRACE_MESSAGE_ARGS)

def method_1b_trace(level, message, *formatArgs):
    if level in METHOD_1_ENABLED_LEVELS:
        print(message % formatArgs)



def method_1c():
    '''Similar to method 1a, but don't force the function call every iteration.'''
    if 'NOT_ENABLED' in METHOD_1_ENABLED_LEVELS:
        print(TRACE_MESSAGE_FORMAT % TRACE_MESSAGE_ARGS)



METHOD_2_LEVEL = False

def method_2():
    '''Use the trace level as a boolean guard.
    
    The concept here is to avoid executing a function call, or a string 
    interpolation step at all in the case where the level is disabled.
    
    This appears to be the fastest mechanism available short of actually
    removing the trace code from the program entirely (via a comment or 
    otherwise).
    '''
    METHOD_2_LEVEL and print(TRACE_MESSAGE_FORMAT % TRACE_MESSAGE_ARGS)

METHOD_2_LEVEL_FOO = 'foo'
METHOD_2_LEVEL_BAR = 'bar'
METHOD_2_LEVEL_DISABLED = 'disabled'

METHOD_2_ENABLED_LEVELS = set([
    METHOD_2_LEVEL_FOO,
    METHOD_2_LEVEL_BAR,
])

def method_2b():
    '''Mix of method_2 with enabled levels.'''
    METHOD_2_LEVEL_DISABLED in METHOD_2_ENABLED_LEVELS and print(TRACE_MESSAGE_FORMAT % TRACE_MESSAGE_ARGS)


method_3_logger = getLogger(__name__)
method_3_trace = method_3_logger.debug

def method_3():
    '''Use the built-in logging mechanism.
    
    This was included somewhat as an afterthought, since we've already
    shown that function calls are the major overhead to avoid. 
    
    It turns out this is by far the worst performing option available!
    '''
    method_3_trace(TRACE_MESSAGE_FORMAT, *TRACE_MESSAGE_ARGS)



def main():
    print('Method 0: ', min(timeit.Timer(method_0).repeat()))

    print('Method 1a: ', min(timeit.Timer(method_1a).repeat()))
    print('Method 1b: ', min(timeit.Timer(method_1b).repeat()))
    print('Method 1c: ', min(timeit.Timer(method_1c).repeat()))

    print('Method 2: ', min(timeit.Timer(method_2).repeat()))
    print('Method 2b: ', min(timeit.Timer(method_2b).repeat()))

    print('Method 3: ', min(timeit.Timer(method_3).repeat()))



if __name__ == '__main__':
    main()

On my machine here at work, the output of running that module looks like this:
Method 0:  0.203303173335
Method 1a:  1.00159193476
Method 1b:  0.559357103269
Method 1c:  0.222934412164
Method 2:  0.193546388001
Method 2b:  0.244648784973
Method 3:  5.09330805922

Method 0 provides a simple baseline for the test -- the fastest run of the test with no code takes .2 seconds. Given that we're running the test method a million times (the default for the timeit.Timer.repeat method), that seems reasonable for Python.

The Method 1{a,b,c} results represent variations on a simple trace(level, message) metaphor that I've frequently used to do simple tracing. As you can see, dropping the string interpolation from 1a in test 1b cuts the time nearly in half, and dropping the function call in test 1c drops it nearly in half again. The point to take away from this is this: a trace function really can't afford to do any work while it's disabled!

Method 2 turns out to be my preferred choice for simple tracing now -- just guard a print statement with a boolean condition that's either enabled or disabled. Note, it's important that the object be simple to evaluate in a boolean context (e.g, True, False, 0, 1, etc) -- it wouldn't do any good if we had to check a complicated logic function to see if we were supposed to get past the guard. Method 2b is a simple variation on this theme that tries to combine the efficiency of method 2 with the convenience of method 1*. However, the slight degradation in performance doesn't seem worth it to me.

Astoundingly, using Python's provided logging module provided the worst performance by far! But, when you think about it, the logging module in concept is similar to the 1a method, but has to do a good deal more work to check it's guard condition. And in fact, judging from these results, it looks like it's about 5 times as much work!

There seem to be a couple of lessons we could take away from this brief analysis:

  1. If possible, be sure not to call any functions when checking a guard condition for a trace method -- function calls in Python are too expensive to be executing just to prevent printing. (Note that this includes any dot-operator accesses, as expressions like "a.b" are essentially just method calls like "a.__getattr__(b)", with some additional magic applied.
  2. If you must use something like the logging module or a level-checking function for trace, only include trace where absolutely necessary, and never inside frequently executed loops. If you need the trace there for debugging, be sure to comment it out when you've finished your investigation.

1 comment:

  1. I just stumbled across a similar problem and related solution to using the logging module in Python that's worth mentioning:

    http://dound.com/2010/02/python-logging-performance/

    ReplyDelete