• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0
Storyyeller

strange performance patterns

22 posts in this topic

I have a python script which does some calculations, and I am trying to figure out how fast it is. I invoke it from the command line and it prints out the time taken after completing.

However, there is a mysterious pattern to the timings. When I first run it, it starts out relatively fast. However, subsequent runs become slower and slower. If I leave the computer idle for a while, it becomes fast again. I can't really think of any explanation for this. What is going on?
0

Share this post


Link to post
Share on other sites
What is the difference in time? Have you run your code through a profiler? (Have a look at cProfile: http://docs.python.org/library/profile.html.)

What OS?
What version of Python?
Can you give a basic description of what your code is doing?
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326295402' post='4901654']
However, there is a mysterious pattern to the timings. When I first run it, it starts out relatively fast. However, subsequent runs become slower and slower. If I leave the computer idle for a while, it becomes fast again. I can't really think of any explanation for this. What is going on?[/quote]
Does your script use vast amounts of memory? We sometimes observe this kind of behaviour on scripts that consume 8GB+ of memory (and thus overflow into virtual memory) - it seems the OS often takes a while to free up all that memory back to the next invocation.
0

Share this post


Link to post
Share on other sites
My os is Windows 7 and I'm using Python 2.6

As far as memory, according to the task manager, the python.exe process peaks around 6mb of memory, so I doubt that that is a problem.
0

Share this post


Link to post
Share on other sites
If the timing of the overall script with a simple timer is inconsistent, how would in depth profiling help?
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326312477' post='4901736']
If the timing of the overall script with a simple timer is inconsistent, how would in depth profiling help?[/quote]
You profile several runs, making sure to profile both a quick and a slow run. Compare where the majority of time is spent in each, and you may just have your culprit...
0

Share this post


Link to post
Share on other sites
While profiling, I noticed that a large amount of time is spent finding the set bits in an integer. Does anyone know the most effecient way to do this? Here's what I have right now. What would be ideal is a c implementation, but there doesn't appear to be one in the standard library, and I don't want to require other libraries.

[source lang='python']def toList(bs):
assert(bs >= 0) #negative numbers have infinitely many bits set
return [i for i,b in enumerate(reversed(bin(bs))) if b=='1']


[/source]
0

Share this post


Link to post
Share on other sites
Read all of [url="http://wiki.python.org/moin/BitManipulation"]BitManipulation[/url]. In particular, you might be able to build some variation of [color=#000000][font=Arial, Verdana, Geneva,]Kernighan/Knuth's bitCount() method, to iterate bits instead.[/font][/color]

[color=#000000][font=Arial, Verdana, Geneva,]Your current solution uses a string conversion, a reversal, an enumeration and a list comphrehension all together - that can't be very fast.[/font][/color]
-1

Share this post


Link to post
Share on other sites
That page looks out of date and inaccurate. Most of the methods involving manual loops are very slow in Python. (This is actually what I tried before I came up with the faster version posted above).
0

Share this post


Link to post
Share on other sites
Well, this problem intrigues me, so I took the time to produce what should be the optimal solution (for compiled languages, at least). It's based on [url="http://supertech.csail.mit.edu/papers/debruijn.pdf"]de Bruijn sequences[/url] to eliminate as many loop iterations as possible, and it's implemented as an iterator/generator.

[CODE]
_DeBruijnBitPosition = [
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
]

def _trunc32( w ):
w = int( ( w & 0x7fffFFFF ) | -( w & 0x80000000 ) )
return w

def findSetBits(v):
while v:
index = _DeBruijnBitPosition[_trunc32((v & -v) * 0x077CB531) >> 27]
v = (v & v-1)
yield index

# usage: list(findSetBits(4937)) => [0, 3, 6, 8, 9, 12]
[/CODE]

I don't happen to have a native python interpreter handy - fancy benchmarking this against your earlier list comprehension solution?
0

Share this post


Link to post
Share on other sites
That won't work in Python, because ints are arbitrary precision. The paper assumes fixed 32bit ints with overflow.
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326321224' post='4901788']
Is that only for 32bit ints?[/quote]
Yes - I had assumed you were operating on a maximum of 32-bits. It's not easy to generalise any of the high-performance solutions to a large number of bits.

How big are your bit sets typically?
0

Share this post


Link to post
Share on other sites
Well in this particular test case, the bit sets have length 109, though I hope it generalizes to larger values.
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326321806' post='4901797']
Well in this particular test case, the bit sets have length 109, though I hope it generalizes to larger values.[/quote]
One could derive further sequence values, but if you don't have a fixed upper bound, probably not worth it.

My research group attempted a similar thing for a large scale bittorrent simulation, with upwards of 1,000 elements per bitset. In the end, we ported the whole damn thing to C++, because we just couldn't squeeze enough performance out of python in this area.

That said, PyPy's performance has come along way in the last few years - have you considered trying your current solution out on their interpreter?



/aside:
[quote name='Storyyeller' timestamp='1326321224' post='4901788']
That won't work in Python, because ints are arbitrary precision. The paper assumes fixed 32bit ints with overflow.[/quote]
That's why I had to explicitly truncate to 32-bits with the _trunc32 function. I think I have handled the overflow correctly... Either way, it seems to pass all my unit tests.
0

Share this post


Link to post
Share on other sites
/aside:
[quote name='Storyyeller' timestamp='1326321224' post='4901788']
That won't work in Python, because ints are arbitrary precision. The paper assumes fixed 32bit ints with overflow.[/quote]
That's why I had to explicitly truncate to 32-bits with the _trunc32 function. I think I have handled the overflow correctly... Either way, it seems to pass all my unit tests.
0

Share this post


Link to post
[quote name='Storyyeller' timestamp='1326312477' post='4901736']
If the timing of the overall script with a simple timer is inconsistent, how would in depth profiling help?
[/quote]
Timing tells you it's slow. Profiling tells you why. You want to know the latter, presumably?

For counting bits, perhaps something like:

[code]
def count_bits(i):
return len(bin(i).replace('0', '')) - 1 # -1 for 'b'
[/code]
?

I would guess that moving the loop in to C land might help.
0

Share this post


Link to post
Share on other sites
[quote name='swiftcoder' timestamp='1326322188' post='4901799']
That's why I had to explicitly truncate to 32-bits with the _trunc32 function. I think I have handled the overflow correctly... Either way, it seems to pass all my unit tests.
[/quote]

Sorry, I said that while reading the paper and didn't look carefully at the code. It looks like your code will work as long as the bitset size is at most 32.



[quote name='edd²' timestamp='1326322461' post='4901802']
For counting bits, perhaps something like:

[code]
def count_bits(i):
return len(bin(i).replace('0', '')) - 1 # -1 for 'b'
[/code]
?

I would guess that moving the loop in to C land might help.
[/quote]

Personally, I use [code]bin(x).count("1")[/code] for counting bits. But the main problem here is gettign a list of set bits, not counting them.
0

Share this post


Link to post
Share on other sites
Ah, missed that. [s]Are you sure that's the problem? Do you have benchmark results?[/s]
EDIT: doh, and misread question.
0

Share this post


Link to post
Share on other sites
I figured out a way to rewrite my code so it doesn't require getting individual bits at all. Of course, it does mean more work has to be done in other places, but it is still faster over all.

Here's the profiler results after all the optimizations I've done.
[CODE]
5209449 function calls (5047367 primitive calls) in 27.757 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 27.757 27.757 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 bisect.py:1(<module>)
1904 0.017 0.000 0.036 0.000 bitset.py:10(bitset)
112424 0.640 0.000 1.129 0.000 bitset.py:3(bitcount)
321 0.008 0.000 0.009 0.000 bitset.py:33(toList)
1 0.000 0.000 0.000 0.000 bitset.py:41(makeKeyMap)
656208 2.846 0.000 4.449 0.000 bitset.py:7(union)
73374/13957 0.587 0.000 21.646 0.002 calculation.py:126(getCostSub2)
582249 1.164 0.000 1.164 0.000 calculation.py:14(isSubset)
31678 2.773 0.000 21.387 0.001 calculation.py:140(getSkeletonConnectLists)
112583 1.627 0.000 6.066 0.000 calculation.py:168(getTouchedChildren)
268969 1.865 0.000 3.028 0.000 calculation.py:17(connectKeyTupIsLessEq)
221132 0.478 0.000 0.478 0.000 calculation.py:173(getTouchSet)
751/109 0.018 0.000 0.118 0.001 calculation.py:186(makeCalcNode)
109 0.002 0.000 0.122 0.001 calculation.py:202(makeCalcRoot)
1 0.001 0.001 0.123 0.123 calculation.py:207(buildCalculationTrees)
321 0.008 0.000 0.077 0.000 calculation.py:22(makeConnectDict_Bitset)
109 0.001 0.000 0.003 0.000 calculation.py:31(__init__)
1983 0.005 0.000 0.005 0.000 calculation.py:36(getCost)
109 0.002 0.000 0.003 0.000 calculation.py:41(__init__)
342 0.032 0.000 25.089 0.073 calculation.py:46(getCost)
86698 1.167 0.000 4.451 0.000 calculation.py:6(insertIncomparableList)
321 0.006 0.000 0.097 0.000 calculation.py:68(__init__)
85023/2223 0.653 0.000 25.051 0.011 calculation.py:78(getCost)
12660/1466 1.070 0.000 25.019 0.017 calculation.py:95(getCostSub)
1453/321 0.022 0.000 0.026 0.000 connect.py:1(connectivityDictSub)
321 0.021 0.000 0.052 0.000 connect.py:18(connectivityDict)
215922 1.427 0.000 1.797 0.000 decomposition.py:13(<lambda>)
1 0.004 0.004 2.441 2.441 decomposition.py:33(carvingDecomposition)
236 0.001 0.000 0.002 0.000 decomposition.py:41(<lambda>)
1 0.006 0.006 0.036 0.036 decomposition.py:52(bitsetizeKeys)
1124 0.007 0.000 0.025 0.000 decomposition.py:53(convertEdge)
106 0.008 0.000 2.433 0.023 decomposition.py:6(greedySplitNode)
5989 0.039 0.000 0.073 0.000 decomposition.py:9(<lambda>)
212 0.001 0.000 0.001 0.000 graph.py:12(removeEdge)
107 0.000 0.000 0.001 0.000 graph.py:15(numNeighbors)
325 0.001 0.000 0.001 0.000 graph.py:4(__init__)
545 0.002 0.000 0.002 0.000 graph.py:8(addEdge)
1 0.003 0.003 0.003 0.003 heapq.py:31(<module>)
3 0.000 0.000 0.000 0.000 ntpath.py:122(splitdrive)
3 0.000 0.000 0.000 0.000 ntpath.py:55(isabs)
1 0.000 0.000 0.000 0.000 ntpath.py:63(join)
1 0.000 0.000 2.601 2.601 rsglib.py:27(__init__)
342 0.005 0.000 25.115 0.073 rsglib.py:40(getCost)
1 0.001 0.001 0.002 0.002 test.py:20(makeGraph)
493 0.007 0.000 0.007 0.000 test.py:31(getSuccessorStates)
1 0.007 0.007 25.145 25.145 test.py:38(aStarSearch)
342 0.010 0.000 25.127 0.073 test.py:41(push)
1 0.005 0.005 27.754 27.754 test.py:57(testLevel)
1 0.001 0.001 0.002 0.002 test.py:9(parseLevel)
220466 0.508 0.000 0.508 0.000 unionfind_bitset.py:18(getContainingSet)
406014 5.661 0.000 11.071 0.000 unionfind_bitset.py:3(unionBitsetLists)
153 0.001 0.000 0.001 0.000 {_heapq.heappop}
342 0.002 0.000 0.002 0.000 {_heapq.heappush}
196397 0.370 0.000 0.370 0.000 {any}
112745 0.250 0.000 0.250 0.000 {bin}
361464 0.640 0.000 0.640 0.000 {len}
2005/1097 0.014 0.000 0.033 0.000 {map}
1 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
216 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
574394 1.092 0.000 1.092 0.000 {method 'append' of 'list' objects}
112424 0.240 0.000 0.240 0.000 {method 'count' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
106 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
215 0.001 0.000 0.001 0.000 {method 'items' of 'dict' objects}
321 0.001 0.000 0.001 0.000 {method 'iteritems' of 'dict' objects}
431 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'readlines' of 'file' objects}
8 0.000 0.000 0.000 0.000 {method 'strip' of 'str' objects}
106 0.001 0.000 0.001 0.000 {method 'union' of 'frozenset' objects}
322 0.001 0.000 0.001 0.000 {method 'values' of 'dict' objects}
15168 0.593 0.000 2.388 0.000 {min}
1 0.000 0.000 0.000 0.000 {open}
65237 0.169 0.000 0.169 0.000 {range}
656208 1.602 0.000 1.602 0.000 {reduce}
7899/1910 0.064 0.000 0.114 0.000 {sorted}

[/CODE]
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326334144' post='4901863']
I figured out a way to rewrite my code so it doesn't require getting individual bits at all. Of course, it does mean more work has to be done in other places, but it is still faster over all.

Here's the profiler results after all the optimizations I've done.
...
[/quote]

What did the profile report look like before? Profiling only _after_ an optimization is kind of a useless exercise.
0

Share this post


Link to post
Share on other sites
So do any parts of that profile change significantly when when running your program repeatedly?
0

Share this post


Link to post
Share on other sites
[quote name='Storyyeller' timestamp='1326334144' post='4901863']
I figured out a way to rewrite my code so it doesn't require getting individual bits at all. Of course, it does mean more work has to be done in other places, but it is still faster over all.

Here's the profiler results after all the optimizations I've done.
[CODE]
5209449 function calls (5047367 primitive calls) in 27.757 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 27.757 27.757 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 bisect.py:1(<module>)
1904 0.017 0.000 0.036 0.000 bitset.py:10(bitset)
112424 0.640 0.000 1.129 0.000 bitset.py:3(bitcount)
321 0.008 0.000 0.009 0.000 bitset.py:33(toList)
1 0.000 0.000 0.000 0.000 bitset.py:41(makeKeyMap)
656208 2.846 0.000 4.449 0.000 bitset.py:7(union)
73374/13957 0.587 0.000 21.646 0.002 calculation.py:126(getCostSub2)
582249 1.164 0.000 1.164 0.000 calculation.py:14(isSubset)
31678 2.773 0.000 21.387 0.001 calculation.py:140(getSkeletonConnectLists)
112583 1.627 0.000 6.066 0.000 calculation.py:168(getTouchedChildren)
268969 1.865 0.000 3.028 0.000 calculation.py:17(connectKeyTupIsLessEq)
221132 0.478 0.000 0.478 0.000 calculation.py:173(getTouchSet)
751/109 0.018 0.000 0.118 0.001 calculation.py:186(makeCalcNode)
109 0.002 0.000 0.122 0.001 calculation.py:202(makeCalcRoot)
1 0.001 0.001 0.123 0.123 calculation.py:207(buildCalculationTrees)
321 0.008 0.000 0.077 0.000 calculation.py:22(makeConnectDict_Bitset)
109 0.001 0.000 0.003 0.000 calculation.py:31(__init__)
1983 0.005 0.000 0.005 0.000 calculation.py:36(getCost)
109 0.002 0.000 0.003 0.000 calculation.py:41(__init__)
342 0.032 0.000 25.089 0.073 calculation.py:46(getCost)
86698 1.167 0.000 4.451 0.000 calculation.py:6(insertIncomparableList)
321 0.006 0.000 0.097 0.000 calculation.py:68(__init__)
85023/2223 0.653 0.000 25.051 0.011 calculation.py:78(getCost)
12660/1466 1.070 0.000 25.019 0.017 calculation.py:95(getCostSub)
1453/321 0.022 0.000 0.026 0.000 connect.py:1(connectivityDictSub)
321 0.021 0.000 0.052 0.000 connect.py:18(connectivityDict)
215922 1.427 0.000 1.797 0.000 decomposition.py:13(<lambda>)
1 0.004 0.004 2.441 2.441 decomposition.py:33(carvingDecomposition)
236 0.001 0.000 0.002 0.000 decomposition.py:41(<lambda>)
1 0.006 0.006 0.036 0.036 decomposition.py:52(bitsetizeKeys)
1124 0.007 0.000 0.025 0.000 decomposition.py:53(convertEdge)
106 0.008 0.000 2.433 0.023 decomposition.py:6(greedySplitNode)
5989 0.039 0.000 0.073 0.000 decomposition.py:9(<lambda>)
212 0.001 0.000 0.001 0.000 graph.py:12(removeEdge)
107 0.000 0.000 0.001 0.000 graph.py:15(numNeighbors)
325 0.001 0.000 0.001 0.000 graph.py:4(__init__)
545 0.002 0.000 0.002 0.000 graph.py:8(addEdge)
1 0.003 0.003 0.003 0.003 heapq.py:31(<module>)
3 0.000 0.000 0.000 0.000 ntpath.py:122(splitdrive)
3 0.000 0.000 0.000 0.000 ntpath.py:55(isabs)
1 0.000 0.000 0.000 0.000 ntpath.py:63(join)
1 0.000 0.000 2.601 2.601 rsglib.py:27(__init__)
342 0.005 0.000 25.115 0.073 rsglib.py:40(getCost)
1 0.001 0.001 0.002 0.002 test.py:20(makeGraph)
493 0.007 0.000 0.007 0.000 test.py:31(getSuccessorStates)
1 0.007 0.007 25.145 25.145 test.py:38(aStarSearch)
342 0.010 0.000 25.127 0.073 test.py:41(push)
1 0.005 0.005 27.754 27.754 test.py:57(testLevel)
1 0.001 0.001 0.002 0.002 test.py:9(parseLevel)
220466 0.508 0.000 0.508 0.000 unionfind_bitset.py:18(getContainingSet)
406014 5.661 0.000 11.071 0.000 unionfind_bitset.py:3(unionBitsetLists)
153 0.001 0.000 0.001 0.000 {_heapq.heappop}
342 0.002 0.000 0.002 0.000 {_heapq.heappush}
196397 0.370 0.000 0.370 0.000 {any}
112745 0.250 0.000 0.250 0.000 {bin}
361464 0.640 0.000 0.640 0.000 {len}
2005/1097 0.014 0.000 0.033 0.000 {map}
1 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
216 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
574394 1.092 0.000 1.092 0.000 {method 'append' of 'list' objects}
112424 0.240 0.000 0.240 0.000 {method 'count' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
106 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
215 0.001 0.000 0.001 0.000 {method 'items' of 'dict' objects}
321 0.001 0.000 0.001 0.000 {method 'iteritems' of 'dict' objects}
431 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'readlines' of 'file' objects}
8 0.000 0.000 0.000 0.000 {method 'strip' of 'str' objects}
106 0.001 0.000 0.001 0.000 {method 'union' of 'frozenset' objects}
322 0.001 0.000 0.001 0.000 {method 'values' of 'dict' objects}
15168 0.593 0.000 2.388 0.000 {min}
1 0.000 0.000 0.000 0.000 {open}
65237 0.169 0.000 0.169 0.000 {range}
656208 1.602 0.000 1.602 0.000 {reduce}
7899/1910 0.064 0.000 0.114 0.000 {sorted}

[/CODE]
[/quote]

Maybe I wasn't clear enough. What I mean is, showing us the POST-optimization profiling report tells us nothing unless you actually have a PRE-optimization report to compare it to.

If you DO have both reports (but just haven't posted the other one), great! You should now know where your bottle neck is and/or if your 'optimizations' had the desired effect.
1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0