strange performance patterns

Started by
21 comments, last by larsbutler 12 years, 3 months ago

I figured out a way to rewrite my code so it doesn't require getting individual bits at all. Of course, it does mean more work has to be done in other places, but it is still faster over all.

Here's the profiler results after all the optimizations I've done.
...


What did the profile report look like before? Profiling only _after_ an optimization is kind of a useless exercise.
Advertisement
Why? All the old profiling would show is the parts that I already changed.
I trust exceptions about as far as I can throw them.
So do any parts of that profile change significantly when when running your program repeatedly?

I figured out a way to rewrite my code so it doesn't require getting individual bits at all. Of course, it does mean more work has to be done in other places, but it is still faster over all.

Here's the profiler results after all the optimizations I've done.

5209449 function calls (5047367 primitive calls) in 27.757 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 27.757 27.757 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 bisect.py:1(<module>)
1904 0.017 0.000 0.036 0.000 bitset.py:10(bitset)
112424 0.640 0.000 1.129 0.000 bitset.py:3(bitcount)
321 0.008 0.000 0.009 0.000 bitset.py:33(toList)
1 0.000 0.000 0.000 0.000 bitset.py:41(makeKeyMap)
656208 2.846 0.000 4.449 0.000 bitset.py:7(union)
73374/13957 0.587 0.000 21.646 0.002 calculation.py:126(getCostSub2)
582249 1.164 0.000 1.164 0.000 calculation.py:14(isSubset)
31678 2.773 0.000 21.387 0.001 calculation.py:140(getSkeletonConnectLists)
112583 1.627 0.000 6.066 0.000 calculation.py:168(getTouchedChildren)
268969 1.865 0.000 3.028 0.000 calculation.py:17(connectKeyTupIsLessEq)
221132 0.478 0.000 0.478 0.000 calculation.py:173(getTouchSet)
751/109 0.018 0.000 0.118 0.001 calculation.py:186(makeCalcNode)
109 0.002 0.000 0.122 0.001 calculation.py:202(makeCalcRoot)
1 0.001 0.001 0.123 0.123 calculation.py:207(buildCalculationTrees)
321 0.008 0.000 0.077 0.000 calculation.py:22(makeConnectDict_Bitset)
109 0.001 0.000 0.003 0.000 calculation.py:31(__init__)
1983 0.005 0.000 0.005 0.000 calculation.py:36(getCost)
109 0.002 0.000 0.003 0.000 calculation.py:41(__init__)
342 0.032 0.000 25.089 0.073 calculation.py:46(getCost)
86698 1.167 0.000 4.451 0.000 calculation.py:6(insertIncomparableList)
321 0.006 0.000 0.097 0.000 calculation.py:68(__init__)
85023/2223 0.653 0.000 25.051 0.011 calculation.py:78(getCost)
12660/1466 1.070 0.000 25.019 0.017 calculation.py:95(getCostSub)
1453/321 0.022 0.000 0.026 0.000 connect.py:1(connectivityDictSub)
321 0.021 0.000 0.052 0.000 connect.py:18(connectivityDict)
215922 1.427 0.000 1.797 0.000 decomposition.py:13(<lambda>)
1 0.004 0.004 2.441 2.441 decomposition.py:33(carvingDecomposition)
236 0.001 0.000 0.002 0.000 decomposition.py:41(<lambda>)
1 0.006 0.006 0.036 0.036 decomposition.py:52(bitsetizeKeys)
1124 0.007 0.000 0.025 0.000 decomposition.py:53(convertEdge)
106 0.008 0.000 2.433 0.023 decomposition.py:6(greedySplitNode)
5989 0.039 0.000 0.073 0.000 decomposition.py:9(<lambda>)
212 0.001 0.000 0.001 0.000 graph.py:12(removeEdge)
107 0.000 0.000 0.001 0.000 graph.py:15(numNeighbors)
325 0.001 0.000 0.001 0.000 graph.py:4(__init__)
545 0.002 0.000 0.002 0.000 graph.py:8(addEdge)
1 0.003 0.003 0.003 0.003 heapq.py:31(<module>)
3 0.000 0.000 0.000 0.000 ntpath.py:122(splitdrive)
3 0.000 0.000 0.000 0.000 ntpath.py:55(isabs)
1 0.000 0.000 0.000 0.000 ntpath.py:63(join)
1 0.000 0.000 2.601 2.601 rsglib.py:27(__init__)
342 0.005 0.000 25.115 0.073 rsglib.py:40(getCost)
1 0.001 0.001 0.002 0.002 test.py:20(makeGraph)
493 0.007 0.000 0.007 0.000 test.py:31(getSuccessorStates)
1 0.007 0.007 25.145 25.145 test.py:38(aStarSearch)
342 0.010 0.000 25.127 0.073 test.py:41(push)
1 0.005 0.005 27.754 27.754 test.py:57(testLevel)
1 0.001 0.001 0.002 0.002 test.py:9(parseLevel)
220466 0.508 0.000 0.508 0.000 unionfind_bitset.py:18(getContainingSet)
406014 5.661 0.000 11.071 0.000 unionfind_bitset.py:3(unionBitsetLists)
153 0.001 0.000 0.001 0.000 {_heapq.heappop}
342 0.002 0.000 0.002 0.000 {_heapq.heappush}
196397 0.370 0.000 0.370 0.000 {any}
112745 0.250 0.000 0.250 0.000 {bin}
361464 0.640 0.000 0.640 0.000 {len}
2005/1097 0.014 0.000 0.033 0.000 {map}
1 0.000 0.000 0.000 0.000 {method '__enter__' of 'file' objects}
216 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
574394 1.092 0.000 1.092 0.000 {method 'append' of 'list' objects}
112424 0.240 0.000 0.240 0.000 {method 'count' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
106 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
215 0.001 0.000 0.001 0.000 {method 'items' of 'dict' objects}
321 0.001 0.000 0.001 0.000 {method 'iteritems' of 'dict' objects}
431 0.001 0.000 0.001 0.000 {method 'pop' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'readlines' of 'file' objects}
8 0.000 0.000 0.000 0.000 {method 'strip' of 'str' objects}
106 0.001 0.000 0.001 0.000 {method 'union' of 'frozenset' objects}
322 0.001 0.000 0.001 0.000 {method 'values' of 'dict' objects}
15168 0.593 0.000 2.388 0.000 {min}
1 0.000 0.000 0.000 0.000 {open}
65237 0.169 0.000 0.169 0.000 {range}
656208 1.602 0.000 1.602 0.000 {reduce}
7899/1910 0.064 0.000 0.114 0.000 {sorted}




Maybe I wasn't clear enough. What I mean is, showing us the POST-optimization profiling report tells us nothing unless you actually have a PRE-optimization report to compare it to.

If you DO have both reports (but just haven't posted the other one), great! You should now know where your bottle neck is and/or if your 'optimizations' had the desired effect.

This topic is closed to new replies.

Advertisement