string vs. int comparison which is faster?

Started by
10 comments, last by taz0010 13 years, 1 month ago
Which is faster string or int comparisons? I am assuming int is due to it should only have to check a single value... What about the new sse4.2? instructions, I thought I read they had added hardware to do this faster now?

Thanks!
Advertisement
I can't imagine anything being faster than an int comparison, though string compare time I'm sure is a factor of the length of the string.

If you're comparing one character strings, that might be close to an int, I'd guess it probably has to first compare lengths, then compare the individual characters. I'm not sure if it does char-by-char comparison, or maybe it can pack 4 chars into an int and do the compare, or maybe it does some kind of SSE compare. It shouldn't ever be faster than an int, maybe sometimes roughly the same, sometimes longer.
[size=2]My Projects:
[size=2]Portfolio Map for Android - Free Visual Portfolio Tracker
[size=2]Electron Flux for Android - Free Puzzle/Logic Game
[color="#FF0000"]Warning: The below is mostly hypothesizing, I don't know what's actually going on under the hood.

I don't have a clue about sse4.2, but in general int comparison is way faster than string. Best case scenario string is equal in speed to int, worst case depends on the length of the string.

Basically, imagine two arrays of 8 bit integers, then image running through the arrays comparing the values:

[1, 2, 3, 4, 5] == [1, 2, 5, 6, 7]


Is [1] == [1]? Yes, so move on.
Is [2] == [2]? Yes, so move on.
Is [3] == [5]? Nope, so the two strings are not equal.

If the very first letter of both strings are different, then it only has to check the first letter to know.
But if both the strings are entirely the same, it has to loop through every letter to confirm it.
If they are both the same except for the last letter, it still has to loop through every letter to confirm it.
The longer the similar parts of the strings are, the longer it takes to compare them.

I assume it's heavily optimized, but there is no way they can optimize comparing two 20-element arrays of 8bit integers to be faster than comparing two 1-element arrays of 32 bit integers - consider: they've also optimized int comparison as best as they can too.

If you have to do alot of these comparisons, maybe look into string hashing (But don't optimize pre-maturely). String comparison for very long very similar strings is working fast enough for my current project, because I only have to compare them when a new resource is loaded).

[color="#FF0000"]Warning: The below is mostly hypothesizing, I don't know what's actually going on under the hood.

I don't have a clue about sse4.2, but in general int comparison is way faster than string. Best case scenario string is equal in speed to int, worst case depends on the length of the string.

Basically, imagine two arrays of 8 bit integers, then image running through the arrays comparing the values:

[1, 2, 3, 4, 5] == [1, 2, 5, 6, 7]


Is [1] == [1]? Yes, so move on.
Is [2] == [2]? Yes, so move on.
Is [3] == [5]? Nope, so the two strings are not equal.

If the very first letter of both strings are different, then it only has to check the first letter to know.
But if both the strings are entirely the same, it has to loop through every letter to confirm it.
If they are both the same except for the last letter, it still has to loop through every letter to confirm it.
The longer the similar parts of the strings are, the longer it takes to compare them.

I assume it's heavily optimized, but there is no way they can optimize comparing two 20-element arrays of 8bit integers to be faster than comparing two 1-element arrays of 32 bit integers - consider: they've also optimized int comparison as best as they can too.

If you have to do alot of these comparisons, maybe look into string hashing (But don't optimize pre-maturely). String comparison for very long very similar strings is working fast enough for my current project, because I only have to compare them when a new resource is loaded).


I am planning on making a common resource box that allows all objects to pull the needed textures, mehses, sounds, ect.. from and storing the filenames is nice as its unique but calling a string comparison thousands of times a frame seems like huge waste of cycles... This is why I asked. My first game I just used strings but that was a simple arcade game with 20 objects in it. Next one could have 100's or more.

I am planning on making a common resource box that allows all objects to pull the needed textures, mehses, sounds, ect.. from and storing the filenames is nice as its unique but calling a string comparison thousands of times a frame seems like huge waste of cycles... This is why I asked. My first game I just used strings but that was a simple arcade game with 20 objects in it. Next one could have 100's or more.

I do something similar for tile images. When the Tile object is created or changed, it calls the image cache to get the image, and stores a pointer (boost::shared_ptr, actually) to the image. It only needs to do it once (whenever the Tile is created or recreated) not every frame, since it keeps the pointer to the image. Consider whether you actually need to do the comparison every frame, or whether you can just store the pointer.
If you can't, look up string hashing - I never used it before, but it looks like it'll help your situation.
Read some useful info here about strings for resource names and hashing them: http://bitsquid.blogspot.com/2010/10/static-hash-values.html (dont miss the comment section).
Preimptive optimization is never a good idea.

Is there a speed issue with using strings? I am guessing you have no idea if it is an issue or not.

Don't worry about it unless you profile it and see that it is an problem.

theTroll
Short version:
comparing two ints is significantly faster.

Long version:
Below you can see three different comparison functions and the assembly/machine code produced by the compiler (Visual Studio 2010):

INTEGER COMPARISON
bool iCompare ( int a, int b ){
return(a==B);
}
yields this assembly/machine code:?iCompare@@YI_NHH@Z (bool __fastcall iCompare(int,int)):
00000000: 33 C0 xor eax,eax
00000002: 3B CA cmp ecx,edx
00000004: 0F 94 C0 sete al
00000007: C3 ret

TRADITIONAL ASCII STRING COMPARISON
bool aCompare ( char* a, char* b, int n ){
while(n>=0)
if(a[n]!=b[n])
return false;
else
n--;

return true;
}
yields this assembly/machine code:?aCompare@@YI_NPAD0H@Z (bool __fastcall aCompare(char *,char *,int)):
00000000: 8B 44 24 04 mov eax,dword ptr [esp+4]
00000004: 56 push esi
00000005: 85 C0 test eax,eax
00000007: 74 12 je 0000001B
00000009: 8D 34 02 lea esi,[edx+eax]
0000000C: 2B CA sub ecx,edx
0000000E: 8B FF mov edi,edi
00000010: 8A 14 31 mov dl,byte ptr [ecx+esi]
00000013: 3A 16 cmp dl,byte ptr [esi]
00000015: 75 0A jne 00000021
00000017: 4E dec esi
00000018: 48 dec eax
00000019: 75 F5 jne 00000010
0000001B: B0 01 mov al,1
0000001D: 5E pop esi
0000001E: C2 04 00 ret 4
00000021: 32 C0 xor al,al
00000023: 5E pop esi
00000024: C2 04 00 ret 4

STD::STRING COMPARISON
bool sCompare ( string a, string b ){
return(a==B);
}
yields this assembly/machine code:?sCompare@@YI_NV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@0@Z (bool __fastcall sCompare(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >)):
00000000: 6A FF push 0FFFFFFFFh
00000002: 68 00 00 00 00 push offset __ehhandler$?sCompare@@YI_NV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@0@Z
00000007: 64 A1 00 00 00 00 mov eax,dword ptr fs:[00000000h]
0000000D: 50 push eax
0000000E: 53 push ebx
0000000F: A1 00 00 00 00 mov eax,dword ptr [___security_cookie]
00000014: 33 C4 xor eax,esp
00000016: 50 push eax
00000017: 8D 44 24 08 lea eax,[esp+8]
0000001B: 64 A3 00 00 00 00 mov dword ptr fs:[00000000h],eax
00000021: 83 7C 24 48 10 cmp dword ptr [esp+48h],10h
00000026: 8B 44 24 34 mov eax,dword ptr [esp+34h]
0000002A: C7 44 24 10 01 00 mov dword ptr [esp+10h],1
00 00
00000032: 73 04 jae 00000038
00000034: 8D 44 24 34 lea eax,[esp+34h]
00000038: 8B 4C 24 44 mov ecx,dword ptr [esp+44h]
0000003C: 8B 54 24 28 mov edx,dword ptr [esp+28h]
00000040: 51 push ecx
00000041: 50 push eax
00000042: 52 push edx
00000043: 6A 00 push 0
00000045: 8D 4C 24 28 lea ecx,[esp+28h]
00000049: E8 00 00 00 00 call ?compare@?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QBEHIIPBDI@Z
0000004E: 85 C0 test eax,eax
00000050: 0F 94 C3 sete bl
00000053: 83 7C 24 2C 10 cmp dword ptr [esp+2Ch],10h
00000058: 72 0D jb 00000067
0000005A: 8B 44 24 18 mov eax,dword ptr [esp+18h]
0000005E: 50 push eax
0000005F: E8 00 00 00 00 call ??3@YAXPAX@Z
00000064: 83 C4 04 add esp,4
00000067: 83 7C 24 48 10 cmp dword ptr [esp+48h],10h
0000006C: C7 44 24 2C 0F 00 mov dword ptr [esp+2Ch],0Fh
00 00
00000074: C7 44 24 28 00 00 mov dword ptr [esp+28h],0
00 00
0000007C: C6 44 24 18 00 mov byte ptr [esp+18h],0
00000081: 72 0D jb 00000090
00000083: 8B 4C 24 34 mov ecx,dword ptr [esp+34h]
00000087: 51 push ecx
00000088: E8 00 00 00 00 call ??3@YAXPAX@Z
0000008D: 83 C4 04 add esp,4
00000090: 8A C3 mov al,bl
00000092: 8B 4C 24 08 mov ecx,dword ptr [esp+8]
00000096: 64 89 0D 00 00 00 mov dword ptr fs:[0],ecx
00
0000009D: 59 pop ecx
0000009E: 5B pop ebx
0000009F: 83 C4 0C add esp,0Ch
000000A2: C2 38 00 ret 38h
Why does it matter which is faster?

Why does it matter which is faster?


Because as of now I am pulling my resources e.g. shaders from a cache...

I guess I could just assign pointer aliases to the objects and have that pointer set at load time and avoid looking up the resources in realtime...

This topic is closed to new replies.

Advertisement