Extracting an array from another array- speed

Started by
8 comments, last by Zeke 22 years, 7 months ago
I have an array of CStrings (yes I know mfc is a pain but I didnt know that when i began using it and now I just havent got the time to change it to win32). This array can be very big (20,000 is the biggest Ive needed it so far). In the array are names suck as "Horatio_00_00", "Horatio_01_00","Hamlet_00_00" etc. Now I need to be able to extract and return a CString array of every occurance of the name Horatio that has a different end number (so basically if the original array was : "Horatio_00_00", "Horatio_01_00", "Horatio_00_01", "Horatio_01_01", "Horatio_00_06", "Horatio_01_06", "Horatio_00_13", "Horatio_01_13", then my returned array would be: "Horatio_00_00", "Horatio_00_01", "Horatio_00_06", "Horatio_00_12", I can do this fine except when I got the array with 20,000 CStrings I noticed a slight slowdown (it took 15 minutes just to get through the first 1000). I know the reason for this iwas using a nested loop where x and y started off at 0 and went up to 20,000 like so: for (int x=0;x<NumElements;x++) { for (int y=0;y<NumElements;y++) { } } I was having to do this twice. Once to find out how many Elements I needed in my resultant array (so that i could allocate the memory) and then again to actually put it in the array. Of course this is fine if I have only a few elements in my array but way too long and inefficient if i have many. There has to be another way to get just the few elements of the array I need without having to loop through it like this but I cant think of any other way to do it. If anyone could hlep me I would be very grateful. Thanks
Just my thoughts take them as you will. "People spend too much time thinking about the past, whatever else it is, its gone"-Mel Gibson, Man Without A Face
Advertisement
One of my first rules of data structures is "Do optimization up front where you''re already spending ticks."

In analysis, I have to wonder what the CStrings are representing. Two options leap out immediately, but I''m MFC deficient (Delphi hax0r), so bear with me.

I''d add another array to act as a hash table. Every unique element I add to the big array gets added to the hash array with the new value, along with the quantity and list of indexes that contain that unique value. It is a tradeoff of size for speed, true, but hashing is almost always faster than brute-force.

Can your CString[] be configured to ignore duplicates like TStrings or TList in Delphi?

Is it undesirable for you to store duplicates in your large array, or are they in fact meaningful to your application?

---------------
-WarMage
...breck my chain at the door...
CString does reference counting, and stores the string length as a variable so it doesn''t need to recompute its length. I prefer MFC''s CString over the STL std::string, except for the fact that you can only use it in MFC programs.

I would say that you need to retune your representation, as WarMage mentioned. It''s hard to tell from this exactly what you are doing, but you might want to think of having a multi-dimensional array or hash table/map to solve this problem. Right off the bat, I''m thinking those numbers at the end need to be integers in a structure rather than strings appended to your names, but I don''t know exactly what you''re doing.
Thanks for your reply WarMage.

There arent actually any duplicates in my array as the CStrings are FileNames (of images) so if there are duplicates they need to be there because they are different images (i.e. a different path).

Now as for your hash table you have intrigued me (and pretty much lost me lol), I would much much much prefer to use more memory than for it to be so slow (6000 small images took me 1/2 hour to process and all i am doing to the images is compressing them and possibly converting them from 16bit to 24bit or vice versa- this should not take as long as it is doing). So id rather do this hash table thing. Could you perhaps explain it a bit more in depth?

Thanks again for the reply
Just my thoughts take them as you will. "People spend too much time thinking about the past, whatever else it is, its gone"-Mel Gibson, Man Without A Face
Compression can take a long time.

If I were you, I''d comment out all the stuff that''s changing image formats or doing compression, and time just the search by itself, to make sure it really is the major source of your slowdown.
Thanks Stoffel and AP.

AP>Yeah its definately this extracting thing. Ive put
TRACE("Processing") in the first loop:

for (int x=0;x{
for (int y blah blah)
{
}
TRACE("Processing")
}

And then Ive watched it in my debug window and its definately here that it spends 90% of its time.

Stoffel>This is the format of the CStrings CharacterName_Direction_Pose

So for Hamlet_00_00 it means the image is of Hamlet it is the first direction and the first pose. What Im trying to do is group all the poses for each name together so that i can pass along a CString* that contains all of the Hamlet_Directions_00 etc and convert them/compress them and put them in a dat file.

Hmmmm I still dont think im explaining this too well. I cant see a way a multidimensional array can help and im not sure what you or WarMage mean by hash table/map.

Thanks again for the help guys
Just my thoughts take them as you will. "People spend too much time thinking about the past, whatever else it is, its gone"-Mel Gibson, Man Without A Face
First, hashing is only an optimization for lookup, and particularly only if every element is not unique.

Consider your hash table to be a directory or catalog of your actual array, stored as an array of records. This directory should contain certain data that are useful for managing and manipulating the array itself.

XHashEntry = {CString Key, Int Count, CString CSVIndexList}

Key is the unique value in the array: "Horatio_00_01"
Count is the number of elements of this value, you inc and dec it when you add or remove an element of your array whose key matches this record.key

CSVIndexList is a comma-seperated value list containing the indices to which this record applies. Edit - Upon further consideration, this could also be an array of int, a vector, or some other ordered list class as well.

if your list is built in the order
1: "Horatio_01_01"
2: "Horatio_00_00"
3: "Horatio_01_00"
4: "Horatio_00_01"
5: "Horatio_00_06"
6: "Horatio_01_13"
7: "Horatio_01_06"

your hash would look like
1: {"Horatio_00_00", 2, "2,3"}
2: {"Horatio_00_01", 2, "1,4"}
3: {"Horatio_00_06", 2, "5,7"}
4: {"Horatio_00_13", 1, "6"}

Then you can search all of record.Keys for your desired value, and know how many elements you have of that type, and where they live.

----------------
-WarMage
...over 3000 recto-cranial inversions performed...

Edited by - WarMage on September 24, 2001 3:39:21 PM
Have you considered a tree? Your aim is to get a list of pictures that some property, right?

How about this structure:
Horatio|+-- 00|   ||   +-- Horatio_00_00|   ||   +-- Horatio_00_01|   ||   +-- Horatio_00_06|   ||   +-- Horatio_00_13|+-- 01    |    +-- Horatio_01_00    |    +-- Horatio_01_01    |    +-- Horatio_01_06    |    +-- Horatio_01_13 

And so on. That way, all you gotta do is return a pointer to the '00' node or '01' node and so forth.

You have a tiny overhead for insertion and deletion (if you want them sorted/balanced) but it worth it if you can speed up searching. log (n) + n... I think. log(n) to get the node and n to traverse the rest.

---
PAGE FAULT: Please insert "Swap File, Disk 2"
and press any key to continue.

Edited by - OberonZ on September 24, 2001 6:29:10 PM

Edited by - OberonZ on September 24, 2001 6:31:07 PM
---
PAGE FAULT: Please insert "Swap File, Disk 2"
and press any key to continue.
Thanks guys. Im not too sure whether im gonna go for the hash table thingy or the tree, I guess ill try both and see which is faster. Thanks a lot for the help
Just my thoughts take them as you will. "People spend too much time thinking about the past, whatever else it is, its gone"-Mel Gibson, Man Without A Face
Using the STL Multimap you can store and find any type of data. The following example uses STL strings ...

Note. Multimap needs the data to be entered in a sorted order.

#pragma warning(disable:4786)

#include &ltmap>
#include &ltstring>
#include &ltiostream>

using namespace std;

void main()
{
multimap&ltstring, string> mMap; // this line generates Warning 4786
multimap&ltstring, string>::iterator mMapIterator;
multimap&ltstring, string>::iterator mMapIteratorStart;
multimap&ltstring, string>::iterator mMapIteratorEnd;

mMap.insert(map&ltstring, string>::value_type("Horatio_00", "_01"));
mMap.insert(map&ltstring, string>::value_type("Horatio_00", "_02"));
mMap.insert(map&ltstring, string>::value_type("Horatio_00", "_03"));
mMap.insert(map&ltstring, string>::value_type("Horatio_00", "_04"));
mMap.insert(map&ltstring, string>::value_type("Horatio_01", "_01"));
mMap.insert(map&ltstring, string>::value_type("Horatio_01", "_02"));
mMap.insert(map&ltstring, string>::value_type("Horatio_01", "_03"));
mMap.insert(map&ltstring, string>::value_type("Horatio_01", "_04"));

mMapIteratorStart = mMap.equal_range("Horatio_00").first;
mMapIteratorEnd = mMap.equal_range("Horatio_00").second;

for (mMapIterator = mMapIteratorStart; mMapIterator != mMapIteratorEnd; mMapIterator++)
{
cout << (*MapIterator).first << (*mMapIterator).second << "\n";
}
}

Hope it helps

This topic is closed to new replies.

Advertisement