Archived

This topic is now archived and is closed to further replies.

Sorting multi dimensional structures

This topic is 5582 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Hi there, I'm looking for the quickest way of sorting a kind of multi dimensional list with a few parameters to sort by, I'll try and explain as good as I can! struct { int structused; int id; float weight; int occurences; int positions[48]; } data[MAXSTRUCTS][500] Occurences could be any number > 0, the first 48 (or less) occurences are recorded as positions ie. "hello there you there" = "there" 2 occurences, position 2+4 (well 1+3, starting at 0) MAXSTRUCTS would be > 1 but at the moment < 6 but never > 10. Hope that made sense. 500 different ids is the max but there could be less (ie. structused). Say I had three lists: ---------- data[0][0] = id(100), weight(0.9873), occurences(3), positions(0, 2, 8) data[0][1] = id(56), weight(0.8946), occurences(2), positions(3, 10) ----------- data[1][0] = id(34), weight(0.9983), occurences(3), positions(1, 3, 6) data[1][1] = id(56), weight(0.9000), occurences(2), positions(4, 13) ----------- data[2][0] = id(34), weight(0.9999), occurences(4), positions(2, 9, 14) data[2][1] = id(100), weight(0.9873), occurences(3), positions(1, 3, 9) data[2][2] = id(56), weight(0.7635), occurences(1), positions(5, 10) ----------- A perfect match would be id(56) as its in all structures plus in order, positions [0]3, [1]4, [2]5, then id(34) as it has a higher combined weight than id(100) plus id(34) has a match of [1]1, [2]2. The more structs the id appears in the better, if more than one appears in all, the positions and weights come into play with order then weight as importance. Phew, that was difficult to explain I hope someone understands! any help would be appreciated. Edit: Forgot to say each data[x] is already sorted by weight order. Regards.

Share on other sites
Do you have a good reason of sticking to C? When working with comprex data structures, C++ and STL would be of great help.

Share on other sites
Only because that is what I''m used to and feel most comfortable with, the rest of the programs that make up the project are also C.

Share on other sites
You describe a total order thus it's a one dimensional case. Any conventionnal sort algorithm applies. In fact you can create a specific "number" or "string" n for each id. Then just sort all your ids with quicksort if you want (std C lib). You can also use a hash table to accelerate. But I suppose, given the size of data, it's not really worth optimizing it too much.

For n, for instance, simply concatenate the bytes or ints in a string :

id occurences in the structs (don't mix it with the occurences of the word in the sentences)
+ weight
+ positions.

Create a struct like that for each id :

- int id;
- char *string;

then it's only sorting strings ...

Well I replied quickly and not certain I understood your pb perfectly ...

[edited by - Charles B on July 4, 2003 5:52:45 PM]

Share on other sites
Hi Charles B,

Thanks for your help but yeah I think you misunderstood, I wish I did only have to sort by ids!

I need to sort by weight order, then by word position order over multiple structures which may contain a matched id (which is what I want) but may not.

If an id appears in all data[thisdimension][] structures the sort goes to positions, where a perfect match would be:
data[0][x].pos = 1, data[0][x].id = 100
data[1][x].pos = 2, data[1][x].id = 100
data[2][x].pos = 3, data[2][x].id = 100
with the highest combined weight.
x might not be the same number but data[][x].id is as the lists are already sorted by weight.

Head ache material trying to explain that

Cheers anyway.

[edited by - glacai on July 4, 2003 6:34:18 PM]

Share on other sites
K let's try again ... step by step this time so that you can correct the points if I misunderstood something

1) Your output is an ordered list of structures (not ids as I first thought). So this means there is a total order : if a < b and b < c then a < c (no cycle possible).

2) Priority 1 is the number of occurences of the structure id in all tables. For instance. All structs with id 56 are before structures with id 100.

3) Priority 2 is if two ids appear a same number of occurences in your tables, then you consider the sum of (combined) weights to know wich id wins.

4) Priority 3 is when two structs share the same id, you order them with positions. It's an alphabetic order that compares strings of numbers (instead of characters). Though you did not mention it I suppose 0,2 is lower than 0,2,3 (right?).

So my starting idea may be readaptated.

Then first sort all your ids.

First build an array that contains id structs like this :

- id
- occurences in struct tables (MAXSTRUCT) // BTW it's a bit confusing why not call it MAXSTRUCTARRAYS, as each array contains structs.
- combined weight.

Parsing your structures, for each new id you meet create an id struct. Store your current index. Continue and each time you meet the id again update the occurence and combined weight counts. BTW if your ids were consecutive it could be much faster. Remark (a) : You would simply have to access the table by indexing with id. If you have only a few holes use a fake id ( 0 ? ) that will remain inactive and not really sorted later. So you could parse your structs linearily (n) and not with a double loop ( n*(n-1)/2 )

Then sort your table by occurences, then weight. BTW if your weights were integers it would be simple to concatenate occurences in most significant bits (msb) and combined weight in lsb. Struct occurences in 8 bytes and weight in fixed point 4.20 will be accurate enough right ? If your number of ids is small then no need to use a super algorithm.

Now that the ids are sorted, for each id in the odered list parse every structure and push each structure with that id in your output table (push only a pointer to the struct probably ... I don't know your application details).

Once it's done for each id, then in each same-id sub table, sort each structure by position (string). If your number of structs per id is small then the basic n square sorts will be good enough.

This time I suppose I am right. It's the naive approach. However I am certain the whole process can be optimised. Still as the sort is split into two passes, it's not as inefficient as it seems. It's like a hashing algorithm in first order.

I think the complexity can be optimal much better than n log(n) in practice using remark a).

Damn you pushed my IQ and english level close to its boundaries ROFL. That's why I lost time to reply. Thanks God no head ache yet.

[edited by - Charles B on July 4, 2003 8:41:12 PM]

Share on other sites
Nice one Charles!!!!

That is so much appreciated you wouldn''t believe, in the last couple of hours I have nearly implemented exactly as you are suggesting! But its nice for someone else to recommend something similar as I had thought, slightly reassuring.

I agree there''s probably a better way but hopefully this will do for now, I should know in the next hour or so.

I really appreciate your time, thanks!

Share on other sites
Given you can define an ordering rules, as said as before, a < b, b < c -> a < c. You can reduce your problem to a one dimensional sorting problem.

you can use the qsort() function, passing it a function that compares 2 of your structures.

if would be something like:

int compare(void *va, void * vb)
{
struct {
int structused;
int id;
float weight;
int occurences;
int positions[48];
} * a = va, * b = vb;
if(a->id < b->id) return -1;
else if(a->id > b->id) return 1;
....
}

Given that you can sort your array pretty easily.
It seens that the main problem here is that you cannot explain correctly your problem to us.

Share on other sites
quote:
Given that you can sort your array pretty easily.
It seens that the main problem here is that you cannot explain correctly your problem to us.

Well it seems Charles B understood, thanks anyway.

Share on other sites
In fact this morning (same in France and UK I think) I saw a straightforward optimisation.

One remark first. There is no way to modify your structs so that a single quicksort call can be used. The total order is not explicit but a result of intrinsic rules. Even if someone tried to use a sort on the whole structs at once this would be highly unoptimised as the id order needs to be done in a separate pass anyway. So the other poster did not undertand your issue.

You understood that pass 1 can be linear with trick a).

I think you understood but more precisely :

You parse all your structures just once, a single loop. You access ids-structs with id being considered as the index in an id-struct array. So you don't have to seek where the id is in the table when you meet the id in another struct again. The table creation can't be optimised.

The sort of these ids can now use a hash table based on id occurence I suppose these occurences are fairly limited by your MAXSTRUCT constant. If you don't see how to optimize such a hash I can explain it to you in more details. It's based on an histogram that you transform into a list of consecutive smaller arrays (ptr, size) of ids with same occurences. This is all linear. Then you only have to sort the ids in these small arrays. Which should be quasi linear as these are very small lists.

Then back to pass 2 the sort of structs that share a same id. The problem in my first implementation is that you need to parse the whole structs for each id. which is some kind of n-squared. How to make it linear and access each struct once or twice only again ?

Simple ! In pass 1/ array construction create a single linked list (SLL) of structs in each id struct. So you just have to add a pNextWithSameID pointer in your structs and a pHead, pTail pointer for the SLL management in each id-struct.

this would look like :
for( pStruct = &StructArray[0][0]; pStruct <= pLastStruct; pStruct++ ){   int id = pStruct->id;   mySTRUCT *pPrev = IDArray[id]->pTail;   if(pPrev==0) // Empty SLL test.   {     IDArray[id].pHead = pStruct;     IDArray[id].pTail = pStruct; // Start the SLL with one element.   }   else   {     pTail->pNextWithSameID = pStruct;     idArray[id]->pTail = pStruct; // Push the structure in the SLL.   }}

Now you still have to sort your structures for each id. Simply parse your id-struct list, parse the SLL, push struct pointers in an array of 500*MAXSTRUCT elements, keep the starting point of the sub-array for each id and keep a count of those ids. Then sort the sub arrays of struct pointers with the standard quicksort routine for instance. Or use a hash table to accelerate again.

When you have finished this all your structs are ordered by their pointers in an array.

There is another solution where you dont need SLLs, but only arrays. That's more efficient for the cache. You just need to count the number of structs per ID, an histogram (ID, count). Iterate and create an array of pointers from the histogram, which build a list of sub arrays inside your 500*MAXSTRUCT array of struct pointers. Then parse the structs again and push their pointers in the sub arrays treated as stacks. Then reorder order each sub array with quicksort.

Probably all you need to be quicker to solve your own problems is to gather enough culture in maths and general algorithms to be able to select words (vulgar or scientific) that carry meaningful concepts. Then the solutions appear more quickly when the problems are slightly rewritten. For instance simply speaking of total order and priorities helped me to see it all much more clearly. But that's only experience and scientific culture. Don't worry I was really messy in the past too and you found a solution your own way which is good.

BTW ... something else. Don't believe all what your politicians or papers said about France. They are all liars, in France too. Chirac=Blair=Bush=liars. I still enjoy visiting England when I can and don't care much about considerations for these intellectual crooks.

[edited by - Charles B on July 5, 2003 6:14:34 AM]

1. 1
2. 2
3. 3
4. 4
Rutin
19
5. 5

• 14
• 14
• 9
• 9
• 9
• Forum Statistics

• Total Topics
632927
• Total Posts
3009253
• Who's Online (See full list)

There are no registered users currently online

×