# Packing Order Lookup Table Optimization

This topic is 2604 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

[size="5"]Background:
I'm currently working on some SIMD related stuff and a critical operation is 'packing' all elements in an 8-wide simd variable that satisfy a mask into memory. So let's say I have
 simd = [x y A d e B k j] mask = [0 0 1 0 0 1 0 0] memory = [A B]

[size="5"]The allowed operations:

simd (operator) simd
or
(operator) simd

Standard math
+/-/*/and/or/!/abs

Permute
8 wide SIMD has 2 lanes.
lanes = [1 1 1 1 2 2 2 2].

In a permute, an output slot's data can come from any input slot, if that slot combination doesn't cross lanes
For example,
 in = [1 2 3 4 5 6 7 8] out = [_ _ _ _ _ _ _ _]

The 1st to 4th slot of out can be any one of [1 2 3 4].
The 5th to 8th slot can be any one of [5 6 7 8]
The selection parameter is an int [A(2 bits) B(2 bits) C(2 bits) D(2 bits)] where each element specifies what slot I select from. The 2 bits means that the left side's permute pattern has to match the right hand permute pattern. For example,
 in = [0 5 1 5 2 5 3 5] selection = [0 0 2 2] out = [0 0 1 1 2 2 3 3]

Cross
Example,
 in = [1 2 3 4 5 6 7 8] out = [_ _ _ _ _ _ _ _]

The 1st to 4th slot can be [1 2 3 4] or [5 6 7 8]. The 5th to 8th slot can be [1 2 3 4] or [5 6 7 8].

[size="5"]An intuitive way to do the packing
(From http://stackoverflow...-gather-indices)
1. Define a lookup table that
a) maps from a mask to a permute / cross operation
2. At runtime, compute the mask, lookup a permute/cross parameter, permute and then cross.
 a = data b = permute(a, K) c = cross(a, b, L)

[size="5"]My current lookup table generation is as follows:
1. Let simd in = [Left Right] = [L0 L1 L2 L3 R0 R1 R2 R3]
2. Compute a permute selection that moves all valid elements in Left to as left as possible
3. Compute a permute selection that
a. moves 4 - length(valid elements in Left) valid elements in Right to length(valid elements in Left)
b. moves the rest in Right to as left as possible

For example, in = [0 A B 0 0 C D E]
1. Desired Left is [A B 0 0]
2/3. Desired Right is [E 0 C D]
When crossed, we get [A B C D E 0 0 0]

I can generate the parameter lookup table, but this is currently 2^8 * 3 entries.
1. A permute parameter for Left and Right
2. A mask -> length lookup table. E.g. [0 0 1 0 0 1 0 0] = 2

Are there any patterns that I can exploit to reduce the number of entries?

1. 1
2. 2
3. 3
Rutin
19
4. 4
khawk
14
5. 5
frob
12

• 9
• 11
• 11
• 23
• 12
• ### Forum Statistics

• Total Topics
633659
• Total Posts
3013207
×