

sse-alignment troubles


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • This topic is locked
33 replies to this topic

#1 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 03:42 AM

I've got an array with the input geometry - 100k entries, an array of triangles:

 

Triangle triangles[trianglesMAX];
 
Each triangle is 9 floats (36 bytes each).
 
Unfortunately that is not a multiple of 16 bytes, and unfortunately, to pad it up, I need to add a whole 12 empty bytes.
 
So I wonder how I could change this. Should I change Triangle into an SseTriangle (or whatever a better name would be), built from 12 floats (48 bytes)? That would grow the input RAM size from 3.6 MB to 4.8 MB, which could slow down the RAM side of things.
 
Alternatively, I could copy each record to a local variable buffer and then work on that - not sure if that wouldn't be better.
 
(All in all, I'm still a little confused here.)
 
The other question is which 'keyword' to use in GCC to align global and local variables for SSE use. I mainly use C types like float3, float9, float12 - how do I align those?
 
SSE has a type called __m128, which is probably aligned automatically, but it would be inconvenient for me to use it everywhere - in SSE-agnostic places I would like to use float3 and then recast it to __m128, so I would prefer to use both.
 



#2 haegarr   Crossbones+   -  Reputation: 4581


Posted 29 June 2014 - 04:11 AM

You currently have an "array of structures", or AoS for short, i.e. a vertex (the structure) sequenced into an array. For SSE it is often better to have a "structure of arrays", or SoA for short. This means splitting the vertex into parts, where each part gets its own array. These arrays can usually be organized better w.r.t. SSE.

 

What semantics do the 9 floats of your vertex have, and what operations should be done on them?



#3 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 04:40 AM

You currently have an "array of structures", or AoS for short, i.e. a vertex (the structure) sequenced into an array. For SSE it is often better to have a "structure of arrays", or SoA for short. This means splitting the vertex into parts, where each part gets its own array. These arrays can usually be organized better w.r.t. SSE.

 

What semantics do the 9 floats of your vertex have, and what operations should be done on them?

 

For each triangle (let's call it abc - it has vertices a, b, c) I

need a cross product and a normalization to get the normal:

 

n = (b-a) x (c-a)

n = normalize(n)

 

Then I need to multiply it by the model_pos matrix:

 

n = modelpos * n

 

Then I need to calculate some lighting factors:

 

Color *= (n * LightDir) * LightColor

 

Though here with the color it is a bit more complicated, as the base color being multiplied is in packed ARGB format, not float3 like all the rest.

 

More info is in another thread near here, "how to optymize this with sse or sse intrinsics" - I'm still a bit confused with this, as SSE is a bit new

to me



#4 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 04:59 AM

Make an array of __m128 and make your triangles and normals indices into that array.

To fill the elements of the __m128, make a union with an array of floats, or define four elements a, b, c, d or x, y, z, w - however you want.

For the transformation you go blindly through the array.

 

The compiler will detect the best way to use sse instructions.



#5 haegarr   Crossbones+   -  Reputation: 4581


Posted 29 June 2014 - 06:52 AM


For each triangle (let's call it abc - it has vertices a, b, c) I
need a cross product and a normalization to get the normal:

Presumably (but I'm not an SSE expert, so someone may contradict me): The best performance for such a problem comes with a memory layout where each SSE register holds the same components of 4 vertices. I.e.

uint count = ( vertices.length + 3 ) / 4;
__m128 verticesX[count];
__m128 verticesY[count];
__m128 verticesZ[count];

Fill the arrays with the data of the vertices a, b, c of the first 4-tuple of triangles, then of the second 4-tuple of triangles, and so on. In memory you then have something like:

verticesX[0] : tri[0].vertex_a.x, tri[1].vertex_a.x, tri[2].vertex_a.x, tri[3].vertex_a.x 
verticesX[1] : tri[0].vertex_b.x, tri[1].vertex_b.x, tri[2].vertex_b.x, tri[3].vertex_b.x
verticesX[2] : tri[0].vertex_c.x, tri[1].vertex_c.x, tri[2].vertex_c.x, tri[3].vertex_c.x
verticesX[3] : tri[4].vertex_a.x, tri[5].vertex_a.x, tri[6].vertex_a.x, tri[7].vertex_a.x 
verticesX[4] : tri[4].vertex_b.x, tri[5].vertex_b.x, tri[6].vertex_b.x, tri[7].vertex_b.x
verticesX[5] : tri[4].vertex_c.x, tri[5].vertex_c.x, tri[6].vertex_c.x, tri[7].vertex_c.x 
...
verticesY: analogously, but with the .y component

verticesZ: analogously, but with the .z component

 
Then computations along the scheme
dx01 = verticesX[i+0] - verticesX[i+1];
dy01 = verticesY[i+0] - verticesY[i+1];
dz01 = verticesZ[i+0] - verticesZ[i+1];
dx02 = verticesX[i+0] - verticesX[i+2];
dy02 = verticesY[i+0] - verticesY[i+2];
dz02 = verticesZ[i+0] - verticesZ[i+2];

nx = dy01 * dz02 - dz01 * dy02;
ny = dz01 * dx02 - dx01 * dz02;
nz = dx01 * dy02 - dy01 * dx02;

len = sqrt(nx * nx + ny * ny + nz * nz);

nx /= len;
ny /= len;
nz /= len;

should result in the normals of 4 triangles per run.

 


then i need to multiply it by model_pos matrix

Doing the same trickery with the model matrix requires each of its components to be replicated 4 times, so that each register holds 4 times the same value. It is not clear to me what "model_pos" means, but if it is the transform that relates the model to the world, all you need is the 3x3 sub-matrix that stores the rotational part since the vectors you are about to transform are direction vectors.



#6 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 07:04 AM

The question was more about alignment. Say:

 

void f()

{

  float3 x;

 

}

 

What 'keyword' do I use to align it to 16 bytes in GCC?

 

As for SSE, I will try to learn both the vertical way (the 4-pack one) and the horizontal way (harder, but sometimes easier to integrate into a procedure). Thanks.



#7 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 07:09 AM

If you use __m128 type variables, they are automatically 16-byte aligned, unless you force a different alignment.



#8 imoogiBG   Members   -  Reputation: 1245


Posted 29 June 2014 - 07:14 AM

Currently compilers are really bad at autovectorization.

haegarr already pointed out a simple solution; however, SoA is not really comfortable, especially if you want to modify a vertex buffer (the most common case), where the data is usually interleaved (AoS).

 

If you can convert interleaved data to SoA, use the _mm_stream_ps intrinsic.

If we are talking about interleaved data, e.g. (Triangle triangles[trianglesMAX], where Triangle has 3 vectors that contain 3 points), and it is not possible (or worth it) to convert to SoA:

alternatively i could copy each record to local variable buffer and then work on this - not sure if this wouldnt be better

Well, this is a good solution. Because the data is not aligned, you should use movups (_mm_loadu_ps, for example) instructions to load your data into the xmm registers. While writing SSE code, keep in mind that __m128 is not just a type - the compiler will try to keep every __m128 in a register.

 

Pay attention when writing the cross product operation - it is really tricky.

Here is my implementation; you should try to find a better solution.

 

 

EDIT:

what 'keyword' to use to align it to 16 bytes in GCC?

__attribute__(align(16)) GCC

__declspec(align(16)) MSVC

alignas(16) C++11, not implemented on all compilers yet.

 

 

cross (a.m_M128, b.m_M128 )

__m128 T = _mm_shuffle_ps(a.m_M128, a.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X 0)
__m128 V = _mm_shuffle_ps(b.m_M128, b.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X 0)


//i(ay*bz - by*az)  + j(bx*az - ax*bz)  + k(ax*by - bx*ay)
T = _mm_mul_ps(T, b.m_M128);//bx * ay, by * az, bz * ax
V = _mm_mul_ps(V, a.m_M128);//ax * by, ay * bz, az * bx
V = _mm_sub_ps(V, T);

//V holds the result
V = _mm_shuffle_ps(V, V, SGE_SIMD_SHUFFLE(1, 2, 0, 3));

Helpful link:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

 

If you prefer to write autovectorization friendly code, please tell me I will do my best to help you.


Edited by imoogiBG, 29 June 2014 - 08:12 AM.


#9 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 07:18 AM

I use a gcc-4.8.x compiler on my Linux machine and it did a good job even though I did not use __m128 data. Using a struct with 4 floats activated the SSE instruction set and made the software pretty fast. Multiplying the list of vectors by a matrix even got converted to SSE instructions. It may be possible to do better, but only within a range of single clock cycles.



#10 imoogiBG   Members   -  Reputation: 1245


Posted 29 June 2014 - 07:26 AM

I use a gcc-4.8.x compiler on my Linux machine and it did a good job even though I did not use __m128 data. Using a struct with 4 floats activated the SSE instruction set and made the software pretty fast. Multiplying the list of vectors by a matrix even got converted to SSE instructions. It may be possible to do better, but only within a range of single clock cycles.

 

Just because there are SSE instructions doesn't mean that the compiler did a great job.

For example, if you want to sum two vec4 vectors and you're not writing autovectorization-friendly code, there will be, say, 4 addss (plus tons of movss) instead of a single addps.

 

EDIT:

what 'keyword' to use to align it to 16 bytes in GCC?

__attribute__((align(16))) GCC

__declspec(align(16)) MSVC

alignas(16) C++11, not implemented on all compilers yet.

 

Those will also modify sizeof(TYPE).


Edited by imoogiBG, 29 June 2014 - 07:35 AM.


#11 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 07:48 AM

You are right about the instructions used. But the time needed to transform about 300k vectors is nearly the same. This is what I meant before: the difference between hand-optimized code and the output of the compiler may be only single clock cycles.

Sure, it can be of some interest. But in most cases it makes more sense to do a high-level optimization.


Edited by Tribad, 29 June 2014 - 07:48 AM.


#12 imoogiBG   Members   -  Reputation: 1245


Posted 29 June 2014 - 08:06 AM

You are right about the instructions used. But the time needed to transform about 300k vectors is nearly the same... But in most cases it makes more sense to do a high-level optimization.

Well, this depends on the situation (prefetch friendliness, instruction cache, code complexity). Usually I convert to SSE code that is executed more than once and is critical.

Of course I hate writing SSE directly - it breaks everything, the code is ugly... blah.

But there are a lot of cases where intrinsic code is much faster. Hopefully in the future that code can be scrapped.


Edited by imoogiBG, 29 June 2014 - 08:10 AM.


#13 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 08:18 AM

 

Currently compilers are really bad at autovectorization.

haegarr already pointed out a simple solution; however, SoA is not really comfortable, especially if you want to modify a vertex buffer (the most common case), where the data is usually interleaved (AoS).

 

If you can convert interleaved data to SoA, use the _mm_stream_ps intrinsic.

If we are talking about interleaved data, e.g. (Triangle triangles[trianglesMAX], where Triangle has 3 vectors that contain 3 points), and it is not possible (or worth it) to convert to SoA:

alternatively i could copy each record to local variable buffer and then work on this - not sure if this wouldnt be better

Well, this is a good solution. Because the data is not aligned, you should use movups (_mm_loadu_ps, for example) instructions to load your data into the xmm registers. While writing SSE code, keep in mind that __m128 is not just a type - the compiler will try to keep every __m128 in a register.

 

Pay attention when writing the cross product operation - it is really tricky.

Here is my implementation; you should try to find a better solution.

 

 

EDIT:

what 'keyword' to use to align it to 16 bytes in GCC?

__attribute__(align(16)) GCC

__declspec(align(16)) MSVC

alignas(16) C++11, not implemented on all compilers yet.

 

 

cross (a.m_M128, b.m_M128 )

__m128 T = _mm_shuffle_ps(a.m_M128, a.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X 0)
__m128 V = _mm_shuffle_ps(b.m_M128, b.m_M128, SGE_SIMD_SHUFFLE(1, 2, 0, 3)); //(Y Z X 0)


//i(ay*bz - by*az)  + j(bx*az - ax*bz)  + k(ax*by - bx*ay)
T = _mm_mul_ps(T, b.m_M128);//bx * ay, by * az, bz * ax
V = _mm_mul_ps(V, a.m_M128);//ax * by, ay * bz, az * bx
V = _mm_sub_ps(V, T);

//V holds the result
V = _mm_shuffle_ps(V, V, SGE_SIMD_SHUFFLE(1, 2, 0, 3));

Helpful link:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/.

 

If you prefer to write autovectorization friendly code, please tell me I will do my best to help you.

 

Alright,

 

as I understand it there are two styles of SSE vectorization: "vertical", where you process nearly the same way as scalar code but with 4-packs, and "horizontal".

 

I would like to practice both of them a bit. I have an idea to do some very basic tests today; I may post the results here if someone is interested.

 

Could you maybe say whether I can typedef this "__attribute__(align(16)) GCC"

to get an always-aligned version of the "float3" struct I use for 3d?

 

I would like to typedef float3a (and float4a) types that are always aligned to 16 (but without too much waste of space, I mean

 

float3a tab[1000]; should be aligned and take 16k bytes).



#14 imoogiBG   Members   -  Reputation: 1245


Posted 29 June 2014 - 08:30 AM

typedef __declspec(align(16)) vec3 vec3a; is ignored and vec3a is NOT aligned to 16. This is on MSVC, and I think gcc will do the same.

 

struct __attribute__((align(16))) avec3 : public vec3
{};
 
should do the trick, but pay attention that sizeof(avec3) != sizeof(vec3): sizeof(avec3) should be 16, not 12.
Also keep in mind that the new operator DOES NOT return 16-byte-aligned addresses; if you want to use new, overload it and use something like aligned_malloc.
 
 
EDIT: 
 
I assume in this post that we are talking about float vectors.
 
as I understand it there are two styles of SSE vectorization: "vertical", where you process nearly the same way as scalar code but with 4-packs, and "horizontal"

No one calls it that. Your algorithm depends on the data layout, and there are 2 major data layouts: Structure of Arrays and Array of Structures (also called interleaved).
Edited by imoogiBG, 29 June 2014 - 08:35 AM.


#15 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 08:52 AM

 

typedef __declspec(align(16)) vec3 vec3a; is ignored and vec3a is NOT aligned to 16. This is on MSVC, and I think gcc will do the same.

 

struct __attribute__((align(16))) avec3 : public vec3
{};
 
should do the trick, but pay attention that sizeof(avec3) != sizeof(vec3): sizeof(avec3) should be 16, not 12.
Also keep in mind that the new operator DOES NOT return 16-byte-aligned addresses; if you want to use new, overload it and use something like aligned_malloc.
 
 
EDIT: 
 
I assume in this post that we are talking about float vectors.
 
as I understand it there are two styles of SSE vectorization: "vertical", where you process nearly the same way as scalar code but with 4-packs, and "horizontal"

No one calls it that. Your algorithm depends on the data layout, and there are 2 major data layouts: Structure of Arrays and Array of Structures (also called interleaved).

 

Great,

this naming may not be official, but I think it is worth using, as it describes the two main styles of writing.

 

I did some tests:

 

struct  __attribute__((align(16))) float3A { float x,y,z;};
typedef __attribute__((align(16))) float3 float3At;
 
float3 f1;
float3A f2;
float3At f3;
 
void tests()
{
  alert("%p ", &f1);
  alert("%p ", &f2);
  alert("%p ", &f3);
 
  exit(0);
 
}
 

and, opposite to what you said, the first two were aligned only to 4; only the last one (the one with the typedef) was aligned to 16 - very fine, this works *

 

If you would like to talk, it's possible I will have some more questions (I don't want to be an SSE master, but I would like to practice the basics so I can use it quickly sometimes).

 

 

  alert("%d ", sizeof(f1));
  alert("%d ", sizeof(f2));
  alert("%d ", sizeof(f3));
 
gives 12 12 12, not exactly what you said - maybe compilers differ. Sadly I'm using "# GNU C++ (tdm-1) version 4.7.1 (mingw32)" (C++ compilation, though on C-only code) - many thanks for pointing this out.
 
* edit
 
Damn, it seems it was aligned only by chance, so neither of these is working.
 
edit
 
This is working:
 
 typedef float3 float3At __attribute__ ((aligned (16)));
 
  alert("%d ", __alignof(f3));
 

-> 16

 

But there is another problem:

 

float3At tab[1000];

 

tests.c:118:18: error: alignment of array elements is greater than element size

Edited by fir, 29 June 2014 - 09:16 AM.


#16 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 11:53 AM



You are right about the instructions used. But the time needed to transform about 300k vectors is nearly the same. This is what I meant before: the difference between hand-optimized code and the output of the compiler may be only single clock cycles.

Sure, it can be of some interest. But in most cases it makes more sense to do a high-level optimization.

 

If so, how do you explain results like these:

 

  for(int i=0; i<1000*1000; i++)
  {
     tab3[i] =  tab1[i] * tab2[i];    //multiply 1 mln floats
  }
 

  for(int i=0; i<1000*1000; i+=4)
  {
     __m128 a = _mm_load_ps(  (float*)&tab1[i] );
     __m128 b = _mm_load_ps(  (float*)&tab2[i] );
     __m128 c = _mm_mul_ps(  a,b );
      _mm_store_ps(&tab3[i], c);
 
  }
 
Consecutive runs:
 
floats:
 
4.58 ms 4.58 ms 4.73 ms
 
m128:
 
2.91 ms 2.95 ms 2.92 ms
 
?
 
It's even more of a speedup than I expected - I was expecting more like 30% rather than 3x, because as far as I know most code is RAM-bound, not calculation-bound, so SSE would not help much in those cases (as I'm not sure how it could help improve RAM throughput; probably, and sadly, its abilities count for zero there).
But for me 30% is sometimes worth it. Also, in some other cases of heavy calculation with less RAM traffic, it can bring more like a 3x speedup.
 
So, speaking about "no difference", you are maybe spreading some misinformation (as some anti-optimization propaganda does). From my experience, my efforts to optimize code with "nano-optimizations" beyond algorithmic improvements in most cases gave a 2x-3x speedup (relative to a base case with -O3; I mean, starting with straightforward (but not laggy) C code compiled with -O3, searching for various "pico-optimizations" often gave a 2x, sometimes 3x, speedup).


#17 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 12:24 PM


So, speaking about "no difference", you are maybe spreading some misinformation (as some anti-optimization propaganda does). From my experience, my efforts to optimize code with "nano-optimizations" beyond algorithmic improvements in most cases gave a 2x-3x speedup (relative to a base case with -O3; I mean, starting with straightforward (but not laggy) C code compiled with -O3, searching for various "pico-optimizations" often gave a 2x, sometimes 3x, speedup).
 

 

Your first implementation is not a vector operation; it is a simple multiplication of values. And you did a high-level optimization, as I suggested, because you reorganized the data to take advantage of vector operations. This has nothing to do with misinformation - you are doing two different things.

 

But think of it like you always do. You are right and I am some misinformation-spreading noob. You are the king of coding.

typedef float tElement;

typedef union tagVector {
    __m128 xmm;
    struct {
        tElement x,y,z,w;
    } e;
}tVector;

typedef struct tagMatrix {
    tVector v[4];
} tMatrix;
#if 1

            tiew[s].e.x = m.v[0].e.x*tiev[s].e.x + m.v[1].e.x*tiev[s].e.y + m.v[2].e.x*tiev[s].e.z + m.v[3].e.x*tiev[s].e.w;
            tiew[s].e.y = m.v[0].e.y*tiev[s].e.x + m.v[1].e.y*tiev[s].e.y + m.v[2].e.y*tiev[s].e.z + m.v[3].e.y*tiev[s].e.w;
            tiew[s].e.z = m.v[0].e.z*tiev[s].e.x + m.v[1].e.z*tiev[s].e.y + m.v[2].e.z*tiev[s].e.z + m.v[3].e.z*tiev[s].e.w;
            tiew[s].e.w = m.v[0].e.w*tiev[s].e.x + m.v[1].e.w*tiev[s].e.y + m.v[2].e.w*tiev[s].e.z + m.v[3].e.w*tiev[s].e.w;
#else
            a=_mm_mul_ps(tiev[s].xmm, m.v[0].xmm);
            b=_mm_mul_ps(tiev[s].xmm, m.v[1].xmm);
            a=_mm_add_ps(a,b);
            c=_mm_mul_ps(tiev[s].xmm, m.v[2].xmm);
            a=_mm_add_ps(a, c);
            d=_mm_mul_ps(tiev[s].xmm, m.v[3].xmm);
            tiew[s].xmm=_mm_add_ps(a, d);
#endif

The two make little difference in speed. The first generates lots of scalar operations; the second uses the packed operations as expected. And the first is minimally faster, for whatever reason.

 

By giving the compiler a hint about the data type you use, __m128, it is able to produce highly optimized code that can maybe only be topped by time-consuming development of nano-optimized code. The difference is... the first, faster version is portable to any architecture.
 


Edited by Tribad, 29 June 2014 - 12:45 PM.


#18 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 12:44 PM

 


So, speaking about "no difference", you are maybe spreading some misinformation (as some anti-optimization propaganda does). From my experience, my efforts to optimize code with "nano-optimizations" beyond algorithmic improvements in most cases gave a 2x-3x speedup (relative to a base case with -O3; I mean, starting with straightforward (but not laggy) C code compiled with -O3, searching for various "pico-optimizations" often gave a 2x, sometimes 3x, speedup).
 

 

Your first implementation is not a vector operation; it is a simple multiplication of values. And you did a high-level optimization, as I suggested, because you reorganized the data to take advantage of vector operations. This has nothing to do with misinformation - you are doing two different things.

 

But think of it like you always do. You are right and I am some misinformation-spreading noob. You are the king of coding.

 

What?

This is rewriting the loop to SSE, so it is exactly the thing being talked about here.

 

(Though I must say now that the benchmark is very context-sensitive - I moved the loops into some outer loop and I cannot reproduce the results; they now seem about equal in speed.) (Needs more testing and some more arithmetic-heavy cases to show the speedup; as I said, most code cases are RAM-bound, I think.)

 

Besides, who said I am the king of coding? Why are you inventing that? I'm moderately experienced, but I dislike believing statements without good substantiation - and I'm also indifferent to propaganda.


Edited by fir, 29 June 2014 - 12:45 PM.


#19 Tribad   Members   -  Reputation: 887


Posted 29 June 2014 - 12:48 PM

You are right. I am the noob. You did a great job in rewriting a loop.


Edited by Tribad, 29 June 2014 - 12:51 PM.


#20 fir   Members   -  Reputation: -456


Posted 29 June 2014 - 12:51 PM

You are right. I am the noob.

You're wrong - I was not saying you are a noob. I don't know what you are talking about at all.

 

I was referring to this:

 

"I use a gcc-4.8.x compiler on my Linux machine and it did a good job even though I did not use __m128 data. Using a struct with 4 floats activated the SSE instruction set and made the software pretty fast. Multiplying the list of vectors by a matrix even got converted to SSE instructions. It may be possible to do better, but only within a range of single clock cycles."

 

If you are saying the compiler does something close to hand SSE optimization, I think you're wrong, and by saying so you are probably spreading misleading information (though I still cannot prove it; I just see no reason to believe it). Some of my previous experiences with the things some call "nano-optimizations" showed that such general statements (that it brings no optimization) were not true in my cases.


Edited by fir, 29 June 2014 - 01:02 PM.




