
## Recommended Posts

Why is it possible to overload operators for __m128 with the MSVC++ compiler (even though Microsoft didn't define any operators for __m128)? It seems strange compared to other built-in types such as float, but those are of course part of the C++ standard, whereas __m128 isn't.

```cpp
inline __m128 & __vectorcall operator+=(__m128 &v1, const __m128 &v2) {
    v1 = _mm_add_ps(v1, v2);
    return v1;
}
```


##### Share on other sites

Why would it be weird? You add one interpretation of binary + to the available set of options. You can overload any operator for any known type (in any combination).

The main reason why this isn't done more often is that an overloaded operator is surprisingly ambiguous, in the sense that users (i.e. us) have a surprisingly hard time correctly understanding the meaning of +, in particular if you didn't write the overloaded operators yourself. There are often several interpretations of "+" that are all equally valid choices. For this reason, it is often better to use a name rather than a symbol to describe the operation.

Other languages have gone as far as not allowing operator overloading at all, which makes sense too, given how few cases there are where it is genuinely useful.

##### Share on other sites
14 minutes ago, Alberth said:

Why would it be weird? You add one interpretation of binary + to the available set of options. You can overload any operator for any known type (in any combination).

But not for standard built-in types. And __m128 is a built-in type as well, though not a standard one.

Could there actually be a performance difference between:

```cpp
using SIMD32x4 = __m128;
```

and some non-class/non-member methods versus

```cpp
__declspec(align(16)) struct SIMD32x4 final {
    __m128 m_v;
};
```

and some class/member methods (including operator new/delete)? (All definitions are in the headers.)
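For concreteness, a minimal sketch of the wrapper approach from the question, assuming SSE is available. `alignas(16)` is used here as the standard spelling of `__declspec(align(16))`, and the member `operator+=` is one hypothetical way to attach operations to the wrapper; with everything defined inline in a header, an optimizing compiler typically keeps the struct in an XMM register just like a raw `__m128`.

```cpp
#include <immintrin.h>
#include <cassert>

// Sketch: wrapping __m128 in a struct instead of aliasing it.
// alignas(16) is the portable spelling of __declspec(align(16)).
struct alignas(16) SIMD32x4 final {
    __m128 m_v;

    SIMD32x4& operator+=(const SIMD32x4& rhs) {
        // Component-wise single-precision add of all four lanes.
        m_v = _mm_add_ps(m_v, rhs.m_v);
        return *this;
    }
};
```

Since the struct contains a single `__m128` member, it has the same size and alignment as the raw type, so the two spellings mostly differ in what the type system lets you express, not in the generated code.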


##### Share on other sites
1 hour ago, matt77hias said:

But not for standard built-in types. And __m128 is a built-in type as well though not standard.

Don't you answer your own question here? Being non-standard means "not for standard built-in types" doesn't apply.

As for performance stuff, no clue, I typically don't go that close to the edge where such details matter. It takes too much time for too little benefit.

##### Share on other sites
3 minutes ago, Alberth said:

Don't you answer your own question here? Being non-standard means "not for standard built-in types" doesn't apply.

But in that sense, I find it strange that you have to define them yourself. Other non-standard built-in types, such as __int32, already have all operators defined.

##### Share on other sites

_mm_* and __m128 are there as the minimal tools to expose the SSE instruction set to C / C++. They aren't meant to be a vector-maths library. They're tools for building a vector-maths library OR anything else that you want to run on top of SSE.

Traditionally on x86, you would have a vec4 type, which is 4 floats in an __m128, and also a float_in_vec type, which is 1 float in an __m128. You'd do this because mixing FPU and SSE code was extremely slow, so you'd want to avoid using normal floats and use float_in_vecs instead.

When adding two float_in_vecs together, you're adding two __m128 variables, but the correct intrinsic is _mm_add_ss. When adding two vec4s together, you're adding two __m128 variables, but the correct intrinsic is _mm_add_ps. When adding a vec4 and a float_in_vec together, you're adding two __m128 variables, but the correct intrinsics are _mm_shuffle_ps followed by _mm_add_ps.

Therefore declaring that _mm_add_ps is the one true way to add two __m128 variables together is wrong, because __m128 is not a float4 type. It's the building block of a float4 and other types.

##### Share on other sites
4 hours ago, Hodgman said:

Therefore declaring that _mm_add_ps is the one true way to add two __m128 variables together is wrong, because __m128 is not a float4 type. It's the building block of a float4 and other types.

I took a look at DirectXMath, which I had already used and taken for granted (never looked at the internals), and something similar puzzles me. The API lets you convert XMUINT4, XMINT4 and XMFLOAT4 to an XMVECTOR, which only provides one set of arithmetic operations (the float4 version)? The only things that seem to adapt to UINT and INT are the conversion utilities (which convert to and from a floating-point representation). But my understanding of integer and floating-point arithmetic is that they are not the same. Does that mean the API does not span the complete integer range?
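The underlying concern can be demonstrated without DirectXMath at all: a 32-bit float has a 24-bit significand, so once integer data has been converted to a floating-point representation, integers above 2^24 can no longer all be represented exactly. A minimal sketch (the helper name is hypothetical):

```cpp
#include <cstdint>
#include <cassert>

// Round-trip a uint32_t through float. A float's 24-bit significand
// means values above 2^24 may not survive the trip unchanged, which
// is the information loss a float-based vector pipeline would incur.
inline bool survives_float_roundtrip(uint32_t x) {
    return static_cast<uint32_t>(static_cast<float>(x)) == x;
}
```

So yes, under this reading: routing UINT/INT data through a float representation cannot span the complete 32-bit integer range exactly.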

