SSE denormals

11 comments, last by thatguyfromthething 8 years, 11 months ago

apologies for a query posed in nescience........


i've been poring over a bit of code that uses a lot of cpu (around 56%) at some points, and not much (8%) at others.

there are a few comparisons, and a few divisions, though the divisions are by values 1 and greater... i do not have a profiler in my compiler (borland freecommandlinetools, i really love it.. on winXP..). the code also sets bits on a bitmap, so it could be due to the size of the array.

there are a lot of 2d rotations, and it's possible that these coefficients may incidentally be very small... ..i'm used to observing denormals in IIR filters, i'd be surprised to see them in "one-off" multiplications.. i have also noted that the denormaling occurs predictably when world objects are close to world origin (0,0). (this would not affect the process of accessing/writing the bitmap).
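One way to test the denormal theory directly, instead of inferring it from cpu load, is to count how many computed coordinates actually fall in the denormal range. The following is a minimal sketch in plain C that relies only on <float.h>, so it should not need any SSE header. The names is_denormal_f and count_denormals are made up for illustration and are not part of the original code.

```c
#include <float.h>   /* FLT_MIN: smallest positive normalized float */
#include <stddef.h>  /* size_t */

/* Nonzero if x is a denormal (subnormal) float: not zero, but smaller in
   magnitude than the smallest normalized float. */
static int is_denormal_f(float x)
{
    float ax = (x < 0.0f) ? -x : x;
    return ax != 0.0f && ax < FLT_MIN;
}

/* Count how many of the n values in v are denormal (hypothetical helper). */
static size_t count_denormals(const float *v, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; ++i) {
        if (is_denormal_f(v[i])) {
            ++count;
        }
    }
    return count;
}
```

Running count_denormals over the rotated coordinates for a frame near world origin and a frame far from it would show whether denormals appear at all. If the count stays at zero in both cases, the slowdown probably has another cause.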

..understanding the low-tech venue and psyche i am operating in, my guess is that the increase is due to the cpu denormaling.

i apologise to everyone that i have not been convinced to switch to a different compiler, or integrate a profiler with fclt, and have asked this very vague question anyway.. i know some people will say i don't have a right to ask anyone anything when i manifest such limits..

..but on the off chance that you've worked with XP and that this is an easy query to address or add to for you,

i spent some time reading about SSE as i vaguely remember that being an issue from a decade or so ago - i tried to adapt information from discussions to address the denormaling on my computer, but was unsuccessful :) -



my compiler does not have
#include <xmmintrin.h>

which is needed to use either of these..
_MM_SET_FLUSH_ZERO_MODE(x)
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);




iirc i was able to get this to compile, but it did not affect performance -
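/* DAZ (denormals-are-zero) is bit 6 of the SSE MXCSR control register */
#define _MM_DENORMALS_ZERO_MASK 0x0040
#define _MM_DENORMALS_ZERO_ON 0x0040
#define _MM_DENORMALS_ZERO_OFF 0x0000

/* clear the DAZ bit, then OR in the requested mode */
#define _MM_SET_DENORMALS_ZERO_MODE(mode) \
    _mm_setcsr((_mm_getcsr() & ~_MM_DENORMALS_ZERO_MASK) | (mode))
/* read back the current DAZ setting */
#define _MM_GET_DENORMALS_ZERO_MODE() \
    (_mm_getcsr() & _MM_DENORMALS_ZERO_MASK)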
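For reference, here is a minimal sketch of setting both flush-to-zero (FTZ, bit 15) and denormals-are-zero (DAZ, bit 6) in MXCSR. It assumes a compiler that ships <xmmintrin.h> (GCC, Clang and MSVC do), which the Borland toolchain described above does not. The helper names enable_ftz_daz and restore_mxcsr are hypothetical. Note that MXCSR only governs SSE instructions. If the compiler emits x87 floating-point code, which is likely for a compiler of that age, these bits change nothing, and that may be why no difference was observed.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define MXCSR_FTZ 0x8000u  /* flush-to-zero: denormal results become 0 */
#define MXCSR_DAZ 0x0040u  /* denormals-are-zero: denormal inputs read as 0 */

/* Turn on FTZ and DAZ for the calling thread and return the previous MXCSR
   value so it can be restored later. Assumption: the CPU supports DAZ; very
   early SSE-capable CPUs do not, and setting the bit there would fault. */
static unsigned int enable_ftz_daz(void)
{
    unsigned int old = _mm_getcsr();
    _mm_setcsr(old | MXCSR_FTZ | MXCSR_DAZ);
    return old;
}

/* Put MXCSR back to a value previously returned by enable_ftz_daz(). */
static void restore_mxcsr(unsigned int old)
{
    _mm_setcsr(old);
}
```

MXCSR is per-thread state, so a call like this would have to be made on whichever thread runs the rotation code.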

..can haz system denormals for borland? :p i'd totally be on top of this if it were an IIR... this kinda stuff is not why i code :) "that kinda guy.."

neither a follower nor a leader be http://www.xoxos.net

How do you know it did not affect performance? Although we have told you several times not to, you seem to keep using CPU usage as a measuring tool.
If your CPU usage is low then your project is not running as fast as it could. In a very simplistic sense, more CPU usage means better performance (or the same performance), but just forget about it and measure properly.

I don't know anything about Borland but I see no reason why denormalized numbers would be a problem.

If you asked anything else, I skipped copious amounts of your post because instead of just asking the question you went off on 10 tangents.

Next time just ask. You can look at almost any other post on the site for an example of a question that was asked directly and without an autobiography.


L. Spiro

[EDIT]

When I got to a computer the post did not seem as long as it did when I was on my phone (from where I posted).

I take back some of my criticism regarding its length, but not all. There still needs to be a lack of miscellaneous information in your posts.

[/EDIT]

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

A little off-topic but

demoralized numbers

:D

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Stupid iPhone auto-correct…

L. Spiro


spiro, this is really off topic -

"we have told you several times" - please stop trying to make it look like a bunch of people regularly have to beat me for poor manners.

how i discern that writing near origin does not affect the speed of accessing the bitmap? simple - "origin" is a world, 3d idea. any pixel put to the bitmap must be between 0 and (max-1) on either axis. world origin, with a motile observer, has *nothing* to do with the position of pixels.

that's a really simple idea, one of many you have failed to understand because you were too busy trying to discredit me.

how do i test the speed of statements? i block off segments of code/substitute until i see a difference, and i've done it that way for a decade.

why?

because, it's what i can do. i have no other methods.

beating me because i do things the only way i know is...... well... it's apparently why you answered.

but it doesn't move anyone forward, does it.




think of my posts as like casual.... the guy in the desk next to you slides back and says, "hey..." if you can't accept the casual terms i base an enquiry on (trust me, they're the only terms i have) then please don't clutter my threads with telling me i have no grounds for enquiry.

neither a follower nor a leader be http://www.xoxos.net

also, i'm not really sure what field of expertise you have,

but in audio dsp, generally "faster" means a routine is using *less* cpu resources, and "slower" means it is using more.

so that if i say, "my software is running really slow," it means the performance is poor, that there is less overhead for other processes.


if you believe that software that uses all the available overhead is "better" performance, well great! but i think if you were in the audio industry, and said your IIR filter uses the entire cpu, no one would agree that it is "fast," it would be called "slow".




i really, truly, do not believe the game industry could be so different. but if it is, it's no big deal. you go ahead and call it "fast".

neither a follower nor a leader be http://www.xoxos.net

[EDIT]

There still needs to be a lack of miscellaneous information in your posts.

[/EDIT]

and greater extension of tolerance in yours :)

spiro, i'm now an old and jaded person writing software, living in a violent community.

but once, i was a proper young man from wales with an immense practice of protocol.

i don't think it should be necessary to dredge up my past in all of my posts, but unless i do it, how are you able to understand that i may be a "bigger person" than you are giving me credit for?

i'm asking questions about graphics coding, not attending an etiquette school. if i wished, i could put the british on and etiquette you all the way out the door to the end of the road, but i prefer to talk like an old, less cultured arizonan. so get over it.

neither a follower nor a leader be http://www.xoxos.net

please stop trying to make it look like a bunch of people regularly have to beat me for [using CPU usage as a measuring device].

http://www.gamedev.net/topic/668500-win32-cpu-render-bottleneck/#entry5229728
http://www.gamedev.net/topic/668500-win32-cpu-render-bottleneck/#entry5229751

So I will keep this short and future-applicable.
As Hodgman said, CPU usage is basically irrelevant. Performance is measured in time, which means doing as he said and timing how long things take.


Even in your reply you mention nothing about actual timing. What you describe sounds as though it is nothing but your feeling.

So I asked you a very specific question. How do you know what changes in performance you are getting? It should be very simple to say, “I timed it using QueryPerformanceCounter(),” if that is what you did. Instead, once again, you ranted about nonsense, cluttered up your own topic, threw around accusations, and ultimately I still don’t even have an answer. How do you know if it is faster? How hard can it be to answer that? Did you time it or did you feel it?
Why can’t you give specifics and just answer the question?
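As a concrete version of "time it", here is a minimal sketch of a wall-clock timer in C. On Windows it uses QueryPerformanceCounter() and QueryPerformanceFrequency() from the Win32 API; elsewhere it falls back to POSIX clock_gettime(). The helper names now_seconds, time_per_call and sample_work are hypothetical, and sample_work is only a stand-in for whatever code is being measured.

```c
#ifdef _WIN32
#include <windows.h>   /* QueryPerformanceCounter, QueryPerformanceFrequency */
#else
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict modes */
#endif
#include <time.h>      /* clock_gettime, CLOCK_MONOTONIC */
#endif

/* Current time in seconds from a high-resolution monotonic clock. */
static double now_seconds(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return (double)count.QuadPart / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* Average seconds per call of fn over `runs` calls; 0 if runs <= 0. */
static double time_per_call(void (*fn)(void), int runs)
{
    double start;
    int i;
    if (runs <= 0) {
        return 0.0;
    }
    start = now_seconds();
    for (i = 0; i < runs; ++i) {
        fn();
    }
    return (now_seconds() - start) / (double)runs;
}

/* Placeholder workload; replace with the code under test. */
static volatile float sink;
static void sample_work(void)
{
    float acc = 0.0f;
    int i;
    for (i = 0; i < 1000; ++i) {
        acc += (float)i * 0.5f;
    }
    sink = acc;
}
```

Calling time_per_call(sample_work, 1000) once with objects near world origin and once with them far away gives two numbers in seconds that can be compared directly, independent of what the CPU meter shows.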

also, i'm not really sure what field of expertise you have,

Primarily video games.

but in audio dsp, generally "faster" means a routine is using *less* cpu resources, and "slower" means it is using more.

There is no direct correlation between CPU usage and overall performance.
When the operating system decides that a thread or process needs more resources (based on priority levels, how often it returns control back to the operating system (for example waiting for a Windows message), accesses disk, etc.) it will give it more (and longer) time slices (which increases CPU usage), which means it has more time to run, which means its performance increases.
When your application’s CPU usage is low, it means most of the CPU power is spent on other processes or just idling, which generally implies your process is not performing at full potential.
If your process does the exact same task at 8% vs at 92%, the 92% version will finish much faster (and you could call this “high performance”).


Since it also depends on many variables, including process priority, thread affinities, thread priorities, disk access, OS returns, waits, sleeps, yields, etc., it is a very very poor way to measure the “performance” of an application.
Its only practical use is to let you know you are draining the battery of a laptop faster or to know if you are taking resources from other processes, which may not be desirable for applications related to audio (until the final export), but that isn’t called “performance”, that is just called a “resource hog”.



i don't think it should be necessary to dredge up my past in all of my posts, but unless i do it, how are you able to understand that i may be a "bigger person" than you are giving me credit for?

I don’t need to understand your social standing, I need to know what you want to ask and any relevant details that would help me provide an answer.


L. Spiro


then try and acknowledge them!

i'm an audio coder. if i coded something that used 90% of the cpu, it's not considered "full potential".

i'm an audio coder. if i can see the cpu usage, i've failed!



acknowledgement is:
i'm using borland fclt. on an old system.

do you think i want to install microsoft compiler from 2013 on it?




how do i measure time -

i'm an audio coder. i can hear time like i doubt you can. while my nescience does not produce a quantifiable result of flops or whatever, i'm keenly aware of time, and i feel (yes feel) that this is adequate for discerning the difference between clocking a 56% cpu resourcing and 8%.




let's move forward:

what i am able to observe about the performance translates poorly to any objective analysis via forum unless you want to pore over my code (the triangle was in the last post). so, i am forgoing that.

what i have moved onwards to, is asking if someone has any SSE denormal kinda style code thingies handy for me to appropriate, suitable for a borland dev environ.
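Where no SSE header is available, a workaround often used in audio DSP is to flush tiny values to zero in software before they feed into further arithmetic. This is a minimal sketch in plain C, not a drop-in replacement for the hardware FTZ/DAZ modes. The names flush_denormal_f and flush_buffer are made up for illustration, and the FLT_MIN threshold is an assumption (some code uses a larger cutoff such as 1e-30f).

```c
#include <float.h>  /* FLT_MIN: smallest positive normalized float */

/* Return 0 for any value whose magnitude is below the smallest normalized
   float (denormals and zero), otherwise return x unchanged. */
static float flush_denormal_f(float x)
{
    float ax = (x < 0.0f) ? -x : x;
    return (ax < FLT_MIN) ? 0.0f : x;
}

/* Flush every element of a buffer of n floats in place. */
static void flush_buffer(float *v, int n)
{
    int i;
    for (i = 0; i < n; ++i) {
        v[i] = flush_denormal_f(v[i]);
    }
}
```

The comparison costs a little on every value, so it is usually applied only at the points where small numbers are produced, for example right after computing the rotation coefficients.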

that was right up there in the first post. the rest of it was "colour" so that we might avoid any investigation of irrelevant factors, such as i'm using a compiler no one wants to know about. that's why i do the details :)

you know?

people say - why are you using old, useless crap?
i say - because i can't buy new useless crap

but people don't understand that, because they are able to buy, and can't relate to conditions they have not experienced. if i say, my arms were chewed off in an octopus fight, then maybe you can figure out why i can't buy new crap! but if i tell you about the octopus, you say i'm finding excuses to talk about my personal life!

can't i get a break where you figure maybe i've got a good reason for the challenges i face!

maybe you have a life, friend, but i've been doing this, and only this, for a very, very long and well documented time. i hope i don't have to explain everything like how i have observed the cpu performance of commands in the synthedit sdk using a graphical meter. i got that. i'm telling you, the stuff i don't got is knowledge of SSE and system denormals. i don't even know the correct terms, and i don't want to, because as soon as i have solved whatever performance issue is capping my triangle output, i'm going to disappear back up my own arse and program in happiness. i'm not going to seek out more APIs to learn because i do not want to dominate the field.

i posted the triangle code. you didn't want to talk about it. i'm posting the SSE question. try and understand, i may not be *your idea* of articulate, but i know why i'm asking about SSE on borland here even if you don't.

do you know about SSE on borland? thank you :p

neither a follower nor a leader be http://www.xoxos.net

All I see from you is excuses, chief among them "I'm old and set in my ways, so don't expect me to change."

Well I'm afraid that attitude doesn't wash. You need to stop acting like such a petulant child when people question you. You need to stop derailing your own topics with endless sob stories. You need to show other posters at least a modicum of the respect that you demand they show you. You need to accept that a good chunk of what you think you know is flat out WRONG.

And if you are unwilling to do any of that, you need to leave.

This topic is closed to new replies.
