

phantom

Member Since 15 Dec 2001
Offline Last Active Today, 04:39 PM

#5137451 Sorting a std::vector efficiently

Posted by phantom on 08 March 2014 - 06:54 PM

Just sort the renderables in place?
It'll probably be faster than 4 memory allocations, the initialisation of 3 arrays, a sort and the 'newindex' setup (which will be horrible, because the double indirection means you could end up jumping all over memory to find your indices).

Or, when you add them, use an insertion sort to place them in the correct position in the vector as you go (a vector which should be pre-sized to 'max number of entities' before you even start this loop); a sketch of both options follows.
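For illustration, a minimal sketch of both options, assuming a std::vector<Renderable> with a made-up sortKey member (the names here are purely for the example):

#include <algorithm>
#include <vector>

// Hypothetical renderable type carrying whatever sort key you already compute.
struct Renderable { unsigned int sortKey; /* ... */ };

void SortInPlace(std::vector<Renderable>& renderables)
{
    // Option 1: sort the existing vector in place; no extra allocations.
    std::sort(renderables.begin(), renderables.end(),
              [](const Renderable& a, const Renderable& b) { return a.sortKey < b.sortKey; });
}

void AddSorted(std::vector<Renderable>& renderables, const Renderable& r)
{
    // Option 2: keep the vector ordered as you add; reserve() it to the
    // 'max number of entities' up front so the inserts never reallocate.
    auto pos = std::lower_bound(renderables.begin(), renderables.end(), r,
                                [](const Renderable& a, const Renderable& b) { return a.sortKey < b.sortKey; });
    renderables.insert(pos, r);
}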


#5137110 Geometry Shaders

Posted by phantom on 07 March 2014 - 05:57 AM

It might be sane, but the problem is that, as stated, there is no stage before the VS you could do this in; at best you could use a pass-through VS and do the work in the GS, but that'll hurt as you'll end up spilling out to graphics memory and introducing ordering constraints.

 

Amusingly, the whole VS->[tess]->GS thing is just an artifact of the way the pipeline was set up; current hardware could, quite happily, put a 'GS'-like stage first to operate on a primitive (IA is done in ALU cores these days anyway) before the VS/Tess stages... in fact the VS wouldn't really be needed at all in that situation, as you could test and transform 3 points just fine.

 

In short: the idea might have merit, but the way the software pipeline is arranged means that while the hardware could operate in that manner, you aren't going to be able to do it.




#5134905 AMD's Mantle API

Posted by phantom on 26 February 2014 - 06:04 PM

I definitely didn't think that Microsoft would respond so quickly! Let's hope that it's something substantial (bindless resources plz!), and not just a band-aid.


The brief I saw said 'future' in it; that could mean anything from 6 months away to Windows 9, given the recent tendency to put D3D updates behind Windows version walls.

The problem is that the only way you are going to get a proper fix is an API redesign, a backend redesign and thus a whole bunch of new driver code... and if it looks nothing like D3D11 it's just going to make everyone throw their hands in the air, because now you have to support 4 APIs (D3D11, DXNext, D3D-Xbox One and Gnm), all of which are subtly different to the point of driving people mad, so you'll end up with a common-subset work-alike, aka a D3D11 target, anyway.

And this is before you take into account that the monolithic 'one interface to submit calls' model doesn't really make sense any more; take something like AMD's R9 290X card: the GPU on that has a gfx/compute command processor AND 8 compute-only command processors AND a DMA engine to handle uploads/downloads of data.

Even if you can hide the latter behind an API (and pray it does things correctly; in GL land I've heard you need a second context to get NV's drivers to do DMA uploads of data...), there needs to be some method of logically queuing up different workloads, establishing dependency graphs between them, kicking long-running/low-priority tasks, controlling resource allocation for tasks (shadow generation tends to be ROP and ALU heavy but light on read bandwidth, so being able to kick, say, a read-bandwidth-heavy, ALU-light task at the same time would be useful), and even controlling how memory is laid out (give me a 20meg chunk, I'll layer textures or whatever into it) and reused.

The fact is the GPUs can do a hell of a lot more than is exposed in any API right now and trying to slap an abstraction over it which hides that completely is just annoying.

Yes, not everything is needed by everyone, but then those people aren't the ones crying out for more resources, so meh, they could stick with D3D11. Unless the API allows the same kind of control Mantle was looking to give for the people who do want it, it's going to be horrible once again.

Right now I'd summarise things as follows;
- D3D; we promise things for the future, but it'll probably be Win8-only if you are lucky, more than likely Win9... and it will probably still have more overhead than you want, but abstraction!

- OpenGL; instancing is the answer to everything! Here is how to do it! It only requires these 8 extensions! Oh, and only one vendor has implemented them all (which may or may not be completely to the letter of the spec), another vendor is short some key extensions but 'soon', and the third is still a few GL versions behind, but don't worry, we'll get there! You can trust us! Just forget it took us forever to get VBO out, forget the GL2.0 events, and ignore GL3.0/Longs Peak... really, we can change, ignore 14 years of history!

- Mantle; One Vendor. One GPU type. Full Control. Maybe it'll work on more platforms... one day... and you can't see it unless you are a AAA dev!

It's all a wonderful mess... I'm just glad that as of today I'm no longer doing rendering for a day job as it's all just... ugh.


#5132917 Any options for affordable ray tracing?

Posted by phantom on 20 February 2014 - 03:51 AM

The catch is that it'll only be popular on the consoles, as on PC it currently requires Win8 + D3D11 GPUs, or Mantle and D3D11 AMD GPUs...


GL is also part way to this with the ARB_sparse_texture extension, which, if combined with ARB_bindless_texture, might yield some interesting possibilities. (Although, unlike D3D and (probably) Mantle, it currently leaves memory allocation under driver control... which is mildly annoying... apparently another extension is in the works to fix that.)

Same hardware constraints as the DX/Mantle stuff, of course; a rough sketch of the GL route is below.
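As a sketch of what that combination might look like (assuming a GL 4.x context, an extension loader such as GLEW, a driver exposing both extensions, and with the page-size handling simplified; the function name is made up):

#include <GL/glew.h>

// Create a sparse texture, commit physical memory for one page of mip 0,
// then grab a bindless handle for it. Error checking omitted for brevity.
GLuint64 CreateSparseBindlessTexture(GLsizei width, GLsizei height)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Mark the texture as sparse *before* allocating its virtual storage.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);

    // Ask the driver what page size it picked - we don't control the allocation.
    GLint pageX = 0, pageY = 0;
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_ARB, 1, &pageX);
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_Y_ARB, 1, &pageY);

    // Commit backing memory for a single page-sized region of the texture.
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0, 0, 0, 0, pageX, pageY, 1, GL_TRUE);

    // Bindless: get a handle and make it resident so shaders can sample it
    // without it ever being bound to a texture unit.
    GLuint64 handle = glGetTextureHandleARB(tex);
    glMakeTextureHandleResidentARB(handle);
    return handle;
}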


#5132589 I don't get c++11.

Posted by phantom on 19 February 2014 - 04:42 AM

I'm curious about this that dejaime posted:
 
std::function<double(double)> truncate
 
That syntax in between the <>'s, is that some kind of new language extension? I've not heard anything about this, but it doesn't seem to be something that would have been valid pre-C++11.


It's just a function type declaration minus the name and pointer part;
typedef double (*fp_t)(double);
fp_t foo;

typedef std::function<double(double)> fp_t;
fp_t foo;
boost::function used this syntax before std::function was folded into the standard.
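A small usage sketch, for illustration (the halve function is just made up for the example):

#include <cstdio>
#include <functional>

double halve(double d) { return d * 0.5; }

int main()
{
    // 'double(double)' inside the <> is the function type: takes a double, returns a double.
    std::function<double(double)> op = halve;   // a plain free function...
    std::printf("%f\n", op(3.0));               // 1.500000

    op = [](double d) { return d * d; };        // ...or any matching lambda/functor
    std::printf("%f\n", op(3.0));               // 9.000000
}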


#5132261 Why C++?

Posted by phantom on 18 February 2014 - 02:24 AM

About the only good thing with C++ are the typesafe containers implemented in the STL. Still, I find them overly verbose when using the iterators and the code just looks ugly. Here's a gem from Ogre code I wrote a while back:



 
std::vector<MovableObject*>::const_iterator it = movingObjs.begin();
std::vector<MovableObject*>::const_iterator end = movingObjs.end();
while(it != end) {
    MovableObject* movableObject = (*it);
    // do something with the movable object
    ++it;
}
Tell me how that is better than something like

 
for(int n=0; n<movingObjects.size; n++) {
      MovableObject* movableObject = movingObjects.objs[n];
// do something with the movable object
}
Assuming I have a proper movingObjects struct (not class) which has the appropriate fields.


Apparently you've not heard of typedef, std::for_each and, for C++11 and onwards, lambdas...

// C++11
std::for_each(std::begin(movingObjs), std::end(movingObjs), [](MovableObject* obj)
{
// do something with obj
});

// C++14 (iirc)
std::for_each(std::begin(movingObjs), std::end(movingObjs), [](auto* obj)
{
// do something with obj
});
Also, evaluating container.size every iteration is wasteful, as chances are the compiler will end up emitting code which reloads the value from memory every loop instead of caching it in a register.
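For example, reading the count once up front makes the intent explicit regardless of what the optimiser manages (sticking with the hypothetical movingObjects struct from the quote):

const int count = movingObjects.size;   // read once instead of every iteration
for(int n = 0; n < count; n++) {
    MovableObject* movableObject = movingObjects.objs[n];
    // do something with the movable object
}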

And for the record: inheritance might be a part of OO, but the general rule is to prefer composition over inheritance anyway.
(And last I checked Ogre was an utter, utter mess anyway, which isn't the fault of C++ but of bad design. Never. Look. Again.)


#5131783 small inline lib for math in mingw

Posted by phantom on 16 February 2014 - 01:33 PM

As to the .h file: inline functions must be contained in the .h file because
if they were in the .c they couldn't be compiled inline, what a problem


Except you are now making assumptions about what the compiler might do.

Just because something is in the .h file doesn't mean it will be inlined 100% of the time.
Also, thanks to link time code generation, which has been around for some years now, the linker can inline code from different modules as it sees fit - so code which is in a .c/.o file could very well get inlined into a function calling it from another .c/.o if the 'win' is considered worth it.
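For example, with MinGW's GCC you can leave the function body in the .c file and let link time optimisation decide whether to inline it (the file names are just for the example; -flto/-O2 are the standard GCC flags):

// vec.c - the definition lives in the translation unit, not the header.
double dot3(const double* a, const double* b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Build with LTO so the linker can inline across .o boundaries if it
// considers the 'win' worth it:
//   gcc -O2 -flto -c vec.c
//   gcc -O2 -flto -c main.c
//   gcc -O2 -flto vec.o main.o -o app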


#5129963 Towards better error handling constructs

Posted by phantom on 08 February 2014 - 05:41 PM

This feels to me like the best of all possible worlds, but I'm curious if it even makes sense to anyone else, or if someone has a better idea for how to handle error situations in code. Keep in mind I'm not looking for a solution to bolt into an existing language so much as a theoretical ideal.
Thoughts?


Once I grokked the syntax (Epoch?) the concept certainly made sense, although the first concern that popped into my head was noise at the call site.

In fact, going back to your opening, one of my 'issues' with exceptions isn't so much the burden of handling as the fact that you often end up wrapping all the code in a try {} block, instead of just the bits you think will fail, because it saves you jumping in and out of the block and mixing try{} catch{} groups across a function.

I can see the same happening here, where everything ends up in the 'protect' block with an ever-expanding list of 'task' clean-ups tacked on the end.

Working from that premise, maybe assume everything is always protected and set up the error handling via some other means?

entrypoint :
{
    print(FallibleFunction(42, "Test"))
}
with handlers
{
    Barf : { print("Function barfed.") }

    Vomit : string reason -> integer fallback = 0
        { print("Vomit! Falling back to 0 becuase: " ; reason) }
}


FallibleFunction : integer p1, string p2 -> string ret = ""
{
    if(p1 > 100)
    {
        panic => Barf()
    }

    while(p1 > 20)
    {
        p1 = (panic => Vomit("Number slightly too high"))
    }

    ret = p2 ; " ... " ; cast(string, p1)
}
Although this does introduce scoping issues for variables, and the inability to 'catch and recover' within a function might prove an issue; I'd have to ponder that some more. Then again, depending on the nature of the agents and how they bubble errors up, this might not be a problem - it might not be logical to 'catch and recover' within the semantics of the error handling agents and the rest of the program flow.

Overall, however, I do find the idea of separating out error reporting/handling in this manner interesting.


#5129847 Shader reflection unused variables in cbuffer

Posted by phantom on 08 February 2014 - 10:10 AM

You can query the variable to see if it is used or not.

However, unused variables are NOT always stripped out of cbuffers - it might happen in this case because it is the last variable, but in larger cbuffers variables are left in place. Checking the assembly listing generated when compiling (assuming you compile offline) should show you what is going on.

(In fact I'm not sure I've ever seen a case where a cbuffer definition and the queried size have mismatched due to optimisation - but I wouldn't swear to it, as it's been some time since I last had to deal with this.)
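A rough sketch of the reflection query, assuming you have the compiled shader blob to hand (error handling trimmed, function name made up):

#include <windows.h>
#include <d3d11shader.h>
#include <d3dcompiler.h>
#pragma comment(lib, "d3dcompiler.lib")

// Walk the first cbuffer and report which variables the compiler considers used.
void ReportVariableUsage(const void* bytecode, SIZE_T bytecodeSize)
{
    ID3D11ShaderReflection* reflector = nullptr;
    D3DReflect(bytecode, bytecodeSize, IID_ID3D11ShaderReflection,
               reinterpret_cast<void**>(&reflector));

    ID3D11ShaderReflectionConstantBuffer* cbuffer = reflector->GetConstantBufferByIndex(0);
    D3D11_SHADER_BUFFER_DESC cbDesc = {};
    cbuffer->GetDesc(&cbDesc);   // cbDesc.Size is the size reflection reports

    for (UINT i = 0; i < cbDesc.Variables; ++i)
    {
        D3D11_SHADER_VARIABLE_DESC varDesc = {};
        cbuffer->GetVariableByIndex(i)->GetDesc(&varDesc);
        const bool used = (varDesc.uFlags & D3D_SVF_USED) != 0;
        OutputDebugStringA(varDesc.Name);
        OutputDebugStringA(used ? " : used\n" : " : unused\n");
    }

    reflector->Release();
}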


#5128689 Why would devs be opposed to the PS4`s x86 architecture?

Posted by phantom on 04 February 2014 - 04:44 AM

And yet for all of that processing power X360 games regularly outperformed their PS3 versions.


To be fair the SPUs were often used to make up for the utter garbage which was the PS3's GPU... god that thing sucked.


#5128659 AMD's Mantle API

Posted by phantom on 04 February 2014 - 01:27 AM

Really I think the best long-term outcome from this would be that GL and DX get a giant kick in the ass, and adopt the best parts of Mantle.


Well, I think for this to happen DX will have to 'die' in its current form, as there is too much legacy there. I guess you could label it DX12 and make it a completely new API, but that's all it would be; it'll look nothing like the DX we know today.

And OpenGL... well, it's improving... but the ARB have shown on two separate occasions a total inability to dump the old and move to the new wholesale (see: GL2.0 and Longs Peak/GL3.0). While things have improved a bit in regards to overhead, the solutions are very much 'more instancing!' and 'keep things mapped forever!' rather than any conceptual change in the execution model - Mantle, based on the slides from AMD's dev days, lets me control when compute work gets kicked, when gfx work gets kicked, when DMA work gets kicked and to which GPU it gets kicked, lets me build command buffers across multiple threads, and lets me control when surfaces get converted from type to type. Heck, based on a recent Twitter comment, Mantle even has sane semantics for letting you copy data backwards inside a buffer (so, defragmenting a buffer), something which isn't guaranteed in the current GL spec at all.

Could GL adopt those things via extension? Sure.
The problem is there is a resistance to doing so ("instancing is the answer!", "indirection on the GPU!") and chances are we'd end up with 3 different sets of vendor extensions and a watered-down version in the main spec (plus all the existing baggage, as it would have to work with all existing extensions and the like... *sigh*).

Ironically, I suspect MS will be the one to kick a cross-vendor thing into shape, assuming they are working on it, but that's probably still some time out and will probably come with more overhead than anyone wants.

What AMD need to do is get Mantle settled, get it proven, get it out to the world then open it up properly (much like HSA) and get vendors involved (not just NV and Intel but the mobile guys too) to get this execution model adopted.


#5128382 Why would devs be opposed to the PS4`s x86 architecture?

Posted by phantom on 03 February 2014 - 04:35 AM

I'm guessing that the awesome PS3 CPU model (traditional CPU core paired with SPUs) was abandoned because we now have compute-shaders, which are almost the same thing.


Except for all the things you lose and the extra 'baggage' of 'must have 64 units of work in flight or you are wasting ALU', must put thousands of work units in flight to hide latency (which you can't explicitly cover with ALU ops), and all manner of other things, which makes me sad.

Ideal layout;
- few 'fat' cores
- 16 or so SPUs with more on-die cache (double PS3 basically at least)
- DX11 class gpu

You get things which are good at branchy code, things which are good at moderately branchy/SIMD-friendly code, and things which are good at highly coherent, large-workset code.

I'd be all smiles then.

As it is...
*sigh*


#5127915 Compute Shaders & Lighting - Performance

Posted by phantom on 01 February 2014 - 03:38 AM

"[numthreads(1, 1, 1)]" is likely to be your problem; you are telling the GPU to dispatch a thread group with only 1 active thread in it, which means on most GPUs you are idling 31 (NV) to 64 (AMD) threads per groups or most of the ALU power.

The number of threads dispatched here wants to be a multiple of 32 or 64, depending on target hardware, and then your overall thread group dispatch count needs to be adjusted to account for this.
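For example, on the host side the dispatch for a 64-wide group might look like this (the element count and context pointer are placeholders):

// HLSL side: [numthreads(64, 1, 1)] so each thread group covers 64 items.
// C++ side: shrink the group count to match, rounding up for the tail.
const UINT elementCount    = 100000;   // however many items you actually process
const UINT threadsPerGroup = 64;       // must match the HLSL numthreads value
const UINT groupCount      = (elementCount + threadsPerGroup - 1) / threadsPerGroup;

context->Dispatch(groupCount, 1, 1);   // context is an ID3D11DeviceContext*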


#5127596 How long would you support Shader Model 2?

Posted by phantom on 30 January 2014 - 04:49 PM

- although supporting SM2 backwards compatibility is nice practice, I've never ever had to use it before


Practice for... what?


#5126993 isnt opengl to high or to low

Posted by phantom on 28 January 2014 - 11:24 AM

Otherwise - well all anyone has to do is point out all of the online clamouring about the ancient binding model as evidence.


Or the shit storm which was the reaction to the OpenGL3.0 announcement after the promises/designs of Longs Peak.

(Yeah, still bitter about that; taking that one to the grave I think...)



