fir

Members

  • Content count: 1272
  • Joined
  • Last visited

Community Reputation

-460

About fir

  • Rank: DO NOT FEED
  1. ApochPiQ, sorry, but do not wait for an answer. Because of your previous behaviour, which was exceptionally primitive, I am sorry, but I do not talk with such people.
  2. This. We have an abundance of both CPU and GPU power today, and while we will always be happy for more and more to come, we really have reached a point where either is almost always more than enough. The world's most efficient rendering engine will always look like crap if you feed it shitty content to render. In terms of where content intersects with the technical, there are really two big areas of interest today that I know of -- unified, physically-correct material BRDFs, and real-time, realistic lighting. We used to fake all of this. We'd pre-bake lighting, or we'd create materials independently of one another and then tweak them to look "right" under local lighting conditions. We have enough GPU power now that there's less and less faking of these things.

     Still, speaking of games, it seems to me that in-game physics is still weak in present games (though I cannot be 100% sure how it looks, as I do not play too many games). By physics I mean real physics of destruction (when you throw a bomb and a building falls down), not the extremely poor physics of throwing barrels or boxes around. But I do not know what the reason for the lack of such real physics is (at least when speaking of damage, not all physics, as physics is a misleading term anyway; the two main areas of this 'physics' would probably be just 'destruction' and 'biodynamics'): is it a lack of CPU power, a lack of GPU power, or just a lack of algorithmic solutions here? Probably the last one.
  3. Alright, an interesting answer (I may say that those estimations of the span of CPU processing power are consistent with my own, as I have done some benchmarks), though it is an answer to a somewhat different question: "what is harder to optimize (or maintain) and why". I was more curious which kind of hardware speedup, CPU or GPU, would be more welcome in games and would have a bigger influence on the quality of the outcome (maybe it is hard to answer, but I wonder) - or maybe today's game quality depends mainly on content, not on processing power? Processing power is important, as it can surely make some work easier (no need to maintain complex optimization structures), but is it still that important or not? Hard to answer by myself.
  4. I am thinking about what could go on a list of common quests to code for an independent game programmer. There are such things that many seem to do as a kind of fun and exercise, some easier, some harder; I wonder what could be put on the list. Some things that come to my mind:

     basic:
     - a program that plots a mathematical function
     - a Mandelbrot set 'viewer'

     a bit more advanced:
     - own rasterizer
     - own raytracer

     Not sure what else could (or should) be added to this list:
     - own simple Arkanoid?
     - own 2D platform game?
     - own simple DX/OGL engine?

     If someone is able, please provide 5 or 10 candidates for such 'training' quest projects.
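As a rough illustration of one item on this list, here is a minimal ASCII Mandelbrot 'viewer' sketch in C++ (the resolution, iteration limit and character ramp are arbitrary choices of mine, not anything prescribed in the thread):

```cpp
#include <complex>
#include <iostream>

// Minimal ASCII Mandelbrot plot: one character per sample point.
int main() {
    const int width = 80, height = 40, maxIter = 64;
    const char ramp[] = " .:-=+*#%@";   // sparse -> dense as iterations grow

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Map the character cell to the complex plane [-2.5,1] x [-1.25,1.25].
            std::complex<double> c(-2.5 + 3.5 * x / width,
                                   -1.25 + 2.5 * y / height);
            std::complex<double> z(0.0, 0.0);
            int i = 0;
            while (i < maxIter && std::norm(z) <= 4.0) {
                z = z * z + c;          // z_{n+1} = z_n^2 + c
                ++i;
            }
            std::cout << ramp[(i * 9) / maxIter];
        }
        std::cout << '\n';
    }
    return 0;
}
```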
  5. Are today's games (by default I am thinking of desktop x86 Windows games) more CPU or GPU constrained (by constrained I mean the situation where their quality suffers in some way because of the potential shortcomings of the platform)? Is it the GPU, the CPU, maybe something else, or maybe nothing? Sorry, duplicated by mistake - could be deleted.
  6. Is c++ good

      Imagine what it would be like if C++ did not hold backward compatibility ;/ Then you would have: 1) two versions of C++ in use at once (which would probably force some people to rewrite millions of lines, etc.); 2) as the previous C++ would be marked obsolete, it would result in the thing called 'code rot'; imo both versions of C++, the older and the younger, would suffer from this. This is present even now, with the changes being published while staying backward compatible, but if backward compatibility were dropped it would get much worse.

      (In some ways it is funny/silly of the standardization committee to produce different versions of the language; standardization was born exactly to prevent the multiplicity of close-but-incompatible versions and the problems and pollution of the space that it creates, and then the standardization committee started to make those incompatible versions themselves ;/ This is a bit sick.) The one thing that helps with this kind of problem is backward compatibility, as you may still use the core language (I mean the older version) and it is still alive.

      In the same way I dislike the rotting process of OpenGL (the first OGL version was sentenced as rotten to the bone, and then we got the second one, about which I cannot be sure it will not rot as well). This makes me feel a bit disrespectful toward it, as I could respect more some really stable environment and system that lasts for decades.
  7. Alright, this got yet more confusing now, but overall it was helpful. I will read more about it, but I got some basic picture. (For now I do not want to go deeper into it, but some day, when I have read a bit and clarified things, I can come back with more detailed questions.)
  8. Phantom pretty much described how it works (in the current generation), but to give a very basic comparison to CPUs: Take your i7 CPU: it has (amongst other things) various caches, scalar and vectorized 8-wide ALUs, 4 cores and SMT (Intel calls it "hyperthreading") that allows for 2 threads per core. Now strip out the scalar ALUs, widen the vectorized ALUs from 8-wide to 32-wide and increase their number, allow the SMT to run 64 instead of 2 "threads"/warps/wavefronts per core (note that on GPUs, every SIMD lane is called a thread) and put in 8 of those cores instead of just 4. Then increase all ALU latencies by a factor of about 3, all cache and memory latencies by a factor of about 10, and also memory throughput by a significant factor (don't have a number, sorry). Add some nice stuff like texture samplers, shared memory (== local data store) and some hardware support for divergent control flow, and you arrive more or less at an NVidia GPU. Again, Phantom's description is way more accurate, but if you think in CPU terms, those are probably the key differences.

     This is a nice picture, but there is still some confusion about what is called a thread here. You describe some 32-wide SIMDs, and then you write about 64 threads, each one working on 32 floats? (You mean the official documentation calls each scalar channel a thread, i.e. there would be 32x64 scalar threads?) As for those 64 big threads, are they independent, does each one have its own code that it executes and its own instruction pointer? That would be a clearer description than Phantom's, though Phantom gave more info about this scheduler thing.

     Speaking of the scheduler, do I understand correctly that those 64 big threads are managed by that scheduler? Here I do not understand, or at least I am not sure; I may suspect that the scheduler sits between the workloads and those 64 big threads. I may suspect that each workload is a separate assembly program and the threads are dynamically assigned to those workloads; maybe that would make some sense. If this picture is correct, it would be like 64 cores, each one working on a 32-wide float SIMD, so it really is a whole bunch of processing power, but I am not sure if the way I see it here is compatible with the description.

     ps. Anyway it seems clearer now; maybe the details are not that important, but I probably got the general idea: an input assembly routine (or a few parallel routines) is consumed by the scheduler and dispatched "in width" to up to 64 32-wide SIMD threads. This is different than I thought, because it requires this input assembly to be defined over some width of data, I mean not some normal scalar assembly but some width-assembly.

     Yet my original question was how those input assembly routines are provided for execution and also how the results are taken back (there must be some way, some function pointers interpreted by the hardware as routines to execute, or something like that). I am also curious about the results: if I provide three workloads, can I run them asynchronously, then get a signal that the first one is done, then use its result as an input for some next workload, and so on? I mean, can I build some pre-scheduler loop that constantly provides workloads and consumes the results? That was the 'scheduling code' I had somewhat in mind. Is there something like that to run on the GPU, or is this just something to write on the CPU side?
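As a rough host-side illustration of that last question, here is a minimal sketch in C++ using the OpenCL API (OpenCL is only mentioned in passing later in the thread, so picking it here is my assumption; CUDA would look similar). The kernel name `twice`, the buffer size and the two chained launches are arbitrary illustration choices: workloads are handed to the driver as kernel source rather than raw function pointers, they are enqueued on a command queue, dependencies between them are expressed with events, and results are read back when they complete.

```cpp
// Minimal OpenCL host-side sketch; error checking is mostly omitted for brevity.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// A trivial "workload": a kernel that doubles every element of a buffer.
static const char* kSource =
    "__kernel void twice(__global float* data) {\n"
    "    size_t i = get_global_id(0);\n"
    "    data[i] = data[i] * 2.0f;\n"
    "}\n";

int main() {
    cl_platform_id platform; cl_device_id device; cl_int err;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    // The workload is compiled at run time; the driver turns it into the GPU's
    // native ISA, so the host never hands raw function pointers to the hardware.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSource, nullptr, &err);
    clBuildProgram(prog, 1, &device, nullptr, nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "twice", &err);

    std::vector<float> host(1024, 1.0f);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                host.size() * sizeof(float), host.data(), &err);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);

    // Enqueue the same workload twice; the second launch waits on the first via
    // an event (with a default in-order queue this is redundant, but it shows
    // how a dependency chain between workloads is expressed to the driver).
    size_t global = host.size();
    cl_event first_done;
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr,
                           0, nullptr, &first_done);
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr,
                           1, &first_done, nullptr);

    // Blocking read: returns once both launches have finished on the device.
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, host.size() * sizeof(float),
                        host.data(), 0, nullptr, nullptr);
    std::printf("host[0] = %f\n", host[0]);   // 4.0 after two doublings

    clReleaseEvent(first_done);
    clReleaseMemObject(buf); clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}
```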
  9. I have not read the whole thing yet (will do tomorrow), but some parts of it are confusing. "1 CU = 4 groups of 16 SIMD units": what is a SIMD unit, and what is its size, is it a float4/int4 vector on each SIMD? I may suspect so, but I cannot be sure. Further on it speaks about 10 wavefronts; why 10? How many CUs are there in this card? Also I do not understand what "SIMD unit" means, and what "thread" means here; when speaking about 4 groups of 16 SIMD units, is it meant that there are 64 'threads', each one working on float4 (or int4, I do not know) 'data packs'? This seems most probable to me, but I cannot be sure, and it is one more obstacle to understanding the further description.
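For a concrete set of numbers (assuming a first-generation GCN card such as the Radeon HD 7970; the thread does not name the card, so this is only an example): a CU contains 4 SIMD units, each SIMD unit is 16 lanes wide, and each lane operates on a scalar 32-bit value rather than a float4/int4 pack, so one CU has 4 × 16 = 64 lanes. A wavefront is 64 work-items, executed on one 16-lane SIMD over 4 clocks, and each SIMD can keep up to 10 wavefronts resident to hide memory latency (hence the 10), giving up to 4 × 10 × 64 = 2560 work-items in flight per CU; the HD 7970 has 32 such CUs.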
  10. You might as well stop until you've got the time then; my initial explanation contains pretty much all the details, but you are asking questions which are already answered, you just lack the base knowledge to make sense of them. Your comparison with CPUs is still incorrect because a CPU only schedules instructions from a single stream per core/hardware thread; it requires the OS to task switch. A GPU is automatically scheduling work from up to 5 threads, from a group of 10, per clock BEFORE the instructions are decoded and run on the correct unit - and that is just the CU level. You REALLY need to go and read plenty of docs if you didn't understand my explanation because this isn't an easy subject matter at all if you want to understand the low level stuff.

      I have not read that yet (only scanned it); I read texts on subjects I do not know well quite slowly and get tired easily. Probably I will read it tomorrow morning.
  11. Is there something that could be named as the cause of this? (Of the fact that each of those cores cannot execute in an 'independent direction'?)
  12. It seems fun to me that present processors appear to work like interpreters: they read the code and interpret/recompile it on the fly. Sad that programmers cannot, for example, reprogram this internal interpreter ;\ (there would be a processor program written to execute assembly, and this assembly could itself be reprogrammed; it would be cool) - or, as another option, maybe provide an already statically compiled version (as compiled code usually runs faster than runtime interpretation).
  13. ps. It is confusing to me what you call CPU scheduling: 1) dispatching an assembly stream to the internal processor channels and 'stations', or 2) something like hyperthreading, where you have two assembly streams (two instruction pointers etc.) but you 'schedule' them onto one processor, or 3) the normal scheduling of threads by the OS onto 4 cores or something like that - I got confused. Anyway, I think it is worth talking/reflecting about things at such a 'higher' level of abstraction to clarify the view of them (how a CPU works at a low level is well known to anybody; the GPU seems to be a very similar system to a CPU, it is just like a second computer inside the first computer :/
  14. If by "schedule the instructions onto the execution units yourself" you mean the thing I have in mind, i.e. calling cores on chosen buffers of assembly code, then I imagine it in this low-level way. I imagined it such that desktop thread scheduling is managed by the OS, and the GPU has no OS (or does it?), so I am trying to imagine it as just a set of processors plus chunks of assembly and procedures (the same way you picture a processor working on linear RAM and that is all). I would like to build that kind of picture, but related to the GPU hardware. I will reread the posts here, as at least half of it I did not get at all (I have no time and skills to read many docs, but I will try just to 'deduce' something from this info; it is worse, but it consumes less time, and in the future I will try to read more).

      ps. Does the GPU have an OS? If so, is it some assembly routines loaded by a driver, or maybe some kind of flash ROM OS, or maybe some hardware-coded 'microkernel for microcode', or what? (Sorry for the speculation and weak knowledge, but it is complex etc., hard to digest the info.)
  15. That is actually the architecture behind the Cell processor. GPUs work differently. Phantom already described the GCN architecture, and for all intents and purposes of this discussion, the NVidia architectures are very similar. The biggest difference is that they had to invent new names for literally everything. While good scheduling is actually very nontrivial in a GPU, especially with those time constraints, I don't see why a fixed scheduling policy would make the GPU less flexible. Think of it like this: on the CPU, you also don't control the schedulers, neither the hardware ones nor the software OS schedulers. Maybe you should download CUDA/OpenCL/... and give it a spin. Things will be a lot clearer once you have actual hands-on experience.

      Well, I thought that such GPU scheduling is quite different from CPU scheduling of threads - the CPU schedules threads for many apps, whereas on the GPU you want to write code that you schedule yourself, and you need some way to express that (I mean something like manual thread management in your own app). So I think that if this kind of scheduler is fixed, you cannot do the same thing as in normal desktop coding, where you can run 4 threads manually and assign tasks to them - or is this (or would it be) possible on the GPU too?
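As a small point of comparison for the "run 4 threads manually and assign tasks to them" scenario above, here is a minimal CPU-side sketch in C++ with std::thread (my own illustration, not something from the thread): each worker is explicitly handed a slice of the data. On a GPU you do not write this loop yourself; you only describe the whole range of work-items, and the hardware scheduler decides which wavefronts/warps run on which SIMD units and when.

```cpp
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::vector<float> data(1 << 20, 1.0f);
    const unsigned workers = 4;                 // "run 4 threads manually"
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, w, workers] {
            // Hand-rolled scheduling: each thread gets a fixed slice of the data.
            const std::size_t chunk = data.size() / workers;
            const std::size_t begin = w * chunk;
            const std::size_t end   = (w == workers - 1) ? data.size() : begin + chunk;
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 2.0f;                // the per-thread task
        });
    }
    for (auto& t : pool) t.join();              // wait for all workers to finish

    std::printf("data[0] = %f\n", data[0]);     // 2.0
    return 0;
}
```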