fir

Hypothetical raw GPU programming



I wonder if it would be possible to run raw assembly on a GPU (maybe I should call them G-CPUs, as they are perhaps just a number of some kind of simplified CPUs, or something).

Could someone maybe elaborate on this - is a GPU just a row of simplistic CPUs?

And what would such code look like - a number of memory spaces, where each one is filled with assembly code and then run?

 

 

CPUs and GPUs are both processors, but they specialize in different areas: GPUs excel at massively parallel processing, whereas CPUs excel at general-purpose processing. You might want to look into OpenCL, which lets you run general-purpose code on video cards.
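To make that concrete, here is a minimal sketch of the GPU programming model, written in CUDA (NVIDIA's counterpart to OpenCL; an OpenCL version would look very similar). This is hypothetical illustration code, not anything from this thread: one short program is launched over a million threads, and each thread picks out its own element by index instead of looping.

#include <cuda_runtime.h>
#include <cstdio>

// One program, many instances: each thread adds a single pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified memory keeps the sketch short; explicit host/device copies also work.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch ~1M threads; the hardware spreads them across its SIMD cores.
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // prints 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}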

Share this post


Link to post
Share on other sites

It depends very much on the internal architecture. Because GPUs are somewhat hidden behind the problem they focus on, the underlying silicon has changed radically even in just the 10 or so years since they became really programmable. Off the top of my head: from AMD you had VLIW with an issue width of 5 and then 4, going back to the HD5x00 and HD6x00 series, single-SIMD-per-core before that, multiple-SIMD-per-core recently in GCN 1.0, and GCN 1.1/2.0 with the same basic architecture but better integration into the system's memory hierarchy. From NVIDIA, you've had half-sized, double-pumped cores, single 'cores' with very many relatively independent ALUs, and most recently (Maxwell) a design that shrunk back the number of ALUs per core.

 

Both companies do expose a kind of assembly language for recent GPUs if you look around for it. It is entirely possible to write assembly programs for the GPU, or to build a compiler that can target them. But the mapping isn't quite as 1-to-1 as on, say, x86 (and even on x86 you're only talking to a logical 'model' of the CPU; the actual micro-code execution is more RISC-like).
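On the NVIDIA side, for instance, you can inspect both the portable pseudo-assembly (PTX) and the real machine code (SASS) for any kernel. A small hypothetical sketch, assuming the CUDA toolkit is installed (the file and kernel names here are made up):

// saxpy.cu -- a toy kernel to disassemble.
__global__ void saxpy(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];  // usually one fused multiply-add
}

// Virtual ISA (stable across chips):  nvcc -ptx saxpy.cu -o saxpy.ptx
// Real machine code (per-chip):       nvcc -cubin saxpy.cu -o saxpy.cubin
//                                     cuobjdump --dump-sass saxpy.cubin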

 

If you branch out from the PC and look at the mobile GPUs you find in phones and tablets, then you have tiled architectures too. ARM's latest Mali GPU architecture, Midgard, is something really unique: every small-vector ALU executes completely independently of any other, so every pixel could go down a different codepath with no penalty at all, which is something no other GPU can do. On a normal GPU, a divergent branch (an 'if' where the condition is true for some pixels and false for others) forces the SIMD unit to execute every taken path serially, so the penalty grows with the number of distinct codepaths and can quickly become severe.
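To illustrate divergence on a conventional lockstep GPU, here is a hypothetical CUDA sketch (heavyA/heavyB are made-up stand-ins for real per-pixel work). When neighbouring threads disagree about the condition, the warp has to run both bodies back to back:

__device__ float heavyA(int i) { return sinf(i * 0.1f); }  // stand-in work
__device__ float heavyB(int i) { return cosf(i * 0.2f); }  // stand-in work

// Divergent: lanes within the same warp take different paths, so the warp
// executes heavyA AND heavyB, masking lanes in and out as it goes.
__global__ void divergent(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = (i % 2 == 0) ? heavyA(i) : heavyB(i);   // alternates per lane
}

// Coherent: every lane in a warp agrees, so each warp runs only one path.
__global__ void coherent(float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = ((i / 32) % 2 == 0) ? heavyA(i) : heavyB(i);  // per-warp choice
}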

 

Then you have something similar in Intel's MIC platform, which was originally going to be a high-end GPU ~5 years ago. The upcoming incarnation of MIC is Knights Landing, which is up to 72 customized x86-64 processors based on the most recent Silvermont Atom core. It's been customized by chopping off x87 floating point; each physical core runs 4 hyper-threads and has two 512-bit SIMD units, and there's up to 8 GB of on-package RAM on a 512-bit bus giving ~350 GB/s of bandwidth.

 

Anyhow, I get talking about cool hardware and I start to ramble :) Long story short: yes, you can do what you want to do today, but the tricky part is that GPUs just aren't organized like a CPU or even a bunch of CPUs (Midgard is the exception, Knights Landing to a lesser extent), so you can't expect CPU-style code to run well on them. A big part of making code go fast on a GPU is partitioning the problem into manageable, cache-and-divergence-coherent chunks, which tends to be either super-straightforward (easy) or to require pulling the solution entirely apart and putting it back together in a different configuration (hard).
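As a tiny hypothetical example of that partitioning, here is the classic shared-memory tiling pattern in CUDA (sizes and names invented for illustration): each block stages a coherent chunk of the input in fast on-chip memory once, so the expensive traffic to VRAM stays coalesced and every thread in the block reuses it.

#define TILE 256   // threads per block; also the tile width
#define RADIUS 3   // stencil half-width

__global__ void stencil(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 2 * RADIUS];
    int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int l = threadIdx.x + RADIUS;                   // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;               // coalesced bulk load
    if (threadIdx.x < RADIUS) {                     // a few threads load the halo
        tile[l - RADIUS] = (g >= RADIUS) ? in[g - RADIUS] : 0.0f;
        tile[l + TILE]   = (g + TILE < n) ? in[g + TILE] : 0.0f;
    }
    __syncthreads();                                // whole tile is ready

    if (g < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)     // neighbours come from
            sum += tile[l + k];                     // on-chip memory, not VRAM
        out[g] = sum;
    }
}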

 

I have a problem understanding that, as my knowledge is low - I reread it a few times. Probably to know how (some) GPU is exactly built, I would have to work at some company making them :/

But I can show my simple schematic picture of this and ask for some clarification if possible. To me, the GPU world seems to consist of these parts:

- Input VRAM (containing some textures, geometry and such things)

- Output VRAM (containing some framebuffers etc.)

- Some CPUs (I don't know anything about these, but I imagine they are something like normal CPUs driven by some assembly, though maybe this assembly is a bit simpler (?) - they also say they are close to x86 SSE assembly, at least by the type of registers? - I don't know)

- Some assembly program (or programs) - there must be some program if there are CPUs - but it's a total unknown whether this is one piece of code or many programs, one for each CPU - and are those programs clones of one program, or are they all different?

The question of what those programs look like is one unknown; the other important unknown is whether such hardware (I mean the GPU), when executing all this transformation from input VRAM to output VRAM, uses only those assembly programs and those CPUs, or whether it also has some other kind of hardware that does some transforms but is not CPU+assembly - some other 'hardware construct' (maybe something hardcoded in transistors, not programmable by assembly - if such things exist).

I'm speculating.


Probably to know how (some) GPU is exactly built, I would have to work at some company making them :/


Each individual GPU can use an entirely different instruction set, even within the same series of GPUs. AMD rather publicly switched from a "VLIW5" to a "VLIW4" architecture recently, which necessitated an entirely different instruction set (and during the transition, some GPUs they released used the old version while other variants in the same product line used the new one). Even within a broad architecture like AMD's VLIW4, each card may have minor variations in its instruction set, which are abstracted by the driver's low-level shader compiler.

Your only sane option is to compile to a hardware-neutral IR like SPIR (https://www.khronos.org/spir) or PTX (which is NVIDIA-specific). SPIR is the Khronos Group's intended solution to this problem: it allows a multitude of languages and APIs to target GPUs without having to deal with the unstable instruction sets.
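As a hypothetical sketch of that workflow on the NVIDIA side (file and kernel names invented, and the kernel assumed to be declared extern "C" so its name isn't mangled): you compile to PTX offline, and the driver JIT-compiles the portable IR into whatever instruction set the installed GPU actually uses.

#include <cuda.h>   // CUDA driver API
#include <cstdio>

int main()
{
    cuInit(0);
    CUdevice dev;   cuDeviceGet(&dev, 0);
    CUcontext ctx;  cuCtxCreate(&ctx, 0, dev);

    // saxpy.ptx was produced offline with `nvcc -ptx`; the driver now
    // translates the IR for this particular chip.
    CUmodule mod;   cuModuleLoad(&mod, "saxpy.ptx");
    CUfunction fn;  cuModuleGetFunction(&fn, mod, "saxpy");

    // ... allocate buffers with cuMemAlloc, pack an argument list, then:
    // cuLaunchKernel(fn, blocks,1,1, threads,1,1, 0, NULL, args, NULL);

    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}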

Some CPUs (I don't know anything about these, but I imagine they are something like normal CPUs driven by some assembly, though maybe this assembly is a bit simpler (?) - they also say they are close to x86 SSE assembly


GPUs are not like CPUs. They are massive SIMD units. They'd be most similar to doing SSE/AVX/AVX-512 coding, except _everything_ is SIMD (memory loads/stores, comparisons, branches, etc.). A program instance on a GPU is really a collection of ~64 lanes all running in lockstep. That's why branching in GPU code is so bad: in order for one instance to go down a branch, _all_ instances must go down that branch (and the results are thrown away for the instances that shouldn't have taken it).
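You can actually observe that lockstep from inside a kernel. A hypothetical CUDA 9-style sketch (on NVIDIA the group, called a warp, is 32 lanes rather than 64): every lane votes on a condition, and each lane receives the whole warp's answer in one step, which only works because the lanes advance together.

#include <cstdio>

__global__ void lockstep()
{
    int lane = threadIdx.x % 32;                // position within the warp
    unsigned mask = __ballot_sync(0xffffffffu,  // all 32 lanes vote at once:
                                  lane < 20);   // which lanes pass the test?
    if (lane == 0)                              // one lane reports for the warp
        printf("active mask: 0x%08x\n", mask);  // prints 0x000fffff
}

int main()
{
    lockstep<<<1, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}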

You might want to Google "gpu architecture" or check over those SPIR docs.


You guys are wayyy over my head with this stuff.  I'm kinda with fir on this; I only have a vague notion of what a GPU does, but I figure it's like he says, a vast array of memory as data input, a similar vast array as output, and a set of processors that read and process instructions from yet another array of memory to transform the input to the output.  Is that not the case?

 

Do all the processing units always work in lock-step or can they be divided into subgroups each processing a different program on different input sets?

 

Is there a separate processor that divides up the data and feeds it or controls the main array of processors as appropriate?

 

I mean, I can describe how a traditional CPU works down to the NAND gate level (and possibly further), but I'd be interested in learning more about GPU internals.


Me too, especially to learn (or discuss the most important knowledge) in an easy way; some docs are harder than the Intel manuals and can be an obstacle. Most important would be to get a picture of what this assembly code looks like and how it's executed - for example, whether it is one long linear assembly routine like:

 

start:
  ..assembly..
  ..assembly..

  ..assembly..
  ..assembly..

  ..assembly..
end.

- one long routine that is handed to a pack of 64 processors to consume - or whether it has some other structure.

 

Some parts of the pipeline are programmable by the client programmer, but what about the other parts - are those driven by some internal assembly code, or what? It's hard to find answers, but it would be interesting to know.
