It depends very much on the internal architecture -- because GPUs have always been hidden behind the problem they focus on, the underlying silicon has changed radically even in just the 10 or so years since they became really programmable. Off the top of my head: from AMD you had VLIW designs with issue widths of 5 and 4 going back to the HD5x00 and HD6x00 series, single-SIMD-per-core before that, and multiple-SIMD-per-core recently in GCN 1.0, with GCN 1.1/2.0 keeping the same basic architecture but integrating better into the system's memory hierarchy. From nVidia, you've had half-sized, double-pumped cores, single 'cores' with very many relatively independent ALUs, and most recently (Maxwell) a scaling back of the number of ALUs per core.
Both companies do expose a kind of assembly language for their recent GPUs if you look around for it, so it is entirely possible to write assembly programs for a GPU, or to build a compiler that targets them. But the mapping isn't as 1:1 as on, say, x86 (and even on x86 you're only talking to a logical 'model' of the CPU; the actual micro-coded execution underneath is more RISC-like).
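For example, on nVidia hardware you can hand-write PTX (their documented virtual ISA) or embed it in CUDA C with asm() -- and that's exactly the point about the mapping: the driver still compiles your PTX down to the real machine code (SASS) for whichever chip it finds. A minimal sketch, with made-up function names, just to show the flavor:

    // Wraps a single hand-written PTX instruction in a CUDA device function.
    // "add.f32" is a real PTX opcode; 0f3F800000 is PTX hex notation for 1.0f.
    __device__ float add_one(float x)
    {
        float y;
        asm("add.f32 %0, %1, 0f3F800000;" : "=f"(y) : "f"(x));
        return y;
    }

    __global__ void bump(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = add_one(data[i]);  // every thread bumps one element
    }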
If you branch out from the PC and look at the mobile GPUs you find in phones and tablets, then you have tiled architectures too. ARM's latest Mali GPU architecture, Midgard, is something really unique -- every small-vector ALU is completely execution-independent of every other -- so every pixel can go down a different codepath for no penalty at all, which is something no other GPU can do. On a normal GPU there is a penalty for divergent branches (an 'if' whose condition is true for some pixels and false for others): every divergent path has to be executed one after the other, so the cost compounds with each divergent branch in the codepath and can quickly become severe.
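To make the divergence thing concrete, here's a toy CUDA kernel of my own (nothing vendor-specific about it). On ordinary SIMD/SIMT hardware, any warp whose threads disagree on the condition runs both sides of the 'if' back to back; on Midgard each pixel/thread would just take its own path:

    __global__ void shade(float *out, const float *in, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // The condition varies per element, so neighbouring threads diverge:
        if (in[i] > 0.0f)
            out[i] = sqrtf(in[i]);  // first the lanes where it's true run...
        else
            out[i] = 0.0f;          // ...then the lanes where it's false, serially
    }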
Then you have something similar in Intel's MIC platform, which was originally going to be a high-end GPU ~5 years ago. The upcoming incarnation of MIC is Knights Landing: up to 72 customized x86-64 cores based on the most recent Silvermont Atom core. The customizations: x87 floating point has been chopped off, each physical core runs 4 hyper-threads, each physical core gets two 512-bit SIMD units, and there's up to 8 GB of on-package RAM on a 512-bit bus giving ~350GB/s of bandwidth.
Anyhow, I get talking about cool hardware and I start to ramble :) -- Long story short: yes, you can do what you want to do today, but the tricky part is that GPUs just aren't organized like a CPU or even a bunch of CPUs (Midgard is the exception, Knights Landing to a lesser extent), so you can't expect CPU-style code to run well on one. A big part of making code go fast on a GPU is partitioning the problem into manageable, cache-and-divergence-coherent chunks, which tends to be either super-straightforward (easy) or to require pulling the solution entirely apart and putting it back together in a different configuration (hard).
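As an illustration of the 'easy' end of that, here's a minimal CUDA sketch of my own (the 256-thread tile size is just an assumption, nothing canonical): each block sums one contiguous, cache-friendly chunk of the input through fast on-chip shared memory, the global loads are coalesced, and there's next to no divergence to pay for:

    #define TILE 256  // must match the block size at launch

    __global__ void tile_sum(const float *in, float *block_sums, int n)
    {
        __shared__ float tile[TILE];         // this block's chunk, on-chip
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + tid;
        tile[tid] = (i < n) ? in[i] : 0.0f;  // one coalesced load per block
        __syncthreads();
        // Tree-reduce the tile; at the larger strides whole warps drop out
        // together, so there's very little divergence.
        for (int stride = TILE / 2; stride > 0; stride /= 2) {
            if (tid < stride)
                tile[tid] += tile[tid + stride];
            __syncthreads();
        }
        if (tid == 0)
            block_sums[blockIdx.x] = tile[0];  // one partial sum per block
    }

You'd launch it as tile_sum<<<(n + TILE - 1) / TILE, TILE>>>(in, block_sums, n) and add up the per-block partials afterwards; the 'hard' problems are the ones that don't decompose into independent tiles like this.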