OpenCL useful at all? tl;dr warning

8 comments, last by Telgin 12 years, 7 months ago
Hello

I've been looking at HPC with OpenCL for the past few weeks and I'm wondering if it's actually good at what it does. Is it worth it yet? Will it ever be? I feel it's full of promises it won't ever be able to keep.

The idea that OpenCL can split the processing load between the different computing devices is nice, but does it actually work in practice? I'm trying to build a simple computational framework for my audio-visualization project (if such a thing can exist...), but adding OpenCL to it seems a bit useless. I can always find ways of doing what I want outside OpenCL, with less work, and OpenCL mostly puts restrictions on it (load balancing between unsupported computational units is hard and can be a waste).

If I want to do imaging I'll use GLSL, if I want to do DSP I'll use the sound card, if I want to run some very specific algorithms really fast I'll use an FPGA (I guess :) ), and if I want to run control / logic code I'll send it to the CPUs, no...?

I'm the example: I have an Intel CPU with an NVidia GPU. With NVidia's OpenCL drivers, I can only compute on the GPU. With NVidia giving me only the GPU, why would I generate OpenCL code instead of just GLSL? I'd rather have one less LLVM backend..!

Aren't CPUs, GPUs, DSPs and FPGAs too different from each other to be treated the same? Executing branching code on a GPU is a waste: GPUs are streamlined, CPUs are fine-grained, and you should use CPUs for that instead, no? And vice versa.

I don't know much about the other accelerators (DSPs, the Cell processor, FPGAs) but I would think the same problems occur. I guess most people don't see them either; my sound card doesn't show up at all in OpenCL queries... If you have examples of OpenCL being useful there, please share them with me. Since I started looking at it, I've only seen OpenCL used as a replacement for GPGPU; it's not so heterogeneous.

I was delighted when I first learnt about OpenCL; it looked quite promising. I'm also a portability advocate, but within limits... For HPC I feel it's wishful thinking.

Thanks for any enlightenment!

Regards,
Guillaume
I've only ever seen OpenCL billed as a GPGPU solution, so I don't really see where you're coming from. It certainly isn't a magic wand that makes all computational processes better; you're right about that. But it does seem pretty darn good if you want to go GPGPU programming.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]




Basically. An OpenCL kernel can be compiled and run on a CPU, but you get very little benefit over a simple threading/thread pooling library. The OpenCL specification itself is geared 100% toward GPU architecture. In fact, the current spec is nearly identical to CUDA circa major versions 2-3.

And as an aside, since GPGPU solutions are not currently, and probably never will be, consumer-level solutions (except in extremely rare cases where people are willing to purchase multiple GPUs and dedicate one or more to GPGPU), there really is very little reason to go with anything other than CUDA (or possibly DirectCompute if you prefer HLSL) on an nVidia-powered solution. CUDA is much more advanced than OpenCL, and in a lab or server setting (where GPGPU makes the most sense), you have full control over hardware composition. Also, the nVidia development and debugging tools for GPGPU are simply the best available.


This.

I think OpenCL has potential, but right now at least it's no better than CUDA and has its own limitations holding it back. The only real thing going for it right now is the fact that it can be compiled for a CPU and run without a GPU, but that's not as simple as it sounds. You might be tempted to say that you could code a solution in OpenCL and use the GPU when available or fall back to the CPU, but reworking a problem to run in OpenCL is non-trivial. Worst of all, the current implementations are not very good if you want to use a CPU and a GPU on the same system. You have to install AMD's platform to run it on the CPU, then install nVidia's to run it on nVidia GPUs. The nVidia platform won't compile to CPU at all, and from what I understand the AMD platform doesn't support any GPUs yet, not even AMD's.

So, that scraps the idea of doing this in any typical program (such as a game), since it demands a lot for a consumer product. It also means that right now at least, there's no way to run an OpenCL kernel on both the CPU and GPU at the same time, which should in principle be possible.
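The "use the GPU when available, otherwise fall back to the CPU" idea above can be sketched without any OpenCL binding at all. This is only an illustration of the selection logic, assuming devices are grouped per installed platform as described; `pick_device` and the platform dictionaries are hypothetical stand-ins, not a real OpenCL API.

```python
def pick_device(platforms, preference=("GPU", "CPU")):
    """Return (platform_name, device_type) for the most preferred device type.

    `platforms` is a hypothetical mapping of platform name to the device
    types that platform exposes. Every platform is searched, because a GPU
    and a CPU may only be reachable through different vendor platforms.
    """
    for wanted in preference:
        for name, device_types in platforms.items():
            if wanted in device_types:
                return name, wanted
    return None  # no usable device on any platform


# Made-up inventories mirroring the setup described above:
# one vendor platform exposing only the CPU, another only the GPU.
both_installed = {"AMD": ["CPU"], "NVIDIA": ["GPU"]}
cpu_only = {"AMD": ["CPU"]}

print(pick_device(both_installed))
print(pick_device(cpu_only))
```

Selecting the device is the easy part; as noted above, the kernel itself usually still has to be reworked to perform well on whichever device ends up being chosen.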

The only use I've found for OpenCL is in my graduate research where I have to deal with the AMD + nVidia platform installation and where I switch between the two when I need to swap between devices (CPU / GPU). If CUDA still supported CPU emulation, we'd have no use for OpenCL.

The future though, well, maybe that's a different story. Maybe one day general graphics cards will support OpenCL and installing the platform to support them will be simple. Maybe one day the platforms will support running kernels on multiple devices simultaneously. Until then though, it's hard to argue for OpenCL when CUDA is still around and a lot more mature.
Success requires no explanation. Failure allows none.


I agree that right now OpenCL has limits, but given that it is at a fairly early stage compared to CUDA and others, that is understandable. In this one case, though, OS X beats the hell out of Win32's support, as OpenCL is integrated and functional in the full intended manner; a bit buggy sometimes but very usable. Basically the large benefit is exactly the bit you mention about targeting your work to the proper location: you get a CPU and a GPU device which can both execute the same code, though you have to compile for each device. Obviously, for pure straightforward math the GPU is likely to be so much faster that sending the code to the CPU at the same time is virtually worthless; on the other hand, if the code is highly conditional with a lot of random'ish data access, the CPU is likely to leave the GPU in the dust. That of course seems to make having the two devices somewhat worthless, since you would just use one or the other.

But you get some rather important benefits from the design which can't be found with the other solutions. Assuming Nvidia and AMD get their stuff to fully compliant states fairly soon (AMD had GPU support last I checked, Nvidia quits dicking around with GPU only, etc.), OpenCL can/will be very useful. On the CPU side, it is inherently multi-core and SIMD optimized, so even if you are doing complicated logic which the GPU doesn't much like, it can replace many multicore solutions on the CPU side. If you are doing pure math and such, writing it for the GPU will of course be nice and fast. The combination of the two is highly fun to play with, at least on OS X.

As an example of utility, I coded up a basic OpenCL conversion of an old BSP compiler. I ran it on both CPU and GPU and it was about 75-100 times faster at converting a poly soup test scene. Not bad, but not really as impressive as I expected; my Nvidia OpenCL version of the same code on Win32 performed about the same. (Not apples to apples though: my older Mac Pro is a quad with a slightly dated Nvidia card.) So I went through the code and split it into one file with the high-level logic and another with the mostly pure math bits. I ran the logic code on the CPU and the math on the GPU: hmm, somewhere in the range of 500-1000 times faster. OK, getting impressive. So, with about a day's work I got a ~1000x speedup over my old simple single-threaded BSP compiler, had only two languages involved (C++ and OpenCL), and it was relatively simple and straightforward. Compared to writing a multicore system for the CPU myself and optimizing it for SIMD, OpenCL saved considerable work on that side alone; adding GPU support saved setting up OpenGL for GLSL etc., and I can run it without a window, as a service.

Another point in OpenCL's favor is that, as an open standard, it won't likely go away if GPUs get phased out due to some new tech or Intel/AMD having CPUs fast enough to completely replace GPUs. (I'm not convinced GPUs are here to stay; I've seen complex sound chips come and go, blitters come and go, secondary FPUs, etc. CPUs absorbed most of those and I don't doubt separate GPUs will go the same way.) Hell, it may be running on iOS or Android in the near future, which you can't really say for DirectCompute, CUDA or other items except HLSL of course.
The only thing I will add to that is this:

The OpenCL kernel development paradigm has a 1 to 1 mapping with CUDA and GPU architecture. The idea of a multidimensional threadblock with multiple memory pathways (global, shared, thread) and the way in which threads share memory and synchronization is entirely a result of the GPU architecture, and if GPU architecture goes out of vogue, so will the current assumed model in OpenCL. Thems the breaks if you want to take full advantage of hardware architecture.
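To make the model described above concrete, here is a small pure-Python sketch (it does not call OpenCL) of how a multidimensional index space is cut into work-groups, with a per-group scratch list standing in for local/shared memory. The function names are made up for illustration; the index arithmetic assumes a zero global offset and, as in OpenCL 1.x, a global size that is a multiple of the work-group size.

```python
from itertools import product


def enumerate_work_items(global_size, local_size):
    """Yield (global_id, group_id, local_id) tuples for an N-dimensional index space."""
    assert len(global_size) == len(local_size)
    # Assumption: every global dimension divides evenly into work-groups.
    assert all(g % l == 0 for g, l in zip(global_size, local_size))
    num_groups = [g // l for g, l in zip(global_size, local_size)]
    for group_id in product(*(range(n) for n in num_groups)):
        for local_id in product(*(range(l) for l in local_size)):
            # global id = group id * group size + local id, per dimension
            global_id = tuple(
                grp * size + loc
                for grp, size, loc in zip(group_id, local_size, local_id)
            )
            yield global_id, group_id, local_id


def group_sums(data, local_size):
    """Sum a 1D array per work-group via a per-group scratch buffer."""
    scratch = {}
    for (gid,), (grp,), (lid,) in enumerate_work_items((len(data),), (local_size,)):
        # Each work-item copies one element from "global" into its group's "local" buffer.
        scratch.setdefault(grp, [0] * local_size)[lid] = data[gid]
    # On real hardware a barrier would separate the copy from this reduction step.
    return [sum(scratch[grp]) for grp in sorted(scratch)]


print(group_sums(list(range(8)), 4))
```

The loops here run sequentially; on a GPU the work-items of a group run concurrently, which is why the real model needs explicit barriers and separate memory spaces.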



I'm not sure I entirely agree with this. The model is very similar to the CPU-based multicore threading systems I've put into several games. As such, I tend to believe it has a lot of potential in any environment. I've actually been thinking about using it, once the support is up to snuff, as my "script" language for game objects, running purely on the CPUs. My thinking is that because I usually run thousands of objects at any time, using OpenCL as the logic language would remove a fair amount of the work involved in maintaining a multicore engine, dealing with goofy threading mistakes etc., and give me a compiled "script" language which multicores well. Even if it is not "great" on the CPU, it should be able to accomplish this quite well.
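The "one kernel invocation per game object" idea can be sketched on the CPU with nothing but a thread pool. This is only an illustration of the programming model, not OpenCL: `update_object` and `run_kernel` are hypothetical names, and in standard CPython threads will not give a real speedup for CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def update_object(index, positions, velocities, dt):
    """The 'kernel': computes the new position of one object from read-only inputs."""
    return positions[index] + velocities[index] * dt


def run_kernel(kernel, count, *args, workers=4):
    """Run `kernel` once per index in range(count) and collect results in index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: kernel(i, *args), range(count)))


positions = [0.0, 1.0, 2.0]
velocities = [1.0, 1.0, 1.0]
print(run_kernel(update_object, len(positions), positions, velocities, 0.5))
```

Because each invocation only reads shared inputs and returns its own result, there is no locking to get wrong, which is the appeal of the kernel style for object updates.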
This question sums it all up:

I'm wondering if it's actually good at what it does? If it's worth it, yet? Or ever? I feel it's full of promises it won't ever be able to keep up to.


If it addresses your problem then yes, it is great. It is worth it for the problem. It will live up to all of its promises and more.

If it does not, then no, it is not worth it for the problem. It is a bad fit for that problem and you should use something else.



There are some existing problems that OpenCL is an excellent fit for. Many computationally-intense problem solvers have experienced ecstasy thanks to OpenCL. For those problems it is a wonderful solution to them. It is the hammer for their nail.


There are many other problems for which OpenCL is a poor fit. For those problems it is a horrible solution. It is a hammer but your problem needs a circular saw.


Which one applies for your problems? We can only speculate.


Your sound card could show up if someone were to write a backend for it; however, chances are it's too specific to be useful anyway (assuming you even HAVE a hardware sound card; most sound these days is software driven, certainly on Windows).

NV probably don't produce x86/x64 code because they don't have a license to produce the chips and are thus unlikely to want to support that market; I wouldn't rule out an ARM backend one day, but that's unlikely for now. (And I don't know where Telgin got the idea that AMD don't support their graphics cards; I've had functional OpenCL support on mine for some time now. They don't support ALL of them; I think they focus on CL1.1 spec cards, but you can certainly run OCL code on CPU and GPU with their drivers installed.)

Branching code on a GPU isn't a waste. How much it costs depends on the branching pattern, the current hardware and the workload, and branching can and will improve performance if used correctly.

While you might use GLSL for imaging, you are unlikely to use it if you need to do something more complex which requires finer-grained control over memory accesses; last I checked GLSL gave you no control over access to the local store and many other pieces of compute functionality. There is a class of problems which suit the GPU well, and if you aren't using an NV GPU (and not everyone in the world does after all) CUDA is of no use anyway. (There are certain classes of problems where AMD's hardware outperforms NV's; bitcoin mining is one such example. The HD58xx series is very good at it.)

Granted, not all problems will run well on all devices; however, OpenCL never promised they would. It just opens up a common way of programming those problems and, for ones which will translate across well, lets you write code which can be compiled against a CPU or GPU device. Further support would require backends, but as the current supporters are AMD, NV and Intel on the hardware front, it's not surprising those are the most common cases.

Going forward, OpenCL has potential due to AMD's Fusion APUs, which pair an ALU array with a number of x86 cores. We are still some way off those being common in all PCs, but we'll get there in time, and at that point OpenCL will be very helpful, including for DSP/audio work. Intel recently gave a presentation about using OpenCL to perform audio tasks, including raytracing into the world to improve audio path modeling for sound sources.

Granted, there is some work to do still; however, the API is only at 1.1. Most ideas are not perfect right away after all, and for something which is cross-hardware it takes a while to get a complete baseline and get everything up to speed.

So, yes, OpenCL is useful.
YMMV but that's true of ANY solution after all.
[quote]
(And I don't know where Telgin got the idea that AMD don't support their graphics cards; I've had functional OpenCL support on mine for some time now. They don't support ALL of them, I think they focus on CL1.1 spec cards but you can certainly run OCL code on CPU and GPU with their drivers installed).
[/quote]


I could have sworn I read this on AMD's forums somewhere. I think it was even one of their developers that wrote the comment. Granted, that post may have been years old and quite irrelevant by now, I didn't check. The hardware we have in our computing lab is all GeForce cards so I don't have any exposure to AMD's GPUs with OpenCL.

[quote]
Branching code on a GPU isn't a waste, however this depends on the branching pattern for current hardware and work loads, however branching can and will improve performance if used correctly.
[/quote]

I can also vouch for this. The compiler / hardware are generally quite good at reducing the impact of branches. Unless you're using some very deeply nested branches and / or branching in a way that most threads take different paths, it doesn't matter so much. Memory throughput is a much bigger bottleneck in my experience.
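A toy cost model shows why the branching pattern matters more than the mere presence of a branch. It assumes a simplified lockstep design in which a group of threads pays for every side of a branch that at least one of its threads takes; `divergence_cost`, the group width and the cost numbers are all made up for illustration and do not describe any particular GPU.

```python
def divergence_cost(conditions, group_width, cost_then, cost_else):
    """Total cost of one if/else over all threads under a lockstep-group model.

    `conditions[i]` is the branch condition seen by thread i. Each group of
    `group_width` consecutive threads pays for every branch side that any
    thread in the group takes.
    """
    total = 0
    for start in range(0, len(conditions), group_width):
        group = conditions[start:start + group_width]
        if any(group):
            total += cost_then  # at least one thread takes the "then" side
        if not all(group):
            total += cost_else  # at least one thread takes the "else" side
    return total


coherent = [True] * 4 + [False] * 4   # each group of 4 agrees on the branch
divergent = [True, False] * 4         # every group is split down the middle

print(divergence_cost(coherent, 4, 10, 10))
print(divergence_cost(divergent, 4, 10, 10))
```

In this model the same branch costs twice as much when neighbouring threads disagree, and nothing extra when they agree, which matches the point above that only heavily divergent branching really hurts.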
