OpenCL Driver/Runtime

So I have isolated some portions of an application that lend themselves to being parallelized. I already have some multi-threading in place that spins off threads to distribute these "tasks" across the available CPU cores.


Now I am interested in (optionally) delegating some of these tasks to a second GPU and would like to use OpenCL to do so. After an (albeit tiny) bit of research, it seems that detecting and initializing a GPU as a compute device is anything but straightforward. The OpenCL driver situation in particular is confusing to me: it seems that all major vendors supply OpenCL drivers/runtimes that need to be shipped and installed with my application? Because of this I am also wondering whether these drivers are specific to each vendor's hardware, and whether I'd have to ship all possible drivers with the application and then choose the "right" one to install depending on the client hardware?


Anyone have any experience with this or a reference to an article/blog that gives an overview of this situation?


For me, OpenCL turned out to be a big letdown although it looked really cool and promising at first.


About the driver, this is very simple. The user already installed OpenCL when installing the graphics card driver (without even knowing!), which includes a vendor-specific component plus the stub DLL that you use. There is nothing for you to do, nothing to distribute or install.

If no OpenCL has been installed by the user (a 10-year-old graphics card?), there's nothing you can do about it.


In simple terms, you either link against opencl.lib (using the opencl.dll that is already present) or load the DLL/.so dynamically (I prefer the latter, having had trouble linking directly, and dynamic loading isn't very hard), and that library will forward your calls to the "secret" implementation of the platform/device combination you use. Your work is basically the same as with OpenGL when using 2/3/4 functionality or extensions.

You basically need to write a small GLEW for CL. Searching the internet for "OpenCL ICD loader" gave me a BSD-licensed library for CL 1.0 on Apple when I tried a year or two ago; it only needed some minor fixes to work on Windows, and I had to add a few tidbits for CL 1.1 (which is like 2 minutes of work once you have the skeleton!).


So far so good. Now comes the nasty part. Identifying the "correct" device to use isn't easy or straightforward. OpenCL is maximally flexible and maximally portable, and maximally heterogeneous and whatnot, and this is maximally shit. There is no single good way to choose the "correct" thing.


For the "usual" use case, where you want to consume the output for some kind of rendering, the only approach that reliably works (works at all, or works without an explicit round trip) is creating a compatible CL context that lives on the same device as an existing GL context. For this you need to use an extension (which is in practice omnipresent, but in theory could still be missing... and what do you do then?), and despite all the "portability" this requires platform-specific code, grrr...


Now of course, you might not want a context that lives on the same device, but instead use another device (as you've explicitly said). If you have two GPUs, it makes sense, for example, to use one for graphics and one for physics. And it "just works", right?

Sadly, this isn't well supported, or supported at all. You must do some manual copying back and forth to/from the host to make it work (which may be slower than doing it on the CPU or on the main GPU), even though common sense tells you "hey, I have SLI/CrossFire, the driver could do this an order of magnitude faster and easier, without me even knowing". Maybe there is a way to get this working, but I'm not aware of one. In my experience, everything except "create CL context from GL context" sucks big time.


Other than the "create from GL" approach, you can enumerate platforms and devices and choose whatever you want to use, but if you search the internet, you'll be surprised to find that hardly anyone does anything but pick the first platform and the first device that comes up. You wonder why? Because that's the only thing that isn't totally convoluted and that actually works fine. You can easily write 50-100 lines of code just to figure out which device to create a context for, and what you end up with may not be the best choice at all.


Oh wow. Thanks a lot for the detailed reply!


Indeed, in my case I am already using one of the GPUs for the (OpenGL) rendering context and would like to use the second GPU for some things that are currently done on the CPU (again, optionally of course, reinforced by all the gotchas you mention).


I hadn't actually thought about using some sort of CrossFire/SLI technology here, but it makes a lot of sense. For example, one of the uses in my case would be tessellation, and the trip from

CPU -> 2nd GPU -> CPU (result) -> 1st GPU (render)

could then be trimmed down to

CPU -> 2nd GPU -> 1stGPU

and that would be great, but I guess the tech is not really there yet. I did also hear some horror stories about OpenCL/OpenGL interop breaking randomly with driver releases and therefore not really seeing use outside of scientific (= controlled environment) settings.


Is there any hope of these things being fixed in the not-so-distant future? Is CUDA maybe worth a closer look in terms of reliability (even if it's hardware-specific)?



Also wondering if the situation is any better or worse on OS X? Specifically looking at hardware like the new dual-AMD Mac Pro machines (where CUDA is obviously a no-go).

Edited by h3xl3r

CUDA means "will never, not ever, not even a bit, run on AMD or Intel", which is a dealbreaker for me. Of course, if that little detail doesn't matter to you, then CUDA is a whole lot better.


Actually, OpenCL tries to do the right thing; it just doesn't get there, because it's too complex/obscure (and went a step too far), and because the interoperability is too badly implemented.


I would wish for something like the user being able to define in the control panel which GPUs are eligible for computing; then I just say "give me a compute device", throw a kernel and some input and output buffer objects at it, and the rest is the driver's problem.

The only thing I'd really want to know is that my device can efficiently interoperate with the main GPU (for that, there would need to be a feature flag at context creation). I really don't want to know on which card a buffer lives or where a kernel executes, or anything else. I really don't want to know what it takes for the GPU to use my generated data as vertex input to draw some stuff. If it needs to be copied over CrossFire, the driver should just do it; if it needs to go down and back up over PCIe, then do that. If it's the same GPU, even better.


Of course, in theory there exist those mythical OpenCL accelerator cards, but nobody has them, and you wouldn't want to use them anyway, so all that is purely hypothetical. And then there are CPU implementations, but you can likely write equally fast (or possibly faster, since you are not bound by the API contract and the execution model) code on the CPU with less trouble. What you realistically want is to use the GPU (or one GPU if there are several), and you want this fast and with little trouble.


Compute shaders may very well be an alternative to OpenCL, as they're basically just what one wants (and with one less dependency!). Unluckily, they're not available until OpenGL 4.3, and then only on recent hardware. There is no such thing as a downgraded compute shader version akin to OpenCL 1.0, which basically runs fine on 10-year-old hardware and is kind of sufficient for 97.5% of everything.

