For me, OpenCL turned out to be a big letdown, even though it looked really cool and promising at first.
The driver part is very simple. The user already installed OpenCL when installing the graphics card driver (without even knowing it!). That driver includes a vendor-specific implementation and the stub DLL (the ICD loader) that you use. There is nothing for you to do, distribute, or install.
If the user has no OpenCL installed (a 10-year-old graphics card?), there's nothing you can do about it.
Put simply, you either link against opencl.lib (which uses the opencl.dll already present on the system) or load the DLL/.so dynamically (I prefer the latter, since I had trouble linking directly, and dynamic loading isn't very hard). Either way, that library forwards your calls to the "secret" vendor implementation for the platform/device combination you use. Your work is basically the same as with OpenGL when using 2/3/4 functionality or extensions.
You basically need to write a small GLEW for CL. When I tried a year or two ago, searching the internet for "OpenCL ICD loader" turned up a BSD-licensed library for CL 1.0 on Apple. It needed only some minor fixes to work on Windows, and I had to add a few tidbits for CL 1.1 (about two minutes of work once you have the skeleton!).
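A minimal POSIX sketch of such a loader follows. It is not the BSD-licensed library mentioned above: the table layout, the names fn_table, load_opencl and count_platforms, and the hand-written stand-in typedefs are my own assumptions, and only a handful of entry points are listed. On Windows, LoadLibraryA("OpenCL.dll") and GetProcAddress would take the place of dlopen and dlsym.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the CL/cl.h types so the sketch builds without the Khronos
   headers (cl_int = int32_t, cl_uint = uint32_t). On 32-bit Windows the real
   entry points are __stdcall (CL_API_CALL), which these typedefs ignore. */
typedef int32_t cl_int_t;
typedef uint32_t cl_uint_t;
typedef struct _cl_platform_id *cl_platform_id_t;
typedef cl_int_t (*PFN_clGetPlatformIDs)(cl_uint_t, cl_platform_id_t *, cl_uint_t *);

/* Hypothetical table of the entry points this application wants. */
enum { FN_GetPlatformIDs, FN_GetDeviceIDs, FN_CreateContext, FN_CreateSubBuffer, FN_COUNT };
static const struct { const char *name; int required; } fn_table[FN_COUNT] = {
    { "clGetPlatformIDs", 1 },
    { "clGetDeviceIDs", 1 },
    { "clCreateContext", 1 },
    { "clCreateSubBuffer", 0 }, /* OpenCL 1.1: optional, may stay NULL */
};
static void *fn_ptr[FN_COUNT]; /* resolved addresses, NULL if missing */
static void *cl_lib;           /* handle of the ICD loader, kept open */

/* Opens the ICD loader and resolves every entry point in fn_table.
   Returns 1 if all required functions were found, 0 otherwise. */
static int load_opencl(void) {
    static const char *const sonames[] = { "libOpenCL.so.1", "libOpenCL.so" };
    for (size_t i = 0; i < sizeof sonames / sizeof sonames[0] && !cl_lib; ++i)
        cl_lib = dlopen(sonames[i], RTLD_NOW | RTLD_LOCAL);
    if (!cl_lib) return 0; /* no OpenCL installed: nothing you can do */

    int ok = 1;
    for (int i = 0; i < FN_COUNT; ++i) {
        fn_ptr[i] = dlsym(cl_lib, fn_table[i].name);
        if (!fn_ptr[i] && fn_table[i].required) ok = 0;
    }
    return ok;
}

/* Example call through the table: number of platforms, or -1 on failure. */
static long count_platforms(void) {
    if (!fn_ptr[FN_GetPlatformIDs]) return -1;
    cl_uint_t n = 0;
    cl_int_t err = ((PFN_clGetPlatformIDs)fn_ptr[FN_GetPlatformIDs])(0, NULL, &n);
    return err == 0 /* CL_SUCCESS */ ? (long)n : -1;
}
```

Call load_opencl() once at startup and fall back to a CPU path if it returns 0; on glibc older than 2.34, link with -ldl. Note that the loader exporting clCreateSubBuffer only tells you the loader knows about CL 1.1; each platform's actual version still has to be queried with clGetPlatformInfo and CL_PLATFORM_VERSION.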
So far, so good. Now comes the nasty part. Identifying the "correct" device to use isn't easy or straightforward. OpenCL is maximally flexible, maximally portable, maximally heterogeneous and whatnot, and this is maximally shit. There is no single good way to choose the "correct" thing.
For the "usual" case, where you want to consume the output for some kind of rendering, the only thing that reliably works (works at all, or works without an explicit round trip) is creating a compatible CL context that lives on the same device, from an existing GL context. For this you need an extension (cl_khr_gl_sharing, which is practically omnipresent, but could in theory be missing... what do you do then?), and despite all the "portability", this requires platform-specific code, grrr...
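The platform-specific part is mostly the context property list. The sketch below only builds that list, taking the native handles as integers so it compiles without GL or CL headers; the enum, the function name and the SKETCH_ constants are hypothetical stand-ins (the numeric values are the ones in the Khronos CL/cl.h and CL/cl_gl.h, which real code should include instead), and macOS, which uses an Apple-specific sharegroup property, is not covered.

```c
#include <stdint.h>

/* Values as defined in CL/cl.h and CL/cl_gl.h; include those headers in real code. */
#define SKETCH_CL_CONTEXT_PLATFORM 0x1084
#define SKETCH_CL_GL_CONTEXT_KHR   0x2008
#define SKETCH_CL_GLX_DISPLAY_KHR  0x200A
#define SKETCH_CL_WGL_HDC_KHR      0x200B

typedef intptr_t cl_context_properties_t; /* cl_context_properties is intptr_t */

enum gl_window_system { WS_WGL, WS_GLX }; /* hypothetical selector */

/* Fills props (room for at least 7 entries) with a zero-terminated list for
   clGetGLContextInfoKHR / clCreateContext.
   gl_context: HGLRC (WGL) or GLXContext (GLX)
   native:     HDC (WGL) or Display* (GLX)
   platform:   the cl_platform_id to use
   Returns the number of entries written, including the terminator. */
static int build_gl_sharing_props(cl_context_properties_t *props,
                                  enum gl_window_system ws,
                                  intptr_t gl_context, intptr_t native,
                                  intptr_t platform) {
    int n = 0;
    props[n++] = SKETCH_CL_GL_CONTEXT_KHR;
    props[n++] = gl_context;
    props[n++] = (ws == WS_WGL) ? SKETCH_CL_WGL_HDC_KHR : SKETCH_CL_GLX_DISPLAY_KHR;
    props[n++] = native;
    props[n++] = SKETCH_CL_CONTEXT_PLATFORM;
    props[n++] = platform;
    props[n++] = 0; /* terminator */
    return n;
}
```

In real code the handles would come from wglGetCurrentContext()/wglGetCurrentDC() or glXGetCurrentContext()/glXGetCurrentDisplay(). The list is then passed to clGetGLContextInfoKHR with CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR to find the device driving the GL context, and to clCreateContext with that device. clGetGLContextInfoKHR is an extension function, so it has to be fetched with clGetExtensionFunctionAddressForPlatform (CL 1.2) or clGetExtensionFunctionAddress (CL 1.1).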
Of course, you might not want a context on the same device, but rather on another device (you've explicitly said so, too). If you have two GPUs, it makes sense, for example, to use one for graphics and one for physics. And it "just works", right?
Sadly, this isn't well supported, if at all. You must manually copy data back and forth via the host to make it work (which may be slower than just doing the work on the CPU or on the main GPU), even though common sense tells you "hey, I have SLI/Crossfire, the driver could do this an order of magnitude faster and more easily, without me even knowing". Maybe there is a way to get this working, but I'm not aware of it. In my experience, everything except "create a CL context from a GL context" sucks big time.
Other than the "create from GL" approach, you can enumerate platforms and devices and choose whichever you want, but if you search the internet, you'll be surprised to find that hardly anyone does anything but pick the first platform and the first device that comes up. Why? Because that's the only thing that isn't totally convoluted and actually works fine. You can easily write 50-100 lines of code just to figure out which device to create a context for, and what you end up with may not be the best choice at all.
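To show why this gets convoluted, here is a sketch of just the "choose" step, assuming the enumeration loops (clGetPlatformIDs, clGetDeviceIDs, clGetDeviceInfo) have already filled in a small struct per device. The struct, the function names and the scoring weights are hypothetical; the heuristic is a crude guess, since compute units and clock rates aren't comparable across vendors.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of what clGetDeviceInfo reports (hypothetical struct). */
typedef struct {
    int is_gpu;                /* CL_DEVICE_TYPE includes CL_DEVICE_TYPE_GPU */
    uint32_t compute_units;    /* CL_DEVICE_MAX_COMPUTE_UNITS */
    uint32_t clock_mhz;        /* CL_DEVICE_MAX_CLOCK_FREQUENCY, in MHz */
    uint64_t global_mem_bytes; /* CL_DEVICE_GLOBAL_MEM_SIZE */
    int available;             /* CL_DEVICE_AVAILABLE */
} DeviceInfo;

/* Hypothetical heuristic: compute units times clock, doubled for GPUs,
   halved for devices with less than 256 MiB of global memory. */
static double score_device(const DeviceInfo *d) {
    if (!d->available) return -1.0;
    double s = (double)d->compute_units * (double)d->clock_mhz;
    if (d->is_gpu) s *= 2.0;
    if (d->global_mem_bytes < (256ull << 20)) s *= 0.5;
    return s;
}

/* Returns the index of the highest-scoring device, or -1 if none scores > 0. */
static int pick_device(const DeviceInfo *devs, size_t count) {
    int best = -1;
    double best_score = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double s = score_device(&devs[i]);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```

Even this ignores whether the device can share with your GL context, which driver is least buggy, and whether the "best" GPU is already busy rendering, which is exactly why most code just takes the first device.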