I've recently gone through the experience of porting a console/PC engine to VR, so I think I can answer most of your questions. However, I only have real hands-on experience with the Rift, so I'm not an expert on the Vive or PSVR.
* What kind of 3D (buffer) data does a VR kit need (minimally) to start rolling?
The current headsets only need stereo LDR color buffers. There's been some discussion of using depth and/or velocity to assist with reprojection, but currently none of them use that. So basically you'll give the SDK two separate textures (one for the left eye and one for the right eye), or a combined 2x-wide texture where the left half is the left eye and the right half is the right eye. We give them the latter. Really you can think of it as another swap chain, except this swap chain goes to the headset and not to a window. The number of pixels you end up giving them is about twice the number in a 1920x1080 buffer, which is higher than the display resolution due to the fisheye warping.
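To make that concrete, here's a rough sketch of creating that combined double-wide swap chain with the Oculus PC SDK (LibOVR 1.x) and D3D11. Struct fields and signatures have shifted between SDK versions, and the function name is my own, so treat this as illustrative rather than copy-paste ready:

```cpp
#include <OVR_CAPI_D3D.h>

// Create one shared swap chain: left eye renders to the left half,
// right eye to the right half.
ovrTextureSwapChain CreateEyeSwapChain(ovrSession session, ID3D11Device* device)
{
    ovrHmdDesc hmd = ovr_GetHmdDesc(session);

    // Ask the SDK for the ideal per-eye size at 1:1 center pixel density.
    ovrSizei left  = ovr_GetFovTextureSize(session, ovrEye_Left,  hmd.DefaultEyeFov[0], 1.0f);
    ovrSizei right = ovr_GetFovTextureSize(session, ovrEye_Right, hmd.DefaultEyeFov[1], 1.0f);

    ovrTextureSwapChainDesc desc = {};
    desc.Type        = ovrTexture_2D;
    desc.Format      = OVR_FORMAT_R8G8B8A8_UNORM_SRGB;
    desc.ArraySize   = 1;
    desc.Width       = left.w + right.w;                      // 2x-wide
    desc.Height      = (left.h > right.h) ? left.h : right.h;
    desc.MipLevels   = 1;
    desc.SampleCount = 1;
    desc.BindFlags   = ovrTextureBind_DX_RenderTarget;

    ovrTextureSwapChain chain = nullptr;
    ovr_CreateTextureSwapChainDX(session, device, &desc, &chain);
    return chain;
}
```

Each frame you render both eyes into the chain's current texture, commit it, and submit it along with the per-eye viewport rectangles.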
* Can you turn any 3D program into VR? Or does it really require specific techniques from the ground up?
From a purely engine/graphics point of view, I would say the answer is "mostly yes". There are a few things that don't work as well in VR (normal mapping is a bit more obvious, as are any tricks that don't give the eyes proper parallax), but for the most part you'll be fine with standard 3D graphics techniques. The biggest considerations are performance and gameplay/locomotion. Maintaining a consistent 90Hz is not easy, especially on PC. For gameplay you really want to think about how you can make a compelling experience in VR that plays to its strengths, and avoids motion sickness. I'll tell you right now that if you port an FPS to VR that uses analog sticks for movement and rotation, you're going to have some very nauseous players. There are some players that can handle more extreme situations without discomfort, but I personally am of the belief that as VR developers we have a responsibility to make comfort a top priority. It's already a small, niche market, and we're never going to expand past that if we make the average user want to throw up when they play our games.
* So I have a 3D (first person) game made in OpenGL (4.5), can that eventually be transferred to VR?
Oculus supports OpenGL, DX11, and DX12. OpenVR (Vive) supports DX9, DX11, DX12, and OpenGL. So yes, your OpenGL 4.5 renderer will work with either SDK.
* There are a couple of kits out there now (Oculus, PS4 VR, ...). I guess they come with their own SDKs. Do they roughly work the same (can I swap easily between them), or are these SDKs big and hard to master?
So Oculus has their own SDK, which is pretty straightforward and easy to use. It actually doesn't have a very big surface area for the core functionality: just some functions for creating a swap chain and presenting it, and functions for accessing the current pose and acceleration data from the headset and Touch controllers. There's also a separate platform SDK for integrating with their store and multiplayer systems, which is similar to Steamworks. Valve has OpenVR, which layers on a bit more abstraction than the Oculus SDK, but it actually lets you target both the Rift and the Vive. PSVR is a bit different due to being on a console, but that's under NDA so I don't want to get into specifics here.
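To give you a feel for how small the surface area is, here's a minimal sketch of the OpenVR per-frame flow. Enum and struct names have been renamed across OpenVR releases (the texture type enum in particular), so check the headers you're actually building against:

```cpp
#include <openvr.h>

// One-time setup; returns the IVRSystem* used for pose and property queries.
vr::IVRSystem* InitVR()
{
    vr::EVRInitError err = vr::VRInitError_None;
    return vr::VR_Init(&err, vr::VRApplication_Scene);
}

// Parameters are native D3D11 texture pointers in this sketch.
void MinimalFrame(void* leftEyeTexture, void* rightEyeTexture)
{
    // Block until the compositor wants a new frame, and get fresh poses.
    vr::TrackedDevicePose_t poses[vr::k_unMaxTrackedDeviceCount];
    vr::VRCompositor()->WaitGetPoses(poses, vr::k_unMaxTrackedDeviceCount, nullptr, 0);

    // poses[vr::k_unTrackedDeviceIndex_Hmd] holds the headset transform;
    // render both eyes with cameras derived from it here...

    // Hand the finished eye textures to the compositor.
    vr::Texture_t left  = { leftEyeTexture,  vr::TextureType_DirectX, vr::ColorSpace_Gamma };
    vr::Texture_t right = { rightEyeTexture, vr::TextureType_DirectX, vr::ColorSpace_Gamma };
    vr::VRCompositor()->Submit(vr::Eye_Left,  &left);
    vr::VRCompositor()->Submit(vr::Eye_Right, &right);
}
```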
* Controls (with your head) - I guess that is really bound to whatever SDK the VR kit brings with it, right? Or is this also standardized, straightforward stuff?
This is pretty easy to work with. Both the Oculus SDK and OpenVR will give you a "pose" that represents the current position and orientation of the headset, which is calculated using a combination of sensors in the headset and external tracking cameras. One thing you have to watch out for is keeping your coordinate spaces straight, since you will have to transform from "real world" coordinate space (usually relative to the user's initial pose when they started the game) to your game's world coordinate space.

The other wrinkle has to do with the way the headsets compensate for latency. Usually the basic flow of a VR app will go query pose -> render the world using a camera locked to the headset pose -> present the final rendered images to the compositor -> image shows up on the headset screen. The problem is that there can be tens of milliseconds between the first and last step, which can cause a noticeable "laggy" feeling for the user. To compensate for this, all of the current VR headsets have their compositor estimate the pose at the time of display (using the current angular velocity and acceleration), and apply a warping function to your image that essentially rotates the pixels so that they appear to have less latency. What this means for you as a developer is that you want to minimize the time between querying the pose and presenting, since that means the compositor won't have to warp your image as much. So ideally you want to grab the pose right before you issue your rendering commands.
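With the Oculus SDK, that "grab the pose as late as possible" step looks roughly like this (LibOVR 1.x; the HmdToEyeOffset field has been renamed across SDK versions, so again this is illustrative only):

```cpp
#include <OVR_CAPI.h>
#include <Extras/OVR_CAPI_Util.h> // for ovr_CalcEyePoses

// Sample the head pose as late as possible, predicted forward to the time
// this frame will actually light up the display.
void QueryEyePoses(ovrSession session, long long frameIndex, ovrPosef outEyePoses[2])
{
    double displayTime = ovr_GetPredictedDisplayTime(session, frameIndex);
    ovrTrackingState state = ovr_GetTrackingState(session, displayTime, ovrTrue);

    // Offset the head pose out to each eye using the per-eye (IPD) offsets.
    ovrHmdDesc hmd = ovr_GetHmdDesc(session);
    ovrVector3f hmdToEyeOffset[2] = {
        ovr_GetRenderDesc(session, ovrEye_Left,  hmd.DefaultEyeFov[0]).HmdToEyeOffset,
        ovr_GetRenderDesc(session, ovrEye_Right, hmd.DefaultEyeFov[1]).HmdToEyeOffset,
    };
    ovr_CalcEyePoses(state.HeadPose.ThePose, hmdToEyeOffset, outEyePoses);

    // Note: these poses are in tracking space (origin at the user's calibrated
    // starting pose), so transform them into your game's world space before
    // building view matrices.
}
```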
* For artists, is there anything that needs to be changed or tweaked in their workflow (creating 3D props, textures, normalMaps, ...)?
I already mentioned normal maps, which may not be as effective at fooling users in VR. In general you also want to try to avoid noisy textures with lots of high-frequency detail. Seeing lots of small details move across the screen tends to make users uncomfortable, particularly if it's in the periphery. So you want to stick to flatter, less-noisy textures if you can. As for props, if you can get some props with physics on them that the user can manipulate, that's always fun. We call them "toys" in our game, and put as many in the world as we can.
The other big concern is UI. Standard screen-space 2D UI doesn't work very well for VR, and you can't just map it to the headset's screen. It doesn't feel nice, and you only have a very small area of the screen where text is readable. So at the very least you need to put the UI on a plane that the user can look at by turning their head, but ideally you want to find a better way to integrate it into your game world. We ended up writing new UI tech from scratch for our game that lets us efficiently populate the world with UI.
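Our UI tech is more involved than that, but the basic "panel the user can look at" placement is just a bit of trig. This sketch is not our implementation; it assumes a right-handed, Y-up, -Z-forward coordinate convention and made-up numbers:

```cpp
#include <cmath>

// Where to put a world-space UI panel: a couple of meters ahead of the
// player's head, at head height, rotated to face them. Yaw-only (no
// pitch/roll) so the text always stays upright.
struct UIPanelPlacement { float x, y, z, yaw; };

UIPanelPlacement PlaceUIPanel(float headX, float headY, float headZ, float headYaw)
{
    const float kDistance = 2.0f; // meters; tune for comfortable reading
    UIPanelPlacement p;
    p.x   = headX - std::sin(headYaw) * kDistance;
    p.y   = headY;                                  // keep at head height
    p.z   = headZ - std::cos(headYaw) * kDistance;  // -Z is forward (assumption)
    p.yaw = headYaw + 3.14159265f;                  // turn the quad back toward the player
    return p;
}
```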
* For design / game-rules, would you need to alter things like motion-speed, the size of your rooms? Or can it be mapped pretty much one-to-one from a non-VR setting?
Possibly, depending on your game. You definitely want to try to avoid any situations where the user travels very quickly through smaller areas, since that can give them motion sickness. The fast-moving pixels in the periphery can also cause discomfort. Otherwise, as long as your levels work for normal human scale they should be fine.
* Audio - anything that needs adjustments here, aside from being 3D/stereo as good as possible?
Positional 3D audio is nice if you can do it. Oculus has plug-ins for the big engines and for popular audio middleware (such as Wwise) that will do it for you.
* Performance. Being close to your eyes, would you need larger resolutions? Anything else that slows down performance?
The SDKs will let you query for the ideal resolution. This resolution is larger than the actual display resolution, since it's picked so that there's close to 1:1 pixel density in the center of the image after applying the fisheye warping. As I said earlier, the Rift will request a resolution that's roughly twice the size of 1920x1080, so you're dealing with a lot of pixels in not a lot of frame time.
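In OpenVR that query looks like this (the Oculus equivalent is ovr_GetFovTextureSize, shown in the swap chain sketch earlier):

```cpp
#include <openvr.h>
#include <cstdint>
#include <cstdio>

// Ask the runtime for the per-eye render target size that gives roughly
// 1:1 pixel density in the center of view after distortion.
void PrintRecommendedSize(vr::IVRSystem* hmd)
{
    uint32_t width = 0, height = 0;
    hmd->GetRecommendedRenderTargetSize(&width, &height);
    std::printf("per-eye: %ux%u (double the width for a combined target)\n", width, height);
}
```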
One of the popular techniques for reducing pixel load is what Nvidia refers to as multi-res shading. This technique lets you use a higher resolution towards the center of the screen and a lower resolution towards the edges, which better matches the pixel distribution after fisheye distortion. Nvidia advocates doing it with their proprietary hardware extensions, but you can also do it in a way that works on any GPU: see slide 21 of this presentation and slide 29 of this presentation.
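To get a feel for the potential savings, here's the arithmetic for a 3x3 grid where the center cell keeps full resolution and the outer ring renders at a reduced scale. The buffer size, split point, and scale factor are made-up example values, not tuned numbers:

```cpp
#include <cstdio>

int main()
{
    const double fullW = 2664.0, fullH = 1586.0; // Rift-class combined eye buffer
    const double centerFrac = 0.6; // center 60% of each axis stays full res
    const double edgeScale  = 0.5; // outer ring rendered at half resolution

    // Per-axis fractions of the screen covered by each row/column of cells,
    // and the resolution scale applied to each.
    const double frac[3]  = { (1.0 - centerFrac) * 0.5, centerFrac, (1.0 - centerFrac) * 0.5 };
    const double scale[3] = { edgeScale, 1.0, edgeScale };

    double shadedPixels = 0.0;
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            shadedPixels += (fullW * frac[col] * scale[col]) * (fullH * frac[row] * scale[row]);

    const double fullPixels = fullW * fullH;
    std::printf("full: %.1fM pixels, multi-res: %.1fM (%.0f%%)\n",
                fullPixels / 1e6, shadedPixels / 1e6, 100.0 * shadedPixels / fullPixels);
    return 0; // prints: full: 4.2M pixels, multi-res: 2.7M (64%)
}
```

With these particular numbers you shade about 64% of the original pixels; the real win depends entirely on how aggressively you shrink the edges.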
Another performance concern comes from having to draw everything twice for stereo rendering. Usually the simplest way to get VR working is to wrap your render function in a for loop that iterates once per eye, but this is very inefficient. At the very least you should pull out things that aren't view-dependent, like rendering shadow maps. Ideally you want to set things up so that you don't have the loop at all, since making two passes through your render loop means doubling all of the sync points on the GPU. It's faster to do "draw meshes to both eyes for render target A -> sync -> draw meshes to both eyes for render target B" than it is to do "draw meshes for left eye to render target A -> sync -> draw meshes for left eye to render target B -> sync -> draw meshes for right eye to render target A -> sync -> draw meshes for right eye to render target B". You can also potentially cut your draw calls in half by doing both eyes simultaneously. We use instancing and clip planes to do this, but you can also use Nvidia's viewport multicast extension to do it more efficiently.
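Since you're on OpenGL 4.5, here's a sketch of the instancing + clip plane trick. This is the core idea rather than our exact implementation, and all of the names are illustrative:

```cpp
// Include whichever GL loader you use (glad, GLEW, etc.).
#include <glad/glad.h>

// Draw both eyes in one pass: double the instance count, and let the vertex
// shader route even instances to the left half of the double-wide target and
// odd instances to the right half. Program/VAO/uniform setup is elided.
void DrawStereoInstanced(GLsizei indexCount, GLsizei instanceCount,
                         int rtWidth, int rtHeight)
{
    glEnable(GL_CLIP_DISTANCE0);         // enables the seam clip plane below
    glViewport(0, 0, rtWidth, rtHeight); // viewport covers the full 2x-wide target
    glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT,
                            nullptr, instanceCount * 2);
}

// Vertex shader core of the trick. Real per-instance data would be indexed
// with (gl_InstanceID >> 1), since every logical instance now appears twice.
const char* kStereoVS = R"(
    #version 450
    layout(location = 0) in vec3 inPosition;
    uniform mat4 viewProj[2]; // per-eye view-projection matrices

    void main()
    {
        int   eye     = gl_InstanceID & 1;       // 0 = left eye, 1 = right eye
        float eyeSign = (eye == 0) ? -1.0 : 1.0;

        vec4 pos = viewProj[eye] * vec4(inPosition, 1.0);

        // Squeeze the projection into this eye's half of the render target.
        pos.x = 0.5 * pos.x + 0.5 * eyeSign * pos.w;

        // Clip anything that would spill across the center seam into the
        // other eye's half.
        gl_ClipDistance[0] = eyeSign * pos.x;
        gl_Position = pos;
    }
)";
```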
* Your personal experience - easy, hard? Maybe easy to set up, but hard to make it work *well*?
Definitely not easy to make a good experience. It wasn't very hard just to get our engine up and running on VR (we had one programmer spend two weeks or so doing this), but it's taken a lot of effort to make our engine more efficient at rendering VR. On top of that there was tons of time spent designing and iterating to figure out what would make for a compelling VR experience.
By the way, Oculus has a "Best Practices" doc that you should read through. It should give you a better idea of what's involved in making a VR app, although some of it is a little bit outdated (for instance, apps no longer have the option of doing their own distortion; the compositor always does it for you).