To expand on this: the cost of traditional ray-tracing scales linearly with screen resolution by definition (double the render width and you double the work, since each pixel casts at least one ray), while its relationship to scene complexity depends on the underlying acceleration structure used to make intersection tests faster. Existing structures (octrees, kd-trees, BVHs) have brought this cost down to O(log(n)) in the number of primitives in the scene, and that's the best you can do in the general case. Though you also have to take into account how expensive it is to *create* the acceleration structure, which is usually a non-issue for static scenes provided build time is reasonable, but is a real problem when you have dynamic objects.
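To make the O(log(n)) claim concrete, here's a toy BVH sketch of my own (2D, median-split build, slab-test traversal; none of these names come from any real engine). Each traversal step that misses a node's bounding box prunes that whole subtree, which is where the logarithmic behavior comes from on a balanced tree:

```python
# Toy BVH sketch (illustrative, not production code).
from dataclasses import dataclass

@dataclass
class Node:
    lo: tuple                     # AABB min corner of this node
    hi: tuple                     # AABB max corner
    left: 'Node' = None
    right: 'Node' = None
    leaf_ids: list = None         # primitive indices, only at leaves

def build(boxes, ids=None, leaf_size=2):
    """Median-split build over a list of ((xmin,ymin),(xmax,ymax)) boxes."""
    ids = list(range(len(boxes))) if ids is None else ids
    lo = (min(boxes[i][0][0] for i in ids), min(boxes[i][0][1] for i in ids))
    hi = (max(boxes[i][1][0] for i in ids), max(boxes[i][1][1] for i in ids))
    if len(ids) <= leaf_size:
        return Node(lo, hi, leaf_ids=ids)
    axis = 0 if hi[0] - lo[0] >= hi[1] - lo[1] else 1   # split longest axis
    ids.sort(key=lambda i: boxes[i][0][axis] + boxes[i][1][axis])
    mid = len(ids) // 2
    return Node(lo, hi, build(boxes, ids[:mid]), build(boxes, ids[mid:]))

def hit_aabb(origin, inv_dir, lo, hi):
    """Standard slab test: does the ray cross this box for some t >= 0?"""
    tmin, tmax = 0.0, float('inf')
    for a in range(2):
        t1 = (lo[a] - origin[a]) * inv_dir[a]
        t2 = (hi[a] - origin[a]) * inv_dir[a]
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax

def traverse(node, origin, inv_dir, out):
    if not hit_aabb(origin, inv_dir, node.lo, node.hi):
        return                      # prune the whole subtree: the log(n) win
    if node.leaf_ids is not None:
        out.extend(node.leaf_ids)   # candidates for exact primitive tests
        return
    traverse(node.left, origin, inv_dir, out)
    traverse(node.right, origin, inv_dir, out)

# Eight boxes in a row along x; a ray shot along +x at y = 0.25 crosses all
# of them, while a ray at y = 2.0 gets rejected at the root in one test.
boxes = [((float(i), 0.0), (i + 0.5, 0.5)) for i in range(8)]
root = build(boxes)
candidates = []
traverse(root, (-1.0, 0.25), (1.0, 1e9), candidates)
```

(The inverse direction is passed in precomputed, with a huge finite value standing in for 1/0 on the degenerate axis; real implementations handle that more carefully.)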
The key to fast ray-tracing is using a very fast method for determining the "which piece of geometry does this ray collide with" question.
The above quote is assuming that your ray-collision algorithm is not at all affected by the number of objects in the scene or visible to the camera (that is to say, collision has O(1) complexity with respect to scene size). This is usually not the case, although the snake-oil salesmen at "Unlimited Detail" do make such a claim.
Compare this to rasterization, which is O(1) with respect to screen resolution, because rasterization is done in normalized coordinates and can be interpolated while preserving correctness (the reason higher resolutions are more expensive is not really rasterization but pixel shading, i.e. drawing all the pixels), and O(n) in scene complexity, as each triangle fed to the pipeline has to be transformed and processed somehow before it can be culled for falling off-screen, which isn't great. But on the plus side, you don't need any kind of preprocessing for rasterization: you just feed in all the triangles, in any order, and it just works.
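A toy version of that O(n) point (my own sketch, not any real pipeline): every submitted triangle costs at least one transform-and-cull pass, even the ones that end up entirely off-screen and contribute nothing to the image:

```python
# Toy illustration: the per-triangle cost of a rasterization pass is paid
# for every triangle submitted, culled or not, so work is O(n) in triangle
# count regardless of what is actually visible.
def rasterize_pass(triangles):
    processed = 0
    drawn = []
    for tri in triangles:                 # O(n): touch every triangle once
        processed += 1
        # Clip-space bounds test: keep only triangles overlapping [-1, 1]^2
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        if max(xs) < -1 or min(xs) > 1 or max(ys) < -1 or min(ys) > 1:
            continue                      # culled, but we still paid for it
        drawn.append(tri)
    return processed, drawn

tris = [
    [(-0.5, -0.5), (0.5, -0.5), (0.0, 0.5)],   # on-screen
    [(5.0, 5.0), (6.0, 5.0), (5.5, 6.0)],      # entirely off-screen
]
processed, drawn = rasterize_pass(tris)
```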
Using your heightfield approach, the same complexity issue arises: the more accurate the heightfield, the more ray-marching steps you'll need to actually reach that accuracy - and that's roughly a log(n) relationship if you use proper bisection to converge on the intersection point (though I suspect your method is not foolproof and will generate artifacts when the heightfield is not well-behaved).
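Here's a minimal sketch of what I mean, under my own assumptions about the setup (a 2D slice, `h(x)` giving terrain height, the ray parameterized by `t`). Note the caveat baked in: the fixed-size march can step clean over a thin spike between two samples, and bisection can only refine a crossing it has already bracketed - hence the artifacts on badly-behaved heightfields:

```python
# Sketch of heightfield ray-marching with bisection refinement.
# Assumed setup (mine, not the original poster's exact method):
# h(x) is terrain height, the ray is origin + t * direction in 2D.
def march_heightfield(h, origin, direction, t_max, step=0.1, iters=20):
    """Return t at the first crossing of the surface y = h(x), or None."""
    def above(t):
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        return y > h(x)
    t_prev, t = 0.0, step
    while t <= t_max:                       # coarse fixed-step march;
        if above(t_prev) != above(t):       # can miss features < step wide
            lo, hi = t_prev, t              # crossing bracketed: bisect
            for _ in range(iters):          # log(1/eps) halvings
                mid = 0.5 * (lo + hi)
                if above(lo) == above(mid):
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        t_prev, t = t, t + step
    return None

# Flat ground at y = 0, ray from (0, 1) heading down-right: hits at t = 1.
t_hit = march_heightfield(lambda x: 0.0, (0.0, 1.0), (1.0, -1.0), 2.0)
```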
Also, with regard to "Unlimited Detail" claims, let me just make it clear right here: it is impossible (unless you feel like contradicting information theory) to render a general scene, be it voxels, triangles, or atoms, in O(1) time with respect to the number of voxels, triangles, or atoms. The absolute best you can do is O(log(n)), and it can be proven. Even if UD can pull it off with some very specific scene (perhaps using instancing, as they seem to love doing), it cannot be done in the general case.
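For the curious, here's a back-of-envelope version of that lower-bound argument (my paraphrase, not a formal proof):

```latex
% A hit query has $n + 1$ possible outcomes ($n$ primitives, or a miss).
% Each step of a traversal with bounded branching factor $k$ distinguishes
% at most $k$ cases, so after $d$ steps it can separate at most $k^d$
% outcomes.  To answer every query correctly we therefore need
\[
  k^d \ \ge\ n + 1
  \quad\Longrightarrow\quad
  d \ \ge\ \log_k(n + 1) \ =\ \Omega(\log n).
\]
```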
You can look it up on YouTube - you'll see videos of people rendering so-called "billions" of triangles in realtime, and then you realize it's just the same model instanced dozens of times in every dimension. Try that with different models, and surprise! It doesn't work anymore.