There are a few spots where APIs have built-in features that don't need to be built in, since you can get the same result using other API features. One example is mipmap generation -- many APIs offer a single function call for this, but you can also do it yourself by rendering to each mip level. Usually this kind of thing belongs in a utility library rather than the core API.
As for this particular case, it's pretty much just a utility function. I guess it's in there in case one of the GPU vendors is able to clear memory in some way other than writing to it with a compute shader (CS)... In my experience so far though, the generic way to clear a block of memory is to write to it using a CS, so this is probably what the driver is doing under the hood.
As for your timing difference -- 50µs vs 80µs -- it's pretty much the same. I don't really trust GPU timing measurements that are less than around a dozen microseconds :wink:
If that difference remains when scaling up -- e.g. when clearing a larger block of memory, one method takes 5ms whereas the other takes 8ms -- then I would certainly believe there is a performance difference of 8/5=1.6x... but it's also possible that the difference is a fixed overhead of 20µs, in which case the large-scale test would come out as 5ms vs 5.02ms (a 1.004x difference).
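To make that reasoning concrete, here is a minimal sketch that fits time = overhead + size * per_unit_cost through two measurements of the same method. The function name, the sizes (1 and 100 units) and the 50/70/80µs small-scale samples are made up for illustration; only the 5ms, 8ms, 5.02ms and 20µs figures come from the paragraph above. It's a two-point fit, so it assumes the timings are clean, which real GPU timings often aren't.

```python
def fit_overhead_and_rate(size_a, time_a, size_b, time_b):
    """Fit time = overhead + size * per_unit through two (size, time) samples."""
    per_unit = (time_b - time_a) / (size_b - size_a)
    overhead = time_a - per_unit * size_a
    return overhead, per_unit


# Hypothetical samples: (clear size in arbitrary units, time in microseconds).
baseline = fit_overhead_and_rate(1, 50.0, 100, 5000.0)
fixed_cost_case = fit_overhead_and_rate(1, 70.0, 100, 5020.0)
scaling_case = fit_overhead_and_rate(1, 80.0, 100, 8000.0)

print(baseline)         # (0.0, 50.0)
print(fixed_cost_case)  # (20.0, 50.0): same rate, 20 us of fixed overhead
print(scaling_case)     # (0.0, 80.0): 1.6x the per-unit cost
```

If the second method only shows up in the overhead term, the gap disappears at scale; if it shows up in the per-unit term, it's a real 1.6x.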
Benchmarking this stuff is also hard because the actual commands that the GPU has to execute include the dispatch/compute-shader execution, but then also include cache flushing, cache invalidation, and pipeline stalling. The cost of these extra operations can depend heavily on what kind of dispatch/draw command follows your shader.
Back to mipmap generation -- in theory that should be part of the API so that each vendor can implement it in the fastest way possible for their hardware... but in my experience it's also just there as a helper/utility function, and it's possible to implement your own versions of it that are faster than the driver's.
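For reference, this is roughly what "doing it yourself" computes, sketched on the CPU in plain Python rather than as actual render passes. It assumes a single-channel image stored as a list of rows and a simple 2x2 box filter; the function names are made up, and a real driver or a hand-written GPU version may well use a different filter.

```python
def downsample_2x2(src, src_w, src_h):
    """Box-filter one mip level into the next (the work of one render pass)."""
    dst_w, dst_h = max(1, src_w // 2), max(1, src_h // 2)
    dst = []
    for y in range(dst_h):
        row = []
        for x in range(dst_w):
            # Clamp so levels that are only 1 texel wide or tall still work.
            x0, y0 = min(2 * x, src_w - 1), min(2 * y, src_h - 1)
            x1, y1 = min(2 * x + 1, src_w - 1), min(2 * y + 1, src_h - 1)
            total = src[y0][x0] + src[y0][x1] + src[y1][x0] + src[y1][x1]
            row.append(total / 4.0)
        dst.append(row)
    return dst


def build_mip_chain(level0):
    """Build every level from the one above it, down to 1x1."""
    chain = [level0]
    h, w = len(level0), len(level0[0])
    while w > 1 or h > 1:
        chain.append(downsample_2x2(chain[-1], w, h))
        w, h = max(1, w // 2), max(1, h // 2)
    return chain


image = [
    [0.0, 4.0, 8.0, 8.0],
    [4.0, 8.0, 8.0, 8.0],
    [1.0, 1.0, 2.0, 2.0],
    [1.0, 1.0, 2.0, 2.0],
]
for level in build_mip_chain(image):
    print(level)
```

On the GPU each call to `downsample_2x2` would be one draw or dispatch that reads level N and writes level N+1, which is exactly the kind of thing that can live in a utility library.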