Isn't it possible to just write a compute shader that goes over all pixels, sums all of the log luminances, and puts the result in a float? Then we can simply multiply this float by 1/N, either on the CPU or on the GPU.
This can be done using a parallel reduction algorithm, which in the case of a 2D buffer basically amounts to repeated downsampling.
You can store your 2D buffer as a contiguous 1D buffer with width*height elements and apply the parallel reduction algorithm to it.
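
To make the idea concrete, here is a minimal CPU-side sketch in Python that simulates the reduction on a flattened buffer. It is only an illustration, not a shader: the function names, the `eps` offset that guards against `log(0)`, and the assumption that the buffer already holds per-pixel luminance are all mine. On a GPU, each iteration of the inner loop would be one thread, with a barrier between passes.

```python
import math


def parallel_reduce_sum(values):
    """Simulate a tree-style parallel reduction on the CPU.

    Each pass adds pairs of elements that are `stride` apart, so the
    number of live partial sums halves every pass. On a GPU the inner
    loop would run as concurrent threads with a barrier after each pass.
    """
    buf = list(values)
    n = len(buf)
    if n == 0:
        return 0.0
    stride = 1
    while stride < n:
        # One "pass": every i here is independent of the others.
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0]


def average_log_luminance(luminance, width, height, eps=1e-4):
    """Average log luminance of a 2D image stored as a flat 1D list.

    `luminance` is assumed to hold width*height luminance values in
    row-major order; `eps` (hypothetical default) avoids log(0).
    """
    n = width * height
    assert len(luminance) == n, "buffer must hold width*height elements"
    logs = [math.log(eps + lum) for lum in luminance]
    return parallel_reduce_sum(logs) / n


if __name__ == "__main__":
    # A tiny 3x2 "image" of luminance values, flattened row by row.
    image = [0.5, 1.0, 2.0, 0.25, 4.0, 1.0]
    print(average_log_luminance(image, width=3, height=2))
```

The final division by N is a single operation, so it makes little difference whether it happens on the CPU or at the end of the last GPU pass.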