Center of Mass BackBuffer Implementation
category: code [glöplog]
Hi guys! Long time no see.
I am wondering if someone has got around to computing a center of mass from an image using shaders? You know the easy CPU-based code where you iterate over each pixel and, if it meets a certain condition, accumulate the x and y components and normalize by a counter.
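For reference, the CPU loop described above can be sketched like this (a minimal Python sketch; representing the image as a 2D list and using a simple brightness threshold are assumptions, and `center_of_mass` is a hypothetical name):

```python
# CPU baseline: accumulate x, y and a counter over thresholded pixels.
def center_of_mass(image, threshold=0.0):
    sum_x = sum_y = count = 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:  # the "certain condition"
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no pixel passed the threshold
    return (sum_x / count, sum_y / count)
```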
A simple way in a compute shader would be to use the exact same technique as in synchronous code: just use atomicAdd to keep three counters.
Code:
// assumed declarations (not shown in the original post):
//   layout(binding = 0, r32f) readonly uniform image2D image;
//   layout(std430, binding = 1) buffer Counters { uint mySSBO[3]; };
ivec2 coord = ivec2(gl_GlobalInvocationID.xy);  // imageLoad expects ivec2
if (imageLoad(image, coord).x > 0.0) {
    atomicAdd(mySSBO[0], uint(coord.x));  // sum of x coordinates
    atomicAdd(mySSBO[1], uint(coord.y));  // sum of y coordinates
    atomicAdd(mySSBO[2], 1u);             // count of pixels that passed
}
The centroid's xy coordinates will be (mySSBO[0] / mySSBO[2], mySSBO[1] / mySSBO[2]).
@cce problem is I don't have access to compute shaders for compatibility reasons, so the best I can do is use backbuffers in a fragment shader. Thanks by the way!
summing up using one global atomic is easy but inefficient :) you're putting a lot of contention on that atomic.
an easy way to do better is to sum up locally within the threadgroup using groupshared memory, then do one write to the global atomic per threadgroup.
the most efficient way ends up being a parallel descent with no atomics.
(actually the method used in fragment shaders - multiple downsampling passes, each averaging small rects of samples - ends up being quite efficient)
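The local-sum-then-one-global-add scheme described above can be mimicked on the CPU (a Python sketch; the group size and the `centroid_two_level` name are assumptions, and the global adds stand in for global atomics):

```python
def centroid_two_level(pixels, group_size=64):
    """Two-level reduction: each 'threadgroup' sums its pixels locally,
    then contributes a single add to the global accumulators.
    pixels is a list of (x, y) coords that passed the threshold."""
    global_x = global_y = global_n = 0
    for i in range(0, len(pixels), group_size):
        group = pixels[i:i + group_size]
        # local (groupshared) accumulation -- no global contention here
        gx = sum(p[0] for p in group)
        gy = sum(p[1] for p in group)
        # one "global atomic" per threadgroup instead of per pixel
        global_x += gx
        global_y += gy
        global_n += len(group)
    return (global_x / global_n, global_y / global_n)
```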
That's similar to what I started prototyping @smash:
On the first frame seed the fragment's coordinate only when above the threshold.
Read the previous fragment's coordinate from the backbuffer and do a partial sum, keeping a counter when that happens. Divide by the counter. I think I may be getting close, but not quite yet; any pseudo/code would be very appreciated!
A fun technique I used with no shaders in the past: Render to a float texture, ask the hardware to generate mipmaps; that will give you the average of all pixels with an efficient reduction algorithm.
In theory, mip generation could use something other than a box filter, but in practice it doesn't.
That might work! That gets me my partial sum, how do I normalize?
Perhaps it's enough to compute the mips of a float32 texture with three channels: pixel X and Y coordinates, and the actual binarized shape in the third channel. Then you get the average count in the third channel, by which you can divide the coordinates. Sounds too easy to be true, though!
Oh it's not that easy, you need to mask the X and Y channels with the shape in an earlier pass of course.
smash, aren't the final passes quite small in that method? do they fill the gpu?
it feels to me that you can normalize mip gen's output by just multiplying by a constant?
every pass is (a+b+c+d)/4, so it should be possible to calculate a constant. for power-of-two textures it's maybe just width*height?
(also, if you'd use compute, you can minimize amount of atomics by using WaveActiveSum in wave, groupshared atomic and then every group does one global atomic)
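The normalization-by-constant idea above can be checked with a tiny 1-D simulation of the pairwise-averaging passes (Python, illustrative values only; real mip generation averages 2x2 blocks, here pairs for brevity):

```python
# Each pass replaces pairs with their average, like (a+b)/2 per step,
# so the final texel holds sum / texel_count for a power-of-two size.
vals = [3.0, 1.0, 4.0, 1.0]  # pretend these are 4 texels
while len(vals) > 1:
    vals = [(vals[i] + vals[i + 1]) / 2.0 for i in range(0, len(vals), 2)]
# multiplying the surviving average by the texel count recovers the raw sum
recovered_sum = vals[0] * 4
```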
Even in pixel shaders you could atomic add to some global buffer/uav. Just use a bunch of sum positions (like 256) instead of one to relieve contention.
Depends if it's because of hardware/api limitations you can't use "compute" of course :)
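The many-accumulator idea above can be simulated on the CPU (a Python sketch; the bank count, the index-modulo bank selection, and the `centroid_banked` name are assumptions; a real shader might pick the bank by hashing the pixel position):

```python
def centroid_banked(pixels, num_bins=256):
    """Scatter the adds across many accumulator slots (banks) to relieve
    contention, then reduce the banks in one final tiny pass."""
    bins = [[0, 0, 0] for _ in range(num_bins)]  # per-bank (sum_x, sum_y, count)
    for i, (x, y) in enumerate(pixels):
        b = bins[i % num_bins]  # bank selection spreads contention
        b[0] += x
        b[1] += y
        b[2] += 1
    # final reduction over the 256 banks
    sx = sum(b[0] for b in bins)
    sy = sum(b[1] for b in bins)
    n = sum(b[2] for b in bins)
    return (sx / n, sy / n)
```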
I'm intrigued that many seem to be familiar with this technique. What are the use cases for this "screenspace center of mass" algorithm? Culling? LOD?
@loaderror: I don't think everybody here actually used it for that. For me, I believe I dreamt it up as part of some GPGPU algorithm, but I can't recall if I ever actually needed it. I did use it as an interview question a few times, though :-)
+1 for generating mips on an RGB float texture. Set to (r,g,b)=(x,y,1) if the condition is met, and (0,0,0) otherwise. At the end, grab the single averaged (r,g,b) sample, and get x=r/b, y=g/b.
Downside: All that only works properly on square power-of-two textures, or if you can configure the texture to sample (0,0,0) at the borders and mip generation actually honors this.
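As a sanity check, the whole (x, y, 1)-seeding-plus-mip-averaging pipeline can be simulated on the CPU (a Python sketch assuming a square power-of-two binary mask; `centroid_by_downsampling` is a hypothetical name):

```python
def centroid_by_downsampling(img):
    """Repeatedly average 2x2 blocks of (x*m, y*m, m) triples until a
    single texel remains, then divide the averaged coordinates by the
    averaged mask. img is a square power-of-two 2D list of 0/1 values."""
    # seed pass: store (x*mask, y*mask, mask) per texel
    tex = [[(x * m, y * m, m) for x, m in enumerate(row)]
           for y, row in enumerate(img)]
    while len(tex) > 1:
        half = len(tex) // 2
        # one downsampling pass: each output texel is the box-filtered
        # average of a 2x2 block, exactly like hardware mip generation
        tex = [[tuple((tex[2*y][2*x][c] + tex[2*y][2*x+1][c] +
                       tex[2*y+1][2*x][c] + tex[2*y+1][2*x+1][c]) / 4.0
                      for c in range(3))
                for x in range(half)] for y in range(half)]
    sx, sy, sm = tex[0][0]
    return (sx / sm, sy / sm)  # x = r/b, y = g/b from the final sample
```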