trick for 1k raymarchers :: lighting
category: code [glöplog]
Spatial coherence optimizations only work for the first intersection; reflections, ambient light, and other shading stuff do not benefit from them :(
Quote:
But because of the SIMD nature, the straightforward compute shader solution is a very bad idea, as the worst case pixel times will propagate to much larger areas (and the per-pixel work for sphere tracing varies wildly already).
That's absolutely true (was just saying that in theory one could do it that way -- no more no less).
Oswald: Not exactly. You still need to have some idea of where there is coherence before you can start taking advantage of it. Essentially it means you need to find edges, or guess at where the edges are by first sampling at a lower resolution. It all gets very complicated very fast, especially with the crazy warped shapes people like to render these days.
Paulo: For static scenes you can do a lot of preprocessing though. Say, partition your scene into a 3D grid, and for each region generate the set of shapes that affect the distance function within that region. Then for every raymarching step you can eliminate most of the terms of the distance function, no matter where the ray is at or where it's going. For regions that don't contain any shapes, you can precalculate the minimum value of the distance function and if it's above some threshold, don't evaluate distances in that region, just return the safe minimum distance.
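Just to illustrate that idea with a rough GLSL sketch (not anybody's actual code -- the grid textures, shapeDist and all the names here are made-up assumptions): each cell of a precomputed 3D grid stores a bitmask of the shapes that can influence the distance inside it, plus a safe minimum distance for empty cells.
Code:
uniform vec3  gridOrigin;          // world-space corner of the grid
uniform vec3  gridSize;            // world-space extent of the grid
uniform sampler3D  cellMinDist;    // per-cell precomputed safe minimum distance
uniform usampler3D cellShapeMask;  // per-cell bitmask: bit i set => shape i matters here
const int NUM_SHAPES = 16;

float shapeDist( int i, vec3 p );  // distance to shape i, scene dependent

float distanceFunc( vec3 p )
{
    vec3 uvw = (p - gridOrigin) / gridSize;          // map the point into grid coordinates
    uint mask = texture( cellShapeMask, uvw ).r;
    if (mask == 0u)
        return texture( cellMinDist, uvw ).r;        // empty region: return its safe distance
    float d = 1e9;
    for (int i = 0; i < NUM_SHAPES; i++)             // evaluate only the shapes
        if ((mask & (1u << i)) != 0u)                // flagged for this cell
            d = min( d, shapeDist( i, p ) );
    return d;
}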
Yeah, sorry for stirring the shit where I don't quite understand the topic. I would simply start marching at a neighbour's distance minus some X. Why wouldn't that work? Go back even more if I get a negative distance (already passed the surface) or the distance grows instead of shrinking, etc. Well, maybe this is complete rubbish, excuse me again, but to me it looks simple :)
lovely trick iq.
Well, consider this scenario:
You've just traced a pixel that landed you somewhere in Morocco, and now you're about to do the next ray to the left of that. You inherit the distance from the previous ray, which puts you slightly inside the Earth. So you backtrack a little bit, then march forward until you reach the surface. Now you've found the precise intersection between that ray and the Earth. You can keep moving one pixel to the left and each time find the correct intersection with the Earth, but then how do you know when your ray starts intersecting the Moon?
Sure, you can come up with solutions in this particular case, and many other cases. But a general solution is really tricky, especially if you don't like glitches. It's especially hard if you want a solution that's actually faster than just doing the full trace for every pixel. Keep in mind those first steps you're eliminating aren't like 99% of the total, since the whole point with distance-field ray marching is you approach the surface very quickly by taking large steps when you're far away from objects. So for all the extra processing, cache abuse and artifacts you introduce you might still save less than half of the distance function evaluations.
Oswald, for distance field raymarching and CPU rendering, we discussed it here:
http://www.pouet.net/topic.php?which=6675&page=4
For current GPUs it would not be an advantage, because they only get their computational edge when the calculation is very parallel. If it is not -- like in this case -- a GPU can be even slower than a not-very-good CPU. This looks to be changing in future GPUs... but the future of GPUs is offtopic here.
But keep in mind that raymarching is not always distance field raymarching; sometimes you don't get distance information at all.
dewm, yes, in static scenes lots of stuff can be optimized.
When I said spatial coherence I should have said screen-space spatial coherence.
thanks dewm & texel, your explanations made a lot of sense :)
As I already mentioned, for raymarching in distance fields it's easy to make a safe estimate / lower bound on the starting distance. So you shouldn't use a neighbouring pixel, but instead march to find the lower bound for a small group of pixels, and then use this as a starting value for the individual pixel rays. As long as your distance function is correct (i.e. a lower bound) it will always work - if not, you can get blocking artifacts which are worse than the errors introduced otherwise.
So a naive compute shader implementation can do this fairly easily for something like 4x4 pixels per thread, but depending on the scene this naive implementation may even hurt performance compared to pure single-pixel threads. This is due to the SIMD nature, where something like 64 threads (i.e. 1024 pixels) are executing the same instructions / taking the same amount of time.
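A rough sketch of that per-group lower bound, just to illustrate (this is not doomdoom's code; distanceFunc here takes only a position, and tileRadius is assumed to be the distance from the group's central ray to its corner ray on the unit sphere, so that t * tileRadius always covers the footprint of every pixel in the group):
Code:
// march the group's central ray with the field shrunk by the group's footprint;
// the returned t is a safe (conservative) starting distance for every pixel in the group
float groupLowerBound( vec3 p, vec3 d, float tileRadius, float tfar )
{
    float t = 0.0;
    for (int i = 0; i < 32; i++)                          // a few cheap iterations are enough
    {
        float dist = distanceFunc( p + d*t ) - t * tileRadius;
        if (dist < 0.0 || t >= tfar) break;               // stop before anything can be hit
        t += dist;
    }
    return t;                                             // per-pixel rays then start at this t
}
Each per-pixel ray then starts with tnear = groupLowerBound(...) instead of 0; if you ever see blocking artifacts it means the distance function wasn't a true lower bound.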
There is some kind of spatial locality when you raymarch a low-entropy, predictable surface in 3D. Now, how to take advantage of it is maybe not so straightforward.
I've been using packet raymarching for a while, so I'll explain what I did and how it works out. The idea comes from the (in)famous MLRTA (Multi Level Ray Tracing Algorithm), with its magic speed-up numbers that nobody manages to reproduce (heh, those who have read the paper will understand that one :). The outline is simple: what if we raymarch the same scene, from the same camera settings and point of view, at a lower resolution? Isn't that result easy to exploit to get closer to the intersection?
Yes, but there are some problems. Imagine, for example, the case cited above, in which the Moon overlaps the Earth. If you throw a ray at the edge of the Moon at a lower resolution, it might well fall right behind it, and then we lose that intersection when we refine things for each actual screen pixel. The fix (and this is where signed, hopefully Euclidean, distance functions come in very handy) is to add some kind of magic constant to the signed distance function, making every object look more bloated than it actually is; this way the low-resolution rays get blocked instead of diving too deep and missing intersections.
So indeed the outline is as theyeti mentioned: upscale a lower-resolution render stored in a luminance float32 format, holding the farthest znear found for a given "frustum", using nearest filtering -- but we don't raymarch quite the *same* signed distance function... :) That's the whole trick. We need a bloating term that is proportional to the distance to the camera eye (so it works only with perspective projections, but I guess that's ok; it will only accelerate point-light tracing and the primary rays, but that's nice already).
I found that using the distance to the neighbouring ray on the unit sphere around the camera eye works pretty well, scaled by the current ray's z. So it actually means, if you want to picture it that way, that we're tracing a cone through the scene and trying to make it as big as possible... This has been done already; it's called cone tracing. Cone tracing is pretty nice: it gives you a grasp of the "delta" you have to shade, and as a bonus it can help with various aliasing artifacts (what is the colour of the surface inside the sphere of radius R?)...
But just mixing cone tracing and ray marching already boosts performance, because you no longer struggle to find 0.000001 precision for that object at z=1000.0... :) From what I've tested, even without "packet tracing" (= upscaling lower-res renders with nearest filtering), it is already 2-3x faster, depending on the viewpoint, the scene and everything else, of course.
Packet tracing brings huge speed boosts on top of that, as it skips most of the "useless" empty space for large bunches of neighbouring rays... and indeed it gets things 10-20x faster than using nothing at all. For these results I use 16x16 for the top level, then go down to screenResX/4 x screenResY/4, and then I raytrace the actual resolution.
So the code looks like this:
Code:
// deltapp1 is the distance to the neighbouring pixel on the unit sphere from the eye position...
// distanceFunc() is assumed to also return the closest material via its second (out) parameter
vec4 raytraceRecursive( vec3 p, vec3 d, float deltapp1, float tnear, float tfar, float importance )
{
    float dist = 0.0;                      // initialised in case the loop exits immediately
    float t = tnear;
    float epsilon = t * deltapp1;          // radius of the cone at the current depth
    float closestMat = 0.0;
    for (;;)
    {
        if (t >= tfar) break;              // left the scene, no hit
        // bloat the field by epsilon so the cone stops as soon as it touches a surface
        dist = distanceFunc( p + d*t, closestMat ) - epsilon;
        if (dist < epsilon) break;         // close enough: consider it a hit
        t += dist;
        epsilon = t * deltapp1;            // the cone widens as we march further away
    }
    vec3 position = p + d*t;
    return shade( position, d, closestMat, abs( dist ), importance );
}
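And, purely as a hypothetical sketch of the multi-level glue (znearTex, rayDirForPixel, neighbourPixelDist, the hard-coded tfar and the other names here are assumptions, not part of the code above): a coarser pass marches with deltapp1 set to the tile's radius and writes the conservative t it reached into a float32 texture, then the finer pass reads it with nearest filtering and feeds it in as tnear.
Code:
uniform sampler2D znearTex;     // float32 luminance output of the previous, coarser pass
uniform vec2 resolution;        // resolution of the current pass
uniform vec3 cameraPos;

void main()
{
    vec3  d        = rayDirForPixel( gl_FragCoord.xy );       // assumed helper
    float deltapp1 = neighbourPixelDist();                    // assumed helper, per level
    // nearest filtering: take the conservative distance found by the coarser level
    float tnear    = texture2D( znearTex, gl_FragCoord.xy / resolution ).r;
    gl_FragColor   = raytraceRecursive( cameraPos, d, deltapp1, tnear, 1000.0, 1.0 );
}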
So I guess with this you can probably make sislesix realtime now :)
regards,
ps: sorry, the topic was about lighting, but the discussion got interesting :) I think your optimisation only works for Lambertian lighting, but that's already nice for 1k, as you say :) And about the banding, here I see more banding in the 1st image.. idk what I should see though, I didn't get everything about it :p
glad to know my c64 coding instincts weren't that bad after all :)
doomdoom, almost 2 years ago you wrote in this thread:
"And you can improve the accuracy with float d=f(p+0.1*l/c)*c; Converges on the exact result of the dot product method as c -> infinity."
Forgive the noobness of this question but what exactly is c here?
"And you can improve the accuracy with float d=f(p+0.1*l/c)*c; Converges on the exact result of the dot product method as c -> infinity."
Forgive the noobness of this question but what exactly is c here?
c is probably the speed of light :) no, really
Oh man, total bbcode fail *facepalm*
voxelizr, if I'm not mistaken, c is any big constant. Make it bigger for more accuracy. Make it massive to fuck everything up, 'cause you'll get float rounding issues.
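For what it's worth, the reason it converges: with f the distance field, p the hit point, l the light direction and a small step h, f(p + h*l) ~= f(p) + h*dot(n, l), since the gradient of the field at the surface is the normal. With h = 0.1/c and f(p) ~= 0 that gives f(p + 0.1*l/c)*c ~= 0.1*dot(n, l), i.e. a Lambert term scaled by 0.1, without ever computing the normal. Something like this (the rescale and clamp are just my illustration, not part of the quoted trick):
Code:
float f( vec3 p );                       // the scene's distance field

// cheap diffuse term straight from the distance field, no normal needed
float cheapDiffuse( vec3 p, vec3 l )
{
    const float c = 1000.0;              // bigger = closer to the true dot product,
                                         // too big and float rounding ruins it
    return clamp( f( p + 0.1*l/c ) * c / 0.1, 0.0, 1.0 );   // ~= dot(n, l)
}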