Fast software rasteriser in JavaScript?
category: code [glöplog]
I'm not sure if keeping track of the spans is preferable to that branch in JS. Wonder how "smart" especially V8 is about that whole deal.
Anyway yeah it's quite suitable for SIMD, fillers in general are :) And that also exposes the option to turn some branching into (fast) masking instead.
LordGraga: that drawRectangle function isn't even used by the filler!
marcus: less talking more coding!
[ot: more threads like this! Want to see more .js demos!]
anyway, there's no reason to believe that the half-space approach is the fastest one in this context. go ahead and try other stuff too!
i just picked it because a) the original implementation was following nick's original article on the per-block tests and he's Doing It Wrong(tm), b) i wanted to show that the edge functions really do correspond to barycentric coordinates, c) i just like the algorithm.
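To illustrate point b), here is a minimal sketch of how the three half-space edge functions turn into barycentric coordinates once divided by twice the triangle area. The names `edge` and `barycentric` and the `{x, y}` vertex objects are assumptions made for this example; they are not taken from the code discussed in the thread, which works in fixed point.

```javascript
// Edge function: twice the signed area of the triangle (a, b, p).
function edge(ax, ay, bx, by, px, py) {
  return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Returns the barycentric weights [w0, w1, w2] of point p,
// or null when p fails the half-space test (or the triangle is degenerate).
function barycentric(v0, v1, v2, px, py) {
  const area = edge(v0.x, v0.y, v1.x, v1.y, v2.x, v2.y);
  if (area === 0) return null;
  const e0 = edge(v1.x, v1.y, v2.x, v2.y, px, py); // weight of v0, unnormalised
  const e1 = edge(v2.x, v2.y, v0.x, v0.y, px, py); // weight of v1, unnormalised
  const e2 = edge(v0.x, v0.y, v1.x, v1.y, px, py); // weight of v2, unnormalised
  // Half-space test: all three edge values must share the sign of the area.
  const inside = area > 0
    ? (e0 >= 0 && e1 >= 0 && e2 >= 0)
    : (e0 <= 0 && e1 <= 0 && e2 <= 0);
  if (!inside) return null;
  return [e0 / area, e1 / area, e2 / area];
}
```

The three edge values always sum to the area term, so the normalised weights sum to 1.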
I've updated the test with the latest and things are getting pretty impressive :D
http://dl.dropbox.com/u/7508542/three.js/software_renderer/disneys_ryg07.html
It slows down a lot when it's upside down, probably drawing outside the buffer, need to take a look into that...
it's got nothing to do with out-of-bounds accesses (checked that). it seems like some values go out of range and the whole function gets deoptimized as a result, but the debug output from --js-flags="--trace-deopt" is almost completely useless for figuring out why, release builds of chrome have all the disassembler/debug stuff compiled out, and i'm not going to compile my own version of chromium just to figure out what's going on - i'm not THAT bored. :)
Quote:
probably drawing outside the buffer
no, definitely not. take the first frame that runs slowly (which just has a few triangles barely poking outside the screen boundaries). add 16 to all y coordinates at the very start of the function - runs fast. add 16 (<<4) to all y coordinates just after rounding them to fixed point - this one is slow. add 16 to all y coordinates in the "offset" computation - also slow.
it's got something to do with too small values going into tri setup, but beats me what that is.
So a V8 developer got to see the test, and it seems the deoptimisation is a bug in V8 (0 * negative_number). A bug got filed :)
Quote:
i'm not THAT bored. :)
but you definitely have little things to do :)
hey vibrator you mean like this thread
rudi, no, i just like to understand what i'm doing :)
on JAVASKRIPPPT?!? HAHA!
Quote:
First some disclaimer about arithmetic: in general, if you want to ensure that your fixed-point arithmetic is compiled down to int32 arithmetic, you need to hint V8 with truncating operations (|0). JavaScript numbers are doubles and it's not easy for V8 to narrow a double to an integer... Operations like * on int32 can actually produce doubles, not only when they overflow but also in cases like 0 * -1. V8 has to work out whether this -0 is observable or not, which is not always possible.
Now the issues; there are actually two:
1) V8 does not correctly detect that mul-i has actually seen -0 ( http://code.google.com/p/v8/issues/detail?id=2133 ). So it deoptimizes/reoptimizes the function under incorrect assumptions and later disables optimizations for the function entirely.
2) |0 does not help with this because it gets optimized out before we compute -0 checks ( http://code.google.com/p/v8/issues/detail?id=2132 ), so even if you add |0 to multiplications it will not help.
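A small sketch of the language-level behaviour described in the quote: 0 times a negative number is -0, and truncating with |0 turns it back into an ordinary int32 zero. The helpers `toFixed4` and `edgeProduct` are hypothetical names for this example. Note that, per the quote, adding |0 did not avoid the deoptimisation in the V8 build of that time; this only shows what the operators compute.

```javascript
// Multiplying zero by a negative number gives -0, which int32 cannot represent.
const rawProduct = 0 * -1;
const rawIsNegativeZero = Object.is(rawProduct, -0);

// Truncating with |0 converts the result to int32, where -0 becomes +0.
const truncatedProduct = (0 * -1) | 0;
const truncatedIsPositiveZero = Object.is(truncatedProduct, 0);

// Hypothetical helper: convert a coordinate to fixed point with 4 fractional
// bits, truncating so the result is always a plain int32 (never -0).
function toFixed4(value) {
  return Math.round(value * 16) | 0;
}

// Hypothetical helper: an int32-hinted product as used in edge setup.
function edgeProduct(dx, dy) {
  return (dx * dy) | 0;
}
```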
interesting.
though at least the triangle setup stuff should actually be done with doubles (well int64 ideally, but we don't get that with JS). the stuff that's iterated across the tris should be 32-bit int though.
javascript has a weird syntax, no wonder i never really started with that *cough*.
hey look, it now stays smooth in v8! sweet :)
useful thread is useful! \o/
ryg: yeah, it was all the
Code:
| 0
(which I added). I replaced them with
Code:
Math.floor()
as suggested by the V8 dev and it's all smooth now. Now I just need to sit down and implement texturing and so on... :)
When implementing "perspective-correct" texturing you can calculate the 4 corner values of 1/z, u/z, v/z (use fixed-point for those too btw as with the halfspace-values) for the current block, regardless of whether it's fully covered or not, and then do bilinear interpolation in the x- and y-loop. This works quite well (speedwise and visually), depending on the block size and your fixed-point number precision.
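One possible reading of this, sketched with plain floats for clarity (the post recommends fixed point): do the perspective divide only at the four block corners, then interpolate the resulting u and v bilinearly inside the block. The function names and the `{invZ, uOverZ, vOverZ}` corner layout are assumptions for this example. The corners are taken at block offsets 0 and `size`, so the right and bottom corners coincide with the neighbouring blocks' origins.

```javascript
// Perspective divide at one block corner: recover u and v from u/z, v/z, 1/z.
function cornerUV(corner) {
  return { u: corner.uOverZ / corner.invZ, v: corner.vOverZ / corner.invZ };
}

// Returns interleaved [u, v] pairs for every pixel of a size x size block.
function blockUV(topLeft, topRight, bottomLeft, bottomRight, size) {
  const tl = cornerUV(topLeft), tr = cornerUV(topRight);
  const bl = cornerUV(bottomLeft), br = cornerUV(bottomRight);
  const out = new Float32Array(size * size * 2);
  for (let y = 0; y < size; y++) {
    const ty = y / size;
    // Values on the left and right block edges for this row.
    const leftU = tl.u + (bl.u - tl.u) * ty;
    const leftV = tl.v + (bl.v - tl.v) * ty;
    const rightU = tr.u + (br.u - tr.u) * ty;
    const rightV = tr.v + (br.v - tr.v) * ty;
    for (let x = 0; x < size; x++) {
      const tx = x / size;
      const i = (y * size + x) * 2;
      out[i] = leftU + (rightU - leftU) * tx;
      out[i + 1] = leftV + (rightV - leftV) * tx;
    }
  }
  return out;
}
```

In a real inner loop the two lerps would usually be replaced by per-row and per-pixel adds.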
Btw: Can making some variables "const" help the speed a little maybe?
A friendly warning: If you calc 1/z, u/z, v/z for pixels outside the polygon itself (eg. corners of blocks, etc) be aware that z can very well be <=0. I have a memory from 1998 where a game engine coder took three days of staring at debug output until he figured out that was the reason for the occasional crashes every few hours. :)
the thing i wrote (the one you used in disneys_ryg07) writes screen-space barycentric coordinates as r and g (scaled of course). in fact that's the only reason it even looks at the edge equations for the full-tile case in the first place - that all goes away if you actually write the input r, g, b for flat shading.
if you want an affine texture mapper (gouraud, ...) you can use them to interpolate your values: during triangle setup, do
Code:
du21 = u2 - u1
du31 = u3 - u1
and so on for other parameters. then at any pixel that passes the half-space test, interp_u(x,y) = u1 + bary_u*du21 + bary_v*du31. for the whole-tile case, you already know you're gonna need the values for every pixel, so it makes sense to write it as adds. for non-filled tiles, you're usually better off eating the extra math on the pixels that pass, especially if there's lots of interpolated values. (you start running out of registers quick, and having everything in regs is key for the fast tests)
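The setup and per-pixel formula above can be sketched as follows. `setupAttribute`, `interpolate` and `attributeStepX` are hypothetical names; `baryU` and `baryV` stand for the barycentric weights of vertices 2 and 3, as in the formula. The step helper is an assumption about how the "write it as adds" variant for full tiles might look.

```javascript
// Triangle setup for one attribute (u, v, r, g, ...): store base and deltas.
function setupAttribute(a1, a2, a3) {
  return { base: a1, d21: a2 - a1, d31: a3 - a1 };
}

// Per-pixel evaluation: two multiply-adds per interpolated attribute.
function interpolate(attr, baryU, baryV) {
  return attr.base + baryU * attr.d21 + baryV * attr.d31;
}

// For fully covered tiles: the barycentrics change by a constant amount per
// pixel step in x, so the attribute does too and can be stepped with one add.
function attributeStepX(attr, dBaryUdx, dBaryVdx) {
  return dBaryUdx * attr.d21 + dBaryVdx * attr.d31;
}
```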
the same approach works for perspective correct interpolation too. this time you calculate world-space barycentric coordinates (which are just perspective-correct screen-space barycentric coordinates, easy to set up). the rest stays pretty much the same. this simplifies things because you still only calculate a few simple deltas for all attributes, and only need to do the (slightly more complicated) perspective interpolation setup and the perspective divide once - to get the perspective correct bary coordinates.
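A minimal sketch of that perspective step, assuming the standard formula where each screen-space weight is scaled by its vertex's 1/w and the result renormalised. `perspectiveBary` and its parameter names are made up for this example; the output can be fed straight into the same delta-based interpolation used for the affine case.

```javascript
// Convert screen-space barycentrics (baryU, baryV: weights of vertices 2 and 3)
// into perspective-correct ones, given 1/w at each of the three vertices.
function perspectiveBary(baryU, baryV, invW1, invW2, invW3) {
  const bary1 = 1 - baryU - baryV;  // weight of vertex 1
  const n1 = bary1 * invW1;
  const n2 = baryU * invW2;
  const n3 = baryV * invW3;
  const scale = 1 / (n1 + n2 + n3); // the single perspective divide per pixel
  return { u: n2 * scale, v: n3 * scale };
}
```

For example, the screen-space midpoint between a vertex at w = 1 and a vertex at w = 2 lands one third of the way towards the farther vertex.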
this is how modern gpus do interpolation: just pass the barycentric coords to the shader, it can do the two multiply-adds per interpolant itself. :) (of course, if you mix interpolator types, that's still multiple sets of bary coords that need to be fed into the shader).
oh, and of course, the punch line for the perspective correct case: if that's all you want, just work in 2DH clip space (instead of post-projection screen space) the whole time! the code in disneys_ryg07 sets everything up in projected screen space, which gives an affine mapper. if you set up things in homogeneous clip space instead, you end up getting [u/w v/w 1/w] (u, v here being bary coordinates) - exactly what you need. this is just olano's 2dh rasterizer.
but wait, there's more! this whole setup makes it super-easy to do perspective correction only every couple pixels. say you're processing 8x8 tiles. well, just calculate 1/w once per tile (preferably for the middle) - done. presto, perspective correction only once per 8x8 block, with affine in the middle, and it's *really* easy to set up.
and if 8x8 is too ratty, well, just calculate 4 values per 8x8 block, and you get correction every 4x4 pixels, which is totally fine for a sw rasterizer.
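A rough sketch of the once-per-tile variant, assuming u/w, v/w and 1/w are each available as a screen-space plane `a*x + b*y + c` from the 2DH setup. The plane objects, `evalPlane` and `tileBary` are hypothetical names for this example; the 4x4 variant would simply call the same function on four sub-blocks.

```javascript
// Evaluate a screen-space plane value = a*x + b*y + c.
function evalPlane(plane, x, y) {
  return plane.a * x + plane.b * y + plane.c;
}

// Returns interleaved perspective-corrected [u, v] barycentrics for one tile,
// using a single divide taken at the tile centre.
function tileBary(uOverW, vOverW, invW, tileX, tileY, tileSize) {
  const half = tileSize / 2;
  const w = 1 / evalPlane(invW, tileX + half, tileY + half); // one divide per tile
  const out = new Float32Array(tileSize * tileSize * 2);
  for (let y = 0; y < tileSize; y++) {
    for (let x = 0; x < tileSize; x++) {
      const i = (y * tileSize + x) * 2;
      // Affine inside the tile: only multiplies by the tile's w.
      out[i] = evalPlane(uOverW, tileX + x, tileY + y) * w;
      out[i + 1] = evalPlane(vOverW, tileX + x, tileY + y) * w;
    }
  }
  return out;
}
```

A real inner loop would step the planes with adds instead of re-evaluating them per pixel.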