multi-cores and soft rendering
category: general [glöplog]
Have any of you played a bit with software rendering in different threads to test multi-core processors? What about results? I suppose it could be a really good improvement for raytracing, for example...
hmm.. i don't like the idea that it'll be sluggish as fuck on anything else..
raytracing is pretty much exactly N times faster on a N-core machine, yes.
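A minimal sketch of why this scales so well, assuming one primary ray per pixel and no shared mutable state: every pixel is independent, so the image can be cut into N bands of rows and each band handed to its own worker. The scene (`trace_pixel`, a single sphere) and the image size are made up for illustration. Note that CPython threads share one interpreter lock, so this shows the structure only; for a real speedup in Python you would swap in `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # assumed image size, small on purpose


def trace_pixel(x, y):
    """Hypothetical shader: one primary ray against a unit sphere at (0, 0, 3)."""
    # Ray from the origin through the pixel centre on an image plane at z = 1.
    dx = (x + 0.5) / WIDTH * 2.0 - 1.0
    dy = (y + 0.5) / HEIGHT * 2.0 - 1.0
    dz = 1.0
    # Solve |t*d - c|^2 = r^2 for c = (0, 0, 3), r = 1.
    a = dx * dx + dy * dy + dz * dz
    b = -2.0 * 3.0 * dz
    c = 9.0 - 1.0
    disc = b * b - 4.0 * a * c
    return 255 if disc >= 0.0 else 0


def render_rows(y0, y1):
    """Render rows y0..y1-1; touches nothing outside its own band."""
    return [[trace_pixel(x, y) for x in range(WIDTH)] for y in range(y0, y1)]


def render_parallel(workers):
    """Split the image into `workers` bands of rows and render them concurrently."""
    bounds = [HEIGHT * i // workers for i in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bands = pool.map(render_rows, bounds[:-1], bounds[1:])
        return [row for band in bands for row in band]


image = render_parallel(4)
```

The result is the same for any worker count, which is the property that makes the near-linear scaling possible in the first place.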
As long as you don't use textures that don't fit in cache, I suppose... Have any of you tested results? I'm really interested in this... it looks like we will soon have quad-cores and more, so maybe soon we will be learning to optimize for this coreshit.
'nature still fucking sucks'? ;)
parallel process me beautiful
polygon softrender is easy to make in parallel:
1. create 2*N worker threads with a common queue
2. on intensive parts, send big-enough chunks of work to each thread, for example:
2.1 one object per thread for vertex processing
2.2 subdivide the screen into 16*N rectangular chunks
2.3 assign each triangle to the one or more chunks it overlaps
2.4 hand a chunk to each thread until no chunks are left; polysort, clip and paint each chunk in parallel
alternatively:
Assign a part of the render pipe to each thread, accounting for the fact that pixel processing will need more than one thread to keep pace with the vertex and object parts.
BTW, anyone here got hold of a Sun T1000 (Niagara)? true, floating point suxx on niagara, but with 64-bit ALUs you don't really need any of that FPU stuff for precision...
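The chunk scheme from steps 1 to 2.4 could look roughly like this. It is a sketch under assumptions, not the poster's code: fixed 16-pixel tiles stand in for the 16*N chunks, triangles are binned by bounding box, submission order stands in for the polysort, and all names (`bin_triangles`, `paint_tile`, `render`) are invented here. Vertex processing and clipping are left out.

```python
import queue
import threading

WIDTH, HEIGHT, TILE = 64, 64, 16  # TILE is an assumed chunk edge in pixels


def edge(a, b, px, py):
    """Signed area test: which side of edge a->b the point (px, py) lies on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def covers(tri, px, py):
    a, b, c = tri
    w0, w1, w2 = edge(a, b, px, py), edge(b, c, px, py), edge(c, a, px, py)
    # Accept either winding order.
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def bin_triangles(tris):
    """Step 2.3: list, per tile, the triangles whose bounding box overlaps it."""
    bins = {}
    for idx, tri in enumerate(tris):
        xs, ys = [p[0] for p in tri], [p[1] for p in tri]
        tx0, tx1 = max(min(xs), 0) // TILE, min(max(xs), WIDTH - 1) // TILE
        ty0, ty1 = max(min(ys), 0) // TILE, min(max(ys), HEIGHT - 1) // TILE
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins


def paint_tile(fb, tris, tile, ids):
    """Step 2.4: paint one tile. Tiles never overlap, so fb needs no lock."""
    tx, ty = tile
    for y in range(ty * TILE, min((ty + 1) * TILE, HEIGHT)):
        for x in range(tx * TILE, min((tx + 1) * TILE, WIDTH)):
            for i in ids:  # submission order stands in for the polysort
                if covers(tris[i], x, y):
                    fb[y][x] = i + 1  # triangle index + 1 as a stand-in colour


def render(tris, n_cores=2):
    fb = [[0] * WIDTH for _ in range(HEIGHT)]
    work = queue.Queue()  # the common queue from step 1
    for item in bin_triangles(tris).items():
        work.put(item)

    def worker():
        while True:
            try:
                tile, ids = work.get_nowait()
            except queue.Empty:
                return  # no tiles left
            paint_tile(fb, tris, tile, ids)

    threads = [threading.Thread(target=worker) for _ in range(2 * n_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fb


tris = [((2, 2), (40, 4), (10, 50)), ((30, 30), (60, 35), (40, 60))]
fb = render(tris)
```

Because each thread pulls the next tile only when it has finished the previous one, cheap and expensive tiles balance out without any up-front scheduling.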
I can make a 50 FPS ray tracing engine with 10,000 linked Commodore 64s :)
show me
No, you can't.
Even if you could turn it into an equivalent 10 GHz, 8-bit machine with a shitty instruction set, you'd still need a nuclear plant to keep it alive. Also, output palette and resolution would suck ass.
hey, the 6502 has a quite cute instruction set.
@ryg: for static scenes: yes. dynamic scenes: no.
knos: especially to handle RTRT maths ;)
"nobody said it should move"
SLI!
anyone made some benchmark programs to see whether a dual/quad core p4 can keep up with a GPU when it comes to signal processing?
at the moment we're demodulating a signal sampled at 12 MHz and hope to get 8 bands demodulated from it. in reality we only get 4 bands so far (P4 @ 3 GHz). we rely on the superscalar pipeline but don't use SSE..
earx: i've implemented several things for benchmarking, on a dual core p4smt and a gf6600. here's what i remember:
400x400 matrix multiplication:
17hz on gpu, 5.5hz on cpu (two multiplications). the gpu was about 1.5 times faster.
256x256 fft:
80hz on gpu, 45hz on cpu (two ffts)
though the fft (n²logn) seemed to suit the gpu better, it didn't satisfy our needs because of the sparse texture access. the 6600 does seem to be pretty sucky when it comes to texture fetching, though. some guys reported something like 340hz at the same dimensions on a quadro fx something in gpu gems 2.
also tried laplace filters, modulators (where the gpu was way faster) and a random number generator (cpu kicked ass as expected :). complex systems generally need lots of modulators, accumulators, interpolators etc., which makes the gpu more advantageous. you don't want to switch domains just to compute the rest (fft, matrix multiplication).
i could also suck at coding, but this would cancel out if i suck at coding on both. and i've used only cpp and cgfx.
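For anyone puzzling over how 17hz against 5.5hz comes out as "about 1.5 times": the numbers line up if "(two multiplications)" means the cpu ran two jobs at once, one per core, each at the quoted rate. That reading is an assumption, not something the post states, and `gpu_speedup` is just a helper name made up for the arithmetic.

```python
def gpu_speedup(gpu_hz, cpu_hz, cpu_parallel_jobs):
    """GPU throughput divided by total CPU throughput.

    Assumes the CPU figure is per job and that `cpu_parallel_jobs`
    jobs ran side by side (one per core).
    """
    return gpu_hz / (cpu_hz * cpu_parallel_jobs)


matmul = gpu_speedup(17.0, 5.5, 2)  # 17 / 11
fft = gpu_speedup(80.0, 45.0, 2)    # 80 / 90

print(f"matrix multiplication: gpu is {matmul:.2f}x the cpu")
print(f"fft: gpu is {fft:.2f}x the cpu")
```

Under the same reading the fft ratio drops below 1, which would fit the remark that the fft results on the 6600 were not satisfying.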
You have to come to the MAIN demoparty to play with the Eblade Center or the 5 Bensley dual-core Xeon Intel machines... We'll try to do a grid computing party if possible.
I believe a modern PC is more than 10000 times as fast as a C64 when it comes to floating point arithmetic. Also the communication overhead will significantly reduce performance.
@stelthz: you don't need communication overhead ;) each c64 has to calc 6.4 pixels (320x200), and truly we want multicolor, so we have 160x200 (3.2 pixels)... you just have to keep them in sync :D
but truly, if we use the KERNEL-FAC (floating point accumulator routines) we just need 10,000,000 c64s since they are so fucking slow...
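The per-machine numbers are just screen area divided by machine count. A quick check, with helper names invented for the purpose:

```python
def pixels_per_machine(width, height, machines):
    """Average number of pixels each machine has to compute per frame."""
    return width * height / machines


def machines_per_pixel(width, height, machines):
    """The same ratio the other way round, for when machines outnumber pixels."""
    return machines / (width * height)


hires = pixels_per_machine(320, 200, 10_000)       # 64000 pixels over 10,000 machines
multicolor = pixels_per_machine(160, 200, 10_000)  # 32000 pixels over 10,000 machines
farm = machines_per_pixel(320, 200, 10_000_000)    # the 10,000,000 machine variant

print(hires, multicolor, farm)
```

With ten million machines the ratio flips: there are more than a hundred C64s per pixel, so several machines would have to cooperate on a single ray.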
floats are not needed for raytracing
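One way to read this is fixed-point integers. Below is a sketch of a ray-sphere hit test using only integer operations in an assumed 16.16 format; the format, the helper names (`fx`, `fmul`, `hit_sphere`) and the normalised ray direction are all choices made for this example. Python integers never overflow, so on real hardware the intermediate products would need a 64-bit register or a multi-word multiply.

```python
import math

FRAC = 16          # assumed 16.16 fixed-point format
ONE = 1 << FRAC


def fx(value):
    """Convert a Python number to 16.16 fixed point (setup only, not per-pixel)."""
    return int(round(value * ONE))


def fmul(a, b):
    """Fixed-point multiply: integer product, then drop the extra fraction bits."""
    return (a * b) >> FRAC


def hit_sphere(origin, direction, center, radius):
    """Nearest hit distance t (16.16) along a normalised ray, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(fmul(d, v) for d, v in zip(direction, oc))       # d . oc
    c = sum(fmul(v, v) for v in oc) - fmul(radius, radius)   # |oc|^2 - r^2
    disc = fmul(b, b) - c
    if disc < 0:
        return None
    # sqrt of a 16.16 value: integer square root of the value scaled by 2^16.
    root = math.isqrt(disc << FRAC)
    t = -b - root
    return t if t >= 0 else None


origin = (fx(0), fx(0), fx(0))
forward = (fx(0), fx(0), fx(1))
t_hit = hit_sphere(origin, forward, (fx(0), fx(0), fx(5)), fx(1))
t_miss = hit_sphere(origin, forward, (fx(3), fx(0), fx(5)), fx(1))
```

The sphere sits five units ahead with radius one, so the first hit is at distance four; moving it three units sideways makes the ray miss.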
texel: well you need more precision than 8 bits. And whichever way you do it, it will be a huge problem on 8 bit..
yes, that's true, stelthz. But maybe 32-bit integers are easier to work with than floats on an 8-bit machine
texel: I doubt it. At least the difference for MUL/DIV is marginal, since most of the time on a 6502 will be spent on multiplying/dividing the mantissa. All the complicated bit fiddling that is required for floats is marginal compared to that. 32 bit floats may even be slightly faster than 32bit integers in some operations.
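To make the mantissa argument concrete, here is a sketch of the shift-and-add multiply a CPU without a multiply instruction (like the 6502) has to fall back on, plus a hand-rolled float multiply built on it. It assumes the IEEE 754 single layout (24-bit significand including the hidden bit), handles normal numbers only, and truncates instead of rounding; real 6502 float routines may use a different format and faster table-based multiplies, so the step counts are illustrative only.

```python
import struct


def shift_add_mul(a, b, bits):
    """Unsigned multiply by shift-and-add; returns (product, loop iterations)."""
    product, steps = 0, 0
    for i in range(bits):
        steps += 1
        if (b >> i) & 1:
            product += a << i
    return product, steps


def float32_mul(x, y):
    """Multiply two normal float32 values by hand (no zero/inf/NaN, truncating)."""
    bx = struct.unpack("<I", struct.pack("<f", x))[0]
    by = struct.unpack("<I", struct.pack("<f", y))[0]
    sign = (bx ^ by) >> 31                              # cheap bit fiddling
    exp = ((bx >> 23) & 0xFF) + ((by >> 23) & 0xFF) - 127
    mx = (bx & 0x7FFFFF) | 0x800000                     # restore the hidden 1
    my = (by & 0x7FFFFF) | 0x800000
    prod, steps = shift_add_mul(mx, my, 24)             # the expensive part
    if prod >> 47:                                      # product in [2, 4)
        prod >>= 1
        exp += 1
    bits = (sign << 31) | (exp << 23) | ((prod >> 23) & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", bits))[0], steps


int_product, int_steps = shift_add_mul(123456, 789, 32)
float_product, float_steps = float32_mul(1.5, 2.5)
```

In this sketch the float multiply loops 24 times against 32 for the full-width integer multiply, and everything around the loop is a handful of shifts, masks and one add.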