multi-cores and soft rendering

category: general [glöplog]

Have any of you played a bit with software rendering in different threads to test multi-core processors? Any results? I suppose it could be a really good improvement for raytracing, for example...

added on the 2006-10-08 14:47:34 by texel

hmm.. i don't like the idea that it'll be sluggish as fuck on anything else..

added on the 2006-10-08 19:02:09 by psenough

raytracing is pretty much exactly N times faster on an N-core machine, yes.

added on the 2006-10-08 19:06:19 by ryg

As long as you don't use textures that don't fit in cache, I suppose... Have any of you tested results? I'm really interested in this... it looks like we will soon have quad-cores and more, so maybe soon we will be learning to optimize for this coreshit.

added on the 2006-10-08 19:12:48 by texel

'nature still fucking sucks'? ;)

added on the 2006-10-08 19:17:01 by post malone

parallel process me beautiful

added on the 2006-10-08 19:18:56 by psenough

polygon softrender is easy to make in parallel:

1. create 2*N worker threads with a common queue
2. on intensive parts, send big-enough chunks of work to each thread, for example:
2.1 one object per thread for vertex processing
2.2 subdivide the screen into 16*N rectangular chunks (slices)
2.3 assign one or more slices to each triangle
2.4 assign a slice to each thread until no slices are left; polysort, clip and paint each slice in parallel
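
A rough sketch of steps 1 and 2.2 to 2.4, not anyone's actual engine. Everything here is an assumption for illustration: the triangle layout (a dict with hypothetical `verts`, `depth` and `colour` keys), horizontal bands as the "rectangular chunks", painter's-algorithm sorting as the "polysort", and no real clipping beyond staying inside the slice. Python threads only show the structure; they will not give a real speedup for CPU-bound loops like this.

```python
import os
import queue
import threading

def make_slices(width, height, count):
    """Step 2.2: cut the screen into horizontal bands (x0, y0, x1, y1)."""
    count = min(count, height)  # never create empty bands
    ys = [height * i // count for i in range(count + 1)]
    return [(0, ys[i], width, ys[i + 1]) for i in range(count)]

def bin_triangles(triangles, slices):
    """Step 2.3: per slice, list the triangles whose bounding box overlaps it."""
    bins = [[] for _ in slices]
    for tri in triangles:
        ys = [v[1] for v in tri["verts"]]
        for i, (_, y0, _, y1) in enumerate(slices):
            if min(ys) < y1 and max(ys) >= y0:
                bins[i].append(tri)
    return bins

def edge(a, b, px, py):
    """Signed area: tells which side of edge a->b the point is on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def paint_slice(frame, rect, tris):
    """Step 2.4: sort far to near, then fill only pixels inside `rect`."""
    x0, y0, x1, y1 = rect
    for tri in sorted(tris, key=lambda t: -t["depth"]):
        a, b, c = tri["verts"]
        for y in range(y0, y1):
            for x in range(x0, x1):
                w0 = edge(a, b, x + 0.5, y + 0.5)
                w1 = edge(b, c, x + 0.5, y + 0.5)
                w2 = edge(c, a, x + 0.5, y + 0.5)
                inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                         (w0 <= 0 and w1 <= 0 and w2 <= 0)
                if inside:
                    frame[y][x] = tri["colour"]

def render(triangles, width, height, cores=None):
    cores = cores or os.cpu_count() or 1
    frame = [[0] * width for _ in range(height)]
    slices = make_slices(width, height, 16 * cores)
    bins = bin_triangles(triangles, slices)

    # Step 1: a common queue shared by 2*N worker threads.
    work = queue.Queue()
    for rect, tris in zip(slices, bins):
        work.put((rect, tris))

    def worker():
        while True:
            try:
                rect, tris = work.get_nowait()
            except queue.Empty:
                return  # no more slices left
            paint_slice(frame, rect, tris)

    threads = [threading.Thread(target=worker) for _ in range(2 * cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return frame
```

Slices never overlap, so the workers can write into the shared frame without locking; having many more slices than threads is what keeps the load balanced when some slices contain far more triangles than others.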

alternatively:

Assign a part of the render pipe to each thread, taking into account that pixel processing will need more than one thread to keep up with the vertex and object part.
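
The pipelined alternative could look roughly like this. It is only a sketch under assumptions: `run_pipeline`, `transform` and `shade` are hypothetical names standing in for the vertex/object stage and the pixel stage, and the 1-to-3 thread split is an arbitrary choice, not a measured ratio.

```python
import queue
import threading

def run_pipeline(objects, transform, shade, pixel_workers=3):
    """One vertex-stage thread feeds several pixel-stage threads via a queue."""
    to_pixels = queue.Queue(maxsize=64)  # bounded, so the vertex stage cannot run far ahead
    results = []
    lock = threading.Lock()
    done = object()  # sentinel telling a pixel worker to stop

    def vertex_stage():
        for obj in objects:
            to_pixels.put(transform(obj))
        for _ in range(pixel_workers):
            to_pixels.put(done)  # one sentinel per pixel worker

    def pixel_stage():
        while True:
            item = to_pixels.get()
            if item is done:
                return
            out = shade(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=vertex_stage)]
    threads += [threading.Thread(target=pixel_stage) for _ in range(pixel_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # completion order, not submission order
```

The number of pixel workers is the knob that balances the stages: if the queue is usually full, the pixel stage is the bottleneck and needs more threads.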

BTW, has anyone here got hold of a sun T1000 niagara? true, floating point suxx on niagara, but with 64 bit ALUs you don't really need any of that FPU stuff for precision...

added on the 2006-10-08 22:17:39 by winden

I can make a 50 FPS ray tracing engine with 10,000 linked commodore 64s :)

added on the 2006-10-09 09:26:48 by Skate

show me

added on the 2006-10-09 09:35:14 by skrebbel

No, you can't.
Even if you could turn it into an equivalent 10 GHz, 8-bit machine with a shitty instruction set, you'd still need a nuclear plant to keep it alive. Also, output palette and resolution would suck ass.

added on the 2006-10-09 12:40:26 by dixan

hey, the 6502 has a quite cute instruction set.

added on the 2006-10-09 12:53:56 by _-_-__

@ryg: for static scenes: yes. dynamic scenes: no.

added on the 2006-10-09 13:35:21 by toxie

knos: especially to handle RTRT maths ;)

added on the 2006-10-09 14:23:10 by dixan

"nobody said it should move"

added on the 2006-10-09 14:29:12 by ryg

SLI!

added on the 2006-10-09 15:03:46 by the_Ye-Ti

anyone made some benchmark programs to see whether a dual/quad core p4 can keep up with a GPU when it comes to signal processing?

at the moment we're doing some demodulation of a 12 MHz sampled signal and hope to get 8 bands demodulated from it. in reality we now just get 4 bands (P4@3 GHz). we're relying on the superscalar execution but not using SSE..
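
To put those numbers in perspective, here is the cycle budget they imply. This is back-of-the-envelope arithmetic only, assuming a single core, every clock cycle available to the demodulator, and every band processed at the full input sample rate; none of that is stated above.

```python
CLOCK_HZ = 3_000_000_000      # P4 @ 3 GHz
SAMPLE_RATE_HZ = 12_000_000   # 12 MHz sampled signal

# Clock cycles that pass between two consecutive input samples.
cycles_per_sample = CLOCK_HZ / SAMPLE_RATE_HZ

def cycles_per_band(bands):
    """Cycles available per sample for each band when `bands` run on one core."""
    return cycles_per_sample / bands

budget_now = cycles_per_band(4)     # what the current 4-band code gets
budget_target = cycles_per_band(8)  # what an 8-band version would have to fit in
```

Under these assumptions there are 250 cycles per sample, so going from 4 to 8 bands means halving the per-band cost from 62.5 to about 31 cycles per sample.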

added on the 2006-10-09 16:45:38 by earx

earx: i've implemented several things for benchmarking, one setup being a dual core p4smt and a gf6600. here's what i remember:

400x400 matrix multiplication:
17hz on gpu, 5.5hz on cpu (two multiplications). gpu was about 1.5 times faster.

256x256 fft:
80hz on gpu, 45hz on cpu (two ffts)

though the fft seemed to suit the gpu better (n²logn), it didn't satisfy our needs because of sparse texture access, and the 6600 seems to be pretty sucky when it comes to texture fetching. some guys reported something like 340hz for the same dimensions on some quadro fx in gpu gems 2.

also tried laplace filters, modulators (where gpu was way faster) and a random number generator (cpu kicked ass as expected :) ). complex systems generally need lots of modulators, accumulators, interpolators etc, which makes the gpu more advantageous. you don't want to switch domains to compute the others (fft, matrix multiplication).

i could also suck at coding but this would cancel out if i suck at coding on both. and i've used only cpp and cgfx.

added on the 2006-10-09 19:31:11 by anesthetic

You have to come to the MAIN demoparty to play with the Eblade Center or the 5 Bensley dual-core Xeon Intel machines... We will try to do some grid computing at the party if possible.

added on the 2006-10-09 22:41:26 by cybernostra

### SPAM DETECTED ###

added on the 2006-10-09 22:52:51 by dixan

I believe a modern PC is more than 10000 times as fast as a C64 when it comes to floating point arithmetic. Also the communication overhead will significantly reduce performance.

added on the 2006-10-09 22:59:37 by Stelthzje

@stelthz: you don't need communication overhead ;) each c64 has to calc 6.4 pixels (320x200) and truly we want multicolor so we have 160x200 (3.2 pixels)... you just have to keep them in sync :D
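
The per-machine numbers work out as follows. This is just the arithmetic from the post plus the 50 FPS figure from earlier in the thread; the function names are made up for illustration.

```python
def pixels_per_machine(width, height, machines):
    """Average number of pixels each machine has to compute per frame."""
    return width * height / machines

def ms_per_pixel(width, height, machines, fps):
    """Time budget per pixel if every machine must finish within one frame."""
    frame_ms = 1000 / fps
    return frame_ms / pixels_per_machine(width, height, machines)

hires = pixels_per_machine(320, 200, 10_000)        # hi-res mode
multicolor = pixels_per_machine(160, 200, 10_000)   # multicolor mode
budget = ms_per_pixel(320, 200, 10_000, fps=50)     # roughly 3 ms per hi-res pixel
```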

but truly if we use the KERNEL-FAC (floating point accumulator routines) we just need 10,000,000 c64s since they are so fucking slow...

added on the 2006-10-09 23:46:08 by Danzig

floats are not needed for raytracing

added on the 2006-10-10 00:20:31 by texel

texel: well you need more precision than 8 bits. And whichever way you do it, it will be a huge problem on 8-bit..

added on the 2006-10-10 00:26:26 by Stelthzje

yes, that's true stelthz. But maybe 32-bit integers are easier to work with than floats on an 8-bit machine

added on the 2006-10-10 00:28:40 by texel

texel: I doubt it. At least the difference for MUL/DIV is marginal, since most of the time on a 6502 will be spent on multiplying/dividing the mantissa. All the complicated bit fiddling that is required for floats is marginal compared to that. 32 bit floats may even be slightly faster than 32bit integers in some operations.
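
A toy model of that argument, under loud assumptions: multiplication is done by plain shift-and-add (one step per multiplier bit, as on a CPU without a multiply instruction), the float is a simplified unsigned format with a 24-bit mantissa and value `mant * 2**exp`, results are truncated rather than rounded, and signs, zero and overflow are ignored. It counts loop steps, not real 6502 cycles.

```python
def shift_add_mul(a, b, bits):
    """Unsigned multiply by shift-and-add: one step per bit of the multiplier."""
    product = 0
    steps = 0
    for i in range(bits):
        if (b >> i) & 1:
            product += a << i
        steps += 1
    return product, steps

def fixed_mul_16_16(a, b):
    """16.16 fixed point: a full 32x32 multiply, then drop 16 fraction bits."""
    product, steps = shift_add_mul(a, b, 32)
    return product >> 16, steps

def float_mul(mant_a, exp_a, mant_b, exp_b):
    """Toy float: multiply 24-bit mantissas, add exponents, renormalise."""
    product, steps = shift_add_mul(mant_a, mant_b, 24)
    exp = exp_a + exp_b
    while product >= 1 << 24:  # shift back down to 24 mantissa bits
        product >>= 1
        exp += 1
    return product, exp, steps

# 3.5 * 2.0 in 16.16 fixed point
fx, fx_steps = fixed_mul_16_16((3 << 16) | 0x8000, 2 << 16)

# 1.5 * 2.0 in the toy float format
fm, fe, fl_steps = float_mul(3 << 22, -23, 1 << 23, -22)
```

In this model the fixed-point multiply needs 32 shift-and-add steps and the float multiply 24, with the exponent add and renormalisation as comparatively cheap extras, which is the shape of the claim above.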

added on the 2006-10-10 07:56:28 by Stelthzje

pouët.net
