Fast Plasma Effect
category: general [glöplog]
Hi !
I have written some bad C# code using SDL.NET. I want to make a fast plasma effect. It's meant for an animated background, so it must be quick :).
Can someone suggest some faster ways to edit a surface in SDL? See the code for yourself.
Code:
public virtual void UpdateEffect()
{
    int w = surface.Width;
    int h = surface.Height;
    if (cls == null)
    {
        cls = new int[w, h];
        for (int x = 0; x < w; x++)
        {
            for (int y = 0; y < h; y++)
            {
                cls[x, y] = (int)(
                    (127.5 + (127.5 * Math.Sin(x / 32.0)))
                    + (127.5 + (127.5 * Math.Sin(y / 32.0)))
                    + (127.5 + (127.5 * Math.Sin(Math.Sqrt((x * x + y * y)) / 32.0)))
                    ) / 3;
            }
        }
    }
    Color[,] colors = new Color[w, h];
    paletteShift = Convert.ToInt32(Environment.TickCount / 1000);
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            colors[x, y] = palette[(cls[x, y] + paletteShift) % 255];
        }
    }
    surface.SetPixels(new Point(0, 0), colors);
}
SDL is slow by design. If you're using C# and .NET, why don't you just go with DirectDraw? It would save you the hassle, or at least most of it... (sure, it's deprecated, but only because everything is accelerated now)
Exploring software rendering can be a nice thing.
Some quick random hints looking at the code:
1) Always try to use look-up tables for computationally heavy functions like sin, cos, tan, sqrt and friends (this should give you a good speed-up...)
2) Integer math / fixed-point math can still improve speed a lot, especially when math precision isn't a must
3) Minor things:
- multiply by (1/x) instead of dividing by x (if x is constant)
- for modulus operations where the divisor is a power of two (x % 2^n; in your code it should be "... % 256") you can use a faster bitwise AND instead: x & ((2^n) - 1), i.e. x % 256 <-equals-> x & (256 - 1) for non-negative x.
- when multiplying/dividing integers by powers of two you can use bit shifting (e.g. x * 32 <--> x << 5). Compilers probably already do these last two kinds of optimizations.
(bitwise operations)
Found also this site, maybe you'll find it useful: http://student.kuleuven.be/~m0216922/CG/index.html.
actually, cls doesn't seem to be time-dependent at all, so just compute it once (you don't even need sine tables for that, woohoo!).
and using an actual paletted image format and actual palette rotation instead of doing a paletted->truecolor conversion by hand in managed code each frame (it should be either %256 or &255, by the way) should make it run blazingly fast without any real work :)
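A rough sketch of what palette rotation means, in plain C and independent of SDL (so how the rotated palette gets handed to the surface is left out here). The function name rotate_palette and the 32-bit colour entries are assumptions for illustration. The point is that the per-frame work touches 256 palette entries instead of w * h pixels, and the 8-bit index image is never rewritten.

```c
#include <stdint.h>
#include <string.h>

#define PALETTE_SIZE 256

/* Rotate the palette left by one entry; the index image is left untouched. */
void rotate_palette(uint32_t palette[PALETTE_SIZE])
{
    uint32_t first = palette[0];
    memmove(&palette[0], &palette[1], (PALETTE_SIZE - 1) * sizeof palette[0]);
    palette[PALETTE_SIZE - 1] = first;
}
```

After one call, a pixel with index i shows the colour that index i + 1 had before, which is the same visual result as adding 1 to paletteShift in the original code.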
ryg: smashing your lcd gives an even cooler plasma-effect in no-time!
IF the surface width is a power of 2 and the total pixel count is a multiple of 64 (each pass of the loop below touches four blocks of 16 ints)
IF your buffers are in user memory
then try this
change your loops from this
buf = new int[w, h]
for(int x = 0; x < w; x++)
{
    for(int y = 0; y < h; y++)
    {
        buf[x, y] = code here
        ...
to something like this
int Log2(int val)
{
    ASSERT(val > 0);
    int res = -1;
    while((1 << ++res) <= val);
    return(res - 1);
}
mask = w - 1;        //w must be a power of 2 remember
bitshift = Log2(w);
pixelcount = w * h;
//initialize buffer like this
buf = new int[pixelcount];
//the inner loop advances j by 16 and the outer loop adds 48, so j moves 64 per pass
for(int j = 0; j < pixelcount; j += 48)
{
    for(int i = 0; i < 16; ++i, ++j)
    {
        //i only counts the 16 entries of a block, j is the buffer index
        //if you need x and y, here is how to compute them
        x = j & mask;
        y = j >> bitshift;
        buf[j] = palette[(cls[x,y] + paletteShift)%255];
        x = (j+16) & mask;
        y = (j+16) >> bitshift;
        buf[j+16] = palette[(cls[x,y] + paletteShift)%255];
        x = (j+32) & mask;
        y = (j+32) >> bitshift;
        buf[j+32] = palette[(cls[x,y] + paletteShift)%255];
        x = (j+48) & mask;
        y = (j+48) >> bitshift;
        buf[j+48] = palette[(cls[x,y] + paletteShift)%255];
    }
}
This memory access method gave me a better speed increase than anything else (in C).
The moral here is: don't access big buffers in user memory sequentially.
Dam, I can't post code correctly
I use the magic number 16 because
16 ints = 16 x 4bytes = 64 bytes = cache line size of my cpu
But I'm not sure this is the reason for the speed increase
For the first program: in SDL you can use the function given in the documentation to edit the surface.
But it's very slow.
I doubt that using sin tables would speed up anything, only opposite. At least, lookup tables slow down shaders.
Allocating memory in a time-critical function called every frame isn't a good idea...
And can't SDL automagically convert to truecolor if you specify an 8-bit surface with a palette? I vaguely remember seeing some functions that did that. They might be optimized already, so that should at least work a bit quicker.
isn't cls allocated only once already? and aren't the sine tables calculated only once too? and couldn't you avoid these shifts and ands by using proper loops, which you can predict if you use multiples of 64 as texture size? sorry, kidding, but it seems that this one can be improved a lot.. think about 8 bit mode and palette cycling, or making the palette double sized (and duplicated) so you can avoid the and (%255) at each pixel.. there are more possibilities, i suppose.. ;) maybe i'm wrong..
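The doubled-palette idea could look roughly like this. It is a sketch in C with made-up names (build_doubled_palette, fill_frame) and it assumes the plasma values are already stored as 8-bit indices. Because index (0..255) plus shift (0..255) is at most 510, a 512-entry table holding the palette twice never needs a % or & in the inner loop.

```c
#include <stdint.h>

#define PALETTE_SIZE 256

/* Build a table holding the palette twice in a row, so that
   index + shift (both in 0..255) never needs wrapping. */
void build_doubled_palette(const uint32_t palette[PALETTE_SIZE],
                           uint32_t doubled[2 * PALETTE_SIZE])
{
    for (int i = 0; i < PALETTE_SIZE; i++) {
        doubled[i] = palette[i];
        doubled[i + PALETTE_SIZE] = palette[i];
    }
}

/* Per-frame fill: wrap the shift once, then no % or & per pixel. */
void fill_frame(uint32_t *dst, const uint8_t *indices, int pixel_count,
                const uint32_t doubled[2 * PALETTE_SIZE], unsigned shift)
{
    const uint32_t *shifted = doubled + (shift & (PALETTE_SIZE - 1));
    for (int i = 0; i < pixel_count; i++)
        dst[i] = shifted[indices[i]];
}
```

The doubled table is built once. Per frame only the shift changes, so the single & happens once per frame instead of once per pixel.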
Quote:
I doubt that using sin tables would speed up anything, only opposite. At least, lookup tables slow down shaders.
Of course they would. I once tried using Math.Sin per pixel instead of lookup tables in my regular plasmas, and it was 3 times slower. Well, that could be different for shaders, I guess.
p.s. Hmm, a plasma with sqrt? I've never tried this equation, I wonder what kind of shape it shows!
but these "new"s hurt, are they really required here? allocating every frame is always (or at least mostly) a bad idea.
sqrt... well, i guess it's fast on pc, lol ;)
the sin(sqrt(x*x+y*y)) shows circular ripples..
it's amazing that a plasma with sdl (or directx) is programmable in 30 seconds, flat. when i did my first plasma in '96 (or was it 95?) it cost me days doing all those luts and palette calcs in assembler. </oldfart>
1. don't re-allocate colors[][] in every frame, just make it a class member variable like cls[][].
2. replace %255 by &255, because %255 is wrong and &255 is faster -- or eliminate it altogether, as mad suggested
3. don't use managed code :)
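A tiny C sketch of point 2, with throwaway function names. With % 255 the result only covers 0..254, so palette entry 255 is never used and the cycle skips a step when it wraps. With & 255 the result covers 0..255 and matches % 256 for unsigned values.

```c
/* Wrap with % 255: yields 0..254, so palette entry 255 is never used. */
unsigned wrap_mod255(unsigned v) { return v % 255; }

/* Wrap with & 255: yields 0..255, identical to v % 256 for unsigned v. */
unsigned wrap_and255(unsigned v) { return v & 255; }
```

So an input of 255 goes to entry 0 with % 255 but to entry 255 with & 255, and an input of 256 goes to entry 1 with % 255 but wraps cleanly to 0 with & 255.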
people talking about bit manipulation to speed up a plasma effect on pc in 2007. who's trolling now?
thnx for the reactions
it's not a big thing (perhaps even optimized away by the compiler) but you also don't have to do new Point(0, 0) every frame, just store it as a class member... and you could do the "if (cls == null)"-thing in some function which is executed before the first drawing (one check less).
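Putting the "allocate once, init before the first frame" advice together, a rough C sketch could look like this. The PlasmaEffect struct and the plasma_* function names are hypothetical, and the index pattern filled in by plasma_init is only a placeholder, not the real plasma formula.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int w, h;
    uint8_t  *cls;     /* static plasma indices, computed once */
    uint32_t *colors;  /* frame buffer, reused every frame     */
} PlasmaEffect;

/* One-time setup: all allocation happens here. Returns 0 on success. */
int plasma_init(PlasmaEffect *fx, int w, int h)
{
    fx->w = w;
    fx->h = h;
    fx->cls = malloc((size_t)w * h);
    fx->colors = malloc((size_t)w * h * sizeof *fx->colors);
    if (!fx->cls || !fx->colors) {
        free(fx->cls);
        free(fx->colors);
        return -1;
    }
    for (int i = 0; i < w * h; i++)
        fx->cls[i] = (uint8_t)i;   /* placeholder pattern, not the real plasma */
    return 0;
}

/* Per-frame update: no allocation, no null check. */
void plasma_update(PlasmaEffect *fx, const uint32_t palette[256], unsigned shift)
{
    int n = fx->w * fx->h;
    for (int i = 0; i < n; i++)
        fx->colors[i] = palette[(fx->cls[i] + shift) & 255];
}

/* Release both buffers when the effect is no longer needed. */
void plasma_free(PlasmaEffect *fx)
{
    free(fx->cls);
    free(fx->colors);
}
```

The C# equivalent would be to make colors (and the Point) class members filled in by a constructor or init method, so UpdateEffect only runs the copy loop.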