pouët.net

VC++ is slow.

category: general [glöplog]
I wanna do 800*600 plasmas and fires. I chose the best integer-table routines and tried them on my Athlon, but now I discover that they're not as fast as I want on a Pentium2. I believe that all I need is to try some inline assembly. For sure, the wildly unpredictable, directly unrolled speedcode we once did on the CPC with Antitec must be smaller and take fewer cycles than what a compiler generates. I believe that strongly. I once tried to optimize a fire for my 386. I didn't try to count cycles, but I calculated 4 pixels at once in 32 bits and unrolled the loops, getting something like 2-3 frames at 320*200. Still, I wanted it at full frame rate, and I believe that when I learn to count cycles on the PC and see where they are lost, I can think of much, much more! Nevertheless, I had a CPU raster bar for that, and it went to less than 5% on my Pentium3. So, I believe that with much more optimizing we can have 1024*768 full-frame-rate effects on a Pentium2. Pity that PCs are a beast to optimize and I am still hanging on to my 386. But I believe that X86 assembly can still overcome C++ and compilers by 3 or 23 or more times!!!!!!! Feel the power..

..on the partyplace, I did an RGB plasma for Ment0r's demo in a haste. I tried our demo on the Pentium2 along with another, less complex plasma done on TinyPTC. Why the hell is the O.T.I.N.A.N.E. plasma still smooth enough there, unlike the TinyPTC plasma? I can't get it. Ok, we have 640*480 at 16bit here and Ment0r's own asm routines for blitting, DirectDraw and stuff. I guess I'll have to get the code and try to compile the same plasma in both TinyPTC and Ment0r's lib. Perhaps I am missing something, some compiler optimization. But I wanna have that 1024*768*32bit running on my Pentium, because PCs can do them all!!! FEEL THE POWER!!!!!!!
added on the 2004-05-13 12:35:53 by Optimus Optimus
Without reading 1% of this JALOP ... How many think Optimus ran it in debug?
added on the 2004-05-13 14:18:15 by Hatikvah Hatikvah
It's because you don't know how to Optimus. That has nothing to do with VC++.

Now fuck off, retard.
added on the 2004-05-13 14:21:26 by superplek superplek
and now for some useful answers
added on the 2004-05-13 14:59:59 by psenough psenough
optimus' ignorance is funny
added on the 2004-05-13 15:09:50 by jar jar
Quote:

I believe that X86 assembly can still overcome C++ and compilers by 3 or 23 or more times!!!!!!! Feel the power..


PRAISE THE LORD, OH YEAH! OOH!
added on the 2004-05-13 15:49:32 by superplek superplek
OK.
This is how far I could take it.
I'm willing to buy a beer for anyone who gets Optimus off Pouet for at least a month in ANY way and brings proof.
added on the 2004-05-13 16:00:05 by Gargaj Gargaj
Look, didn't I already tell you that it's because VC is utter and complete shit? GCC executes infinite loops at least ten times faster than VC.
added on the 2004-05-13 16:19:54 by 216 216
i suppose visual c++ generates better assembly code in milliseconds than you will do in your whole life.

but if you really want to optimise your effect, you should learn about memory access and cache.

if performance is THAT terrible, then i assume that you are working directly on the video memory. reading from video memory is awfully slow, and writing is best done in strict sequential order. (strict = really really strict).

another great trick to cripple performance is to use a large lookup table. if it does not fit in the L1 cache, you are lost. i wouldn't use anything larger than 1 Kbyte.

for most cases, writing ASM is a waste of time. but it's sure worth reading the assembler code generated by the compiler from time to time to see what's going on, and sometimes the compiler needs a small hint to generate better code.
added on the 2004-05-13 16:19:59 by chaos chaos
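A minimal sketch of chaos's two points, assuming an 8-bit plasma and a target surface whose rows are `pitch` bytes apart. The frame is drawn into a buffer in ordinary system memory and then copied to the surface row by row, front to back, without ever reading from it. The lookup table is 256 bytes, well under the 1 Kbyte limit suggested above. All names here (`wave_table`, `render_plasma`, `present`) are hypothetical, and the table holds a parabolic stand-in for a sine wave, not anyone's actual routine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TABLE_SIZE = 256 };                 /* 256 bytes, small enough to stay in L1 */
static uint8_t wave_table[TABLE_SIZE];

/* Fill the table once at startup with a smooth periodic wave
   (parabolic approximation standing in for a sine), values 1..255. */
void init_wave_table(void)
{
    for (int i = 0; i < TABLE_SIZE; ++i) {
        int phase = i & 127;                            /* position in the half period */
        int bump = phase * (128 - phase) * 127 / 4096;  /* 0..127 */
        wave_table[i] = (uint8_t)(i < 128 ? 128 + bump : 128 - bump);
    }
}

/* Draw one frame into a buffer in system memory (not video memory). */
void render_plasma(uint8_t *back, int width, int height, unsigned t)
{
    assert(back != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; ++y) {
        uint8_t *row = back + (size_t)y * width;
        unsigned row_term = wave_table[(y + t) & 255];
        for (int x = 0; x < width; ++x)
            row[x] = (uint8_t)((wave_table[(x * 2 + t) & 255] + row_term) >> 1);
    }
}

/* Copy the finished frame to the target surface: write-only,
   one row after another, each row from left to right. */
void present(uint8_t *surface, int pitch, const uint8_t *back, int width, int height)
{
    assert(surface != NULL && back != NULL && pitch >= width);
    for (int y = 0; y < height; ++y)
        memcpy(surface + (size_t)y * pitch, back + (size_t)y * width, (size_t)width);
}
```

In a real program `surface` would be whatever pointer and pitch the graphics library hands out when the surface is locked; here it is just a byte array, so the sketch can be tried without any graphics library.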
I am sure that if you drove a Ferrari with the handbrake on and didn't know how to drive it, yes, it would be slow.

You could:
a) continue using your old trusty bicycle (slow but fast enough for you) or
b) spend some time to really learn how to drive that Ferrari and how that handbrake thingy works.
added on the 2004-05-13 16:29:44 by moT moT
x86 over c/c++ 23x faster?
optimus, please die.
added on the 2004-05-13 16:53:33 by abductee abductee
AAARRRRRGH!!!!!!

honestly:

vcpp is NOT slow. if you expect a better result than you got, you either have chosen an unsuitable approach/algorithm for your effect, or have no experience in cache strategies. i think usually you do NOT have to do a lot in assembler.

i remember speeding up an fpu-based rotozoomer (one without a lookup table) with assembly, but that's it, and it was just the inner loop. large lookup tables are NOT a good idea because you might cause cache misses like hell.

hint: do _NOT_ try to directly transfer 8-bit techniques to windows machines! all this lookup table precalcing and cycle-counting worked there because the processors were uncached.

NO ASSEMBLY AT EARLY STAGES! PLAN YOUR LOOPS CAREFULLY!

you might also be more specific about how you did it. directx? GDI?

you can send me the sources, i'll look at it. i can also send you my rotozoomer if you like.
added on the 2004-05-13 19:25:33 by rac rac
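One way to read rac's "PLAN YOUR LOOPS CAREFULLY", sketched under the assumption that the effect is a sum of a term that depends only on x and a term that depends only on y. The naive version evaluates both terms for every pixel. The planned version computes the column terms once per frame into a small array (at most 1 KB here) and the row term once per row, so the inner loop is one load, one add and one shift. `wave`, `plasma_naive`, `plasma_planned` and `MAX_WIDTH` are hypothetical names, and the triangle wave is only a stand-in for whatever per-axis function the effect really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-axis term: a triangle wave in 0..255. */
static uint8_t wave(unsigned v)
{
    unsigned p = v & 511;
    return (uint8_t)(p < 256 ? p : 511 - p);
}

/* Naive: both terms are evaluated for every single pixel. */
void plasma_naive(uint8_t *dst, int width, int height, unsigned t)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[(size_t)y * width + x] =
                (uint8_t)((wave(3u * x + t) + wave(2u * y + 2u * t)) >> 1);
}

enum { MAX_WIDTH = 1024 };

/* Planned: column terms once per frame, row term once per row. */
void plasma_planned(uint8_t *dst, int width, int height, unsigned t)
{
    uint8_t column_term[MAX_WIDTH];        /* small, reused for every row */
    assert(dst != NULL && width > 0 && width <= MAX_WIDTH);
    for (int x = 0; x < width; ++x)
        column_term[x] = wave(3u * x + t);
    for (int y = 0; y < height; ++y) {
        uint8_t *row = dst + (size_t)y * width;
        unsigned row_term = wave(2u * y + 2u * t);
        for (int x = 0; x < width; ++x)
            row[x] = (uint8_t)((column_term[x] + row_term) >> 1);
    }
}
```

Both functions produce the same picture. The difference is only in how much work sits inside the innermost loop, which is the kind of change that comes before any assembly.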
Bah Optimus. I much preferred your rants about how shit your life is.
added on the 2004-05-13 19:42:44 by Pete Pete
Before saying that a compiler is slow, you should look more closely at your algorithm and memory access.
Almost everything depends on the algorithm. :)
added on the 2004-05-13 20:05:20 by got got
I can code a zillion times faster stuff in VHDL than you can in assembler, get that!
added on the 2004-05-13 20:11:36 by Stelthzje Stelthzje
This thread is funny..

I am going back to the CPC. :P
added on the 2004-05-14 01:13:41 by Optimus Optimus
Please do. Btw, isn't there a nice CPC site for you retarded people where you can hang around? Oh, and take Wade with you.
added on the 2004-05-14 08:52:06 by Hatikvah Hatikvah
I wonder if your C code turns out optimized when compiled with a CPC C-compiler instead of a Windows C-compiler ...
i think Optimus is the most successful troll in pouet's history, plek and stefan together could never raise the tenth of the flame he gets in a day in their whole life...
added on the 2004-05-14 10:20:29 by FooLman FooLman
tinyptc waits until retrace.
added on the 2004-05-14 13:09:57 by texel texel
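texel's remark matters for the comparison in the first post: if every frame is presented on a retrace, the visible frame rate drops in steps rather than gradually. A rough sketch of the arithmetic, assuming the wait blocks until the next retrace and that rendering does not overlap with it (a simplification). The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Frames per second when each frame is shown on a retrace.
   render_ms:  time needed to draw one frame, in milliseconds
   refresh_hz: monitor refresh rate */
double fps_with_retrace_wait(double render_ms, double refresh_hz)
{
    assert(render_ms > 0.0 && refresh_hz > 0.0);
    double period_ms = 1000.0 / refresh_hz;
    int periods = 1;
    while (periods * period_ms < render_ms)
        ++periods;                 /* round up to whole refresh periods */
    return refresh_hz / periods;
}
```

At 60 Hz one refresh period is about 16.7 ms, so a frame that takes 10 ms runs at 60 fps, while one that takes 17 ms misses a retrace and runs at 30 fps. Two plasmas with nearly identical inner loops can therefore look very different if only one of them waits for the retrace.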
I just can second chaos and rac....
really, vc++ is not slow. and writing better asm-code than this compiler needs a shitload of experience.
stelthz: wow, impressive ;-)
added on the 2004-05-14 14:56:53 by styx^hcr styx^hcr
Sometimes VC 6.0 needed some help here and there, and hand-written code was of some use, especially for SSE/SSE2 routines. Dunno if it's the same with the .NET compiler. But usually there's no use for ASM other than optimussing the most used routines after running a profiler...
The speed of most routines really depends on the algorithm you choose. So first C++ then ASM if you really need to....
added on the 2004-05-14 18:15:43 by raer raer
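When no profiler is at hand, a crude stand-in for raer's "measure first" step is to time each candidate routine over many calls before deciding what, if anything, deserves assembly. A minimal sketch using only the standard C `clock()` function. `routine_fn`, `seconds_per_call` and `count_calls` are hypothetical names, and `count_calls` is just a placeholder workload.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*routine_fn)(void *context);

/* Average CPU seconds per call, measured over `runs` calls with clock(). */
double seconds_per_call(routine_fn routine, void *context, int runs)
{
    assert(routine != NULL && runs > 0);
    clock_t start = clock();
    for (int i = 0; i < runs; ++i)
        routine(context);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / runs;
}

/* Placeholder workload: counts how often it was called. */
static void count_calls(void *context)
{
    ++*(int *)context;
}
```

`clock()` has coarse resolution on many systems, so the routine should be run often enough that the total time is well above one clock tick. The numbers are only meaningful for an optimized build.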
Reminds me of the "neXe" tutorials where somebody tries to explain that vertex buffers are exactly twice as fast as using DrawPrimitiveUP. He "proved" that by the fact that his one (!) triangle spun at twice the speed.

And why? Because that fool called vertexbuffer->Process() and did TWO rotations per frame. Yeah. Didn't even notice that his fps meter showed a flat 60 in both cases.
added on the 2004-05-14 18:19:16 by kb_ kb_
I tried to write some "hand-optimized" assembly once. I downloaded the performance guides from both Intel and AMD, got about halfway through reading them, and gave up. The fact is, there isn't really much one can do with a superscalar cached architecture. There is no way to count execution cycles, cache misses, or pipeline behaviour. Without that info, trying to hand-optimize code is pretty pointless.

The best we humans can do reliably on the Intel is size-optimization, and even that is pretty much guesswork around the compressor.
added on the 2004-05-14 19:22:55 by s_tec s_tec
If there is no way to manually evaluate how fast asm code can go, how would a compiler evaluate it?
added on the 2004-05-14 20:06:40 by _-_-__ _-_-__
