4k intro, C vs assembler?
category: code [glöplog]
Quote:
Some assembly-level optimizations can also be done directly from C: looping down instead of up, indexed loops ending at zero, do-while loops to put the condition at the end, the do-undo trick to get rid of "else" jumps, …
These are extremely simple optimizations which any half-decent compiler will already perform anyway.
Lesson to learn from this: when sizecoding in C/C++, don't _assume_ what the compiler might do; always look at the disassembly output AND the Crinkler stats. Squeezing the last few bytes out of a 4K is black magic anyway. :)
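Here is a minimal C sketch of two tricks from the quote. The first is a count-down do-while loop that ends at zero. The second is one reading of the "do-undo" trick: do the operation unconditionally, then undo it in the rare case, so there is no else branch. The function names are hypothetical. As the reply says, a decent compiler may already produce the same code from a plain for loop, so compare the disassembly before trusting either form.

```c
/* Count-down loop with the test at the bottom (do-while): the index runs
   n-1 .. 0, so the loop condition is just "n != 0" after a decrement.
   Assumes n > 0. Hypothetical helper, for illustration only. */
unsigned sum_bytes(const unsigned char *p, unsigned n)
{
    unsigned sum = 0;
    do {
        --n;
        sum += p[n];
    } while (n != 0);
    return sum;
}

/* One reading of "do-undo": step unconditionally ("do"), then revert
   the step only when it overshot ("undo"), instead of an if/else.
   Hypothetical helper, for illustration only. */
int step_toward_limit(int x, int limit)
{
    x++;                /* do */
    if (x > limit)
        x--;            /* undo */
    return x;
}
```

For example, summing the bytes {1, 2, 3, 4} gives 10, and stepping 10 toward a limit of 10 leaves it at 10.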
Less black magic and more trial-and-error :D
(The fun times of just switching the order of variables to see if it wins you a byte, etc.)
Quote:
Less black magic and more trial-and-error
.. and therefore best done drunk with your pants off. :-]
I was recently quite mad with my C compiler. I had to copy some memory, so I wrote a for loop for that. That stupid compiler was smart enough to notice what I was trying to do and replaced my code with a call to memcpy. Too bad that I didn't link to the standard library. So I had to obfuscate my copy loop until the compiler didn't recognize it as a mere copy anymore :-)
I know, the right thing to do would've been a rep movsd or something. But the times when I enjoyed writing assembly are somewhat over.
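Here is one portable way to do the "obfuscation" described above, as a sketch: copy through volatile pointers. Every volatile access must be performed as written, so the compiler may not replace the loop with a memcpy call. The price is that it also can't optimize the loop in any other way. With recent GCC versions, the flag -fno-tree-loop-distribute-patterns disables this loop-to-library-call transformation without needing volatile. The name copy_bytes is made up for this example.

```c
#include <stddef.h>

/* Byte copy through volatile pointers (hypothetical helper). Because each
   volatile load/store must happen individually and in order, the compiler
   cannot turn this loop into a call to memcpy, which matters when the C
   runtime is not linked. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = (volatile unsigned char *)dst;
    const volatile unsigned char *s = (const volatile unsigned char *)src;
    while (n--)
        *d++ = *s++;
}
```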
Most compilers have intrinsics for memcpy etc. All you need is the right compiler switch and at least one line of prototype definition :)
Don't use Turbo C 2.0 for optimization!
I use nasm/yasm for 4k coding. It's quite convenient; prototyping is done in standard C/C++. All you need is a very tiny DX/GL basecode anyway if you do most things in shaders, so there's no real need for complicated CPU code.
An update: I did the following:
- Adding a forward declaration did not convince Crinkler to keep the function at a positive relative address. I guess it shifts things around anyway.
- Changed the parameter order; this made the compressed version 2 bytes smaller.
- Changed the x and y loops to count down (in the C code). This somehow convinced the compiler to use EDX (God knows why), and in total it saved an additional 18 bytes before compression (wow!) but only 6 after compression. (Counting down in the inner loop reduced the uncompressed size but actually inflated the end result.)
- Finally, I tried rrrola's code. As expected, the uncompressed size is 96 bytes (32 bytes less) and the compressed result is 64 bytes (26 less!!).
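Here is a minimal C sketch of the count-down change from the update above. It assumes a hypothetical row-major 8-bit framebuffer of FB_W x FB_H pixels and a made-up x^y fill pattern. The slow y loop is on the outside and the fast x loop on the inside, so each row is written contiguously. Both counters run down to zero with the test at the bottom. As the note on the inner loop shows, whether this wins bytes after compression has to be measured case by case.

```c
#define FB_W 4   /* hypothetical framebuffer width */
#define FB_H 3   /* hypothetical framebuffer height */

/* Fill a row-major FB_W*FB_H buffer with a made-up x^y pattern.
   Outer loop: y (slow). Inner loop: x (fast), walking each row back to
   front. Both loops count down to zero with the test at the bottom. */
void fill_pattern(unsigned char *buf)
{
    int y = FB_H;
    do {
        --y;
        int x = FB_W;
        do {
            --x;
            buf[y * FB_W + x] = (unsigned char)(x ^ y);
        } while (x);
    } while (y);
}
```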
Conclusions:
- Never trust the compiler; always check the output.
- Even if the output seems reasonable, most of the time you can reduce the size by 20-25% by manual asm tweaking. The only drawback is that you need to be at least as smart as rrrola :)
- Apparently, hand-written assembler compresses better.
Thank you guys, and especially rrrola, who worked his magic...
that's a cool result.
1) The forward declaration is just a dummy. The compiler compiles line by line, so you have to relocate the function below the code that calls it, which gives it a positive offset in the section. Start with main and build a subfunction "tree" from the top down the file.
2) Those compression patterns do work. I said it would be a small improvement: just a repeated "token".
3) Slow outer y loop and fast inner x loop: not a problem either. "Array order magic", correct.
This is what you get using optimizing C code. It's fine. ;)
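Here is a hypothetical C layout that illustrates the "subfunction tree" advice in point 1. The entry point comes first, and each helper is defined below its caller. The prototypes only satisfy the compiler; the definition order is what matters here. Whether the emitted code keeps source order depends on the compiler and flags (for GCC, -fno-toplevel-reorder asks for it). Small static helpers may also get inlined, and Crinkler can still reorder sections, so verify in the disassembly. The names entry, render and shade are made up for this example.

```c
/* Prototypes: needed to compile, but they do not affect placement. */
static void render(unsigned char *buf, int w, int h);
static int shade(int x, int y);

/* Top of the tree: the (hypothetical) entry point is defined first. */
void entry(unsigned char *buf)
{
    render(buf, 8, 8);
}

/* Defined below its caller, so a call from entry is a forward call
   when the compiler emits functions in source order. */
static void render(unsigned char *buf, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            buf[y * w + x] = (unsigned char)shade(x, y);
}

/* Leaf of the tree: a made-up checkerboard shader. */
static int shade(int x, int y)
{
    return ((x + y) & 1) ? 255 : 0;
}
```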
I learned something myself about coding, even though it's irrelevant. :D
I doubt it would have been possible to achieve equally good results in the compos at http://www.hugi.scene.org/compo/ using C instead of assembler.
Nice way to miss the point Adok. It's about modern 4k intros, not tiny DOS sorcery.
There are so many beneficial instructions added between the Intel 80186 and the Pentium Pro (pusha/popa/cmov/etc.), and I doubt that compilers take advantage of them by default, even when you specify the architecture.
Has anyone already looked into that?
For example, I know about the -march option for gcc, but I've never looked at the assembly output.
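A quick way to check the question above for cmov is to compile a plain conditional select and read the assembly, for example with gcc -m32 -march=i686 -Os -S. This is only a sketch. Whether a branch or a cmov comes out is a compiler heuristic, not a guarantee. cmov exists from the Pentium Pro (i686) on and is always available on x86-64. The function name max_int is hypothetical.

```c
/* Plain conditional select (hypothetical helper). When cmov is available,
   GCC and Clang with optimization enabled often emit cmp + cmov here
   instead of a conditional jump, but check the generated assembly. */
int max_int(int a, int b)
{
    return a > b ? a : b;
}
```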
Compilers are fantastic at generating fast code, and somewhat decent at generating small code. But they are awful at generating compressible code. This is not because this is theoretically difficult to do for a compiler. This is just not what the available compilers were made for.
A few things to keep in mind regarding assembly for 4ks:
- Rewriting some of your code in assembly can give a small size benefit. Rewriting everything in assembly can give a huge size benefit, especially if the coding style is consistent.
- Write plain and regular code. Always use the same register save/restore sequence, always use the same register for the same thing, etc.
- The uncompressed code (and data) size is utterly irrelevant. If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.
thread!
Blueberry's words are funny/reassuring as it is exactly my experience writing 1ks in JavaScript.
Quote:
If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.
But but but Crinkler keeps shoving that data in my face when it's working... :(
I know, but I constantly watch the /PROGRESSGUI instead of staring at the output console. Only when it's finished do I see the damned number and go FFUUUUUU.
I think that if someone plans to make a couple of 4kb intros and likes the asm challenge, writing a custom C compiler/editor with strict size optimization in mind could be quite interesting and challenging.
After all, as rrrola showed in this thread, it turns out to be a puzzle of replacing things with whatever takes fewer bytes and moving things around. But it could also mean rewriting the algorithm to produce different intermediate values that end up doing the same thing, or making a function more generic and reusing it for both video and sound, for example, so that a single function does two or more things.
I think being able to track the size of each function as you write it in C, in a custom editor that shows you the .ASM result live, might be useful and fun. I'm just proposing it... but I know this editor+compiler would take a lot of time, and even once it works, the 4kb intro still isn't written.
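Here is a sketch of the "one function, two uses" idea from the post above. It assumes a hypothetical xor-shift-multiply integer hash; the constants are one common choice, not anything from this thread. The same hash feeds both an 8-bit noise texture and a 16-bit noise audio buffer. All names are made up.

```c
#include <stdint.h>

/* Hypothetical integer hash: xor-shift / multiply mixing. */
static uint32_t hash32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Use 1 (video): low byte of the hash as a noise texel. */
void make_texture(uint8_t *tex, int n)
{
    for (int i = 0; i < n; i++)
        tex[i] = (uint8_t)hash32((uint32_t)i);
}

/* Use 2 (sound): top 16 bits, re-centred to a signed sample. */
void make_noise(int16_t *samples, int n)
{
    for (int i = 0; i < n; i++)
        samples[i] = (int16_t)((int32_t)(hash32((uint32_t)i) >> 16) - 32768);
}
```

Since hash32(0) is 0, the first texel is 0 and the first sample is -32768. Everything else comes from the same single function.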
Well, there is the interactive GCC thingie here: http://gcc.godbolt.org/
You can edit your code, and see the assembly output immediately. Might be useful.
Someone make the same for VC++ with Crinkler. :-)
That interactive thing is pretty cool. Too bad the "intel syntax"-option is greyed out :D
@Preacher "intel syntax" is enabled if you change the compiler to g++
Wow! Interactive/live Crinkling would be the shit!
http://www.beaengine.org/ would make it possible..