pouët.net

4k intro, C vs assembler?

category: code [glöplog]
Quote:
Some assembly-level optimizations can also be done directly from C: looping down instead of up, indexed loops ending at zero, do-while loops to put the condition at the end, the do-undo trick to get rid of "else" jumps, …


These are extremely simple optimizations which any half-decent compiler will already perform anyway.
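For readers who have not seen the quoted tricks, here is a minimal sketch in C. The function names are made up for illustration, and whether any of this saves bytes depends on the compiler and its flags, so the disassembly is the only real answer.

```c
#include <stddef.h>

/* Counting up: the loop has to compare i against n on every pass. */
unsigned sum_up(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Counting down to zero with the condition at the end (do-while).
   Assumes n > 0, because the body always runs at least once. */
unsigned sum_down(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    do {
        s += p[--n];
    } while (n != 0);
    return s;
}

/* Plain if/else: two paths, usually two jumps. */
int step_if_else(int x, int flag)
{
    if (flag)
        x += 3;
    else
        x -= 1;
    return x;
}

/* "Do-undo": always do the else case, then correct it when flag is set. */
int step_do_undo(int x, int flag)
{
    x -= 1;
    if (flag)
        x += 4;
    return x;
}
```

Both pairs compute the same results; only the shape of the generated code may differ.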
added on the 2013-04-06 15:27:57 by Scali Scali
Lesson to learn from this: When sizecoding in C/C++, don't _assume_ what the compiler might do but always look at the disassembly output AND the crinkler stats. Squeezing the last few bytes out of a 4K is black magic anyway. :)
added on the 2013-04-06 15:31:20 by kb_ kb_
Less black magic and more trial-and-error :D
(The fun times of just switching the order of variables to see if it wins you a byte, etc.)
added on the 2013-04-06 15:37:37 by Gargaj Gargaj
Quote:
Less black magic and more trial-and-error

.. and therefore best done drunk with your pants off. :-]
added on the 2013-04-06 15:49:23 by trc_wm trc_wm
I was recently quite mad with my C compiler. I had to copy some memory, so I wrote a for loop for that. That stupid compiler was smart enough to notice what I was trying to do and replaced my code with a call to memcpy. Too bad that I didn't link to the standard library. So I had to obfuscate my copy loop until the compiler didn't recognize it as a mere copy anymore :-)

I know, the right thing to do would've been a rep movsd or something. But the times when I enjoyed writing assembly are somewhat over.
added on the 2013-04-06 17:42:16 by chock chock
Most compilers have intrinsics for memcpy etc. All you need is the right compiler switch and at least one line of prototype definition :)
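As a sketch of the "obfuscate the copy loop" route mentioned above: one common way is to copy through volatile pointers. This is an assumption for illustration, not what chock actually wrote, and the name copy_bytes is made up. The intrinsic switches and pragmas differ per compiler, so they are not shown here.

```c
#include <stddef.h>

/* Byte-wise copy through volatile pointers. Volatile accesses have to be
   performed one by one as written, so the compiler should not replace the
   loop with a call to memcpy. The cost is speed, which rarely matters in
   setup code for a 4k. */
void copy_bytes(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;
}
```

Checking the disassembly for a leftover memcpy reference is still the only way to be sure.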
added on the 2013-04-06 18:40:00 by kb_ kb_
Don't use Turbo C 2.0 for optimization!
added on the 2013-04-06 19:08:04 by rudi rudi
I use nasm/yasm for 4k coding. It's quite convenient - prototyping is done in standard C/C++. All you need is a very tiny dx/gl basecode anyways - if you do most of the things in shaders, so no real need for complicated cpu code.
added on the 2013-04-06 21:41:59 by las las
An update: I did the following:
  1. Adding a forward declaration did not convince Crinkler to keep the function at a position-relative address. I guess it shifts things around anyway.
  2. Changed the parameter order; this made the compressed version 2 bytes smaller.
  3. Changed the x and y loops to count down (in the C code). This somehow convinced the compiler to use EDX (god knows why), and in total this saved an additional 18 bytes before compression (wow!) but only 6 after compression.
  4. (Counting down in the inner loop reduced the uncompressed size but actually inflated the end result.)
  5. Finally, I tried rrrola's code. As expected, the uncompressed size is 96 bytes (32 bytes less) and the compressed result is 64 bytes (26 less!!).

Conclusions:
  1. Never trust the compiler, always check the output.
  2. Even if the output seems reasonable, most of the time you could reduce size by 20%-25% by manual asm tweaking.
  3. The only drawback is that you need to be at least as smart as rrrola :)
  4. Apparently, hand-written assembler gets compressed better.

Thank you guys, and especially rrrola, who did his magic...
added on the 2013-04-07 00:48:47 by TLM TLM
that's a cool result.

1) the forward declaration is just a dummy. the compiler compiles line by line. you have to relocate the function below the code that calls it, thus giving it a positive offset in the section. start with main and build a subfunction "tree" from the top down the file.
2) the compression patterns do work. i said small improvement. just a repeated "token".
3) outer y slow and inner fast x loop. not a problem either. "array order magic". correct.

this is what you got using optimized c code. it's fine. ;)
added on the 2013-04-07 01:19:14 by yumeji yumeji
i learned something myself about coding. even tho irrelevant. :D
added on the 2013-04-07 01:26:14 by yumeji yumeji
I doubt it would have been possible to achieve equally good results in http://www.hugi.scene.org/compo/ using C instead of Assembler.
added on the 2013-04-07 11:31:38 by Adok Adok
Nice way to miss the point Adok. It's about modern 4k intros, not tiny DOS sorcery.
added on the 2013-04-07 17:23:13 by superplek superplek
There are so many beneficial instructions from the Intel 80186 to the Pentium Pro (pusha/popa/cmov/etc.), and I doubt that compilers take advantage of them by default, even when you specify the architecture to your compiler.
Has anyone already taken a look at that?
For example, I know the -march option for gcc, but I never looked at the assembly output.
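A small test case for checking this yourself, as a sketch: the function name is made up, and whether cmov actually shows up depends on compiler version, optimization level and the -march value, so nothing here is guaranteed.

```c
/* Returns the smaller of a and b. On targets that have cmov, optimizing
   compilers often turn this ternary into cmp + cmov instead of a
   conditional jump. Compile with gcc -S and read the output to see what
   your setup really does. */
int min_int(int a, int b)
{
    return (a < b) ? a : b;
}
```

Compiling the same file with gcc -S under a few different -march settings and diffing the .s files shows quickly which instructions the compiler is willing to use.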
added on the 2013-04-07 18:05:50 by stfsux stfsux
Compilers are fantastic at generating fast code, and somewhat decent at generating small code. But they are awful at generating compressible code. This is not because this is theoretically difficult to do for a compiler. This is just not what the available compilers were made for.

A few things to keep in mind regarding assembly for 4ks:
- Rewriting some of your code in assembly can give a small size benefit. Rewriting everything in assembly can give a huge size benefit, especially if the coding style is consistent.
- Write plain and regular code. Always use the same register save/restore sequence, always use the same register for the same thing, etc.
- The uncompressed code (and data) size is utterly irrelevant. If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.
added on the 2013-04-07 21:47:31 by Blueberry Blueberry
BB Image thread!

Blueberry's words are funny/reassuring as it is exactly my experience writing 1ks in JavaScript.
added on the 2013-04-07 21:58:08 by p01 p01
Quote:
If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.

But but but Crinkler keeps shoving that data in my face when it's working... :(
added on the 2013-04-08 12:42:58 by Gargaj Gargaj
I know, but I constantly watch the /PROGRESSGUI instead of staring at the output console. Only when it's finished, I see the damned number and go FFUUUUUU.
added on the 2013-04-08 12:52:38 by xTr1m xTr1m
I think that if someone plans to make a couple of 4kb intros and likes the asm challenge, writing a custom C compiler/editor with strict size optimization in mind could be quite interesting and challenging.

After all, as rrrola showed in this thread, it turns out to be a puzzle of replacing things with whatever takes fewer bytes and moving things around. But it could also be done by rewriting the algorithm to produce different intermediate values which end up doing the same thing, or by making a function more generic and re-using it for both video and sound, for example, so that a single function does 2+ things.
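To make the "one function for video and sound" idea concrete, here is a toy sketch. All names (tri, audio_sample, pixel) and constants are hypothetical; a real intro would pick whatever shared helper fits its synth and its effect.

```c
/* Shared helper: triangle wave with period 256 and output range 0..254. */
unsigned char tri(unsigned t)
{
    unsigned x = t & 255;
    return (unsigned char)(x < 128 ? x * 2 : (255 - x) * 2);
}

/* Sound: one 8-bit sample of a tone; the multiplier sets the pitch. */
unsigned char audio_sample(unsigned n)
{
    return tri(n * 3);
}

/* Video: brightness of a pixel in a diagonal gradient that scrolls
   with the frame counter. */
unsigned char pixel(unsigned x, unsigned y, unsigned frame)
{
    return tri(x + y + frame);
}
```

The body of tri is stored only once, and the two callers become small, similar-looking wrappers, which is also the kind of repetition a compressor likes.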

I think being able to track the size of each function as you write it in C, within a custom editor that shows you the .ASM result live, might be useful and fun. I am just proposing it, though: I know this editor+compiler would take a lot of time, and the 4kb intro would still not be written once it starts to work.
added on the 2013-04-08 16:10:58 by F-Cycles F-Cycles
Well, there is the interactive GCC thingie here: http://gcc.godbolt.org/

You can edit your code, and see the assembly output immediately. Might be useful.
added on the 2013-04-08 16:50:50 by Scali Scali
Someone make the same for VC++ with Crinkler. :-)
added on the 2013-04-08 19:12:27 by rrrola rrrola
That interactive thing is pretty cool. Too bad the "intel syntax"-option is greyed out :D
added on the 2013-04-08 19:20:15 by Preacher Preacher
@Preacher "intel syntax" is enabled if you change the compiler to g++
added on the 2013-04-08 21:48:21 by giddy giddy
Wow! Interactive/live Crinkling would be the shit!
added on the 2013-04-08 21:54:51 by p01 p01
http://www.beaengine.org/ would make it possible..
added on the 2013-04-08 22:33:59 by trc_wm trc_wm
