4k intro, C vs assembler?

category: code [glöplog]

Quote:

Some assembly-level optimizations can also be done directly from C: looping down instead of up, indexed loops ending at zero, do-while loops to put the condition at the end, the do-undo trick to get rid of "else" jumps, …

These are extremely simple optimizations which any half-decent compiler will already perform anyway.
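
For readers who have not seen the loop shapes named in the quote, here is a minimal sketch in C. The function names are hypothetical and not from the thread, and whether a given compiler already produces the same machine code for both variants is exactly what the disassembly has to show.

```c
#include <stddef.h>

/* Counting up: the loop carries an explicit compare against n. */
unsigned sum_up(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Counting down to zero with the test at the end (do-while).
   Assumes n > 0. */
unsigned sum_down(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    do {
        s += p[--n];
    } while (n != 0);
    return s;
}

/* Plain if/else version of a small state update. */
int step_if_else(int x, int wrap)
{
    if (wrap) x -= 7; else x += 1;
    return x;
}

/* "Do-undo": apply the common case unconditionally, then correct
   it in the other case, so there is no else branch. */
int step_do_undo(int x, int wrap)
{
    x += 1;              /* do */
    if (wrap) x -= 8;    /* undo the +1 and apply the -7 in one go */
    return x;
}
```

Both pairs return the same values; only the shape of the generated code may differ.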

added on the 2013-04-06 15:27:57 by Scali

Lesson to learn from this: When sizecoding in C/C++, don't _assume_ what the compiler might do but always look at the disassembly output AND the Crinkler stats. Squeezing the last few bytes out of a 4K is black magic anyway. :)

added on the 2013-04-06 15:31:20 by kb_

Less black magic and more trial-and-error :D
(The fun times of just switching the order of variables to see if it wins you a byte, etc.)

added on the 2013-04-06 15:37:37 by Gargaj

Quote:

Less black magic and more trial-and-error

.. and therefore best done drunk with your pants off. :-]

added on the 2013-04-06 15:49:23 by trc_wm

I was recently quite mad with my C compiler. I had to copy some memory, so I wrote a for loop for that. That stupid compiler was smart enough to notice what I was trying to do and replaced my code with a call to memcpy. Too bad that I didn't link to the standard library. So I had to obfuscate my copy loop until the compiler didn't recognize it as a mere copy anymore :-)

I know, the right thing to do would've been a rep movsd or something. But the times when I enjoyed writing assembly are somewhat over.
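
One way such a copy loop can be "obfuscated" is sketched below. This is an assumption about a workable approach, not the code chock actually used: the destination is written through a volatile-qualified pointer, so every byte store has to be emitted and the loop cannot be collapsed into a single memcpy call. The name `tiny_copy` is hypothetical. On GCC, the documented switch `-fno-tree-loop-distribute-patterns` is another route, since it turns off the pass that recognizes such loops.

```c
#include <stddef.h>

/* Hypothetical helper. The volatile qualifier forces one store per
   byte, which keeps the optimizer from replacing the loop with a
   library call. It may cost a few bytes compared to rep movsb. */
void tiny_copy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}
```

As always in this thread: check the disassembly to see what the compiler really emitted.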

added on the 2013-04-06 17:42:16 by chock

Most compilers have intrinsics for memcpy etc. All you need is the right compiler switch and at least one line of prototype definition :)

added on the 2013-04-06 18:40:00 by kb_

Don't use Turbo C 2.0 for optimization!

added on the 2013-04-06 19:08:04 by rudi

I use nasm/yasm for 4k coding. It's quite convenient: prototyping is done in standard C/C++. All you need is a very tiny dx/gl basecode anyway if you do most of the things in shaders, so there is no real need for complicated CPU code.

added on the 2013-04-06 21:41:59 by las

An update: I did the following:

Adding a forward declaration did not convince Crinkler to keep the function at a positive relative address. I guess it shifts things around anyway.
Changed the parameter order; this made the compressed version 2 bytes smaller.
Changed the x and y loops to count down (in the C code); this somehow convinced the compiler to use EDX (god knows why), and in total it saved an additional 18 bytes before compression (wow!) but only 6 after compression.
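
A minimal sketch of what "x and y loops counting down" can look like in C. The buffer, its dimensions and the `x ^ y` pattern are hypothetical, not TLM's actual code; the byte savings reported above are specific to his compiler and program.

```c
/* Hypothetical dimensions and buffer, for illustration only. */
enum { W = 8, H = 4 };
static unsigned char pixels[W * H];

/* Fill the buffer with x ^ y, iterating y and x downward.
   Both loops end at zero and have their test at the bottom. */
void fill_down(void)
{
    unsigned char *p = pixels + W * H;   /* one past the last pixel */
    int y = H;
    do {
        int x = W;
        --y;
        do {
            --x;
            *--p = (unsigned char)(x ^ y);   /* lands on pixels[y * W + x] */
        } while (x != 0);
    } while (y != 0);
}
```

The single descending pointer replaces the `y * W + x` index computation, which is one more thing to compare in the disassembly.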

Finally, I tried rrrola's code: as expected, the uncompressed size is 96 bytes (32 bytes less) and the compressed result is 64 bytes (26 less!!).

Conclusions:

Never trust the compiler, always check the output.
Even if the output seems reasonable, most of the time you can reduce the size by 20%-25% with manual asm tweaking.

Apparently, hand-written assembler compresses better.

Thank you guys, and especially rrrola, who did his magic...

added on the 2013-04-07 00:48:47 by TLM

that's a cool result.

1) the forward declaration is just a dummy. the compiler compiles line by line. you have to relocate the function below the code that calls it, thus giving it a positive offset in the section. start with main and build a subfunction "tree" from the top of the file down.
2) those compression patterns do work. i said a small improvement. it's just a repeated "token".
3) outer slow y loop and inner fast x loop. not a problem either. "array order magic". correct.

this is what you get from optimizing c code. it's fine. ;)
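
A small sketch of the file layout described in point 1, with hypothetical function names. The caller is defined first and the callee below it; the prototype only keeps the C compiler happy. Whether the compiler and the linker (or Crinkler) keep that order in the final image is toolchain-dependent, as TLM's update suggests.

```c
/* Forward declaration: needed for the call to compile, but it does
   not by itself decide where the function ends up. */
static int shade(int x, int y);

/* Caller first, at the top of the "tree". */
int render_pixel(int x, int y)
{
    return shade(x, y) & 255;
}

/* Callee defined below its caller. */
static int shade(int x, int y)
{
    return x * 3 + y * 5;
}
```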

added on the 2013-04-07 01:19:14 by yumeji

i learned something myself about coding. even tho irrelevant. :D

added on the 2013-04-07 01:26:14 by yumeji

I doubt it would have been possible to achieve equally good results in http://www.hugi.scene.org/compo/ using C instead of Assembler.

added on the 2013-04-07 11:31:38 by Adok

Nice way to miss the point Adok. It's about modern 4k intros, not tiny DOS sorcery.

added on the 2013-04-07 17:23:13 by superplek

There are so many beneficial instructions introduced between the Intel 80186 and the Pentium Pro (pusha/popa/cmov/etc.), and I doubt that compilers take advantage of them by default, even when you specify the architecture to your compiler.
Has anyone already looked into that?
For example, I know the -march option for gcc, but I have never looked at the assembly output.
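
A tiny test case for checking this yourself, with a hypothetical function name. A simple select like the one below is a typical candidate for cmov. Compiling it with something like `gcc -m32 -Os -march=i686 -S` and again with `-march=i386`, then comparing the two .s files, shows whether the architecture switch changes anything; the result depends on the compiler version and flags.

```c
/* Hypothetical function: returns v, but never less than lo.
   A compiler targeting Pentium Pro or later may emit a compare
   plus cmov here instead of a conditional jump. */
int clamp_low(int v, int lo)
{
    return v < lo ? lo : v;
}
```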

added on the 2013-04-07 18:05:50 by stfsux

Compilers are fantastic at generating fast code, and somewhat decent at generating small code. But they are awful at generating compressible code. This is not because this is theoretically difficult to do for a compiler. This is just not what the available compilers were made for.

A few things to keep in mind regarding assembly for 4ks:
- Rewriting some of your code in assembly can give a small size benefit. Rewriting everything in assembly can give a huge size benefit, especially if the coding style is consistent.
- Write plain and regular code. Always use the same register save/restore sequence, always use the same register for the same thing, etc.
- The uncompressed code (and data) size is utterly irrelevant. If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.
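
The points above are about assembly, but the "plain and regular" idea can be sketched in C as well. This is only an illustration under assumptions: the function names are hypothetical, and whether the regularity survives compilation (and helps the compressor) has to be checked in the Crinkler report. The idea is that functions with the same signature and the same body shape tend to produce call sites and prologues that repeat byte for byte.

```c
/* Hypothetical effect functions: same signature, same shape. */
static float fx_a(float t, float p) { return t * p + 0.5f; }
static float fx_b(float t, float p) { return t * p - 0.5f; }
static float fx_c(float t, float p) { return t * p * 0.5f; }

/* Every call site looks the same: two float arguments in,
   one float accumulated. */
float mix_all(float t)
{
    float r = 0.0f;
    r += fx_a(t, 1.0f);
    r += fx_b(t, 2.0f);
    r += fx_c(t, 4.0f);
    return r;
}
```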

added on the 2013-04-07 21:47:31 by Blueberry

thread!

Blueberry's words are funny/reassuring as it is exactly my experience writing 1ks in JavaScript.

added on the 2013-04-07 21:58:08 by p01

Quote:

If someone asks you how big your code is uncompressed, and you cannot honestly answer "I don't know", you are doing it wrong.

But but but Crinkler keeps shoving that data in my face when it's working... :(

added on the 2013-04-08 12:42:58 by Gargaj

I know, but I constantly watch the /PROGRESSGUI instead of staring at the output console. Only when it's finished, I see the damned number and go FFUUUUUU.

added on the 2013-04-08 12:52:38 by xTr1m

I think that if someone plans to make a couple of 4kb intros and likes the asm challenge, writing a custom C compiler/editor with strict size optimization in mind could be quite interesting and challenging.

After all, as rrrola showed in this thread, it turns out to be a puzzle of replacing things with whatever takes fewer bytes and moving things around. But it could also be done by rewriting the algorithm to produce different intermediate values that end up doing the same thing, or by making a function more generic and reusing it for video and sound, for example, so that only 1 function does 2+ things.

I think being able to track the size cost of each function as you write it in C, within a custom editor that shows you the .ASM result live, might be useful and fun. I am just proposing it, but I know this editor+compiler would take a lot of time, and the 4kb intro would still not be written once it starts to work.
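
As a sketch of the "one function for video and sound" idea, here is a single integer triangle wave shared by an audio path and a video path. All names and constants are hypothetical and only meant to show the shape of the reuse.

```c
/* Shared generator: maps a phase (taken modulo 256) to a triangle
   that rises from 0 to 127 and falls back to 0. */
static int tri(int phase)
{
    phase &= 255;
    return phase < 128 ? phase : 255 - phase;
}

/* Audio: sample n of a tone, centred around zero. */
int audio_sample(int n)
{
    return tri(n * 4) - 64;
}

/* Video: brightness of pixel column x at frame f. */
int pixel_value(int x, int f)
{
    return tri(x + f) * 2;
}
```

The intro then carries only one copy of the wave code, and both call sites reuse it.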

added on the 2013-04-08 16:10:58 by F-Cycles

Well, there is the interactive GCC thingie here: http://gcc.godbolt.org/

You can edit your code, and see the assembly output immediately. Might be useful.

added on the 2013-04-08 16:50:50 by Scali

Someone make the same for VC++ with Crinkler. :-)

added on the 2013-04-08 19:12:27 by rrrola

That interactive thing is pretty cool. Too bad the "intel syntax"-option is greyed out :D

added on the 2013-04-08 19:20:15 by Preacher

@Preacher "intel syntax" is enabled if you change the compiler to g++

added on the 2013-04-08 21:48:21 by giddy

Wow! Interactive/live Crinkling would be the shit!

added on the 2013-04-08 21:54:51 by p01

http://www.beaengine.org/ would make it possible..

added on the 2013-04-08 22:33:59 by trc_wm
