Interesting paper for sizecoders
category: general [glöplog]
I stumbled upon a paper that may be of interest to other sizecoders. The authors evaluate various architectures for code density and report the results. Perhaps predictably, i386 seems to be high on the list of architectures capable of producing dense (i.e. tiny) code.
If it doesn't take into account code density _after_ compression/entropy coding, then it's worth nothing for size coding. :) Who cares if your code is a billion gazillion bytes _before_ compression, as long as it compresses down to 1 kilobyte.
orbitaldecay: thanks!
The paper is more about CPU design, where cache size plays a role (smaller code -- more code in cache -- faster). And again, what size is worth packing? Is there any code packing in 256-byters or 128-byters?
Interesting paper, yes. They say they have a 64 byte LZSS decoder... Might be. How much was Crinkler's arithmetic decoder part again? There are so many options. I'm not convinced my platform and toolchain are the best for 1k size. But then again, compo entries just have to get done. I have yet to see an entry that consists only of speculation about the best platform and that did well in a compo. ;) Which party will be the first to have a "dreams and speculations" compo? It would have lots of entries for sure.
But anyway. Interesting paper.
Quote:
If it doesn't take into account code density _after_ compression/entropy coding, then it's worth nothing for size coding. :) Who cares if your code is a billion gazillion bytes _before_ compression, as long as it compresses down to 1 kilobyte.
But the ideas still apply to the decompression stub :)
Also, I think in an abstract sense an executable compressor is like a JIT virtual machine (albeit with a very complex, high entropy instruction encoding!)
Quote:
The paper is more about CPU design, where cache size plays a role (smaller code -- more code in cache -- faster). And again, what size is worth packing? Is there any code packing in 256-byters or 128-byters?
<= 256 bytes was mostly the level of size coding I was thinking of when I read the paper. I found the part on how instruction set features correlate with code density particularly interesting. A lot of these ideas could be applied to creating an IBNIZ-style virtual machine :)
nice paper
Z80 has unaligned ld/st, I think? Writing ld (&2001),hl does work as it should.
How do they count x86 regs? I think there are more than 8, but most of them are weird segment, offset and other non-orthogonal things. I always had a better feeling on Z80 than on x86.
I was expecting Z80 to have good scores, although x86 will do better in intros because of the FPU.
I experimented with packing in a 256b intro for Z80. There are 39 byte LZSS unpackers for Z80 out there :) I doubt it would be practical in 128b though.
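For anyone wondering how an unpacker fits in a few dozen bytes, the whole decode loop is basically this (a C sketch with a made-up bit/byte layout, not the format of any particular Z80 unpacker):

/* Minimal LZSS decode loop in C, just to show the shape of the thing.
   The exact layout (flag bit order, offset/length sizes, stream
   termination) is an assumption for illustration only. */
#include <stdint.h>
#include <stddef.h>

size_t lzss_unpack(const uint8_t *src, uint8_t *dst)
{
    uint8_t *out = dst;
    uint8_t flags = 0;
    int nbits = 0;

    for (;;) {
        if (nbits == 0) { flags = *src++; nbits = 8; }  /* refill flag byte */
        int is_match = flags & 1;
        flags >>= 1;
        nbits--;

        if (!is_match) {
            *out++ = *src++;                 /* literal: copy one byte */
        } else {
            uint8_t offset = *src++;         /* distance back, 1..255 */
            uint8_t length = *src++;         /* match length; 0 ends the stream */
            if (length == 0) break;
            while (length--) {               /* copy from already-written output */
                *out = out[-offset];
                out++;
            }
        }
    }
    return (size_t)(out - dst);
}

Each of those steps maps onto a couple of Z80 instructions, which is how you end up with decoders in the 39-64 byte range mentioned in this thread.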
I laugh every time I see a tiny intro packing thread =)
Well... it's reasonable that i386 is dense. How many 1-, 2- and 3-byte opcodes do we have for the algorithms there? Those instructions are smaller than the others. And how much memory do you need to address for things, and how do you address it? Compute what? Integer ranges? A small binary? Is it fast? Still... compression is not mentioned in the paper, and it squeezes any longer code down small. Depends, though... on some things. :)
There seems to be a bit of confusion between LZSS and LZ4.
On my blog, in the "Assembly Art" thumbnail section, one may find:
- a 30-byte 68k LZSS snippet from Francois "Zerkman^Sector1" Galea;
- a 152-byte 65k lz4unp snippet by Peter "Qkumba" Ferrie;
- a 79-byte x86 lz4decomp snippet by Jim "Trixter" Leonard & Peter "Qkumba" Ferrie;
- various other compression/decompression schemes from many people.
@Introspec, I'd be interested in links supporting "There are 39 byte LZSS unpackers for Z80 out there", please.
I more or less agree with g0blinish's comment, in the sense that, to my knowledge and for arbitrary code that has not been specifically organized/modified for the packer (not graphics or sound), saving around 15 bytes only becomes feasible starting from the 512-byte category.
In the 256b or 128b categories, compression is usually unusable; for example, compressing the winner of this year's Outline sizecoding competition with aPLib (whose compression ratio is superior to LZ4's) yields:
Quote:
■ Opening files
- Input type : COM
- Code size : 128 bytes
■ Allocating memory
■ Packing code -> 139 bytes
■ Code mover added -> 20 bytes
■ Depacker added -> 99 bytes
■ Freeing memory
■ Closing files
- No output file specified ... writing to 'OUT.COM'
Done ... compression ratio: -101% (128 bytes -> 258 bytes)
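(That is, the 139-byte packed stream plus the 20-byte code mover and the 99-byte depacker add up to 258 bytes, more than double the original 128.)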
This test does not mean anything. You must change the way you code. You cannot compress size-optimized code; you have to change the approach completely if you'd like to work with the compressor.
Funnily enough, I was in a similar position just over a year ago, going on record and saying that hand-optimized is the best way to go. I am less sure now, because I can see the power of both approaches.
So-called size-optimized code compresses very badly, at least on x86. A data compressor will find a more efficient coding for the operations in your program than the fixed x86 opcodes.
"A data compressor will find a more efficient coding for the operations in your program than the fixed x86 opcodes." This is where my problem is - it is absolutely not clear why this should be the case. I do not understand Crinkler's technology, but I know what LZxx-based schemes do well. Most compressors are pretty dumb, especially those available at lower end platforms. They often do not recognize even some pretty glaring opportunities for packing. So, when using a compressor in this situation, you are not so much relying upon the compressor to come up with good solutions, you actually code differently, in an intentionally dumb way, to help the compressor.
For example, a simple loop that repeats 3 times may be more efficiently compressed as unrolled code, because your (longer) code will not need to set up the counter and will also lose a conditional jump.
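To put rough numbers on it: on Z80, a counter load plus the loop jump (e.g. ld b,n and djnz) is 4 bytes of overhead, while a short match in a simple byte-oriented LZSS stream typically costs only 2-3 bytes (a flag bit plus offset and length), so the unrolled-then-packed version can come out ahead. Exact numbers obviously depend on the packer.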
Yes, no compression algorithm can beat (or has beaten) hand-crafted optimized code in the 128-byte or 256-byte tiny-size categories, probably because your brain is the best optimizer/compressor.
If someone has the link to the "64 byte LZSS decoder" mentioned in the paper, could you please either mail it to me or post the URL for it? Thank you in advance.
Baudsurfer, this is the compressor that I was talking about: http://retrolandia.net/foro/showthread.php?tid=82.
And I would like to reiterate - a well-optimized intro should not compress, whatever its size. My 1K MGDMA does not compress using compressors available on the platform, so the redundancy of hand-crafted code can be quite low, whatever the size. However, if you decide to use a compressor, the game changes completely. The decompressed code of 256b Down with the Rules may look very suboptimal, but in fact it was written while checking every step with the compressor, to be as compressor-friendly as possible.
I am sure that more can be done in this respect tool-wise. An assembler that understands which instructions can be swapped without changing the result, generates all possible permutations of a given (short) program, and then attempts to compress every one of them could be an interesting option.
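A very rough C sketch of the idea, not an existing tool: permute instructions that are known to be order-independent and keep whichever ordering packs smallest. Here packed_size() just deflates the source text with zlib (link with -lz) as a crude stand-in for "assemble the candidate and run it through the real packer":

#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* Four instructions that are genuinely order-independent (they load
   different registers and do not affect each other). */
static const char *lines[] = { "ld a,7", "ld b,3", "ld hl,16384", "ld de,22528" };
#define N 4

static unsigned long best = (unsigned long)-1;

/* Crude stand-in for "assemble this ordering and run the packer on the
   binary": here we just deflate the concatenated source text. */
static unsigned long packed_size(const char **prog, int n)
{
    unsigned char src[256], dst[512];
    unsigned long slen = 0;
    for (int i = 0; i < n; i++)
        slen += (unsigned long)sprintf((char *)src + slen, "%s\n", prog[i]);
    uLongf dlen = sizeof dst;
    compress2(dst, &dlen, src, slen, Z_BEST_COMPRESSION);
    return dlen;
}

static void swap(const char **a, const char **b) { const char *t = *a; *a = *b; *b = t; }

/* Heap's algorithm: visit every permutation of prog[0..k-1],
   remembering the ordering that packs smallest. */
static void permute(const char **prog, int k, int n)
{
    if (k == 1) {
        unsigned long s = packed_size(prog, n);
        if (s < best) {
            best = s;
            for (int i = 0; i < n; i++) printf("%s\n", prog[i]);
            printf("-> %lu packed bytes\n\n", s);
        }
        return;
    }
    for (int i = 0; i < k - 1; i++) {
        permute(prog, k - 1, n);
        swap(&prog[(k % 2 == 0) ? i : 0], &prog[k - 1]);
    }
    permute(prog, k - 1, n);
}

int main(void) { permute(lines, N, N); return 0; }

The interesting (and hard) part is of course the first step, knowing which reorderings are actually safe; the search loop itself is trivial.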
Thank you for the paper.
Might be of interest for someone who is currently choosing a platform?
Quote:
a well-optimized intro should not compress, whatever its size
Except if you care about size AND speed (e.g. unrolling loops on CPUs without caches)?
This is precisely why I decided to go for the compressor approach in the first place. It seemed the best way to avoid compromising on speed.
As I understand it, Exomizer is tuned for Z80 and 6502 opcodes.
I do not think Exomizer is useful for very short programs. The decompressor is too long. The last time I tried to make a 1K with a compressor, Exomizer saved maybe 50 bytes compared to ZX7, but its decompressor was 100 bytes longer, so it was a net loss.
@introspec, as I remember, you said the size of the packed code depends on the data.
I worked on the 'interference' 1k, which was packed with ZX7, and the packed size does depend on how the code is written, e.g.
ld bc,768:ldir
versus
ld b,3:ldir
give different results.
Anyway, packing a tiny intro doesn't make sense.
Uncompressibility does not mean that it's necessarily well optimized. It has just been coded or obfuscated so that the compressor cannot find a shorter decodable representation.
As long as some kind of decompressor routine fits in the size limit, then it's worth considering. It also gives many more possibilities for data-intensive means of expression, as opposed to code-intensive.
Ultimately I guess we're talking about something like Kolmogorov complexity. But in the case of demos, the "input string" is not fixed at all. Anything goes as long as it's impressive. Compogorov complexity?