OpenGL framework for 1k intro

category: code [glöplog]

I'm getting enough spam already, so I'm not putting my email address here, but you can find it by running laturi (or by looking at the screenshot of laturi on the webpage).

added on the 2014-10-22 08:27:36 by ts

Quote:

Then of course your "ExitProcess(NULL);" is all you need to end.

You don't even need that. Just return 0; from your entrypoint will work fine (the OS code that calls into your entrypoint will use the return value to exit the process).
ExitProcess() is mainly to force an exit from anywhere in your program.
In this case you only want to exit from your mainloop on keypress, so just returning from your entrypoint should be no problem.

added on the 2014-10-22 10:29:30 by Scali

Oh, pet peeve time again: ExitProcess(NULL) is semantically broken: ExitProcess() expects a UINT for the exit code. NULL is meant to describe a pointer with value 0, not an integer.
So it should read ExitProcess(0);

added on the 2014-10-22 10:31:26 by Scali

ExitProcess will also kill your threads (e.g. for audio rendering). Without it, your process will never quit

added on the 2014-10-22 13:40:01 by xTr1m

Ah yes, in that case it may be smaller than some quick hack to kill all threads (e.g. having some global 'exit' variable, and all threads constructed like while (!exit) ).
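
For reference, a minimal sketch of the 'exit variable' pattern mentioned above. It uses POSIX threads and C11 atomics only so that it can be built and run outside Windows (build with -pthread); a Windows intro would create its threads differently. The names quit_flag, worker and run_and_stop_worker are made up for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static atomic_int quit_flag;           /* 0 = keep running, 1 = exit requested */
static atomic_long worker_iterations;  /* only here so the loop does something */

/* Each thread polls the flag between small units of work. */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&quit_flag)) {
        /* Stand-in for one short unit of work, e.g. one audio block. */
        atomic_fetch_add(&worker_iterations, 1);
    }
    return NULL;
}

/* Starts one worker, requests exit, then waits for it. Returns 0 on success. */
int run_and_stop_worker(void)
{
    pthread_t thread;

    atomic_store(&quit_flag, 0);
    if (pthread_create(&thread, NULL, worker, NULL) != 0)
        return -1;

    atomic_store(&quit_flag, 1);        /* what the main loop would do on keypress */
    return pthread_join(thread, NULL);  /* 0 once the worker has left its loop */
}
```

The flag is only checked between iterations, so a worker stuck in one long call will not notice it until that call returns.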

added on the 2014-10-22 14:29:33 by Scali

So what if a thread calls a VERY long function, e.g. _4klang_render, which creates the whole sound buffer? Until that one finishes, I can't check any exit variable...

...but the main thread can call ExitProcess upon registering a VK_ESCAPE press.

added on the 2014-10-22 15:53:31 by xTr1m

If you use ExitProcess(), then you can leave out the final ret instruction.

added on the 2014-10-22 16:25:59 by yzi

Scali

Quote:

You don't even need that. Just return 0; from your entrypoint will work fine (the OS code that calls into your entrypoint will use the return value to exit the process).

Does this actually have any advantage?

At least in FreeBSD/Linux this does not seem to work. In my earlier tests, if you define your own entry point in ELF headers, calling ret does not seem to return anywhere, instead it will just crash the program.

Of course this might also be because I erase the entry and exit directives from _start() to save space.

I've found it more size-efficient overall to simply issue a system call to terminate the program. On x86 (again in FreeBSD / Linux) this can be done directly from C with:

Code:#define asm_exit() asm("int $128" : /* no output */ : "a"(1))

This does not specify the return code, so program return value is undetermined, but that hardly matters. Now you can erase any push / other save instructions that would be necessary upon entry to _start and any pop / other restore instructions that would have to be done before calling ret.
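
The macro above leaves the exit status unspecified. A minimal sketch of a variant that also passes a status, assuming the Linux system call conventions (32-bit: eax = 1, ebx = status via int 0x80; 64-bit: rax = 60, rdi = status via syscall). FreeBSD passes arguments differently, so this is Linux-only. The names raw_exit and exit_status_of_child are made up for this illustration, and the fork() wrapper exists only so the result can be observed from a normal program.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Terminate the calling process with a raw system call (Linux on x86 only). */
static void raw_exit(int status)
{
#if defined(__linux__) && defined(__i386__)
    /* 32-bit ABI: eax = 1 (exit), ebx = status. */
    __asm__ volatile("int $0x80" : : "a"(1), "b"(status));
#elif defined(__linux__) && defined(__x86_64__)
    /* 64-bit ABI: rax = 60 (exit), rdi = status. */
    __asm__ volatile("syscall" : : "a"(60), "D"(status) : "rcx", "r11", "memory");
#endif
    _exit(status);  /* fallback for other platforms; not reached on x86 Linux */
}

/* Runs raw_exit() in a child process and returns the status the parent sees,
 * or -1 if the child could not be created or did not exit normally. */
int exit_status_of_child(int status)
{
    int wstatus = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        raw_exit(status);  /* child: never returns */
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Setting the status costs an extra register load, which is exactly the kind of byte a 1k intro would rather not spend.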

added on the 2014-10-22 17:22:54 by Trilkk

Quote:

Does this actually have any advantage?

Well, I was thinking of size... If you want to use ExitProcess, it needs to be imported, so at the least you need to store the string "ExitProcess" somewhere. So it's a lot more bytes than just a 'ret'.

Quote:

At least in FreeBSD/Linux this does not seem to work. In my earlier tests, if you define your own entry point in ELF headers, calling ret does not seem to return anywhere, instead it will just crash the program.

Could be... you could look through the sources and see what the CRT normally does when you exit from main().
I know that ret works in DOS and Windows at least.

Quote:

Of course this might also be because I erase the entry and exit directives from _start() to save space.

Ah yes, that may be an issue, you might have to preserve some registers (I don't think you have to preserve anything other than esp in DOS/Windows).

added on the 2014-10-22 18:31:34 by Scali

Quote:

Ah yes, that may be an issue, you might have to preserve some registers (I don't think you have to preserve anything other than esp in DOS/Windows).

Confirmed my suspicions for FreeBSD. Just returning with 'ret' from an entry point you set yourself in the ELF header is not possible. It will always crash.

Looking at CRT sources, it makes sense. crt1_c.c actually calls the following in _start1:

Code:exit(main(argc, argv, env));

exit() will in turn call _exit(), which is a syscall.

This means erasing _start intro and outro and then executing the syscall yourself indeed is the best way to go.

added on the 2014-10-29 19:21:34 by Trilkk

Quote:

This means erasing _start intro and outro and then executing the syscall yourself indeed is the best way to go.

I prefer int3, it is one byte opcode and exits, unless you have debugger attached...
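
A minimal sketch of what that looks like from the outside, assuming Linux on x86: with no debugger attached, int3 raises SIGTRAP, whose default action terminates the process. The names trap_exit and signal_that_killed_child are made up for this illustration; the fork() wrapper is only there so the effect can be observed from a normal program, and raise(SIGTRAP) is a stand-in on non-x86 targets.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ends the process through a breakpoint trap instead of an exit call. */
static void trap_exit(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ volatile("int3");  /* single-byte opcode 0xCC */
#else
    raise(SIGTRAP);            /* stand-in where int3 does not exist */
#endif
    _exit(1);                  /* only reached if SIGTRAP did not terminate us */
}

/* Runs trap_exit() in a child process. Returns the number of the signal
 * that killed the child, 0 if it exited normally, or -1 on error. */
int signal_that_killed_child(void)
{
    int wstatus = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        trap_exit();  /* child: never returns */
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    return WIFSIGNALED(wstatus) ? WTERMSIG(wstatus) : 0;
}
```

The process ends abnormally, so a shell will report a signal death rather than a clean exit status.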

added on the 2014-10-29 19:46:53 by ts

Quote:

I prefer int3, it is one byte opcode and exits, unless you have debugger attached...

Neat, thanks. Saves not only one byte for int 128, but also the instruction required to move '1' into eax.

I also noticed that since you can give the program headers all (rwx) permissions in the PT_LOAD phdr, you can actually write your .bss data over your header memory. It does not matter that the header contents are essentially corrupted, since scouring libraries is the first thing to do, and after that they are useless anyway.

Current size of flow2: 790 bytes.

added on the 2014-10-29 21:24:05 by Trilkk

Quote:

I also noticed that since you can give the program headers all (rwx) permissions in the PT_LOAD phdr, you can actually write your .bss data over your header memory. It does not matter that the header contents are essentially corrupted, since scouring libraries is the first thing to do, and after that they are useless anyway.

The best way to manage the bss section/segment is to add your actual bss size to the p_memsz field of your PT_LOAD. If p_filesz != p_memsz, the loader will fill the difference with 0's.
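
A minimal sketch of such a program header, assuming a 32-bit ELF. The struct repeats the field order of Elf32_Phdr from the ELF specification so that the example is self-contained; the names phdr32 and make_load_phdr, the load address and the sizes are made up for this illustration.

```c
#include <stdint.h>

/* Same field order as Elf32_Phdr; redefined here to stay self-contained. */
typedef struct {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
} phdr32;

/* Values from the ELF specification. */
enum { SEG_PT_LOAD = 1, SEG_PF_X = 1, SEG_PF_W = 2, SEG_PF_R = 4 };

/* Builds one PT_LOAD entry that maps the whole file and reserves
 * bss_size extra bytes after it, which the loader zero-fills. */
phdr32 make_load_phdr(uint32_t load_addr, uint32_t file_size, uint32_t bss_size)
{
    phdr32 ph = {0};

    ph.p_type   = SEG_PT_LOAD;
    ph.p_offset = 0;                      /* map from the start, headers included */
    ph.p_vaddr  = load_addr;
    ph.p_paddr  = load_addr;
    ph.p_filesz = file_size;              /* bytes actually present in the file */
    ph.p_memsz  = file_size + bss_size;   /* the extra bytes become the bss */
    ph.p_flags  = SEG_PF_R | SEG_PF_W | SEG_PF_X;
    ph.p_align  = 0x1000;
    return ph;
}
```

For example, make_load_phdr(0x08048000, 1024, 65536) describes a 1024-byte file with 64 KiB of zeroed memory directly behind it.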

Quote:

Current size of flow2: 790 bytes.

hehe nice :)
Which compression? lzma or zip?

added on the 2014-10-31 00:40:56 by stfsux

Quote:

The best way to manage the bss section/segment is to add your actual bss size in p_memsz field of your PT_LOAD. If p_filesz != p_memsz, the loader will fill the difference with 0's.

I'm actually doing this: dnload#Fake_.bss_section. It's just that if you reuse the header space, you can make very small programs smaller when compressed, as some values, such as entry point address, appear more than once. Larger memory blocks of course need to still go into this actual fake .bss section.

The documentation is out of date, and should be updated regarding findings from this thread.

Would also like to include flow2 as an example now that it pretty much became the 1k compression benchmark... but I think I'd need to contact auld for that.

Quote:

Which compression? lzma or zip?

LZMA, see page 2 of this thread.

added on the 2014-10-31 12:42:57 by Trilkk

I put the Flow2 shader in my 1k framework, total compressed size is 754 bytes.
Actual code+data is 393 bytes and the rest is the header/decompressor.

I think a different compressor specific for very small files would be better at this point but I have no idea how to make something like that.

added on the 2014-10-31 15:32:45 by drift

That is for Windows of course. Not sure how ts got it 100 bytes smaller. Is the overhead on Linux that much smaller?

added on the 2014-10-31 15:36:12 by drift

I think Mac is leading in 1k currently. Small overhead, and standard way to get nice sounds. Linux kind of lacks the sound part of it.

added on the 2014-10-31 18:34:35 by yzi

@Trilkk: ah yeah right.
@drift: +1. Let's make a PAQ exe compressor for elf!

added on the 2014-11-01 20:24:59 by stfsux

Well, mac sucks in the graphics API department. At least if you want to use some of the new fancy stuff. And DX11 with just a single compute shader should be pretty small, that might be a good alternative to a GL based framework. Maybe not exactly only for 1k - but for some serious stuff ;).

I recommend abandoning 1k completely, going for some more high tech shit and learning some new techniques - and then doing an intro about it.

added on the 2014-11-01 21:39:55 by las

I have written a context-modeling compressor for linux/elf recently, which I hope to release soon with full source - in its current state I would say it sucks for 1k (I get 995 bytes for flow2, but that is without any shell script or writing to /tmp, and also with a somewhat suboptimal best context search). It works on the same principle as crinkler - i.e. it replaces ld in that you pass it a .o file and it produces a binary. I sadly don't have any good comparison base right now to compare it to crinkler, since I would be really curious to see how bad my compressor fares compared to the one in crinkler.
On 4k I seem to do better, trying to build Yog Sototh gets me a smaller binary than the released one.

added on the 2014-11-02 21:35:39 by minas

Quote:

Well, mac sucks in the graphics API department. At least if you want to use some of the new fancy stuff.

Please update your knowledge, we are no longer in 2007, when we had to use GL 2.0: Mavericks/Yosemite has GL 4+ and although it doesn't have all the latest gimmicks, it is not that far behind anymore. I would even call it a modern graphics pipeline. Not perfect but getting better...

Quote:

I have written a context-modeling compressor for linux/elf recently, which I hope to release soon with full source - in its current state I would say it sucks for 1k

A few questions: What is the size of the decompressor and does it support dividing the input as sections?

Dividing the input into sections seems to be something that improves the compression ratio (that little nudge between a meh result and a good one).

Also, speaking about context modeling. How do you model them? static models? how do you find out which ones are best?

Do you use squash()/stretch() for probabilities? If you do, do you calculate them in x87 or some other way? (This is very painful in my implementation)

And how much memory is required?

added on the 2014-11-03 14:27:01 by ts

Quote:

What is the size of the decompressor and does it support dividing the input as sections?

Current size: 215 bytes of ELF header and required tables, 219 bytes of decompressor, for a total of 434 uncompressed bytes at the beginning of the binary (this could obviously be compressed by using lzma/whatever and dropping to /tmp, but that's specifically what I don't want to do here). Then there's 105 bytes of dynamic linking that go into the compressed code.
The current state is somewhat optimized, but there's definitely potential to kill off more bytes yet.
Sections: The linker currently simply concatenates the sections present in the object file in a reasonable order, but does not attempt reorderings (I understand crinkler does try that, so I might have a go at that as well). A nice feature of gcc is that using -Os causes constants to be sorted into different sections based on their type, so I do get some reasonable preordering that way.

Quote:

Also, speaking about context modeling. How do you model them? static models? how do you find out which ones are best?

I select four contexts, where a context is composed of the byte currently being decoded plus up to two bytes from the preceding 7 bytes. The four contexts are then weighted and added together. I currently use an exhaustive search (i.e. 7 nested loops) to pick what works best, but I have a heuristic on the way to vastly speed that up.
The bit counts are of the forgetful form also seen in PAQ (i.e. +=1, /=2).
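
A minimal sketch of one way to read the description above. The post only states "+=1, /=2" and "weighted and added together"; the exact update rule (increment the counter of the observed bit, halve the other one), the saturation at 255, the 12-bit probability scale and the names bit_counts, counts_update, counts_after and predict_one are all assumptions made for this illustration, not the actual compressor.

```c
#include <stdint.h>

/* Per-context statistics: n[0] counts zero bits, n[1] counts one bits. */
typedef struct { uint8_t n[2]; } bit_counts;

/* "Forgetful" update: reinforce the observed bit, halve the opposite count. */
void counts_update(bit_counts *c, int bit)
{
    bit = bit != 0;
    if (c->n[bit] < 255)
        c->n[bit] += 1;   /* += 1, saturating (assumption) */
    c->n[!bit] /= 2;      /* /= 2 */
}

/* Convenience helper: feeds a string of '0'/'1' characters into fresh counts. */
bit_counts counts_after(const char *bits)
{
    bit_counts c = {{0, 0}};
    for (; *bits; ++bits)
        counts_update(&c, *bits == '1');
    return c;
}

/* Mixes four contexts by a weighted sum of their counts and returns the
 * probability of a one bit scaled to 0..4095. Weights are assumed small
 * enough that the sums do not overflow 32 bits. */
uint32_t predict_one(const bit_counts ctx[4], const uint32_t weight[4])
{
    uint32_t ones = 1, total = 2;  /* start at 1/2 so empty counts are neutral */
    for (int i = 0; i < 4; ++i) {
        ones  += weight[i] * ctx[i].n[1];
        total += weight[i] * (ctx[i].n[0] + ctx[i].n[1]);
    }
    return ones * 4096u / total;
}
```

With these rules, three one bits followed by a zero bit leave both counts at 1: the old evidence is cut in half as soon as it is contradicted.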

Quote:

Do you use squash()/stretch() for probabilities? If you do, do you calculate them in x87 or some other way?

No x87 anywhere.

Quote:

And how much memory is required?

161 MB. 4 x 32MB for the context counts since I am not using hashtables, 32MB for the compressed and decompressed code (and also to avoid having yet another constant around), and 1 MB rounding error :-P

Code being cleaned up for an initial release, coming real soon now :-)

added on the 2014-11-03 15:45:39 by minas

Quote:

Please update your knowledge[...]

Hi, nice opener!

Quote:

Mavericks/Yosemite has GL 4+ and although it doesn't have all the latest gimmicks, it is not that far behind anymore.

It depends how you define not that far behind. It's like years behind.

I made a short timeline for you, enjoy:
GL 4.0 & GL 4.1 - 2010 <--- OS X 10.9 is here, most probably even 10.10
GL 4.2 - 2011
GL 4.3 - 2012
GL 4.4 - 2013
GL 4.5 - 2014 <--- We are here...

added on the 2014-11-03 18:07:36 by las

The sound APIs that have been used in 1k intros aren't the hottest new tech either. You are most welcome to do better 1k intros with whatever hottest newest libraries you want. So what if Mac OpenGL is this or that, I don't understand why it should matter, unless you can demonstrate how the missing bits would make the result noticeably more impressive. Syncing music and visuals is pretty impressive by the way. What OpenGL version is that in?

added on the 2014-11-03 18:56:26 by yzi

Quote:

sound APIs

apples and oranges...

added on the 2014-11-03 19:43:05 by red
