offtopic: ffmpeg sse2 alignment fix - asm help
category: general [glöplog]
Hi guys! While hunting down horrible random crashes in ffmpeg, a library for compressing and decompressing video, I found the following fix posted:
Code:
#define VHALIGNCALL16(x) \
{\
asm("movl %esp, %ebx");\
asm("andl $0xfffffff0, %esp");\
asm("subl $12, %esp");\
asm("pushl %ebx");\
x;\
asm("popl %ebx");\
asm("movl %ebx, %esp");\
}
VHALIGNCALL16(bytesFilled = avcodec_encode_video(codecContext, target, maxTargetSize, avframe));
I don't expect many people here to be into ffmpeg, but as I have precisely zero asm knowledge, I was wondering if anyone could tell me what exactly the above piece of asm code does. It seems to have something to do with stack alignment, which in turn stops my avcodec_encode_video call from crashing badly. It works fine, but I'd love to know how ugly a hack this is and whether it is safe to use.
ehh
ok that was fucked up. preview me beautiful :(
i meant:
Quote:
#define VHALIGNCALL16(x) \
{\
asm("movl %esp, %ebx");\
asm("andl $0xfffffff0, %esp");\
asm("subl $12, %esp");\
asm("pushl %ebx");\
x;\
asm("popl %ebx");\
asm("movl %ebx, %esp");\
}
VHALIGNCALL16(bytesFilled = avcodec_encode_video(codecContext, target, maxTargetSize, avframe));
bugreport!! [ code ] seems to screw things up!
Easy: it aligns the stack pointer to 16 bytes so that the function you call has its local variables (which go onto the stack) properly aligned, which is needed for e.g. SSE code.
It does that by first rounding the stack pointer down to the next 16-byte boundary, moving it another 12 bytes down, then pushing the original stack pointer value onto the stack (4 more bytes, 16 in total), so that the pointer is aligned to 16 bytes again.
After the call it simply pops the original stack pointer value back into %ebx and resets the stack pointer with it.
thanks! but won't that overwrite values that were already on the stack, once the stack pointer has been moved down? or do stacks just grow downwards? :-)
That's right, the stack grows downwards on x86.
which way is up/down depends on the coordinate system.
Btw: the funny thing is that the actual function call will store lots of things on the stack itself (return address, saved registers, ...) AFTER the alignment, so it's not "local variable space will have a 16-byte aligned border" but rather "local variable space will have some arbitrary alignment with respect to the next 16-byte border, but you can be sure it'll at least stay the same every time, so pad your local variables with stuff until it works". :)
wouldn't it be simpler to just pass a pointer to an aligned variable instead of all the stack crap? :> just wondering. but yep, IMO the best way is to use a good aligned malloc routine, and local __m128 vars should already be aligned to 16 by your compiler... even if, to align local vars to 16 bytes, the compiler has to do similar crap behind your back to make it work.
keyj, kb, thanks!
nystep, "should be aligned by the compiler" indeed, but gcc doesn't do it on windows before version 4.2 - and for some reason, mingw stable is still version 3. this hack seems easier than updating 14 noobs' compiler settings. :-)