Tiny Intro Toolbox Thread
category: code [glöplog]
Your jokes are bad and you should feel bad. ;)
Oh fuck! With the pseudo-code syntax I didn't realize how short rrrola's 0xCCCD trick was in ASM. REALLY NEAT!
kkrunchy source code
kkrunchy_k7 source code
both include older but reasonably size-optimized versions of disfilter; the newer version described in my blog post generally works better, but it was done after i was actively working on kkrunchy.
oops, wrong thread :) sorry about that.
...and of course I meant DosBox Turbo, not DropBox. ...silly me, *GG*, *SS*, giggle, tee-hee, and Anders, you can just shut up!
Small sine approximations (19 bytes Taylor for 2*Pi and 9 bytes parabola for Pi/2)
because the smallest homogenized sine from the FPU must be around 30 bytes
Code:
; 19 bytes sine table approx from 0 to 2*Pi : 255 values amplitude=53 [-26;0;+26]
_sin:
mov bx,ax ; bx=ax, so bl=al=x
imul bl ; ax=x*x (signed 8-bit multiply)
mov al,ah ; al=(x*x)>>8
imul bl ; ax=((x*x)>>8)*x
mov al,ah ; al=(((x*x)>>8)*x)>>8, roughly x^3/65536
shr bx,2 ; bx=bx/4 (x/4 when ah was 0 on entry)
add al,13 ; al=al+13 (bias)
sub ax,bx ; ax=ax-bx
db 0d4h,64 ; aam 64 : ah=al/64, al=al%64
ret
Code:
;9 bytes parabola
_sin: ; sin(x)=(x*4/pi)-x*x*(4/(pi*pi)) if x>0, from http://www.coranac.com/2009/07/sines/ ; in deg : sin(x)=(10*x-x*x)/6000*scale
; sin(x)=(x*4/pi)+x*x*(4/(pi*pi)) if x<=0
mov bx,ax ; al=bl=x
mul al ; ax=x*x
xchg ax,bx ; ax=x bx=x*x
aad ; ax=10*x (2 bytes smaller than mov dl,10 + mul dl)
sub ax,bx ; ax=10*x-x*x if x>0 else branch/replace by add ax,bx or use absolute value trick
mov al,ah
ret ; ah=6000*scale, using scale=23,4375 ah=256*al
Download pics and sourcecode
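As a rough check of the parabola idea above, here is a small Python sketch (not from the original post) of the floating-point formula, folded into one expression with x*|x| so it covers both signs. The function names are made up for illustration, and it models the math only, not the 8-bit register version.

```python
import math

def parabola_sin(x):
    """Parabolic sine approximation on [-pi, pi]: 4x/pi - 4x|x|/pi^2."""
    return (4.0 / math.pi) * x - (4.0 / math.pi ** 2) * x * abs(x)

def max_error(steps=10000):
    """Largest absolute deviation from math.sin over [-pi, pi]."""
    worst = 0.0
    for i in range(steps + 1):
        x = -math.pi + 2.0 * math.pi * i / steps
        worst = max(worst, abs(parabola_sin(x) - math.sin(x)))
    return worst

if __name__ == "__main__":
    # Exact at the peaks, a few percent off in between.
    print(parabola_sin(math.pi / 2), max_error())
```

The approximation hits 1 and -1 exactly at the peaks; the worst deviation sits at roughly 27 degrees from each zero crossing.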
Also
Code:
in al,60h
das
jp loop
is 1 byte shorter than :
Code:
in al,60h
dec al
jnz loop
although it will exit on all keypresses and not just the <esc> key.
Abductee uses this.
absolute JP vs conditional branch? hmmm.
JP = "Jump if parity"
Baudsurfer:
Code:
in al,60h
das
jp loop
exits directly when started from the console on XP (same for Abductee).
Sensenstahl,
It is sad when microcode for a mnemonic behaves differently from one x86 cpu to another. This code works as expected here on both ntvdm and DosBox-0.74 on Intel(R) Core(TM)2 Duo :
Code:
org 100h
esc:in al,60h
dec al
jnz esc
ret
Quote:
"Description: This instruction incorrectly documented in Intel's materials. See description field. [src : http://asm.inightmare.org/opcodelst/index.php?op=AAA]"
I would assume this applies to aaa, aas, das and daa microcode interpretation as well.
A wild guess would be that the microcode was indeed wrongfully referenced by Intel at the beginning and emulators started implementing that theoretical wrongful interpretation without testing it, quickly followed by a correction of the chip makers themselves. If that was the case it could explain the behaviour difference between old and new x86 cpus.
But of course this is only pure speculation (unlike Intel's wrong initial documentation which is proven).
In the code above, I pasted the wrong snippet.
This is the code tested that works here physically and is one byte less than dec al/jnz (other 1 byte BCD opcode combinations exist) - although it does not work in "emulated" DosBox-0.74 :
Quote:
; works physically for Intel(R) Core(TM)2 Duo
org 100h
esc:in al,60h
aaa
jz esc
ret
Guys, you shouldn't assume anything about sign, zero, parity or overflow after aaa/aas/daa/das (so no jp/jz/js). They only predictably affect carry and auxcarry (and their operation depends on auxcarry, which "in al,60h" doesn't affect).
DAA: if (A0 > 9) AF = 1; if (AF) AL += 0x06; if (A1 > 9) CF = 1; if (CF) AL += 0x60;
DAS: if (A0 > 9) AF = 1; if (AF) AL -= 0x06; if (A1 > 9) CF = 1; if (CF) AL -= 0x60;
AAA: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 += 0x06, AH++; }
AAS: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 -= 0x06, AH--; }
where A0 is AL&0x0F and A1 is AL>>4.
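For anyone who wants to play with the flag results, here is a Python sketch of DAA and DAS. It follows my reading of the operation pseudo-code in Intel's instruction set reference (which compares the old AL against 0x99 instead of testing the upper nibble alone), and it has not been checked against real hardware or any emulator. The function names and the returned tuple layout are assumptions made for this example.

```python
def parity_flag(value):
    """PF is set when the low byte has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0

def daa(al, cf=False, af=False):
    """Model of DAA; returns (AL, CF, AF, PF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af, parity_flag(al)

def das(al, cf=False, af=False):
    """Model of DAS; returns (AL, CF, AF, PF)."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        cf = old_cf or al < 6          # borrow out of AL - 6
        al = (al - 6) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = True
    return al, cf, af, parity_flag(al)

if __name__ == "__main__":
    # Scan code 0x01 (ESC make code) with AF and CF clear on entry.
    print(das(0x01))
```

With AF and CF clear, `das(0x01)` leaves AL at 0x01 with PF clear, so a following `jp` would fall through. The result depends on the incoming AF and CF, which `in al,60h` does not set.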
In Intel Instruction Set Reference for DAA, DAS all flags except OF are actually defined.
In that case, please carry on.
Quote:
Also
Code:
in al,60h
das
jp loop
is 1 byte shorter than :
Code:
in al,60h
dec al
jnz loop
although it will exit on all keypresses and not just the <esc> key.
Abductee uses this.
Quote:
In Intel Instruction Set Reference for DAA, DAS all flags except OF are actually defined.
added on the 2014-10-16 15:43:28 by frag
Frag And Řrřola : it is a relief to learn first-hand I am not totally crazy ;)
Started as a mail to Optimus, but i thought sharing this could help others
Quote:
Just look at the code of "Hypnoteye"
http://www.pouet.net/prod.php?which=65604
The tricks are :
1.) Horizontal segment shift (rrrola)
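Adding 1 to the segment adds 16 to the horizontal pixel offset. One screen row is 320 bytes = 20 segment steps,
so by shifting the segment by n * 20 + 10 you center the image horizontally (the code below uses 70 = 3 * 20 + 10).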
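The arithmetic behind the shift can be checked with a few lines of Python. This is only a sketch; `segment_shift` is a made-up helper, and the constants assume mode 13h (320 bytes per row).

```python
SCREEN_WIDTH = 320   # bytes per row in mode 13h
PARAGRAPH = 16       # bytes covered by one segment step

def segment_shift(paragraphs):
    """Return (rows, pixels) the image moves when ES is lowered by `paragraphs`."""
    return divmod(paragraphs * PARAGRAPH, SCREEN_WIDTH)

if __name__ == "__main__":
    # 0xa000 - 70 moves the image by 3 rows plus half a row.
    print(segment_shift(70))
```

Any odd multiple of 10 gives the half-row (160 pixel) offset; an even multiple shifts by whole rows only.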
2.) The 0xCCCD trick (rrrola)
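Multiplying DI (the current pixel position) by 0xCCCD leaves the screen coordinates in DX:AX:
the row lands in DH and the column (scaled by 256/320) in DL, with AH as its fraction.
That gives two 8.8 fixed-point values, DX and DL:AH.
For example (5,5) = (5*320+5) = (1605) = (0x645)
0x645 * 0xCCCD = 0x05040141
X 05.04 (DH.DL)
Y 04.01 (DL.AH)
Note : read as signed words, these values are now between -32768 and 32767 ;)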
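To see why this works for the whole screen and not just the example, here is a Python sketch of the multiplication. `cccd_split` is a hypothetical helper name; it simply mimics what `mul di` leaves in DH, DL and AH.

```python
def cccd_split(di):
    """Return (DH, DL, AH) after: mov ax,0xCCCD / mul di."""
    product = (di * 0xCCCD) & 0xFFFFFFFF   # DX:AX
    dx, ax = product >> 16, product & 0xFFFF
    return dx >> 8, dx & 0xFF, ax >> 8

if __name__ == "__main__":
    # Pixel (5,5): DI = 5*320+5 = 1605
    print([hex(v) for v in cccd_split(1605)])
```

0xCCCD / 65536 is just above 0.8 = 256/320, so DH comes out as the row number and DL as the column scaled to 0..255 for every on-screen offset.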
3.) Centering vertically
By a funny coincidence, if you start your code with "push <word>", its opcode 0x68 is the
first byte of the program, which is what [SI] points to if you don't touch that register
(SI normally holds 0x100 at startup). 0x68 = 104 is about half of the screen height, so you
can subtract it from the Y coordinate, which saves a byte
over "sub dh,100" ;) For finding such coincidences, this map is very helpful
http://sparksandflames.com/files/x86InstructionChart.html
4.) Stack Addressing (rrrola and frag)
If you keep the register BX at zero, or make it zero by the time you use this trick,
you can efficiently get your registers onto the FPU. That's possible because the stack
pointer SP is almost always the same on startup http://www.fysnet.net/yourhelp.htm
Just do a "pusha/popa" instead of several pushes/pops and address via [BX-??]
See the stack order here : http://x86.renejeschke.de/html/file_module_x86_id_270.html
You also get easy access to "<byte> << 8" and "<byte> >> 8" terms, without the need
to actually perform something like "mov al,ah" or "shr ax,8" which saves (1-2) bytes.
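The odd offsets are easier to follow with the stack layout written out. Below is a Python sketch that models memory after "pusha", assuming SP = 0xFFFE at startup and BX = 0 (both are assumptions of this model, as are the helper names).

```python
import struct

def pusha_image(ax, cx, dx, bx, bp, si, di, sp=0xFFFE):
    """Return a 64 KiB memory image after PUSHA with the given registers."""
    mem = bytearray(0x10000)
    # PUSHA order: AX, CX, DX, BX, original SP, BP, SI, DI
    for value in (ax, cx, dx, bx, sp, bp, si, di):
        sp = (sp - 2) & 0xFFFF
        struct.pack_into("<H", mem, sp, value)
    return mem

def word_at(mem, bx, disp):
    """Read the 16-bit word at [BX+disp], like fild word [bx+disp] would."""
    return struct.unpack_from("<H", mem, (bx + disp) & 0xFFFF)[0]

if __name__ == "__main__":
    mem = pusha_image(ax=0x1234, cx=0x5678, dx=0x9ABC, bx=0, bp=0, si=0x100, di=0)
    print(hex(word_at(mem, 0, -8)))   # DX
    print(hex(word_at(mem, 0, -9)))   # DL in the high byte, BH in the low byte
    print(hex(word_at(mem, 0, -5)))   # AL in the high byte, CH in the low byte
```

An odd displacement straddles two saved registers, which is where the free shift by 8 comes from.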
5.) Constant Overlap
Instead of using 4 bytes of precision for floats, use 2! Append your constant right after
your code and leave the lower 2 bytes out. Calculate the hexadecimal representation
of your float HERE : http://www.h-schmidt.net/FloatConverter/IEEE754.html
That's almost always precise enough.
Advanced : Find places in your code where the first two bytes of the constant already
exist, and refer to those places. Saves 4 bytes if done right. I pulled this off once in
Hypnoteye ;) (label mm) Again, to know where your desired numbers - or
good estimations of them - are, use this map
http://sparksandflames.com/files/x86InstructionChart.html
Protip : Sometimes switching the sign of the constant helps ;)
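If you prefer a script over the web converter, here is a Python sketch that extracts the two upper bytes of a float and shows what the FPU reads back. The helper names are invented for this example; the test value 256/pi matches the "arc/pi*256" comment and the "db 0xa2,0x42" bytes in the Hypnoteye source.

```python
import math
import struct

def upper_two_bytes(value):
    """Top 16 bits of the IEEE 754 single, in little-endian memory order."""
    return struct.pack("<f", value)[2:]

def widen(two_bytes, low_bytes=b"\x00\x00"):
    """Value read as a dword when the two bytes before the constant are `low_bytes`."""
    return struct.unpack("<f", low_bytes + two_bytes)[0]

if __name__ == "__main__":
    stored = upper_two_bytes(256 / math.pi)
    print(stored.hex(), widen(stored), widen(stored, b"\xff\xff"))
```

Whatever code bytes happen to sit in front of the constant only move the value between the two printed bounds, which is well under one percent here.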
6.) Black Color
You might know this one. If you want a black pixel, do a SALC. No matter what the flag
states are, AL is 0x00 or 0xFF afterwards, which both map to black when you use the
default palette. One byte is one byte! ^^
7.) Dithering to make smooth animations
This one is funky. Since "stosb" doesn't set any flags on overflow
(see here : http://x86.renejeschke.de/html/file_module_x86_id_306.html)
you need to add something yourself anyway. I found that two "inc di" work well
enough. It basically draws every third pixel and then repeats this 2 times at
a shifted offset. After three frames, the zero flag finally triggers and increments
the REAL frame counter by one. 2 bytes for triple diagonal interlacing, so to speak ^^
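The interlacing pattern can be reproduced in Python. This sketch only tracks DI through "stosb / inc di / inc di / jnz", assuming DI starts at 0; `dither_order` is a made-up name.

```python
def dither_order():
    """Offsets written by: stosb / inc di / inc di / jnz, until DI wraps to 0."""
    order, di = [], 0
    while True:
        order.append(di)          # stosb writes at DI
        di = (di + 3) & 0xFFFF    # stosb (+1) and two inc di (+2)
        if di == 0:               # zero flag from the second inc di
            return order

if __name__ == "__main__":
    order = dither_order()
    print(len(order), order[:3], order[21846], order[-1])
```

Because 3 and 65536 share no common factor, every offset of the 64K segment is written exactly once before the zero flag fires: pass one covers 0, 3, 6..., pass two starts at 2, pass three at 1.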
8.) Alternative palette generation
Something everybody should look at : the interrupt 10h alternative to setting colors
via OUT (0x3C8, 0x3C9). Since setting the video mode already uses interrupt 10h, you can
save two bytes here and weave the color generation into setting the video mode.
See : http://webpages.charter.net/danrollins/techhelp/0144.HTM
That's it for now. You should be able to integrate your raytracing loop into the
space between "pusha" and "popa". Loading the coordinates is already there
(remember, between -32768 and +32767). Storing the result back onto the stack and
then fetching it is also there (to AL and CH, see, I already used an implicit shift
by 8 there). I will copy this into the tiny toolbox thread as well, for everyone ;)
Greets, HellMood
Code:
org 100h
push 0xa000 - 70 ; rrrolas trick I
pop es ; (advanced)
mm:
or al,0x13 ; 320x200 in 256 colors
cwd
L: ; custom palette
add cl,1
int 0x10
mov ax,0x1010
add ch,4
add dh,8
inc bx
jnz L
X:
mov ax,0xcccd ; rrrolas trick II
mul di
sub dh,byte [si]
pusha
fninit
fild word [byte bx-8] ;<- dh+dl ; x
fst st1 ; x x
fmul st0 ; x*x x
fild word [byte bx-9] ;<- dl+bh ; y x*x x
fst st3 ; y x*x x y
fmul st0 ; y*y x*x x y
faddp st1 ; y*y+x*x x y
fsqrt ; r x y
fidivr dword [byte si+mm-3-0x100] ; invr x y
fistp word [byte bx-5] ;-> al+ch ; x y
fpatan ; arc
fmul dword [byte si+num-2-0x100] ; arc/pi*256
fistp word [byte bx-8] ;-> arc -> dx ;
popa
add dx,bp ; time influence
shr dx,2
test al,0x80 ; inner bound
jnz colskip
cmp al,0x1e ; outer bound
jg F2
salc
jmp short colskip
F2:
add ax,bp ; time influence
xor ax,dx ; arc influence
colskip:
stosb ; write
inc di ; dither
inc di
jnz X
inc bp ; next frame
...
in al,0x60 ; check for ESC
dec al
jnz X
ret ; exit
num:
db 0xa2,0x42
Phoenix/Hornet reminded me of this thread :)
Some creative shorter loops for specific cases on x86 (might come in handy)
;5 bytes :
mov cx,2
twice:... /w non-CF
loop twice
;4 bytes (needs ch=0) :
mov cl,2
twice:... /w non-CF
loop twice
;3 bytes :
twice:... /w non-CF
cmc
jc twice
For a 3-times loop one can also inc the accumulator and use PF with jnp, for example
00000001b PF=0 first run
00000010b PF=0 second run
00000011b PF=1 third run
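Both flag-driven loops are easy to get wrong by one, so here is a Python sketch that counts the passes. It assumes CF is clear and AL is 0 on entry and that the loop body touches neither; the function names and the "thrice" label are invented for this example.

```python
def run_cmc_loop():
    """twice: ... / cmc / jc twice   (CF clear on entry, body leaves CF alone)"""
    cf, passes = False, 0
    while True:
        passes += 1           # loop body runs
        cf = not cf           # cmc
        if not cf:            # jc falls through once CF is clear again
            return passes

def run_parity_loop():
    """thrice: ... / inc al / jnp thrice   (AL is 0 on entry)"""
    al, passes = 0, 0
    while True:
        passes += 1           # loop body runs
        al = (al + 1) & 0xFF  # inc al
        if bin(al).count("1") % 2 == 0:   # PF=1: even number of set bits
            return passes

if __name__ == "__main__":
    print(run_cmc_loop(), run_parity_loop())
```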
Code:
mov al,13h
int 10h
push 0a000h
pop es
;pop ds
;ds (backbuffer) contains randomness
mov ds, [0]
fnstcw [w]
or dword [w], 0xc00 ;truncate
fldcw [w]
;32160
AGAIN:
xor di,di
xor si,si
mov byte [ds:32160], 15 ;plot in center
fld dword [a]
fsincos
fmul dword [p]
fstp dword [c]
fstp dword [s]
; xor si,si
; xor di,di
mov word [y],199
Y: mov ax,[width] ;load the value 320, not the address of the label
mul word [y]
mov word [yaddr],ax ;yaddr = y*width
mov word bx,[y]
sub bx,100
mov word [y_],bx
mov word [x],319
X mov word bx,[x]
sub bx,160
mov word [x_],bx
fld dword [c]
fimul word [x_]
fld dword [s]
fimul word [y_]
fsubp st1,st0
fistp word [u]
fld dword [c]
fimul word [y_]
fld dword [s]
fimul word [x_]
fsubp st1,st0
fistp word [v]
mov dx,[r] ;load the previous random value, not the address of the label
add dx,3
xor bx,dx
rol bx,5
xchg dx,bx
mov [r],word dx
mov cl,dh
and dx,1
and cl,1
add dl,cl
dec dx
add word [u],160
add word [u],dx
add dh,bl
and dh,1
add dh,cl
dec dh
add byte [v],100
add byte [v],dh
xor ax,ax
mov al,[v]
mul word [width]
mov [vaddr],ax ;vaddr=v*width
;buffer[x+yaddr]=backbuffer[u+v*width]
;lodsb ;al=ds:si
;stosb ;es:di=al
mov si,[x]
add si,[yaddr]
mov al,[ds:si]
mov di,[u]
add di,[vaddr]
mov [es:di],al
dec word [x]
jns X
dec word [y]
jns Y
;buffer = es:di
;backbuffer = ds:si
;for i=0 to size backbuffer[i]=buffer[i]
xor si,si
xor di,di
mov cx,64000
L mov al,[es:di]
mov [ds:si], al
inc si
inc di
loop L
jmp AGAIN
ret
ret
w: dw 0
a: dw 0 ;angle
p: dw 0 ;amplitude
c: dd 0.0 ;cos
s: dd 0.0 ;sin
r: dw 0 ;rand
x: dw 0
y: dw 0
x_: dw 0
y_: dw 0
u: dw 0
v: db 0
width: dw 320
yaddr: dw 0
vaddr: dw 0
unfinished and borked fractal rotozoomer. if anyone fixes/finishes it, give me a peep. creds are due. i'm too lazy to finish it..
Things that I would love to have when coding small intros (but that don't exist) :
- FPU instruction for loading a float from a 16-bit value to the stack (Half float and minifloat are cool!)
- CALL rel8 (similar to short jumps)
to create your custom palette the way (almost?) every source tells you:
Code:
palette:
mov dx,3c8h ;index port
out dx,al ;al = #of your color
inc dx ;data port is at 3c9h
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]
but if you want to change all 256 colors you can skip
Code:
mov dx,3c8h
out dx,al ;al = #of your color
inc dx
and use those bytes for your effect. just access the data port and set your 256 colors:
Code:
mov dx,3c9h
palette:
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]