Tiny Intro Toolbox Thread

category: code [glöplog]

Your jokes are bad and you should feel bad. ;)

added on the 2012-06-06 02:18:11 by rrrola

Oh fuck! With the pseudo-code syntax I didn't realize how short rrrola's 0xCCCD trick was in ASM. REALLY NEAT!

added on the 2012-06-08 13:58:37 by p01

Is this any good? DropBox Turbo for Android platform

I really didn't bother to test it myself ;)

added on the 2012-06-08 19:11:22 by maytz

kkrunchy source code
kkrunchy_k7 source code

both include older but reasonably size-optimized versions of disfilter; the newer version described in my blog post generally works better, but it was done after i was actively working on kkrunchy.

added on the 2012-06-09 06:30:58 by ryg

oops, wrong thread :) sorry about that.

added on the 2012-06-09 06:31:28 by ryg

...and of course I meant DosBox Turbo, not DropBox. ...silly me, *GG*, *SS*, giggle, tee-hee, and Anders, you just shut up!

added on the 2012-06-09 09:25:34 by maytz

Small sine approximations (19 bytes Taylor for 2*Pi and 9 bytes parabola for Pi/2)
because smallest homogenized sine from FPU must be around 30 bytes

Code:


; 19 bytes sine table approx from 0 to 2*Pi : 255 values amplitude=53 [-26;0;+26]
_sin: 
mov bx,ax       ; al=bl=x
imul bl         ; ax=x*x (signed)
mov al,ah       ; al=(x*x)>>8
imul bl         ; ax=((x*x)>>8)*x
mov al,ah       ; al=high byte of that product, roughly x*x*x/65536
shr bx,2        ; bx=x/4 (when bh=0)
add al,13       ; al=al+13
sub ax,bx       ; ax=ax-x/4
db 0d4h,64      ; aam 64 : ah=al/64, al=al%64
ret

Code:


;9 bytes parabola
_sin:          ; sin(x)=(x*4/pi)-x*x*(4/(pi*pi)) if x>0, from http://www.coranac.com/2009/07/sines/ ; in deg : sin(x)=(10*x-x*x)/6000*scale
               ; sin(x)=(x*4/pi)+x*x*(4/(pi*pi)) if x<=0
mov bx,ax      ; al=bl=x
mul al         ; ax=x*x
xchg ax,bx     ; ax=x bx=x*x
aad            ; intended: ax=10*x (2 bytes smaller than mov dl,10 + mul dl); note that aad computes al=ah*10+al, ah=0
sub ax,bx      ; ax=10*x-x*x if x>0 else branch/replace by add ax,bx or use absolute value trick
mov al,ah      ; al=ax/256
ret            ; the /256 stands in for /6000*scale, i.e. scale=6000/256=23.4375
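
For reference, the parabola the 9-byte routine is based on can be checked in floating point. This is only a sketch of the formula from the comment above, not a model of the asm itself; the function names are made up for the illustration, and the abs() is one way to fold the x>0 and x<=0 cases into a single expression.

```python
import math

def parabola_sin(x: float) -> float:
    """Parabolic sine approximation, meant for x in [-pi, pi]."""
    # (4/pi)*x - (4/pi^2)*x*|x| : abs() covers both sign cases at once
    return (4.0 / math.pi) * x - (4.0 / math.pi ** 2) * x * abs(x)

def max_abs_error(samples: int = 10001) -> float:
    """Largest deviation from math.sin on an even grid over [-pi, pi]."""
    worst = 0.0
    for i in range(samples):
        x = -math.pi + 2.0 * math.pi * i / (samples - 1)
        worst = max(worst, abs(parabola_sin(x) - math.sin(x)))
    return worst

if __name__ == "__main__":
    print(f"max abs error on [-pi, pi]: {max_abs_error():.4f}")
```

The curve is exact at 0, +-pi/2 and +-pi and stays within a few percent of full scale in between, which is usually fine for a tiny intro.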

Download pics and sourcecode

added on the 2013-12-31 15:52:29 by Baudsurfer

Corrected link to output pics : view.

There is also an old 40-byte fixed-point sine generator gem by KarL/NoooN, but I believe it is bigger than the FPU version.

added on the 2013-12-31 16:08:25 by Baudsurfer

Also

Code:

  
in al,60h
das
jp loop

is 1 byte shorter than :

Code:

  
in al,60h
dec al
jnz loop

although it will exit on all keypresses and not just the <esc> key.
Abductee uses this.
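
To see which values read from port 60h would keep the das/jp loop running, here is a small model. It follows the DAS pseudocode from Intel's instruction set reference as far as I know it; AF and CF on entry are not set by "in al,60h", so the defaults of 0 below are an assumption, and the function names are invented for this sketch.

```python
def parity_flag(value: int) -> int:
    """x86 PF: 1 when the low byte has an even number of set bits."""
    return 1 if bin(value & 0xFF).count("1") % 2 == 0 else 0

def das(al: int, af: int = 0, cf: int = 0):
    """Model of DAS. Returns (al, af, cf, pf, zf) after the instruction."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        cf = 1 if (old_cf or al < 6) else 0   # borrow out of AL - 6
        al = (al - 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = 1
    return al, af, cf, parity_flag(al), int(al == 0)

def loop_continues(scancode: int, af: int = 0, cf: int = 0) -> bool:
    """True when `jp loop` would be taken after `in al,60h` / `das`."""
    return das(scancode, af, cf)[3] == 1

if __name__ == "__main__":
    for code in (0x00, 0x01, 0x1E, 0x9C):
        print(f"{code:02X}h -> continues: {loop_continues(code)}")
```

In this model the exit condition depends only on the parity of the adjusted byte (and on the incoming AF/CF), not on a specific key.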

added on the 2014-10-14 18:31:07 by Baudsurfer

absolute JP vs conditional branch? hmmm.

added on the 2014-10-15 10:43:36 by g0blinish

JP = "Jump if parity"

added on the 2014-10-15 16:34:32 by bartman

Baudsurfer:

Code:

in al,60h
das
jp loop

exits directly when started from the console on XP (same for abductee).

added on the 2014-10-15 17:54:26 by sensenstahl

Sensenstahl,

It is sad when microcode for a mnemonic behaves differently from one x86 cpu to another. This code works as expected here either on ntvdm or DosBox-0.74 on Intel(R) Core(TM)2 Duo :

Code:

      
org 100h
esc:in al,60h
dec al
jnz esc
ret

Quote:

"Description: This instruction incorrectly documented in Intel's materials. See description field. [src : http://asm.inightmare.org/opcodelst/index.php?op=AAA]"

I would assume this applies to aaa, aas, das and daa microcode interpretation as well.

A wild guess would be that the microcode was indeed wrongfully referenced by Intel at the beginning and emulators started implementing that theoretical wrongful interpretation without testing it, quickly followed by a correction of the chip makers themselves. If that was the case it could explain the behaviour difference between old and new x86 cpus.

But of course this is only pure speculation (unlike Intel's wrong initial documentation which is proven).

added on the 2014-10-16 07:21:35 by Baudsurfer

In code above, I pasted wrong snippet.

This is the code tested that works here physically and is one byte less than dec al/jnz (other 1 byte BCD opcode combinations exist) - although it does not work in "emulated" DosBox-0.74 :

Quote:

; works physically for Intel(R) Core(TM)2 Duo
org 100h
esc:in al,60h
aaa
jz esc
ret

added on the 2014-10-16 07:33:53 by Baudsurfer

Guys, you shouldn't assume anything about sign, zero, parity or overflow after aaa/aas/daa/das (so no jp/jz/js). They only predictably affect carry and auxcarry (and their operation depends on auxcarry, which "in al,60h" doesn't affect).

DAA: if (A0 > 9) AF = 1; if (AF) AL += 0x06; if (A1 > 9) CF = 1; if (CF) AL += 0x60;
DAS: if (A0 > 9) AF = 1; if (AF) AL -= 0x06; if (A1 > 9) CF = 1; if (CF) AL -= 0x60;
AAA: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 += 0x06, AH++; }
AAS: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 -= 0x06, AH--; }
where A0 is AL&0x0F (the low nibble) and A1 is AL>>4 (the high nibble).

added on the 2014-10-16 13:13:52 by rrrola

In Intel Instruction Set Reference for DAA, DAS all flags except OF are actually defined.

added on the 2014-10-16 15:43:28 by frag

In that case, please carry on.

added on the 2014-10-16 18:14:59 by rrrola

Quote:

Also

Code:

in al,60h
das
jp loop

is 1 byte shorter than :

Code:

in al,60h
dec al
jnz loop

although it will exit on all keypresses and not just the <esc> key.
Abductee uses this.

Quote:

In Intel Instruction Set Reference for DAA, DAS all flags except OF are actually defined.
added on the 2014-10-16 15:43:28 by frag

Frag And Řrřola : it is a relief to learn first-hand I am not totally crazy ;)

added on the 2014-10-22 07:10:20 by Baudsurfer

Started as a mail to Optimus, but i thought sharing this could help others

Quote:

Just look at the code of "Hypnoteye"
http://www.pouet.net/prod.php?which=65604

The tricks are :

1.) Horizontal segment shift (rrrola)
Adding 1 to the segment adds 16 to the pixel offset. One row is 320 bytes = 20 segment
steps, so shifting the segment by n * 20 + 10 (whole rows plus half a row, e.g. 70 = 3 * 20 + 10)
centers the image horizontally.

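2.) The 0xCCCD trick (rrrola)
Multiplying DI (the current pixel position) by 0xCCCD splits it into screen coordinates:
the 32-bit result in DX:AX holds the row in DH and the column, scaled from 0..319 to 0..255,
in DL, with AH as extra fraction bits. The overlapping words DH:DL and DL:AH then serve as
the two coordinates.

For example (5,5) = (5*320+5) = (1605) = (0x645)
0x645 * 0xCCCD = 0x05040141

X 05.04
Y 04.01

Note : read as signed words, these values are between -32768 and 32767 ;)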

3.) Centering vertically
By a funny coincidence, if you start your code with "push <word>", the opcode
0x68 sits at the start, which is what [SI] points to if you don't touch that register.
That's about half of the screen height, so you can subtract it from the Y coordinate,
which saves a byte over "sub dh,100" ;) For finding such coincidences, this map is very helpful
http://sparksandflames.com/files/x86InstructionChart.html

4.) Stack Addressing (rrrola and frag)
If you keep the register BX at zero, or make it zero by the time you use this trick,
you can efficiently get your registers onto the FPU. That's possible because the stack
pointer SP is almost always the same on startup http://www.fysnet.net/yourhelp.htm
Just do a "pusha/popa" instead of several pushes/pops and address via [BX-??]
See the stack order here : http://x86.renejeschke.de/html/file_module_x86_id_270.html
You also get easy access to "<byte> << 8" and "<byte> >> 8" terms, without the need
to actually perform something like "mov al,ah" or "shr ax,8", which saves 1-2 bytes.

5.) Constant Overlap
Instead of using 4 bytes of precision for floats, use 2! Append your constant right after
your code and leave the lower 2 bytes out. Calculate the hexadecimal representation
of your float HERE : http://www.h-schmidt.net/FloatConverter/IEEE754.html
That's almost always precise enough.
Advanced : Find places in your code where the first two bytes of the constant already
exist, and refer to those places. Saves 4 bytes if done right. I pulled this off once in
Hypnoteye ;) (label mm) Again, to know where your desired numbers - or
good estimations of them - are, use this map
http://sparksandflames.com/files/x86InstructionChart.html
Protip : Sometimes switching the sign of the constant helps ;)

6.) Black Color
You might know this one. If you want a black pixel, do a SALC. No matter what the flag
states are, AL is 0x00 or 0xFF afterwards, which both map to black when you use the
default palette. One byte is one byte! ^^

7.) Dithering to make smooth animations
This one is funky. Since "stosb" doesn't set any flags when DI wraps around
(see here : http://x86.renejeschke.de/html/file_module_x86_id_306.html)
you need an extra instruction to detect the wrap anyway. I found that two "inc di" work well
enough. It basically draws every third pixel and then repeats this 2 times at
a shifted offset. After three frames, the zero flag finally triggers and increments
the REAL frame counter by one. 2 bytes for triple diagonal interlacing, so to speak ^^

8.) Alternative palette generation
Something everybody should look at : the interrupt 10h alternative to setting colors
via OUT (0x3C8, 0x3C9). Since setting the video mode uses interrupt 10h as well, you can
save two bytes here and weave the color generation into setting the video mode.
See : http://webpages.charter.net/danrollins/techhelp/0144.HTM
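
A quick way to convince yourself that trick 2 holds for every pixel of mode 13h is to brute-force it. The sketch below models "mov ax,0xcccd / mul di" with plain integers; the helper names are invented for the illustration.

```python
def split_offset(di: int):
    """Model of `mov ax,0xcccd` / `mul di`: returns (dh, dl, ah) of DX:AX."""
    product = (di & 0xFFFF) * 0xCCCD      # 32-bit result in DX:AX
    dh = (product >> 24) & 0xFF           # row
    dl = (product >> 16) & 0xFF           # column scaled to 0..255
    ah = (product >> 8) & 0xFF            # extra fraction bits
    return dh, dl, ah

def check_all_pixels(width: int = 320, height: int = 200) -> bool:
    """True when DH is the row and DL is column*256/width for every offset."""
    for di in range(width * height):
        y, x = divmod(di, width)
        dh, dl, _ = split_offset(di)
        if dh != y or dl != (x * 256) // width:
            return False
    return True

if __name__ == "__main__":
    print(split_offset(5 * 320 + 5))      # the (5,5) example from the text
    print(check_all_pixels())
```

The reason it works: 0xCCCD / 65536 is just above 0.8 = 256/320, so the upper word of the product is 256*y + 0.8*x.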

That's it for now. You should be able to integrate your raytracing loop into the
space between "pusha" and "popa". Loading the coordinates is already there
(remember, between -32768 and +32767). Storing the result back onto the stack, and
then fetching it, is also there (to AL and CH, see, I already used an implicit shift
by 8 there). I will copy this into the tiny toolbox thread too, for everyone ;)

Greets, HellMood
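
The byte offsets used by trick 4 are easier to follow with the stack drawn out. The sketch below rebuilds the memory image that "pusha" leaves behind, assuming SP = 0xFFFE at that point and BX = 0 (both are assumptions about the startup state, as described in the mail); the helper names are hypothetical.

```python
PUSHA_ORDER = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")

def pusha_image(regs: dict, sp: int = 0xFFFE) -> bytearray:
    """Return 64K of memory as it looks after `pusha` (little-endian words)."""
    mem = bytearray(0x10000)
    sp_at_entry = sp                      # pusha stores the SP value from before the push
    for name in PUSHA_ORDER:
        value = sp_at_entry if name == "sp" else regs[name]
        sp = (sp - 2) & 0xFFFF
        mem[sp] = value & 0xFF
        mem[sp + 1] = (value >> 8) & 0xFF
    return mem

def read_word(mem: bytearray, offset: int, bx: int = 0) -> int:
    """Model of `word [bx+offset]` with 16-bit address wraparound."""
    addr = (bx + offset) & 0xFFFF
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)

if __name__ == "__main__":
    regs = {"ax": 0x1122, "cx": 0x3344, "dx": 0x5566, "bx": 0,
            "bp": 0, "si": 0x100, "di": 0}
    mem = pusha_image(regs)
    print(hex(read_word(mem, -8)))   # DX
    print(hex(read_word(mem, -9)))   # DL in the high byte, BH in the low byte
    print(hex(read_word(mem, -5)))   # AL in the high byte, CH in the low byte
```

Odd offsets straddle two registers, which is where the free "shift by 8" comes from.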

Code:

org 100h
push 0xa000 - 70							; rrrolas trick I
pop es										; (advanced)
mm:
or al,0x13									; 320x200 in 256 colors
cwd
L:											; custom palette
add cl,1
int 0x10
mov ax,0x1010
add ch,4
add dh,8
inc bx
jnz L
X:
mov ax,0xcccd								; rrrolas trick II
mul di
sub dh,byte [si]
pusha
fninit
fild word [byte bx-8]		;<- dh+dl		; x
fst st1										; x x
fmul st0									; x*x x
fild word [byte bx-9]		;<- dl+bh		; y x*x x
fst st3										; y x*x x y									
fmul st0									; y*y x*x x y 
faddp st1									; y*y+x*x x y
fsqrt										; r x y
fidivr dword [byte si+mm-3-0x100]			; invr x y
fistp word [byte bx-5]		;-> al+ch		; x y
fpatan										; arc
fmul dword [byte si+num-2-0x100]			; arc/pi*256
fistp word [byte bx-8]		;-> arc -> dx	;
popa
add dx,bp									; time influence
shr dx,2
test al,0x80								; inner bound
jnz colskip
cmp al,0x1e									; outer bound
jg F2
salc
jmp short colskip
F2:
add ax,bp									; time influence
xor ax,dx									; arc influence
colskip:
stosb										; write
inc di										; dither
inc di
jnz X
inc bp										; next frame
...
in al,0x60									; check for ESC
dec al
jnz X
ret											; exit
num:
db 0xa2,0x42
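
The two bytes at "num" can be reproduced with a few lines of Python. This sketch (helper names are mine) takes the constant 256/pi used for "arc/pi*256" above, keeps only the upper two bytes of its float32 encoding, and shows what range of values the FPU can end up loading when the lower two bytes are whatever code happens to sit in front.

```python
import math
import struct

def top_two_bytes(value: float) -> bytes:
    """Upper 16 bits of the little-endian float32 encoding of value."""
    return struct.pack("<f", value)[2:]

def load_with_low_bytes(high: bytes, low: bytes = b"\x00\x00") -> float:
    """The float32 the FPU reads when `low` precedes the two stored bytes."""
    return struct.unpack("<f", low + high)[0]

if __name__ == "__main__":
    target = 256 / math.pi
    high = top_two_bytes(target)
    print(high.hex())                                   # bytes for the db line
    print(load_with_low_bytes(high, b"\x00\x00"))       # smallest possible value
    print(load_with_low_bytes(high, b"\xff\xff"))       # largest possible value
```

Only 7 mantissa bits survive, so the relative error stays below 2^-7 no matter which bytes precede the constant.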

added on the 2015-09-14 20:16:51 by HellMood
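
A small model of the "stosb / inc di / inc di / jnz" dithering from trick 7 in the post above, assuming DI starts at 0 (an assumption; the real startup value may differ, which only changes the first cycle). It counts how many pixels are written and how often DI wraps before the zero flag finally ends the inner loop. The function name is made up for the sketch.

```python
def writes_until_zero(step: int = 3, modulus: int = 0x10000):
    """Return (writes, wraps) until DI is exactly 0 again after a full step."""
    di = 0
    writes = 0
    wraps = 0
    while True:
        writes += 1                 # stosb stores one pixel at DI
        nxt = di + step             # stosb + inc di + inc di
        if nxt >= modulus:
            wraps += 1              # DI passed the end of the 64K segment
        di = nxt % modulus
        if di == 0:                 # only now does the last inc di set ZF
            return writes, wraps

if __name__ == "__main__":
    print(writes_until_zero())
```

Because 65536 is not divisible by 3, each pass over the segment starts at a different offset modulo 3, so three passes together touch every offset exactly once.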

Phoenix/Hornet reminded me of this thread :)

Some creative shorter loops for specific cases on x86 (may come in handy)

;5 bytes :
mov cx,2
twice:... /w non-CF
loop uio

;4 bytes :
mov cl,2
twice:... /w non-CF
loop uio

;3 bytes :
twice:... /w non-CF
cmc
jc uio

Also, for a 3-times loop one can inc the accumulator (starting from 0) and use PF with jnp, for example
00000001b PF=0 first run
00000010b PF=0 second run
00000011b PF=1 third run
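
Both flag-based loops can be traced with a tiny model. It assumes CF = 0 and AL = 0 on entry and a loop body that leaves CF (respectively AL) alone; the function names are made up for the sketch.

```python
def runs_with_cmc() -> int:
    """Model of `twice: ... / cmc / jc twice` entered with CF=0."""
    cf = 0
    runs = 0
    while True:
        runs += 1          # loop body (must not modify CF)
        cf ^= 1            # cmc
        if not cf:         # jc twice falls through once CF is clear again
            return runs

def runs_with_parity() -> int:
    """Model of `again: ... / inc al / jnp again` entered with AL=0."""
    al = 0
    runs = 0
    while True:
        runs += 1                              # loop body (must not modify AL)
        al = (al + 1) & 0xFF                   # inc al
        pf = bin(al).count("1") % 2 == 0       # PF=1 on an even number of set bits
        if pf:                                 # jnp again falls through when PF=1
            return runs

if __name__ == "__main__":
    print(runs_with_cmc(), runs_with_parity())
```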

added on the 2016-01-16 01:14:36 by Baudsurfer

Typo above : the 'uio' label refers ofc to the 'twice' label

added on the 2016-01-16 01:33:36 by Baudsurfer

Code:

mov al,13h
int 10h
push 0a000h
pop es
;pop ds

;ds (backbuffer) contains randomness
mov ds, [0]

fnstcw [w]
or dword [w], 0xc00	;truncate
fldcw [w]
	
;32160
AGAIN:
  xor di,di
  xor si,si
  mov byte [ds:32160], 15 ;plot in center

  fld dword [a]
  fsincos
  fmul dword [p]
  fstp dword [c]
  fstp dword [s]
;  xor si,si
;  xor di,di
  mov word [y],199
Y mov ax,[width]	;load the value 320, not the address of the label
  mul word [y]
  mov word [yaddr],ax	  ;yaddr = y*width
  mov word bx,[y]
  sub bx,100
  mov word [y_],bx
  mov word [x],319
X mov word bx,[x]
  sub bx,160
  mov word [x_],bx
  fld	dword [c]
  fimul	word [x_]
  fld	dword [s]
  fimul	word [y_]
  fsubp	st1,st0
  fistp	word [u]
  fld	dword [c]
  fimul	word [y_]
  fld	dword [s]
  fimul word [x_]
  fsubp st1,st0
  fistp word [v]
  mov dx,[r]		;load the random state, not the address of the label
  add dx,3
  xor bx,dx
  rol bx,5
  xchg dx,bx
  mov [r],word dx
  mov cl,dh
  and dx,1
  and cl,1
  add dl,cl
  dec dx
  add word [u],160
  add word [u],dx
  add dh,bl
  and dh,1
  add dh,cl
  dec dh
  add byte [v],100
  add byte [v],dh
  xor ax,ax
  mov al,[v]
  mul word [width]
  mov [vaddr],ax		;vaddr=v*width
  ;buffer[x+yaddr]=backbuffer[u+v*width]
    ;lodsb	;al=ds:si
    ;stosb	;es:di=al
    mov si,[x]
    add si,[yaddr]
    mov al,[ds:si]

    mov di,[u]
    add di,[vaddr]
    mov [es:di],al
  dec word [x]
  jns X
  dec word [y]
  jns Y

  ;buffer = es:di
  ;backbuffer = ds:si

  ;for i=0 to size backbuffer[i]=buffer[i]
  xor si,si
  xor di,di
  mov cx,64000
  L mov al,[es:di]
    mov [ds:si], al
    inc si
    inc di
  loop L

jmp AGAIN
ret


ret

w: dw 0
a: dd 0.0	;angle (read with fld dword)
p: dd 0.0	;amplitude (read with fmul dword)
c: dd 0.0	;cos
s: dd 0.0	;sin
r: dw 0		;rand
x: dw 0
y: dw 0
x_: dw 0
y_: dw 0
u: dw 0
v: dw 0		;fistp word [v] stores 16 bits
width: dw 320
yaddr: dw 0
vaddr: dw 0

unfinished and borked fractal rotozoomer. if anyone fixes/finishes it, give me a peep. creds are due. i'm too lazy to finish it..

added on the 2016-01-16 22:37:57 by rudi

Things that I would love to have when coding small intros (but that don't exist) :

- FPU instruction for loading a float from a 16-bit value to the stack (Half float and minifloat are cool!)
- CALL rel8 (similar to short jumps)

added on the 2016-12-11 01:29:30 by Tigrou

to create your custom palette the way (almost?) any source tells:

Code:


palette:
mov dx,3c8h ;index port
out dx,al   ;al = #of your color
inc dx      ;data port is at 3c9h
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]

but if you want to change all 256 colors you can skip

Code:


mov dx,3c8h
out dx,al       ;al = #of your color
inc dx

and use those bytes for your effect. just access the data port and set your 256 colors:

Code:


mov dx,3c9h
palette:
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]
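
Why skipping the index write works can be shown with a toy model of the DAC: after every third write to the data port the write index advances by itself. The sketch assumes the index is 0 when the loop starts (an assumption, for example right after setting the video mode); the class and its method names are invented for the illustration, and the grey ramp is just sample data.

```python
class DacModel:
    """Toy model of the VGA DAC write side (ports 3C8h and 3C9h)."""

    def __init__(self):
        self.palette = [[0, 0, 0] for _ in range(256)]
        self.index = 0          # assumed to be 0 when the intro starts writing
        self.component = 0      # 0=red, 1=green, 2=blue

    def out_3c8(self, value: int) -> None:
        self.index = value & 0xFF
        self.component = 0

    def out_3c9(self, value: int) -> None:
        self.palette[self.index][self.component] = value & 0x3F
        self.component += 1
        if self.component == 3:                 # auto-increment after blue
            self.component = 0
            self.index = (self.index + 1) & 0xFF

def fill_grey(dac: DacModel) -> None:
    """768 writes to the data port only, no index write at all."""
    for color in range(256):
        for _ in range(3):
            dac.out_3c9(color >> 2)             # 0..63 grey ramp

if __name__ == "__main__":
    dac = DacModel()
    fill_grey(dac)
    print(dac.palette[4], dac.palette[255], dac.index)
```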

added on the 2016-12-11 08:10:37 by sensenstahl

ah the typo: 3c9h and not 3ch9h :D

added on the 2016-12-11 08:13:11 by sensenstahl
