Tiny Intro Toolbox Thread

category: code [glöplog]

Here is what I have used in the past (it's pretty long but allows precise control over each RGB value):

Code:

salc                        ; AL = 0 (SALC sets AL from CF, so this assumes CF is clear)
mov	dx, 0x3c8   
out	dx, al
inc	dx
P:                          ; assume CX = 255
	mov	bl, 5
	call PAL
	mov	bl, 2
	call PAL
	mov	bl, 3
	call PAL
loop P

Code:


PAL:
mov	al, cl
not	al
mul	bl
shr 	ax, 2
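cmp	al, 63			; only AL is tested: results above 255 wrap instead of clamping (cmp ax, 63 would clamp them, one byte more)
jbe	clamp			; already in range, skip the clamp
	mov	al, 63
clamp: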
out	dx, al		
ret

This is roughly equivalent to:

Code:

for(i = 0 ; i < 256 ; i++)
{
  output(i * 2.5); //red
  output(i * 1); //green
  output(i * 1.5); //blue
}
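
For anyone who wants to check the numbers without booting DOS, here is a small Python model of the loop and the PAL routine above. It is only a sketch: `pal_component`, `build_palette` and the `full_clamp` switch are names made up for this illustration, and it assumes CX starts at 255 as the listing does.

```python
def pal_component(cl, bl, full_clamp=False):
    """Model of PAL: returns the 6-bit value written to port 0x3C9."""
    al = ~cl & 0xFF            # mov al, cl / not al
    ax = (al * bl) >> 2        # mul bl / shr ax, 2
    if full_clamp:             # hypothetical variant: cmp ax, 63
        return min(ax, 63)
    al = ax & 0xFF             # the listing only looks at AL
    return min(al, 63)         # cmp al, 63 / mov al, 63


def build_palette(multipliers=(5, 2, 3), full_clamp=False):
    """One (r, g, b) tuple per pass of 'loop P', CX counting 255 down to 1."""
    return [
        tuple(pal_component(cx, bl, full_clamp) for bl in multipliers)
        for cx in range(255, 0, -1)
    ]


if __name__ == "__main__":
    pal = build_palette()
    print(len(pal), pal[10], pal[100], pal[205])
```

In this model the loop runs 255 times, and red drops back to 0 from entry 205 on, because (205 * 5) >> 2 is 256 and only AL is compared. With `full_clamp=True` (comparing AX instead) it stays at 63.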

Of course, if you want RGB values to be simple multiples of AL and the result never exceeds 63, there are simpler ways (just shift AL right or left).

There should be articles about palettes on http://www.sizecoding.org/. No good 256b without a good palette.

added on the 2016-12-11 15:06:21 by Tigrou

In fact, I'm procrastinating heavily with said article ;) Just as a teaser, "Hypnoteye" uses subfunction 10h of INT 10h function 10h (AX=1010h, set individual DAC register), which allows for very short palette generation code (the same "int 10h" that sets the video mode is reused)

Code:

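(mov al,0x13)      ; with AH=0 the first int 10h sets mode 13h
L: add cl,1        ; blue
int 0x10           ; shared call: set mode on the first pass, set DAC register BX afterwards
mov ax,0x1010      ; INT 10h AX=1010h: BX=color index, DH=red, CH=green, CL=blue
add ch,4           ; green
add dh,8           ; red
inc bx             ; next color
jnz L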

added on the 2016-12-11 17:27:06 by HellMood

@HellMood: what is the shortest way to check (and jump if needed) whether the value in ST0 (FPU stack) is greater than zero?

I know about FTST but it requires a bunch of other instructions to make it work (too many, actually). I have also tried FCOMI but without success.

added on the 2016-12-11 17:56:08 by Tigrou

FCOMI is known not to work in DOSBox, I'd check here ->
http://www.pouet.net/topic.php?which=8791&page=5

added on the 2016-12-11 19:56:44 by HellMood

This jumps when ST0 ≥ 2^-133 and is shorter than ftst / fstsw ax / sahf / ja. It works by treating the more significant half of a float32 as a signed int16:

Code:

; needs di=-2 and ax=0 (or some other regs)
fst dword[bx+di]
cmp word[bx],ax
jg  Positive

You can compare with any simple float this way: you get the full sign/exponent and 1+7 bits of mantissa.

Code:

; needs di=-3 and al=0
fst dword[bx+di]
cmp byte[bx],al
jg  Positive

When using just the most significant byte you can compare with any 2^(2n+1).

added on the 2016-12-14 10:16:45 by rrrola

Negative floats are flipped, so they need to be tested with ja/jb.
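
To see which values actually take the jump, the comparison can be modelled in Python with `struct`. This is just a sketch under the assumptions stated in the snippets above (the float32 is stored so that its top word or top byte lands at [bx]); the function names are made up for the illustration.

```python
import struct


def high_word(x):
    """Upper 16 bits of x stored as float32, read as a signed int16."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    hi = bits >> 16
    return hi - 0x10000 if hi & 0x8000 else hi


def high_byte(x):
    """Upper 8 bits of x stored as float32, read as a signed int8."""
    hi = struct.unpack("<I", struct.pack("<f", x))[0] >> 24
    return hi - 0x100 if hi & 0x80 else hi


def jg_word(x, ax=0):
    """cmp word[bx], ax / jg: signed 'greater than' on the high word."""
    return high_word(x) > ax


def jg_byte(x, al=0):
    """cmp byte[bx], al / jg: signed 'greater than' on the high byte."""
    return high_byte(x) > al


if __name__ == "__main__":
    for value in (1.0, 0.0, -1.0, 2.0 ** -133, 2.0 ** -134):
        print(value, jg_word(value))
```

Comparing against another float means loading its high word into AX, e.g. `jg_word(x, high_word(1.0))`. The limited mantissa shows up here: 1.5 passes that test, 1.001 does not. For negative values the signed order is reversed (`high_word(-2.0)` is larger than `high_word(-1.0)`).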

added on the 2016-12-14 10:23:57 by rrrola

@rrrola: thanks for the trick. I also tried the following: fistp into a 16-bit var, then testing it using cmp (it's pretty short). It works, but there is some imprecision that produces visual glitches.

Any idea how this palette code works? (It's from quatro.)

Code:

			push 0xA000				; Start of VGA video memory
			pop es					; into ES
			xor bp,bp				; BP addressing, uses SS, frees DS, no extra segment needed
			mov al,0x13				; mode 13h, 320x200 in 256 colors
			mov dh,0x80				; high byte of offscreen memory, low byte not important
			mov ds,dx				; no palette influence (later) when DH = 0x80
			inc cx					; align color components / color number / color count
palette_loop:
			int 0x10				; shared int 10h ! (palette entry , set mode)
			sub ch,2				; adjust green value
			sub dh,4				; adjust red value
			dec bx					; next color
			mov ax,0x1010				; sub function to change palette
			loop palette_loop			; adjust blue value & loop

It produces a nice four-gradient palette.
I looked at the docs but couldn't find anything. AFAIK it's continuously calling int 10h with ah = 0x10 and al = 0x10.

added on the 2016-12-15 22:54:54 by Tigrou

Tigrou, that's basically the routine from "Hypnoteye"
mentioned above, but a bit optimized ;)

http://www.ctyme.com/intr/rb-0121.htm

You can also load a whole palette at once. If you load
your screen as palette you can achieve very very short
interesting effects.

Popshades 15b
http://www.ctyme.com/intr/rb-0122.htm

added on the 2016-12-15 23:09:39 by HellMood

D'oh! I remember seeing something about that palette trick somewhere but couldn't remember where exactly. I searched all the sizecoding.org tutorials and forgot to check this topic.
Thanks for the links btw.

added on the 2016-12-15 23:22:14 by Tigrou

folks, how is the grayscale palette obtained in Megapole by Baudsurfer? I've looked at the code but can't find it
http://olivier.poudade.free.fr/src/Megapole.asm

added on the 2017-10-19 11:24:36 by gorgh

He is using the 16 gray shades that already exist in the standard VGA palette (offset +16).

Critical code before writing to the screen (stosb):

Code:

  mov al,16                ; normalize with dithering add overlap ah=color/18+16 
  aad 1                    ; dithering normalized and prepare for next frame cwd
  test di,di               ; test for all pixels plotted overrunning vga segment
  jp o                     ; preserve zf flag and test if absolute beam position
  inc ax                   ; parity even augmenting lighting for odd meta-pixels
o:stosb                    ; write screen pixel & advance absolute beam position
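
The arithmetic in that snippet can be followed with a small Python model. It is a sketch only: it takes the shade in AH as given, and `aad`, `parity_even` and `megapole_pixel` are illustrative names, not anything from the original source.

```python
def aad(ah, al, base=10):
    """AAD imm8: AL = (AL + AH * imm8) & 0xFF, AH = 0. Returns (ah, al)."""
    return 0, (al + ah * base) & 0xFF


def parity_even(byte):
    """x86 PF: set when the low byte has an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


def megapole_pixel(ah, di):
    """Color index stored by stosb for a shade in AH at screen offset DI."""
    _, al = aad(ah, 16, base=1)      # mov al,16 / aad 1  ->  AL = AH + 16
    if not parity_even(di):          # test di,di / jp o skips the inc
        al = (al + 1) & 0xFF         # inc ax
    return al


if __name__ == "__main__":
    print([megapole_pixel(3, di) for di in range(8)])
```

So `aad 1` simply adds AH to the base index 16, and the parity of the low byte of DI decides whether a pixel gets one extra step of brightness, which gives the dithering.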

added on the 2017-10-19 11:35:30 by HellMood

thanks!

added on the 2017-10-19 11:40:30 by gorgh

hello again, is it safe to assume that a variable declared at the end of the code as

Code:

yvar dw ?

will be zero at startup?

added on the 2017-10-25 17:14:27 by gorgh

no, although it's almost safe to assume it works in DOSBox. I'd suggest placing vars outside the code only when you don't rely on any defined starting value. But you could reuse initial code as variables, so you would know the starting value ;)

added on the 2017-10-25 18:56:24 by HellMood

some other thingy...gave me a headache recently. As I tried to squeeze one byte from

Code:

mov bx,ax
shl bx,2

with

Code:

shld bx,ax,18

I realized that this worked only on DOSBox and my old AMD Sempron, but not on an Intel Core i5 or i7... It seems that what's written in the x86 manual, "...If the count is greater than the operand size, the result in the destination operand is undefined.", is true for some CPUs... oh well, the funny obstacles in sizecoding :-p ... just wanted to share this in case you try the same...

added on the 2017-11-04 00:02:06 by Kuemmel

off the top of my head,

Code:

imul bx,ax,byte 4

should do it in 3 bytes
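
As a quick cross-check of the three variants discussed here, the following Python sketch lists their encodings and models the arithmetic on 16-bit registers. The helper names are invented for the illustration, and `shld16` deliberately refuses counts above 16, since the manual leaves that case undefined.

```python
MASK16 = 0xFFFF

# Byte encodings of the three candidates (16-bit code).
ENCODINGS = {
    "mov bx,ax / shl bx,2": bytes([0x89, 0xC3, 0xC1, 0xE3, 0x02]),
    "shld bx,ax,18": bytes([0x0F, 0xA4, 0xC3, 0x12]),
    "imul bx,ax,byte 4": bytes([0x6B, 0xD8, 0x04]),
}


def mov_shl(ax, count=2):
    """mov bx,ax / shl bx,count: returns the new BX."""
    return (ax << count) & MASK16


def imul_imm(ax, imm=4):
    """imul bx,ax,imm: low 16 bits of the product end up in BX."""
    return (ax * imm) & MASK16


def shld16(dst, src, count):
    """SHLD r16, r16, imm8 for the documented range only."""
    count &= 31                      # the CPU masks the count to 5 bits
    if count == 0:
        return dst
    if count > 16:
        raise ValueError("count > operand size: result is undefined")
    return ((dst << count) | (src >> (16 - count))) & MASK16


if __name__ == "__main__":
    for name, code in ENCODINGS.items():
        print(len(code), "bytes:", name)
```

`mov_shl(a)` and `imul_imm(a)` agree for every 16-bit value of AX, so the 3-byte IMUL is a drop-in replacement as far as BX is concerned (the flags differ).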

added on the 2017-11-04 00:12:51 by HellMood

...it does ! Thanks, didn't pop up in my head. I'll see if there's a speed penalty on this...

added on the 2017-11-04 00:46:46 by Kuemmel

Of course there is a speed penalty, we keep shifting bytes for reasons!
But in the case of size-limited intros it's always either speed or size... size wins in 90% of all cases!

But there's something in this case that makes the IMUL the best for both: MOV/SHL takes 5 cycles together, SHLD takes 4 cycles and the IMUL just 3 cycles.

I wonder if my first sentences still make sense. I assumed x86 MUL would execute as slowly as on some 8/16-bit machines I coded for in the past, but it seems this ain't the case!
With (I)DIV taking 17-41 cycles, I guess SHR (1 cycle) or SHRD (4 cycles) are still to be preferred, though! :D

added on the 2017-11-04 04:15:16 by ɧ4ɾɗվ.

Hardy, I thought so too, you would want to go for shifts usually. It seems to be also dependent on the CPU architecture.

For my routine there's not much of a difference, as the bottleneck is actually elsewhere. But when I look at Agner's instruction tables, for example for the Intel Skylake architecture, it looks like MOV+SHL have a latency of 1+1, SHLD has 3 and IMUL 4. I think you can't rely on those tables in the end, though, and should test it anyway, as it also depends on the instructions before and after.

added on the 2017-11-04 11:05:07 by Kuemmel

...after learning that stuff for myself, and given the bytebeat achievements in tiny intros over the last few years, I wrote a tutorial on doing Advanced PC Speaker and COVOX sound via interrupt.

This section is derived from a conversation with TomCat, who provided 99% of the code, and it should give you a nice start to get your bytebeat into a nice 256 byte intro.

Of course there's lots more to add and talk about bytebeat. Any comments/additions/corrections welcome as always.

Now you don't have any excuse for a "soundless" tiny intro ;-)

added on the 2018-11-15 18:59:49 by Kuemmel

that's right :) Now the wiki covers MIDI, pcspeaker (PWM and normal) and COVOX

nice work dudes :)

added on the 2018-11-16 11:00:18 by HellMood

Quote:

...due to learning that stuff for myself and the bytebeat achievements in the last few years in tiny intros I wrote a tutorial to do Advanced PC Speaker and COVOX sound via interrupt.

Oei! That is some nice stuff. Thanks!

added on the 2018-11-16 13:35:31 by numtek

Quote:

btw: no fcomi in DOSBox.

Supported in DOSBox-X for more than a year. And I hope it will be supported in vanilla DOSBox one day (patch).

This is the best thread on pouet (at least for me). I'd like to update it with some recent information. So I will respond to some old comments (sry).

added on the 2019-04-06 11:02:51 by TomCatAbaddon

Quote:

Operating system
Windows XP.

Editor / IDE
EditPlus - The only disadvantage is that it's not free.

Assembler
The Netwide Assembler - NASM, the most handy one.

Disassembler
NDISASM provided with NASM is enough.

Debugger
Unnecessary.

HexEditor
Viewing executable in hexadecimal mode is useful, for example for checking if some code parts can be used as constants. HxD is good and free.

My way:

Operating system
Native DOS from USB, created by Rufus.
DPMI extension: HX DPMI v2.17
mouse driver: CuteMouse v2.1

Editor / IDE
FASMIDE - Comes with FASM assembler for DOS.

Disassembler
DEBUG clone v1.32b - redirecting text output to a file :-)

Debugger
CodeView v2.2 by MS

HexEditor
HIEW v6.50 DOS - examine code and search for long instructions

added on the 2019-04-06 16:05:37 by TomCatAbaddon

Quote:

Code:

; st0 st1 on fpu stack - leaves the maximum in st0
_max:
fcomi st0, st1
fstsw ax
jbe _max0
_max1:
fxch st1
_max0:
fstp st0
ret

Any suggestions for a smaller version?

Code:

FCMOVB ST(0),ST(1)    ; moves when CF=1, so the flags must come from a compare first (e.g. fcomi st0,st1)
FSTP ST(1)

fast and 4 bytes only and yes, it needs a PPro or later
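
The effect of the sequence on the FPU stack can be traced with a tiny Python model. It is a sketch that assumes an `fcomi st0, st1` runs first to set CF, ignores the unordered (NaN) case, and uses a plain list with st0 at index 0; `fpu_max` is an illustrative name.

```python
def fpu_max(stack):
    """stack[0] is st0, stack[1] is st1; returns the stack after the sequence."""
    st = list(stack)
    cf = st[0] < st[1]      # fcomi st0, st1: CF = 1 when st0 < st1
    if cf:                  # fcmovb st0, st1: copy st1 into st0 if CF = 1
        st[0] = st[1]
    st[1] = st[0]           # fstp st1: store st0 into st1 ...
    return st[1:]           # ... then pop the stack


if __name__ == "__main__":
    print(fpu_max([1.0, 2.0, 9.0]))
    print(fpu_max([3.0, 2.0, 9.0]))
```

Either way the larger of the two values ends up in st0 and the stack is one entry shorter, with no branch involved.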

added on the 2019-04-06 16:19:43 by TomCatAbaddon
