Tiny Intro Toolbox Thread
category: code [glöplog]
Here is what I have used in the past (it's pretty long but allows precise control over each RGB value):
Code:
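salc ; AL = 0 (SALC gives 0 when CF is clear, 0xFF otherwise)
mov dx, 0x3c8 ; DAC write index port
out dx, al ; start at color 0
inc dx ; DX = 0x3c9, DAC data port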
P: ; assume CX = 255
mov bl, 5
call PAL
mov bl, 2
call PAL
mov bl, 3
call PAL
loop P
Code:
PAL:
mov al, cl
not al
mul bl
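shr ax, 2 ; AX = (255 - CL) * BL / 4
cmp al, 63 ; note: only AL is tested, so results >= 256 wrap before the clamp
jbe clamp ; already within the 6-bit DAC range
mov al, 63 ; otherwise clamp to 63
clamp: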
out dx, al
ret
This is roughly the equivalent of:
Code:
for(i = 0 ; i < 256 ; i++)
{
output(i * 2.5); //red
output(i * 1); //green
output(i * 1.5); //blue
}
Of course, if you want the RGB values to be simple multiples of AL and the result never overflows 63, there are simpler ways (just shift AL right or left).
There should be articles about palettes on http://www.sizecoding.org/. No good 256b without a good palette.
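For anyone who wants to see the actual numbers, here is a rough Python model of the assembly routine above. It is only a sketch, not the original author's code: the names `pal_component` and `build_palette` are made up for illustration, and it assumes CX = 255 on entry, as the comment in the listing says.

```python
def pal_component(cl, bl):
    """Model of PAL: one colour component for loop counter CL and factor BL."""
    al = ~cl & 0xFF           # mov al, cl / not al  -> AL = 255 - CL
    ax = (al * bl) & 0xFFFF   # mul bl               -> AX = AL * BL
    ax >>= 2                  # shr ax, 2
    al = ax & 0xFF            # cmp al, 63 only looks at the low byte of AX
    return min(al, 63)        # jbe / mov al, 63     -> clamp to the 6-bit range


def build_palette(cx=255):
    """Return {colour index: (r, g, b)} in the order the loop writes to 0x3c9."""
    palette = {}
    for cl in range(cx, 0, -1):   # loop P runs while CX counts down to 1
        index = 255 - cl          # the DAC write index auto-increments from 0
        palette[index] = tuple(pal_component(cl, bl) for bl in (5, 2, 3))
    return palette


if __name__ == "__main__":
    for index, rgb in sorted(build_palette().items())[:8]:
        print(index, rgb)
```

In this model the factors 5, 2 and 3 give slopes of 1.25, 0.5 and 0.75 per colour index in 6-bit units, so the ratios match the pseudocode. Red saturates at 63 around index 51, and because only AL is compared it appears to wrap back to 0 at index 205, where AX reaches 256. Comparing AX instead of AL would avoid that at the cost of one byte.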
In fact, I'm procrastinating heavily with said article ;) Just as a teaser, "Hypnoteye" uses INT 10h with AX = 1010h (function 10h, subfunction 10h: set individual DAC register), which allows for very short palette generation code (the same "int 10h" is reused for the mode set)
Code:
(mov al,0x13)
L: add cl,1
int 0x10
mov ax,0x1010
add ch,4
add dh,8
inc bx
jnz L
@HelloMood : what is the shortest way to check (and jump if needed) whether the value in ST0 (FPU stack) is greater than zero?
I know about FTST but it requires a bunch of other instructions to make it work (too many, actually). I have also tried FCOMI but without success.
FCOMI is known to not work with DosBox, i'd check here ->
http://www.pouet.net/topic.php?which=8791&page=5
This jumps when ST0 > 2^-133 and is shorter than ftst / fstsw ax / sahf / jg. It works by treating the more significant half of a float32 as a signed int16:
Code:
; needs di=-2 and ax=0 (or some other regs)
fst dword[bx+di]
cmp word[bx],ax
jg Positive
You can compare with any simple float this way: you get the full sign/exponent and 1+7 bits of mantissa.
Code:
; needs di=-3 and al=0
fst dword[bx+di]
cmp byte[bx],al
jg Positive
When using just the most significant byte you can compare with any 2^(2n+1).
Negative floats are flipped, so they need to be tested with ja/jb.
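To see why the upper half of a float32 can stand in for the whole value, here is a small Python sketch of the same idea. It is a model of the trick, not sizecoded code; `high_word`, `high_byte` and `is_positive` are hypothetical helper names.

```python
import struct


def high_word(x):
    """Upper 16 bits of x stored as a float32, read as a signed int16."""
    raw = struct.pack("<f", x)               # 4 bytes, little-endian, like fst dword
    return struct.unpack("<h", raw[2:])[0]   # bytes 2..3: sign, exponent, top 7 mantissa bits


def high_byte(x):
    """Most significant byte of the float32, read as a signed int8."""
    raw = struct.pack("<f", x)
    return struct.unpack("<b", raw[3:])[0]   # byte 3: sign and the upper 7 exponent bits


def is_positive(x):
    """Model of: cmp word[bx], ax (with ax = 0) / jg Positive."""
    return high_word(x) > 0


if __name__ == "__main__":
    for value in (1.0, 0.0, -1.0, 2.0 ** -133, 2.0 ** -134):
        print(value, high_word(value), is_positive(value))
```

The smallest float32 whose upper word is non-zero has the bit pattern 0x00010000, which is 2^-133, so anything smaller counts as zero here. For negative inputs the ordering of the upper word is reversed (-2.0 gives a larger value than -1.0), which is the flip mentioned above.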
@rrrola : thanks for the trick. I also tried the following: fistp into a 16-bit var, then testing it using cmp (it's pretty short). It works, but there is some imprecision that produces visual glitches.
Any idea how this palette code works ? (it's from quatro)
Code:
push 0xA000 ; Start of VGA video memory
pop es ; into ES
xor bp,bp ; BP addressing, uses SS, frees DS, no extra segment needed
mov al,0x13 ; mode 13h, 320x200 in 256 colors
mov dh,0x80 ; high byte of offscreen memory, low byte not important
mov ds,dx ; no palette influence (later) when DH = 0x80
inc cx ; align color components / color number / color count
palette_loop:
int 0x10 ; shared int 10h ! (palette entry , set mode)
sub ch,2 ; adjust green value
sub dh,4 ; adjust red value
dec bx ; next color
mov ax,0x1010 ; sub function to change palette
loop palette_loop ; adjust blue value & loop
It produces a nice 4 gradients palette.
I looked at the docs but couldn't find anything. AFAIK it's continuously calling int 10h with ah = 0x10 and al = 0x10.
Tigrou, that's basically the routine from "Hypnoteye"
mentioned above, but a bit optimized ;)
http://www.ctyme.com/intr/rb-0121.htm
You can also load a whole palette at once. If you load
your screen as palette you can achieve very very short
interesting effects.
Popshades 15b
http://www.ctyme.com/intr/rb-0122.htm
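As a rough illustration of how the quatro loop quoted above ends up with a full palette, here is a Python trace of its register arithmetic. This is a sketch under several assumptions that are not stated in the thread: CX = 0x00FF, BX = 0 and AH = 0 at program start, the BIOS only uses BL as the DAC register number, and only the low 6 bits of each component matter. The function name `quatro_palette` is made up.

```python
def quatro_palette(cx=0x00FF, bx=0, ax=0x0013):
    """Trace the palette loop; returns (iterations, {DAC index: (r, g, b)})."""
    cx = (cx + 1) & 0xFFFF           # inc cx        -> CH = 1, CL = 0
    dh = 0x80                        # mov dh, 0x80
    dac, iterations = {}, 0
    while True:
        iterations += 1
        if ax == 0x1010:             # int 0x10 with AX = 1010h: set one DAC register
            ch, cl = cx >> 8, cx & 0xFF
            # assumption: register number = BL, components masked to 6 bits
            dac[bx & 0xFF] = (dh & 63, ch & 63, cl & 63)
        # on the first pass AX = 0x0013, so the same int 0x10 sets mode 13h instead
        cx = (cx - 0x0200) & 0xFFFF  # sub ch, 2     (green)
        dh = (dh - 4) & 0xFF         # sub dh, 4     (red)
        bx = (bx - 1) & 0xFFFF       # dec bx        (next colour)
        ax = 0x1010                  # mov ax, 0x1010
        cx = (cx - 1) & 0xFFFF       # loop: dec cx  (blue) ...
        if cx == 0:                  # ... and exit once CX reaches zero
            break
    return iterations, dac


if __name__ == "__main__":
    count, dac = quatro_palette()
    print(count, dac[255], dac[64], dac[1])
```

Under these assumptions the loop runs 256 times (one mode set plus 255 palette writes), and colour c ends up as ((4*c) & 63, (2*c) & 63, c & 63). Blue therefore ramps from 0 to 63 four times across the 256 colours, which would explain the "4 gradients" look, while DH = 0x80 contributes nothing to the low 6 bits of red.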
Do'h! I remember I saw something about that palette trick somewhere but couldn't remember where exactly. I searched all the sizecoding.org tutorials and forgot to check this topic.
Thanks for the links btw.
folks, how is the grayscale palette obtained in Megapole by Baudsurfer? I can see the code but can't find that part
http://olivier.poudade.free.fr/src/Megapole.asm
He is using the 16 gray shades already existing in the standard VGA palette (offset +16)
Critical code before writing to the screen (stosb)
Code:
mov al,16 ; normalize with dithering add overlap ah=color/18+16
aad 1 ; dithering normalized and prepare for next frame cwd
test di,di ; test for all pixels plotted overrunning vga segment
jp o ; preserve zf flag and test if absolute beam position
inc ax ; parity even augmenting lighting for odd meta-pixels
o:stosb ; write screen pixel & advance absolute beam position
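A small Python model of what this fragment appears to compute may help. It is only a sketch: it assumes the preceding Megapole code leaves a shade value in AH, and the helper names `aad`, `parity_even` and `gray_pixel` are invented for illustration.

```python
def aad(al, ah, imm):
    """AAD imm8: AL = (AL + AH * imm8) & 0xFF, AH = 0."""
    return (al + ah * imm) & 0xFF, 0


def parity_even(value):
    """x86 parity flag: set when the low byte has an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def gray_pixel(shade, di):
    """Colour index stored by stosb for a shade in AH at screen offset DI."""
    al, _ah = aad(16, shade, 1)   # mov al,16 / aad 1 -> AL = 16 + shade, AH = 0
    if not parity_even(di):       # test di,di / jp o  -> skip the inc on even parity
        al = (al + 1) & 0xFF      # inc ax             -> one step brighter (dither)
    return al


if __name__ == "__main__":
    print([gray_pixel(4, di) for di in range(8)])
```

So a shade of 0..15 lands on the default VGA gray ramp at colours 16..31, and the parity of the low byte of DI decides whether a pixel is bumped one step brighter, which gives a cheap dither pattern.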
thanks!
hello again, is it safe to assume that a variable declared at the end of the code as
Code:
yvar dw ?
will have a zero value at the start?
no, although it's almost safe to assume it works in dosbox. i'd suggest to only place vars outside the code, when you don't rely on any defined starting value. but you could reuse initial code as variables, so you would know the starting value ;)
some other thingy...gave me a headache recently. As I tried to squeeze one byte from
Code:
mov bx,ax
shl bx,2
with
Code:
shld bx,ax,18
I realized that this worked only on DOSBox and my old AMD Sempron, but not on an Intel Core i5 or i7...seems what's written in the x86 manual "...If the count is greater than the operand size, the result in the destination operand is undefined." is true for some cpu's...oh well, the funny obstacles in sizecoding :-p ...just wanted to share this if you try the same...
from the top of my head,
Code:
imul bx,ax,byte 4
should do it in 3 bytes
...it does ! Thanks, didn't pop up in my head. I'll see if there's a speed penalty on this...
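For the record, a quick Python check that the 3-byte `imul bx,ax,byte 4` gives the same BX as the 5-byte `mov bx,ax` / `shl bx,2` pair for every 16-bit input. It is a sketch with made-up function names; the flags, which the two sequences set differently, are not modelled.

```python
def mov_shl(ax):
    """mov bx,ax / shl bx,2 (2 + 3 bytes): BX = AX << 2, truncated to 16 bits."""
    return (ax << 2) & 0xFFFF


def imul_by_4(ax):
    """imul bx,ax,byte 4 (3 bytes): BX = low 16 bits of the signed product."""
    signed = ax - 0x10000 if ax & 0x8000 else ax   # reinterpret AX as int16
    return (signed * 4) & 0xFFFF


if __name__ == "__main__":
    same = all(mov_shl(a) == imul_by_4(a) for a in range(0x10000))
    print("identical for all 16-bit inputs:", same)
```

The low 16 bits of a product do not depend on whether the operands are treated as signed or unsigned, which is why the signed multiply can replace the shift here.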
Of course there is a speed penalty, we keep shifting bytes for reasons!
But in size-limited intros it's always either speed or size...size wins in 90% of all cases!
But there's something in this case which makes the IMUL the best for both: MOV/SHL takes 5 cycles together, SHLD takes 4 cycles and the IMUL just 3 cycles.
I wonder if my first sentences still make sense. I assumed x86 MUL would execute as slowly as on some 8/16-bit machines I coded on in the past, but it seems this ain't the case!
With (I)DIV at 17-41 cycles I guess SHR (1 cycle) or SHRD (4 cycles) are still to be preferred, though! :D
Hardy, I thought so too, you would want to go for shifts usually. It seems to be also dependent on the CPU architecture.
For my routine there's not much of a difference as the bottleneck is actually elsewhere, but when I look at Agner's instruction tables here, for example on the Intel Skylake architecture, it looks like MOV+SHL have a latency of 1+1, SHLD has 3 and IMUL 4. I think you can't rely on those tables in the end, though, and should test it anyway, as it also depends on the instructions before and after.
...due to learning that stuff for myself and the bytebeat achievements in the last few years in tiny intros I wrote a tutorial to do Advanced PC Speaker and COVOX sound via interrupt.
This section is derived from a conversation with TomCat, who provided 99% of the code, and it should give you a nice start to get your bytebeat into a nice 256 byte intro.
Of course there's lots more to add and talk about bytebeat. Any comments/additions/corrections welcome as always.
Now you don't have any excuse for a "soundless" tiny intro ;-)
that's right :) Now the wiki covers MIDI, pcspeaker (PWM and normal) and COVOX
nice work dudes :)
Quote:
...due to learning that stuff for myself and the bytebeat achievements in the last few years in tiny intros I wrote a tutorial to do Advanced PC Speaker and COVOX sound via interrupt.
Oei! That is some nice stuff. Thanks!
Quote:
btw: no fcomi in DOSBox.
Supported in DOSBox-X for more than 1 year. And I hope it will be supported in vanilla DOSBox one day. (patch)
This is the best thread on pouet (at least for me). I'd like to update it with some recent information. So I will respond to some old comments (sry).
Quote:
Operating system
Windows XP.
Editor / IDE
EditPlus - The only disadvantage is that it's not free.
Assembler
The Netwide Assembler - NASM, the most handy one.
Disassembler
NDISASM provided with NASM is enough.
Debugger
Unnecessary.
HexEditor
Viewing executable in hexadecimal mode is useful, for example for checking if some code parts can be used as constants. HxD is good and free.
My way:
Operating system
Native DOS from USB, created by Rufus.
DPMI extension: HX DPMI v2.17
mouse driver: CuteMouse v2.1
Editor / IDE
FASMIDE - Comes with FASM assembler for DOS.
Disassembler
DEBUG clone v1.32b - redirecting text output to a file :-)
Debugger
CodeView v2.2 by MS
HexEditor
HIEW v6.50 DOS - examine code and search for long instructions
Quote:
Code:
; st0 st1 on fpu stack - leaves the maximum in st0
_max:
fcomi st0, st1
fstsw ax
jbe _max0
_max1:
fxch st1
_max0:
fstp st0
ret
Any suggestions for a smaller version?
Code:
FCMOVB ST(0),ST(1)
FSTP ST(1)
fast and 4 bytes only and yes, it's PPro