Amiga 500: New blitter c2p routine implemented. Need help to speed-test it
category: general [glöplog]
For more info:
http://ada.untergrund.net/forum/index.php?action=vthread&forum=4&topic=217
.
Expect a 7 MHz release from me at Breakpoint. :-D
Come on, old coders! Two months to go... enough time to make something interesting.
Remove the space before =217 in the URL when you cut and paste it.
Great. I just got an A1200 here; my A600 is broken. I'll take a look. I've read the thread: Amiga leet coders are back!
If you want to code an entire demo, a real A500 may be a good investment... 50 SEK on eBay.
Fantastic! I'll check it on an A500 one of these days.
Newskool style demos on oldskool hardware? ;)
@sp_: any chance you can post that assembler code somewhere that doesn't require (yet another unwanted) account on (yet another unneeded) file-sharing server?
I just finished repairing the low-byte octal transceiver on my A4000 (I hate SMDs), so I have my fast mem back again (yay!) and may have some time to dig out one of my A500s and try that code.
ham:
Newschool :-) I watched some of the newest Atari ST demos, which were pretty cool. It should be possible to do better on the Amiga.
Dandelion:
I live in Thailand; 50 SEK is five days of food here :-D WinUAE has an A500 speed option. If the timing on real hardware is close to the WinUAE timings, I won't buy one.
http://www.geocities.com/hallu_a/c2p2x1_c0b1_scr_000.asm
I've included a CIA timer to make timing simple. I've set the blitter-nasty bit. If possible, I'd like timings without blitter nasty, and with 4 bpl (not 5 as in the source).
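A CIA timer reading still has to be turned into rasterlines to compare with the figures below. A minimal sketch, assuming the timer counts E-clock ticks (on a PAL A500 one E-clock tick is 10 CPU cycles) and using the 456 CPU cycles per rasterline and ~312 lines per frame quoted in this thread. Since CIA timers count down, the elapsed ticks are the start value minus the value read back. The function names are hypothetical, not from the posted source.

```python
# Hypothetical helpers: convert elapsed CIA timer ticks into rasterlines and PAL frames.
# Assumption: the CIA counts E-clock ticks, one tick = 10 CPU cycles on a PAL A500.
CPU_CYCLES_PER_ETICK = 10
CYCLES_PER_LINE = 456      # figure used in this thread
LINES_PER_FRAME = 312      # PAL, approximately

def cia_ticks_to_lines(ticks: int) -> float:
    """Rasterlines elapsed for a number of E-clock ticks (start value minus end value)."""
    return ticks * CPU_CYCLES_PER_ETICK / CYCLES_PER_LINE

def cia_ticks_to_frames(ticks: int) -> float:
    """PAL frames (VBLs) elapsed for a number of E-clock ticks."""
    return cia_ticks_to_lines(ticks) / LINES_PER_FRAME

if __name__ == "__main__":
    print(cia_ticks_to_lines(4560))   # 100.0
```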
sp^ctz... WinUAE timing is quite OK compared to a real A500... in the last newschool thing I did, the difference between a real A500 and WinUAE was just a few rasterlines...
btw, in case you are interested... have a look at: http://www.pouet.net/prod.php?which=26920 where I'm using Atari ST c2p...
WinUAE timings
(A500-speed CPU setting, no JIT):
Blitter c2p:
320*256 (2x1) in 4 bpl (scroll to emulate 4 bpl):
196 rasterlines (ABCD; read: 320*256*4/8, write: 320*256*2/8 bytes)
SMC table:
320*256 (2x1):
101 rasterlines (read: 320*256*4/8, write: 320*256*2/8 bytes)
Sum:
196 + 101 = 297 rasterlines.
One PAL frame is around 312 rasterlines.
According to the UAE timings, a "table effect" like a tunnel, bump etc. can be done at 50 fps in a 160*256 resolution (2x1), 4 bpl.
Sounds too good to me :-D
Ready to break some world records, guys?
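The budget argument above can be checked mechanically. This sketch only re-adds the WinUAE figures quoted in the list and compares them with the ~312-line PAL frame stated there; it measures nothing and says nothing about real hardware. The helper name is made up for illustration.

```python
# Re-check the quoted WinUAE figures against one PAL frame.
LINES_PER_FRAME = 312  # "around 312 rasterlines" per PAL frame, as stated above

def fits_in_frame(*costs_in_lines: int) -> tuple[int, bool]:
    """Sum per-stage costs (in rasterlines) and test them against one frame."""
    total = sum(costs_in_lines)
    return total, total <= LINES_PER_FRAME

# Byte counts from the formulas in the list (320*256 screen, 4 bpl read, 2 bpl write).
read_bytes = 320 * 256 * 4 // 8    # 40960
write_bytes = 320 * 256 * 2 // 8   # 20480

blitter_c2p = 196  # rasterlines, WinUAE at A500 speed
smc_table = 101    # rasterlines, WinUAE at A500 speed

total, ok = fits_in_frame(blitter_c2p, smc_table)
print(total, ok, LINES_PER_FRAME - total)   # 297 True 15
```

The 15 spare lines are the whole margin, which is why any chip-RAM contention missing from the emulation could push this past one frame.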
I uploaded a new version now. This version renders a 160*256 4 bpl chunky buffer and performs a blitter c2p. I move red into $DFF180 (COLOR00) at the start, and black when finished. If the effect runs at 50 fps, the red color will not blink. (In WinUAE at A500 speed it is not blinking, indicating 50 fps.)
hm, that sounds a bit too fast... I mean,
when you need for example 12 clocks per point (the offset access without c2p), then you have
160*256 points to calculate... 40960 * 12 = 491520 clock cycles...
491520 / 456 (456 cycles per rasterline, as far as I remember) = 1077.89 rasterlines = around 3.4 VBLs...
so a tunnel is for sure 50 fps in that size...
and this is without c2p!
so a tunnel is for sure NOT 50 fps at that size... I mean ;)
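The estimate above chains three divisions; here it is as a small Python calculation with the same inputs (12 cycles per pixel, 456 cycles per line, 312 lines per frame, all taken from the thread). The second part turns it around and asks how many cycles per pixel a single frame would allow, ignoring c2p and DMA contention.

```python
# Back-of-the-envelope check using the thread's figures.
CYCLES_PER_LINE = 456
LINES_PER_FRAME = 312

def cost(pixels: int, cycles_per_pixel: int) -> tuple[int, float, float]:
    """Return (CPU cycles, rasterlines, frames) for rendering `pixels` pixels."""
    cycles = pixels * cycles_per_pixel
    lines = cycles / CYCLES_PER_LINE
    return cycles, lines, lines / LINES_PER_FRAME

pixels = 160 * 256
cycles, lines, frames = cost(pixels, 12)
print(cycles, round(lines, 2), round(frames, 2))  # 491520 1077.89 3.45

# Cycles per pixel that would still fit in one frame (CPU only).
max_cpp = LINES_PER_FRAME * CYCLES_PER_LINE / pixels
print(round(max_cpp, 2))  # 3.47
```

Under 3.5 cycles per pixel is below the cost of even a single 68000 memory access per pixel, which is why the corrected "not 50 fps" is the right conclusion for a plain per-pixel loop.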
I agree that the WinUAE timings sound too good to be true.
.
How many cycles? This is an SMC loop that will render 8 pixels into a scrambled nibble-chunky buffer.
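SMCTABLE:
        lea     txture1,a0          ; first texture
        lea     txture2,a1          ; second texture, OR'ed into the same byte
        lea     chunky,a2           ; scrambled nibble-chunky buffer
        REPT    160*256/8           ; unrolled: 8 pixels (4 bytes) per copy
        move.b  0000(a0),d6         ;12  0000 = offset patched by SMC
        or.b    0000(a1),d6         ;12  byte A (2 pixels)
        move.b  d6,(a2)+            ;8   store A
        move.b  0000(a0),d7         ;12
        or.b    0000(a1),d7         ;12  byte B, kept in d7
        move.b  0000(a0),d6         ;12
        or.b    0000(a1),d6         ;12  byte C
        move.b  d6,(a2)+            ;8   store C
        move.b  d7,(a2)+            ;8   store B (output order is A,C,B,D)
        move.b  0000(a0),d6         ;12
        or.b    0000(a1),d6         ;12  byte D
        move.b  d6,(a2)+            ;8   store D
        ENDR
        rts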
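To make the data flow of SMCTABLE concrete, here is a hypothetical Python model of one pass. It is not the original code, and it rests on assumptions the post does not spell out: txture1 holds each texel in the high nibble, txture2 holds the same texel in the low nibble, and each patched 0000 displacement is one pixel's offset into the texture. The only thing taken directly from the loop is the store order: bytes computed as A, B, C, D are written as A, C, B, D.

```python
# Hypothetical model of one SMCTABLE pass (assumed nibble layout, see above).
STORE_ORDER = (0, 2, 1, 3)  # bytes computed A,B,C,D are stored A,C,B,D

def smctable(texture: bytes, offsets: list[int]) -> bytes:
    """Render len(offsets) pixels (4-bit texels) into nibble-chunky bytes."""
    txture1 = bytes((t & 0x0F) << 4 for t in texture)  # assumed: high nibble
    txture2 = bytes(t & 0x0F for t in texture)         # assumed: low nibble
    assert len(offsets) % 8 == 0, "one REPT copy renders 8 pixels"
    out = bytearray()
    for base in range(0, len(offsets), 8):
        # each move.b/or.b pair combines two pixels into one byte
        pairs = [txture1[offsets[base + 2 * k]] | txture2[offsets[base + 2 * k + 1]]
                 for k in range(4)]
        out.extend(pairs[i] for i in STORE_ORDER)
    return bytes(out)

print(smctable(bytes(range(16)), list(range(8))).hex())  # 01452367
```

Whatever the real nibble layout is, the blitter c2p stage has to expect the same A,C,B,D byte order, so a model like this is mainly useful for checking a table generator against the c2p input format.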
hm, if my clock-cycle list is correct, the loop needs 132 cycles...
for 256*160 it would be
20*256*132 = 675840 cycles; 675840 / 456 = 1482.10 rasterlines; 1482.10 / 312 ≈ 4.75 VBLs
unless I made a mistake...
but the loop can be sped up a bit by using 4 textures and doing .w accesses... that would save 18 clocks or so
OK, of course the blitter steals some time too... 2 sources and stuff...
I get the loop to be 128 cycles (when unrolled).
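The 132 vs 128 disagreement can be settled by summing per-instruction costs. This sketch uses the standard 68000 execution times for the addressing modes in the loop (12 cycles for a byte or word MOVE or OR from d16(An) into a data register, 8 cycles for a byte or word MOVE from a data register to (An)+). It assumes no wait states, so chip-RAM contention and the blitter can only make real A500 numbers worse. The frame figures reuse the thread's 456 cycles per line and 312 lines per frame.

```python
# 68000 cycle counts (no wait states) for the instruction forms used in SMCTABLE.
TIMINGS = {
    "move.b d16(An),Dn": 12,
    "or.b d16(An),Dn": 12,
    "move.b Dn,(An)+": 8,
}

# One REPT copy: four load/OR pairs and four byte stores (the order does not affect the sum).
BYTE_LOOP = (["move.b d16(An),Dn", "or.b d16(An),Dn"] * 4
             + ["move.b Dn,(An)+"] * 4)

def loop_cycles(body: list[str]) -> int:
    return sum(TIMINGS[op] for op in body)

def vbls(cycles_per_8px: int, width: int = 160, height: int = 256,
         cycles_per_line: int = 456, lines_per_frame: int = 312) -> float:
    """Frames needed to render a width*height chunky buffer at 8 pixels per loop copy."""
    return (width // 8) * height * cycles_per_8px / cycles_per_line / lines_per_frame

print(loop_cycles(BYTE_LOOP))                    # 128
print(round(vbls(128), 2), round(vbls(132), 2))  # 4.61 4.75
```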
If I shrink the texture from 256*256 to 16*16, I can calculate two pixels per move and double the speed. A tunnel effect can be made 2 times faster by using the copper and a modulo of -80. A 50 fps, 4 bpl tunnel at 160*200 (160x100 rendered), possible? I think so...
For 16x16 textures (8 pixels in 64 cycles):
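        REPT    160*256/8           ; 8 pixels (2 words) per copy
        move.w  0000(a0),d6         ;12
        move.w  0000(a0),d7         ;12  posted as or.w (d7 was never loaded)
        or.w    0000(a1),d6         ;12  posted as move.w (it overwrote d6)
        or.w    0000(a1),d7         ;12
        move.w  d6,(a2)+            ;8
        move.w  d7,(a2)+            ;8   total 64 cycles
        ENDR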
I did a small test in WinUAE by moving a color to a color register: 456 cycles per rasterline seems correct on the A500.
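A rough CPU-only cost for the 16x16-texture word loop, using 64 cycles per 8 pixels (four d16(An) word reads or ORs at 12 cycles, two (An)+ word stores at 8) and the thread's 456 cycles per line and 312 lines per frame. Blitter c2p time and DMA contention are not included, so these are lower bounds. The 160x100 case is the one where each rendered line is shown twice via the copper/modulo trick mentioned above.

```python
# CPU-only frame cost of the word loop; c2p and DMA contention are not included.
CYCLES_PER_8PX = 4 * 12 + 2 * 8   # four d16(An) word reads/ORs, two (An)+ word stores

def cpu_frames(chunky_w: int, chunky_h: int, cycles_per_8px: int = CYCLES_PER_8PX,
               cycles_per_line: int = 456, lines_per_frame: int = 312) -> float:
    cycles = (chunky_w // 8) * chunky_h * cycles_per_8px
    return cycles / cycles_per_line / lines_per_frame

print(round(cpu_frames(160, 100), 2))  # 0.9  (160x100 rendered, lines doubled for display)
print(round(cpu_frames(160, 200), 2))  # 1.8  (full 160x200 chunky, 2x1 on 320x200)
```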
.
btw, I checked out your intro. Pretty nice. I liked that you used the copper to make it look like you used more bitplanes. 18 years :D Welcome back!!
Oops, there is an error in the loop above. Too quick there. :D
a 320*200 tunnel with a 16*16 texture... 2*2 should be possible at 50 fps...
but you can maybe do it a bit faster... in my tunnel I'm using a small trick... the tunnel could be 256 lines, but I ran into memory problems, so it's smaller...
when you look at the tunnel offsets, the offsets are often the same (when you use the right tunnel offset map, I mean); for identical offsets you use a different texture... that saves some moves in the inner loop. OK, this only works with unrolled code...
OK, maybe it won't give you as big a speedup as in mine, because you already have 2 pixels combined in one texture... but at the tunnel borders pixels are often the same... maybe that helps a bit... ;)
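One possible reading of the "same offset" trick, sketched as an estimate: when both pixels of a chunky byte read the same texture offset, a pre-combined texture (same texel in both nibbles) lets a single move.b replace a move.b/or.b pair, saving the 12-cycle or.b. Because the decision is made per pixel pair when the unrolled SMC code is generated, it only works with unrolled loops, as noted above. The function name and the sample offsets are hypothetical.

```python
# Hypothetical estimate: cycles saved when pixel pairs share a texture offset.
OR_B_D16_CYCLES = 12  # or.b d16(An),Dn on a 68000, no wait states

def same_offset_savings(offsets: list[int]) -> tuple[int, int]:
    """Return (pairs with equal offsets, cycles saved by dropping their or.b)."""
    pairs = zip(offsets[0::2], offsets[1::2])
    equal = sum(1 for a, b in pairs if a == b)
    return equal, equal * OR_B_D16_CYCLES

print(same_offset_savings([5, 5, 3, 4, 7, 7, 7, 7]))  # (3, 36)
```

Running this over a real tunnel offset map would show how much the trick buys for a given map before committing to the more complex code generator.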
No, I mean a 4 bpl 2*1 tunnel in 320*200 at 50 fps. A hint: I won't be able to rotate, only move along one axis. Cheating? Yeah. (But no color cycling, he-he.)
.
Table effects are boring; I'm working on converting my 256-byte texture mapper to plot in scrambled nibble-chunky.
Intro for Breakpoint.
Come on coders, challenge me there :D
hm, just calculate ;)
64 * 20 * 200 / 456 / 312 ≈ 1.8 VBLs...
I planned to convert my mapper from the ST to the Amiga... but hm, nope, I won't do anything for Breakpoint... too little time ;)
but I'm looking forward to your BP thing ;)
I'm looking forward to it too. This is REAL demo coding!
Yay for hardware banging! Looking forward to BP. Will you be there, sjeggtryne?
In search of newschool A500 demos I found this one:
Rout by Potion 1999
http://ada.untergrund.net/showdemo.php?demoid=451
http://www.pouet.net/prod.php?which=8262
Running in WinUAE with matched A500 speed gives a really decent framerate.
The resolution is probably 128*100, using a blitter c2p similar to mine.
Pretty impressive code.
.
Does it run on 1 MB (512K chip / 512K fast) Amigas?
hm @decent framerate..
with my WinUAE config it runs at 4-5 VBLs (as far as I could tell), sometimes much more... and only 8 colors... but anyway, this is from 1999, cool for its time...
"Does it run on 1meg 512k chip / 512k fast amigas? "
sp^ctz: no, it needs 1.5 MB minimum to run; it can be all chip mem.