Amiga Blitter strangeness
category: code [glöplog]
Hi!
Last Sunday with Sachy we were debugging my Amiga 500 OCS demo, which worked on WinUAE & MiST, but failed to work on the real thing. The bug was finally narrowed down to the clear routine, in which the Blitter showed some unexpected behavior.
Blitter setup is:
Code:
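BLTAFWM = 0xFFFF;            // first-word mask: keep all bits
BLTALWM = 0xFFFF;            // last-word mask: keep all bits
BLTCON0 = 0x01F0;            // USED only, minterm 0xF0 (D = A)
BLTCON1 = 0;
BLTADAT = clear_value;       // constant source, since A DMA is off
BLTDPT = destination;
BLTDMOD = (64+14)*2;         // destination modulo in bytes
BLTSIZE = (80<<6) | 50;      // 80 lines x 50 words; this write starts the blit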
The intention is simply to fill a rectangular data block with the BLTADAT value.
BLTDPT, BLTDMOD and BLTSIZE are verified correct and provided for completeness.
It turns out that setting up BLTCON0 with only the D channel and the D=A function triggers unexpected behavior, ranging randomly from screen garbage to a demo hang.
Could any Amiga coder comment on this?
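For readers following along, here is a minimal sketch of how the BLTCON0 and BLTSIZE values in the post break down. The helper names are hypothetical and not from the post; the bit positions assume the usual BLTCON0 layout (USEA..USED in bits 11..8, minterm in the low byte).

```c
#include <stdint.h>

/* Channel-enable bits of BLTCON0 (assumed standard layout). */
enum {
    BLT_USEA = 0x0800,  /* DMA for source A */
    BLT_USEB = 0x0400,  /* DMA for source B */
    BLT_USEC = 0x0200,  /* DMA for source C */
    BLT_USED = 0x0100   /* DMA for destination D */
};

/* Returns only the channel-enable bits of a BLTCON0 value. */
uint16_t bltcon0_channels(uint16_t bltcon0) {
    return (uint16_t)(bltcon0 & (BLT_USEA | BLT_USEB | BLT_USEC | BLT_USED));
}

/* Returns the 8-bit minterm (logic function) of a BLTCON0 value. */
uint8_t bltcon0_minterm(uint16_t bltcon0) {
    return (uint8_t)(bltcon0 & 0x00FF);
}

/* Packs height (lines) and width (words) the same way the post does. */
uint16_t bltsize(uint16_t height, uint16_t width_words) {
    return (uint16_t)((height << 6) | (width_words & 0x3F));
}
```

With the value from the post, 0x01F0 decodes to "D enabled, nothing else" and minterm 0xF0, which matches the description "only D channel and D=A function".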
You're not touching COPJMP ($dff088 / $dff08a) with the CPU while the blitter is working, are you?
OK, now that's strangeness squared - could you explain why?
I'm COPJMPing every VBlank, so my demo does a lot of that. Still, only this particular Blitter job crashes.
HW bug. ref1 and ref2
The reason why just one of them crashes is most likely down to timing. Shuffle enough stuff around and it'll crash somewhere else instead.
If you can do your copjmp-strobing with the copper rather than the CPU that should solve it for you. (e.g. copperlists triggering jumps to other copperlists).
If that isn't possible in your case then you'll just have to make sure the blitter isn't active when you strobe.
However, you said you do it every _vblank_.... Keep in mind that the copper1-list is reloaded automatically on vblank anyway so you might not need the explicit strobe. ;)
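If the CPU strobe cannot be avoided, the "make sure the blitter isn't active" advice could look roughly like the sketch below. The function name is hypothetical; the registers are passed in as pointers (they would be $dff002 for DMACONR and $dff088 for COPJMP1 on real hardware) so the logic can be tried off-target. It assumes nothing else, such as the copper or an interrupt, starts a blit between the check and the strobe.

```c
#include <stdint.h>

#define DMACONR_BBUSY 0x4000u  /* DMACONR bit 14: blitter busy */

/* Spins until the blitter reports idle, then strobes COPJMP1. */
void strobe_copjmp1_when_blitter_idle(volatile const uint16_t *dmaconr,
                                      volatile uint16_t *copjmp1) {
    while (*dmaconr & DMACONR_BBUSY) {
        /* busy-wait until the current blit has finished */
    }
    *copjmp1 = 0;  /* strobe register: the written value itself is ignored */
}
```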
Thanks!
Well, I just figured it out too, that I'm basically replicating copper reload in VBlank.
I assume I have to disable Copper DMA during copper address update to make it atomic?
Or could it hinder copper restart in any way? (like: copper will not jump or jump incorrectly if DMA is off at particular timing)
Assuming the last question is unrelated to the strobe-and-blitter-bug (which you now know how to avoid):
For avoiding race conditions on the copper address just make sure you don't set it from multiple places at the same time. (Yes, I'm Captain Obvious).
If your copperlists write to copadr themselves then just make sure they finish by setting the correct starting addr for cop1. Or you can do a little bit of timing and ensure that the CPU writes to cop1ptr after the last time (that frame) that the copper itself does it.
Basically do anything _except_ trying to set those pointers in your vblank interrupt :)
I meant CPU write vs. Copper read race conditions. AFAIK a long move is not atomic, so there is a minimal chance that the copper restart will use high and low address words of different lists if it happens to use copadr exactly while the long move is being executed.
btw. I don't do any manual addressing or copper restarts within copper list, so that is not a problem.
I see no reason why you want to rewrite copper pointers each frame with the CPU. :)
If you're just doing doublebuffered copperlists then just make sure the lists re-initialize themselves. E.g. at the end of list1 you set cop1ptr = list2 and vice versa.
Keep in mind that the copper only reads the value of copadr during vblank and when you strobe copjmp. Since you know when those things occur you can simply make sure to not update copadr at that time. :)
I use doublebuffered copperlists, but automatic switching won't work as I'm sometimes using more than one video frame to compute the effect.
I still have to figure out how to determine when not to write to copadr (e.g. what value is in VHPOSR just before copper restart).
Quote:
"Wildcat" was also "released" last saturday. :)
Quote:
Last Sunday with Sachy we were debugging my Amiga 500 OCS demo
That's the spirit! :-D
Congrats on both releases, and all the best to you and your family!
In that case you can do something simpler than trying to race the copper and you also get to use every programmer's favourite solution: another level of indirection! :)
Vblank probably occurs quite some lines above the start of your display so just use a small dispatch copperlist:
Code:
dc.w $1007,$fffe ; or some other suitable position between vblank and display start
dc.w $0080,HIADR
dc.w $0082,LOADR
dc.w $0088,$666 ; satanic copstrobe
Then instead of directly setting the copptr with the CPU you just update the hi and low addresses from your vblank.
Of course, if you're using the copper for setting up the blitter (rather than actual on-screen stuff) you want the wait at the beginning to be as low as possible (rather than "anywhere before the display starts") but otherwise it'll still work the same way.
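The CPU side of this dispatch list could be sketched as follows. The function and constant names are hypothetical; the word offsets assume exactly the four-instruction layout shown in the post, with HIADR and LOADR as the data words of the second and third instructions.

```c
#include <stdint.h>

/* Word offsets of the HIADR and LOADR operands in the dispatch list
   (assumes the layout from the post: WAIT, MOVE $80, MOVE $82, MOVE $88). */
enum { DISPATCH_HIADR = 3, DISPATCH_LOADR = 5 };

/* Patches the dispatch list so that it jumps to list_addr next time
   the copper runs through it. */
void dispatch_set_target(volatile uint16_t *dispatch, uint32_t list_addr) {
    dispatch[DISPATCH_HIADR] = (uint16_t)(list_addr >> 16);
    dispatch[DISPATCH_LOADR] = (uint16_t)(list_addr & 0xFFFFu);
}
```

Called from the vblank handler, both words are written well before the copper passes the initial wait, so it should never see a half-updated address.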
Quote:
I still have to figure out how to determine when not to write to copadr (e.g. what value is in VHPOSR just before copper restart).
This is discussed in the EAB thread about undocumented hardware features.
The copper fetches its address from COP1LC at the beginning of line 0, at the same time as the vblank interrupt is triggered. Thus, if the copper location is written in the vblank interrupt, it will always apply to the next frame.
If you don't know one frame ahead of time whether you will be ready to switch to the new copper on the next frame, this doesn't quite solve your problem of course. A write outside the vblank interrupt could still hit the copper location fetch.
Mark every prepared copper list with a "ready to be used" flag that _can_ be written with an atomic write. And then have your vblank interrupt check that flag and only set the list as the effective list if it's ready. ??? And reset the flag and the pointer after it has been used. ??
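A minimal sketch of this flag idea, under the assumption that the main loop publishes a list and the vblank handler installs it. All names are hypothetical; cop1lc is passed as a pointer (it would be $dff080 on real hardware) so the logic can be tried off-target.

```c
#include <stdint.h>

/* State shared between the main loop and the vblank handler. */
struct pending_copper {
    uint32_t list_addr;      /* written first by the main loop */
    volatile uint8_t ready;  /* written last, as a single byte store */
};

/* Main loop: publish a finished copper list. */
void copper_publish(struct pending_copper *p, uint32_t list_addr) {
    p->list_addr = list_addr;
    p->ready = 1;
}

/* Vblank handler: install the list only if one was published.
   Returns 1 if a new list was installed, 0 otherwise. */
int copper_vblank(struct pending_copper *p, volatile uint32_t *cop1lc) {
    if (!p->ready)
        return 0;
    *cop1lc = p->list_addr;
    p->ready = 0;
    return 1;
}
```

Since the write happens inside the vblank interrupt, the new list takes effect on the following frame, as described earlier in the thread.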
@yzi: Set the list... and? As Blueberry said, it will apply to next frame unless you copjump to it - which can't be done as long as blitter is active.
I think I'd borrow the idea from dispatch copperlists and make two trampolines jumping to the two copperlist buffers. Trampolines are short so it would be easy to make them exist on the same 64k page by allocating slightly more space. This way the rest of the demo would only have to update COP1LCL and forget about manipulating copper in vblank or manual strobes.
How about this:
When a new copperlist is ready, busy-wait until the raster is on any line but the last, then write COP1LC. That should avoid the race condition, at the cost of having one scanline less time to complete the computation of the frame.
Unless of course the CPU is completely locked out by the copper and/or blitter. Blitter nasty is nasty...
Then comes the challenge of reading the rasterpos atomically, since it is spread over two registers. This can be done as follows: read the upper byte, then read the lower byte. If the lower byte is zero, read the upper byte again.
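The read sequence described above could be sketched like this. The names are hypothetical; the sketch assumes the top line bit (V8) is bit 0 of VPOSR ($dff004) and the low eight line bits are the high byte of VHPOSR ($dff006), and that both reads happen less than one scanline apart. The last line number is left to the caller because it depends on the video mode.

```c
#include <stdint.h>

/* Reads the current raster line from the two position registers. */
uint16_t read_raster_line(volatile const uint16_t *vposr,
                          volatile const uint16_t *vhposr) {
    uint16_t hi = (uint16_t)(*vposr & 1u);   /* V8 */
    uint16_t lo = (uint16_t)(*vhposr >> 8);  /* V7..V0 */
    if (lo == 0)
        hi = (uint16_t)(*vposr & 1u);  /* low byte may have just wrapped: refresh V8 */
    return (uint16_t)((hi << 8) | lo);
}

/* True on any line except the last one of the frame. */
int safe_to_write_cop1lc(uint16_t line, uint16_t last_line) {
    return line != last_line;
}
```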
@Blueberry: I was also thinking about this. Blitter is nasty, but you'd want to update COP1LC either after the last blit or just before the final blit, so it should be easy to pick an update spot where the blitter isn't running. And if you want to keep compatibility with OCS, you don't have that second rasterpos register anyway, AFAIK, at the price of a possible, mostly harmless false positive.
Still, an interrupt may trigger between the rasterpos check and copper update. VSync shouldn't hurt, but CIA, audio & disk can.
All trampolines on a common 64k page still look most elegant, I think. They cost only 3 copper instructions (none of these being any wait), are quite easy to properly generate, do not require any DMA/IRQ disabling and for the rest of the code switching the copperlists is as simple as a single atomic COP1LCL write.
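One possible reading of the trampoline idea, as a sketch: each trampoline loads COP2LC with the real list address and strobes COPJMP2, which gives the three MOVE instructions mentioned above. The post does not say which registers the trampoline uses, so the choice of COP2LC/COPJMP2 and all helper names here are assumptions.

```c
#include <stdint.h>

/* Copper MOVE destinations (register offsets from $dff000). */
enum { COP2LCH = 0x0084, COP2LCL = 0x0086, COPJMP2 = 0x008A };

/* Writes a three-instruction trampoline (six words) that jumps to target. */
void build_trampoline(uint16_t out[6], uint32_t target) {
    out[0] = COP2LCH; out[1] = (uint16_t)(target >> 16);
    out[2] = COP2LCL; out[3] = (uint16_t)(target & 0xFFFFu);
    out[4] = COPJMP2; out[5] = 0;  /* strobe: data word is ignored */
}

/* True if both addresses share the same high word, so that switching
   between them only needs a COP1LCL write. */
int same_64k_page(uint32_t a, uint32_t b) {
    return (a >> 16) == (b >> 16);
}

/* The single word the rest of the demo writes to COP1LCL to switch lists. */
uint16_t cop1lcl_value(uint32_t trampoline_addr) {
    return (uint16_t)(trampoline_addr & 0xFFFFu);
}
```

COP1LCH would be set once at startup to the shared page; after that, switching buffers is one word write.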
Quote:
However, you said you do it every _vblank_.... Keep in mind that the copper1-list is reloaded automatically on vblank anyway so you might not need the explicit strobe. ;)
But it's weird that this implicit cop list 1 strobe, that happens at vblank, doesn't trigger the bug, since the copper is usually in an infinite wait when this happens ($FFFF,$FFFE).
hoover: The implicit strobe is not performed by the CPU (because that'd be explicit). Only CPU-triggered copjmp strobing can trigger the hw bug afaik.