.kkapture 0.01 - demo capturing made easy (hopefully)
category: code [glöplog]
Somehow I have a feeling that there are ways to make .kkapture capture much faster, but my inexperienced and fragile amateur coder fingers won't dare to touch the source code, and since ryg is good at making stuff go fast, I'll leave the optimising to him.
haha, what a nice way of saying "OPTIMISE DAMMIT!!1!" :D
the main bottleneck (on my system anyway) is getting the framebuffer data back from video memory. because, surprise surprise, agp is write-only, and READING from video memory goes through the normal pci bus with its astronomical 132mb/sec bandwidth. and since integrated IDE controllers are also on the pci bus, this is the very same bus where that data gets sent back after encoding again. hooray.
i'm usually around 100mb/sec readback when encoding with huffyuv, which is writing about 25mb/sec to the HD (all averages).
so short of reducing the amount of data that is read back and thus getting bandwidth requirements down, that's about as fast as you can expect it to get on an AGP system.
pci express to the rescue, but i don't have pci express, so i can't say how much of an improvement it is. there's always cpu usage and hd speeds to consider too.
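To put rough numbers on the readback limit described above, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel (32-bit framebuffer) and treats "mb" as 10^6 bytes; neither is stated in the thread, and the helper names are made up for illustration.

```python
# rough upper bound on capture rate from readback bandwidth alone
# assumptions (not from the thread): 4 bytes per pixel, 1 MB = 10**6 bytes

def bytes_per_frame(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame as read back from video memory."""
    return width * height * bytes_per_pixel

def max_fps(readback_bytes_per_sec, width, height, bytes_per_pixel=4):
    """Frames per second the bus could deliver if nothing else were limiting."""
    return readback_bytes_per_sec / bytes_per_frame(width, height, bytes_per_pixel)

READBACK = 100 * 10**6  # the ~100mb/sec average readback quoted above

for w, h in [(640, 480), (1024, 768)]:
    print(f"{w}x{h}: {bytes_per_frame(w, h)} bytes/frame, "
          f"at most {max_fps(READBACK, w, h):.1f} fps")
```

Under these assumptions a 1024x768 frame is about 3 MB, so a 60fps capture would need roughly 189 MB/s of readback, which is more than the 132mb/sec the pci bus offers in the first place.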
best thing to do performance-wise would probably be to convert it to YCbCr and subsample it on the gpu, then read THAT back and store it uncompressed. but as said, right now i've got more important things to do. if anyone is interested, by all means, code that :)
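A minimal CPU-side sketch of what that conversion plus subsampling amounts to, just to show where the savings come from. The thread does not say which conversion matrix or subsampling scheme would be used; the full-range BT.601 (JFIF) coefficients and the 2x2 chroma averaging (4:2:0) below are assumptions, and this is plain Python rather than shader code.

```python
# each output channel is one 3-component dot product of the rgb pixel
# assumed: full-range BT.601 / JFIF coefficients (not specified in the thread)
M = [
    ( 0.299,     0.587,     0.114),     # Y
    (-0.168736, -0.331264,  0.5),       # Cb
    ( 0.5,      -0.418688, -0.081312),  # Cr
]
OFFSET = (0.0, 128.0, 128.0)

def dot3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def clamp8(x):
    """Round and clamp to the 0..255 range of an 8-bit channel."""
    return max(0, min(255, int(round(x))))

def rgb_to_ycbcr(rgb):
    return tuple(clamp8(dot3(row, rgb) + off) for row, off in zip(M, OFFSET))

def subsample_420(pixels, width, height):
    """pixels: row-major list of (r, g, b); width and height must be even.
    Returns (y, cb, cr) planes with chroma averaged over each 2x2 block."""
    ycc = [rgb_to_ycbcr(p) for p in pixels]
    y_plane = [p[0] for p in ycc]
    cb_plane, cr_plane = [], []
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            block = [ycc[(by + dy) * width + (bx + dx)]
                     for dy in (0, 1) for dx in (0, 1)]
            cb_plane.append(clamp8(sum(p[1] for p in block) / 4))
            cr_plane.append(clamp8(sum(p[2] for p in block) / 4))
    return y_plane, cb_plane, cr_plane

w, h = 4, 2
frame = [(255, 255, 255)] * (w * h)
y, cb, cr = subsample_420(frame, w, h)
print(len(y) + len(cb) + len(cr), "bytes instead of", w * h * 3)
```

With 4:2:0 the two chroma planes are a quarter size each, so the data to read back drops from 3 bytes per pixel (24-bit rgb) to 1.5 bytes per pixel.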
hfr: i said it before, but i'll gladly repeat it again: i'm not going to add anything to kkapture that can be easily done with virtualdub. there's no point in wasting my time with adding features to kkapture that you already get in another free program, especially when it's as easy as opening the avi in virtualdub and configuring a few filters.
i can add a dummy encoder though.
Quote:
i'm not going to add anything to kkapture that can be easily done with another free program
indeed a good point.
on the other hand i'll implement the proposed features to v0.04 anyway - which was not yet necessary though, since all my captures worked perfectly with (minor tweaks of) v0.03 (which you can understand as a compliment).
so if other users happen to find this useful, too, i can provide the modified source to be kept in future versions of kkapture.
i was specifically talking about me :) - if you have something working and it's not totally hacked in, i'll gladly make it official.
hooray.
"after 5 years of silence"... well, not really. after quite some silence, i've finally found the time to make a new proper kkapture release: version 0.04.
this one really has a LOT of either long awaited or new but very useful fixes - to name the most important ones: automatic avi splitting in the video for windows avi writer [thanks bartman!], proper support for demos that change resolution or reinitialize the graphics api in between, and 64bit host machine support (though it can still only grab 32bit executables).
get it at http://www.farbrausch.de/~fg/kkapture, enjoy, and feel free to send me your bug reports :)
Hahaha!! How COOOOOL is that !??! :-D
Ryg, you simply _ruuuuule_ !!!
Keep it up...kkapture is the BEST ever!! :-D
Thanks for that, pal!
Bye,
Weasel
new kkapture? GREAT! =)
I don't know if it is possible, but if the bus is the bottleneck could you read back DXT-textures? DXT1 comes with 6:1 compression and no alpha, but i don't know if cards can WRITE that format...
Converting it to YCbCr and compressing it on the GPU might need v3.0-shaders, which quite few people have...
cards can only read, not write, dxt. besides, quality is quite low - for a tool which has archival purposes as one of its primary objectives anyway.
converting to ycbcr is easily done on anything with pixel shaders (3 dp3 instructions), though you probably want ps2.0 precisionwise (8bit matrix coefficients ain't that hot, for encoding anyway). anyway, that by itself doesn't save any data. color subsampling obviously does; and for the rest of what you'd expect in a video codec, most of the pixel processing stuff is moderately suited to gpus (assuming you have at least 16bit intermediate precision, i.e. ps2.0 cards min). however, virtually all of that increases dynamic range - in other words, generates >8bit output data, so it's actually MORE to read back. most of the control logic as well as the actual bitstream encoding stuff (which reduces data again) is very unsuitable for gpu-based processing. all that aside, implementing a whole gpu-based video codec seems like a LITTLE overkill to me. reading subsampled YCbCr pixels might help a bit on AGP systems, but right now that's not my main priority.
no, the right answer is to simply use a pci express system with decent cpu. my new machine just finished doing a 60fps huffyuv kkapture of fr-025 in 640x480, faster than realtime (average recording rate: 64fps). in 1024x768, it's mainly limited by hd write speed (and in parts where it's not, by huffyuv encoding).
besides, while kkapture is not realtime even with comparatively low resolutions on non-pci-express systems, i don't think it's exorbitant to take about 3-4x the running time of a demo to get a really high quality highres video. i still make changes to improve speed when they're obvious (like disabling vsync on most graphics apis which is new in kkapture 0.04) or easy to do, but it's not a main priority for me right now.
some more notes for speed junkies while i'm at it:
- when you're not bound by video grabbing speed (i.e. on a pci express system), you can pick your punishment: hd or cpu bound. unless you have a fast RAID lying around, you're unlikely to be able to stream out uncompressed video in realtime for typical resolutions. however, most video codecs are relatively slow, and lossy to boot. huffyuv (google for it) is a nice, lossless (in rgb mode) or very slightly lossy (in yuv mode) vfw codec that provides a data reduction of typically around 2.5:1 while being quite fast. and it's free, too.
- in case you're the lucky owner of a dualcore system, you're probably interested in making use of the extra computation power. don't worry, it's easy: use the directshow avi writer when possible. it's multithreaded, and some tests show that for most demos that kkapture at a decent speed (i.e. are not limited by rendering or other factors) the load gets distributed quite nicely on dualcore setups - aforementioned fr-025 kkapture in 640x480 has an average of ~87% cpu usage on both cores. the video for windows avi writer currently can't make use of dualcore machines - but buffering a few frames (like the dshow encoder does already) and doing the encoding in a different thread shouldn't be too hard. i'll probably add it in a later release.
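The "buffer a few frames and encode in a different thread" idea from the last note can be sketched as a bounded producer/consumer queue. This is only an illustration of the pattern, not kkapture's code: the function names are hypothetical, and zlib stands in for the real video codec.

```python
import queue
import threading
import zlib

def encoder_thread(frames, out):
    """Runs on its own thread: pull frames, 'encode' them, collect the results."""
    while True:
        frame = frames.get()
        if frame is None:                     # sentinel: capture finished
            break
        out.append(zlib.compress(frame, 1))   # stand-in for the real codec

def capture(frame_source, max_buffered=4):
    frames = queue.Queue(maxsize=max_buffered)  # bounded, so memory use stays capped
    encoded = []
    worker = threading.Thread(target=encoder_thread, args=(frames, encoded))
    worker.start()
    for frame in frame_source:
        frames.put(frame)    # blocks only when the encoder has fallen behind
    frames.put(None)
    worker.join()
    return encoded

# ten fake 64x48 rgb frames, each filled with a single byte value
fake_frames = [bytes([i]) * (64 * 48 * 3) for i in range(10)]
result = capture(fake_frames)
print(len(result), "frames encoded")
```

The bounded queue is the design choice that matters here: the grabbing side keeps running while the encoder works on earlier frames, but it stalls instead of eating all memory when encoding is the slower of the two.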
well, just an idea. Simply using a PCIe-machine sure is a solution. I'll wait with that till I'm not broke anymore ;-)
Thanks for the kkool tool anyway ryg!
when capturing asd's animal attraction, i get 4secs black-output in the avi right before that "recursive cube" (pouet screenshot).
happens with both, v3 & v4, so navis is probably just doing something nasty :)
any ideas..?
i ran into strange problems when capturing form factor to scissor some vj content.
0.04 alpha: if i let it exit normally, the beginning of the avi will be overwritten by zeroes, making the file unusable. if i press esc before it quits, the avi is okay (huffyuv, file size is 1.4gb)
0.04 final: freezes (raises an exception) with all possible combinations of parameters. the log says:
main: shutting down...
timing: 0.00 frames per second on average
main: everything ok, closing log.
this intro has a non-typical loader, perhaps that's causing some of the problems.
Getting an error message logged as
avi_dshow: no mux! <fn C:\temp\sts-03.avi>
avi_dshow: getfiltergraph failed
Any ideas?
Using a Geforce 7800GT
hellfire: hmm... i added code to handle glReadBuffer and FBOs correctly - didn't help though. any gl expert here with a clue what else could prevent glReadPixels from returning the correct values?
gem: which video encoder are you using, avi/dshow or avi/vfw? dshow can produce opendml videos (i.e. >2GB) but is relatively problematic at times (which means it doesn't cleanly interact with some programs). i've found vfw to work in a lot of cases where dshow doesn't, and since 0.04 the vfw writer automatically starts new files, so it shouldn't be a big problem.
scalarwave: that's the dshow writer failing to init - this, too, happens sometimes, and i have no clue why. just try using the avi/vfw writer instead.
Quote:
a clue what could prevent glReadPixels from returning the correct values
the only thing i could think of is rendering directly to the frontbuffer, thus leaving the backbuffer cleared - and possibly completely avoiding glswapbuffers.
swapbuffers is apparently performed, else kkapture wouldn't even advance the current time.
ryg: please note this post is rather informative, not demanding or anything (i read your other post and fully understand :)
here are some more demos I encountered that couldn't be kkaptured (neither 0.03 nor 0.04 worked).
candela - jeff
fairlight - liquid lust
black maiden - monorail
xplsv - tokyo (random freezing during the demo)
3dmark06 (random freezing, threading?)
also, 0.04 seems to have problems running intros compressed with kkrunchy (e.g. dead ringer) "not a supported executable format" (works with 0.03)
also some good news:
fairlight - come clean now works
3dmark05 now works (although out-of-sync audio needed manual fixing in sound forge)
great work!
jeff/monorail: will look into it.
liquid lust: i have a preliminary workaround for track one, if that fixes the problems for liquid lust too i'll stuff it as option into the dialog box.
tokyo/3dmark06: random freezing sounds bad (and yeah, might very well be a mt issue). i'll look into it, but no promises :)
"not a supported executable format": stupid bug related to some oversights in the new startup interception code, which cannot actually be disabled with the checkbox in 0.04 - forgot to write the required code :). both problems are already fixed in the current codebase, will work properly in 0.05.
out-of-sync audio: i'll probably be working on at least one big source of them for 0.05 (or maybe 0.06); actually there are several small things about the whole "encoding end" that cause bugs, diminished performance or both; i'll definitely work on that for the next releases (whenever they will be; don't ask me, i don't know :)
more goodies for 0.05: somewhat improved opengl code. didn't solve the problems with animal attraction, which it was meant to (navis? what are you doing there? :), but it might help with current or future ogl demos that cause problems with versions <=0.04 (no, i don't know of any, but that doesn't mean much :).
Threestate - Sonnet crashes when I try to capture it...
As the video link of the product is broken I'd like to do a new video...
ryg: To answer your question about animal attraction.
We got the same problem with Fraps - there are some 'scenes' that, for a few seconds appear to render a 'black' screen. What essentially happens there is that I (think) I have 3 context switches between pbuffers and probably another 3 render to textures. I think I was doing something like a texture cache 'warming', which made the scene switch work much better. I turned this feature off for when we recorded it again (with fraps) for scene.org - and I think the .avi version that is online hasn't got the problem.
Now all this sounds a bit bizarre to me also, but I remember that that was the only way that I found to make the scene switch good enough - on my machine at least... I hope that helps.
Btw congratulations about kkapture, it is "magical" !
A feature suggestion: Crop and resize ;)
navis, ok thanks, that helped :) you're doing SwapBuffers on a DC that isn't current, which is perfectly legal, but not expected by kkapture (or FRAPS either it would seem). added some code to track which rendering contexts belong to which DCs and automatically switch to the right RC before glReadPixels is performed. seems to work fine in my small test, i'm running a 60fps full kkapture on animal attraction as i'm writing this to check whether it runs through or causes any other problems (it shouldn't, but let's play it safe :).
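The DC-to-RC tracking described above could look roughly like the following simulation. Everything here is hypothetical: FakeGL only records which context is current and stands in for the real calls (presumably wglMakeCurrent and glReadPixels in the actual hooks), and restoring the application's previous context after the grab is an assumption, not something the post states.

```python
class FakeGL:
    """Stand-in for the wgl/gl calls; only tracks which context is current."""
    def __init__(self):
        self.current = (None, None)   # (dc, rc)

    def make_current(self, dc, rc):
        self.current = (dc, rc)

    def read_pixels(self):
        # a real grabber would return pixel data; here we return the
        # context that the read was performed on
        return self.current


class ContextTracker:
    def __init__(self, gl):
        self.gl = gl
        self.rc_for_dc = {}           # last rendering context seen on each DC

    def on_make_current(self, dc, rc):
        """Hook for the application's MakeCurrent call."""
        if dc is not None and rc is not None:
            self.rc_for_dc[dc] = rc
        self.gl.make_current(dc, rc)

    def on_swap_buffers(self, dc):
        """Hook for SwapBuffers: grab the frame that belongs to *this* DC."""
        previous = self.gl.current
        wanted = (dc, self.rc_for_dc.get(dc))
        if wanted[1] is not None and previous != wanted:
            self.gl.make_current(*wanted)      # switch to the right RC first
        frame = self.gl.read_pixels()
        if self.gl.current != previous:
            self.gl.make_current(*previous)    # assumed: leave app state untouched
        return frame


gl = FakeGL()
tracker = ContextTracker(gl)
tracker.on_make_current("window_dc", "main_rc")
tracker.on_make_current("pbuffer_dc", "pbuffer_rc")   # app renders offscreen
grabbed = tracker.on_swap_buffers("window_dc")        # swap on a non-current DC
print(grabbed, gl.current)
```

Without the switch, the read would happen on whatever context the demo left current (the pbuffer in this example), which matches the black frames reported for that scene.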
i guess i'll release 0.05 pretty soon then, doesn't include some of the functionality i wanted it to, but the number of important bugfixes warrants a new release.
tell me if i'm just being silly, but why not make kkapture a codec? (beside the obvious horrible effort)
include some dummy AVI file with a KKAP header that brings up the kkapture dialog, and stream all output straight into (for instance) virtualdub. or something :-)