"Optimize" 4klang tune?
category: music [glöplog]
Punqtured: No, I haven't tried Quiver before, and my 30% CPU comment was tongue in cheek about "some random shitty VST synth that I'm pretty sure exists somewhere" :). But I just downloaded the beta and played around a bit - nice. I like how it doesn't pretend to be analog, so you've got quite an amount of flexibility there in the osc section. Nice presets also.
Sure, 4klang is coded the way that makes it smallest in an intro in the end, so it's eating up a lot of CPU to achieve the best possible sound while wasting as few bytes as possible!
But what has a 4k been nowadays and for years? -> pixel-shader stuff rendered on the GPU instead of on the CPU!
So why not waste CPU if it's smaller in the end? You don't need the CPU for the visualizations!
v2 and 4klang aren't really comparable! As kb pointed out -> 300 MHz CPUs! And the difference between 64k and 4k is way more than just the size!
It's like trying to compare the results of kkrunchy vs. crinkler at different file sizes.
It's more than just the size; it's how you have to treat your data/bytes.
CPU speed doesn't matter, whether it's 300 MHz, 1200 MHz or 4 MHz. Only the time spent precalcing the shit differs :-p
kb_: Ah, I might have jumped to conclusions there, then. Sorry 'bout that ;-)
I agree that the oscillator section is where it really stands out. That, and the actual structure of it. It has the potential to offer unlimited flexibility, and once you get the hang of all the possibilities, it's actually quite fast to work with. So for as long as we're able to use it for intros and 32k executable music, I think we might have an advantage over most other synths used in those categories. Well - sound- and feature-wise, that is. The success of a production always comes down to how good the song is, where - in my opinion - sound quality and technical achievement rank second. At least when speaking strictly of demoscene-related use.
(strictly talking about music for productions - not the visuals)
revival: I tried Quiver, but unfortunately I hit a snag; I don't have Windows, so I use these things in WINE, and for some reason, the Quiver UI really doesn't like WINE. Every screen update (showing the window initially, response to buttons, change to preset tab, or even a simple redraw) needs 5–10 seconds of max CPU usage, which makes the whole UI pretty much unusable.
I don't think WINE users will be a very important customer group for you, but at least now you know. :-)
Sesse: Thanks for giving it a go. Actually, I've noticed that the UI is somewhat slower on Windows than on OSX, so I am planning to take a look at that eventually; I may be doing something wrong.
Well, the only hint I can give is that wineserver, which is the process usually dealing with shared resources (e.g. bitmaps, handles, and other stuff that can be accessed by multiple processes), is eating about as much CPU as the VST host process itself. So most likely you are doing something with GDI that it doesn't like.
Oh, and the second most likely reason: very frequent mutex use instead of critical sections (mutexes can be shared across processes, and thus need to go through wineserver).
revival: I did a quick hack replacing all your mutexes by critical sections. It's perfectly smooth in WINE now (everything updates like 100x as fast; the only delay is switching between tabs, and that's also much better).
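For anyone curious what such a swap can look like: here's a minimal sketch (names are made up, not Quiver's actual code), assuming the mutexes were only ever taken by threads of the same process - a critical-section-backed lock plus a scoped guard.

// Minimal sketch, not Quiver's actual code: a process-local lock backed by a
// CRITICAL_SECTION, usable in place of a Win32 mutex that was only ever taken
// by threads of the same process.
#include <windows.h>

class Lock {
public:
    Lock()  { InitializeCriticalSection(&cs_); }
    ~Lock() { DeleteCriticalSection(&cs_); }
    void acquire() { EnterCriticalSection(&cs_); }  // uncontended case stays in user space
    void release() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION cs_;
};

// Scoped guard so the lock is always released, even on early return.
class ScopedLock {
public:
    explicit ScopedLock(Lock& l) : lock_(l) { lock_.acquire(); }
    ~ScopedLock() { lock_.release(); }
private:
    Lock& lock_;
};

What you give up compared to a mutex handle is WaitForSingleObject-style timeouts and cross-process sharing; for pure intra-process UI locking that usually doesn't matter.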
Sesse: Haha, that's very cool, thanks. I must admit I'm not much of a win32 API fiend, so I have to read up on the difference between the two.
A critical section is a way to flag a piece of code that has to run uninterrupted (i.e. no other threads/interrupts kicking in), so as to safely update or touch global state/resources. Conceptually it's not much different from using a mutex, but the internal implementation probably differs. Sesse? :)
pierrebeauregard: So, the only difference between critical sections and mutexes in Windows is that mutexes can be shared between processes and critical sections can not.
This means that they are semantically very similar; there's no “no other threads can be scheduled” or anything like that, both are locks on your own structures/sections and nothing else. The fact that mutexes can be shared between multiple processes has very important ramifications, though: It basically means the kernel _must_ be involved in some way, since if a process dies (or gets killed), all of the mutexes it holds must be released.
For Windows, this is OK enough, but for WINE, it is a disaster: There's no way you can implement that on top of the Linux kernel without patching it, so if you implemented a mutex by just an integer in shared memory you'd risk deadlocks when something died. This means that every mutex _must_ be owned and handled entirely by the wineserver (a separate process doing many of the same things the kernel would do on Windows). This means that every single call to take a mutex (or release it) must be implemented by messaging the wineserver and asking for the mutex to be taken — which obviously is very slow! (Serialize a message and send it over the socket, context switch to the wineserver, etc. etc.)
I think there is a small performance loss with mutexes on native Windows too (critical sections can usually be taken without having to involve a jump into the kernel, whereas this is harder with mutexes for some reason I don't know enough about), but it's nowhere near the insane slowdown you get with WINE. :-)
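To make the contrast concrete, here is a rough sketch of the two primitives side by side (the mutex name below is just an example, and the functions are illustrative):

#include <windows.h>

// Process-local: the critical section is just a struct in this process's
// memory, and the uncontended fast path is a user-space atomic operation.
static CRITICAL_SECTION g_cs;

void init_locks() { InitializeCriticalSection(&g_cs); }  // once, at startup

void touch_local_state() {
    EnterCriticalSection(&g_cs);
    // ... state shared only between threads of this process ...
    LeaveCriticalSection(&g_cs);
}

// Cross-process capable: a named kernel mutex. Any process that knows the
// name can open it, so the kernel (or wineserver under WINE) must track the
// owner and clean up if that process dies - and every acquire/release is a
// kernel call (or, under WINE, a round trip to wineserver).
void touch_shared_state() {
    HANDLE m = CreateMutexA(nullptr, FALSE, "Local\\ExampleSharedMutex");
    WaitForSingleObject(m, INFINITE);
    // ... resource shared with other processes ...
    ReleaseMutex(m);
    CloseHandle(m);
}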
Thanks, that's very clear (and a much needed knowledge refresher -- it's been years since I touched threading code).
That's very interesting Sesse, thanks. I know we're getting totally derailed from the original topic here, sorry. I was thinking that I don't really do that much locking in the synth (maybe a couple of locks per frame, on the order of 100-1000 locks a second), and couldn't understand the performance hit you described.
Until I remembered that the Lua VM I'm using for the UI is set up to support multithreading (multiple threads can access the VM at the same time). This works by having every potentially dangerous Lua instruction take a lock when it's changing the VM state. Naturally this turns out to be quite a bit of locking :-) And I just naively used mutexes as the locking primitive, not knowing any better. I had actually been thinking that the UI seemed a bit slower on Win than Mac, so I'm interested to see if this changes anything in that regard. Thanks a bunch for the heads up!
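For reference: if the embedded VM is stock Lua built from source, the place this locking gets plugged in is the lua_lock/lua_unlock macros (no-ops by default). A hypothetical sketch backing them with a critical section instead of a mutex, using a single global lock for brevity (a real build would normally hang the lock off the Lua state):

// Hypothetical sketch, not Quiver's actual code.
#include <windows.h>

struct lua_State;  // opaque; the hooks only pass the pointer through

static CRITICAL_SECTION g_lua_cs;

extern "C" void my_lua_lock_init(void)      { InitializeCriticalSection(&g_lua_cs); }  // call once before creating the state
extern "C" void my_lua_lock(lua_State* L)   { (void)L; EnterCriticalSection(&g_lua_cs); }
extern "C" void my_lua_unlock(lua_State* L) { (void)L; LeaveCriticalSection(&g_lua_cs); }

// In luaconf.h (or a header included before the Lua sources), together with
// extern declarations of the functions above:
//   #define lua_lock(L)    my_lua_lock(L)
//   #define lua_unlock(L)  my_lua_unlock(L)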
Well, yeah, in general don't fear locking (as you say, 100–1000 locks per second is nothing). Taking an uncontended lock will take on the order of 20 cycles (about an L1 cache miss); I'm sure that if you switch to critical sections wherever you can, the problem will basically just go away. They're easier to work with anyway. =)
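And if anyone wants to sanity-check the "uncontended lock is cheap" claim, a deliberately naive single-threaded micro-benchmark sketch (exact numbers will vary by CPU and by Windows vs. WINE, this only shows the order of magnitude):

#include <windows.h>
#include <cstdio>

int main() {
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);

    const int kIterations = 10000000;
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    for (int i = 0; i < kIterations; ++i) {
        EnterCriticalSection(&cs);   // always uncontended: there is no other thread
        LeaveCriticalSection(&cs);
    }
    QueryPerformanceCounter(&end);

    double seconds = double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);
    std::printf("%.1f ns per lock/unlock pair\n", seconds * 1e9 / kIterations);

    DeleteCriticalSection(&cs);
    return 0;
}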