How to control multicore in C
category: general [glöplog]
Hi,
I don't have a multicore processor, but I've read about fork+pipe somewhere. Is it enough to launch two processes (or more) on different processors and communicate between them? Maybe it is system-dependent?
What I have in mind is not really demo-oriented (sorry) but relates to exhaustive exploration of game trees.
ThanX in advance, Alain
yes, you can use processes and also threads.
for cross platform code there is the posix api (pthreads).
if you use processes then you need some way to communicate, either stdio or with files... or with some ipc api.
Usually, the OS will evenly distribute the processes amongst the cores, unless you set the processor affinity for a process. Setting the processor affinity requires OS dependent code.
that makes me wonder (again - i'm like a broken parrot), are you aware of any demo that does use multicore / is multithreaded ?
i guess it would only be useful for software demos, e.g. raytracers (that don't run on shaders). mostly the gpu is the bottleneck, no?
maybe for demos showing off hardcore physics simulation, although the PhysX engine is now accelerated by nvidia 8800+ GPUs.
The only test I've done is with cpu raytracing:
http://www.pouet.net/prod.php?which=27041
I did it with threads. With threads the communication is very easy, since the program and the threads share the same memory. It is just a matter of launching the threads and waiting until they finish. The OS does the work of sending the threads to different cores (I suppose it sends them to the "most free" core at that moment).
Just one piece of advice: if you don't have a multicore processor, don't try to optimize for multicore. Multicore coding is a bit tricky. If you access the same memory area from different cores at the same time, it works, but extremely slowly. I'm not exactly sure how it is done, but I believe it is part of the multicore CPU design: the cores have to keep their caches coherent, so the same cache line ends up bouncing between them. Well, the thing is, you should take a lot of care with that, duplicate data if necessary, and then test it on a multicore machine... if you do it wrong, the result could be slower than the single-core version.
only had a few basic posix thread lessons at school, but i remember well the nasty case called deadlock, where threads wait for each other forever...
I used multiple CPUs in one of the effects in "Regus Ademordna" by using some OpenMP directives. It worked out quite well, I got around 80% speed-up on my dual-core laptop.
I've been using multiple cores on some effects for one of our demos in production, using win32 threads. It's working great.
The CPU is a massively underused resource these days. The problem is the setups out there vary so enormously. It's ok to demand a good GPU - an 8800 or so - and be reasonably sure of a base level. But CPUs people use in their demo-watching machines vary so much in terms of speed and core count that it makes it hard to plan.
mmm, i would say the contrary: multicores are getting old, whereas the 8800 is still pretty recent and high-end. a machine with an 8800+ almost certainly features a multicore too, whereas one with a multicore may still have an older GPU.
the demoscene audience may differ a bit from the gamers, but if you look at the latest steam hardware survey: 41% of steam users have at least 2 cores, while 'only' 10% own an 8800 (more powerful models don't even seem to appear).
and personally i'm still stuck with a leet SSE2-less sempron and an AGP(!) 7800 GS :]
Thank you for your kind answers.
Well, i'm coding at home, but computers are a lot underused as smash says, so they bought tons of dualcore compys at work (math teacher, argh) and i (ab)use them for just what computers were meant to do... Compute. So i have access to dualcore systems, and speeding things up is important...
An easier way for me would be to play all 1st moves for white on the queenside (remember, it's about chess) and the ones on the kingside in two different m$do$ command lines, output the (light) results to files that i can merge afterwards. But that's inelegant. If i understand correctly, processes (i.e. fork()) are similar to this approach, while threads can share the same memory?
chess or other games i mean, but mostly chess
I'm not the best Windows coder here, but here a simple example to create a multithreaded program and to divide one function to be done in parallel:
Code:
#include <windows.h>

#define THREADS 4

struct data
{
  int thread_number;
  // more data
};

struct data d[THREADS];
HANDLE ths[THREADS];

DWORD WINAPI myfunction(LPVOID lpParam)
{
  struct data *td = (struct data *)lpParam;
  // your function here:
  // divide the work by the number of THREADS
  // and do the part corresponding to td->thread_number
  return 0;
}

int main()
{
  int n;
  for (n = 0; n < THREADS; n++)
  {
    d[n].thread_number = n;
    ths[n] = CreateThread(NULL, 0, myfunction, &(d[n]), 0, NULL);
  }
  // wait until every thread has finished, then release the handles
  WaitForMultipleObjects(THREADS, ths, TRUE, INFINITE);
  for (n = 0; n < THREADS; n++) CloseHandle(ths[n]);
  return 0;
}
nevertheless, the steam hardware survey's GPU stats are probably biased by the flock of CS players who don't need a big one.
zest: 61% has a shader model 3 videocard though
yes, if you read the survey you'll notice nearly 60% are still on 1 cpu, and also > 60% are on a ps3.0-capable gpu. that survey is a little out of date though, by the looks of things.
it's pretty reasonable these days to ask for a dx10-capable gpu for running demos. if you have one of the lower end ones youve usually got the option to drop the res and still get it running at a decent rate - the joy of the g80 and up is that the low end hardware is just like the high end hardware, but with less shader cores and maybe slower clocks.
with cpus such inherent scaling is harder, and the difference is enormous - a P4 3GHz (which is still pretty common and sufficient for most demos) is, at a guess, potentially something like 1/10th the speed of an intel quad core 2.4 (which is also getting quite common in new pcs).
the fast, 4 core+ cpus are potentially very useful tools, you can do some interesting stuff on them. if only it were more standard.
yup that's a pretty good score actually :)
One thing: At least under Windows, don't start, stop, suspend and resume threads too often. XP's scheduler SUCKS for this (don't know about Vista's thread pools tho), so open up a few helper threads and keep them running. No prob with long mathematics stuff, but eg. for demo usage even suspending and resuming a thread a few times per frickin' frame can be problematic, even (and sadly sometimes: especially) on multicore CPUs. Best bet is probably starting the threads at the beginning, then preparing workload for them and then waking them up (for example via an Event) once. Oh, and use timeBeginPeriod(1) and timeEndPeriod(1).
kb_: I remember trying the impact of different numbers of threads with my code. With 256 threads instead of 4 (so, creating 256 threads per frame instead of 4) the speed decreased by about 5% in Windows XP. It doesn't look like a very big decrease... I suppose it is possible to create about 100k threads per second...
this new situation you are describing highlights the upcoming GPU vs CPU war and the return of raytracing and software rendering.
it's really interesting to follow as an amateur, but choices may become hazardous for some big hardware companies and videogame publishers/studios in a few years...
texel: mm - i noticed a pretty big hit creating threads per frame. instead i created mine on startup and used events to wait and wake them. there might of course be a better way though.
dunno if i'm in a position to tell more about it, but our upcoming 4k will make use of dualcores (2 threads) for some special purpose!
doesn't make it easier having to crinkler several versions in several resolutions anyway... 4k ;) double the work amount!
but it'll pay off for ppl with dual-core CPUs! :)
this is still futurology but do you consider possible/foreseeable that the next generation consoles (like let's say the so-called xbox 720) will be CPU-only with 8-32 cores ?