GL driver bugs
If there's any bug in a publicly available OpenGL implementation you're using that has been getting on your nerves for a long time and that you can't get your vendor to fix, here's a chance to let off some steam & make the world a better place to live. I'd appreciate short code snippets showing how to reproduce the problem.
Any behavioral differences between implementations that cannot be explained by spec language or by different hardware capabilities would also be much appreciated. For example, an implementation on which uniform values are not set to 0 at link time definitely has a glitch. Different uniform locations, on the other hand, are *not* a bug :)
This could serve as a knowledge base for other coders. In the meantime, I'm hoping to pull some strings to improve the situation on the driver front.
Great idea.
Obviously on Windows there has been a lot of talk about differences between drivers too, although you didn't mention specific platforms, which is a good thing :)
I'd have to dig through either my notebooks or my code, but in my limited experience, in at least one case (when I was doing something wrong) the NVidia driver would silently fail whereas the ATI one would crash. I suspect it had to do with malformed VAOs/VBOs, or possibly RBOs. I'm sure I've made the ATI DLL throw an exception before, too.
Since encountering those things, I try to run/debug my code on both my dev host (ATI) and a second machine with an NVidia card in it, which is obviously a good thing to try when looking at an issue.
Recently I've been trying to compile shaders in a thread that uses a shared context. This simply crashes, albeit only on an old Intel HD Graphics driver (no updates available) for Windows 7 x64. It works fine on a recent Nvidia chip, and on Linux...
This GLSL fragment shader used to crash Nvidia drivers a few months ago by creating an infinite loop, leading to a TDR timeout on Windows. It seems to be fixed in the latest version; not sure when it got fixed. The bug is triggered by the two loop variables having the same name. Tells you a lot about that compiler.
Code:
#version 450
void f() {
    for (int i = 0; i < 0; i++) {}
}
void main() {
    for (int i = 0; i < 2; i++) {
        f();
    }
}
So, reading between the lines, is the problem with ATI?
When I last did it, I had a 'render thread' separate from the message pump/main window stuff. (The window can be moved around and the thread still renders to it without pausing, which is desirable.) It's 'just' a case of making sure you initialise and uninitialise the right things. A quick look at a code backup shows a comment saying you have to call wglMakeCurrent(NULL, NULL) in the thread that is relinquishing the context before making it current in another thread.
(That was a reply to raer)
Here is a short analysis of the bugs cupe just mentioned. That thing is really a bad one.
I prefer to use DX/HLSL currently.
@Canopy: Intel IGP (HD Graphics) on an i5-520M, latest (old, obviously) driver. I use Qt5 (5.4.1) as a framework. I have two separate but shared contexts in two threads: one for GUI and rendering, one for compiling shaders, all mutexed in the appropriate places. The contexts are valid and the compile context is made current in the compile thread (only there), but the following glCompileShader call (AFAIR) simply crashes. If I use the compile context from the main thread (which then obviously blocks), it does not crash.
I couldn't figure out what I was doing wrong, but when I tried it on a different Windows PC (same executable) and/or under Linux, it ran flawlessly, so I guess either Qt5 is doing crap in between or the driver b0rks out...
Hmm, now that I think about it, before I put the ATI card in my main desktop a few years ago, it also had a similar (or the same) Intel integrated chip on the motherboard, and I may have seen this too. At that point I was using a very small amount of C boilerplate code with nothing else in the mix. (That was before I had an active 'dev journal' notepad thing going, though, so no notes to look back at.)
I just finished converting a few hundred music visualizer scenes, with multiple shaders per scene, from CgFX to GLSL, and here is what I ran into. Mostly the issues arise because Nvidia accepts far more in GLSL than the specification actually allows.
Intel HD 3000 OpenGL 3.1 latest drivers v15
- Doesn’t support interface blocks between shader stages (difficult to find out whether this is actually allowed in GL 3.1; input/output interface blocks only appear in the GLSL 1.50 spec, which goes with GL 3.2)
- Swizzling a scalar (singlefloat.xxx) is not supported; it needs to be “vec3(singlefloat)”
- Doesn’t allow const variables initialized from non-constant data, like “const float v = x*5.0”
Nvidia
- Allows fragment-shader-only code (like dFdx/dFdy) in other stages as long as it is not used
- Allows 0 as a float value (should be 0.0)
- Allows arrays to be initialized using { } syntax; should be “float[4](...)”
- Allows use of reserved words like “vec2” as variable names
This GLSL snippet produces incorrect output on Nvidia by ignoring a normalize(). It has been in the driver for at least a year; I tried to report it a year ago.
Code:
#version 450
out vec4 outColor;
vec3 f() {
    vec3 d = vec3 (1,0,0); // choice of value doesn't matter
    d = normalize(d);      // d now has length 1
    d += vec3(1,0,0);      // d now has length 2
    // the following line has no effect (probably "optimized" - happens only if the above normalize is present)
    d = normalize(d);
    // using this instead works as expected:
    //d /= length(d);
    return d; // the function now returns vec3(2,0,0), which is wrong
}
void main() {
    // if the content of the above function is pasted in here instead, the code works as expected; needs to be inside a function for the bug to be triggered
    outColor.rgb = f();
}
I totally agree that this is a horrible bug, but I also have to say that it's really bad style to require two normalize calls here (if d becomes non-constant, this can be quite expensive to evaluate).
Things regarding the "reproducer":
- Does this also happen for "dynamic" variables? This could be a bug that only happens with constant evaluation. (In this minimal reproducer, f() can be evaluated to a constant value; the optimizer should do this, and if it doesn't, the compiler sucks.)
What I think happens: the compiler has some kind of lost update and misses the "+=", so most probably something like this happens:
Code:
vec3 d = vec3(1,0,0);
d = normalize(d); // Seriously? This normalize can directly be kicked out (iff d = (1,0,0), because length(d) = 1)
d += vec3(1,0,0); // The compiler does not record the "is not normalized anymore" change on the "+=" operator, while d is still changed
d = normalize(d); // The "modified"-flag (or whatever) on d didn't change, this normalize gets simply optimized out, since d has been normalized before. Oooopps. Boom.
Should be easy to fix if "d = d + vec3(1,0,0)" works.
Recommendation of the day: Don't do something like this. :D
If your stuff is constant, tell your compiler, even if the current compiler version does not really care about it.
(@cupe: I know, you know all this. :))
It happens for non-normalized or non-const input values as well. That's what I meant by "choice of value doesn't matter" - sorry, I could have been clearer. "d = d + ..." works as expected; it's the "+=".
Of course it's easy to fix, but also easy to miss. And I don't know if there are any other conditions that have to be met; in our case a piece of code in the analytic sky model triggered it, sometimes producing a weird discontinuity in the cloud colors.
Whoops. I found a condition that is necessary to trigger the bug in the last snippet. Sorry for the incomplete code. Anyway: a line like
Code:
const vec3 test[1] = vec3[](normalize(vec3(sqrt(1)*0, 1, 0)));
Has to be present somewhere in the code. Note that:
- The array is not used at all.
- If the normalize() inside the array definition is removed, the bug is no longer triggered.
- If the sqrt(1)*0 term is removed, the bug is no longer triggered. I don't even.
In the earlier version I had forgotten to remove an include file which defined a similar array (longer, and actually containing meaningful numbers), but this line is as small as I could get it. So, this time, here is the complete fragment shader code for reproducing the problem:
Code:
#version 450
out vec4 outColor;
// if this line is removed, the bug is no longer triggered
const vec3 test[1] = vec3[](normalize(vec3(sqrt(1)*0, 1, 0)));
vec3 f() {
    vec3 d = vec3 (2,0,0); // choice of initial value doesn't matter, it can be non-constant
    d = normalize(d);      // d now has length 1
    d += vec3(1,0,0);      // d now has length 2
    // the following line has no effect
    // (probably "optimized" - happens only if the above normalize is present)
    d = normalize(d);
    // using this instead works as expected:
    //d /= length(d);
    return d; // the function now returns vec3(2,0,0), which is wrong
}
void main() {
    // if the content of the above function is pasted in here instead, the code works as expected:
    // needs to be inside a function for the bug to be triggered
    outColor.rgb = f()*0.5; // *0.5 so you can actually see the difference visually
}
On older drivers, the array definition was not necessary to trigger the bug, IIRC. It's probably best to avoid arrays in GLSL on Nvidia completely - they are really weird performance-wise anyway. Blergh.