pouët.net

OpenGL, showing lots of 2d tiles

category: code [glöplog]
 
Hi,

I'm on my way to teaching myself state-of-the-art OpenGL (>= v3.0). I want to blit lots of different tiles from a single tileset. Let's assume RGBA 8888 for the pixel format. Each tile can have its own transform (translate/rotate/scale). From what I've read and understood, the best practice is to batch. Here's how I do it:

= setup =
* Create a single texture for the tileset
* Create a VBO, built so that it has one quad per tile in the tileset (vertex coordinates + texture coordinates + indices)
* Create a shader program that applies the transform, the transform being specified through uniform variables
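A minimal C sketch of the VBO-building step, assuming the tileset texture is a regular grid of cols x rows equally sized tiles with row 0 at v = 0. The TileVertex type and build_tileset_quads are hypothetical names. It fills the vertex data that would then be uploaded once with glBufferData:

```c
#include <assert.h>

/* Hypothetical vertex layout: position (quad centred on the origin) + texture coordinate. */
typedef struct {
    float x, y; /* position, in tile-local units */
    float u, v; /* texture coordinate inside the tileset texture */
} TileVertex;

/* Hypothetical helper: fills out[4 * cols * rows] with one quad per tileset tile,
 * tiles in row-major order. Assumes a regular cols x rows grid of w x h tiles. */
void build_tileset_quads(int cols, int rows, float w, float h, TileVertex *out)
{
    assert(cols > 0 && rows > 0);
    const float du = 1.0f / (float)cols, dv = 1.0f / (float)rows;
    /* Corner order: bottom-left, bottom-right, top-right, top-left. */
    const float cx[4] = {-0.5f, 0.5f, 0.5f, -0.5f};
    const float cy[4] = {-0.5f, -0.5f, 0.5f, 0.5f};
    for (int t = 0; t < cols * rows; ++t) {
        int col = t % cols, row = t / cols;
        for (int c = 0; c < 4; ++c) {
            TileVertex *vtx = &out[4 * t + c];
            vtx->x = cx[c] * w;
            vtx->y = cy[c] * h;
            vtx->u = ((float)col + cx[c] + 0.5f) * du;
            vtx->v = ((float)row + cy[c] + 0.5f) * dv;
        }
    }
}
```

Tile t then occupies vertices 4t to 4t+3, which is the range a per-tile glDrawElements call would index into.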

= blitting =
* set the tileset texture, the shader, the VBO, etc.
* for each tile to be shown:
  * call glUniform to set the transform specific to the current tile
  * call glDrawElements to draw the current tile
* clean up whatever needs cleaning up

Question => Is it completely wrong? Is it close to optimal? Do you know a better way?
if your VBO is filled with the tile data already, why do you need several drawcalls?
added on the 2012-01-31 08:07:21 by Gargaj Gargaj
Let's say my tileset contains 5 tiles: T1, T2, T3, T4 and T5. The VBO defines T1 to T5 as quads centered on (0, 0).

So if I want to blit T3 at 42 different screen locations, my approach does this for each location:
* call glUniform to set the tile's screen location
* call glDrawElements to blit T3 at that location
You'll probably get a performance boost by not using uniforms but an attribute buffer (or just the normal array, if you have no other use for it) to store all the transforms, then sending them to the GPU and rendering them all with a single call.
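A rough C sketch of that idea, assuming an interleaved layout where every corner of a drawn tile carries a copy of its transform. QuadCorner, TileXform, BatchVertex and fill_batch are hypothetical names:

```c
#include <assert.h>
#include <stddef.h> /* offsetof, for describing the layout to glVertexAttribPointer */

/* Hypothetical types: a corner from the tileset VBO, the per-tile transform that
 * was previously sent with glUniform, and a batched vertex combining both. */
typedef struct { float x, y, u, v; } QuadCorner;
typedef struct { float tx, ty, angle, scale; } TileXform;
typedef struct { QuadCorner corner; TileXform xf; } BatchVertex;

/* Hypothetical helper. tileset: 4 corners per tileset tile.
 * tile_of[i]: tileset tile used by the i-th drawn tile; xf[i]: its transform.
 * out must hold 4 * n vertices. */
void fill_batch(const QuadCorner *tileset, const int *tile_of,
                const TileXform *xf, int n, BatchVertex *out)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        for (int c = 0; c < 4; ++c) {
            out[4 * i + c].corner = tileset[4 * tile_of[i] + c];
            out[4 * i + c].xf = xf[i]; /* same transform on all 4 corners */
        }
    }
}
```

The transform part would be described to the shader with something like glVertexAttribPointer(loc, 4, GL_FLOAT, GL_FALSE, sizeof(BatchVertex), (void *)offsetof(BatchVertex, xf)), and the whole batch drawn with one glDrawElements call. Duplicating the transform on 4 vertices costs memory; instancing avoids that.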
added on the 2012-01-31 08:12:12 by msqrt msqrt
if your number of screen locations is finite, you might as well just center the quads on their final location i suppose? keeping your drawcall count low is usually a good idea.
added on the 2012-01-31 08:12:53 by Gargaj Gargaj
Oh, what you seem to be doing is best achieved with a VBO containing all the tiles on screen, each referencing its tileset tile through some attribute (again, normals or a dedicated attribute buffer).

So you create a VBO of your world, then inside the vertex shader you check which tile is to be used and compute your texture coordinates accordingly.
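The texture-coordinate lookup the vertex shader would do can be sketched in C, assuming the tileset is a cols x rows grid, each vertex carries an integer tile index and a corner id 0..3, and row 0 sits at v = 0. tile_uv is a hypothetical name; in GLSL the same arithmetic would run on integer attributes, for example fed with glVertexAttribIPointer:

```c
#include <assert.h>

/* Hypothetical helper mirroring the vertex shader lookup: tile index + corner id
 * (0..3) -> texture coordinate in a cols x rows tileset. */
void tile_uv(int tile, int corner, int cols, int rows, float uv[2])
{
    assert(tile >= 0 && tile < cols * rows && corner >= 0 && corner < 4);
    int col = tile % cols, row = tile / cols;
    /* corner 0 = (0,0), 1 = (1,0), 2 = (1,1), 3 = (0,1) in tile-local units */
    int cu = (corner == 1 || corner == 2);
    int cv = (corner >= 2);
    uv[0] = (float)(col + cu) / (float)cols;
    uv[1] = (float)(row + cv) / (float)rows;
}
```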
added on the 2012-01-31 08:15:27 by msqrt msqrt
I see, so you guys suggest using one VBO containing every instance of every tile, and then passing per-tile attributes. Same amount of data transferred, but a single glDrawElements call draws the whole thing. Let's code this, thanks for the quick answers ^^
Yeah, same amount of data, but it's a single large bunch and you don't have to resend it every frame :) And fewer commands have to go through to the GPU, too. The less you send back and forth, the better.
added on the 2012-01-31 08:44:27 by msqrt msqrt
@jmagic I quickly went through the code you linked; it looks like what's discussed above, which is encouraging. Why compute the individual transform matrix for a tile/sprite on the CPU? In my own code I do it on the GPU, mainly because it was easier and shorter to code ^^ Also, I use quads, not triangle strips; is that a big deal?
dunno how opengl handles quads internally, i suppose they end up as tristrips anyway?
added on the 2012-01-31 10:51:30 by Gargaj Gargaj
marmakoide, the code can also take transformation matrices directly, so there's only one shader to write. 2D transforms on the CPU are quite fast anyway.
added on the 2012-01-31 13:04:42 by jmagic jmagic
Yes, it's not the main point of the approach, just my curiosity talking: "why this", "why that" :) I will code that "O(1) calls to blit the whole batch" approach tomorrow, and thanks to your code I even have a reference. We'll see the gain; learning is exciting ^^
Using instancing might be a better way to do your rendering.
The gl_VertexID and gl_InstanceID variables in the vertex shader, or glVertexAttribDivisor, might help reduce OpenGL function calls and memory use.
Please see "2.8 Vertex Arrays" in the OpenGL 3.3 specification or later, or
http://www.pouet.net/topic.php?which=7782&page=1&x=18&y=14
for more detail.
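To illustrate what gl_InstanceID, gl_VertexID and glVertexAttribDivisor buy you, here is a C emulation of what a vertex shader could compute for glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, n), assuming a single per-instance attribute (position + size) set up with glVertexAttribDivisor(loc, 1). The corner comes from gl_VertexID, so no per-vertex buffer is needed. Instance, instanced_corner and emulate_instanced_draw are hypothetical names:

```c
#include <assert.h>

/* Hypothetical per-instance attribute, advanced once per instance (divisor 1). */
typedef struct { float x, y, size; } Instance;

/* What the vertex shader could compute: the corner comes from gl_VertexID
 * (0..3, strip order: bottom-left, bottom-right, top-left, top-right),
 * position and size from the attribute fetched for gl_InstanceID. */
void instanced_corner(int vertex_id, const Instance *inst, float out[2])
{
    assert(vertex_id >= 0 && vertex_id < 4);
    float cx = (float)(vertex_id & 1) - 0.5f;
    float cy = (float)(vertex_id >> 1) - 0.5f;
    out[0] = inst->x + cx * inst->size;
    out[1] = inst->y + cy * inst->size;
}

/* Hypothetical CPU stand-in for glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, n):
 * writes 4 corners per instance into out[4 * n]. */
void emulate_instanced_draw(const Instance *inst, int n, float (*out)[2])
{
    for (int instance_id = 0; instance_id < n; ++instance_id)
        for (int vertex_id = 0; vertex_id < 4; ++vertex_id)
            instanced_corner(vertex_id, &inst[instance_id], out[4 * instance_id + vertex_id]);
}
```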

If "state-of-the-art OpenGL (>= v3.0)" means the core profile without deprecated features, you must draw a quad as 2 triangles.
GL_QUADS is deprecated.
But I don't know whether GL_TRIANGLE_STRIP or GL_TRIANGLE_FAN is faster than GL_QUADS.
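In the core profile, the usual replacement for GL_QUADS is an index buffer with two triangles per quad. A minimal C sketch, assuming each quad's 4 vertices are stored consecutively in counter-clockwise order (bottom-left, bottom-right, top-right, top-left) and 16-bit indices. build_quad_indices is a hypothetical name. Splitting along the 0-2 diagonal gives the same triangles as drawing each quad as a 4-vertex triangle fan:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes 6 indices per quad into out[6 * quad_count]. */
void build_quad_indices(int quad_count, uint16_t *out)
{
    /* Highest index is 4 * quad_count - 1, which must fit in 16 bits. */
    assert(quad_count >= 0 && quad_count <= 65536 / 4);
    static const int tri[6] = { 0, 1, 2, 0, 2, 3 }; /* two triangles sharing the 0-2 diagonal */
    for (int q = 0; q < quad_count; ++q) {
        int base = 4 * q;
        for (int i = 0; i < 6; ++i)
            out[6 * q + i] = (uint16_t)(base + tri[i]);
    }
}
```

The buffer would be drawn with glDrawElements(GL_TRIANGLES, 6 * quad_count, GL_UNSIGNED_SHORT, (void *)0). It only depends on the quad count, so it can be built once.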
added on the 2012-01-31 13:51:49 by tomohiro tomohiro
Quote:
Question => Is it completely wrong? Is it close to optimal? Do you know a better way?

This isn't the bottleneck you are looking for. You can go about your business. Move along.
added on the 2012-01-31 14:14:25 by evilpaul evilpaul
Quote:

Question => Is it completely wrong? Is it close to optimal? Do you know a better way?

You may want to store the screen coordinates of your sprites in the VBO's vertex positions, which you currently center on (0, 0), and use a single glDrawElements call.

Did I understand correctly that you're doing 2D sprites and not 3D billboards? Or are you talking about coding billboards?

If your sprites move, you can update their coordinates in the VBO according to their respective transforms using glBufferSubData. Try to do it in a single call, and make sure all "animated" sprites sit next to each other in the VBO. You'll need to pre-transform your sprite coordinates to screen coordinates on the CPU. (Is there any point in uploading transform data that might be bigger than the vertex data itself? It depends on the GPU. Though in your case, if you're already doing the transform on the GPU, there's actually no point in moving it back to the CPU.)
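A sketch of the "single glBufferSubData call" part, assuming each sprite occupies a fixed number of bytes in the VBO and the CPU keeps a dirty flag per sprite. ByteRange and dirty_range are hypothetical names. Keeping animated sprites adjacent keeps the range, and therefore the upload, small:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte range for one glBufferSubData(GL_ARRAY_BUFFER, offset, size, data) call. */
typedef struct { size_t offset, size; } ByteRange;

/* Hypothetical helper: smallest contiguous range covering every sprite whose dirty
 * flag is set (clean sprites in between are re-uploaded too). Size 0 if nothing moved. */
ByteRange dirty_range(const unsigned char *dirty, size_t sprite_count, size_t bytes_per_sprite)
{
    assert(dirty || sprite_count == 0);
    ByteRange r = {0, 0};
    size_t first = sprite_count, last = 0;
    for (size_t i = 0; i < sprite_count; ++i) {
        if (dirty[i]) {
            if (first == sprite_count) first = i;
            last = i;
        }
    }
    if (first == sprite_count) return r; /* nothing to upload this frame */
    r.offset = first * bytes_per_sprite;
    r.size = (last - first + 1) * bytes_per_sprite;
    return r;
}
```

The upload would then be glBufferSubData(GL_ARRAY_BUFFER, r.offset, r.size, (const char *)vertices + r.offset), skipped when r.size is 0.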

Quote:

@Gargaj "dunno how opengl handles quads internally, i suppose they end up as tristrips anyway? "


To be exact, an OpenGL quad is a triangle fan with a fixed count of 4 vertices.

Quote:

If "state-of-the-art OpenGL (>= v3.0)" means the core profile without deprecated features, you must draw a quad as 2 triangles.


I hate how they try to make GL look like D3D, thinking developers will flock to it instantly. That's just wrong. If GL were a copycat of D3D, I'd use D3D, honestly. Quads are one of the few good things about OpenGL: they make the same task less work for the developer than in D3D.
added on the 2012-01-31 15:19:00 by nystep nystep
Quote:
Did I understand correctly that you're doing 2D sprites and not 3D billboards? Or are you talking about coding billboards?

@nystep: are you such an idiot that you can't even read the topic? d'oh.
added on the 2012-01-31 15:26:24 by nystep nystep
i wouldn't fill a VBO with quads, but with points. One point (GL_POINTS) per tile. The point can hold all the data for one tile: size, position, texture offset. That makes it compact and static, and you don't need instancing. Your vertex shader transforms the point to its position on screen. Your geometry shader expands that point (in screen space) into the four corners making up the tile. Your fragment shader does the texture lookup. Easy, compact, fast!!
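The point-expansion step can be sketched on the CPU, assuming square tiles, a screen-space point carrying position, size and texture offset, and the same tile extent in the tileset for every tile. It produces the 4 corners in the order a geometry shader declared with layout(points) in; and layout(triangle_strip, max_vertices = 4) out; would emit them with EmitVertex(). TilePoint, StripVertex and expand_point are hypothetical names:

```c
#include <assert.h>

/* Hypothetical point vertex carrying a whole tile: screen position, size, texture offset. */
typedef struct { float x, y, size, u0, v0; } TilePoint;
typedef struct { float x, y, u, v; } StripVertex;

/* Hypothetical CPU stand-in for the geometry shader: expands one point into a
 * 4-vertex triangle strip (bottom-left, bottom-right, top-left, top-right).
 * tile_uv_size is the tile's extent in the tileset texture (assumed uniform). */
void expand_point(const TilePoint *p, float tile_uv_size, StripVertex out[4])
{
    assert(p);
    float h = 0.5f * p->size;
    for (int i = 0; i < 4; ++i) {
        float cx = (float)(i & 1), cy = (float)(i >> 1); /* (0,0) (1,0) (0,1) (1,1) */
        out[i].x = p->x + (2.0f * cx - 1.0f) * h;
        out[i].y = p->y + (2.0f * cy - 1.0f) * h;
        out[i].u = p->u0 + cx * tile_uv_size;
        out[i].v = p->v0 + cy * tile_uv_size;
    }
}
```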
added on the 2012-01-31 18:41:49 by iq iq
@iq
i would do the same, except for the geometry shader; in my experience it isn't very fast at all (my experience may be wrong or biased, however). for that reason i'd use proper 4 vertices per tile, but fill .x with a global consecutive vertex number (so that we get real.x = vtx.x % width, real.y = vtx.x / width) and keep .yzw free for any other metadata there might be.
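A C sketch of that index decoding, assuming a grid of the given width and a vec4 attribute whose .x holds the consecutive index as a float. PackedVertex and decode_grid_index are hypothetical names. Floats represent integers exactly only up to 2^24, which bounds the usable index range:

```c
#include <assert.h>

/* Hypothetical 4-float vertex: .x = consecutive grid index, .yzw = free for metadata. */
typedef struct { float x, y, z, w; } PackedVertex;

/* Hypothetical helper mirroring the shader arithmetic
 *   real.x = idx % width;  real.y = idx / width;   (integer division)
 * The index is stored as a float, which is exact up to 2^24 = 16777216. */
void decode_grid_index(const PackedVertex *v, int width, int *gx, int *gy)
{
    assert(v && width > 0 && v->x >= 0.0f && v->x <= 16777216.0f);
    int idx = (int)v->x;
    *gx = idx % width;
    *gy = idx / width;
}
```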
added on the 2012-02-01 06:36:47 by provod provod
So, to blit N tiles from a tileset defining M tiles, with CPU-computed attributes updated once per frame:
1) One VBO with 4 vertices; that lone quad is set up again for each of the N tiles to blit. Obviously not optimal: sub-optimal data traffic and N draw calls.
2) One VBO with 4xM vertices, like I did. Optimal data traffic, easy to use (N can change, simple code), but N draw calls. Probably good enough for game sprites, say < 100 moving sprites.
3) One VBO with 4xN vertices (one quad per tile to blit), as suggested. Optimal data traffic, 1 draw call. Sounds optimal from a speed point of view. There are various ways to implement it and to handle the per-tile attributes, as mentioned by tomohiro, iq and others.

1) sounds like "you're doing it wrong", 2) is probably good enough for typical usage (like the animated entities of a platform game), and 3) sounds like the way to go for massive CPU-animated sprite mayhem.
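The trade-off between the three options can be summed up with a rough cost model in C. It is only a sketch: it counts VBO vertices and draw calls, ignores the size of uniforms or attributes, and assumes option 3 stores one quad per drawn tile. BatchCost and batch_cost are hypothetical names:

```c
#include <assert.h>

/* Hypothetical cost summary: vertices stored in the VBO and draw calls per frame. */
typedef struct { long vertices; long draw_calls; } BatchCost;

/* Hypothetical helper. option: 1, 2 or 3 as listed above;
 * n_drawn = N tiles to blit, m_tileset = M tiles in the tileset. */
BatchCost batch_cost(int option, long n_drawn, long m_tileset)
{
    BatchCost c = {0, 0};
    switch (option) {
    case 1: c.vertices = 4;             c.draw_calls = n_drawn; break; /* one shared quad */
    case 2: c.vertices = 4 * m_tileset; c.draw_calls = n_drawn; break; /* one quad per tileset tile */
    case 3: c.vertices = 4 * n_drawn;   c.draw_calls = 1;       break; /* one quad per drawn tile */
    default: assert(0 && "option must be 1, 2 or 3");
    }
    return c;
}
```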

I really appreciate your advice, thanks a lot. I hope my OpenGL skills will soon get sharp enough to relieve my demo-coding itch ^^
