At first I though my implementation was wrong, or that there was a bug in my code. But then it slowly dawned on me what the problem was. If you recall, the whole point to deferred is reducing the number of pixels that are shaded. A large majority of this is not shading pixels that are occluded (they fail the z-test). However, with my simple geometry, there are very few camera positions in which ANY geometry is occluded. Thus deferred shading just adds overhead.
So with this in mind I started looking for more complex scenes to test my code against. After a bit of searching I found Morgan McGuire's amazing Computer Graphics Data database. He has a good 20 some-odd scenes that he's personally maintained. (As most of them are no longer even available in their original form). Huge props to him and any others involved in the project.
Anyway, I downloaded the popular Crytek Sponza Scene in obj form. Awesome. Now what? Well, now I needed to load obj in Vertex and Index buffers. I looked around to see if there was any library to do it for me (why re-invent the wheel?), but I only found a smattering of thrown together code. Well and assimp. But assimp seemed a bit large for a temporary obj loader. More on that later. So with that said, I used the code here as a starting point, and created my own obj loader.
First off, obj files are HARD to parse. Mostly because they're so flexible. And being a text-based format, parsing text is just not fun. Ok, so maybe they're not HARD, but they're not easy either. The first major roadblock is that obj allows you to specify all the parts of a vertex separately.
Separate vertex definitions
For example:v 476.1832 128.5883 224.4587 vn -0.9176 -0.3941 -0.0529 vt 0.1674 0.8760 0.0000
The 'v' represents the vertex position, the 'vn' represents the vertex normal, and the 'vt' represents the vertex texture coordinates. Indices can then choose whichever grouping of position, normal, and texture coordinates they need. Like this:
f 140/45/140 139/18/139 1740/17/1852
This is especially handy if large portions of your scene have the same surface normals, like square buildings. Then you only have to store a single 'vn' for all the vertices sharing the same normal.
HOWEVER, while this is great for storage, DirectX expects a vertex to be a singular unit, AKA, position, normal, AND texture coordinate all together. (Yes, you can store them in separate Vertex Buffers, but then you run into cache misses and data incoherence) I chose to work around it like this:
I use these data structures to hold the data:
std::vector<Vertex> vertices; std::vector<uint> indices;
typedef std::tuple<uint, uint, uint> TupleUInt3; std::unordered_map<TupleUInt3, uint> vertexMap; std::vector<DirectX::XMFLOAT3> vertPos; std::vector<DirectX::XMFLOAT3> vertNorm; std::vector<DirectX::XMFLOAT2> vertTexCoord;
- When reading vertex data ('v', 'vn', 'vt'), the data is read into its corresponding vector.
- Then, when reading indices, the code creates a true Vertex and adds it to vertices. I use the unordered_map to check if the vertex already exists before creating a new one:
TupleUInt3 vertexTuple{posIndex, texCoordIndex, normalIndex}; auto iter = vertexMap.find(vertexTuple); if (iter != vertexMap.end()) { // We found a match indices.push_back(iter->second); } else { // No match. Make a new one uint index = meshData->Vertices.size(); vertexMap[vertexTuple] = index; DirectX::XMFLOAT3 position = posIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertPos[posIndex - 1]; DirectX::XMFLOAT3 normal = normalIndex == 0 ? DirectX::XMFLOAT3(0.0f, 0.0f, 0.0f) : vertNorm[normalIndex - 1]; DirectX::XMFLOAT2 texCoord = texCoordIndex == 0 ? DirectX::XMFLOAT2(0.0f, 0.0f) : vertTexCoord[texCoordIndex - 1]; vertices.push_back(Vertex(position, normal, texCoord)); indices.push_back(index); }
Success! On to the next roadblock!
N-gons
Obj supports all polygons; you just add more indices to the face definition:f 140/45/140 139/18/139 1740/17/1852 1784/25/429 1741/35/141
Again, this is extremely handy for reducing storage space. For example, if two triangles are are co-planar, you can combine them into a quad, etc. HOWEVER, DirectX only supports triangles. Therefore, we have to triangulate any faces that have more than 3 vertices. Triangulation can be quite complicated, depending on what assumptions you choose to make. However, I chose to assume that all polygons are convex, which makes life significantly easier. Following the algorithm in Braynzar Soft's code, you can triangulate by making triangles with the first vertex, the next vertex and the previous vertex. For example, let's choose this pentagon:
We would then form triangles like so:
So the triangles are:
- 0 1 2
- 0 2 3
- 0 3 4
The code can be found here. One note before I move on: This way of triangulating is definitely not optimal for high N-gons; it will create long skinny triangles, which is bad for rasterizers. However, it serves its purpose for now, so it will stay.
Normals
It's perfectly legal for an face in obj to not use normals:f 1270/3828 1261/3831 1245/3829
Similarly, you can have a face that doesn't use texture coordinates:
f -486096//-489779 -482906//-486570 -482907//-486571
(You'll also notice that you can use negative indices, which correspond to the index (1 - current number of vertices). But that's an easy thing to work around). The problem is normals. My shader code assumes that a vertex has a valid normal. If it doesn't, the default initialization to (0.0f, 0.0f, 0.0f) makes the whole object black. Granted, I could add some checks in the shader, where if the normal is all zero, just use the material color, but this just adds dynamic branching and in reality, there shouldn't be any faces that don't have normals.
So the first thing I tried is 'manually' calculating the vertex normals using this approach. The approach uses the cross product of two sides of a triangle to get the face normal, then averages all the face normals for faces sharing the same vertex. Simple, but it takes FOREVER. The first time I tried it, it ran for 10 minutes.... Granted, it is O(N12 + N2), where N1 is the number of vertices and N2 is the number of faces. The Sponza scene has 184,330 triangles and 262,267 faces. Therefore, I resolved to do the normal calculations once, and then re-create the obj with those normals. I'll get to that in a bit.
Vectors
After creating the basic obj loader I did some crude profiling and found some interesting behavior. When compiled for "Release", the obj loader ran 1 - 2 magnitudes of time faster. After much searching, I found out that in "Release", the VC++ compiler turns off a bunch of run-time checks on vectors. These checks are really good, in that they give improved iterator checks and various useful debug checks. However, they're really really slow. You can turn them off with compiler preprocessor defines, but I wouldn't. But just something to be aware of.So that's obj's. With that all done, I can now load interesting models! Yay!!
But even in "Release", the scene still takes ~4 seconds to load on my beefy computer. Hmmm.... Well the first thing I did was to put the obj parsing into a separate thread so the main window was still interactive.
I also sleep the main thread in 50ms intervals to give the background thread as many cycles as it can. I need to do some further testing to see if sleeping the main thread affects child threads. This is using std::thread. I wouldn't think it would, but it doesn't hurt to test. Let me know your thoughts.
Well, that's it for now. I'll cover some of the specifics of what changed in the renderer from Deferred Shading Demo to Obj Loader Demo in the next post, but this post is getting to be a bit long. As always, feel free to comment or leave suggestions.
-RichieSams