Unity Game Optimization
Dr. Davide Aversa, Chris Dickinson
Draw calls
Before we discuss dynamic batching and static batching, let's first learn about the problems that they are both trying to solve within the Rendering Pipeline. We will keep our analysis fairly light on the technicalities here, since we will explore this topic in greater detail in Chapter 6, Dynamic Graphics.
The primary goal of these batching methods is to reduce the number of draw calls required to render all of the objects in the current view. In its most basic form, a draw call is a request sent from the CPU to the GPU asking it to draw an object.
Before a draw call can be requested, the system needs to perform several operations. The complete list is too long for this book and depends on the specific features enabled in Unity; however, we can categorize them into two significant steps:
- Upload texture and mesh data to the GPU
- Set up the rendering of the meshes using the uploaded assets
In the first step, mesh and texture data must be pushed from the CPU memory (RAM) into GPU memory (VRAM), which typically takes place during the initialization of the scene, but only for textures and meshes that the scene file knows about. If we dynamically instantiate objects at runtime using texture and mesh data that hasn't appeared in the scene yet, then they must be loaded at the time we instantiate them. The scene cannot know ahead of time which Prefabs we're planning to instantiate at runtime, as many of them are hidden behind conditional statements and much of our application's behavior depends upon user input.
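As a minimal sketch of this situation (the class name, key binding, and Resources path here are hypothetical), the following script instantiates a Prefab that the scene file has no reference to, so its mesh and texture data can only be uploaded to the GPU once the load actually happens at runtime:

```csharp
using UnityEngine;

public class RuntimeSpawner : MonoBehaviour
{
    // A Prefab reached only through Resources, so the scene file does not
    // know about its mesh and texture data ahead of time.
    private GameObject enemyPrefab;

    void Update()
    {
        // The spawn is gated behind user input, so Unity cannot predict it
        // while initializing the scene.
        if (Input.GetKeyDown(KeyCode.Space))
        {
            if (enemyPrefab == null)
            {
                // The mesh and texture data behind this Prefab are pushed to
                // VRAM around the time of this first load and instantiation,
                // not during scene initialization.
                enemyPrefab = Resources.Load<GameObject>("Enemies/Grunt");
            }
            Instantiate(enemyPrefab, transform.position, Quaternion.identity);
        }
    }
}
```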
In the second step, the CPU must prepare the GPU by configuring the options and rendering features that are needed to process the object that is the target of the draw call.
To handle all of these interactions between the CPU and GPU, we use the underlying graphics API, which could be DirectX, OpenGL, OpenGL ES, Metal, WebGL, or Vulkan, depending on the platform we're targeting and the specific graphical settings we are using. These API calls go through a library, called a driver, which maintains a long series of complex and interrelated settings, state variables, and datasets that can be configured and executed from our application. The available features vary enormously based on the graphics card we're using and the version of the graphics API we're targeting: more advanced graphics cards support more advanced features, which require newer versions of the API and updated drivers to enable them. The sheer number of settings, features, and compatibility levels that have accumulated from one version to the next over the years (particularly for older APIs such as DirectX and OpenGL) is nothing short of mind-boggling. Thankfully, at a certain level of abstraction, all of these APIs tend to operate similarly, which means that Unity can support many different graphics APIs through a common interface.
To refer to this utterly massive array of settings that must be configured to prepare the Rendering Pipeline just before rendering an object, we often use a single term: the Render State. As long as these Render State options remain unchanged, the GPU keeps applying the last Render State settings to all incoming objects and renders them accordingly.
Changing any of the Render State settings can be a time-consuming process. For example, if we set the Render State to use a blue texture file and then try to render one gigantic mesh, it would be rendered very rapidly, with the whole mesh appearing blue. At this point, we could render nine more completely different meshes, and they would all be rendered blue, since we haven't changed which texture the GPU should use in the Render State. If, however, we wanted to render 10 meshes using 10 different textures, then this would take longer, because we would need to prepare the Render State with the new texture for each mesh just before sending the draw call instruction.
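The difference can be sketched roughly as follows (the field and method names are our own, and the renderers and materials are assumed to already exist in the project): assigning one shared material keeps the texture bound in the Render State stable across the draw calls, while assigning ten different materials forces the Render State to be reconfigured before each one.

```csharp
using UnityEngine;

public class MaterialAssignment : MonoBehaviour
{
    public Renderer[] targets;     // ten mesh renderers in the scene
    public Material sharedBlue;    // one material using the blue texture
    public Material[] tenVariants; // ten materials, each with a different texture

    // All ten renderers reuse one material, so the texture bound in the
    // Render State stays the same between their draw calls.
    public void UseOneTexture()
    {
        foreach (Renderer target in targets)
        {
            target.sharedMaterial = sharedBlue;
        }
    }

    // Each renderer gets its own material, so the Render State must be
    // reconfigured with a new texture before each draw call.
    public void UseTenTextures()
    {
        for (int i = 0; i < targets.Length; i++)
        {
            targets[i].sharedMaterial = tenVariants[i];
        }
    }
}
```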
The texture used to render the current object is effectively a global variable in the graphics API, and changing a global variable within a parallel system is much easier said than done. In a massively parallel system such as a GPU, we must effectively wait until all of the current jobs have reached the same synchronization point (in other words, the fastest cores need to stop and wait for the slowest ones to catch up, wasting processing time that could be used on other tasks) before we can make the Render State change, at which point we will need to spin up all of the parallel jobs again. This continuous waiting can waste a lot of time, so the less often we need the Render State to change, the faster the graphics API will be able to process our requests.
Things that can trigger Render State synchronization include, but are not limited to, pushing a new texture to the GPU and changing the shader, lighting information, shadows, transparency, or pretty much any other graphical setting we can think of.
Once the Render State is configured, the CPU must decide what mesh to draw; what textures and shader it should use; and where to draw the object based on its position, rotation, and scale (all represented within a 4 x 4 matrix known as a transform, which is where the Transform component gets its name from). It then sends an instruction to the GPU to draw it. To keep the CPU and GPU working without stalling each other, Unity pushes new instructions into a queue known as the command buffer. This queue contains instructions that the CPU has created, and the GPU pulls a new command from it each time it finishes the preceding one.
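To make the transform matrix and the command queue more concrete, here is a minimal sketch using Unity's public CommandBuffer API (the class and field names are our own); it records a single draw instruction for illustration and does not expose the queue Unity builds internally for its normal rendering.

```csharp
using UnityEngine;
using UnityEngine.Rendering;

public class ManualDrawCall : MonoBehaviour
{
    public Mesh mesh;
    public Material material;
    private CommandBuffer commandBuffer;

    void OnEnable()
    {
        commandBuffer = new CommandBuffer { name = "ManualDraw" };
    }

    void OnDisable()
    {
        commandBuffer.Release();
    }

    void Update()
    {
        // Position, rotation, and scale packed into a single 4 x 4 transform
        // matrix, which is how the draw location of an object is represented.
        Matrix4x4 transformMatrix = Matrix4x4.TRS(
            new Vector3(0f, 1f, 0f), Quaternion.identity, Vector3.one);

        // Record a draw instruction and submit it; the GPU consumes queued
        // commands as it finishes the preceding ones.
        commandBuffer.Clear();
        commandBuffer.DrawMesh(mesh, transformMatrix, material);
        Graphics.ExecuteCommandBuffer(commandBuffer);
    }
}
```

Here, Matrix4x4.TRS packs the position, rotation, and scale into the 4 x 4 transform matrix described above, and Graphics.ExecuteCommandBuffer hands the recorded instruction to the graphics API.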
The trick to how batching improves the performance of this process is that a new draw call does not necessarily mean that we need to configure a new Render State. If two objects share the exact same Render State information, then the GPU can immediately begin rendering the new object since the same Render State is maintained after the last object is finished. This eliminates the time wasted because of Render State synchronization. It also reduces the number of instructions that need to be pushed into the command buffer, reducing the workload on both the CPU and GPU.