Nvidia’s Maxwell GPU-based GeForce 900-series cards don’t have a hardware-based equivalent for that. Instead, they rely on software-based “pre-emption” that allows a GPU to pause a task to perform a more critical one, then switch back to the original task. (Think of it like a traffic light.) Maxwell’s pre-emption gets the job done, but nowhere near as well as AMD’s dedicated hardware (which behaves more like the flow of cars yielding in traffic).
Pascal GPUs introduce several new hardware and software features to beef up their async compute capabilities, though none behaves exactly like the async hardware in Radeon GPUs.
The GeForce GTX 1080 adds flexibility in task execution with the introduction of dynamic load balancing, a new hardware-based feature that allows the GPU to adjust task partitioning on the fly rather than letting resources sit idle.
With the static partitioning technique that all previous-generation GeForce cards relied on, overlapping tasks each claimed a fixed portion of the available GPU resources: say, 50 percent for PhysX compute and 50 percent for graphics. But if the graphics task finishes first, its 50 percent of resources sits idle until the compute task also completes. Pascal's new dynamic load balancing lets unfinished tasks tap into those idle resources. In this example, the PhysX task gains access to the resources freed when the graphics task wrapped up, so it finishes sooner than it would under the older static partitioning scheme.
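A toy Python model makes the static-versus-dynamic difference concrete. It is a sketch under simplifying assumptions, not Nvidia's scheduler. The GPU is treated as 100 interchangeable, hypothetical "resource units," and each task's work is a plain number. Throughput is assumed to scale linearly with the units a task holds. The function names are invented for illustration.

```python
# Toy model of static vs. dynamic partitioning between two overlapping tasks.
# Assumptions (hypothetical): the GPU has `total_units` interchangeable resource
# units, and a task's speed is proportional to the units it holds.

def static_finish_time(graphics_work, compute_work, total_units=100.0, graphics_share=0.5):
    """Each task keeps its fixed share; freed units simply sit idle."""
    graphics_units = total_units * graphics_share
    compute_units = total_units - graphics_units
    return max(graphics_work / graphics_units, compute_work / compute_units)


def dynamic_finish_time(graphics_work, compute_work, total_units=100.0, graphics_share=0.5):
    """When one task finishes, the unfinished task takes over all units."""
    graphics_units = total_units * graphics_share
    compute_units = total_units - graphics_units
    graphics_time = graphics_work / graphics_units
    compute_time = compute_work / compute_units
    first_done = min(graphics_time, compute_time)
    # Work left on the slower task at the moment the faster one completes.
    if graphics_time <= compute_time:
        remaining = compute_work - compute_units * first_done
    else:
        remaining = graphics_work - graphics_units * first_done
    return first_done + remaining / total_units


# Graphics needs 1,000 units of work; compute (e.g. PhysX) needs 3,000.
print(static_finish_time(1000, 3000))   # 60.0
print(dynamic_finish_time(1000, 3000))  # 40.0
```

In this made-up case, the compute task finishes a third sooner because it inherits the graphics task's idle units once graphics completes at time 20. The figures are illustrative, not measurements.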
A fluid particle demo shown at Nvidia’s GTX 1080 Editors Day hit 78 frames per second with the feature disabled, and climbed to 94fps when it was turned on.
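Render budgets are usually discussed in frame times, so it helps to convert the demo's two frame rates. This quick Python check uses only the 78fps and 94fps figures quoted above:

```python
def frame_time_ms(fps):
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000.0 / fps


before_fps, after_fps = 78, 94  # dynamic load balancing off, then on
speedup_pct = (after_fps - before_fps) / before_fps * 100
saved_ms = frame_time_ms(before_fps) - frame_time_ms(after_fps)
print(f"{speedup_pct:.1f}% more frames, {saved_ms:.2f} ms saved per frame")
# prints "20.5% more frames, 2.18 ms saved per frame"
```

Put another way, each frame drops from roughly 12.8 ms to roughly 10.6 ms.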
The Pascal GPU also adds "pixel-level pre-emption" and "thread-level pre-emption" to its bag of async tricks. Both are designed to minimize the cost of switching tasks on the fly when time-critical work (like Oculus' asynchronous timewarp) comes in hot.
Previously, pre-emption occurred at a fairly high level of the computing process, between rendering commands from the game engine. Each rendering command can consist of up to hundreds of individual draw calls in the command push buffer, Nvidia says, with each draw call containing hundreds of triangles, and each triangle requiring hundreds of individual pixels to be rendered. Performing all that work before switching tasks can take a long time. (Well, relatively speaking.)
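Nvidia's "hundreds" at each level compound quickly. The Python sketch below multiplies them out, assuming (purely for illustration) exactly 100 at each level. Real workloads vary widely.

```python
# Illustrative assumption: read each "hundreds" as exactly 100.
DRAW_CALLS_PER_COMMAND = 100
TRIANGLES_PER_DRAW_CALL = 100
PIXELS_PER_TRIANGLE = 100


def pixels_per_command(draw_calls, triangles_per_call, pixels_per_triangle):
    """Pixels one rendering command may generate under the given counts."""
    return draw_calls * triangles_per_call * pixels_per_triangle


total = pixels_per_command(DRAW_CALLS_PER_COMMAND, TRIANGLES_PER_DRAW_CALL, PIXELS_PER_TRIANGLE)
print(f"Up to {total:,} pixels per rendering command")
# prints "Up to 1,000,000 pixels per rendering command"
```

With command-level pre-emption, a time-critical task that arrives just after a command starts could wait for nearly all of that work to drain.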
Pixel-level pre-emption, which Nvidia says is achieved using a blend of hardware and software, lets Pascal GPUs save their current workload at pixel-level granularity rather than at the coarse rendering-command level. The GPU can then switch to another time-critical task (like asynchronous timewarp) and pick up exactly where it left off. That lets the GTX 1080 pre-empt tasks quickly and with minimal overhead; Nvidia says pixel-level pre-emption takes under 100 microseconds to kick into gear. We'll talk about real-world results with Pascal's new async compute tools when we dive into our DirectX 12 testing with Ashes of the Singularity. (Spoiler alert: They're impressive.)
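A small Python model sketches why granularity matters. It rests on several hypothetical assumptions. A workload is a flat run of pixels, and a switch is allowed only at multiples of some chunk size. Saving and restoring state is treated as free, although the real hardware does have overhead (Nvidia cites under 100 microseconds). The `Workload` class and `run_until_preempt` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    total_pixels: int
    done: int = 0  # saved progress; a resumed task continues from here


def run_until_preempt(work, request_at, chunk_pixels):
    """Keep rendering until the next legal switch point after a pre-emption
    request arrives at pixel index request_at.

    Hypothetical model: switching is only allowed every chunk_pixels pixels.
    Returns the delay, i.e. pixels still rendered after the request arrived.
    Assumes work.done <= request_at <= work.total_pixels.
    """
    next_boundary = -(-request_at // chunk_pixels) * chunk_pixels  # ceiling
    stop = min(next_boundary, work.total_pixels)
    work.done = stop  # save position so the task can pick up here later
    return stop - request_at


PIXELS = 1_000_000  # one rendering command, using an illustrative count

# Command-level: the only switch point is the end of the whole command.
coarse = Workload(PIXELS)
print(run_until_preempt(coarse, request_at=1, chunk_pixels=PIXELS))  # 999999

# Pixel-level: the GPU can stop at the very next pixel and resume later.
fine = Workload(PIXELS)
print(run_until_preempt(fine, request_at=1, chunk_pixels=1))  # 0
print(fine.total_pixels - fine.done)  # 999999 pixels left after the switch
```

Under these assumptions, the coarse scheme makes the urgent task wait behind almost a million pixels. The fine-grained scheme switches immediately and keeps the original work's position saved for later.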