How Nvidia DLSS 3 works, and why FSR can’t catch up for now

Nvidia’s RTX 40-series graphics cards will arrive in a few weeks, but among all the hardware improvements is what could be Nvidia’s golden egg: DLSS 3. It’s much more than just an update to Nvidia’s popular DLSS (Deep Learning Super Sampling) feature, and it could end up defining the next generation of Nvidia much more than the graphics cards themselves.

AMD has worked hard to get its FidelityFX Super Resolution (FSR) on par with DLSS, and it has been successful in recent months. DLSS 3 looks like it will change that dynamic, and this time around, FSR may not be able to catch up anytime soon.

How DLSS 3 Works (and How It Doesn’t Work)

Nvidia

You would be forgiven for thinking DLSS 3 is a completely new version of DLSS, but it isn’t. Or at least, it isn’t entirely new. The backbone of DLSS 3 is the same super resolution technology available in DLSS titles today, and presumably Nvidia will continue to improve it with new releases. Nvidia says you will now see the super resolution portion of DLSS 3 as a separate option in graphics settings.

The new part is frame generation. DLSS 3 generates a completely unique frame every other frame, essentially generating seven out of every eight pixels you see. You can see an illustration of this in the flow chart below. In the case of 4K, your GPU only renders the pixels for a 1080p frame and uses that information not only for the current frame but also for the next frame.

A graph showing how DLSS 3 reconstructs frames.
Nvidia
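To put rough numbers on that claim, here is a back-of-the-envelope sketch in Python, assuming DLSS performance mode’s 1080p-to-4K upscale:

```python
# Back-of-the-envelope pixel accounting for DLSS 3 at 4K
# (assumes performance-mode upscaling, i.e. rendering at 1080p).
RENDER_W, RENDER_H = 1920, 1080    # pixels the GPU actually shades
OUTPUT_W, OUTPUT_H = 3840, 2160    # pixels you see on screen

rendered = RENDER_W * RENDER_H      # one traditionally rendered frame
displayed = OUTPUT_W * OUTPUT_H * 2 # two 4K output frames: one upscaled,
                                    # one fully AI-generated

print(f"rendered pixels per pair of frames:  {rendered:,}")
print(f"displayed pixels per pair of frames: {displayed:,}")
print(f"fraction generated: {1 - rendered / displayed:.3f}")  # 0.875, i.e. 7 of 8
```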

Frame generation, according to Nvidia, will be a separate toggle from super resolution. That’s because frame generation only works on RTX 40-series GPUs for now, while super resolution will continue to work on all RTX graphics cards, even in games that have been updated to DLSS 3. Needless to say, if half of your frames are fully generated, that will increase your performance a lot.

However, frame generation isn’t just an AI secret sauce. In DLSS 2 and tools like FSR, motion vectors are a key input for upscaling. They describe where objects move from one frame to the next, but motion vectors only apply to geometry in a scene. Elements without 3D geometry, such as shadows, reflections, and particles, have traditionally been masked out of the upscaling process to avoid visual artifacts.

A graph showing motion through Nvidia’s DLSS 3.
Nvidia
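To see why masking matters, here is a minimal NumPy sketch of motion-vector reprojection. The frame sizes, vectors, and mask are invented for illustration; this isn’t Nvidia’s pipeline, just the general idea:

```python
import numpy as np

# Minimal sketch of motion-vector reprojection (illustrative shapes and values).
H, W = 4, 4
prev_frame = np.random.rand(H, W, 3)      # previous frame's colors
motion = np.zeros((H, W, 2), dtype=int)   # per-pixel (dy, dx) from the engine
motion[..., 1] = 1                        # pretend all geometry moved 1px right

# Elements without geometry (shadows, reflections, particles) have no motion
# vectors, so upscalers traditionally mask them out of the reprojection.
mask = np.ones((H, W), dtype=bool)
mask[0, :] = False                        # pretend row 0 is a reflection

reprojected = np.zeros_like(prev_frame)
for y in range(H):
    for x in range(W):
        sy, sx = y - motion[y, x, 0], x - motion[y, x, 1]
        if mask[y, x] and 0 <= sy < H and 0 <= sx < W:
            reprojected[y, x] = prev_frame[sy, sx]  # follow the motion vector
        # masked pixels fall back to other inputs (e.g., the current frame)
```

The masked pixels have no reliable motion data, so an upscaler has to fill them in some other way, which is exactly the gap frame generation has to close.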

Masking is not an option when an AI is generating a completely unique frame, and this is where the optical flow accelerator in RTX 40-series GPUs comes in. It works like a motion vector, except the graphics card is tracking the movement of individual pixels from one frame to the next. This optical flow field, along with motion vectors, depth, and color, contributes to the AI-generated frame.
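Nvidia hasn’t published its optical flow implementation, but the general idea is easy to demonstrate with OpenCV’s dense Farneback flow, which likewise produces one motion estimate per pixel (the two frames here are synthetic stand-ins):

```python
import cv2
import numpy as np

# Two synthetic grayscale frames: a bright square that shifts 5px right.
prev = np.zeros((240, 320), dtype=np.uint8)
curr = np.zeros((240, 320), dtype=np.uint8)
prev[100:140, 100:140] = 255
curr[100:140, 105:145] = 255

# Dense optical flow: one (dx, dy) vector per pixel, geometry or not --
# shadows, reflections, and particles included.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
)

print(flow.shape)     # (240, 320, 2): a full per-pixel flow field
print(flow[120, 120]) # roughly (5, 0) where the square moved
```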

It all sounds good, but there’s a big problem with AI-generated frames: they increase latency. The AI-generated frame never passes through your PC – it’s a “fake” frame, so you won’t see it in traditional fps readings from games or tools like FRAPS. So latency doesn’t go down despite all the extra frames, and due to the compute overhead of the optical flow, latency actually goes up a little. That’s why DLSS 3 requires Nvidia Reflex to offset the added latency.

Normally, your CPU stores up a render queue for your graphics card to make sure your GPU is never waiting for work (which would cause stuttering and frame rate drops). Reflex removes the render queue and syncs the GPU and CPU so that as soon as the CPU can send instructions, the GPU starts processing them. When applied on top of DLSS 3, Nvidia claims Reflex can sometimes even result in a net latency reduction.
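A toy model makes the queuing arithmetic clear. The numbers are invented and this is not how Reflex is implemented; it just illustrates why removing the queue cuts input latency:

```python
# Toy model: input-to-display latency with and without a render queue.
# All numbers are illustrative, not measured.
frame_time_ms = 16.7   # GPU takes ~16.7 ms per frame (60 fps)
queue_depth = 3        # frames the CPU has buffered ahead of the GPU

# With a queue, a new input waits behind every queued frame before it renders.
latency_queued = (queue_depth + 1) * frame_time_ms

# With the queue removed (just-in-time submission), input only waits
# for the frame currently being rendered.
latency_jit = 1 * frame_time_ms

print(f"queued:       {latency_queued:.1f} ms")  # ~66.8 ms
print(f"just-in-time: {latency_jit:.1f} ms")     # ~16.7 ms
```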

Where AI makes the difference

AMD’s FSR 2.0 doesn’t use AI, and as I wrote some time ago, it proves that you can achieve the same quality as DLSS with algorithms rather than machine learning. DLSS 3 changes that with its unique frame generation capabilities, as well as the introduction of optical flow.

Optical flow isn’t a new idea – it has been around for decades and has applications in everything from video editing to self-driving cars. However, computing optical flow with machine learning is relatively new, thanks to the growth in data sets to train AI models on. The reason you’d want to use AI is simple: it produces fewer visual errors with enough training and doesn’t carry much overhead at runtime.

DLSS runs in real time. It’s possible to develop an algorithm, free of machine learning, that estimates how each pixel moves from one frame to the next, but it’s computationally expensive, which runs counter to the whole point of supersampling in the first place. With an AI model that doesn’t require a lot of horsepower and enough training data – and rest assured, Nvidia has a lot of training data to work with – you can get high-quality optical flow that can be executed at runtime.
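A rough operation count shows why the brute-force route is so expensive. Assuming a naive block-matching search (the window and block sizes below are arbitrary):

```python
# Rough cost of brute-force block-matching flow at 4K (illustrative numbers).
pixels = 3840 * 2160   # one 4K frame
search = 32 * 32       # candidate offsets tested per pixel (+/-16 px window)
block = 8 * 8          # pixels compared per candidate offset

ops_per_frame = pixels * search * block
print(f"{ops_per_frame:,.0f} compares per frame")        # ~5.4e11
print(f"{ops_per_frame * 60:,.0f} compares per second")  # at 60 fps, ~3.3e13
```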

This leads to a frame rate improvement even in CPU-limited games. Super resolution only applies to your resolution, which is almost entirely dependent on your GPU. With a new frame that bypasses CPU processing, DLSS 3 can double frame rates even when you have a complete CPU bottleneck. It’s impressive, and currently only possible with AI.
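The arithmetic behind that claim is simple (hypothetical numbers):

```python
# Hypothetical CPU-bound scenario: the CPU caps the game at 60 fps.
cpu_limited_fps = 60

# Super resolution can't help here: the CPU, not render resolution, is the cap.
upscaled_fps = cpu_limited_fps

# Frame generation inserts one AI frame per rendered frame on the GPU,
# bypassing the CPU entirely, so the displayed frame rate doubles.
generated_fps = cpu_limited_fps * 2
print(upscaled_fps, generated_fps)  # 60 120
```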

Why FSR 2.0 can’t catch up (for now)

FSR and DLSS image quality comparison in God of War.

AMD really did the impossible with FSR 2.0. It looks great, and the fact that it’s brand-agnostic is even better. I’ve been ready to ditch DLSS for FSR 2.0 ever since I first saw it in Deathloop. But as much as I like FSR 2.0, and as great a piece of kit as it is from AMD, it’s not going to catch up with DLSS 3 anytime soon.

To begin with, developing an algorithm that can track every pixel between frames without artifacts is quite difficult, especially in a 3D environment with fine, dense detail (Cyberpunk 2077 is a great example). It’s possible, but hard. The bigger problem, though, is how demanding that algorithm would need to be. Tracking each pixel through 3D space, performing the optical flow computation, generating a frame, and cleaning up any artifacts along the way – that’s a lot to ask.

Running all of that while a game is running, and still delivering a frame rate improvement on the level of FSR 2.0 or DLSS, is asking even more. Nvidia, even with dedicated processors and a trained model, still has to use Reflex to offset the added latency imposed by optical flow. Without that hardware or software, FSR would likely trade away too much latency to generate frames.

I have no doubt that AMD and other developers will eventually get there – or find another way around it – but it could take a few years. It’s hard to say right now.

What is easy to say is that DLSS 3 looks very exciting. Of course, we’ll have to wait until it’s here to validate Nvidia’s performance claims and see how image quality holds up. For now, we only have a short Digital Foundry video showing off DLSS 3 footage, which I highly recommend watching until we see further third-party testing. From our current vantage point, though, DLSS 3 certainly looks promising.

This article is part of ReSpec, an ongoing bi-weekly column that includes in-depth discussions, advice and reports on the technology behind PC gaming.
