DLSS 3 is the biggest advantage of the GeForce RTX 40 series: an introduction to the NVIDIA Ada Lovelace architecture and its features

DLSS 3 is impressive, and the Ada Lovelace architecture brings many new features.

NVIDIA officially announced the GeForce RTX 40 series on September 20, and on September 21 it shared further details of the Ada Lovelace GPU architecture and the features of the GeForce RTX 40 series with media around the world.

The new-generation GeForce RTX 40 series is built on the Ada Lovelace architecture. So far, NVIDIA has announced the GeForce RTX 4090 24GB GDDR6X, which goes on sale on October 12, followed by the GeForce RTX 4080 16GB GDDR6X and the GeForce RTX 4080 12GB GDDR6X, which are tentatively scheduled for November.

Among the Ada Lovelace GPUs announced so far, the flagship GeForce RTX 4090 uses the AD102 chip, while the 16GB and 12GB GeForce RTX 4080 models use the AD103 and AD104 chips respectively.

NVIDIA specifically highlighted that the Ada Lovelace architecture brings new Streaming Multiprocessors, RT Cores, Tensor Cores, an Optical Flow Accelerator and a Video Engine.

For the Video Engine, NVIDIA compared the GeForce RTX 40 series with the GeForce RTX 30 series: the new generation has two 8th-generation NVENC encoders and one 5th-generation NVDEC decoder, while the GeForce RTX 30 series has one 7th-generation NVENC and one 5th-generation NVDEC. The main difference is therefore in NVENC, and the extra encoder gives the GeForce RTX 40 series more headroom for 8K 60 fps video.

The GeForce RTX 40 series also has H.264, H.265 and AV1 codec capabilities.

Returning to the Ada Lovelace architecture, let’s look at how it differs from the Ampere architecture introduced in 2020.

Ada Lovelace moves to TSMC’s 4N (4nm-class) process, a major change from the Samsung 8nm process used for Ampere.

First, AD102 increases the number of GPCs (Graphics Processing Clusters) from 7 on GA102 to 12. Each GPC contains 6 TPCs (Texture Processing Clusters), and each TPC contains 2 SMs (Streaming Multiprocessors). Each SM integrates a 3rd-generation RT Core, 128KB of L1 cache and 4 TMUs (Texture Mapping Units). The SM is further divided into 4 processing blocks, each with 16 FP32 CUDA Cores, 16 CUDA Cores that can execute either FP32 or INT32 operations, 4 load/store units, and an L0 instruction cache with a warp scheduler and dispatch unit. Most important of all, each processing block also includes a 4th-generation Tensor Core.

In summary, each Ada Lovelace SM has 128 CUDA Cores, 4 Tensor Cores and 1 RT Core. With 12 SMs per GPC, each GPC provides 1,536 CUDA Cores, 48 Tensor Cores and 12 RT Cores, so 12 GPCs give a full AD102 up to 18,432 CUDA Cores, 576 Tensor Cores and 144 RT Cores. In addition, each GPC has 16 ROPs, which gives AD102 up to 192 ROPs.
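The totals above follow from multiplying down the hierarchy (GPC, TPC, SM). The short Python sketch below reproduces that arithmetic using only the per-unit counts given in this article; the constant and function names are our own, and the figures describe a full AD102, so retail cards may enable fewer units.

```python
# Per-unit counts as stated in the article for a full AD102 chip.
GPCS = 12
TPCS_PER_GPC = 6
SMS_PER_TPC = 2
ROPS_PER_GPC = 16
PER_SM = {"cuda_cores": 128, "tensor_cores": 4, "rt_cores": 1}


def ad102_totals() -> dict:
    """Multiply per-SM and per-GPC counts up to whole-chip totals."""
    sms_per_gpc = TPCS_PER_GPC * SMS_PER_TPC  # 6 TPCs x 2 SMs = 12 SMs
    total_sms = GPCS * sms_per_gpc            # 12 GPCs x 12 SMs = 144 SMs
    totals = {name: count * total_sms for name, count in PER_SM.items()}
    totals["sms"] = total_sms
    totals["rops"] = GPCS * ROPS_PER_GPC      # ROPs are counted per GPC
    return totals


if __name__ == "__main__":
    for name, value in ad102_totals().items():
        print(f"{name}: {value:,}")
```

Running it prints 18,432 CUDA Cores, 576 Tensor Cores, 144 RT Cores, 144 SMs and 192 ROPs, matching the figures in the text.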

Ada Lovelace keeps the PCIe 4.0 x16 interface and, on AD102, a 384-bit memory interface.

Higher performance does come with higher power consumption. However, NVIDIA’s comparison with Ampere shows that at the same power, Ada Lovelace delivers roughly twice the performance. The default TGP of the AD102-based GeForce RTX 4090 is 450W.

New GPU features in the Ada Lovelace architecture include SER (Shader Execution Reordering), DMM (Displaced Micro-Mesh), OMM (Opacity Micro-Map), FP8 inference, the Optical Flow Accelerator and DLSS 3.

Of all these new features, DLSS 3 is arguably the most revolutionary.

DLSS 3 introduces a revolutionary new feature called AI Frame Generation, which promises to nearly double the frame rate at the same image quality. DLSS 3 keeps everything DLSS 2 offers, including AI super resolution (upscaling lower-resolution frames to native resolution with minimal loss of quality), but it can also use AI to generate entire frames without going through the graphics rendering pipeline. As a result, every other frame displayed with DLSS 3 is generated by AI rather than being a copy of the previously rendered frame.
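The alternating pattern of rendered and AI-generated frames can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions: frames are plain lists of pixel values, and the hypothetical `generate_intermediate` helper uses a simple 50/50 blend purely as a stand-in for the DLSS 3 AI network, which works very differently and is not described at code level here.

```python
from typing import List, Tuple

Frame = List[float]  # hypothetical: a frame as a flat list of pixel values


def generate_intermediate(prev: Frame, curr: Frame) -> Frame:
    # Placeholder only: a plain average of two rendered frames.
    # DLSS 3 uses an AI network fed by optical flow and motion vectors.
    return [(a + b) / 2 for a, b in zip(prev, curr)]


def present_sequence(rendered: List[Frame]) -> List[Tuple[str, Frame]]:
    """Insert one generated frame between each pair of rendered frames."""
    out: List[Tuple[str, Frame]] = []
    for i, frame in enumerate(rendered):
        if i > 0:
            out.append(("generated", generate_intermediate(rendered[i - 1], frame)))
        out.append(("rendered", frame))
    return out


seq = present_sequence([[0.0], [10.0], [20.0]])
print([kind for kind, _ in seq])
# ['rendered', 'generated', 'rendered', 'generated', 'rendered']
```

For N rendered frames the sketch displays 2N - 1 frames, so over a long sequence the displayed frame rate approaches twice the rendered rate, which is where the "nearly double" figure comes from.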

DLSS 3 is limited to Ada Lovelace GPUs mainly because of the Optical Flow Accelerator (OFA) hardware, which builds a so-called optical flow field that estimates how pixels move between consecutive frames. The OFA also helps keep the DLSS 3 algorithm from being confused by static objects in fast-changing 3D scenes, while the FP8 performance of the 4th-generation Tensor Cores provides much of the speed the frame-generation network needs.
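The idea of an optical flow field can be illustrated with classic block matching on a single row of pixels. This is a textbook technique swapped in for illustration only: nothing here describes how the OFA hardware actually computes flow, real optical flow works on 2-D images and produces 2-D motion vectors, and the function name, window `radius` and `max_shift` search range are hypothetical.

```python
from typing import List


def block_match_flow(prev: List[float], curr: List[float],
                     max_shift: int = 3, radius: int = 1) -> List[int]:
    """Toy 1-D optical flow by block matching.

    flow[x] = d means the content at position x in `curr` best matches
    position x - d in `prev`, i.e. it moved d pixels to the right.
    """
    n = len(curr)
    # Try small shifts first so ties (e.g. flat background) resolve to 0.
    shifts = sorted(range(-max_shift, max_shift + 1), key=abs)
    flow: List[int] = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in shifts:
            cost = 0.0
            for k in range(-radius, radius + 1):
                xc = min(max(x + k, 0), n - 1)      # clamp at image edges
                xp = min(max(x + k - d, 0), n - 1)
                cost += abs(curr[xc] - prev[xp])    # sum of absolute differences
            if cost < best_cost:
                best_d, best_cost = d, cost
        flow.append(best_d)
    return flow


prev = [0, 0, 9, 0, 0, 0, 0]   # bright pixel at index 2
curr = [0, 0, 0, 0, 9, 0, 0]   # it has moved to index 4
print(block_match_flow(prev, curr)[4])  # 2
```

A flow field like this tells a frame-generation step where each piece of the image is heading, which is the kind of information the text says the OFA supplies to DLSS 3.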

The last element of DLSS 3 is NVIDIA Reflex. Reflex plays a vital role in keeping DLSS 3 frame times and latency in check by reducing the render queue to zero, which also ensures the render queue does not confuse the upscaler. Because DLSS 3 depends on the combination of the OFA and 4th-generation Tensor Cores, it is available on Ada Lovelace, while Ampere and older architectures cannot run it.
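To see why an empty render queue matters, here is a toy latency model. It is our own simplification, not NVIDIA’s: it assumes a GPU-bound game in which every already-queued frame adds one full GPU frame time of delay before the newest input is rendered, and it ignores display scan-out and other pipeline stages. The function name and the example timings are hypothetical.

```python
def toy_input_latency_ms(cpu_ms: float, gpu_ms: float, queued_frames: int) -> float:
    """Toy model (assumption): input sampled when the CPU starts a frame
    waits behind `queued_frames` submitted frames, each taking `gpu_ms`
    on the GPU, before its own frame is rendered."""
    if queued_frames < 0:
        raise ValueError("queued_frames must be non-negative")
    return cpu_ms + queued_frames * gpu_ms + gpu_ms


# Hypothetical timings: 5 ms of CPU work and 10 ms of GPU work per frame.
print(toy_input_latency_ms(5.0, 10.0, queued_frames=2))  # 35.0
print(toy_input_latency_ms(5.0, 10.0, queued_frames=0))  # 15.0
```

In this model, emptying the queue removes the `queued_frames * gpu_ms` term entirely, which is the effect the text attributes to Reflex.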
