NVIDIA’s GPU Accelerated Platforms: DGX, HGX, OVX, and RTX
Posted on August 12, 2024 – 16–21 minute read
NVIDIA has established itself as a pioneering and dominant force in graphics, machine learning, AI, and high-performance computing (HPC) workloads, offering an array of sophisticated platforms tailored to meet diverse and complex demands.
In June 2024, NVIDIA briefly became the world’s most valuable company, achieving a market capitalization of over $3 trillion USD on the strength of an estimated 80%-plus share of AI-related computing. NVIDIA remains among the top five most valuable companies in the world, and many analysts predict it will retake and hold that lead in the future.
From the data centers driving enterprise-level AI research to the advanced graphics rendering engines powering the latest in gaming technology, NVIDIA’s product lineup is both extensive and specialized.
Central to this portfolio, and home to the world’s fastest GPUs, are the DGX, HGX, OVX, and RTX platforms, each engineered with distinct capabilities designed to push the boundaries of what is possible in its respective field.
Production of NVIDIA’s Chips
The production of NVIDIA’s cutting-edge chips, such as the A100, H100, L40S, and L4, is the result of a synergistic collaboration between NVIDIA, ASML, and TSMC, each contributing critical technology and expertise to the process.
ASML
At the forefront, ASML plays a crucial role by providing the advanced lithography machines essential for manufacturing intricate semiconductor patterns. Their extreme ultraviolet (EUV) lithography technology is pivotal in creating the tiny transistors that enable modern GPUs to achieve high performance and efficiency.
TSMC
TSMC (Taiwan Semiconductor Manufacturing Company), the world’s leading semiconductor foundry, leverages ASML’s EUV technology to produce NVIDIA’s advanced chips. TSMC’s state-of-the-art fabrication processes, including the sophisticated chip-on-wafer-on-substrate (CoWoS) packaging technique, allow them to meet the high demand for GPUs such as the H100 and A100. TSMC’s ability to scale production ensures a steady supply of these high-performance GPUs and drives continuous advancements in semiconductor manufacturing.
NVIDIA collaborates with both ASML and TSMC to integrate its computational lithography software, cuLitho. This software significantly accelerates the chip design and manufacturing process by using NVIDIA’s GPUs to enhance lithography computations.
Production of NVIDIA’s DGX Platform
The NVIDIA DGX platform is manufactured by NVIDIA itself and sold only via official NVIDIA DGX Solution Partners and DGX Cloud Partners. Beyond the integrated technology and software, perhaps the main advantage of the DGX platform is the direct relationship with NVIDIA that the customer gains, together with the expertise of the delivery and hosting partners, such as NEBUL.
Production of NVIDIA’s HGX, OVX and RTX Platforms
Several leading server manufacturers produce NVIDIA HGX systems, ensuring a wide range of configurations and options to meet various requirements.
These manufacturers include:
- Supermicro
- Hewlett Packard Enterprise
- Inspur
- Lenovo
- Dell Technologies
- ASUS
NVIDIA DGX: The F1 Race Car with the F1 Team of AI Supercomputing
DGX Overview
NVIDIA DGX systems represent the cutting-edge in AI supercomputing, purpose-built to meet the demands of AI, deep learning and advanced analytics.
As the flagship product line in NVIDIA’s AI portfolio, DGX systems deliver unparalleled computational power, scalability, and efficiency, making them the go-to choice for enterprises and research institutions aiming to drive innovation and gain insights at unprecedented speeds.
NVIDIA DGX is known as the world’s fastest and most comprehensive ‘off-the-shelf’, commercially available supercomputer: a fully contained solution including a reference architecture, hardware, networking, and software capabilities.
DGX Architecture Design:
NVIDIA DGX systems are meticulously designed to optimize AI workflows. At its core, the DGX platform leverages the latest NVIDIA GPUs, interconnected with NVLink and NVSwitch technologies to facilitate ultra-high bandwidth and low-latency communication between GPUs. This configuration ensures that data flows seamlessly, allowing for rapid model training and inferencing.
The DGX architecture is further enhanced by an integrated software stack, including the NVIDIA CUDA-X AI libraries and the NGC catalog, which provides pre-trained models and optimized AI frameworks, streamlining the deployment and management of AI workloads.
DGX Key Use-Cases:
NVIDIA DGX systems are deployed across a wide array of industries and applications. Here are some of the most impactful workload-related use-cases:
- Deep Learning Training: DGX systems are optimized for training large-scale neural networks for natural language processing (NLP) and autonomous systems.
- Enterprise AI Inference: With high throughput and low latency, DGX systems are ideal for deploying trained models into production environments where real-time inference is critical, such as recommendation engines, fraud detection, and voice assistants.
- Machine Learning: DGX systems accelerate traditional machine learning algorithms, enhancing tasks like regression analysis, classification, and clustering for big data analytics.
- Data Analytics and Processing: DGX systems enable large-scale data processing and analytics, empowering organizations to analyze vast datasets quickly and derive actionable insights, essential for sectors like finance, retail, and telecommunications.
- High-Performance Computing (HPC): The DGX platform supports computationally intensive simulations and complex mathematical modeling, making it suitable for scientific research in fields such as climate modeling, genomics, and astrophysics.
- Reinforcement Learning: Ideal for training reinforcement learning models that require substantial computational power and parallel processing capabilities, applicable in robotics, game development, and automated trading systems.
- Enterprise Generative AI: Facilitates the training and deployment of generative models like GANs and VAEs, which are used for content creation, drug discovery, and synthetic data generation.
- Enterprise Scale Digital Twins
DGX Supported GPUs
NVIDIA DGX systems support the latest and most powerful GPUs in NVIDIA’s lineup. This includes the A100 and the new H100 Tensor Core GPUs. Both GPUs are designed to deliver exceptional performance for AI and high-performance computing (HPC) workloads, but they come with distinct features and advancements.
The A100 GPU, based on the Ampere architecture, has been a cornerstone for AI and HPC applications. It features third-generation Tensor Cores, which significantly accelerate AI computations, and supports multi-instance GPU (MIG) technology, allowing multiple networks to operate concurrently on a single GPU. The A100 is known for its versatility and power, capable of handling a wide range of AI and data analytics tasks efficiently.
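The MIG idea described above can be sketched in a few lines of plain Python. This is a simplified, hypothetical model only: an A100 exposes up to seven GPU slices, and valid instance profiles span 1, 2, 3, 4, or 7 slices, but real partitioning is done through NVIDIA’s tooling (e.g. `nvidia-smi mig`) and has stricter placement rules than this greedy packer.

```python
# Simplified, hypothetical model of Multi-Instance GPU (MIG) partitioning.
# An A100 exposes up to 7 GPU slices; instance profiles span 1, 2, 3, 4,
# or 7 slices. Real MIG placement rules are stricter than this sketch.

VALID_SLICE_COUNTS = {1, 2, 3, 4, 7}
TOTAL_SLICES = 7

def allocate_instances(requested_sizes):
    """Greedily pack requested MIG instance sizes (in slices) into one GPU.

    Returns the list of sizes that fit; raises on an invalid profile size.
    """
    remaining = TOTAL_SLICES
    placed = []
    for size in requested_sizes:
        if size not in VALID_SLICE_COUNTS:
            raise ValueError(f"no MIG profile uses {size} slices")
        if size <= remaining:
            placed.append(size)
            remaining -= size
    return placed

# Example: three networks sharing one A100-like GPU.
print(allocate_instances([3, 2, 2]))  # all three fit: 3 + 2 + 2 = 7 slices
print(allocate_instances([4, 4, 1]))  # the second 4-slice request cannot fit
```

The point of the sketch is simply that MIG lets several isolated workloads share one physical GPU, each with guaranteed resources, rather than time-slicing the whole device.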
In contrast, the newer and more powerful H100 GPU represents a significant leap forward with the new Hopper architecture. It brings substantial improvements in both performance and energy efficiency over the A100.
DGX Notable Attribute
NVIDIA DGX systems boast several attributes that set them apart from other AI computing platforms such as HGX, OVX and RTX:
Support and Ecosystem: NVIDIA provides direct and comprehensive support and a robust ecosystem of partners and developers, ensuring that DGX users have access to the latest advancements and resources in AI technology. The DGX is the only NVIDIA platform that gives the customer direct support and access to NVIDIA engineering and executive teams.
What about External Data Storage?
The only component not included in this platform is the external shared data storage solution. NVIDIA has defined a strict certification process for qualifying shared storage solutions which today include solutions like:
- VAST Data (Most modern, simple to operate and built for AI Capabilities)
- NetApp (Most mature platform, feature rich and stable)
- DDN (Fastest for largest-scale applications – Best for Superpod)
- Weka (Most modular and virtualized)
Depending on your objectives, application and workload requirements, the supported DGX storage platform solutions have their strengths and weaknesses.
NVIDIA HGX: The Flexible AI Supercomputing King
HGX Overview
NVIDIA HGX systems are designed as a highly flexible, supercomputer-class platform to meet the diverse demands of AI, data analytics, and high-performance computing (HPC).
Serving as the backbone for various hyperscale and enterprise AI deployments, HGX systems offer unmatched scalability, performance, and efficiency.
HGX systems are crafted to deliver the computational power necessary for today’s most intensive AI and HPC workloads, providing a robust foundation for data centers aiming to drive innovation and operational excellence.
Typically, NVIDIA Cloud Solution Providers host the HGX with H100s as their most capable supercomputing platform. The HGX is better optimized for multi-tenant environments than the DGX and can hit a lower price point with similar performance, though it is not as fully integrated a solution.
HGX Architecture and Design
The NVIDIA HGX system is built to essentially the same specifications as the DGX platform and is capable of similar performance. The HGX, however, is more flexible and modular, and supports multi-tenancy with greater ease, so it is typically deployed by Cloud Service Providers. The DGX, by contrast, follows a stricter reference architecture and is typically deployed by organizations that want single-tenant, single-purpose use without the need to run multiple workloads simultaneously.
One notable difference is that the HGX does not include the more advanced software packages bundled with NVIDIA’s DGX platform, though these can be ordered separately if desired.
HGX Key Use-Cases
NVIDIA HGX systems are employed across a broad spectrum of industries, powering some of the most demanding AI and HPC applications. Here are key workload-related use-cases:
- Deep Learning Training
- AI Inference
- Machine Learning
- Data Analytics and Processing
- High-Performance Computing (HPC)
- Reinforcement Learning
- Generative AI
- Digital Twins
- Edge Computing
HGX Supported GPUs
Like the DGX platform, the HGX platform supports the latest and most advanced GPUs, including the A100 and H100 Tensor Core GPUs.
NVIDIA OVX: The Powerhouse of Virtualization and Simulation
Overview OVX
NVIDIA OVX systems are specifically designed to accelerate virtualized workloads, providing the computational power necessary for demanding applications in virtualization, simulation, and enterprise AI.
Built to deliver exceptional performance, scalability, and efficiency, OVX systems enable organizations to harness the full potential of virtualized environments, driving innovation and operational excellence across various industries.
OVX Architecture and Design
NVIDIA OVX systems are architected to support the complex needs of virtualized and simulation workloads. Central to the OVX platform are the latest NVIDIA GPUs, including the L4 and L40 GPUs, interconnected using NVLink and NVSwitch technologies to provide high-bandwidth, low-latency communication.
This setup ensures efficient data transfer and parallel processing, which is critical for virtual desktop infrastructure (VDI), high-fidelity simulations, and real-time rendering. The OVX architecture is further enhanced by a robust software stack, including NVIDIA vGPU software, CUDA-X AI, and optimized drivers, which streamline the deployment and management of virtualized workloads.
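One practical consequence of the vGPU software mentioned above is capacity planning: the physical GPU’s framebuffer is carved into fixed-size profiles, one per virtual machine. The sketch below is an illustrative back-of-the-envelope calculation, not an official NVIDIA profile table; the 48 GB figure matches an L40-class card, but the profile sizes are hypothetical.

```python
# Hypothetical sketch of vGPU-style sizing: NVIDIA vGPU software splits a
# physical GPU's framebuffer into fixed-size profiles, one per VM.
# The numbers here are illustrative, not an official profile table.

def vgpu_density(framebuffer_gb: int, profile_gb: int) -> int:
    """How many vGPU instances of a given profile fit on one physical GPU."""
    if profile_gb <= 0 or profile_gb > framebuffer_gb:
        raise ValueError("profile must be between 1 GB and the full framebuffer")
    return framebuffer_gb // profile_gb

# An L40-class GPU with 48 GB of framebuffer:
print(vgpu_density(48, 4))   # 12 VDI users at 4 GB each
print(vgpu_density(48, 48))  # 1 user with the whole GPU
```

The trade-off this exposes is the usual one for VDI deployments: smaller profiles raise user density per server, larger profiles give each user more headroom for GPU-accelerated applications.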
OVX Key Use-Cases
NVIDIA OVX systems are employed in a wide range of industries, powering some of the most demanding virtualization and simulation applications. Here are key workload-related use-cases:
- Virtual Desktop Infrastructure (VDI): OVX systems provide high-performance virtual desktops, enabling organizations to deliver a seamless, responsive user experience for remote workforces, with support for GPU-accelerated applications.
- High-Fidelity Simulations: OVX systems excel at running complex simulations in fields such as automotive, aerospace, and manufacturing, where high computational power and real-time rendering are essential.
- Generative AI: OVX platforms support AI-powered virtual environments and applications, enhancing productivity and operational efficiency across various sectors including finance, healthcare, and retail.
- 3D Rendering and Visualization: OVX systems enable real-time 3D rendering and visualization for industries such as media and entertainment, architecture, and engineering, facilitating the creation of high-quality visual content.
- Collaborative Workspaces: OVX systems power virtual collaborative environments, allowing teams to work together in simulated spaces, which is particularly useful for design, training, and planning applications.
- Digital Twins: OVX systems support the creation and operation of digital twins, providing real-time simulations and monitoring of physical assets and processes, essential for sectors like manufacturing, energy, and smart cities.
- Edge Computing: OVX systems are also well-suited for edge computing scenarios, where low latency and real-time data processing are critical for applications such as IoT, smart grids, and autonomous vehicles.
- Omniverse Platform: NVIDIA Omniverse provides a collaborative platform for 3D design and simulation, enabling teams to work together seamlessly across different applications and locations.
OVX Supported GPUs
NVIDIA OVX systems support the latest and most powerful midrange GPUs, including the L4, L40, L40S, and A40 GPUs.
The L4 GPU, based on the Ada Lovelace architecture, is a versatile, energy-efficient accelerator ideal for edge, cloud, and enterprise deployments.
The L40 GPU, also based on the Ada Lovelace architecture, is designed to handle the most demanding AI and visualization workloads. It features fourth-generation Tensor Cores, third-generation RT Cores, and supports advanced AI training and inference, rendering, and simulation tasks.
The L40S GPU includes additional optimizations for performance and efficiency, making it ideal for AI training, inference, and large-scale digital twin simulations, supporting the most demanding Omniverse workloads.
The A40 GPU, based on the Ampere architecture, is designed for professional visual computing. It supports AI and data science workloads, real-time rendering, and high-fidelity simulations.
NVIDIA RTX: The King of Graphics and Ray Tracing
Overview RTX
NVIDIA RTX systems represent the pinnacle of graphics performance, combining real-time ray tracing, AI-enhanced graphics, and cutting-edge simulation capabilities.
Designed for creative professionals, scientists, and gamers, RTX systems deliver the ultimate in visual computing. They offer unmatched performance and realism for applications ranging from digital content creation to high-end gaming and virtual reality (VR).
Architecture and Design
NVIDIA RTX systems are built on the latest GPU architectures, including Ada Lovelace for the RTX 40 series, which provide significant enhancements in rendering, AI processing, and simulation. RTX GPUs are equipped with RT Cores for real-time ray tracing, Tensor Cores for AI workloads, and CUDA Cores for general-purpose computing.
This combination enables RTX systems to deliver photorealistic graphics, accelerate AI tasks, and support complex simulations. The RTX architecture is further enhanced by NVIDIA’s software ecosystem, including NVIDIA Studio drivers, RTX IO for faster data processing, and the NVIDIA Omniverse platform for collaborative 3D design.
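The geometric test that RT Cores accelerate in hardware can be illustrated in plain Python: deciding where a ray hits a primitive. The sketch below shows only the underlying math for a ray-sphere intersection; it is not how RTX hardware is actually programmed (that goes through APIs such as OptiX, DirectX Raytracing, or Vulkan).

```python
import math

def ray_sphere_intersect(origin, direction, center, radius):
    """Return the distance t to the nearest hit, or None if the ray misses.

    Solves |origin + t*direction - center|^2 = radius^2, a quadratic in t.
    `direction` is assumed to be a unit vector, so the quadratic's a == 1.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t >= 0 else None  # reject hits behind the ray origin

# A ray from the origin along +z hits a unit sphere centered at (0, 0, 5):
print(ray_sphere_intersect((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0
```

A real-time renderer runs billions of such tests per frame against complex scene geometry, which is why dedicating silicon to them (as RT Cores do) matters.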
RTX Key Use-Cases
NVIDIA RTX systems are utilized across various industries, powering some of the most demanding graphics and AI applications. Here are key workload-related use-cases:
- Digital Content Creation: RTX systems enable artists and designers to work with complex models and high-resolution textures in real-time, enhancing workflows in 3D modeling, animation, and visual effects.
- High-End Gaming: RTX GPUs bring immersive gaming experiences with real-time ray tracing and AI-enhanced graphics, providing lifelike visuals and smooth gameplay.
- Virtual Reality (VR) and Augmented Reality (AR): RTX systems support the creation and rendering of VR and AR content, enabling applications in gaming, training, and simulation.
- AI-Enhanced Graphics: Tensor Cores in RTX GPUs accelerate AI-driven tasks such as image upscaling with DLSS (Deep Learning Super Sampling), providing higher resolution and better performance in games and creative applications.
- Scientific Visualization: RTX systems are used for visualizing complex scientific data, supporting fields like medical imaging, molecular modeling, and climate simulation.
- Product Design and Engineering: RTX GPUs facilitate real-time rendering and simulation for product design, allowing engineers to visualize and test designs with high fidelity.
- Video Production: RTX systems accelerate video editing and rendering tasks, enabling faster workflows and higher quality output in media production.
RTX Supported GPUs
NVIDIA RTX systems support the latest GPUs from the RTX series, including the RTX 30 and RTX 40 series.
The RTX 30 series, based on the Ampere architecture, offers substantial improvements in performance and efficiency over previous generations. Key features include second-generation RT Cores for enhanced ray tracing and third-generation Tensor Cores for improved AI processing.
The RTX 40 series, built on the Ada Lovelace architecture, represents a significant leap forward in graphics technology. Key advancements of the RTX 40 series include:
- Fourth-Generation Tensor Cores: These provide significant boosts in AI processing power, accelerating tasks such as DLSS and AI-enhanced applications.
- Third-Generation RT Cores: Enhanced ray tracing capabilities deliver more realistic lighting, shadows, and reflections, improving the visual fidelity of rendered scenes.
- Improved CUDA Cores: Offering better performance and efficiency for general-purpose computing tasks, supporting a wide range of applications from gaming to scientific computing.
- NVIDIA DLSS 3: The latest version of DLSS leverages AI to upscale lower-resolution images to higher resolutions, providing better performance and visual quality in games and other applications.
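To make the DLSS idea above concrete: the game renders at a lower resolution, and a trained neural network reconstructs a higher-resolution frame. The sketch below shows only the naive baseline (nearest-neighbor upscaling) that DLSS’s learned model dramatically improves upon; it is a conceptual illustration, not DLSS itself.

```python
def nearest_neighbor_upscale(image, factor):
    """Naively upscale a 2-D grid of pixel values by an integer factor.

    Each output pixel copies the nearest source pixel. DLSS replaces this
    kind of naive interpolation with a trained neural network that
    reconstructs detail; this is only the conceptual baseline.
    """
    return [
        [image[y // factor][x // factor]
         for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]

low_res = [[1, 2],
           [3, 4]]
print(nearest_neighbor_upscale(low_res, 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The performance win comes from the same place in both cases: rendering a quarter of the pixels and synthesizing the rest is far cheaper than rendering the full frame.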
Nebul Provides the Right NVIDIA Platform Solution in the Right Format for Your EU Organization – Available Today!
1. On-Prem Delivery of Complete Platform Solution Stacks
Nebul is an official European NVIDIA partner and can help your teams select (and order) the right platform, networking, and data storage, and deliver a complete solution on-prem, including, if desired, full consulting, installation and tuning, and even maintenance and support.
As an authorized NVIDIA partner we are certified to order, deliver, install, maintain and support all of these platform solutions mentioned in this article.
2. Turn-key Private Sovereign AI Cloud (Platform-as-Service)
Nebul hosts all NVIDIA platform solutions mentioned in our European data centers. A Platform-as-a-Service approach means it’s all installed and running, your teams can simply use it and start running your workloads. We manage all the way up to and including the OS layers, giving you the availability, performance and velocity to start your projects next week, at any scale.
This solution is delivered with VMs and/or containerized front-ends, with virtualization management tools and various capabilities that let you simply do your work. Our team provides direct support to ensure you’re focused on workloads, not technical infrastructure management issues.
Our Cloud services are hosted in Europe, operated from Europe, and meet the stringent European regulatory and compliance requirements for European data sets.
3. Turn-key AI Infrastructure (Infrastructure-as-a-Service)
If you have an advanced technical team who just wants raw access to NVIDIA platform solutions, we have these available as well. You manage the complete system, but we host it in certified EU data center providers connected with the right networks and energy capabilities. We build it for you, including network, storage, energy connections, but YOU are your own pit-crew, while we provide the F1 Race-Car with NVIDIA and any ad-hoc support you might need to address any technical issues.
4. Hybrid-Cloud and Hybrid-IaaS
Finally, if you have (or want) a mix of on-prem data centers and external data centers or clouds, we can provide a complete solution, including safely connecting all data sources. We can provide a custom SLA and management layer tailored exactly to your more complex requirements.
Flexible Financing
With all of our NVIDIA AI platform solutions, you can pay up-front, monthly, yearly, or whatever works best for you. You can own the solution, or just utilize it for the term you need. In general, our commercial model is half the price of Public Cloud, and our infrastructure has been measured to be 15x faster than Public Cloud.
We typically work with flat monthly rates, so there is no monthly variability and there are no hidden charges. You know exactly what you’ll spend in advance. And you have 100% utilization of the system, meaning no waiting for your dedicated resources.
Taking action is as simple as scheduling a chat with our expert team to discuss your needs and requirements, and we’ll deliver.
Don’t wait to start your AI projects: we have the complete platforms ready to go, and you can start tomorrow.
Our customers find that our turn-key NVIDIA infrastructure solutions bridge the gap and eliminate the wait times. Why wait 6-12 months for your NVIDIA clusters when you can start now?