GPUs on Azure Local: Bringing AI to the Edge
There’s an inescapable gravitational pull in the technology world right now, and it’s called AI. Every conversation, every strategy deck, every vendor keynote inevitably arrives at the same destination. While there’s plenty of hype to cut through, there’s also genuine substance underneath, and for on-premises infrastructure the question is increasingly not whether you’ll run AI workloads locally, but when and how.
Azure Local’s GPU support has been maturing steadily, and it’s now at a point where running GPU-accelerated workloads on your on-premises Azure Local clusters is a practical reality. Let’s talk about what this looks like, why it matters, and where it makes sense.
The Case for Local AI
Before getting into the technical bits, it’s worth articulating why you’d want to run AI workloads on-premises rather than in Azure. There are a few common drivers.
Data gravity and sovereignty. If your data can’t leave your premises, whether for regulatory, compliance, or policy reasons, then your AI models need to come to your data, not the other way round. This is particularly relevant in government, healthcare, financial services, and manufacturing environments where data sensitivity is paramount.
Latency. AI inference workloads that need real-time or near-real-time responses don’t always tolerate the round trip to a cloud data centre. Think quality inspection on a manufacturing line, real-time video analytics for security, or diagnostic imaging in a hospital. Milliseconds matter.
Cost predictability. GPU instances in the public cloud are expensive, and for persistent workloads that run 24/7, the economics of running them on-premises can be significantly more favourable. If you know you need a GPU running all day every day, owning it is cheaper than renting it.
Disconnected or limited connectivity scenarios. Some edge locations simply don’t have reliable connectivity to the cloud. Running AI inference locally ensures the workload keeps running regardless of network availability.
How It Works on Azure Local
Azure Local supports GPU passthrough using Discrete Device Assignment (DDA), which allows you to assign a physical GPU directly to a virtual machine. The VM gets dedicated access to the GPU hardware, and the workload running inside the VM can use it just as it would on bare metal. This is supported both for Azure Local VMs and for AKS node pools, giving you flexibility in how you deploy GPU-accelerated workloads.
The supported GPU list has been growing, with NVIDIA as the primary partner here. NVIDIA RTX series GPUs are supported for professional visualisation and AI inference workloads, while the Tesla and data centre GPU lines provide higher-end capability for more demanding scenarios.
From an Azure management perspective, GPU-enabled VMs are managed the same way as any other Azure Local VM through the Azure portal. You can deploy, monitor, resize, and manage them using the same tools and APIs. The GPU is visible as a resource in the VM configuration, and you can use standard NVIDIA drivers and CUDA toolkits inside the VM.
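Once the drivers are installed inside the guest, it is worth sanity-checking that the passed-through GPU is actually visible to CUDA-based tooling. Below is a minimal sketch using the nvidia-ml-py (pynvml) bindings; running a script like this inside the VM is just one way to verify the setup, not part of any Azure Local tooling, and the package needs to be installed separately.

```python
# Minimal check inside a GPU-enabled VM: confirm the passed-through GPU is
# visible via the NVIDIA Management Library (requires the NVIDIA driver and
# the nvidia-ml-py package: pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    driver = pynvml.nvmlSystemGetDriverVersion()
    if isinstance(driver, bytes):            # older bindings return bytes
        driver = driver.decode()
    count = pynvml.nvmlDeviceGetCount()
    print(f"Driver {driver}, {count} GPU(s) visible")

    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {name}, {mem.total // 2**20} MiB total memory")
finally:
    pynvml.nvmlShutdown()
```

If the GPU shows up here with the expected memory size, the DDA assignment and guest drivers are working and you can move on to installing your CUDA-based workload.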
What This Looks Like on Dell AX
Dell AX nodes support GPU configurations depending on the specific node model. The form factor and power envelope of the node determine what GPUs can be physically installed. For AI inference workloads, the common configurations use NVIDIA professional GPUs that fit within the power and thermal constraints of the AX chassis.
The important thing from a Dell perspective is that GPU configurations within AX nodes are validated and supported as part of the overall solution. The Solution Builder Extension handles firmware updates for the GPU alongside the rest of the system, and Dell ProSupport covers the GPU hardware. You're not bolting a GPU into a server and hoping for the best; you're deploying a validated configuration that's been tested end to end.
For organisations looking at larger scale AI workloads, the AX node lineup offers options with more PCIe slots and power headroom to accommodate higher end GPUs. The specific configuration will depend on your workload requirements, and it’s worth engaging with Dell’s solution architects to get the sizing right.
Practical Use Cases
The use cases I’m seeing most often in the field fall into a few categories.
AI inference at the edge is the most common. Organisations train models in Azure and then deploy them to Azure Local for inference close to the data source. This pattern works well because the training happens where GPU capacity is abundant and elastic (Azure), while inference happens where the data lives (Azure Local).
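As an illustration of that hand-off, the sketch below loads a model and serves predictions on the local GPU using ONNX Runtime's CUDA execution provider, assuming the onnxruntime-gpu package is installed in the VM. The model file name, input shape, and the idea that it was exported from an Azure training run are placeholders for the example, not a prescribed pipeline.

```python
import numpy as np
import onnxruntime as ort

# Load a model exported from a training run (file name is illustrative) and
# prefer the CUDA execution provider, falling back to CPU if unavailable.
session = ort.InferenceSession(
    "defect_classifier.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

def classify(batch: np.ndarray) -> np.ndarray:
    """Run a batch of preprocessed images through the model on the local GPU."""
    outputs = session.run(None, {input_name: batch.astype(np.float32)})
    return outputs[0]

# Example call: a single 224x224 RGB image in NCHW layout (values are dummies).
scores = classify(np.random.rand(1, 3, 224, 224))
print("Predicted class:", int(scores.argmax()))
```

The same VM can be refreshed with a newly trained model simply by replacing the ONNX file, which keeps the train-in-Azure, infer-on-Azure-Local loop simple to operate.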
Visual computing and VDI. GPU-accelerated virtual desktops for engineering, design, and creative workloads. If you're running Azure Virtual Desktop on Azure Local, adding GPUs to your session hosts opens up workloads that previously required dedicated physical workstations.
Video analytics. Processing camera feeds for security, safety, or operational intelligence. This requires real-time processing that can’t tolerate cloud latency, and GPUs provide the compute density needed to process multiple video streams concurrently.
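A hedged sketch of that pattern follows: pull frames from a camera stream with OpenCV and run a detection model on the local GPU, skipping frames to keep pace with the feed. The RTSP URL, model file, 640x640 input size, and output layout are all placeholders; a production deployment would more likely use a purpose-built video analytics framework than a loop like this, but the shape of the work is the same.

```python
# Sketch: analyse a camera feed locally with a GPU-backed detector.
# The camera URL, model file, and input/output shapes are hypothetical;
# substitute whatever your model actually expects.
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "people_detector.onnx",                    # hypothetical detection model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

cap = cv2.VideoCapture("rtsp://camera-01.local/stream")   # hypothetical feed
frame_index = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame_index += 1
    if frame_index % 5:                        # analyse every fifth frame
        continue

    # BGR -> RGB, HWC -> CHW, add a batch dimension, scale to [0, 1].
    resized = cv2.resize(frame, (640, 640))
    blob = resized[:, :, ::-1].transpose(2, 0, 1)[np.newaxis].astype(np.float32) / 255
    detections = session.run(None, {input_name: blob})[0]
    print(f"frame {frame_index}: {len(detections)} raw detections")

cap.release()
```

The GPU is what makes it practical to run several of these streams concurrently on a single node; on CPU alone the per-frame inference time quickly becomes the bottleneck.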
Where We’re Heading
The GPU story on Azure Local is only going to get bigger. Microsoft and NVIDIA continue to deepen their partnership, with new GPU models being validated, new capabilities for managing GPU resources through Azure, and expanded support for AI frameworks and model-serving tools.
The direction is clear: Azure Local is positioning itself as the platform for running AI workloads outside of Azure data centres. Whether that's at the edge, in your main data centre, or in disconnected environments, the combination of Azure management, NVIDIA GPU hardware, and Dell AX infrastructure provides a foundation that's ready for what's coming. The question for most organisations isn't whether to start exploring this; it's which use case to start with. Pick one, prove the value, and build from there.


