<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Concepts on DRANET</title><link>https://dranet.sigs.k8s.io/docs/concepts/</link><description>Recent content in Concepts on DRANET</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://dranet.sigs.k8s.io/docs/concepts/index.xml" rel="self" type="application/rss+xml"/><item><title>Linux Network Namespaces and Interfaces</title><link>https://dranet.sigs.k8s.io/docs/concepts/linux-network-interfaces/</link><pubDate>Thu, 05 Jun 2025 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/linux-network-interfaces/</guid><description>&lt;p>Network namespaces create isolated network stacks, including network devices, IP addresses, routing tables, rules, &amp;hellip;
This separation is crucial for containerization.&lt;/p>
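&lt;p>As a minimal illustration of this isolation (assuming a Linux host), the devices visible from the current network namespace can be read from &lt;code>/sys/class/net&lt;/code>; a freshly created namespace would show only the loopback device:&lt;/p>

```python
import os

# /sys/class/net lists the network devices visible in the *current*
# network namespace; each namespace has its own, isolated set.
devices = sorted(os.listdir("/sys/class/net"))

# The loopback device "lo" exists in every network namespace.
print("lo" in devices)
```

&lt;p>Running the same listing inside a new namespace (e.g. via &lt;code>ip netns exec&lt;/code>) would show only &lt;code>lo&lt;/code>, regardless of the devices present on the host.&lt;/p>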
&lt;p>Network namespaces also contain network devices that &lt;a href="https://man7.org/linux/man-pages/man7/network_namespaces.7.html">can live in exactly one network
namespace&lt;/a>:&lt;/p>
&lt;blockquote>
&lt;p>A physical network device can live in exactly one network
namespace. When a network namespace is freed (i.e., when the last
process in the namespace terminates), its physical network devices
are moved back to the initial network namespace (not to the
namespace of the parent of the process).&lt;/p>&lt;/blockquote></description></item><item><title>Making Networks Flexible</title><link>https://dranet.sigs.k8s.io/docs/concepts/flexible-networks/</link><pubDate>Thu, 05 Jun 2025 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/flexible-networks/</guid><description>&lt;p>Think about how we build things. In the old days of IT, setting up a server was like building a detailed model airplane.
Every piece had a specific part number and a precise spot where it had to be glued. The network card was &lt;code>eth0&lt;/code>,
and it was &lt;em>always&lt;/em> &lt;code>eth0&lt;/code>. If that changed, things broke.&lt;/p>
&lt;p>Today, in the world of Kubernetes and the cloud, we build things more like we&amp;rsquo;re using Lego bricks.
We have a big box of resources—CPU, memory, and networking—and we snap them together to build what we need,
when we need it. When we&amp;rsquo;re done, we take it apart and throw the bricks back in the box for the next project.&lt;/p></description></item><item><title>Interface Status</title><link>https://dranet.sigs.k8s.io/docs/concepts/interface-status/</link><pubDate>Sun, 25 May 2025 11:30:40 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/interface-status/</guid><description>&lt;h3 id="understanding-interface-status-output">Understanding Interface Status Output&lt;/h3>
&lt;p>When DRANET allocates a network interface to a Pod via a &lt;code>ResourceClaim&lt;/code>, it publishes the status of the allocated device within the &lt;code>ResourceClaim&lt;/code>&amp;rsquo;s &lt;code>status&lt;/code> field. This provides crucial insights into the readiness and configuration of the network interface from a Kubernetes perspective, adhering to the standardized device status defined in &lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/4817-resource-claim-device-status/README.md">KEP-4817&lt;/a>.&lt;/p>
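&lt;p>For illustration, an allocated device&amp;rsquo;s status might look like the following sketch. The field layout follows KEP-4817; the driver, pool, device, address, and IP values here are invented for this example:&lt;/p>

```yaml
status:
  devices:
    - driver: dra.net          # hypothetical driver name
      pool: worker-node-1      # hypothetical pool name
      device: eth1             # hypothetical device name
      conditions:
        - type: Ready
          status: "True"
          reason: NetworkDeviceReady
      networkData:
        interfaceName: eth1
        hardwareAddress: "00:11:22:33:44:55"
        ips:
          - 10.0.0.10/24
```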
&lt;p>After a &lt;code>ResourceClaim&lt;/code> is processed and a network device is allocated, its status is reflected under &lt;code>ResourceClaim.status.devices&lt;/code>. This section contains &lt;code>conditions&lt;/code> and &lt;code>networkData&lt;/code> for each allocated device.&lt;/p></description></item><item><title>Hardware Efficiency</title><link>https://dranet.sigs.k8s.io/docs/concepts/hardware-efficiency/</link><pubDate>Sun, 25 May 2025 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/hardware-efficiency/</guid><description>&lt;h2 id="scaling-out-not-just-up">Scaling Out, Not Just Up&lt;/h2>
&lt;p>The journey of computing has always been a quest for greater efficiency. From hypervisors carving up physical servers to containers offering even more granular control, the pattern is clear. Now, with AI/ML and High-Performance Computing (HPC) taking center stage, a new frontier in resource optimization is opening up, especially around specialized hardware like high-performance networking.&lt;/p>
&lt;p>This is where solutions like DRANET, a Kubernetes network driver, are making significant strides. By cleverly using Kubernetes&amp;rsquo; Dynamic Resource Allocation (DRA), DRANET offers a declarative, Kubernetes-native method to manage and assign advanced network interfaces, including those powerful RDMA-capable NICs, directly to Pods. This isn&amp;rsquo;t merely about network connectivity; it&amp;rsquo;s a more intelligent approach to utilizing the potent, and often costly, hardware that underpins today&amp;rsquo;s distributed applications.&lt;/p></description></item><item><title>RDMA</title><link>https://dranet.sigs.k8s.io/docs/concepts/rdma/</link><pubDate>Sun, 25 May 2025 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/rdma/</guid><description>&lt;h2 id="understanding-rdma-components-in-linux">Understanding RDMA Components in Linux&lt;/h2>
&lt;p>RDMA (Remote Direct Memory Access) is a powerful technology enabling applications to directly read from or write to memory on a remote machine without involving the CPU, caches, or operating system of either machine during the data transfer. This achieves ultra-low latency and high throughput, making it ideal for high-performance computing (HPC), AI/ML, and storage.&lt;/p>
&lt;p>In a Linux system, the RDMA ecosystem involves several interconnected components:&lt;/p></description></item><item><title>RDMA Device Handling</title><link>https://dranet.sigs.k8s.io/docs/concepts/rdma-modes/</link><pubDate>Sun, 25 May 2025 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/rdma-modes/</guid><description>&lt;p>DRANET provides robust support for Remote Direct Memory Access (RDMA) devices, essential for high-performance computing (HPC) and AI/ML workloads that require ultra-low latency communication. DRANET&amp;rsquo;s RDMA implementation intelligently handles device allocation based on the host system&amp;rsquo;s RDMA network namespace mode.&lt;/p>
&lt;h3 id="rdma-device-handling-in-dranet">RDMA Device Handling in DRANET&lt;/h3>
&lt;p>DRANET manages three primary types of RDMA-related components for Pods:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>RDMA Character Devices:&lt;/strong> These are user-space interfaces (e.g., &lt;code>/dev/infiniband/uverbsN&lt;/code>, &lt;code>/dev/infiniband/rdma_cm&lt;/code>) that user applications interact with to set up RDMA resources.&lt;/p></description></item><item><title>How It Works</title><link>https://dranet.sigs.k8s.io/docs/concepts/howitworks/</link><pubDate>Thu, 19 Dec 2024 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/howitworks/</guid><description>&lt;p>The networking DRA driver uses gRPC to communicate with the Kubelet via the &lt;a href="https://github.com/kubernetes/kubernetes/blob/f141907ddd89998e821eb1047885722c8ba8922b/staging/src/k8s.io/kubelet/pkg/apis/dra/v1/api.proto">DRA API&lt;/a> and the Container Runtime via &lt;a href="https://github.com/containerd/nri">NRI&lt;/a>. This architecture improves supportability and reduces the complexity of the solution; it also makes the driver fully compatible with, and agnostic of, the existing CNI plugins in the cluster.&lt;/p>
&lt;p>Once the Pod network namespace has been created, the DRA driver receives a gRPC call from the Container Runtime via NRI to execute the corresponding configuration. A more detailed diagram can be found in:&lt;/p></description></item><item><title>References</title><link>https://dranet.sigs.k8s.io/docs/concepts/references/</link><pubDate>Thu, 19 Dec 2024 11:20:46 +0000</pubDate><guid>https://dranet.sigs.k8s.io/docs/concepts/references/</guid><description>&lt;ul>
&lt;li>&lt;a href="https://dranet.sigs.k8s.io/docs/kubernetes_network_driver_model_dranet_paper.pdf">The Kubernetes Network Driver Model: A Composable Architecture for High-Performance Networking&lt;/a> - This paper introduces the Kubernetes Network Driver model and provides a detailed performance evaluation of DRANET, demonstrating significant bandwidth improvements for AI/ML workloads.&lt;/li>
&lt;/ul>
&lt;iframe src="https://dranet.sigs.k8s.io/docs/kubernetes_network_driver_model_dranet_paper.pdf" width="50%" height="400" frameborder="0" scrolling="auto" allowfullscreen="allowfullscreen">&lt;/iframe>

&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/playlist?list=PL69nYSiGNLP2E8vmnqo5MwPOY25sDWIxb">The Challenges of AI/ML Multi-Node Workloads in Kubernetes - Antonio Ojea, Google - Regular SIG Network Meeting for 2025-07-17&lt;/a>&lt;/li>
&lt;/ul>
&lt;iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSBButnm46ReLtbtgBa2b4xkmr3oXEtH5yf10xsQ4fjcqF4jSOc5MzeZQUS02Ev2j6DKFj8vQAjCIoy/pubembed?start=true&amp;loop=true&amp;delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true">&lt;/iframe>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1Vdr7BhbYXeWjwmLjGmqnUkvJr_eOUdU0x-JxfXWxUT8/edit?usp=sharing">Kubernetes Network Drivers, Antonio Ojea, Presentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRVritcaQFYkvaPuTPsxkgOt0ZfWhqYPcCjNN0UgZcEh9HR1yh3bFDXSOiPbPUayoMzbefZ_qvFoWCX/pubembed?start=true&amp;loop=true&amp;delayms=3000" frameborder="0" width="480" height="299" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true">&lt;/iframe>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md">KEP 3063 - Dynamic Resource Allocation #3063&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://github.com/kubernetes/enhancements/issues/4381">KEP 4381 - DRA: structured parameters #4381&lt;/a>&lt;/p></description></item></channel></rss>