The rise of large language models (LLMs) and distributed AI training has fundamentally changed data center network architecture. GPU clusters require massive bandwidth, ultra-low latency, and deterministic performance—demands that legacy 100G infrastructures often cannot meet. Today, hyperscalers and enterprises building AI fabrics are rapidly adopting 400G optics, particularly OSFP112-400G-VSR4 and QSFP56-DD-400G-VSR4 for short-reach GPU-to-switch connections, and QSFP56-DD-400G-DR4 (also called QSFP DD DR4) for spine-leaf fabrics up to 500 meters. Meanwhile, traditional 100G optics such as QSFP28 100G LR4, ER4, ZR4, BIDI 40KM/80KM, and QSFP28 100G 100KM remain relevant for management networks, storage backends, and metro interconnects—but are being phased out for GPU front-end networks. This article explains the technical reasons behind this shift and provides a migration roadmap for AI-ready optical infrastructure.

AI training clusters (e.g., NVIDIA DGX H100/H200, AMD Instinct) use collective communication algorithms (all-reduce, all-to-all) that generate incast traffic patterns. Key requirements:
Bandwidth per GPU: 400G is becoming the minimum for next-gen GPUs (e.g., NVIDIA B200).
Latency: Sub-microsecond switch latency; transceiver latency must be under 100ns for VSR links.
Lossless fabric: Any packet loss triggers retransmission, stalling training.
Power efficiency: Thousands of transceivers in a cluster; every watt matters.
Density: 1U switches with 32–64 ports to minimize rack space.
Legacy 100G optics, even QSFP28 100G LR4, cannot deliver the required per-port bandwidth. 400G is the new baseline.
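To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch (not from any specific deployment): it assumes a ring all-reduce that moves about 2(N−1)/N of the gradient volume per GPU, FP16 gradients of roughly 140 GB for a 70B-parameter model, and ~85% link efficiency, and compares 100G vs 400G of per-GPU bandwidth.

```python
# Rough sizing sketch: bandwidth-bound all-reduce time over the GPU fabric.
# Assumptions (illustrative, not from the article): ring all-reduce moves
# about 2*(N-1)/N * S bytes per GPU, FP16 gradients of a 70B-parameter
# model are ~140 GB, and links run at ~85% efficiency.

def allreduce_seconds(num_gpus: int, gradient_bytes: float, link_gbps: float,
                      efficiency: float = 0.85) -> float:
    """Estimate ring all-reduce completion time, bandwidth-bound case."""
    bytes_per_gpu = 2 * (num_gpus - 1) / num_gpus * gradient_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8 * efficiency
    return bytes_per_gpu / link_bytes_per_s

grad = 140e9  # ~70B params * 2 bytes (FP16)
for gbps in (100, 400):
    t = allreduce_seconds(num_gpus=4000, gradient_bytes=grad, link_gbps=gbps)
    print(f"{gbps}G per GPU: ~{t:.1f} s per full-gradient all-reduce")
```

Under these assumptions the 400G fabric finishes the same collective roughly four times faster, which is the point of treating 400G as the baseline.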
In an AI rack, GPUs are often less than 3 meters from the top-of-rack (ToR) switch. For such distances, OSFP112-400G-VSR4 and QSFP56-DD-400G-VSR4 offer compelling advantages.
OSFP112-400G-VSR4 uses 4×112G electrical lanes with no gearbox overhead (unlike 8×50G designs). This results in transceiver latency of approximately 60–80 nanoseconds, compared to 150–200ns for QSFP56-DD-400G-DR4 and over 500ns for 100G LR4 (due to serialization/deserialization). In all-reduce operations, every nanosecond adds up across thousands of hops.
At 7–8W, OSFP112-400G-VSR4 achieves ~18 mW/Gb. A cluster with 8,000 GPUs might require 32,000 VSR4 links (4 per GPU). The power saving compared to replacing each 400G link with four 100G LR4 modules (about 3.5W each) is enormous: 32,000 × (4 × 3.5W − 8W) = 32,000 × 6W = 192kW saved. Over a year, that translates to a significant OpEx reduction.
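The arithmetic behind that figure, spelled out as a small script using the per-module wattages quoted above:

```python
# Worked version of the power comparison above. Per-module wattages
# (8 W for OSFP112-400G-VSR4, 3.5 W for QSFP28 100G LR4) follow the text;
# everything else is simple multiplication.

links_400g = 32_000                      # 8,000 GPUs x 4 VSR4 links each
vsr4_w, lr4_w = 8.0, 3.5                 # watts per module

p_vsr4 = links_400g * vsr4_w             # native 400G optics
p_lr4 = links_400g * 4 * lr4_w           # 4x 100G LR4 per 400G of bandwidth
print(f"VSR4 fabric optics:     {p_vsr4 / 1e3:.0f} kW")
print(f"4x100G LR4 equivalent:  {p_lr4 / 1e3:.0f} kW")
print(f"Saved:                  {(p_lr4 - p_vsr4) / 1e3:.0f} kW")  # -> 192 kW
```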
VSR4 uses multimode fiber (OM4) and VCSELs, which are cheaper than single-mode DFB lasers, and the cabling is also less expensive. For intra-rack links, QSFP56-DD-400G-VSR4 provides a QSFP-form-factor option for teams already invested in the QSFP ecosystem.
In larger AI clusters, GPU racks are distributed across multiple rows, with distances up to 500 meters between leaf and spine switches. Here, QSFP56-DD-400G-DR4 is the standard. It uses single-mode fiber and DFB lasers for the higher launch power and longer reach, while keeping module power consumption around 10W (about 25 mW/Gb).
Compared to using four QSFP28 100G LR4 links for the same aggregate bandwidth, DR4 uses one-quarter of the switch ports (one 400G port instead of four 100G ports) and reduces cabling complexity (one MPO-12 instead of four duplex LC pairs). For a spine switch with 32 uplinks, 400G DR4 provides 12.8 Tbps in 1U, while 100G LR4 would require 128 ports (impossible in 1U). Density wins.
Although the GPU fabric migrates to 400G, 100G transceivers remain essential for other functions:
Storage backend (NVMe over Fabrics): Often runs at 100G; QSFP28 100G LR4 or ER4 for longer distances to storage arrays.
Out-of-band management network: 100G is more than sufficient.
Data center interconnect (DCI): Connecting AI clusters across metro distances (40–100km) still relies on QSFP28 100G ZR4, BIDI 40KM/80KM, or coherent QSFP28 100G 100KM modules. 400G ZR is emerging but remains expensive.
Legacy compute clusters: Not all servers need 400G; 100G LR4 continues to serve.
Therefore, a mixed 100G/400G environment is inevitable. The key is to avoid using 100G optics in the GPU-facing fabric.
In distributed training, the all-reduce latency penalty is roughly proportional to the round-trip time of the fabric. The table below compares typical transceiver latencies (excluding switch ASIC and cable delay):

| Transceiver | Typical Latency (ns) | Impact on 10k-GPU All-Reduce (relative) |
|---|---|---|
| OSFP112-400G-VSR4 | 70 ns | 1x (baseline) |
| QSFP56-DD-400G-VSR4 | 85 ns | 1.2x |
| QSFP56-DD-400G-DR4 | 160 ns | 2.3x |
| QSFP28 100G LR4 (used in breakout) | 500 ns | 7.1x |
| QSFP28 100G ZR4 (with FEC and DSP) | 1,200 ns | 17x |
Clearly, VSR4 is the only choice for GPU-to-ToR links where latency is paramount. For leaf-spine, DR4’s 160ns is acceptable, but some hyperscalers are moving to OSFP112-based DR4 variants to lower latency further.
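As a rough illustration of how those per-module numbers accumulate, the sketch below assumes a GPU-to-GPU path crossing four optical links (GPU NIC to leaf, leaf to spine, spine to leaf, leaf to GPU NIC) with a transceiver at each end of every link; the hop count is an assumption for illustration, not a measured topology.

```python
# Illustrative only: cumulative transceiver delay on one network path,
# using the per-module latencies from the table above. Assumes (not from
# the article) a 4-link GPU -> leaf -> spine -> leaf -> GPU path, with a
# transceiver at each end of every optical link.

transceiver_ns = {
    "OSFP112-400G-VSR4": 70,
    "QSFP56-DD-400G-VSR4": 85,
    "QSFP56-DD-400G-DR4": 160,
    "QSFP28 100G LR4 (breakout)": 500,
}

links_per_path = 4
for name, ns in transceiver_ns.items():
    total = ns * 2 * links_per_path      # two modules per optical link
    print(f"{name}: {total} ns of optics delay per path")
```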
Consider a 4,000-GPU cluster (e.g., 500 NVIDIA H100 nodes, each with 8 GPUs). For GPU-to-ToR connections, assume one 400G link per GPU: 4,000 VSR4 links in total. Power consumption:
OSFP112-400G-VSR4: 4,000 × 7.5W = 30,000W (30kW) for optics.
If using legacy 100G LR4 (4×100G per GPU): 4,000 × 4 × 3.5W = 56,000W (56kW).
Savings: 26kW. At a PUE of 1.5, total facility power saved = 39kW. Annual electricity cost (at $0.10/kWh) = 39kW × 8760h × $0.10 = $34,164 per year just for the GPU fabric. Over a 5-year lifespan, >$170,000 saved, not including lower CapEx on cables and switch ports.
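The same estimate as a small script, with the assumptions (PUE, electricity price, lifespan) kept as variables so they are easy to adjust:

```python
# Worked version of the 4,000-GPU cost estimate above. PUE, electricity
# price, and lifespan follow the figures in the text; change them to match
# your own facility.

gpus = 4_000
vsr4_w, lr4_w = 7.5, 3.5                         # watts per module
pue, price_per_kwh, years = 1.5, 0.10, 5

optics_400g_kw = gpus * vsr4_w / 1e3             # 30 kW
optics_100g_kw = gpus * 4 * lr4_w / 1e3          # 56 kW
facility_saving_kw = (optics_100g_kw - optics_400g_kw) * pue   # 39 kW

annual_usd = facility_saving_kw * 8760 * price_per_kwh
print(f"Facility power saved: {facility_saving_kw:.0f} kW")
print(f"Annual electricity saved: ${annual_usd:,.0f}")
print(f"Over {years} years: ${annual_usd * years:,.0f}")
```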
400G VSR4 and DR4 use MPO-12 connectors, which are more compact than four separate duplex LC connectors. In a rack with 32 ToR ports, MPO cabling reduces cable bulk by 75%. This improves airflow and simplifies maintenance. For AI clusters with thousands of cables, this is a major operational advantage.
If you currently have a 100G GPU fabric using QSFP28 100G LR4 or SR4, upgrading to 400G requires:
Replace GPU NICs with 400G-capable ones (e.g., NVIDIA ConnectX-7 or -8).
Replace ToR switches with 400G models (OSFP or QSFP-DD).
Replace optical cables: If using MMF, upgrade to OM4 and install OSFP112-400G-VSR4 or QSFP56-DD-400G-VSR4. If using SMF for longer runs, deploy QSFP56-DD-400G-DR4.
For leaf-spine, upgrade spine switches and use DR4 modules.
To maintain existing 100G storage or management links, keep those transceivers and connect them to separate switch ports. Use 400G-to-100G breakout only when absolutely necessary, as it adds latency.
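As a planning aid, here is a minimal sketch of the rule of thumb in this section, assuming a hypothetical link-inventory format (the `role` and `distance_m` fields are illustrative, not from any real tool):

```python
# Hypothetical planning sketch: classify each link by role and pick the
# transceiver family this article recommends, so 100G optics never end up
# in the GPU-facing fabric. The inventory format is made up for this example.

def recommend(role: str, distance_m: float) -> str:
    if role == "gpu-fabric":
        return ("OSFP112-400G-VSR4 / QSFP56-DD-400G-VSR4 (OM4)"
                if distance_m <= 100 else "QSFP56-DD-400G-DR4 (SMF)")
    if role in ("storage", "management"):
        return "keep QSFP28 100G (LR4/ER4 as distance requires)"
    if role == "dci":
        return "QSFP28 100G ZR4 / BIDI / 100KM until 400G ZR is economical"
    return "review manually"

links = [
    {"role": "gpu-fabric", "distance_m": 3},
    {"role": "gpu-fabric", "distance_m": 450},
    {"role": "storage", "distance_m": 2_000},
    {"role": "dci", "distance_m": 80_000},
]
for link in links:
    print(link["role"], link["distance_m"], "->", recommend(**link))
```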
NVIDIA’s announced B200 GPU and future AMD/Intel GPUs will support 800G per GPU. The industry is already standardizing 800G SR8 and DR8 using OSFP112 and QSFP112 form factors. However, 400G VSR4 will remain relevant for several years as the cost-effective option for less demanding workloads. Moreover, many 800G modules can be configured to run in 400G mode, allowing a gradual upgrade.
For long-haul AI DCI (e.g., connecting two AI clusters 80km apart), 400G ZR coherent optics will eventually replace 100G ZR4. But today, QSFP28 100G ZR4 and BIDI 80KM remain practical for bandwidth-limited interconnects.
Bonding four 100G links does not give you a single 400G logical pipe with the same flow hashing and latency characteristics. Load balancing issues can cause packet reordering and reduce training efficiency. Native 400G with a single flow is superior.
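A toy sketch of why bonding falls short: per-flow hashing across four 100G LAG members can load some members far more heavily than others, while a native 400G port carries the same traffic as one pipe. The flow sizes and the CRC32 stand-in hash below are illustrative assumptions, not real switch ECMP behavior.

```python
# Toy illustration of the bonding problem: a few large "elephant" flows
# hashed onto 4x100G LAG members can crowd some members while others sit
# idle. Flow sizes are made up; zlib.crc32 stands in for a 5-tuple hash.

import random
import zlib

random.seed(7)
flows_gbps = [random.choice([20, 40, 60]) for _ in range(8)]   # elephant flows

members = [0.0] * 4                                  # 4x100G LAG members
for i, gbps in enumerate(flows_gbps):
    member = zlib.crc32(f"flow-{i}".encode()) % 4    # stand-in per-flow hash
    members[member] += gbps

print("Per-member load (Gbps):", members, "(capacity 100 each)")
print("Offered load:", sum(flows_gbps), "Gbps vs one native 400G port")
```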
NVIDIA ConnectX-7 supports 400G in both OSFP and QSFP-DD variants; check the specific NIC model's cage type, as you may need a different transceiver form factor or an adapter cable.
QSFP56-DD-400G-DR4 can be used for short intra-rack links, but it is overkill and less power-efficient than VSR4. Use VSR4 for sub-100m runs.
Long-reach 100G modules such as ZR4 and BIDI 40KM/80KM add around 1–2 microseconds of latency due to DSP and FEC processing, which makes them unsuitable for GPU fabrics; BIDI optics are intended for metro links, not intra-DC connections.
OSFP112 has slightly lower latency and power thanks to its 4-lane electrical interface; QSFP56-DD-VSR4 uses 8 lanes and a gearbox, adding roughly 15ns of latency. Both work, but OSFP112 is the better choice for greenfield AI builds.
100G optics are not going away for DCI and management networks, but within the GPU fabric they are being phased out quickly.
VSR4 requires OM4 multimode fiber (850nm); OM3 limits reach to about 70 meters. OM5 supports future higher-speed VCSELs but is not required.
AI/ML workloads have redefined data center networking. The days of stitching together 100G links are over. For GPU-to-switch connections, OSFP112-400G-VSR4 and QSFP56-DD-400G-VSR4 deliver the lowest latency, power, and cost per gigabit. For leaf-spine fabrics up to 500 meters, QSFP56-DD-400G-DR4 provides the density and performance required. Legacy 100G optics—QSFP28 100G LR4, ER4, ZR4, BIDI 40KM/80KM, and 100KM—still have roles in storage, management, and DCI, but not in the critical path of GPU communication.
Our team specializes in AI cluster optical design. We offer pre-validated 400G VSR4 and DR4 modules compatible with NVIDIA, Arista, Cisco, and Juniper switches. We also provide latency testing, power analysis, and full cabling support. Contact us to receive a tailored 400G migration plan for your AI/ML infrastructure.