The Inevitable Convergence: How Silicon Photonics and Co-Packaged Optics Are Redefining AI Data Center Interconnects
- Sonya

- Sep 26
The End of the Copper Era: Physical and Economic Limits at the 1.6T Bottleneck
The Fundamental Trade-off Dilemma
Data center architects have long faced a fundamental trade-off: the choice of interconnect technology requires a balance between transmission distance, power consumption, and reliability. Traditional copper cable links have been the preferred solution for short-reach applications due to their high energy efficiency and reliability. However, their physical properties impose severe distance limitations; passive copper cables are typically effective for less than 2 meters, and even active copper cables can only extend to 5 to 7 meters. On the other hand, conventional fiber optic links, while offering longer reach, come at the cost of higher power consumption and lower reliability.
For a long time, this trade-off was acceptable. But with the exponential growth in bandwidth demand driven by artificial intelligence (AI) workloads, particularly within GPU clusters, this balance has been completely disrupted. As network speeds advance from 400G and 800G to 1.6T and even 3.2T, existing interconnect technologies—both copper and traditional fiber—are facing insurmountable physical scaling bottlenecks.
Hitting the Physical Wall at 1.6T Speeds
At 1.6T data rates, the limitations of copper become especially pronounced: the cables are not only too short but also physically bulky, making them impractical for deployment in high-density AI server racks. Simultaneously, traditional pluggable optical modules are confronting their own "power wall."
Preliminary industry data indicates that a 1.6T pluggable optical transceiver consumes a staggering 23 to 25 watts. NVIDIA CEO Jensen Huang has issued a stark warning on this matter: a modern AI GPU requires six optical transceivers, each consuming about 30 watts. This means that building a massive cluster of one million GPUs would result in an interconnect power consumption of an astonishing 180 megawatts (MW)—an unsustainable energy burden for any large-scale system. This clearly illustrates that interconnect power is no longer a secondary consideration but a core factor limiting the expansion of AI computing capabilities. Energy has become the "most important commodity" in the data center.
Consequently, the industry's focus has shifted to a more critical metric: energy per bit, measured in picojoules per bit (pJ/bit). Currently, a typical 800G optical transceiver consumes about 15 watts, which translates to an energy efficiency of approximately 18.75 pJ/bit. To achieve sustainable scaling, however, data center engineers are aiming to reduce this figure to below 5 pJ/bit, with an ultimate goal of 1 pJ/bit. Under the existing pluggable architecture, this target is unattainable.
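The arithmetic behind these headline figures is worth making explicit. A minimal sketch, using only the numbers quoted above (module wattages and GPU counts are the article's representative figures, not measurements):

```python
def energy_per_bit_pj(power_w: float, rate_gbps: float) -> float:
    """Module power (W) divided by line rate (bit/s), expressed in pJ/bit."""
    # W / (Gb/s * 1e9) gives J/bit; multiply by 1e12 for pJ/bit.
    return power_w * 1e3 / rate_gbps

def cluster_interconnect_mw(num_gpus: int, optics_per_gpu: int,
                            watts_per_optic: float) -> float:
    """Total interconnect power for a GPU cluster, in megawatts."""
    return num_gpus * optics_per_gpu * watts_per_optic / 1e6

# An 800G module at 15 W works out to the ~18.75 pJ/bit cited above.
print(energy_per_bit_pj(15, 800))

# One million GPUs, six ~30 W transceivers each: the 180 MW figure.
print(cluster_interconnect_mw(1_000_000, 6, 30))
```

The same helper makes the 5 pJ/bit target concrete: at 1.6T, it implies a module budget of roughly 8 W, far below today's 23-25 W parts.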
The SerDes Bottleneck and the Energy Imperative
The root of the problem lies in the signal path from the switch Application-Specific Integrated Circuit (ASIC) to the front-panel optical module. As per-lane signal rates increase to 112 Gbps and even 224 Gbps, signals traveling over even a few tens of centimeters of copper traces on a printed circuit board (PCB) suffer from severe attenuation and distortion.
To compensate for these losses and ensure signal integrity, systems must incorporate high-power digital signal processors (DSPs) and retimers in the signal path. These compensation circuits are themselves significant power consumers; in some high-speed pluggable modules, the DSP can account for 25% to 30% of the total power consumption. This architecture has led to an economically irrational situation where the power for data I/O could soon exceed the compute power of the core switch chip itself. This is a clear signal that the industry needs not an incremental improvement of existing technology, but a complete architectural revolution.
From this analysis, a clear conclusion emerges: the transition to optical interconnects is no longer an option but an economic necessity driven by the immense energy demands of AI. The core driver of this transformation has shifted from simply pursuing longer reach to the urgent need for "power-efficient bandwidth." In the race to build AI factories, every watt saved in the interconnect system can be reallocated to a GPU for computation, directly impacting the data center's Total Cost of Ownership (TCO) and its ultimate computational output.
Silicon Photonics: Fabricating Light Paths on a Chip
To address the aforementioned interconnect bottleneck, the industry has turned its attention to a disruptive technology: Silicon Photonics (SiPh). The core idea of this technology is to integrate the functions of optical components onto standard silicon chips, thereby creating "Photonic Integrated Circuits" (PICs).
An Introduction to Silicon Photonics Technology
The silicon photonics platform utilizes silicon as an optical medium to guide infrared light, typically at the 1.31-micron and 1.55-micron wavelengths commonly used in telecommunications. Leveraging silicon's high refractive index (approximately 3.5), light can be confined and propagated within microscopic waveguides with cross-sectional dimensions of just a few hundred nanometers. These waveguides act as "wires" for photons.
A variety of key optical components can be integrated onto a single silicon photonics chip, including:
Waveguides: Paths that guide photons within the chip.
Modulators: Devices that encode electronic data signals onto a light beam, performing the electro-optic conversion.
Photodetectors: Devices that absorb photons and convert the information they carry back into an electronic signal, performing the opto-electric conversion.
By integrating these components on a single chip, silicon photonics technology can transmit information with smaller size, lower power consumption, and higher speed than traditional discrete optical components.
The Unparalleled Advantage: The CMOS Ecosystem
The most central and strategic advantage of silicon photonics lies in its compatibility with the existing Complementary Metal-Oxide-Semiconductor (CMOS) manufacturing infrastructure. This means that silicon photonics components can be mass-produced in mature semiconductor foundries around the world using standardized, high-yield processes.
This advantage has profound economic implications. It allows silicon photonics to directly leverage the hundreds of billions of dollars in R&D and equipment investment made by the semiconductor industry over decades, enabling manufacturing at a fraction of the cost of traditional optical materials like Indium Phosphide (InP) or Gallium Arsenide (GaAs). According to industry roadmaps, using mature 300mm wafer processes, the yield for silicon photonics chips can exceed 90%, providing a scalable path from prototype to mass production.
The Technology's Achilles' Heel: Light Source Integration and Manufacturing Hurdles
Despite its promising outlook, silicon photonics still faces inherent challenges that create barriers to its widespread adoption.
The primary challenge is the integration of the light source. Due to silicon's indirect bandgap, a fundamental physical property, it is not an efficient light-emitting material itself. Therefore, the lasers required to generate the optical signal must be made from III-V semiconductor materials (such as InP) and then integrated with the silicon photonics chip. This process, whether through external coupling or heterogeneous integration onto the chip, is extremely complex and has historically been a major source of cost.
A second challenge is the complexity of packaging. Precisely aligning a single-mode optical fiber to a silicon photonics waveguide with dimensions of only a few hundred nanometers is a formidable packaging task. This step, known as "fiber coupling," demands extremely high precision, and its cost can account for as much as 80% of the total cost of an entire optical transceiver module. Consequently, while the manufacturing cost of the silicon photonics chip itself can be low, the final product's cost and yield are largely determined by the efficiency of packaging, assembly, and testing.
From a strategic perspective, silicon photonics represents a strategic convergence of the semiconductor and optical industries. In this convergence, the core of the value chain is shifting from traditional discrete component assembly to wafer-level integrated manufacturing. This means silicon photonics transforms a complex precision assembly problem into a semiconductor manufacturing problem—an area where semiconductor giants like TSMC and Intel hold an overwhelming competitive advantage. Therefore, future investment focus will shift from traditional optical component manufacturers to the semiconductor giants and their ecosystems that can master advanced packaging and solve the laser integration problem at scale.
The CPO Revolution: Integrating Light and Logic
Building on the maturity of silicon photonics, a revolutionary system architecture has emerged: Co-Packaged Optics (CPO). CPO is not a simple refinement of existing technology but a fundamental reinvention of data center interconnects.
Deconstructing the CPO Architecture
The core architecture of CPO is clearly defined: it relocates the optical engine responsible for electro-optic conversion from the traditional front-panel pluggable module into the switch package itself, integrating it with the switch ASIC on the same substrate or within the same package.
This design leap shortens the electrical signal path from tens of centimeters on a PCB to just a few millimeters on the substrate. This extreme physical proximity is the foundation for all of CPO's advantages. It completely eliminates the high-frequency loss issues that plague traditional designs, thereby removing the need for power-hungry DSPs for signal compensation.
Comparative Architectural Analysis: CPO and Its Alternatives
To fully appreciate the revolutionary nature of CPO, it is necessary to compare it with existing and transitional technologies.
DSP-based Pluggable Optics: This is the current mainstream technology. Its greatest advantages are high field serviceability (hot-swappable) and a mature, open multi-vendor ecosystem. However, at speeds of 1.6T and above, it faces insurmountable power and density bottlenecks.
Linear Pluggable Optics (LPO): This is an evolutionary, transitional solution. LPO retains the pluggable form factor to maintain serviceability but removes the DSP chip from the module, shifting the burden of signal compensation to the switch ASIC. Compared to traditional DSP modules, LPO can save about 50% in power consumption, but its application is primarily limited to short-reach scenarios. LPO can be seen as a pragmatic, lower-risk compromise before a full transition to CPO.
Co-Packaged Optics (CPO): This is a revolutionary architectural change. CPO offers the lowest power and latency through deep integration of silicon and photonic chips. It is not a simple module replacement but requires a redesign of the entire switch system.
The following table provides a quantitative comparison of these technologies' key metrics at speeds of 1.6T and above, offering a clear decision-making framework for investors.
Table 1: Comparative Analysis of Interconnect Technologies at 1.6T+ Speeds
Metric | Passive Copper Cable | Pluggable Optics (DSP) | Linear Pluggable Optics (LPO) | Co-Packaged Optics (CPO)
--- | --- | --- | --- | ---
Power Consumption (pJ/bit) | Lowest (<1) | High (15-20) | Medium (~7-10) | Low (<5, target <1)
Latency | Lowest | High (DSP adds ns-level delay) | Medium (no DSP) | Very low (shortest electrical path)
Max Reach | <2 m | >10 km | <2 km | >100 m
Bandwidth Density | Low | Medium | Medium | Highest
Serviceability | High | Highest (hot-swappable) | Highest (hot-swappable) | Low (requires board replacement)
Ecosystem Maturity | Mature | Mature | Emerging | Nascent / pre-standardization
This table clearly reveals that CPO is not merely an incremental improvement but a fundamental architectural shift. It also explains the rationale for LPO's existence as a less invasive transitional technology. For investors, this reframes the decision from "which technology is best?" to "which technology is best suited for a specific application, timeline, and risk tolerance?"
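The "best fit for a given application" framing can be made mechanical. A toy selection helper encoding Table 1's rankings (the numeric values are illustrative stand-ins for the table's qualitative entries, not vendor specifications):

```python
# Toy decision framework over Table 1. Numbers are illustrative proxies
# for the table's qualitative rankings, not measured product data.
TECHS = {
    "copper":     {"pj_per_bit": 1,  "reach_m": 2,     "hot_swap": True},
    "dsp_optics": {"pj_per_bit": 18, "reach_m": 10000, "hot_swap": True},
    "lpo":        {"pj_per_bit": 8,  "reach_m": 2000,  "hot_swap": True},
    "cpo":        {"pj_per_bit": 4,  "reach_m": 100,   "hot_swap": False},
}

def candidates(reach_m: float, max_pj: float, need_hot_swap: bool = False):
    """Return technologies meeting a reach requirement under a power budget."""
    return [name for name, t in TECHS.items()
            if t["reach_m"] >= reach_m
            and t["pj_per_bit"] <= max_pj
            and (t["hot_swap"] or not need_hot_swap)]

# A 50 m scale-up fabric with a sub-5 pJ/bit budget leaves only CPO.
print(candidates(reach_m=50, max_pj=5))
# Relax the power budget and require only 500 m, and LPO qualifies.
print(candidates(reach_m=500, max_pj=10))
```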
Quantifiable Returns: Power, Latency, and Density
The benefits brought by CPO are concrete and disruptive, primarily in three areas:
Energy Efficiency: CPO solutions can achieve a system-level power reduction of 30% to 50% compared to pluggable modules. Interconnect power can be compressed to below 5 pJ/bit, and in some advanced designs, even below 1 pJ/bit. This is a qualitative leap from the 15-20 pJ/bit typical of today's pluggable modules. NVIDIA claims its CPO solution offers a 3.5x improvement in energy efficiency.
Latency Reduction: By eliminating DSP processing delay and drastically shortening the electrical path, CPO provides the lowest latency achievable with current technology. This is crucial for tightly coupled, collaborative AI training clusters, where any minor delay can lead to expensive GPUs sitting idle.
Bandwidth Density: CPO significantly increases the total bandwidth that can be brought out of a single chip package, supporting over 1 Tbps of bandwidth density per millimeter of chip edge. This enables the design of switches with a higher radix (number of ports), such as NVIDIA's planned CPO switch with 512 ports of 800G, which will greatly enhance network topology flexibility and scalability.
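The radix and efficiency claims above reduce to simple arithmetic. A quick sketch (the 20 pJ/bit pluggable baseline applied to NVIDIA's 3.5x figure is an assumed round number, not a published spec):

```python
# Aggregate bandwidth of the planned 512-port x 800G CPO switch.
ports, per_port_gbps = 512, 800
aggregate_tbps = ports * per_port_gbps / 1000
print(aggregate_tbps)  # ~409.6 Tb/s of switching capacity

# NVIDIA's claimed 3.5x efficiency gain, applied to an assumed
# 20 pJ/bit pluggable baseline (illustrative only).
baseline_pj_per_bit = 20.0
cpo_pj_per_bit = baseline_pj_per_bit / 3.5
print(round(cpo_pj_per_bit, 1))  # lands under the <5 pJ/bit target? close to it
```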
The Market Tipping Point: The CPO Adoption Path for 2025 and Beyond
Based on forecasts from multiple industry analysis firms, the tipping point for the CPO market is expected to arrive between the second half of 2025 and 2026, when the technology will transition from laboratory prototypes to commercial deployment.
Consolidated Market Forecasts
Timeline: The market consensus is that commercial deployment of CPO will begin in 2025. NVIDIA plans to launch its Quantum-X InfiniBand CPO switch in the second half of 2025, followed by the Spectrum-X Ethernet CPO switch in 2026.
Market Size: Market research firm LightCounting predicts that the "AI Cluster Optics" market will grow from $5 billion in 2024 to over $10 billion in 2026, with LPO and CPO being the main growth drivers after 2026-2027. Yole Group forecasts the CPO market to reach $2.6 billion by 2033, with a compound annual growth rate (CAGR) of 46%. Other firms project a CAGR for the market between 15% and 37%.
Penetration Rate: By 2027, CPO ports are expected to account for nearly 30% of the total 800G and 1.6T port shipments, indicating a rapid rise to become a significant part of the high-speed interconnect market.
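The cited forecasts can be cross-checked with the standard CAGR formula; the inputs below are the figures quoted above, and the implied growth rate is a derived estimate, not an analyst quote:

```python
# Compound annual growth rate: (end / start)^(1/years) - 1.
def cagr(start: float, end: float, years: int) -> float:
    return (end / start) ** (1 / years) - 1

# LightCounting's $5B (2024) -> $10B+ (2026) implies at least ~41%/yr.
print(f"{cagr(5e9, 10e9, 2):.0%}")
```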
The Killer Application: AI Scale-Up and Scale-Out
The unique demands of AI infrastructure are the primary catalyst driving the adoption of CPO technology.
Characteristics of AI Workloads: Training and inference for large-scale AI models require massive amounts of "east-west" data exchange within GPU clusters. This communication has extremely stringent requirements for high bandwidth and ultra-low latency to prevent GPUs from idling while waiting for data, thereby maximizing computational efficiency.
Scale-Up Networks: CPO is crucial for the "scale-up" networks between AI accelerators. It enables the creation of tightly coupled compute fabrics, allowing hundreds or even thousands of GPUs to work in concert as a single, massive super-GPU. In this application, CPO will first replace traditional copper interconnects.
Scale-Out Networks: In the long term, CPO will also penetrate the "scale-out" networks that connect switches. In this domain, it will begin to cannibalize the market share of traditional pluggable optical modules, although this will be a more gradual process.
Overcoming Headwinds: Thermal Management, Serviceability, and Standardization
Despite the bright outlook, the path to widespread CPO adoption faces three major challenges that could slow its pace.
Thermal Management: Packaging an extremely high-power ASIC (often exceeding 500 watts) with temperature-sensitive optical components creates a formidable thermal challenge. High temperatures not only affect the ASIC's performance but can also cause wavelength drift and reliability degradation in optical components, especially lasers. Therefore, advanced liquid cooling technology is no longer an option but a necessity for CPO systems. Thermal management studies for 51.2 Tbit/s CPO systems have already validated that well-designed liquid cooling solutions are key to ensuring stable system operation.
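The wavelength-drift problem can be estimated to first order from silicon's thermo-optic effect. A sketch using typical literature coefficients (dn/dT and the group index below are generic textbook values, not data for any specific device):

```python
# First-order resonance drift of a silicon ring/filter with temperature:
# d(lambda)/dT ~ lambda * (dn/dT) / n_g. Coefficients are typical
# literature values for silicon strip waveguides, assumed here.
def ring_drift_pm_per_k(wavelength_nm: float,
                        dn_dt: float = 1.8e-4,   # silicon thermo-optic coeff, 1/K
                        group_index: float = 4.2) -> float:
    return wavelength_nm * 1000 * dn_dt / group_index  # in pm/K

drift = ring_drift_pm_per_k(1310)
print(f"~{drift:.0f} pm/K")  # tens of pm per kelvin: a 10 K swing moves a
                             # resonance by hundreds of pm, hence the need
                             # for tight thermal control or active tuning
```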
Serviceability and Reliability: This is CPO's biggest operational drawback. When a traditional pluggable module fails, it can be hot-swapped in minutes. However, CPO's optical engine is deeply integrated with the ASIC. A failure could require replacing an entire line card or even the whole switch, leading to longer downtime and higher operational risk. Therefore, incorporating redundancy into the design is crucial for improving overall system reliability.
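The value of designed-in redundancy can be quantified with a simple binomial survival model. A sketch, where the per-engine annual survival probability is an assumed placeholder, not field data:

```python
from math import comb

def card_survival(p: float, engines: int, spares: int = 0) -> float:
    """P(at least `engines` of `engines + spares` optical engines survive
    the year), assuming independent failures with survival probability p."""
    n = engines + spares
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(engines, n + 1))

p = 0.99  # assumed annual survival per engine (placeholder)
print(card_survival(p, engines=8))            # no spare: any failure kills the card
print(card_survival(p, engines=8, spares=1))  # one spare engine designed in
```

Even a single spare engine moves card-level survival from roughly 92% to over 99% under these assumptions, which is why redundant engines and remote lasers feature in most CPO designs.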
Ecosystem Maturity and Vendor Lock-in: The CPO ecosystem is still in its infancy. Early deployments will be proprietary solutions, which introduces the risk of vendor lock-in—something hyperscale data center operators actively avoid. The long-term success of CPO depends on establishing a healthy, competitive, multi-vendor ecosystem supported by strong industry standards.
Analyzing market dynamics and challenges, CPO adoption is likely to follow a two-phase model. Phase One (2025-2027) will be dominated by vertically integrated giants like NVIDIA, who will apply CPO in their proprietary, high-margin AI scale-up systems. In these systems, extreme performance and power efficiency are the primary goals, and the "single-vendor" issue is an inherent part of the system.
Phase Two (post-2027), as standards mature, field reliability is proven, and a multi-vendor supply chain emerges, CPO will begin to be adopted in the broader general-purpose data center switch market. For investors, this means the market will be bifurcated in the medium term. Attention should be paid to companies with competitive advantages in different phases: NVIDIA and Broadcom have a head start in the early proprietary market, while companies that actively embrace open standards may win out in the later, broader market.
The Emerging Supply Chain: A New Ecosystem Forged by Integration
The rise of CPO is reshaping the entire optical communication and semiconductor supply chain, giving birth to a new ecosystem driven by deep integration. In this ecosystem, traditional boundaries are blurring, and success depends on cross-disciplinary collaboration.
The Key Player Landscape
Several tech giants are actively positioning themselves in CPO technology, but their strategies vary.
System and ASIC Designers:
NVIDIA: As the leader in AI computing, NVIDIA is aggressively pushing CPO adoption in its AI platforms (like Quantum-X and Spectrum-X). NVIDIA not only designs the switch ASICs but also delves deep into core silicon photonics R&D, such as its innovative Micro Ring Modulator, and works closely with TSMC to form a powerful vertically integrated capability.
Broadcom: As the giant in the merchant switch silicon market, Broadcom is a CPO pioneer. Its Tomahawk switch series has already launched its second-generation CPO product (Bailly) and announced a third-generation platform. Leveraging its deep expertise in SerDes, DSP, and switch ASICs, Broadcom is collaborating with TSMC to provide open CPO platform solutions.
Marvell: Marvell's strategy is to be an enabler for hyperscale customers developing their own custom silicon. It is developing a 3D silicon photonics engine and CPO architecture aimed at integrating optical I/O into customers' custom AI accelerators (XPUs), offering a more flexible solution.
Intel: Intel was an early leader in silicon photonics, but its strategy appears to have shifted from developing CPO switch products to focusing on providing optical I/O chiplets and leveraging its advanced foundry and packaging capabilities. Intel aims to be a key technology provider within the CPO ecosystem.
The Foundry Battlefield: Advanced Packaging is King
The realization of CPO technology is inextricably linked to cutting-edge semiconductor packaging. Efficiently integrating electronic integrated circuits (EICs) and photonic integrated circuits (PICs) within the same package is the core challenge of CPO, making advanced packaging the focal point of competition.
Heterogeneous Integration: CPO is essentially a heterogeneous integration technology, combining chips with different functions and from different manufacturing processes into a single package. This makes 2.5D and 3D packaging technologies critical for implementing CPO.
TSMC: With its leading CoWoS (Chip-on-Wafer-on-Substrate) platform, TSMC has become the key manufacturing partner for leaders like NVIDIA and Broadcom. For TSMC, CPO is not just a product but a crucial part of its advanced manufacturing flow and an extension of its technological leadership.
Intel: Intel is leveraging its unique Foveros (3D stacking) and EMIB (Embedded Multi-die Interconnect Bridge) technologies to compete directly with TSMC. Intel Foundry is actively showcasing its packaging capabilities to the industry, even offering to port designs originally made for CoWoS directly to its platform, in an attempt to alleviate the industry-wide advanced packaging capacity bottleneck.
OSATs (Outsourced Assembly and Test): Specialized packaging and testing firms like Amkor and Sanmina are partnering with fabless design companies like Marvell to offer an alternative CPO manufacturing path to TSMC and Intel, helping to build a more diversified supply chain.
Building Consensus: The Critical Role of Standardization
To avoid market fragmentation and proprietary technology barriers, the establishment of industry standards is crucial.
OIF (Optical Internetworking Forum): The OIF is the core organization driving CPO standardization. Its publication of the "3.2T Co-Packaged Module Implementation Agreement" is a landmark achievement. This agreement defines the mechanical dimensions, optical interfaces, electrical interfaces (like CEI-112G-XSR), and management interfaces (CMIS) for interoperable modules, laying the foundation for a multi-vendor ecosystem.
UCIe (Universal Chiplet Interconnect Express): UCIe is an open die-to-die interconnect standard. Its proliferation is essential for creating a "plug-and-play" chiplet ecosystem, allowing optical I/O chiplets from companies like Ayar Labs to seamlessly connect with ASICs from various suppliers.
A deep analysis of the supply chain structure reveals that the competition in the CPO market is, in effect, a proxy war between different foundry ecosystems. The outcome of this race depends not only on the chip design capabilities of NVIDIA or Broadcom but, on a deeper level, on the clash between the vertical integration capabilities of TSMC and its partners versus the ecosystem of Intel Foundry and its partners. In the CPO era, the key to success is no longer having the best single component, but mastering the complex capability to co-produce an entire electro-optical system from design to manufacturing across multiple disciplines with high yield. Therefore, investors evaluating the CPO market must look to the packaging technology roadmaps, capacity plans, and ecosystem strategies of the foundries, as they are the ultimate enablers—or limiters—of the CPO equipment suppliers.
The Next Frontier: CPO as a Catalyst for Disaggregated Architectures
The impact of CPO extends far beyond improving switch performance. In the long run, it and its underlying optical I/O technology will become the foundational technology that catalyzes the next revolution in data center architecture.
Reinventing the Data Center: Disaggregated and Composable Architectures
Optical I/O technology fundamentally removes the limitations of distance, bandwidth, and power inherent in traditional electrical interconnects. This allows system architects to physically separate resources that were once tightly coupled within a single server—such as compute (CPU/GPU), memory, and storage—into what is known as "Disaggregated Architectures".
Resource Pooling: In a disaggregated architecture, the data center is no longer composed of individual servers but of dynamically combinable resource pools. For example, memory can be pooled and dynamically allocated to different compute units via a high-speed optical network based on workload demands. This completely solves the "memory wall" problem of traditional architectures, where powerful GPUs are limited by the capacity and bandwidth of their local memory.
Composable Infrastructure: This architectural shift makes it possible to build "Composable Infrastructure." Operators can "assemble" optimized virtual servers in real-time, defined by software, according to the specific needs of an application. This greatly enhances resource utilization, flexibility, and scalability. It represents a paradigm shift from a server-centric architecture to a more efficient, fabric-based architecture at the data center scale.
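The pooling idea above can be sketched as a toy allocator: virtual servers are carved out of shared pools rather than bound to fixed boxes. Resource names and sizes are purely illustrative:

```python
# Toy model of composable infrastructure: software-defined servers
# allocated from disaggregated resource pools (illustrative only).
pools = {"gpu": 16, "cpu_cores": 256, "mem_gb": 4096}

def compose(request: dict):
    """Allocate a virtual server from the pools, or None if insufficient."""
    if any(pools[r] < amount for r, amount in request.items()):
        return None
    for r, amount in request.items():
        pools[r] -= amount
    return dict(request)

server = compose({"gpu": 8, "cpu_cores": 64, "mem_gb": 1024})
print(server)  # the composed server's resource shape
print(pools)   # remaining capacity returned to the shared pools
```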
The Roadmap Beyond 3.2T: Towards On-Chip Photonics
As technology evolves, CPO itself will continue to advance.
The Drive for Higher Speeds: As single-lane speeds increase from 100G to 200G and even 400G, the physical challenges that initially drove CPO development will become even more severe. This will make CPO not just a preferred option but the only viable technology path for systems at 400G/lane (i.e., 3.2T and beyond).
The Ultimate Goal: On-Chip Photonics: The ultimate evolutionary direction for CPO is the monolithic or 3D integration of photonic components with logic circuits, completely eliminating package-level interconnects to achieve true "On-Chip Photonics." TSMC's developing COUPE (Compact Universal Photonic Engine) technology, which utilizes its SoIC-X 3D stacking technology, is a significant step in this direction.
New Physical Challenges: However, achieving such high integration density will also present new fundamental physical challenges. For example, extremely high optical power density could lead to nonlinear losses within the waveguides themselves. At the same time, the cost, power, and thermal issues associated with large-scale laser integration will become even more acute.
Strategic Significance in the Post-Moore's Law Era
In an era where traditional transistor scaling (i.e., Moore's Law) is slowing down, the strategic importance of silicon photonics and CPO is particularly pronounced.
When performance gains from shrinking transistor sizes become increasingly difficult, future performance growth will rely more on architectural innovation and improvements in data movement efficiency. Silicon photonics provides an effective path to break the I/O bottleneck, which is increasingly becoming the primary factor limiting overall system performance.
Therefore, CPO should not be viewed merely as a networking component but as a fundamental enabling technology for next-generation computing architectures. Its true long-term value lies in its potential to unlock system-level innovations like resource disaggregation. This elevates the investment thesis from a simple component upgrade cycle to a bet on the fundamental reset of data center infrastructure. For a tech industry seeking continued performance scaling in the post-Moore's Law era, silicon photonics and CPO represent one of the most important strategic technology platforms for the next decade.




