At Cloud Next 2025, Google launched Ironwood, its most capable, robust, and energy-efficient TPU (Tensor Processing Unit) to date. Its advanced design is built to power thinking, inferential AI models at scale. Ironwood is the seventh-generation TPU, and Google describes it as its most scalable and highest-performing custom AI accelerator yet, as well as the first TPU designed specifically for AI inference. For more than 10 years, TPUs have served Google's demanding AI training and serving workloads.


Ironwood showcases a major transformation in the development of artificial intelligence technology and in the infrastructure that powers its progress. It represents a shift from responsive AI models, which deliver real-time information for people to interpret, to proactive models that actively understand data and generate insights. This is commonly referred to as the age of inference, where AI models will not only collect and generate data but also proactively deliver answers and interpretations.


Understanding Ironwood: A Powerful TPU to Cater to Massive Computational Demands


Ironwood marks a significant AI advancement that sets the stage for the next phase of generative AI. Its sophisticated architecture supports both communication and computational requirements: it can scale up to 9,216 liquid-cooled chips linked by Inter-Chip Interconnect (ICI) networking spanning nearly 10 MW. It optimizes hardware and software together to meet the demands of the most advanced AI workloads.


With Ironwood, developers can also leverage Google's Pathways software stack to reliably and seamlessly harness the combined computing power of thousands of Ironwood TPUs.
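

As a rough illustration of this programming model, the minimal JAX sketch below replicates a computation across whatever TPU chips are attached to a host and combines the results with an all-reduce. The shapes and numbers are arbitrary, and pod-scale orchestration via Pathways is abstracted away here.

```python
# Minimal JAX sketch: one program replicated across all visible chips,
# with partial results combined by an all-reduce. Shapes are arbitrary.
import jax
import jax.numpy as jnp

n = jax.local_device_count()  # TPU chips visible to this host

def global_sum(x):
    # Each chip reduces its own shard, then all chips all-reduce the partials.
    return jax.lax.psum(jnp.sum(x), axis_name="chips")

psum_fn = jax.pmap(global_sum, axis_name="chips")

x = jnp.arange(n * 4, dtype=jnp.float32).reshape(n, 4)  # one shard per chip
print(psum_fn(x))  # every chip reports the same global sum
```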


Key Features of Ironwood


Currently, Google Cloud is the only hyperscaler with more than a decade of experience supporting advanced research with AI computational power. That compute has also been smoothly integrated into services that serve billions of users across Google Search, Gmail, and more. This decade of expertise forms the foundation of Ironwood's capabilities. Top features of Ironwood include:


Substantial performance improvement with a focus on overall power efficiency:


This empowers AI models to run more cost-effectively. Ironwood's performance per watt is twice that of Trillium, the sixth-generation TPU announced last year. At a time when available power is one of the main constraints on delivering advanced AI capabilities, Google is committed to delivering more capacity per watt for customer workloads.


The sophisticated liquid cooling solution and streamlined chip design can dependably sustain up to twice the performance of standard air cooling, even under massive, sustained AI workloads. To put things in perspective, Ironwood is nearly 30 times more power-efficient than the first Cloud TPU from 2018.
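

Combining figures quoted elsewhere in this article (4,614 TFLOPs peak per chip, roughly 10 MW for a 9,216-chip pod) gives a back-of-envelope feel for this efficiency. Per-chip power draw is not published here, so treat this as a rough bound rather than a measured spec:

```python
# Back-of-envelope perf/watt estimate using only this article's figures.
peak_tflops_per_chip = 4614        # peak compute per Ironwood chip
pod_power_watts = 10e6             # ~10 MW ICI-connected pod
chips_per_pod = 9216

watts_per_chip = pod_power_watts / chips_per_pod   # ~1,085 W (all-in, incl. overhead)
tflops_per_watt = peak_tflops_per_chip / watts_per_chip
print(f"~{watts_per_chip:.0f} W/chip, ~{tflops_per_watt:.1f} TFLOPs/W peak")
```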


Considerable Increase in High Bandwidth Memory (HBM) Capacity:


Ironwood provides 192 GB of HBM per chip, around six times more than Trillium. This enables larger models and datasets to be processed entirely in memory, reducing the need for continuous data transfers and substantially enhancing performance.
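

To see why 192 GB per chip matters, a simple sizing sketch shows how quickly weights alone consume HBM. The model size and precision below are hypothetical assumptions, not figures from Google:

```python
# How many chips' worth of HBM does it take just to hold a model's weights?
def chips_for_weights(params_billions, bytes_per_param=2, hbm_gb=192):
    weight_gb = params_billions * bytes_per_param  # 1e9 params * bytes / 1e9
    return weight_gb / hbm_gb

# e.g., a hypothetical 500B-parameter model stored in bfloat16:
print(f"{chips_for_weights(500):.1f} chips of HBM for weights alone")
# ~5.2 chips -- before KV caches, activations, and optimizer state.
```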


Significantly Enhanced HBM Bandwidth (7.2 TB/s per chip):


Ironwood delivers around 4.5 times the HBM bandwidth of Trillium, ensuring rapid access to data. This makes it especially valuable for the memory-intensive workloads common in modern AI.
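

For intuition on why this bandwidth is decisive for inference: in memory-bound decoding, every generated token must stream the resident weights from HBM at least once, so bandwidth caps tokens per second. The per-chip weight shard below is a hypothetical assumption:

```python
# Roofline-style lower bound on decode latency from HBM bandwidth alone.
hbm_bandwidth_tbps = 7.2   # TB/s per chip, as quoted above
weights_gb = 100           # hypothetical per-chip weight shard

time_per_token_ms = weights_gb / (hbm_bandwidth_tbps * 1000) * 1000
print(f"Lower bound: {time_per_token_ms:.2f} ms/token "
      f"(~{1000 / time_per_token_ms:.0f} tokens/s per chip)")
```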


Improved ICI Bandwidth:


Ironwood increases ICI bandwidth to 1.2 TB/s (bidirectional), enabling quicker communication between chips. This facilitates efficient distributed training and inference at scale.
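

As a rough illustration, a ring all-reduce estimate shows how ICI bandwidth translates into synchronization time during distributed training. The gradient size is an assumption, and link latency and topology are ignored:

```python
# Rough cost of a ring all-reduce of gradient data over ICI links.
def allreduce_seconds(tensor_gb, n_chips, link_tbps=1.2):
    # A ring all-reduce moves ~2*(n-1)/n of the tensor over each link.
    traffic_gb = 2 * (n_chips - 1) / n_chips * tensor_gb
    return traffic_gb / (link_tbps * 1000)

# e.g., all-reducing 10 GB of gradients across a 256-chip configuration:
print(f"~{allreduce_seconds(10, 256) * 1000:.1f} ms per all-reduce")
```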


Importance of Ironwood: Empowering the Age of Inference


The cutting-edge design of Ironwood lets it seamlessly handle the communication and computation requirements of "thinking models," including mixture-of-experts (MoE) models, large language models (LLMs), and sophisticated reasoning tasks. Running such heavyweight models requires massive parallel processing and efficient memory access. Ironwood is built to minimize latency and data movement on chip while performing massive tensor manipulations, and Google designed its low-latency, high-bandwidth ICI network to support coordinated, synchronous communication at full TPU pod scale.
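

For readers unfamiliar with MoE, the toy JAX router below shows the basic mechanism: a gate picks a few experts per token, which is why these models demand both large resident memory (all expert weights) and a fast interconnect (token dispatch across chips). This is a generic illustration, not Google's implementation:

```python
# Toy top-2 mixture-of-experts layer; shapes and expert count are arbitrary.
import jax
import jax.numpy as jnp

def moe_layer(x, w_gate, experts):
    # x: (tokens, d); w_gate: (d, n_experts); experts: (n_experts, d, d)
    logits = x @ w_gate
    weights, idx = jax.lax.top_k(logits, k=2)        # choose 2 experts/token
    weights = jax.nn.softmax(weights, axis=-1)
    # Dense evaluation of all experts for clarity; real systems dispatch
    # each token only to its chosen experts, often on other chips.
    all_out = jnp.stack([x @ w for w in experts])    # (n_experts, tokens, d)
    picked = all_out[idx.T, jnp.arange(x.shape[0])]  # (2, tokens, d)
    return jnp.einsum("kt,ktd->td", weights.T, picked)

key = jax.random.PRNGKey(0)
T, D, E = 16, 64, 8
y = moe_layer(jax.random.normal(key, (T, D)),
              jax.random.normal(key, (D, E)),
              jax.random.normal(key, (E, D, D)))     # y: (16, 64)
```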


For Google Cloud customers, Ironwood comes in two configurations, sized to the demands of the AI workload:


  • A 9,216-chip configuration.
  • A 256-chip configuration.

When Ironwood scales to 9,216 chips per pod, for a total of 42.5 exaflops, it supports approximately 24 times the computing power of El Capitan, one of the world's largest supercomputers, which offers 1.7 exaflops per pod. This substantial processing power lets Ironwood handle the most complex AI workloads, such as MoE and LLM models with enormous capacity for training and inference. Each individual chip has a peak compute of 4,614 TFLOPs, which marks an unprecedented leap in AI evolution. Ironwood's network architecture and memory capability ensure that the right data is always available to sustain peak performance at this massive scale.
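

These headline numbers are easy to sanity-check from the per-chip figure:

```python
# Sanity check of the pod-scale figures quoted above.
chips = 9216
peak_tflops_per_chip = 4614
pod_exaflops = chips * peak_tflops_per_chip / 1e6  # TFLOPs -> exaflops
print(f"{pod_exaflops:.1f} exaflops per pod")      # ~42.5
# Against El Capitan's quoted 1.7 exaflops; rounding gives the ~24-25x above:
print(f"{pod_exaflops / 1.7:.0f}x")
```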


Ironwood also contains an improved SparseCore, a specialized accelerator for processing the ultra-large embeddings common in advanced ranking and recommendation workloads. Expanded SparseCore support in Ironwood accelerates a broader range of workloads, moving beyond the traditional AI domain into scientific and financial computing.
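

In spirit, the workload SparseCore accelerates looks like the toy gather below: fetching a handful of rows from a very large embedding table and pooling them. This uses ordinary JAX operations for illustration; SparseCore itself is driven through Google's frameworks rather than this API:

```python
# Sparse embedding lookup: gather a few rows from a large table and pool them.
import jax.numpy as jnp

vocab, dim = 100_000, 128                     # toy table; production ranking
table = jnp.zeros((vocab, dim), jnp.float32)  # tables can hold billions of rows
ids = jnp.array([17, 93, 42_001])             # sparse feature IDs, one example
vectors = jnp.take(table, ids, axis=0)        # (3, 128) gathered rows
pooled = vectors.mean(axis=0)                 # mean-pool into one dense feature
```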


Pathways, the ML runtime developed by Google DeepMind, enables efficient distributed computing across numerous TPU chips. Pathways on Google Cloud makes it easy to scale beyond a single Ironwood pod, allowing thousands of chips to be composed together to rapidly push the limits of generative AI computation.
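

At a smaller scale, JAX's device-mesh sharding gives a feel for this style of scaling: the sketch below shards a batch across whatever chips are attached, while pod-scale-and-beyond orchestration is what Pathways itself handles:

```python
# Sketch: shard a batch across attached devices with a 1-D mesh. The
# compiler inserts any cross-chip communication the computation needs.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))        # split leading axis over chips

x = jnp.ones((8 * len(jax.devices()), 1024))
x = jax.device_put(x, sharding)                  # distribute the batch

@jax.jit
def step(x):
    return jnp.tanh(x) @ jnp.ones((1024, 1024))  # runs sharded, unchanged code

y = step(x)
```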


Ironwood: A Futuristic Technology to Meet Diverse AI Demands


Ironwood is a marquee breakthrough in the domain of AI inference, with enhanced memory capacity, computation power, reliability, and ICI networking advancements. These breakthroughs, combined with a twofold improvement in power efficiency, offer an effective answer to rising computational demand, helping users run complex training and serving workloads with low latency and high performance. Prominent AI models like Gemini 2.5 and AlphaFold already run on Tensor Processing Units, and experts are eager to see what Ironwood will bring to the world of AI.


To learn more about this TPU chip, explore the official Google Cloud website.