Semi Doped

Vikram Sekar and Austin Lyons

25 episodes

  • Masterclass on Google's TPU v8 Networking

    04/24/2026 | 46 mins.
    Google's Cloud Next 2026 keynote? Fire. 🔥
    The TPU is now two chips instead of one (8t for training, 8i for inference), and, more interestingly, two scale-up networking topologies too.
    Austin Lyons (Chipstrat) and Vik Sekar (Vik's Newsletter) walk through what actually changed, one day after the announcement. OCS? Yes. AECs? Yep. Copper? Yep. Optics? Yep.
    We cover Virgo (Google's 47 petabit/second scale-out fabric, built entirely on OCS), Boardfly (the new scale-up topology for MoE inference that cuts hop count from 16 to 7), and the 3D torus Google still uses for training.
    Why is optical circuit switching the substrate of Google's data center? Why do active electrical cables still carry scale-up traffic inside racks? Why did Google split the CPU layer too, with custom ARM Axion head nodes to keep the TPUs fed?
    Along the way we trace the Dragonfly topology lineage to a 2008 paper by John Kim, Bill Dally, Steve Scott, and Dennis Abts. Abts went on to build Groq's rack-scale interconnect before landing at Nvidia.
    Chapters:
     0:00 Intro
     0:21 Two TPUs for two workloads
     2:31 HBM, SRAM, and Axion CPUs
     7:22 Why networking is the new bottleneck
     17:14 Virgo: rebuilding scale-out on optics
     25:24 3D torus Rubik's Cube scale-up for training
     34:50 Boardfly: scale-up for MoE inference
     42:07 Workload-specific everything
    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/austinsemis
    Follow Vik:
    Newsletter: https://www.viksnewsletter.com/
    X: https://x.com/vikramskr
  • Meta VP Matt Steiner on Ads Infra, GPUs, MTIA, and LLM-Written Kernels

    04/20/2026 | 39 mins.
    Matt Steiner, VP of Monetization Infrastructure, Ranking & AI Foundations at Meta, walks through how Meta's ad system actually works, and why the infrastructure behind it differs from what you'd build for LLMs.

    We cover Andromeda (retrieval on a custom NVIDIA Grace Hopper SKU Meta co-designed), Lattice (consolidating N ranking models into one), GEM (Meta's Generative Ads Recommendation foundation model), and the adaptive ranking model, a roughly one-trillion-parameter recommender served at sub-second latency.

    We get into why recommender workloads aren't embarrassingly parallel like LLMs (the "personalization blob"), what that means for Meta's MTIA custom silicon roadmap, and how LLM-written kernels (KernelEvolve) flipped the economics of running a heterogeneous hardware fleet. Demand for software engineering has actually gone up as the price has come down. Meta now wants ~100x more optimized kernels per chip.
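
    For intuition on the "embarrassingly parallel" contrast, here is a minimal Python sketch of the general recommender-serving pattern (large sparse embedding gathers per user) next to a dense LLM-style layer. It is an illustration of the pattern discussed in the episode, not Meta's actual stack; the table size and feature counts are invented:

    import numpy as np

    # Dense LLM-style layer: the same matmul for every request.
    # Compute-bound, so it batches and shards across accelerators cleanly.
    W = np.random.randn(1024, 1024).astype(np.float32)
    def llm_layer(x: np.ndarray) -> np.ndarray:
        return x @ W

    # Recommender-style ranking: gather user-specific rows from a huge
    # embedding table (the "personalization blob"), then do a little math.
    # Real tables are terabytes; 1M rows keeps the sketch runnable.
    # The random, per-user reads make this memory/I-O bound, not compute-bound.
    table = np.random.randn(1_000_000, 64).astype(np.float32)
    def rank_for_user(feature_ids: np.ndarray) -> np.ndarray:
        return table[feature_ids].mean(axis=0)  # sparse gather dominates

    _ = llm_layer(np.random.randn(8, 1024).astype(np.float32))
    _ = rank_for_user(np.random.randint(0, 1_000_000, size=500))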

    Read the full transcript at https://www.chipstrat.com/p/an-interview-with-meta-vp-matt-steiner

    Chapters:
    0:00 Intro and scale
    0:39 How Meta's ad system works
    2:00 Meta Andromeda and the custom NVIDIA SKU
    3:30 Lattice: consolidating ranking models
    5:00 GEM, Meta's ads foundation model
    6:30 Adaptive ranking for power users
    8:17 The scale: 3B DAUs at sub-second latency
    9:40 Why longer interaction histories matter
    10:45 The anniversary gift analogy
    12:57 A decade of compute evolution
    15:21 Meta's infra as a CP-SAT problem
    16:07 Co-designing Grace Hopper with NVIDIA
    17:47 Matching compute shape to workload
    18:26 Influencing hardware and software roadmaps
    20:23 MTIA: why ads aren't LLMs
    22:07 The personalization blob and I/O ratios
    26:38 One trillion parameters at sub-second latency
    28:26 Heterogeneous hardware trade-offs
    29:30 KernelEvolve: LLMs writing custom kernels
    33:30 GenAI and recommender systems cross-pollination
    35:21 The 2-year infrastructure outlook
    37:00 Why demand for software engineering is rising
    38:53 How Matt stays on top of it all

    Relevant reading:
    KernelEvolve (Meta Engineering): https://engineering.fb.com/2026/04/02/developer-tools/kernelevolve-how-metas-ranking-engineer-agent-optimizes-ai-infrastructure/

    Follow Chipstrat:
    Newsletter: https://www.chipstrat.com
    X: https://x.com/chipstrat
  • Credo + Dust Photonics, XPO, Nuvacore

    04/17/2026 | 38 mins.
    Austin and Vik discuss Credo's acquisition of Dust Photonics, XPO as the new standard for scale-out (maybe instead of CPO?), and Nuvacore's entry into the CPU scene for agentic AI.

    Gavin Baker's tweet: https://x.com/GavinSBaker/status/2044410644301046031?s=20

    Vik's Substack: https://www.viksnewsletter.com
    Austin's Substack: https://www.chipstrat.com

    Chapters:

    00:00 Introduction to the Semiconductor Landscape
    02:49 The Rise of Nuvacore and CPU Innovations
    05:27 The Demand for CPUs in the AI Era
    07:59 Photonics: The Next Frontier in Semiconductors
    10:26 Credo's Acquisition of Dust Photonics
    13:12 Vertical Integration in Semiconductor Companies
    15:15 The Future of Copper and Optical Technologies
    20:28 The Evolution of AI Training Models
    25:28 Innovations in Optical Interconnects
    31:10 The Future of Data Center Connectivity
    36:56 Strategic Implications in the Optical Ecosystem
  • Is Intel Finally Back With a $300B Market Cap? Can OpenClaw Dream?

    04/10/2026 | 34 mins.
    In this episode, Austin and Vik discuss whether Intel is finally back, with CPU partnerships with Google and heterogeneous inference with SambaNova, as its market cap soars above $300B. Vik also tries to get his OpenClaw instance to dream every night.

    Chapters:

    00:00 Anthropic's New Direction: Chip Development
    02:30 Navigating Subscription Changes and Token Costs
    05:25 Exploring Alternative AI Models
    08:10 The Economics of AI: Rent vs. Buy
    10:56 Intel's Resurgence and Market Dynamics
    15:23 Intel's Strategic Partnerships and Market Positioning
    19:37 The Role of IPUs in Modern Computing
    25:08 Coexistence of x86 and ARM Architectures
    29:55 Innovations in Chip Architecture and Future Prospects
  • Reiner Pope (MatX): Designing AI Chips From First Principles for LLMs

    04/09/2026 | 38 mins.
    Reiner Pope is the co-founder and CEO of MatX, the startup building chips designed from first principles for LLMs. Before MatX, Reiner was on the Google Brain team training LLMs, and his co-founder Mike Gunter was on the TPU team. They left Google one week before ChatGPT was released.
    A counterintuitive throughput insight from the conversation:
    “Low latency means small batch sizes. That is just Little’s law. Memory occupancy in HBM is proportional to batch size. So you can actually fit longer contexts than you could if the latency were larger. Low latency is not just a usability win, it improves throughput.”
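    A quick back-of-the-envelope version of that argument, in Python; the throughput, latency, HBM budget, and KV-cache-per-token numbers are illustrative assumptions, not MatX specs:
    # Little's law: requests in flight (batch) = throughput * latency.
    # KV-cache occupancy in HBM is proportional to batch size, so at a
    # fixed throughput, lower latency frees HBM for longer contexts.
    HBM_FOR_KV_GB = 96       # assumed HBM budget reserved for KV cache
    KV_KB_PER_TOKEN = 160    # assumed KV-cache footprint per token
    def max_context_tokens(throughput_rps: float, latency_s: float) -> float:
        batch = throughput_rps * latency_s  # Little's law
        return HBM_FOR_KV_GB * 1e6 / (batch * KV_KB_PER_TOKEN)
    print(max_context_tokens(50, 2.0))   # batch 100 -> ~6,000 tokens/request
    print(max_context_tokens(50, 0.5))   # batch 25 -> ~24,000 tokens/request
    Same delivered throughput at a quarter of the latency leaves room for four times the context per request.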
    We get into:
    • The hybrid SRAM + HBM bet, and why pipeline parallelism finally works
    • Overcoming the CUDA moat
    • Why frontier labs are willing to bet on an AI ASIC startup
    • Memory-bandwidth-efficient attention, numerics, and what MatX publishes (and what it does not)
    • Why 95% of model-side news is noise for chip design
    • Why sparse MoE drives MatX to “the most interconnect of any announced product”
    • How MatX uses AI for its own chip design
    • The biggest challenges ahead
    Chapters:
    00:00 “We left Google one week before ChatGPT”
    00:24 Intro: who is MatX
    01:17 Origin story: leaving Google for LLM chips
    02:21 GPT-3 and the “too expensive” problem
    04:25 Why buy hardware that is not a GPU
    05:52 Overcoming the CUDA moat
    08:46 Early investors
    09:35 The name MatX
    09:59 The chip: matrix multiply + hybrid SRAM/HBM
    12:11 Why pipeline parallelism finally works
    14:22 Reading papers and Google going dark
    15:20 Research agenda: attention and numerics
    17:06 Five specs and meeting customers where they are
    19:24 Why frontier labs are the natural first customer
    20:32 Workloads: training, prefill, decode
    22:18 Little’s law and the throughput case for low latency
    24:29 Interconnect and MoE topology
    26:35 Inside the team: 100 people, full stack
    28:32 Agentic AI: 95% noise for hardware
    30:35 KV cache sizing in an agentic world
    32:11 How MatX uses AI for chip design (Verilog + BlueSpec)
    34:23 Go to market: proving credibility under NDA
    35:12 Porting effort for frontier labs
    36:34 Biggest skepticism: manufacturing at gigawatt scale
    37:32 Hiring plug

    Austin Lyons @ Chipstrat: https://www.chipstrat.com
    Vik Sekar @ Vik's Newsletter: https://www.viksnewsletter.com/

About Semi Doped

The business and technology of semiconductors. Alpha for engineers and investors alike.