Interconnects

Nathan Lambert

Available Episodes

5 of 115
  • Ranking the Chinese Open Model Builders
    The Chinese AI ecosystem has taken the AI world by storm this summer with an unrelenting pace of stellar open model releases. The flagship releases that got the most Western media coverage are the likes of Qwen 3, Kimi K2, or Zhipu GLM 4.5, but there is a long tail of providers close behind in both quality and cadence of releases.

In this post we rank the top 19 Chinese labs by the quality and quantity of their contributions to the open AI ecosystem (this is not a list of raw ability, but of outputs), all the way from DeepSeek at the top to the emerging open research labs. For more detailed coverage of all the specific models, we recommend studying our Artifacts Log series, which chronicles all of the major open model releases every month. We plan to revisit this ranking and make note of major new players, so make sure to subscribe.

At the frontier

These companies rival Western counterparts with the quality and frequency of their models.

DeepSeek
deepseek.com | 🤗 deepseek-ai | X @DeepSeek_AI

DeepSeek needs little introduction. Their V3 and R1 models, and their impact, are still likely the biggest AI stories of 2025: open, Chinese models at the frontier of performance with permissive licenses and the exposed model chains of thought that enamored users around the world.

With all the attention following the breakthrough releases, a bit more has been said about DeepSeek in terms of operations, ideology, and business model relative to the other labs. They are very innovative technically and have not devoted extensive resources to their consumer chatbot or API hosting (as judged by higher than industry-standard performance degradation).

Over the last 18 months, DeepSeek was known for making “about one major release a month.” Since the updated releases of V3-0324 and R1-0528, many close observers have been surprised by their lack of contributions.
This has let other players in the ecosystem close the gap, but in terms of impact and actual commercial usage, DeepSeek is still king.

An important aspect of DeepSeek’s strategy is their focus on improving their core models at the frontier of performance. To complement this, they run experiments that use their current generation to make fundamental research innovations, such as theorem proving or math models, which ultimately get used for the next iteration of models. This is similar to how Western labs operate: first you test a new idea as an internal experiment, then you fold it into the “main product” that most of your users see.

DeepSeekMath, for example, used DeepSeek-Coder-Base-v1.5 7B and introduced the now famous reinforcement learning algorithm Group Relative Policy Optimization (GRPO), which is one of the main drivers of R1. The exception to this (at least today) is Janus, their omni-modal series, which has not been used in their main line.

Qwen
qwenlm.ai | 🤗 Qwen | X @Alibaba_Qwen

Tongyi Qianwen, the primary AI lab within Alibaba’s cloud division, is far and away best known for their open language model series. They have been releasing many models across a range of sizes (quite similar to Llama 1 through 3) for years. Recently, their models from Qwen 2.5 and Qwen 3 have had accelerating market share among AI research and startup development.

Qwen is closer to American Big Tech companies than to other Chinese AI labs in terms of releases: they are covering the entire stack, from VLMs to embedding models, coding models, image and video generation, and so on.

They also cater to all possible customers (or rather every part of the open community) by releasing capable models of all sizes.
Small dense models are important for academia to run experiments and for small/medium businesses to power their applications, so it comes as no surprise that Qwen-based models are exploding in popularity.

On top of model releases for everyone, they have also focused on supporting the (Western) community, releasing MLX and GGUF versions of their models for local usage, as well as a CLI for their coding models that includes a generous amount of free requests.

Unlike some American companies, the core team seems to have stayed relatively small in terms of headcount, in line with other Chinese AI labs: Qwen3 has 177 contributors, whereas Llama 3 has three times as many, and Gemini 2.5 has over 3,000 people as part of the model.

Close competitors

These companies have recently arrived at the frontier of performance and we will see if they have the capability to consistently release great models at a pace matching Qwen or DeepSeek.

Moonshot AI (Kimi)
moonshot.cn | 🤗 moonshotai | X @Kimi_Moonshot

Moonshot AI is one of the so-called “AI tigers”, a group of hot Chinese AI startups determined by Chinese media and investors. This group consists of Baichuan, Zhipu AI, Moonshot AI, MiniMax, StepFun, and 01.AI, most of which have attracted investments by tech funds and other tech giants. For example, Alibaba is seen as a big winner in the AI space by having their own models and by being a lead investor in Moonshot, sort of like how big tech companies in the U.S. are investing in fundraising rounds for newer AI labs.

While their first models, K1 and K1.5, were closed and available on their API, they started releasing open models after the R1 release with experimental models using the Muon optimizer. Similar to DeepSeek, they focus on a single model line, with small experiments eventually feeding back into the main model. K2 is their “moonshot run,” a.k.a.
yolo run, and quickly became a hit similar to R1 (see our report from the release).

Further reading on Kimi can be found on ChinaTalk.

Zhipu / Z.AI
z.ai | 🤗 zai-org | X @Zai_org

Zhipu, known in the West as Z.ai, is a startup spinoff of Tsinghua University with considerable investments by Chinese companies and VCs. Currently, they are even considering an IPO, which would make them the first AI tiger to do so.

In terms of models, they are mostly known for their recent releases of GLM-4.5 and GLM-4.5V, both of which are fairly large mixture-of-experts models and very capable for their sizes. However, they are not just releasing LLMs, but also image and video generation models, setting them apart from pure-LLM companies and labs.

Noteworthy

These companies are transitioning to open releases, have open models with inferior capabilities, or have slightly different foci than the text-centric labs pushing the frontiers of intelligence.

StepFun
stepfun.ai | 🤗 stepfun-ai | X @StepFun_ai

StepFun first started as a closed model provider, but pivoted to open model releases after DeepSeek R1 shook up the industry. They are mostly focusing on multi-modal model releases, with Step3 being their flagship VLM. They also have image, audio and video generation models.

Tencent (Hunyuan)
hunyuan.tencent.com | 🤗 Tencent | X @TencentHunyuan

Hunyuan is mostly known for HunyuanVideo and Hunyuan3D. While they have released three series of different LLMs, their releases come with very strict licenses, which is unusual for Chinese companies and dampens excitement when combined with performance levels that can be found elsewhere.

RedNote (Xiaohongshu)
xiaohongshu.com | 🤗 rednote-hilab

The Chinese version of Instagram, RedNote, recently joined the ranks of Chinese companies releasing open models. Their capable character recognition / OCR model in particular surprised many (see our coverage).
Similar to Xiaomi and Baidu, it remains to be seen what their overall open strategy will be in the near and distant future, and they have not competed in the large, frontier model space.

MiniMax
minimaxi.com | 🤗 MiniMaxAI | X @MiniMax__AI

MiniMax is another of the AI tigers and also started as a closed company. After the release of R1, they changed their strategy and released the weights of Minimax-Text-01, following up with reasoning models building upon it. The unique selling point of these models is the 1M context window achieved with hybrid attention.

These text models are not the only thing they are focusing on: they also have image and video generation models, but those remain closed and only available on their API. They are also promoting their consumer platform heavily as they eye an IPO.

OpenGVLab / InternLM
internlm.intern-ai.org.cn | 🤗 InternLM | X @opengvlab

InternLM & OpenGVLab have deep ties to the Shanghai AI Laboratory, with InternLM focusing on the language models, while OpenGVLab releases vision models. While they release a range of models such as S1 or InternLM-Math, the orgs are mostly known for the strong InternVL series. While the first versions mostly used their own InternLM pretrained models, later releases (such as InternVL3) rely on Qwen as the language backend.

Skywork
skywork.ai | 🤗 Skywork | X @Skywork_AI

The Singaporean Skywork first started out as an online karaoke company (yes, really) before they pivoted to AI and being a competitor to Manus, with their platform focusing on agents for work-related tasks, such as slide generation.

Their LLM journey started with them releasing their own pretrained dense and MoE models.
However, they stopped pre-training their own models and instead started to fine-tune existing models: their OR1 reasoning model builds on top of DeepSeek-R1-Distill-Qwen-32B, and R1V3 uses InternVL3 (which itself uses Qwen2.5 as its LLM backend).

Aside from LLMs, they have a wide range of other models, including world models, image and video generation models, and reward models. Similar to their LLMs, they mostly build on top of other models. Unlike many labs, Skywork has released some datasets with their models, such as preference and reasoning training data.

On the rise

These companies are either just getting their toes wet with open models or operating more as academic research organizations than as labs pushing the performance of models.

ByteDance Seed
seed.bytedance.com | 🤗 ByteDance-Seed

Seed is the R&D arm of ByteDance and eerily similar to Meta’s FAIR division: diverse models with interesting research, with their papers garnering a ton of attention in the community. However, it remains to be seen whether they shoot for a Llama-style model release or continue to release research artifacts.

Here are some recent papers:
* Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
* Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
* Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
* Seedance 1.0: Exploring the Boundaries of Video Generation Models
* SeedEdit 3.0: Fast and High-Quality Generative Image Editing
* Seed1.5-VL Technical Report
* Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation
* Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
* VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
* Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice

OpenBMB
openbmb.ai | 🤗 openbmb | X @OpenBMB

OpenBMB is an open-source community (comparable to BigScience) from Tsinghua University NLP Lab (the very same
university where Zhipu was spun off from) with support from the Beijing Academy of Artificial Intelligence (BAAI) and ModelBest.

They are mostly focusing on small multi-modal models for the edge, such as MiniCPM-V-4. However, the license is rather restrictive, which is surprising given the community-driven origins of the group. Aside from model releases, they also release frameworks and specialized kernels to make sure their models run on low-end hardware.

Xiaomi (MiMo)
mi.com | 🤗 XiaomiMiMo

Xiaomi started releasing a bunch of small, capable models, ranging from LLMs to VLMs and audio models. The fact that Xiaomi updates the models quickly after an initial launch and releases multiple variants shows that it is not a one-off foray into open models. However, it remains to be seen whether those are mostly research artifacts or whether they are serious about potentially pushing the frontier or competing for adoption.

Baidu (ERNIE)
yiyan.baidu.com | 🤗 baidu | X @Baidu_Inc

Baidu, one of the original names in the Chinese AI space, has only released the weights of ERNIE 4.5. It remains to be seen whether they continue to release weights of newer releases as well.

Honorable Mentions

The rest of the labs that we are watching.

Multimodal Art Projection
m-a-p.ai | 🤗 m-a-p

An open research community, releasing all kinds of models (including a truly open 7B language model with data, etc.). Now, they’re mostly known for the music generation model YuE.

Alibaba International Digital Commerce Group
aidc-ai.com | 🤗 AIDC-AI

Another R&D arm of Alibaba, mostly releasing niche models building upon Qwen.

Beijing Academy of Artificial Intelligence (BAAI)
baai.ac.cn | 🤗 BAAI | X @BAAIBeijing

As a research institute, the Beijing Academy of Artificial Intelligence has a high diversity of projects.
They are mostly known for BGE, a family of capable embedding models.

inclusionAI
🤗 inclusionAI | X @InclusionAI666

The open-weight arm of the Ant Group (an affiliate of Alibaba handling mobile payments and some financial industries), responsible for Ling Lite, a series of LLMs.

Pangu (Huawei)
huaweicloud.com | X @HuaweiCloud1

Huawei is working on AI accelerators to threaten the market share of Nvidia GPUs, which are often targeted by regulations, both from the US and China. Their model releases are mostly to show what’s possible with their cards, but not without drama, including accusations that they upcycled Qwen models without stating it. We would expect them to continue to release more models in the near future.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    12:41
  • Contra Dwarkesh on Continual Learning
    Dwarkesh Patel’s now well-read post on why he is extending his AI timelines focuses on the idea of continual learning. If you ask me, what we have already is AGI, so the core question is: Is continual learning a bottleneck on AI progress?

In this post, I argue that continual learning as he describes it actually doesn’t matter for the trajectory of AI progress that we are on. Continual learning will eventually be solved, but in the sort of way that a new type of AI will emerge from it, rather than continuing to refine what it means to host ever more powerful LLM-based systems. Continual learning is the ultimate algorithmic nerd snipe for AI researchers, when in reality all we need to do is keep scaling systems and we’ll get something indistinguishable from how humans do it, for free.

To start, here’s the core of the Dwarkesh piece as a refresher for what he means by continual learning.

Sometimes people say that even if all AI progress totally stopped, the systems of today would still be far more economically transformative than the internet. I disagree. I think the LLMs of today are magical. But the reason that the Fortune 500 aren’t using them to transform their workflows isn’t because the management is too stodgy. Rather, I think it’s genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.

I like to think I’m “AI forward” here at the Dwarkesh Podcast. I’ve probably spent over a hundred hours trying to build little LLM tools for my post production setup. And the experience of trying to get them to be useful has extended my timelines. I’ll try to get the LLMs to rewrite autogenerated transcripts for readability the way a human would. Or I’ll try to get them to identify clips from the transcript to tweet out. Sometimes I’ll try to get them to co-write an essay with me, passage by passage.
These are simple, self contained, short horizon, language in-language out tasks - the kinds of assignments that should be dead center in the LLMs’ repertoire. And they're 5/10 at them. Don’t get me wrong, that’s impressive.

But the fundamental problem is that LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human's. But there’s no way to give a model high level feedback. You’re stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn’t produce anything even close to the kind of learning and improvement that human employees experience.

The core issue I have with this argument is the dream of making the LLMs we’re building today look more like humans. In many ways I’m surprised that Dwarkesh and other very AGI-focused AI researchers or commentators believe this: it’s the same root argument that AI critics use when they say AI models don’t reason. The goal of making AI more human constrains technological progress to a potentially impossible degree. Human intelligence has long been the inspiration for AI, but we have long moved past it being the mirror we look to for inspiration. Now the industry is all in on the expensive path to make the best language models it possibly can. We’re no longer trying to build the bird, we’re trying to transition the Wright Brothers’ invention into the 737 in the shortest time frame possible.

To put it succinctly, my argument very much rhymes with some of my past writing.

Do language models reason like humans? No.
Do language models reason? Yes.
Will language model systems continually learn like humans? No.
Will language model systems continually learn? Of course.

Dwarkesh writes “Rather, I think it’s genuinely hard to get normal humanlike labor out of LLMs.” This is because we’re still early on the buildout of the technology. Human labor takes an immense amount of context and quick thinking, both of which we’re starting to unlock with our language models. On top of this, human labor may not be what we want to create; we want to augment it. Using LLMs as drop-in replacements for humans is not a requirement for AGI, nor is what Dwarkesh describes a fundamental limitation on AI progress. Francois Chollet cleverly poked at this weakness in his recent conversation with Dwarkesh at an ARC-AGI event:

“Well, how do you define the difference between the ability to adapt to a new task and learning on the fly? It's, it sounds like the same thing to me.”

Language models can already pick up subtle context extremely fast. ChatGPT’s memory feature has gotten far better for me. When we’re using the far more powerful models we can expect in the next 18 months this’ll already start to appear magical. Language models are extremely apt at inferring context even without us giving it to them. Soon we’ll be unlocking that subtle connection engine by providing immense, explicit context. I don’t know of anyone who has actually thoroughly digitized all the relevant context of their job and formatted it in a way that is easily readable by an LLM. GPT-5 Pro estimates that all of the writing on Interconnects would be only 500K tokens. That would fit into an existing LLM with no extra system, but I’ve never tried it.

The problem that Dwarkesh is facing is that we’re still using LLMs primarily in a single generation manner, which got far better with the introduction of reasoning models, but the economically useful way to use current tools in more complex intellectual domains will require a deep-research style approach over all of your recent work interactions. No one is giving language models that kind of context.
None of the tools we use are set up properly to accumulate this type of context.

I expect this to change rapidly. ChatGPT, Claude, and the like are all adding memory features across chats and countless connectors to other pieces of information in your professional life. These memory features will be omnimodal and essential to extracting the type of value Dwarkesh wants. Without them, I agree language models in their current form are hopeless at solving continual learning.

This is what I would expect the rumored $2000/month ChatGPT-level subscriptions to work with. Each of these bespoke tasks needs to absorb a ton of context and reasoning tokens in order to make a directionally right output. If someone built the Claude Code equivalent for my Substack, with every post tagged by topic and performance metrics, I bet the AI could easily make useful suggestions on how to format my content.

Continual learning in how Dwarkesh presents it is a systems problem rather than a learning problem. I expect better context management over my information ecosystem to exist in 2026, but more work to be needed for the AI companies to know how best to reference it and unlock in-context learning that feels like rapid adaptation. Call that 2027.

The models that have been released in 2025 will make this far more tractable in the near future. Reasoning models have made in-context learning far more powerful, resulting in rapid progress on held-out and complex domains such as ARC-AGI. These models also have come with massive improvements in context length. Claude and Gemini have 1M+ token context lengths and GPT-5’s is at 400K, and they’re all growing steadily. What is important with the context length numbers is that evaluations are showing that these are meaningful improvements that the models can leverage intelligently.

With these reasoning models and smart retrieval of context, the systems we are building will look indistinguishable from continual learning.
This will definitely be multiple LLMs working together and will operate very differently than the first versions of ChatGPT we were given (and often still use today).

The path to continual learning is more context and more horsepower. This is directly in line with the direction AI investment is going. This doesn’t feel like a bottleneck, but rather another product problem that we are going to solve. This sort of continual learning may not enable the type of raw intelligence and autonomy that many vocal leaders in AI describe as “superintelligence.” Training models to be smarter on even more complex tasks (e.g. novel biological research) requires mastering agentic behaviors that need to be learned from scratch, as discussed in my post on “What comes next with RL”. There’s no internet-scale pretraining data for such agentic tasks. My point is that not all jobs that require continual learning will require the frontiers of intelligence. I’m excited to write blog posts with the bliss of my ChatGPT 6 co-editor.

This technology coming soon will not be without its challenges. My first reaction to the continual learning post was more in line with “society isn’t ready for this” rather than commentary on its feasibility. I’ll repeat my warning:

For a long time I’ve written that AI models have a higher risk potential in terms of social outcomes because the modalities they interact with us in are far more personal… As AI is going to be so powerful as a standalone entity, breaking some of the symbiotic links will be good for adding friction that makes the technology easier to steer towards good outcomes. In short, be wary of wishing for end-to-end (reinforcement) learning when you’re part of the environment. It’s a destiny to dystopia.

What we have today is a form of AGI and it’ll soon get much better with better context and memory. The industrialization of language models is giving us incredible improvements across a wide swath of use-cases.
These will blow past many basic primitives of intelligence in humans that have motivated AI for decades. First came models reasoning, then will come systems with continual learning. This is exactly what most AI companies are actually building, regardless of what their superintelligence messaging is.

Comments are open on this post, please continue the debate!
    --------  
    10:04
  • GPT-5 and the arc of progress
    If you want a video version of this, check out the last 20 minutes of the livestream reaction I did with Will Brown of Prime Intellect and Swyx of Smol AI & Latent Space.

GPT-5 was set up to fail on some of the narratives it was expected to satisfy. The two central themes it had to decide between were the AGI (or superintelligence) narrative that Sam Altman & co. have been using to fundraise and the fact that ChatGPT is one of the fastest-growing consumer technologies of all time. To fulfill both, GPT-5 needed to be AGI while also being cheap enough to serve as the most-used AI system in the world. Business and technological realities made it inevitable that GPT-5’s primary impact would be to solidify OpenAI’s market position, even if it raises a lot of eyebrows for the long-term trajectory of AI.

The reactions online capture this as well. The OpenAI live streams have historically catered to AI insiders, but the product speaks entirely to a different audience. The people discussing this release on Twitter will be disappointed in a first reaction, but 99% of people using ChatGPT are going to be so happy about the upgrade. Confusingly enough, this includes many of the critics. GPT-5 is a good AI system. It’s right in line with best-in-class across pretty much every evaluation, while being cheap enough to serve the whole world. OpenAI is largely fixing its product offering with an announcement that was hyped to be one of the biggest AI news cycles of the year. AI news being loud is defined by narratives being different more so than technology being better. OpenAI releasing an open model again will likely be pinpointed as just as important a day for the arc of AI as the GPT-5 release. In many ways GPT-5 was set up to fail and that is very off-putting for those expecting maximum AI progress in the near term.

I’m not going to dwell on it, but oh boy, that was a messy release. GPT-5 being announced and rolled out like this is very odd.
Countless plots were mislabeled, live demos had bugs, and the early rollout is doing some weird stuff. This reinforces how OpenAI was torn about the release and backed into a corner with their messaging. They knew they needed to improve the experience with strong competition in the industry, but releasing GPT-5 needed to make a splash after how long they’ve waited (and already parked the GPT 4.5 name).

The core question we track in this post is: What does it mean for the next 6-18 months of AI progress if GPT-5 is just as good as all the best models out there, e.g., Claude Sonnet for coding or o3 for search, funneled into one, super cheap package? If AGI was a real goal, the main factor on progress would be raw performance. GPT-5 shows that AI is on a somewhat more traditional technological path, where there isn’t one key factor; it is a mix of performance, price, product, and everything in between.

GPT-5’s performance

There are a few places where we can see that GPT-5 represents a solid step on the performance trend line, but nothing like a step change. First, on LMArena, GPT-5 is fantastic, sweeping the board to #1 on all categories. The last model to claim #1 in pretty much every category was Gemini 2.5 Pro, and that was the biggest step change in Elo since GPT-4 Turbo skyrocketed past the first Claude.

Second, GPT-5 is the top model on the ArtificialAnalysis composite benchmark.

These two, LMArena & ArtificialAnalysis, represent two coarse evaluations: community vibes and raw benchmarks. Both of these can be gamed, but are still correlated with real-world use. You can also see in OpenAI’s shared results how much the smaller versions improve on the likes of GPT-4.1 mini and o4-mini.

In many ways, the march of progress on evals has felt slowed for a while because model releases are so frequent and each individual step is smaller. Lots of small steps make for big change.
The overall trend line is still very positive, and multiple companies are filling in the shape of it. My post on “what comes next” from earlier this summer all but called this type of release, where the numbers aren’t shocking but the real world use cases are great, becoming more common.

This is a different path for the industry and will take a different form of messaging than we’re used to. More releases are going to look like Anthropic’s Claude 4, where the benchmark gains are minor and the real world gains are a big step. There are plenty more implications for policy, evaluation, and transparency that come with this. It is going to take much more nuance to understand if the pace of progress is continuing, especially as critics of AI are going to seize the opportunity of evaluations flatlining to say that AI is no longer working.

To say it succinctly: Abilities will develop more slowly than products.

The product overhang is being extended with each release. We’re still building untapped value with AI models and systems faster than we’re capturing it.

Another way to see this incremental push out in models or systems is through OpenAI’s update to the famous METR plot of time to completion for humans of various tasks AI systems can solve 50% of the time. GPT-5 is leading, but also just in line with trends.

All of this is to say comprehensively that AI progress is very alive and well, as long as you don’t subscribe to the exponential takeoff in ability. Those arguments are very strained by this GPT-5 release.

Yes, AI progress on intelligence and “raw ability” is certainly going to continue at a solid pace for a long time, but how will this translate into recursive self-improvement?

GPT-5’s details

If you’re reading closely, you may have noticed that this post uses the word system instead of model.
All of the leading chat systems have been adding more components onto them like safety checkers and so on, but this is the first one to use different architectures and weights for the primary generation of content across similar queries. GPT-5 is the first in what is to come, mostly to better balance cost and give better user experiences. From the system card:

GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time.

Along with this, they shipped many product improvements, such as how the model has a 400K context window in the API with great performance, reduced hallucinations, and new personalities. Primarily, I worry as a power user about the router. I sense that for now I’ll default to GPT-5 Thinking, and sometimes upgrade to Pro mode, while downgrading to standard GPT-5 only for benign queries (depending on its search behavior: if it is search-heavy like o3 without thinking, then it should still work well). Thankfully, the thinking mode has a “get an early answer” button, so I don’t see any reason to start elsewhere. If I need an answer fast, I’ll get one. If not, I want the best responses possible.

As for prices, here’s a comparison (USD per 1M tokens). GPT-5’s top-level model is cheaper than Claude Sonnet and far better than any OpenAI model has been before at coding, which is one of the core details of this release. Matching Gemini Pro’s pricing when considering Google’s infrastructure advantage is a substantial accomplishment.

* OpenAI: GPT-5 (API sizes)
  * GPT-5: input $1.25, output $10.00. (OpenAI)
  * GPT-5 mini: input $0.25, output $2.00. (OpenAI)
  * GPT-5 nano: input $0.05, output $0.40. (OpenAI)
* OpenAI: o3 (reasoning)
  * o3: input $2.00, output $8.00. (OpenAI Platform)
  * o3-mini: input $1.10, output $4.40. (cached input $0.55) (OpenAI Platform)
* Anthropic: Claude 4 family
  * Claude Sonnet 4: input $3.00, output $15.00. (Anthropic)
  * Claude Opus 4.1: input $15.00, output $75.00. (Anthropic)
* Google: Gemini 2.5
  * Gemini 2.5 Pro: input $1.25 (≤200k prompt) / $2.50 (>200k); output $10.00 (≤200k) / $15.00 (>200k). (Google AI for Developers)
  * Gemini 2.5 Flash: input $0.30 (text/image/video) or $1.00 (audio); output $2.50 (includes thinking tokens). (Google AI for Developers)
  * Gemini 2.5 Flash-Lite: input $0.10 (text/image/video) or $0.30 (audio); output $0.40. (Google AI for Developers)

Cheaper, thinking models that work well in applications are far more useful than scaling (as GPT-4.5 has shown us).

GPT-5’s impact

It seems like most people in all walks of life are going to love this model, from AI researchers all the way to people who are learning of ChatGPT for the first time today. This is very in line with my expectations for how AI will proceed, as a long, steady march of progress. The fact that the models are getting way cheaper rather than way more expensive definitely signals that we cannot just brute-force scale our way to much stronger systems. Scaling helps, but it is now one of many considerations, and all the laboratories are showing us that much bigger models have diminishing returns in value to customers. At the same time, models being cheaper could be just what we need for Jevons paradox to kick in and provide another boost in AI adoption.

Many people will claim that the GPT-5 release was a flop and the bubble will pop for AI. This is downstream of the industry generally making totally unrealistic promises.
As someone whose core through-line when covering frontier models is tracking the pace of progress, I translate this as “AI capabilities on benchmarks will proceed a bit more slowly, but we aren’t reaching any clear walls in performance.” The AI performance hills we’re climbing up as an industry do put up some more resistance as the obvious low hanging fruit is gone, but we have the tools to overcome it consistently for the next 6 to 18 months. For companies that have been fundraising on promises of AGI, such as Anthropic and OpenAI, closing the next rounds could be harder. Of course, this depends on whether the messaging of the rounds was a key part of the fundraising. This fundraising inspires capital expenditures across the industry, e.g. TSMC developing the next node for NVIDIA to build new chips, and so on. The AGI narrative and the fundraising it has enabled have been good for the U.S. in terms of building out valuable, raw infrastructure. This could be the beginning of the money train slowing down, but that’s very different from a derailment and a stock market crash. As raw infrastructure spend slows, there will be even more pressure to deliver valuable products to users. A key trend for 2025 has been many of those appearing — Deep Research and Claude Code being the paradigms that everyone has copied. GPT-5 makes these applications better and makes it easier and cheaper for the next viral AI products to hit the market. I’m still excited for what is to come. But first, I’m going to sign off and go play with GPT-5. It’s a good day to build something for the fun of it. 
As I use it more, I’ll have more to say.
Extra GPT-5 links
For more specifics on the model from people who got early access, I recommend Tyler Cowen, Every.to, or Simon Willison (or Swyx soon, on Latent.Space).
Livestream link: https://openai.com/gpt-5/
Research blog post: https://openai.com/index/introducing-gpt-5/
Developer blog post: https://openai.com/index/introducing-gpt-5-for-developers
Enterprise blog post: https://openai.com/index/gpt-5-new-era-of-work
GPT-5 landing page: https://openai.com/gpt-5/
System Card: https://openai.com/index/gpt-5-system-card/
Coding examples: https://openai.github.io/gpt-5-coding-examples/
What would you say if you could talk to a future OpenAI model: https://progress.openai.com/
Finally, I’ll plug again the video I did with Will Brown and Swyx. Send me the most interesting things you find on GPT-5! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
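To make the pricing comparison in these notes concrete, here is a minimal sketch that turns the listed rates into a cost for one workload. It assumes the listed prices are USD per million tokens; the `PRICES` table, the `request_cost` helper, and the example workload are hypothetical names and numbers chosen for illustration, and cached-input or long-prompt tiers are ignored.

```python
# Listed API prices, assumed to be USD per million tokens: (input, output).
PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5 nano": (0.05, 0.40),
    "o3": (2.00, 8.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro (<=200k prompt)": (1.25, 10.00),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at the listed rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    for model in PRICES:
        cost = request_cost(model, 2_000_000, 500_000)
        print(f"{model:32s} ${cost:8.2f}")
```

On this made-up workload, GPT-5 and Gemini 2.5 Pro come out identical because their listed rates match, which is the pricing parity the notes point to.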
    --------  
    10:41
  • gpt-oss: OpenAI validates the open ecosystem (finally)
    OpenAI released two open-weight, text-only reasoning models today, both mixture of experts (MoE) sized to run efficiently on a range of hardware from consumer GPUs to the cloud. These models have the Apache 2.0 license, so they’re available for distillation into other reasoning models, deployment into commercial products, and are free of downstream restrictions. These two models, the smaller gpt-oss-20B with 3.6B active parameters and 21B total and the larger gpt-oss-120B with 5.1B active parameters, follow the trends we’ve seen with the other leading open models in architecture choices. Where this release shines is in the dramatic change in open model performance and strategy that comes with the leading name in AI releasing an open model that undercuts some of their own API products.
We’ll get to the technical details on the model later, but the main point of this post is how much OpenAI has changed by releasing their first open language model since GPT-2. The larger 120B model “achieves near-parity with OpenAI o4 mini on core reasoning benchmarks” and is a major moment for the ecosystem:
* OpenAI has released an open model at the frontier of current open model performance, highlighting how major concerns over open models that OpenAI leadership mentioned in 2023 were overblown. The marginal risks of open models have been shown to not be as extreme as many people thought (at least for text only; multimodal is far riskier). Once other organizations, particularly Meta and China, showed OpenAI that there was no risk here, the path was opened to release a model.
* OpenAI has revealed far more of their technical stack than any release to date. This blog post has light details on many things in the model, but community tinkering will help us better understand what is going on here.
This includes basic things like our first time seeing a raw chain of thought (CoT) for an OpenAI reasoning model, but also more interesting things like how this model is trained to use tools in the CoT like their o3 model. Other details include researchers being able to play with OpenAI’s instruction hierarchy in raw weights (where pieces of it are untouchable in the API), a new “harmony” prompt format, the same “reasoning efforts” of low, medium & high from the API, a huge proof of concept on how far basic, community standard architectures with MoEs can be pushed, and other small details for the AI community to unpack.
* OpenAI has initiated a scorched earth policy on the API market, undercutting their own offerings and unleashing an extremely strong, trusted model brand with a permissive license. While adoption of any open model is much slower than an API due to testing, additional configuration, etc., this is set up to go about as fast as it can. API models in the class of OpenAI o4 mini, Claude Haiku, Gemini Flash, DeepSeek R1, etc. are all going to have to compete with this model. OpenAI’s o4 mini model is currently served at $1.10 per million input tokens and $4.40 per million output tokens. Serving this open model will likely cost at least 10x less. There are many potential strategic reasons for this, all of which paint OpenAI as having a clearer vision of what makes it valuable. What OpenAI hasn’t touched with this model is interesting too: “For those seeking multimodal support, built-in tools, and seamless integration with our platform, models available through our API platform remain the best option.” These are dropped for reasons above, and “headaches” discussed later in the post.
Together, these paint a much clearer vision by OpenAI on how they’ll control the AI ecosystem.
The top potential reasons on my mind are:* OpenAI could be trying to make all API models potentially obsolete on cost ahead of the GPT-5 release, which they hope to capture the top end of the market on. Or,* OpenAI could be realizing that models are no longer their differentiation, as ChatGPT users continue to steadily climb — and they’ll soon pass 1 billion weekly actives.There are plenty of other reasons, such as the politics alluded to at the end of the blog post, but OpenAI tends to only act when it serves them directly — they’ve always been a focused company on their goals.There’s also a long list of head scratchers or in-between the lines points that illuminate OpenAI’s strategy a bit more. OpenAI of course didn’t release training data, code, or a technical report, as expected. OpenAI is trying to make a big splash with the name that captures more of the enterprise market, but in doing so takes some collateral damage in the research and true “open source” AI communities. These future questions include:* The naming is bad — a mixture of cringe, confusion-inducing, and still useful for their marketing goals. For anyone following open-source AI for a long time it won’t be new that a major company is blurring the association of the term open-source with the community accepted definitions. I understand why OpenAI did this, but the naming conflict further enforces that the true open source AI community isn’t the target of this release — it’s people that want to try an “open source AI model” for their business, and OpenAI has made the target too big to miss for enterprises.* OpenAI did not release the base models. Anyone following the space would’ve expected this, but it matters substantially for researchers. These two sparse, low numerical precision MoE models won’t be easy for researchers to use. The best model for researchers and tinkerers are dense, base models from 1 to 7 billion parameters. 
These are much “longer term” artifacts in the open community that will still be using almost only Qwen.I need to take a second before the “unknowns” section and comment on the architecture. These models are reinforcing trends we’re seeing in modeling across the industry. Recent frontier open models are all very sparse MoEs inspired by the DeepSeek architecture. DeepSeek V3 had 37B active and 671B total parameters. Kimi K2 had 32B active and 1T total parameters. With 5B active and 121B total, the sparsity factor fits right in with normal. Sparsity in MoEs is totally king right now. The smaller gpt-oss is a bit less sparse than Qwen’s 3B active, 30B total smaller MoE, but expect the sparsity of these models to continue to increase.Some things we need more testing to know the impact of include:* The model has been quantized for release to MXFP4 (4 bit floating point). It’s not clear exactly who will be impacted here, but this could make it benefit people most with the newest hardware, cause minor issues across Torch/Cuda versions, or even make some of the behaviors weird relative to the trained version internal to OpenAI. This could also be a plus, depending on performance, as the bigger model is quantized to 4 bit precision to enable it to be run on GPUs with 80GB of memory, such as the A/H100 line from NVIDIA.* Safety measures have been taken to change how finetunable the model is. With, or soon after, this release OpenAI is releasing a research paper on new methods to make it so you can’t “finetune the safety away” from a released instruct model. This is a very long-standing issue that people have concerns with over releasing open models. The main question here is if the models OpenAI releases are still able to be finetuned or not for productive use-cases. OpenAI claims they can be in their blog post, but this will be left up to the community to decide. 
Is finetuning the safety away actually a feature of an easy to use model?For example, Gemma has been tougher for people to finetune historically because it uses a different attention implementation and has a different parameter space from being distilled. Open finetuning stacks are still tuned for Llama and Qwen — this takes a long time to change.Many people will take the “we made it impossible to un-censor this model” as a challenge, which will be interesting to follow in the jailbreaking research community. There is a substantial market for modifiable models.* The model was trained to expect tools, but open model tool use is a mess. One of the biggest problems I worry about in designing an OLMo model with native o3-style tool use is that I need to make it seamless for users to use the same tools from training time at inference time. An early tester in my network mentioned that the model would hallucinate tool calls from training (sort of like what was mentioned around o3’s full release). I don’t expect this to be an unsolvable issue, but it could slow adoption. It could also allow people to reverse engineer the tools that OpenAI uses during training, we’ll see!* We need to re-benchmark the model on open infrastructure. OpenAI did a good job for this release integrating it everywhere, but we need to confirm that the community can easily replicate their evaluation scores. Evaluation at closed labs has increasingly become bespoke to suit their internal needs, which is a logical decision, but this comes at a cost of friction when an open model is released. This is me saying loud and clear that this isn’t a model performance review in a nuanced sense, but a summary of the importance of OpenAI’s approach (and where the opportunity is for the rest of us). Not all good models are easy to use. Some models benchmark well and are useful — e.g. Qwen. Some models benchmark well and are forgotten. 
Regardless of scores, I expect this to be a useful model.
Overall, I would give OpenAI a very strong grade on their first open release in a while; they definitely listened to the feedback given by the community. The path to earning goodwill with the open community, especially with researchers, is to embrace more risk in making models that are easier to modify (and potentially even more revealing), such as the base models for these checkpoints. Open models from the U.S. labs were in such a dire spot that we need any step back in the right direction. As the rollout of the model begins and we have more understanding of it, we’ll include more updates on Interconnects, such as in the next Artifacts Log issue.
So, OpenAI is the new open champion, right? There’s no more risk vis-a-vis China? We don’t need Llama anymore? Not quite, let me explain.
OpenAI, ATOM, and national champions
It’s a phenomenal step for the open ecosystem, especially for the West and its allies, that the best-known brand in the AI space has returned to openly releasing models. This is momentum and could be the start of the turning point of adoption and impact of open models relative to China. The open ecosystem moves fast in some ways and slow in others. Many workflows and much expertise are now built on Qwen models due to their frequent, accessible releases. Some of these users will try OpenAI’s models the next time they want to make a change, but it’s far from certain that everyone will immediately switch to OpenAI’s model now that it’s out. To me, OpenAI dropping a strong model has switched the second derivative on the open model scales. The U.S.
and its allies will no longer be falling further and further behind, which was the main story of 2025, but we need to build on this momentum if we want to have competitive open models for all use cases in the order of months rather than years.There’s a lot of uncertainty in the incentives for open models. Some of the best China analysts I know share how China is sensing that releasing open models is a successful strategy for them and are doubling down. This is a very reasonable take. The retort is that if we use it as a weakness of the American ecosystem that it is so reliant on Meta’s Llamas, or now GPT OSS, the same could happen for Qwen. So then, what happens if Alibaba decides Qwen’s stellar releases no longer serve them?In this case, there would be a large opportunity in the series of small models from 1 to 70B parameters, but there’s so much competition from China at the larger scales. These are currently the big mixture of experts (MoE) models like DeepSeek V3/R1, Z.ai’s / Zhipu’s GLM 4.5, Kimi K2, and so on. China has more models that are close to this performance level, such as MiniMax or Tencent.All of these companies have uncertainty, but there’s a strength in numbers that reinforces standard practice and sets standards. Releasing strong, large, open models is now the standard in China. We’re back in the precarious period of establishing standards for American companies, who are exposed to the legal risk of not being able to un-release models with many open lawsuits, such as in areas like copyright.These two sides of the open ecosystem are at very different stages and need very different actions. In many ways, we shared The ATOM Project when we did because we could tell this was a local (and hopefully global) minimum in terms of the distance between Western contributions to the open science of AI compared to any point in the recent past and near future. OpenAI’s release is a step in the right direction, but it is still a precarious position. 
Many people make noise about creating open models, from the AI Action Plan to venture capitalists and academics. What all of these parties have in common is that it’s not their number one goal. The goal of The ATOM Project is to give an outlet for people like myself who want to make this project their number one priority. This is why we need to keep nurturing entrants into the open model space that are releasing their best models there. It is what made the early versions of Llama great, and is what will be the defining factor of the outputs of ATOM. Models that are designed from first principles to be modifiable, interpretable, and extendable are what will enable a new decade of AI research to be born. This needs base models, training details, convenient sizes, and other little details that are missing from many recent open model releases, including OpenAI’s.
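The sparsity comparison and the 80GB memory claim in this post are both simple arithmetic, sketched below as a rough check. The parameter counts are the ones quoted in the post; the `MODELS` table and the helper names are hypothetical. The memory estimate assumes every parameter is stored at a uniform bit width and counts weights only, ignoring quantization scale factors, activations, and the KV cache, so treat it as a lower bound rather than a deployment guide.

```python
# (active, total) parameter counts in billions, as quoted in the post.
MODELS = {
    "DeepSeek V3": (37.0, 671.0),
    "Kimi K2": (32.0, 1000.0),
    "gpt-oss-120B": (5.1, 121.0),
    "gpt-oss-20B": (3.6, 21.0),
    "Qwen small MoE": (3.0, 30.0),
}


def sparsity_factor(model):
    """Total parameters divided by active parameters (higher is sparser)."""
    active, total = MODELS[model]
    return total / active


def weight_memory_gb(total_params_billions, bits_per_param):
    """Weights-only memory in GB, assuming a uniform bit width (a simplification)."""
    return total_params_billions * bits_per_param / 8


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name:16s} total/active = {sparsity_factor(name):5.1f}x")
    # 4-bit versus 16-bit weights for the larger gpt-oss model.
    for bits in (4, 16):
        print(f"gpt-oss-120B at {bits:2d} bits: {weight_memory_gb(121.0, bits):6.1f} GB")
```

Under these assumptions the larger model's weights take about 60.5 GB at 4 bits, which is consistent with the post's point that quantization lets it run on a single 80GB GPU, and the smaller gpt-oss comes out less sparse than the Qwen 3B-active, 30B-total model.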
    --------  
    13:36
  • Towards American Truly Open Models: The ATOM Project
    I’m very excited to share a substantial project on invigorating investment in open language models and AI research in the U.S. The ATOM (American Truly Open Models) Project is the mature evolution of my original “American DeepSeek Project” and I hope it can be a turning point in the current trajectory of losing open model relevance vis-a-vis China, and even the rest of the world.
I’ve included the full text below, but I encourage you to visit the website for the full version with added visuals, data, and a place to sign your support. This is a community movement, rather than me fundraising, starting an organization, or anything like that. If you can help get the word out and/or sign your support, I’d greatly appreciate it. (Or watch a 5 minute overview on YouTube.)
The ATOM Project: Towards fully open models for US research & industry
Reinvigorating AI research in the U.S. by building leading, open models at home
America's AI leadership was built by being the global hub and leading producer of open AI research, research which led directly to innovations like the Transformer architecture, ChatGPT, and the latest innovations in reasoning models and agents. America is poised to lose this leadership to China, in a period of geopolitical uncertainty and rising tensions between these two nations. America's best AI models have become more closed and restricted, while Chinese models have become more open, capturing substantial market share from businesses and researchers in the U.S. and abroad.
Open language models are becoming the foundation of AI research and the most important tool in securing this leadership. America has lost its lead in open models – both in performance and adoption – and is on pace to fall further behind.
The United States must lead AI research globally, and we must invest in making the tools our researchers need to do their job here in America: a suite of leading, open foundation models that can re-establish the strength of the research ecosystem.Recommendation: To regain global leadership in open source AI, America needs to maintain at least one lab focused on training open models with 10,000+ leading-edge GPUs. The PRC currently has at least five labs producing and releasing open models at or beyond the capabilities of the best U.S. open model. Regaining open source leadership is necessary to drive research into fundamental AI advances, to maximize U.S. AI market share, and to secure the U.S. AI stack.OverviewOpen language model weights and data are the core currency of recent AI research – these are the artifacts that people use to come up with new architectures, training paradigms, or tools that will lead to the next paradigms in AI to rival The Transformer or Inference-time Scaling. These research advances provide continued progress on existing products or form the basis for new technology companies. At the same time, open language models create potential for a broader suite of AI offerings by allowing anyone to build and modify AI how they see fit, without their data being sent through the cloud to a few, closed model providers.Open language models are crucial for long-term competition within American industry. Today, substantial innovation is happening inside of large, closed AI laboratories, but these groups can only cover so many of the potential ideas. These companies spend the vast majority of their resources focusing on the next model they need to train, where the broader, open research community focuses on innovations that’ll be transformative in 2, 5, 10, or more years. 
The most progress in building useful, intelligent AI systems will come when the most people can participate in improving today's state-of-the-art, rather than the select few at certain companies.The open AI ecosystem (regarding the models, not to be confused with the company OpenAI) has historically been defined by many parties participating. The United States emerged as a hub of the deep learning revolution via close collaboration between leading technology companies and academic institutions. Following ChatGPT, there have been countless contributions from around the globe. This distribution of impact on research has been collapsing towards clear Chinese leadership due to their commitment to open innovation, while a large proportion of leading scientists working in the United States have joined closed research organizations.The playbook that led Google to invent and share the Transformer – the defining language model architecture of which all leading models such as ChatGPT, Gemini, or Claude are derived from – is now the standard mode of operation for Chinese companies, but it is increasingly neglected by American companies.The impact of China’s models and research are growing because the institutions focused on open models have access to substantial compute resources for training – e.g. some have formed a close relationship between leading AI training laboratories and academic institutions. Until the United States and its partners directly invest in training more, higher performance open models and sharing the processes to do so, its pace of progress in AI research will lag behind.To train open models at the frontier of performance, a developer currently needs a high concentration of capital and talent. We estimate that to lead in open model development, the United States needs to invest in multiple clusters of 10,000+ H100 level GPUs to create an ecosystem of fully open language models that are designed to enable a resurgence in Western AI research. 
Stacking large investments such as this into a few focused efforts will help them to learn from each other and make progress across a range of challenges quickly and robustly. Splitting such an investment in AI training into smaller, widespread projects will not be sufficient to build leading models due to a lack of compute concentration. Along the way we need to build models of various sizes that can enable applications of AI at every scale from local or edge devices all the way to high performance cloud computing.Open models as the engine for AI research and developmentAmerica's AI leadership was built by tens of thousands of our best and brightest students, academics and researchers. This process occurred over decades, but it is faltering at a crucial transition point to the new, language modeling era of AI research. Since the release of ChatGPT, open language models and computational resources are the most important table stakes for doing relevant and impactful research. High-quality open models and their subsequent technical reports quickly accrue thousands of citations and accolades such as best paper awards and the focus of large swaths of students. These act as foundational currencies of AI research and are crucial, achievable artifacts for the long-term American AI ecosystem.While many direct consumers of open models are academics, this community is far from the only group that will benefit immensely from a new wave of American open models. The low cost, flexibility, and customizability of open models makes them ideal for many use cases, including many of the ways that AI stands to advance and transform businesses large and small.If the United States does not create its own leading open models, the focus of American researchers and businesses will continue to shift abroad. The benefits of openly sharing a technology accrue to the builder in mindshare and other subtle soft power dynamics seen throughout the history of open source software. 
Today, these benefits are accruing elsewhere due to the intentional support of open models by many Chinese organizations. The gap in performance and adoption will only grow as the American ecosystem sees strong open models as something that is nice to have, or an afterthought, rather than a key long-term priority.China is adopting the playbook for open innovation of language models that the United States used to create its current AI leadership, yielding rapid innovation, international adoption, and research interest. The collapse of American dominance in AI research is driven not only by the remarkable quality of the Chinese ecosystem, but also by the commitment of China to these very same Open Model Principles - the principles that American scientists used to start this AI revolution. This is reflected further in a consistent trend of Chinese open models being released with more permissive terms of use than their American counterparts.The many leading closed research institutions in the United States are still creating world-class models – and the work they do is extraordinary. This collapse is not their fault, but closed labs make closed research, and the acceleration of AI was built on open collaboration with world-class American models as the key tool.As researchers, our focus is on leading the research and development for the core technology defining the future, but there is also a growing list of other urgent security and policy concerns facing our nation around the lack of strong open models. To start, adoption of open models from the PRC in the US and our allies has been slow in some sectors due to worries about backdoors or poor security in generated code. Similarly, there is concern over the outputs of these Chinese models being censored or inconsistent with everyday American values of freedom, equality, and independence. 
There are even parallels between how the PRC’s national AI champions are increasingly racing to release cheap and open AI models and the PRC’s historical practice of dumping state-subsidized, below-cost exports from China to undermine American competitors. With the dynamic and rapid evolution of this technology, we need to get ahead of these issues before stronger habits, cost disadvantages, or other incentives reduce the practicality of adopting American open models.America's lost lead in open model performanceOn countless benchmarks, the leading American models have fallen behind counterparts from Chinese companies. In July 2024, American models in the form of Llama 3 had leading performance over any openly available Chinese models. Since then, a growing number of Chinese open model providers have surpassed and widened the performance gap with the leading American open models.The leading American open models are Meta's Llama and Google's Gemma models. The Chinese open models from DeepSeek and Alibaba's Qwen have traded off positions at the frontier of capabilities ahead of their American counterparts. However, the Chinese ecosystem is expanding rapidly, with new players such as Moonshot AI (Kimi), Zhipu AI, or Tencent close behind.We consider two popular public, aggregate benchmarks to demonstrate the state of China’s current open model dominance. These represent crowdsourced rankings, LMArena, and comprehensive intelligence rankings by blending a variety of capability benchmarks, from ArtificialAnalysis. The pace of progress on these Pareto frontiers is only part of the equation. In addition to leading, the top 10 open models on LMArena are all created by Chinese organizations. For ArtificialAnalysis rankings, the top 3 open models are of Chinese origin as of publishing on August 4th, 2025.The isolation of Meta's LlamaMeta CEO Mark Zuckerberg has been one of the few clear advocates for the long-term imperative of America building open models. 
Since the release of ChatGPT, this has been manifested by Meta's Llama series of models – these had long been the definitional open models that served as the basis for research and product development in 2023 and 2024. This basis for research is established by releasing a suite of strong models across a variety of sizes. The original LLaMA family came with models of 7, 13, 32, and 65B parameters, which quickly became defaults of the research community because they conveniently fit on certain popular GPUs for finetuning or inference.
For a first instance showcasing the gap in adoption, the Qwen 1.5 family of 8 models was released shortly after the Llama 2 family of four comparably sized models in the summer of 2023. An analysis of cumulative model downloads shows the Llama 2 models being downloaded about 500% more than the early Qwen models (roughly 60M versus 10M total downloads, with half as many models), highlighting the original state of play in the open ecosystem – a large lead for American models.
Llama 3 continued this trend with a series of models across 2024. Pieces of the Llama 3 family (and its various versions in Llama 3.1 and 3.2) are some of the most popular models ever in HuggingFace’s history as the leading distributor of open models. At the same time, the newer Qwen models from Alibaba, this time the Qwen 2.5 suite of 2024, showed substantially closer adoption numbers to Meta’s Llamas – a lead of only 20 million cumulative downloads for Llama 3 over the Qwen 2.5 suite, with both of them crossing 120M total downloads.
Llama’s lead was built on a combination of strong performance and existing distribution channels. This success came in spite of a restrictive license – the contract between the open artifact’s creator and the downstream user – that can require nuanced legal consideration about whether a particular use case is compliant.
Meanwhile, Qwen and other Chinese models have adopted simpler licenses drawing on historical practices in open-source software (OSS), removing another barrier to uptake of their models.

Meta has effectively been a singular horse in this race. As language models were established as a core technology, competition has arrived. Between the last releases of Llama 3 and the arrival of Llama 4, the landscape of open models changed substantially with the arrival of DeepSeek’s permissively licensed frontier models, DeepSeek V3 and DeepSeek R1. Meta, effectively alone in regularly releasing its best models, was now expected to compete both with Qwen, which makes large families of models that are great at every size, and with DeepSeek, which releases open frontier models. Both types of models are crucial to the health of the ecosystem, but they can take slightly different foci to get right.

China today has 5 amazing open labs, a number which is growing, and America has Meta as its open models champion. We are running Meta in a race against 5 Chinese runners and then complaining when it doesn't win every race every time. Our problem is not that Llama 4 falls short of the state of the art; our problem is running a solo athlete against a team built with an ecosystem to support its growth.

Chinese open models are taking the all-time lead in adoption

The available data showcasing adoption of open language models – how much models are downloaded and how much base models are modified for new uses – shows that China has taken the lead in recent adoption and will soon take the lead in all-time adoption.

We collected historical, daily download data from 6 of the leading open model providers across the world – Meta, Google, Mistral AI, Microsoft, Alibaba Qwen, and DeepSeek AI. Grouping by region, we can see America’s early lead with Llama, Europe’s surge with Mistral’s early viral releases almost surpassing the U.S. in April of 2024, and a consistent acceleration from the Chinese providers until they surpass the U.S. this summer. As of August 2025, the leading U.S. and Chinese models both have around 300M total downloads on HuggingFace, with the Chinese rate of growth being notably higher. The growth rate for European models has remained lower, with their cumulative downloads reaching around 100M today.

An important benefit of open models is the ability to finetune them, a process that adapts a given model to a specific purpose. This process is at the heart of academic research and is important for businesses that need to shape a given model to their individual needs. While there are more cumulative derivatives of American models at the moment, Chinese models are gaining momentum, especially this year.

Early in 2024, Chinese models accounted for 10-30% of the new finetuned models appearing on HuggingFace. Today, derivatives of Alibaba’s Qwen models account for more than 40% of the new language models appearing on HuggingFace each month (the overall picture is quite similar to the downloads data) – and that is just one of China’s leading open model laboratories. Meta’s share of derivatives with the Llama models has dropped from a peak of nearly 50% in the fall of 2024 down to only 15% today. With far fewer open model options appearing from the U.S. or Europe, the proportion of Chinese models in the AI ecosystem is expected to continue to rise.

What the ecosystem needs

We can fix this. America has the talent, compute, and capital to lead open model development – we just need to get them to the right place.

The tone for change is well represented by the White House's recent AI Action Plan, which paints a much clearer vision in which the benefits of innovation and adoption globally far outweigh the current measured risks.
This represents an inflection point in the perception of open models, especially in the United States, but we still have a long way to go to support this vision with artifacts and actions.

The United States has a thriving AI research community, but it is missing models that it has created itself and has complete knowledge of, which it needs in order to make clear and rapid progress. For example, the area of research with the most excitement following recent reasoning models is reinforcement learning with verifiable rewards (RLVR). This research has largely been performed on Alibaba's Qwen models from China due to their strong performance across math, code, and STEM benchmarks.

There are two categories of truly open models that we need in order to lead on all metrics of open models defined by how AI is studied and used. Both are essential and complement each other and the rest of a leading AI ecosystem. The best outcome is when these are accompanied by training data, intermediate checkpoints, base models, training code, and permissive licenses accepted as standards for free use by the AI community. These models with everything released, currently less common across the industry, are known as “open source models” to clearly note the benefits that come with more knowledge of how they were built.

First, we need leading open models at the frontier of performance. These should be the best models in the world and can be complementary to offerings from the leading closed AI models built in America, offering lower costs and more modifiability. The fundamental insight driving the recent rapid buildout of AI training infrastructure is the idea of scaling laws – this applies to open and closed models alike. The ballpark of scale needed to reach the leading edge of performance today is 200 to 600+ billion parameters with a mixture of experts (MoE) architecture – a size range used for all the leading open models from the U.S. and China in 2025 that challenge the best closed models on intelligence benchmarks.

With these leading models, we need a family of related models across a variety of sizes to allow every application and direction of study to be addressed. This is a standard adopted by leading open model suites from the U.S. and China alike. Only the most challenging tasks need the largest models, and for the rest of the tasks facing AI there need to be tools to understand the minimum model size that can solve certain simple tasks. A distribution of model sizes, from those that can run on your iPhone to those that are assisting with the hardest intellectual work and everything in between, creates maximum opportunity to advance and integrate AI broadly.

The entry point to train models of this size distribution is a cluster of compute on the order of 10,000+ leading GPUs. It is standard for top models to be trained with small teams of fifty to a few hundred people. A famous number on the cost of training frontier AI models from earlier this year was the often quoted $5 million figure for DeepSeek V3 – this is misleading on what it actually takes to develop these models, and the authors of the DeepSeek technical report acknowledged as much. 10,000 GPUs provide an entry point for rapid iteration concurrent with large-scale training.

America should target having multiple centers producing excellent open models. This serves to de-risk progress on training these models, given the urgency of the mission, but will also allow for a more diverse set of artifacts and for the research groups to learn from each other without first making the training organizations so large that progress is slowed.

There are many avenues to obtain and allocate these resources across multiple stakeholders. We need to engage across private companies, philanthropic institutions, and government agencies.
Programs such as the National AI Research Resource (NAIRR) are important for broadening access to resources related to AI research including compute, data, software, and models, but these ecosystem-wide solutions are not enough to create breakthrough models as China is doing with concentrated bets. We need immediate, targeted interventions that can deliver frontier open models within 6-12 months, not years.

As many organizations around the world create strong AI models, it is becoming clearer that with the right compute and talent, strong models can follow. The formula we must follow is delivering these resources with the directive to release the models openly; then we can solidify American AI leadership. Every stakeholder – from tech giants to philanthropies to federal agencies to researchers and engineers – must ask themselves: Are we funding or participating in the future of AI research, or are we ceding it to competitors who understand that open models are the foundation of AI supremacy?

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe
    --------  
    22:12

About Interconnects

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories. www.interconnects.ai

v7.23.3 | © 2007-2025 radio.de GmbH
Generated: 8/31/2025 - 1:46:04 AM