Track frontier AI research in real time.

Since the public release of GPT-3 in mid-2020, AI has entered an era of foundation models, scaling, and general-purpose algorithms. As AI systems become increasingly capable, they create new risks to public safety that need to be monitored. These include accident risk, as well as risks arising from new malicious applications that were previously impossible.

We built AI Tracker to monitor cutting-edge developments in this fast-moving field in real time, to help researchers and policy specialists better understand the AI risk landscape.

See something wrong or missing? Let us know.

We try to keep this list accurate and up to date, but we aren't perfect. If you spot a mistake or a missing model, let us know below. We'll verify your submission, add it to the catalog, and confirm with you once it's done.

👉 Please include a public source for your correction if there is one. This helps us verify submissions quickly.

Our methodology

If we think a new model has important public safety or security implications, we add it to the tracker. New entries usually introduce a capability that hadn’t previously existed, or represent the proliferation of a flagged capability of concern.

Each entry includes our best assessment of the model’s scale (in terms of number of parameters, dataset size, and total FLOPs of compute), a short description of the model and its capabilities, some industry context, and other information that we think helps paint a picture of the model’s significance to public safety.
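
Where public figures are available, scale estimates like these can be cross-checked against the widely used ~6·N·D approximation for dense-transformer training compute (roughly 6 FLOPs per parameter per training token). A minimal sketch, using publicly reported GPT-3 figures as the example inputs:

```python
# Rough training-compute estimate via the common 6*N*D heuristic
# (~6 FLOPs per parameter per training token). The GPT-3 figures
# below are public estimates; treat the result as an
# order-of-magnitude check, not a precise measurement.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens

# GPT-3: ~175B parameters trained on ~300B tokens
flops = training_flops(175e9, 300e9)
print(f"{flops:.2e}")  # -> 3.15e+23, close to the commonly cited figure
```

The same helper can sanity-check any entry in the tracker that reports both a parameter count and a training-set size in tokens.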

Our methodology is constantly evolving. If you believe we’re omitting useful information or have any suggestions for us, please submit a correction above, or email us at hello@mercurius.ai.

Frequently asked questions

Why did you make this?
Over the last year and a half, we’ve been engaging with senior policymakers in the United States and Canada to understand how best to make the case for global coordination on AI safety. We put together AI Tracker to ground those conversations in data, and to make it easier for long-term, safety-oriented AI policy professionals to understand the big picture of AI risk.
Why do you think AI is risky?
In the present and near-term future, AI poses a number of risks for public safety and security. AI can be used by malicious actors to scale phishing attacks, generate disinformation, and manipulate users on social media, for example. As AI is included in more safety-critical applications, AI accident risk is also becoming an important factor to track.

Over longer time horizons, AI may pose much more significant risks as systems become more capable. Specifically, many top AI researchers at world-leading labs like DeepMind, OpenAI and Anthropic are concerned about what might happen when we develop AIs that are more capable than human beings at a wide range of tasks. Such systems may be difficult or impossible to control, and may in effect develop their own objectives, distinct from those of their developers — and which, more broadly, may also be misaligned with human values. AI systems with these characteristics would be a source of catastrophic, or even existential risk for humans.

While some observers have dismissed concerns over these long-term risks, we've found that the researchers most actively involved in developing frontier AI capabilities also tend to be the ones who are the most worried about catastrophic and existential risk from AI.
Why now?
GPT-3 was a powerful proof point for the scaling hypothesis: the idea that increasing the number of parameters, the compute budget, and the training set size of an AI system can lead to predictable increases in its capabilities.
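
One way to make the scaling hypothesis concrete is as a parametric loss curve. A minimal sketch using the functional form from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens; the constants below are the published fits, reproduced here purely for illustration:

```python
# Illustrative loss predictor in the Chinchilla functional form.
# Constants are the Hoffmann et al. (2022) published fits for
# their setup; other models and datasets will have different fits.

def predicted_loss(n_params: float, n_tokens: float,
                   E=1.69, A=406.4, B=410.7,
                   alpha=0.34, beta=0.28) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling up both parameters and data predictably lowers loss
small = predicted_loss(1e9, 20e9)      # ~1B params, ~20B tokens
large = predicted_loss(70e9, 1.4e12)   # ~70B params, ~1.4T tokens
assert large < small
```

The key point for forecasting is that the curve is smooth: a lab can estimate the capability payoff of a larger training run before committing the budget.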

That’s a compelling equation for well-funded tech companies: with a large-but-accessible investment in compute and training, they can now expect to recover reliable value from AI research. The result has been a dramatic doubling-down on scaling across the industry, with trillion-parameter monolithic models and an unknown but likely impressive range of capabilities on the horizon.

Notably, all of this is happening in a context where the impact of AI accidents is rapidly increasing, and the problem of aligning AI systems with human values remains unsolved.

We think there’s now room for a much more robust and long-term focused public safety conversation about AI in policy circles. And that can’t happen without an AI tracking effort focused on public safety.
Can anyone just build a model like these?
No. Among other things, at these scales processors need to be linked by high-bandwidth physical interconnects to ensure the model can be trained within a reasonable timeframe. It's still not practical to train models like these on commercial cloud infrastructure, though this may change.
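
A back-of-envelope calculation shows why cluster scale and sustained throughput matter. The hardware figures below (GPU count, peak throughput, utilization rate) are assumptions chosen for illustration, not claims about any particular training run:

```python
# Wall-clock training estimate: total FLOPs divided by sustained
# cluster throughput. All hardware numbers here are illustrative
# assumptions; real runs vary widely in utilization.

def training_days(total_flops: float, n_gpus: int,
                  peak_flops_per_gpu: float,
                  utilization: float = 0.4) -> float:
    sustained = n_gpus * peak_flops_per_gpu * utilization
    return total_flops / sustained / 86_400  # 86,400 seconds per day

# ~3.15e23 FLOPs (a GPT-3-scale budget) on 1,000 GPUs at
# 312 TFLOP/s peak each, assuming 40% sustained utilization
days = training_days(3.15e23, 1_000, 312e12, 0.4)
print(round(days, 1))  # -> 29.2
```

Even under these generous assumptions the run takes about a month on a thousand tightly coupled accelerators, which is why most entries on our list required a dedicated, custom-built cluster.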

Part of what makes the models on our list so significant is that most of them required a custom, integrated build combining hardware, software and algorithms at the cost of significant engineering effort. We believe this is an indication of the value placed on these models by the institutions that train them.
How do you decide which models make the list?
We include models that we think are relevant to the security and public safety implications of AI. Models are included when they:

1) Lend themselves to well-defined malicious applications in the present;

2) Suggest that certain AI capabilities may be imminent, which would unlock malicious applications when they arise in the future; or

3) Speak to a trend in AI development that has important implications for safety.
Why isn’t my favorite model on the list?
We might have decided that its public safety or security implications weren’t significant enough to warrant its inclusion. But we might also have missed it, or it could have been released recently enough that we aren't aware of it yet. If you think we’ve missed something, please drop us a correction above or email us at hello@mercurius.ai. We’ll either make the change, or explain the reason for our decision.
Why do you care about these trends?
We think these trends are likely to be relevant to the risks advanced AI may pose to humans in the future. But they aren’t set in stone: we expect over time we’ll add more, remove a few, and tweak others.