A new Science Advances study compares AI-driven weather prediction to traditional physics-based models, testing how well cutting-edge AI systems forecast record-breaking heat, cold, and wind events.
The research evaluates three leading AI models (GraphCast, Pangu-Weather, and Fuxi) against ECMWF’s High Resolution physics-based model, testing how well each reproduces thousands of record-breaking events identified in ERA5 reanalysis data from 2018 and 2020.
The findings show that while AI models can excel at general forecasting and use far less computational power, they struggle with unprecedented extremes, particularly at short lead times.
AI vs physics-based models for extreme weather
The study highlights a crucial nuance in weather modeling: AI tools offer speed and efficiency, but their performance for extreme events is not yet on par with physics-based approaches at short lead times.
Disaster preparedness hinges on accurate forecasts of rare, high-impact events, and this work underscores the safety margin provided by physics-based models in this domain.
What the study did
The researchers conducted a head-to-head assessment of three prominent AI systems against a state-of-the-art physics-based model.
The evaluation used an extensive set of record-breaking events derived from ERA5 reanalysis, drawn from 2018 and 2020.
The aim was to probe the limits of AI when confronted with events that sit outside the patterns seen in historical data.
- Models compared: GraphCast, Pangu-Weather, Fuxi versus ECMWF’s High Resolution physics-based model.
- Data for testing: Thousands of record-breaking extremes identified from ERA5 reanalysis.
- Forecast horizon: Emphasis on short lead times where near-term warnings are most critical.
These AI systems are deterministic and trained on historical data, which tends to bias them toward previously observed patterns and magnitudes.
This makes extrapolating to truly novel, record-shattering events more challenging.
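As a rough illustration of what "record-breaking" means in this setting, the sketch below flags values in a time series that exceed every earlier value and reports the margin by which each record is broken. The article does not give the study's exact selection criteria, so the function name `find_heat_records`, the `min_history` parameter, and the sample temperatures are all hypothetical.

```python
def find_heat_records(temps, min_history=3):
    """Return (index, value, margin) for each value that exceeds all earlier ones.

    Assumption: a "record" is any value above the running maximum of the
    series so far. The first `min_history` values only seed that maximum.
    """
    records = []
    running_max = max(temps[:min_history])
    for i in range(min_history, len(temps)):
        value = temps[i]
        if value > running_max:
            # The margin is how far the new record exceeds the previous one.
            records.append((i, value, value - running_max))
            running_max = value
    return records


# Made-up yearly maximum temperatures (degrees Celsius) at one location.
annual_max = [35.1, 36.0, 35.4, 36.8, 36.2, 38.5]
records = find_heat_records(annual_max)

for index, value, margin in records:
    print(f"record at index {index}: {value:.1f} C, margin {margin:.1f} C")
```

A model trained only on the earlier values in such a series has never seen anything as large as the new record, which is the extrapolation problem described above. Cold records could be handled the same way with a running minimum.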
Key findings on extreme events
- Overall performance gap: AI models generally lag behind physics-based forecasts when predicting extremes, especially for unprecedented heat, cold, and wind events.
- Heat records: The physics-based model consistently produced lower RMSE (root-mean-square error) than the AI models, indicating more accurate temperature forecasts during heat records.
- Cold and wind extremes: Similar advantages were observed for cold records and wind extremes, with AI tending to understate how extreme the temperatures would be and occasionally mischaracterizing wind intensities.
- Lead-time sensitivity: The largest performance differences occurred at short lead times, where early warnings are most vital for preparedness and response.
The authors also note that AI tends to underpredict temperatures during heat records and overpredict them during cold records, with larger errors when a record is broken by a wide margin.
These patterns suggest a systematic bias toward the range of values the AI models encountered in their training data.
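The two quantities behind these findings, RMSE and the sign of the forecast error, are simple to compute. The sketch below uses invented temperatures for four hypothetical heat-record events, not data from the study, and the variable names are illustrative only.

```python
import math


def rmse(forecasts, observations):
    """Root-mean-square error: square root of the mean squared forecast error."""
    errors = [f - o for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mean_bias(forecasts, observations):
    """Mean of (forecast - observation); negative values mean underprediction."""
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


# Invented values (degrees Celsius) for four hypothetical heat-record events.
observed = [40.0, 41.0, 39.0, 42.0]
ai_forecast = [38.0, 39.0, 38.0, 39.0]
physics_forecast = [39.0, 41.0, 40.0, 41.0]

print("AI RMSE:", round(rmse(ai_forecast, observed), 3))
print("Physics RMSE:", round(rmse(physics_forecast, observed), 3))
print("AI mean bias:", mean_bias(ai_forecast, observed))
```

In this toy data the AI forecast has both a higher RMSE and a clearly negative mean bias, which mirrors the kind of underprediction during heat records that the authors describe. RMSE alone does not reveal the direction of the error, which is why the bias is worth reporting alongside it.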
Limitations and cautions for forecast use
The study urges caution about replacing physics-based models with AI in operational forecasting.
The ability to reliably predict extremes is essential for disaster risk reduction, evacuation planning, and infrastructure resilience.
Three directions could help close the gap:
- Probabilistic AI: Exploring AI models that quantify uncertainty could improve extreme-event characterization.
- Hybrid approaches: Combining physics-based fidelity with AI-driven efficiency may offer practical gains without sacrificing reliability.
- Testing protocols: The study provides a protocol for evaluating unprecedented extremes to guide future development.
Paths forward: probabilistic AI and hybrid models
Looking ahead, researchers advocate for integrating probabilistic AI with traditional physics-based forecasts to better capture the tails of the distribution.
Hybrid systems could deliver timely, energy-efficient predictions while maintaining a robust ability to forecast record-breaking events.
The study also calls for ongoing development of physics-based models alongside advancing AI methods to ensure dependable operational forecasts across all lead times.
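To make the idea of capturing the tails concrete, the sketch below contrasts a single deterministic value with an ensemble of forecasts. It is only an illustration of the general concept: the ensemble members, the record threshold, and the function name `exceedance_probability` are hypothetical and are not taken from the study or from any of the models it tested.

```python
def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members that forecast a value above the threshold."""
    return sum(1 for member in ensemble if member > threshold) / len(ensemble)


# Hypothetical ensemble of temperature forecasts (degrees Celsius) for one day.
ensemble = [37.2, 38.9, 39.4, 40.1, 38.0, 41.3, 39.8, 37.5]
previous_record = 39.5

# A single "best guess" (here the ensemble mean) sits below the record ...
ensemble_mean = sum(ensemble) / len(ensemble)

# ... while the ensemble still assigns a sizeable chance to breaking it.
p_record = exceedance_probability(ensemble, previous_record)

print(f"ensemble mean: {ensemble_mean:.2f} C")
print(f"chance of exceeding the record: {p_record:.1%}")
```

A deterministic forecast that reports only the central value would give no hint of a record here, whereas the probabilistic view tells an emergency manager that a record is a realistic possibility.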
Implications for researchers, policymakers, and disaster readiness
For scientists, the work reinforces the enduring value of physics-based dynamics in capturing extreme events. It also stresses the need for careful validation before deploying AI as a replacement in critical forecasting tasks.
Policymakers and emergency managers can interpret these findings as a reminder that while AI offers exciting capabilities, it should complement—not supplant—well-established physical models. This is particularly important when lives and infrastructure are on the line.
The report advocates using physics-based models for the core physics of the atmosphere and augmenting with AI where appropriate. Pursuing probabilistic and hybrid strategies can improve extreme-event forecasting while preserving reliability for disaster response.
Here is the source article for this story: Traditional models still ‘outperform AI’ for extreme weather forecasts

