By Scott K. Johnson
Jun 08, 2026
The weather and climate science AI revolution isn’t revolutionary
It feels like there’s no escaping AI right now, whether you’re trying to type a sentence without being interrupted by a digital “assistant” or struggling to find a new refrigerator that doesn’t require a Wi-Fi connection for some reason. You’d be forgiven for wondering whether we’re in the midst of a quantum leap in tech or whether people are just hyping up a heap of slop. So what should we make of the growing use of AI in weather and climate modeling?

The conversation didn’t get off to a great start earlier this year, when a National Weather Service office posted a forecast map featuring nonexistent cities in Idaho with names like “Whata Bod” and “Orangeotild.” Thankfully, that was just an AI-generated image produced for social media, not the output of an actual forecast model. Meteorologists and climate scientists are not yet being replaced by large language model prompt engineers.

But AI is being used in these fields, through techniques that researchers have studied for years and whose strengths and weaknesses are well understood. And for good reason, those techniques differ between weather models and climate models.

ML, not LLM

In all these models, “AI” refers to machine learning. Without diving into the technical details of its many variations, the idea is straightforward: using computers to identify patterns in data. Fitting a straight trend line to data, known as linear regression, is a very simple way to identify a pattern, and we can do regressions with more complicated curves and equations as well. The power (and potential pitfall) of machine learning is that an algorithm can handle much higher levels of complexity, picking out relationships we would have a tough time putting a finger on manually.

Machine learning starts with training a model from scratch. The model is assigned some structure, such as a neural network, giving us a number of knobs that can be independently tweaked to fine-tune the algorithm’s behavior.
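To make the “knobs” idea concrete, here is a minimal Python sketch (an illustration, not how any forecast center trains its models). It fits the simplest pattern mentioned above, a straight trend line, in two ways: with the textbook least-squares formula, and the machine learning way, by starting from arbitrary values for the two knobs (slope and intercept) and repeatedly nudging each one in the direction that shrinks the average squared error. The data, learning rate, and step count are made-up choices for the example.

```python
def fit_line_least_squares(xs, ys):
    """Textbook formula for the line y = slope * x + intercept with the smallest squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def fit_line_by_turning_knobs(xs, ys, learning_rate=0.01, steps=5000):
    """Same fit found iteratively (gradient descent): start with both knobs at
    zero and repeatedly nudge each one downhill on the mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to each knob.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # synthetic data lying exactly on y = 2x + 1
print(fit_line_least_squares(xs, ys))     # (2.0, 1.0)
print(fit_line_by_turning_knobs(xs, ys))  # very close to (2.0, 1.0)
```

The iterative version is slower here, but the same nudge-the-knobs loop keeps working for models with millions of knobs and no tidy closed-form answer, which is the situation neural networks are in.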
The model is then given a huge pile of example data, often with the answer attached, such as thousands of bird photos labeled by species. It iteratively searches for the set of knob values that best connects each photo’s contents to the correct species.

Some limitations should be obvious. This algorithm won’t identify a species it wasn’t trained on, or any subpopulations of a species that differ too much from the examples. The quality of the training data matters a lot, too. If we only use photos of chickadees in pine trees, the model could include pine needles in its definition of chickadee-ness. And without a lot of extra work, we may not know how the model arrives at its answers; the internal mechanisms are pretty much a black box most of the time.

The upside is real, though. Machine learning algorithms often outperform our best human-crafted algorithms in computational efficiency, and sometimes in accuracy as well. They just have to be used properly, or the limitations will show.

Cloud computing

For weather forecast models, the process isn’t too different from our bird identification example, except that the models are trained on pairs of weather snapshots taken a short time apart and learn to predict the later snapshot from the earlier one. Because they aren’t solving lots of physics equations in every location, these models run far more quickly than traditional weather models.

A number of companies, including Google, Nvidia, Huawei, and Microsoft, have developed initial models (sometimes in collaboration with independent academics) that could compare favorably to the forecast models we currently use. Once we began to understand where the models excel and struggle, some of the major weather forecast centers started developing their own. The European Centre for Medium-Range Weather Forecasts (ECMWF) put its first machine-learning-based model, the Artificial Intelligence Forecasting System (AIFS), into service in February 2025, running it alongside its long-standing Integrated Forecasting System (IFS) model.
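The sketch below shows that training setup in miniature, under heavy simplifying assumptions: a single variable on a ring of 20 grid cells instead of many variables on a global grid, a three-weight linear rule (a hypothetical stand-in for a neural network), and synthetic “weather” in which every value simply drifts one cell east per time step. The model is never told the rule. It only sees pairs of snapshots taken one step apart, and from them it learns that each cell’s next value comes from its western neighbor.

```python
import random


def drift_east(field):
    """Synthetic 'truth': every value moves one cell east (cell index increases
    eastward) on a periodic ring of grid cells."""
    return [field[i - 1] for i in range(len(field))]


def make_training_pairs(snapshots, lag=1):
    """Pair each snapshot with the one `lag` steps later: (input, target)."""
    return [(snapshots[t], snapshots[t + lag]) for t in range(len(snapshots) - lag)]


def fit_stencil(pairs, learning_rate=0.5, epochs=300):
    """Learn next[i] ~ w[0]*cur[i-1] + w[1]*cur[i] + w[2]*cur[i+1] by gradient
    descent on the mean squared error. No physics is supplied, only examples."""
    samples = []
    for cur, nxt in pairs:
        n = len(cur)
        for i in range(n):
            samples.append(((cur[i - 1], cur[i], cur[(i + 1) % n]), nxt[i]))
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for features, target in samples:
            err = sum(wk * fk for wk, fk in zip(w, features)) - target
            for k in range(3):
                grad[k] += 2.0 * err * features[k] / len(samples)
        w = [wk - learning_rate * gk for wk, gk in zip(w, grad)]
    return w


rng = random.Random(0)
pairs = []
for _ in range(5):  # five independent synthetic "histories"
    sequence = [[rng.uniform(-1.0, 1.0) for _ in range(20)]]
    for _ in range(6):
        sequence.append(drift_east(sequence[-1]))
    pairs.extend(make_training_pairs(sequence))

weights = fit_stencil(pairs)
print([round(w, 3) for w in weights])  # roughly [1.0, 0.0, 0.0]
```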
The AIFS model is trained using a reanalysis, a dataset built by taking all available weather observations and filling out a physically consistent picture where we don’t have measurements. This critical tool greatly simplifies the machine learning task of predicting the next global snapshot (six hours ahead) from previous snapshots. Each snapshot contains information on temperature, air pressure, wind, water vapor, cloud cover, precipitation, solar radiation, and soil moisture. Instead of applying the physics connecting any of those things, the model simply distills the spatial patterns through which they’ve changed in the past.

That means weird things can happen. A machine learning model doesn’t “know” that the number in a column is rainfall and that rainfall can’t be negative, or that the air moving out of one cell of the model grid must be balanced by air moving into the neighboring cell because the conservation of mass and energy is a thing. When a model is optimized for the smallest overall error, it may get there by allowing physically impossible results.

Dealing with this issue commonly involves constraining model outputs. The ECMWF model, for example, takes negative predicted precipitation values and remaps them to zero. Physical guardrails of one form or another are a major focus for improving machine learning models.

AIFS modeled precipitation before (left) and after (middle) an upgrade that included constraining negative precipitation, with the traditional IFS model (right) for comparison. Credit: Moldovan, et al.

The payoff for these machine learning models is that they absolutely clean up on computational efficiency. ECMWF says a forecast run of the IFS uses about 1,000 times as much energy as a run of the AIFS and takes about 30 minutes, versus about three for the AIFS.
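Here is a minimal sketch of two ideas from this section: feeding each six-hour prediction back in as the next input to build a longer forecast, and a guardrail that remaps negative precipitation to zero. The “learned” step is a hypothetical linear rule with made-up weights on a five-cell ring, chosen because it happily produces negative rainfall. It is not the AIFS network, and the clamp mirrors only the remapping idea described above, not ECMWF’s actual code.

```python
STEP_HOURS = 6  # each prediction jumps six hours ahead, as in the AIFS


def learned_step(precip):
    """Hypothetical stand-in for a trained network: a fixed linear rule with
    made-up weights on a periodic five-cell ring. Nothing in it knows that
    rainfall can't be negative."""
    n = len(precip)
    return [0.5 * precip[i - 1] + 0.8 * precip[i] - 0.3 * precip[(i + 1) % n]
            for i in range(n)]


def clamp_negative(precip):
    """Guardrail: remap any negative precipitation to zero."""
    return [max(0.0, p) for p in precip]


def rollout(initial, hours, guardrail=True):
    """Autoregressive forecast: feed each prediction back in as the next input."""
    states = [initial]
    for _ in range(hours // STEP_HOURS):
        predicted = learned_step(states[-1])
        states.append(clamp_negative(predicted) if guardrail else predicted)
    return states


initial = [0.0, 0.0, 4.0, 0.0, 0.0]  # mm of rain, all of it in one grid cell
print(learned_step(initial))                  # [0.0, -1.2, 3.2, 2.0, 0.0]
print(clamp_negative(learned_step(initial)))  # [0.0, 0.0, 3.2, 2.0, 0.0]
forecast = rollout(initial, hours=24)         # four six-hour steps
```

Zeroing the negative value fixes the sign but not the water budget: the raw step happens to keep the field’s total at 4 mm, while the clamped field holds 5.2 mm.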
The savings really add up for the ensemble versions of these forecast models, which run 50 simulations to better capture the range of possible outcomes. Given that the forecast quality has been good, these machine learning models are enormously useful.

Here there be dragons

Forecasts of run-of-the-mill weather conditions have a lot of practical value, but an accurate forecast of extreme weather can be a matter of life or death, and the more extreme the weather, the more true that is.

But just as a bird-identifying algorithm can’t identify a bird it wasn’t shown during training, AI-based weather models can fail at predicting extreme weather that wasn’t in their training dataset. Because extremes are rare, even a very large training dataset may lack certain kinds of events, or at least any examples as extreme as what might be about to happen in the real world. (If climate change is influencing a given weather pattern, the past is a poor guide to the future.) And if we include all the extreme events in the training phase, we’re left without any to test the system afterward.

A recent study comparing common machine learning models with ECMWF’s high-resolution physics-based model found that the machine learning models “tend to underestimate both the frequency and intensity of record-breaking events, [...] with growing errors for larger record exceedance.” Because these models won’t go beyond what they saw in training, they may smooth out extreme events, capping them so they stay within the bounds of normal conditions.

That behavior is problematic for extreme-weather forecasts. But for climate models, it’s a deal-breaker.

Out of bounds

Weather forecasting involves looking at the current state of the atmosphere and projecting it just a few hours (or days) into the future. Climate models do something very different. Climate science asks broad “what if” questions about the effects of changing how much energy is in the atmosphere, or about what factors control the atmosphere’s current state.
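Record-breaking extremes and climate “what if” questions share one problem: both ask a model to go beyond the range of its training data. Here is a deliberately extreme illustration, using analog forecasting (a long-standing technique that predicts by finding the most similar past situation) as a stand-in for a trained model. Neural networks don’t literally look up past days, but this toy makes the failure mode plain: an approach that can only reproduce outcomes from its training record can never forecast a new record. The temperatures are made up.

```python
def analog_forecast(history, today):
    """Return what followed the past day most similar to `today`.
    history: list of (value_on_day, value_on_next_day) pairs."""
    _, outcome = min(history, key=lambda pair: abs(pair[0] - today))
    return outcome


# Made-up training record: daily highs from 20 to 35 °C, each followed by a
# day 1 °C warmer.
history = [(float(t), t + 1.0) for t in range(20, 36)]

print(analog_forecast(history, 30.0))  # 31.0, matching the pattern in the record
print(analog_forecast(history, 42.0))  # 36.0, although the record's trend suggests 43.0
```

Everything warmer than the record gets squeezed back to 36 °C, the same kind of smoothing toward familiar conditions that the study describes, and the same wall a purely data-driven model hits when asked about a climate that has never existed.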
In modeling terms, this relates to boundary conditions: the factors that shape the long-term statistics of weather rather than the evolution of weather on a specific day. If we emit a given amount of CO2, how will those statistics change? What would the statistics look like today if we had never emitted CO2? These counterfactuals and projections generally can’t be learned from a historical training dataset. The laws of physics are pretty indispensable for this kind of science, so ditching all of our physics-based calculations is out of the question.

Still, researchers are finding ways to put machine learning to use. Caltech’s Tapio Schneider is part of a project called the Climate Modeling Alliance, or CliMA. This ambitious effort is building a new climate model from the ground up, making a clean break from existing Fortran code in favor of Julia and cloud-native architectures that can take advantage of GPUs. The result will be a hybrid climate model: mostly physics-based, but with machine learning components.

“I think our essential bet is that it’s important to retain physical guardrails so that we can confidently predict the climate for which we do not have data,” Schneider told Ars, “which forces you down this path of putting machine learning at relatively small scales inside the model rather than replacing the entire model with [machine learning].”

Climate models are really multiple models connected together: one component might model the atmosphere, another the ocean, another some land surface processes, and so on. Within each component, many processes occur at a scale smaller than an individual cell of the model grid. We can’t simulate every droplet inside a cloud or every plant’s response to dry weather. Instead, these processes are handled by bulk approximations called “parameterizations,” which calculate average behavior across a grid cell based on physical values like humidity or temperature.
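A parameterization is, at heart, a function from grid-cell averages to a bulk rate. The Python sketch below uses a degree-day snowmelt rule (a classic empirical approach in which melt is proportional to warmth above freezing) with illustrative coefficients, wrapped in a step function that closes the water budget for any melt rule it is handed, including a hypothetical learned one. This is not CliMA’s code; it only shows one simple way a “water in equals water out” requirement can be imposed on a component whose raw output can’t be trusted.

```python
from dataclasses import dataclass


@dataclass
class SnowState:
    swe_mm: float  # snow water equivalent stored in the grid cell (mm)


def degree_day_melt(temp_c, ddf_mm_per_degc_day=3.0, melt_threshold_c=0.0):
    """Bulk melt estimate for a whole grid cell from its average temperature.
    The degree-day factor and threshold here are illustrative, not tuned."""
    return ddf_mm_per_degc_day * max(temp_c - melt_threshold_c, 0.0)


def conserving_step(state, snowfall_mm, temp_c, melt_fn=degree_day_melt):
    """Advance one day. Whatever melt_fn proposes, the update keeps
    old storage + snowfall == new storage + melt, and melt can never be
    negative or exceed the snow that actually exists."""
    available = state.swe_mm + snowfall_mm
    melt = min(max(melt_fn(temp_c), 0.0), available)
    return SnowState(swe_mm=available - melt), melt


state, melt = conserving_step(SnowState(swe_mm=10.0), snowfall_mm=2.0, temp_c=2.0)
print(state.swe_mm, melt)  # 6.0 6.0

# A misbehaving (hypothetical) learned melt model that asks for far too much melt:
state, melt = conserving_step(SnowState(swe_mm=10.0), 2.0, 2.0, melt_fn=lambda t: 50.0)
print(state.swe_mm, melt)  # 0.0 12.0
```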
The CliMA group’s model is replacing some of those parameterizations with machine learning algorithms. Snow cover modeling, for example, requires a surprisingly intensive set of physical equations because of all the processes involved in controlling it. So the team replaced this particular parameterization module with machine learning, plus a requirement that water in equals water out.

“It works really well, actually, because snow conditions in the present climate sample [can help predict] what will happen in the future very well,” Schneider said. “What happens at lower altitudes right now will happen at higher altitudes later, or what happens at lower latitudes will happen at higher latitudes later, but [the] relation between temperature, snow melt, and the like, it’s well sampled in the present climate.”

“In other contexts, it doesn’t work so well,” Schneider explained. “Clouds, for example, will get deeper as the climate warms. So there will be taller clouds than we’ve ever seen on Earth as the climate gets warmer, meaning, if you try to learn the relation between cloud condensate concentrations and the like and environmental conditions in the present climate, you’re not sampling at all what the cloud will look like in the future.”

Still, the researchers have found narrower opportunities within cloud parameterizations. They’re implementing a machine learning solution for the exchange of air between the inside of a cloud and the air around it, a process that sounds minor but has a significant impact on cloud cover.

Overall, the CliMA team’s goal is to incorporate machine learning where they see clear advantages for computational efficiency and scientific quality while preserving the methods that work better everywhere else.

Let’s get meta

Some equations in physics-based climate models have terms that can be tuned to achieve the best fit to reality. Optimizing that tuning, called model calibration, is a process that machine learning can fit into nicely.
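The simplest version of calibration is brute force: run the model at many parameter settings and keep the one whose output best matches observations. The sketch below does that for a hypothetical two-parameter toy “model” whose linear formulas, parameter names, and target values are all invented for illustration; its two outputs (a tropical cyclone count and a top-of-atmosphere energy imbalance) are stand-ins, not real model physics. Even this tiny 11-by-11 grid takes 121 runs, and the count multiplies with every added parameter.

```python
import itertools


def toy_model(entrainment, autoconversion):
    """Hypothetical stand-in for a year-long simulation: returns
    (tropical cyclone count, top-of-atmosphere energy imbalance in W/m^2).
    The formulas are invented for illustration."""
    cyclones = 80.0 + 40.0 * entrainment - 10.0 * autoconversion
    imbalance = 2.0 * autoconversion - 1.5 * entrainment
    return cyclones, imbalance


# Pretend these came from observations (here generated from known settings so
# the answer can be checked).
observed = toy_model(0.6, 0.3)


def score(outputs):
    """Sum of squared errors, each scaled by a (made-up) tolerance so metrics
    with different units are comparable."""
    cyclones, imbalance = outputs
    return ((cyclones - observed[0]) / 10.0) ** 2 + ((imbalance - observed[1]) / 0.5) ** 2


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0 for each parameter
runs = {params: score(toy_model(*params)) for params in itertools.product(grid, grid)}
best = min(runs, key=runs.get)
print(len(runs), best)  # 121 (0.6, 0.3)
```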
A recent study from the NASA Goddard Institute for Space Studies (GISS) climate modeling group solved for the best-tuned combination of values for key terms across its entire atmosphere model, a daunting task that machine learning has made feasible. To do this, the researchers varied the values of parameters related to things like processes inside clouds, producing 450 combinations of values. Each combination was used to simulate one year of atmospheric conditions, and the result was scored against metrics like the number of tropical cyclones that occurred or the difference between the energy entering and leaving the top of the atmosphere.

Each of the metrics (y-axis) with its sensitivity to changes in the parameters (x-axis). For example, the number of tropical cyclones goes up (red) or down (blue) if you increase the value of a specific parameter. Credit: Elsaesser, et al./JAMES

A machine learning model was trained on the errors in those metrics relative to real-world observations. That model could then be used to identify the exact set of values (within the ranges used in the simulations) for all the parameters that would produce the lowest error. This is, after all, exactly what neural network machine learning is designed to do: find the best fit for a dauntingly large number of knobs.

Another attractive use for machine learning is to train a model to imitate other models. That might sound goofy, but there are plenty of good reasons to do it. It lets you take a complex model that needs heavy compute resources and a long time to run, and train an incredibly lightweight model to estimate its output.
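The GISS approach and emulators share one trick: run the expensive model a limited number of times, fit a cheap stand-in to the results, then query the stand-in as often as you like. Here is that trick in one dimension, as a sketch under loud assumptions: a single hypothetical parameter, a made-up error function playing the part of a full simulation scored against observations, and a quadratic least-squares fit standing in for the machine learning model GISS trained.

```python
def expensive_model_error(param):
    """Hypothetical stand-in for one full simulation scored against observations.
    In reality each call could cost hours to days of supercomputer time."""
    return 5.0 * (param - 0.37) ** 2 + 0.2


def solve_3x3(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def train_quadratic_emulator(params, errors):
    """Least-squares fit of error ~ c0 + c1*p + c2*p**2 via the normal equations."""
    basis = [[1.0, p, p * p] for p in params]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    atb = [sum(row[i] * e for row, e in zip(basis, errors)) for i in range(3)]
    c0, c1, c2 = solve_3x3(ata, atb)
    return lambda p: c0 + c1 * p + c2 * p * p


# Six "expensive" runs...
sampled = [i / 5 for i in range(6)]  # 0.0, 0.2, ..., 1.0
emulator = train_quadratic_emulator(sampled, [expensive_model_error(p) for p in sampled])

# ...then 1,001 nearly free queries of the emulator to find the best setting.
candidates = [i / 1000 for i in range(1001)]
best = min(candidates, key=emulator)
print(best)  # 0.37
```

Swap the quadratic for a neural network and the single parameter for dozens, and this is roughly the outline of surrogate-based calibration; point the same kind of stand-in at a model’s projections instead of its errors, and it becomes an emulator.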
These “emulators” can be trained on a massive climate model’s projections for the standard set of future greenhouse gas emissions scenarios and then used to explore any new emissions scenario without getting in line for a week of supercomputer time. An emulator won’t give you the detail of a full model simulation, but it can quickly provide bottom-line answers to key questions. As a recent perspective article on emulators published in Communications Earth & Environment put it, “The result is a dynamic relationship between simulators and emulators: simulators generate data that trains emulators, and emulators, in turn, help target where simulation efforts are most needed.”

Emulators can also stand in for computationally expensive parameterizations. Instead of training a machine learning model directly on data, as in the snow example described earlier, we could train it to emulate a beefy physics-based ice sheet model that is simply too big to fit into a global climate model. If you could get half of the benefit of an advanced model for less than 1 percent of its computation cost, the juice would be well worth the squeeze. This approach is currently being pursued for areas like the physics of energy radiating through the atmosphere.
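One of the simplest emulators of the scenario kind leans on a long-standing finding from climate models: global warming scales roughly in proportion to cumulative CO2 emissions. The sketch below fits that single proportionality constant to a few simulator runs and then answers a question about a scenario the simulator never ran. The “simulator outputs” are invented numbers, not results from any real model, and many real emulators are considerably more elaborate.

```python
def train_linear_emulator(cumulative_emissions, warming):
    """Least-squares fit of warming = k * cumulative_emissions (a line through
    the origin): k = sum(E * T) / sum(E * E)."""
    numerator = sum(e * t for e, t in zip(cumulative_emissions, warming))
    denominator = sum(e * e for e in cumulative_emissions)
    k = numerator / denominator
    return lambda emissions: k * emissions


# Invented "simulator" results for four standard scenarios:
# cumulative CO2 emissions (GtCO2) and global warming (°C).
scenario_emissions = [1000.0, 2000.0, 3000.0, 4000.0]
scenario_warming = [0.46, 0.89, 1.36, 1.79]

emulate = train_linear_emulator(scenario_emissions, scenario_warming)

# A new scenario, answered instantly instead of after waiting for supercomputer time.
print(round(emulate(2500.0), 2))  # 1.12
```

The answer costs a multiplication rather than a simulation, but it inherits every limitation of the runs it was trained on and carries none of the regional detail a full model provides.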
Source: Ars Technica