The Firebird's Lair: Weather Generation 2: What Are We Trying to Match, Anyway?

Last time, I proposed three criteria for weather generation: (1) simulationist: it has to match real-world data; (2) algorithmic: I have to be able to generate more tables easily; (3) practical: no more than two dice rolls and two tables.

Today I want to talk about the first bit. How can we tell if a weather system is simulationist?

Throughout this series I’ll be using Chicago as an example. It’s a big city and so should have good data; it has seasons; and I’ve lived there in the past, which should help me identify data that doesn’t add up.

First, I had to get data for real locations. This takes some effort. You can get data from individual weather centers in the US from NOAA. These data, when they exist, seem to be of high quality. But there are oddities. For example, the Chicago dataset has temperature and precipitation since 1946 (albeit with 85% coverage) but only has cloud cover until 1996. So each dataset we use would require some choices and care.

Using local weather stations would also make it challenging to apply anything to places outside the US. The formatting may be different and the data may be less available.

Another option is to use reanalysis datasets. These have already taken data from many local stations, as well as things like satellite data, and combined them with weather models to get continuous, global coverage. However, these datasets can be enormous: one of the best, ERA5, runs to tens of terabytes. I ended up not trying to use them.

Instead, I went with data from Open-Meteo, a service which has already pulled some key parameters from ERA5 for easy access. This gives me data of the same type, since 1940, for any location on Earth. Presumably the quality varies somewhat based on how much local coverage went into the ERA5 data? I’d bet the 1940 data are worse, for example. But I think this is sufficient for RPG purposes.
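For anyone who wants to pull the same kind of data, here is a minimal sketch that only builds a request URL for Open-Meteo's historical archive. It is not a record of the exact request behind this post. The endpoint, the parameter names, and especially the daily variable names (`temperature_2m_mean`, `cloud_cover_mean`, `rain_sum`, `snowfall_sum`) are assumptions based on Open-Meteo's public documentation, so check them against the current docs. `build_archive_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint for Open-Meteo's ERA5-based historical archive API.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(lat, lon, start, end):
    """Build a request URL for daily weather at one location (hypothetical helper).

    The daily variable names are assumptions; verify them in the Open-Meteo docs.
    """
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,  # YYYY-MM-DD
        "end_date": end,
        "daily": ",".join([
            "temperature_2m_mean",  # daily mean temperature
            "cloud_cover_mean",     # daily mean cloud cover, percent
            "rain_sum",             # total rain
            "snowfall_sum",         # total snowfall
        ]),
        "temperature_unit": "fahrenheit",
        "precipitation_unit": "inch",
        "timezone": "America/Chicago",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


# Approximate coordinates for Chicago.
url = build_archive_url(41.88, -87.63, "1940-01-01", "2024-12-31")
print(url)
```

Requesting Fahrenheit and inches up front means the thresholds used later in this post can be applied to the numbers directly, without unit conversion.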

---

So, what do we actually get? The key variables are temperature, wind speed, precipitation (snow and rain), and cloud cover. Open-Meteo also provides a weather code for each day (e.g., weather code 3 = “Overcast”). At first these seemed helpful, but the distribution wasn’t what I expected. For the Chicago summer, for example, the most common weather code was Overcast, which was about 10x as likely as Clear Sky. It turns out the daily code is probably set by the most severe weather encountered that day, and other people have run into the same issue.
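A toy illustration of why a "most severe" rule would inflate Overcast. It assumes, hypothetically, that the daily code is simply the highest of the hourly codes. That rule is a guess consistent with the behavior described above, not confirmed Open-Meteo logic. The code numbers are the WMO codes Open-Meteo uses for the dry states.

```python
# WMO weather codes for the dry, cloud-only states (as used by Open-Meteo).
WMO_NAMES = {0: "Clear sky", 1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast"}


def daily_code_most_severe(hourly_codes):
    """Hypothetical rule: the daily code is the most severe (highest) hourly code."""
    return max(hourly_codes)


# 23 clear hours and a single overcast hour.
hours = [0] * 23 + [3]
print(WMO_NAMES[daily_code_most_severe(hours)])  # Overcast
```

Under a rule like this, a single cloudy hour is enough to label the whole day Overcast, which would explain Overcast swamping Clear Sky.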

So I decided to define my own weather codes instead. First I had to decide what to include. Temperature and precipitation are obvious. Cloud cover seemed necessary as well. There is an argument for leaving it out because it is not gameable, but I felt it is so important to the vibe of a place, and to getting plausible transitions between dry and wet weather, that I wanted to keep it.

In contrast, I decided not to use wind. Similar arguments can be made in its favor (it precedes rain) and it might be more gameable (-2 to ranged attacks?). But I felt it was less fundamental than cloud cover and 3 variables per day was already a lot. Wind is also complicated, because you have to distinguish gusts vs more steady winds and there might be a need for wind direction. Later on I may revisit it, especially for marine systems where it will be important for sailing.

I was disappointed not to get data for fog. It doesn’t seem to be available as a variable in Open-Meteo. Some of the weather codes, at least, indicate fog, but as noted above those aren’t very reliable. And for Chicago, at least, not a single day in the past 85 years had a fog weather code. Fog isn’t that complicated to add (just yes/no) and is gameable (less visibility) in important ways (good for ambushes, sneaking). So I consider this a real loss. If anyone has ideas for how to get good fog data, I would love to hear them.

---

With those variables, I then set up a categorization scheme. I didn’t want to be too precise, keeping in mind my goal of practicality. A change of 10 degrees F is not that gameable.

For temperature, using the mean temperature of each day, I settled on:

>100 F: Extreme Heat
>80 F: Hot
>55 F: Warm
>35 F: Cool
>0 F: Cold
<0 F: Extreme Cold
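A minimal sketch of the temperature bins as a function. The post doesn't say which side a value exactly on a boundary (say, exactly 80 F) falls, so this sketch assumes it goes to the lower category. `classify_temperature` is a hypothetical name.

```python
def classify_temperature(mean_f):
    """Map a daily mean temperature (deg F) to a category.

    Assumption: values exactly on a threshold fall into the lower category.
    """
    if mean_f > 100:
        return "Extreme Heat"
    if mean_f > 80:
        return "Hot"
    if mean_f > 55:
        return "Warm"
    if mean_f > 35:
        return "Cool"
    if mean_f > 0:
        return "Cold"
    return "Extreme Cold"


print(classify_temperature(48.5))  # Cool
```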

For cloud cover, taking the average cloud cover:

>60%: Overcast
>25%: Partly Cloudy
<25%: Clear
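The cloud bins work the same way. As with temperature, a value of exactly 60% or 25% is assumed to fall into the lower category. `classify_cloud` is a hypothetical name.

```python
def classify_cloud(mean_cover_pct):
    """Map daily mean cloud cover (0-100 %) to a category.

    Assumption: values exactly on a threshold fall into the lower category.
    """
    if mean_cover_pct > 60:
        return "Overcast"
    if mean_cover_pct > 25:
        return "Partly Cloudy"
    return "Clear"


print(classify_cloud(72))  # Overcast
```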

And for precipitation, taking the sum across the day:

>0.8” rain: Heavy rain
>0.2” rain: Rain
>3” snow: Heavy snow
>0.3” snow: Snow
<0.2” rain and <0.3” snow: No precipitation

If a day had both rain and snow, I defaulted to snow (e.g., 0.3” rain and 0.2” snow would be snow). (NB: I originally tried 0.1” as the threshold for snow, but this led to a Chicago map that was subjectively “too snowy”.)
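A minimal sketch of the precipitation rules. It reads "defaulted to snow" as "check the snow thresholds first". In that reading, a day that clears a snow threshold is labeled snow even if it also clears a rain threshold, and a day that doesn't clear either snow threshold falls through to the rain checks. Both that reading and the boundary handling are assumptions. `classify_precip` is a hypothetical name.

```python
def classify_precip(rain_in, snow_in):
    """Map daily rain and snowfall totals (inches) to a category.

    Assumptions: snow thresholds are checked before rain thresholds, and
    values exactly on a threshold fall into the lower category.
    """
    if snow_in > 3:
        return "Heavy snow"
    if snow_in > 0.3:
        return "Snow"
    if rain_in > 0.8:
        return "Heavy rain"
    if rain_in > 0.2:
        return "Rain"
    return "None"


print(classify_precip(0.5, 0.4))  # Snow
```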

---

I sorted each day into these categories. Then I split the days up by season to figure out the dynamics for each season. (In total, there are 6 × 3 × 5 = 90 possible types of day.) Here, for example, is part of the output for Chicago autumn:

Cool / Overcast / None          0.168862
Cool / Partly Cloudy / None     0.128163
Cold / Overcast / None          0.108924
Cool / Clear / None             0.097676
Cold / Partly Cloudy / None     0.081841
Warm / Overcast / None          0.072961
Warm / Partly Cloudy / None     0.064082
Warm / Clear / None             0.060530
…

And it continues like that. In total, 34 types of day are represented, with the rarest occurring 0.015% of the time (1 day across 85 years).
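A table like the one above can be produced by counting labels within each season and dividing by the season's total. This sketch assumes meteorological seasons (December to February is winter, September to November is autumn, and so on). The post doesn't say how seasons were defined. The sample days are made up, and `seasonal_frequencies` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumption: meteorological seasons.
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def seasonal_frequencies(days):
    """days: iterable of (date, label) pairs, e.g. (date, 'Cool / Overcast / None').

    Returns {season: [(label, fraction), ...]}, most common first.
    """
    counts = defaultdict(Counter)
    for d, label in days:
        counts[SEASON_BY_MONTH[d.month]][label] += 1
    result = {}
    for season, counter in counts.items():
        total = sum(counter.values())
        result[season] = [(label, n / total) for label, n in counter.most_common()]
    return result


# Tiny made-up sample, not real data.
sample = [
    (date(2020, 10, 1), "Cool / Overcast / None"),
    (date(2020, 10, 2), "Cool / Overcast / None"),
    (date(2020, 10, 3), "Cold / Clear / None"),
    (date(2020, 10, 4), "Cool / Partly Cloudy / Rain"),
]
for label, frac in seasonal_frequencies(sample)["autumn"]:
    print(f"{label:<32}{frac:.6f}")
```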

When we think about simulationism, this is the first thing we need to match: the daily probability of each weather state. An authentic weather table for Chicago autumn should produce days that are Cool and Overcast with no precipitation about 17% of the time.
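To put a number on "how well does a table match", one common yardstick is total variation distance. It is half the summed absolute difference between two probability distributions: 0 means the distributions are identical and 1 means they don't overlap at all. This is one possible choice, not a measure the post commits to. The "observed" values round the first two rows of the autumn table and lump everything else into "Other". The candidate table is invented.

```python
def total_variation(observed, table):
    """Half the sum of absolute differences between two distributions.

    Both arguments map day-type labels to probabilities.
    """
    labels = set(observed) | set(table)
    return 0.5 * sum(abs(observed.get(k, 0.0) - table.get(k, 0.0)) for k in labels)


observed = {"Cool / Overcast / None": 0.17, "Cool / Clear / None": 0.10, "Other": 0.73}
candidate = {"Cool / Overcast / None": 0.25, "Cool / Clear / None": 0.05, "Other": 0.70}
print(round(total_variation(observed, candidate), 3))  # 0.08
```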

---

The second thing we need to match is the transition probabilities: if today is Cool and Overcast, how likely is tomorrow to be Cool and Overcast? This is the heart of a weather system that feels ‘realistic’. So what do those data look like? Across all types of day, something like this:

Here, I shortened the descriptors for each day. The first entry gives temperature:

XC = Extreme Cold
C = Cold
L = Cool
W = Warm
H = Hot
XH = Extreme Heat

The second gives cloud cover:

C = Clear
P = Partly Cloudy
O = Overcast

The third gives precipitation:

LS = Light snow
LR = Light rain
HS = Heavy snow
HR = Heavy rain

If there is no precipitation, that entry is left blank. So XC.C. corresponds to extreme cold, clear, and no precipitation.
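A minimal sketch of the labeling scheme. It assumes the "Snow" and "Rain" categories from the precipitation rules are the ones abbreviated LS (light snow) and LR (light rain). Note that "C" means Cold in the first slot and Clear in the second, so position matters. `abbreviate` is a hypothetical helper.

```python
TEMP_ABBR = {"Extreme Cold": "XC", "Cold": "C", "Cool": "L",
             "Warm": "W", "Hot": "H", "Extreme Heat": "XH"}
CLOUD_ABBR = {"Clear": "C", "Partly Cloudy": "P", "Overcast": "O"}
# Assumption: "Snow" -> LS and "Rain" -> LR; no precipitation leaves the slot blank.
PRECIP_ABBR = {"Snow": "LS", "Rain": "LR", "Heavy snow": "HS",
               "Heavy rain": "HR", "None": ""}


def abbreviate(temp, cloud, precip):
    """Build a short label of the form temperature.cloud.precipitation."""
    return f"{TEMP_ABBR[temp]}.{CLOUD_ABBR[cloud]}.{PRECIP_ABBR[precip]}"


print(abbreviate("Extreme Cold", "Clear", "None"))  # XC.C.
print(abbreviate("Cool", "Overcast", "Rain"))       # L.O.LR
```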

To read this plot, note that the types of day are listed along both the rows and the columns. Start by finding the row for the current type of day. The chance of transitioning to each other type of day (the columns) is given by the color of that box. The number of times each type of day appeared in the dataset is given in parentheses next to its row.

For example, the XC.C. type only appears once. It was followed by a cold, overcast day (C.O.), so that box is dark blue and has a transition probability of 1.

Other types of day have more complex distributions. The types are ordered by temperature, with colder days at the top and the left. You can see that, generally speaking, days cluster by temperature: cold days transition to cold days and do not transition to warm ones.

This is a big and complicated object. Keep in mind that we can generate this for any location we want on the planet. So if we determine a good way to match it for Chicago, we should be able to apply that anywhere.
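The matrix itself comes from counting consecutive pairs of days and normalizing each row. This minimal sketch assumes the input is one continuous, gap-free run of days. With real data you would also need to decide how to handle missing days and pairs that straddle a season boundary. The sequence is made up, and `transition_probabilities` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate P(tomorrow | today) from a chronological list of day labels.

    Returns (probs, counts): probs[today][tomorrow] is a probability, and
    counts[today] is how many times `today` was followed by another day.
    """
    pair_counts = defaultdict(Counter)
    for today, tomorrow in zip(labels, labels[1:]):
        pair_counts[today][tomorrow] += 1
    probs, counts = {}, {}
    for today, nexts in pair_counts.items():
        total = sum(nexts.values())
        counts[today] = total
        probs[today] = {t: n / total for t, n in nexts.items()}
    return probs, counts


# Made-up sequence for illustration, not real data.
seq = ["XC.C.", "C.O.", "C.O.", "L.O.LR", "C.O.", "L.P."]
probs, counts = transition_probabilities(seq)
print(probs["XC.C."])  # {'C.O.': 1.0}
print(probs["C.O."])
```

Here `counts` is the number of outgoing transitions, which can differ by one from the plain number of appearances shown in the plot (the final day of the data has no successor).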

Over the next posts, we’ll see how well we can do.

End note: If we wanted to go even further, we could include state information for multiple previous days: given that it was cold and overcast yesterday and cool and clear today, what will tomorrow be like? I don’t know of any RPG system that attempts this, and I think any attempt would be far too detailed to be practical. Still, I note it as a limitation.
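For the curious, here is a sketch of what that second-order version looks like. It counts triples of consecutive days, so the state becomes the (yesterday, today) pair. As before, it assumes a gap-free run of days, and the sequence and the helper `second_order_transitions` are hypothetical.

```python
from collections import Counter, defaultdict


def second_order_transitions(labels):
    """Estimate P(tomorrow | yesterday, today) from a chronological list of day labels.

    Assumes consecutive entries are consecutive calendar days (no gaps).
    """
    triples = defaultdict(Counter)
    for yesterday, today, tomorrow in zip(labels, labels[1:], labels[2:]):
        triples[(yesterday, today)][tomorrow] += 1
    table = {}
    for state, nexts in triples.items():
        total = sum(nexts.values())
        table[state] = {t: n / total for t, n in nexts.items()}
    return table


# Made-up sequence for illustration, not real data.
seq = ["C.O.", "L.C.", "L.O.", "C.O.", "L.C.", "L.C."]
table = second_order_transitions(seq)
print(table[("C.O.", "L.C.")])  # {'L.O.': 0.5, 'L.C.': 0.5}
```

With the 34 autumn day types, the (yesterday, today) state can take up to 34 × 34 = 1,156 values instead of 34. That growth is why this looks impractical at the table.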

Tuesday, February 3, 2026
