Tuesday, February 3, 2026

Weather Generation 5: Temperature Swings

 I next generated weather matrices for four cities: Chicago, Cairo, Seattle, and Singapore. I picked these arbitrarily.

Generating their matrices taught me two things. First, it is important in some cases to account for daily shifts in temperature. For Chicago, it isn’t that relevant: here are the CDFs for temperature range, by season.

In Chicago, the 50th percentile is about 10 degrees of difference, regardless of season. And here is the same for Cairo:

We see much more substantial temperature swings, especially in spring and summer. I found this especially important in Cairo when we are hitting mean temperatures of “Hot” (>80), because it means we can hit extreme heat during the daytime.

I don’t have a very sophisticated way to deal with this yet. I have just said: for Cairo, the GM may increase the temperature by 1 band during the day and decrease it by 1 at night if they want. I am open to suggestions about how to deal with it more effectively.

Weather Generation 4: Introducing the Weather Matrix

Looking at the transition matrices gave me another thought—what if we just encoded the transition matrix directly as a table? Let’s go back to our comparison of the d100 table I had:


(Note this is subtly different from what I had before—I updated the rules about ordering, and condensed light rain and light snow to just rain and snow). If we could turn the entire left panel into a single table, we’d be able to pull values from the distribution exactly.

Implementing this was somewhat involved but it doesn’t require any innovations compared to what we did before. The process is:

1) Choose a minimum probability that is worth including in the table (I picked 1%). Restrict yourself to only those days.

2) For each day you are moving from, create a d100 table.

3) Compile all of these d100 tables into a single matrix.

Other methodological notes: in part (2) I had to account for transitions from days which were included to days that were not included (i.e., because they were too unlikely and culled in (1)). To do this I normalized the outgoing probabilities after step (1).

Also in (2), you don’t need to specify a number of repeats here because the ‘repeat’ is directly encoded into the table. So it is just as I described in part 3 with a repeat of 0.

Regarding (3), I made some modifications such that the roll tables read from 01-100 going down the table. That also means the roll tables are transposed compared to the image above, which I find much easier for play.
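Steps (1)-(3) can be sketched in a few lines. This is my own minimal reconstruction, not the original implementation, assuming a chronological list of categorized day labels for one season:

```python
import numpy as np

def build_weather_matrix(days, min_prob=0.01, sides=100):
    """Sketch of the weather-matrix construction described above."""
    states, counts = np.unique(days, return_counts=True)
    kept = [s for s, c in zip(states, counts)
            if c / len(days) >= min_prob]          # step (1): cull rare days
    idx = {s: i for i, s in enumerate(kept)}
    trans = np.zeros((len(kept), len(kept)))
    for today, tomorrow in zip(days, days[1:]):    # count kept-to-kept transitions
        if today in idx and tomorrow in idx:
            trans[idx[today], idx[tomorrow]] += 1
    trans /= trans.sum(axis=1, keepdims=True)      # renormalize outgoing probs
    table = {}
    for i, s in enumerate(kept):                   # step (2): one d100 row per state
        rolls = np.floor(trans[i] * sides).astype(int)
        rem = trans[i] * sides - rolls
        for j in np.argsort(rem)[::-1][: sides - rolls.sum()]:
            rolls[j] += 1                          # largest remainders get leftovers
        table[s] = dict(zip(kept, rolls.tolist())) # step (3): compile into one matrix
    return table
```

Each row of the result sums to exactly 100, so it reads directly as a d100 table.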

And here is what it looks like:


How do we do on accuracy? Here’s a comparison of the two matrices:


The transition matrices here are way nicer. That said, there is some error in the daily probabilities, typically on the order of ~10%. This is, I think, an unavoidable consequence of using just the transitions and throwing out the rare days. If, for example, Cold, Overcast often follows low-probability extreme cold days that were cut from the table, then our rolled data will underestimate it.

Next, error metrics. Remember, for the single d100 table, we had:

Daily Error: 0.0019
Unweighted Transition Error: 0.028
Weighted Transition Error: 0.0008

With the weather matrix, we achieve:

Daily probabilities: 0.0043
Transition matrix (unweighted): 0.017
Transition matrix (weighted): 0.0001

We do about half as well (but still quite well) on the daily probabilities, and see improvements in the transition matrices, especially the weighted one. Because the unweighted transition matrix deals with states we don’t access, no strategy is going to improve on those. For the states we do access, we get the transitions very accurate.

This also achieves what I set out to do regarding gameability—with these transitions, the players know that cold days will follow cold days and warm days will follow warm ones. While there can be more creative strategies to generate days, like the hexflower or creative methods that roll independently on multiple tables, at this point I find it unlikely that they’ll be more satisfying from a simulationist angle. It’s hard to beat a single roll on a matrix from a gamist perspective.

That said, there are a few weaknesses of these matrices. The first is seasonality—it doesn’t get transitions from early to late spring, for example. (You could, but you’d need more matrices.) Also, you have to construct each table for a new location and that takes some effort. More importantly, it is hard to modify—say I want “Chicago, but 10 degrees colder”. With other methods this wouldn’t be hard, but I don’t think it’s plausible to modify a weather matrix like that. The work is automated and you get what you get. Finally, there are these lingering issues of wind and fog.

Future installments will take a look at those in more detail. For the moment I’m happy with my design, so I decided to make examples for several locales.

Weather Generation 3: Single Tables With Repeats

 We finished last time with a picture of the data we were hoping to match—both (1) daily probabilities and (2) the transition matrix for movement between types of days. That looked like this: 

 

where the codes each give a type of day. For example, XC.C gives extreme cold with clear skies.

One of the first things I want to note is that there are many very rare days. The number of occurrences of each day is given on the y axis, in parentheses. For example, H.P, Hot and partly cloudy, appeared only 6 times in Chicago winters since 1940. That doesn’t seem worth including.

A second idea: in this plot, columns and rows are sorted by temperature; all the XC results are in the top left, for example. We can see that transitions between different temperatures are rare, especially those going more than one step. I’ll keep this in mind going forward.

But now it’s time to have our first go at a weather table. If we do a d100 table matching the daily probabilities, how close do we get to the actual data?

To get the d100 table, we have to decide how to go from our observations to d100 results. I wanted to do it proportionally; for example, a day that occurs 16% of the time would get a result of 01-16. But we also need to do something with the very rare days (<1%) and to figure out fractions, like if a day is observed 16.25% of the time.

I ended up using Hamilton’s method, designed for assigning seats in proportional electoral systems. (I saw on Wikipedia some people have concerns with this because of weird edge cases; but I figured it was fine for our purposes). The idea is, you start out with some number of seats—say 100, because that is our number of rolls. Then we define the number of rolls needed for a whole seat. Say there were 50,000 days; then we have:

quota = 50,000 / 100 = 500.

First we distribute seats for whole multiples of 500 votes. Suppose we had three results:

Cool / Overcast / None                   35,232 votes
Cool / Partly Cloudy / None           10,200 votes
Cold / Overcast / None                   4568 votes

Then we get

Cool / Overcast / None                   70 seats
Cool / Partly Cloudy / None           20 seats
Cold / Overcast / None                   9 seats

That leaves us with one seat left over. We check how many votes remain after distributing the whole multiples:

Cool / Overcast / None                   232 votes
Cool / Partly Cloudy / None           200 votes
Cold / Overcast / None                   68 votes

And give the final seat to the one with the most votes:

Cool / Overcast / None                   71 seats
Cool / Partly Cloudy / None           20 seats
Cold / Overcast / None                   9 seats
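Hamilton’s method is easy to automate. Here is a minimal sketch using the example numbers above (the function and variable names are mine, not from any particular apportionment library):

```python
def hamilton_allocate(counts, seats=100):
    """Largest-remainder (Hamilton) apportionment. counts maps each
    result to its number of observed days; returns seats per result."""
    quota = sum(counts.values()) / seats              # e.g. 50,000 / 100 = 500
    alloc = {k: int(v // quota) for k, v in counts.items()}  # whole quotas first
    remainders = {k: v - alloc[k] * quota for k, v in counts.items()}
    leftover = seats - sum(alloc.values())
    # hand any leftover seats to the largest remainders
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

votes = {
    "Cool / Overcast / None": 35_232,
    "Cool / Partly Cloudy / None": 10_200,
    "Cold / Overcast / None": 4_568,
}
print(hamilton_allocate(votes))
```

With these numbers the whole-quota allocation is 70 / 20 / 9, which leaves one seat over; the largest remainder (232 votes) sends it to Cool / Overcast for a final 71 / 20 / 9.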

Here, I ended up doing the exact same thing, except distributing to ‘rolls’ rather than ‘seats’. Applying this directly to Chicago, Autumn yields the following table:

So, how close is this to the actual result? Here is a comparison of the transition matrices. These are much smaller, because the very rare results in the actual data don’t appear. The daily probabilities are listed on the y axis, and the transition probabilities are given in the matrix. Note I generated the rolled data by 100,000 rolls on the table—that’s why the values aren’t all whole percentages.

Here the daily probabilities match well but the transitions are not so good. Because we are just making fresh rolls from the same table, the distribution of tomorrow’s weather is identical no matter what today’s is: every row of the transition matrix simply repeats the daily probabilities. The weather today says nothing about the weather tomorrow. Not only is that not simulationist—it fails from a gamist perspective, because the information the players have about the weather does not inform their choices.

Before moving on, I wanted to quantify how close our results were to reality, to have an error metric to use in the future. I defined values for both the daily results and the transition matrices. For the daily results, I took the absolute value of the difference in day-by-day probabilities:

E_daily = (1/N) * Σ_{i=1}^{N} |p_obs,i − p_sim,i|

where i indexes across the N weather states. I included the very rare days in this computation, even if they did not appear in the rolled data. I chose the mean absolute difference for interpretability; the value is how far off we are, on average, from the daily distribution.

For the transition matrices, I defined both an unweighted metric (which did not take into account daily probabilities) and a weighted metric (which reported a greater penalty for missing the transitions of common days). For the unweighted metric, I took the mean absolute error between the transition probabilities:

E_tm = (1 / (N*K)) * Σ_i Σ_j |t_obs,i,j − t_sim,i,j|

where i and j index the weather states (N origin states, K destination states) and t_obs,i,j gives the observed transition probability from state i to state j.

For the weighted one, I used the proportion of each day as the weights. The equation becomes:

E_tm,w = (1 / (N*K)) * Σ_i p_obs,i * Σ_j |t_obs,i,j − t_sim,i,j|

In this case I also used the mean error. I don’t feel the interpretability is great here because so many transition probabilities are 0. But it still seems better in that regard than a squared error. Note that these sums are taken across all observed types of days, even if they don’t appear in the rolled data.
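All three metrics are a few lines of array arithmetic. A sketch, assuming observed and simulated daily probability vectors and N x K transition matrices (the function name is mine):

```python
import numpy as np

def error_metrics(p_obs, p_sim, t_obs, t_sim):
    """Mean absolute error metrics for daily and transition probabilities.
    p_obs, p_sim: length-N daily probability vectors (observed vs rolled).
    t_obs, t_sim: N x K transition matrices."""
    daily = np.abs(p_obs - p_sim).mean()
    unweighted = np.abs(t_obs - t_sim).mean()
    # weight each row's error by how often that type of day actually occurs
    weighted = (p_obs[:, None] * np.abs(t_obs - t_sim)).mean()
    return daily, unweighted, weighted
```

For instance, with two states where the observed transitions are perfectly sticky but the simulated ones are 50/50, the weighted error discounts the rarer state’s row.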

For 10,000 generated days, I get the following:

Daily Error: 0.0019
Unweighted Transition Error: 0.028
Weighted Transition Error: 0.0008

 Importantly, the weighted transition error is a different metric than the unweighted and should not be thought of as ‘better’. These numbers are useful because they can serve as a baseline. Are our methods better or worse than the simple d100 table? And if so, is it because they are matching transitions or daily errors?

 ---

 Looking at the transition matrices is really helpful for figuring out what we need to improve on. There are two main factors that jump out at me. First, the observed data has these clear ‘blocks’ to it associated with regions of similar temperature. These are on the left, below. You’re much more likely to transition within the same temperature band.

 Second, there seems to be a high chance of repeats. Looking just at the diagonal gives the chances of transitioning to the same day, and these all have high probabilities. So this gives us two ideas for improving generation: either heighten your chance of the same temperature, or heighten your chance of the same day.

 

I decided to give the second one a go first, because it seemed easier to me. We can get it with a straightforward change to our d100 table generation. Rather than distributing day probabilities across all 100 rolls, we fix a number of ‘repeats’—that is, entries on the table which just say ‘repeat the previous day’—and then we distribute the probabilities across the remaining values. This lets us use the same quota math, but dividing by fewer rolls; for example, 90 if we have 10 repeats.
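The change amounts to running the same largest-remainder quota math over 100 minus the repeat count. A sketch (names illustrative; the counts here are made up):

```python
def table_with_repeats(counts, repeats, sides=100):
    """d100 table where rolls 1..repeats mean 'repeat the previous day';
    the remaining rolls are apportioned proportionally to day counts."""
    slots = sides - repeats                      # e.g. 90 rolls if repeats = 10
    quota = sum(counts.values()) / slots
    alloc = {k: int(v // quota) for k, v in counts.items()}   # whole quotas
    rem = {k: v - alloc[k] * quota for k, v in counts.items()}
    for k in sorted(rem, key=rem.get, reverse=True)[: slots - sum(alloc.values())]:
        alloc[k] += 1                            # largest remainders get leftovers
    table, lo = {}, repeats + 1
    if repeats:
        table["Repeat previous day"] = (1, repeats)
    for k, n in alloc.items():
        if n:
            table[k] = (lo, lo + n - 1)          # inclusive d100 range
            lo += n
    return table

print(table_with_repeats({"Cool / Overcast / None": 700,
                          "Cool / Clear / None": 300}, repeats=10))
```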

Then, we can optimize the number of repeats to find the right number to include. This requires a choice about how to combine the different errors. I decided to just use a factor of 2.5, which puts the weighted transition matrix error on the same magnitude as the daily errors. For this data, varying the number of repeats yields:

The transition matrix error drops substantially up to ~22 repeats, while the daily errors hold up until about 20. The optimum number of repeats here is 15, and the table looks like:

 

Readers may note that the daily error at 0 repeats here is 0.0017, rather than the 0.0019 I quoted above. That’s because I’m rolling new weather data each time, and there is some variability.

Here is what the resulting matrices look like, again compared to the observed data.

 
 

I’m not that impressed with the results from a simulationist perspective—the rolled distribution is still clearly different than the observations. That said, we cut the errors by about 25% for the transition matrix without much change to the daily errors, and we ended up with a simple d100 table. We can run this algorithm on any table we want to improve it a bit and still have something that is easy to use at the table.

That said, there is another question I have regarding simulationism: what would the error be on a typical year relative to the average probabilities?

---

Where do we go next? We still have this idea of matching the blocks to one another; it seems that if we specified the temperature could change by no more than one step, we’d be doing better. However, we need a good way to do that without adding too much complexity to the rolling, and that takes some care.

At the same time, all of this work with the transition matrices prompted another thought—what if we just encode the transition matrix directly into a table? A weather matrix, if you will. This should get us exactly the results we need.

Weather Generation 2: What Are We Trying to Match, Anyway?

Last time, I proposed three criteria for weather generation. These are: (1) simulationist: it has to match real world data. (2) algorithmic: I have to be able to generate more tables easily. (3) practical: No more than two dice rolls and two tables.

Today I want to talk about the first bit. How can we tell if a weather system is simulationist?

Throughout this series I’ll be using Chicago as an example. It’s a big city and so should have good data; it has seasons; and I’ve lived there in the past, which should help me identify data that doesn’t add up.

First, I had to get data for real locations. This takes some effort. You can get data from individual weather centers in the US from NOAA. These data, when they exist, seem to be of high quality. But there are oddities. For example, the Chicago dataset has temperature and precipitation since 1946 (albeit with 85% coverage) but only has cloud cover until 1996. So each dataset we use would require some choices and care.

Using local weather stations would also make it challenging to apply anything to places outside the US. The formatting may be different and the data may be less available.

Another option is to use reanalysis datasets. These have already taken data from many local stations, as well as things like satellite data, and combined them with weather models to get continuous, global coverage. However, these datasets can be enormous: ERA5, one of the best models, runs to tens of terabytes. I ended up not trying to use these.

Instead, I went with data from Open-Meteo, a service which has already pulled some key parameters from ERA5 for easy access. This gives me data of the same type, since 1940, for any location on Earth. Presumably the quality varies somewhat based on how much local coverage went into the ERA5 data? I’d bet the 1940 data are worse, for example. But I think this is sufficient for RPG purposes.
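For reference, here is a sketch of a request against Open-Meteo’s historical archive endpoint. The daily variable names below are assumptions from memory and should be checked against the current Open-Meteo documentation before use:

```python
from urllib.parse import urlencode

def archive_url(lat, lon, start="1940-01-01", end="2024-12-31"):
    """Build a request URL for Open-Meteo's historical archive API.
    Daily variable names are assumed and may need checking."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,
        "end_date": end,
        "daily": ",".join([
            "temperature_2m_mean", "precipitation_sum",
            "rain_sum", "snowfall_sum", "cloud_cover_mean",
        ]),
        "temperature_unit": "fahrenheit",
        "precipitation_unit": "inch",
        "timezone": "auto",
    }
    return "https://archive-api.open-meteo.com/v1/archive?" + urlencode(params)

print(archive_url(41.88, -87.63))  # Chicago
```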

---

So, what do we actually get? The key variables are temperatures, wind speeds, precipitation (snow & rain), and cloud cover. Open-Meteo also provides weather codes by day (e.g., weather code 3 = “Overcast”). At first those seemed helpful, but the distribution wasn’t quite what I expected. For example, for the Chicago summer, the most common weather code I got was Overcast, which was about 10x as likely as Clear Sky. It turns out the codes are probably set by the most severe weather encountered during the day, and other people have run into similar issues.

So I decided to set my own weather codes instead. I had to decide what to include. Temperature and precipitation are obvious. Cloud cover seemed necessary as well—there is an argument for leaving it out because it is not gameable, but it is so important to getting the vibe of a place, and to getting transitions between dry and wet weather, that I wanted to keep it.

In contrast, I decided not to use wind. Similar arguments can be made in its favor (it precedes rain) and it might be more gameable (-2 to ranged attacks?). But I felt it was less fundamental than cloud cover and 3 variables per day was already a lot. Wind is also complicated, because you have to distinguish gusts vs more steady winds and there might be a need for wind direction. Later on I may revisit it, especially for marine systems where it will be important for sailing.

I was disappointed not to get data for fog. This didn’t seem to be available as a variable in Open-Meteo. Some of the weather codes, at least, indicated fog; but I already noted those were not that reliable, and in any case no fog weather codes appeared for Chicago at all in the past 85 years. Fog isn’t that complicated to add (just yes/no) and is gameable (less visibility) in important ways (good for ambushes, sneaking). So I consider this a real loss. If anyone has any ideas for how to get good fog data in the future I would love to hear them.

---

With those variables, I then set up a categorization scheme. I didn’t want to be too precise, keeping in mind my goal of practicality. A change of 10 degrees F is not that gameable.

I settled on the following. For temperature, taking the mean temperature each day:

>100 F:          Extreme Heat
>80 F:            Hot
>55 F:             Warm
>35 F:             Cool
>0 F:              Cold
<0 F:              Extreme Cold

For cloud cover, taking the average cloud cover:

>60%:            Overcast
>25%:             Partly Cloudy
<25%:             Clear

And for precipitation, taking the sum across the day:

>0.8” rain:     Heavy rain
>0.2” rain:     Rain
>3” snow:      Heavy snow
>0.3” snow:   Snow
<0.2” rain and <0.3” snow:   No precipitation

If there was both rain and snow, I defaulted to snow. (E.g., 0.4” rain and 0.4” snow would be Snow, not Rain). (NB: I originally tried 0.1” as the threshold for snow, but this led to a Chicago map that was subjectively “too snowy”).
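The whole scheme fits in one small function. A sketch using the thresholds above (mean daily temperature in °F, mean cloud fraction, daily precipitation totals in inches; the function name is mine):

```python
def categorize_day(mean_temp_f, cloud_frac, rain_in, snow_in):
    """Categorize one day into (temperature, cloud cover, precipitation)."""
    if   mean_temp_f > 100: temp = "Extreme Heat"
    elif mean_temp_f > 80:  temp = "Hot"
    elif mean_temp_f > 55:  temp = "Warm"
    elif mean_temp_f > 35:  temp = "Cool"
    elif mean_temp_f > 0:   temp = "Cold"
    else:                   temp = "Extreme Cold"

    if   cloud_frac > 0.60: clouds = "Overcast"
    elif cloud_frac > 0.25: clouds = "Partly Cloudy"
    else:                   clouds = "Clear"

    # snow is checked first, so it takes precedence when both occur
    if   snow_in > 3:   precip = "Heavy snow"
    elif snow_in > 0.3: precip = "Snow"
    elif rain_in > 0.8: precip = "Heavy rain"
    elif rain_in > 0.2: precip = "Rain"
    else:               precip = "No precipitation"

    return (temp, clouds, precip)
```

For example, a 60 °F day at 70% cloud cover with no precipitation comes out as Warm / Overcast / No precipitation.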

---

I sorted each day into these categories, then split the days up by season to figure out each season’s dynamics. (In total, there are 6*3*5 = 90 possible types of days.) Here, for example, is part of the output for Chicago autumn:

Cool / Overcast / None                   0.168862
Cool / Partly Cloudy / None           0.128163
Cold / Overcast / None                   0.108924
Cool / Clear / None                          0.097676
Cold / Partly Cloudy / None           0.081841
Warm / Overcast / None                0.072961
Warm / Partly Cloudy / None        0.064082
Warm / Clear / None                       0.060530


And it continues like that. Total, 34 days are represented, with the rarest occurring 0.015% of the time (1 day across 85 years).

When we think about simulationism, this is the first data we need to match—the day-by-day probabilities of a certain weather state. An authentic weather table for Chicago autumn should have days which are Cool and Overcast with no precipitation about 17% of the time.

---

The second thing we need to match is the transition probabilities: assuming today is Cool and Overcast, how likely is tomorrow to be Cool and Overcast? This is the heart of a ‘realistic’ feeling weather system. So what do those data look like? Across all types of day, something like this:

Here, I shortened the descriptors for each day. The first entry gives temperature:

XC =   Extreme Cold
C =      Cold
L =      Cool
W =     Warm
H =     Hot
XH = Extreme Heat

The second gives cloud cover:

C =      Clear
P =      Partly Cloudy
O =      Overcast

The third gives precipitation:

LS =    Light snow
LR =   Light rain
HS =   Heavy snow
HR =   Heavy rain

If there is no precipitation, that entry is left blank. So XC.C. corresponds to extreme cold and clear.

To read this plot, types of days are given on both columns and rows. You start by finding the row for the current type of day. Then the chance you have of transitioning to another type of day (the columns) is given by the color of that box. The number of times each day appeared in the dataset is given next to the row, in parentheses.

For example, the XC.C. type only appears once. It was followed by a cold, overcast day (C.O.), so that box is dark blue and has a transition probability of 1.

For other days, there are more complex distributions. They are ordered by temperature, with colder days appearing at the top and the left. You can see that generally speaking, days cluster by temperature. Cold days transition to cold days and do not transition to warm ones.

This is a big and complicated object. Keep in mind that we can generate this for any location we want on the planet. So if we determine a good way to match it for Chicago, we should be able to apply that anywhere.
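Computing this object is straightforward. A sketch, assuming a chronological list of day labels for a single season (names are mine; in the real data you would also avoid pairing the last day of one season with the first day of the next, which is omitted here):

```python
import numpy as np

def transition_matrix(days):
    """Empirical transition matrix: entry (i, j) is the probability
    that a day of state i is followed by a day of state j."""
    states = sorted(set(days))
    idx = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for today, tomorrow in zip(days, days[1:]):   # consecutive day pairs
        counts[idx[today], idx[tomorrow]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # normalize each row; guard rows with no outgoing transitions
    return states, counts / np.where(row_sums == 0, 1, row_sums)
```

Each row then holds the distribution of tomorrow’s weather conditional on today’s.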

Over the next posts, we’ll see how well we can do.

 

End note: If we wanted to go even further we could include state information for multiple older days: given it was cold and overcast yesterday and cool and clear today, what will tomorrow be like? I don’t know of any RPG system that attempts this and I think any attempt would be far too detailed to be practical. Still, I note it as a limitation.