We finished last
time with a picture of the data we were hoping to match—both (1) daily
probabilities and (2) the transition matrix for movement between types of days.
That looked like this:

where the codes each give a type of day. For example, XC.C gives extreme cold with clear skies.
One of the first things I want to note is that there are many very rare days. The number of occurrences of each day is given on the y axis, in parentheses. For example, H.P (hot and partly cloudy) was observed only 6 times in the Chicago winter since 1940. That doesn’t seem worth including.
A second idea: in
this plot, columns and rows are sorted by temperature; all the XC results are
in the top left, for example. We can see that transitions between different
temperatures are rare, especially those going more than one step. I’ll
keep this in mind going forward.
But now it’s time to have our first go at a weather table. If we build a d100 table matching the daily probabilities, how close do we get to the actual data?
To get the d100
table, we have to decide how to go from our observations to d100 results. I
wanted to do it proportionally; for example, a day that occurs 16% of the time
would get a result of 01-16. But we also need to handle the very rare days (<1%) and decide what to do with fractional percentages, like a day observed 16.25% of the time.
I ended up using Hamilton’s method,
designed for assigning seats in proportional electoral systems. (I saw on
Wikipedia some people have concerns with this because of weird edge cases; but
I figured it was fine for our purposes). The idea is, you start with some number of seats, say 100, because that is our number of rolls. Then we define a quota: the number of votes needed to earn one whole seat. Say there were 50,000 days; then we have:

quota = 50,000 / 100 = 500.
First we distribute seats by whole multiples of 500 votes. Suppose we had three results:

Cool / Overcast / None: 35,232 votes
Cool / Partly Cloudy / None: 10,200 votes
Cold / Overcast / None: 4,568 votes
Then we get:

Cool / Overcast / None: 70 seats
Cool / Partly Cloudy / None: 20 seats
Cold / Overcast / None: 9 seats
That leaves us with one seat left over. We check how many votes remain after distributing the whole multiples:

Cool / Overcast / None: 232 votes
Cool / Partly Cloudy / None: 200 votes
Cold / Overcast / None: 68 votes
And give the final seat to the result with the most votes remaining:

Cool / Overcast / None: 71 seats
Cool / Partly Cloudy / None: 20 seats
Cold / Overcast / None: 9 seats
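The allocation above can be sketched in a few lines of Python. This is my own minimal implementation of the largest-remainder idea, not code from any particular library:

```python
def hamilton_allocate(counts, total_rolls=100):
    """Hamilton's (largest-remainder) method: turn observation counts
    into a proportional allocation of d100 rolls."""
    quota = sum(counts.values()) / total_rolls  # votes needed per whole seat
    # Step 1: every result gets the whole multiples of the quota it earned.
    alloc = {k: int(v // quota) for k, v in counts.items()}
    # Step 2: remaining rolls go to the largest leftover vote counts.
    leftover = total_rolls - sum(alloc.values())
    remainders = {k: v - alloc[k] * quota for k, v in counts.items()}
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc

counts = {
    "Cool / Overcast / None": 35232,
    "Cool / Partly Cloudy / None": 10200,
    "Cold / Overcast / None": 4568,
}
print(hamilton_allocate(counts))
```

Running it on the three example counts gives a full 100-roll allocation.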
Here, I did exactly the same thing, except distributing ‘rolls’ rather than ‘seats’. Applying this directly to Chicago’s autumn data yields the following table:

So, how close is
this to the actual result? Here is a comparison of the transition matrices.
These are much smaller, because the very rare results in the actual data don’t
appear. The daily probabilities are listed on the y axis, and the transition
probabilities are given in the matrix. Note I generated the rolled data by
100,000 rolls on the table—that’s why the values aren’t all whole percentages.
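For reference, rolling on a table like this and tallying the transitions can be sketched as below. The helper names (`roll_days`, `transition_matrix`) are my own, and I assume the table is stored as a mapping from day type to its number of d100 entries:

```python
import random
from collections import Counter

def roll_days(allocation, n_days, rng=None):
    """Draw each day independently, weighted by its d100 allocation."""
    rng = rng or random.Random(0)
    return rng.choices(list(allocation), weights=list(allocation.values()), k=n_days)

def transition_matrix(seq, states):
    """Empirical probability of moving from each state to each other state."""
    counts = {a: Counter() for a in states}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values())
        probs[a] = {b: (counts[a][b] / total if total else 0.0) for b in states}
    return probs

allocation = {"Cool": 70, "Cold": 30}  # toy two-state table for illustration
seq = roll_days(allocation, 100_000)
tm = transition_matrix(seq, list(allocation))
```

Because every day is an independent roll, each row of `tm` comes out close to the daily probabilities, which is exactly the flaw discussed next.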

Here the daily probabilities match well, but the transitions are not so good. Because we are just making fresh, independent rolls on the table, every row of the transition matrix is identical: the weather today says nothing about the weather tomorrow. Not only is that not simulationist, it also fails from a gamist perspective, because the information the players have about the weather does not inform their choices.
Before moving on, I
wanted to quantify how close our results were to reality, to have an error
metric to use in the future. I defined values for both the daily results and
the transition matrices. For the daily results, I took the mean absolute difference in the day-by-day probabilities:

E_daily = (1/n) * Σ_{i=1}^{n} |p_obs,i − p_sim,i|
Where i indexes
across the weather states. I included the very rare days in this computation,
even if they did not appear in the rolled data. I chose the mean absolute
differences for interpretability; the value is how far off we are, on average,
from the daily distribution.
For the transition
matrices, I defined both an unweighted metric (which did not take into account
daily probabilities) and a weighted metric (which reported a greater penalty
for missing the transitions of common days). For the unweighted metric, I took the
mean absolute error between the transition probabilities:
E_tm = (1 / (N*K)) * Σ_j Σ_i |t_obs,i,j − t_sim,i,j|
where i and j are
different weather states and t_obs, i, j gives the transition probability from
state i to state j.
For the weighted
one, I used the proportion of each day as the weights. The equation becomes:
E_tm,w = (1 / (N*K)) * Σ_i p_obs,i * Σ_j |t_obs,i,j − t_sim,i,j|
In this case I also
used the mean error. I don’t feel the interpretability is great here because so
many transition probabilities are 0. But it still seems better in that regard
than a squared error. Note that these sums are taken across all observed types
of days, even if they don’t appear in the rolled data.
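In code, all three metrics are a few array reductions. A minimal sketch (my own, not from the original analysis), assuming the `p_*` are probability vectors and the `t_*` are N×K transition matrices stored as NumPy arrays:

```python
import numpy as np

def daily_error(p_obs, p_sim):
    """Mean absolute difference between the daily probability vectors."""
    return np.mean(np.abs(p_obs - p_sim))

def transition_error(t_obs, t_sim, weights=None):
    """Mean absolute error between transition matrices.
    If `weights` (the observed daily probabilities) is given, each row's
    error is scaled by how common that day is."""
    err = np.abs(t_obs - t_sim)
    if weights is not None:
        err = err * np.asarray(weights)[:, None]  # weight each row i by p_obs,i
    return err.mean()
```

Passing `weights=None` gives the unweighted metric; passing the observed daily probabilities gives the weighted one.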
For 10,000
generated days, I get the following:
Daily Error: 0.0019
Unweighted Transition Error: 0.028
Weighted Transition Error: 0.0008
Importantly, the
weighted transition error is a different metric than the unweighted and should
not be thought of as ‘better’. These numbers are useful because they can serve
as a baseline. Are our methods better or worse than the simple d100 table? And
if so, is it because they are matching transitions or daily errors?
---
Looking at the
transition matrices is really helpful for figuring out what we need to improve
on. There are two main factors that jump out at me. First, the observed data
has these clear ‘blocks’ to it associated with regions of similar temperature.
These are on the left, below. You’re much more likely to transition within the
same temperature band.
Second, there seems
to be a high chance of repeats. Looking just at the diagonal gives the chances
of transitioning to the same day, and these all have high probabilities. So this gives us two ideas for improving generation: either raise the chance of staying in the same temperature band, or raise the chance of repeating the same day.
I decided to give
the second one a go first, because it seemed easier to me. We can get it with
a straightforward change to our d100
table generation. Rather than distributing day probabilities across all 100
rolls, we fix a number of ‘repeats’—that is, entries on the table which just
say ‘repeat the previous day’—and then we distribute the probabilities across
the remaining values. This lets us use the same quota math, but dividing by
fewer rolls; for example, 90 if we have 10 repeats.
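In code, the change is small: reserve the first n entries for ‘repeat’, then run the same largest-remainder allocation over the remaining rolls. A sketch with my own (hypothetical) names:

```python
REPEAT = "Repeat the previous day"

def table_with_repeats(counts, n_repeats, total_rolls=100):
    """Build a d100 table where rolls 1..n_repeats mean 'repeat yesterday'
    and the rest are allocated proportionally to the observed counts."""
    rolls = total_rolls - n_repeats
    quota = sum(counts.values()) / rolls  # divide by fewer rolls, as above
    alloc = {k: int(v // quota) for k, v in counts.items()}
    rem = {k: v - alloc[k] * quota for k, v in counts.items()}
    # Hand out the leftover rolls by largest remainder.
    for k in sorted(rem, key=rem.get, reverse=True)[:rolls - sum(alloc.values())]:
        alloc[k] += 1
    # Lay the allocation out as (result, low roll, high roll) ranges.
    table = [(REPEAT, 1, n_repeats)] if n_repeats else []
    lo = n_repeats + 1
    for day, n in alloc.items():
        if n:
            table.append((day, lo, lo + n - 1))
            lo += n
    return table
```

The ranges always cover exactly 01-100, so the result is still a plain d100 table.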
Then, we can
optimize the number of repeats to find the right number to include. This
requires a choice about how to combine the different errors. I decided to just
use a factor of 2.5, which puts the weighted transition matrix error on the
same magnitude as the daily errors. For this data, varying the number of
repeats yields:

We get substantial improvements in the transition matrix error up to ~22 repeats, while the daily errors hold up well until about 20. The optimum number of repeats here is 15, and the table looks like:

Readers may note
that the daily error at 0 repeats here is 0.0017, rather than the 0.0019 I
quoted above. That’s because I’m rolling new weather data each time, and there
is some variability.
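The sweep behind the plot above amounts to a simple scan. In this sketch, `evaluate` is a hypothetical hook (not from the original analysis) that builds the table with n repeats, rolls test data, and returns the (daily, weighted transition) error pair; 2.5 is the combining factor described earlier:

```python
def best_repeat_count(evaluate, max_repeats=40, factor=2.5):
    """Pick the repeat count minimizing: daily error + factor * weighted
    transition error, scanning 0..max_repeats."""
    def score(n):
        daily, weighted_tm = evaluate(n)
        return daily + factor * weighted_tm
    return min(range(max_repeats + 1), key=score)
```

Because each evaluation rolls fresh data, the scores are noisy; averaging several evaluations per n would steady the optimum.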
Here is what the
resulting matrices look like, again compared to the observed data.

I’m not that
impressed with the results from a simulationist perspective—the rolled
distribution is still clearly different from the observations. That said, we
cut the errors by about 25% for the transition matrix without much change to
the daily errors, and we ended up with a simple d100 table. We can run this
algorithm on any table we want to improve it a bit and still have something
that is easy to use at the table.
There is another question I have regarding simulationism, though: how large would the error be for a typical single year, relative to these long-run average probabilities?
---
Where do we go
next? We still have this idea of matching the blocks to one another; it seems
that if we specified the temperature could change by no more than one step,
we’d be doing better. However, we need a good way to do that without adding
too much complexity to the rolling, and that takes some care.
At the same time,
all of this work with the transition matrices prompted another thought—what if
we just encode the transition matrix directly into a table? A weather
matrix, if you will. This should get us exactly the results we need.