Dear New Zealand: Here's a simulation of 4.8 million citizens moving about spreading a virus - it runs on a laptop

Peter Cotton

Helping to build a decentralized prediction network

Published Apr 8, 2020

This is a longer form followup to my post describing the open source pandemic package on PyPI (with Python code also available on Github). You can use the code to simulate millions of people moving about in two dimensions, crossing paths and, unfortunately, getting sick. In fact you can do it right now. First install docker if you don't have it and then:

docker run xtellurian/pandemic

When you do that, you'll be contributing to a community created surrogate model - an approximation of the simulations discussed here that runs a million times faster. This video illustrates the dynamic with a toy sized town of fifty people. Watch how transmission takes place. You might even discern commuting and households.

By default the package implements a novel parsimonious agent based model whose properties I discuss here. As a category, agent models can be conceptualized as taking several million copies of a disease progression model for an individual, and having those models walk around and bump into each other. It is not unusual to model individuals in this fashion, as with this paper on influenza modeling in Australia for example. What's different is that I set out to do this with very, very few parameters and with an eye on some analytical results that might further simplify estimation.

Outline:

The modeling of illness progression
The model for population movement (Ornstein-Uhlenbeck processes on the plane)
The model used to generate a city (sprawling home and work locations)
Candidate model improvements
A visual tour of parameters and their sensitivities, stressing the importance of population density and of social distance
How you can help if you have spare CPU cycles

0. Motivation. Deficiencies with widely cited models for virus spread

Like you, I had heard forecasts and strategy choices referencing how far the virus will penetrate into the population. They did not accord with my intuition. No code was released. In some case not even an outline of the mathematical approach was made public, which seems unforgivable in this day and age. Where modeling was referenced, it seemed all too often to be a reference to the "workhorse" compartmental model known as CIR - a borderline tautology. Two schools of thought emerged:

Strong form defeatism ("let's head straight for herd immunity")
Weak form defeatism ("we must flatten the curve but the area under the curve may remain roughly the same").

The "Flatten The Curve" rallying cry of weak-form defeatism is perhaps the defining image of our time. Certainly the defining animation:

Weak form defeatism served a noble purpose as a foil to strong form defeatism. Isn't it cool how the mass of the curve stays the same as the curve is flattened? Complete baloney as it turns out - at least in the simulations I will present. Curves are flattened out to the right a little bit depending on what changes, but mostly they are just squished straight down. Watch these simulations of disease spreading in Disk City as we vary an important parameter.

Of course I don't wish to beat up on a campaign that saved lives by highlighting overloading of the medical system - but how did strong form defeatism get a grip in the first place? Defeatism is mathematically seductive if you are primarily guided by simple models that treat the population as a whole and stir everyone around constantly like cake batter. The world sits on a knife edge. Any observed exponential explosion in infection implies that we'll all get the bug.

Well, not quite everyone. Even with these models - where everyone is constantly being beamed to random locations on Earth - there is still a point at which so-called herd immunity is reached. But that point is just a bookkeeping equality and not truly an attempt to model a turning point with any genuine effort.

In contrast the model and simulation package presented here might, if interpreted carefully, suggest quite different characteristics of disease progression across cities, towns and countries - especially when densities and movements vary considerably. Things are still on a knife edge, but it is a blunter, rustier knife. Moreover, as the rightmost panel of the video above suggests, we will try to actually model something - the relationship between density and infection growth rate is shown in that plot and there are other things one can instrument. We might get it wrong, but at least we can try to construct falsifiable hypotheses for what determines rates of infection rather than assuming it is a physical constant.

In my previous post I showed a video of two larger towns with population 40,000 that suffer 50% and 75% virus penetration due to differing testing regimes. In the video shown below we remove half the inhabitants and run it again. The result is dramatically different. The smaller town fares much better. The virus penetration is less than 30%.

To tilt the scales I set the testing rate per capita three times lower in the small town compared to the more populated. And in one of the more populous towns, I introduced randomized testing. Not so for the small town. In other words, the smaller town was set up to fail but despite these disadvantages, the smaller town fared substantially better in relative terms. Density matters in this model.

Here is another example, this time with a much larger city of 800,000 people. The virus spreads in this city along the main commuting corridor between the two centers of population. The result is anything but pretty but again, only about 35% of the populace gets the bug and the percentage falls dramatically as you move into the suburbs.

It may not be best to interpret density literally as geographical density. If you prefer it is merely a trick for ensuring there are significant differences in the intensity of interactions between people and variation in the frequency of happenstance exposure. Even more importantly, we will not assume the dots are randomly shuffled at each time step!

1. Modeling an individual's illness progression.

Turning to the description of the model that generated these simulations we start with disease progression. This aspect is orthodox. The code assumes that each individual has a health state:

Vulnerable,
Infected but not symptomatic or as yet tested
Symptomatic but not as yet tested
Positive
Recovered, or
Deceased.

For every moment that passes, there is a probability that an individual will move from one state to another. As per the code:

Infected or symptomatic people might be tested, and progress to being positive.
An infected, symptomatic or positive person may either recover or die.

Furthermore,

The vulnerable colliding with those infected or symptomatic may become infected.

This is the key difference between this simulation and the models for pandemics I am trying to wean you from which, need I say it again, are equivalent to assuming that anyone can become infected at any time by any other person.

Put another way, we are using a motion simulation in order to model the exposure (viral load) that individuals receive - as compared with simply assuming it is some fixed or proportional quantity. In this model there is no single rate of infection - your odds of being exposed vary dramatically depending on where you are, which in turn depends on where you live, whether you commute and so forth. We are modeling the fact that you move in a circle of people that while technically infinite, grows more slowly in logarithmic terms after a short while.

If this type of model (where motion and collisions or near misses creates exposure) is new to you and my first video above doesn't bring home the point, then I heartily recommend this video by Grant Sanderson on spatiotemporal models for contagion (he has taken mathematical pedagogy to a very high level - do come back here after you have spent an afternoon on his channel watching gems like this).

Here are nine simulations of disease spreading. It is somewhat reminiscent of a forest fire. You would never model fire by assuming that any tree in the country might spontaneously ignite whether or not it was close to the flame front. People move more than trees but not so much more, as you pan out. We too are rooted to our homes and setting aside the jet-setting, our movements around our homes are like embers in the wind.

2. The OU Movement model

I now describe the pandemic package's motion model that controls movement.

The primary challenge as I saw it was finding a way to deal with movement at different scales without getting super messy, prescriptive or exploding the parameter space. Which seat you choose on a train will determine your viral load. Which elevator you take could be critical. Whether you place your hand on a table in one position or an inch to the side could make the difference between life and death. These tiny differences are as important as whether you live in San Mateo or San Salvador, or whether you come into the city from the North or South on a regular basis.

One can imagine the list of parameters getting out of hand pretty quickly if we start modeling modes of transport and where you sit on the sofa when you get home. You should surely try to use that data if you have it. I don't. Instead, the pandemic library pulls out that old chestnut Brownian motion, or rather a minor modification, to try to capture unfortunate coincidence on different scales.

It says it on the box actually, individuals follow OU processes.

The OU stands for mathematicians Leonard Ornstein and George Eugene Uhlenbeck (we'll pretend Uhlenbeck means "one length back" in German). An Ornstein-Uhlenbeck pandemic model, as we might term it, is one where everyone ambles about like Brownian motion - aka a random walk. However an OU process isn't entirely directionless. Rather, it is a combination of a stagger and a steady pull towards a target - like someone who has imbibed too much looking for the campground toilet in the dark.

Take another close look at the first video and you may see that some of the points are meandering to work (or school) and back. Others are staying home - including some who are sick. They are all following random walks with a pull. Here is an example of such a walk courtesy of Shiyu Ji and wikipedia, where you can read more about uses and properties of OU processes.

In the OU pandemic model as currently implemented, people take random walks but they have invisible springs drawing them towards an attractor. The stiffness of the spring is a parameter whose role shall return to. A percentage of the population wander their way to work and then to home on a daily basis (increment the parameter count by one for the working fraction, those of you who are counting). They have two attractors instead of one. In the morning the commuters are pulled by an invisible spring in the direction of their workplace. In the evening they are pulled by another invisible force back home.

We say the commuters follow regime switching OU processes in two dimensions as the attractors switch abruptly. The commuters are drawn into a more densely populated region and then return. Their typical distance from others other waxes and wanes and at the same time their probability of colliding with someone rises and falls - though that is random too. In the video below we trace out the the average inverse squared distance between commuters as a function of time, and the number of collisions per unit time.

Somebody should tell me how to make the bottom left plot a straight line - I suspect the tidal force on the commuters is the first order effect there. But let's focus on the bottom right plot for now.

When the instantaneous probability of something happening is itself a random process we call it doubly stochastic (a.k.a. a Cox process). Names aside what I want you to notice is that the number of collisions (crosses in the bottom right plot) rises and falls as if it were doing its own commute back and forth between two levels. The high level is around a dozen collisions per time step. The low is four or so. The volatility seems higher during work hours as well.

Thus we see that one might reasonably model the rate of collision as a regime switching Ornstein-Uhlenbeck process - one where the level and volatility are pulled towards two different values depending on the time of day. It is a very little known fact that this setup admits an asymptotic series expansion for the probability of collision, where all terms can be calculated. In fact I've been carrying that in my back pocket for twenty years wondering if it would ever be useful.

The non-mathematical version of that aside is that one can construct an OU model inside a OU model, a little like a dream inside a dream in Inception. In fact as you progress deeper into this model you come across some amazing creatures from the mathematical depths. The problem of calculating collision probabilities brings out the heavy machinery from complex analysis - not to mention things with great names like the Wiener sausage (it would surely ruin your fun to provide a direct link to that piece of mathematical arcana). That feels like two dreams down, however, so back to the top level reality we go - which is to say the OU walks on the plane.

The attraction term in the drunken walk is linear. Only the best, perfectly crafted invisible Hooken springs are employed. There is no momentum because I wasn't sure it was necessary or even a good idea, but we could consider it. Commuting by spring is something Elon Musk might perfect when, God willing, this is all over.

We can critique the realism of OU walks with or without momentum - although let's be honest, you were never that keen to go straight to work. After you stop in at the coffee shop and then run into a few friends you're approaching OU. At home you wander about too, trying to avoid the horror of home schooling.

When you get to work, in this model, you are still being tugged at by the spring but it doesn't mean you will all settle into the same exact location and automatically infect everyone you work with. Your random motion keeps you bobbling about in a neighborhood of your attractor, which is to say the water cooler. The same occurs when you go home. People don't converge to a single point so infection is not guaranteed at either location.

Open up this page and cut and paste the contents into the terminal. You'll see. In better times we could use this stylized setup to model the spread of gossip or something. Sadly, we have a morbid scenario unfolding before our eyes and that's why I am trying to enlist your help.

3. Modeling cities

What of initial conditions? The people in the model need to start somewhere, and they start a small random distance from their home. But where is their home? Here is a little piece of code that generates a synthetic collection of home and work locations. I would consider it something of a placeholder, yet I suspect, and I might be wrong, that the model retains some essential connection to population density regardless. When you run the model you will see just how important density is.

The city is generated in a manner vaguely inspired by the Chinese restaurant process. The general idea is that you start throwing people at a map but not uniformly. Instead they pick a person already on the map at random and then decide to live near them. But they also have a tendency to plonk themselves down further away from the origin thus leading to urban sprawl. More on that below, but your best documentation for this is the code. I'd expect someone will suggest a more realistic generating process.

I will make one remark, however. There would seem to be a danger in confusing implied geometry with actual geometry. The former could be defined as a correction to the latter when you take into account the overly stylized movement model. It is not necessarily the case that swapping out the fake city model for actual coordinates of people from actual demographic data is the best immediate use of time. For while we can certainly question the dynamics of motion I suspect there is a stretching of the map that makes it more realistic - and with sufficient skullduggery we could reverse engineer that into the code that generates the locations of workplaces and homes. What I mean to say is that deficits in the modeling of movement and geography can cancel out - but only if you let them.

4. What's not in, yet.

Quite a few things could be added quite easily - though I am not sure they justify increasing the parameter space - something I am loathe to do.

Overloading. The addition of an intensive care state and contingent death hazard rates based on proximal capacity would capture a major effect now missing in the model - namely overloading of the system. However to be pedantic, this effect can be calculated after the fact to first order so including it in the simulation is not strictly required. Indeed some PDE models treat death and recovery as the same thing, which is a similar trick.
Fomites. (Hat tip Stephen Luterman). I had to Google that word too. The passing of disease via inanimate objects. I can't find literature that makes quantitative assessment, but including trails of people and not just their contemporaneous positions would not be unreasonable. However my gut tells me that this might be taking movement too literally. It is just one of the ways that proximity leads to transmission and we aren't modeling whether it is a cough or cross contamination. We can absorb it into the walk size to first order and if necessary, testing rates - since although they don't move, fomites are otherwise like non-compliant symptomatic people.
Cohorts. Different behavior by age and vulnerability. If the data by age is available, then we get back the cost of putting in cohorts. I think this might be the next thing that goes into the model.
Schools. I point out that work, in the model, is wherever anyone goes. It can be school. I think the city simulation could be improved. School may be an important dynamic especially during lockdown periods in countries like Australia, where most things are closed but schools remain open.
Non-compliant positive people could be included in the model, but this is very close to simply changing the testing rates (non-compliant positives are the same as asymptomatic carriers).
Repeat illness. A post by Daniel Goldstein reads: "New data not published yet suggests that 70% of people develop measurable neutralizing antibodies that peak and stabilize at about 2 weeks. 30% don’t, even though they recovered clinically. This may prove to be a thing, we don't know yet.
Gatherings. A hurricane could force many people into shelters (suggestion from Michael Broome). A small change could enable one to study the impact of allowing events involving fifty or five hundred people. However is this similar to including a small number of large but far-flung households?

5. A visual tour of some parameters and their sensitivity

In mathematics we count one, two, many. Two is usually the hardest so we skipped it and went straight to many agents. I alluded to some analytical possibilities that might constitute an attempt at "two", but that's probably for another time and format.

Many agents does not imply many parameters - indeed it was my intent at the outset to try to understand deficits in "one agent" models with some kind of parsimonious many agent model - without knowing the location of train stations or how many people work in a particular barber shop. In this section we introduce the small number of control knobs for the city generation and movement.

Kappa [default=3.0] Kappa is the linear coefficient in the restoring force that drags everyone towards their attractors. The larger kappa, the stiffer the invisible spring. Kappa controls how keen people are to get to work, and symmetrically how keen your are to get home. Some might wish to break this symmetry - and we probably all know people for whom the homebound kappa might be smaller than the workbound kappa - but I don't think it is required. In fact we might convince ourselves that kappa might not even need to be a free parameter and can be fixed once and for all.

The first thing to appreciate about kappa is its role at home. The video below shows a number of OU particles that are all attracted to the origin. They quickly reach an equilibrium state where their typical distance to the origin ceases to change very much. The green circle on the left is the theoretically computed root mean square distance to the origin, which is inversely proportional to the square root of kappa. The larger kappa, the tighter people cluster at home and, to a lesser extent, at work.

Typically we would expect higher kappa to lead to more contagion.

Recommended by LinkedIn

Expand Your Mind: Discover How "The Rules of…

K.C. Barr 5 months ago

The effect of temperature and humidity on the growth…

Charles Wiles 4 years ago

Network Meta-analysis in R part II. The Network…

Darko Medin 7 months ago

h. Average household size [default=2.5]

Households are a recent change to the code. Household sizes are binomially distributed. A household is merely a group of people with the exact same home attractor. Household size and the relative size of kappa and W (discussed next) are going to determine if everyone gets sick when one person does. Anecdotally that seems to be the case!

It isn't clear to me that we need household size to be a free parameter for forecasting purposes given we have two other knobs, and I would be inclined to fix it at the national average. However in the presence of demographic data, varying household size and holding everything else constant might enrich the feature space and possibly help us understand differing infection rates (say between the Bronx and Manhattan).

Before leaving household size some comment on the relationship to kappa is in order. The interplay between kappa and household size is seemingly straightforward at home. Commuters' mean square distance to home in the evenings can also relate quite closely to the ergodic average if kappa is on the higher side. Here we set kappa=6 to make the point.

All that said commuting muddies the waters in other ways. Higher values of kappa might lead to a faster commute through troubled waters - depending on the geography - thus reducing infection en route. The video below shows progression of the disease in a town using nine increasing values of kappa (increasing left to right along the first row, then the second and third). In this and the other comparisons, I will show that all parameters are relative to a baseline town (see the code for this and other ready to go towns and cities). The baseline for kappa is 3.0. Thus the top left simulation is for kappa=1.8 and the bottom right kappa=4.2.

There is, I think you will agree, no glaringly obvious pattern although, as expected, contagion occurs less rapidly for the small values of kappa. It may not be just be distance from the water cooler. With kappa set very low, people are often not even getting to work every day.

A caveat. With all of these simulations the initial conditions are important and those are generated randomly by design. If you scan back up to the first 3 x 3 video you'll see nine simulations that, though I didn't mention it at the time, use identical parameters. They certainly don't end up the same. Perhaps this will give you newfound respect for experts trying to predict the course of this disease but also a sense of the noise in these comparisons.

c [default=0.5] The fraction of people who have work attractors. Work, as noted, can be school or just some place people go that is sufficiently removed from where one would otherwise wander to near home.

W [default=1.0] The random walk step size is governed by the variable W. If two people walk (stagger, drift) towards each other then all else being equal they are less likely to collide if they take large random steps.

Philosophically I would like to think we could fix this as W=1.0 the way physicists like to set the speed of light c=1, but as a practical matter that doesn't work too well (see the geohashing part of the code, those who are interested). It might also detract from the message in the video below: walk size matters a lot.

Morally, walk size W is social distance.

You'll notice that contagion occurs very quickly on the top left simulation - although its geography was a little unlucky, admittedly. The penetration is about 80%. On the bottom right, in comparison, we have a simulation where roughly one third as many people have caught the bug.

Larger step size W, less transmission.

One should be leery of interpreting this as a "right to roam". I prefer the interpretation where people are keeping to their offices, crossing the street to avoid joggers, choosing alternative means to get to work, commuting at off hours, wearing a mask (to create greater effective distance) and changing what they do when they socialize - such as shouting at each other as they sit in Adirondack chairs placed twenty feet apart.

n [default=40000] The population parameter is, as you might expect, the number of agents in the simulation. If we hold the city generation parameters constant then this is almost a population density parameter - except for a small sprawl effect.

In this simulation we watch towns with differing populations varying from 24,000 on the top left to 56,000 bottom right. Towns begin with the same proportion of infections, ranging from 30 infections top left to 70 bottom right. I apologize but because the simulation runs more slowly for the larger populations the progression is not in sync, making it look as if the smaller towns are doing worse than they are.

Nonetheless we see the larger towns fare considerably worse by every proportional measure. The dreaded exponential growth kicks in quickly, powered by asymptomatic carriers who collide with a sufficient number of people to set off the chain reaction. Meanwhile, in the small town things turn pear shaped, but more slowly. The rate of recovery is still an appreciable fraction of the people getting sick. Recovered people are the control rods for this reactor. For some time this takes some heat out of the core, as it were. Had this simulated town done more (higher rate of testing, or higher rate of contact tracing, which is equivalent to raising the asymptomatic test rate) it might have done much better.

Need we say it, a relatively small change in density makes a big difference. At the risk of a misleading comparison between this simulation's density and real world density, we note that the top right versus the bottom right density ratio happens to be about the same as the density ratio between San Fransisco and New York.

The two leftmost simulations provide a density ratio of 2:1 which is similar to the density difference between Manhattan and Brooklyn. The density ratio between right and top left approximates the ratio between the densities of New York City versus Los Angeles or in turn, Los Angeles against Detroit, Cleveland or the Portland metropolitan area. We should not bucket New York with Portland. Never mind the fact that Tennessee, South Dakota and Alaska are about twenty times less populated than New Jersey per square mile.

Care must be taken, however, as this is a highly stylized model intended to capture close collisions. Offices might be slightly bigger in Missouri than Manhattan, but perhaps not three times as large. People with longer commutes don't stand apart from each other more than those with short commutes, once they get to the office. There are close talkers living in Brandenburg, the least dense city in Germany.

The fact that the larger town started with the same proportion of infections but a numerically larger number than the small town might play a role. However here is a similar set of simulations in which all towns start with fifty infections, irrespective of size. Things still turn out worse for the more densely packed populations.

You are going to see a lot of discussion about population density. Out of curiosity I quickly scratched out this plot of the logarithm of COVID-19 deaths (as of Apr 4th) versus the logarithm of population density for European countries. It ain't a law of physics, but it is hard to ignore.

I know what you are thinking ... there might be all manner of confounding variables here and I'm sure you are right. What else is correlated? I did discover in the course of my brief investigations that amongst the "kissy countries" COVID-19 deaths are also correlated strongly with the number of times it is customary to kiss someone on the cheek when meeting. However I hastily add that this correlation completely disappeared after accounting for population density (actually the sign turned negative, a little). I think, therefore, we should take this as a cautionary tale. A lot of things are likely to be correlated, spuriously or otherwise, with population density. If you hear that "X causes COVID-19" then check against the plausible culprit: population density.

Parenthetically, this does leave us with a true mystery. I leave it to a reader with superior insight to explain why people who live in more densely populated countries kiss each other on the cheek a larger number of times when they meet, than those living in less densely populated countries. And before you jump to it, no I am not convinced that in increase in the number of ceremonial kisses causes children.

Radius [default=0.04] The radius spaces out homes and workplaces. A work location is selected randomly by first choosing an existing work location and then moving away (on average) a distance equal to the radius. We multiply the radius by a standard normal number. At present this parameter also determines how close people live to each other. The home radius is fixed at four times the work radius parameter (there is no real rationale for that particular choice beyond a desire for one less parameter).

I'm sure it will not surprise you to learn that decreasing the radius, ceteris paribus, tends to set up a city more likely to be susceptible to contagion. However geometry and luck enter the fray as seen in the comparison between the bottom left and top right towns.

The radius is, to my way of thinking, a crucial parameter because it can be used to create different types of growth varying from exponential to sub-exponential and for that matter linear growth. To achieve the same effect, epidemiologists have used meta-population models and sometimes elaborate network models to achieve the same flexibility - but the parameter count is quite large in those setups.

To see how radius might grant our model the ability to straddle very different growth rates, imagine a row of households equally spaced along a road. If they are separated sufficiently relative to their walk size W, and if there is no commuting, the virus will spread approximately like a wick burning. This growth will therefore be approximately linear. On the other hand you have already seen exponential growth. All things in between are certainly possible if you choose to set the initial home and work locations manually (someone can arrange a tribute to the late John Conway) but even without doing this there is sufficient flexibility. We can generate a plausible city with one or two more parameters.

s. Sprawl [default=0.25] Controls the extent to which home and work locations tend to drift away from the origin. After an existing home location is chosen, the new location is centered at (1+s) times the existing home position. So when we said the mean distance from the previous home was the radius r, that wasn't entirely true.

e. Sprawl quadratic term [default=0.05] ... actually we also add e times the square of the distance to the origin of the existing home when choosing the center of where the next home is to be built. It remains to be seen if this guy gets the chop from Ockham's razor.

I won't make you watch any more poorly produced videos but different choices of sprawl coefficients can yield different density profiles for the city and this tips the scales toward higher or lower virus penetration into the periphery.

Estimation

The free parameter space can be kept as small as one likes, though there are quite a few numbers we have omitted in our discussion (for those who are interested the names of parameters including health parameters are found in pandemic/conventions.py).

It is my suspicion that you can go a long way varying a small number of these parameters - perhaps even just one, W or maybe two: W and radius. But I don't yet have a lot to back up that statement. And while running the model for any one set of initial conditions or parameters is easy, running it for a large number is not. Thus...

6. (Update) A community built model at SwarmPrediction.com

Even if you are unfamiliar with Python you can run pandemic by installing docker and then issuing the command:

docker run xtellurian/pandemic

That's why when you run this script the following occurs:

The simulation starts with random parameters
When it finishes, it sends back the results to a database API

It would be better if the parameters were chosen to optimize an acquisition function. Work in progress. The API and Python examples of use are open to all. If enough people run the model, we may build up a large database of initial conditions, parameters, and model outcomes (a public database of course) which can serve as a training set for a surrogate model of pandemics. A surrogate model is one that is approximately the same as the simulation yet can be computed thousands or millions of times faster.

So ... armchair epidemiologists, disease epistemologists, statistical etymologists ("data science"?) and all you people on Linked-in offering to be my personal life coach. Do you have a few spare CPU cycles? Please stop drawing exponential curves and instead do the following:

Installing docker
docker run xtellurian/pandemic

Alternatively:

Open up terminal
Open up this page and cut and paste the bash script into the terminal

The script will show plots for one simulation pass and then ceed back your screen real estate while it continues to purr away.

Acknowledgements

I would like to acknowledge Python's implementation of set authored by Raymond D. Hettinger (which I'm guessing doesn't get a shout out too often). I found Hioaki Kawai's geohash package to be more than useful. Ryan Finnegan of Amphora created the docker container you see here. My company Intech Investments picks up the tab for the surrogate model database.

I hope I have made it clear that the motion model is the only novel thing here - not a Markov model for illness which is, lets face it, is almost impossible to not reinvent. While I like to think of myself as a defender of research mores, I have on this occasion written the code more quickly than I can navigate academic paywalls and research prior work. An open bibliography may make up for that over time.

Other open source models for pandemics

I've found a few network (graph) based simulations.

The epydemic library by Simon Dobson places people on nodes of a graph.
SimpactCyan (Liesenborgs et al) also models relationships and events.
CoMSES repository contains several flu and COVID-19 simulations

Motion models:

MASON commuters travel in straight lines from home to work and back.

One could, I suppose, use OU dynamics to set parameters in these network models and conversely, use calibrated network models to identify misspecification in the OU motion model (insofar as it generates networks of acquaintances).

Parting thoughts. Fast turnaround at first, then slowly back.

As noted in the prior post, I wrote this code because I couldn't get my hands on any "official" open source spatiotemporal model for contagion and, like many of you, wanted to better understand the dynamics of disease. Agent models such as this one can be used or abused. However I don't think we should be resorting to heterogenous population models for infection that don't actually model infection at all.

Another way to come at this is to recognize that an agent model can, with a small modification, imitate a heterogeneous population (macroscopic) model - albeit a computationally inefficient one. Suppose I were to introduce a line of code into the simulation that shuffled the positions of every particle on the screen, placing people uniformly irrespective of their home or work locations, and doing this over and over again. This enforces homogeneity, but doesn't seem terribly sensible.

Instead, I encourage you to mess with the agent model code where density differentials are preserved if you look closely, there are parties at Westport and Stanwell Tops. I don't think it will do your mathematical intuition any harm - nor your mental health. As noted this has turned me into something of a cautious conditional optimist, my warning about lack of rigorous estimation notwithstanding.

Here's why. Notice in the video below how the underlying drivers turn around before the things we measure do. This is true also for "classic" pandemic models. However I believe there is another dynamic in the motion model that kills off the exponential growth faster - it relates to exhaustion in the circle of people you regularly collide with, an effect I hope to quantify in a future post.

The dramatic drop in movement we have seen (as registered by cell phone locations) in places like New York City needs to be carefully translated in the strange geometry of this model, but combined with the knife edge behavior you see in the simulations, it suggests a rapid turnaround is possible - albeit one followed by a longer slower trip down than up.

About Me

Hi I'm the author of Microprediction: Building an Open AI Network published by MIT Press. I create open-source Python packages such as timemachines, precise and humpday for benchmarking, and I maintain a live prediction exchange at www.microprediction.org which you can participate in (see docs). I also develop portfolio techniques for Intech Investments unifying hierarchical and optimization perspectives.

Patrick Quade

Founder & CEO @ DineSafe, IWasPoisoned & SafelyHQ | Leveraging AI & Crowdsourcing for Public Health & Safety

Ben Chapman Lee-Ann Jaykus

1 Reaction

Steven N.

Business first BI, Data Visualization, and Data Lake Architect embracing fast data and data science collaboration

Very good work. The first encounter I had with the word "fomites" was in my undergrad immunology class as a pneumonic on the 5 Fs regarding the spreaders of disease-"fingers", "feces", "files", "food", and......…."fomites".

Stephan Mathys

Storyteller by nature, actuary by training | FSA, Author, Speaker

I LOL'd a couple of times, especially at "you were never that keen to get straight to work" and "trying to avoid the horror of home schooling." This point: "How does one model the percentage of a population in a highly heterogenous population and how much virus they are exposed to? How could it not be the case that the geometry of cities and suburbs impacts transmission opportunity and that local dynamics take root that have little to do with an average across the country?"needs to be discussed more and considered more in projections as well as mitigation efforts. We hear much talk about absolute numbers of cases, and then the "oh, but you didn't control for population counts!" hue and cry comes out. Well, sure, there's a lot more people in the United States so there's a good chance for more absolute # of cases, but how does population density affect this whole thing? I'm in the US so it's relevant to me, and I think what we're going to see is that the # Cases / M population in an area with 100 people / km^2 is going to be extremely different from # Cases / M population when there are 1,000 people / km^2. Intuitively, you'd think there's going to be less transmission, because the people are more spread out. But, is that the case? There are so many factors that parameterization becomes almost impossible. Anyway, thanks for the post and the read. Looking forward to more.

1 Reaction

Jim Moore

President of Avalana Advisors, Inc.

Peter - Looks like an interesting approach! I have played around a bit w SEIR type diff eq models and wondered “what if you add a stochastic element into RO?” A Markov switching approach might be a good way to capture a mixture of “light spreaders” and “super-spreaders” among other things. Also agree population density plays a large role in rate of transmission. Have reverted to simpler models to build intuition as the diff eq models are so parameter sensitive and hard to see joint impact of parameter specification and time-varying R0. Hope you and yours are safe!

2 Reactions

See more comments

To view or add a comment, sign in

Dear New Zealand: Here's a simulation of 4.8 million citizens moving about spreading a virus - it runs on a laptop

Peter Cotton

Helping to build a decentralized prediction network

Outline:

0. Motivation. Deficiencies with widely cited models for virus spread

1. Modeling an individual's illness progression.

2. The OU Movement model

3. Modeling cities

4. What's not in, yet.

5. A visual tour of some parameters and their sensitivity

Recommended by LinkedIn

6. (Update) A community built model at SwarmPrediction.com

Acknowledgements

Other open source models for pandemics

Parting thoughts. Fast turnaround at first, then slowly back.

About Me

More articles by Peter Cotton

Insights from the community

Others also viewed

Today's Artificial Intelligence is Ready to Battle the Next Pandemic?

What about Flattening the Infodemic Curve?

The Likely Nasty Social, Economic and Surveillance Aftereffects of COVID-19 -- and How to Combat Them

Contagion of Disease Conceptual View in the Space Dimension.

How Machine Learning in Business is Affected by the COVID-19 Pandemic

Unmasking the Silent Killer: A Tale of Heartlands and Hope

Model Validation for ONS

Artificial Intelligence in Population Health

Silver Bullets from Silicon Valley: Artificial Intelligence will improve our ability to respond to global outbreaks like Covid-19

Explore topics

Outline:

0. Motivation. Deficiencies with widely cited models for virus spread

1. Modeling an individual's illness progression.

2. The OU Movement model

3. Modeling cities

4. What's not in, yet.

5. A visual tour of some parameters and their sensitivity

Recommended by LinkedIn

6. (Update) A community built model at SwarmPrediction.com

Acknowledgements

Other open source models for pandemics

Parting thoughts. Fast turnaround at first, then slowly back.

About Me

More articles by Peter Cotton

Shutting Down California — The Billion Dollar Prediction Problem

Nine Yards is Enough - Why NFL Receivers and Running Backs Should Stop Shy of the First Down

Comparing Python Global Optimization Packages

The Instant, Morbid Reaction to the "Worst Debate in History"

Be the World's Most Asymptotically Productive Data Scientist (Deploying Models Edition)

Live, Online Distribution Estimation Using t-Digests

Benchmarking AutoML Vendors and Open Source Time Series Packages

On Masks and Seat Belts. COVID Cases Mount Higher where Ralph Nader was Unpopular, and Conversely.

Ever Lost an Algorithm? A Suggestion for Addressing the Reusability Crisis

Where will a Badminton Player Move to Next, and How Should we Adjudicate Predictions of the Same?

Insights from the community

Others also viewed

Today's Artificial Intelligence is Ready to Battle the Next Pandemic?

What about Flattening the Infodemic Curve?

The Likely Nasty Social, Economic and Surveillance Aftereffects of COVID-19 -- and How to Combat Them

Contagion of Disease Conceptual View in the Space Dimension.

How Machine Learning in Business is Affected by the COVID-19 Pandemic

Unmasking the Silent Killer: A Tale of Heartlands and Hope

Model Validation for ONS

Artificial Intelligence in Population Health

Silver Bullets from Silicon Valley: Artificial Intelligence will improve our ability to respond to global outbreaks like Covid-19

Explore topics