Using a real-world network to model localized COVID-19 control strategies

Ethics statement

Information was provided and consent was obtained from all participants in the study before the app recorded any data. The study was approved by the London School of Hygiene & Tropical Medicine Observational Research Ethics Committee (ref. 14400).

Social tracking data

The Haslemere dataset was generated and described as part of previous work, which gives a detailed description of the characteristics of this dataset and town3,4. Briefly, the data were collected during the 2017–2018 BBC Pandemic project conducted in Haslemere, Surrey, UK. The project involved a massive citizen-science experiment to collect social contact and movement data using a custom-made phone app and was designed to generate data relevant to understanding directly transmitted infectious disease3,4. Of the 1,272 individuals in Haslemere who downloaded the app, 468 individuals had sufficient data points at a resolution of 1 m over three full days within the focal area for further analysis3. All 468 focal individuals were known to have spent >6 h within the area bounded by 51.0132° N, 0.7731° W (southwest) and 51.1195° N, 0.6432° W (northeast) (within postcode GU27), but the dataset used here comprises deidentified proximity data made available as pairwise distances (~1-m resolution) at 5-min intervals (excluding the period from 11 p.m. to 7 a.m.)3.

Social network construction

In our primary analysis, we defined social contacts as events when the average pairwise distances between individuals within a 5-min time interval (calculated using the Haversine formula for great-circle geographic distance3) were 4 m or less. By doing so, we aimed to capture the majority of relevant face-to-face contacts (that is, those that might result in transmission) over 5-min periods, particularly given the 1-m potential error3 on the tracking measurement during these short time intervals. Furthermore, this threshold of 4 m is within typical mobile phone Bluetooth ranges for relatively accurate and reliable detections. Therefore, this contact dataset will also be comparable to proximity-based contacts identified through Bluetooth contact tracing apps, which may be preferred to real-location tracking for privacy reasons. We considered the sensitivity of the network to the contact definition by testing six further social networks from contacts defined using different threshold distances spanning the conceivable potential transmission range within the 5-min intervals (thresholds of 1 m to 7 m). We first measured the correlation of the network structure (that is, pairwise contacts) across the seven networks using Mantel tests. We also measured the correlation of each individual’s degree (number of contacts), clustering coefficient (number of contacts also connected to one another), betweenness (number of shortest paths between nodes that passed through an individual) and eigenvector centrality (a measure that accounts for both a node’s centrality and that of its neighbors) across the seven networks.
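As an illustration of the contact definition above, the following Python sketch computes a Haversine distance and applies the 4 m threshold to the mean pairwise distance within one 5-min interval. The function names, the Earth-radius constant and the input layout are assumptions made for illustration and are not taken from the study code.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_contact(distances_m, threshold_m=4.0):
    """True if the mean pairwise distance within one interval is <= threshold."""
    return sum(distances_m) / len(distances_m) <= threshold_m
```

Varying `threshold_m` from 1 to 7 would give the family of networks compared in the sensitivity analysis.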

The Haslemere data are a temporal dataset spanning three full days. While the epidemic model we use is dynamic, the contagion process of COVID-19 operates over a longer time period than 3 days. To be able to meaningfully simulate longer-term outbreak dynamics, we quantified the data as a static social network in which edges indicate the propensities for social contact between nodes. Temporal information is incorporated by weighting the edges using the temporal contact information, instead of using a dynamic network, which would require contact data over a much longer period of time. In the primary analysis, we weighted the edges by the number of unique days a dyad was observed together (but see the Supplementary Information for other temporal definitions). Therefore, the weight score indicates the propensity for each dyad to engage in a social contact event on any given day, with 0 corresponding to no contact, 1 corresponding to ‘weak links’ observed on the minority of days (one-third), 2 corresponding to ‘moderate links’ observed on the majority of days (two-thirds) and 3 corresponding to ‘strong links’ observed on all days. In this way, the weights of this social network could be included directly and intuitively into the dynamic epidemic model. For sensitivity analysis, we also created other weightings for this network and examined the correlation in dyadic social association scores (using Mantel tests) with our primary weighting method. 
Specifically, for the sensitivity analysis, we used edges specified as (1) a binary (that is, unweighted) network across all days, (2) a raw (and ranked) count of the 5-min intervals in contact, (3) a transformed weighted count (edge weight transformed as \(1 - e^{-\textrm{interval count}}\), which approximates a scenario where infection risk increases with contact time but reaches 95% saturation after ~15 min of contact within dyads) and (4) a simple ratio index (SRI) weighting that corrects for observation number as SRI score31. The SRI score for any two individuals (that is, A and B) is calculated as

$$\textrm{SRI}_{\textrm{A,B}} = \frac{\textrm{Obs}_{\textrm{A,B}}}{\textrm{Obs}_{\textrm{A}} + \textrm{Obs}_{\textrm{B}} - \textrm{Obs}_{\textrm{A,B}}}$$


where Obs is the number of 5-min observation periods (the intervals since the start of the day) within which an individual is recorded within 4 m of another individual.
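The weighting schemes described above can be sketched as follows. This is a minimal illustration, assuming contact events arrive as hypothetical `(a, b, day, interval)` tuples; the function names and the reading of the Obs terms (intervals in which each individual was in contact with anyone, and intervals in which the pair was in contact) are assumptions rather than the study's implementation.

```python
import math
from collections import defaultdict


def edge_weights(contacts):
    """Summarise contact events into per-dyad weights.

    contacts: iterable of (a, b, day, interval) tuples (hypothetical layout).
    Returns {pair: {"days", "count", "binary", "saturating"}}.
    """
    days = defaultdict(set)
    intervals = defaultdict(set)
    for a, b, day, interval in contacts:
        pair = tuple(sorted((a, b)))  # treat the dyad as undirected
        days[pair].add(day)
        intervals[pair].add((day, interval))
    weights = {}
    for pair in days:
        count = len(intervals[pair])
        weights[pair] = {
            "days": len(days[pair]),             # primary weighting (0-3)
            "count": count,                      # raw 5-min interval count
            "binary": 1,                         # unweighted network
            "saturating": 1 - math.exp(-count),  # 1 - e^(-interval count)
        }
    return weights


def sri(obs_a, obs_b, obs_ab):
    """Simple ratio index for a dyad, following the equation above."""
    return obs_ab / (obs_a + obs_b - obs_ab)
```

With three shared intervals (15 min), the saturating weight is 1 − e^(−3) ≈ 0.95, matching the saturation level quoted in the text.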

Null network simulation approach

We used null networks32 to understand the network properties that shape predictions of COVID-19 spread under different control scenarios. Null networks can also show how contagion may depend on the arrangement of social ties, how it may operate in different social environments and which simulation approaches may be the most similar to real-world infection dynamics. We created four null network scenarios (Extended Data Fig. 9) with 1,000 networks generated under each of these. All of the null network scenarios kept the same number of nodes, number of edges and weights of these edges as the Haslemere network but were generated under the following nulls: (1) the ‘edge null’ scenario (Extended Data Fig. 9a) considered random social associates, allowing the edges of the network to be randomly allocated among all nodes; (2) the ‘degree null’ scenario (Extended Data Fig. 9b) considered individual differences in sociality but assigned random social links for dyads, so randomly swapped the edges between nodes but maintained the degree distribution of the real network (this was therefore even more conservative than a power-law network simulation aiming to match real differences in sociality); (3) the ‘lattice null’ scenario (Extended Data Fig. 9c) considered triadic and tight clique associations, so created a ring-like lattice structure through assigning all edges into a ring lattice where individuals were connected to their direct neighbors and their second- and third-order neighbors (that is, six links per individual), from which excess links were then randomly removed (until the observed number of edges was reached); and (4) the ‘cluster null’ scenario (Extended Data Fig. 9d) considered the observed level of clustering, so created a ring lattice structure as described above but only between individuals observed as connected (at least one social link) in the real network, added remaining links (sampled from fourth-order neighbors) and then rewired the edges until the real-world global clustering was observed (~20% rewiring; Extended Data Fig. 9d). These conservative (and informed) null models allowed connections to be arranged differently within the network but maintained the exact same number of individuals, number of social connections and weights of these social connections at each simulation.
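The simplest of these scenarios, the edge null, can be sketched in a few lines. The example below is an illustrative reading of the description (random reallocation of the same weighted edges among all node pairs), with a hypothetical function name and edge layout; the degree, lattice and cluster nulls need additional constraints that are not shown here.

```python
import random
from itertools import combinations


def edge_null(nodes, weighted_edges, rng=None):
    """Reallocate the observed weighted edges to random node pairs.

    nodes: iterable of node labels.
    weighted_edges: list of (a, b, weight) tuples from the observed network.
    Keeps the number of nodes, number of edges and the multiset of weights.
    """
    rng = rng or random.Random()
    all_pairs = list(combinations(sorted(nodes), 2))
    weights = [w for _, _, w in weighted_edges]
    # Sampling without replacement guarantees no duplicated dyads.
    new_pairs = rng.sample(all_pairs, len(weights))
    return [(a, b, w) for (a, b), w in zip(new_pairs, weights)]
```

Calling `edge_null` 1,000 times with different seeds would give one set of replicate networks for this scenario.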

Epidemic model

By building on the epidemiological structure of a previous branching-process model13, we developed a full epidemic model to simulate COVID-19 dynamics across the Haslemere network. Full model parameters are given in Supplementary Table 1. For a given network of individuals, an outbreak is seeded by randomly infecting a given number of individuals (default = 1). The model then moves through daily time steps, with opportunities for infection on each day. All newly infected individuals are assigned an ‘onset time’ drawn from a Weibull distribution (mean = 5.8 days) that determines the point of symptom onset (for symptomatic individuals) and the point at which infectiousness is highest (for all individuals)12. Each individual is then simultaneously assigned asymptomatic status (whether they will develop symptoms at their onset time) and presymptomatic status (whether they will infect others before their assigned onset time), drawn from Bernoulli distributions with defined probabilities (defaults = 0.4 and 0.2, respectively; Supplementary Table 1). At the start of each day, individuals are assigned a status of susceptible, infectious or recovered (which includes deaths) on the basis of their exposure time, onset time and recovery time (calculated as the onset time plus 7 days) and are isolated or quarantined on the basis of their isolation or quarantine time. The model simulates infection dynamics over 70 days.
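A minimal sketch of the per-individual draws described above is given below. The Weibull mean (5.8 days) and the Bernoulli defaults (0.4 and 0.2) come from the text; the shape parameter, the function names and the dictionary layout are assumptions for illustration, and the actual values are those in Supplementary Table 1.

```python
import math
import random


def weibull_scale_for_mean(mean, shape):
    """Scale parameter giving the requested mean for a given Weibull shape."""
    return mean / math.gamma(1 + 1 / shape)


def assign_infection_traits(rng, mean_onset=5.8, shape=2.8,
                            p_asymptomatic=0.4, p_presymptomatic=0.2):
    """Draw onset delay and symptom statuses for a newly infected individual.

    shape=2.8 is a placeholder assumption, not a value from the paper.
    """
    scale = weibull_scale_for_mean(mean_onset, shape)
    onset_delay = rng.weibullvariate(scale, shape)  # (scale, shape) order
    return {
        "onset_delay": onset_delay,              # days from exposure to onset
        "recovery_delay": onset_delay + 7,       # recovery = onset + 7 days
        "asymptomatic": rng.random() < p_asymptomatic,
        "presymptomatic": rng.random() < p_presymptomatic,
    }
```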

Possible infectors are all non-isolated and non-quarantined infectious individuals. Each day, all non-isolated, non-quarantined susceptible contacts of all infectors within the network are at risk of being infected. The transmission rate for a given pair of contacts is modeled as

$$\lambda \left( t,s_i,p_i \right) = A_{s_i} I_{ei} \int_{t - 1}^{t} f\left( u;\mu_i,\alpha_{p_i},\omega_{p_i} \right) du$$


where t is the number of days since infector i was exposed, and \(s_i\) and \(p_i\) are the infector’s symptom statuses (asymptomatic (yes/no) and presymptomatic (yes/no), respectively). \(A_{s_i}\) is the scaling factor for the infector’s symptomatic status (Supplementary Table 1) and \(I_{ei}\) is the weighting of the edge in the network (that is, the number of days observed together) between the infector and susceptible individual. The probability density function \(f\left( u;\mu_i,\alpha_{p_i},\omega_{p_i} \right)\) corresponds to the generation time, which is drawn from a skewed normal distribution (see ref. 13 for details). Briefly, this uses the infector’s onset time as the location parameter \(\mu_i\), while the slant parameter \(\alpha_{p_i}\) and the scale parameter \(\omega_{p_i}\) both vary according to the infector’s presymptomatic transmission status (Supplementary Table 1). This enabled us to simulate a predefined rate of presymptomatic transmission while retaining a correlation structure between onset time and infectiousness, avoiding a scenario in which a large number of individuals were highly infectious on the first day of exposure (see Supplementary Table 1 and Data availability for more details).

When using this transmission rate, the probability of infection within a susceptible–infectious pair of individuals t days after the infector’s exposure time is then modeled as

$$P\left( t,s_i,p_i \right) = 1 - \textrm{e}^{ - \lambda \left( t,s_i,p_i \right)}$$


Note that the change in status from ‘infectious’ to ‘recovered’ at 7 days after symptom onset does not affect infection dynamics (as the transmission rate is ≈0 by 7 days after onset time in our model) but is instead used for contact tracing purposes. To test how the above rate of infection related to the reproduction number R0 and the observed generation times, we generated empirical estimates of the number of secondary infections in the early outbreak stages of the model. We ran 1,000 trial simulations from a random single starting infector and quantified (1) the mean number of secondary infections from this case and (2) the time at which each secondary case was infected. We multiplied the rate of infection by a scaling parameter to obtain a baseline R0 of 2.8, although we also performed sensitivity analysis (Supplementary Table 1). The mean generation time using this method was 6.3 days (median = 6 days). These basic parameters correspond closely to published estimates12,33.
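The daily transmission rate and infection probability defined above can be evaluated numerically as in the following sketch. It assumes a standard skew-normal density and a midpoint-rule integral; the function names and every parameter value used in a call are hypothetical, with the real values given in Supplementary Table 1.

```python
import math


def skew_normal_pdf(u, loc, scale, slant):
    """Density of a skew-normal distribution with the given parameters."""
    z = (u - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    big_phi = 0.5 * (1 + math.erf(slant * z / math.sqrt(2)))
    return 2 / scale * phi * big_phi


def transmission_rate(t, onset, edge_weight, scaling, scale, slant, steps=200):
    """lambda(t) = A * I * integral of f over (t - 1, t), by the midpoint rule.

    onset is the infector's onset time (the location parameter), counted
    in days since exposure like t.
    """
    h = 1.0 / steps
    integral = h * sum(
        skew_normal_pdf(t - 1 + (k + 0.5) * h, onset, scale, slant)
        for k in range(steps)
    )
    return scaling * edge_weight * integral


def infection_probability(rate):
    """P = 1 - exp(-lambda) for one susceptible-infectious pair on one day."""
    return 1 - math.exp(-rate)
```

Because the rate is linear in the edge weight, a dyad observed together on all 3 days has three times the daily rate of a weak link, while the probability saturates through the exponential.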

In addition to the infection rate from within the network, the infection rate from outside the network is also simulated daily by randomly infecting susceptible individuals with a probability of 0.001 (although we also performed sensitivity analysis of this parameter).

We simulated different contact tracing scenarios using contact information from the network, with the aim of evaluating both app-based and manual contact tracing strategies. Primary and secondary contacts of individuals are identified from the network on the day of the infector’s symptom onset, and, as such, contacts of asymptomatic infectors are not traced. Contacts who have already recovered are excluded. Susceptible contacts are traced with a given probability (0.3–0.9 tested; Supplementary Table 1). We assume that this probability captures a wide range of reasons why contacts might not be traced, and it thus acts as an intuitive simplification.
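The tracing step can be sketched as below. This is a simplified illustration with hypothetical names and data layout: it applies one tracing probability to every non-recovered contact, which is an assumption, and it omits tracing delays.

```python
import random


def trace_contacts(adjacency, case, status, p_trace, rng,
                   include_secondary=False):
    """Return the set of contacts traced for a case at symptom onset.

    adjacency: {node: set of neighbours} taken from the contact network.
    status: {node: "susceptible" | "infectious" | "recovered"}.
    """
    primary = {c for c in adjacency[case] if status[c] != "recovered"}
    candidates = set(primary)
    if include_secondary:
        for contact in primary:
            candidates |= {
                c for c in adjacency[contact]
                if c != case and status[c] != "recovered"
            }
    # Sort so that a seeded rng gives reproducible results.
    return {c for c in sorted(candidates) if rng.random() < p_trace}
```

Setting `include_secondary=True` corresponds to tracing contacts of contacts, and `p_trace` would be varied over the 0.3–0.9 range tested.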

The isolation and/or quarantine time of each individual is determined on the basis of their infection status, their symptomatic status, whether they have been traced and the control scenario. We considered four control scenarios: (1) no control, where no individuals are isolated or quarantined; (2) case isolation, where individuals isolate upon symptom onset after a delay period; (3) primary contact tracing with quarantine, where individuals isolate upon symptom onset (after a delay) and traced contacts are quarantined upon their infector’s symptom onset (also after a delay); and (4) secondary contact tracing, as in scenario (3) but including contacts of contacts. All isolated and quarantined individuals are contained for 14 days.

Finally, we simulated a range of testing efforts for SARS-CoV-2. Each individual is assigned a testing time on isolation or quarantine, with the delay between containment and testing sampled from a Weibull distribution. A cap on the number of daily tests is assigned, and each day up to this number of individuals are randomly selected for testing. Test results are dependent on infection and asymptomatic status, with a false-negative rate (that is, the probability that an infectious individual will test negative) of 0.1 (ref. 21) and a false-positive rate (that is, the probability that a susceptible individual will test positive) of 0.02 (ref. 22). Individuals who test negative are immediately released from isolation or quarantine.
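A minimal sketch of the testing step, using the error rates stated above, might look as follows. The function names are hypothetical, and the sketch ignores any dependence of the result on asymptomatic status as well as the delay between containment and testing.

```python
import random


def select_for_testing(queue, daily_cap, rng):
    """Randomly pick up to daily_cap individuals from those awaiting a test."""
    return rng.sample(queue, min(daily_cap, len(queue)))


def simulate_test(is_infectious, rng, false_negative=0.1, false_positive=0.02):
    """Return True for a positive test result, False for a negative one."""
    if is_infectious:
        return rng.random() >= false_negative  # missed with probability 0.1
    return rng.random() < false_positive       # false alarm with probability 0.02
```

Individuals for whom `simulate_test` returns `False` would then be released from isolation or quarantine.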

A set of default parameters was chosen to represent a relatively optimistic model of contact tracing, which included a short time delay between symptom onset/tracing and isolation/quarantine (1–2 days) and a high proportion (90%) of contacts traced within this tracked population (default parameters highlighted in bold in Supplementary Table 1). We assumed that the probability of tracing was constant over time and therefore independent of previous isolation and quarantine events and that all individuals remained in quarantine for the full 14 days, unless released via testing. We performed sensitivity tests on all relevant parameters (Supplementary Table 1). To examine how infection dynamics were affected by network structure, we ran epidemic simulations on each of the null networks described above. We also ran simulations on networks generated using higher distance thresholds (7 m and 16 m) for defining a contact. These networks were 20% and 100% more dense, respectively, and therefore provide an estimate of the robustness of our simulations to missing contacts.

We ran each simulation for 70 days, at which point the majority of new infections came from outside the network, with all scenarios replicated 1,000 times. With the null networks and physical distancing simulations, we ran one replicate simulation on each of 1,000 simulated networks. In no simulations were all individuals in the population infected under our default settings. Therefore, for each simulation, we report the number of cases per week and quantify the total number of cases after 70 days as a measure of outbreak severity. To present the level of isolation and quarantine required under different scenarios, we calculated the number of people contained on each day of the outbreak and averaged this over weeks to obtain weekly changes in the daily rates of isolation and quarantine.
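The weekly summary of daily containment can be illustrated with a short helper. The function name and the list-based input are assumptions for illustration.

```python
def weekly_means(daily_counts, week_length=7):
    """Average a daily series (e.g. number contained per day) over each week."""
    means = []
    for start in range(0, len(daily_counts), week_length):
        week = daily_counts[start:start + week_length]
        means.append(sum(week) / len(week))
    return means
```

For a 70-day simulation this yields 10 weekly values per replicate.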

Physical distancing simulations

We simulated a population-level physical distancing effort, in which a given proportion of the weak links were removed (edges observed on only a single day; Extended Data Fig. 10a–d). This is akin to a simple situation in which individuals reduce their non-regular contacts (for example, with people outside of their household or other frequently visited settings such as workplaces). As further supplementary analysis, we also carried out a more complex physical distancing simulation in which the weak links that were removed were randomly reassigned to existing contacts (Extended Data Fig. 10e–g). This represents a scenario where individuals reduce their non-regular contacts but spend more time with regular contacts.
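The simpler of the two distancing scenarios, removal of a proportion of weak links, can be sketched as follows. The function name and the dictionary representation of the network are hypothetical; the reassignment variant is not shown because its exact rule is not specified in this section.

```python
import random


def remove_weak_links(edges, proportion, rng):
    """Drop a given proportion of weight-1 edges, chosen at random.

    edges: {(a, b): weight}, where weight is the number of days observed
    together (1 = weak link). Stronger links are always kept.
    """
    weak = sorted(pair for pair, weight in edges.items() if weight == 1)
    n_remove = int(proportion * len(weak))
    removed = set(rng.sample(weak, n_remove))
    return {pair: w for pair, w in edges.items() if pair not in removed}
```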

The epidemic model code can be accessed at

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
