Data aggregation and reporting delays
Chi Zhang
2022-02-23
Source: vignettes/aggregation_and_delays.Rmd
This document describes how death registration data are aggregated.
NOTE: the example data here are simulated, not real.
Individual level daily mortality reporting
The raw data typically contain the DOE (date of event/death) and the DOR (date of registration). Sometimes they also contain additional information such as location and age.
data_fake_nowcasting_raw
#> doe dor location_code
#> 1: 2018-01-01 2018-01-10 norge
#> 2: 2018-01-01 2018-01-07 norge
#> 3: 2018-01-01 2018-01-05 norge
#> 4: 2018-01-01 2018-01-07 norge
#> 5: 2018-01-01 2018-01-05 norge
#> ---
#> 83949: 2020-01-01 2020-01-08 norge
#> 83950: 2020-01-01 2020-01-06 norge
#> 83951: 2020-01-01 2020-01-15 norge
#> 83952: 2020-01-01 2020-01-06 norge
#> 83953: 2020-01-01 2020-01-04 norge
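As a minimal sketch of how the reporting delay follows from these two columns, the delay in days can be taken as DOR minus DOE. The three rows below are copied from the first records printed above; the column name `delay_days` is an assumption for illustration and not part of the package.

```r
# Toy rows copied from the first records printed above
raw <- data.frame(
  doe = as.Date(c("2018-01-01", "2018-01-01", "2018-01-01")),
  dor = as.Date(c("2018-01-10", "2018-01-07", "2018-01-05")),
  location_code = "norge",
  stringsAsFactors = FALSE
)

# Registration delay in days: date of registration minus date of event
# (delay_days is a hypothetical column name)
raw$delay_days <- as.integer(raw$dor - raw$doe)
raw
```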
Weekly aggregation by location
We need to specify an aggregation date, the time point at which we stop counting. This is usually the latest date available.
The time unit here is the week. Events (deaths) that happened in a given calendar week are not necessarily registered within the same week. Therefore we create a data table that counts, for each week, the number of events that:
- happened in this week
- happened and were registered within this week (lag 0)
- happened and were registered within this and the next week (lag 1)
- …
The numbers are cumulative.
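To illustrate what cumulative means here, the following sketch counts, for the deaths of a single event week, how many were registered with a delay of at most k weeks. The week delays are made-up values, and the code is only an illustration of the idea, not the implementation used by count_weekly_reporting().

```r
# Hypothetical week delays (0 = registered in the week of death) for the
# deaths of one event week; these values are made up for illustration
week_delay <- c(0, 0, 1, 1, 1, 2, 4)

max_week_delay <- 4

# Element k + 1 holds the number of deaths registered with a delay of
# at most k weeks, so the counts can never decrease with the lag
n_cum_lag <- vapply(
  0:max_week_delay,
  function(k) sum(week_delay <= k),
  integer(1)
)
names(n_cum_lag) <- paste0("n_cum_lag", 0:max_week_delay)
n_cum_lag
```

Because each count includes all earlier lags, the last column equals the total number of events once every death has been registered.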
These two concepts are different:
- deaths that happened and were registered within the same week
- deaths registered in this week (a mixture of current registrations and delayed registrations from previous weeks)
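The difference between the two counts can be seen on a small example. This is a sketch under assumptions: the records are made up, and `monday_of()` is a hypothetical helper that maps a date to the Monday of its week.

```r
# Monday of the week containing each date (%u: ISO weekday, Monday = 1);
# monday_of() is a hypothetical helper, not a package function
monday_of <- function(d) d - (as.integer(format(d, "%u")) - 1L)

# Hypothetical records spanning two weeks (made up for illustration)
toy <- data.frame(
  doe = as.Date(c("2018-01-02", "2018-01-03", "2018-01-05", "2018-01-09")),
  dor = as.Date(c("2018-01-04", "2018-01-10", "2018-01-12", "2018-01-11"))
)
toy$week_doe <- monday_of(toy$doe)
toy$week_dor <- monday_of(toy$dor)

target <- as.Date("2018-01-08")

# Concept 1: deaths that happened AND were registered in the target week
same_week <- sum(toy$week_doe == target & toy$week_dor == target)

# Concept 2: all deaths registered in the target week, whenever they happened
registered_in_week <- sum(toy$week_dor == target)

c(same_week = same_week, registered_in_week = registered_in_week)
```

In this toy data only one death both happened and was registered in the week starting 2018-01-08, while three registrations arrived in that week, two of them delayed from the week before.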
For convenience, delays are counted in weeks only: if an event happened on a Sunday and was registered on the following Monday, it is counted with a delay of one week (registered in the next week) rather than a delay of one day.
A future version might investigate this in more detail.
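The Sunday-to-Monday case can be sketched as follows. The helper `monday_of()` and the variable names are assumptions for illustration; the package may compute the week delay differently.

```r
# Monday of the week containing each date (%u: ISO weekday, Monday = 1);
# monday_of() is a hypothetical helper, not a package function
monday_of <- function(d) d - (as.integer(format(d, "%u")) - 1L)

doe <- as.Date("2018-01-07")  # a Sunday
dor <- as.Date("2018-01-08")  # the following Monday

# Delay measured in days
delay_days <- as.integer(dor - doe)

# Delay measured in weeks: difference between the Mondays of the two weeks
delay_weeks <- as.integer(monday_of(dor) - monday_of(doe)) %/% 7L

c(delay_days = delay_days, delay_weeks = delay_weeks)
```

Both delays equal 1 here, but in different units: the registration came one day later and still falls into the lag 1 column.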
Without weekly percentile
weekly_counts <- count_weekly_reporting(
  data = data_fake_nowcasting_raw,
  aggregation_date = as.Date('2020-01-01'),
  max_week_delay = 4)
weekly_counts
#> isoyearweek_event location_code monday_doe n_event n_cum_lag0 n_cum_lag1
#> 1: 2018-01 norge 2018-01-01 809 120 697
#> 2: 2018-02 norge 2018-01-08 806 118 707
#> 3: 2018-03 norge 2018-01-15 848 122 709
#> 4: 2018-04 norge 2018-01-22 782 133 662
#> 5: 2018-05 norge 2018-01-29 806 106 695
#> ---
#> 100: 2019-48 norge 2019-11-25 789 116 664
#> 101: 2019-49 norge 2019-12-02 778 126 680
#> 102: 2019-50 norge 2019-12-09 846 142 737
#> 103: 2019-51 norge 2019-12-16 677 113 677
#> 104: 2019-52 norge 2019-12-23 135 135 135
#> n_cum_lag2 n_cum_lag3 n_cum_lag4
#> 1: 809 809 809
#> 2: 806 806 806
#> 3: 844 848 848
#> 4: 782 782 782
#> 5: 805 806 806
#> ---
#> 100: 789 789 789
#> 101: 778 778 778
#> 102: 846 846 846
#> 103: 677 677 677
#> 104: 135 135 135
With weekly percentile
If desired, the weekly proportion of cumulative registrations (the p_cum_lag columns) can be computed by setting keep_weekly_prob. This is useful for exploratory analysis, as it shows what share of the events has been registered by each lag.
weekly_counts_p <- count_weekly_reporting(
  data = data_fake_nowcasting_raw,
  aggregation_date = as.Date('2020-01-01'),
  max_week_delay = 4,
  keep_weekly_prob = TRUE)
weekly_counts_p
#> isoyearweek_event location_code monday_doe n_event n_cum_lag0 n_cum_lag1
#> 1: 2018-01 norge 2018-01-01 809 120 697
#> 2: 2018-02 norge 2018-01-08 806 118 707
#> 3: 2018-03 norge 2018-01-15 848 122 709
#> 4: 2018-04 norge 2018-01-22 782 133 662
#> 5: 2018-05 norge 2018-01-29 806 106 695
#> ---
#> 100: 2019-48 norge 2019-11-25 789 116 664
#> 101: 2019-49 norge 2019-12-02 778 126 680
#> 102: 2019-50 norge 2019-12-09 846 142 737
#> 103: 2019-51 norge 2019-12-16 677 113 677
#> 104: 2019-52 norge 2019-12-23 135 135 135
#> n_cum_lag2 n_cum_lag3 n_cum_lag4 p_cum_lag0 p_cum_lag1 p_cum_lag2
#> 1: 809 809 809 0.1483313 0.8615575 1.0000000
#> 2: 806 806 806 0.1464020 0.8771712 1.0000000
#> 3: 844 848 848 0.1438679 0.8360849 0.9952830
#> 4: 782 782 782 0.1700767 0.8465473 1.0000000
#> 5: 805 806 806 0.1315136 0.8622829 0.9987593
#> ---
#> 100: 789 789 789 0.1470215 0.8415716 1.0000000
#> 101: 778 778 778 0.1619537 0.8740360 1.0000000
#> 102: 846 846 846 0.1678487 0.8711584 1.0000000
#> 103: 677 677 677 0.1669129 1.0000000 1.0000000
#> 104: 135 135 135 1.0000000 1.0000000 1.0000000
#> p_cum_lag3 p_cum_lag4
#> 1: 1 1
#> 2: 1 1
#> 3: 1 1
#> 4: 1 1
#> 5: 1 1
#> ---
#> 100: 1 1
#> 101: 1 1
#> 102: 1 1
#> 103: 1 1
#> 104: 1 1
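The p_cum_lag values printed above are consistent with dividing each cumulative count by n_event. The sketch below reproduces this for the first two rows of the table, using only numbers shown in the output; it illustrates the relationship and is not necessarily how the package computes the columns.

```r
# First two rows of the table printed above
counts <- data.frame(
  n_event    = c(809, 806),
  n_cum_lag0 = c(120, 118),
  n_cum_lag1 = c(697, 707),
  n_cum_lag2 = c(809, 806)
)

lag_cols <- c("n_cum_lag0", "n_cum_lag1", "n_cum_lag2")

# Share of each week's events registered by each lag
p <- counts[lag_cols] / counts$n_event
names(p) <- sub("^n_", "p_", lag_cols)
round(p, 7)
```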
Old aggregation data
The old version of the weekly aggregation is kept here for reference.
data_fake_nowcasting_aggregated
#> cut_doe n_death n0_0 p0_0 n0_1 p0_1 n0_2 p0_2 n0_3
#> 1: 2018-01-01 1000 1 0.001 13 0.013000000 255 0.2550000 765
#> 2: 2018-01-08 1065 0 0.000 13 0.012206573 245 0.2300469 816
#> 3: 2018-01-15 1197 0 0.000 15 0.012531328 283 0.2364244 921
#> 4: 2018-01-22 1191 0 0.000 19 0.015952981 281 0.2359362 910
#> 5: 2018-01-29 1272 0 0.000 11 0.008647799 311 0.2444969 968
#> ---
#> 99: 2019-11-18 709 0 0.000 8 0.011283498 162 0.2284908 525
#> 100: 2019-11-25 781 0 0.000 12 0.015364917 211 0.2701665 601
#> 101: 2019-12-02 695 0 0.000 9 0.012949640 208 0.2992806 NA
#> 102: 2019-12-09 320 0 0.000 11 0.034375000 NA NA NA
#> 103: 2019-12-16 16 0 0.000 NA NA NA NA NA
#> p0_3 n0_4 p0_4 n0_5 p0_5 n0_6 p0_6 n0_7 p0_7 n0_8 p0_8 n0_9
#> 1: 0.7650000 977 0.9770000 1000 1.0000000 1000 1 1000 1 1000 1 1000
#> 2: 0.7661972 1037 0.9737089 1065 1.0000000 1065 1 1065 1 1065 1 1065
#> 3: 0.7694236 1170 0.9774436 1197 1.0000000 1197 1 1197 1 1197 1 1197
#> 4: 0.7640638 1161 0.9748111 1190 0.9991604 1191 1 1191 1 1191 1 1191
#> 5: 0.7610063 1246 0.9795597 1270 0.9984277 1272 1 1272 1 1272 1 1272
#> ---
#> 99: 0.7404795 686 0.9675599 NA NA NA NA NA NA NA NA NA
#> 100: 0.7695262 NA NA NA NA NA NA NA NA NA NA NA
#> 101: NA NA NA NA NA NA NA NA NA NA NA NA
#> 102: NA NA NA NA NA NA NA NA NA NA NA NA
#> 103: NA NA NA NA NA NA NA NA NA NA NA NA
#> p0_9 n0_10 p0_10 n0_11 p0_11 n0_12 p0_12 n0_13 p0_13 n0_14 p0_14 week year
#> 1: 1 1000 1 1000 1 1000 1 1000 1 1000 1 1 2018
#> 2: 1 1065 1 1065 1 1065 1 1065 1 1065 1 2 2018
#> 3: 1 1197 1 1197 1 1197 1 1197 1 1197 1 3 2018
#> 4: 1 1191 1 1191 1 1191 1 1191 1 1191 1 4 2018
#> 5: 1 1272 1 1272 1 1272 1 1272 1 1272 1 5 2018
#> ---
#> 99: NA NA NA NA NA NA NA NA NA NA NA 47 2019
#> 100: NA NA NA NA NA NA NA NA NA NA NA 48 2019
#> 101: NA NA NA NA NA NA NA NA NA NA NA 49 2019
#> 102: NA NA NA NA NA NA NA NA NA NA NA 50 2019
#> 103: NA NA NA NA NA NA NA NA NA NA NA 51 2019
#> pop
#> 1: 5238654
#> 2: 5238654
#> 3: 5238654
#> 4: 5238654
#> 5: 5238654
#> ---
#> 99: 5272821
#> 100: 5272821
#> 101: 5272821
#> 102: 5272821
#> 103: 5272821
Another simulated dataset, at the national level, looks like this. It is kept for now because fit_attrib() requires it as an example.
data_fake_nation
#> week season year yrwk x location_code pop pr100_ili
#> 1: 1 2009/2010 2010 2010-01 24.0 norge 5367580 1.160403
#> 2: 1 2010/2011 2011 2011-01 24.0 norge 5367580 1.164511
#> 3: 1 2011/2012 2012 2012-01 24.0 norge 5367580 1.472767
#> 4: 1 2012/2013 2013 2013-01 24.0 norge 5367580 1.435822
#> 5: 1 2013/2014 2014 2014-01 24.0 norge 5367580 1.402403
#> ---
#> 570: 52 2017/2018 2017 2017-52 23.0 norge 5367580 1.270134
#> 571: 52 2018/2019 2018 2018-52 23.0 norge 5367580 1.028794
#> 572: 52 2019/2020 2019 2019-52 23.0 norge 5367580 1.236302
#> 573: 53 2009/2010 2009 2009-53 23.5 norge 5367580 1.100072
#> 574: 53 2015/2016 2015 2015-53 23.5 norge 5367580 1.269054
#> pr100_ili_lag_1 temperature_high deaths
#> 1: 1.1000719 0 891
#> 2: 0.9623231 0 874
#> 3: 1.3356371 0 854
#> 4: 1.2320998 0 885
#> 5: 1.2514772 0 904
#> ---
#> 570: 1.0938242 0 854
#> 571: 0.8534488 0 893
#> 572: 1.0609100 0 829
#> 573: 1.0387242 0 880
#> 574: 1.1645116 0 912