Data aggregation and reporting delays

This document describes how to aggregate death registration data while keeping track of reporting delays.

NOTE: the example data used here are simulated, not real.

library(nowcast)
#> nowcast 2024.5.27
#> https://www.csids.no/nowcast/
library(magrittr)

Individual level daily mortality reporting

The raw data typically contain the DOE (date of event, here death) and the DOR (date of registration). They sometimes also contain additional information such as location and age.

data_fake_nowcasting_raw
#>               doe        dor location_code
#>     1: 2018-01-01 2018-01-10         norge
#>     2: 2018-01-01 2018-01-07         norge
#>     3: 2018-01-01 2018-01-05         norge
#>     4: 2018-01-01 2018-01-07         norge
#>     5: 2018-01-01 2018-01-05         norge
#>    ---                                    
#> 83949: 2020-01-01 2020-01-08         norge
#> 83950: 2020-01-01 2020-01-06         norge
#> 83951: 2020-01-01 2020-01-15         norge
#> 83952: 2020-01-01 2020-01-06         norge
#> 83953: 2020-01-01 2020-01-04         norge
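
As a minimal sketch of what these two columns allow, the registration delay in days can be derived by subtracting DOE from DOR. The toy records below are made up for illustration and only mimic the column names of the raw data; delay_days is a hypothetical column name, not something the package is known to create.

```r
# Toy records with the same column names as the raw data (values are made up).
raw <- data.frame(
  doe = as.Date(c("2018-01-01", "2018-01-01", "2018-01-03")),
  dor = as.Date(c("2018-01-10", "2018-01-05", "2018-01-03")),
  location_code = "norge"
)

# Registration delay in days: date of registration minus date of event.
# `delay_days` is a hypothetical column name used only in this sketch.
raw$delay_days <- as.integer(difftime(raw$dor, raw$doe, units = "days"))
raw
```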

Weekly aggregation by location

We need to specify the date at which counting stops (the aggregation date). This is usually the latest date available in the data.
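
To illustrate the idea, the sketch below keeps only the records that would be known at a given aggregation date. It assumes that registrations on the aggregation date itself are included; whether count_weekly_reporting() treats the boundary this way is not stated here, so the comparison operator is an assumption. The records are made up.

```r
aggregation_date <- as.Date("2020-01-01")

# Toy records (values are made up).
toy <- data.frame(
  doe = as.Date(c("2019-12-20", "2019-12-28", "2019-12-30")),
  dor = as.Date(c("2019-12-27", "2020-01-01", "2020-01-06"))
)

# Assumption: a record is known if it was registered on or before the
# aggregation date. The third record is registered later and is dropped.
known <- toy[toy$dor <= aggregation_date, ]
known
```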

The time unit here is the week. Events (deaths) that happen in a given calendar week are not necessarily registered within that same week. We therefore create a data table that counts, for each week, the number of events that:

happened in this week
happened and were registered within this week (lag 0)
happened and were registered within this and the next week (lag 1)
…

The counts are cumulative: the count at each lag includes the registrations from all earlier lags.

These two concepts are different:

- deaths that happened and were registered within the same week
- deaths registered in this week (a mixture of current registrations and delayed registrations from previous weeks)
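
The difference between cumulative counts per event week and registrations per registration week can be sketched with a few made-up records. This is base R written for illustration, not the package's implementation; monday_of() is a hypothetical helper, and weeks are assumed to run from Monday to Sunday.

```r
# Hypothetical helper: Monday of the week containing each date.
monday_of <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

# Toy records (values are made up): four events in the week starting
# 2018-01-01 and one event in the week starting 2018-01-08.
toy <- data.frame(
  doe = as.Date(c("2018-01-01", "2018-01-03", "2018-01-05",
                  "2018-01-07", "2018-01-09")),
  dor = as.Date(c("2018-01-04", "2018-01-10", "2018-01-16",
                  "2018-01-08", "2018-01-12"))
)
toy$week_event <- monday_of(toy$doe)
toy$week_reg   <- monday_of(toy$dor)
toy$lag_weeks  <- as.integer(toy$week_reg - toy$week_event) %/% 7L

# Cumulative counts for events that happened in the week of 2018-01-01.
week1   <- toy[toy$week_event == as.Date("2018-01-01"), ]
n_event <- nrow(week1)
n_cum   <- vapply(0:2, function(k) sum(week1$lag_weeks <= k), integer(1))
names(n_cum) <- paste0("n_cum_lag", 0:2)

# Two different quantities for the week of 2018-01-08:
# all registrations made in that week, whatever the event week ...
n_registered_week2 <- sum(toy$week_reg == as.Date("2018-01-08"))
# ... versus events that both happened and were registered in that week.
n_same_week2 <- sum(toy$week_event == as.Date("2018-01-08") &
                      toy$lag_weeks == 0L)

n_event
n_cum
c(registered = n_registered_week2, same_week = n_same_week2)
```

In this toy data the week of 2018-01-08 receives three registrations, but only one of them belongs to an event from that same week; the other two are delayed registrations from the previous week.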

For convenience, delays are counted in whole weeks only. For example, if an event happened on a Sunday and was registered on the following Monday, it is counted with a delay of one week (registered in the next week), not a delay of one day.
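
The following sketch contrasts the delay in days with the delay in weeks for two made-up records. It assumes weeks run from Monday to Sunday; monday_of() and week_delay() are hypothetical helpers written for this illustration and are not part of the package.

```r
# Hypothetical helper: Monday of the week containing each date.
monday_of <- function(d) d - (as.POSIXlt(d)$wday + 6) %% 7

# Hypothetical helper: registration delay counted in whole calendar weeks.
week_delay <- function(doe, dor) {
  as.integer(monday_of(dor) - monday_of(doe)) %/% 7L
}

doe <- as.Date(c("2018-01-07", "2018-01-01"))  # a Sunday, a Monday
dor <- as.Date(c("2018-01-08", "2018-01-07"))  # the next Monday, the same week's Sunday

out <- data.frame(
  doe,
  dor,
  delay_days  = as.integer(dor - doe),
  delay_weeks = week_delay(doe, dor)
)
out
```

The first record has a delay of one day but a week delay of one, while the second has a delay of six days but a week delay of zero, because both of its dates fall in the same week.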

In a future version, we might investigate this in more detail.

Without weekly percentile

weekly_counts <- count_weekly_reporting(
  data = data_fake_nowcasting_raw, 
  aggregation_date = as.Date('2020-01-01'), 
  max_week_delay = 4)
weekly_counts
#>      isoyearweek_event location_code monday_doe n_event n_cum_lag0 n_cum_lag1
#>   1:           2018-01         norge 2018-01-01     809        120        697
#>   2:           2018-02         norge 2018-01-08     806        118        707
#>   3:           2018-03         norge 2018-01-15     848        122        709
#>   4:           2018-04         norge 2018-01-22     782        133        662
#>   5:           2018-05         norge 2018-01-29     806        106        695
#>  ---                                                                         
#> 100:           2019-48         norge 2019-11-25     789        116        664
#> 101:           2019-49         norge 2019-12-02     778        126        680
#> 102:           2019-50         norge 2019-12-09     846        142        737
#> 103:           2019-51         norge 2019-12-16     677        113        677
#> 104:           2019-52         norge 2019-12-23     135        135        135
#>      n_cum_lag2 n_cum_lag3 n_cum_lag4
#>   1:        809        809        809
#>   2:        806        806        806
#>   3:        844        848        848
#>   4:        782        782        782
#>   5:        805        806        806
#>  ---                                 
#> 100:        789        789        789
#> 101:        778        778        778
#> 102:        846        846        846
#> 103:        677        677        677
#> 104:        135        135        135

With weekly percentile

If desired, the weekly percentile (proportion) of cumulative registrations can be computed by setting keep_weekly_prob. This is useful for exploratory analysis, as it shows what proportion of the events in a week has been registered by each lag.

weekly_counts_p <- count_weekly_reporting(
  data = data_fake_nowcasting_raw, 
  aggregation_date = as.Date('2020-01-01'), 
  max_week_delay = 4, 
  keep_weekly_prob = TRUE)
weekly_counts_p
#>      isoyearweek_event location_code monday_doe n_event n_cum_lag0 n_cum_lag1
#>   1:           2018-01         norge 2018-01-01     809        120        697
#>   2:           2018-02         norge 2018-01-08     806        118        707
#>   3:           2018-03         norge 2018-01-15     848        122        709
#>   4:           2018-04         norge 2018-01-22     782        133        662
#>   5:           2018-05         norge 2018-01-29     806        106        695
#>  ---                                                                         
#> 100:           2019-48         norge 2019-11-25     789        116        664
#> 101:           2019-49         norge 2019-12-02     778        126        680
#> 102:           2019-50         norge 2019-12-09     846        142        737
#> 103:           2019-51         norge 2019-12-16     677        113        677
#> 104:           2019-52         norge 2019-12-23     135        135        135
#>      n_cum_lag2 n_cum_lag3 n_cum_lag4 p_cum_lag0 p_cum_lag1 p_cum_lag2
#>   1:        809        809        809  0.1483313  0.8615575  1.0000000
#>   2:        806        806        806  0.1464020  0.8771712  1.0000000
#>   3:        844        848        848  0.1438679  0.8360849  0.9952830
#>   4:        782        782        782  0.1700767  0.8465473  1.0000000
#>   5:        805        806        806  0.1315136  0.8622829  0.9987593
#>  ---                                                                  
#> 100:        789        789        789  0.1470215  0.8415716  1.0000000
#> 101:        778        778        778  0.1619537  0.8740360  1.0000000
#> 102:        846        846        846  0.1678487  0.8711584  1.0000000
#> 103:        677        677        677  0.1669129  1.0000000  1.0000000
#> 104:        135        135        135  1.0000000  1.0000000  1.0000000
#>      p_cum_lag3 p_cum_lag4
#>   1:          1          1
#>   2:          1          1
#>   3:          1          1
#>   4:          1          1
#>   5:          1          1
#>  ---                      
#> 100:          1          1
#> 101:          1          1
#> 102:          1          1
#> 103:          1          1
#> 104:          1          1
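
The printed proportions are consistent with dividing each cumulative count by n_event for the same week. The sketch below checks this on a few rows copied from the output above; it is a plausibility check in base R, not a description of how the package computes the columns internally.

```r
# Counts copied from the first three rows and the last row of the output above.
counts <- data.frame(
  isoyearweek_event = c("2018-01", "2018-02", "2018-03", "2019-52"),
  n_event    = c(809, 806, 848, 135),
  n_cum_lag0 = c(120, 118, 122, 135),
  n_cum_lag1 = c(697, 707, 709, 135)
)

# Proportion of a week's events registered by each lag.
counts$p_cum_lag0 <- counts$n_cum_lag0 / counts$n_event
counts$p_cum_lag1 <- counts$n_cum_lag1 / counts$n_event
counts
```

Note that the last week (2019-52) reaches a proportion of 1 already at lag 0. This appears to happen because n_event for the most recent weeks only contains events registered up to the aggregation date, so proportions close to the aggregation date should be read with caution.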

Old aggregation data

The old version of the weekly aggregation is kept here for reference.

data_fake_nowcasting_aggregated
#>         cut_doe n_death n0_0  p0_0 n0_1        p0_1 n0_2      p0_2 n0_3
#>   1: 2018-01-01    1000    1 0.001   13 0.013000000  255 0.2550000  765
#>   2: 2018-01-08    1065    0 0.000   13 0.012206573  245 0.2300469  816
#>   3: 2018-01-15    1197    0 0.000   15 0.012531328  283 0.2364244  921
#>   4: 2018-01-22    1191    0 0.000   19 0.015952981  281 0.2359362  910
#>   5: 2018-01-29    1272    0 0.000   11 0.008647799  311 0.2444969  968
#>  ---                                                                   
#>  99: 2019-11-18     709    0 0.000    8 0.011283498  162 0.2284908  525
#> 100: 2019-11-25     781    0 0.000   12 0.015364917  211 0.2701665  601
#> 101: 2019-12-02     695    0 0.000    9 0.012949640  208 0.2992806   NA
#> 102: 2019-12-09     320    0 0.000   11 0.034375000   NA        NA   NA
#> 103: 2019-12-16      16    0 0.000   NA          NA   NA        NA   NA
#>           p0_3 n0_4      p0_4 n0_5      p0_5 n0_6 p0_6 n0_7 p0_7 n0_8 p0_8 n0_9
#>   1: 0.7650000  977 0.9770000 1000 1.0000000 1000    1 1000    1 1000    1 1000
#>   2: 0.7661972 1037 0.9737089 1065 1.0000000 1065    1 1065    1 1065    1 1065
#>   3: 0.7694236 1170 0.9774436 1197 1.0000000 1197    1 1197    1 1197    1 1197
#>   4: 0.7640638 1161 0.9748111 1190 0.9991604 1191    1 1191    1 1191    1 1191
#>   5: 0.7610063 1246 0.9795597 1270 0.9984277 1272    1 1272    1 1272    1 1272
#>  ---                                                                           
#>  99: 0.7404795  686 0.9675599   NA        NA   NA   NA   NA   NA   NA   NA   NA
#> 100: 0.7695262   NA        NA   NA        NA   NA   NA   NA   NA   NA   NA   NA
#> 101:        NA   NA        NA   NA        NA   NA   NA   NA   NA   NA   NA   NA
#> 102:        NA   NA        NA   NA        NA   NA   NA   NA   NA   NA   NA   NA
#> 103:        NA   NA        NA   NA        NA   NA   NA   NA   NA   NA   NA   NA
#>      p0_9 n0_10 p0_10 n0_11 p0_11 n0_12 p0_12 n0_13 p0_13 n0_14 p0_14 week year
#>   1:    1  1000     1  1000     1  1000     1  1000     1  1000     1    1 2018
#>   2:    1  1065     1  1065     1  1065     1  1065     1  1065     1    2 2018
#>   3:    1  1197     1  1197     1  1197     1  1197     1  1197     1    3 2018
#>   4:    1  1191     1  1191     1  1191     1  1191     1  1191     1    4 2018
#>   5:    1  1272     1  1272     1  1272     1  1272     1  1272     1    5 2018
#>  ---                                                                           
#>  99:   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   47 2019
#> 100:   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   48 2019
#> 101:   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   49 2019
#> 102:   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   50 2019
#> 103:   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   51 2019
#>          pop
#>   1: 5238654
#>   2: 5238654
#>   3: 5238654
#>   4: 5238654
#>   5: 5238654
#>  ---        
#>  99: 5272821
#> 100: 5272821
#> 101: 5272821
#> 102: 5272821
#> 103: 5272821

Another simulated dataset (national level) looks like this. It is kept for now because fit_attrib() requires it as an example.

data_fake_nation
#>      week    season year    yrwk    x location_code     pop pr100_ili
#>   1:    1 2009/2010 2010 2010-01 24.0         norge 5367580  1.160403
#>   2:    1 2010/2011 2011 2011-01 24.0         norge 5367580  1.164511
#>   3:    1 2011/2012 2012 2012-01 24.0         norge 5367580  1.472767
#>   4:    1 2012/2013 2013 2013-01 24.0         norge 5367580  1.435822
#>   5:    1 2013/2014 2014 2014-01 24.0         norge 5367580  1.402403
#>  ---                                                                 
#> 570:   52 2017/2018 2017 2017-52 23.0         norge 5367580  1.270134
#> 571:   52 2018/2019 2018 2018-52 23.0         norge 5367580  1.028794
#> 572:   52 2019/2020 2019 2019-52 23.0         norge 5367580  1.236302
#> 573:   53 2009/2010 2009 2009-53 23.5         norge 5367580  1.100072
#> 574:   53 2015/2016 2015 2015-53 23.5         norge 5367580  1.269054
#>      pr100_ili_lag_1 temperature_high deaths
#>   1:       1.1000719                0    891
#>   2:       0.9623231                0    874
#>   3:       1.3356371                0    854
#>   4:       1.2320998                0    885
#>   5:       1.2514772                0    904
#>  ---                                        
#> 570:       1.0938242                0    854
#> 571:       0.8534488                0    893
#> 572:       1.0609100                0    829
#> 573:       1.0387242                0    880
#> 574:       1.1645116                0    912

Chi Zhang

2022-02-23
