Tabulated life

Posted on 2023-03-27 by Karl Pettersson. Tags:

Last week, official life tables for Sweden 2022 were released (Statistics Sweden 2023a). According to these, female life expectancy at birth decreased from 84.82 years 2021 to 84.73 years, while male life expectancy increased from 81.21 to 81.34 years. These numbers are the same as 2019, and 0.01 years lower than what I estimated in my post 26 February. These slight differences may be due to the fact that I used 1-years age groups for ages 0–99 and an open interval for ages ≥100, according to data available from Statistics Sweden (2023b) and Statistics Sweden (2023c), while the official life tables uses 1-year age groups for ages below 110, with a model for smoothing death risks in the oldest age groups.

For my estimates, I used my Julia LifeTable.jl package. During recent weeks, I have also asked the much publicised large language models ChatGPT/GPT-3.5 and GPT-4 for period life tables in Haskell: a language that is, in many ways, well-suited to manipulating data tables but has not been very widely used in that area, compared to languages such as R, Python, and Julia. I have shared one GPT-4 attempt via Poe. The output may be useful for getting started with defining basic types, and sensible names, but is not functional without major changes (even though the chatbot presents it as such, with its usual self-assurance).

First, the code will not compile as it is, because of some type errors. More importantly, the function computeLifeExpectancies, which takes a list of pairs of ages and death rates as input, tries to calculate only the lx column in the life table, which gives the number or proportion of the population surviving to a certain age, which the names in the code, and the explanations from the chatbot wrongly describes as life expectancy. Not to mention that the lx calculation itself is incorrect. Earlier outputs from the old ChatGPT were somewhat better, but did not produce any calculations of full life tables either.

But what could then a Haskell life table calculator, at least capable of reproducing the estimates in my February post, look like? I made an attempt, and here are the central type definitions and functions.

import Data.List

type Age = Int
type DeathCount = Int
type Rate = Double
type Probability = Double
type LifeTime = Double

data DeathsMpop = DeathsMpop { age :: Age
                             , deaths :: DeathCount
                             , mpop :: LifeTime
                             } deriving Show

data LifeTableRow = LifeTableRow { x :: Age
                                 , mx :: Rate
                                 , qx :: Probability
                                 , lx :: Probability
                                 , dx :: Probability
                                 , ex :: LifeTime 
                                 } deriving Show

periodLifeTable :: [DeathsMpop] -> [LifeTableRow]
periodLifeTable dthspop = zipWith6 LifeTableRow x' mx' qx' lx' dx' ex'
    where x' = age <$> dthspop 
          dt = deaths <$> dthspop
          mp = mpop <$> dthspop
          mx' = zipWith (\d p -> (fromIntegral d) / p) dt mp
          qx' = (init $ (\m -> m / (1 + 0.5 * m)) <$> mx') ++ [1.0]
          px' = (\q -> 1 - q) <$> qx'
          lx' = tail $ product <$> inits (1.0:init px')
          dx' = zipWith (*) qx' lx'
          ldend = (last lx') * 1/(last mx')
          ldx' = (init $ zipWith (\l d -> l - 0.5 * d) lx' dx') ++ [ldend]
          tx' = init $ sum <$> tails ldx'
          ex' = zipWith (/) tx' lx'

The function periodLifeTable takes a list of records with fields for age, number of deaths and mean population, and outputs a life table in the form of a list of records with fields corresponding to standard life table columns: age (x), mortality rate (mx), probability of death (qx), survival up to an age (lx), proportion of the (synthetic) cohort dying at an age (dx), and remaining life expectancy (ex). The calculations should be equivalent to those used in LifeTable.jl, except that it is assumed that risk of death is related to mortality rate by \(\mathrm{q}(x)=\frac{\mathrm{m}(x)}{1+\mathrm{m}(x)/2}\), for all closed age groups \(x\), instead of using special calculations for infant mortality. In countries with very low infant mortality, like modern Sweden, this makes no significant difference.

The subdirectory postdata/2023-03-27-table in the blog repository contains the file lt.hs which implements the functions above in a program that, using the cassava library, reads tab-separated values with age, deaths and population from standard input, and writes the calculated life table as tab-separated values to standard output.

After compilation in this directory, the lt program may be run on the files data/dthsmpop_f22.tsv and data/dthsmpop_m22.tsv, with data on female and male deaths and population in Sweden 2022, from Statistics Sweden (2023b) and Statistics Sweden (2023c).

$ ./lt < data/dthsmpop_f22.tsv > data/ltf22.tsv
$ ./lt < data/dthsmpop_m22.tsv > data/ltm22.tsv

This should produce life tables like data/ltf22.tsv and data/ltm22.tsv where the corresponding columns are close to the estimates made with LifeTable.jl in the files with the same names in the directory postdata/2023-02-26-2019/data.

References

Statistics Sweden. 2023b. “Mean population by region, marital status, age and sex.” https://www.statistikdatabasen.scb.se/goto/en/ssd/MedelfolkHandelse.
———. 2023a. “Life table by sex and age.” https://www.statistikdatabasen.scb.se/goto/en/ssd/LivslangdEttariga.
———. 2023c. “Deaths by region, age (during the year) and sex.” https://www.statistikdatabasen.scb.se/goto/en/ssd/DodaHandelseK.