LMBA Vannes Seminar

On
At 11:30 a.m.
LMBA, Vannes site, room F196
Vannes

Valérie Garès (INSA Rennes)

Title: Record linkage and analysis of linked data with application to the French national health data system

Abstract: The French National Health Data System (SNDS, Système National des Données de Santé) collects the longitudinal health records and insurance information of most of the French population. These data can be used to enrich other existing databases (cohorts, health registries, etc.), which makes it possible to obtain more comprehensive medical information on each patient and thus to improve the subsequent statistical analysis. However, patients in the SNDS and in health databases are usually anonymised, and no unique patient identifier is available to match the databases. Fellegi and Sunter (1969) proposed a probabilistic record linkage method based on the fact that we usually have access to some "matching variables", which serve as partial identifiers common to both databases (e.g., gender, postal code, treatment dates). These variables make it possible to calculate "matching probabilities" for each pair of patients taken from the SNDS and the health registry of interest. The Fellegi and Sunter model is limited to simple binary comparisons between matching variables. In our first work, we proposed an extension of this model to mixed-type comparison vectors. We developed a mixture model for handling comparison values of low-prevalence categorical matching variables, and a mixture of hurdle gamma distributions for handling comparison values of continuous matching variables.
In a second work, we proposed models for survival analysis with matched data. Indeed, perfect matching is never achieved, and neglecting the associated errors can lead to biased estimates. In this work, we proposed an adjusted estimating equation for secondary Cox regression analysis, where the linked data have been prepared by someone else and no information on the matching variables is available to the analyst. Finally, the analyst may have access to the matching probabilities, which convey the uncertainty of the matching process; this uncertainty must be taken into account in any subsequent statistical analysis. For this setting, we proposed a new method that accounts for these errors in a survival analysis based on the Cox model. This method relies on the well-known EM algorithm for estimation in a missing-data context. The proposed models are applied to a survival analysis of linked data between a registry of patients suffering from venous thromboembolism in Brest and the SNDS.
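
As a rough illustration of the "matching probabilities" mentioned in the abstract, the sketch below computes the posterior probability that a record pair is a true match under the classical Fellegi and Sunter setting with binary comparisons. It assumes conditional independence between matching variables; the function name, the parameters `m`, `u` and `prior`, and all numerical values are hypothetical and chosen for illustration only. It does not implement the mixed-type extension presented in the talk.

```python
from math import prod


def match_probability(gamma, m, u, prior):
    """Posterior probability that a record pair is a true match.

    gamma: binary comparison vector (1 = agree, 0 = disagree)
    m[k]:  P(agree on variable k | pair is a match)
    u[k]:  P(agree on variable k | pair is not a match)
    prior: assumed proportion of true matches among candidate pairs
    """
    # Likelihood of the comparison vector under each hypothesis,
    # assuming the matching variables are conditionally independent.
    like_match = prod(mk if g else 1 - mk for g, mk in zip(gamma, m))
    like_non_match = prod(uk if g else 1 - uk for g, uk in zip(gamma, u))
    # Bayes' rule.
    numerator = prior * like_match
    return numerator / (numerator + (1 - prior) * like_non_match)


# Hypothetical values for three matching variables
# (e.g., gender, postal code, treatment date).
m = [0.95, 0.90, 0.85]
u = [0.50, 0.05, 0.01]
prior = 0.001

p_all_agree = match_probability([1, 1, 1], m, u, prior)
p_gender_only = match_probability([1, 0, 0], m, u, prior)
```

With these illustrative values, a pair agreeing on all three variables receives a much higher matching probability than a pair agreeing on gender alone, which is the uncertainty that a subsequent analysis has to account for.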

Joint work with Vanessa Chezeu, Huan Vo Tanh, Guillaume Chauvet and Jean-François Dupuy.

Vo T.H., Gares V., Zhang L-C., Happe A., Oger E., Paquelet S. and Chauvet G. Cox regression with linked data. Statistics in Medicine. 43(2), pp. 296-314, 2023.

Vo T.H., Chauvet G., Happe A., Oger E., Paquelet S. and Gares V. Extending the Fellegi-Sunter record linkage model for mixed-type data with application to the French national health data system. Computational Statistics and Data Analysis. 79, pp. 107656, 2023.