GESIS-Workshop Synthetic Data: Generation and Evaluation

Dates:

Lecturer(s): Thom Volker

Course content

In the current age of open science, publishing a scientific paper often requires sharing the research code and data. Moreover, openly available research data is a potential gold mine for answering many new research questions. However, privacy and confidentiality constraints often prevent researchers from sharing their data openly. Synthetic data can be an excellent solution to this problem: the real data is kept confidential, while an artificial ("fake") version that mimics its structure and statistical properties is released instead. Such a synthetic dataset can serve many purposes. For example, it lets researchers who are still applying for access to the real dataset become familiar with its structure, and it lets reviewers (or other researchers) rerun scripts to check whether the original analysis code is reproducible and runs as intended. The synthetic data can also be used for entirely different analyses, unrelated to the original research question. In this course, you will learn what synthetic data is, how to generate it, how to evaluate its quality in terms of utility and remaining privacy risks, and how to obtain statistically valid results from analyses performed on it.
