GESIS - for a research-based infrastructure

GESIS is a research-based infrastructure institution for the social sciences and conducts its own continuous and interdisciplinary research in four major research areas. The results of our research serve both to gain scientific knowledge and to sustainably improve our offerings for the social sciences.

For GESIS, the quality of data takes center stage. GESIS strives to provide high-quality research data as well as methods and tools that enable users to assess the quality of research data for themselves. Our research focus is geared towards this core interest. To contribute to knowledge about data quality, GESIS concentrates on the research areas of survey methodology, computational methods and research data management. Together, these areas focus our methodologically oriented research on supporting researchers who work with quantitative data.

Research output at GESIS

Sen, Indira, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, and Claudia Wagner. 2023. "People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection." In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 10480-10504. Singapore: Association for Computational Linguistics.
Dahou, Abdelhalim Hafedh, Mohamed Amine Cheragui, and Ahmed Abdelali. 2023. "Performance Analysis of Arabic Pre-Trained Models on Named Entity Recognition Task." In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, edited by Ruslan Mitkov and Galia Angelova, 458–467. Shoumen: INCOMA Ltd. https://aclanthology.org/2023.ranlp-1.51.pdf.
Diera, Andor, Abdelhalim Hafedh Dahou, Lukas Galke, Fabian Karl, Florian Sihler, and Ansgar Scherp. 2023. "GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding." In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP. Association for Computational Linguistics (ACL). doi: https://doi.org/10.18653/v1/2023.genbench-1.2.
Dahou, Abdelhalim Hafedh, and Brigitte Mathiak. 2023. "Subject Classification of Software Repository." In Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR, 1, 30-38. SciTePress. doi: https://doi.org/10.5220/0012159600003598.
Lietz, Haiko, Mohsen Jadidi, Daniel Kostic, Milena Tsvetkova, and Claudia Wagner. 2024. "Individual and gender inequality in computer science: A career study of cohorts from 1970 to 2000." Quantitative Science Studies online first 1-24. doi: https://doi.org/10.1162/qss_a_00283.

Research focus on data quality

A key characteristic of GESIS is that the institute applies very high requirements and standards to the quality of the data it provides, particularly for data whose collection it is also responsible for. It is therefore central for GESIS to make its own contributions to investigating and improving aspects of data quality. Research in the GESIS research areas thus contributes directly to the focus on data quality. This applies to survey data as well as to digital behavioural data and relevant metadata. Data quality comprises aspects of (a) the correctness and representativeness of data and (b) the usability and FAIRness of data. Examples of (a) are the completeness, correctness, provenance or representativeness of data, while (b) covers aspects such as findability, quality of documentation, data preparation or the interoperability of data and metadata. This fulfils an important prerequisite for ensuring that work on substantive research questions based on these data leads to valid results.

Research area Survey Methodology

At GESIS we conduct basic and applied research in the field of survey methodology. Our survey research is divided into the focus areas of Survey Statistics, Survey Instruments, Survey Operations and Comparative Surveys. We pursue the goal of gaining evidence-based insights into how surveys and their data quality can be optimised. Within the framework of systematic reviews and meta-analyses, we evaluate existing research and identify research gaps. In the research area of survey methodology, we also explore the connection of survey data with digital behavioural data (e.g. social media profiles, smartphone usage data or browsing histories) and examine how these data types can be complemented and combined. To this end, we are also driving the transfer of established concepts for assessing the data quality of surveys to digital behavioural data.

Research area Computational Methods

To ensure the high quality of GESIS's digital products and services amid rapid changes in information and knowledge technologies, GESIS conducts research in applied informatics and information science.

The aim of this research area is to test, analyse, adapt, further develop and evaluate novel methods, models and algorithms from computer science for application in the social sciences. A core component of this research area is making digital behavioural data, such as data from social media or data generated by sensors, usable for social science research. Developing and evaluating methods for collecting, processing and analysing these new data expands the basis for answering social science questions. Implementing the knowledge gained makes it possible to create innovative and integrated research infrastructures and services, tailored to the social sciences, for all phases of the research data cycle.

Research area Research Data Management

Given GESIS's large and growing data holdings and the related services for data referencing and data archiving, research in this area is an important component in expanding and advancing the related infrastructures.

Research in this field is concerned with long-term preservation, data documentation, and the legal framework for data access and data licensing. Topics in this research area address the challenges arising from data sharing and data security. Further important research topics are the creation of standards for data documentation and metadata and the handling of new data types such as digital behavioural data.

Research area Substantive Research

Our commitment to a diverse range of topics in the political and social sciences also ensures that we remain relevant and attentive to the latest trends and developments, thereby enriching our infrastructure offerings. We increase the visibility of our data through our publications and by presenting our research at relevant conferences. We demonstrate the potential and usability of our data and promote exchange with the relevant scientific communities.

Of particular importance is the application of current analytical models to the data, such as cross-classified multilevel models and various random-effects, fixed-effects and hybrid longitudinal models.