Mapping, harmonising and integrating innovative data sources for research purposes

As we navigate the path toward a more sustainable future, studying and measuring the performance of sustainability transitions becomes crucial. Over the past decades, the advent of innovative data sources has significantly expanded our ability to gauge the many dimensions of sustainability addressed by the SPES framework. Unlike traditional data sources (e.g., surveys and census data), innovative data sources are often generated as a by-product of activities of all kinds and only subsequently (and sometimes creatively) repurposed for research goals. Although they provide unprecedented opportunities, these new sources of data come with several methodological and epistemological caveats.

The deliverable “Report on mapping, harmonising and integrating novel data sources for research purposes”, written by SPES researchers from the University of Florence and the University of Amsterdam, provides, on the one hand, a mapping of the innovative data sources used to measure the dimensions of sustainability transitions and, on the other hand, an overview of the data integration methods that enable the use of these novel data sources for research purposes.


What novel data sources are currently leveraged to study the many dimensions of sustainability?

The Report focuses on two main aspects. The first is the identification, mapping and illustration of innovative data sources for the study of sustainability transitions. To this end, the first part of the Report consists of exploratory research that leverages Large Language Models (LLMs) to extract, classify and organise information about the use of innovative data sources in the study of sustainability, based on a large corpus of recent academic studies. Among the issues the analysis highlights are the heterogeneity of the data types that can be used for this purpose and the widespread practice of integrating data of different types.

To answer its research question, the Report adopts a systematic approach: it maps the data sources claimed to be innovative within a large sample of academic publications, collected following an expansive operationalisation of the concept of sustainability, and uses LLMs to carry out this mapping.

Which existing methods can be leveraged to develop an appropriate harmonisation and integration strategy of both traditional and new data sources?

The second part of the Report consists of a systematic review of data integration strategies, both for the creation of a synthetic dataset and for combining probability and non-probability samples in the context of non-conventional data sources.

The researchers present two data integration procedures, namely record linkage and statistical matching. Both aim to integrate two (or more) datasets that contain information on a set of common variables as well as on variables that are not jointly observed. As output, implementing either procedure yields a set of pairs of records.
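To make the difference between the two procedures more concrete, the following Python sketch contrasts them on hypothetical toy data; it is an illustration under simplifying assumptions, not the implementation used in the Report. Record linkage is assumed to operate on two files describing the same units, and a pair of records is accepted as a link when a simple agreement score, built from hand-picked field weights and a threshold, is high enough. Statistical matching is assumed to operate on two files holding different units from the same population, and uses a nearest-neighbour distance hot deck: each recipient record borrows the variable it lacks from the donor closest to it on the common variables. All records, variable names, weights and the threshold are hypothetical, and the common variables are not standardised for brevity.

```python
from math import dist

# --- Record linkage: the two files are assumed to describe the SAME units. ---
# Hypothetical toy files; only identifying variables are shown.
FILE_A = [
    {"id": "a1", "name": "Rossi Anna", "birth_year": 1980},
    {"id": "a2", "name": "Bianchi Luca", "birth_year": 1975},
    {"id": "a3", "name": "de Vries Eva", "birth_year": 1992},
]
FILE_B = [
    {"id": "b1", "name": "ROSSI  ANNA", "birth_year": 1980},
    {"id": "b2", "name": "Verdi Marco", "birth_year": 1988},
    {"id": "b3", "name": "de vries eva", "birth_year": 1993},  # birth year recorded with an error
]

# Hypothetical field weights: agreement adds the weight, disagreement subtracts it.
WEIGHTS = {"name": 4.0, "birth_year": 2.0}
THRESHOLD = 1.0  # hypothetical acceptance threshold for declaring a link


def normalise(value):
    """Lower-case strings and collapse repeated whitespace; leave other values unchanged."""
    return " ".join(value.lower().split()) if isinstance(value, str) else value


def link_records(file_a, file_b, weights=WEIGHTS, threshold=THRESHOLD):
    """Compare every pair of records and keep those whose agreement score reaches the threshold."""
    pairs = []
    for a in file_a:
        for b in file_b:
            score = sum(
                w if normalise(a[field]) == normalise(b[field]) else -w
                for field, w in weights.items()
            )
            if score >= threshold:
                pairs.append((a["id"], b["id"], score))
    return pairs


# --- Statistical matching: the two files are assumed to hold DIFFERENT units ---
# from the same population. Common variables X = (age, income_k); file A (recipients)
# observes y, file B (donors) observes z; y and z are never jointly observed.
RECIPIENTS = [
    {"id": "r1", "age": 44, "income_k": 32.0, "y": "bike"},
    {"id": "r2", "age": 49, "income_k": 51.0, "y": "car"},
    {"id": "r3", "age": 32, "income_k": 40.0, "y": "train"},
]
DONORS = [
    {"id": "d1", "age": 45, "income_k": 30.0, "z": 2900},
    {"id": "d2", "age": 36, "income_k": 48.0, "z": 3600},
    {"id": "d3", "age": 30, "income_k": 41.0, "z": 2100},
]


def distance_hot_deck(recipients, donors, common, target):
    """Nearest-neighbour distance hot deck: each recipient borrows `target` from the
    donor closest on the common variables (Euclidean distance, no standardisation)."""
    matched = []
    for r in recipients:
        r_vec = [r[c] for c in common]
        donor = min(donors, key=lambda d: dist(r_vec, [d[c] for c in common]))
        matched.append((r["id"], donor["id"], donor[target]))
    return matched


if __name__ == "__main__":
    print(link_records(FILE_A, FILE_B))
    print(distance_hot_deck(RECIPIENTS, DONORS, ["age", "income_k"], "z"))
```

On these toy files, record linkage links a1 with b1 (full agreement) and a3 with b3 (the names agree despite the discrepant birth year), while the hot deck gives each recipient the z value of its nearest donor; in both cases the result is a list of record pairs. In practice, the linkage weights and threshold would be estimated from the data rather than fixed by hand, and the common variables would be standardised before computing distances.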

The aim is to provide a broader audience with an overview of the statistical methods developed over the last decades in official statistics and survey statistics to deal with data integration issues, to propose toy/practical examples that illustrate both the issues at stake and the methods suggested to tackle them, and to point to the statistical software and packages available for their implementation.

This Report D4.1 “Report on mapping, harmonising and integrating novel data sources for research purposes” for the SPES project has been prepared by the University of Amsterdam and the University of Florence as part of Task 4.1 “Map complex and novel data sources and methods” of Work Package 4. The report has been written by Veronica Ballerini, University of Florence, Davide Beraldo, University of Amsterdam, Chiara Bocci, University of Florence, Lisa Braito, University of Florence, Roberta Milana, University of Amsterdam, Emilia Rocco, University of Florence, and Martin Trans, University of Amsterdam.