Vitalis 2025

Ensuring Ethical and Trustworthy Secondary Use of Health Data: Insights from the Dorieh Platform

Wednesday 21 May 2025, 13:00-13:15, A3

Speakers: Michael Bouzinier, Francesco Pontiggia

Track: Health data

The rapid growth of health data offers an enormous opportunity for research and policy-making, but it also poses significant challenges for trust, efficiency, and regulatory compliance. Secondary use of health data requires robust infrastructure that extracts reliable insights while meeting ethical and legal standards. This paper explores how advanced data management tools, such as descriptive workflow languages and domain-specific languages (DSLs), can make health data infrastructures more trustworthy and efficient.


The research focuses on the Dorieh Data Platform, developed by Harvard University Research Computing in partnership with the Harvard T.H. Chan School of Public Health. Dorieh's data management approach is built on descriptive dataflow operators that trace each data transformation at a granular level. This addresses a critical aspect of data provenance: the ability to trace and validate the lineage of every data element. Dorieh is deployed in Harvard University's FISMA-compliant Trusted Research Environment (TRE), which uses the Open OnDemand infrastructure, and it is used to prepare and document research datasets for the National Studies of Air Pollution and Health.
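To illustrate what operator-level lineage can look like, here is a minimal Python sketch. It is not Dorieh's API: the `TracedDataset` class, its `apply` method, the SHA-256 fingerprints, and the toy rows (with made-up values) are all hypothetical. Each operator call records its name, its parameters, and fingerprints of its input and output, so the sequence of records documents how every output was derived.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class LineageRecord:
    step: str                 # name of the transformation operator
    params: Dict[str, Any]    # parameters the operator was called with
    input_digest: str         # fingerprint of the data it consumed
    output_digest: str        # fingerprint of the data it produced


def digest(rows: Rows) -> str:
    """SHA-256 of a canonical JSON serialization (keys sorted)."""
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class TracedDataset:
    rows: Rows
    lineage: List[LineageRecord] = field(default_factory=list)

    def apply(self, step: str, fn: Callable[..., Rows], **params: Any) -> "TracedDataset":
        """Run one operator and return a new dataset with one more lineage record."""
        out = fn(self.rows, **params)
        record = LineageRecord(step, dict(params), digest(self.rows), digest(out))
        return TracedDataset(out, self.lineage + [record])


# Hypothetical operators over toy rows (values are made up for illustration).
def filter_year(rows: Rows, year: int) -> Rows:
    return [r for r in rows if r["year"] == year]


def rename(rows: Rows, old: str, new: str) -> Rows:
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


raw = TracedDataset([
    {"zip": "02115", "year": 2015, "pm25": 8.1},
    {"zip": "02115", "year": 2016, "pm25": 7.6},
])
result = (raw.apply("filter_year", filter_year, year=2016)
             .apply("rename", rename, old="pm25", new="pm25_ugm3"))

for rec in result.lineage:
    print(rec.step, rec.params, rec.input_digest[:8], "->", rec.output_digest[:8])
```

Because each record's input fingerprint should equal the previous record's output fingerprint, a gap or an untracked modification in the chain can be detected by comparing consecutive digests.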


Central to this work is a domain-specific language for data modeling, which makes transformations explicit and thereby enhances reproducibility and accountability in the secondary use of health data. Integrated with descriptive workflow languages, it forms a framework that adapts better to the demands of modern data science, particularly in healthcare, where regulatory requirements are rigorous.
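To make the idea of explicit, declarative transformations concrete, the following Python sketch interprets a small data model written as plain data (in practice such a model might be loaded from a YAML file). The model keys (`source`, `type`, `required`, `min`, `max`), the column names, the toy records, and the `apply_model` function are hypothetical and do not reproduce Dorieh's actual DSL. Rows that violate the model are logged and set aside rather than silently dropped.

```python
import logging
from typing import Any, Dict, List, Tuple

logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
log = logging.getLogger("data_model")

# Hypothetical declarative model; in practice it might be parsed from YAML.
MODEL: Dict[str, Any] = {
    "table": "beneficiary",
    "columns": {
        "bene_id": {"source": "BENE_ID", "type": "str", "required": True},
        "year": {"source": "YEAR", "type": "int", "min": 1999, "max": 2030},
        "age": {"source": "AGE", "type": "int", "min": 0, "max": 120},
    },
}

# Maps type names used in the model to Python constructors.
CASTS = {"str": str, "int": int, "float": float}


def apply_model(
    model: Dict[str, Any], raw_rows: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Map raw records to the model's columns; log and set aside invalid rows."""
    accepted: List[Dict[str, Any]] = []
    rejected: List[Dict[str, Any]] = []
    for i, raw in enumerate(raw_rows):
        row: Dict[str, Any] = {}
        errors: List[str] = []
        for name, spec in model["columns"].items():
            value = raw.get(spec["source"])
            if value is None or value == "":
                if spec.get("required"):
                    errors.append(f"{name}: missing required value")
                row[name] = None
                continue
            try:
                value = CASTS[spec["type"]](value)
            except (TypeError, ValueError):
                errors.append(f"{name}: cannot cast {value!r} to {spec['type']}")
                continue
            too_low = "min" in spec and value < spec["min"]
            too_high = "max" in spec and value > spec["max"]
            if too_low or too_high:
                errors.append(f"{name}: {value} outside [{spec.get('min')}, {spec.get('max')}]")
                continue
            row[name] = value
        if errors:
            log.warning("%s row %d rejected: %s", model["table"], i, "; ".join(errors))
            rejected.append({"row": i, "errors": errors})
        else:
            accepted.append(row)
    return accepted, rejected


# Hypothetical toy records, not real data.
raw = [
    {"BENE_ID": "A1", "YEAR": "2016", "AGE": "71"},
    {"BENE_ID": "A2", "YEAR": "2016", "AGE": "-3"},
    {"BENE_ID": "", "YEAR": "abc", "AGE": "80"},
]
ok, bad = apply_model(MODEL, raw)
```

Because the rules live in the model rather than in the code, changing a range or adding a column is an edit to the specification, which can itself be versioned and reviewed.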


Applying these methods to Medicare data revealed inconsistencies in the data and demonstrated the effectiveness of Dorieh's approach to maintaining data quality. By providing detailed data lineage and error logging, Dorieh strengthens the trustworthiness and regulatory compliance of data-driven projects. We advocate adopting similar DSL tools across health-related domains so that data lineage is meticulously documented, reinforcing the reliability and validity of research outcomes.


Although these methods were developed within a tightly controlled environment, they are well positioned to scale to more complex ecosystems such as the European Health Data Space (EHDS). By addressing requirements from multiple regulatory frameworks, including FISMA and the EMA-HMA data quality requirements, this approach could become a pivotal element of modern data governance, helping ensure that AI models and health policy decisions rest on transparent and scientifically sound data processing.

Language

English

Topic

Data and information

Session type

Live + on site

Presentation format

Presentation

Purpose of the talk

Tools for implementation

Knowledge level

In-depth

Target audience

Manager/Decision-maker
Technician/IT/Developer
Researcher (including students)

Keywords

Real-world examples (good/bad)
Innovation/research
Apps
Information/public authority
Informatics/Interoperability

Speakers

Michael Bouzinier (Speaker)

Harvard University

Francesco Pontiggia (Speaker)

Francesco Pontiggia is a Senior Director at Harvard University Research Computing.