
Ensuring Ethical and Trustworthy Secondary Use of Health Data: Insights from the Dorieh Platform
Wednesday 21 May 2025 13:00 - 13:15 A3
Speakers: Michael Bouzinier, Francesco Pontiggia
Track: Health data
The rapid growth of health data presents an enormous opportunity for research and policy-making, but it also poses significant challenges for trust, efficiency, and regulatory compliance. Secondary use of health data requires robust infrastructure to ensure that data yields accurate insights while meeting ethical and legal standards. This paper explores how data management tools, such as descriptive workflow languages and domain-specific languages (DSLs), can be combined to create more trustworthy and efficient infrastructures for health data use.
The research focuses on the Dorieh Data Platform, developed by Harvard University Research Computing in partnership with the Harvard T.H. Chan School of Public Health. Dorieh's data management approach incorporates descriptive dataflow operators, which enable granular tracing of data transformations. This addresses a critical aspect of data provenance: the ability to trace and validate the lineage of every data element. Dorieh is deployed in the Harvard University FISMA-compliant Trusted Research Environment (TRE), which leverages Open OnDemand infrastructure, and is being used to prepare and document research datasets for National Studies of Air Pollution and Health.
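To make the idea of tracing the lineage of a data element concrete, here is a minimal Python sketch. It is not Dorieh's implementation or API: the `TracedValue` class, the `apply_step` helper, and the source label `hypothetical_enrollment_file.zip_code` are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TracedValue:
    """A value bundled with the ordered list of steps that produced it.

    Hypothetical illustration only; not part of the Dorieh platform.
    """
    value: Any
    lineage: list = field(default_factory=list)


def apply_step(name: str, fn: Callable[[Any], Any], tv: TracedValue) -> TracedValue:
    """Apply fn to the value and return a new TracedValue with one more lineage entry.

    The input TracedValue is left unchanged, so earlier stages stay inspectable.
    """
    return TracedValue(fn(tv.value), tv.lineage + [name])


# Hypothetical source label and transformations.
raw = TracedValue(" 02138 ", ["source:hypothetical_enrollment_file.zip_code"])
cleaned = apply_step("strip_whitespace", str.strip, raw)
zip3 = apply_step("first_three_digits", lambda s: s[:3], cleaned)

for step in zip3.lineage:
    print(step)
```

Because each step returns a new object instead of mutating its input, every intermediate value keeps its own complete history, which is the property that lets a derived element be traced back to its source.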
Central to this work is a domain-specific language for data modeling, which allows transformations to be defined explicitly and enhances reproducibility and accountability in the secondary use of health data. By integrating it with descriptive workflow languages, we create comprehensive frameworks that better meet the demands of modern data science, particularly in healthcare, where regulatory requirements are rigorous.
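The benefit of declaring transformations explicitly can be sketched as follows. The abstract does not show the syntax of the actual DSL, so this example substitutes a plain Python dictionary as a stand-in; the `MODEL` structure, the field names `ZIP_CODE` and `YEAR`, and the helpers `build_row` and `describe` are hypothetical. The point is only that one declaration can drive both the computation and its documentation.

```python
# Hypothetical declarative model: every column states either the source field
# it is read from or the columns it is derived from. Not Dorieh's actual DSL.
MODEL = {
    "table": "beneficiaries",
    "columns": {
        "zip": {"source": "ZIP_CODE"},
        "year": {"source": "YEAR", "cast": int},
        "zip3": {"derived_from": ["zip"], "rule": lambda row: row["zip"][:3]},
    },
}


def build_row(model, record):
    """Build one output row from a raw record by following the model, in order."""
    row = {}
    for name, spec in model["columns"].items():
        if "source" in spec:
            cast = spec.get("cast", lambda v: v)  # identity when no cast is given
            row[name] = cast(record[spec["source"]])
        else:
            row[name] = spec["rule"](row)
    return row


def describe(model):
    """Generate human-readable lineage documentation from the same model."""
    lines = []
    for name, spec in model["columns"].items():
        if "source" in spec:
            lines.append(f"{name} <- source field {spec['source']}")
        else:
            lines.append(f"{name} <- derived from {', '.join(spec['derived_from'])}")
    return lines


row = build_row(MODEL, {"ZIP_CODE": "02138", "YEAR": "2016"})
print(row)
print("\n".join(describe(MODEL)))
```

Since `describe` reads the same declaration that `build_row` executes, the generated documentation cannot drift away from what the code actually does.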
Applying these methodologies to Medicare data highlighted data inconsistencies and underscored the effectiveness of Dorieh's approach in maintaining data quality. By providing detailed data lineage and error logging, Dorieh strengthens the trustworthiness and regulatory compliance of data-driven projects. We advocate adopting similar DSL tools across diverse health-related domains so that data lineage is meticulously documented, reinforcing the reliability and validity of research outcomes.
While the methodologies discussed were developed within a tightly controlled environment, they are positioned to scale to more complex ecosystems such as the European Health Data Space (EHDS). By addressing requirements from multiple regulatory frameworks, including FISMA and the EMA-HMA Data Quality stipulations, this approach could become a key element of modern data governance, helping ensure that AI models and health policy decisions are based on transparent and scientifically sound data processing methods.
Topic
Data and information
Seminar type
Live + on site
Lecture format
Presentation
Lecture purpose
Tools for implementation
Knowledge level
Advanced
Target audience
Manager/Decision maker
Technician/IT/Developer
Researcher (including students)
Keywords
Real-world examples (good/bad)
Innovation/research
Apps
Information/government agency
Informatics/Interoperability
Speakers
Michael Bouzinier, Speaker
Harvard University
Francesco Pontiggia, Speaker
Francesco Pontiggia is a Sr. Director of Harvard University Research Computing.