Ensuring Ethical and Trustworthy Secondary Use of Health Data: Insights from the Dorieh Platform

Wednesday May 21, 2025 13:00 - 13:15 Vitalis Plaza

Lecturers: Michael Bouzinier, Francesco Pontiggia

Track: Health data

The rapid growth of health data creates major opportunities for research and policy-making, but also poses significant challenges in trust, efficiency, and regulatory compliance. Secondary use of health data requires robust infrastructure so that the data yield accurate insights while meeting ethical and legal standards. This paper explores how advanced data management tools, such as descriptive workflow languages and domain-specific languages (DSLs), can make health data infrastructures more trustworthy and efficient.

The primary focus of the research was the Dorieh Data Platform, developed by Harvard University Research Computing in partnership with the Harvard T.H. Chan School of Public Health. Dorieh takes a data management approach built on descriptive dataflow operators, which enable granular tracing of data transformations. In doing so, it addresses critical aspects of data provenance, that is, the ability to trace and validate the lineage of every data element. Dorieh is deployed in the Harvard University FISMA-compliant Trusted Research Environment (TRE), which leverages Open OnDemand infrastructure, and is being used to prepare and document research datasets for National Studies of Air Pollution and Health.

Central to this work is a domain-specific language for data modeling, which makes the definitions of transformations explicit and enhances reproducibility and accountability in the secondary use of health data. Integrating it with descriptive workflow languages yields comprehensive frameworks that better adapt to the demands of modern data science, particularly in healthcare, where regulatory compliance requirements are rigorous.
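As a rough illustration of the idea, and not Dorieh's actual DSL or API, the sketch below declares derived columns together with their source fields, so that the same declaration drives both the computation and the lineage lookup. All names here (`Column`, `MODEL`, `apply_model`, `lineage`, and the field names) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Column:
    """Declarative definition of one derived column (hypothetical structure)."""
    name: str
    sources: tuple          # names of the fields this column is computed from
    description: str        # human-readable form of the transformation
    compute: Callable[[dict], object]


# Hypothetical model: each derived column states explicitly where it comes from.
MODEL = [
    Column("age", ("birth_year", "enrollment_year"),
           "enrollment_year - birth_year",
           lambda r: r["enrollment_year"] - r["birth_year"]),
    Column("is_senior", ("age",),
           "age >= 65",
           lambda r: r["age"] >= 65),
]


def apply_model(record: dict, model: list) -> dict:
    """Compute derived columns in declaration order, leaving the input unchanged."""
    out = dict(record)
    for col in model:
        out[col.name] = col.compute(out)
    return out


def lineage(name: str, model: list) -> set:
    """Return the raw input fields that a column ultimately depends on."""
    by_name = {c.name: c for c in model}
    if name not in by_name:
        return {name}  # not a derived column, so it is a raw input field
    raw = set()
    for src in by_name[name].sources:
        raw |= lineage(src, model)
    return raw


row = apply_model({"birth_year": 1950, "enrollment_year": 2020}, MODEL)
print(row["age"], row["is_senior"])
print(sorted(lineage("is_senior", MODEL)))
```

Because the transformation and its sources live in one declaration, documentation of where a value came from cannot drift away from the code that computes it, which is the property the abstract attributes to a DSL-based approach.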

Applying these methodologies to Medicare data exposed data inconsistencies and underscored the effectiveness of Dorieh's approach in maintaining data quality. By providing detailed data lineage and error logging, Dorieh bolsters the trustworthiness and regulatory adherence of data-driven projects. We advocate adopting similar DSL tools across diverse health-related domains so that data lineage is meticulously documented, reinforcing the reliability and validity of research outcomes.

Although the methodologies discussed were developed within a tightly controlled environment, they are positioned to scale to more complex ecosystems such as the European Health Data Space (EHDS). By addressing multimodal regulatory requirements, including FISMA and EMA-HMA Data Quality stipulations, this approach stands to become a pivotal element of modern data governance, ensuring that AI models and health policy decisions rest on transparent and scientifically sound data processing methods.

Language

English

Topic

Data and Information

Seminar type

Live + On site

Lecture type

Presentation

Objective of lecture

Tools for implementation

Level of knowledge

Intermediate

Target audience

Management/decision makers
Technicians/IT/Developers
Researchers

Keyword

Actual examples (good/bad)
Innovation/research
Apps
Government information
Informatics/Interoperability

Lecturers

Michael Bouzinier Lecturer

AI Data Architect
IDEXX Laboratories / Harvard University

Michael (Misha) Bouzinier is an AI Data Architect with IDEXX Laboratories. He has over 30 years of diverse experience in software research and development and 10 years as a professional educator. Misha’s intellectual interests include semiotics, natural language processing and text analytics, data visualization, evolutionary and medical genetics, computer simulations, and explainable AI. Throughout his career, he has worked with and led diverse international teams, successfully collaborating with developers and researchers from the US, UK, Sweden, Finland, Belgium, the Netherlands, and Japan. In his free time, Misha loves to enjoy the outdoors, travel, and interact with people from diverse backgrounds.

Francesco Pontiggia Lecturer

Francesco Pontiggia is a Sr. Director of Harvard University Research Computing.
