Flatiron Health, a global leader in real-world oncology data and evidence generation, today announced a suite of groundbreaking scientific innovations that promise to redefine how the healthcare industry understands, interprets, and predicts real-world outcomes in cancer. With the launch of LLM-extracted real-world progression data at unprecedented scale, the VALID AI data quality framework, and harmonized multinational real-world datasets, Flatiron is advancing from descriptive analytics—what has happened—to uncovering the “why” behind clinical outcomes and predicting the “what will be” in cancer care.
Together, these innovations bring Flatiron closer to its vision of building the world’s most comprehensive oncology dataset, now encompassing more than five million patient records and 1.5 billion datapoints available for research. The dataset continues to grow as Flatiron expands its global network, deepens partnerships, and deploys next-generation technology to accelerate cancer discovery worldwide.
By addressing long-standing challenges in oncology data—such as the manual abstraction of disease progression, the absence of standardized AI data quality frameworks, and the complexity of cross-border data harmonization—Flatiron’s innovations unlock new possibilities in predictive modeling and hypothesis generation. The result is a transformative step from reactive analysis toward proactive, insight-driven cancer research.
Moving from “What Is” to “Why” and “What Will Be”
“While the industry talks about the potential of AI in healthcare, we’re delivering the reality,” said Nathan Hubbard, Chief Executive Officer at Flatiron Health. “These aren’t incremental improvements—they’re foundational innovations that redefine what’s possible in cancer research. We’re not just describing what happened; we’re predicting what’s next and explaining why it matters. This positions our life sciences partners to answer the questions they’ll be asking in years to come with the innovations we’re delivering today.”
Flatiron’s approach marks a paradigm shift for real-world evidence (RWE). Traditionally, RWE has focused on describing patient outcomes based on retrospective data, often constrained by data quality, limited scalability, and fragmented healthcare records. By applying advanced AI and large language models (LLMs), Flatiron enables researchers to analyze cancer data in near real time, uncovering causal insights and generating predictive forecasts that help guide future therapeutic strategies.
LLM-Extracted Real-World Progression: Unlocking a Critical Data Point
One of the most profound breakthroughs is Flatiron’s ability to extract real-world progression data across all major solid tumors using large language models.
Progression—the point at which a patient’s cancer advances or becomes resistant to therapy—is one of the most important events in understanding a cancer journey. Historically, capturing this data required manual chart review by trained abstractors, a process that was both time-intensive and resource-heavy. As a result, real-world progression data was limited in availability, constraining the types of research questions that could be asked.
Flatiron’s LLM-based extraction technology now enables the automated capture of disease progression data from unstructured clinical notes and electronic health records (EHRs) at unprecedented scale. This innovation allows for cross-tumor, cross-treatment, and cross-regional analyses spanning lung, breast, ovarian, and prostate cancers, among others.
With this advancement, researchers can now study treatment effectiveness, resistance mechanisms, and survival outcomes with greater precision and depth—opening new avenues for understanding how and why patients respond differently to therapies in real-world settings.
VALID Framework: Setting the Gold Standard for AI Data Quality
Flatiron has also introduced the VALID (Validation of Accuracy for LLM/ML-Extracted Information and Data) Framework, establishing the industry’s first comprehensive standard for evaluating the quality and reliability of AI-extracted real-world data.
The VALID Framework is a rigorous methodology for ensuring that AI-derived data meets the standards required for regulatory, clinical, and scientific decision-making. The framework assesses key aspects of AI data extraction, including:
- Performance Metrics: Quantifying precision, recall, and accuracy across diverse data sources.
- Usability and Reproducibility: Ensuring data can be applied consistently in varied research contexts.
- Benchmarking Against Human Abstraction: Comparing AI-extracted results against expert human curators to validate accuracy and reliability.
- Regulatory Readiness: Demonstrating that AI-driven data can support submissions to health authorities and meet real-world evidence standards.
By setting this new standard, Flatiron ensures that researchers and regulators can trust AI-generated insights with the same level of confidence traditionally reserved for manually curated datasets—bridging the gap between innovation and scientific rigor.
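To make the performance metrics and the human-abstraction benchmark concrete, the sketch below scores LLM-extracted progression flags against labels from expert abstractors. It is only an illustration: the announcement does not describe how VALID computes its metrics, so the function name `score_against_abstraction`, the `ExtractionMetrics` type, and the toy labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ExtractionMetrics:
    """Agreement between model-extracted labels and human abstraction."""
    precision: float
    recall: float
    accuracy: float


def score_against_abstraction(
    llm_labels: Sequence[bool], human_labels: Sequence[bool]
) -> ExtractionMetrics:
    """Treat human abstraction as the reference and score the LLM labels.

    Hypothetical helper: True means "progression event documented".
    """
    if len(llm_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not llm_labels:
        raise ValueError("at least one labeled record is required")

    pairs = list(zip(llm_labels, human_labels))
    tp = sum(1 for m, h in pairs if m and h)          # both say progression
    fp = sum(1 for m, h in pairs if m and not h)      # model only
    fn = sum(1 for m, h in pairs if not m and h)      # human only
    tn = sum(1 for m, h in pairs if not m and not h)  # both say none

    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return ExtractionMetrics(precision, recall, accuracy)


# Toy, made-up labels for eight patient records (not real data).
human = [True, True, False, False, True, False, True, False]
llm = [True, False, False, True, True, False, True, False]

metrics = score_against_abstraction(llm, human)
print(metrics)
```

In practice such metrics would likely be reported per tumor type and per data source rather than as a single pooled number, since pooled accuracy can mask weak performance in smaller cohorts.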
Harmonized Global Datasets: Enabling Multinational Cancer Research
Flatiron’s harmonized multinational real-world datasets represent another transformative milestone. For decades, researchers have struggled to integrate data across countries due to variations in healthcare systems, regulatory environments, and data structures.
Flatiron addresses this long-standing challenge by harmonizing oncology datasets across major global markets, including the United Kingdom, Germany, and Japan, with plans to expand further into Europe and Asia. These harmonized datasets are securely accessible through Flatiron’s Trusted Research Environment, powered by Lifebit CloudOS. The environment allows multi-country data analysis while maintaining local data sovereignty, privacy, and compliance with regulations such as GDPR and regional health data laws.
This capability enables researchers to conduct cross-border studies that compare clinical practices, treatment outcomes, and disease trajectories across diverse healthcare ecosystems. Such analyses are invaluable for understanding global variations in cancer care, supporting multinational clinical development, and informing regulatory and reimbursement decisions worldwide.
Expertise That Powers Innovation
“Flatiron’s scientific innovation is built on the strength of our proven methodological expertise,” said Michael Bierl, Vice President and Head of Evidence Solutions at Flatiron Health. “Our global evidence generation team of research oncologists, biostatisticians, epidemiologists, and regulatory specialists partner with customers from study design through analysis, applying validated methods that generate robust evidence to accelerate therapeutic development and improve the way cancer is treated around the world.”
This combination of AI technology and human expertise has become Flatiron’s hallmark. The company’s teams work closely with biopharmaceutical partners, academic researchers, and regulators to ensure that every dataset and model meets the highest scientific and ethical standards.
Flatiron’s collaborations extend to major oncology networks, enabling integration of clinical, genomic, and real-world treatment data. This interconnected infrastructure positions the company to support the full continuum of cancer research—from early discovery through clinical trials and post-market evidence generation.
From Reactive to Predictive Oncology Research
The convergence of Flatiron’s AI technologies and harmonized data systems marks a turning point for real-world oncology research. Instead of relying solely on retrospective analyses, researchers can now use predictive models to anticipate disease trajectories, therapy responses, and population-level outcomes.
These models enable hypothesis generation before clinical signals emerge, helping scientists and drug developers identify potential biomarkers, optimize trial design, and predict therapeutic impact. For healthcare providers, predictive real-world insights can inform clinical decision-making and support personalized treatment planning in real time, with the goal of improving patient outcomes.
Looking Ahead: The Future of Real-World Evidence
Flatiron’s ongoing commitment to advancing scientific innovation has positioned it as a global leader in next-generation real-world evidence. The company continues to evolve from a data provider into a strategic research partner capable of shaping the future of oncology.
Flatiron plans to present new research demonstrating these capabilities at upcoming international oncology conferences, showcasing how its AI-powered data ecosystem is already transforming both cancer research and clinical care.
With its rapidly expanding global data network, validated AI frameworks, and unprecedented ability to extract meaningful clinical signals at scale, Flatiron Health is leading a new era of evidence-based prediction—one that moves beyond describing “what is” toward uncovering “why” outcomes occur and predicting “what will be” for cancer patients around the world.



