Catalyzing Integration of Genomic and Clinical Datasets Across Multiple Cancer Institutions Worldwide

The AACR Project GENIE enables precision cancer medicine research

Several “big data” initiatives, including the Genomics, Evidence, Neoplasia, Information, Exchange (GENIE) project, have been launched in recent years to address the challenges of large-scale sharing of genomic and clinical data and to accelerate progress in identifying both effective and ineffective therapies to treat cancer.

Recognising the immediate and urgent need for broad data sharing across cancer centres and with the wider scientific community, the American Association for Cancer Research (AACR), in partnership with eight global academic leaders in clinical cancer genomics, initiated the AACR Project GENIE. It is a multi-phase, multi-year, international data-sharing project that aims to catalyze precision cancer medicine through identification of novel therapeutic targets, design of biomarker-driven clinical trials, and identification of genomic determinants of response to therapy.

The project fulfills an unmet need in oncology by providing a data-sharing platform to enable scientific and clinical discovery. Ultimately, the platform can improve clinical decision-making and increase the likelihood that the cancer treatments patients receive are beneficial. At the societal level, this approach has immense potential to maximize the value of care delivery.

The GENIE platform is built to integrate and link clinical-grade cancer genomic data with clinical outcomes data. The database currently contains approximately 19,000 genomic and clinical records generated in CLIA-/ISO-certified laboratories and will continue to grow as additional patients are treated at each of the participating centres and as more centres join the consortium.

The eight participating cancer institutions so far are Dana-Farber Cancer Institute, USA; Institut Gustave Roussy, France; Johns Hopkins Sidney Kimmel Comprehensive Cancer Center, USA; MD Anderson Cancer Center, USA; Memorial Sloan Kettering Cancer Center, USA; Netherlands Cancer Institute, on behalf of the Center for Personalized Cancer Treatment, the Netherlands; Princess Margaret Cancer Centre, University Health Network, Canada; and Vanderbilt-Ingram Cancer Center, USA.

Because each of the current participating centres is a tertiary referral centre within its community, the platform is enriched in samples of late-stage disease. Each of the participating centres has extensive clinical data characterising individual patients via Electronic Health Record systems, and GENIE is therefore uniquely positioned to integrate genomic data with clinical data and harmonize such data across multiple cancer centres. To accomplish this, the consortium members have defined a parsimonious set of harmonized clinical data elements and outcome endpoints.

The GENIE platform will enable researchers to better understand clinical actionability across cancer types, assess the clinical utility of genomic sequencing, define clinical trial enrollment rates to genotype-specific clinical trials, validate genomic biomarkers, reposition or repurpose already approved drugs, expand existing drug labels by adding new mutations, and identify new drug targets. Importantly, researchers will also be able to compare and cross-validate the clinically derived datasets generated by GENIE with other publicly available datasets, including The Cancer Genome Atlas Project and the International Cancer Genome Consortium.

To enable broad-based sharing, the project has partnership agreements in place with Sage Bionetworks and the cBioPortal for Cancer Genomics, both of which have significant prior experience in similar projects and have developed established and accepted data-sharing platforms within the community.

The first integrated GENIE dataset (version 1.1) provides genomic and limited clinical data for 18,804 genomically profiled samples across 18,324 patients at 8 academic medical centres, each of which utilized genomic strategies tailored to best support their local clinical programmes. These strategies include highly targeted, amplicon-based panels covering mutation hotspots from approximately 50 genes, designed to cover current clinically actionable mutations and clinical trials, as well as broader, custom panels (275–429 genes) utilizing hybrid-capture to isolate all exons and some introns to support discovery as well as clinical research projects. In addition, each centre’s approach has evolved, such that the GENIE dataset contains 12 different gene panels that were used in at least 50 samples. A total of 44 genes were included on all 12 of these panels. The larger hybrid-capture gene panels included all of the genes on the smaller gene panels and added 145 genes common to all of the larger panels and an additional 134 genes common to at least 2 of these larger panels.
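The panel-overlap figures above (genes shared by all panels, and genes shared only by the larger panels) come down to set intersections. The following is a minimal sketch of that calculation, not the consortium's actual pipeline: the panel names and gene lists are hypothetical and far smaller than the real GENIE panels, and `common_genes` is a helper introduced here purely for illustration.

```python
# Hypothetical panels: names and gene lists are illustrative only,
# not the real contents of any GENIE panel.
panels = {
    "hotspot_panel_A": {"BRAF", "EGFR", "KRAS", "TP53"},
    "hotspot_panel_B": {"BRAF", "EGFR", "KRAS", "PIK3CA", "TP53"},
    "capture_panel_C": {"ALK", "BRAF", "EGFR", "KRAS", "PIK3CA", "TP53"},
    "capture_panel_D": {"ALK", "BRAF", "EGFR", "KRAS", "NRAS", "TP53"},
}


def common_genes(panel_names):
    """Return the genes present on every one of the named panels."""
    gene_sets = [panels[name] for name in panel_names]
    return set.intersection(*gene_sets)


# Genes covered by all panels (the article reports 44 such genes
# across the 12 real panels).
core = common_genes(panels.keys())

# Genes shared by every larger panel but absent from the all-panel core.
large_panels = ["capture_panel_C", "capture_panel_D"]
large_only = common_genes(large_panels) - core

print(sorted(core))
print(sorted(large_only))
```

Restricting a cross-centre analysis to the `core` set is one simple way to avoid treating a gene that was never sequenced on a given panel as if it were unmutated.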

Genomic data within GENIE include mutation data (all centres), copy-number data (three centres), and structural rearrangement data (two centres). Two centres implemented paired tumour/normal sequencing, whereas all other centres conducted tumour-only sequencing.

The most highly represented tumour types across the GENIE consortium tend to be those where genomic data are currently used to guide standard treatment decisions, such as non–small cell lung cancer and colorectal cancer along with melanoma.

In a paper published online in Cancer Discovery on 1 June 2017, the Consortium also provided examples of the clinical utility of GENIE data, such as an estimate of clinical actionability across multiple cancer types (>30%) and a prediction of accrual rates to the NCI MATCH trial that accurately reflects recently reported actual match rates.

Based on yearly rates of sequencing at each of the founder institutions, the GENIE database is expected to grow by approximately 16,000 samples per year. With the addition of new members, however, it is likely that the GENIE database will grow to over 100,000 samples within 5 years.
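The arithmetic behind this projection can be checked quickly. The sketch below assumes simple linear growth from the version 1.1 sample count, which is an assumption made here for illustration rather than a model stated by the consortium; the function name is likewise hypothetical.

```python
# Figures taken from the article; the linear-growth model is an assumption.
CURRENT_SAMPLES = 18_804        # samples in GENIE dataset version 1.1
FOUNDER_RATE_PER_YEAR = 16_000  # approximate yearly growth from founder centres
TARGET = 100_000
YEARS = 5


def projected_samples(start, rate_per_year, years):
    """Project the sample count assuming a constant yearly increase."""
    return start + rate_per_year * years


founders_only = projected_samples(CURRENT_SAMPLES, FOUNDER_RATE_PER_YEAR, YEARS)
shortfall = max(0, TARGET - founders_only)

print(founders_only)  # 98804
print(shortfall)      # 1196
```

Under these assumptions the founder institutions alone would bring the database to just under 100,000 samples in 5 years, which is consistent with the expectation that contributions from new members would carry it past that mark.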

With recent technological advances, it is anticipated that future releases of GENIE data will be enriched for large, targeted DNA sequencing panels that characterise further sources of genomic variation, including new structural rearrangements and promoter mutations, and integration of additional genomic platforms, including whole-exome and whole-genome DNA sequencing, transcriptome sequencing, methylation, proteomics, and immunoprofiling. In addition, analyses of circulating tumour DNA or circulating tumour cells from blood specimens or other bodily fluids may be included to identify molecular changes in cancer genomes at the time of diagnosis or during therapy as these analyses become included in routine laboratory practice.


The AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine Through An International Consortium. Cancer Discovery; Published OnlineFirst June 1, 2017; DOI: 10.1158/2159-8290.CD-17-0151.