Fig. 2
From: Biased data, biased AI: deep networks predict the acquisition site of TCGA images

Distribution of cancer types among the samples contributed by each group A institution. The five most frequent cancer types in the TCGA dataset are shown individually, and all remaining types are labeled "other". The distribution varies considerably across institutions. This can be a source of bias for a model trained for cancer subtype classification, since such a model may learn site-specific image features as a shortcut to the label.

MSKCC: Memorial Sloan Kettering Cancer Center, Pitt: University of Pittsburgh, IGC: International Genomics Consortium, HFH: Henry Ford Hospital, UMich: University of Michigan, UNC: University of North Carolina, GPCC: Greater Poland Cancer Center, UHN: University Health Network, UCSF: University of California San Francisco, BCH: Barretos Cancer Hospital, Duke U: Duke University, Emory U: Emory University, Christiana HC: Christiana Healthcare. KIRC: Kidney Renal Clear Cell Carcinoma, PRAD: Prostate Adenocarcinoma, LUSC: Lung Squamous Cell Carcinoma, BRCA: Breast Invasive Carcinoma, THCA: Thyroid Carcinoma.
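The grouping described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes a hypothetical input format: a list of `(institution, cancer_type)` pairs, one per sample. The function name `site_type_distribution` and the toy data are likewise hypothetical. The key detail is that the top cancer types are ranked over the whole dataset rather than per institution. Every institution therefore shares the same set of labels, plus "other".

```python
from collections import Counter


def site_type_distribution(samples, top_k=5):
    """Per-institution fraction of each cancer type (hypothetical helper).

    samples: list of (institution, cancer_type) pairs; this input format is
    an assumption made for illustration only.
    """
    # Rank cancer types by frequency across the whole dataset, not per site.
    overall = Counter(cancer_type for _, cancer_type in samples)
    top_types = {t for t, _ in overall.most_common(top_k)}

    # Count samples per institution, collapsing non-top types into "other".
    per_site = {}
    for site, cancer_type in samples:
        label = cancer_type if cancer_type in top_types else "other"
        per_site.setdefault(site, Counter())[label] += 1

    # Normalize counts to fractions so institutions of different size compare.
    return {
        site: {label: n / sum(counts.values()) for label, n in counts.items()}
        for site, counts in per_site.items()
    }
```

With toy data such as `[("A", "KIRC")] * 3 + [("A", "BRCA")] + [("B", "BRCA")] * 2 + [("B", "LUSC")]` and `top_k=2`, site A is split between KIRC and BRCA, while site B's LUSC sample falls into "other". A large difference between such rows is the kind of site-to-label correlation the caption warns about.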