Although the cause of Henrietta Lacks’ cervical tumor was not known in her lifetime, we now understand that it was triggered by infection with human papillomavirus (HPV) type 18. When this virus infects the cervical epithelium, the viral DNA may integrate into the host genome, causing the cells to become transformed and eventually malignant. HeLa cells are known to contain integrated HPV18 DNA.
There are many different types of cancer, each caused by errors in DNA. The Cancer Genome Atlas (TCGA) is a database for collecting the DNA sequences of diverse cancers from many different individuals. It was established to help understand what mutations cause various types of cancer. As viruses are known to be responsible for about 20% of human cancers, searching this database for viral sequences can advance our understanding of their role in this disease. For example, almost every genome from patients with cervical cancer contains HPV DNA.
A recent search of the TCGA for viral sequences revealed that, in addition to cervical cancer, HPV18 sequences were found in many other cancers, including colon, head and neck, kidney, liver, lung, ovary, rectum, and stomach. The HPV18 sequences in non-cervical cancers resembled the viral sequence found in HeLa cells, both in integration site and in single nucleotide variations. In other words, the HPV18 sequences in these cancers closely match the viral genome integrated into HeLa cells, and their presence is likely due to contamination.
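The logic of matching single nucleotide variations can be illustrated with a minimal sketch. This is not the method used in the study: the function names, the 0.9 threshold, and every position and base below are hypothetical placeholders, chosen only to show how a sample's HPV18 variants might be compared with a HeLa-specific variant signature.

```python
# Hypothetical illustration only: the positions, bases, and threshold below
# are made up and are not real HeLa or HPV18 variant calls.

def hela_match_fraction(sample_snvs, hela_snvs):
    """Return the fraction of HeLa-signature SNVs that also occur in the sample.

    Each SNV is a (position, alternate_base) tuple relative to an
    HPV18 reference genome.
    """
    if not hela_snvs:
        raise ValueError("HeLa signature set must not be empty")
    shared = sample_snvs & hela_snvs  # set intersection
    return len(shared) / len(hela_snvs)


def looks_like_hela(sample_snvs, hela_snvs, threshold=0.9):
    """Flag a sample whose HPV18 variants largely match the HeLa signature."""
    return hela_match_fraction(sample_snvs, hela_snvs) >= threshold


# Placeholder signature and samples (assumed values, for illustration).
HELA_SIGNATURE = {(104, "T"), (287, "G"), (485, "C"), (549, "A")}
tumor_a = {(104, "T"), (287, "G"), (485, "C"), (549, "A"), (1012, "G")}
tumor_b = {(104, "T"), (751, "C")}

print(looks_like_hela(tumor_a, HELA_SIGNATURE))  # True: all 4 signature SNVs present
print(looks_like_hela(tumor_b, HELA_SIGNATURE))  # False: only 1 of 4 present
```

A sample that carries nearly all of the HeLa-specific variants, as tumor_a does here, is more plausibly explained by contamination than by an independent HPV18 infection, which would be expected to show its own pattern of variation.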
Further analysis revealed that the contaminated samples originated from only two genome sequencing centers: the University of North Carolina Lineberger Comprehensive Cancer Center and the Michael Smith Genome Sciences Centre of the British Columbia Cancer Agency. All the contamination took place in 2011 and 2012, and was limited to 18 (6%) of the sequencing machines.
The contamination with HeLa nucleic acid was observed only in datasets derived from sequencing of RNA, not DNA. I asked the senior author Jim Pipas how he thought this contamination might have taken place:
I can think of two possibilities. One is that the RNA isolated from the tumor was somehow contaminated with HeLa sequences. The other is that HeLa cell RNA was sequenced on the same machine as the tumors and the contamination is from the sequencing machine itself.
It is well known that nucleic acids can become contaminated during their manipulation in the laboratory. The use of sensitive techniques such as PCR and deep sequencing reveals contamination that previously went unnoticed. High-profile examples of nucleic acid contamination include the retrovirus XMRV associated with chronic fatigue syndrome, and a virus believed to cause hepatitis (a contaminant from laboratory plasticware).
As virus discoverer Eric Delwart noted on TWiV 86, ‘DNA is a real problem. It’s everywhere’. Apparently so is HeLa cell RNA.
The contamination reported by Drs. Cantalupo, Katz, and Pipas in regard to HeLa nucleic acid and HPV18 [1] appears to also apply to HIV. In 2014 I was able to find HIV-1 DNA in a surprisingly wide variety of unrelated taxa. Like Professor Racaniello, I assumed “their presence is likely due to contamination”. However, it is a mystery why I am not able to find any contamination with the far more prevalent Hepatitis B and Hepatitis C viruses (which are associated with HIV infections). My results can be found here [2].
Similar cases of contamination may occur when laboratories use “lab strains” such as JR-CSF or HXB2 to validate their PCR primers. Then, when they amplify and sequence virus from patients, some sequences turn out to be the “lab strain”. On the other hand, labs may harbor sequences from one patient contaminated by another patient. In situations such as these it will not be possible to be certain about the true origin of the sequences.
1. Cantalupo PG, Katz JP, Pipas JM. HeLa nucleic acid contamination in The Cancer Genome Atlas leads to the misidentification of human papillomavirus 18. J Virol 2015; 89: 4051-7.
2. Romero Fernández-Bravo M. Contamination of genomic databases by HIV-1 and its possible consequences. A study in Bioinformatics. 2014. http://openaccess.uoc.edu/webapps/o2/handle/10609/31361