
Cellosaurus FAQ (Frequently Asked Questions)


CALIPHO group at the SIB - Swiss Institute of Bioinformatics Geneva, Switzerland


Index of questions
Q01: How do I cite the Cellosaurus in a publication?
Q02: How can I contact the Cellosaurus so as to propose an update to an existing entry or a new entry?
Q03: How frequently is the Cellosaurus updated?
Q04: Are plant cell lines going to be added to the Cellosaurus?
Q05: Are patient-derived xenografts and other transplantable tumors going to be added to the Cellosaurus?
Q06: Is it possible to directly indicate search terms in a URL?
Q07: Can I save the Cellosaurus records corresponding to the result of a search?
Q08: I would like to order cell line XYZ from you?
Q09: Where can I buy cell line XYZ?
Q10: Why is supplier XYZ not listed in the answer to question Q09?
Q11: Is there a difference in the content of the Cellosaurus in the structured text (TXT), OBO and XML formats?
Q12: Why is cell line XYZ reported to originate from a male individual but its authentication value for amelogenin is reported as "X" instead of "X,Y"?
Q13: How to make best use of the Cellosaurus in the context of literature text-mining activities?
Q14: What is the meaning of the "CVCL" prefix in Cellosaurus accession numbers?
Q15: Can I search inside NCBI PubMed or Europe PMC for papers that are cited in the Cellosaurus?
Q16: My cell line XYZ is not in the Cellosaurus, how can I get it in?
Q17: Is there a one to one relationship between the cell lines listed in the ICLAC register and the Cellosaurus "Problematic cell lines"?
Q18: What is the difference between a Cellosaurus accession number and a cell line RRID?
Q19: Why are primary cells not included in the Cellosaurus (while it includes many finite cell lines)?
Q20: Does the Cellosaurus include references to all papers relevant to a given cell line?
Q21: Why are there different entries for the 3T3 cell line in the Cellosaurus?
Q22: What is the licence under which the Cellosaurus is provided?
Q23: What can I search for using the search bar?
Q24: Why can't I find a recently created entry in the Cellosaurus?
Q25: How can I access an old version of the Cellosaurus?
Q26: Do vaccines contain cell lines?
Q27: How do I get cell line XYZ to grow?
Q28: What are the differences between the Cellosaurus and hPSCreg?
Q29: Are fetuses aborted to derive cell lines?
Q30: Is the sequence of gene ABCD identical in cell line XYZ and in the reference human genome?
Q31: Why when I use CLASTR to search for similarities between the STR profile of my mouse cell line and the profiles stored in the Cellosaurus I am either not getting any hits or unexpectedly get a very high match with a number of cell lines?
Q32: Can you provide the Cellosaurus in JSON format?
Q33: What is the meaning of the different two-letter line codes in the text version of the Cellosaurus?


Q01: How do I cite the Cellosaurus in a publication?

The paper to cite that describes the Cellosaurus is:

Bairoch A. The Cellosaurus, a cell line knowledge resource. J. Biomol. Tech. 29:25-38(2018).

DOI: 10.7171/jbt.18-2902-002; PMID: 29805321; PMCID: PMC5945021

See: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5945021/

You can also cite Cellosaurus using its RRID: Cellosaurus (RRID:SCR_013869)

If you have used the CLASTR STR similarity search tool we would be pleased that you cite:

Robin T., Capes-Davis A., Bairoch A. CLASTR: the Cellosaurus STR similarity search tool -- A precious help for cell line authentication. Int. J. Cancer 146:1299-1306(2020).

DOI: 10.1002/ijc.32639; PMID: 31444973

See: https://doi.org/10.1002/ijc.32639

You can also cite CLASTR using its RRID: CLASTR (RRID:SCR_024863)


Q02: How can I contact the Cellosaurus so as to propose an update to an existing entry or a new entry?

You can either click on the "contact" link on the top right of all Cellosaurus web pages or send an email to "cellosaurus@sib.swiss"


Q03: How frequently is the Cellosaurus updated?

Currently there is no regular update schedule, but we try to release new versions of the Cellosaurus at least 4 times per year.


Q04: Are plant cell lines going to be added to the Cellosaurus?

Currently there is no plan to increase the coverage of the Cellosaurus to plant cell lines.


Q05: Are patient-derived xenografts and other transplantable tumors going to be added to the Cellosaurus?

This is something undergoing internal discussion and if you think that it would be useful to add them, please let us know!


Q06: Is it possible to directly indicate search terms in a URL?

Yes, you can do so with the following syntax:

https://www.cellosaurus.org/search?query=your_query

Examples:

https://www.cellosaurus.org/search?query=CVCL_0030
https://www.cellosaurus.org/search?query=hela
https://www.cellosaurus.org/search?query=sus+scrofa
https://www.cellosaurus.org/search?query=%22fish%20cell%20line%22
https://www.cellosaurus.org/search?query=doubling+time+%223%20days%22
https://www.cellosaurus.org/search?query=%22PubMed=1000501%22
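
As an illustration, search URLs of this form can be built programmatically. The following Python sketch is only one possible approach; the function name build_search_url and its exact_phrase parameter are hypothetical and not part of the Cellosaurus. It encodes spaces as %20, whereas some of the examples above use "+"; both forms appear in the examples above.

```python
from urllib.parse import quote

# Base of the search URL, as given in the syntax above.
BASE_URL = "https://www.cellosaurus.org/search?query="


def build_search_url(query: str, exact_phrase: bool = False) -> str:
    """Return a Cellosaurus search URL for the given query text.

    exact_phrase wraps the query in double quotes, as in the
    "fish cell line" example above (hypothetical helper parameter).
    """
    if exact_phrase:
        query = f'"{query}"'
    # Percent-encode everything except "=", which the PubMed example keeps as is.
    return BASE_URL + quote(query, safe="=")


print(build_search_url("CVCL_0030"))
print(build_search_url("fish cell line", exact_phrase=True))
print(build_search_url("PubMed=1000501", exact_phrase=True))
```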


Q07: Can I save the Cellosaurus records corresponding to the result of a search?

Not yet, but this is something we would like to implement in a future version of the web interface.


Q08: I would like to order cell line XYZ from you?

We are not selling or distributing any cell lines. The Cellosaurus is a knowledge resource on cell lines.


Q09: Where can I buy cell line XYZ?

The Cellosaurus provides cross-references or web links to the main organisations and companies selling or distributing cell lines. You will find these links in the relevant cell line entries.

Important caveats:

We try to be as inclusive as possible, but the absence of any indication of availability in a Cellosaurus entry cannot be taken as proof that the cell line is not available somewhere.

The list of cell lines distributed by a given entity is very dynamic and we cannot guarantee perfect synchronization with all the entities involved in cell line distribution.

If you are unable to find an organisation that distributes your cell line of interest, you could try to contact the laboratory that established it or, if it is no longer active, a laboratory that makes use of that cell line.

The papers referenced in the relevant Cellosaurus cell line entry should help you find the right person to contact. Look at the oldest reference listed in the entry: it is most often the one describing the establishment of the cell line or, failing that, the first one mentioning it. However, if the group that created a cell line is no longer active, search Google Scholar or PubMed Central (PMC) for recent papers mentioning this cell line. Look in the materials and methods section for a statement of the type "thanks to Xxxx for providing the Yyyyy cell line". You can then try to contact the relevant person.

Below we provide links to a number of companies and organisations that provide cell lines. We encourage you to browse their online catalogs.

Abcam: https://www.abcam.com/nav/cell-lines-and-lysates
AddexBio: https://www.addexbio.com/productshow?id=4
Allen Cell Collection: https://www.allencell.org/cell-catalog.html
American Type Culture Collection (ATCC): https://www.atcc.org/
Applied Biological Materials (abm): https://www.abmgood.com/Cell-Biology.html
B'SYS: https://www.bsys.ch/cell-lines.html
Banco de Celulas do Rio de Janeiro (BCRJ): https://bcrj.org.br/pesquisa/
BioIVT: https://www.bioivt.com/cell-products/?_sf_s=cell%20lines
Biopredic International: https://www.biopredic.com/
Bioresource Collection and Research Center (BCRC) from Taiwan: https://www.bcrc.firdi.org.tw/en/home/
Cedarlane Cellutions Biosystems: https://www.cedarlanelabs.com/Cellutions
Cell Biolabs: https://www.cellbiolabs.com/
CellBank Australia (CBA): https://www.cellbankaustralia.com/
Cellero: https://cellero.com/products/immune-cells/
Cellular Engineering Technologies (CET): https://www.celleng-tech.com/ips-cell-lines
Children's Oncology Group (COG) Cell Culture and Xenograft Repository: https://www.cccells.org/cellreqs-leuk.php
Chordoma Foundation cell line repository: https://www.chordomafoundation.org/research/disease-models/
CLS - Cell lines Services: https://cls.shop/
Collection of Cell Lines in Veterinary Medicine (CCLV): https://www.fli.de/en/services/collection-of-cell-lines-in-veterinary-medicine-cclv/
Coriell Institute Biorepositories: https://catalog.coriell.org/
Cosmo Bio: https://www.cosmobio.com/contents/list.php?category_id=36
Creative Bioarray: https://www.creative-bioarray.com/products/immortalized-cells-191.htm
Creative Biolabs: https://neuros.creative-biolabs.com/category-neural-cell-lines-42.htm
Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ): https://www.dsmz.de/catalogues/catalogue-human-and-animal-cell-lines.html
DiscoverX: https://www.discoverx.com/products-applications/cell-lines
Drosophila Genomics Resource Center (DGRC): https://dgrc.bio.indiana.edu/cells/Catalog
ESI BIO: http://www.esibio.com/products/product-category/cell-lines/
European Bank for induced pluripotent Stem Cells (EBiSC): http://www.ebisc.org/
European Collection for Biomedical Research: http://bioinformatics.hsanmartino.it/ecbr/ecbrsite.html
European Collection of Cell Cultures (ECACC): http://www.phe-culturecollections.org.uk/products/celllines/index.aspx
Evercyte: https://evercyte.com/products-cells-catalogued-by-cell-type/
FujiFilm Cellular Dynamics, Inc: https://fujifilmcdi.com/the-cirm-ipsc-bank
GenHunter: https://www.genhunter.com/cell-lines/
GenScript: https://www.genscript.com/cell_lines.html
HGDP-CEPH Human Genome Diversity Cell Line Panel: https://cephb.fr/en/hgdp_panel.php
Horizon Discovery: https://www.horizondiscovery.com/cell-lines
Human Induced Pluripotent Stem Cells Initiative: https://www.hipsci.org/
IHWG Cell and DNA Bank: https://www.fredhutch.org/en/labs/clinical/projects/ihwg/cell-lines-genes.html
Imanis Life Sciences: https://www.imanislife.com/collections/cell-lines/
Inbiomed: http://www.inbiomed.org/Index.php/servicios_externos/inbiobank
INCELL: https://www.incell.com/product-category/proprietary-cell-lines/
IncuCyte (Essen BioScience): https://www.essenbioscience.com/en/products/reagents-consumables/category/cell-lines/
Innoprot: https://innoprot.com/products/
Interlab Cell Line Collection (ICLC): http://www.iclc.it/
Inven2Biologics: http://inven2biologics.com/shop/cell-lines
InvivoGen: https://www.invivogen.com/cell-lines
ISENET Biobanking: https://www.isenetbiobanking.com/stem-cells-catalogue
Istituto Zooprofilattico Sperimentale della Lombardia e dell'Emilia Romagna biobank (IZSLER): http://www.ibvr.org/Services/CellCultures.aspx
Japanese Collection of Research Bioresources Cell Bank (JCRB): https://cellbank.nibiohn.go.jp/english/
Kerafast: https://www.kerafast.com/cat/60/cells-tissue-samples
King's College London hES cell line catalog: https://www.kcl.ac.uk/lsm/research/divisions/wh/groups/medicine/hescell.aspx
Korean Cell Line Bank (KCLB): https://cellbank.snu.ac.kr/main/index.html
Kunming Cell Bank of Type Culture Collection (KCB): http://english.kiz.cas.cn/about/an/201108/t20110824_74162.html
KYinno: https://www.kyinno.com/
Lurie Children's Hospital of Chicago Human iPS & Stem Cell Core Facility: https://www.luriechildrens.org/en/research/core-facilities/ips-stem-cell/
Memorial Sloan Kettering Cancer Center: https://www.mskcc.org/research-advantage/support/office-technology-development/tangible-materials-available-licensing
Merck Millipore (EMD Millipore): https://www.merckmillipore.com/
National Cell Bank of Iran (NCBI): https://en.pasteur.ac.ir/Department%20of%20Cell%20Bank
National Laboratory for the Genetics of Israeli Populations (NLGIP): http://yoran.tau.ac.il/nlgip/
Ncardia: https://ncardia.com/
NIA ES Cell Bank: https://esbank.nia.nih.gov/
NIH HIV Reagent Program: https://www.hivreagentprogram.org/
NIMH Repository and Genomic Resource: https://www.nimhgenetics.org/stem_cells/crm_lines.php
NINDS Human Cell and Data Repository (NHCDR): https://stemcells.nindsgenetics.org
Novus Biologicals: https://www.novusbio.com/
Oxford Expression Technology (OET): https://oetltd.com/product-category/insect-cell-culture/insect-cells/
ParaTechs: https://www.paratechs.com/ve-bevs-cell-lines/
PerkinElmer: https://www.perkinelmer.com/
ProBioGen: https://www.probiogen.de/virus-production-cell-lines.html
Progeria Research Foundation: https://www.progeriaresearch.org/available-cell-lines/
RIKEN Bioresource Center Cell Bank (RCB): https://cell.brc.riken.jp/en/rcb
Rockefeller University Embryonic Stem Cell Lines: https://rues.rockefeller.edu/
Rockland Immunochemicals: https://www.rockland.com/categories/cell-lines-and-lysates/
Royan Stem Cell Bank (RSCB): http://www.royaninstitute.org/cmsen/index.php?option=com_content&task=view&id=205&Itemid=40
Selexis: http://selexis.com/technology/proprietary-cho-k1-cell-line/
Sigma-Aldrich: https://www.sigmaaldrich.com/life-science/cell-culture.html
Takara (Clontech Laboratories): https://www.takarabio.com/
Thermo Fisher Scientific: http://www.thermofisher.com/ch/en/home/life-science.html
Tiandz: http://www.tiandz.com/en/
Tick Cell Biobank: https://www.liverpool.ac.uk/infection-and-global-health/research/tick-cell-biobank/
Tohoku University cell line catalog (TKG): http://www2.idac.tohoku.ac.jp/dep/ccr/
Vircell: https://en.vircell.com/products/?technique=34
WiCell: https://www.wicell.org/
World of Sarcome cell line: https://en.cellline.jp/
XCell Science: http://www.xcellscience.com/products/ipsc
Ximbio reagents online portal: https://ximbio.com/
XIP: https://xip.uclb.com/products/materials/cell_lines/list
ZenBio: https://www.zen-bio.com/products/cells/


Q10: Why is supplier XYZ not listed in the answer to question Q09?

There are many cell line suppliers and while we strive to be comprehensive we are probably missing a number of organizations/companies that distribute cell lines. So let us know if you think a supplier should be added to the list.

But you should also be aware that:

1) We may have been informed by members of the scientific community that a supplier distributes cell lines that are not tested for authenticity or have been obtained illegally. We will remove the supplier from our list if we are requested to do so for those reasons.

2) We do not list suppliers that act solely as redistributors for another organisation.


Q11: Is there a difference in the content of the Cellosaurus in the structured text (TXT), OBO and XML formats?

Yes, you should be aware that:

The OBO format file (cellosaurus.obo) does not contain: the STR profile data, the age at sampling and the full reference records (authors, title, journal). All the comments are concatenated.

And while the TXT and XML versions of the Cellosaurus have the same data content, there are some differences in terms of the organization of the data. These differences are described below.

The TXT version is split into two data files, cellosaurus.txt and cellosaurus_refs.txt. The first file contains all the Cellosaurus information except for the full reference records, which are stored in the second file. This is not the case for the XML version, which consists of a single data file, cellosaurus.xml, and an XML schema definition file, cellosaurus.xsd.

In the TXT file, cross-references to external resources do not contain the URLs of these cross-references; these can be built using the information stored in the file cellosaurus_xrefs.txt.


Q12: Why is cell line XYZ reported to originate from a male individual but its authentication value for amelogenin is reported as "X" instead of "X,Y"?

Nearly half of male cancer cell lines are incorrectly identified as female. Authentication testing kits usually include sex-specific markers such as the amelogenin gene. Frequent loss or mutation of this gene has been observed resulting in a false-negative call for presence of the Y chromosome and the sample typing as female. Loss of an intact Y chromosome has also been observed, sometimes with residual fragments found in other chromosomes suggesting a catastrophic chromosomal event in these cancer cells. Loss of the Y chromosome is also evident in normal physiology. In cases where the autosomal STR results match the original sample, but the allosome results are not what is expected, this indicates the cell line origin is correct but further investigation is required to understand the sex of the sample.

For further reading: https://www.ncbi.nlm.nih.gov/pubmed/9462733
https://www.ncbi.nlm.nih.gov/pubmed/15637111
https://www.ncbi.nlm.nih.gov/pubmed/25714623

PS1: Thanks to Richard Neve and Amanda Capes-Davis for providing this very informative answer.

PS2: In August 2021 out of 4279 human male cell lines that have a STR profile in the Cellosaurus, 1174 of them have at least one report of a STR profile with no "Y" allele detected.


Q13: How to make best use of the Cellosaurus in the context of literature text-mining activities?

We describe here the features of the Cellosaurus which are particularly useful in the context of the development of dictionaries for named entity recognition (NER) in the full text of life sciences publications (journals and patents).

All examples below make use of the Cellosaurus XML format, but the equivalent information is available in the structured text and OBO format files.

1) Names/synonyms

A cell line's recommended name and synonyms are stored in the "name-list" element.

Example (CVCL_0030):

<name-list>
 <name type="identifier">HeLa</name>
 <name type="synonym">HELA</name>
 <name type="synonym">Hela</name>
 <name type="synonym">He La</name>
 <name type="synonym">He-La</name>
 <name type="synonym">Henrietta Lacks cells</name>
 <name type="synonym">Helacyton gartleri</name>
</name-list>

Important notes:


- There is always one recommended name ("identifier") and >=0 alternative name(s) ("synonym").


- As shown in the above example, synonyms include alternative use of lower and upper cases.


- For text-mining purposes there is no distinction to be made between the recommended name and the alternative names. The recommended name does not aim to capture the most frequently used name but rather either that proposed by the originator of the cell line, or one consistent with other similarly named cell lines, or the least ambiguous one.


- When two or more cell lines share identical recommended names, the ambiguous names are disambiguated by post-fixing the name with a short description between square brackets. Examples:

CVCL_5246:
<name type="identifier">C32 [Human colon adenocarcinoma]</name>
CVCL_1097:
<name type="identifier">C32 [Human melanoma]</name>

The description field should be ignored when building a cell name dictionary.
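
As an illustration of the notes above, the following Python sketch collects the recommended name and synonyms from a "name-list" fragment and drops the square-bracket description. It is a minimal sketch, not an official parser: the function name dictionary_names and the regular expression are assumptions, and the sketch treats any trailing "[...]" as a disambiguating description.

```python
import re
import xml.etree.ElementTree as ET

# Name list of CVCL_0030, copied from the example above.
HELA_NAMES = """<name-list>
 <name type="identifier">HeLa</name>
 <name type="synonym">HELA</name>
 <name type="synonym">Hela</name>
 <name type="synonym">He La</name>
 <name type="synonym">He-La</name>
 <name type="synonym">Henrietta Lacks cells</name>
 <name type="synonym">Helacyton gartleri</name>
</name-list>"""

# Assumption: a disambiguating description is a trailing "[...]" group.
DESCRIPTION_SUFFIX = re.compile(r"\s*\[[^\]]*\]$")


def dictionary_names(name_list_xml: str) -> list[str]:
    """Return identifier and synonyms, without square-bracket descriptions."""
    root = ET.fromstring(name_list_xml)
    names = []
    for element in root.findall("name"):
        cleaned = DESCRIPTION_SUFFIX.sub("", (element.text or "").strip())
        # Identifier and synonyms are treated alike; duplicates are skipped.
        if cleaned and cleaned not in names:
            names.append(cleaned)
    return names


print(dictionary_names(HELA_NAMES))
print(dictionary_names(
    '<name-list><name type="identifier">C32 [Human melanoma]</name></name-list>'
))
```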

2) Misspellings

Misspellings are captured in the Cellosaurus through the comment category "Misspelling". Examples:

CVCL_Z275:
<misspelling-list>
  <misspelling>
    <misspelling-name>KLBIQ-Chsu-I</misspelling-name>
    <misspelling-note>In text of paper but not in abstract.</misspelling-note>
    <reference-list>
      <reference resource-internal-ref="PubMed=25381037"/>
    </reference-list>
  </misspelling>
</misspelling-list>

CVCL_0332:
<misspelling-list>
  <misspelling>
    <misspelling-name>HTB126</misspelling-name>
    <misspelling-note>Based on the ATCC catalog number.</misspelling-note>
  </misspelling>
  <misspelling>
    <misspelling-name>Hs-587-T</misspelling-name>
    <xref-list>
      <xref database="Cosmic" category="Polymorphism and mutation databases" accession="1176633">
        <url><![CDATA[https://cancer.sanger.ac.uk/cosmic/sample/overview?id=1176633]]></url>
      </xref>
      <xref database="Cosmic" category="Polymorphism and mutation databases" accession="1010929">
        <url><![CDATA[https://cancer.sanger.ac.uk/cosmic/sample/overview?id=1010929]]></url>
      </xref>
    </xref-list>
  </misspelling>
</misspelling-list>
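
A dictionary builder may want to add these misspellings as extra surface forms. The sketch below is a minimal Python example under the assumption that the fragment has the structure shown above; the function name misspelled_names is hypothetical, and the sample is a shortened copy of the CVCL_0332 example (notes and cross-references are ignored).

```python
import xml.etree.ElementTree as ET

# Shortened copy of the CVCL_0332 example above (xref-list omitted).
SAMPLE = """<misspelling-list>
  <misspelling>
    <misspelling-name>HTB126</misspelling-name>
    <misspelling-note>Based on the ATCC catalog number.</misspelling-note>
  </misspelling>
  <misspelling>
    <misspelling-name>Hs-587-T</misspelling-name>
  </misspelling>
</misspelling-list>"""


def misspelled_names(misspelling_list_xml: str) -> list[str]:
    """Return the text of every misspelling-name element, in document order."""
    root = ET.fromstring(misspelling_list_xml)
    return [
        element.text.strip()
        for element in root.iter("misspelling-name")
        if element.text
    ]


print(misspelled_names(SAMPLE))
```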

3) Accession numbers

As Cellosaurus accession numbers are now starting to be used to cite cell lines in the literature (mainly in the context of the efforts of the Resource Identification Initiative), it is useful to include them in your cell line dictionary. The accession numbers are stored in the "accession-list" element.

Example:

<accession-list>
 <accession type="primary">CVCL_0298</accession>
 <accession type="secondary">CVCL_4526</accession>
</accession-list>
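
One possible way to use this element in a dictionary is to map every accession number, primary or secondary, to the primary one. The Python sketch below does this for the example above; the function name accession_lookup is hypothetical, and treating secondary accessions as aliases of the primary one is an assumption made for the purpose of the illustration.

```python
import xml.etree.ElementTree as ET

# Accession list copied from the example above.
SAMPLE = """<accession-list>
 <accession type="primary">CVCL_0298</accession>
 <accession type="secondary">CVCL_4526</accession>
</accession-list>"""


def accession_lookup(accession_list_xml: str) -> dict[str, str]:
    """Map each accession in the list to the primary accession of the entry."""
    root = ET.fromstring(accession_list_xml)
    accessions = root.findall("accession")
    # Assumption: each list holds exactly one accession of type "primary".
    primary = next(
        a.text.strip() for a in accessions if a.get("type") == "primary"
    )
    return {a.text.strip(): primary for a in accessions}


print(accession_lookup(SAMPLE))
```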

4) Contextualization

Cell line names can be contextualized on the basis of:


- Species: through the "species-list" element which makes use of NCBI Taxonomy TaxIDs.


- Subspecies/breed: through the comment category "Breed/subspecies" which is free text.


- Disease: through the "disease-list" which makes use of NCI Thesaurus and Orphanet ORDO codes.


- Tissue: Not yet possible. We plan to add the tissue of origin using Uberon as the standardization resource, but this will not take place until 2022.


Q14: What is the meaning of the "CVCL" prefix in Cellosaurus accession numbers?

It is good practice for knowledge resources and ontologies to use accession numbers that are composed of a prefix followed by a separator (often ':' or '_') and then an alphanumerical string. The prefix is often the abbreviation or the name of the resource. As the name "Cellosaurus" is a bit long and not obvious to abbreviate, we chose "CVCL", which originally stood for "Controlled Vocabulary for Cell Lines". As the Cellosaurus is now much more than a controlled vocabulary, we are not eager to spell out what "CVCL" means!


Q15: Can I search inside NCBI PubMed or Europe PMC for papers that are cited in the Cellosaurus?

Yes, the Cellosaurus is part of both the NCBI "Linkout" and Europe PMC "External Links" services.

A) For PubMed:

To restrict searches to papers cited in the Cellosaurus, you need to add "AND loprovCellosaurus[filter]" to your search pattern.

Example: prostate[Title/Abstract] AND loprovCellosaurus[filter]

https://www.ncbi.nlm.nih.gov/pubmed/?term=prostate%5BTitle%2FAbstract%5D+AND+loprovCellosaurus%5Bfilter%5D

will retrieve all papers that mention the word "prostate" in their title or abstract and are cited in the Cellosaurus.

B) For Europe PMC:

To restrict searches to papers cited in the Cellosaurus, you need to add "AND (LABS_PUBS:"1815")" to your search pattern.

Example: (ABSTRACT:"prostate") AND (LABS_PUBS:"1815")

https://europepmc.org/search?query=%28ABSTRACT:%22prostate%22%29+AND+%28LABS_PUBS:%221815%22%29&page=1

will retrieve all papers that mention the word "prostate" in their abstract and are cited in the Cellosaurus.
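
The two restricted searches can also be generated from code. The Python sketch below appends the filters quoted above to a user-supplied search expression and percent-encodes the result; the function names are hypothetical, and the sketch leaves out the "&page=1" parameter of the Europe PMC example.

```python
from urllib.parse import quote_plus


def pubmed_cellosaurus_url(term: str) -> str:
    """Build a PubMed URL restricted to papers cited in the Cellosaurus."""
    query = f"{term} AND loprovCellosaurus[filter]"
    return "https://www.ncbi.nlm.nih.gov/pubmed/?term=" + quote_plus(query)


def europepmc_cellosaurus_url(expression: str) -> str:
    """Build a Europe PMC URL restricted to papers cited in the Cellosaurus."""
    query = f'({expression}) AND (LABS_PUBS:"1815")'
    # ":" is kept unencoded to mirror the example URL above.
    return "https://europepmc.org/search?query=" + quote_plus(query, safe=":")


print(pubmed_cellosaurus_url("prostate[Title/Abstract]"))
print(europepmc_cellosaurus_url('ABSTRACT:"prostate"'))
```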


Q16: My cell line XYZ is not in the Cellosaurus, how can I get it in?

There are two possibilities:

1) You can send us the link to the publication where your cell line was first described. Based on the information in that publication we will create a preliminary Cellosaurus entry and then send you that entry for feedback. We will also possibly ask you for further useful information that was not available in the paper.

2) If your cell line was not described in any publication, you can send us the information that will be used to create a new Cellosaurus entry. The type of information needed varies depending on the type of cell line (cancer, transformed, ESC, iPSC, hybridoma, etc.) and we are planning to create a web form to allow submission of new cell line information. In the meantime, the best strategy is to send us, as a first step, a minimal set of information which consists of:


- Name
- Species of origin (and if not human the strain/breed of the animal)
- Gender and age of donor
- Category of cell line (examples: cancer, hybridoma, iPSC, ESC, etc).
- If the donor is suffering from a disease (cancer or genetic), that disease name

Many other items of information could be useful, and you should look in the Cellosaurus at entries for cell lines similar to the one you are submitting to get an idea of the type of information we capture. You can also read the following page for a complete list of Cellosaurus data fields:

https://www.cellosaurus.org/description.html

As for a publication-linked cell line, we will send you back a preliminary entry for feedback.

The preliminary entry will already contain the accession number of the future entry and you can cite it as a RRID (Research Resource Identifier) in your manuscripts.


Q17: Is there a one to one relationship between the cell lines listed in the ICLAC register and the Cellosaurus "Problematic cell lines"?

There is not a one to one relationship between the ICLAC register of misidentified cell lines and the Cellosaurus entries that contain the comment statement "Problematic cell line".

To understand the differences between the two sets, one first needs to be aware that the ICLAC register is composed of two tables:


- Table 1: cell lines where there is no known authentic stock


- Table 2: cell lines where some stocks have been shown to be misidentified, but where authentic stock is known to exist.

For all cell lines listed in Table 1 of the ICLAC register, there is in the corresponding Cellosaurus entries a comment "Problematic cell line". That comment starts with either "Contaminated" or "Misidentified".

Example:

Problematic cell line: Contaminated. Shown to be a HT-29 derivative (PubMed=10508494; PubMed=20143388). Originally thought to originate from a 60 year old male patient gastric carcinoma.

In addition these Cellosaurus entries will have a "Registration" comment line indicating the ICLAC register number for the cell line.

Example:

Registration: International Cell Line Authentication Committee, Register of Misidentified Cell Lines; ICLAC-00444.

But there are additional Cellosaurus entries that contain "Problematic cell line" comments, these belong to 3 categories:

1) Cell lines that are derived from a cell line in ICLAC Table 1. Currently ICLAC does not provide register numbers for all known derivatives of a contaminated/misidentified cell line. For example: the cell line KB is in the ICLAC register (ICLAC-00010) as well as two of its derivatives: KB-3-1 (ICLAC-00372) and KB-V1 (ICLAC-00373). But KB-A1, KB-C1.5, KB-C2 as well as 25 other KB-derived cell lines are not reported in the ICLAC register.

2) Cell lines that will be added to Table 1 but which have not yet appeared in the current version of the register. In the past six years the ICLAC register has been updated every second year. Thus quite a number of Table 1 cell lines can accumulate between two register updates.

3) Cell lines which have either not yet been considered by ICLAC for inclusion in the register or that have been considered but for which there is not enough evidence for them to appear in Table 1. The entries generally have a "Problematic cell line" comment that starts with "Probably".

Example:

Problematic cell line: Probably contaminated. The STR profile is identical to that of the T24 cell line.

None of the entries in the above 3 categories has a "Registration" comment for the ICLAC register.

Concerning ICLAC Table 2, the situation in the Cellosaurus is the following: the majority of these cell lines contain a "Problematic cell line" comment that starts with "Partially contaminated" or "Partially misidentified" as well as an ICLAC "Registration" comment. But about one third of the 43 cell lines that are currently reported in Table 2 have neither a "Problematic cell line" comment nor an ICLAC "Registration" comment in their corresponding Cellosaurus entries. These are well known and widely used cell lines (examples: BJA-B, BT-20, J82, RT-4, etc.) where stock contamination was reported a very long time ago in a very limited number of laboratories. We believe that flagging them with the "Problematic cell line" comment would create confusion for researchers and would lead them to unnecessarily question their use of these cell lines.

Finally you should note that the list of Cellosaurus entries that are listed in tables 1 and 2 of the current public release of the ICLAC register can be retrieved using either one of the following two queries:

https://www.cellosaurus.org/search?query=%22iclac.org%22
https://www.cellosaurus.org/search?query=%22Register%20of%20Misidentified%20Cell%20Lines%22
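
The comment prefixes described in this answer can be used to sort "Problematic cell line" comments automatically. The Python sketch below is only an illustration: the function name and the labels "confirmed", "partial" and "probable" are hypothetical and are not Cellosaurus or ICLAC terminology. Note that a "confirmed" prefix alone does not tell you whether the cell line is in the ICLAC register; for that, the "Registration" comment must be checked.

```python
def problematic_status(comment: str) -> str:
    """Classify a "Problematic cell line" comment by the way it starts."""
    label = "Problematic cell line:"
    text = comment.strip()
    if text.startswith(label):
        text = text[len(label):].strip()
    lowered = text.lower()
    if lowered.startswith("partially"):
        return "partial"      # some stocks affected, authentic stock exists
    if lowered.startswith("probably"):
        return "probable"     # not enough evidence for ICLAC Table 1
    if lowered.startswith(("contaminated", "misidentified")):
        return "confirmed"
    return "unknown"


print(problematic_status(
    "Problematic cell line: Contaminated. Shown to be a HT-29 derivative "
    "(PubMed=10508494; PubMed=20143388)."
))
print(problematic_status(
    "Problematic cell line: Probably contaminated. The STR profile is "
    "identical to that of the T24 cell line."
))
```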


Q18: What is the difference between a Cellosaurus accession number and a cell line RRID?

There is no difference. The Cellosaurus accession number (in the format CVCL_xxxx) is identical to the Research Resource Identifier (RRID).

To cite a cell line in a publication use the format: Cell_line_name (RRID:CVCL_xxxx). Example: HeLa (RRID:CVCL_0030)
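
For those handling many citations, the format above is easy to generate and to detect in text. The Python sketch below is a minimal illustration; the function names are hypothetical, and the pattern assumes that the "xxxx" part is always four upper-case letters or digits, as in CVCL_0030 or CVCL_Z275.

```python
import re

# Assumption: "CVCL_" followed by exactly four upper-case letters or digits.
RRID_PATTERN = re.compile(r"RRID:\s*(CVCL_[A-Z0-9]{4})\b")


def format_citation(cell_line_name: str, accession: str) -> str:
    """Return a citation in the form: Cell_line_name (RRID:CVCL_xxxx)."""
    return f"{cell_line_name} (RRID:{accession})"


def find_cell_line_rrids(text: str) -> list[str]:
    """Return the Cellosaurus accession numbers cited as RRIDs in the text."""
    return RRID_PATTERN.findall(text)


print(format_citation("HeLa", "CVCL_0030"))
print(find_cell_line_rrids("Cells used: HeLa (RRID:CVCL_0030)."))
```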

See also our educational video on cell lines RRIDs:

https://www.youtube.com/watch?v=Cz64B4ShS64


Q19: Why are primary cells not included in the Cellosaurus (while it includes many finite cell lines)?

In the Cellosaurus we include finite cell lines (that have a limited lifespan and will senesce after a number of population doublings) if they are well defined in terms of their origin (species, gender, age at sampling) and if they are distributed by an academic or commercial entity or used widely by one or more laboratories. Although it has a limited lifespan, a finite cell line from a single donor may be used extensively provided it is stored early in its lifespan.

Primary cells, also known as primary cultures, are generally taken directly from a living tissue, are established for growth in vitro and have undergone very few population doublings before they are distributed. They are generally more heterogeneous than a cell line which has been continually passaged over a long period of time and has acquired homogeneous genotypic and phenotypic characteristics. Companies distributing primary cells will often replenish their stock of primary cells and each batch may originate from a different donor. Thus, while the product catalog number of a primary cell may be stable, its genetic and phenotypic background may change.

As primary cells do not represent a precisely defined entity, they are out of the scope of the Cellosaurus. This is also why we will not assign RRIDs for these entities.


Q20: Does the Cellosaurus include references to all papers relevant to a given cell line?

We try to include references to papers that describe the establishment and characterization of a cell line. We are definitely not attempting to capture all papers that make use of a cell line, as this would, in many cases, amount to a very large number of papers.

An important rule is that we always include in a Cellosaurus entry the references for the papers from which we have extracted information that was added to that entry.

For hybridomas we do not capture references pertinent to the use of the monoclonal antibody (mAb) produced by the hybridoma as this is outside of the scope of the Cellosaurus. We make an exception for papers that report the exact molecular target of the mAb as this information is useful to describe the hybridoma.

If you believe we are missing a reference to a paper that should be included in a Cellosaurus entry, please let us know so that we can add it.


Q21: Why are there different entries for the 3T3 cell line in the Cellosaurus?

As nicely explained in a Wikipedia page and an ASBMB Today blog entry:

https://en.wikipedia.org/wiki/3T3_cells
https://www.asbmb.org/asbmbtoday/201501/Generations/3T3/

the term 3T3 not only refers to a number of cell lines but originates in the method that Todaro and Green used to establish these cell lines: primary mouse embryonic fibroblasts were continuously transferred ("T") every 3 days (the first "3") and inoculated at the rigid density of 3x10^5 cells per 20 cm^2 dish (the second "3").

Therefore there are a variety of "3T3" cell lines that have been established over the years. Most of them are pre- or post-fixed with the name of the mouse breed from which they originate (like: 3T3-Swiss albino, BALB/3T3 or NIH 3T3).

There are however grounds for confusion with these names, and specifically one of the problems lies with the 3T3-Swiss albino and NIH 3T3 lines:

https://www.cellosaurus.org/CVCL_0120
https://www.cellosaurus.org/CVCL_0594

The first one is from a Swiss-Albino mouse and was published in 1963, while the second is from a NIH Swiss mouse and was published in 1969. Many authors abbreviate them both to 3T3-Swiss or Swiss-3T3, thus blurring the distinction between the two cell lines. The only way to know which one was used in a paper is to look up the catalog number of the cell line collection from which the authors obtained their "3T3" line and to see which Cellosaurus entry it is linked to. Unfortunately many papers do not contain this information, and then the only way to disambiguate between the two cell lines is to contact the authors directly.


Q22: What is the licence under which the Cellosaurus is provided?

We have chosen to apply to the Cellosaurus the Creative Commons Attribution 4.0 International (CC BY 4.0) license:

https://creativecommons.org/licenses/by/4.0/

This means that you are free to:

Share: copy and redistribute the Cellosaurus in any medium or format

Adapt: remix, transform, and build upon the Cellosaurus for any purpose, even commercially.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests we (the licensor) endorse you or your use.

No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.


Q23: What can I search for using the search bar?

The search bar offers access to a full text index of the Cellosaurus. This means you can search not only for a cell line name, but also for other things such as catalog numbers in cell line collections, disease names, species, reference authors, reference titles, PubMed IDs, sequence variations, transfected genes, etc.


Q24: Why can't I find a recently created entry in the Cellosaurus?

We create new entries and provide the corresponding accession numbers (RRIDs) as quickly as possible so that researchers can cite them in their manuscripts in a timely manner. But the Cellosaurus is not updated daily: new releases are prepared every 3 to 4 months. Thus, depending on when a new RRID was created, it can take up to 4 months before you can find it online in the Cellosaurus. However, you do not need to wait until an RRID is made public to cite it in a publication.

See also our educational video on this subject:

https://www.youtube.com/watch?v=Cz64B4ShS64


Q25: How can I access an old version of the Cellosaurus?

Let's start with some preliminary explanations and a bit of "history":


- The Cellosaurus is updated regularly through a release mechanism with new versions being released at least 4 times per year.
- The first version to be publicly distributed was release 2.0 on 04-Apr-2012.
- Each subsequent release is consecutively numbered with one exception: release 9.0 of 16-Apr-2014 was followed by release 9.1 of 17-Jul-2014.
- Initially the Cellosaurus files were available for download on the neXtProt FTP site but since May 2015 (with release 12.0 of 10-Apr-2015) they have been distributed from the Expasy FTP site:

ftp://ftp.expasy.org/databases/cellosaurus


- The number of files distributed at each release has evolved through time:

File name                       Release and date of first distribution
------------------------------  --------------------------------------
cellosaurus.txt                 2.0 of 04-Apr-2012
cellosaurus_relnotes.txt        2.0 of 04-Apr-2012
cellosaurus.obo                 4.0 of 22-Oct-2012
cellosaurus_deleted_ACs.txt     7.0 of 05-Nov-2013
cellosaurus_refs.txt            9.0 of 16-Apr-2014
cellosaurus_xrefs.txt           9.1 of 17-Jul-2014
cellosaurus_faq.txt             15.0 of 14-Dec-2015
cellosaurus.xml                 20.0 of 01-Dec-2016
cellosaurus.xsd                 20.0 of 01-Dec-2016
cellopub.txt                    21.0 of 03-Mar-2017
cellosaurus_name_conflicts.txt  23.0 of 22-Aug-2017
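
The table above can be turned into a small lookup that tells you which files you can expect to find in a given old release. This is only a sketch: the function and variable names are ours, and it assumes that a file, once introduced, was distributed in every later release (the table only states when each file was first distributed).

```python
# First release in which each file was distributed, as (major, minor),
# copied from the table above.
FIRST_RELEASE = {
    "cellosaurus.txt": (2, 0),
    "cellosaurus_relnotes.txt": (2, 0),
    "cellosaurus.obo": (4, 0),
    "cellosaurus_deleted_ACs.txt": (7, 0),
    "cellosaurus_refs.txt": (9, 0),
    "cellosaurus_xrefs.txt": (9, 1),
    "cellosaurus_faq.txt": (15, 0),
    "cellosaurus.xml": (20, 0),
    "cellosaurus.xsd": (20, 0),
    "cellopub.txt": (21, 0),
    "cellosaurus_name_conflicts.txt": (23, 0),
}


def files_in_release(major, minor=0):
    """Return the sorted names of files first distributed at or before a release.

    Assumption (not stated in the table): no file was withdrawn after
    its first distribution.
    """
    release = (major, minor)
    return sorted(name for name, first in FIRST_RELEASE.items() if first <= release)


if __name__ == "__main__":
    for name in files_in_release(9, 1):
        print(name)
```

Releases are compared as (major, minor) pairs so that release 9.1 sorts correctly between 9.0 and 10.0.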

So where can you find old versions of the Cellosaurus files?

a) Starting with release 11.0 of 07-Nov-2014 the Cellosaurus files are on a GitHub directory at:

https://github.com/calipho-sib/cellosaurus

So to get the files for a particular release go to:

https://github.com/calipho-sib/cellosaurus/commits/master

Look for the commit labelled with the release number you are interested in (example "Release 15"). Click on that commit then click on the "Browse files" button and when the list of files is displayed click on the green button "Clone or download" and select the "Download ZIP" option.

All the Cellosaurus files are on GitHub with one exception: the XML file (cellosaurus.xml) which is too big to be stored on this platform.

b) We have archived all releases of the Cellosaurus on Yareta, the research data repository of Geneva's higher education institutions. To access the Cellosaurus archives go to:

https://yareta.unige.ch/home/search?search=search%3Dcellosaurus

Note that the Yareta archives for releases 2 up to 32 do not include the OBO and XML files.


Q26: Do vaccines contain cell lines?

Cell lines are used in the production of human and veterinary vaccines. They were first used for the testing and production of poliovirus vaccine. An estimated 10.3 million lives globally have been saved due to vaccines developed in WI-38 and other cell lines (see 2nd web link below).

Their purpose is to produce the active component of the vaccine. For some vaccines this is an attenuated or killed virus, for others an engineered virus vector, and for yet others recombinant viral or bacterial proteins that directly elicit an immune response.

During the vaccine production process the cellular debris and growth reagents are removed using a variety of purification methods and any remaining cell line DNA is broken down using nucleases.

So vaccines do NOT contain cell lines.

For further reading:

https://en.wikipedia.org/wiki/Cell-based_vaccine
http://www.aimspress.com/article/10.3934/publichealth.2017.2.127
https://bioscience.lonza.com/lonza_bs/CH/en/vaccine-manufacturing
https://www.welch-us.com/address-the-problem-of-vaccine-purification/
https://www.tandfonline.com/doi/full/10.1080/14712598.2020.1693541


Q27: How do I get cell line XYZ to grow?

Unfortunately we are not experts in lab techniques and thus are not able to help you to solve problems relevant to cell culture.

So what can you do?


- If you have obtained your cell line from a cell collection or a colleague, you should contact that institution or person.
- If you have a ResearchGate account you can use that platform to ask your question. Technical questions are very often answered in a timely and satisfactory manner by users of this platform.
- Finally, you should consider acquiring what we consider to be the "bible" of cell culture:

Freshney's Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 8th Edition
Amanda Capes-Davis and R. Ian Freshney
Wiley-Blackwell, 2021
ISBN: 978-1-119-51304-9

https://www.wiley.com/en-gb/Freshney%27s+Culture+of+Animal+Cells%3A+A+Manual+of+Basic+Technique+and+Specialized+Applications%2C+8th+Edition-p-9781119513049


Q28: What are the differences between the Cellosaurus and hPSCreg?

In a nutshell, the Cellosaurus is a knowledge resource on all types of cell lines from a wide range of organisms, whereas hPSCreg is a registry for human pluripotent stem cell lines (ESCs and iPSCs).

Because of this difference in scope, all cell lines (with very few exceptions) that are described by hPSCreg can be found in the Cellosaurus but not the converse. It should also be noted that currently only 20% of human ESC and 40% of human iPSC cell lines described in the Cellosaurus have been registered in hPSCreg but that these percentages are rising as the majority of newly established human pluripotent cell lines do get registered.

For a given cell line both resources contain information on its name and synonyms; provider; donor age, sex and ethnicity; disease status and associated sequence variations; engineered KOs and KIs; tissue/organ of sampling of the original cells and information on the relationship between lines originating from the same donor. Both also provide literature references as well as cross-references to external resources.

In addition, hPSCreg contains information that is not captured by the Cellosaurus, namely: the reprogramming method; culture conditions; pluripotency marker expression; morphology; differentiation potency; in-depth karyotyping information; the consent under which a cell line has been derived; and whether the cell line is readily available to third parties.

Entry of information in hPSCreg is directly performed by the groups that have established the relevant cell lines while Cellosaurus entries are manually curated. This has an impact on the update "strategies": the Cellosaurus is updated 4-6 times per year and thus new entries and updates will only be available every 2-3 months. In contrast hPSCreg is updated on a daily basis.

The Cellosaurus and hPSCreg are bi-directionally cross-referenced and these cross-references are updated at each release of the Cellosaurus.

There is a very active collaboration between the two resources that includes regular extraction of information by both sides as well as feedback on potential issues and errors.


Q29: Are fetuses aborted to derive cell lines?

No! While fetuses have been and are being used to derive cell lines, abortion is not carried out with the specific purpose of establishing cell lines.

Cell lines obtained from human fetuses can be classified into two categories:

1) Cell lines obtained from fetuses originating from a pregnancy loss. Pregnancy loss is often called miscarriage or spontaneous abortion if it occurs before 20 weeks of pregnancy and stillbirth after the 20th week. Pregnancy loss is frequently due to genetic defects such as chromosomal abnormalities or genetic diseases. Deriving cell lines from such fetuses is very useful as it allows researchers to perform experiments to better understand the effects of these genetic anomalies.

For example, cell line AG13074 is a skin fibroblast cell line obtained from a 19-week miscarried fetus which was affected with a trisomy of chromosome 18. This cell line is used for studies of gene dosage imbalance.

2) Cell lines obtained from fetuses after an elective or medical abortion. The cell lines established from these fetuses are expected to be "normal", i.e. not suffering from any genetic disease and, because of the youth of the tissues, not likely to have been the target of mutations that would give rise to cancers. A small number of such fetal cell lines were established, most of them in the 1960s and 1970s. We do not know of any instance where an abortion was carried out with the purpose of establishing this type of cell line.

These fetal cell lines were very useful to study life processes in cells that were derived neither from cancerous growths nor from older individuals and that thus represent good models of young healthy cells. Two of them, MRC-5 and WI-38, are often used to grow a range of human viruses and thus also to produce viral vaccines against these viruses. A third cell line, HEK293, was immortalized by transformation with an adenovirus and is used by thousands of laboratories worldwide. More than one thousand different cell lines are known to have been derived from HEK293 by genetic engineering, mostly to disable (knock out) or add (knock in) specific genes so as to be able to study their cellular roles and functions.

Fetal cell lines were also used to address safety concerns when producing early vaccines. Many cell lines at that time were contaminated with microorganisms and fetal cell lines were believed to have a lower risk of such problems. As better methods were developed to test cell lines for microorganisms, contaminated cell lines became less of a problem in vaccine manufacture. Today's manufacturing methods rely on more than a century of continuous efforts to improve vaccine safety.

Relevant Cellosaurus entries:

AG13074 https://www.cellosaurus.org/CVCL_X801
HEK293 https://www.cellosaurus.org/CVCL_0045
MRC-5 https://www.cellosaurus.org/CVCL_0440
WI-38 https://www.cellosaurus.org/CVCL_0579

Thanks to Amanda Capes-Davis for important contributions to this FAQ.


Q30: Is the sequence of gene ABCD identical in cell line XYZ and in the reference human genome?

This is a quite interesting question whose answer is dependent on two distinct factors.

First, if you are working with a cancer cell line, it will harbor a varying number of somatic mutations that the original tumor accumulated during cancer progression. The number of these mutations differs across cancer types and goes up to more than 200 in melanomas (see https://pubmed.ncbi.nlm.nih.gov/23539594). It is also important to be aware that cancer cells generally undergo chromosomal rearrangements that lead to gene fusions, deletions and amplifications. All of those events have an impact on the sequence of the gene that you are interested in studying.

If the cell line in which you want to check a particular gene sequence has been exome sequenced, you will be able to check whether that gene has been somatically mutated. Projects like DepMap (formerly the Cancer Cell Line Encyclopedia; CCLE) or the COSMIC cell line project have performed this analysis on about 2000 different cancer cell lines. In the Cellosaurus you will find cross-references to entries in both projects for these cell lines. If your cell line has not been analyzed by these projects, this information may still be available: check whether there is an "Omics: Deep exome analysis" comment in the relevant Cellosaurus entry. If so, you will find the relevant data sets in ArrayExpress, GEO or in supplementary tables of cited papers.

The second factor that leads to sequence variation is the presence of sequence polymorphisms. Resources such as the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org/) already contain information on more than 700 million sequence variants obtained from exome and genome sequencing of more than 150,000 individuals. While many of these variants are rare, a significant number of variants are present at a high frequency in one or more human populations. Thus there is a high probability that the sequence of a gene from an individual from which a cell line was established will be slightly different from that stored in the reference human genome.


Q31: Why when I use CLASTR to search for similarities between the STR profile of my mouse cell line and the profiles stored in the Cellosaurus I am either not getting any hits or unexpectedly get a very high match with a number of cell lines?

There are two issues one must consider when using CLASTR with mouse cell lines.

1) Currently only very few mouse cell lines have been STR profiled. As of June 2022 the Cellosaurus contained STR profiles for only 77 mouse cell lines. Thus we encourage anyone who sends a mouse cell line for authentication to submit the STR profile to the Cellosaurus or at least to include it in the supplementary files of their publication. As not all companies/facilities are yet proficient in calling the alleles for the different loci, it would be useful to include the electropherogram in the STR report.

2) Many cell lines are established from mice with the same genetic background, some are even derived from the same generation of clonal mice, and their STR profiles are generally either identical or highly similar. Thus it is not surprising that two cell lines that have been independently established by different laboratories at different times will share very similar STR profiles if they originate from the same mouse breed/strain.
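
To illustrate the second point, here is a minimal sketch of how an STR similarity score behaves for two independently established lines from the same inbred strain. It uses the Tanabe formula (twice the number of shared alleles divided by the total number of alleles in both profiles), which is one of the scoring options in CLASTR; the marker names, allele values and the restriction to markers present in both profiles are made up for this example and are not taken from any Cellosaurus entry.

```python
def tanabe_score(query, reference):
    """Percent similarity between two STR profiles (Tanabe formula).

    Each profile maps a marker name to the set of alleles called at that
    marker. Only markers present in both profiles are compared (an
    assumption of this sketch).
    """
    common_markers = query.keys() & reference.keys()
    shared = sum(len(query[m] & reference[m]) for m in common_markers)
    total = sum(len(query[m]) + len(reference[m]) for m in common_markers)
    return 100.0 * 2 * shared / total if total else 0.0


# Hypothetical profiles: inbred mice are homozygous at most loci, so
# each marker carries a single allele here.
line_a = {"marker_1": {"15"}, "marker_2": {"20.3"}, "marker_3": {"17"}}
line_b = {"marker_1": {"15"}, "marker_2": {"20.3"}, "marker_3": {"17"}}
line_c = {"marker_1": {"15"}, "marker_2": {"20.3"}, "marker_3": {"18"}}

if __name__ == "__main__":
    print(tanabe_score(line_a, line_b))  # same strain, identical profile
    print(tanabe_score(line_a, line_c))  # one marker differs
```

Two unrelated laboratories working with the same strain would end up with profiles like line_a and line_b, which score as a perfect match even though the cell lines are distinct.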

Thanks to Jamie L. Almeida for important contributions to this FAQ.


Q32: Can you provide the Cellosaurus in JSON format?

The JSON format is quite verbose, so we do not provide a file containing all of the Cellosaurus entries with all of their records in that format. However, you can use our API to obtain JSON-formatted entries.

For example if you want all fish cell line entries in JSON you can use:

https://api.cellosaurus.org/search/cell-line?q=group:fish&format=json
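
The same request can be issued from a script. The sketch below uses only the Python standard library and only the two parameters shown in the URL above (q and format); the function names are ours, and no assumption is made about the structure of the returned JSON document.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_SEARCH_URL = "https://api.cellosaurus.org/search/cell-line"


def build_search_url(query, fmt="json"):
    """Build a search URL such as the fish example above.

    safe=":" keeps the colon of "group:fish" unescaped so that the
    result is identical to the URL given in this FAQ.
    """
    return API_SEARCH_URL + "?" + urlencode({"q": query, "format": fmt}, safe=":")


def fetch_entries(query, timeout=60):
    """Download and decode the JSON answer for a search (needs network access)."""
    with urlopen(build_search_url(query), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_search_url("group:fish"))
```

Calling fetch_entries("group:fish") returns the decoded JSON as Python dictionaries and lists, which you can then inspect or save with json.dump.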


Q33: What is the meaning of the different two-letter line codes in the text version of the Cellosaurus?

Here is an explanation of these codes.

For the cell line entries (stored in cellosaurus.txt):

---------  ------------------------------  -----------------------
Line code  Content                         Occurrence in an entry
---------  ------------------------------  -----------------------
ID         Identifier (cell line name)     Once; starts an entry
AC         Accession (CVCL_xxxx)           Once
AS         Secondary accession number(s)   Optional; once
SY         Synonyms                        Optional; once
DR         Cross-references                Optional; once or more
RX         Reference identifiers           Optional; once or more
WW         Web pages                       Optional; once or more
CC         Comments                        Optional; once or more
ST         STR profile data                Optional; twice or more
DI         Diseases                        Optional; once or more
OX         Species of origin               Once or more
HI         Hierarchy                       Optional; once or more
OI         Originate from same individual  Optional; once or more
SX         Sex of cell                     Optional; once
AG         Age of donor at sampling        Optional; once
CA         Category                        Once
DT         Date (entry history)            Once
//         Terminator                      Once; ends an entry

For the comments lines, the currently defined topics are:

CC   Anecdotal
CC   Biotechnology
CC   Breed/subspecies
CC   Caution
CC   Cell type
CC   Characteristics
CC   Derived from site
CC   Discontinued
CC   Donor information
CC   Doubling time
CC   From
CC   Genome ancestry
CC   Group
CC   HLA typing
CC   Karyotypic information
CC   Knockout cell
CC   Microsatellite instability
CC   Miscellaneous
CC   Misspelling
CC   Monoclonal antibody isotype
CC   Monoclonal antibody target
CC   Omics
CC   Part of
CC   Population
CC   Problematic cell line
CC   Registration
CC   Selected for resistance to
CC   Senescence
CC   Sequence variation
CC   Transfected with
CC   Transformant
CC   Virology

For the reference entries (stored in cellosaurus_refs.txt):

---------  --------------------------   -------------------------
Line code  Content                      Occurrence in a reference
---------  --------------------------   -------------------------
RX         Reference identifier(s)      Once; starts a reference
RA         Reference authors            Optional; once or more
RG         Reference group/consortium   Optional; once or more
RT         Reference title              Once or more
RL         Reference citation           Once
//         Terminator                   Once; ends a reference
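
Because both files use the same layout (a two-letter line code, then the content, with "//" closing each record), a single small parser can read either of them. The following is a sketch rather than an official tool: it assumes that any header text preceding the first record has already been removed, that comment lines follow a "Topic: text" pattern, and the sample record is entirely made up.

```python
def parse_records(lines):
    """Group lines into records, each a dict of line code -> list of contents."""
    records = []
    current = {}
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if line.startswith("//"):          # terminator: close the record
            if current:
                records.append(current)
            current = {}
        elif line.strip():                 # skip blank lines
            code, content = line[:2], line[2:].strip()
            current.setdefault(code, []).append(content)
    return records


def comment_topics(record):
    """Return the topics of the CC lines (text before the first colon)."""
    return [comment.split(":", 1)[0] for comment in record.get("CC", [])]


# Made-up record, for illustration only.
SAMPLE = """\
ID   Example-1
AC   CVCL_XXXX
SY   Ex1; Example 1
CC   Caution: Made-up comment text.
CC   Group: Made-up group name.
CA   Made-up category
//
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE.splitlines()):
        print(record["ID"][0], record["AC"][0], comment_topics(record))
```

Every value is stored as a list, even for codes that occur only once, so that codes which may occur "once or more" (DR, RX, CC, ST, ...) need no special handling.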