Pathology-associated TCR database

McPAS-TCR: A manually curated catalogue of pathology associated T-cell receptor sequences

McPAS-TCR is a manually curated catalogue of T cell receptor (TCR) sequences that were found in T cells associated with various pathological conditions in humans and in mice. It is meant to link TCR sequences to their antigen target or to the pathology and organ with which they are associated.

The database can be queried by disease condition, T cell type, tissue, epitope, source organism, MHC restriction, assay type and other criteria.

Download the complete database...

How to cite us: Tickotsky N, Sagiv T, Prilusky J, Shifrut E, Friedman N (2017). McPAS-TCR: A manually-curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33:2924-2929

The database was last updated on: September 10, 2022.


Tip: Clicking 'Search' without a query returns the entire database. Use the boxes at the top of each column to filter by field.

Database Glossary

The following explanation of each database column shows the scope of the database:

Category- CDR3 sequences curated in the database were found in T cell clones expanded in various pathologic conditions and are sorted according to the category of their pathology:

'Pathogens' category includes all sequences found in tissues/cells infected with bacteria, a virus or a parasite.

'Autoimmune' category includes all sequences identified in tissues/cells from humans and mice with an autoimmune condition.

'Cancer' category includes all sequences identified in malignant tissues/cells.

'Other' category includes all sequences not belonging to the first three categories (e.g. Allergy).

Pathology- the disease or medical condition (e.g. EBV viral infection, rheumatoid arthritis and allergy). CDR3 sequences curated in the database are sorted according to the pathologic condition of the tissue/cells in which they were identified. This enables the user to find all curated sequences related to a certain disease/pathological condition.

Pathology details- additional details pertaining to the study that may be relevant, for example, whether the sequences were identified in patients who received a bone marrow transplant.

Antigen identification method- this field helps the user judge the reliability of the data and conveys the level of confidence in its accuracy. To avoid 'bystander sequences' (i.e., sequences not directly related to the pathology), we recommend using sequences that were acquired experimentally through tetramer binding and/or T cell stimulation.

The antigen identification method index is as follows:

1 Peptide-MHC (pMHC) tetramers

2 In vitro stimulation, sub-divided by the type of antigen used for the stimulation:

2.1 Stimulation with a peptide

2.2 Stimulation with a protein

2.3 Stimulation with a pathogen

2.4 Stimulation with tumor cells

2.5 Other types of in vitro stimulation

3 Isolated from a specific tissue under a specific pathology / condition (for example, tumor infiltrating T cells), sub-divided by method of TCR sequencing:
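To apply the recommendation about bystander sequences to a downloaded copy of the database, a minimal sketch could keep only rows whose method code is 1 (tetramers) or 2.x (in vitro stimulation). The column name `Antigen.identification.method`, the assumption that each row holds a single code, and the sample rows are all illustrative assumptions, not taken from the database file.

```python
import csv
import io

# Assumed column name; check the header of the downloaded file.
METHOD_COLUMN = "Antigen.identification.method"


def is_experimentally_validated(code: str) -> bool:
    """True for method 1 (pMHC tetramers) or any 2.x (in vitro stimulation)."""
    code = code.strip()
    return code == "1" or code == "2" or code.startswith("2.")


def keep_validated(rows):
    """Drop rows whose sequences were only isolated from a tissue (method 3)."""
    return [
        row for row in rows
        if is_experimentally_validated(row.get(METHOD_COLUMN, ""))
    ]


# Made-up rows standing in for the downloaded csv file.
sample_csv = io.StringIO(
    "CDR3.beta.aa,Pathology,Antigen.identification.method\n"
    "CASSEXAMPLEAF,example pathology A,1\n"
    "CASSEXAMPLEBF,example pathology B,2.3\n"
    "CASSEXAMPLECF,example pathology C,3\n"
)
rows = list(csv.DictReader(sample_csv))
validated = keep_validated(rows)
print([row["CDR3.beta.aa"] for row in validated])
```

With a real download, `sample_csv` would be replaced by the opened file; rows with an empty or missing method code are dropped by this sketch.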

NGS- This field specifies if the TCR sequence was identified using Next Generation Sequencing (NGS). It states either "Yes", for use of NGS, or "No" otherwise.

Antigen protein- the antigen protein that the T-cells target, if mentioned in the curated article. The Protein ID is the protein's entry in the UniProt Knowledgebase (UniProtKB) reviewed (Swiss-Prot) manually annotated records (at http://www.uniprot.org/uniprot/).

Epitope peptide- the specific epitope of antigen protein that is targeted by the T cells, if mentioned in the curated article. The Epitope ID is the epitope's entry in the Immune Epitope Database (IEDB, www.iedb.org). If the epitope belongs to a neoantigen, we use

MHC- The major histocompatibility complex associated with the epitope, as described in the article.

Tissue- the tissue/tissues the T-cells were extracted from.

T cell type- the T-cells that the TCR CDR3s were extracted from. In addition to CD4 and CD8, Treg (T-regulatory), Teff (T-effector) and TIL (tumor infiltrating lymphocytes) are included.

T cell characteristics- marks phenotypes such as Treg (T-regulatory), Teff (T-effector) and TIL (tumor infiltrating lymphocytes).

TRAV- T-cell receptor alpha variable gene segment.

TRAJ- T-cell receptor alpha junctional gene segment.

CDR3 alpha nt- The CDR3 alpha nucleotide sequence.

TRBV- T-cell receptor beta variable gene segment. When not identified in the article, it was not reconstructed, in order to avoid possible mistakes.

TRBVD- T-cell receptor beta variable diversity gene segment.

TRBJ- T-cell receptor beta junctional gene segment. When this region was not mentioned in the article, it was nevertheless reconstructed from the CDR3 sequence. For papers in which the TRBJ region was reconstructed, the adjacent column, 'Reconstructed J annotation', states 'Yes'.

Reconstructed J annotation- states 'Yes' only when the TRBJ gene segment was not mentioned in the article and was reconstructed from the CDR3 sequence.

CDR3 beta nt- The CDR3 beta nucleotide sequence.

Mouse strain- Mouse strain/s used in the study.

PubMed.ID- the study's entry in the PubMed database of citations for biomedical literature at http://www.ncbi.nlm.nih.gov/pubmed. A link is provided so that the paper's abstract is easily accessible.

Usage examples

The database aims to supply researchers with various data analysis options:

1) Search for all CDR3 sequences associated with a specific disease/condition.

2) Search for TRBV and TRBJ chains associated with specific diseases.

3) Search for CDR3 sequences associated with a specific tissue type.

4) Search for CDR3 sequences associated with a specific epitope.

5) Compare human CDR3 sequences with mouse CDR3 sequences associated with a specific disease/condition.

6) Check users' high throughput sequencing data against the database to search for matching sequences. The results provide a Levenshtein distance for each match, with a lower value indicating greater similarity between the database sequence and the query.

7) Cluster data according to inter-sequence similarity.
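As an illustration of option 7, the sketch below groups sequences by single-linkage clustering: two sequences end up in the same cluster if a chain of neighbours, each within a chosen Levenshtein distance of the next, connects them. The clustering rule, the threshold of 1, the function names and the sample sequences are assumptions made for this example; the database does not prescribe a clustering method.

```python
def levenshtein(a, b):
    """Edit distance via the standard row-by-row dynamic programme."""
    row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        new_row = [i]
        for j, char_b in enumerate(b, start=1):
            new_row.append(min(
                row[j] + 1,                      # deletion
                new_row[j - 1] + 1,              # insertion
                row[j - 1] + (char_a != char_b)  # substitution or match
            ))
        row = new_row
    return row[-1]


def cluster(sequences, max_distance=1):
    """Single-linkage clusters: link any pair within max_distance."""
    parent = list(range(len(sequences)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if levenshtein(sequences[i], sequences[j]) <= max_distance:
                parent[find(i)] = find(j)

    groups = {}
    for i, sequence in enumerate(sequences):
        groups.setdefault(find(i), []).append(sequence)
    return list(groups.values())


# Illustrative strings, not entries from the database.
example = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGQGYEQFF", "CASSPDRGNTIYF"]
clusters = cluster(example, max_distance=1)
print(clusters)
```

The pairwise loop is quadratic in the number of sequences, so this form is only practical for small selections; larger datasets would need an indexed or length-bucketed comparison.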

Our final goal is to have the database accessible for researchers who would like to deposit high throughput sequencing data. All data would be manually curated by the database team prior to its inclusion in the database.

How to search the database

Click the 'Search' button at the top of the page. In the search section, you can either paste the sequences you want to search for, or upload a csv file or a plain-text (Notepad) file containing your sequences. Please specify whether the file has a header.

Choose whether to search for CDR3 beta sequences, CDR3 alpha sequences or epitopes in the database.

We recommend setting the maximum Levenshtein distance (the default is 2). The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two sequences is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one sequence into the other.
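The definition above can be made concrete with a short sketch of the standard dynamic-programming computation of the Levenshtein distance. This is a generic textbook implementation, not the code used by the database server, and the example sequences are illustrative strings rather than database entries.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# One substitution (A -> G) separates these two strings.
print(levenshtein("CASSLGQAYEQYF", "CASSLGQGYEQYF"))
# One deletion (the Q after G) separates these two strings.
print(levenshtein("CASSLGQAYEQYF", "CASSLGAYEQYF"))
```

With the default maximum distance of 2, both example pairs would count as matches, since each differs by a single edit.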

General remarks: The query takes all sequences supplied by the user and searches each one against the sequences in the database. Therefore, if a sequence appears more than once in the query, it will appear more than once in the search results. The database contains some identical sequences that were identified in more than one study and therefore appear as separate annotated entries.

Search content details: Column A shows the original line in the user's dataset. Column B shows the sequences in McPAS-TCR that match the queried sequences. Column C shows the calculated distance between the queried sequences and the annotated sequences.
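The search behaviour described above can be approximated offline with the following sketch, which compares every query against every database sequence and reports three values per hit, mirroring columns A, B and C. It is an assumption-laden illustration rather than the server's implementation: column A is interpreted here as the 1-based line number of the query, and the function names and sequences are invented for the example.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions)."""
    row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        new_row = [i]
        for j, char_b in enumerate(b, start=1):
            new_row.append(min(
                row[j] + 1,
                new_row[j - 1] + 1,
                row[j - 1] + (char_a != char_b),
            ))
        row = new_row
    return row[-1]


def search(queries, database, max_distance=2):
    """Return (query line, matching database sequence, distance) tuples."""
    hits = []
    for line_number, query in enumerate(queries, start=1):
        for entry in database:
            # A length gap larger than the threshold rules out a match.
            if abs(len(query) - len(entry)) > max_distance:
                continue
            distance = levenshtein(query, entry)
            if distance <= max_distance:
                hits.append((line_number, entry, distance))
    return hits


# Illustrative strings; the repeated database entry mimics a sequence
# reported by two different studies.
database = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTIYF", "CASSLGQAYEQYF"]
queries = ["CASSLGQAYEQYF", "CASSLGQAYEQYF"]
hits = search(queries, database)
for hit in hits:
    print(hit)
```

Because the query list repeats a sequence and the database list repeats an entry, each combination is reported separately, which matches the behaviour noted in the general remarks.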

Additional Resources

IMGT, the international ImMunoGeneTics information system® http://www.imgt.org/

IEDB Immune Epitope Database Analysis Resource http://www.iedb.org

Ensembl Annotation of Ig/Tcr Segment Genes http://asia.ensembl.org/info/genome/genebuild/ig_tcr.html

dbMHC

NIH-NIAID Tetramer Facility

About

The database is curated and maintained by the Friedman Lab at the Weizmann Institute of Science. If you use the database in your work, please cite McPAS-TCR and its address. Disclaimer: Since the database is manually curated, the information may not be a hundred percent accurate at all times. It is advisable to refer to the paper cited for the original data.

For any questions or comments contact us at nilitiko@gmail.com