Every cell in the human body harbors a nearly identical copy of the genome, yet each tissue expresses only the subset of genes required to fulfill its particular functions. Human diseases are often clearly caused by dysfunction or disruption of particular tissues, yet human disease genetics typically treats all regions of the genome equally with regard to their prior probability of contributing to disease risk. If it could be established that genes with highly enriched or specific expression in a particular tissue were more likely to contribute to diseases of that tissue, then tissue-specific expression could reasonably be used to weight tests of statistical association and improve our ability to identify risk loci. Here, we provide a simple web tool to test this ‘selective expression’ hypothesis by testing whether currently identified disease genes are over-represented among genes with enriched expression in the disease-relevant tissue. We call our tool Tissue Specific Expression Analysis (TSEA).
The results will be shown in a page in about one minute…
This server accepts an input list of gene symbols (as assigned by HUGO) and returns an enrichment analysis of their expression across 25 tissues. Candidate gene lists that overlap with lists of transcripts enriched in a particular tissue are identified by Fisher’s exact test with Benjamini-Hochberg correction.
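The overlap test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's own code: it assumes a one-sided (over-representation) Fisher's exact test, computed as a hypergeometric upper tail, and the function names and example counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, tissue_size, background_size):
    """One-sided Fisher's exact test for over-representation.

    Returns the probability of seeing at least `overlap` tissue-enriched
    genes in a candidate list of `list_size` genes drawn from a background
    of `background_size` genes, `tissue_size` of which are tissue-enriched.
    (Assumption: the test is one-sided; the source does not say.)
    """
    max_k = min(list_size, tissue_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(tissue_size, k) * comb(background_size - tissue_size, list_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: one candidate list tested against three tissues.
# Each tuple is (overlap, list_size, tissue_size, background_size).
tests = [(8, 50, 300, 18056), (1, 50, 250, 18056), (0, 50, 400, 18056)]
raw_p = [fisher_enrichment_p(*t) for t in tests]
adj_p = benjamini_hochberg(raw_p)
```

Exact integer arithmetic via `math.comb` avoids any third-party dependency; for large gene lists a statistics library would be the more practical choice.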
User gene lists may come from any source, including lists of genes from annotated GWAS, Gene Ontology (GO) analyses, genetic analyses of disease- or trait-related tissues, and lists of genes with altered expression from RNA-seq or microarray analysis of postmortem patient tissue, animal models, tumor samples, etc.
The gene expression data used in TSEA were collected from published RNA-Seq data from the Genotype-Tissue Expression project (GTEx Analysis Pilot Data 2013-01-31, summarized to genes). The GTEx project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). The raw GTEx data consist of 1,839 samples derived from 189 post-mortem subjects. These include samples from 45 different tissues, with some tissues offering multiple ‘sub-tissue’ types (e.g., multiple brain dissections). To analyze the data at the tissue level, RPKM values for the sub-tissue types were averaged, resulting in 25 ‘whole-tissue’ types. Transcripts were further filtered to include only well-annotated protein-coding genes designated by RefSeq (release 60) gene annotations. After filtering, 18,056 of the original 52,576 transcripts remained.
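The sub-tissue averaging step above can be illustrated with a short sketch. It assumes one RPKM value per sub-tissue for a given gene and an unweighted mean; the source does not state how samples within a sub-tissue were combined. The function name, tissue labels, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_subtissues(rpkm_by_subtissue, subtissue_to_tissue):
    """Collapse per-sub-tissue RPKM values for one gene into whole-tissue means."""
    grouped = defaultdict(list)
    for subtissue, value in rpkm_by_subtissue.items():
        grouped[subtissue_to_tissue[subtissue]].append(value)
    # Unweighted mean across the sub-tissues of each whole tissue (assumption).
    return {tissue: mean(values) for tissue, values in grouped.items()}


# Hypothetical values for a single gene.
rpkm = {"Brain - Cortex": 10.0, "Brain - Cerebellum": 20.0, "Liver": 5.0}
mapping = {
    "Brain - Cortex": "Brain",
    "Brain - Cerebellum": "Brain",
    "Liver": "Liver",
}
whole_tissue = average_subtissues(rpkm, mapping)
```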
For each tissue, transcripts from the processed GTEx data that are specifically expressed or enriched were identified using the specificity.index function of our pSI R package (download here) to calculate Specificity Index statistics (pSI) at thresholds of varying stringency (Dougherty et al. and the CSEA publication). For example, a threshold of pSI < 0.01 identifies a larger number of relatively enriched transcripts, while pSI < 0.0001 retains just the subset that is relatively specific. Results will appear for all thresholds. Sample results and an explanation are available (here).
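To show how the stringency thresholds nest, here is a small sketch that takes pSI values that have already been computed and groups genes by threshold. It does not reimplement the pSI calculation or the R package; the function name, gene labels, and pSI values are hypothetical, and only the two thresholds mentioned above are used.

```python
def genes_by_threshold(psi_by_gene, thresholds=(0.01, 0.0001)):
    """Map each pSI threshold to the set of genes passing it (pSI < threshold)."""
    return {
        threshold: {gene for gene, psi in psi_by_gene.items() if psi < threshold}
        for threshold in thresholds
    }


# Hypothetical pSI values for one tissue.
psi = {"GENE_A": 0.00005, "GENE_B": 0.005, "GENE_C": 0.2}
sets = genes_by_threshold(psi)

# The stricter list is always contained in the more permissive one.
is_nested = sets[0.0001] <= sets[0.01]
```

Because each stricter set is a subset of the looser one, a gene list that is significant only at the permissive threshold points to enrichment rather than strict specificity.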