VARAdb

VARAdb v1.0

A variation annotation database for human

What is VARAdb?

We developed a comprehensive human variation annotation database (VARAdb, http://www.licpathway.net/VARAdb), which aims to catalog a large number of variations and annotate their potential roles with extensive regulatory information. The current version of VARAdb catalogs a total of 577,283,813 variations and provides five annotation sections, ‘Variation information’, ‘Regulatory information’, ‘Related genes’, ‘Chromatin accessibility’ and ‘Chromatin interaction’, offering substantially more information than similar databases. The information includes motif changes, risk SNPs, LD SNPs, eQTLs, clinical variant-drug-gene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, TFs, ChromHMM states, histone modifications, ATAC accessible regions and chromatin interactions from Hi-C and ChIA-PET.
Importantly, we considered two types of variation-related genes: 1) a variation located in an enhancer may be associated with that enhancer's target genes, as predicted by the Lasso method; and 2) genes related to a variation by genomic distance. In addition, VARAdb can prioritize variations by score, annotate novel variants and conveniently perform downstream pathway analysis. Together, these features make VARAdb a user-friendly database for querying, browsing and visualizing variations of interest. We believe VARAdb will help provide insight into the regulatory roles of variations in complex diseases.
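Distance-based gene assignment can be illustrated with a short sketch. It assumes genes are represented by hypothetical `Gene` records carrying a single transcription start site (TSS) and uses a hypothetical 50 kb window; VARAdb's actual distance rule and window size are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record: gene name, chromosome and one TSS coordinate.
    name: str
    chrom: str
    tss: int


def genes_near_variant(chrom, pos, genes, window=50_000):
    """Return (gene name, distance) for genes whose TSS lies within `window` bp
    of the variant, nearest first. The 50 kb default is an assumption, not
    VARAdb's documented setting."""
    hits = [
        (g.name, abs(g.tss - pos))
        for g in genes
        if g.chrom == chrom and abs(g.tss - pos) <= window
    ]
    # Sort by distance, then by name so ties are deterministic.
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Usage with made-up genes and coordinates.
genes = [
    Gene("GENE_A", "chr1", 1_000_000),
    Gene("GENE_B", "chr1", 1_030_000),
    Gene("GENE_C", "chr1", 1_200_000),  # too far from the variant
    Gene("GENE_D", "chr2", 1_010_000),  # different chromosome
]
print(genes_near_variant("chr1", 1_010_000, genes))
# [('GENE_A', 10000), ('GENE_B', 20000)]
```

A real pipeline would more likely use an interval index than a linear scan, but the scan keeps the distance rule explicit.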

Collection of variations

In addition to dbSNP, we collected variations from multiple other resources. Notably, 577,098,938 SNVs were collected from dbSNP v151 and 79,482,384 common SNPs from the 1000 Genomes Project; each of these common SNPs has a minor allele frequency ≥ 1% in at least one 1000 Genomes population. Millions of LD SNPs for five super-populations (4,477,132 African; 4,548,152 Ad Mixed American; 3,693,208 East Asian; 4,011,947 European; and 3,838,175 South Asian) were calculated using the phased genotype information accompanying 1000 Genomes Project phase 3. In addition, we integrated 1,515,001 risk SNPs from the GWAS Catalog, GWASdb v2.0, GAD, Johnson and O'Donnell, and GRASP v2.0. We also obtained 3,998,301 eQTLs from GTEx v7, PancanQTL and HaploReg v4.1.
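The two quantities mentioned above, the "common SNP" criterion and LD between SNPs from phased haplotypes, can be sketched as follows. This is an illustration under stated assumptions: allele frequencies are given per population as alternate-allele frequencies, haplotypes are coded 0 (reference) / 1 (alternate), and LD is measured as the standard r² statistic. The specific tool and r² cutoff VARAdb used are not stated here.

```python
from typing import Dict, Sequence


def is_common(pop_alt_freqs: Dict[str, float], threshold: float = 0.01) -> bool:
    """True if the minor allele frequency reaches `threshold` (1%) in at least
    one population. Input maps population label -> alternate-allele frequency."""
    return any(min(f, 1.0 - f) >= threshold for f in pop_alt_freqs.values())


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes (0 = ref, 1 = alt).
    r^2 = D^2 / (pA(1-pA) pB(1-pB)), with D = pAB - pA*pB."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the alternate allele at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # Monomorphic SNPs have undefined r^2; report 0 in that case.
    return 0.0 if denom == 0 else d * d / denom


print(is_common({"AFR": 0.005, "EUR": 0.02}))  # True (EUR passes)
print(ld_r2([0, 1, 0, 1], [0, 1, 0, 1]))       # 1.0 (perfect LD)
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))       # 0.0 (independent)
```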

Score

Note: The score of a variation is the number of annotation categories with which it is associated. Each variation is scored on the basis of its annotated records in nine annotation categories: risk SNP, eQTL, motif change, conservation, enhancer and super enhancer, promoter, TF binding, ATAC accessible region and Hi-C.
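The scoring rule can be sketched as a count of categories that have at least one annotated record, followed by ranking. The category keys, the per-category record counts and the tie-breaking by ID in `prioritize` are illustrative assumptions, not VARAdb's internal schema.

```python
# The nine categories named above; the key spellings are hypothetical.
CATEGORIES = (
    "risk_snp", "eqtl", "motif_change", "conservation",
    "enhancer_super_enhancer", "promoter", "tf_binding",
    "atac_accessible_region", "hic",
)


def variation_score(records: dict) -> int:
    """Number of the nine categories with at least one annotated record."""
    return sum(1 for c in CATEGORIES if records.get(c, 0) > 0)


def prioritize(variations: dict) -> list:
    """Variation IDs sorted by descending score; ties broken by ID (assumption)."""
    return sorted(variations, key=lambda v: (-variation_score(variations[v]), v))


# Made-up variations mapped to per-category record counts.
variations = {
    "var1": {"risk_snp": 2, "eqtl": 1, "promoter": 0},
    "var2": {"enhancer_super_enhancer": 5, "tf_binding": 3,
             "atac_accessible_region": 1, "hic": 4},
    "var3": {"conservation": 1},
}
print({v: variation_score(r) for v, r in variations.items()})
# {'var1': 2, 'var2': 4, 'var3': 1}
print(prioritize(variations))
# ['var2', 'var1', 'var3']
```

Because the score counts categories rather than records, a variation with many records in a single category (such as five enhancers) still contributes only one point for that category.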

Note: VARAdb provides five annotation sections for both cataloged and novel variations, listed below.

Regulatory information

Related genes

Chromatin accessibility

Chromatin interaction

Variation information

Statistics Table

Variations	577,283,813
Enhancer sources	8
Enhancer number	7,841,333
Enhancer-Gene pairs	5,880,825
Enhancer-Gene pair sources	3
TF ChIP-seq sources	5
TF ChIP-seq samples	7,734
TF number	1261
Promoter sources	2
Promoter number	6,203,292
Pathway sources	10
Pathway number	2880

For details, please see the Statistics page.

Note: Coding region variations account for ~1.8% of the 577,283,813 variations. Of all variations, 10,386,595 are coding region variations, 566,895,064 are non-coding variations and 2,154 are other variations.
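The counts in this note can be checked with a few lines of arithmetic, using only the numbers stated above; the three categories sum to the total, and the exact coding share works out to about 1.80%.

```python
total = 577_283_813
coding = 10_386_595
noncoding = 566_895_064
other = 2_154

# The three categories should partition the full set of variations.
assert coding + noncoding + other == total

print(f"coding share: {coding / total:.2%}")      # coding share: 1.80%
print(f"non-coding share: {noncoding / total:.2%}")
```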

Sister Projects

SEdb
SEdb: The comprehensive human Super-Enhancer database

SEanalysis
SEanalysis: a web tool for super-enhancer associated regulatory analysis

KnockTF
KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors

ENdb
ENdb: An experimentally supported enhancer database for human and mouse

TRlnc
TRlnc: A comprehensive database of human transcriptional regulation of lncRNAs

TRCirc
TRCirc: a resource for transcriptional regulation information of circRNAs

ATACdb
ATACdb: A comprehensive human chromatin accessibility database

LncSEA
LncSEA: A comprehensive human lncRNA sets resource and enrichment analysis platform

News and Updates

2020.10 VARAdb was accepted by Nucleic Acids Research

2020.05 The Browse page was designed

2019.12 The database went online

2019.09 Database construction

If you publish results obtained with VARAdb, please cite the following article:

Pan Q, Liu YJ, Bai XF, Han XL, Jiang Y, Ai B, Shi SS, Wang F, Xu MC, Wang YZ, Zhao J, Chen JX, Zhang J, Li XC, Zhu J, Zhang GR, Wang QY, Li CQ. VARAdb: a comprehensive variation annotation database for human. Nucleic Acids Res. 2020 Oct 23:gkaa922. doi: 10.1093/nar/gkaa922. Epub ahead of print. PMID: 33095866.