A variation annotation database for human

What is VARAdb?

Here, we developed a comprehensive human variation annotation database (VARAdb), which aims to catalog a large number of variations and annotate their potential roles with extensive regulatory information. The current version of VARAdb catalogs a total of 577,283,813 variations and provides five annotation sections: ‘Variation information’, ‘Regulatory information’, ‘Related genes’, ‘Chromatin accessibility’ and ‘Chromatin interaction’, with significantly more information than similar databases. The information includes motif changes, risk SNPs, LD SNPs, eQTLs, clinical variant-drug-gene pairs, sequence conservation, somatic mutations, enhancers, super enhancers, promoters, TFs, ChromHMM states, histone modifications, ATAC accessible regions and chromatin interactions from Hi-C and ChIA-PET.
Importantly, we considered two types of variation-related genes: 1) a variation located in an enhancer may be associated with that enhancer's target genes, as predicted by the Lasso method; 2) genes related to a variation based on distance. In addition, VARAdb can prioritize variations based on score, annotate novel variants and perform downstream pathway analysis. Together, these features make VARAdb a user-friendly database for querying, browsing and visualizing variations of interest. We believe VARAdb will help researchers gain insight into the regulatory roles of variations in complex diseases.
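
The distance-based strategy can be illustrated with a short sketch. The text above does not state the exact distance rule VARAdb applies, so this example assumes the simplest one: rank genes on the same chromosome by the distance between the variation and each gene's transcription start site (TSS). The `Gene` class, the `nearest_genes` function and all coordinates are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (assumed coordinate, for illustration)


def nearest_genes(chrom, pos, genes, k=1):
    """Return up to k (gene_name, distance) pairs on `chrom`, closest first.

    Assumption: distance is |TSS - variant position|; ties are broken by name.
    """
    same_chrom = [g for g in genes if g.chrom == chrom]
    ranked = sorted(same_chrom, key=lambda g: (abs(g.tss - pos), g.name))
    return [(g.name, abs(g.tss - pos)) for g in ranked[:k]]


# Toy, made-up gene annotations (not real coordinates).
genes = [
    Gene("GENE_A", "chr1", 1_000),
    Gene("GENE_B", "chr1", 5_000),
    Gene("GENE_C", "chr2", 1_200),
]

# A hypothetical variation at chr1:1800 and its two closest genes.
closest = nearest_genes("chr1", 1_800, genes, k=2)
print(closest)
```

Other rules, such as reporting every gene within a fixed window around the variation, would fit the same interface by filtering on the distance instead of truncating the ranked list.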

Collection of variations

We collected variations not only from dbSNP but also from multiple other resources. Notably, 577,098,938 SNVs were collected from dbSNP v151 and 79,482,384 common SNPs were collected from the 1000 Genomes Project. Each common SNP from the 1000 Genomes Project has a minor allele frequency ≥ 1% in at least one 1000 Genomes population. Millions of LD SNPs for five super-populations (4,477,132 from African; 4,548,152 from Admixed American; 3,693,208 from East Asian; 4,011,947 from European; and 3,838,175 from South Asian) were calculated using the phased genotype information accompanying the 1000 Genomes Project phase 3. In addition, we integrated 1,515,001 risk SNPs from the GWAS Catalog, GWASdb v2.0, GAD, Johnson and O'Donnell, and GRASP v2.0. We also obtained 3,998,301 eQTLs from GTEx v7, PancanQTL, and HaploReg v4.1.
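
The common-SNP criterion described above (minor allele frequency ≥ 1% in at least one population) can be sketched as a small filter. This is a minimal illustration rather than the pipeline used to build VARAdb: the function name `is_common`, the input layout (a mapping from population label to alternate-allele frequency) and the example frequencies are assumptions made for the example.

```python
def is_common(allele_freqs, threshold=0.01):
    """Return True if the minor allele frequency reaches `threshold`
    in at least one population.

    allele_freqs: dict mapping a population label to the alternate-allele
    frequency (a float in [0, 1]). For a biallelic site, the minor allele
    frequency is the smaller of f and 1 - f.
    """
    return any(min(f, 1.0 - f) >= threshold for f in allele_freqs.values())


# Made-up frequencies, keyed by hypothetical population labels.
snp_1 = {"AFR": 0.02, "EUR": 0.0}     # 2% in one population
snp_2 = {"AFR": 0.005, "EUR": 0.999}  # rare allele everywhere (0.5% and 0.1%)

print(is_common(snp_1))
print(is_common(snp_2))
```

Taking `min(f, 1 - f)` matters for the second case: an alternate allele at 99.9% frequency means the minor allele is the reference allele at 0.1%, so the site does not qualify.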


Note: VARAdb provides five annotation sections for both cataloged and novel variations. These sections are listed below.

Variation information
Regulatory information
Related genes
Chromatin accessibility
Chromatin interaction

Statistics Table

Variations 577,283,813
Enhancer sources 8
Enhancer number 7,841,333
Enhancer-Gene pairs 5,880,825
Enhancer-Gene pair sources 3
TF ChIP-seq sources 5
TF ChIP-seq samples 7,734
TF number 1,261
Promoter sources 2
Promoter number 6,203,292
Pathway sources 10
Pathway number 2,880

For details, please see Statistics.

Note: Coding region variations account for ~1.80% of the 577,283,813 variations. Among all variations, 10,386,595 are coding region variations, 566,895,064 are non-coding variations and 2,154 are other variations.
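
The breakdown in this note can be checked with a few lines of arithmetic. The counts below are copied from the note; the variable names are only illustrative. The three categories sum exactly to the total, and the coding share is 1.80% when rounded to two decimals (1.79% if truncated instead).

```python
# Counts taken from the note above.
coding = 10_386_595
non_coding = 566_895_064
other = 2_154
total = 577_283_813

# The three categories should partition the full set of variations.
assert coding + non_coding + other == total

coding_pct = 100 * coding / total
print(f"Coding share: {coding_pct:.2f}%")  # prints "Coding share: 1.80%"
```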

Sister Projects

SEdb: The comprehensive human Super-Enhancer database

SEanalysis: a web tool for super-enhancer associated regulatory analysis

KnockTF: a comprehensive human gene expression profile database with knockdown/knockout of transcription factors

ENdb: An experimentally supported enhancer database for human and mouse

TRlnc: A comprehensive database of human transcriptional regulation of lncRNAs

TRCirc: a resource for transcriptional regulation information of circRNAs