The remainder of this page is identical on the following tracks:
- Common SNPs(135)
- Flagged SNPs(135)
- Mult. SNPs(135)
- All SNPs(135)
Interpreting and Configuring the Graphical Display
Variants are shown as single tick marks at most zoom levels. When viewing the track at or near base-level resolution, the displayed width of the SNP corresponds to the width of the variant in the reference sequence. Insertions are indicated by a single tick mark displayed between two nucleotides, single nucleotide polymorphisms are displayed one base wide, and multiple nucleotide variants are represented by a block that spans two or more bases.
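The width rule above can be sketched as a function of the variant's span. This is a minimal illustration, assuming UCSC-style 0-based, half-open coordinates (chromStart, chromEnd); the function name `display_glyph` is hypothetical and not part of any UCSC tool.

```python
def display_glyph(chrom_start: int, chrom_end: int) -> str:
    """Describe how a variant spanning [chrom_start, chrom_end) is drawn
    at or near base-level resolution (hypothetical helper)."""
    span = chrom_end - chrom_start
    if span < 0:
        raise ValueError("chromEnd must not be less than chromStart")
    if span == 0:
        # Zero-length span: an insertion, drawn as a tick between two bases.
        return "tick between bases"
    if span == 1:
        # Single-base variant: drawn one base wide.
        return "single base"
    # Multiple-nucleotide variant: a block covering every spanned base.
    return f"block of {span} bases"


print(display_glyph(1000, 1000))  # tick between bases
print(display_glyph(1000, 1001))  # single base
print(display_glyph(1000, 1003))  # block of 3 bases
```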
On the track controls page, SNPs can be colored and/or filtered from the 
display according to several attributes:
  
- Class: Describes the observed alleles
  - Single - single nucleotide variation: all observed alleles are single nucleotides (can have 2, 3 or 4 alleles)
  - In-del - insertion/deletion
  - Heterozygous - heterozygous (undetermined) variation: allele contains string '(heterozygous)'
  - Microsatellite - the observed allele from dbSNP is a variation in counts of short tandem repeats
  - Named - the observed allele from dbSNP is given as a text name instead of raw sequence, e.g., (Alu)/-
  - No Variation - the submission reports an invariant region in the surveyed sequence
  - Mixed - the cluster contains submissions from multiple classes
  - Multiple Nucleotide Polymorphism (MNP) - the alleles are all of the same length, and length > 1
  - Insertion - the polymorphism is an insertion relative to the reference assembly
  - Deletion - the polymorphism is a deletion relative to the reference assembly
  - Unknown - no classification provided by data contributor
      
 
- Validation: Method used to validate the variant (each variant may be validated by more than one method)
  - By Frequency - at least one submitted SNP in cluster has frequency data submitted
  - By Cluster - cluster has at least 2 submissions, with at least one submission assayed with a non-computational method
  - By Submitter - at least one submitted SNP in cluster was validated by independent assay
  - By 2 Hit/2 Allele - all alleles have been observed in at least 2 chromosomes
  - By HapMap - submitted by HapMap project (human only)
  - By 1000Genomes - submitted by 1000Genomes project (human only)
  - Unknown - no validation has been reported for this variant
      
 
- Function: Predicted functional role (each variant may have more than one functional role)
  - Locus Region - variation is 3' to and within 500 bases of a transcript, or is 5' to and within 2000 bases of a transcript (near-gene-3, near-gene-5)
  - Coding - Synonymous - no change in peptide for allele with respect to the reference assembly (coding-synon)
  - Coding - Non-Synonymous - change in peptide for allele with respect to the reference assembly (nonsense, missense, frameshift, cds-indel, coding-synonymy-unknown)
  - Untranslated - variation is in a transcript, but not in a coding region interval (untranslated-3, untranslated-5)
  - Intron - variation is in an intron, but not in the first two or last two bases of the intron
  - Splice Site - variation is in the first two or last two bases of an intron (splice-3, splice-5)
  - Unknown - no known functional classification
      
 
- Molecule Type: Sample used to find this variant
  - Genomic - variant discovered using a genomic template
  - cDNA - variant discovered using a cDNA template
  - Unknown - sample type not known
      
 
- Unusual Conditions (UCSC): UCSC checks for several anomalies that may indicate a problem with the mapping, and reports them in the Annotations section of the SNP details page if found:
  - AlleleFreqSumNot1 - Allele frequencies do not sum to 1.0 (±0.01). This SNP's allele frequency data are probably incomplete.
  - DuplicateObserved, MixedObserved - Multiple distinct insertion SNPs have been mapped to this location, with either the same inserted sequence (Duplicate) or different inserted sequence (Mixed).
  - FlankMismatchGenomeEqual, FlankMismatchGenomeLonger, FlankMismatchGenomeShorter - NCBI's alignment of the flanking sequences had at least one mismatch or gap near the mapped SNP position. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
  - MultipleAlignments - This SNP's flanking sequences align to more than one location in the reference assembly.
  - NamedDeletionZeroSpan - A deletion (from the genome) was observed but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
  - NamedInsertionNonzeroSpan - An insertion (into the genome) was observed but the annotation spans more than 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
  - NonIntegerChromCount - At least one allele frequency corresponds to a non-integer (±0.01) count of chromosomes on which the allele was observed. The reported total sample count for this SNP is probably incorrect.
  - ObservedContainsIupac - At least one observed allele from dbSNP contains an IUPAC ambiguous base (e.g., R, Y, N).
  - ObservedMismatch - UCSC reference allele does not match any observed allele from dbSNP. This is tested only for SNPs whose class is single, in-del, insertion, deletion, mnp or mixed.
  - ObservedTooLong - Observed allele not given (length too long).
  - ObservedWrongFormat - Observed allele(s) from dbSNP have unexpected format for the given class.
  - RefAlleleMismatch - The reference allele from dbSNP does not match the UCSC reference allele, i.e., the bases in the mapped position range.
  - RefAlleleRevComp - The reference allele from dbSNP matches the reverse complement of the UCSC reference allele.
  - SingleClassLongerSpan - All observed alleles are single-base, but the annotation spans more than 1 base. (UCSC's re-alignment of flanking sequences to the genome may be informative.)
  - SingleClassZeroSpan - All observed alleles are single-base, but the annotation spans 0 bases. (UCSC's re-alignment of flanking sequences to the genome may be informative.)

  Another condition, which does not necessarily imply any problem, is noted:
  - SingleClassTriAllelic, SingleClassQuadAllelic - Class is single and three or four different bases have been observed (usually there are only two).
 
- Miscellaneous Attributes (dbSNP): several properties extracted from dbSNP's SNP_bitfield table (see dbSNP_BitField_v5.pdf for details)
  - Clinically Associated - SNP is in OMIM/OMIA and/or at least one submitter is a Locus-Specific Database. This does not necessarily imply that the variant causes any disease, only that it has been observed in clinical studies.
  - Appears in OMIM/OMIA - SNP is mentioned in Online Mendelian Inheritance in Man for human SNPs, or Online Mendelian Inheritance in Animals for non-human animal SNPs. Some of these SNPs are quite common, others are known to cause disease; see OMIM/OMIA for more information.
  - Has Microattribution/Third-Party Annotation - At least one of the SNP's submitters studied this SNP in a biomedical setting, but is not a Locus-Specific Database or OMIM/OMIA.
  - Submitted by Locus-Specific Database - At least one of the SNP's submitters is associated with a database of variants associated with a particular gene. These variants may or may not be known to be causative.
  - MAF >= 5% in Some Population - Minor Allele Frequency is at least 5% in at least one population assayed.
  - MAF >= 5% in All Populations - Minor Allele Frequency is at least 5% in all populations assayed.
  - Genotype Conflict - Quality check: different genotypes have been submitted for the same individual.
  - Ref SNP Cluster has Non-overlapping Alleles - Quality check: this reference SNP was clustered from submitted SNPs with non-overlapping sets of observed alleles.
  - Some Assembly's Allele Does Not Match Observed - Quality check: at least one assembly mapped by dbSNP has an allele at the mapped position that is not present in this SNP's observed alleles.
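Several of the frequency- and allele-based conditions in the list above reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes frequencies are given as plain lists per SNP (or per population), and it assumes "minor allele" means the second most frequent allele. It is not UCSC's or dbSNP's actual code.

```python
def allele_freq_sum_not_1(freqs, tol=0.01):
    """AlleleFreqSumNot1: frequencies do not sum to 1.0 within +/- tol."""
    return abs(sum(freqs) - 1.0) > tol


def non_integer_chrom_count(freqs, two_n, tol=0.01):
    """NonIntegerChromCount: some frequency * 2N is not within tol of an integer."""
    for f in freqs:
        count = f * two_n
        if abs(count - round(count)) > tol:
            return True
    return False


def single_class_allele_count_flag(observed):
    """Return the SingleClassTri/QuadAllelic label for a 'single' SNP, else None."""
    n_alleles = len(set(observed.upper().split("/")))
    return {3: "SingleClassTriAllelic", 4: "SingleClassQuadAllelic"}.get(n_alleles)


def minor_allele_frequency(freqs):
    """Assumption: MAF is the frequency of the second most common allele."""
    ordered = sorted(freqs, reverse=True)
    return ordered[1] if len(ordered) > 1 else 0.0


def maf_flags(pop_freqs, threshold=0.05):
    """Return (MAF >= 5% in some population, MAF >= 5% in all populations)."""
    mafs = [minor_allele_frequency(f) for f in pop_freqs.values()]
    some = any(m >= threshold for m in mafs)
    every = bool(mafs) and all(m >= threshold for m in mafs)
    return some, every


print(allele_freq_sum_not_1([0.6, 0.3]))              # True
print(non_integer_chrom_count([0.505, 0.495], 100))   # True
print(single_class_allele_count_flag("A/C/G"))        # SingleClassTriAllelic
print(maf_flags({"pop1": [0.9, 0.1], "pop2": [0.97, 0.03]}))  # (True, False)
```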
 
Several other properties do not have coloring options, but do have some filtering options:
- Average heterozygosity: Calculated by dbSNP as described in the dbSNP documentation.
  - Average heterozygosity should not exceed 0.5 for bi-allelic single-base substitutions.
      
 
- Weight: Alignment quality assigned by dbSNP
  - Weight can be 0, 1, 2, 3 or 10.
  - Weight 1 indicates the highest-quality alignments.
  - SNPs with weight 0 or weight 10 are excluded from the data set.
  - A filter on maximum weight value is supported, which defaults to 1 on all tracks except the Mult. SNPs track, which defaults to 3.
      
 
- Submitter handles: These are short, single-word identifiers of labs or consortia that submitted SNPs that were clustered into this reference SNP by dbSNP (e.g., 1000GENOMES, ENSEMBL, KWOK). Some SNPs have been observed by many different submitters, and some by only a single submitter (although that single submitter may have tested a large number of samples).
- Allele frequencies: Some submissions to dbSNP include allele frequencies and the study's sample size (i.e., the number of distinct chromosomes, which is two times the number of individuals assayed, a.k.a. 2N). dbSNP combines all available frequencies and counts from submitted SNPs that are clustered together into a reference SNP.
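The relationship between per-submission frequencies, 2N, and heterozygosity can be illustrated with a small sketch. It assumes that pooling means converting each submission's frequencies back to allele counts (frequency × 2N) before summing, and that heterozygosity is the standard expected value H = 1 − Σ pᵢ². Both are assumptions: dbSNP's own calculations may differ (for example, by applying a sample-size correction), and the function names are hypothetical.

```python
def pool_frequencies(submissions):
    """Pool (two_n, {allele: frequency}) submissions into one frequency set,
    weighting each submission by its chromosome count 2N (assumed method)."""
    counts = {}
    total_chroms = 0
    for two_n, freqs in submissions:
        total_chroms += two_n
        for allele, freq in freqs.items():
            # Convert the frequency back to an allele count before summing.
            counts[allele] = counts.get(allele, 0.0) + freq * two_n
    if total_chroms == 0:
        raise ValueError("no chromosomes assayed")
    pooled = {allele: count / total_chroms for allele, count in counts.items()}
    return pooled, total_chroms


def expected_heterozygosity(freqs):
    """H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


submissions = [
    (100, {"A": 0.7, "G": 0.3}),  # 50 individuals -> 2N = 100
    (50, {"A": 0.4, "G": 0.6}),   # 25 individuals -> 2N = 50
]
pooled, two_n = pool_frequencies(submissions)
het = expected_heterozygosity(pooled)
print(two_n, {a: round(p, 3) for a, p in pooled.items()}, round(het, 3))
# 150 {'A': 0.6, 'G': 0.4} 0.48

# For a bi-allelic site H = 2pq, which peaks at 0.5 when p = q = 0.5.
assert len(pooled) != 2 or het <= 0.5 + 1e-9
```

Weighting by 2N means that a large study dominates the pooled estimate, which matches the intuition that it carries more information than a small one.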
    
You can configure this track so that the details page displays the function and coding differences relative to particular gene sets. Choose the gene sets from the list on the SNP configuration page, beneath the heading "On details page, show function and coding differences relative to". When one or more gene tracks are selected, the SNP details page lists all genes that the SNP hits (or is close to), using the same keywords as the function category. UCSC's functional annotation usually agrees with NCBI's, except when NCBI's functional annotation is relative to an XM_* predicted RefSeq (not included in the UCSC Genome Browser's RefSeq Genes track) and/or UCSC's functional annotation is relative to a transcript that is not in RefSeq.
 
Insertions/Deletions
dbSNP uses a class called 'in-del'.  We compare the length of the
reference allele to the length(s) of observed alleles; if the
reference allele is shorter than all other observed alleles, we change
'in-del' to 'insertion'.  Likewise, if the reference allele is longer
than all other observed alleles, we change 'in-del' to 'deletion'.
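The in-del refinement described above can be sketched as follows. This assumes that the observed alleles are already on the same strand as the reference, and that an empty allele is written as "-" (as in dbSNP's observed strings); the function names are hypothetical.

```python
def allele_length(allele):
    # An empty allele is written as "-", so it has length 0.
    return 0 if allele == "-" else len(allele)


def refine_indel_class(ref_allele, observed):
    """Turn 'in-del' into 'insertion' or 'deletion' by comparing the
    reference allele's length with the other observed alleles' lengths."""
    ref_len = allele_length(ref_allele)
    others = [allele_length(a) for a in observed.split("/") if a != ref_allele]
    if not others:
        return "in-del"
    if all(ref_len < n for n in others):
        return "insertion"   # reference shorter than every other allele
    if all(ref_len > n for n in others):
        return "deletion"    # reference longer than every other allele
    return "in-del"          # mixed lengths: leave the class unchanged


print(refine_indel_class("-", "-/AT"))     # insertion
print(refine_indel_class("AT", "-/AT"))    # deletion
print(refine_indel_class("A", "-/A/AT"))   # in-del
```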
UCSC Re-alignment of flanking sequences
dbSNP determines the genomic locations of SNPs by aligning their flanking 
sequences to the genome.
UCSC displays SNPs in the locations determined by dbSNP, but does not
have access to the alignments on which dbSNP based its mappings.
Instead, UCSC re-aligns the flanking sequences 
to the neighboring genomic sequence for display on SNP details pages.  
While the recomputed alignments may differ from dbSNP's alignments,
they are often informative when UCSC has annotated an unusual condition.
Non-repetitive genomic sequence is shown in upper case like the flanking 
sequence, and a "|" indicates each match between genomic and flanking bases.
Repetitive genomic sequence (annotated by RepeatMasker and/or the
Tandem Repeats Finder with period <= 12) is shown in lower case, and matching
bases are indicated by a "+".
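The match-line convention above ("|" for matches in upper-case genomic sequence, "+" for matches in lower-case repeat sequence) can be sketched as below. It assumes the two sequences are already aligned to equal length with "-" for gaps; the exact layout of UCSC's details page may differ, and `match_line` is a hypothetical name.

```python
def match_line(genomic, flank):
    """Build the line of match symbols between two equal-length aligned strings."""
    if len(genomic) != len(flank):
        raise ValueError("aligned strings must have equal length")
    marks = []
    for g, f in zip(genomic, flank):
        if g == "-" or f == "-" or g.upper() != f.upper():
            marks.append(" ")   # mismatch or gap
        elif g.islower():
            marks.append("+")   # match in repetitive (lower-case) genomic sequence
        else:
            marks.append("|")   # match in non-repetitive genomic sequence
    return "".join(marks)


genomic = "ACGTacgt"
flank = "ACGAACGT"
print(genomic)
print(match_line(genomic, flank))  # ||| ++++
print(flank)
```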
Data Sources and Methods
The data that comprise this track were extracted from database dump files
and headers of FASTA files downloaded from NCBI.
The database dump files were downloaded from
ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/database/
(e.g., for Human, organism_tax_id = human_9606).
The FASTA files were downloaded from
ftp://ftp.ncbi.nih.gov/snp/organisms/organism_tax_id/rs_fasta/
  
- Coordinates, orientation, location type and dbSNP reference allele data were obtained from b135_SNPContigLoc_37_3.bcp.gz and b135_ContigInfo_37_3.bcp.gz.
- b135_SNPMapInfo_37_3.bcp.gz provided the alignment weights.
- Functional classification was obtained from b135_SNPContigLocusId_37_3.bcp.gz.
- Validation status and heterozygosity were obtained from SNP.bcp.gz.
- SNPAlleleFreq.bcp.gz and ../shared/Allele.bcp.gz provided allele frequencies. For the human assembly, allele frequencies were also taken from SNPAlleleFreq_TGP.bcp.gz.
- Submitter handles were extracted from Batch.bcp.gz, SubSNP.bcp.gz and SNPSubSNPLink.bcp.gz.
- SNP_bitfield.bcp.gz provided miscellaneous properties annotated by dbSNP, such as clinically-associated. See the document dbSNP_BitField_v5.pdf for details.
- The header lines in the rs_fasta files were used for molecule type, class and observed polymorphism.
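A header line from an rs_fasta file can be split into key=value fields with a few lines of code. The header below is hypothetical: its layout (an identifier, then `|`-separated `key=value` pairs such as `mol`, `class` and `alleles`) and its values are assumptions for illustration, so check them against the headers in the files you actually download.

```python
def parse_rs_fasta_header(line):
    """Split a '>'-prefixed header into its identifier and key=value fields
    (assumed layout: identifier, space, then '|'-separated key=value pairs)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    seq_id, _, rest = line[1:].strip().partition(" ")
    fields = {}
    for item in rest.split("|"):
        key, sep, value = item.partition("=")
        if sep:
            # Strip surrounding quotes from values such as mol="genomic".
            fields[key.strip()] = value.strip().strip('"')
    return seq_id, fields


# Hypothetical header, for illustration only.
header = '>gnl|dbSNP|rs242 rs=242|pos=251|len=451|taxid=9606|mol="genomic"|class=2|alleles="-/T"|build=135'
seq_id, fields = parse_rs_fasta_header(header)
print(seq_id)                                            # gnl|dbSNP|rs242
print(fields["mol"], fields["class"], fields["alleles"])  # genomic 2 -/T
```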
Orthologous Alleles (human assemblies only)
For the human assembly, we provide a related table that contains
orthologous alleles in the chimpanzee, orangutan and rhesus macaque
reference genome assemblies.  
We use our liftOver utility to identify the orthologous alleles.  
The candidate human SNPs are filtered to those that meet all of the following criteria:
- class = 'single'
- mapped position in the human reference genome is one base long
- aligned to only one location in the human reference genome
- not aligned to a chrN_random chrom
- biallelic (not tri- or quad-allelic)
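The candidate criteria listed above can be expressed as a single predicate. This sketch assumes a simplified record type: `SnpRecord` and its `alignment_count` field are hypothetical stand-ins for whatever the SNP table provides, coordinates are assumed to be 0-based half-open, and observed alleles are assumed to be "/"-separated.

```python
from dataclasses import dataclass


@dataclass
class SnpRecord:
    # Hypothetical, simplified SNP record used only for this illustration.
    name: str
    chrom: str
    chrom_start: int
    chrom_end: int
    snp_class: str
    observed: str
    alignment_count: int  # number of places the flanks align in the genome


def is_ortholog_candidate(snp):
    """Apply the filtering criteria for orthologous-allele lookup."""
    return (
        snp.snp_class == "single"
        and snp.chrom_end - snp.chrom_start == 1            # one base long
        and snp.alignment_count == 1                          # unique mapping
        and "_random" not in snp.chrom                        # not chrN_random
        and len(set(snp.observed.upper().split("/"))) == 2   # biallelic
    )


print(is_ortholog_candidate(SnpRecord("rs1", "chr1", 100, 101, "single", "A/G", 1)))         # True
print(is_ortholog_candidate(SnpRecord("rs2", "chr1_random", 100, 101, "single", "A/G", 1)))  # False
print(is_ortholog_candidate(SnpRecord("rs3", "chr1", 100, 101, "single", "A/C/G", 1)))       # False
```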
In some cases the orthologous allele is unknown; these are set to 'N'.
If a lift was not possible, we set the orthologous allele to '?' and the 
orthologous start and end position to 0 (zero).
Masked FASTA Files (human assemblies only)
FASTA files that have been modified to use IUPAC ambiguous nucleotide characters at each base covered by a single-base substitution are available for download.
Note that only single-base substitutions (no insertions or deletions) were used to mask the sequence, and these were filtered to exclude problematic SNPs.
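Masking a base with an IUPAC ambiguity code amounts to looking up the set of observed bases. The sketch below uses the standard IUPAC nucleotide codes. It assumes 0-based positions and observed alleles on the plus strand, it skips anything that is not a single-base substitution, and it keeps lower-case (soft-masked) bases lower-case. That last behavior is a choice made here, not necessarily what the downloadable files do.

```python
# Standard IUPAC ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def mask_sequence(seq, snps):
    """Replace each SNP position with its IUPAC code.
    snps: iterable of (0-based position, observed alleles like 'A/G')."""
    bases = list(seq)
    for pos, observed in snps:
        alleles = frozenset(a.upper() for a in observed.split("/"))
        # Only single-base substitutions with 2-4 distinct bases are used.
        if len(alleles) < 2 or not alleles <= set("ACGT"):
            continue
        code = IUPAC[alleles]
        bases[pos] = code.lower() if bases[pos].islower() else code
    return "".join(bases)


print(mask_sequence("ACGTacgt", [(1, "C/T"), (5, "C/G"), (2, "-/G")]))  # AYGTasgt
```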
References
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. 
dbSNP: the NCBI database of genetic variation.
Nucleic Acids Res. 2001 Jan 1;29(1):308-11.