What are Single Nucleotide Polymorphisms (SNPs)?

SNPs result from replication errors and DNA damage

A Single Nucleotide Polymorphism, or SNP (pronounced "snip"), is a small genetic change, or variation, that can occur within a person's DNA sequence. The genetic code is specified by the four nucleotide "letters" A (adenine), C (cytosine), T (thymine), and G (guanine). SNP variation occurs when a single nucleotide, such as an A, replaces one of the other three nucleotide letters: C, G, or T.

SNP is a new term for an old concept. Geneticists have been trying for decades to find the genetic differences among individuals. Originally phenotypes were used, then protein sequences, electrophoresis, restriction fragment length polymorphisms (RFLPs), and microsatellites. With recent technologies for DNA sequencing and the detection of single-base differences, we are approaching the time when all differences in DNA sequence among individuals can be found.

SNPs most commonly refer to single-base differences in DNA among individuals.  The assays that detect these point differences generally can also detect small insertions or deletions of one or a few bases.  Polymorphisms are usually defined as sites where the less common variant has a frequency of at least 1% in the population, but for some purposes rarer variants are important as well.

An example of a SNP is the alteration of the DNA segment AAGGTTA to ATGGTTA, where the second "A" in the first snippet is replaced with a "T". To count as a SNP, a variant generally must occur in at least 1 percent of the population. Because only about 3 to 5 percent of a person's DNA sequence codes for the production of proteins, most SNPs are found outside of "coding sequences". SNPs found within a coding sequence are of particular interest to researchers because they are more likely to alter the biological function of a protein. Recent advances in technology, coupled with the unique ability of these genetic variations to facilitate gene identification, have led to a flurry of SNP discovery and detection.
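The AAGGTTA to ATGGTTA example can be checked with a few lines of Python. This is only a minimal sketch: the function name `find_single_base_differences` is made up for illustration, and it assumes the two sequences are already aligned and of equal length, so it reports substitutions only, not insertions or deletions.

```python
def find_single_base_differences(reference, variant):
    """Return (1-based position, reference base, variant base) for each mismatch.

    Assumes both sequences are aligned and the same length (substitutions only).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, var_base)
        for position, (ref_base, var_base) in enumerate(zip(reference, variant), start=1)
        if ref_base != var_base
    ]


differences = find_single_base_differences("AAGGTTA", "ATGGTTA")
print(differences)  # [(2, 'A', 'T')]
```

The single reported difference is at the second base, where A is replaced by T, matching the description in the text.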

Why SNPs?

  • SNPs are the basis for high-throughput and massively parallel genotyping technologies.
  • Phenotypic changes produced by SNPs (e.g., human diseases caused by SNPs) can be directly genotyped.
  • SNPs open the way to the development of ultra-high density maps. The estimated frequency of SNPs in the human genome is one per 1,000 bp; thus, in theory, markers could be developed for as many as ~3,000,000 human SNPs.
  • One-tenth of the total alone would yield a genetic map comprised of 300,000 SNP markers dispersed every 10,000 bp.
  • Alleles making up blocks of such SNPs in close physical proximity are often correlated, and define a limited number of SNP haplotypes, each of which reflects descent from a single, ancient ancestral chromosome.
  • The majority of human sequence variation is due to SNPs, that is, substitutions at individual base pairs that have occurred once in the history of mankind (1).
  • SNPs seem to comprise the largest class of functional polymorphisms (i.e., those producing phenotypic effects).
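The map-density figures in the list above follow from simple arithmetic. The sketch below assumes a round genome size of 3 billion bp, which is the figure implied by "one SNP per 1,000 bp" giving roughly 3,000,000 SNPs; the variable names are illustrative only.

```python
# Assumed round figure implied by ~3,000,000 SNPs at one per 1,000 bp.
GENOME_SIZE_BP = 3_000_000_000
BP_PER_SNP = 1_000

total_snps = GENOME_SIZE_BP // BP_PER_SNP          # all potential SNP markers
map_markers = total_snps // 10                     # one-tenth of the total
marker_spacing_bp = GENOME_SIZE_BP // map_markers  # average distance between map markers

print(total_snps, map_markers, marker_spacing_bp)  # 3000000 300000 10000
```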

Strengths

  1. The most abundant class of DNA polymorphisms.
  2. SNPs are the basis for a variety of ultra-high throughput and massively parallel genotyping technologies.
  3. SNP markers are locus specific.
  4. SNP markers are an excellent long term investment.
  5. SNP markers can be used to pinpoint functional polymorphisms.
  6. SNP assays require very small amounts of DNA (typically 25 to 50 ng per individual).

Number of SNPs

How many SNPs are there in the human genome? This is the same as asking how many of the 3.2 billion sites in the genome have variant forms, at frequencies above the mutation rate. There is good information on the proportion of sites that differ between two randomly chosen homologous chromosomes. This proportion is called the nucleotide diversity; it is useful for comparing the amount of variability among chromosome regions or among populations, and takes into account the number of chromosomes examined (2). Many SNPs were discovered in the overlap of the ends of bacterial artificial chromosome (BAC) clones used to assemble the human genome, when these BAC clones came from different individuals or from different chromosomes from the same individual; the number of differences between two chromosomes averaged 1/1331 sites of the DNA sequence (3). Since people have two copies of all chromosomes (except the sex chromosomes in males), this means that any one individual is heterozygous at about 3.2 billion bases × 1 difference/1331 bases ≈ 2.4 million sites across all chromosomes.
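The two quantities in the paragraph above can be illustrated in Python. This is a sketch under stated assumptions: nucleotide diversity is estimated here as the average proportion of differing sites over all pairs of aligned sequences, which is one common estimator and not necessarily the exact one used in the cited work; the three toy sequences and the function name `nucleotide_diversity` are invented for illustration.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average proportion of sites that differ, taken over all pairs of sequences.

    Assumes the sequences are aligned and of equal length.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy data (hypothetical): three 7-base chromosomes, one carrying a T at site 2.
toy_sample = ["AAGGTTA", "ATGGTTA", "AAGGTTA"]
pi = nucleotide_diversity(toy_sample)  # 2 differences over 3 pairs x 7 sites

# Genome-wide estimate from the text: 3.2 billion sites, 1 difference per 1331 sites.
GENOME_SITES = 3_200_000_000
heterozygous_sites = GENOME_SITES / 1331
print(f"{heterozygous_sites / 1e6:.1f} million heterozygous sites")  # 2.4 million
```

Averaging over all pairs is what lets the estimate be compared between samples of different sizes: adding more chromosomes changes the number of pairs, but not the scale of the result.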

When two chromosomes are compared, they may have the same base at a DNA site even though that site is polymorphic in the population. The number of sites that vary in a population therefore cannot be estimated simply by counting the number of sites that differ between two chromosomes. The number of sites seen to have variants will rise as more individuals are examined; the exact number will depend on the distribution of the frequencies of the SNP alleles, but many SNPs will be missed. For example, samples of 10 chromosomes have a 97% chance of including both SNP alleles when the minor allele frequency is at least 20% in the population, but only a 59% chance when the minor allele frequency is at least 1% (4). Thus small samples are going to miss many SNPs with common alleles as well as most SNPs with rare alleles, and even larger samples are going to miss many SNPs with rare alleles.

Based on neutral theory and the observed rate of 1/1331 differences between two chromosomes, the estimated number of SNPs in humans with minor allele frequencies above 1% is 11 million (4). However, this estimate misses SNPs that are rare overall but are more common in some populations. Currently there is too little information about the variation in rare allele frequencies among populations, as well as about the deviations from the assumptions of the neutral model, to make a good guess of the number of SNPs (5). A rough guess is that there are about 10 to 30 million SNPs in the human genome, or on average about one every 100 to 300 bases. Eventually the number of SNPs will be found empirically, as many individuals are genotyped across the genome.
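The effect of sample size on SNP discovery can be sketched with a simple binomial calculation. This is an illustration, not the calculation behind the cited 97% and 59% figures: those apply to all SNPs at or above a frequency threshold, presumably averaged over a distribution of allele frequencies, whereas the hypothetical function below treats a single SNP whose minor allele frequency is exactly the value given. It also assumes the chromosomes are sampled independently.

```python
def prob_both_alleles_sampled(minor_allele_frequency, n_chromosomes):
    """Chance that a sample of chromosomes contains both alleles of one SNP.

    Assumes independent draws and exactly two alleles; the sample fails to
    reveal the SNP only if every chromosome carries the same allele.
    """
    p = minor_allele_frequency
    all_major = (1 - p) ** n_chromosomes
    all_minor = p ** n_chromosomes
    return 1 - all_major - all_minor


for maf in (0.20, 0.01):
    chance = prob_both_alleles_sampled(maf, 10)
    print(f"MAF {maf:.2f}, 10 chromosomes: {chance:.3f}")

# Average spacing implied by the rough guess of 10 to 30 million SNPs
# in a genome of 3.2 billion sites.
GENOME_SITES = 3_200_000_000
for snp_count in (10_000_000, 30_000_000):
    print(f"{snp_count:,} SNPs: one every {GENOME_SITES / snp_count:.0f} bases")
```

A SNP sitting exactly at the threshold frequency is the hardest one in its class to detect, so these single-frequency values come out lower than the figures quoted for the whole class. The spacing calculation gives roughly one SNP every 320 bases for 10 million SNPs and one every 107 bases for 30 million, in line with the rounded range in the text.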

The Pattern of Human SNP Variation

Humans arose about 100,000 to 200,000 years ago in Africa, and spread from there to the rest of the world (6). The original population was polymorphic, and so populations around the world share most polymorphisms from our common ancestors. For example, all populations are variable at the gene for the ABO blood group. About 85 to 90% of human variation is found within populations rather than between them (7). Thus any two random people from one population are almost as different from each other as are any two random people from the world. Mutations have arisen in populations since humans spread around the world, so some variation is mostly within particular populations. Variants that are rare are likely to have arisen recently, and are more likely than common variants to be found in some populations but not others (8,9). Common variants are usually common in all populations. Only a small proportion of variants are common in one population and rare in another. Usually, a difference among populations is of the sort that a variant has a frequency of 20% in one population and 30% in another.

References:
1. Patil, N., et al. (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719-1723.
2. Hartl, D. L. and Clark, A. G. (1997) Principles of Population Genetics, 3rd ed. Sinauer, Sunderland, MA.
3. The International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928-933.
4. Kruglyak, L. and Nickerson, D. A. (2001) Variation is the spice of life. Nat. Genet. 27, 234-236.
5. Przeworski, M., et al. (2000) Adjusting the focus on human variation. Trends Genet. 16, 296-302.
6. Tishkoff, S., et al. (1996) Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271, 1380-1387.
7. Barbujani, G., et al. (1997) An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA 94, 4516-4519.
8. Rieder, M. J., et al. (1999) Sequence variation in the human angiotensin converting enzyme. Nat. Genet. 22, 59-62.
9. Nickerson, D. A., et al. (1998) DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. 19, 233-240.

Copyright © 1999-2003 Genetic Identity LLC
All Rights Reserved.