Reprinted from the International Society of Genetic Genealogy (ISOGG), August 2, 2017. No adjustments were made to this article, which reflects the ISOGG position.
In genetic genealogy, a centiMorgan (cM) or map unit (m.u.) is a unit of recombination frequency used to measure genetic distance. It is often used to imply distance along a chromosome, and takes into account how often recombination occurs in a region. A region with few cMs undergoes relatively little recombination. The number of base pairs to which one cM corresponds varies widely across the genome, because different regions of a chromosome have different propensities towards crossover. On average, one centiMorgan corresponds to about 1 million base pairs in humans. One centiMorgan is the distance between two loci for which there is a 1% chance that a marker at one locus will be separated from a marker at the second locus by crossing over in a single generation.
The genetic genealogy testing companies 23andMe, AncestryDNA, Family Tree DNA and MyHeritage DNA use centiMorgans to denote the size of matching DNA segments in autosomal DNA tests. Matching segments that span a large number of centiMorgans are more likely to be significant and to indicate a common ancestor within a genealogical timeframe.
The centiMorgan was named in honor of geneticist Thomas Hunt Morgan by his student Alfred Henry Sturtevant. Note that the parent unit of the centiMorgan, the Morgan, is rarely used today.
CentiMorgans vs megabases
CentiMorgans are interpolated numbers that take into consideration each area of a chromosome and its propensity to recombine. This means that if two cousins share 40 cM on chromosome 1, and two different cousins share 40 cM on chromosome 5, both pairs can be predicted, statistically, to share a similar degree of relationship. The amount of recombination per megabase, by contrast, varies with location, so in the same scenario, if both pairs shared 40 Mb, it would be more difficult to conclude that they are of a similar degree of relation without further accounting for location, chromosome and other factors.
Ann Turner provides a useful explanation: “I think of the cM as being a unit of ‘effective’ distance. As an analogy, a mile is a fixed quantity (5280 feet), and so are megabases. But the probability that a person can walk a mile in 20 minutes is more fluid. If the terrain is very rough, the ‘effective’ distance of a literal mile might be more like two miles if you’re trying to arrive at a certain time. We’re more interested in the probability that a segment will be passed on intact than the size of the segment in Mb.”
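The idea of the cM as an interpolated, position-dependent measure can be sketched in code. What follows is a minimal illustration only, assuming a hypothetical genetic map and simple linear interpolation between map points. The values in EXAMPLE_MAP are invented for the example and are not taken from any published map, and the function names interpolate_cm and segment_length_cm are likewise illustrative rather than part of any company's tooling.

```python
from bisect import bisect_right

# Hypothetical genetic map for one chromosome:
# (physical position in base pairs, genetic position in cM).
# These values are invented for illustration only.
EXAMPLE_MAP = [
    (0, 0.0),
    (1_000_000, 2.5),  # recombination-rich stretch: 2.5 cM in 1 Mb
    (5_000_000, 3.0),  # recombination-poor stretch: 0.5 cM in 4 Mb
    (6_000_000, 4.0),
]


def interpolate_cm(position_bp, genetic_map=EXAMPLE_MAP):
    """Linearly interpolate a cM position from a physical position."""
    positions = [bp for bp, _ in genetic_map]
    if not positions[0] <= position_bp <= positions[-1]:
        raise ValueError("position lies outside the map")
    # Index of the upper end of the map interval containing position_bp.
    i = min(bisect_right(positions, position_bp), len(positions) - 1)
    (bp_lo, cm_lo), (bp_hi, cm_hi) = genetic_map[i - 1], genetic_map[i]
    fraction = (position_bp - bp_lo) / (bp_hi - bp_lo)
    return cm_lo + fraction * (cm_hi - cm_lo)


def segment_length_cm(start_bp, end_bp, genetic_map=EXAMPLE_MAP):
    """Genetic length of a segment, as the difference of its endpoints in cM."""
    return interpolate_cm(end_bp, genetic_map) - interpolate_cm(start_bp, genetic_map)


# Two segments of identical physical size (1 Mb) with different genetic sizes.
print(segment_length_cm(0, 1_000_000))          # 2.5
print(segment_length_cm(1_000_000, 2_000_000))  # 0.125
```

In this toy map, two segments of the same physical length differ twentyfold in cM, which is the sense in which the cM acts as an ‘effective’ distance.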
As the cM is an empirical measure, based on recombination events in a particular dataset of parents and offspring, it can vary somewhat from study to study. Maps for each chromosome show that the general shape of the centiMorgan vs megabase curve is similar for two datasets, but the absolute values are not quite the same.
cM values per chromosome
The following table compares cM values per chromosome at Family Tree DNA, GEDmatch, and 23andMe. AncestryDNA uses 3475 as the total cM according to the help screen for confidence level in a DNA match. This presumably excludes the X chromosome.
Probability of crossover
The following chart shows the estimated probability that a segment will be affected by a crossover. The chart does not take into account some variables such as inversions and different recombination rates for males and females.
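One simple way to approximate such a probability is sketched here. It is not necessarily the method behind the chart. It assumes that crossovers follow a Poisson process with no interference, so that a segment of d cM expects d / 100 crossovers per meiosis, and, like the chart, it ignores inversions and sex-specific recombination rates. The function name prob_at_least_one_crossover is illustrative.

```python
import math


def prob_at_least_one_crossover(segment_cm, generations=1):
    """Estimate the chance that a segment is hit by at least one crossover.

    Assumption (not from the source): crossovers follow a Poisson process
    without interference, with segment_cm / 100 expected crossovers per
    meiosis, and successive generations are independent.
    """
    if segment_cm < 0 or generations < 0:
        raise ValueError("segment_cm and generations must be non-negative")
    expected_crossovers = segment_cm / 100.0 * generations
    return 1.0 - math.exp(-expected_crossovers)


for cm in (1, 10, 40, 100):
    print(cm, round(prob_at_least_one_crossover(cm), 4))
```

For a 1 cM segment this gives a value just under 1% per generation, in line with the definition of the centiMorgan. For long segments the probability stays below 100% because multiple crossovers can fall in the same segment.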
Converting centiMorgans into percentages
The calculation works as follows. With the Family Finder test, the total genome is 6770 cM when both the maternal and paternal copies are counted. A half-identical match across the whole genome (such as parent/child) is 3385 cM, which is half of that total. To convert shared cM into a percentage, divide by 68, which is the 6770 cM total rounded up to 6800 and then divided by 100. Matt Dexter explains: “The reason the number is not 6770 or 6800, but rather 68, is that it saves an additional step doing the math to convert an answer to percent. For example, 3385 / 6770 = .5 then as a second step, .5 times 100 = 50%. Using 68 to start with saves the added math step. So (3385 / 6800) * 100 is the same thing as 3385 / 68, which results in = 50%.” Because 68 is a rounded figure, 3385 / 68 is strictly about 49.8%, which rounds to 50%.
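The arithmetic can be checked with a short sketch. It uses the 6770 cM Family Finder total and the divide-by-68 shortcut quoted in the text; the function names shared_percent and shared_percent_shortcut are illustrative only.

```python
FAMILY_FINDER_TOTAL_CM = 6770  # total given in the text for the Family Finder test


def shared_percent(shared_cm, total_cm=FAMILY_FINDER_TOTAL_CM):
    """Exact conversion: shared cM as a percentage of the total."""
    return shared_cm / total_cm * 100


def shared_percent_shortcut(shared_cm):
    """Shortcut from the text: dividing by 68 approximates (cM / 6800) * 100."""
    return shared_cm / 68


parent_child_cm = 3385
print(shared_percent(parent_child_cm))                      # 50.0
print(round(shared_percent_shortcut(parent_child_cm), 1))   # 49.8
```

The two results differ by roughly 0.2 percentage points, which is the cost of rounding 67.7 up to 68 for convenience.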
Human reference genome
The centiMorgan totals per chromosome are based on the Human Reference Genome. 23andMe and AncestryDNA use Build 37. Family Tree DNA uses Build 37 for matching but Build 36 for segment boundaries in the Chromosome Browser. Raw data files are provided in both formats. Build 37 filled in quite a few gaps, and the number of base pairs in each of the chromosomes is greater in Build 37 than in Build 36. Consequently the cM totals per chromosome are lower for Family Finder than they are for 23andMe. GEDmatch uses Build 36, and converts AncestryDNA and 23andMe data from Build 37 to Build 36 for backward compatibility.
The latest version of the Human Reference Genome, Build 38, was released in December 2013. However, none of the companies have as yet adopted Build 38 and there is a “gentleman’s agreement” in place to stick with Build 37 for the present time.
- Definition of centiMorgan from the FTDNA glossary
- How do you determine the centiMorgan value for a DNA segment? FTDNA Learning Center article
- Definition of centiMorgan from the National Human Genome Research Institute
- Lobo I and Shaw K (2008). Thomas Hunt Morgan, genetic recombination and gene mapping. Nature Education 1(1):205.
- Genome Reference Consortium
- Introducing the New Human Genome Assembly: GRCh38. NCBI Insights, 24 December 2013
- Genome Reference Blog
- Look up tables for Build 36
- Look up tables for Build 37 (zip file)
- Rutgers Map Interpolator. This resource allows you to determine cM-scale linkage-based map positions for any marker, given only its physical position.