Genetic Communities™ White Paper: Predicting fine-scale ancestral origins from the genetic sharing patterns among millions of individuals
Catherine A. Ball, Erin Battat, Jake K. Byrnes, Peter Carbonetto, Kenneth G. Chahine, Ross E. Curtis, Eyal Elyashiv, Ahna Girshick, Julie M. Granka, Harendra Guturu, Eunjung Han, Ariel Hippen Anderson, Eurie Hong, Amir Kermany, Natalie M. Myres, Keith Noto, Kristin A. Rand, Shiya Song, Yong Wang (in alphabetical order).
AncestryDNA™ offers several genetic analyses to help customers discover, preserve, and share their family history. Some of the features offered to date are based exclusively on genetic information. These include a genetic ethnicity or ancestry inference (described in the Ethnicity Estimate White Paper) and an identity-by-descent (IBD) or DNA matching analysis (Matching White Paper). Other features, like DNA Circles, rely on the integration of pedigree and IBD data across the entire AncestryDNA database (DNA Circles White Paper). Each of these features provides complementary information to an AncestryDNA member: (1) the ethnicity estimate provides a distant picture of a customer’s genetic origins, perhaps hundreds or thousands of years ago; (2) DNA matches provide a customer with a list of fellow AncestryDNA test-takers who are relatives and with whom she or he shares a common ancestor within the last 10 generations; (3) DNA Circles integrate IBD and pedigree data to provide a customer with groups of relatives who appear to share DNA with one another due to a specific shared ancestor, potentially reinforcing their connection to this ancestor. In combination, these features provide a detailed portrait of an individual’s genetic ancestry.
Here, we augment these DNA and pedigree-based insights even further with our new Genetic Communities feature (Figure 1.1). Instead of considering the IBD connection between each pair of customers in isolation, we simultaneously analyze more than 20 billion connections identified among over 2 million AncestryDNA customers as a large genetic network (described in Section 3). Intuitively, because the estimated IBD connections between individuals are likely due to recent shared ancestry (within the past 10 generations), broader patterns in this large network likely represent recent shared history. The result is that we can identify clusters of living individuals who share large amounts of DNA due to specific, recent shared history. For example, we identify groups of customers who likely descend from immigrants participating in a particular wave of migration (e.g. Irish fleeing the Great Famine) (Insert by Duke B. Montgomery, Genetic Genealogist: the forced migration of Africans and South American Indians, and the enslavement of Native Americans), or customers who descend from ancestral populations that have remained in the same geographic location for many generations (e.g. the early settlers of the Appalachian Mountains and Blue Ridge Mountains). Following the identification of these clusters of individuals in the entire network, we can then assign any AncestryDNA customer to one or more of these clusters based on their IBD with other AncestryDNA members. Such assignment can provide a customer with insight into their recent ancestral history, in some cases traceable back to a historical event.
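To make the idea of a genetic network concrete, the following minimal Python sketch builds a small network from pairwise IBD estimates, groups individuals into clusters, and assigns a new individual to the cluster with which they share the most DNA. It is only an illustration under stated assumptions: connected components stand in for a real clustering method, the 12 cM cutoff is an arbitrary placeholder, and the function names and sample data are hypothetical. It does not reproduce the method described in the later sections of this paper.

```python
from collections import defaultdict


def build_network(ibd_pairs, min_cm=12.0):
    """Build an undirected weighted graph from (id_a, id_b, shared_cM) tuples.

    min_cm is a hypothetical cutoff chosen for illustration only.
    """
    graph = defaultdict(dict)
    for a, b, cm in ibd_pairs:
        if a != b and cm >= min_cm:
            graph[a][b] = cm
            graph[b][a] = cm
    return graph


def find_clusters(graph):
    """Toy clustering: connected components found with an iterative DFS."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        clusters.append(component)
    return clusters


def assign_to_cluster(new_links, clusters):
    """Return the index of the cluster sharing the most total cM, or None.

    new_links maps existing member IDs to the cM shared with the new person.
    """
    totals = [sum(cm for m, cm in new_links.items() if m in c) for c in clusters]
    if not totals or max(totals) == 0:
        return None
    return totals.index(max(totals))


# Made-up sample data: five individuals and four estimated IBD connections.
pairs = [("A", "B", 40.0), ("B", "C", 25.0), ("D", "E", 60.0), ("C", "D", 5.0)]
network = build_network(pairs)
clusters = find_clusters(network)
best = assign_to_cluster({"D": 30.0, "A": 10.0}, clusters)
```

In this toy data the weak C-D connection falls below the cutoff, so the network splits into two groups, and the new individual is placed with the group containing D because that is where most of their shared DNA lies. A network of billions of connections would require a scalable community detection approach rather than this simple component search.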
In the coming sections, I will first turn aside (Section 2) to describe haplogroups specific to Africans and African-Americans, along with their geographic locations and populations. Two families serve as examples: one in the Blue Ridge Mountains of Virginia and the other living in two places, Henderson and Sayersville, Kentucky. You will be able to use this data to look at your own haplogroup. The Genetic Communities feature is useful so long as you can apply it to yourself and to your family research. After that, we will turn back to the scientific principles behind the genetic network (Sections 3 and 4), how Ancestry identifies clusters within it (Sections 5 and 6), its use of DNA and pedigree data to annotate these clusters (Section 7), and finally its method for assigning customer samples to these clusters (Section 8).