Proposed Algorithm to Study DNA Faster

Scientists propose an algorithm to study DNA faster and more accurately

January 18, 2016
Stylized image of DNA. Credit:

A team of scientists from Germany, the United States and Russia, including Dr. Mark Borodovsky, Chair of the Department of Bioinformatics at MIPT, has proposed an algorithm that automates the search for genes and makes it more efficient. The new development combines the advantages of the most advanced tools for working with genomic data. The new method will enable scientists to analyse DNA sequences faster and more accurately and identify the full set of genes in a genome.

Although the paper describing the algorithm only appeared recently in the journal Bioinformatics, which is published by Oxford Journals, the proposed method has already proven to be very popular: the computer program has been downloaded by more than 1500 different centres and laboratories worldwide. Tests of the algorithm have shown that it is considerably more accurate than other similar algorithms.

The development involves applications of the cross-disciplinary field of bioinformatics. Bioinformatics combines mathematics, statistics and computer science to study biological molecules, such as DNA, RNA and protein structures. DNA, which is fundamentally an information molecule, is even sometimes depicted in computerized form (see Fig. 1) in order to emphasize its role as a molecule of biological memory. Bioinformatics is a very topical subject; every new sequenced genome raises so many additional questions that scientists simply do not have time to answer them all. So automating processes is key to the success of any bioinformatics project, and these algorithms are essential for solving a wide variety of problems.

One of the most important areas of bioinformatics is annotating genomes – determining which particular sections of the DNA molecule are used to synthesize RNA and proteins (see Fig. 2). These parts – genes – are of great scientific interest. In many studies, scientists do not need information about the entire genome (which is around 2 metres long for a single human cell), but only about its most informative part – genes. Gene sections are identified by searching for similarities between sequence fragments and known genes, or by detecting consistent patterns in the nucleotide sequence. This process is carried out using predictive algorithms.

Locating gene sections is no easy task, especially in eukaryotic organisms, which include almost all widely known types of organism except bacteria. In these cells, the transfer of genetic information is complicated by “gaps” in the coding regions (introns), and there are no definite indicators to determine whether a region is a coding region or not.

Diagram showing the transmission of hereditary information in a cell. Credit:

The algorithm proposed by the scientists determines which regions in the DNA are genes and which are not. The scientists used a Markov chain, which is a sequence of random events in which the probability of each event depends on the events that precede it. The states of the chain in this case are either nucleotides or nucleotide words (k-mers). The algorithm determines the most probable division of a genome into coding and noncoding regions, classifying the genomic fragments in the best possible way according to their ability to encode proteins or RNA. Experimental data obtained from RNA give additional useful information which can be used to train the model used in the algorithm. Certain gene prediction programs can use this data to improve the accuracy of finding genes. However, these algorithms require species-specific training of the model. For the AUGUSTUS software program, for example, which has a high level of accuracy, a training set of genes is needed. This set can be obtained using another program – GeneMark-ET – which is a self-training algorithm. These two algorithms were combined in the BRAKER1 algorithm, which was proposed jointly by the developers of AUGUSTUS and GeneMark-ET.
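To make the Markov chain idea concrete, here is a minimal sketch in Python. It is not the BRAKER1, AUGUSTUS or GeneMark-ET implementation; it only illustrates the general technique of training one k-mer Markov model on coding sequence and one on noncoding sequence, then scoring a fragment by its log-likelihood ratio. The function names, the toy training strings and the pseudocount are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(next nucleotide | previous `order` nucleotides)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1.0
    model = {}
    # Fill in every possible context so unseen k-mers get smoothed,
    # uniform probabilities instead of raising a KeyError.
    for context in ("".join(p) for p in product(ALPHABET, repeat=order)):
        total = sum(counts[context][b] for b in ALPHABET)
        total += pseudocount * len(ALPHABET)
        model[context] = {
            b: (counts[context][b] + pseudocount) / total for b in ALPHABET
        }
    return model


def log_likelihood(seq, model, order=2):
    """Log-probability of `seq` under the model (first `order` bases skipped)."""
    return sum(
        math.log(model[seq[i - order:i]][seq[i]])
        for i in range(order, len(seq))
    )


def coding_score(seq, coding_model, noncoding_model, order=2):
    """Positive score: fragment looks more like the coding training data."""
    return (log_likelihood(seq, coding_model, order)
            - log_likelihood(seq, noncoding_model, order))


# Toy training data (hypothetical, far too small for real use).
coding_model = train_markov(["ATGGCCGCCGGCGCCGGCGGC"])
noncoding_model = train_markov(["ATATTTAAATATTTAATATAA"])

gc_rich_score = coding_score("GCCGGCGCC", coding_model, noncoding_model)
at_rich_score = coding_score("ATATTTAAT", coding_model, noncoding_model)
```

With these toy models the GC-rich fragment scores above zero and the AT-rich fragment below zero. A real gene finder goes much further: it models introns, splice sites and reading frames, and searches for the most probable segmentation of the whole genome rather than scoring isolated fragments.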

BRAKER1 has demonstrated a high level of efficiency. The developed program has already been downloaded by more than 1500 different centres and laboratories. Tests of the algorithm have shown that it is considerably more accurate than other similar algorithms. As an example, the running time of BRAKER1 on a single processor is about 17.5 hours for training and gene prediction in a genome with a length of 120 megabases. This is a good result, considering that the time may be significantly reduced by using parallel processors, so in the future the algorithm might run even faster and more efficiently.

Tools such as these solve a variety of problems. Accurately annotating genes in a genome is extremely important – an example of this is the global 1000 Genomes Project, the initial results of which have already been published. Launched in 2008, the project involves researchers from 75 different laboratories and companies. Sequences of rare gene variants and gene substitutions were discovered, some of which can cause disease. When diagnosing genetic diseases, it is very important to know which substitutions in gene sections cause the disease to develop. The project mapped the genomes of different people, noting their coding sections, and rare nucleotide substitutions were identified. In the future, this will help doctors to diagnose complex diseases such as heart disease, diabetes, and cancer.

BRAKER1 enables scientists to work effectively with the genomes of new organisms, speeding up the process of annotating genomes and acquiring essential knowledge about life sciences.

No One Testing Company Can Provide Best Matches


Nobody can predict which database is going to have your best matches, because it is largely a matter of chance who tests where. Not all of your relatives or potential relatives are going to test at the same company. Also, the algorithms (mathematical formulas) are not the same for each testing company.

The big 3 companies at the moment are Ancestry, 23andMe, and FTDNA. In Europe, it is GPS and Living DNA. You can test with the latter if you have European roots.

I would highly recommend uploading your data to the free site that acts as a central site for intercompany comparisons. Since it is on a volunteer basis, it only has some of the results from each company, but it does tend to attract the most interested researchers, and therefore should have a good response rate if you attempt contact. You can upgrade to Tier 1 for advanced data manipulations for $10.00 per month.

Once you have uploaded, try the one-to-many lookup. You can see the original source of a contributor by the first letter of his kit code (A=Ancestry, M=23andMe, T=FTDNA).
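If you keep your match list in a spreadsheet or script, the first-letter rule above is easy to automate. The sketch below is only an illustration: the kit codes shown are made up, the function name is hypothetical, and only the three prefixes mentioned above are covered, so any other prefix is reported as unknown.

```python
# Prefix-to-company mapping taken from the three letters listed above.
KIT_PREFIX_SOURCES = {"A": "Ancestry", "M": "23andMe", "T": "FTDNA"}


def kit_source(kit_code: str) -> str:
    """Return the testing company implied by the first letter of a kit code."""
    code = kit_code.strip().upper()
    if not code:
        return "unknown"
    return KIT_PREFIX_SOURCES.get(code[0], "unknown")


# Made-up kit codes, for illustration only.
sources = [kit_source(k) for k in ("A123456", "m654321", "T000001", "Z999999")]
```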

The Children of Adam – National Geographic



In the name of God, “The Most Gracious”, “The Dispenser of Grace”

In the name of the Father, Son, and Holy Ghost, “the Lord, The Almighty, the Creator, the Maker, the Godhead”

Jehovah, Yahweh, Catholic, Jew, Baptist, Methodist, Buddhist and other religious groups

God is the Ruler of the Universe and creator of all life, which began in Africa and spread around the world. God makes no distinction of religion, color, or sexual orientation. He simply stated: follow my teaching, and I give you free will to choose me, your Father, or Satan.

This is a fact that lies within all of us in a special place in our DNA.

Africa is the beginning and the end, take your place with God.

When reviewing material for this article, I found so much hate and rejection of the scientifically validated facts: challenges based on religious preferences, made without an open mind, understanding, or any wish to seek validation. This is the position of some of the world today, but it is changing. We are one and you are his people.

Coming soon: DNA spirituality, health, disease, relationships and mental health. I wrote a blog that came to my mind while flying to DC-Bal. So for five hours, I wrote the blog while deep in thought and prayer. It has nothing to do with Trump's tweet today regarding North Korea.

I met a man in the Dollar Store today, and I turned to him and said, “I know you somehow.” He said to me, “I have been waiting for you.” We talked a little about our Father, our God. He wrote down the name of a book he wanted me to read, and we continued a discussion about our connection. Finally, he said, “We shall meet again, and others will be coming to give you greetings.” I asked his name and he just smiled. He said, “I know your name.” As he said to me, “There is no burden too heavy or too hard to bear if you believe in God.” This is the third experience since my transition and return to this life.

May our God bless you all and keep you safe.


GEDmatch to Triangulate Matches



GEDmatch is a very useful tool, but you need to be a Tier 1 user to use its triangulation feature. You must also have taken an autosomal DNA test.

If you are looking for family using DNA results, you are probably using GEDmatch or one of the other websites to help compare your DNA kit results. Triangulation is comparing A to B, then A to C, then B to C, etc. You might match B and B might match C, but that doesn’t mean you match C. Here is some information and advice for those of you using GEDmatch and its Triangulation feature. You have to be a Tier 1 user to get this feature, but Triangulation on GEDmatch is well worth the $10 a month donation.

If you have a match of interest, take your own kit (or a relative’s) and the kit of the match of interest, and on GEDmatch use the “People who match one or both of two kits” tool. Enter both kit numbers (yourself or your relative first, so that everyone is being compared to you in the later options), and submit. In the top part, you get all the people who match the two of you. The people who show up in the top section matching both of you can fall into three categories:

1. People who match both of you on the same chromosome and segment. (you have a common ancestral couple)
2. People who match both of you, but on different chromosomes and segments (you probably have a common ancestral couple)
3. People who match both of you, but through different common ancestral couples.
Check all the boxes of all the people who match both of you and click “submit.”
You will then get the GEDmatch visualization options. What’s nice about this screen is that no matter which option you choose, you can use the back button in your browser to return to it and use another tool in the visualization options.
If you click on the “2-D Chromosome Browser” you will see all the segments where everyone is matching you. As you scroll down the page, look for the segments in the colors used for segments greater than 10 cM (ignore pink, for example). Most of the time, what you are looking for includes a lot of green segments. Once you’ve found a place where at least three of you overlap on the same segment, pursue that as a triangulated segment. Remember that you are not shown on this as a separate bar; the display shows how people are matching you. Either save this page or do a screen capture of the segment, with the segment beginning and ending points from the table above the chromosome image.
Go back to the visualization options and use “Segment csv file.” This gives you all the matching chromosomes and segments of all the people who matched both of you. It not only gives the segments where they are matching you and the other kit you used (in the “People who match one or both of two kits” tool), but also shows how they match each other. Even if you do follow that one location where you had two or more people matching you on the same chromosome, you should keep track of other segments, because other matches may eventually show up that will match on those, too.
One thing that is difficult with the “Segment csv File” is that if there are close relatives involved (say your aunt, yourself and your sister), it is going to show a lot of segments where you match each other. Those aren’t going to help you identify the unknown ancestral couple shared with other matches. The file is created sorted by the names of the matches, so you can go through the resulting spreadsheet (from opening the csv file) and first delete all rows of close relatives. You can also do this whenever you can tell that the matches are close relatives of each other. Then delete the rows showing segments where those close relatives match each other.
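The spreadsheet clean-up described above can also be done with a few lines of Python. This is a rough sketch under stated assumptions: the column names (`kit1`, `kit2`, `chr`, `start`, `end`, `cm`), the kit numbers and the sample rows are all hypothetical and will not match the real GEDmatch export, so adjust them to the header row of your own file. The 10 cM cut-off mirrors the threshold mentioned for the chromosome browser.

```python
import csv
import io


def drop_close_relatives(rows, close_kits, min_cm=10.0):
    """Keep rows that involve no close-relative kit and meet the cM threshold."""
    close = set(close_kits)
    return [
        row for row in rows
        if row["kit1"] not in close
        and row["kit2"] not in close
        and float(row["cm"]) >= min_cm
    ]


# Hypothetical sample data standing in for a downloaded segment file.
SAMPLE_CSV = """kit1,kit2,chr,start,end,cm
A000001,T000002,7,1000000,25000000,22.5
A000001,A000009,7,500000,90000000,85.0
T000002,M000003,7,2000000,24000000,19.1
A000001,M000003,7,3000000,23000000,6.2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# A000009 plays the role of a known close relative (for example, an aunt).
kept = drop_close_relatives(rows, close_kits={"A000009"})
kept_pairs = [(row["kit1"], row["kit2"]) for row in kept]
```

To run it on a real download, replace the `io.StringIO(SAMPLE_CSV)` part with an opened file, for example `open("segments.csv", newline="")`, where the file name is again only a placeholder.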

Triangulation on GEDmatch – Why Do It This Way?

Working with GEDmatch this way, you verify triangulated segments by being able to get the Segment csv file. You have to use triangulation because the fact that two people both match you on a segment does not necessarily mean that they match each other on that segment. It is likely that they do, but it is possible that they match each other through a different ancestral couple and just happen to match you on the same segment. One may be matching on your maternal chromosome and one on the paternal chromosome. Triangulation on GEDmatch helps to prove that they are indeed matching each other on the same segment, and the “Segment csv File” does this for you.
To read more about DNA triangulation, there are several articles listed at ISOGG.
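The reasoning above comes down to one check: a segment is triangulated only if A matches B, A matches C, and B matches C on an overlapping stretch of the same chromosome. Here is a minimal Python sketch of that check. It is not how GEDmatch computes anything internally; the `Segment` type, the function names and the positions are made up for illustration.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    chrom: str
    start: int  # start position in base pairs
    end: int    # end position in base pairs


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the shared stretch of two segments, or None if they do not overlap."""
    if a.chrom != b.chrom:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chrom, start, end) if start < end else None


def triangulated_region(ab: Segment, ac: Segment, bc: Segment) -> Optional[Segment]:
    """Region shared by all three pairwise matches (A-B, A-C, B-C), if any."""
    first = overlap(ab, ac)
    if first is None:
        return None
    return overlap(first, bc)


# Made-up example: all three pairs match on chromosome 7.
ab = Segment("7", 1_000_000, 25_000_000)
ac = Segment("7", 3_000_000, 23_000_000)
bc = Segment("7", 2_000_000, 24_000_000)
shared = triangulated_region(ab, ac, bc)

# B and C both match A on chromosome 7, but match each other elsewhere.
bc_elsewhere = Segment("12", 2_000_000, 24_000_000)
not_shared = triangulated_region(ab, ac, bc_elsewhere)
```

In the first case the shared region runs from 3,000,000 to 23,000,000 on chromosome 7; in the second there is no triangulated segment, which is exactly the situation where B and C are probably related to you through different lines.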

Genetic Inheritance Follows Rules Concept 5


Ref: DNA for, access Aug 1, 2017

When Mendel proposed that each trait is determined by a pair of genes, it presented a potential problem. If parents pass on both copies of a gene pair, then offspring would end up with four genes for each trait. Mendel deduced that sex cells — sperm and eggs — contain only one parental gene of each pair. The half-sets of genes contributed by sperm and egg restore a whole set of genes in the offspring.

Mendel found that different gene combinations from the parents resulted in specific ratios of dominant-to-recessive traits. The results of a cross between two hybrid parents — each carrying one dominant and one recessive gene — were key to his synthesis. For example, a cross between two yellow-seed hybrids produces three times as many yellow seeds as green seeds. This is Mendel’s famous 3 to 1 ratio.
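Mendel's 3 to 1 ratio can be reproduced by listing every combination of one gene from each parent, which is what a Punnett square does. The short Python sketch below does this for the yellow-seed hybrid cross. The notation is an assumption for illustration: `Y` stands for the dominant yellow gene, `y` for the recessive green gene, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes when each parent passes on one gene of its pair."""
    offspring = Counter()
    for gene1, gene2 in product(parent1, parent2):
        # Sorting puts the uppercase (dominant) letter first, so "yY" becomes "Yy".
        offspring["".join(sorted(gene1 + gene2))] += 1
    return offspring


def phenotype_counts(offspring: Counter) -> tuple:
    """Return (dominant, recessive) counts; one dominant gene is enough to show."""
    dominant = sum(n for g, n in offspring.items() if any(c.isupper() for c in g))
    recessive = sum(offspring.values()) - dominant
    return dominant, recessive


# Two hybrid parents, each carrying one dominant and one recessive gene.
genotypes = cross("Yy", "Yy")
yellow, green = phenotype_counts(genotypes)
```

The four equally likely combinations are one `YY`, two `Yy` and one `yy`, so three yellow-seeded offspring are expected for every green-seeded one.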

23andMe Cleared to Provide Health Reports

Pharmaceutical companies do not like this at all.

Purchasers of the 23andMe Health + Ancestry DNA test now have access to genetic health risk reports for conditions including late-onset Alzheimer’s disease, Parkinson’s disease, celiac disease and seven others.

This is a big, if incremental, step forward for the company. Its DNA service initially provided ancestry information and genetic risk reports on about 250 conditions. In 2013, the FDA ordered 23andMe to stop providing health-related information to US customers because the company hadn’t proven its tests were analytically or clinically validated.

The new reports calculate genetic risk based on the presence (or absence) of specific markers in a person’s DNA. To obtain FDA authorization for them, 23andMe conducted “extensive validation studies for accuracy and user comprehension that met FDA standards,” according to its announcement. The FDA considered evidence tying each condition to relevant genetic markers in customers’ DNA.

