Proposed Algorithm to Study DNA Faster

Scientists propose an algorithm to study DNA faster and more accurately

January 18, 2016
Stylized image of DNA. Credit:

A team of scientists from Germany, the United States and Russia, including Dr. Mark Borodovsky, Chair of the Department of Bioinformatics at MIPT, has proposed an algorithm that automates the search for genes and makes it more efficient. The new development combines the advantages of the most advanced tools for working with genomic data. The method will enable scientists to analyse DNA sequences faster and more accurately and to identify the full set of genes in a genome.

Although the paper describing the algorithm only appeared recently in the journal Bioinformatics, which is published by Oxford Journals, the proposed method has already proven to be very popular: the computer program has been downloaded by more than 1500 different centres and laboratories worldwide. Tests of the algorithm have shown that it is considerably more accurate than other similar algorithms.

The development involves applications of the cross-disciplinary field of bioinformatics. Bioinformatics combines mathematics, statistics and computer science to study biological molecules, such as DNA, RNA and protein structures. DNA, which is fundamentally an information molecule, is even sometimes depicted in computerized form (see Fig. 1) in order to emphasize its role as a molecule of biological memory. Bioinformatics is a very topical subject; every new sequenced genome raises so many additional questions that scientists simply do not have time to answer them all. So automating processes is key to the success of any bioinformatics project, and these algorithms are essential for solving a wide variety of problems.

One of the most important areas of bioinformatics is annotating genomes – determining which particular regions of a DNA molecule are used to synthesize RNA and proteins (see Fig. 2). These parts – genes – are of great scientific interest. In many studies, scientists do not need information about the entire genome (which is around 2 metres long for a single human cell), but only about its most informative part – the genes. Gene sections are identified by searching for similarities between sequence fragments and known genes, or by detecting consistent patterns in the nucleotide sequence. This process is carried out using predictive algorithms.

Locating gene sections is no easy task, especially in eukaryotic organisms, which include almost all widely known types of organism except bacteria. In eukaryotic cells, the transfer of genetic information is complicated by “gaps” in the coding regions (introns), and there are no definite indicators of whether a region is a coding region or not.

Diagram showing the transmission of hereditary information in a cell. Credit:

The algorithm proposed by the scientists determines which regions in the DNA are genes and which are not. The scientists used a Markov chain, which is a sequence of random events in which the probability of each event depends on the events that came before it. The states of the chain in this case are either nucleotides or nucleotide words (k-mers). The algorithm determines the most probable division of a genome into coding and noncoding regions, classifying the genomic fragments in the best possible way according to their ability to encode proteins or RNA. Experimental data obtained from RNA give additional useful information, which can be used to train the model used in the algorithm. Certain gene prediction programs can use this data to improve the accuracy of finding genes. However, these algorithms require species-specific training of the model. The AUGUSTUS software program, for example, which has a high level of accuracy, needs a training set of genes. This set can be obtained using another program – GeneMark-ET – which is a self-training algorithm. These two algorithms were combined in the BRAKER1 algorithm, which was proposed jointly by the developers of AUGUSTUS and GeneMark-ET.
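The Markov chain idea described above can be illustrated with a toy sketch. It is not the GeneMark-ET, AUGUSTUS or BRAKER1 implementation, which use far richer models. It assumes a first-order chain, training sequences invented purely for illustration, and hypothetical function names (train_markov, log_odds). One chain is trained on "coding-like" sequences and one on "noncoding-like" sequences, and a fragment is scored by the log ratio of the two probabilities.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, order=1, pseudocount=1.0):
    """Estimate P(next nucleotide | previous `order` nucleotides).

    Returns a function prob(context, nucleotide). Pseudocounts keep
    unseen transitions from getting probability zero.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1

    def prob(context, nucleotide):
        row = counts.get(context, Counter())
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return (row[nucleotide] + pseudocount) / total

    return prob

def log_odds(fragment, coding, noncoding, order=1):
    """Sum of log(P_coding / P_noncoding) over all transitions.

    Positive: the fragment looks more like the coding training set.
    Negative: it looks more like the noncoding training set.
    """
    score = 0.0
    for i in range(order, len(fragment)):
        context, nucleotide = fragment[i - order:i], fragment[i]
        score += math.log(coding(context, nucleotide) / noncoding(context, nucleotide))
    return score

# Toy training data (invented for illustration, not real genes).
coding_model = train_markov(["ATGGCCGCGGCCGGCGCC", "ATGGCGGCCGCGGGC"])
noncoding_model = train_markov(["ATATTTAATATTAAATAT", "TTTAATATATAAATTT"])

coding_score = log_odds("GCCGCG", coding_model, noncoding_model)
noncoding_score = log_odds("ATATTA", coding_model, noncoding_model)
print(coding_score > 0, noncoding_score < 0)
```

A real gene finder would use higher-order chains over k-mers and would search for the best division of the whole genome rather than scoring isolated fragments.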

BRAKER1 has demonstrated a high level of efficiency. A sample running time of BRAKER1 on a single processor is ∼17.5 hours for training and predicting genes in a genome 120 megabases long. This is a good result, considering that the time may be significantly reduced by using parallel processors, so in the future the algorithm might run even faster and more efficiently.

Tools such as these solve a variety of problems. Accurately annotating genes in a genome is extremely important – an example of this is the global 1000 Genomes Project, the initial results of which have already been published. Launched in 2008, the project involves researchers from 75 different laboratories and companies. The project mapped the genomes of different people, noting their coding sections, and identified sequences of rare gene variants and nucleotide substitutions, some of which can cause disease. When diagnosing genetic diseases, it is very important to know which substitutions in gene sections cause the disease to develop. In the future, this will help doctors to diagnose complex diseases such as heart disease, diabetes, and cancer.

BRAKER1 enables scientists to work effectively with the genomes of new organisms, speeding up the process of annotating genomes and acquiring essential knowledge about life sciences.



The Children of Adam – National Geographic



In the name of God, “The Most Gracious”, “The Dispenser of Grace”

In the name of the Father, Son, and Holy Ghost, “the Lord, the Almighty, the Creator, the Maker, the Godhead”

Jehovah, Yahweh, Catholic, Jew, Baptist, Methodist, Buddhist and other religious groups

God is the Ruler of the Universe and the creator of all life, which began in Africa and spread around the world. God makes no distinction of religion, color, or sexual orientation. He simply stated: follow my teaching, and I give you free will to choose me, your Father, or Satan.

This is a fact that lies within all of us in a special place in our DNA.

Africa is the beginning and the end; take your place with God.

When reviewing material for this article, I found so much hate and rejection of scientifically validated facts: challenges based on religious preferences, made without an open mind, without understanding, and without wanting to seek validation. This is the position of some of the world today, but it is changing. We are one and you are his people.

Coming soon: DNA spirituality, health, disease, relationships and mental health. I wrote a blog post that came to my mind while flying to DC-Bal. For five hours, I wrote while deep in thought and prayer. It has nothing to do with Trump's tweet today regarding North Korea.

I met a man in the Dollar Store today. I turned to him and said, “I know you somehow.” He said to me, “I have been waiting for you.” We talked a little about our father, our God. He wrote down a book he wanted me to read, and we continued a discussion about our connection. Finally, he said, “We shall meet again, and others will be coming to give you greetings.” I asked his name and he just smiled. He said, “I know your name.” He also said to me, “There is no burden too heavy or too hard for you to bear if you believe in God.” This is the third experience since my transition and return to this life.

May our God bless you all and keep you safe.


GEDmatch To Triangulate Matches



GEDmatch is a very useful tool, but you need to be a Tier 1 user. You must also have taken an autosomal DNA test.

If you are looking for family using DNA results, you are probably using GEDmatch or one of the other websites to help compare your DNA kit results. Triangulation is comparing A to B, then A to C, then B to C, etc. You might match B and B might match C, but that doesn’t mean you match C. Here is some information and advice for those of you using GEDmatch and its Triangulation feature. You have to be a Tier 1 user to get this feature, but Triangulation on GEDmatch is well worth the $10 a month donation.

If you have a match of interest, take your own kit (or a relative’s) and the kit of the match of interest, and on GEDmatch use the “People who match one or both of two kits” tool. Enter both kit numbers (yourself or your relative first, so that everyone is being compared to you in the later options), and submit. In the top part, you get all the people who match the two of you. The people who show up in the top section matching both of you can fall into three categories:

1. People who match both of you on the same chromosome and segment. (you have a common ancestral couple)
2. People who match both of you, but on different chromosomes and segments (you probably have a common ancestral couple)
3. People who match both of you, but through different common ancestral couples.
Check all the boxes of all the people who match both of you and click “submit.”
You will then get the GEDmatch visualization options. What’s nice about this screen is that no matter which option you choose, you can use the back button in your browser to return to it and use another tool in the visualization options.
If you click on the “2-D Chromosome Browser” you will see all the segments where everyone is matching you. As you scroll down the page, look for segments in the colors that indicate more than 10 cM (ignore pink, for example). Most of the time, what you are looking for includes a lot of green segments. Once you’ve found a place where at least three of you overlap on the same segment, pursue that as a triangulated segment. Remember that you are not shown on this page as a separate bar; it shows how people are matching you. Either save this page or do a screen capture of the segment, along with the segment beginning and ending points in the table above the chromosome image.
Go back to the visualization options and use “Segment csv file.” This gives you all the matching chromosomes and segments of all the people who matched both of you. It not only gives the segments where they match you and the other kit you used (in “People who match one or both of two kits”) but also shows how they match each other. Even if you follow up only that one location where two or more people matched you on the same chromosome, you should keep track of the other segments, because other matches may eventually show up that match on those, too.
One difficulty with the “Segment csv file” is that if close relatives are involved (say, your aunt, your sister and yourself), it is going to show a lot of segments where they match each other. Those aren’t going to help you identify the unknown ancestral couple shared with other matches. The file is created sorted by the names of the matches, so you can go through the resulting spreadsheet (from opening the csv file) and first delete all rows of close relatives. You can also do this for other matches when you can tell that they are close relatives. Then delete the rows showing their matching segments to each other.
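For anyone who would rather clean the file with a script than by hand, here is a minimal sketch of the same idea. The column names (name_a, name_b, chromosome, start, end, cm), the sample rows and the function name clean_segments are all made up for illustration; check them against the header of the file you actually download. It drops every row that involves a known close relative and every segment that is not greater than 10 cM.

```python
import csv
import io

# Hypothetical sample in an assumed layout; a real GEDmatch export
# may use different column names and ordering.
SAMPLE_CSV = """name_a,name_b,chromosome,start,end,cm
Me,Aunt Jo,1,1000000,90000000,85.2
Me,Match X,7,12000000,31000000,18.4
Me,Match Y,7,15000000,29000000,14.1
Match X,Match Y,7,15000000,29000000,14.1
Me,Match Z,3,5000000,9000000,6.3
"""

def clean_segments(csv_text, close_relatives, min_cm=10.0):
    """Return the rows worth keeping for triangulation.

    Rows are dropped if either person is a known close relative,
    or if the segment is not greater than `min_cm` centimorgans.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["name_a"] in close_relatives or row["name_b"] in close_relatives:
            continue  # close relatives match on many segments; not informative here
        if float(row["cm"]) <= min_cm:
            continue  # small segments are ignored, as in the chromosome browser step
        kept.append(row)
    return kept

cleaned = clean_segments(SAMPLE_CSV, close_relatives={"Aunt Jo"})
for row in cleaned:
    print(row["name_a"], row["name_b"], row["chromosome"], row["cm"])
```

With a real file you would read the text from disk instead of SAMPLE_CSV and list your own close relatives by the names shown in the export.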

Triangulation on GEDmatch – Why Do It This Way?

Working with GEDmatch this way, you verify triangulated segments by being able to get the Segment csv file. You have to use triangulation because even if two people both match you on a segment, that does not necessarily mean that they match each other on it. It is likely that they do, but it is possible that they match each other through a different ancestral couple and just happen to match you on the same segment. One may be matching on your maternal chromosome and the other on your paternal chromosome. Triangulation on GEDmatch helps to prove that they are indeed matching each other on the same segment, and the “Segment csv file” does this for you.
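The check described above can be written out as a small sketch. This is not how GEDmatch itself computes anything; it only shows the logic, with hypothetical names (Segment, overlap, triangulate) and invented positions. A segment counts as triangulated only when the A-to-B, A-to-C and B-to-C matches all overlap on the same chromosome.

```python
from typing import NamedTuple, Optional

class Segment(NamedTuple):
    chromosome: str
    start: int  # base-pair position where the shared segment begins
    end: int    # base-pair position where the shared segment ends

def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region shared by two segments, or None if they do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None

def triangulate(a_b: Segment, a_c: Segment, b_c: Segment) -> Optional[Segment]:
    """Return the region common to all three pairwise matches, or None.

    a_b is where A matches B, a_c where A matches C, b_c where B matches C.
    """
    shared_with_a = overlap(a_b, a_c)
    if shared_with_a is None:
        return None
    return overlap(shared_with_a, b_c)

# B and C both match A on chromosome 7 and also match each other there.
triangulated = triangulate(
    Segment("7", 12_000_000, 31_000_000),
    Segment("7", 15_000_000, 29_000_000),
    Segment("7", 15_000_000, 29_000_000),
)

# B and C both match A on chromosome 7, but match each other only on
# chromosome 3 (for example, through a different ancestral couple).
not_triangulated = triangulate(
    Segment("7", 12_000_000, 31_000_000),
    Segment("7", 15_000_000, 29_000_000),
    Segment("3", 5_000_000, 20_000_000),
)

print(triangulated, not_triangulated)
```

In practice you would probably also require the common region to be long enough in cM to be meaningful, which this sketch leaves out.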
To read more about DNA triangulation, there are several articles listed at ISOGG.