Jun 15 2001
A ready reckoner and guide for potential entrants into bio-informatics
Written by K Rajeshwar   
Friday, 15 June 2001

Writing an article that serves as a guide for potential entrants into bio-informatics is not an easy task, for the simple reason that the term itself is misunderstood by many people both within and outside the industry. The structure and content of this article reflect this, with more emphasis on understanding what bio-informatics is. It is my strong belief that a proper understanding of bio-informatics is a big hurdle crossed for any potential entrant. Before I define bio-informatics and detail the various streams within it, let me give an idea of the opportunity we are talking about. According to industry forecasts, worldwide revenues from bio-informatics and related activities are projected at US$ 25 billion. Predictably, it is the hot new buzzword on the lips of venture capitalists and market watchers.

Let us move beyond the hype for a moment and focus on what exactly bio-informatics means. Unfortunately, there is no single universally accepted definition of bio-informatics. Here are a few definitions used by various people:

1. “...bio-informatics has to do with management and the subsequent use of biological information, particularly genetic information.”

2. “Information science has been applied to biology to produce the field called Bio-informatics.”

3. “the application of computer technology to the management of biological information; the melding of molecular biology with computer science”

4. “Bio-informatics: the body of tools, algorithms and know-how needed to handle complex biological information (the technological aspect); Computational biology: the application of bio-informatics tools to perform biological studies (the scientific aspect)”

5. “Bio-informatics is the creation of databases containing biological data and data mining for significant information from them”

6. “The organization and the analysis of biological data.”

While each of these definitions captures part of the picture, none of them is complete. Personally, I prefer to define bio-informatics as the application of computers to biology so as to enable, hasten or facilitate the process of biological discovery.

Believe me, this definition raises more questions than it answers. “Information technology” and “data mining” don’t mean much to a molecular biologist, and they don’t convey the possibilities that computers and IT create for researchers. From the opposite perspective, “biology” doesn’t mean much to a computer professional. What is it that biological researchers do? What are they trying to find out? How do they go about finding it out? And the obvious question: what are the benefits of marrying information technology to biological research, and why is bio-informatics such a hot area?

Bio-informatics is an umbrella term covering topics such as genome mapping, evolutionary studies, sequencing, statistical analysis of biopolymers, sequence comparison, gene modeling, prediction of DNA structure from DNA sequence, molecular modeling, structure comparison, prediction of function from sequence and structure, gene expression, genetic and metabolic networks, database management, visualization and interaction, and comparison tools. In common usage, bio-informatics usually refers to genomics, though it covers many other fields such as proteomics and transcriptomics. The genome is the total DNA content of a cell, that is, its total information content, and genomics is the study of the genome. Similarly, proteomics is the study of the total protein content of the cell (or organism), and transcriptomics is the study of its total RNA content. In this article, I will try to shed some light on genomics.

The Human Genome Project was a massive collaborative effort by the governments of many countries, universities and a private organization (Celera) to determine the DNA sequence of the human genome. This gargantuan effort was recently concluded a few years ahead of schedule. The achievement was possible only because of parallel sequencing by robotic sequencers and computerized analysis of the results. The complete sequence is now available to anyone from public databases on the Internet. Now that we have the DNA sequence, what’s next?

The DNA sequences of any two human beings on the planet differ by only about 0.1%, and this small difference accounts for all genetic diversity, genetic predisposition and disease conditions. The known information content (the coding genes) currently constitutes only 3% of the genome; the remaining 97% has no known function.
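To make a figure such as 0.1% concrete, here is a minimal sketch that computes the percentage of positions at which two DNA sequences differ. It assumes the two sequences are already aligned and of equal length (real genome comparisons need an alignment step first and must handle insertions and deletions), and the example sequence is invented for illustration.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Return the percentage of positions at which two aligned sequences differ.

    Assumes both sequences are already aligned to the same length, with no
    insertions or deletions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the bases differ, ignoring upper/lower case.
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Invented 1,000-base sequence; position 500 holds "A", so swapping in "G"
# gives exactly one difference in 1,000 bases.
reference = "ACGT" * 250
variant = reference[:500] + "G" + reference[501:]
print(percent_difference(reference, variant))  # 0.1
```

At genome scale, 0.1% of the roughly three billion bases in the human genome amounts to around three million differing positions.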

The goal of bio-informatics in the aftermath of the Human Genome Project and other genome projects is to develop an understanding of how living things are built from the DNA that encodes them. We need to know which parts of the genome actually have a functional value, what their functions are, what the physiological implications of changes in the coding regions are, and how to identify predisposition to disease from an individual’s genetic makeup. This is like finding a needle in a haystack. Actually, finding the proverbial needle is easier, because in the case of the genome the needle looks almost exactly like the hay!

The challenge now is to decode the genome, and that is a difficult task. To give an idea of how difficult: we still have trouble identifying unknown genes by computer analysis of genomic sequence. And although we have leads on how to correlate and extend information from model systems to the human system, we still cannot predict or model how a sequence of amino acids folds into the specific three-dimensional structure of a functional protein.
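To illustrate what computer analysis of genomic sequence involves at its very simplest, the sketch below scans a DNA string for open reading frames (ORFs): stretches that start with the codon ATG and run in frame to a stop codon (TAA, TAG or TGA). This is a deliberately naive illustration, not a real gene finder. It assumes the standard genetic code, reads only the forward strand and ignores introns and regulatory signals, and gaps like these are part of why gene identification in genomic DNA remains hard.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs for naive open reading frames.

    Scans the forward strand in all three reading frames. Each ORF starts at
    an ATG and ends just after the first in-frame stop codon; `end` is
    exclusive, so dna[start:end] includes the stop codon. An ATG with no
    downstream in-frame stop codon is not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon looking for an in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        orfs.append((i, j + 3))
                        i = j  # resume scanning at the codon after the stop
                        break
                else:
                    break  # no stop codon left in this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATAGGG"))  # [(2, 11)]
```

Here `dna[2:11]` is `ATGAAATAG`: a start codon, one codon of "gene" and a stop codon.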

The volume of data to be handled also poses immense challenges. The amount of data in GenBank is now growing exponentially, and as data types beyond DNA, RNA and protein sequence begin to undergo the same kind of explosion, simply managing, accessing and presenting this data to users in an intelligible form is a critical task. Bioinformaticians, who use computers to do this, need to work closely with academic and clinical researchers in the biological sciences to manage such staggering amounts of data.

Besides the sheer volume of data to be managed, the interlinking of the various pieces of data adds another dimension of complexity. A spot on a DNA array or DNA chip, for instance, conveys not only immediate information about its intensity, but also links to many layers of information about its genomic location, its DNA sequence and the encoded protein’s structure and function. Creating easily navigable information systems that let researchers follow these links seamlessly, without getting lost in a deluge of irrelevant data, is a huge opportunity for bio-informatics companies.
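The interlinking described above can be pictured as a chain of records, each pointing to the next layer of information. The sketch below uses Python dataclasses; every class name, field name (such as `probe_id` or `structure_id`) and sample value is hypothetical, chosen only to mirror the layers named in the text: intensity, genomic location, DNA sequence and the encoded protein’s structure and function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinInfo:
    """Hypothetical annotation for the protein a gene encodes."""
    name: str
    function: str
    structure_id: Optional[str] = None  # e.g. a structure-database entry, if known


@dataclass
class GeneRecord:
    """Hypothetical record for the gene an array probe targets."""
    gene_id: str
    chromosome: str
    start: int  # genomic location: start coordinate
    end: int    # genomic location: end coordinate
    sequence: str
    protein: Optional[ProteinInfo] = None


@dataclass
class ArraySpot:
    """One spot on a DNA array: a measured intensity plus a link to its gene."""
    probe_id: str
    intensity: float
    gene: GeneRecord


def describe(spot: ArraySpot) -> str:
    """Follow the links from a spot to its genomic location and protein function."""
    gene = spot.gene
    function = gene.protein.function if gene.protein else "unknown function"
    return (f"{spot.probe_id}: intensity {spot.intensity} -> {gene.gene_id} "
            f"(chr{gene.chromosome}:{gene.start}-{gene.end}) -> {function}")


spot = ArraySpot("probe_001", 1523.0,
                 GeneRecord("geneX", "7", 1000, 1300, "ATGGCC",
                            ProteinInfo("proteinX", "kinase")))
print(describe(spot))
```

The optional `protein` field reflects the fact that many genes have no characterized product yet; a navigable system has to show such gaps rather than hide them.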

Lastly, each gene in the genome isn’t a stand-alone unit. Biochemical pathways are formed by the interaction of multiple genes, and these pathways in turn feed into other pathways. The biochemical flux and the physiological condition of the cell at any point are governed as much by the external environment (pathogens, temperature, chemical stimuli, etc.) as by the genomic instruction set. Combining genomic and biochemical data into a quantitative and predictive model will be the work of an entire generation of bio-informaticians. Accomplishing these tasks manually would take decades: for example, it has been estimated that completely characterizing a single protein by conventional methods would take 40 years! The application of information technology makes this mammoth job achievable.

With that background in place, let me focus on what bio-informatics can do. Bio-informatics can contribute to a better understanding of molecular evolution, the origin of life, genomics and proteomics (the Human Genome Project), theoretical biology, complexity and information theory, biotechnology, lead drug discovery (lead informatics) and computing with biomolecules (DNA computers). Scientists hope that, as the genome gives up its secrets, we will gain a deeper understanding of what causes disease at the genetic and protein level, which drug targets to aim for and how to design drugs that hit them, leading eventually to personalized healthcare.

With the scope and relevance of bio-informatics covered, let me now briefly describe the various business opportunities in the sector for potential entrants. New sequence information is being produced at a staggering rate. A non-redundant subset of the major sequence databases contains over 650,000,000 bases of DNA sequence, and the contents of GenBank, a public database that includes the human genome sequences, double every year. How do we handle all this information? It has to be:

1. Produced (sequencing)

2. Processed (put in the appropriate format)

3. Stored (as a database)

4. Shared (usually on the Internet)

5. Queried (to retrieve useful information)

6. Retrieved

7. Visualized (preferably graphically, in a way that highlights trends, similarities, etc.)

8. Annotated (ancillary information about the gene, its protein product, its function and structure is entered here)

9. Curated (entered as a complete record with all relevant details, such as its position in the genetic or protein network)
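Annual doubling compounds quickly. The sketch below projects database size under steady exponential growth, size(t) = initial × 2^(t / doubling time). Using 650,000,000 bases as the starting value is purely illustrative: the text quotes that number for a non-redundant subset and the doubling rate for GenBank, so combining the two is an assumption made only to show the arithmetic.

```python
def projected_size(initial_bases: float, years: float,
                   doubling_time_years: float = 1.0) -> float:
    """Project database size assuming steady exponential growth.

    size(t) = initial * 2 ** (t / doubling_time)
    """
    return initial_bases * 2 ** (years / doubling_time_years)


start = 650_000_000  # illustrative starting size in bases (assumption)
for year in range(6):
    print(year, f"{projected_size(start, year):,.0f}")
# year 5 -> 20,800,000,000
```

Ten years of annual doubling multiplies the volume by 2^10 = 1,024.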

Indian companies have business opportunities in all of these areas. Production, processing and annotation of data require a so-called ‘wet-lab’, that is, a conventional molecular biology laboratory. Annotation comes either from original research or from a study of the existing scientific literature, and is then appended to the relevant part of the sequence. The other areas can be addressed by IT companies that have a strong biology or bio-informatics team.

This naturally raises the obvious question: what is the background of the average bioinformatician? The answer is that there is no such creature as the average bioinformatician. The field is populated by people with backgrounds as diverse as physics, crystallography, molecular biology, computer science, mathematics, statistics, biochemistry, biotechnology, synthetic chemistry, stereochemistry and medical sciences.

A bioinformatician needs very strong skills in his/her own subject (from the list above) and also a cross-domain understanding of at least a few of the other areas. The bio-informatics unit at my company, for instance, comprises a team of domain experts in statistics, modeling, molecular biology, protein engineering and pharmaceutical science. Besides being an expert in his/her own field, each team member is aware of and understands the other domains.

Why is this so necessary? A biologist can use existing software tools but might misinterpret the results. For the biologist, the tool or software kit is a black box; the result appears magically once the inputs are keyed in. A biologist who can work around the limitations of existing software is at a great advantage.

At the same time, a computer scientist or programmer can produce innovative and/or efficient algorithms and tools, but these might lack biological relevance. A biological background is important to give development a proper focus and to keep it relevant to the biological system under scrutiny. Cross-domain training is essential to becoming a key contributor in this field.

Fresh entrants to the bio-informatics arena should beware of the ‘just a tool user’ stigma. To be a key player in bio-informatics, it is not enough to be proficient in using software tools developed by third parties and in annotating existing sequences. Both activities are low in the value chain and generate no intellectual property rights (IPR) for the company. It is of prime importance for Indian bio-informatics companies to move up the value chain and start providing value-added services such as data mining, and to develop better algorithms for it.

The creation of non-redundant, fully annotated databases is another good business opportunity. This entails setting up a sequencing lab for a specific organism (initially) and characterizing that organism completely in terms of its DNA sequence, the proteins it encodes and its regulatory and metabolic pathways. The databases are then offered on subscription to pharmaceutical companies or academic institutions, which constitutes the revenue model. The most attractive part of this business is the creation of intellectual wealth in the form of IPR, which is a sure way to move up the value chain in this field. These databases constitute ‘model systems’, which researchers use to estimate structure and function in the human system by correlation. Widely used model systems include the mouse, yeast and Drosophila.
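In its simplest sense, a ‘non-redundant’ database collapses identical sequences submitted more than once into a single entry while remembering every source. The sketch below shows that idea under two assumptions: redundancy means exact sequence identity (real non-redundant sets may also merge near-identical entries), and the identifiers are hypothetical.

```python
def make_non_redundant(entries: dict[str, str]) -> dict[str, list[str]]:
    """Collapse identical sequences into one entry.

    `entries` maps a (hypothetical) identifier to a DNA sequence. The result
    maps each distinct sequence to the sorted identifiers that carried it,
    so no information about where a sequence came from is lost.
    """
    merged: dict[str, list[str]] = {}
    for identifier, sequence in entries.items():
        key = sequence.upper()  # treat case differences as the same sequence
        merged.setdefault(key, []).append(identifier)
    return {seq: sorted(ids) for seq, ids in merged.items()}


raw = {"seq1": "ATGCCA", "seq2": "atgcca", "seq3": "ATGTTT"}
print(make_non_redundant(raw))
# {'ATGCCA': ['seq1', 'seq2'], 'ATGTTT': ['seq3']}
```

Keeping the list of source identifiers is what lets annotation from several submissions be merged onto one curated record.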

A major opportunity is the clinical research market. The many drug candidates identified through bio-informatics need to be clinically evaluated before they are approved, a long and extremely expensive process. Conducting these trials in India offers significant savings compared with the USA, since the qualified manpower needed for the large number of clinical and biochemical tests costs less. This should lead to a number of contract clinical research organizations that take contracts from multinationals to conduct clinical trials of their drug candidates. A developing trend is to use molecular simulation packages to screen out toxic drug candidates by in silico testing, which reduces the number of candidates to be tested. However, this technique can only eliminate unsuitable candidates; it cannot substitute for toxicity studies, since no regulatory body allows the statutory in vivo toxicity testing to be bypassed.
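The point that in silico testing can only eliminate candidates, never clear them, can be expressed as a one-way filter whose survivors still go on to in vivo testing. In the sketch below, the `Candidate` type, the predicted toxicity scores, the 0.5 threshold and the candidate names are all hypothetical; in practice the scores would come from a molecular simulation package.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    predicted_toxicity: float  # hypothetical score from a simulation package, 0..1


def in_silico_prescreen(candidates: list[Candidate],
                        threshold: float = 0.5) -> list[Candidate]:
    """Drop candidates predicted to be toxic; keep the rest for in vivo testing.

    This only narrows the list. Passing the screen does NOT establish safety:
    every surviving candidate still requires statutory in vivo toxicity studies.
    """
    return [c for c in candidates if c.predicted_toxicity < threshold]


pool = [Candidate("cand-A", 0.82), Candidate("cand-B", 0.15), Candidate("cand-C", 0.47)]
for c in in_silico_prescreen(pool):
    print(f"{c.name}: send to in vivo toxicity testing")
```

Note that the function has no branch that approves a candidate; the only possible outcomes are "eliminated" or "still needs testing".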

In conclusion, test the waters before you jump onto the bio-informatics bandwagon. Evaluate your business and revenue model carefully, and if you can generate intellectual wealth along the way, go full steam ahead and just do it!

The author K. Rajeshwar is General Manager-Business Development at bigtec Private Limited, a bio-informatics company in Bangalore. For comments or queries about the article, please e-mail the author.

Issue BG3 June01



Last Updated ( Thursday, 18 January 2007 )
 