|
Writing
an article that serves as a guide to potential entrants into
bio-informatics is not an easy task, for the simple reason that the
very term is misunderstood by many people both within and outside the
industry. The structure and content of this article reflect this, with
more emphasis on understanding what bio-informatics is. It is my
strong belief that a proper understanding of bio-informatics is a big
hurdle crossed for any potential entrant. Before I define
bio-informatics and detail the various streams within it, let me
give an idea of the opportunity we are talking about here.
According to industry forecasts, projected worldwide
revenues from bio-informatics and related activities alone are
estimated at US$ 25 billion. Predictably, it is the hot new buzzword on
the lips of venture capitalists and market watchers.
Let
us move beyond the hype for a moment and focus on what exactly
bio-informatics means. Unfortunately, there is no single universally
accepted definition of bio-informatics; here are a few definitions
used by various people:
1. “...bio-informatics has to do with management and the subsequent use of
   biological information, particular genetic information.”
2. “Information science has been applied to biology to produce the field
   called Bio-informatics.”
3. “the application of computer technology to the management of biological
   information; the melding of molecular biology with computer science”
4. “Bio-informatics: the body of tools, algorithms and know-how needed to
   handle complex biological information (the technological aspect).
   Computational biology: the application of bio-informatics tools to
   perform biological studies (the scientific aspect)”
5. “Bio-informatics is the creation of databases containing biological
   data and data mining for significant information from them”
6. “The organization and the analysis of biological data.”
While
most of them do define bio-informatics, they are not complete.
Personally, I prefer to define bio-informatics as the application of
computers to biology so as to enable, hasten or facilitate the process
of biological discovery.
Believe
me, this definition throws open more questions than answers.
“Information technology” and “data mining” don’t really mean too much
to a molecular biologist, and they don’t communicate the possibilities
that computers and IT create for researchers. And from the opposite
perspective, “biology” doesn’t mean much to a computer professional.
What is it that biological researchers do? What are they trying to find
out? How do they go about finding it out? And the obvious question,
what are the benefits of marrying information technology to biological
research, and why is bio-informatics the hot area that it is?
Bio-informatics is an umbrella term which covers topics like mapping of
the genome, evolutionary studies, sequencing, statistical analysis of
biopolymers, sequence comparisons, gene modeling, DNA sequence to DNA
structure, molecular modeling, structure comparisons,
sequences/structures to function, gene expression and genetic/metabolic
networks, database management, visualization and interaction and
comparing tools. Bio-informatics in common usage usually refers to
genomics, though it covers many other fields such as proteomics and
transcriptomics. The genome is the total DNA content of the cell,
that is, its total information content, and genomics is the study of
the genome. Similarly, proteomics is the study of the total protein
content of the cell (or organism) and transcriptomics is the study of
the total RNA content. In this article, I will try to shed some light
on genomics.
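To make the three terms concrete, here is a minimal Python sketch of how a stretch of DNA (the subject of genomics) is read into RNA (transcriptomics) and then into protein (proteomics). The toy sequence and the function names are made up for illustration, and the codon table is deliberately partial: it holds only the five codons the toy sequence uses.

```python
# Partial codon table from the standard genetic code: only the codons that
# appear in the toy sequence below ("*" marks a stop codon).
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "UGG": "W", "UAA": "*"}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy_gene = "ATGGCTAAATGGTAA"   # hypothetical sequence, not a real gene
mrna = transcribe(toy_gene)    # AUGGCUAAAUGGUAA
protein = translate(mrna)      # MAKW
print(mrna, protein)
```

A real tool would use the complete 64-codon table and would have to cope with both DNA strands, introns and sequencing errors; the sketch only shows the direction of information flow.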
The
Human Genome Project was a massive collaborative effort by the
governments of many countries, universities and a private organization
(Celera) to sequence human DNA. This gargantuan effort
was recently concluded a few years ahead of schedule. This
amazing achievement was possible only because of parallel sequencing by
robotic sequencers and computerized analysis of the results. The
complete sequence is now available to anyone from public databases on
the Internet. Now that we have the DNA sequence with us, what’s next?
The
DNA of any two human beings on the planet differs by only about 0.1%.
This accounts for all the genetic diversity, genetic predisposition and
disease conditions. The actual known information content (the coding
genes) at present constitutes only 3% of the genome and the remaining
97% does not have any known function.
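To put these percentages into absolute numbers, here is a small back-of-the-envelope calculation in Python. The genome size of roughly 3 billion base pairs is an assumption brought in for the arithmetic; it is not a figure given in this article, and the function name is made up.

```python
from fractions import Fraction

# Assumption (not from the article): the human genome is roughly
# 3 billion base pairs long.
GENOME_SIZE_BP = 3_000_000_000


def bases_for_percent(percent: str, genome_size: int = GENOME_SIZE_BP) -> int:
    """Convert a percentage of the genome into a number of bases (exact arithmetic)."""
    return int(Fraction(percent) / 100 * genome_size)


differing_bases = bases_for_percent("0.1")   # bases that differ between two people
coding_bases = bases_for_percent("3")        # bases in known coding genes
other_bases = GENOME_SIZE_BP - coding_bases  # bases with no known function

print(f"{differing_bases:,} {coding_bases:,} {other_bases:,}")
```

Under that assumption, a difference of "only 0.1%" still amounts to about three million bases, and the 3% with known coding function to about ninety million.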
The
goal of bio-informatics, in the aftermath of the Human Genome and other
genome projects, is to develop an understanding of how living things
are built from the DNA that encodes them. We need to know which parts
of the genome actually have a functional value, what that function is,
what the physiological implications of changes in the coding region
are and how to identify predisposition to disease conditions from the
genetic makeup of an individual. This is like finding a needle in a
haystack. Actually, finding the proverbial needle is easier because in
the case of the genome, the needle looks almost exactly like the hay!
The
challenge now is to decode the genome, and that is a difficult task.
To get an idea of how difficult: we still struggle to identify
unknown genes by computer analysis of genomic sequence. Though we have
leads on how to correlate and extend information from model systems to
the human system, we still cannot predict or model how a sequence of
amino acids folds into the specific three-dimensional structure of a
functional protein.
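As a rough illustration of what "identifying genes by computer analysis" involves, here is a deliberately naive Python sketch that scans a DNA string for open reading frames, that is, a start codon (ATG) followed in the same frame by a stop codon. The function name, the minimum-length parameter and the test sequence are made up for illustration; this is not how production gene finders work.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of forward-strand open reading frames.

    An ORF here is an ATG followed, in the same reading frame, by a stop
    codon. Only the forward strand is scanned; introns are ignored.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the candidate only if it has enough codons before the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy_dna = "CCATGGCTAAATGGTAACC"   # hypothetical sequence
orfs = find_orfs(toy_dna)
print(orfs, [toy_dna[s:e] for s, e in orfs])
```

Even this toy shows the difficulty: short ATG-to-stop stretches occur by chance all over a genome, so a scan like this throws up many candidates that are not genes, while real human genes are interrupted by introns that it cannot see at all.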
The
challenges are also immense in terms of the volume of data to be
handled. The amount of data in GenBank is now growing at an exponential
rate, and as data types beyond DNA, RNA, and protein sequence begin to
undergo the same kind of explosion, simply managing, accessing, and
presenting this data to users in an intelligible form is a critical
task. Bioinformaticians, who interact with the computers to do this,
need to work closely with academic and clinical researchers in the
biological sciences to manage such staggering amounts of data.
Besides
the volume of the data to be managed, the interlinking of the various
bits of data adds another dimension to the complexity of the issue. A
spot on a DNA array or DNA chip, for instance, conveys not only the
immediate information about its intensity, but also relates to many
strata of information about its genomic location, DNA sequence and the
coded protein’s structure and function. Creating easily navigable
information systems that allow the researcher to seamlessly follow
these links without getting lost in a deluge of irrelevant data is a
huge opportunity for Bio-informatics companies.
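The kind of linking described above can be sketched in a few lines of Python. Every record, identifier and field name below is hypothetical and invented for illustration; a real system would sit on top of curated databases rather than in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    protein_id: str
    function: str


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    position: int
    dna_sequence: str
    protein_id: str   # link to the protein this gene codes for


@dataclass(frozen=True)
class ArraySpot:
    spot_id: str
    intensity: float
    gene_id: str      # link to the gene this spot represents


# Hypothetical, made-up records standing in for real databases.
PROTEINS = {"P-demo": Protein("P-demo", "function not yet known")}
GENES = {"G-demo": Gene("G-demo", "chr1", 1000, "ATGGCTAAATGGTAA", "P-demo")}


def follow_links(spot: ArraySpot, genes: dict, proteins: dict) -> dict:
    """Walk from a spot to its gene and on to the coded protein."""
    gene = genes[spot.gene_id]
    protein = proteins[gene.protein_id]
    return {
        "intensity": spot.intensity,
        "chromosome": gene.chromosome,
        "position": gene.position,
        "dna_sequence": gene.dna_sequence,
        "protein_function": protein.function,
    }


spot = ArraySpot("spot-42", 1830.5, "G-demo")
report = follow_links(spot, GENES, PROTEINS)
print(report)
```

The point of the sketch is the chain of identifiers: the spot itself stores only an intensity and a gene reference, and everything else is reached by following links, which is exactly where a researcher can get lost if the information system is poorly designed.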
Lastly,
each gene in the genome isn’t a stand-alone unit. Bio-chemical pathways
are formed by the interaction of multiple genes, which in turn feed
into other pathways. The biochemical flux and the physiological
condition of the cell at any point are governed by the external
environment (like pathogens, temperature, chemical stimuli etc.) as
much as the genomic instruction set. Combining the genomic and
biochemical data into a quantitative and predictive model will be the
work of an entire generation of bio-informaticians. To accomplish the
aforementioned tasks manually would take decades. For example, it is
estimated that completely characterizing a single protein by
conventional means would take 40 years! The application of Information
Technology makes it possible to achieve this mammoth job.
Having
cleared the background to some extent, let me focus on what
bio-informatics can do. Bio-informatics can contribute to a better
understanding of molecular evolution, the origin of life, genomics and
proteomics (for example, the Human Genome Project), theoretical
biology, complexity and information theory, bio-technology, lead drug
discovery (lead informatics) and computing with biomolecules (DNA
computers). Scientists hope that, as the secrets of the genome are
unraveled, we will gain a deeper understanding of what causes disease
at the genetic and protein level, which drug targets can be aimed for
and how to design drugs to hit these targets, leading eventually to
personalized healthcare.
With
the scope and relevance of bio-informatics covered, let me now move on
to briefly describing the various business opportunities in the sector
for potential entrants. New sequence information is being produced at a
staggering rate. A non-redundant subset of the major sequence databases
contains over 650,000,000 bases of DNA sequence. The contents of
GenBank, which is a public database of nucleotide sequences including
the human genome sequences, double every year. How do we handle all the
information?
This information has to be Produced (sequencing), Processed (put in the
appropriate format), Stored (as a database), Shared (on the Internet
usually), Queried (to retrieve useful information), Retrieved,
Visualized (preferably in a graphical manner which highlights trends,
similarities etc.), Annotated (ancillary information about the gene,
its protein product, its function and structure is entered here) and
Curated (entered as a complete record with all relevant details,
position in the genetic or protein network).
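To see what "doubling every year" means for anyone planning storage and query capacity, here is a short Python projection. Applying the yearly doubling to the 650,000,000-base figure is an assumption made purely for illustration, since that figure refers to a non-redundant subset rather than to GenBank itself; the function name is also made up.

```python
def projected_bases(current_bases: int, years: int) -> int:
    """Project database size assuming it doubles once every year."""
    return current_bases * 2 ** years


# Starting point taken from the figure quoted in the article; using it as
# the base for the doubling rate is an illustrative assumption.
START_BASES = 650_000_000

projection = {years: projected_bases(START_BASES, years) for years in (1, 5, 10)}
for years, bases in projection.items():
    print(f"after {years:2d} years: {bases:,} bases")
```

At that rate the data grows roughly a thousandfold in ten years, which is why every stage in the list above, from storage to querying to visualization, has to be designed with growth in mind.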
Indian
companies have business opportunities in all of these areas. Production,
processing and annotation of data require the so-called ‘wet-lab’,
which is the conventional molecular biology laboratory. Annotation
results either from original research or from the study of
existing scientific literature, and is then appended to the relevant
part of the sequence. The other areas can be addressed by companies
with an IT background that have a good biology or bio-informatics team.
This
naturally raises the obvious question: what is the background of the
average bioinformatician? The answer is that there is no such creature
as the average bioinformatician. The field is populated with people
with backgrounds as diverse as physics, crystallography, molecular
biology, computer science, mathematics, statistics, biochemistry,
biotechnology, synthetic chemistry, stereochemistry and medical
sciences.
A
bioinformatician needs to have very strong skills in his/her own subject
(from the list above) and also a cross-domain understanding of at least
a few of the other areas. My company, for instance, has a team of domain
experts in statistics, modeling, molecular biology, protein engineering
and pharmaceutical science who make up the bio-informatics unit. Each of
the team members is aware of and understands the other domains besides
being the expert in his/her field.
Why
is this so necessary? A biologist can use existing software tools but
might misinterpret the results. For the biologist, the tool or the
software kit is like a black box; the result appears magically once you
key in your inputs. A biologist who has the ability to work around the
limitations of existing software would be at a great advantage.
At
the same time a computer scientist or a programmer can produce
innovative and/or efficient algorithms and tools, but these might lack
biological relevance. A biological training or background is important
to give the development a proper focus and to maintain relevance to the
biological system under scrutiny. Cross-domain training is extremely
important for anyone who wants to be a key contributor in this field.
Fresh
entrants to the Bio-informatics arena should beware of the ‘just a tool
user’ stigma. To be a key player in Bio-informatics, it is not
sufficient to be proficient in using the software tools developed by
third parties and in annotating the existing sequences. These
activities are both quite low in the value chain and do not generate
any IPR for the company. It is of prime importance for Indian
Bio-informatics companies to move up the value chain and start
providing value-added services such as data mining and the development
of better algorithms for it.
The creation of non-redundant, fully
annotated databases is also a good potential business opportunity. This
entails setting up a sequencing lab for a specific organism
(initially) and characterizing it completely in terms of the DNA
sequence, the coded proteins and the regulatory and metabolic pathways.
These databases are then offered on subscription to pharmaceutical
companies or academic institutions at a price, which constitutes the
revenue model. The most attractive part of this business is the creation
of intellectual wealth in the form of IPR, which is a sure-shot way to
move up the value chain in this field. These databases constitute
‘model systems’, which are used by researchers to estimate structure and
function in the human system by correlation. Some widely used model
systems are the mouse, yeast and Drosophila systems.
A
major opportunity is the clinical research market. The spate of drugs
identified by bio-informatics needs to be clinically evaluated
before the drugs get approval. This is a long and extremely
expensive process. There are significant savings in conducting these
trials in India as compared to the USA, since the cost of qualified
manpower for the large number of clinical and biochemical tests that
need to be performed is lower. This should lead to the emergence of a
number of contract clinical research organizations, which take
contracts from multinationals to conduct clinical trials for their drug
candidates. A developing trend is to use molecular simulation packages
to screen out toxic drug candidates by in silico testing, which reduces
the number of drug candidates to be tested. However, this technique can
be used only to eliminate unsuitable candidates and cannot be used as a
substitute for toxicity studies, since no regulatory body allows
bypassing the statutory in vivo toxicity testing.
In
conclusion, test the waters before you jump onto the Bio-informatics
bandwagon. Evaluate your business and revenue model carefully, and if
you can generate intellectual wealth on the way – go full steam ahead
and just do it!
The
author, K. Rajeshwar, is General Manager-Business Development at bigtec
Private Limited, a bio-informatics company in Bangalore.
Issue BG3 June01
|