[Narrator:] As we celebrate 20 amazing years of research, discovery, and innovation at the National Center for Biotechnology Information, the past is prologue, and the future sparkles with the dew of excitement and expectation, which coalesce in NCBI’s ability to create critical resources bridging basic research and human health. Biology is riding a wave of revolutionary advances in technology, including such big science projects as the Human Genome Project and, now, large-scale worldwide collaborations that study the complexity of variation within the human genome. NCBI is meeting these challenges with databases such as GenBank and, more recently, the database of Genotypes and Phenotypes, dbGaP.
[Donald Lindberg:] May be the most important set of experiments of the century.
[Narrator:] dbGaP links the genetic background of thousands of individuals with their clinical information.
[Elizabeth Nabel:] When we were formulating the Framingham SHARe Program, in which we wanted to genotype the 10,000 participants in the Framingham Heart Study, we knew that we had 60 years of clinical information that we had gathered… and we wanted to put the genotypes and phenotypes together in a common database.
[Jim Ostell:] In a sense, by putting these all into a common database, we’re making a more and more powerful telescope: by taking them all together and doing statistics on them, you see more than you can see from any single one.
[Francis Collins:] The existence of the dbGaP database is making all of this information rapidly accessible to qualified investigators to begin to do the hard work of understanding how this all works.
[David Lipman:] Where the rubber meets the road is really making an impact on healthcare, and to be at that stage… with what seems like a basic research group, but now we’re looking at how these things can affect human health. It’s very exciting.
[Lindberg:] NCBI today is very different from what it was in the beginning, roughly 20 years ago. It’s grown from 12 employees to nearly 500. And the story, I suppose, really begins with planning in 1985.
[Narrator:] Yes, 1985. The National Library of Medicine’s long-range plan endorses both the concept of the Library taking an active role in biotechnology information and the creation of a national center. As NLM gears up to build the information systems that will be needed, under the leadership of Dan Masys, director of the Lister Hill Center, Dennis Benson is appointed to build the framework for the future national center. Numbering only a handful, Benson’s team begins linking Medline to the nucleotide databases, and at about the same time, on Capitol Hill, the words ‘biotechnology information’ are beginning to resonate in the halls of Congress. On March 6th, 1987, Representative Claude Pepper introduces H.R. 393, the bill that will create NCBI. Pepper notes: “We are dealing with nothing less than the mystery of human life and the unfolding scroll of knowledge, seeking to penetrate that mystery, which is life itself.”
[David Obey:] And Bill Natcher and those of us on the Appropriations Committee supported the project. The magnitude of the Genome Project was nothing short of Jack Kennedy’s vision of landing a man on the moon. It helped launch a grand national challenge of utmost importance to human health.
[Narrator:] A little over a year later, on November 4th, 1988, the efforts of Pepper and key members of the House and Senate, including Henry Waxman, William Natcher, David Obey, Lawton Chiles, and Edward Kennedy, pay off with the signing of H.R. 393 by President Reagan.
[Obey:] NIH Director Jim Wyngaarden told me during hearings that year that we had sequencing data on less than one-tenth of one percent of the human genome. He said then that while biology research was rapidly accelerating, the ability to analyze and share information was severely constrained; if we were going to gain an understanding of disease, we would need new and better information approaches.
[Narrator:] In creating the National Center for Biotechnology Information, the legislation lays out its mission: to conduct basic research in computational biology, to build databases, and to provide worldwide access to its databases. A young researcher at the National Institute of Diabetes and Digestive and Kidney Diseases, Dr. David Lipman, is chosen to lead NCBI. Lipman is one of the key developers of the FASTA algorithm, which revolutionized sequence comparison. Once in place at NCBI, Lipman quickly names his three branch chiefs: Dennis Benson, James Ostell, and David Landsman.
[David Landsman:] And it’s been a wonderful experience being involved with some outstanding scientists from all over, but that growth process, starting as something that’s very, very small and growing more than tenfold, has been great to watch.
[Narrator:] Within a year, Lipman and his core group have settled upon a strategic plan with three principles: creating GenBank DNA sequence records from the journal literature; linking Medline citations to GenBank and other biology databases; and using a standard representation schema, ASN.1, for all biological objects in the NCBI databases. In 1990, the introduction of BLAST, the Basic Local Alignment Search Tool, makes it easy to rapidly scan huge sequence databases for similar sequences: even against millions of known sequences, the closest matches are found in a matter of seconds.
[Landsman:] That’s one of the tools that have been developed in the Computational Biology Branch and used in a very productive way as a resource by the NCBI.
[Narrator:] BLAST becomes so widely used and pervasive in the scientific community that the word is used as both a noun and a verb. A year later, in 1991, NCBI debuts the Entrez search and retrieval system, a software toolkit that enables a user to rapidly search several hundred megabytes of sequence and literature data using techniques that are fast and intuitive. In 1992, NCBI becomes home to GenBank, NIH’s genetic sequence database. As such, NCBI works closely with partners at the European Bioinformatics Institute in the UK and the DNA Data Bank of Japan, in a truly international effort that continues today. By October 1993, there are 50,000 Medline records connected to the sequence database, and by December of that year, the number has tripled to 150,000. About a year later, Network Entrez debuts with 500,000 Medline records, and it is extremely popular. Then, in 1997, when it’s clear that the internet is “the” way to get information out, it’s time for major change.
[Lipman:] I went to see Dr. Lindberg about the possibility of taking Medline, which had been available for a very low fee in a variety of ways, and making it freely available online, on the Web, as PubMed.
[Narrator:] The idea of “free Medline” quickly finds favor on Capitol Hill.
[Arlen Specter:] And I’m delighted to see that the superhighway of medical information will now become a freeway.
[Tom Harkin:] The latest medical breakthroughs will literally be at the fingertips of American families, and that is a breakthrough in itself, which will directly improve human lives.
[Al Gore:] I daresay that this development by itself may do more to reform and improve the quality of healthcare in the United States than anything else we’ve done in a long time.
[Lipman:] Having your parents get a flu shot. And you kind of go back and forth: should I really bother? Should I encourage my parents to do that? So let’s look at… boy, look what he’s trying to do here. Should I get a flu shot? That’s not going to work.
[Gore:] The term ‘should’ was not found.
[Lipman:] Wait a second. Wait, look, we’ve got a good query right here.
[Gore:] Where?
[Lipman:] Vaccination against…
[Gore:] Vaccination against influenza in elderly persons. Okay. Pretty close.
[Lipman:] Let’s try that one.
[Gore:] No, let’s see related articles.
[Lipman:] You would have found that flu shots are one of the most important healthcare interventions for the elderly.
[Gore:] Well, I’m very impressed, and I’m even more impressed that when I went off your script and said ‘should I get a flu shot,’ it found a way into it. So…
[Lipman:] You had me nervous now.
[Narrator:] In 1999, with the Human Genome Project in full swing, sequence data explodes, and NCBI develops a suite of specialized databases and tools, such as LocusLink, RefSeq, and Microbial Genomes.
[Narrator:] Interest in individual variation soars, and NCBI introduces its Gene Expression Omnibus (GEO) and its dbSNP database, which identifies sites where the genome commonly varies by a single letter.
[Collins:] The genome, much of it catalyzed by the wonderful folks at NCBI, who have had all of these remarkable abilities to archive and display the data and, in fact, think about it.
[Narrator:] PubMed Central debuts in 2000. It’s a free, full-text digital archive of biomedical and life sciences journal literature.
[Lipman:] The richest information about biological function right now is really in the literature. It’s not in specialized databases. It’s not in some mathematical framework like quantum mechanics… it’s in the papers that biologists are writing, and the easiest way to learn about that is to read the papers.
[Obey:] And we hope that our recent requirement that NIH-funded research be made available through NCBI will get even more information out there to enhance the discovery process and the public health.
[Narrator:] One of the prime movers in the campaign for “open access” to published scientific literature, Dr. Richard Roberts (1993 Nobel laureate), calls PubMed Central the “GenBank” of the published literature.
[Roberts:] I think we all know that if you want to work at the cutting edge of science, you’ve got to know where it is… and the only way you know where it is, is by reading the literature. Access, this idea that everybody should have access, is key.
[Narrator:] In 2004, as the new millennium gains momentum, NCBI continues to enhance the understanding of our genetic legacy and its role in health and disease by linking its genetic data to a new database of information on how chemicals affect biological processes. As part of the NIH Molecular Libraries Program, NCBI develops the PubChem project. The idea is to provide information on the biological activities of small molecules.
[Bryant:] The real heart of PubChem is the links from each chemical to other sources of information about its biological properties.
[Narrator:] Of prime importance to the NCBI mission is a constant quest for better, newer, faster, more efficient ways of searching databases and extracting, or “mining,” the information within the text.
[Obey:] Today molecular biology and genomics have become the primary drivers of medical progress. Under the leadership of Dr. David Lipman, NCBI’s key molecular biology information resources are empowering hundreds of thousands of researchers around the world in the race to identify disease-specific genes, as well as strategies for treating and preventing disease. In fact, I’m told that researchers are downloading data equivalent in size to the contents of the entire Library of Congress every week. That’s amazing.
[Lipman:] With the new technologies, the new sequencing technologies, we’re talking about an amazing jump in the amount of data… and what that really means, and what that’s going to lead to, is a very different kind of thing we can do with sequencing than we ever did before. They talk about that as being a challenge, and we do see it as a challenge, but we also see it as an amazing opportunity.
[Obey:] The era of personalized medicine, targeted individualized treatment, will soon be here, and NCBI is clearly a major force in making that a reality.