This page contains a growing list of nucleotide, protein and genomic databases located throughout the world. Each database is listed with a description and a hot-link to a site where it can be searched. Currently, the only way to find a database is to scroll through the alphabetical listing; a better search mechanism will be implemented soon.
The Amino Acid Index Database
is a collection of published indices, together with the results of a cluster analysis that uses the correlation coefficient as the distance between two indices. Because the 20 naturally occurring amino acids have various physicochemical and biochemical properties, many sets of numerical values reflecting these properties have been reported; we call each such set an amino acid index. Click here to connect to the Amino Acid Index Database.
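The cluster analysis starts from the correlation coefficient between two indices, each of which assigns one value to each of the 20 amino acids. The Python sketch below computes the Pearson correlation coefficient for two indices stored as dictionaries keyed by one-letter code. That representation and the function name are our own assumptions, not the AAindex file format, and the clustering step itself is not shown.

```python
import math

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def index_correlation(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation coefficient between two amino acid indices.

    Each index is a (hypothetical) mapping from one-letter code to value.
    """
    xs = [a[aa] for aa in AMINO_ACIDS]
    ys = [b[aa] for aa in AMINO_ACIDS]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Because the coefficient ignores scale and offset, an index rescaled by a positive factor correlates perfectly (r = 1) with the original, while its negation gives r = -1.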

A Caenorhabditis elegans DataBase (ACEDB)
contains data from the Caenorhabditis Genetics Center (funded by the NIH National Center for Research Resources), the C. elegans genome project (funded by the MRC and NIH), and the worm community. ACEDB is the database program written for the nematode genome project. Click here to connect to ACEDB.

BLOCKS
are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks for the BLOCKS database are made automatically by looking for the most highly conserved regions in groups of proteins represented in the PROSITE database. These blocks are then calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the BLOCKS database. Click here to connect to BLOCKS.
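To illustrate how an ungapped block can score a new sequence, here is a simplified Python sketch. It turns a block into a table of per-column log-odds scores and slides that table along a query without gaps. The uniform background frequencies, the pseudocount of 1 and the function names are assumptions for illustration. The real BLOCKS searches also use sequence weighting and the calibration against SWISS-PROT described above.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def block_to_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Turn an ungapped block (equal-length segments) into log-odds columns.

    Assumes a uniform background of 1/20 per residue (a simplification).
    """
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("block segments must all have the same length")
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in ALPHABET}
        for segment in block:
            counts[segment[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in ALPHABET})
    return pssm


def best_block_hit(pssm: list[dict[str, float]], query: str) -> tuple[int, float]:
    """Slide the block along the query without gaps; return (offset, score)."""
    width = len(pssm)
    best = (-1, float("-inf"))
    for start in range(len(query) - width + 1):
        score = sum(pssm[i][query[start + i]] for i in range(width))
        if score > best[1]:
            best = (start, score)
    return best
```

For a hypothetical block of the three segments GKST, GKSS and GKTT, scanning the query AAGKSTAA reports the window at offset 2 as the best hit.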

Click here to connect to CPGISLE.

The Dali server is a network service for comparing protein structures in 3D. You submit the coordinates of a query protein structure and Dali compares them against those in the Protein Data Bank. A multiple alignment of structural neighbours is mailed back to you. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. Click here to connect to DALI.

dbEST : A division of GenBank for cDNA sequence and mapping data
(Boguski et al., 1993) is an NCBI resource, now in its third year of operation, that contains sequence and mapping data on partial, "single-pass" cDNA sequences or Expressed Sequence Tags (Adams et al., 1991). Although dbEST sequences are incorporated into the new EST Division of GenBank, annotation in dbEST is more comprehensive and includes detailed contact information about the contributors, genetic map locations (when available), and instructions on obtaining physical DNA clones from the American Type Culture Collection and other sources. In addition, NCBI periodically updates putative homology assignments using the BLAST family of programs (Altschul et al., 1994). Click here to connect to dbEST.

dbSTS : A public database of "Sequence Tagged Sites"
is a new NCBI resource that contains sequence and mapping data on short genomic landmark sequences or Sequence Tagged Sites (Olson et al., 1989). Although dbSTS sequences are to be incorporated into the new STS Division of GenBank, annotation in dbSTS is more comprehensive and includes detailed contact information about the contributors, experimental conditions and genetic map locations. In addition, NCBI periodically updates putative homology assignments using the BLAST family of programs (Altschul et al., 1994). Click here to connect to dbSTS.

The Developmental Studies Hybridoma Bank,
under the auspices of the National Institute of Child Health & Human Development, was established in 1986 to supply investigators with monoclonal antibodies useful for studies in developmental and cell biology. They may be ordered as tissue culture supernatants, ascites, or partially purified immunoglobulin; selected hybridomas are also available. The DSHB is jointly administered by J. Thomas August of The Johns Hopkins University School of Medicine and David R. Soll of The University of Iowa. Click here to connect to the Developmental Studies Hybridoma Bank Database.

DNA Data Bank of Japan (DDBJ)
does not have a web page but can be accessed via e-mail, FTP, and WAIS.

The DSSP (Database of Secondary Structures)
is a database of secondary structure assignments and related data, such as solvent exposure, for all of the entries in the Protein Data Bank (PDB). The DSSP program defines the secondary structure and solvent exposure of proteins, given atomic coordinates in Protein Data Bank format. The program does not predict protein structure; it only assigns secondary structure from known coordinates. Click here to connect to DSSP.
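The central step in assigning secondary structure from coordinates is finding backbone hydrogen bonds. The published DSSP method (Kabsch and Sander, 1983) does this with an electrostatic energy between a C=O group and an N-H group. The Python sketch below computes only that energy and the -0.5 kcal/mol cutoff. The function names are our own. Hydrogen positions are simply passed in, although crystal structures usually lack them and DSSP has to estimate them. Building helices and strands from the resulting hydrogen-bond pattern is not shown.

```python
import math

Vec = tuple[float, float, float]

# Electrostatic model: partial charges 0.42e (C=O) and 0.20e (N-H),
# dimensional factor 332, distances in angstroms, energy in kcal/mol.
Q1_Q2_F = 0.42 * 0.20 * 332.0
HBOND_CUTOFF = -0.5  # kcal/mol


def hbond_energy(c: Vec, o: Vec, n: Vec, h: Vec) -> float:
    """Energy between the C=O of one residue and the N-H of another."""
    return Q1_Q2_F * (
        1.0 / math.dist(o, n)
        + 1.0 / math.dist(c, h)
        - 1.0 / math.dist(o, h)
        - 1.0 / math.dist(c, n)
    )


def is_hbond(c: Vec, o: Vec, n: Vec, h: Vec) -> bool:
    """A hydrogen bond is assigned when the energy is below the cutoff."""
    return hbond_energy(c, o, n, h) < HBOND_CUTOFF
```

For a roughly linear C=O...H-N arrangement with an O...N distance of 2.9 angstroms, the energy is about -2.9 kcal/mol, well inside the cutoff. Moving the N-H group some 20 angstroms away brings it close to zero.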

EC-ENZYME
(also called ENZYME) is a repository of information relating to the nomenclature of enzymes. It is based primarily on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided. Click here to search the EC-ENZYME database in North America or here to search the original ENZYME Database in Europe.
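An EC number has four dot-separated levels: class, subclass, sub-subclass and serial number. Partial numbers are commonly written with "-" for the unspecified levels, as in 3.4.-.-. The Python sketch below validates an EC number and names its top-level class. The function names are hypothetical and are not part of the ENZYME distribution.

```python
import re

# Top-level EC classes as defined by the IUBMB Nomenclature Committee
# (class 7, translocases, was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

# Four levels; levels 2-4 may be "-" when unspecified.
_EC_RE = re.compile(r"(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)")


def parse_ec(ec: str) -> tuple[str, str, str, str]:
    """Split an EC number such as '1.1.1.1' or '3.4.-.-' into its four levels."""
    match = _EC_RE.fullmatch(ec.strip())
    if match is None:
        raise ValueError(f"not a four-level EC number: {ec!r}")
    ec_class_, subclass, subsubclass, serial = match.groups()
    if int(ec_class_) not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {ec!r}")
    return ec_class_, subclass, subsubclass, serial


def ec_class(ec: str) -> str:
    """Name of the top-level class of an EC number."""
    return EC_CLASSES[int(parse_ec(ec)[0])]
```

For example, ec_class("1.1.1.1") (alcohol dehydrogenase) returns "Oxidoreductases", and ec_class("3.4.-.-") returns "Hydrolases".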

E.coli database collection (ECDC)
This collection contains all the information on the entire E. coli K12 chromosome that we could obtain. The collection is organized like a genetic map. You can search for names or for particular map positions by scanning the respective lists, and you can also scan a list of promoter and terminator structures. Within the genetic map you can move in either direction within each 5-minute interval. You can retrieve a complete sequence contig, which may be built from several individual sequence files. Coding sequences (CDS) are provided for each individual gene, for putative open reading frames (ORFs), and for RNAs other than mRNA. Each individual gene or structural element is described in as much detail as possible. Coding sequences and contigs are generated at Giessen, while EMBL and SwissProt files are retrieved directly via WWW connections, as are the data from all other databases reachable through these links. ECDC is still experimental, and we will keep adding new features. Click here to connect to ECDC.

The EMBL Nucleotide Sequence Database
is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications, and submitted directly by researchers and sequencing groups. The database is produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ). Each of the three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups daily. Click here to search the EMBL Database.

The Eukaryotic Promoter Database (EPD)
was designed and developed at the Weizmann Institute of Science in Rehovot (Israel) and is currently maintained at ISREC in Epalinges s/Lausanne (Switzerland). EPD is a specialized annotation database of the EMBL Data Library. It provides information about eukaryotic promoters available in the EMBL Data Library and is intended to assist experimental researchers, as well as computer analysts, in the investigation of eukaryotic transcription signals. The present version originated from a previous compilation and is organized as a hierarchically ordered and documented "functional position set" pointing to transcription initiation sites. All information is directly abstracted from scientific literature and is thus independent of the EMBL sequence entry descriptions. As a consequence, many of the initiation sites referred to in EPD do not appear in corresponding EMBL feature tables. Click here to connect to the Eukaryotic Promoter Database (EPD).

FlyBase
is a comprehensive database of information on the genetics and molecular biology of Drosophila. It includes data from the Drosophila Genome Projects and data curated from the literature. FlyBase is a joint project with the Berkeley and European Drosophila Genome Projects.

Click here to connect to FLYCLONES.

Click here to connect to FLYGENE.

Click here to connect to FLYPEOPLE.

Click here to connect to FLYREFS.

The FSSP database (Families of Structurally Similar Proteins)
is a database of structural alignments of proteins in the Protein Data Bank (PDB). The database currently contains an extended structural family for each of 330 representative protein chains. Each data set contains structural alignments of one search structure with all other significantly structurally similar proteins in the representative set (remote homologs, <30% sequence identity). It also includes all structures in the Protein Data Bank with 30-70% sequence identity to the search structure (medium homologs). Very close homologs (above 70% sequence identity) are excluded because they rarely show marked structural differences. The alignments of remote homologs are the result of pairwise all-against-all structural comparisons in the set of 330 representative protein chains. All such comparisons are based purely on the 3D coordinates of the proteins and are derived by automatic (objective) structure comparison programs. The significance of structural similarity is estimated from statistical criteria. Click here to connect to FSSP.
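The three homology bands are defined by pairwise sequence identity. The Python sketch below classifies a pair of aligned sequences into those bands. It rests on two assumptions. First, identity is counted over aligned columns in which neither sequence has a gap ("-"), which is one of several conventions in use. Second, exactly 30% and 70% are treated as medium homologs. FSSP's own identity definition and boundary handling may differ, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def fssp_category(identity: float) -> str:
    """Map percent identity to the homology bands described for FSSP."""
    if identity > 70.0:
        return "very close (excluded)"
    if identity >= 30.0:
        return "medium homolog"
    return "remote homolog"
```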

GenBank
is the NIH's database of all known nucleotide and protein sequences, including supporting bibliographic and biological information. As of Release 89.0 in June 1995, GenBank contained over 318,000,000 nucleotide bases from over 425,000 different sequences. Entries include a concise description of the sequence, the scientific name and taxonomy of the source organism, and a table of features specifying coding regions and other sites of biological significance. As part of the feature table, protein translations for coding regions are included. GenBank has been the responsibility of the National Center for Biotechnology Information (NCBI) since October 1992. The NCBI is part of the National Library of Medicine (NLM), which is in turn part of the National Institutes of Health. Click here to connect to a form-based query of GenBank.

GenQuest - The Q server
An integrated interface to the sequence comparison server at Oak Ridge National Laboratory. It is designed for rapid and sensitive comparison of DNA and protein sequences against existing DNA and protein sequence databases, and for rapid retrieval of the full database entry for any sequence found during a search. You can choose FASTA, BLAST, or Smith-Waterman searches. Click here to connect to GenQuest.
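Of the three search methods offered, Smith-Waterman is the simplest to show in a few lines. It fills a dynamic-programming table in which every cell is floored at zero, so the best-scoring local alignment can start and end anywhere in either sequence. The Python sketch below uses a simple match/mismatch score and a linear gap penalty. These are arbitrary choices for illustration: production search servers use substitution matrices and affine gap penalties, and this is not GenQuest's code.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Uses a simple match/mismatch score and a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_pos
    out_a: list[str] = []
    out_b: list[str] = []
    while H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, aligning TTACGTAA with GGACGTCC finds the shared core ACGT with a score of 8. Extending the alignment in either direction would only add mismatches.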

The Genome Sequence Data Base (GSDB)
is dedicated to supporting scientific research and development by creating, maintaining and distributing a complete, timely, accurate and useful collection of DNA sequences and related information. As an on-line, client-server, relational database, GSDB operates as part of the DOE federated information infrastructure and focuses on meeting the needs of the major genome sequencing laboratories. GSDB at NCGR is a direct outgrowth of the Los Alamos National Laboratory component of GenBank. The original project was first conceived by Walter Goad at the Los Alamos National Laboratory (LANL) in Los Alamos, New Mexico in 1979. From 1982-1992 LANL performed data collection as part of the GenBank contract from NIH/NIGMS. In 1992-1993 LANL performed data collection for NCBI. In 1993 the LANL effort became the Genome Sequence Data Base (GSDB). In August 1994, GSDB moved from LANL to the National Center for Genome Resources in Santa Fe, New Mexico. Click here to connect to GSDB.

The GDB Human Genome Data Base
supports biomedical research, clinical practice, and professional and scientific education by providing human gene mapping information through GDB and genetic disease information through OMIM. GDB is maintained as a relational database. The data are stored in many different tables, which represent nine primary data objects. The Navigational Map graphically displays the links between the data in these tables and the links to the data in OMIM. There are also links to the Enzyme Data Bank via EC numbers and to the Genome Sequence Data Base (GSDB) via DNA sequence accession numbers. Click here to connect to GDB.

HIV Sequence Database
collects, curates, analyzes, and publishes genetic sequences of HIV-1, HIV-2, SIV and related animal retroviruses, as well as host cellular proteins. The database is located at Los Alamos National Laboratory in the Theoretical Biology and Biophysics Group and is funded by the Division of AIDS of the National Institute of Allergy and Infectious Diseases through an interagency agreement with the Department of Energy. Recently, the same group created a sister database, the HIV Molecular Immunology Database, that provides a comprehensive listing of defined HIV epitopes.

The HSSP database (Homology-derived Secondary Structure of Proteins)
is a database of homology-derived secondary structure of proteins. It was designed by Chris Sander and Reinhard Schneider. The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Tertiary structures of the aligned sequences are implied, but not modelled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues and in modelling three-dimensional detail by homology. Click here to connect to HSSP.
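Observation (4) can be made concrete as a length-dependent identity threshold, often called the HSSP curve. The Python sketch below uses the power law t(L) = 290.15 * L^-0.562 percent, held constant beyond 80 residues. These constants are our recollection of the curve published by Sander and Schneider (1991) and are not given on this page, so check them against the original paper before relying on the numbers. The 10-residue minimum and the function names are also assumptions.

```python
# Assumed constants: our recollection of the HSSP curve of Sander and
# Schneider (1991). Verify against the original paper before use.
HSSP_A = 290.15
HSSP_B = -0.562
PLATEAU_LENGTH = 80  # curve held constant for longer alignments
MIN_LENGTH = 10      # assumed lower limit; very short alignments are not scored


def hssp_threshold(length: int) -> float:
    """Minimum percent identity for inferring structural homology."""
    if length < MIN_LENGTH:
        raise ValueError("alignment too short for the HSSP curve")
    effective = min(length, PLATEAU_LENGTH)
    return HSSP_A * effective ** HSSP_B


def is_structurally_homologous(identity: float, length: int) -> bool:
    """True if the identity lies on or above the curve for this alignment length."""
    return identity >= hssp_threshold(length)
```

With these constants, short alignments need much higher identity. The threshold is roughly 54% for a 20-residue alignment and levels off near 25% for alignments of 80 residues or more.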

The Ligand Chemical Database for Enzyme Reaction (Ligand Chemical Database)
is constructed to provide the linkage between the amino acid sequences of enzymes and the chemical compounds that the enzymes recognize as ligands. Entries are based on the IUB (International Union of Biochemistry) classification. For amino acid sequences, the PIR entry codes are given at the end of each entry. For enzymes whose 3D structures have been determined, PDB entry codes are also included. Click here to connect to The Ligand Chemical Database for Enzyme Reaction (Ligand Chemical Database).

LiMB
contains information about the contents and details of maintenance of databases related to molecular biology. It was created to facilitate the process of locating and accessing data sets upon which the research community depends; we believe it will also be of use to those who are doing research in designing and linking these databases. Click here to connect to LiMB.

LITDB
covers literature on molecular aspects of proteins from about 1000 journals. Entries are indexed with keywords and key phrases for compounds, factual data and subjects; some keywords are abbreviated as listed in the abbreviation table. Compiled by PRF. Click here to connect to LITDB.

Mouse Genome Database (MGD)
provides a comprehensive source of information on the experimental genetics of the laboratory mouse. MGD includes information on mouse loci, homologies, probes and clones, PCR primers, and experimental marker mapping data such as strain distribution patterns for RIs and cross haplotypes. MGD is integrated with the Encyclopedia of the Mouse Genome. The integration features are optional. If you activate the option to "link" to a browser and open a map in the Encyclopedia, the browser automatically starts up so that you have the map and a browser window displayed on your screen. It is funded by NIH.

The Medline databank on this site is the subset of sequence related entries that comes with the distributed version of Entrez on CD-ROM. Click here to connect to MEDLINE.

NRL_3D
is a sequence-structure database derived from the three-dimensional structures of proteins deposited in the Brookhaven National Laboratory's Protein Data Bank. This database was conceived, developed and tested at the Naval Research Laboratory, Washington, DC, and incorporated into the PIR System at the National Biomedical Research Foundation, Washington, DC. Click here to connect to NRL_3D.

NRSUB
Non-redundant Bacillus subtilis sequences, stored in EMBL format. Click here to connect to NRSUB.

OMIM
is the continuously updated online version of Dr. Victor A. McKusick's book Mendelian Inheritance in Man (MIM). Because this knowledge base is updated daily, the entries may differ from the most recently published edition of the book. Click here to connect to OMIM.

The PDBSELECT database
is a subset of the structures in the PDB that contains no highly homologous sequences. It was designed by Uwe Hobohm and Chris Sander. PDB_SELECT provides representative lists of PDB chain identifiers. These lists are intended for anyone interested in working with currently known protein structures. They save time and effort by offering a representative selection that is currently about five to six times smaller than the entire database. Typical uses are introductory browsing, analysis of protein architecture, development of prediction methods, and model building by modular construction. To use the lists, you need access to data sets from the Protein Data Bank and to software that reads protein structure files. Click here to connect to PDBSELECT.
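A representative set of this kind can be built greedily. Go through the chains in order of preference, and keep a chain only if it is not too similar to any chain already kept. The Python sketch below shows that idea. The preference order, the identity function and the threshold are all left to the caller as assumptions. The sketch is not claimed to reproduce the exact PDB_SELECT procedure of Hobohm and Sander.

```python
from typing import Callable, Sequence


def select_representatives(
    chains: Sequence[str],
    identity: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Greedily keep chains whose identity to every kept chain is below threshold.

    `chains` should already be ordered by preference (for example, best
    resolution first), so the first member of each group of near-duplicates
    is the one retained.
    """
    kept: list[str] = []
    for chain in chains:
        if all(identity(chain, rep) < threshold for rep in kept):
            kept.append(chain)
    return kept
```

With a toy identity function and a 50% threshold, the sequences AAAA, AAAT, CCCC, CCCA and GGGG reduce to AAAA, CCCC and GGGG. Each dropped sequence is 75% identical to an earlier one.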

The PDBFINDER database
is a database that is constructed using a PERL script from the PDB, DSSP and HSSP databases. Many of the fields contained in the PDBFINDER database are difficult to access from the original databases. Some information is retrieved from the original literature. The PDBFINDER database was constructed by Rob W.W. Hooft, Michael Scharf, Chris Sander, and Gert Vriend. Click here to connect to PDBFINDER.

PIR (1-3)(Protein Identification Resource)
is a database intended for the identification of proteins. It can be searched by title, species, reference, comment, keyword, superfamily, features, map position, molecular weight, and number of residues. Click here to connect to PIR (1-3).

PIRALN (Database of Protein Sequence Alignments)
is created by Protein Identification Resource (PIR) at the National Biomedical Research Foundation in Washington, DC. Click here to connect to PIRALN.

The Brookhaven Protein Data Bank (PDB)
is an archival computer database of three-dimensional structures of biological macromolecules. The database contains atomic coordinates, bibliographic citations, primary sequence and secondary structure information, as well as crystallographic structure factors and 2D-NMR experimental data. Information is available on protein, DNA, RNA, virus and carbohydrate structures. Click here to connect to PDB.

PDBSTR (Protein Data Bank Restructured)
Because the Protein Data Bank is not designed for sequence analysis, there are three major difficulties in accessing the Bank directly. First, the Bank contains non-protein entries, such as nucleic acids and other biological macromolecules. Second, one entry may contain multiple sequences, such as the light and heavy chains of an immunoglobulin. Third, the residue numbering within a sequence may not be consecutive. It can contain deletions (jumps in numbering) and insertions (additional letters A, B, C, ...), because the numbering is often defined from a reference (homologous) sequence. The conversion programs BNLCV1 and BNLCV2 standardize the Protein Data Bank to make it consistent with other sequence databases. In addition to header and other text information, the resulting database PDBSTR contains three main types of data: sequence data in SEQUENCE, secondary structure information in FEATURES, and alpha-carbon coordinates in STRUCTURE. Coordinates for other atoms are not included in our database, so users must refer to the original Protein Data Bank files. Click here to connect to PDBSTR.
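The conversion programs BNLCV1 and BNLCV2 are not shown here. As an illustration of the same idea, the hypothetical Python function below reads fixed-column PDB ATOM records and keeps only the alpha-carbons of the 20 standard amino acids, which skips nucleic acids. It splits the entry by chain and numbers each chain's residues consecutively, while keeping the original residue label, including any insertion code. Keeping only the first alternate location is a simplifying assumption.

```python
# Standard three-letter to one-letter residue codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

Residue = tuple[int, str, str, tuple[float, float, float]]


def ca_chains(pdb_lines: list[str]) -> dict[str, list[Residue]]:
    """Group alpha-carbon records by chain with consecutive numbering.

    Returns {chain_id: [(new_index, original_label, one_letter, (x, y, z)), ...]}.
    """
    chains: dict[str, list[Residue]] = {}
    for line in pdb_lines:
        line = line.rstrip("\n").ljust(80)  # tolerate trimmed lines
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        if line[16] not in (" ", "A"):  # alternate location, column 17
            continue
        res_name = line[17:20].strip()  # residue name, columns 18-20
        if res_name not in THREE_TO_ONE:
            continue  # skip nucleic acids and non-standard residues
        chain_id = line[21]  # chain identifier, column 22
        # Residue number (columns 23-26) plus insertion code (column 27), e.g. "27A".
        label = line[22:26].strip() + line[26].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues = chains.setdefault(chain_id, [])
        residues.append((len(residues) + 1, label, THREE_TO_ONE[res_name], xyz))
    return chains
```

Residues numbered 27, 27A and 28 in the original file become positions 1, 2 and 3, with their original labels kept for cross-reference.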

is a compendium of protein motif "fingerprints". A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of OWL. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs: the database thus provides a useful adjunct to PROSITE.

Protein Mutant Database (PMD)
A database of protein mutants should be valuable as a basis for protein engineering. Our database is based on the literature rather than on proteins; that is, each entry corresponds to one article describing protein mutations. The project started in 1989, and data input has been completed for the literature up to 1990. About 2,800 articles have been entered so far, containing about 22,000 mutants in total. Each entry is identified by a serial number and classified as either "natural" or "artificial", depending on the type of mutation. Click here to connect to PMD.

PRF/SEQDB (Protein Research Foundation Sequence Database)
is the database of protein primary structures founded by the Protein Research Foundation in 1979. The database is updated bimonthly. SEQDB contains sequence-related information (e.g. processing sites, modified residues, S-S bonds, etc.) as well as protein primary structures reported in scientific journals. Some of the sequences included are those converted from nucleic acid sequences. Each entry in SEQDB has its counterpart in PRF/LITDB, our original database of literature citations. Click here to connect to PRF/SEQDB.

PROSITE
is a method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences. It consists of a database of biologically significant sites, patterns and profiles that help to identify reliably which known protein family (if any) a new sequence belongs to. Click here to connect to PROSITE.
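PROSITE patterns use a compact syntax:
- elements are separated by "-";
- "x" stands for any residue;
- [..] lists allowed residues and {..} lists forbidden ones;
- (n) or (n,m) gives a repetition count;
- "<" and ">" anchor the pattern to the N- or C-terminus.

The Python sketch below translates that common subset into a regular expression. The function name is our own. Less common constructs, such as ">" inside brackets, are not handled.

```python
import re

# One pattern element: optional N-terminal anchor, a residue / any / class,
# an optional repetition (n) or (n,m), and an optional C-terminal anchor.
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = match.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # forbidden residues
        else:
            regex = core  # a single residue or a [..] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(pieces)
```

For example, the N-glycosylation site pattern N-{P}-[ST]-{P} becomes N[^P][ST][^P]. re.search finds that expression in AANFSAA but not in AANPSAA.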

PSORT (Prediction of Sorting Signals)
is an expert system for predicting the localization sites of proteins in cells. It takes as input an amino acid sequence and its source organism, e.g., Gram-negative bacteria. The system analyzes the input sequence by applying stored rules for various sequence features of known protein sorting signals. It then reports how likely the input protein is to be localized at each candidate site, along with additional information. Click here to connect to PSORT.

The Restriction Enzyme Database (REBASE),
is a collection of information about restriction enzymes, methylases, the microorganisms from which they have been isolated, recognition sequences, cleavage sites, methylation specificity, the commercial availability of the enzymes, and references - both published and unpublished observations (dating back to 1952). Click here to connect to the Restriction Enzyme Database.
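Recognition sequences and cleavage sites of the kind REBASE records are enough to compute a simple restriction digest. The Python sketch below finds every occurrence of one recognition sequence on the given strand and derives the fragment lengths for a linear molecule. It assumes a palindromic site without ambiguity codes such as N or R. EcoRI, which cuts G^AATTC, is used as the example. The function names are hypothetical.

```python
def find_sites(sequence: str, site: str, cut_offset: int) -> list[tuple[int, int]]:
    """Return (site_start, cut_position) pairs on the given strand, 0-based.

    `cut_offset` is the number of bases from the start of the recognition
    sequence to the cleavage point on this strand (1 for EcoRI, G^AATTC).
    """
    sequence, site = sequence.upper(), site.upper()
    hits = []
    start = sequence.find(site)
    while start != -1:
        hits.append((start, start + cut_offset))
        start = sequence.find(site, start + 1)  # allow overlapping sites
    return hits


def fragment_lengths(length: int, cuts: list[int]) -> list[int]:
    """Fragment lengths for a linear molecule cut at the given positions."""
    positions = [0] + sorted(cuts) + [length]
    return [end - begin for begin, end in zip(positions, positions[1:])]
```

A 16-base sequence with EcoRI sites starting at positions 2 and 10 is cut after positions 3 and 11, giving fragments of 3, 8 and 5 bases.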

RHDB
contains mapping information. Click here to connect to RHDB.

RHEXP
contains mapping information. Click here to connect to RHEXP.

RHPANEL
contains mapping information. Click here to connect to RHPANEL.

Sequence Analysis Reference (SeqAnalRef)
is a bibliographic reference data bank of papers dealing with sequence analysis. This data bank stores references to articles from the expanding field of mathematical and computer analysis of biomolecular sequences. Click here to connect to the SeqAnalRef Database.

SGD (Saccharomyces Genomic Information Resource)
is a collection of information about the yeast Saccharomyces cerevisiae, commonly known as baker's or budding yeast. This database includes a variety of genomic and biological information. Click here to connect to SGD.

SWISS-2DPAGE
contains data on proteins identified on various 2-D PAGE reference maps. You can locate these proteins on the 2-D PAGE maps, or display the region of a 2-D PAGE map where one might expect to find a protein from SWISS-PROT. Click here to connect to SWISS-2DPAGE.

SWISS-3DIMAGE
is an image database that strives to provide high-quality pictures of biological macromolecules with known three-dimensional structure. The database contains mostly images of experimentally elucidated structures, but also provides views of well-accepted theoretical protein models. The images are provided in several useful formats; both mono and stereo pictures are generally available. Click here to connect to SWISS-3DIMAGE.

SWISSDOM shows for each SwissProt entry included in PRODOM a map of detected domains. Click here to connect to SWISSDOM.

SWISS-PROT
is a curated protein sequence database produced collaboratively by Amos Bairoch (University of Geneva) and the EBI. It strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with EMBL, PROSITE, and PDB. Click here to connect to SWISS-PROT.

TBASE
is an attempt to organize information on transgenic animals and targeted mutations generated and analyzed worldwide. Since the development of the technology to manipulate the germline of animals over a decade ago, a large number of transgenic animals have been produced worldwide for use in both basic and applied research. In addition, the development of gene-targeting protocols involving homologous recombination in mouse embryonic stem cells has produced a considerable number of mutant lines with specific phenotypes and well-defined DNA structural changes. Click here to connect to TBASE.

TFFACTOR
contains information about transcription factors. Click here to connect to TFFACTOR.

TFSITE
contains information about transcription factors. Click here to connect to TFSITE.

