ExplorEnz: a MySQL database of the IUBMB enzyme nomenclature
© McDonald et al; licensee BioMed Central Ltd. 2007
Received: 05 April 2007
Accepted: 27 July 2007
Published: 27 July 2007
We describe the database ExplorEnz, which is the primary repository for EC numbers and enzyme data that are being curated on behalf of the IUBMB. The enzyme nomenclature is incorporated into many other resources, including the ExPASy-ENZYME, BRENDA and KEGG bioinformatics databases.
The data, which are stored in a MySQL database, preserve the formatting of chemical and enzyme names. A simple, easy-to-use web-based query interface is provided, along with an advanced search engine for more complex queries. The database is publicly available at http://www.enzyme-database.org. The data are available for download as SQL and XML files via FTP.
ExplorEnz has powerful and flexible search capabilities and provides the scientific community with the most up-to-date version of the IUBMB Enzyme List.
The Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), in association with the IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN), is responsible for the classification of enzymes and production of the IUBMB Enzyme List. The NC-IUBMB assigns EC numbers to enzymes and provides a brief synopsis of each enzyme, a work that is coordinated by our group at Trinity College Dublin. These data are then used by many other resources, including the Swiss-Prot ENZYME, BRENDA and KEGG databases.
The classification system: a brief description
When classified, each enzyme is assigned a four-part EC number, consisting of four numbers separated by periods. The first three numbers represent the class, subclass and sub-subclass to which an enzyme belongs, and the fourth is a serial number that identifies the particular enzyme within its sub-subclass. The class, subclass and sub-subclass each provide additional information about the reaction catalysed. For example, in the case of EC 1.2.3.4, the numbers indicate that the enzyme is an oxidoreductase (class 1), that it acts on the aldehyde or oxo group of donors (subclass 2), that oxygen is the acceptor (sub-subclass 3) and that it was the fourth enzyme classified in this sub-subclass (serial number 4).
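Because every EC number has the same four-part structure, it is easy to split into its components. The following is a minimal Python sketch, assuming purely numeric components (it rejects anything else, including any provisional or non-numeric serial numbers); the names `ECNumber`, `parse_ec`, `entry_url` and `CLASS_NAMES` are hypothetical, and the direct-link URL simply follows the `query.php?ec=x.y.z.w` pattern given in the reference list.

```python
import re
from typing import NamedTuple

# Accepts "1.2.3.4" or "EC 1.2.3.4"; ASCII digits only.
_EC_PATTERN = re.compile(r"(?:EC\s*)?(\d+)\.(\d+)\.(\d+)\.(\d+)", re.ASCII | re.IGNORECASE)

# The six classes listed in the statistics table of this article.
CLASS_NAMES = {1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
               4: "Lyases", 5: "Isomerases", 6: "Ligases"}


class ECNumber(NamedTuple):
    """The four parts of an EC number."""
    enzyme_class: int   # first number: the class
    subclass: int       # second number
    sub_subclass: int   # third number
    serial: int         # serial number within the sub-subclass

    def __str__(self) -> str:
        return ".".join(str(part) for part in self)


def parse_ec(text: str) -> ECNumber:
    """Split an EC number into class, subclass, sub-subclass and serial number."""
    match = _EC_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a four-part EC number: {text!r}")
    return ECNumber(*(int(group) for group in match.groups()))


def entry_url(ec: ECNumber) -> str:
    """Direct link to an entry, following the query.php?ec=x.y.z.w pattern."""
    return f"http://www.enzyme-database.org/query.php?ec={ec}"
```

For the worked example above, `parse_ec("EC 1.2.3.4")` yields class 1, subclass 2, sub-subclass 3 and serial number 4, and `CLASS_NAMES[1]` gives "Oxidoreductases".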
In addition to the EC number, other information about the enzyme is provided so that the user can get a flavour of the enzyme's function and how it differs from similar enzymes. This additional information is divided into the following fields: accepted name, reaction, glossary, synonyms, systematic name, comments, references and links to other databases. Diagrams of individual reactions or of the related metabolic pathways are also provided in many instances. Further details of the classification system can be found elsewhere [1, 2]. An important aspect of the Enzyme List is that it attempts to ensure a high degree of accuracy and quality for each enzyme entry. Thus, for example, a new enzyme is added only when there is sufficient published evidence that the reaction claimed is actually catalysed by a single enzyme that differs from all previously listed enzymes.
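The fields listed above map naturally onto a simple record type. The sketch below is illustrative only: the class name, the field names and the `matches_name` helper are hypothetical and do not reproduce the actual ExplorEnz schema; it shows how a name search can cover the accepted name, the systematic name and all synonyms at once.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EnzymeEntry:
    """One Enzyme List entry; field names are illustrative, not the ExplorEnz schema."""
    ec: str
    accepted_name: str
    reaction: str
    systematic_name: str = ""
    synonyms: List[str] = field(default_factory=list)
    glossary: List[str] = field(default_factory=list)
    comments: str = ""
    references: List[str] = field(default_factory=list)  # e.g. PubMed identifiers
    links: Dict[str, str] = field(default_factory=dict)  # database name -> URL

    def all_names(self) -> List[str]:
        """Accepted name, systematic name (if any) and synonyms."""
        names = [self.accepted_name]
        if self.systematic_name:
            names.append(self.systematic_name)
        return names + self.synonyms

    def matches_name(self, term: str) -> bool:
        """Case-insensitive substring match against every name of the enzyme."""
        needle = term.lower()
        return any(needle in name.lower() for name in self.all_names())
```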
The IUBMB enzyme data are publicly available on the web [3] as a series of flat files. While a number of endeavours already use the enzyme data as an integral subset of the data they provide (for example, the BRENDA [4], ExPASy [5], GO [6], IntEnz [7] and KEGG [8] databases), the manually curated IUBMB enzyme data are not distinguished from the other data provided. In addition, the formatting of chemical names is, in many cases, not in accordance with IUBMB recommendations (e.g. no subscripts, superscripts or italicization of locants), although otherwise the names are semantically accurate. In this article, we present ExplorEnz as an alternative means of accessing the most up-to-date Enzyme Nomenclature information, in a readily searchable manner and with correctly rendered output.
Construction and content
The enzyme data and their associated literature references are stored in MySQL databases on a dedicated server, and are accessed through a web interface written in PHP. The initial content for the database was extracted from the HTML-formatted flat files located on the home page of the IUBMB Enzyme List [3]. Custom Perl scripts were used to strip out the hard-coded HTML formatting and to convert the data into a plain ASCII flat file. A second set of Perl scripts was written to convert the plain-text data into HTML. A unique feature of these scripts is that they include rules to automatically generate the correct formatting of chemical names and formulae using a regular-expression-based pattern-matching system. This set of regular-expression-based replacement rules has been incorporated into its own database for use within this and other web applications.
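The idea of storing regular-expression replacement rules in their own database table and applying them in order to plain text can be sketched as follows. This is a minimal illustration under stated assumptions: the five rules are hypothetical examples written to handle inputs like those in the table of unformatted and formatted names (they are not the actual ExplorEnz rule set), SQLite stands in for MySQL, and the table and function names are invented. Rule order matters, since each rule sees the output of the previous one.

```python
import re
import sqlite3

# Hypothetical rules, in application order; the actual ExplorEnz rule set is not reproduced here.
EXAMPLE_RULES = [
    (r"\b(cis|trans)-", r"<i>\1</i>-"),                  # italicize cis/trans prefixes
    (r"Delta(\d+(?:,\d+)*)", r"&Delta;<sup>\1</sup>"),    # Delta5,17 -> Delta with superscript locants
    (r"N(alpha|epsilon|omega)-", r"N<sup>&\1;</sup>-"),  # Nomega- -> N with superscript omega
    (r"(NAD\(P\)|NADP|NAD)\+", r"\1<sup>+</sup>"),        # superscript positive charge
    (r"\b([DL])-", r'<span class="sc">\1</span>-'),      # small-capital D/L prefixes
]


def make_rule_db(rules) -> sqlite3.Connection:
    """Store the rules in their own table (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE format_rule (id INTEGER PRIMARY KEY, pattern TEXT, replacement TEXT)")
    conn.executemany("INSERT INTO format_rule (pattern, replacement) VALUES (?, ?)", rules)
    return conn


def load_rules(conn):
    """Read the rules back in insertion order and compile them for Python's re module."""
    rows = conn.execute("SELECT pattern, replacement FROM format_rule ORDER BY id")
    return [(re.compile(pattern), replacement) for pattern, replacement in rows]


def to_html(plain: str, rules) -> str:
    """Apply every rule in turn to curator-entered plain text."""
    for pattern, replacement in rules:
        plain = pattern.sub(replacement, plain)
    return plain
```

Keeping the rules as data rather than code means the same table can be reused by any application that can translate the stored patterns into its own regular-expression dialect.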
In addition to the public database, a curatorial interface was also developed, which provides members of the reviewing panel with real-time access to all data on new/amended enzymes in an effort to speed up the classification process. The interface allows direct entry or modification of data in individual fields as plain text, which is then automatically rendered into the correct format. References can be imported automatically into the database using their PubMed identifiers (PMIDs). All changes to the database are logged, which enables tracking of all changes made on a specific date or to a particular enzyme entry over time. A script was also written to convert these data into the format used on the IUBMB website [3], to prevent duplication of effort and to ensure consistency among the IUBMB data sets.
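The change logging described above can be illustrated with a minimal sketch. It assumes a single log table with one row per field-level edit; the table and column names, the functions and the use of SQLite in place of MySQL are all hypothetical. Both lookups mentioned in the text, all changes on a given date and the full history of one entry, then reduce to simple queries.

```python
import sqlite3
from datetime import date


def open_change_log() -> sqlite3.Connection:
    """Create an in-memory change-log table (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE change_log (
               id INTEGER PRIMARY KEY,
               ec TEXT NOT NULL,
               field TEXT NOT NULL,
               old_value TEXT,
               new_value TEXT,
               changed_on TEXT NOT NULL  -- ISO 8601 date
           )"""
    )
    return conn


def log_change(conn, ec: str, field: str, old: str, new: str, when: date) -> None:
    """Record one field-level edit to an entry."""
    conn.execute(
        "INSERT INTO change_log (ec, field, old_value, new_value, changed_on) VALUES (?, ?, ?, ?, ?)",
        (ec, field, old, new, when.isoformat()),
    )


def changes_on(conn, when: date):
    """All edits made on a given date, in the order they were logged."""
    return conn.execute(
        "SELECT ec, field, new_value FROM change_log WHERE changed_on = ? ORDER BY id",
        (when.isoformat(),),
    ).fetchall()


def history(conn, ec: str):
    """Every logged edit to one entry, oldest first."""
    return conn.execute(
        "SELECT changed_on, field, old_value, new_value FROM change_log "
        "WHERE ec = ? ORDER BY changed_on, id",
        (ec,),
    ).fetchall()
```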
Utility and Discussion
ExplorEnz makes use of the regular-expression matching facility of MySQL, allowing the user to construct more complex queries; since text fields within the database are set as case-insensitive, the simplest use of this feature is to perform case-sensitive searches. In addition, an "Advanced search" facility allows the user to search for up to four different text patterns at once, using Boolean operators to include or exclude terms from the selected fields. To our knowledge, this range of search and display options is unavailable in other enzyme databases at present. Fig. 2(b) shows the result of searching for some of the enzymes involved in the early stages of lysine biosynthesis. This query takes advantage of the regular-expression-based search facility to limit the search to two specific EC numbers.
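The Advanced search logic, several regular-expression criteria each of which either must or must not match a chosen field, can be sketched in Python. This is a hedged illustration rather than the ExplorEnz implementation: it assumes each entry is a dictionary of plain-text fields and that criteria are combined with AND (an OR can be written as alternation, `a|b`, inside a single pattern); the `Criterion` class and the field keys are hypothetical. In MySQL the per-field test would be written with the REGEXP operator.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

MAX_CRITERIA = 4  # the Advanced search form accepts up to four patterns


@dataclass
class Criterion:
    field: str             # hypothetical field key, e.g. "ec" or "accepted_name"
    pattern: str           # regular expression to look for in that field
    exclude: bool = False  # True: the entry must NOT match (Boolean NOT)


def advanced_search(entries: List[Dict[str, str]], criteria: List[Criterion],
                    case_sensitive: bool = False) -> List[Dict[str, str]]:
    """Return entries satisfying every criterion (criteria are combined with AND)."""
    if not 1 <= len(criteria) <= MAX_CRITERIA:
        raise ValueError(f"between 1 and {MAX_CRITERIA} criteria are supported")
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = [(c.field, re.compile(c.pattern, flags), c.exclude) for c in criteria]
    return [
        entry for entry in entries
        # found != exclude: an included pattern must be found, an excluded one must not
        if all((rx.search(entry.get(name, "")) is not None) != exclude
               for name, rx, exclude in compiled)
    ]
```

Restricting a search to particular EC numbers, as in the lysine example, amounts to an anchored pattern on the EC field such as `^1\.2\.3\.[45]$`.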
While the database returns its results as HTML, the user-supplied term is matched against a plain-ASCII version of the data. In the majority of cases, queries can therefore be posed unambiguously: bold, italic, subscripted and superscripted entities should be submitted inline without any modifier. For example, either "tRNATyr" or "trnatyr" can be used to match entries that appear in the output as "tRNATyr" (with "Tyr" superscripted). Greek letters should be spelt out in English: e.g., "alpha" for "α", "beta" for "β", "delta" for "δ", "Delta" for "Δ", etc.
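Matching a query against a plain-ASCII form of formatted data can be sketched as follows, assuming the stored output is HTML whose markup can be stripped and whose entities can be decoded. The Greek table covers only a few letters, and the function names are hypothetical. Note that "delta" and "Delta" remain distinct only when a case-sensitive search is requested.

```python
import html
import re

# Greek letters and their spelt-out forms; "delta" and "Delta" differ only by case.
GREEK_TO_ASCII = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta",
                  "Δ": "Delta", "ε": "epsilon", "ω": "omega"}
_TAG = re.compile(r"<[^>]+>")


def to_plain_ascii(formatted_html: str) -> str:
    """Drop markup such as <sup> or <i>, decode entities, then spell out Greek letters."""
    text = html.unescape(_TAG.sub("", formatted_html))
    return "".join(GREEK_TO_ASCII.get(ch, ch) for ch in text)


def query_matches(query: str, formatted_html: str, case_sensitive: bool = False) -> bool:
    """Substring match of a user query against the plain-ASCII form of an entry."""
    plain = to_plain_ascii(formatted_html)
    if case_sensitive:
        return query in plain
    return query.lower() in plain.lower()
```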
There is also the option of outputting the results in a format that is more suitable for printing (Print Version button). In this case, the font size is reduced to make the text more compact, the output is rendered in black and white, the 'Links to other databases' field is omitted and all underlining of links is suppressed. Alternatively, the printable version can be saved to the user's hard disk as a PDF file if the user has an appropriate operating system or relevant third-party software. The user's search term can also be highlighted in the results page; this feature applies the regular-expression formatting rules to the search term to compute the string that is highlighted in the HTML data. Thus, entering "alpha-D-glucose" as a search term, with highlighting selected, will result in each occurrence of "α-D-glucose" being highlighted in the output.
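The highlighting step can be sketched by first converting the plain-ASCII search term into its formatted form and then wrapping each occurrence in the output. This minimal version only handles spelt-out Greek letters, and it assumes that Greek letters appear in the output as literal characters, that the term is not split by other markup and that it does not occur inside tag attributes; the function names and the `highlight` CSS class are hypothetical.

```python
import html
import re

# Spelt-out Greek names a user may type, mapped to the characters shown in the output.
ASCII_TO_GREEK = {"alpha": "α", "beta": "β", "gamma": "γ", "delta": "δ",
                  "Delta": "Δ", "epsilon": "ε", "omega": "ω"}
# Only whole names: the look-arounds stop "beta" in "betaine" from being converted.
_GREEK_NAME = re.compile(r"(?<![a-z])(" + "|".join(ASCII_TO_GREEK) + r")(?![a-z])")


def formatted_term(term: str) -> str:
    """Convert a plain-ASCII search term to the form it takes in the formatted output."""
    return _GREEK_NAME.sub(lambda m: ASCII_TO_GREEK[m.group(1)], term)


def highlight(term: str, output_html: str) -> str:
    """Wrap each case-insensitive occurrence of the formatted term in a highlight span."""
    target = re.compile(re.escape(html.escape(formatted_term(term))), re.IGNORECASE)
    return target.sub(lambda m: f'<span class="highlight">{m.group(0)}</span>', output_html)
```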
The diagrams of enzyme reaction mechanisms and pathways, produced by Moss and Dixon for the Enzyme List [1, 3], are also available through ExplorEnz. The diagrams show the structures of the substrates and products and, in the case of reaction mechanisms, the intermediates. EC numbers, where shown, are linked to the corresponding entries in the database. The diagrams are supplied as GIF images, although it is hoped to provide Scalable Vector Graphics (SVG) versions in the future, as this would allow the user to search for chemical names and EC numbers within the diagrams.
Database curation and the automatic formatting of chemical names
Some examples of the formatting of enzyme and chemical names (unformatted data as entered, each followed by the formatted output).
Unformatted: NAD(P)+ + L-arginine = nicotinamide + Nomega-(ADP-D-ribosyl)-L-arginine
Formatted: NAD(P)+ + l-arginine = nicotinamide + Nω-(ADP-d-ribosyl)-l-arginine
Unformatted: eicosapentaenoate cis-Delta5,17-eicosapentaenoate cis-Delta5-trans-Delta7,9-cis-Delta14,17 isomerase
Formatted: eicosapentaenoate cis-Δ5,17-eicosapentaenoate cis-Δ5-trans-Δ7,9-cis-Δ14,17 isomerase
Such conditions can readily be converted, on retrieval, to the regular-expression syntax of the language in which the web application is written. This feature reduces the time required for the curator to input data and ensures consistency of formatting throughout the database. The direct output of the data in IUBMB nomenclature format should be of benefit to journal editors wishing to check standardized usage, and the comprehensive searching facility, including searches by synonyms or reactants, should facilitate the ready identification of novel enzymes that should be included in the Enzyme List.
Statistics on EC numbers held in the database.
Class 1 (Oxidoreductases)
Class 2 (Transferases)
Class 3 (Hydrolases)
Class 4 (Lyases)
Class 5 (Isomerases)
Class 6 (Ligases)
A key attribute of ExplorEnz is its superior search and display functionality. Data in the HTML output are formatted according to accepted conventions, something that few databases have implemented to date. This database is the primary source of new EC numbers, from which all other databases containing the Enzyme Nomenclature data can be updated. To this end, we have made provision for MySQL replication of ExplorEnz to interested parties. In addition, daily updates of the data are made available for download in both SQL and XML format on the ExplorEnz website.
Availability and requirements
The ExplorEnz website is publicly available at http://www.enzyme-database.org. The data are accessible as (gzip-compressed) SQL and XML files via FTP from ftp://ftp.enzyme-database.org/pub/sql/enzyme-data.sql.gz and ftp://ftp.enzyme-database.org/pub/xml/enzyme-data.xml.gz. Users are requested to acknowledge the IUBMB as the source of these data.
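For users who want a quick look at the SQL download before loading it into MySQL, the sketch below streams the gzip-compressed file and counts INSERT statements per table. It assumes that enzyme-data.sql.gz has already been fetched from the FTP address above, that it is UTF-8 encoded and that it is written in mysqldump style with each INSERT statement starting on its own line; the table names it reports come from the file itself, and the function names are hypothetical.

```python
import gzip
import re
from collections import Counter
from typing import Iterable

# Matches the start of a mysqldump-style statement such as: INSERT INTO `entry` VALUES (...);
_INSERT = re.compile(r"INSERT INTO `?(\w+)`?", re.IGNORECASE)


def count_insert_statements(lines: Iterable[str]) -> Counter:
    """Count INSERT statements per table (statements, not rows: one may insert many rows)."""
    counts: Counter = Counter()
    for line in lines:
        match = _INSERT.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize_dump(path: str = "enzyme-data.sql.gz") -> Counter:
    """Stream a locally downloaded, gzip-compressed SQL dump without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_insert_statements(handle)
```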
The assistance of Prof. Toni Kazic (University of Missouri-Columbia) in parsing the original HTML data is gratefully acknowledged. We are thankful to Science Foundation Ireland (grant No. SFI 02/IN.1/B043-Tipton) for financial support.
- Tipton KF, Boyce S: Enzyme Classification and Nomenclature. Nature Encyclopedia of Life Sciences. 2000, Nature Publishing Group, London
- Boyce S, Tipton KF: History of the enzyme nomenclature system. Bioinformatics. 2000, 16: 34-40. 10.1093/bioinformatics/16.1.34.
- Enzyme Nomenclature. [http://www.chem.qmul.ac.uk/iubmb/enzyme/]
- Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, Schomburg D: BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res. 2004, 32: D431-D433. 10.1093/nar/gkh081. [http://brenda.bc.uni-koeln.de/]
- Bairoch A: The ENZYME database in 2000. Nucleic Acids Res. 2000, 28: 304-305. 10.1093/nar/28.1.304. [http://expasy.org/enzyme/]
- Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, et al.: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: 258-261. 10.1093/nar/gkh066. [http://www.godatabase.org/]
- Fleischmann A, Darsow M, Degtyarenko K, Fleischmann W, Boyce S, Axelsen KB, Bairoch A, Schomburg D, Tipton KF, Apweiler R: IntEnz, the integrated relational enzyme database. Nucleic Acids Res. 2004, 32: D434-D437. 10.1093/nar/gkh119. [http://www.ebi.ac.uk/intenz/]
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006, 34: D354-D357. 10.1093/nar/gkj102. [http://www.genome.ad.jp/]
- A direct link to EC x.y.z.w: http://www.enzyme-database.org/query.php?ec=x.y.z.w
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.