----------------------------------------------------------------------
Protein Data Bank Quarterly Newsletter
Number 68   April 1994
----------------------------------------------------------------------

April 1994 PDB Release

2441 full-release atomic coordinate entries (120 new additions)

     2247 proteins, enzymes and viruses
      166 DNA's
        9 RNA's
        9 tRNA's
       10 carbohydrates

 353 structure factor entries
  31 NMR experimental entries

The total size of the atomic coordinate entry database is 720 Mbytes uncompressed.

----------------------------------------------------------------------
Technical Format Details

Recently we have made two small changes in the format of coordinate entries. These changes will only have an impact if you look explicitly at these fields.

1. Columns 21-25 of HET records are defined as the number of atoms in the non-standard group. We are now generating this number automatically, and it is now set equal to the number of HETATM records that are included in the het group. Thus, for example, if an atom is presented in two conformations it will be counted twice in the number presented in columns 21-25. As another example, if some of the atoms in a het group could not be located in density and no coordinates are given for these atoms, then they will not be included in this count. The formula for the het group that is given in the corresponding FORMUL record is unchanged and represents the composition of the unliganded group.

2. For every publication that is listed on JRNL or REMARK records, the last record is of type REFN and contains several codes to identify the journal or book. In the past, columns 68-70 were used to list the coden which is simply an identification number for the journal or book. The same codens are used by the Protein Data Bank and the Cambridge Crystallographic Data Centre. With the increasing numbers of journals and books, we have had to expand the three-digit coden to a four-digit number which is now presented in columns 67-70.
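
For readers who process these fields in their own programs, the two changes can be sketched as follows. This is an illustration only, not PDB software: it is written in Python, the function names are hypothetical, and it assumes fixed-width records with columns numbered from 1. The position of the group name on HETATM records (columns 18-20) and the blank in column 67 of older REFN records are assumptions that are not stated in this article.

```python
def het_atom_count(het_record: str) -> int:
    """Return the atom count held in columns 21-25 of a HET record."""
    # Columns are numbered from 1, so columns 21-25 are the slice [20:25].
    return int(het_record[20:25])


def count_hetatm(records, het_name: str) -> int:
    """Count the HETATM records that belong to one het group.

    Assumption: the group name occupies columns 18-20 of a HETATM record.
    Every record is counted, so an atom given in two conformations adds
    two to the total and an atom without coordinates adds nothing.
    This simple version does not separate several copies of the same group.
    """
    return sum(
        1
        for rec in records
        if rec.startswith("HETATM") and rec[17:20].strip() == het_name
    )


def refn_coden(refn_record: str) -> str:
    """Return the coden from columns 67-70 of a REFN record.

    Assumption: older records that used only columns 68-70 leave
    column 67 blank, so stripping blanks returns the three-digit value.
    """
    return refn_record.ljust(70)[66:70].strip()
```

Reading columns 67-70 and discarding blanks lets one routine accept both the older three-digit and the newer four-digit coden, provided the assumption about column 67 holds for the entries being read.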
----------------------------------------------------------------------
Unix-Based Filters and Browser

The response to our PC-based PDB-Shell has been so positive that a similar system, PDB-Browse, has been developed for PDB's UNIX users.

The PDB-Browse system consists of three parts. The first of these builds indices (based on Unix standard dbm files) of the PDB entries as they exist on the user's computer system. The requirements for running this part are PERL and 50 Mbytes of disk space (the disk requirement can be lowered by indexing fewer fields in the PDB file). Currently indexed are the AUTHOR, COMPND, CRYST1, JRNL, SOURCE, EXPDAT and REMARK records, plus the accession date, functional classification (as it appears in the HEADER record) and resolution fields, as well as the file location as a function of ident code.

The second part of PDB-Browse consists of a number of PERL programs that act as filters and scan the above indices. The programs auth.pl, comp.pl, cryst.pl, expd.pl, jrnl.pl, rem.pl and sour.pl scan the appropriate records. For example,

     expd.pl -a NMR

returns the ident codes of all (-a) entries for which NMR techniques were used to collect the data. (Adding a -v flag would report all non-NMR entries.) To find out which of these entries describe DNA, execute the command:

     expd.pl -a NMR | comp.pl DNA

which will return the entries satisfying both conditions. To find out which of these have a SYNTHETIC source, execute

     expd.pl -a NMR | comp.pl DNA | sour.pl SYNTHETIC

to produce the ident codes of the 14 entries that satisfy all three conditions. On a Silicon Graphics workstation, with the indices residing on an NFS-mounted file system, the above query took less than 10 seconds.

A PERL program called loc.pl is also provided which, given an ident code, will provide the location of an entry.
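
The way the filters combine can be pictured with a small sketch. It is written in Python rather than PERL and is not part of PDB-Browse: plain dictionaries stand in for the dbm indices, the ident codes and field values are made-up placeholders, the function names are hypothetical, and the case-insensitive substring match is an assumption about how a filter compares text.

```python
def make_filter(index):
    """Build a filter over one index (a mapping of ident code -> field text)."""

    def run(pattern, codes=None, invert=False):
        # codes=None plays the role of -a: start from every indexed entry.
        candidates = list(index) if codes is None else codes
        wanted = pattern.lower()
        # invert=True plays the role of -v: keep the non-matching entries.
        return [
            code
            for code in candidates
            if (wanted in index.get(code, "").lower()) != invert
        ]

    return run


# Placeholder indices; these ident codes and field values are made up.
EXPDAT = {"0aaa": "NMR", "0bbb": "X-RAY DIFFRACTION", "0ccc": "NMR"}
COMPND = {"0aaa": "DNA DODECAMER", "0bbb": "LYSOZYME", "0ccc": "CRAMBIN"}
SOURCE = {"0aaa": "SYNTHETIC", "0bbb": "HEN EGG WHITE", "0ccc": "SEED"}

expd = make_filter(EXPDAT)
comp = make_filter(COMPND)
sour = make_filter(SOURCE)

# Similar in spirit to: expd.pl -a NMR | comp.pl DNA | sour.pl SYNTHETIC
hits = sour("SYNTHETIC", comp("DNA", expd("NMR")))
print(hits)
```

Each stage passes along only the ident codes it accepts, so a chain of filters returns the entries that satisfy every condition, whatever the order of the stages.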
Combining loc.pl with the pipeline, one could view each of the above files with the command:

     view `expd.pl -a NMR | comp.pl DNA | sour.pl SYNTHETIC | loc.pl`

PERL scripts are also provided to scan the accession date (before.pl and after.pl), functional classification (head.pl), and the resolution as it appears in REMARK 2 (resolu.pl). In addition, full.pl searches the full file for a particular expression including wild cards, list.pl lists an entire index, and lookup.pl extracts certain records or fields from a list of ident codes. For example, to find the authors (not ident codes) from the above set, one would execute:

     expd.pl -a NMR | comp.pl DNA | sour.pl SYNTHETIC | lookup.pl auth

The third part of the PDB-Browse system, called browse, is a graphical user interface (GUI) front-end for all of these filters (as well as custom filters that may be written by the user). This front-end requires the user to install tcl and tk (Tool Command Language and Tool Kit, public domain utilities written by John Ousterhout of UC Berkeley and available via anonymous FTP from harbor.ecn.purdue.edu). It is also helpful, but not required, to have a graphical viewing program such as RASMOL (by Roger Sayle, available from ftp.dcs.ed.ac.uk) or MidasPlus (Conrad Huang and Thomas Ferrin of UCSF). A screen dump of a typical PDB-Browse session is shown below.

All of the programs making up the PDB-Browse system are available in source form via each of the PDB's distribution methods, in a directory named pdbbrowse. Please send all comments and suggestions for improvements to Dave Stampf (stampf@bnl.gov).

----------------------------------------------------------------------
Deposition Contact Persons

PDB requires the names of two contact persons - a primary and a secondary contact - on the deposition form for each new structure. For detailed instructions, please see the current deposition form available on the PDB FTP server or request a copy via e-mail (pdb@bnl.gov).
To avoid confusion, we suggest that the primary contact be the person who actually submits the data. When requesting acknowledgment of a data deposition, please be sure to provide PDB with the name of the primary contact person so that we can properly identify your structure.

Deposition material should only be sent to the following:

     e-mail:       pdb@bnl.gov

     FTP:          connect to pdb.pdb.bnl.gov using anonymous FTP
                   cd to the /new_uploads directory
                   upload your files

     normal mail:  Protein Data Bank Depositions
                   Chemistry Department, Building 555
                   Brookhaven National Laboratory
                   P.O. Box 5000
                   Upton, NY 11973-5000 USA

Be sure to provide identifying information, including your name, address and telephone number, in the header of all files submitted.

----------------------------------------------------------------------
Newsletter Availability

The PDB Newsletter is available from FTP, Gopher and Listserver archives as soon as it is prepared. Allowing for printing and mailing time, we expect this to be about four weeks earlier than a printed copy would reach you by mail. Availability of each new issue is announced on the Listserver (see article pertaining to Listserver).

If you are satisfied receiving the PostScript or ASCII version of the Newsletter and no longer wish to receive a printed copy, or if you know of copies that are being discarded because of obsolete addresses, please let us know electronically by sending a message to pdb@bnl.gov or by normal mail. Printing and mailing fewer copies saves both time and trees.

----------------------------------------------------------------------
Revised Newsletter Format Considered

PDB is considering eliminating the tables of newly released and newly deposited entries in future Newsletters. All available and pending PDB entries, with newly released and newly deposited entries flagged, are listed in the Full Tables document.
This document accompanies each order shipped and is available from FTP in the /newsletter subdirectory in both PostScript and ASCII formats. A printed copy may be obtained upon request. We would appreciate feedback from you about this possible change. Please send your response via e-mail to pdb@bnl.gov or by normal mail. ---------------------------------------------------------------------- PDB User Group A User Group for the Protein Data Bank is currently being organized under the leadership of Jane S. Richardson of Duke University. A Coordinating Committee to represent the diverse spectrum of users will soon be named. The User Group aims to facilitate communication in both directions between PDB and all types of users, improving knowledge of what is already available or in progress, diagnosing problems quickly, and arriving collectively at the best ideas and innovations for the future. The User Group also intends to collect and make available various subsets and annotations of PDB entries for such purposes as teaching and structural analysis. Your input is solicited. Please let us know what type of user you are and what you consider priorities for the future. Please respond by e-mail to PDBusrgp@suna.biochem.duke.edu or by normal mail to: Jane Richardson - PDB User Group Box 3711 DUMC Durham, NC 27710 USA ---------------------------------------------------------------------- Distribution Directory Structure As discussed on the PDB Listserver, the distribution directory layout groups entries by the middle two characters of the ident code. Therefore, entry file pdb1abc.ent will be found in */compressed_files/ab and */uncompressed_files/ab. This change was introduced in the January 1994 release to alleviate the difficulty users were having finding entries using the old tapeNN layout method. Now, once an ident code is known it is simple to find an entry. If an ident code is not known, the index files provide a quick means of finding it. 
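
The rule above, middle two characters of the ident code, is easy to apply in a program. The following Python sketch is an illustration only: the function names and the root argument (standing in for the leading * in the paths above) are hypothetical, and converting the ident code to lower case is an assumption based on the pdb1abc.ent example.

```python
def _normalize(ident_code: str) -> str:
    """Return the ident code in lower case after checking its length."""
    code = ident_code.strip().lower()
    if len(code) != 4:
        raise ValueError("an ident code has exactly four characters")
    return code


def entry_filename(ident_code: str) -> str:
    """Return the entry file name, e.g. pdb1abc.ent for ident code 1abc."""
    return "pdb" + _normalize(ident_code) + ".ent"


def entry_dirs(ident_code: str, root: str = ""):
    """Return the compressed and uncompressed directories for an entry.

    The subdirectory is named after the middle two characters of the
    ident code. 'root' is a hypothetical prefix for the distribution root.
    """
    middle = _normalize(ident_code)[1:3]
    prefix = root.rstrip("/") + "/" if root else ""
    return (
        prefix + "compressed_files/" + middle,
        prefix + "uncompressed_files/" + middle,
    )


print(entry_filename("1abc"))
print(entry_dirs("1abc"))
```

For ident code 1abc this gives the file name pdb1abc.ent and the subdirectory ab under both compressed_files and uncompressed_files, matching the example in the text.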
----------------------------------------------------------------------
Access to PDB using FTP

PDB has an anonymous FTP account on the computer system pdb.pdb.bnl.gov (Internet address 130.199.144.1). Files may be transferred to and from this system using anonymous as the FTP user name and your e-mail address as the password. Besides downloading entries, data files and documentation, it is possible to upload any files that you may wish to send to PDB. Please note that those using VMS may need to place quotes around file names.

----------------------------------------------------------------------
FTP Access Help

Useful FTP commands:

ascii
     - Set file transfer type to network ASCII.
binary
     - Set file transfer type to support binary image transfer.
bye
     - Terminate FTP session with remote server and exit FTP.
cd remote-directory
     - Change working directory on remote machine to remote-directory.
cdup
     - Change remote machine working directory to parent of current remote machine working directory.
dir [ remote-directory ] [ local-file ]
     - Print listing of directory contents in directory, remote-directory, and, optionally, placing output in local-file.
get remote-file [ local-file ]
     - Retrieve remote-file and store it on local machine.
help [ command ]
     - Print informative message about meaning of command.
ls [ remote-directory ] [ local-file ]
     - Print listing of contents of directory on remote machine.
put local-file [ remote-file ]
     - Store local file on remote machine.
pwd
     - Print name of current working directory on remote machine.
quit
     - Synonym for bye.

----------------------------------------------------------------------
Useful directory and file descriptions:

directory: all_entries
     - Contains up-to-date full-release entries.
     - All entries together in single directory (not divided by 2-character code).
directory: crystallographer_info
     - Informational files of interest to crystallographers.
directory: current_release
     - Contains up-to-date full-release entries.
     - Made up of last quarterly full-release entries and updated and additional full-release entries.
     - Always current full release (last quarterly plus updates).
     - Divided into 2-character directories.
directory: fullrelease
     - Contains last quarterly full-release entries.
     - Divided into 2-character directories.
directory: index
     - Index files that cross-reference ident codes to various parameters.
directory: new_uploads
     - Uploads to PDB accepted here.
directory: newly_released
     - Contains all updated and additional full-release entries since last quarterly full-release entries.
     - Divided into 2-character directories.
directory: newsletter
     - PDB Newsletters and Full Tables.
directory: nmr_restraints
     - NMR restraint files.
directory: pub
     - Various useful items.
directory: structure_factors
     - Contains last quarterly full-release structure factor files.
file: README
     - FTP login message and README file.
file: advisory.doc
     - PDB Advisory Notice. This notice should be signed and returned if you intend to download files.
file: contents.lis (same as ls-lR)
     - Listing of files and directories on FTP.
file: datestamp.txt
     - Description of datestamping method used for entry files.
file: ftphelp.txt
     - Help on using FTP.
file: how2dnld.txt
     - Information on how to download files.
file: how2find.txt
     - Instructions on how to find a file (entry).
file: how2upld.txt
     - Information on how to upload files.

----------------------------------------------------------------------
Access to PDB using Listserver

PDB has a mailing list devoted to discussions concerning its operation and contents, and access to the Data Bank.
If you would like to subscribe, please send e-mail to:

     listserv@pdb.pdb.bnl.gov

with the message:

     subscribe PDB-L Firstname Lastname

To find out what you can do with this mailing list, send e-mail to the same address (listserv@pdb.pdb.bnl.gov) with the one-line message of:

     help

To send a message to all PDB-L subscribers, e-mail the message to:

     PDB-L@pdb.pdb.bnl.gov

----------------------------------------------------------------------
Access to PDB using Gopher

PDB is accessible using Gopher software which follows a simple protocol to tunnel through a TCP/IP Internet. Gopher is recommended for obtaining information and files quickly and easily from PDB. As a Gopher client, you can navigate through a hierarchy of directories and documents or ask an index server to return a list of all documents that contain one or more specified words.

You can choose "The PDB Anonymous FTP" after reaching PDB's Gopher server in order to search and download the same information and coordinate files as through FTP. Alternatively, you can select "An (almost) full-text search of the PDB Bibliographic Headers" in order to search PDB using any keyword.

Users running a Gopher client can access the PDB server by including the following link:

     Name = Protein Data Bank FTP site
     Type = 1
     Host = pdb.pdb.bnl.gov
     Port = 70
     Path = 1/

Information on setting up an Internet Gopher client including source files for different machines is available from anonymous FTP at boombox.micro.umn.edu (134.84.132.2) in the directory /pub/gopher. For more information or help in searching the PDB from Gopher, send e-mail to oeder@bnl.gov.

----------------------------------------------------------------------
Procheck Software Package

The Procheck software package is being made available for electronic distribution from PDB. Oxford Molecular, Ltd. and PDB have agreed that upon receipt of a signed license agreement at PDB, the source and documentation for Procheck will be made available free of charge.
The Procheck software package, which was created by J. M. Thornton, M. W. MacArthur, R. A. Laskowski and D. S. Moss, performs evaluations of the stereochemical quality of protein structures.

To acquire a copy of Procheck you must obtain the license agreement, copies of which are available from FTP, Gopher and Listserver archives in the file /pub/procheck/procheck-license. You must complete and sign this license agreement and return it to PDB:

     Protein Data Bank
     Procheck License
     Chemistry Department, Building 555
     Brookhaven National Laboratory
     P.O. Box 5000, Upton, NY 11973-5000 USA

Once we have your signed agreement, we will either e-mail the source to you or place it on the machine of your choice by FTP. Your signed license agreement will be forwarded to Oxford Molecular, who will keep you up to date about further developments. All queries concerning the software should be directed to:

     Steve Gardner, Macromolecular Product Manager
     Oxford Molecular, Ltd.
     The Magdalen Centre
     Oxford Science Park
     Sandford-on-Thames, Oxford, England OX4 4GA
     telephone: +44-865-784600

----------------------------------------------------------------------
Guidelines for Deposition and Deposition Form

Now available on FTP is a set of helpful Guidelines on preparing entries for submission to PDB. The Guidelines address issues of representation that frequently arise in the preparation of coordinate entries. For example, an explanation is included on how PDB represents structures with multiple chains. Also included is advice for avoiding errors commonly found in new depositions. Entries prepared following the Guidelines take less time to process, making it possible for us to issue ident codes more promptly.

PDB strongly recommends that depositors review the documented Guidelines in order to facilitate the deposition process. The Guidelines and latest version of the Deposition Form are available from FTP in the /pub subdirectory.
We are requesting that depositors discard all old versions of the Deposition Form (printed and/or electronic) and pick up the latest electronic version each time they are preparing data for deposition. Documents are available from FTP, Gopher and Listserver as well as in printed form upon request. ---------------------------------------------------------------------- Depositing Data with PDB PDB accepts depositions of biological macromolecule structures and the corresponding crystallographic structure factors or NMR experimental data. Types of structures accepted include proteins, carbohydrates, viruses, DNA and RNA. We convert deposited data to standard PDB format, run many verification, checking and quality control programs on the data, and archive and distribute the data worldwide. A deposition has three essential components, all of which must be received by the PDB before we can begin processing an entry: the completed Deposition Form, reprints and preprints of the referenced papers, and the data itself. The data must be submitted in machine readable form via one of the following: anonymous FTP in the directory new_uploads, e-mail, magnetic tape or floppy disk. You can obtain the latest Deposition Form, Format Description, and Guidelines for preparing entries from the FTP directory /pub, or from the PDB Listserver. Please use the latest version of the Deposition Form each time you make a deposit. We suggest that you edit the Deposition Form on-line and return it to us electronically. Please see additional information pertaining to depositing in the article entitled Deposition Contact Persons. Be sure to provide identifying information, including your name, address and telephone number, in the header of all files submitted. ---------------------------------------------------------------------- Assignment of Ident Codes Each entry in PDB is uniquely identified by a four-character ident code (also sometimes referred to as an accession code). 
Present PDB practice assigns ident codes without regard for the structure name. However, we recognize that many depositors would like to have ident codes that are related mnemonically to the names of their structures. Should you have a preference for a particular ident code, the PDB requests that you inform us about this on your Deposition Form. All reasonable suggestions will be considered. Of course, if the ident code that you are suggesting has already been used for an existing entry, then an alternative code will have to be assigned by PDB. ---------------------------------------------------------------------- Obtaining an Ident Code for a New Entry The ident code of a new entry will be issued only after the complete deposition is received and processing verifies the correctness and integrity of the entry. After processing is complete, a letter providing the ident code and requesting approval for release is sent to the depositor. To facilitate assignment of an ident code, it is necessary that the deposition include all applicable information, including preprints or reprints of journal articles referenced. Data must be in PDB format, and the Deposition Form must be legible and complete. ---------------------------------------------------------------------- Finding the Ident Code of an Existing Entry Each PDB entry is uniquely identified by an ident code. Therefore, retrieving the file for a particular structure requires this ident code. Lists of newly received entries currently are published quarterly in the PDB Newsletter and also in the Full Tables document. Tables of all PDB entries can be obtained from the FTP, Gopher and Listserver archives or by normal mail upon request. In FTP and Gopher are two subdirectories which are useful for locating ident codes. The first is /index. 
This contains the following files updated continuously to coincide with the current release:

     author.idx   - ident codes and authors
     compound.idx - ident codes and full compound names
     cmpd_res.idx - ident codes, resolutions and compound names
     crystal.idx  - ident codes, unit-cell dimensions, space groups and Z's
     resolu.idx   - ident codes and resolutions
     source.idx   - ident codes and biological sources

The second useful subdirectory is /newsletter. This contains text (.txt) and PostScript (.ps) files of the Full Tables document listing all currently available entries and pending entries that are in preparation for future release.

Retrieving files from these two subdirectories, as well as the directory listings which are in files named 'ls-lR', allows you to determine whether a molecule of interest is available, and its ident code. Please be aware that files can be downloaded while using FTP, but they cannot be viewed on the terminal while within this program. Therefore, it is sometimes helpful to download tables or directory listings, quit FTP, determine which additional files you want to retrieve and then reconnect to FTP to get them. If you are logged in from a UNIX computer, after retrieving a file you may view its contents by escaping to the shell with the command '!cat filename' or '!more filename', where cat and more are UNIX commands.

     ftp> get ls-lR      (retrieves the file named ls-lR)
     ftp> !cat ls-lR     (types the local file 'ls-lR')
     ftp> !more ls-lR    (types the local file 'ls-lR' one page at a time)

The PDB Listserver archives and Gopher are other methods to locate PDB ident codes. Please see detailed information about these services in other articles in this Newsletter.

Finally, in some cases, journal articles reporting results of structure analyses of biological macromolecules provide their PDB ident codes.

----------------------------------------------------------------------
CD-ROM Information

PDB releases are available on CD-ROM in ISO 9660 format.
The layout of files on the CD-ROM mirrors the PDB UNIX tape distribution and uses the aa, ab, ac, ..., zz subdirectories. The entry files are ASCII format and are readable by software able to process text files. PDB-Shell, a facility for Windows users to access and display structures from the PDB database, is available on our CD-ROM. PDB-Shell allows the user to search the database for various criteria such as ident code, accession date, compound name and author. The PDB CD-ROM also includes the MAGE and PREKIN structure display and manipulation software by David Richardson and Jane Richardson of Duke University [The Kinemage: A Tool for Scientific Communication. Protein Science 1, 3-9 (1992)] in both Windows and Macintosh versions. VAX/VMS systems currently do not directly support access to ISO 9660 CD-ROMs. The PDB CD-ROM may be accessed on VAX/VMS systems using either of the following approaches: 1. Obtain the ISO 9660 compliant device driver available from Digital Equipment Corporation (DEC) that allows direct access to the CD-ROM (driver part number YT-GS001-01). Please contact your DEC sales representative for further information. 2. Use a public utility for accessing ISO 9660 CD-ROMs called CD_ACCESS, written by Peter Stockwell, University of Otago, New Zealand, that will allow all the files on the CD-ROM to be copied to a magnetic disk drive. This utility can be obtained from the EMBL e-mail server (for additional information you may contact DataLib@EMBL-Heidelberg.DE). When copying files using CD_ACCESS, be sure to use the /BINARY qualifier to the copy command. The CD-ROM does not mount properly on Silicon Graphics systems running IRIX version 4.0.1. To resolve this problem, you need to upgrade to version 4.0.2 or higher. Because of ISO 9660 limitations on symbolic links, we were unable to provide a directory on the CD-ROMs pointing to all entry files in the subdirectories. 
Therefore, we recommend that you create such a directory, with links to the entry files, on one of your local disk filesystems. Further detailed instructions are included with each CD-ROM shipment.

----------------------------------------------------------------------
Affiliated Centers

Ten affiliated centers offer DATAPRTP information for distribution. These centers are members of the Protein Data Bank Service Association (PDBSA). Centers designated with an asterisk (*) may distribute DATAPRTP information both on-line and on magnetic or optical media; those without an asterisk are on-line distributors only.

CAN/SND
     Canadian Scientific Numeric Data Base Service
     Ottawa, Ontario, Canada
     Roger Gough (613-993-3294)
     cansnd@vm.nrc.ca

CAOS/CAMM
     Dutch National Facility for Computer Assisted Chemistry
     Nijmegen, The Netherlands
     Jan Noordik (31-80-653386)
     noordik@caos.caos.kun.nl

CINECA
     NE Italy Interuniversity Computing Center
     Casalecchio di Reno (BO), Italy
     Salvatore Rago (39-51-598411)
     argo@icineca.bitnet

EMBL
     European Molecular Biology Laboratory
     Heidelberg, Germany
     Peter Rice (49-6221-387-247)
     peter.rice@embl-heidelberg.de

*JAICI
     Japan Association for International Chemical Information
     Tokyo, Japan
     Hideaki Chihara (81-3-5978-3608)

NCSA
     National Center for Supercomputing Applications
     University of Illinois at Urbana-Champaign
     Champaign, Illinois
     Marcia Miller (217-244-0634)
     mmiller@ncsa.uiuc.edu

*Osaka University
     Institute for Protein Research
     Osaka, Japan
     Yoshiki Matsuura (81-6-879-8605)

Pittsburgh Supercomputing Center
     Pittsburgh, Pennsylvania
     Hugh Nicholas (412-268-4960)
     nicholas@cpwpsca.bitnet

SDSC
     San Diego Supercomputer Center
     San Diego, California
     Lynn Ten Eyck (619-534-8189)
     teneyckl@sdsc.bitnet

SEQNET
     Daresbury Laboratory
     Warrington, United Kingdom
     User Interface Group (44-925-603351)
     uig@daresbury.ac.uk

----------------------------------------------------------------------
To Contact PDB

     Protein Data Bank
     Chemistry Department, Building 555
     Brookhaven National Laboratory
     P.O. Box 5000
     Upton, NY 11973-5000 USA

     Telephone: +1 516-282-3629
     Facsimile: +1 516-282-5751
     e-mail:    pdb@bnl.gov or pdb@bnl.bitnet

Please include your telephone number, facsimile number, mailing address and e-mail address in all correspondence.

----------------------------------------------------------------------
PDB Staff

     Joel L. Sussman - Head
     David R. Stampf - Sr. Project Mgr.
     Enrique E. Abola - Science Coordinator
     Frances C. Bernstein
     Judith A. Callaway
     Minette Cummings
     Betty R. Deroski
     Pamela A. Esposito
     Arthur Forman
     Thomas F. Koetzle
     Patricia A. Langdon
     Michael D. Libeson
     Nancy O. Manning (Oeder)
     John E. McCarthy
     Regina K. Shea
     John G. Skora
     Karen E. Smith
     Dejun Xue

----------------------------------------------------------------------
Statement of Support

PDB is supported by a combination of Federal Government Agency funds (work supported by the U.S. National Science Foundation; the U.S. Public Health Service, National Institutes of Health, National Center for Research Resources, National Institute of General Medical Sciences and National Library of Medicine; and the U.S. Department of Energy under contract DE-AC02-76CH00016) and user fees.

----------------------------------------------------------------------