NAME

getPdbUpdate.pl


SYNOPSIS

A simple Perl utility for downloading the files from any given Protein Data Bank update date.

  getPdbUpdate.pl dates
  getPdbUpdate.pl latest
  getPdbUpdate.pl 20020708


DESCRIPTION


Overview

getPdbUpdate.pl is a simple Perl program for downloading all files from the Protein Data Bank (PDB) associated with a particular update date, including coordinate files, structure factors, and NMR restraints.


Requirements

getPdbUpdate.pl requires at least one of the following common download utilities. On most systems where Perl is installed, one or both of these utilities will already be present.

1) LWP::UserAgent, a Perl module for downloading files from the World Wide Web. LWP::UserAgent is part of the libwww-perl distribution, which is available from CPAN. Please see the link in the SEE ALSO section.

2) wget, a common Unix utility for downloading files from the World Wide Web. wget is also available for most Windows operating systems. Please see the link in the SEE ALSO section.
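To see which of the two utilities is present before running the program, a check along the following lines may help. This is only a sketch: the function names have_lwp, have_wget, and report_backends are illustrative and are not part of getPdbUpdate.pl.

```shell
#!/bin/sh
# Illustrative helpers (not part of getPdbUpdate.pl) that report which
# of the two download utilities appears to be available.

have_lwp() {
    # "perl -MModule -e 1" exits non-zero if the module cannot be loaded
    perl -MLWP::UserAgent -e 1 >/dev/null 2>&1
}

have_wget() {
    # "command -v" succeeds only if wget is found on the PATH
    command -v wget >/dev/null 2>&1
}

report_backends() {
    if have_lwp; then
        echo "LWP::UserAgent: found"
    else
        echo "LWP::UserAgent: not found"
    fi
    if have_wget; then
        echo "wget: found"
    else
        echo "wget: not found"
    fi
}

report_backends
```

If both lines report "not found", install one of the two utilities from the locations given in the SEE ALSO section.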


Usage

The program can be run in three different modes as follows:

1) Get a list of valid update dates:

  getPdbUpdate.pl dates

In this mode, the program will only retrieve and print a list of all valid update dates.

2) Get the files from the latest update:

  getPdbUpdate.pl latest

In this mode, the program will retrieve all files associated with the latest update. The files will be saved in some or all of the following directories underneath your current working directory:

  <yyyymmdd>/added
  <yyyymmdd>/modified
  <yyyymmdd>/obsolete
  <yyyymmdd>/models_added
  <yyyymmdd>/models_modified
  <yyyymmdd>/models_obsolete

3) Get the files from any given update:

  getPdbUpdate.pl <yyyymmdd> 

  e.g.:
  getPdbUpdate.pl 20020708  

In this mode, the program will retrieve all files associated with the specified update date. Again, the files will be saved in the same directory structure, e.g.:

  20020708/added
  20020708/modified
  20020708/obsolete
  20020708/models_added
  20020708/models_obsolete
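The three modes and the resulting directory layout can be sketched in shell as follows. The helpers classify_mode and list_update_dirs are hypothetical and are not part of getPdbUpdate.pl. The date check covers only the yyyymmdd format; whether a date is an actual update date can only be learned from the "dates" mode.

```shell
#!/bin/sh
# classify_mode is a hypothetical helper, not part of getPdbUpdate.pl.
# It prints which of the three documented modes an argument selects.
classify_mode() {
    case "$1" in
        dates)  echo "dates" ;;
        latest) echo "latest" ;;
        # exactly eight digits: format check only, not a validity check
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) echo "date" ;;
        *)      echo "invalid" ;;
    esac
}

# list_update_dirs is likewise hypothetical. It prints those documented
# subdirectories that actually exist for one update date.
list_update_dirs() {
    for sub in added modified obsolete \
               models_added models_modified models_obsolete; do
        if [ -d "$1/$sub" ]; then
            echo "$1/$sub"
        fi
    done
}

classify_mode 20020708      # prints "date"
list_update_dirs 20020708   # prints whichever directories were created
```

Because only some of the directories may be created for a given update, list_update_dirs tests for each one rather than assuming all six exist.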


Notes

1) Please note that this program will write some or all of the following additional files relative to your current working directory:

  ls-lR
  wgettest.temp 
  <yyyymmdd>/added.pdb
  <yyyymmdd>/added.sf
  <yyyymmdd>/added.nmr
  <yyyymmdd>/modified.pdb
  <yyyymmdd>/modified.sf
  <yyyymmdd>/modified.nmr
  <yyyymmdd>/obsolete.pdb
  <yyyymmdd>/obsolete.sf
  <yyyymmdd>/obsolete.nmr
  <yyyymmdd>/models_added.pdb
  <yyyymmdd>/models_modified.pdb
  <yyyymmdd>/models_obsolete.pdb

If any of these files are present prior to running this program, they will be overwritten! A safe option would be to create a new directory first, and then run getPdbUpdate.pl, e.g.:

  mkdir newpdbfiles
  cd newpdbfiles
  getPdbUpdate.pl latest

2) If a file listed as added (e.g. pdb1f5d.ent.Z from added.pdb) was obsoleted after the specified update date (e.g. 20001212), the program will issue a warning and save the file in the obsolete directory (20001212/obsolete) on local disk instead of the added directory.

3) As of 1 July 2002, all theoretical models were moved to a dedicated area of the PDB FTP archive, and effectively all access to them was removed from the PDB Web interface. If a file (e.g. pdb1pln.ent.Z) represents a theoretical model, the program will issue a warning, and save the file in one of the models directories on local disk.

4) By default, the program will access the main PDB FTP servers at SDSC. Alternatively, another mirror FTP site may be specified as a second command line argument, e.g.:

  getPdbUpdate.pl latest ftp://rutgers.rcsb.org/PDB/pub/pdb/
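As an alternative to the fresh directory suggested in note 1, the current directory can be checked for files that would be overwritten. The following is a minimal sketch; check_clean_dir is a hypothetical helper, not part of getPdbUpdate.pl. It always checks the two top-level files named in note 1 and accepts further paths (such as an update date directory) as arguments.

```shell
#!/bin/sh
# check_clean_dir is a hypothetical helper, not part of getPdbUpdate.pl.
# It reports paths in the current directory that a run could overwrite
# and returns non-zero if any of them already exist.
check_clean_dir() {
    found=0
    for f in ls-lR wgettest.temp "$@"; do
        if [ -e "$f" ]; then
            echo "would be overwritten: $f"
            found=1
        fi
    done
    return "$found"
}

if check_clean_dir 20020708; then
    echo "no conflicting files found"
else
    echo "consider running from a new, empty directory"
fi
```

The check covers only the paths it is given, so it is no substitute for the new-directory approach when in doubt.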


VERSION

This documentation refers to version 1.1 of getPdbUpdate.pl.

Version history:

  Version  Date        Comments
  1.1      2002-09-13  Moved from LWP::Simple to LWP::UserAgent
                       to support proxies (e.g. from behind a firewall)
  1.0      2002-07-16  First release


AUTHOR

Wolfgang Bluhm ( mail@wbluhm.com ) for the Protein Data Bank ( info@rcsb.org )


BUGS

1) Not really a bug, but if your perl location happens to differ from /usr/local/bin/perl, simply run the program as

  perl getPdbUpdate.pl latest

2) Be aware that only the mirror method of LWP::UserAgent, and not wget, preserves the original time stamps of the files being downloaded. getPdbUpdate.pl is intended only for your personal use, and hence this limitation may be of little consequence to you. Please note that files downloaded by getPdbUpdate.pl should not be served to the public through any kind of mirror site.
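For item 1 above, the interpreter path named on the first line of the script can be compared with the local system. The sketch below assumes the script sits in the current directory under the name getPdbUpdate.pl; shebang_interpreter is a hypothetical helper, not part of the program.

```shell
#!/bin/sh
# shebang_interpreter is a hypothetical helper, not part of
# getPdbUpdate.pl. It prints the interpreter path from a "#!" first
# line, or nothing if the file does not start with "#!".
shebang_interpreter() {
    sed -n '1s/^#![[:space:]]*\([^[:space:]]*\).*/\1/p' "$1"
}

# Assumption: the script is in the current directory.
script=getPdbUpdate.pl
if [ -f "$script" ]; then
    interp=$(shebang_interpreter "$script")
    if [ -n "$interp" ] && [ ! -x "$interp" ]; then
        echo "$interp not found; run: perl $script latest"
    fi
fi
```

When the named interpreter is missing, invoking the script through perl explicitly, as shown in item 1, sidesteps the first line altogether.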


SEE ALSO

  http://www.rcsb.org/pdb/ -- Protein Data Bank (PDB) home page
  ftp://ftp.rcsb.org/pub/pdb -- PDB FTP site
  http://www.rcsb.org/pdb/cgi/resultBrowser.cgi?Date::update=1 -- Last PDB Update
  ftp://ftp.rcsb.org/pub/pdb/software -- download page for this script and documentation

  http://www.cpan.org/modules/by-module/LWP/ -- libwww-perl download page
  http://www.gnu.org/software/wget/wget.html -- wget home page


COPYRIGHT

                            Copyright 2002
               The Regents of the University of California
                          All Rights Reserved

 Permission to use, copy, modify and distribute any part of this PDB
 software for educational, research and non-profit purposes, without fee,
 and without a written agreement is hereby granted, provided that the above
 copyright notice, this paragraph and the following three paragraphs appear
 in all copies.
 
 Those desiring to incorporate this PDB Software into commercial products
 or use for commercial purposes should contact the Technology Transfer
 Office, University of California, San Diego, 9500 Gilman Drive, La Jolla,
 CA 92093-0910, Ph: (858) 534-5815, FAX: (858) 534-7345.
 
 In no event shall the University of California be liable to any party for
 direct, indirect, special, incidental, or consequential damages, including
 lost profits, arising out of the use of this PDB software, even if the
 University of California has been advised of the possibility of such
 damage.
 
 The PDB software provided herein is on an "as is" basis, and the
 University of California has no obligation to provide maintenance,
 support, updates, enhancements, or modifications.  The University of
 California makes no representations and extends no warranties of any kind,
 either implied or express, including, but not limited to, the implied
 warranties of merchantability or fitness for a particular purpose, or that
 the use of the pdb software will not infringe any patent, trademark or
 other rights.