NEWHELIX INSTRUCTIONS

Richard E. Dickerson
Molecular Biology Institute
University of California
Los Angeles, CA 90024, U.S.A.

Version of 22 June 1989

NEWHELIX is a modest revision of MODHELIX, the helix analysis program written at the Weizmann Institute of Science by Dov Rabinovich, Klara Reich, and Zippora Shakked, to bring its nomenclature and signs of parameters into agreement with the 1988 Cambridge Conventions. MODHELIX itself combined into one master program the four routines of the HELIB library: HELIX by John M. Rosenberg, and BROLL, CYLIN and DTORAN by R. E. Dickerson. Whereas the latter three programs required the input of long lists of sequential numbers to identify atoms needed in the calculations, NEWHELIX (and MODHELIX) read a six-character atom identifier code along with x, y and z coordinates, and use this code to find the needed atoms in the input list. This code is of the form abbccc, where:

   a   = type of base (A, C, G, T)
   bb  = base sequence number
   ccc = type of atom (C4', N7, P, O1P, etc.)

NEWHELIX is most useful in analyzing double helices, whether self-complementary or with differing strand sequences. But for special-purpose calculations, such as considering stacked helices in a crystal as a continuous helix and evaluating helix parameters across the junction, the HELIB routines remain more flexible and easier to use. In NEWHELIX, once subroutine HELIX has been used to generate a coordinate set with the helix axis ascending along z, the other principal subroutines BROLL, CYLIN and TORANG can be run separately or in any combination.

NEWHELIX calculates all quantities that were produced by HELIB, plus pseudorotational parameters and interior bond angles for sugar rings, and angles between glycosyl bonds and C1'-C1' vectors of base pairs. It also computes mean values and standard deviations of most parameters.
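
The six-character atom identifier code described above can be illustrated with a short Python sketch. It is not part of NEWHELIX. The names AtomId and parse_atom_id are hypothetical, and the padding of short codes with trailing blanks is an assumption made for the example.

```python
from typing import NamedTuple


class AtomId(NamedTuple):
    base: str    # base type: A, C, G or T
    number: int  # base sequence number
    atom: str    # atom type, e.g. "C4'", "N7", "P", "O1P"


def parse_atom_id(code: str) -> AtomId:
    """Split an 'abbccc' code into base type, sequence number and atom type."""
    code = code.ljust(6)  # assumption: short codes are padded with blanks
    if len(code) != 6:
        raise ValueError(f"expected at most six characters, got {code!r}")
    base = code[0]
    if base not in "ACGT":
        raise ValueError(f"unknown base type {base!r}")
    number = int(code[1:3])   # int() tolerates a leading or trailing blank
    atom = code[3:6].strip()  # drop blank padding around short atom names
    return AtomId(base, number, atom)


print(parse_atom_id("G10C1'"))
print(parse_atom_id("A 5P"))
```

Looking atoms up by such a key, instead of by position in the list, is what removes the need for the long lists of sequential atom numbers required by the older HELIB routines.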

The following input/output files are used:

   Instruction card input                      Logical input file 5
   Input atomic coordinate list                Logical input file 11
   Output of helix analysis tables             Logical output file 6
   Output atomic coordinates:
      Format (3F10.5,11X,2A4,5X,I5)            Logical output file 12
      Konnert-Hendrickson format               Logical output file 15

INSTRUCTION CARDS

These cards supply the program with information about unit cell dimensions, atom list format, etc., and define which calculations are to be carried out. The style of input parameters is similar to that of SHELX. The first four characters on each card define the type of instruction. If the program does not recognize them, or if the first four characters are blank, the contents of the card are ignored. Such a card can be used as a comment card. Alternatively, one can have several cards of the same type (e.g., for different input formats), activating one by moving it flush left and inactivating the others by spacing them four or more places to the right.

Following the first four spaces, the remaining 72 spaces on a card are used to convey numerical information. Characters '0' to '9', '-' and '.' are always assumed to be part of a number. Any other characters except '=' may be used to separate two numbers, provided that the instruction does not include alphabetic information (e.g. HELX). Thus the following two cards are exactly equivalent in action:

   CELL 7.64 8.39 13.00 90.0 103.7 90.0
   CELL a 7.64, b 8.39, c 13, ALPHA90BETA103.7GAMMA90

But do not use 'a=7.64'; the '=' has a special function. To continue a line onto the next card, end the line with '=', and indent the following card by four blank spaces. The '=' at the end of the preceding card causes the following card to be interpreted as a continuation card, rather than being ignored.

Note that if an expected number is not found, it will either be given a default setting (not necessarily zero), or if there is no default setting, will be treated as an error.
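
The number-reading rule for instruction cards can be sketched in Python as follows. This is an illustration only, not the program's own parser. The function name parse_card and the regular expression are assumptions, the keyword set is simply collected from the cards named in these instructions, and '=' continuation cards and alphabetic cards such as TITL or HELX are not handled.

```python
import re

# Card types named in these instructions (collected here for illustration).
KNOWN_CARDS = {
    "TITL", "CELL", "FFMT", "KONN", "CORL", "BRKH", "FLGP", "FPUN", "PMIN",
    "PMAX", "NATM", "BASE", "HELX", "HLX2", "BROL", "CYLN", "TRNG", "END",
}

# A number is a run of digits with an optional sign and decimal point;
# every other character acts as a separator.
NUMBER = re.compile(r"-?\d+(?:\.\d*)?|-?\.\d+")


def parse_card(card: str):
    """Return (card type, numbers), or None for an ignored (comment) card."""
    keyword = card[:4].strip()
    if keyword not in KNOWN_CARDS:  # blank or unrecognized: card is ignored
        return None
    numbers = [float(tok) for tok in NUMBER.findall(card[4:76])]
    return keyword, numbers


first = parse_card("CELL 7.64 8.39 13.00 90.0 103.7 90.0")
second = parse_card("CELL a 7.64, b 8.39, c 13, ALPHA90BETA103.7GAMMA90")
print(first == second)                    # the two CELL cards are equivalent
print(parse_card("    CELL 1 2 3"))       # shifted right: treated as comment
```

Because letters and commas are only separators, labels such as "ALPHA" can be mixed freely with the numbers, which is why the two CELL cards give the same result.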

In the description below, '*' denotes cards which, if present, must appear in the sequence given below. All other cards may appear in any order, except that the last card must be 'END'. The obligatory minimum set of cards is: 'TITL', 'CELL', one of the cards specifying which of the subroutines are to be executed, one card for atom input format ('FFMT', 'CORL', 'KONN' or 'BRKH'), and 'END'.

* TITL (Followed by a title of up to 72 characters)

* CELL a, b, c, alpha, beta, gamma
     (in Angstroms, and in degrees or cosines of the angles.)

* Atom Input Coordinate Format Card

Four options are available, reading from logical unit 11:

(1) FFMT (72 characters specifying format of input file)

Parameters are read in the following sequence:

   TEST: The first four columns of an input data card, acted upon only in the Brookhaven input option. The Brookhaven option bypasses an input card unless it begins with ATOM. All other input format options simply read the TEST columns and then ignore them.

   X, Y, Z coordinates

   Atom identification code: a bb ccc, where:
      a   = Base type: A, C, G or T (read with A1)
      bb  = Base sequence number (read with 2A1)
      ccc = Atom identification--e.g. C1', O2P, or N9 (read with 3A1)
      (Warning: Label phosphate oxygens as O1P and O2P, not OL and OR.)

The T format is extremely useful in building the FFMT statement. 'Tn' means "position the index pointer to column 'n' for future action". If the input file is of the form:

   XXXX.XXXYYYY.YYYZZZZ.ZZZ......a.....bb...ccc

it can be read by:

   FFMT (A4,T1,3F8.3,6X,A1,5X,2A1,3X,3A1)

or by:

   FFMT (A4,T1,3F8.3,T31,A1,T37,2A1,T42,3A1).

Format 'A4' reads the first four columns as TEST, and T1 sets the pointer back to the first column again for reading coordinates. But a different order of data:

   bb....XXXX.XXXYYYY.YYYZZZZ.ZZZ...ccc..a

can only be read with the aid of the T format:

   FFMT (A4,T7,3F8.3,T39,A1,T1,2A1,T34,3A1).

(2) KONN (72 blanks)
     Causes atom coordinates to be read in the Konnert-Hendrickson format used in NUCLSQ (not the PROLSQ format)--i.e.: (A4,T19,3F10.5,T9,6A1).

(3) CORL (72 blanks)
     Causes coordinates to be read in Corels format: (A4,T16,3F10.5,T4,A1,T10,5A1)

(4) BRKH (72 blanks)
     Causes coordinates to be read in Brookhaven Data Bank format: (A4,T31,3F8.3,T20,A1,T25,2A1,T14,3A1)

FLGP (default 0)
     If a FLGP card is present, coordinates generated by HELIX are included along with helix parameter tables in the output listing on logical file 6.

FPUN (default 0)
     If present, causes coordinates to be written on logical units 12 and 15. This card is necessary if BROLL, CYLIN or TORANG are to be run, because they use the file 12 coordinate list. Files 12 and 15 can be deleted at the very end, if desired, by a $ DEL command as shown on the sample input.

     As presently written in the 'PUTATM' routine, the program writes onto logical unit 12 the Cartesian coordinates for all even powers of the helix operator as specified by PMIN and PMAX. For example, in order to create coordinates of a continuous helix based on input coordinates of a dodecamer, one may use the powers 0, 12, 24, etc. For an octamer the corresponding powers are 0, 8, 16, etc.

* PMIN n (default 0)
     Number n is the minimum even power of the helix operator. (Applies only to options HELX or HLX2.)

* PMAX n (default 0)
     Number n is the maximum even power of the helix operator. (Applies only to options HELX or HLX2.)

* NATM n (default: all atoms read in)
     Number n limits the atom coordinate read-in to the first n atoms in the list. When BRKH rejects a card because it does not begin with ATOM, it also does not count it toward the total of n atoms. The present array limits allow 996 input atoms and 200 helix vectors.

* BASE n
     Number n is the number of base pairs in the double helix.

HELX cards

Construct best helix using:

   HELX (72 blanks)          C1' atoms only
   HELX C1' C1'              C1' atoms
   HELX RN9 YN1              Purine N9 and pyrimidine N1 atoms
   HELX (any atom names)     The specified atoms

Note: Several consecutive HELX cards can be used to combine sets of the above atoms. In particular, the pair:

   HELX C1' C1'

followed by:

   HELX RN9 YN1

will cause a helix axis to be generated using C1' and N9 of purines and C1' and N1 of pyrimidines, probably the most generally useful combination. More precision can be gained by using the HLX2 card instead of HELX:

HLX2 Cards--Alternative to HELX cards

   HLX2 a, b, c, d, e, f, g, h,.... (where a--h are atom numbers)

This causes the helix to be defined by the vectors from atom pairs a to b, c to d, e to f, g to h, etc. Stepping along a row of sequential atoms down the helix is achieved by repeating atom numbers. If successive atom numbers along the helix are a, b, c, d, e, f, etc., the proper card entry would be:

   HLX2 a, b, b, c, c, d, d, e, e, f,....

Commas above are optional; a blank between numbers is sufficient. The easiest way to obtain numbers of the atoms is to run the program first with cards:

   HELX C1' C1'

and:

   HELX RN9 YN1.

The output from this run will list all of the atom-atom vectors by numbers, and certain of those pairs then can be selected for the second run.

You can use a combination of any number of HELX cards and one (only) HLX2 card. HELX then invokes all atoms of the type specified, and HLX2 adds other specific atoms. Only one HLX2 can be used, as a second one would override the first. Extend the first HLX2 card via '=' continuations instead.

BROL     Runs the BROLL program (see description below)
CYLN     Runs the CYLIN program (see description below)
TRNG     Runs the TORANG program (see description below)
* END    Mandatory final card

SAMPLE INPUT AND OUTPUT

Attached are three sample input files, A, B and C, and the output that results with input C. They illustrate the use of alternative input command cards (BRKH vs.
FFMT), alternative specifications of the atom pairs used in defining the helix axis (HELX vs. HLX2), and the inactivation of command cards by shifting them four or more spaces to the right. Typed comments in parentheses are only explanatory.

COMMENTS ON INDIVIDUAL SUBROUTINES

The helix parameters calculated by NEWHELIX are discussed in Fratini et al. (1982) J. Biol. Chem. 257, 14686-14707, and in the appendix to Jurnak and McPherson (eds) (1985) Biological Macromolecules and Assemblies: Vol. 2, Nucleic Acids and Interactive Proteins, Wiley, New York, pp. 471-494. The new nomenclature and definitions of the Cambridge Conventions have been published in EMBO Journal 8, 1-4 (1989), J. Biomol. Str. Dyn. 6, 627-634 (1989), Nucl. Acids Res. 17, 1797-1803 (1989) and J. Mol. Biol. 205, 787-791 (1989), and are assumed from here on to have been read. A copy is attached, and reference will be made to its Figures.

I. HELIX

This is the helix-generating program by John M. Rosenberg, as revised by Horace R. Drew in 1980. Present array limits as set by the Weizmann group in November 1988 allow 996 input atoms and 200 helix vectors. As incorporated in NEWHELIX, HELIX now always generates helix coordinates in which Strand 1 ascends the helix axis toward greater z values, no matter what the orientation of the input helix coordinates. This convention is essential if all of the signs of parameters calculated later are to be consistent. (In earlier uses of HELIB, Strand 1 was chosen to descend the z axis. All of the alterations of signs listed below have been made in order to be consistent with the new, ascending-z convention.)

II. BROLL

This program, by R. E. Dickerson, uses the output coordinate listings from HELIX (on logical unit 12) to calculate direction cosines and corresponding angles for the normals to all base planes, and to the best plane through both bases of a pair.
It then calculates helix parameters that depend on base plane normals, or on the orientation of the base pair long axis, defined by the line between the C6 of a pyrimidine and the C8 of a purine. (Note: MODHELIX assumed that if one base of a pair was a purine, the other had to be a pyrimidine. NEWHELIX looks at the identity of each base individually, and therefore allows for purine/purine or pyrimidine/pyrimidine mispairings.) Parameters calculated include (see Figures 7 and 8):

1. TIP and INCLination angles for individual bases and for base pairs (old PHI/R and -PHI/T). TIP is positive for right-hand rotation about a C6-C8 vector along the base pair long axis from Strand 2 to Strand 1 (the +y axis). INCL is positive for right-hand rotation about a vector from the helix axis toward the major groove (the +x axis). INCL* is similar to INCL, but defined as the angle that the C6-C8 long axis makes with a plane normal to the helix axis, rather than in terms of all of the atoms of a base or base pair. (The sign of INCL has been reversed to bring it into agreement with the Cambridge Convention and with INCL*.) INCL is positive for A-DNA.

2. ROLL and TILT angles between adjacent bases and between adjacent base pairs along the helix (old -THET/R and THET/T). ROLL is positive if the roll angle between base or base pair planes opens toward the minor groove. TILT is positive if the tilt angle between bases or base pairs opens toward Strand 1. (The sign of ROLL was automatically reversed by the new ascending-z convention for Strand 1, but has been reversed back again to bring it into agreement with past practice and the Cambridge Conventions.)

3. Propeller twist (PR TW) between bases of a pair, negative for clockwise rotation of the nearer base in a view down the long axis.
(This is a reversal of the former sign choice, but has been made by the Cambridge Conventions because the old positive sense of PR TW would have violated the standard IUPAC right-hand rule for torsion angle signs.) If TIP1 and TIP2 are values for individual base tip on Strands 1 and 2 of a pair, then it is approximately true that:

   PR TW = TIP1 - TIP2.

4. BUCKLE, which is the dihedral angle between bases along their short axis, after propeller twist has been rotated back to zero. BUCKLE is positive if the base pair domes in the 5'-to-3' chain direction of Strand 1 of the helix. If INCL1 and INCL2 are values for individual base inclination on Strands 1 and 2 of a pair, it is approximately true that:

   BUCKLE = INCL2 - INCL1.

5. SLIDE, the relative displacement of midpoints of the C6-C8 line for two adjacent base pairs, viewed in projection on a plane midway between the two base pairs. It therefore measures relative lateral displacement from one base pair to the next, and is totally independent of the choice of helix axis. An analytical expression for SLIDE is found in the appendix to Jurnak and McPherson. SLIDE is positive if the second base pair is shifted more toward Strand 1 than was the first base pair.

6. X and Y displacement (X DSP and Y DSP). These measure the displacement of the midpoint of the C6-C8 line from the helix axis, in a direction perpendicular to and parallel to the C6-C8 line respectively (see Figure 8). X DSP is positive if the base pair is moved away from the helix axis in the direction of the major groove (making the helix axis run down the minor groove). X DSP is positive for Z-DNA, nearly zero for B-DNA, and negative for A-DNA. Y DSP is positive if the base pair is slid along its long axis toward Strand 1 of the helix.

7. C6/C8 is the distance between pyrimidine C6 and purine C8 atoms, viewed in projection down the helix axis.
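
Two of the simpler quantities in the list above, the projected C6/C8 distance and the magnitude of the long-axis inclination that underlies INCL*, can be sketched in Python. This is an illustration and not the BROLL code. It assumes coordinates already transformed by HELIX so that the helix axis lies along z. The function names are hypothetical, and the sign convention of INCL* is deliberately left out, so only the unsigned angle is returned.

```python
import math


def projected_c6_c8(c6, c8):
    """Distance between C6 and C8 viewed in projection down the z (helix) axis."""
    return math.hypot(c8[0] - c6[0], c8[1] - c6[1])


def long_axis_inclination(c6, c8):
    """Unsigned angle (degrees) between the C6-C8 line and the plane normal to z."""
    dz = c8[2] - c6[2]
    in_plane = math.hypot(c8[0] - c6[0], c8[1] - c6[1])
    return math.degrees(math.atan2(abs(dz), in_plane))


# Made-up coordinates in Angstroms, for illustration only.
c6 = (0.0, 0.0, 0.0)
c8 = (3.0, 4.0, 5.0)
print(projected_c6_c8(c6, c8))
print(long_axis_inclination(c6, c8))
```

Assigning the sign of the angle requires knowing which strand each atom belongs to and where the major groove lies, which is why it is omitted from this sketch.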

***** Special note for Z-DNA: The left-handed helix sense means that the signs for INCL, INCL*, and X DSP as printed in the NEWHELIX output must be reversed. The sign of Y DSP and all other signs are correct.

III. CYLIN

This program uses the output from HELIX to calculate various helix parameters that depend on cylindrical coordinates of sugar C1' and phosphate P atoms. Parameters calculated include:

1. R, PHI and Z: Cylindrical polar coordinates of the phosphorus atoms. As mentioned earlier, Z always increases along Strand 1 and decreases along Strand 2.

2. D  = Distance between two successive P along one strand.
   Q  = Component of D in plane normal to helix axis.
   H  = Component of D along helix axis.
   PI = Local pitch angle = arcsin(H/D).

3. Single-strand rotations and rise relative to helix axis:
   S5" = Helical rotation semiangle from P past O5' to C1' in a 5'-to-3' direction.
   S3" = Helical rotation semiangle from C1' past O3' to P.
   [R(P) = Helical rotation from one P to the next = S5"+S3"]. Note: This quantity had little use, and has been replaced by:
   Q(C1") = Distance from one C1' atom to the next along one strand, measured in projection on a plane perpendicular to the helix axis. (Measures base overlap.)
   T(C1") = Helix twist angle from one C1' to the next along one strand = S3"+ following S5".
   H(C1") = Vertical rise along helix axis from one C1' to the next along one strand.

4. Global TWIST and RISE:
   TWIST = Angle between C1'-C1' vectors of two successive base pairs, viewed in projection down the helix axis.
   RISE  = Mean of the rise between successive C1' atoms on the two ends of the base pairs.
   Note that TWIST and RISE are properties of the double helix, whereas all other quantities are properties of one individual helix strand or the other.

5. SLIDE, X DSP, Y DSP and C1/C1, defined as in the BROLL program, but now using C1' atom positions instead of C6 and C8. As with the BROLL program, for Z-DNA the sign of X DSP must be reversed; that of Y DSP is correct as printed.

6.
LAMBDA, the angles between C1'-N1 or C1'-N9 glycosidic bonds and the base pair C1'-C1' line. This quantity has been used by Kennard and coworkers in the study of mispaired bases.

7. All reduced P-P distances in the double helix. These are the true P-P distances, decreased by 5.8 A to approximate two van der Waals phosphate group radii, and are of particular utility in examining the effective opening widths in major and minor grooves. For an N-base-pair double helix, phosphorus P2 to PN of Strand 1 run from left to right across the table, and phosphorus P(N+2) to P(2N) of Strand 2 run down the table. The minor groove widths are marked on the illustrative example.

IV. TORANG

This program calculates main chain torsion angles and glycosyl angles. It has been extended by the Weizmann group to calculate sugar angles and pseudorotation angles as well. It uses output coordinate listings from HELIX. Angles are named in accordance with IUB/IUPAC recommendations:

   P------O5'------C5'------C4'------C3'------O3'------P
    alpha     beta    gamma    delta    epsilon   zeta

   For pyrimidines, chi is defined by: O4'--C1'--N1--C2
   For purines, chi is defined by:     O4'--C1'--N9--C4

In NEWHELIX, all main chain and glycosyl torsion angles now are emitted in the range of 0 to 360 degrees, rather than the previous -180 to +180. That old range had been one of the mistakes of HELIB, because it led to a discontinuity exactly in the middle of values for a trans torsion angle, and made averaging unnecessarily tedious. Having the range end at 0 and 360 is harmless because torsion angles around 0 are almost never encountered. The new, all-positive angle values should save everyone a lot of work.

Torsion angles are printed in a 5'-to-3' direction along each strand. Hence the first base in the Strand 1 listing is associated with the last base in the Strand 2 listing, etc. At the right of the Strand 1 table, DEL is the difference between delta values at the two ends of that particular base pair.
At the right of the Strand 2 table, MEAN is the average of deltas at two ends of the base pair in question. (Note that, for the first base pair of a helix as measured on Strand 1, the DEL value will be at the top of the Strand 1 column, and the MEAN value will be at the bottom of the Strand 2 column.)

The difference between torsion angles epsilon and zeta is listed at the far right of the table. This difference is useful in identifying BII phosphates.

In both of the earlier routines DTORAN and MODHELIX, the atom order in Strands 1 and 2 of the helix was required to be the same, as the locations of atoms in Strand 2 were found by adding a constant (the number of atoms in one strand) to the locations in Strand 1. This computational shortcut meant that if the sequences of the two strands were not identical, erroneous torsion angles would result in Strand 2. This limitation now has been removed in NEWHELIX. The only requirement is that Strands 1 and 2 have the same number of bases. Hence NEWHELIX can be used with non-self-complementary helices.

Sugar ring pseudorotation angles V0 through V4 and the phase angle P (Pseud.) are calculated as in Altona and Sundaralingam (1972), J. Amer. Chem. Soc. 94, 8205-8212 or page 20 of Saenger's "Principles of Nucleic Acid Structure", Springer-Verlag, 1983. Torsion angle delta is repeated alongside P for ease in making comparisons. P and delta can be related theoretically by:

   Delta = 40 cos(P + 144) + 120     (angles in degrees)

P is centered around 0 for C2'-exo/C3'-endo conformations, and around 180 for C2'-endo/C3'-exo. Sugar ring internal angles are listed in a 5'-to-3' direction for each strand.

REFERENCE SUMMARY OF PROGRAM MODIFICATIONS

    8 May 89   Correction of sign of Y DSP
   20 Jun 89   Q(C1") replaces R(P) in CYLIN subroutine.
   22 Jun 89   (Epsilon-Zeta) added to TORANG subroutine.
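
As a small numerical illustration of two points from the TORANG section (the 0 to 360 degree output range, and the theoretical relation between P and delta), the following Python sketch may help. It is not taken from the program, and the function names to_0_360 and delta_from_phase are hypothetical.

```python
import math


def to_0_360(angle_deg: float) -> float:
    """Map a torsion angle from the -180..+180 range onto 0..360 degrees."""
    return angle_deg % 360.0


def delta_from_phase(p_deg: float) -> float:
    """Theoretical delta from the phase angle P: 40 cos(P + 144) + 120."""
    return 40.0 * math.cos(math.radians(p_deg + 144.0)) + 120.0


# Two trans torsion angles on either side of the old +/-180 discontinuity.
old_values = [170.0, -170.0]
new_values = [to_0_360(a) for a in old_values]
print(sum(old_values) / 2)   # naive mean in the old range
print(sum(new_values) / 2)   # naive mean in the 0..360 range

print(delta_from_phase(18.0))    # P near 0 (C3'-endo region)
print(delta_from_phase(162.0))   # P near 180 (C2'-endo region)
```

In the old range the two trans angles average to 0, which is meaningless, whereas after conversion they become 170 and 190 and average to 180. This is the averaging problem that the all-positive range avoids.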