NEWHELIX Memo -- 1

22 June 1989

To:   All DNA Helix Analyzers
From: Richard E. Dickerson
Re:   Upgrading HELIB and MODHELIX to Cambridge Conventions

Last April I decided to revise the HELIB library of programs (AHELIX, BROLL, CYLIN and DTORAN) to bring them into agreement with the new Cambridge Conventions as published in EMBO Journal 8, 1-4, 1989, J. Biomol. Str. Dyn. 6, 627-634, 1989, Nucl. Acids Res., 5, 1797-1803, 1989, and J. Mol. Biol. 205, 787-791, 1989. I did so, but the more forward-looking members of my research group persuaded me to try using the MODHELIX program written at the Weizmann Institute by Dov Rabinovich, Klara Reich and Zippora Shakked. The experience made me a total convert.

MODHELIX combines AHELIX, BROLL, CYLIN and DTORAN into one integrated package that is far more convenient to use than the individual HELIB routines. Laboriously constructed lists of sequential numbers of specific atoms must be fed as input to the four programs of HELIB, and preparing and checking such lists constitutes the major time sink in using HELIB to analyze DNA structures. MODHELIX eliminates all of this. It reads in a six-character atom identification code: abbccc, where a = base type (A, C, G, T), bb = base sequence number, and ccc = atom code (C1', N9, O1P, etc.). It then uses this information to find its own atoms for the various calculations.

MODHELIX is so superior to HELIB that it should be used exclusively in all cases where a double helix is being analyzed. HELIB remains more flexible, though less convenient, and still has a role when unusual tasks are demanded, such as stacking the top of one helix against the bottom of another within the crystal to evaluate how well the stacking simulates a continuous helix. To use a graduate student analogy, MODHELIX is smart but opinionated, and unwilling to modify its predetermined course of action, whereas HELIB is dumb but cooperative.
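
For readers who want to see the six-character atom identification code in concrete terms, here is a minimal Python sketch of how such a code might be split into its three fields. It is an illustration only and is not taken from MODHELIX. The fixed-column layout (one character, then two, then three, with blank padding allowed) is an assumption, and the names AtomId and parse_atom_id are hypothetical.

```python
from typing import NamedTuple


class AtomId(NamedTuple):
    """Fields of the six-character code (hypothetical container)."""
    base: str   # a   = base type: A, C, G or T
    seq: int    # bb  = base sequence number
    atom: str   # ccc = atom code, e.g. C1', N9, O1P


def parse_atom_id(code: str) -> AtomId:
    """Split a six-character code 'abbccc' into base, number and atom.

    Assumption: fixed columns, with blanks padding short numbers or
    short atom codes. The real program's padding rules may differ.
    """
    if len(code) != 6:
        raise ValueError(f"expected 6 characters, got {len(code)}: {code!r}")
    base = code[0]
    if base not in "ACGT":
        raise ValueError(f"unknown base type {base!r} in {code!r}")
    seq = int(code[1:3])          # int() tolerates surrounding blanks
    atom = code[3:6].strip()      # drop padding around short atom codes
    if not atom:
        raise ValueError(f"missing atom code in {code!r}")
    return AtomId(base, seq, atom)


if __name__ == "__main__":
    for code in ("G12O1P", "A05N9 ", "C 3C1'"):
        print(repr(code), "->", parse_atom_id(code))
```

With the fields separated in this way, a program can look up the atoms it needs by base number and atom name instead of relying on hand-built lists of sequential atom numbers.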

Moreover, MODHELIX calculates useful functions that are missing from HELIB: the angles between glycosidic bonds and C1'-C1' base pair vectors, pseudorotation angles for sugar rings, and internal bond angles in the sugar rings. It also calculates mean values and standard deviations for most of the parameters tabulated.

I have edited MODHELIX to bring it into line with the Cambridge nomenclature conventions, both in the names of parameters and in some essential changes of sign. These sign changes have been checked extensively by comparisons with stereo pairs of A, B and Z-DNA helices, and I believe that they now are correct. I have also changed MODHELIX to eliminate a few programming glitches and make the output format a little more convenient.

This new version of MODHELIX has been labeled NEWHELIX. Its use is described in the accompanying instructions. If you send me a magnetic tape, I will be glad to send you a FORTRAN copy of NEWHELIX, along with hard copy documentation and examples of use.

**********

For reference, the specific changes made between MODHELIX and NEWHELIX are as follows:

1. Names of parameters have been changed to agree with the Cambridge Conventions:

   Old:  PHI/R  PHI/T  THET/R  THET/T  INCLIN  DISP   SLIP
   New:  TIP    INCL   ROLL    TILT    INCL*   X DSP  Y DSP

(INCL* differs from INCL in that INCL* is based on the inclination of the C6-C8 long base pair axis vector, whereas INCL is derived from the normals to planes through all atoms of the base pair. They are approximately equal.)

Signs have been reversed for INCL, ROLL, PR TW, BUCKLE, SLIDE and Y DSP (the latter two in both BROLL and CYLIN subroutines), in accordance with the Cambridge Conventions, and these sign changes have been verified by viewing stereo pairs.

2. The helix axis generating program has been modified so that Strand 1 of the double helix always rises along the z axis to more positive values, no matter what the signs on the input coordinates.
This is essential if correct signs are to be calculated for several of the output parameters.

3. The calculation of pseudorotation angle P has been corrected. (MODHELIX gave P centered around zero for both C3'-endo and C2'-endo conformations, because tan(P) is the same as tan(P+180), making the arctan function ambiguous.) Main chain torsion angle delta has been repeated alongside P to facilitate comparisons.

4. Main chain torsion angles and chi now are emitted in the range 0 to 360, rather than -180 to +180. This avoids an annoying break in values to either side of a trans torsion angle, and facilitates averaging. Epsilon - Zeta now is listed to facilitate location of BII phosphates.

5. NEWHELIX no longer requires that the sequences of Strands 1 and 2 of the double helix be identical, as MODHELIX and DTORAN had done. The only requirement now is that the two strands have the same number of bases. Hence NEWHELIX can be used with non-selfcomplementary helices such as G-G-A-T-G-G-G-A-G/C-T-C-C-C-A-T-C-C.

6. NEWHELIX now can handle purine-purine and pyrimidine-pyrimidine mispairs. The BROLL program as incorporated in MODHELIX assumed that, if the base on one strand was a purine, that on the other strand was a pyrimidine, and vice versa. This restriction has been removed, so one now can examine GA mispairs or the TT "base pairs" of the C-G-C-G-C-G-T-T-T-T-C-G-C-G-C-G hairpin.

7. The format-determining subroutine has been modified so that it will read the Brookhaven Data Bank files correctly, including ignoring all lines that do not begin with ATOM.... This means that one no longer has to delete dozens of lines, including HETATM and TER lines embedded in the atom coordinate list.

8. The layout format has been touched up here and there, among other things ensuring that the title, which identifies the sequence under analysis and the parameters being tabulated on that page, appears at the head of every single output page.
Hence you could drop multiple runs on the floor (if you were so inclined) and re-order them again without ambiguity. Two or three explanatory comments on the printed output sheets have been changed to reflect the new conventions.

**********

The Weizmann Institute improvements have made NEWHELIX a nearly ideal program--simple and rapid to use. I hope that NEWHELIX will become the "industry standard" after which the new generation of helix analysis programs will be patterned, and against which the ease of operation of the new programs will be judged. To this one particularly intensive user, the ideal new analysis program would be one that looked and acted exactly like NEWHELIX, but incorporated the improved computational algorithms that were discussed at Cambridge.

As mentioned before, I will be happy to send a magnetic tape copy of the NEWHELIX program to anyone who requests it. I will also be happy to send you a copy of the HELIB library with Cambridge Convention changes, in case you need it for special purpose computing.
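
As an illustration only, and not taken from the NEWHELIX source, the arctan ambiguity described in change 3 and the 0 to 360 range of change 4 can be sketched in Python. The memo does not say which formula NEWHELIX uses for P. The sketch assumes the standard Altona-Sundaralingam expression in terms of the five sugar ring torsions nu0 to nu4, and all function names are hypothetical.

```python
import math

# Constant from the Altona-Sundaralingam expression (assumed here).
_K = 2.0 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))


def _tan_parts(nu):
    """Numerator and denominator of tan(P) from ring torsions nu[0..4]."""
    num = (nu[4] + nu[1]) - (nu[3] + nu[0])
    den = _K * nu[2]
    return num, den


def pseudorotation_naive(nu):
    """Single-argument arctan: cannot tell P from P+180, so every
    result falls between -90 and +90 degrees (fails if nu[2] == 0)."""
    num, den = _tan_parts(nu)
    return math.degrees(math.atan(num / den))


def pseudorotation(nu):
    """Two-argument arctan uses the sign of nu[2] to pick the correct
    half-circle; the result is reported in the range 0 to 360."""
    num, den = _tan_parts(nu)
    return math.degrees(math.atan2(num, den)) % 360.0


def to_0_360(angle):
    """Map a torsion angle from -180..+180 onto 0..360 degrees."""
    return angle % 360.0


def ring_torsions(p_deg, nu_max=38.0):
    """Idealized ring torsions for a given P, used only as test input."""
    return [nu_max * math.cos(math.radians(p_deg + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    c2_endo = ring_torsions(162.0)                   # C2'-endo region
    print(round(pseudorotation_naive(c2_endo), 1))   # -18.0
    print(round(pseudorotation(c2_endo), 1))         # 162.0
    # Two nearly trans torsions on either side of the +/-180 break:
    print((175.0 + -175.0) / 2.0)                        # 0.0
    print((to_0_360(175.0) + to_0_360(-175.0)) / 2.0)    # 180.0
```

The last two lines show why the 0 to 360 range helps with averaging: two nearly trans angles average to 180 rather than to a meaningless 0.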