This is RNAlib.info, produced by makeinfo version 4.13 from RNAlib.texinfo.  File: RNAlib.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir) This file documents the Vienna RNA Package Version 1.6 Copyright (C)1994-2002 Ivo Hofacker and Peter Stadler * Menu: * Introduction:: * Folding Routines:: Functions for Folding RNA Secondary Structures * Parsing and Comparing:: Functions to Manipulate Structures * Utilities:: Odds and Ends * Example:: A Small Example Program * References:: * Function Index:: * Variable Index::  File: RNAlib.info, Node: Introduction, Next: Folding Routines, Prev: Top, Up: Top 1 Introduction ************** The core of the Vienna RNA Package is formed by a collection of routines for the prediction and comparison of RNA secondary structures. These routines can be accessed through stand-alone programs, such as RNAfold, RNAdistance etc., which should be sufficient for most users. For those who wish to develop their own programs we provide a library which can be linked to your own code. This document describes the library and will be primarily useful to programmers. However, it also contains details about the implementation that may be of interest to advanced users. The stand-alone programs are described in separate man pages. The latest version of the package including source code and html versions of the documentation can be found at `http://www.tbi.univie.ac.at/~ivo/RNA/'. This manual documents version 1.6. Please send comments and bug reports to .  File: RNAlib.info, Node: Folding Routines, Next: Parsing and Comparing, Prev: Introduction, Up: Top 2 Folding Routines ****************** * Menu: * mfe Fold:: Calculating Minimum Free Energy Structures * PF Fold:: Calculating Partition Functions and Pair Probabilities * Inverse Fold:: Searching for Predefined Structures * Suboptimal folding:: Enumerating Suboptimal Structures * Cofolding:: Folding of 2 RNA molecules * Local Fold:: Predicting local structures of large sequences * Alignment Fold:: Predicting Consensus Structures from Alignment * Fold Vars:: Global Variables for the Folding Routines * Param Files:: Reading Energy Parameters from File  File: RNAlib.info, Node: mfe Fold, Next: PF Fold, Prev: Folding Routines, Up: Folding Routines 2.1 Minimum free Energy Folding =============================== The library provides a fast dynamic programming minimum free energy folding algorithm as described by `Zuker & Stiegler (1981)'. Associated functions are: -- Function: float fold (char* SEQUENCE, char* STRUCTURE) folds the SEQUENCE and returns the minimum free energy in kcal/mol; the mfe structure in bracket notation (*note notations::) is returned in STRUCTURE. Sufficient space for string of the same length as SEQUENCE must be allocated for STRUCTURE before calling `fold()'. If `fold_constrained' (*note Variables: Fold Vars.) is 1, the STRUCTURE string is interpreted on input as a list of constraints for the folding. The characters " | x < > " mark bases that are paired, unpaired, paired upstream, or downstream, respectively; matching brackets " ( ) " denote base pairs, dots "." are used for unconstrained bases. Constrained folding works by assigning bonus energies to all structures that comply with the constraint. -- Function: float energy_of_struct (char* SEQUENCE, char* STRUCTURE) calculates the energy of SEQUENCE on the STRUCTURE The `circ_fold()' and `energy_of_circ_struct()' provide analogous functions for folding and evaluating cricular RNA molecules. -- Function: void initialize_fold (int LENGTH) Obsolete function kept for backward compatibility. Allocates memory for mfe folding sequences not longer than LENGTH, and sets up pairing matrix and energy parameters. -- Function: void free_arrays () frees the memory allocated by `fold()'. -- Function: void update_fold_params () call this to recalculate the pair matrix and energy parameters after a change in folding parameters like `temperature' (*note Variables: Fold Vars.). Prototypes for these functions are declared in `fold.h'.  File: RNAlib.info, Node: PF Fold, Next: Inverse Fold, Prev: mfe Fold, Up: Folding Routines 2.2 Partition Function Folding ============================== Instead of the minimum free energy structure the partition function of all possible structures and from that the pairing probability for every possible pair can be calculated, using a dynamic programming algorithm as described by `McCaskill (1990)'. The following functions are provided: -- Function: float pf_fold (char* SEQUENCE, char* STRUCTURE) calculates the partition function Z of SEQUENCE and returns the free energy of the ensemble F in kcal/mol, where F=-kT ln(Z). If STRUCTURE is not a NULL pointer on input, it contains on return a string consisting of the letters " . , | { } ( ) " denoting bases that are essentially unpaired, weakly paired, strongly paired without preference, weakly upstream (downstream) paired, or strongly up- (down-)stream paired bases, respectively. If `fold_constrained' (*note Variables: Fold Vars.) is 1, the STRUCTURE string is interpreted on input as a list of constraints for the folding. The character "x" marks bases that must be unpaired, matching brackets " ( ) " denote base pairs, all other characters are ignored. Any pairs conflicting with the constraint will be forbidden. This usually sufficient to ensure the constraints are honored. If `do_backtrack' (*note Variables: Fold Vars.) has been set to 0 base pairing probabilities will not be computed (saving CPU time), otherwise the `pr[iindx[i]-j]' (*note Variables: Fold Vars.) will contain the probability that bases i and j pair. -- Function: void init_pf_fold (int LENGTH) allocates memory for folding sequences not longer than LENGTH; sets up pairing matrix and energy parameters. Has to be called before the first call to `pf_fold()'. -- Function: void free_pf_arrays (void) frees the memory allocated by `init_pf_fold()'. -- Function: void update_pf_params (int LENGTH) Call this function to recalculate the pair matrix and energy parameters after a change in folding parameters like temperature (*note Variables: Fold Vars.). -- Function: double mean_bp_dist (int LENGTH) computes the mean base pair distance in the equilibrium ensemble as a measure of the structural diversity. It is given by = \sum_a,b p_a * p_b * d(a,b), where the sum goes over all pairs of possible structure a,b, p_a is the Boltzmann weight of structure a, and d(a,b) is the base pair distance (see `bp_distance()' in *Note Distances::.). The mean base pair distances can be computed efficiently from the pair probabilities p_ij as = \sum_ij p_ij * (1-p_ij). Uses the global `pr' array filled by a previous call to `pf_fold()'. -- Function: char *centroid (int LENGTH, double *DIST) Computes the centroid structure, i.e. the structure having the lowest average base pair distance to all structures in the Boltzmann ensemble. This can be computed trivially from the pair probabilities by choosing all base pairs that have probability greater 0.5. The distance of the centroid to the ensemble is returned in DIST. Prototypes for these functions are declared in `part_func.h'. -- Function: float *MEA (plist *PL, char *STRUCTURE, double GAMMA) Computes a MEA (maximum expected accuracy) structure, such that expected accuracy A(S) = \sum_(i,j) \in S 2 \gamma p_ij + \sum_i \notin S p^u_i is maximised. Higher values of \gamma result in more base pairs of lower probability and thus higher sensitivity. Low values of gamm result in structures containing only highly likely pairs (high specificity). The code of the MEA function also demonstrates the use of sparse dynamic programming scheme to reduce the time and memory complexity of folding. The corresponding protoype is declared in `MEA.h'.  File: RNAlib.info, Node: Inverse Fold, Next: Suboptimal folding, Prev: PF Fold, Up: Folding Routines 2.3 Inverse Folding =================== We provide two functions that search for sequences with a given structure, thereby inverting the folding routines. -- Function: float inverse_fold (char* START, char* TARGET) searches for a sequence with minimum free energy structure TARGET, starting with sequence START. It returns 0 if the search was successful, otherwise a structure distance to TARGET is returned. The found sequence is returned in START. If `give_up' is set to 1, the function will return as soon as it is clear that the search will be unsuccessful, this speeds up the algorithm if you are only interested in exact solutions. Since `inverse_fold()' calls `fold()' you have to allocate memory for folding by calling `initialize_fold()' -- Function: float inverse_pf_fold (char* START, char* TARGET) searches for a sequence with maximum probability to fold into structure TARGET using the partition function algorithm. It returns -kT log(p) where p is the frequency of TARGET in the ensemble of possible structures. This is usually much slower than `inverse_fold()'. Since `inverse_pf_fold()' calls `pf_fold()' you have to allocate memory for folding by calling `init_pf_fold()' -- Variable: char *symbolset The global variable `char *symbolset' points to the allowed bases, initially `"AUGC"'. It can be used to design sequences from reduced alphabets. Prototypes for these functions are declared in `inverse.h'.  File: RNAlib.info, Node: Suboptimal folding, Next: Cofolding, Prev: Inverse Fold, Up: Folding Routines 2.4 Suboptimal folding ====================== -- data type: SOLUTION struct {float energy; char *structure;} -- Function: SOLUTION* subopt (char* SEQUENCE, char* CONSTRAINT, int* DELTA, FILE* FP) produces *all* suboptimal secondary structures within DELTA*0.01 kcal/mol of the optimum. The results are either directly written to a FP (if FP is not `NULL'), or (FP`==NULL') returned in a `SOLUTION *' list terminated by an entry were the `structure' pointer is `NULL'. Prototypes for these functions are declared in `subopt.h'.  File: RNAlib.info, Node: Cofolding, Next: Local Fold, Prev: Suboptimal folding, Up: Folding Routines 2.5 Predicting hybridization structures of two molecules ======================================================== The function of an RNA molecule often depends on its interaction with other RNAs. The following routines therefore allow to predict structures formed by two RNA molecules upon hybridization. One approach to co-folding two RNAs consists of concatenating the two sequences and keeping track of the concatenation point in all energy evaluations. Correspondingly, many of the `cofold()' and `co_pf_fold()' routines below take one sequence string as argument and use the the global variable CUT_POINT to mark the concatenation point. Note that while the `RNAcofold' program uses the '&' character to mark the chain break in its input, you should not use an '&' when using the library routines (set CUT_POINT instead). In a second approach to co-folding two RNAs, cofolding is seen as a stepwise process. In the first step the probability of an unpaired region is calculated and in a second step this probability of an unpaired region is multiplied with the probability of an interaction between the two RNAs. This approach is implemented for the interaction between a long target sequence and a short ligand RNA. Function `pf_unstru()' calculates the partition function over all unpaired regions in the input sequence. Function `pf_interact()', which calculates the partition function over all possible interactions between two sequences, needs both sequence as separate strings as input. -- Variable: int cut_point `cut_point' marks the position (starting from 1) of the first nucleotide of the second molecule within the concatenated sequence. The default value of -1 stands for single molecule folding. The `cut_point' variable is also used by `PS_rna_plot()' and `PS_dot_plot()' to mark the chain break in postscript plots. -- Function: float cofold (char *SEQUENCE, char *STRUCTURE) The analog to the `fold()' function. Computes the minimum free energy of two RNA molecules interacting. If `cut_point==-1' results should be the same as with `fold()'. -- Function: void free_co_arrays (void) Frees arrays allocated by `cofold()'. -- Function: void initialize_cofold (int LENGTH) Allocates memory for mfe cofolding sequences not longer than LENGTH, and sets up pairing matrix and energy parameters. Explicitly calling `initialize_cofold()' is normally not necessary, as it will be called automagically before folding. Prototypes for these functions are declared in `cofold.h'. 2.6 Partition Function Cofolding ================================ As for folding one RNA molecule, this computes the partition function of all possible structures and the base pair probabilities. Uses the same global PF_SCALE variable to avoid overflows. To simplify the implementation the partition function computation is done internally in a null model that does not include the duplex initiation energy, i.e. the entropic penalty for producing a dimer from two monomers). The resulting free energies and pair probabilities are initially relative to that null model. In a second step the free energies can be corrected to include the dimerization penalty, and the pair probabilities can be divided into the conditional pair probabilities given that a re dimer is formed or not formed. -- data type: cofoldF struct {double F0AB, FAB, FcAB, FA, FB;} -- Function: cofoldF co_pf_fold (char* SEQUENCE, char* STRUCTURE) The analog to the `pf_fold()' function, it computes the partition function of two interacting RNA molecules as well as base pair probabilities. It returns a struct including free energies computed for different models. F0AB is the free energy in the null model; FAB is corrected to include the initiation penalty, FcAB only includes real dimers. FA and FB are the free energies of the 2 molecules, resp. As with `cofold()' the CUT_POINT global variable has to be set to mark the chain break between the molecules. -- Function: void free_co_pf_arrays (void) frees partition function cofold arrays allocated in ` co_pf_fold()'. -- Function: void init_co_pf_fold (int LENGTH) Obsolete function kept for backward compatibility. Allocates memory for partition function cofolding sequences not longer than LENGTH, and sets up pairing matrix and energy parameters. -- data type: plist struct {int i; int j; float p;} -- Function: extern struct plist *get_plist (struct plist *PL, int LENGTH, double CUT_OFF) Makes a list of base pairs out of the global *PR array, ignoring all base pairs with a probability less than CUT_OFF. This is useful since the PR array gets overwritten by subsequent foldings. Prototypes for these functions are declared in `co_part_func.h'. 2.7 Cofolding all Dimeres, Concentrations ========================================= After computing the partition functions of all possible dimeres one can compute the probabilities of base pairs, the concentrations out of start concentrations and sofar and soaway. -- Function: void compute_probabilities (double FAB, double FEA, double FEB, struct plist *PRAB, struct plist *PRA, struct plist *PRB, int ALENGTH) Given the pair probabilities and free energies (in the null model) for a dimer AB and the two constituent monomers A and B, compute the conditional pair probabilities given that a dimer AB actually forms. Null model pair probabilities are given as a list as produced by `get_plist()'), the dimer probabilities PRAB are modified in place. Dimer formation is inherently concentration dependent. Given the free energies of the monomers A and B and dimers AB, AA, and BB one can compute the equilibrium concentrations, given input concentrations of A and B, see e.g.~`Dimitrov & Zuker (2004)' -- data type: ConcEnt struct {double A0; double B0;double ABc;double AAc; double BBc; double Ac; double Bc;} -- Function: extern struct ConcEnt *get_concentrations(double FEAB, double FEAA, double FEBB, double FEA, double FEB, double * STARTCONC) Takes an array STARTCONC of input concentrations with alternating entries for the initial concentrations of molecules A and B (terminated by two zeroes), then computes the resulting equilibrium concentrations from the free energies for the dimers. Dimer free energies should be the dimer-only free energies, i.e. the FcAB entries from the `cofoldF' struct. Prototypes for these functions are declared in `co_part_func.h'. 2.8 Partition Function Cofolding as stepwise process ==================================================== In this approach to cofolding the interaction between two RNA molecules is seen as a stepwise process. In a first step, the target molecule has to adopt a structure in which a binding site is accessible. In a second step, the ligand molecule will hybridize with a region accessible to an interaction. Consequently the algorithm is designed as a two step process: The first step is the calculation of the probability that a region within the target is unpaired, or equivalently, the calculation of the free energy needed to expose a region. In the second step we compute the free energy of an interaction for every possible binding site. -- data type: pu_contrib struct {double **H; double **I; double **M; double **E; int length;} -- Function: `pu_contrib' *pf_unstru (char *SEQUENCE, int U) This function calculates the partition function over all unpaired regions in *SEQUENCE of a maximal length U. You have to call function `pf_fold()' for *SEQUENCE befor calling `pf_unstru()'. If you want to calculate unpaired regions for a constrained structure, set variable STRUCTURE in function `pf_fold()' to the constrain string. Function `pf_unstru' returns a `pu_contrib' struct containing four arrays of dimension [i = 1 to length of *SEQUENCE][j = 0 to U-1] containing all possible contributions to the probabilities of unpaired regions of maximum length U. Each array in `pu_contrib' contains one of the contributions to the total probability of being unpaired: The probability of being unpaired within an exterior loop is in array `pu_contrib->E', the probability of being unpaired within a hairpin loop is in array `pu_contrib->H', the probability of being unpaired within an interior loop is in array `pu_contrib->I' and probability of being unpaired within a multi-loop is in array `pu_contrib->M'. The total probability of being unpaired is the sum of the four arrays of `pu_contrib'. Function `pf_unstru' frees everything allocated automatically. To free the output structure call `free_pu_contrib'. -- Function: void free_pu_contrib (`pu_contrib' *P_CON) frees the output of function `pf_unstru'. -- data type: interact struct {double *Pi; double *I; double Gikjl; double Gikjl_wo; int i, int k, int j, int l, int length;} -- Function: interact *pf_interact (const char *S1, const char *S2, `pu_contrib' *P_C, `pu_contrib' *P_C2,int W, char *CSTRUC, int INCR3, int INCR5) Calculates the probability of a local interaction between sequence *S1 and sequence *S2, considering the probability that the region of interaction is unpaired within *S1 and *S2. The longer sequence has to be given as *S1. The shorter sequence has to be given as *S2. Function `pf_unstru()' has to be called for *S1 and *S2, where the probabilities of being unpaired have to be given in *P_C and *P_C2, respectively. If you do not want to include the probabilities of being unpaired for *S2 set *P_C2 to NULL. If variable char *CSTRUC is not NULL, constrained folding is done: The available constrains for intermolecular interaction are: '.' (no constrain), 'x' (the base has no intermolecular interaction) and '|' (the corresponding base has to be paired intermolecularily). The parameter W determines the maximal length of the interaction. The parameters INCR5 and INCR3 allows inclusion of unpaired residues left (INCR5) and right (INCR3) of the region of interaction in *S1. If the INCR options are used, function `pf_unstru()' has to be called with W=W+INCR5+INCR3 for the longer sequence *S1. Function `pf_interact()' returns a structure of type `interact' which contains the probability of the best local interaction including residue i in Pi and the minimum free energy in Gi, where i is the position in sequence *S1. The member Gikjl of structure `interact' is the best interaction between region [k,i] k= PAIRSIZE. Note that in contrast to `Lfold()', bases outside of the window do not influence the structure at all. Only probabilities higher than CUTOFF are kept. The remaining arguments are optional in the sense that they can be set to NULL unless the corresponding functionality is wanted. If PU is supplied (i.e is not the NULL pointer), `pfl_fold()' will also compute local _accessibilities_, as the mean probability that regions of length `u' or smaller are unpaired. The parameter `ulength' is supplied in `pU[0][0]'. On return the PU array will contain these probabilities, with the entry on `pU[x][y]' containing the mean probability that x and the y following bases are unpaired. If DPP2 is supplied (i.e is not the NULL pointer), `pfl_fold()' will also compute the probability of stacked pairs, more precisely the conditional probabilities of base pairs (i,j) given that the enclosing pair (i-1,j+1) is formed. The file pointers PUFP and SPUP file pointer is provided, the unpaired probabilities will direcly be printed and not saved, which can be useful when memory is an issue). If the file pointer is provided, the base pair probabilities will direcly be printed and not saved, which can be useful when memory is an issue). -- Function: void init_pf_foldLP (int LENGTH) Analog to `init_pf_fold()'. LENGTH is the length of the sequence. -- Function: void free_pf_arraysLP (void) frees the arrays used by `pfl_fold()' -- Function: void update_pf_paramsLP (int LENGTH) Analog to `update_pf_params()'. LENGTH is the length of the sequence.  File: RNAlib.info, Node: Alignment Fold, Next: Fold Vars, Prev: Local Fold, Up: Folding Routines 2.10 Predicting Consensus Structures from an Alignment ====================================================== Consensus structures can be predicted by a modified version of the `fold()' algorithm that takes a set of aligned sequences instead of a single sequence. The energy function consists of the mean energy averaged over the sequences, plus a covariance term that favors pairs with consistent and compensatory mutations and penalizes pairs that cannot be formed by all structures. For details see `Hofacker (2002)'. -- Function: float alifold (char** SEQUENCES, char* STRUCTURE) Predicts the consensus structure for the aligned SEQUENCES and returns the minimum free energy; the mfe structure in bracket notation (*note notations::) is returned in STRUCTURE. Sufficient space must be allocated for STRUCTURE before calling `alifold()'. -- Function: void free_alifold_arrays () frees the memory allocated by `alifold()'. -- Function: void update_alifold_params () call this to recalculate the pair matrix and energy parameters after a change in folding parameters like `temperature' (*note Variables: Fold Vars.). -- Function: float alipf_fold (char** SEQUENCES, char* STRUCTURE, pair_info **PI) The partition function version of `alifold()' works in analogy to `pf_fold()'. Pair probabilities and information about sequence covariations are returned via the PI variable as a list of `pair_info' structs. The list is terminated by the first entry with `pi.i = 0'. -- data type: pair_info struct {short i,j; float p; float ent; short bp[8]; char comp;} for each base pair `i,j' the structure lists: its probability `p', an entropy-like measure for its well-definedness `ent', and in `bp[]' the frequency of each type of pair. `bp[0]' contains the number of non-compatible sequences, `bp[1]' the number of CG pairs, etc. -- Variable: double cv_fact `cv_fact' controls the weight of the covariance term in the energy function. Default is 1. -- Variable: double nc_fact controls the magnitude of the penalty for non-compatible sequences in the covariance term. Default is 1.  File: RNAlib.info, Node: Fold Vars, Next: Param Files, Prev: Alignment Fold, Up: Folding Routines 2.11 Global Variables for the Folding Routines ============================================== The following global variables change the behavior the folding algorithms or contain additional information after folding. -- Variable: int noGU do not allow GU pairs if equal 1; default is 0. -- Variable: int no_closingGU if 1 allow GU pairs only inside stacks, not as closing pairs; default is 0. -- Variable: int noLonelyPairs Disallow all pairs which can *only* occur as lonely pairs (i.e. as helix of length 1). This avoids lonely base pairs in the predicted structures in most cases. -- Variable: int tetra_loop include special stabilizing energies for some tetra loops; default is 1. -- Variable: int energy_set if 1 or 2: fold sequences from an artificial alphabet ABCD..., where A pairs B, C pairs D, etc. using either GC (1) or AU parameters (2); default is 0, you probably don't want to change it. -- Variable: float temperature rescale energy parameters to a temperature of `temperature' C. Default is 37C. You have to call the update_..._params() functions after changing this parameter. -- Variable: int dangles if set to 0 no stabilizing energies are assigned to bases adjacent to helices in free ends and multiloops (so called dangling ends). Normally (`dangles = 1') dangling end energies are assigned only to unpaired bases and a base cannot participate simultaneously in two dangling ends. In the partition function algorithm `pf_fold()' these checks are neglected. If `dangles' is set to 2, the `fold()' and `energy_of_struct()' function will also follow this convention. This treatment of dangling ends gives more favorable energies to helices directly adjacent to one another, which can be beneficial since such helices often do engage in stabilizing interactions through co-axial stacking. If `dangles = 3' co-axial stacking is explicitly included for adjacent helices in mutli-loops. The option affects only mfe folding and energy evaluation (`fold()' and `energy_of_struct()'), as well as suboptimal folding (`subopt()') via re-evaluation of energies. Co-axial stacking with one intervening mismatch is not considered so far. Default is 1, `pf_fold()' treats 1 as 2. -- Variable: char* nonstandards Lists additional base pairs that will be allowed to form in addition to GC, CG, AU, UA, GU and UG. Nonstandard base pairs are given a stacking energy of 0. -- Variable: int cut_point To evaluate the energy of a duplex structure (a structure formed by two strands), concatenate the to sequences and set CUT_POINT to the first base of the second strand in the concatenated sequence. Reset to 0 or -1 (the default) to evaluate normal single sequence structures. -- Variable: struct bond { int i,j;} base_pair Contains a list of base pairs after a call to `fold()'. `base_pair[0].i' contains the total number of pairs. -- Variable: double* pr contains the base pair probability matrix after a call to `pf_fold()'. -- Variable: int* iindx index array to move through pr. The probability for base i and j to form a pair is in `pr[iindx[i]-j]'. -- Variable: float pf_scale a scaling factor used by `pf_fold()' to avoid overflows. Should be set to approximately exp((-F/kT)/length), where F is an estimate for the ensemble free energy, for example the minimum free energy. You must call `update_pf_params()' or `init_pf_fold()' after changing this parameter. If pf_scale is -1 (the default) , an estimate will be provided automatically when calling `init_pf_fold()' or `update_f_params()'. The automatic estimate is usually insufficient for sequences more than a few hundred bases long. -- Variable: int fold_constrained If 1, calculate constrained minimum free energy structures. *Note mfe Fold::, for more information. Default is 0; -- Variable: int do_backtrack if 0, do not calculate pair probabilities in `pf_fold()'; this is about twice as fast. Default is 1. -- Variable: char backtrack_type only for use by `inverse_fold()'; 'C': force (1,N) to be paired, 'M' fold as if the sequence were inside a multi-loop. Otherwise the usual mfe structure is computed. include `fold_vars.h' if you want to change any of these variables from their defaults.  File: RNAlib.info, Node: Param Files, Prev: Fold Vars, Up: Folding Routines 2.12 Energy Parameter Files =========================== A default set of parameters, identical to the one described in `Mathews et.al. (1999)', is compiled into the library. Alternately, parameters can be read from and written to a file. -- Function: void read_parameter_file (const char FNAME[]) reads energy parameters from file FNAME. See below for the format of the parameter file. -- Function: void write_parameter_file (const char FNAME[]) writes current energy parameters to the file FNAME. The following describes the file format expected by `read_parameter_file()'. All energies should be given as integers in units of 0.01kcal/mol. Various loop parameters depend in general on the pairs closing the loops, as well as unpaired bases in the loops. Internally, the library distinguishes 8 types of pairs (CG=1, GC=2, GU=3, UG=4, AU=5, UA=6, nonstandard=7, 0= no pair), and 5 types of bases (A=1, C=2, G=3, U=4 and 0 for anything else). Parameters belonging to pairs of type 0 are not listed in the parameter files, but values for nonstandard pairs (type 7) and nonstandard bases (type 0) are. Thus, a table for symmetric size 2 interior loops would have 7*7*5*5 entries (2 pairs, two unpaired bases). The order of entries always uses the closing pair or pairs as the first indices followed by the unpaired bases in 5' to 3' direction. To determine the type of a pair consider the base at the (local) 5' end of each strand first, i.e. use the pairs (i,j) and (q,p) for an interior loop with i #include #include "utils.h" #include "fold_vars.h" #include "fold.h" #include "part_func.h" #include "inverse.h" #include "RNAstruct.h" #include "treedist.h" #include "stringdist.h" #include "profiledist.h" void main() { char *seq1="CGCAGGGAUACCCGCG", *seq2="GCGCCCAUAGGGACGC", *struct1,* struct2,* xstruc; float e1, e2, tree_dist, string_dist, profile_dist, kT; Tree *T1, *T2; swString *S1, *S2; float **pf1, **pf2; /* fold at 30C instead of the default 37C */ temperature = 30.; /* must be set *before* initializing */ /* allocate memory for fold(), could be skipped */ initialize_fold(strlen(seq1)); /* allocate memory for structure and fold */ struct1 = (char* ) space(sizeof(char)*(strlen(seq1)+1)); e1 = fold(seq1, struct1); struct2 = (char* ) space(sizeof(char)*(strlen(seq2)+1)); e2 = fold(seq2, struct2); free_arrays(); /* free arrays used in fold() */ /* produce tree and string representations for comparison */ xstruc = expand_Full(struct1); T1 = make_tree(xstruc); S1 = Make_swString(xstruc); free(xstruc); xstruc = expand_Full(struct2); T2 = make_tree(xstruc); S2 = Make_swString(xstruc); free(xstruc); /* calculate tree edit distance and aligned structures with gaps */ edit_backtrack = 1; tree_dist = tree_edit_distance(T1, T2); free_tree(T1); free_tree(T2); unexpand_aligned_F(aligned_line); printf("%s\n%s %3.2f\n", aligned_line[0], aligned_line[1], tree_dist); /* same thing using string edit (alignment) distance */ string_dist = string_edit_distance(S1, S2); free(S1); free(S2); printf("%s mfe=%5.2f\n%s mfe=%5.2f dist=%3.2f\n", aligned_line[0], e1, aligned_line[1], e2, string_dist); /* for longer sequences one should also set a scaling factor for partition function folding, e.g: */ kT = (temperature+273.15)*1.98717/1000.; /* kT in kcal/mol */ pf_scale = exp(-e1/kT/strlen(seq1)); init_pf_fold(strlen(seq1)); /* calculate partition function and base pair probabilities */ e1 = pf_fold(seq1, struct1); pf1 = Make_bp_profile(strlen(seq1)); e2 = pf_fold(seq2, struct2); pf2 = Make_bp_profile(strlen(seq2)); free_pf_arrays(); /* free space allocated for pf_fold() */ profile_dist = profile_edit_distance(pf1, pf2); printf("%s free energy=%5.2f\n%s free energy=%5.2f dist=%3.2f\n", aligned_line[0], e1, aligned_line[1], e2, profile_dist); free_profile(pf1); free_profile(pf2); } In a typical Unix environment you would compile this program using: `cc -c example.c -IHPATH' and link using `cc -o example example.o -LLPATH -lRNA -lm' where HPATH and LPATH point to the location of the header files and library, respectively.  File: RNAlib.info, Node: References, Next: Function Index, Prev: Example, Up: Top 6 References ************ - D.H. Mathews, J. Sabina, M. Zuker and H. Turner (1999) Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure, JMB, 288: 911-940 - M. Zuker and P. Stiegler (1981) Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148 - D.A. Dimitrov, M.Zuker(2004) Prediction of hybridization and melting for double stranded nucleic acids, Biophysical J. 87: 215-226, - J.S. McCaskill (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structures, Biopolymers 29: 1105-1119 - D.H. Turner, N. Sugimoto and S.M. Freier (1988) RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192 - J.A. Jaeger, D.H. Turner and M. Zuker (1989) Improved predictions of secondary structures for RNA, Proc. Natl. Acad. Sci. 86: 7706-7710 - L. He, R. Kierzek, J. SantaLucia, A.E. Walter and D.H. Turner (1991) Nearest-Neighbor Parameters For GU Mismatches, Biochemistry 30: 11124-11132 - A.E. Peritz, R. Kierzek, N, Sugimoto, D.H. Turner (1991) Thermodynamic Study of Internal Loops in Oligoribonucleotides ... , Biochemistry 30: 6428-6435 - A. Walter, D. Turner, J. Kim, M. Lyttle, P. Mu"ller, D. Mathews and M. Zuker (1994) Coaxial stacking of helices enhances binding of Oligoribonucleotides.., Proc. Natl. Acad. Sci. 91: 9218-9222 - B.A. Shapiro, (1988) An algorithm for comparing multiple RNA secondary structures, CABIOS 4, 381-393 - B.A. Shapiro and K. Zhang (1990) Comparing multiple RNA secondary structures using tree comparison, CABIOS 6, 309-318 - R. Bruccoleri and G. Heinrich (1988) An improved algorithm for nucleic acid secondary structure display, CABIOS 4, 167-173 - W. Fontana , D.A.M. Konings, P.F. Stadler, P. Schuster (1993) Statistics of RNA secondary structures, Biopolymers 33, 1389-1404 - W. Fontana, P.F. Stadler, E.G. Bornberg-Bauer, T. Griesmacher, I.L. Hofacker, M. Tacker, P. Tarazona, E.D. Weinberger, P. Schuster (1993) RNA folding and combinatory landscapes, Phys. Rev. E 47: 2083-2099 - I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 - I.L. Hofacker (1994) The Rules of the Evolutionary Game for RNA: A Statistical Characterization of the Sequence to Structure Mapping in RNA. PhD Thesis, University of Vienna. - I.L. Hofacker, M. Fekete, P.F. Stadler (2002). Secondary Structure Prediction for Aligned RNA Sequences. J. Mol. Biol. 319:1059-1066 - S.H. Bernhart, I.L. Hofacker, S. Will, A. Gruber, P.F. Stadler (2008). RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 9:474, 2008. - I.L. Hofacker and Peter F. Stadler (2006). Memory efficient folding algorithms for circular RNA secondary structures. Bioinformatics, 22:1172–1176. - S.H. Bernhart, H. Tafer, U. Mückstein, Ch. Flamm, P.F. Stadler, I.L. Hofacker (2006). Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol. Biol., 1:3. - S.H. Bernhart, I.L. Hofacker, P.F. Stadler (2006). Local RNA base pairing probabilities in large sequences. Bioinformatics, 22:614–615. - U. Mückstein, H. Tafer, J. Hackermüller, S.H. Bernhart, P.F. Stadler, I.L Hofacker (2006). Thermodynamics of RNA-RNA binding. Bioinformatics, 22:1177–1182. - I.L. Hofacker, B. Priwitzer, P.F. Stadler (2004). Prediction of locally stable RNA secondary structures for genome-wide surveys. Bioinformatics, 20:186–190. - D. Adams (1979) The hitchhiker's guide to the galaxy, Pan Books, London  File: RNAlib.info, Node: Function Index, Next: Variable Index, Prev: References, Up: Top Function Index ************** [index] * Menu: * *centroid: PF Fold. (line 57) * *MEA: PF Fold. (line 67) * *pf_interact: Cofolding. (line 183) * *pf_unstru: Cofolding. (line 154) * add_root: Parsing. (line 31) * alifold: Alignment Fold. (line 14) * alipf_fold: Alignment Fold. (line 29) * b2C: Parsing. (line 18) * b2HIT: Parsing. (line 14) * b2Shapiro: Parsing. (line 22) * bp_distance: Distances. (line 20) * co_pf_fold: Cofolding. (line 70) * cofold: Cofolding. (line 36) * compute_probabilities: Cofolding. (line 109) * energy_of_struct: mfe Fold. (line 25) * expand_Full: Parsing. (line 10) * expand_Shapiro: Parsing. (line 27) * fold: mfe Fold. (line 11) * free_alifold_arrays: Alignment Fold. (line 20) * free_arrays: mfe Fold. (line 36) * free_co_arrays: Cofolding. (line 41) * free_co_pf_arrays: Cofolding. (line 81) * free_interact: Cofolding. (line 211) * free_pf_arrays: PF Fold. (line 38) * free_pf_arraysLP: Local Fold. (line 49) * free_profile: Profile Distances. (line 20) * free_pu_contrib: Cofolding. (line 175) * free_tree: Tree Distances. (line 15) * get_line: Utilities. (line 98) * gmlRNA: Utilities. (line 39) * hamming: Utilities. (line 56) * init_co_pf_fold: Cofolding. (line 85) * init_pf_fold: PF Fold. (line 33) * init_pf_foldLP: Local Fold. (line 46) * initialize_cofold: Cofolding. (line 44) * initialize_fold: mfe Fold. (line 31) * int_urn: Utilities. (line 91) * inverse_fold: Inverse Fold. (line 10) * inverse_pf_fold: Inverse Fold. (line 21) * Lfold: Local Fold. (line 10) * Make_bp_profile: Profile Distances. (line 10) * make_pair_table: Utilities. (line 71) * Make_swString: String Distances. (line 7) * make_tree: Tree Distances. (line 7) * mean_bp_dist: PF Fold. (line 46) * nrerror: Utilities. (line 80) * pack_structure: Utilities. (line 60) * parse_structure: Parsing. (line 48) * pf_fold: PF Fold. (line 13) * plist: Local Fold. (line 18) * profile_edit_distance: Profile Distances. (line 17) * PS_dot_plot: Utilities. (line 9) * PS_rna_plot: Utilities. (line 19) * PS_rna_plot_a: Utilities. (line 27) * random_string: Utilities. (line 52) * read_parameter_file: Param Files. (line 11) * space: Utilities. (line 94) * string_edit_distance: String Distances. (line 11) * struct: Cofolding. (line 93) * subopt: Suboptimal folding. (line 10) * svg_RNA_plot: Utilities. (line 34) * time_stamp: Utilities. (line 76) * tree_edit_distance: Tree Distances. (line 12) * unexpand_aligned_F: Parsing. (line 43) * unexpand_Full: Parsing. (line 34) * unpack_structure: Utilities. (line 67) * unweight: Parsing. (line 38) * update_alifold_params: Alignment Fold. (line 23) * update_fold_params: mfe Fold. (line 39) * update_pf_params: PF Fold. (line 41) * update_pf_paramsLP: Local Fold. (line 52) * urn: Utilities. (line 83) * write_parameter_file: Param Files. (line 15)  File: RNAlib.info, Node: Variable Index, Prev: Function Index, Up: Top Variable Index ************** [index] * Menu: * *symbolset: Inverse Fold. (line 29) * aligned_line: Distances. (line 76) * backtrack_type: Fold Vars. (line 100) * base_pair: Fold Vars. (line 69) * cost_matrix: Distances. (line 68) * cut_point <1>: Fold Vars. (line 62) * cut_point: Cofolding. (line 29) * cv_fact: Alignment Fold. (line 44) * dangles: Fold Vars. (line 36) * do_backtrack: Fold Vars. (line 96) * edit_backtrack: Distances. (line 72) * energy_set: Fold Vars. (line 26) * fold_constrained: Fold Vars. (line 92) * give_up: Inverse Fold. (line 13) * helix_size: Parsing. (line 59) * iindx: Fold Vars. (line 77) * loop_degree: Parsing. (line 56) * loop_size: Parsing. (line 52) * loops: Parsing. (line 62) * nc_fact: Alignment Fold. (line 48) * no_closingGU: Fold Vars. (line 13) * noGU: Fold Vars. (line 10) * noLonelyPairs: Fold Vars. (line 17) * nonstandards: Fold Vars. (line 57) * pairs: Parsing. (line 65) * pf_scale: Fold Vars. (line 81) * pr: Fold Vars. (line 73) * rna_plot_type: Utilities. (line 46) * temperature: Fold Vars. (line 31) * tetra_loop: Fold Vars. (line 22) * unpaired: Parsing. (line 68) * xsubi: Utilities. (line 87)  Tag Table: Node: Top0 Node: Introduction602 Node: Folding Routines1612 Node: mfe Fold2423 Node: PF Fold4424 Node: Inverse Fold8436 Node: Suboptimal folding10073 Node: Cofolding10754 Node: Local Fold22112 Node: Alignment Fold24820 Node: Fold Vars27147 Node: Param Files31766 Node: Parsing and Comparing39083 Node: notations39493 Node: Parsing43016 Node: Distances45657 Node: Tree Distances49520 Node: String Distances50298 Node: Profile Distances50828 Node: Utilities51907 Node: Example56543 Node: References60052 Node: Function Index64180 Node: Variable Index69658  End Tag Table