Computational Methods in Biomolecular Structures and Interaction Networks

(9 Jul - 3 Aug 2007)

Jointly organized with the Genome Institute of Singapore

~ Abstracts ~

Computational and experimental analysis of signaling pathway crosstalk in developing tissues
Jessica Lembong, Princeton University, USA

The main goal of the talk is to illustrate how multiple experimental (genomic, genetic, imaging) and computational (mechanistic and data-driven modeling) approaches can be combined to provide insights into the dynamics of living tissues. Because the same pathways that coordinate tissue development are deregulated in many diseases, this work has broad implications for tissue dynamics in all animals.

Pattern formation in development relies on combinatorial interactions of signaling pathways. Multiple cellular and biochemical mechanisms of signaling crosstalk have been characterized, but the spatial and temporal aspects of pathway integration remain poorly understood. We show that in Drosophila egg development (oogenesis), the dynamics of the evolutionarily conserved epidermal growth factor receptor (EGFR) and bone morphogenetic protein (BMP) pathways are coordinated by a spatially distributed feedforward circuit. The two pathways act independently during early eggshell patterning but become fully integrated when EGFR assumes complete control over BMP signaling through coordinated regulation of multiple BMP pathway components. The dynamic consequences of this mode of pathway integration are manifested at the genome-wide level, in the coordinated response of the transcriptional targets of EGFR and BMP signals. The signaling and transcriptional patterns induced by these interactions are conserved across species separated by tens of millions of years, suggesting that feedforward control is a general strategy for coordinating convergent inductive signals.

While experimental studies led us to the discovery of this pathway coordination mechanism, computational models are important to further understand the extent and the importance of this regulation. In modeling signaling pathways, we focus on three main processes: the extracellular transport of ligand, the cytoplasmic signal transduction, and the nuclear-cytoplasmic shuttling of signal transduction molecules. By combining these three modules, we have developed a model which can make experimentally testable predictions of the spatiotemporal dynamics of EGFR and BMP signaling in oogenesis. We will present the results of the computational analysis of this model and the first steps towards the experimental validation of its predictions.
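
As a rough illustration of the first module (extracellular ligand transport), the Python sketch below integrates a one-dimensional reaction-diffusion equation, ∂L/∂t = D ∂²L/∂x² − kL, with a fixed ligand concentration at x = 0 and a no-flux far boundary. The equation form, boundary conditions, and parameter values (D = k = 1 in arbitrary units) are assumptions for illustration, not the model presented in the talk.

```python
import math

def ligand_profile(D=1.0, k=1.0, length=10.0, dx=0.1, dt=0.004, t_end=20.0, c_source=1.0):
    """Explicit finite-difference solution of dL/dt = D*d2L/dx2 - k*L (hypothetical parameters)."""
    n = int(round(length / dx))
    assert D * dt / dx**2 <= 0.5, "explicit scheme would be unstable"
    c = [0.0] * n
    c[0] = c_source  # fixed-concentration ligand source at x = 0
    for _ in range(int(round(t_end / dt))):
        new = c[:]
        for i in range(1, n):
            left = c[i - 1]
            # Mirror ghost cell at the far end implements a no-flux boundary.
            right = c[i + 1] if i + 1 < n else c[i - 1]
            new[i] = c[i] + dt * (D * (left - 2 * c[i] + right) / dx**2 - k * c[i])
        c = new
    return c

profile = ligand_profile()
decay_length = math.sqrt(1.0 / 1.0)  # sqrt(D/k) for the default parameters
print(profile[10], math.exp(-1.0 / decay_length))  # value at x = 1 vs. exp(-x/decay_length)
```

With degradation, the gradient approaches an exponential with decay length sqrt(D/k); the cytoplasmic and nuclear-shuttling modules would be coupled to this profile as additional equations.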

This work was done jointly with Nir Yakoby, Chris Bristow, Trudi Schupbach, and Stanislav Shvartsman.


Dynamic transcriptional control of gene expression using a sequential logic model (Cellogica)
Masa Tsuchiya, Keio University, Japan

Cellular signaling involves a sequence of events from ligand binding to membrane receptors, through transcription factor activation, to the induction of mRNA expression. The transcriptional-regulatory system plays a pivotal role in the control of gene expression. A novel computational approach to the study of gene regulation circuits is presented, and simulation software (Cellogica) has been developed.

Based on the concept of a finite state machine, which provides a discrete view of gene regulation, a novel sequential logic model (SLM) is developed to decipher the control mechanisms of dynamic transcriptional regulation of gene expression. The SLM technique is also used to systematically analyze the dynamic function of transcriptional inputs and the dependency and cooperativity (such as synergy) among binding sites with respect to when, how much, and how fast the gene of interest is expressed.
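
To show what "sequential" means here, the following minimal Python sketch treats expression as a discrete level whose next value depends on both the current binding-site inputs and the stored current level. The three binary inputs are named after the UI, R, and Otx sites purely for readability; the update rules, the number of levels, and the input schedule are hypothetical and are not the rules inferred by SLM or implemented in Cellogica.

```python
from typing import List, Tuple

# (UI, R, Otx) binding-site activities, each 0 or 1. The rules below are
# hypothetical and only illustrate the finite-state-machine idea.
Inputs = Tuple[int, int, int]

def next_level(level: int, inputs: Inputs, max_level: int = 3) -> int:
    """Next discrete expression level from the current level and current inputs."""
    ui, r, otx = inputs
    if r:                      # assumed repressive input, acting regardless of UI
        return max(level - 1, 0)
    if ui or otx:              # assumed activating inputs
        return min(level + 1, max_level)
    return level               # no active input: the level is held (memory)

def simulate(initial: int, schedule: List[Inputs]) -> List[int]:
    """Run the machine over a time-ordered schedule of input vectors."""
    trace = [initial]
    for inputs in schedule:
        trace.append(next_level(trace[-1], inputs))
    return trace

print(simulate(0, [(1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 0)]))  # [0, 1, 2, 1, 1]
```

The same input vector (0, 0, 0) can yield different outputs depending on the stored level, which is what distinguishes a sequential model from a purely combinational one.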

SLM is verified using a well-studied set of expression data for endo16 of Strongylocentrotus purpuratus (sea urchin) during embryonic midgut development. A dynamic regulatory mechanism for endo16 expression controlled by three binding sites, UI, R, and Otx, is identified and shown to be consistent with experimental findings. For the transition from specification to differentiation in the wild-type endo16 expression profile, SLM reveals that the three binary activities are not sufficient to explain the transcriptional regulation of endo16 expression and that additional binding-site activities are required. Further analyses suggest a novel repressive effect of R during the specification-to-differentiation stage that is independent of UI activation.

The sequential logic formalism simplifies regulatory network dynamics by moving from a continuous to a discrete representation of gene activation over time. In effect, our SLM is non-parametric and model-independent, yet it provides rich biological insight. The demonstrated efficacy of this approach for endo16 is a promising step toward further applications of the method.

Control of fibrosis by triple helix-forming oligodeoxyribonucleotides targeting the promoter of the type I collagen gene
Ramareddy Guntaka, University of Tennessee, USA

In response to injury, an evolutionarily conserved wound-healing process occurs. If this response goes awry, it can result in a pathologic process commonly referred to as ‘fibrosis’. Fibrosis results from the abnormal accumulation of extracellular matrix (ECM) proteins and causes structural alterations and loss of function in the affected organ. The main component of the ECM is type I collagen. In our laboratory, we have developed triple helix-forming oligodeoxyribonucleotides (TFOs) that form triplex structures with the C1 region (-170 to -141 from the transcription start site) of the α1(I) collagen gene promoter. This region contains a stretch of about 30 pyrimidines on the upper (coding) strand and the complementary purine stretch on the lower (non-coding) strand.

Kinetic studies indicated that the antiparallel homopurine strand (GGGAAGGAAAGGGAGGAGGGGGGAG) forms the most stable and efficient triplexes, with a Kd of about 10⁻⁸ M, compared with the parallel homopyrimidine or homopurine strands. Even the overlapping 18-mers (AAAGGGAGGAGGGGGGAG and GGAAGGAAAGGGAGGAGG) form triplexes very efficiently. Both the 25-mer and the 18-mer containing the contiguous stretch of six Gs readily form G-quartets in the presence of Na+ ions. Further, we showed that triplex formation significantly inhibits transcription of the α1(I) gene, both in cell-free nuclear extracts and in cells transfected with plasmid DNAs containing the α1(I) collagen promoter. In contrast, mutant TFOs that failed to form triplexes also failed to inhibit transcription, indicating that triplex formation is essential for the inhibition. Recently, using psoralen-conjugated TFOs to cross-link the TFO to the target promoter, we demonstrated triplex formation with the native collagen gene promoter in stellate cells.
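
The relationship between the three oligonucleotides can be checked directly from the sequences given above. This small Python sketch locates each 18-mer inside the 25-mer and reports the longest contiguous run of G; it only restates sequence facts and makes no prediction about triplex or G-quartet stability.

```python
import re

TFO_25 = "GGGAAGGAAAGGGAGGAGGGGGGAG"
TFO_18 = ["AAAGGGAGGAGGGGGGAG", "GGAAGGAAAGGGAGGAGG"]

def longest_g_run(seq: str) -> int:
    """Length of the longest contiguous run of G in the sequence."""
    return max((len(m) for m in re.findall(r"G+", seq)), default=0)

for oligo in [TFO_25] + TFO_18:
    offset = TFO_25.find(oligo)  # 0-based start within the 25-mer (-1 if absent)
    print(len(oligo), offset, longest_g_run(oligo))
# 25 0 6
# 18 7 6
# 18 1 3
```

Both 18-mers are overlapping windows of the 25-mer, and only the one starting at offset 7 keeps the run of six Gs, which is why only that 18-mer is described as G-quartet-forming.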

To determine whether the TFO has any inhibitory effect on fibrosis in vivo, we used a rat model of liver fibrosis. Treatment of rats with dimethylnitrosamine (DMN) has been shown to induce hepatic fibrosis, which is mainly mediated by the production of abnormal amounts of type I collagen by activated stellate cells. Biodistribution studies indicated that when the TFO is administered by the intraperitoneal route, it is predominantly taken up by the liver and kidney, with small amounts reaching other organs including the heart and lung. We further showed that uptake by stellate cells is higher than uptake by hepatocytes and endothelial cells, and that this uptake can be further enhanced by conjugating the TFO to mannose-6-phosphate. In addition, we demonstrated accumulation of significant amounts of the TFO in the nuclei of stellate cells. Parallel experiments with fibrotic rats also indicated that the TFO reached the target stellate cells.

Administration of these TFOs to rats at 10 mg/kg body weight had no deleterious effect on the growth of the animals or on liver weights. At 4 mg/kg, the 25-mer significantly reduced fibrosis, as determined by morphometric analysis of liver tissue sections stained for collagen with Masson's trichrome. In the same group of animals, the TFO significantly improved liver function, as assessed by ALT levels. Both 18-mer TFOs also significantly reduced fibrosis; however, the G-quartet-forming 18-mer was effective only when G-quartet formation was blocked.

Gene expression: molecular mechanisms of gene transcription
Piergiorgio Percipalle, Center of Molecular Biology, Sweden

During gene expression, the information contained in a gene, a linear sequence of DNA nucleotides, is decoded into a protein, a linear sequence of amino acids. This process is highly regulated and consists of several steps. First, a DNA molecule is used as a template to produce an intermediate molecule termed messenger RNA (mRNA), which is then translated to yield a protein as the final product. The process is universal, although it differs between prokaryotic and eukaryotic cells, mainly because of their different degrees of cellular compartmentalisation. Its universality reflects the fact that proteins are essential for cell survival: they are involved in all cellular processes.
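
The template relationship in the first step can be made concrete with a toy Python sketch. It assumes the template strand is written 5'→3' and uses an invented nine-nucleotide sequence; promoters, RNA polymerase, splicing, and chromatin are deliberately ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def transcribe(template_5to3: str) -> str:
    """mRNA (5'->3') made from a DNA template strand written 5'->3'.

    RNA polymerase reads the template 3'->5' and builds an RNA complementary to it,
    so the mRNA equals the reverse complement of the template with U in place of T.
    """
    coding = template_5to3.upper().translate(COMPLEMENT)[::-1]
    return coding.replace("T", "U")

print(transcribe("TTACGCCAT"))  # AUGGCGUAA
```

The resulting mRNA has the same sequence as the coding (non-template) strand, with U replacing T.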

In this lecture we will discuss the mechanisms underlying the first step of gene expression, termed gene transcription. In particular, we will emphasize the importance of RNA polymerase, the key enzyme in the process, and how it functions to produce an mRNA molecule. In addition, we will discuss the role of chromatin in eukaryotic organisms and how modifications of chromatin structure are required for productive gene transcription. Finally, we will place gene transcription in the context of nuclear architecture and compare it with the transcription of ribosomal genes.

Gene expression: the mechanisms of mRNA translation and protein synthesis
Piergiorgio Percipalle, Center of Molecular Biology, Sweden

In the first phase of gene expression, a gene is transcribed into an mRNA molecule. This intermediate nucleic acid is then used as a template from which the information is further decoded to yield a protein. The process by which an mRNA molecule is decoded is known as mRNA translation or protein synthesis, and it is considered the second phase of gene expression.

mRNA translation is a very complex process that is highly conserved in all organisms. It is carried out by large molecular machines called ribosomes and requires a large number of regulatory factors that ensure the process functions correctly. Faulty mRNA translation often correlates with disease, so it is important to understand how these mechanisms operate. In this lecture we will dissect how an mRNA molecule produced during the first phase of gene expression (gene transcription) is decoded, and we will discuss in detail how ribosomes mediate this process.
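
The decoding rule itself (reading codons in frame from a start codon until a stop codon) fits in a few lines of Python. The sketch uses the standard genetic code and a made-up mRNA; it deliberately leaves out everything the lecture focuses on, such as ribosome assembly, initiation factors, and start-codon context, and simply takes the first AUG.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("GGAUGUUUGGCAAAUAAGC"))  # MFGK
```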

Projection of gene-protein networks to functional space of mammalian proteome via alternative splicing
Alexander Kanapin, European Bioinformatics Institute, UK

The informational content of genome coding sequences unfolds through the functions of proteins, and alternative splicing is one of the ways in which the genome is manifested in its proteome. We consider the problem of projecting genetic information into the functional space of the proteome, defined as the set of all functions performed by proteins. First, we automatically created a set of protein functional labels assigned via conserved protein domains. We propose a new type of relational network between genomic DNA sequences and functional labels, and we used these networks to analyze the acquisition of new functions in the proteome and the evolutionary plasticity of the genome.

We use the InterPro database as an integrative resource combining data from different protein domain databases. About 70% of InterPro entries have UniProt Keywords (KW) assigned to them, which makes it possible to assign KW IDs to a protein sequence via its conserved domains. We then treat the combinations of KW IDs as functional labels (FL) that characterize the biological functions of a given protein. The unique set of functional labels of a proteome is considered the functional space, or functional complexity, of the proteome. We then analyzed how protein isoforms (PI) produced by alternative splicing differ in their functional annotation. By mapping the set of transcriptional units (TU) of the human and mouse transcriptomes, produced by the FANTOM consortium, onto the set of functional labels, we construct and characterize TU/FL interconnection networks.
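
The label construction and the TU/FL mapping can be sketched as plain set operations. In this minimal Python example all identifiers (domains, keywords, isoforms, TUs) are invented, and the real pipeline's handling of partial annotation and statistics is not reproduced; it only shows how one TU with two splice isoforms can link to two distinct functional labels.

```python
from collections import defaultdict

# Hypothetical toy annotations (identifiers are invented for illustration).
domain_keywords = {"IPR_a": {"KW-1"}, "IPR_b": {"KW-2"}, "IPR_c": {"KW-1", "KW-3"}}
isoform_domains = {"iso1": {"IPR_a", "IPR_b"}, "iso2": {"IPR_a"}, "iso3": {"IPR_c"}}
tu_isoforms = {"TU1": {"iso1", "iso2"}, "TU2": {"iso3"}}

def functional_label(domains):
    """A functional label (FL) is the combination of keyword IDs reached via domains."""
    return frozenset(kw for d in domains for kw in domain_keywords.get(d, ()))

fl_of_isoform = {iso: functional_label(doms) for iso, doms in isoform_domains.items()}
functional_space = set(fl_of_isoform.values())  # unique FLs of this toy proteome

tu_fl_edges = defaultdict(set)  # bipartite TU -> FL network
for tu, isoforms in tu_isoforms.items():
    for iso in isoforms:
        tu_fl_edges[tu].add(fl_of_isoform[iso])

print(len(functional_space))                               # 3
print({tu: len(fls) for tu, fls in tu_fl_edges.items()})   # {'TU1': 2, 'TU2': 1}
```

Degree statistics of such a bipartite network (FLs per TU, TUs per FL) are the kind of quantities the abstract compares between species.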

Based on functional-label analysis of the transcriptome, we created a catalog of functional labels that are common to both mammalian species tested or unique to one of them. We derive the new type of functional networks, together with statistics of FL-to-protein and PI-to-TU links, for the human and mouse proteomes. Using statistical analysis of these networks, we propose an evolutionary mechanism of protein function acquisition that includes stages with different evolutionary constraints. The network analysis allows us to analyze the evolutionary plasticity of the genome, transcriptome, and proteome. We also compare the diversity of biological functions of proteomes in different species. The functional networks reveal a group of genes, and corresponding functions, that could be attributed to an early, conserved part of the cellular machinery.

Sequence-specific interaction of PNA with duplex DNA
Maxim Frank-Kamenetskii, Boston University, USA

Peptide nucleic acid (PNA) is an artificial analog of nucleic acids that carries DNA bases on a peptide backbone. It has highly unusual modes of binding to DNA and RNA. Most interestingly, PNA has a unique ability to target double-stranded DNA (dsDNA) sequence-specifically by invading the DNA double helix. PNA forms two types of duplex-invasion complexes with DNA. Homopyrimidine PNAs, also known as DNA openers, invade duplex DNA via triplex formation, displacing one of the two DNA strands, which then becomes available for Watson-Crick pairing with single-stranded oligonucleotides and PNA oligomers. Pseudocomplementary PNAs (pcPNAs), which carry chemically modified bases, bind duplex DNA in a double-duplex mode, interacting with both complementary strands. The mechanism, structure, and applications of these two types of sequence-specific PNA-dsDNA complexes are covered. In particular, the use of PNA openers for highly specific fluorescence detection of short signature sequences in bacterial genomes and the use of pcPNAs for sequence-specific bending of duplex DNA are discussed in detail.
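
Because homopyrimidine PNA openers need a homopurine stretch on one strand of the target duplex, a first step in choosing targets is simply to scan for such stretches. The Python sketch below does that scan on the top strand only; the minimum length of 8 is an assumed cutoff, and the sketch says nothing about binding strength or PNA chemistry.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def homopurine_sites(top: str, min_len: int = 8):
    """Find stretches where one strand of a duplex is all purines.

    A homopyrimidine run on the top strand means the complementary (bottom)
    strand is homopurine there. Returns (start, end, strand) with 0-based,
    end-exclusive coordinates on the top strand. min_len is an assumed cutoff.
    """
    top = top.upper()
    sites, start = [], 0
    while start < len(top):
        if top[start] not in PURINES and top[start] not in PYRIMIDINES:
            start += 1  # skip ambiguous bases such as N
            continue
        kind = PURINES if top[start] in PURINES else PYRIMIDINES
        end = start
        while end < len(top) and top[end] in kind:
            end += 1
        if end - start >= min_len:
            sites.append((start, end, "top" if kind is PURINES else "bottom"))
        start = end
    return sites

print(homopurine_sites("ATCGAAGGAAGAGTC"))  # [(3, 13, 'top')]
```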

Hybrid Hamiltonian replica exchange based on the Poisson-Boltzmann model
Yuguang Mu, Nanyang Technological University, Singapore

We present a modified sampling scheme, termed REMDhPB, that uses the framework of replica-exchange simulation with explicit solvent molecules but replaces the whole-system exchange probability for jumps between replicas at different temperatures with a kernel exchange probability. The kernel includes only the peptide/protein and/or protein complexes. The energy of the kernel is the sum of the vacuum energy calculated by the force field, the polar solvation energy obtained from the Poisson-Boltzmann model, and the non-polar solvation energy estimated from the solvent-accessible surface. Canonical conformational distributions of three distinct pentapeptides were obtained with this method and compared with those from standard REMD simulations. Moreover, the method has been applied to the ab initio folding of a decapeptide from extended conformations to its native β-hairpin structure.
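
To make the exchange step concrete, here is a minimal Python sketch that applies the standard temperature replica-exchange acceptance rule, min{1, exp[(β_i − β_j)(E_i − E_j)]} with β = 1/(k_B T), to kernel energies rather than whole-system energies. Modeling the non-polar term as γ·SASA and the value of γ are assumptions; the PB polar energy and SASA would come from external solvers and are passed in here as plain numbers.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)

def kernel_energy(e_vacuum, e_polar_pb, sasa, gamma=0.0072):
    """Kernel energy = vacuum force-field energy + PB polar solvation + non-polar term.

    The non-polar term is modeled here as gamma * SASA; gamma (kcal/mol/A^2)
    is an assumed surface-tension coefficient, not a value from the talk.
    """
    return e_vacuum + e_polar_pb + gamma * sasa

def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance for swapping configurations between replicas i and j."""
    delta = (1.0 / (K_B * t_i) - 1.0 / (K_B * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

# Hypothetical kernel energies (kcal/mol) and SASA (A^2) for two neighbouring replicas.
e_300 = kernel_energy(-120.0, -40.0, 500.0)
e_310 = kernel_energy(-110.0, -42.0, 520.0)
print(exchange_probability(e_300, 300.0, e_310, 310.0))
```

Because only the kernel enters the acceptance test, the exchange probability no longer scales with the number of explicit solvent molecules, which is the motivation for the scheme.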

Increasing confidence of protein-protein interactomes
Limsoon Wong, National University of Singapore

High-throughput experimental methods, such as yeast two-hybrid and phage display, have fairly high levels of false positives (and false negatives). Thus, the protein-protein interactions detected by such experiments need additional wet-laboratory validation, and it would be useful if the list could be prioritized in some way. This talk reviews advances in computational techniques for assessing the reliability of protein-protein interactions detected by such high-throughput methods, focusing on techniques that rely only on the topology of the protein interaction network derived from the experiments themselves. In particular, we discuss indices that are abstract mathematical characterizations of networks of reliable protein-protein interactions, and indices based on explicit motifs associated with true-positive interactions.
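
As one example of a purely topological index, the Python sketch below computes the Czekanowski-Dice distance for every observed interaction: pairs that share most of their interaction partners get a small distance and can be ranked as more plausible. This is offered as an illustrative index from the literature, not necessarily one of the indices covered in the talk, and the four-edge network is invented.

```python
from collections import defaultdict

def cd_distance(edges):
    """Czekanowski-Dice distance for each observed interaction.

    N[v] is v together with its neighbours; D(u, v) = |N[u] ^ N[v]| / (|N[u]| + |N[v]|).
    A small D (many shared partners) suggests a more reliable interaction.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].update((u, v))
        nbrs[v].update((u, v))
    return {(u, v): len(nbrs[u] ^ nbrs[v]) / (len(nbrs[u]) + len(nbrs[v])) for u, v in edges}

ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]  # hypothetical network
for edge, d in sorted(cd_distance(ppi).items(), key=lambda item: item[1]):
    print(edge, round(d, 3))
```

Here the edge A-B, embedded in a triangle, ranks first, while the pendant edge C-D ranks last.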

Large-scale inference of condition-specific regulation using gene expression data and the predicted transcription factor occupancy of promoters
Neil Clarke, Genome Institute of Singapore

Gene expression experiments have been performed under many different conditions. In contrast, large-scale ChIP-chip experiments (i.e., those involving many transcription factors) have been performed under just a few. It is likely, therefore, that only a fraction of condition-specific functional binding sites have been identified. Computational methods are required to further correlate factors, conditions, and target genes in order to infer more comprehensive regulatory networks, and to generate hypotheses that can be tested by directed ChIP experiments.

We have previously developed a method for predicting the probability of transcription factor binding to a promoter.[1] The method models cooperative and competitive binding in a physically meaningful manner, and appropriately uses protein concentration as a parameter. Genome-wide nucleosome location data has also been incorporated into the model to improve the prediction of bound sites.[2] We are now using this method to systematically compare predicted binding profiles for over a hundred transcription factors to the changes in gene expression in hundreds of microarray experiments, and have identified many conditions under which predicted binding is significantly correlated with gene regulation. A joint probability analysis, using gene expression changes and predicted binding probabilities, further identifies the genes that are most likely to be direct targets of the transcription factor under that condition. This analysis recapitulates interactions inferred from expression and ChIP-chip analyses, and makes novel predictions that can be tested by ChIP experiments under previously unexplored conditions.
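
The idea of modeling cooperative and competitive binding "in a physically meaningful manner" can be illustrated with a small equilibrium partition function. In this Python sketch each site is empty or bound by one of its candidate factors, each bound factor contributes a weight [TF]/Kd, and a cooperativity factor ω multiplies the weight when two specified occupancies co-occur. It is a generic statistical-mechanics enumeration under these assumptions, not the implementation from reference [1], and it scales exponentially with the number of sites.

```python
from itertools import product
from math import prod

def occupancy(sites, conc, kd, coop=None):
    """Equilibrium occupancy probabilities by enumerating all binding configurations.

    sites: list where sites[i] lists the factors that can occupy site i (competition).
    conc: factor -> free concentration; kd: (site, factor) -> Kd in the same units.
    coop: ((i, f), (j, g)) -> omega multiplier, with site i < site j.
    Returns (site, factor) -> probability that the factor occupies the site.
    """
    coop = coop or {}
    options = [[None] + list(s) for s in sites]
    z, bound = 0.0, {}
    for config in product(*options):
        occupied = [(i, f) for i, f in enumerate(config) if f is not None]
        w = prod(conc[f] / kd[(i, f)] for i, f in occupied)
        for a in range(len(occupied)):
            for b in range(a + 1, len(occupied)):
                w *= coop.get((occupied[a], occupied[b]), 1.0)
        z += w  # partition function
        for key in occupied:
            bound[key] = bound.get(key, 0.0) + w
    return {key: w / z for key, w in bound.items()}

# Hypothetical promoter: A and B compete for site 0; A at site 1 cooperates with A at site 0.
sites = [["A", "B"], ["A"]]
conc = {"A": 1.0, "B": 1.0}
kd = {(0, "A"): 1.0, (0, "B"): 1.0, (1, "A"): 1.0}
print(occupancy(sites, conc, kd, {((0, "A"), (1, "A")): 10.0}))
```

Raising conc["A"] in this toy model increases A's occupancy and lowers B's, which is how protein concentration enters as a parameter.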

1. Granek JA, Clarke ND: Explicit equilibrium modeling of transcription-factor binding and gene regulation. Genome Biol 2005, 6:R87.
2. Liu X, Lee CK, Granek JA, Clarke ND, Lieb JD: Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res 2006, 16:1517-1528.

An overview of weighted gene co-expression network analysis
Steve Horvath, University of California, USA

Weighted gene co-expression network analysis (WGCNA) facilitates a systems-biology view of gene expression data. The network framework makes it straightforward to integrate gene expression data with other types of data, e.g. clinical traits and genetic marker data. This talk covers several theoretical topics including network construction, module definition, network-based gene screening, and differential network analysis. The methods are illustrated using several applications, including i) screening for cancer genes, ii) comparing human and chimp brains, and iii) complex disease gene mapping. Related articles and material can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
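
For readers new to the network-construction step, the pure-Python sketch below builds an unsigned soft-thresholded adjacency, a_ij = |cor(x_i, x_j)|^β, and the topological overlap matrix, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij) with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu. It is a minimal sketch, not the WGCNA R package; β = 6 is an assumed default (in practice β is chosen by a criterion such as approximate scale-free topology), and the three-gene data are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)|^beta, with a_ii set to 0."""
    g = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(g)] for i in range(g)]

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij); diagonal set to 1."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                l_ij = sum(adj[i][u] * adj[u][j] for u in range(g))
                tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]  # rows = genes, columns = samples
adj = soft_adjacency(expr)
tom = topological_overlap(adj)
```

Modules are then typically defined by hierarchical clustering on the dissimilarity 1 − TOM.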

Functions, networks, and phenotypes by integrative genomics analysis
Xianghong Jasmine Zhou, University of Southern California Los Angeles, USA

The rapid accumulation of genomics data provides unprecedented opportunities to systematically infer gene functions, regulatory networks, and phenotype associations. In this talk, we present several graph-based data mining algorithms that integrate diverse genomics data, especially the vast amount of microarray data in public repositories. A series of microarray data sets is modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. Our integrative approach to functional annotation offers three major advantages over commonly used microarray analysis methods: (1) it enhances signal-to-noise separation, (2) it identifies functionally related genes that are not co-expressed, and (3) it provides a way to predict gene functions in a context-specific manner. Furthermore, we show that frequently occurring co-expression clusters are more likely to represent transcriptional modules than clusters derived from a single microarray dataset. In addition, we propose the concept of "second-order correlation", which enables us to trace the upstream events of transcription cascades. Finally, we develop methods to systematically identify phenotype-specific network patterns and regulatory modules.
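
The simplest form of "frequently occurring network patterns" is a co-expression edge that recurs across datasets. The Python sketch below counts, for every gene pair, the number of datasets in which |r| exceeds a threshold and keeps pairs with enough support; the threshold, the support cutoff, and the toy data are assumptions, and the talk's algorithms mine richer patterns (dense subgraphs, clusters) than single edges.

```python
import math
from collections import Counter
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def frequent_edges(datasets, r_min=0.8, min_support=2):
    """Count datasets in which each gene pair has |r| >= r_min; keep frequent pairs."""
    support = Counter()
    for expr in datasets:  # expr: rows = genes (same order in every dataset)
        for i, j in combinations(range(len(expr)), 2):
            if abs(pearson(expr[i], expr[j])) >= r_min:
                support[(i, j)] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical data: genes 0 and 1 co-vary in two datasets; genes 0 and 2 in only one.
ds1 = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 1, 3, 2]]
ds2 = [[1, 2, 3, 4], [4, 1, 3, 2], [1, 2, 3, 4]]
print(frequent_edges([ds1, ds1, ds2]))  # {(0, 1): 2}
```

Edges that appear in a single dataset, like (0, 2) here, are filtered out, which is the signal-to-noise argument made above.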

Modeling of genetic regulatory networks: a data-driven process
Wei Zhang, M. D. Anderson Cancer Center, USA

High-throughput genomic and proteomic studies of clinical samples have generated a large amount of data but very little information and even less wisdom. We understand that transcripts and proteins are interlinked, but it is a major challenge to develop appropriate mathematical models that reveal the logical and physical relationships among the components of biological systems. We submit that a key modeling criterion is that the model must be data-driven: it has to take in biological data and produce experimentally testable diagrams or networks. Only when such predictions are confirmed experimentally again and again can we conclude that a biologically appropriate mathematical model has been born.

Our group developed a mathematical model termed the Probabilistic Boolean Network (PBN), which accounts for the uncertainty and probabilistic nature of biological systems. We applied this PBN model to a set of microarray data generated from 25 glioma tissues representing different stages of cancer development. We then generated two subnetworks focusing on two genes important for glioma development and progression: vascular endothelial growth factor (VEGF) and insulin-like growth factor binding protein 2 (IGFBP2). VEGF is required for angiogenesis, which is critical for supplying nutrients for tumor growth. The VEGF subnetwork revealed a number of relationships that are supported by literature reports. IGFBP2 is overexpressed in 80% of cases of the most advanced glioma, glioblastoma multiforme (GBM), and contributes to glioma cell migration and invasion. Mathematical modeling with glioma gene expression profiling data suggested that IGFBP2 is linked to the integrin pathway. This notion was subsequently validated by the demonstration that IGFBP2 interacts with integrin through an RGD domain. We hypothesized that IGFBP2 is a key regulator of glioma progression and tested this hypothesis using a glial-specific somatic gene transfer mouse model called the RCAS-tva model. In this system, the avian virus receptor (tva) is expressed only in glial cells, under the control of the neuroglial-specific nestin promoter. Genes of interest are cloned into an avian RCAS vector, and viral particles are expanded in DF1 avian fibroblasts. When injected into the mouse brain, the viral particles infect only glial cells, so the genes of interest are expressed only in those cells. Our results showed that chronic platelet-derived growth factor (PDGF) signaling leads exclusively to the formation of oligodendrogliomas (O). When PDGF is delivered in combination with IGFBP2, anaplastic oligodendrogliomas (AO) form. These higher-grade tumors are characterized by vascular proliferation, increased cellular density, and poor survival.
Also in this model, combined activation of K-Ras and Akt signaling leads to the formation of astrocytomas, whereas up-regulation of Ras alone or Akt alone does not result in tumor formation. Forced IGFBP2 expression in combination with activated K-Ras also leads to the formation of astrocytomas, which are histologically similar to gliomas formed by K-Ras/Akt stimulation. Interestingly, we do not see tumor formation when Akt and IGFBP2 are delivered simultaneously. Therefore, IGFBP2 and Akt likely lie in the same pathway or in converging pathways. These data show that IGFBP2 actively contributes to tumor initiation and progression in two lineages of gliomas. Through these functional genomic and mathematical modeling studies, we believe we have gained important insight into a key gene, IGFBP2, that serves as a potential therapeutic target.
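
A PBN differs from an ordinary Boolean network in that each gene has several candidate predictor functions, each chosen with some probability. The Python sketch below simulates a hypothetical three-gene PBN with synchronous updates in which every gene draws its predictor independently at each step (a simplifying assumption); the gene names, rules, and probabilities are invented and are not the glioma subnetworks described above.

```python
import random

# Hypothetical 3-gene PBN: each gene has candidate Boolean predictors with selection probabilities.
PBN = {
    "G1": [(lambda s: s["G3"], 0.7), (lambda s: s["G2"] and s["G3"], 0.3)],
    "G2": [(lambda s: not s["G1"], 1.0)],
    "G3": [(lambda s: s["G1"] or s["G2"], 0.6), (lambda s: s["G3"], 0.4)],
}

def pbn_step(state, network, rng):
    """Synchronous update; each gene independently draws one of its predictors."""
    new_state = {}
    for gene, predictors in network.items():
        functions, weights = zip(*predictors)
        f = rng.choices(functions, weights=weights, k=1)[0]
        new_state[gene] = int(bool(f(state)))
    return new_state

rng = random.Random(0)  # fixed seed so the trajectory is reproducible
state = {"G1": 1, "G2": 0, "G3": 1}
trajectory = [state]
for _ in range(5):
    state = pbn_step(state, PBN, rng)
    trajectory.append(state)
print(trajectory)
```

Running many such trajectories gives an empirical distribution over network states, which is how a PBN expresses uncertainty about regulatory rules.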

References:

1. Fuller GN, Rhee CH, Hess K, Caskey L, Wang R-P, Bruner J, Yung A, Zhang W. Reactivation of insulin-like growth factor binding protein II expression during glioblastoma transformation revealed by parallel gene expression profiling. Cancer Res 59:4228-4232, 1999.
2. Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean network: a rule-based uncertainty model for gene regulatory networks. Bioinformatics 18:261-274, 2002.
3. Song SW, Fuller GN, Khan A, Kong S, Shen W, Taylor E, Ramdas L, Lang F, Zhang W. IIp45, an insulin-like growth factor binding protein 2 (IGFBP-2) binding protein, antagonizes IGFBP-2 stimulation of glioma cell invasion. Proc Natl Acad Sci USA 100:13970-13975, 2003.
4. Hashimoto RF, Kim S, Shmulevich I, Zhang W, Bittner ML, Dougherty ER. A directed-graph algorithm to grow genetic regulatory subnetworks from seed genes based on strength of connection. Bioinformatics 20:1241-1247.
5. Dunlap SM, Celestino J, Wang H, Jiang R, Holland E, Fuller GN, Zhang W. IGFBP2 promotes gliomagenesis and progression. Proc Natl Acad Sci USA (in press).

Local signaling networks defined by quantitative morphological signatures
Chris Bakal, Harvard University, USA

Genetically identical cells can adopt a diverse spectrum of complex shapes in order to accomplish a variety of functions. Classical experimental approaches have identified hundreds of unique proteins that play roles in the dynamic remodeling of cell shape in response to upstream signals, but there is little understanding of how these proteins are physically organized into networks in subcellular space, and how information flows through this sophisticated molecular circuitry in real-time. In order to model the signaling networks that regulate cell shape, we have developed a novel analytical technology termed “Quantitative Morphological Profiling” that uses 150-600 different features to describe the morphology of single cells.
Here we describe how combining quantitative morphological profiling of single cells with RNAi-based genetic screening identifies local signaling networks with spatially, temporally, and functionally defined characteristics that act hierarchically to regulate cell shape and migration. These methods can be used not only in genetic screens but also in large-scale screens of small-molecule libraries or screens involving cDNA overexpression. As with gene expression data, we can now use morphological data in computational approaches that aim to model the dynamic nature of signaling networks, while the RNAi component brings us closer to causal mechanistic links.
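
Treating morphology like expression data usually starts by turning each perturbation into a feature profile relative to controls and then comparing profiles. The Python sketch below computes per-feature mean z-scores against control cells and compares knockdowns by cosine similarity; the two features, the cell values, and the choice of cosine similarity are assumptions for illustration, not the Quantitative Morphological Profiling pipeline itself, which uses 150-600 features.

```python
import math
from statistics import mean, pstdev

def z_profile(cells, control_cells):
    """Mean z-score of each feature for a perturbed population, relative to control cells."""
    n_feat = len(control_cells[0])
    mu = [mean(c[f] for c in control_cells) for f in range(n_feat)]
    sd = [pstdev([c[f] for c in control_cells]) or 1.0 for f in range(n_feat)]  # guard zero spread
    return [mean((c[f] - mu[f]) / sd[f] for c in cells) for f in range(n_feat)]

def cosine(u, v):
    """Cosine similarity of two profiles (1 = same direction of change)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical single-cell features: [cell area, elongation].
control = [[10, 1.0], [12, 1.2], [11, 0.8]]
rnai_a = z_profile([[15, 1.6], [16, 1.8]], control)
rnai_b = z_profile([[14, 1.5], [15, 1.7]], control)
rnai_c = z_profile([[8, 0.7], [9, 0.6]], control)
print(cosine(rnai_a, rnai_b), cosine(rnai_a, rnai_c))
```

Knockdowns with similar profiles (here a and b) become candidates for acting in the same local network, whereas opposite profiles (a and c) suggest opposing roles.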

Micro/nano crystal network: from understanding to design of bio-functional materials
Xiang Yang Liu, National University of Singapore

High-performance functional materials consisting of interconnected networks are becoming increasingly important in both science and technology. It has been shown that, in many cases, hybrid crystal networks give rise to properties far superior to those of single crystals. For instance, the special crystal network structure of amino acids in spider silk gives it a tensile strength several orders of magnitude higher than that of single chains of amino acids. For these reasons, our interest has shifted from controlling single crystals (for example, their size and shape) to engineering crystal networks. In this contribution, we will present new insights into the formation kinetics of crystal networks and into the correlation between network structure and the properties of the systems. This represents a new direction in the field of crystal growth and crystal engineering, and it can be envisaged that in the 21st century the engineering of crystal networks will become one of the most active directions in materials science. In this talk, I will introduce the latest developments in the kinetics of fiber network formation, the correlation between the structures of biological functional materials and their in-use properties, and applications to nanoengineering, including the engineering of nanophase and ultra-functional biomaterials.

Probabilistic models of sampling and emergence of biological networks
Vladimir Kuznetsov, Genome Institute of Singapore

We will show that the statistics of observed events in evolving finite biological systems cannot be formally fitted or mechanistically explained in terms of the so-called "scale-free" network approach. However, families of skewed, size-dependent probability distribution functions can be used. In particular, we demonstrate that the statistics of the number of domain-to-protein links in the proteomes of hundreds of species representing all three superkingdoms of life (archaea, bacteria, eukaryotes) are well fitted by Markov birth-death random process models, whose steady-state solution is approximated by a size-dependent Generalized Pareto function. A parameterization of this model allows us to associate the complexities of prokaryotic and eukaryotic organisms with two distinct network statistics, respectively. We also discuss other types of stochastic evolution models with size-dependent attributes and new applications of such skewed probabilistic models to de-noise experimental data in large-scale genomic experiments. We present methods to identify the underlying probability functions of gene expression levels and of the avidity of transcription factor binding sites in eukaryotic genomes from the available highly noisy and incomplete data.
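
The steady state of a one-dimensional birth-death chain follows from detailed balance, p(k+1) = p(k)·birth(k)/death(k+1), so the shape of the predicted size distribution is set entirely by how the rates depend on size. The Python sketch below computes that steady state on sizes 1..n_max for arbitrary rate functions; the linear, size-dependent rates in the example are hypothetical and are not the fitted parameterization behind the Generalized Pareto approximation.

```python
def birth_death_steady_state(birth, death, n_max):
    """Steady state of a birth-death chain on sizes 1..n_max via detailed balance.

    p(k+1) = p(k) * birth(k) / death(k+1); index 0 of the result is size 1.
    """
    weights = [1.0]
    for k in range(1, n_max):
        weights.append(weights[-1] * birth(k) / death(k + 1))
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical size-dependent rates: both grow linearly with the number of links k.
skewed = birth_death_steady_state(lambda k: 0.9 * (k + 0.5), lambda k: k + 1.0, n_max=1000)
print([round(p, 4) for p in skewed[:5]])
```

Size-dependent rates like these produce a skewed distribution whose tail is neither a pure power law nor a pure exponential, which is the kind of behavior the abstract contrasts with the "scale-free" description.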

Computational identification of gene sets controlled by transcription factors on genome scale
Vladimir Kuznetsov, Genome Institute of Singapore

Advances in high-throughput technologies, such as ChIP-chip and ChIP-PET (Chromatin Immuno-Precipitation Paired-End diTag), and the availability of the human and mouse genome sequences now allow us to identify transcription factor binding sites (TFBS) and analyze mechanisms of gene regulation at the level of the entire genome. Here, we have developed a computational approach that uses ChIP-PET data and statistical modeling to assess experimental noise and identify reliable TFBS for the c-Myc, STAT1, and p53 transcription factors in the human genome. We present a mixture probabilistic model and a Monte Carlo simulation model of ChIP-PET data to define the background noise of sequence clustering and to identify the probability function of specific DNA-protein binding in the eukaryotic genome. We will demonstrate good agreement between the curve-fitting and simulation methods, which, combined with a motif search procedure, not only distinguish bona fide TFBSs from non-specific binding sites with high specificity but also provide a computational basis for further optimizing the experimental parameters of the ChIP-PET method. We will also present novel methods for estimating the saturation of specific binding sites and the sensitivity and reproducibility of essentially incomplete ChIP-PET data sets. Computational integration of ChIP-PET data with microarray expression data will also be carried out and used to predict direct target genes and transcriptional network modules.
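
The role of Monte Carlo simulation in defining background noise can be sketched with a deliberately simple null model: place the same number of PET fragments uniformly at random, group overlapping fragments into clusters, and record how often clusters of each size arise by chance. The uniform placement, fixed fragment length, and genome size in this Python sketch are assumptions, and the mixture model from the talk is not reproduced.

```python
import random
from collections import Counter

def cluster_sizes(starts, frag_len):
    """Group overlapping fragments [s, s + frag_len) into clusters; return cluster sizes."""
    sizes, current_end, size = [], None, 0
    for s in sorted(starts):
        if current_end is not None and s < current_end:
            size += 1
            current_end = max(current_end, s + frag_len)
        else:
            if size:
                sizes.append(size)
            size, current_end = 1, s + frag_len
    if size:
        sizes.append(size)
    return sizes

def background_cluster_counts(n_pets, genome_len, frag_len, n_iter=100, seed=1):
    """Mean number of clusters of each size when PETs are placed uniformly at random."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_iter):
        starts = [rng.randrange(genome_len - frag_len) for _ in range(n_pets)]
        totals.update(cluster_sizes(starts, frag_len))
    return {size: count / n_iter for size, count in sorted(totals.items())}

background = background_cluster_counts(n_pets=500, genome_len=1_000_000, frag_len=1000, n_iter=20)
print(background)
```

Comparing observed cluster-size counts with this background indicates the cluster size above which random overlap becomes rare, a crude analogue of separating specific binding from noise.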
 


 

Computer Simulation of Biological Pathways and Network Crosstalk based on Mass Action Laws
Yu Zong Chen, National University of Singapore

Biological systems are complex systems involving dynamic biomolecular interactions within specific pathways and crosstalk among different pathways. Computational simulation of these processes enables a deeper understanding of the functional outcomes and fundamental mechanisms of action of biological networks. It also enables the identification of therapeutic targets and the simulation of disease processes and of the therapeutic and other effects of drugs. This tutorial gives an introductory overview of the simulation of biological pathways and network crosstalk based on the application of mass action laws. A simulation model of the EGFR-Ras/MAPK pathway and a simulation model of RhoA's crosstalk to EGFR-mediated Ras/MAPK activation via MEKK1 serve as illustrative examples.
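To make "mass action laws" concrete, here is a minimal sketch for a single reversible binding step, ligand + receptor ⇌ complex (for example EGF binding EGFR), integrated with the forward Euler method. The rate constants and concentrations are arbitrary illustrative values, not parameters of the EGFR-Ras/MAPK model in the tutorial; a full pathway model couples many such equations and is typically solved with a stiff ODE integrator.

```python
def simulate_binding(L0, R0, kf, kr, dt=1e-3, t_end=50.0):
    """Forward-Euler integration of L + R <-> C under mass action.

    d[C]/dt = kf*[L]*[R] - kr*[C], and d[L]/dt = d[R]/dt = -d[C]/dt.
    Returns the final concentrations (L, R, C).
    """
    L, R, C = L0, R0, 0.0
    for _ in range(int(round(t_end / dt))):
        net = kf * L * R - kr * C  # net forward flux: association - dissociation
        L -= net * dt
        R -= net * dt
        C += net * dt
    return L, R, C


L, R, C = simulate_binding(L0=1.0, R0=0.5, kf=1.0, kr=0.1)
# At equilibrium the forward and reverse fluxes balance: kf*L*R == kr*C.
print(f"L={L:.4f} R={R:.4f} C={C:.4f}")
```

With these values the analytic equilibrium, from kf(L0 - C)(R0 - C) = kr C, is C ≈ 0.4258, which the simulation reproduces.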
 


 

Building and using protein interaction networks: industry perspective
Andrej Bugrim, Genego, Inc., USA

In this presentation we describe our efforts to collect a protein interaction database, develop functional ontologies and build mechanistic reconstructions of major biological pathways. We describe our annotation process, through which we collect information about three major levels of cell processes: activation of membrane receptors, signaling cascades and “core effectors” such as metabolic pathways. The collected protein and small-molecule interaction data form the basis for building mechanistic reconstructions of major processes in cell signaling and metabolism and for making marked improvements in functional ontologies. We describe tools and algorithms that allow the content of our database to be used in mining disease- and drug-related “OMICs” datasets produced in drug discovery and other biomedical research.
 


 

Elucidation of differential response networks from gene expression data
Andrej Bugrim, Genego, Inc., USA

We describe a novel systems-level approach to the analysis of gene expression data and the elucidation of biological networks affected by drug action. Specifically, some 15,000 human linear signaling and biochemical pathway modules were generated from “canonical” maps in MetaCore™ (GeneGo, Inc.) and used as templates for mapping microarray expression data from rat livers exposed to phenobarbital, mestranol and tamoxifen. We analyzed sample-to-sample distances of individual pathway modules in gene expression space in order to select “differential” pathways. These are defined by highly correlated expression among repeats of the same treatment and strong anti-correlation between different treatments. The gene content of these differential pathways is then used to generate network modules that distinguish between the treatments, followed by enrichment analysis using several ontologies. Unlike traditional microarray expression profiling techniques, our method takes into account both network connectivity and gene expression profiles, and it allows the use of “whole genome” expression data without restricting it by fold-change or p-value cutoffs for individual data points. The method detects important cellular mechanisms of drug response that traditional statistical and functional analysis procedures would have missed.
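The abstract does not state the exact statistic used to call a pathway module "differential". A minimal sketch, assuming each sample is represented by the expression vector of one module's genes, and that a module is scored as the mean Pearson correlation between samples of the same treatment minus the mean correlation between samples of different treatments (so scores near +2 indicate tight replicates and anti-correlated treatments). The score and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def differential_score(samples, labels):
    """Hypothetical module score: mean within-treatment correlation minus
    mean between-treatment correlation, over all sample pairs."""
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        r = pearson(samples[i], samples[j])
        (within if labels[i] == labels[j] else between).append(r)
    return mean(within) - mean(between)


# Toy module of four genes; two replicates per treatment.
module = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.0, 3.2, 3.9],   # treatment A
          [4.0, 3.0, 2.0, 1.0], [3.9, 3.1, 2.0, 1.2]]   # treatment B
print(differential_score(module, ["A", "A", "B", "B"]))
```

Ranking all modules by such a score and keeping the top ones gives a candidate set of differential pathways for the network-building step.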


 


 

Protein scoring based on significance in biological networks
Andrej Bugrim, Genego, Inc., USA

While systems biology tools and approaches are gaining wide acceptance among molecular biologists and clinical researchers, two fundamental issues have emerged. The first is how to use available high-throughput molecular data to reconstruct biological networks that are truly relevant to the condition of interest. The second, even more important, issue is how to use the results of such reconstructions within standard laboratory practice and in clinical applications. We present a novel algorithm that evaluates the importance of individual protein nodes for providing connectivity in condition-specific biological networks. The algorithm starts with a condition-specific set of genes or proteins (e.g. differentially expressed genes). First, we construct a shortest-path network connecting these genes using the global database of interactions available in MetaCore™. Second, for each node in the shortest-path network, we compare the number of paths traversing it with the total number of paths going through the same node in the global network. Using these numbers, together with the relative size of the initial gene set, we calculate a p-value for each node in the shortest-path network that indicates whether it is statistically significant for providing connectivity. We test the algorithm's ability to assign high significance to biologically validated drug targets using a public gene expression dataset from psoriatic patients. We show that our method uncovers many genes that do not stand out at the gene expression level but are nevertheless closely related to disease pathways. We then demonstrate the application of this approach to finding correlations between sets of genomic and proteomic data. The approach can be applied to uncover new, higher-quality drug targets, to validate existing targets and to cross-validate genomic, proteomic or other types of data.
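The abstract compares paths through a node in the condition-specific network with paths through the same node in the global network, but does not give the distribution used for the p-value. One natural choice, shown here purely as an assumption, is a hypergeometric tail: treat the global network's paths as the population, the paths through node v as "successes", and the condition-specific paths as draws. All names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(population, draws)


def node_p_value(local_through_node, local_total, global_through_node, global_total):
    """Hypothetical connectivity significance of one node: how surprising it is
    that at least local_through_node of local_total condition-specific paths
    pass through it, given that global_through_node of global_total do."""
    return hypergeom_sf(local_through_node, global_total,
                        global_through_node, local_total)


# A node carried by 5% of all global paths but by 12 of 40 local paths.
print(node_p_value(12, 40, 500, 10_000))
```

Nodes with small p-values are over-used relative to their global "hubness", which is the property the abstract associates with candidate drug targets.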


 


 

MetaTox: leveraging the power of systems biology for improving drug safety
Andrej Bugrim, Genego, Inc., USA

It is now accepted practice that the evaluation of drug and chemical safety should involve understanding the perturbations a compound causes in the functional units of living cells: biological pathways, networks and modules. Current analytical procedures in toxicogenomics, however, focus mainly on statistical analysis of expression patterns, aiming to identify small sets of genes that are most characteristic of a given treatment. MetaTox is a new concept that applies systems biology techniques to predicting drug toxicity and understanding the cellular mechanisms behind drug response. In MetaTox we conduct the analysis at the level of “functional descriptors” (pathways, network modules and functional profiles) rather than individual genes. Such analysis yields predictors that are multidimensional and robust and that allow mechanistic and functional interpretation. Moreover, these functional predictors make it possible to consider drug safety in the context of specific indications. We demonstrate the application of this concept to the analysis of real-life toxicogenomics datasets and describe the MetaTox Consortium, a collaborative effort to improve drug safety evaluation by leveraging systems biology approaches.


 


 

Molecular simulations and insights into molecular biology
Chandra Verma, Bioinformatics Institute, Singapore

Applications of multi-resolution techniques in molecular simulations are providing new insights into the functioning of biological macromolecular machines and the regulation of pathways and networks. These will be discussed using several examples developed in close association with experiments.
 


 

Towards bridging the gap between transcriptome and proteome measurements
Mahesan Niranjan, The University of Sheffield, UK

Much of the interesting cellular function in biology is attributable to mechanisms that differentially regulate concentrations of proteins.
High-throughput measurements with microarrays capture the profile of mRNA concentrations, which are treated as proxies for protein abundances. Several authors have noted that there cannot be a one-to-one correspondence between these two levels.
Eukaryotic cells may employ a number of different mechanisms to regulate protein levels. First and simplest is via a direct link between the required protein levels and transcription. The more protein is required, the greater is the amount of transcribed mRNA.
Secondly, mRNA can be degraded selectively. mRNA molecules are usually unstable, and their decay rates vary widely, governed by different mechanisms. Selectively controlling decay rates can therefore lead to different rates of synthesis of the corresponding proteins. Thirdly, ribosome binding can be differential, achieving selectivity in the rate of translation. Finally, there can be differential regulation at the post-translational level, whereby different proteins decay at different rates. Consequently, as several authors have noted, we find very little correlation between mRNA concentrations and the corresponding protein levels.

Here I use a machine learning framework to predict protein levels, taking as input mRNA levels and several proxies of mRNA-level regulation such as codon bias, transcript length, mass of the resulting protein, measured ribosome occupancy and measured mRNA half-lives.
Using Gaussian process regression I show it is possible to achieve a higher level of predictability of protein levels than is apparent when one looks for correlation between mRNA and protein abundances.
Within this framework I show how a leave-one-out strategy can be used to uncover which mRNA-protein pairs do not fit the data-driven model's predictions. I argue that the pairs that do not fit the learned model are candidates for regulation at the post-translational level; i.e. failure, rather than success, of the model is informative! Assembling a small dataset for yeast, for which simultaneous measurements of mRNA and protein levels are publicly available and the remaining features can be extracted from databases and publications, I ranked potential candidates for post-translational regulation. A literature search found experimental evidence for five mRNA-protein pairs among the top twelve, confirming the hypothesis.
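The leave-one-out idea can be sketched with the standard closed-form LOO identity for Gaussian process regression: with K the kernel matrix plus noise variance and α = K⁻¹y, the LOO residual of point i divided by its LOO predictive standard deviation is α_i / sqrt([K⁻¹]_ii), so no model needs to be refitted. This is a minimal, dependency-free sketch assuming an RBF kernel with fixed, hand-picked hyperparameters and toy one-feature data; it is not the talk's actual feature set or hyperparameter fitting.

```python
import math


def rbf(a, b, length_scale):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def invert(A):
    """Gauss-Jordan inversion with partial pivoting (fine for small n)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def loo_standardized_residuals(X, y, length_scale=1.0, noise=0.1):
    """Closed-form GP leave-one-out residuals divided by the LOO predictive
    standard deviation: alpha_i / sqrt([K^-1]_ii), with alpha = K^-1 y."""
    n = len(X)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # simplification: zero-mean GP on centred targets
    K = [[rbf(X[i], X[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    K_inv = invert(K)
    alpha = [sum(K_inv[i][j] * yc[j] for j in range(n)) for i in range(n)]
    return [alpha[i] / math.sqrt(K_inv[i][i]) for i in range(n)]


# Toy data: one feature (e.g. log mRNA level) vs. log protein level; the
# fourth pair is deliberately inconsistent with the others.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0.0, 0.5, 1.0, 11.5, 2.0, 2.5, 3.0]
scores = loo_standardized_residuals(X, y, noise=0.5)
ranking = sorted(range(len(X)), key=lambda i: -abs(scores[i]))
print(ranking)
```

Pairs at the top of the ranking are the ones the learned model explains worst, i.e. the candidates for post-translational regulation in the argument above.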

 


 

RNA - biology and secondary structure
Peter Clote, Boston College, USA

According to the very recent ENCODE Consortium paper in Nature, the human genome is pervasively transcribed: around 15% of the genome is transcribed, although only a fraction of the transcripts account for mRNA, tRNA, rRNA, microRNA, etc. Is nature so wasteful as to squander a large percentage of the cell's energy resources on transcribing "junk RNA"? Or are we instead at the very threshold of unraveling the mystery of new classes of RNA and their function? In this talk, we present an overview of the chemistry and biology of RNA: glycosidic bond, nonstandard RNAs, base stacking, Tinoco, free energy, noncanonical base pairing, Leontis-Westhof classification, 3-dimensional motifs. We then discuss RNA secondary structure, asymptotic results, dynamic programming, minimum free energy structure prediction, and applications.
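Minimum free energy prediction uses nearest-neighbour (stacking) energies, which are too involved for a short example. The dynamic-programming idea can be sketched with the simpler Nussinov algorithm, which maximizes the number of Watson-Crick and G-U pairs subject to a minimum hairpin loop of three unpaired bases. This is a swapped-in simplification, not the energy model discussed in the talk.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Fill N[i][j] = maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                      # case 1: j is unpaired
            for k in range(i, j - min_loop):        # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + N[k + 1][j - 1])
            N[i][j] = best
    return N


def traceback(seq, N, min_loop=3):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = N[i][k - 1] if k > i else 0
                if N[i][j] == left + 1 + N[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return "".join(structure)


seq = "GGGAAAUCC"
N = nussinov(seq)
print(N[0][-1], traceback(seq, N))  # 3 (((...)))
```

Energy-based MFE algorithms keep the same O(n³) table-filling structure but score loops instead of counting pairs.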
 


 

RNA - algorithms
Peter Clote, Boston College, USA

In this continuation, we discuss the Boltzmann partition function for RNA secondary structure, sampling, applications to siRNA design, as well as structural alignment algorithms and some noncoding RNA gene finders.
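Replacing "max" with "sum", and each pair with its Boltzmann weight, turns the same recursion into a partition function. A minimal sketch, assuming a toy energy model in which every allowed base pair contributes the same energy (pair_energy, in kcal/mol) and kT ≈ 0.6163 kcal/mol (about 37 °C); McCaskill's algorithm follows this pattern with the full nearest-neighbour energy model.

```python
import math

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def partition_function(seq, pair_energy=-1.0, kT=0.6163, min_loop=3):
    """Toy McCaskill-style recursion: every allowed pair has the same energy."""
    n = len(seq)
    if n == 0:
        return 1.0
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of one base pair
    # Z[i][j] sums Boltzmann weights of all structures on seq[i..j]; spans too
    # short to close a hairpin keep the default 1.0 (only the open chain).
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        return Z[i][j] if i <= j else 1.0  # empty interval: weight 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            total = z(i, j - 1)                    # j unpaired
            for k in range(i, j - min_loop):       # j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    total += z(i, k - 1) * w * z(k + 1, j - 1)
            Z[i][j] = total
    return Z[0][n - 1]


Z = partition_function("GGGAAAUCC")
# Under this toy model, the probability of the open chain (no pairs) is 1/Z.
print(Z, 1.0 / Z)
```

Setting pair_energy to 0 makes every weight 1, so Z simply counts the secondary structures, which is a convenient sanity check.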
 


 

Physics-based all-atom modeling of protein dynamics
Yong Duan, University of California, USA

Physics-based modeling has become an indispensable tool for studying protein dynamics. Recent improvements in force fields and the rapid increase in computer speed have made this method increasingly powerful. Recent advances include successful simulations of a number of small proteins folding to their native states, with structures as close as sub-angstrom to the experimental structures. These exciting advances mark the beginning of accurate simulations of protein folding to the native state, which had not been possible before. This talk is divided into two parts.

In the first part, I will discuss the history and future of the AMBER force fields and their applications to the study of protein dynamics. AMBER, a powerful simulation package, has been used in the biomolecular simulation community to help tackle a wide range of problems. The AMBER force fields, first developed in 1981, have evolved into a collection of force fields, including a fully polarizable version released in 2002. An exciting development is the AMBER force field consortium, which will steer the future development of the AMBER force fields. Two examples of applications of AMBER to protein dynamics will be presented. In the first, AMBER is used to study the dynamics of nucleosome particles, in order to understand the dynamics of the histone tails, which are intimately linked to gene control. In the second, AMBER is used to study G-protein coupled receptors, which play key roles in human physiology and are the targets of more than 40% of drugs.

In the second part, I will focus on the application of physics-based simulations to the study of protein folding and aggregation. Understanding the mechanisms of protein folding is a key step in linking protein primary sequences to their structure and function and has been termed the second half of genomics. The discovery of folding-related diseases, including the debilitating Alzheimer's and Parkinson's diseases and other neurodegenerative diseases, underscores the need for a comprehensive understanding of how proteins reach their native states. We have attempted to understand how proteins aggregate using small peptides as model systems. In the past, direct simulations of protein folding using physics-based models were not possible. Thanks to recent advances in force fields, we have been able to reach the native states of four small proteins. The simulations unveiled a rather complex picture of protein folding.
 


 

Controllable gating of the water permeation across nanoscale channels
Haiping Fang, Shanghai Institute of Applied Physics, Chinese Academy of Sciences, China

In this talk, we study the dynamics of single-file water chains inside a single-walled carbon nanotube (SWNT) of appropriate radius using molecular dynamics simulations, under continuous deformation of the tube and/or the influence of external charges. We find that water permeation across the channel shows excellent on-off gating behaviour. Water conduction across the channel remains almost unchanged under considerable deformation and/or with an external charge at a very small distance from the channel. The channel closes rapidly once the deformation exceeds a threshold and/or the distance of the external charge from the channel falls below a threshold. We believe that this excellent property is important for biological systems to achieve accurate information transfer in an environment full of thermal fluctuations, and that it will be useful in developing SWNT-based molecular machines.
 


 

Signal and motif detection in genomic sequences
Jagath C. Rajapakse, Nanyang Technological University, Singapore

Signals in genomic sequences are specific sites related to important biological phenomena, for example transcription start sites (TSS), translation initiation sites (TIS) and splice sites (SS). Computational techniques for detecting these signals are becoming popular because determining these sites experimentally is complex and difficult. This tutorial will introduce computational intelligence techniques, such as neural networks, genetic algorithms and their hybrids, for the detection of TSS, TIS and SS.

Motifs in genomic sequences are short, conserved segments of DNA that have important biological functions. Most motifs in DNA sequences have regulatory functions; examples include transcription factor binding sites (TFBS), promoters and ribosome binding sites. Motif detection is a difficult problem in computational biology because motif instances usually occur in the sequences with considerable degeneracy. We discuss several approaches to motif detection, including profile analysis, neural network methods, the MEME approach, etc. We then discuss graphical methods for the weak motif detection problem, where classical techniques fail.
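Profile analysis is often done with a position weight matrix (PWM). A minimal sketch, assuming a set of already-aligned example sites, a pseudocount of 0.5 and a uniform background of 0.25 per base (all illustrative choices); the TATA-like sites are made up for illustration and the scan covers the forward strand only.

```python
import math


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / (len(sites) + 4 * pseudocount)
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def scan(seq, pwm):
    """Score every window of len(pwm) bases on the given strand."""
    width = len(pwm)
    return [sum(pwm[k][seq[i + k]] for k in range(width))
            for i in range(len(seq) - width + 1)]


sites = ["TATAAA", "TATAAT", "TATAAA"]   # toy aligned sites
pwm = build_pwm(sites)
scores = scan("GGCGTATAAAGC", pwm)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, round(scores[best], 2))
```

A PWM assumes positions contribute independently; the degeneracy problem mentioned above shows up as columns whose log-odds are only weakly positive.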

 


 

Statistical physics of RNA
Ralf Bundschuh, Ohio State University, USA

Recap of Boltzmann partition function, molten RNA, z-transforms, RNA denaturation, native-molten transition.

 


 

 

Quantitative modeling of force-extension experiments
Ralf Bundschuh, Ohio State University, USA

Experimental techniques for force-extension experiments, polymer physics of single-stranded RNA, secondary structure in force-extension experiments, quantitative modeling, structure determination through force-extension experiments.
 


 

 

RNA folding kinetics
Ralf Bundschuh, Ohio State University, USA

Types of experiments, modeling approaches, nanopore experiments, modeling of nanopore experiments.

 


 

 

MicroRNA target prediction
Ralf Bundschuh, Ohio State University, USA

Biology of microRNAs, the target prediction problem, overview of target prediction software.
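A widely used first step in target prediction is to search 3' UTRs for sites complementary to the miRNA "seed" (nucleotides 2-8). A minimal sketch of that heuristic, shown only to illustrate the problem and not as the method of any particular program; it ignores site context, conservation and pairing outside the seed. The miRNA is human let-7a; the UTR is a made-up toy sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, first=2, last=8):
    """Target-site sequence (5'->3') complementary to miRNA positions first..last (1-based)."""
    seed = mirna[first - 1:last]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(utr, mirna):
    """0-based start positions of every (possibly overlapping) seed match in utr."""
    site = seed_site(mirna)
    hits = []
    i = utr.find(site)
    while i != -1:
        hits.append(i)
        i = utr.find(site, i + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"        # human let-7a
toy_utr = "AAACUACCUCAAAGGCUACCUCGG"    # made-up 3' UTR fragment
print(seed_site(let7a), find_seed_matches(toy_utr, let7a))  # CUACCUC [3, 15]
```

Because a 7-nt site occurs by chance roughly once every 4^7 ≈ 16,000 nt, prediction tools add further filters such as conservation or duplex energy to reduce false positives.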

 


 

 


 

 

Algorithms for peptide sequencing from tandem mass spectrometry
Hon Wai Leong, National University of Singapore

Tandem mass spectrometry has become the technology of choice in many proteomics projects. Computational analysis of the MS/MS spectra generated by proteomics instruments has often been the bottleneck in applying this technology. In this talk, we present a quick overview of computational approaches to peptide sequencing.
We then focus on our work on peptide sequencing for multi-charge mass spectra. Most computational methods for peptide sequencing from mass spectra have focused on singly charged ion types, and a few also consider doubly charged ion types. However, little attention has been paid to multiply charged ion types (charge 3 or more), even though multi-charge spectra (with charges up to 5) are publicly available, for example from the GPM web site.

We present our recent work on using a generalized model to analyze and characterize multi-charge mass spectra. Our analysis shows that higher-charge ions contribute significantly to the specificity of the spectra. Our model also allows us to derive upper bounds on sensitivity that help to explain the relatively poor performance of current algorithms on these higher-charge spectra.

Finally, we present our initial work on de novo peptide sequencing algorithms designed specifically for multi-charge mass spectra. Specifically, we present an algorithm called GST-SPC, which uses the concept of strong tags as candidates for peptide extension, together with a graph-theoretic algorithm based on the generalized spectrum graph model derived from our multi-charge model. Our experimental results show that our algorithm outperforms existing algorithms on multi-charge mass spectra. We also present some current computational issues arising from this research.

(This is joint work with Kang Ning, Ket Fah Chong and Nan Ye of NUS and Pavel Pevzner of UC-San Diego.)
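The GST-SPC algorithm itself is not reproduced here. The sketch below only illustrates why multi-charge spectra are harder to interpret: a b or y fragment of neutral mass M can appear at several m/z values, (M + z·m_p)/z for charges z = 1, 2, 3 and so on, which multiplies the candidate peaks a sequencing algorithm must consider. It uses standard monoisotopic residue masses and a proton mass of 1.007276 Da, and ignores neutral losses and modifications.

```python
# Monoisotopic residue masses in daltons (Leu and Ile are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ladder(peptide, max_charge=3):
    """List (ion_name, charge, m/z) for all b and y ions at charges 1..max_charge."""
    n = len(peptide)
    ions = []
    for cut in range(1, n):
        b_mass = sum(RESIDUE_MASS[aa] for aa in peptide[:cut])
        y_mass = sum(RESIDUE_MASS[aa] for aa in peptide[cut:]) + WATER
        for z in range(1, max_charge + 1):
            ions.append((f"b{cut}", z, (b_mass + z * PROTON) / z))
            ions.append((f"y{n - cut}", z, (y_mass + z * PROTON) / z))
    return ions


for name, z, mz in fragment_ladder("PEPTIDE", max_charge=2)[:4]:
    print(f"{name}^{z}+  m/z = {mz:.4f}")
```

A peptide of length n yields 2(n - 1) fragment ions per charge state, so allowing charges up to 5 gives five times as many candidate peaks to explain, consistent with the abstract's point about the difficulty of higher-charge spectra.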

 


 

 

Sense-antisense human gene expression pairs: data mining and analysis on global genome scale
Vladimir Kuznetsov, Genome Institute of Singapore

Transcription of mRNAs from the strand opposite to a given gene may cause numerous regulatory effects on gene expression, pathways and cellular functions. Computational approaches that map gene transcripts onto the human genome have reported several thousand naturally transcribed mRNAs from genes located on the opposite strand of the same locus (cis-antisense, or sense-antisense, gene pairs (CASGP)). However, because the reported databases use different sources of sequence information (UniGene, EST, SAGE etc.), they provide poorly compatible and essentially incomplete sets of sense-antisense (SA) gene pairs. To integrate the data on SA pair transcription, we created a unified SA gene pair database that maps the latest GenBank RefSeq, mRNA and EST sequences onto the human genome and re-maps several previously published CASGP data sets (Y.L. Orlov, Jiangtao Zhou, V.A. Kuznetsov). Clustering of the reported transcripts by chromosome coordinates revealed up to 9000 SA loci. Analyzing our database, microarray expression data and the literature, we demonstrate that sense-antisense gene pairs can provide regulatory functions at several levels of the gene expression process, including alternative splicing, binding, translational regulation, RNA stability and trafficking. We also demonstrate associations between different expression patterns of CASGP transcripts and the phenotypes of different normal and cancer cells.
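The core coordinate step, finding transcripts on opposite strands of the same chromosome whose genomic intervals overlap, can be sketched as below. This is a minimal illustration assuming 1-based closed intervals and one record per transcript; the record type, field names and example transcripts are hypothetical, and the database described above also involves mapping and clustering transcripts from several sequence sources.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def cis_antisense_pairs(transcripts):
    """Return (name, name) pairs that overlap on opposite strands of one chromosome."""
    by_chrom = defaultdict(list)
    for t in transcripts:
        by_chrom[t.chrom].append(t)
    pairs = []
    for items in by_chrom.values():
        items.sort(key=lambda t: t.start)
        for i, a in enumerate(items):
            for b in items[i + 1:]:
                if b.start > a.end:
                    break  # sorted by start, so no later transcript overlaps a
                if a.strand != b.strand:
                    pairs.append((a.name, b.name))
    return pairs


demo = [
    Transcript("geneA", "chr1", 100, 500, "+"),
    Transcript("geneA-AS", "chr1", 450, 900, "-"),
    Transcript("geneB", "chr1", 1000, 2000, "+"),
    Transcript("geneC", "chr2", 100, 300, "-"),
    Transcript("geneD", "chr2", 200, 250, "-"),   # same strand: not antisense
]
print(cis_antisense_pairs(demo))  # [('geneA', 'geneA-AS')]
```

Sorting by start lets the inner loop stop early, so the cost is dominated by the sort plus the number of overlapping candidates rather than all pairs.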


 

Probing the secondary structure landscape of RNA
Peter Clote, Boston College, USA

In Nature 447(7146):799-816, 2007, the ENCODE Consortium published a landmark paper stating that the human genome is "pervasively expressed"; indeed, while 14.7% of the genome is transcribed, only 1-2% of the transcripts can be accounted for by mRNA, rRNA, tRNA, miRNA, etc. The intellectual end of the popular press reacted to the ENCODE paper in the June 14, 2007 issue of The Economist, which stated: “Molecular biology is undergoing its biggest shake-up in 50 years, as a hitherto little-regarded chemical called RNA acquires an unsuspected significance. It is beginning to dawn on biologists that they may have got it wrong. Not completely wrong, but wrong enough to be embarrassing.”

Put in a less castigating light, we can posit that the data provided by the ENCODE Consortium paper underline the necessity of developing new algorithms to analyze RNA secondary structures and evolutionarily related RNA sequences and structures, and more generally to better understand the landscape of RNA secondary structures. In this talk, we present new algorithms developed by our lab to compute the minimum free energy structure and partition function for various classes of RNA, with potential application to riboswitch detection. This work is joint with E. Freyhult, J. Waldispuhl, B. Beshadi, and J.-M. Steyaert.

 


 

Some recent genomics and structural biology web servers
Peter Clote, Boston College, USA

In this talk, we present an overview of various tools our lab has developed, mostly in the area of structural biology. We will discuss time warping and its use in functional genomics, disulfide connectivity, cysteine state prediction, beta-barrel transmembrane structure, and 3-dimensional motif detection in RNA.

This work is a collaboration among F. Ferre, W.A. Lorenz, Y. Ponty, J. Waldispuhl and myself. Of particular significance are the energy model and the very deep use of grammars in the transmembrane supersecondary structure algorithm of Jerome Waldispuhl, now a lecturer at MIT.
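Time warping aligns two time series, such as gene expression profiles sampled at different rates, by allowing local stretching of the time axis. A minimal sketch of classic dynamic time warping with an absolute-difference local cost; this is the textbook algorithm, not necessarily the variant used in the lab's web server.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: advance in a only, in b only, or in both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Same expression shape sampled at different time points: warping absorbs the
# repeated middle value, so the distance is zero.
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))  # 0.0
```

The O(nm) table is the same dynamic-programming pattern as in RNA folding; a Euclidean distance would require equal-length, time-aligned profiles instead.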

 
