Computational Methods in Biomolecular Structures and Interaction Networks (9 Jul - 3 Aug 2007)
Jointly organized with Genome Institute of Singapore
~ Abstracts ~
Computational and experimental analysis of signaling pathway crosstalk in developing tissues
The main goal of the talk is to illustrate how multiple
experimental (genomic, genetic, imaging) and computational
(mechanistic and data-driven modeling) approaches are combined to
provide insights into the dynamics of living tissues. Given
the fact that the same pathways which coordinate tissue
development are deregulated in multiple diseases, this work
has broad implications for tissue dynamics in all animals.
Dynamic transcriptional control of gene expression using sequential logic model (Cellogica)
Cellular signaling involves a sequence of events from
ligand binding to membrane receptors through transcription
factor activation and the induction of mRNA expression. The
transcriptional-regulatory system plays a pivotal role in
the control of gene expression. A novel computational
approach to the study of gene regulation circuits is
presented and a simulation software (Cellogica) is
developed.
Control of fibrosis by the Triple Helix-Forming Oligodeoxyribonucleotides targeting the promoter of Type I Collagen gene
In response to injury, an evolutionarily conserved wound
healing process occurs. This response, if it goes awry, can
result in a pathologic process commonly referred to as
‘fibrosis’. Fibrosis is due to abnormal accumulation of
‘extracellular matrix (ECM) proteins’, and is responsible
for causing structural alterations and loss of function of
the involved organ in the body. The main component of the
ECM is Type I Collagen. In our laboratory, we have developed
triple helix-forming oligodeoxyribonucleotides (TFOs), which
form triplex structures with the C1 region (-170 to -141
from transcription start site) of the α1(I) collagen gene
promoter. This region contains a stretch of about 30
pyrimidines on the upper (coding) strand and the
complementary purine stretch on the lower (non-coding)
strand.
Gene expression: molecular mechanisms of gene transcription
During gene expression the information contained in a
linear DNA sequence of nucleotides, termed a gene, is decoded
into a protein, a linear sequence of amino acids. This
process is highly regulated and consists of several steps.
Initially a DNA molecule is used as a template to produce an
intermediate molecule termed messenger RNA (mRNA), which is
then further translated to yield a protein as the final
outcome. This process is universal, but there are differences
between prokaryotic and eukaryotic cells, mainly due to the
different extent of cellular compartmentalisation in each
cell type. In any case, proteins are essential for cell
survival since they are involved in all cellular processes
and therefore the process of gene expression is universal.
Gene expression: the mechanisms of mRNA translation and protein synthesis
In the first part of gene expression, a gene is
transcribed into an mRNA molecule. This intermediate nucleic
acid is then used as a template to further decode the
information and yield a protein. The process during which an
mRNA molecule is decoded is known as mRNA translation or
protein synthesis and is considered the second phase
of gene expression.
Projection of gene-protein networks to functional space of mammalian proteome via alternative splicing
The informational content of genome coding sequences unfolds via the functions of proteins. Alternative splicing is one of the ways in which the genome manifests itself in its proteome. We consider the problem of projecting genetic information into the functional space of the proteome. The latter is defined as the set of all functions performed by proteins. First, we automatically created a set of functional labels of proteins attributed by conserved protein domains. A new type of relational network between genomic DNA sequences and functional labels has been proposed. The networks have been used to analyze the acquisition of new functions in the proteome and the evolutionary plasticity of the genome. We use the InterPro database as an integrative resource combining data from different protein domain databases. About 70% of InterPro entries have UniProt Keywords (KW) assigned to them. This makes it possible to assign KW IDs to a protein sequence via its conserved domains. We then consider the combinations of KW IDs as the functional labels (FL) that characterize the biological functions of the given protein. The unique set of functional labels of a proteome is considered as the functional space, or functional complexity, of the proteome. We then analyzed how protein isoforms (PI), produced by alternative splicing, differ in their functional annotation. By mapping a set of transcriptional units (TU) for the human and mouse transcriptomes, produced by the FANTOM consortium, onto the set of functional labels, we construct and characterize TU/FL interconnection networks. We created a catalog of functional
labels common and unique to the two mammalian species tested, based on functional-label
analysis of the transcriptome. The new type of
functional networks and statistics of FL-to-protein links
and PI-to-TU links for human and mouse proteomes are
derived. Using statistical analysis of these networks, we
propose an evolutionary mechanism of protein function
acquisition. The process includes stages with different
evolutionary constraints. The network analysis allows us to
analyze genome-transcriptome-proteome evolutionary
plasticity. We also compare diversity of biological
functions of proteomes in different species. The functional
networks reveal a group of genes and corresponding functions
which could be attributed to an early conserved part of
the cellular machinery.
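The labelling steps described above (domains to keywords, keyword combinations to functional labels, functional labels to a TU/FL network) can be sketched roughly as follows. This is only an illustrative sketch: the identifiers DOM_A, KW_1, iso1a, TU1 and so on are hypothetical placeholders, not real InterPro, UniProt or FANTOM identifiers, and the function names are assumptions introduced here rather than the authors' software.

```python
from collections import defaultdict

# Hypothetical toy data (placeholders, not real database identifiers).
# DOM_C has no keyword, mimicking entries without an assigned keyword.
DOMAIN_TO_KW = {"DOM_A": {"KW_1"}, "DOM_B": {"KW_2"}, "DOM_C": set()}
ISOFORM_DOMAINS = {
    "iso1a": ["DOM_A", "DOM_B"],
    "iso1b": ["DOM_A"],
    "iso2a": ["DOM_A", "DOM_C"],
}
TU_ISOFORMS = {"TU1": ["iso1a", "iso1b"], "TU2": ["iso2a"]}


def functional_label(domains, domain_to_kw):
    """Combine the keyword IDs of all domains of one protein into one label."""
    keywords = set()
    for domain in domains:
        keywords.update(domain_to_kw.get(domain, ()))
    return frozenset(keywords)


def build_tu_fl_network(tu_isoforms, isoform_domains, domain_to_kw):
    """Link each transcriptional unit to the labels of its protein isoforms."""
    network = defaultdict(set)
    for tu, isoforms in tu_isoforms.items():
        for isoform in isoforms:
            label = functional_label(isoform_domains[isoform], domain_to_kw)
            if label:  # skip isoforms whose domains carry no keyword
                network[tu].add(label)
    return dict(network)


def functional_space(network):
    """The set of distinct functional labels found across all TUs."""
    return set().union(*network.values())


NETWORK = build_tu_fl_network(TU_ISOFORMS, ISOFORM_DOMAINS, DOMAIN_TO_KW)
```

In this toy example TU1 carries two different labels because its two isoforms differ in domain content, which is the kind of isoform-level difference the abstract sets out to measure.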
Sequence-specific interaction of PNA with duplex DNA
Peptide Nucleic Acid (PNA) is an artificial analog of
nucleic acids carrying DNA bases and a peptide backbone. It
has rather unusual modes of binding to DNA and RNA. Most
interestingly, PNA has a unique ability to target
double-stranded DNA (dsDNA) sequence-specifically by invading the DNA
double helix. There are two modes of duplex invasion
by PNA. Homopyrimidine PNAs, known also
as DNA openers, invade duplex DNA via triplex formation
leaving one of two DNA strands displaced and thus capable of
interactions with single-stranded oligonucleotides and PNA
oligomers via Watson-Crick pairing. Pseudocomplementary PNAs
(pcPNAs), which carry chemically modified bases, exhibit a
double-duplex mode of binding to duplex DNA interacting with
both complementary strands. The mechanism, structure and
applications of these two types of sequence-specific
PNA-dsDNA complexes are covered. In particular, the use of
PNA openers for highly specific fluorescence detection of
short signature sequences in bacterial genomes and the use
of pcPNAs for sequence-specific bending of duplex DNA are
discussed in detail.
Hybrid Hamiltonian replica exchange based on Poisson-Boltzmann model
We present a modified sampling scheme, termed REMDhPB, which uses the framework
of replica-exchange simulation with explicit solvent
molecules and replaces the system exchange probability for
jumping between different temperature replicas with the
kernel exchange probability.
The kernel includes only the peptide/protein and/or
protein complexes. The energy of the kernel is the sum
of the vacuum energy calculated by force field and the polar
solvation energy obtained from Poisson-Boltzmann model plus
the non-polar solvation energy estimated from solvent
accessible surface. Canonical distributions of the
conformations of three distinct penta-peptides were obtained
in comparison with those from standard REMD simulations.
Moreover, this method has been applied to fold a
decapeptide ab initio to its native beta-hairpin structure
from extended conformations.
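As a rough illustration of the exchange step, the sketch below evaluates the standard Metropolis criterion of temperature replica exchange, min(1, exp[(beta_i - beta_j)(E_i - E_j)]), with the kernel energy substituted for the full system energy. That the scheme uses exactly this form is an assumption made here for illustration; the function names and the example numbers are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def kernel_energy(e_vacuum, g_polar_pb, g_nonpolar_sa):
    """Kernel energy: vacuum force-field energy plus polar (Poisson-Boltzmann)
    and non-polar (surface-area) solvation terms, all in kcal/mol."""
    return e_vacuum + g_polar_pb + g_nonpolar_sa


def exchange_probability(e_i, temp_i, e_j, temp_j):
    """Metropolis acceptance probability for swapping the configurations of
    replica i (at temp_i, energy e_i) and replica j (at temp_j, energy e_j)."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)
```

With these definitions a swap that moves the lower-energy configuration to the colder replica is always accepted, while the reverse swap is accepted only with a Boltzmann-weighted probability.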
Increasing confidence of protein-protein interactomes
High-throughput experimental methods, such as
yeast-two-hybrid and phage display, have fairly high levels
of false positives (and false negatives). Thus the list of
protein-protein interactions detected by such experiments
would need additional wet laboratory validation. It would be
useful if the list could be prioritized in some way.
Advances in computational techniques for assessing the
reliability of protein-protein interactions detected by such
high-throughput methods are reviewed in this talk, with a
focus on techniques that rely only on topological
information of the protein interaction network derived from
such high-throughput experiments. In particular, we discuss
indices that are abstract mathematical characterizations of
networks of reliable protein-protein interactions and
indices that are based on explicit motifs associated with
true-positive protein interactions.
Large-scale inference of condition-specific regulation using gene expression data and the predicted transcription factor occupancy of promoters
Gene expression experiments have been performed under
many different conditions. In contrast, large-scale ChIP-chip
experiments (i.e., those involving many transcription
factors) have been performed under just a few. It is likely,
therefore, that only a fraction of condition-specific
functional binding sites have been identified. Computational
methods are required to further correlate factors,
conditions, and target genes in order to infer more
comprehensive regulatory networks, and to generate
hypotheses that can be tested by directed ChIP experiments.
An overview of weighted gene co-expression network analysis
Weighted gene co-expression network analysis (WGCNA)
facilitates a systems biologic view of gene expression data.
The network framework makes it straightforward to integrate
gene expression data with other types of data, e.g. clinical
traits and genetic marker data. This talk covers several
theoretical topics including network construction, module
definition, network based gene screening, and differential
network analysis. The methods are illustrated using several
applications including i) screening for cancer genes, ii)
comparing human and chimp brains, and iii) complex disease
gene mapping. Related articles and material can be found at
the following webpage
http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/
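The network construction step can be illustrated with a minimal sketch of an unsigned weighted network, in which the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power beta. The toy expression values, the gene names and the choice beta = 6 are assumptions for illustration only; the code is plain Python and does not use the WGCNA software itself.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


def connectivity(adjacency, gene):
    """Sum of a gene's connection strengths to all other genes."""
    return sum(w for (g, _), w in adjacency.items() if g == gene)


# Hypothetical toy expression profiles (4 samples per gene).
EXPR = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
ADJ = soft_threshold_adjacency(EXPR)
```

Raising the correlation to a power keeps strong correlations close to 1 and pushes weak ones towards 0 without a hard cut-off, which is what distinguishes a weighted network from a thresholded one.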
Functions, networks, and phenotypes by integrative genomics analysis
The rapid accumulation of genomics data provides
unprecedented opportunities to systematically infer gene
functions, regulatory networks, and phenotype associations.
In this talk, we develop several graph-based data mining
algorithms to integrate diverse genomics data, especially
the vast amount of microarray data in the public
repositories. A series of microarray data sets are modeled
as a series of co-expression networks, in which we search
for frequently occurring network patterns. Our integrative
approach for functional annotation provides three major
advantages over the commonly used microarray analysis
methods: (1) it enhances signal-to-noise separation, (2) it identifies
functionally related genes without co-expression, and (3) it
provides a way to predict gene functions in a
context-specific way. Furthermore, we show that frequently
occurring co-expression clusters are more likely to
represent transcriptional modules than those clusters
derived from a single microarray dataset. In addition, we
propose the concept of "second-order correlation" which
enables us to trace the upstream events of transcription
cascades. Finally, we develop methods to systematically
identify phenotype specific network patterns and regulatory
modules.
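The idea of searching a series of co-expression networks for recurring patterns can be illustrated in its simplest form, recurring single edges. The sketch below is only that simplest case under stated assumptions: the function names, thresholds and toy networks are hypothetical, and the graph-mining algorithms of the talk search for larger patterns than individual edges.

```python
from collections import Counter


def coexpression_edges(corr, threshold):
    """Edges of one co-expression network: gene pairs whose absolute
    correlation in a single data set reaches the threshold."""
    return {pair for pair, r in corr.items() if abs(r) >= threshold}


def frequent_edges(networks, min_support):
    """Edges that occur in at least min_support of the given networks."""
    counts = Counter(edge for network in networks for edge in network)
    return {edge for edge, count in counts.items() if count >= min_support}


# Hypothetical toy series of three co-expression networks.
E12 = frozenset({"g1", "g2"})
E23 = frozenset({"g2", "g3"})
E34 = frozenset({"g3", "g4"})
NETWORKS = [{E12, E23}, {E12}, {E12, E23, E34}]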
Modeling of genetic regulatory networks: a data-driven process
High throughput genomic and proteomic studies of
clinical samples have generated a large amount of data but
very little information and much less wisdom. We understand
that transcripts and proteins are interlinked but it is a
major challenge to develop appropriate mathematical models
that reveal the logical and physical relationships among the
components of the biological systems. We submit that a key
modeling criterion is that the model has to be data-driven:
it has to be able to take in biological data and produce
experimentally testable diagrams or networks. Only when this
correlation is demonstrated again and again can we reach a
conclusion that a biologically appropriate mathematical
model is born.
Local signaling networks defined by quantitative morphological signatures
Genetically identical cells can adopt a diverse spectrum
of complex shapes in order to accomplish a variety of
functions. Classical experimental approaches have identified
hundreds of unique proteins that play roles in the dynamic
remodeling of cell shape in response to upstream signals,
but there is little understanding of how these proteins are
physically organized into networks in subcellular space, and
how information flows through this sophisticated molecular
circuitry in real-time. In order to model the signaling
networks that regulate cell shape, we have developed a novel
analytical technology termed “Quantitative Morphological
Profiling” that uses 150-600 different features to describe
the morphology of single cells.
Micro/nano crystal network: from understanding to design of bio-functional materials
High-performance functional materials consisting of
interconnecting networks are becoming increasingly important in
both science and technology. It has been shown that in
many cases, crystal networks of the hybrid structure give
rise to properties far superior to those of single crystals
themselves. For instance, the special crystal network
structure of amino acids in spider silk leads to a tensile
strength several orders of magnitude higher than that of single chains of
amino acids. For these reasons, our interest has shifted
from the control of single crystals, such as their size and shape,
to the engineering of the crystal network.
In this contribution, new understanding of the formation
kinetics of crystal networks, and of the correlation between the network
structure and the properties of the systems, will be
presented. This represents a new direction in the field of
crystal growth and crystal engineering. It can also be
envisaged that in the 21st century, the engineering of
crystal networks will become one of the most active
directions in materials science. In this talk, I will
introduce the latest developments in the kinetics of fiber
network formation, the correlation between the structures of
biological functional materials and their in-use properties,
and the application to nanoengineering. This includes the
engineering of nano-phase and ultra-functional
bio-materials.
Probabilistic models of sampling and emergence of biological networks
We will show that statistics of observed events in
evolving finite biological systems cannot be formally fitted
and mechanistically explained in terms of the so-called
"scale-free" network approach. However, families of
skewed size-dependent probability distribution functions
can be used. In particular, we demonstrate that statistics
of the number of domain-to-protein links in the proteomes of
hundreds of species representing all three super-kingdoms of
life (archaea, bacteria, eukaryotes) fit well to
Markov birth-death random process models, the steady-state solution
of which is approached by a size-dependent Generalized Pareto
function. A parameterization of this model allows us to
associate the complexities of prokaryotic and eukaryotic
organisms with two distinct network statistics,
Computational identification of gene sets controlled by transcription factors on genome scale
Advances in high-throughput technologies, such as ChIP-chip
and ChIP-PET (Chromatin Immuno-Precipitation Paired-End
diTag), and the availability of human and mouse genome
sequences now allow us to identify transcription factor
binding sites (TFBS) and analyze mechanisms of gene
regulation on the level of the entire genome. Here, we have
developed a computational approach, which uses ChIP-PET data
and statistical modeling to assess experimental noise and
identify reliable TFBS for c-Myc, STAT1 and p53
transcription factors in the human genome. We present a
mixture probabilistic model and the Monte Carlo simulation
model of ChIP-PET data to define the background noise
of the sequence clustering and to identify the probability
function of specific DNA-protein binding in the eukaryotic
genome. We will demonstrate good agreement of the
curve-fitting and simulation methods which in combination
with motif search procedure not only distinguishes bona fide
TFBSs from non-specific binding sites with a high
specificity, but also provides computational basis for
further optimization of experimental parameters of the ChIP-PET
method. We will also present novel methods for estimating
the saturation of the specific binding sites and the sensitivity and
reproducibility of essentially incomplete ChIP-PET data
sets. Computational integration of ChIP-PET data with
microarray expression data will also be carried out and
used to predict direct target genes and transcriptional
network modules.
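A Monte Carlo estimate of background clustering can be sketched as follows: fragments are dropped at random on a genome, overlapping fragments are grouped into clusters, and the simulated distribution of cluster sizes indicates how large a cluster must be before it is unlikely to arise by chance. Uniform random placement, a fixed fragment length and all parameter values below are simplifying assumptions made for illustration; they are not taken from the authors' model.

```python
import random
from collections import Counter


def cluster_sizes(starts, frag_len):
    """Group fragments [s, s + frag_len) into clusters of overlapping
    fragments and return the size of each cluster, left to right."""
    sizes = []
    current_end = None
    for s in sorted(starts):
        if current_end is not None and s < current_end:
            sizes[-1] += 1  # overlaps the current cluster
        else:
            sizes.append(1)  # starts a new cluster
        current_end = s + frag_len
    return sizes


def simulate_background(genome_len, n_frags, frag_len, n_runs, seed=0):
    """Tabulate cluster sizes over n_runs random placements of n_frags."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        starts = [rng.randrange(genome_len - frag_len + 1)
                  for _ in range(n_frags)]
        counts.update(cluster_sizes(starts, frag_len))
    return counts


def tail_probability(counts, k):
    """Fraction of simulated background clusters with at least k fragments."""
    total = sum(counts.values())
    return sum(c for size, c in counts.items() if size >= k) / total


BACKGROUND = simulate_background(1_000_000, 200, 500, n_runs=20)
```

An observed cluster of size k could then be compared with tail_probability(BACKGROUND, k) as a crude indication of how often random placement alone produces clusters that large.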
Computer Simulation of Biological Pathways and Network Crosstalk based on Mass Action Laws
Biological systems are complex systems involving dynamic
biomolecular interactions in the context of specific
pathways and crosstalk among different pathways.
Computational simulation of these processes enables a deeper
understanding of the functional outcomes and fundamental
mechanisms of action of biological networks. It also
enables the identification of therapeutic targets and the
simulation of disease processes and of the therapeutic and other
effects of drug actions. This tutorial gives an introductory
overview of the simulation of biological pathways and
network crosstalk based on the application of mass action
laws. The simulation model for the EGFR-Ras/MAPK pathway and
the simulation model of RhoA's crosstalk to EGFR-mediated
Ras/MAPK activation via MEKK1 are used as illustrative
examples.
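To make the mass action idea concrete, the sketch below integrates a single reversible binding step, L + R <-> C, whose net rate is kf[L][R] - kr[C]. It is a generic illustration rather than the EGFR-Ras/MAPK model of the tutorial; the species, rate constants, initial concentrations and the simple fixed-step Euler scheme are all assumptions chosen for brevity.

```python
def simulate_binding(l0, r0, kf, kr, dt=0.001, steps=20000):
    """Integrate d[C]/dt = kf*[L]*[R] - kr*[C] with fixed-step Euler.

    l0, r0: initial concentrations of ligand L and receptor R
    kf, kr: forward and reverse rate constants (hypothetical values)
    Returns the final concentrations (L, R, C).
    """
    l, r, c = l0, r0, 0.0
    for _ in range(steps):
        rate = kf * l * r - kr * c  # net mass-action rate of complex formation
        l -= rate * dt
        r -= rate * dt
        c += rate * dt
    return l, r, c


# Hypothetical parameters: 1.0 unit of ligand, 0.5 unit of receptor.
L_END, R_END, C_END = simulate_binding(l0=1.0, r0=0.5, kf=2.0, kr=1.0)
```

A pathway model is built by writing one such rate term per reaction and summing, for each species, the terms of the reactions that produce or consume it; crosstalk appears as species shared between the reaction lists of two pathways.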
Building and using protein interaction networks: industry perspective
In this presentation we describe our efforts in
assembling a protein interaction database, developing
functional ontologies and building mechanistic
reconstructions of major biological pathways. We describe
our annotation process by which we collect information about
three major levels of cell processes: activation of membrane
receptors, signaling cascades and “core effectors” such as
metabolic pathways. Collected protein and small molecule
interaction data form the basis for building mechanistic
reconstruction for major processes in cell signaling and
metabolism and for making marked improvements in functional
ontologies. We describe tools and algorithms that make it
possible to use the content of our database in the process of mining
different disease and drug-related “OMICs” datasets produced
in the context of drug discovery and other biomedical
research.
Elucidation of differential response networks from gene expression data
We describe a novel systems level approach to the analysis of gene expression data and the elucidation of biological networks affected by drug action. Specifically, some 15,000 human linear signaling and biochemical pathway modules were generated from “canonical” maps in MetaCore™ (GeneGo, Inc.), and used as templates for mapping microarray expression data from rat livers exposed to phenobarbital, mestranol and tamoxifen. We analyzed sample-to-sample distances, in gene expression space, of individual pathway modules in order to select “differential” pathways. These are defined by highly correlated expression among multiple repeats of the same treatment, and strong anti-correlation between different treatments. The gene content of these differential pathways is then used for generating network modules that distinguish between the treatments, followed by enrichment analysis using several ontologies. Unlike traditional techniques in microarray expression profiling, our method takes into account both network connectivity and gene expression profiles, and allows for the use of “whole genome” expression data, which is not restricted by fold change and p-value for individual data points. The method enables detection of important cellular mechanisms involved in drug response that would have been missed by traditional procedures of statistical and functional analysis.
Protein scoring based on significance in biological networks
While systems biology tools and approaches are gaining wide acceptance among molecular biologists and clinical researchers, two fundamental issues have emerged. The first is how to use sets of available high-throughput molecular data to reconstruct biological networks that are truly relevant to the condition of interest. The second, even more important issue is how to use the results of such reconstruction in the framework of standard laboratory practices and in clinical applications. We present a novel algorithm designed to evaluate the importance of individual protein nodes for providing connectivity in condition-specific biological networks. The algorithm starts with a condition-specific set of genes or proteins (e.g. differentially expressed genes). First, we construct a shortest-path network connecting these genes using the global database of interactions available in MetaCore™. Second, we evaluate the number of all paths traversing each node in the shortest-path network in relation to the total number of paths going via the same node in the global network. Using these numbers, as well as the relative size of the initial gene set, we calculate p-values for each node in the shortest-path network, showing whether or not it is statistically significant for providing connectivity. We test the algorithm’s ability to assign high significance to biologically validated drug targets by using a public set of gene expression data from psoriatic patients. We show that our method is able to uncover many genes that do not show up at the gene expression level but are nevertheless highly related to disease pathways. We then demonstrate the application of this approach to finding correlations between sets of genomic and proteomic data. The approach can be applied to uncovering new, higher-quality drug targets, validation of existing targets and cross-validation of genomic and proteomic or other types of data.
MetaTox: leveraging the power of systems biology for improving drug safety
It is now accepted practice that evaluation of drug and chemical safety should involve understanding of the perturbations caused by a compound in functional units of living cells: biological pathways, networks and modules. Current analytical procedures in toxicogenomics, however, are mainly focused on statistical analysis of expression patterns, aiming at identification of small sets of genes which are most characteristic of a certain treatment. MetaTox is a new concept that applies techniques of systems biology to predicting drug toxicity and understanding the cellular mechanisms behind drug response. In MetaTox we conduct analysis at the level of “functional descriptors”: pathways, network modules, and functional profiles, rather than analyzing individual genes. Such analysis allows building predictors that are multidimensional and robust, and that allow mechanistic and functional interpretation. Moreover, these functional predictors enable one to consider drug safety in the context of specific indications. We demonstrate application of this concept to the analysis of real-life toxicogenomics datasets and describe the MetaTox Consortium, a collaborative effort to improve drug safety evaluation by leveraging systems biology approaches.
Molecular simulations and insights into molecular biology
Applications of multi-resolution techniques in molecular
simulations are providing new insights into the functioning
of biological macromolecular machines and the regulation of
pathways and networks. These will be discussed using several
examples that have been developed in close association with
experiments.
Towards bridging the gap between transcriptome and proteome measurements
Much of the interesting cellular function in biology is
attributable to mechanisms that differentially regulate
concentrations of proteins.
RNA - biology and secondary structure
According to the very recent ENCODE Consortium paper
appearing in Nature, the human genome is pervasively
transcribed; i.e. around 15% of the genome is transcribed
although only a fraction of the transcripts account for
mRNA, tRNA, rRNA, microRNA, etc. Is nature so wasteful as to
squander a large percent of the cell's energy resources
toward transcribing "junk RNA"? Or instead are we at the
very threshold of beginning to unravel the mystery of new
classes of RNA and their function? In this talk, we present
an overview of the chemistry and biology of RNA: glycosidic
bond, nonstandard RNAs, base stacking, Tinoco, free energy,
noncanonical base pairing, Leontis-Westhof classification,
3-dimensional motifs. We then discuss RNA secondary
structure, asymptotic results, dynamic programming, minimum
free energy structure prediction, and applications.
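The dynamic programming idea behind secondary structure prediction can be illustrated with the classic Nussinov base-pair maximization recursion. This is a deliberately simplified stand-in: minimum free energy prediction uses the same kind of recursion over subsequences but scores structures with nearest-neighbor free energy parameters instead of counting pairs. The minimum hairpin loop length of 3 and the inclusion of G-U wobble pairs are assumptions of this sketch.

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in seq, requiring at least
    min_loop unpaired bases between the two partners of any pair."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, the three G-C pairs of a small hairpin such as GGGAAAACCC are found, whereas GAAC yields no pair because only two bases would separate G and C.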
RNA - algorithms
In this continuation, we discuss the Boltzmann partition
function for RNA secondary structure, sampling, applications
to siRNA design, as well as structural alignment algorithms
and some noncoding RNA gene finders.
Physics-based all-atom modeling of protein dynamics
Physics-based modeling has become an indispensable tool
to study protein dynamics. Recent improvements in force fields
and rapid increases in computer speed have made this method
increasingly powerful. Recent advances include the
successful simulation of a number of small proteins folding to
their native states, with structures as close as
sub-angstrom to the experimental structures. These
exciting advances mark the beginning of accurate
simulations of protein folding to the native states of
proteins, which has not been possible before. This talk is
divided into two parts.
Controllable gating of the water permeation across nanoscale channels
In this talk, the dynamics of the single-file water
chains inside a single-walled carbon nanotube (SWNT) with an
appropriate radius was studied with molecular dynamics
simulations under the influence of continuous deformations
and/or external charges. It is found that water
permeation across the channel shows excellent on-off gating
behaviour. The water conduction across the channel
remains almost unchanged under considerable deformation and/or for a
very small distance of the external charge from the channel.
The channel closes rapidly when the deformation exceeds a threshold
and/or the distance of the external charge from the channel
falls below a threshold. We believe that this excellent
property is important for biological systems to achieve
accurate information transfer in an environment full of
thermal fluctuations and useful to develop SWNT-based
molecular machines.
Signal and motif detection in genomic sequences
Signals in genomic sequences refer to specific sites
relating to important biological phenomena, for example,
transcription start sites (TSS), translation initiation
sites (TIS), and splice sites (SS). The computational
techniques to detect these signals are becoming popular
because of the complexities and difficulties in determining
these sites experimentally. This tutorial will introduce
computational intelligence techniques, such as neural
networks, genetic algorithms, and their hybrids for the
detection of TSS, TIS, and SS.
Statistical physics of RNA
Recap of Boltzmann partition function, molten RNA,
z-transforms, RNA denaturation, native-molten transition.
Quantitative modeling of force-extension experiments
Experimental techniques for force-extension experiments,
polymer physics of single-stranded RNA, secondary structure
in force-extension experiments, quantitative modeling,
structure determination through force-extension experiments.
RNA folding kinetics
Types of experiments, modeling approaches, nanopore
experiments, modeling of nanopore experiments.
MicroRNA target prediction
Biology of microRNAs, target prediction problem, overview
of target prediction software.
Algorithms for peptide sequencing from tandem mass spectrometry
Tandem mass spectrometry has become the technology of
choice in many proteomics projects. Computational analysis
of the MS/MS mass spectra generated by proteomics machines
has often been the bottleneck in applying this technology.
In this talk, we present a quick overview of computational
approaches for peptide sequencing.
Sense-antisense human gene expression pairs: data mining and analysis on global genome scale
Transcription of mRNAs from the strand opposite to a given gene may cause numerous regulatory effects on gene expression, pathways and cellular functions. Computational approaches based on mapping gene transcripts onto the human genome have reported several thousand naturally transcribed mRNAs from genes located on opposite strands of the same locus (cis-antisense, or sense-antisense, gene pairs (CASGP)). However, since the reported databases use different sources of sequence information (UniGene, EST, SAGE, etc.), they provide poorly compatible and essentially incomplete sets of sense-antisense (SA) gene pairs. To integrate the data on SA pair transcription we created a unified SA gene pair database that maps the latest GenBank RefSeq, mRNA and EST sequences onto the human genome and re-maps several previously published CASGP data sets (Y.L. Orlov, Jiangtao Zhou, V.A. Kuznetsov). Clustering of the reported transcripts by chromosome coordinates revealed up to 9000 SA loci. Analyzing our database, microarray expression data and the literature, we demonstrate that sense-antisense gene pairs can provide regulatory functions at several levels of the gene expression process, including alternative splicing, binding, translational regulation, RNA stability and trafficking. We also demonstrate associations of different expression patterns of CASGP transcripts with phenotypes of different normal and cancer cells.
Probing the secondary structure landscape of RNA
In Nature 447(7146):799-816, 2007 the ENCODE Consortium
published a landmark paper which stated that the human
genome is "pervasively expressed"; indeed, while 14.7% of
the genome is transcribed, only 1-2% of the transcript can be
accounted for by mRNA, rRNA, tRNA, miRNA, etc. The
intellectual end of the popular press reacted to the ENCODE
paper in the June 14, 2007 issue of The Economist, which stated
“Molecular biology is undergoing its biggest shake-up in
50 years, as a hitherto little-regarded chemical called RNA
acquires an unsuspected significance. It is beginning to dawn
on biologists that they may have got it wrong. Not
completely wrong, but wrong enough to be embarrassing.”
Some recent genomics and structural biology web servers
In this talk, we present an overview of various tools our lab has developed, mostly in the area of structural biology. We will discuss time warping and its use in functional genomics, disulfide connectivity, cysteine state prediction, beta-barrel transmembrane structure, and 3-dimensional motif detection in RNA. This work is a collaboration of F. Ferre, W.A. Lorenz,
Y. Ponty, J. Waldispuhl and myself. Of particular
significance are the energy model and the very deep use of
grammars in the transmembrane supersecondary structure
algorithm of Jerome Waldispuhl, now a lecturer at MIT.