RECOMB Satellite Workshop on Regulatory
Genomics
(17 - 18 Jul 2006)
~ Abstracts ~
Discovering motifs with
transcription factor domain knowledge
Francis Chin, Hong Kong University, Hong Kong
Finding the binding sites of transcription factors from a
set of promoter regions of co-regulated genes is an
important problem in molecular biology. Most
motif-discovering algorithms consider over-represented
similar patterns as binding sites and find the position
specific score matrix (PSSM) with the maximum likelihood as
the solution motif. However, many motifs in real biological
data cannot be discovered by these algorithms because they
do not consider the biological characteristics of binding
sites. We introduce a new algorithm, DIMDom, which exploits
two kinds of information: (a) the characteristic pattern of
binding site classes, where the class is determined from
biological information about transcription factor domains,
and (b) the posterior probabilities of these classes.
We compared the performance of DIMDom with MEME on all the
transcription factors of Drosophila in the TRANSFAC database
and found that DIMDom outperformed MEME with more than
double the number of successes and double the accuracy in
finding binding sites and motifs.
Joint work with Henry Leung.
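As a rough illustration of the PSSM representation mentioned in the abstract (and not of DIMDom itself), the following sketch builds a log-odds matrix from a few aligned sites and scans a sequence for the best-scoring window. The example sites, the pseudocount, the uniform background, and all function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Return one dict per position mapping each base to a log-odds score."""
    width = len(sites[0])
    pssm = []
    for i in range(width):
        # Start every base at the pseudocount so unseen bases stay finite.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score(pssm, window):
    """Sum the per-position log-odds scores of a window."""
    return sum(col[base] for col, base in zip(pssm, window))

def best_hit(pssm, seq):
    """Return (score, position) of the highest-scoring window in seq."""
    width = len(pssm)
    return max((score(pssm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

# Hypothetical aligned binding sites and a hypothetical promoter fragment.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(sites)
best_score, best_pos = best_hit(pssm, "CCGATGACTCATTGC")
```

A maximum-likelihood motif finder searches over such matrices; the sketch only shows how a fixed matrix scores candidate sites.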
Computational challenges for
top-down modeling and simulation of biological pathways
Satoru Miyano, University of Tokyo, Japan
If ordinary or partial differential equations were the only
means of modeling biological pathways for simulation, as in
some software tools, our understanding of life as a system
through computation would not increase drastically and would
be very biased. If the language for modeling and describing
biological pathways were not rich enough to include graph
structures, GIF files, binary relations, kinetic equations,
links to other information resources, English narratives,
etc., we would lose a lot of valuable knowledge and
information on biological systems produced and reported by
laboratories, because biological knowledge and information
are very heterogeneous.
With this understanding as the basis of our development, we
have been developing an XML format, Cell System Markup
Language (CSML), and a modeling and simulation tool, Cell
Illustrator. In this talk, we present the newest versions,
CSML 3.0 and Cell Illustrator 3.0, which supports CSML 3.0.
Cell Illustrator (CI for short) is a software tool for
modeling and simulating biological pathways. It is based on
the notion of Petri nets and was developed under the name
Genomic Object Net. An important challenge for Systems
Biology is to create a software platform with which
scientists in biology/medicine can comfortably create models
of dynamic causal interactions and processes in the cell(s)
and simulate them for further investigations, e.g.
testing/creating hypotheses. CI employs the Hybrid
Functional Petri Net with extension (HFPNe) as its
architecture. HFPNe was defined by adding functions to the
hybrid Petri net so that various aspects of pathways can be
modeled intuitively, with types such as integer, real,
string, boolean, vector, and object. The architecture of CI
3.0 is designed so that users can engage in modeling and
simulation in a biologically intuitive way, drawing on their
profound knowledge and insights, and can also benefit from
public and commercial pathway databases. We consider that
biological system modeling should be conducted by biological
scientists because their minds are full of unpublished deep
insights that are indispensable for correct modeling.
Therefore, any computational effort to develop such modeling
and simulation tools should take this aspect into account.
CI 3.0 has a biology-oriented GUI that makes modeling very
complex biological processes feel like using a drawing tool.
Further, users can create a personalized visualization of a
simulation by writing an XML document for animation. Its
effectiveness has been demonstrated by
modeling various biological processes. Recently, we have
developed a method for automatic parameter estimation for
HFPN models by developing a theory of data assimilation that
will be implemented as a function of CI.
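To make the hybrid functional Petri net idea more concrete, here is a toy sketch: places hold values, continuous transitions move quantity at a speed computed from the current marking, and discrete transitions fire when a guard holds and enough tokens are present. All class names, the example pathway, the rate constants, and the use of Euler integration are assumptions for illustration; this is not the Cell Illustrator or CSML API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Marking = Dict[str, float]

@dataclass
class ContinuousTransition:
    inputs: Dict[str, float]           # place -> weight consumed per unit flow
    outputs: Dict[str, float]          # place -> weight produced per unit flow
    rate: Callable[[Marking], float]   # firing speed as a function of the marking

@dataclass
class DiscreteTransition:
    guard: Callable[[Marking], bool]   # extra firing condition on the marking
    inputs: Dict[str, float]           # tokens consumed when firing
    outputs: Dict[str, float]          # tokens produced when firing

def step(marking, continuous, discrete, dt):
    """Advance the net by one Euler step of size dt and return the new marking."""
    new = dict(marking)
    for t in continuous:
        # Rates are evaluated on the marking at the start of the step.
        flow = max(0.0, t.rate(marking)) * dt
        for place, w in t.inputs.items():
            new[place] -= w * flow
        for place, w in t.outputs.items():
            new[place] += w * flow
    for t in discrete:
        enabled = all(new[p] >= w for p, w in t.inputs.items())
        if enabled and t.guard(new):
            for p, w in t.inputs.items():
                new[p] -= w
            for p, w in t.outputs.items():
                new[p] += w
    return new

# Hypothetical pathway: an active gene produces mRNA, mRNA decays and drives
# protein production, and the gene switches off once protein reaches 10.
marking = {"gene_on": 1.0, "gene_off": 0.0, "mRNA": 0.0, "protein": 0.0}
continuous = [
    ContinuousTransition({}, {"mRNA": 1.0}, lambda m: 2.0 * m["gene_on"]),
    ContinuousTransition({"mRNA": 1.0}, {}, lambda m: 0.1 * m["mRNA"]),
    ContinuousTransition({}, {"protein": 1.0}, lambda m: 0.5 * m["mRNA"]),
]
discrete = [
    DiscreteTransition(lambda m: m["protein"] >= 10.0,
                       {"gene_on": 1.0}, {"gene_off": 1.0}),
]
for _ in range(1000):
    marking = step(marking, continuous, discrete, dt=0.01)
```

The sketch assumes a step size small enough that no continuous place goes negative; a real simulator would need to handle that case explicitly.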
Simultaneously, we developed an XML format called Cell
System Markup Language (CSML) for describing biological
systems for simulation. Several XML formats have been
proposed as standard formats for biological pathways.
However, each of them provides only a partial solution for
the storage and integration of biological data. The aim of
CSML 3.0 is to create a truly usable XML format for
visualizing, modeling and simulating biological pathways. In
many cases, in vivo/in vitro experimental results and in
silico analysis results are useful information for
biological pathway analysis. A successful application is
Cytoscape, which can combine in vivo/in vitro and in silico
analyses into one graphical network. The core application
supports a text-based format and the GML format, and plugins
for importing XML formats have been developed. However,
their functionality is limited. In addition, the application
only visualizes pathway-related data; dynamic simulation is
missing.
Other XML formats, SBML 2.0 and CellML 1.0, have been
proposed and developed for dynamic simulation. These formats
have become popular for chemical reactions, and many
applications support them as data exchange formats. However,
they do not define any graphical elements, which makes it
difficult for them to serve as a powerful data exchange
format among biological pathway applications. CSML 3.0 is
therefore developed as an integrated, unified data exchange
format that covers widely used data formats and
applications, e.g. CellML 1.0, SBML 2.0, BioPAX, and
Cytoscape. In CSML 1.9 and CSML 2.0, the main focus was to
support Hybrid Functional Petri net (HFPN) based
visualization and simulation. CSML 3.0 focuses on the Hybrid
Functional Petri net with extension (HFPNe) architecture,
which extends HFPN with the notion of objects, for more
advanced biological pathway modeling. In short, each object
that makes up a biological pathway is treated as a "generic
entity" of the HFPNe architecture, and any relation among
objects is treated as a "generic process" of the HFPNe
architecture. The details of CSML 3.0 will be available from
http://www.csml.org/
We also developed programs that automatically convert SBML
2.0 and CellML 1.0 to CSML 3.0. Cell Illustrator 3.0 fully
supports CSML 3.0 as its base XML format. Thus every model
in SBML 2.0 or CellML 1.0 can be executed on Cell
Illustrator 3.0. It is also possible to automatically
convert KEGG and BioCyc metabolic pathways to CSML.
A tale of two topics --- motif
significance and sensitivity of spaced seeds
Ming Li, University of Waterloo, Canada
Computing the p-value of a motif has long been a very
difficult problem, and many heuristic algorithms try to
approximate it. It turns out that this problem is very
similar to optimal spaced seed design in homology search.
Connecting the two topics, we show for the first time that
computing the p-value is NP-hard, and give a reasonably fast
algorithm based on dynamic programming. Test results will be
given.
Joint work with J. Zhang, Bo Jiang, J. Tromp, X. Zhang, M.Q.
Zhang
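The abstract does not spell out its definition of a motif p-value or its algorithm. One common formulation, assumed here, is the probability that a single random window scores at least a given threshold under a scoring matrix. For integer scores this can be computed by a standard pseudo-polynomial dynamic program over score values, sketched below. This is a textbook-style illustration, not the algorithm from the talk, and the matrix, background, and function name are hypothetical.

```python
from collections import defaultdict

def score_pvalue(matrix, threshold, background):
    """Return P(score >= threshold) for one random window.

    matrix: list of dicts mapping each base to an integer score.
    background: dict mapping each base to its probability.
    """
    # dist maps a partial score to the probability of reaching it.
    dist = {0: 1.0}
    for column in matrix:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for base, s in column.items():
                nxt[partial + s] += prob * background[base]
        dist = nxt
    return sum(prob for total, prob in dist.items() if total >= threshold)

# Hypothetical 3-column matrix: the consensus base scores 2, others score -1.
matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p_exact = score_pvalue(matrix, 6, background)   # only the consensus "ACG"
p_loose = score_pvalue(matrix, 3, background)   # at least two matching positions
```

The running time grows with the number of distinct reachable scores, which is why real-valued matrices are usually discretized before applying this kind of recursion.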
Computational structural proteomics
and inhibitor discovery
Ruben Abagyan, The Scripps Research Institute, La Jolla,
USA
The rapid advance of structural proteomics calls for the
development of new methods for predicting structural
changes, association, and function, as well as improved
methods for structure-based molecular design. The main challenges of
computational structural biology and chemistry will be
reviewed. We have developed methods for predicting the
functional map of a protein with a known 3D structure,
accurate docking of compounds to a binding site and virtual
ligand screening of large chemical databases, and structure
prediction by global energy optimization, e.g.
characterizing mutants and SNPs, homology modeling, protein
protein or peptide docking, and accurate loop prediction.
Predicting how flexible molecules dock to a flexible
receptor is one of the main challenges in computational
structural biology and structure-based ligand design. Two
stories are presented in which novel compounds were
discovered through "ligand-guided" receptor pocket modeling
followed by virtual screening of large compound libraries.
First, we developed models of the androgen receptor in an
antagonist-bound conformation. These models were used to
computationally discover the secondary activity of
antipsychotic drugs. These drugs were then chemically
altered and "re-purposed" to lose their binding to the
serotonin and dopamine receptors and to improve their
anti-androgen properties. The experimental side of this
project was performed by the labs of Xiaokun Zhang and James
Dalton. Second, in a collaboration with the David Lomas lab
at Cambridge, we identified the first small molecules to
inhibit pathological polymerization of an alpha1-antitrypsin
mutant which is the most common genetic cause of a lethal
liver disease in childhood. Computationally this project was
particularly difficult because the target of a small
molecule was a dynamic protein-protein interface. In addition, we
developed a protocol for protein-protein docking which
produced the winning overall predictions in two consecutive
CAPRI competitions.
Finally, a new way to disseminate structural and functional
information in structural proteomics developed in
collaboration with the Oxford Center for Structural Genomics
is presented.
An improved Gibbs sampling method
for motif discovery via sequence weighting
Tao Jiang, University of California at Riverside, USA
The discovery of motifs in DNA sequences remains a
fundamental and challenging problem in computational
molecular biology and regulatory genomics, although a large
number of computational methods have been proposed in the
past decade. Among these methods, the Gibbs sampling
strategy has shown great promise and
is routinely used for finding regulatory motif elements in
the promoter regions of co-expressed genes. In this paper,
we present an enhancement to the Gibbs sampling method for
the case where expression data for the genes concerned are
available. We propose a sequence weighting scheme that
explicitly takes gene expression variation into account in
Gibbs sampling.
That is, every putative motif element is assigned a weight
proportional to the fold change in the
expression level of its downstream gene under a single
experimental condition, and a position specific scoring
matrix (PSSM) is estimated from these weighted putative
motif elements. Such an estimated PSSM might represent a
more accurate motif model since motif elements with dramatic
fold changes in gene expression are more likely to represent
true motifs. This weighted Gibbs sampling method has been
implemented and successfully tested on
both simulated and biological sequence data. Our
experimental results demonstrate that the use of sequence
weighting has a profound impact on the performance of a
Gibbs motif sampling algorithm.
Joint work with Xin Chen (School of Physical and
Mathematical Sciences, Nanyang Technological University,
Singapore)
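The weighting step described in the abstract can be sketched as follows: each putative motif element contributes to the position-specific counts in proportion to the fold change of its downstream gene, rather than contributing a count of one. This is a minimal sketch of the PSSM estimation step only, not the authors' implementation. The function name, the pseudocount, the rescaling of weights to sum to the number of sites, and the example data are all assumptions.

```python
BASES = "ACGT"

def weighted_pssm(sites, fold_changes, pseudocount=0.5):
    """Estimate per-position base frequencies from weighted motif elements.

    sites: equal-length putative motif elements, one per gene.
    fold_changes: positive fold change of each element's downstream gene.
    """
    n = len(sites)
    total = sum(fold_changes)
    # Weights proportional to fold change, rescaled to sum to n (an assumption).
    weights = [n * f / total for f in fold_changes]
    width = len(sites[0])
    pssm = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for site, w in zip(sites, weights):
            counts[site[j]] += w
        col_total = sum(counts.values())
        pssm.append({b: counts[b] / col_total for b in BASES})
    return pssm

# Hypothetical elements: the third gene shows a much larger fold change.
sites = ["ACG", "ACG", "TCG"]
pssm = weighted_pssm(sites, [1.0, 1.0, 6.0])
unweighted = weighted_pssm(sites, [1.0, 1.0, 1.0])
```

In this toy example the first column favors T under weighting and A without it, which shows how elements near strongly responding genes can dominate the estimated model. In a full sampler, this estimate would replace the unweighted count update at each iteration.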
Computational prediction of
regulatory elements by comparative sequence analysis
Martin Tompa, University of Washington, USA
With many vertebrate genomes now completely sequenced,
the most promising methods for predicting functional
sequence elements are based on comparison of sequences from
multiple species. We focus on problems that arise when using
such tools on a genome-wide scale in the vertebrates. These
problems include difficulties in finding reliably homologous
promoter sequences, difficulties in choosing the best tool
and parameters to apply to these sequences, and difficulties
in assessing the significance of the predictions produced.
Solutions are offered to each of these problems, though they
are far from complete.