Workshop on Computational Systems Biology Approaches to Analysis of Genome Complexity and Regulatory Gene Networks - IMS

Workshop on Computational Systems Biology Approaches to Analysis of Genome Complexity and Regulatory Gene Networks
(20 - 25 Nov 2008)

Jointly organized with Bioinformatics Institute, Agency for Science, Technology and Research, Singapore

~ Abstracts ~

Computations reveal novel methods of regulation of tumor suppressors
Madhumalar Arumugam, Bioinformatics Institute

Computer simulations are used to examine the nature of cooperativity that characterizes the association of the tumour suppressor protein p53 with DNA.
They show why its close homologues do not exhibit cooperativity and also reveal a mode of oligomerization that is novel and complements the consensus view.

The enthalpy-entropy drama of a p53 mutant: simulation studies

Network motifs performance and its possible evolutionary importance
Danail Georgiev Bonchev, Virginia Commonwealth University, USA

Motifs, the simplest building blocks of networks, have characteristic frequencies in networks of different nature, and even in individual networks of the same kind. The initial expectation that evolutionary selection works down to the level of motifs in biological networks has not been fulfilled, and at present motif specifics are viewed rather as side effects of evolution. We developed a different approach to the potential biological importance of motifs in biomolecular networks, asking the question: "Is there a specific motif topology which, other conditions being equal, would provide a higher effectiveness of converting an input chemical signal into an output one?" The dynamics of motifs of the same small size were modeled in parallel by cellular automata and ODEs, and confirmed our hypothesis by ordering the motifs into a series with increasing rate of chemical signal transmission. Our analysis of metabolic networks has shown rather an opposite trend: the "fastest" motifs were among the least frequent ones. A possible explanation could be related to the cyclic nature of many metabolic processes, for which stability matters more than speed. Motif effectiveness proved to be of importance on a higher organizational level of metabolism: the network of interacting metabolic pathways (NIPs). The cross-talk of metabolic pathways was found to favor some of the motifs with the most effective performance. These conclusions were drawn from a large-scale analysis of the metabolic databases of Ma et al. and KEGG with 107 and 252 species, respectively. Our studies revealed some cases of isodynamicity - motifs with different topology producing the same conversion rate - and theorems were proved for two such classes of isodynamic graphs.
The comparison of isodynamic motif frequencies in NIPs revealed an interesting pattern: among such motifs having a single input node and a single output node, considerably more frequent are those having an excess feed-forward link that has no influence on the motif performance. We hypothesize that evolution preferred such structures for stability reasons: incapacitating such a link in an attack on the cell would leave the cell performance unchanged. Investigation is in progress on the potential evolutionary importance of the best-performing motifs in signaling and gene regulatory networks, with some examples presented.
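As a rough illustration of comparing motif topologies by how fast they convert an input signal into an output, the sketch below integrates a toy first-order mass-flow model along the edges of a small directed graph. The kinetics, the rate constant `k`, the Euler step, and the function name `simulate_motif` are all assumptions made for this example; it is not the cellular-automata or ODE model used in the talk and does not attempt to reproduce the isodynamicity results.

```python
def simulate_motif(edges, n_nodes, source=0, k=1.0, dt=0.001, t_end=2.0):
    """Euler-integrate first-order mass flow along directed edges.

    Assumed toy kinetics: each edge (i, j) moves material from node i
    to node j at rate k * x[i]. Returns node amounts at time t_end.
    """
    x = [0.0] * n_nodes
    x[source] = 1.0  # the whole input signal starts at the source node
    for _ in range(int(round(t_end / dt))):
        dx = [0.0] * n_nodes
        for i, j in edges:
            flux = k * x[i] * dt
            dx[i] -= flux
            dx[j] += flux
        x = [xi + dxi for xi, dxi in zip(x, dx)]
    return x


# Two three-node motifs with node 0 as input and node 2 as output.
chain = [(0, 1), (1, 2)]
feed_forward = [(0, 1), (1, 2), (0, 2)]

state_chain = simulate_motif(chain, 3)
state_ff = simulate_motif(feed_forward, 3)
out_chain = state_chain[2]
out_ff = state_ff[2]
print("output after t_end: chain =", out_chain, "feed-forward =", out_ff)
```

Under these assumed kinetics the feed-forward variant delivers more of the signal to the output node by `t_end`. A different rate law could change the ordering, which is why the rate law is exposed as an explicit assumption here.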

Importance sampling of word patterns in DNA and protein sequences
Hock Peng Chan, National University of Singapore

Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position specific weight matrices and co-occurrences of pairs of motifs. This is work done jointly with Nancy Zhang and Louis Chen.
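The insertion idea can be sketched for the simplest case: estimating the probability that a fixed word occurs at least once in a sequence of independent, uniformly distributed letters. This is a minimal illustration under those stated assumptions, not the authors' algorithm; the function names and the uniform background model are choices made for the example. Each sample inserts the word at a uniformly chosen position in a random sequence and is weighted by the likelihood ratio, which here works out to (number of positions) × (word probability) / (number of occurrences in the sample).

```python
import random


def count_occurrences(seq, word):
    """Count occurrences of word in seq, overlapping ones included."""
    m = len(word)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == word)


def importance_sampling_estimate(word, n, n_samples=10000,
                                 alphabet="ACGT", seed=0):
    """Estimate P(word occurs at least once) in a uniform i.i.d. sequence
    of length n, sampling from an insertion-based proposal."""
    rng = random.Random(seed)
    m = len(word)
    positions = n - m + 1                    # possible start positions
    p_word = (1.0 / len(alphabet)) ** m      # probability of word at one position
    total = 0.0
    for _ in range(n_samples):
        seq = [rng.choice(alphabet) for _ in range(n)]
        start = rng.randrange(positions)
        seq[start:start + m] = list(word)    # controlled insertion of the word
        occurrences = count_occurrences("".join(seq), word)
        # Likelihood ratio of the target model to the insertion proposal.
        total += positions * p_word / occurrences
    return total / n_samples


exact_case = importance_sampling_estimate("ACG", 3, n_samples=100)
estimate = importance_sampling_estimate("ACGTACGT", 50, n_samples=2000)
union_bound = 43 * 0.25 ** 8
print(estimate, union_bound)
```

Every sample contains the word by construction, so none are wasted on sequences where the rare event does not occur, which is the weakness of direct Monte Carlo noted above.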

Thermodynamic insights based on atomistic computer simulations reveal an intricate coupling between enthalpy and entropy
Shubhra Ghosh Dastidar, Bioinformatics Institute

Thermodynamic insights based on atomistic computer simulations reveal an intricate coupling between enthalpy and entropy that governs the relationship between the tumor suppressor transcription factor p53 and the ubiquitin ligase MDM2. Simulations show, for the first time, how the same peptide can associate with the same receptor in multiple conformational states, each driven by different thermodynamics.

Dastidar S. G., Lane D. P., Verma C. S., J. Am. Chem. Soc. 2008, 130, 13514

Outcome prediction in breast and colon cancer: machine learning vs biology
Eytan Domany, The Weizmann Institute of Science, Israel

Considerable effort has been devoted over the past five years to identifying a gene expression signature that predicts the outcome of early-discovery breast cancer. Different groups used different cohorts of patients and different DNA microarrays to produce short-lists of predictive genes, and reported high success rates.

I will review some of this work, point out problematic aspects of it and present PAC-ranking, a method designed to estimate the number of training samples needed to produce a robust predictive gene list.

If time permits, I will describe briefly an alternative, biology-based approach to outcome prediction in colon cancer.

Biological context prediction from gene sequence sets within the ANNOTATOR framework
Frank Eisenhaber, Bioinformatics Institute

The identification and validation of protein targets for their future exploitation as entry points of pharmaceutical intervention requires understanding of their role in mechanisms of pathogenesis. Experimentally, the determination of the genes involved is addressed by genetic screens, expression profile studies and other high-throughput techniques. Typically, these approaches result in identifier lists of relevant genes, many of which are known only from sequencing projects. Therefore, the analysis of uncharacterized biomolecular sequence sets is a standard task in a life science research environment and, generally, understanding the function of the proteins they encode is the main difficulty.

Different concepts with regard to globular domains and non-globular regions guide the segment-based functional analysis of protein sequences. State-of-the-art protein sequence studies require the application of several dozens of prediction tools that easily generate hundreds of megabytes of ASCII output for every sequence studied. In addition to the de novo sequence annotation, it is important to link the targets into the context of interaction complexes and pathways.

The ANNOTATOR software environment was designed specifically to support efficient workflows in protein sequence analysis for function and context prediction. More than 30 academic tools, many built-in algorithms such as family-searching heuristics, and all relevant public databases are integrated for the study of protein sequence sets and the prediction of their functional significance.

Dynamics of DNA methylation
Chih-Lin Hsieh, USC Norris Comprehensive Cancer Center, USA

CpG methylation has been shown to play an important role in genetic imprinting, neoplasia, and mammalian development. Many phenomena and correlations related to DNA methylation and gene regulation have been described and proposed. However, DNA methylation is a dynamic process and the pattern of DNA methylation is very heterogeneous from one cell to another. Furthermore, the lack of effective genetic systems and quantitative methods has limited the ability to elucidate the causal role of DNA methylation in biological processes in most studies. Using a stable episomal system, we have dissected the impact of DNA methylation on gene regulation and some of the processes of de novo methylation and demethylation in human cells. We addressed the questions of how DNA methylation impacts transcription, how protein-DNA interaction can lead to demethylation, and how DNA-binding proteins can protect DNA sites from de novo methylation, in a qualitative as well as quantitative manner. We have also analyzed the site preference of one of the de novo methyltransferases using a biochemical approach. In an attempt to develop quantitative methods to evaluate the dynamics of the DNA methylation process, we examined DNA methylation patterns and expression of some endogenous genes in various cell types. Utilizing a chromatin immunoprecipitation method on the episomal system, we have been able to dissect the interaction between transcription, DNA methylation, and histone modification.

The analysis of organism complexity through functional space of proteome
Alex Kanapin, Cold Spring Harbor Laboratory, USA

A new measure of organism complexity is introduced. The concept of the functional space of the proteome provides a new method for estimating functional relations among eukaryotic species. Combining alternative splicing analysis with a proteome functional labeling algorithm reveals a new type of relational network (splice-function networks, SFNs) and conserved functional modules in biological systems.

We reported earlier the application of the method to human and mouse transcriptome analysis, which revealed a common conserved core in the SFNs of both species. Now we apply the approach to organisms with different levels of organizational complexity (nematode, fruit fly, etc.). The results show that the parameters describing SFN complexity correlate with the complexity of a given organism.

The analysis also reveals differences in the functional categories comprising the central core of the SFN for different organisms and allows us to speculate about possible ways of evolution of higher eukaryotes from the point of view of the repertoire of protein functions.

Signal transduction noise in eukaryotic cells: measurements and modeling
Marek Kimmel, Rice University, USA

It has been noticed that there is considerable variability among individual cells' responses to activating signals. Part of this variability can be attributed to extrinsic sources of noise, but there seems to be a substantial component due to intrinsic causes such as randomness of gene activation and receptor noise. This talk uses examples from the literature and our group's research to show how mathematical modeling can help to understand the mechanism and purpose of noisy cell response. The examples include, among others, the NF-κB module and the prolactin receptor system.

Modeling the evolution of Alu repeats in human genome
Marek Kimmel, Rice University, USA

Alu elements occupy about eleven percent of the human genome and are still growing in copy number. Since Alu elements substantially impact the shape of our genome, there is a need for modeling the amplification, mutation and selection forces of these elements. Our proposed theoretical neutral model follows a continuous-time branching process described by Pakes, or its equivalent discrete counterpart as described by Griffith and Pakes. From Pakes' model, we are able to derive a limiting frequency spectrum of the Alu element distribution, which serves as the theoretical, neutral frequency to which real Alu insertion data can be compared through statistical goodness-of-fit tests. Departures from the neutral frequency spectrum may indicate different types of selection. A comparison of the Alu sequence data (obtained by courtesy of Dr. Jerzy Jurka) with our model shows that the distribution of Alu sequences in subfamilies AluYa5 and AluYb8 does not follow the expected distribution derived from the branching process. This observation suggests that Alu sequences do not evolve neutrally and might be under selection.

Modeling and analysis of relative avidity, specificity, sensitivity of Transcription Factor-DNA binding in Genome-scale experiments
Vladimir Kuznetsov, Bioinformatics Institute

One of the most crucial problems with genome-wide experimental analysis is how to extract meaningful biological phenomena from the resulting large data sets. Here, we present modeling and prediction techniques that are applied to genome-wide identification of in vivo protein-DNA binding sites from ChIP-based data sets. We develop a simple probabilistic mixture model of the occurrence of nonspecific and specific TF-DNA binding events for transcription factor binding to any site in the genome. We calculated the statistical significance of specific and non-specific random binding events using Kolmogorov-Waring and exponential functions, respectively. The binding events in the chromosome regions associated with non-specific, non-random binding loci were also identified and filtered out. The mixture model fits equally well the data for 5 different TFs (ERE, CREB, STAT1, Nanog, Oct4) provided by the ChIP-PET, SACO and ChIP-seq methods included in this study. Each of these DNA-protein binding data sets exhibits sample size-dependent and complexity-dependent skewed statistical distributions of the occurrence of binding events (no scale-free model properties). We present a uniform methodology for estimating specificity, total number of binding sites and sensitivity of datasets detected by these ChIP-based genome-wide experimental systems. We demonstrate strong heterogeneity of specific TF-DNA binding sites in terms of their avidity, and a correlation between the observed relative binding avidity of a specific TF-DNA binding site and the level of mRNA transcription of the nearest gene target.

Finally, we conclude that the sensitivity problem has not been resolved by current ChIP-based methods, including ChIP-Seq.

Global and local transcriptome reprogramming in low- and high-aggressive breast cancer sub-types
Vladimir Kuznetsov, Bioinformatics Institute

Microarray hybridization signals contain additive and multiplicative noise, both of which mask real specific hybridization signals and thus complicate the analysis and interpretation of such massive datasets. Using a mixture probability model of signal values in microarray hybridization experiments, we demonstrate that the statistics of specific signals at the transcriptome level for each studied RNA sample follow the Generalized Pareto-Gamma distribution (GPGD). Goodness-of-fit statistical analysis of the mixture model using a large cohort of human breast cancer samples reveals significant correlations of the parameters of the GPGD with known biological markers of breast cancer aggressiveness and with the survival time of the patients. In the human transcriptome, we identified ~4000 transcripts showing significant associations with these biological and clinical observations. Many known and novel small gene signatures are found in this significant gene set. We demonstrate that the observed global transcriptome re-scaling and gene network complexity switches in the tumor subtypes could be represented by several low-dimensional subsets of highly specific genes and the molecular pathways associated with these biologically important and clinically significant genes.

A region based nucleus detector using the Mumford-Shah model
Hwee Kuan Lee, Bioinformatics Institute

The Mumford-Shah model is one of the best segmentation models and has many superior properties, such as robustness to noise and the ability to segment illusory objects. We developed a nucleus detector based on the Mumford-Shah model that inherits these good properties. In our nucleus detector, the free curves in the Mumford-Shah model are constrained to non-overlapping ellipses. Quantitative comparison with the randomized Hough transform shows that the Mumford-Shah based approach performs significantly better on our data sets.
Authors: Choon Kong Yap, Hwee Kuan Lee

Liquid association for large scale gene expression and network studies
Ker-Chau Li, Academia Sinica, Taiwan

The fast-growing public repertoire of microarray gene expression databases provides individual investigators with unprecedented opportunities to study transcriptional activities for genes of their research interest at no additional cost. Methods such as hierarchical clustering, principal component analysis, gene networks and others have been widely used. They offer biologists valuable genome-wide portraits of how genes are co-regulated in groups. Such approaches have a limitation, because it often turns out that the majority of genes do not fall into the detected gene clusters. If one has a gene of primary interest in mind and cannot find any nearby clusters, what additional analysis can be conducted? In this talk, I will show how to address this issue via the statistical notion of liquid association. An online biodata mining system has been developed in my lab to help biologists distil information from a web of aggregated genomic knowledgebases and data sources at multiple levels, including gene ontology, protein complexes, genetic markers, and drug sensitivity.
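For readers unfamiliar with the notion, a commonly cited estimator of the liquid association of genes X and Y with respect to a third gene Z is the sample mean of the product X·Y·Z, after standardizing X and Y and replacing Z by its normal scores. The sketch below assumes that estimator and uses synthetic data; the function names are hypothetical and nothing here is taken from the speaker's online system.

```python
import random
from statistics import NormalDist, mean, pstdev


def standardize(values):
    """Centre to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def normal_scores(values):
    """Replace each value by the normal quantile of its rank (ties not handled)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    nd = NormalDist()
    for rank, idx in enumerate(order, start=1):
        scores[idx] = nd.inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Assumed estimator: average of x*y*z on the transformed scales."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return sum(a * b * c for a, b, c in zip(xs, ys, zs)) / len(xs)


# Synthetic profiles: the X-Y correlation flips sign with the level of Z
# in y_switch, and does not depend on Z in y_const.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [rng.gauss(0, 1) for _ in range(2000)]
y_switch = [(xi if zi > 0 else -xi) + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
y_const = [xi + rng.gauss(0, 0.5) for xi in x]

la_switch = liquid_association(x, y_switch, z)
la_const = liquid_association(x, y_const, z)
print(la_switch, la_const)
```

In the first pair the overall correlation between X and Y is near zero, so clustering on correlation would miss it, yet the score is clearly positive. In the second pair the genes are strongly correlated but the score stays near zero because Z does not modulate the relationship.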

Data mining in protein interaction networks
Xiao-Li Li, Institute for Infocomm Research

The cellular machinery is a complex system with a multitude of bio-molecular interactions. Most, if not all, of the cellular processes are mediated by protein-protein interactions (PPI). Recently, high-throughput methods for detecting PPIs have given researchers an initial global picture of protein interactions on a genomic scale, typically represented as large protein interaction networks (PINs) by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins, e.g. as measured by high-throughput experiments, as a link between two corresponding nodes.
In this talk, we will describe some of our research work on mining protein interaction networks, e.g. identifying essential proteins and detecting protein complexes.

Inference of patterns and associations using dictionary models
Jun Liu, Harvard University, USA

Pattern discovery is a ubiquitous problem in many disciplines. It has become especially prominent in recent years due to our greatly improved data-generation capabilities in science and technology. The method I present here is motivated by the "motif-finding" and "module-finding" problems in biology, i.e., to find sequence patterns (i.e., "words") that seem to appear more frequently than usual in a given set of text sequences (i.e., sentences) and to find which of these "words" tend to co-occur in a sentence. A challenge in the motif-finding problem is that there are no spaces or punctuation between the words and the dictionary of "words" is unknown to us. Existing methods are mostly "bottom-up" approaches, i.e., they build up the dictionary starting with single-letter words and then concatenate existing words that appear to occur next to each other in sentences more frequently than chance. Our new approach is a top-down strategy, which uses a tree structure to represent the relationship among all possible existing words and uses the EM algorithm to estimate the usage frequency of each word. It automatically trims down most of the incorrect "words" by letting their usage frequencies converge to zero.

The module-finding problem is closely related to the well-known "market basket" problem, in which one attempts to mine association rules among the items in a supermarket based on customers' transaction records. It is also related to the two-way clustering problem. In this problem, we assume that the words are given, and our goal is to find subsets of words that tend to co-occur in a sentence. We call the set of co-occurring words (not necessarily ordered) a "theme" or a "module". We can generalize the dictionary model to the "theme" model and use a similar EM strategy to infer these themes. I will demonstrate its applications in a few examples, including an analysis of Chinese medicine prescriptions and an analysis of a Chinese novel.

An iterative approach to weighting and expanding protein interaction networks and its impact on complex discovery
Guimei Liu, National University of Singapore

High-throughput protein interaction data, with ever-increasing volume, are becoming the foundation of many biological discoveries. However, high-throughput protein interaction data are often associated with high false positive and false negative rates. It is desirable to develop scalable methods to identify these errors. Several methods, such as CD-distance and FSWeight, have been proposed to assess the reliability of protein interactions based on common neighbors. In this talk, I will introduce a new approach that iteratively calculates the score of protein interactions based on common neighbors. Our study shows that the iterative approach improves the performance greatly, especially for predicting new interactions. We have also studied the impact of different scoring methods on complex discovery. We use a maximal clique finding algorithm to identify complexes from weighted and expanded protein interaction networks. Our results show that the iterative approach can improve recall and precision significantly.
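The effect of iterating a common-neighbor score can be seen on a toy graph. The sketch below uses a simplified scoring rule chosen for illustration only: the weight of an edge is the weight its endpoints share through common neighbors, divided by their total weight. It is not the CD-distance, FSWeight, or the speaker's formula, and it re-scores observed edges only rather than predicting new ones. The function name `iterative_edge_scores` is hypothetical.

```python
def iterative_edge_scores(edges, n_iter=2):
    """Re-score each observed edge from common neighbors, feeding the
    scores of one pass into the next. Assumes no self-loops."""
    edges = [frozenset(e) for e in edges]
    nbrs = {}
    for e in edges:
        a, b = tuple(e)
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)

    w = {e: 1.0 for e in edges}  # pass 0: every observed edge counts equally
    for _ in range(n_iter):
        # Total weight currently attached to each protein.
        strength = {n: sum(w[frozenset((n, x))] for x in nbrs[n]) for n in nbrs}
        new_w = {}
        for e in edges:
            u, v = tuple(e)
            common = nbrs[u] & nbrs[v]
            shared = sum(w[frozenset((x, u))] + w[frozenset((x, v))]
                         for x in common)
            total = strength[u] + strength[v]
            new_w[e] = shared / total if total > 0 else 0.0
        w = new_w
    return w


# Two four-protein cliques joined by a single bridging edge D-E.
clique1 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
clique2 = [("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H")]
toy_edges = clique1 + clique2 + [("D", "E")]

pass1 = iterative_edge_scores(toy_edges, n_iter=1)
pass2 = iterative_edge_scores(toy_edges, n_iter=2)
for name, pair in [("A-D", ("A", "D")), ("D-E", ("D", "E"))]:
    print(name, pass1[frozenset(pair)], pass2[frozenset(pair)])
```

In this toy graph the bridge D-E has no common neighbors and scores 0 in both passes. The second pass raises A-D from 4/7 to 13/19, because the bridge no longer dilutes the total weight of D. That is the kind of correction a single-pass score cannot make.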

ChIPing the human cistrome
Shirley X Liu, Harvard University, USA

The cistrome is the set of cis-acting targets of a trans-acting factor on a genome scale. To characterize it, we have developed a number of algorithms for genome-wide ChIP-chip and ChIP-Seq data analysis, including binding site peak finding, motif analysis, nucleosome positioning, and integrative modeling of transcription mechanisms. I will present some of the algorithms, and their applications to understanding nuclear receptor regulation in cancers.

Functional analysis of OMICs data for cancer and toxicity phenotypes
Yuri Nikolsky, GeneGo Inc, USA

GeneGo, Inc. 169 Saxony Rd. #104, Encinitas, CA 92024

High-throughput assays have become mainstream in experimental studies of complex human diseases, particularly cancers. Recently, a number of functional analysis methods were developed and applied to the analysis of SNP and expression arrays, CGH arrays, exon re-sequencing data, proteomics and siRNA profiles. I will describe the basic techniques of pathway, network and interactome data analysis developed and implemented by GeneGo and summarize the results of collaborative studies on breast, colorectal and pancreatic cancers and glioblastoma that we were involved in over the last three years. I will also describe novel methods of functional analysis of predictive gene signature models that we developed as part of the FDA's MAQC-II project.

RNA polymerase, cell growth and proliferation: inhibitory mechanisms by the tumour suppressor glycogen synthase kinase (GSK) 3beta
Piergiorgio Percipalle, Karolinska Institute, Sweden

Transcription of rRNA genes by RNA polymerase I is essential for sustained protein synthesis and therefore for cellular growth and proliferation. Within the entire rRNA biogenesis, transcription control represents the most important level of regulation and it is intimately connected with intracellular signalling pathways and mitogenic stimulation. Therefore it is not surprising that the RNA polymerase I machinery is a target for tumour suppressors as well as oncogenes with their potential to deregulate the entire rRNA biogenesis pathway. In fact at the onset of cancer development the activity of tumour suppressors is down-regulated whereas the rate of oncogenic activities is significantly elevated; the consequence is increased rates of rRNA biogenesis and considerable impact on tumour growth. Here I will discuss recent work from my lab on the discovery of the novel tumour suppressor activity of the glycogen synthase kinase (GSK)3β. Evidence will be shown supporting the view that GSK3β suppresses cell growth and proliferation through a direct effect on the assembly of the transcription-competent RNA polymerase I machinery in an oncogenic H-RAS dependent manner. A model is proposed in which GSK3β cooperates with the tumour suppressor PTEN at the rDNA promoter to repress pol I transcription. Future developments of this work will also be anticipated in the context of rRNA biogenesis in proliferating cells.

Transcriptional and post-transcriptional control of gene expression by actin and myosin
Piergiorgio Percipalle, Karolinska Institute, Sweden

Actin and an ever-growing family of actin-associated proteins have been accepted as members of the nuclear crew, regulating eukaryotic gene expression. My lab has contributed to determining how actin cooperates with heterogeneous nuclear ribonucleoproteins and certain myosin species as molecular motors required for transcription of protein-coding genes and rRNA genes. Recent work has also uncovered evidence that actin and myosin are likely to be implicated in the post-transcriptional control of RNA biogenesis. These findings represent the tip of the iceberg of a rapidly growing research area within the functional architecture of the cell nucleus. Further studies will help clarify how actin mediates nuclear functions while keeping an eye on cytoplasmic signalling pathways. In any case, these discoveries have the potential to identify novel regulatory networks required to modulate the multiple steps of gene expression.

Weighted gene coexpression network analysis and causality testing for finding complex disease genes
Angela Presson, University of California, Los Angeles, USA

This talk is divided into two parts. Part I describes weighted gene co-expression network analysis (WGCNA), its applications and software. Part II describes how causal relationships among gene expressions and traits can be predicted when genetic marker data is available using Network Edge Orienting (NEO) methods and software.

Part I:
Gene co-expression networks are increasingly used to explore the system-level functionality of genes. Network construction is conceptually straightforward: nodes represent genes and nodes are connected if the corresponding genes are significantly co-expressed across appropriately chosen tissue samples. In reality, it is tricky to define the connections between the nodes in such networks. An important question is whether it is biologically meaningful to encode gene co-expression using binary information (connected=1, unconnected=0). I will describe a general framework that assigns a connection weight to each gene pair and results in a weighted gene co-expression network. These methods have been successfully applied to microarray experiments designed for a) inter-species and evolutionary comparisons, and b) the identification of disease-related pathways and candidate genes.
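The contrast between binary and weighted connections can be made concrete with a small sketch. It assumes one widely used weighting, the absolute Pearson correlation raised to a power `beta` (soft thresholding), alongside a hard threshold `tau` for comparison. The values of `beta` and `tau`, the function names, and the three-gene data set are illustrative assumptions, and the WGCNA R software referenced later in this abstract remains the authoritative implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def weighted_adjacency(expr, beta=6):
    """Connection weight |cor|**beta for every gene pair (soft threshold)."""
    genes = list(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for i, g in enumerate(genes) for h in genes[i + 1:]}


def binary_adjacency(expr, tau=0.85):
    """Connected=1 if |cor| >= tau, else 0 (hard threshold)."""
    genes = list(expr)
    return {(g, h): int(abs(pearson(expr[g], expr[h])) >= tau)
            for i, g in enumerate(genes) for h in genes[i + 1:]}


# Hypothetical expression profiles across five samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],
}

soft = weighted_adjacency(expr)
hard = binary_adjacency(expr)
for pair in soft:
    print(pair, "weighted:", soft[pair], "binary:", hard[pair])
```

With the hard threshold, the pair (g1, g3), whose correlation is -0.8, is simply dropped, whereas the weighted network keeps it with a reduced weight. Results therefore depend less sharply on the exact cutoff.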

Part II:
In the second part of this talk I will describe Network Edge Orienting (NEO) methods and software that address the challenges of inferring causal relationships among traits, genetic markers and weighted gene co-expression. NEO methods are based on structural equation model comparisons, and they perform the following tasks: relate traits to multiple genetic markers, score the genetic evidence in favor of an edge orientation, and rank the causal importance of these markers. NEO's ability to orient the edges of gene co-expression or quantitative trait networks relies on relevant genetic marker data.

R software tutorials, data, and supplementary material on WGCNA and NEO can be downloaded from: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/.

Integrated weighted gene co-expression network analysis with an application to chronic fatigue syndrome
Angela Presson, University of California, Los Angeles, USA

Systems biologic approaches such as Weighted Gene Co-expression Network Analysis (WGCNA) can effectively integrate gene expression and trait data to identify pathways and candidate biomarkers. Here I will describe how genetic marker data can be included to characterize network relationships as causal or reactive, an analysis referred to as "Integrated WGCNA" or IWGCNA. Specifically, I will present the following IWGCNA approach: 1) construct a co-expression network, 2) identify trait-related modules within the network, 3) use a trait-related genetic marker to prioritize genes within the module, 4) apply an integrated gene screening strategy to identify candidate genes and 5) carry out causality testing to verify and/or prioritize results from step 4. This strategy will be demonstrated on a chronic fatigue syndrome (CFS) data set consisting of microarray, SNP and clinical trait data. IWGCNA identifies a cluster of 299 highly correlated genes (called a 'module') that is associated with CFS severity. For comparison, we re-analyze the CFS microarray and trait data using a traditional approach that ignores the SNP data. Gene ontology information indicates that these methods both yield pathway functions related to the immune system, which is relevant to CFS. However, the IWGCNA method results in genes that additionally are related to the SNP data and are causal drivers for their parent module. IWGCNA identifies disease-related pathways and the causal drivers within them. The systems genetics approach described here can easily be used to generate testable genetic hypotheses in other complex disease studies.

Computational and experimental approaches to modeling gene regulation
Gary Stormo, Washington University School of Medicine, USA

One of the challenges of genomics research is to understand the regulation of gene expression. Much of the regulation is controlled through DNA-protein interactions and we have been developing tools, both computational and experimental, to study those interactions for many years. This talk will outline some of the approaches we have been using and how they inform us about the regulatory network that governs the cell's behavior. This includes work on developing a "recognition code" that allows one to predict the binding specificity of novel transcription factors and to design factors to bind to specific sequences.

Genome-wide identification of differential histone modification sites from ChIP-seq data using HMM
Ken Sung, National University of Singapore

Epigenetic modifications are among the critical factors regulating gene expression and genome function. Among different epigenetic modifications, the differential histone modification sites (DHMSs) are of great interest for studying the dynamic nature of epigenetic and gene expression regulation across cell types, stages or environmental responses. To capture histone modifications at whole-genome scale, ChIP-seq technology is becoming a robust and comprehensive approach. Thus the DHMSs are potentially identifiable by comparing two ChIP-seq libraries. However, this issue has received little attention in the literature.

Aiming at identifying DHMSs, we propose an approach called ChIPDiff for the genome-wide comparison of histone modification sites identified by ChIP-seq. Based on the observations of ChIP fragment counts, the proposed approach employs a Hidden Markov Model (HMM) to infer the states of histone modification changes at each genomic location.

Biomolecular network reconstruction reveals mechanisms of immune reaction in colorectal cancer
Zlatko Trajanoski, University Graz, Austria

We used data integration and biomolecular network reconstruction to generate hypotheses about the mechanisms underlying immune responses in colorectal cancer that are relevant to tumor recurrence. Mechanistic hypotheses were formulated on the basis of data from 108 colorectal carcinomas and tested with a combination of different assays (gene expression, phenome mapping, tissue-microarrays, TCR-repertoire). This integrative approach revealed that chemoattraction and adhesion play important roles in determining the density of intratumoral immune cells. The presence of specific chemokines and adhesion molecules correlated with different subsets of immune cells and with high densities of T cell subpopulations within specific tumor regions. High expression of these molecules correlated with prolonged disease-free survival.

Efficient watershed evolution for cellular image segmentation based on topological analysis
Weimiao Yu, Bioinformatics Institute

Segmentation of cells is a crucial and challenging step in the quantitative analysis of biological assays. Level-set methods can segment cells of irregular shape in images with a low signal-to-noise ratio; however, they cannot effectively segment cells that touch each other. To solve this problem, topological dependence is introduced as a critical constraint for the segmentation of multi-channel cellular images. In this paper, we propose an algorithm that evolves watershed lines based on topological dependence to segment cells correctly even when they touch each other. Our new algorithm overcomes the shortcomings of level-set methods while retaining their strengths. Proper treatment of topological changes, such as the splitting and merging of segments, is generally challenging and complicated, yet our approach is simple to implement. It is efficient, and its computational complexity does not depend on the number of cells. In our experiments, it performs efficiently compared with other existing segmentation algorithms.
Authors: Weimiao Yu, Hwee Kuan Lee, Srivats Hariharan, Wenyu Bu, Sohail Ahmed

Network-based global inference of human disease genes
Michael Zhang, Cold Spring Harbor Laboratory, USA

Deciphering the genetic basis of human diseases is an important goal of biomedical research. On the basis of the assumption that phenotypically similar diseases are caused by functionally related genes, we propose a computational framework that integrates human protein-protein interactions, disease phenotype similarities, and known gene-phenotype associations to capture the complex relationships between phenotypes and genotypes. We develop a tool named CIPHER to predict and prioritize disease genes, and we show that the global concordance between the human protein network and the phenotype network reliably predicts disease genes. Our method is applicable to genetically uncharacterized phenotypes, effective in the genome-wide scan of disease genes, and extendable to the exploration of gene cooperativity in complex diseases. The predicted genetic landscape of over 1000 human phenotypes reveals the global modular organization of phenotype-genotype relationships. The genome-wide prioritization of candidate genes for over 5000 human phenotypes, including those with under-characterized disease loci and even those lacking any known association, is publicly released to facilitate future discovery of disease genes.
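One way to picture a concordance between the protein network and the phenotype network is sketched below. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the closeness measure (a sum of exp(-d^2) over shortest-path distances d), the use of Pearson correlation, the function names, and the toy data format. A candidate gene scores highly for a query phenotype when it lies close, in the interaction network, to the known genes of phenotypes similar to the query.

```python
import math
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness(graph, gene, disease_genes):
    """Assumed closeness: sum of exp(-d^2) over reachable known disease genes."""
    dist = shortest_path_lengths(graph, gene)
    return sum(math.exp(-dist[g] ** 2) for g in disease_genes if g in dist)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def concordance_score(graph, gene, query, similarity, known_genes):
    """Correlate the query's phenotype similarities with the gene's closeness profile."""
    phenotypes = sorted(known_genes)
    sim_profile = [similarity[query][p] for p in phenotypes]
    close_profile = [closeness(graph, gene, known_genes[p]) for p in phenotypes]
    return pearson(sim_profile, close_profile)


def prioritize(graph, candidates, query, similarity, known_genes):
    """Rank candidate genes by decreasing concordance score (ties by name)."""
    scored = [(concordance_score(graph, g, query, similarity, known_genes), g)
              for g in candidates]
    return [g for _, g in sorted(scored, key=lambda item: (-item[0], item[1]))]
```

Because the query phenotype only needs similarity values to other phenotypes, a sketch of this kind can rank candidates even when the query itself has no known disease genes, which matches the abstract's claim about genetically uncharacterized phenotypes.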

Defining splicing-regulatory networks of the tissue-specific factors Fox-1/2
Michael Zhang, Cold Spring Harbor Laboratory, USA

The precise regulation of many alternative splicing (AS) events by specific splicing factors is essential to determine tissue types and developmental stages. However, the molecular basis of tissue-specific AS regulation and the properties of splicing-regulatory networks (SRNs) are only partly understood. Here we undertook to predict the targets of the brain- and muscle-specific splicing factor Fox-1 (A2BP1) and its paralog Fox-2 (RBM9) and to define the corresponding SRNs genome-wide. Fox-1/2 are conserved from worm to human and specifically recognize the RNA element UGCAUG. We integrate Fox-1/2 binding specificity with phylogenetic conservation, splicing-microarray data, and additional computational and experimental characterization. We predict thousands of Fox-1/2 targets with conserved binding sites, at a false discovery rate (FDR) of ~24%, including dozens validated experimentally, suggesting a surprisingly extensive SRN. The preferred position of the binding sites differs according to AS pattern and determines either activation or repression of exon recognition by Fox-1/2. Many predicted targets are important for neuromuscular functions and have been implicated in several genetic diseases. We also identified instances of binding-site creation or loss in different lineages and human populations, which likely reflect fine-tuning of gene-expression regulation during evolution.
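The first two ingredients, the UGCAUG element and phylogenetic conservation, can be illustrated with a small sketch. It is not the authors' pipeline: the function names and the input format (a dictionary of pre-aligned, equal-length sequences keyed by species) are assumptions, and a site counts as conserved here only if the exact hexamer starts at the same alignment column in every species.

```python
FOX_MOTIF = "UGCAUG"  # RNA element recognized by Fox-1/2, as stated in the abstract


def find_sites(sequence, motif=FOX_MOTIF):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input as well
    sites = []
    start = rna.find(motif)
    while start != -1:
        sites.append(start)
        start = rna.find(motif, start + 1)
    return sites


def conserved_sites(aligned, motif=FOX_MOTIF):
    """Return positions where the motif occurs in every aligned sequence.

    `aligned` maps a species name to its aligned sequence (hypothetical format).
    A motif interrupted by an alignment gap is not detected by this sketch.
    """
    per_species = [set(find_sites(seq, motif)) for seq in aligned.values()]
    if not per_species:
        return []
    return sorted(set.intersection(*per_species))
```

A fuller analysis would also record where each site lies relative to the regulated exon, since the abstract reports that site position determines whether Fox-1/2 activates or represses exon recognition.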
