Networks in Biological Sciences - IMS

Networks in Biological Sciences
(01 Jun - 31 Jul 2015)

Jointly organized with Department of Mathematics, NUS

~ Abstracts ~

Minimum dominating set approach to analysis and control of biological networks
Tatsuya Akutsu, Kyoto University, Japan

Extensive studies have recently been done on structural controllability of complex networks. In contrast to the well-studied approach based on bipartite matching, we have been studying an approach based on the minimum dominating set (MDS), a well-known concept in graph theory that has been applied to the analysis and control of engineering systems. We show via theoretical analysis and computer simulation that the more heterogeneous a network degree distribution is, the easier it is to control the entire system [1]. We also verify this tendency using biological and other network data. Other researchers have also suggested that MDS is useful for identifying important nodes in complex networks, including protein-protein interaction networks. We also present some variants and extensions of the MDS-based approach:

(i) MDS in bipartite networks [2], (ii) critical nodes in MDS [3], and (iii) robust MDS for structurally robust control of complex networks [4]. This talk is based on joint work with Jose Nacher at Toho University, Japan.

[1] J. C. Nacher and T. Akutsu, New Journal of Physics, 14:073005, 2012.
[2] J. C. Nacher and T. Akutsu, Scientific Reports, 3:1647, 2013.
[3] J. C. Nacher and T. Akutsu, Journal of Complex Networks, 2:394-412, 2014.
[4] J. C. Nacher and T. Akutsu, Physical Review E, in press.
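
As a rough illustration of the dominating-set concept used in this abstract, the sketch below applies a simple greedy heuristic to a toy graph. It is not the method of the cited papers: the function names and the toy graph are assumptions made here, and the greedy result is a dominating set that is not guaranteed to be minimum.

```python
def greedy_dominating_set(adj):
    """Return a dominating set of an undirected graph given as {node: set(neighbours)}.

    Greedy heuristic: the result dominates every node but need not be minimum.
    """
    undominated = set(adj)
    chosen = set()
    while undominated:
        # Pick the node whose closed neighbourhood covers the most undominated nodes.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj, nodes):
    """Check that every node is in `nodes` or adjacent to a member of `nodes`."""
    return all(v in nodes or adj[v] & nodes for v in adj)


# Hypothetical toy network: a hub with three neighbours and a short tail.
star_with_tail = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
mds = greedy_dominating_set(star_with_tail)
print(sorted(mds))
```

On this toy graph the high-degree hub is selected first, which gives a small-scale feel for why heterogeneous degree distributions can be dominated by few nodes.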

Data and text mining applied to the computational study of protein interaction networks
Miguel Andrade, Johannes-Gutenberg University of Mainz, Germany

Protein interaction networks allow us to understand the mechanisms of life. Data on particular protein interactions are produced at increasing speed in a variety of contexts. These data are deposited in the biomedical literature and databases. Integrating these data in meaningful networks requires computational methods that work with and relate information from different biological levels. I will present several methods that we developed that use data and text mining to support the study of protein interaction networks, and the application of these methods to the study of the alterations of protein networks in disease.

Deciphering reticulate evolution using phylogenetic reconciliation
Mukul Bansal, University of Connecticut, USA

Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer (HGT). DTL reconciliation can accurately infer HGT events and place them on the species tree, making it a natural candidate for constructing phylogenetic networks for microbes. However, there are several challenges that must be overcome to fully adapt the DTL reconciliation framework for phylogenetic network inference. In this talk, we first survey some recent advances related to improving the accuracy and utility of DTL reconciliation and then discuss how this framework can be extended to elucidate phylogenetic networks.

Apprehending life's complexity and its evolution with networks
Eric Bapteste, Université Pierre et Marie Curie, Paris, France

Biological objects (from genes to genomes and holobionts) are composite entities, made of interacting heterogeneous parts, often brought together by reticulate processes. Describing the evolution of such complex objects, in particular the association, stabilisation, and transformation of biological elements resulting in novel higher-level structures, requires the development of network-based analytical tools and of increasingly flexible representations of life's history. In order to reach this conclusion, I will introduce some conceptual challenges raised by biological data and recent discoveries from microbiology and virology, and explain how these challenges encourage us to expand the framework of evolutionary analyses through the use of sequence similarity networks and bipartite graphs.

Organelle-focused proteomes and interactomes in rice
Ming Chen, Zhejiang University, China

Proteomic analysis (proteomics) refers to the systematic identification and quantification of the complete complement of proteins (the proteome) of a biological system (cell, tissue, organ, biological fluid, or organism). To better understand the interactions of proteins in rice, we developed PRIN, a predicted rice interactome network. The protein-protein interaction data in PRIN are based on interologs from six model organisms for which large-scale protein-protein interaction experiments have been performed. An example showed that protein functional complexes and biological pathways could be effectively expanded in our predicted network. Protein subcellular localization has been a long-standing key problem in investigating protein function: it provides important clues for revealing the functions of proteins and aids in understanding their interactions with other biomolecules at the cellular level. We presented a novel integrative approach (PSI) that draws on the wisdom of multiple specialized predictors through a joint group decision-making strategy and machine learning methods to give an integrated best result. We systematically defined the organelle-focused proteomes and interactomes in rice. A total of 83.42% of the whole rice proteome was assigned a subcellular localization based on manual annotation, manual adjustment and the prediction results of PSI. We illustrated the cross-talk bias between different organelles and the functional organization of nine organelles. Motif analysis illustrated the protein interaction bias in different organelles to implement certain biological functions.

Computational developments in microRNA-regulated protein-protein interactions and pathways
Phoebe Chen, La Trobe University, Australia

Protein-protein interaction (PPI) is one of the most important functional components of a living cell. This talk describes basic bioinformatics studies of the miRNA-regulated PPI network: constructing a miRNA-target protein network, describing the features of miRNA-regulated PPI networks, and reviewing previous findings based on the analysis of these features.

Hidden critical circuits in the human signaling network
Kwang-Hyun Cho, KAIST, South Korea

Systems biology combines systems science and biology to explore the emergent property that is unique for a biological system. Such a property emerges when multiple components interact with each other in a nonlinear way. Cells have evolved a complicated signaling network to recognize external signals and produce appropriate responses for survival. We found that there are intriguing circuits in such a signaling network that were evolutionarily designed to elicit critical functions, which is a good example of the emergent property. In particular, we found that feedforward and feedback loops are essential in such circuits and that cellular dysfunctions related to complex human disease can be caused by malfunctioning of these circuits. In this talk, I will briefly review the main concept and history of systems biology and then introduce several illustrative case studies ranging from a small scale signal transduction pathway to a large and complex molecular interaction network to discuss how the emergent properties of cellular functions can be induced from complicated interaction of multiple molecules.

Large-scale study of genetic exchange through bipartite graphs
Eduardo Corel, Université Pierre et Marie Curie, Paris, France

Introgressive events are recognized as an important driving force in the evolution of prokaryotes. A convenient way of representing the complexity of this exchange of genetic material is to construct large-scale similarity networks. I will specifically focus on the use of algorithms on bipartite graphs and apply them to the study of the adaptation to lifestyle in prokaryotes, and to the characterisation of their pathogenicity.

Topological implications of negative curvature for biological networks
Bhaskar DasGupta, University of Illinois at Chicago, USA

In real biological network applications, one frequently encounters phenomena of the following type:

a. Network motifs are often nested.

b. Paths mediating up- or down-regulation of a target node starting from the same regulator node often have many small crosstalk paths.

c. There are central nodes of influential neighborhoods.

Although each of these phenomena can be studied on its own, it is desirable to have a network measure reflecting salient properties of complex large-scale networks that can explain all these phenomena at one shot. In this talk we adapt a combinatorial measure of negative curvature (Gromov hyperbolicity) to parameterized finite networks, and show that a variety of biological networks are hyperbolic. The hyperbolicity property has strong implications on the higher-order connectivity and other topological properties of these networks. Specifically, we derive and prove bounds on the distance among shortest or approximately shortest paths in hyperbolic networks, and explain how implications of these bounds may provide answers to observations such as in a-c above.

Based on joint results with R. Albert and N. Mobasheri (Physical Review E, 89(3), 032811, 2014).
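
To make the notion of Gromov hyperbolicity mentioned in this abstract concrete, here is a minimal brute-force sketch of the standard four-point condition on a small unweighted graph. It is only an illustration under assumptions made here (connected graph, hop-count distances, hypothetical function names) and is not the parameterized measure or the bounds developed in the talk.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop-count distances from `source` in an unweighted graph {node: set(neighbours)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Largest four-point defect over all node quadruples (assumes a connected graph).

    For each quadruple, the three pairwise distance sums are sorted and half the
    gap between the two largest is recorded. Brute force: O(n^4) quadruples.
    """
    d = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for x, y, z, w in combinations(list(adj), 4):
        sums = sorted((d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]))
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


# Toy inputs: a path (a tree, so tree-like) and a 4-cycle (not a tree).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(four_point_delta(path), four_point_delta(cycle))
```

Trees give a value of zero under this condition, so small values indicate that the network metric is close to tree-like.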

Statistical methods for network analysis of biological data
Minghua Deng, Peking University, China

In this talk, I will give a brief introduction to our recent work on network analysis of biological data, including matrix decomposition for genetic interaction data, network inference for genetic survey data, and network-based association study of eQTL data.

The evolution of the metazoan protein toolkit
Dannie Durand, Carnegie Mellon University, USA

Domains, sequence fragments that encode protein folds with a distinct function, are the basic building blocks of proteins. The set of all encoded domains can be viewed as the protein function toolkit of the genome. Using a phylogenetic birth-death-gain model, we investigate how the evolution of the metazoan protein toolkit drives functional innovation in metazoa. Given a species tree and the set of protein domain families in each present-day species, the birth-death-gain approach estimates the most likely rates, the expected ancestral domain content, and the history of domain family gains, losses, expansions, and contractions. Comparative analysis of these events reveals that a small number of evolutionary strategies, corresponding to toolkit expansion, turnover, specialization, and streamlining, is sufficient to describe the evolution of the metazoan protein domain complement. Domain family rates similarly adhere to a mere handful of evolutionary patterns. Clustering protein domain families according to their rates reveals modules of families evolving in concert. We find that domains with similar rate profiles tend to belong to similar functional groups. Further, within a module, domains tend to arise or expand in the same lineage. These bursts of gains and expansions correlate with major shifts in metazoan evolution, such as the emergence of cell-cell signaling, the synaptic nervous system, and the adaptive immune system.

In summary, the use of a powerful, probabilistic birth-death-gain model reveals a striking harmony between the evolution of domain usage in metazoan proteins and organismal innovation. We observe a limited set of evolutionary patterns in both domain family rates and lineage-specific events, suggesting that domain evolution does not proceed independently in each lineage. Nor is there a single, dominant mode of evolution. Rather, a highly constrained set of evolutionary strategies gave rise to the complexity and variety seen in present-day metazoan species.

Exploring the phylogenetic network community
Philippe Gambette, University of East Paris, France

Several kinds of generalizations of phylogenetic trees have been introduced in the literature by researchers with different backgrounds, choosing different types of approaches (combinatorial, geometric, statistical, etc.) to propose phylogenetic network models and the corresponding reconstruction algorithms. Since 2005, more than 30 publications each year provide new methods dealing with phylogenetic networks, focusing on their reconstruction, comparison, visualization, simulation, etc. This talk will present tools to explore the scientific literature on these methods, using relationships between authors, as well as keywords used to tag those publications, referring to input data, algorithmic techniques, subclasses of phylogenetic networks, software names, etc. Parts of this work were done jointly with Tushar Agarwal, David Morrison and Maxime Morgado.

Computing split systems from weighted quartets
Stefan Gruenewald, CAS-MPG Partner Institute for Computational Biology, China

A common way to generalise unrooted phylogenetic trees is to consider them as compatible weighted split systems and then to relax or omit the compatibility constraint. General split systems can be visualised by a splits graph, which is often called an unrooted phylogenetic network. The most commonly used methods to construct not necessarily compatible split systems are NeighborNet and split decomposition. Both are distance-based but can also be interpreted as quartet-based, because the non-trivial splits and their weights are essentially determined by weights of the quartets (partial splits with exactly 2 taxa in each part) that can be obtained from the pairwise distances. Therefore, it is a natural approach to compute quartet weights directly from the raw data, e.g. sequences. This saves one step of potential error accumulation and allows us to reconstruct more general split systems.

In my talk I will summarise various methods to construct split systems from weighted quartets and the classes of split systems that they can reconstruct consistently. I will also discuss how quartet weights can be computed from sequences.
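
As a small illustration of how quartet weights can be obtained from pairwise distances, the sketch below uses one common choice, the isolation-index style weight familiar from split decomposition. Whether the talk uses exactly this weighting is an assumption; the function names and the toy distances are hypothetical.

```python
def symmetric(pairs):
    """Build a symmetric nested distance dict from {(x, y): value} entries."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


def quartet_weight(dist, a, b, c, e):
    """Weight of the quartet ab|ce: half the amount by which the largest of the
    three pairwise distance sums exceeds the sum d(a,b) + d(c,e); never negative."""
    same = dist[a][b] + dist[c][e]
    largest = max(same, dist[a][c] + dist[b][e], dist[a][e] + dist[b][c])
    return (largest - same) / 2


# Toy tree metric: cherries (a,b) and (c,d), pendant edges of length 1,
# internal edge of length 2.
dist = symmetric({
    ("a", "b"): 2, ("c", "d"): 2,
    ("a", "c"): 4, ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4,
})
w_ab_cd = quartet_weight(dist, "a", "b", "c", "d")
w_ac_bd = quartet_weight(dist, "a", "c", "b", "d")
print(w_ab_cd, w_ac_bd)
```

For this tree metric the quartet that matches the tree recovers the internal edge length, while the conflicting quartet gets weight zero.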

Studying phylogenetic networks via integer linear programming (ILP)
Daniel Gusfield, University of California at Davis, USA

It has been estimated that benchmark integer programs can now be solved 200 billion times faster than twenty-five years ago. That dramatic speed-up makes it possible to solve many realistic instances of NP-hard problems in computational biology. In this talk I explain in general how to formulate integer programs to address problems in phylogenetic networks, and then show some recent formulations and computational results. The recent problems include: using galled-trees in haplotyping problems; studying the frequency of persistent phylogenies, and the role of galled-trees in persistent phylogeny; alternative ILP formulation for computing rSPR distance. If time allows, I will also talk about the use of ILPs in association mapping, although that use does not explicitly involve networks.

Integrative analysis of biological big data
Jing-Dong Jackie Han, CAS-MPG Partner Institute for Computational Biology, China

New high-throughput technologies, such as microarrays and deep sequencing technologies, have provided unprecedented opportunities for mapping mutations, transcripts, transcription factor binding and histone modifications at high resolution and at genome-wide level. This has revolutionized the way the regulation of diseases and other biological processes is studied and has generated a large amount of heterogeneous data that needs to be integrated efficiently and without bias. How to integrate these data still remains a big challenge. We have explored ab initio prediction and reconstruction of regulatory networks based on heterogeneous data on gene expression, histone modification and genomic changes. We find that innovative integrations of these data can reveal not only global pictures of complex biological processes, such as aging and early development, but also key regulatory events of these processes. We have also developed new computational algorithms to facilitate mapping of epigenetic features from the deep sequencing data. I will highlight our new methods and results for the integrative analyses of large datasets to infer regulatory events, in particular in light of incorporating the epigenome and imaging data recently generated by international consortiums.

Bridging the gap
Katharina Huber, University of East Anglia, UK

Mirroring the situation for phylogenetic trees, phylogenetic networks have been proposed and studied successfully in terms of rooted and unrooted graphs. However, contrary to phylogenetic trees, these studies have generally viewed them as independent objects, focusing solely either on the rooted case or on the unrooted case (the exceptions being the studies in Gambette et al. (2012) and Keijsper et al. (2013), respectively).
Thus, not much is known about the interrelationship between both types of networks. In this talk we will present recent results aimed at bridging this gap.

This is joint work with P. Gambette and G. Scholz.

Phylogenetic networks and software
Daniel Huson, University of Tubingen, Germany

Unrooted phylogenetic networks are widely used in the biological literature, and a main reason for this may be the availability of suitable algorithms, such as Neighbor-net (Bryant and Moulton, 2003), and suitable software, such as SplitsTree4 (Huson and Bryant, 2006). In contrast, rooted phylogenetic networks appear much less frequently. This is perhaps surprising, because rooted phylogenetic networks such as hybridization networks or duplication-loss-transfer scenarios have an explicit interpretation, whereas unrooted networks, such as split networks, are a much more abstract construct. What is required are algorithms and software that take "realistic" rooted phylogenetic trees as input and produce a useful set of phylogenetic networks as output. We will describe one such algorithm that is implemented in Dendroscope3 (Huson and Scornavacca, 2012).

Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets
Leo Van Iersel, Technische Universiteit Delft, The Netherlands

Binets and trinets are phylogenetic networks with two and three leaves, respectively. Here we consider the problem of deciding if there exists a binary level-1 phylogenetic network displaying a given set T of binary binets or trinets over a set X of taxa, and constructing such a network whenever it exists. We show that this is NP-hard for trinets but polynomial-time solvable for binets. Moreover, we show that the problem is still polynomial-time solvable for inputs consisting of binets and trinets as long as the cycles in the trinets have size three. Finally, we present an exponential-time algorithm for general sets of binets and trinets. The latter two algorithms generalise to instances containing level-1 networks with arbitrarily many leaves, and thus provide some of the first supernetwork algorithms for computing networks from a set of rooted phylogenetic networks. This is joint work with Katharina Huber, Vincent Moulton, Celine Scornavacca and Taoyang Wu.

Comparing phylogenetic networks by counting triangles
Jesper Jansson, Kyoto University, Japan

We consider a generalization of the rooted triplet distance between two phylogenetic trees to two phylogenetic networks. A naive algorithm can compute this distance in O(n^3) time, where n is the number of leaf labels in the input networks. We show that if each of the given phylogenetic networks is a so-called "galled tree" then the rooted triplet distance can be computed in o(n^{2.687}) time. Our bound is obtained by reducing the problem to that of counting monochromatic and almost-monochromatic triangles in an undirected, edge-colored graph. To count different types of colored triangles in a graph efficiently, we extend an existing technique based on matrix multiplication and obtain some new algorithmic results that may be of independent interest.
[Joint work with Andrzej Lingas.]
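
For orientation, the sketch below shows the naive cubic-time idea in the simpler setting of two rooted trees (not networks or galled trees): enumerate every set of three leaves and count those on which the trees induce different rooted triplets. The nested-tuple tree encoding and all function names are assumptions made for this illustration, and it does not implement the triangle-counting algorithm of the talk.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices.

    Trees are nested tuples; anything that is not a tuple is a leaf label.
    """
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for i, j in zip(p, q):
        if i != j:
            break
        depth += 1
    return depth


def triplet(paths, x, y, z):
    """Return the pair forming the cherry of the induced triplet, or None for a fan."""
    depths = {
        (x, y): lca_depth(paths[x], paths[y]),
        (x, z): lca_depth(paths[x], paths[z]),
        (y, z): lca_depth(paths[y], paths[z]),
    }
    deepest = max(depths.values())
    winners = [pair for pair, depth in depths.items() if depth == deepest]
    return frozenset(winners[0]) if len(winners) == 1 else None


def rooted_triplet_distance(t1, t2):
    """Number of leaf triples on which the two trees induce different triplets."""
    p1, p2 = leaf_paths(t1), leaf_paths(t2)
    if set(p1) != set(p2):
        raise ValueError("trees must have the same leaf labels")
    return sum(
        triplet(p1, *trio) != triplet(p2, *trio)
        for trio in combinations(sorted(p1), 3)
    )


balanced = (("a", "b"), ("c", "d"))
swapped = (("a", "c"), ("b", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(rooted_triplet_distance(balanced, swapped),
      rooted_triplet_distance(balanced, caterpillar))
```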

Applications of the probabilistic species-tree aware approach
Jens Lagergren, KTH Royal Institute of Technology, Sweden

The duplication-loss model has been successfully used in several tree reconstruction methods. We have earlier integrated it with rate variation and sequence evolution, the DLRS model, as well as extended it by including lateral gene transfers, the DLTRS model. Here we describe several applications building on these models, e.g., orthology analysis, reconciliation sampling, and pseudogenization analysis.

Controlling nonlinear dynamics on complex networks
Ying-Cheng Lai, Arizona State University, USA

Recent frameworks for the controllability of complex networks have been developed exclusively for linear dynamics. We first show that, even when a linear dynamical network is structurally controllable, it may not be physically controllable, as the energy required for control can diverge. We introduce the notion of physical controllability to quantify this phenomenon. Further, we establish that, for physically controllable networks, the control energy exhibits an algebraic scaling behavior. These results point to the practical difficulty in formulating a general framework to control complex networks, even under the assumption of linear dynamics.

We then turn to complex, nonlinear dynamical networks by articulating a control framework whereby we nudge the system from attractor to attractor through small perturbations to a set of experimentally feasible parameters. This principle enables us to formulate a controllability framework for nonlinear dynamical networks arising from systems and synthetic biology. In particular, given a set of system performance indicators, we classify all the accessible attractors into three categories: undesirable, desirable, and intermediate attractors. The network is deemed controllable if there is a control path from any undesirable attractor to the desirable attractor under finite parameter perturbations. Regarding each attractor as a node and the control paths as directed links or edges, we can construct an "attractor network" that determines the controllability of the original nonlinear network. An interesting consequence is that, due to the interplay between nonlinearity and stochasticity, control of nonlinear dynamical networks can be facilitated by noise, leading to the surprising phenomenon of noise-enhanced controllability. These ideas are illustrated using a class of synthetic biological networks.

This is joint work with ASU PhD students Mr. Riqi Su, Ms. Lezhi Wang, and Mr. Yuzhong Chen, and Prof. Wenxu Wang from Beijing Normal University as well as Prof. Xiao Wang from ASU Bioengineering.
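
The attractor-network criterion described in this abstract (a control path from every undesirable attractor to the desirable attractor) amounts to a reachability check on a directed graph. The sketch below illustrates only that check. The attractor names, the edge list and the function names are hypothetical, and finding the control paths themselves (the parameter perturbations) is outside its scope.

```python
from collections import deque


def reaches(control_paths, start, target):
    """Breadth-first search: is `target` reachable from `start` along directed edges?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in control_paths.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def is_controllable(control_paths, undesirable, desirable):
    """True if every undesirable attractor has a directed path to the desirable one."""
    return all(reaches(control_paths, a, desirable) for a in undesirable)


# Hypothetical attractor network: U1, U2 undesirable, I1 intermediate, D desirable.
attractor_network = {"U1": ["I1"], "U2": ["U1"], "I1": ["D"], "D": []}
broken_network = {"U1": ["I1"], "U2": ["U1"], "I1": [], "D": []}
print(is_controllable(attractor_network, ["U1", "U2"], "D"))
print(is_controllable(broken_network, ["U1", "U2"], "D"))
```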

A statistical framework to analyze complex metagenomic networks
François-Joseph Lapointe, Université de Montréal, Canada

During the last decade, advances in sequencing technologies have generated an enormous volume of sequences from a myriad of organisms including both prokaryotes and eukaryotes, and have done so with continuous cost reduction and increased numbers and lengths of sequence reads. In addition, metagenomics is currently providing an unprecedented richness of DNA sequences directly from environmental or tissue samples that can be used to describe in detail the enormous complexity, diversity, and evolutionary dynamics of biological systems. Some of the challenges of metagenomics are to efficiently assess the biodiversity of microbial communities and to compare these communities with one another. Yet we lack the proper statistical framework for doing so. In this talk, I will present a series of statistical and graph-theoretical tools to efficiently analyze large metagenomics datasets using network-based approaches. I will propose novel diversity indices to characterize the network complexity, I will discuss alternative methods to efficiently compare network topologies, and I will propose null models for testing the presence of common evolutionary processes in complex metagenomic networks.

Cherry picking: a new characterization to quantify reticulation
Simone Linz, University of Auckland, New Zealand

The reconstruction of phylogenetic networks from phylogenetic trees is an active field of research, and many algorithms exist that quantify reticulation when the input consists of two phylogenetic trees. However, an exact quantification of reticulation for more than two trees remains largely elusive. In this talk, we provide a new characterization that quantifies reticulation for an arbitrary number of phylogenetic trees under two time constraints. The characterization is in terms of cherries and the existence of a particular type of sequence. This is joint work with Peter Humphries and Charles Semple.

Exploring environmental genetic diversity with similarity networks
Philippe Lopez, Université Pierre et Marie Curie, Paris, France

For the past decade, Next Generation Sequencing techniques have made it possible to routinely explore the genetic diversity that is found in the environment, revealing in the process (and continuing to do so) a wealth of previously unknown genes, gene families and organisms. Since such metagenomics projects typically provide, for every single run, millions of reads belonging to an unknown number of genomes, the sheer quantity of sequence data obviously challenges most bioinformatics analyses. Being reasonably fast to build and analyze, similarity networks, where individual sequences are represented as nodes and a significant similarity between two sequences is represented as an edge, prove to be excellent tools for visualizing, exploring and structuring the genetic diversity that is found in such massive datasets. This talk will present how such networks can be applied to environmental sequence data (human gut microbiome, lizard gut microbiome, free-living marine ciliates) and how they can be used to investigate evolutionary questions.
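
The similarity-network construction described above (sequences as nodes, significant similarities as edges) can be sketched as follows. The pairwise scores, the threshold and the function names are made up for illustration; in practice the scores would come from a sequence comparison tool and the significance criterion would be chosen by the analyst.

```python
def build_similarity_network(scores, threshold):
    """Build an undirected graph from {(seq1, seq2): score}, keeping pairs with
    score >= threshold as edges. Every sequence becomes a node."""
    adj = {}
    for (s, t), score in scores.items():
        adj.setdefault(s, set())
        adj.setdefault(t, set())
        if score >= threshold:
            adj[s].add(t)
            adj[t].add(s)
    return adj


def connected_components(adj):
    """Group nodes into connected components (candidate sequence families)."""
    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical similarity scores (e.g. percent identity) between five sequences.
scores = {
    ("s1", "s2"): 95, ("s2", "s3"): 80, ("s1", "s3"): 40,
    ("s4", "s5"): 90, ("s3", "s4"): 10,
}
network = build_similarity_network(scores, threshold=70)
families = connected_components(network)
print(families)
```

Note that s1 and s3 end up in the same component through s2 even though their direct score is below the threshold, which is one reason such networks are useful for structuring divergent sequence families.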

Biological networks and network biology: a cancer story
Ali Masoudi-Nejad, University of Tehran, Iran

Cancer is a complex disease which involves multiple types of biological interactions across various physical, sequential, and biological scales. This complexity generates considerable challenges for the description of cancer biology, and inspires the study of cancer in the context of molecular, cellular, and physiological systems. An ultimate goal of bioinformatics and systems biology in the next decade is a complete computer representation of the cell and the organism, which will enable computational prediction of higher-level complexity, such as molecular interaction networks behind all of the various cellular processes and phenotypes of entire organisms. The development of experimental and high-throughput analytical tools which generate huge amounts of biological data could lead to the application of computational models in biological discovery and clinical medicine, especially for cancer. When we have a complete computer representation of living cells and organisms and know the principles of how they compute and interact, then, in the words of Sydney Brenner, "computational biology will become biological computation". In this talk we describe recent advances in biological network-based analysis of cancer as a model disease.

Sequence similarity networks, n-rooted fusion graphs and lego diagrams: new tools for understanding evolution
James McInerney, National University of Ireland, Ireland

The development of comprehensive high-level views of gene and genome evolution requires that we understand the flows of genetic information. These flows can be diverging and can also be introgressive. Traditionally, phylogenetic trees have been used to depict evolution, but they are limited to describing the evolution of continuously diverging entities. More recently, we have developed N-rooted fusion graphs, which can successfully depict the evolutionary history of genetic mergers. These N-rooted graphs can be constructed using Sequence Similarity Networks (SSNs) as a guiding principle. In addition, in order to display the sum total of evolutionary history, we have developed lego diagrams to show how evolution is not treelike, but is more akin to an economic system where genetic parts are seen as "Public Goods".

The biology of phylogenetic networks
David Morrison, Uppsala University, Sweden

The pathways of evolutionary history are notably complex, and biologists have been slowly making progress in understanding both how complex they usually are and just how complex they might potentially be. To this end, it is now recognized that there are five conceptual levels at which we can usefully study evolutionary history: genes, genomes, individuals, populations, and taxa. (NB. Within any level, the groups have fuzzy boundaries.) All of these hierarchical levels turn out commonly to have reticulated histories, and so their genealogies form networks - only at the level of individual nucleotides can we expect history to show a tree-like structure. At the molecular level, recombination, gene conversion and gene fusion produce non-tree structures in gene phylogenies; and thus only non-recombining sequence blocks will have tree-like genealogies. In addition to recombination, genomes are also subject to various forms of gene flow, such as introgression, hybridization, and horizontal gene transfer; all genomes thus have network genealogies, although these may not be evident in any empirical genomic subset. At the level of whole organisms, a similar picture is evident. For example, gene flow affects the historical patterns among taxa, so that all phylogenies have network structures somewhere in their history, although these may not be evident in any empirical subset of characters. Pedigrees are complicated by the presence of two parents in all sexually reproducing species, so that if both parents are included in the pedigree then it will inevitably form a network; however, pedigrees often ignore one sex or the other, so that they are tree-like. Haplotype histories are simplified pedigrees applying to populations, grouping individuals with identical character data, and so recombination and gene flow will usually make them networks.

All network analyses will be complicated in the face of various confounding issues, including: presence/absence of ancestors in the dataset; gene duplication-loss; and incomplete lineage sorting. No current computerized algorithms deal with all of the conceptual levels of history, nor do they deal with the potentially confounding factors.

Optimal trees and realizations from split networks
Vincent Moulton, University of East Anglia, UK

Split networks are a tool for analyzing reticulate evolution which have the ability to display conflict in data. They can be generated in various ways, for example, from character data (e.g., median networks) and from distance data (e.g., NeighborNets). As well as giving a visual snapshot of the data, split networks can contain a lot of useful information. In this talk, we will review split networks and some of their applications in phylogenetics, before presenting some recent work on how certain types of optimal trees and realizations of metrics can be obtained from such networks.

Statistical inference of reticulate evolutionary histories of species
Luay K. Nakhleh, Rice University, USA

Gene flow plays an important role in the evolution and adaptation of various groups of species, and results in reticulate evolutionary histories that are best modeled by phylogenetic networks. Incongruence among gene trees estimated from the sequences of multiple loci has been utilized as the main signal for inference of phylogenetic networks. However, it has been shown in several studies that both hybridization and incomplete lineage sorting (ILS) could be at play simultaneously. In such scenarios, methods for phylogenetic inference must account for both processes at the same time, as ignoring ILS results in incorrect estimates of the evolutionary history.

In this talk, I will describe a phylogenetic network model that incorporates reticulation and ILS, and describe new methods for inference of such phylogenetic networks from multi-locus data sets. I will describe maximum likelihood and Bayesian approaches to inference of phylogenetic networks from collections of gene tree estimates. I will demonstrate the performance of the methods on both biological data sets and synthetic ones.

Identification and study of composite genes in the environment
Jananan Pathmanathan, Université Pierre et Marie Curie, Paris, France

Composite genes are formed through evolutionary combinatorial processes such as fusion and recombination of segments derived from different gene families. Eukaryotic genomes seem to be particularly affected by these saltatory mechanisms. For example, the adaptation of Adiantum ferns to low-light environments relies upon a composite photoreceptor, joining phytochrome and phototropin genes, which enables these ferns to use red light effectively. Despite their high adaptive potential, the global distribution of composite genes around Earth and the rules by which their components combine are not well known.

An increasing amount of molecular data from metagenomic projects, with considerable genetic diversity, is now available to address these fundamental issues beyond eukaryotic genomes: i) How are composite genes created? ii) Where are composite genes created? Sequence similarity networks, where each node represents a unique sequence and each edge represents the similarity between connected sequences, appear to be well suited to quantify and study this genetic mosaicism.

In this talk, we will present our methods for detecting composite genes (triplets) and families of composite genes (cliques) in large sequence similarity networks, in order to tackle these issues. Afterwards, we will show some results from a case study of polluted environments.
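As a rough illustration of the triplet idea, the sketch below scans a small sequence similarity network for nodes that are similar to two other sequences which are not similar to each other. This non-transitive pattern is an assumption about what a composite-gene triplet looks like; the abstract does not spell out the detection rule, and the function name and toy edge list are hypothetical.

```python
from itertools import combinations


def candidate_composite_triplets(edges):
    """Return (component_a, candidate_composite, component_b) triplets.

    A triplet is reported when the middle node is similar to both outer
    nodes while the outer nodes share no similarity edge (assumed pattern).
    """
    # Build an undirected adjacency map from the similarity edges.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    triplets = []
    for centre in sorted(neighbours):
        # Examine every pair of sequences similar to the centre node.
        for a, b in combinations(sorted(neighbours[centre]), 2):
            if b not in neighbours[a]:
                triplets.append((a, centre, b))
    return triplets


# Toy network: C is similar to A, B and D; A and D are also similar.
toy_edges = [("A", "C"), ("B", "C"), ("A", "D"), ("C", "D")]
toy_triplets = candidate_composite_triplets(toy_edges)
```

In practice a real pipeline would also need to check that the two similarities cover non-overlapping regions of the candidate composite sequence, which this sketch does not model.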

Gene regulatory networks to identify new targets for disease - applications to type I diabetes, cell multinucleation and epilepsy
Enrico Petretto, Duke-NUS Graduate Medical School

We designed a new integrative approach, called "Systems-Genetics", for the identification of genes, pathways and regulatory networks that underlie common human disease. Within this approach, we developed tools to link DNA sequence variation with gene expression variation of complex biological networks using Bayesian variable selection approaches at the genome-wide level. Using our Systems-Genetics strategy we can identify relevant functional pathways and single out the key genes that regulate these pathways in disease, which are not captured by traditional genetic strategies such as genome-wide association studies (GWAS). This provides a direct avenue for the identification of relevant cellular processes and therapeutic targets for disease modification. We illustrate the usefulness and power of our approach in diverse disease contexts: type I diabetes, inflammatory conditions characterized by cell multinucleation and human epilepsy.

Community structure and multilayer networks (and a few protein interactions)
Mason Porter, Oxford University, UK

I'll give an introduction to community structure and multilayer networks, which are two of the most active research areas in network science. In a network, a "community" is a densely connected set of nodes that is supposed to be sparsely connected to other sets of nodes. The algorithmic detection of communities can be used for finding functional groups in protein-protein interaction networks and for many other applications. Over the past few years, my collaborators and I have generalized methods for community detection from ordinary networks to "multilayer networks", which can change in time and/or include multiple different types of connections between nodes. I will introduce the idea of multilayer networks and discuss how to find dense sets of nodes in such networks. My work on multilayer community detection has not yet included protein interaction networks, but I will try to explain why such methods offer considerable promise for such applications.

Some statistical aspects of the W-graph model
Stéphane Robin, Agro ParisTech, France

W-graph refers to a general class of random graph models that can be seen as a random graph limit. It is characterized by both its graphon function and its motif frequencies. In the past decades, it has been mostly studied from a probabilistic point-of-view and statistical aspects have been addressed only within the last few years.
The stochastic block model is a special case of the W-graph where the graphon function is block-wise constant. In the first part of the presentation, we will propose a variational Bayes approach to estimate the W-graph as an average of stochastic block models with an increasing number of blocks. We will derive a variational Bayes algorithm and the corresponding variational weights for model averaging. In the same framework, we will derive the variational posterior frequency of any motif. This approach will be illustrated on both synthetic and real networks.
If time permits, we will also address the problem of goodness-of-fit (GOF) of a given network model to an observed network, using a specific graphon function as a 'residual' null model. To this end we will consider the degree distribution of the nodes, and more specifically its variance. Indeed, the degree variance has long been considered a relevant characteristic for describing the structure of a network. Our aim will be to derive a formal GOF test based on this statistic.

Kernelizations for the hybridization number problem on multiple nonbinary trees
Céline Scornavacca, Université Montpellier II, France

Given a finite set $X$, a collection $\mathcal{T}$ of rooted phylogenetic trees on $X$ and an integer $k$, the Hybridization Number problem asks if there exists a phylogenetic network on $X$ that displays all trees from $\mathcal{T}$ and has reticulation number at most $k$. We show two kernelization algorithms for Hybridization Number, with kernel sizes $4k(5k)^t$ and $20k^2(\Delta^+-1)$ respectively, with $t$ the number of input trees and $\Delta^+$ their maximum outdegree. In addition, we present an $n^{f(k)}$-time algorithm, with $n=|X|$ and $f$ some computable function of $k$.

Counting phylogenetic networks
Charles Semple, University of Canterbury, New Zealand

The number of binary phylogenetic trees on $\ell$ taxa is a classical result in mathematical phylogenetics dating back to Schröder's work in 1870. This result also gives the number of such trees on $n$ labelled vertices. In contrast, the number of binary phylogenetic networks on $n$ labelled vertices is unknown. In this talk, we provide some answers to the problems of counting the numbers of phylogenetic networks. This is joint work with Colin McDiarmid and Dominic Welsh (University of Oxford).
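For orientation, the classical tree count has a simple closed form: there are $(2\ell-3)!!$ rooted and $(2\ell-5)!!$ unrooted binary phylogenetic trees on $\ell$ labelled leaves. The abstract does not say which variant is meant, so the minimal sketch below computes both; the function names are illustrative only.

```python
def double_factorial(m: int) -> int:
    """Return m * (m - 2) * (m - 4) * ... down to 1 or 2; 1 when m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def rooted_binary_trees(n_leaves: int) -> int:
    """Number of rooted binary phylogenetic trees on n_leaves labelled leaves."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    return double_factorial(2 * n_leaves - 3)


def unrooted_binary_trees(n_leaves: int) -> int:
    """Number of unrooted binary phylogenetic trees on n_leaves labelled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least three leaves")
    return double_factorial(2 * n_leaves - 5)


# Counts for 3 to 6 leaves, rooted and unrooted.
rooted_counts = [rooted_binary_trees(n) for n in range(3, 7)]
unrooted_counts = [unrooted_binary_trees(n) for n in range(3, 7)]
```

The rapid growth of these numbers is what makes exhaustive tree search infeasible beyond a handful of taxa; no comparable closed form is implied here for networks.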

Protein networks: from topology to logic
Roded Sharan, Tel-Aviv University, Israel

Protein networks have become the workhorse of biological research in recent years, providing mechanistic explanations for basic cellular processes in health and disease. However, these explanations remain topological in nature as the underlying logic of these networks is for the most part unknown. In this talk I will describe the work in my group toward the automated learning of the Boolean rules that govern network behavior under varying conditions. I will highlight the algorithmic problems involved and demonstrate how they can be tackled using integer linear programming techniques.

Studying horizontal gene transfer via quartet based methods
Sagi Snir, University of Haifa, Israel

One of the most fundamental tasks in biology is deciphering the history of life on Earth. It was generally thought that the history of life is best described using a tree structure. However, the reconstruction of trees of ancestor-descendant relationships for families of orthologous genes in prokaryotes has revealed widespread discordance between different gene trees. One of the major factors affecting gene tree incongruence is horizontal gene transfer (HGT), which is the non-vertical transfer of genetic material between organisms. The prevalence of HGT has led some researchers to question the meaningfulness of the Tree of Life (TOL) concept, and the topic has remained a heated debate among evolutionists. Notably, much of the intensive study of HGT lacks mathematical modeling and rigor. Here we present a comprehensive study of HGT that relies on the notion of the quartet plurality and its derivatives, in particular the quartet plurality distribution. We present the theoretical basis of this concept and subsequently its application to a large microbial data set encompassing around 7000 gene histories over 100 prokaryotes representing the entire prokaryotic world, which reveals several surprising facts. Among the major findings is that the prevailing uniformity assumption regarding HGT is inherently incorrect. Nevertheless, the tools we develop here allow us to prove that a strong tree-like signal of evolution does exist, although each individual gene history is substantially obscured by heavy HGT. We also give an assessment of the rate of HGT that has prevailed during the entire prokaryotic history. HGT plays a major role in the emergence of new human diseases, as well as promoting the spread of antibiotic resistance in bacterial species. The results presented here may advance our understanding of the nature of HGT, apart from their direct contribution to the long-standing debate regarding the notion of the Tree of Life.

Mathematical aspects of phylogenetic networks
Mike Steel, University of Canterbury, New Zealand

In this talk, I will describe some recent mathematical results on phylogenetic networks, aimed at addressing three questions:

1. When is a network merely a tree with arcs between its branches?
2. When can distances between taxa in a network appear perfectly tree-like?
3. Which networks remain the same when they are 'unfolded' and then 'refolded'?

Results for the first two questions are joint work with Andrew Francis, while the third is with Vincent Moulton, Katharina Huber and Taoyang Wu.
I may also report on a curious combinatorial property that applies to binary planar networks, and which is algorithmically useful (joint work with David Bryant).

Lateral gene transfer as a molecular clock
Eric Tannier, INRIA, France

I will show how Lateral Gene Transfer keeps a record of the relative dating of diversification events. It thus offers an alternative or a complement to the molecular clock, and is sometimes the only way to recover chronological information from a deep past for which no fossils have been preserved. I propose some methods to recover this signal by detecting lateral gene transfers. The results on cyanobacteria are fully consistent with the rare fossil record. This shows that Lateral Gene Transfer, which has often been considered an obstacle to the construction of molecular phylogenies, can instead support it and bring additional information.

Bayesian Network Modelling of Cancer Response to the Drug LY303511
Lisa Tucker-Kellogg, Duke-NUS Graduate Medical School

MOTIVATION:
The drug LY303511 has been proven to induce oxidative stress and make cancer cells easier to kill. Because oxidative stress is a complex effect, including many interdependent pathways, the potential for LY303511 in chemotherapy treatment was not well understood.

METHODS:
A Bayesian network was constructed with fixed topology, representing biochemical pathways that have been observed in various other types of oxidative stress. The parameters of the network represent the unknown importance of each biochemical pathway for this disease and this drug. Experiments with drug-treated cancer cells were conducted to measure biochemical species over time. This dataset was purely observational with none of the suspected pathways blocked. Our observational data was fed into the EM algorithm for estimating the Bayesian network parameters. The parameterized model yielded predictions for the relative contribution of each biochemical pathway for the effects of LY303511. Finally the contribution of each pathway was verified experimentally by inhibiting each pathway, one at a time, and repeating the drug treatment.

RESULTS:
Bayesian modelling predicted that calcium was a partial cause of the oxidative stress induced by short incubations with LY30, and that RNS (reactive nitrogen species) were strongly responsible for the oxidative stress induced by long incubations with LY30. Validation experiments confirmed the predicted roles of calcium and RNS, and also demonstrated a causal role for superoxide. In cell death measurements (quantified as sensitization to TRAIL-induced apoptosis), we found that 90% of drug effects could be explained by the combined effects of calcium and peroxynitrite, a key form of RNS. In summary, LY303511 induces multiple interdependent pathways of cell stress in cancer, with peroxynitrite and calcium contributing most significantly in HeLa cells.

CONCLUSION:
Our work shows that Bayesian networks can leverage existing biochemical knowledge from dissimilar experiments, for more efficiently determining the effects of new drugs.

Inference of genomic network dynamics with non-linear ODEs
Ernst Wit, University of Groningen, The Netherlands

Gene-regulatory systems, signalling pathways and metabolic fluxes are examples in the life-sciences where non-linear dynamics plays an important role. Ignoring single-cell fluctuations, these systems can be described by non-linear systems of differential equations. These models have been very popular in many branches of science due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science they have not been the focus of systematic statistical analysis until recently.

In this talk we propose an approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of a penalized likelihood where the differential system of equations is used as a penalty. The proposed method is tested in real and simulated examples showing its utility in a wide range of scenarios.
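One common way to write a criterion of this kind is sketched below. It is given only as an illustration and is not necessarily the speaker's exact formulation. It assumes noisy Gaussian observations $y_i$ of a state $x(t)$ at times $t_i$, a differential system $\dot{x}(t) = f(x(t), \theta)$ on $[0, T]$, and a tuning parameter $\lambda > 0$; all of these symbols are introduced here for illustration.

$$\min_{x,\,\theta}\ \sum_{i=1}^{n} \| y_i - x(t_i) \|^2 \;+\; \lambda \int_0^T \| \dot{x}(t) - f(x(t), \theta) \|^2 \, dt$$

The first term is the negative Gaussian log-likelihood up to constants, and the second penalizes trajectories that depart from the differential equations, so that larger $\lambda$ forces the fitted curve closer to an exact solution of the system.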

Advancing clinical proteomics using protein complexes as a contextualization framework
Limsoon Wong, National University of Singapore

Proteomics based on mass spectrometry (MS) is a vital technology for profiling and understanding the mechanisms underpinning a biological phenomenon, e.g. drug response or disease. Although MS-based proteomics has improved dramatically in recent years, persistent issues still remain in the areas of protein coverage, data reproducibility, quantitation accuracy, and applicability on small data sets. Recently, impressive advances have been made on these problems by analyzing proteomics profiling data in the context of protein complexes and biological networks. In this talk, I will describe some of these successes.

Information needed to infer phylogenetic networks
Taoyang Wu, University of East Anglia, UK

Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization.

Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this talk I will discuss my recent joint work with Katharina Huber, Leo van Iersel and Vincent Moulton on a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. I will also discuss some potential consequences of this result for constructing phylogenetic networks.

Algorithms for constructing hybridization networks from multiple gene trees
Yufeng Wu, University of Connecticut, USA

The problem of constructing hybridization networks from gene trees has been actively studied recently. A hybridization network is a compact representation of the given gene trees in the sense that this network "displays" each of the gene trees. Hybridization networks may be useful in the study of reticulate evolution (e.g. recombination, horizontal gene transfer, hybrid speciation, etc.). When constructing hybridization networks, we usually want to construct the most parsimonious (i.e. simplest) networks. It is known that constructing the most parsimonious networks is computationally difficult even when only two gene trees are given. There are several methods for constructing hybridization networks for two gene trees. Fewer methods can reconstruct networks for three or more gene trees.

In this talk, I will present several methods for constructing phylogenetic networks that my research group has developed recently. These methods can construct hybridization networks for three or more gene trees. Some of these methods are heuristics (i.e. they do not always reconstruct the most parsimonious networks). Instead, these methods are designed to be efficient in reconstructing near-parsimonious networks. We will present the basic ideas of these methods. We will also demonstrate through simulation that these methods perform well for constructing hybridization networks from multiple gene trees.

Integrative analysis for identifying joint gene-drug modular patterns via network-based methods
Shihua Zhang, Chinese Academy of Sciences, China

The underlying relationship between genomic factors and the response to diverse cancer drugs still remains unclear. A number of studies have shown that the heterogeneous responses of patients to anticancer treatments are partly associated with their specific changes in gene expression and somatic alterations. The emerging large-scale pharmacogenomic data provide us with valuable opportunities to improve existing therapies or to guide the early-phase clinical trials of compounds under development. However, how to identify the underlying combinatorial patterns among the data is still a challenging issue. In this talk, we will report two network-based methods to address it. First, we proposed a new quantitative function and developed a heuristic label propagation algorithm (BiLPA) to optimize it for modular patterns in a gene-drug bipartite network. Second, we developed a sparse network-regularized partial least square (SNPLS) method to identify joint modular patterns using gene-expression data, drug-response data and a gene network. We demonstrated the effectiveness of BiLPA and SNPLS on a set of simulation data and applied them to a real biological data set across 641 cancer cell lines consisting of diverse tissue types of human cancers. We found that the gene-drug modular patterns provide us with new insights into the molecular mechanisms of how drugs act and suggest new drug targets for therapy of certain types of cancers.

Multi-view spectral clustering with applications in gene coexpression networks
Shuqin Zhang, Fudan University, China

Multi-view data clustering has become an increasingly popular research topic in recent years. By exploiting information from multiple views, one can hope to find a clustering that is more accurate across all the views than the result obtained from a single view. In this talk, we will introduce a spectral clustering framework for multi-view data. The problem is formulated as an optimization model, which combines the clustering in each individual view with the alignment of the clusters from different views. An approximation algorithm based on eigenvector computation is proposed. Our method outperforms the existing methods, especially when the underlying clusters in multiple views are different. We applied our method to two groups of gene coexpression networks for humans, which include one for three different cancers, and one for three tissues from morbidly obese patients. The results were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also showed that the main functions of most clusters identified for the corresponding disease have been addressed by other researchers, which may provide the theoretical basis for further experimental study.

Knowledge-guided fuzzy logic network modeling to detect alterations in cancer signaling pathways
Jie Zheng, Nanyang Technological University

Abnormal alteration in signaling pathways is a key characteristic of cancer cells. As drug-induced rewiring of signaling networks is a major strategy of anticancer treatment, accurate prediction of cellular responses to drugs is a crucial but challenging task. Our prior knowledge about the mechanisms of signal transduction is limited and often fails to predict the actual cellular responses to perturbations. Despite encouraging success, data-driven methods have their limitations, including the requirement for large datasets that may not be available and the difficulty of interpreting the results. Hybrid methods integrating prior knowledge with data-driven inference are therefore highly desirable. In this project, we propose a fuzzy logic network model integrating prior knowledge and data-driven inference to detect signaling pathway alteration. In particular, we add to the least square error between experimental and predicted data a regularizer that encodes the penalty against both model complexity and structural divergence between prior and learned networks. We formulate the knowledge-guided fuzzy logic network model as a constrained nonlinear integer programming problem that can be efficiently solved by a genetic algorithm. The proposed method is evaluated on a synthetic dataset and three real phosphoproteomic datasets, and the experimental results demonstrate that our method can not only effectively uncover both the topological structure and logic gates of the network, but also infer signaling pathway alterations that are not included in the prior knowledge network but are supported by data.

Network biology for complex human diseases
Jun Zhu, Mount Sinai Hospital, USA

Part I: Fundamental theories of network biology
In this part, I will overview theories and hypotheses underlying different network biology approaches.

Part II: Overview of association and causal networks
In this part, I will overview association networks and causal networks, how to compare different networks, and how to integrate different types of networks.

Part III: Application of network biology in human diseases
Cells employ multiple levels of regulation, including transcriptional and translational regulation, that drive core biological processes and enable cells to respond to genetic, epigenetic, and environmental changes. In this part, I will focus on integrating diverse types of data into network models for elucidating mechanisms of complex human diseases.

Compare GWAS candidates in multiple species using physiologically relevant networks
Jun Zhu, Mount Sinai Hospital, USA

Multiple candidate genes have been identified in recent genome-wide association studies (GWAS) either by SNP arrays or next generation sequencing. However, the function of these candidate genes and the relevant tissues where these genes are functional are not always clear. Network analysis has shed some light on biological processes in which these candidate genes may be involved. An animal model is frequently needed to further validate the function and disease association of a candidate gene. To address which animal model is appropriate for a specific disease phenotype, we systematically compared GWAS candidates in different species using physiologically relevant network models. We evaluated different networks for analyzing GWAS candidate genes, and developed multiple approaches to compare networks of different species. We found that human polygenic disease traits such as height, anemia, and blood lipid traits were better represented in the pig than in the mouse.

1 Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA;
2 Jiangxi Agricultural University, Nanchang, Jiangxi, China;
3 Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA, USA.

Diversity and evolution of secondary metabolite clusters in marine actinomycetes
Nadine Ziemert, University of Tuebingen, Germany

Marine actinomycetes have been proven to be a valuable source for the discovery of new natural products. In order to develop effective sampling strategies, we need to understand the evolution and diversifying mechanisms of secondary metabolites in bacteria. We chose the obligate marine actinomycete genus Salinispora as the perfect model organism to study diversity and distribution of secondary metabolite genes. This genus is the source of diverse natural products and has proven to be a tractable model with which to address correlations between taxonomy and secondary metabolite production.
Here we report the analysis of secondary metabolite biosynthetic gene clusters in 75 Salinispora genome sequences sampled around the world. The results detail the diversity and distributions of secondary metabolite biosynthetic pathways among closely related populations and reveal an extraordinary level of pathway sampling and horizontal gene transfer. The sequence data also provide clear evidence of the evolutionary mechanisms that generate new pathway diversity.
