
**These files have been approved for public release. Distribution is unlimited. LA-UR-16-20194.**

Madhu Advani, Stanford University

Statistical Physics of High Dimensional Inference

To model modern large-scale datasets, we need efficient algorithms to infer a set of P unknown model parameters from N noisy measurements. What are the fundamental limits on the accuracy of parameter inference given limited measurements, signal-to-noise ratios, prior information, and computational tractability requirements? How can we combine prior information with measurements to achieve these limits? Classical statistics gives incisive answers to these questions as the measurement density (N divided by P) approaches infinity. However, modern high-dimensional inference problems, in fields ranging from bio-informatics to economics, occur at finite measurement density. We formulate and analyze high-dimensional inference analytically by applying the replica and cavity methods of statistical physics where data serves as quenched disorder and inferred parameters play the role of thermal degrees of freedom. Our analysis reveals that widely cherished Bayesian inference algorithms such as maximum likelihood and maximum a posteriori are suboptimal in the modern setting, and yields new tractable, optimal algorithms to replace them as well as novel bounds on the achievable accuracy of a large class of high-dimensional inference algorithms.

Anima Anandkumar, University of California Irvine

Tensor Methods: A new paradigm for guaranteed learning of probabilistic models and neural networks

Tensors are rich structures for modeling complex higher-order relationships in data-rich domains such as social networks, computer vision, the internet of things, and so on. Tensor decomposition methods are embarrassingly parallel and scalable to enormous datasets. They are guaranteed to converge to the global optimum and yield consistent estimates for many probabilistic models such as topic models, community models, hidden Markov models, and so on. I will also demonstrate how tensor methods can yield rich discriminative features for classification tasks and provide a guaranteed method for training neural networks.
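As a minimal illustration of the tensor machinery, the sketch below (an illustrative toy setup, not from the talk) builds an orthogonally decomposable symmetric third-order tensor and recovers one of its components by tensor power iteration, a workhorse behind many of the guaranteed decomposition results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric rank-2 tensor T = w1 a1⊗a1⊗a1 + w2 a2⊗a2⊗a2
# with orthonormal components a1, a2 (toy setup).
d = 5
a1, a2 = np.linalg.qr(rng.standard_normal((d, 2)))[0].T
w1, w2 = 3.0, 1.5
T = w1 * np.einsum('i,j,k->ijk', a1, a1, a1) \
  + w2 * np.einsum('i,j,k->ijk', a2, a2, a2)

# Tensor power iteration: v <- T(I, v, v) / ||T(I, v, v)||
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for _ in range(100):
    v = np.einsum('ijk,j,k->i', T, v, v)
    v /= np.linalg.norm(v)

# For orthogonally decomposable tensors the iteration converges
# to one of the components (up to sign).
overlap = max(abs(v @ a1), abs(v @ a2))
print(overlap)
```

Deflation (subtracting the recovered component and repeating) would then recover the remaining components.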

Jennifer Annoni, University of Minnesota

Poster: A New Approach to Reduced-Order Modeling for Flow Control

The objective of this work is to obtain linear reduced-order models that can be used for controller design. This technique has specifically been applied to fluid flows. High-fidelity computational fluid dynamic models provide accurate characterizations of complex flow dynamics but are not suitable for control design due to their prohibitive computational complexity. A variety of methods, including proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD) can be used to extract the dominant flow structures and obtain reduced-order models of the flow. In this presentation, we introduce a new method, termed input-output dynamic mode decomposition (IODMD), that uses a subspace identification technique to obtain models of low-complexity. We show that, relative to standard DMD, the introduction of an external forcing in IODMD provides robustness with respect to small disturbances and noise in the model. This method captures the dominant characteristics of the flow as well as the input/output behavior of the system. Specifically, IODMD is applied to high-fidelity simulations of wind farm flow to obtain a reduced-order model that can be used for wind farm control.
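For readers unfamiliar with the baseline, here is a sketch of standard (exact) DMD on a toy linear system; IODMD itself additionally uses input/output data and subspace identification, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear dynamics x_{k+1} = A x_k; exact DMD should recover eig(A).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
X = np.empty((2, 21))
X[:, 0] = rng.standard_normal(2)
for k in range(20):
    X[:, k + 1] = A @ X[:, k]

X0, X1 = X[:, :-1], X[:, 1:]                   # snapshot pairs
U, s, Vh = np.linalg.svd(X0, full_matrices=False)
Atilde = U.T @ X1 @ Vh.T @ np.diag(1.0 / s)    # reduced operator
eigvals = np.sort(np.linalg.eigvals(Atilde).real)
print(eigvals)   # ≈ [0.8, 0.9], the true eigenvalues of A
```

In practice the SVD is truncated to a low rank r, which is what yields the reduced-order model.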

Guy Bresler, Massachusetts Institute of Technology

Learning a Tree-Structured Ising Model in Order to Make Predictions

We study the problem of learning a tree graphical model from samples such that low-order marginals are accurate. We define a distance ("small set TV" or ssTV) between distributions P and Q by taking the maximum, over all subsets S of a given size, of the total variation between the marginals of P and Q on S. Approximating a distribution to within small ssTV allows making predictions based on partial observations. Focusing on pairwise marginals and tree-structured Ising models on p nodes, we give an algorithm that produces a distribution (from the same class) with ssTV at most eta using log p / eta^2 IID samples. We also prove a matching lower bound showing that this sample complexity is optimal. In contrast to the problem of learning the tree structure itself, there is no dependence on the strength of edge interactions. Thus, even when there are far too few samples to recover the correct tree, it is possible to learn an incorrect tree that is useful. The time complexity of the proposed algorithm is on the order of p^2, dominated by the computation of pairwise correlations. Joint work with Mina Karzand.
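The pairwise-correlation flavor of such tree-learning can be illustrated with the classical Chow-Liu procedure (a related but not identical method): estimate pairwise mutual informations and keep a maximum-weight spanning tree.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample a binary Markov chain X0 -> X1 -> X2 with flip probability 0.1.
n, flip = 20000, 0.1
x0 = rng.integers(0, 2, n)
x1 = np.where(rng.random(n) < flip, 1 - x0, x0)
x2 = np.where(rng.random(n) < flip, 1 - x1, x1)
X = np.stack([x0, x1, x2], axis=1)

def mutual_info(a, b):
    """Empirical mutual information (nats) between two binary samples."""
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            p = np.mean((a == u) & (b == v))
            pa, pb = np.mean(a == u), np.mean(b == v)
            if p > 0:
                mi += p * np.log(p / (pa * pb))
    return mi

pairs = [(0, 1), (0, 2), (1, 2)]
mi = {e: mutual_info(X[:, e[0]], X[:, e[1]]) for e in pairs}
# Maximum-weight spanning tree: on three nodes, keep the two strongest
# edges (any two edges of the triangle form a spanning tree).
edges = sorted(sorted(mi, key=mi.get, reverse=True)[:2])
print(edges)   # the chain edges [(0, 1), (1, 2)]
```

Even with noisy samples, the direct edges carry more mutual information than the indirect one, so the chain structure is recovered.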

Joachim Buhmann, ETH Zurich

Information theory of algorithms: How precisely should we compute?

Algorithms map input spaces to output spaces where inputs are possibly affected by fluctuations. Besides run time and memory consumption, an algorithm might be characterized by its sensitivity to the signal in the input and its robustness to input fluctuations. The achievable precision of an algorithm, i.e., the attainable resolution in output space, is determined by its capability to extract predictive information in the input relative to its output. I will present an information theoretic framework for algorithm analysis based on maximum entropy estimation where an algorithm is characterized as computational evolution of a (possibly contracting) posterior distribution over the output space. The tradeoff between precision and stability is controlled by an input sensitive generalization capacity (GC). GC measures how much the posteriors on two different problem instances agree despite the input noise. For optimization problems in the two instance setting, GC is calculated by estimating the free energies of the two instance costs and of the joint costs. Thereby, GC objectively ranks different algorithms for the same data processing task based on the bit rate of their respective capacities. Information theoretic algorithm selection is demonstrated for minimum spanning tree algorithms and for greedy MaxCut algorithms. The method can rank centroid based and spectral clustering methods, e.g., k-means, pairwise clustering, normalized cut, adaptive ratio cut and dominant set clustering.

Kieron Burke, University of California Irvine

Machine learning density functionals

In chemistry and materials science, the electronic structure problem (finding the ground-state energy of electrons bound to nuclei) is incredibly important. More than 30,000 papers used density functional theory to solve this problem last year. The theory component of the Materials Genome Initiative involves creating huge databases of DFT materials calculations and searching for desired properties. In contrast, with friends from the machine-learning community, I have been trying to construct approximate density functionals using ML, in order to overcome the limitations of our present approximations. In some ways, ML is the perfect tool for this procedure. In other ways, it is a nightmare. I will present our latest results. We could use any help we can get.
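A toy version of the idea, learning a known functional of a density by kernel ridge regression on example densities, might look like the sketch below. The "functional" E[n] = ∫ n(x)^2 dx and the grid representation are hypothetical stand-ins, not the actual DFT setup:

```python
import numpy as np

rng = np.random.default_rng(3)

# Represent densities on a grid and learn the toy functional
# E[n] = ∫ n(x)^2 dx by kernel ridge regression.
m = 50
grid = np.linspace(0, 1, m)
dx = grid[1] - grid[0]

def random_density():
    c = rng.random(2)
    n = 1 + c[0] * np.sin(np.pi * grid) + 0.3 * c[1] * np.sin(2 * np.pi * grid)
    return n / (n.sum() * dx)          # normalize: ∫ n dx = 1

X = np.array([random_density() for _ in range(200)])
y = (X**2).sum(axis=1) * dx            # exact functional values

def kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

lam = 1e-6                             # ridge regularization
K = kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

n_test = random_density()
E_pred = (kernel(n_test[None], X) @ alpha)[0]
E_true = (n_test**2).sum() * dx
print(abs(E_pred - E_true))
```

The real difficulty alluded to in the abstract is that DFT requires functional derivatives of the learned model, which is where ML becomes "a nightmare".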

Deepjyoti Deka, Los Alamos National Laboratory

Learning and Estimation Problems in Radial Distribution Networks

Distribution networks provide the final tier in the transfer of electricity from generators to the end consumers. Traditionally, distribution networks have been equipped with limited real-time monitoring devices and have low observability. However, there has been a recent surge in the deployment of smart meters in distribution networks for use in demand response, improved home monitoring, and other smart-grid applications. We discuss learning and estimation problems in the distribution grid using available phasor and voltage magnitude data. For a wide range of realistic operating conditions, polynomial time algorithms are proposed to learn the structure of the distribution network as well as to estimate the statistics of load profiles at the constituent nodes and/or line parameters. Further, the algorithms are extended to cases with missing data, where observations limited to a fraction of the nodes are available for structure learning. This work can be applied to improve control and optimize operations in the grid as well as to understand the scope of adversarial attacks. Current extensions include learning techniques in three-phase systems and intersection with graphical-model-based learning. This is joint work with S. Backhaus and M. Chertkov.

Karthik Duraisamy, University of Michigan

Inference and Machine Learning for Turbulence Modeling

The recent acceleration in computational power and measurement resolution has made possible the availability of extreme scale simulations and data sets. In this work, a modeling paradigm that seeks to comprehensively harness large scale data is introduced, with the aim of improving closure models. Full-field inversion (in contrast to parameter estimation) is used to obtain corrective, spatially distributed functional terms, offering a route to directly address model-form errors. Once the inference has been performed over a number of problems that are representative of the deficient physics in the closure model, machine learning techniques are used to reconstruct the model corrections in terms of variables that appear in the closure model. These machine-learned functional forms are then used to augment the closure model in predictive computations. The approach is demonstrated to be able to successfully reconstruct functional corrections and yield predictions with quantified uncertainties in a range of turbulent flows.

David Gamarnik, Massachusetts Institute of Technology

(Arguably) Hard on Average Optimization Problems and the Overlap Gap Property

Many problems in the area of random combinatorial structures and high-dimensional statistics exhibit an apparent computational hardness, even though formal results establishing such hardness are lacking. Examples include the problem of finding an independent set of a random graph, finding a proper coloring of a random graph, and finding the largest submatrix of a random matrix. Inspired by insights from statistical physics, we propose a general conjecture regarding the onset of hardness for these types of problems, formulated in terms of a certain overlap gap property. We conjecture that such problems become hard on average when the space of overlaps becomes disconnected in an appropriate sense (the model exhibits an overlap gap), and we prove the existence of such gaps for several of these problems, including the problem of finding the largest submatrix of a random Gaussian matrix. Furthermore, we establish that the overlap gap property is a provable obstacle for a certain class of local algorithms for the problems of finding a largest independent set of a sparse graph and finding a satisfying assignment of a random NAE-K-SAT problem.

Surya Ganguli, Stanford University

The statistical physics of deep learning: on the role of dynamic criticality, random landscapes, and the reversal of time

Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation. Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions. Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.

Thomas Halsey, ExxonMobil Upstream Research Co.

Managing Business Risk under Uncertainty: Development Planning in the Oil and Gas Industry

The upstream oil and gas industry is distinguished by its dependence on a poorly understood subsurface as the ultimate source of value. All phases of upstream activity (exploration, development, and production) incur significant business risk arising from our incomplete knowledge of the subsurface and of the fluids therein. There are two ways of dealing with this risk: reducing it, through additional data collection and better characterization of the subsurface, or managing it better, by understanding the range of plausible subsurface realities consistent with a given set of characterization data and the economic implications of that range, by selecting actions that optimize return over the range of uncertainty, and by identifying possible responses to unfavorable outcomes. In this contribution we discuss the differences between development and production (or, more informally, green-field and brown-field) oriented uses of modeling and simulation, and then focus on development planning applications, which attempt to account for uncertainty early in an asset lifecycle, when both uncertainty and the economic impact of decisions are at their maximum. “Scenario discovery”, a clustering process across numerous scenarios, is key to these applications.

Hendrik Hamann, IBM T.J. Watson Research Center

Physical Analytics: Bringing big data together with physics

In the past, most information on the internet was originated by humans or computers. However, with the emergence of cyber-physical systems, vast amounts of data are now being created by sensors on devices, machines, etc., digitizing the physical world. While cyber-physical systems are the subject of active research around the world, the vast amount of actual data generated from the physical world has so far attracted little attention from the engineering and physics community. In this presentation we use examples to highlight the opportunities in this new subject of “Physical Analytics” for highly interdisciplinary research (including physics, engineering and computer science), which aims at understanding real-world physical systems by leveraging cyber-physical technologies. More specifically, the convergence of the physical world with the digital domain allows applying physical principles to everyday problems in a much more effective and informed way than was possible in the past. Very much as traditional applied physics and engineering have made enormous advances and changed our lives by making detailed measurements to understand the physics of an engineered device, we can now apply the same rigor and principles to understand large-scale physical systems. In the talk we first present a set of “configurable” enabling technologies for Physical Analytics, including ultralow-power sensing and communication technologies, physical big-data management technologies, numerical modeling for physical systems, machine-learning-based physical model blending, and physical-analytics-based automation and control. We then discuss in detail several concrete applications of Physical Analytics, ranging from energy management in buildings and data centers, environmental sensing and controls, and precision agriculture to renewable energy forecasting and management.

Lior Horesh, IBM Research

Should you Derive, or let the data Drive? Framework for design of hybrid first-principles data-driven models

Mathematical models are employed ubiquitously for description, prediction and decision making. In addressing end-goal objectives, great care needs to be devoted to attaining an appropriate balance of inexactness throughout the various stages of the end-goal process (e.g. modeling and inversion). Disregarding such considerations either entails redundant computation or impairs the overall fidelity of the inversion process. Model reduction is instrumental in trading off fidelity for computation, yet, in some situations, it is essential to take the opposite perspective and enhance model fidelity, and thereby the end-goal output. In this talk, we shall describe the interplay between model reduction and model mis-specification mitigation and provide a generic infrastructure for model re-specification based upon a hybrid first-principles and data-driven approach.

Jason Johnson, Numerica Corporation

Machine Learning and Discrete Optimization for Police Patrol Planning

Today’s law enforcement decision makers are inundated with data. We discuss how machine learning and discrete optimization techniques can be used to extract actionable information from the historical record of demand for police service, propelling patrol activities to new levels of efficiency and effectiveness.

Bert Kappen, Radboud University

Control, inference and learning

Intelligent systems, whether natural or artificial, must act in a world that is highly unpredictable. Planning actions under uncertainty is a stochastic optimal control problem. However, there are two fundamental problems: the optimal control solution is intractable to compute and intractable to represent, due to the non-trivial state dependence of the optimal control. This has so far prevented large-scale application of stochastic optimal control theory. Path integral control theory describes a class of control problems whose solution can be computed as an inference computation. In this presentation I focus on efficient computation of the optimal control solution using importance sampling. I formalize the intuitive notion that the efficiency of the inference computation is related to the proximity of the sampling control to the optimal control. Secondly, I show new results that allow approximate computation of state-dependent optimal controls using the cross-entropy method. These two ingredients together suggest a novel adaptive sampling procedure, called PICE (path integral cross entropy method), that learns a controller based on self-generated data. The adaptive sampling procedure can be used to efficiently compute optimal controls, but can also be used to accelerate other Monte Carlo computations. We illustrate the results on a few examples in robotics and time series.
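The generic cross-entropy method can be sketched in a few lines on a toy cost function (this illustrates the basic CE method, not the PICE controller itself): iteratively refit a Gaussian sampling distribution to the elite, lowest-cost samples.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy cost with optimum at (1, -2); CE should concentrate there.
def cost(x):
    return ((x - np.array([1.0, -2.0]))**2).sum(axis=1)

mu, sigma = np.zeros(2), 2.0 * np.ones(2)
for _ in range(30):
    samples = mu + sigma * rng.standard_normal((200, 2))
    elite = samples[np.argsort(cost(samples))[:20]]   # best 10%
    mu = elite.mean(axis=0)                           # refit the sampler
    sigma = elite.std(axis=0) + 1e-3                  # keep some exploration
print(mu)   # ≈ [1, -2]
```

In PICE the quantity being refit is a parameterized control policy, and the "cost" involves the path costs of sampled trajectories.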

Yannis Kevrekidis, Princeton University

No equations, no variables, no parameters: Data and the computational modeling of complex systems

I will discuss an interface between traditional continuum modeling of complex dynamical systems (ODEs, PDEs, evolving networks) and fine scale, atomistic/stochastic/agent based numerics. The use of modern data-mining techniques (and in particular here, manifold-learning techniques like diffusion maps) allows us to solve coarse-grained, systems level descriptions of the fine scale simulations without deriving these descriptions in closed form - what we call the equation-free approach. It also allows us to do this when it is not explicitly known what the right coarse-grained variables are (variable-free computation) nor what the right coarse-grained parameters are (parameter-free computation). This "agnostic" approach to modeling only takes into account data and the nature of the unavailable model (not its actual closed form); and it uses traditional scientific computing as a tool for the "on the fly" design of computational experiments.
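A bare-bones diffusion-map computation (one of the manifold-learning tools mentioned) on synthetic data from a circle might look like:

```python
import numpy as np

rng = np.random.default_rng(5)

# Points on a noisy circle; the leading nontrivial eigenvectors of the
# diffusion operator recover the intrinsic angular coordinate.
theta = np.sort(rng.uniform(0, 2 * np.pi, 300))
pts = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.standard_normal((300, 2))

d2 = ((pts[:, None] - pts[None, :])**2).sum(-1)
eps = 0.1
K = np.exp(-d2 / eps)                    # Gaussian affinity kernel
P = K / K.sum(axis=1, keepdims=True)     # row-normalized Markov matrix

vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
vals, vecs = vals.real[order], vecs.real[:, order]
# vals[0] = 1 (stationary); vecs[:, 1:3] give the diffusion-map embedding.
print(vals[:3])
```

The embedding coordinates serve as data-driven coarse variables, in the spirit of the variable-free computation described above.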

Jorge Kurchan, LPS-Ecole Normale Superieure

Darwinian versus thermal optimization

Darwinian optimization -- walkers that randomly diffuse and reproduce proportionally to the quantity that has to be maximized -- may be mapped into ordinary thermal optimization (e.g. Monte Carlo) when applied to complex landscapes. We may thus see the limitations of the former clearly. More interestingly, we may apply what we know from annealing schemes to accelerating the dynamics of an evolving population.
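A toy side-by-side of the two strategies (an illustrative setup, not from the talk) on a rugged one-dimensional landscape:

```python
import numpy as np

rng = np.random.default_rng(6)

# Rugged 1-D landscape to maximize.
def fitness(x):
    return -0.1 * (x - 2.0)**2 + 0.3 * np.cos(5 * x)

# Darwinian optimization: walkers diffuse, then replicate with
# probability proportional to exp(fitness / T).
def darwinian(T=0.2, steps=400, n=300):
    x = rng.uniform(-5, 5, n)
    for _ in range(steps):
        x = x + 0.05 * rng.standard_normal(n)       # diffusion
        w = np.exp(fitness(x) / T)
        x = rng.choice(x, size=n, p=w / w.sum())    # selection/replication
    return x[np.argmax(fitness(x))]

# Thermal optimization: Metropolis dynamics with an annealing schedule.
def thermal(steps=20000):
    x, best = 0.0, 0.0
    for k in range(steps):
        T = 1.0 * 0.01 ** (k / steps)               # anneal T from 1 to 0.01
        xp = x + 0.3 * rng.standard_normal()
        if np.log(rng.random()) < (fitness(xp) - fitness(x)) / T:
            x = xp
        if fitness(x) > fitness(best):
            best = x
    return best

best_d, best_t = darwinian(), thermal()
print(fitness(best_d), fitness(best_t))
```

Both strategies concentrate near the top of the landscape; the mapping in the talk makes this correspondence precise and lets annealing intuition guide the population dynamics.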

Po-Ling Loh, University of Pennsylvania

Confidence sets for diffusion source estimators

We study the problem of identifying the source of a diffusion spreading over a regular tree. When the degree of each node is at least three, we show that it is possible to construct confidence sets for the diffusion source with size independent of the number of infected nodes. Our estimators are motivated by analogous results in the literature concerning identification of the root node in preferential attachment and uniform attachment trees. At the core of our proofs is a probabilistic analysis of Polya urns corresponding to the number of uninfected neighbors in specific subtrees of the infection tree.

Andrey Lokhov, Los Alamos National Laboratory

Efficiently (re)constructing spreading networks from partial observations

The problem of reconstructing a diffusion network and its transmission probabilities from spreading data has attracted considerable attention in the past several years. In this talk, we focus on a realistic and restricted scenario in which only partial information on cascades is available, and argue that methods based on maximization of the likelihood are computationally intractable. To tackle this problem, we suggest an algorithm based on recently introduced dynamic message-passing equations for spreading processes, which enables fast and robust reconstruction of transmission probabilities in sparse networks. As an application, we discuss how the developed framework can be used for engineering artificial networks with desired properties.

Charles Mathy, Lyric Labs ADI

SPARTA: Fast global planning of collision-avoiding robot trajectories

I will present an algorithm for obtaining collision-avoiding robot trajectories called SPARTA, which stands for SPline-based ADMM for Robot Trajectory Avoidance. The problem of finding smooth collision-avoiding trajectories of robots is solved using a message passing algorithm called the Three Weight Algorithm. The generated paths respect the physical constraints of the robots, and the convergence speed is such that the approach becomes feasible for real-time applications. We have physically implemented our algorithm on mobile robots and will showcase the results. The general approach can be applied to many kinds of optimization problems, including both soft and hard costs in the objective.

Pankaj Mehta, Boston University

Variational RG and Deep Learning

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping between the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising model in one and two dimensions. Our results suggest that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.
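The RBM building block of these architectures can be sketched with one-step contrastive divergence (CD-1) training on a trivial dataset; this is a generic RBM sketch, not the variational-RG mapping itself:

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tiny RBM (4 visible, 2 hidden) trained by CD-1 on two binary patterns.
data = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]], dtype=float)
nv, nh, lr = 4, 2, 0.1
W = 0.1 * rng.standard_normal((nv, nh))
a, b = np.zeros(nv), np.zeros(nh)      # visible / hidden biases

def recon_error():
    h = sigmoid(data @ W + b)
    v = sigmoid(h @ W.T + a)
    return ((data - v)**2).mean()

err0 = recon_error()
for _ in range(2000):
    v0 = data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hiddens
    pv1 = sigmoid(h0 @ W.T + a)                       # one Gibbs step back
    ph1 = sigmoid(pv1 @ W + b)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(data)  # CD-1 gradient
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

print(err0, recon_error())   # reconstruction error drops during training
```

In the RG analogy, the hidden layer plays the role of the coarse-grained degrees of freedom.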

A. Alan Middleton, Syracuse University

The effects of boundaries and dimensions on optimization and memory recovery

Some links between optimization and dimensionality, especially with respect to dimensionality and to recovery of global solutions, will be explored, using the example of disordered magnets. In some cases, polynomial time algorithms can be used, which allow for thorough examination of the effects of system size on the uniqueness of solutions and the recovery after aging can be explored. In particular, recent results on the effects of boundary conditions in random Ising magnets in two dimensions found using recently developed shortest paths methods will be presented, along with preliminary work on recovery of ground states using patchwork dynamics (optimization over blocks).

Remi Monasson, ENS Paris

Benchmarking inverse statistical approaches for protein structure and design with lattice-protein models

Inverse statistical approaches, modeling pairwise correlations between amino acids in the sequences of similar proteins across many different organisms, can successfully extract protein structure (contact) information. I will present recent work, in collaboration with S. Cocco and E. Shakhnovich, in which we benchmark those statistical approaches on exactly solvable models of proteins folding on a 3D lattice, to assess the reasons underlying their success and their limitations. We show that the inferred parameters (effective pairwise interactions) of the statistical models have clear and quantitative interpretations in terms of positive (favoring the native fold) and negative (disfavoring competing folds) protein sequence design. New sequences randomly drawn from the statistical models are likely to fold into the native structures when the effective pairwise interactions are accurately inferred, a performance that cannot be achieved with independent-site models.

Cristopher Moore, Santa Fe Institute

Phase transitions in sparse, high-dimensional clustering

Many problems in data analysis have phase transitions: when the data becomes too noisy or too sparse, it suddenly becomes impossible to find any pattern in it, or to distinguish it from a null model with no pattern at all. In some cases, there is also a regime where pattern-finding is information-theoretically possible, but appears to be computationally hard. I will review work over the past few years on one such problem, finding communities in networks. I will then present some new work, which is partly analogous, on one of the most basic problems in data science: clustering O(n) points in n-dimensional space.

Gautam Nallamala, University of California, San Diego

Learning to Navigate Turbulent Environments: The Soaring Flight of Birds and Gliders

Birds and gliders use warm, rising currents (thermals) in the atmosphere to ascend while minimizing energy expenditure. Thermal soaring, as this form of flight is called, is a strategy used by birds as they migrate across large distances, sometimes even across oceans and continents. Soaring flight provides an interesting example of a complex decision-making problem in biology that requires a long-term strategy in order to effectively utilize the ascending thermals. The process of finding thermals is further complicated by turbulent fluctuations that accompany thermal formation. We approach the soaring flight of birds and gliders as a problem of learning to navigate complex, highly fluctuating turbulent environments. Most soaring is observed in the atmospheric convective boundary layer on a warm, sunny day. The convective boundary layer of the atmosphere is modeled in two ways: as a Rayleigh-Benard flow at high Rayleigh number, and as a kinetic model that reproduces the statistical properties of the boundary layer. The two models include the necessary physical features that enable and also complicate effective soaring. We use a model-free, experience-based, reinforcement learning algorithm to train gliders. We identify and compare the learned policies in the regimes of weak and strong flow. From weak to strong, we show how the glider adopts an increasingly conservative policy, quantifying the degree of risk affordable in turbulent environments. The learning procedure also permits the identification of the turbulent observables that are most effectively exploited by the glider for its soaring in the atmospheric boundary layer.

Masayuki Ohzeki, Kyoto University

Accelerated Langevin Dynamics and its application to machine learning

In machine learning, the most fundamental approach to estimating the parameters of a generative model is maximum likelihood estimation. However, maximum likelihood estimation is computationally costly because it requires computing the partition function, or expectation values in equilibrium. An alternative technique is contrastive divergence, which quickly approaches the equilibrium state while avoiding computation of the partition function. To further develop the contrastive divergence method, we implement a novel technique, inspired by nonequilibrium dynamics, that accelerates convergence to the desired distribution within the contrastive divergence framework. We confirm its relevance for contrastive divergence in a simple model. In addition, we discuss the possibility of using quantum annealing for contrastive divergence in machine learning.
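As a baseline for the accelerated dynamics, plain (unadjusted) Langevin dynamics sampling a simple target looks like this; the nonequilibrium acceleration technique itself is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(9)

# Unadjusted Langevin dynamics sampling p(x) ∝ exp(-U(x)),
# with U(x) = x^2 / 2 (a standard Gaussian target).
def grad_U(x):
    return x

dt = 0.01
x = 0.0
samples = []
for step in range(200000):
    x += -grad_U(x) * dt + np.sqrt(2 * dt) * rng.standard_normal()
    if step > 10000:             # discard burn-in
        samples.append(x)
samples = np.array(samples)
print(samples.mean(), samples.var())   # ≈ 0 and ≈ 1
```

The slow mixing of such chains near equilibrium is exactly what acceleration schemes (and contrastive divergence's short runs) try to sidestep.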

Ruslan Salakhutdinov, University of Toronto

Recent Advances in Deep Learning: Learning Structured, Robust, and Multimodal Deep Models

Building intelligent systems that are capable of extracting meaningful representations from high-dimensional data lies at the core of solving many Artificial Intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. In this talk I will first introduce a broad class of hierarchical probabilistic models called Deep Boltzmann Machines (DBMs) and show that DBMs can learn useful hierarchical representations from large volumes of high-dimensional data with applications in information retrieval, object recognition, and speech perception. I will then introduce deep models that are capable of extracting a unified representation that fuses together multiple data modalities and discuss models that can generate natural language descriptions (captions) of images. I will show that on several tasks, including modelling images and text, and video and sound, these models significantly improve upon many of the existing techniques.

David Schwab, Northwestern University

The Deterministic Information Bottleneck

Compression fundamentally involves a decision about what is relevant and what is not. The information bottleneck (IB) formalized this notion as an information-theoretic optimization problem and proposed an optimal tradeoff between throwing away as many bits as possible, and selectively keeping those that are most important. Here, we introduce an alternative formulation of the IB, the deterministic information bottleneck (DIB), that we argue better captures the notion of compression. As suggested by its name, the solution to the DIB problem is a deterministic encoder, as opposed to the stochastic encoder that is optimal under the IB. We then compare the IB and DIB on synthetic data, showing that the IB and DIB perform similarly in terms of the IB cost function, but that the DIB vastly outperforms the IB in terms of the DIB cost function. Our derivation of the DIB also provides a family of models which interpolates between the DIB and IB by adding noise of a particular form. We discuss the role of noise as a regularizer of the DIB.
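The DIB fixed-point equations admit a compact sketch on a toy joint distribution. The following is a hedged reading of the hard-assignment update (alternate between refitting the cluster statistics and reassigning each x to its best cluster); details may differ from the paper:

```python
import numpy as np

# Toy joint p(x, y): six x symbols, two y symbols. Symbols 0-2 share one
# conditional p(y|x), symbols 3-5 share another; a DIB-style deterministic
# encoder with two clusters should group them accordingly.
p_y_given_x = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3)
p_x = np.full(6, 1 / 6)
beta, n_t = 10.0, 2

f = np.array([0, 1, 0, 1, 0, 1])     # initial hard assignment
for _ in range(50):
    q_t = np.array([p_x[f == t].sum() for t in range(n_t)])
    q_y_t = np.zeros((n_t, 2))
    for t in range(n_t):
        if q_t[t] > 0:
            q_y_t[t] = (p_x[f == t, None] * p_y_given_x[f == t]).sum(0) / q_t[t]
        else:
            q_y_t[t] = 0.5           # uninformative fallback for empty clusters
    # Reassign: f(x) = argmax_t [ log q(t) - beta * KL(p(y|x) || q(y|t)) ]
    kl = (p_y_given_x[:, None, :] *
          np.log(p_y_given_x[:, None, :] / np.maximum(q_y_t[None], 1e-12))).sum(-1)
    f = np.argmax(np.log(np.maximum(q_t, 1e-12))[None, :] - beta * kl, axis=1)

print(f)   # symbols {0,1,2} and {3,4,5} land in different clusters
```

Unlike the stochastic IB encoder, each x is mapped to exactly one cluster, which is the defining feature of the DIB solution.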

Alexander Shapeev, Skolkovo Institute of Science and Technology

A New Class of Machine Learning Interatomic Potentials

Molecular modelling is becoming increasingly important in materials research. There, the forces on atoms are obtained either from empirical potentials or from ab initio calculations. Empirical potentials allow modelling of millions to billions of atoms, but their accuracy permits only qualitative predictions. Ab initio calculations, on the other hand, are much more accurate but are typically limited to 100-1000 atoms. Hence, a practically important problem is to bridge the gap in accuracy and efficiency between empirical potentials and ab initio calculations. Ideally, one would like interatomic potentials whose efficiency is comparable to empirical potentials and whose accuracy is comparable to ab initio calculations; at the least, potentials whose accuracy can be systematically improved at the cost of increased computational complexity. As a rule, machine learning is heavily used to construct such interatomic potentials. In my talk, I will review the state of the art in constructing such potentials and present a new approach based on linear regression and invariant polynomials. This approach can achieve an accuracy comparable to the most accurate existing schemes while being 1-2 orders of magnitude more computationally efficient.
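As a toy illustration of the linear-regression-on-invariant-polynomials idea (with a made-up energy function standing in for ab initio data, not the actual potential form of the talk): permutation invariance of the atoms is enforced by regressing on elementary symmetric polynomials of the pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "configurations": triatomic clusters described by 3 pairwise distances.
# Elementary symmetric polynomials of the distances give permutation-invariant
# regression features.
def features(d):
    e1 = d.sum()
    e2 = d[0] * d[1] + d[0] * d[2] + d[1] * d[2]
    e3 = d[0] * d[1] * d[2]
    return np.array([1.0, e1, e2, e3, e1 ** 2])

# Hypothetical reference energy (stands in for ab initio training data).
def energy(d):
    return float(np.sum((d - 1.5) ** 2) - 0.3 * np.prod(d))

D = rng.uniform(1.0, 2.0, size=(200, 3))
X = np.array([features(d) for d in D])
y = np.array([energy(d) for d in D])

# Ordinary least squares fit of the linear model.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coef
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(round(rmse, 6))
```

The target here happens to lie in the span of the chosen invariants, so the fit is essentially exact; real potentials use much richer invariant bases.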

Jinwoo Shin, KAIST

Polynomial-time Message-passing Algorithm for Minimum Weight Perfect Matching

It has been shown that max-product Belief Propagation (BP) can solve a class of Linear Programming (LP) relaxations of combinatorial optimization problems when no integrality gap is present. However, when the LP has an integrality gap, no problem has been known that can be solved systematically via sequential applications of BP. In this work, we develop the first such algorithm for solving the minimum weight matching problem over arbitrary graphs. Each step of the sequential algorithm requires applying BP over a modified graph constructed by contractions and expansions of blossoms, i.e., odd sets of vertices. For a graph G = (V,E), our scheme guarantees termination within O(|V|^2) BP runs, and each BP converges in O(|V|) iterations in the worst case. In essence, it offers a message-passing version of the celebrated Edmonds’ Blossom algorithm, jumping at once over many sub-steps with a single BP.
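For intuition about the underlying problem, a brute-force reference solver for minimum weight perfect matching on tiny graphs (the BP and blossom machinery itself is beyond a short sketch) can be written as:

```python
import itertools

def min_weight_perfect_matching(n, weights):
    """Brute-force reference over all perfect matchings of n nodes (n even).
    weights: dict mapping frozenset({u, v}) -> edge weight."""
    best = (float("inf"), None)
    nodes = list(range(n))

    def matchings(remaining):
        # Enumerate all ways to pair up the remaining nodes.
        if not remaining:
            yield []
            return
        u = remaining[0]
        for v in remaining[1:]:
            rest = [w for w in remaining if w not in (u, v)]
            for m in matchings(rest):
                yield [frozenset((u, v))] + m

    for m in matchings(nodes):
        if all(e in weights for e in m):        # skip matchings using absent edges
            cost = sum(weights[e] for e in m)
            if cost < best[0]:
                best = (cost, m)
    return best

# Small example on 4 nodes.
w = {frozenset(e): c for e, c in
     [((0, 1), 2.0), ((2, 3), 2.0), ((0, 2), 1.0), ((1, 3), 1.5),
      ((0, 3), 4.0), ((1, 2), 4.0)]}
cost, matching = min_weight_perfect_matching(4, w)
print(cost)  # 2.5 via edges {0,2} and {1,3}
```

The number of perfect matchings grows factorially, which is exactly why polynomial-time schemes like Edmonds' algorithm (and its BP version above) matter.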

David Sontag, New York University

How Hard is Inference for Structured Prediction?

Many machine learning tasks can be posed as structured prediction, where the goal is to predict a labeling or structured object. For example, the input may be an image or a sentence, and the output is a labeling such as an assignment of each pixel in the image to foreground or background, or the parse tree for the sentence. Despite marginal and MAP inference for many of these models being NP-hard in the worst-case, approximate inference algorithms are remarkably successful and as a result structured prediction is widely used. What makes these real-world instances different from worst-case instances? One key difference is that in all of these applications, there is an underlying "ground truth" which structured prediction is aiming to find. In this talk, I will introduce a new theoretical framework for analyzing structured prediction algorithms in terms of their ability to achieve small Hamming error. We study the computational and statistical trade-offs that arise in this setting, and illustrate a setting where polynomial-time algorithms can perform optimal prediction, despite the corresponding MAP inference task being NP-hard.

Naftali Tishby, Hebrew University

Deep Learning and the Information-Bottleneck phase transitions

Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB). We first show that any DNN can be quantified by the mutual information between its layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite-sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network’s simplicity. We argue that the optimal architecture, i.e., the number of layers and the features/connections at each layer, is related to critical points on the information bottleneck tradeoff line, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations of the layered network naturally correspond to structural phase transitions along the information curve. An interesting class of solvable DNNs arises by applying this framework to the case of symmetries in the supervised learning task. The case of translation invariance leads to the familiar convolutional neural networks. Other symmetry group chains yield different types of bifurcation diagrams and network architectures. These new insights also suggest new sample complexity bounds, architecture design principles (number and widths of layers), and entirely different deep learning algorithms. Based on joint work with Noga Zaslavsky and Ravid Ziv.

Eric Vanden-Eijnden, Courant Institute, NYU

Markov State Models

Markov State Models have emerged as a popular way to interpret and analyze time-series data generated, e.g., in the context of molecular dynamics simulations. The basic idea of these models is to represent the dynamics of the system as memoryless jumps between predefined sets in the system's state space, i.e., as a Markov jump process (MJP). Performing this mapping typically involves two nontrivial questions: first, how to define these sets, and second, how to learn the rate matrix of the MJP once the sets have been identified. Both questions will be discussed in this talk, with emphasis on the problem of model specification error.
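The second question has a simple maximum-likelihood answer at the level of the discrete-time transition matrix: count observed jumps and normalize the rows. A sketch on a synthetic two-state trajectory (converting to the continuous-time rate matrix, e.g., via a matrix logarithm, is omitted here):

```python
import numpy as np

def estimate_transition_matrix(traj, n_states):
    """Maximum-likelihood transition matrix from a discrete trajectory:
    count observed jumps, then normalize each row."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(traj[:-1], traj[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0   # avoid division by zero for unvisited states
    return counts / rows

# Synthetic two-state trajectory generated from a known chain.
rng = np.random.default_rng(2)
P_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
traj = [0]
for _ in range(20000):
    traj.append(rng.choice(2, p=P_true[traj[-1]]))

P_hat = estimate_transition_matrix(traj, 2)
print(np.round(P_hat, 2))
```

With enough data the estimate concentrates around the true chain; the harder, model-specification question of how to choose the sets has no such closed-form answer.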

Marc Vuffray, Los Alamos National Laboratory

Efficient Structure and Parameter Learning of Ising Models: A New Approach

An Ising model is a probability distribution over binary variables that is specified by a graph. The vertices represent the variables, and each edge carries a positive weight that is a function of the variables at its endpoints. The probability distribution is proportional to the product of the weights associated with the edges. As the variables are binary, each weight is fully characterized by a single real parameter called a coupling. We consider the learning problem in which an observer receives a collection of i.i.d. samples from an Ising model and wants to reconstruct the edges and the couplings. Our contribution lies in a new reconstruction method that is efficient both computationally and sample-wise. Our method is based on a new statistical estimator of the couplings inspired by physical considerations. The estimator is the result of a convex optimization program. It is consistent and unbiased, and we show that with proper regularization it estimates the couplings with a number of samples that is small with respect to the system size and the coupling strengths.
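A minimal sketch of such a screening-type convex estimator, assuming the specific exponential objective below (an interpretation consistent with the abstract, not necessarily the authors' exact estimator), on an exactly sampled 3-spin model:

```python
import numpy as np

rng = np.random.default_rng(3)

# Exact sampling from a 3-spin Ising model by enumerating all 8 states.
J_true = {(0, 1): 0.8, (1, 2): -0.6, (0, 2): 0.0}
states = np.array([[s0, s1, s2] for s0 in (-1, 1)
                   for s1 in (-1, 1) for s2 in (-1, 1)], dtype=float)
E = np.array([sum(J_true[e] * s[e[0]] * s[e[1]] for e in J_true)
              for s in states])
p = np.exp(E)
p /= p.sum()
samples = states[rng.choice(8, size=5000, p=p)]

# Screening objective for spin i = 0: mean of exp(-s_0 * sum_j J_j s_j),
# a convex function of J, minimized by plain gradient descent.
i = 0
others = [1, 2]
J = np.zeros(len(others))

def objective(J):
    return float(np.mean(np.exp(-samples[:, i] * (samples[:, others] @ J))))

obj0 = objective(J)
lr = 0.2
for _ in range(300):
    w = np.exp(-samples[:, i] * (samples[:, others] @ J))
    grad = np.mean((-samples[:, i])[:, None] * samples[:, others] * w[:, None],
                   axis=0)
    J -= lr * grad
obj1 = objective(J)
print(np.round(J, 2), round(obj1, 3))
```

The recovered J should approach the true couplings of spin 0 (0.8 to spin 1, 0.0 to spin 2); the population minimizer of this objective coincides with the true couplings.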

Jonathan Yedidia, Analog Devices Lyric Labs

Combining Inference with Learning Using Memory Factor Networks

We introduce a new type of graphical model that we call a “memory factor network” (MFN). We show how to use MFNs to model the structure inherent in many types of data sets. We also introduce an associated message-passing style algorithm called “proactive message passing” (PMP) that performs inference on MFNs. PMP comes with convergence guarantees and is efficient in comparison to competing algorithms such as variants of belief propagation. We specialize MFNs and PMP to a number of distinct types of data (discrete, continuous, labelled) and inference problems (interpolation, hypothesis testing), provide examples, and discuss approaches for efficient implementation.

Riccardo Zecchina, Politecnico di Torino

Efficient learning and optimization algorithms from local entropy maximization

We will discuss the role that subdominant states play in the design of algorithms for large-scale optimization problems. We shall take as a representative case the problem of learning random patterns with binary synapses in single layer networks. The standard statistical physics results show that this problem is exponentially dominated by isolated solutions that are extremely hard to find algorithmically. By a novel large deviation method we find unexpected analytical evidence for the existence of subdominant and extremely dense regions of solutions. Numerical experiments confirm these findings. We also show that the dense regions are surprisingly accessible by simple learning protocols, and that these synaptic configurations are robust to perturbations and generalize better than typical solutions. These outcomes extend to synapses with multiple states and to deeper neural architectures. The large deviation measure we introduced for the analytic study also suggests how to design general optimization algorithms based on local entropy maximization.

Amanda Ziemann, Los Alamos National Laboratory

The challenge of variability: detecting materials of interest in remotely sensed hyperspectral imagery

Spectral imagery collected from satellites and airborne platforms provides an important tool for remotely analyzing the content of a scene. In particular, the ability to remotely detect a specific material within a scene is of critical importance in nonproliferation applications. Hyperspectral imaging sensors measure, at each pixel, a radiance spectrum with up to hundreds of distinct wavelength channels; this high-dimensional spectral information allows for pixel-level material discrimination. In target detection, the goal is to distinguish target materials from image background clutter, and the challenge is that both the target and background exhibit considerable spectral variability: target spectra have within-material variability, and background spectra have variability from material clutter. Target variability is often due to known physical effects (e.g., particle size, packing density, and moisture content) that can be explicitly modeled. Background variability is far more heterogeneous and is typically characterized with statistical or geometrical models in a high-dimensional feature space. Effectively distinguishing target from background requires that both of these variabilities be taken into account, i.e., the physically modeled variability of the target and the statistical variability of the background. This can be done using physics-informed machine learning. In this poster, we will describe various methodologies, from a simple subspace Gaussian model to a sophisticated graph-based manifold approach, for detecting and labeling physically-varying targets in statistically-varying backgrounds. Results will be shown on a ground-truthed hyperspectral image for two different target materials.

Jennifer Annoni, University of Minnesota

A New Approach to Reduced-Order Modeling for Flow Control

The objective of this work is to obtain linear reduced-order models that can be used for controller design. This technique has specifically been applied to fluid flows. High-fidelity computational fluid dynamic models provide accurate characterizations of complex flow dynamics but are not suitable for control design due to their prohibitive computational complexity. A variety of methods, including proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD) can be used to extract the dominant flow structures and obtain reduced-order models of the flow. In this presentation, we introduce a new method, termed input-output dynamic mode decomposition (IODMD), that uses a subspace identification technique to obtain models of low-complexity. We show that, relative to standard DMD, the introduction of an external forcing in IODMD provides robustness with respect to small disturbances and noise in the model. This method captures the dominant characteristics of the flow as well as the input/output behavior of the system. Specifically, IODMD is applied to high-fidelity simulations of wind farm flow to obtain a reduced-order model that can be used for wind farm control.
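Standard (unforced) DMD, which IODMD extends with an input term, reduces to a least-squares fit of a one-step linear propagator from snapshot pairs. A sketch on synthetic linear data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground-truth linear dynamics x_{k+1} = A x_k (standing in for flow snapshots).
A_true = np.array([[0.9, 0.2],
                   [-0.1, 0.8]])
X = [rng.standard_normal(2)]
for _ in range(20):
    X.append(A_true @ X[-1])
X = np.array(X).T                      # 2 x 21 snapshot matrix

# Exact DMD: least-squares fit of the one-step propagator from snapshot pairs.
X0, X1 = X[:, :-1], X[:, 1:]
A_dmd = X1 @ np.linalg.pinv(X0)
print(np.round(A_dmd, 3))
```

On noiseless linear data the fit recovers the true propagator exactly; IODMD's external forcing term adds robustness when the snapshots are noisy or partially observed.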

Amir Ghasemianlangroodi, University of Colorado Boulder

Detectability Limit in Dynamic Networks

The detectability limit is an interesting open question in the extensively studied community detection problem. In this work we study the detectability limit in dynamic networks. Our model for dynamic networks (a special case of many other models) characterizes the interactions independently at each time slot through the standard stochastic block model (SBM), while the role assignment of each node evolves as a Markov chain across the time dimension. Using this model we find the detectability threshold of dynamic networks as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no efficient algorithm can identify the communities better than chance. We also propose two algorithms that are optimal in the sense that they succeed all the way down to this threshold. The first uses belief propagation to infer the role assignments, which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm based on linearizing the BP equations around the trivial fixed-point solutions.
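For reference, the static-SBM detectability condition (the Kesten-Stigum threshold), which the dynamic threshold generalizes with an additional dependence on the Markov rate of change, can be checked with a one-liner:

```python
import math

def detectable(c_in, c_out, q=2):
    """Kesten-Stigum condition for the static q-group stochastic block model
    with average within/between-group degrees c_in and c_out: communities are
    detectable by efficient algorithms iff |c_in - c_out| > q * sqrt(c),
    where c is the average degree."""
    c = (c_in + (q - 1) * c_out) / q
    return abs(c_in - c_out) > q * math.sqrt(c)

print(detectable(8, 2))   # strongly separated groups: detectable
print(detectable(5, 4))   # nearly uniform graph: undetectable
```

In the dynamic setting, correlations between consecutive time slots effectively add information, so slowly changing communities can be detected below the static threshold.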

Varun Jog, University of Pennsylvania

Persistence of centrality in random growing trees

We investigate properties of node centrality in random growing tree models. We focus on a measure of centrality that computes the maximum subtree size of the tree rooted at each node, with the most central node being the tree centroid. For random trees grown according to a preferential attachment model, a uniform attachment model, or a diffusion process over a regular tree, we prove that a single node persists as the tree centroid after a finite number of steps, with probability 1. Furthermore, this persistence property generalizes to the top K ≥ 1 nodes with respect to the same centrality measure. We also establish necessary and sufficient conditions on the size of an initial seed graph required to ensure persistence of a particular node with probability 1 − ε, as a function of ε. In the case of the preferential and uniform attachment models, we derive bounds for the size of an initial hub constructed around the special node. In the case of a diffusion process over a regular tree, we derive bounds for the radius of an initial ball centered around the special node. Our necessary and sufficient conditions match up to constant factors for the preferential attachment and diffusion tree models.
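The centrality measure in question is easy to state in code: remove each node in turn, take the largest resulting component (the maximum subtree size), and pick the node minimizing that quantity. A brute-force sketch for small trees:

```python
def centroid(adj):
    """Return the node(s) minimizing the maximum subtree size when the tree
    is rooted there, i.e. the tree centroid. adj: dict node -> neighbor list."""
    n = len(adj)
    best_val, best_nodes = n + 1, []
    for root in adj:
        # Size of each neighbor's component when `root` is removed.
        def comp_size(start, banned):
            seen, stack = {banned, start}, [start]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
            return len(seen) - 1   # exclude the removed root
        worst = max((comp_size(nb, root) for nb in adj[root]), default=0)
        if worst < best_val:
            best_val, best_nodes = worst, [root]
        elif worst == best_val:
            best_nodes.append(root)
    return best_nodes

# Path 0-1-2-3-4: the middle node 2 is the unique centroid.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(centroid(path))  # [2]
```

The persistence results in the abstract say that, as the random tree grows, the output of this computation eventually stops changing.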

Ryan King, University of Colorado at Boulder

Autonomic Machine Learning Closure for Turbulence Simulations

We present a reproducing kernel formulation of the recently introduced autonomic closure for coarse-grained turbulence simulations [1,2]. The autonomic closure is a data-driven approach that eliminates the need to specify a predefined turbulence model and instead provides fully adaptive, self-optimizing turbulence closures on the fly without external training data. This is accomplished by posing a supervised learning problem based on test filtering in the self-similar turbulent inertial range. The supervised learning problem is solved using reproducing kernel Hilbert spaces to efficiently find high- or even infinite-dimensional turbulence closures. This nonparametric approach allows the autonomic closure to freely adapt to varying nonlinear, nonlocal, nonequilibrium, and other turbulence characteristics of the flow. Initial results with the autonomic closure prove remarkably more accurate in a priori tests than dynamic versions of traditional prescribed closures, with limited computational penalties in the kernelized formulation.

Remi Lam, Massachusetts Institute of Technology

Optimization Framework for Data-driven Improvement of Physics-based Models.

Mathematical models are extensively used for diverse tasks including analysis, optimization, and decision making. However, those models are not perfect representations of reality: simplified physical descriptions are used (simplified governing equations, boundary conditions, etc.), and numerical approximations are made (discretization, linearization, round-off error, etc.). Such misspecifications can lead to erroneous or suboptimal decisions for the end-goal task one aims to solve. To mitigate this effect, one can adjust the available model using limited data produced by experiments or higher fidelity numerical models. A large body of research has focused on estimating explicit model parameters. This work takes a different perspective and targets construction of a correction operator with implicit properties. We investigate the case where the end goal is an inversion problem and illustrate how appropriate choices of structure lead to improved end-goal insight.

Li Li, University of California, Irvine

Understanding Machine-Learned Density Functionals

Machine learning (ML) is an increasingly popular statistical tool for analyzing either measured or calculated data sets. Here we explore its application to a well-defined physics problem, investigating issues of how the underlying physics are handled by ML, and how self-consistent solutions can be found by limiting the domain in which ML is applied. The particular problem is how to find accurate approximate density functionals for the kinetic energy of non-interacting electrons. Kernel ridge regression is used to approximate the kinetic energy of non-interacting fermions in a one dimensional box as a functional of their density. The properties of different kernels and methods of cross-validation are explored, reproducing the physics faithfully in some cases, but not others. We also address how self-consistency can be achieved with information on only a limited electronic density domain. Accurate constrained optimal densities are found via a modified Euler-Lagrange constrained minimization of the machine-learned total energy, despite the poor quality of its functional derivative. A projected gradient descent algorithm is derived using local principal component analysis. Additionally, a sparse grid representation of the density can be used without degrading the performance of the methods. The implications for machine-learned density functional approximations are discussed.
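A minimal kernel ridge regression sketch in the same spirit (a toy 1D target standing in for the density-to-kinetic-energy map, not the actual functional data):

```python
import numpy as np

def rbf_kernel(A, B, length=0.3):
    # Gaussian (RBF) kernel between two 1D point sets.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * length ** 2))

# Toy 1D regression problem.
x_train = np.linspace(0, 1, 40)
y_train = np.sin(2 * np.pi * x_train)

lam = 1e-6                        # ridge regularization strength
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

# Predict at held-out points and measure the worst-case error.
x_test = np.linspace(0.05, 0.95, 25)
y_pred = rbf_kernel(x_test, x_train) @ alpha
err = float(np.max(np.abs(y_pred - np.sin(2 * np.pi * x_test))))
print(round(err, 6))
```

The kernel length scale and regularization strength play the role of the hyperparameters whose cross-validation the abstract discusses; the harder issue it raises, the poor quality of the learned functional derivative, does not show up in prediction error alone.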

Nicholas Lubbers, Boston University

Deep Learning, Image Processing, and Materials Microstructure

In the past ten years, significant improvements in machine learning and pattern recognition have been enabled by novel training algorithms and architectures for artificial neural networks. These techniques are the current state of the art for computer vision, including image recognition and image captioning tasks. We explore the scope of these algorithms and the potential for turning them towards scientific domains, in particular the characterization of materials microstructure.

Sergey Matveev, Lomonosov Moscow State University, Skolkovo Institute of Science and Technology

TT-approximation technique for acceleration of the finite-difference scheme solving multicomponent coagulation equation

Numerical solution of the multicomponent Smoluchowski coagulation equation is a complex problem of numerical mathematics. The most widespread technique for the mathematical modelling of the coagulation process is the Monte Carlo methodology; classical, highly accurate finite-difference schemes are impractical in this application due to their extremely high algorithmic complexity. In this work we apply low-rank TT-format approximations of both the solution and the coagulation kernel, together with fast TT arithmetic, to implement the time steps of the classical Runge-Kutta method. The resulting acceleration allows us to claim that we have constructed a novel numerical method that is more efficient than the known Monte Carlo technique.

Chihiro Nakajima, WPI-AIMR, Tohoku University

3D image reconstruction of a gold nano-cluster by compressed sensing

Recently, atomic-scale discrete computed tomography (CT) with transmission electron microscopy (TEM) is becoming possible. However, it is difficult to obtain many projected images of a single object from various directions, since irradiation by high-energy electrons causes desorption of atoms. We achieved 3D-image reconstruction of a nano-scale object from only a few (three) projected images by applying L1-norm minimization in the reconstruction algorithm. The poster demonstrates the reconstruction of the atomic arrangement of a gold nano-porous cluster from experimental data.

Dave Osthus, Los Alamos National Laboratory

Probabilistic Forecasting of Seasonal Influenza in the US

Seasonal influenza is a serious public health and societal problem due to its consequences resulting from absenteeism, hospitalizations, and deaths. The overall burden of influenza is captured by the Centers for Disease Control and Prevention's (CDC) influenza-like illness (ILI) network, which provides invaluable information about the current incidence. Despite the relatively rich surveillance data and the recurrent nature of seasonal influenza, forecasting the timing and intensity of seasonal influenza in the US represents a significant challenge. This is because the form of the disease transmission process is uncertain, the disease dynamics are only partially observed, and the public health observations are noisy. In this work, we model the data, disease transmission process, discrepancy, and parameters jointly in a Bayesian probabilistic model. This is a promising approach for forecasting seasonal influenza, as it allows us to impose constraints based on historical flu seasons. Accounting for multiple modes of uncertainty and systematic model discrepancy results in a probabilistic, prospective forecast of the current flu season, available at http://bsvgateway.org/flu/.

Fatih G. Sen, Argonne National Laboratory

A comparative study of optimization methods for force field fitting

Large-scale atomistic simulations using classical molecular dynamics/molecular mechanics (MD/MM) methods enable modeling of materials, biological systems, and chemical reactions at surfaces/interfaces that are not easily accessible with experiments. The predictive capability and performance of MD/MM methods rely greatly on the empirical force field (EFF) used, which describes all interatomic interactions. The EFF parameters are generally fitted using least-squares local minima search algorithms to represent interatomic interactions and material properties obtained from quantum-mechanical methods such as density functional theory (DFT), as well as available experimental data. Evolutionary algorithms such as the genetic algorithm (GA) enable finding the global optimum of the parameter space, even for EFFs with very complex functional forms. However, systematic studies of the efficiency and accuracy of different optimization techniques for the determination of EFF parameters are still lacking. In the present work, we test the efficiency and effectiveness of a variety of global and local optimization schemes in the determination of EFF parameters. We first generated a training data set using DFT for the Ir-O system, which included pertinent structural and thermodynamic properties for a variety of structures. The EFF functional form was chosen to be a Morse potential for the Ir-O system. The optimization of the EFF parameters against the training data was carried out using (1) multi-start local optimization with common techniques including Simplex and Levenberg-Marquardt, and (2) a single-objective GA. Using random search as a baseline, we compared the effectiveness and efficiency of the algorithms in terms of reaching the lowest error in representing the DFT data and the number of function evaluations. We also investigated the use of multi-objective GA (MOGA) to remove ambiguities in the selection of a weighting scheme when defining the fitness function. We analyzed the accuracy and performance of MOGA in finding Pareto-optimal solutions with respect to population size and number of objectives. Overall, this study provides a systematic approach for selecting a suitable optimization method to determine optimal EFF parameters for use in classical simulations, which in turn enables more accurate prediction of material properties and chemical phenomena.
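As a self-contained illustration of the fitting problem (with made-up Morse parameters standing in for the DFT training data, and a coarse grid search standing in for the optimizers compared in the study):

```python
import numpy as np

# Morse pair potential with depth D, width a, and equilibrium distance r0.
def morse(r, D, a, r0):
    return D * (1.0 - np.exp(-a * (r - r0))) ** 2 - D

# Hypothetical training data (stands in for DFT energies of a dimer).
r = np.linspace(1.5, 4.0, 60)
y = morse(r, D=2.0, a=1.3, r0=2.0)

# Coarse global grid search over the three parameters, the kind of baseline
# against which multi-start local optimizers and GAs are compared.
best = (np.inf, None)
for D in np.linspace(1.0, 3.0, 21):
    for a in np.linspace(0.5, 2.0, 16):
        for r0 in np.linspace(1.6, 2.4, 17):
            err = float(np.mean((morse(r, D, a, r0) - y) ** 2))
            if err < best[0]:
                best = (err, (D, a, r0))

mse, (D_fit, a_fit, r0_fit) = best
print(round(mse, 8), round(D_fit, 2), round(a_fit, 2), round(r0_fit, 2))
```

Grid search scales poorly with the number of parameters, which is why GAs and multi-start local methods are the practical choices for real EFF functional forms.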

Miles Stoudenmire, Perimeter Institute for Theoretical Physics

Supervised Learning with Tensor Networks

We propose an approach to supervised learning based on tensor network approximations used in physics. First we map input vectors to an exponentially larger 'quantum state' space. Then we use the overlap of these mapped vectors with 'classifier tensors' to decide which class they belong to. Crucially, we approximate the classifier tensors as matrix product states (a.k.a. tensor trains). Such a tensor network approximation makes our approach computationally efficient and regularizes our models. In future work, we envision using even more powerful classes of tensor networks and applying insights from physics to better understand the process of learning.
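The matrix product state (tensor train) approximation at the heart of this approach can be sketched with sequential SVDs; here on a generic vector rather than a mapped input, with the truncation rank as a free parameter:

```python
import numpy as np

rng = np.random.default_rng(6)

def tensor_train(vec, dims, max_rank=4):
    """Decompose a vector, viewed as a tensor of shape `dims`, into
    tensor-train (matrix product state) cores via sequential SVDs."""
    cores, rank = [], 1
    rest = np.asarray(vec, dtype=float)
    for k in range(len(dims) - 1):
        rest = rest.reshape(rank * dims[k], -1)
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        r = min(max_rank, len(S))          # truncate to the leading singular values
        cores.append(U[:, :r].reshape(rank, dims[k], r))
        rest = np.diag(S[:r]) @ Vt[:r]
        rank = r
    cores.append(rest.reshape(rank, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the cores back into the full tensor, then flatten.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape(-1)

vec = rng.standard_normal(16)
cores = tensor_train(vec, (2, 2, 2, 2))
err = float(np.max(np.abs(tt_reconstruct(cores) - vec)))
print(round(err, 12))
```

For this small example the ranks never exceed the cap, so the reconstruction is exact; in the learning setting, capping the rank is precisely what makes the classifier tensors tractable and acts as a regularizer.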

Koujin Takeda, Ibaraki University

Application of partial parallel interference cancellation to sparse signal recovery

We propose a scheme for constructing sparse signal recovery algorithms in compressed sensing under a general measurement matrix, by applying the idea of partial parallel interference cancellation from CDMA demodulation to iterative soft thresholding. As a result, we arrive at a novel algorithm that is equivalent to approximate message passing (AMP) under an appropriate assumption on the measurement matrix. This means that our algorithm is a generalization of AMP. In addition, we verify via numerical experiments that the proposed algorithm shows better convergence than the original AMP when the measurement matrix does not satisfy the assumption of AMP.
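The iterative soft thresholding baseline that the proposed scheme modifies can be sketched as plain ISTA on a synthetic sparse recovery problem (the interference-cancellation correction itself is not shown):

```python
import numpy as np

rng = np.random.default_rng(7)

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Sparse recovery problem: y = A x, where x has only 3 nonzeros.
n, m, k = 100, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k) + 3.0
y = A @ x_true

# ISTA for min_x ||y - A x||^2 / 2 + lam * ||x||_1.
lam = 0.01
L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
x = np.zeros(n)
for _ in range(3000):
    x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)

err = float(np.max(np.abs(x - x_true)))
print(round(err, 3))
```

AMP accelerates this iteration with an Onsager correction term that is valid for i.i.d. measurement matrices; the abstract's algorithm targets matrices where that assumption fails.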

Dominik Thalmeier, Radboud University Nijmegen

Towards Optimal Control of Network Structure Growth

We consider the problem of influencing the growth of a network to optimize some observable that depends on the network structure. We formulate this problem as a Kullback-Leibler control problem and show some interesting properties. First, we show that this approach captures complex phenomena such as transitions between qualitatively different optimal solutions and delayed choice mechanisms. Second, we propose an annealed importance sampling method to approximate the optimal control. Our approach uses network features and can efficiently address high dimensional problems. Finally, we apply this framework to influence the online commenting behaviour in Internet forums, using realistic models of cascade growth. We incorporate a realistic user model to account for the limited controllability that characterizes these problems and show that the proposed approach improves complex measures that depend on the structure of the observed discussion threads.

Ching-Hao Wang, Boston University

A statistical physics approach to designing synthetic signaling cascades with message-passing algorithms

Cells have evolved intricate signaling pathways to respond to multiple inputs. The process of signal transduction can be understood as a message relay between intracellular effectors and the extracellular cues. Here we present an algorithmic way that combines message-passing algorithm with a statistical-mechanical model of the underlying biophysics to investigate the computational capability a general signaling network entails. We demonstrate the utility of this approach using an example from synthetic biology: designing a synthetic kinase cascade that implements an arbitrary decision surface. We conclude by discussing the implication of this approach for synthetic biology and the possibility of designing new machine learning algorithms inspired by cellular signaling networks.


This event is brought to you by the Center for Nonlinear Studies at Los Alamos National Laboratory.

This event has been declared "Open to the Public - with Registration".
