These files have been approved for public release. Distribution is unlimited. LA-UR-16-20194.

Seminar Abstracts

Madhu Advani, Stanford University

Statistical Physics of High Dimensional Inference

To model modern large-scale
datasets, we need efficient algorithms to infer a set of P unknown model
parameters from N noisy measurements. What are the
fundamental limits on the accuracy of parameter inference given limited
measurements, signal-to-noise ratios, prior information, and
computational tractability requirements? How can we
combine prior information with measurements to achieve these limits?
Classical statistics gives incisive answers to these questions as the
measurement density (N divided by P) approaches infinity. However,
modern high-dimensional inference problems, in fields ranging from
bioinformatics to economics, occur at finite measurement density. We
formulate and analyze high-dimensional inference analytically by
applying the replica and cavity methods of statistical physics where
data serves as quenched disorder and inferred parameters play the role
of thermal degrees of freedom. Our analysis reveals that widely
cherished Bayesian inference algorithms such as maximum likelihood and
maximum a posteriori are suboptimal in the modern setting, and yields
new tractable, optimal algorithms to replace them as well as novel
bounds on the achievable accuracy of a large class of high-dimensional
inference algorithms.

Anima Anandkumar, University of California Irvine

Tensor Methods: A new paradigm for guaranteed learning of probabilistic models and neural networks

Tensors are rich structures for
modeling complex higher-order relationships in data-rich domains such as
social networks, computer vision, the internet of things, and so on. Tensor
decomposition methods are embarrassingly parallel and scalable to
enormous datasets. They are guaranteed to converge to the global optimum
and yield consistent estimates for many probabilistic models such as
topic models, community models, hidden Markov models, and so on. I will
also demonstrate how tensor methods can yield rich discriminative
features for classification tasks and provide a guaranteed method for
training neural networks.
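To make the decomposition at the heart of these methods concrete, here is a minimal CP (CANDECOMP/PARAFAC) decomposition of a 3-way tensor by alternating least squares, written in plain NumPy. The rank, iteration count, and unfolding conventions are illustrative choices only, not the guaranteed algorithms of the talk.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R)."""
    I, R = A.shape
    J = B.shape[0]
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def cp_als(T, rank, n_iter=200, seed=0):
    """Rank-`rank` CP decomposition of a 3-way tensor T by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    # Mode-n unfoldings of T (C-order: T0[i, j*K + k] = T[i, j, k], etc.)
    T0 = T.reshape(I, J * K)
    T1 = np.moveaxis(T, 1, 0).reshape(J, I * K)
    T2 = np.moveaxis(T, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        # Each step is a linear least-squares solve with the other factors fixed
        A = T0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = T1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = T2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

For a tensor that is exactly low rank, the factors reconstruct it to numerical precision; the guarantees discussed in the talk concern when such decompositions recover latent model parameters.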

Jennifer Annoni, University of Minnesota

Poster-A New Approach to Reduced-Order Modeling for Flow Control

The objective of this work is to obtain linear reduced-order models that can be used for controller design. This technique has specifically been applied to fluid flows. High-fidelity
computational fluid dynamic models provide accurate characterizations
of complex flow dynamics but are not suitable for control design due to
their prohibitive computational complexity. A variety of
methods, including proper orthogonal decomposition (POD) and dynamic
mode decomposition (DMD), can be used to extract the dominant flow
structures and obtain reduced-order models of the flow. In
this presentation, we introduce a new method, termed input-output
dynamic mode decomposition (IODMD), that uses a subspace identification
technique to obtain low-complexity models. We show
that, relative to standard DMD, the introduction of an external forcing
in IODMD provides robustness with respect to small disturbances and
noise in the model. This method captures the dominant characteristics of the flow as well as the input/output behavior of the system. Specifically,
IODMD is applied to high-fidelity simulations of wind farm flow to
obtain a reduced-order model that can be used for wind farm control.
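For readers unfamiliar with DMD, its core computation is a low-rank linear fit between successive snapshot matrices; the NumPy sketch below shows standard DMD only (IODMD additionally regresses on an input/forcing snapshot matrix, which is omitted here).

```python
import numpy as np

def dmd(X, Y, r):
    """Exact DMD: fit the best rank-r linear map with Y ~ A X, where the columns
    of X are state snapshots and the columns of Y are the states one step later."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    Atilde = U.conj().T @ Y @ Vh.conj().T / s   # projection of A onto leading POD modes
    eigvals, W = np.linalg.eig(Atilde)          # DMD eigenvalues (discrete-time)
    modes = Y @ Vh.conj().T / s @ W             # exact DMD modes
    return eigvals, modes
```

For data generated by a linear system, the DMD eigenvalues recover the system's eigenvalues exactly, which is what makes the reduced-order model usable for control design.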

Guy Bresler, Massachusetts Institute of Technology

Learning a Tree-Structured Ising Model in Order to Make Predictions

We study the problem of learning a
tree graphical model from samples such that low-order marginals are
accurate. We define a distance ("small set TV" or ssTV) between
distributions P and Q by taking the maximum, over all subsets S of a
given size, of the total variation between the marginals of P and Q on
S. Approximating a distribution to within small ssTV allows making
predictions based on partial observations. Focusing on pairwise
marginals and tree-structured Ising models on p nodes, we give an
algorithm that produces a distribution (from the same class) with ssTV
at most eta using log p / eta^2 IID samples. We also prove a matching
lower bound showing that this sample complexity is optimal. In contrast
to the problem of learning the tree structure itself, there is no
dependence on the strength of edge interactions. Thus, even when there
are far too few samples to recover the correct tree, it is possible to
learn an incorrect tree that is useful. The time complexity of the
proposed algorithm is on the order of p^2, dominated by the computation
of pairwise correlations. Joint work with Mina Karzand.
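The ssTV distance itself is easy to state in code. The brute-force computation below works for small distributions over {0,1}^n stored as full probability tables; it is exponential in n and purely illustrative (the algorithm in the talk never enumerates subsets this way).

```python
import itertools
import numpy as np

def marginal(p, S):
    """Marginal of a distribution over {0,1}^n (array of shape (2,)*n) on coordinate set S."""
    axes = tuple(i for i in range(p.ndim) if i not in S)
    return p.sum(axis=axes)

def sstv(p, q, k):
    """Small-set TV: max over size-k subsets S of the total variation between
    the marginals of p and q on S."""
    best = 0.0
    for S in itertools.combinations(range(p.ndim), k):
        tv = 0.5 * np.abs(marginal(p, S) - marginal(q, S)).sum()
        best = max(best, tv)
    return best
```

For example, if q differs from the uniform distribution only by biasing one coordinate, every subset containing that coordinate attains the maximum, so ssTV equals the TV of that single-coordinate marginal.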

Joachim Buhmann, ETH Zurich

Information theory of algorithms: How precisely should we compute?

Algorithms map input spaces to
output spaces where inputs are possibly affected by fluctuations. Besides
run time and memory consumption, an algorithm might be characterized by
its sensitivity to the signal in the input and its robustness to input
fluctuations. The achievable precision of an algorithm, i.e., the
attainable resolution in output space, is determined by its capability
to extract predictive information in the input relative to its output. I
will present an information theoretic framework for algorithm analysis
based on maximum entropy estimation where an algorithm is characterized
as computational evolution of a (possibly contracting) posterior
distribution over the output space. The tradeoff between precision and
stability is controlled by an input sensitive generalization capacity
(GC). GC measures how much the posteriors on two different problem
instances agree despite the input noise. For optimization problems in
the two instance setting, GC is calculated by estimating the free
energies of the two instance costs and of the joint costs. Thereby, GC
objectively ranks different algorithms for the same data processing task
based on the bit rate of their respective capacities. Information
theoretic algorithm selection is demonstrated for minimum spanning tree
algorithms and for greedy MaxCut algorithms. The method can rank
centroid based and spectral clustering methods, e.g. k-means, pairwise
clustering, normalized cut, adaptive ratio cut and dominant set
clustering.

Kieron Burke, University of California Irvine

Machine learning density functionals

In chemistry and materials science,
the electronic structure problem (finding the ground-state energy of
electrons bound to nuclei) is incredibly important. More than 30,000
papers used density functional theory to solve this problem last year. The
theory component of the Materials Genome Initiative involves creating
huge databases of DFT materials calculations and searching for desired
properties. In contrast, with friends from the machine-learning
community, I have been trying to construct approximate density
functionals using ML, in order to overcome the limitations of our
present approximations. In some ways, ML is the perfect tool for this procedure. In other ways, it is a nightmare. I will present our latest results. We could use any help we can get.

Deepjyoti Deka, Los Alamos National Laboratory

Learning and Estimation Problems in Radial Distribution Networks

Distribution Networks provide the
final tier in the transfer of electricity from generators to the end
consumers. Traditionally, distribution networks have been equipped with
limited real-time monitoring devices and have low observability.
However, there has been a recent surge in the deployment of smart meters
in distribution networks for use in demand response, improved
home-monitoring and other smart grid inspired topics. We
discuss learning and estimation problems in the distribution grid using
available phasor and voltage magnitude data. For a wide range of
realistic operating conditions, polynomial time algorithms are proposed
to learn the structure of the distribution network as well as to
estimate the statistics of load profiles at the constituent nodes and/or
line parameters. Further, the algorithms are extended to cases with
missing data, where observations limited to a fraction of the nodes are
available for structure learning. This work can be applied to improve
control and optimize operations in the grid as well as to understand the
scope of adversarial attacks. Current extensions include learning
techniques in three phase systems and intersection with graphical model
based learning. This is joint work with S. Backhaus and M. Chertkov.

Karthik Duraisamy, University of Michigan

Inference and Machine Learning for Turbulence Modeling

The recent acceleration in
computational power and measurement resolution has made possible the
availability of extreme scale simulations and data sets. In this work, a
modeling paradigm that seeks to comprehensively harness large scale
data is introduced, with the aim of improving closure models. Full-field
inversion (in contrast to parameter estimation) is used to obtain
corrective, spatially distributed functional terms, offering a route to
directly address model-form errors. Once the inference has been
performed over a number of problems that are representative of the
deficient physics in the closure model, machine learning techniques are
used to reconstruct the model corrections in terms of variables that
appear in the closure model. These machine-learned functional forms are
then used to augment the closure model in predictive computations. The
approach is demonstrated to be able to successfully reconstruct
functional corrections and yield predictions with quantified
uncertainties in a range of turbulent flows.

David Gamarnik, Massachusetts Institute of Technology

(Arguably) Hard on Average Optimization Problems and the Overlap Gap Property

Many problems in the area of random
combinatorial structures and high-dimensional statistics exhibit an
apparent computational hardness, even though formal results
establishing such hardness are lacking. Examples include the problem of
finding an independent set of a random graph, finding a proper coloring
of a random graph, and the problem of finding the largest submatrix of a
random matrix. Inspired by insights from statistical physics, we
propose a general conjecture regarding the onset of hardness for these
types of problems formulated in terms of a certain overlap gap property.
We conjecture that such problems become hard on average when the space
of overlaps becomes disconnected in an appropriate sense (the model
exhibits an overlap gap), and we prove the existence of such gaps for
several of these problems including the problem of finding the largest
submatrix of a random Gaussian matrix. Furthermore, we establish that
the overlap gap property is a provable obstacle for a certain class of
local algorithms for the problem of finding a largest independent set of
a sparse graph and finding a satisfying assignment of a random
NAE-K-SAT problem.

Surya Ganguli, Stanford University

The statistical physics of deep learning: on the role of dynamic criticality, random landscapes, and the reversal of time

Neuronal networks have enjoyed a resurgence both in
the worlds of neuroscience, where they yield mathematical frameworks for
thinking about complex neural datasets, and in machine learning, where
they achieve state of the art results on a variety of tasks, including
machine vision, speech recognition, and language translation. Despite
their empirical success, a mathematical theory of how deep neural
circuits, with many layers of cascaded nonlinearities, learn and compute
remains elusive. We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue. In
particular, we show how dynamical criticality can help in neural
learning, how the non-intuitive geometry of high dimensional error
landscapes can be exploited to speed up learning, and how modern ideas
from non-equilibrium statistical physics, like the Jarzynski equality,
can be extended to yield powerful algorithms for modeling complex
probability distributions. Time permitting, we will also
discuss the relationship between neural network learning dynamics and
the developmental time course of semantic concepts in infants.

Thomas Halsey, ExxonMobil Upstream Research Co.

Managing Business Risk under Uncertainty: Development Planning in the Oil and Gas Industry

The upstream oil and gas industry
is distinguished by its dependence on a poorly understood subsurface as
the ultimate source of value. All phases of upstream activity
(exploration, development, and production) incur significant business
risk arising from our incomplete knowledge of the subsurface and of the
fluids therein. There are two ways of dealing with this risk: either by
trying to reduce it through additional data collection and better
characterization of the subsurface, or by managing it better: by
understanding the range of plausible subsurface realities that
are consistent with a given set of characterization data and the
economic implications of this range, by selecting actions that optimize
return over the range of uncertainty, and by identifying possible
responses to unfavorable outcomes. In this contribution we
discuss the differences between development and production (or more
informally, green-field and brown-field) oriented uses of modeling and
simulation, and then focus on development planning applications, which
attempt to account for uncertainty early in an asset lifecycle, when
both uncertainty and the economic impact of decisions are at their
maximum. “Scenario discovery”, a clustering process across numerous
scenarios, is key to these applications.

Hendrik Hamann, IBM T.J. Watson Research Center

Physical Analytics: Bringing big data together with physics

In the past, most information on the
internet originated from humans or computers. However, with the
emergence of cyber-physical systems, vast amounts of data are now being
created by sensors on devices, machines, etc., digitizing the physical
world. While cyber-physical systems are subject to active research
around the world, the vast amount of actual data generated from the
physical world has so far attracted little attention from the
engineering and physics communities. In this presentation we use examples
to highlight the opportunities in this new subject of “Physical
Analytics” for highly interdisciplinary research (including physics,
engineering and computer science), which aims at understanding real-world
physical systems by leveraging cyber-physical technologies. More
specifically, the convergence of the physical world with the digital
domain allows applying physical principles to everyday problems in a
much more effective and informed way than what was possible in the past.
Much as traditional applied physics and engineering have made
enormous advances and changed our lives by making detailed measurements
to understand the physics of an engineered device, we can now apply the
same rigor and principles to understand large-scale physical systems. In
the talk we first present a set of “configurable” enabling technologies
for Physical Analytics including ultralow power sensing and
communication technologies, physical big data management technologies,
numerical modeling for physical systems, machine learning based physical
model blending, and physical analytics based automation and control.
Then we discuss in detail several concrete applications of Physical
Analytics ranging from energy management in buildings and data centers,
environmental sensing and controls, precision agriculture to renewable
energy forecasting and management.

Lior Horesh, IBM Research

Should you Derive, or let the data Drive? Framework for design of hybrid first-principles data-driven models

Mathematical models are employed
ubiquitously for description, prediction and decision making. In
addressing end-goal objectives, great care needs to be devoted to
attaining an appropriate balance of inexactness throughout the various
stages of the end-goal process (e.g. modeling and inversion). Disregard
for such considerations either entails redundant computation or impairs
the overall fidelity of the inversion process. Model
reduction is instrumental in trading-off fidelity for computation, yet,
in some situations, it is essential to take an opposite perspective, and
enhance model fidelity and thereby end-goal output. In this talk, we
shall describe the interplay between model reduction and model
mis-specification mitigation and provide a generic infrastructure for
model re-specification based upon a hybrid first principles and
data-driven approach.

Jason Johnson, Numerica Corporation

Machine Learning and Discrete Optimization for Police Patrol Planning

Today’s law enforcement decision
makers are inundated with data. We discuss how machine learning and
discrete optimization techniques can be used to extract actionable
information from the historical record of demand for police service,
propelling patrol activities to new levels of efficiency and
effectiveness.

Bert Kappen, Radboud University

Control, inference and learning

Intelligent systems, whether
natural or artificial, must act in a world that is highly unpredictable.
To plan actions with uncertainty is a stochastic optimal control
problem. However, there are two fundamental problems: the optimal
control solution is intractable to compute and intractable to represent
due to the non-trivial state dependence of the optimal control. This has
prevented large-scale application of stochastic optimal control theory
so far. The path integral control theory describes a class of control
problems whose solution can be computed as an inference computation. In
this presentation I focus on efficient computation of the optimal
control solution using importance sampling. I formalize the intuitive
notion that the efficiency of the inference computation is related to
the proximity of the sampling control to the optimal control. Secondly, I
show new results that allow approximate computation of state dependent
optimal controls using the cross entropy method. These two ingredients
together suggest a novel adaptive sampling procedure, called PICE (path
integral cross entropy method), that learns a controller based on
self-generated data. The adaptive sampling procedure can be used to
efficiently compute optimal controls but can also be used to accelerate
other Monte Carlo computations. We illustrate the results on a few
examples in robotics and time series.
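The link between sampling quality and efficiency can be illustrated with a generic self-normalized importance sampling estimator; the effective-sample-size diagnostic below is a standard proxy for how close the proposal is to the target, not the PICE method itself.

```python
import numpy as np

def snis(f, x, log_p, log_q):
    """Self-normalized importance sampling estimate of E_p[f(x)] from samples x ~ q,
    together with the effective sample fraction (a proxy for proposal quality)."""
    logw = log_p(x) - log_q(x)
    w = np.exp(logw - logw.max())   # stabilize before normalizing
    w /= w.sum()
    est = np.sum(w * f(x))
    ess_frac = 1.0 / (len(x) * np.sum(w ** 2))
    return est, ess_frac
```

When q is far from p, the effective sample fraction collapses, mirroring the talk's point that sampling with a control close to the optimal one makes the inference computation efficient.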

Yannis Kevrekidis, Princeton University

No equations, no variables, no parameters: Data and the computational modeling of complex systems

I will discuss an interface between
traditional continuum modeling of complex dynamical systems (ODEs,
PDEs, evolving networks) and fine scale, atomistic/stochastic/agent
based numerics. The use of modern data-mining techniques (and in
particular here, manifold-learning techniques like diffusion maps)
allows us to solve coarse-grained, systems level descriptions of the
fine scale simulations without deriving these descriptions in closed
form - what we call the equation-free approach. It also allows us to do
this when it is not explicitly known what the right coarse-grained
variables are (variable-free computation) nor what the right
coarse-grained parameters are (parameter-free computation). This
"agnostic" approach to modeling only takes into account
data and the nature of the unavailable model (not its actual closed
form); and it uses traditional scientific computing as a tool for the
"on the fly" design of computational experiments.

Jorge Kurchan, LPS-Ecole Normale Superieure

Darwinian versus thermal optimization

Darwinian optimization -- walkers
that randomly diffuse and reproduce proportionally to the quantity that
has to be maximized -- may be mapped into ordinary thermal optimization
(e.g. Monte Carlo) when applied to complex landscapes. We may thus see
the limitations of the former clearly. More interestingly, we may apply
what we know from annealing schemes to accelerate the dynamics of an
evolving population.

Po-Ling Loh, University of Pennsylvania

Confidence sets for diffusion source estimators

We study the problem of identifying
the source of a diffusion spreading over a regular tree. When the
degree of each node is at least three, we show that it is possible to
construct confidence sets for the diffusion source with size independent
of the number of infected nodes. Our estimators are motivated by
analogous results in the literature concerning identification of the
root node in preferential attachment and uniform attachment trees. At
the core of our proofs is a probabilistic analysis of Polya urns
corresponding to the number of uninfected neighbors in specific subtrees
of the infection tree.
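The Polya urn at the heart of these proofs is easy to simulate; the sketch below shows the classical two-color urn and its martingale property (the expected fraction of either color is preserved), with the connection to uninfected subtree neighbors left informal.

```python
import random

def polya_urn(white, black, n_draws, rng):
    """Classical Polya urn: repeatedly draw a ball uniformly at random and
    return it to the urn together with one extra ball of the same color."""
    for _ in range(n_draws):
        if rng.random() < white / (white + black):
            white += 1
        else:
            black += 1
    return white, black
```

The fraction of white balls converges almost surely to a Beta-distributed limit, while its expectation stays fixed at the initial fraction, which is what makes urn arguments so tractable.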

Andrey Lokhov, Los Alamos National Laboratory

Efficiently (re)constructing spreading networks from partial observations

The important problem of
reconstructing diffusion networks and transmission probabilities from
spreading data has attracted considerable attention in the past
several years. In this talk, we focus on a realistic and restricted
scenario, in which only partial information on cascades is available,
and argue that methods based on the maximization of the likelihood
are computationally intractable. To tackle this problem, we suggest an
algorithm based on recently introduced dynamic message-passing equations
for spreading processes, which enables a fast and robust reconstruction
of transmission probabilities in sparse networks. As an application, we
discuss how the developed framework can be used for engineering
artificial networks with desired properties.

Charles Mathy, Lyric Labs ADI

SPARTA: Fast global planning of collision-avoiding robot trajectories

I will present an algorithm for
obtaining collision-avoiding robot trajectories called SPARTA, which
stands for SPline-based ADMM for Robot Trajectory Avoidance. The problem
of finding smooth collision-avoiding trajectories of robots is solved
using a message passing algorithm called the Three Weight Algorithm. The
generated paths include physical constraints of the robots, and the
convergence speed is such that it becomes feasible for real-time
applications. We have physically implemented our algorithm using mobile
robots and will showcase the results. The general approach can be applied to
many kinds of optimization problems, including both soft and hard costs
in the objective.

Pankaj Mehta, Boston University

Variational RG and Deep Learning

Deep learning is a broad set of
techniques that uses multiple layers of representation to automatically
learn relevant features directly from structured data. Recently, such
techniques have yielded record-breaking results on a diverse set of
difficult machine learning tasks in computer vision, speech recognition,
and natural language processing. Despite the enormous success of deep
learning, relatively little is understood theoretically about why these
techniques are so successful at feature learning and compression. Here,
we show that deep learning is intimately related to one of the most
important and successful techniques in theoretical physics, the
renormalization group (RG). RG is an iterative coarse-graining scheme
that allows for the extraction of relevant features (i.e. operators) as a
physical system is examined at different length scales. We construct an
exact mapping from the variational renormalization group, first
introduced by Kadanoff, to deep learning architectures based on
Restricted Boltzmann Machines (RBMs). We illustrate these ideas using
the nearest-neighbor Ising Model in one and two-dimensions. Our results
suggest that deep learning algorithms may be employing a generalized
RG-like scheme to learn relevant features from data.
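The coarse-graining step RG performs can be made concrete with the simplest possible example: a majority-rule block-spin transformation of a 2D Ising configuration. This illustrates the RG side of the analogy only, not the variational RG-to-RBM mapping itself.

```python
import numpy as np

def block_spin(config, b=3):
    """Majority-rule block-spin transformation: replace each b x b block of +/-1
    spins by the sign of its sum (b odd, so b*b is odd and ties cannot occur)."""
    L = config.shape[0]
    assert L % b == 0 and b % 2 == 1
    blocks = config.reshape(L // b, b, L // b, b).sum(axis=(1, 3))
    return np.where(blocks >= 0, 1, -1)
```

Iterating this map examines the system at successively larger length scales, discarding microscopic detail while keeping the "relevant" large-scale features, much as successive layers of a deep network discard irrelevant input detail.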

A. Alan Middleton, Syracuse University

The effects of boundaries and dimensions on optimization and memory recovery

Some links between optimization and
dimensionality, especially with respect to the recovery of global
solutions, will be explored, using the example of disordered magnets. In
some cases, polynomial-time algorithms can be used, which allow for a
thorough examination of the effects of system size on the uniqueness of
solutions and on recovery after aging. In particular, recent results on
the effects of boundary
conditions in random Ising magnets in two dimensions found using
recently developed shortest paths methods will be presented, along with
preliminary work on recovery of ground states using patchwork dynamics
(optimization over blocks).

Remi Monasson, ENS Paris

Benchmarking inverse statistical approaches for protein structure and design with lattice-protein models

Inverse statistical approaches,
modeling pairwise correlations between amino acids in the sequences of
similar proteins across many different organisms, can successfully
extract protein structure (contact) information. I will present a recent
work, in collaboration with S. Cocco and E. Shakhnovich, where we
benchmark those statistical approaches on exactly solvable models of
proteins, folding on a 3D lattice, to assess the reasons underlying
their success and their limitations. We show that the inferred
parameters (effective pairwise interactions) of the statistical models
have clear and quantitative interpretations in terms of positive
(favoring the native fold) and negative (disfavoring competing folds)
protein sequence design. New sequences randomly drawn from the
statistical models are likely to fold into the native structures when
effective pairwise interactions are accurately inferred, a performance
that cannot be achieved with independent-site models.

Cristopher Moore, Santa Fe Institute

Phase transitions in sparse, high-dimensional clustering

Many problems in data analysis have
phase transitions: when the data becomes too noisy or too sparse, it
suddenly becomes impossible to find any pattern in it, or to distinguish
it from a null model with no pattern at all. In some cases, there is
also a regime where pattern-finding is information-theoretically
possible, but appears to be computationally hard. I will review work
over the past few years on one such problem, finding communities in
networks. I will then present some new work, which is partly analogous,
on one of the most basic problems in data science: clustering O(n)
points in n-dimensional space.

Gautam Nallamala, University of California, San Diego

Learning to Navigate Turbulent Environments: The Soaring Flight of Birds and Gliders

Birds and gliders use warm, rising
currents (thermals) in the atmosphere to ascend while minimizing
energy expenditure. Thermal soaring, as this form of flight is called,
is a strategy used by birds as they migrate across large distances,
sometimes even across oceans and continents. Soaring flight provides an
interesting example of a complex decision-making problem in biology that
requires a long-term strategy in order to effectively utilize the
ascending thermals. The process of finding thermals is further
complicated by turbulent fluctuations that accompany thermal formation.
We approach the soaring flight of birds and gliders as a problem of
learning to navigate complex, highly fluctuating turbulent environments.
Most soaring is observed in the atmospheric convective boundary layer
on a warm, sunny day. The convective boundary layer of the atmosphere is
modeled in two ways: as a Rayleigh-Benard flow at high Rayleigh number,
and as a kinetic model that reproduces the statistical properties of
the boundary layer. The two models include the necessary physical
features that enable and also complicate effective soaring. We use a
model-free, experience-based, reinforcement learning algorithm to train
gliders. We identify and compare the learned policies in the regimes of a
weak and strong flow. From weak to strong, we show how the glider
adopts an increasingly conservative policy, quantifying the degree of
risk affordable in turbulent environments. The learning
procedure also permits the identification of the turbulent observables
that are most effectively exploited by the glider for its soaring in the
atmospheric boundary layer.
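As a cartoon of the model-free, experience-based learning used here, the following tabular Q-learning agent learns to move toward the strongest lift on a one-dimensional chain. The environment, rewards, and hyperparameters are invented for illustration and bear no relation to the atmospheric models in the talk.

```python
import random

def q_learning(lift, n_episodes=2000, T=20, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning on a 1D chain: actions move left/stay/right, the reward
    is the lift at the new position; epsilon-greedy exploration, random starts."""
    rng = random.Random(seed)
    n = len(lift)
    Q = [[0.0, 0.0, 0.0] for _ in range(n)]      # actions: 0=left, 1=stay, 2=right
    for _ in range(n_episodes):
        s = rng.randrange(n)
        for _ in range(T):
            if rng.random() < eps:
                a = rng.randrange(3)
            else:
                a = max(range(3), key=lambda x: Q[s][x])
            s2 = min(n - 1, max(0, s + a - 1))   # clamp to the ends of the chain
            r = lift[s2]
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

After training, the greedy policy moves toward the lift from either side and stays in it, the tabular analogue of a glider learning where to find rising air.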

Masayuki Ohzeki, Kyoto University

Accelerated Langevin Dynamics and its application to machine learning

In machine learning, the most
fundamental approach to estimating the parameters of a generative model
is maximum likelihood estimation. However, maximum likelihood estimation
is computationally costly because it requires computing the partition
function or expectation values in equilibrium. An alternative technique
is contrastive divergence, which quickly approaches the equilibrium
state while avoiding computation of the partition function. To further
develop the contrastive divergence method, we implement a novel
technique, inspired by nonequilibrium dynamics, that accelerates the
convergence to the desired distribution within the framework of
contrastive divergence. We confirm its relevance for contrastive
divergence in a simple model. In addition, we discuss the possibility of
using quantum annealing for contrastive divergence in machine learning.
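For reference, the contrastive divergence update in its most common form (CD-1 for a binary restricted Boltzmann machine) replaces the equilibrium expectation with a single Gibbs step from the data; the sketch below is that textbook update, not the accelerated nonequilibrium variant proposed in the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, a, b, v0, lr, rng):
    """One CD-1 update for a binary RBM (weights W, visible bias a, hidden bias b):
    approximate the model expectation by a single Gibbs step started at the data v0."""
    ph0 = sigmoid(v0 @ W + b)                    # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0     # sampled hidden units
    pv1 = sigmoid(h0 @ W.T + a)                  # one-step reconstruction of visibles
    v1 = (rng.random(pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(v1 @ W + b)
    # data-driven term minus one-step reconstruction term
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```

The accelerated technique in the talk aims to make the short chain inside this update reach the target distribution faster.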

Ruslan Salakhutdinov, University of Toronto

Recent Advances in Deep Learning: Learning Structured, Robust, and Multimodal Deep Models

Building intelligent systems that
are capable of extracting meaningful representations from
high-dimensional data lies at the core of solving many Artificial
Intelligence tasks, including visual object recognition, information
retrieval, speech perception, and language understanding. In this talk I
will first introduce a broad class of hierarchical probabilistic models
called Deep Boltzmann Machines (DBMs) and show that DBMs can learn
useful hierarchical representations from large volumes of
high-dimensional data with applications in information retrieval, object
recognition, and speech perception. I will then introduce deep models
that are capable of extracting a unified representation that fuses
together multiple data modalities and discuss models that can generate
natural language descriptions (captions) of images. I will show that on
several tasks, including modelling images and text, and video and sound,
these models significantly improve upon many of the existing
techniques.

David Schwab, Northwestern University

The Deterministic Information Bottleneck

Compression fundamentally involves a
decision about what is relevant and what is not. The information
bottleneck (IB) formalized this notion as an information-theoretic
optimization problem and proposed an optimal tradeoff between throwing
away as many bits as possible, and selectively keeping those that are
most important. Here, we introduce an alternative formulation of the IB,
the deterministic information bottleneck (DIB), that we argue better
captures the notion of compression. As suggested by its name, the
solution to the DIB problem is a deterministic encoder, as opposed to
the stochastic encoder that is optimal under the IB. We then compare the
IB and DIB on synthetic data, showing that the IB and DIB perform
similarly in terms of the IB cost function, but that the DIB vastly
outperforms the IB in terms of the DIB cost function. Our derivation of
the DIB also provides a family of models which interpolates between the
DIB and IB by adding noise of a particular form. We discuss the role of
noise as a regularizer of the DIB.
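A minimal sketch of the deterministic encoder, following the hard-assignment update described for the DIB (each x is mapped to the cluster t maximizing log q(t) - beta * KL(p(y|x) || q(y|t))); the initialization and the numerical guards are simplistic choices of this sketch, not part of the method.

```python
import numpy as np

def dib(p_xy, n_clusters, beta, n_iter=100):
    """Deterministic information bottleneck iteration on a joint table p_xy:
    each x is assigned to the cluster t maximizing log q(t) - beta * KL(p(y|x)||q(y|t))."""
    nx, ny = p_xy.shape
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    f = p_y_given_x.argmax(axis=1) % n_clusters   # crude data-driven initialization
    for _ in range(n_iter):
        q_t = np.array([p_x[f == t].sum() for t in range(n_clusters)])
        q_y_given_t = np.array([p_xy[f == t].sum(axis=0) / max(q_t[t], 1e-12)
                                for t in range(n_clusters)])
        log_q = np.log(np.maximum(q_t, 1e-12))
        kl = (p_y_given_x[:, None, :]
              * (np.log(np.maximum(p_y_given_x[:, None, :], 1e-12))
                 - np.log(np.maximum(q_y_given_t[None, :, :], 1e-12)))).sum(axis=2)
        f_new = np.argmax(log_q[None, :] - beta * kl, axis=1)
        if np.array_equal(f_new, f):
            break
        f = f_new
    return f
```

Note that the encoder is a plain label array, a genuinely deterministic map from x to t, in contrast with the soft assignments of the standard IB.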

Alexander Shapeev, Skolkovo Institute of Science and Technology

A New Class of Machine Learning Interatomic Potentials

Molecular modelling is becoming
increasingly important in materials research. There, the forces on atoms
are obtained either from empirical potentials or from ab initio
calculations. The accuracy of empirical potentials allows only for
qualitative predictions, but they make it possible to model millions to
billions of atoms. Ab initio calculations, on the other hand, are
typically limited to 100-1000 atoms, but their accuracy is much higher.
A practically important problem is therefore to bridge the gap in
accuracy and efficiency between empirical potentials and ab initio
calculations. Ideally, one would like interatomic potentials whose
efficiency is comparable to that of empirical potentials and whose
accuracy approaches that of ab initio calculations. At a minimum, such
potentials should allow their accuracy to be systematically improved at
the cost of increased computational complexity. Machine learning is
increasingly used to construct such interatomic potentials. In my talk,
I will review the state of the art in constructing such potentials and
present a new approach based on linear regression and invariant
polynomials. This approach achieves an accuracy comparable to the most
accurate existing schemes, yet is 1-2 orders of magnitude more
computationally efficient.
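
A minimal sketch of the linear-regression idea, with a hypothetical
pairwise descriptor basis and a toy pair potential standing in for the ab
initio reference data (the talk's actual basis of invariant polynomials
is more elaborate):

```python
import numpy as np

rng = np.random.default_rng(0)

def descriptors(positions, cutoff=4.0, n_basis=4):
    """Hypothetical pairwise descriptors: sums of (1/r)^k over pairs
    within a cutoff; this simple basis just illustrates the idea."""
    d = np.zeros(n_basis)
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            if r < cutoff:
                d += (1.0 / r) ** np.arange(1, n_basis + 1)
    return d

def reference_energy(positions):
    """Toy stand-in for ab initio energies: a fixed pair potential."""
    e = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = np.linalg.norm(positions[i] - positions[j])
            e += 2.0 / r**2 - 3.0 / r
    return e

# Training set: jittered copies of a small 5-atom cluster.
base = np.array([[0.0, 0, 0], [1.5, 0, 0], [0, 1.5, 0],
                 [0, 0, 1.5], [1.5, 1.5, 0]])
configs = [base + 0.1 * rng.standard_normal(base.shape) for _ in range(40)]
X = np.array([descriptors(c) for c in configs])
y = np.array([reference_energy(c) for c in configs])

# Linear least-squares fit of the potential's coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the energy of an unseen configuration.
test_config = base + 0.1 * rng.standard_normal(base.shape)
prediction = descriptors(test_config) @ coef
```

Since the toy reference energy here is itself linear in the chosen
basis, the fit is essentially exact; with real ab initio data, accuracy
improves systematically as the basis is enlarged.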

Jinwoo Shin, KAIST

Polynomial-time Message-passing Algorithm for Minimum Weight Perfect Matching

It has been shown that the
max-product Belief Propagation (BP) can solve a class of Linear
Programming (LP) relaxations of combinatorial optimization problems if no
integrality gap is present. However, when the LP exhibits an integrality
gap, no model was previously known to be solvable systematically via
sequential applications of BP. In this work, we develop the first such
algorithm for solving the minimum weight matching problem over arbitrary
graphs. Each step of the sequential algorithm requires applying BP over
a modified graph constructed by contractions and expansions of
blossoms, i.e., odd sets of vertices. For a graph G = (V,E), our scheme
guarantees termination within O(|V|^2) BP runs, and each BP converges in
O(|V|) iterations in the worst case. In essence, it offers a
message-passing version of the celebrated Edmonds’ Blossom algorithm by
jumping at once over many sub-steps with a single BP.

David Sontag, New York University

How Hard is Inference for Structured Prediction?

Many machine learning tasks can be
posed as structured prediction, where the goal is to predict a labeling
or structured object. For example, the input may be an image or a
sentence, and the output is a labeling such as an assignment of each
pixel in the image to foreground or background, or the parse tree for
the sentence. Despite marginal and MAP inference for many of these
models being NP-hard in the worst-case, approximate inference algorithms
are remarkably successful and as a result structured prediction is
widely used. What makes these real-world instances
different from worst-case instances? One key difference is that in all
of these applications, there is an underlying "ground truth" which
structured prediction is aiming to find. In this talk, I will introduce a
new theoretical framework for analyzing structured prediction
algorithms in terms of their ability to achieve small Hamming error. We
study the computational and statistical trade-offs that arise in this
setting, and illustrate a setting where polynomial-time algorithms can
perform optimal prediction, despite the corresponding MAP inference task
being NP-hard.

Naftali Tishby, Hebrew University

Deep Learning and the Information-Bottleneck phase transitions

Deep Neural Networks (DNNs) are
analyzed via the theoretical framework of the information
bottleneck (IB). We first show that any DNN can be quantified by the
mutual information between the layers and the input and output
variables. Using this representation we can calculate the optimal
information theoretic limits of the DNN and obtain finite sample
generalization bounds. The advantage of getting closer to the
theoretical limit is quantifiable both by the generalization bound and
by the network’s simplicity. We argue that the optimal
architecture, i.e., the number of layers and the features/connections at
each layer, is related to critical points on the information bottleneck
tradeoff curve, namely, relevant compression of the input layer with
respect to the output layer. The hierarchical representations in the layered
network naturally correspond to the structural phase transitions along
the information curve. An interesting class of solvable DNNs arises by
applying this framework to the case of symmetries in the supervised
learning task. The case of translation invariance leads to the familiar
convolution neural networks. Other symmetry group chains yield different
types of bifurcation diagrams and network architectures. These new
insights also suggest new sample complexity bounds, architecture design
principles (number and widths of layers) and entirely different deep
learning algorithms. Based on joint works with Noga Zaslavsky and Ravid
Ziv.

Eric Vanden-Eijnden, Courant Institute, NYU

Markov State Models

Markov State Models have emerged
as a popular way to interpret and analyze time-series data generated,
e.g., in the context of molecular dynamics simulations. The basic idea of
these models is to represent the dynamics of the system as memoryless
jumps between predefined sets in the system's state space, i.e., as a
Markov jump process (MJP). Performing this mapping typically involves
two nontrivial questions: first, how to define these sets, and second,
how to learn the rate matrix of the MJP once the sets have been identified? Both
these questions will be discussed in this talk, with emphasis put on
the problem of model specification error.

Marc Vuffray, Los Alamos National Laboratory

Efficient Structure and Parameter Learning of Ising Models: A New Approach

An Ising model is a probability
distribution over binary variables which is modeled by a graph. The
vertices represent the variables and the edges carry a positive weight
that is a function of the variables at their endpoints. The probability
distribution is proportional to the product of the weights associated
with the edges. As the variables are binary, each weight is fully
characterized by a single real parameter called the coupling. We consider
the learning problem where an observer receives a collection of i.i.d.
samples from an Ising model and wants to reconstruct the edges and the
couplings. Our contribution lies in a new reconstruction method that is
efficient both computationally and sample-wise. Our method is based on a
statistical estimator of the couplings inspired by physical
considerations. This estimator is the result of a convex optimization
program. The estimator is consistent and unbiased, and we show that with a
proper regularization it estimates the couplings with a number of
samples that is small with respect to the system size and to the
coupling strengths.

Jonathan Yedidia, Analog Devices Lyric Labs

Combining Inference with Learning Using Memory Factor Networks

We introduce a new type of
graphical model that we call a “memory factor network” (MFN). We show
how to use MFNs to model the structure inherent in many types of data
sets. We also introduce an associated message-passing style algorithm
called “proactive message passing” (PMP) that performs inference on
MFNs. PMP comes with convergence guarantees and is efficient in
comparison to competing algorithms such as variants of belief
propagation. We specialize MFNs and PMP to a number of distinct types of
data (discrete, continuous, labelled) and inference problems
(interpolation, hypothesis testing), provide examples, and discuss
approaches for efficient implementation.

Riccardo Zecchina, Politecnico di Torino

Efficient learning and optimization algorithms from local entropy maximization

We will discuss the role that
subdominant states play in the design of algorithms for large-scale
optimization problems. We shall take as representative case the problem
of learning random patterns with binary synapses in single layer
networks. The standard statistical physics results show that this
problem is exponentially dominated by isolated solutions that are
extremely hard to find algorithmically. By a novel large
deviation method we find unexpected analytical evidence for the
existence of subdominant and extremely dense regions of solutions.
Numerical experiments confirm these findings. We also show that the
dense regions are surprisingly accessible by simple learning protocols,
and that these synaptic configurations are robust to perturbations and
generalize better than typical solutions. These outcomes extend to
synapses with multiple states and to deeper neural architectures. The
large deviation measure we introduced for the analytic study also
suggests how to design general optimization algorithms based on local
entropy maximization.

Amanda Ziemann, Los Alamos National Laboratory

The challenge of variability: detecting materials of interest in remotely sensed hyperspectral imagery

Spectral imagery collected from
satellites and airborne platforms provides an important tool for
remotely analyzing the content of a scene. In particular, the ability to
remotely detect a specific material within a scene is of critical
importance in nonproliferation applications. Hyperspectral imaging
sensors measure, at each pixel, a radiance spectrum with up to hundreds
of distinct wavelength channels; this high-dimensional spectral
information allows for pixel-level material discrimination. In target
detection, the goal is to distinguish target materials from image
background clutter, and the challenge is that both the target and
background exhibit considerable spectral variability: target spectra
have within-material variability, and background spectra have
variability from material clutter. Target variability is often due to
known physical effects (e.g., particle size, packing density, and
moisture content) that can be explicitly modeled. Background variability
is far more heterogeneous and is typically characterized with
statistical or geometrical models in a high-dimensional feature space.
Effectively distinguishing target from background requires that both of
these variabilities be taken into account, i.e., the physically modeled
variability of the target and the statistical variability of the
background. This can be done using physics-informed machine learning. In
this poster, we will describe various methodologies, from a simple
subspace Gaussian model to a sophisticated graph-based manifold
approach, for detecting and labeling physically-varying targets in
statistically-varying backgrounds. Results will be shown on a
ground-truthed hyperspectral image for two different target materials.

Poster Abstracts

Jennifer Annoni, University of Minnesota

A New Approach to Reduced-Order Modeling for Flow Control

The objective of this work is to obtain linear reduced-order models that
can be used for controller design. This technique has specifically been
applied to fluid flows. High-fidelity
computational fluid dynamic models provide accurate characterizations
of complex flow dynamics but are not suitable for control design due to
their prohibitive computational complexity. A variety of
methods, including proper orthogonal decomposition (POD) and dynamic
mode decomposition (DMD), can be used to extract the dominant flow
structures and obtain reduced-order models of the flow. In
this presentation, we introduce a new method, termed input-output
dynamic mode decomposition (IODMD), that uses a subspace identification
technique to obtain models of low complexity. We show
that, relative to standard DMD, the introduction of an external forcing
in IODMD provides robustness with respect to small disturbances and
noise in the model. This method captures the dominant characteristics of
the flow as well as the input/output behavior of the system. Specifically,
IODMD is applied to high-fidelity simulations of wind farm flow to
obtain a reduced-order model that can be used for wind farm control.

Amir Ghasemianlangroodi, University of Colorado Boulder

Detectability Limit in Dynamic Networks

The detectability limit is an
interesting question in the extensively studied community detection
problem. In this work we study the detectability limit in dynamic
networks. Our model for dynamic networks (a special case of
many other models) characterizes the interactions independently at each
time slot through the standard stochastic block model (SBM), while the
role assignment of each node evolves as a Markov chain across the time
dimension. Using this model we find the detectability threshold of
dynamic networks as a function of the rate of change and the strength of
the communities. Below this threshold, we claim that no efficient
algorithm can identify the communities better than chance. We also propose two
algorithms that are optimal in the sense that they succeed all the way
down to this threshold. The first algorithm uses belief propagation to
infer the role assignments, which gives asymptotically the optimal
accuracy, and the second one is a fast spectral clustering algorithm
based on linearizing the BP equations around the trivial fixed point
solutions.

Varun Jog, University of Pennsylvania

Persistence of centrality in random growing trees

We investigate properties
of node centrality in random growing tree models. We focus on a measure
of centrality that computes the maximum subtree size of the tree rooted
at each node, with the most central node being the tree centroid. For
random trees grown according to a preferential attachment model, a
uniform attachment model, or a diffusion process over a regular tree,
we prove that a single node persists as the tree centroid after a finite
number of steps, with probability 1. Furthermore, this persistence
property generalizes to the top K ≥ 1 nodes with respect to the same
centrality measure. We also establish necessary and sufficient
conditions for the size of an initial seed graph required to ensure
persistence of a particular node with probability 1 − ε, as a function of
ε: In the case of preferential and uniform attachment models, we derive
bounds for the size of an initial hub constructed around the special
node. In the case of a diffusion process over a regular tree, we derive
bounds for the radius of an initial ball centered around the special
node. Our necessary and sufficient conditions match up to constant
factors for preferential attachment and diffusion tree models.

Ryan King, University of Colorado at Boulder

Autonomic Machine Learning Closure for Turbulence Simulations

We present a reproducing
kernel formulation of the recently introduced autonomic closure for
coarse-grained turbulence simulations [1,2]. The autonomic closure is a
data-driven approach that eliminates the need to specify a predefined
turbulence model and instead provides for fully-adaptive,
self-optimizing turbulence closures on-the-fly without external training
data. This is accomplished by posing a supervised learning problem
based on test filtering in the self-similar turbulent inertial range.
The supervised learning problem is solved using reproducing kernel
Hilbert spaces to efficiently find high or even infinite dimensional
turbulence closures. This nonparametric approach allows the autonomic
closure to freely adapt to varying nonlinear, nonlocal, nonequilibrium
and other turbulence characteristics of the flow. In initial a priori
tests, the autonomic closure proves to be remarkably more accurate than
dynamic versions of traditional prescribed closures, with
limited computational penalties in the kernelized formulation.

Remi Lam, Massachusetts Institute of Technology

Optimization Framework for Data-driven Improvement of Physics-based Models

Mathematical models are
extensively used for diverse tasks including analysis, optimization, and
decision making. However, those models are not perfect representations
of reality: simplified physical descriptions are used (simplified
governing equations, boundary conditions, etc.), and numerical
approximations are made (discretization, linearization, round-off error,
etc.). Such misspecifications can lead to erroneous or suboptimal
decisions for the end-goal task one aims to solve. To mitigate this
effect, one can adjust the available model using limited data produced
by experiments or higher fidelity numerical models. A large body of
research has focused on estimating explicit model parameters. This work
takes a different perspective and targets construction of a correction
operator with implicit properties. We investigate the case where the end
goal is an inversion problem and illustrate how appropriate choices of
structure lead to improved end-goal insight.

Li Li, University of California, Irvine

Understanding Machine-Learned Density Functionals

Machine learning (ML) is
an increasingly popular statistical tool for analyzing either measured
or calculated data sets. Here we explore its application to a
well-defined physics problem, investigating issues of how the underlying
physics are handled by ML, and how self-consistent solutions can be
found by limiting the domain in which ML is applied. The particular
problem is how to find accurate approximate density functionals for the
kinetic energy of non-interacting electrons. Kernel ridge regression is
used to approximate the kinetic energy of non-interacting fermions in a
one dimensional box as a functional of their density. The properties of
different kernels and methods of cross-validation are explored,
reproducing the physics faithfully in some cases, but not others. We
also address how self-consistency can be achieved with information on
only a limited electronic density domain. Accurate constrained optimal
densities are found via a modified Euler-Lagrange constrained
minimization of the machine-learned total energy, despite the poor
quality of its functional derivative. A projected gradient descent
algorithm is derived using local principal component analysis.
Additionally, a sparse grid representation of the density can be used
without degrading the performance of the methods. The implications for
machine-learned density functional approximations are discussed.
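
The core regression step can be sketched with plain kernel ridge
regression; the "densities" and the target functional below are made-up
stand-ins (a simple gradient-squared surrogate, not the actual
non-interacting kinetic-energy data used in the work):

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_kernel(A, B, sigma):
    # Squared Euclidean distances between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Made-up data: Gaussian "densities" on a 1D grid, normalized to one particle.
grid = np.linspace(0.0, 1.0, 20)
dx = grid[1] - grid[0]

def make_density():
    b = rng.uniform(2.0, 8.0)
    n = np.exp(-b * (grid - 0.5) ** 2)
    return n / (n.sum() * dx)

def surrogate_functional(n):
    # Simple gradient-squared surrogate target, NOT the true kinetic energy.
    return (np.gradient(n, dx) ** 2).sum() * dx

densities = np.array([make_density() for _ in range(60)])
targets = np.array([surrogate_functional(n) for n in densities])

# Kernel ridge regression: solve (K + lambda*I) alpha = y,
# then predict with k(test, train) @ alpha.
lam, sigma = 1e-4, 0.5
K = gaussian_kernel(densities, densities, sigma)
alpha = np.linalg.solve(K + lam * np.eye(len(K)), targets)

test_density = make_density()
pred = (gaussian_kernel(test_density[None, :], densities, sigma) @ alpha)[0]
```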

Nicholas Lubbers, Boston University

Deep Learning, Image Processing, and Materials Microstructure

In the past ten years,
significant improvements in machine learning and pattern recognition
have been enabled by novel training algorithms and architectures for
artificial neural networks. These techniques are the current state of
the art for computer vision, including image recognition and image
captioning tasks. We explore the scope of these algorithms and the
potential for turning them towards scientific domains, in particular
the characterization of materials microstructure.

Sergey Matveev, Lomonosov Moscow State University, Skolkovo Institute of Science and Technology

TT-approximation technique for acceleration of the finite-difference scheme solving multicomponent coagulation equation

Numerical solution of the
multicomponent Smoluchowski coagulation equation is a complex problem
of numerical mathematics. The most widespread technique for the
mathematical modelling of the coagulation process is the Monte Carlo
methodology. The classical, highly accurate finite-difference schemes
are impractical in this application due to their extremely high
algorithmic complexity. In this work we apply low-rank approximations in
TT-format of both the solution and the coagulation kernel, together with
fast TT-arithmetic, to the implementation of the time steps of the
classical Runge-Kutta method. The resulting acceleration allows us to
claim that we have constructed a novel numerical method that is more
efficient than the known Monte Carlo technique.

Chihiro Nakajima, WPI-AIMR, Tohoku University

3D image reconstruction of a gold nano-cluster by compressed sensing

Recently, atomic-scale
discrete computed tomography (CT) with transmission electron microscopy
(TEM) is becoming possible. However, it is difficult to obtain many
projected images of a single object from various directions, since
desorption of atoms is caused by irradiation with high-energy electrons.
We achieved 3D-image reconstruction of a nano-scale object from a few
(three) projected images by applying L1-norm minimization in the
reconstruction algorithm. The poster demonstrates the reconstruction of
the atomic arrangement from experimental data for a gold nano-porous
cluster.

Dave Osthus, Los Alamos National Laboratory

Probabilistic Forecasting of Seasonal Influenza in the US

Seasonal influenza is a
serious public health and societal problem due to the absenteeism,
hospitalizations, and deaths it causes. The overall
burden of influenza is captured by the Centers for Disease Control and
Prevention's (CDC) influenza-like illness (ILI) network, which provides
invaluable information about the current incidence. Despite the
relatively rich surveillance data and the recurrent nature of seasonal
influenza, forecasting the timing and intensity of seasonal influenza in
the US represents a significant challenge. This is because the form of
the disease transmission process is uncertain, the disease dynamics are
only partially observed, and the public health observations are noisy.
In this work, we model the data, disease transmission process,
discrepancy, and parameters jointly in a Bayesian probabilistic model.
This is a promising approach for forecasting seasonal influenza, as it
allows us to impose constraints based on historical flu seasons.
Accounting for multiple modes of uncertainty and systematic model
discrepancy results in a probabilistic, prospective forecast of the
current flu season, available at http://bsvgateway.org/flu/.

Fatih G. Sen, Argonne National Laboratory

A comparative study of optimization methods for force field fitting

Large-scale atomistic
simulations using classical molecular dynamics/molecular mechanics
(MD/MM) methods enable modeling of materials, biological systems, and
chemical reactions at surfaces/interfaces that are not easily accessible
with experiments. The predictive capability and performance of MD/MM
methods greatly rely on the empirical force field (EFF) used, which
describes all interatomic interactions. The EFF parameters are generally
fitted using least-squares local minima search algorithms to represent
interatomic interactions and material properties obtained from
quantum-mechanical based methods such as density functional theory
(DFT), as well as available experimental data. Evolutionary algorithms
such as genetic algorithm (GA) enable finding the global optimum of
parameter space, even for EFF with very complex functional forms.
However, systematic studies of the efficiency and accuracy of different
optimization techniques for the determination of EFF parameters are
still lacking. In the present work, we test the efficiency and
effectiveness of a variety of global and local optimization schemes in
the determination of EFF parameters. We first generated a training data
set using DFT for Ir-O system, which included pertinent structural and
thermodynamics properties for a variety of structures. The EFF
functional forms were chosen to be Morse for the Ir-O system. The
optimization of the EFF parameters against the training data was carried
out using (1) multi-start local optimization with common techniques
including Simplex and Levenberg-Marquardt, and (2) a single-objective GA.
Using random search as a baseline, we compared the effectiveness and
efficiency of the algorithms in terms of reaching the lowest error in
representing the DFT data and the number of function evaluations
required. We also investigated the use of a multi-objective GA (MOGA) to
remove the ambiguities in the selection of the weighting scheme defining
the fitness function. We analyzed the accuracy and performance of MOGA in finding the
Pareto-optimal solutions with respect to population size and number of
objectives. Overall, this study provides a systematic approach for
selecting a suitable optimization method to determine the optimal EFF
parameters for use in classical simulations, which in turn enable more
accurate prediction of material properties and chemical phenomena.
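
The local least-squares stage can be sketched as follows, with synthetic
energy-distance data standing in for the DFT training set and a damped
Gauss-Newton loop standing in for the Simplex/Levenberg-Marquardt
searches (all values here are made up):

```python
import numpy as np

def morse(r, p):
    """Morse pair potential E(r) = D*((1 - exp(-a*(r - r0)))^2 - 1)."""
    D, a, r0 = p
    return D * ((1.0 - np.exp(-a * (r - r0))) ** 2 - 1.0)

def fit_least_squares(f, r, E, p0, n_iter=50):
    """Damped Gauss-Newton least squares with a finite-difference
    Jacobian, standing in for the local searches used in the study."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        res = E - f(r, p)
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = 1e-6
            J[:, k] = (f(r, p + dp) - f(r, p - dp)) / 2e-6
        step, *_ = np.linalg.lstsq(J, res, rcond=None)
        t = 1.0   # backtracking: halve the step while it worsens the fit
        while (np.linalg.norm(E - f(r, p + t * step)) > np.linalg.norm(res)
               and t > 1e-8):
            t *= 0.5
        p = p + t * step
    return p

# Synthetic "training data" standing in for DFT energies (parameters made up).
rng = np.random.default_rng(2)
true_params = np.array([2.0, 1.5, 2.2])      # D, a, r0
r = np.linspace(1.6, 4.0, 30)
E = morse(r, true_params) + 0.01 * rng.standard_normal(r.size)

fitted = fit_least_squares(morse, r, E, p0=[1.0, 1.0, 2.0])
```

A GA would instead search the (D, a, r0) parameter space globally, which
matters when the functional form is more complex than a single Morse term.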

Miles Stoudenmire, Perimeter Institute for Theoretical Physics

Supervised Learning with Tensor Networks

We propose an approach to
supervised learning based on tensor network approximations used in
physics. First we map input vectors to an exponentially larger 'quantum
state' space. Then we use the overlap of these mapped vectors with
'classifier tensors' to decide which class they belong to. Crucially, we
approximate the classifier tensors as matrix product states (a.k.a.
tensor trains). Such a tensor network approximation makes our approach
computationally efficient and regularizes our models. In future work, we
envision using even more powerful classes of tensor networks and
applying insights from physics to better understand the process of
learning.

Koujin Takeda, Ibaraki University

Application of partial parallel interference cancellation to sparse signal recovery

We propose a scheme for
constructing sparse signal recovery algorithms in compressed sensing
under a general measurement matrix, by applying the idea of partial
parallel interference cancellation from CDMA demodulation to iterative
soft thresholding. As a result, we arrive at a novel algorithm that is
equivalent to approximate message passing (AMP) under an appropriate
assumption on the measurement matrix; in this sense, our algorithm is a
generalization of AMP. In addition, via numerical experiments we verify
that the proposed algorithm shows better convergence than the original
AMP when the measurement matrix does not satisfy the assumption of AMP.
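
For context, plain iterative soft thresholding (ISTA), the baseline that
the proposed scheme modifies, can be sketched on a toy compressed-sensing
problem; this is the standard algorithm, not the authors' generalization,
and the problem sizes are illustrative:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=2000):
    """Iterative soft thresholding for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

# Toy problem: recover a k-sparse signal from m < n random measurements.
rng = np.random.default_rng(3)
n, m, k = 100, 50, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

x_hat = ista(A, y, lam=0.01)
```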

Dominik Thalmeier, Radboud University Nijmegen

Towards Optimal Control of Network Structure Growth

We consider the problem
of influencing the growth of a network to optimize some observable that
depends on the network structure. We formulate this problem as a
Kullback-Leibler control problem and show some interesting properties.
First, we show that this approach captures complex phenomena such as
transitions between qualitatively different optimal solutions and
delayed choice mechanisms. Second, we propose an annealed importance
sampling method to approximate the optimal control. Our approach uses
network features and can efficiently address high dimensional problems.
Finally, we apply this framework to influence the online commenting
behaviour in Internet forums, using realistic models of cascade growth.
We incorporate a realistic user model to account for the limited
controllability that characterizes these problems and show that the
proposed approach improves complex measures that depend on the structure
of the observed discussion threads.

Ching-Hao Wang, Boston University

A statistical physics approach to designing synthetic signaling cascades with message-passing algorithms

Cells have evolved
intricate signaling pathways to respond to multiple inputs. The process
of signal transduction can be understood as a message relay between
intracellular effectors and extracellular cues. Here we present an
algorithmic approach that combines a message-passing algorithm with a
statistical-mechanical model of the underlying biophysics to investigate
the computational capability of a general signaling network. We
demonstrate the utility of this approach using an example from synthetic
biology: designing a synthetic kinase cascade that implements an
arbitrary decision surface. We conclude by discussing the
implication of this approach for synthetic biology and the possibility
of designing new machine learning algorithms inspired by cellular
signaling networks.