Free Energy Rates for Parameter Rich Optimization Problems


1 Free Energy Rates for Parameter Rich Optimization Problems. Joachim M. Buhmann (1), Julien Dumazert (1), Alexey Gronskiy (1), Wojciech Szpankowski (2). (1) ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch, julien.dumazert@gmail.com; (2) Purdue University, spa@cs.purdue.edu. AofA Workshop, Princeton, 20 June 2017.

2 Roadmap: Generic problem; Case study: sparse minimum bisection; Main result formulation; Proof outline; Application; Corollary.

3 Generic Problem

6 Generic Problem: Gibbs Sampling. We search for solutions c for a given input X (e.g. a graph), and the input X is random! We still want the solution c to be stable, so we define a Maximum Entropy Gibbs measure over the solutions c:

$$p(c \mid X) \propto \exp\big(-\beta\,\mathrm{cost}(c, X)\big).$$

[Figure: Gibbs measures over the solution space.]
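As a concrete illustration (not part of the talk), here is a minimal Python sketch of such a Gibbs measure over an explicitly enumerated solution space; the toy instance and cut cost are hypothetical stand-ins:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

n = 8                                  # toy instance: complete graph on n vertices
W = rng.normal(1.0, 0.5, size=(n, n))  # random edge weights, symmetrized below
W = (W + W.T) / 2

def cost(c, W):
    """Cut weight between a vertex subset c and its complement."""
    mask = np.zeros(len(W), dtype=bool)
    mask[list(c)] = True
    return W[mask][:, ~mask].sum()

# Enumerate all solutions c: vertex subsets of size d.
d = 3
solutions = list(itertools.combinations(range(n), d))
beta = 1.0

# Gibbs measure: p(c | X) proportional to exp(-beta * cost(c, X)), computed stably.
logits = np.array([-beta * cost(c, W) for c in solutions])
p = np.exp(logits - logits.max())
p /= p.sum()

print("most probable solution:", solutions[int(p.argmax())])
```

At small β the measure is nearly uniform over solutions; at large β it concentrates on the empirical minimizer.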

7 Generic Problem: Free Energy. Now

$$p(c \mid X) = \exp\big(-\beta\,\mathrm{cost}(c, X) - F(X)\big),$$

where $F(X) = \log Z(X)$ is the free energy and $Z(X)$ is the partition function. Goal: compute the expected free energy, which is hard:

$$E_X F(X) = E_X \log Z(X).$$
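For tiny instances, $\log Z(X)$ can be computed exactly by enumeration and $E_X \log Z(X)$ approximated by resampling inputs; a minimal sketch (my illustration, continuing the hypothetical toy setup above) of why this only works at small scale:

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def log_Z(W, d, beta):
    """Exact log partition function by enumerating all size-d subsets."""
    n = len(W)
    energies = []
    for c in itertools.combinations(range(n), d):
        mask = np.zeros(n, dtype=bool)
        mask[list(c)] = True
        energies.append(W[mask][:, ~mask].sum())
    return logsumexp(-beta * np.asarray(energies))

def expected_log_Z(n=8, d=3, beta=1.0, trials=200, seed=1):
    """Monte Carlo estimate of E_X log Z(X) over random inputs X."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        W = rng.normal(1.0, 0.5, size=(n, n))
        W = (W + W.T) / 2
        vals.append(log_Z(W, d, beta))
    return float(np.mean(vals))

print(expected_log_Z())  # enumeration has C(n, d) terms: hopeless for large n
```

The enumeration has $\binom{n}{d}$ terms, which is why the analytic asymptotics of $E \log Z$ are the interesting object.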

8 Case Study: Sparse Minimum Bisection

9 Case Study: Sparse Minimum Bisection. Find two subgraphs of some small size d with minimum edge cost between them.

15 Case Study: Sparse Minimum Bisection. Flashback to the generic problem: in high dimensions this may require solving via sampling! The solutions are dependent on each other, so this is not a Random Energy Model (next slide)! Contribution: the free energy asymptotics are solved! Another case study, the Lawler Quadratic Assignment Problem, is solved as well (see the paper).
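To make the case study concrete, here is a brute-force sketch of sparse minimum bisection on a toy instance (my illustration; the interesting regimes are high-dimensional, which is exactly why sampling is needed instead):

```python
import itertools
import numpy as np

def sparse_min_bisection(W, d):
    """Brute force: two disjoint size-d vertex sets minimizing the
    total edge weight between them."""
    n = len(W)
    best_cost, best_pair = np.inf, None
    for a in itertools.combinations(range(n), d):
        rest = [v for v in range(n) if v not in a]
        for b in itertools.combinations(rest, d):
            c = W[np.ix_(a, b)].sum()
            if c < best_cost:
                best_cost, best_pair = c, (a, b)
    return best_cost, best_pair

rng = np.random.default_rng(2)
W = rng.normal(1.0, 0.5, size=(10, 10))
W = (W + W.T) / 2
print(sparse_min_bisection(W, d=3))  # ~ C(n,d) * C(n-d,d) candidate solutions
```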

17 Free Energy: History and Advances. [Derrida 80] established the Random Energy Model (REM): no dependencies between solution energies. [Talagrand 02]: systematization; a simple proof of the REM free energy phase transition and beyond:

$$\lim_{n\to\infty} \frac{E[\log Z(\beta, X)]}{n} = \begin{cases} \dfrac{\beta^2}{4} + \log 2, & \beta < 2\sqrt{\log 2},\\[4pt] \beta\sqrt{\log 2}, & \beta \ge 2\sqrt{\log 2}. \end{cases}$$
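A quick sanity check of the high-temperature branch (my addition; the standard annealed computation for the REM with $2^n$ states and i.i.d. energies $E_i \sim \mathcal{N}(0, n/2)$):

$$\frac{1}{n}\log E[Z] = \frac{1}{n}\log\!\Big(2^n\, E\big[e^{-\beta E_1}\big]\Big) = \log 2 + \frac{\beta^2}{4},$$

and by Jensen's inequality $E[\log Z] \le \log E[Z]$, so the first branch is always an upper bound; the substance of the result is that it is tight exactly up to $\beta_c = 2\sqrt{\log 2}$.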

18 Main Result

21 Main Result: Setting. m is the number of solutions, with log m = o(n); N is the size of one solution (the number of parameters); we require the problem to be parameter rich: N ≥ 10 log m (with, as before, log m = o(n)).

22 Main Result: 2nd Order Phase Transition. Main theorem. Consider the sparse MBP on a complete graph, with edge weights that are mutually independent within any given solution and have mean μ and variance σ². Then, with $\hat\beta$ the suitably rescaled inverse temperature, the following holds:

$$\lim_{n\to\infty} \frac{E[\log Z] + \hat\beta\,\mu\sqrt{N \log m}}{\log m} = \begin{cases} 1 + \hat\beta^2\sigma^2/2, & \hat\beta < \sqrt{2}/\sigma,\\ \sqrt{2}\,\hat\beta\sigma, & \hat\beta \ge \sqrt{2}/\sigma, \end{cases}$$

provided $\log n \ll d \ll n^{2/7}/\log n$.
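Why this is a second-order transition (a check I am adding, using only the displayed cases): write $\phi(\hat\beta)$ for the right-hand side. At $\hat\beta_c = \sqrt{2}/\sigma$ both branches give $\phi = 2$, and

$$\frac{d}{d\hat\beta}\Big(1 + \tfrac12\hat\beta^2\sigma^2\Big) = \hat\beta\sigma^2 \Big|_{\hat\beta_c} = \sqrt{2}\,\sigma = \frac{d}{d\hat\beta}\Big(\sqrt{2}\,\hat\beta\sigma\Big),$$

so $\phi$ and $\phi'$ are continuous at $\hat\beta_c$ while $\phi''$ jumps from $\sigma^2$ to $0$: a second-order phase transition.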

23 Proof Outline

24 Proof Outline I. Introduce the solution overlap D: the average intersection of two bisections. D is the key to understanding the dependencies (remember, we are not in the REM setting!). Lemma 1. The following holds: $E_{\mathrm{rand.\,choice}}[D] = O(d^4/n)$. The proof then proceeds in three steps: (i) compute the dependencies, bounding P(Z close to EZ) via Var Z; (ii) break E log Z into two conditional parts; (iii) final bounding via adjusting a parameter.
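A small experiment (my addition) to build intuition for the overlap: sample pairs of random sparse bisections and count the cut edges they share. The paper's precise definition of D may differ from this formalization; the point is only that the overlap vanishes as n grows:

```python
import numpy as np

def random_bisection(n, d, rng):
    """One solution: two disjoint vertex sets of size d."""
    v = rng.permutation(n)
    return set(v[:d]), set(v[d:2 * d])

def edge_overlap(sol1, sol2):
    """Number of cut edges shared by two bisections."""
    (a1, b1), (a2, b2) = sol1, sol2
    e1 = {(min(u, v), max(u, v)) for u in a1 for v in b1}
    e2 = {(min(u, v), max(u, v)) for u in a2 for v in b2}
    return len(e1 & e2)

rng = np.random.default_rng(3)
d = 4
for n in [50, 100, 200, 400]:
    est = np.mean([edge_overlap(random_bisection(n, d, rng),
                                random_bisection(n, d, rng))
                   for _ in range(2000)])
    print(f"n={n:4d}  mean shared cut edges ~ {est:.4f}")
```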

25 Proof Outline II. Introduce the event A that Z is close to EZ, i.e. $A := \{Z \ge \varepsilon\, EZ\}$. The goal is to control P(A). Fact 1. P(A) can be bounded via Var Z using Chebyshev's inequality. Lemma 2 [Buhmann et al., 2014]. Var Z can be asymptotically approximated via $E_{\mathrm{rand.\,choice}}[D]$:

$$\mathrm{Var}\,Z \approx (EZ)^2\,\big(\sigma^2\beta^2\, E_{\mathrm{rand.\,choice}}[D]\big).$$
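The Chebyshev step spelled out (my addition; for $\varepsilon \in (0, 1)$):

$$P(\bar A) = P\big(Z < \varepsilon\, EZ\big) \le P\big(|Z - EZ| \ge (1-\varepsilon)\, EZ\big) \le \frac{\mathrm{Var}\,Z}{(1-\varepsilon)^2 (EZ)^2},$$

which, combined with Lemma 2, is small whenever $\sigma^2\beta^2 E[D] \to 0$, e.g. under the overlap bound of Lemma 1.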

26 Proof Outline III. Break E log Z into two parts. Fact 3.

$$E[\log Z] = E[\log Z \mid A]\, P(A) + E\big[\log Z \cdot \mathbf{1}(\bar A)\big] \ge (\log EZ + \log \varepsilon)\, P(A) + E\big[\log Z \cdot \mathbf{1}(\bar A)\big].$$

One can expand log EZ via a Taylor expansion (using the assumptions of the Theorem) and bound P(A) by the previous step. Fact 4. For $E[\log Z \cdot \mathbf{1}(\bar A)]$ a loose bound is enough.
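The expansion of $\log EZ$ sketched (my addition; assuming each solution's cost is a sum of N independent weights $w$ with mean $\mu$ and variance $\sigma^2$):

$$\log EZ = \log\!\Big(\sum_{c} E\,e^{-\beta\,\mathrm{cost}(c,X)}\Big) = \log m + N \log E\,e^{-\beta w} \approx \log m - \beta\mu N + \tfrac12 \beta^2\sigma^2 N,$$

and after adding back the mean term, dividing by $\log m$, and rescaling $\beta$ to $\hat\beta$, this is exactly the first branch $1 + \hat\beta^2\sigma^2/2$ of the Theorem.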

27 Proof Outline IV. Finally, the right choice of ε for the two regimes of β gives the phase transition in the lower bound:

$$\liminf_{n\to\infty} \frac{E[\log Z] + \hat\beta\,\mu\sqrt{N \log m}}{\log m} \ge \begin{cases} 1 + \hat\beta^2\sigma^2/2, & \hat\beta < \sqrt{2}/\sigma,\\ \sqrt{2}\,\hat\beta\sigma, & \hat\beta \ge \sqrt{2}/\sigma. \end{cases}$$

The same phase transition holds for the upper bound, which is easier to prove (no dependencies need to be computed).

28 Possible Application

31 Application Setting: Gibbs Sampling Regularizer. Remember our approach. Q: what is regularizing here? A: the choice of β essentially controls the width of the Gibbs distribution. Q: how to choose β? A: maximize the expected log-convolution of the posteriors over two independent instances X′ and X′′:

$$\beta^{*} = \arg\max_{\beta}\, E_{X', X''}\bigg[\log \sum_{c} p_\beta(c \mid X')\, p_\beta(c \mid X'')\bigg].$$
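A toy implementation of this criterion (my sketch, reusing the hypothetical enumeration from the earlier examples; a single pair of instances stands in for the expectation): draw two noisy versions X′, X′′ of the same ground truth, score a grid of β, and keep the maximizer.

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def energies(W, d):
    """Cost of every size-d subset cut (toy solution space)."""
    n = len(W)
    out = []
    for c in itertools.combinations(range(n), d):
        mask = np.zeros(n, dtype=bool)
        mask[list(c)] = True
        out.append(W[mask][:, ~mask].sum())
    return np.asarray(out)

def log_conv_score(E1, E2, beta):
    """log sum_c p_beta(c|X') p_beta(c|X'') over enumerated solutions."""
    lp1 = -beta * E1 - logsumexp(-beta * E1)  # log p_beta(c | X')
    lp2 = -beta * E2 - logsumexp(-beta * E2)  # log p_beta(c | X'')
    return logsumexp(lp1 + lp2)

rng = np.random.default_rng(4)
n, d = 8, 3
truth = rng.normal(1.0, 0.5, size=(n, n))
X1 = truth + rng.normal(0, 0.3, size=(n, n))  # two noisy observations
X2 = truth + rng.normal(0, 0.3, size=(n, n))  # of the same ground truth
E1 = energies((X1 + X1.T) / 2, d)
E2 = energies((X2 + X2.T) / 2, d)

betas = np.linspace(0.1, 20.0, 100)
scores = [log_conv_score(E1, E2, b) for b in betas]
print("beta* ~", betas[int(np.argmax(scores))])
```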

33 Application Setting: Intuition. The log-convolution is almost a cross entropy; it stabilizes the solution output. [Figure: three panels showing $p_\beta(\cdot \mid X')$, $p_\beta(\cdot \mid X'')$ and the optima $c^{\mathrm{opt}}(X')$, $c^{\mathrm{opt}}(X'')$ over the solution space, for β = 0.25 (underfitting), β = 2.8 (near-optimal, β ≈ β*) and β = 19.5 (overfitting).]

34 Corollary and Application

36 Corollary: Log-Convolution. The log-convolution score can be rewritten as

$$E \log \sum_{c} p_\beta(c \mid X')\, p_\beta(c \mid X'') = \underbrace{E \log Z(\beta, X' + X'')}_{\text{Thm}} - \underbrace{2\, E \log Z(\beta, X)}_{\text{Thm}},$$

so both terms, and hence the score, can be computed analytically via the Theorem: more noise calls for a smaller β, less noise for a greater β. [Plot: the optimal β as a function of the noise-to-signal ratio.]
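The rewriting follows in two lines when the cost is additive in the input, i.e. $\mathrm{cost}(c, X') + \mathrm{cost}(c, X'') = \mathrm{cost}(c, X' + X'')$ (my sketch of the step):

$$\sum_c p_\beta(c \mid X')\, p_\beta(c \mid X'') = \frac{\sum_c e^{-\beta[\mathrm{cost}(c, X') + \mathrm{cost}(c, X'')]}}{Z(\beta, X')\, Z(\beta, X'')} = \frac{Z(\beta, X' + X'')}{Z(\beta, X')\, Z(\beta, X'')},$$

and taking logarithms and expectations (with X′, X′′ i.i.d. copies of X) gives the displayed formula, so both terms fall under the Theorem's asymptotics.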

37 Conclusion. We computed the value of E log Z asymptotically precisely for solution spaces with small dependencies. This has applications to model validation in machine learning (submitted to the J. of Theor. CS): the method involves comparing two fluctuating Gibbs distributions. We have still not found a fully general way (and probably there is no such way).

38 Thanks for your attention!

39 References I

E. T. Jaynes, Information Theory and Statistical Mechanics. Phys. Rev. 106, 1957.

W. Szpankowski, Combinatorial optimization problems for which almost every algorithm is asymptotically optimal. Optimization 33, 1995.

M. Talagrand, Spin Glasses: A Challenge for Mathematicians. Springer, NY, 2003.

40 References II

J. Vannimenus and M. Mézard, On the Statistical Mechanics of Optimization Problems of the Traveling Salesman Type. J. de Physique Lettres 45, 1984.

J. M. Buhmann, A. Gronskiy and W. Szpankowski, Free Energy Rates for a Class of Combinatorial Optimization Problems. AofA '14, Paris.

J. M. Buhmann, A. Gronskiy, J. Dumazert and W. Szpankowski, Phase Transitions in Parameter Rich Optimization Problems. SODA/ANALCO '17, Barcelona.
