Free Energy Rates for Parameter Rich Optimization Problems
Joachim M. Buhmann (1), Julien Dumazert (1), Alexey Gronskiy (1), Wojciech Szpankowski (2)
(1) ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch, julien.dumazert@gmail.com
(2) Purdue University, spa@cs.purdue.edu
AofA Workshop, Princeton, 20 June 2017
Roadmap
- Generic problem
- Case study: sparse minimum bisection
- Main result formulation
- Proof outline
- Application
- Corollary
Generic Problem
Generic Problem: Gibbs Sampling
We search for solutions c given an input X.
The input X (e.g. a graph) is random!
We still want the solution c to be stable, so we define a maximum-entropy Gibbs measure over solutions:
p(c | X) ∝ exp(−β cost(c, X)).
Generic Problem: Free Energy
Now p(c | X) = exp(−β cost(c, X) − F(X)), where F(X) is the free energy:
F(X) = log Z(X), with partition function Z(X) = Σ_c exp(−β cost(c, X)).
Goal: compute the expected free energy E_X F(X) = E_X log Z(X). This is the hard part.
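As a concrete illustration (a minimal sketch of my own, not from the talk; the function name and toy data are hypothetical), the Gibbs measure and free energy can be computed stably for a toy solution space with explicitly enumerated costs:

```python
import numpy as np
from scipy.special import logsumexp

def gibbs_and_free_energy(costs, beta):
    """Gibbs measure p(c|X) and free energy F(X) = log Z(X) over an
    explicit list of solution costs cost(c, X)."""
    log_weights = -beta * np.asarray(costs, dtype=float)
    log_Z = logsumexp(log_weights)            # F(X) = log Z(X), computed stably
    p = np.exp(log_weights - log_Z)           # p(c|X) = exp(-beta*cost(c,X) - F(X))
    return p, log_Z

# toy instance: 5 candidate solutions with random costs
rng = np.random.default_rng(0)
p, F = gibbs_and_free_energy(rng.normal(size=5), beta=2.0)
print(p.sum(), F)                             # p sums to 1
```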
Case Study: Sparse Minimum Bisection
Case Study: Sparse Minimum Bisection
Find two subgraphs of some small size d with minimum edge cost between them.
Case Study: Sparse Minimum Bisection
Flashback to the generic problem: in high dimensions, solving may require sampling!
Solutions are dependent on each other, so this is not the Random Energy Model (next slide)!
Contribution: the free energy asymptotics are solved!
Another case study, the Lawler Quadratic Assignment Problem, is solved as well (see the paper).
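For intuition about the search space, here is a hypothetical brute-force sketch (my own, workable only at toy sizes; the talk's point is precisely that realistic sizes need sampling):

```python
import itertools
import numpy as np

def sparse_min_bisection(W, d):
    """Brute force: two disjoint vertex sets of size d minimizing the
    total edge weight between them. Only feasible for tiny n and d."""
    n = W.shape[0]
    best_cost, best_pair = np.inf, None
    for left in itertools.combinations(range(n), d):
        rest = [v for v in range(n) if v not in left]
        for right in itertools.combinations(rest, d):
            cost = W[np.ix_(left, right)].sum()    # cut weight between the sides
            if cost < best_cost:
                best_cost, best_pair = cost, (left, right)
    return best_cost, best_pair

rng = np.random.default_rng(1)
W = rng.random((8, 8))
W = (W + W.T) / 2                                  # symmetric edge weights
print(sparse_min_bisection(W, d=2))
```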
Free Energy: History and Advances
[Derrida 80] established the REM (Random Energy Model): no dependencies between solution energies.
[Talagrand 02]: systematization; a simple proof of the REM free energy phase transition, and beyond:

\[
\lim_{n \to \infty} \frac{E[\log Z(\beta, X)]}{n} =
\begin{cases}
\dfrac{\beta^2}{4} + \log 2, & \beta < 2\sqrt{\log 2}, \\
\beta \sqrt{\log 2}, & \beta \ge 2\sqrt{\log 2}.
\end{cases}
\]
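A rough Monte Carlo check of this limit (my own sketch, not from the slides), using the standard REM convention of 2^n i.i.d. energies drawn from N(0, n/2):

```python
import numpy as np
from scipy.special import logsumexp

def rem_log_Z_rate(beta, n=18, trials=10, seed=2):
    """Monte Carlo estimate of E[log Z]/n for the REM with 2^n i.i.d.
    energies ~ N(0, n/2)."""
    rng = np.random.default_rng(seed)
    vals = [logsumexp(-beta * rng.normal(0.0, np.sqrt(n / 2), size=2**n)) / n
            for _ in range(trials)]
    return float(np.mean(vals))

def rem_limit(beta):
    """Talagrand's limit: beta^2/4 + log 2 below beta_c = 2*sqrt(log 2),
    beta*sqrt(log 2) above."""
    beta_c = 2 * np.sqrt(np.log(2))
    return beta**2 / 4 + np.log(2) if beta < beta_c else beta * np.sqrt(np.log(2))

for beta in (0.5, 1.0, 2.0, 3.0):
    print(beta, round(rem_log_Z_rate(beta), 3), round(rem_limit(beta), 3))
```

At finite n the empirical rate sits slightly below the limit in the frozen phase, which is the expected finite-size effect.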
Main Result
Main Result: Setting
- m is the number of solutions, with log m = o(n).
- N is the size of one solution (the number of parameters).
- We require the problem to be parameter rich: N ≥ 10 log m.
Main Result: 2nd-Order Phase Transition
Main theorem. Consider the sparse MBP on a complete graph, with edge weights that are mutually independent within any given solution and have mean μ and variance σ². Then the following holds:

\[
\lim_{n \to \infty} \frac{E[\log Z] + \hat{\beta} \mu \sqrt{N \log m}}{\log m} =
\begin{cases}
1 + \hat{\beta}^2 \sigma^2 / 2, & \hat{\beta} < \sqrt{2}/\sigma, \\
\sqrt{2}\, \hat{\beta} \sigma, & \hat{\beta} \ge \sqrt{2}/\sigma,
\end{cases}
\]

provided log n ≪ d ≪ n^{2/7} / log n.
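To see the shape of the limit, here is a small sketch of the right-hand side as reconstructed above (my reading of the garbled formula, so treat the branch boundary as an assumption). Both branches and their first derivatives agree at the critical point √2/σ (value 2, slope √2 σ); the kink appears only in the second derivative, which is why the transition is second order:

```python
import numpy as np

def limit_rate(beta_hat, sigma):
    """Right-hand side of the theorem, as reconstructed above."""
    beta_c = np.sqrt(2) / sigma                # second-order transition point
    if beta_hat < beta_c:
        return 1 + beta_hat**2 * sigma**2 / 2  # high-temperature branch
    return np.sqrt(2) * beta_hat * sigma       # low-temperature branch

sigma = 1.0
for b in (0.5, np.sqrt(2), 2.0):
    print(b, limit_rate(b, sigma))
```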
Proof Outline
Proof Outline I
Introduce the solution overlap D: the average intersection of two bisections.
D is the key to understanding the dependencies (remember, we are not in the REM!).
Lemma 1. Over a random choice of two solutions, E D = O(d⁴/n). (A Monte Carlo sketch follows below.)
Roadmap of the proof:
- Compute dependencies: P(Z close to EZ) via Var Z
- Break E log Z into two conditional parts
- Final bounding via adjusting a parameter
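A hypothetical Monte Carlo sketch of the mean overlap of two uniformly random sparse bisections, measured as shared cut edges; the conventions here are my own guess, so the constants need not match Lemma 1, but the decay with n at fixed d is visible:

```python
import numpy as np

def sample_cut(n, d, rng):
    """Cut edges of one random sparse bisection: d 'left' and d 'right'
    vertices drawn uniformly without replacement."""
    verts = rng.choice(n, size=2 * d, replace=False)
    left, right = verts[:d], verts[d:]
    return {(min(u, v), max(u, v)) for u in left for v in right}

def mean_overlap(n, d, trials=2000, seed=3):
    rng = np.random.default_rng(seed)
    return np.mean([len(sample_cut(n, d, rng) & sample_cut(n, d, rng))
                    for _ in range(trials)])

for n in (100, 200, 400):
    print(n, mean_overlap(n, d=5))   # overlap shrinks as n grows, d fixed
```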
Proof Outline II
Step: compute dependencies, i.e. P(Z close to EZ), via Var Z.
Introduce the event A, which happens when Z is close to EZ, i.e. A := {Z ≥ ε EZ}.
The goal is to compute P(A).
Fact 1. P(A) can be bounded via Var Z using Chebyshev's inequality.
Lemma 2 [Buhmann et al., 2014]. Var Z can be asymptotically approximated via the expected overlap of a random choice of solutions:
Var Z ≲ (EZ)² · σ² β² · E D.
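To make Fact 1 concrete, here is the standard Chebyshev step written out (my reconstruction; the slide only states the fact):

```latex
% With A = \{ Z \ge \epsilon\,\mathbb{E}Z \} and \epsilon \in (0,1):
\[
  \mathbb{P}(\bar{A})
    = \mathbb{P}\bigl(Z < \epsilon\,\mathbb{E}Z\bigr)
    \le \mathbb{P}\bigl(|Z - \mathbb{E}Z| > (1-\epsilon)\,\mathbb{E}Z\bigr)
    \le \frac{\operatorname{Var} Z}{(1-\epsilon)^2 (\mathbb{E}Z)^2}.
\]
% Plugging in Lemma 2, the right-hand side is of order
% \sigma^2 \beta^2 \,\mathbb{E}D / (1-\epsilon)^2,
% which vanishes whenever \sigma^2 \beta^2 \,\mathbb{E}D \to 0.
```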
Proof Outline III
Step: break E log Z into two conditional parts.
Fact 3.
E log Z = E[log Z | A] P(A) + E[log Z · 1(Ā)]
        ≥ (log EZ + log ε) P(A) + E[log Z · 1(Ā)].
We can expand log EZ via a Taylor expansion (using the assumptions of the theorem) and bound P(A) by the previous step.
Fact 4. For E[log Z · 1(Ā)], a loose bound is enough.
Proof Outline IV
Step: final bounding via adjusting the parameter ε.
Finally, the right choice of ε for the two regimes of β gives the phase transition in the lower bound:

\[
\liminf_{n \to \infty} \frac{E[\log Z] + \hat{\beta} \mu \sqrt{N \log m}}{\log m} \ge
\begin{cases}
1 + \hat{\beta}^2 \sigma^2 / 2, & \hat{\beta} < \sqrt{2}/\sigma, \\
\sqrt{2}\, \hat{\beta} \sigma, & \hat{\beta} \ge \sqrt{2}/\sigma.
\end{cases}
\]

The same phase transition holds for the upper bound, which is easier to prove (no dependency computation needed).
Possible Application
Application Setting: Gibbs Sampling Regularizer
Remember our approach: p_β(c | X) ∝ exp(−β cost(c, X)).
Q: what is regularizing here? A: the choice of β, which essentially controls the width of the distribution.
Q: how do we choose it? A: maximize the expected log-convolution over two independent instances X′, X″:

β* = arg max_β E_{X′,X″} [ log Σ_c p_β(c | X′) p_β(c | X″) ]
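A hypothetical sketch (my own; the toy cost model and names are assumptions) of picking β by this criterion on two noisy instances with a shared ground truth, for a solution space small enough to enumerate:

```python
import numpy as np
from scipy.special import logsumexp

def log_convolution(costs1, costs2, beta):
    """log sum_c p_beta(c|X1) p_beta(c|X2) for an enumerable solution space."""
    lp1 = -beta * costs1 - logsumexp(-beta * costs1)   # log p_beta(c | X1)
    lp2 = -beta * costs2 - logsumexp(-beta * costs2)   # log p_beta(c | X2)
    return logsumexp(lp1 + lp2)

rng = np.random.default_rng(4)
truth = rng.normal(size=50)                        # shared signal in the costs
costs1 = truth + 0.5 * rng.normal(size=50)         # noisy instance X1
costs2 = truth + 0.5 * rng.normal(size=50)         # noisy instance X2
betas = np.linspace(0.01, 20.0, 400)
scores = [log_convolution(costs1, costs2, b) for b in betas]
print("empirical beta* ~", betas[int(np.argmax(scores))])
```

Very small β scores poorly because both distributions are near-uniform; very large β scores poorly because each distribution overfits its own noise, which is the trade-off the next slide illustrates.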
Application Setting: Intuition
The log-convolution is almost a cross entropy; it stabilizes the solution output.
[Figure: Gibbs distributions p_β(· | X′) and p_β(· | X″) over the solution space at three temperatures, with the optimal solutions c_opt(X′) and c_opt(X″) marked: β = 0.25 (underfitting), β = 2.8 (near-optimal), β = 19.5 (overfitting).]
Corollary and Application
Corollary: Log-Convolution
The log-convolution score can be rewritten as

E log Σ_c p_β(c | X′) p_β(c | X″) = E log Z(β, X′ + X″) − 2 E log Z(β, X),

where both terms on the right are covered by the main theorem. Thus the score can be computed analytically: more noise calls for a smaller β, less noise allows a greater β, with the optimal β governed by the noise-to-signal ratio.
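A quick numerical check of the exact identity behind this rewriting (my sketch; it relies on costs being additive, cost(c, X′ + X″) = cost(c, X′) + cost(c, X″)):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(5)
beta = 1.7
c1 = rng.normal(size=40)        # cost(c, X') over all solutions c
c2 = rng.normal(size=40)        # cost(c, X'') over the same solutions

# left-hand side: log sum_c p_beta(c|X') p_beta(c|X'')
lhs = logsumexp((-beta * c1 - logsumexp(-beta * c1)) +
                (-beta * c2 - logsumexp(-beta * c2)))
# right-hand side: log Z(beta, X'+X'') - log Z(beta, X') - log Z(beta, X'')
rhs = logsumexp(-beta * (c1 + c2)) - logsumexp(-beta * c1) - logsumexp(-beta * c2)
print(np.isclose(lhs, rhs))     # True
```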
Conclusion
We computed, asymptotically precisely, the value of E log Z in the presence of small dependencies.
This has applications to model validation in machine learning (submitted to the Journal of Theoretical Computer Science): the method involves comparing two fluctuating Gibbs distributions.
We have still not found a fully general way (and probably there is no such way).
Thanks for your attention!
References
E. T. Jaynes, Information Theory and Statistical Mechanics. Phys. Rev., 106, 1957.
W. Szpankowski, Combinatorial optimization problems for which almost every algorithm is asymptotically optimal. Optimization, 33, 1995.
M. Talagrand, Spin Glasses: A Challenge for Mathematicians. Springer, NY.
J. Vannimenus and M. Mézard, On the statistical mechanics of optimization problems of the traveling salesman type. J. de Physique Lettres, 45, 1984.
J. M. Buhmann, A. Gronskiy and W. Szpankowski, Free Energy Rates for a Class of Combinatorial Optimization Problems. AofA'14, Paris.
J. M. Buhmann, A. Gronskiy, J. Dumazert and W. Szpankowski, Phase Transitions in Parameter Rich Optimization Problems. SODA/ANALCO'17, Barcelona.
More information