Machine Learning Theory Lecture 4
|
|
- Justina Crawford
- 5 years ago
- Views:
Transcription
1 Machine Learning Theory Lecture 4 Nicholas Harvey October 9, Basic Probability One of the first concentration bounds that you learn in probability theory is Markov s inequality. It bounds the right-tail of a random variable, using very few assumptions. Theorem 1.1 (Markov s Inequality). Let Y be a real-valued random variable that assumes only nonnegative values. Then, for all a > 0, Pr Y a E Y. a Wikipedia, 1, Equation B.3, Grimmett-Stirzaker Lemma 7..7, Durrett Theorem Unions and Intersections Another useful tool is the union bound. We typically use this to show that no bad events should happen. Fact 1. (Union Bound). Let E 1 and E be arbitrary events, not necessarily independent. Then Pr E 1 E Pr E 1 + Pr E. Often we use it in the reverse direction, to show all good events should happen. Fact 1.3 (Reverse Union Bound). Let F 1 and F be arbitrary events, not necessarily independent. Suppose that Pr F 1 1 p 1 and Pr F 1 p. Then Pr F 1 F 1 (p 1 + p ). Proof. Let E i = F i. Then Pr E i p i. Then Pr F 1 F = 1 Pr F 1 F = 1 Pr F 1 F (complementary event) (De Morgan s law) = 1 Pr E 1 E (definition of E i ) 1 (p 1 + p ) (union bound). Another useful trick concerns the union of independent events. If F 1 and F are each likely to happen, and independent, then their union is even more likely to happen. 1
2 Fact 1.4 (Union Bound). Let F 1 and F be independent events. Suppose that Pr F 1 1 p 1 and Pr F 1 p. Then Pr F 1 F = 1 p 1 p. Proof. Observe that Pr F i pi. So Pr F 1 F = 1 Pr F 1 F = 1 Pr F 1 Pr F 1 p 1 p. (De Morgan s law) (independence) Hoeffding s Inequality Theorem.1. Let X 1,..., X n be independent random variables such that X i always lies in the interval 0, 1. Define X = n X i. Then Pr X E X t exp( t /n) t 0. Wikipedia, 1, Lemma B.6. We will prove a weaker result where exponent is decreased from to 1/. Simplifications. First of all, we will center the random variables, which cleans up the inequality by eliminating the expectation. Define ˆX i = X i E X i and ˆX = n ˆX i. Note that 1 ˆX i 1, 1. Our main argument is to prove that Pr ˆX t exp( t /n). (.1) The same argument also applies to ˆX, so we get that Pr ˆX t = Pr ˆX t exp( t /n). Combining them with a union bound, we get Pr X E X t = Pr ˆX t Pr ˆX t + Pr ˆX t exp( t /n). This proves the theorem (with the weaker exponent). Proof (of (.1)). The Hoeffding inequality crucially relies on mutual independence of the ˆX i random variables. How can we exploit independence in the proof? What special properties to independent random variables have? One basic property is that for any independent random variables A and B. E A B = E A E B (.) 1 This step is where the argument is not careful enough to obtain the optimal exponent: ˆXi is supported on an interval of length 1, not an interval of length.
3 Key Idea #1: The Hoeffding inequality has nothing to do with products of random variables, it is about sums of random variables. So one trick we could try is to convert sums into products using the exponential function. Fix some parameter λ > 0 whose value we will choose later. Define Y i = exp(λ ˆX i ) Y = exp(λ ˆX) = exp ( λ n ˆX i ) = exp(λ ˆX i ) = Y i. It is easy to check that, since {X 1,..., X n } is mutually independent, so is { ˆX1,..., ˆ Xn } and {Y 1,..., Y n }. Therefore, by (.), E Y = E Y i. (.3) So far this all seems quite good. We want to prove that ˆX is small, which is equivalent to proving Y is small. Using (.3), we can do this by showing that the E Y i terms are small. Doing so involves an extremely useful tool. Key Idea #: The second main idea is a clever trick to bound terms of the form E exp(λa), where A is a mean-zero random variable. We discuss this idea in more detail in the next subsection. We will use Claim. to show E Y i = E exp(λ ˆX i ) exp(λ /). Thus, combining this with (.3), E Y exp(λ /) = exp(λ n/). (.4) Now we are ready to prove Hoeffding s inequality: Pr ˆX t = Pr exp(λ ˆX) exp ( λt ) (by monotonicity of e x ) E exp(λ ˆX) (by Markov s inequality (Theorem 1.1)) exp(λt) = E Y exp( λt) exp(λ n/ λt) (by (.4)) = exp( t /n), by optimizing to get λ = t/n.. If we were more careful here and instead used Lemma.4, we could improve the exponent in (.1) from 1/ to 3
4 .1 Exponentiated Mean-Zero RVs The second main idea of Hoeffding s inequality is the following claim. Claim.. Let A be a random variable such that A 1 with probability 1 and E A = 0. Then for any λ > 0, we have E exp(λa) exp(λ /). Intuitively, the expectation should be maximized by the random variable A that is uniform on { 1, +1}. In this case, E exp(λa) = 1 eλ 1 e λ e λ /. This inequality is a nice bound on the hyperbolic cosine function (Claim.3). The full proof of Claim. basically reduces to the case of A { 1, 1} using convexity of e x. Proof. Define p = (1 + A)/ and q = (1 A)/. Observe that p, q 0, p + q = 1, and p q = A. By convexity, exp(λa) = exp ( λ(p q) ) = exp ( λp + ( λ)q ) p exp(λ) + q exp( λ) = eλ + e λ + A (eλ e λ ). Thus, e λ + e λ E exp(λa) E + A (eλ e λ ) = eλ + e λ, since E A = 0. This last quantity is bounded by the following technical claim. Claim.3 (Approximation of Cosh). For any real x, we have (e x + e x )/ exp(x /). A more general result can be found in Alon & Spencer Lemma A.1.5. Proof. First observe that the product of all the even numbers at most n does not exceed the product of all numbers at most n. In symbols, n (n!) = (i) i = (n)! Now to bound (e x + e x )/, we write it as a Taylor series and observe that the odd terms cancel. e x + e x = n 0 x n n! + ( x) n n! n 0 = n 0 x n (n)! n 0 x n n (n!) = (x /) n n! n 0 = exp(x /). A common scenario is that A is mean-zero, but lies in an asymmetric interval a, b, where a < 0 < b. A slightly tighter version of these MGF bounds can be derived for this scenario. Lemma.4 (Hoeffding s Lemma). Let A be a random variable such that A a, b with probability 1 and E A = 0. Then for any λ > 0, we have E exp(λa) exp ( λ (b a) /8 ). Wikipedia, 1, Lemma B.7. The proof uses ideas similar to the proof of Claim., except we cannot use Claim.3 and must instead use an ad-hoc calculus argument. 4
5 References 1 Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press,
March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.
Florida State University March 1, 2018 Framework 1. (Lizhe) Basic inequalities Chernoff bounding Review for STA 6448 2. (Lizhe) Discrete-time martingales inequalities via martingale approach 3. (Boning)
More information1 Large Deviations. Korea Lectures June 2017 Joel Spencer Tuesday Lecture III
Korea Lectures June 017 Joel Spencer Tuesday Lecture III 1 Large Deviations We analyze how random variables behave asymptotically Initially we consider the sum of random variables that take values of 1
More informationCPSC 536N: Randomized Algorithms Term 2. Lecture 2
CPSC 536N: Randomized Algorithms 2014-15 Term 2 Prof. Nick Harvey Lecture 2 University of British Columbia In this lecture we continue our introduction to randomized algorithms by discussing the Max Cut
More information1.1 Basic Algebra. 1.2 Equations and Inequalities. 1.3 Systems of Equations
1. Algebra 1.1 Basic Algebra 1.2 Equations and Inequalities 1.3 Systems of Equations 1.1 Basic Algebra 1.1.1 Algebraic Operations 1.1.2 Factoring and Expanding Polynomials 1.1.3 Introduction to Exponentials
More information1 Delayed Renewal Processes: Exploiting Laplace Transforms
IEOR 6711: Stochastic Models I Professor Whitt, Tuesday, October 22, 213 Renewal Theory: Proof of Blackwell s theorem 1 Delayed Renewal Processes: Exploiting Laplace Transforms The proof of Blackwell s
More informationIFT Lecture 7 Elements of statistical learning theory
IFT 6085 - Lecture 7 Elements of statistical learning theory This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s): Brady Neal and
More informationLectures 6: Degree Distributions and Concentration Inequalities
University of Washington Lecturer: Abraham Flaxman/Vahab S Mirrokni CSE599m: Algorithms and Economics of Networks April 13 th, 007 Scribe: Ben Birnbaum Lectures 6: Degree Distributions and Concentration
More information1 Review of The Learning Setting
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we
More informationSpring 2014 Advanced Probability Overview. Lecture Notes Set 1: Course Overview, σ-fields, and Measures
36-752 Spring 2014 Advanced Probability Overview Lecture Notes Set 1: Course Overview, σ-fields, and Measures Instructor: Jing Lei Associated reading: Sec 1.1-1.4 of Ash and Doléans-Dade; Sec 1.1 and A.1
More information11.1 Set Cover ILP formulation of set cover Deterministic rounding
CS787: Advanced Algorithms Lecture 11: Randomized Rounding, Concentration Bounds In this lecture we will see some more examples of approximation algorithms based on LP relaxations. This time we will use
More informationEntropy and Ergodic Theory Lecture 15: A first look at concentration
Entropy and Ergodic Theory Lecture 15: A first look at concentration 1 Introduction to concentration Let X 1, X 2,... be i.i.d. R-valued RVs with common distribution µ, and suppose for simplicity that
More informationCSE 525 Randomized Algorithms & Probabilistic Analysis Spring Lecture 3: April 9
CSE 55 Randomized Algorithms & Probabilistic Analysis Spring 01 Lecture : April 9 Lecturer: Anna Karlin Scribe: Tyler Rigsby & John MacKinnon.1 Kinds of randomization in algorithms So far in our discussion
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More informationHigh Dimensional Probability
High Dimensional Probability for Mathematicians and Data Scientists Roman Vershynin 1 1 University of Michigan. Webpage: www.umich.edu/~romanv ii Preface Who is this book for? This is a textbook in probability
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More informationLecture 5: Probabilistic tools and Applications II
T-79.7003: Graphs and Networks Fall 2013 Lecture 5: Probabilistic tools and Applications II Lecturer: Charalampos E. Tsourakakis Oct. 11, 2013 5.1 Overview In the first part of today s lecture we will
More informationStatistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 2: Introduction to statistical learning theory. 1 / 22 Goals of statistical learning theory SLT aims at studying the performance of
More informationTail and Concentration Inequalities
CSE 694: Probabilistic Analysis and Randomized Algorithms Lecturer: Hung Q. Ngo SUNY at Buffalo, Spring 2011 Last update: February 19, 2011 Tail and Concentration Ineualities From here on, we use 1 A to
More informationLecture 4. P r[x > ce[x]] 1/c. = ap r[x = a] + a>ce[x] P r[x = a]
U.C. Berkeley CS273: Parallel and Distributed Theory Lecture 4 Professor Satish Rao September 7, 2010 Lecturer: Satish Rao Last revised September 13, 2010 Lecture 4 1 Deviation bounds. Deviation bounds
More informationActive Learning: Disagreement Coefficient
Advanced Course in Machine Learning Spring 2010 Active Learning: Disagreement Coefficient Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz In previous lectures we saw examples in which
More informationLarge Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem
Chapter 34 Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem This chapter proves the Gärtner-Ellis theorem, establishing an LDP for not-too-dependent processes taking values in
More informationExponential Tail Bounds
Exponential Tail Bounds Mathias Winther Madsen January 2, 205 Here s a warm-up problem to get you started: Problem You enter the casino with 00 chips and start playing a game in which you double your capital
More informationTail Inequalities. The Chernoff bound works for random variables that are a sum of indicator variables with the same distribution (Bernoulli trials).
Tail Inequalities William Hunt Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV William.Hunt@mail.wvu.edu Introduction In this chapter, we are interested
More informationSTA 711: Probability & Measure Theory Robert L. Wolpert
STA 711: Probability & Measure Theory Robert L. Wolpert 6 Independence 6.1 Independent Events A collection of events {A i } F in a probability space (Ω,F,P) is called independent if P[ i I A i ] = P[A
More informationThe Completion of a Metric Space
The Completion of a Metric Space Let (X, d) be a metric space. The goal of these notes is to construct a complete metric space which contains X as a subspace and which is the smallest space with respect
More informationConvergence of Random Walks and Conductance - Draft
Graphs and Networks Lecture 0 Convergence of Random Walks and Conductance - Draft Daniel A. Spielman October, 03 0. Overview I present a bound on the rate of convergence of random walks in graphs that
More informationLecture 9. = 1+z + 2! + z3. 1 = 0, it follows that the radius of convergence of (1) is.
The Exponential Function Lecture 9 The exponential function 1 plays a central role in analysis, more so in the case of complex analysis and is going to be our first example using the power series method.
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationHoeffding, Chernoff, Bennet, and Bernstein Bounds
Stat 928: Statistical Learning Theory Lecture: 6 Hoeffding, Chernoff, Bennet, Bernstein Bounds Instructor: Sham Kakade 1 Hoeffding s Bound We say X is a sub-gaussian rom variable if it has quadratically
More information1.1.1 Algebraic Operations
1.1.1 Algebraic Operations We need to learn how our basic algebraic operations interact. When confronted with many operations, we follow the order of operations: Parentheses Exponentials Multiplication
More informationEE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16
EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt
More informationMATH 1A, Complete Lecture Notes. Fedor Duzhin
MATH 1A, Complete Lecture Notes Fedor Duzhin 2007 Contents I Limit 6 1 Sets and Functions 7 1.1 Sets................................. 7 1.2 Functions.............................. 8 1.3 How to define a
More informationCOMPSCI 240: Reasoning Under Uncertainty
COMPSCI 240: Reasoning Under Uncertainty Andrew Lan and Nic Herndon University of Massachusetts at Amherst Spring 2019 Lecture 20: Central limit theorem & The strong law of large numbers Markov and Chebyshev
More informationMAT128A: Numerical Analysis Lecture Three: Condition Numbers
MAT128A: Numerical Analysis Lecture Three: Condition Numbers October 1, 2018 Lecture 1 October 1, 2018 1 / 26 An auspicious example Last time, we saw that the naive evaluation of the function f (x) = 1
More informationLearning Theory. Sridhar Mahadevan. University of Massachusetts. p. 1/38
Learning Theory Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts p. 1/38 Topics Probability theory meet machine learning Concentration inequalities: Chebyshev, Chernoff, Hoeffding, and
More informationLecture 2: Review of Basic Probability Theory
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More informationLecture 1 Measure concentration
CSE 29: Learning Theory Fall 2006 Lecture Measure concentration Lecturer: Sanjoy Dasgupta Scribe: Nakul Verma, Aaron Arvey, and Paul Ruvolo. Concentration of measure: examples We start with some examples
More informationTheorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1
Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be
More informationCS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms
CS261: A Second Course in Algorithms Lecture #18: Five Essential Tools for the Analysis of Randomized Algorithms Tim Roughgarden March 3, 2016 1 Preamble In CS109 and CS161, you learned some tricks of
More informationLecture 3: AC 0, the switching lemma
Lecture 3: AC 0, the switching lemma Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Meng Li, Abdul Basit 1 Pseudorandom sets We start by proving
More informationSTAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.
STAT 302 Introduction to Probability Learning Outcomes Textbook: A First Course in Probability by Sheldon Ross, 8 th ed. Chapter 1: Combinatorial Analysis Demonstrate the ability to solve combinatorial
More informationCS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms
CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden March 9, 2017 1 Preamble Our first lecture on smoothed analysis sought a better theoretical
More informationFIRST ORDER SENTENCES ON G(n, p), ZERO-ONE LAWS, ALMOST SURE AND COMPLETE THEORIES ON SPARSE RANDOM GRAPHS
FIRST ORDER SENTENCES ON G(n, p), ZERO-ONE LAWS, ALMOST SURE AND COMPLETE THEORIES ON SPARSE RANDOM GRAPHS MOUMANTI PODDER 1. First order theory on G(n, p) We start with a very simple property of G(n,
More informationHMMT February 2018 February 10, 2018
HMMT February 018 February 10, 018 Algebra and Number Theory 1. For some real number c, the graphs of the equation y = x 0 + x + 18 and the line y = x + c intersect at exactly one point. What is c? 18
More information2 = = 0 Thus, the number which is largest in magnitude is equal to the number which is smallest in magnitude.
Limits at Infinity Two additional topics of interest with its are its as x ± and its where f(x) ±. Before we can properly discuss the notion of infinite its, we will need to begin with a discussion on
More informationICML '97 and AAAI '97 Tutorials
A Short Course in Computational Learning Theory: ICML '97 and AAAI '97 Tutorials Michael Kearns AT&T Laboratories Outline Sample Complexity/Learning Curves: nite classes, Occam's VC dimension Razor, Best
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More information1 Recap: Interactive Proofs
Theoretical Foundations of Cryptography Lecture 16 Georgia Tech, Spring 2010 Zero-Knowledge Proofs 1 Recap: Interactive Proofs Instructor: Chris Peikert Scribe: Alessio Guerrieri Definition 1.1. An interactive
More informationLecture 2: Random Variables and Expectation
Econ 514: Probability and Statistics Lecture 2: Random Variables and Expectation Definition of function: Given sets X and Y, a function f with domain X and image Y is a rule that assigns to every x X one
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 59 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationCS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms
CS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden November 5, 2014 1 Preamble Previous lectures on smoothed analysis sought a better
More information6.842 Randomness and Computation March 3, Lecture 8
6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n
More informationPOISSON PROCESSES 1. THE LAW OF SMALL NUMBERS
POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted
More informationThe first bound is the strongest, the other two bounds are often easier to state and compute. Proof: Applying Markov's inequality, for any >0 we have
The first bound is the strongest, the other two bounds are often easier to state and compute Proof: Applying Markov's inequality, for any >0 we have Pr (1 + ) = Pr For any >0, we can set = ln 1+ (4.4.1):
More informationFirst In-Class Exam Solutions Math 410, Professor David Levermore Monday, 1 October 2018
First In-Class Exam Solutions Math 40, Professor David Levermore Monday, October 208. [0] Let {b k } k N be a sequence in R and let A be a subset of R. Write the negations of the following assertions.
More informationNotes 6 : First and second moment methods
Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative
More informationChapter 13. Convex and Concave. Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44
Chapter 13 Convex and Concave Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44 Monotone Function Function f is called monotonically increasing, if x 1 x 2 f (x 1 ) f (x 2 ) It
More informationFooling Sets and. Lecture 5
Fooling Sets and Introduction to Nondeterministic Finite Automata Lecture 5 Proving that a language is not regular Given a language, we saw how to prove it is regular (union, intersection, concatenation,
More informationLECTURE 10: REVIEW OF POWER SERIES. 1. Motivation
LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the
More informationQF101: Quantitative Finance August 22, Week 1: Functions. Facilitator: Christopher Ting AY 2017/2018
QF101: Quantitative Finance August 22, 2017 Week 1: Functions Facilitator: Christopher Ting AY 2017/2018 The chief function of the body is to carry the brain around. Thomas A. Edison 1.1 What is a function?
More information5 + 9(10) + 3(100) + 0(1000) + 2(10000) =
Chapter 5 Analyzing Algorithms So far we have been proving statements about databases, mathematics and arithmetic, or sequences of numbers. Though these types of statements are common in computer science,
More informationDon t Compare Averages
Don t Compare Averages Holger Bast and Ingmar Weber Max-Planck-Institut für Informatik Saarbrücken, Germany bast@mpi-sb.mpg.de iweber@mpi-sb.mpg.de Abstract. We point out that for two sets of measurements,
More informationFinding a Needle in a Haystack: Conditions for Reliable. Detection in the Presence of Clutter
Finding a eedle in a Haystack: Conditions for Reliable Detection in the Presence of Clutter Bruno Jedynak and Damianos Karakos October 23, 2006 Abstract We study conditions for the detection of an -length
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 6: Training versus Testing (LFD 2.1) Cho-Jui Hsieh UC Davis Jan 29, 2018 Preamble to the theory Training versus testing Out-of-sample error (generalization error): What
More informationEquations and Inequalities
Chapter 3 Equations and Inequalities Josef Leydold Bridging Course Mathematics WS 2018/19 3 Equations and Inequalities 1 / 61 Equation We get an equation by equating two terms. l.h.s. = r.h.s. Domain:
More informationUniversity of Sheffield. School of Mathematics & and Statistics. Measure and Probability MAS350/451/6352
University of Sheffield School of Mathematics & and Statistics Measure and Probability MAS350/451/6352 Spring 2018 Chapter 1 Measure Spaces and Measure 1.1 What is Measure? Measure theory is the abstract
More informationThe concentration of the chromatic number of random graphs
The concentration of the chromatic number of random graphs Noga Alon Michael Krivelevich Abstract We prove that for every constant δ > 0 the chromatic number of the random graph G(n, p) with p = n 1/2
More informationConcentration Inequalities
Chapter Concentration Inequalities I. Moment generating functions, the Chernoff method, and sub-gaussian and sub-exponential random variables a. Goal for this section: given a random variable X, how does
More informationLecture 1: September 25, A quick reminder about random variables and convexity
Information and Coding Theory Autumn 207 Lecturer: Madhur Tulsiani Lecture : September 25, 207 Administrivia This course will cover some basic concepts in information and coding theory, and their applications
More informationLecture 5. Shearer s Lemma
Stanford University Spring 2016 Math 233: Non-constructive methods in combinatorics Instructor: Jan Vondrák Lecture date: April 6, 2016 Scribe: László Miklós Lovász Lecture 5. Shearer s Lemma 5.1 Introduction
More informationM361 Theory of functions of a complex variable
M361 Theory of functions of a complex variable T. Perutz U.T. Austin, Fall 2012 Lecture 4: Exponentials and logarithms We have already been making free use of the sine and cosine functions, cos: R R, sin:
More informationLecture 7: Semidefinite programming
CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational
More informationReview of Optimization Methods
Review of Optimization Methods Prof. Manuela Pedio 20550 Quantitative Methods for Finance August 2018 Outline of the Course Lectures 1 and 2 (3 hours, in class): Linear and non-linear functions on Limits,
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationEfficiently Training Sum-Product Neural Networks using Forward Greedy Selection
Efficiently Training Sum-Product Neural Networks using Forward Greedy Selection Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Greedy Algorithms, Frank-Wolfe and Friends
More informationReal Analysis on Metric Spaces
Real Analysis on Metric Spaces Mark Dean Lecture Notes for Fall 014 PhD Class - Brown University 1 Lecture 1 The first topic that we are going to cover in detail is what we ll call real analysis. The foundation
More informationAn Algorithmic Proof of the Lopsided Lovász Local Lemma (simplified and condensed into lecture notes)
An Algorithmic Proof of the Lopsided Lovász Local Lemma (simplified and condensed into lecture notes) Nicholas J. A. Harvey University of British Columbia Vancouver, Canada nickhar@cs.ubc.ca Jan Vondrák
More informationA NICE PROOF OF FARKAS LEMMA
A NICE PROOF OF FARKAS LEMMA DANIEL VICTOR TAUSK Abstract. The goal of this short note is to present a nice proof of Farkas Lemma which states that if C is the convex cone spanned by a finite set and if
More informationRelaxations and Randomized Methods for Nonconvex QCQPs
Relaxations and Randomized Methods for Nonconvex QCQPs Alexandre d Aspremont, Stephen Boyd EE392o, Stanford University Autumn, 2003 Introduction While some special classes of nonconvex problems can be
More informationLearning symmetric non-monotone submodular functions
Learning symmetric non-monotone submodular functions Maria-Florina Balcan Georgia Institute of Technology ninamf@cc.gatech.edu Nicholas J. A. Harvey University of British Columbia nickhar@cs.ubc.ca Satoru
More informationGreedy Algorithms. CSE 101: Design and Analysis of Algorithms Lecture 9
Greedy Algorithms CSE 101: Design and Analysis of Algorithms Lecture 9 CSE 101: Design and analysis of algorithms Greedy algorithms Reading: Kleinberg and Tardos, sections 4.1, 4.2, and 4.3 Homework 4
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationMATH 1231 MATHEMATICS 1B CALCULUS. Section 4: - Convergence of Series.
MATH 23 MATHEMATICS B CALCULUS. Section 4: - Convergence of Series. The objective of this section is to get acquainted with the theory and application of series. By the end of this section students will
More informationECE 4400:693 - Information Theory
ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential
More informationLecture 4: Proof of Shannon s theorem and an explicit code
CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated
More informationLecture 4: Inequalities and Asymptotic Estimates
CSE 713: Random Graphs and Applications SUNY at Buffalo, Fall 003 Lecturer: Hung Q. Ngo Scribe: Hung Q. Ngo Lecture 4: Inequalities and Asymptotic Estimates We draw materials from [, 5, 8 10, 17, 18].
More informationWe are now going to move on to a discussion of Inequality constraints. Our canonical problem now looks as ( ) = 0 ( ) 0
4 Lecture 4 4.1 Constrained Optimization with Inequality Constraints We are now going to move on to a discussion of Inequality constraints. Our canonical problem now looks as Problem 11 (Constrained Optimization
More informationIntroduction to Machine Learning (67577) Lecture 3
Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz
More informationMeasure and integration
Chapter 5 Measure and integration In calculus you have learned how to calculate the size of different kinds of sets: the length of a curve, the area of a region or a surface, the volume or mass of a solid.
More information1 Maximizing a Submodular Function
6.883 Learning with Combinatorial Structure Notes for Lecture 16 Author: Arpit Agarwal 1 Maximizing a Submodular Function In the last lecture we looked at maximization of a monotone submodular function,
More informationMatrix concentration inequalities
ELE 538B: Mathematics of High-Dimensional Data Matrix concentration inequalities Yuxin Chen Princeton University, Fall 2018 Recap: matrix Bernstein inequality Consider a sequence of independent random
More informationIEOR 165: Spring 2019 Problem Set 2
IEOR 65: Spring 209 Problem Set 2 Instructor: Professor Anil Aswani Issued: 2/8/9 Due: 3//9 Problem : Part a: We may first calculate the sample means: ȳ =.6, x = 7. Then: i= ˆβ = y ix i ȳ x i= x2 i x2
More informationProbability inequalities 11
Paninski, Intro. Math. Stats., October 5, 2005 29 Probability inequalities 11 There is an adage in probability that says that behind every limit theorem lies a probability inequality (i.e., a bound on
More informationWriting proofs for MATH 51H Section 2: Set theory, proofs of existential statements, proofs of uniqueness statements, proof by cases
Writing proofs for MATH 51H Section 2: Set theory, proofs of existential statements, proofs of uniqueness statements, proof by cases September 22, 2018 Recall from last week that the purpose of a proof
More information3 The Simplex Method. 3.1 Basic Solutions
3 The Simplex Method 3.1 Basic Solutions In the LP of Example 2.3, the optimal solution happened to lie at an extreme point of the feasible set. This was not a coincidence. Consider an LP in general form,
More information2-7 Solving Quadratic Inequalities. ax 2 + bx + c > 0 (a 0)
Quadratic Inequalities In One Variable LOOKS LIKE a quadratic equation but Doesn t have an equal sign (=) Has an inequality sign (>,
More information7. Lecture notes on the ellipsoid algorithm
Massachusetts Institute of Technology Michel X. Goemans 18.433: Combinatorial Optimization 7. Lecture notes on the ellipsoid algorithm The simplex algorithm was the first algorithm proposed for linear
More informationAn analogy from Calculus: limits
COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More information