
CS 543 Machine Learning, Spring 2010
Lecture 05: Evaluating Hypotheses (John E. Boon, Jr.)

I. Overview
   A. Given the observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over additional examples?
   B. Given that one hypothesis outperforms another over some sample of data, how probable is it that this hypothesis is more accurate in general?
   C. When data is limited, what is the best way to use this data to both learn a hypothesis and estimate its accuracy?
   D. [Statistical methods and assumptions about the underlying distributions of data are the basic tools we'll use.]
II. 5.1 Motivation
   A. We need to evaluate the performance of a learned hypothesis whenever it is important to understand whether or not to use it (e.g., a medical treatment effectiveness database and the resulting learned hypothesis)
   B. Evaluating a hypothesis is a component of many learning methods (such as post-pruning decision trees to avoid overfitting)
   C. Estimating the accuracy of a hypothesis is straightforward when data is plentiful
   D. Difficulties in estimating the accuracy of a hypothesis with a limited set of data
      1. Bias in the estimate
         a) Observed accuracy of the learned hypothesis over the training examples is a poor estimator of its accuracy over future examples
            (1) It typically provides an optimistically biased estimate of the hypothesis's accuracy over future examples
         b) To obtain an unbiased estimate of future accuracy, we test the hypothesis on a set of test examples chosen independently of the training examples and the hypothesis
      2. Variance in the estimate
         a) Even if the hypothesis accuracy is measured over an unbiased set of test examples independent of the training examples, the measured accuracy can still vary from the true accuracy
         b) The smaller the set of test examples, the greater the expected variance
III. 5.2 Estimating Hypothesis Accuracy
   A. Introductory remarks
      1. We want to know
         a) The accuracy of a hypothesis in classifying future examples
         b) The probable error in this accuracy estimate
      2. Data and variables for this chapter
         a) X - the set of possible instances over which target functions may be defined
            (1) Assume that different instances of X may be encountered with different frequencies
         b) D - some unknown probability distribution that defines the probability of encountering each instance in X
            (1) D says nothing about whether x is a positive or negative example;
            (2) it only determines the probability that x will be encountered

         c) The learning task
            (1) f - the target concept or target function
            (2) H - the space of possible hypotheses
      3. Training examples of the target function f are provided to the learner by a trainer who draws each instance independently according to the distribution D, and who then forwards the instance x along with the correct target value f(x) to the learner
      4. Example, page 130: learn the target function "people who plan to purchase new skis this year"
      5. Given this general setting for learning, we are interested in
         a) Given a hypothesis h and a data sample containing n examples drawn at random according to the distribution D, what is the best estimate of the accuracy of h over future instances drawn from the same distribution?
         b) What is the probable error in this accuracy estimate?
   B. Sample Error and True Error
      1. Sample error - error rate of the hypothesis over the sample of data that is available
         a) The sample error of a hypothesis with respect to some sample S of instances drawn from X is the fraction of S it misclassifies
         b) error_S(h) ≡ (1/n) Σ_{x∈S} δ(f(x) ≠ h(x))
         c) where
            (1) n is the number of examples in S
            (2) δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise
      2. True error - error rate of the hypothesis over the entire unknown distribution D of examples
         a) The true error of a hypothesis is the probability that it will misclassify a single randomly drawn instance from the distribution D
         b) error_D(h) ≡ Pr_{x∈D}[f(x) ≠ h(x)]
         c) Pr_{x∈D} denotes that the probability is taken over the instance distribution D
      3. We usually wish to know the true error of the hypothesis (that is what we can expect when applying the hypothesis to future examples)
      4. All we can measure is the sample error, though
      5. So: how good an estimate of the true error is provided by the sample error? (The sketch below makes the two quantities concrete.)
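
A minimal Python sketch (not part of the original notes) illustrating the two definitions; the instance distribution, target f, and hypothesis h are hypothetical stand-ins chosen so that the true error can be computed exactly:

```python
import random

def sample_error(h, f, S):
    """error_S(h): the fraction of sample S that h misclassifies."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical setup: instances are integers drawn uniformly from 0..99,
# the target concept f labels x positive iff x < 50, and the hypothesis h
# labels x positive iff x < 60, so h and f disagree exactly on 50..59.
f = lambda x: x < 50
h = lambda x: x < 60

random.seed(0)
S = [random.randrange(100) for _ in range(40)]  # n = 40 instances from D
print(sample_error(h, f, S))        # varies with the sample actually drawn

# Because the distribution is known here, the true error is exact:
# error_D(h) = Pr[f(x) != h(x)] = 10/100 = 0.10.
```

Re-drawing S and rerunning shows error_S(h) scattering around error_D(h) = 0.10, which is exactly the variance issue raised in section II.D.2.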

   C. Confidence Intervals for Discrete-Valued Hypotheses
      1. Suppose we wish to estimate the true error of some discrete-valued hypothesis h over a sample S, where
         a) The sample S contains n examples drawn independent of one another, and independent of h, according to the probability distribution D
            (1) (DISCUSS THIS ON BOARD)
         b) n ≥ 30
         c) Hypothesis h commits r errors over these n examples
            (1) error_S(h) = r/n
      2. Given no other information, the most probable value of error_D(h) is error_S(h)
      3. With approximately 95% probability, the true error then lies within the interval
            error_S(h) ± 1.96 √(error_S(h)(1 − error_S(h))/n)
         and, for a general two-sided N% interval, within error_S(h) ± z_N √(error_S(h)(1 − error_S(h))/n)
      4. Illustrative example (page 131) (discuss the interpretation of confidence intervals on board)
      5. See Table 5.1, page 132: values of z_N for two-sided N% confidence intervals
      6. (Excel workbook for examples)
      7. (Discuss why the 95% CI is wider than the 68% CI)
      8. A more accurate rule of thumb is that this method of approximating the true error works well when n · error_S(h)(1 − error_S(h)) ≥ 5
IV. Section 5.2 basic confidence interval example (Excel worksheet)
      n = 40 (number of examples in data sample S)
      r = 12 (number of errors committed by h on S)
      sample error: error_S(h) = r/n = 0.30
      sqrt term: √(0.30 · 0.70 / 40) ≈ 0.0725
      rule of thumb: n · error_S(h)(1 − error_S(h)) = 8.4 ≥ 5
      Two-sided confidence intervals on the true error (the sketch following this example recomputes them):
         N%    z_N    lower CI   upper CI
         68%   1.00   0.228      0.372
         90%   1.64   0.181      0.419
         95%   1.96   0.158      0.442
         99%   2.58   0.113      0.487
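
A short sketch (not from the notes) reproducing the worksheet above; the z_N values are the standard two-sided ones from Table 5.1:

```python
import math

def error_confidence_interval(r, n, z):
    """Two-sided CI on the true error: error_S(h) +/- z_N * sqrt(e(1-e)/n)."""
    e = r / n                                   # sample error error_S(h)
    half_width = z * math.sqrt(e * (1 - e) / n)
    return e - half_width, e + half_width

# Worked example from the notes: n = 40 examples, r = 12 errors.
for pct, z in [(68, 1.00), (90, 1.64), (95, 1.96), (99, 2.58)]:
    lo, hi = error_confidence_interval(12, 40, z)
    print(f"{pct}% CI: [{lo:.3f}, {hi:.3f}]")

# Rule-of-thumb check: n * e * (1 - e) = 40 * 0.30 * 0.70 = 8.4 >= 5,
# so the Normal approximation behind these intervals is reasonable here.
```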

V. 5.3 Basics of Sampling Theory
   A. Summary: Table 5.2, page 133
   B. Error Estimation and Estimating Binomial Proportions
      1. How does the deviation between sample error and true error depend on the size of the data sample?
      2. When we measure the sample error, we are performing an experiment with a random outcome
         a) Collect a random sample S of n independently drawn instances from the distribution D, and then measure the sample error error_S(h)
         b) As we repeat this experiment, each time drawing a different random sample S_i of size n, we expect different values of the sample error
         c) error_{S_i}(h), the error of the ith such experiment, is a random variable
      3. Perform k such experiments; as k gets large, the histogram of the observed numbers of errors approaches a Binomial distribution
   C. The Binomial Distribution
      1. Binomial distribution
         a) Binomial experiment properties
            (1) The experiment consists of n repeated trials
            (2) Each trial results in an outcome that may be classified as a success or a failure
            (3) The probability of success, denoted p, remains constant from trial to trial
            (4) The repeated trials are independent
         b) The number X of successes in n trials of a binomial experiment is called a binomial random variable
         c) Binomial distribution
            (1) If a binomial trial can result in a success with probability p and a failure with probability q = 1 − p, then the probability distribution of the binomial random variable X, the number of successes in n independent trials, is
                  P(X = x) = C(n, x) p^x q^(n−x),  x = 0, 1, 2, …, n
      2. See Table 5.3, page 134, in the text
      3. Estimating p from a random sample of coin tosses is equivalent to estimating error_D(h) from testing h on a random sample of instances
         a) A single coin toss corresponds to drawing a single random instance from D and determining whether it is misclassified by h
         b) The probability p that a single random coin toss will result in heads corresponds to the probability that a single instance drawn at random will be misclassified
            (1) p corresponds to error_D(h)
         c) The number r of heads observed over n coin tosses corresponds to the number of misclassifications observed over n randomly drawn instances
            (1) r/n corresponds to error_S(h)
      4. The general setting to which the Binomial distribution applies is
         a) An underlying experiment whose outcome can be described by a random variable Y, which can take on two possible values
         b) The probability that Y = 1 on any single trial of the experiment is given by some constant p, independent of the outcome of any other experiment (p is usually not known in advance, and the problem is to estimate p)
         c) A series of independent trials of the experiment is performed, producing a sequence of IID random variables Y_1, Y_2, …, Y_n. Let R = Σ_{i=1}^n Y_i count the trials in this sequence of n experiments for which Y_i = 1
         d) The probability that the random variable R will take on a specific value r is
            (1) Pr(R = r) = C(n, r) p^r (1 − p)^(n−r)   (Eq. 5.2)
   D. Mean and Variance
      1. The expected value of a random variable is the sum, over its possible values, of (probability of the value × the value): E[Y] = Σ_i y_i Pr(Y = y_i)
      2. (See equations 5.3 and 5.4, page 136: for a Binomial random variable, E[R] = np)
      3. The variance captures how far a random variable is expected to differ from its expected value: Var[Y] ≡ E[(Y − E[Y])²]
      4. (See equation 5.5, page 136.) Standard deviation σ_Y = √Var[Y]; for a Binomial random variable, Var[R] = np(1 − p) and σ_R = √(np(1 − p)) (equations 5.6 and 5.7, pages 136-137)
      5. (A sketch evaluating these quantities for the running example follows.)
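
A small sketch (added here, not in the notes) of the Binomial pmf and the coin-toss correspondence; n = 40 and p = 0.30 follow the running example:

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Coin-toss correspondence: testing h on n random instances, where the true
# error is p = error_D(h), is counting heads in n tosses of a biased coin.
n, p = 40, 0.30
print(binomial_pmf(12, n, p))   # probability of observing exactly r = 12 errors
print(n * p)                    # mean E[R] = np = 12
print(n * p * (1 - p))          # variance Var[R] = np(1 - p) = 8.4
```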

   E. Estimators, Bias, and Variance
      1. Recast sample error and true error using the equation for the Binomial distribution, equation 5.2, page 137
      2. error_S(h) = r/n and error_D(h) = p, where
         a) n is the number of instances in the sample S
         b) r is the number of instances from S misclassified by h
         c) p is the probability of misclassifying a single instance drawn from D
      3. error_S(h) is called an estimator for the true error error_D(h)
      4. The estimation bias is the difference between the expected value of the estimator and the true value of the parameter
         a) The estimation bias of an estimator Y for an arbitrary parameter p is E[Y] − p
      5. If the estimation bias is zero, then Y is an unbiased estimator of p
      6. Is error_S(h) an unbiased estimator for error_D(h)?
         a) For the Binomial distribution, E[r] = np
         b) For a constant n, E[r/n] = p
         c) So the answer is yes: error_S(h) is an unbiased estimator for error_D(h)
      7. At the start of the chapter we said that testing the hypothesis on the training examples provides an optimistically biased estimate of the hypothesis's accuracy
         a) That bias arises because the hypothesis was derived from those very examples
         b) For error_S(h) to give an unbiased estimate, the hypothesis h and the sample S must be chosen independently
      8. Given a choice among alternative unbiased estimators, it makes sense to choose the unbiased estimator with least variance
         a) The variance of error_S(h) = r/n arises completely from the variance of r in our Binomial experiments
         b) Because r is Binomially distributed, its variance is given by equation 5.7, page 137: Var[r] = np(1 − p)
            (1) p is unknown, though, so we substitute our estimate r/n for p
         c) (See equations 5.8 and 5.9, page 138:) σ_{error_S(h)} = σ_r/n = √(np(1 − p))/n ≈ √(error_S(h)(1 − error_S(h))/n)
      9. Excel example: Binomial probability distribution with p = 0.30 (using the estimate r/n for p) and n = 40: plot of Pr(R = r); E[R] = np = 12, Var[R] = np(1 − p) = 8.4, σ_R ≈ 2.90, and the standard deviation of the sample error is σ_R/n ≈ 0.072 (the simulation sketch below checks these claims)
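
A simulation sketch (an addition, with arbitrary but example-matching n, p, and repetition count) checking the unbiasedness claim in item 6 and the variance formula in item 8c:

```python
import math
import random

# Draw k independent test samples of size n for a hypothesis with true error
# p, and check empirically that error_S(h) = r/n is unbiased for p and has
# standard deviation close to sqrt(p * (1 - p) / n) (equations 5.8 and 5.9).
random.seed(1)
n, p, k = 40, 0.30, 10_000

estimates = []
for _ in range(k):
    r = sum(random.random() < p for _ in range(n))  # errors in one sample
    estimates.append(r / n)

mean = sum(estimates) / k
std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / k)
print(mean)   # close to p = 0.30, i.e., no estimation bias
print(std)    # close to sqrt(0.3 * 0.7 / 40), about 0.0725
```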

   F. Confidence Intervals
      1. Definition, page 138
      2. To find a confidence interval knowing the mean and the standard deviation of a probability distribution, we only have to determine the area under the probability curve that contains N% of the total probability mass
      3. Time to invoke: for large enough sample sizes, the Binomial distribution can be closely approximated by the Normal distribution
      4. (See Table 5.4, page 139)
      5. Review, page 139
      6. Remember that in calculating the confidence interval error_S(h) ± z_N √(error_S(h)(1 − error_S(h))/n):
         a) Two approximations were involved
            (1) To estimate the standard deviation of error_S(h), we approximated error_D(h) by error_S(h) (Eq. 5.8 to Eq. 5.9, page 138)
            (2) The Binomial distribution has been approximated by the Normal distribution
      7. The rule of thumb is that these two approximations give good results when n · error_S(h)(1 − error_S(h)) ≥ 5 and n ≥ 30
   G. Two-Sided and One-Sided Bounds
      1. What if we ask: what is the probability that the true error is at most some value U? (We will need a one-sided version of the CI.)
         a) A 100(1 − α)% confidence interval with lower bound L and upper bound U implies a 100(1 − α/2)% confidence interval with lower bound L and no upper bound,
         b) and a 100(1 − α/2)% confidence interval with upper bound U and no lower bound
         c) Here α is the probability that the value will fall into the unshaded region in Figure 5.1(a), page 140,
         d) and α/2 is the probability that the value will fall into the unshaded region in Figure 5.1(b), page 140
      2. Worksheet continuation of the Section 5.2 example (n = 40, r = 12, error_S(h) = 0.30, sqrt term ≈ 0.0725); see the sketch below:
         Two-sided N%   z_N    lower CI   upper CI   One-sided N% (upper bound only)
         90%            1.64   0.18       0.42       95%
         95%            1.96   0.16       0.44       97.5%
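
A short sketch (not in the notes) of the one-sided bound computed in the worksheet row above:

```python
import math

# One-sided upper bound on the true error via the two-sided/one-sided
# correspondence above, for the running example n = 40, r = 12.
n, r = 40, 12
e = r / n                            # error_S(h) = 0.30
sd = math.sqrt(e * (1 - e) / n)      # about 0.0725

z = 1.64                             # two-sided 90%, hence one-sided 95%
print(f"With ~95% confidence, error_D(h) <= {e + z * sd:.3f}")   # ~0.42
```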

VI. 5.4 A General Approach for Deriving Confidence Intervals
   A. Introductory remarks
      1. View this as a problem of estimating the mean of a population based on the mean of a randomly drawn sample of size n
      2. General steps
         a) Identify the underlying population parameter p to be estimated (for example, error_D(h))
         b) Define the estimator Y (for example, error_S(h)); try to choose a minimum-variance, unbiased estimator
         c) Determine the probability distribution D_Y that governs the estimator Y, including its mean and variance
         d) Determine the N% confidence interval by finding thresholds L and U such that N% of the mass in the probability distribution D_Y falls between L and U (area under the curve of D_Y)
   B. Central Limit Theorem
      1. The values of the n independently drawn random variables Y_1, Y_2, …, Y_n obey the same unknown underlying probability distribution
      2. Let
         a) μ be the mean of the unknown distribution governing each Y_i
         b) σ be the standard deviation of the unknown distribution
      3. The Y_1, Y_2, …, Y_n are IID random variables because they describe independent experiments, each obeying the same underlying probability distribution
      4. Compute the sample mean Ȳ = (1/n) Σ_{i=1}^n Y_i to estimate the true mean μ
      5. The central limit theorem says that the probability distribution governing the sample mean Ȳ approaches a Normal distribution as n grows without bound, regardless of the distribution that governs the underlying random variables Y_i
      6. It further states that the mean of the distribution governing Ȳ approaches the true mean μ, and its standard deviation approaches σ/√n
      7. (See page 143)
VII. 5.5 Difference in Error of Two Hypotheses
   A. Introductory remarks
      1. h1 has been tested on sample S1 containing n1 randomly drawn examples
      2. h2 has been tested on sample S2 containing n2 randomly drawn examples
      3. Suppose we wish to estimate the difference d between the true errors of these two hypotheses: d ≡ error_D(h1) − error_D(h2)
      4. Step 1: identify d as the parameter to be estimated
      5. Step 2: choose an estimator for d
         a) d̂ ≡ error_S1(h1) − error_S2(h2)
         b) Stated here but not proven: d̂ is an unbiased estimator of d, i.e., E[d̂] = d
      6. Step 3: what is the probability distribution governing the random variable d̂?
         a) Invoke the Central Limit Theorem: for large n1 and n2, each sample error is approximately Normally distributed
         b) The difference between two Normal distributions is a Normal distribution
         c) So d̂ will follow a distribution that is approximately Normal with mean d
         d) The variance of this distribution is the sum of the variances of error_S1(h1) and error_S2(h2)
         e) (See equation 5.12, page 144:) σ²_d̂ ≈ error_S1(h1)(1 − error_S1(h1))/n1 + error_S2(h2)(1 − error_S2(h2))/n2
      7. Step 4: determine the confidence intervals (see equation 5.13, page 144); the two-sided confidence interval is d̂ ± z_N σ_d̂ (a sketch follows)
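
A minimal sketch (added) of steps 2 through 4: the estimator d̂, its approximate variance (Eq. 5.12), and the two-sided interval (Eq. 5.13); the sample errors and test-set sizes are hypothetical:

```python
import math

def difference_ci(e1, n1, e2, n2, z):
    """N% CI for d = error_D(h1) - error_D(h2) from independent test sets:
    d_hat +/- z_N * sigma_d_hat (equations 5.12 and 5.13)."""
    d_hat = e1 - e2
    sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)
    return d_hat - z * sigma, d_hat + z * sigma

# Hypothetical inputs: h1 makes 30 errors on its 100 test examples and h2
# makes 20 errors on its own 100; two-sided 95% interval for d.
print(difference_ci(0.30, 100, 0.20, 100, 1.96))   # about (-0.019, 0.219)
```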

      8. What if h1 and h2 had been tested on a single sample S containing n randomly drawn examples?
         a) d̂ = error_S(h1) − error_S(h2)
         b) The variance in this new d̂ will usually be smaller than that in equation 5.12, because using one sample eliminates the random differences between the two samples used in equation 5.12
         c) The confidence interval from equation 5.13 will then be overly conservative, but still correct
   B. Hypothesis Testing
      1. Suppose you want to know: what is the probability that error_D(h1) > error_D(h2)?
      2. For the example: what is the probability that error_D(h1) > error_D(h2), given the observed difference in the sample errors d̂ = 0.10?
         a) Equivalently: what is Pr(d > 0)?
         b) Pr(d > 0) is equal to the probability that d̂ has not overestimated d by more than 0.10
         c) That is, the probability that d̂ falls into the one-sided interval d̂ < d + 0.10
         d) Since d is the mean of the distribution governing d̂, we can express this one-sided interval as d̂ < μ_d̂ + 0.10
         e) We want the number of standard deviations from the mean that 0.10 allows
            (1) The value 0.10 is 1.64 σ_d̂ (worksheet values consistent with this: sample error of h1 = 0.30, sample error of h2 = 0.20, d̂ = 0.10, n1 = n2 = 100, Var[d̂] ≈ 0.0037, σ_d̂ ≈ 0.061)
            (2) For a one-sided test, d̂ < μ_d̂ + 0.10 is a 95% CI
            (3) The z_N for this is 1.64
            (4) Therefore Pr(d̂ < μ_d̂ + 1.64 σ_d̂) ≈ 0.95
         f) Therefore, for this example, the probability that error_D(h1) > error_D(h2), given the observed difference in sample errors d̂ = 0.10, is 0.95
            (1) We accept the hypothesis that error_D(h1) > error_D(h2) with confidence 0.95
            (2) Equivalently, we reject the opposite hypothesis at the (1 − 0.95) = 0.05 level of significance (the sketch below reproduces this calculation)
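
The one-sided calculation above, as a sketch (added; the sample errors and n1 = n2 = 100 are assumed values that reproduce d̂ = 0.10 and σ_d̂ ≈ 0.061):

```python
import math

# Assumed values consistent with the notes' numbers (d_hat = 0.10 and
# sigma_d_hat ~ 0.061): h1 and h2 each tested on 100 examples.
e1, n1, e2, n2 = 0.30, 100, 0.20, 100
d_hat = e1 - e2
sigma = math.sqrt(e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2)

z = d_hat / sigma                                     # ~1.64 std deviations
confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard Normal CDF
print(f"z = {z:.2f}, Pr(d > 0) ~ {confidence:.3f}")   # ~0.95
```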

VIII. 5.6 Comparing Learning Algorithms
   A. Introductory remarks
      1. We are interested in comparing the performance of two learning algorithms L_A and L_B
      2. One approach is presented here in the text; a reference is cited for an alternative method
      3. What is the parameter we wish to measure?
         a) Determine which of the two learning algorithms L_A and L_B is better on average for some particular target function f
            (1) "On average" means the relative performance of the two algorithms averaged over all training sets of size n that might be drawn from the underlying distribution D
            (2) That is, estimate the expected value of the difference in their errors
            (3) (See equation 5.14, page 146:) E_{S⊂D}[error_D(L_A(S)) − error_D(L_B(S))]
            (4) In practice we have only a limited sample D_0 of data when comparing learning methods
               (a) Divide D_0 into a training set S_0 and a disjoint test set T_0
               (b) Use the training set for both learning algorithms
               (c) Use the test data to compare the accuracy of the two learned hypotheses
               (d) (See equation 5.15, page 146:) error_{T_0}(L_A(S_0)) − error_{T_0}(L_B(S_0))
                  (i) error_{T_0} is now used to approximate error_D
                  (ii) We measure the difference in errors for only one training set S_0, rather than taking the expected value over all samples S that might be drawn from the distribution
         b) How can this estimator be improved? (See the sketch after this section.)
            (1) Repeatedly partition the data D_0 into disjoint training and test sets, and take the mean δ̄ of the test-set error differences for these experiments
            (2) Algorithm: Table 5.5, page 147
            (3) The returned value δ̄ is an estimate of the desired quantity in equation 5.14
            (4) More precisely, it is an estimate of the quantity in equation 5.16 (note the difference in how S is defined)
               (a) S is a sample of size |D_0|(1 − 1/k) drawn uniformly from D_0, where k is the number of disjoint subsets of equal size used in the algorithm
            (5) N% confidence interval, equation 5.17, page 147: δ̄ ± t_{N,k−1} s_δ̄
               (a) Now using the t statistic instead of z_N
            (6) Estimate of the standard deviation of the distribution governing δ̄, equation 5.18, page 147: s_δ̄ ≡ √((1/(k(k−1))) Σ_{i=1}^k (δ_i − δ̄)²)
         c) Table 5.6, page 148, gives t-statistic values
         d) The procedure thus far
            (1) Is for comparing two learning methods
            (2) Involves testing the two learned hypotheses on identical test sets
            (3) Tests where the hypotheses are evaluated over identical samples are called paired tests
               (a) Paired tests typically produce tighter confidence intervals than unpaired tests
               (b) When hypotheses are tested on separate data samples, differences in the two sample errors might be partially attributable to differences in the makeup of the two samples
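
A sketch (not part of the notes) of the k-fold procedure in the spirit of Table 5.5; train_A, train_B, and error_fn are hypothetical caller-supplied functions (a learner maps a training set to a hypothesis, and error_fn measures a hypothesis's error on a test set):

```python
import random

def paired_deltas(train_A, train_B, error_fn, data, k=10, seed=0):
    """k-fold paired comparison: split the data into k disjoint folds; on
    each iteration train both learners on the other k-1 folds, test both on
    the held-out fold, and record delta_i = error(h_A) - error(h_B) on that
    shared test set."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    deltas = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h_A, h_B = train_A(train), train_B(train)  # same S_i for both learners
        deltas.append(error_fn(h_A, test) - error_fn(h_B, test))
    return deltas   # mean(deltas) estimates the quantity in equation 5.16
```

The mean of the returned deltas is δ̄; the paired t-test sketch in the next section turns the deltas into a confidence interval.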

10 samples, differences in the two sample errors might be partially attributable to differences in the makeup of the two samples B Paired t Tests (details for analysis in previous section) 1. Summary of the estimation problem Given observed values of a set of IID r.v. Y 1, Y 2, Y k b) Wish to estimate the mean of the probability distribution governing these Y i c) The estimator we will use is the sample mean 2. Consider the following idealization of the algorithm in Table 5.5 Assume that we can request new training examples drawn according to the underlying instance probability distribution. b) Modify the algorithm so that on each iteration through the loop it generates a new training set S i and a new random test set T i by drawing from this underlying instance distribution c) I measured by the new procedure now correspond to the IID r.v. Y i d) The mean of the distribution corresponds to the expected difference in error between the two leaning methods (equation 5.14) e) The sample mean is the quantity computed by the idealized version of the algorithm. f) Now ask How good an estimate of is provided by? 3. Analysis We have a special case where the Y i are governed by an approximately Normal distribution (because test sets have 30 or more examples the I are approximately Normally distributed) b) We don t know the standard deviation of the distribution of the though c) Need the t-test in these cases (estimate the sample mean of a collection of IID Normally r.v.) (1) We can estimate the standard deviation of the sample mean using our and Y i values (2) (see unnumbered equation page 149) C Practical Considerations 1. In practice we are given a limited set of data D 0 and use the algorithm in Table Statistical foundations require a sample containing k independent, IID Normal r.v. and unlimited access to examples of the target function 3. In practice, the only way to generate new estimates I (see the algorithm step) for the difference between the errors in the two learning algorithms is to resample D 0 dividing it into training and test sets in different ways Now the I are not independent of one another 4. Algorithm in table 5.5 implements a k-fold method. Each example from D 0 is used exactly once in a test set and (k -1) times in a training set 5. Might randomly choose a test set of at least 30 examples from D 0 and use the remaining examples for training (repeat as many times as desired) Advantage can be repeated an infinite number of times b) Disadvantage test sets no longer qualify as being independently CS 543 Page 10 John E. Boon, Jr.

IX. 5.7 Summary and Further Reading
X. Suggested HW: 5.2, 5.3, 5.4
