Mixtures of Gaussians and the EM Algorithm

1. Mixtures of Gaussians and the EM Algorithm. CSE 6363 Machine Learning. Vassilis Athitsos, Computer Science and Engineering Department, University of Texas at Arlington.

2. Gaussians. A popular way to estimate probability density functions is to model them as Gaussians. Review: a 1D normal distribution is defined as:

N(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}

To define a Gaussian, we need to specify just two parameters: μ, which is the mean (average) of the distribution, and σ, which is the standard deviation of the distribution. Note: σ² is called the variance of the distribution.

3. Estimating a Gaussian. In one dimension, a Gaussian is defined like this:

N(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}

Given a set of real numbers x_1, ..., x_n, we can easily find the best-fitting Gaussian for that data. The mean μ is simply the average of those numbers:

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i

The standard deviation σ is computed as:

\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^2}
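These two estimates are easy to compute directly. A minimal NumPy sketch (not from the slides; the function name is illustrative):

    import numpy as np

    def fit_gaussian(x):
        # Best-fitting 1D Gaussian: the mean and the sample standard deviation.
        x = np.asarray(x, dtype=float)
        mu = x.mean()
        sigma = np.sqrt(np.sum((x - mu) ** 2) / (len(x) - 1))  # same as np.std(x, ddof=1)
        return mu, sigma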

4. Estimating a Gaussian. Fitting a Gaussian to data does not guarantee that the resulting Gaussian will be an accurate distribution for the data. The data may have a distribution that is very different from a Gaussian.

5. Example of Fitting a Gaussian. The blue curve is a density function F such that: F(x) = 0.25 for 1 ≤ x ≤ 3, and F(x) = 0.5 for 7 ≤ x ≤ 8. The red curve is the Gaussian fit G to data generated using F.

6. Naïve Bayes with 1D Gaussians. Suppose the patterns come from a d-dimensional space. Examples: pixels to be classified as skin or non-skin, or the Statlog dataset. Notation: x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}). For each dimension j, we can use a Gaussian to model the distribution p_j(x_{i,j} | C_k) of the data in that dimension, given their class. For example, for the Statlog dataset we would get 216 Gaussians: 36 dimensions × 6 classes. Then, we can use the naïve Bayes approach (i.e., assume pairwise independence of all dimensions) to define P(x | C_k) as:

p(x \mid C_k) = \prod_{j=1}^{d} p_j(x_j \mid C_k)
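A minimal sketch of this per-dimension approach (not from the slides; all names are illustrative): fit one 1D Gaussian per class and per dimension, then sum the per-dimension log densities, which is the log of the product above.

    import numpy as np

    def fit_naive_bayes_gaussians(X, y):
        # X: (n, d) training matrix; y: (n,) class labels.
        # One (mu, sigma) vector pair per class, estimated per dimension.
        return {c: (X[y == c].mean(axis=0), X[y == c].std(axis=0, ddof=1))
                for c in np.unique(y)}

    def class_log_density(x, mu, sigma):
        # log p(x | C_k) = sum over dimensions j of log N(x_j; mu_j, sigma_j).
        return np.sum(-0.5 * ((x - mu) / sigma) ** 2
                      - np.log(sigma * np.sqrt(2 * np.pi)))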

7. Mixtures of Gaussians. This figure shows our previous example, where we fitted a Gaussian to some data and the fit was poor. Overall, Gaussians have attractive properties: they require learning only two numbers (μ and σ), and thus require little training data to estimate those numbers. However, for some data, Gaussians are just not good fits.

8. Mixtures of Gaussians. Mixtures of Gaussians are oftentimes a better solution. They are defined in the next slide. They still require relatively few parameters to estimate, and thus can be learned from relatively small amounts of data. They can fit actual data distributions quite well.

9. Mixtures of Gaussians. Suppose we have k Gaussian distributions N_i. Each N_i has its own mean μ_i and standard deviation σ_i. Using these k Gaussians, we can define a Gaussian mixture M as follows:

M(x) = \sum_{i=1}^{k} w_i N_i(x)

Each w_i is a weight, specifying the relative importance of Gaussian N_i in the mixture. Weights w_i are real numbers between 0 and 1, and must sum up to 1, so that the integral of M is 1.
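To make the definition concrete, a small sketch evaluating a mixture density (illustrative code, continuing the NumPy sketches above):

    def gaussian_pdf(x, mu, sigma):
        # N(x; mu, sigma) for scalar or array x.
        return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

    def mixture_pdf(x, weights, mus, sigmas):
        # M(x) = sum_i w_i * N_i(x); the weights are assumed to sum to 1.
        return sum(w * gaussian_pdf(x, m, s)
                   for w, m, s in zip(weights, mus, sigmas))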

10. Mixtures of Gaussians: Example. The blue and green curves show two Gaussians. The red curve shows a mixture of those Gaussians, with w_1 = 0.9 and w_2 = 0.1. The mixture looks a lot like N_1, but is influenced a little by N_2 as well.

11. Mixtures of Gaussians: Example. The blue and green curves show two Gaussians. The red curve shows a mixture of those Gaussians, with w_1 = 0.7 and w_2 = 0.3. The mixture looks less like N_1 compared to the previous example, and is influenced more by N_2.

12. Mixtures of Gaussians: Example. The blue and green curves show two Gaussians. The red curve shows a mixture of those Gaussians, with w_1 = 0.5 and w_2 = 0.5. At each point x, the value of the mixture is the average of N_1(x) and N_2(x).

13. Mixtures of Gaussians: Example. The blue and green curves show two Gaussians. The red curve shows a mixture of those Gaussians, with w_1 = 0.3 and w_2 = 0.7. The mixture now resembles N_2 more than N_1.

14. Mixtures of Gaussians: Example. The blue and green curves show two Gaussians. The red curve shows a mixture of those Gaussians, with w_1 = 0.1 and w_2 = 0.9. The mixture now is almost identical to N_2(x).

15. Learning a Mixture of Gaussians. Suppose we are given training data x_1, x_2, ..., x_n, and suppose all x_j belong to the same class c. How can we fit a mixture of Gaussians to this data? This will be the topic of the next few slides. We will learn a very popular machine learning algorithm, called the EM algorithm. EM stands for Expectation-Maximization. Step 0 of the EM algorithm: pick k manually. Decide how many Gaussians the mixture should have. Any approach for choosing k automatically is beyond the scope of this class.

16. Learning a Mixture of Gaussians. Suppose we are given training data x_1, x_2, ..., x_n, and suppose all x_j belong to the same class c. We want to model P(x | c) as a mixture of Gaussians. Given k, how many parameters do we need to estimate in order to fully define the mixture? Remember, a mixture M of k Gaussians is defined as:

M(x) = \sum_{i=1}^{k} w_i N_i(x) = \sum_{i=1}^{k} w_i \frac{1}{\sigma_i \sqrt{2\pi}} e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}

For each N_i, we need to estimate three numbers: w_i, μ_i, σ_i. So, in total, we need to estimate 3k numbers.

17. Learning a Mixture of Gaussians. Suppose we are given training data x_1, x_2, ..., x_n. A mixture M of k Gaussians is defined as:

M(x) = \sum_{i=1}^{k} w_i N_i(x) = \sum_{i=1}^{k} w_i \frac{1}{\sigma_i \sqrt{2\pi}} e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}

For each N_i, we need to estimate w_i, μ_i, σ_i. Suppose that we knew, for each x_j, that it belongs to one and only one of the k Gaussians. Then, learning the mixture would be a piece of cake. For each Gaussian N_i: estimate μ_i and σ_i based on the examples that belong to it, and set w_i equal to the fraction of examples that belong to N_i.

18. Learning a Mixture of Gaussians. Suppose we are given training data x_1, x_2, ..., x_n. A mixture M of k Gaussians is defined as:

M(x) = \sum_{i=1}^{k} w_i N_i(x) = \sum_{i=1}^{k} w_i \frac{1}{\sigma_i \sqrt{2\pi}} e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}

For each N_i, we need to estimate w_i, μ_i, σ_i. However, we have no idea which Gaussian each x_j belongs to. If we knew μ_i and σ_i for each N_i, we could probabilistically assign each x_j to a component. "Probabilistically" means that we would not make a hard assignment; instead, we would partially assign x_j to the different components, with each assignment weighted proportionally to the density value N_i(x_j).

19. Example of Partial Assignments. Using our previous example of a mixture: suppose x_j = 6.5. How do we assign 6.5 to the two Gaussians? We evaluate N_1(6.5) and N_2(6.5), and normalize. Then:

6.5 belongs to N_1 by \frac{N_1(6.5)}{N_1(6.5) + N_2(6.5)} = 20.6%.
6.5 belongs to N_2 by \frac{N_2(6.5)}{N_1(6.5) + N_2(6.5)} = 79.4%.
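The same normalization, as a small sketch (reusing gaussian_pdf from the sketch above; the parameters are placeholders):

    def partial_assignment(x, mus, sigmas):
        # Soft assignment proportional to the density values N_i(x), normalized to sum to 1.
        densities = [gaussian_pdf(x, m, s) for m, s in zip(mus, sigmas)]
        total = sum(densities)
        return [d / total for d in densities]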

20. The Chicken-and-Egg Problem. To recap, fitting a mixture of Gaussians to data involves estimating, for each N_i, values w_i, μ_i, σ_i. If we could assign each x_j to one of the Gaussians, we could easily compute w_i, μ_i, σ_i. Even if we probabilistically assign x_j to multiple Gaussians, we can still easily compute w_i, μ_i, σ_i, by adapting our previous formulas. We will see the adapted formulas in a few slides. If we knew μ_i, σ_i, and w_i, we could assign (at least probabilistically) each x_j to a Gaussian. So, this is a chicken-and-egg problem: if we knew one piece, we could compute the other. But we know neither. So, what do we do?

21. On Chicken-and-Egg Problems. Such chicken-and-egg problems occur frequently in AI. Surprisingly (at least to people new to AI), we can easily solve such chicken-and-egg problems. Overall, chicken-and-egg problems in AI look like this: we need to know A to estimate B, and we need to know B to compute A. There is a fairly standard recipe for solving these problems. Any guesses?

22. On Chicken-and-Egg Problems. Such chicken-and-egg problems occur frequently in AI. Surprisingly (at least to people new to AI), we can easily solve such chicken-and-egg problems. Overall, chicken-and-egg problems in AI look like this: we need to know A to estimate B, and we need to know B to compute A. There is a fairly standard recipe for solving these problems: start by giving A values chosen randomly (or perhaps non-randomly, but still in an uninformed way, since we do not know the correct values). Then repeat this loop: given our current values for A, estimate B; given our current values of B, estimate A; if the new values of A and B are very close to the old values, break.

23. The EM Algorithm - Overview. We use this approach to fit mixtures of Gaussians to data. This algorithm, that fits mixtures of Gaussians to data, is called the EM algorithm (Expectation-Maximization algorithm). Remember, we choose k (the number of Gaussians in the mixture) manually, so we do not have to estimate that. To initialize the EM algorithm, we initialize each μ_i, σ_i, and w_i. Values w_i are set to 1/k. We can initialize μ_i and σ_i in different ways: giving random values to each μ_i, or uniformly spacing the values given to each μ_i; giving random values to each σ_i, or setting each σ_i to 1 initially. Then, we iteratively perform two steps: the E-step and the M-step.

24. The E-Step. Given our current estimates for μ_i, σ_i, and w_i, we compute, for each i and j, the probability p_{ij} = P(N_i | x_j): the probability that x_j was generated by Gaussian N_i. How? Using Bayes rule:

p_{ij} = P(N_i \mid x_j) = \frac{P(x_j \mid N_i) P(N_i)}{P(x_j)} = \frac{N_i(x_j) \, w_i}{P(x_j)}

where

N_i(x_j) = \frac{1}{\sigma_i \sqrt{2\pi}} e^{-\frac{(x_j - \mu_i)^2}{2\sigma_i^2}}, \qquad P(x_j) = \sum_{i'=1}^{k} w_{i'} N_{i'}(x_j)
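A minimal sketch of this E-step (continuing the code above; the (k, n) array layout is an implementation choice, not from the slides):

    def e_step(x, weights, mus, sigmas):
        # p[i, j] = P(N_i | x_j) = w_i * N_i(x_j) / sum_{i'} w_{i'} * N_{i'}(x_j)
        x = np.asarray(x, dtype=float)
        joint = np.array([w * gaussian_pdf(x, m, s)
                          for w, m, s in zip(weights, mus, sigmas)])  # shape (k, n)
        return joint / joint.sum(axis=0, keepdims=True)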

25. The M-Step: Updating μ_i and σ_i. Given our current estimates of p_{ij}, for each i and j, we compute μ_i and σ_i for each N_i, as follows:

\mu_i = \frac{\sum_{j=1}^{n} p_{ij} x_j}{\sum_{j=1}^{n} p_{ij}}, \qquad \sigma_i = \sqrt{\frac{\sum_{j=1}^{n} p_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n} p_{ij}}}

To understand these formulas, it helps to compare them to the standard formulas for fitting a Gaussian to data:

\mu = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad \sigma = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} (x_j - \mu)^2}

26. The M-Step: Updating μ_i and σ_i.

\mu_i = \frac{\sum_{j=1}^{n} p_{ij} x_j}{\sum_{j=1}^{n} p_{ij}}, \qquad \sigma_i = \sqrt{\frac{\sum_{j=1}^{n} p_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n} p_{ij}}}

To understand these formulas, it helps to compare them to the standard formulas for fitting a Gaussian to data:

\mu = \frac{1}{n} \sum_{j=1}^{n} x_j, \qquad \sigma = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} (x_j - \mu)^2}

Why do we take weighted averages at the M-step? Because each x_j is probabilistically assigned to multiple Gaussians. We use p_{ij} = P(N_i | x_j) as the weight of the assignment of x_j to N_i.

27. The M-Step: Updating w_i.

w_i = \frac{\sum_{j=1}^{n} p_{ij}}{\sum_{i'=1}^{k} \sum_{j=1}^{n} p_{i'j}}

At the M-step, in addition to updating μ_i and σ_i, we also need to update w_i, which is the weight of the i-th Gaussian in the mixture. The formula shown above is used for the update of w_i: we sum up the weights of all objects for the i-th Gaussian, and we divide that sum by the sum of weights of all objects for all Gaussians. The division ensures that \sum_{i=1}^{k} w_i = 1.
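The whole M-step as a sketch (continuing the code above; p is the (k, n) matrix returned by e_step):

    def m_step(x, p):
        # Weighted-average updates for the means, standard deviations, and weights.
        x = np.asarray(x, dtype=float)
        totals = p.sum(axis=1)                          # sum_j p_ij, one value per Gaussian
        mus = (p * x).sum(axis=1) / totals
        sigmas = np.sqrt((p * (x - mus[:, None]) ** 2).sum(axis=1) / totals)
        weights = totals / totals.sum()                 # the denominator equals n
        return weights, mus, sigmas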

28. The EM Steps: Summary. E-step: given current estimates for each μ_i, σ_i, and w_i, update p_{ij}:

p_{ij} = \frac{N_i(x_j) \, w_i}{P(x_j)}

M-step: given our current estimates for each p_{ij}, update μ_i, σ_i, and w_i:

\mu_i = \frac{\sum_{j=1}^{n} p_{ij} x_j}{\sum_{j=1}^{n} p_{ij}}, \qquad \sigma_i = \sqrt{\frac{\sum_{j=1}^{n} p_{ij} (x_j - \mu_i)^2}{\sum_{j=1}^{n} p_{ij}}}, \qquad w_i = \frac{\sum_{j=1}^{n} p_{ij}}{\sum_{i'=1}^{k} \sum_{j=1}^{n} p_{i'j}}

29. The EM Algorithm - Termination. The log likelihood of the training data is defined as:

L(x_1, \ldots, x_n) = \log_2 \prod_{j=1}^{n} M(x_j)

As a reminder, M is the Gaussian mixture, defined as:

M(x) = \sum_{i=1}^{k} w_i N_i(x) = \sum_{i=1}^{k} w_i \frac{1}{\sigma_i \sqrt{2\pi}} e^{-\frac{(x - \mu_i)^2}{2\sigma_i^2}}

One can prove that, after each iteration of the E-step and the M-step, this log likelihood increases or stays the same. We check how much the log likelihood changes at each iteration. When the change is below some threshold, we stop.
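In code, it is better to compute the log of the product as a sum of logs, which avoids numerical underflow (a sketch reusing mixture_pdf from above):

    def log_likelihood(x, weights, mus, sigmas):
        # log2 of the product of mixture densities = sum of the log2 densities.
        x = np.asarray(x, dtype=float)
        return float(np.sum(np.log2(mixture_pdf(x, weights, mus, sigmas))))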

30. The EM Algorithm: Summary. Initialization: initialize each μ_i, σ_i, w_i, using your favorite approach (e.g., set each μ_i to a random value, set each σ_i to 1, and set each w_i equal to 1/k). Set last_log_likelihood = -infinity. Main loop:
E-step: given our current estimates for each μ_i, σ_i, and w_i, update each p_{ij}.
M-step: given our current estimates for each p_{ij}, update each μ_i, σ_i, and w_i.
log_likelihood = L(x_1, ..., x_n).
If (log_likelihood - last_log_likelihood) < threshold, break.
last_log_likelihood = log_likelihood.
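Putting the pieces together, a complete but illustrative EM loop built from the sketches above; the initialization choices follow the slide, and the defaults (max_iters, threshold, seed) are assumptions:

    def em_fit(x, k, max_iters=100, threshold=1e-6, seed=0):
        x = np.asarray(x, dtype=float)
        rng = np.random.default_rng(seed)
        weights = np.full(k, 1.0 / k)                  # w_i = 1/k
        mus = rng.choice(x, size=k, replace=False)     # random initial means from the data
        sigmas = np.ones(k)                            # sigma_i = 1 initially
        last_ll = -np.inf
        for _ in range(max_iters):
            p = e_step(x, weights, mus, sigmas)        # E-step
            weights, mus, sigmas = m_step(x, p)        # M-step
            ll = log_likelihood(x, weights, mus, sigmas)
            if ll - last_ll < threshold:               # change below threshold: stop
                break
            last_ll = ll
        return weights, mus, sigmas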

31. The EM Algorithm: Limitations. When we fit a Gaussian to data, we always get the same result. We can also prove that the result we get is the best possible result: there is no other Gaussian giving a higher log likelihood to the data than the one that we compute as described in these slides. When we fit a mixture of Gaussians to the same data, do we always end up with the same result?

32. The EM Algorithm: Limitations. When we fit a Gaussian to data, we always get the same result. We can also prove that the result we get is the best possible result: there is no other Gaussian giving a higher log likelihood to the data than the one that we compute as described in these slides. When we fit a mixture of Gaussians to the same data, we (sadly) do not always get the same result. The EM algorithm is a greedy algorithm: the result depends on the initialization values. We may have bad luck with the initial values, and end up with a bad fit. There is no good way to know if our result is good or bad, or if better results are possible.

33. Mixtures of Gaussians - Recap. Mixtures of Gaussians are widely used. Why? Because, with the right parameters, they can fit various types of data very well. Actually, they can fit almost anything, as long as k is large enough (so that the mixture contains sufficiently many Gaussians). The EM algorithm is widely used to fit mixtures of Gaussians to data.

34. Multidimensional Gaussians. Instead of assuming that each dimension is independent, we can instead model the distribution using a multidimensional Gaussian:

N(x) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)

To specify this Gaussian, we need to estimate the mean μ and the covariance matrix Σ.

35. Multidimensional Gaussians - Mean. Let x_1, x_2, ..., x_n be d-dimensional vectors, x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}), where each x_{i,j} is a real number. Then, the mean μ = (μ_1, ..., μ_d) is computed as:

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i

Therefore, \mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{i,j}.

36. Multidimensional Gaussians - Covariance Matrix. Let x_1, x_2, ..., x_n be d-dimensional vectors, x_i = (x_{i,1}, x_{i,2}, ..., x_{i,d}), where each x_{i,j} is a real number. Let Σ be the covariance matrix; its size is d×d. Let σ_{r,c} be the value of Σ at row r, column c:

\sigma_{r,c} = \frac{1}{n-1} \sum_{j=1}^{n} (x_{j,r} - \mu_r)(x_{j,c} - \mu_c)
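Both estimates in one short sketch (illustrative, continuing the NumPy code above):

    def fit_multivariate_gaussian(X):
        # X: (n, d) matrix of n d-dimensional vectors.
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / (len(X) - 1)   # same as np.cov(X, rowvar=False)
        return mu, cov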

37. Multidimensional Gaussians - Training. Let N be a d-dimensional Gaussian with mean μ and covariance matrix Σ. How many parameters do we need to specify N? The mean μ is defined by d numbers. The covariance matrix Σ requires d² numbers σ_{r,c}. Strictly speaking, Σ is symmetric, σ_{r,c} = σ_{c,r}, so we need roughly d²/2 parameters. The number of parameters is quadratic in d. The number of training data we need for reliable estimation is also quadratic in d.
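To make the count exact (a small addition beyond the slide's rough d²/2): a symmetric d×d matrix has d(d+1)/2 distinct entries, so

\text{parameters}(d) = \underbrace{d}_{\text{mean}} + \underbrace{\frac{d(d+1)}{2}}_{\text{covariance}} = \frac{d^2 + 3d}{2}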

38. The Curse of Dimensionality. We will discuss this "curse" in several places in this course. Summary: dealing with high-dimensional data is a pain, and presents challenges that may be surprising to someone used to dealing with one, two, or three dimensions. One first example is in estimating Gaussian parameters. In one dimension, it is very simple: we estimate two parameters, μ and σ, and estimation can be pretty reliable with a few tens of examples. In d dimensions, we estimate O(d²) parameters, and the number of training data we need is quadratic in the number of dimensions.

39. The Curse of Dimensionality. For example: suppose we want to train a system to recognize the faces of Michael Jordan and Kobe Bryant. Assume each image is 100×100 pixels, and each pixel has three numbers: r, g, b. Thus, each image is described by 30,000 numbers. Suppose we model each class as a multidimensional Gaussian. Then, we need to estimate the parameters of a 30,000-dimensional Gaussian. We need roughly 450 million numbers for the covariance matrix, and we would need more than ten billion training images to have a reliable estimate. It is not realistic to expect such a large training set for learning to recognize a single person.

40. The Curse of Dimensionality. The curse of dimensionality makes it (usually) impossible to estimate probability densities precisely in high-dimensional spaces: the number of training data that is needed is exponential in the number of dimensions. The curse of dimensionality also makes histogram-based probability estimation infeasible in high dimensions, since estimating a histogram still requires a number of training examples that is exponential in the dimensions. Estimating a Gaussian requires a number of training examples that is "only" quadratic in the dimensions. However, Gaussians may not be accurate fits for the actual distribution; mixtures of Gaussians can often provide significantly better fits.
