Topics on Statistics 3
Pejman Mahboubi — April 24

1 Contingency Tables

Assume we ask a sample of 1127 Americans whether they believe in an afterlife. The table below cross-classifies the sample by gender and response:

         YES   NO
FEMALE   509  116
MALE     398  104

Here, Gender and Response are two categorical variables. Gender has 2 levels, Male and Female; Response also has 2 levels, Yes and No. In general, with two categorical variables X with I levels and Y with J levels, we can build an I x J contingency table, which displays the I x J possible combinations of the count outcomes. From the table, we can answer the following questions:

1. Joint probability of being male and not believing in an afterlife. If we let the coordinates denote the response and gender respectively, then we can write

P(No, Male) = 104/1127 ≈ 0.0923.    (1)

2. Conditional probability of not believing in an afterlife, given (conditioned on) the respondent being male. Since the total number of males is 502 = 398 + 104, and 104 of them don't believe in an afterlife,

P(No | Male) = 104/502 ≈ 0.2072.    (2)

Conditional probability can be defined in terms of joint probability.

Definition 1 (Conditional Probability). For two events A and B with P(B) > 0,

P(A | B) = P(A, B) / P(B),    (3)

which can also be written as

P(A, B) = P(B) P(A | B).    (4)

Example 1. Using the definition of conditional probability, we have

P(No | M) = P(No, M) / P(Gender = M) = (104/1127) / (502/1127) = 104/502 ≈ 0.2072.
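As a quick check, the two probabilities above can be reproduced in R from the table's counts (a small sketch; the object name tbl is ours):

```r
# Counts from the afterlife table (rows: Gender, columns: Response)
tbl <- matrix(c(509, 116,
                398, 104),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("FEMALE", "MALE"),
                              Response = c("YES", "NO")))
n <- sum(tbl)                                   # 1127 respondents
joint <- tbl["MALE", "NO"] / n                  # P(No, Male) = 104/1127
cond  <- tbl["MALE", "NO"] / sum(tbl["MALE", ]) # P(No | Male) = 104/502
round(c(joint = joint, conditional = cond), 4)  # 0.0923 and 0.2072
```

Dividing the joint probability by the marginal P(Gender = M) = 502/1127 recovers the same conditional probability, as in Example 1.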
2 Measures of Accuracy

Assume you train a classifier CL which predicts gender based on a subject's religiosity:

CL(religious) = F,  CL(non-religious) = M.

Assume we apply this classifier to a test dataset of 20 individuals and get the following result:

> CL
   Gender Religiosity prdGender  class
1       F           Y         F  TRUE+
2       F           Y         F  TRUE+
3       F           Y         F  TRUE+
4       F           Y         F  TRUE+
5       F           N         M FALSE-
6       F           Y         F  TRUE+
7       F           Y         F  TRUE+
8       F           N         M FALSE-
9       F           N         M FALSE-
10      F           Y         F  TRUE+
11      F           Y         F  TRUE+
12      F           Y         F  TRUE+
13      M           N         M  TRUE-
14      M           Y         F FALSE+
15      M           N         M  TRUE-
16      M           Y         F FALSE+
17      M           N         M  TRUE-
18      M           Y         F FALSE+
19      M           Y         F FALSE+
20      M           Y         F FALSE+

Let's assume F and M represent the positive and negative classes respectively. For example, we predicted that the samples in rows 5, 8, 9, 13, 15, 17 are − and the rest are +. Then, by comparing our predictions with the gender (first column), we can tell which of our predictions were TRUE or FALSE. This way, we put our predictions into four categories: TRUE+ or TP, TRUE- or TN, FALSE- or FN, and FALSE+ or FP. The following cross-classification gives us the number of our predictions in each class:

> (tbl<-table(cl[,c("Gender","prdGender")]))
      prdGender
Gender F M
     F 9 3
     M 5 3

Therefore

TP = 9,  FN = 3,  FP = 5,  TN = 3.

People in the positive class are the ones who are correctly predicted positive or wrongly predicted negative, and similarly for the negative class:

P = TP + FN,  N = TN + FP.

The diagonal elements are the counts of true positive and true negative predictions. One simple and highly intuitive measure of accuracy is

accuracy = (TP + TN) / TOTAL = (9 + 3)/20 = 0.6.
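The accuracy computation can be sketched directly from the four counts (the variable names are ours):

```r
# Confusion counts read off the table above (F is the positive class)
TP <- 9; FN <- 3; FP <- 5; TN <- 3
P <- TP + FN       # 12 actual positives (females)
N <- TN + FP       # 8 actual negatives (males)
accuracy <- (TP + TN) / (P + N)
accuracy           # 12/20 = 0.6
```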
There is a major issue with this measure: sometimes it can fool us. Assume an endemic disease has infected 95% of a population. Then, if we take a sample of 100 people and, with no testing, just predict all of them as infected, we will have a high true positive count and 0 true negatives. The accuracy of this trivial model would be

accuracy = 95/100 = 0.95.

2.1 Sensitivity and Specificity

In the picture below [figure omitted], a square is divided into two rectangles. The square contains all points in the dataset, while the left and right rectangles denote the + and − classes. We have a model that predicts the points inside a circle as + and the rest as −. Then the left half-disk contains the true positives and the right half-disk the false positives. There are different ways of measuring the accuracy of classifiers.

1. Sensitivity, Recall, or true positive rate is the probability that a positive sample (left rectangle) is predicted as positive (in the disk):

Sensitivity = |left half-disk| / |left rectangle|.

2. Specificity is the probability that the diagnostic test will show negative, given that the subject does not have the disease:

Specificity = |right rectangle − right half-disk| / |right rectangle|.

In our example,

sensitivity = 9/12 = 0.75.

Remark 1. We can also define the false positive rate as 1 − specificity.

Let's see an example in the context of testing for a disease. Assume a screening method for a rare disease has sensitivity .86 and specificity .88. This means that

P(X = 1 | Y = 1) = .86,  (1 denotes the positive class)
P(X = 2 | Y = 2) = .88,  (2 denotes the negative class)

Furthermore, assume only 1% of the population is infected by this disease. A person takes the test and X = 1 (the test is positive). What is the probability that he has the disease, Y = 1?

Solution. Here we want to find P(Y = 1 | X = 1).
By the Bayes rule,

P(Y = 1 | X = 1) = P(X = 1 | Y = 1) P(Y = 1) / P(X = 1).

The numerator is readily computed as

P(X = 1 | Y = 1) P(Y = 1) = .86 × .01 = .0086.

For the denominator we have

P(X = 1) = P(X = 1 | Y = 1) P(Y = 1) + P(X = 1 | Y = 2) P(Y = 2) = .86 × .01 + .12 × .99 = .1274.

Therefore

P(Y = 1 | X = 1) = .0086 / .1274 ≈ .0675,

which is still a very small number. Here, P(Y = 1) = .01 is called the (Bayesian) prior and P(Y = 1 | X = 1) ≈ .0675 is the posterior.
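The whole Bayes computation fits in a few lines of R (a sketch; the variable names are ours):

```r
sens <- 0.86; spec <- 0.88; prior <- 0.01
num   <- sens * prior                             # P(X=1, Y=1) = 0.0086
denom <- sens * prior + (1 - spec) * (1 - prior)  # P(X=1), by total probability
posterior <- num / denom                          # P(Y=1 | X=1)
round(posterior, 4)                               # about 0.0675
```

Note how the small prior (1%) keeps the posterior small even though the test is fairly sensitive and specific.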
Here Y = 1 means the subject has the disease and X = 1 means the result of the test is positive.

Another way of measuring the accuracy of classifiers is by Precision and Recall. Using the picture on the right [figure omitted], we have

Precision = TP / (TP + FP),  Recall = TP / (TP + FN) = Sensitivity.

In our example,

Recall = 0.75,  Precision = 9/14 ≈ 0.643.    (5)

The function confusionMatrix() from the library caret takes the prediction column and the true values and computes the contingency table, precision, recall, sensitivity, specificity and much more. Note that it generates a confidence interval for the accuracy; this is because the test dataset is a random sample.

> library(caret)
> confusionMatrix(cl$prdGender,cl$Gender)
Confusion Matrix and Statistics

          Reference
Prediction F M
         F 9 5
         M 3 3

               Accuracy : 0.6
                 95% CI : (0.3605, 0.8088)
    No Information Rate : 0.6
    P-Value [Acc > NIR] : 0.5956

                  Kappa : 0.1304
 Mcnemar's Test P-Value : 0.7237

            Sensitivity : 0.750
            Specificity : 0.375
         Pos Pred Value : 0.6429
         Neg Pred Value : 0.500
             Prevalence : 0.600
         Detection Rate : 0.450
   Detection Prevalence : 0.700
      Balanced Accuracy : 0.5625

       'Positive' Class : F

There is a trade-off between Precision and Recall, in the sense that if we try to improve one of them in our model, the other will decrease.

3 Marginal Probabilities and Independence

Remember the result of the survey:

> table(df)
        Response
Gender    NO YES
  FEMALE 116 509
  MALE   104 398
We can normalize the table by dividing each cell by the total number of participants, i.e., 1127, to define a joint probability on the product space G × R as follows:

> prop.table(table(df))
        Response
Gender       NO    YES
  FEMALE 0.1029 0.4516
  MALE   0.0923 0.3531

This means that P assigns probabilities to pairs of gender and response. For example, P(M, N) = .0923. We can normalize the table in different ways. For example, if we divide the first and the second row by their corresponding totals,

> (G.cond<-prop.table(table(df),1))
        Response
Gender       NO    YES
  FEMALE 0.1856 0.8144
  MALE   0.2072 0.7928

we get conditional probabilities conditioned on Gender. The first row is conditioned on Gender = F and the second row on Gender = M, and we have

P(N | F) = 0.1856,  P(Y | M) = 0.7928.

Similarly, we can condition on the response (the probabilities in each column add up to 1):

> (R.cond<-prop.table(table(df),2))
        Response
Gender       NO    YES
  FEMALE 0.5273 0.5612
  MALE   0.4727 0.4388

P(F | N) = 0.5273 = 1 − P(M | N).

The third way of normalizing is marginalizing. For example, the marginal probability of Gender is

> prop.table(table(df$Gender))
FEMALE   MALE 
0.5546 0.4454 

or, for Response,

> prop.table(table(df$Response))
    NO    YES 
0.1952 0.8048 
We can compute the marginal probabilities from the joint probabilities. For example,

P(F) = P(F, Y) + P(F, N) = .4516 + .1029 = .5546,

because the events R = Y and R = N partition the sample space, i.e., every subject falls in one of these two sets and no subject falls in both events:

P(Y ∪ N) = 1,  P(Y, N) = 0.    (6)

This is an example of the Law of Total Probability. To state this law formally, we need the definition of a partition.

Definition 2. A collection of events A1, ..., An forms a partition of the sample space if it satisfies the following two conditions:

1. They are mutually disjoint:

P(Ai, Aj) = 0 for i ≠ j.    (7)

2. Together they cover the entire sample space:

S = A1 ∪ ... ∪ An.    (8)

Theorem 1 (Law of Total Probability). Let A1, ..., An be a partition of a sample space. Then for any event B,

P(B) = P(B, A1) + ... + P(B, An)  (sum of joint probabilities)    (9)
P(B) = P(A1)P(B | A1) + ... + P(An)P(B | An).    (10)

So the marginal probability of an event A is obtained by summing up all joint probabilities that have A as one of their inputs (margins). [Figure omitted: A1, A2, A3 form a partition of the sample space S, and an event B is split into the pieces B1, B2, B3 that it shares with each Ai.] Equation (10) is also referred to as the Law of Total Conditional Probability, and is readily derived from (9) using the identity P(B, An) = P(An)P(B | An); see the definition of conditional probability and (4). You can check that a conditional probability can be derived by dividing a joint probability by a marginal probability. For example, check that

P(N | F) = P(N, F) / P(F) = .1029/.5546 ≈ .1856.

3.1 Independence

3.1.1 Independence of Two Events

Two events A and B are independent with respect to a probability P : S → [0, 1] if

P(A, B) = P(A) P(B),

which is equivalent to

P(A | B) = P(A).

We interpret the last identity as: information about B doesn't change information about A.
3.1.2 Independent Random Variables

Given two categorical variables X with I levels and Y with J levels, and joint probability density P(X = i, Y = j), let's define the notation

π_ij = P(X = i, Y = j),  i = 1, ..., I and j = 1, ..., J.    (11)

We also define notation for the marginals:

π_i+ = Σ_j π_ij = Σ_j P(X = i, Y = j) = P(X = i)  for i = 1, ..., I,    (12)
π_+j = Σ_i π_ij = Σ_i P(X = i, Y = j) = P(Y = j)  for j = 1, ..., J.    (13)

X and Y are independent if, for any i ∈ {1, ..., I} and j ∈ {1, ..., J}, the joint probability of the events equals the product of the marginals:

P(X = i, Y = j) = P(X = i) P(Y = j),    (14)

or, using the definition of conditional probability,

P(X = i | Y = j) = P(X = i)  for all i and j,    (15)

i.e., the conditional probability equals the marginal probability.

Example 2. There are 100 blue, black and red balls in a jar. Each ball is either wooden or glass. The cross-classification is given in a table of counts:

       Blue Black Red
Glass     ·     ·   ·
Wood      ·     ·   ·

Is color independent of type?

Solution. Dividing the counts by 100 gives the joint probabilities. The marginals are

> (m1<-apply(joint,1,sum))
> (m2<-apply(joint,2,sum))

and one checks cell by cell that the product rule (14) holds, so color is independent of type.
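Since the jar's cell counts are not legible above, here is a sketch with hypothetical counts chosen so that independence holds exactly (marginals 0.4/0.6 for type and 0.2/0.3/0.5 for color; these numbers are ours, not the original example's):

```r
# Hypothetical 100-ball jar whose counts factor into the marginals exactly
jar <- matrix(c( 8, 12, 20,
                12, 18, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Glass", "Wood"), c("Blue", "Black", "Red")))
joint <- jar / sum(jar)          # joint probabilities
m1 <- apply(joint, 1, sum)       # marginal of type:  0.4, 0.6
m2 <- apply(joint, 2, sum)       # marginal of color: 0.2, 0.3, 0.5
max(abs(joint - outer(m1, m2)))  # 0: every cell equals the product of its marginals
```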
         YES   NO
FEMALE   509  116
MALE     398  104

Table 1: Cross Classifying Contingency Table

            NO    YES
FEMALE  0.1856 0.8144
MALE    0.2072 0.7928

Table 2: Conditional Probabilities

4 Comparing Probabilities in 2 × 2 Contingency Tables

In our 2 × 2 contingency table, think of the levels of gender (Female, Male) as the explanatory random variable, or groups, that predict the response variable (Yes, No). Then we can think of p1 = 0.81 and p2 = 0.79 as the probabilities of success (Yes) in each group.

Remark 2. Here we tacitly assume that the numbers of males and females are fixed (non-random). Our analysis doesn't say what our best guess for the number of males and females would be if we repeated the sample. It only analyzes the range of the probabilities p1 and p2 in each group.

Let π1 and π2 denote the true rates of Response = YES in the female and male populations respectively. Then Remark 2 implies that the YES and NO responses in each group follow Bernoulli distributions with parameters π1 and π2, which are unknown to us.

Remark 3. If B is a Bernoulli random variable with parameter p, then mean(B) = p and Var(B) = p(1 − p).

We know the sample rates are p1 = 0.81 and p2 = 0.79. Can we compute a 95% confidence interval for π1 − π2? The rate of success is p1 = 509/625 ≈ 0.81 for the 625 female participants, where we put 1 and 0 for YES and NO responses respectively, and p2 = 398/502 ≈ 0.79 for the 502 male participants. Therefore, we can think of p1 and p2 as random sample means. By the Central Limit Theorem, we can assume they are sampled from two random variables that are distributed normally. More precisely,

p1 ~ N(π1, σ1²),  p2 ~ N(π2, σ2²),    (16)

where σ1² = π1(1 − π1)/625 and σ2² = π2(1 − π2)/502. Since we don't have π1 and π2, we use p1 and p2 as approximations for π1 and π2 and write s1² and s2² instead of σ1² and σ2². Therefore we have

> p1<-0.81;p2<-0.79
> (s1<-p1*(1-p1)/625)
[1] 0.00024624
> (s2<-p2*(1-p2)/502)
[1] 0.0003304781

Therefore

p1 ~ N(0.81, 0.00024624),  p2 ~ N(0.79, 0.00033048).    (17)

Now, let's discuss p1 − p2. But first, a theorem!
In the following theorem, pay attention that the variance is always the sum of the variances.

Theorem 2. If X1 and X2 are independent normal random variables with mean and variance parameters (m1, σ1²) and (m2, σ2²), then X1 ± X2 is normal with parameters (m1 ± m2, σ1² + σ2²).

Therefore, p1 − p2 is normally distributed with mean 0.81 − 0.79 = .02 and

SE = sqrt(s1 + s2) = sqrt(0.00024624 + 0.00033048) ≈ 0.024.    (18)

Therefore, the 95% confidence interval is

[.02 − 1.96 × 0.024, .02 + 1.96 × 0.024] ≈ [−0.027, 0.067].

We can also perform hypothesis testing. Assume we want to check

H0 : π1 = π2  vs  H1 : π1 ≠ π2.
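The interval can be computed end to end as follows (a sketch using the rounded rates p1 = 0.81 and p2 = 0.79 from the text):

```r
p1 <- 0.81; p2 <- 0.79; n1 <- 625; n2 <- 502
s1 <- p1 * (1 - p1) / n1        # estimated variance of p1
s2 <- p2 * (1 - p2) / n2        # estimated variance of p2
se <- sqrt(s1 + s2)             # variances add for p1 - p2 (Theorem 2)
ci <- (p1 - p2) + c(-1, 1) * 1.96 * se
round(ci, 3)                    # roughly -0.027 to 0.067
```

Since the interval contains 0, the data are consistent with π1 = π2.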
Remark 4. π1 = π2 means P(R = Y | F) = P(R = Y | M). Then, by the law of total probability,

P(Y) = P(Y | F)P(F) + P(Y | M)P(M) = P(Y | F)(P(F) + P(M)) = P(Y | F),

so P(Y | F) = P(Y), which implies P(N | F) = P(N). Similarly, P(Y | M) = P(Y) and P(N | M) = P(N). Therefore, Response is independent of Gender.

Remember how we approximated σ1² and σ2² in (16) by s1² and s2². In hypothesis testing, under the null hypothesis π1 = π2, we can do a better job, thanks to Remark 4. The pooled variance is a common variance σ², closely related to the between variation in AOV, replacing both σ1² and σ2². If π1 = π2, then, as mentioned in Remark 4, Response is independent of Gender; therefore, the two samples are from the same population, and σ1² = σ2² = σ². σ² is estimated by averaging the sample variances:

s_p² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2),

where here s1² = p1(1 − p1) and s2² = p2(1 − p2) are the per-observation sample variances, so the standard error of p1 − p2 becomes sqrt(s_p²(1/n1 + 1/n2)) ≈ 0.0239; compare with (18). If π1 = π2, then p1 − p2 ~ N(0, 0.0239²). The number we sampled is p1 − p2 = .02. What is the chance that N(0, 0.0239²) produces a number .02 or farther from the center 0?

> 2*pnorm(q = -.02,mean = 0,sd = 0.0239,lower.tail = T)
[1] 0.4027

Therefore, we cannot reject the null hypothesis.

5 Odds and Odds Ratio

If π is the rate of success in a binomial trial, then its corresponding odds is defined to be

odds = π / (1 − π).    (19)

If odds = 4, then a success is 4 times as likely as a failure; we expect to see, on average, 4 successes for each failure. We can of course retrieve π from its odds via π = odds/(odds + 1). Every 2 × 2 contingency table induces two rates of success, π1 and π2, corresponding to its rows. Let odds1 and odds2 be the odds corresponding to π1 and π2. Dividing odds1 by odds2 gives another measure of association between the rows. This measure, denoted by θ, is called the odds ratio and is defined by

θ = odds1/odds2 = (π1/(1 − π1)) / (π2/(1 − π2)).    (20)
Odds ratios are positive numbers, θ ∈ (0, ∞). θ = 4 means the odds of the group in the first row are 4 times the odds of the group in the second row; θ = 1/4 means the opposite, i.e., the odds of the group in the second row are 4 times the odds of the group in the first row. θ = 1 means the odds are equal, which implies π1 = π2. In general, θ > 1 implies π1 > π2, θ = 1 implies π1 = π2, and θ < 1 implies π1 < π2. Furthermore, for any positive number α > 0, θ = α and θ = 1/α convey opposite implications about the odds of the two groups. As we always do in statistics, we only have the sample odds ratio, defined by

θ̂ = (p1/(1 − p1)) / (p2/(1 − p2)).

Consider two populations with equal odds. Then the sampling odds ratio will be around 1, but its left tail lies in (0, 1) while its right tail lies in (1, ∞); therefore, the sampling distribution of the odds ratio is highly skewed. But if we consider log θ instead of θ, we will have nicer and more intuitive properties. For example,
log θ = 0 (i.e., θ = 1) implies π1 = π2; log θ = 2 and log θ = −2 are symmetric around 0 and convey opposite statements about π1 and π2. The sample log odds ratio, log θ̂, has a less skewed sampling distribution that is bell-shaped, with standard deviation given by

SE = sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22),    (21)

where the n_ij are the counts in the contingency table.

Example 3. In our contingency table of the afterlife belief, compute log θ̂ and a 95% confidence interval for log θ.

Solution. The sample log θ̂ and standard deviation are

> odds.f<-509/116
> odds.m<-398/104
> (p<-log(odds.f/odds.m))
[1] 0.1368
> (SE=sqrt((1/509)+(1/116)+(1/398)+(1/104)))
[1] 0.1507

Then the lower and upper limits of the 95% CI are

> (lower<-p-1.96*SE)
[1] -0.1586
> (upper<-p+1.96*SE)
[1] 0.4322

Since zero is included in the interval, log θ = 0 is a possibility; therefore, π1 = π2 cannot be ruled out at the 95% confidence level. By exponentiating, we find that the 95% CI for θ is

[exp(−0.1586), exp(0.4322)] ≈ [0.853, 1.541].

5.1 Contingency Tables and Chi-Square Test

A 2000 General Social Survey cross-classifies 2757 subjects based on gender and their political party as below:

        Democrat Independent Republican
Female       762         327        468
Male         484         239        477

This table defines a sample joint probability p = {p11, p12, p13, p21, p22, p23}, that is,

        Democrat Independent Republican
Female    0.2764      0.1186     0.1698
Male      0.1756      0.0867     0.1730

Of course p is random, as it is defined by a random sample. Is there enough evidence to reject H0 defined by

H0 : π = {0.25, 0.1, 0.25, 0.15, 0.1, 0.15}?
Solution. The expected count for each cell, μ = (μ_ij), based on π = (π_ij), is obtained by μ = 2757 π. So we have:

        Democrat Independent Republican
Female    689.25      275.70     689.25
Male      413.55      275.70     413.55

There are 6 residuals, the differences between the expected (fitted) values and the sample (actual) values. The squared residuals are

> sample<-c(762,327,468,484,239,477)
> expected<-c(689.25,275.70,689.25,413.55,275.70,413.55)
> res.sq<-c(sample-expected)^2

Bigger residuals are stronger evidence against H0.

5.2 Chi-squared Distribution

The chi-squared distribution, also denoted χ², with k degrees of freedom is the distribution of the sum of squares of k independent standard normal random variables. Think of the residuals in a contingency table: they are approximately normal, and after dividing by their standard deviations they become standard normal.

Definition 3 (Wikipedia). If Z1, ..., Zk are independent, standard normal random variables, then the sum of their squares,

Q = Σ_{i=1}^k Z_i²,    (22)

is distributed according to the chi-squared distribution with k degrees of freedom. This is usually denoted Q ~ χ²_k. Furthermore, if X ~ χ²_k, then EX = k and VarX = 2k.
The graph of the densities [figure omitted] shows how the χ² density becomes closer to a normal density as the degrees of freedom increase. In the discussion that comes next, we will talk about the mean and variance of the frequencies of each cell, not to be confused with the mean and variance of Q.

5.3 Simulating the Contingency Table

Assume we know the population probabilities of each cell: π = {π11, π12, π13, π21, π22, π23}. Then the counts of these 6 cells follow a multinomial distribution. For example, with 1000 people we might get

> set.seed(1001)
> pi<-c(.1,.3,.2,.1,.2,.1)
> r=14
> N=1000
> (sample<-rmultinom(n = r,size = N,prob = pi))
[output omitted: a 6 × 14 matrix of counts]

Each column is a random sample for the 6 cells of the contingency table, and each row contains 14 samples for one of the 6 cells. Check that each cell looks normal with mean = Nπ and sd = sqrt(Nπ(1 − π)):

> (mean<-1000*pi[1])
[1] 100
> (sd=sqrt(1000*pi[1]*(1-pi[1])))
[1] 9.486833
> hist(sample[1,])
[histogram of sample[1,] omitted]
So, each cell cell_ij is a binomial with parameters N and π_ij, with mean Nπ_ij and variance Nπ_ij(1 − π_ij); see Theorem 3. Therefore, by the CLT,

(O − Nπ_ij) / sqrt(Nπ_ij(1 − π_ij)) ≈ N(0, 1).

5.4 A Two-Cell Model

Let's see what this yields when there are only two cells: one row and 2 columns, i.e., j = 1 and i = 1, 2. Let π1 and π2 denote the probabilities of cell 1 and cell 2: π1 + π2 = 1. Furthermore, let

1. E_i = Nπ_i denote the mean (expected value) of cell i;
2. O_i denote the observation in cell i, so O1 + O2 = N;
3. Q_i = (O_i − Nπ_i) / sqrt(Nπ_i(1 − π_i)).

By the CLT, Q1 is sampled from an approximately normal distribution (see Theorem 3), and, writing π = π1,

Q1² = (O1 − E1)² / (Nπ(1 − π))
    = (O1 − Nπ)²/(Nπ) + (O2 − N(1 − π))²/(N(1 − π))   (after some algebraic manipulation)
    = (O1 − E1)²/E1 + (O2 − E2)²/E2.

Therefore, in a two-cell model, if we compute (O_i − E_i)²/E_i for each cell and add them up, the result is a χ²_1.

Theorem 3. If X is a binomial random variable with parameters N and π, where N is the number of trials and π the rate of success, then

EX = Nπ,  sdX = sqrt(Nπ(1 − π)).    (23)

5.5 Six-Cell Model

The same holds when there are 6 cells. We want to test the null hypothesis, where H0 and H1 are given by

H0 : π = (π1, ..., π6),  H1 : π is not as given by H0,    (24)

and we have observations O1, ..., O6. Furthermore, the total number of observations N = O1 + ... + O6 can be computed.

1. Compute E_i = Nπ_i for i = 1, ..., 6.
2. Compute (O_i − E_i)²/E_i for i = 1, ..., 6.
3. Compute the test statistic Q = Σ_{i=1}^6 (O_i − E_i)²/E_i.

Q is a number which, under H0, is sampled from χ²_5: Q ~ χ²_5. Check the table to see the chance p_val that one would sample Q or bigger from χ²_5. If p_val < .05, you reject H0.

5.6 Example Continued, Goodness of the Fit

For our cross-classification of gender and political party, we computed all the summands. If we add them up, we get

> (chi.sq<-sum(res.sq/expected))
[1] 114.8673

This number is sampled from χ²_5. What is the chance that χ²_5 generates 114.8673 or bigger?
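The algebraic identity in the two-cell model can be checked numerically (N, π and the observed split below are made up for illustration):

```r
N <- 100; p <- 0.3            # hypothetical number of trials and cell-1 probability
O1 <- 24; O2 <- N - O1        # a hypothetical observed split of the N outcomes
E1 <- N * p; E2 <- N * (1 - p)
Q1sq <- (O1 - E1)^2 / (N * p * (1 - p))        # left-hand side
chi  <- (O1 - E1)^2 / E1 + (O2 - E2)^2 / E2    # right-hand side
c(Q1sq = Q1sq, chi = chi)     # both equal 36/21 = 1.7142857...
```

The agreement is exact: a single standardized binomial cell squared is the same number as the Pearson statistic summed over both cells.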
> (p_val<-pchisq(q = chi.sq,df = 5,lower.tail = FALSE))
[1] 3.8e-23

And we reject H0. This is an example of checking the goodness of the fit using the chi-squared test. We are given observations, and our model is defined by π1, ..., π6, or equivalently π_ij, i = 1, ..., 3, j = 1, 2. In this case we concluded that the model defined by H0 is not appropriate.

Example 4. Assume we are given 20 numbers and we want to see if it is acceptable to assume they are sampled from a normal distribution. (The list of 20 numbers is omitted here; the largest is 120.)

> hist(t)
[histogram of t omitted]

Solution. If the numbers are from a normal distribution, then the mean and standard deviation would be estimated by

> m<-mean(t)
> sd<-sd(t)

which give approximately m = 102 and sd = 10. The range starts at 88 and ends at the biggest number, 120. Let's make 3 cells: Cell 2 for all observations within one standard deviation of the mean, i.e., all observations in the interval Cell 2 = [102 − 10, 102 + 10] = [92, 112]; Cell 3 = (112, ∞); and Cell 1 = (−∞, 92]. If the numbers are from N(102, 10), then we can find the probability of each cell. From the standard normal density we know that

> pi<-c(.16,.68,.16)
> o1<-sum(t<(m-sd));o3<-sum(t>(m+sd));o2<-(length(t)-(o1+o3))
> (ob<-c(o1,o2,o3))
> (E<-pi*20)
[1]  3.2 13.6  3.2
> (res<-((ob-E)^2)/E)
> (chi.sq<-sum(res))
[1] 0.588

There are 3 cells, therefore there are 2 degrees of freedom. Is Q = 0.588 too big for χ²_2?

> (p_val<-pchisq(q = chi.sq,df = 2,lower.tail = FALSE))
[1] 0.745

No: we cannot reject the possibility that the numbers are sampled from N(102, 10).

5.7 Test of Independence

We dealt with independence in Example 2. What is different here? In Example 2 we had access to the entire population (the jar of balls) and could compute π_ij for each cell. Here, we have a sample, and we need a more powerful theory to infer about the population probabilities. We cannot apply the definition of independence directly to the probabilities p derived from the contingency table, as they are at best estimates of π_ij and fluctuate.

5.7.1 Structure of H0

In the χ² test of independence, H0 is different from the H0 for goodness of fit in (24). Here, instead of the joint probabilities π_ij, we are given the observations. Then we can add up the observations in the columns and rows to compute the marginals π_i+ and π_+j; see (12).

5.7.2 Degrees of Freedom

Furthermore, when testing goodness of fit, the only constraint on the joint probabilities π_ij is that

Σ_ij π_ij = 1,    (25)

therefore there are I·J − 1 degrees of freedom. When testing independence, the marginals are computed from the observations. Using the marginals, we compute the joint probabilities under the independence condition; see (14). Therefore, instead of Σ_ij π_ij = 1, the joint probabilities in each row and each column should add up to the first and second marginals respectively. Therefore, the degrees of freedom are (I − 1)(J − 1).

5.7.3 How It Works

Assume there are I rows and J columns, so i ranges over 1, ..., I and j over 1, ..., J. We are given observations O_ij for all i, j. Therefore, we can compute the sample marginals p_i+ and p_+j. Use the sample marginals as approximations for π_i+ and π_+j, and use π_i+ and π_+j to compute the joint probabilities π_ij under the independence condition.
Finally, test the hypothesis that the observations are consistent with the joint probabilities:

H0 : O_ij is sampled from π_ij for all i, j;
H1 : O_ij is not sampled from π_ij for at least one i, j.
Solution. We discuss the procedure step by step below. Remember that we are only given the observations O_ij.

1. Let O = Σ_ij O_ij denote the total number of observations.
2. Add the observations in each row and divide by O to obtain the row marginals π_1+, ..., π_I+.
3. Add the observations in each column and divide by O to obtain the column marginals π_+1, ..., π_+J.
4. Under independence, compute the joint probabilities π_ij = π_i+ π_+j.
5. Compute all the expected observations E_ij = π_ij O.
6. Compute (O_ij − E_ij)²/E_ij for all i, j.
7. Compute the test statistic

Q = Σ_ij (O_ij − E_ij)²/E_ij.    (26)

8. Compute the p-value p_val, the right tail of χ²_df that is bigger than Q, using df = (I − 1)(J − 1) degrees of freedom.

Let me sample from the jar in Example 2. The sample is with replacement, so the size of the sample is not limited by the 100 balls in the jar. Test the hypothesis

H0 : the color of a ball is independent of its type.

       Blue Black Red
Glass     2     6   8
Wood      7    14  13

To be able to use R, we store these numbers in a matrix:

> (a<-matrix(data = c(2,6,8,7,14,13),nrow = 2,byrow = TRUE,
+ dimnames = list(c("Glass","Wood"),c("Blue","Black","Red"))))
      Blue Black Red
Glass    2     6   8
Wood     7    14  13

The total number of observations is O = 2 + 6 + 8 + 7 + 14 + 13 = 50. The first marginal is m.1 = [(2 + 6 + 8)/50, (7 + 14 + 13)/50]:

> (m.1<-apply(X = a,MARGIN = 1,FUN = sum)/50)
Glass  Wood 
 0.32  0.68 

The second marginal:

> (m.2<-apply(X = a,MARGIN = 2,FUN = sum)/50)
 Blue Black   Red 
 0.18  0.40  0.42 

The joint probabilities under independence:
        Blue Black    Red
Glass 0.0576 0.128 0.1344
Wood  0.1224 0.272 0.2856

Expected observations:

> (E<-jp*50)
      Blue Black   Red
Glass 2.88   6.4  6.72
Wood  6.12  13.6 14.28

Compute the squared residuals divided by the expected values:

> (R<-((E-a)^2)/E)
        Blue   Black    Red
Glass 0.2689 0.0250 0.2438
Wood  0.1265 0.0118 0.1147

Compute Q:

> (chi.sq<-sum(R))
[1] 0.7907

The degrees of freedom are (2 − 1)(3 − 1) = 2.

9. Compute the p-value:

> (p_val<-pchisq(q = 0.7907,df = 2,lower.tail = FALSE))
[1] 0.6735

We cannot reject independence. Or you can simply feed your data to the following command in R:

> chisq.test(a)

	Pearson's Chi-squared test

data:  a
X-squared = 0.79074, df = 2, p-value = 0.6735

6 ROC Curve and AUC

Assume an endemic disease affected 10% of a population. We designed a classifier that generates probabilities p of being in the positive class (diseased). We gather two groups of 500 healthy and 50 diseased patients and look at the distribution of the scores that the classifier produces for each group. Assume we get the following results:

> H.score<-rnorm(500,.3,.15)
> S.score<-rnorm(50,.7,.1)

Let's look at the overlapping distributions of the two groups of scores in a plot. The vertical line is a threshold which encodes the prediction rule: scores on the right (bigger than the threshold) are predicted sick and scores on the left are predicted healthy. Then:

1. Red on the left of the vertical line means: TRUE NEGATIVE.
2. Red on the right of the vertical line means: FALSE POSITIVE.
3. Green on the right of the vertical line means: TRUE POSITIVE.
4. Green on the left of the vertical line means: FALSE NEGATIVE.

We can place the vertical line at any x ∈ (0, 1) and compute the corresponding true positive rate and false positive rate. The resulting plot in false positive–true positive space is a curve. Assume from the data we know the positive class and the negative class (this is not based on prediction, but on the labels of the data). For example, H.score has 500 members, so we know that TN + FP = 500. Similarly, there are 50 sick people, which means TP + FN = 50. Before we proceed further to compute the rates, let's plot the scores, and then redefine S.score and H.score.

> library(ggplot2)
> dat<-data.frame(dens = c(H.score,S.score),lines=rep(c("Healthy", "Sick"), c(500,50)))
> ggplot(dat, aes(dens, fill = lines)) + geom_histogram(position = "dodge")+
+ geom_vline(xintercept = .6)
[plot omitted: overlapping histograms of the Healthy and Sick scores with a vertical threshold line at 0.6]
> H.score<-rnorm(500,.4,.2)
> S.score<-rnorm(50,.6,.2)

Therefore, for a given threshold, say .6, we have

> threshold=.6
> lp<-length(pclass<-S.score)
> ln<-length(nclass<-H.score)
> c(tpr=sum(pclass>threshold)/lp,fpr=sum(nclass>threshold)/ln)

where tpr and fpr stand for true positive rate and false positive rate respectively. We can compute tpr and fpr for different thresholds:

> ts<-seq(from = .01,to = .99,by = .01)
> K<-unname(unlist(lapply(X = ts,FUN = function(threshold)
+ c(tpr=sum(pclass>threshold)/length(pclass),
+ fpr=sum(nclass>threshold)/length(nclass)))))

Then we can separate tpr from fpr:

> tpr<-K[seq(from=1,to = 197,by = 2)]
> fpr<-K[seq(from=2,to = 198,by = 2)]

Then we can plot it:

> plot(fpr,tpr,type = 'l')
[plot omitted: the ROC curve traced out as the threshold varies]

The curve produced this way is called the ROC curve, and the area under the curve equals the probability that
the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. This fact allows us to compute the area under the curve by sampling and counting:

> p = replicate(50000, sample(pclass, size=1) > sample(nclass, size=1))
> (mean(p))

Let's repeat the process with a modified classifier which gives different (better) scores:

> H.score<-rnorm(500,.4,.15)->nclass
> S.score<-rnorm(50,.6,.12)->pclass
> ts<-seq(from = .01,to = .99,by = .01)
> K<-unname(unlist(lapply(X = ts,FUN = function(threshold)
+ c(tpr=sum(pclass>threshold)/length(pclass),
+ fpr=sum(nclass>threshold)/length(nclass)))))

Then we can separate tpr from fpr:

> tpr<-K[seq(from=1,to = 197,by = 2)]
> fpr<-K[seq(from=2,to = 198,by = 2)]

Then we can plot it:

> plot(fpr,tpr,type = 'l')
[plot omitted: the ROC curve of the modified classifier, closer to the top-left corner]
> p = replicate(50000, sample(pclass, size=1) > sample(nclass, size=1))
> (mean(p))

You can play with the distributions of pclass and nclass and repeat the procedure above to see that, as the scores become more concentrated and more separated, the area under the curve increases.
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More informationSTAT 705: Analysis of Contingency Tables
STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic
More informationBIOS 625 Fall 2015 Homework Set 3 Solutions
BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
More informationChapter 26: Comparing Counts (Chi Square)
Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces
More informationA.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I. kevin small & byron wallace
A.I. in health informatics lecture 2 clinical reasoning & probabilistic inference, I kevin small & byron wallace today a review of probability random variables, maximum likelihood, etc. crucial for clinical
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More informationHow do we compare the relative performance among competing models?
How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationTesting Independence
Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. Odds Ratio and Tests of Independence
ST3241 Categorical Data Analysis I Two-way Contingency Tables Odds Ratio and Tests of Independence 1 Inference For Odds Ratio (p. 24) For small to moderate sample size, the distribution of sample odds
More informationChapter 2: Describing Contingency Tables - I
: Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationMA : Introductory Probability
MA 320-001: Introductory Probability David Murrugarra Department of Mathematics, University of Kentucky http://www.math.uky.edu/~dmu228/ma320/ Spring 2017 David Murrugarra (University of Kentucky) MA 320:
More informationThe Chi-Square Distributions
MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness
More informationThe Multinomial Model
The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient
More informationQuantitative Analysis and Empirical Methods
Hypothesis testing Sciences Po, Paris, CEE / LIEPP Introduction Hypotheses Procedure of hypothesis testing Two-tailed and one-tailed tests Statistical tests with categorical variables A hypothesis A testable
More informationLeast Squares Classification
Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting
More informationCohen s s Kappa and Log-linear Models
Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance
More informationQ1 (12 points): Chap 4 Exercise 3 (a) to (f) (2 points each)
Q1 (1 points): Chap 4 Exercise 3 (a) to (f) ( points each) Given a table Table 1 Dataset for Exercise 3 Instance a 1 a a 3 Target Class 1 T T 1.0 + T T 6.0 + 3 T F 5.0-4 F F 4.0 + 5 F T 7.0-6 F T 3.0-7
More informationSmart Home Health Analytics Information Systems University of Maryland Baltimore County
Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on
More informationSTAT:5100 (22S:193) Statistical Inference I
STAT:5100 (22S:193) Statistical Inference I Week 3 Luke Tierney University of Iowa Fall 2015 Luke Tierney (U Iowa) STAT:5100 (22S:193) Statistical Inference I Fall 2015 1 Recap Matching problem Generalized
More informationEpidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval
Epidemiology 9509 Wonders of Biostatistics Chapter 11 (continued) - probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is being
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More information15: CHI SQUARED TESTS
15: CHI SQUARED ESS MULIPLE CHOICE QUESIONS In the following multiple choice questions, please circle the correct answer. 1. Which statistical technique is appropriate when we describe a single population
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationCategorical Variables and Contingency Tables: Description and Inference
Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements
More informationEvaluation & Credibility Issues
Evaluation & Credibility Issues What measure should we use? accuracy might not be enough. How reliable are the predicted results? How much should we believe in what was learned? Error on the training data
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationPubH 5450 Biostatistics I Prof. Carlin. Lecture 13
PubH 5450 Biostatistics I Prof. Carlin Lecture 13 Outline Outline Sample Size Counts, Rates and Proportions Part I Sample Size Type I Error and Power Type I error rate: probability of rejecting the null
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationInterpret Standard Deviation. Outlier Rule. Describe the Distribution OR Compare the Distributions. Linear Transformations SOCS. Interpret a z score
Interpret Standard Deviation Outlier Rule Linear Transformations Describe the Distribution OR Compare the Distributions SOCS Using Normalcdf and Invnorm (Calculator Tips) Interpret a z score What is an
More informationLecture 8: Summary Measures
Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationSTAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression
STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test
More informationGlossary for the Triola Statistics Series
Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationProbability Theory and Applications
Probability Theory and Applications Videos of the topics covered in this manual are available at the following links: Lesson 4 Probability I http://faculty.citadel.edu/silver/ba205/online course/lesson
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationChapter 10. Discrete Data Analysis
Chapter 1. Discrete Data Analysis 1.1 Inferences on a Population Proportion 1. Comparing Two Population Proportions 1.3 Goodness of Fit Tests for One-Way Contingency Tables 1.4 Testing for Independence
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationInference for Binomial Parameters
Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for
More informationTwo-sample Categorical data: Testing
Two-sample Categorical data: Testing Patrick Breheny April 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/28 Separate vs. paired samples Despite the fact that paired samples usually offer
More information16.400/453J Human Factors Engineering. Design of Experiments II
J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationReview of Statistics
Review of Statistics Topics Descriptive Statistics Mean, Variance Probability Union event, joint event Random Variables Discrete and Continuous Distributions, Moments Two Random Variables Covariance and
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationPsych 230. Psychological Measurement and Statistics
Psych 230 Psychological Measurement and Statistics Pedro Wolf December 9, 2009 This Time. Non-Parametric statistics Chi-Square test One-way Two-way Statistical Testing 1. Decide which test to use 2. State
More informationReview of Statistics 101
Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods
More informationTA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM
STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More informationSTATISTICS 141 Final Review
STATISTICS 141 Final Review Bin Zou bzou@ualberta.ca Department of Mathematical & Statistical Sciences University of Alberta Winter 2015 Bin Zou (bzou@ualberta.ca) STAT 141 Final Review Winter 2015 1 /
More informationCptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1
CptS 570 Machine Learning School of EECS Washington State University CptS 570 - Machine Learning 1 IEEE Expert, October 1996 CptS 570 - Machine Learning 2 Given sample S from all possible examples D Learner
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationGoodness of Fit Tests
Goodness of Fit Tests Marc H. Mehlman marcmehlman@yahoo.com University of New Haven (University of New Haven) Goodness of Fit Tests 1 / 38 Table of Contents 1 Goodness of Fit Chi Squared Test 2 Tests of
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationLecture 41 Sections Mon, Apr 7, 2008
Lecture 41 Sections 14.1-14.3 Hampden-Sydney College Mon, Apr 7, 2008 Outline 1 2 3 4 5 one-proportion test that we just studied allows us to test a hypothesis concerning one proportion, or two categories,
More informationChapter 9 Inferences from Two Samples
Chapter 9 Inferences from Two Samples 9-1 Review and Preview 9-2 Two Proportions 9-3 Two Means: Independent Samples 9-4 Two Dependent Samples (Matched Pairs) 9-5 Two Variances or Standard Deviations Review
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationChapter 10: Chi-Square and F Distributions
Chapter 10: Chi-Square and F Distributions Chapter Notes 1 Chi-Square: Tests of Independence 2 4 & of Homogeneity 2 Chi-Square: Goodness of Fit 5 6 3 Testing & Estimating a Single Variance 7 10 or Standard
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationStatistical methods for comparing multiple groups. Lecture 7: ANOVA. ANOVA: Definition. ANOVA: Concepts
Statistical methods for comparing multiple groups Lecture 7: ANOVA Sandy Eckel seckel@jhsph.edu 30 April 2008 Continuous data: comparing multiple means Analysis of variance Binary data: comparing multiple
More informationLecture 7: Hypothesis Testing and ANOVA
Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis
More informationStatistical methods in recognition. Why is classification a problem?
Statistical methods in recognition Basic steps in classifier design collect training images choose a classification model estimate parameters of classification model from training images evaluate model
More informationProbability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?
Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationProbability and Discrete Distributions
AMS 7L LAB #3 Fall, 2007 Objectives: Probability and Discrete Distributions 1. To explore relative frequency and the Law of Large Numbers 2. To practice the basic rules of probability 3. To work with the
More informationMath Review Sheet, Fall 2008
1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More information3 PROBABILITY TOPICS
Chapter 3 Probability Topics 135 3 PROBABILITY TOPICS Figure 3.1 Meteor showers are rare, but the probability of them occurring can be calculated. (credit: Navicore/flickr) Introduction It is often necessary
More information