(Re)introduction to Statistics Dan Lizotte
Statistics

"The systematic collection and arrangement of numerical facts or data of any kind; (also) the branch of science or mathematics concerned with the analysis and interpretation of numerical data and appropriate ways of gathering such data." [OED]

Why statistics?
- Can tell you if you should be surprised by your data
- Can help predict what future data will look like

Data

    ## We'll use data on the duration and spacing of eruptions
    ## of the Old Faithful geyser
    ## Data are eruption duration and waiting time to next eruption
    data("faithful")  # load data
    str(faithful)     # display the internal structure of an R object
    ## 'data.frame': 272 obs. of 2 variables:
    ##  $ eruptions: num
    ##  $ waiting  : num

Data summaries

A statistic is the result of applying a function (summary) to the data: statistic <- function(data)

E.g. Min, Quantiles, Median, Mean, Max (all but the mean are based on ranks)

    summary(faithful$eruptions)
    ## Min. 1st Qu. Median Mean 3rd Qu. Max.
    ##

Roughly, a quantile for a proportion p is a value x for which a proportion p of the data are less than or equal to x. The first quartile, median, and third quartile are the quantiles for p = 0.25, p = 0.5, and p = 0.75, respectively.

Visual Summary 1: Box Plot

    boxplot(faithful$eruptions, main="Eruption time", horizontal=TRUE)
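The informal quantile definition above can be checked directly on the eruption durations. This is a minimal sketch; the choice of `type = 1` (the inverse of the empirical CDF) is an assumption made here so that the "at least a proportion p" property holds exactly, whereas R's default quantile type interpolates between observations.

```r
# Sketch: check the informal quantile definition on the eruption durations.
data("faithful")
x <- faithful$eruptions
p <- c(0.25, 0.5, 0.75)

# type = 1 is an assumption of this sketch (inverse empirical CDF, no interpolation)
q <- quantile(x, probs = p, type = 1)

# Proportion of observations less than or equal to each quantile
prop_le <- sapply(q, function(v) mean(x <= v))

print(rbind(quantile = q, prop_le = prop_le))
```

Each entry of `prop_le` should be at least the corresponding p; ties in the data can push it slightly above.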
[Figure: box plot titled "Eruption time"]

Visual Summary 1.5: Box Plot, Jitter Plot

    library(ggplot2); library(gridExtra)
    # boxplot relatives
    b1 <- ggplot(faithful, aes(x="All", y=eruptions)) + labs(x=NULL) + geom_boxplot()
    # jitter plot
    b2 <- ggplot(faithful, aes(x="All", y=eruptions)) + labs(x=NULL) +
      geom_jitter(position=position_jitter(height=0, width=0.25))
    # show both plots side by side
    grid.arrange(b1, b2, nrow=1)
[Figure: box plot (left) and jitter plot (right) of eruptions, x axis "All"]

Visual Summary 2: Histogram

    ## Construct histogram of eruption times, plot data points on the x axis
    hist(faithful$eruptions, main="Eruption time", xlab="Time (minutes)", ylab="Count")
    points(x=faithful$eruptions, y=rep(0, length(faithful$eruptions)), lwd=4, col='blue')
[Figure: histogram titled "Eruption time"; x axis "Time (minutes)", y axis "Count"]

Visual Summary 2.5: Histogram

    ## Construct different histogram of eruption times
    ggplot(faithful, aes(x=eruptions)) + labs(y="Proportion") +
      geom_histogram(aes(y =..count../sum(..count   [truncated in source]
[Figure: histogram of eruptions; x axis "eruptions", y axis "Proportion"]

Visual Summary 3: Empirical Cumulative Distribution Function

    ## Construct ECDF of eruption times, plot data points on the x axis
    plot(ecdf(faithful$eruptions), main="Eruption time", xlab="Time (minutes)", ylab="Proportion")
    points(x=faithful$eruptions, y=rep(0, length(faithful$eruptions)), lwd=4, col='blue')
[Figure: ECDF titled "Eruption time"; x axis "Time (minutes)", y axis "Proportion"]

Visual Summary 3.5: Empirical Cumulative Distribution Function

    ## Different picture of ECDF, with jitter plot
    ggplot(faithful, aes(x=eruptions)) + labs(x="Eruption Time", y="Proportion") +
      stat_ecdf() +
      geom_jitter(aes(y=0.125), position=position_jitter(width=0, height=0.1))
[Figure: ECDF with jitter plot; x axis "Eruption Time", y axis "Proportion"]

Replicates

A common assumption is that the data consist of replicates that are "the same":
- Come from the same population
- Come from the same process

The goal of data analysis is to understand what the data tell us about the population.

Randomness

We often assume that we can treat items as if they were distributed "randomly."
- "That's so random!"
- "Result of a coin flip is random"
- "Passengers were screened at random"

"Random" does not mean "uniform."

Mathematical formalism: events and probability

Sample Spaces and Events

Sample space $S$ is the set of all possible outcomes we might observe. Depends on context.
- Coin flips: $S = \{h, t\}$
- Eruption times: $S = \mathbb{R}_{\ge 0}$
- (Eruption times, Eruption waits): $S = \mathbb{R}_{\ge 0} \times \mathbb{R}_{\ge 0}$

An event is a subset of the sample space.
- Observe heads: $\{h\}$
- Observe eruption for 2 minutes: $\{2.0\}$
- Observe eruption with length between 1 and 2 minutes and wait between 50 and 70 minutes: $[1, 2] \times [50, 70]$

Event Probabilities

Any event can be assigned a probability between 0 and 1 (inclusive).
- $\Pr(\{h\}) = 0.5$
- $\Pr([1, 2] \times [50, 70]) = 0.10$

Probability (OED)

"Math. As a measurable quantity: the extent to which a particular event is likely to occur, or a particular situation be the case, as measured by the relative frequency of occurrence of events of the same kind in the whole course of experience, and expressed by a number between 0 and 1. An event that cannot happen has probability 0; one that is certain to happen has probability 1. Probability is commonly estimated by the ratio of the number of successful cases to the total number of possible cases, derived mathematically using known properties of the distribution of events, or estimated logically by inferential or inductive reasoning (when mathematical concepts may be inapplicable or insufficient)."

Axioms of probability

$\Pr$ is a probability function over $S$ iff
1. For all events $A$, $\Pr(A) \in \mathbb{R}$ and $\Pr(A) \ge 0$
2. $\Pr(S) = 1$
3. If $A_1, A_2, \ldots$ are disjoint, then $\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i)$

Interpreting probability: Objectivist view

Suppose we observe $n$ replications of an experiment. Let $n(A)$ be the number of times event $A$ was observed. Then

$$\lim_{n \to \infty} \frac{n(A)}{n} = \Pr(A)$$

This is (loosely) Borel's Law of Large Numbers. (The more correct statement of this is coming up in a few slides.)

A subjective interpretation is possible as well. ("Bayesian" statistics is related to this idea; more later.)
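The relative-frequency view can be illustrated by simulation. The sketch below flips a simulated fair coin and tracks the running proportion of heads; the seed and the number of flips are arbitrary choices made here, not values from the slides.

```r
# Sketch: running relative frequency of heads in simulated fair coin flips.
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 10000    # number of replications (arbitrary)
flips <- sample(c("h", "t"), size = n, replace = TRUE)

# n(A)/n for A = {h}, after 1, 2, ..., n flips
running_freq <- cumsum(flips == "h") / seq_len(n)

# Relative frequency at a few checkpoints; it should drift toward Pr({h}) = 0.5
print(running_freq[c(10, 100, 1000, 10000)])
```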
Abstraction of data: Random Variable

We often reduce data to numbers.
- "1 means heads, 0 means tails."

A random variable is a mapping from the sample space to a number (or vector).

Usually rendered in uppercase italics. $X$ is every statistician's favourite, followed closely by $Y$ and $Z$.

"Realizations" of $X$ are written in lower case, e.g. $x_1, x_2, \ldots$

We will write the set of possible realizations as $\mathcal{X}$ for $X$, $\mathcal{Y}$ for $Y$, and so on.

Distributions of random variables

- Realizations are observed according to probabilities specified by the distribution of $X$
- Can think of $X$ as an "infinite supply of data"
- Separate realizations of the same r.v. $X$ are "independent and identically distributed" (i.i.d.)
- Formal definition of a random variable requires measure theory, not covered here

Probabilities for random variables

Random variable $X$, realization $x$.
- What is the probability we see $x$? $\Pr(X = x)$ (if lazy, $\Pr(x)$, but don't do this)
- Subsets of the domain of a random variable correspond to events. $\Pr(X > 0)$ is the probability that I see a realization that is positive.

Discrete Random Variables

Discrete random variables take values from a countable set.
- Coin flip $X$: $\mathcal{X} = \{0, 1\}$
- Number of snowflakes that fall in a day $Y$: $\mathcal{Y} = \{0, 1, 2, \ldots\}$

Probability Mass Function (PMF)

For a discrete $X$, $p_X(x)$ gives $\Pr(X = x)$.

Requirement: $\sum_{x \in \mathcal{X}} p_X(x) = 1$. Note that the sum can have an infinite number of terms.
Probability Mass Function (PMF) Example

$X$ is the number of heads in 20 flips of a fair coin, $\mathcal{X} = \{0, 1, \ldots, 20\}$

[Figure: PMF $p_X(x)$ plotted against $x$]

Cumulative Distribution Function (CDF)

For a discrete $X$, $P_X(x)$ gives $\Pr(X \le x)$.

Requirements:
- $P$ is nondecreasing
- $\sup_{x \in \mathcal{X}} P_X(x) = 1$

Note:
- $P_X(b) = \sum_{x \le b} p_X(x)$
- $\Pr(a < X \le b) = P_X(b) - P_X(a)$

Cumulative Distribution Function (CDF) Example

$X$ is the number of heads in 20 flips of a fair coin
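For the 20-flip example, the PMF and CDF relationships above can be checked numerically with R's built-in binomial functions. This is a sketch; the interval endpoints `a` and `b` are arbitrary illustrative values.

```r
# Sketch: PMF and CDF of X = number of heads in 20 flips of a fair coin.
x   <- 0:20
pmf <- dbinom(x, size = 20, prob = 0.5)   # p_X(x) = Pr(X = x)
cdf <- pbinom(x, size = 20, prob = 0.5)   # P_X(x) = Pr(X <= x)

total        <- sum(pmf)       # requirement: the PMF sums to 1
cdf_from_pmf <- cumsum(pmf)    # P_X(b) = sum of p_X(x) over x <= b

# Pr(a < X <= b) two ways; a and b are arbitrary illustrative values
a <- 7
b <- 12
p_interval        <- pbinom(b, 20, 0.5) - pbinom(a, 20, 0.5)
p_interval_direct <- sum(dbinom((a + 1):b, 20, 0.5))

print(c(total = total, p_interval = p_interval, p_interval_direct = p_interval_direct))
```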
[Figure: CDF $P_X(x)$ plotted against $x$]

Continuous random variables

Continuous random variables take values in intervals of $\mathbb{R}$.
- Mass $M$ of a star: $\mathcal{M} = (0, \infty)$
- Oxygen saturation $S$ of blood: $\mathcal{S} = [0, 1]$

For a continuous r.v. $X$, $\Pr(X = x) = 0$ for all $x$. There is no probability mass function. However, $\Pr(X \in (a, b)) \ne 0$ in general.

Probability Density Function (PDF)

For continuous $X$, $\Pr(X = x) = 0$ and the PMF does not exist. However, we define the Probability Density Function $f_X$:

$$\Pr(a \le X \le b) = \int_a^b f_X(x)\,dx$$

Requirement: $f_X(x) \ge 0$ for all $x$, and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
Probability Density Function (PDF) Example

[Figure: example density; x axis $x$, y axis "Density"]

Cumulative Distribution Function (CDF)

For a continuous $X$, $F_X(x)$ gives $\Pr(X \le x) = \Pr(X \in (-\infty, x])$.

Requirements:
- $F$ is nondecreasing
- $\sup_{x \in \mathcal{X}} F_X(x) = 1$

Note:
- $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
- $\Pr(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1)$
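The link between a density and its CDF can be checked numerically. The sketch below uses the standard normal as the example distribution, which is an assumption made here (the slides do not say which density their figure shows); the endpoints are arbitrary.

```r
# Sketch: Pr(a <= X <= b) as an integral of the density and as a CDF difference.
# The standard normal is used only as a convenient example distribution.
a <- -1
b <- 2

area    <- integrate(dnorm, lower = a, upper = b)$value   # integral of f_X over [a, b]
via_cdf <- pnorm(b) - pnorm(a)                            # F_X(b) - F_X(a)
total   <- integrate(dnorm, lower = -Inf, upper = Inf)$value  # should be close to 1

print(c(area = area, via_cdf = via_cdf, total = total))
```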
Cumulative Distribution Function (CDF) Example

[Figure: example CDF; x axis $x$, y axis "Probability"]

Expectation

The expected value of a discrete random variable $X$ is denoted

$$E[X] = \sum_{x \in \mathcal{X}} x \cdot p_X(x)$$

The expected value of a continuous random variable $Y$ is denoted

$$E[Y] = \int_{y \in \mathcal{Y}} y \cdot f_Y(y)\,dy$$

$E[X]$ is called the mean of $X$, often denoted $\mu$ or $\mu_X$.

Sample Mean

Given a dataset (collection of realizations) $x_1, x_2, \ldots, x_n$ of $X$, the sample mean is:

$$\bar{x}_n = \frac{1}{n} \sum_i x_i$$

Given a dataset, $\bar{x}_n$ is a fixed number. We use $\bar{X}_n$ to denote the random variable corresponding to the sample mean computed from a randomly drawn dataset of size $n$.
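To make the distinction between $E[X]$ and $\bar{x}_n$ concrete, the sketch below reuses the 20-flip coin example: the mean is computed from the PMF, and a sample mean is computed from one simulated dataset. The seed and the dataset size are arbitrary choices made for illustration.

```r
# Sketch: expected value from the PMF versus sample mean from one dataset.
x   <- 0:20
pmf <- dbinom(x, size = 20, prob = 0.5)

mu <- sum(x * pmf)    # E[X] = sum over x of x * p_X(x)

set.seed(2)           # arbitrary seed
n     <- 1000         # dataset size (arbitrary)
draws <- rbinom(n, size = 20, prob = 0.5)   # realizations x_1, ..., x_n
xbar  <- mean(draws)  # sample mean: a fixed number for this dataset

print(c(mu = mu, xbar = xbar))
```

Rerunning with a different seed changes `xbar` but not `mu`, which is the sense in which $\bar{X}_n$ is a random variable while $\mu_X$ is a property of the distribution.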
Datasets and sample means

[Figure: datasets of size n = 15, sample means plotted in red]

(Weak) Law of Large Numbers

Informally: If $n$ is large, then $\bar{x}_n$ is probably close to $\mu_X$.

Formally, for every $\varepsilon > 0$:

$$\lim_{n \to \infty} \Pr(|\bar{X}_n - \mu_X| > \varepsilon) = 0$$

Statistics, Parameters, and Estimation

A statistic is any summary of a dataset. (E.g. $\bar{X}_n$, sample median.) A statistic is the result of a function applied to a dataset.

A parameter is any summary of the distribution of a random variable. (E.g. $\mu_X$, median.) A parameter is the result of a function applied to a distribution.

Estimation uses a statistic (e.g. $\bar{X}_n$) to estimate a parameter (e.g. $\mu_X$) of the distribution of a random variable.
- Estimate: value obtained from a specific dataset
- Estimator: function (e.g. sum, divide by n) used to compute the estimate
- Estimand: parameter of interest
Consistency

We often use $\bar{X}_n$ to estimate $\mu_X$. The Law of Large Numbers is one bit of theory that justifies this choice.

An estimator is consistent for an estimand if it converges to the estimand in probability.

Sampling Distributions

Given an estimate, how good is it? The distribution of an estimator is called its sampling distribution.

Bias

The expected difference between estimator and parameter:

$$E[\bar{X}_n - \mu_X]$$

If 0, the estimator is unbiased. Sometimes $\bar{X}_n > \mu_X$, sometimes $\bar{X}_n < \mu_X$, but the long run average of these differences will be zero.

Variance

The expected squared difference between estimator and its mean:
$$E[(\bar{X}_n - E[\bar{X}_n])^2]$$

- Positive for all interesting estimators.
- For an unbiased estimator, this equals $E[(\bar{X}_n - \mu_X)^2]$.
- Sometimes $\bar{X}_n > \mu_X$, sometimes $\bar{X}_n < \mu_X$, but the squared differences are all non-negative and do not cancel out.

Central Limit Theorem

Informally: The sampling distribution of $\bar{X}_n$ is approximately normal if $n$ is big enough.

More formally, for $X$ with finite variance:

$$F_{\bar{X}_n}(\bar{x}) \approx \int_{-\infty}^{\bar{x}} \frac{1}{\sigma_n \sqrt{2\pi}}\, e^{-\frac{(t - \mu_X)^2}{2\sigma_n^2}}\,dt$$

where $\sigma_n^2 = \sigma^2 / n$, $\sigma_n$ is called the standard error, and $\sigma^2$ is the variance of $X$.

NOTE: More data means lower standard error.

Normal (Gaussian) Distribution

$$f_X(x) = \frac{1}{\sigma_X \sqrt{2\pi}}\, e^{-\frac{(x - \mu_X)^2}{2\sigma_X^2}}$$

The normal distribution is special (among other reasons) because many estimators have approximately normal sampling distributions or have sampling distributions that are closely related to the normal.

Reminder: $\sigma_X^2 = E[(X - \mu_X)^2]$.

If $X$ is normal and we let

$$Z = \frac{X - \mu_X}{\sigma_X}$$

we have

$$f_Z(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$

Who cares?

The eruptions dataset has $n = 272$ observations. Our estimate of the mean of eruption times is $\bar{x}_{272} = \ldots$

What is the probability of observing an $\bar{x}_{272}$ that is within 10 seconds of the true mean?
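Who cares?

Let $\sigma_{\bar{X}_{272}} = \sigma_X / \sqrt{272}$, and let $Z = \dfrac{\bar{X}_{272} - \mu_X}{\sigma_{\bar{X}_{272}}}$ be a new r.v. Ten seconds is about 0.17 minutes, so by the C.L.T.,

$$\Pr(-0.17 \le \bar{X}_{272} - \mu_X \le 0.17) = \Pr\left(\frac{-0.17}{\sigma_{\bar{X}_{272}}} \le Z \le \frac{0.17}{\sigma_{\bar{X}_{272}}}\right) \approx \int_{z = -0.17/\sigma_{\bar{X}_{272}}}^{0.17/\sigma_{\bar{X}_{272}}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz = \ldots$$

Note! I estimated $\sigma_X$ here. (Look up "t-test.")

[Figure: standard normal density; x axis $z$, y axis "Density"]

Confidence Intervals

Typically, we specify confidence given by $1 - \alpha$.

Use the sampling distribution to get an interval that traps the parameter (estimand) with probability $1 - \alpha$.

95% C.I. for eruption mean is (3.35, 3.62)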
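The calculation above can be sketched in R. As in the slides, $\sigma_X$ is replaced by the sample standard deviation, and the normal approximation from the C.L.T. is used for both the probability and the interval; the variable names are chosen here for illustration.

```r
# Sketch: normal-approximation probability and 95% C.I. for the eruption mean.
data("faithful")
x    <- faithful$eruptions
n    <- length(x)
xbar <- mean(x)
se   <- sd(x) / sqrt(n)   # estimated standard error (sigma_X is estimated by sd(x))

# Pr(-0.17 <= Xbar - mu <= 0.17), with 0.17 minutes standing in for 10 seconds
delta    <- 0.17
p_within <- pnorm(delta / se) - pnorm(-delta / se)

# 95% confidence interval from the approximate normal sampling distribution
alpha <- 0.05
ci    <- xbar + c(-1, 1) * qnorm(1 - alpha / 2) * se

print(p_within)
print(round(ci, 2))
```

Up to rounding, `ci` should agree with the interval quoted above. Replacing `qnorm` with `qt(1 - alpha / 2, df = n - 1)` would account for having estimated $\sigma_X$, which is the point of the t-test remark.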
95% Confidence Region

[Figure: 95% confidence region; x axis $z$, y axis "Density"]
What a Confidence Interval Means

[Figure]
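One way to see what "traps the parameter with probability $1 - \alpha$" means is to simulate many datasets from a distribution whose mean is known and count how often the interval contains it. The population below (normal with mean 3.5 and standard deviation 1), the dataset size, and the number of repetitions are all hypothetical values chosen for this sketch.

```r
# Sketch: empirical coverage of the normal-approximation 95% interval.
set.seed(3)     # arbitrary seed
mu    <- 3.5    # hypothetical true mean
sigma <- 1      # hypothetical true standard deviation
n     <- 50     # dataset size (arbitrary)
reps  <- 2000   # number of simulated datasets (arbitrary)

covered <- replicate(reps, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)
  ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se
  ci[1] <= mu && mu <= ci[2]   # TRUE if this interval traps the true mean
})

coverage <- mean(covered)   # fraction of intervals that contain mu
print(coverage)
```

The interval changes from dataset to dataset while `mu` stays fixed; the 95% refers to the long-run fraction of intervals that contain it.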
Effect of n on width

[Figure]

The Bootstrap

The CLT gives a theoretical approximate sampling distribution of $\bar{X}_n$.

We could also estimate the sampling distribution of $\bar{X}_n$ by drawing many datasets of size $n$, computing $\bar{X}_n$ on each, and constructing a histogram.

This is impossible. But we can use the data we have as a surrogate.

The Bootstrap

- Call our dataset $D$.
- Draw $B$ new datasets by sampling observations with replacement from $D$. ($B$ is often at least 1000.)
- Compute $\bar{X}_n^{(b)}$ for each of the datasets.
- Use the histogram/empirical distribution of these "pretend" $\bar{X}_n$ to determine confidence limits.

Bootstrap example

    library(boot)
    # statistic receives the data d and resampled indices i; R is the number of replicates
    bootstraps <- boot(faithful$eruptions, function(d, i) { mean(d[i]) }, R=5000)
    bootdata = data.frame(xbars=bootstraps$t)
    # 2.5% and 97.5% quantiles of the bootstrap means
    limits = quantile(bootdata$xbars, c(0.025, 0.975))
    ggplot(bootdata, aes(x=xbars)) + labs(y="Prop.") +
      geom_histogram(aes(y =..density..)) +
      geom_errorbarh(aes(xmin=limits[[1]], xmax=limits[[2]], y=c(0)),
                     height=0.25, colour="red", size=2)

[Figure: histogram of bootstrap means with the confidence limits in red; x axis "xbars", y axis "Prop."]
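The bootstrap steps listed above can also be written out without the `boot` package, which makes the resampling explicit. This is a base-R sketch of the same percentile idea, not a reproduction of what `boot()` does internally; the seed is arbitrary.

```r
# Sketch: percentile bootstrap for the mean eruption time, in base R.
data("faithful")
x <- faithful$eruptions   # our dataset D
n <- length(x)

set.seed(4)   # arbitrary seed
B <- 5000     # number of bootstrap datasets

# Draw B datasets of size n with replacement from D and compute the mean of each
xbars <- replicate(B, mean(sample(x, size = n, replace = TRUE)))

# Confidence limits from the empirical distribution of the bootstrap means
limits <- quantile(xbars, c(0.025, 0.975))
print(limits)
```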
Reality Check

[Figure: x axis "eruptions", y axis "Prop."]

How much data do I need?

Performance measurement: Preview

"My classifier is correct 20 times out of 30 on this test set!"

Let $X$ be an r.v. representing correctness as $\{0, 1\}$. What does $\mu_X$ mean?

Have 30 observations of $X$.

    binom.test(20, 30)
    ##
    ##  Exact binomial test
    ##
    ## data:  20 and 30
    ## number of successes = 20, number of trials = 30, p-value =
    ## alternative hypothesis: true probability of success is not equal to 0.5
    ## 95 percent confidence interval:
    ##
    ## sample estimates:
    ## probability of success
    ##

Test set sample size calculation

Suppose true accuracy is … . Can I tell the difference from 0.5 with a sample size of 30?

How much data would I need to distinguish my classifier from 0.5 with probability $(1 - \beta) = 0.8$ at a significance level of $\alpha = 0.05$?

    library(pwr)
    pwr.p.test(h = ES.h(p1 = 0.5, p2 = 0.66), n = NULL, power = 0.8, sig.level = 0.05)
    ##
    ##      proportion power calculation for binomial distribution (arcsine transformation)
    ##
    ##               h =
    ##               n =
    ##       sig.level = 0.05
    ##           power = 0.8
    ##     alternative = two.sided

R summary commands

- str() shows the structure of a vector, matrix, table, data frame, etc.
- summary() shows basic summary statistics
- foo$bar extracts column bar from data frame foo
- length(), nrow(), ncol(): size information for vector, data frame, etc.
- min(), max(), median(), IQR(), quantile(data, prob) do what you expect. IQR is the Inter-Quartile Range: 3rd Quartile minus 1st Quartile.
- mean(), var(), sd(). Note: variance and standard deviation use $n - 1$ in the denominator.

Utility commands

- rep(e, n) creates a vector by repeating element e, n times
- which(b) returns a vector of the indices for which boolean expression b is true

R commands

- hist() computes and plots a histogram (probability=TRUE shows proportions instead of frequency)
- ecdf() computes an empirical cumulative distribution function
- boxplot() draws a box plot. Whiskers extend at most 1.5 × IQR from the nearest quartile.
- density() constructs a kernel density estimate using given data
- plot() creates a new scatterplot of given x, y coordinates. Can also be used to plot many other R objects. Try it!
- points() adds additional points to an existing plot

Common function arguments

- main: plot title
- xlab: x label for plot
- ylab: y label for plot
- pch: plotting character, what shape to use for points
- cex: character expansion, a multiplicative factor to enlarge/shrink points

ggplot2

- Very stylish
- Learning curve steep but worth it
- Examples in this document
- Lots of resources on the web
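Returning to the test set sample size question (how many observations are needed to distinguish an accuracy of 0.66 from 0.5 with power 0.8 at $\alpha = 0.05$): the sketch below uses the textbook closed-form normal approximation on the arcsine scale. It is offered as a rough cross-check under that assumption, not as the exact computation `pwr.p.test` performs, and the variable names are chosen here for illustration.

```r
# Sketch: approximate sample size for a two-sided one-proportion test,
# using the arcsine effect size and a normal approximation.
p1    <- 0.5    # null accuracy
p2    <- 0.66   # assumed true accuracy (value used in the pwr.p.test call)
alpha <- 0.05
power <- 0.8

# Effect size on the arcsine scale; only its magnitude matters here
h <- abs(2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)))

# Closed-form approximation: n = ((z_{1 - alpha/2} + z_{power}) / h)^2
n_approx <- ((qnorm(1 - alpha / 2) + qnorm(power)) / h)^2
n_needed <- ceiling(n_approx)

# Approximate power with only 30 test examples (ignoring the negligible far tail)
power_30 <- pnorm(h * sqrt(30) - qnorm(1 - alpha / 2))

print(c(h = h, n_needed = n_needed, power_30 = power_30))
```

If `power_30` comes out well below 0.8, a test set of 30 is too small to reliably separate such a classifier from chance.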
More informationContinuous random variables
Continuous random variables Continuous r.v. s take an uncountably infinite number of possible values. Examples: Heights of people Weights of apples Diameters of bolts Life lengths of light-bulbs We cannot
More information6 Single Sample Methods for a Location Parameter
6 Single Sample Methods for a Location Parameter If there are serious departures from parametric test assumptions (e.g., normality or symmetry), nonparametric tests on a measure of central tendency (usually
More informationMAT 271E Probability and Statistics
MAT 71E Probability and Statistics Spring 013 Instructor : Class Meets : Office Hours : Textbook : Supp. Text : İlker Bayram EEB 1103 ibayram@itu.edu.tr 13.30 1.30, Wednesday EEB 5303 10.00 1.00, Wednesday
More informationStatistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018
Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement
More informationName: Firas Rassoul-Agha
Midterm 1 - Math 5010 - Spring 016 Name: Firas Rassoul-Agha Solve the following 4 problems. You have to clearly explain your solution. The answer carries no points. Only the work does. CALCULATORS ARE
More informationFourier and Stats / Astro Stats and Measurement : Stats Notes
Fourier and Stats / Astro Stats and Measurement : Stats Notes Andy Lawrence, University of Edinburgh Autumn 2013 1 Probabilities, distributions, and errors Laplace once said Probability theory is nothing
More informationProbability Distributions & Sampling Distributions
GOV 2000 Section 4: Probability Distributions & Sampling Distributions Konstantin Kashin 1 Harvard University September 26, 2012 1 These notes and accompanying code draw on the notes from Molly Roberts,
More informationPolitical Science Math Camp: Problem Set 2
Political Science Math Camp: Problem Set 2 Due Thursday Aug at 9:00 am. Suppose the probability of observation O given hypothesis H is P (O H ) = 3, the probability of observation O 2 given H is P (O 2
More informationHow Monte Carlo Sampling Contributes to Data Analysis. Outline
http://www.math.umd.edu/~evs/mmistat09.pdf How Monte Carlo Sampling Contributes to Data Analysis Eric Slud, Mathematics Department, UMCP Objective: to explain an experimental approach to Probability &
More informationOutline. Unit 3: Inferential Statistics for Continuous Data. Outline. Inferential statistics for continuous data. Inferential statistics Preliminaries
Unit 3: Inferential Statistics for Continuous Data Statistics for Linguists with R A SIGIL Course Designed by Marco Baroni 1 and Stefan Evert 1 Center for Mind/Brain Sciences (CIMeC) University of Trento,
More informationEpidemiology Principle of Biostatistics Chapter 11 - Inference about probability in a single population. John Koval
Epidemiology 9509 Principle of Biostatistics Chapter 11 - Inference about probability in a single population John Koval Department of Epidemiology and Biostatistics University of Western Ontario What is
More information1.1 Review of Probability Theory
1.1 Review of Probability Theory Angela Peace Biomathemtics II MATH 5355 Spring 2017 Lecture notes follow: Allen, Linda JS. An introduction to stochastic processes with applications to biology. CRC Press,
More informationSTAT Chapter 5 Continuous Distributions
STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range
More information1.3: Describing Quantitative Data with Numbers
1.3: Describing Quantitative Data with Numbers Section 1.3 Describing Quantitative Data with Numbers After this section, you should be able to MEASURE center with the mean and median MEASURE spread with
More informationDiscrete Random Variables
CPSC 53 Systems Modeling and Simulation Discrete Random Variables Dr. Anirban Mahanti Department of Computer Science University of Calgary mahanti@cpsc.ucalgary.ca Random Variables A random variable is
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationMath 151. Rumbos Fall Solutions to Review Problems for Exam 2. Pr(X = 1) = ) = Pr(X = 2) = Pr(X = 3) = p X. (k) =
Math 5. Rumbos Fall 07 Solutions to Review Problems for Exam. A bowl contains 5 chips of the same size and shape. Two chips are red and the other three are blue. Draw three chips from the bowl at random,
More information(Ch 3.4.1, 3.4.2, 4.1, 4.2, 4.3)
3 Probability Distributions (Ch 3.4.1, 3.4.2, 4.1, 4.2, 4.3) Probability Distribution Functions Probability distribution function (pdf): Function for mapping random variables to real numbers. Discrete
More informationLecture Notes 7 Random Processes. Markov Processes Markov Chains. Random Processes
Lecture Notes 7 Random Processes Definition IID Processes Bernoulli Process Binomial Counting Process Interarrival Time Process Markov Processes Markov Chains Classification of States Steady State Probabilities
More informationIntroduction to Statistical Inference
Introduction to Statistical Inference Dr. Fatima Sanchez-Cabo f.sanchezcabo@tugraz.at http://www.genome.tugraz.at Institute for Genomics and Bioinformatics, Graz University of Technology, Austria Introduction
More informationSome Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2
STA 248 H1S MIDTERM TEST February 26, 2008 SURNAME: SOLUTIONS GIVEN NAME: STUDENT NUMBER: INSTRUCTIONS: Time: 1 hour and 50 minutes Aids allowed: calculator Tables of the standard normal, t and chi-square
More information20 Hypothesis Testing, Part I
20 Hypothesis Testing, Part I Bob has told Alice that the average hourly rate for a lawyer in Virginia is $200 with a standard deviation of $50, but Alice wants to test this claim. If Bob is right, she
More informationProbability Theory. Introduction to Probability Theory. Principles of Counting Examples. Principles of Counting. Probability spaces.
Probability Theory To start out the course, we need to know something about statistics and probability Introduction to Probability Theory L645 Advanced NLP Autumn 2009 This is only an introduction; for
More information2 Chapter 2: Conditional Probability
STAT 421 Lecture Notes 18 2 Chapter 2: Conditional Probability Consider a sample space S and two events A and B. For example, suppose that the equally likely sample space is S = {0, 1, 2,..., 99} and A
More informationCIVL 7012/8012. Collection and Analysis of Information
CIVL 7012/8012 Collection and Analysis of Information Uncertainty in Engineering Statistics deals with the collection and analysis of data to solve real-world problems. Uncertainty is inherent in all real
More informationPROBABILITY THEORY REVIEW
PROBABILITY THEORY REVIEW CMPUT 466/551 Martha White Fall, 2017 REMINDERS Assignment 1 is due on September 28 Thought questions 1 are due on September 21 Chapters 1-4, about 40 pages If you are printing,
More informationSTATISTICS 1 REVISION NOTES
STATISTICS 1 REVISION NOTES Statistical Model Representing and summarising Sample Data Key words: Quantitative Data This is data in NUMERICAL FORM such as shoe size, height etc. Qualitative Data This is
More informationExam 2 Practice Questions, 18.05, Spring 2014
Exam 2 Practice Questions, 18.05, Spring 2014 Note: This is a set of practice problems for exam 2. The actual exam will be much shorter. Within each section we ve arranged the problems roughly in order
More informationSampling Distribution: Week 6
Sampling Distribution: Week 6 Kwonsang Lee University of Pennsylvania kwonlee@wharton.upenn.edu February 27, 2015 Kwonsang Lee STAT111 February 27, 2015 1 / 16 Sampling Distribution: Sample Mean If X 1,
More information(Ch 3.4.1, 3.4.2, 4.1, 4.2, 4.3)
3 Probability Distributions (Ch 3.4.1, 3.4.2, 4.1, 4.2, 4.3) Probability Distribution Functions Probability distribution function (pdf): Function for mapping random variables to real numbers. Discrete
More informationApplied Regression Analysis
Applied Regression Analysis Lecture 2 January 27, 2005 Lecture #2-1/27/2005 Slide 1 of 46 Today s Lecture Simple linear regression. Partitioning the sum of squares. Tests of significance.. Regression diagnostics
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationSTAT 4385 Topic 01: Introduction & Review
STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics
More information