Statistics. Nicodème Paul Faculté de médecine, Université de Strasbourg
2 Course logistics
Statistics & Experimental planning
- Course website
- Lecture slides and lecture notes
- Lectures, quizzes and practical exercises
- R statistical software
- Exam
3 Statistics - Definition
A statistic is a quantity or numerical value calculated from a set of data.
- Average height of people living in Strasbourg
Statistics refer to global characteristics of a population:
- Number of people who smoke
- Number of people owning a car
- Relation between smoking and owning a car
Statistics is the scientific discipline that provides methods to make sense of data.
- Descriptive statistics: collecting, summarizing and presenting data
- Inferential statistics: making inferences, hypothesis testing, determining relationships and making predictions
4 Statistics applications
- Biology: comparison between two populations of mice (knockout versus wild type)
- Medicine: performing clinical trials and data analysis
- Pharmacy: knowing whether a new drug is better than the current one
- Finance: pricing and portfolio management, risk modelling
- Agriculture: plant breeding, the study of the influence of particular factors on agricultural production, measuring the contribution of production factors, fertilizers and technical progress
5 Terminology
A population is the collection of individuals or objects about which information is desired.
A sample is a subset of the population selected for study.
A random sample of size n is a sample selected in such a way that every different possible sample of the desired size has the same chance of being selected.
A variable is any characteristic whose value may change from one individual or object to another.
A variable can be categorical:
- Nominal (color: red, black, green, white)
- Ordinal (size: small, medium, big)
A variable can be numerical:
- Discrete (number of e-mails received per day)
- Continuous (height, weight)
6 Data
Variables as columns and individuals as rows.
Columns: ID, AGE, SEX, CHESTPAIN, RESTBP, CHOL, MAXHR, HD
CHESTPAIN and HD values of the first 10 of 303 entries:
CHESTPAIN       HD
Typical         No
Asymptomatic    Yes
Asymptomatic    Yes
Nonanginal      No
Nontypical      No
Nontypical      No
Asymptomatic    Yes
Asymptomatic    No
Asymptomatic    Yes
Asymptomatic    Yes
7 Graphical representation - barplot
[Figure: barplot with categories Asymptomatic, Nonanginal, Nontypical, Typical]
8 Graphical representation - histogram of Age
[Figure: histogram of Age; y-axis FREQ, x-axis BIN with intervals (20,30], (30,40], (40,50], (50,60], (60,70], (70,80]]
9 Terminology
The distribution of a variable over a sample is given by:
$$(c_1, n_1), (c_2, n_2), \ldots, (c_k, n_k)$$
- $n_i$ represents the frequency associated with $c_i$
- For a categorical variable, $c_i$ is any value of the variable
- For a numerical variable, $c_i$ is an interval $]a_i, b_i]$
10 Categorical data distribution - Terminology
The frequency for a particular category is the number of times the category appears in the data set.
The relative frequency is the proportion of the observations that belong to that category. It is calculated as:
$$p = \frac{c}{N}$$
where $c$ is the frequency and $N$ is the number of observations in the data.
The distribution of a categorical variable is a table that displays the possible categories along with the associated frequencies or relative frequencies.
A bar chart or barplot is a graphical representation of the distribution of a categorical variable.
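The frequency and relative frequency definitions can be sketched in a few lines of Python. This is only an illustration: the input is the ten CHESTPAIN values visible on the data slide, not the full 303-row data set, and the function name `relative_frequencies` is a hypothetical helper.

```python
from collections import Counter

# CHESTPAIN values of the ten rows shown on the data slide (not the full data set)
chest_pain = [
    "Typical", "Asymptomatic", "Asymptomatic", "Nonanginal", "Nontypical",
    "Nontypical", "Asymptomatic", "Asymptomatic", "Asymptomatic", "Asymptomatic",
]


def relative_frequencies(values):
    """Return {category: (frequency c, relative frequency c / N)}."""
    n_obs = len(values)            # N, the number of observations
    counts = Counter(values)       # c for each category
    return {cat: (c, c / n_obs) for cat, c in counts.items()}


table = relative_frequencies(chest_pain)
for cat, (c, p) in sorted(table.items()):
    print(f"{cat:<13} c = {c}  p = {p:.1f}")
```

The relative frequencies always sum to 1, which is a quick sanity check on any distribution table.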
11 Histogram
The histogram is a method of displaying data. It displays the shape of the distribution of data values.
The range of the data is divided into intervals $c_i = ]a_i, b_i]$, or bins, and the number or proportion of the observations falling in each bin is plotted.
A histogram is said to be unimodal if it has a single peak, bimodal if it has two peaks and multimodal if it has more than two peaks.
A histogram is symmetric if there is a vertical line of symmetry such that the part of the histogram to the left of the line is a mirror image of the part to the right.
A unimodal histogram that is not symmetric is said to be skewed.
- If the upper tail of the histogram stretches out much farther than the lower tail, then the distribution of values is positively skewed or right skewed.
- If the lower tail is much longer than the upper tail, the histogram is negatively skewed or left skewed.
12 Example
13 Example
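14 Barplot: group comparison
[Figure: grouped and stacked barplots comparing the groups No and Yes across the categories Asymptomatic, Nonanginal, Nontypical, Typical]
15 Histogram: group comparison
[Figure: grouped and stacked histograms of CHOL comparing the groups No and Yes]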
16 Measures of location
The sample mean of a sample consisting of numerical observations $x_1, x_2, \ldots, x_n$, denoted by $\bar{x}$, is:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The population mean, denoted by $\mu$, is the average of all $x$ values in the entire population.
The sample median, or $Q_2$, is obtained by first ordering the $n$ observations from smallest to largest as $x_{(1)} \le x_{(2)} \le \ldots \le x_{(n)}$. Then:
$$\text{sample median} = \begin{cases} x_{((n+1)/2)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right), & \text{if } n \text{ is even} \end{cases}$$
17 Measures of location
Data = 75, 69, 88, 93, 95, 54, 87, 88, 27
Ordered data = 27, 54, 69, 75, 87, 88, 88, 93, 95
Sample median = 87
If Data = 100, 75, 69, 88, 93, 95, 54, 87, 88, what is the median?
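The mean and median definitions can be checked on the nine-value example with a short Python sketch. The helper name `sample_median` is hypothetical; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

data = [75, 69, 88, 93, 95, 54, 87, 88, 27]


def sample_median(values):
    """Median from the ordered observations x_(1) <= ... <= x_(n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the middle observation x_((n+1)/2) (index mid when 0-based)
        return ordered[mid]
    # n even: average of x_(n/2) and x_(n/2 + 1)
    return (ordered[mid - 1] + ordered[mid]) / 2


mean = sum(data) / len(data)
median = sample_median(data)
print(f"mean = {mean:.2f}, median = {median}")
```

On this sample the mean (about 75.1) sits well below the median (87) because the single low value 27 pulls the mean down.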
18 Measures of location
For any particular number r between 0 and 100, the rth percentile is a value such that r percent of the observations in the data set fall at or below that value.
The lower quartile, or 25th percentile, or $Q_1$, is the median of the lower half of the sample.
The upper quartile, or 75th percentile, or $Q_3$, is the median of the upper half of the sample.
The mode is the most frequently observed value.
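A minimal sketch of the quartiles as medians of the two halves, applied to the nine-value sample from the median example. One convention is assumed here and is not stated on the slide: when n is odd, the middle observation is left out of both halves. Other software (including R's default `quantile`) uses different conventions and may return slightly different values.

```python
from collections import Counter


def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3), with Q1 and Q3 the medians of the lower and upper halves.

    Assumption: for odd n the middle observation belongs to neither half.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


def mode(values):
    """Most frequently observed value (first one found in case of ties)."""
    return Counter(values).most_common(1)[0][0]


data = [75, 69, 88, 93, 95, 54, 87, 88, 27]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, mode(data))  # 61.5 87 90.5 88
```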
19 Example: Histogram
[Figure: histogram; y-axis FREQ, x-axis BIN with intervals (0,5], (5,10], (10,15], (15,20], (20,25], (25,30], (30,35], (35,40], (40,45]]
What is the value of $Q_1$, given that the sample size is 38?
$$Q_1 = 5 + \ldots$$
20 Empirical cumulative distribution
The empirical cumulative distribution associated with a sample is the function $F_n$ defined, for any real number $t$, by the expression:
$$F_n(t) = \frac{\operatorname{card}\{x_i : x_i \le t\}}{n}$$
Its graph is called the cumulative frequency plot.
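The definition of $F_n$ translates directly into code: count the observations at or below $t$ and divide by $n$. The sketch below is illustrative; the function name `ecdf` and the sample (borrowed from the median example) are choices made here, not part of the slide.

```python
import bisect


def ecdf(sample):
    """Return the empirical cumulative distribution function F_n of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(t):
        # bisect_right counts how many observations satisfy x_i <= t
        return bisect.bisect_right(ordered, t) / n

    return F_n


F = ecdf([75, 69, 88, 93, 95, 54, 87, 88, 27])
for t in (20, 27, 88, 100):
    print(f"F_n({t}) = {F(t):.3f}")
```

$F_n$ is a step function: it is 0 below the smallest observation, jumps at each observed value, and equals 1 from the largest observation onwards.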
21 Empirical cumulative distribution
[Figure: empirical cumulative distribution plot; axes FREQ and FAILURE]
22 Robust statistics
23 Check yourself
The statistic $\frac{\text{mean}}{\text{median}}$ can be used as a measure of skewness (either right or left). If this statistic is less than 1, the distribution is most likely left skewed.
- True
- False
24 Measures of dispersion
The sample variance, denoted by $s^2$, is the sum of squared deviations from the sample mean divided by $n - 1$. That is,
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
The sample standard deviation is the positive square root of the sample variance and is denoted by $s$.
The population variance, denoted by $\sigma^2$, is the sum of squared deviations from the population mean divided by $n$. That is,
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$
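The only difference between the two formulas is the reference mean and the divisor ($n - 1$ versus $n$). A small sketch, using the sample from the median example and cross-checking against Python's `statistics` module; the function names are hypothetical, and treating the sample as a whole population in the second call is purely for illustration.

```python
import math
import statistics

data = [75, 69, 88, 93, 95, 54, 87, 88, 27]


def sample_variance(values):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def population_variance(values, mu):
    """sigma^2: squared deviations from the population mean mu, divided by n."""
    return sum((x - mu) ** 2 for x in values) / len(values)


s2 = sample_variance(data)
s = math.sqrt(s2)
# Illustration only: pretend the nine values are the entire population
sigma2 = population_variance(data, mu=sum(data) / len(data))
print(f"s^2 = {s2:.2f}, s = {s:.2f}, sigma^2 = {sigma2:.2f}")
```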
25 Check yourself
Which of the data sets below has the lowest standard deviation? You do not need to calculate the exact standard deviations to answer this question.
- 0, 1, 2, 3, 4, 5, 6
- 0, 1, 3, 3, 3, 5, 6
- 100, 100, 100, 100, 100, 100, 101
- 0, 25, 50, 100, 125, 150, 1000
26 Measures of dispersion
The standard deviation of the population is the positive square root of the population variance and is denoted by $\sigma$.
The interquartile range (IQR) is a measure of variability defined as:
$$IQR = \text{upper quartile} - \text{lower quartile}$$
An observation is an outlier if it is more than 1.5(IQR) away from the nearest quartile. An outlier is extreme if it is more than 3(IQR) from the nearest quartile, and it is mild otherwise.
The coefficient of variation (CV) is a normalized measure of variability defined as:
$$CV = 100 \times \frac{s}{\bar{x}}$$
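The IQR, the outlier rule and the CV can be sketched as follows. The data are the last option of the standard-deviation quiz; the quartiles are computed as medians of the two halves, leaving out the middle observation when n is odd (an assumed convention). The helper names are hypothetical.

```python
import statistics


def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def lower_upper_quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (middle value excluded if n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(ordered[:half]), median(upper)


def classify(x, q1, q3):
    """Label x as 'extreme', 'mild' or 'none' using the 1.5 IQR and 3 IQR rules."""
    iqr = q3 - q1
    distance = max(q1 - x, x - q3)  # how far x lies beyond the nearest quartile
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "none"


data = [0, 25, 50, 100, 125, 150, 1000]
q1, q3 = lower_upper_quartiles(data)
iqr = q3 - q1
cv = 100 * statistics.stdev(data) / statistics.mean(data)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, CV = {cv:.1f}")
print({x: classify(x, q1, q3) for x in data})
```

Here the value 1000 lies 850 above $Q_3 = 150$, which exceeds 3 × IQR = 375, so it is flagged as an extreme outlier.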
27 Boxplot: description
28 Example: water quality comparison
29 Check yourself
Which of the following statements is supported by the plot?
- The mean of the distribution is smaller than its median
- It is not possible to estimate the median without knowing the sample size
- The distribution is multimodal
- The IQR of the distribution is roughly 10
30 Check yourself
Which of the following statements is not supported by the plot?
- Both distributions are unimodal
- B is more variable than A
- Median of A is higher than median of B
- Both distributions are roughly symmetric
31 Probability - motivation
Suppose we have a drug that we know, from long experience, cures a patient with some specific illness in 70% of cases. A new drug is proposed as having a higher cure rate than the present one. To assess this claim, the new drug is given to 1000 people suffering from the illness; among these, 741 are cured. Do we have significant evidence that this new drug is better than the current one?
- $H_0$: the new drug is as effective as the current one
- $H_1$: the new drug is better than the current one
Probability calculation - If the new drug is as effective as the current one, how likely is it that, by chance, 741 or more people given the new drug will be cured?
Statistical inference - Based on the above probability calculation, the data may provide convincing evidence that the new drug is better than the current one.
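The probability calculation can be sketched with an exact binomial tail. This assumes that, under $H_0$, the number of cured patients among 1000 independent patients follows a binomial distribution with cure probability 0.7; the slide does not state which method the course uses, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, p, k_min):
    """P(X >= k_min) for X following a binomial distribution with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Under H0 the new drug cures with probability 0.7, like the current one
p_value = binomial_upper_tail(n=1000, p=0.7, k_min=741)
print(f"P(X >= 741 | H0) = {p_value:.4f}")
```

A small tail probability means that 741 or more cures would be unusual if the new drug were no better than the current one, which is the kind of evidence the inference step relies on.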
32 Check yourself
Suppose that the probability of observing 741 or more cured patients, under the assumption that the new medicine is no better than the old, is …
Do the data provide convincing evidence that the new drug is better than the current one?
- Yes
- No
33 Probability - terminology
A random experiment is any activity or situation in which there is uncertainty about which of two or more possible outcomes will result.
A Bernoulli trial is a random experiment with exactly two possible outcomes: success or failure.
- Tossing a coin, with Head (H) and Tail (T) as possible outcomes
- A patient can be cured by the new medicine or not
The collection of all possible outcomes of a random experiment is the sample space $\Omega$ for the experiment. An outcome from the sample space is denoted by $\omega$.
Examples of sample spaces: $\Omega = \{H, T\}$, $\Omega = \{HH, HT, TH, TT\}$
An event $E$ is any collection of outcomes from the sample space of a random experiment. A simple event is an event consisting of exactly one outcome.
Tossing a coin twice and obtaining at least one head: $E = \{HH, HT, TH\}$
34 Random variables - Definition
A random variable $X$ is a real-valued function defined on a sample space. In other words, a random variable associates a numerical value to each outcome of a random experiment.
A random variable $X$ is discrete if its set of possible values is discrete. Otherwise, it is continuous.
- Tossing a coin: $X \in \{0, 1\}$
- Drug trial: number of patients cured by the new medicine in a sample of 1000 patients, $X \in \{0, 1, 2, \ldots, 1000\}$
35 Discrete probability - distribution
The probability distribution of a discrete random variable $X$ taking values in $\{x_1, x_2, \ldots, x_n\}$ can be represented by a table:
Probability distribution of X
X        P
$x_1$    $p_1$
$x_2$    $p_2$
...      ...
$x_n$    $p_n$
with $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$.
Probability distribution example: drug trial with Cured = 1 and Not cured = 0 (table of X against P).
36 Couple of discrete random variables
Given two discrete random variables $X$ and $Y$, we can define a new random variable $(X, Y)$ whose joint distribution is defined by:
$$p_{ij} = P(X = x_i, Y = y_j), \quad \text{with } 0 \le p_{ij} \le 1 \text{ and } \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} = 1$$
The distribution can be represented as a table:
X \ Y    $y_1$       $y_2$       ...   $y_m$
$x_1$    $p_{11}$    $p_{12}$    ...   $p_{1m}$
$x_2$    $p_{21}$    $p_{22}$    ...   $p_{2m}$
...      ...         ...         ...   ...
$x_n$    $p_{n1}$    $p_{n2}$    ...   $p_{nm}$
The marginal distribution of $X$: $P(X = x_i) = p_{i.} = \sum_{j=1}^{m} p_{ij}$
The marginal distribution of $Y$: $P(Y = y_j) = p_{.j} = \sum_{i=1}^{n} p_{ij}$
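Marginal distributions are just row sums and column sums of the joint table. The sketch below uses a hypothetical 2 x 3 joint distribution invented for illustration (it is not data from the course), with rows indexed by the values of $X$ and columns by the values of $Y$.

```python
# Hypothetical joint distribution p_ij = P(X = x_i, Y = y_j):
# one row per value of X, one column per value of Y.
joint = [
    [0.10, 0.20, 0.10],
    [0.25, 0.15, 0.20],
]


def marginal_x(p):
    """P(X = x_i) = p_i. : sum over j of p_ij (row sums)."""
    return [sum(row) for row in p]


def marginal_y(p):
    """P(Y = y_j) = p_.j : sum over i of p_ij (column sums)."""
    return [sum(column) for column in zip(*p)]


p_x = marginal_x(joint)
p_y = marginal_y(joint)
print("marginal of X:", [round(v, 2) for v in p_x])
print("marginal of Y:", [round(v, 2) for v in p_y])
```

Both marginals must sum to 1, because each is obtained by regrouping the same joint probabilities.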
37 Example - Diagnosing Tuberculosis (TB)
Before 1998, culturing was the existing gold standard for diagnosing TB. This method took 10 to 15 days to yield a positive or negative result. In 1998, investigators evaluated a DNA technique that turned out to be much faster ("LCx: A Diagnostic Alternative for the Early Detection of Mycobacterium tuberculosis Complex," Diagnostic Microbiology and Infectious Diseases [1998]).
- $T$ models the outcome of the gold standard method: 1 indicates TB, 0 not TB
- $N$ models the outcome of the DNA test: 1 indicates a positive test, 0 a negative test
The data are summarized in a two-way table of $T$ by $N$.
38 Example - Joint distribution calculation
From the table of $T$ by $N$:
$P(N = 0, T = 0) = \ldots$, $P(N = 0, T = 1) = \ldots$
$P(N = 1, T = 0) = \ldots$, $P(N = 1, T = 1) = \ldots$
39 Check yourself
Calculate $P(T = 1)$. Choose the right answer:
- Not defined
40 Check yourself
Calculate $P(N = 0)$. Choose the right answer:
- Not defined
41 Parameters of a random variable X
The expectation of a discrete random variable $X$ taking the values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ is the number:
$$\mu = E[X] = \sum_{i=1}^{n} x_i p_i$$
The variance of $X$, if it exists, is the number:
$$\sigma^2 = \operatorname{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 = \sum_{i=1}^{n} p_i (x_i - \mu)^2$$
$\sigma$ is called the standard deviation of $X$.
If $X$ and $Y$ are two random variables with expected values $E[X]$ and $E[Y]$, and $a$ and $b$ are two real numbers, then:
$$E[X + Y] = E[X] + E[Y], \qquad E[aX + b] = aE[X] + b$$
42 Discrete distribution - Bernoulli
A random variable $X$ follows a Bernoulli distribution with parameter $p$, noted $\mathcal{L}(X) = \mathcal{B}(p)$, if it takes only two values, commonly noted 0 and 1, with probabilities:
$$P(X = 1) = p, \qquad P(X = 0) = 1 - p$$
- Example: drug trial where a patient is cured with probability 0.7
The expected value of $X$ is $p$, as:
$$E[X] = 1 \times p + 0 \times (1 - p) = p$$
The variance of $X$ is $p(1 - p)$, as:
$$\operatorname{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2 = p - p^2 = p(1 - p)$$
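The expectation and variance formulas for a discrete random variable can be checked numerically on the Bernoulli example with $p = 0.7$. This is a minimal sketch with hypothetical helper names; it computes the variance both from the definition and from the shortcut $E[X^2] - E[X]^2$.

```python
def expectation(values, probs):
    """E[X] = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = sum of p_i * (x_i - mu)^2."""
    mu = expectation(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))


def variance_shortcut(values, probs):
    """Var(X) = E[X^2] - E[X]^2."""
    second_moment = expectation([x**2 for x in values], probs)
    return second_moment - expectation(values, probs) ** 2


# Bernoulli variable of the drug trial example: cured (1) with probability 0.7
p = 0.7
values, probs = [0, 1], [1 - p, p]

mean = expectation(values, probs)
var = variance(values, probs)
print(f"E[X] = {mean:.2f}, Var(X) = {var:.2f}, p(1 - p) = {p * (1 - p):.2f}")
```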
43 Discrete distribution - Binomial distribution
Given $n$ independent random variables $X_1, X_2, \ldots, X_n$ having the same distribution $\mathcal{B}(p)$, the random variable $Y = X_1 + X_2 + \ldots + X_n$, taking the values $0, 1, \ldots, n$, follows a binomial distribution, noted $\mathcal{B}(n; p)$, with parameters $n$ and $p$. Its distribution is defined by:
$$P(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n$$
with
$$\binom{n}{k} = \frac{n!}{k!(n - k)!} \quad \text{and} \quad x! = x (x - 1)(x - 2) \cdots 1$$
As a sum of independent Bernoulli random variables, $Y$ satisfies:
$$E[Y] = np, \qquad \operatorname{Var}(Y) = np(1 - p)$$
44 Binomial distribution - Example
Sickle cell anemia is a genetic blood disorder where red blood cells lose their flexibility and assume an abnormal, rigid, "sickle" shape, which results in a risk of various complications. If both parents are carriers of the disease, then a child has a 25% chance of having the disease, a 50% chance of being a carrier, and a 25% chance of neither having the disease nor being a carrier.
If two parents who are carriers of the disease have 3 children, what is the probability that:
- (a) two will have the disease?
- (b) none will have the disease?
- (c) at least one will neither have the disease nor be a carrier?
45 Binomial distribution - Example
Let $X$ be the random variable that represents the number of children with the disease, and $Y$ the number of children who neither have the disease nor are carriers. We have:
$$\mathcal{L}(X) = \mathcal{B}(3; 0.25) \quad \text{and} \quad \mathcal{L}(Y) = \mathcal{B}(3; 0.25)$$
Answers to the questions:
- (a) $P(X = 2) = \binom{3}{2} 0.25^2 (1 - 0.25)^1 = 3 \times 0.0625 \times 0.75 \approx 0.1406$
- (b) $P(X = 0) = \binom{3}{0} 0.25^0 (1 - 0.25)^3 = 0.75^3 \approx 0.4219$
- (c) $P(Y = 1) + P(Y = 2) + P(Y = 3) = 1 - P(Y = 0) = 1 - 0.75^3 \approx 0.5781$
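The three answers follow from the binomial probability formula $P(Y = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ and can be reproduced with a short sketch. The function name `binomial_pmf` is a hypothetical helper; `math.comb` provides the binomial coefficient.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y = k) for Y following the binomial distribution B(n; p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 3, 0.25  # three children, each affected with probability 0.25

prob_a = binomial_pmf(2, n, p)      # (a) exactly two have the disease
prob_b = binomial_pmf(0, n, p)      # (b) none has the disease
prob_c = 1 - binomial_pmf(0, n, p)  # (c) at least one is neither ill nor a carrier

print(f"(a) {prob_a:.4f}  (b) {prob_b:.4f}  (c) {prob_c:.4f}")
```

Question (c) uses the complement rule: "at least one" is the opposite of "none", so only one term of the distribution needs to be evaluated.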
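46 Normal distribution
A random variable $X$ is said to follow a normal distribution $\mathcal{N}(\mu; \sigma^2)$ with parameters $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ if its density is:
$$f_X(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(t - \mu)^2\right), \quad t \in \mathbb{R}$$
Then $E(X) = \mu$ and $\operatorname{Var}(X) = \sigma^2$.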
47 Normal distribution rule
48 Check yourself
A doctor collects a large set of heart rate measurements that approximately follow a normal distribution. He only reports 3 statistics: the mean = 110 beats per minute, the minimum = 65 beats per minute, and the maximum = 155 beats per minute. Which of the following is most likely to be the standard deviation of the distribution?
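49 Calculate with the normal distribution
If $\mathcal{L}(X) = \mathcal{N}(\mu; \sigma^2)$, then the random variable $Z = \frac{X - \mu}{\sigma}$ has the standard normal distribution $\mathcal{N}(0; 1)$.
If $\mathcal{L}(X) = \mathcal{N}(\mu; \sigma^2)$ and $[a, b[$ is an interval:
$$P(a \le X < b) = P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right)$$
$$P(a \le X < b) = P\left(\frac{a - \mu}{\sigma} \le Z < \frac{b - \mu}{\sigma}\right)$$
$$P(a \le X < b) = P\left(Z < \frac{b - \mu}{\sigma}\right) - P\left(Z < \frac{a - \mu}{\sigma}\right)$$
$$P(a \le X < b) = F_Z\left(\frac{b - \mu}{\sigma}\right) - F_Z\left(\frac{a - \mu}{\sigma}\right)$$
$F_Z$ is the cumulative distribution function of the standard normal distribution.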
50 Standard normal distribution table
$P(Z \le 0.14) = 0.5557$
$P(Z \le 0.58) = 0.7190$
$P(0.14 \le Z \le 0.58) = 0.7190 - 0.5557 = 0.1633$
51 Calculations
$P(Z > 0.23) = 1 - P(Z \le 0.23) = 1 - 0.5910 = 0.4090$
52 Calculations
$P(Z \le -0.53) = P(Z \ge 0.53) = 1 - P(Z \le 0.53) = 1 - 0.7019 = 0.2981$
53 Calculations with $\mathcal{L}(X) = \mathcal{N}(25; 16)$
$$P(X \le 26.4) = P\left(\frac{X - 25}{4} \le \frac{26.4 - 25}{4}\right) = P(Z \le 0.35) = 0.6368$$
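54 Calculations
If $\mathcal{L}(X) = \mathcal{N}(100; 25)$, calculate $P(90 \le X \le 105)$.
$$P\left(\frac{90 - 100}{5} \le \frac{X - 100}{5} \le \frac{105 - 100}{5}\right) = P(-2 \le Z \le 1)$$
$$P(-2 \le Z \le 1) = P(Z \le 1) - P(Z < -2)$$
$$P(Z \le 1) - P(Z < -2) = P(Z \le 1) - (1 - P(Z \le 2))$$
$$P(-2 \le Z \le 1) = P(Z \le 1) + P(Z \le 2) - 1$$
$$P(-2 \le Z \le 1) = 0.8413 + 0.9772 - 1 = 0.8185$$
$$P(90 \le X \le 105) = 0.8185$$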
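The table lookups in these calculations can be cross-checked numerically. The sketch below evaluates the standard normal cumulative distribution function through the error function, using the identity $F_Z(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z / \sqrt{2})\right)$, and standardizes an interval before looking it up. The function names are hypothetical, and results may differ from table values in the fourth decimal because tables are rounded.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """F_Z(z) = P(Z <= z) for the standard normal distribution N(0; 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_interval_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X following N(mu; sigma^2), computed by standardizing."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)


# Standard normal probabilities of the kind read from the table
print(f"P(Z <= 0.14)  = {std_normal_cdf(0.14):.4f}")
print(f"P(Z >  0.23)  = {1 - std_normal_cdf(0.23):.4f}")
print(f"P(Z <= -0.53) = {std_normal_cdf(-0.53):.4f}")

# X follows N(100; 25), so sigma = 5
prob = normal_interval_prob(90, 105, mu=100, sigma=5)
print(f"P(90 <= X <= 105) = {prob:.4f}")
```

Note that the second parameter of $\mathcal{N}(\mu; \sigma^2)$ is the variance, so the standard deviation passed to the function is its square root.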
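55 Properties
If two random variables $X_1$ and $X_2$ are independent with distributions $\mathcal{N}(\mu_1; \sigma_1^2)$ and $\mathcal{N}(\mu_2; \sigma_2^2)$ respectively, and $\alpha$, $\beta$ are real numbers, then:
$$\mathcal{L}(X_1 + X_2) = \mathcal{N}(\mu_1 + \mu_2; \sigma_1^2 + \sigma_2^2)$$
$$\mathcal{L}(X_1 - X_2) = \mathcal{N}(\mu_1 - \mu_2; \sigma_1^2 + \sigma_2^2)$$
$$\mathcal{L}(\alpha X_1 - \beta X_2) = \mathcal{N}(\alpha\mu_1 - \beta\mu_2; \alpha^2\sigma_1^2 + \beta^2\sigma_2^2)$$
If $\mathcal{L}(X_1) = \mathcal{N}(15; 16)$ and $\mathcal{L}(X_2) = \mathcal{N}(10; 9)$, let $Y = X_1 - X_2$. Then $\mathcal{L}(Y) = \mathcal{N}(5; 25)$ and:
$$P(X_1 - X_2 \le 3) = P(Y \le 3)$$
$$P(X_1 - X_2 \le 3) = P\left(\frac{Y - 5}{5} \le \frac{-2}{5}\right)$$
$$P(X_1 - X_2 \le 3) = P(Z \le -0.4) = P(Z > 0.4)$$
$$P(X_1 - X_2 \le 3) = 1 - P(Z \le 0.4) = 1 - 0.6554 = 0.3446$$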
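The worked example can be reproduced by first combining the parameters of the two independent normal variables and then standardizing. A minimal sketch with hypothetical function names; the standard normal cumulative distribution function is evaluated with `math.erf` instead of a table.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution N(0; 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def difference_params(mu1, var1, mu2, var2):
    """Mean and variance of X1 - X2 for independent normal X1 and X2.

    The means subtract, but the variances add.
    """
    return mu1 - mu2, var1 + var2


# X1 follows N(15; 16) and X2 follows N(10; 9)
mu_y, var_y = difference_params(15, 16, 10, 9)
sigma_y = sqrt(var_y)

z = (3 - mu_y) / sigma_y
prob = std_normal_cdf(z)
print(f"Y follows N({mu_y}; {var_y}), z = {z:.1f}, P(Y <= 3) = {prob:.4f}")
```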
56 Check yourself
$X_1$, $X_2$ and $X_3$ are independent and follow the same normal distribution $\mathcal{N}(0; 1)$.
$$Y = 2X_1 - 2X_2 + X_3 - 2$$
What is the distribution of $Y$?
- Poisson
- Uniform
- Normal
- Not defined
57 Check yourself
$$Y = 2X_1 - 2X_2 + X_3 - 2$$
What is the expected value of $Y$?
58 Check yourself
$$Y = 2X_1 - 2X_2 + X_3 - 2$$
What is the variance of $Y$?
59 See you next time
Statistics. Nicodème Paul Faculté de médecine, Université de Strasbourg. 9/5/2018 Statistics
Statistics Nicodème Paul Faculté de médecine, Université de Strasbourg file:///users/home/npaul/enseignement/esbs/2018-2019/cours/01/index.html#21 1/62 Course logistics Statistics Course website: http://statnipa.appspot.com/
More informationStatistics - Lecture 04
Statistics - Lecture 04 Nicodème Paul Faculté de médecine, Université de Strasbourg file:///users/home/npaul/enseignement/esbs/2018-2019/cours/04/index.html#40 1/40 Correlation In many situations the objective
More informationBIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke
BIOL 51A - Biostatistics 1 1 Lecture 1: Intro to Biostatistics Smoking: hazardous? FEV (l) 1 2 3 4 5 No Yes Smoke BIOL 51A - Biostatistics 1 2 Box Plot a.k.a box-and-whisker diagram or candlestick chart
More informationProbability and Probability Distributions. Dr. Mohammed Alahmed
Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about
More informationWhat is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.
What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty. Statistics is a field of study concerned with the data collection,
More informationExam 1 Review (Notes 1-8)
1 / 17 Exam 1 Review (Notes 1-8) Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) Basic Concepts 2 / 17 Type of studies:
More informationLecture 1: Descriptive Statistics
Lecture 1: Descriptive Statistics MSU-STT-351-Sum 15 (P. Vellaisamy: MSU-STT-351-Sum 15) Probability & Statistics for Engineers 1 / 56 Contents 1 Introduction 2 Branches of Statistics Descriptive Statistics
More informationUniversity of Jordan Fall 2009/2010 Department of Mathematics
handouts Part 1 (Chapter 1 - Chapter 5) University of Jordan Fall 009/010 Department of Mathematics Chapter 1 Introduction to Introduction; Some Basic Concepts Statistics is a science related to making
More informationChapter 4. Displaying and Summarizing. Quantitative Data
STAT 141 Introduction to Statistics Chapter 4 Displaying and Summarizing Quantitative Data Bin Zou (bzou@ualberta.ca) STAT 141 University of Alberta Winter 2015 1 / 31 4.1 Histograms 1 We divide the range
More informationChapter 3. Data Description
Chapter 3. Data Description Graphical Methods Pie chart It is used to display the percentage of the total number of measurements falling into each of the categories of the variable by partition a circle.
More informationMATH4427 Notebook 4 Fall Semester 2017/2018
MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their
More informationLearning Objectives for Stat 225
Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationStatistics - Lecture 05
Statistics - Lecture 05 Nicodème Paul Faculté de médecine, Université de Strasbourg http://statnipa.appspot.com/cours/05/index.html#47 1/47 Descriptive statistics and probability Data description and graphical
More information1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.
1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions
More informationStatistics I Chapter 2: Univariate data analysis
Statistics I Chapter 2: Univariate data analysis Chapter 2: Univariate data analysis Contents Graphical displays for categorical data (barchart, piechart) Graphical displays for numerical data data (histogram,
More informationLast time. Numerical summaries for continuous variables. Center: mean and median. Spread: Standard deviation and inter-quartile range
Lecture 4 Last time Numerical summaries for continuous variables Center: mean and median Spread: Standard deviation and inter-quartile range Exploratory graphics Histogram (revisit modes ) Histograms Histogram
More informationEXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS
EXAM Exam # Math 3342 Summer II, 2 July 2, 2 ANSWERS i pts. Problem. Consider the following data: 7, 8, 9, 2,, 7, 2, 3. Find the first quartile, the median, and the third quartile. Make a box and whisker
More informationFinal Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above
King Abdul Aziz University Faculty of Sciences Statistics Department Final Exam STAT 0 First Term 49-430 A 40 Name No ID: Section: You have 40 questions in 9 pages. You have 90 minutes to solve the exam.
More informationChapter 2 Class Notes Sample & Population Descriptions Classifying variables
Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is
More informationLecture 2. Descriptive Statistics: Measures of Center
Lecture 2. Descriptive Statistics: Measures of Center Descriptive Statistics summarize or describe the important characteristics of a known set of data Inferential Statistics use sample data to make inferences
More informationChapter 2 Solutions Page 15 of 28
Chapter Solutions Page 15 of 8.50 a. The median is 55. The mean is about 105. b. The median is a more representative average" than the median here. Notice in the stem-and-leaf plot on p.3 of the text that
More informationCHAPTER 1. Introduction
CHAPTER 1 Introduction Engineers and scientists are constantly exposed to collections of facts, or data. The discipline of statistics provides methods for organizing and summarizing data, and for drawing
More informationLecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1
Lecture Slides Elementary Statistics Tenth Edition and the Triola Statistics Series by Mario F. Triola Slide 1 Chapter 3 Statistics for Describing, Exploring, and Comparing Data 3-1 Overview 3-2 Measures
More information2011 Pearson Education, Inc
Statistics for Business and Economics Chapter 2 Methods for Describing Sets of Data Summary of Central Tendency Measures Measure Formula Description Mean x i / n Balance Point Median ( n +1) Middle Value
More informationAn introduction to biostatistics: part 1
An introduction to biostatistics: part 1 Cavan Reilly September 6, 2017 Table of contents Introduction to data analysis Uncertainty Probability Conditional probability Random variables Discrete random
More informationStat 101 Exam 1 Important Formulas and Concepts 1
1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative
More informationReview for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data
Review for Exam #1 1 Chapter 1 Population the complete collection of elements (scores, people, measurements, etc.) to be studied Sample a subcollection of elements drawn from a population 11 The Nature
More informationSTAT 200 Chapter 1 Looking at Data - Distributions
STAT 200 Chapter 1 Looking at Data - Distributions What is Statistics? Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the
More informationLast Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics
Last Lecture Distinguish Populations from Samples Importance of identifying a population and well chosen sample Knowing different Sampling Techniques Distinguish Parameters from Statistics Knowing different
More informationProbability Distributions.
Probability Distributions http://www.pelagicos.net/classes_biometry_fa18.htm Probability Measuring Discrete Outcomes Plotting probabilities for discrete outcomes: 0.6 0.5 0.4 0.3 0.2 0.1 NOTE: Area within
More informationUnits. Exploratory Data Analysis. Variables. Student Data
Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as
More informationP8130: Biostatistical Methods I
P8130: Biostatistical Methods I Lecture 2: Descriptive Statistics Cody Chiuzan, PhD Department of Biostatistics Mailman School of Public Health (MSPH) Lecture 1: Recap Intro to Biostatistics Types of Data
More informationREVIEW: Midterm Exam. Spring 2012
REVIEW: Midterm Exam Spring 2012 Introduction Important Definitions: - Data - Statistics - A Population - A census - A sample Types of Data Parameter (Describing a characteristic of the Population) Statistic
More informationHomework 4 Solution, due July 23
Homework 4 Solution, due July 23 Random Variables Problem 1. Let X be the random number on a die: from 1 to. (i) What is the distribution of X? (ii) Calculate EX. (iii) Calculate EX 2. (iv) Calculate Var
More informationLecture 1: Probability Fundamentals
Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability
More informationConditional Probability (cont'd)
Conditional Probability (cont'd) April 26, 2006 Conditional Probability (cont'd) Midterm Problems In a ten-question true-false exam, nd the probability that a student get a grade of 70 percent or better
More informationTastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?
Tastitsticsss? What s that? Statistics describes random mass phanomenons. Principles of Biostatistics and Informatics nd Lecture: Descriptive Statistics 3 th September Dániel VERES Data Collecting (Sampling)
More informationare the objects described by a set of data. They may be people, animals or things.
( c ) E p s t e i n, C a r t e r a n d B o l l i n g e r 2016 C h a p t e r 5 : E x p l o r i n g D a t a : D i s t r i b u t i o n s P a g e 1 CHAPTER 5: EXPLORING DATA DISTRIBUTIONS 5.1 Creating Histograms
More informationLecture 2: Probability and Distributions
Lecture 2: Probability and Distributions Ani Manichaikul amanicha@jhsph.edu 17 April 2007 1 / 65 Probability: Why do we care? Probability helps us by: Allowing us to translate scientific questions info
More informationDescription of Samples and Populations
Description of Samples and Populations Random Variables Data are generated by some underlying random process or phenomenon. Any datum (data point) represents the outcome of a random variable. We represent
More informationUseful material for the course
Useful material for the course Suggested textbooks: Mood A.M., Graybill F.A., Boes D.C., Introduction to the Theory of Statistics. McGraw-Hill, New York, 1974. [very complete] M.C. Whitlock, D. Schluter,
More information2 Descriptive Statistics
2 Descriptive Statistics Reading: SW Chapter 2, Sections 1-6 A natural first step towards answering a research question is for the experimenter to design a study or experiment to collect data from the
More informationTopic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!
Topic 3: Introduction to Statistics Collecting Data We collect data through observation, surveys and experiments. We can collect two different types of data: Categorical Quantitative Algebra 1 Table of
More informationBNG 495 Capstone Design. Descriptive Statistics
BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus
More informationBiostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015
Biostatistics for biomedical profession BIMM34 Karin Källen & Linda Hartman November-December 2015 12015-11-02 Who needs a course in biostatistics? - Anyone who uses quntitative methods to interpret biological
More information4. Conditional Probability
1 of 13 7/15/2009 9:25 PM Virtual Laboratories > 2. Probability Spaces > 1 2 3 4 5 6 7 4. Conditional Probability Definitions and Interpretations The Basic Definition As usual, we start with a random experiment
More information3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability
3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability 3.1 Week 1 Review Creativity is more than just being different. Anybody can plan weird; that s easy. What s hard is to be
More informationIntroduction to Statistics
Introduction to Statistics By A.V. Vedpuriswar October 2, 2016 Introduction The word Statistics is derived from the Italian word stato, which means state. Statista refers to a person involved with the
More informationdates given in your syllabus.
Slide 2-1 For exams (MD1, MD2, and Final): You may bring one 8.5 by 11 sheet of paper with formulas and notes written or typed on both sides to each exam. For the rest of the quizzes, you will take your
More informationStatistics in medicine
Statistics in medicine Lecture 3: Bivariate association : Categorical variables Proportion in one group One group is measured one time: z test Use the z distribution as an approximation to the binomial
Descriptive Statistics-I. Dr Mahmoud Alhussami
Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.
Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions
Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions Histograms, Mean, Median, Five-Number Summary and Boxplots, Standard Deviation Thought Questions 1. If you were to
Outline PMF, CDF and PDF Mean, Variance and Percentiles Some Common Distributions. Week 5 Random Variables and Their Distributions
Week 5 Random Variables and Their Distributions Week 5 Objectives This week we give more general definitions of mean value, variance and percentiles, and introduce the first probability models for discrete
Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?
Probability: Why do we care? Lecture 2: Probability and Distributions Sandy Eckel seckel@jhsph.edu 22 April 2008 Probability helps us by: Allowing us to translate scientific questions into mathematical
Describing distributions with numbers
Describing distributions with numbers A large number of numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central
1. Poisson distribution is widely used in statistics for modeling rare events.
Discrete probability distributions - Class 5 January 20, 2014 Debdeep Pati Poisson distribution 1. Poisson distribution is widely used in statistics for modeling rare events. 2. Ex. Infectious Disease
4.2 Probability Models
4.2 Probability Models Ulrich Hoensch Tuesday, February 19, 2013 Sample Spaces Examples 1. When tossing a coin, the sample space is S = {H, T }, where H = heads, T = tails. 2. When randomly selecting a
1: PROBABILITY REVIEW
1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following
Lecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
Recap of Basic Probability Theory
02407 Stochastic Processes Recap of Basic Probability Theory Uffe Høgsbro Thygesen Informatics and Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby Denmark Email: uht@imm.dtu.dk
Chapter 4: Continuous Random Variables and Probability Distributions
Chapter 4: Continuous Random Variables and Probability Distributions Walid Sharabati Purdue University February 14, 2014 Professor Sharabati (Purdue University) Spring 2014 (Slide 1 of 37) Chapter Overview Continuous random variables
Special distributions
Special distributions August 22, 2017 STAT 101 Class 4 Slide 1 Outline of Topics 1 Motivation 2 Bernoulli and binomial 3 Poisson 4 Uniform 5 Exponential 6 Normal STAT 101 Class 4 Slide 2 What distributions
STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
AP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Exam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
Biostatistics and Epidemiology, Midterm Review
Biostatistics and Epidemiology, Midterm Review New York Medical College By: Jasmine Nirody This review is meant to cover lectures from the first half of the Biostatistics course. The sections are not organised
Conditional Probability (cont...) 10/06/2005
Conditional Probability (cont...) 10/06/2005 Independent Events Two events E and F are independent if both E and F have positive probability and if P(E | F) = P(E) and P(F | E) = P(F). 1 Theorem. If
Binomial and Poisson Probability Distributions
Binomial and Poisson Probability Distributions Esra Akdeniz March 3, 2016 Bernoulli Random Variable Any random variable whose only possible values are 0 or 1 is called a Bernoulli random variable. What
MATH 10 INTRODUCTORY STATISTICS
MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student. Week 1 Chapter 1 Introduction What is Statistics? Why do you need to know Statistics? Technical lingo and concepts:
Chapter2 Description of samples and populations. 2.1 Introduction.
Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that
MATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010
MATH 19B FINAL EXAM PROBABILITY REVIEW PROBLEMS SPRING, 2010 This handout is meant to provide a collection of exercises that use the material from the probability and statistics portion of the course The
STAT 101 Notes. Introduction to Statistics
STAT 101 Notes Introduction to Statistics September 2017 CONTENTS 1 Introduction 1.1 Data 1.2 Tabular, graphical and numerical summaries
Determining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
Chapter 2: Tools for Exploring Univariate Data
Stats 11 (Fall 2004) Lecture Note Introduction to Statistical Methods for Business and Economics Instructor: Hongquan Xu Chapter 2: Tools for Exploring Univariate Data Section 2.1: Introduction What is
MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability
STA301- Statistics and Probability Solved MCQS From Midterm Papers March 19,2012 MC100401285 Moaaz.pk@gmail.com Mc100401285@gmail.com PSMD01 MIDTERM EXAMINATION (Spring 2011) STA301- Statistics and Probability
5. Independence As usual, suppose that we have a random experiment with sample space S and probability measure P.
Introduction to Statistical Data Analysis Lecture 3: Probability Distributions
Introduction to Statistical Data Analysis Lecture 3: Probability Distributions James V. Lambers Department of Mathematics The University of Southern Mississippi James V. Lambers Statistical Data Analysis
Descriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identifies the presence of noise or outliers, which is useful for successful data cleaning
2. AXIOMATIC PROBABILITY
IA Probability Lent Term 2. AXIOMATIC PROBABILITY 2. The axioms The formulation for classical probability in which all outcomes or points in the sample space are equally likely is too restrictive to develop
CHAPTER 2: Describing Distributions with Numbers
CHAPTER 2: Describing Distributions with Numbers The Basic Practice of Statistics 6th Edition Moore / Notz / Fligner Lecture PowerPoint Slides Chapter 2 Concepts 2 Measuring Center: Mean and Median Measuring
Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago
Stat 22000 Lecture Slides Exploring Numerical Data Yibi Huang Department of Statistics University of Chicago Outline In this slide, we cover mostly Section 1.2 & 1.6 in the text. Data and Types of Variables
550 = cleaners. Label the managers 1 55 and the cleaners Use random numbers to select 5 managers and 45 cleaners.
Review Exercise 1 1 a A census observes every member of a population. A disadvantage of a census is it would be time-consuming to get opinions from all the employees. OR It would be difficult/time-consuming
All the men living in Turkey can be a population. The average height of these men can be a population parameter
CHAPTER 1: WHY STUDY STATISTICS? Why Study Statistics? Population is a large (or infinite) set of elements that are in the interest of a research question. A parameter is a specific characteristic of a population
Math 140 Introductory Statistics
Math 140 Introductory Statistics Professor Silvia Fernández Chapter 2 Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Visualizing Distributions Recall the definition: The
Math 140 Introductory Statistics
Visualizing Distributions Math 140 Introductory Statistics Professor Silvia Fernández Chapter Based on the book Statistics in Action by A. Watkins, R. Scheaffer, and G. Cobb. Recall the definition: The
Glossary for the Triola Statistics Series
Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling
Introduction to Probability and Statistics Slides 1 Chapter 1
1 Introduction to Probability and Statistics Slides 1 Chapter 1 Prof. Ammar M. Sarhan, asarhan@mathstat.dal.ca Department of Mathematics and Statistics, Dalhousie University Fall Semester 2010 Course outline
Practice problems from chapters 2 and 3
Practice problems from chapters 2 and 3 Question-1. For each of the following variables, indicate whether it is quantitative or qualitative and specify which of the four levels of measurement (nominal, ordinal,
M378K In-Class Assignment #1
The following problems are a review of M6K. M378K In-Class Assignment #1 Problem.. Complete the definition of mutual exclusivity of events below: Events A, B ⊆ Ω are said to be mutually exclusive if A ∩ B =.
Counting principles, including permutations and combinations.
1 Counting principles, including permutations and combinations. The binomial theorem: expansion of (a + b)^n, n ∈ ℕ. THE PRODUCT RULE If there are m different ways of performing an operation and for each
Fundamental Tools - Probability Theory II
Fundamental Tools - Probability Theory II MSc Financial Mathematics The University of Warwick September 29, 2015 MSc Financial Mathematics Fundamental Tools - Probability Theory II 1 / 22 Measurable random
STAT 414: Introduction to Probability Theory
STAT 414: Introduction to Probability Theory Spring 2016; Homework Assignments Latest updated on April 29, 2016 HW1 (Due on Jan. 21) Chapter 1 Problems 1, 8, 9, 10, 11, 18, 19, 26, 28, 30 Theoretical Exercises
Lecture 2 and Lecture 3
Lecture 2 and Lecture 3 1 Lecture 2 and Lecture 3 We can describe distributions using 3 characteristics: shape, center and spread. These characteristics have been discussed since the foundation of statistics.
Notes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 16 Notes. Class URL:
Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 16 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes March 8 (1) This week: Days
Common Discrete Distributions
Common Discrete Distributions Statistics 104 Autumn 2004 Taken from Statistics 110 Lecture Notes Copyright © 2004 by Mark E. Irwin Common Discrete Distributions There are a wide range of popular discrete
A SHORT INTRODUCTION TO PROBABILITY
A Lecture for B.Sc. 2nd Semester, Statistics (General) A SHORT INTRODUCTION TO PROBABILITY By Dr. Ajit Goswami Dept. of Statistics MDKG College, Dibrugarh 19-Apr-18 1 Terminology The possible outcomes
STAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22
Announcements Announcements Lecture 1 - Data and Data Summaries Statistics 102 Colin Rundel January 13, 2013 Homework 1 - Out 1/15, due 1/22 Lab 1 - Tomorrow RStudio accounts created this evening Try logging