probability George Nicholson and Chris Holmes 29th October 2008
This practical focuses on understanding probabilistic and statistical concepts using simulation and plots in R. It begins with an introduction to writing loops and functions in R.

Loops in R

This section gives a very brief introduction to writing loops in R. [1]

Type in the following code and see what it does.

    sum=0
    for(i in 1:10){
      sum=sum+i
      print(paste("loop ",i,", sum = ",sum,sep=""))
    }

Now try this:

    for(mychar in letters){
      print(paste("loop ",mychar,sep=""))
    }

Define a vector of characters (e.g. cv=c("s","g","u")) and write a loop that will calculate the sum of the letter indices of cv (i.e. 19+7+21=47). (Think about using the function match() in your loop.) Can you think of an automated way of doing this calculation without the loop?

[1] You should try whenever possible to avoid using loops in R, as they are a relatively slow way to do computations. The way to work efficiently is to vectorise calculations (e.g. use matrix multiplication), and to use specialized R functions that can do iterative calculations fast (e.g. rowSums(), ifelse()).
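The advice about avoiding loops can be illustrated with the letter-index exercise. The following is a minimal sketch, assuming the example vector cv=c("s","g","u") from the text; the names loop_total and vec_total are illustrative and not part of the practical.

```r
cv <- c("s", "g", "u")

# Loop version: look up each letter's position in the built-in vector `letters`
loop_total <- 0
for (ch in cv) {
  loop_total <- loop_total + match(ch, letters)
}

# Vectorised version: match() accepts the whole vector at once
vec_total <- sum(match(cv, letters))

print(c(loop = loop_total, vectorised = vec_total))
```

Both versions should agree; the vectorised call simply hands the whole lookup to match() in one step instead of iterating in R code.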
Functions in R

A function in R is defined using the following syntax:

    funname=function(argument1, argument2, etc.){
      ...code here...
      listout=list(outname1=object1, outname2=object2, etc.)
      return(listout)
    }

You only need to create a list to return if you want to return more than one R object at a time (otherwise you can just use return(object)).

First, let's create a function that takes, as arguments, two numbers, and returns the first number raised to the power of the second number:

    pow=function(x,p){
      out=x^p
      return(out)
    }

Play around with the function pow. What happens if the argument x is a vector? What about p?

Now create a function whose arguments are two numbers, m and n say; the function must return two objects, named div and rem, with the property that m divides n div times with remainder rem (e.g. 5 divides 7 once leaving remainder 2).

    divide=function(m,n){
      div=floor(n/m)
      rem=n-m*div
      return(list(div=div,rem=rem))
    }

Now write a function which takes a single argument, x say (a numeric vector). The function must calculate the sample mean and variance of x using the following formulae:

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

where n is the length of x. Return the objects mean and variance in a list.
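One possible shape for the mean-and-variance function is sketched below. It is only an illustration: the function name sample_stats and the test vector are assumptions, and the built-in mean() and var() are used purely as a cross-check.

```r
# sample_stats is a hypothetical name; the practical does not prescribe one
sample_stats <- function(x) {
  n <- length(x)
  xbar <- sum(x) / n                  # sample mean
  s2 <- sum((x - xbar)^2) / (n - 1)   # sample variance with the n - 1 divisor
  return(list(mean = xbar, variance = s2))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)        # arbitrary illustrative data
out <- sample_stats(x)
print(out)

# Cross-check against R's built-in functions
print(all.equal(out$mean, mean(x)))
print(all.equal(out$variance, var(x)))
```

Returning a named list lets the caller pick out each result with out$mean and out$variance, which is the pattern the syntax template above describes.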
Simulation and probability in R

1. Generate n random draws from a standard Gaussian distribution. Plot a histogram of the data (on the density scale, e.g. with freq=FALSE, so that it is comparable with a density curve). Overlay the histogram with the density function of a standard Gaussian; first look at ?dnorm and understand what this function does.

    xpl=seq(-5,5,length.out=10000)
    lines(x=xpl,y=dnorm(xpl))

Repeat 6 times, using a different n each time. Open a graphics device, and define a 2 × 3 layout using the argument mfrow to the par function. Plot the 6 histograms in this graphics device. Export the plots into a .pdf file using the savePlot() function. How does the appearance of the histograms change with n?

2. Repeat the previous task in its entirety, but now create Q-Q plots instead of histograms; superimpose the line y = x on each plot using abline().

3. Generate n draws from a standard Gaussian distribution; calculate the mean of the n values. Repeat the procedure of the previous sentence m times, storing the m means in a vector, mv say. Standardise mv (i.e. subtract its mean and divide by its standard deviation). Plot the standardised vector mv in a histogram and overlay with the density of a standard Gaussian. What do you observe? Vary n and m; what happens as they increase? Can you spot a pattern?

4. Repeat the previous exercise, but use the chi-squared distribution with one degree of freedom as the generating distribution (rchisq()). (You may want to remind yourself what this distribution looks like by plotting a histogram of some simulated data before setting out.)

5. Generate n draws from a standard Gaussian; save the 0.1 quantile (10th percentile); repeat this m times, storing the m values in a vector, mv say; plot a histogram of mv and add a line indicating the theoretical 0.1 quantile (find this using qnorm). Repeat the procedure of the previous sentence for the median and for the mean. Boxplot the distribution of the means and the distribution of the medians. What do you see?

6. Write a function with arguments n, m and alpha.
The function should generate a vector of length m, mv say, each element of which is the alpha quantile of a random sample of size n from a standard Gaussian. The function should create a histogram of mv, label it appropriately, and return mv. Note: you've just written a function to plot the distribution of the order statistics! Compare the empirical distribution of mv with the distribution of a standard Gaussian. What happens as n and m become large?
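The function described in item 6 could be sketched roughly as follows. This is one possible reading rather than a reference solution: the name quantile_sim, the plot labels, the dashed reference line and the seed are all assumptions made for illustration.

```r
# quantile_sim is a hypothetical name for the function described in item 6
quantile_sim <- function(n, m, alpha) {
  # Each replicate draws n standard Gaussian values and keeps the alpha quantile
  mv <- replicate(m, quantile(rnorm(n), probs = alpha, names = FALSE))
  hist(mv, freq = FALSE,
       main = paste0("Sample ", alpha, " quantile (n = ", n, ", m = ", m, ")"),
       xlab = "sample quantile")
  abline(v = qnorm(alpha), lty = 2)   # theoretical quantile, for reference
  return(mv)
}

set.seed(1)   # arbitrary seed, only so the run is reproducible
mv <- quantile_sim(n = 200, m = 500, alpha = 0.1)
print(c(simulated_mean = mean(mv), theoretical = qnorm(0.1)))
```

Calling the function with larger n should concentrate the histogram more tightly around the dashed line, while larger m only smooths the histogram's shape.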
7. Generate a pair of independent draws from a standard Gaussian distribution. Write a function which simulates n pairs and then calculates the proportion of pairs for which (i) both members are smaller than alpha, (ii) either member is smaller than alpha. Your function should return these two proportions in a list. Calculate the exact (theoretical) probability of these events and compare your accuracy as n changes. Amend your function so that the above is repeated m times, plots a histogram of each of the two vectors of length m, and returns these two vectors in a list. Finally, can you repeat this whole procedure, but now with triples rather than pairs?

8. Generate a pair of samples from a multivariate normal density with correlation rho=0.6. Scatterplot a large number of such pairs. Write a function that uses simulation to estimate the conditional probability Pr(X > 0.6 | Y > 0.6) and returns the estimate, along with the marginal estimate of Pr(X > 0.6). Generalise your function to take rho, alpha and beta as arguments, and to estimate Pr(X > α | Y > β) and Pr(X > α), where each pair (X, Y) is drawn from a multivariate normal density with correlation rho.

9. Using very large sample sizes (say 100,000 samples) draw correlated pairs of multivariate Gaussian observations, (X, Y). Store X only if Y is in some small range (e.g. store X only if 0.6 < Y < 0.61). Explore the resulting distribution of the stored X values. What do you see?

10. Generate n random draws from a standard Gaussian distribution. Store this in a vector z. Input

    xpl=seq(-5,5,length.out=10000)
    fn=ecdf(x=z)

Try to work out what the object fn is. What does fn(xpl) return? Plot the empirical CDF of the data:

    plot(x=xpl,y=fn(xpl),type="l")

Superimpose the theoretical CDF of a standard Gaussian distribution (see pnorm).

11.
Write a function g that takes as arguments two numeric vectors, u and v say; it must return an object, w say, the same length as u, with the ith element of w equal to the proportion of elements in v that are less than or equal to the ith element of u. Generate n random draws from a standard Gaussian distribution and store them in a vector z. Then run

    gout=g(u=xpl,v=z)
    fn=ecdf(x=z)
    fout=fn(xpl)

Compare fout and gout. If they're the same, you've written a function to evaluate the empirical CDF of a sample of data, v, at a set of points, u!

Hypothesis Testing

In the examples above we've considered random variation that naturally occurs when we sample from a population. In this section we will look at random variation that occurs when we look for differences between samples drawn from two populations, that is, when testing for differences. The next couple of questions are intended to get you thinking about p-values under the null (when there is no change in distribution between two treatments/experiments/categories).

1. Generate two sets of 50 values each from a standard normal N(0,1); that is, they have the same distribution. Use t.test() to test for differences in the means. Write a for loop to repeat the test 1000 times (with different data sets drawn from rnorm() each time), storing the p-values from the 1000 tests. Plot a histogram of the p-values and Q-Q plot them against a uniform distribution (note: you can approximate the theoretical ith quantile of a uniform by i/(n+1)). What percentage of your p-values fall below 0.05? Is that what you expect?

2. Perform question (1) above but now using 100 samples for each set. Histogram the p-values from 1000 repeats and see how many fall below the threshold used in (1). What do you expect to see?

3. Repeat (1) and (2) but now using a chi-squared distribution to draw the samples. Is the t-test robust to changes in distribution?

4. Generate 50 points from N(0, 1) and 50 points from N(µ, 1) with, say, µ = 0.4 (mu <- 0.4; y <- rnorm(50)+mu). Repeat question (1) above and plot the histograms of p-values. What percentage of p-values fall below 0.05? Compare your result with that given by power.t.test(n=50, delta=mu, sd=1, sig.level=0.05), where n is the number of observations per group.

5. Permutation testing is a great way to think about how we can explore the natural variation in a test result that occurs purely by chance when the null is true.
To do a permutation we randomly swap points between the two sets of samples that we're testing. Having done this we know (by design) that there is no association between the class labels and the measurements (think about this!). Hence, any association we do see is purely by chance. To demonstrate the principle, perform the following: Generate 50 points from N(0, 1), x<-rnorm(50), and 50 points from N(µ, 1), y<-rnorm(50)+mu. Swap points between x and y at random. Suppose you have data stored in x and y; then the following code will shuffle the points across to create xnew and ynew:

    n <- length(x)
    # randomly split the points of x between the two new groups
    t <- rnorm(n)
    indx <- t < 0
    indx_2 <- t > 0
    xnew <- x[indx]
    ynew <- x[indx_2]
    # randomly split the points of y between the two new groups
    t <- rnorm(n)
    indx <- t < 0
    indx_2 <- t > 0
    xnew <- c(xnew, y[indx])
    ynew <- c(ynew, y[indx_2])

Use t.test to test for association. Repeat the above 1000 times and plot the distribution of the p-values.

6. Repeat the task in (5) but this time store the standardised difference between the sample means at each repetition. You can use the code below to calculate the standardised difference in means. (The shuffled groups xnew and ynew need not be the same size, so the two lengths are handled separately.)

    nx <- length(xnew)
    ny <- length(ynew)
    mu_x <- mean(xnew)
    mu_y <- mean(ynew)
    # pooled (grouped) standard deviation of the two shuffled samples
    grouped_standard_deviation <- sqrt(((nx-1)*var(xnew)+(ny-1)*var(ynew))/(nx+ny-2))
    # standardised difference in means
    mu_dif <- (mu_x - mu_y) / (grouped_standard_deviation * sqrt(1/nx+1/ny))

Plot the distribution of standardised mean differences using histogram and Q-Q plots. What is the distribution?
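The permutation exercise can also be sketched end to end. Note that this sketch swaps in a different shuffling technique from the code in the practical: it relabels the pooled data with sample(), a common alternative that keeps both groups at 50 points. The seed, the variable names (pooled, pvals, tstats, n_perm) and the choice of var.equal=TRUE are assumptions made for illustration.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
mu <- 0.4                    # shift used in the hypothesis-testing questions
x <- rnorm(50)
y <- rnorm(50) + mu

n_perm <- 1000
pooled <- c(x, y)
pvals <- numeric(n_perm)
tstats <- numeric(n_perm)

for (b in seq_len(n_perm)) {
  # Randomly relabel: pick which pooled points form the new "x" group
  idx <- sample(length(pooled), length(x))
  xnew <- pooled[idx]
  ynew <- pooled[-idx]
  tt <- t.test(xnew, ynew, var.equal = TRUE)
  pvals[b] <- tt$p.value
  tstats[b] <- unname(tt$statistic)   # standardised difference in means
}

par(mfrow = c(1, 2))
hist(pvals, main = "Permutation p-values", xlab = "p-value")
qqnorm(tstats, main = "Standardised mean differences")
abline(0, 1)

print(mean(pvals < 0.05))    # proportion of shuffled data sets with p < 0.05
```

Because the labels are assigned at random, the p-values from the shuffled data sets should look roughly uniform even though x and y were generated with different means.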
More informationSTAT 513 fa 2018 Lec 02
STAT 513 fa 2018 Lec 02 Inference about the mean and variance of a Normal population Karl B. Gregory Fall 2018 Inference about the mean and variance of a Normal population Here we consider the case in
More informationPower and Sample Size + Principles of Simulation. Benjamin Neale March 4 th, 2010 International Twin Workshop, Boulder, CO
Power and Sample Size + Principles of Simulation Benjamin Neale March 4 th, 2010 International Twin Workshop, Boulder, CO What is power? What affects power? How do we calculate power? What is simulation?
More informationa. Explain whether George s line of fit is reasonable.
1 Algebra I Chapter 11 Test Review Standards/Goals: A.REI.10.: I can identify patterns that describe linear functions. o I can distinguish between dependent and independent variables. F.IF.3.: I can recognize
More informationRelating Graph to Matlab
There are two related course documents on the web Probability and Statistics Review -should be read by people without statistics background and it is helpful as a review for those with prior statistics
More informationSingle Sample Means. SOCY601 Alan Neustadtl
Single Sample Means SOCY601 Alan Neustadtl The Central Limit Theorem If we have a population measured by a variable with a mean µ and a standard deviation σ, and if all possible random samples of size
More informationWISE Power Tutorial Answer Sheet
ame Date Class WISE Power Tutorial Answer Sheet Power: The B.E.A.. Mnemonic Select true or false for each scenario: (Assuming no other changes) True False 1. As effect size increases, power decreases.
More informationSTAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015
STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis
More informationSimple linear regression: estimation, diagnostics, prediction
UPPSALA UNIVERSITY Department of Mathematics Mathematical statistics Regression and Analysis of Variance Autumn 2015 COMPUTER SESSION 1: Regression In the first computer exercise we will study the following
More informationThe Chi-Square Distributions
MATH 03 The Chi-Square Distributions Dr. Neal, Spring 009 The chi-square distributions can be used in statistics to analyze the standard deviation of a normally distributed measurement and to test the
More informationSTP 420 INTRODUCTION TO APPLIED STATISTICS NOTES
INTRODUCTION TO APPLIED STATISTICS NOTES PART - DATA CHAPTER LOOKING AT DATA - DISTRIBUTIONS Individuals objects described by a set of data (people, animals, things) - all the data for one individual make
More informationMetric Predicted Variable on One Group
Metric Predicted Variable on One Group Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here for more information. Prior Homework
More informationIntroduction to Matrix Algebra and the Multivariate Normal Distribution
Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate
More informationSummarizing Measured Data
Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,
More informationMeasures of Agreement
Measures of Agreement An interesting application is to measure how closely two individuals agree on a series of assessments. A common application for this is to compare the consistency of judgments of
More informationDesigning Information Devices and Systems II Fall 2018 Elad Alon and Miki Lustig Homework 9
EECS 16B Designing Information Devices and Systems II Fall 18 Elad Alon and Miki Lustig Homework 9 This homework is due Wednesday, October 31, 18, at 11:59pm. Self grades are due Monday, November 5, 18,
More informationThe Chi-Square Distributions
MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness
More informationBackground to Statistics
FACT SHEET Background to Statistics Introduction Statistics include a broad range of methods for manipulating, presenting and interpreting data. Professional scientists of all kinds need to be proficient
More information