CS 395T: Computational Statistics with Application to Bioinformatics


Slide 1

CS 395T: Computational Statistics with Application to Bioinformatics
Instructor: Prof. William H. Press
Spring Term, 2008
The University of Texas at Austin

Lecture 14

Slide 2

Last lecture, we
- Explored the use of the permutation test for the significance of contingency tables
  - requires expanding the contingency table to a dataset
  - combines nicely with compression of ordinal data [example: maternal drinking]
    - e.g., test difference of means, or of higher moments
    - if you test multiple moments, think about multiple hypothesis correction, but don't necessarily do it
  - or use the permutation test directly on (nominal) contingency tables, in which case you have to pick a statistic; Wald or Pearson are fine (a sketch of this follows the list)
- Contrasted the permutation test to bootstrap resampling
  - one breaks the causal connection in the data, the other doesn't
  - one lives in the null-hypothesis world, the other in the (estimated) true population
  - therefore, one addresses false positives (p-values), the other, false negatives
- Observed that, on (nominal) contingency tables, the permutation test samples exactly the null population of Fisher's Exact Test
  - preserves all marginals
  - gives an easy Monte Carlo way to compute Fisher's Exact Test on contingency tables of any size
- Saw that even if we can easily compute it, Fisher's Exact Test remains problematic
  - discrete values, and data always lands on a tie
  - the 2-tailed test is fragile to irrelevant number-theoretic coincidences
  - and it still doesn't properly model an actual protocol!
- Decided, as good Bayesians, to marginalize over (by sampling from) the unknown column and row probabilities
  - we use the Dirichlet, the conjugate distribution to the multinomial
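As a concrete reminder of that Wald-or-Pearson point, here is a minimal sketch (not from the slides) of the permutation test applied directly to a nominal contingency table, using the 2x2 table from the next slide and the pearson() statistic function defined later in this unit: expand the table to one row label and one column label per individual, permute one set of labels, and re-tabulate.

% minimal sketch, assuming the 2x2 table from the next slide and a pearson.m on the path
table = [8 3; 16 26];
rowlab = []; collab = [];
for row = 1:size(table,1)
    for col = 1:size(table,2)
        rowlab = [rowlab; repmat(row, table(row,col), 1)];   % one row label per individual
        collab = [collab; repmat(col, table(row,col), 1)];   % one column label per individual
    end
end
nperm = 10000;
samps = zeros(1,nperm);
for k = 1:nperm
    shuffled = rowlab(randperm(numel(rowlab)));        % break any row-column association
    permtab = accumarray([shuffled, collab], 1, size(table));
    samps(k) = pearson(permtab);                       % statistic under the permutation null
end
pval = mean(samps >= pearson(table))

Note that this permutation null preserves both sets of marginals, which is exactly the Fisher's-Exact-Test population mentioned above.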

Slide 3

In case it's not obvious that sampling over q is the same as marginalizing over it, here's a formal proof:

We generate a deviate pair (f, q) by choosing a q, and then, independently, an f given q, so

    p(f, q) = p(q) p(f|q)

We then ignore q, so our f is drawn from the distribution

    p(f) = ∫ p(f, q) dq = ∫ p(q) p(f|q) dq

which is the same as the desired marginalization

    p(f) = ∫ p(f|q) p(q) dq

(This is so close to self-evident that I'm not sure the proof adds anything!)
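This is not on the slide, but a tiny numerical check of the argument, with an arbitrary Beta/Binomial choice for p(q) and p(f|q): draw q, then f given q, ignore q, and compare the histogram of f against the analytic marginal (here a beta-binomial).

% hypothetical choice of distributions, just to illustrate that sampling = marginalizing
a = 3; b = 5; n = 20; nsamp = 100000;
q = betarnd(a, b, nsamp, 1);                    % q ~ p(q)
f = binornd(n, q);                              % f ~ p(f|q), one f per sampled q
empir = accumarray(f+1, 1, [n+1 1]) / nsamp;    % histogram of f, with q ignored
k = (0:n)';
analytic = exp(gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
    + betaln(k+a, n-k+b) - betaln(a, b));       % beta-binomial: the integral of p(f|q) p(q) dq
bar(0:n, [empir analytic])                      % the two sets of bars should agree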

Slide 4

So let's reanalyze assuming that the condition (column) marginals were fixed by the protocol, and we Bayes-sample the row probabilities. (You'll never believe my sampling function unless I go through an example!)

>> table = [8 3; 16 26]
>> marfix = sum(table,1)'                 % column marginals (transposed): [24; 29]
>> marvar = sum(table,2)'                 % row marginals (transposed): [11 42]
>> gammas = gamrnd(marvar+1,1)            % ~ Gamma(n_i + 1)
>> q = gammas ./ sum(gammas)              % generated (random) row probabilities q_i
>> qmat = repmat(q,[size(table,2),1])
>> tabout = mnrnd(marfix,qmat)'           % finally, we generate multinomial deviates for each
                                          % column, using the generated row probabilities

The reason everything is done in the transpose is because of the way that Matlab's mnrnd function expects its arguments to be shaped. Sorry about that!

Slide 5

Encapsulate the sampling process into a function. Then generate a bunch of samples and look at their Wald statistics.

function tabout = tabnullsamp(tabin)
marfix = sum(tabin,1)';
marvar = sum(tabin,2)';
q = gamrnd(marvar+1,1);
qmat = repmat(q./sum(q),[size(tabin,2),1]);
tabout = mnrnd(marfix,qmat)';

wald(table)
tabnullsamp(table), tabnullsamp(table)
wald(tabnullsamp(table))
samps = arrayfun(@(x) wald(tabnullsamp(table)), 1:30000);
hist(samps,-4:.05:4)
cdfplot(samps)
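The wald function here is carried over from the previous unit and is not shown in this lecture. A plausible 2x2 version, assuming the usual signed difference-of-column-proportions form (consistent with the -4 to 4 histogram range used above), might look like this; treat it as a guess, not the lecture's definition.

% hedged sketch of a Wald statistic for a 2x2 table with the conditions in the columns
function t = wald(table)
n = sum(table,1);                                     % the two column totals
p = table(1,:) ./ n;                                  % proportion of row 1 under each condition
se = sqrt(p(1)*(1-p(1))/n(1) + p(2)*(1-p(2))/n(2));   % standard error of the difference
t = (p(1) - p(2)) / se;                               % signed, roughly unit-normal under the null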

Slide 6

There are still discreteness effects (after all, these are integer tables), but they are less troubling:

pval = numel(samps(samps>=wald(table)))/numel(samps)
pvaltt = (numel(samps(samps>=wald(table))) + numel(samps(samps<= -wald(table))))/numel(samps)

The one-tail vs. two-tail comparison is now much more reasonable. This is probably the most honest answer that we can get for the significance of this particular contingency table.

Slide 7

Let's reanalyze the maternal drinking data using the same methodology, but (just for variety) with the Pearson statistic:

function chis = pearson(table)
nhtable = sum(table,2)*sum(table,1)/sum(sum(table));
chis = sum(sum((table-nhtable).^2 ./ nhtable));

table = [ ; ]'
pearson(table)

The table is transposed to make the unfixed marginals be the rows, as before (this is a guess as to what the protocol actually was!); the columns are the conditions.

samps = arrayfun(@(x) pearson(tabnullsamp(table)), 1:10000);
hist(samps,0:0.25:90)

Slide 8

Giving the results:

pval = numel(samps(samps>pearson(table)))/numel(samps)

Why is this less significant than the previous analysis, which gave p = .015 or .011 (mean or square-mean)? Because here we didn't use the fact that the factors were ordinal and quantitatively related. By virtue of using that information, and compressing the rows, the previous method was more powerful.

Slide 9

Actually, it seems likely that this data was a cross-sectional study with no fixed marginals. If so, a better sampling of the null hypothesis would be:

function tabout = tabnullsamp2(tabin)
marcol = sum(tabin,1);
marrow = sum(tabin,2);
ntot = sum(marcol);
q = gamrnd(marcol+1,1); q = q./sum(q);
p = gamrnd(marrow+1,1); p = p./sum(p);
pq = p * q;
tabout = reshape(mnrnd(ntot,pq(:)'),size(tabin));

samps = arrayfun(@(x) pearson(tabnullsamp2(table)), 1:10000);
hist(samps,0:0.25:90)
pval = numel(samps(samps>pearson(table)))/numel(samps)

The p-value is different from the previous one: protocol matters! This would probably be the honest answer if the table were nominal, not ordinal. But, as said, since it is ordinal, the previous analysis using difference of means is more powerful.

You now know all you need to know about contingency tables and much more than almost everyone who uses them!
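A quick check, not on the slide, that the two samplers really do encode different protocols: tabnullsamp holds the column marginals of its input fixed, while tabnullsamp2 holds only the grand total fixed.

t1 = tabnullsamp(table);
t2 = tabnullsamp2(table);
[sum(table,1); sum(t1,1); sum(t2,1)]      % column sums: first two rows agree, third varies
[sum(table(:)) sum(t1(:)) sum(t2(:))]     % grand totals: all three agree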

Slide 10

New topic: the Multivariate Normal Distribution
- Generalizes the Normal (Gaussian) to M dimensions
- Like the 1-d Gaussian, it is completely defined by its mean and (co-)variance
- The mean is an M-vector; the covariance is an M x M matrix
- Because the mean and covariance are easy to estimate from a data set, it is easy (perhaps too easy) to fit a multivariate normal distribution to data:

    μ = ⟨x⟩        Σ = ⟨(x − μ)(x − μ)^T⟩

Sample the sizes of the 1st and 2nd introns for 1000 genes:

g = readgenestats('genestats.dat');
ggg = g(g.ne>2,:);
which = randsample(size(ggg,1),1000);
iilen = ggg.intronlen(which);
i1llen = zeros(1,numel(which)); i2llen = zeros(1,numel(which));
for j=1:numel(i1llen), i1llen(j) = log10(iilen{j}(1)); end;
for j=1:numel(i2llen), i2llen(j) = log10(iilen{j}(2)); end;
plot(i1llen,i2llen,'+')
hold on

Slide 11

This is kind of fun, because it's not just the usual featureless scatter plot; notice the biology! Is there a significant correlation here? (Yes, we'll see.) (Do you think the correlation could be only from the biological censoring of low values, and not be a real association?)

Slide 12

Biological digression: the hard lower bounds on intron length are because the intron has to fit around the big spliceosome machinery! It's all carefully arranged to allow exons of any length, even quite small. Why? Could the spliceosome have evolved to require a minimum exon length, too? Are we seeing chance early history, or selection?

credit: Alberts et al., Molecular Biology of the Cell

Slide 13

We can easily sample from the fitted multivariate Gaussian:

mu = mean([i1llen; i2llen],2)
sig = cov(i1llen,i2llen)
rsamp = mvnrnd(mu,sig,1000);
plot(rsamp(:,1),rsamp(:,2),'or')
hold off

By the way, if renormalized, the covariance matrix is the linear correlation matrix, and Matlab has built-ins for the correlation and its significance:

r = sig ./ sqrt(diag(sig) * diag(sig)')
tval = sqrt(numel(iilen))*r        % statistical significance of the correlation in standard deviations (but note: uses the CLT)
[rr p] = corrcoef(i1llen,i2llen)
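A hedged follow-up, not on the slide: to turn that standard-deviations number into a two-tailed p-value under the same CLT approximation, pass its off-diagonal element through the normal CDF; with 1000 points it should land very close to the p reported by corrcoef.

z = tval(1,2);                       % correlation significance in standard deviations
ptail = 2 * (1 - normcdf(abs(z)))    % two-tailed CLT p-value; compare with p from corrcoef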

Slide 14

How to generate multivariate normal deviates: Cholesky-factor the covariance matrix, Σ = L L^T. If z is a vector of independent unit-normal deviates, then L z has covariance ⟨L z z^T L^T⟩ = L L^T = Σ. So, just take

    x = μ + L z

and you get a multivariate normal deviate!
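A minimal MATLAB sketch of this recipe, reusing the mu and sig fitted a few slides back (the sample count and plot color are arbitrary choices); in distribution it is equivalent to the earlier mvnrnd call.

L = chol(sig, 'lower');                 % sig = L*L'
nsamp = 1000;
z = randn(2, nsamp);                    % independent unit-normal deviates
x = repmat(mu, 1, nsamp) + L * z;       % each column is a multivariate normal deviate
plot(x(1,:), x(2,:), '.g')              % should look like the mvnrnd scatter above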

Slide 15

A related, useful Cholesky trick is to draw error ellipses (ellipsoids, ...). The locus of points at 1 standard deviation is x = μ + L z, where z is on the unit circle (sphere, ...):

function [x y] = errorellipse(mu,sigma,stdev,n)
L = chol(sigma,'lower');
circle = [cos(2*pi*(0:n)/n); sin(2*pi*(0:n)/n)].*stdev;
ellipse = L*circle + repmat(mu,[1,n+1]);
x = ellipse(1,:);
y = ellipse(2,:);

plot(i1llen,i2llen,'+b'); hold on
[xx yy] = errorellipse(mu,sig,1,100); plot(xx,yy,'-r');
[xx yy] = errorellipse(mu,sig,2,100); plot(xx,yy,'-r')

Slide 16

A few slides back, I put out the red herring that there could be a correlation r that wasn't a real association but instead came from weirdness in the individual distributions. I was being a provocateur. It can't be so:

1. The theorem that r is ~normal (around zero) for independent distributions is a CLT and doesn't depend on the individual distributions (as long as they have suitably convergent moments and the number of data points is large).
2. It's clear for our example data if you plot even one random permutation:

plot(i1llen,i2llen,'+b')
hold on
plot(i1llen(randperm(numel(i1llen))),i2llen,'+r');

3. In case of any lingering doubt, you could use brute force: do the permutation test and plot the histogram of r values. You'll recover the CLT result (a sketch of this follows below).
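Here is a minimal sketch of the brute-force option in point 3, using the intron data already in hand: permute one variable, recompute r each time, and check that the spread of the permuted r values matches the CLT prediction of roughly 1/sqrt(N).

nperm = 10000;
rperm = zeros(1,nperm);
for k = 1:nperm
    cc = corrcoef(i1llen(randperm(numel(i1llen))), i2llen);
    rperm(k) = cc(1,2);                 % r for one permuted (hence independent) pairing
end
hist(rperm, 50)                         % centered on zero, approximately normal
std(rperm) * sqrt(numel(i1llen))        % should be close to 1, as the CLT predicts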
