# 1 Using standard errors when comparing estimated values

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 5 s Some people incorrectly stated that the negative correlation evident between log λ and ϵ in the scatter plot of MCMC samples from the posterior was due to the correlation between successive samples when using a MCMC method. While it is correct to say that generally successive samples generated using a MCMC method will be correlated (often highly so), this is distinct from whether the variables of the target distribution we are trying to sample from have correlations between them. (a) i. (a) ii (b) i. (b) ii.. x.6.. x x (c) i (c) ii x (d) i (d) ii.. x.6.. x x x Figure : D sample scatter plots and between sample correlation plots for sets of independent and Metropolis random walked based samples from two zero-mean Gaussian distributions. (a) i. independent samples of a pair of zero mean unit variance independent Gaussian variables. (a) ii. correlations between successive x samples from (a) i. at different lags. (b) i. Random-walk Metropolis samples of a pair of zero mean unit variance independent Gaussian variables. (b) ii. correlations between successive x samples from (b) i. at different lags. (c) i. independent samples of a pair of zero mean unit variance Gaussian variables with correlation Σ =. (c) ii. correlations between successive x samples from (c) i. at different lags. (d) i. Random-walk Metropolis samples of a pair of zero mean unit variance Gaussian variables with correlation Σ =. (d) ii. correlations between successive x samples from (d) i. at different lags. In the sample plots blue points are individual samples while the red paths are the trajectories of set of successive samples (including rejected updates for Metropolis samples). This is illustrated in figure. Here we have plots visualising the samples from two different zero-mean bivariate Gaussian distributions - the top row for a bivariate Gaussian with covariance Σ = Σ = Σ = i.e. independent variables, and the bottom row for a bivariate Gaussian with covariance Σ = Σ = Σ = i.e. negatively correlated variables. The left two columns are based on samples drawn independently from the respective distributions. The right two columns are based on samples generated using a MCMC method (random-walk Metropolis updates with proposal width ). For each distribution / sampling method combination, both a scatter plot of samples is shown (i.) and a plot of the correlations measured between successive x samples at different lags (ii.). It can be seen here in panels (b) i. and (b) ii. that we can have a target distribution in which our beliefs about the variables are independent, but that successive samples from the MCMC method used are highly correlated. Conversely in panels (c) i and (c) ii we can see an example of a distribution in which we our beliefs about the variables are highly correlated but successive samples are not at all correlated. What we can see from panel (d) ii. however is that strong correlations between variables in the target distribution can lead to the successive MCMC samples

4 being more correlated (compare the decay of the correlations in (d) ii. to (b) ii.) - this is due to the random-walk Metropolis MCMC dynamic here being not very well suited to highly correlated distributions. Another correlation related issue that a few people ran into was the difference between correlation and statistical dependence. It was correctly pointed out by several students that if two variables are uncorrelated this does not necessarily mean they are statistically independent (unless they are jointly Gaussian). This caused some people to decide to try to assess if there was any dependence between log λ and ϵ under the posterior in question b by estimating the mutual information between them, which is a more general measure of the statistical dependence between two random variables. This is a valid approach to take however there are several caveats that need to be born in mind. Unlike correlation, estimating the mutual information between a pair of continuous random variables from a finite set of samples if very difficult, with estimators tending to be either very noisy or biased. Mutual information estimates are also a lot more expensive to compute than correlations and more care needs to be taken with their interpretation - ideally we would also calculate error bars on our estimate to allow us to decide more rigorously if the mutual information appears to be distinct from zero or not, it is not immediately obvious what a typical mutual information value between two dependent variables would be. Although a correlation coefficient estimate may be insufficient to declare two variables independent if it is indistinguishable from zero, if it is significantly different from zero this is sufficient to show statistical dependence between the two variables without the complications of invoking mutual information. 6 Transforming variables in probability density functions A relatively subtle error that was made by a number of people in question b when performing the slice sampling was to apply a non-linear transformation to the variables the posterior density was defined on without accounting for this change of variables correctly. In particular several people defined their log-posterior function in terms of λ rather than log λ and some used a = σ (a) instead of ϵ (slice sample expects that the arguments used in the function handle specified to calculate the target density are the parameters the density is defined on). The correction needed involves calculating the determinant of the Jacobian (matrix of first partial derivatives) of the transformation - there was an example of applying this in the Background Work tutorial sheet. What follows is a slightly mathematically involved explanation of why we need to account for a change of variables like this applied specifically to the problem here. In question b you were asked to sample from a posterior distribution defined by a probability density function, specifically p(w, ϵ, log λ data) = Z exp {L(w, ϵ)} λ D exp { λw T w } with Z being an (unknown) normalising constant. A probability density function implicitly defines probabilities via an integral over regions of the space the density is defined on [ P w A, ϵ B, l ] C data = p(w, ϵ, l data) dw dϵ dl here the (non-standard) notation is being used that characters with tildes ( ) above them represent random variables and for the sake of clarity l has been used to denote log λ. If we want to redefine the variables in a probability density and ensure that the probabilities that the random variables take values in arbitrary ranges stay the same we need to apply the usual change of variables formula used when making a substitution in an integral. For instance if we want to make the change of variables λ = exp(l) l = log λ inside the integral first we need to calculate the Jacobian of this transform l λ = λ and we also need to find the region in λ space to which corresponds to the region in l space being integrated over C = {exp l : l C} l C

5 This gives us that [ ] P w A, ϵ B, λ C data = = = p(w, ϵ, log λ data) l λ C λ dw dϵ dλ λ C Z exp {L(w, ϵ)} λ { D exp λw T w } dw dϵ dλ λ D exp {L(w, ϵ)} λ exp { λw T w } dw dϵ dλ λ C Z }{{} p (w,ϵ,λ data) and so we get that the new probability density function parametrised in terms of λ as opposed to log λ is p (w, ϵ, λ data) = D exp {L(w, ϵ)} λ exp { λw T w } Z Note that if D and so D D that this redefined density will be very similar to incorrect density found simply by swapping λ for log λ in the density above. This is why those who made an incorrect change of variables like this may have found their scatter plot looks visually indistinguishable from the correct version. An intuition here is that not correctly accounting for the change of variables by including the Jacobian term corresponds to using a different prior density on a parameter - in this case the prior on log λ has a very weak effect on the posterior density and the change to a different prior makes little difference. In general however not accounting for change of variables properly can lead to quite significant changes in the density being sampled from and so it is vital to always correctly apply the change of variables formula when redefining the variables in a density. 5

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

### Stochastic Processes

qmc082.tex. Version of 30 September 2010. Lecture Notes on Quantum Mechanics No. 8 R. B. Griffiths References: Stochastic Processes CQT = R. B. Griffiths, Consistent Quantum Theory (Cambridge, 2002) DeGroot

### Machine Learning

Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 13, 2011 Today: The Big Picture Overfitting Review: probability Readings: Decision trees, overfiting

### TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica. Final exam Logic & Set Theory (2IT61) (correction model)

TECHNISCHE UNIVERSITEIT EINDHOVEN Faculteit Wiskunde en Informatica Final exam Logic & Set Theory (2IT61) (correction model) Thursday November 4, 2016, 9:00 12:00 hrs. (2) 1. Determine whether the abstract

### 19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable

### L2: Review of probability and statistics

Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution

### 18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution

### Machine Learning 2017

Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

### Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes

Statistical Techniques in Robotics (16-831, F12) Lecture#21 (Monday November 12) Gaussian Processes Lecturer: Drew Bagnell Scribe: Venkatraman Narayanan 1, M. Koval and P. Parashar 1 Applications of Gaussian

### CPSC 540: Machine Learning

CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

### A-LEVEL STATISTICS. SS04 Report on the Examination June Version: 1.0

A-LEVEL STATISTICS SS04 Report on the Examination 6380 June 2016 Version: 1.0 Further copies of this Report are available from aqa.org.uk Copyright 2016 AQA and its licensors. All rights reserved. AQA

### Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size

Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski

### Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

### CSE 150. Assignment 6 Summer Maximum likelihood estimation. Out: Thu Jul 14 Due: Tue Jul 19

SE 150. Assignment 6 Summer 2016 Out: Thu Jul 14 ue: Tue Jul 19 6.1 Maximum likelihood estimation A (a) omplete data onsider a complete data set of i.i.d. examples {a t, b t, c t, d t } T t=1 drawn from

### Bayesian linear regression

Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding

### Probability and Statistics

Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

### Review of Probability Theory

Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

### Errors: What they are, and how to deal with them

Errors: What they are, and how to deal with them A series of three lectures plus exercises, by Alan Usher Room 111, a.usher@ex.ac.uk Synopsis 1) Introduction ) Rules for quoting errors 3) Combining errors

### Conditional Distributions

Conditional Distributions The goal is to provide a general definition of the conditional distribution of Y given X, when (X, Y ) are jointly distributed. Let F be a distribution function on R. Let G(,

### 1 Introduction. Systems 2: Simulating Errors. Mobile Robot Systems. System Under. Environment

Systems 2: Simulating Errors Introduction Simulating errors is a great way to test you calibration algorithms, your real-time identification algorithms, and your estimation algorithms. Conceptually, the

### THE SIMPLE PROOF OF GOLDBACH'S CONJECTURE. by Miles Mathis

THE SIMPLE PROOF OF GOLDBACH'S CONJECTURE by Miles Mathis miles@mileswmathis.com Abstract Here I solve Goldbach's Conjecture by the simplest method possible. I do this by first calculating probabilites

### Chapter 8: An Introduction to Probability and Statistics

Course S3, 200 07 Chapter 8: An Introduction to Probability and Statistics This material is covered in the book: Erwin Kreyszig, Advanced Engineering Mathematics (9th edition) Chapter 24 (not including

### Hierarchical Modeling for Univariate Spatial Data

Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This

### PERFECTLY secure key agreement has been studied recently

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 45, NO. 2, MARCH 1999 499 Unconditionally Secure Key Agreement the Intrinsic Conditional Information Ueli M. Maurer, Senior Member, IEEE, Stefan Wolf Abstract

### CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

### Bayesian Learning (II)

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

### Measurement: The Basics

I. Introduction Measurement: The Basics Physics is first and foremost an experimental science, meaning that its accumulated body of knowledge is due to the meticulous experiments performed by teams of

### Parametric Techniques

Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

### Why Bayesian? Rigorous approach to address statistical estimation problems. The Bayesian philosophy is mature and powerful.

Why Bayesian? Rigorous approach to address statistical estimation problems. The Bayesian philosophy is mature and powerful. Even if you aren t Bayesian, you can define an uninformative prior and everything

### Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

### Today. Calculus. Linear Regression. Lagrange Multipliers

Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject

### Machine Learning, Fall 2012 Homework 2

0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0

### !t + U " #Q. Solving Physical Problems: Pitfalls, Guidelines, and Advice

Solving Physical Problems: Pitfalls, Guidelines, and Advice Some Relations in METR 420 There are a relatively small number of mathematical and physical relations that we ve used so far in METR 420 or that

### Midterm, Fall 2003

5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

### Inference with few assumptions: Wasserman s example

Inference with few assumptions: Wasserman s example Christopher A. Sims Princeton University sims@princeton.edu October 27, 2007 Types of assumption-free inference A simple procedure or set of statistics

### Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

### GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

### Basic Probability space, sample space concepts and order of a Stochastic Process

The Lecture Contains: Basic Introduction Basic Probability space, sample space concepts and order of a Stochastic Process Examples Definition of Stochastic Process Marginal Distributions Moments Gaussian

### The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

### Midterm Exam Solutions, Spring 2007

1-71 Midterm Exam Solutions, Spring 7 1. Personal info: Name: Andrew account: E-mail address:. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you

### Bayesian Inference for Normal Mean

Al Nosedal. University of Toronto. November 18, 2015 Likelihood of Single Observation The conditional observation distribution of y µ is Normal with mean µ and variance σ 2, which is known. Its density

### An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

### Error analysis in biology

Error analysis in biology Marek Gierliński Division of Computational Biology Hand-outs available at http://is.gd/statlec Errors, like straws, upon the surface flow; He who would search for pearls must

### ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER. The Kalman Filter. We will be concerned with state space systems of the form

ECONOMETRIC METHODS II: TIME SERIES LECTURE NOTES ON THE KALMAN FILTER KRISTOFFER P. NIMARK The Kalman Filter We will be concerned with state space systems of the form X t = A t X t 1 + C t u t 0.1 Z t

### Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2017. Tom M. Mitchell. All rights reserved. *DRAFT OF September 16, 2017* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is

### STA 294: Stochastic Processes & Bayesian Nonparametrics

MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

### MCMC notes by Mark Holder

MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:

### Statistics for the LHC Lecture 1: Introduction

Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University

### Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing 1. Purpose of statistical inference Statistical inference provides a means of generalizing

### Fundamental Issues in Bayesian Functional Data Analysis. Dennis D. Cox Rice University

Fundamental Issues in Bayesian Functional Data Analysis Dennis D. Cox Rice University 1 Introduction Question: What are functional data? Answer: Data that are functions of a continuous variable.... say

Stat 200 The Bayesian Paradigm Friday March 2nd The Bayesian Paradigm can be seen in some ways as an extra step in the modelling world just as parametric modelling is. We have seen how we could use probabilistic

### Partial factor modeling: predictor-dependent shrinkage for linear regression

modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework

### Metropolis-Hastings Algorithm

Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

### THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION

THE ROLE OF COMPUTER BASED TECHNOLOGY IN DEVELOPING UNDERSTANDING OF THE CONCEPT OF SAMPLING DISTRIBUTION Kay Lipson Swinburne University of Technology Australia Traditionally, the concept of sampling

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

### Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

### Uni- and Bivariate Power

Uni- and Bivariate Power Copyright 2002, 2014, J. Toby Mordkoff Note that the relationship between risk and power is unidirectional. Power depends on risk, but risk is completely independent of power.

### Optimal Sequencing of Inspection of Dependent Characteristics 1

JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 103, No. 3, pp. 557-565, DECEMBER 1999 Optimal Sequencing of Inspection of Dependent Characteristics 1 S. O. DUFFUUA 2 AND S. M. POLLOCK 3 Communicated

### MCMC algorithms for fitting Bayesian models

MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models

### PHY 123 Lab 1 - Error and Uncertainty and the Simple Pendulum

To print higher-resolution math symbols, click the Hi-Res Fonts for Printing button on the jsmath control panel. PHY 13 Lab 1 - Error and Uncertainty and the Simple Pendulum Important: You need to print

### MA554 Assessment 1 Cosets and Lagrange s theorem

MA554 Assessment 1 Cosets and Lagrange s theorem These are notes on cosets and Lagrange s theorem; they go over some material from the lectures again, and they have some new material it is all examinable,

### Multivariate random variables

DS-GA 002 Lecture notes 3 Fall 206 Introduction Multivariate random variables Probabilistic models usually include multiple uncertain numerical quantities. In this section we develop tools to characterize

### Bayesian PalaeoClimate Reconstruction from proxies:

Bayesian PalaeoClimate Reconstruction from proxies: Framework Bayesian modelling of space-time processes General Circulation Models Space time stochastic process C = {C(x,t) = Multivariate climate at all

### An Adaptive Bayesian Network for Low-Level Image Processing

An Adaptive Bayesian Network for Low-Level Image Processing S P Luttrell Defence Research Agency, Malvern, Worcs, WR14 3PS, UK. I. INTRODUCTION Probability calculus, based on the axioms of inference, Cox

### Linear Models for Regression CS534

Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

### Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Central Limit Theorem and the Law of Large Numbers Class 6, 8.5 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand the statement of the law of large numbers. 2. Understand the statement of the

### 2 Belief, probability and exchangeability

2 Belief, probability and exchangeability We first discuss what properties a reasonable belief function should have, and show that probabilities have these properties. Then, we review the basic machinery

### SAMSI Astrostatistics Tutorial. More Markov chain Monte Carlo & Demo of Mathematica software

SAMSI Astrostatistics Tutorial More Markov chain Monte Carlo & Demo of Mathematica software Phil Gregory University of British Columbia 26 Bayesian Logical Data Analysis for the Physical Sciences Contents:

### The Inductive Proof Template

CS103 Handout 24 Winter 2016 February 5, 2016 Guide to Inductive Proofs Induction gives a new way to prove results about natural numbers and discrete structures like games, puzzles, and graphs. All of

### Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Confidence Intervals 1) What are confidence intervals? Simply, an interval for which we have a certain confidence. For example, we are 90% certain that an interval contains the true value of something

### Bayesian Concept Learning

Learning from positive and negative examples Bayesian Concept Learning Chen Yu Indiana University With both positive and negative examples, it is easy to define a boundary to separate these two. Just with

### Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

### Atutorialonactivelearning

Atutorialonactivelearning Sanjoy Dasgupta 1 John Langford 2 UC San Diego 1 Yahoo Labs 2 Exploiting unlabeled data Alotofunlabeleddataisplentifulandcheap,eg. documents off the web speech samples images

### Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

### y Xw 2 2 y Xw λ w 2 2

CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:

### Linear Models for Regression CS534

Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

### Relevance Vector Machines

LUT February 21, 2011 Support Vector Machines Model / Regression Marginal Likelihood Regression Relevance vector machines Exercise Support Vector Machines The relevance vector machine (RVM) is a bayesian

### Bayesian Updating with Continuous Priors Class 13, Jeremy Orloff and Jonathan Bloom

Bayesian Updating with Continuous Priors Class 3, 8.05 Jeremy Orloff and Jonathan Bloom Learning Goals. Understand a parameterized family of distributions as representing a continuous range of hypotheses

### Bayesian inference: what it means and why we care

Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)

### MATH2206 Prob Stat/20.Jan Weekly Review 1-2

MATH2206 Prob Stat/20.Jan.2017 Weekly Review 1-2 This week I explained the idea behind the formula of the well-known statistic standard deviation so that it is clear now why it is a measure of dispersion

### Ensemble Data Assimilation and Uncertainty Quantification

Ensemble Data Assimilation and Uncertainty Quantification Jeff Anderson National Center for Atmospheric Research pg 1 What is Data Assimilation? Observations combined with a Model forecast + to produce

### Expected Value II. 1 The Expected Number of Events that Happen

6.042/18.062J Mathematics for Computer Science December 5, 2006 Tom Leighton and Ronitt Rubinfeld Lecture Notes Expected Value II 1 The Expected Number of Events that Happen Last week we concluded by showing

### Regularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline

Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function

### Kernel-based density. Nuno Vasconcelos ECE Department, UCSD

Kernel-based density estimation Nuno Vasconcelos ECE Department, UCSD Announcement last week of classes we will have Cheetah Day (exact day TBA) what: 4 teams of 6 people each team will write a report

### Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 5 (MWF) Probabilities and the rules Suhasini Subba Rao Review of previous lecture We looked

### Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

### Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

### Nonparametric Drift Estimation for Stochastic Differential Equations

Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,

### Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

### Homework 2 Solutions Kernel SVM and Perceptron

Homework 2 Solutions Kernel SVM and Perceptron CMU 1-71: Machine Learning (Fall 21) https://piazza.com/cmu/fall21/17115781/home OUT: Sept 25, 21 DUE: Oct 8, 11:59 PM Problem 1: SVM decision boundaries

### Technical Details about the Expectation Maximization (EM) Algorithm

Technical Details about the Expectation Maximization (EM Algorithm Dawen Liang Columbia University dliang@ee.columbia.edu February 25, 2015 1 Introduction Maximum Lielihood Estimation (MLE is widely used

### Expectations and Variance

4. Model parameters and their estimates 4.1 Expected Value and Conditional Expected Value 4. The Variance 4.3 Population vs Sample Quantities 4.4 Mean and Variance of a Linear Combination 4.5 The Covariance

### Data Preprocessing. Cluster Similarity

1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M

### Unitary Dynamics and Quantum Circuits

qitd323 Unitary Dynamics and Quantum Circuits Robert B. Griffiths Version of 20 January 2014 Contents 1 Unitary Dynamics 1 1.1 Time development operator T.................................... 1 1.2 Particular

### Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined