Nearest-Neighbor Matchup Effects: Predicting March Madness

1 Nearest-Neighbor Matchup Effects: Predicting March Madness Andrew Hoegh, Marcos Carzolio, Ian Crandell, Xinran Hu, Lucas Roberts, Yuhyun Song, Scotland Leman Department of Statistics, Virginia Tech September 26, 2015

2 NCAA Bracket Competition
Who has filled out an NCAA bracket before?

3 NCAA Bracket Competition
Who has won an NCAA bracket competition?

4 NCAA Bracket Competition
Who has lost a bracket competition to someone who does not know what a basketball looks like?

5 Talk Overview
General modeling strategies for NCAA tournament competitions
Matchup effects modeling framework
Uncertainty inherent in NCAA tournament & competitions

6 Competition Types

7 Loss Functions: Bracket
$L(y, \hat{y}) = c\,\delta(y \neq \hat{y})$
$L(y, \hat{y}, r) = c_r\,\delta(y \neq \hat{y})$
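A minimal sketch of the two bracket losses, reading $\delta$ as an indicator that a pick was wrong (an assumption about the slide's notation). The function name `bracket_loss`, the placeholder team labels, and the round costs are hypothetical and chosen only for illustration.

```python
def bracket_loss(actual, picked, cost=1.0, round_costs=None):
    """Total loss: one cost per game whose picked winner differs from the actual winner.

    actual, picked: lists of (round_number, winner) pairs in the same game order.
    round_costs: optional dict round_number -> cost, giving the round-dependent version.
    """
    total = 0.0
    for (rnd, winner), (_, pick) in zip(actual, picked):
        if winner != pick:  # indicator of an incorrect pick
            total += round_costs[rnd] if round_costs is not None else cost
    return total


# Hypothetical three-game bracket fragment with placeholder team names.
actual = [(1, "A"), (1, "C"), (2, "A")]
picked = [(1, "B"), (1, "C"), (2, "C")]

flat_loss = bracket_loss(actual, picked)  # constant cost per miss
weighted_loss = bracket_loss(actual, picked, round_costs={1: 1.0, 2: 2.0})  # later rounds cost more
```

With round-dependent costs, a miss in a later round is penalized more heavily than an early miss, which is the usual structure of bracket pools.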

8 Loss Functions: Probabilistic Prediction
Kaggle loss function: $L(y, p) = -y \log(p) - (1 - y) \log(1 - p)$
This is a proper scoring rule.
[Figure: $L(y = 1, p)$ plotted against $p$]
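The log loss and the "proper scoring rule" property can be checked numerically. The sketch below is illustrative only: the clipping constant `eps`, the grid of candidate reports, and the true probability 0.7 are assumptions, not values from the talk.

```python
import math


def log_loss(y, p, eps=1e-15):
    """Loss for one game: y is 1 if the event happened, p is its predicted probability."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never occurs (a common convention)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def expected_loss(q, p):
    """Expected loss of reporting p when the true probability is q."""
    return q * log_loss(1, p) + (1 - q) * log_loss(0, p)


# Search a grid of candidate reports for the one minimizing expected loss at q = 0.7.
grid = [i / 100 for i in range(1, 100)]
best_report = min(grid, key=lambda p: expected_loss(0.7, p))
```

Under a proper scoring rule the expected loss is minimized by reporting the true probability, so the grid search should land on the value of `q` itself.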

9 Relevant Data: Team Characteristics
Many characteristics are useful for modeling the strength of an NCAA basketball team: winning percentage, point differential, strength of schedule, conference affiliation, ..., rebounding percentage, adjusted offensive efficiency, and adjusted defensive efficiency.

10 Relevant Data: Ratings and Rankings
Rather than using team characteristics directly, one can use one of the many available rating and ranking systems: ESPN BPI, Sagarin, RPI, Pomeroy, and Logistic Regression/Markov Chain (LRMC).

13 Relative Strength Models
Typically, the model framework for predicting the winners of games (or winning probabilities) can be formulated as a relative strength model. Formally, this can be expressed as
$y_{ij} = f(\theta_i, \theta_j)$,
such as a linear model,
$y_{ij} = \beta_{\text{home}} + (\theta_i - \theta_j)\beta_D + \epsilon_{ij}$, where $\epsilon_{ij} \sim N(0, \sigma^2)$,
or logistic regression,
$y_{ij} \sim \text{Bernoulli}(p_{ij})$, where $\text{logit}(p_{ij}) = \beta_{\text{home}} + (\theta_i - \theta_j)\beta_D$.
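A small sketch of both forms of the relative strength model. The `home` indicator (1 when team i is at home, 0 on a neutral court) is an assumption added so the home term can be switched off; all numeric values are hypothetical.

```python
import math


def linear_margin(theta_i, theta_j, beta_home, beta_d, home=1):
    """Expected point differential for team i over team j under the linear model."""
    return beta_home * home + (theta_i - theta_j) * beta_d


def logistic_win_prob(theta_i, theta_j, beta_home, beta_d, home=1):
    """Probability that team i beats team j under the logistic model."""
    eta = beta_home * home + (theta_i - theta_j) * beta_d
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical strengths and coefficients, neutral court.
p_ij = logistic_win_prob(90.0, 80.0, beta_home=0.3, beta_d=0.1, home=0)
p_ji = logistic_win_prob(80.0, 90.0, beta_home=0.3, beta_d=0.1, home=0)
```

On a neutral court the two probabilities are complementary, since only the sign of the strength difference changes.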

14 Transitivity
Relative strength models of this type are strictly transitive. Let $P_{A>B}$ denote the probability that team A beats team B. Under transitive models,
$\{P_{A>B} > 0.5 \cap P_{B>C} > 0.5\} \Rightarrow P_{A>C} > 0.5$.
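A short derivation of why transitivity holds, sketched for the logistic form under two assumptions not stated on the slide: games are on a neutral court (no home term) and $\beta_D > 0$.

```latex
\begin{align*}
P_{A>B} > 0.5 &\iff \operatorname{logit}(P_{A>B}) = (\theta_A - \theta_B)\beta_D > 0 \iff \theta_A > \theta_B, \\
P_{B>C} > 0.5 &\iff \theta_B > \theta_C, \\
\theta_A > \theta_B > \theta_C &\implies (\theta_A - \theta_C)\beta_D > 0 \implies P_{A>C} > 0.5 .
\end{align*}
```

Each team is summarized by a single number, so the ordering of win probabilities is inherited from the ordering of the strengths.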

15 Bayesian Relative Strength Models
Bayesian Linear Model
Prior: $\beta \sim N(\beta; m, s)$
Likelihood: $\beta \mid Y, X \sim N(\beta; \hat{\beta}, \Sigma)$
Posterior: $\beta \mid Y, X \sim N(\beta; \mu, \Sigma_\beta)$

18 Bayesian Relative Strength Models
Bayesian Linear Model
Predictive: $p(y^* \mid X^*, Y, X) = \int p(y^* \mid X^*, \beta)\, p(\beta \mid Y, X)\, d\beta$
$P(Y^* < 0) = \int_{-\infty}^{0} p(y^* \mid X^*, Y, X)\, dy^*$
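A simplified sketch of the prior-to-posterior-to-predictive chain. To stay short it assumes a single slope coefficient, no intercept, and a known noise variance; none of these simplifications come from the slides, and the data are hypothetical.

```python
from statistics import NormalDist


def posterior_slope(x, y, sigma2, prior_mean, prior_var):
    """Conjugate normal update for y_n = x_n * beta + noise, with noise ~ N(0, sigma2)."""
    precision = 1.0 / prior_var + sum(v * v for v in x) / sigma2
    mean = (prior_mean / prior_var + sum(a * b for a, b in zip(x, y)) / sigma2) / precision
    return mean, 1.0 / precision


def prob_negative_margin(x_new, post_mean, post_var, sigma2):
    """P(Y_new < 0) under the normal posterior predictive distribution."""
    pred_sd = (x_new ** 2 * post_var + sigma2) ** 0.5  # parameter plus noise uncertainty
    return NormalDist(mu=x_new * post_mean, sigma=pred_sd).cdf(0.0)


# Hypothetical rating differences and observed margins.
diffs = [-6.0, -2.0, 1.0, 4.0, 9.0]
margins = [-4.0, 3.0, 2.0, 10.0, 11.0]
mu_beta, var_beta = posterior_slope(diffs, margins, sigma2=121.0, prior_mean=0.0, prior_var=100.0)
p_upset = prob_negative_margin(5.0, mu_beta, var_beta, sigma2=121.0)
```

The predictive variance adds the posterior uncertainty in the coefficient to the game-level noise, which is what integrating over $\beta$ accomplishes in the normal case.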

19 Analytics for Bracket Competitions

20 Analytics for Kaggle-Style Competitions
Predicting upsets is a substantial challenge in predictive modeling of sports competitions, and a major discussion point. Consider predicting upsets as a comparison of the probabilistic predictions for competitors.

21 Analytics for Kaggle-Style Competitions
[Figure: Distribution of Predictions, with "Your Prediction" marked; axis: P(Upset)]

22 Matchup Effects
Motivating Quote
"Well, it's hard to predict a particular team. You have to look at the higher seed and see, do they overwhelm the smaller school or the lower seed? Can they overwhelm someone with their athleticism or length or size or quickness or speed? Do they play a particular style of the pressure defense? And I think I look at the lower seed then and say, can they counter the higher seed's strengths? Do they shoot the three point shot well? Do they have athleticism at certain positions? Do they play a particular style that will give the higher seed trouble?"
Andy Enfield

23 NNME Intuition
[Figure sequence over four slides; labels: Observed Results, Estimated Strength, Games]

27 Outline of the NNME
Estimation of the nearest-neighbor matchup effects has three components:
1. Fit a relative strength model,
2. Identify neighbors for the matchup, and
3. Calibrate the matchup adjustment.

29 Relative Strength Model
Consider the simple relative strength model using the Sagarin ratings and a home court effect:
$Y_{ij} = \beta_{\text{home}} + D_{ij}\beta_D + \epsilon_{ij}$, $\epsilon_{ij} \sim N(0, \sigma^2)$,
where $D_{ij} = \text{Sagarin}_i - \text{Sagarin}_j$.
Note that relative strength models of this type are strictly transitive.
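As a rough illustration of the first NNME step, the model can be fit by ordinary least squares. This is a stand-in for the Bayesian fit used in the talk, and it assumes every game is recorded from the home team's perspective so that the intercept plays the role of $\beta_{\text{home}}$. The data are hypothetical.

```python
def fit_relative_strength(rating_diff, margin):
    """Least-squares estimates of (beta_home, beta_D, sigma^2) for margin = beta_home + D * beta_D + noise."""
    n = len(rating_diff)
    d_bar = sum(rating_diff) / n
    y_bar = sum(margin) / n
    sxx = sum((d - d_bar) ** 2 for d in rating_diff)
    sxy = sum((d - d_bar) * (y - y_bar) for d, y in zip(rating_diff, margin))
    beta_d = sxy / sxx
    beta_home = y_bar - beta_d * d_bar
    residuals = [y - (beta_home + beta_d * d) for d, y in zip(rating_diff, margin)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # unbiased residual variance
    return beta_home, beta_d, sigma2


# Hypothetical rating differences (home minus away) and home-team margins.
rating_diff = [-6.0, -2.0, 1.0, 4.0, 9.0]
margin = [-4.0, 3.0, 2.0, 10.0, 11.0]
beta_home, beta_d, sigma2 = fit_relative_strength(rating_diff, margin)
```

The residuals from this fit (observed margin minus expected margin) are the raw material for the matchup adjustment later in the talk.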

30 Identifying Neighbors
Identifying neighbors first requires specifying team characteristics and a distance function between teams. We used a collection of Ken Pomeroy's data, including:
Effective Height
Adjusted Tempo
Effective Field Goal Percentage Defense
Offensive Rebound Percentage
Block Percentage
Steal Rate
Three Point Field Goal Contribution

31 Uniformly Weighting Characteristics

32 Bayesian Visual Analytics

34 Finding Neighbors
For each particular matchup, say Dayton vs. Stanford, identify past opponents to see who is most like the current opponent.
Teams Dayton played similar to Stanford: California, George Mason, Georgia Tech, George Washington, Gonzaga
Teams Stanford played similar to Dayton: Cal Poly, California, Oregon, Pittsburgh, Utah
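The neighbor search can be sketched as a nearest-neighbor lookup over standardized team characteristics with uniform weights. The Euclidean distance, the z-scoring, the helper names, and the two-feature placeholder data are all assumptions for illustration; the talk does not specify them here.

```python
import math
from statistics import mean, pstdev


def standardize(features):
    """features: dict team -> list of raw characteristics. Returns z-scored values per column."""
    cols = list(zip(*features.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return {t: [(v - m) / s for v, m, s in zip(vals, mus, sds)] for t, vals in features.items()}


def nearest_neighbors(target, past_opponents, z, k):
    """Return the k past opponents closest to `target` in Euclidean distance."""
    return sorted(past_opponents, key=lambda t: math.dist(z[t], z[target]))[:k]


# Placeholder teams with two made-up characteristics each.
features = {"T": [80, 65], "A": [79, 66], "B": [70, 60], "C": [81, 64], "D": [60, 75]}
z = standardize(features)
neighbors = nearest_neighbors("T", ["A", "B", "C", "D"], z, k=2)
```

Standardizing first keeps a characteristic measured on a large scale from dominating the distance, which is one way to read "uniformly weighting characteristics".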

35 Matchup Adjustment Notation
Let $R_j(i) = \frac{1}{K} \sum_{k \in N_j} (Y_{ik} - \mu_{ik})$, where $N_j$ are the neighbors for team $i$ with respect to team $j$, $Y_{ik}$ is the observed point differential between team $i$ and team $k$, and $\mu_{ik}$ is the expected point differential between team $i$ and team $k$.
Then $\phi_{ij} = \rho\,(R_i(j) - R_j(i))$, where $\rho$ controls how much information is passed from the neighbors.
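The two quantities on this slide reduce to a mean of residuals and a scaled difference. The sketch below keeps the order of the two terms as it appears on the slide; the sign convention for point differential is not spelled out there, so the direction of the shift should be treated as an assumption. Games are passed as pairs because a team may face the same neighbor more than once. All numbers are hypothetical.

```python
def mean_residual(games):
    """Average of (observed - expected) margin over games against neighbor opponents.

    games: list of (observed_margin, expected_margin) pairs, one per game.
    """
    return sum(y - mu for y, mu in games) / len(games)


def matchup_adjustment(first_term, second_term, rho):
    """phi = rho * (first_term - second_term), in the order the terms appear on the slide."""
    return rho * (first_term - second_term)


# Hypothetical residual histories against each side's neighbors.
r_a = mean_residual([(10.0, 4.0), (-2.0, 0.0), (5.0, 5.0)])
r_b = mean_residual([(1.0, 3.0), (-6.0, -2.0)])
phi = matchup_adjustment(r_a, r_b, rho=0.25)
```

A small `rho` shrinks the adjustment toward zero, so the plain relative strength model is recovered when the neighbors carry no information.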

36 Predictive Distribution
Then, using the relative strength model previously described, the predictive distribution becomes
$Y_{ij} \mid X_{ij} = \beta_{\text{home}} + D_{ij}\beta_D + \phi_{ij} + \epsilon_{ij}$.
Effectively, $\phi_{ij}$ shifts the predictive distribution and results in a non-transitive model.
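To see how the shift breaks transitivity, here is a plug-in sketch that treats the parameters as known and ignores posterior uncertainty (a simplification, not the talk's full predictive distribution). The three placeholder teams have equal ratings, and the matchup shifts are invented so that each team is favored over the next.

```python
from statistics import NormalDist


def win_prob(rating_diff, phi, beta_d, sigma2, beta_home=0.0):
    """P(Y_ij > 0) when Y_ij ~ N(beta_home + D_ij * beta_D + phi_ij, sigma2)."""
    mean = beta_home + rating_diff * beta_d + phi
    return 1.0 - NormalDist(mu=mean, sigma=sigma2 ** 0.5).cdf(0.0)


# Hypothetical: equal ratings on a neutral court, with a cyclic pattern of matchup shifts.
beta_d, sigma2 = 0.91, 121.0
p_a_over_b = win_prob(0.0, phi=2.0, beta_d=beta_d, sigma2=sigma2)
p_b_over_c = win_prob(0.0, phi=2.0, beta_d=beta_d, sigma2=sigma2)
p_c_over_a = win_prob(0.0, phi=2.0, beta_d=beta_d, sigma2=sigma2)
```

All three probabilities exceed 0.5, so A is favored over B, B over C, and C over A, a cycle that a pure relative strength model cannot produce.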

37 Estimating Model Parameters
The model parameters are estimated using data across NCAA tournaments from [years missing].
$\beta_{\text{home}}$: posterior mean [missing], credible interval (3.83, 3.91)
$\beta_D$: posterior mean [missing], credible interval (0.909, 0.916)
$\sigma^2$: posterior mean [missing], credible interval (120.0, 123.2)
$\rho$: posterior mean [missing], credible interval (0.012, 0.454)
The credible interval for $\rho$ lies entirely above zero, which suggests a moderate but meaningful matchup effect.

38 Demonstration: 2014 NCAA Tournament
Largest shifts in expected point spread ($\phi_{ij}$).
Columns: Team 1, Team 2, $R_1(2)$, $R_2(1)$, $\phi_{12}$ [numeric values missing]
Cal Poly, Wichita St
UConn, St. Joes
Dayton, Stanford
Dayton, Syracuse
Kentucky, Michigan
UMass, Tennessee
Memphis, Virginia
Michigan, Tennessee
Michigan, Texas
Syracuse, W.Mich

39 Closer Look: 2014 NCAA Tournament
Columns: Team, Neighbor, Point Diff., E[Point Diff], Residual [numeric values missing]
Dayton, California
Dayton, Gonzaga
Dayton, George Mason
Dayton, Georgia Tech
Dayton, George Washington
Stanford, California
Stanford, California
Stanford, Oregon
Stanford, Pittsburgh
Stanford, Cal Poly
Stanford, Utah

40 NNME Concluding Thoughts
1. Evaluation on a small number of data points leads to very uncertain outcomes, even with quality predictions.
2. Does this contest actually have a proper scoring rule?
3. Alternative strategies (maximizing expected return).
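The first point can be illustrated with a small simulation: even a forecaster who reports the true win probability for every game sees a wide range of average log-loss scores over a single tournament. The choice of 63 games (a 64-team single-elimination bracket), the uniform range of true probabilities, and the seed are assumptions made for this sketch.

```python
import math
import random


def simulate_scores(n_games=63, n_sims=2000, seed=0):
    """Average log loss per simulated tournament for a perfectly calibrated forecaster."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sims):
        total = 0.0
        for _ in range(n_games):
            p = rng.uniform(0.5, 0.95)        # true (and reported) win probability of the favorite
            y = 1 if rng.random() < p else 0  # simulated outcome
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        scores.append(total / n_games)
    return scores


scores = simulate_scores()
spread = max(scores) - min(scores)  # range of scores across simulated tournaments
```

Because the same predictions can score well or poorly depending on how a few dozen games fall, leaderboard position in a single year is a noisy signal of model quality.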

Lesson Plan. AM 121: Introduction to Optimization Models and Methods. Lecture 17: Markov Chains. Yiling Chen SEAS. Stochastic process Markov Chains

Lesson Plan. AM 121: Introduction to Optimization Models and Methods. Lecture 17: Markov Chains. Yiling Chen SEAS. Stochastic process Markov Chains AM : Introduction to Optimization Models and Methods Lecture 7: Markov Chains Yiling Chen SEAS Lesson Plan Stochastic process Markov Chains n-step probabilities Communicating states, irreducibility Recurrent

More information

A Better BCS. Rahul Agrawal, Sonia Bhaskar, Mark Stefanski

A Better BCS. Rahul Agrawal, Sonia Bhaskar, Mark Stefanski 1 Better BCS Rahul grawal, Sonia Bhaskar, Mark Stefanski I INRODUCION Most NC football teams never play each other in a given season, which has made ranking them a long-running and much-debated problem

More information

Colley s Method. r i = w i t i. The strength of the opponent is not factored into the analysis.

Colley s Method. r i = w i t i. The strength of the opponent is not factored into the analysis. Colley s Method. Chapter 3 from Who s # 1 1, chapter available on Sakai. Colley s method of ranking, which was used in the BCS rankings prior to the change in the system, is a modification of the simplest

More information

All-Time Conference Standings

All-Time Conference Standings All-Time Conference Standings Pac 12 Conference Conf. Matches Sets Overall Matches Team W L Pct W L Pct. Score Opp Last 10 Streak Home Away Neutral W L Pct. Arizona 6 5.545 22 19.537 886 889 6-4 W5 4-2

More information

Ranking using Matrices.

Ranking using Matrices. Anjela Govan, Amy Langville, Carl D. Meyer Northrop Grumman, College of Charleston, North Carolina State University June 2009 Outline Introduction Offense-Defense Model Generalized Markov Model Data Game

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM

Approximate Inference in Practice Microsoft s Xbox TrueSkill TM 1 Approximate Inference in Practice Microsoft s Xbox TrueSkill TM Daniel Hernández-Lobato Universidad Autónoma de Madrid 2014 2 Outline 1 Introduction 2 The Probabilistic Model 3 Approximate Inference

More information

arxiv: v1 [math.pr] 14 Dec 2016

arxiv: v1 [math.pr] 14 Dec 2016 Random Knockout Tournaments arxiv:1612.04448v1 [math.pr] 14 Dec 2016 Ilan Adler Department of Industrial Engineering and Operations Research University of California, Berkeley Yang Cao Department of Industrial

More information

Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs

Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs Approximate Bayesian binary and ordinal regression for prediction with structured uncertainty in the inputs Aleksandar Dimitriev Erik Štrumbelj Abstract We present a novel approach to binary and ordinal

More information

Bayesian locally-optimal design of knockout tournaments

Bayesian locally-optimal design of knockout tournaments Bayesian locally-optimal design of knockout tournaments Mark E. Glickman Department of Health Services Boston University School of Public Health Abstract The elimination or knockout format is one of the

More information

SEEDING, COMPETITIVE INTENSITY AND QUALITY IN KNOCK-OUT TOURNAMENTS

SEEDING, COMPETITIVE INTENSITY AND QUALITY IN KNOCK-OUT TOURNAMENTS Dmitry Dagaev, Alex Suzdaltsev SEEDING, COMPETITIVE INTENSITY AND QUALITY IN KNOCK-OUT TOURNAMENTS BASIC RESEARCH PROGRAM WORKING PAPERS SERIES: ECONOMICS WP BRP 91/EC/2015 This Working Paper is an output

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

The AGA Ratings System

The AGA Ratings System The AGA Ratings System Philip Waldron June 3, 00 Basic description The AGA ratings system uses a Bayesian approach where a player s rating rating is updated based on additional information that is available

More information

Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania (T) Michigan State University

Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania (T) Michigan State University Rank University AMJ AMR ASQ JAP OBHDP OS PPSYCH SMJ SUM 1 University of Pennsylvania 4 1 2 0 2 4 0 9 22 2(T) Michigan State University 2 0 0 9 1 0 0 4 16 University of Michigan 3 0 2 5 2 0 0 4 16 4 Harvard

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

12 Generalized linear models

12 Generalized linear models 12 Generalized linear models In this chapter, we combine regression models with other parametric probability models like the binomial and Poisson distributions. Binary responses In many situations, we

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Multivariate Analysis

Multivariate Analysis Multivariate Analysis Chapter 5: Cluster analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2015/2016 Master in Business Administration and

More information

Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density

Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density Time Series Prediction by Kalman Smoother with Cross-Validated Noise Density Simo Särkkä E-mail: simo.sarkka@hut.fi Aki Vehtari E-mail: aki.vehtari@hut.fi Jouko Lampinen E-mail: jouko.lampinen@hut.fi Abstract

More information

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data

A Bayesian Perspective on Residential Demand Response Using Smart Meter Data A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Datong-Paul Zhou, Maximilian Balandat, and Claire Tomlin University of California, Berkeley [datong.zhou, balandat, tomlin]@eecs.berkeley.edu

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff

Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff David Gerard Department of Statistics University of Washington gerard2@uw.edu May 2, 2013 David Gerard (UW)

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public

More information

Predictive Engineering and Computational Sciences. Research Challenges in VUQ. Robert Moser. The University of Texas at Austin.

Predictive Engineering and Computational Sciences. Research Challenges in VUQ. Robert Moser. The University of Texas at Austin. PECOS Predictive Engineering and Computational Sciences Research Challenges in VUQ Robert Moser The University of Texas at Austin March 2012 Robert Moser 1 / 9 Justifying Extrapolitive Predictions Need

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

A Discussion of the Bayesian Approach

A Discussion of the Bayesian Approach A Discussion of the Bayesian Approach Reference: Chapter 10 of Theoretical Statistics, Cox and Hinkley, 1974 and Sujit Ghosh s lecture notes David Madigan Statistics The subject of statistics concerns

More information

AMS-207: Bayesian Statistics

AMS-207: Bayesian Statistics Linear Regression How does a quantity y, vary as a function of another quantity, or vector of quantities x? We are interested in p(y θ, x) under a model in which n observations (x i, y i ) are exchangeable.

More information

Optimal Seedings in Elimination Tournaments

Optimal Seedings in Elimination Tournaments Optimal Seedings in Elimination Tournaments Christian Groh Department of Economics, University of Bonn Konrad-Adenauerallee 24-42, 53131 Bonn, Germany groh@wiwi.uni-bonn.de Benny Moldovanu Department of

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London

Bayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

Lecture 3: Multiclass Classification

Lecture 3: Multiclass Classification Lecture 3: Multiclass Classification Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Some slides are adapted from Vivek Skirmar and Dan Roth CS6501 Lecture 3 1 Announcement v Please enroll in

More information

Edexcel GCE Statistics S1 Advanced/Advanced Subsidiary

Edexcel GCE Statistics S1 Advanced/Advanced Subsidiary physicsandmathstutor.com Centre No. Candidate No. Paper Reference(s) 6683/01 Edexcel GCE Statistics S1 Advanced/Advanced Subsidiary Friday 20 May 2011 Afternoon Time: 1 hour 30 minutes Materials required

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

IE 4521 Midterm #1. Prof. John Gunnar Carlsson. March 2, 2010

IE 4521 Midterm #1. Prof. John Gunnar Carlsson. March 2, 2010 IE 4521 Midterm #1 Prof. John Gunnar Carlsson March 2, 2010 Before you begin: This exam has 9 pages (including the normal distribution table) and a total of 8 problems. Make sure that all pages are present.

More information

Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships

Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships Tom Gole * and Simon Quinn January 26, 2013 Abstract This document registers testable predictions about

More information

Announcements. CS 188: Artificial Intelligence Spring Mini-Contest Winners. Today. GamesCrafters. Adversarial Games

Announcements. CS 188: Artificial Intelligence Spring Mini-Contest Winners. Today. GamesCrafters. Adversarial Games CS 188: Artificial Intelligence Spring 2009 Lecture 7: Expectimax Search 2/10/2009 John DeNero UC Berkeley Slides adapted from Dan Klein, Stuart Russell or Andrew Moore Announcements Written Assignment

More information

Parameter Estimation. Industrial AI Lab.

Parameter Estimation. Industrial AI Lab. Parameter Estimation Industrial AI Lab. Generative Model X Y w y = ω T x + ε ε~n(0, σ 2 ) σ 2 2 Maximum Likelihood Estimation (MLE) Estimate parameters θ ω, σ 2 given a generative model Given observed

More information

Measuring Consensus in Binary Forecasts: NFL Game Predictions

Measuring Consensus in Binary Forecasts: NFL Game Predictions Measuring Consensus in Binary Forecasts: NFL Game Predictions ChiUng Song Science and Technology Policy Institute 26F., Specialty Construction Center 395-70 Shindaebang-dong, Tongjak-ku Seoul 156-714,

More information

A hybrid heuristic for minimizing weighted carry-over effects in round robin tournaments

A hybrid heuristic for minimizing weighted carry-over effects in round robin tournaments A hybrid heuristic for minimizing weighted carry-over effects Allison C. B. Guedes Celso C. Ribeiro Summary Optimization problems in sports Preliminary definitions The carry-over effects minimization problem

More information

Why a Bayesian researcher might prefer observational data

Why a Bayesian researcher might prefer observational data Why a Bayesian researcher might prefer observational data Macartan Humphreys May 2, 206 Abstract I give an illustration of a simple problem in which a Bayesian researcher can choose between random assignment

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Classification and Support Vector Machine

Classification and Support Vector Machine Classification and Support Vector Machine Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Multiclass Classification-1

Multiclass Classification-1 CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

players based on a rating system

players based on a rating system A Bayesian regression approach to handicapping tennis players based on a rating system Timothy C. Y. Chan 1 and Raghav Singal,2 1 Department of Mechanical and Industrial Engineering, University of Toronto

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Making rating curves - the Bayesian approach

Making rating curves - the Bayesian approach Making rating curves - the Bayesian approach Rating curves what is wanted? A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be on the

More information

R-squared for Bayesian regression models

R-squared for Bayesian regression models R-squared for Bayesian regression models Andrew Gelman Ben Goodrich Jonah Gabry Imad Ali 8 Nov 2017 Abstract The usual definition of R 2 (variance of the predicted values divided by the variance of the

More information

CS 4649/7649 Robot Intelligence: Planning

CS 4649/7649 Robot Intelligence: Planning CS 4649/7649 Robot Intelligence: Planning Probability Primer Sungmoon Joo School of Interactive Computing College of Computing Georgia Institute of Technology S. Joo (sungmoon.joo@cc.gatech.edu) 1 *Slides

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Naïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Naïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266

More information
