Predicting Winners of Competitive Events with Topological Data Analysis

Similar documents
7.3 The Jacobi and Gauss-Seidel Iterative Methods

Lecture 6. Notes on Linear Algebra. Perceptron

Lecture No. 1 Introduction to Method of Weighted Residuals. Solve the differential equation L (u) = p(x) in V where L is a differential operator

Lecture 3. Linear Regression

Worksheets for GCSE Mathematics. Quadratics. mr-mathematics.com Maths Resources for Teachers. Algebra

Classical RSA algorithm

When GIS meets LUTI: Enhanced version of the MARS simulation model through local accessibility coefficients

Lecture 11. Kernel Methods

Chapter 5: Spectral Domain From: The Handbook of Spatial Statistics. Dr. Montserrat Fuentes and Dr. Brian Reich Prepared by: Amanda Bell

(1) Introduction: a new basis set

Combinatorial Laplacian and Rank Aggregation

Radial Basis Function (RBF) Networks

Grover s algorithm. We want to find aa. Search in an unordered database. QC oracle (as usual) Usual trick

PHL424: Nuclear Shell Model. Indian Institute of Technology Ropar

A Posteriori Error Estimates For Discontinuous Galerkin Methods Using Non-polynomial Basis Functions

Property Testing and Affine Invariance Part I Madhu Sudan Harvard University

Lecture No. 5. For all weighted residual methods. For all (Bubnov) Galerkin methods. Summary of Conventional Galerkin Method

Work, Energy, and Power. Chapter 6 of Essential University Physics, Richard Wolfson, 3 rd Edition

Algebraic Codes and Invariance

General Strong Polarization

Worksheets for GCSE Mathematics. Algebraic Expressions. Mr Black 's Maths Resources for Teachers GCSE 1-9. Algebra

Data Preprocessing. Jilles Vreeken IRDM 15/ Oct 2015

Lecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher

Uncertain Compression & Graph Coloring. Madhu Sudan Harvard

Review for Exam Hyunse Yoon, Ph.D. Assistant Research Scientist IIHR-Hydroscience & Engineering University of Iowa

Hopfield Network for Associative Memory

General Strong Polarization

Solving Fuzzy Nonlinear Equations by a General Iterative Method

SECTION 7: FAULT ANALYSIS. ESE 470 Energy Distribution Systems

Numerical Solution of Fredholm and Volterra Integral Equations of the First Kind Using Wavelets bases

Math, Stats, and Mathstats Review ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

Module 7 (Lecture 25) RETAINING WALLS

A A A A A A A A A A A A. a a a a a a a a a a a a a a a. Apples taste amazingly good.

Support Vector Machines. CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

1. The graph of a function f is given above. Answer the question: a. Find the value(s) of x where f is not differentiable. Ans: x = 4, x = 3, x = 2,

Secondary 3H Unit = 1 = 7. Lesson 3.3 Worksheet. Simplify: Lesson 3.6 Worksheet

Locality in Coding Theory

Estimate by the L 2 Norm of a Parameter Poisson Intensity Discontinuous

Lecture 10: Logistic Regression

Thermodynamic Cycles

Introduction to Density Estimation and Anomaly Detection. Tom Dietterich

HISTORY. In the 70 s vs today variants. (Multi-layer) Feed-forward (perceptron) Hebbian Recurrent

Combinatorial Hodge Theory and a Geometric Approach to Ranking

Real-Time Weather Hazard Assessment for Power System Emergency Risk Management

Heteroskedasticity ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

(2) Orbital angular momentum

Lesson 24: True and False Number Sentences

Angular Momentum, Electromagnetic Waves

Terms of Use. Copyright Embark on the Journey

CSC 578 Neural Networks and Deep Learning

2.4 Error Analysis for Iterative Methods

ECE 6540, Lecture 06 Sufficient Statistics & Complete Statistics Variations

Haar Basis Wavelets and Morlet Wavelets

Independent Component Analysis and FastICA. Copyright Changwei Xiong June last update: July 7, 2016

Variations. ECE 6540, Lecture 02 Multivariate Random Variables & Linear Algebra

Mathematical Methods 2019 v1.2

Advanced data analysis

International Journal of Mathematical Archive-5(3), 2014, Available online through ISSN

ON CALCULATION OF EFFECTIVE ELASTIC PROPERTIES OF MATERIALS WITH CRACKS

Graph Helmholtzian and Rank Learning

Number Representations

Crash course Verification of Finite Automata Binary Decision Diagrams

A Step Towards the Cognitive Radar: Target Detection under Nonstationary Clutter

Lesson 2. Homework Problem Set Sample Solutions S.19

Rotational Motion. Chapter 10 of Essential University Physics, Richard Wolfson, 3 rd Edition

PHY103A: Lecture # 4

Dan Roth 461C, 3401 Walnut

Gravitation. Chapter 8 of Essential University Physics, Richard Wolfson, 3 rd Edition

Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo

Conharmonically Flat Vaisman-Gray Manifold

The Arithmetic Mean Solver in Lagged Diffusivity Method for Nonlinear Diffusion Equations

Chapter 22 : Electric potential

Inferring the origin of an epidemic with a dynamic message-passing algorithm

MATH 1080: Calculus of One Variable II Fall 2018 Textbook: Single Variable Calculus: Early Transcendentals, 7e, by James Stewart.

EE 424 Introduction to Optimization Techniques

Jasmin Smajic1, Christian Hafner2, Jürg Leuthold2, March 23, 2015

Analytics Software. Beyond deterministic chain ladder reserves. Neil Covington Director of Solutions Management GI

Math 3 Unit 3: Polynomial Functions

Promotion Time Cure Rate Model with Random Effects: an Application to a Multicentre Clinical Trial of Carcinoma

Review for Exam Hyunse Yoon, Ph.D. Adjunct Assistant Professor Department of Mechanical Engineering, University of Iowa

Elastic light scattering

Quantum state measurement

" = Y(#,$) % R(r) = 1 4& % " = Y(#,$) % R(r) = Recitation Problems: Week 4. a. 5 B, b. 6. , Ne Mg + 15 P 2+ c. 23 V,

Rank Aggregation via Hodge Theory

Statistical ranking problems and Hodge decompositions of graphs and skew-symmetric matrices

Gradient expansion formalism for generic spin torques

Non-Parametric Weighted Tests for Change in Distribution Function

CS145: INTRODUCTION TO DATA MINING

Embedded 5(4) Pair Trigonometrically-Fitted Two Derivative Runge- Kutta Method with FSAL Property for Numerical Solution of Oscillatory Problems

International Journal of Mathematical Archive-5(12), 2014, Available online through ISSN

Quantum Mechanics. An essential theory to understand properties of matter and light. Chemical Electronic Magnetic Thermal Optical Etc.

STAT2201 Slava Vaisman

Low-Degree Polynomials

Modeling of a non-physical fish barrier

Radiation. Lecture40: Electromagnetic Theory. Professor D. K. Ghosh, Physics Department, I.I.T., Bombay

10.1 Three Dimensional Space

Math 3 Unit 3: Polynomial Functions

CHAPTER 5 Wave Properties of Matter and Quantum Mechanics I

Yang-Hwan Ahn Based on arxiv:

Online Mode Shape Estimation using Complex Principal Component Analysis and Clustering, Hallvar Haugdal (NTMU Norway)

Transcription:

Predicting Winners of Competitive Events with Topological Data Analysis Conrad D Souza Ruben Sanchez-Garcia R.Sanchez-Garcia@soton.ac.uk Tiejun Ma tiejun.ma@soton.ac.uk Johnnie Johnson J.E.Johnson@soton.ac.uk Ming-Chien Sung M.SUNG@soton.ac.uk

Research Aims Develop topological tools for data analysis Measure the underlying quality of horses based on past race performances Make profitable predictions from the results

Horse Racing Data Data consists of outcome of UK horse races between 2005 and 2014 70261 races competed in by 64691 horses Various performance indicators including finishing position and beaten lengths

Handicapping Handicapping decreases the predictability of races Horses are given extra weight to carry to inhibit their performance Aim of handicappers is that races finish in a dead heat Account for this by estimating the results had all horses carried the same weight

Pairwise Scores Each race αα, with performance indicator PP, forms a local pairwise score matrix YY αα with YY αα iiii = PP ii PP jj Information is aggregated, with respect to a reliability measure

Indirect Comparisons Horse A 1 B 2 Finishing Position Horse B 1 C 2 Finishing Position Which is better? A or C?

Optimisation Problem HodgeRank finds global scores ss minimising the weighted square error between the induced and observed pairwise comparisons Want to solve the optimisation problem: min ss R mm ii,jj WW iiii (ss ii ss jj YY iiii ) 2

Flag Complex Representation Horses form the 0-skeleton Pairwise scores form the 1- simplices

Cochains kk-cochains are realvalued functions on the kk-simplices Reversing direction negates the value of the cochain Set of all kk-cochains is denoted CC kk

Inner Products Choose unweighted Euclidean inner products on CC 0 and CC 2 Equip CC 1 with a weighted Euclidean inner product ff, gg CC 1 = ii EE WW ii ff ii gg(ii)

Coboundary Operators kk-th coboundary operator is a linear map δδ kk : CC kk CC kk+1 Adjoint of the kk-th coboundary operator is a linear map δδ kk : CC kk+1 CC kk kk-th combinatorial Laplacian is a map Δ kk = δδ kk δδ kk + δδ kk 1 δδ kk 1 : CC kk CC kk

Hodge Decomposition Theorem CC kk admits an orthogonal decomposition CC kk = iiii δδ kk 1 kkkkkk Δ kk iiii δδ kk and kkkkkk Δ kk = kkkkkk(δδ kk ) kkkkkk(δδ kk 1 ) Split YY into orthogonal cochains according to the Hodge Decomposition Theorem

Globally Consistent iiii(δδ 0 ) are globally consistent cochains Any ff 1 iiii(δδ 0 ) has the form ff 1 ii, jj = ff 0 jj ff 0 (ii) for some ff 0 CC 0 ff 0 = 5 ff 1 = 2 ff 1 = 3 ff 0 = 3 ff 0 = 2 ff 1 = 4 ff 1 = 1 ff 0 = 1

Optimisation Problem Since iiii(δδ 0 ) is the set of all globally consistent pairwise scores, the optimisation problem can be written as: min ss R mm ii,jj WW iiii (ss ii ss jj YY iiii ) 2 = min δδ 2 0ss YY ss R mm 2,WW

Optimisation Problem Solved Solutions to the optimisation problem satisfy Δ 0 ss = δδ 0 YY and the minimum norm solution is given by ss = Δ 0 δδ 0 YY ss is unique up to an additive constant

Inconsistencies ker(δ 1 ) are partial inconsistencies Every triple of horses are consistent but larger cycles are inconsistent

Inconsistencies iiii(δδ 1 ) are complete inconsistencies These are functions which are inconsistent on every level

Partial Inconsistency Weights Measure how far a 1-simplex is from local consistency by φφ 1 ii = (δδ 1 δδ 1 YY)(ii) Reweight complex by 1 WW ii = φφ 1 ii + 1

Predicting Winners Train and test a conditional logit model using public odds as a predictor Assess impact of adding global score variable to the model Measure goodness-of-fit by McFadden s RR 2 Log Likelihood Ratio Test determines benefit of adding signal variable

Conditional Logit Model Generate vector of winning probabilities for each horse in each race pp ii αα = (pp 1 αα,, pp nn αα ) Probabilities based on predictive variables xx ii αα = (xx 1 αα 1,, xx 1 αα mm ) Representative utility for horse ii in race αα mm uu ii αα = kk=1 ββ(kk)xx ii αα (kk) + εε ii αα

Conditional Logit Model Assuming error terms are identically and independently distributed via a double exponential distribution, probabilities are given by pp ii αα = nn αα ii=1 exp mm kk=1 eeeeee mm kk=1 ββ(kk)xx ii αα (kk) ββ(kk)xx ii αα (kk)

Kelly Wagering Strategy Fractional Kelly wagering strategy employed to assigns bets A fraction of the initial capital is bet, given by ff = pp bb + 1 1 bb where bb is the odds ratio bb: 1 and pp is the estimated probability of winning

Results Model trained over 2011 to 2013 and evaluated in 2014 LLR test significant at the 0.1% RR 2 increase of 0.227% over public odds model with RR 2 of 0.16547

Results Betting simulation results (initial capital of 1000) Model Profit ( ) Rate of Return (%) excl. scores -424.67-7.40 incl. scores 52.05 0.48