The Fundamental Principle of Data Science


The Fundamental Principle of Data Science

Harry Crane
Department of Statistics, Rutgers
May 7, 2018

Web: www.harrycrane.com | Project: researchers.one | Contact: @HarryDCrane

Harry Crane (Rutgers), Foundations of Data Science, BFF5: May 7, 2018

Overview: Foundations of Data Science

Burning Questions:
- What is Data Science?
- What is a Foundation?
- What is the purpose of a Foundation?

Acknowledgments: the FPDS is based on ongoing collaboration with Ryan Martin. Also very grateful for conversations with Glenn Shafer, Dimitris Tsementzis, and Snow Zhang.

Main References

H. Crane. (2018). Logic of probability and conjecture. https://philpapers.org/rec/cralop
H. Crane. (2018). Probabilistic Foundations of Statistical Network Analysis. Chapman & Hall. Released May 2018.

Foundations of Probability

Schema: Axioms/Principles --(encodes)--> Formal Theory --(expresses)--> Basic Concept --(discerns)--> Pre-formal Intuition

Frequentist probability: Kolmogorov's axioms | frequentist probability | frequencies | chance/randomness
Bayesian probability: de Finetti's coherence | subjective probability | prices of fair bets | credence
Dempster-Shafer probability: axioms of belief functions | Dempster-Shafer theory | measure of evidence | degrees of belief

Foundations of Statistics

Schema: Axioms/Principles --(encodes)--> Formal Theory --(expresses)--> Basic Concept --(supports)--> Inference Methods

Frequentist statistics: Kolmogorov's axioms | frequentist probability | frequencies | P-values, confidence intervals, etc.
Bayesian statistics: de Finetti's coherence | subjective probability | prices of fair bets | Bayes factors, posterior inference
Inferential Models: axioms of belief functions | Dempster-Shafer theory | measure of evidence | valid probabilistic inference

Instances of Data Science

Data | Method | Inference Rule | Decision/Application
Set of observations | Hypothesis testing | P-value | Reject H0
Interaction data | Network science | Network analysis | Advertising campaign
Cloud of points | Clustering/Classification | Clustering | Fraud detection

The F-word

Methods:
- Hypothesis tests, Bayesian methods, confidence distributions, fiducial distributions, and machine learning are not Foundations.
- Foundations cannot be based only on statistics and probability: algorithmic methods (clustering, network science, machine learning) are data science, and Inferential Models based on Dempster-Shafer theory are not strictly probability but are still statistics/data science.
- Foundations of Data Science should handle all kinds of (complex) data, not just sets, sequences, tables, arrays, etc.

Role of a Foundation:
- Internal (logical): provides a standard of coherence by which to assess methods. Frequentist statistics: ad hoc principles/rules of thumb. Bayesian statistics: coherence/Dutch book argument. Inferential Models (Martin & Liu): validity as the standard of rigor.
- External (scientific): provides a standard of falsification for inferences. Inferences are meaningful/scientific only if they are falsifiable. Any method for data science should have built-in criteria under which its inferences would be falsified.

Principles of Data Science

Basic Concepts of Data Science:
1. Fundamental Principle of Data Science (Crane-Martin): a Method is a protocol which takes any piece of data to an inference about every assertion A:
     Method : Data × A → Inference(A).
2. Identical data structures should lead to identical inferences: if D = D′, then Method_D(A) = Method_D′(A) for every assertion A.

How can these principles be made rigorous?
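The two principles above can be illustrated with a minimal sketch. The type alias and the function `threshold_method` below are hypothetical illustrations, not part of the FPDS framework itself: a Method is just a function from (data, assertion) pairs to inferences, and identical data yield identical inferences because the function is deterministic.

```python
from typing import Callable, Literal

Inference = Literal["support", "against", "inconclusive"]

# Principle 1: a Method is a protocol taking (data, assertion) to an
# inference about that assertion: Method : Data x A -> Inference(A).
Method = Callable[[tuple, str], Inference]

def threshold_method(data: tuple, assertion: str) -> Inference:
    """Toy method (hypothetical): supports "mean > 0" iff the sample mean is positive."""
    if assertion != "mean > 0":
        return "inconclusive"
    mean = sum(data) / len(data)
    return "support" if mean > 0 else "against"

# Principle 2: identical data structures lead to identical inferences.
d1, d2 = (1.0, 2.0, -0.5), (1.0, 2.0, -0.5)
assert d1 == d2
assert threshold_method(d1, "mean > 0") == threshold_method(d2, "mean > 0")
```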

Simple Example: Hypothesis Testing (Frequentist)

If the data provide sufficient evidence in favor of A, then inferences should support A.

Hypothesis testing:
- H0: null hypothesis.
- Method: compute a P-value based on the data D: P = Pr(D | H0).
- Interpret P < α as evidence against H0 (i.e., reject H0).

Inference based on P:
  Method_D(H0) = reject H0, if P < α; inconclusive, otherwise.
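A minimal sketch of this frequentist Method, assuming a concrete setting not in the slides: coin flips tested against H0: p = 0.5, with a one-sided binomial P-value and α = 0.05.

```python
import math

ALPHA = 0.05  # significance threshold (illustrative choice)

def binom_tail(n, k, p=0.5):
    """Pr(X >= k) for X ~ Binomial(n, p): a one-sided P-value under H0."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def method(data):
    """Map data (coin flips, 1 = heads) to an inference about H0: p = 0.5."""
    n, k = len(data), sum(data)
    p_value = binom_tail(n, k)
    return "reject H0" if p_value < ALPHA else "inconclusive"

print(method([1] * 9 + [0]))      # 9 heads in 10 flips -> reject H0
print(method([1] * 5 + [0] * 5))  # 5 heads in 10 flips -> inconclusive
```

Note the piecewise decision rule from the slide appears directly as the conditional in `method`.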

Simple Example: Hypothesis Testing (Bayes)

If the data provide sufficient evidence in favor of A, then inferences should support A.

Hypothesis testing:
- H0: null hypothesis; H1: alternative hypothesis.
- Method: compute a Bayes factor based on the data D: BF = Pr(D | H1) / Pr(D | H0).
- Interpret BF as a measure of evidence in favor of H1 over H0 (denote this assertion by A).

Inference based on BF (categories from Kass & Raftery, 1995):
  Method_D(A) = inconclusive, if BF < 3.2; substantial, if 3.2 ≤ BF ≤ 10; strong, if 10 < BF ≤ 100; decisive, if BF > 100.
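A sketch of this Bayesian Method under an assumed setting not in the slides: ten coin flips with point hypotheses H0: p = 0.5 and H1: p = 0.9, so the Bayes factor is a simple likelihood ratio.

```python
def bayes_factor(data, lik_h1, lik_h0):
    """BF = Pr(D | H1) / Pr(D | H0)."""
    return lik_h1(data) / lik_h0(data)

def classify(bf):
    """Evidence categories from the slide (Kass & Raftery, 1995 thresholds)."""
    if bf < 3.2:
        return "inconclusive"
    if bf <= 10:
        return "substantial"
    if bf <= 100:
        return "strong"
    return "decisive"

def binom_lik(p):
    """Likelihood of a 0/1 sequence under a Bernoulli(p) model."""
    return lambda data: p ** sum(data) * (1 - p) ** (len(data) - sum(data))

data = [1] * 9 + [0]  # 9 heads in 10 flips
bf = bayes_factor(data, binom_lik(0.9), binom_lik(0.5))
print(round(bf, 1), classify(bf))  # 39.7 strong
```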

Formalization: Data as Structure

Informal: data is observable structure, including measurements, relations between measurements (interactions, dependencies), etc.

Formal: represent Data as a topological space whose points consist of observable information ("observable data") and the relations between them.

Formalization: The Structure of Evidence

Informal: inferences are based on the body of evidence in support of a claim.

Formal: represent the body of evidence for A as a topological space Evid(A) whose points are pieces of evidence in favor of A and whose paths are relations between pieces of evidence.

Formalization: The Logical Structure of Inference

Informal: an inference is a judgment about evidential strength (based on data).

Formal: represent Inference(A) as the disjoint union of spaces whose points are evidence in favor of A, evidence in favor of ¬A, or evidence that the outcome is inconclusive:

  Inference(A) := Evid(A) ⊔ Evid(¬A) ⊔ Inconclusive(A).

This can also account for the strength of evidence (strong, suggestive), but only the simple case is treated here.
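The disjoint union above is exactly a tagged union (sum type). A minimal sketch, with the class names (`EvidFor`, `EvidAgainst`, `Inconclusive`) chosen here for illustration:

```python
from dataclasses import dataclass
from typing import Union

# Inference(A) := Evid(A) ⊔ Evid(¬A) ⊔ Inconclusive(A), as a tagged union:
# the tag records which component of the disjoint union a point lives in.
@dataclass
class EvidFor:
    witness: object   # a point of Evid(A)

@dataclass
class EvidAgainst:
    witness: object   # a point of Evid(¬A)

@dataclass
class Inconclusive:
    reason: str       # a point of Inconclusive(A)

Inference = Union[EvidFor, EvidAgainst, Inconclusive]

def describe(inf: Inference) -> str:
    """Case analysis on the tag, i.e. on the disjoint-union component."""
    if isinstance(inf, EvidFor):
        return "evidence for A"
    if isinstance(inf, EvidAgainst):
        return "evidence against A"
    return "inconclusive"

assert describe(EvidFor(witness="P < alpha")) == "evidence for A"
assert describe(Inconclusive(reason="P >= alpha")) == "inconclusive"
```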

The Logic of Evidence: Syntax (Crane, 2018)

(1) For any assertion A there is a space of evidence for that assertion:
      A : Space ⊢ Evid(A) : Space
(2) If A has been proven, then there is evidence to support A:
      a : A ⊢ evid_A(a) : Evid(A)
(3) If A implies B, then evidence for A implies evidence for B:
      f : A → B ⊢ imp(f) : Evid(A) → Evid(B)
(4) There is no evidence for a contradiction (⊥):
      ¬Evid(⊥)

The Logic of Evidence: Full Picture (Crane, 2018)

(1) A : Space ⊢ Evid(A) : Space
(2) a : A ⊢ evid_A(a) : Evid(A)
(3) A : Space, B : Space, f : A → B ⊢ imp(f) : Evid(A) → Evid(B)
(4) ¬Evid(⊥)

Consequences of These Axioms

The following can be derived as theorems from rules (1)-(4):

  Evid(A ∧ B) → Evid(A) ∧ Evid(B) → Evid(A) ∨ Evid(B) → Evid(A ∨ B)

Inferences supporting A and B (jointly) support inferences for A and for B (individually), which in turn support inferences for A or inferences for B, which support inference for A or B. (The arrows do not reverse in general.)

Universal/existential quantification:

  Evid(∀ a : A, B(a)) → ∀ a : A, Evid(B(a))
  Inference that B(a) holds for all a : A supports inference that B(a) holds for every a : A.

  ∃ a : A, Evid(B(a)) → Evid(∃ a : A, B(a))
  Inferring B(a) for some a : A supports inference that there exists a : A such that B(a) holds.

Derived inference rules:

  Evid(A) ∧ (A → B) → Evid(A ∧ B)
  Inferring A and proving that A implies B supports inference in A and B.

  A ∧ Evid(A → B) → Evid(A ∧ B)
  Knowing A and inferring that A implies B supports inference in A and B.
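These derivations can be sketched in a proof assistant. The talk's own development is in Coq/UniMath; the following is a hypothetical Lean 4 analogue, flattening Space to Prop for brevity, with `Evid`, `evid`, `imp`, and `noEvidFalse` as assumed axiom names rather than the author's code:

```lean
-- Hypothetical encoding of rules (1)-(4), with Space flattened to Prop.
axiom Evid : Prop → Prop                                   -- rule (1)
axiom evid : ∀ {A : Prop}, A → Evid A                      -- rule (2)
axiom imp  : ∀ {A B : Prop}, (A → B) → (Evid A → Evid B)   -- rule (3)
axiom noEvidFalse : ¬ Evid False                           -- rule (4)

-- Derived consequences, as on this slide: each is rule (3) applied to
-- an ordinary implication.
example {A B : Prop} : Evid (A ∧ B) → Evid A := imp And.left
example {A B : Prop} : Evid A → Evid (A ∨ B) := imp Or.inl
example {A B : Prop} (h : A → B) : Evid A → Evid (A ∧ B) :=
  imp (fun a => ⟨a, h a⟩)
```

Note how `imp` does all the work: evidence transforms along any proved implication, which is exactly why the arrows in the chain above do not reverse.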

Formalization in Coq/UniMath: Definition of Evid as a Homotopy Type

Rules (1)-(4), written in Coq using the UniMath library:

(1) A : Space ⊢ Evid(A) : Space
(2) a : A ⊢ evid_A(a) : Evid(A)
(3) A : Space, B : Space, f : A → B ⊢ imp(f) : Evid(A) → Evid(B)
(4) ¬Evid(⊥)

Formalization in Coq/UniMath: Proofs of Basic Properties

The theorems mentioned earlier, written in Coq using the UniMath library:

  Evid(A ∧ B) → Evid(A) ∧ Evid(B)
  Evid(A) → Evid(A ∨ B)
  Evid(B) → Evid(A ∨ B)
  Evid(A) ∨ Evid(B) → Evid(A ∨ B)

Interpretation of Evid: Semantics

Can these properties be interpreted in a familiar statistics/data science context? The space of evidence can be interpreted in terms of:
- Probability functions on a fixed set Ω; spaces of probability measures (the Giry monad in category theory); similarly, finitely additive probabilities (as in (1) and (2)).
- Dempster-Shafer belief functions with a fixed frame of discernment Ω; spaces of Dempster-Shafer functions.

Immediate connections to:
- Maximum likelihood estimation
- Hypothesis testing: P-values, Bayes factors
- Inferential Models

Next Steps

There are many:
- Develop the theory further.
- Shore up the semantics in terms of existing approaches.
- Formalize existing statistical, machine learning, and algorithmic approaches in this framework.
- Many more...

References

H. Crane. (2018). Logic of probability and conjecture. https://philpapers.org/rec/cralop
H. Crane. (2017). Probability as Shape. Talk at BFF4. Video available at https://youtu.be/cndko0jy6dy
H. Crane. (2018). Probabilistic Foundations of Statistical Network Analysis. Chapman & Hall.
D. Tsementzis. (2018). A meaning explanation for HoTT. Synthese, to appear.

Web: www.harrycrane.com | Project: researchers.one | Contact: @HarryDCrane