Uncertain Inference and Artificial Intelligence


March 3, 2011. Prepared for a Purdue Machine Learning Seminar.

Acknowledgements: Prof. A. P. Dempster, for intensive collaborations on the Dempster-Shafer theory; Jianchun Zhang, Ryan Martin, Duncan Ermini Leaf, Zouyi Zhang, Huiping Xu, Jing-Shiang Hwang, Jun Xie, and Hyokun Yun, for collaborations on a variety of IM research projects; and NSF support for a joint project with Jun Xie on large-scale multinomial inference and its applications in genome-wide association studies.

References: Martin, R. and Liu, C. (2011) and the references therein. A possible textbook (Liu and Martin, 2012+, Reasoning with Uncertainty) would have the following features: a prior-free and valid probabilistic inference system, which is promising for serious applications of statistics; fully developed valid probabilistic inferential methods for textbook problems; a large collection of applications to modern, challenging, and large-scale statistical problems; a deeper understanding of existing schools of thought and their strengths and weaknesses; satisfactory solutions to well-known benchmark problems, including Stein's paradox and the Behrens-Fisher problem; and a direct attack on the source of uncertainty, which makes learning and teaching easier and more enjoyable.

Abstract. It is difficult, perhaps, to believe that artificial intelligence can be made intelligent enough without a valid probabilistic inferential system as a critical module. After a brief review of existing schools of thought on uncertain inference, we introduce a valid probabilistic inferential framework termed inferential models (IMs). With several simple and benchmark examples, we discuss potential applications of IMs in artificial intelligence in general and machine learning in particular.

What is artificial intelligence? An answer from the web: Artificial Intelligence (AI) is the area of computer science focusing on creating machines that can engage in behaviors that humans consider intelligent. The ability to create intelligent machines has intrigued humans since ancient times, and today, with the advent of the computer and 50 years of research into AI programming techniques, the dream of smart machines is becoming a reality. Researchers are creating systems that can mimic human thought, understand speech, beat the best human chess player, and perform countless other feats never before possible.

Is the answer precise? If not, blame it on Google's machine learning algorithms.

What is machine learning? An answer from the web: Machine learning has been central to AI research from the beginning. Unsupervised learning is the ability to find patterns in a stream of input. Supervised learning includes both classification and numerical regression. Classification is used to determine what category something belongs in, after seeing a number of examples of things from several categories. Regression takes a set of numerical input/output examples and attempts to discover a continuous function that would generate the outputs from the inputs. In reinforcement learning, the agent is rewarded for good responses and punished for bad ones. These can be analyzed in terms of decision theory, using concepts like utility. The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory.

The inference problem. Input: 1. Data x, the observed value of the observable quantity X ∈ 𝕏. 2. An assertion A, a statement about θ ∈ Θ, the unknown quantities. 3. An association between X and θ; for example, x is a sample from the population characterized by the cdf F_θ(·). Output: 1. Probabilistic uncertainty assessments on the truth or falsity of A given X = x. 2. Plausible regions for θ and its functions.

Uncertain inference is critical to AI. No?

One (simple) kind of uncertain inference: probability models. A probability model has a meaningful/valid probability distribution assumed to be adequate for everything. In particular, θ has a valid marginal distribution that can be manipulated via the usual probability calculus to derive valid marginal and conditional posterior distributions. Subjective Bayesian: philosophically, every Bayesian is subjective; Bayes himself was not Bayesian. What's wrong? Nothing is wrong: you make the decision, and you (or your clients) take the consequences.

Statistical models. In what follows, we consider the cases where you do not have valid distributions for everything, which we refer to as statistical models. Here θ is taken to be unknown.

Objective Bayesian, a personal view. The idea can be viewed as using magic priors to approximate (ideal) frequentist results. Remarks: Assertion-specific priors: certain priors can work for certain assertions about θ. Large-sample theory: it really concerns the case where uncertainty goes away; think of both normality and vanishing variances in very-high-dimensional problems. Robust Bayesian: worst-case-scenario thinking ultimately leads the Bayesian to a non-Bayesian school.

Existing schools of thought. Bayes: for it to work, it really requires valid priors. Fiducial: very interesting, but wrong (though perhaps better than Bayes?). Dempster-Shafer: as an extension of both Bayes and fiducial, it requires valid, independent individual components that are probabilistically meaningful; for example, individual components specified with fiducial probabilities. Frequentist: starting from specified rules and criteria, it invites a guess-and-check approach to uncertain inference. If so, is it very appealing? For example, 24+ methods for 2x2 tables, and penalty-based methods.

Remarks. These existing methods are useful. Yet all of these schools of thought fail on many benchmark examples, such as the many-normal-means, Behrens-Fisher, and constrained-parameter problems. Thinking outside the box may be necessary for new generations.

The likelihood insufficiency principle. Likelihood alone is not sufficient for probabilistic inference. An unobserved but predictable quantity, called the auxiliary (a-)variable, must be introduced for predictive/probabilistic inference. Remark: Bayes makes θ predictable. Is that credible/valid?

The No Validity, No Probability principle? Notation: denote by P_x(A) the probability for the truth of A given the observed data x. Definition (validity). An inferential framework is said to be valid if, for every A ⊆ Θ, P_X(A), viewed as a function of X, is stochastically no larger than Unif(0, 1) under the falsity of A, i.e., under the truth of A^c, the negation of A.
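To make the definition concrete, here is a minimal Monte Carlo sketch (Python; it assumes the default Gaussian IM constructed on the next two slides, with Θ_x(S) = [x − |Z|, x + |Z|], and is not code from the talk). It checks that, for the one-sided assertion A = {θ ≤ 0}, the evidence e_X(A) is stochastically no larger than Unif(0, 1) whenever A is false.

    import numpy as np
    from scipy.stats import norm

    def evidence_for_A(x, theta0=0.0):
        # Default-IM evidence for A = {theta <= theta0} in the N(theta, 1) model:
        # e_x(A) = P(x + |Z| <= theta0) = max(0, 2*Phi(theta0 - x) - 1), Z ~ N(0, 1).
        return np.maximum(0.0, 2.0 * norm.cdf(theta0 - x) - 1.0)

    rng = np.random.default_rng(0)
    theta_true = 0.5                          # true theta > 0, so A = {theta <= 0} is false
    X = rng.normal(theta_true, 1.0, size=100_000)
    e = evidence_for_A(X)

    # Validity requires P(e_X(A) >= alpha) <= 1 - alpha for every alpha in (0, 1).
    for alpha in (0.05, 0.10, 0.25, 0.50):
        print(alpha, (e >= alpha).mean(), "<=", 1 - alpha)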

The Inferential Model (IM) framework. An IM is valid and consists of three steps:
Association step: associate X and θ with an a-variable z to obtain the mapping Θ_X(z) ⊆ Θ (z ∼ π_z), consisting of the candidate values of θ given X and z.
Prediction step: predict z with a credible predictive random set (PRS) S, i.e., P(S ∋ z) ∼ Unif(0, 1) for z ∼ π_z.
Combination step: combine x and S to obtain Θ_x(S) = ∪_{z ∈ S} Θ_x(z), and compute the evidence e_x(A) = P(Θ_x(S) ⊆ A) and e_x(A^c) = P(Θ_x(S) ⊆ A^c), with ē_x(A) = 1 − e_x(A^c) called the plausibility.

Example: X ∼ N(θ, 1).
A-step: X = θ + z, where z ∼ N(0, 1).
P-step: S = [−|Z|, |Z|], where Z ∼ N(0, 1).
C-step: compute e_x(A) and e_x(A^c) from Θ_x(S) = [x − |Z|, x + |Z|].
Figure: plausibility of the assertion A = {θ : θ = θ0}, indexed by θ0, given x = 1.96. Note that e_x({θ0}) = 0.
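The C-step for this example has a simple closed form, and a small sketch (Python; my own illustration, not code from the talk) reproduces the plotted plausibility curve: under the PRS S = [−|Z|, |Z|], the plausibility of A = {θ = θ0} is P(|Z| ≥ |x − θ0|) = 2(1 − Φ(|x − θ0|)), while the evidence for any singleton is 0.

    import numpy as np
    from scipy.stats import norm

    def plausibility(theta0, x):
        # Plausibility of A = {theta = theta0}: P(theta0 in [x - |Z|, x + |Z|])
        # = P(|Z| >= |x - theta0|) = 2 * (1 - Phi(|x - theta0|)).
        return 2.0 * (1.0 - norm.cdf(np.abs(x - theta0)))

    x = 1.96
    theta_grid = np.linspace(-2.0, 6.0, 401)
    pl_curve = plausibility(theta_grid, x)    # the curve shown in the figure
    print(plausibility(0.0, x))               # ~0.05: theta0 = 0 is barely plausible
    print(plausibility(x, x))                 # 1.0: theta0 = x is fully plausible
    # e_x({theta0}) = 0 for every theta0, since a singleton never contains Theta_x(S).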

Example: X ∼ Binomial(n, θ). This is a homework problem for Stat 598D.

Efficiency. See the Stat 598D lecture notes on statistical inference. Let b(z) be a continuous function and define S = {z : b(z) ≤ b(Z)}, where Z ∼ π_z. Then P(S ∋ z) ∼ Unif(0, 1) for z ∼ π_z. We can use this result to construct credible PRSs.
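A quick simulation (Python; the choice b(z) = |z| with π_z = N(0, 1) is my own illustration) confirms the stated result: with S = {z : b(z) ≤ b(Z)} and Z ∼ π_z, the coverage probability P(S ∋ z) is distributed as Unif(0, 1), so S is a credible PRS.

    import numpy as np
    from scipy.stats import norm, kstest

    rng = np.random.default_rng(1)

    # Take b(z) = |z| and pi_z = N(0, 1).  For a realized Z, the random set is
    # S = {z : |z| <= |Z|}, so P(S contains z) = P(|z| <= |Z| | Z) = 2*Phi(|Z|) - 1.
    Z = rng.normal(size=100_000)
    coverage = 2.0 * norm.cdf(np.abs(Z)) - 1.0

    # These coverage probabilities should look like draws from Unif(0, 1).
    print(kstest(coverage, "uniform"))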

Combining information: conditional IMs. Example (a textbook example). Consider the association model X_i = θ + z_i (z_i iid N(0, 1), i = 1, ..., n). Write X̄ = θ + z̄ and X_i − X̄ = z_i − z̄ (i = 1, ..., n). Predict z̄ conditional on the observed a-quantities {z_i − z̄}_{i=1}^n. This leads to the simplified conditional IM:
A-step: X̄ = θ + u/√n, where u ∼ N(0, 1).
P-step: S = [−|U|, |U|], where U ∼ N(0, 1).
C-step: Θ_x(S) = [X̄ − |U|/√n, X̄ + |U|/√n].
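In code, the C-step output of this conditional IM is the familiar z-interval for θ. The sketch below (Python; the data vector is hypothetical) computes the level-0.95 plausibility interval by taking |U| at the 0.95 quantile of |N(0, 1)|.

    import numpy as np
    from scipy.stats import norm

    def conditional_im_interval(x, level=0.95):
        # Conditional IM for X_i = theta + z_i, z_i iid N(0, 1):
        # Theta_x(S) = [xbar - |U|/sqrt(n), xbar + |U|/sqrt(n)], U ~ N(0, 1).
        # Taking |U| at its `level` quantile gives the usual z-interval.
        x = np.asarray(x, dtype=float)
        n = x.size
        xbar = x.mean()
        u = norm.ppf(0.5 + level / 2.0)       # level-quantile of |N(0, 1)|
        return xbar - u / np.sqrt(n), xbar + u / np.sqrt(n)

    x = np.array([1.2, 0.7, 2.1, 1.5, 0.9])   # hypothetical data
    print(conditional_im_interval(x))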

Efficient inference: marginal IMs. Example (another textbook example). Consider the association model X_i = η + σ z_i (z_i iid N(0, 1), i = 1, ..., n). Let θ = (η, σ²) ∈ Θ = R × R⁺ and write X̄ = η + σ z̄, s²_x = σ² s²_z, and (X − X̄1)/s_x = (z − z̄1)/s_z. Predict z̄ and s²_z conditional on the observed a-quantities (z − z̄1)/s_z. This leads to the simplified marginal IM:
A-step: X̄ = η + (s_x/√n) u and s²_x = σ² s²_z, where u ∼ t_{n−1}(0, 1) and s²_z ∼ χ²_{n−1}.
P-step: S = [−|U|, |U|] × [0, ∞), where U ∼ t_{n−1}(0, 1).
C-step: Θ_x(S) = [X̄ − |U| s_x/√n, X̄ + |U| s_x/√n] × [0, ∞).
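The analogous sketch for the marginal IM (Python; again with hypothetical data) shows that the level-0.95 plausibility interval for η is the usual t-interval, while this PRS leaves σ² unconstrained.

    import numpy as np
    from scipy.stats import t

    def marginal_im_interval(x, level=0.95):
        # Marginal IM for X_i = eta + sigma * z_i, z_i iid N(0, 1):
        # Theta_x(S) = [xbar - |U| s_x/sqrt(n), xbar + |U| s_x/sqrt(n)] x [0, inf),
        # with U ~ t_{n-1}.  Taking |U| at its `level` quantile gives the t-interval.
        x = np.asarray(x, dtype=float)
        n = x.size
        xbar, s_x = x.mean(), x.std(ddof=1)
        u = t.ppf(0.5 + level / 2.0, df=n - 1)   # level-quantile of |t_{n-1}|
        half = u * s_x / np.sqrt(n)
        return xbar - half, xbar + half

    x = np.array([10.1, 9.4, 11.2, 10.8, 9.9, 10.5])   # hypothetical data
    print(marginal_im_interval(x))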

Model selection via AI (or by an AS, an Artificial Statistician)? Consider choosing a model from a collection of models, including, e.g., a normal model for simplicity (and efficiency) and a non-parametric model for robustness. See Jianchun Zhang's PhD thesis for an IM-based method.

2×2 tables. Example (kidney stone treatment, Steven et al. (1994)).

Table 1. Small stones
Treatment   Success   Failure
A           81        6
B           234       26

Table 2. Large stones
Treatment   Success   Failure
A           192       71
B           55        25

For making an intelligent decision, there are (at least) two things to consider. Prediction: condition on the stone type. Estimation: combine data if possible. Thus, check the homogeneity in each of the two tables below:

Table 3. Treatment A
Stone type   Success   Failure
Small        81        6
Large        192       71

Table 4. Treatment B
Stone type   Success   Failure
Small        234       26
Large        55        25

Evidence for and against homogeneity of treatments. For each of Table 3 and Table 4, compute 1. the evidence e(homogeneous), 2. the plausibility ē(homogeneous), and 3. a 95% plausibility interval for the odds ratio. Remarks. 1. Simpson's paradox is related more to a wrong statistical analysis, i.e., to modeling, than to the inferential method(?). How can this be done in AI? 2. Some relevant statistical thoughts: increase the precision of prediction via conditioning, and increase the precision of estimation via pooling. Can some basics like these be integrated into AI?
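As a rough, non-IM sanity check of the homogeneity question, one can compare the log odds ratios of Tables 3 and 4 with standard large-sample (Wald) intervals. The sketch below (Python) is only a crude stand-in for the plausibility analysis summarized in the next figure.

    import numpy as np
    from scipy.stats import norm

    def log_odds_ratio_ci(table, level=0.95):
        # Wald interval for the log odds ratio of a 2x2 table [[a, b], [c, d]].
        a, b, c, d = np.asarray(table, dtype=float).ravel()
        lor = np.log(a * d / (b * c))
        se = np.sqrt(1/a + 1/b + 1/c + 1/d)
        z = norm.ppf(0.5 + level / 2.0)
        return lor, (lor - z * se, lor + z * se)

    # Rows = stone type (small, large); columns = (success, failure).
    print(log_odds_ratio_ci([[81, 6], [192, 71]]))    # Table 3: treatment A
    print(log_odds_ratio_ci([[234, 26], [55, 25]]))   # Table 4: treatment B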

Numerical results. Figure: plausibilities for the log odds ratios of Tables 3 and 4, which show that pooling makes no sense in this example.

The Behrens-Fisher problem: comparing two normal means with unknown variances. This is a common textbook, controversial, and practically useful example (Bayes and fiducial do not work well); see Martin, Hwang, and Liu (2010b).

Many normal means. The association model: X_i = µ_i + z_i (z_i iid N(0, 1), i = 1, ..., n). The problem of interest is to infer µ = (µ_1, ..., µ_n). A very important example for understanding inference (Bayes and fiducial do not work); see Martin, Hwang, and Liu (2010b).

Many normal means. The usual model for the observables X_1, ..., X_n: µ_i iid ∼ N(θ, σ²) (i = 1, ..., n) and X_i | µ ind ∼ N(µ_i, s²_i) (i = 1, ..., n), with known positive s²_1, ..., s²_n, where µ = (µ_1, ..., µ_n) and (θ, σ²) ∈ R × R⁺ are unknown. Here we are interested in inference about σ². Since there is rarely meaningful prior knowledge in practice, there has been tremendous interest in choosing Bayesian priors.

Many normal means. The sampling model for the observable quantities is X_i ind ∼ N(θ, σ² + s²_i) (i = 1, ..., n). For simplicity, to motivate ideas, consider the case with known θ = 0, that is, X_i ind ∼ N(0, σ² + s²_i) (i = 1, ..., n). An association model is given by

Σ_{i=1}^n X_i²/(σ² + s²_i) = V

and

[Σ_{i=1}^n X_i²/(σ² + s²_i)]^{−1/2} ( X_1/√(σ² + s²_1), ..., X_n/√(σ² + s²_n) ) = U,

where V ∼ χ²_n and U ∼ Unif(O_n).

Many normal means. Specify the predictive random set, which predicts v alone: S = {(v, u) : |F_n(v) − 0.5| ≤ |F_n(V) − 0.5|}, where F_n is the χ²_n cdf and V ∼ χ²_n. This is a constrained-parameter inference problem. Remark: validity is not a problem, but efficient inference is not straightforward. It requires considering generalized conditional IMs, a challenging topic under investigation!
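Under my reading of this PRS, the plausibility of a candidate value σ²₀ is 1 − |2 F_n(v(σ²₀)) − 1| with v(σ²₀) = Σᵢ xᵢ²/(σ²₀ + sᵢ²). The sketch below (Python, on simulated data and only for the known-mean case θ = 0; not code from the talk) evaluates this plausibility function over a grid of σ² values.

    import numpy as np
    from scipy.stats import chi2

    def plausibility_sigma2(sigma2_grid, x, s2):
        # Plausibility of sigma^2 = sigma2_0 in the model X_i ~ N(0, sigma^2 + s_i^2),
        # based on V = sum_i X_i^2 / (sigma^2 + s_i^2) ~ chi^2_n and the PRS above:
        # pl(sigma2_0) = 1 - |2 F_n(v(sigma2_0)) - 1|.
        x, s2 = np.asarray(x, float), np.asarray(s2, float)
        n = x.size
        pl = []
        for sig2 in np.atleast_1d(sigma2_grid):
            v = np.sum(x**2 / (sig2 + s2))
            pl.append(1.0 - abs(2.0 * chi2.cdf(v, df=n) - 1.0))
        return np.array(pl)

    rng = np.random.default_rng(2)
    n, sigma2_true = 20, 1.0
    s2 = rng.uniform(0.5, 2.0, size=n)                 # known sampling variances
    x = rng.normal(0.0, np.sqrt(sigma2_true + s2))     # simulated observations
    grid = np.linspace(0.0, 5.0, 101)
    print(grid[np.argmax(plausibility_sigma2(grid, x, s2))])   # most plausible sigma^2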