Statistical Methods in Applied Computer Science


DD2447, DD3342 Statistical Methods in Applied Computer Science
Stefan Arnborg, KTH
http://www.nada.kth.se/~stefan

SYLLABUS
Common statistical models and their use: Bayesian, testing, and fiducial statistical philosophy
- Hypothesis choice
- Parametric inference
- Non-parametric inference
- Elements of regression
- Clustering
- Graphical statistical models
- Prediction and retrodiction
- Chapman-Kolmogorov formulation
- Elements of Vapnik/Chervonenkis learning theory
- Evidence theory, estimation and combination of evidence
- Support Vector Machines and kernel methods
- Vovk/Gammerman hedged prediction technology
- Stochastic simulation, Markov chain Monte Carlo

LEARNING GOALS
After successfully taking this course, you will be able to:
- motivate the use of uncertainty management and statistical methodology in computer science applications, as well as the main methods in use,
- account for algorithms used in the area and use the standard tools,
- critically evaluate the applicability of these methods in new contexts, and design new applications of uncertainty management,
- follow research and development in the area.

GRADING
DD2447: Grades are E-A during 2009. Turning in 70% of the homeworks, plus a very short oral discussion of them, gives grade C; less gives F-D. For higher grades, essentially all homeworks should be turned in on time; alternative assignments will be substituted for those homeworks you miss. For grade B you must pass one Master's test; for grade A you must do two Master's tests.
DD3342: Research-level project, or deeper study of part of the course.

Applications of Uncertainty: everywhere
- Medical imaging/research (schizophrenia)
- Land use planning
- Environmental surveillance and prediction
- Finance and stock markets
- Marketing (e.g., into Google)
- Robot navigation and tracking
- Security and military
- Performance tuning

Aristotle: Logic
Logic as a semi-formal system was created by Aristotle, probably inspired by the practice of mathematical argument in his time. There is no record of Aristotle himself applying logic, but the Elements of Euclid probably derives from Aristotle's illustrations of the logical method. What role does logic have in computer science?

Nicomachean Ethics
"Every action is thought to aim at some good. Should we not, like archers, aim at what is right? We must be content to indicate the truth roughly and in outline, and with premises of the same kind, to reach conclusions that are no better. It is equally foolish to expect probable reasoning from a mathematician as it is to demand, from a rhetorician, scientific proofs."

Visualization
Visualize data in such a way that the important aspects are obvious. "A good visualization strikes you as a punch between your eyes" (Tukey, 1970). Pioneered by Florence Nightingale, first female member of the Royal Statistical Society, inventor of pie charts and performance metrics.

Probabilistic approaches
- Bayes: probability conditioned by observation
- Cournot: an event with very small probability will not happen
- Vapnik-Chervonenkis: VC-dimension and PAC, distribution independence
- Kolmogorov/Vovk: a sequence is random if it cannot be compressed

Peirce: Abduction and uncertainty
Aristotle's induction, generalizing from particulars, is considered invalid by strict deductionists. Peirce made the concept clear, or at least confused on a higher level. Abduction is verification by finding a plausible explanation. It is a key process in scientific progress.

Sherlock Holmes: common-sense inference
The techniques used by Sherlock are modelled on Conan Doyle's professor in medical school, who followed the methodological tradition of Hippocrates and Galen. Abductive reasoning, first spelled out by Peirce, is found in 217 instances in the Sherlock Holmes adventures, 30 of them in the first novel, A Study in Scarlet.

Thomas Bayes, amateur mathematician
If we have a probability model of the world, we know how to compute probabilities of events. But is it possible to learn about the world from the events we see? Bayes' proposal was forgotten but rediscovered by Laplace.

Antoine Augustin Cournot (1801-1877)
Pioneer in stochastic processes, market theory and structural post-modernism. Predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings). An alternative to Bayes' method, hypothesis testing, is based on Cournot's bridge: an event with very small probability will not happen.

Kolmogorov and randomness
Andrei Kolmogorov (1903-1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces. Kolmogorov complexity characterizes a random string by the smallest size of a description of it. It is used to explain the Vovk/Gammerman scheme of hedged prediction, and also in MDL (Minimum Description Length) inference.

Normative claim of Bayesianism
EVERY type of uncertainty should be treated as probability. This claim is controversial and not universally accepted: Fisher (1922), Cramér, Zadeh, Dempster, Shafer, Walley (1999). Students encounter many approaches to uncertainty management and identify weaknesses in foundational arguments.

Foundations for Bayesian inference
Bayes' method is the first documented method based on probability: the plausibility of an event depends on observation. Bayes' rule:

    f(θ | D) ∝ f(D | θ) f(θ)

Showcase application: the PET camera. Bayes' rule is an organizing principle for uncertainty. Parameter and observation spaces can be extremely complex, and so can priors and likelihoods. MCMC is the current computational approach; it is often but not always applicable (difficult when the posterior has many local maxima separated by low-density regions). Better than numerics? The same scheme covers camera geometry and noise, film, scene regularity, and any other camera or imaging device.
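The MCMC approach mentioned above can be illustrated with a minimal Metropolis sampler. This is a generic sketch, not the PET reconstruction: the target is the posterior of a coin bias θ under a uniform prior, and the data counts are made up for illustration.

```python
import math
import random

def log_posterior(theta, heads, tails):
    # log f(theta | D) up to a constant: binomial likelihood times uniform prior
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.5
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        # Accept with probability min(1, posterior ratio); work in log space
        log_u = math.log(rng.random() + 1e-300)
        if log_u < log_posterior(proposal, heads, tails) - log_posterior(theta, heads, tails):
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(heads=70, tails=30)
post_mean = sum(samples[5000:]) / len(samples[5000:])  # discard burn-in
```

With 70 heads in 100 tosses, the chain's mean settles near the analytic posterior mean 71/102 ≈ 0.70; the same accept/reject skeleton works whenever the posterior density can be evaluated up to a constant.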

PET camera
Likelihood and prior:
- D: film, count by detector j
- X: radioactivity in voxel i
- a: camera geometry
Inference about X gives the posterior; its mean is often a good picture.

Project aims
- Support transformation of tasks and solutions in a generic fashion
- Integrate different command levels and services in a dynamic organization
- Facilitate consistent situation awareness

WIRED on Total Information Awareness
The WIRED (Dec 2, 2002) article "Total Info System Totally Touchy" discusses the Total Information Awareness system.
"People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature." Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002.
"What's alarming is the danger of false positives based on incorrect data," Herb Edelstein, in Wired, Dec 2, 2002.

Combination of evidence
    f(θ | D) ∝ f(D | θ) f(θ)
    f(θ | {d1, d2}) ∝ f(d1 | θ) f(d2 | θ) f(θ)
In Bayes' method, evidence enters as the likelihood of the observation.
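The combination-of-evidence rule above can be checked numerically on a toy discrete grid (the three θ values and the Bernoulli-style likelihood are made up for illustration): updating on both observations at once gives the same posterior as updating one observation at a time.

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-point grid for theta, with a uniform prior
thetas = [0.1, 0.5, 0.9]
prior = [1 / 3, 1 / 3, 1 / 3]

def likelihood(d, theta):
    # d = 1: positive observation, d = 0: negative observation
    return theta if d == 1 else 1.0 - theta

def posterior(data, prior):
    post = prior[:]
    for d in data:  # each observation contributes one likelihood factor
        post = normalize([likelihood(d, t) * p for t, p in zip(thetas, post)])
    return post

joint = posterior([1, 1], prior)               # f(theta | {d1, d2}) in one step
stepwise = posterior([1], posterior([1], prior))  # one datum at a time
```

Because evidence enters multiplicatively, `joint` and `stepwise` coincide, and two positive observations concentrate the posterior on θ = 0.9.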

Particle filter: general tracking
The Chapman-Kolmogorov version of Bayes' rule:

    f(θ_t | D_t) ∝ f(d_t | θ_t) ∫ f(θ_t | θ_{t-1}) f(θ_{t-1} | D_{t-1}) dθ_{t-1}

Berry and Linoff have eloquently stated their preferences with the often-quoted sentence: "Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works." Neural networks typically give the right answer.
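The recursion above can be sketched as a bootstrap particle filter; the propagate / weight / resample steps approximate the integral with a particle cloud. The model here (1-D Gaussian random-walk state with Gaussian observation noise) is a made-up example, not from the course material.

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def particle_filter(observations, n_particles=1000, proc_std=1.0, obs_std=1.0, seed=1):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for d_t in observations:
        # Propagate through the transition kernel f(theta_t | theta_{t-1})
        particles = [p + rng.gauss(0.0, proc_std) for p in particles]
        # Weight each particle by the likelihood f(d_t | theta_t)
        weights = [gauss_pdf(d_t, p, obs_std) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Resample: the weighted cloud approximates the Chapman-Kolmogorov integral
        particles = rng.choices(particles, weights=weights, k=n_particles)
        means.append(sum(particles) / n_particles)
    return means

# The state jumps from around 0 to around 5 halfway through
means = particle_filter([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
```

The posterior mean tracks the jump after a step or two of lag, which is exactly the behaviour the recursion predicts: the transition prior smooths, the likelihood pulls toward the data.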

1950-1980: The age of rationality. Let us describe the world with a mathematical model and compute the best way to manage it! This is a large Bayesian network, a popular statistical model.

Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of maximum entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world?

Robust Bayes
Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability.

    f(θ | D) ∝ f(D | θ) f(θ)
    F(θ | D) ∝ F(D | θ) F(θ)

Every member of the posterior is a parallel combination of one member of the likelihood and one member of the prior. For decision making, Jaynes recommends using the member of the posterior with maximum entropy (the Maxent estimate).
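A finite sketch of the robust-Bayes recipe above: take a few priors standing in for a convex prior set, combine each with the same likelihood, and pick the posterior member with maximum entropy, as Jaynes recommends. The grid and all numbers are hypothetical.

```python
import math

thetas = [0.2, 0.5, 0.8]  # hypothetical discrete parameter grid

def posterior(prior, likelihood):
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

# A finite stand-in for a convex set of priors (imprecise prior knowledge)
prior_set = [
    [0.6, 0.3, 0.1],
    [1 / 3, 1 / 3, 1 / 3],
    [0.1, 0.3, 0.6],
]
likelihood = [0.1, 0.4, 0.5]  # f(D | theta) for the observed D

# Parallel combination: one posterior per member of the prior set
posteriors = [posterior(pr, likelihood) for pr in prior_set]
maxent = max(posteriors, key=entropy)  # Jaynes' Maxent choice for decision making
```

Each posterior is a valid Bayes combination; the Maxent rule just selects the least committal member of the resulting posterior set for decision making.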

SVM and kernel methods
Based on Vapnik-Chervonenkis learning theory. Separate classes by a wide-margin hyperplane classifier, or enclose data points between close parallel hyperplanes for regression, possibly after a non-linear mapping to a high-dimensional space. The only assumption is point exchangeability.

Vovk/Gammerman hedged predictions
Based on Kolmogorov complexity or a non-conformance measure. In classification, each prediction comes with a confidence; asymptotically, misclassifications appear independently and with probability 1 − confidence. The only assumption is exchangeability.
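The hedged-prediction idea can be sketched with a toy conformal predictor. The nonconformity measure here (ratio of distance to the nearest same-class point over distance to the nearest other-class point, on 1-D data) and the data are hypothetical; under exchangeability alone, the resulting p-value measures how plausible each candidate label is.

```python
def nonconformity(x, y, pool):
    # High score = x looks unlike the examples that share its label y
    same = min(abs(x - xi) for xi, yi in pool if yi == y)
    other = min(abs(x - xi) for xi, yi in pool if yi != y)
    return same / (other + 1e-12)

def conformal_p_value(train, x_new, y_hyp):
    # p-value of the hypothesis "x_new has label y_hyp":
    # rank the new example's nonconformity among all examples, treating
    # the augmented sequence as exchangeable
    augmented = train + [(x_new, y_hyp)]
    scores = []
    for i, (xi, yi) in enumerate(augmented):
        rest = augmented[:i] + augmented[i + 1:]
        scores.append(nonconformity(xi, yi, rest))
    new_score = scores[-1]
    return sum(1 for s in scores if s >= new_score) / len(scores)

train = [(1.0, "a"), (1.2, "a"), (1.1, "a"), (5.0, "b"), (5.2, "b"), (4.9, "b")]
p_a = conformal_p_value(train, 1.05, "a")  # plausible label: large p-value
p_b = conformal_p_value(train, 1.05, "b")  # implausible label: small p-value
```

Predicting the label set {y : p-value > ε} yields the hedged guarantee from the slide: in the long run, the true label is excluded with frequency at most ε.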