Intelligent Embedded Systems Uncertainty, Information and Learning Mechanisms (Part 1)


Advanced Research: Intelligent Embedded Systems
Uncertainty, Information and Learning Mechanisms (Part 1)
Intelligence for Embedded Systems, Ph.D. and Master Course
Manuel Roveri, Politecnico di Milano, DEIB, Italy

Uncertainty

The real world is prone to uncertainty, at different levels of the data analysis process:
- Acquiring data, representing information and processing the information (Part 1 of this lecture)
- Learning mechanisms (Part 2 of this lecture)

To formalize the concept of uncertainty we need to define an «uncertainty-free» entity and a way to evaluate the error w.r.t. this entity.

From errors to perturbations

We have uncertainty any time we have an approximated entity which, to some extent, estimates the ideal (possibly unknown) one. Such a situation can be formalized by introducing the ideal uncertainty-free entity and the real uncertainty-affected one, and evaluating the error: the discrepancy between the two according to a suitable figure of merit. The error is strictly dependent on a specific pointwise instance: we abstract the pointwise error with the concept of perturbation.

From errors to perturbations (2)

A generic perturbation δA intervenes on the computation by modifying the status assumed by an entity from its nominal configuration A to a perturbed one A_p. The effect induced by the perturbation can be evaluated through a suitable figure of merit measuring the discrepancy between the two states A and A_p. Example: for a real sensor providing the constant value a ∈ R, the discrepancy between the ideal nominal value and the perturbed one can be expressed as the error e = a_p - a; the error would assume a different value with different instances of the perturbed acquisition a_p.

Modeling the uncertainty

The mechanism inducing uncertainty can be modeled with the signal plus noise model: a_p = a + δa, so that the error becomes e = a_p - a = δa. In many cases δa can be described as a random variable whose probability density function fully characterizes the way uncertainty disrupts the information.
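As a quick illustration of the signal plus noise model (a minimal sketch, not part of the course material; the values of a, delta and N are arbitrary assumptions), the following MATLAB snippet simulates repeated acquisitions a_p = a + δa with a uniformly distributed perturbation and inspects the empirical mean, variance and histogram of the error.

```matlab
% Signal-plus-noise model: a_p = a + da, error e = a_p - a = da
% (illustrative sketch; a, delta and N are arbitrary choices)
a     = 3.7;                          % ideal, uncertainty-free value
delta = 0.05;                         % maximum perturbation magnitude
N     = 1e5;                          % number of simulated acquisitions
da    = -delta + 2*delta*rand(N,1);   % uniform perturbation in [-delta, delta]
a_p   = a + da;                       % perturbed acquisitions
e     = a_p - a;                      % point-wise errors
fprintf('empirical mean = %.4f, variance = %.6f (theory: 0, %.6f)\n', ...
        mean(e), var(e), (2*delta)^2/12);
histogram(e, 50);                     % the error pdf is (approximately) uniform
```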

General signals and perturbations

The signal; the perturbation, drawn from a distribution (discrete or continuous) characterized by its mean and covariance; the perturbed signal, obtained by applying a perturbation operator to the signal. Examples of perturbations: continuous perturbations, acute perturbations.

Perturbations

At the representation level:
- Natural numbers
- Integer numbers
- Rational and real numbers

During the computational flow:
- Linear functions
- Nonlinear functions

Perturbations at the data representation level

Numerical data acquired by sensors and digitalised through an ADC are represented as a sequence of bits, coded according to a given transformation which depends on the numerical information we need to represent. We now introduce the main transformations used in numerical representations, as well as the types and characterization of the uncertainty introduced when representing data in a digital format:
- Projection
- Truncation
- Rounding

Natural Numbers: exact representation and uncertainties

Assume we are willing to spend n bits to represent a finite value a ∈ N. Given the finiteness of n, we can represent only numbers belonging to a subset N(n) ⊂ N. When the n bits suffice, the representation is exact; when bits must be removed, uncertainty is introduced by projection, truncation or rounding.

Natural Numbers: projection to a subspace

A projection to a lower dimensional space is achieved by simply setting to zero the least significant q bits of the n-bit codeword associated with a, leading to the value a(q), effectively defined over n - q bits.

Original (n = 4 bits) → Projected to n - q = 2 bits (q = 2):
0000 → 0000
0001 → 0000
0010 → 0000
0011 → 0000
0100 → 0100

The projection introduces an absolute error e(q) = a - a(q) < 2^q.

Natural Numbers: truncation

Truncation operates as a chopping operator that removes the least significant q bits.

Original (n = 4 bits) → Truncated to n - q = 2 bits (q = 2):
0000 → 00
0001 → 00
0010 → 00
0011 → 00
0100 → 01

The truncation introduces an absolute error e(q) = a - 2^q a(q) < 2^q.

Natural Numbers: rounding

Rounding of a positive number truncates the q least significant bits and adds 1 to the unchopped part if and only if the most significant bit of the truncated segment is 1; otherwise, the rounded value is the one defined over the n - q bits.

Original (n = 4 bits) → Rounded to n - q = 2 bits (q = 2):
0000 → 00
0001 → 00
0010 → 01
0011 → 01
0100 → 01
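The three operators can be reproduced with a few bit manipulations. The MATLAB sketch below (a simplified stand-in, not the course's Example 3_A.m) applies projection, truncation and rounding to all 4-bit natural numbers with q = 2, reproducing the tables above and the corresponding error bounds.

```matlab
% Reduce n-bit naturals to n-q bits by projection, truncation and rounding
n = 4; q = 2;
a = (0:2^n-1)';                            % all naturals representable on n bits
a_proj  = bitshift(bitshift(a, -q), q);    % zero the q least significant bits
a_trunc = bitshift(a, -q);                 % chop the q least significant bits
a_round = bitshift(a, -q) + bitget(a, q);  % add 1 if MSB of the chopped part is 1
% (rounding the largest codewords may overflow the n-q bit range)

e_proj  = a - a_proj;                      % 0 <= e(q) < 2^q
e_trunc = a - 2^q * a_trunc;               % same values as the projection error
e_round = a - 2^q * a_round;               % |e(q)| <= 2^(q-1), roughly zero mean
disp(table(a, a_proj, a_trunc, a_round, e_proj, e_trunc, e_round));
```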

Perturbation at the data level: integer numbers

Integer numbers use the 2's complement (2cp) notation.
- Truncation: biased approximate representation
- Rounding: unbiased approximate representation

Perturbation at the data level: the fixed point representation

In the fixed point representation the available bits are split between the two parts of the number: some bits are assigned to the integer part and the remaining bits to the fractional one. Example: fixed point representation of a real value.
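As a sketch of the fixed point case (illustrative only; the bit split n_i/n_f and the test value are arbitrary assumptions, and sign handling and saturation are ignored), the fractional bits define a quantization step of 2^(-n_f); truncating or rounding to that step behaves as in the integer case.

```matlab
% Fixed-point quantization with n_i integer bits and n_f fractional bits
% (illustrative values; real designs also handle sign and saturation)
n_i = 4; n_f = 3;
step = 2^(-n_f);                     % quantization step
% representable range (unsigned): [0, 2^n_i - step]
x = pi;                              % value to represent (arbitrary)
x_trunc = floor(x / step) * step;    % truncation: biased (always <= x)
x_round = round(x / step) * step;    % rounding: unbiased on average
fprintf('x = %.6f, truncated = %.4f, rounded = %.4f\n', x, x_trunc, x_round);
fprintf('truncation error = %.4f (in [0, %.4f)), rounding error = %.4f\n', ...
        x - x_trunc, step, x - x_round);
```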

There are many sources of uncertainty at the data representation level, but the question is: what is the effect of these uncertainties within the propagation flow? This is the subject of sensitivity analysis.

Sensitivity Analysis

The sensitivity analysis provides:
- closed form expressions for the linear function case
- approximated results for the nonlinear one, provided that the perturbations affecting the inputs are small in magnitude compared to the inputs (small perturbation hypothesis)

The analysis of perturbations in the large, i.e., perturbations of arbitrary magnitude, cannot be carried out in closed form for the nonlinear case unless y = f(x) assumes a particular structure and has properties that make the mathematics amenable.

Sensitivity Analysis: the computational flow

Measurements x feed the computational flow y = f(x), producing the output y.
- Linear f: closed form solution
- Twice differentiable (nonlinear) f: approximated solution, under the small perturbation assumption

Linear Function: additive perturbation

Let y = θ^T x, where θ is the vector of parameters of the linear function, and let the input be affected by an additive perturbation, x_p = x + δx. Point-wise error: δy = θ^T δx. Assume the perturbation is drawn from a zero-mean distribution with covariance C_δx. Mean and variance of the error: E[δy] = 0 and Var(δy) = θ^T C_δx θ.

Linear Function: additive perturbation (2)

If C_δx is diagonal, i.e., under an independence assumption on the perturbation components, the variance becomes Var(δy) = Σ_i θ_i^2 σ_i^2, where θ_i^2 is the squared i-th component of the parameter vector and σ_i^2 the i-th diagonal component of the covariance matrix. If all the components have the same variance σ^2, Var(δy) = σ^2 Σ_i θ_i^2.
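These expressions are easy to verify numerically. The MATLAB sketch below (not from the course material; θ, σ and d are arbitrary choices) propagates zero-mean i.i.d. uniform perturbations through a linear function and compares the empirical mean and variance of δy = θ^T δx with the theoretical values 0 and σ^2 Σ_i θ_i^2.

```matlab
% Monte Carlo check: delta_y = theta' * delta_x, i.i.d. zero-mean perturbations
d     = 8;                                 % input dimension (arbitrary)
theta = randn(d, 1);                       % parameters of the linear function (arbitrary)
sigma = 0.1;                               % common std of each perturbation component
N     = 1e5;                               % Monte Carlo trials
dX    = sqrt(12) * sigma * (rand(d, N) - 0.5);  % uniform, zero mean, variance sigma^2
dY    = theta' * dX;                       % propagated errors delta_y
fprintf('mean(dY) = %.5f (theory 0)\n', mean(dY));
fprintf('var(dY)  = %.5f (theory %.5f)\n', var(dY), sigma^2 * sum(theta.^2));
```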

But what about the pdf of the error?

How to get the pdf of the error? The pdf of the propagated error cannot be evaluated a priori in a closed form unless we assume that the dimension d is large enough. In such a case, we can invoke the Central Limit Theorem (CLT) under the Lyapunov assumptions and δy can be modeled as a random variable drawn from a Gaussian distribution.

Central Limit Theorem under the Lyapunov Condition

Convergence of the moments: w.r.t. the standard CLT, here we have hypotheses on the moments but we do not require the Y_i, i = 1, ..., d, to be identically distributed.

CLT under the Lyapunov Condition (2)

From the intuitive point of view, the central limit theorem tells us that the sum of many not-too-large and not-too-correlated random terms averages out. The Lyapunov condition is one way of quantifying the not-too-large requirement by inspecting the behaviour of some 2 + l moments. In most cases, one tests the satisfaction of the condition for l = 1 or 2.

CLT under the Lyapunov Condition (3)

From the theorem, with the choice Y_i = θ_i δx_i, δy can be approximated as a Gaussian random variable; when the variances are identical, δy is approximately N(0, σ^2 Σ_i θ_i^2). It is easy to show that the Lyapunov condition holds if each component of the random variable δx is uniformly distributed within a given interval, as happens in many application cases (think of the error distribution introduced by the rounding and truncation operators operating on binary 2cp codewords).
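The following sketch (separate from the course's Example 3_B.m; all numerical values are arbitrary assumptions) illustrates the statement: with Y_i = θ_i δx_i, where the δx_i are uniform and the θ_i differ, the terms are independent but not identically distributed, yet the histogram of the normalized δy is already close to a standard Gaussian for moderate d. Reducing d in the sketch shows the approximation degrading, which is exactly the regime where no closed-form pdf is available.

```matlab
% CLT with independent, non-identically distributed terms Y_i = theta_i * dx_i,
% dx_i ~ U(-0.5, 0.5) (as for rounding/truncation errors)
d     = 50;                           % number of terms (arbitrary)
theta = linspace(0.5, 2, d)';         % different weights -> non-identical Y_i
N     = 1e5;                          % Monte Carlo trials
dX    = rand(d, N) - 0.5;             % uniform perturbations, zero mean, var 1/12
dY    = theta' * dX;                  % delta_y = sum_i theta_i * dx_i
s2    = sum(theta.^2) / 12;           % theoretical variance of delta_y
histogram(dY / sqrt(s2), 80, 'Normalization', 'pdf'); hold on;
t = linspace(-4, 4, 200);
plot(t, exp(-t.^2/2) / sqrt(2*pi), 'LineWidth', 1.5);  % standard normal pdf
legend('normalized \delta y', 'N(0,1)');
```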

Linear Function: multiplicative perturbation

Here the perturbation enters through an element-wise multiplication with the input. Assuming a distribution for the perturbation and one for the input, the mean and variance of the point-wise error δy can be derived; the expressions simplify when all variances are identical.
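A minimal numerical sketch, assuming the element-wise multiplicative model x_p = x .* (1 + δx) (an interpretation, since the slide's formulas are not reproduced here): the propagated error is then δy = θ^T (x .* δx) and, for zero-mean mutually independent x and δx with identical component variances σ_x^2 and σ_δx^2, it has zero mean and variance σ_x^2 σ_δx^2 Σ_i θ_i^2.

```matlab
% Multiplicative perturbation (assumed model): x_p = x .* (1 + dx),
% so that delta_y = theta' * (x .* dx)
d      = 8;  N = 1e5;                 % dimension and trials (arbitrary)
theta  = randn(d, 1);                 % parameters of the linear function (arbitrary)
sig_x  = 1.0;  sig_dx = 0.05;         % std of input and of relative perturbation
X      = sig_x  * randn(d, N);        % zero-mean inputs
dXrel  = sig_dx * randn(d, N);        % zero-mean relative perturbations
dY     = theta' * (X .* dXrel);       % propagated errors
fprintf('mean(dY) = %.5f (theory 0)\n', mean(dY));
fprintf('var(dY)  = %.5f (theory %.5f)\n', var(dY), sig_x^2*sig_dx^2*sum(theta.^2));
```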

Perturbations of Nonlinear functions

The computational flow is modeled by a nonlinear function y = f(x). Point-wise error: δy = f(x + δx) - f(x). Under the small perturbation hypothesis, a second order Taylor expansion around x gives δy ≈ J(x)^T δx + (1/2) δx^T H(x) δx, where J(x) is the Jacobian vector and H(x) the Hessian matrix of f.

Nonlinear function (2)

By discarding the terms of order larger than two, the perturbed propagated output takes the form y_p = f(x + δx) ≈ f(x) + J(x)^T δx + (1/2) δx^T H(x) δx. Not much more can be said within a deterministic framework unless we introduce strong assumptions on f(x) or δx. However, by moving to a stochastic framework, which considers x and δx mutually independent random variables drawn from distributions f_x(0, C_x) and f_δx(0, C_δx), respectively, the first two moments of the distribution of δy can be computed.

Nonlinear function (3)

Under the above assumptions and by taking expectation w.r.t. x and δx, the expected value of the perturbed output becomes E[y_p] = E[f(x)] + (1/2) E[δx^T H(x) δx] = E[f(x)] + (1/2) E[tr(H(x) C_δx)], the Hessian being obtained, e.g., through a quasi-Newton approximation of f.
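To make the second-order approximation concrete, the sketch below uses an assumed toy function f (not taken from the course material; the nominal input and perturbation variance are arbitrary) and compares, at a fixed nominal input, the empirical mean of δy = f(x + δx) - f(x) with the second-order prediction (1/2) tr(H(x) C_δx).

```matlab
% Second-order moment approximation for a nonlinear flow y = f(x)
% f is an assumed example: f(x) = sin(x1) + x2^2 (chosen for a simple Hessian)
f  = @(x) sin(x(1,:)) + x(2,:).^2;
x0 = [0.3; -0.5];                           % fixed nominal input (arbitrary)
H  = [-sin(x0(1)) 0; 0 2];                  % Hessian of f at x0
sigma = 0.05;  Cdx = sigma^2 * eye(2);      % small, independent perturbations
N  = 1e6;
dX = sigma * randn(2, N);                   % perturbations delta_x
dY = f(x0 + dX) - f(x0);                    % propagated point-wise errors
fprintf('empirical E[dy] = %.6f, prediction 0.5*tr(H*Cdx) = %.6f\n', ...
        mean(dY), 0.5 * trace(H * Cdx));
```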

Let's play with MATLAB

Download the examples related to Lecture 3. In the ZIP file:
- Example 3_A.m: about projection and truncation of natural numbers
- Example 3_B.m: about the Central Limit Theorem (CLT) under the Lyapunov assumptions