J. Sadeghi, E. Patelli, M. de Angelis

J. Sadeghi, E. Patelli
Institute for Risk and Uncertainty, Department of Engineering, University of Liverpool, United Kingdom
8th International Workshop on Reliable Computing, Computing with Confidence
University of Liverpool, Liverpool, UK, July

Outline

In recent years Machine Learning (particularly Deep Learning) has exploded in popularity. Its successful revival is credited to increases in computational power, which allow huge architectures with popular applications in image recognition and speech analysis, and to new implementations such as TensorFlow. Neural networks have long been the tool of choice for datasets on which Gaussian Process emulators are difficult or slow to construct.

A brief history of uncertainty in neural networks
Models of uncertainty in neural networks are the subject of active research. Interval approaches have been proposed since the 1980s/90s (interval weights for interval training data). The most popular model treats the network output as the mean of a Gaussian distribution and measures σ with a test set of data not used during training. The Bayesian approach (also proposed in the 1990s, now implemented with ADVI) is gaining popularity because it can capture a heteroskedastic noise structure.
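
The Gaussian output model described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: `predict` stands in for any trained network, the 1.96 multiplier (a two-sided 95% interval) is our own choice, and σ is assumed constant across inputs (homoskedastic), which is exactly the limitation the Bayesian approach addresses.

```python
import numpy as np


def gaussian_interval(predict, x_test, y_test, z=1.96):
    """Treat predict(x) as the mean of a Gaussian and estimate a single sigma
    from residuals on held-out data (homoskedastic assumption)."""
    residuals = y_test - predict(x_test)
    sigma = float(np.std(residuals, ddof=1))  # sample standard deviation

    def interval(x):
        mu = predict(x)
        return mu - z * sigma, mu + z * sigma

    return sigma, interval


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predict = lambda x: 2.0 * x + 1.0  # hypothetical trained model
    x_test = rng.uniform(0.0, 1.0, 5000)
    y_test = predict(x_test) + rng.normal(0.0, 0.1, 5000)
    sigma, interval = gaussian_interval(predict, x_test, y_test)
    lo, hi = interval(x_test)
    print("sigma:", sigma, "coverage:", np.mean((y_test >= lo) & (y_test <= hi)))
```

Every prediction receives the same width, so this interval cannot widen where the data are noisier.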

Constant width Network (Campi):
$$\operatorname*{arg\,min}_{W,h} \left[\, h : |y_i - \hat{y}(x_i)| < h \;\; \forall i \,\right] \qquad (1)$$
This finds the smallest interval centred on a neural network that contains all the data, by changing the weights W of a fixed architecture. It is a (non-convex) scenario approximation of a chance-constrained optimisation program, since the sampled constraints are generated probabilistically.
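
For fixed weights W, the inner part of problem (1) has a closed form: the smallest admissible h is the largest absolute residual, and the samples attaining it are the active constraints. The NumPy sketch below illustrates this; `linear_model` is a hypothetical stand-in for a network with weights W = (a, b), and the tolerance is our own choice. The outer search over W is the hard part, which is non-convex for a neural network.

```python
import numpy as np


def smallest_centred_half_width(y, y_hat):
    # For fixed weights, the tightest h with |y_i - y_hat_i| <= h for all i
    # is the largest absolute residual. (Eq. (1) writes a strict "<"; this
    # maximum is the infimum of the feasible h.)
    return float(np.max(np.abs(y - y_hat)))


def active_constraints(y, y_hat, h, tol=1e-9):
    # Indices of samples that touch the interval boundary y_hat +/- h.
    return np.flatnonzero(np.abs(y - y_hat) >= h - tol)


def linear_model(a, b):
    # Hypothetical stand-in for a network with weights W = (a, b).
    return lambda x: a * x + b


if __name__ == "__main__":
    x = np.array([0.0, 0.5, 1.0])
    y = np.array([1.0, 2.2, 2.9])
    for a, b in [(2.0, 1.0), (1.0, 1.0)]:
        y_hat = linear_model(a, b)(x)
        h = smallest_centred_half_width(y, y_hat)
        print((a, b), h, active_constraints(y, y_hat, h))
```

Comparing two candidate weight settings shows how the objective h depends on W: the first setting gives a much narrower interval than the second.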

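Non-convex Scenario Optimisation
We can bound the probability of constraint violation V for the solution $\hat{x}_N$ of a scenario optimisation program:
$$\mathbb{P}^N\left(V(\hat{x}_N) > \epsilon(s)\right) < \beta,$$
i.e. the probability that our solved scenario program has a violation probability greater than ε(s) (a reliability less than 1 - ε(s)) is less than β, with
$$\epsilon(s) = 1 - \sqrt[N-s]{\frac{\beta}{N\binom{N}{s}}} \qquad (2)$$
Here s is the number of support constraints: constraints which, if removed, improve the solution. For small networks, s can be found by solving the program N times (once with each sample removed) using fmincon in Matlab.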
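
Both ingredients of the bound can be sketched in Python. To keep the brute-force "solve N times" loop cheap, the sketch assumes a toy scenario program whose "network" is a single bias c, so problem (1) has the closed-form solution c = (max + min)/2, h = (max - min)/2; this toy model and all function names are our own. Equation (2) is evaluated in log space so that the binomial coefficient does not overflow.

```python
import math

import numpy as np


def solve_constant_width(y):
    """Closed-form solution of Eq. (1) for a toy 'network' that is a single
    bias c: minimise h subject to |y_i - c| <= h for all i."""
    lo, hi = float(np.min(y)), float(np.max(y))
    return (lo + hi) / 2.0, (hi - lo) / 2.0  # (c, h)


def count_support_constraints(y, solve=solve_constant_width, tol=1e-12):
    """Brute force: re-solve N times, each time without one sample, and count
    the samples whose removal improves (lowers) the optimal half-width h."""
    _, h_full = solve(y)
    s = 0
    for i in range(len(y)):
        _, h_without_i = solve(np.delete(y, i))
        if h_without_i < h_full - tol:
            s += 1
    return s


def epsilon(s, N, beta):
    """Eq. (2): eps(s) = 1 - (beta / (N * C(N, s)))**(1 / (N - s)),
    computed via log-gamma to avoid overflow of C(N, s)."""
    if not 0 <= s < N:
        raise ValueError("need 0 <= s < N")
    log_binom = math.lgamma(N + 1) - math.lgamma(s + 1) - math.lgamma(N - s + 1)
    log_term = math.log(beta) - math.log(N) - log_binom
    return 1.0 - math.exp(log_term / (N - s))


if __name__ == "__main__":
    y = np.array([0.0, 0.3, -1.0, 2.0, 0.5])
    print("support constraints:", count_support_constraints(y))  # max and min
    print("eps(9):", epsilon(9, 1250, 1e-6))
```

With N = 1250, β = 10^-6 and s = 9 (the values used in the toy example and results), `epsilon` returns approximately 0.0566, consistent with the 0.057 in equation (4).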

Minibatch Optimisation
Gradient-based optimisation comes in two flavours, batch and stochastic, with minibatch SGD as a compromise:
- Batch: very stable but slow.
- Stochastic Gradient Descent (SGD): a random data point is selected for each step.
- Minibatch SGD (a random subset per step): the best of both worlds, stable and quick.
This can be used to solve our problem:
1. Transform the problem: minimise the maximum error on the whole batch, with no constraints.
2. Minimise the maximum error on each minibatch instead.
3. Find s by considering which samples produce the maximum error at the end of training.
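
Steps 1 to 3 can be sketched without a deep-learning framework. The example below is a minimal NumPy illustration under stated assumptions, not the authors' TensorFlow code: the "network" is a hypothetical linear model ŷ = a·x + b, Adam is hand-written with the usual constants (β1 = 0.9, β2 = 0.999), and the 1e-3 relative tolerance used in step 3 to flag near-maximal samples is our own choice. The only change from ordinary minibatch training is that the loss is the maximum, rather than the mean, of the squared errors, so each step's (sub)gradient comes from the single worst sample in the minibatch.

```python
import numpy as np


def train_minibatch_max_error(x, y, M=100, steps=3000, lr=0.01, seed=0):
    """Steps 1-2: minimise the max squared error on random minibatches of size
    M for a hypothetical linear model y_hat = a*x + b, using Adam."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # weights [a, b]
    m = np.zeros(2)
    v = np.zeros(2)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        idx = rng.choice(len(x), size=M, replace=False)  # minibatch
        xb, yb = x[idx], y[idx]
        r = yb - (theta[0] * xb + theta[1])  # residuals on the minibatch
        k = int(np.argmax(r ** 2))  # worst sample in the minibatch
        # Subgradient of max_i r_i^2 w.r.t. [a, b]: only sample k contributes.
        g = -2.0 * r[k] * np.array([xb[k], 1.0])
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta


def support_candidates(x, y, theta, rtol=1e-3):
    """Step 3: full-batch half-width h and the samples whose absolute error
    is within a relative tolerance of the maximum (hypothetical threshold)."""
    abs_err = np.abs(y - (theta[0] * x + theta[1]))
    h = float(abs_err.max())
    return h, np.flatnonzero(abs_err >= h * (1 - rtol))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 1000)
    y = 2.0 * x + 1.0 + rng.uniform(-0.1, 0.1, 1000)
    theta = train_minibatch_max_error(x, y)
    h, idx = support_candidates(x, y, theta)
    print("theta:", theta, "h:", h, "candidates:", idx)
```

In a framework such as TensorFlow, the same change amounts to reducing the squared errors with a max instead of a mean before differentiating.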

Order statistics of a uniform distribution sampled without replacement show that the minibatch procedure is approximately equivalent to the true problem. Typically N > … and M is a few hundred, depending on the GPU available. For minibatch size M we expect to minimise the $100(1 - 1/M)$th percentile of the true error, with variance approximately $1/M^2$. The cost can be reduced from $O(N^2 N_{iter})$ (multiple runs on the whole batch) to $O(N N_{iter})$ (a single run on the whole batch) to $O(M N_{iter})$ (a single run on minibatches).
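
The order-statistics argument can be checked numerically. Assuming the N samples' true errors are distinct and ranked 1..N, a minibatch drawn without replacement picks M ranks at random, and the minibatch maximum sits at a random quantile of the true error. The Monte Carlo sketch below (NumPy; the function name and trial count are our own) estimates the mean and variance of that quantile for comparison with 1 - 1/M and 1/M^2.

```python
import numpy as np


def minibatch_max_quantile(N, M, trials=2000, seed=0):
    """Monte Carlo estimate of the mean and variance of the quantile (within
    the full batch of N ranked errors) of the worst sample in a minibatch of
    size M drawn without replacement."""
    rng = np.random.default_rng(seed)
    q = np.empty(trials)
    for t in range(trials):
        idx = rng.choice(N, size=M, replace=False)  # ranks 0..N-1
        q[t] = (idx.max() + 1) / N  # normalised rank of the minibatch maximum
    return float(q.mean()), float(q.var())


if __name__ == "__main__":
    N, M = 10_000, 200
    mean, var = minibatch_max_quantile(N, M)
    print("mean:", mean, "vs 1 - 1/M =", 1 - 1 / M)
    print("var:", var, "vs 1/M^2 =", 1 / M ** 2)
```

The exact mean is M(N + 1)/((M + 1)N), which for large N is M/(M + 1), close to 1 - 1/M when M is a few hundred.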

Imprecise training data
One advantage of the approach is that set-valued training data can be considered, e.g. bounds on output data, $\ell_\infty$ balls and $\ell_2$ balls. Bounds on the output can be handled trivially, while gradient-based modifications to the algorithm (in essence, proposed elsewhere) allow interval data to be included.
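
One way to handle bounds on the output, assumed here rather than taken from the slides, is to replace the crisp residual |y_i - ŷ(x_i)| with the half-width needed for the centred interval to contain the whole target interval [y_lo, y_hi], namely max(ŷ - y_lo, y_hi - ŷ), which equals |ŷ - mid| + radius. Input sets (the ℓ∞ and ℓ2 balls) need the gradient-based modifications mentioned above and are not covered. A NumPy sketch with hypothetical function names:

```python
import numpy as np


def required_half_width(y_hat, y_lo, y_hi):
    """Half-width h needed for [y_hat - h, y_hat + h] to contain the whole
    target interval [y_lo, y_hi]; reduces to |y - y_hat| for crisp targets."""
    return np.maximum(y_hat - y_lo, y_hi - y_hat)


def max_interval_error(y_hat, y_lo, y_hi):
    # Interval analogue of the maximum absolute error used for crisp data.
    return float(np.max(required_half_width(y_hat, y_lo, y_hi)))


if __name__ == "__main__":
    y_hat = np.array([0.0, 0.0])
    y_lo = np.array([-1.0, 0.5])
    y_hi = np.array([2.0, 1.0])
    print(required_half_width(y_hat, y_lo, y_hi))  # per-sample half-widths
    print(max_interval_error(y_hat, y_lo, y_hi))
```

Because this quantity is a maximum of terms that are linear in ŷ, it can be dropped into the max-error loss in place of the crisp residual.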

Toy function
Network with 1 layer containing 10 neurons. Training data: 1250 samples from the test function
$$y = 0.3\left(15\,u\,e^{-3u} + w\right) \qquad (3)$$
where w is a normally distributed random variable with zero mean and standard deviation = …, and u is uniformly distributed between 0 and 1. Minibatch size M = 200; the Adam optimiser was used in TensorFlow.
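
The toy problem can be set up as follows. This NumPy sketch generates data from equation (3) and evaluates the minibatch maximum squared error of a network with one hidden layer of 10 neurons. The noise standard deviation is a required argument because its value is missing from the slide; the tanh activation, the initialisation and all function names are assumptions, and training (done with Adam in TensorFlow in the talk) is omitted.

```python
import numpy as np


def toy_data(n, sigma_w, seed=0):
    # Eq. (3): y = 0.3 * (15 u exp(-3u) + w), u ~ U(0, 1), w ~ N(0, sigma_w^2).
    # sigma_w must be supplied by the caller: its value is not given here.
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, n)
    w = rng.normal(0.0, sigma_w, n)
    return u, 0.3 * (15.0 * u * np.exp(-3.0 * u) + w)


def init_net(hidden=10, seed=0):
    # One hidden layer of 10 neurons; tanh and this initialisation are assumptions.
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, (hidden, 1)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden),
        "b2": 0.0,
    }


def forward(params, u):
    h = np.tanh(u[:, None] @ params["W1"].T + params["b1"])  # (n, hidden)
    return h @ params["W2"] + params["b2"]  # (n,)


def minibatch_max_squared_error(params, u, y, M=200, rng=None):
    # Loss minimised in the talk: max squared error over a random minibatch.
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.choice(len(u), size=M, replace=False)
    return float(np.max((y[idx] - forward(params, u[idx])) ** 2))


if __name__ == "__main__":
    u, y = toy_data(1250, sigma_w=0.1)  # 0.1 is a placeholder, not from the slide
    params = init_net()
    print(minibatch_max_squared_error(params, u, y, M=200))
```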

Results
[Figure panels: maximum squared error (minibatch) vs. iteration; y vs. u.]
With s = 9 support constraints:
$$\mathbb{P}^N\left(V(\hat{x}_N) > 0.057\right) < 10^{-6} \qquad (4)$$

Summary
We have demonstrated an efficient method to train constant-width neural networks based on the scenario approach. Crisp or set-valued training data can be used. The method only requires a small modification to the gradient update rule in existing TensorFlow code (change mean squared error to max squared error). There is good potential to extend the approach to larger problems.

Thank you for your attention. Questions?
Jonathan Sadeghi e: w: jonathan.sadeghi/
