Source localization in an ocean waveguide using supervised machine learning


Source localization in an ocean waveguide using supervised machine learning
Haiqiang Niu, Emma Reeves, and Peter Gerstoft
Scripps Institution of Oceanography, UC San Diego

Part I: Localization on Noise09 data and SBCEx16 data

Noise 09 Experiment
[Figure: (a) source-receiver range R = 0.1 to 2.86 km; (b) waveguide environment: source depth Z_s = 5 m, water depth D = 152 m, receiver depths Z_r = 128 to 143 m with Δz = 1 m spacing, water-column sound speed profiles (legend: One, Two, Three; roughly 1490 to 1510 m/s), a 24 m sediment layer with C_p = 1572 to 1593 m/s, ρ = 1.76 g/cm³, attenuation 2.0 dB/λ, and a halfspace with C_p = 5200 m/s, ρ = 1.8 g/cm³, attenuation 2.0 dB/λ.]

Noise 09 Experiment
[Figure: ship tracks near VLA 2 (latitude 32.59 to 32.62 deg N, longitude -117.43 to -117.41 deg): (a) training data, Jan. 31, 2009, 01:43-02:05, 2 m/s; (b) Test-Data-1, Jan. 31, 2009, 01:01-01:24, -2 m/s; (c) Test-Data-2, Feb. 4, 2009, 13:41-13:51, 4 m/s.]

Noise 09 Experiment
Training data: Jan. 31, 2009, 01:43-02:05, 2 m/s
Test-Data-1: Jan. 31, 2009, 01:01-01:24, -2 m/s
Test-Data-2: Feb. 4, 2009, 13:41-13:51, 4 m/s

Pre-Processing
1. Convert p(t) to p(f) (Fast Fourier Transform)
2. Normalize p(f)
3. Construct the Cross-Spectral Density Matrix (CSDM)
   - Contains signal coherence information
   - Improves the Signal-to-Noise Ratio (SNR)
4. Concatenate the real and imaginary parts of the upper-triangular CSDM elements and vectorize them to create the input X
   - Reduces memory requirements: for L sensors each input vector has length L(L+1), so for N training samples X has size L(L+1) × N
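
The steps above can be sketched in a few lines of NumPy; this is a minimal illustration under assumed array shapes, not the authors' code (the function and variable names are hypothetical):

```python
import numpy as np

def preprocess(p_t, f_bin):
    """p_t: array of shape (n_snapshots, L, n_time) of sensor time series.
    f_bin: index of the frequency bin to keep.
    Returns the real-valued input vector of length L*(L+1)."""
    n_snap, L, _ = p_t.shape
    # 1. FFT each snapshot: p(t) -> p(f), keep one frequency bin
    p_f = np.fft.rfft(p_t, axis=-1)[:, :, f_bin]              # shape (n_snap, L)
    # 2. Normalize each snapshot to unit norm
    p_f = p_f / np.linalg.norm(p_f, axis=1, keepdims=True)
    # 3. Average outer products over snapshots to form the CSDM (improves SNR)
    C = np.einsum('si,sj->ij', p_f, p_f.conj()) / n_snap      # shape (L, L)
    # 4. Vectorize the real and imaginary parts of the upper triangle
    iu = np.triu_indices(L)
    return np.concatenate([C[iu].real, C[iu].imag])           # length L*(L+1)
```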

Feed-Forward Neural Network (FNN)
Two-layer network; classification with classes $r_k$, $k = 1, \dots, K$.
Activation functions: Layer 2: sigmoid, $\sigma(x)$; output layer: softmax.
Multiple-frequency inputs are used to increase the SNR.
Best error rate**:
Test-Data-1: 3% (freq. = 300:10:950 Hz, 1024 hidden nodes, 690 outputs, 10 snapshots)
Test-Data-2: 3% (freq. = 300:10:950 Hz, 1024 hidden nodes, 138 outputs, 5 or 20 snapshots)
**MAPE error, with $R_{pi}$ the predicted range and $R_{gi}$ the ground-truth range:
$E_{\mathrm{MAPE}} = \frac{100}{N}\sum_{i=1}^{N}\frac{\lvert R_{pi} - R_{gi}\rvert}{R_{gi}}$
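
As a quick check of the metric, here is a minimal NumPy sketch of the MAPE computation (the function name and example values are illustrative):

```python
import numpy as np

def mape(r_pred, r_true):
    """Mean absolute percentage error (%) between predicted and ground-truth ranges."""
    r_pred, r_true = np.asarray(r_pred, float), np.asarray(r_true, float)
    return 100.0 / len(r_true) * np.sum(np.abs(r_pred - r_true) / r_true)

# Example: predictions of 1.0 and 2.2 km against truths of 1.1 and 2.0 km -> about 9.5%
print(mape([1.0, 2.2], [1.1, 2.0]))
```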

Multiple Frequencies: FNN
(a)-(c) Test-Data-1, (d)-(f) Test-Data-2. From top to bottom: 550 Hz, 950 Hz, and 300-950 Hz with 10 Hz increments.

Other parameters: FNN, Test-Data-1
(a)-(c) Varying the number of classes (output nodes): 138, 690, 14 outputs.
(d)-(f) Varying the number of snapshots (stacked CSDMs): 1, 5, 20 snapshots.

Signal-to-Noise Ratio: FNN
SNR affects any algorithm's ability to localize a source. Source localization on simulated data with added white noise at SNR: (a) -10 dB, (b) -5 dB, (c) 0 dB, (d) 5 dB. Using multiple frequencies and more snapshots also increases the SNR indirectly.

Support Vector Machine (SVM)
A hyperplane maximally separates the (possibly overlapping) classes.
[Figure: a 2-class, 2-D example with no overlap, showing the support vectors, the separating hyperplane, and the margin boundaries.]
Acoustic source localization has ~138 classes and 17,952 input dimensions!

Support Vector Machine (SVM)
Gaussian radial basis function (RBF) kernel: $k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\tfrac{1}{K}\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right)$, where $K$ sets the kernel width.
Best error rate**: Test-Data-1 (left): 2%, Test-Data-2 (right): 2%
**MAPE error
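
A minimal sketch of a multi-class RBF-kernel SVM on such feature vectors; scikit-learn is an assumption here (the talk does not name the SVM implementation), and the data arrays are random placeholders:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature matrix (N samples of length L*(L+1) = 272) and range-class labels
X_train = np.random.randn(200, 272)
y_train = np.random.randint(0, 138, size=200)

# In scikit-learn's parameterization, gamma plays the role of 1/K in k(x, x') = exp(-||x - x'||^2 / K)
clf = SVC(kernel='rbf', gamma=1.0 / 272, C=1.0)
clf.fit(X_train, y_train)
predicted_class = clf.predict(X_train[:5])
```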

Random Forest (RF)
The Gini index is used to find the optimal partition:
$H = \left(\frac{1}{n_m}\sum_{x_n \in X_m} I(t_n = \ell_m)\right)\left(1 - \frac{1}{n_m}\sum_{x_n \in X_m} I(t_n = \ell_m)\right)$
$I(t_n = \ell_m)$: indicator function
$\ell_m$: estimated class for region m
$t_n$: true label of a point
$n_m$: number of points in region m
Gini index: equivalently, the fraction of correctly estimated labels multiplied by the fraction of incorrectly estimated labels.
[Figure: 2-D example of a tree partition with splits $x_1 < 1.9$ and $x_2 < 4.6$, giving regions with estimated classes $\ell_1 = 1$, $\ell_2 = 1$, $\ell_3 = -1$.]
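
A tiny sketch of the Gini index for a single region, matching the "fraction correct times fraction incorrect" reading above (names and example values are illustrative):

```python
import numpy as np

def gini(labels, est_class):
    """Gini index of one region: fraction correctly labeled times fraction incorrectly labeled."""
    p = np.mean(np.asarray(labels) == est_class)
    return p * (1.0 - p)

# Example: a region with true labels [1, 1, -1, 1] and estimated class 1 -> 0.75 * 0.25 = 0.1875
print(gini([1, 1, -1, 1], 1))
```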

Random Forest (RF)
Bagging is used to avoid learning the noise in the data:
1. Grow a tree until any new region would contain fewer than 50 points (then stop).
2. Randomly initialize the model and run again. Since the learner is greedy, the trees likely won't match.
3. Average the label for each point across all B trees:
$\hat{f}_{\mathrm{bag}}(x_n) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}_{\mathrm{tree},b}(x_n)$
where $\hat{f}_{\mathrm{tree},b}(x_i)$ is the estimated class of $x_i$ for the b-th tree.
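
A minimal bagged-trees sketch; scikit-learn's RandomForestClassifier is an assumed stand-in (it combines trees by voting/probability averaging rather than the plain label average written above), and the stopping rule is approximated with min_samples_split:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data, as in the SVM sketch
X_train = np.random.randn(200, 272)
y_train = np.random.randint(0, 138, size=200)

rf = RandomForestClassifier(n_estimators=100,      # B trees, combined at the end
                            min_samples_split=50,  # roughly "stop when a region gets small"
                            random_state=0)
rf.fit(X_train, y_train)
predicted_class = rf.predict(X_train[:5])
```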

Random Forest (RF)
Best error rate**: Test-Data-1 (left): 3%, Test-Data-2 (right): 2%
**MAPE error

Regression vs. Classification
Replace the classification error cost function with the mean squared (or absolute) error:
FNN: $E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\lVert y(\mathbf{x}_n, \mathbf{w}) - r_n\rVert^{2}$
SVM ($\varepsilon$-insensitive loss): $E(y_n - r_n) = \begin{cases} 0, & \lvert y_n - r_n\rvert < \varepsilon \\ \lvert y_n - r_n\rvert - \varepsilon, & \text{otherwise} \end{cases}$
RF: $H = \sum_{x_n \in X_m} (\ell_m - r_n)^2$, with $\ell_m = \frac{1}{n_m}\sum_{x_n \in X_m} r_n$

Regression
Left: FNN results for (a)-(c) Test-Data-1 and (d)-(f) Test-Data-2; top to bottom: 1, 2, and 3 hidden layers.
Right: SVM (top) and RF (bottom) results for (a)-(b) Test-Data-1 and (c)-(d) Test-Data-2.

Matched-Field Processing
Matched-field processing (MFP) is a popular method in underwater acoustics.
Maximize $\left|\sum_i a_i^{*} x_i\right|^{2}$ over candidate source positions, where $a_i$ is the replica and $x_i$ is the data at the i-th receiver.
The replicas $a_i$ are generated by a realistic physical model (e.g., by solving the wave equation), which requires knowing the environment quite well.
$L_2$ or $L_1$ penalties can be added ($L_1$ promotes sparsity).
Adaptive solutions make assumptions about the noise in order to suppress it.
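
A minimal sketch of the Bartlett-style processor described above, assuming the replicas have already been computed by a propagation model (names and shapes are illustrative):

```python
import numpy as np

def bartlett(replicas, data):
    """replicas: (n_candidates, L) modeled pressure fields; data: (L,) measured field.
    Returns the normalized Bartlett power |a^H x|^2 for each candidate source position."""
    a = replicas / np.linalg.norm(replicas, axis=1, keepdims=True)
    x = data / np.linalg.norm(data)
    return np.abs(a.conj() @ x) ** 2

# The candidate with the largest output is the MFP estimate; sidelobes in this
# ambiguity surface are what degrade performance at low SNR or with model mismatch.
```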

Matched-Field Processing
The properties of sound propagation in the ocean lead to peak ambiguities, or sidelobes, which degrade performance.

Preliminary results from SBCEx16

Part II: How to use Python for machine learning code

FNN in TensorFlow
1. Read in the datasets:
- define a function (not used here)
- load the training and test sets
- fix dimensions to match the TensorFlow input format
- make sure the dimensions look right
- assign everything to a data_sets object
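
A hypothetical sketch of this loading step; the .mat file format, the 'X'/'Y' variable names, and the input length are assumptions, not the code shown on the slide:

```python
import numpy as np
import scipy.io as sio

class DataSets(object):
    """Simple container for the training and test arrays."""
    pass

def read_data_sets(train_file, test_file, n_input=272):
    data_sets = DataSets()
    train, test = sio.loadmat(train_file), sio.loadmat(test_file)
    # Fix dimensions so each row is one sample, as TensorFlow expects: (N, n_input)
    data_sets.train_x = np.asarray(train['X'], np.float32).reshape(-1, n_input)
    data_sets.train_y = np.asarray(train['Y'], np.float32)
    data_sets.test_x = np.asarray(test['X'], np.float32).reshape(-1, n_input)
    data_sets.test_y = np.asarray(test['Y'], np.float32)
    # Make sure the dimensions look right
    print(data_sets.train_x.shape, data_sets.train_y.shape)
    return data_sets
```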

FNN in TensorFlow
Properties of the data_sets object

FNN in TensorFlow
How to define the mini-batch update: do this within the data_sets object declaration.
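
A sketch of a mini-batch method, modeled on the next_batch idiom from the TensorFlow MNIST tutorial; the class and attribute names are assumptions:

```python
import numpy as np

class DataSet(object):
    """Holds one split (x, y) and serves shuffled mini-batches."""
    def __init__(self, x, y):
        self._x, self._y = x, y
        self._index = 0

    def next_batch(self, batch_size):
        # Reshuffle and restart once an epoch is exhausted
        if self._index + batch_size > len(self._x):
            perm = np.random.permutation(len(self._x))
            self._x, self._y = self._x[perm], self._y[perm]
            self._index = 0
        start = self._index
        self._index += batch_size
        return self._x[start:self._index], self._y[start:self._index]
```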

FNN in TensorFlow
Import libraries:
- prevents Python from getting confused
- import a separate file to load the data
- here we use MatPlotLib
- used when looping over files in a directory
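
A plausible import block for this workflow; only TensorFlow and MatPlotLib are named on the slide, so the remaining modules (and the reading of "prevents Python from getting confused" as the __future__ imports) are assumptions:

```python
from __future__ import absolute_import, division, print_function  # keeps Python 2/3 behaviour consistent

import os                        # used when looping over files in a directory
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt  # MatPlotLib for plotting the localization results
import input_data                # hypothetical separate file that loads the training/test data
```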

FNN in TensorFlow
TensorFlow uses flags to keep track of model parameters. It also uses Interactive Sessions, which update variables dynamically when run.
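
A TensorFlow 1.x sketch of both features; the specific flag names and default values are assumptions:

```python
import tensorflow as tf

# Flags hold the model parameters in one place
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_integer('n_hidden', 1024, 'Number of hidden nodes.')
flags.DEFINE_integer('batch_size', 100, 'Mini-batch size.')
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')

# An InteractiveSession installs itself as the default session, so variables and
# tensors can be evaluated (and updated) on the fly while the script runs.
sess = tf.InteractiveSession()
```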

FNN in TensorFlow
First, load the data and define the neural network architecture. (We put this inside the function train_and_prediction() so we can process multiple files.) (You could define the function with more inputs, like n_hidden, to try different architectures.)
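
A skeleton of such a driver function; only the name train_and_prediction() and the n_hidden idea come from the slide, the rest is assumed and builds on the loading sketch above:

```python
def train_and_prediction(train_file, test_file, n_hidden=1024):
    """Load one training/test pair, build and train a network, and predict (skeleton)."""
    data_sets = read_data_sets(train_file, test_file)    # loading sketch shown earlier
    n_input = data_sets.train_x.shape[1]                  # e.g. L*(L+1) per frequency
    n_classes = data_sets.train_y.shape[1]                # number of range classes
    # build the graph, run training, and return predictions (see the following slides)
    return n_input, n_classes, n_hidden
```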

FNN in TensorFlow
Second, define your variables. These include: weights, biases, the random initialization of the weights and biases, the cost function, the optimization method, and performance metrics (e.g., mean squared error).
Example: define a weight variable. Other types: constant, weight_variable, and bias_variable. See the TensorFlow documentation for more.
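
A sketch of the weight_variable and bias_variable helpers in the style of the TensorFlow 1.x tutorials; the exact initializers are assumptions:

```python
import tensorflow as tf

def weight_variable(shape):
    """Weights drawn from a small truncated normal to break symmetry."""
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    """Biases initialized to a small positive constant."""
    return tf.Variable(tf.constant(0.1, shape=shape))
```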

FNN in TensorFlow
Third, define a neural network layer. As we saw in previous lectures, we usually use a sigmoid activation function for hidden layers and softmax or sigmoid for output layers. You can always find more in the online TensorFlow documentation.
- preallocate weights and biases
- compute the pre-activation
- apply the activation function (if any)
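
A sketch of such a layer function, continuing the helpers above; the name nn_layer and its argument list are assumptions:

```python
import tensorflow as tf

def nn_layer(x, n_in, n_out, act=tf.nn.sigmoid):
    W = weight_variable([n_in, n_out])   # preallocate weights
    b = bias_variable([n_out])           # preallocate biases
    preactivate = tf.matmul(x, W) + b    # compute the pre-activation
    # Apply the activation function (if any); a linear output layer passes act=None
    return act(preactivate) if act is not None else preactivate
```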

FNN in TensorFlow
Fourth, define the neural network layer architecture. A dropout layer removes hidden nodes with probability 50% during training to prevent overfitting.
Special TensorFlow feature: softmax is built into the error function (softmax_cross_entropy_with_logits).
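
A sketch of the resulting architecture, continuing the earlier sketches and using the layer sizes quoted in Part I (272 inputs per frequency, 1024 hidden nodes, 690 output classes); the optimizer choice (Adam) and learning rate are assumptions:

```python
# Placeholders: per-frequency input of length L*(L+1) = 272 and 690 one-hot range classes
x = tf.placeholder(tf.float32, [None, 272])
y_ = tf.placeholder(tf.float32, [None, 690])
keep_prob = tf.placeholder(tf.float32)        # 0.5 during training, 1.0 at test time

hidden = nn_layer(x, 272, 1024, act=tf.nn.sigmoid)
dropped = tf.nn.dropout(hidden, keep_prob)    # dropout layer to prevent overfitting
logits = nn_layer(dropped, 1024, 690, act=None)

# Softmax is built into the error function, so the output layer stays linear here
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-3).minimize(cross_entropy)
```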

FNN in TensorFlow
Finally, train and predict! All variables must be initialized first; then run the Interactive Session we defined earlier.
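
A sketch of the training and prediction loop, continuing the earlier sketches; the number of steps, batch size, and the use of argmax for the predicted class are assumptions:

```python
# All variables must be initialized before training
sess.run(tf.global_variables_initializer())

train_data = DataSet(data_sets.train_x, data_sets.train_y)  # mini-batch object from earlier
for step in range(5000):
    batch_x, batch_y = train_data.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.5})

# Predict on the test data with dropout disabled (keep_prob = 1.0)
predicted_class = sess.run(tf.argmax(logits, 1),
                           feed_dict={x: data_sets.test_x, keep_prob: 1.0})
```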

Next step, deep learning?
(Photo credits: Kris Bouchard, Berkeley Lab Computing Sciences; Roy Kaltschmidt.)