CS 229 FINAL PROJECT: STRUCTURE PREDICTION OF OPTICAL FUNCTIONAL DEVICES WITH DEEP LEARNING

KAI ZHANG, SHIYU LIU, AND QING WANG

Nomenclature
x    Output electric field, which is the input to the neural network
y    Si structure, which is the output of the neural network
H_l  Number of neurons in layer l
m    Size of the data set
n    Size of the Si structure
p    Number of layers in the neural network
r    Size of the discretized electric field signal

1. Introduction

Improving the efficiency of optical devices is important in a variety of applications. However, this process is typically done empirically. In this project, we formulate a systematic way to improve optical device design with machine learning. The optical device we focus on is a silicon (Si) nano-structure. The device is constructed from a series of Si nano-cells, represented as the {0, 1} binary series in Fig. 1, where 1 and 0 indicate the presence and absence of Si at a particular location. A beam of light incident perpendicular to the Si structure excites the surrounding electric field. In this project, we use the output electric field to predict its corresponding Si structure.

Figure 1. Schematic of the 1D Si structure and the simulation procedure: incident light strikes the Si structure (encoded as a binary series, e.g. 1 1 0 1 0 0 1 0 1 1) and the output electric field is recorded.

Note that the projection from the input Si structure to the output electric field follows the underlying physical law governed by Maxwell's equations, which is a full-rank operation, so the inverse problem is well defined. Therefore, with an appropriate machine learning technique, we can predict the Si structure from the output electric field. To address the non-linear relation between the Si structure and the output electric field, we use deep learning as our training algorithm. At the same time, this approach allows us to reduce the dimensionality of the underlying physics. On the other hand, although a Si structure represented by an R^{1×n} vector can take 2^n combinations, a functional optical device almost always follows specific patterns, so the number of output labels in the neural network is confined. Therefore, by limiting our scope to Si structures with physical functionalities, we can obtain a converged solution with a finite number of data sets.

Figure 2. Neural network with two hidden layers: the input layer x ∈ R^r is mapped through hidden layers z^{(1)} ∈ R^{H_1} and z^{(2)} ∈ R^{H_2} by (f_1, W_1, b_1) and (f_2, W_2, b_2), and through (f_o, W_o, b_o) to the output ŷ ∈ {−1, +1}^n.

The structure of this report is as follows. A detailed description of the neural network used in this project is provided in Section 2. Section 3 describes the simulation procedure and the preprocessing of the training data. The training results are discussed in Section 4.

2. Neural networks

In deep learning, the input data is processed through a neural network constructed from multiple layers of computational units. An example of a four-layer neural network is shown in Fig. 2. This network consists of one input layer, one output layer, and two hidden layers. Data propagates from the input layer to the output layer through transformation functions between each pair of layers. The dimensions of the hidden layers determine the complexity of the parameterization of the neural network.

The propagation of data through the network is characterized by a feed-forward process. Specifically, each layer takes a linear transformation of the data from the previous layer and processes it with an activation function. At layer l, we can express the feed-forward process as

(1) z^{(l)} = f_l\left( W_l z^{(l-1)} + b_l \right), \quad l \in \{1, \dots, p\},

where z^{(l)} is the transformed data, W_l and b_l are the weights and biases respectively, and f_l is the activation function at layer l. In the first hidden layer, the input data is z^{(0)} = x, the input to the neural network. For a layer of size H_l, we have W_l ∈ R^{H_l × H_{l-1}} and b_l ∈ R^{H_l}. In addition, in this project we use the rectified linear unit (ReLU) as the activation function at all layers, defined as

(2) f_l(z) = \max(0, z), \quad l \in \{1, \dots, p\},

where p is the total number of layers in the neural network. The connection between the last hidden layer and the output layer gives the prediction of the output data. Because the output data is a vector of size n, we represent the prediction function by a linear transformation:

(3) \hat{y} = W_o z^{(p)} + b_o.

Because the output vector in this project is binary, we obtain our predictions as

(4) \hat{y}_j = \begin{cases} +1, & \hat{y}_j > 0 \\ -1, & \hat{y}_j \le 0. \end{cases}
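As a concrete illustration of Eqs. (1)-(4), the NumPy sketch below implements the feed-forward pass and the sign thresholding. The hidden-layer sizes, the random initialization, and the helper names (init_layers, forward, predict) are illustrative placeholders, not the settings used in our experiments.

import numpy as np

def relu(z):
    # Eq. (2): rectified linear unit applied elementwise.
    return np.maximum(0.0, z)

def init_layers(sizes, rng):
    # sizes = [r, H_1, ..., H_p, n]; returns one (W_l, b_l) pair per transition.
    return [(rng.normal(scale=0.1, size=(h_out, h_in)), np.zeros(h_out))
            for h_in, h_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    # Eq. (1): z^(l) = f_l(W_l z^(l-1) + b_l), with z^(0) = x.
    z = x
    for W, b in layers[:-1]:
        z = relu(W @ z + b)
    W_o, b_o = layers[-1]
    # Eq. (3): linear output layer.
    return W_o @ z + b_o

def predict(y_hat):
    # Eq. (4): threshold the linear output into a {-1, +1} structure.
    return np.where(y_hat > 0, 1, -1)

# Illustrative dimensions (placeholders): r = 102 input features,
# two hidden layers of 64 units, n = 20 output cells.
rng = np.random.default_rng(0)
layers = init_layers([102, 64, 64, 20], rng)
x = rng.normal(size=102)            # one discretized electric-field sample
structure = predict(forward(x, layers))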

The cost function of the neural network is the mean squared error (MSE):

(5) J(W, b; x, y) = \frac{1}{m} \sum_{i=1}^{m} \left\| y^{(i)} - \hat{y}^{(i)} \right\|_2^2.

The parameters are trained by solving the following optimization problem:

(6) \min_{W_l, b_l,\; l \in \{1, \dots, p\}} J(W, b; x, y).

We solve this problem using stochastic gradient descent, with gradients computed by back-propagation. The training and testing errors are defined as

(7) \epsilon_{\mathrm{train/test}} = \frac{\sum_{i=1}^{m} \left\| \hat{y}^{(i)} - y^{(i)} \right\|_2^2}{\sum_{i=1}^{m} \left\| y^{(i)} \right\|_2^2}.

This error is used to characterize the convergence of the learning process at each iteration. In addition, to quantify the accuracy of the Si structure prediction, we define the prediction error as

(8) \epsilon_{\mathrm{prediction}} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} 1\{\hat{y}_j^{(i)} \neq y_j^{(i)}\}.

We use this error as a metric for the performance of the learning algorithm.

3. Problem setup and data acquisition

We obtain the data for learning from simulations of the optic-electric interaction using rigorous coupled-wave analysis (RCWA) software. In the simulations, the refractive index of Si is n_Si = 3.5 and the refractive index of vacuum is n_vacuum = 1. Light with wavelength λ = 1 µm illuminates the Si structure perpendicularly from the top. For each Si structure randomly generated by the RCWA software, the corresponding electric field is recorded. Figures 3(a) and 3(b) show an example of a Si structure and the magnitude of its output electric field. In Fig. 3(a), Si is shown in dark red and vacuum in dark blue. In Fig. 3(b), the electric field above the Si structure is excited by the reflected light and the electric field below the Si structure is excited by the transmitted light.

For learning, we collected 250000 data sets from the simulations. We use 80% of the data sets as the training set and the remaining 20% as the test set. The inputs for learning are the electric fields retrieved at two specific vertical locations, represented by x^{(i)} ∈ C^{1×51}. In the first case, the inputs are taken in the near field, 0.1λ below the Si structure; in the second case, the inputs are taken in the far field, 10λ below the Si structure. Note that x is complex because the electric field signal contains both magnitude and phase information. To avoid computations with complex numbers, we treat the magnitude and phase of the inputs separately. Therefore, the inputs used in the learning process are represented by x^{(i)} ∈ R^{1×102}. The outputs of learning are the binary vectors of the Si structure, y^{(i)} ∈ {+1, −1}^n, where n = 20.
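The magnitude/phase split, the train/test split, and the error definitions of Eqs. (7)-(8) translate directly into a few lines of NumPy, as in the minimal sketch below. The array names E and Y, the fixed seeds, and the tiny synthetic example at the end are assumptions made for illustration only.

import numpy as np

def field_to_features(E):
    # Split each complex field sample in C^{1x51} into magnitude and phase,
    # giving a real feature vector in R^{1x102}.
    return np.concatenate([np.abs(E), np.angle(E)], axis=1)

def split_train_test(X, Y, train_frac=0.8, seed=0):
    # 80% / 20% train-test split over a shuffled index.
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], Y[tr], X[te], Y[te]

def relative_error(Y_hat, Y):
    # Eq. (7): squared-error norm of the predictions relative to the labels.
    return np.sum((Y_hat - Y) ** 2) / np.sum(Y ** 2)

def prediction_error(Y_hat, Y):
    # Eq. (8): fraction of mispredicted Si cells, after the Eq. (4) threshold.
    return np.mean(np.where(Y_hat > 0, 1, -1) != Y)

# Synthetic placeholder data: m = 4 samples here (250000 in the project).
E = np.exp(1j * np.random.default_rng(1).uniform(0, 2 * np.pi, size=(4, 51)))
Y = np.where(np.random.default_rng(2).random((4, 20)) > 0.5, 1, -1)
X = field_to_features(E)                      # shape (4, 102)
X_tr, Y_tr, X_te, Y_te = split_train_test(X, Y)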

Figure 3. (a) 1D Si structure aligned along the x-direction at height z = 1000; and (b) the corresponding output electric field for light normally incident on the Si structure.

4. Training outcome and discussion

The training curves for the near-field and far-field learning are shown in Figs. 4(a) and 4(b). In both cases, the results converge in about 100 iterations. The test error is slightly greater than the training error. However, the errors for the near-field learning, 34.9% for training and 35.4% for testing, are more than two-fold lower than the errors for the far-field learning, 86.8% for training and 86.9% for testing. The corresponding prediction errors for the near-field learning are 3.5% for training and 3.6% for testing, and the prediction errors for the far-field learning are 29.5% for training and 29.6% for testing. The degradation in performance as the data collection location moves away from the Si structure is a consequence of the decay in optical intensity of the transmitted light. Information carried by the transmitted light dissipates as the light travels. This indicates that the near-field electric field provides a better metric than the far field for characterizing the Si structure.

Figure 4. Learning curves (error in % versus iterations) with the two-layer neural network for: (a) near-field electric field training; (b) far-field electric field training.
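For completeness, the sketch below shows one way the mini-batch SGD/back-propagation loop behind curves like those in Fig. 4 could be written in NumPy, reusing the (W_l, b_l) layer representation from the sketch in Section 2. The learning rate, batch size, and iteration count are illustrative placeholders, not the settings used to produce our results.

import numpy as np

def sgd_train(X, Y, layers, lr=1e-3, batch=128, iters=400, seed=0):
    """Minimize the MSE cost of Eq. (5) by mini-batch SGD with back-propagation."""
    rng = np.random.default_rng(seed)
    history = []
    for it in range(iters):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        A, acts, pre = X[idx], [X[idx]], []
        # Forward pass through the ReLU hidden layers (Eqs. (1)-(2)).
        for W, b in layers[:-1]:
            Z = A @ W.T + b
            pre.append(Z)
            A = np.maximum(0.0, Z)
            acts.append(A)
        W_o, b_o = layers[-1]
        Y_hat = A @ W_o.T + b_o                      # Eq. (3)
        # Gradient of the mean squared error (Eq. (5)) w.r.t. the output.
        G = 2.0 * (Y_hat - Y[idx]) / len(idx)
        grads = [(G.T @ acts[-1], G.sum(axis=0))]
        G = G @ W_o
        # Back-propagate through the hidden layers.
        for l in range(len(layers) - 2, -1, -1):
            G = G * (pre[l] > 0)                     # ReLU derivative
            grads.append((G.T @ acts[l], G.sum(axis=0)))
            G = G @ layers[l][0]
        grads = grads[::-1]
        # SGD parameter update.
        layers[:] = [(W - lr * dW, b - lr * db)
                     for (W, b), (dW, db) in zip(layers, grads)]
        # Track the relative error of Eq. (7) on this mini-batch.
        history.append(np.sum((Y_hat - Y[idx]) ** 2) / np.sum(Y[idx] ** 2))
    return history

# Example usage (with the placeholder helpers from the earlier sketches):
# layers = init_layers([102, 64, 64, 20], np.random.default_rng(0))
# errors = sgd_train(X_tr, Y_tr, layers)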

To quantify the far-field efficiency of the prediction, we pre-define an output electric field and apply the parameters obtained from the far-field learning to predict its corresponding Si structure. The comparison of the far-field efficiency of the 45° transmitted light between the predicted Si structure and the optimal structure is shown in Fig. 5. Mispredictions in the magnitude and phase of the electric field are observed in Fig. 5. The efficiency associated with the predicted Si structure is 43.4%. To correct for potentially mispredicted cells in the predicted structure, we apply a local search that alters each element of the predicted Si structure one at a time. With this approach, the efficiency improves to 51.2%. Note that the maximum efficiency over all possible 20-cell Si structures is 69.0%. This shows that the deep learning procedure provides a reliable method for predicting the Si structure corresponding to a prescribed electric field.

Figure 5. Comparison of the 45° deflected transmitted-light electric field of the predicted Si structure and the ideal 100%-efficiency electric field.

5. Conclusion

In this project, we implemented a two-layer neural network to study the relation between optical Si structures and their output electric fields. We trained two neural networks using near-field and far-field electric field data obtained from the RCWA software. The size of the data set used for learning is 250000. In both setups, the results converge in about 100 iterations. The prediction accuracy using the near-field data reached 96.5%, and the accuracy using the far-field data is 70.4%. With the local search refinement, the predicted Si structure achieves an efficiency of 51.2%, against the 69.0% maximum possible efficiency for the 45° transmitted light. Compared to the traditional optical-device design process, the deep learning approach significantly improves the effectiveness of this process without explicitly modeling the convoluted underlying physics. To further improve the method, in future work we will test different cost functions for training so that the binary nature of the Si structure is better represented. The depth of the neural network and the dimensions of the parameters will also be adjusted to optimize the results.
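As a closing illustration of the single-flip refinement described in Section 4, the sketch below reads it as a greedy one-sweep local search. The efficiency(structure) callable is a hypothetical stand-in for the far-field efficiency evaluation, which in practice is obtained from the RCWA simulation.

def local_refine(structure, efficiency):
    # Greedy single-sweep refinement: try flipping each Si cell once and keep
    # the flip whenever it increases the far-field efficiency.
    # `structure` is a {-1, +1} vector; `efficiency` is a hypothetical callable
    # returning the far-field efficiency of a candidate structure.
    best = structure.copy()
    best_eff = efficiency(best)
    for j in range(len(best)):
        candidate = best.copy()
        candidate[j] = -candidate[j]          # flip one cell: +1 <-> -1
        eff = efficiency(candidate)
        if eff > best_eff:
            best, best_eff = candidate, eff
    return best, best_eff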