Neural Network Tutorial & Application in Nuclear Physics. Weiguang Jiang ( 蒋炜光 ) UTK / ORNL

1 Neural Network Tutorial & Application in Nuclear Physics Weiguang Jiang ( 蒋炜光 ) UTK / ORNL

2 Machine Learning: Logistic Regression, Gaussian Processes, Neural Networks, Support Vector Machines, Random Forests, Genetic Algorithms.

3 Machine Learning Establish a function. Image Recognition: f( ) = Panda. Playing Go: f( ) = 3-4 (next move). Extrapolation: f( ) = g.s. energy (MeV).

4 Machine Learning Framework. Image Recognition: f( ) = Panda. Model f: a complex function with lots of parameters (a black box). Candidate functions: f1( ) = Panda, f1( ) = Dog; f2( ) = Panda, f2( ) = cat.

5 Machine Learning Supervised Learning. A set of functions: f1( ) = Panda, f1( ) = Dog; f2( ) = Panda, f2( ) = cat. The goodness of a function f is judged on training data. Supervised learning: each function input (an image) is paired with the desired function output (a label such as bird, Timon, or dog).

6 Machine Learning Supervised Learning. A set of functions f1, f2, f3, ... Training: judge the goodness of each function f on the training data (images labelled bird, Timon, dog) and find the best function f. Testing: use f on new inputs (e.g. dog).

7 Three Steps for Machine Learning Step 1: Define a set of functions. Step 2: Evaluate the functions. Step 3: Choose the best function.

8 Three Steps for Machine Learning Step 1: Neural Network Step 2: Evaluate the function Step 3: Choose the best function

9 Neural Network & Machine Learning. [Figure: a sample feedforward neural network (NN). A 128 x 128 x 3 color image feeds the input-layer neurons; weighted connections lead through the hidden layer(s) to the output-layer neurons, which identify the image as a panda.]

10 Feedforward Neural Network. Each neuron computes a weighted sum of the previous layer, z_1 = x_1 w_1 + ... + x_n w_n + bias, and passes it through an activation function σ(z) to produce its output y_1. Feedforward: the value of every neuron depends only on the previous layer.
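A minimal numerical sketch of this single-neuron computation (NumPy is assumed; the inputs, weights, and bias below are made-up placeholders):

```python
import numpy as np

# z = x1*w1 + ... + xn*wn + bias, then apply the activation sigma(z)
def neuron(x, w, bias):
    z = np.dot(x, w) + bias            # weighted sum over the previous layer
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation sigma(z)

x = np.array([0.5, -1.2, 3.0])         # hypothetical inputs x1..x3
w = np.array([0.1, 0.4, -0.2])         # hypothetical weights w1..w3
print(neuron(x, w, bias=0.05))         # output y1 of this neuron
```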

11 Activation functions give non-linearity to the NN. Common choices: the sigmoid, σ(z) = 1/(1 + e^(-z)), and the ReLU, σ(z) = max(0, z).
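The two activation functions named on the slide, written out as plain functions (a sketch, not tied to any particular library):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}): squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU(z) = max(0, z): zero for negative inputs, linear otherwise
    return np.maximum(0.0, z)
```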

12 Neural Network Function. Input layer (X): i neurons; one hidden layer: j neurons; output layer (Y): k neurons. Z_j = Σ_i X_i W_ij + b_j, X_j = σ(Z_j), Y_k = Σ_j X_j W_jk + b_k.

13 Tensor operation. Training data: p samples; input layer (X): i neurons; one hidden layer: j neurons; output layer (Y): k neurons. Stacking the samples row-wise, Z_(p,j) = X_(p,i) W_(i,j) + b_(p,j), where the bias row is repeated for every sample, and the hidden activations are σ(Z)_(p,j). The output layer is the same operation again: Y_(p,k) = σ(Z)_(p,j) W_(j,k) + b_(p,k).
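The same tensor operation in code: with p samples stacked row-wise, the hidden and output layers are two matrix multiplications (a sketch with randomly initialized, hypothetical weights and layer sizes):

```python
import numpy as np

p, i, j, k = 32, 6, 10, 1              # samples, input, hidden, output sizes (made up)
rng = np.random.default_rng(0)

X = rng.normal(size=(p, i))            # training inputs, shape (p, i)
W1, b1 = rng.normal(size=(i, j)), np.zeros(j)
W2, b2 = rng.normal(size=(j, k)), np.zeros(k)

Z = X @ W1 + b1                        # Z_(p,j) = X_(p,i) W_(i,j) + b_j
H = 1.0 / (1.0 + np.exp(-Z))           # hidden activations sigma(Z)
Y = H @ W2 + b2                        # Y_(p,k) = sigma(Z)_(p,j) W_(j,k) + b_k
print(Y.shape)                         # (32, 1)
```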

14 Three Steps for Machine Learning Step 1: Neural Network Step 2: Evaluate the function Step 3: Choose the best function

15 Evaluate a network. Image Recognition: introduce a loss function to describe the performance of the network (mse, cross entropy). For each training input x, the NN gives a prediction ŷ = f(x), which is compared with the supervised target y (e.g. a 0/1 label vector). Total loss: L = Σ_{p=1}^{P} l_p, the smaller the better. mse: l_p = Σ_k (ŷ_k − y_k)² / k.
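The mean-squared-error loss from the slide, written out for a batch of predictions (a sketch; y_pred and y_true are hypothetical arrays):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # l_p = sum_k (y_pred_k - y_true_k)^2 / k, then averaged over the batch
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # supervised targets
y_pred = np.array([[0.8, 0.1], [0.3, 0.7]])   # network outputs
print(mse_loss(y_pred, y_true))
```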

16 Three Steps for Machine Learning Step 1: Neural Network Step 2: Evaluate the function Step 3: Choose the best function

17 Learning: find the best function. Ultimate goal: find the set of network parameters that minimizes the total loss L. Gradient Descent (even for AlphaGo): pick an initial value for the network parameters w; compute ∂L/∂w with the training data; update the parameters, w ← w − η ∂L/∂w; repeat until ∂L/∂w is small enough. This procedure is what we call "learning" in machine learning.
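The procedure above as a sketch, using a toy quadratic loss so the gradient can be written down directly (the learning rate η and starting point are arbitrary choices):

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent.
w = 0.0          # initial value for the parameter
eta = 0.1        # learning rate
for step in range(100):
    grad = 2.0 * (w - 3.0)   # dL/dw for this toy loss
    if abs(grad) < 1e-6:     # stop when the gradient is small enough
        break
    w -= eta * grad          # update: w <- w - eta * dL/dw
print(w)                     # close to the minimum at w = 3
```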

18 Backpropagation. An efficient way to compute ∂L/∂w. Z_j = Σ_i X_i W_ij + b_j, X_j = σ(Z_j), Y_k = Σ_j X_j W_jk + b_k; mse: l_p = Σ_k (ŷ_k − y_k)² / k. Function signals flow forward through the network, error signals flow backward: backpropagation (BP).

19 Backpropagation. Hidden layer: Z_j = Σ_i X_i W_ij + b_j, X_j = σ(Z_j); ∂l/∂W = (∂l/∂Z) X and ∂l/∂b = ∂l/∂Z, where the sigmoid gives ∂X/∂Z = 1/(1+e^(-z)) (1 − 1/(1+e^(-z))), so ∂l/∂Z = (∂l/∂X) σ(z)(1 − σ(z)). Output layer: Y_k = Σ_j X_j W_jk + b_k; ∂l/∂W = (∂l/∂Y) X and ∂l/∂b = ∂l/∂Y. mse: l_p = Σ_k (ŷ_k − y_k)² / k gives ∂l/∂ŷ = (2/k)(ŷ − y).
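A compact sketch of these backpropagation formulas for one hidden layer with a sigmoid activation and an MSE loss (NumPy only; the shapes follow the slides, the data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))                 # 8 samples, 4 input neurons
T = rng.normal(size=(8, 1))                 # supervised targets
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)

# forward pass (function signals)
Z = X @ W1 + b1
H = 1.0 / (1.0 + np.exp(-Z))                # sigmoid hidden layer
Y = H @ W2 + b2
loss = np.mean((Y - T) ** 2)

# backward pass (error signals)
dY = 2.0 * (Y - T) / Y.size                 # dl/dy = 2/k * (y_hat - y), averaged over samples
dW2, db2 = H.T @ dY, dY.sum(axis=0)         # dl/dW = dl/dY * X,  dl/db = dl/dY
dH = dY @ W2.T
dZ = dH * H * (1.0 - H)                     # sigmoid derivative sigma(z)(1 - sigma(z))
dW1, db1 = X.T @ dZ, dZ.sum(axis=0)
print(loss, dW1.shape, dW2.shape)
```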

20 Optimizer. Gradient Descent is like walking in the desert blindfolded: you cannot see the whole picture. Common variants: SGD, Momentum, AdaGrad, Adam.
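SGD with momentum, the simplest of the variants listed above, only adds a velocity term to the plain update (a sketch; the hyperparameters and the gradient are placeholder values):

```python
import numpy as np

def momentum_step(w, grad, velocity, eta=0.01, beta=0.9):
    # The velocity accumulates past gradients, so the blindfolded walker
    # keeps moving through flat regions and shallow local minima.
    velocity = beta * velocity - eta * grad
    return w + velocity, velocity

w = np.zeros(3)
velocity = np.zeros_like(w)
grad = np.array([0.2, -0.5, 0.1])      # placeholder gradient dL/dw
w, velocity = momentum_step(w, grad, velocity)
print(w)
```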

21 Deep learning. Deep just means more hidden layers. Deep is better, but not too deep. Deep vs. wide.

22 Deep learning. Gradient vanishing/exploding problem: with n hidden layers, where w_q is the weight of the q-th hidden layer, a single layer gives ∂l/∂x = (∂l/∂y) w, so the gradient for layer q picks up the product of all later weights: ∂l/∂w_q = (∂l/∂y) w_n w_(n-1) w_(n-2) ... w_(q+1) x. If the weights are typically smaller (larger) than 1, this product vanishes (explodes) with depth.
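A quick numerical illustration of why that product of layer weights causes the problem (toy numbers, not a real network):

```python
import numpy as np

n = 30                                   # number of hidden layers
small, large = 0.5, 1.5                  # typical single-layer weight magnitudes
print(np.prod(np.full(n, small)))        # ~1e-9: the gradient vanishes
print(np.prod(np.full(n, large)))        # ~2e5 : the gradient explodes
```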

23 Overfitting. Training data and testing data can be different. Solutions: get more training data; create more training data; Dropout; L1/L2 regularization; Early Stopping.

24 Friendly tool: Keras (Python)
$ apt-get install python3-pip
$ pip3 install keras
$ pip3 install tensorflow
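Once Keras is installed, the three steps of slide 7 map onto a few lines. This is a minimal sketch on random placeholder data using the TensorFlow-backed keras module; the layer sizes, dropout rate, and training settings are arbitrary choices, not values from the slides:

```python
import numpy as np
from tensorflow import keras

# placeholder data: 1000 samples, 6 inputs, 1 target value
x_train = np.random.rand(1000, 6)
y_train = np.random.rand(1000, 1)

# Step 1: define a set of functions (the network)
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(6,)),
    keras.layers.Dropout(0.2),                # regularization against overfitting
    keras.layers.Dense(1),
])

# Step 2: evaluate the functions (mse loss)
model.compile(optimizer="adam", loss="mse")

# Step 3: choose the best function (gradient descent + early stopping)
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
model.fit(x_train, y_train, validation_split=0.2,
          epochs=200, batch_size=32, callbacks=[stop])
```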

25 Neural network simulation & extrapolation. NN application in nuclear physics: 4He ground-state energy. [Figure: NCSM results vs. ħω and the NN extrapolation.] * Negoita G. A., Luecke G. R., Vary J. P., et al. Deep Learning: A Tool for Computational Nuclear Physics. arXiv preprint, 2018.

26 Neural network simulation & extrapolation: 4He radius. [Figure: NCSM results vs. ħω and the NN extrapolation.]

27 Neural network simulation & extrapolation. The minimum g.s. energy for each Nmax drops exponentially with Nmax. * Forssén, C., Carlsson, B. D., Johansson, H. T., Sääf, D., Bansal, A., Hagen, G., & Papenbrock, T. (2018). Large-scale exact diagonalizations reveal low-momentum scales of nuclei. Physical Review C, 97(3).
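This exponential trend is what makes an extrapolation to Nmax → ∞ possible. One common way to check it is a fit of the form E(Nmax) = E_inf + a exp(−b Nmax); the sketch below illustrates such a fit with SciPy on placeholder numbers (the slides do not give the actual data or fitting details):

```python
import numpy as np
from scipy.optimize import curve_fit

def e_of_nmax(nmax, e_inf, a, b):
    # E(Nmax) = E_inf + a * exp(-b * Nmax): exponential approach to the converged value
    return e_inf + a * np.exp(-b * nmax)

nmax = np.array([4, 6, 8, 10, 12, 14])                             # model-space truncations
energy = np.array([-27.1, -27.8, -28.1, -28.25, -28.3, -28.32])    # placeholder g.s. energies (MeV)

params, _ = curve_fit(e_of_nmax, nmax, energy, p0=(-28.4, 5.0, 0.5))
print(params[0])                                                    # extrapolated E_inf
```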

28 Neural network for Coupled-cluster. CCSDT equations: we want to truncate the 3p3h configurations, so we introduce a NN to select the more important configurations.

29 Neural network for Coupled-cluster. The input layer includes all the quantum numbers of the 3p3h amplitudes (n, l, j, j_tot, t_z). [Figure: NN selection of 3p3h amplitudes for 8He and 16O.]

30 Neural network in nuclear matter. Train with different combinations of c_D and c_E. [Figure: results as a function of density.]

31 Neural network Uncertainty analysis. Even with the same input data and network structure, the NN will give different results in mutually independent trainings. Sources of uncertainty: 1. random initialization of the neural network parameters; 2. different divisions between training data and validation data; 3. data shuffling (with a limited batch size). Solutions: 1. ???; 2. k-fold cross validation; 3. increase the batch size.

32 Ensembles of Neural Networks: 4He radius. The distribution of the NN-predicted radius is Gaussian (training data Nmax = 4-12). Define the full width at half maximum as the uncertainty for a given NN structure. The NN uncertainty shrinks with more training data. The almost identical NN-predicted radii indicate that the NN has good generalization performance.
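The ensemble idea above amounts to training the same architecture many times from fresh random initializations and quoting the spread of the predictions. A sketch on placeholder data is below; the network, training settings, and data are stand-ins, not the setup used in the slides:

```python
import numpy as np
from tensorflow import keras

def build_and_train(x_train, y_train):
    # Fresh random initialization on every call -> mutually independent trainings.
    model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(x_train.shape[1],)),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x_train, y_train, epochs=50, verbose=0)
    return model

x_train = np.random.rand(200, 2)     # placeholder training data
y_train = np.random.rand(200, 1)
x_new = np.random.rand(10, 2)        # points where we want NN predictions

preds = np.array([build_and_train(x_train, y_train).predict(x_new, verbose=0)
                  for _ in range(10)])            # ensemble of 10 trainings
mean = preds.mean(axis=0)                         # central value of the ensemble
fwhm = 2.355 * preds.std(axis=0)                  # FWHM of a Gaussian = 2.355 * sigma
print(mean.ravel(), fwhm.ravel())
```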

33 More complex case: 4He g.s. energy, with two peaks. [Figure panels: training data Nmax 4-10, 4-12, 4-14.]

34 4He g.s. energy, with two peaks. [Figure panels: training data Nmax 4-16, 4-18, 4-20.]

35 We can separate the peaks using features (the loss) provided by the NN.
