Introduction to PyTorch


Introduction to PyTorch
Benjamin Roth
Centrum für Informations- und Sprachverarbeitung
Ludwig-Maximilian-Universität München
beroth@cis.uni-muenchen.de

Why PyTorch?

Relatively new (Aug. 2016?) Python toolkit based on Torch.
Overwhelmingly positive reception by the deep learning community; see e.g. http://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai/
Dynamic computation graphs: process complex inputs and outputs without having to convert every batch of input into one big tensor.
  E.g. sequences with different lengths, control structures, sampling.
Flexibility to implement low-level and high-level functionality.
Modularization uses object orientation.
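To illustrate the dynamic-graph point, a minimal sketch (not from the slides; the class name and shapes are made up): the forward pass contains an ordinary Python loop, so the graph is rebuilt for whatever sequence length the current input has.

import torch
import torch.nn as nn

class SumOverSteps(nn.Module):
    def __init__(self, num_features):
        super(SumOverSteps, self).__init__()
        self.linear = nn.Linear(num_features, 1)

    def forward(self, steps):
        # 'steps' is a plain Python list of tensors; its length may differ per example
        total = 0
        for x in steps:
            total = total + self.linear(x) ** 2
        return total

model = SumOverSteps(num_features=4)
out_short = model([torch.rand(1, 4) for _ in range(3)])  # sequence of length 3
out_long = model([torch.rand(1, 4) for _ in range(7)])   # same model, length 7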

Tensors

Tensors hold data. Similar to numpy arrays.

# Uninitialized Tensor with values from memory:
x = torch.Tensor(5, 3)
# Randomly initialized Tensor (values in [0..1]):
y = torch.rand(5, 3)
print(x + y)

Output:
 0.9404  1.0569  1.1124
 0.3283  1.1417  0.6956
 0.4977  1.7874  0.2514
 0.9630  0.7120  1.0820
 1.8417  1.1237  0.1738
[torch.FloatTensor of size 5x3]

In-place operations can increase efficiency: y.add_(x)

100+ Tensor operations: http://pytorch.org/docs/master/torch.html
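As a quick illustration of the in-place naming convention (trailing underscore), a minimal sketch; the values are arbitrary:

import torch

x = torch.ones(5, 3)
y = torch.rand(5, 3)
z = y.add(x)   # out-of-place: returns a new tensor, y is unchanged
y.add_(x)      # in-place: modifies y itself, avoiding an extra allocation
print(torch.equal(y, z))  # True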

Tensors <-> NumPy

import torch
a = torch.ones(5)
b = a.numpy()
print(b)

Output:
[ 1.  1.  1.  1.  1.]

import numpy as np
a = np.ones(3)
b = torch.from_numpy(a)
print(b)

Output:
 1
 1
 1
[torch.DoubleTensor of size 3]
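Both conversions share the underlying memory (on CPU), so an in-place change on one side is visible on the other. A minimal sketch:

import numpy as np
import torch

a = torch.ones(5)
b = a.numpy()
a.add_(1)            # in-place change on the torch side...
print(b)             # ...is visible in the numpy array: [ 2.  2.  2.  2.  2.]

c = np.ones(3)
d = torch.from_numpy(c)
np.add(c, 1, out=c)  # in-place change on the numpy side...
print(d)             # ...is visible in the torch tensor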

Automatic differentiation

torch.autograd package.
Central concept: the Variable class, a node in the function graph.
  Holds a value (Tensor): .data
  Most tensor operations are available for combining Variables.
  If a variable was created by functional composition (x = a + b), .grad_fn references the function (creator) that created the variable.
Gradient can be computed: .backward()
  Gradient with respect to a specific output: .backward(grad_output)

Automatic differentiation: Example

from torch.autograd import Variable

# Set requires_grad=True, if gradient is to be computed
x = Variable(3 * torch.ones(1), requires_grad=True)
y = x + 2*x**2
y.backward()

Value of x.grad?
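For reference (the answer is not spelled out on the slide): y = x + 2x^2, so dy/dx = 1 + 4x, which is 13 at x = 3.

print(x.grad)  # expect 13, i.e. dy/dx = 1 + 4x evaluated at x = 3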

Defining a neural network

A self-defined neural net should inherit from nn.Module.
torch.nn contains predefined layers: nn.Linear(input_size, output_size), nn.Conv2d(in_channels, out_channels, kernel_size), ...
Set layers as class attributes: all parameter Variables get automatically registered with the neural net (can be accessed by net.parameters()).
Functions without learnable parameters (torch.nn.functional) do not have to be registered as class attributes: relu(...), tanh(...), ...
Prediction needs to be implemented in net.forward(...).

class Net(nn.Module):
    def __init__(self, num_features, hidden_size):
        super(Net, self).__init__()
        # self.learnable_layer = ...

    def forward(self, x):
        return  # do prediction
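To make the distinction concrete, a minimal sketch (not from the slides; the class and attribute names are made up) of a net with one hidden layer: the two nn.Linear layers are registered as attributes, while relu is taken from torch.nn.functional and needs no registration.

import torch.nn as nn
import torch.nn.functional as F

class OneHiddenLayerNet(nn.Module):
    def __init__(self, num_features, hidden_size):
        super(OneHiddenLayerNet, self).__init__()
        # layers with learnable parameters: registered via attribute assignment
        self.hidden = nn.Linear(num_features, hidden_size)
        self.output = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # relu has no learnable parameters, so it is used as a plain function
        return self.output(F.relu(self.hidden(x)))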

Linear Regression

What is the layer and what are its learnable parameters? How is prediction done?

Linear Regression

import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, num_features):
        super(LinearRegression, self).__init__()
        self.linear_layer = nn.Linear(num_features, 1)

    def forward(self, x):
        return self.linear_layer(x)

Linear Regression: prediction for one instance (with untrained model)

x_instance: torch.FloatTensor of size 10 (num_features)
y_instance: torch.FloatTensor of size 1

In order to use tensors in the function graph, we need to wrap them in Variables!
Type of y_predicted?

num_features = 10
lr_model = LinearRegression(num_features)
x_var = Variable(x_instance)
#y_var = Variable(y_instance)
y_predicted = lr_model.forward(x_var)

Linear Regression: training the model

Loss function: define yourself, or use a pre-defined one.

loss = (y_var - y_predicted)**2

criterion = nn.MSELoss()
loss = criterion(y_var, y_predicted)

Training update: define yourself, or use a pre-defined optimizer.

loss.backward()
for f in lr_model.parameters():
    f.data.sub_(f.grad.data * 0.0001)

optimizer = optim.SGD(lr_model.parameters(), lr=0.0001)
...
loss.backward()
optimizer.step()

Note: Gradients are accumulated (added) in the Variables with each call of .backward(), so they need to be set to zero before the next gradient update. optimizer.zero_grad() sets the gradients of all network Variables to zero.

Linear Regression: training the model

lr_model = LinearRegression(num_features)
optimizer = optim.SGD(lr_model.parameters(), lr=0.0001)
criterion = nn.MSELoss()

for epoch in range(num_epochs):
    for x_instance, y_instance in data:
        x_var = Variable(x_instance)
        y_var = Variable(y_instance)
        y_pred = lr_model.forward(x_var)
        loss = criterion(y_var, y_pred)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Comments:
Here, we are using only 1 example at a time for our updates.
Instead of using plain SGD, there are better learning methods that can adapt their learning rate per parameter (e.g. Adam).

Question: step() does not take any arguments. How does it know which parameters to update?

Materials

http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
http://pytorch.org/tutorials/beginner/pytorch_with_examples.html
http://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html

Homework

Learn regression models for predicting house prices.
Derivation of the gradient for logistic regression.

Homework: Boston house prices prediction

Dataset: Harrison & Rubinfeld, 1978.
Predict the median house price (in 1000 USD) per district/town.
506 instances, 13 features.

Features:
CRIM     per capita crime rate by town
ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS    proportion of non-retail business acres per town
CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX      nitric oxides concentration (parts per 10 million)
RM       average number of rooms per dwelling
AGE      proportion of owner-occupied units built prior to 1940
DIS      weighted distances to five Boston employment centres
RAD      index of accessibility to radial highways
TAX      full-value property-tax rate per 10,000 USD
PTRATIO  pupil-teacher ratio by town
B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT    % lower status of the population

Homework: Boston houses prediction

Linear regression: ŷ = Wx + b
Neural network regression (one hidden layer, ReLU activation): ŷ = W_B max(0, W_A x + b_A) + b_B