arXiv: v3 [stat.ML] 4 May 2017

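Input $x$; $z^{L}_{l} \in \mathbb{R}$, $l \in \{1, 2, \ldots, FPD\}$; $z_x \in \mathbb{R}^{FPL}$. The output is

$$\text{score}(x) = f\left(z_x W^{H}_{\text{output}} + b^{H}_{\text{output}}\right) W^{O}_{\text{output}} + b^{O}_{\text{output}},$$

where $W^{H}_{\text{output}} \in \mathbb{R}^{FPL \times HLS}$, $b^{H}_{\text{output}} \in \mathbb{R}^{1 \times HLS}$, $W^{O}_{\text{output}} \in \mathbb{R}^{HLS \times d^{\text{all}}_{\text{out}}}$ and $b^{O}_{\text{output}} \in \mathbb{R}^{1 \times d^{\text{all}}_{\text{out}}}$, with $d^{\text{all}}_{\text{out}} = 1$ or $d^{\text{all}}_{\text{out}} = 4$. For $d^{\text{all}}_{\text{out}} = 4$ the scores are converted to class probabilities with a softmax,

$$p(i \mid x) = \frac{e^{\text{score}(x)_i}}{\sum_{j=1}^{4} e^{\text{score}(x)_j}},$$

where $\text{score}(x)_i$ is the $i$-th element of $\text{score}(x)$.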
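The output layer and softmax can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the activation $f$ is taken to be ReLU (the source does not name it), matrices are stored as lists of rows, and the sizes and weights are hypothetical toy values.

```python
import math


def relu(v):
    # Assumed activation f; the source does not say which function f is.
    return [max(0.0, a) for a in v]


def affine(row, W, b):
    # row: 1 x m vector, W: m x k matrix stored as m rows, b: 1 x k bias.
    return [sum(row[r] * W[r][c] for r in range(len(row))) + b[c]
            for c in range(len(b))]


def score(z_x, W_H, b_H, W_O, b_O, f=relu):
    # score(x) = f(z_x W_H + b_H) W_O + b_O
    hidden = f(affine(z_x, W_H, b_H))  # 1 x HLS
    return affine(hidden, W_O, b_O)    # 1 x d_out


def softmax(s):
    # Subtracting max(s) avoids overflow and leaves the probabilities unchanged.
    m = max(s)
    exps = [math.exp(a - m) for a in s]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy sizes: FPL = 3, HLS = 2, d_out = 4.
z_x = [1.0, 2.0, 3.0]
W_H = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b_H = [0.0, -5.0]
W_O = [[1.0, 2.0, 3.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
b_O = [0.0, 0.0, 0.0, 0.0]

s = score(z_x, W_H, b_H, W_O, b_O)  # hidden = relu([1, -3]) = [1, 0]
p = softmax(s)
print(s)  # [1.0, 2.0, 3.0, 4.0]
```

With $d^{\text{all}}_{\text{out}} = 1$ the same `score` function returns a single value and the softmax step is simply skipped.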

$\beta_1 = 0.9$, $\beta_2 = 0.999$
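The values $\beta_1 = 0.9$ and $\beta_2 = 0.999$ coincide with the usual decay rates of the Adam optimizer. Assuming that is the optimizer meant here, one Adam update looks roughly like the sketch below; the learning rate `lr = 0.1`, `eps = 1e-8` and the toy objective $\theta^2$ are illustrative assumptions, not values from the source.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a flat parameter list; t is the 1-based step count."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]      # first moment
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]  # second moment
    m_hat = [mi / (1 - beta1 ** t) for mi in m]                       # bias correction
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v


# Hypothetical objective: minimise theta^2, whose gradient is 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
theta, m, v = adam_step(theta, [2.0 * theta[0]], m, v, t=1)
print(theta)  # about 0.9: on the first step m_hat / sqrt(v_hat) = 1
```

The bias correction matters most in early steps: without it, $m$ and $v$ start near zero and the first updates would be scaled down.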

Candidate hyperparameter values (including L2):
- 16, 32, 48, 64, 80, 96, 112, 128
- 1, 2, 3, 4
- 5, 10, 15, 20
- 50, 60, 70, 80, 90, 100
- log: -6, -5, -4, -3, -2, -1, 0, 1, 2
- log: -6, -5, -4, -3, -2
- log: -8, -7, -6, -5, -4

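$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 + \alpha \lVert \theta \rVert^2$$

$$J(\theta) = -\frac{1}{n} \left[ \sum_{i=1}^{n} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^{T} x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^{T} x^{(i)}}} \right] + \alpha \lVert \theta \rVert^2$$

Each $J(\theta)$ includes the L2 penalty $\alpha \lVert \theta \rVert^2$.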
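Both objectives can be written directly in Python as a sketch. Assumptions: parameters are flat lists of floats, the classification weights `Theta` are a list of $k$ vectors $\theta_j$, and class labels are 0-based (the formula uses $j = 1, \ldots, k$). The log-sum-exp shift is a standard numerical safeguard and does not change the value.

```python
import math


def l2_penalty(theta, alpha):
    # alpha * ||theta||^2 over a flat list of parameters
    return alpha * sum(t * t for t in theta)


def mse_loss(y_hat, y, theta, alpha):
    # (1/n) * sum_i (y_hat_i - y_i)^2 + alpha * ||theta||^2
    n = len(y)
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / n + l2_penalty(theta, alpha)


def softmax_ce_loss(X, y, Theta, alpha):
    """X: n input vectors; y: 0-based labels; Theta: k weight vectors theta_j."""
    n = len(X)
    total = 0.0
    for x, label in zip(X, y):
        logits = [sum(w * xi for w, xi in zip(theta_j, x)) for theta_j in Theta]
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))
        # The indicator 1{y = j} keeps only the true class's log-probability.
        total += logits[label] - log_norm
    flat = [w for theta_j in Theta for w in theta_j]
    return -total / n + l2_penalty(flat, alpha)


print(mse_loss([1.0, 2.0], [1.0, 4.0], [1.0, 1.0], alpha=0.5))  # 3.0
# All-zero weights give uniform probabilities, so the loss is log(4).
print(softmax_ce_loss([[1.0, 2.0], [3.0, 4.0]], [0, 3], [[0.0, 0.0]] * 4, alpha=0.1))
```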

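Metrics over $n$ samples, with targets $y_i$ and predictions $\hat{y}_i$ for inputs $X$:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}}, \qquad \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert,$$

$$\text{PCC}^2 = \left[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \right]^2$$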
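The three metrics translate directly into code. A minimal sketch using only the standard library; the sample values are hypothetical. Note that $\text{PCC}^2$ discards the sign of the correlation, so perfectly anti-correlated sequences also score 1.

```python
import math


def rmse(y_hat, y):
    # sqrt( sum_i (y_hat_i - y_i)^2 / n )
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_hat, y)) / n)


def mae(y_hat, y):
    # (1/n) * sum_i |y_hat_i - y_i|
    return sum(abs(a - b) for a, b in zip(y_hat, y)) / len(y)


def pcc_squared(x, y):
    # Squared Pearson correlation between sequences x and y.
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return cov ** 2 / (sxx * syy)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))         # sqrt(4/3), about 1.1547
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))          # 2/3, about 0.6667
print(pcc_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # 1.0 despite negative correlation
```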

Figure: PCC, MAE and RMSE for RM1 to RM10 and Consensus; confusion matrices (prediction versus reference, classes I to IV) for the training set, validation set, test set I and test set II. Percentages recovered from the matrix panels:

| Reference of | I | II | III | IV | Overall |
|---|---|---|---|---|---|
| Training set | 73.3% | 65.1% | 90.3% | 40.4% | 76.1% |
| Validation set | 84.4% | 76.9% | 93.2% | 60.1% | 85.2% |
| Test set I | 71.7% | 73.6% | 93.4% | 63.1% | 85.6% |
| Test set II | 70.2% | 60.2% | 92.3% | 14.3% | 72.3% |
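The extracted panels do not say how the percentages were computed. Assuming the four class columns are per-reference-class recall and the last column is overall accuracy, both can be derived from a confusion matrix as in this sketch. The counts are hypothetical and are not taken from the figure.

```python
def per_class_and_overall(cm):
    """cm[p][r] = number of samples with predicted class p and reference class r."""
    k = len(cm)
    per_class = []
    for r in range(k):
        col_total = sum(cm[p][r] for p in range(k))  # all samples of reference class r
        per_class.append(100.0 * cm[r][r] / col_total)
    correct = sum(cm[i][i] for i in range(k))
    total = sum(sum(row) for row in cm)
    return per_class, 100.0 * correct / total


# Hypothetical 4-class counts (rows: prediction I..IV, columns: reference I..IV).
cm = [[8, 1, 0, 0],
      [2, 9, 0, 1],
      [0, 0, 10, 1],
      [0, 0, 0, 2]]
per_class, overall = per_class_and_overall(cm)
print(per_class)  # [80.0, 90.0, 100.0, 50.0]
print(overall)    # 29/34 correct, about 85.29
```

Because overall accuracy weights each class by its sample count, it always lies between the lowest and highest per-class values, which is consistent with the pattern in the recovered rows.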

Figure: confusion matrices (prediction versus reference, classes I to IV) in two groups of four panels (training set, validation set, test set I, test set II). Percentages recovered from the matrix panels:

First group:

| Reference of | I | II | III | IV | Overall |
|---|---|---|---|---|---|
| Training set | 93.8% | 90.2% | 94.5% | 84.5% | 92.1% |
| Validation set | 97.3% | 95.9% | 96.9% | 88.2% | 95.8% |
| Test set I | 96.7% | 91.8% | 97.2% | 90.8% | 95.5% |
| Test set II | 98.2% | 95.7% | 97.3% | 90.5% | 96.3% |

Second group:

| Reference of | I | II | III | IV | Overall |
|---|---|---|---|---|---|
| Training set | 99.9% | 96.9% | 92.1% | 99.1% | 94.9% |
| Validation set | 90.2% | 85.7% | 84.9% | 75.9% | 84.8% |
| Test set I | 87.0% | 84.8% | 88.8% | 73.0% | 86.6% |
| Test set II | 98.2% | 95.7% | 90.2% | 97.6% | 93.6% |

$$J(\theta) = J_C(\theta) + \beta J_R(\theta) + \alpha \lVert \theta \rVert^2,$$

where $\beta \in (0, 1]$ weights $J_R(\theta)$ relative to $J_C(\theta)$.
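The combined objective is a weighted sum, sketched below. Assumption: `j_c` and `j_r` are the already-computed, unregularized values of $J_C(\theta)$ and $J_R(\theta)$ (for example a softmax cross-entropy and a mean squared error), so the L2 penalty is added only once. The numbers in the usage line are hypothetical.

```python
def combined_loss(j_c, j_r, theta, alpha, beta):
    # J = J_C + beta * J_R + alpha * ||theta||^2, with beta restricted to (0, 1]
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    return j_c + beta * j_r + alpha * sum(t * t for t in theta)


# Hypothetical values: J_C = 1.0, J_R = 2.0, theta = [1, 2], alpha = 0.1, beta = 0.5
print(combined_loss(1.0, 2.0, [1.0, 2.0], alpha=0.1, beta=0.5))  # 1 + 1 + 0.5 = 2.5
```

Keeping $\beta \le 1$ means the $J_R$ term is never weighted more heavily than $J_C$.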

$N = 50$

$i \in \{1, 2, \ldots, 10\}$; $C$, $\gamma$

$i \in \{1, 2, \ldots, 10\}$

$\alpha$, $\beta$


More information

Omm Al-Qura University Dr. Abdulsalam Ai LECTURE OUTLINE CHAPTER 3. Vectors in Physics

Omm Al-Qura University Dr. Abdulsalam Ai LECTURE OUTLINE CHAPTER 3. Vectors in Physics LECTURE OUTLINE CHAPTER 3 Vectors in Physics 3-1 Scalars Versus Vectors Scalar a numerical value (number with units). May be positive or negative. Examples: temperature, speed, height, and mass. Vector

More information

Multiple state optimal design problems with explicit solution

Multiple state optimal design problems with explicit solution Multiple state optimal design problems with explicit solution Marko Vrdoljak Department of Mathematics Faculty of Science University of Zagreb, Croatia Krešimir Burazin Department of Mathematics University

More information

arxiv: v3 [stat.ml] 14 Apr 2016

arxiv: v3 [stat.ml] 14 Apr 2016 arxiv:1307.0048v3 [stat.ml] 14 Apr 2016 Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce Kun Yang April 15, 2016 Abstract In this paper, we propose a one-pass

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Introduction to conic sections. Author: Eduard Ortega

Introduction to conic sections. Author: Eduard Ortega Introduction to conic sections Author: Eduard Ortega 1 Introduction A conic is a two-dimensional figure created by the intersection of a plane and a right circular cone. All conics can be written in terms

More information