Knowledge Tracing Machines: Families of models for predicting student performance
1 Knowledge Tracing Machines: Families of models for predicting student performance. Jill-Jênn Vie, RIKEN Center for Advanced Intelligence Project, Tokyo. Optimizing Human Learning, June 12, 2018; Polytechnique Montréal, June 18, 2018.
2 Predicting student performance. Data: a population of students answering questions, as events of the form "Student i answered question j correctly/incorrectly". Goals: learn the difficulty of questions automatically from data, measure the knowledge of students, and potentially optimize their learning. Assumption: a good model for prediction yields a good adaptive policy for teaching.
3 Learning outcomes of this tutorial. Logistic regression is amazing: unidimensional, and takes IRT and PFA as special cases. Factorization machines are even more amazing: multidimensional, and take MIRT as a special case. It also makes sense to consider deep neural networks: what does deep knowledge tracing model, exactly?
4 Families of models. Factorization Machines (Rendle 2012) take Multidimensional Item Response Theory as a special case; Logistic Regression takes Item Response Theory and Performance Factor Analysis as special cases; Recurrent Neural Networks take Deep Knowledge Tracing (Piech et al. 2015) as a special case. Steffen Rendle (2012). Factorization Machines with libFM. In: ACM Transactions on Intelligent Systems and Technology (TIST) 3.3, pp. 57:1-57:22. Chris Piech et al. (2015). Deep knowledge tracing. In: Advances in Neural Information Processing Systems (NIPS).
5 Problems. Weak generalization (filling in the blanks): some students did not attempt all questions. Strong generalization (cold start): some new students are not in the training set.
6 Dummy dataset (dummy.csv, columns: user, item, correct). User 1 answered Item 1 correctly. User 1 answered Item 2 incorrectly. User 2 answered Item 1 incorrectly. User 2 answered Item 1 correctly. User 2 answered Item 2: ??? (to be predicted).
7 Task 1: Item Response Theory. Learn an ability θ_i for each user i and an easiness e_j for each item j such that Pr(User i answers Item j OK) = σ(θ_i + e_j), i.e. logit Pr(User i answers Item j OK) = θ_i + e_j. This is logistic regression: learn w such that logit Pr(x) = ⟨w, x⟩, usually with L2 regularization: a ‖w‖₂² penalty, i.e. a Gaussian prior.
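The slides fit this model with encode.py and lr.py; as a self-contained illustration, here is a minimal sketch (all names hypothetical) that fits θ_i and e_j directly by gradient descent on the logistic loss, which is exactly logistic regression on one-hot (user, item) features:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_irt(users, items, correct, n_users, n_items, lr=0.5, epochs=500):
    """Fit abilities theta_i and easiness e_j by gradient descent on the
    logistic loss (a toy stand-in for lr.py, without regularization)."""
    users, items, correct = map(np.asarray, (users, items, correct))
    theta = np.zeros(n_users)
    e = np.zeros(n_items)
    for _ in range(epochs):
        # Gradient of the log loss w.r.t. the logit, one value per event
        g = sigmoid(theta[users] + e[items]) - correct
        g_theta = np.zeros(n_users)
        g_e = np.zeros(n_items)
        np.add.at(g_theta, users, g)  # accumulate per-user gradients
        np.add.at(g_e, items, g)      # accumulate per-item gradients
        theta -= lr * g_theta
        e -= lr * g_e
    return theta, e
```

On the 0-indexed dummy dataset (users [0, 0, 1, 1], items [0, 1, 0, 0], outcomes [1, 0, 0, 1]), the fitted model predicts a high probability for (User 1, Item 1), a low one for (User 1, Item 2), and about 0.5 for User 2 on Item 1, since that user both failed and succeeded on it.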
8 Graphically: IRT as logistic regression. [Diagram: the encoding of "User i answered Item j" is a sparse vector x with a 1 in the column of user U_i (over all users) and a 1 in the column of item I_j (over all items); the weight vector w stacks the θ_i and the e_j.] Then logit Pr(User i answers Item j OK) = ⟨w, x⟩ = θ_i + e_j.
9 Encoding. python encode.py --users --items builds the sparse features (one-hot user columns U_0, U_1, U_2 and item columns I_0, I_1, I_2) and saves them to data/dummy/x-ui.npz. Then logistic regression can be run on the sparse features: python lr.py data/dummy/x-ui.npz
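The actual encode.py stores a sparse matrix; a dense toy version (hypothetical function name, for illustration only) makes the encoding concrete: one row per event, one 1 for the user, one 1 for the item.

```python
import numpy as np

def encode_ui(users, items, n_users, n_items):
    """Toy dense version of `encode.py --users --items`: one row per event,
    with a 1 in the user's column and a 1 in the item's column."""
    users = np.asarray(users)
    items = np.asarray(items)
    X = np.zeros((len(users), n_users + n_items))
    X[np.arange(len(users)), users] = 1              # user block
    X[np.arange(len(items)), n_users + items] = 1    # item block, shifted
    return X
```

Each row sums to exactly 2, since every event activates one user feature and one item feature.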
10 Oh, there's a problem. python encode.py --users --items; python lr.py data/dummy/x-ui.npz. On the dummy dataset, the two rows "User 2 answered Item 1" (first incorrectly, then correctly) get identical feature vectors, hence identical predictions: we predict the same thing when there are several attempts.
11 Count successes and failures. Keep track of what the student has done before: add skill, wins and fails columns to each event, giving data/dummy/data.csv with columns user, item, skill, correct, wins, fails.
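The wins/fails counters must reflect only what happened before each attempt. A minimal sketch of that bookkeeping (hypothetical helper, not the deck's actual preprocessing code):

```python
from collections import defaultdict

def add_counters(events):
    """events: list of (user, skill, correct) in chronological order.
    Returns rows (user, skill, correct, wins, fails) where wins/fails
    count the user's PRIOR successes/failures on that skill."""
    wins = defaultdict(int)
    fails = defaultdict(int)
    rows = []
    for user, skill, correct in events:
        # Record the counters as they stand BEFORE this attempt
        rows.append((user, skill, correct, wins[user, skill], fails[user, skill]))
        if correct:
            wins[user, skill] += 1
        else:
            fails[user, skill] += 1
    return rows
```

On the dummy data, User 2's second attempt on the skill of Item 1 gets wins = 0 and fails = 1, so repeated attempts finally receive different features.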
12 Task 2: Performance Factor Analysis. Let W_ik be the number of successes of user i over skill k (and F_ik the number of failures). Learn β_k, γ_k, δ_k for each skill k such that: logit Pr(User i answers Item j OK) = Σ_{k ∈ Skills(j)} (β_k + W_ik γ_k + F_ik δ_k). python encode.py --skills --wins --fails builds the corresponding features (skill one-hots S_0, S_1, S_2 plus per-skill wins and fails counters) and saves them to data/dummy/x-swf.npz.
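To make the PFA formula concrete, here is a direct sketch of the prediction rule above (hypothetical function and argument names; in the slides this is just logistic regression over the encoded features):

```python
import numpy as np

def pfa_logit(beta, gamma, delta, skills_of_item, W, F, user):
    """logit Pr(user answers item OK) =
    sum over the item's skills k of beta_k + W[user, k]*gamma_k + F[user, k]*delta_k."""
    ks = np.asarray(skills_of_item)
    return float(np.sum(beta[ks] + W[user, ks] * gamma[ks] + F[user, ks] * delta[ks]))
```

For example, with β = (0.2, -0.1), γ = (0.5, 0.3), δ = (-0.4, -0.2), an item tagged with both skills, and a user with W = (2, 1), F = (1, 0), the logit is (0.2 + 1.0 - 0.4) + (-0.1 + 0.3 + 0) = 1.0.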
13 Better! python encode.py --skills --wins --fails; python lr.py data/dummy/x-swf.npz. Now the two attempts of User 2 on Item 1 have different wins/fails features, so their predicted probabilities differ.
14 Task 3: a new model (but still logistic regression) python encode.py --items --skills --wins --fails python lr.py data/dummy/x-iswf.npz
15 Here comes a new challenger. How to model side information in, say, recommender systems? Logistic regression: learn a bias for each feature (each user, item, etc.). Factorization machines: learn a bias and an embedding for each feature.
16 What can be done with embeddings?
17 Interpreting the components
19 Graphically: logistic regression. [Diagram: sparse input x with a 1 for user U_i and a 1 for item I_j; the weights w carry one bias per feature (θ_i for users, e_j for items).]
20 Graphically: factorization machines. [Diagram: the same sparse input, now over users, items and skills; each feature has both a bias (θ_i, e_j, β_k) in w and an embedding (u_i, v_j, s_k) in V.]
21 Formally: factorization machines. Learn a bias w_k and an embedding v_k for each feature k such that: logit p(x) = µ + Σ_{k=1}^N w_k x_k (the logistic regression part) + Σ_{1 ≤ k < l ≤ N} x_k x_l ⟨v_k, v_l⟩ (the pairwise interactions). Particular cases: Multidimensional item response theory: logit p = ⟨u_i, v_j⟩ + e_j. SPARFA: v_j > 0 and v_j sparse. GenMA: v_j is constrained by the zeroes of a q-matrix (q_ij). Andrew S. Lan, Andrew E. Waters, Christoph Studer, and Richard G. Baraniuk (2014). Sparse factor analysis for learning and content analytics. In: The Journal of Machine Learning Research 15.1. Jill-Jênn Vie, Fabrice Popineau, Yolaine Bourda, and Éric Bruillard (2016). Adaptive Testing Using a General Diagnostic Model. In: European Conference on Technology Enhanced Learning. Springer.
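The pairwise-interaction sum above looks quadratic in N, but it can be computed in O(Nd) time via the standard identity Σ_{k<l} x_k x_l ⟨v_k, v_l⟩ = ½ Σ_f [(Σ_k v_kf x_k)² − Σ_k v_kf² x_k²]. A minimal sketch (hypothetical function names), checked against the naive double loop:

```python
import numpy as np

def fm_logit(mu, w, V, x):
    """Factorization machine: mu + <w, x> + sum_{k<l} x_k x_l <v_k, v_l>,
    with the pairwise term computed via the O(Nd) identity."""
    s = V.T @ x  # s_f = sum_k V[k, f] * x_k, shape (d,)
    pair = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return mu + w @ x + pair

def fm_logit_naive(mu, w, V, x):
    """Direct O(N^2 d) double loop, for checking the identity."""
    out = mu + w @ x
    N = len(x)
    for k in range(N):
        for l in range(k + 1, N):
            out += x[k] * x[l] * (V[k] @ V[l])
    return out
```

Both functions agree on any input, which is why libFM-style implementations scale to millions of sparse features.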
22 Tradeoff expressiveness/interpretability.

Model | logit p | 4 q. | 7 q. | 10 q.
Rasch | θ_i + e_j | 79% | 79% | 79%
DINA | 1 − s_j or g_j | 80% | 82% | 82%
MIRT | ⟨u_i, v_j⟩ + e_j | 83% | 86% | 86%
GenMA | ⟨u_i, q_j⟩ + e_j | 79% | 85% | 88%

[Figures: comparison of adaptive testing models on the Fraction dataset (Rasch, MIRT K = 2, GenMA K = 8, DINA) and on TIMSS (Rasch, MIRT K = 2, GenMA K = 13, DINA); log loss versus number of questions asked.]
23 Assistments 2009 dataset. Attempts of 4163 students over items tagged with 124 skills. Download it and put it in data/assistments09, then run: python fm.py data/assistments09/x-ui.npz etc., or make big. [Table: AUC and running time of LR (IRT: about 2 s; PFA: about 9 s) and FM (minutes) for the encodings users + items, skills + wins + fails, and items + skills + wins + fails; exact values lost in transcription.] Results obtained with FM of dimension d = 20.
24 Deep Factorization Machines. Learn layers W^(l) and b^(l) such that: a^(0)(x) = (v_user, v_item, v_skill, …); a^(l+1)(x) = ReLU(W^(l) a^(l)(x) + b^(l)) for l = 0, …, L − 1; y_DNN(x) = ReLU(W^(L) a^(L)(x) + b^(L)); and logit p(x) = y_FM(x) + y_DNN(x). Jill-Jênn Vie (2018). Deep Factorization Machines for Knowledge Tracing. In: The 13th Workshop on Innovative Use of NLP for Building Educational Applications.
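A minimal forward-pass sketch of these equations (hypothetical names and shapes; the real model is trained end to end, and the FM and DNN parts share the embeddings V of the active features):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def deepfm_logit(mu, w, V, x, layers):
    """logit p(x) = y_FM(x) + y_DNN(x). `layers` is a list of (W_l, b_l);
    the DNN runs on the concatenated embeddings of the active features."""
    # FM part, via the O(Nd) identity for the pairwise term
    s = V.T @ x
    y_fm = mu + w @ x + 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    # DNN part: a0 = concatenated embeddings of the active features
    active = np.flatnonzero(x)
    a = V[active].ravel()
    for W_l, b_l in layers:
        a = relu(W_l @ a + b_l)
    return y_fm + a.item()  # last layer has a single unit
```

With one active user, item and skill feature, a^(0) has size 3d, so the first layer must expect that width.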
25 Comparison. FM: y_FM, a factorization machine with λ = 0.01. Deep: y_DNN, a multilayer perceptron. DeepFM: y_DNN + y_FM with shared embeddings. Bayesian FM: w_k, v_kf ~ N(µ_f, 1/λ_f) with µ_f ~ N(0, 1) and λ_f ~ Γ(1, 1), trained using Gibbs sampling. Various types of side information: first: <discrete> (user, token, countries, etc.); last: <discrete> + <continuous> (time + days); pfa: <discrete> + wins + fails.
26 Duolingo dataset
27 Results. [Table: AUC on train/first/last/pfa for Bayesian FM, DeepFM, FM and Deep (all with d = 20) and LR (d = 0), with the number of training epochs per model; numeric values lost in transcription.]
28 Duolingo ranking (AUC values mostly lost in transcription). In order: 1. SanaLabs (RNN + GBDT); singsound (RNN); NYU (GBDT); CECL (LR + L1, 13M features); TMU (RNN, off); JJV (Bayesian FM, off); JJV (DeepFM); JJV (DeepFM); Duolingo (LR, AUC .771). Burr Settles, Chris Brust, Erin Gustafson, Masato Hagiwara, and Nitin Madnani (2018). Second language acquisition modeling. In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications.
29 What about recurrent neural networks? Deep Knowledge Tracing models the problem as sequence prediction: at each step, the student works on skill q_t with performance a_t. How do we predict the outcomes y on every skill k? Spoiler: by tracking the evolution of a latent state h_t.
30 Graphically: deep knowledge tracing. [Diagram: an RNN unrolled over the sequence (q_0, a_0), (q_1, a_1), (q_2, a_2), …, with hidden states h_0, h_1, h_2, h_3, …; each h_t outputs a vector y = (y_0, …, y_{M−1}) of predicted outcomes over all M skills.]
31 Graphically: there is a MIRT in my DKT. [Diagram: the same unrolled RNN, but the prediction for skill q_t is read out as y_{q_t} = σ(⟨h_t, v_{q_t}⟩), a dot product between the latent state and a skill embedding, just like MIRT.]
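One step of such a recurrence can be sketched as follows (a toy stand-in with hypothetical weight names, not the trained DKT model): the input one-hot encodes the pair (q_t, a_t), and the readout is the MIRT-like dot product for every skill.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dkt_step(h, q, a, n_skills, Wh, Wx, Vskill):
    """One step of a minimal DKT-style recurrence.
    h: latent state; q: skill index; a: 0/1 outcome."""
    x = np.zeros(2 * n_skills)
    x[q + a * n_skills] = 1          # one-hot of (skill, outcome)
    h = np.tanh(Wh @ h + Wx @ x)     # latent state update
    y = sigmoid(Vskill @ h)          # success probability per skill: sigmoid(<h, v_k>)
    return h, y
```

Each call returns both the updated latent state and a full vector of per-skill success probabilities, which is exactly what makes the MIRT analogy visible: the rows of Vskill play the role of the v_j embeddings.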
32 Drawback of Deep Knowledge Tracing. DKT does not model individual differences; actually, Wilson even managed to beat DKT with (1-dimensional!) IRT. By estimating the student's learning ability on the fly, we managed to get a better model (DKT-DSC). [Table: AUC of BKT, IRT, PFA, DKT and DKT-DSC on Cognitive Tutor and three Assistments datasets; values lost in transcription.] Sein Minn, Yi Yu, Michel Desmarais, Feida Zhu, and Jill-Jênn Vie (2018). Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing. Submitted at IEEE International Conference on Data Mining.
33 Take-home message. Factorization machines are a strong baseline that takes many models as special cases. Recurrent neural networks are powerful because they track the evolution of the latent state (try simpler dynamic models?). Deep factorization machines may require more data/tuning, but neural collaborative filtering offers promising directions.
34 Any suggestions are welcome! Feel free to chat: All code: github.com/jilljenn/ktm Do you have any questions?
35 Lan, Andrew S., Andrew E. Waters, Christoph Studer, and Richard G. Baraniuk (2014). Sparse factor analysis for learning and content analytics. In: The Journal of Machine Learning Research 15.1. Minn, Sein, Yi Yu, Michel Desmarais, Feida Zhu, and Jill-Jênn Vie (2018). Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing. Submitted at IEEE International Conference on Data Mining. Piech, Chris, Jonathan Bassen, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas J. Guibas, and Jascha Sohl-Dickstein (2015). Deep knowledge tracing. In: Advances in Neural Information Processing Systems (NIPS). Rendle, Steffen (2012). Factorization Machines with libFM. In: ACM Transactions on Intelligent Systems and Technology (TIST) 3.3, pp. 57:1-57:22.
36 Settles, Burr, Chris Brust, Erin Gustafson, Masato Hagiwara, and Nitin Madnani (2018). Second language acquisition modeling. In: Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications. Vie, Jill-Jênn (2018). Deep Factorization Machines for Knowledge Tracing. In: The 13th Workshop on Innovative Use of NLP for Building Educational Applications. Vie, Jill-Jênn, Fabrice Popineau, Yolaine Bourda, and Éric Bruillard (2016). Adaptive Testing Using a General Diagnostic Model. In: European Conference on Technology Enhanced Learning. Springer.
A Probabilistic Model for the Cold-Start Problem in Rating Prediction using Click Data ThaiBinh Nguyen 1 and Atsuhiro Takasu 1, 1 Department of Informatics, SOKENDAI (The Graduate University for Advanced
More informationCE213 Artificial Intelligence Lecture 14
CE213 Artificial Intelligence Lecture 14 Neural Networks: Part 2 Learning Rules -Hebb Rule - Perceptron Rule -Delta Rule Neural Networks Using Linear Units [ Difficulty warning: equations! ] 1 Learning
More informationAn Ensemble Ranking Solution for the Yahoo! Learning to Rank Challenge
An Ensemble Ranking Solution for the Yahoo! Learning to Rank Challenge Ming-Feng Tsai Department of Computer Science University of Singapore 13 Computing Drive, Singapore Shang-Tse Chen Yao-Nan Chen Chun-Sung
More informationarxiv: v2 [cs.cl] 1 Jan 2019
Variational Self-attention Model for Sentence Representation arxiv:1812.11559v2 [cs.cl] 1 Jan 2019 Qiang Zhang 1, Shangsong Liang 2, Emine Yilmaz 1 1 University College London, London, United Kingdom 2
More informationMatrix Factorization Techniques for Recommender Systems
Matrix Factorization Techniques for Recommender Systems By Yehuda Koren Robert Bell Chris Volinsky Presented by Peng Xu Supervised by Prof. Michel Desmarais 1 Contents 1. Introduction 4. A Basic Matrix
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationDeep Learning for NLP
Deep Learning for NLP Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Greg Durrett Outline Motivation for neural networks Feedforward neural networks Applying feedforward neural networks
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationFrom statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu
From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationStat 542: Item Response Theory Modeling Using The Extended Rank Likelihood
Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal
More information