Classifier Adaptation at Prediction Time
1 Classifier Adaptation at Prediction Time, or: How Bayes' rule might help you reduce your error rate by half. Christoph Lampert. Yandex, Moscow, September 8th, 2016
2 IST Austria (Institute of Science and Technology Austria), Vienna. New public research institute. Natural and formal sciences: Computer Science, Mathematics, Biology, Neuroscience, Physics. PhD-granting, no undergraduates. Basic, curiosity-driven research. Focus on interdisciplinary research. Open positions in all fields: IST Austria Graduate School, Postdoc Fellowships, Tenure-Track Assistant Professors, Full Professors, Internships, Research Visits, Sabbaticals, ... More information: or chl@ist.ac.at
3 Long term goal Automatic systems that can analyze and interpret data Image Understanding Three men sit at a table in a pub, drinking beer. One of them talks while the other two listen. Image: British Broadcasting Corporation (BBC)
4 State of the art. Analyze individual aspects of visual data: indoors, in a pub (scene classification); drinking, talking (action classification); three persons, one table, three glasses (object recognition).
5 Crucial Step: Object Recognition. Which objects are in the image? Per-class classifier scores (person, bottle, cake, truck, car, table, tiger, zebra). Image: Tony Alter, under Creative Commons
6 Object recognition has gone large scale: big data Image: Forsyth, Efros, Fei-Fei, Torralba, Zisserman, "The Promise and Perils of Benchmark Datasets and Challenges", 2011.
7 Object recognition has gone complex: deep networks Image left: Krizhevsky, Sutskever, Hinton, "Imagenet classification with deep convolutional neural networks", NIPS Image right: adapted from He, Zhang, Ren, Sun,"Deep Residual Learning for Image Recognition", arxiv:
8 Object recognition has gone expensive: HPC/GPU clusters Image: "The CSIRO GPU cluster at the data centre" by CSIRO. Licensed under CC BY 3.0 via Wikimedia Commons
9 Don't train object classifiers yourself. Order them pre-trained. Image: faked
10 Research Challenge Image Understanding with Pretrained Classifiers
11 Academic setting: independent, identically distributed data at training and prediction time. Image: ImageNet dataset
12 Vendor Customer 1 Domain Shift Image: ImageNet dataset Image: "Supermarkt". Licensed under GFDL via Wikimedia Commons
13 Vendor Customer 2 Domain Shift, Dependent Samples Image: ImageNet dataset Image: "Baggage Claim at CPH" by Duhhitsminerva. Licensed under CC BY 3.0 via Wikimedia Commons
14 Vendor Customer 3 Domain Shift, Dependent Samples, Non-Stationary Distribution Image: ImageNet dataset Image: Christoph Lampert 2015
15-17 Dependent Samples. Academic setting: training and test data are sampled i.i.d., i.e. images are independent and identically distributed. Real-life prediction tasks: very much non-i.i.d. Surveillance: temporal dependences between images. Photo collections: specific selection of themes. We argue: this is a blessing, not a nuisance! Earlier images act as context (e.g. some shop → bakery). Images: ImageNet dataset
18-20 Domain Shift. Notation: x ∈ X images, y ∈ Y = {1, ..., K} class labels. P(x, y): data distribution at training time (vendor). Q(x, y): data distribution at prediction time (customer). Domain shift: P(x, y) ≠ Q(x, y). Three cases: P(y|x) = Q(y|x) but P(x) ≠ Q(x): covariate shift. P(x|y) ≠ Q(x|y): appearance shift. P(x|y) = Q(x|y) but P(y) ≠ Q(y): class prior shift.
21 Domain Shift Image: [Donahue et al., ICML 2014] Appearance shift is mitigated by invariant features
23 Domain Shift. Training time: P(y) typically balanced, P(y) ≈ 1/K; e.g. in ILSVRC2014, as many volcanos as cucumbers. Prediction time: Q(y) highly imbalanced, low entropy, easy to learn! Supermarket: lots of fruit, most likely no volcanos. Airport: lots of people and baggage, also no volcanos. Vacation: occasional volcanos, but more beaches. Class prior shift is real, but also potentially beneficial.
24 Classifier Adaptation at Prediction Time Amélie Royer ENS Rennes/IST Austria [A. Royer, CHL, "Classifier Adaptation at Prediction Time", CVPR 2015]
25 Class Prior Adaptation. Training time: optimal multi-class classifier f: X → Y, f(x) = argmax_{y ∈ Y} f_y(x) for f_y(x) = P(y|x). Prediction time: optimal multi-class classifier g: X → Y, g(x) = argmax_{y ∈ Y} g_y(x) for g_y(x) = Q(y|x). For P(x|y) = Q(x|y) but P(y) ≠ Q(y), Bayes' rule gives Q(y|x) = P(y|x) P(x) Q(y) / (P(y) Q(x)) ∝ f_y(x) Q(y)/P(y). Optimal classifier: g(x) = argmax_{y ∈ Y} f_y(x) Q(y)/P(y).
26-27 Class Prior Adaptation. Probabilistic classifier f(x) = argmax_y f_y(x), with f_y: X → R. Class proportions at training time: ρ ∈ R^K, i.e. ρ_y = P(y). Class proportions at prediction time: π ∈ R^K, i.e. π_y = Q(y). Definition: the class-prior adaptation of f from ρ to π is g(x) = argmax_{y ∈ Y} g_y(x) for g_y(x) = f_y(x) π_y / ρ_y. Note: no retraining, only adjust output scores. [Saerens et al., 2002] Lemma: g is Bayes-optimal for Q(x, y)-distributed data if P(x, y) differs from Q(x, y) only in the class proportions and f_y(x) = P(y|x).
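The adaptation rule g_y(x) = f_y(x) π_y / ρ_y is a one-line rescaling of the classifier's output scores. A minimal sketch (the function name and the example numbers are illustrative, not from the talk):

```python
import numpy as np

def adapt_scores(f_scores, rho, pi):
    """Class-prior adaptation: rescale probabilistic scores f_y(x)
    from training priors rho to prediction-time priors pi.
    No retraining; only the output scores are adjusted."""
    g = np.asarray(f_scores, dtype=float) * (np.asarray(pi) / np.asarray(rho))
    return g / g.sum()  # renormalize so the adapted scores form a distribution

# A classifier trained with balanced priors, deployed where class 0 dominates:
f = [0.4, 0.35, 0.25]      # f_y(x), approximating P(y|x)
rho = [1/3, 1/3, 1/3]      # training-time class proportions
pi = [0.7, 0.2, 0.1]       # prediction-time class proportions
g = adapt_scores(f, rho, pi)
pred = g.argmax()          # adapted prediction
```

Note that the argmax is unchanged whenever π = ρ; the adaptation only matters under class prior shift.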
28-29 Class-Prior Adaptation during Sequential Prediction. Problem: in the vendor/customer scenario, the class proportions at prediction time, π, are unknown. Solution: learn the proportions on the fly at prediction time. Sequential prediction scenario: images to be classified arrive sequentially, x_1, x_2, ... Goal: for each x_t, make a prediction g(x_t). Three possible feedback scenarios: online: after the prediction, the correct label y_t is revealed (e.g. supermarket cash register). bandit: after the prediction, it is revealed whether a mistake was made (e.g. augmented reality glasses). unsupervised: no feedback about correct labels (e.g. surveillance).
30-49 Example (no adaptation). Build-up over a sequence of five images: the scores f_cat(x_t), f_dog(x_t), f_truck(x_t) (e.g. 0.8, 0.1, 0.1 for the first image) lead to the predictions f(x_t) = cat, truck, dog, cat, truck. Online feedback reveals the true labels cat, dog, dog, cat, dog; bandit feedback only marks the predictions as correct, incorrect, correct, correct, incorrect; with no feedback, nothing is revealed. Without adaptation, the classifier keeps repeating its mistake on the dog images.
50 Estimating Class Priors. Examples x_1, x_2, ... with labels y_1, y_2, ... Task: estimate the class proportions (π_y)_{y ∈ Y}. Smoothed maximum likelihood (a.k.a. Bayesian estimator with Dirichlet prior): π_y^(t) = (n_t(y) + α) / (t + Kα) for α > 0 (e.g. α = 1/2), where n_t(y) = Σ_{τ=1..t} [[y_τ = y]] counts how often each label has occurred so far. (Preferable to the ML estimator, π_y^(t) = n_t(y)/t, which assigns 0 probability to unseen classes.)
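The smoothed estimator above is a few lines of code. A sketch (function name illustrative):

```python
from collections import Counter

def smoothed_priors(labels, num_classes, alpha=0.5):
    """Smoothed ML estimate of class priors from observed labels:
    pi_y = (n(y) + alpha) / (t + K * alpha).
    Unlike the plain ML estimate n(y)/t, unseen classes keep
    nonzero probability."""
    t = len(labels)
    counts = Counter(labels)
    return [(counts[y] + alpha) / (t + num_classes * alpha)
            for y in range(num_classes)]

# Three observed labels, class 2 never seen:
pi = smoothed_priors([0, 0, 1], num_classes=3, alpha=0.5)
```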
51-52 Online Feedback. π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1..t} [[y_τ = y]]. After the prediction g(x_t), the correct label y_t is revealed, so n_t(y) can be computed incrementally: n_t(y) = n_{t-1}(y) + [[y_t = y]]. Law of large numbers: π^(t) converges to the true class distribution. This holds even for dependent samples (under weak conditions).
53-76 Example (online feedback). Worked example with K = 3 classes (cat, dog, truck), starting from π = (1/3, 1/3, 1/3). After each revealed label, the count of that class is incremented and the estimate renormalized: feedback cat → π = (2/4, 1/4, 1/4); feedback dog → π = (2/5, 2/5, 1/5); feedback dog → π = (2/6, 3/6, 1/6); feedback cat → π = (3/7, 3/7, 1/7). The adapted scores g_y(x_t) = f_y(x_t) π_y / ρ_y now shift the fifth prediction from truck to dog(!), which the feedback confirms.
77 Bandit Feedback. π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1..t} δ_τ(y). After the prediction g(x_t), it is only revealed whether the prediction was correct, so n_t(y) is estimated incrementally: n_t(y) = n_{t-1}(y) + δ_t(y). If the decision was correct: δ_t(y) = [[y = g(x_t)]]. If the decision was incorrect: δ_t(y) = 0 for y = g(x_t) and 1/(K−1) otherwise. (Also possible: δ_t(y) ∝ Q^(t)(y|x_t) for y ≠ g(x_t).)
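The bandit count update can be sketched as follows (function name illustrative; this implements the uniform 1/(K−1) spread, not the posterior-weighted variant mentioned in parentheses):

```python
def bandit_update(counts, predicted, correct, num_classes):
    """Bandit-feedback count update: if the prediction was right, the whole
    unit of count mass goes to the predicted class; if wrong, it is spread
    uniformly over the K-1 remaining classes."""
    for y in range(num_classes):
        if correct:
            counts[y] += 1.0 if y == predicted else 0.0
        else:
            counts[y] += 0.0 if y == predicted else 1.0 / (num_classes - 1)
    return counts

# A wrong prediction of class 1 with K = 3 spreads 1/2 each to classes 0 and 2:
counts = bandit_update([0.0, 0.0, 0.0], predicted=1, correct=False, num_classes=3)
```

Each update adds exactly one unit of mass in total, so the counts still sum to t, as in the online-feedback case.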
78 Example (bandit feedback). With correctness-only feedback, the counts become fractional: after a correct cat prediction, n(cat) += 1, π = (2/4, 1/4, 1/4); after an incorrect truck prediction, n(cat) += 0.5 and n(dog) += 0.5, π = (2.5/5, 1.5/5, 1/5); after a correct dog prediction, n(dog) += 1, π = (2.5/6, 2.5/6, 1/6); after a correct cat prediction, π = (3.5/7, 2.5/7, 1/7). The fifth prediction again becomes dog(!), and the feedback marks it correct.
79 Unsupervised (No Feedback). π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1..t} δ_τ(y). No information whether the prediction g(x_t) was correct or not; estimate n_t(y) by trusting the classifier's own predictions (self-training): n_t(y) = n_{t-1}(y) + δ_t(y) with δ_t(y) = E_{ȳ ∼ Q^(t)(ȳ|x_t)} [[ȳ = y]] = g_y^(t)(x_t) / Σ_ȳ g_ȳ^(t)(x_t). No guarantee, but can be expected to work for decent base classifiers.
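The self-training update replaces the hard one-hot count with the classifier's own (normalized) posterior. A sketch (function name illustrative):

```python
import numpy as np

def selftrain_update(counts, g_scores):
    """Unsupervised count update (self-training): instead of a hard one-hot
    count, add the classifier's own normalized posterior over labels,
    delta_t(y) = g_y(x_t) / sum_yb g_yb(x_t)."""
    g = np.asarray(g_scores, dtype=float)
    return np.asarray(counts, dtype=float) + g / g.sum()

# A confident "cat" image adds 0.8 soft count to cat, 0.1 each to the others:
counts = selftrain_update([1.0, 1.0, 1.0], [0.8, 0.1, 0.1])
```

As in the other feedback modes, each image adds exactly one unit of count mass in total; only its distribution over classes is soft.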
80 Example (no feedback). Each image contributes its posterior as soft counts: n(cat) += 0.8, n(dog) += 0.1, n(truck) += 0.1 for the first image, then (0.15, 0.38, 0.47), (0.25, 0.65, 0.10), (0.53, 0.31, ...). The dog estimate grows from 1.1/4 to 1.5/5 to 2.1/6 to 2.4/7, and the fifth prediction again shifts to dog(!).
81 Extension: Non-Stationary Data Distribution. What if the data distribution changes, e.g. a mobile camera? Sliding-window estimate: adapt only to the recent past (e.g. L = 100): π_y^(t) = (n_t(y) + α) / (L + Kα), with n_t(y) = Σ_{τ=t−L+1..t} δ_τ(y).
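The sliding-window estimate only needs the last L per-class soft counts; a bounded deque makes old counts fall out automatically. A sketch (class name illustrative; following the slide's formula, the denominator uses L even before L samples have arrived):

```python
from collections import deque

class WindowedPriorEstimator:
    """Sliding-window class-prior estimate: only the last L soft counts
    delta_tau(y) contribute, so the estimate can track a drifting
    class distribution."""
    def __init__(self, num_classes, window=100, alpha=0.5):
        self.K, self.L, self.alpha = num_classes, window, alpha
        self.deltas = deque(maxlen=window)  # counts older than L steps drop out

    def update(self, delta):
        """delta: per-class soft counts for one time step (sums to 1)."""
        self.deltas.append(list(delta))

    def priors(self):
        n = [sum(d[y] for d in self.deltas) for y in range(self.K)]
        denom = self.L + self.K * self.alpha
        return [(n[y] + self.alpha) / denom for y in range(self.K)]

est = WindowedPriorEstimator(num_classes=2, window=2, alpha=0.5)
for delta in ([1, 0], [0, 1], [1, 0]):   # first update has left the window
    est.update(delta)
pi = est.priors()
```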
82-83 Realistic Image Sequences. How to benchmark such an adaptive classification system? We need realistic image sequences: non-uniform class distribution, dependent samples, non-stationary distribution. Proposal: three methods based on an existing i.i.d. corpus (ILSVRC). KS/MDS: hidden Markov model structure; random walk between classes; for each visited class, sample one image. TXT: based on the class structure in natural language; for each class occurrence in a text document, sample one image.
84 Realistic Image Sequences: MDS Apply Multi-Dimensional Scaling (MDS) to ImageNet hierarchy Random walk on k-nn graph For each visited class, sample one image from ILSVRC corpus Properties: highly connected semantic clusters, random walk stays within one "topic" for an extended time
85 Realistic Image Sequences: KS Apply Kernelized Sorting (KS) to ImageNet hierarchy Random walk on resulting grid graph For each visited class, sample one image from ILSVRC corpus Properties: similar classes are close, but no cluster structure, random walk frequently "changes topic"
86 Introducing Context Switches Extend MDS/KS to allow "jumps" called MDS(λ), KS(λ) Introduce parameter λ > 0 Instead of taking a random walk step, jump to arbitrary (random) node in the graph with probability λ. Result: homogeneous subsequences of variable length
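The MDS(λ)/KS(λ) construction is a random walk with restart probability λ. A minimal sketch on a toy graph (function name and graph are illustrative, not the paper's ImageNet graph):

```python
import random

def random_walk_with_jumps(neighbors, start, steps, lam=0.01, seed=0):
    """Random walk on a class-similarity graph with jump probability lam:
    with probability lam, jump to a uniformly random node (a "context
    switch"); otherwise, step to a random neighbor. Produces homogeneous
    subsequences of variable length."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    node = start
    seq = []
    for _ in range(steps):
        seq.append(node)
        if rng.random() < lam:
            node = rng.choice(nodes)            # jump: switch topic
        else:
            node = rng.choice(neighbors[node])  # local step: stay on topic
    return seq

# Toy 3-node path graph; in the paper this would be the k-NN or grid graph:
graph = {0: [1], 1: [0, 2], 2: [1]}
seq = random_walk_with_jumps(graph, start=0, steps=10, lam=0.1)
```

For each visited class in `seq`, one image of that class would then be sampled from the ILSVRC corpus.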
87 Realistic Image Sequences: TXT. Given: a corpus of well-formed English texts (Project Gutenberg). Generate an image sequence: discard non-nouns from the text; scan the noun sequence for class names or ImageNet hypernyms; if a leaf in the hierarchy (cucumber): sample an image from that class; if an interior node (dog): sample a random leaf from the subtree (Tibetan mastiff), then sample an image from that leaf class. "... when the rabbit actually took a watch out of its waistcoat-pocket and looked at it and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, ..." (Excerpt from Alice in Wonderland; italics: nouns, bold: ILSVRC2010 (super-)classes)
88 Example Sequences. TXT: rabbit, watch, rabbit, watch, rabbit, rabbit, jar, orange, jar, ... MDS: asparagus, jalapeno, green onion, jalapeno, jalapeno, kidney bean, pumpkin, french fries, ... KS: nematode, sea cucumber, snow leopard, leopard, leopard, leopard, mink, weasel, ... RND: speedboat, coral reef, burrito, lionfish, envelope, fur coat, trifle, paddle, punching bag, ... Example label sequences and test images for TXT, MDS, KS, compared to uniform i.i.d. (RND). Images: ImageNet dataset
89 Experimental Setup Base datasets: ILSVRC2010, ILSVRC2012 (val part) Base classifiers (pre-trained): Convolutional Neural Network (libccv, AlexNet style) SVM with 4K-dim. Fisher vectors (yael/jsgd) + Platt scaling Methods: base classifier base classifier + adaptation (+adapt) base classifier + windowed adaptation (+dyn) Test sets: 100 sequences each of MDS, MDS(λ) for λ {0.001, 0.01, 0.1} length 3000 KS, KS(λ) for λ {0.001, 0.01, 0.1} length 3000 TXT variable length, (avg. 3475) RND length 3000 Error measures: top-1 error rate, top-5 error rate
91 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 2.6 MDS(0.001) 15.6 ± ± ± 1.9 MDS(0.01) 15.7 ± ± ± 1.1 MDS(0.1) 16.2 ± ± ± 0.7 KS 16.4 ± ± ± 1.3 KS(0.001) 16.5 ± ± ± 1.2 KS(0.01) 16.4 ± ± ± 1.0 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Online Feedback (each cell: top-5 error [%], mean and std.dev. over 100 sequences)
95 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 3.3 MDS(0.001) 15.6 ± ± ± 2.6 MDS(0.01) 15.7 ± ± ± 1.4 MDS(0.1) 16.2 ± ± ± 0.8 KS 16.4 ± ± ± 1.6 KS(0.001) 16.5 ± ± ± 1.6 KS(0.01) 16.4 ± ± ± 1.3 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Bandit Feedback (each cell: top-5 error [%], mean and std.dev. over 100 sequences)
96 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 2.9 MDS(0.001) 15.6 ± ± ± 2.7 MDS(0.01) 15.7 ± ± ± 1.5 MDS(0.1) 16.2 ± ± ± 0.8 KS 16.4 ± ± ± 1.5 KS(0.001) 16.5 ± ± ± 1.5 KS(0.01) 16.4 ± ± ± 1.1 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Unsupervised (No Feedback) (each cell: top-5 error [%], mean and std.dev. over 100 sequences)
97 Summary. Observations: soon we will buy computer-vision components pre-trained, which creates new kinds of research problems; in real problems, the images to be classified are not uniform i.i.d.: class imbalance, dependent samples, non-stationary distributions. Contributions: classifier adaptation with on-the-fly estimation of class priors, oblivious of the underlying base classifiers (only output scores are adjusted); three methods for creating dependent test image sequences. Results: on-the-fly adaptation can reduce the error rate substantially; for good enough base classifiers, no additional supervision is needed.
98 Thanks to... The team at IST Austria: Alex Kolesnikov Georg Martius Asya Pentina Amélie Royer Alex Zimin Funding Sources:
More informationTwo at Once: Enhancing Learning and Generalization Capacities via IBN-Net
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net Supplementary Material Xingang Pan 1, Ping Luo 1, Jianping Shi 2, and Xiaoou Tang 1 1 CUHK-SenseTime Joint Lab, The Chinese University
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationMore on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013
More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationN-gram Language Modeling Tutorial
N-gram Language Modeling Tutorial Dustin Hillard and Sarah Petersen Lecture notes courtesy of Prof. Mari Ostendorf Outline: Statistical Language Model (LM) Basics n-gram models Class LMs Cache LMs Mixtures
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux INRIA 8 Nov 2011 Nicolas Le Roux (INRIA) Neural networks and optimization 8 Nov 2011 1 / 80 1 Introduction 2 Linear classifier 3 Convolutional neural networks
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationTutorial on Methods for Interpreting and Understanding Deep Neural Networks. Part 3: Applications & Discussion
Tutorial on Methods for Interpreting and Understanding Deep Neural Networks W. Samek, G. Montavon, K.-R. Müller Part 3: Applications & Discussion ICASSP 2017 Tutorial W. Samek, G. Montavon & K.-R. Müller
More informationSum-Product Networks: A New Deep Architecture
Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network
More informationMachine Learning. Classification. Bayes Classifier. Representing data: Choosing hypothesis class. Learning: h:x a Y. Eric Xing
Machine Learning 10-701/15 701/15-781, 781, Spring 2008 Naïve Bayes Classifier Eric Xing Lecture 3, January 23, 2006 Reading: Chap. 4 CB and handouts Classification Representing data: Choosing hypothesis
More informationSome Applications of Machine Learning to Astronomy. Eduardo Bezerra 20/fev/2018
Some Applications of Machine Learning to Astronomy Eduardo Bezerra ebezerra@cefet-rj.br 20/fev/2018 Overview 2 Introduction Definition Neural Nets Applications do Astronomy Ads: Machine Learning Course
More informationMachine Learning for Structured Prediction
Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationCOMPARING FIXED AND ADAPTIVE COMPUTATION TIME FOR RE-
Workshop track - ICLR COMPARING FIXED AND ADAPTIVE COMPUTATION TIME FOR RE- CURRENT NEURAL NETWORKS Daniel Fojo, Víctor Campos, Xavier Giró-i-Nieto Universitat Politècnica de Catalunya, Barcelona Supercomputing
More informationMultimodal context analysis and prediction
Multimodal context analysis and prediction Valeria Tomaselli (valeria.tomaselli@st.com) Sebastiano Battiato Giovanni Maria Farinella Tiziana Rotondo (PhD student) Outline 2 Context analysis vs prediction
More informationarxiv: v1 [cs.cv] 11 May 2015 Abstract
Training Deeper Convolutional Networks with Deep Supervision Liwei Wang Computer Science Dept UIUC lwang97@illinois.edu Chen-Yu Lee ECE Dept UCSD chl260@ucsd.edu Zhuowen Tu CogSci Dept UCSD ztu0@ucsd.edu
More informationMaking Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation Dr. Yanjun Qi Department of Computer Science University of Virginia Tutorial @ ACM BCB-2018 8/29/18 Yanjun Qi / UVA
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationSequential Supervised Learning
Sequential Supervised Learning Many Application Problems Require Sequential Learning Part-of of-speech Tagging Information Extraction from the Web Text-to to-speech Mapping Part-of of-speech Tagging Given
More informationGlobal Scene Representations. Tilke Judd
Global Scene Representations Tilke Judd Papers Oliva and Torralba [2001] Fei Fei and Perona [2005] Labzebnik, Schmid and Ponce [2006] Commonalities Goal: Recognize natural scene categories Extract features
More informationDeep Learning Sequence to Sequence models: Attention Models. 17 March 2018
Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationRAGAV VENKATESAN VIJETHA GATUPALLI BAOXIN LI NEURAL DATASET GENERALITY
RAGAV VENKATESAN VIJETHA GATUPALLI BAOXIN LI NEURAL DATASET GENERALITY SIFT HOG ALL ABOUT THE FEATURES DAISY GABOR AlexNet GoogleNet CONVOLUTIONAL NEURAL NETWORKS VGG-19 ResNet FEATURES COMES FROM DATA
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 4, February 16, 2012 David Sontag (NYU) Graphical Models Lecture 4, February 16, 2012 1 / 27 Undirected graphical models Reminder
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationBayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington
Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationFinal Examination CS 540-2: Introduction to Artificial Intelligence
Final Examination CS 540-2: Introduction to Artificial Intelligence May 7, 2017 LAST NAME: SOLUTIONS FIRST NAME: Problem Score Max Score 1 14 2 10 3 6 4 10 5 11 6 9 7 8 9 10 8 12 12 8 Total 100 1 of 11
More informationCS 570: Machine Learning Seminar. Fall 2016
CS 570: Machine Learning Seminar Fall 2016 Class Information Class web page: http://web.cecs.pdx.edu/~mm/mlseminar2016-2017/fall2016/ Class mailing list: cs570@cs.pdx.edu My office hours: T,Th, 2-3pm or
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationMachine Learning with Quantum-Inspired Tensor Networks
Machine Learning with Quantum-Inspired Tensor Networks E.M. Stoudenmire and David J. Schwab Advances in Neural Information Processing 29 arxiv:1605.05775 RIKEN AICS - Mar 2017 Collaboration with David
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 8 Feb. 12, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 8 Feb. 12, 2018 1 10-601 Introduction
More informationMixture of Gaussians Models
Mixture of Gaussians Models Outline Inference, Learning, and Maximum Likelihood Why Mixtures? Why Gaussians? Building up to the Mixture of Gaussians Single Gaussians Fully-Observed Mixtures Hidden Mixtures
More informationONR Mine Warfare Autonomy Virtual Program Review September 7, 2017
ONR Mine Warfare Autonomy Virtual Program Review September 7, 2017 Information-driven Guidance and Control for Adaptive Target Detection and Classification Silvia Ferrari Pingping Zhu and Bo Fu Mechanical
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Richard Socher Organization Main midterm: Feb 13 Alternative midterm: Friday Feb
More informationChapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang
Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check
More informationThe Noisy Channel Model and Markov Models
1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle
More informationEssence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io
Essence of Machine Learning (and Deep Learning) Hoa M. Le Data Science Lab, HUST hoamle.github.io 1 Examples https://www.youtube.com/watch?v=bmka1zsg2 P4 http://www.r2d3.us/visual-intro-to-machinelearning-part-1/
More informationThe Naïve Bayes Classifier. Machine Learning Fall 2017
The Naïve Bayes Classifier Machine Learning Fall 2017 1 Today s lecture The naïve Bayes Classifier Learning the naïve Bayes Classifier Practical concerns 2 Today s lecture The naïve Bayes Classifier Learning
More informationGenerative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul
Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far
More informationVery Deep Residual Networks with Maxout for Plant Identification in the Wild Milan Šulc, Dmytro Mishkin, Jiří Matas
Very Deep Residual Networks with Maxout for Plant Identification in the Wild Milan Šulc, Dmytro Mishkin, Jiří Matas Center for Machine Perception Department of Cybernetics Faculty of Electrical Engineering
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning
More informationLast Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression
CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer
More informationCS 188: Artificial Intelligence. Machine Learning
CS 188: Artificial Intelligence Review of Machine Learning (ML) DISCLAIMER: It is insufficient to simply study these slides, they are merely meant as a quick refresher of the high-level ideas covered.
More informationLecture 12. Neural Networks Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 12 Neural Networks 10.12.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationLogistic Regression Introduction to Machine Learning. Matt Gormley Lecture 9 Sep. 26, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Logistic Regression Matt Gormley Lecture 9 Sep. 26, 2018 1 Reminders Homework 3:
More informationNatural Language Processing
Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationN-gram Language Modeling
N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical
More informationEEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1
EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationMaxout Networks. Hien Quoc Dang
Maxout Networks Hien Quoc Dang Outline Introduction Maxout Networks Description A Universal Approximator & Proof Experiments with Maxout Why does Maxout work? Conclusion 10/12/13 Hien Quoc Dang Machine
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationDeep Learning for NLP Part 2
Deep Learning for NLP Part 2 CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) 2 Part 1.3: The Basics Word Representations The
More informationCS 188: Artificial Intelligence Fall 2008
CS 188: Artificial Intelligence Fall 2008 Lecture 23: Perceptrons 11/20/2008 Dan Klein UC Berkeley 1 General Naïve Bayes A general naive Bayes model: C E 1 E 2 E n We only specify how each feature depends
More informationGeneral Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Overfitting. Example: OCR. Example: Spam Filtering. Example: Spam Filtering
CS 188: Artificial Intelligence Fall 2008 General Naïve Bayes A general naive Bayes model: C Lecture 23: Perceptrons 11/20/2008 E 1 E 2 E n Dan Klein UC Berkeley We only specify how each feature depends
More informationMini-project 2 (really) due today! Turn in a printout of your work at the end of the class
Administrivia Mini-project 2 (really) due today Turn in a printout of your work at the end of the class Project presentations April 23 (Thursday next week) and 28 (Tuesday the week after) Order will be
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationTensor Methods for Feature Learning
Tensor Methods for Feature Learning Anima Anandkumar U.C. Irvine Feature Learning For Efficient Classification Find good transformations of input for improved classification Figures used attributed to
More informationLanguage Models. CS6200: Information Retrieval. Slides by: Jesse Anderton
Language Models CS6200: Information Retrieval Slides by: Jesse Anderton What s wrong with VSMs? Vector Space Models work reasonably well, but have a few problems: They are based on bag-of-words, so they
More informationA Deep Interpretation of Classifier Chains
A Deep Interpretation of Classifier Chains Jesse Read and Jaakko Holmén http://users.ics.aalto.fi/{jesse,jhollmen}/ Aalto University School of Science, Department of Information and Computer Science and
More informationlecture 6: modeling sequences (final part)
Natural Language Processing 1 lecture 6: modeling sequences (final part) Ivan Titov Institute for Logic, Language and Computation Outline After a recap: } Few more words about unsupervised estimation of
More information