Classifier Adaptation at Prediction Time


1 Classifier Adaptation at Prediction Time, or: How Bayes' rule might help you to reduce your error rate by half. Christoph Lampert. Yandex, Moscow, September 8th, 2016

2 IST Austria (Institute of Science and Technology Austria), Vienna. New public research institute. Natural and formal sciences: Computer Science, Mathematics, Biology, Neuroscience, Physics. PhD-granting, no undergrad. Basic, curiosity-driven research with a focus on interdisciplinary work. Open positions in all fields: IST Austria Graduate School, Postdoc Fellowships, Tenure-Track Assistant Professors, Full Professors, Internships, Research Visits, Sabbaticals, ... More information: ist.ac.at or chl@ist.ac.at

3 Long term goal Automatic systems that can analyze and interpret data Image Understanding Three men sit at a table in a pub, drinking beer. One of them talks while the other two listen. Image: British Broadcasting Corporation (BBC)

4 State of the art: analyze individual aspects of visual data. Scene Classification: indoors, in a pub. Action Classification: drinking, talking. Object Recognition: three persons, one table, three glasses

5 Crucial Step: Object Recognition: which objects (person, bottle, cake, truck, car, table, tiger, zebra, ...) are present in the image? Image: Tony Alter, under Creative Commons

6 Object recognition has gone large scale: big data Image: Forsyth, Efros, Fei-Fei, Torralba, Zisserman, "The Promise and Perils of Benchmark Datasets and Challenges", 2011.

7 Object recognition has gone complex: deep networks Image left: Krizhevsky, Sutskever, Hinton, "ImageNet classification with deep convolutional neural networks", NIPS 2012. Image right: adapted from He, Zhang, Ren, Sun, "Deep Residual Learning for Image Recognition", arXiv:1512.03385

8 Object recognition has gone expensive: HPC/GPU clusters Image: "The CSIRO GPU cluster at the data centre" by CSIRO. Licensed under CC BY 3.0 via Wikimedia Commons

9 Don't train object classifiers yourself. Order them pre-trained. Image: faked

10 Research Challenge Image Understanding with Pretrained Classifiers

11 Academic setting: independent, identically distributed data at training and prediction time. Image: ImageNet dataset

12 Vendor Customer 1 Domain Shift Image: ImageNet dataset Image: "Supermarkt". Licensed under GFDL via Wikimedia Commons

13 Vendor Customer 2 Domain Shift, Dependent Samples Image: ImageNet dataset Image: "Baggage Claim at CPH" by Duhhitsminerva. Licensed under CC BY 3.0 via Wikimedia Commons

14 Vendor Customer 3 Domain Shift, Dependent Samples, Non-Stationary Distribution Image: ImageNet dataset Image: Christoph Lampert 2015

15 Dependent Samples Academic setting: training and test data are sampled i.i.d. images are independent, identically distributed Real-life prediction tasks: very much non-i.i.d. surveillance: temporal dependences between images photo collections: specific selection of themes

16 Dependent Samples Academic setting: training and test data are sampled i.i.d. images are independent, identically distributed Real-life prediction tasks: very much non-i.i.d. surveillance: temporal dependences between images photo collections: specific selection of themes We argue: This is a blessing, not a nuisance! some shop Images: ImageNet dataset

17 Dependent Samples Academic setting: training and test data are sampled i.i.d. images are independent, identically distributed Real-life prediction tasks: very much non-i.i.d. surveillance: temporal dependences between images photo collections: specific selection of themes We argue: This is a blessing, not a nuisance! earlier images act as context bakery Images: ImageNet dataset

18 Domain Shift. Notation: x ∈ X images, y ∈ Y = {1,...,K} class labels; P(x,y) data distribution at training time (vendor); Q(x,y) data distribution at prediction time (customer). Domain shift: P(x,y) ≠ Q(x,y)

19 Domain Shift. Notation: x ∈ X images, y ∈ Y = {1,...,K} class labels; P(x,y) data distribution at training time (vendor); Q(x,y) data distribution at prediction time (customer). Domain shift: P(x,y) ≠ Q(x,y). Three cases: P(y|x) = Q(y|x), but P(x) ≠ Q(x): covariate shift; P(x|y) ≠ Q(x|y): appearance shift; P(x|y) = Q(x|y), but P(y) ≠ Q(y): class prior shift

21 Domain Shift: Appearance shift is mitigated by invariant features. Image: [Donahue et al., ICML 2014]

23 Domain Shift. Training time: P(y) typically balanced, P(y) ≈ 1/K; e.g. in ILSVRC2014, as many volcanos as cucumbers. Prediction time: Q(y) highly imbalanced, low entropy (easy to learn!): supermarket: Q(y) lots of fruit, most likely no volcanos; airport: Q(y) lots of people and baggage, also no volcanos; vacation: Q(y) occasional volcanos, but more beaches. Class prior shift is real, but also potentially beneficial.

24 Classifier Adaptation at Prediction Time Amélie Royer ENS Rennes/IST Austria [A. Royer, CHL, "Classifier Adaptation at Prediction Time", CVPR 2015]

25 Class Prior Adaptation. Training time: optimal multi-class classifier f: X → Y, f(x) = argmax_{y∈Y} f_y(x) for f_y(x) ≈ P(y|x). Prediction time: optimal multi-class classifier g: X → Y, g(x) = argmax_{y∈Y} g_y(x) for g_y(x) ≈ Q(y|x). For P(x|y) = Q(x|y), but P(y) ≠ Q(y): Q(y|x) = P(y|x)P(x)Q(y) / (P(y)Q(x)) ∝ f_y(x) Q(y)/P(y). Optimal classifier: g(x) = argmax_{y∈Y} f_y(x) Q(y)/P(y).

26 Class Prior Adaptation. Probabilistic classifier f(x) = argmax_y f_y(x), with f_y: X → R; class proportions at training time ρ ∈ R^K, i.e. ρ_y = P(y); class proportions at prediction time π ∈ R^K, i.e. π_y = Q(y). Definition: The class-prior adaptation of f from ρ to π is g(x) = argmax_{y∈Y} g_y(x) for g_y(x) = f_y(x) π_y / ρ_y. Note: no retraining, only adjust output scores. [Saerens et al., 2002] Lemma: g is Bayes-optimal for Q(x,y)-distributed data, if P(x,y) differs from Q(x,y) only in the class proportions and f_y(x) = P(y|x).
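
To make the adaptation rule concrete, here is a minimal sketch (not the authors' implementation; all names are hypothetical) of rescaling a classifier's per-class scores by π_y/ρ_y:

```python
import numpy as np

def adapt_scores(f_scores, rho, pi):
    """Class-prior adaptation of probabilistic scores (sketch).

    f_scores: array of shape (K,), approximating P(y|x) under training priors rho.
    rho:      training-time class proportions, shape (K,).
    pi:       (estimated) prediction-time class proportions, shape (K,).
    Returns adapted scores g_y(x) proportional to f_y(x) * pi_y / rho_y.
    """
    g = f_scores * pi / rho
    return g / g.sum()          # normalization does not change the argmax

# toy example: balanced 3-class training prior, skewed prediction-time prior
f = np.array([0.45, 0.40, 0.15])          # classifier output for one image
rho = np.array([1/3, 1/3, 1/3])
pi = np.array([0.1, 0.8, 0.1])
print(adapt_scores(f, rho, pi).argmax())  # adapted prediction: class 1, not class 0
```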

28 Class-Prior Adaptation during Sequential Prediction. Problem: In the vendor/customer scenario, the class proportions at prediction time, π, are unknown. Solution: learn the proportions on-the-fly at prediction time

29 Class-Prior Adaptation during Sequential Prediction. Problem: In the vendor/customer scenario, the class proportions at prediction time, π, are unknown. Solution: learn the proportions on-the-fly at prediction time. Sequential prediction scenario: images to be classified arrive sequentially, x_1, x_2, ...; goal: for each x_t make prediction g(x_t). Three possible feedback scenarios: online: after prediction the correct label, y_t, is revealed (e.g. supermarket cash register); bandit: after prediction it is revealed if a mistake was made (e.g. augmented reality glasses); unsupervised: no feedback about correct labels (e.g. surveillance)

30-49 Example (no adaptation): the base classifier is applied to a stream of images, one step revealed per slide. For the first image the scores are f_cat(x_t) = 0.8, f_dog(x_t) = 0.1, f_truck(x_t) = 0.1, so the prediction is f(x_t) = cat. Over the five images shown, the predictions are cat, truck, dog, cat, truck; the online feedback (true labels) is cat, dog, dog, cat, dog; the bandit feedback is correct, wrong, correct, correct, wrong; in the no-feedback setting nothing is revealed. Without adaptation the classifier's scores and predictions never change in response to the feedback.

50 Estimating Class Priors. Examples, x_1, x_2, ..., with labels, y_1, y_2, ... Task: estimate class proportions, (π_y)_{y∈Y}. Smoothed Maximum Likelihood (aka Bayesian estimator with Dirichlet prior): π_y^(t) = (n_t(y) + α) / (t + Kα) for α > 0 (e.g. α = 1/2), where n_t(y) = Σ_{τ=1}^{t} ⟦y_τ = y⟧ counts how often each label occurred so far. (Preferable to the ML estimator, π_y^(t) = n_t(y)/t, which assigns 0 probability to unseen classes.)
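
A minimal sketch of the smoothed estimator (illustrative only, assuming the label counts are kept in a NumPy array; the function name is my own):

```python
import numpy as np

def smoothed_priors(counts, t, alpha=0.5):
    """Smoothed ML / Dirichlet-prior estimate of class proportions.

    counts: array of shape (K,) with n_t(y), how often each label has been
            observed among the first t examples.
    t:      number of examples seen so far.
    alpha:  smoothing pseudo-count (alpha > 0); alpha = 0 gives plain ML.
    """
    counts = np.asarray(counts, dtype=float)
    K = counts.shape[0]
    return (counts + alpha) / (t + K * alpha)

# example with K = 3 classes after observing the labels cat, dog, dog, cat
print(smoothed_priors([2, 2, 0], t=4))   # the unseen class keeps nonzero probability
```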

51 Online Feedback. π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1}^{t} ⟦y_τ = y⟧. After prediction g(x_t), the correct label y_t is revealed; compute n_t(y) incrementally: n_t(y) = n_{t-1}(y) + ⟦y_t = y⟧.

52 Online Feedback. π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1}^{t} ⟦y_τ = y⟧. After prediction g(x_t), the correct label y_t is revealed; compute n_t(y) incrementally: n_t(y) = n_{t-1}(y) + ⟦y_t = y⟧. Law of large numbers: π^(t) converges to the true class distribution. This holds even for dependent samples (under weak conditions).
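
Combining the estimator with the score adaptation, a sketch of the online-feedback loop could look as follows (hypothetical helper names; in the actual experiments the score function is a pre-trained CNN or SVM):

```python
import numpy as np

def online_adaptation(score_fn, stream, K, rho, alpha=0.5):
    """Sequential prediction with class-prior adaptation and online feedback.

    score_fn: callable returning per-class scores f_y(x), approximating P(y|x).
    stream:   iterable of (x_t, y_t) pairs; y_t is only used after predicting.
    K:        number of classes; rho: training-time class proportions (shape (K,)).
    """
    counts = np.zeros(K)
    predictions = []
    for t, (x, y_true) in enumerate(stream, start=1):
        pi = (counts + alpha) / ((t - 1) + K * alpha)   # prior estimate from t-1 labels
        g = score_fn(x) * pi / rho                      # adapted scores g_y(x)
        predictions.append(int(np.argmax(g)))
        counts[y_true] += 1                             # online feedback update
    return predictions
```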

53-76 Example (online feedback): the same image stream, now with class-prior adaptation. Initially π_(cat,dog,truck) = (1/3, 1/3, 1/3), so the adapted scores equal the base scores (g_cat(x_t) = 0.8, g_dog(x_t) = 0.1, g_truck(x_t) = 0.1 for the first image) and the first prediction is cat. After each revealed label the counts and the prior estimate are updated: feedback cat gives n(cat) += 1 and π = (2/4, 1/4, 1/4); feedback dog gives n(dog) += 1 and π = (2/5, 2/5, 1/5); feedback dog gives n(dog) += 1 and π = (2/6, 3/6, 1/6); feedback cat gives n(cat) += 1 and π = (3/7, 3/7, 1/7). The predictions along the way are cat, truck, dog, cat, dog(!): for the fifth image the adapted classifier predicts dog instead of truck, which matches the revealed label dog.

77 Bandit Feedback. π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1}^{t} δ_τ(y). After prediction g(x_t), it is only revealed whether the prediction was correct; estimate n_t(y) incrementally: n_t(y) = n_{t-1}(y) + δ_t(y), where if the decision was correct: δ_t(y) = ⟦y_t = y⟧, and if the decision was incorrect: δ_t(y) = 0 for y = g(x_t) and 1/(K-1) otherwise. (Also possible: δ_t(y) ∝ Q^(t)(y|x_t) for y ≠ g(x_t).)
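
A sketch of the bandit-feedback count update under these definitions (the uniform 1/(K-1) variant; names are illustrative):

```python
import numpy as np

def bandit_update(counts, predicted, correct, K):
    """Update the label counts n_t(y) from bandit feedback.

    counts:    array of shape (K,) holding the running counts.
    predicted: class index g(x_t) that was predicted.
    correct:   True if the prediction was revealed to be correct.
    """
    delta = np.zeros(K)
    if correct:
        delta[predicted] = 1.0            # the true label is known: it was predicted
    else:
        delta[:] = 1.0 / (K - 1)          # spread the mass over all other classes
        delta[predicted] = 0.0
    return counts + delta
```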

78 Example (bandit feedback): the predictions g(x_t) are cat, truck, dog, cat, dog(!); the feedback is correct, wrong, correct, correct, correct. Count updates: n(cat) += 1; then (wrong prediction truck) n(cat) += 0.5 and n(dog) += 0.5; then n(dog) += 1; then n(cat) += 1. The estimate π_(cat,dog,truck) evolves as (1/3, 1/3, 1/3), (2/4, 1/4, 1/4), (2.5/5, 1.5/5, 1/5), (2.5/6, 2.5/6, 1/6), (3.5/7, 2.5/7, 1/7).

79 Unsupervised (No Feedback). π_y^(t) = (n_t(y) + α) / (t + Kα) for n_t(y) = Σ_{τ=1}^{t} δ_τ(y). No information whether prediction g(x_t) was correct or not; estimate n_t(y) by trusting our own predictions (self-training): n_t(y) = n_{t-1}(y) + δ_t(y) with δ_t(y) = E_{ȳ ~ Q^(t)(ȳ|x_t)} ⟦y = ȳ⟧ = g_y^(t)(x_t) / Σ_ȳ g_ȳ^(t)(x_t). No guarantee, but can be expected to work for decent base classifiers.
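
The unsupervised update adds the classifier's own (adapted and normalized) scores as soft counts; a minimal sketch, assuming the adapted scores are non-negative:

```python
import numpy as np

def unsupervised_update(counts, adapted_scores):
    """Self-training count update without any feedback.

    counts:         array of shape (K,), running soft counts n_t(y).
    adapted_scores: unnormalized adapted scores g_y(x_t) for the current image.
    Adds the expected label indicator, i.e. the normalized adapted scores.
    """
    q = np.asarray(adapted_scores, dtype=float)
    delta = q / q.sum()          # delta_t(y) = g_y(x_t) / sum over ybar of g_ybar(x_t)
    return counts + delta
```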

80 Example (no feedback): the predictions g(x_t) are cat, truck, dog, cat, dog(!); no labels are revealed. Each image adds its normalized adapted scores as soft counts: n(cat) += 0.8, n(dog) += 0.1, n(truck) += 0.1; then n(cat) += 0.15, n(dog) += 0.38, n(truck) += 0.47; then n(cat) += 0.25, n(dog) += 0.65, n(truck) += 0.10; then n(cat) += 0.53, n(dog) += 0.31, n(truck) += ... The estimate π_(cat,dog,truck) starts at (1/3, 1/3, 1/3) and is updated with these soft counts after every image; for the fifth image the adapted classifier again predicts dog instead of truck.

81 Extension: Non-Stationary Data Distribution. What if the data distribution changes, e.g. a mobile camera? Sliding window estimate: adapt only to the recent past (e.g. L = 100): π_y^(t) = (n_t(y) + α) / (L + Kα), with n_t(y) = Σ_{τ=t-L+1}^{t} δ_τ(y).
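
The windowed variant only needs the last L per-step increments δ_τ; a sketch using a fixed-length buffer (class and parameter names are my own):

```python
from collections import deque
import numpy as np

class WindowedPriorEstimator:
    """Sliding-window class-prior estimate over the last L increments."""

    def __init__(self, K, L=100, alpha=0.5):
        self.K, self.L, self.alpha = K, L, alpha
        self.window = deque(maxlen=L)     # keeps only the last L delta vectors

    def update(self, delta):
        """delta: length-K increment (one-hot label, bandit spread, or soft scores)."""
        self.window.append(np.asarray(delta, dtype=float))

    def priors(self):
        n = sum(self.window, np.zeros(self.K))   # n_t(y) over the last L steps
        # denominator L + K*alpha, following the slide's formula
        return (n + self.alpha) / (self.L + self.K * self.alpha)
```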

82 Realistic Image Sequences How to benchmark such an adaptive classification system? We need realistic image sequences: non-uniform class distribution dependent samples non-stationary distribution

83 Realistic Image Sequences. How to benchmark such an adaptive classification system? We need realistic image sequences: non-uniform class distribution, dependent samples, non-stationary distribution. Proposal: three methods based on an existing i.i.d. corpus (ILSVRC). KS/MDS: Hidden Markov Model structure; random walk between classes; for each visited class, sample one image. TXT: based on class structure in natural language; for each class occurrence in a text document, sample one image

84 Realistic Image Sequences: MDS Apply Multi-Dimensional Scaling (MDS) to ImageNet hierarchy Random walk on k-nn graph For each visited class, sample one image from ILSVRC corpus Properties: highly connected semantic clusters, random walk stays within one "topic" for an extended time

85 Realistic Image Sequences: KS Apply Kernelized Sorting (KS) to ImageNet hierarchy Random walk on resulting grid graph For each visited class, sample one image from ILSVRC corpus Properties: similar classes are close, but no cluster structure, random walk frequently "changes topic"

86 Introducing Context Switches Extend MDS/KS to allow "jumps" called MDS(λ), KS(λ) Introduce parameter λ > 0 Instead of taking a random walk step, jump to arbitrary (random) node in the graph with probability λ. Result: homogeneous subsequences of variable length
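
As an illustration of the MDS(λ)/KS(λ) construction, here is a sketch of a random walk with jump probability λ over a class graph (not the paper's generator; the graph representation and names are assumptions):

```python
import random

def sample_class_sequence(neighbors, length, lam=0.01, rng=random):
    """Random walk over a class graph with jump probability lam.

    neighbors: dict mapping each class to the list of its graph neighbours
               (k-NN graph for MDS, grid graph for KS); every node is assumed
               to have at least one neighbour.
    With probability lam the walk restarts at an arbitrary node, producing
    homogeneous subsequences of variable length; for each visited class one
    image would then be sampled from the ILSVRC corpus.
    """
    nodes = list(neighbors)
    current = rng.choice(nodes)
    sequence = []
    for _ in range(length):
        sequence.append(current)
        if rng.random() < lam:
            current = rng.choice(nodes)               # context switch ("jump")
        else:
            current = rng.choice(neighbors[current])  # ordinary random-walk step
    return sequence
```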

87 Realistic Image Sequences: TXT. Given: corpus of well-formed English texts (Project Gutenberg). Generate image sequence: discard non-nouns from the text; scan the noun sequence for class names or ImageNet hypernyms; if leaf in hierarchy (cucumber): sample an image from that class; if interior node (dog): sample a random leaf from the subtree (Tibetan mastiff), sample an image from that leaf class. "... when the rabbit actually took a watch out of its waistcoat-pocket and looked at it and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, ..." (Excerpt from Alice in Wonderland; italics: nouns, bold: ILSVRC2010 (super-)classes)

88 Example Sequences. TXT: rabbit, watch, rabbit, watch, rabbit, rabbit, jar, orange, jar, ... MDS: asparagus, jalapeno, green onion, jalapeno, jalapeno, kidney bean, pumpkin, french fries, ... KS: nematode, sea cucumber, snow leopard, leopard, leopard, leopard, mink, weasel, ... RND: speedboat, coral reef, burrito, lionfish, envelope, fur coat, trifle, paddle, punching bag, ... Example label sequences and test images for TXT, MDS, KS, compared to uniform i.i.d. (RND). Images: ImageNet dataset

89 Experimental Setup. Base datasets: ILSVRC2010, ILSVRC2012 (val part). Base classifiers (pre-trained): Convolutional Neural Network (libccv, AlexNet style); SVM with 4K-dim. Fisher vectors (yael/jsgd) + Platt scaling. Methods: base classifier; base classifier + adaptation (+adapt); base classifier + windowed adaptation (+dyn). Test sets: 100 sequences each of MDS, MDS(λ) for λ ∈ {0.001, 0.01, 0.1}, length 3000; KS, KS(λ) for λ ∈ {0.001, 0.01, 0.1}, length 3000; TXT, variable length (avg. 3475); RND, length 3000. Error measures: top-1 error rate, top-5 error rate

91 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 2.6 MDS(0.001) 15.6 ± ± ± 1.9 MDS(0.01) 15.7 ± ± ± 1.1 MDS(0.1) 16.2 ± ± ± 0.7 KS 16.4 ± ± ± 1.3 KS(0.001) 16.5 ± ± ± 1.2 KS(0.01) 16.4 ± ± ± 1.0 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Online Feedback (each cell: top-5 error [%], mean and std.dev. over 100 sequences)

95 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 3.3 MDS(0.001) 15.6 ± ± ± 2.6 MDS(0.01) 15.7 ± ± ± 1.4 MDS(0.1) 16.2 ± ± ± 0.8 KS 16.4 ± ± ± 1.6 KS(0.001) 16.5 ± ± ± 1.6 KS(0.01) 16.4 ± ± ± 1.3 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Bandit Feedback (each cell: top-5 error [%], mean and std.dev. over 100 sequences)

96 Results ILSVRC2012 CNN CNN+adapt CNN+dyn TXT 19.8 ± ± ± 1.7 MDS 16.1 ± ± ± 2.9 MDS(0.001) 15.6 ± ± ± 2.7 MDS(0.01) 15.7 ± ± ± 1.5 MDS(0.1) 16.2 ± ± ± 0.8 KS 16.4 ± ± ± 1.5 KS(0.001) 16.5 ± ± ± 1.5 KS(0.01) 16.4 ± ± ± 1.1 KS(0.1) 16.5 ± ± ± 0.8 RND 16.5 ± ± ± 0.6 Unsupervised (No Feedback) (each cell: top-5 error [%], mean and std.dev. over 100 sequences)

97 Summary. Observations: soon we will buy computer vision components pre-trained, which creates new kinds of research problems; in real problems the images to be classified are not uniform i.i.d.: class imbalance, dependent samples, non-stationary distribution. Contributions: classifier adaptation with on-the-fly estimation of class priors, oblivious of the underlying base classifiers, only adjusting output scores; three methods for creating dependent test image sequences. Results: on-the-fly adaptation can reduce the error rate substantially; for good enough base classifiers, no additional supervision is needed.

98 Thanks to... The team at IST Austria: Alex Kolesnikov Georg Martius Asya Pentina Amélie Royer Alex Zimin Funding Sources:
