Questioning Question Answering Answers

Similar documents
Why Should I Trust You? Speaker: Philip-W. Grassal Date: 05/03/18

Solving with Absolute Value

An introduction to plotting data

COMP3702/7702 Artificial Intelligence Week1: Introduction Russell & Norvig ch.1-2.3, Hanna Kurniawati

Chemical Applications of Symmetry and Group Theory Prof. Manabendra Chandra Department of Chemistry Indian Institute of Technology, Kanpur.

Fog Chamber Testing the Label: Photo of Fog. Joshua Gutwill 10/29/1999

Primary KS1 1 VotesForSchools2018

2. Light carries information. Scientists use light to learn about the Universe.

Newton s Cooling Model in Matlab and the Cooling Project!

Chapter 6. Net or Unbalanced Forces. Copyright 2011 NSTA. All rights reserved. For more information, go to

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Guest Speaker. CS 416 Artificial Intelligence. First-order logic. Diagnostic Rules. Causal Rules. Causal Rules. Page 1

Homework 6: Image Completion using Mixture of Bernoullis

Chapter 15. Probability Rules! Copyright 2012, 2008, 2005 Pearson Education, Inc.

Lesson 7: Classification of Solutions

SOLUTIONS Workshop 2: Reading Graphs, Motion in 1-D

PARCC Research Simulation Task Grade 3 Reading Lesson 5: Using Context Clues for the Vocabulary EBSR

Chemical Applications of Symmetry and Group Theory Prof. Manabendra Chandra Department of Chemistry Indian Institute of Technology, Kanpur

PHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14

Solving Equations by Adding and Subtracting

PH104 Lab 2 Measuring Distances Pre-Lab

Ask. Don t Tell. Annotated Examples

The Shrinking Aral Sea

Please read the following excerpt from an editorial about the Atkins diet

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

Math 111, Introduction to the Calculus, Fall 2011 Midterm I Practice Exam 1 Solutions

These are my slides and notes introducing the Red Queen Game to the National Association of Biology Teachers meeting in Denver in 2016.

Lecture 4: Training a Classifier

Algorithms other than SGD. CS6787 Lecture 10 Fall 2017

Discovering Geographical Topics in Twitter

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi

Propositions. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 5.1, Page 1

Foundations for Functions

33. SOLVING LINEAR INEQUALITIES IN ONE VARIABLE

Regression, part II. I. What does it all mean? A) Notice that so far all we ve done is math.

Big Bang, Black Holes, No Math

CS 188: Artificial Intelligence Spring Today

Is there life outside of Earth? Activity 2: Moving Stars and Their Planets

CS 160: Lecture 16. Quantitative Studies. Outline. Random variables and trials. Random variables. Qualitative vs. Quantitative Studies

The PROMYS Math Circle Problem of the Week #3 February 3, 2017

Clouds & Mission for NASA

Activity Book Made just for me by the Santa Cruz Consolidated Emergency Communications Center

Data Warehousing & Data Mining

Expansion of Terms. f (x) = x 2 6x + 9 = (x 3) 2 = 0. x 3 = 0

Explaining Results of Neural Networks by Contextual Importance and Utility

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Lines and Their Equations

Module 4 Educator s Guide Overview

Algebra Exam. Solutions and Grading Guide

Hypothesis testing I. - In particular, we are talking about statistical hypotheses. [get everyone s finger length!] n =

Linear classifiers Lecture 3

(Refer Slide Time: 00:10)

Egyptian Fractions: Part I

China Day 5 The Shang Dynasty

2.4 Investigating Symmetry

How GIS can be used for improvement of literacy and CE programmes

Directed Reading B. Section: Tools and Models in Science TOOLS IN SCIENCE MAKING MEASUREMENTS. is also know as the metric system.

NAME: DATE: MATHS: Working with Sets. Maths. Working with Sets

Integrating ARCGIS with Datamining Software to Predict Habitat for Red Sea Urchins on the Coast of British Columbia.

How to Lie with Maps. Author: Mark Monmonier

Crossing the Atlantic: Then and Now

Privacy of Numeric Queries Via Simple Value Perturbation. The Laplace Mechanism

Quantum Socks and Photon Polarization Revision: July 29, 2005 Authors: Appropriate Level: Abstract:

Honors Advanced Algebra Unit 3: Polynomial Functions October 28, 2016 Task 10: Factors, Zeros, and Roots: Oh My!

Solar Open House Toolkit

Conduction and Radiation Prof. C. Balaji Department of Mechanical Engineering Indian Institute of Technology, Madras

Programming Assignment 4: Image Completion using Mixture of Bernoullis

Introduction to Machine Learning Midterm Exam

Science Analysis Tools Design

California State Science Fair

Math Fundamentals for Statistics I (Math 52) Unit 7: Connections (Graphs, Equations and Inequalities)

HOST: Welcome to Astronomy Behind the Headlines, a podcast by the Astronomical

1.1 The Language of Mathematics Expressions versus Sentences

LABORATORY 4: ROTATIONAL MOTION PLAYGROUND DYNAMICS: THE MERRY-GO-ROUND Written May-June 1993 by Melissa Wafer '95

Lesson 8: Graphs of Simple Non Linear Functions

Linear Programming and its Extensions Prof. Prabha Shrama Department of Mathematics and Statistics Indian Institute of Technology, Kanpur

Absolute and Local Extrema

Kernels. Machine Learning CSE446 Carlos Guestrin University of Washington. October 28, Carlos Guestrin

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

Physics 8 Monday, September 16, 2013

SAMPLE. Succeeding in Social Studies 3 4 TH IN A SERIES OF 7. Years 3 4. Written by Valerie Marett. CORONEOS PUBLICATIONS Item No 506

Intensity of Light and Heat. The second reason that scientists prefer the word intensity is Well, see for yourself.

Linear Independence Reading: Lay 1.7

MITOCW 6. Standing Waves Part I

Efficient and Principled Online Classification Algorithms for Lifelon

Basics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On

Electromagnetic radiation simply a stream of photons (a bundle of energy) What are photons???

TIME: 45 minutes. LESSON: Curious About Clouds GRADE: 1 st SUMMARY:

Lecture 27: Theory of Computation. Marvin Zhang 08/08/2016

Alg2H Ch6: Investigating Exponential and Logarithmic Functions WK#14 Date:

Your web browser (Safari 7) is out of date. For more security, comfort and. the best experience on this site: Update your browser Ignore

Effective January 2008 All indicators in Standard / 11

Where I is the current in Amperes, q is the charge in coulombs, and t is the time in seconds.

Section 29: What s an Inverse?

Exploring Graphs of Polynomial Functions

Special Theory of Relativity Prof. Shiva Prasad Department of Physics Indian Institute of Technology, Bombay. Lecture - 15 Momentum Energy Four Vector

Adam Blank Spring 2017 CSE 311. Foundations of Computing I

Your web browser (Safari 7) is out of date. For more security, comfort and. the best experience on this site: Update your browser Ignore

Multiple Representations: Equations to Tables and Graphs Transcript

Towards Fully-automated Driving

Design and Optimization of Energy Systems Prof. C. Balaji Department of Mechanical Engineering Indian Institute of Technology, Madras

Transcription:

Questioning Question Answering Answers Sameer Singh University of California, Irvine

Questioning Question Answering Answers Sameer Singh University of California, Irvine

QA Systems are really good! Is there a moustache in the picture? > Yes What is the moustache made of? > Banana Visual7A [Zhu et al 2016]

QA Systems are really good! The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) How long is the Rhine? 1230km Is it doing the right thing? BiDAF [Seo et al 2017] 4

We know that they are not Jia and Liang, EMNLP 2017 Mudrakarta et al ACL 2018

Overstability! What is the moustache made of? > Banana What are the eyes made of? > Bananas What is? > Banana What? > Banana

Oversensitivity to phrasing! What type of road sign is shown? > STOP. What type of road sign is shown? > Do not Enter.

Oversensitivity to unimportant typos! How long is the Rhine? The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) > 1230km How long is the Rhine? > More than 1,050,000

QA Systems are brittle Our goals are to provide automated tools For both oversensitivity and overstability Can we figure these out automatically, with minimal human time? Can we try to rationalize/explain predictions? analyze the mistakes? Hopefully, they help design choices for: Data gathering and annotations Model structure and training Evaluation pipelines

Being Model-Agnostic Ignore the internal structure X1 > 0.5 f(x) X2 > 0.5 Not restricted to differentiable modules Practically easy: not tied to PyTorch, Tflow, etc. Study models that you don t have access to! 10

Talk Overview Explaining Predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity

Talk Overview Explaining Predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity

Being Local Global explanation is too complicated

Being Local Global explanation is too complicated

Being Local Global explanation is too complicated Describe the locally-accurate behavior, using interpretable representations

Talk Overview KDD 2016 Explaining Predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity

LIME: Sparse, Linear Explanations Identify the important words, and present their relative importance

What an explanation looks like From: Keith Richards Subject: Christianity is the answer NTTP-Posting-Host: x.x.com I think Christianity is the one true religion. If you d like to know more, send me a note Why did this happen?

LIME on VisualQA What type of road sign is shown? > STOP. LIME What type of road sign is shown?

LIME on SQuAD The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) What is the longest river in Central and Western Europe? the Danube LIME BiDAF [Seo et al 2017] What is the longest river in Central and Western Europe?

LIME on SQuAD The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) What is the second longest river in Central and Western Europe? the Danube LIME BiDAF [Seo et al 2017] What is the second longest river in Central and Western Europe?

Limitations of LIME Gain understanding of local behavior, but very little generalization The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) Which is the second longest river in Germany s part of Europe? Unless they run it, the users have little idea of what the answer will be

Talk Overview Explaining Predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity AAAI 2018

Anchors: Sufficient Conditions Identify the conditions under which the classifier has the same prediction

Anchors on VisualQA What type of road sign is shown? STOP. If question starts with What (and is similarly structured) the prediction will be STOP What type of road sign is shown? What type of road sign is shown? 96.8%

Anchors on Visual QA Anchor

Anchors on Visual QA Anchor

Anchors on SQuAD The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) What is the longest river in Central and Western Europe? the Danube What is the longest river in Central and Western Europe? What is the longest river in Central and Western Europe? 96.5%

Anchors on SQuAD The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) What is the second longest river in Central and Western Europe? the Danube What is the second longest river in Central and Western Europe? What is the second longest river in Central and Western Europe?

User study on VisualQA Show humans predictions + explanations Ask them to predict what the model will do in new instances (only if confident) No explanations Which is the longest river? LIME Which is the longest river? Anchor Danube Danube Which is second longest river? Danube, Rhine, I don t know Anchor: longest river Danube

Summary of VisualQA Results How often they predict How often they correct Time per prediction 100 100 95.95 20 80 60 62.85 80 60 64.95 66.9 10 16.3 9.85 40 20 35.3 29.6 40 Users are more precise 20 and quicker with anchors 4.55 0 No Explanations LIME Anchor 0 No Explanations LIME Anchor 0 No Explanations LIME Anchor

Anchors: Tools for Overstability What about Over-sensitivity?

Talk Overview Explaining predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity ACL 2018

Oversensitivity: Adversarial Examples Find closest example with different prediction 37

But unlikely in the real world (except for attacks) Oversensitivity in images panda 57.7% confidence gibbon 99.3% confidence Adversaries are indistinguishable to humans 39

What about text? What type of road sign is shown? > STOP. What type of of road sign sign is is shown? Perceptible by humans, unlikely in real world 40

What about text? What type of road sign is shown? > STOP. What type of road sign is shown? A single word changes too much! 41

Semantics matter What type of road sign is shown? > STOP. What type of road sign is shown? > Do not Enter. Bug, and likely in the real world 42

Semantics matter The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) How long is the Rhine? > 1230km How long is the Rhine? > More than 1,050,000 Not all changes are the same: meaning should be same 43

Characterize via Rules Find rule that generates many adversaries 44

Characterizing via Rules What type of road sign is shown? > STOP. What type of road sign is shown? > Do not Enter. Rule What NOUN Which NOUN - flips 3.9% of examples

Characterizing via Rules How long is the Rhine? The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi) > 1230km How long is the Rhine? > More than 1,050,000 Rule??? - flips 3% of examples 47

SEARS: Adversarial Rules Rules are global and actionable, more interesting than individual adversaries 48

SEARS Examples: VisualQA Visual7a-Telling [Zhu et al 2016] 49

SEARS Examples: SQuAD BiDAF [Seo et al 2017] 50

VQA User Study: Detecting adversaries 45 40 33.6 36 SEAs find adversaries as often as humans! 20 SEAs + Humans better than humans! 0 Human SEA Human + SEA Human SEA Human + SEA

VQA User study: Can experts find bugs? % predictions flipped Time (minutes) 20 14.2 SEARs are much better than expert-produced rules 20 16.9 10.1 3 Evaluating is much easier than finding them 0 Experts Visual QA SEARs Closing the loop brings it down to 1.4% 0 Finding Rules Visual QA Evaluating SEARs

Talk Overview Explaining Predictions LIME: Linear Explanations Anchors: Sufficient Conditions SEARS: Detecting Oversensitivity

Why such tools can be useful Annotations and Task Definitions SQuAD 2.0: unanswerable questions VisualQA 2.0: questions with different answers Evaluation Create robust test set Include explanations/bugs as qualitative evaluation End to End QA may not be sufficient Saleforce s NLP Decathalon ELMO Representation: learn across domains, and fine-tune!

Questioning Question Answering Answers Work with Marco T. Ribeiro and Carlos Guestrin, University of Washington Thanks! sameer@uci.edu sameersingh.org @sameer_ Work with Matt Gardner and me as part of The Allen Institute for Artificial Intelligence in Irvine, CA All levels: pre-docs, PhD interns, postdocs, and research scientists!