Concepts & Categorization. Measurement of Similarity


Approaches to the measurement of similarity:
- Geometric approach
- Featural approach
Both are vector representations.

Vector representation for words
- Words are represented as vectors of feature values.
- Similar words have similar vectors.

[Figure: example feature-value vectors for the words dog, cat, pet, and radio]

How to get vector representations
- Multidimensional scaling on similarity ratings
- Tversky's (1977) contrast model
- Latent Semantic Analysis (Landauer & Dumais, 1997)
- Topics Model (e.g., Griffiths & Steyvers, 2002)

Multidimensional Scaling (MDS) Approach
- Suppose we have N stimuli.
- Measure the (dis)similarity between every pair of stimuli (N x (N-1) / 2 pairs).
- Represent each stimulus as a point in a multidimensional space.
- Similarity is measured by geometric distance, e.g., the Minkowski distance metric:

$$d_{ij} = \left[ \sum_{k=1}^{n} \left| x_{ik} - x_{jk} \right|^{r} \right]^{1/r}$$

Multidimensional Scaling
- Represent observed similarities in a multidimensional space: close neighbors should have high similarity.
- MDS is an iterative procedure that places points in a (low-)dimensional space to model the observed similarities.
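The Minkowski metric can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function name and the example coordinates are assumptions. Setting r = 1 gives the city-block distance and r = 2 the Euclidean distance.

```python
def minkowski_distance(x, y, r=2.0):
    """Minkowski distance between two equal-length coordinate sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    if r < 1:
        raise ValueError("r must be >= 1 for the result to be a metric")
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


# Hypothetical 2D coordinates for two stimuli.
p, q = (0.0, 0.0), (3.0, 4.0)
city_block = minkowski_distance(p, q, r=1)  # r = 1: city-block distance
euclidean = minkowski_distance(p, q, r=2)   # r = 2: Euclidean distance
```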

Data: matrix of (dis)similarities.
MDS procedure: move points in space to best model the observed similarity relations.
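One way to picture "moving points in space" is plain gradient descent on the squared mismatch between inter-point distances and the observed dissimilarities (raw stress). The sketch below is an assumption about how such a procedure could look, not the specific MDS algorithm used in the lecture; the function names, learning rate, step count, and the 3 x 3 dissimilarity matrix are all hypothetical.

```python
import math
import random


def stress(points, dissim):
    """Sum of squared differences between distances and dissimilarities."""
    n = len(points)
    return sum((math.dist(points[i], points[j]) - dissim[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds(dissim, dims=2, steps=2000, lr=0.01, seed=0):
    """Iteratively move points so their distances approximate `dissim`."""
    rng = random.Random(seed)
    n = len(dissim)
    points = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(points[i], points[j]) or 1e-12  # avoid divide by zero
                err = d - dissim[i][j]
                for k in range(dims):
                    # Gradient of the stress with respect to coordinate k of point i.
                    grads[i][k] += 2 * err * (points[i][k] - points[j][k]) / d
        for i in range(n):
            for k in range(dims):
                points[i][k] -= lr * grads[i][k]
    return points


# Hypothetical dissimilarities that fit a 3-4-5 triangle exactly in 2D.
dissim = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
points = mds(dissim)
final_stress = stress(points, dissim)
```

Because these dissimilarities are exactly embeddable in two dimensions, the final stress should be close to zero; real similarity ratings generally leave some residual stress.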

[Figures: example 2D solution for bold faces; example 2D solution for fruit words]

Critical Assumptions of Geometric Approach
Psychological distance should obey three axioms:
- Minimality: $d(a,b) \ge d(a,a) = d(b,b) = 0$
- Symmetry: $d(a,b) = d(b,a)$
- Triangle inequality: $d(a,b) + d(b,c) \ge d(a,c)$

For conceptual relations, violations of the distance axioms are often found.
- Similarities can often be asymmetric:
  - North Korea is more similar to China than vice versa.
  - Pomegranate is more similar to Apple than vice versa.
- Violations of the triangle inequality: Lemon, Orange, Apricot (e.g., Lemon is close to Orange and Orange is close to Apricot, yet Lemon is far from Apricot).
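The three axioms can be checked mechanically on a matrix of dissimilarity ratings. The sketch below is illustrative only; the function name and both rating matrices are made up for the example and are not data from the lecture.

```python
from itertools import permutations


def check_axioms(d, tol=1e-9):
    """Return the set of distance axioms violated by dissimilarity matrix `d`."""
    n = len(d)
    violated = set()
    for i in range(n):
        if abs(d[i][i]) > tol:
            violated.add("minimality")  # self-distance must be 0
        for j in range(n):
            if d[i][j] < d[i][i] - tol:
                violated.add("minimality")  # d(a, b) must be >= d(a, a)
            if abs(d[i][j] - d[j][i]) > tol:
                violated.add("symmetry")
    for a, b, c in permutations(range(n), 3):
        if d[a][b] + d[b][c] < d[a][c] - tol:
            violated.add("triangle inequality")
    return violated


# Hypothetical ratings, ordered Lemon, Orange, Apricot:
# Lemon-Orange and Orange-Apricot are close, Lemon-Apricot is far.
fruit = [[0, 1, 5],
         [1, 0, 1],
         [5, 1, 0]]
fruit_violations = check_axioms(fruit)

# Hypothetical asymmetric ratings for two items.
asymmetric = [[0, 2],
              [3, 0]]
asymmetric_violations = check_axioms(asymmetric)
```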

Triangle inequality and similarity: constraints on words with multiple meanings
[Figure: triangle with vertices SOCCER (A), FIELD (B), and MAGNETIC (C) and sides AB, BC, and AC]
Euclidean distance: $AC \le AB + BC$

Nearest neighbor problem (Tversky & Hutchinson, 1986)
- In similarity data, Fruit is the nearest neighbor of 18 out of 20 items.
- In a 2D solution, Fruit can be the nearest neighbor of at most 5 items.
- High-dimensional solutions might solve this, but they are less appealing.

Feature Contrast Model (Tversky, 1977)
- Represent stimuli with sets of discrete features.
- Similarity is an increasing function of common features and a decreasing function of distinctive features.

$$\mathrm{Sim}(I, J) = a\, f(I \cap J) - b\, f(I - J) - c\, f(J - I)$$

where $I \cap J$ are the common features, $I - J$ the features unique to I, $J - I$ the features unique to J, and a, b, and c are weighting parameters.

Contrast model predicts asymmetries
- With weighting parameters b > c, pomegranate is more similar to apple than vice versa, because pomegranate has fewer distinctive features.
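The asymmetry prediction is easy to see with Python sets. In this sketch f is taken to be the number of features, and the feature lists and the weights a, b, c are invented for illustration; only the inequality b > c comes from the slide.

```python
def contrast_similarity(i, j, a=1.0, b=0.8, c=0.2, f=len):
    """Contrast model: Sim(I, J) = a*f(I & J) - b*f(I - J) - c*f(J - I)."""
    return a * f(i & j) - b * f(i - j) - c * f(j - i)


# Hypothetical feature sets: pomegranate has fewer distinctive features.
pomegranate = {"fruit", "round", "red", "many_seeds"}
apple = {"fruit", "round", "red", "crunchy", "pie", "cider", "common"}

sim_pomegranate_apple = contrast_similarity(pomegranate, apple)
sim_apple_pomegranate = contrast_similarity(apple, pomegranate)
```

With these values the two directions share three common features, but the subject's distinctive features are weighted more heavily (b > c), so Sim(pomegranate, apple) is about 1.4 while Sim(apple, pomegranate) is about -0.4.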

Contrast model predicts violations of triangle inequality
- Weighting parameters a > b > c (common features should be weighted more).

[Figure: additive tree solution]

Latent Semantic Analysis (LSA) (Landauer & Dumais, 1997)
Assumptions:
1) Words similar in meaning occur in similar verbal contexts (e.g., magazine articles, book chapters, newspaper articles).
2) We can count the number of times words occur in documents and construct a large word x document matrix.
3) This co-occurrence matrix contains a wealth of latent semantic information that can be extracted by statistical techniques.
4) Words can be represented as points in a multidimensional space.

[Figure: a TERMS x DOCUMENTS count matrix (terms include FIELD, MEADOW, BASEBALL, MAJOR, FOOTBALL, CORN, GRASS) mapped to points in a high-dimensional space]

Information in the matrix is compressed; relationships between words through other words are used.
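The counting step (assumption 2) can be sketched as follows. This covers only the construction of the word x document matrix and a cosine comparison of raw rows; it deliberately omits the compression step that LSA relies on. The four mini-documents and the function names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(documents):
    """Map each word to its vector of counts across the documents."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocab = sorted(set().union(*counts))
    return {word: [c[word] for c in counts] for word in vocab}


def cosine(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical mini-corpus.
documents = [
    "the baseball field has grass",
    "football field and baseball field",
    "corn grows in the meadow",
    "grass in the meadow",
]
matrix = term_document_matrix(documents)
field_baseball = cosine(matrix["field"], matrix["baseball"])
field_meadow = cosine(matrix["field"], matrix["meadow"])
```

On the raw counts, FIELD and MEADOW never share a document, so their cosine is 0 even though both co-occur with GRASS. Indirect links of this kind are what the compression step is meant to pick up.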

Problem: LSA has to obey the triangle inequality
[Figure: triangle with vertices SOCCER (A), FIELD (B), and MAGNETIC (C)]
Euclidean distance: $AC \le AB + BC$

The Topics Model (Griffiths & Steyvers, 2002 & 2003)
- A probabilistic version of LSA: no spatial constraints.
- Each document (i.e., context) is a mixture of topics.
- Each topic is a distribution over words.
- Each word is chosen from a single topic:

$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

where $P(w_i \mid z_i = j)$ is the word probability in topic j and $P(z_i = j)$ is the probability of topic j in the document.
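The mixture formula is a weighted sum over topics, which a short sketch makes concrete. The two topic distributions and their probability values below are assumed for illustration; they are not values from the lecture.

```python
def word_probability(word, topic_word_probs, topic_weights):
    """P(w) = sum over topics j of P(w | z = j) * P(z = j), for one document."""
    return sum(weight * topic.get(word, 0.0)
               for topic, weight in zip(topic_word_probs, topic_weights))


# Hypothetical topic distributions P(w | z = j); each sums to 1.
topics = [
    {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "MYSTERY": 0.1, "JOY": 0.1},
    {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2, "RESEARCH": 0.2,
     "MATHEMATICS": 0.1, "MYSTERY": 0.1},
]

p_heart_topic1_only = word_probability("HEART", topics, [1.0, 0.0])
p_heart_mixed = word_probability("HEART", topics, [0.5, 0.5])
p_mystery_mixed = word_probability("MYSTERY", topics, [0.5, 0.5])
```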

A toy example
A document's TOPIC MIXTURE, P(z = 1) and P(z = 2), weights two MIXTURE COMPONENTS, P(w | z = 1) and P(w | z = 2), to generate each word w_i.
- Topic 1, P(w | z = 1): HEART, LOVE, SOUL, TEARS, MYSTERY, JOY
- Topic 2, P(w | z = 2): SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, MATHEMATICS, MYSTERY
Words can occur in multiple topics (here, MYSTERY).

All probability to topic 1: P(z = 1) = 1, P(z = 2) = 0
Document: HEART, LOVE, JOY, SOUL, HEART, ...

All probability to topic 2: P(z = 1) = 0, P(z = 2) = 1
Document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, ...

Mixing topics 1 and 2: P(z = 1) = 0.5, P(z = 2) = 0.5
Document: LOVE, SCIENTIFIC, HEART, SOUL, KNOWLEDGE, RESEARCH, ...
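The generative story in the toy example (pick a topic from the document's mixture, then pick a word from that topic) can be simulated directly. This is a sketch under assumed word probabilities; the function name, the seed, and the numeric values inside each topic are hypothetical.

```python
import random


def generate_document(topics, topic_weights, length, seed=0):
    """Sample `length` words: first a topic from the mixture, then a word from it."""
    rng = random.Random(seed)
    words = []
    for _ in range(length):
        topic = rng.choices(topics, weights=topic_weights)[0]
        word = rng.choices(list(topic), weights=list(topic.values()))[0]
        words.append(word)
    return words


# Hypothetical word probabilities for the two topics of the toy example.
topic1 = {"HEART": 0.2, "LOVE": 0.2, "SOUL": 0.2, "TEARS": 0.2, "MYSTERY": 0.1, "JOY": 0.1}
topic2 = {"SCIENTIFIC": 0.2, "KNOWLEDGE": 0.2, "WORK": 0.2, "RESEARCH": 0.2,
          "MATHEMATICS": 0.1, "MYSTERY": 0.1}

doc_topic1 = generate_document([topic1, topic2], [1.0, 0.0], length=8)
doc_topic2 = generate_document([topic1, topic2], [0.0, 1.0], length=8)
doc_mixed = generate_document([topic1, topic2], [0.5, 0.5], length=8)
```

With all probability on one topic, every sampled word comes from that topic's vocabulary; with a 0.5/0.5 mixture, words from both topics can appear in the same document.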

Application to corpus data
TASA corpus: text from first grade to college
- representative sample of text
- 26,000+ word types (stop words removed)
- 37,000+ documents
- 6,000,000+ word tokens

A selection from 500 topics (one topic per line):
- THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED, OBSERVE, FACTS
- SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT, SATURN, MILES
- ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS, PORTRAIT, PAINTERS
- STUDENTS, TEACHER, STUDENT, TEACHERS, TEACHING, CLASS, CLASSROOM, SCHOOL, LEARNING, PUPILS, CONTENT, INSTRUCTION, TAUGHT, GROUP, GRADE, SHOULD, GRADES, CLASSES, PUPIL, GIVEN
- BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY, PAIN, IS
- CURRENT, ELECTRICITY, ELECTRIC, CIRCUIT, IS, ELECTRICAL, VOLTAGE, FLOW, BATTERY, WIRE, WIRES, SWITCH, CONNECTED, ELECTRONS, RESISTANCE, POWER, CONDUCTORS, CIRCUITS, TUBE, NEGATIVE

Polysemy: words with multiple meanings are represented in different topics. Four topics that contain FIELD (one topic per line):
- FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM, POLE, INDUCED
- SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST, STUDYING, SCIENCES
- BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS, BAT, TERRY
- JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY, EARN, ABLE

No Problem of Triangle Inequality
[Figure: one topic links SOCCER and FIELD; another topic links FIELD and MAGNETIC]
Topic structure easily explains violations of the triangle inequality.

How to get vector representations
- Multidimensional scaling on similarity ratings
- Tversky's (1977) contrast model
- Latent Semantic Analysis (Landauer & Dumais, 1997)
- Topics Model (e.g., Griffiths & Steyvers, 2002)