Integrating and Ranking Uncertain Scientific Data


1 Integrating and Ranking Uncertain Scientific Data
Wolfgang Gatterbauer (2). Based on joint work with Todd Detwiler (1), Abhay Jha (2), Brent Louie (1), Dan Suciu (2), and Peter Tarczy-Hornoch (1).
(1) Biomedical and Health Informatics, (2) Computer Science and Engineering, University of Washington. Jan 19

2 Motivation: Retrieving relevant information across several DBs
[Figure: a keyword query for ABCC8 is expanded across four databases (DB1 to DB4) into a list of results.]
Problem: multiple expansions across different databases can quickly lead to many less relevant results.
Question: how can we prune or rank those results?

3 Agenda: How to model uncertainty in data integration? How do we rank? How well, how fast, how robust on real data? A short database research point of view.

4 Four Probabilistic Metrics in UII
Granularity: schema vs. instance. E/R: entity vs. relationship.
Entity: $p_s$ (schema), $p_r$ (instance).
Relationship: $q_s$ (schema), $q_r$ (instance).

5 Example Belief Metrics
[Figure: a schema graph with node weights $p_s$ and edge weights $q_s$, and an instance graph over entities E0 to E6 with node weights $p_r$ and edge weights $q_r$.]
Final scores: $p = p_s \cdot p_r$, $q = q_s \cdot q_r$.
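The final-score rule on this slide (multiply the schema-level weight by the instance-level weight) can be sketched in a few lines of Python. The class name `Weights` and the numeric values are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Belief weights for one node or edge (hypothetical container)."""
    schema: float    # p_s for a node, q_s for an edge
    instance: float  # p_r for a node, q_r for an edge

    def final(self) -> float:
        # Final score: p = p_s * p_r (nodes) or q = q_s * q_r (edges).
        return self.schema * self.instance


# Made-up example weights.
node = Weights(schema=0.9, instance=0.8)
edge = Weights(schema=0.5, instance=0.6)
print(node.final(), edge.final())
```

Keeping the two factors separate until the end makes it easy to change a schema-level weight once and have it apply to every instance of that entity or relationship type.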

6 Translation of Uncertainties into Probabilistic Weights
We use domain experts to quantify and transform data uncertainties into the four types of probabilistic weights.
[Figure: example transformations.]

7 How can we assign some score (here the color)... Source: Todd

8 ... that allows ranking? Source: Todd

9 Agenda: How to model uncertainty in data integration? How do we rank? How well, how fast, how robust on real data? A short database research point of view.

10 Network Reliability Theory (source-target reachability)
Source-target reachability: the probability that a node is reachable from the start (query) node.
[Figure: a query node connected to Function #1 and Function #2.]
Function #2 = ... = 0.81
Function #1 = 1 - Prob(all paths failed) = 1 - (...)(...) = ...
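The two calculations on this slide follow the usual serial and parallel rules for independent edges. The sketch below assumes node weights of 1 and uses made-up edge probabilities (0.9 per edge); the function names are hypothetical.

```python
from math import prod


def path_probability(edge_probs):
    """Probability that every edge of a single path is present.

    Assumes the edges fail independently of each other.
    """
    return prod(edge_probs)


def any_path_probability(path_probs):
    """1 - Prob(all paths failed).

    Only valid when the paths share no edges, so that they fail independently.
    """
    return 1.0 - prod(1.0 - p for p in path_probs)


# One path of two edges, each present with probability 0.9 (illustrative).
single = path_probability([0.9, 0.9])

# Two edge-disjoint paths of that kind to the same answer node.
two_disjoint = any_path_probability([single, single])
print(single, two_disjoint)
```

As soon as two paths share an edge, the parallel rule no longer applies, which is what makes the general problem hard.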

11 Incorporating Uncertainty: Network Reliability Theory
Score = probability that an answer node is reachable from the start (query) node.
[Figure: a graph from s to t with node weights p and edge weights q.]
Problem: computing the score is #P-hard.

12 Why is reliability (= reachability) hard?
The following graph is nasty (= hard)! It can come in different forms.
[Figure: a Wheatstone bridge and chains of relationships with cardinalities 1:n, n:m, n:1, together with their reachability scores.]
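To see why the Wheatstone bridge is awkward, one can compute its reliability exactly by enumerating every possible world (every subset of present edges). This is a brute-force sketch, not the method used in the talk; the edge probabilities of 0.5 and all names are illustrative assumptions, and the graph is treated as undirected.

```python
from itertools import product


def reachable(present_edges, source, target):
    """Depth-first search over the edges that are present (undirected)."""
    adjacency = {}
    for u, v in present_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def exact_reliability(edge_probs, source, target):
    """Sum the probabilities of all worlds in which target is reachable.

    Enumerates 2^(number of edges) worlds, so it only works for tiny graphs.
    """
    edges = list(edge_probs)
    total = 0.0
    for world in product([False, True], repeat=len(edges)):
        weight = 1.0
        for edge, present in zip(edges, world):
            p = edge_probs[edge]
            weight *= p if present else 1.0 - p
        kept = [e for e, present in zip(edges, world) if present]
        if reachable(kept, source, target):
            total += weight
    return total


def bridge(p):
    """Wheatstone bridge: two s-t paths plus a cross edge between a and b."""
    return {("s", "a"): p, ("s", "b"): p, ("a", "b"): p,
            ("a", "t"): p, ("b", "t"): p}


score = exact_reliability(bridge(0.5), "s", "t")
print(score)
```

The cross edge between `a` and `b` means the graph cannot be collapsed by serial and parallel rules alone, and the enumeration doubles in cost with every additional edge.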

13 Closed solution is possible sometimes
Detail: gene ABCC8, upstream node

14 Techniques to perform probabilistic scoring
Naive Monte Carlo simulation
Improved Monte Carlo simulation
Analyze the necessary number of simulations
Graph reductions (parallel-serial reductions)
Closed solution for subgraphs
Propagation score
Deterministic counterparts
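The first and third techniques in this list can be sketched as follows: a naive Monte Carlo estimator that samples possible worlds, and a sample-size estimate. The sample-size function uses the standard Hoeffding bound, which is an assumption here; the talk does not say which analysis was used. The example graph, probabilities, and names are illustrative.

```python
import math
import random


def reachable(present_edges, source, target):
    """Depth-first search over the edges that are present (undirected)."""
    adjacency = {}
    for u, v in present_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def monte_carlo_reliability(edge_probs, source, target, samples, seed=0):
    """Estimate reachability as the fraction of sampled worlds that connect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Keep each edge independently with its own probability.
        world = [e for e, p in edge_probs.items() if rng.random() < p]
        if reachable(world, source, target):
            hits += 1
    return hits / samples


def samples_needed(epsilon, delta):
    """Hoeffding bound: samples so that |estimate - truth| <= epsilon
    with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Wheatstone bridge with every edge present with probability 0.5 (made up).
bridge = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "b"): 0.5,
          ("a", "t"): 0.5, ("b", "t"): 0.5}
estimate = monte_carlo_reliability(bridge, "s", "t", samples=20000, seed=1)
print(estimate, samples_needed(0.01, 0.05))
```

For ranking, estimates only need to be accurate enough to order the answers, which is one reason a bound on the number of simulations is useful.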

15 Ignoring correlations: the relevance propagation model
Ignoring correlations leads to a local point of view: one equation for the relevance r of each node $n_i$ and each arc $a_{i,j}$. Solve the resulting simple equation system (in closed form or iteratively).
Arc $a_{i,j}$: $r_{i,j} = r_i \cdot q_{i,j}$
Node $n_j$: $r_j = \left(1 - \prod_i (1 - r_{i,j})\right) \cdot p_j$
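The two propagation equations can be solved by simple fixed-point iteration. The sketch below is one possible reading, assuming the query node's relevance is fixed to its own weight p; the graph, weights, and function name are illustrative.

```python
def propagate(node_p, edge_q, source, iterations=100, tol=1e-12):
    """Iterate r_ij = r_i * q_ij and r_j = (1 - prod_i(1 - r_ij)) * p_j.

    node_p maps node -> p, edge_q maps (i, j) -> q for a directed arc i -> j.
    Assumption: the source relevance is held fixed at node_p[source].
    """
    r = {n: 0.0 for n in node_p}
    r[source] = node_p[source]
    incoming = {n: [] for n in node_p}
    for (i, j), q in edge_q.items():
        incoming[j].append((i, q))
    for _ in range(iterations):
        delta = 0.0
        for j in node_p:
            if j == source:
                continue
            fail = 1.0
            for i, q in incoming[j]:
                fail *= 1.0 - r[i] * q      # arc relevance r_ij = r_i * q_ij
            new = (1.0 - fail) * node_p[j]  # node relevance r_j
            delta = max(delta, abs(new - r[j]))
            r[j] = new
        if delta < tol:
            break
    return r


# Both paths to t share the uncertain arc s -> a (q = 0.5); other arcs are certain.
nodes = {"s": 1.0, "a": 1.0, "b": 1.0, "c": 1.0, "t": 1.0}
arcs = {("s", "a"): 0.5, ("a", "b"): 1.0, ("a", "c"): 1.0,
        ("b", "t"): 1.0, ("c", "t"): 1.0}
scores = propagate(nodes, arcs, "s")
print(scores["t"])
```

In this example the reliability of `t` is 0.5, because both paths depend on the same arc, while propagation treats the two incoming arcs of `t` as independent and yields 0.75. That gap is the cost of ignoring correlations.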

16 Example: reliability vs. propagation
[Figure: three example graphs from s to u with the relevance r of u under each semantics, labeled Reliability, Propagation, and Reliability = Propagation.]

17 Comparing reliability and propagation: complexity
Reliability: global measure; combinatorial problem; #P = hard; Monte Carlo estimates.
Propagation: local measure; continuous state space; P = not hard; iterative algorithm.

18 Agenda: How to model uncertainty in data integration? How do we rank? How well, how fast, how robust on real data? A short database research point of view.

19 Experiments: Functional gene annotation
3 questions:
1) How well do different approaches perform? [Average precision (AP)]
2) How fast is probabilistic query evaluation? [Focus on reliability]
3) Where do you get the probabilities from? How robust is our system to variations in the input probabilities? [Sensitivity analysis]
6 data sources: Pfam, TIGRFAM, NCBIBlast, EntrezProtein, EntrezGen, AmiGo
3 scenarios:
1) Well-known functions for well-studied proteins (306/20)
2) Less-known functions for well-studied proteins (7/3)
3) Unknown functions for less-studied proteins (11/11)

20 1. How well (1/3): Average Precision
Assume 4 out of 10 items are relevant.
Ranking method 1: relevant items at ranks 1, 2, 4, 7, with precision@k 1.00 (=1/1), 1.00 (=2/2), 0.75 (=3/4), 0.57 (=4/7).
Ranking method 2: relevant items at ranks 1, 3, 4, 8, with precision@k 1.00 (=1/1), 0.67 (=2/3), 0.75 (=3/4), 0.50 (=4/8).
Random: AP averaged over all $\binom{10}{4}$ permutations.
AP serves as a measure for the quality of the ranking semantics with regard to ground truth.
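Average precision is the mean of precision@k taken at the ranks of the relevant items. The sketch below recomputes the two rankings on this slide and the random baseline by averaging over every placement of 4 relevant items among 10 ranks; the function names are hypothetical.

```python
from itertools import combinations


def average_precision(relevant_ranks):
    """Mean of precision@k over the (1-based) ranks of the relevant items."""
    ranks = sorted(relevant_ranks)
    precisions = [hits / rank for hits, rank in enumerate(ranks, start=1)]
    return sum(precisions) / len(ranks)


def random_average_precision(n_items, n_relevant):
    """AP averaged over every possible placement of the relevant items."""
    placements = list(combinations(range(1, n_items + 1), n_relevant))
    return sum(average_precision(p) for p in placements) / len(placements)


ap_method_1 = average_precision([1, 2, 4, 7])
ap_method_2 = average_precision([1, 3, 4, 8])
ap_random = random_average_precision(10, 4)
print(round(ap_method_1, 2), round(ap_method_2, 2), round(ap_random, 2))
```

Method 1 comes out at about 0.83 and method 2 at about 0.73, so method 1 ranks the relevant items better; both are well above the random baseline.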

21 1. How well (2/3): Scoring functions
Each scoring function is illustrated on an example graph from s to t.
Reliability and Propagation: probabilistic scores on the example graph.
InEdge: 2 incoming edges.
PathCount: 3 paths (1 shown).
Random: no score; AP averaged over all ranking permutations.

22 1. How well (3/3): AP across 3 scenarios
Scenario 1: 306 well-known functions, 20 well-studied proteins
Scenario 2: 7 less-known functions, 3 well-studied proteins
Scenario 3: 11 unknown functions, 11 less-studied proteins
Observation 1: Probabilistic methods perform better for predicting less-known or previously unknown functions!

23 2. How fast: Several techniques for speeding up reliability
Techniques (not discussed in detail): naive Monte Carlo (N), efficient Monte Carlo (M), ... instead of ... simulations (e4, e5), graph reductions (R), closed solution (C).
Observation 2: Several techniques allowed us to evaluate the reliability semantics in ~20 ms (propagation ~5 ms, InEdge and PathCount ~1 ms).

24 3. How robust: sensitivity analysis
Our approach depends on transforming uncertainty into probabilistic weights. How robust is the performance to systematic variations in these input parameters?
Idea: multi-way sensitivity analysis
$p' = \mathrm{Lo}^{-1}\left(\mathrm{Lo}(p) + \varepsilon\right)$, with $\varepsilon \sim N(0, \sigma^2)$ and $\mathrm{Lo}(p) = \log\left(\frac{p}{1-p}\right)$
Observation 3: Small random perturbations to the initial parameters do not negatively affect the quality of rankings. The approach is robust!
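The perturbation on this slide adds Gaussian noise in log-odds space and maps the result back to a probability, which keeps every perturbed weight strictly between 0 and 1. A minimal sketch, with hypothetical function names and made-up input weights:

```python
import math
import random


def log_odds(p):
    """Lo(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_log_odds(x):
    """Inverse of Lo: the logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def perturb(p, sigma, rng):
    """Return Lo^-1(Lo(p) + eps) with eps drawn from N(0, sigma^2)."""
    return inverse_log_odds(log_odds(p) + rng.gauss(0.0, sigma))


rng = random.Random(42)
weights = [0.9, 0.5, 0.1]  # illustrative input probabilities
perturbed = [perturb(p, sigma=0.5, rng=rng) for p in weights]
print(perturbed)
```

A sensitivity analysis would repeat this for all input weights at several values of sigma, re-rank the results each time, and compare the resulting average precision with the unperturbed ranking.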

25 Take-away from experiments on real data
[Figure: uncertainty of information (unknown, less-known, well-known information) plotted against information integration approach (deterministic, probabilistic).]
Explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown protein functions. This suggests that uncertainty models offer utility for knowledge discovery.
Small perturbations in the input probabilities (parameters) tend to produce only minor changes in the quality of our result rankings. This suggests that probabilistic methods are robust against variations in the way uncertainties are transformed into probabilities.
Several techniques allow us to evaluate probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.

26 Agenda: How to model uncertainty in data integration? How do we rank? How well, how fast, how robust on real data? A short database research point of view.

27 Short database background (1/2)
Schema: ATTEND(student, class), TEACH(class, prof), DEP(prof, department)
SQL query:
select A.student, D.department
from ATTEND A, TEACH T, DEP D
where A.class = T.class
and T.prof = D.prof

28 Short database background (2/2)
Schema: R(A, B), S(B, C), T(C, D)
SQL query:
select R.A, T.D
from R, S, T
where R.B = S.B
and S.C = T.C
Datalog: q(x,u) :- R(x,y), S(y,z), T(z,u)
Conjunctive queries: very efficient!
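The chain query on this slide can be run directly with Python's built-in `sqlite3` module. The table contents below are a small illustrative instance, and `DISTINCT` is added as an assumption so that the SQL result matches the set semantics of the Datalog rule.

```python
import sqlite3

# In-memory database with the three relations R(A,B), S(B,C), T(C,D).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A TEXT, B TEXT);
    CREATE TABLE S (B TEXT, C TEXT);
    CREATE TABLE T (C TEXT, D TEXT);
    INSERT INTO R VALUES ('a', 'y1'), ('a', 'y2');
    INSERT INTO S VALUES ('y1', 'z1'), ('y1', 'z2'), ('y2', 'z2');
    INSERT INTO T VALUES ('z1', 'd'), ('z2', 'd');
""")

# q(x,u) :- R(x,y), S(y,z), T(z,u)
rows = conn.execute("""
    SELECT DISTINCT R.A, T.D
    FROM R, S, T
    WHERE R.B = S.B AND S.C = T.C
""").fetchall()
print(rows)
conn.close()
```

Three different join paths lead from `a` to `d` in this instance, but the query returns the answer tuple once. On a deterministic database that is all there is to compute; the difficulty starts when the tuples carry probabilities.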

29 Probabilistic databases (1/3)
q(x,u) :- R(x,y), S(y,z), T(z,u)
R(A, B): (a, y1), (a, y2)
S(B, C): (y1, z1), (y1, z2), (y2, z2)
T(C, D): (z1, d), (z2, d)
Which tuples? q(a,d)

30 Probabilistic databases (2/3)
q(x,u) :- $R^p$(x,y), S(y,z), $T^p$(z,u)
Which tuples, and how likely?
$R^p$(A, B): (a, y1) with $p_1$, (a, y2) with $p_2$
S(B, C): (y1, z1), (y1, z2), (y2, z2)
$T^p$(C, D): (z1, d) with $p_3$, (z2, d) with $p_4$
[Figure: graph from a through y1, y2 and z1, z2 to d.] Nasty graph! Not efficient! Can propagation help?
$P[q(a,d)] = P(p_1 p_3 \lor p_1 p_4 \lor p_2 p_4)$ = reachability from a to d
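The probability of q(a,d) is the probability that at least one of the three derivations (tuples 1 and 3, tuples 1 and 4, tuples 2 and 4) is present. Because the derivations share tuples, the products cannot simply be added or combined with the parallel rule. The sketch below computes the value by enumerating all possible worlds of the four uncertain tuples, assuming they are independent; the numeric probabilities are made up.

```python
from itertools import product


def lineage(x1, x2, x3, x4):
    """True if at least one derivation of q(a,d) exists in this world."""
    return (x1 and x3) or (x1 and x4) or (x2 and x4)


def query_probability(p1, p2, p3, p4):
    """Sum the probabilities of all worlds that satisfy the lineage.

    Assumes the four uncertain tuples are independent.
    """
    probs = (p1, p2, p3, p4)
    total = 0.0
    for world in product([False, True], repeat=4):
        if lineage(*world):
            weight = 1.0
            for present, p in zip(world, probs):
                weight *= p if present else 1.0 - p
            total += weight
    return total


exact = query_probability(0.9, 0.8, 0.7, 0.6)
naive_sum = 0.9 * 0.7 + 0.9 * 0.6 + 0.8 * 0.6  # overcounts shared tuples
print(exact, naive_sum)
```

The naive sum exceeds 1 for these inputs, which shows that the shared tuples have to be accounted for. The enumeration does so, but its cost doubles with each additional uncertain tuple.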

31 Probabilistic databases (3/3)
q(x,u) :- $R^p$(x,y), S(y,z), $T^p$(y,u)
$R^p$(A, B): (a, y1) with $p_1$, (a, y2) with $p_2$
S(B, C): (y1, z1), (y1, z2), (y2, z2)
$T^p$(C, D): (y1, d) with $p_3$, (y2, d) with $p_4$
q(x,y) :- $R^p$(x,y), $R^p$(x,z), $T^p$(z,u)
Non-linear chain queries / self joins. How to define a propagation semantics?

32 Which ranking semantics is appropriate for real data?
Input (probabilistic) data → ? → Output ranked results
Example input $R^p$(A, B): (a, a) $p_1$; (a, e) $p_2$; (b, c) $p_3$; (d, c) $p_4$; (d, a) $p_5$; (e, a) $p_6$; (e, c) $p_7$; (e, d) $p_8$
Possible world semantics ~ reliability: 1. Hard in general. Sensitivity of ranking with respect to accuracy of input probabilities. 4. Hidden dependencies in the input data in the first place.
Alternative ranking semantics ~ propagation: decrease in ranking quality due to approximation.
Can we get good ranking results for arbitrary queries on real data, or at least a good trade-off between speed and ranking accuracy?

33 Further information
PAPER: L. Detwiler, W. Gatterbauer, B. Louie, D. Suciu and P. Tarczy-Hornoch. Integrating and Ranking Uncertain Scientific Data. In Proceedings of the 25th International Conference on Data Engineering,
PROJECT WEB PAGE: http://
DATABASE RESEARCH GROUP: http://db.cs.washington.edu/
CONTACT: Wolfgang Gatterbauer:
THANKS!


Elici%ng Informa%on from the Crowd a Part of the EC 13 Tutorial on Social Compu%ng and User- Generated Content

Elici%ng Informa%on from the Crowd a Part of the EC 13 Tutorial on Social Compu%ng and User- Generated Content Elici%ng Informa%on from the Crowd a Part of the EC 13 Tutorial on Social Compu%ng and User- Generated Content Yiling Chen Harvard University June 16, 2013 Roadmap Elici%ng informa%on for events with verifiable

More information

CSE 473: Ar+ficial Intelligence. Example. Par+cle Filters for HMMs. An HMM is defined by: Ini+al distribu+on: Transi+ons: Emissions:

CSE 473: Ar+ficial Intelligence. Example. Par+cle Filters for HMMs. An HMM is defined by: Ini+al distribu+on: Transi+ons: Emissions: CSE 473: Ar+ficial Intelligence Par+cle Filters for HMMs Daniel S. Weld - - - University of Washington [Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All

More information

CS 188: Artificial Intelligence. Bayes Nets

CS 188: Artificial Intelligence. Bayes Nets CS 188: Artificial Intelligence Probabilistic Inference: Enumeration, Variable Elimination, Sampling Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew

More information

Pedro Alexandrino Fernandes, Dep. Chemistry & Biochemistry, University of Porto, Portugal

Pedro Alexandrino Fernandes, Dep. Chemistry & Biochemistry, University of Porto, Portugal Pedro Alexandrino Fernandes, Dep. Chemistry & Biochemistry, University of Porto, Portugal pedro.fernandes@fc.up.pt 1. Introduc3on Intermolecular Associa3ons 1. Introduc3on What type of forces govern these

More information

Database design and implementation CMPSCI 645

Database design and implementation CMPSCI 645 Database design and implementation CMPSCI 645 Lectures 20: Probabilistic Databases *based on a tutorial by Dan Suciu Have we seen uncertainty in DB yet? } NULL values Age Height Weight 20 NULL 200 NULL

More information

A Model for Quan.fying Informa.on Leakage. Steven Whang, Hector Garcia Molina Stanford University

A Model for Quan.fying Informa.on Leakage. Steven Whang, Hector Garcia Molina Stanford University A Model for Quan.fying Informa.on Leakage Steven Whang, Hector Garcia Molina Stanford University Mo.va.on Insurers Test Data Profiles to Iden7fy Risky Clients Steven E. Whang 2 Mo.va.on How Apple and Amazon

More information

Data Prepara)on. Dino Pedreschi. Anna Monreale. Università di Pisa

Data Prepara)on. Dino Pedreschi. Anna Monreale. Università di Pisa Data Prepara)on Anna Monreale Dino Pedreschi Università di Pisa KDD Process Interpretation and Evaluation Data Consolidation Selection and Preprocessing Warehouse Data Mining Prepared Data p(x)=0.02 Patterns

More information

Image Processing 1 (IP1) Bildverarbeitung 1

Image Processing 1 (IP1) Bildverarbeitung 1 MIN-Fakultät Fachbereich Informatik Arbeitsbereich SAV/BV (KOGS) Image Processing 1 (IP1) Bildverarbeitung 1 Lecture 18 Mo

More information

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return

More information

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

LINK ANALYSIS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS LINK ANALYSIS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Retrieval evaluation Link analysis Models

More information

Molecular Replacement. Airlie McCoy

Molecular Replacement. Airlie McCoy Molecular Replacement Airlie McCoy Molecular Replacement Find orienta5on and posi5on where model overlies the target structure Borrow the phases Then it becomes a refinement problem the phases change known

More information

Computer Vision. Pa0ern Recogni4on Concepts. Luis F. Teixeira MAP- i 2014/15

Computer Vision. Pa0ern Recogni4on Concepts. Luis F. Teixeira MAP- i 2014/15 Computer Vision Pa0ern Recogni4on Concepts Luis F. Teixeira MAP- i 2014/15 Outline General pa0ern recogni4on concepts Classifica4on Classifiers Decision Trees Instance- Based Learning Bayesian Learning

More information

Ensemble 4DVAR and observa3on impact study with the GSIbased hybrid ensemble varia3onal data assimila3on system. for the GFS

Ensemble 4DVAR and observa3on impact study with the GSIbased hybrid ensemble varia3onal data assimila3on system. for the GFS Ensemble 4DVAR and observa3on impact study with the GSIbased hybrid ensemble varia3onal data assimila3on system for the GFS Xuguang Wang University of Oklahoma, Norman, OK xuguang.wang@ou.edu Ting Lei,

More information

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning

More information

REGRESSION AND CORRELATION ANALYSIS

REGRESSION AND CORRELATION ANALYSIS Problem 1 Problem 2 A group of 625 students has a mean age of 15.8 years with a standard devia>on of 0.6 years. The ages are normally distributed. How many students are younger than 16.2 years? REGRESSION

More information

THE VINE COPULA METHOD FOR REPRESENTING HIGH DIMENSIONAL DEPENDENT DISTRIBUTIONS: APPLICATION TO CONTINUOUS BELIEF NETS

THE VINE COPULA METHOD FOR REPRESENTING HIGH DIMENSIONAL DEPENDENT DISTRIBUTIONS: APPLICATION TO CONTINUOUS BELIEF NETS Proceedings of the 00 Winter Simulation Conference E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds. THE VINE COPULA METHOD FOR REPRESENTING HIGH DIMENSIONAL DEPENDENT DISTRIBUTIONS: APPLICATION

More information

Linking Traits to Ecosystem Processes. Moira Hough

Linking Traits to Ecosystem Processes. Moira Hough Linking Traits to Ecosystem Processes Moira Hough How do organisms impact ecosystems? Long history of study of ecological effects of biodiversity and species composi@on comes out of the diversity stability

More information

Data Envelopment Analysis (DEA) with an applica6on to the assessment of Academics research performance

Data Envelopment Analysis (DEA) with an applica6on to the assessment of Academics research performance Data Envelopment Analysis (DEA) with an applica6on to the assessment of Academics research performance Outline DEA principles Assessing the research ac2vity in an ICT School via DEA Selec2on of Inputs

More information

Multi-join Query Evaluation on Big Data Lecture 2

Multi-join Query Evaluation on Big Data Lecture 2 Multi-join Query Evaluation on Big Data Lecture 2 Dan Suciu March, 2015 Dan Suciu Multi-Joins Lecture 2 March, 2015 1 / 34 Multi-join Query Evaluation Outline Part 1 Optimal Sequential Algorithms. Thursday

More information

Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models

Statistical Models for sequencing data: from Experimental Design to Generalized Linear Models Best practices in the analysis of RNA-Seq and CHiP-Seq data 4 th -5 th May 2017 University of Cambridge, Cambridge, UK Statistical Models for sequencing data: from Experimental Design to Generalized Linear

More information

Graphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum

Graphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum Graphical Models Lecture 3: Local Condi6onal Probability Distribu6ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Condi6onal Probability Distribu6ons

More information

Graphical Models. Lecture 10: Variable Elimina:on, con:nued. Andrew McCallum

Graphical Models. Lecture 10: Variable Elimina:on, con:nued. Andrew McCallum Graphical Models Lecture 10: Variable Elimina:on, con:nued Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Last Time Probabilis:c inference is

More information

Deriva'on of The Kalman Filter. Fred DePiero CalPoly State University EE 525 Stochas'c Processes

Deriva'on of The Kalman Filter. Fred DePiero CalPoly State University EE 525 Stochas'c Processes Deriva'on of The Kalman Filter Fred DePiero CalPoly State University EE 525 Stochas'c Processes KF Uses State Predic'ons KF es'mates the state of a system Example Measure: posi'on State: [ posi'on velocity

More information

Machine learning for Dynamic Social Network Analysis

Machine learning for Dynamic Social Network Analysis Machine learning for Dynamic Social Network Analysis Manuel Gomez Rodriguez Max Planck Ins7tute for So;ware Systems UC3M, MAY 2017 Interconnected World SOCIAL NETWORKS TRANSPORTATION NETWORKS WORLD WIDE

More information

PSPACE, NPSPACE, L, NL, Savitch's Theorem. More new problems that are representa=ve of space bounded complexity classes

PSPACE, NPSPACE, L, NL, Savitch's Theorem. More new problems that are representa=ve of space bounded complexity classes PSPACE, NPSPACE, L, NL, Savitch's Theorem More new problems that are representa=ve of space bounded complexity classes Outline for today How we'll count space usage Space bounded complexity classes New

More information

Predicate abstrac,on and interpola,on. Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on.

Predicate abstrac,on and interpola,on. Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on. Predicate abstrac,on and interpola,on Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on. Outline. Predicate abstrac,on the idea in pictures 2. Counter- example guided

More information

Computational Issues in BSM Theories -- Past, Present and Future

Computational Issues in BSM Theories -- Past, Present and Future Computational Issues in BSM Theories -- Past, Present and Future Meifeng Lin Computa0onal Science Center Brookhaven Na0onal Laboratory Field Theore0c Computer Simula0ons for Par0cle Physics And Condensed

More information

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

More information

Some thoughts on linearity, nonlinearity, and partial separability

Some thoughts on linearity, nonlinearity, and partial separability Some thoughts on linearity, nonlinearity, and partial separability Paul Hovland Argonne Na0onal Laboratory Joint work with Boyana Norris, Sri Hari Krishna Narayanan, Jean Utke, Drew Wicke Argonne Na0onal

More information