Par$$oned Elias- Fano indexes. Giuseppe O)aviano Rossano Venturini
|
|
- Brandon Conley
- 6 years ago
- Views:
Transcription
1 Par$$oned Elias- Fano indexes Giuseppe O)aviano Rossano Venturini
2 Inverted indexes Core data structure of Informa$on Retrieval Documents are sequences of terms 1: [it is what it is not] 2: [what is a] 3: [it is a banana] 1, 2, 3 are document iden$fiers (docids) We want to find the documents that match a query, that is a set of terms
3 Inverted indexes 1: [it is what it is not] 2: [what is a] 3: [it is a banana] Query: [what] Results: 1, 2 Query: [it AND is] Results: 1, 3 Query: [banana OR not] Results: 1, 3 Query: [what AND is] Results: 1, 2 Query: [what AND not] Results: 1
4 Inverted index Pos$ng list: list of docids containing a term Inverted index: pos$ng lists for each term 1: [it is what it is not] 2: [what is a] 3: [it is a banana] a 2, 3 banana 3 is 3, 1, 2 it 1, 3 not 1 what 2, 1
5 Inverted indexes 1: [it is what it is not] 2: [what is a] 3: [it is a banana] a 2, 3 banana 3 is 3, 1, 2 it 1, 3 not 1 what 2, 1 Query: it AND is a 2, 3 banana 3 is 1, 2, 3 it 1, 3 not 1 what 1, 2
6 Docid- sorted inverted indexes Efficient intersec$on One- pass scan of the pos$ng lists Efficient compression Reduces to storage of monotonically increasing lists of integers
7 Index compression Inverted indexes can be very large Example: ClueWeb09 Track B Collec$on 50M documents 15B pos$ngs No compression (32 bits per integer): 60GB Spoiler: with compression, can go as low as 11GB
8 Sequences in pos$ng lists Generally, a pos$ng lists is composed of Sequence of docids: strictly monotonic Sequence of frequencies/scores: strictly posi$ve Sequence of posi$ons: concatena$on of strictly monotonic lists We focus on docids and frequencies
9 Elias Fano! Monotonic Sequences a i + i Docids Strictly monotonic Interpola$ve coding i a i i=0 a i a i - 1 Non- nega$ve Gap coding a i - 1 Scores Strictly non- nega$ve (Posi$ve)
10 Gap compression Replace list of integers with gaps between adjacent integers Since list is increasing, all gaps are posi$ve a 2, 3 banana 3 is 1, 2, 3 it 1, 3 not 1 what 1, 2 a 2, 1 banana 3 is 1, 1, 1 it 1, 2 not 1 what 1, 1
11 Elias- Fano encoding Data structure from the 70s, mostly in the succinct data structures niche Natural encoding of monotonically increasing sequences of integers Recently successfully applied to the representa$on of inverted indexes Used by Facebook Graph Search!
12 Elias- Fano representa$on Example: 2, 3, 5, 7, 11, 13, 19 w l upper bits l lower bits 0, 0, 1, 1, 2, 3, 4 Extract the upper bits 0, 0, 1, 0, 1, 1, 1 Compute the gaps Encode gaps with unary code Elias- Fano representa$on of the sequence
13 Elias- Fano representa$on Example: 2, 3, 5, 7, 11, 13, 19 w l upper bits l lower bits Elias- Fano representa$on of the sequence Maximum upper value: [U / 2 l ] where U is largest sequence value Sequence length is n Example: [19 / 2 2 ] = 4 = 100 In upper bits there are [U / 2 l ] zeros and n ones Space [U / 2 l ] + n + nl bits
14 Elias- Fano representa$on Example: 2, 3, 5, 7, 11, 13, 19 w l upper bits l lower bits Can show that l = [log(u/n)] is op$mal Space [U / 2 l ] + n + nl bits (2 + log(u/n))n bits
15 Elias- Fano representa$on With addi$onal (small) data structures can support fast search and random access Almost- O(1)- $me skips for intersec$on queries (2 + log(u/n))n- bits space independent of values distribu1on! is this a good thing?
16 Term- document matrix Alterna$ve interpreta$on of inverted index a 2, 3 banana 3 is 1, 2, 3 it 1, 3 not 1 what 1, a X X banana X is X X X it X X not X what X X Gaps are distances between the Xs
17 Gaps are usually small Assume that documents from the same domain have similar docids unipi.it/a unipi.it/b unipi.it/c unipi.it/d livorno.it livorno.it/nemici pisa X X X X X Cluster of docids Pos$ng lists contain long runs of very close integers That is, long runs of very small gaps
18 Gap compression Gaps are mostly small integers Small integers can be compressed efficiently with universal codes Example: unary code 1: 1 2: 01 3: 001 4: 0001 Example: 2, 3, 5, 9 - > 2, 1, 2, 4 - >
19 Gap compression Gap compression historically the most common representa$on of inverted indexes Drawbacks Decoding is sequen$al, can be slow Bemer compression - > even slower decoding
20 Elias- Fano and clustering Consider the following two lists 1, 2, 3, 4,, n 1, U n random values between 1 and U Both have n elements and last value U Elias- Fano compresses both to the exact same number of bits! But first list is fare more compressible: it is sufficient to store n and U Elias- Fano doesn t exploit clustering
21 Par11oned Elias- Fano XXX XXXXX XXXXXXXXX X X X X XXXX X XXX Par$$on the sequence into chunks Add pointers to the beginning of each chunk Represent each chunk and the sequence of pointers with Elias- Fano If the chunks approximate the clusters, compression improves
22 Par$$on op$miza$on We want to find, among all the possible par$$ons, the one that takes up less space Exhaus$ve search is exponen1al Dynamic programming can be done quadra1c Our solu$on: (1 + ε)- approximate solu$on in linear $me O(n log(1/ε)/log(1 + ε)) Reduce to a shortest path in a sparsified DAG
23 Par$$on op$miza$on Nodes correspond to sequence elements Edges to poten$al chunks Paths = Sequence par$$ons
24 Par$$on op$miza$on Each edge weight is the cost of the chunk defined by the edge endpoints Shortest path = Minimum cost par$$on Edge costs can be computed in O(1) but number of edges is quadra$c!
25 Idea n i
26 Idea n.1 General DAG sparsifica1on technique Quan$ze edge costs in classes of cost between (1 + ε 1 ) i and (1 + ε 1 ) i + 1 For each node and each cost class, keep only one maximal edge O(log n / log (1 + ε 1 )) edges per node! Shortest path in sparsified DAG at most (1 + ε 1 ) $mes more expensive than in original DAG Sparsified DAG can be computed on the fly
27 Idea n.2 XXXXXXXXX X X X X If we split a chunk at an arbitrary posi$on New cost Old cost + cost of new pointer If chunk is big enough, loss is negligible We keep only edges with cost O(1 / ε 2 ) New DAG has O(n log (1 / ε 2 ) / log (1 + ε 1 )) edges! Approxima$on factor becomes (1 + ε 2 ) (1 + ε 1 )
28 Dependency on ε 1 No$ce the scale! Bits per posbng Time in seconds
29 Dependency on ε 2 Here we go from O(n log n) to O(n) Bits per posbng Time in seconds
30 Results on GOV2 and ClueWeb09 GOV2 ClueWeb09 Elias- Fano 5.6GB 14GB Par$$oned Elias- Fano 3.0GB 11GB
31 Results on GOV2 and ClueWeb09 Gov2 ClueWeb09 space doc freq space doc freq GB bpi bpi GB bpi bpi EF single 7.66 (+64.7%) 7.53 (+83.4%) 3.14 (+32.4%) (+23.1%) 7.46 (+27.7%) 2.44 (+11.0%) EF uniform 5.17 (+11.2%) 4.63 (+12.9%) 2.58 (+8.4%) (+11.5%) 6.58 (+12.6%) 2.39 (+8.8%) EF -optimal Interpolative 4.57 ( 1.8%) 4.03 ( 1.8%) 2.33 ( 1.8%) ( 8.3%) 5.33 ( 8.8%) 2.04 ( 7.1%) OptPFD 5.22 (+12.3%) 4.72 (+15.1%) 2.55 (+7.4%) (+11.6%) 6.42 (+9.8%) 2.56 (+16.4%) Varint-G8IU (+202.2%) (+158.2%) 8.98 (+278.3%) (+148.3%) (+88.1%) 8.98 (+308.8%) Gov2 ClueWeb09 TREC 05 TREC 06 TREC 05 TREC 06 Gov2 ClueWeb09 TREC 05 TREC 06 TREC 05 TREC 06 EF single 2.1 (+10%) 4.7 (+1%) 13.6 ( 5%) 15.8 ( 9%) EF uniform 2.1 (+9%) 5.1 (+10%) 15.5 (+8%) 18.9 (+9%) EF -optimal Interpolative 7.5 (+291%) 20.4 (+343%) 55.7 (+289%) 76.5 (+341%) OptPFD 2.2 (+14%) 5.7 (+24%) 16.6 (+16%) 21.9 (+26%) Varint-G8IU 1.5 ( 20%) 4.0 ( 13%) 11.1 ( 23%) 14.8 ( 15%) OR queries EF single 80.7 (+8%) (+10%) (+0%) ( 2%) EF uniform 72.1 ( 3%) ( 3%) ( 3%) ( 4%) EF -optimal Interpolative (+62%) (+62%) (+53%) (+51%) OptPFD 69.5 ( 7%) ( 7%) ( 10%) ( 12%) Varint-G8IU 67.4 ( 10%) ( 10%) ( 15%) ( 17%) AND queries
32 Thanks for your amen$on! Ques$ons?
Par$$oned Elias- Fano Indexes
Par$$oned Elias- Fano Indexes Giuseppe O)aviano ISTI- CNR, Pisa Rossano Venturini Università di Pisa Inverted indexes Docid Document 1: [it is what it is not] 2: [what is a] 3: [it is a banana] a 2, 3
More informationNearest Neighbor Search with Keywords: Compression
Nearest Neighbor Search with Keywords: Compression Yufei Tao KAIST June 3, 2013 In this lecture, we will continue our discussion on: Problem (Nearest Neighbor Search with Keywords) Let P be a set of points
More informationInteres'ng- Phrase Mining for Ad- Hoc Text Analy'cs
Interes'ng- Phrase Mining for Ad- Hoc Text Analy'cs Klaus Berberich (kberberi@mpi- inf.mpg.de) Joint work with: Srikanta Bedathur, Jens DiIrich, Nikos Mamoulis, and Gerhard Weikum (originally presented
More informationSKA machine learning perspec1ves for imaging, processing and analysis
1 SKA machine learning perspec1ves for imaging, processing and analysis Slava Voloshynovskiy Stochas1c Informa1on Processing Group University of Geneva Switzerland with contribu,on of: D. Kostadinov, S.
More informationSpace and Time-Efficient Data Structures for Massive Datasets
Space and Time-Efficient Data Structures for Massive Datasets Giulio Ermanno Pibiri giulio.pibiri@di.unipi.it Supervisor Rossano Venturini Computer Science Department University of Pisa 17/10/2016 1 Evidence
More informationQuerying. 1 o Semestre 2008/2009
Querying Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2008/2009 Outline 1 2 3 4 5 Outline 1 2 3 4 5 function sim(d j, q) = 1 W d W q W d is the document norm W q is the
More informationCSE P 501 Compilers. Value Numbering & Op;miza;ons Hal Perkins Winter UW CSE P 501 Winter 2016 S-1
CSE P 501 Compilers Value Numbering & Op;miza;ons Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 S-1 Agenda Op;miza;on (Review) Goals Scope: local, superlocal, regional, global (intraprocedural), interprocedural
More informationAlgorithms for NLP. Classifica(on III. Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classifica(on III Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley The Perceptron, Again Start with zero weights Visit training instances one by one Try to classify If correct,
More informationStreaming - 2. Bloom Filters, Distinct Item counting, Computing moments. credits:www.mmds.org.
Streaming - 2 Bloom Filters, Distinct Item counting, Computing moments credits:www.mmds.org http://www.mmds.org Outline More algorithms for streams: 2 Outline More algorithms for streams: (1) Filtering
More informationNov 12, 2014 LB273 Prof. Vash6 Sawtelle. Today s Topics: SHM. h<ps:// v=6sas9n3omum&feature=youtu.be&t=16s
Nov 12, 2014 LB273 Prof. Vash6 Sawtelle Today s Topics: SHM h
More informationGraphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum
Graphical Models Lecture 3: Local Condi6onal Probability Distribu6ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Condi6onal Probability Distribu6ons
More informationProbability and Structure in Natural Language Processing
Probability and Structure in Natural Language Processing Noah Smith, Carnegie Mellon University 2012 Interna@onal Summer School in Language and Speech Technologies Quick Recap Yesterday: Bayesian networks
More informationGraphical Models. Lecture 10: Variable Elimina:on, con:nued. Andrew McCallum
Graphical Models Lecture 10: Variable Elimina:on, con:nued Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Last Time Probabilis:c inference is
More informationMachine learning for Dynamic Social Network Analysis
Machine learning for Dynamic Social Network Analysis Manuel Gomez Rodriguez Max Planck Ins7tute for So;ware Systems UC3M, MAY 2017 Interconnected World SOCIAL NETWORKS TRANSPORTATION NETWORKS WORLD WIDE
More informationNetworks. Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource
Networks Can (John) Bruce Keck Founda7on Biotechnology Lab Bioinforma7cs Resource Networks in biology Protein-Protein Interaction Network of Yeast Transcriptional regulatory network of E.coli Experimental
More informationCS 6140: Machine Learning Spring What We Learned Last Week 2/26/16
Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign
More informationGradient Descent for High Dimensional Systems
Gradient Descent for High Dimensional Systems Lab versus Lab 2 D Geometry Op>miza>on Poten>al Energy Methods: Implemented Equa3ons for op3mizer 3 2 4 Bond length High Dimensional Op>miza>on Applica3ons:
More informationAlphabet Friendly FM Index
Alphabet Friendly FM Index Author: Rodrigo González Santiago, November 8 th, 2005 Departamento de Ciencias de la Computación Universidad de Chile Outline Motivations Basics Burrows Wheeler Transform FM
More informationExponen'al growth and differen'al equa'ons
Exponen'al growth and differen'al equa'ons But first.. Thanks for the feedback! Feedback about M102 Which of the following do you find useful? 70 60 50 40 30 20 10 0 How many resources students typically
More informationSummary of Last Lectures
Lossless Coding IV a k p k b k a 0.16 111 b 0.04 0001 c 0.04 0000 d 0.16 110 e 0.23 01 f 0.07 1001 g 0.06 1000 h 0.09 001 i 0.15 101 100 root 1 60 1 0 0 1 40 0 32 28 23 e 17 1 0 1 0 1 0 16 a 16 d 15 i
More informationShannon-Fano-Elias coding
Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The
More informationElectricity & Magnetism Lecture 3: Electric Flux and Field Lines
Electricity & Magnetism Lecture 3: Electric Flux and Field Lines Today s Concepts: A) Electric Flux B) Field Lines Gauss Law Electricity & Magne@sm Lecture 3, Slide 1 Your Comments What the heck is epsilon
More informationLecture 18 April 26, 2012
6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and
More informationAlgorithms: COMP3121/3821/9101/9801
NEW SOUTH WALES Algorithms: COMP3121/3821/9101/9801 Aleks Ignjatović School of Computer Science and Engineering University of New South Wales TOPIC 4: THE GREEDY METHOD COMP3121/3821/9101/9801 1 / 23 The
More informationRegister Alloca.on. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class
Register Alloca.on CMPT 379: Compilers Instructor: Anoop Sarkar anoopsarkar.github.io/compilers-class 1 Register Alloca.on Intermediate code uses unlimited temporaries Simplifying code genera.on and op.miza.on
More informationAlgorithms for NLP. Language Modeling III. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Language Modeling III Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Office hours on website but no OH for Taylor until next week. Efficient Hashing Closed address
More informationCompression. What. Why. Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color
Compression What Reduce the amount of information (bits) needed to represent image Video: 720 x 480 res, 30 fps, color Why 720x480x20x3 = 31,104,000 bytes/sec 30x60x120 = 216 Gigabytes for a 2 hour movie
More informationClassical Mechanics Lecture 21
Classical Mechanics Lecture 21 Today s Concept: Simple Harmonic Mo7on: Mass on a Spring Mechanics Lecture 21, Slide 1 The Mechanical Universe, Episode 20: Harmonic Motion http://www.learner.org/vod/login.html?pid=565
More informationR ij = 2. Using all of these facts together, you can solve problem number 9.
Help for Homework Problem #9 Let G(V,E) be any undirected graph We want to calculate the travel time across the graph. Think of each edge as one resistor of 1 Ohm. Say we have two nodes: i and j Let the
More informationWavelets & Mul,resolu,on Analysis
Wavelets & Mul,resolu,on Analysis Square Wave by Steve Hanov More comics at http://gandolf.homelinux.org/~smhanov/comics/ Problem set #4 will be posted tonight 11/21/08 Comp 665 Wavelets & Mul8resolu8on
More informationPredicate abstrac,on and interpola,on. Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on.
Predicate abstrac,on and interpola,on Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on. Outline. Predicate abstrac,on the idea in pictures 2. Counter- example guided
More informationCompact Indexes for Flexible Top-k Retrieval
Compact Indexes for Flexible Top-k Retrieval Simon Gog Matthias Petri Institute of Theoretical Informatics, Karlsruhe Institute of Technology Computing and Information Systems, The University of Melbourne
More informationSuccinct Data Structures for Approximating Convex Functions with Applications
Succinct Data Structures for Approximating Convex Functions with Applications Prosenjit Bose, 1 Luc Devroye and Pat Morin 1 1 School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6, {jit,morin}@cs.carleton.ca
More informationCS 161: Design and Analysis of Algorithms
CS 161: Design and Analysis of Algorithms Greedy Algorithms 3: Minimum Spanning Trees/Scheduling Disjoint Sets, continued Analysis of Kruskal s Algorithm Interval Scheduling Disjoint Sets, Continued Each
More informationCS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #15: Mining Streams 2
CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #15: Mining Streams 2 Today s Lecture More algorithms for streams: (1) Filtering a data stream: Bloom filters Select elements with property
More informationMA/CS 109 Lecture 7. Back To Exponen:al Growth Popula:on Models
MA/CS 109 Lecture 7 Back To Exponen:al Growth Popula:on Models Homework this week 1. Due next Thursday (not Tuesday) 2. Do most of computa:ons in discussion next week 3. If possible, bring your laptop
More informationObjec&ves. Review. Data structure: Heaps Data structure: Graphs. What is a priority queue? What is a heap?
Objec&ves Data structure: Heaps Data structure: Graphs Submit problem set 2 1 Review What is a priority queue? What is a heap? Ø Proper&es Ø Implementa&on What is the process for finding the smallest element
More informationSmall-Space Dictionary Matching (Dissertation Proposal)
Small-Space Dictionary Matching (Dissertation Proposal) Graduate Center of CUNY 1/24/2012 Problem Definition Dictionary Matching Input: Dictionary D = P 1,P 2,...,P d containing d patterns. Text T of length
More informationCSE 473: Ar+ficial Intelligence. Probability Recap. Markov Models - II. Condi+onal probability. Product rule. Chain rule.
CSE 473: Ar+ficial Intelligence Markov Models - II Daniel S. Weld - - - University of Washington [Most slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188
More informationPart 2. Representation Learning Algorithms
53 Part 2 Representation Learning Algorithms 54 A neural network = running several logistic regressions at the same time If we feed a vector of inputs through a bunch of logis;c regression func;ons, then
More informationExact data mining from in- exact data Nick Freris
Exact data mining from in- exact data Nick Freris Qualcomm, San Diego October 10, 2013 Introduc=on (1) Informa=on retrieval is a large industry.. Biology, finance, engineering, marke=ng, vision/graphics,
More informationQ1 current best prac2ce
Group- A Q1 current best prac2ce Star2ng from some common molecular representa2on with bond orders, configura2on on chiral centers (e.g. ChemDraw, SMILES) NEW! PDB should become resource for refinement
More informationarxiv: v1 [cs.ds] 15 Feb 2012
Linear-Space Substring Range Counting over Polylogarithmic Alphabets Travis Gagie 1 and Pawe l Gawrychowski 2 1 Aalto University, Finland travis.gagie@aalto.fi 2 Max Planck Institute, Germany gawry@cs.uni.wroc.pl
More informationData Compression Techniques (Spring 2012) Model Solutions for Exercise 2
582487 Data Compression Techniques (Spring 22) Model Solutions for Exercise 2 If you have any feedback or corrections, please contact nvalimak at cs.helsinki.fi.. Problem: Construct a canonical prefix
More informationLesson 76 Introduction to Complex Numbers
Lesson 76 Introduction to Complex Numbers HL2 MATH - SANTOWSKI Lesson Objectives (1) Introduce the idea of imaginary and complex numbers (2) Prac?ce opera?ons with complex numbers (3) Use complex numbers
More informationElectricity & Magnetism Lecture 5: Electric Potential Energy
Electricity & Magnetism Lecture 5: Electric Potential Energy Today... Ø Ø Electric Poten1al Energy Unit 21 session Gravita1onal and Electrical PE Electricity & Magne/sm Lecture 5, Slide 1 Stuff you asked
More informationHW #4. (mostly by) Salim Sarımurat. 1) Insert 6 2) Insert 8 3) Insert 30. 4) Insert S a.
HW #4 (mostly by) Salim Sarımurat 04.12.2009 S. 1. 1. a. 1) Insert 6 2) Insert 8 3) Insert 30 4) Insert 40 2 5) Insert 50 6) Insert 61 7) Insert 70 1. b. 1) Insert 12 2) Insert 29 3) Insert 30 4) Insert
More informationSeman&cs with Dense Vectors. Dorota Glowacka
Semancs with Dense Vectors Dorota Glowacka dorota.glowacka@ed.ac.uk Previous lectures: - how to represent a word as a sparse vector with dimensions corresponding to the words in the vocabulary - the values
More informationBellman s Curse of Dimensionality
Bellman s Curse of Dimensionality n- dimensional state space Number of states grows exponen
More informationAlgorithms for NLP. Classifica(on III. Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classifica(on III Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley Objec(ve Func(ons What do we want from our weights? Depends! So far: minimize (training) errors: This is
More informationPSPACE, NPSPACE, L, NL, Savitch's Theorem. More new problems that are representa=ve of space bounded complexity classes
PSPACE, NPSPACE, L, NL, Savitch's Theorem More new problems that are representa=ve of space bounded complexity classes Outline for today How we'll count space usage Space bounded complexity classes New
More informationThe Boolean Model ~1955
The Boolean Model ~1955 The boolean model is the first, most criticized, and (until a few years ago) commercially more widespread, model of IR. Its functionalities can often be found in the Advanced Search
More informationLecture 3 : Algorithms for source coding. September 30, 2016
Lecture 3 : Algorithms for source coding September 30, 2016 Outline 1. Huffman code ; proof of optimality ; 2. Coding with intervals : Shannon-Fano-Elias code and Shannon code ; 3. Arithmetic coding. 1/39
More informationAuthor to whom correspondence should be addressed;
Algorithms 2009, 2, 1031-1044; doi:10.3390/a2031031 Article OPEN ACCESS algorithms ISSN 1999-4893 www.mdpi.com/journal/algorithms Graph Compression by BFS Alberto Apostolico 1,2 and Guido Drovandi 3,4,
More informationCOMP 562: Introduction to Machine Learning
COMP 562: Introduction to Machine Learning Lecture 20 : Support Vector Machines, Kernels Mahmoud Mostapha 1 Department of Computer Science University of North Carolina at Chapel Hill mahmoudm@cs.unc.edu
More informationPseudospectral Methods For Op2mal Control. Jus2n Ruths March 27, 2009
Pseudospectral Methods For Op2mal Control Jus2n Ruths March 27, 2009 Introduc2on Pseudospectral methods arose to find solu2ons to Par2al Differen2al Equa2ons Recently adapted for Op2mal Control Key Ideas
More informationDivide-and-Conquer Algorithms Part Two
Divide-and-Conquer Algorithms Part Two Recap from Last Time Divide-and-Conquer Algorithms A divide-and-conquer algorithm is one that works as follows: (Divide) Split the input apart into multiple smaller
More informationboolean queries Inverted index query processing Query optimization boolean model January 15, / 35
boolean model January 15, 2017 1 / 35 Outline 1 boolean queries 2 3 4 2 / 35 taxonomy of IR models Set theoretic fuzzy extended boolean set-based IR models Boolean vector probalistic algebraic generalized
More informationModule 4: Linear Equa1ons Topic D: Systems of Linear Equa1ons and Their Solu1ons. Lesson 4-24: Introduc1on to Simultaneous Equa1ons
Module 4: Linear Equa1ons Topic D: Systems of Linear Equa1ons and Their Solu1ons Lesson 4-24: Introduc1on to Simultaneous Equa1ons Lesson 4-24: Introduc1on to Simultaneous Equa1ons Purpose for Learning:
More informationData Structures in Java
Data Structures in Java Lecture 20: Algorithm Design Techniques 12/2/2015 Daniel Bauer 1 Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways of
More informationIn-core computation of distance distributions and geometric centralities with HyperBall: A hundred billion nodes and beyond
In-core computation of distance distributions and geometric centralities with HyperBall: A hundred billion nodes and beyond Paolo Boldi, Sebastiano Vigna Laboratory for Web Algorithmics Università degli
More informationLecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code
Lecture 16 Agenda for the lecture Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Variable-length source codes with error 16.1 Error-free coding schemes 16.1.1 The Shannon-Fano-Elias
More informationRela%ons and Their Proper%es. Slides by A. Bloomfield
Rela%ons and Their Proper%es Slides by A. Bloomfield What is a rela%on Let A and B be sets. A binary rela%on R is a subset of A B Example Let A be the students in a the CS major A = {Alice, Bob, Claire,
More informationCon$nuous Op$miza$on: The "Right" Language for Fast Graph Algorithms? Aleksander Mądry
Con$nuous Op$miza$on: The "Right" Language for Fast Graph Algorithms? Aleksander Mądry We have been studying graph algorithms for a while now Tradi$onal view: Graphs are combinatorial objects graph algorithms
More informationMARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for
MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1
More informationExponen'al func'ons and exponen'al growth. UBC Math 102
Exponen'al func'ons and exponen'al growth Course Calendar: OSH 4 due by 12:30pm in MX 1111 You are here Coming up (next week) Group version of Quiz 3 distributed by email Group version of Quiz 3 due in
More informationComputer Vision. Pa0ern Recogni4on Concepts. Luis F. Teixeira MAP- i 2014/15
Computer Vision Pa0ern Recogni4on Concepts Luis F. Teixeira MAP- i 2014/15 Outline General pa0ern recogni4on concepts Classifica4on Classifiers Decision Trees Instance- Based Learning Bayesian Learning
More informationParameter Es*ma*on: Cracking Incomplete Data
Parameter Es*ma*on: Cracking Incomplete Data Khaled S. Refaat Collaborators: Arthur Choi and Adnan Darwiche Agenda Learning Graphical Models Complete vs. Incomplete Data Exploi*ng Data for Decomposi*on
More informationInteger Sorting on the word-ram
Integer Sorting on the word-rm Uri Zwick Tel viv University May 2015 Last updated: June 30, 2015 Integer sorting Memory is composed of w-bit words. rithmetical, logical and shift operations on w-bit words
More informationLOWELL. MICHIGAN. WEDNESDAY, FEBRUARY NUMllEE 33, Chicago. >::»«ad 0:30am, " 16.n«l w 00 ptn Jaekten,.'''4snd4:4(>a tii, ijilwopa
4/X6 X 896 & # 98 # 4 $2 q $ 8 8 $ 8 6 8 2 8 8 2 2 4 2 4 X q q!< Q 48 8 8 X 4 # 8 & q 4 ) / X & & & Q!! & & )! 2 ) & / / ;) Q & & 8 )
More informationLast Lecture Recap UVA CS / Introduc8on to Machine Learning and Data Mining. Lecture 3: Linear Regression
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 3: Linear Regression Yanjun Qi / Jane University of Virginia Department of Computer Science 1 Last Lecture Recap q Data
More informationLecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity
5-859: Information Theory and Applications in TCS CMU: Spring 23 Lecture 8: Channel and source-channel coding theorems; BEC & linear codes February 7, 23 Lecturer: Venkatesan Guruswami Scribe: Dan Stahlke
More informationIntroduc)on to Ar)ficial Intelligence
Introduc)on to Ar)ficial Intelligence Lecture 13 Approximate Inference CS/CNS/EE 154 Andreas Krause Bayesian networks! Compact representa)on of distribu)ons over large number of variables! (OQen) allows
More informationCMPSCI611: The Matroid Theorem Lecture 5
CMPSCI611: The Matroid Theorem Lecture 5 We first review our definitions: A subset system is a set E together with a set of subsets of E, called I, such that I is closed under inclusion. This means that
More informationGraphical Models. Lecture 1: Mo4va4on and Founda4ons. Andrew McCallum
Graphical Models Lecture 1: Mo4va4on and Founda4ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. Board work Expert systems the desire for probability
More informationIS4200/CS6200 Informa0on Retrieval. PageRank Con+nued. with slides from Hinrich Schütze and Chris6na Lioma
IS4200/CS6200 Informa0on Retrieval PageRank Con+nued with slides from Hinrich Schütze and Chris6na Lioma Exercise: Assump0ons underlying PageRank Assump0on 1: A link on the web is a quality signal the
More informationRank and Select Operations on Binary Strings (1974; Elias)
Rank and Select Operations on Binary Strings (1974; Elias) Naila Rahman, University of Leicester, www.cs.le.ac.uk/ nyr1 Rajeev Raman, University of Leicester, www.cs.le.ac.uk/ rraman entry editor: Paolo
More informationLecture 1 : Data Compression and Entropy
CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for
More informationQuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees
QuickScorer: a fast algorithm to rank documents with additive ensembles of regression trees Claudio Lucchese, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto HPC Lab, ISTI-CNR, Pisa, Italy & Tiscali
More informationBandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet)
Compression Motivation Bandwidth: Communicate large complex & highly detailed 3D models through lowbandwidth connection (e.g. VRML over the Internet) Storage: Store large & complex 3D models (e.g. 3D scanner
More informationLecture 3. 1 Polynomial-time algorithms for the maximum flow problem
ORIE 633 Network Flows August 30, 2007 Lecturer: David P. Williamson Lecture 3 Scribe: Gema Plaza-Martínez 1 Polynomial-time algorithms for the maximum flow problem 1.1 Introduction Let s turn now to considering
More informationCSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen
CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Instructor: Wei-Min Shen Today s Lecture Search Techniques (review & con/nue) Op/miza/on Techniques Home Work 1: descrip/on
More informationLec 03 Entropy and Coding II Hoffman and Golomb Coding
CS/EE 5590 / ENG 40 Special Topics Multimedia Communication, Spring 207 Lec 03 Entropy and Coding II Hoffman and Golomb Coding Zhu Li Z. Li Multimedia Communciation, 207 Spring p. Outline Lecture 02 ReCap
More informationAn instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1
Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,
More informationMethodological Foundations of Biomedical Informatics (BMSC-GA 4449) Optimization
Methodological Foundations of Biomedical Informatics (BMSCGA 4449) Optimization Op#miza#on A set of techniques for finding the values of variables at which the objec#ve func#on amains its cri#cal (minimal
More information11 The Max-Product Algorithm
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 11 The Max-Product Algorithm In the previous lecture, we introduced
More informationAlgorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons. Daniel A. Spielman Yale University
Algorithms, Graph Theory, and the Solu7on of Laplacian Linear Equa7ons Daniel A. Spielman Yale University Rutgers, Dec 6, 2011 Outline Linear Systems in Laplacian Matrices What? Why? Classic ways to solve
More informationCSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits
CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits Chris Calabro January 13, 2016 1 RAM model There are many possible, roughly equivalent RAM models. Below we will define one in the fashion
More informationQuery CS347. Term-document incidence. Incidence vectors. Which plays of Shakespeare contain the words Brutus ANDCaesar but NOT Calpurnia?
Query CS347 Which plays of Shakespeare contain the words Brutus ANDCaesar but NOT Calpurnia? Lecture 1 April 4, 2001 Prabhakar Raghavan Term-document incidence Incidence vectors Antony and Cleopatra Julius
More informationDatabase Theory VU , SS Ehrenfeucht-Fraïssé Games. Reinhard Pichler
Database Theory Database Theory VU 181.140, SS 2018 7. Ehrenfeucht-Fraïssé Games Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Pichler 15
More informationOptimisation and Operations Research
Optimisation and Operations Research Lecture 12: Algorithm Analysis and Complexity Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/ Lecture_notes/OORII/
More informationTopic 4 Randomized algorithms
CSE 103: Probability and statistics Winter 010 Topic 4 Randomized algorithms 4.1 Finding percentiles 4.1.1 The mean as a summary statistic Suppose UCSD tracks this year s graduating class in computer science
More informationImage Data Compression. Dirty-paper codes Alexey Pak, Lehrstuhl für Interak<ve Echtzeitsysteme, Fakultät für Informa<k, KIT
Image Data Compression Dirty-paper codes 1 Reminder: watermarking with side informa8on Watermark embedder Noise n Input message m Auxiliary informa
More informationTheoretical Computer Science. Dynamic rank/select structures with applications to run-length encoded texts
Theoretical Computer Science 410 (2009) 4402 4413 Contents lists available at ScienceDirect Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs Dynamic rank/select structures with
More informationDictionary: an abstract data type
2-3 Trees 1 Dictionary: an abstract data type A container that maps keys to values Dictionary operations Insert Search Delete Several possible implementations Balanced search trees Hash tables 2 2-3 trees
More informationFounda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang
Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval Lecture #3 Machine Learning Edward Y. Chang Edward Chang Founda'ons of LSMM 1 Edward Chang Foundations of LSMM 2 Machine Learning
More informationBIOINF 4371 Drug Design 1 Oliver Kohlbacher & Jens Krüger
BIOINF 4371 Drug Design 1 Oliver Kohlbacher & Jens Krüger Winter 2013/2014 11. Docking Part IV: Receptor Flexibility Overview Receptor flexibility Types of flexibility Implica5ons for docking Examples
More informationAlgorithms, Graph Theory, and Linear Equa9ons in Laplacians. Daniel A. Spielman Yale University
Algorithms, Graph Theory, and Linear Equa9ons in Laplacians Daniel A. Spielman Yale University Shang- Hua Teng Carter Fussell TPS David Harbater UPenn Pavel Cur9s Todd Knoblock CTY Serge Lang 1927-2005
More informationElectronic Structure Calcula/ons: Density Func/onal Theory. and. Predic/ng Cataly/c Ac/vity
Electronic Structure Calcula/ons: Density Func/onal Theory and Predic/ng Cataly/c Ac/vity Review Examples of experiments that do not agree with classical physics: Photoelectric effect Photons and the quan/za/on
More informationContent-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer
Associative Memory Content-Addressable Memory Associative Memory Lernmatrix Association Heteroassociation Learning Retrieval Reliability of the answer Storage Analysis Sparse Coding Implementation on a
More information