Reconstruction. Reading for this lecture: Lecture Notes.

1 ɛ-Machine Reconstruction. Reading for this lecture: Lecture Notes.

2 The Learning Channel... The ɛ-Machine of a Process: an intrinsic representation!
Predictive (or causal) equivalence relation: $\overleftarrow{s} \sim \overleftarrow{s}' \iff \Pr(\overrightarrow{S} \mid \overleftarrow{S} = \overleftarrow{s}) = \Pr(\overrightarrow{S} \mid \overleftarrow{S} = \overleftarrow{s}')$.
Causal state: a set of pasts with the same morph $\Pr(\overrightarrow{S} \mid \overleftarrow{s})$: $\mathcal{S} = \{\overleftarrow{s}' : \overleftarrow{s}' \sim \overleftarrow{s}\}$.
Set of causal states: $\boldsymbol{\mathcal{S}} = \overleftarrow{\mathbf{S}}/\!\sim \; = \{\mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2, \ldots\}$.
Causal state map: $\epsilon : \overleftarrow{\mathbf{S}} \to \boldsymbol{\mathcal{S}}$, with $\epsilon(\overleftarrow{s}) = \{\overleftarrow{s}' : \overleftarrow{s}' \sim \overleftarrow{s}\}$.
Causal state morph: $\Pr(\overrightarrow{S} \mid \mathcal{S})$.

3 The Learning Channel... Causal State Dynamic. State-to-state transitions:
$\{T^{(s)}_{ij} : s \in \mathcal{A};\ i, j = 0, 1, \ldots, |\boldsymbol{\mathcal{S}}| - 1\}$,
$T^{(s)}_{ij} = \Pr(\mathcal{S}_j, s \mid \mathcal{S}_i) = \Pr\big(\mathcal{S}' = \epsilon(\overleftarrow{s}s) \mid \mathcal{S} = \epsilon(\overleftarrow{s})\big)$.

4 The Learning Channel... The ɛ-Machine of a Process...
$M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$.
[State-transition diagram: transient states leading into recurrent states A, B, C, D]
Unique start state: no measurements made, $\overleftarrow{s} = \lambda$, so the start state is $\mathcal{S}_0 = [\lambda]$.
Start-state distribution: $\Pr(\mathcal{S}_0, \mathcal{S}_1, \mathcal{S}_2, \ldots) = (1, 0, 0, \ldots)$.

5 ɛM Reconstruction: any method to go from a process $P = \Pr(\overleftrightarrow{S})$ to its ɛM.
(1) Analytical: given a model, equations of motion, a description, ...
(2) Statistical inference: given samples of $P$:
  (i) Subtree reconstruction: time or spacetime data to ɛM
  (ii) State-splitting (CSSR): time or spacetime data to ɛM
  (iii) Spectral (ɛMSR): power spectra to ɛM
  (iv) Optimal Causal Inference: time or spacetime data to ɛM
  (v) Enumerative Bayesian Inference

6 How to reconstruct an ɛM: the subtree algorithm.
Given: word distributions $\Pr(s^D)$, $D = 1, 2, 3, \ldots$
Steps:
(1) Form a depth-$D$ parse tree.
(2) Calculate node-to-node transition probabilities.
(3) Causal states: find morphs $\Pr(\overrightarrow{s}^L \mid \overleftarrow{s}^K)$ as subtrees.
(4) Label tree nodes with morph (causal state) names.
(5) Extract state-to-state transitions from the parse tree.
(6) Assemble into the ɛM: $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$.
Algorithm parameters: $D$, $L$, $K$.

7 How to reconstruct an ɛM... Form a parse-tree estimate of $\Pr(s^D)$. Data stream: $s^M = \ldots$ [Figure: start node; parse tree of depth D = 5] Number of samples: $M - D$. History lengths: $K = 0, 1, 2, 3, \ldots$

8 How to reconstruct an ɛM... Form a parse-tree estimate of $\Pr(s^D)$. Data stream: $s^M = \ldots$ [Parse tree of depth D = 5, first words entered]

9 How to reconstruct an ɛM... Form a parse-tree estimate of $\Pr(s^D)$. Data stream: $s^M = \ldots$ [Parse tree of depth D = 5, more words entered]

10 How to reconstruct an ɛM... Form a parse-tree estimate of $\Pr(s^D)$. Data stream: $s^M = \ldots$ [Parse tree of depth D = 5, continuing through the stream]

11 How to reconstruct an ɛM... Form a parse-tree estimate of $\Pr(s^D)$. Data stream: $s^M = \ldots$ Total samples: $M - D \approx M$. [Parse tree of depth D = 5] Store word counts at the nodes. Probability of a node = probability of the word $w$ leading to that node: $\Pr(w) = \text{(node count)}/M$.
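
A minimal sketch of this counting step, steps (1) and the estimate $\Pr(w)$ (my own illustration, not code from the notes): slide a window over the data stream, count every word of length up to D at the parse-tree nodes, and normalize. The data string and function names are made up.

```python
from collections import Counter

def word_probabilities(data, D):
    """Estimate Pr(w) for every word w of length 1..D seen in a symbol string."""
    counts = Counter()
    for L in range(1, D + 1):
        for i in range(len(data) - L + 1):
            counts[data[i:i + L]] += 1          # one count at the tree node for word w
    # The notes use Pr(w) ~ count / M with M - D ~ M; here we normalize per word length.
    return {w: c / (len(data) - len(w) + 1) for w, c in counts.items()}

data = "0110111011010111"                        # toy binary data stream
probs = word_probabilities(data, D=4)
print(probs["1"], probs["01"], probs.get("00", 0.0))
```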

12 How to reconstruct an ɛM... Assume we have the correct word distribution: $\Pr(w) = \Pr(s^D)$. [Parse tree labeled with the word probabilities $\Pr(0), \Pr(1), \Pr(00), \Pr(01), \ldots$]

13 How to reconstruct an ɛM... Node-to-node transition probability: if node $n$ is reached by word $w$ and node $n'$ by word $w' = ws$, then
$\Pr(n \to n', s) = \Pr(n' \mid n) = \Pr(n')/\Pr(n) = \Pr(w')/\Pr(w) = \Pr(s \mid w)$.
[Figure: node $n$ (word $w$) branching on symbol $s$ to node $n'$]
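
The same ratio in code (again my sketch; `probs` is the word-probability dictionary from the previous snippet, and the example numbers below are a made-up distribution):

```python
def branch_probability(probs, w, s):
    """Pr(s | w) = Pr(ws) / Pr(w): probability of the s-branch below the node for word w."""
    if w == "":                                  # root node: no symbols seen yet
        return probs.get(s, 0.0)
    if probs.get(w, 0.0) == 0.0:
        return 0.0
    return probs.get(w + s, 0.0) / probs[w]

# A toy word distribution over lengths 1 and 2:
probs = {"0": 0.5, "1": 0.5, "00": 0.0, "01": 0.5, "10": 0.25, "11": 0.25}
print(branch_probability(probs, "0", "1"))       # Pr(1 | 0) = 0.5 / 0.5 = 1.0
print(branch_probability(probs, "1", "0"))       # Pr(0 | 1) = 0.25 / 0.5 = 0.5
```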

14 How to reconstruct an ɛM... Find morphs $\Pr(\overrightarrow{s}^L \mid \overleftarrow{s}^K)$ as subtrees. Future: $L = 2$; past: $K = 1$. Morph $\Pr(\overrightarrow{S}^2 \mid \overleftarrow{S}^1)$ for the first depth-1 node: the four length-2 futures have probabilities $(2/3)(2/3) = 4/9$, $(2/3)(1/3) = 2/9$, $(1/3)(1/2) = 1/6$, $(1/3)(1/2) = 1/6$. [Parse tree with the conditioning node and its depth-2 subtree highlighted]

15 How to reconstruct an ɛM... Find morphs $\Pr(\overrightarrow{s}^L \mid \overleftarrow{s}^K)$ as subtrees. Future: $L = 2$; past: $K = 1$. Morph $\Pr(\overrightarrow{S}^2 \mid \overleftarrow{S}^1)$ for the other depth-1 node: the four length-2 futures have probabilities $(1/2)(2/3) = 1/3$, $(1/2)(1/3) = 1/6$, $(1/2)(1/2) = 1/4$, $(1/2)(1/2) = 1/4$. [Parse tree with the conditioning node and its depth-2 subtree highlighted]

16 How to reconstruct an ɛM... Set of distinct morphs: Morph A, branch probabilities $(1/2, 1/2)$; Morph B, branch probabilities $(2/3, 1/3)$. [Subtree diagram for each morph] Set of causal states = set of distinct morphs: $\boldsymbol{\mathcal{S}} = \{A, B\}$.
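
One way to code slides 14-16 (a sketch under my own naming, not the notes' implementation): read the depth-L subtree below each length-K node as a conditional future distribution, then group pasts whose morphs agree, within a tolerance since estimated probabilities fluctuate, into candidate causal states.

```python
from itertools import product

def morph(probs, past, L, alphabet="01"):
    """Morph Pr(future^L | past): the depth-L subtree hanging below the node for `past`."""
    if probs.get(past, 0.0) == 0.0:
        return None
    return {"".join(f): probs.get(past + "".join(f), 0.0) / probs[past]
            for f in product(alphabet, repeat=L)}

def causal_states(probs, K, L, tol=0.05, alphabet="01"):
    """Group length-K pasts with (approximately) equal morphs; each group is a causal state."""
    states = []                                   # list of (representative morph, member pasts)
    for past in ("".join(p) for p in product(alphabet, repeat=K)):
        m = morph(probs, past, L, alphabet)
        if m is None:
            continue                              # this past never occurs
        for rep, members in states:
            if all(abs(rep[f] - m[f]) < tol for f in rep):
                members.append(past)
                break
        else:
            states.append((m, [past]))
    return states

# e.g. causal_states(word_probabilities(data, D=K+L), K=1, L=2)
```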

17 How to reconstruct an ɛM... Causal state transitions? Label the tree nodes with their morph (causal state) names, $\boldsymbol{\mathcal{S}} = \{A, B\}$. [Parse tree with every node labeled A or B according to its subtree's morph]

18 How to reconstruct an ɛM... Form the ɛM. Causal states: $\boldsymbol{\mathcal{S}} = \{A, B\}$. Start state ~ the top (root) tree node. [Two-state diagram A, B]

19 How to reconstruct an ɛM... Form the ɛM: causal-state transitions from the node-to-node transitions. [State diagram: transitions out of A with probabilities 1/2, 1/2 and out of B with probabilities 2/3, 1/3]
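
Steps (4)-(5) in the same sketch: label each length-K word with its state index, then accumulate the state-to-state transition probabilities $T^{(s)}_{ij}$ from the branch probabilities, weighting each past by its probability. This tracks only the last K symbols of each node, a simplification of the full node labeling on the slides; all names are mine.

```python
def transition_matrices(probs, states, K, alphabet="01"):
    """Estimate T[s][i][j] = Pr(emit s and go to state j | state i)."""
    label = {past: i for i, (_, members) in enumerate(states) for past in members}
    n = len(states)
    T = {s: [[0.0] * n for _ in range(n)] for s in alphabet}
    weight = [0.0] * n                            # total probability of the pasts in each state
    for past, i in label.items():
        weight[i] += probs[past]
        for s in alphabet:
            nxt = (past + s)[-K:]                 # the length-K past after emitting s
            pr_s = probs.get(past + s, 0.0) / probs[past]   # Pr(s | past), as on slide 13
            if nxt in label and pr_s > 0.0:
                T[s][i][label[nxt]] += probs[past] * pr_s
    for s in alphabet:
        for i in range(n):
            T[s][i] = [x / weight[i] if weight[i] > 0 else 0.0 for x in T[s][i]]
    return T

# Full pipeline: probs -> causal_states(...) -> transition_matrices(...) gives M = {S, {T^(s)}}.
```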

20 How to reconstruct an ɛM: the subtree algorithm.
Given: word distributions $\Pr(s^D)$, $D = 1, 2, 3, \ldots$
Steps:
(1) Form a depth-$D$ parse tree.
(2) Calculate node-to-node transition probabilities.
(3) Causal states: find morphs $\Pr(\overrightarrow{s}^L \mid \overleftarrow{s}^K)$ as subtrees.
(4) Label tree nodes with morph (causal state) names.
(5) Extract state-to-state transitions from the parse tree.
(6) Assemble into the ɛM: $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$.
Algorithm parameters: $D$, $L$, $K$.

21 How to reconstruct an ɛM... Example processes:
1. Period-1
2. Fair Coin
3. Biased Coin
4. Period-2
5. Golden Mean Process
6. Even Process

22 Examples (back to the Prediction Game)... Period-1: ... [Parse tree, D = 5]

23 Examples (back to the Prediction Game)... Period-1: ... [Parse tree, D = 5, with the future-morph subtree highlighted]

24 Examples (back to the Prediction Game)... Period-1... Space of histories: a single point. One future morph. Support: $\{1^+\}$. Distribution: $\Pr(\overrightarrow{S}^L = 1^L \mid \overleftarrow{s} = 1^K) = 1$.

25 Examples (back to the Prediction Game)... Period-1... ɛM: $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$ with $\boldsymbol{\mathcal{S}} = \{\mathcal{S}_0 = \{\ldots 111\}\}$, $T^{(1)} = (1)$, and $T^{(0)} = (0)$.

26 Examples (back to the Prediction Game)... Period-1... Causal state distribution: $p_{\boldsymbol{\mathcal{S}}} = (1)$. Entropy rate: $h_\mu = 0$ bits per symbol. Statistical complexity: $C_\mu = 0$ bits.

27 Examples (back to the Prediction Game)... Fair Coin: ... [Parse tree, D = 5: every branch has probability 1/2]

28 Examples (back to the Prediction Game)... Fair Coin: ... [Parse tree, D = 5, with the future morph at L = 2 highlighted; all branches 1/2]

29 Examples (back to the Prediction Game)... Fair Coin... Space of histories: $\overleftarrow{\mathbf{S}}^K = \mathcal{A}^K$. One future morph. Support: $\mathcal{A}^L$. Distribution: $\Pr(\overrightarrow{S}^L \mid \overleftarrow{s}) = 2^{-L}$. Call it state A.

30 Examples (back to the Prediction Game)... Fair Coin... Label the tree nodes with state names. [Parse tree with every node labeled A; all branches 1/2]

31 Examples (back to the Prediction Game)... Fair Coin... ɛM: $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. States: $\boldsymbol{\mathcal{S}} = \{A\}$, the single state containing all pasts. Transitions: $T^{(0)} = T^{(1)} = (1/2)$.

32 Examples (back to the Prediction Game)... Fair Coin... Causal state distribution: $p_{\boldsymbol{\mathcal{S}}} = (1)$. Entropy rate: $h_\mu = 1$ bit per symbol. Statistical complexity: $C_\mu = 0$ bits.

33 Examples... Biased Coin: $p = \Pr(1) = 2/3$. [Parse tree, D = 5: every node branches with probabilities 1/3 and 2/3]

34 Examples (back to the Prediction Game)... Biased Coin: [Parse tree, D = 5, with the future morph at L = 2 highlighted; branches 1/3 and 2/3]

35 Examples (back to the Prediction Game)... Biased Coin... Space of histories: $\overleftarrow{\mathbf{S}}^K = \mathcal{A}^K$. A single future morph. Support: $\mathcal{A}^L$. Distribution: $\Pr(\overrightarrow{S}^L \mid \overleftarrow{s}) = p^n (1 - p)^{L - n}$, where $n$ is the number of 1s in $\overrightarrow{s}^L$. Call it state A.
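
A quick numerical check of that morph formula (my sketch, taking $p = \Pr(1) = 2/3$ and $L = 3$): every length-L word gets probability $p^n(1-p)^{L-n}$, and the eight values sum to 1.

```python
from itertools import product

p, L = 2 / 3, 3
words = ["".join(w) for w in product("01", repeat=L)]
dist = {w: p ** w.count("1") * (1 - p) ** (L - w.count("1")) for w in words}
for w in words:
    print(w, round(dist[w], 4))
print("sum:", sum(dist.values()))                 # -> 1.0 (up to float rounding)
```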

36 Examples (back to the Prediction Game)... Biased Coin... Label the tree nodes with state names. [Parse tree with every node labeled A; branches 1/3 and 2/3]

37 Examples (back to the Prediction Game)... Biased Coin... ɛM: $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. States: $\boldsymbol{\mathcal{S}} = \{A\}$. Transitions: $T^{(0)} = (1/3)$, $T^{(1)} = (2/3)$.

38 Examples (back to the Prediction Game)... Biased Coin... Causal state distribution: $p_{\boldsymbol{\mathcal{S}}} = (1)$. Entropy rate: $h_\mu = H(2/3)$ bits per symbol. Statistical complexity: $C_\mu = 0$ bits.

39 Examples (back to the Prediction Game)... Period-2 Process: ...010101... [Parse tree, D = 5: the root branches 1/2, 1/2; below that the tree is deterministic]

40 Examples (back to the Prediction Game)... Period-2 Process: [Parse tree, D = 5, with the future morphs at L = 2 highlighted; states $\mathcal{S}_0$, $\mathcal{S}_1$, $\mathcal{S}_2$]

41 Examples (back to the Prediction Game)... Period-2 Process... Space of histories: $\overleftarrow{\mathbf{S}} = \{\overleftarrow{s} = \ldots 0101, \overleftarrow{s} = \ldots 1010\}$. Future morphs: given $\lambda$ (no measurements), $\{\overrightarrow{S} \mid \lambda\} = \{0101\ldots, 1010\ldots\}$; given a past ending in 0, the only future is $1010\ldots$; given a past ending in 1, the only future is $0101\ldots$

42 Examples (back to the Prediction Game)... Period-2 Process... Morph distributions: $\Pr(0101\ldots \mid \lambda) = \Pr(1010\ldots \mid \lambda) = 1/2$; conditioned on any nonempty past, the phase-consistent future has probability 1 and the other future has probability 0.

43 Examples (back to the Prediction Game)... Period-2 Process... Label the tree nodes. [Parse tree: root labeled $\mathcal{S}_0$ with branches 1/2, 1/2; the two periodic paths alternate between labels $\mathcal{S}_1$ and $\mathcal{S}_2$]

44 Examples (back to the Prediction Game)... Period-2 Process... $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. States: $\boldsymbol{\mathcal{S}} = \{\mathcal{S}_0 = \{\ldots 0101, \ldots 1010\}, \mathcal{S}_1, \mathcal{S}_2\}$, one state per phase plus the start state. Transitions: from $\mathcal{S}_0$, emit 0 or 1 with probability 1/2 each, entering the corresponding phase state; $\mathcal{S}_1$ and $\mathcal{S}_2$ then alternate deterministically, each emitting its phase's symbol with probability 1. [Three-state diagram]

45 Examples (back to the Prediction Game)... Period-2 Process... Causal state distribution: $p_{\boldsymbol{\mathcal{S}}} = (0, 1/2, 1/2)$ (the start state is transient). Entropy rate: $h_\mu = 0$ bits per symbol. Statistical complexity: $C_\mu = 1$ bit. [State diagram: $\mathcal{S}_0$ transient; $\mathcal{S}_1 \leftrightarrow \mathcal{S}_2$]

46 Examples... Golden Mean Process: Topological reconstruction (only the support of the word distribution). [Parse tree, D = 5]

47 Examples... Golden Mean Process: Topological reconstruction (only the support of the word distribution). [Parse tree, D = 5, with the morphs at L = 2 highlighted]

48 Examples... Golden Mean Process: Topological reconstruction (only the support of the word distribution). [Parse tree, D = 5; two distinct morphs at L = 2, labeled A and B]

49 Examples... Golden Mean Process: Topological ɛ-machine. $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. Topological causal states: $\boldsymbol{\mathcal{S}} = \{A, B\}$. Transitions: $A \xrightarrow{1} A$, $A \xrightarrow{0} B$, $B \xrightarrow{1} A$; the word 00 never occurs. [Two-state diagram]

50 Examples... Golden Mean Process: Topological ɛ-machine. Probabilistic transitions: $T^{(1)} = \begin{pmatrix} 1/2 & 0 \\ 1 & 0 \end{pmatrix}$, $T^{(0)} = \begin{pmatrix} 0 & 1/2 \\ 0 & 0 \end{pmatrix}$ (state order A, B). Causal state distribution: $p_{\boldsymbol{\mathcal{S}}} = (2/3, 1/3)$. Entropy rate: $h_\mu = 2/3$ bits per symbol. Statistical complexity: $C_\mu = H(2/3)$ bits.
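
These numbers are easy to verify from the labeled transition matrices. A sketch (my code, assuming NumPy is available): build $T = T^{(0)} + T^{(1)}$, take its left eigenvector at eigenvalue 1 as $p_{\boldsymbol{\mathcal{S}}}$, then compute $h_\mu$ as the state-weighted symbol entropy and $C_\mu = H[p_{\boldsymbol{\mathcal{S}}}]$; it should print roughly (2/3, 1/3), 0.667, and 0.918.

```python
import numpy as np

# Golden Mean Process, states (A, B): A emits 0 or 1 with prob 1/2 (a 0 leads to B),
# B emits 1 with probability 1 and returns to A.
T0 = np.array([[0.0, 0.5],
               [0.0, 0.0]])
T1 = np.array([[0.5, 0.0],
               [1.0, 0.0]])
T = T0 + T1                                       # state-to-state transition matrix

vals, vecs = np.linalg.eig(T.T)                   # left eigenvectors of T
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()                                # stationary causal-state distribution p_S

def H(dist):
    dist = np.asarray([x for x in dist if x > 0])
    return float(-np.sum(dist * np.log2(dist)))

h_mu = sum(pi[i] * H([T0[i].sum(), T1[i].sum()]) for i in range(len(pi)))
C_mu = H(pi)
print(pi, h_mu, C_mu)                             # ~[0.667 0.333] 0.667 0.918
```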

51 Examples... Golden Mean Process: Probabilistic reconstruction (capture the full word distribution). $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. Causal states: $\boldsymbol{\mathcal{S}} = \{A, B, C\}$: the two recurrent states plus the transient start state, whose morph (branch probabilities 2/3 and 1/3) differs from both recurrent morphs. Transitions: $T^{(0)}$, $T^{(1)}$ read off the tree's branch probabilities (2/3, 1/3, 1/2, 1/2, 1). [Three-state diagram A, B, C]

52 Examples... Even Process: Topological reconstruction. [Parse tree, D = 5]

53 Examples... Even Process: Topological reconstruction. [Parse tree, D = 5, with the future morphs at L = 2 highlighted]

54 Examples... Even Process: Topological reconstruction. [Parse tree, D = 5; three distinct future morphs at L = 2, labeled A, B, C]

55 Examples... Even Process: Topological reconstruction. Label the tree nodes. [Parse tree, D = 5, with every node labeled A, B, or C]

56 Examples... Even Process: Topological reconstruction. Topological states: $\boldsymbol{\mathcal{S}} = \{A, B, C\}$. Topological transitions: $T^{(0)}$, $T^{(1)}$. [0/1 transition matrices read off the labeled tree]

57 Examples... Even Process: Topological ɛ-machine. $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. States: $\boldsymbol{\mathcal{S}} = \{A, B, C\}$. Transitions: $T^{(0)}$, $T^{(1)}$. [Three-state diagram A, B, C]

58 Examples... Even Process: Probabilistic reconstruction. $M = \{\boldsymbol{\mathcal{S}}, \{T^{(s)} : s \in \mathcal{A}\}\}$. States: $\boldsymbol{\mathcal{S}} = \{A, B, C, D\}$. Transitions: $T^{(0)}$, $T^{(1)}$ with entries read off the tree's branch probabilities. Entropy rate: $h_\mu = 2/3$ bits per symbol. Statistical complexity: $C_\mu = H(2/3)$ bits. [Four-state diagram]

59 Reading for next lecture: CMR article CMPPSS. Homework: ɛM reconstruction for the GMP, EP, & RRXOR processes. Helpful? Tree & morph paper at:
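
Not from the notes, but possibly handy for the homework: a small sampler for Golden Mean Process data (no 00 subwords; a fair-coin choice in the free state), which can be fed to a subtree reconstruction like the sketches above. Even Process and RRXOR generators would be written the same way.

```python
import random

def golden_mean_sequence(n, seed=0):
    """Sample n symbols from the Golden Mean Process (no '00' subwords)."""
    rng = random.Random(seed)
    state, out = "A", []
    for _ in range(n):
        if state == "A":
            s = rng.choice("01")                  # in A: emit 0 or 1 with probability 1/2
            state = "B" if s == "0" else "A"      # a 0 forces the next symbol to be a 1
        else:                                     # in B: emit 1 and return to A
            s, state = "1", "A"
        out.append(s)
    return "".join(out)

data = golden_mean_sequence(100_000)
assert "00" not in data                           # sanity check on the support
```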
