Formal Modeling in Cognitive Science Lecture 29: Noisy Channel Model and Applications


Formal Modeling in Cognitive Science
Lecture 29: Noisy Channel Model and Applications
Frank Keller, School of Informatics, University of Edinburgh
keller@inf.ed.ac.uk

The Noisy Channel Model

So far, we have looked at encoding a message efficiently, but what about transmitting the message? The transmission of a message can be modelled using a noisy channel:

a message W is encoded, resulting in a string X;
X is transmitted through a channel with the probability distribution f(y|x);
the resulting string Y is decoded, yielding an estimate of the message Ŵ.

W (message) -> Encoder -> X -> Channel f(y|x) -> Y -> Decoder -> Ŵ (estimate of the message)

We are interested in the mathematical properties of the channel used to transmit the message, and in particular in its capacity.

Definition: Discrete Channel
A discrete channel consists of an input alphabet X, an output alphabet Y, and a probability distribution f(y|x) that expresses the probability of observing symbol y given that symbol x is sent.

Definition: Channel Capacity
The channel capacity of a discrete channel is:

    C = max_{f(x)} I(X; Y)

That is, the capacity of a channel is the maximum of the mutual information of X and Y over all input distributions f(x).
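To make the capacity definition concrete, here is a minimal Python sketch (my own illustration, not part of the lecture; the function names and the grid-search approach are assumptions) that computes I(X; Y) for a discrete channel given as a matrix of probabilities f(y|x), and approximates C for a binary input alphabet by searching over input distributions:

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X; Y) in bits, for input distribution p_x and a channel matrix
    with channel[x, y] = f(y|x)."""
    p_xy = p_x[:, None] * channel              # joint distribution f(x, y)
    p_y = p_xy.sum(axis=0)                     # output marginal f(y)
    prod = p_x[:, None] * p_y[None, :]         # f(x) * f(y)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / prod[mask])).sum())

def capacity_binary_input(channel, grid=1001):
    """Approximate C = max_{f(x)} I(X; Y) by a grid search over binary
    input distributions (f(0), f(1)) = (q, 1 - q)."""
    return max(mutual_information(np.array([q, 1.0 - q]), channel)
               for q in np.linspace(0.0, 1.0, grid))

# Noiseless binary channel: the input is reproduced exactly at the output.
noiseless = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
print(capacity_binary_input(noiseless))        # ~1.0 bit, at f(0) = f(1) = 1/2
```

For the noiseless binary channel of the next example, the search returns a capacity of 1 bit, achieved at the uniform input distribution.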

Example: Noiseless Binary Channel
Assume a binary channel whose input is reproduced exactly at the output. Each transmitted bit is received without error. The channel capacity of this channel is:

    C = max_{f(x)} I(X; Y) = 1 bit

This maximum is achieved with f(0) = 1/2 and f(1) = 1/2.

Example: Binary Symmetric Channel
Assume a binary channel whose input is flipped (0 transmitted as 1, or 1 transmitted as 0) with probability p. The mutual information of this channel is bounded by:

    I(X; Y) = H(Y) - H(Y|X)
            = H(Y) - Σ_x f(x) H(Y|X = x)
            = H(Y) - Σ_x f(x) H(p)
            = H(Y) - H(p)
            ≤ 1 - H(p)

The channel capacity is therefore:

    C = max_{f(x)} I(X; Y) = 1 - H(p) bits

[Figure: a binary data sequence transmitted over a binary symmetric channel; the received sequence is identical to the transmitted one except for a small number of flipped bits.]

Theorem: Properties of Channel Capacity
1. C ≥ 0, since I(X; Y) ≥ 0;
2. C ≤ log |X|, since C = max I(X; Y) ≤ max H(X) = log |X|;
3. C ≤ log |Y|, for the same reason.
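As a quick numerical check of C = 1 - H(p), here is a short sketch (again my own, not from the lecture) that computes the capacity of the binary symmetric channel and simulates transmission by flipping each bit with probability p:

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

def transmit_bsc(bits, p, rng):
    """Flip each transmitted bit independently with probability p."""
    flips = rng.random(bits.shape) < p
    return np.bitwise_xor(bits, flips.astype(bits.dtype))

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=20)
print(bsc_capacity(0.1))               # ~0.531 bits per channel use
print(bits)
print(transmit_bsc(bits, 0.1, rng))    # same sequence with a few bits flipped
```

With crossover probability 0.1, for instance, the capacity is roughly 0.53 bits per channel use.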

Applications of the Noisy Channel Model

The noisy channel model can be applied to decoding processes involving linguistic information. A typical formulation of such a problem is:

we start with a linguistic input I;
I is transmitted through a noisy channel with the probability distribution f(o|i);
the resulting output O is decoded, yielding an estimate of the input Î.

I -> Noisy Channel f(o|i) -> O -> Decoder -> Î

Application                      Input            Output               f(i)                          f(o|i)
Machine translation              target text      source text          target language model         translation model
Optical character recognition    actual text      text with mistakes   probability of text           model of OCR errors
Part-of-speech tagging           POS sequences    words                probability of POS sequences  f(w|t)
Speech recognition               word sequences   speech signal        probability of word sequences acoustic model

Let's look at machine translation in more detail. Assume that the French text (F) passed through a noisy channel and came out as English (E). We decode it to estimate the original French (F̂):

F -> Noisy Channel f(e|f) -> E -> Decoder -> F̂

We compute F̂ using Bayes' theorem:

    F̂ = argmax_f f(f|e) = argmax_f [f(f) f(e|f) / f(e)] = argmax_f f(f) f(e|f)

Here f(e|f) is the translation model, f(f) is the French language model, and f(e) is the probability of the English text, which is constant and can be dropped from the maximization. (A toy decoder in this style is sketched after the example output below.)

Example Output: Spanish -> English
we all know very well that the current treaties are insufficient and that, in the future, it will be necessary to develop a better structure and different for the european union, a structure more constitutional also make it clear what the competences of the member states and which belong to the union. messages of concern in the first place just before the economic and social problems for the present situation, and in spite of sustained growth, as a result of years of effort on the part of our citizens. the current situation, unsustainable above all for many self-employed drivers and in the area of agriculture, we must improve without doubt. in itself, it is good to reach an agreement on procedures, but we have to ensure that this system is not likely to be used as a weapon policy. now they are also clear rights to be respected. i agree with the signal warning against the return, which some are tempted to the intergovernmental methods. there are many of us that we want a federation of nation states.
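As referenced above, the Bayes formulation can be turned into a toy decoder. The sketch below is purely illustrative: the candidate French strings, the language model lm = f(f), and the translation model tm = f(e|f) are all invented, and a real MT system would search over a vast space of candidates rather than a hand-written table:

```python
# Toy noisy-channel decoder for the machine translation formulation above.
# All strings and probabilities here are made up for illustration only.

def decode(e, candidates, lm, tm):
    """Return argmax_f lm[f] * tm[(e, f)]; the denominator f(e) is constant."""
    return max(candidates, key=lambda f: lm[f] * tm.get((e, f), 0.0))

e = "the house"
lm = {"la maison": 0.004, "le maison": 0.0001, "la maison bleue": 0.0005}   # f(f)
tm = {("the house", "la maison"): 0.30,                                     # f(e|f)
      ("the house", "le maison"): 0.40,
      ("the house", "la maison bleue"): 0.05}

print(decode(e, lm.keys(), lm, tm))    # -> "la maison"
```

Note how the language model overrules the slightly higher translation probability of the ungrammatical candidate "le maison", which is exactly the division of labour the noisy channel factorization is meant to achieve.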

Example Output: Finnish -> English
the rapporteurs have drawn attention to the quality of the debate and also the need to go further: of course, i can only agree with them. we know very well that the current treaties are not enough and that in future, it is necessary to develop a better structure for the union and, therefore perustuslaillisempi structure, which also expressed more clearly what the member states and the union is concerned. first of all, kohtaamiemme economic and social difficulties, there is concern, even if growth is sustainable and the result of the efforts of all, on the part of our citizens. the current situation, which is unacceptable, in particular, for many carriers and responsible for agriculture, is in any case, to be improved. agreement on procedures in itself is a good thing, but there is a need to ensure that the system cannot be used as a political lyomaaseena. they also have a clear picture of the rights of now, in which they have to work. i agree with him when he warned of the consenting to return to intergovernmental methods. many of us want of a federal state of the national member states.

Kullback-Leibler Divergence

Definition: Kullback-Leibler Divergence
For two probability distributions f(x) and g(x) for a random variable X, the Kullback-Leibler divergence or relative entropy is given as:

    D(f || g) = Σ_{x ∈ X} f(x) log [f(x) / g(x)]

The KL divergence compares the entropy of two distributions over the same random variable. Intuitively, the KL divergence is the number of additional bits required when encoding a random variable with a distribution f(x) using the alternative distribution g(x).

Theorem: Properties of the KL Divergence
1. D(f || g) ≥ 0;
2. D(f || g) = 0 iff f(x) = g(x) for all x ∈ X;
3. D(f || g) ≠ D(g || f);
4. I(X; Y) = D(f(x, y) || f(x) f(y)).

So the mutual information is the KL divergence between f(x, y) and f(x) f(y). It measures how far a joint distribution is from independence.

Example
For a random variable X = {0, 1}, assume two distributions f(x) and g(x) with f(0) = 1 - r, f(1) = r and g(0) = 1 - s, g(1) = s:

    D(f || g) = (1 - r) log [(1 - r) / (1 - s)] + r log (r / s)
    D(g || f) = (1 - s) log [(1 - s) / (1 - r)] + s log (s / r)

If r = s then D(f || g) = D(g || f) = 0. If r = 1/2 and s = 1/4:

    D(f || g) = 1/2 log [(1/2) / (3/4)] + 1/2 log [(1/2) / (1/4)] = 1 - (1/2) log 3 ≈ 0.2075
    D(g || f) = 3/4 log [(3/4) / (1/2)] + 1/4 log [(1/4) / (1/2)] = (3/4) log 3 - 1 ≈ 0.1887
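The two-point example can be reproduced numerically. Assuming r = 1/2 and s = 1/4 as above, the following minimal sketch (not from the lecture) computes both divergences and makes the asymmetry visible:

```python
import numpy as np

def kl_divergence(f, g):
    """D(f || g) = sum_x f(x) log2(f(x) / g(x)), with the convention 0 log 0 = 0."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    mask = f > 0
    return float((f[mask] * np.log2(f[mask] / g[mask])).sum())

# f(0) = 1 - r, f(1) = r with r = 1/2;  g(0) = 1 - s, g(1) = s with s = 1/4.
f = [0.5, 0.5]
g = [0.75, 0.25]

print(kl_divergence(f, g))    # ~0.2075
print(kl_divergence(g, f))    # ~0.1887 (the KL divergence is not symmetric)
```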

Cross-Entropy

Definition: Cross-Entropy
For a random variable X with the probability distribution f(x), the cross-entropy for the probability distribution g(x) is given as:

    H(X, g) = - Σ_{x ∈ X} f(x) log g(x)

The cross-entropy can also be expressed in terms of entropy and KL divergence:

    H(X, g) = H(X) + D(f || g)

Intuitively, the cross-entropy is the total number of bits required when encoding a random variable with a distribution f(x) using the alternative distribution g(x).

Example
In the last lecture, we constructed a code for the following distribution using Huffman coding:

    x            a      e      i      o      u
    f(x)         0.12   0.42   0.09   0.30   0.07
    -log f(x)    3.06   1.25   3.47   1.74   3.84

The entropy of this distribution is H(X) = 1.995. Now compute the distribution g(x) = 2^{-l(x)} associated with the code lengths l(x) of the Huffman code:

    x       a     e     i      o     u
    l(x)    3     1     4      2     4
    g(x)    1/8   1/2   1/16   1/4   1/16

Then the cross-entropy for g(x) is:

    H(X, g) = - Σ_{x ∈ X} f(x) log g(x)
            = -(0.12 log 1/8 + 0.42 log 1/2 + 0.09 log 1/16 + 0.30 log 1/4 + 0.07 log 1/16)
            = 2.02

The KL divergence is:

    D(f || g) = H(X, g) - H(X) = 2.02 - 1.995 = 0.025

This means we are losing on average 0.025 bits by using the Huffman code rather than the theoretically optimal code given by the Shannon information. (A small numerical sketch of this computation follows the summary.)

Summary

The noisy channel model can model the errors and loss when transmitting a message with input X and output Y;
the capacity of the channel is given by the maximum of the mutual information of X and Y;
a binary symmetric channel is one where each bit is flipped with probability p;
the noisy channel model can be applied to linguistic problems, e.g., machine translation;
the Kullback-Leibler divergence is the distance between two distributions (the cost of encoding f(x) through g(x)).
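As referenced above, here is a small sketch (my own, not from the lecture) that recomputes the entropy, cross-entropy, and KL divergence for the Huffman example, using the code lengths l(x) = (3, 1, 4, 2, 4):

```python
import numpy as np

def entropy(f):
    """H(X) = -sum_x f(x) log2 f(x)."""
    f = np.asarray(f, dtype=float)
    return float(-(f * np.log2(f)).sum())

def cross_entropy(f, g):
    """H(X, g) = -sum_x f(x) log2 g(x)."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    return float(-(f * np.log2(g)).sum())

# Distribution over {a, e, i, o, u} from the example, and the dyadic
# distribution g(x) = 2^(-l(x)) implied by the Huffman code lengths.
f = np.array([0.12, 0.42, 0.09, 0.30, 0.07])
lengths = np.array([3, 1, 4, 2, 4])
g = 2.0 ** (-lengths)

H = entropy(f)                 # ~1.995 bits
H_g = cross_entropy(f, g)      # expected Huffman code length, ~2.02 bits
print(H, H_g, H_g - H)         # the difference H_g - H is D(f || g)
```

Because g is dyadic, the cross-entropy equals the expected code length of the Huffman code, so the printed difference is exactly the per-symbol overhead relative to the entropy bound.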