Maximum Entropy - Lecture 8
- Austin Tyler
- 5 years ago
1 Introduction

In the previous discussions we have not been particularly concerned with the selection of the initial prior probability to use in Bayes' theorem, as in most cases the result is insensitive to the initial values. This is not always the case, however, and we need a way of selecting initial probabilities among many possible choices. For this we can use the principle of maximum entropy, which is defined by information theory.

Information entropy is a measure of the uncertainty in a probability distribution. A data stream is an information transfer to which we can assign a set of possible probability distributions. The principle of maximum entropy states that, among the possible probability distributions $\{p_i\}$ consistent with what is known, the one that occurs is the one that maximizes the entropy. As an example, suppose there are 12 balls, all but one of which are of equal weight. You have a balance as an instrument and wish to determine the best way to find the odd ball. This is a problem in optimization and is solved by choosing the most probable path.

2 Review of information theory

Shannon's theorem states that the information contained in an outcome of probability $p_i$, drawn from the set $\{p_i\}$, is

$$I = \ln(1/p_i).$$

The entropy of the ensemble is the average of this information:

$$S = \sum_i p_i \ln(1/p_i).$$

As previously noted, the logarithm could be taken in any base, but in information theory it is almost always base 2, as information is coded in bits. Thus $N$ random variables, each with entropy $S$, can be compressed into slightly more than $NS$ bits with negligible risk of information loss when $N$ is large.

Consider an ensemble of $N$ independent, identically distributed random variables. The outcome $x = (x_1, \dots, x_N)$ most likely belongs to a subset $A_x$ having about $2^{NS}$ members, each with probability close to $2^{-NS}$. This is the (asymptotic) equipartition of probabilities. Thus, as $N \to \infty$, the number of bits per outcome required to specify an outcome tends to $S$.
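As a quick numerical check of these definitions, the sketch below computes $I$ and $S$ in bits (base 2). The names `information_bits` and `entropy_bits` are our own and not from any library. The last lines apply the same idea to the 12-ball puzzle from the introduction, assuming the usual reading that the odd ball may be either heavier or lighter (24 equally likely hypotheses) and that each weighing has three possible outcomes.

```python
import math

def information_bits(p: float) -> float:
    """Shannon information I = log2(1/p) of an outcome with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1 / p)

def entropy_bits(probs) -> float:
    """Entropy S = sum_i p_i log2(1/p_i); outcomes with p_i = 0 contribute nothing."""
    probs = list(probs)
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))            # 1.0: a fair coin carries one bit per toss
print(round(entropy_bits([0.9, 0.1]), 3))  # 0.469: a biased coin carries less
# Source coding: 1000 such biased tosses need only about 1000 * 0.469 = 469 bits.

# 12-ball puzzle (assumption: the odd ball may be heavy or light).
hypotheses = 24                            # 12 balls x {heavy, light}
needed = information_bits(1 / hypotheses)  # about 4.585 bits of uncertainty
per_weighing = math.log2(3)                # left heavy, right heavy, or balanced
print(math.ceil(needed / per_weighing))    # 3: lower bound on the number of weighings
```

The bound only says that fewer than three weighings cannot suffice; it does not by itself tell us which weighings to make.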
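More precisely, $S_\delta/N \to S$ as $N \to \infty$ for any fixed $0 < \delta < 1$. Here $S_\delta$ is $\log_2$ of the number of members of the smallest subset of outcomes whose total probability is at least $1 - \delta$; this subset is made up of the most probable outcomes, that is, those with the smallest information content. This is shown in Figure 1.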
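The behaviour plotted in Figure 1 can be approximated numerically. The sketch below assumes a binary source with $P(1) = p_1 < 0.5$ (our choice for illustration, not from the lecture), collects length-$N$ outcomes in order of decreasing probability until their total reaches $1 - \delta$, and reports $S_\delta/N$ in bits per symbol. The function name `essential_bits_per_symbol` is hypothetical. For $p_1 = 0.1$ the entropy is $S \approx 0.469$ bits, and as $N$ grows the values for different $\delta$ should crowd towards it.

```python
import math

def essential_bits_per_symbol(N: int, p1: float, delta: float) -> float:
    """S_delta / N in bits for N independent binary symbols with P(1) = p1.

    S_delta = log2 of the size of the smallest set of length-N outcomes whose
    total probability is at least 1 - delta. Assumes 0 < p1 < 0.5 and N up to
    about 1000 (beyond that the per-sequence probabilities underflow).
    """
    if not 0 < p1 < 0.5 or not 0 < delta < 1:
        raise ValueError("need 0 < p1 < 0.5 and 0 < delta < 1")
    target = 1.0 - delta
    covered = 0.0  # probability collected so far
    size = 0       # number of outcomes collected so far (exact integer)
    # A sequence with k ones has probability p1**k * (1 - p1)**(N - k), which falls
    # as k grows, so k = 0, 1, 2, ... visits the most probable outcomes first.
    for k in range(N + 1):
        log_seq = k * math.log(p1) + (N - k) * math.log(1 - p1)
        group = math.comb(N, k)  # number of sequences with exactly k ones
        group_prob = math.exp(math.log(group) + log_seq)
        if covered + group_prob >= target:
            # Only part of this group is needed to reach 1 - delta.
            size += math.ceil((target - covered) / math.exp(log_seq))
            break
        covered += group_prob
        size += group
    return math.log2(size) / N

S = -(0.1 * math.log2(0.1) + 0.9 * math.log2(0.9))  # about 0.469 bits
for N in (10, 100, 1000):
    row = [round(essential_bits_per_symbol(N, 0.1, d), 3) for d in (0.01, 0.1, 0.5)]
    print(N, row)
```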
Figure 1: A plot of $S_\delta/N$ (vertical axis) vs $\delta$ (horizontal axis) for various $N$.

3 Application of information theory

Suppose a gas of $N$ point particles is initially contained in a volume $V$, and suppose we know the position of each particle. Divide the volume into $10^6$ small volumes $\delta v$, and for each of them assign a probability $p_i$ of finding a gas particle within it. Thus $p_i = n_i/N$, where $n_i$ is the number of particles in cell $i$; for an even distribution $n_i = N/10^6$. The entropy is

$$S = \sum_{i=1}^{10^6} p_i \ln(1/p_i) = \sum_{i=1}^{10^6} \frac{n_i}{N} \ln\frac{N}{n_i}.$$

In the case of an even distribution,

$$S = \sum_{i=1}^{10^6} \frac{N/10^6}{N} \ln\frac{N}{N/10^6} = \ln(10^6).$$

Note that the entropy depends on the measurement scale. If the cell volume is decreased by a factor of $10^3$ (i.e. the accuracy in the measurement of a particle's position increases), there are $10^9$ cells and the entropy increases to $S = \ln(10^9)$. For macroscopic systems, because of the large number of particles, we most often use relative rather than absolute measures of entropy.

Now allow the particles to move. The state of the system is determined by the distribution of particles among the cells. We assume that a particle is equally likely to move into any box in a given time step. There are $10^6$ configurations with all particles in one box.
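The cell-counting argument can be checked with a few lines of code. This sketch groups cells by occupancy so that $10^9$ cells can be handled without building a list; the function name `entropy_from_histogram` is hypothetical, and the particle number $N = 10^{12}$ is an assumed value chosen so that both the $10^6$-cell and the $10^9$-cell grids can be filled evenly. It also evaluates the all-in-one-box and two-box cases discussed next, and counts the ways of choosing the occupied boxes with `math.comb`.

```python
import math

def entropy_from_histogram(occupancy_hist: dict, N: int) -> float:
    """S = sum_i (n_i/N) ln(N/n_i), with cells grouped by occupancy.

    occupancy_hist maps an occupancy n to the number of cells holding exactly
    n particles; empty cells (n = 0) contribute nothing to S.
    """
    if sum(n * cells for n, cells in occupancy_hist.items()) != N:
        raise ValueError("occupancies must add up to N particles")
    return sum(cells * (n / N) * math.log(N / n)
               for n, cells in occupancy_hist.items() if n > 0)

N = 10**12  # assumed particle number, divisible by 10**6 and by 10**9
print(entropy_from_histogram({N // 10**6: 10**6}, N), math.log(10**6))  # even, 10^6 cells
print(entropy_from_histogram({N // 10**9: 10**9}, N), math.log(10**9))  # even, 10^9 cells
print(entropy_from_histogram({N: 1}, N))                                 # all in one box: 0.0
print(entropy_from_histogram({N // 2: 2}, N), math.log(2))               # two boxes, equal split

# Ways of choosing which boxes are occupied among 10**6 boxes.
print(math.comb(10**6, 1))  # 1000000
print(math.comb(10**6, 2))  # 499999500000, about 5e11
```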
In this case the relative entropy is

$$S(\text{all in 1 box}) = \sum_{i=1}^{10^6} p_i \ln(1/p_i) = 0.$$

This is because, with $N = 10^6$ particles all in one box, $p_1 = n_1/10^6 = 1$ since $n_1 = 10^6$, and $p_i \ln(1/p_i) = 0$ both when $p_i = 1$ and (in the limit) when $p_i = 0$.

Now suppose the particles are put in 2 boxes. The number of ways of choosing the two boxes is

$$\binom{10^6}{2} = \frac{10^6!}{2!\,(10^6-2)!} \approx 5\times10^{11}.$$

With the particles split equally between the two boxes, the entropy is

$$S(\text{all in 2 boxes}) = \tfrac{1}{2}\ln 2 + \tfrac{1}{2}\ln 2 = \ln 2.$$

There are thus about $5\times10^{11}$ such distributions out of $10^6$ possible boxes. The probability is P = [5x ].

Suppose instead that the particles distribute almost equally, with half of the boxes having one particle fewer and half having one particle more than the average. The number of such configurations is

$$\binom{10^6}{10^6/2},$$

and each of these configurations has entropy close to $\ln(10^6)$. The result of all this is that we started the system with entropy 0, and the probability that it moves to a higher-entropy configuration is $P = 1 - \varepsilon \approx 1$, where $\varepsilon$ is one over an enormous number of configurations. It is overwhelmingly probable that, as time passes, a macroscopic system will increase in entropy until it reaches a maximum. This provides an explanation (if not a proof) of the second law of thermodynamics and the foundation of the maximum entropy principle.

4 Example - Texas armadillos and Dos Equis

We cannot explore the application of maximum entropy in detail, but can give an illustrative example [paraphrased from Jaynes - Maximum Entropy and Bayesian Methods]. We classify Texas armadillos according to whether they are left-handed and whether they drink Dos Equis beer (Figure 2). Suppose observation establishes that 3/4 of the armadillos in Texas are left-handed and 3/4 drink Dos Equis. We fill in the data in Table 1.

Figure 2: A Dos Equis drinking Armadillo

Table 1: Probability table for Texas Armadillos

Beer        Left Handed   Right Handed   Probability
Dos Equis   $p_{11}$      $p_{12}$       3/4
Other       $p_{21}$      $p_{22}$       1/4
Total       3/4           1/4            1

Armadillos come in quantized units, so for a total of $N$ armadillos we have probabilities $p_{ij} = N_{ij}/N$, where $N_{ij}$ is the number of armadillos in cell $(i, j)$ of the table. The row and column totals lead to the equations

$$p_{11} + p_{12} = 3/4, \qquad p_{21} + p_{22} = 1/4, \qquad p_{11} + p_{21} = 3/4, \qquad p_{12} + p_{22} = 1/4.$$

Only three of these are independent. Solving them with $p_{22} = q$ puts the table in the form

$$\begin{pmatrix} 0.5 + q & 0.25 - q \\ 0.25 - q & q \end{pmatrix}.$$

We find that $p_{12} = p_{21}$. It can also be shown that, when assigning probabilities, maximizing the entropy is the only choice that does not introduce correlations between the variables: the maximum-entropy table has each $p_{ij}$ equal to the product of its row and column totals, which here gives $q = (1/4)(1/4) = 1/16$.

The number of armadillos in Texas is large, so there is a large number of possible tables. We count the ways of realizing each table with the multinomial coefficient

$$W = \frac{N!}{N_{11}!\,N_{12}!\,N_{21}!\,N_{22}!}.$$

As an example, choose $N = 4$. Since every $N_{ij} = 4p_{ij}$ must be a non-negative integer, $q$ can only be $0$ or $1/4$, so there are just two possible solutions.
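The two $N = 4$ solutions can be enumerated directly. The sketch below (the function name `armadillo_tables` is hypothetical) parametrizes every table consistent with Table 1 by $k = N_{22} = Nq$, computes the multiplicity $W$ from the multinomial formula with exact integers, and reports the entropy in base 10, which is the base that reproduces the value 0.45 quoted for Solution 1.

```python
import math
from fractions import Fraction

def armadillo_tables(N: int):
    """Enumerate all 2x2 count tables with the margins of Table 1 for N armadillos.

    With N22 = k (so q = k/N) the margins force N11 = N/2 + k and
    N12 = N21 = N/4 - k, so k runs from 0 to N/4.
    """
    if N % 4:
        raise ValueError("N must be a multiple of 4 so that the margins are whole armadillos")
    tables = []
    for k in range(N // 4 + 1):
        counts = (N // 2 + k, N // 4 - k, N // 4 - k, k)  # (N11, N12, N21, N22)
        W = math.factorial(N)
        for c in counts:
            W //= math.factorial(c)  # exact: W = N! / (N11! N12! N21! N22!)
        S = -sum((c / N) * math.log10(c / N) for c in counts if c > 0)
        tables.append((Fraction(k, N), counts, W, S))
    return tables

tables = armadillo_tables(4)
total = sum(W for _, _, W, _ in tables)
for q, counts, W, S in tables:
    print(f"q = {q}: counts {counts}, W = {W}, S = {S:.2f}, share = {W / total:.2f}")
# q = 0: counts (2, 1, 1, 0), W = 12, S = 0.45, share = 0.75
# q = 1/4: counts (3, 0, 0, 1), W = 4, S = 0.24, share = 0.25
```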
Solution 1: $q = 0$

$$\frac{1}{4}\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$$

This has multiplicity $W = 12$ and entropy $S = -\sum p_i \log_{10} p_i \approx 0.45$.

Solution 2: $q = 1/N = 1/4$

$$\frac{1}{4}\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$$

This has multiplicity $W = 4$ and entropy $S = -\sum p_i \log_{10} p_i \approx 0.24$.

The solution with the greater entropy occurs 75% of the time ($12/(12 + 4) = 12/16$).

Now suppose we change the analysis by introducing a connection between drinking Dos Equis and handedness (a gene, perhaps). This introduces a constraint and a correlation. The solution can be developed as above, and the results would then be iterated in a Bayesian approach with the above prior.
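With only $N = 4$ armadillos the counts cannot reach the uncorrelated table, but for larger $N$ they can. Assuming $N$ is a multiple of 16 (so that $q = 1/16$ gives whole armadillos), the sketch below finds the table with the largest multiplicity $W$, using `math.lgamma` to avoid huge factorials, and separately maximizes $S$ over continuous $q$ by a simple grid search (a deliberately crude method chosen for transparency). The function names are hypothetical. Both should land on $q = 1/16$, i.e. $p_{11} = 9/16 = (3/4)(3/4)$, the table with no correlation between beer and handedness.

```python
import math

def log_multiplicity(N: int, k: int) -> float:
    """ln W for the table with N22 = k; lgamma(n + 1) = ln n! avoids huge factorials."""
    counts = (N // 2 + k, N // 4 - k, N // 4 - k, k)
    return math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)

def most_probable_q(N: int) -> float:
    """q = k/N of the table with the largest multiplicity W (N a multiple of 4)."""
    best_k = max(range(N // 4 + 1), key=lambda k: log_multiplicity(N, k))
    return best_k / N

def max_entropy_q(steps: int = 20_000) -> float:
    """Crude grid search for the q in [0, 1/4] maximizing S = -sum p ln p."""
    def entropy(q: float) -> float:
        ps = (0.5 + q, 0.25 - q, 0.25 - q, q)
        return -sum(p * math.log(p) for p in ps if p > 0)
    return max((0.25 * i / steps for i in range(steps + 1)), key=entropy)

for N in (16, 160, 1600):
    print(N, most_probable_q(N))  # 0.0625 = 1/16 each time
print(max_entropy_q())             # 0.0625, i.e. p11 = 9/16 = (3/4) * (3/4)
```

The agreement is not a coincidence: for large $N$, Stirling's approximation gives $\ln W \approx N S$ (with $S$ in natural units), so the most numerous table and the maximum-entropy table coincide.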