Maximum Entropy - Lecture 8


1 Introduction

In the previous discussions we were not particularly concerned with selecting the initial prior probability to use in Bayes' theorem, because in most cases the result is insensitive to the initial values. This is not always the case, however, and we need a way of selecting initial probabilities among many possible choices. For this we can use the principle of maximum entropy, which comes from information theory. Information entropy is a measure of the uncertainty in a probability distribution. A data stream is an information transfer to which we can assign a set of possible probability distributions. The principle of maximum entropy states that, among the possible probability distributions $\{p_i\}$, the one that occurs is the one that maximizes the entropy.

As an example, suppose there are 12 balls, all but one of equal weight. You have a balance as an instrument and wish to determine the best way to find the odd ball. This is a problem in optimization and is solved by choosing the most probable path.

2 Review of information theory

Shannon's theorem states that the information contained in a probability $p_i$ belonging to a set $\{p_i\}$ is

$$I = \ln(1/p_i)$$

The entropy of the ensemble is

$$S = \sum_i p_i \ln(1/p_i)$$

As previously noted, the logarithm can be taken in any base, but in information theory it is almost always base 2, since information is coded in bits. Thus $N$ random variables, each with entropy $S$, can be compressed without loss of information into $NS$ or more bits.

Consider an ensemble of $N$ independent, identically distributed random variables. The outcome $x = (x_1, \ldots, x_N)$ most likely belongs to a subset $A_x$ having $2^{NS}$ members, each with probability close to $1/2^{NS}$. This is due to the equipartition of probabilities. Thus as $N \to \infty$ the number of bits per outcome required to specify an outcome approaches $S$ (i.e., $S_\delta/N \to$ constant as $N \to \infty$). Here $S_\delta$ refers to the smallest subset of outcomes that have the highest probabilities, that is, the smallest subset whose total probability is at least $1 - \delta$. This is shown in Figure 1.
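The definitions of information and entropy above can be tried out numerically. The following is a minimal sketch, not part of the original lecture: the function names and the biased-coin example (probability 0.9 of heads, 1000 tosses) are assumptions chosen only for illustration. It computes $S$ in bits and compares the roughly $NS$ bits needed for a typical outcome with the $N$ bits of the uncompressed sequence.

```python
import math

def information(p, base=2.0):
    """Information content log(1/p) of a single outcome of probability p."""
    return math.log(1.0 / p, base)

def entropy(probs, base=2.0):
    """Entropy S = sum_i p_i log(1/p_i); terms with p_i = 0 contribute 0."""
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Hypothetical example (not from the lecture): a biased coin.
coin = [0.9, 0.1]
S = entropy(coin)          # entropy in bits per toss
N = 1000                   # number of independent tosses
typical_bits = N * S       # about N*S bits identify a typical outcome
raw_bits = N * 1.0         # one bit per toss without compression

print(f"S = {S:.4f} bits per toss")
print(f"typical outcomes need about {typical_bits:.0f} of {raw_bits:.0f} bits")
```

A fair coin gives $S = 1$ bit, so no compression is possible. The more uneven the distribution, the smaller $S$ and the fewer bits a typical outcome requires.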

Figure 1: A plot of $S_\delta/N$ (vertical axis) vs $\delta$ (horizontal axis) for various $N$

3 Application of information theory

Consider a gas of $N$ point particles, all initially contained in a volume $V$, and suppose we know the position of each particle. Divide the volume into $10^6$ small volumes $\delta v$, and assign to each a probability $p_i$ of finding a gas particle within it, so that $p_i = n_i/N$. For an even distribution $n_i = N/10^6$. The entropy is

$$S = \sum_{i=1}^{10^6} p_i \ln(1/p_i) = \sum_{i=1}^{10^6} (n_i/N) \ln(N/n_i)$$

In the case of an even distribution,

$$S = \sum_{i=1}^{10^6} \frac{N/10^6}{N} \ln\left(\frac{N}{N/10^6}\right) = \ln(10^6)$$

Note that the entropy depends on the measurement scale. If the volume size is decreased by a factor of $10^3$ (i.e., the accuracy in the measurement of the particle's position increases), the entropy increases to

$$S = \ln(10^9)$$

For macroscopic systems we most often use relative rather than absolute measures of entropy, due to the large number of particles.

Now allow the particles to move. The state of the system is determined by the distribution of particles among the small volumes (boxes). We assume that a particle is equally likely to move into any box in any given time step. There are $10^6$ configurations with all particles in one box.
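The dependence of the entropy on the measurement scale can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the lecture: the helper names `even_entropy` and `occupancy_entropy` are hypothetical, and natural logarithms are used, as in the formulas of this section.

```python
import math

def even_entropy(num_cells):
    """Entropy (natural log) of an even distribution over num_cells cells."""
    p = 1.0 / num_cells
    # Every cell contributes the same term p*ln(1/p), so the sum is
    # num_cells * p * ln(1/p) = ln(num_cells).
    return num_cells * p * math.log(1.0 / p)

def occupancy_entropy(counts):
    """S = sum_i (n_i/N) ln(N/n_i) for a list of occupation numbers n_i."""
    total = sum(counts)
    return sum((n / total) * math.log(total / n) for n in counts if n > 0)

S_coarse = even_entropy(10**6)   # ln(10^6)
S_fine = even_entropy(10**9)     # ln(10^9), cells 10^3 times smaller

print(f"S(10^6 cells) = {S_coarse:.3f}")
print(f"S(10^9 cells) = {S_fine:.3f}")
print(f"difference    = {S_fine - S_coarse:.3f}  (ln 10^3 = {math.log(10**3):.3f})")

# Small hypothetical occupancy lists: all particles in one cell versus
# an even spread over four cells.
print(occupancy_entropy([5, 0, 0, 0]), occupancy_entropy([3, 3, 3, 3]))
```

Refining the cells by a factor of $10^3$ adds the constant $\ln(10^3)$ to the entropy, which is one reason relative entropies are the more useful quantity. The last line also shows that a fully concentrated configuration has zero entropy.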

With all particles in one box, the relative entropy is

$$S(\text{all in 1 box}) = \sum_{i=1}^{10^6} p_i \ln(1/p_i) = 0$$

This is because the occupied box has $p_1 = n_1/10^6 = 1$, since $n_1 = 10^6$, so its term is $1 \cdot \ln(1) = 0$, while every empty box contributes $p_i \ln(1/p_i) = 0$ when $p_i = 0$.

Now suppose the particles are put in 2 boxes. The number of distributions is

$$\binom{10^6}{2} = \frac{10^6!}{2!\,(10^6 - 2)!}$$

The entropy in this case is

$$S(\text{all in 2 boxes}) = \tfrac{1}{2}\ln(2) + \tfrac{1}{2}\ln(2) = \ln(2)$$

There are approximately $5 \times 10^{11}$ such distributions among the $10^6$ possible boxes. The probability is $P = [5 \times \ldots]$.

Suppose the particles distribute almost equally, with half the boxes having one less particle and half having one more particle. The number of configurations is the number of ways of choosing which half of the boxes holds the extra particle,

$$\binom{10^6}{10^6/2}$$

Each of these configurations has an entropy of approximately $\ln(10^6)$. The result of all this is that we started a system with entropy 0, and the probability that it moves to a configuration of higher entropy is

$$P = 1 - (1/\ldots) \approx 1$$

It is overwhelmingly probable that as time passes a macroscopic system will increase in entropy to reach a maximum. This provides an explanation (if not a proof) of the 2nd law of thermodynamics and the foundation of the maximum entropy principle.

4 Example - Texas armadillos and Dos Equis

We cannot explore the application of maximum entropy in detail, but we can give an illustrative example. We look at Texas armadillos classified by whether they are left-handed and whether they drink Dos Equis beer (Figure 2). [paraphrased from Jaynes - Maximum Entropy and Bayesian Methods]

Suppose observation establishes that 3/4 of the armadillos in Texas are left-handed and 3/4 drink Dos Equis. We fill in the data in Table 1.

Figure 2: A Dos Equis drinking Armadillo

Table 1: Probability table for Texas Armadillos

Beer         Left Handed   Right Handed   Probability
Dos Equis    p_11          p_12           3/4
Other        p_21          p_22           1/4
Total        3/4           1/4            1

Armadillos come in quantized units, so for a total of $N$ armadillos we have probabilities $p_{ij} = N_{ij}/N$, which leads to the equations

$$p_{11} + p_{12} = 3/4$$
$$p_{21} + p_{22} = 1/4$$
$$p_{11} + p_{21} = 3/4$$
$$p_{12} + p_{22} = 1/4$$

Solve these equations letting $p_{22} = q$, and then put the table in the form

$$\begin{pmatrix} 0.5 + q & 0.25 - q \\ 0.25 - q & q \end{pmatrix}$$

We find that $p_{12} = p_{21}$. It can also be shown that, when assigning probabilities, maximizing entropy is the only choice that does not introduce correlations between the variables. The number of armadillos in Texas is large, so there is a large number of possible probability tables. We count the number of ways each table can occur using the multinomial coefficient

$$W = \frac{N!}{N_{11}!\, N_{12}!\, N_{21}!\, N_{22}!}$$

As an example, choose $N = 4$. There are 2 possible solutions:

Solution 1: $q = 0$

$$\frac{1}{4}\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$$

This has multiplicity $W = 12$ and entropy $S = -\sum_i p_i \log_{10}(p_i) = 0.45$ (the quoted value corresponds to a base-10 logarithm).

Solution 2: $q = 1/N$

$$\frac{1}{4}\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$$

This has multiplicity $W = 4$ and entropy $S = -\sum_i p_i \log_{10}(p_i) = 0.24$.

The solution with the greatest entropy occurs 75% of the time ($12/16$).

Now suppose we change the analysis by introducing a connection between drinking Dos Equis and handedness, a gene perhaps. This introduces a constraint and a correlation. The solution can be developed as above, and the result would then be iterated in a Bayesian approach with the above prior.
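The $N = 4$ counting above can be reproduced with a short script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the entropies of the integer tables use base-10 logarithms to match the value 0.45 quoted for Solution 1, and the final grid search over $q$ is an added illustration of the large-$N$ limit rather than something worked out in the lecture.

```python
import math
from fractions import Fraction

def tables(N):
    """Yield integer tables (N11, N12, N21, N22) whose row and column
    marginals are both (3N/4, N/4)."""
    if (3 * N) % 4 != 0:
        return
    big = 3 * N // 4                 # Dos Equis row total = left-handed column total
    small = N - big                  # "Other" row total = right-handed column total
    for n22 in range(small + 1):
        n12 = small - n22            # right-handed column sums to N/4
        n21 = small - n22            # "Other" row sums to N/4
        n11 = big - n12              # Dos Equis row sums to 3N/4
        if n11 >= 0:
            yield (n11, n12, n21, n22)

def multiplicity(counts):
    """Multinomial coefficient W = N! / (N11! N12! N21! N22!)."""
    w = math.factorial(sum(counts))
    for n in counts:
        w //= math.factorial(n)
    return w

def entropy(counts, base=math.e):
    """S = -sum p ln p with p = n/N; empty cells contribute 0."""
    N = sum(counts)
    return sum((n / N) * math.log(N / n, base) for n in counts if n > 0)

results = [(t, multiplicity(t), entropy(t, 10)) for t in tables(4)]
total_W = sum(w for _, w, _ in results)
shares = [Fraction(w, total_W) for _, w, _ in results]
for (t, w, s), share in zip(results, shares):
    print(t, "W =", w, "S =", round(s, 2), "share =", share)

def table_entropy(q):
    """Entropy (natural log) of the continuous table parametrized by q."""
    cells = [0.5 + q, 0.25 - q, 0.25 - q, q]
    return sum(p * math.log(1.0 / p) for p in cells if p > 0)

# Grid search over 0 <= q <= 1/4 for the entropy-maximizing table.
best_q = max((k / 1600 for k in range(401)), key=table_entropy)
print("entropy is largest near q =", best_q)
```

For $N = 4$ the script lists the two tables with $W = 12$ and $W = 4$, so the higher-entropy table accounts for $12/16$ of the arrangements. On the grid, the entropy of the continuous table peaks at $q = 1/16 = (1/4)(1/4)$, the table in which handedness and beer preference are independent, consistent with the statement that maximizing entropy introduces no correlations.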


Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability

Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability Lecture Notes 1 Basic Probability Set Theory Elements of Probability Conditional probability Sequential Calculation of Probability Total Probability and Bayes Rule Independence Counting EE 178/278A: Basic

More information

Thermodynamics Second Law Entropy

Thermodynamics Second Law Entropy Thermodynamics Second Law Entropy Lana Sheridan De Anza College May 9, 2018 Last time entropy (macroscopic perspective) Overview entropy (microscopic perspective) Reminder of Example from Last Lecture

More information

Lecture 5: Asymptotic Equipartition Property

Lecture 5: Asymptotic Equipartition Property Lecture 5: Asymptotic Equipartition Property Law of large number for product of random variables AEP and consequences Dr. Yao Xie, ECE587, Information Theory, Duke University Stock market Initial investment

More information

Endogenous Information Choice

Endogenous Information Choice Endogenous Information Choice Lecture 7 February 11, 2015 An optimizing trader will process those prices of most importance to his decision problem most frequently and carefully, those of less importance

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Information Theory and Coding Techniques: Chapter 1.1. What is Information Theory? Why you should take this course?

Information Theory and Coding Techniques: Chapter 1.1. What is Information Theory? Why you should take this course? Information Theory and Coding Techniques: Chapter 1.1 What is Information Theory? Why you should take this course? 1 What is Information Theory? Information Theory answers two fundamental questions in

More information

arxiv: v1 [physics.data-an] 10 Sep 2007

arxiv: v1 [physics.data-an] 10 Sep 2007 arxiv:0709.1504v1 [physics.data-an] 10 Sep 2007 Maximum Entropy, Time Series and Statistical Inference R. Kariotis Department of Physics University of Wisconsin Madison, Wisconsin 53706 ABSTRACT: A brief

More information

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16 EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt

More information

Outline Review Example Problem 1. Thermodynamics. Review and Example Problems: Part-2. X Bai. SDSMT, Physics. Fall 2014

Outline Review Example Problem 1. Thermodynamics. Review and Example Problems: Part-2. X Bai. SDSMT, Physics. Fall 2014 Review and Example Problems: Part- SDSMT, Physics Fall 014 1 Review Example Problem 1 Exponents of phase transformation : contents 1 Basic Concepts: Temperature, Work, Energy, Thermal systems, Ideal Gas,

More information

Homework 4 Solution, due July 23

Homework 4 Solution, due July 23 Homework 4 Solution, due July 23 Random Variables Problem 1. Let X be the random number on a die: from 1 to. (i) What is the distribution of X? (ii) Calculate EX. (iii) Calculate EX 2. (iv) Calculate Var

More information

Lecture 2 Binomial and Poisson Probability Distributions

Lecture 2 Binomial and Poisson Probability Distributions Binomial Probability Distribution Lecture 2 Binomial and Poisson Probability Distributions Consider a situation where there are only two possible outcomes (a Bernoulli trial) Example: flipping a coin James

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Pattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes

Pattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lesson 1 5 October 2016 Learning and Evaluation of Pattern Recognition Processes Outline Notation...2 1. The

More information

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive

More information

Bayesian Learning. Bayesian Learning Criteria

Bayesian Learning. Bayesian Learning Criteria Bayesian Learning In Bayesian learning, we are interested in the probability of a hypothesis h given the dataset D. By Bayes theorem: P (h D) = P (D h)p (h) P (D) Other useful formulas to remember are:

More information

Probability - Lecture 4

Probability - Lecture 4 1 Introduction Probability - Lecture 4 Many methods of computation physics and the comparison of data to a mathematical representation, apply stochastic methods. These ideas were first introduced in the

More information

It From Bit Or Bit From Us?

It From Bit Or Bit From Us? It From Bit Or Bit From Us? Majid Karimi Research Group on Foundations of Quantum Theory and Information Department of Chemistry, Sharif University of Technology On its 125 th anniversary, July 1 st, 2005

More information

Statistical Methods for Astronomy

Statistical Methods for Astronomy Statistical Methods for Astronomy Probability (Lecture 1) Statistics (Lecture 2) Why do we need statistics? Useful Statistics Definitions Error Analysis Probability distributions Error Propagation Binomial

More information

How generative models develop in predictive processing

How generative models develop in predictive processing Faculty of Social Sciences Bachelor Artificial Intelligence Academic year 2016-2017 Date: 18 June 2017 How generative models develop in predictive processing Bachelor s Thesis Artificial Intelligence Author:

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information