Column: The Physics of Digital Information - Part 2 [1]

Fred Cohen

In part 1 of this series (Cohen, 2011a), we discussed some of the basics of building a physics of digital information. Assuming, as we have, that science is about causality and that a scientific theory should require that a cause (C) produces an effect (E) via a mechanism (M), written C →M→ E, we explore this general theory of digital systems from the perspective of attributing effects (i.e., traces of activities in digital systems) to their causes. Full details of the current version of this physics are available online [2], and in this article we explore a few more of them. Previous results questioning consensus around common definitions for the field of digital forensics (Cohen, 2010) have led to additional study suggesting that definitions presented before discussion lead to substantial consensus (Cohen, 2012). Thus each item discussed will start with a loose definition and an example.

Definition: A unique history is a single C →M→ E chain that is the only consistent path from the cause to the effect.

For example, suppose we have an imitative copy [3] of an asserted electronic message sent from one party to another. Given that trace and a set of claims about the computers involved, a unique history would demonstrate that there is one and only one party who could have produced the resulting trace, using one and only one process, at one and only one time, from one and only one place.

Current state does not always imply unique history. More generally, there are two important rules that are almost always true for the DFE (digital forensic evidence) examiner:

Given initial state and inputs, later outputs and states are known.
Given final state and output, inputs and prior states are not unique.

Digital systems have a finite number of states (settings of the digital values across all of the stored values in the system).
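The two rules above can be sketched with a toy state machine. Everything below (the states, the transition table, the trace) is invented purely for illustration; it stands in for any deterministic digital mechanism.

```python
from itertools import product

# A toy Mealy-style machine (hypothetical, chosen for illustration).
# delta maps (state, input bit) -> (next state, output symbol).
# Several different (state, input) pairs deliberately map to the same
# result, so information about the past is lost as the machine runs.
delta = {
    ("A", 0): ("A", "x"),
    ("A", 1): ("A", "x"),
    ("B", 0): ("A", "x"),
    ("B", 1): ("B", "y"),
}

def run(state, inputs):
    # Rule 1: initial state + inputs determine all later states and outputs.
    outputs = []
    for bit in inputs:
        state, out = delta[(state, bit)]
        outputs.append(out)
    return state, "".join(outputs)

def consistent_histories(final_state, outputs):
    # Rule 2: the same final state and outputs can arise from many
    # (initial state, input sequence) pairs -- history is not unique.
    n = len(outputs)
    return [
        (start, bits)
        for start in ("A", "B")
        for bits in product((0, 1), repeat=n)
        if run(start, bits) == (final_state, outputs)
    ]

print(run("A", (1, 0)))                 # forward: fully determined
print(consistent_histories("A", "xx"))  # backward: six consistent histories
```

Replacing the final call with a real observed trace captures the examiner's predicament: every listed history is equally consistent with the evidence.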
The mechanisms that manipulate digital data are commonly called finite state machines or automata (FSMs), often detailed in terms of Moore machines (Moore, 1956) or Mealy machines (Mealy, 1955). In such machines, current state and input lead to the next state and output of the machine in a unique way. That is, given the initial state and sequence of inputs, the final state and sequence of outputs are uniquely determined. Thus time transforms the artifice as it moves forward. But in digital forensics, we generally don't start with causes and try to predict effects. Rather, we start with effects and seek to identify their causes. In modern computers it is almost never possible to run time backwards given a set of traces and identify a unique history that led to the traces found.

Definition: Convergence asserts that, as a mechanism transforms inputs and internal states into outputs and subsequent internal states over time, different inputs produce identical outputs. Divergence asserts the opposite: for the same input, different outputs are detectable.

For example, if we test rolling a rock down a hill repeatedly and, no matter how tightly we control the process, there are slightly different outcomes each time, this would be divergence. But if we ran an FSM forward again and again with different inputs and initial states each time, and got identical outputs and final states, this would be convergence.

Digital space converges while physical space diverges with time. The digital artifice over time is, in general, a many-to-one transform. Furthermore, inverting time in an FSM produces potentially enormous classes of possible prior states and inputs, and determining them precisely is too complex to be done for nontrivial systems (Backes, Köpf, & Rybalchenko, 2009). This is at odds with the current model of the natural world, in that physical space is generally believed to have an essentially infinite number of possible states and to increase in entropy over time, so that order is always reduced.

Notes:
[1] This editorial piece is extracted and modified from Cohen (2011c).
[2] http://infophys.com/
[3] Imitative copy := a reproduction of both the form and content of a record. This is what is typically available, and is called an "exact", "bit image", or "forensically sound" copy in digital forensics. See Cohen (2011b).
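Convergence can be watched directly by driving every possible state of a small machine forward and counting the distinct successors. The 8-bit state space and the OR-with-shift update below are arbitrary choices for illustration, not anything taken from the article.

```python
# Start from every possible 8-bit state and apply a many-to-one update.
# Because distinct states can map to the same successor, the set of
# reachable states can only shrink as time moves forward.
def step(s):
    # An arbitrary lossy update: OR-ing in a shifted copy discards
    # information about which of the contributing bits were set.
    return (s | (s >> 1)) & 0xFF

states = set(range(256))            # all 256 possible states
sizes = [len(states)]
for _ in range(8):
    states = {step(s) for s in states}
    sizes.append(len(states))

print(sizes)  # non-increasing: the digital state space converges in time
```

Running time backwards from one of the surviving states would mean choosing among all the predecessors that collapsed into it, which is exactly the examiner's problem.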
No matter how tightly we control a physical experiment, there will always be a level of granularity at which outputs are differentiable. The difference between the digital and physical spaces is greatly influenced by the fact that the digital space has only finite granularity in time and space, as was discussed in the first article in this series.

Definition: Equivalent machines are possibly different machines that, from an external perspective, behave identically with respect to a defined set of external data.

For example, different compilers may transform the same program into different binary executable codes that work slightly differently even though they produce the same outputs from the same inputs.

Many FSMs are equivalent. An unlimited number of different FSMs may produce the same output sequence from the same or different input sequences. For example, at the level of computer programs in common use, an editor, a digital recorder, or a user program may produce the same outputs from different inputs.

With incomplete traces, we cannot uniquely determine prior states and inputs. To the extent that traces are more or less complete, we may or may not be able to uniquely determine or bound the set of programs that might have produced them. We may not even be able to determine how complete the traces we have are.

Definition: A lossy transform is a mapping from input to output that cannot be reversed to produce a unique input. That is, it is a many-to-one transform.

For example, image files are often compressed using the JPEG lossy compression algorithm (Hamilton, 1992). The result trades space for quality.

Hash functions and digital signatures are lossy and thus not unique. Any transform that produces an output space of a predefined size from an input space of a larger size is lossy and thus not unique. As an example, an MD5 or SHA hash of a file does not uniquely identify that file. There are in fact an unlimited number of other files that would produce the same hash value. Being careful, note that this is not an infinite number of files, only an unlimited number of them. To see this, suppose we generate file after file of length one bit more than the length of the hash. Since the length is one bit more, there are twice as many files of that length as there are hash values. If we create one after another of these files, eventually we will exhaust all of the possible values of the hash function, and as soon as we reach one more unique input file than the number of possible hash values, we are guaranteed that two different input files will have identical hash values.
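The notion of equivalent machines can be made concrete with two structurally different programs that are externally indistinguishable. Both machines below are invented for illustration: a two-state parity FSM and a stateless reimplementation that recomputes everything at each step.

```python
from itertools import product

# Two deliberately different machines that both emit the running parity
# of a bit stream. Internally they share nothing; externally they are
# identical with respect to every input tested.

def machine_a(bits):
    # A two-state FSM: the state is the parity of the bits seen so far.
    state, outs = 0, []
    table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
    for b in bits:
        state = table[(state, b)]
        outs.append(state)
    return outs

def machine_b(bits):
    # A structurally different "machine": no carried state at all;
    # it recomputes the parity from scratch at every position.
    return [sum(bits[: i + 1]) % 2 for i in range(len(bits))]

# Exhaustively check all input sequences up to length 10.
assert all(
    machine_a(bits) == machine_b(bits)
    for n in range(11)
    for bits in product((0, 1), repeat=n)
)
print("externally indistinguishable on all tested inputs")
```

An examiner who sees only the output stream cannot tell which of the two produced it, which is one reason traces rarely pin down the generating program.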
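The pigeonhole argument above can be run directly. The sketch below truncates SHA-256 to 16 bits purely to make collisions cheap to find; the same counting argument applies to MD5 or SHA at full length, just at astronomically larger scale.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    # Stand-in for any fixed-size hash: keep only 16 bits of SHA-256.
    return hashlib.sha256(data).digest()[:2]

# Feed in distinct 24-bit inputs. With 2**24 possible inputs and only
# 2**16 hash values, two inputs must eventually share a hash value.
seen = {}
for i in count():
    data = i.to_bytes(3, "big")
    h = tiny_hash(data)
    if h in seen:
        print(f"collision: inputs {seen[h].hex()} and {data.hex()} "
              f"both hash to {h.hex()}")
        break
    seen[h] = data
```

Finding a collision quickly here does not mean real hashes are weak in practice; it only illustrates that a fixed-size digest cannot uniquely identify an arbitrarily large input.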
This does not make such hashes useless in digital forensics, but it does mean that they do not uniquely identify an input or certify that a produced file is unaltered from its initial creation.

A summary of properties

There are many other properties of digital systems and the physics of digital information. A summary extracted from the book chapter identified above is included here to expand thinking about these issues.

Digital World | Physical World
Finite time granularity (the clock) | Infinite time granularity
Finite space granularity (the bit) | Infinite space granularity
Observation without alteration | No observation w/out alteration
Exact copies, original intact | No exact copy, original changed
Theft without direct loss | Theft produces direct loss
Finite (fast) rate of movement | No locality (entanglement)
An artifice created by people | A reality regardless of people
Finite State Machines (FSMs) | Physics and field equations
Homing sequences may exist | No perfect repeatability
Forward time perfect prediction | Forward time non-unique
Backward time non-unique | Backward time unique
Digital space converges in time | Physical space diverges in time
The results are always bits | The results are always continua
Results are always "Exact" | Results never perfectly known
Time is a partial ordering | Time is real (location)
Errors accumulate | Errors are local
Representation limits accuracy | Reality is what it is
Precision may exceed accuracy | Precision is potentially infinite
Forgery can be perfect | Forgery cannot be perfect
DFE is almost always latent | Some evidence is latent
DFE is trace but not transfer | Traces come from transfers
DFE is circumstantial | Evidence is circumstantial
DFE is hearsay | Evidence is physical
DFE cannot place a person at a place at a time | Evidence may put an individual at a place at a time
DFE can show consistency or inconsistency only | Evidence can show more than just consistency
Probability is dubious | Probability is often usable
Content has information density | No defined density limits
Content density variable | Content density not controlled
Content perfectly compressible | No perfect compression
Digital signatures, fingerprints, etc. generated from content | Body (phenome) generated from DNA (genome)
Content meaning is dictated by context | No universal theory of meaning, but physicality exists regardless
Context tends to be global and dramatically changes meaning | Context tends to be local and incrementally changes meaning
FSMs come to a conclusion | "Eats shoots and leaves"
Cognitive limits from program | Cognitive limits from physiology
Hardware fault models from computer engineering | Hardware fault models from physics
Time and space tradeoffs known | Tradeoffs unclear
Near perfect virtualization and simulation possible | No virtualization
Many nearly or equivalent FSMs | The uncertainty principle
Undecidable problems | Nothing known as "unthinkable"
Computational complexity limits computations | No well understood limits on new ideas
Everything is decidable | Many things are not decidable
Consistency is guaranteed | Consistency is possible
Completeness is guaranteed | Completeness is possible
Consistency AND completeness | Consistency OR completeness
Time limits on achievable results | Time limits unknown
Complexity-based designs | Complexity not determinant
Fault tolerance by design | Normally not fault tolerant
Accidental assumption violations | Assumptions non-violable
Intentional assumption violations | Assumptions non-violable
Discontinuous space | Continuous space
Discontinuous time | Continuous time
Minor differences amplified near discontinuities | Differences retain fidelity
Major differences suppressed away from discontinuities | Differences retain fidelity
Identical use of an interface may produce different results | No such thing as identical, each thing is unique
Ordering may be reversed | Ordering subject to light time
Value sorts may be reversed | Value sorts remain consistent
Actuator-sensor loop errors | Interference based errors
Sensors/actuators limited in physical properties | All physical properties present

Table 1: Summary of Information Physics

A final comment

There is a lot to learn about the physics of digital information, and from the perspective of digital forensics, this is the sort of knowledge that is increasingly necessary to understanding what you are doing when you undertake to testify about such matters. I urge you to review the details of the physics in its full richness and with its current limitations, and to draw your own conclusions. Read the chapter cited above, comment on it, prove it is wrong if and where it is, show its limits, and move the field forward.

And I urge you to challenge yourself and others to up your game. In case after case, we encounter self-identified experts who don't understand the basics of how things work and end up testifying with inadequate basis. In many cases their conclusions may be right, but their presentation and the facts they provide may not support them. In other cases, their conclusions are not right at all. At the heart of it all is the lack of attention to the basics of the science that underlies digital forensics. This is a problem we hope to continue to address in this series and this publication.

References

Backes, M., Köpf, B., & Rybalchenko, A. (2009). Automatic Discovery and Quantification of Information Leaks. In Proceedings of the 30th IEEE Symposium on Security and Privacy, May 17-20, 2009, Berkeley, CA. Los Alamitos, CA: IEEE.

Cohen, F. (2010). Toward a Science of Digital Forensic Evidence Examination. IFIP TC11.8 International Conference on Digital Forensics, Hong Kong, China, January 4, 2010. In K.-P. Chow and S. Shenoi (Eds.), Advances in Digital Forensics VI, Springer, pp. 17-36.

Cohen, F. (2011a). Digital Forensic Evidence Examination. Livermore, CA: ASP Press.

Cohen, F. (2011b). Putting the Science in Digital Forensics. Journal of Digital Forensics, Security and Law, 6(1), 7-14.

Cohen, F. (2011c). The Physics of Digital Information. Journal of Digital Forensics, Security and Law, 6(3), 11-16.

Cohen, F. (2012). Update on the State of the Science of Digital Evidence Examination. Submission to the 2012 Conference on Digital Forensics, Security and Law.

Hamilton, E. (1992, September 1). JPEG File Interchange Format, Version 1.02. Milpitas, CA: C-Cube Microsystems. Retrieved from http://www.jpeg.org/public/jfif.pdf

Mealy, G. (1955). A Method for Synthesizing Sequential Circuits. Bell System Technical Journal, 34, 1045-1079.

Moore, E. F. (1956). Gedanken-experiments on Sequential Machines. In Automata Studies. Princeton, NJ: Princeton University Press, pp. 129-153.