Bayesian Methods for Diagnosing Physiological Conditions of Human Subjects from Multivariate Time Series Biosensor Data


Mehmet Kayaalp
Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland
Email: Mehmet.Kayaalp@NIH.gov

In this study, we developed a set of new methods to diagnose physiological conditions (PCs) of human subjects from multichannel biometric time series data obtained through nine biosensors attached to the upper arms of the test subjects. The task is to learn the relationships between the PCs and the biosensor data on the training dataset and to identify the PCs of interest on the test dataset. The goal of this study is to test the predictive performance of two sets of Bayesian network topologies and the effectiveness of a new parameter learning method applied to the problem on the provided dataset. This paper summarizes these methods.

Data

The dataset was collected and made available to the public by BodyMedia, Inc. as the workshop dataset of the Physiological Data Modeling Contest at the International Conference on Machine Learning in 2004 [1]. The entire dataset was deidentified: the identifiers of the human subjects and the original labels of the PCs were replaced with unique integers. The training dataset was provided in 16 columns (see Table 1) and 580,264 cases (temporal records). The test dataset comprises 720,792 cases, whose format follows that of the training dataset except for the columns PC and gender, which are omitted and left to be predicted by the participants for each test case.

Table 1: The Format of the Training Dataset [2]

Line     char1  char2  PC  gender  sensor1   ...  sensor9   userid  sessionid  sessiontime
1        38     0      0   0       0.168662  ...  -0.41966  1       2          0
2        38     0      0   0       0.159785  ...  -0.31002  1       2          60000
...
580264   24     0      0   0       0.754541  ...  -0.30018  32      4064       5400000

[1] The dataset, decomposed into two mutually exclusive subsets (training and test data), is available at the official website of the workshop: http://www.cs.utexas.edu/users/sherstov/pdmc/
[2] For the semantics of char1, char2, and gender, please refer to the section Determining Genders of the Subjects.
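For concreteness, here is a minimal sketch of how the training data might be read into memory. The file name pdmc_train.csv and the exact column labels are assumptions on our part; only the 16-column layout follows Table 1.

```python
# Minimal sketch: load the PDMC training data, assuming a comma-separated
# file named "pdmc_train.csv" (hypothetical name) with the column layout
# of Table 1 and no header row.
import pandas as pd

columns = (
    ["char1", "char2", "PC", "gender"]
    + [f"sensor{i}" for i in range(1, 10)]
    + ["userid", "sessionid", "sessiontime"]
)

train = pd.read_csv("pdmc_train.csv", names=columns, header=None)
print(train.shape)  # expected: (580264, 16)
```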

Dynamic Simple Bayesian Modeling

Dynamic simple Bayesian (DSB) modeling is a Bayesian method for forecasting future outcomes of stochastic processes given a sequence of multivariate time series (Kayaalp, 2003). A DSB model is a dynamic Bayesian network with the following assumptions (see Figure 1):

1. The underlying data-generating stochastic process is strictly stationary, which implies that the parameters of the stochastic process are constant under any time displacement $d$; i.e., $P(X(t_1), \ldots, X(t_n)) = P(X(t_1 + d), \ldots, X(t_n + d))$.
2. All contemporaneous variables of a stochastic process are measured concurrently, and all successive measurements are performed at equidistant time intervals.
3. Temporal dependencies are first-order Markov; that is, the state of a process at time $t_n$ depends only on the state of the same process at time $t_{n-1}$, i.e., $P(X(t_n) \mid X(t_{n-1}), \ldots, X(t_0)) = P(X(t_n) \mid X(t_{n-1}))$.
4. The observations on a given time slice $t$ are conditionally independent given the outcomes at time $t + 1$.
5. All variables are discrete.

[Figure 1: Dynamic Simple Bayesian (DSB) Model with Three Temporal Variables]

In the provided dataset, physiological signals are represented as continuous numbers over a discrete timeline. The conventional approach to using continuous data in Bayesian networks usually requires either a priori discretization of the data or an a priori assumption that the data conform to a particular parametric distribution, such as a Gaussian. These approaches may be optimal under certain conditions; however, they may fail when their (rather strong) premises contradict the underlying distributions.
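Assumption 4 is what makes forecasting tractable: by Bayes' rule it licenses a naive-Bayes-style factorization at each slice. The following display is only a sketch in our own notation, not the paper's stated formula; the transition term implied by Assumption 3 is omitted for brevity:

$$
P\bigl(Y(t+1) = y \mid x_1(t), \ldots, x_m(t)\bigr) \;\propto\; P\bigl(Y(t+1) = y\bigr) \prod_{j=1}^{m} P\bigl(x_j(t) \mid Y(t+1) = y\bigr),
$$

where $x_1(t), \ldots, x_m(t)$ are the observations on slice $t$ and $Y(t+1)$ is the outcome being forecast.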

In this study, the parameters of the Bayesian networks are learned from continuous data. Consider the Bayesian network in Figure 2, where $X$ and $Y$ are continuous and binomial variables, respectively.

[Figure 2: A Bivariate Bayesian Network and the Associated Density Distributions; the plot shows the joint densities $p(X, Y=1)$ and $p(X, Y=2)$ over $X$.]

The area under any segment of a probability density function renders the probability of that segment,

$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} p(x)\,dx, \tag{1}$$

and the probability of $X$ at any given point $x$ is equal to 0 [3]:

$$P(X = x) = \int_{x}^{x} p(x)\,dx = 0. \tag{2}$$

Using continuous data in Bayesian networks has always been problematic. The most important problem is the difficulty of obtaining the integral value of Equation (1) for multivariate distributions. The fact that any point on the real line is a null set may also be construed as a hurdle to parameterizing the test data point in question. To solve the current problem, what we need is the value of $P(Y \mid X = x)$, which is non-zero for a continuous variable $X$ and a categorical variable $Y$. Namely, in the example illustrated in Figure 2, $P(Y = 1 \mid X = x) + P(Y = 2 \mid X = x) = 1$.

Given a test case in which $X = x$, how can we learn $P(Y = y \mid X = x)$ from a finite sample? Our approach is online parameter learning; that is, the parameters of every test case are learned dynamically. Each probability of interest is learned parametrically by combining the values of all training sample points via a sigmoid decay function (sdf), which discounts the contribution of each sample point $x_i$ according to the Euclidean distance between $x_i$ and the test data point $x$ of interest.

[3] In this work, we denote probability density functions with a lowercase $p$ and probability mass functions with an uppercase $P$.

$$N_{y_k \mid x} = \sum_{i=1}^{N} \frac{n(y_k, x_i)}{1 + e^{\,a\,\lvert x_i - x\rvert / \mathrm{scale}(X) \,-\, c}} \tag{3}$$

$$P(y_k \mid x) = \frac{\alpha_k + N_{y_k \mid x}}{\alpha + \sum_{y \in Y} N_{y \mid x}} \tag{4}$$

In Equation (3), $n(y_k, x_i)$ denotes the number of data points in the training sample where the values of variables $Y$ and $X$ are equal to $y_k$ and $x_i$, respectively, and $\mathrm{scale}(X) = \max(X) - \min(X)$. In this study, we set the sdf parameters to $a = 400$ and $c = 6$. In Equation (4), $\alpha_k / \alpha$ denotes the prior probability of $Y = y_k$ given $X = x$, and $\alpha = \sum_i \alpha_i$, where all priors are distributed uniformly, i.e., $\alpha_i = \alpha_k$ for all $i, k$.

The computational complexity of Equation (3) as implemented in this study is $\Theta(n^2)$, where $n$ is the number of unique [4] data points in the training sample. However, the current algorithm may easily be approximated numerically to achieve $\Theta(1)$.

This method enables us to eliminate the discrete-data requirement (Assumption 5) from the list of assumptions without making assumptions about the underlying distributions of the data as strong as those made by other methods (Shachter & Kenley, 1989; Cowell, 1998). The rationale of the method is based on the natural order of continuous data. It takes into account that measurements of a given phenomenon are usually subject to a certain variance, which may be accommodated through the parameters of the sdf.

In this study, the outcomes of interest are categorical; hence, we restrict our implementation to discrete $Y$ variables. If $Y$ were defined on the real line, the two density plots in Figure 2 would form a surface plot. In that case, the one-dimensional Euclidean distance metric $\lvert x_i - x \rvert$ of Equation (3) would be replaced with a two-dimensional one such as $\sqrt{(x_i - x)^2 + (y_j - y)^2}$, and the single summation over $i$ in Equation (3) would be replaced with a double summation over $i$ and $j$. Extending the sdf to an $n$-dimensional metric space is straightforward (Berge, 1962).

[4] Depending on sensor resolutions, actual data do not always reflect the analog nature of the measured signals; rather, certain data points appear quite frequently. For instance, each of the Sensor 3 data points 32.582733 and 32.751022 was recorded more than 1,000 times in the training dataset. The resolutions of Sensors 6 and 8 seem significantly higher than those of the others; thus, computing Equation (3) took much longer for the data of these two sensors than for the data of the others.
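A minimal sketch of this estimator in Python, assuming the training sample is given as paired arrays of $X$ values and class labels (the function and variable names are ours, not the paper's):

```python
import numpy as np

def sdf_posterior(x, xs, ys, classes, a=400.0, c=6.0, alpha_k=1.0):
    """Estimate P(Y = y_k | X = x) per Equations (3) and (4).

    xs, ys  : training sample values of X and Y (1-D arrays, equal length)
    classes : the possible values of Y
    a, c    : sigmoid decay function parameters (the paper uses a=400, c=6)
    alpha_k : uniform per-class prior weight (an assumption on our part)
    """
    scale = xs.max() - xs.min()                 # scale(X) = max(X) - min(X)
    # Sigmoid decay weight for every training point: discounts points far
    # from the query x (the summand of Equation 3). The clip avoids
    # overflow in exp for very distant points (their weight is ~0 anyway).
    z = np.clip(a * np.abs(xs - x) / scale - c, None, 700.0)
    w = 1.0 / (1.0 + np.exp(z))
    # N_{y_k|x}: sdf-weighted count of training points per class.
    N = {k: w[ys == k].sum() for k in classes}
    alpha = alpha_k * len(classes)              # alpha = sum of the priors
    total = alpha + sum(N.values())
    return {k: (alpha_k + N[k]) / total for k in classes}

# Usage on synthetic two-class data: a query near the first cluster
# should yield a posterior dominated by class 1.
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(10, 2, 500), rng.normal(25, 4, 500)])
ys = np.array([1] * 500 + [2] * 500)
print(sdf_posterior(12.0, xs, ys, classes=(1, 2)))
```

With $a = 400$ and $c = 6$, the weight stays near 1 only for points within roughly $c \cdot \mathrm{scale}(X)/a$ (about 1.5% of the data range) of $x$ and decays sharply beyond that, so the estimator behaves like a smoothed nearest-neighbor count.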

Modified Dynamic Simple Bayesian Modeling

DSB models are designed for forecasting (i.e., for predicting future outcomes); however, the current task is diagnosing physiological conditions at time $t$ from time series data that include sensor information at time $t$. Such contemporaneous information is too valuable to ignore; thus, we modified the DSB model by adding contemporaneous variables. The modified DSB model (mdsb) can be conceptualized as a coalescence of a DSB model and a simple Bayesian network model (SB); the latter is also known as a naive Bayes classifier. An SB model is essentially an atemporal model: its random variables are not associated with any temporal semantics, and variable interdependencies do not indicate any temporal transitions. In this study, the following assumptions are made for using the SB approach in the temporal domain:

1. An event at time $t_n$ is independent of any event at an earlier time; that is,
$$P(X(t_n) \mid X(t_{n-1}), \ldots, X(t_0)) = P(X(t_n)). \tag{5}$$
2. All covariates are conditionally independent given the outcome of the variable of interest.

The main difference between the DSB and SB models is that the former contains only transitional (temporal) relations whereas the latter contains only contemporaneous (atemporal) relations. As a combination of the two, an mdsb contains both transitional and contemporaneous relations.

[Figure 3: An mdsb Model]

The SB portion of an mdsb model comprises a set of variables $X(t_i)$ and a set of relations contemporaneous to $t_i$ (see Figure 3). The values of the variables of interest, denoted by unshaded nodes, were unobserved (i.e., unknown). In the current study, transitional relations (represented by the dashed arcs in Figure 3) between successive unknown variables were not implemented.
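As a sketch only, one per-slice scoring rule consistent with the description above (our reading of Figure 3, offered as an assumption rather than the paper's stated factorization) treats the unknown PC at time $t$ as the naive-Bayes class and gives each sensor its own value at $t-1$ as a transitional parent:

$$
P\bigl(\mathrm{PC}(t) = y \mid \mathbf{x}(t), \mathbf{x}(t-1)\bigr) \;\propto\; P(y)\,\prod_{j=1}^{9} p\bigl(x_j(t) \mid x_j(t-1),\, y\bigr),
$$

where $\mathbf{x}(t)$ collects the nine sensor readings at time $t$.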

In the general case of mdsb, variables with unknown values may also have a complete set of first-order Markov dependencies, as do their observable contemporaneous counterparts.

Determining Genders of the Subjects

There are three additional variables in the training dataset that have not been mentioned above: the gender of each subject and two subject characteristics, which we denote here as char1 and char2 (see Table 1). All three variables were deidentified and enumerated. Some observations about this portion of the training dataset:

- Both gender and char2 were static.
- Changes in char1 over time were rare.
- The effective sample size was small: four subjects with gender = 1 and 14 subjects with gender = 0 in the training dataset.

The task specified by the organizers of the workshop was to determine the gender of each test subject using all available information, which includes the time series sensor data as well. With only a small set of parameters, we identified a deterministic relation between gender and the two characteristics on the training dataset through constructive induction:

$$
\text{gender} =
\begin{cases}
1 & \text{if } \text{char2} = 1 \ \lor\ \text{char1} = 29 \ \lor\ \text{char1} = 42 \\
0 & \text{otherwise}
\end{cases}
$$

Results and Conclusions

In this study, we applied a set of new Bayesian methods to diagnose physiological conditions (PCs) of human subjects from multivariate time series data obtained through a set of biosensors. We submitted our results in three sets: the first two were predicted by two mdsb models with two different decision thresholds, and the last was predicted by an SB. At the time of submission of this paper, the actual genders and PC values of the subjects in the test data had not been publicized; thus, we defer the evaluation of the methods until the test results become available.

References

Berge, C. (1962). Espaces topologiques, fonctions multivoques. Paris, France: Dunod. (Translated by E. M. Patterson as Topological Spaces; Dover, Mineola, NY, 1997 ed.)
Cowell, R. (1998). Advanced Inference in Bayesian Networks. In Learning in Graphical Models (pp. 27–49). Boston, MA: Kluwer Academic Publishers.
Kayaalp, M. (2003). Learning Dynamic Bayesian Network Structures from Data. Ph.D. dissertation, University of Pittsburgh, Pittsburgh, PA.
Shachter, R. D., & Kenley, C. R. (1989). Gaussian Influence Diagrams. Management Science, 35(5), 527–550.