Missing Data and Dynamical Systems


1 UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN CS598PS Machine Learning for Signal Processing Missing Data and Dynamical Systems 12 October 2017

2 Today's lecture Dealing with missing data Tracking and linear dynamical systems

3 Missing data You don't always have all your data! e.g. cell phone drops, gaps in pictures, etc. How do we deal with that?

4 A simple case?

5 A real-world case!

6 General classification of cases Missing completely at random (MCAR): the missingness is truly random Missing at random (MAR): the missingness depends on some observed variable, but not on the missing data values themselves Not missing at random (NMAR): none of the above

7 Starting with 1D Assume a small gap in 1D data Can we find the missing values?

8 Key observations Temporal structure The signal is somewhat periodic We could predict the future from the past How do we formalize this?

9 A predictive model Predict the current sample from the past, using a weighted sum of the preceding samples. This is the autoregressive (AR) model: $x_t = \sum_{i=1}^{p} a_i x_{t-i} + e_t$ We can learn the coefficients $a_i$ using various methods
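One such method, sketched below as a plain least-squares fit (the function name and toy signal are illustrative, not from the lecture):

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares estimate of AR(p) coefficients a_1..a_p
    for the model x[t] = sum_i a_i * x[t-i] + e[t]."""
    # Each row of X holds the p samples preceding x[t]
    X = np.column_stack([x[p - 1 - i : len(x) - 1 - i] for i in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a  # a[0] = a_1, ..., a[p-1] = a_p

# Toy usage on a somewhat periodic signal
t = np.arange(1000)
x = np.sin(0.07 * t) + 0.05 * np.random.randn(1000)
a = fit_ar(x, p=20)
```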

10 Rewriting the model In linear algebra notation the prediction errors become $e = A x$, where each row of $A$ holds a shifted copy of the filter $[-a_p, \dots, -a_1, 1]$, so that row $t$ computes $e_t = x_t - \sum_i a_i x_{t-i}$:
$$A = \begin{bmatrix} -a_p & \cdots & -a_1 & 1 & 0 & \cdots & 0 \\ 0 & -a_p & \cdots & -a_1 & 1 & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & -a_p & \cdots & -a_1 & 1 \end{bmatrix}$$

11 Filling gaps Model the input sequence as: $e = A x = A (U x_u + K x_k) = A_u x_u + A_k x_k$ where $x_u$ and $x_k$ are the unknown and known samples, and $U$ and $K$ are repositioning matrices, e.g.: $x = [x_1, x_2, x_3, x_4]^T$ with $x_u = [x_2, x_3]^T$ unknown and $x_k = [x_1, x_4]^T$ known

12 Isolating the unknowns The final form isolates the unknowns as a variable: $A_u x_u + A_k x_k = e$ These are values that should conform to the model (assuming they aren't missing at random!). We thus want to find the $x_u$ that minimizes $e$, i.e. fill in the blanks with the least unpredictable samples

13 Solving for the gaps Set the derivative to zero: $\frac{\partial\, e^T e}{\partial x_u} = 2 A_u^T (A_u x_u + A_k x_k) = 0$ Solve for the unknown samples: $\hat{x}_u = -(A_u^T A_u)^{-1} A_u^T A_k x_k = -A_u^+ A_k x_k$
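A minimal sketch of this solve, assuming the AR coefficients have already been estimated from intact parts of the signal (all names here are illustrative):

```python
import numpy as np

def fill_gap(x, missing, a):
    """Fill the masked samples of x so they minimize the AR prediction error.
    x: signal (values at masked positions are placeholders)
    missing: boolean mask, True where samples are unknown
    a: AR coefficients [a_1, ..., a_p]"""
    p, n = len(a), len(x)
    # Banded matrix A so that (A @ x)[t-p] = x[t] - sum_i a_i * x[t-i]
    A = np.zeros((n - p, n))
    for t in range(p, n):
        A[t - p, t] = 1.0
        for i in range(1, p + 1):
            A[t - p, t - i] = -a[i - 1]
    Au, Ak = A[:, missing], A[:, ~missing]
    # Least-squares solution of Au @ xu = -Ak @ xk
    xu, *_ = np.linalg.lstsq(Au, -Ak @ x[~missing], rcond=None)
    out = x.copy()
    out[missing] = xu
    return out
```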

14 Results Works pretty well! [Plot: input data; blue/cyan = real signal, red = restoration]

15 But not for large windows! [Plot: same comparison; blue/cyan = real signal, red = restoration]

16 Looking for a new idea We should move away from the sample level and look at a coarser scale, e.g. a time-frequency view

17 A simpler idea Find the sections most similar to the gap edges Replace missing sections with their neighbors: [?? x_1 x_2 x_3 ?? x_6 x_7 x_8 ??] How do we find a close match?
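One plausible way to find that match is a brute-force sliding search over the known part of the signal, e.g. by sum of squared differences; a sketch under that assumption (not necessarily the exact method used in the lecture):

```python
import numpy as np

def best_match(signal, query, gap):
    """Offset of the window most similar to `query` (by SSD),
    skipping windows that overlap the gap [gap[0], gap[1])."""
    w, best, best_cost = len(query), None, np.inf
    for ofs in range(len(signal) - w + 1):
        if ofs + w > gap[0] and ofs < gap[1]:
            continue  # window overlaps the missing region
        cost = np.sum((signal[ofs:ofs + w] - query) ** 2)
        if cost < best_cost:
            best, best_cost = ofs, cost
    return best

# Usage sketch: match the samples just before the gap, then copy the
# continuation of the matched region into the gap, and iterate.
```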

18 In action First iteration

19 In action Second iteration

20 In action Third iteration

21 In action Fourth iteration

22 In action Final iteration [Figure: input vs. output]

23 Some examples [Audio demos: input/output pairs for speech, music, and sound effects]

24 What about images? The basic idea is the same: find a neighbor and blend it. A bit more complicated, since we need to blend carefully; we can't just add (Poisson blending). Also more computationally intensive (2D search!)

25 A real-world example Inpainting, recomposing, warping the truth! PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. Connelly Barnes, Eli Shechtman, Adam Finkelstein, Dan B Goldman

26 The statistical viewpoint The nearest-neighbor search was local; it ignores the global structure of the data. Missing data should conform to a statistical model of the input, i.e. they shouldn't be outliers

27 SVD-based imputation A simple global approach: Replace the missing data with some initial values, e.g. $x_u = E\{x\}$ or $x_u \sim N(\mu, \sigma^2)$ or ... Perform an SVD approximation of the data: $X \approx U S V^T$ Replace the missing data with the SVD approximation: $x_u = (U S V^T)_u$ Repeat!
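A minimal sketch of that loop (the rank and iteration count are arbitrary choices, not values from the lecture):

```python
import numpy as np

def svd_impute(X, missing, rank=5, n_iter=50):
    """Iteratively replace missing entries of X with a low-rank SVD reconstruction."""
    Xf = X.copy()
    Xf[missing] = X[~missing].mean()   # e.g. initialize with the global mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Xf[missing] = approx[missing]  # keep known entries, update the rest
    return Xf
```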

28 Why? The known data will dominate the statistics (assuming there are enough of them!). Each approximation will try to conform to the global statistics of the input, i.e. the made-up values will not be random. With each new iteration, that conformance to global statistics will increase

29 Variations We don't have to use an SVD. e.g. our data might be non-negative, so use NMF, or probabilistic models, or whatever makes sense. Any global statistical model would work; just make sure that it fits the data well

30 Example Learn from the input and fill in missing values [Figure: input with missing values, and its SVD and NMF reconstructions]

31 Learning from outside Bandwidth expansion: learn the model from other recordings [Figure: input, and SVD and NMF reconstructions]

32 Tracking Tracking things that evolve in time: missiles, faces, finances, etc. A time-series model again

33 A simple problem I'm looking at a star, which is about there. So is my drunken friend. How do we consolidate our observations?

34 An uncertain estimate My observation is a Gaussian, $N(\mu_1, \sigma_1^2)$, which helps me account for some uncertainty: the mean is what I think is right, and the variance is an indication of how sure I am

35 More uncertainty My drunken friend's estimate is unreliable, therefore his variance is higher: $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, with $\sigma_2 > \sigma_1$

36 Consensus estimate The consensus estimate is proportional to a Gaussian: $N(\mu_1, \sigma_1^2)\, N(\mu_2, \sigma_2^2) \propto N(m, s)$ with $K = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$, $m = \mu_1 + K(\mu_2 - \mu_1)$, $s = (1 - K)\,\sigma_1^2$
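A tiny numeric sketch of this fusion rule (the values are made up):

```python
def fuse(mu1, var1, mu2, var2):
    """Consensus of two Gaussian estimates via their product."""
    K = var1 / (var1 + var2)       # how much to trust the second estimate
    m = mu1 + K * (mu2 - mu1)      # fused mean
    s = (1 - K) * var1             # fused variance, always <= var1
    return m, s

# My estimate vs. my drunken friend's noisier one
print(fuse(10.0, 1.0, 12.0, 4.0))  # (10.4, 0.8): nudged toward 12, more certain
```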

37 Serializing this process Assume a noisy time series: $z_t = z_{t-1} + e_t$, $e_t \sim N(0, \sigma^2)$ Can we use that consensus idea for removing noise in successive values? What is the model here?

38 Example case Track the position of a moving ball

39 Some simple processing Background removal: subtract the constant part, then find the center of the remaining pixels
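A plausible sketch of that processing on a stack of grayscale frames (the names and the median-as-background choice are assumptions, not from the lecture):

```python
import numpy as np

def ball_positions(frames):
    """frames: array (T, H, W). Returns the (x, y) centroid of the
    foreground energy in each frame after background subtraction."""
    background = np.median(frames, axis=0)  # the constant part
    residual = np.abs(frames - background)
    positions = []
    for r in residual:
        w = r / (r.sum() + 1e-12)           # normalized weight map
        ys, xs = np.indices(r.shape)
        positions.append((np.sum(w * xs), np.sum(w * ys)))
    return np.array(positions)
```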

40 The problem The position estimate is very noisy. Let's see how we can clean it up

41 Making a model Defining a transition process (looks familiar?): $\begin{bmatrix} x_t \\ y_t \end{bmatrix} = A \begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} + e_t$, with $A = I$ and $e_t \sim N(0, Q)$ In short, the next input will be a random step away from the current input: $z_t = A z_{t-1} + e_t$

42 Starting with it The initial estimate will be the first input: $\hat{z}_1 = z_1$ With a given certainty: $P = \sigma^2 I$ And two noise models: $Q = \begin{bmatrix} q_x & 0 \\ 0 & q_y \end{bmatrix}$, the process noise (how much the data changes), and $R = \begin{bmatrix} r_x & 0 \\ 0 & r_y \end{bmatrix}$, the measurement noise (how reliable my measurements are)

43 Setting up sequential estimates Predict the future value: $\bar{z}_t = \hat{z}_{t-1}$ (we don't expect change other than noise) $\bar{P}_t = P_{t-1} + Q$ (and we accumulate the uncertainty) Correct that estimate given the new input: $K = \bar{P}_t (\bar{P}_t + R)^{-1}$ (get the gain of a consensus estimate) $\hat{z}_t = \bar{z}_t + K (z_t - \bar{z}_t)$ (and the estimate itself) $P_t = (I - K)\bar{P}_t$ (get the new uncertainty)
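A minimal sketch of these updates for a 1D random-walk track; the scalar noise levels are arbitrary illustrative values:

```python
import numpy as np

def random_walk_filter(z, Q=1e-3, R=1e-1):
    """Sequentially smooth a noisy 1D series under a constant-position model."""
    zhat, P = z[0], R                      # start at the first input
    out = [zhat]
    for zt in z[1:]:
        P = P + Q                          # predict: accumulate uncertainty
        K = P / (P + R)                    # consensus gain
        zhat = zhat + K * (zt - zhat)      # correct with the new input
        P = (1 - K) * P                    # new uncertainty
        out.append(zhat)
    return np.array(out)
```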

44 Single step [Diagram: the previous estimate $\hat{z}_{t-1}$ is propagated with process noise $Q$, then fused with the measurement $z_t$ (noise $R$) to give $\hat{z}_t$]

45 Example output Estimate is closer to ground truth

46 Example output Tracking is smoother, but it lags a bit (why?) [Plot: noisy input vs. filtered output]

47 Complicating the process Adding velocity information. Internal state representation: $z_t = \begin{bmatrix} x_t & y_t & \frac{dx}{dt} & \frac{dy}{dt} \end{bmatrix}^T$ The transition forces constant velocity: $z_t = A z_{t-1} + e_t$, with $A = \begin{bmatrix} 1 & 0 & dt & 0 \\ 0 & 1 & 0 & dt \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

48 Measurements What we measure is position only: $w_t = H z_t + v_t = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} z_t + v_t$ The internal state is not the same as the measurement, and the measurement has its own noise

49 The Kalman filter Predict the future value: $\bar{z}_t = A \hat{z}_{t-1}$ (change the state according to the model) $\bar{P}_t = A P_{t-1} A^T + Q$ (accumulate the uncertainty) Correct that estimate given the new input: $K = \bar{P}_t H^T (H \bar{P}_t H^T + R)^{-1}$ (get the gain of a consensus estimate) $\hat{z}_t = \bar{z}_t + K (w_t - H \bar{z}_t)$ (and the estimate itself) $P_t = (I - K H)\bar{P}_t$ (get the new uncertainty)
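A sketch of the full filter for the constant-velocity model above; dt and the noise magnitudes are illustrative assumptions:

```python
import numpy as np

def kalman_track(w, dt=1.0, q=1e-4, r=1e-1):
    """Filter noisy 2D position measurements w (shape (T, 2))
    with a constant-velocity Kalman filter."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    Q, R = q * np.eye(4), r * np.eye(2)
    zhat = np.array([w[0, 0], w[0, 1], 0.0, 0.0])  # start at first measurement
    P = np.eye(4)
    track = [zhat[:2].copy()]
    for wt in w[1:]:
        zbar = A @ zhat                              # predict the state
        Pbar = A @ P @ A.T + Q                       # accumulate uncertainty
        K = Pbar @ H.T @ np.linalg.inv(H @ Pbar @ H.T + R)
        zhat = zbar + K @ (wt - H @ zbar)            # correct with measurement
        P = (np.eye(4) - K @ H) @ Pbar               # new uncertainty
        track.append(zhat[:2].copy())
    return np.array(track)
```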

50 More elaborate extensions Add acceleration, etc. A more complex internal state, and more accurate models given various inputs. Use this to track moving processes: a missile over an ocean, a finger over a touchscreen, cars on the highway

51 And yet more extensions Non-linear processes: $z_t = f(z_{t-1}) + e_t$, $w_t = h(z_t) + v_t$ Extended Kalman filter Unscented Kalman filter (for very nonlinear processes) Particle filters: a sampling-based method that bypasses the Gaussian assumptions

52 An example

53 In the real world

54 Using dynamical systems to classify We can also classify sequences using dynamical models; so far we only did prediction, tracking and smoothing. Dynamical models can be fit on training sequences and evaluated on new sequences that we want to classify. Goodness of fit (for now) can be used for classification

55 An example 3D accelerometer data from a cell phone. Classes are user activities (sitting, walking, going upstairs, etc.). Each activity has a unique temporal pattern [Plots: x/y/z sensor readings (g) vs. time (sec) for walking and walking_downstairs]

56 The VAR model (Vector AutoRegression) We can make a linear model that predicts future samples from the past: $\begin{bmatrix} x_{t+1} \\ y_{t+1} \\ z_{t+1} \end{bmatrix} = A \begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} + e_t$ The matrix $A$ should be unique to each class; it captures the temporal structure of that class. We can easily estimate $A$ with e.g. least squares

57 Predicting new inputs Get each model's prediction error on new samples; the model with the least error belongs to the same class as the input. Can also reformulate probabilistically to get likelihoods instead [Plots: prediction errors and MSEs for class A/B samples evaluated on models A/B]
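A minimal sketch of the whole scheme: fit one VAR(1) matrix per class with least squares, then label a new sequence by whichever model predicts it with the lowest MSE (all names are illustrative):

```python
import numpy as np

def fit_var(X):
    """X: sequence (T, d). Least-squares A such that X[t+1] ~ A @ X[t]."""
    B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return B.T

def var_mse(A, X):
    """Mean squared one-step prediction error of model A on sequence X."""
    return np.mean((X[1:] - X[:-1] @ A.T) ** 2)

def classify(X, models):
    """Pick the class whose model predicts X best."""
    return min(models, key=lambda c: var_mse(models[c], X))

# Usage sketch, assuming one training sequence per activity:
# models = {"walking": fit_var(walk_seq), "sitting": fit_var(sit_seq)}
# label = classify(new_seq, models)
```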

58 One big happy family (WTA = winner-take-all):
LDS: $x_{t+1} = A x_t + N(0, Q)$, $y_t = C x_t + N(0, R)$
HMM: $x_{t+1} = \mathrm{WTA}(A x_t + N(0, Q))$, $y_t = C x_t + N(0, R)$
PCA: $A = 0$, $x = N(0, Q)$, $y = C x + N(0, R)$
ICA: $A = 0$, $x = g(N(0, Q))$, $y = C x + N(0, R)$
GMM / k-means / VQ: $A = 0$, $x = \mathrm{WTA}(N(\mu, Q))$, $y = C x + N(0, R)$

59 So it's all really the same model! Like I said at the beginning of this class, DSP and ML are pretty much the same equation over and over

60 Recap Missing data techniques: AR models using temporal dynamics; NN, SVD, NMF using context information. Tracking and prediction: the Kalman filter et al., and the broader linear dynamical system family

61 Next week Source separation Array methods using machine learning Monophonic signal separation methods

62 Reading Missing data in audio Inpainting (and more) in images The Kalman filter LDS
