Bayesian Regression of Piecewise Constant Functions


Bayesian Regression of Piecewise Constant Functions
Marcus Hutter
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale
IDSIA, Galleria 2, CH-6928 Manno-Lugano, Switzerland
marcus@idsia.ch, marcus
Valencia ISBA, 1-6 June 2006

Table of Contents
- Bayesian Regression
- Quantities of Interest
- Efficient Solutions by Dynamic Programming
- Determination of the Hyper-Parameters
- Example: Gene Expression Data
- Extensions
- Summary

Abstract
I derive an exact and efficient Bayesian regression algorithm for piecewise constant functions of unknown segment number, boundary location, and levels. It works for any noise and segment level prior, e.g. Cauchy, which can handle outliers. I derive simple but good estimates for the in-segment variance. I also propose a Bayesian regression curve as a better way of smoothing data without blurring boundaries. The Bayesian approach also allows straightforward determination of the evidence, break probabilities and error estimates, useful for model selection and significance and robustness studies. I present an application to microarray-CGH data analysis. Many possible extensions will be discussed.
Keywords: Bayesian regression, exact polynomial algorithm, non-parametric inference, piecewise constant function, dynamic programming, application, microarray-CGH data.

Advantages of Bayesian Regression
- Very principled, hence involves fewer heuristic design choices. Important for estimating the number of segments.
- One can decide among competing models solely on evidence.
- Bayes often works well in theory and practice.
- Probability estimates and variances for quantities of interest.
- Bayesian regression curve (better than local smoothing, which wiggles more and blurs jumps).

Setup / Likelihood
True function $f = (f_1, \dots, f_n)$ has $k$ segments with boundaries $0 = t_0 < t_1 < \dots < t_{k-1} < t_k = n$, i.e. $f$ is constant on $\{t_{q-1}+1, \dots, t_q\}$ for each $0 < q \le k$.
Noisy observations $y = (y_1, \dots, y_n)$.
Any independent noise with mean $\mu_q$ and variance $\sigma^2$.
[Figure: Data, Break-Probability, PCRegression, STD($\mu_t \mid y$).]
$\Rightarrow$ Likelihood: $P(y \mid \mu, \sigma) = \prod_{q=1}^{k} \prod_{i=t_{q-1}+1}^{t_q} P(y_i \mid \mu_q, \sigma)$
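
The likelihood above can be evaluated directly for a given segmentation. The following is a minimal sketch, assuming Gaussian noise (the slides allow any independent noise); the function names `gauss_logpdf` and `log_likelihood` are illustrative and not taken from the talk.

```python
import math

def gauss_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at y (Gaussian noise is an assumption here)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def log_likelihood(y, mu, t, sigma):
    """log P(y | mu, sigma) for boundaries t = [t_0=0, ..., t_k=n] and levels mu = [mu_1, ..., mu_k]."""
    assert t[0] == 0 and t[-1] == len(y) and len(mu) == len(t) - 1
    total = 0.0
    for q in range(1, len(t)):
        # 0-based indices t[q-1] .. t[q]-1 hold y_{t_{q-1}+1} .. y_{t_q}
        for i in range(t[q - 1], t[q]):
            total += gauss_logpdf(y[i], mu[q - 1], sigma)
    return total
```

Working in log space avoids underflow when $n$ is large, since the product runs over all $n$ observations.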

Bayesian Regression
Goal: Estimate segment levels $\mu = (\mu_1, \dots, \mu_k)$, boundaries $t = (t_0, \dots, t_k)$, and their number $k$.
Bayesian regression: Assume prior $P(\mu, t, k)$.
Compute posterior: $P(\mu, t, k \mid y) = \frac{P(y \mid \mu, t, k)\, P(\mu, t, k)}{P(y)}$
Evidence: $P(y) = \sum_{k,t} \int P(y \mid \mu, t, k)\, P(\mu, t, k)\, d\mu$
Too complex: We need summaries like mean or MAP.

Prior
We model the level of each segment by a broad (e.g. Gaussian) distribution $P(\mu_q \mid \nu, \rho)$.
Uniform distribution among all segmentations into $k$ segments: $P(t \mid k) = \binom{n-1}{k-1}^{-1}$
Uniform prior over segment number $k$: $P(k) = 1/n$.
$\Rightarrow$ Prior: $P(\mu, t, k) = \prod_{q=1}^{k} P(\mu_q \mid \nu, \rho) \cdot P(t \mid k) \cdot P(k)$
$(\rho, \nu, \sigma)$ are fixed hyper-parameters determined later.
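
The count behind $P(t \mid k)$ can be checked by brute force: a segmentation into $k$ segments chooses $k-1$ interior boundaries among the $n-1$ positions $1, \dots, n-1$. The sketch below enumerates them for small $n$; the helper names are hypothetical.

```python
import math
from itertools import combinations

def log_prior_segmentation(n, k):
    """log P(t | k) + log P(k) for the two uniform priors on the slide."""
    return -math.log(math.comb(n - 1, k - 1)) - math.log(n)

def enumerate_segmentations(n, k):
    """All boundary vectors (t_0=0, ..., t_k=n) with k segments; feasible only for small n."""
    return [(0, *inner, n) for inner in combinations(range(1, n), k - 1)]
```

Summing the prior over all segmentations and all $k = 1, \dots, n$ gives 1, which is a quick sanity check that the two uniform factors are normalized.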

Quantities of Interest
# Segments: $\hat k = \arg\max_k P(k \mid y)$
Boundaries: $\hat t_q = \arg\max_{t_q} P(t_q \mid y, \hat k)$
Segment level: $\hat\mu_q = E[\mu_q \mid y, \hat t, \hat k] = \int P(\mu_q \mid y, \hat t, \hat k)\, \mu_q\, d\mu_q$
The estimate $(\hat\mu, \hat t, \hat k)$ defines a (single) piecewise constant (PC) function $\hat f$, which is our estimate of $f$.
A (very) different quantity is obtained by Bayes-averaging over all piecewise constant functions and asking for the mean at location $i$ as an estimate for $f_i$:
Regression curve: $\hat\mu_i = E[\mu_i \mid y] = \int P(\mu_i \mid y)\, \mu_i\, d\mu_i$
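
For a fixed segmentation, the segment-level estimate has a closed form in the conjugate case. The sketch below assumes Gaussian noise $N(\mu_q, \sigma^2)$ and a Gaussian prior $N(\nu, \rho^2)$ on the level, which is only one of the choices the slides permit; `segment_level_posterior` is an illustrative name.

```python
def segment_level_posterior(segment, nu, rho, sigma):
    """Posterior mean and variance of a segment level mu_q given the data in that segment.

    Assumes y_i ~ N(mu_q, sigma^2) and prior mu_q ~ N(nu, rho^2) (conjugate Gaussian case).
    """
    d = len(segment)
    precision = 1.0 / rho**2 + d / sigma**2          # posterior precision
    mean = (nu / rho**2 + sum(segment) / sigma**2) / precision
    return mean, 1.0 / precision
```

With a broad prior (large $\rho$) the posterior mean approaches the plain segment average; with few data points it is pulled toward $\nu$.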

Dynamic Programming
Dynamic programming: Fix a break, then data left and right of the break are independent.
Evidence and moments of a single segment from $i+1$ to $j$: $A^r_{ij} := \int P(\mu_m) \prod_{t=i+1}^{j} P(y_t \mid \mu_m)\, \mu_m^r\, d\mu_m$
Analytical for exponential family with conjugate prior like Gauss, and numerical for others like Cauchy.
$L_{kj}$: evidence $P(y_1 \dots y_j \mid k)$ of the first $j$ data, given $k$ segments (unnormalized: the factor $\binom{n-1}{k-1}^{-1}$ from $P(t \mid k)$ is applied later).
$R_{ki}$: evidence $P(y_{i+1} \dots y_n \mid k)$ of the last $n-i$ data, given $k$ segments.
Left recursion: Evidence of $y_1 \dots y_j$ with $k+1$ segments = evidence of $y_1 \dots y_h$ with $k$ segments $\times$ single-segment evidence of $y_{h+1} \dots y_j$, summed over all locations $h$ of boundary $k$:
$L_{k+1,j} = \sum_{h=k}^{j-1} L_{kh} A^0_{hj}$
Similarly, right recursion: $R_{k+1,i} = \sum_{h=i+1}^{n-k} A^0_{ih} R_{kh}$
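
The left recursion can be sketched in a few lines. This is a minimal illustration, assuming Gaussian noise and a Gaussian level prior so that $A^0_{ij}$ has a closed form (the marginal density of the segment data); all function names are hypothetical, and the computation is done in log space for numerical stability.

```python
import math

def log_a0(y, i, j, nu, rho, sigma):
    """log A^0_ij for y_{i+1..j}, assuming Gaussian noise N(mu, sigma^2) and prior N(nu, rho^2)."""
    seg = y[i:j]                       # 0-based slice holds y_{i+1} .. y_j
    d = len(seg)
    mean = sum(seg) / d
    s = sum((v - mean) ** 2 for v in seg)
    var_tot = sigma**2 + d * rho**2
    return (-0.5 * d * math.log(2 * math.pi * sigma**2) - s / (2 * sigma**2)
            + 0.5 * math.log(sigma**2 / var_tot)
            - d * (mean - nu) ** 2 / (2 * var_tot))

def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def left_recursion(y, kmax, nu, rho, sigma):
    """Return logL with logL[k][j] = log L_kj for 0 <= k <= kmax and 0 <= j <= n."""
    n = len(y)
    logL = [[-math.inf] * (n + 1) for _ in range(kmax + 1)]
    logL[0][0] = 0.0                   # L_00 = 1: no data, no segments
    for k in range(kmax):
        for j in range(k + 1, n + 1):
            # L_{k+1,j} = sum_{h=k}^{j-1} L_kh * A^0_hj
            terms = [logL[k][h] + log_a0(y, h, j, nu, rho, sigma) for h in range(k, j)]
            logL[k + 1][j] = logsumexp(terms)
    return logL
```

This sketch recomputes each $A^0_{hj}$ from scratch for clarity; tabulating $A^0$ once (or using running sums) brings the recursion itself to $O(k_{\max} n^2)$ operations.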

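Efficient Solutions for Quantities of Interest
Evidence: $P(y) = \sum_{k=1}^{n} P(y \mid k)\, P(k) = \frac{1}{n} \sum_{k=1}^{n} L_{kn} / \binom{n-1}{k-1}$
The posterior of $k$ and its MAP estimate are $P(k \mid y) = \frac{L_{kn}}{\binom{n-1}{k-1}\, k_{\max}\, E}$ and $\hat k = \arg\max_{k=1..k_{\max}} P(k \mid y)$, where $E = P(y)$ is the evidence and $k_{\max}$ is the largest segment number considered ($k_{\max} = n$ for the prior $P(k) = 1/n$).
Probability that boundary $p$ is located at $h$: $P(t_p = h \mid y, \hat k) = L_{ph} R_{\hat k - p, h} / L_{\hat k n}$
MAP segment boundary $p$: $\hat t_p := \arg\max_h P(t_p = h \mid y, \hat k)$
Segment level moments: $\widehat{\mu_p^r} = A^r_{\hat t_{p-1} \hat t_p} / A^0_{\hat t_{p-1} \hat t_p}$
Regression curve: Fix a single segment $t_{m-1} = i, \dots, t_m = j$ containing $t$, then $\mu_t = \mu_m$. Now sum over all such segments:
$\widehat{\mu_t^r} = \frac{1}{L_{\hat k n}} \sum_{m=1}^{\hat k} \sum_{i < t \le j} L_{m-1,i}\, A^r_{ij}\, R_{\hat k - m, j}$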
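
The posterior over $k$ and the boundary probabilities follow directly from the $L$ and $R$ tables. The sketch below again assumes the conjugate Gaussian case for $A^0_{ij}$ and normalizes $P(k \mid y)$ over $k = 1, \dots, k_{\max}$; the function names are illustrative, not from the talk.

```python
import math

def log_a0(seg, nu, rho, sigma):
    """log single-segment evidence, assuming Gaussian noise and Gaussian level prior."""
    d = len(seg)
    mean = sum(seg) / d
    s = sum((v - mean) ** 2 for v in seg)
    var_tot = sigma**2 + d * rho**2
    return (-0.5 * d * math.log(2 * math.pi * sigma**2) - s / (2 * sigma**2)
            + 0.5 * math.log(sigma**2 / var_tot) - d * (mean - nu) ** 2 / (2 * var_tot))

def logsumexp(xs):
    m = max(xs)
    return m if m == -math.inf else m + math.log(sum(math.exp(x - m) for x in xs))

def tables(y, kmax, nu, rho, sigma):
    """Return (logL, logR) with logL[k][j] = log L_kj and logR[k][i] = log R_ki."""
    n, neg = len(y), -math.inf
    logA = [[log_a0(y[i:j], nu, rho, sigma) if i < j else neg
             for j in range(n + 1)] for i in range(n + 1)]
    logL = [[neg] * (n + 1) for _ in range(kmax + 1)]
    logR = [[neg] * (n + 1) for _ in range(kmax + 1)]
    logL[0][0] = 0.0                   # L_00 = 1
    logR[0][n] = 0.0                   # R_0n = 1
    for k in range(kmax):
        for j in range(k + 1, n + 1):  # left recursion
            logL[k + 1][j] = logsumexp([logL[k][h] + logA[h][j] for h in range(k, j)])
        for i in range(0, n - k):      # right recursion
            logR[k + 1][i] = logsumexp([logA[i][h] + logR[k][h] for h in range(i + 1, n - k + 1)])
    return logL, logR

def posterior_k(logL, n, kmax):
    """P(k | y) for k = 1..kmax (list index 0 corresponds to k = 1)."""
    logw = [logL[k][n] - math.log(math.comb(n - 1, k - 1)) for k in range(1, kmax + 1)]
    z = logsumexp(logw)
    return [math.exp(w - z) for w in logw]

def boundary_posterior(logL, logR, p, k, n):
    """P(t_p = h | y, k) for h = 0..n."""
    return [math.exp(logL[p][h] + logR[k - p][h] - logL[k][n]) for h in range(n + 1)]
```

A typical use is to compute `posterior_k`, take its argmax as $\hat k$, and then call `boundary_posterior` for each $p = 1, \dots, \hat k - 1$ to obtain the break probabilities.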

Determination of the Hyper-Parameters
Global variance $\rho^2$ and mean $\nu$ of $\mu$, in-segment variance $\sigma^2$.
(Empirical) Bayes: Averaging or maximizing $P(y \mid \sigma, \nu, \rho)$ is expensive.
Fast semi-principled estimation:
Global mean: $\hat\nu \approx \frac{1}{n} \sum_{t=1}^{n} y_t$
Global variance: $\hat\rho^2 \approx \frac{1}{n-1} \sum_{t=1}^{n} (y_t - \hat\nu)^2$
In-segment variance $\sigma^2$ is more tricky without knowing the segmentation: $\hat\sigma^2 \approx \frac{1}{2(n-1)} \sum_{t=1}^{n-1} (y_{t+1} - y_t)^2$
Good for large noise. A crude estimate is enough if noise is low (regression easy).
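
These three estimates translate directly into code. A minimal sketch, with `estimate_hyperparameters` as an illustrative name:

```python
def estimate_hyperparameters(y):
    """Return (nu_hat, rho2_hat, sigma2_hat) from the moment-based estimates on the slide."""
    n = len(y)
    nu = sum(y) / n                                              # global mean
    rho2 = sum((v - nu) ** 2 for v in y) / (n - 1)               # global variance
    # Half the mean squared difference of neighbours: within a segment,
    # E[(y_{t+1} - y_t)^2] = 2 sigma^2, and the few jumps add only a small bias.
    sigma2 = sum((y[t + 1] - y[t]) ** 2 for t in range(n - 1)) / (2 * (n - 1))
    return nu, rho2, sigma2
```

The in-segment estimate is biased upward by the squared jump heights at the breaks, which is why it works best when the noise is large relative to the jumps or when breaks are rare.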

Quartiles for Heavy-Tailed Robust Distributions
Let $[y]$ be the data vector $y$ sorted in ascending order.
Global median: $\hat\nu \approx [y]_{n/2}$
Global scale: $\hat\rho \approx \frac{[y]_{3n/4} - [y]_{n/4}}{2\alpha}$ with $\alpha \approx 1$
Differences: $\Delta_t := y_{t+1} - y_t$
In-segment scale: $\hat\sigma \approx \frac{[\Delta]_{3n/4} - [\Delta]_{n/4}}{2\beta}$ with β 12
Iteratively improve them, if the estimates are really not sufficient.
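
The quartile-based estimates can be sketched as follows. The constants $\alpha$ and $\beta$ are passed in as required arguments rather than fixed, and the rounding of the indices $n/4$, $n/2$, $3n/4$ to 0-based integer positions is an assumption of this sketch; the function names are hypothetical.

```python
def quartile_scale(values, divisor):
    """(upper quartile - lower quartile) / divisor, using floor-rounded 0-based indices."""
    s = sorted(values)
    n = len(s)
    return (s[(3 * n) // 4] - s[n // 4]) / divisor

def robust_hyperparameters(y, alpha, beta):
    """Return (nu_hat, rho_hat, sigma_hat) from medians and interquartile ranges."""
    s = sorted(y)
    nu = s[len(s) // 2]                               # global median
    rho = quartile_scale(y, 2 * alpha)                # global scale
    deltas = [b - a for a, b in zip(y, y[1:])]        # differences of neighbours
    sigma = quartile_scale(deltas, 2 * beta)          # in-segment scale
    return nu, rho, sigma
```

Because quartiles ignore the extreme values, a single large outlier leaves all three estimates essentially unchanged, unlike the moment-based versions.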

Example: Gene Copy # Data
All genes in a healthy human cell come in pairs, but can be lost or multiplied in tumor cells.
With modern micro-arrays one can measure the copy-number of genes along a chromosome.
It is important to determine the breaks, where the copy-number changes.
The measurements are very noisy [Pinkel 98].
Hence this is a natural application for piecewise constant regression of noisy (one-dimensional) data.
Regression results of one aberrant and one healthy chromosome (without biological interpretation) are shown...

Aberrant Gene Copy # of Chromosome
[Figure: Data (blue), PCR (black), BP (red), and variance$^{1/2}$ (green). Legend: Data, Break-Probability, PCRegression, STD($\mu_t \mid y$).]

Aberrant Gene Copy # of Chromosome
[Figure: Data with Bayesian regression curve ± 1 std.-deviation. Legend: Data, BayesMean.]

log(evidence): Aberrant Gene Copy # of Chromosome
[Figure: $\log P(y)$ (blue) and $\hat k$ (green) as a function of $\sigma$, and our estimate $\hat\sigma$ of $\arg\max_\sigma P(y)$ and $\hat k(\hat\sigma)$ (black triangles). Legend: log(evidence), ML#segments, Estimate. Axes: sqrt(varseg), log(evidence), ML#segments.]

Normal Gene Copy # of Chromosome
[Figure: Data with Bayesian regression curve.]

Posterior Segment Number Probability $P(k \mid y)$
[Figure: $\log P(k \mid y)$ as a function of $k$. Legend: GM, CH, CMwG, Gen(5,9), Gen(3,1).]
For medium Gaussian noise (GM, black), high Cauchy noise (CH, blue), medium Cauchy noise with Gaussian regression (CMwG, green), aberrant gene expression of chromosome 1 (Gen(3,1), red), normal gene expression of chromosome 9 (Gen(5,9), pink).

Synthetic Example: What do you see?

Synthetic Example: PC-Regression
[Figure: Data (blue), PCR (black), BP (red), and Var (green). Legend: Data, Break-Probability, PCRegression, Var($\mu_t \mid y$).]
Data was indeed sampled from a three-segment function with high Cauchy noise.

Synthetic Example: Bayesian Regression Curve
[Figure: Data with Bayesian regression curve ± 1 std.-deviation.]

Regression Summary of Gene and other Examples
Setup: Gauss (G) or Cauchy (C) noise; Low (L), Medium (M), or High (H) noise; Gene data.
Columns:
- Name
- $\sigma$: true noise scale
- $n$: data size
- P: method
- $\hat\nu$: global mean estimate
- $\hat\rho$: global deviation estimate
- $\hat\sigma$: in-segment deviation est.
- $\log E$: log-evidence $\log P(y)$
- ll E σ ll: rel. log-likelihood ll, $E[ll \mid \hat f]$, $\mathrm{Var}[ll \mid \hat f]^{1/2}$
- $\hat k$: opt. #segm.
- Ck(-1,+1): confidence $P(\hat k\,(-1,+1) \mid y)$
Rows (Name, P, Ck(-1,+1)):
GL G %(0 20)
GM G %(0 29)
GH G %(10 12)
CL C %(0 21)
CM C %(0 27)
CH C %(11 11)
GMwC C %(0 26)
CMwG G %(8 8)
Gen G %(6 6)
Gen G %(0 6)

Extensions
- Any generalized one-segment evidence (no problem)
- Known segment levels (even easier)
- (Non)constant regressors (easy)
- Piecewise linear regression (easy)
- Continuous regression (harder, approximate)
- Non-parametric prior and noise (easy)
- Very large n (break into overlapping pieces, heuristic)

Related work
- [Sen&Srivastava 75] Frequentist solution for detecting a single break.
- [Olshen&al 04] Generalization to pair of breaks. Heuristic recursion for further remaining breaks.
- [Jon 03, Lavielle 05] Penalized Maximum Likelihood.
- [Endres&Földiák 05] Piecewise constant (PC) Bayesian density estimation.
- [Lav 05, EF 05] Dynamic programming.

Summary
- Full Bayesian PC-regression
- (Non)Gaussian noise and prior
- Handling of outliers
- Analytic estimate for in-segment variance
- Bayesian regression curve
- Break probabilities and variances
- Global evidence for model comparison
- Principled, few parameters to choose (important for determination of k).

Thanks! Questions?
Details: Papers at http://www.idsia.
Book intends to excite a broader AI audience about abstract Algorithmic Information Theory and inform theorists about exciting applications to AI.
Decision Theory = Probability + Utility Theory
+
Universal Induction = Ockham + Bayes + Turing
=
A Unified View of Artificial Intelligence

Exact Bayesian Regression of Piecewise Constant Functions

Marcus Hutter
RSISE @ ANU and SML @ NICTA
Canberra, ACT, 2, Australia
marcus@hutter.net www.hutter.net
4 May 27
Abstract
We derive an exact and