Quantitative Steganalysis of LSB Embedding in JPEG Domain

1 Quantitative Steganalysis of LSB Embedding in JPEG Domain
Jan Kodovský, Jessica Fridrich
September 10, 2010 / ACM MM&Sec '10

2 Motivation
Least Significant Bit (LSB) embedding
  Simplicity, high embedding capacity
  Used in Jsteg, JP Hide&Seek, and other commercial stego software
  Steganalysis of LSB embedding in the spatial domain is a mature area [Dumitrescu-2002], [Ker-2008]
Our focus
  Transform domain: JPEG format
  Quantitative steganalysis: outputs an estimate of the message length

3 Jsteg
Jsteg: [Upham-1993]
  LSB replacement
  Embedding along a pseudo-random path
  Skipping 0 and 1
[Figure: DCT histogram]

4 Jsteg
Jsteg: [Upham-1993]
  LSB replacement
  Embedding along a pseudo-random path
  Skipping 0 and 1
  Embedding violates histogram symmetry
[Figure: DCT histogram after full embedding]
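The embedding rule on this slide can be sketched in a few lines of Python. This is an illustrative toy, not the Jsteg source: the function name `jsteg_like_embed`, the seeded shuffle standing in for the pseudo-random path, and the small symmetric cover histogram are all assumptions made for the example.

```python
import random
from collections import Counter

def jsteg_like_embed(coeffs, bits, seed=0):
    """LSB replacement along a pseudo-random path, skipping coefficients 0 and 1."""
    stego = list(coeffs)
    usable = [i for i, c in enumerate(stego) if c not in (0, 1)]
    random.Random(seed).shuffle(usable)      # hypothetical pseudo-random path
    for i, bit in zip(usable, bits):
        # Python integers behave like two's complement under & and |,
        # so the LSB pairs are (2, 3), (-2, -1), (-4, -3), ...
        stego[i] = (stego[i] & ~1) | bit
    return stego

# Hypothetical symmetric cover histogram: value -> count.
counts = {0: 50, 1: 30, -1: 30, 2: 20, -2: 20, 3: 10, -3: 10}
cover = [v for v, n in counts.items() for _ in range(n)]

rng = random.Random(1)
message = [rng.getrandbits(1) for _ in cover]   # enough bits for full embedding
stego = jsteg_like_embed(cover, message)

h_cover, h_stego = Counter(cover), Counter(stego)
for v in range(-4, 4):
    print(v, h_cover[v], h_stego[v])
```

LSB replacement only moves counts within a pair, so pair sums are preserved and bins 0 and 1 stay fixed. Bin -1 is mixed with bin -2 while bin 1 is left untouched, which is what breaks the symmetry of the histogram.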

5 Selected Existing Attacks
[Zhang,Ping-2003]: the first quantitative attack
  Employed violation of histogram symmetry
[Yu-2004]: histogram-based attack
  Generalized Cauchy ML fit
  Chi-square test
[Lee-2006], [Lee-2007]: Category attack
  Technically not quantitative
[Westfeld-2007], [Böhme-2008]: adaptation of spatial domain attacks
[Pevný-2009]: support vector regression
  Feature-based non-structural attack
  Currently the most accurate quantitative attack

6 Our Goals / Challenges
Improve the accuracy of existing quantitative attacks on Jsteg
Achieve better performance than the feature-based machine learning approach (SVR)
Focus on the structure of LSB embedding
Deliver a theoretically well-founded modular framework
Explore the applicability of the proposed attacks to different LSB embedding paradigms

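7 Maximum Likelihood
$\beta$ ... change rate
[Diagram: cover feature vector $x$, distributed as $P_x(x)$, is mapped by Emb($\beta$) to stego feature vector $y$ with conditional distribution $P(y \mid x, \beta)$]
$$P(y, \beta) = \int P(y, x, \beta)\,dx = \int P(y \mid x, \beta)\,P(x, \beta)\,dx = P(\beta) \int P(y \mid x, \beta)\,P_x(x)\,dx$$
$$\hat{\beta} = \arg\max_{\beta \ge 0} P(y \mid \beta) = \arg\max_{\beta \ge 0} \int P(y \mid x, \beta)\,P_x(x)\,dx$$
Choice of the feature vector $x$ is crucial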
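The marginalization over the cover can be made concrete with a toy case in which the integral becomes a finite sum. The setup below is an assumption made for illustration, not the model from the talk: a single LSB pair holding `n` coefficients, a cover count `x` of the even value with a Binomial(`n`, `q`) prior for a known `q`, and each coefficient flipped independently with probability `beta`.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def likelihood(y, n, beta, q):
    """P(y | beta) = sum over x of P(y | x, beta) * P_x(x) for one LSB pair.

    x is the cover count of the even value, y the stego count of the even value.
    """
    total = 0.0
    for x in range(n + 1):
        prior = binom_pmf(x, n, q)               # assumed cover model P_x(x)
        # y = (even values left unchanged) + (odd values flipped to even)
        lo, hi = max(0, y - (n - x)), min(x, y)
        p_y = sum(binom_pmf(s, x, 1 - beta) * binom_pmf(y - s, n - x, beta)
                  for s in range(lo, hi + 1))
        total += prior * p_y
    return total

n, q, y = 40, 0.7, 24                            # hypothetical observation
grid = [i / 200 for i in range(101)]             # beta in [0, 0.5]
beta_hat = max(grid, key=lambda b: likelihood(y, n, b, q))
print(beta_hat)
```

In this toy the marginal of `y` is Binomial(`n`, `q + beta * (1 - 2 * q)`), so the grid search should land on the `beta` for which that probability equals `y / n`. With `q = 1/2` the marginal would not depend on `beta` at all, which is one way to see why the choice of feature vector and cover model matters.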

8 Features of Zhang & Ping
$x = [x_1, x_2, x_3]$ [Zhang,Ping-2003]
$P(y \mid x, \beta)$
  Binomial distribution
  Gaussian approximation
  Embedding invariants: $x_1 + x_2$, $x_3$
$P_x(x)$
  Precover assumption [Ker-2007]

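12 Features of Zhang & Ping
$x = [x_1, x_2, x_3]$ [Zhang,Ping-2003]
[Diagram: Emb($\beta$) maps $x = [x_1, x_2, x_3]$ to $y = [x_1^\beta, x_2^\beta, x_3^\beta]$, with transitions labeled $\beta$ and $1 - \beta$]
$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(y \mid x, \beta)\,P_x(x)\,dx$$
$P(y \mid x, \beta)$
  Binomial distribution
  Gaussian approximation
  Embedding invariants: $x_1 + x_2$, $x_3$
$P_x(x)$
  Precover assumption [Ker-2007]
[Diagram: precover split with probabilities 1/2 and 1/2 into $x_1$ and $x_2 + x_3$]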

13 Performance Evaluation
[Plot, Jsteg: median absolute error versus change rate $\beta$; curve: ML - Zhang & Ping]
3,250 JPEG images resized and compressed to QF=75
Performance similar to [Zhang,Ping-2003]
Assumption $x_1^\beta$ = expected value yields Zhang & Ping's estimator

14 Performance Evaluation
[Plot, Jsteg: median absolute error versus change rate $\beta$; curves: ML - Zhang & Ping, SVR]
SVR
  Cartesian-calibrated Pevný features (548)
  Additional 3,250 images for training
3,250 JPEG images resized and compressed to QF=75
Performance similar to [Zhang,Ping-2003]
Assumption $x_1^\beta$ = expected value yields Zhang & Ping's estimator

15 First-Order Statistics
$x = [x_{2L}, x_{2L+1}, \ldots, x_{2R}, x_{2R+1}]$
$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(y \mid x, \beta)\,P_x(x)\,dx$$
Embedding changes in individual LSB pairs are independent:
$$P(y \mid x, \beta) = P(x_0^\beta \mid x_0, \beta)\,P(x_1^\beta \mid x_1, \beta) \prod_k P(x_{2k}^\beta, x_{2k+1}^\beta \mid x_{2k}, x_{2k+1}, \beta)$$
Embedding invariants: $x_0$, $x_1$, $x_{2k} + x_{2k+1}$
Binomial distribution
Gaussian approximation

16 First-Order Statistics
$x = [x_{2L}, x_{2L+1}, \ldots, x_{2R}, x_{2R+1}]$
$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(y \mid x, \beta)\,P_x(x)\,dx$$
DCT coefficients are i.i.d., drawn from a generalized Cauchy distribution with density
$$\frac{p - 1}{2s} \left( \left| \frac{x}{s} \right| + 1 \right)^{-p}$$
Parameters $p$ and $s$ are ML estimates, given embedding invariants
Precover assumption for every LSB pair
Embedding invariants: $x_0$, $x_1$, $x_{2k} + x_{2k+1}$
[Diagram: Precover]
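A minimal sketch of how a generalized Cauchy model translates into per-pair probabilities. The density is the one on the slide; everything else is an assumption for illustration: the helper names, the rounding bins [k - 0.5, k + 0.5], and the parameter values `p = 3.0`, `s = 2.0`, which are fixed by hand here rather than estimated by ML.

```python
def gc_pdf(x, p, s):
    """Generalized Cauchy density, valid for p > 1 and s > 0."""
    return (p - 1) / (2 * s) * (abs(x) / s + 1) ** (-p)

def gc_cdf(x, p, s):
    """Closed-form CDF obtained by integrating gc_pdf."""
    tail = 0.5 * (abs(x) / s + 1) ** (1 - p)
    return 1 - tail if x >= 0 else tail

def bin_prob(k, p, s):
    """Probability that a coefficient rounds to the integer k (assumed binning)."""
    return gc_cdf(k + 0.5, p, s) - gc_cdf(k - 0.5, p, s)

def pair_split(k, p, s):
    """P(value == 2k | value in the LSB pair {2k, 2k+1})."""
    a, b = bin_prob(2 * k, p, s), bin_prob(2 * k + 1, p, s)
    return a / (a + b)

p, s = 3.0, 2.0   # hypothetical parameters, not fitted to any image
for k in (-2, -1, 1, 2):
    print((2 * k, 2 * k + 1), round(pair_split(k, p, s), 4))
```

Under such a model the count in each pair would split between its two values with probability `pair_split(k, p, s)`. For positive pairs the even value is the more likely one and for negative pairs the odd one, so the split is not 1/2 and the change rate leaves a trace in the pair.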

17 Performance Evaluation
[Plot, Jsteg: median absolute error versus change rate $\beta$; curves: ML - Zhang & Ping, ML - First-order, SVR]

18 Second-Order Statistics
DCT coefficients are not i.i.d.
We capture dependencies using adjacency matrix $X$
$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(Y \mid X, \beta)\,P_x(X)\,dX$$
Factorization of $P(y \mid x, \beta)$
  Natural decomposition into $k$-nodes, $k \in \{1, 2, 4\}$
  Binomial / multinomial distributions
  Gaussian approximations
  Embedding invariants
  Analytic expression
[Figure: grid of adjacency-matrix cells [i, j] for i, j from -2 to 3]

19 Second-Order Statistics
DCT coefficients are not i.i.d.
We capture dependencies using adjacency matrix $X$
$$\hat{\beta} = \arg\max_{\beta \ge 0} \int P(Y \mid X, \beta)\,P_x(X)\,dX$$
Natural decomposition into $k$-nodes, $k \in \{1, 2, 4\}$
Binomial / multinomial distributions
Gaussian approximations
Complications arise
  Good parametric model?
  High complexity
[Figure: grid of adjacency-matrix cells [i, j] for i, j from -2 to 3]

20 Zero Message Hypothesis (ZMH)
Alternative heuristic approach
Penalty function $z(x) \ge 0$ satisfying
  $z(x^\beta) \approx 0$ when $\beta = 0$
  $z(x^\beta) > 0$ when $\beta > 0$
$z(x)$ should be a quantitative description of a zero message hypothesis, capturing a key cover property violated by embedding
Assumption: $y = E[x^\beta] = \mathrm{Emb}(x, \beta)$
Assumption: mapping Emb is invertible, $x = \mathrm{Emb}^{-1}(y, \beta)$
$$\hat{\beta} = \arg\min_{\beta \ge 0} z(\mathrm{Emb}^{-1}(y, \beta))$$
Comments
  Low computational complexity: one-dimensional search over $\beta$
  ZMH-based steganalysis is not a new idea! [RS steganalysis,2001]

21 First-Order Statistics (ZMH)
$x = [x_{2L}, x_{2L+1}, \ldots, x_{2R-1}, x_{2R}]$
Penalty function
$$z_{\mathrm{sym}}(x) = \sum_{k>0} w_k (x_k - x_{-k})^2$$
Weights $w_k$ chosen to minimize the estimator variance: least squares steganalysis [Ker-2007]
Final form of the penalty function:
$$z_{\mathrm{sym}}(x) = \sum_{k>0} \frac{(x_k - x_{-k})^2}{x_k + x_{-k}}$$
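The ZMH recipe with the symmetry penalty can be sketched as a one-dimensional grid search. This is a hedged illustration under stated assumptions, not the authors' implementation: the histogram is a dictionary from coefficient value to count, the stego histogram is taken to be exactly the expected one (the slide's assumption $y = E[x^\beta]$), the cover is a made-up symmetric histogram, and the helper names are hypothetical.

```python
def expected_embed(h, beta):
    """Expected histogram after LSB replacement at change rate beta (0 and 1 skipped)."""
    y = dict(h)
    for even in h:
        if even % 2 == 0 and even != 0 and even + 1 in h:
            a, b = h[even], h[even + 1]
            y[even] = (1 - beta) * a + beta * b
            y[even + 1] = beta * a + (1 - beta) * b
    return y

def inverse_embed(y, beta):
    """Invert expected_embed pair by pair; requires 0 <= beta < 1/2."""
    x = dict(y)
    for even in y:
        if even % 2 == 0 and even != 0 and even + 1 in y:
            a, b = y[even], y[even + 1]
            x[even] = ((1 - beta) * a - beta * b) / (1 - 2 * beta)
            x[even + 1] = ((1 - beta) * b - beta * a) / (1 - 2 * beta)
    return x

def z_sym(x):
    """Symmetry penalty: sum over k > 0 of (x_k - x_{-k})^2 / (x_k + x_{-k})."""
    total = 0.0
    for k in x:
        if k > 0 and -k in x:
            denom = x[k] + x[-k]
            if denom <= 0:              # inverse produced an implausible cover
                return float("inf")
            total += (x[k] - x[-k]) ** 2 / denom
    return total

# Hypothetical symmetric cover histogram over the values -8..9.
cover = {k: round(1000 * 0.7 ** abs(k)) for k in range(-8, 10)}
stego = expected_embed(cover, 0.2)

grid = [i / 1000 for i in range(450)]   # search beta in [0, 0.45)
beta_hat = min(grid, key=lambda b: z_sym(inverse_embed(stego, b)))
print(beta_hat)
```

Because the toy stego histogram is an exact expectation of a perfectly symmetric cover, the penalty vanishes at the true change rate. On real images neither assumption holds exactly, so the minimum is only approximately at the true value.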

22 Performance Evaluation
[Plot, Jsteg: median absolute error versus change rate $\beta$; curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, SVR]

23 Second-Order Statistics (ZMH)
Feature vector: adjacency matrix $X$
ZMH approach
  Decomposition into $k$-nodes
  Embedding is invertible provided $0 \le \beta < 1/2$
  Symmetry about $D$
$$z_{\mathrm{adj}}(X) = \sum_{i,j} \frac{(x_{i,j} - x_{-j,-i})^2}{x_{i,j} + x_{-j,-i}}$$
[Figure: grid of adjacency-matrix cells [i, j] for i, j from -2 to 3, with the diagonal D marked]

25 Performance Evaluation
[Plot, Jsteg: median absolute error versus change rate $\beta$; curves: ML - Zhang & Ping, ML - First-order, ZMH - First-order, ZMH - Second-order, SVR]

26 What Else Can You Find in the Paper / Journal Version
Error analysis of between-image and within-image errors for selected attacks
Verification of precover assumptions using two different statistical tests
Discussion & experiments with the symmetrized version of Jsteg
Conversion of the Category attack [Lee-2006] into a quantitative one through the proposed ZMH framework
Experiments conducted on two different sources of images
Results reported in terms of two more security measures: IQR, median bias

Quantitative Structural Steganalysis of Jsteg

Quantitative Structural Steganalysis of Jsteg Jan Kodovský and Jessica Fridrich, Member, IEEE Abstract Quantitative steganalysis strives to estimate the change rate defined as the relative number of embedding