Region Description for Recognition

Region Description for Recognition For object recognition, descriptions of regions in an image have to be compared with descriptions of regions of meaningful objects (models). The general problem of object recognition will be treated later. Here we learn basic region description techniques for later stages in image analysis (including recognition). Typically, region descriptions suppress (abstract from) irrelevant details and expose relevant properties. What is "relevant" depends on the task. Example: OCR (Optical Character Recognition) region abstraction 1 abstraction abstraction 3 1 Simple D Shape Features For industrial recognition tasks it is often required to distinguish a small number of different shapes viewed from a small number of different view points with a small computational effort. In such cases simple D shape features may be useful, such as: - area - boxing rectangle - boundary length - compactness - second-order momentums - polar signature - templates Features may or may not have invariance properties: - D translation invariance - D rotation invariance - scale invariance 1

Euler Number The Euler number is the difference between the number of disjoint regions and the number of holes in an image. P = number of parts H = number of holes E = P - H Example: P = 5 H = E = 3 Surprisingly, E (but not P or H) can be computed by simple local operators. Operators for regions with asymmetric connectivity: 4-connected NE and SW 8-connected NW and SE pattern1 = pattern = E = (count of pattern1) - (count of pattern) 3 Area The area of a digital region is defined as the number of pixels of the region. For an arbitrarily fine resolution, area is translation and rotation invariant. In praxis, discretization effects may cause considerably variations. area = 8 area = 31 4

Boxing Rectangle Boxing rectangle = width of a shape in x- and y-direction y a b easy to compute not rotation invariant x To achieve rotation invariance, the rectangle must be fitted parallel to an innate orientation of the shape. Orientation can be determined as the axis of least inertia (see second order moments). 5 Boundary Length The boundary length is defined as the number of pixels which constitute the boundary of a shape. Example: area = 77 boundary length = 3 area = 69 boundary length = 40 6 3

Compactness compactness = (boundary length) area Compactness describes analog shapes independent of linear transformations. very compact not very compact Compactness for discrete shapes is in general not translation, rotation or scale invariant due to discretization effects. 7 Center of Gravity Consider a D shape evenly covered with mass. Physical concepts such as - center of gravity - moments of inertia may be applied. Center-of-gravity coordinates: D = digital region y The center of gravity is the location where first-order moments sum to zero. (i- i s ) = 0 ( j - j s ) = 0 j s i s = 1 D i j s = 1 D j i s x 8 4

Second-order Moments Second-order moments ("moments of inertia") measure the distribution of mass relative to axes through the center of gravity. m x = (i - i s ) = i i s D ij D m y = (j- j s ) = j j s D m xy = (i - i s )(j- j s ) = ij i s j s D ij D moment of inertia relative to y-axis through center of gravity moment of inertia relative to x-axis through center of gravity "mixed"moment of inertia relative to x- and y-axis through center of gravity, zero if x- and y-axis are "main axes" y m x > m y m xy = 0 y m x < m y m xy 0 x x 9 Axis of Minimal Inertia The axis of minimal inertia can be used as an innate orientation of a D shape. w Inertia (= second order moment) r ij relative to an axis is the sum of m v = r ij the squared distances between all pixels of the shape and the axis. v 1. The axis of least inertia passes through the center of gravity. The mixed moment m vw relative to the axes v and w must be zero If the mixed moment is nonzero, the axis must be turned by the angle α: y y tanα = m y m x w m xy x α x v 10 5

Polar Signature The polar signature records the angular segments where circles around the center of gravity lie within a shape. R 1 R 0 90 180 70 360 0 90 180 70 360 A 180 90 0 70 R 3 0 90 180 70 360 scalable from coarse to fine by appropriate number of circles radii of circles must be chosen judiciously translation-invariant rotation-invariance can be achieved by cyclic shifting 11 Object Recognition Using the Polar Signature Model signatures Recognition results 1 6

Convex Hull A region R is convex if the straight-line segment x 1 x between any two points of R lies completely inside of R. For an arbitrary region R, the convex hull H is the smallest convex region which contains R. Example of shape with convex hull: Intuitive convex hull algorithm: 1. Pick lowest and left-most boundary point of R as starting point P k = P 1. Set direction of previous line segment of convex-hull boundary to v = (0, -1).. Follow boundary of R from current point P k in an anti-clockwise direction and compute angle θ n of line P k P n for all boundary points P n after P k. The point P q with θ q = min{θ n } is a vertex of the convex hull boundary. 3. Set P k = P q and v = (P k P n ) and repeat ) and 3) until P k = P 1. There are numerous convex hull algorithms in the literature. The most efficient is O(N) [Melkman 87], see Sonka et al. "Image Processing...". 13 Skeletons The skeleton of a region is a line structure which represents "the essence" of the shape of the region, i.e. follows elongated parts. Useful e.g. for character recognition Medial Axis Transform (MAT) is one way to define a skeleton: The MAT of a region R consists of all pixels of R which have more than one closest boundary point. MAT skeleton consists of centers of circles which touch boundary at more than one point MAT skeleton of a rectangle shows problems: Note that "closest boundary point" depends on digital metric! 14 7

Thinning Algorithm Thinning algorithm by Zhang and Suen 1987 (from Gonzalez and Wintz: "Digital Image Processing") Repeat A to D until no more changes: A Flag all contour points which satisfy conditions (1) to (4) B Delete flagged points C Flag all contour points which satisfy conditions (5) to (8) D Delete flagged points Example: Assumptions: region pixels = 1 background pixels = 0 contour pixels 8-neighbours of background Neighbourhood labels: p 9 p 8 p p 3 p 1 p 4 p 7 p 6 p 5 Conditions: (1) N(p 1 ) 6 () S(p 1 ) = 1 (3) p p 4 p 6 = 0 (4) p 4 p 6 p 8 = 0 (5) N(p 1 ) 6 (6) S(p 1 ) = 1 (7) p p 4 p 8 = 0 (8) p p 6 p 8 = 0 N(p 1 ) = number of nonzero neighbours of p 1 S(p 1 ) = number of 0-1 transitions in ordered sequence p, p 3,... 15 B-splines (1) B-splines are piecewise polynomial curves which provide an approximation of a polygon based on vertices. precision depends on distances of vertices Important properties: eye-pleasing smooth approximation of control polygon change of control polygon vertex influences only small neighbourhood curve is twice differentiable (e.g. has well-defined curvature) easy to compute x(s) = Σ v i B i (s) i = 0.. N+1 s parameter, changing linearly from i to i+1 between vertices v i and v i+1 v i vertices of control polygon B i (s) base functions, nonzero for s [i-, i+] 16 8

B-splines () Each base function B i (s) consists of four parts: 1 C 0 (t) = t 3 6 C 1 (t)= 3t3 + 3t + 3t + 1 6 C (t) = 3t3 6t + 4 6 C 3 (t) = 3t3 + 3t 3t + 1 6 C 0 C 1 C C 3 x(s) = C 3 (s-i)v i-1 + C (s-i)v i + C 1 (s-i)v i+1 + C 0 (s-i)v i+ Example: s = 7.7 i = 7 x(7.7) = C 3 (0.7)v 6 + C (0.7)v 7 + C 1 (0.7)v 8 + C 0 (0.7)v 9 17 Shape Description by Fourier Expansion (1) The curvature function k(s) of a region is necessarily periodic: k(s) = k(s+l) L = length of boundary Hence k(s) can be expanded by a Fourier series with coefficients: c n = 1 L πin k(s) exp( L s)ds L 0 To avoid problems with curvature discontinuities at corners, it is useful to consider the slope intrinsic function: θ (s) = θ(s) πs L µ with θ(s) = k(ς )dς s 0 πs L µ = 1 L θ(s) πs L L ds 0 tangent angle normalization to achieve periodicity mean (dependent on starting point) 18 9

Shape Description by Fourier Expansion () The shape of a contour can be approximately represented by a limited number of harmonics of the Fourier expansion of the slope intrinsic function θ (s): L 1 πin cn = θ (s)exp( s)ds L0 L Example: (from Duda and Hart 73: Pattern Classification and Scene Analysis) original 5 harmonics 10 harmonics Caution: It is questionable whether the approximations by a limited number of harmonics capture the most frequent deviations from the normal. 15 harmonics 19 Templates A template is a translation-, rotation- and scale-variant shape desription. It may be used for object recognition in a fixed, reoccurring pose. A M-by-N template may be treated as a vector in MN-dimensional feature space Unknown objects may be compared with templates by their distance in feature space Example: Template for face recognition Distance measures: g t pixels of image pixels of template (g = g de = t ) t da absolute distance db = max g t squared Euclidean distance maximal absolute distance gmn g13 template as point in feature space g1 g11 0 10

Cross-correlation r = g t cross-correlation between image g and template t Compare with squared Euclidean distance d e : d e = (g t ) = g + t r Image "energy" Σg and template "energy" Σt correspond to length of feature vectors. r = g t g t Cauchy-Schwartz Inequality: Normalized cross-correlation is independent of image and template energy. It measures the cosine of the angle between the feature vectors in MN-space. r 1 with equality iff g = c t, all 1 Artificial Neural Nets Information processing in biological systems is based on neurons with roughly the following properties: the degree of activation is determined by incoming signals the outgoing signal is a function of the activation incoming signals are mediated by weights weights may be modified by learning net input for cell j Σ w ij o i (t) activation a j (t) = f j (a j, Σ w ij o i (t)) output signal o j (t) = F j (a j ) Typical shapes of f i and F i : input signal for cell j from cell i weight w ij output signal of cell j f i F i cell j 11

Multilayer Feed-forward Nets Example: 3-layer net output units... each unit of a layer is connected to each unit of the layer below hidden units...... units within a layer are not connected activation function f is differentiable (for learning) input units... 3 Character Recognition with a Neural Net Schematic drawing shows 3-layer feed-forward net: input units are activated by sensors and feed hidden units hidden units feed output units each unit receives weighted sum of incoming signals Supervised learning Weights are adjusted iteratively until prototypes are classified correctly (-> backpropagation) output units hidden units 0 1 3 4 5 6 7 8 9 input units 4 1

Learning by Backpropagation nominal output signal t pj actual output signal o pj cell j w ij cell i Supervised learning procedure: present example and determine output error signals adjust weights which contribute to errors Adjusting weights: Error signal of output cell j for pattern p is δ pj = (t pj - o pj ) f j (net pj ) f j () is the derivative of the activation function f() Determine error signal δ pi for internal cell i recursively from error signals of all cells k to which cell i contributes. δ pi = f i (net pi ) Σ k δ pk w ik Modify all weights: Δ p w ij = ηδ pj o pi η is a positive constant input pattern p The procedure must be repeated many times until the weights are "optimally" adjusted. There is no general convergence guarantee. 5 Perceptrons (1) Which shape properties can be determined by combining the outputs of local operators? A perceptron is a simple computational model for combining local Boolean operations. (Minsky and Papert, Perceptrons,??) ϕ 1 ϕ Ω 0 1 ϕ i Boolean functions with local support in the retina: - limited diameter - limited number of cells output is 0 or 1 ϕ n Boolean Boole'sche functions Funktionen lineare linear Schwellwert= threshold zelle element Ω compares weighted sum of the ϕ i with fixed threshold θ: Ω = 1 if Σ w iϕ i > θ 0 otherwise B/W Retina S/W-Retina 6 13

Perceptrons () A limited-diameter perceptron cannot determine connectedness Assume perceptron with maximal diameter d for the support of each ϕ i. Consider 4 shapes as below with a < d and b >> d. a b Boolean operators may distinguish 5 local situations: ϕ 1 ϕ ϕ 3 ϕ 4 ϕ 5 For Ω to exist, we must have: w 1 ϕ 1 + w 4 ϕ 4 < θ w ϕ + w 4 ϕ 4 > θ w ϕ + w 3 ϕ 3 < θ w 1 ϕ 1 + w 3 ϕ 3 > θ Σ w i ϕ i < θ Σ w i ϕ i > θ ϕ 5 is clearly irrelevant for distinguishing between the connected and the disconnected shapes contradiction, hence Ω cannot exist 7 14