
Lecture Notes on Soft Computing (CP 5105)
By K. Sridhar Patnaik (Lecturer) & Rashmi Sahay (Lecturer), Computer Sc. Dept., Birla Institute of Technology, Mesra

Lecture-1

Fuzzy Set Theory: Probability theory has been an age-old and effective tool to handle uncertainty, but it can be applied only to situations whose characteristics are based on random processes, that is, processes in which the occurrence of events is strictly determined by chance. In reality, however, a large class of problems is characterized by non-random processes. Here uncertainty may arise due to partial information about the problem, due to information which is not fully reliable, due to inherent imprecision in the language with which the problem is defined, or due to receipt of conflicting information about the problem from more than one source. In such situations fuzzy set theory exhibits immense potential for effectively resolving the uncertainty in the problem. Fuzziness means vagueness.

Fuzzy versus crisp: Crisp is a yes/no theory, which demands binary-type handling, e.g. "today is Saturday" (crisp set). Fuzzy admits no definite yes or no, e.g. "how is the weather today?" (fuzzy set). Fuzzy set theory forms the basis of fuzzy logic.

Crisp set. Universe of discourse: The universal set is the set which, with reference to a particular context, contains all possible elements having the same characteristics and from which sets can be formed. The universal set is usually denoted by E. E.g.: the universal set of all students in a university.

Set: A set is a well-defined collection of objects; an object either belongs to or does not belong to the set. A set in a certain context may be associated with the universal set from which it is derived. Given A = {a1, a2, a3, ..., an}, then a1, a2, a3, ... are the members of the set. This representation is known as list form. E.g.: A = {Gandhi, Bose, Nehru}. A set may also be defined by the properties its members have to satisfy, e.g. A = {x | P(x)}, where P is the property to be satisfied by x; P(x) is also known as the characteristic function. Pictorially, sets can be represented by Venn diagrams.

Membership: An element x is said to be a member of set A if x belongs to the set A.

Cardinality: The number of elements in a set is called its cardinality, denoted by n(A) or |A|. E.g.: A = {1, 2, 3, 4, 5}, |A| = 5.

Family of sets: A set whose members are themselves sets is referred to as a family of sets. E.g.: A = {{1, 2, 3}, {4, 5, 6}}.

Null set: The empty set, denoted by Φ or { }.

Singleton set: A set having exactly one element a. A singleton set is denoted by {a} and is the simplest example of a nonempty set.

Subset: A subset is a portion of a set. B is a subset of A (written B ⊆ A) if every member of B is a member of A.

Fuzzy set: Suppose we allow a flexible sense of membership of elements to a set. In classical set theory an element either belongs to or does not belong to a set; in fuzzy set theory many degrees of membership (between 0 and 1) are allowed. Since many degrees of membership are allowed, a membership function µ_A(x) is associated with a fuzzy set such that the function maps every element of the universe of discourse X (universal set or reference set) to the interval [0, 1]. Formally, the mapping is written as µ_A(x): X → [0, 1].

Definition of fuzzy set: If X is a universe of discourse and x is a particular element of X, then a fuzzy set A defined on X may be written as a collection of ordered pairs A = {(x, µ_A(x)), x ∈ X}, where each pair (x, µ_A(x)) is called a singleton. An alternative definition expresses the fuzzy set as the union of all singletons µ_A(x)/x: A = Σ_i µ_A(xi)/xi in the discrete case, and A = ∫_X µ_A(x)/x in the continuous case.

Membership function: A membership function need not always take discrete values; it may be a continuous function given by a mathematical formula. E.g.: consider a population divided into the following age groups: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, and 70 and above. Membership functions can be defined over these age groups representing the fuzzy sets Young, Middle-aged and Old.

Lecture-2

Basic fuzzy set theory: Given X as the universe of discourse and A and B as fuzzy sets on X, with µ_A(x) and µ_B(x) as their respective membership functions (a code sketch of the basic operations appears at the end of Lecture-3):

1. Union: The union of two fuzzy sets A and B is a new fuzzy set A ∪ B, also on X, with membership function defined as µ_A∪B(x) = max(µ_A(x), µ_B(x)). Let A be the fuzzy set of young people and B the fuzzy set of middle-aged people; then A ∪ B is the set of people who are either young or middle-aged. In discrete form:

A = {(x1, 0.5), (x2, 0.7), (x3, 0)}
B = {(x1, 0.8), (x2, 0.2), (x3, 1)}
A ∪ B = {(x1, 0.8), (x2, 0.7), (x3, 1)}

2. Intersection: The intersection of fuzzy sets A and B is a new fuzzy set A ∩ B with membership function defined as µ_A∩B(x) = min(µ_A(x), µ_B(x)).

3. Complement: The complement of a fuzzy set A is a new fuzzy set A^c with membership function µ_A^c(x) = 1 − µ_A(x). If A is the set of young people, A^c is the set of people who are not young.

A = {(x1, 0.3), (x2, 0.7), (x3, 0.8)}
A^c = {(x1, 0.7), (x2, 0.3), (x3, 0.2)}

4. Product of two fuzzy sets: The product of two fuzzy sets A and B is a new fuzzy set A·B whose membership function is defined as µ_A·B(x) = µ_A(x) × µ_B(x). E.g.:

A = {(x1, 0.2), (x2, 0.8)}
B = {(x1, 0.1), (x2, 0)}
A·B = {(x1, 0.02), (x2, 0)}

5. Equality: Two fuzzy sets A and B are said to be equal (A = B) if µ_A(x) = µ_B(x) for all x ∈ X. E.g.:

A = {(x1, 0.2), (x2, 0.8)}
B = {(x1, 0.6), (x2, 0.8)}
C = {(x1, 0.2), (x2, 0.8)}

Here A ≠ B but A = C.

6. Product of a fuzzy set with a crisp number: Multiplying a fuzzy set A by a crisp number a produces a new fuzzy set a·A with membership function µ_a·A(x) = a × µ_A(x). E.g.: A = {(x1, 0.4), (x2, 0.6), (x3, 0.8)}, a = 0.3, a·A = {(x1, 0.12), (x2, 0.18), (x3, 0.24)}.

7. Power of a fuzzy set: The α-th power of a fuzzy set A is a new fuzzy set A^α whose membership function is given by µ_A^α(x) = [µ_A(x)]^α. Raising a fuzzy set to its 2nd power is called concentration (CON); taking the square root is called dilation (DIL). E.g.: A = {(x1, 0.4), (x2, 0.2)}, α = 2, A^α = {(x1, 0.16), (x2, 0.04)}.

8. Difference: The difference of two fuzzy sets A and B is a new fuzzy set A − B defined as A − B = A ∩ B^c. E.g.:

A = {(x1, 0.2), (x2, 0.5), (x3, 0.6)}
B = {(x1, 0.1), (x2, 0.4), (x3, 0.5)}
B^c = {(x1, 0.9), (x2, 0.6), (x3, 0.5)}
A − B = {(x1, 0.2), (x2, 0.5), (x3, 0.5)}

9. Disjunctive sum: The disjunctive sum of two fuzzy sets A and B is a new fuzzy set A ⊕ B defined as A ⊕ B = (A ∩ B^c) ∪ (A^c ∩ B).

Lecture-3

Properties of fuzzy sets: Any fuzzy set A is a subset of the reference set X. Also, the membership of any element in the null set is 0 and the membership of any element in the reference set is 1.

(1) Commutativity: A ∪ B = B ∪ A; A ∩ B = B ∩ A
(2) Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C
(3) Distributivity: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(4) Idempotency: A ∪ A = A; A ∩ A = A
(5) Identity: A ∪ Φ = A; A ∩ X = A; A ∩ Φ = Φ; A ∪ X = X
(6) Transitivity: if A ⊆ B ⊆ C, then A ⊆ C
(7) Involution: (A^c)^c = A
(8) De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c; (A ∩ B)^c = A^c ∪ B^c

Since fuzzy sets can overlap, the laws of excluded middle and contradiction do not hold: A ∪ A^c ≠ X (law of excluded middle) and A ∩ A^c ≠ Φ (law of contradiction).

Fuzzy relations: A fuzzy relation is a fuzzy set defined on the Cartesian product of crisp sets X1, X2, ..., Xn, where the n-tuples (x1, x2, ..., xn) may have varying degrees of membership; the membership value indicates the strength of the relation. E.g., with X1 = {Typ, Sura, Cold} and X2 = {…, High temp, Shining}, a fuzzy relation may be given by the matrix:

                …     High temp   Shining
      Typ      0.1       0.4        0.8
      Sura     0.2       0.9        0.7
      Cold     0.9       0.4        0.6

Fuzzy Cartesian product: Let A be a fuzzy set defined on the universe X and B a fuzzy set defined on the universe Y. The Cartesian product between the fuzzy sets A and B, indicated as A × B and resulting in a fuzzy relation R, is given by R = A × B ⊂ X × Y, with membership function µ_R(x, y) = µ_A×B(x, y) = min(µ_A(x), µ_B(y)).
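The set operations of Lectures 2 and 3 are direct to implement. Below is a minimal Python sketch on discrete fuzzy sets represented as dictionaries, reusing the young/middle-aged example from Lecture-2 and checking one De Morgan law numerically:

```python
# Discrete fuzzy sets as {element: membership} dictionaries.
A = {"x1": 0.5, "x2": 0.7, "x3": 0.0}   # young
B = {"x1": 0.8, "x2": 0.2, "x3": 1.0}   # middle-aged

def f_union(A, B):
    """Union: pointwise max of membership grades."""
    return {x: max(A[x], B[x]) for x in A}

def f_intersection(A, B):
    """Intersection: pointwise min of membership grades."""
    return {x: min(A[x], B[x]) for x in A}

def f_complement(A):
    """Complement: 1 - membership."""
    return {x: 1.0 - m for x, m in A.items()}

print(f_union(A, B))         # {'x1': 0.8, 'x2': 0.7, 'x3': 1.0}
print(f_intersection(A, B))  # {'x1': 0.5, 'x2': 0.2, 'x3': 0.0}

# De Morgan's law: (A u B)^c == A^c n B^c
assert f_complement(f_union(A, B)) == f_intersection(f_complement(A), f_complement(B))
```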

Lecture-4

Operations on fuzzy relations: Let R and S be fuzzy relations on X × Y.

Union: µ_R∪S(x, y) = max(µ_R(x, y), µ_S(x, y))
Intersection: µ_R∩S(x, y) = min(µ_R(x, y), µ_S(x, y))
Complement: µ_R^c(x, y) = 1 − µ_R(x, y)

Composition of relations: Suppose a fuzzy relation R is defined on X × Y and a fuzzy relation S is defined on Y × Z. The fuzzy composition, a fuzzy relation on X × Z, can be defined as µ_R∘S(x, z) = max_y min(µ_R(x, y), µ_S(y, z)).

Cartesian product and co-product: Let A be a fuzzy set on X and B a fuzzy set on Y. The Cartesian product A × B is a fuzzy relation with membership function µ_A×B(x, y) = min(µ_A(x), µ_B(y)). Similarly, the Cartesian co-product A + B is a fuzzy set with membership function µ_A+B(x, y) = max(µ_A(x), µ_B(y)).

Fuzzy set: A fuzzy set is a set without a crisp boundary. The transition from belonging to a set to not belonging to a set is gradual, and this smooth transition is characterized by membership functions. Membership functions give fuzzy sets flexibility in modeling commonly used linguistic expressions such as "the water is hot" or "the temperature is high". Such imprecision plays an important role in human thinking, particularly in the domains of pattern recognition, communication of information and abstraction. Fuzziness is due to the uncertain and imprecise nature of abstract thoughts and concepts, not to randomness of the constituent members of the set. A fuzzy set expresses the degree to which an element belongs to a set. Construction of a fuzzy set depends on (i) identification of a suitable universe of discourse and (ii) specification of an appropriate membership function. Specification of a membership function is subjective, i.e. membership functions defined by different persons for the same concept may vary considerably. In practice, when the universe of discourse X is a continuous space, we usually partition X into several fuzzy sets whose MFs cover X in a more or less uniform manner. These fuzzy sets usually carry names that conform to adjectives appearing in daily linguistic usage, such as large, medium or small, called linguistic values or linguistic labels; the variable so described is called a linguistic variable.

Common nomenclature:

1. Support: The support of a fuzzy set A is the set of all points x in X such that µ_A(x) > 0: Support(A) = {x | µ_A(x) > 0}.

2. Core: The core of a fuzzy set A is the set of all points x in X such that µ_A(x) = 1:

Core(A) = {x | µ_A(x) = 1}.

3. Normality: A fuzzy set A is normal if its core is nonempty, i.e. there is at least one point x ∈ X such that µ_A(x) = 1.

4. Crossover points: A crossover point of a fuzzy set A is a point x ∈ X at which µ_A(x) = 0.5: Crossover(A) = {x | µ_A(x) = 0.5}.

5. Fuzzy singleton: A fuzzy set whose support is a single point in X with µ_A(x) = 1 is called a fuzzy singleton.

6. α-cut: The α-cut of a fuzzy set A is the crisp set defined by Aα = {x | µ_A(x) ≥ α}. The strong α-cut is A'α = {x | µ_A(x) > α}. Hence the support and core of a fuzzy set A can be expressed as Support(A) = A'0 and Core(A) = A1, respectively.

7. Convexity: A fuzzy set A is convex if and only if for any x1, x2 ∈ X and any λ ∈ [0, 1]:

Equation (1): µ_A(λx1 + (1−λ)x2) ≥ min{µ_A(x1), µ_A(x2)}

Alternatively, A is convex if all its α-level sets are convex. A crisp set C in R^n is convex if and only if for any two points x1 ∈ C and x2 ∈ C, their combination λx1 + (1−λ)x2 is still in C, where λ ∈ [0, 1]; hence a convex crisp level set Aα is composed of a single line segment only. This convexity definition for fuzzy sets is not as strict as the common definition of convexity of a function f(x), namely f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2), which is a more stringent condition than equation (1).

8. Fuzzy numbers: A fuzzy number A is a fuzzy set on the real line R that satisfies the conditions for normality and convexity. Most fuzzy sets used in the literature satisfy these conditions, so fuzzy numbers are the most basic type of fuzzy set.

9. Bandwidth of a normal and convex fuzzy set: For a normal and convex fuzzy set, the bandwidth (or width) is defined as the distance between its two unique crossover points: width(A) = |x2 − x1|, where µ_A(x1) = µ_A(x2) = 0.5.

10. Symmetry: A fuzzy set A is symmetric if its MF is symmetric around a certain point x = c, namely µ_A(c + x) = µ_A(c − x) for all x ∈ X.

11. Open left, open right, closed: A fuzzy set A is open left if lim(x→−∞) µ_A(x) = 1 and lim(x→+∞) µ_A(x) = 0; open right if lim(x→−∞) µ_A(x) = 0 and lim(x→+∞) µ_A(x) = 1; and closed if lim(x→−∞) µ_A(x) = lim(x→+∞) µ_A(x) = 0.
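The nomenclature above is easy to operationalize for discrete fuzzy sets. A small Python sketch (the fuzzy set values are illustrative) computing support, core, crossover points, an α-cut, and a normality check:

```python
A = {1: 0.0, 2: 0.3, 3: 0.5, 4: 1.0, 5: 0.5, 6: 0.2}  # illustrative discrete fuzzy set

def support(A):
    return {x for x, m in A.items() if m > 0}

def core(A):
    return {x for x, m in A.items() if m == 1.0}

def crossover(A):
    return {x for x, m in A.items() if m == 0.5}

def alpha_cut(A, alpha, strong=False):
    """A_alpha = {x | mu(x) >= alpha}; the strong cut uses strict inequality."""
    return {x for x, m in A.items() if (m > alpha if strong else m >= alpha)}

def is_normal(A):
    return len(core(A)) > 0

print(support(A))        # {2, 3, 4, 5, 6}
print(core(A))           # {4}
print(crossover(A))      # {3, 5}
print(alpha_cut(A, 0.3)) # {2, 3, 4, 5}
print(is_normal(A))      # True
```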

Lecture-5

MF formulation and parameterization: A fuzzy set is completely characterized by its MF, and there are various ways of defining a membership function. In a discretized membership function, the membership of every pair is explicitly stated. A more convenient and concise way is to represent the membership function by some mathematical formula. We can also define membership functions through classes of parameterized functions; MFs of one dimension have a single input (several such classes are introduced in Lecture-7).

Fuzzy complement: A fuzzy complement operator is a continuous function N: [0, 1] → [0, 1] which meets the following axiomatic requirements:

N(0) = 1 and N(1) = 0 (boundary)
N(a) ≥ N(b) if a ≤ b (monotonicity)

Any function satisfying these requirements belongs to the general class of fuzzy complements. A violation of the boundary conditions would admit functions that do not conform to the ordinary complement for crisp sets. The monotonic decreasing requirement is essential, since we intuitively expect that an increase in the membership grade of a fuzzy set must result in a decrease in the membership grade of its complement. Another optional requirement imposes involution on a fuzzy complement: N(N(a)) = a (involution).

Sugeno's complement: One class of fuzzy complements is Sugeno's complement, defined by N_s(a) = (1 − a)/(1 + s·a), where s is a parameter greater than −1.

Yager's complement: Another class of fuzzy complements is Yager's complement, defined by N_w(a) = (1 − a^w)^(1/w), where w is a positive parameter. Both Sugeno's and Yager's complements are symmetric about the 45° line connecting (0, 0) and (1, 1).
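A quick Python sketch of both parameterized complement families, numerically checking the involution property (the parameter values s = 2 and w = 2 are illustrative):

```python
def sugeno_complement(a, s):
    """N_s(a) = (1 - a) / (1 + s*a), with s > -1."""
    return (1 - a) / (1 + s * a)

def yager_complement(a, w):
    """N_w(a) = (1 - a**w) ** (1/w), with w > 0."""
    return (1 - a ** w) ** (1.0 / w)

for a in [0.0, 0.25, 0.5, 0.75, 1.0]:
    ns = sugeno_complement(a, s=2.0)
    nw = yager_complement(a, w=2.0)
    # Involution: applying the complement twice recovers a.
    assert abs(sugeno_complement(ns, s=2.0) - a) < 1e-12
    assert abs(yager_complement(nw, w=2.0) - a) < 1e-12
    print(f"a={a:.2f}  N_s={ns:.3f}  N_w={nw:.3f}")
```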

Lecture-6

Fuzzy intersection and union: The intersection of two fuzzy sets A and B is specified in general by a function T: [0, 1] × [0, 1] → [0, 1] which aggregates two membership grades as follows: µ_A∩B(x) = T(µ_A(x), µ_B(x)) = µ_A(x) ⊛ µ_B(x), where ⊛ is a binary operator for the function T. This class of fuzzy intersection operators is usually referred to as T-norm operators, which must meet the following requirements.

T-norm: A T-norm operator is a two-place function T(·,·) satisfying:

T(0, 0) = 0; T(a, 1) = T(1, a) = a (boundary): imposes the correct generalization to crisp sets.
T(a, b) ≤ T(c, d) if a ≤ c and b ≤ d (monotonicity): a decrease in the membership value of A or B cannot produce an increase in the membership value of A ∩ B.
T(a, b) = T(b, a) (commutativity): the operator is indifferent to the order of the fuzzy sets being combined.
T(a, T(b, c)) = T(T(a, b), c) (associativity): allows us to take the intersection of any number of sets in any order of pairwise groupings.

Four T-norm operators:

Minimum: T_min(a, b) = min(a, b) = a ∧ b

Algebraic product: T_ap(a, b) = ab

Bounded product: T_bp(a, b) = 0 ∨ (a + b − 1) = max(0, a + b − 1)

Drastic product: T_dp(a, b) = a if b = 1; b if a = 1; 0 otherwise.

E.g., with the understanding that a and b are between 0 and 1, we can draw plots of these operators; let a = µ_A(x) = trapezoid(x; 3, 8, 12, 17) and b = µ_B(y) = trapezoid(y; 3, 8, 12, 17).

Fuzzy union: The union operator is specified in general by a function S: [0, 1] × [0, 1] → [0, 1]. In symbols: µ_A∪B(x) = S(µ_A(x), µ_B(x)) = µ_A(x) ⊕ µ_B(x), where ⊕ is a binary operator for the function S. This class of operators is referred to as T-conorm (or S-norm) operators. They must satisfy the following requirements.

T-conorm (S-norm): A T-conorm (S-norm) operator is a two-place function S(·,·) satisfying:

S(1, 1) = 1; S(a, 0) = S(0, a) = a (boundary)
S(a, b) ≤ S(c, d) if a ≤ c and b ≤ d (monotonicity)
S(a, b) = S(b, a) (commutativity)
S(a, S(b, c)) = S(S(a, b), c) (associativity)

Four T-conorm operators:

Maximum: S_max(a, b) = max(a, b) = a ∨ b
Algebraic sum: S_as(a, b) = a + b − ab
Bounded sum: S_bs(a, b) = 1 ∧ (a + b) = min(1, a + b)
Drastic sum: S_ds(a, b) = a if b = 0; b if a = 0; 1 otherwise.

Generalized De Morgan's law: T-norms T(·,·) and T-conorms S(·,·) are duals which support the generalization of De Morgan's law: T(a, b) = N(S(N(a), N(b))) and S(a, b) = N(T(N(a), N(b))). These can also be written as a ∧ b = N(N(a) ∨ N(b)) and a ∨ b = N(N(a) ∧ N(b)).
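The following Python sketch implements the four T-norm/T-conorm pairs and numerically verifies the generalized De Morgan duality with the standard complement N(a) = 1 − a:

```python
import itertools

def t_min(a, b):       return min(a, b)
def s_max(a, b):       return max(a, b)
def t_product(a, b):   return a * b
def s_algebraic(a, b): return a + b - a * b
def t_bounded(a, b):   return max(0.0, a + b - 1.0)
def s_bounded(a, b):   return min(1.0, a + b)
def t_drastic(a, b):   return a if b == 1 else b if a == 1 else 0.0
def s_drastic(a, b):   return a if b == 0 else b if a == 0 else 1.0

N = lambda a: 1.0 - a  # standard fuzzy complement

pairs = [(t_min, s_max), (t_product, s_algebraic),
         (t_bounded, s_bounded), (t_drastic, s_drastic)]

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for T, S in pairs:
    for a, b in itertools.product(grid, grid):
        # Generalized De Morgan: T(a, b) = N(S(N(a), N(b)))
        assert abs(T(a, b) - N(S(N(a), N(b)))) < 1e-12
print("De Morgan duality holds for all four T-norm/T-conorm pairs")
```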

Lecture-7

Triangular MFs: A triangular MF is specified by three parameters {a, b, c} with a < b < c:

triangle(x; a, b, c) = 0 for x ≤ a; (x − a)/(b − a) for a ≤ x ≤ b; (c − x)/(c − b) for b ≤ x ≤ c; 0 for c ≤ x.

Alternatively: triangle(x; a, b, c) = max(min((x − a)/(b − a), (c − x)/(c − b)), 0).

The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three corners of the underlying triangular MF.

Trapezoidal MFs: A trapezoidal MF is specified by four parameters {a, b, c, d}:

trapezoid(x; a, b, c, d) = 0 for x ≤ a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for b ≤ x ≤ c; (d − x)/(d − c) for c ≤ x ≤ d; 0 for d ≤ x.

Alternatively: trapezoid(x; a, b, c, d) = max(min((x − a)/(b − a), 1, (d − x)/(d − c)), 0).

The parameters {a, b, c, d} determine the x coordinates of the four corners of the underlying trapezoidal MF. When b = c, a trapezoidal MF reduces to a triangular MF.

Gaussian MFs: A Gaussian MF is specified by two parameters {c, σ}: gaussian(x; c, σ) = exp(−½((x − c)/σ)²). A Gaussian MF is determined completely by c and σ: c represents the MF's centre and σ determines the MF's width. E.g. gaussian(x; 50, 20).

Generalized bell MFs: A generalized bell MF (or bell MF) is specified by three parameters {a, b, c}: bell(x; a, b, c) = 1/(1 + |(x − c)/a|^(2b)), where the parameter b is usually positive; if b is negative the MF becomes an upside-down bell. Adjusting c and a varies the centre and width of the MF, and b controls the slope at the crossover points. Since the bell function has one more parameter than the Gaussian, it has one more degree of freedom to adjust the steepness at the crossover points.
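A compact Python sketch of the parameterized MFs defined above (the numeric parameters in the demo are illustrative):

```python
import math

def triangle(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

for x in range(0, 101, 25):
    print(x, round(trapezoid(x, 10, 20, 60, 95), 3),
             round(gaussian(x, 50, 20), 3),
             round(bell(x, 20, 4, 50), 3))
```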

Lecture-8

Sigmoidal MFs (used to specify asymmetrical membership functions): A sigmoidal MF is defined by sig(x; a, c) = 1/(1 + exp(−a(x − c))), where a controls the slope at the crossover point x = c. Depending on the sign of the parameter a, the sigmoidal MF is open right or open left: a positive gives open right, a negative gives open left. Sigmoidal functions are widely used as activation functions of artificial neural networks.

Closed and asymmetric MFs based on sigmoidal functions: These can be obtained by taking the difference of two sigmoidal functions (and also by taking the product of two sigmoidal functions), e.g. y1 − y2 with y1 = sig(x; 1, −5) and y2 = sig(x; 2, 5).

Left-Right MF: A left-right MF, or L-R MF, is specified by three parameters {c, α, β}:

LR(x; c, α, β) = F_L((c − x)/α) for x ≤ c; F_R((x − c)/β) for x ≥ c,

where F_L(x) and F_R(x) are monotonically decreasing functions defined on [0, ∞) with F_L(0) = F_R(0) = 1 and lim(x→∞) F_L(x) = lim(x→∞) F_R(x) = 0.

Any type of continuous probability distribution function can be used as an MF, provided that a set of parameters is given to specify the appropriate meaning of the MF.

MFs of two dimensions: MFs with two inputs, each in a different universe of discourse.

Cylindrical extension of a one-dimensional fuzzy set: If A is a fuzzy set in X, then its cylindrical extension in X × Y is a fuzzy set c(A) defined by c(A) = ∫(X×Y) µ_A(x)/(x, y).

(1.) Let A = {1, 3, 5} and B = {1, 3, 5}, so A × B = {(1,1), (1,3), (1,5), (3,1), (3,3), (3,5), (5,1), (5,3), (5,5)}. Define the crisp relations R: {(x, y) | y = x + 2} and S: {(x, y) | x < y}. The relation matrices are R = {(1, 3), (3, 5)} and S = {(1, 3), (1, 5), (3, 5)}.

R ∪ S: µ_R∪S(x, y) = max(µ_R(x, y), µ_S(x, y)) = S (since R ⊆ S)
R ∩ S: µ_R∩S(x, y) = min(µ_R(x, y), µ_S(x, y)) = R

Composing R on X × Y with S on Y × Z produces a relation on X × Z, (x, z) ∈ X × Z, as defined next.

Lecture-9

Max-min composition: T = R ∘ S, with µ_T(x, z) = max_y(min(µ_R(x, y), µ_S(y, z))). Continuing the example above:

R ∘ S(1, 1) = max(min(R(1, 1), S(1, 1)), min(R(1, 3), S(3, 1)), min(R(1, 5), S(5, 1))) = max(min(0, 0), min(1, 0), min(0, 0)) = max(0, 0, 0) = 0
R ∘ S(1, 3) = max(0, 0, 0) = 0
R ∘ S(1, 5) = max(0, 1, 0) = 1
Similarly R ∘ S(3, 1) = R ∘ S(3, 3) = R ∘ S(3, 5) = R ∘ S(5, 1) = R ∘ S(5, 3) = R ∘ S(5, 5) = 0.

Thus R ∘ S forms the relation matrix {(1, 5)}; also S ∘ R = {(1, 5)}, the same relation in this example.

(2.) A = {(x1, 0.2), (x2, 0.7), (x3, 0.4)}

B = {(y1, 0.5), (y2, 0.6)}

A × B, with µ_A×B(x, y) = min(µ_A(x), µ_B(y)):

          y1    y2
   x1    0.2   0.2
   x2    0.5   0.6
   x3    0.4   0.4

Let X = {x1, x2, x3}, Y = {y1, y2}, Z = {z1, z2, z3}, with R defined on X × Y and S defined on Y × Z:

R(x, y):        y1    y2
       x1      0.5   0.1
       x2      0.2   0.9
       x3      0.8   0.6

S(y, z):        z1    z2    z3
       y1      0.6   0.4   0.7
       y2      0.3   0.8   0.9

R ∘ S = max_y(min(µ_R(x, y), µ_S(y, z))):

R ∘ S(x1, z1) = max(min(R(x1, y1), S(y1, z1)), min(R(x1, y2), S(y2, z1))) = max(min(0.5, 0.6), min(0.1, 0.3)) = max(0.5, 0.1) = 0.5
R ∘ S(x1, z2) = max(min(R(x1, y1), S(y1, z2)), min(R(x1, y2), S(y2, z2))) = max(min(0.5, 0.4), min(0.1, 0.8)) = max(0.4, 0.1) = 0.4

R ∘ S:        z1    z2    z3
      x1     0.5   0.4   0.5
      x2     0.3   0.8   0.9
      x3     0.6   0.6   0.7

(3.) Let P = {P1, P2, P3, P4} (plants), D = {D1, D2, D3, D4} (diseases), and S = {S1, S2, S3, S4} (symptoms), with a fuzzy relation R defined on P × D and a fuzzy relation Q defined on D × S.
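A short Python sketch of max-min composition, reproducing the R ∘ S matrix of the X/Y/Z example above:

```python
# Relations as nested dicts: R[x][y] and S[y][z].
R = {"x1": {"y1": 0.5, "y2": 0.1},
     "x2": {"y1": 0.2, "y2": 0.9},
     "x3": {"y1": 0.8, "y2": 0.6}}
S = {"y1": {"z1": 0.6, "z2": 0.4, "z3": 0.7},
     "y2": {"z1": 0.3, "z2": 0.8, "z3": 0.9}}

def max_min_composition(R, S):
    """(R o S)(x, z) = max over y of min(R(x, y), S(y, z))."""
    zs = next(iter(S.values())).keys()
    return {x: {z: max(min(Rx[y], S[y][z]) for y in Rx) for z in zs}
            for x, Rx in R.items()}

for x, row in max_min_composition(R, S).items():
    print(x, row)
# x1 {'z1': 0.5, 'z2': 0.4, 'z3': 0.5}
# x2 {'z1': 0.3, 'z2': 0.8, 'z3': 0.9}
# x3 {'z1': 0.6, 'z2': 0.6, 'z3': 0.7}
```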

Lecture-10

**Find the association of the plants with the different symptoms using max-min composition: P → S = R ∘ Q, where R is the plant-disease relation and Q the disease-symptom relation.

Extension principle: The extension principle is used for extending crisp domains of mathematical expressions to fuzzy domains. It generalizes a common point-to-point mapping of a function f(·) to a mapping between fuzzy sets. Suppose f is a function from X to Y and A is a fuzzy set on X defined as A = µ_A(x1)/x1 + µ_A(x2)/x2 + ... + µ_A(xn)/xn. By the extension principle, the image of the fuzzy set A under the mapping f(·) can be expressed as a fuzzy set B: B = f(A) = µ_A(x1)/y1 + µ_A(x2)/y2 + ... + µ_A(xn)/yn, where yi = f(xi), i = 1, ..., n. If several xi map to the same yi, the maximum of their membership grades is taken.

Let A = 0.1/−2 + 0.4/−1 + 0.8/0 + 0.9/1 + 0.3/2 = {(−2, 0.1), (−1, 0.4), (0, 0.8), (1, 0.9), (2, 0.3)} and f(x) = x² − 3. Applying the extension principle:

B = 0.1/1 + 0.4/−2 + 0.8/−3 + 0.9/−2 + 0.3/1 = 0.8/−3 + (0.4 ∨ 0.9)/−2 + (0.1 ∨ 0.3)/1 = 0.8/−3 + 0.9/−2 + 0.3/1
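A minimal Python sketch of the extension principle on the example above; membership grades mapping to the same image are combined with max:

```python
A = {-2: 0.1, -1: 0.4, 0: 0.8, 1: 0.9, 2: 0.3}
f = lambda x: x ** 2 - 3

def extend(f, A):
    """Image of fuzzy set A under f; collisions are resolved by max."""
    B = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)
    return B

print(extend(f, A))  # {1: 0.3, -2: 0.9, -3: 0.8}
```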

Fuzzy relation:

Binary fuzzy relation: Let X and Y be two universes of discourse. Then R = {((x, y), µ_R(x, y)) | (x, y) ∈ X × Y} is a binary fuzzy relation in X × Y, and µ_R(x, y) is its two-dimensional membership function. Suppose X = {3, 4, 5}, Y = {3, 4, 5, 6, 7}, and R: "y is greater than x", with µ_R(x, y) = (y − x)/(x + y + 2) if y > x, and 0 if y ≤ x.

Relation matrix (values rounded):

R:          y=3    y=4    y=5    y=6    y=7
   x=3       0    0.111  0.200  0.273  0.333
   x=4       0      0    0.091  0.167  0.231
   x=5       0      0      0    0.077  0.143

Max-min composition: Let R1 and R2 be two fuzzy relations defined on X × Y and Y × Z, respectively. The max-min composition of R1 and R2 is a fuzzy set defined by R1 ∘ R2 = {[(x, z), max_y min(µ_R1(x, y), µ_R2(y, z))] | x ∈ X, y ∈ Y, z ∈ Z}, i.e. µ_R1∘R2(x, z) = max_y(min(µ_R1(x, y), µ_R2(y, z))). This is the same as matrix multiplication, except that the product is replaced by min and the sum by max.

Properties common to max-min composition and binary relations:

Associativity: R ∘ (S ∘ T) = (R ∘ S) ∘ T
Distributivity over union: R ∘ (S ∪ T) = (R ∘ S) ∪ (R ∘ T)
Weak distributivity over intersection: R ∘ (S ∩ T) ⊆ (R ∘ S) ∩ (R ∘ T)
Monotonicity: S ⊆ T ⟹ R ∘ S ⊆ R ∘ T
Max-product composition: µ_R1∘R2(x, z) = max_y[µ_R1(x, y) · µ_R2(y, z)]

Fuzzy logic: In crisp logic, the truth values acquired by propositions or predicates (a proposition is a statement which is either true or false but not both) are binary, namely true or false. In fuzzy logic, the truth values may be multi-valued, numerically lying in the interval [0, 1]. A fuzzy proposition P is a statement which acquires a fuzzy truth value T(P). In the simplest form, fuzzy propositions are associated with fuzzy sets: the fuzzy membership value associated with the fuzzy set A for P is treated as the fuzzy truth value of P, i.e. T(P) = µ_A(x).

Fuzzy connectives:

Negation: ¬P, with T(¬P) = 1 − T(P)
Disjunction: P ∨ Q, with T(P ∨ Q) = max(T(P), T(Q))
Conjunction: P ∧ Q, with T(P ∧ Q) = min(T(P), T(Q))
Implication: P ⇒ Q, with T(P ⇒ Q) = max(1 − T(P), T(Q))

Implication (⇒) represents an if-then statement.

An IF-THEN statement, "IF x is A THEN y is B", is equivalent to the relation R = (A × B) ∪ (A^c × Y).

Example: We have two fuzzy sets A and B on universes U1 and U2, where U1 and U2 are identical and have the integers 1 to 10 as elements: U1 = U2 = {1, 2, 3, 4, ..., 10}. Let A be "approximately 2" (a fuzzy set concentrated around 2) and B be "approximately 6" (concentrated around 6). To find "approximately 12" = 2 × 6, apply the extension principle to the arithmetic product: each pair (x, y) contributes the grade min(µ_A(x), µ_B(y)) to the product x·y, and grades mapping to the same product are combined by max. As a further exercise, given two fuzzy numbers A and B, find the arithmetic product A·B in the same way.

2) Suppose we have a universe of integers Y = {1, 2, 3, 4, 5}. We define the linguistic terms small and large as fuzzy sets mapped onto Y. Modify these two linguistic terms with hedges:

very small = small²
not very small = 1 − very small
not very small and not very, very large = (1 − very small) ∩ (1 − large⁴)
intensely small = INT(small)

Lecture-11

Linguistic variables: According to cognitive scientists, humans base their thinking primarily on conceptual patterns and mental images rather than on numerical computation. Humans also communicate in their own natural language by referring to previous mental images. Despite the vagueness and ambiguity of natural language, it is the most powerful form of conveying information that humans possess for any given problem or situation that requires solving or reasoning, and human communication in natural language has very little trouble achieving basic understanding. The conventional techniques for system analysis are intrinsically unsuited for dealing with humanistic systems, whose behavior is strongly influenced by human judgment, perception and emotion. This manifestation may be termed the principle of incompatibility: as the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes, until a threshold is reached beyond which precision and significance become almost mutually exclusive characteristics. This belief led Zadeh to propose the concept of linguistic variables. A linguistic variable differs from a numerical variable in that its values are not numbers but words or sentences in a natural language.

Definition: A linguistic variable is characterized by a quintuple (x, T(x), X, G, M) in which:

x is the name of the variable
T(x) is the term set: the set of its linguistic values or linguistic terms
X is the universe of discourse
G is the syntactic rule which generates the terms in T(x)
M is a semantic rule which associates with each linguistic value A its meaning M(A), where M(A) denotes a fuzzy set in X

The semantic rule defines the MF of each linguistic value in the term set.

Examples: If age is interpreted as a linguistic variable, then its term set is T(age) = {young, not young, not very young, ..., middle aged, not middle aged, ..., old, not old, very old, more or less old, ...}.

Each term in T(age) is characterized by a fuzzy set on a universe of discourse X. If age is interpreted as a numerical variable we may say age = 20; when it is interpreted as a linguistic variable we may say "age is young", meaning the linguistic value young is assigned to the variable age. Here age is the variable; young, old and middle aged are primary terms; and very, more or less, extremely are hedges.

Linguistic hedges: In linguistics, the fundamental atomic terms (primary terms) are often modified with adjectives or adverbs like very, low, slightly, more or less. These modifiers are termed linguistic hedges.

Connectives: and, or, either, neither.

Concentration and dilation: Let A be a linguistic value characterized by a fuzzy set with membership function µ_A(x). Then A^k is interpreted as a modified version of the original linguistic value, expressed as µ_(A^k)(x) = [µ_A(x)]^k. In particular, concentration is defined as CON(A) = A², used for "very", and dilation as DIL(A) = A^0.5, used for "more or less".

Lecture-12

Constructing MFs for composite linguistic terms: Let young and old be two linguistic values characterized by fuzzy sets with membership functions µ_young(x) and µ_old(x). Then:

more or less old = DIL(old) = (old)^0.5, so µ_(more or less old)(x) = √(µ_old(x))

not young and not old = ¬young ∩ ¬old, so µ(x) = min(1 − µ_young(x), 1 − µ_old(x))

young but not too young = young ∩ ¬(young)² (with the assumption that the meaning of "too" is the same as "very"), so µ(x) = min(µ_young(x), 1 − [µ_young(x)]²)

extremely old = CON(CON(CON(old))) = (((old)²)²)² = old^8, so µ(x) = [µ_old(x)]^8
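A Python sketch of these hedge constructions, using illustrative bell-shaped MFs for young and old (the particular parameters are assumptions, not from the notes):

```python
def bell(x, a, b, c):
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

# Illustrative primary terms over an age axis 0..100 (assumed parameters).
young = lambda x: bell(x, 20, 2, 0)
old   = lambda x: bell(x, 30, 3, 100)

more_or_less_old  = lambda x: old(x) ** 0.5                 # DIL(old)
not_young_not_old = lambda x: min(1 - young(x), 1 - old(x))
young_not_too     = lambda x: min(young(x), 1 - young(x) ** 2)
extremely_old     = lambda x: old(x) ** 8                   # CON(CON(CON(old)))

for age in [10, 30, 50, 70, 90]:
    print(age, round(more_or_less_old(age), 3), round(extremely_old(age), 3))
```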

Lecture-13

Contrast intensification: Another operation, one that reduces the fuzziness of a fuzzy set A, is contrast intensification, defined as INT(A) = 2A² for 0 ≤ µ_A(x) ≤ 0.5, and ¬2(¬A)² for 0.5 ≤ µ_A(x) ≤ 1. It increases the membership values that are above 0.5 and diminishes those that are below this point.

Orthogonality: A term set T = {t1, ..., tn} of a linguistic variable x on the universe X is orthogonal if it fulfills the property Σ_i µ_ti(x) = 1, ∀x ∈ X,

where the ti's are convex and normal fuzzy sets defined on X and these fuzzy sets make up the term set T. For the MFs in a term set to be intuitively reasonable, the orthogonality requirement has to be followed to some extent.

Fuzzy if-then rules: A fuzzy if-then rule (also known as a fuzzy rule, fuzzy implication, or fuzzy conditional statement) assumes the form: IF x is A THEN y is B, where A and B are linguistic values defined by fuzzy sets on the universes of discourse X and Y. "IF x is A THEN y is B" is abbreviated A → B. The expression describes a relation between the two variables x and y; one may therefore say that a fuzzy if-then rule can be defined as a binary relation R on the product space X × Y. Generally speaking, there are two ways to interpret the fuzzy rule A → B:

1. If we interpret A → B as "A coupled with B", then R = A → B = A × B = ∫(X×Y) µ_A(x) ⊛ µ_B(y) / (x, y), where ⊛ is a T-norm operator.

2. If A → B is interpreted as "A entails B", then it can be written as four different formulas:

Material implication: R = A → B = ¬A ∪ B
Propositional calculus: R = A → B = ¬A ∪ (A ∩ B)
Extended propositional calculus: R = A → B = (¬A ∩ ¬B) ∪ B
Generalized modus ponens: µ_R(x, y) = sup{c | µ_A(x) ⊛ c ≤ µ_B(y) and 0 ≤ c ≤ 1} (equation A), where R = A → B and ⊛ is a T-norm operator.

Based on these two interpretations and the various T-norm and T-conorm operators, a number of qualified methods can be formulated to calculate the fuzzy relation R = A → B. R can be viewed as a fuzzy set with a two-dimensional MF: µ_R(x, y) = f(µ_A(x), µ_B(y)) = f(a, b), where a = µ_A(x), b = µ_B(y), and f is a fuzzy implication function, which transforms the membership grades of x in A and y in B into those of (x, y) in A → B.

For the first interpretation, "A coupled with B", four different fuzzy relations A → B result from employing four different T-norm operators:

1. Proposed by Mamdani: R_m = A × B = ∫(X×Y) µ_A(x) ∧ µ_B(y) / (x, y), i.e. f_m(a, b) = a ∧ b (min operator used).

2. Proposed by Larsen: R_p = A × B = ∫(X×Y) µ_A(x) · µ_B(y) / (x, y), i.e. f_p(a, b) = ab (algebraic product used for the conjunction).

3. Bounded product: R_bp = ∫(X×Y) 0 ∨ (µ_A(x) + µ_B(y) − 1) / (x, y), i.e. f_bp(a, b) = 0 ∨ (a + b − 1).

4. Drastic product: R_dp = ∫(X×Y) µ_A(x) ⊗ µ_B(y) / (x, y), i.e. f_dp(a, b) = a if b = 1; b if a = 1; 0 otherwise.

For the second interpretation, "A entails B":

Lecture-14

Zadeh's arithmetic rule: R_a = ¬A ⊕ B = ∫(X×Y) 1 ∧ (1 − µ_A(x) + µ_B(y)) / (x, y), i.e. f_a(a, b) = 1 ∧ (1 − a + b) (bounded sum used for the union operator).

Zadeh's max-min rule: R_mm = (A × B) ∪ ¬A = ∫(X×Y) (µ_A(x) ∧ µ_B(y)) ∨ (1 − µ_A(x)) / (x, y), i.e. f_mm(a, b) = (a ∧ b) ∨ (1 − a) (min for intersection and max for union).

Boolean fuzzy implication (using max for union): R_s = ¬A ∪ B = ∫(X×Y) (1 − µ_A(x)) ∨ µ_B(y) / (x, y), i.e. f_s(a, b) = (1 − a) ∨ b.

Goguen's fuzzy implication: R = ∫(X×Y) µ_A(x) ≺ µ_B(y) / (x, y), where a ≺ b = 1 if a ≤ b, and b/a if a > b. It follows "A entails B" by using the algebraic product for the T-norm operator.

Rule-based systems: The most common way to represent human knowledge is to put it into natural-language expressions of the type IF premise (antecedent) THEN conclusion (consequent), referred to as the IF-THEN rule-based form. This form of knowledge representation is characterized as shallow knowledge. The linguistic variables can be naturally represented by fuzzy sets and logical connectives on these sets.

Canonical rule forms: In general, three forms exist for any linguistic variable:

Assignment statements
Conditional statements
Unconditional statements

Assignment: x = large; temperature = hot.

Conditional statements: IF the tomato is red THEN the tomato is ripe. IF x is very hot THEN stop. IF x is very large THEN y is small ELSE y is not small.

Unconditional statements: Stop. Divide by x. Turn the pressure higher.

The assignment statement restricts the value of a variable to a specific quantity. An unconditional statement may be thought of as a conditional restriction whose IF-clause condition is the universe of discourse of the input condition, which is always true: IF any condition THEN turn the pressure higher. Hence a rule base can be described using a collection of conditional restriction statements. These restrictions are usually expressed in terms of vague natural-language words that can be modeled using fuzzy mathematics.

Lecture-15

Conditional forms for a fuzzy rule-based system:

Rule 1: IF condition C1 THEN restriction R1
Rule 2: IF condition C2 THEN restriction R2
...
Rule r: IF condition Cr THEN restriction Rr

Decomposition of compound rules: Compound rules may take forms such as IF ... THEN ..., IF ... THEN ... ELSE ..., and IF ... AND ... THEN .... By using the basic properties and operations defined for fuzzy sets, any compound rule structure may be reduced to a number of simple canonical rules. These rules are based on natural-language representations and models which are themselves based on fuzzy sets and fuzzy logic. The most common techniques for decomposition are:

Multiple conjunctive antecedents: IF x is A1 AND A2 ... AND AL THEN y is Bs. Defining a new fuzzy subset As as As = A1 ∩ A2 ∩ ... ∩ AL, expressed by the membership function µ_As(x) = min(µ_A1(x), µ_A2(x), ..., µ_AL(x)), the compound rule may be rewritten as IF As THEN Bs.

Multiple disjunctive antecedents: IF x is A1 OR x is A2 ... OR x is AL THEN y is Bs could be rewritten as IF x is As THEN y is Bs, where the fuzzy set As is defined as As = A1 ∪ A2 ∪ ... ∪ AL, with µ_As(x) = max(µ_A1(x), µ_A2(x), ..., µ_AL(x)).

Conditional statements:

(1) IF A1 THEN (B1 ELSE B2) may be decomposed as: IF A1 THEN B1 OR IF NOT A1 THEN B2.

(2) IF A1 THEN B1 UNLESS A2 may be decomposed as: IF A1 THEN B1 OR IF A2 THEN NOT B1.

(3) IF A1 THEN (B1 ELSE IF A2 THEN (B2)) may be decomposed as: IF A1 THEN B1 OR IF NOT A1 AND A2 THEN B2.

Nested IF-THEN rule: IF A1 THEN (IF A2 THEN (B1)) is equivalent to IF A1 AND A2 THEN B1.

Aggregation of fuzzy rules: Most rule-based systems include more than one rule. The process of finding the overall consequent (conclusion) from the individual consequents contributed by each rule in the rule base is known as aggregation of rules.

Two simple extreme cases:

a. Conjunctive system of rules: In the case where the rules must be jointly satisfied, the rules are connected by AND connectives. The aggregated output is the fuzzy intersection of the individual rule consequents yi, i = 1, 2, ..., r: y = y1 and y2 and ... and yr, i.e. y = y1 ∩ y2 ∩ ... ∩ yr, which is defined by the membership function µ_y(y) = min(µ_y1(y), µ_y2(y), ..., µ_yr(y)).

b. Disjunctive system of rules: For the case where at least one rule must be satisfied, the rules are connected by OR connectives. The aggregated output is the fuzzy union of the individual rule contributions: y = y1 or y2 or ... or yr, i.e. y = y1 ∪ y2 ∪ ... ∪ yr, which is defined by the membership function µ_y(y) = max(µ_y1(y), µ_y2(y), ..., µ_yr(y)).

Lecture-17

Defuzzification: In many situations, for a system whose output is fuzzy, it is easier to take a crisp decision if the output is represented as a single scalar quantity. The conversion of a fuzzy set to a single crisp value is called defuzzification.

Centroid method: Also known as the centre of gravity or centre of area method, it obtains the centre of the area occupied by the fuzzy set: x* = ∫ µ(x) x dx / ∫ µ(x) dx in the continuous case, and x* = Σ(i=1..n) µ(xi) xi / Σ(i=1..n) µ(xi) in the discrete case, where n is the number of elements and xi is the i-th element.
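A small Python sketch of discrete centroid defuzzification (the sample fuzzy set is illustrative):

```python
def centroid(elements):
    """Centre-of-gravity defuzzification of a discrete fuzzy set
    given as (x, mu) pairs: x* = sum(mu*x) / sum(mu)."""
    num = sum(mu * x for x, mu in elements)
    den = sum(mu for _, mu in elements)
    return num / den

out = [(1, 0.2), (2, 0.5), (3, 1.0), (4, 0.5), (5, 0.1)]
print(centroid(out))  # ~2.91
```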

Compositional rule of inference: A function y = f(x) regulates the relation between x and y; given x = a, we infer y = f(a) = b. This can be generalized to the case where a is an interval: to find the resulting interval y = b corresponding to the interval a, a cylindrical extension of a is constructed and its intersection I with the graph of f is found; the projection of I onto the y-axis gives the interval y = b.

Going further, let F be a fuzzy relation on X × Y and A a fuzzy set of X. To find the resulting fuzzy set B on Y, construct the cylindrical extension c(A) of A with base A; the intersection of c(A) with F then plays the role of the intersection I above: µ_(c(A)∩F)(x, y) = min(µ_c(A)(x, y), µ_F(x, y)) = min(µ_A(x), µ_F(x, y)). By projecting c(A) ∩ F onto the y-axis we have µ_B(y) = max_x min[µ_A(x), µ_F(x, y)].

This reduces to the max-min composition of two relation matrices: A (a unary fuzzy relation) and F (a binary relation), B = A ∘ F. The extension principle is also a special case of the compositional rule of inference. Using the compositional rule we can formalize an inference procedure upon a set of fuzzy if-then rules; this inference procedure is generally called approximate reasoning.

Lecture-18

Fuzzy reasoning:

Premise (fact): x is A
Premise (rule): IF x is A THEN y is B
Consequence: y is B (modus ponens)

If instead the fact is x is A' (approximately A), the conclusion is y is B': this is approximate reasoning, or generalized modus ponens (GMP).

Computational aspects of fuzzy reasoning:

Single rule with a single antecedent: B' = A' ∘ (A → B), so µ_B'(y) = [max_x min(µ_A'(x), µ_A(x))] ∧ µ_B(y) = w ∧ µ_B(y).

Here we first find the degree of match w = max_x min(µ_A'(x), µ_A(x)); the MF of the resulting B' is then equal to the MF of B clipped by w. Intuitively, the degree of match is a measure of belief, and this measure of belief is propagated by the if-then rule. Graphically, GMP using Mamdani's fuzzy implication and max-min composition amounts to clipping the consequent MF at the height w.

Single rule with multiple antecedents:

Premise (fact): x is A' and y is B'
Premise (rule): IF x is A and y is B THEN z is C
Consequence: z is C'

Using Mamdani's implication, the rule is R_m(A, B, C) = (A × B) → C, and the consequent is C' = (A' × B') ∘ ((A × B) → C), giving

µ_C'(z) = [max_x min(µ_A'(x), µ_A(x))] ∧ [max_y min(µ_B'(y), µ_B(y))] ∧ µ_C(z) = w1 ∧ w2 ∧ µ_C(z),

where w1 and w2 are the degrees of match of the two antecedents; w = w1 ∧ w2 is called the firing strength or degree of fulfillment of the rule.

Lecture-19

Multiple rules with multiple antecedents: The interpretation of multiple rules is usually taken as the union of the fuzzy relations corresponding to the individual fuzzy rules. Therefore, for a GMP problem:

Premise 1 (fact): x is A' and y is B'
Premise 2 (rule 1): IF x is A1 and y is B1 THEN z is C1
Premise 3 (rule 2): IF x is A2 and y is B2 THEN z is C2
Consequence: z is C'

With R1 = (A1 × B1) → C1 and R2 = (A2 × B2) → C2, the overall consequent is

C' = (A' × B') ∘ (R1 ∪ R2) = [(A' × B') ∘ R1] ∪ [(A' × B') ∘ R2] = C1' ∪ C2',

where C1' and C2' are the inferred fuzzy sets for rules 1 and 2, respectively.

Fuzzy tolerance and equivalence relations: A fuzzy relation R on a single universe X is a relation from X to X. It is a fuzzy equivalence relation if all three of the following properties hold for its matrix relation:

Reflexivity: µ_R(xi, xi) = 1
Symmetry: µ_R(xi, xj) = µ_R(xj, xi)
Transitivity: µ_R(xi, xj) = λ1 and µ_R(xj, xk) = λ2 ⟹ µ_R(xi, xk) = λ, where λ ≥ min(λ1, λ2)

Transitivity says that a shorter chain implies a stronger relation: in general, the strength of the link between two elements must be greater than or equal to the strength of any indirect chain involving other elements.

Tolerance: A tolerance relation R on a universe X is a relation that exhibits only the properties of reflexivity and symmetry. A tolerance relation can be reformed into an equivalence relation by at most (n − 1) compositions with itself, where n is the cardinal number of the set defining R.

Worked example: Given a tolerance relation R1 that is reflexive and symmetric but fails transitivity, compose it with itself: compute R1 ∘ R1, check the max-min transitivity condition µ_R(xi, xk) ≥ max_j min(µ_R(xi, xj), µ_R(xj, xk)), and repeat (at most n − 1 times) until the condition holds for all i, k; the result is a fuzzy equivalence relation.

Lecture-20

Value assignment: Where do the membership values contained in a relation come from? There are many different ways:

1. Cartesian product
2. Closed-form expression
3. Lookup table
4. Linguistic rules of knowledge
5. Classification
6. Similarity methods in data manipulation

1. The Cartesian product of two or more fuzzy sets.
2. Through simple observation of a physical system: for a given set of inputs we observe the process yielding a set of outputs. If there is no variation between specific pairs of inputs and outputs, the process may be modeled with a crisp relation, expressed as Y = f(X), where X is a vector of inputs and Y is a vector of outputs. Such expressions are termed closed-form expressions.
3. If some variability exists, membership values on the interval [0, 1] may lead us to develop a fuzzy relation from a lookup table.
4. Fuzzy relations can also be assembled from linguistic knowledge expressed as if-then or if-then-else rules; such knowledge may come from experts or from polls.
5. Relations also arise from the notion of classification, where issues associated with similarity are essential to determining relationships among patterns or clusters of data.
6. Similarity methods in data manipulation: the most prevalent methods; these are actually a family of procedures collectively known as similarity methods.

Cosine amplitude: Consider a collection of data samples, n data samples in particular. If these data samples are collected they form a data array X = {X1, X2, ..., Xn}. Each element Xi in the data array X is itself a vector of length m, that is, Xi = {xi1, xi2, ..., xim}. Each data sample can be thought of as a point in m-dimensional space, where each point requires m coordinates for a complete description. Each relation value rij results from a pairwise comparison of data sample Xi with data sample Xj, the strength of the relationship being given by the membership value

rij = |Σ(k=1..m) xik·xjk| / √( (Σ(k=1..m) xik²)(Σ(k=1..m) xjk²) ), i, j = 1, 2, ..., n.

The resulting n × n matrix is reflexive and symmetric, hence a tolerance relation. As the formula suggests, this method is related to the dot product of the cosine function: when two vectors are most similar, their normalized dot product approaches unity; when two vectors are at right angles to one another, their dot product is zero.

Q. Find which regions suffered similarly in an earthquake. A survey of damaged buildings is made for the purpose of assessing payouts from the insurance companies to building owners. Five regions X1, ..., X5 are each described by three damage ratios:

                            X1    X2    X3    X4    X5
Ratio with no damage        0.3   0.2   0.1   0.7   0.4
Ratio with medium damage    0.6   0.4   0.6   0.2   0.6
Ratio with serious damage   0.1   0.4   0.5   0.1   0

Express the similarity of damage between each pair of regions using rij. E.g.: r12 = |(0.3)(0.2) + (0.6)(0.4) + (0.1)(0.4)| / √((0.3² + 0.6² + 0.1²)(0.2² + 0.4² + 0.4²)) = 0.34/0.407 ≈ 0.836.

Max-min method: rij = Σ(k=1..m) min(xik, xjk) / Σ(k=1..m) max(xik, xjk), where i, j = 1, 2, ..., n. E.g.: r12 = (min(0.3, 0.2) + min(0.6, 0.4) + min(0.1, 0.4)) / (max(0.3, 0.2) + max(0.6, 0.4) + max(0.1, 0.4)) = 0.7/1.3 ≈ 0.538.

For the simple implication IF x is A THEN y is B, the relation is R = (A × B) ∪ (A^c × Y), with membership function µ_R(x, y) = max(min(µ_A(x), µ_B(y)), 1 − µ_A(x)).

IF-THEN-ELSE (compound implication): IF x is A THEN y is B ELSE y is C. The relation R is equivalent to R = (A × B) ∪ (A^c × C), with membership function µ_R(x, y) = max(min(µ_A(x), µ_B(y)), min(1 − µ_A(x), µ_C(y))).
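A Python sketch of both similarity methods applied to the earthquake survey data above; each region is represented by its column of three damage ratios:

```python
import math

# Columns of the survey table: region -> (no, medium, serious) damage ratios.
regions = {
    "X1": (0.3, 0.6, 0.1), "X2": (0.2, 0.4, 0.4), "X3": (0.1, 0.6, 0.5),
    "X4": (0.7, 0.2, 0.1), "X5": (0.4, 0.6, 0.0),
}

def cosine_amplitude(xi, xj):
    num = abs(sum(a * b for a, b in zip(xi, xj)))
    den = math.sqrt(sum(a * a for a in xi) * sum(b * b for b in xj))
    return num / den

def max_min(xi, xj):
    return sum(min(a, b) for a, b in zip(xi, xj)) / \
           sum(max(a, b) for a, b in zip(xi, xj))

print(round(cosine_amplitude(regions["X1"], regions["X2"]), 3))  # 0.836
print(round(max_min(regions["X1"], regions["X2"]), 3))           # 0.538
```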

Lecture-21

Fuzzy inference: Also referred to as approximate reasoning, fuzzy inference refers to the computational procedures used for evaluating linguistic descriptions. The two important inferring procedures are:

1. Generalized Modus Ponens (GMP)
2. Generalized Modus Tollens (GMT)

GMP is formally stated as:

1. Rule: IF x is A THEN y is B
2. Fact: x is A'
3. Conclusion: y is B'

Here A, B, A', B' are fuzzy terms; 1 and 2 are analytically known and 3 is unknown. To compute the membership function of B', the max-min composition of the fuzzy set A' with R(x, y), the implication relation of the IF-THEN rule, is used: B' = A' ∘ R(x, y), where ∘ is the max-min composition and R(x, y) represents the implication relation (IF-THEN). In terms of membership functions: µ_B'(y) = max_x(min(µ_A'(x), µ_R(x, y))), where µ_A'(x) is the membership function of A' and µ_R(x, y) is the membership function of the implication relation.

GMT is formally stated as:

1. Rule: IF x is A THEN y is B
2. Fact: y is B'
3. Conclusion: x is A'

Here A' = B' ∘ R(x, y); in terms of membership functions, µ_A'(x) = max_y(min(µ_B'(y), µ_R(x, y))).

Fact: x is A'. Rule: IF x is A THEN y is B. Consequence: y is B', where B' = A' ∘ R = A' ∘ (A → B). The inference proceeds in four steps:

Degree of compatibility: Compare the known facts with the rule antecedents to find the degree of compatibility with respect to each antecedent MF.

Firing strength: Combine the degrees of compatibility with respect to the antecedent MFs in a rule, using fuzzy AND or OR operators, to form a firing strength.

Qualified (induced) consequent MF: Apply the firing strength to the consequent MF of the rule to generate a qualified consequent MF.

Overall output MF: Aggregate all qualified consequent MFs to obtain the overall output MF.

Q.3: Apply the modus ponens rule to deduce "rotation is quite slow", given:

1. IF the temperature is high THEN the rotation is slow.
2. The temperature is very high.

Let H (high), V (very high), S (slow) and Q (quite slow) denote the associated fuzzy sets, as follows, for X = {30, 40, 50, 60, 70, 80, 90, 100},

where X is the set of temperatures, and Y = {10, 20, 30, 40, 50, 60}, where Y is the rotation in revolutions per minute:

H = {(70, 1), (80, 1), (90, 0.3)}
V = {(90, 0.9), (100, 1)}
S = {(10, 1), (20, 0.8)}
Q = {(30, 0.8), (40, 1), (50, 0.6)}

1. IF x is H THEN y is S: R(x, y) = (H × S) ∪ (H^c × Y).
2. Given x is V: by GMP, the conclusion is Q = V ∘ R(x, y) = V ∘ ((H × S) ∪ (H^c × Y)).

Lecture-22

*Find the appropriate voltage for this temperature using max-product composition.

In a computer system there is a relationship between the CPU board temperature and the power supply voltage. Consider the following relation: if the temperature (in degrees Fahrenheit) is high, then the power supply voltage will drop, i.e. become low. Let A = "temperature is high" and B = "voltage is low"; A → B = "if the temperature is high then the voltage will be low". Appropriate membership functions for these two variables are given as discrete fuzzy sets A and B over their respective universes.

a. Find A → B.
b. Suppose we consider another temperature set, A' = "temperature is very high" (also a discrete fuzzy set); find the induced voltage set by max-product composition.

1. Given P: "Mary is efficient" with T(P) = 0.8, and Q: "Ram is efficient" with T(Q) = 0.65:

I. Mary is not efficient: T(¬P) = 1 − T(P) = 1 − 0.8 = 0.2.
II. Mary is efficient and so is Ram: T(P ∧ Q) = min(T(P), T(Q)) = min(0.8, 0.65) = 0.65.
III. Either Mary or Ram is efficient: T(P ∨ Q) = max(T(P), T(Q)) = max(0.8, 0.65) = 0.8.

IV. If Mary is efficient then so is Ram: T(P ⇒ Q) = max(1 − T(P), T(Q)) = max(1 − 0.8, 0.65) = max(0.2, 0.65) = 0.65.

2. Let X = {a, b, c, d}, Y = {1, 2, 3, 4}, and

A = {(a, 0), (b, 0.8), (c, 0.6), (d, 1)}
B = {(1, 0.2), (2, 1), (3, 0.8), (4, 0)}
C = {(1, 0), (2, 0.4), (3, 1), (4, 0.8)}

1.) IF x is A THEN y is B: R = (A × B) ∪ (A^c × Y), where µ_A×B(x, y) = min(µ_A(x), µ_B(y)) and µ_(A^c×Y)(x, y) = 1 − µ_A(x). Here A^c = {(a, 1), (b, 0.2), (c, 0.4), (d, 0)}, and taking the element-wise max of the two relation matrices gives

R:        1     2     3     4
    a    1.0   1.0   1.0   1.0
    b    0.2   0.8   0.8   0.2
    c    0.4   0.6   0.6   0.4
    d    0.2   1.0   0.8   0.0

2.) IF x is A THEN y is B ELSE y is C: R = (A × B) ∪ (A^c × C), where µ_(A^c×C)(x, y) = min(1 − µ_A(x), µ_C(y)). This gives

R:        1     2     3     4
    a    0.0   0.4   1.0   0.8
    b    0.2   0.8   0.8   0.2
    c    0.2   0.6   0.6   0.4
    d    0.2   1.0   0.8   0.0

Lecture-23

Fuzzy inference system: The fuzzy inference system is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning, and it has found application in a wide variety of fields. Because of its multidisciplinary nature, the fuzzy inference system is known by many other names, such as fuzzy rule-based system, fuzzy expert system, fuzzy model, fuzzy associative memory, fuzzy logic controller, or simply fuzzy system.

A fuzzy inference system consists of:

1. A rule base, which contains a selection of fuzzy rules.
2. A database, which defines the membership functions used in the fuzzy rules.
3. A reasoning mechanism, which performs the inference procedure upon the rules and given facts to derive a reasonable output or conclusion.

A fuzzy inference system can take either fuzzy inputs or crisp inputs, but the outputs it produces are almost always fuzzy sets. When a crisp output is necessary, especially where the fuzzy inference system is used as a controller, defuzzification is used to extract the crisp value that best represents the fuzzy set. With crisp inputs and outputs, a fuzzy inference system implements a nonlinear mapping from its input space to its output space. This mapping is accomplished by a number of fuzzy if-then rules, each of which describes the local behavior of the mapping.

There are different types of fuzzy inference systems, or fuzzy controllers. In general, the principal design elements in a general fuzzy logic control system are as follows:

1. Fuzzification strategies and interpretation of a fuzzification operator, or fuzzifier.

2. Knowledge base:
a. Discretization/normalization of the universe of discourse.
b. Fuzzy partition of the input and output spaces.
c. Completeness of the partition.
d. Choice of the membership functions of the primary fuzzy sets.
3. Rule base:
a. Choice of process-state (input) variables and control (output) variables.
b. Source of derivation of the fuzzy control rules.
c. Consistency, interactivity and completeness of the fuzzy control rules.
4. Decision making:
a. Definition of fuzzy implication.
b. Interpretation of the sentence connective AND.
c. Interpretation of the sentence connective OR.
d. Inference mechanism.
5. Defuzzification strategies and interpretation of the different defuzzification operators.

Lecture-24

MAMDANI FUZZY MODELS: The Mamdani fuzzy inference system was proposed as the first attempt to control a steam engine and boiler combination by a set of linguistic control rules obtained from experienced human operators.

Two-rule Mamdani inference system: Two crisp inputs x and y, one overall output z. Let min and max be adopted as the T-norm (fuzzy intersection) and T-conorm (fuzzy union) operators, and let max-min composition be used for finding the relation.

For a two-input system, four cases arise:

1. The inputs to the system are crisp values and we use the max-min inference method.
2. The inputs to the system are crisp values and we use the max-product inference method.
3. The inputs to the system are represented by fuzzy sets and we use the max-min inference method.
4. The inputs to the system are represented by fuzzy sets and we use the max-product inference method.

Case 1: The inputs x1 and x2 are crisp values, with a rule-based system of the form: IF x1 is A1^k AND x2 is A2^k THEN y^k is B^k. For crisp inputs, the membership functions of the inputs x1 and x2 are described by Dirac delta functions, µ(x1) = δ(x1 − input(i)) and µ(x2) = δ(x2 − input(j)). The firing strength of the k-th rule is w_k = min(µ_A1^k(input(i)), µ_A2^k(input(j))), and the aggregated output is µ_B'(y) = max_k(w_k ∧ µ_B^k(y)), k = 1, 2, ..., r.

Case 2: Max-product (correlation product) inference: µ_B'(y) = max_k(µ_A1^k(input(i)) · µ_A2^k(input(j)) · µ_B^k(y)), i.e. the consequent MF is scaled by the firing strength instead of clipped.

Case 3: input(i) and input(j) are fuzzy variables described by fuzzy membership functions. The aggregated output by Mamdani implication is µ_B'(y) = max_k(w_k ∧ µ_B^k(y)), where the firing strengths w_k are obtained by composing the fuzzy inputs with the antecedent MFs.

Case 4: input(i) and input(j) are fuzzy variables and the inference method is the correlation product method: µ_B'(y) = max_k(w_k · µ_B^k(y)).
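A minimal Python sketch of case 1 (crisp inputs, max-min inference) for a two-rule Mamdani system; the triangular MFs and rule definitions are illustrative assumptions, not taken from the notes:

```python
def triangle(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Illustrative antecedent/consequent MFs for rules k = 1, 2 (assumed).
A1 = {1: lambda x: triangle(x, 0, 2, 4),  2: lambda x: triangle(x, 2, 4, 6)}
A2 = {1: lambda y: triangle(y, 0, 3, 6),  2: lambda y: triangle(y, 3, 6, 9)}
B  = {1: lambda z: triangle(z, 0, 5, 10), 2: lambda z: triangle(z, 5, 10, 15)}

def mamdani_output(x1, x2, z):
    """Aggregated output MF at z: max over rules of (firing strength AND mu_Bk(z))."""
    out = 0.0
    for k in (1, 2):
        w = min(A1[k](x1), A2[k](x2))      # firing strength, min as fuzzy AND
        out = max(out, min(w, B[k](z)))    # clip consequent, aggregate with max
    return out

# Sample the aggregated output fuzzy set over the z universe.
zs = [z * 0.5 for z in range(0, 31)]
mu = [mamdani_output(3.0, 4.0, z) for z in zs]
print(max(mu))  # height of the aggregated output fuzzy set
```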

In mechanics, the energy of a moving body is called kinetic energy. If an object of mass m (kilograms) is moving with a velocity v (metres/second), then its kinetic energy K (in joules) is given by K = ½mv². Model the mass and the velocity as inputs to a system and the energy as the output, observe the system for a while, and deduce the following two disjunctive rules by inference:

Rule 1: IF x1 is A1^1 (small mass) AND x2 is A2^1 (high velocity) THEN y is B^1 (medium energy).
Rule 2: IF x1 is A1^2 (large mass) OR x2 is A2^2 (medium velocity) THEN y is B^2 (high energy).

Lecture-25

Defuzzification methods: Defuzzification is the conversion of a fuzzy quantity to a precise quantity. The output of a fuzzy process may be the logical union of two or more membership functions defined on the universe of discourse of the output variable.

1. Max membership principle: Also known as the height method, it chooses the point z* at which the output MF is maximal: µ_C(z*) ≥ µ_C(z) for all z ∈ Z.

2. Centroid method: Also known as the centre of area or centre of gravity method: z* = ∫ µ_C(z) z dz / ∫ µ_C(z) dz.

3. Weighted average method: This method is valid only for symmetrical output membership functions: z* = Σ µ_C(z̄) z̄ / Σ µ_C(z̄), where z̄ denotes the centroid of each symmetric membership function. The weighted average method is formed by weighting each membership function in the output by its respective maximum membership value.

Example: z* = (a·µ(a) + b·µ(b)) / (µ(a) + µ(b)). Since the method is restricted to symmetrical membership functions, the values a and b are the maxima of their respective shapes.

4. Mean-max membership: This method is closely related to the first method, except that the location of the maximum membership can be non-unique, i.e. the maximum membership can be a plateau rather than a single point: z* = (a + b)/2, where a and b are the endpoints of the plateau.

5. Centre of sums: This is faster than many defuzzification methods in use. The process involves the algebraic sum of the individual output fuzzy sets, say C1 and C2, instead of their union; one drawback is that intersecting areas are added twice. z* = ∫ z Σ_k µ_Ck(z) dz / ∫ Σ_k µ_Ck(z) dz. This method is similar to the weighted average method, except that in the centre of sums method the weights are the areas of the respective membership functions, whereas in the weighted average method the weights are individual membership values.

6. Bisector of area: A vertical line z_BOA partitions the region between z = α, z = β and the curve µ_C(z) into two regions of equal area: ∫(α..z_BOA) µ_C(z) dz = ∫(z_BOA..β) µ_C(z) dz, where α = min{z | z ∈ Z}, β = max{z | z ∈ Z}, and Z is the universe of discourse.

7. Smallest of maximum: z_SOM is the minimum (in terms of magnitude) of the maximizing z.

8. Largest of maximum: z_LOM is the maximum (in terms of magnitude) of the maximizing z.

Methods 7 and 8 are not often used.
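A Python sketch comparing several of these defuzzifiers on one sampled output fuzzy set (the set itself is illustrative; the bisector here is a discrete approximation):

```python
zs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
mu = [0.0, 0.3, 0.7, 1.0, 1.0, 0.6, 0.4, 0.2, 0.0]  # illustrative output set

def mean_max(zs, mu):
    peak = max(mu)
    maximizers = [z for z, m in zip(zs, mu) if m == peak]
    return sum(maximizers) / len(maximizers)

def som(zs, mu):  # smallest of maximum
    peak = max(mu)
    return min(z for z, m in zip(zs, mu) if m == peak)

def lom(zs, mu):  # largest of maximum
    peak = max(mu)
    return max(z for z, m in zip(zs, mu) if m == peak)

def bisector(zs, mu):
    """First z at which the cumulative membership reaches half the total."""
    total, acc = sum(mu), 0.0
    for z, m in zip(zs, mu):
        acc += m
        if acc >= total / 2:
            return z

print(mean_max(zs, mu), som(zs, mu), lom(zs, mu), bisector(zs, mu))  # 3.5 3 4 4
```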

Lecture-26

Sugeno fuzzy model: Also known as the TSK fuzzy model, after Takagi, Sugeno and Kang, it is an effort to develop a systematic approach to generating fuzzy rules from a given input-output data set. A typical fuzzy rule in a Sugeno fuzzy model has the form: IF x is A AND y is B THEN z = f(x, y), where A and B are the fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent. For a first-order model with two rules:

Rule 1 (firing strength w1): z1 = p1·x + q1·y + r1
Rule 2 (firing strength w2): z2 = p2·x + q2·y + r2

The overall output is the weighted average z = (w1·z1 + w2·z2) / (w1 + w2). (By contrast, a Mamdani consequent is a fuzzy set C' with µ_C'(z) = w ∧ µ_C(z), the consequent MF clipped at the firing strength.)

Here w1 and w2 are the firing strengths of the two rules, e.g. wi = µ_Ai(x) ∧ µ_Bi(y). Example: let the crisp inputs be mass = 0.35 kg and velocity = 55 m/s; as fuzzy inputs, let the input mass be "approximately 0.35 kg" and the input velocity "approximately 55 m/s".

Two-input single-output Sugeno fuzzy model, e.g.:

IF x is small AND y is small THEN z = −x + y + 1
IF x is small AND y is large THEN z = −y + 3
IF x is large AND y is small THEN z = −x + 3
IF x is large AND y is large THEN z = x + y + 2
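A Python sketch of this four-rule Sugeno model; the sigmoidal MFs for small/large on each input are illustrative assumptions, and the output is the firing-strength-weighted average of the crisp rule outputs:

```python
import math

def sig(x, a, c):
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

# Illustrative 'small'/'large' MFs on a 0..10 axis (assumed parameters).
small = lambda v: sig(v, -2.0, 5.0)
large = lambda v: sig(v, 2.0, 5.0)

# (antecedent firing strength, crisp consequent) per rule.
rules = [
    (lambda x, y: min(small(x), small(y)), lambda x, y: -x + y + 1),
    (lambda x, y: min(small(x), large(y)), lambda x, y: -y + 3),
    (lambda x, y: min(large(x), small(y)), lambda x, y: -x + 3),
    (lambda x, y: min(large(x), large(y)), lambda x, y: x + y + 2),
]

def sugeno(x, y):
    """Overall output: z = sum(w_i * z_i) / sum(w_i)."""
    ws = [w(x, y) for w, _ in rules]
    zs = [f(x, y) for _, f in rules]
    return sum(w * z for w, z in zip(ws, zs)) / sum(ws)

print(round(sugeno(3.0, 7.0), 3))
```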

Lecture-27

TSUKAMOTO model: In the Tsukamoto fuzzy model, the consequent of each fuzzy if-then rule is represented by a fuzzy set with a monotonic MF. As a result, the inferred output of each rule is defined as a crisp value induced by the rule's firing strength. The overall output is taken as the weighted average of each rule's output, so no time-consuming defuzzification process is required. The model is not used often, since it is not as transparent as either the Mamdani or the Sugeno model.

Projections of a fuzzy relation: Let R = {((x, y), µ_R(x, y)) | (x, y) ∈ X × Y} be a fuzzy relation.

1st projection: R^(1) = {(x, max_y µ_R(x, y)) | x ∈ X}
2nd projection: R^(2) = {(y, max_x µ_R(x, y)) | y ∈ Y}

Total projection: R^(T) = max_x max_y µ_R(x, y), (x, y) ∈ X × Y.

Lecture-28

Genetic Algorithm: A genetic algorithm (GA) is a procedure that tries to mimic the genetic evolution of a population across consecutive generations as it adapts to its environment. The adaptation process is mainly applied through genetic inheritance from parents to children and through survival of the fittest; GA is therefore a population-based search methodology. Some pioneering works traced back to the middle of the 1960s preceded the main presentation of GAs by Holland in 1975; however, GAs saw only limited application until Goldberg's multipurpose presentation of 1989 in the search, optimization, design and machine learning areas. Nowadays, GAs are considered to be the most widely known and applied type of metaheuristic.

A GA starts with an initial population whose elements are called chromosomes. A chromosome consists of a fixed number of variables, which are called genes. In order to evaluate and rank the chromosomes in a population, a fitness function based on the objective function must be defined. Three operators must be specified to construct the complete structure of the GA procedure: the selection, crossover and mutation operators. The selection operator takes care of selecting an intermediate population from the current one to be used by the other operators, crossover and mutation. In this selection process, chromosomes with higher fitness values have a greater chance of being chosen than those with lower fitness values. Pairs of parents in the intermediate population of the current generation are probabilistically chosen and mated in order to reproduce offspring. In order to increase the variability structure, the mutation operator is applied to alter one or more genes of a probabilistically chosen chromosome. Finally, another type of selection mechanism is applied to copy the surviving members from the current generation to the next one.

The GA operators (selection, crossover and mutation) have been extensively studied, and many effective settings of these operators have been proposed to fit a wide variety of problems. More details about the GA elements are discussed below before stating a standard GA.

Fitness Function: The fitness function is a designed function that measures the goodness of a solution. It should be designed so that better solutions have a higher fitness value than worse solutions. The fitness function plays a major role in the selection process.
Coding: Coding in GA is the form in which chromosomes and genes are expressed. There are mainly two types of coding: binary and real. Binary coding was used in the original presentation of GA, in which a chromosome is expressed as a binary string. The search space of the considered problem is therefore mapped into a space of binary strings through a coder mapping. Then, after reproducing offspring, a decoder mapping is applied to bring them back to their real form in order to compute their fitness values. Many researchers still believe that binary coding is the ideal. However, real coding is more applicable and easier to program. Moreover, real coding seems to fit continuous optimization problems better than binary coding.
Selection: Given a population P, the selection operator selects a set P' ⊆ P of chromosomes that will be given the chance to be mated and mutated. The size of P' is the same as that of P, but fitter chromosomes in P are chosen with higher probability to be included in P'. Therefore, the fittest chromosomes in P may be represented by more than one copy in P', and the least fit chromosomes in P may not be represented in P' at all. Consider the population P = {x1, x2, ..., xN}. The difference between selection operators lies in the way of computing the probability p_s(x_i) of including a copy of chromosome x_i ∈ P in the set P'. Using these probabilities, the population is mapped onto a roulette wheel, where each chromosome x_i is represented by a slice that corresponds proportionally to p_s(x_i). Chromosomes in the set P' are chosen by repeatedly spinning the roulette wheel until all positions in P' are filled, as in the sketch below.
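A minimal sketch of roulette-wheel selection, assuming a maximization problem with positive fitness values so that p_s(x_i) = F(x_i) / ΣF; the example population and fitness values are assumptions.

```python
import random

def roulette_select(population, fitness, n_select):
    # each chromosome occupies a slice of the wheel proportional to p_s(x_i)
    total = sum(fitness)
    probs = [f / total for f in fitness]
    selected = []
    for _ in range(n_select):           # one spin per position in P'
        r, cum = random.random(), 0.0
        for chrom, p in zip(population, probs):
            cum += p
            if r <= cum:
                selected.append(chrom)
                break
        else:
            selected.append(population[-1])  # guard against float rounding
    return selected

pop = ["0110", "1011", "0001", "1110"]
fit = [4.0, 9.0, 1.0, 6.0]              # fitter strings get more copies on average
print(roulette_select(pop, fit, len(pop)))
```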

Lecture-29 Soft Computing
Crossover and Mutation: The crossover operator aims to interchange information and genes between chromosomes. It combines two or more parents to reproduce new children; one of these children may then, hopefully, collect all the good features present in its parents. Crossover is not applied to all parents; it is applied with probability p_c, which is normally set to 0.6. Crossover plays a major role in GA, so defining a proper crossover operator is essential for achieving a better GA performance, and many types of crossover operators have been studied. The mutation operator alters one or more genes of a chromosome. Mutation aims to introduce some stochastic variability into GA in order to achieve quicker convergence. The probability p_m of applying the mutation operator is usually set to a small value, normally 0.01.
Standard Genetic Algorithm:
1. Initialization. Generate an initial population P_0. Set the crossover and mutation probabilities p_c ∈ (0, 1) and p_m ∈ (0, 1), respectively. Set the generation counter t := 1.
2. Selection. Evaluate the fitness function F at all chromosomes in P_t. Select an intermediate population P'_t from the current population P_t.
3. Crossover. Associate a random number from (0, 1) with each chromosome in P'_t and add the chromosome to the parents pool SP_t if the associated number is less than p_c. Repeat Steps 3.1 and 3.2 until all parents in SP_t are mated:
3.1. Choose two parents p1 and p2 from SP_t. Mate p1 and p2 to reproduce children c1 and c2.

3.2. Update the children pool SC_t through SC_t := SC_t ∪ {c1, c2} and update SP_t through SP_t := SP_t \ {p1, p2}.
4. Mutation. Associate a random number from (0, 1) with each gene in each chromosome in P'_t, mutate the gene if the associated number is less than p_m, and add the mutated chromosome only to the children pool SC_t.
5. Stopping Conditions. If the stopping conditions are satisfied, then terminate. Otherwise, select the next generation P_{t+1} from P_t ∪ SC_t, set SC_t to be empty, set t := t + 1, and go to Step 2.
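Putting the pieces together, here is a compact sketch of the standard GA above applied to the toy OneMax problem (maximize the number of 1-bits in a binary chromosome). p_c = 0.6 and p_m = 0.01 follow the text; the population size, chromosome length, stopping rule, and the truncation-style survivor selection are assumptions.

```python
import random

L, N, PC, PM, T_MAX = 20, 30, 0.6, 0.01, 100

def fitness(c):
    return sum(c)          # OneMax: count the 1-bits

def select(pop):
    # roulette-wheel selection of the intermediate population P'
    fits = [fitness(c) for c in pop]
    total = sum(fits) or 1
    out = []
    for _ in range(len(pop)):
        r, cum = random.uniform(0, total), 0.0
        pick = pop[-1]
        for c, f in zip(pop, fits):
            cum += f
            if r <= cum:
                pick = c
                break
        out.append(pick[:])
    return out

def crossover(p1, p2):
    # one-point crossover
    k = random.randrange(1, L)
    return p1[:k] + p2[k:], p2[:k] + p1[k:]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
for t in range(T_MAX):
    inter = select(pop)                                   # step 2
    parents = [c for c in inter if random.random() < PC]  # step 3
    random.shuffle(parents)
    children = []
    for p1, p2 in zip(parents[::2], parents[1::2]):       # steps 3.1-3.2
        children.extend(crossover(p1, p2))
    for c in inter:                                       # step 4: bit-flip mutation
        m = [1 - g if random.random() < PM else g for g in c]
        if m != c:
            children.append(m)
    # step 5: survivor selection (truncation, an assumed choice)
    pop = sorted(inter + children, key=fitness, reverse=True)[:N]

print(max(fitness(c) for c in pop))
```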

Lecture-30
Neural Network: Work on artificial neural networks is commonly referred to simply as "neural networks".
Motivation: The human brain computes in an entirely different fashion from the conventional digital computer. It is highly complex and nonlinear, and it is a parallel information-processing system. It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g. pattern recognition, perception, etc.) many times faster than the fastest digital computer in existence today. Consider human vision: the visual system provides a representation of the environment around us and, more importantly, the information we need to interact with that environment, e.g. recognizing a familiar face in an unfamiliar scene. The brain has great structure and the ability to build up its own rules through what we usually refer to as experience. A developing nervous system is synonymous with a plastic brain: plasticity permits the developing nervous system to adapt to its surrounding environment. In general, plasticity is an essential property for the functioning of any information-processing machine. A neural network is a machine that is designed to model the way in which the brain performs a particular task or function. The network can be realized either by using electronic components or by simulation in software on a digital computer. An important class of neural networks is one that performs useful computation through a process of learning. A neural network viewed as an adaptive machine can be defined as: a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects.

1. Knowledge is acquired by the network from its environment through a learning process.
2. Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
The procedure used to perform the learning process is called a learning algorithm, the function of which is to modify the synaptic weights of the network in an orderly fashion to attain a desired design objective.
Benefits of Neural Networks: A neural network derives its computing power through:
1. Its massively parallel distributed structure.
2. Its ability to learn and therefore generalize. Generalization means that the neural network can produce reasonable outputs for inputs not encountered during training (learning).
**A neural network cannot provide a solution by working individually; rather, it has to be integrated into a systems engineering process.

Lecture-31 Soft Computing
A neural network offers the following useful properties:
1. Nonlinearity: A neuron can be linear as well as nonlinear.
2. Input-Output Mapping: A neural network can be trained using sample data or task examples. Each example consists of a unique input signal and a corresponding desired response. The network is trained by adjusting the weights to minimize the difference between the desired output and the actual output.
3. Adaptivity: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. In particular, a neural network trained to operate in a specific environment can easily be retrained to deal with minor changes in the operating environmental conditions. Also, if the network is meant to function in a nonstationary environment, it can be designed to change its synaptic weights in real time. This makes it a useful tool in adaptive pattern classification, adaptive signal processing, and adaptive control.
**To realize the full benefit of adaptivity, the principal time constants of the system should be long enough for the system to ignore spurious disturbances and yet short enough to respond to meaningful changes in the environment.
4. Evidential Response: In the context of pattern classification, a neural network can be designed to provide information not only about which pattern to select but also about the confidence in the decision made. The latter information can be used to reject ambiguous patterns.
5. Contextual Information: Knowledge is represented by the very structure and activation state of a neural network. Every neuron in the network is potentially affected by the global activity of all other neurons in the network.

6. Fault Tolerance: A neural network, implemented in hardware form, has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gradually under adverse operating conditions. Thus, in principle, a neural network exhibits a graceful degradation in performance rather than catastrophic failure.
7. VLSI Implementability: The massively parallel nature of a neural network makes it potentially fast for the computation of certain tasks. This same feature makes a neural network well suited for implementation using very-large-scale-integration (VLSI) technology.
8. Uniformity of Analysis and Design: Neural networks enjoy universality as information processors, i.e. the same notation is used in all domains involving the application of neural networks.
9. Neurobiological Analogy: The design of a neural network is motivated by analogy with the brain, which is living proof that fault-tolerant parallel processing is not only physically possible but also fast and powerful.

Lecture-32 Soft Computing
Neural Networks: The brain contains about 10^10 basic units called neurons. A neuron is a small cell that receives electro-chemical signals from various sources and in turn responds by transmitting electrical impulses to other neurons. Some neurons perform input operations and are referred to as afferent cells; some perform output operations and are referred to as efferent cells; the remaining neurons form part of an interconnected network of neurons responsible for signal transformation and the storage of information.
Structure of a neuron: [Figure: structure of a biological neuron]
Dendrites: Behave as input channels, i.e. all inputs from other neurons arrive through the dendrites.
Axon: Is electrically active and serves as an output channel. Axons are nonlinear threshold devices which produce a voltage pulse called the action potential. If the cumulative input received by the soma raises the internal electric potential of the cell, known as the membrane potential, above a threshold, then the neuron fires by propagating the action potential down the axon to excite or inhibit other neurons.
Synapse or Synaptic Junction: The axon terminates in a specialized contact called a synapse or synaptic junction that connects the axon to the dendritic links of other neurons.

The synaptic junction, which is a very minute gap at the end of the dendritic link, contains a neurotransmitter fluid. The size of the synaptic junction is believed to be related to learning: synapses with a large area are thought to be excitatory, while those with a small area are believed to be inhibitory.
Model of an Artificial Neuron: The human brain is a highly interconnected network of simple processing elements called neurons. The behavior of a neuron can be captured by a simple model termed an artificial neuron. In artificial neurons, the acceleration and retardation of the input signals are modeled by weights: an efficient synapse which transmits a stronger signal has a correspondingly larger weight. The net input is
I = w1 x1 + w2 x2 + ... + wn xn = Σ_i wi xi
To generate the final output, the sum is passed through a nonlinear filter called an activation function, transfer function, or squashing function:
y = f(I)
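A minimal sketch of this artificial neuron model; the weights, inputs and the choice of a sigmoid activation are assumed example values.

```python
import math

def neuron(x, w, activation):
    # net input I = w1*x1 + w2*x2 + ... + wn*xn
    I = sum(wi * xi for wi, xi in zip(w, x))
    # the activation (squashing) function produces the final output y = f(I)
    return activation(I)

sigmoid = lambda I: 1.0 / (1.0 + math.exp(-I))
print(neuron([0.5, -1.0, 2.0], [0.8, 0.2, 0.1], sigmoid))
```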

Lecture-33 Soft Computing
Commonly used activation functions:
1. Thresholding function: The sum I is compared with a threshold value θ. If I is greater than θ, the output is 1, otherwise 0:
y = φ(I - θ), where φ(v) = 1 if v > 0 and φ(v) = 0 otherwise.
2. Signum function: y = +1 if I > 0, and y = -1 if I ≤ 0.

3. Sigmoidal function: A continuous function that varies gradually between the asymptotic values 0 and 1 (or -1 and +1), given by
f(I) = 1 / (1 + e^(-λI))
4. Piecewise-Linear Function: The function is linear with slope α inside a band around the origin, f(I) = αI, and is clipped at the asymptotic values outside it. Here α is the slope parameter, which adjusts the abruptness of the function as it changes between the two asymptotic values.
Neural Network Architecture: A neural network can be defined as a data-processing system consisting of a large number of simple, highly interconnected processing elements.
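The four activation functions can be sketched as follows; the threshold θ, gain λ, slope α and saturation levels are assumed parameters.

```python
import math

def threshold(I, theta=0.0):
    # 1 if the net input exceeds the threshold, else 0
    return 1 if I > theta else 0

def signum(I):
    return 1 if I > 0 else -1

def sigmoid(I, lam=1.0):
    # varies gradually between 0 and 1; lam adjusts the abruptness
    return 1.0 / (1.0 + math.exp(-lam * I))

def piecewise_linear(I, a=1.0):
    # linear with slope a near the origin, clipped at the asymptotes -1 and +1
    return max(-1.0, min(1.0, a * I))

for I in (-2.0, -0.3, 0.0, 0.3, 2.0):
    print(I, threshold(I), signum(I), round(sigmoid(I), 3), piecewise_linear(I))
```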

Lecture-34 Soft Computing
Network Properties: Generally, an artificial neural network can be represented using a directed graph. The topology of a neural network refers to its framework as well as its interconnection scheme. The framework is often specified by the number of layers and the number of nodes per layer:
1. The input layer: its nodes, called input units, merely transmit the signal; no computation is performed.
2. Hidden layer(s): perform useful intermediary computations.
3. Output layer: encodes the possible output values.
Human brain: [Figure: block diagram representation of the nervous system] Central to the system is the brain, represented by a neural network, which continually receives information, processes it, and makes appropriate decisions. Forward transmission takes the information-bearing signal through the system.

Backward transmission provides feedback to the system.
Neurons are the structural constituents of the brain. Typically, a neuron is five to six orders of magnitude slower than a silicon logic gate: events in a silicon chip happen in about 10^-9 s, whereas in a neuron they take about 10^-3 s. The brain makes up for the relatively slow rate of operation of a neuron by having a truly staggering number of neurons with massive interconnections between them.
Energy efficiency: the brain uses approximately 10^-16 joules per operation per second, whereas the corresponding value for the best computers in use today is about 10^-6 joules per operation per second.
A neuron is an information-processing unit that is fundamental to the operation of a neural network. It has three basic elements:
1. A set of synapses or connecting links, each of which is characterized by a weight or strength of its own.
2. An adder, for summing up the input signals.
3. An activation function, for limiting the output of a neuron; also referred to as a squashing function.

Lecture-35
Structural Organization of levels in the brain: [Figure: hierarchy of structural levels in the brain]

Neural Network Viewed as a Directed Graph:
A neural network is a directed graph consisting of nodes interconnected by synaptic and activation links, and it is characterized by four properties:
1. Each neuron is represented by a set of linear synaptic links, an externally applied bias, and possibly a nonlinear activation link. The bias is represented by a synaptic link connected to an input fixed at +1.
2. The synaptic links of a neuron weight their respective input signals.
3. The weighted sum of the input signals defines the induced local field of the neuron in question.
4. The activation link squashes the induced local field of the neuron to produce an output.

Lecture-36 Soft Computing
Feedback: Feedback is said to exist in a dynamic system whenever the output of an element in the system influences, in part, the input applied to that particular element, thereby giving rise to one or more closed paths for the transmission of signals around the system. For a single-loop feedback system with input x_j(n), output y_k(n), forward operator A and feedback operator B:
x'_j(n) = x_j(n) + B[y_k(n)]
y_k(n) = A[x'_j(n)] = A[x_j(n) + B[y_k(n)]]
Eliminating x'_j(n) gives
y_k(n) = (A / (1 - AB)) [x_j(n)]
Here A/(1 - AB) is the closed-loop operator and AB is the open-loop operator. Let A be a fixed weight w, and let B be the unit-delay operator z^-1, whose output is delayed with respect to the input by one time unit. Then
A / (1 - AB) = w (1 - w z^-1)^-1 = w Σ_{l=0}^{∞} w^l z^-l, using the binomial theorem.
Hence
y_k(n) = w Σ_{l=0}^{∞} w^l z^-l [x_j(n)], and z^-l [x_j(n)] = x_j(n - l) by definition of z^-1,
where x_j(n - l) is a sample of the input signal delayed by l time units, so that
y_k(n) = Σ_{l=0}^{∞} w^(l+1) x_j(n - l).
The dynamic behavior of the network is controlled by the weight w.
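A quick numerical check of this geometric-series behavior: the sketch below computes the response y_k(n) = Σ w^(l+1) x_j(n - l) for an assumed unit-impulse input and two assumed values of w, anticipating the three regimes listed next.

```python
# Sketch: response of the single-loop feedback system derived above.

def feedback_response(w, x, steps):
    # y_k(n) = sum over l of w**(l+1) * x_j(n - l)
    y = []
    for n in range(steps):
        y.append(sum(w ** (l + 1) * x[n - l] for l in range(n + 1)))
    return y

x = [1.0] + [0.0] * 9                      # unit impulse x_j(0) = 1
print(feedback_response(0.5, x, 10))       # |w| < 1: converges (stable)
print(feedback_response(2.0, x, 10))       # |w| > 1: diverges (unstable)
```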

1. |w| < 1: the output signal is exponentially convergent and the system is stable. |w| < 1 corresponds to a system with infinite (fading) memory.
2. |w| > 1: the output signal is divergent and the system is unstable.
3. |w| = 1: the divergence of the output signal is linear.
Network Architectures: The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network. There are three fundamental classes of network architecture:
1. Single-layer feedforward network (no computation is performed at the input layer): source nodes in the input layer project directly onto the neurons of the output layer. [Figure: input layer of source nodes and output layer of neurons]

Lecture-37 Soft Computing
2. Multilayer Feedforward Networks: The second class of feedforward network distinguishes itself by the presence of one or more hidden layers, whose computation nodes are called hidden neurons or hidden units. The function of the hidden neurons is to intervene between the external input and the network output in some useful manner. The presence of one or more hidden layers enables the network to extract higher-order statistics, which is particularly valuable when the size of the input layer is large. The network acquires a global perspective despite its local connectivity, due to the extra set of synaptic connections and the extra dimension of neural interactions. [Figure: input layer, layer of hidden neurons, layer of output neurons] In an m-h1-h2-q network, m is the number of inputs and h1 is the number of neurons in the first hidden layer,

h2 is the number of neurons in the second hidden layer, and q is the number of output neurons.
3. Recurrent Network: A recurrent network has at least one feedback loop.
Adaptive networks: Every node performs a specific function, and the nodes are associated with certain parameters; if the parameters change, the overall behavior also changes. Hence the node function depends on the parameter values. If a node's parameter set is not empty, the node is represented by a square; if its parameter set is empty and it performs a fixed function, it is represented by a circle. [Figure: adaptive node (square) and fixed node (circle)]
Based on the type of connections, adaptive networks can be classified as:
1. Feedforward networks: all connections go in one direction (a. single layer, b. multilayer).
2. Recurrent networks: there are feedback connections or loops.
Types of connections:
1. Inter-layer connection: a connection between nodes of adjacent layers.
2. Intra-layer connection: a connection between nodes within the same layer.
3. Supra-layer connection: a connection between nodes in distant (nonadjacent) layers.
4. High-order connection: a connection that combines inputs from more than one layer.

Perceptron: The perceptron is a computational model of the retina of the eye, and hence is named a perceptron. The network comprises three types of units: sensory (S) units, i.e. photodetectors, which are randomly connected to the association units; the association (A) units, which compute features or predicates by examining the outputs of the S units for specific features of the image; and the response (R) units, which comprise the pattern recognizers or perceptrons. The output is
y_j = f(net_j) = 1 if net_j > 0, and 0 otherwise,
where net_j = Σ_i w_ij x_i is the weighted sum of the inputs to unit j.
The training algorithm for the perceptron is a supervised learning scheme in which the weights are adjusted to minimize the error whenever the computed output does not match the target output.

Lecture-38 Soft Computing
Basic learning algorithm for the perceptron:
1. If the output is correct, no adjustment of weights is made: W_ij(k+1) = W_ij(k).
2. If the output is 1 but should have been 0, the weights on the active input links are decreased: W_ij(k+1) = W_ij(k) - α x_i.
3. If the output is 0 but should have been 1, the weights on the active input links are increased: W_ij(k+1) = W_ij(k) + α x_i.
Here W_ij(k+1) is the new adjusted weight, W_ij(k) is the old weight, and α is the learning rate: a small α gives slow learning, a large α fast learning. With a constant α the learning algorithm is termed a fixed-increment algorithm. The value of the learning rate can be constant throughout the training, or it can be a varying quantity proportional to the error. A learning rate proportional to the error leads to faster convergence but can cause unstable learning. The rule is also written as Δw_i = η · t_i · x_i, where η is the learning rate (used in the same sense as α).
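A minimal sketch of this fixed-increment rule on the linearly separable AND problem; the learning rate, epoch count and training data are assumptions.

```python
# Sketch of the fixed-increment perceptron rule on logical AND.

def train_perceptron(data, alpha=0.1, epochs=50):
    w = [0.0, 0.0, 0.0]                       # [w0 (bias), w1, w2]
    for _ in range(epochs):
        for (x1, x2), target in data:
            y = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            if y == 1 and target == 0:        # decrease weights on active inputs
                w = [w[0] - alpha, w[1] - alpha * x1, w[2] - alpha * x2]
            elif y == 0 and target == 1:      # increase weights on active inputs
                w = [w[0] + alpha, w[1] + alpha * x1, w[2] + alpha * x2]
    return w

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_data))
```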

Perceptron and linearly separable tasks: The perceptron cannot handle tasks which are not linearly separable, i.e. where the sets of points in the plane cannot be separated by a straight line. The perceptron cannot find weights for classification problems that are not linearly separable. An example is the XOR problem:
X Y output
0 0 0
1 1 0   (even parity)
0 1 1
1 0 1   (odd parity)
The problem is to classify the inputs as even or odd parity.

This is impossible, since the perceptron is unable to find a line which separates the odd-parity input patterns from the even-parity input patterns.
Why can such a line not be found? Consider a perceptron with bias input x0 = 1 and weight w0, and inputs x1, x2 with weights w1, w2. [Figure: single perceptron with inputs x0 = 1, x1, x2] The net input is
net = w0 + w1 x1 + w2 x2
and net = 0 represents the equation of a straight line.

[Figure: the straight line acts as a decision boundary between class C1 and class C2 in the (x1, x2) plane]
Adaline Network: The Adaline (Adaptive Linear Neuron) network has only one output neuron, and the output values are bipolar (-1 or +1); the inputs can be binary, bipolar or real-valued. If the weighted sum of the inputs is greater than zero, the output is +1, otherwise -1. The supervised learning algorithm adopted by the network is known as the least mean square (LMS) or delta rule:
w_i(new) = w_i(old) + α (t - y) x_i, where t is the target output.
It is similar to the perceptron learning algorithm. [Figure: Adaline with inputs x1, x2, ..., xn, a summing unit and a thresholding function]
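A sketch of the delta (LMS) rule above; note that, unlike the perceptron rule, the error t - y is computed on the linear (unthresholded) output. The learning rate, epoch count and the bipolar AND data are assumptions.

```python
# Sketch of the Adaline delta (LMS) rule on the bipolar AND problem.

def train_adaline(data, alpha=0.05, epochs=50):
    w = [0.0, 0.0, 0.0]                        # [bias, w1, w2]
    for _ in range(epochs):
        for (x1, x2), t in data:               # bipolar targets (-1 or +1)
            y = w[0] + w[1] * x1 + w[2] * x2   # weighted sum, not thresholded
            for i, xi in enumerate((1, x1, x2)):
                w[i] += alpha * (t - y) * xi   # w_new = w_old + alpha*(t - y)*x_i
    return w

data = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_adaline(data)
# thresholded outputs after training
print([1 if w[0] + w[1] * a + w[2] * b > 0 else -1 for (a, b), _ in data])
```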

Lecture-39 Soft Computing
Madaline Network: A Madaline (Many Adalines) is created by combining a number of Adalines. The use of multiple Adalines helps overcome the problem of nonlinear separability; a Madaline with two units exhibits the capability to solve the XOR problem. [Figure: Madaline with two Adaline units Z1 and Z2 feeding an output unit Y] If Z1 and Z2 are the same, the output is +1; if they are different, the output is -1, the output unit acting as an AND-style combiner. In the input space, the points (-1, +1) and (+1, -1) give one output and the points (-1, -1) and (+1, +1) the other, representing odd and even parity.

Advantages of using the delta rule:
1. Simplicity.
2. Distributed learning: learning is not reliant on central control of the network; it can be performed locally at each node level.
3. Online learning (or pattern-by-pattern learning): weights are adjusted after the presentation of each pattern.
Summary of the Perceptron Convergence Algorithm:
Variables and parameters:
x(n) = (m+1)-by-1 input vector = [+1, x1(n), x2(n), ..., xm(n)]^T
w(n) = (m+1)-by-1 weight vector = [b(n), w1(n), w2(n), ..., wm(n)]^T
b(n) = bias; y(n) = actual response; d(n) = desired response; η = learning-rate parameter, a positive constant less than unity.
1. Initialization: set w(0) = 0; then perform the following computations for time steps n = 1, 2, ...
2. Activation: at time step n, activate the perceptron by applying the continuous-valued input vector x(n) and the desired response d(n).
3. Computation of actual response: compute the actual response of the perceptron,
y(n) = sgn[w^T(n) x(n)].
4. Adaptation of weights: update the weight vector of the perceptron,
w(n+1) = w(n) + η [d(n) - y(n)] x(n),

where d(n) = +1 if x(n) belongs to class C1 and d(n) = -1 if x(n) belongs to class C2.
5. Continuation: increment time step n by 1 and go back to step 2.
Exclusive OR Problem:
X Y class
0 0 0
0 1 1
1 0 1
1 1 0
(not linearly separable)
For a single perceptron with weights w1, w2 and bias w0, the four patterns require:
(0, 0) -> 0: 0·w1 + 0·w2 + w0 ≤ 0, i.e. w0 ≤ 0
(0, 1) -> 1: 0·w1 + 1·w2 + w0 > 0, i.e. w2 > -w0
(1, 0) -> 1: 1·w1 + 0·w2 + w0 > 0, i.e. w1 > -w0
(1, 1) -> 0: 1·w1 + 1·w2 + w0 ≤ 0, i.e. w1 + w2 ≤ -w0
From the first three conditions, w1 + w2 > -2w0 ≥ -w0 (since w0 ≤ 0), which contradicts the fourth condition. Hence no choice of weights can solve XOR.

Lecture-40 Soft Computing
Features of ANNs:
1. They learn by example.
2. They constitute a distributed, associative memory.
3. They are fault tolerant.
4. They are capable of pattern recognition.
The capability of learning by example utilizes examples taken from data and organizes the information into a useful form. This form constitutes a model that represents the relationship between the input and output variables.
Associative Memory: An associative memory can be thought of as a mapping g between a pattern space R^m and R^n. Thus, for α ∈ R^m and β ∈ R^n,
β = g(α).
Quite often g tends to be a nonlinear matrix-type operator, resulting in
β = M(α),
where M has a different form for different memory models. The algorithm which computes M is known as the recording or storage algorithm; mostly, M is computed from the input pattern vectors. Based on the principle of recall, associative memories may be classified into static and dynamic models.
Static model (non-recurrent):

[Figure: static (non-recurrent) and dynamic (recurrent) associative memory models]
For the static model, the association for an input pattern is recognized in one feedforward pass, whereas for a dynamic network the following recursion is applied until an equilibrium state is reached:
α(k+1) = M[α(k), β(k)].
Auto-correlators: Also known as Hopfield associative memory. First-order correlators obtain their connection matrix (indicative of the association of a pattern with itself) by multiplying each pattern element with the other pattern elements. A first-order autocorrelator that stores m bipolar patterns A1, A2, ..., Am obtains its connection matrix by summing the outer products:
T = Σ_{i=1}^{m} [A_i]^T [A_i]
Here T = [t_ij] is a p x p connection matrix and A_i ∈ {-1, 1}^p.
Recall equation of the autocorrelator: recall is a vector-matrix multiplication followed by a point-wise nonlinear threshold operation,
a_i(new) = f( Σ_{j=1}^{p} t_ij a_j , a_i ), i = 1, 2, ..., p    --- (I)
where A = (a1, a2, ..., ap) and the two-parameter threshold function is
f(α, β) = 1 if α > 0; β if α = 0; -1 if α < 0.    --- (II)
Working: consider the following patterns,

A1 = (-1, 1, -1, 1)
A2 = (1, 1, 1, -1)
A3 = (-1, -1, -1, 1)
which are to be stored in an autocorrelator.
The connection matrix is T = Σ_{i=1}^{3} [A_i]^T [A_i], a 4 x 4 matrix:
T =
[ 3  1  3 -3]
[ 1  3  1 -1]
[ 3  1  3 -3]
[-3 -1 -3  3]
Recognition of stored patterns: the autocorrelator is presented the stored pattern A2 = (1, 1, 1, -1):
a1(new) = f(3 + 1 + 3 + 3, 1) = f(10, 1) = 1
a2(new) = f(1 + 3 + 1 + 1, 1) = f(6, 1) = 1
a3(new) = f(10, 1) = 1
a4(new) = f(-10, -1) = -1
This is the same as A2.
Recognition of a noisy pattern: consider the vector A' = (1, 1, 1, 1), which is a distorted presentation of one of the stored patterns. The Hamming distance can be used to measure the proximity of the noisy vector to the stored patterns. The Hamming distance of a vector X from Y, given X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn), is
HD(X, Y) = Σ_i |x_i - y_i|.
Now HD(A', A1) = 4, HD(A', A2) = 2, HD(A', A3) = 6.
It is evident that A' is closest to A2. Using equation (I), let us check whether the autocorrelator can retrieve the pattern:
a1(new) = f(4, 1) = 1

a2(new) = f(4, 1) = 1
a3(new) = f(4, 1) = 1
a4(new) = f(-4, 1) = -1
The recalled vector is (1, 1, 1, -1) = A2. Hence, in the case of partial or noisy vectors, an autocorrelator results in refinement of the pattern, i.e. removal of noise, to retrieve the closest matching stored pattern.
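The whole worked example can be mechanized in a few lines; the sketch below rebuilds T from A1-A3 and repeats both recall computations.

```python
# Sketch of the autocorrelator example above.
patterns = [(-1, 1, -1, 1), (1, 1, 1, -1), (-1, -1, -1, 1)]
p = 4

# connection matrix t_ij = sum over stored patterns of a_i * a_j
T = [[sum(a[i] * a[j] for a in patterns) for j in range(p)] for i in range(p)]

def f(alpha, beta):
    # two-parameter threshold (II): keep the old value beta when alpha is 0
    return 1 if alpha > 0 else (beta if alpha == 0 else -1)

def recall(a):
    # recall equation (I): thresholded vector-matrix product
    return tuple(f(sum(T[i][j] * a[j] for j in range(p)), a[i]) for i in range(p))

print(recall((1, 1, 1, -1)))   # stored pattern A2 maps to itself
print(recall((1, 1, 1, 1)))    # noisy pattern settles to A2
```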

Lecture-41 Soft Computing
Heterocorrelators (Kosko's discrete BAM): Bidirectional associative memory (BAM) is a two-layer nonlinear recurrent associative network. Its operation is as follows.
1. There are N training pairs {(A1, B1), (A2, B2), ..., (AN, BN)}, where
A_i = (a_i1, a_i2, ..., a_in), B_i = (b_i1, b_i2, ..., b_ip),
and each a_ij or b_ij is either OFF or ON: ON = 1, OFF = 0 (or OFF = -1 in bipolar mode).
2. The correlation matrix is built from the bipolar forms X_i, Y_i of the pairs:
M = Σ_{i=1}^{N} X_i^T Y_i.
To retrieve the nearest (A_i, B_i) pair given any input (α, β), the recall equations are as follows. Starting with (α, β) as the initial condition, we determine a finite sequence (α', β'), (α'', β''), ... until an equilibrium point (α_F, β_F) is reached, where
β' = Φ(α M), α' = Φ(β' M^T),
and Φ(F) = G = (g1, g2, ..., gn) for F = (f1, f2, ..., fn), with
g_i = 1 if f_i > 0; g_i = 0 (binary) or -1 (bipolar) if f_i < 0; g_i = previous value if f_i = 0.
Working: suppose N = 3 with patterns
A1 = (1 0 0 0 0 1), B1 = (1 1 0 0 0)
A2 = (0 1 1 0 0 0), B2 = (1 0 1 0 0)
A3 = (0 0 1 0 1 1), B3 = (0 1 1 1 0)
Converting these to bipolar form:
X1 = (1 -1 -1 -1 -1 1), Y1 = (1 1 -1 -1 -1)
X2 = (-1 1 1 -1 -1 -1), Y2 = (1 -1 1 -1 -1)
X3 = (-1 -1 1 -1 1 1), Y3 = (-1 1 1 1 -1)
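Before working through the numbers by hand, here is a sketch that mechanizes this recall procedure for the three pairs above; the iteration cap is an assumed safeguard.

```python
# Sketch of BAM recall for the three training pairs above.
X = [(1, -1, -1, -1, -1, 1), (-1, 1, 1, -1, -1, -1), (-1, -1, 1, -1, 1, 1)]
Y = [(1, 1, -1, -1, -1), (1, -1, 1, -1, -1), (-1, 1, 1, 1, -1)]

# correlation matrix M = sum of Xi^T Yi (6 x 5)
M = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(5)] for i in range(6)]

def phi(F, prev):
    # bipolar threshold; a zero net sum keeps the previous component
    return tuple(1 if f > 0 else (-1 if f < 0 else p) for f, p in zip(F, prev))

def recall(alpha, iters=20):
    beta = (0,) * 5
    for _ in range(iters):
        new_beta = phi([sum(alpha[i] * M[i][j] for i in range(6)) for j in range(5)], beta)
        new_alpha = phi([sum(new_beta[j] * M[i][j] for j in range(5)) for i in range(6)], alpha)
        if new_beta == beta and new_alpha == alpha:
            break
        alpha, beta = new_alpha, new_beta
    return alpha, beta

print(recall(X[2]))   # presenting X3 should retrieve the pair (X3, Y3)
```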

The matrix M is calculated as
M = X1^T Y1 + X2^T Y2 + X3^T Y3 =
[ 1  1 -3 -1  1]
[ 1 -3  1 -1  1]
[-1 -1  3  1 -1]
[-1 -1 -1  1  3]
[-3  1  1  3  1]
[-1  3 -1  1 -1]
Suppose α = X3 is presented, hoping to retrieve the associated pair Y3:
α M = (-1 -1 1 -1 1 1) M = (-6 6 6 6 -6)
β' = Φ(α M) = (-1 1 1 1 -1)
β' M^T = (-5 -5 5 -3 7 5)
α' = Φ(β' M^T) = (-1 -1 1 -1 1 1)
α' M = (-6 6 6 6 -6)
Φ(α' M) = (-1 1 1 1 -1) = β'
Here β' is the bipolar form of Y3. Hence (α_F, β_F) = (X3, Y3) is the desired result.
Back-propagation learning: Back-propagation is a systematic method of training a multilayer artificial neural network. Consider a network with l input nodes, m hidden nodes and n output nodes: [Figure: l-m-n feedforward network with weights V (input to hidden) and W (hidden to output)]

Let the set of inputs be {I}_I and the corresponding outputs {O}_O.
Step 1: Input layer computation. Consider a linear activation function, so the output of the input layer equals its input:
{O}_I = {I}_I    (l x 1)
Step 2: Hidden layer computation. The input to the p-th hidden neuron is
I_Hp = V_1p O_I1 + V_2p O_I2 + ... + V_lp O_Il, p = 1, 2, ..., m
or, in matrix form,
{I}_H = [V]^T {O}_I    (m x 1) = (m x l)(l x 1)
Considering a sigmoidal function at the p-th hidden neuron, with sigmoidal gain λ, threshold θ_Hp and input I_Hp, the output of the hidden neuron is
O_Hp = 1 / (1 + e^(-λ (I_Hp - θ_Hp)))
Step 3: Output layer computation. The input to the q-th output neuron is
I_Oq = W_1q O_H1 + W_2q O_H2 + ... + W_mq O_Hm, q = 1, 2, ..., n
{I}_O = [W]^T {O}_H    (n x 1) = (n x m)(m x 1)
Considering a sigmoidal function, the output of the q-th output neuron is given by

O_Oq = 1 / (1 + e^(-λ (I_Oq - θ_Oq)))
Step 4: Calculation of error. For the r-th output neuron, the error in the output is taken as
e_r = T_r - O_Or
where T_r is the target output. The square of the error is considered since, irrespective of whether the error is positive or negative, we consider only its absolute magnitude. The Euclidean norm of the error for the first training pattern is given by
E^1 = (1/2) Σ_r (T_r - O_Or)^2
If the same technique is used for all training patterns, we get
E(V, W) = Σ_p E^p(V, W, I^p)

Lecture-41 Soft Computing
Training of a neural network: The synaptic weighting and aggregation operations performed by the synapses and soma, respectively, provide a similarity measure between the input vector I and the synaptic weights [V] and [W] (the accumulated knowledge base). When a new input pattern that is significantly different from the previously learned patterns is presented to the neural network, the similarity between this input and the existing knowledge base is small.
Method of steepest descent: The error surface is given by
E = Σ_p E^p(V, W, I^p)
A multilayer feedforward network with nonlinear activation functions has a mean-squared error surface above the total Q-dimensional weight space R^Q. In general, the error surface is complex and consists of many local and global minima. In a back-propagation (BP) network, at any given error value E, including in minima regions, there are many permutations of weights which give rise to the same value of E.

BP is never assured of finding the global minimum, unlike the single-layer delta-rule case. At the start of the training process, the gradient-descent search begins at a location with error value E determined by the initial weight assignments W(0), V(0) and the training pattern pair (I^p, O^p), where
E^p = (1/2) Σ_r (T_r - O_Or)^2
During training, the gradient-descent computations incrementally determine how the weights should be modified at each location so as to move most rapidly downhill, i.e. in the direction of steepest descent, opposite to the gradient. After the incremental adjustments of the weights have been made, the location shifts to a different point on the error-weight surface. The process is repeated for each training pattern, progressively shifting the location to lower error levels, until a limit on the total number of training cycles is reached. In moving down the error surface, the path followed is generally not the ideal path; it depends on the shape of the surface and on the learning-rate coefficient η. For simplicity, assuming the error surface to be truly spherical, the overall weight change is
ΔS = (V_{i+1} - V_i) + (W_{i+1} - W_i) = ΔV + ΔW
and the gradient is given by
g = ∂E/∂V + ∂E/∂W

Hence the unit vector in the direction of the gradient is g/|g|, and the steepest-descent update moves opposite to it:
ΔV = -η ∂E/∂V, ΔW = -η ∂E/∂W
where η is a constant (the learning rate). For the k-th output neuron, the error E_k is given by
E_k = (1/2)(T_k - O_Ok)^2
where T_k is the target output and O_Ok the computed output. To compute ∂E_k/∂W_jk, the chain rule of differentiation is applied:
∂E_k/∂W_jk = (∂E_k/∂O_Ok)(∂O_Ok/∂I_Ok)(∂I_Ok/∂W_jk)
with
∂E_k/∂O_Ok = -(T_k - O_Ok)
∂O_Ok/∂I_Ok = λ O_Ok (1 - O_Ok), for the sigmoid O_Ok = 1 / (1 + e^(-λ I_Ok))
∂I_Ok/∂W_jk = O_Hj, since I_Ok = Σ_j W_jk O_Hj

Hence
∂E_k/∂W_jk = -λ (T_k - O_Ok) O_Ok (1 - O_Ok) O_Hj
and the weight update is
ΔW_jk = η d_k O_Hj, with d_k = λ (T_k - O_Ok) O_Ok (1 - O_Ok).
In matrix form,
[ΔW] = η {O}_H <d>    (m x n) = (m x 1)(1 x n)
For the hidden-layer weights, the chain rule is applied again: the error signal reaching hidden neuron j is the weighted sum of the output-layer error signals, e_j = Σ_k W_jk d_k, and
d*_j = λ O_Hj (1 - O_Hj) e_j
[ΔV] = η {O}_I <d*> = η {I}_I <d*>
(a numerical check of this derivative is sketched below). Effect of the learning rate η: it may be kept constant through all iterations. Adding a momentum term increases the rate of convergence:
[ΔW]^{t+1} = η {O}_H <d> + α [ΔW]^t
where α is the momentum coefficient; its value should be positive but less than 1.
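The derivative just derived can be verified numerically by finite differences. In the sketch below, a single synapse W_jk feeding output neuron k is assumed, with arbitrary example values.

```python
import math

# Finite-difference check of dE_k/dW_jk = -lam*(T_k - O_k)*O_k*(1 - O_k)*O_Hj.
lam, T_k, O_Hj, W_jk = 1.0, 0.9, 0.6, 0.3

def E(w):
    O_k = 1.0 / (1.0 + math.exp(-lam * (w * O_Hj)))   # single synapse feeding neuron k
    return 0.5 * (T_k - O_k) ** 2

O_k = 1.0 / (1.0 + math.exp(-lam * (W_jk * O_Hj)))
analytic = -lam * (T_k - O_k) * O_k * (1 - O_k) * O_Hj
eps = 1e-6
numeric = (E(W_jk + eps) - E(W_jk - eps)) / (2 * eps)
print(analytic, numeric)   # the two values should agree closely
```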

Lecture-42
*Advantage of having hidden layers: Hidden layers allow the ANN to develop its own internal representation of the input-output mapping. Such a rich and complex internal representation allows the network to learn any kind of mapping, not just linearly separable ones.
Back-propagation algorithm:
Basic loop structure:
Initialize the weights
Repeat
  For each training pattern
    Train on that pattern
  End
Until the error is acceptably low
Step-by-step procedure:
Step 1. Normalize the inputs and outputs with respect to their maximum values (it has been shown that neural networks work better if the inputs and outputs lie between 0 and 1). For each training pair, assume there are l inputs given by {I}_I (l x 1) and n outputs {O}_O (n x 1) in normalized form.
Step 2. Assume the number of neurons in the hidden layer to lie in the range l < m < 2l.
Step 3. [V] represents the weights of the synapses connecting the input neurons and the hidden neurons, and [W] represents the weights of the synapses connecting the hidden neurons and the output neurons. Initialize the weights to small random values, usually from -1 to 1; for general-purpose problems, λ can be assumed to be 1. Set
[V]^0 = [random weights], [W]^0 = [random weights], [ΔV]^0 = [ΔW]^0 = [0]

Step 4. For the training data, present one set of inputs and outputs. Present the pattern {I}_I to the input layer. Using the linear activation function, the output of the input layer is evaluated as
{O}_I = {I}_I    (l x 1)
Step 5. Compute the inputs to the hidden layer by multiplying by the corresponding weights of the synapses:
{I}_H = [V]^T {O}_I    (m x 1) = (m x l)(l x 1)
Step 6. Let the hidden-layer units evaluate the output using the sigmoidal function:
O_Hi = 1 / (1 + e^(-I_Hi))
Step 7. Compute the inputs to the output layer by multiplying by the corresponding weights of the synapses:
{I}_O = [W]^T {O}_H    (n x 1) = (n x m)(m x 1)
Step 8. Let the output-layer units evaluate the output using the sigmoidal function:
O_Oj = 1 / (1 + e^(-I_Oj))
Step 9. Calculate the error, i.e. the difference between the network output and the desired output, for the i-th training set as
E^p = (1/2) Σ_j (T_j - O_Oj)^2

Step 10. Find {d} as
d_k = (T_k - O_Ok) O_Ok (1 - O_Ok)
Step 11. Find the [Y] matrix as
[Y] = {O}_H <d>    (m x n) = (m x 1)(1 x n)
Step 12. Find
[ΔW]^{t+1} = α [ΔW]^t + η [Y]    (m x n)
Step 13. Find
{e} = [W] {d}    (m x 1) = (m x n)(n x 1)
d*_i = e_i O_Hi (1 - O_Hi)
and find the [X] matrix as
[X] = {O}_I <d*> = {I}_I <d*>    (l x m) = (l x 1)(1 x m)
Step 14. Find
[ΔV]^{t+1} = α [ΔV]^t + η [X]    (l x m)
Step 15. Find
[V]^{t+1} = [V]^t + [ΔV]^{t+1}
[W]^{t+1} = [W]^t + [ΔW]^{t+1}
Step 16. Find the error rate as
error rate = Σ_p E^p / nset
where nset is the number of training sets.
Step 17. Repeat steps 4-16 until the convergence in the error rate is less than the tolerance value. A runnable sketch of this procedure is given below.
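A runnable sketch of steps 1-17 on the XOR patterns. Following the earlier remark that a bias can be treated as a link from an input fixed at +1, a constant 1 is appended to the input and hidden vectors; l = 2, m = 4, n = 1 and λ = 1, while η, α, the seed and the epoch count are assumptions (convergence depends on the random initialization).

```python
import math, random

random.seed(1)
l, m, n, eta, alpha = 2, 4, 1, 0.5, 0.8
V = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(l + 1)]   # input -> hidden (bias row included)
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m + 1)]   # hidden -> output (bias row included)
dV = [[0.0] * m for _ in range(l + 1)]
dW = [[0.0] * n for _ in range(m + 1)]
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

def forward(I):
    O_I = list(I) + [1.0]                                               # step 4 (plus bias input)
    O_H = [sig(sum(V[i][p] * O_I[i] for i in range(l + 1)))             # steps 5-6
           for p in range(m)] + [1.0]
    O_O = [sig(sum(W[p][q] * O_H[p] for p in range(m + 1)))             # steps 7-8
           for q in range(n)]
    return O_I, O_H, O_O

data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
for epoch in range(10000):
    for I, T in data:
        O_I, O_H, O_O = forward(I)
        d = [(T[q] - O_O[q]) * O_O[q] * (1 - O_O[q]) for q in range(n)] # step 10
        e = [sum(W[p][q] * d[q] for q in range(n)) for p in range(m)]   # step 13
        dstar = [e[p] * O_H[p] * (1 - O_H[p]) for p in range(m)]
        for p in range(m + 1):                                          # steps 11-12, 15
            for q in range(n):
                dW[p][q] = alpha * dW[p][q] + eta * O_H[p] * d[q]
                W[p][q] += dW[p][q]
        for i in range(l + 1):                                          # steps 13-15
            for p in range(m):
                dV[i][p] = alpha * dV[i][p] + eta * O_I[i] * dstar[p]
                V[i][p] += dV[i][p]

for I, T in data:
    print(I, T, round(forward(I)[2][0], 3))
```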

Lecture-43
Effect of the tuning parameters of the back-propagation neural network: the momentum factor, the learning coefficient, the sigmoidal gain λ, and the threshold value θ.
Momentum factor: The momentum factor has a significant role in deciding the values of the learning rate that will produce rapid learning. It determines the size of the change in weights or biases. If the momentum factor is zero, the smoothening is minimal and the entire weight adjustment comes from the newly calculated change. If the momentum factor is one, the new adjustment is ignored and the previous one is repeated. Between 0 and 1 lies the region where the weight adjustment is smoothened by an amount proportional to the momentum factor. A momentum factor of 0.9 has been found suitable for most problems; the role of the momentum factor is to increase the rate of learning.
Learning coefficient: The choice of the learning coefficient depends on the number and types of input patterns; one quoted choice is
η = 1.5 / sqrt(N1^2 + N2^2 + ... + Nm^2)
where Ni is the number of patterns of type i and m is the number of different pattern types.

Cumulative update of weights: In cumulative BP, the individual weight changes are accumulated over an epoch of training and summed, and these cumulative weight changes are then applied to the individual weights. In practice it may be difficult to spot the pattern types; the target output is used to determine a pattern type. If the learning coefficient is large (greater than 0.5), the weights are changed drastically, but this may cause the optimum combination of weights to be overshot, resulting in oscillations about the optimum. If the learning rate is small (less than 0.2), the weights are changed in small increments, causing the system to converge more slowly but with little oscillation.
Sigmoidal gain: If a sigmoidal function is selected, the input-output relationship of the neuron can be set as
O = 1 / (1 + e^(-λ (I - θ)))
where λ, known as the scaling factor or sigmoidal gain, affects back-propagation: an improper combination of the scaling factor, learning rate and momentum factor leads to over-correction and poor convergence.
Threshold value θ: it is either assigned a small value and kept constant, or it can be changed during training depending on the application.
Dealing with local minima: Fast back-propagation adjusts the activation value prior to adjusting the weights.
Extended back-propagation for recurrent networks: For a recurrent network, an extended version of back-propagation is applied to find the gradient vectors.

Consider the following network, whose node outputs must satisfy
x3 = f3(x1, x5)
x4 = f4(x2, x3)
x5 = f5(x4, x6)
x6 = f6(x4, x6)    --- (1)
There are two distinct operating modes through which the network may satisfy eq. (1):
1. Synchronous operation
2. Continuous operation

Lecture-44 Soft Computing
Synchronous operation: If a network is operated synchronously, all nodes change their outputs simultaneously according to a global clock signal, and there is a time delay associated with each link:
x3(t+1) = f3(x1(t), x5(t))
x4(t+1) = f4(x2(t), x3(t))
x5(t+1) = f5(x4(t), x6(t))
x6(t+1) = f6(x4(t), x6(t))
Back-propagation through time (BPTT): We have to identify a set of parameters/weights that will make the output of a node follow a given trajectory (tracking or trajectory following). This is done by unfolding in time, which transforms the recurrent network into a feedforward one, as long as t does not exceed a reasonable T_max.
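A minimal sketch of unfolding in time for a single recurrent node x(t+1) = tanh(w·x(t) + u·i(t)); the parameter w is shared across all time slots and collects gradient from each unfolded copy. All numerical values are assumptions.

```python
import math

# Sketch of BPTT for a single recurrent node x(t+1) = tanh(w*x(t) + u*i(t)).
# E = 0.5*(x(T) - target)^2; the gradient dE/dw is accumulated over the
# unfolded copies of the shared parameter w.

w, u, target = 0.5, 1.0, 0.8
inputs = [1.0, 0.0, 0.5, -0.2]

# forward pass through the unfolded network
xs = [0.0]
for i_t in inputs:
    xs.append(math.tanh(w * xs[-1] + u * i_t))

# backward pass: the error signal flows back through every time slot
delta = xs[-1] - target                     # dE/dx(T)
grad_w = 0.0
for t in reversed(range(len(inputs))):
    dtanh = 1.0 - xs[t + 1] ** 2            # derivative of tanh at x(t+1)
    grad_w += delta * dtanh * xs[t]         # contribution of time slot t
    delta = delta * dtanh * w               # propagate to x(t)

print(xs[-1], grad_w)
```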

The networks in fig. (2) and fig. (3) behave identically provided all copies of the parameters or weights remain identical across the different time slots. For the parameters to remain constant, one can use parameter sharing. After setting up the parameter nodes in this way, back-propagation is applied to the unfolded network as usual. The error signals of a parameter node come from nodes located at layers across different time instants; thus, the back-propagation procedure (and the corresponding steepest descent) for this kind of unfolded network is often called back-propagation through time (BPTT).
Real-Time Recurrent Learning (RTRL): The only complication with BPTT is that it requires extensive computational resources when the sequence length T is large, as the duplication of nodes makes both the simulation time and the memory requirement proportional to T. RTRL performs online learning, i.e. it updates the parameters while the network is running, rather than at the end of the presented sequence.

Consider the following network and assume
E = Σ_i E_i = Σ_i (d_i - x_i)^2
where i is the time index, d_i is the desired output and x_i the actual output. To save computation time and memory, a better option is to minimize E_i at each time step instead of trying to minimize E for the whole sequence at the end of the sequence. At i = 1 the gradient ∂E_1/∂w is computed directly; at i = 2 the gradient ∂E_2/∂w is computed using the sensitivity carried over from step 1, ∂x_2/∂w = ∂f/∂w + (∂f/∂x_1)(∂x_1/∂w); at i = 3 the same recursion is applied with ∂x_3/∂w = ∂f/∂w + (∂f/∂x_2)(∂x_2/∂w); and so on.

Lecture-45 Soft Computing
Continuously operated networks (Mason's gain formula): Recall the node equations
x3 = f3(x1, x5)
x4 = f4(x2, x3)
x5 = f5(x4, x6)
x6 = f6(x4, x6)    --- (1)
In a network operating in continuous mode, all nodes continuously change their outputs until eq. (1) is satisfied; this operating mode is of particular interest for analog circuit implementations. Here a dynamical evolution rule is imposed on the network, e.g. for node 3:
T3 dx3/dt + x3 = f3(x1, x5)

When x3 stops changing, i.e. dx3/dt is zero, eq. (1) is satisfied for that node. It is assumed that at least one such fixed point exists for every node output. Assuming the error measure E is a function of the output nodes, error signals ε_i = ∂⁺E/∂x_i (the ordered derivative of E with respect to x_i) can be propagated backwards through the node equations. For the network above:
ε3 = ε4 w43
ε4 = ε5 w54 + ε6 w64
ε5 = ε3 w35 + ∂E/∂x5
ε6 = ε5 w56 + ε6 w66 + ∂E/∂x6
where w_ij = ∂f_i/∂x_j.    --- (2)
Once the ε_i are known, the gradient with respect to a generic parameter α in node i can be found directly:
∂E/∂α = ε_i (∂f_i/∂α)
Eq. (2) can itself be represented as a recurrent network, as shown below.