On Variable Constraints in Privacy Preserving Data Mining


Charu C. Aggarwal, Philip S. Yu
IBM T. J. Watson Research Center
{ charu, psyu }@us.ibm.com

Abstract

In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. In many cases, users are unwilling to provide personal information unless the privacy of sensitive information is guaranteed. A recent framework performs privacy preserving data mining by using a condensation based approach. In this framework, the privacy of all records is treated homogeneously. It is, however, inefficient to design a system with a uniform privacy requirement over all records. We discuss a new framework for privacy preserving data mining in which the privacy of all records is not the same, but can vary considerably. This is often the case in many real applications, in which different groups of individuals may have different privacy requirements. We discuss a condensation based approach for privacy preserving data mining in which an efficient method is used for constructing the condensation in a heterogeneous way. The heterogeneous condensation is capable of handling both static and dynamic data sets. We present empirical results illustrating the effectiveness of the method.

1 Introduction

Privacy preserving data mining has become an important problem in recent years because of the large amount of consumer data tracked by automated systems on the internet. The proliferation of electronic commerce on the world wide web has resulted in the storage of large amounts of transactional and personal information about users. In addition, advances in hardware technology have also made it feasible to track information about individuals from transactions in everyday life. In many cases, users are not willing to supply such personal data unless its privacy is guaranteed. Therefore, in order to ensure effective data collection, it is important to design methods which can mine the data with a guarantee of privacy.
Some interesting discourses on the nature of privacy in the context of recent trends in information technology may be found in [6, 9, 10]. The recent focus on privacy in data collection has resulted in a considerable amount of research on the subject [1, 2, 3, 4, 5, 7, 11, 12, 15, 16]. A recent approach to privacy preserving data mining has been a condensation-based technique [2]. This technique essentially creates condensed groups of records which are then utilized in one of two ways:

The statistical information in the pseudo-groups can be utilized to generate a new set of pseudo-data which can be utilized with a variety of data mining algorithms.

The condensed pseudo-groups can be utilized directly with minor modifications of existing data mining algorithms.

The condensation approach of [2] is also referred to as the k-indistinguishability model. A record is said to be k-indistinguishable when there are at least k other records in the data (including itself) from which it cannot be distinguished. Clearly, when a record is 1-indistinguishable, it has no privacy. The k-indistinguishability of a record is achieved by placing it in a group with at least (k − 1) other records. This model shares a number of conceptual characteristics with the k-anonymity model [18], though the algorithms for achieving it are quite different. Another important difference between the two schemes is that the former does not rely on domain specific hierarchies (as in the case of the k-anonymity model). The k-indistinguishability model can also work effectively in a dynamic environment such as that created by data streams. In the model discussed in [2], it was assumed that all records have the same privacy requirement. This is also the case for the k-anonymity model, in which the level of privacy is fixed a priori. In most practical applications, this is not a reasonable assumption. For example, when a data repository contains records from heterogeneous data sources, it is rarely the case that each source has the same privacy requirement.
Similarly, in an application tracking the data for brokerage customers, the privacy requirements of retail investors are likely to be different from those of institutional investors. Even among a particular class of customers, some customers

(such as high net-worth individuals) may desire a higher level of privacy than others. In general, we would like to associate a different privacy level with each record in the data set. Let us assume that we have a database D containing N records. The records are denoted by X_1 ... X_N. We denote the desired privacy level for record X_i by p(i). The process of finding condensed groups with varying levels of point specific privacy makes the problem significantly more difficult from a practical standpoint. This is because it is not advisable to pre-segment the data into different privacy levels before performing the condensation separately for each segment. When some of the segments contain very few records, such a condensation may result in an inefficient representation of the data. In some cases, the number of records for a given level of privacy k may be lower than k. Clearly, it is not even possible to create a group containing only records with privacy level k, since the privacy level of the entire group would then be less than k. Therefore, it is not possible to create an efficient (and feasible) system of group condensation without mixing records of different privacy levels. This leads to a number of interesting trade-offs between information loss and privacy preservation. We will discuss these trade-offs and the algorithms to optimize them. In many cases, the data may be available at one time, or it may be available in a more dynamic and incremental fashion. We discuss two cases for our algorithm:

We discuss an algorithm to perform the condensation when the entire data is available at one time.

We discuss an algorithm for the case when the data is available incrementally. This is a more difficult case because it is often not possible to design the most effective condensation at the moment the data becomes available.

We will show that in most cases, the algorithm for performing the dynamic group construction is able to achieve results which are comparable to the algorithm for static group construction. This paper is organized as follows.
In the next section, we will discuss some notations and definitions, and also introduce the locality sensitive condensation approach. We will first discuss the simple case in which an entire data set is available for application of the privacy preserving approach. This approach will be extended to incrementally updated data sets in section 3. The empirical results are discussed in section 4. Finally, section 5 contains the conclusions and summary.

2 The Condensation Approach

In this section, we will discuss the condensation approach for privacy preserving data mining. Before describing the details of the algorithm, we will discuss some notations and definitions. We assume that we have a set of N records, each of which contains d dimensions. We also assume that associated with each data point i, we have a corresponding privacy level p(i). The overall database is denoted by D, whereas the database corresponding to the privacy level p is denoted by D_p. The privacy level for a record is defined as follows:

Definition 2.1. The privacy level for a given record is defined as the minimum number of other records in the data from which it cannot be distinguished.

In the condensation based approach, the data is partitioned into groups of records. Records within a given group cannot be distinguished from one another. For each group, we maintain certain summary statistics about the records. These summary statistics provide the ability to apply data mining algorithms directly to the condensed groups of records. This information also suffices to preserve information about the mean and correlations across the different dimensions. The size of the groups may vary, but each group's size is at least equal to the desired privacy level of each record in that group. Thus, a record with privacy level equal to p(i) may be condensed with records of privacy levels different from p(i). However, the size of that group must be at least equal to the maximum privacy level of any record in that group. Each group of records is referred to as a condensed unit. Let G be a condensed group containing the records {X_1 ... X_k}.
Let us also assume that each record X_i contains the d dimensions which are denoted by (x_i^1 ... x_i^d). The following information is maintained about each group of records G:

For each attribute j, we maintain the sum of corresponding values. The corresponding value is given by Σ_{i=1}^{k} x_i^j. We denote the corresponding first-order sums by Fs_j(G). The vector of first order sums is denoted by Fs(G).

For each pair of attributes i and j, we maintain the sum of the product of corresponding attribute values. The corresponding sum is given by Σ_{t=1}^{k} x_t^i · x_t^j. We denote the corresponding second order sums by Sc_ij(G). The vector of second order sums is denoted by Sc(G).

We maintain the sum of the privacy levels of the records in the group. This number is denoted by Ps(G).

Figure 1: The Efficiency of Mixing Different Privacy Levels (data points with privacy levels 3 and 4, shown before and after attrition)

We maintain the total number of records k in that group. This number is denoted by n(G).

The following facts are true about the records in a given group.

Observation 2.1. The mean value of attribute j in group G is given by Fs_j(G)/n(G).

Observation 2.2. The covariance between attributes i and j in group G is given by Sc_ij(G)/n(G) − Fs_i(G) · Fs_j(G)/n(G)².

We note that the algorithm for group construction must try to put each record in a group whose size is at least equal to the maximum privacy level of any record in the group. A natural solution is to first classify the records based on their privacy levels and then independently create the groups for the varying privacy levels. Unfortunately, this does not lead to the most efficient method for packing the sets of records into different groups. This is because the most effective method for constructing the groups may require us to combine records from different privacy levels. For example, a record with a very low privacy requirement may sometimes naturally be combined with a group of high privacy records in its locality. An attempt to construct a separate group of records with a low privacy requirement may lead to an even higher loss of information. In order to illustrate this point better, we will provide an example. Consider the set of records illustrated in Figure 1. In this case, there are 3 records with privacy level 3 and 5 records with privacy level 4. One way of grouping the records is to place all the records of privacy level 3 in one group and all records with privacy level 4 in the other. Unfortunately, the group corresponding to privacy level 4 turns out to be ineffective in representing the data. The condensed group created from this set of records has poor statistical characteristics, since one of the data points is far removed from the rest of the group.
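The group statistics above, together with Observations 2.1 and 2.2, can be sketched in a few lines of Python. This is a minimal illustration only; the class and method names are hypothetical, not from the paper:

```python
import numpy as np

class CondensedGroup:
    """Summary statistics for one condensed group G (illustrative sketch)."""
    def __init__(self, d):
        self.Fs = np.zeros(d)        # first-order sums Fs_j(G)
        self.Sc = np.zeros((d, d))   # second-order sums Sc_ij(G)
        self.Ps = 0                  # sum of privacy levels Ps(G)
        self.n = 0                   # record count n(G)

    def add(self, x, privacy_level):
        """Fold one d-dimensional record into the condensed statistics."""
        x = np.asarray(x, dtype=float)
        self.Fs += x
        self.Sc += np.outer(x, x)
        self.Ps += privacy_level
        self.n += 1

    def mean(self):
        # Observation 2.1: mean of attribute j is Fs_j(G) / n(G)
        return self.Fs / self.n

    def covariance(self):
        # Observation 2.2: Cov(i, j) = Sc_ij/n - Fs_i * Fs_j / n^2
        return self.Sc / self.n - np.outer(self.Fs, self.Fs) / self.n ** 2
```

Because only sums and products are kept, two groups can be merged by adding their statistics, which is what makes the condensed representation convenient for incremental maintenance.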
Since the condensed statistics of the group do not represent the variations within it, this can lead to an inefficient representation in many cases. In the situation illustrated in Figure 1, it is better to place the outlying record of privacy level 4 into the group with privacy level 3. We also note that it may not be possible to place this outlying record in a group with only two pre-existing members, because of the higher privacy requirement of the record.

Algorithm ConstructGroups(Level: MaxPrivacyLevel, Database: D);
p = 2;
H_1 = Groups from singleton points in D_1;
while (p ≤ MaxPrivacyLevel) do
  H_p = Segment(D_p, p);
  (H_{p−1}, H_p) = Cannibalize(H_{p−1}, H_p);
  (H_{p−1}, H_p) = Attrition(H_{p−1}, H_p);
  H_p = H_p ∪ H_{p−1};
  p = p + 1;
end;

Figure 2: The Process of Group Construction for Privacy Preserving Data Mining

Algorithm Segment(Database: D_p, Privacy level: p);
while D_p contains at least p data points do
  Sample a data point X from D_p;
  Find the (p − 1) data points closest to X in D_p;
  Create a group G of p data points comprising X and the (p − 1) closest data points;
  Add G to the set of groups H;
end;
Assign remaining data points in D_p to closest groups;

Figure 3: Group Segmentation

First, we need a measure to quantify the effectiveness of a given condensation based approach. In general, this effectiveness is related to the level of compactness with which we can partition the data into different groups. As a goal, this compactness is not very different from the aim of most clustering algorithms. However, the difference here is that there are several constraints on the cardinality of the data points in each group as

Algorithm Cannibalize(Groups: H_{p−1}, H_p);
for each group G ∈ H_{p−1} do
  for each point in G perform temporary assignment to closest group in H_p;
  if (SSQ of temporary assignment is lower) or (H_{p−1} contains fewer than (p − 1) members)
    then make assignment permanent
    else keep old assignment;

Figure 4: Cannibalization Algorithm

Algorithm Attrition(Groups: H_{p−1}, H_p, Privacy Level: p);
for each data point X in H_p do
  Distc(X, p) = distance of X to centroid of its current group in H_p;
  Dist(X, p − 1) = distance of X to centroid of its closest viable group in H_{p−1};
  Improve(X) = Distc(X, p) − Dist(X, p − 1);
end;
for each group in H_p with p′ > p points do
  find (if any) the at most (p′ − p) data points with the largest value of the Improve(·) function which is larger than 0;
  Assign these at most (p′ − p) points to their corresponding closest groups in H_{p−1};

Figure 5: Attrition Algorithm

Figure 6: An Example of Cannibalization (Group 1 at privacy level 2 is cannibalized; its points are reassigned to Groups 2 and 3 at privacy level 3)

well as the identity of the data points which can be added to a group with given cardinality. Thus, for the process of quantification of the condensation quality, we simply use the square sum error of the data points in each group. While the privacy level of a group is determined by the number of records in it, the information loss is defined by the average variance of the records about their centroid. We will refer to this quantity as the Sum Squared Error (SSQ). The method of group construction is different depending upon whether an entire database of records is available or whether the data records arrive in an incremental fashion. We will discuss two approaches for the construction of class statistics. The first approach is utilized for the case when the entire database of records is available. The second approach is utilized in an incremental scheme in which the data points arrive one at a time. First, we will discuss the static case in which the entire database of records is available.
The essence of the static approach is to construct the groups using an iterative method in which the groups are processed with increasing privacy level. The overall process of group construction is illustrated in Figure 2. The input to the algorithm is the database D and the maximum privacy level, which is denoted by MaxPrivacyLevel. We assume that the segment of the database with a privacy level requirement of p is denoted by D_p. We also assume that the set of groups with privacy level p is denoted by H_p. We note that the database D_1 consists of the set of points which have no privacy constraint at all. Therefore, the group set H_1 comprises the singleton items from the database D_1. Next, we construct the statistics of the groups in H_p using an iterative algorithm. In each iteration, we increase the privacy level p by 1, and construct the condensed groups H_p which have privacy level p. The first step is to construct the group set H_p by using a purely segmentation based process. This process is denoted by Segment in Figure 2. This segmentation process is a straightforward iterative approach. In each iteration, a record X is sampled from the database D_p. The closest (p − 1) records to this individual record X are added to its group. Let us denote this group by G. The statistics of the p records in G are computed. Next, the p records in G are removed from D_p. The process is repeated iteratively, until the database D_p is empty. We note that at the end of the process, it is possible that between 1 and (p − 1) records may remain. These records can be added to their nearest sub-group in the data. Thus, a small number of groups in the data may contain more than p data points. The segmentation procedure is illustrated in Figure 3.
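The segmentation step described above can be sketched as follows. This is a simplified illustration under the assumption that points are numpy vectors; the function name and data layout are my own, not from the paper:

```python
import random
import numpy as np

def segment(points, p):
    """Sketch of the Segment step: repeatedly sample a point, group it with
    its (p - 1) nearest neighbours, and remove the group; the final 1 to
    (p - 1) leftover points join their closest existing group."""
    remaining = list(points)
    groups = []
    while len(remaining) >= p:
        x = remaining[random.randrange(len(remaining))]
        # sort by distance to the sampled point; x itself sorts first
        remaining.sort(key=lambda y: np.linalg.norm(y - x))
        groups.append(remaining[:p])   # x plus its (p - 1) closest points
        remaining = remaining[p:]
    for y in remaining:                # leftovers: assign to nearest centroid
        g = min(groups,
                key=lambda g: np.linalg.norm(np.mean(g, axis=0) - y))
        g.append(y)
    return groups
```

Note that, exactly as in the text, some groups may end up with more than p points once the leftovers are absorbed; those excess points are what the later Attrition step tries to relocate.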

Once the segmentation procedure has been performed, we apply the processes of Attrition and Cannibalize in order to further reduce the level of information loss without compromising on the privacy requirements. The purpose of the Cannibalize procedure is slightly different. In this procedure, we intend to cannibalize some of the groups in H_{p−1} and reassign their data points to better fitting groups in H_p. Consider the example illustrated in Figure 6. In this case, we have illustrated three groups. One of the groups (containing two points) has a privacy level of two, and the other groups (containing three points each) have a privacy level of three. However, the group with privacy level two does not form a natural cluster of data points. In such a case, it may be desirable to break up the group with privacy level 2 and assign one point each to the groups with privacy level 3. Thus, cannibalization is performed when the group G ∈ H_{p−1} does not form a natural cluster. In such cases, it is more effective to cannibalize the group G and reassign its group members to one or more clusters in H_p. Another example of a situation when cannibalization is desirable is when H_{p−1} has fewer than (p − 1) members. Such a situation arises when there are very few records for a given privacy level. Consequently, it is not possible to create a group containing only the points at a particular privacy level. We refer to this test for cannibalization as the numerical test. If the group passes the numerical test, we perform an additional qualitative test to see if cannibalization should be performed. In order to test whether the cannibalization procedure should be performed, we calculate the SSQ of the regrouping when a temporary assignment of the data points in G is performed to one or more groups in H_p. If the SSQ of the resulting assignment is lower, then we make this assignment permanent. The pseudo-code for the cannibalization process is illustrated in Figure 4. By performing this operation, the appropriate privacy level of all data points is maintained.
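The two cannibalization tests (the numerical test and the SSQ-based qualitative test) can be sketched as follows. The helper names are hypothetical, and groups are represented simply as lists of coordinate tuples:

```python
import numpy as np

def ssq(groups):
    """Sum Squared Error: squared distances of points about their group centroids."""
    total = 0.0
    for g in groups:
        c = np.mean(g, axis=0)
        total += sum(np.linalg.norm(np.asarray(x) - c) ** 2 for x in g)
    return total

def should_cannibalize(group, target_groups, p):
    """Sketch of the cannibalization decision for a group at privacy level
    p - 1: break it up if it is numerically infeasible (fewer than p - 1
    members), or if temporarily reassigning its points to the closest
    groups in H_p lowers the overall SSQ."""
    if len(group) < p - 1:
        return True                          # numerical test
    before = ssq([group] + target_groups)
    moved = [list(g) for g in target_groups]
    for x in group:                          # temporary assignment
        j = min(range(len(moved)),
                key=lambda j: np.linalg.norm(
                    np.mean(moved[j], axis=0) - np.asarray(x)))
        moved[j].append(x)
    return ssq(moved) < before               # qualitative test
```

A tight, isolated low-privacy cluster fails the test and survives; a group whose points already sit inside some higher-privacy cluster is absorbed.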
This is because the cannibalization process only assigns data points to groups with a higher privacy level. Therefore, the assigned data points find themselves in a group with at least their corresponding required privacy. We note that some groups in H_p may sometimes contain more than p data points. This is due to the effects of the Segment and Cannibalize procedures discussed earlier. The idea in the Attrition procedure is to move these excess points to a better fitting group in H_{p−1}. The movement of these excess points is likely to improve the quality of data representation in terms of reducing the level of information loss. An example of such a case is illustrated in Figure 1. In this case, the group with five data points contains one record which does not fit very well with the rest of the group. In such a case, the reassignment of the data point to a group with privacy level 3 results in a more compact representation. We note that the reassigned data point has privacy level 4. However, the reassignment process results in the group with privacy level 3 containing 4 data points. Therefore, even though the data point with privacy level 4 was assigned to a group with a lower privacy level, the resulting group continues to maintain the desired level of privacy for the reassigned data point. For this purpose, during the attrition process we consider only those groups which are viable for reassignment. For a group to be considered viable, it must contain at least as many data points as the privacy level (after the assignment). Furthermore, for a group G containing p′ data points and with privacy level p, we can remove at most (p′ − p) data points from it without disturbing the privacy level of the remaining group. In order to perform the actual reassignment, we calculate a function called Improve(X) for each data point X ∈ G. The value of Improve(X) is defined to be the difference between the distance of X from its current centroid and the distance from its closest viable centroid. Clearly, the reassignment of the data point X to another group is useful only when the value of Improve(X) is larger than 0.
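The Improve(·) computation and the bounded reassignment of excess points can be sketched as follows. This is an illustrative reading of Figure 5 for a single oversized group; the function and helper names are my own:

```python
import math

def attrition(group, lower_groups, p):
    """Sketch of the Attrition step: for an H_p group holding p_prime > p
    points, move at most (p_prime - p) points with the largest positive
    Improve value to the closest viable group in H_(p-1).  A lower group
    is viable for a point of privacy level p if it holds at least p
    points after the assignment.  Points are coordinate tuples."""
    p_prime = len(group)
    cen = lambda g: tuple(sum(c) / len(g) for c in zip(*g))  # centroid
    moves = []
    for x in group:
        viable = [g for g in lower_groups if len(g) + 1 >= p]
        if not viable:
            continue
        g = min(viable, key=lambda g: math.dist(cen(g), x))
        # Improve(X) = dist to current centroid - dist to closest viable centroid
        improve = math.dist(x, cen(group)) - math.dist(x, cen(g))
        if improve > 0:
            moves.append((improve, x, g))
    moves.sort(key=lambda t: t[0], reverse=True)
    for _, x, g in moves[: p_prime - p]:     # at most (p_prime - p) moves
        group.remove(x)
        g.append(x)
    return group, lower_groups
```

In the Figure 1 scenario, the outlier in the five-point group has the only positive Improve value, so exactly one point migrates to the smaller group.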
We reassign the at most (p′ − p) data points with the largest values of Improve(·), provided that the value of Improve(·) for each of these data points is larger than 0. The overall attrition procedure is illustrated in Figure 5. The processes of segmentation, cannibalization and attrition are applied iteratively to the segment D_p of the database for each value of the privacy level p. The value of p is incremented by 1 in each iteration up to the maximum privacy level. The set of groups constructed at this point is returned as the final condensation. Once the condensed statistics have been constructed, anonymized data can be generated as discussed in [2]. The anonymized data is generated using the statistical properties which can be derived from the group. While this new set of points resembles the original data distribution, it maintains the privacy of the data. The process of anonymized group construction is achieved by first constructing a d × d covariance matrix for each group G. This matrix is denoted by C(G). The ijth entry of the covariance matrix is the covariance between the attributes i and j of the set of records in G. The eigenvectors of this covariance matrix are determined by decomposing the matrix C(G) in the following form:

(2.1) C(G) = P(G) · Δ(G) · P(G)^T

The columns of P(G) are the eigenvectors of C(G). The diagonal entries λ_1(G) ... λ_d(G) of Δ(G) represent

the corresponding eigenvalues. It can be shown that the eigenvectors of a covariance matrix form an orthonormal axis system. This orthonormal axis system represents the directions along which the second order correlations are zero. If the data were represented using this orthonormal axis system, then the covariance matrix would be the diagonal matrix corresponding to Δ(G). The diagonal entries of Δ(G) represent the variances along the individual dimensions in this new axis system. We can assume without loss of generality that the eigenvalues λ_1(G) ... λ_d(G) are ordered in decreasing magnitude. The corresponding eigenvectors are denoted by e_1(G) ... e_d(G). The anonymized data for each group is reconstructed assuming that the data within each group is independently and uniformly distributed along the different eigenvectors. Furthermore, the variance of the distribution along each eigenvector is equal to the corresponding eigenvalue. These approximations are reasonable when only a small spatial locality is used.

Figure 7: Splitting Group Statistics (Illustration). A group of range a along the split plane is divided into two halves of range a/2 each, with the centers of the split groups offset along the principal eigenvector.

3 Dynamic Maintenance of Groups

The process of dynamic maintenance of groups is useful in a variety of settings such as that of data streams. In the process of dynamic maintenance, the points in the data stream are processed incrementally. It is assumed that a set S of data points (of size InitNumber) is available at the beginning of the process. The static process ConstructGroups is applied to this set S. Once the initial groups have been constructed, a dynamic process of group maintenance is applied in order to maintain the condensed groups of varying privacy levels. The incremental algorithm works by using a nearest neighbor approach. When an incoming data point X_i is received, we find the closest cluster to it using the distance of the data point X_i to the different centroids. While it is desirable to add X_i to its closest centroid, we cannot add X_i to a given cluster which has fewer than p(i) − 1 data points in it.
Therefore, the data point X_i is added to the closest cluster which also happens to have at least p(i) − 1 data points inside it. In general, it is not desirable to have groups whose sizes are large compared to their constituent privacy levels. When such a situation arises, it effectively means that a higher level of representational inaccuracy is created than is really necessary for the privacy requirements of the points within the group. The average privacy level of the group G can be computed from the condensed statistics. This number is equal to Ps(G)/n(G). This is because Ps(G) is equal to the sum of the privacy levels of the data points in the group. The split criterion used by our algorithm is that a group is divided when the number of items in the group is more than twice the average privacy level of the items in the group. Therefore, the group is split when the following holds true:

(3.2) n(G) > 2 · Ps(G)/n(G)

As in the case of anonymized data construction, we utilize the uniformity assumption in order to split the group statistics. In each case, the group is split along the eigenvector with the largest eigenvalue. This also corresponds to the direction with the greatest level of variance. This is done in order to reduce the overall variance of the resulting clusters and ensure the greatest compactness of representation. An example of this case is illustrated in Figure 7. We assume without loss of generality that the eigenvector e_1 with the lowest index is the chosen direction of the split. The corresponding eigenvalue is denoted by λ_1. Since the variance of the data along e_1 is λ_1, the range a of the corresponding uniform distribution along e_1 is given by a = √(12 · λ_1).¹ In such a case, the original group of size 2·k is split into two groups of equal size. We need to determine the first order and second order statistical data about each of the split groups M_1 and M_2. We assume that the privacy component Ps(G) is also equally divided between the two groups. We first derive the centroid and eigenvector directions for each group.
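The split test (3.2) and the uniform-range relation a = √(12·λ₁) can be written directly as code. A minimal sketch with hypothetical function names:

```python
import math

def should_split(n_G, Ps_G):
    """Split criterion (3.2): a group splits when its size n(G) exceeds
    twice the average privacy level Ps(G)/n(G) of its members."""
    return n_G > 2.0 * Ps_G / n_G

def uniform_range(lam1):
    """Range a of a uniform distribution whose variance is lam1.  Since the
    standard deviation of a uniform distribution of range a is a/sqrt(12),
    a variance of lam1 gives a = sqrt(12 * lam1)."""
    return math.sqrt(12.0 * lam1)
```

For instance, a group of 5 points with privacy levels summing to 14 does not split (5 < 2·14/5 = 5.6), but after a sixth point of privacy level 3 joins, it does (6 > 2·17/6 ≈ 5.67).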
These values are sufficient to reconstruct the values of Fs_i(G) and Sc_ij(G) for each group. Assume that the centroid of the unsplit group M is denoted by Y(M). This centroid can be computed

Footnote 1: This calculation was done by using the formula for the standard deviation of a uniform distribution with range a. The corresponding standard deviation is given by a/√12.

from the first order values Fs(M) as follows:

(3.3) Y(M) = (Fs_1(M), ..., Fs_d(M))/n(M)

Once the centroid has been computed, those of each of the split groups can be computed as well. From Figure 7, it is easy to see that the centroids of each of the split groups M_1 and M_2 are given by Y(M) − (a/4)·e_1 and Y(M) + (a/4)·e_1 respectively. By substituting a = √(12 · λ_1), it is easy to see that the new centroids of the groups M_1 and M_2 are given by Y(M) − (√(12 · λ_1)/4)·e_1 and Y(M) + (√(12 · λ_1)/4)·e_1 respectively. We will now discuss how to compute the second order statistical values. The first step is the determination of the covariance matrix of the split groups. Let us assume that the ijth entry of the covariance matrix for the group M_1 is given by C_ij(M_1). We also note that the eigenvectors of M_1 and M_2 are identical to the eigenvectors of M, since the directions of zero correlation remain unchanged by the splitting process. Therefore, we have:

e_1(M_1) = e_1(M_2) = e_1(M)
e_2(M_1) = e_2(M_2) = e_2(M)
...
e_d(M_1) = e_d(M_2) = e_d(M)

The eigenvalue (in the split groups M_1 and M_2) corresponding to e_1(M) is equal to λ_1/4. This is because the splitting process along e_1 reduces the corresponding variance by a factor of 4. The other eigenvalues remain unchanged. Let P(M) represent the eigenvector matrix of M, and Δ(M) represent the corresponding diagonal matrix. Then, the new diagonal matrix Δ(M_1) = Δ(M_2) of M_1 can be derived by dividing the entry λ_1(M) by 4. Therefore, we have:

λ_1(M_1) = λ_1(M_2) = λ_1(M)/4

The other eigenvalues of M_1 and M_2 remain the same:

λ_2(M_1) = λ_2(M_2) = λ_2(M)
λ_3(M_1) = λ_3(M_2) = λ_3(M)
...
λ_d(M_1) = λ_d(M_2) = λ_d(M)

Thus, the (identical) covariance matrices of M_1 and M_2 may be determined as follows:

C(M_1) = P(M_1) · Δ(M_1) · P(M_1)^T

From Observation 2.2, it is clear that the second order statistics of M_1 may be determined as follows (where k = n(M_1) is the number of points in the split group):

Sc_ij(M_1) = k · C_ij(M_1) + Fs_i(M_1) · Fs_j(M_1)/k

An important observation is that even though the covariance matrices of M_1 and M_2 are identical, the values of Sc_ij(M_1) and Sc_ij(M_2) are different, because of the different first order aggregates substituted in the above formula. The overall process for splitting the group statistics is illustrated in Figure 7. Another interesting point to be noted is that the entire purpose of splitting is to keep group sizes sufficiently compact for data mining algorithms. The process of splitting itself can never result in the violation of the privacy condition, since the split is based on a split of the statistics, but not of the data points themselves. In order to understand this point, let us consider the following example of a case where the split condition seems to violate privacy. Consider a group having 5 tuples, the privacy constraints of the tuples being 2, 2, 2, 3, and 5 respectively. The group does not split because 5 < 2 · 14/5. Now, if a new tuple having privacy constraint 3 joins the group, the splitting condition is satisfied, since 6 > 2 · 17/6. Hence each of the split groups corresponds to the statistics of 3 data points. Therefore, it would apparently seem that the privacy of the tuple with requirement 5 has been violated. This is not the case, since we split the statistics into two pseudo-groups of 3 points each, rather than actually splitting the points themselves. The process of performing the split partitions the statistics based on a probability distribution assumption (uniform distribution) rather than using the actual points themselves (which have already been lost in the merged statistics). The tuple with privacy condition 5 may contribute to the statistics of both groups when the splitting condition is used.
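The entire statistics-splitting derivation above can be collected into one sketch. This is an illustrative implementation under the paper's uniformity assumption (even group size, principal eigenvector split); the function name and return layout are my own:

```python
import numpy as np

def split_group(Fs, Sc, Ps, n):
    """Sketch of the statistics split: place the two new centroids at
    Y(M) -/+ (sqrt(12*lam1)/4) * e1, divide lam1 by 4, keep all other
    eigenvalues and eigenvectors, then rebuild Fs and Sc for each half
    from Observation 2.2.  Assumes n is even, as in the paper."""
    Y = Fs / n
    C = Sc / n - np.outer(Fs, Fs) / n ** 2          # covariance C(M)
    lam, P = np.linalg.eigh(C)                      # eigen-decomposition
    i = np.argmax(lam)                              # principal direction e1
    e1, lam1 = P[:, i], lam[i]
    offset = (np.sqrt(12.0 * lam1) / 4.0) * e1      # a/4 along e1
    lam_new = lam.copy()
    lam_new[i] = lam1 / 4.0                         # variance along e1 shrinks 4x
    C_new = P @ np.diag(lam_new) @ P.T              # identical for both halves
    k = n // 2
    halves = []
    for Y_new in (Y - offset, Y + offset):
        Fs_new = k * Y_new
        # Observation 2.2 inverted: Sc = k*C + Fs*Fs^T / k
        Sc_new = k * C_new + np.outer(Fs_new, Fs_new) / k
        halves.append((Fs_new, Sc_new, Ps / 2.0, k))
    return halves
```

As the text notes, the two halves share one covariance matrix but carry different second order sums, because different first order aggregates enter the reconstruction formula.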
Each pseudo-group thus has a privacy level as high as the unsplit group from the perspective of the old data points in it, but at the same time we would need to use the size of the pseudo-group while considering the addition of further data points into the smaller pseudo-groups. In order to test the quality of our results, we applied our approach to a nearest neighbor classifier. In the classification process, the condensation was performed separately for each class. In the next section, we will discuss the behavior of this nearest neighbor classifier.

4 Empirical Results

We tested the privacy preserving approach over a wide range of data sets and metrics. An important question which arises in the context of a privacy preserving

Figure 8: Accuracy of Classifier with Increasing Privacy Level (Ionosphere Data Set). Classification accuracy vs. alpha (maximum group size) for static condensation, dynamic condensation, and the original data.

Figure 9: Covariance Compatibility of Condensed Data Set with Increasing Privacy Level (Ionosphere Data Set). Covariance compatibility coefficient vs. alpha for static and dynamic condensation.

Figure 10: Accuracy of Classifier with Increasing Privacy Level (Ecoli Data Set).

Figure 11: Covariance Compatibility of Condensed Data Set with Increasing Privacy Level (Ecoli Data Set).

Figure 12: Accuracy of Classifier with Increasing Privacy Level (Pima Indian Data Set).

Figure 13: Covariance Compatibility of Condensed Data Set with Increasing Privacy Level (Pima Indian Data Set).

Figure 14: Accuracy of Classifier with Increasing Privacy Level (Abalone Data Set).

Figure 15: Covariance Compatibility of Condensed Data Set with Increasing Privacy Level (Abalone Data Set).

approach is the nature of the metric to be used in order to test the quality of the approach. The first step is to test the nature of the tradeoff between increased levels of privacy and the resulting information loss. While the level of privacy is controlled by the average condensed group size, the information loss is measured indirectly in terms of the effect of the perturbation on the quality of data mining algorithms. We tested the accuracy of a simple k-nearest neighbor classifier with the use of different levels of privacy. The minimum privacy level of each data point was generated from a (discrete) uniform distribution in the range [α − β, α]. By changing the value of α, it is possible to vary the level of privacy during the condensation process. The aim of our approach is to show that a high level of privacy can be achieved without significantly compromising accuracy. Another useful metric for testing the quality of the privacy preserving process arises from the level of matching between the original and perturbed data.
This provides insight into the nature of the relationship between the original data set and the perturbed data set. The first step is therefore to identify the statistics used for testing the effectiveness of the perturbation process. One simple method is to test how well the covariance structure of the perturbed data set matches that of the original data set. This is because the covariance structure of the data identifies the essential data properties up to a second-order approximation. If the newly created data set has very similar data characteristics to the original data set, then the condensed data set is a good substitute for most data mining algorithms. For each dimension pair (i, j), let the corresponding entries in the covariance matrices for the original and the perturbed data be denoted by c_ij and c_ij^p respectively. We computed the statistical coefficient of correlation between the data entry pairs (c_ij, c_ij^p). Let us denote this value by µ. When the two matrices are identical, the value of µ is 1. On the other hand, when there is perfect negative correlation between the entries, the value of µ is −1. A number of real data sets from the UCI machine learning repository² were used for the testing. We used the Ionosphere, Ecoli, Pima Indian and Abalone data sets. The last data set was a regression modeling problem, and therefore the classification measure needed to be redefined. For this problem, the classification accuracy measure used was the percentage of the time that the age was predicted within an accuracy of less than one year by the nearest neighbor classifier. In many cases, the number of data points for a given privacy level was lower than the numerical value of the privacy level itself. In such cases, the mixing of data points of different privacy levels is inevitable. Thus, the condensation process could not have been performed in such cases using the homogeneous k-anonymity model or k-indistinguishability model [2, 18]. The results on classification accuracy for the Ionosphere, Ecoli, Pima Indian, and Abalone data sets are illustrated in Figures 8, 10, 12 and 14 respectively. The value of β was fixed to 4, whereas the value of α was varied over the different data sets. The range of values of α is determined by the number of data points in the particular data set at hand. This value of α is illustrated on the X-axis. On the Y-axis, we have plotted the classification accuracy of the nearest neighbor classifier when the condensation technique was used. For each graph, we have illustrated the results using both static and dynamic condensation. In addition, a baseline is marked on each graph. This baseline is a horizontal line which shows the classification accuracy on the original data. It is clear that in most cases, the accuracy of classification reduced with increasing group size.
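The covariance compatibility coefficient µ described above can be computed directly (a minimal sketch; the synthetic data and perturbation are ours, for illustration only):

```python
import numpy as np

def covariance_compatibility(original, perturbed):
    # mu: correlation coefficient between the entries of the
    # covariance matrices of the original and perturbed data.
    # mu = 1 means identical second-order structure; mu = -1
    # means perfect negative correlation between the entries.
    c_o = np.cov(original, rowvar=False).ravel()
    c_p = np.cov(perturbed, rowvar=False).ravel()
    return np.corrcoef(c_o, c_p)[0, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
noisy = X + rng.normal(scale=0.05, size=X.shape)
mu = covariance_compatibility(X, noisy)
print(round(mu, 3))
```

For a mild perturbation such as the one above, µ is close to 1, mirroring the near-perfect compatibility values reported for the condensed data.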
This is a natural tradeoff because a greater amount of privacy is achieved with larger group sizes. At the same time, it leads to a higher amount of information loss. In many cases, the quality of the classification improved because of the condensation process. While the aim of our approach was to provide a high level of privacy without losing information, it appears that the process of condensation itself actually helped in removing the anomalies in the data for the purpose of classification. This phenomenon is likely to be helpful over a number of different data mining problems in which the aggregate behavior of the data is exposed by the condensation process. Furthermore, the static condensation approach provided higher quality results than the dynamic technique. This is because the splitting algorithm of the dynamic condensation process introduced an additional level of approximation into the data representation. The splitting procedure assumed a uniform distribution of the data within a condensed group of data points. The accuracy of this approximation reduces when group sizes are small. In such cases, there are simply too few data points to make an accurate estimation of the values of the split group statistics. Thus, the use of the uniform distribution approximation reduces the quality of the covariance statistics in the split groups for small group sizes. For this reason, the dynamic condensation process was sometimes less effective than the static condensation approach. However, in all cases, the dynamic condensation approach worked almost as effectively as the classifier on the original data. One notable exception to the general advantage of the static condensation process was the behavior on the Pima Indian data set. In this case, the dynamic condensation process provided results of higher quality for larger group sizes. The reason for this was that the splitting process seemed to improve the quality of the classification. The data set seemed to contain a number of anomalies, and these anomalies were removed by the splitting process.

2 http://mlearn
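The split group statistics mentioned above presuppose that each condensed group maintains moment statistics from which its mean and covariance can be reconstructed. A simplified sketch of such group statistics follows (our reading of the condensation framework of [2]; the class and method names are illustrative, not the authors' data structure):

```python
import numpy as np

class CondensedGroup:
    # Per-group condensed statistics: record count, first-order
    # sums, and second-order sums. Mean and covariance of the
    # group are reconstructed from these alone, without keeping
    # the original records.
    def __init__(self, d):
        self.n = 0
        self.fs = np.zeros(d)        # first-order sums
        self.sc = np.zeros((d, d))   # second-order sums

    def add(self, x):
        self.n += 1
        self.fs += x
        self.sc += np.outer(x, x)

    def mean(self):
        return self.fs / self.n

    def covariance(self):
        m = self.mean()
        return self.sc / self.n - np.outer(m, m)

rng = np.random.default_rng(2)
pts = rng.normal(size=(200, 3))
g = CondensedGroup(3)
for p in pts:
    g.add(p)
# The reconstructed covariance matches the population covariance
# of the points that were absorbed into the group.
ok = np.allclose(g.covariance(), np.cov(pts, rowvar=False, bias=True))
print(ok)
```

When such a group is split, only these statistics are available, which is why a uniform distribution within the group must be assumed and why the approximation degrades for small groups.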
This resulted in a higher classification accuracy for the dynamic approach. We also compared the covariance characteristics of the data sets. The results are illustrated in Figures 9, 11, 13 and 15 respectively. For most data sets, the value of the statistical correlation is almost perfect, with correlation values very close to 1 in most cases. For some examples, such as the Abalone data set (illustrated in Figure 15), the covariance compatibility value was even higher. These results emphasize the fact that the perturbed data is similar to the original data in terms of its statistical structure. As in the previous case, the results for the case of static condensation were better than those for dynamic condensation. This is again because of the additional inaccuracy introduced by the splitting process. In all cases, the absolute correlation provided by the scheme was very high. In the dynamic case, the correlation coefficient tended to drop for small group sizes. The only exception to this general rule was the Ionosphere data set, in which the covariance compatibility values were slightly lower for the static case. The covariance compatibility also reduced for extremely large group sizes. This is because in such a case, the pseudo-data no longer represents a particular data locality well. Thus, the covariance compatibility was highest in those cases in which the data contained tight clusters comprising a relatively modest number of

data points. This is because of the following reasons:

- When the number of points in each cluster is large, the accuracy of the uniform distribution assumption during the splitting process is maintained.
- When the clusters are tight, the data points represent a small spatial locality with respect to the rest of the data set. An approximation in a small spatial locality does not significantly affect the overall correlation structure.

We note that the process of representing a small spatial locality in a group and that of representing a larger number of data points in a group are two competing and contradictory goals. It is important to pick a balance between the two, since this tradeoff defines the quality of performance of the underlying data mining algorithm. This balance is externally defined, since the average group size is determined by the privacy requirements of the users. In general, since our approach continued to be as effective as the base classification accuracy over a wide range of group sizes, this illustrates the effectiveness of our methodology in most practical scenarios.

5 Conclusions and Summary

In this paper, we discussed a scheme for privacy preserving data mining in which the data points are allowed to have variable privacy levels. This is useful in a number of applications in which different records have inherently different privacy requirements. We propose a method for privacy protection in a data stream environment using condensed statistics of the data set. These condensed statistics can either be generated statically, or they can be generated dynamically in a data stream environment. We tested our results on a number of real data sets from the UCI machine learning repository. The results show that our method produces data sets which are quite similar to the original data in structure, and also exhibit similar accuracy results.

References

[1] C. C. Aggarwal and S. Parthasarathy, Mining Massively Incomplete Data Sets by Conceptual Reconstruction, Proceedings of the ACM KDD Conference, (2001).
[2] C. C. Aggarwal and P. S.
Yu, A Condensation Based Approach to Privacy Preserving Data Mining, Proceedings of the EDBT Conference, (2004).
[3] D. Agrawal and C. C. Aggarwal, On the Design and Quantification of Privacy Preserving Data Mining Algorithms, Proceedings of the ACM PODS Conference, (2002).
[4] R. Agrawal and R. Srikant, Privacy Preserving Data Mining, Proceedings of the ACM SIGMOD Conference, (2000).
[5] P. Benassi, TRUSTe: An online privacy seal program, Communications of the ACM, 42(2), (1999).
[6] C. Clifton and D. Marks, Security and Privacy Implications of Data Mining, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, (1996).
[7] J. Vaidya and C. Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, ACM KDD Conference, (2002).
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., New York, (1991).
[9] L. F. Cranor (Ed.), Special Issue on Internet Privacy, Communications of the ACM, 42(2), (1999).
[10] The Economist, The End of Privacy, (1999).
[11] V. Estivill-Castro and L. Brankovic, Data Swapping: Balancing privacy against precision in mining for logic rules, Data Warehousing and Knowledge Discovery, Springer-Verlag, Lecture Notes in Computer Science 1676, (1999).
[12] A. Evfimievski, R. Srikant, R. Agrawal and J. Gehrke, Privacy Preserving Mining of Association Rules, ACM KDD Conference, (2002).
[13] A. Hinneburg and D. A. Keim, An Efficient Approach to Clustering in Large Multimedia Databases with Noise, ACM KDD Conference, (1998).
[14] V. S. Iyengar, Transforming Data to Satisfy Privacy Constraints, ACM KDD Conference, (2002).
[15] C. K. Liew, U. J. Choi and C. J. Liew, A data distortion by probability distribution, ACM TODS Journal, 10(3), (1985).
[16] T. Lau, O. Etzioni and D. S. Weld, Privacy Interfaces for Information Management, Communications of the ACM, 42(10), (1999).
[17] S. Murthy, Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey, Data Mining and Knowledge Discovery, 2, (1998).
[18] P. Samarati and L. Sweeney, Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement Through Generalization and Suppression, Proceedings of the IEEE Symposium on Research in Security and Privacy, (1998).
[19] S. L. Warner, Randomized Response: A survey technique for eliminating evasive answer bias, Journal of the American Statistical Association, 60(309), (1965).


More information

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction

T Algorithmic methods for data mining. Slide set 6: dimensionality reduction T-61.5060 Algrithmic methds fr data mining Slide set 6: dimensinality reductin reading assignment LRU bk: 11.1 11.3 PCA tutrial in mycurses (ptinal) ptinal: An Elementary Prf f a Therem f Jhnsn and Lindenstrauss,

More information

Lab 1 The Scientific Method

Lab 1 The Scientific Method INTRODUCTION The fllwing labratry exercise is designed t give yu, the student, an pprtunity t explre unknwn systems, r universes, and hypthesize pssible rules which may gvern the behavir within them. Scientific

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic.

Sections 15.1 to 15.12, 16.1 and 16.2 of the textbook (Robbins-Miller) cover the materials required for this topic. Tpic : AC Fundamentals, Sinusidal Wavefrm, and Phasrs Sectins 5. t 5., 6. and 6. f the textbk (Rbbins-Miller) cver the materials required fr this tpic.. Wavefrms in electrical systems are current r vltage

More information

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y=

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y= Intrductin t Vectrs I 21 Intrductin t Vectrs I 22 I. Determine the hrizntal and vertical cmpnents f the resultant vectr by cunting n the grid. X= y= J. Draw a mangle with hrizntal and vertical cmpnents

More information

Thermodynamics and Equilibrium

Thermodynamics and Equilibrium Thermdynamics and Equilibrium Thermdynamics Thermdynamics is the study f the relatinship between heat and ther frms f energy in a chemical r physical prcess. We intrduced the thermdynamic prperty f enthalpy,

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came. MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the

More information

Five Whys How To Do It Better

Five Whys How To Do It Better Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex

More information

IN a recent article, Geary [1972] discussed the merit of taking first differences

IN a recent article, Geary [1972] discussed the merit of taking first differences The Efficiency f Taking First Differences in Regressin Analysis: A Nte J. A. TILLMAN IN a recent article, Geary [1972] discussed the merit f taking first differences t deal with the prblems that trends

More information

Chapters 29 and 35 Thermochemistry and Chemical Thermodynamics

Chapters 29 and 35 Thermochemistry and Chemical Thermodynamics Chapters 9 and 35 Thermchemistry and Chemical Thermdynamics 1 Cpyright (c) 011 by Michael A. Janusa, PhD. All rights reserved. Thermchemistry Thermchemistry is the study f the energy effects that accmpany

More information

We can see from the graph above that the intersection is, i.e., [ ).

We can see from the graph above that the intersection is, i.e., [ ). MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with

More information

Differentiation Applications 1: Related Rates

Differentiation Applications 1: Related Rates Differentiatin Applicatins 1: Related Rates 151 Differentiatin Applicatins 1: Related Rates Mdel 1: Sliding Ladder 10 ladder y 10 ladder 10 ladder A 10 ft ladder is leaning against a wall when the bttm

More information

SPH3U1 Lesson 06 Kinematics

SPH3U1 Lesson 06 Kinematics PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

o o IMPORTANT REMINDERS Reports will be graded largely on their ability to clearly communicate results and important conclusions.

o o IMPORTANT REMINDERS Reports will be graded largely on their ability to clearly communicate results and important conclusions. BASD High Schl Frmal Lab Reprt GENERAL INFORMATION 12 pt Times New Rman fnt Duble-spaced, if required by yur teacher 1 inch margins n all sides (tp, bttm, left, and right) Always write in third persn (avid

More information

Fall 2013 Physics 172 Recitation 3 Momentum and Springs

Fall 2013 Physics 172 Recitation 3 Momentum and Springs Fall 03 Physics 7 Recitatin 3 Mmentum and Springs Purpse: The purpse f this recitatin is t give yu experience wrking with mmentum and the mmentum update frmula. Readings: Chapter.3-.5 Learning Objectives:.3.

More information

Writing Guidelines. (Updated: November 25, 2009) Forwards

Writing Guidelines. (Updated: November 25, 2009) Forwards Writing Guidelines (Updated: Nvember 25, 2009) Frwards I have fund in my review f the manuscripts frm ur students and research assciates, as well as thse submitted t varius jurnals by thers that the majr

More information

Physics 2010 Motion with Constant Acceleration Experiment 1

Physics 2010 Motion with Constant Acceleration Experiment 1 . Physics 00 Mtin with Cnstant Acceleratin Experiment In this lab, we will study the mtin f a glider as it accelerates dwnhill n a tilted air track. The glider is supprted ver the air track by a cushin

More information

Introduction to Quantitative Genetics II: Resemblance Between Relatives

Introduction to Quantitative Genetics II: Resemblance Between Relatives Intrductin t Quantitative Genetics II: Resemblance Between Relatives Bruce Walsh 8 Nvember 006 EEB 600A The heritability f a trait, a central cncept in quantitative genetics, is the prprtin f variatin

More information

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium

Lecture 17: Free Energy of Multi-phase Solutions at Equilibrium Lecture 17: 11.07.05 Free Energy f Multi-phase Slutins at Equilibrium Tday: LAST TIME...2 FREE ENERGY DIAGRAMS OF MULTI-PHASE SOLUTIONS 1...3 The cmmn tangent cnstructin and the lever rule...3 Practical

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

Pipetting 101 Developed by BSU CityLab

Pipetting 101 Developed by BSU CityLab Discver the Micrbes Within: The Wlbachia Prject Pipetting 101 Develped by BSU CityLab Clr Cmparisns Pipetting Exercise #1 STUDENT OBJECTIVES Students will be able t: Chse the crrect size micrpipette fr

More information

On Huntsberger Type Shrinkage Estimator for the Mean of Normal Distribution ABSTRACT INTRODUCTION

On Huntsberger Type Shrinkage Estimator for the Mean of Normal Distribution ABSTRACT INTRODUCTION Malaysian Jurnal f Mathematical Sciences 4(): 7-4 () On Huntsberger Type Shrinkage Estimatr fr the Mean f Nrmal Distributin Department f Mathematical and Physical Sciences, University f Nizwa, Sultanate

More information

Coalition Formation and Data Envelopment Analysis

Coalition Formation and Data Envelopment Analysis Jurnal f CENTRU Cathedra Vlume 4, Issue 2, 20 26-223 JCC Jurnal f CENTRU Cathedra Calitin Frmatin and Data Envelpment Analysis Rlf Färe Oregn State University, Crvallis, OR, USA Shawna Grsspf Oregn State

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

Keysight Technologies Understanding the Kramers-Kronig Relation Using A Pictorial Proof

Keysight Technologies Understanding the Kramers-Kronig Relation Using A Pictorial Proof Keysight Technlgies Understanding the Kramers-Krnig Relatin Using A Pictrial Prf By Clin Warwick, Signal Integrity Prduct Manager, Keysight EEsf EDA White Paper Intrductin In principle, applicatin f the

More information

Churn Prediction using Dynamic RFM-Augmented node2vec

Churn Prediction using Dynamic RFM-Augmented node2vec Churn Predictin using Dynamic RFM-Augmented nde2vec Sandra Mitrvić, Jchen de Weerdt, Bart Baesens & Wilfried Lemahieu Department f Decisin Sciences and Infrmatin Management, KU Leuven 18 September 2017,

More information

APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL

APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL JP2.11 APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL Xingang Fan * and Jeffrey S. Tilley University f Alaska Fairbanks, Fairbanks,

More information

ROUNDING ERRORS IN BEAM-TRACKING CALCULATIONS

ROUNDING ERRORS IN BEAM-TRACKING CALCULATIONS Particle Acceleratrs, 1986, Vl. 19, pp. 99-105 0031-2460/86/1904-0099/$15.00/0 1986 Grdn and Breach, Science Publishers, S.A. Printed in the United States f America ROUNDING ERRORS IN BEAM-TRACKING CALCULATIONS

More information

UNIT 6 DETERMINATION OF FLASH AND FIRE POINT OF A LUBRICATING OIL BY OPEN CUP AND CLOSED CUP METHODS

UNIT 6 DETERMINATION OF FLASH AND FIRE POINT OF A LUBRICATING OIL BY OPEN CUP AND CLOSED CUP METHODS UNIT 6 DETERMINATION OF FLASH AND FIRE POINT OF A LUBRICATING OIL BY OPEN CUP AND CLOSED CUP METHODS Determinatin f Flash and Fire Pint f a Cup and Clsed Cup Structure 6. Intrductin Objectives 6. Experiment

More information

Module 4: General Formulation of Electric Circuit Theory

Module 4: General Formulation of Electric Circuit Theory Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

Physics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018

Physics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018 Michael Faraday lived in the Lndn area frm 1791 t 1867. He was 29 years ld when Hand Oersted, in 1820, accidentally discvered that electric current creates magnetic field. Thrugh empirical bservatin and

More information

Revisiting the Socrates Example

Revisiting the Socrates Example Sectin 1.6 Sectin Summary Valid Arguments Inference Rules fr Prpsitinal Lgic Using Rules f Inference t Build Arguments Rules f Inference fr Quantified Statements Building Arguments fr Quantified Statements

More information