On classifier behavior in the presence of mislabeling noise


Data Min Knowl Disc

On classifier behavior in the presence of mislabeling noise

Katsiaryna Mirylenka (1) · George Giannakopoulos (2) · Le Minh Do (3) · Themis Palpanas (4)

Received: 12 November 2015 / Accepted: 1 November 2016
© The Author(s) 2016

Responsible editor: Johannes Fuernkranz.

Katsiaryna Mirylenka: Work mainly done while at the University of Trento, Italy.

Corresponding author: Katsiaryna Mirylenka, kmi@zurich.ibm.com
George Giannakopoulos: ggianna@iit.demokritos.gr
Le Minh Do: leminh.do@studenti.unitn.it
Themis Palpanas: themis@mi.parisdescartes.fr

1 IBM Research Zurich, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
2 Institute of Informatics and Telecommunications of NCSR Demokritos, Ag. Paraskevi, Athens, Greece
3 University of Trento, via Sommarive 5, Trento, Italy
4 Department of Computer Science, Paris Descartes University, 45 Rue des Saints-Pères, Paris, France

Abstract Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of the right algorithm for a particular learning problem is crucial. The contribution of this paper is towards two, dual problems: first, comparing algorithm behavior; and second, choosing learning algorithms for noisy settings. We present the sigmoid rule framework, which can be used to choose the most appropriate learning algorithm depending on the properties of noise in a classification problem. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the characteristics of the sigmoid function using

five representative non-sequential classifiers, namely Naïve Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier, and three widely used sequential classifiers based on hidden Markov models, conditional random fields and recurrent neural networks. Based on the sigmoid parameters we define a set of intuitive criteria that are useful for comparing the behavior of learning algorithms in the presence of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, showing that we can estimate an expected performance over a dataset regardless of the underlying algorithm. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy time series of an evolving nature.

Keywords Classification · Sequential classifiers · Classifier evaluation · Handling noise · Concept drift

1 Introduction

Transforming vast amounts of collected, possibly noisy, data into useful information through the processes of clustering and classification has important applications in many domains. One example is the detection of customer preferences for marketing. Another example is the detection of negligent providers on Mechanical Turk and the reduction of the effect of their behavior on task results. The machine learning and data mining communities have extensively studied the behavior of classifiers in different settings (e.g., Han and Kamber 2006; Theodoridis and Koutroumbas 2003); however, the effect of noise on the classification task is still an important and open problem. The importance of studying noisy data settings is augmented by the fact that noise is very common in a variety of large-scale data sources, such as sensor networks and the Web, while machine learning algorithms (both classic and sequential) perform differently in settings with varying levels of labeling noise. Thus, there arises a need for a unified framework aimed at studying the behavior of learning algorithms in the presence of noise, depending on the specifics of each particular dataset.

Labeling noise is common in cases of concept drift, where a target concept shifts over time, rendering previous training instances obsolete. Essentially, in the case of concept drift, feature noise causes the labels of previous training instances to be out of date and, thus, equivalent to label noise. Drifting concepts appear in a variety of settings in the real world, such as the state of a free market, or the traits of the most viewed movie. Thus, we expect this work to offer intuition and tools for meta-learning in similar, noise-prone, real cases of learning.

Earlier work has shown that the performance [1] of a classifier in the presence of noise can be effectively approximated by a sigmoid function, which relates the signal-to-noise ratio in training data to the expected performance of the classifier (Giannakopoulos and Palpanas 2010). We call this fact the sigmoid rule. The sigmoid rule states that the expected performance of each learning algorithm can be seen as a sigmoid function.

[1] When we write "performance" of an algorithm, we mean the classification accuracy.

Fig. 1 An example of a sigmoid function relating signal-to-noise (ratio of mislabeled instances) in the training set (horizontal axis) to expected classification performance (vertical axis)

This function indicates the impact of the signal-to-noise ratio in the training set on the performance of the algorithm, under an assumption of monotonicity of the change of performance (i.e., when cleaner data are provided, the algorithm is expected to perform better). The added value of this view is that it allows a single model, with a limited number of parameters, to be used as an illustration of the behavior of all learning algorithms in a noisy setting. We illustrate an example of a sigmoid function in Fig. 1 and elaborate on the concept in Sect. 3.

In this work, we study the effect of training set label noise [2] on a classification task. We further build on this study to achieve two, dual goals:

- We show how to support the comparison between algorithms and their behavior in a noisy setting. The workflow in this case is as follows: we select an algorithm; we sample the input for training data; we estimate a set of parameters (based on the proposed sigmoid rule framework we discuss below) that characterize the behavior of the algorithm in the presence of noise; we compare algorithms based on those parameters. Thus, in Sect. 4, we examine how much added benefit we can derive from the sigmoid rule model, by elaborating on the parameters of the sigmoid and illustrating the connection of each such parameter to the learner behavior. Based on the most prominent parameters we define several dimensions characterizing the behavior of learning algorithms. We, finally, illustrate how these dimensions can be used to select learning algorithms in different settings, based on user requirements.
- We show how to allow for the prediction of algorithm performance, given dataset attributes related to a specific classification problem. The workflow in this case is as follows: we calculate a set of dataset attributes (i.e., the number of classes, features and instances, and the fractal dimensionality; de Sousa et al. 2006); we use estimation (regression) models to predict the expected performance regardless of the choice of the underlying algorithm. We study the correlation between dataset characteristics and modeled performance in Sect. 7.

Moreover, we extend our study to sequential classification cases, to illustrate the broad applicability and usefulness of the sigmoid rule framework (SRF). The SRF uses a model of the expected performance of learning algorithms based on a sigmoid function of the signal-to-noise ratio in the training instances. SRF is specialized for noisy settings, providing more insights regarding the behavior of the algorithms. Essentially, it describes a meta-model (the SRF sigmoid) over various algorithms, thus making it a special case of meta-modeling.

[2] Throughout the rest of this work we will use the term noise to refer to label noise, unless otherwise indicated.

Contributions In summary, we make the following contributions.

- We define a set of intuitive criteria based on the SRF that can be used to compare the behavior of learning algorithms in the presence of noise. This set of criteria provides both quantitative and qualitative support for learner selection in different settings.
- We demonstrate that there is a connection between the SRF dimensions and the characteristics of the underlying dataset, using both a correlation study and regression modeling based on linear and logistic regression. In both cases we discovered statistically significant relations between SRF dimensions and dataset characteristics. Our analysis shows that these relations are stronger for the sequential classifiers than for the non-sequential, traditional classifiers, where the instances are considered to be independent.
- Our results are based on an extensive experimental evaluation, using 10 synthetic and 31 real datasets from diverse domains. The heterogeneity of the dataset collection validates the general applicability of the SRF for both non-sequential and sequential learners.

The rest of this paper [3] is organized as follows. We review the related work (Sect. 2) and define the problem studied (Sect. 3). We then focus on the sigmoid function and define the SRF dimensions (Sect. 4), describing their qualitative value. We test our framework on multiple datasets (Sect. 6), statistically analyzing the connection between dataset properties and SRF dimensions (Sect. 7) for both sequential and non-sequential classification tasks. Section 8 provides a summary of the proposed approach and the experimental results. Finally, we conclude in Sect. 9.

2 Related work

Since 1984, when Valiant published his theory of the learnable (Valiant 1984), setting the basis of probably approximately correct (PAC) learning, numerous different machine learning algorithms and classification problems have been devised and studied in the machine learning community. Given the variety of existing learning algorithms, researchers are often interested in obtaining the best algorithm for their particular tasks. Algorithm selection is considered to be a part of the meta-learning domain (Giraud-Carrier et al. 2004). According to the no-free-lunch (NFL) theorems, described in Wolpert (2001) and proven in Wolpert (1996a, b), there is no overall best classification algorithm, given the fact that all algorithms have equal performance on average over all datasets. Nevertheless, we are usually interested in applying machine learning algorithms to a specific dataset with certain characteristics, in which case we are not limited by the NFL theorems.

[3] A preliminary version of this work has appeared in Mirylenka et al. (2012).

The results of the NFL theorems hint at comparing different classification algorithms on the basis of dataset characteristics (Ali and Smith 2006), which is the aim of our work in this research direction. Concerning the measures of performance that help distinguish among learners, in Ali and Smith (2006) the authors compared algorithms on a large number of datasets, using measures of performance that take into account the distribution of the classes, which is a characteristic of a dataset. The Area Under the receiver operating Curve (AUC) is another measure used to assess machine learning algorithms and to divide them into groups of classifiers that have statistically significant differences in performance (Bradley 1997). In all the above studies, the analysis of performance has been applied to datasets without extra noise, while we study the behavior of classification algorithms in noisy settings.

In our study, we use the fact shown by Giannakopoulos and Palpanas (2010, 2013) about concept drift, which illustrates that a sigmoid function can accurately describe performance in the presence of varying levels of label noise in the training set. In an early influential work, Kuh et al. (1990) indicated and studied the problem of concept attainment in the presence of noise in the STAGGER system. In this paper, we analytically study the sigmoid function to determine the set of parameters that can be used to support the selection of the learner in different noisy classification settings.

The behavior of machine learning classifiers in the presence of noise is also considered in the work of Kalapanidas et al. (2003). The artificial datasets used for classification were created on the basis of predefined linear and nonlinear regression models, and noise was injected in the features, instead of the class labels as in our case. Noisy models of non-Markovian processes using reinforcement learning algorithms and the methods of temporal difference are analyzed by Pendrith and Sammut (1994). Chevaleyre and Zucker (2000) examine multiple-instance induction of rules for different noise models. There are also theoretical studies on regression algorithms for noisy data (Teytaud 2001) and works on denoising, like Li et al. (2002), where a wavelet-based noise removal technique was shown to increase the efficiency of four learners. Both noise-related studies (Teytaud 2001; Li et al. 2002) dealt with noise in the attributes. However, we study label-related noise and do not consider a specific noise model, which is a different problem. Label-related noise corresponds mostly to the concept drift scenario, as was also discussed in Sect. 1. Moreover, we estimate performance as a function of the signal-to-noise ratio, contrary to the works described above.

In the work of Nettleton et al. (2010), the authors study the effect of different types of noise on the precision of supervised learning techniques. They address this issue by systematically comparing how different degrees of noise affect supervised learners that belong to different paradigms. They consider four classifiers, namely the Naïve Bayes classifier, the decision tree classifier, the IBk instance-based learner and the SMO support vector machine. They test them on a collection of data sets that are perturbed with noise in the input attributes and noise in the output class, and observe that Naïve Bayes is more robust to noisy data than the other three techniques. The performance was evaluated on 13 datasets. However, no general framework was proposed for modeling the effect of noise on the performance of the classifiers.
In contrast, we propose the sigmoid rule framework for comparing the behavior of classifiers. We also

introduce different dimensions that can be used to formulate the criteria for comparing the algorithms depending on user needs. Moreover, we also apply this framework to sequential classifiers, which were not considered in the above studies.

Previous studies on meta-learning and concept drift (Klinkenberg 2005; Widmer 1997) describe approaches that adjust to changing user interests, taking into account the context. Meta-learning can occur at various levels, e.g., training instance selection, learning algorithm and parameter selection, window size selection, based on online measurements. These measurements may include error rates, distribution of class instances in a window of recent instances, and other related data or classification traits. However, these works do not explicitly address label noise, and cannot provide an a priori estimation of classifier performance in such situations, as in our work. Recent works on ensemble-based meta-learning (e.g., Cruz et al. 2015) deal with local characteristics of the data and optimize ensembles of classifiers (ECs), but do less to provide dataset-related and algorithm-independent results, as we do in this work (Sect. 7).

In Heywood (2015), the challenges of evolutionary model building are discussed, together with related approaches in the literature aimed at (evolving) streaming data in classification tasks. In this overview, the author discusses (through a multitude of works) a variety of required characteristics of learning methods, and how evolutionary computation can contribute towards these characteristics. We believe that our work is complementary, in that it can provide a basis for future meta-learning and evolutionary algorithms, where fine traits of learning (e.g., stability of performance vs. maximum performance) can prioritize selection in a learning setting. The SRF view of meta-learning provides another important tool to the currently available approaches. This view may be even more effective taking into account references to recurring contexts (Pechenizkiy 2015), where the variance of noise levels may be predictable after a period of observation and, thus, knowing an algorithm's behavior within these levels constitutes important knowledge. Furthermore, in this work we propose a novel direction of meta-learning, moving the focus from accuracy to transparency and trust (Pechenizkiy 2015). We achieve this by explaining the expected classification behavior under novel, interesting axes of analysis, which take into account not only the absolute values of the performance of a classifier, but also characterize the behavior of the performance change from various perspectives.

Other meta-learning literature has focused on the following topics: the impact of the dataset and algorithm features on the learning performance (Brazdil et al. 2003, 2009; won Lee and Giraud-Carrier 2008; Rossi et al. 2014; Garcia et al. 2016; Mantovani et al. 2015); the identification and handling of different types of noise (Brazdil et al. 2003; Rossi et al. 2014) and concept drift (Rossi et al. 2014; Garcia et al. 2016) in the learning setting; and meta-learning frameworks for fine-tuning of learning systems (Brazdil et al. 2003, 2009; Smith et al. 2014; Abdulrahman et al. 2015). Concerning the study of dataset and algorithm features, Brazdil et al. (2003, 2009) describe a number of dataset features and discuss their usefulness for the performance of individual classifiers. However, these works do not include a systematic analysis of all dataset-classifier pairs.
The work of won Lee and Giraud-Carrier (2008) introduces a dataset characteristic called hardness and demonstrates that it contributes significantly to the performance of any classifier, which supports our conclusion that the dataset

characteristics can be used to assess classifier performance a priori. Mantovani et al. (2015) address the problem of choosing the parameters of a learning algorithm that would maximize its performance on a new dataset based on its characteristics. The proposed solution uses a meta-classifier with 81 features, which include the number of attributes and classes in the dataset, statistical moments, information theory measures, and others. That work does not select the most effective dataset features. Rossi et al. (2014) and Garcia et al. (2016) apply meta-learning to regression problems and to noise filters, respectively. Similarly to Mantovani et al. (2015), these works also use various dataset statistics as features, without a systematic analysis of feature importance.

Concerning noise, in our work we study the correlation between labeling noise (as opposed to feature noise) and the expected performance. Rossi et al. (2014) and Garcia et al. (2016) also consider noisy settings. Similarly to our work, Rossi et al. (2014) focus on concept drift scenarios. However, that work does not provide generally applicable criteria for selecting the learning algorithm and does not deal with explicit label noise. Garcia et al. (2016) discuss the performance of noise filters, but do not provide a generic model of algorithm behavior in the presence of noise. In comparison to these works, we additionally address the performance of sequential models for classification in the presence of noise. We provide the added value of an estimated analytic connection between noise levels and performance (described by the sigmoid curve). The sigmoid rule allows an adaptive meta-learning system to have prior knowledge regarding algorithms and their expected performance at different levels of noise and under different requirements (stability, average performance, etc.). Meta-learning frameworks that can estimate noise levels in a dataset will also benefit from our analysis.

General meta-learning frameworks are described in detail in Brazdil et al. (2003, 2009). Efficiency modifications for fast discovery of optimal algorithms are proposed by Abdulrahman et al. (2015). In our work, we focus on a systematic analysis of the performance of classifiers depending on the dataset and classifier features, and the level of noise. A future study can adopt the architecture of a meta-learning framework for classifier selection in noisy settings. Smith et al. (2014) propose a collaborative filtering approach for meta-learning, arguing that the relevant algorithm and dataset features can be chosen while applying collaborative filtering. It is not clear how this approach would perform in the case of a concept drift scenario and in the case of multi-criteria performance objectives. In this work we analyze the performance of a classifier in an environment with varying label noise and study the effect of dataset characteristics on performance behavior. We leave the comparison of meta-learning platforms for future work.

Sequential data, and especially time series data collected from sensors, are often affected by noise. That is, the quality of the data may be decreased by errors due to limitations in the tolerance of the measurement equipment. Sequential classification based on Markov chains with noisy training data is considered in the work of Dupont (2006). The paper shows that smoothed Markov chains are very robust to classification noise. The same paper discusses the relation between the classification accuracy and the test set perplexity that is often used to measure prediction quality.
However, unlike our work, no specific formalization is proposed for analyzing the effect of different noise levels on the performance of the classifier. The development of such a formalization is becoming increasingly important as sequential and time series data classification are

applied in diverse domains, such as financial analysis, bioinformatics and information retrieval. For example, a recent survey (Xing et al. 2010) considers various methods of sequential classification in terms of methodologies and application domains. The methods for sequential classification we focus on are introduced in the works of Rabiner and Juang (1986), Sutton and McCallum (2010), and Elman (1990), which describe hidden Markov models, conditional random fields, and recurrent neural networks, respectively. The techniques described by Mirylenka et al. (2013, 2015), such as conditional heavy hitters, allow estimating sequential models in real time. The efficiency of such techniques enables the applicability of sequential classification to sensor data analysis and moving trajectories.

3 Problem statement and preliminaries

We consider settings where a classification task is performed under varying levels of label noise, similarly to a concept drift scenario, where the labels are gradually changing and the noise level is not known in advance. Such settings are common and have been detailed in previous works (e.g., see the definition of the problem of the demanding lord (Giannakopoulos and Palpanas 2013), where a user changes her preferences gradually over time and an automatic system is called to adapt to this change). Our goal is to recommend a classifier that is beneficial for a given dataset, based on custom algorithm optimality criteria (e.g., stability under noise vs. maximum possible performance). The only knowledge required for the task concerns characteristics of the dataset, such as the number of classes, the number of attributes, and the intrinsic dimensionality.

More formally, we define a training set as T. Part of the instances in this dataset have true labels. We define this set of instances as S and term them signal instances. The rest of the instances have noisy (i.e., erroneous) labels. We define this set as N and term them noisy instances. Hence, S ⊆ T, N ⊆ T and S ∪ N = T, while S ∩ N = ∅. Then S = |S| and N = |N| represent the signal magnitude and the noise magnitude of the training set T, respectively. We also allow S = 0 and N = |T|, when the whole training set is no longer valid because a shift has just occurred. Correspondingly, N = 0 and S = |T|, when no noise is induced in the training set.

Given the above definitions, we define the signal-to-noise ratio Z at a given moment in time:

Z = log(1 + S) − log(1 + N) = log((1 + S)/(1 + N)),   (1)

where log(·) represents the natural logarithm. We add one to the signal and noise magnitudes so that the logarithm is defined for their zero values.
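As an illustration, Eq. 1 can be evaluated directly from the counts of correctly and incorrectly labeled training instances; the following is a minimal sketch (function and variable names are ours, not from the paper).

```python
import math

def signal_to_noise(num_signal: int, num_noise: int) -> float:
    """Signal-to-noise ratio Z of Eq. 1: log(1 + S) - log(1 + N).

    The +1 terms keep the logarithm defined when either count is zero.
    """
    return math.log(1 + num_signal) - math.log(1 + num_noise)

# Example: a training set of 1000 instances with 5% mislabeled instances.
print(signal_to_noise(950, 50))   # positive Z: signal dominates
print(signal_to_noise(0, 1000))   # all labels invalid, e.g., right after a concept shift
```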

Fig. 2 Sigmoid function and points of interest: the inflection point Z_inf (essentially the point where the curve changes significantly), the zeros of the third derivative Z^(3)_1 and Z^(3)_2, the active area [Z*, Z**] of width d_alg, and the performance span r_alg

3.1 Characteristic transfer function

In order to describe the performance of a classifier we use the sigmoid rule, which was introduced and discussed in previous studies (Giannakopoulos and Palpanas 2010, 2013). The sigmoid rule uses a function that relates the signal-to-noise ratio of the training set to the expected performance of a learner. This function is called the characteristic transfer function (CTF) of the learning algorithm. In this work, we also refer to it as the sigmoid function of the algorithm, and use the terms CTF and sigmoid function interchangeably.

Given that an algorithm has a minimum performance m and a maximum performance M for a given domain of signal-to-noise values, the function that describes the average performance as a function of the signal-to-noise ratio Z (Eq. 1) is of the form (Giannakopoulos and Palpanas 2010, 2013):

f(Z) = m + (M − m) · 1/(1 + b·exp(−c·(Z − d))),   (2)

where m ≤ M, m, M ∈ [0, 1], b, c ∈ R+, d ∈ R are the parameters of the sigmoid function. The number of parameters can be reduced, though their presence allows for a more accurate estimation of the performance of a classifier, as has been empirically shown by Giannakopoulos and Palpanas (2013). An example of a sigmoid function can be seen in Fig. 2. The definitions of the main concepts used in the paper are given in Table 1.
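The CTF of Eq. 2 is straightforward to evaluate; a minimal sketch (names and parameter values are ours):

```python
import math

def ctf(z: float, m: float, M: float, b: float, c: float, d: float) -> float:
    """Characteristic transfer function of Eq. 2.

    m, M: minimum and maximum expected performance (0 <= m <= M <= 1)
    b, c: positive shape parameters (c is the slope indicator)
    d:    horizontal shift of the curve
    """
    return m + (M - m) / (1.0 + b * math.exp(-c * (z - d)))

# Expected accuracy of a hypothetical learner at a few signal-to-noise levels.
for z in (-2.0, 0.0, 2.0, 4.0):
    print(z, round(ctf(z, m=0.1, M=0.9, b=2.5, c=1.0, d=1.0), 3))
```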

Table 1 The definitions of the main concepts

- Signal-to-noise ratio: the ratio of correctly labeled training instances to the incorrectly labeled training instances.
- Performance: a measure of the accuracy of an algorithm in a noisy training setting. In a classification problem, we use accuracy.
- Characteristic transfer function (CTF): a function connecting the signal-to-noise in the training dataset to the expected performance of a learning algorithm.
- Sigmoid rule framework (SRF): an approach that models the expected performance of a learning algorithm, using a sigmoid function as the CTF for any algorithm.

The basis of the CTF lies on probably approximately correct learning (Haussler 1990; Valiant 1984) and, more specifically, on the expectation that in PAC-learnable settings enough (randomly drawn) samples can help a system select a hypothesis identifying concepts. Thus, in PAC learning, more correctly labeled instances decrease the probability of error, reaching a limit ε. In the CTF modeling of learning we have tried to model what happens in the presence of mislabeled instances, while keeping the limit cases aligned to PAC learning.

Let us consider the case where a set of concepts C is (PAC) learnable. This means that, provided with enough learning instances, the system will end up learning (a hypothesis H identifying) the concepts. In this case, the selected, correctly labeled instances that were input to the learning system as training are all signal, with no noise. The error of this system tends to be below ε.

Now, let us consider the case where there exists a false labeling probability 1 ≥ P_f ≥ 0 when the input instances are provided to the learner. These instances constitute a population of training instances that do not reflect the original, true instance distributions for C. In this case, the system will (try to) learn a different set of concepts C′, which is a function of the distributions of C and of the false concept set distributions F ≠ C. It is easy to deduce that, if the system learns C′, i.e., it forms a hypothesis H′ ≠ H that can map instances to C′ with an error probability below ε after enough instances, then a portion of these mappings/predictions will stand in C′ but not in C. The more mislabeled instances are provided as input to the learner, the higher the probability that the predictions will be wrong with respect to C. In the extreme case that all instances of C are provided as inputs and the learner learns all the space, the probability of error can be exactly P_f, since the learner will have mistakenly and perfectly learned all the falsely labeled instances. If all the input instances are provided mislabeled, then the learner will have learned the full set of false concepts C′ = F and C will have no effect on the learning. In this case the performance of the system with respect to C will be minimum. Different values of P_f are expected to vary the error rates between the minimum and the maximum value. If we are interested in the average error rate (and, dually, the performance) of the learner over several experiments in the presence of noise, we can only assume that, after enough tries, increasing P_f will lead to a higher error ε′ = ε + P_f > ε, for P_f > 0.

At this point we provide an alternative view of P_f: it is directly connected to the amount of noise N = P_f·|T| and signal S = (1 − P_f)·|T| in the signal-to-noise ratio Z of Eq. 1, where T is the training set introduced above.

To summarize, the intuition behind the sigmoid in Eq. 2 is based on the fact that a (non-trivial) learning algorithm starts to perform well after a certain ratio of good to bad examples has been observed. From that moment on, the performance of the algorithm constantly improves (on average) as the ratio is improved (monotonicity assumption), until the point where the best performance is reached. Then, no matter how much the ratio of good to bad examples increases, there is little change, because the algorithm cannot do much better, since it has learned the input concepts (and also possibly due to its generalization property, where such a property is applicable).

We consider the CTF to be characteristic of an algorithm for a given dataset (i.e., set of concepts, instance and hypothesis space). It has been demonstrated that the sigmoid can be estimated from training sets of varying Z and, then, it can be used as a known function for a given algorithm (Giannakopoulos and Palpanas 2013). In this work, we demonstrate that the characteristics of a dataset (e.g., the Vapnik-Chervonenkis or VC dimension; Vapnik 1998) have a strong influence over the sigmoid parameters (refer to Sect. 6). This property of the sigmoid allows reusing sigmoids across datasets with similar characteristics and suggesting the most promising learner for a new dataset, based on custom optimality criteria.

The behavior of different machine learning algorithms in the presence of noise can be compared along several axes of comparison, based on the parameters of the sigmoid function. The parameters related to performance include: (a) the minimal performance m; (b) the maximal performance M; (c) the span of the performance range r_alg = M − m.

With regard to the sensitivity of performance to the change in the signal-to-noise ratio, we consider: (a) within which range of noise levels a change in the noise leads to a significant change in performance; (b) how we can tell apart algorithms that improve their performance even when the signal-to-noise levels are low from those that only improve in high ranges of the signal-to-noise ratio; (c) how we can measure the stability of performance of an algorithm against varying noise; (d) at what noise level an algorithm reaches its average performance; (e) whether reducing the noise in a dataset is expected to have a significant impact on the performance.

To answer these questions we perform an analytic study of the sigmoid function of a particular algorithm by the means of traditional function analysis. This analysis helps to devise measurable dimensions that can answer our questions about the parameters related to the performance of classifiers. In the next section we introduce the SRF, which proposes several axes for algorithm comparison, based on the CTF. The set of parameters of the CTF and the proposed SRF is summarized in Table 2 for quick reference.

4 The sigmoid rule framework

We investigate the properties of the sigmoid function (Eq. 2) in order to determine how each of the parameters m, M, b, c, and d affects the shape of the sigmoid, and how this translates to the expected performance of the learning algorithm. We start with a direct analysis of the sigmoid function and its parameters.

Table 2 Notation of sigmoid-related variables and SRF dimensions

Sigmoid-related variables:
- S: signal magnitude in the training set
- N: noise magnitude in the training set
- Z: signal-to-noise ratio
- m, M, b, c, d: parameters of the sigmoid
- f(Z): sigmoid function
- Z_inf: point of inflection of the sigmoid (zero of the 2nd derivative)
- Z^(3)_1, Z^(3)_2: zeros of the 3rd derivative of the sigmoid
- d_s: distance between Z_inf and Z^(3)_1; characterizes the slope of the sigmoid
- [Z*, Z**]: active noise range of an algorithm, where performance changes significantly
- f^(-1)(·): inverse sigmoid function

SRF dimensions:
- m: minimal performance of a classification algorithm
- M: maximal performance of a classification algorithm
- r_alg: span of the performance range
- c: slope indicator of the sigmoid function of an algorithm
- d_alg: width of the active area of algorithm performance
- r_alg/d_alg: performance improvement of an algorithm over signal-to-noise ratio change

In Fig. 2 we see an illustration of a sigmoid, where we have indicated the points related to the curve parameters. The reader is referred to Table 2 for a summary of the symbols. The domain of the sigmoid is, in the general case, Z ∈ (−∞, +∞). The range of values is (m, M), as can also be seen in Fig. 2. We then consider the derivatives of the sigmoid function in order to study its characteristics. Since the first-order derivative is always positive, f′(Z) > 0 for Z ∈ (−∞, +∞), the function f(Z) is monotonically increasing, which corresponds to the monotonicity assumption made in Giannakopoulos and Palpanas (2013). For the second-order derivative, f″(Z), it holds that f″(Z) = 0 if Z = d + (1/c)·log b, and the point Z_inf = d + (1/c)·log b is the point of inflection, which shows when the curvature of the function changes its sign. In the case of the sigmoid, the point of inflection Z_inf (the middle point in Fig. 2) is the point of symmetry, corresponding to the average performance of the algorithm. Furthermore, Z_inf indicates the shift of the sigmoid with respect to the origin, which is equal to d + (1/c)·log b.

Fig. 3 Varying the c parameter: c = 1 (left), c = 3 (right). The other parameters are the same: m = 0, M = 1, b = 2.5, d = 3

So, the shift of the sigmoid and the average performance for the signal-to-noise range depend on three parameters: b, c, and d.

Learning algorithms can be compared based on the slope of their sigmoids. The slope of the sigmoid reflects the improvement of the expected performance of an algorithm per change in the signal-to-noise ratio. To estimate the slope, we use the distance d_s between the point of inflection Z_inf (zero of the second derivative) and the zeros of the third derivative (both zeros of the third derivative are at the same distance from the point of inflection). Figure 2 shows the sigmoid curve along with its points of interest. The zeros of the third-order derivative are Z^(3)_{1,2} = d − (1/c)·log((2 ± √3)/b). Thus, we have for d_s: d_s = Z^(3)_1 − Z_inf = Z_inf − Z^(3)_2 = (1/c)·log(1/(2 − √3)). The larger this distance is, the more expanded the sigmoid curve is. Since log(1/(2 − √3)) ≈ 1.32, d_s is in inverse proportion to the parameter c: d_s = a/c, where a ≈ 1.32. We find that the parameter c directly influences the slope of the sigmoid curve. We term c the slope indicator of the CTF. Figure 3 depicts sigmoids with different c values, illustrating gradual and sharp slopes of the sigmoid function.

In the following section, we formulate and discuss dimensions that characterize the behavior of algorithms, based on the axes of comparison discussed above.

4.1 SRF dimensions

In this section we determine several SRF dimensions based on the sigmoid properties, in addition to m, M, r_alg, and c, discussed in Sect. 4. We define the active noise range as the range [Z*, Z**] in which a change in the noise induces a measurable change in the performance. In order to calculate [Z*, Z**] we assume that there is a good-enough performance for a given task, approaching M for a given algorithm. We know that f(Z) ∈ (m, M), and we say that the performance is good enough if f(Z) = M − (M − m)·p, with p = 0.05. [4] We define the learning improvement of the algorithm as the size of the signal-to-noise interval in which f(Z) ∈ [m + (M − m)·p, M − (M − m)·p].

[4] Instead of 0.05, one can use any value close to 0, describing a normalized measure of distance from the optimal performance. In the case of p = 0.05, the distance from the optimal performance is 5%.

Then, using the inverse f^(-1)(y) = d − (1/c)·log((1/b)·((M − m)/(y − m) − 1)), where y = f(Z), we calculate the points Z* and Z**, which respectively are the bottom and the top points in Fig. 2 for a given p. We term the distance d_alg = Z** − Z* the width of the active area of the machine learning classifier (see Fig. 2). Then we define the ratio between the span of the performance range and the width of the active area, r_alg/d_alg, which describes (and is termed) the learning performance improvement over signal-to-noise ratio change. In the following paragraphs we describe how this analysis allows us to compare the performance of learning algorithms in the presence of noise.

5 Using SRF to compare algorithms

Given the performance dimensions described above, we can compare algorithms as follows. For performance-related comparison we can use the minimal performance m, the maximal performance M, and the span of the performance range r_alg. Algorithms not affected by the presence or absence of noise will have a minimal r_alg value. In a setting with a randomly changing level of noise, this parameter is related to the possible variance in performance. Related to the sensitivity of performance to the change of the signal-to-noise ratio, we can use the following dimensions (a computational sketch of these quantities is given after the list).

1. The active noise range [Z*, Z**], which shows how early the algorithm starts operating (measured by Z*) and how early it reaches good-enough performance (measured by Z**). We say that the algorithm operates when the level of noise in the data is within the active noise range of the algorithm.
2. The width of the active area d_alg = Z** − Z* of the algorithm, which is related to the speed of changing performance for a given r_alg in the domain of noise. A high d_alg value indicates that the algorithm varies its performance in a broad range of signal-to-noise ratios, implying less performance stability in an environment with heavily varying degrees of noise.
3. The parameter c of the sigmoid function (slope indicator), which is related to the distance d_s = 1.32/c. This distance has similar properties to the inverse of r_alg/d_alg, but it is independent of the predefined parameter p, which sets the fraction of performance considered good enough for a given task, and it is not directly connected to the active noise range. d_s reflects the change in the slope of the function. If c is large, then d_s is small and indicates higher stability of the algorithm in the presence of noise, and vice versa.
4. The point of inflection Z_inf, which shows the signal-to-noise ratio for which an algorithm gives its average performance. The parameter can be used to choose the algorithm that reaches its average performance earlier in a noisy environment, or the algorithm whose speed of improvement changes earlier.
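To make these quantities concrete, the sketch below derives Z_inf, d_s, the active noise range, d_alg and r_alg/d_alg from a fitted set of sigmoid parameters; the formulas follow Sect. 4 and Eq. 2, while the function names and the use of Python are ours.

```python
import math

def srf_dimensions(m, M, b, c, d, p=0.05):
    """Derive the SRF dimensions of Sects. 4 and 5 from fitted sigmoid parameters."""
    f_inv = lambda y: d - (1.0 / c) * math.log((1.0 / b) * ((M - m) / (y - m) - 1.0))
    z_low = f_inv(m + (M - m) * p)         # Z*: the algorithm starts operating
    z_high = f_inv(M - (M - m) * p)        # Z**: good-enough performance is reached
    r_alg = M - m                          # span of the performance range
    d_alg = z_high - z_low                 # width of the active area
    return {
        "Z_inf": d + (1.0 / c) * math.log(b),                 # inflection point (average performance)
        "d_s": math.log(1.0 / (2.0 - math.sqrt(3.0))) / c,    # about 1.32 / c
        "Z_star": z_low,
        "Z_star_star": z_high,
        "r_alg": r_alg,
        "d_alg": d_alg,
        "r_over_d": r_alg / d_alg,          # performance improvement per unit of Z
    }

print(srf_dimensions(m=0.1, M=0.9, b=2.5, c=1.0, d=1.0))
```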

A parameter related to both performance and sensitivity is the learning performance improvement over signal-to-noise ratio change, r_alg/d_alg. It can be used to determine whether reducing the noise in a dataset is expected to have a significant impact on the performance. A high value of r_alg/d_alg for an algorithm would imply that it makes sense to put more effort into reducing noise. Furthermore, using this dimension a decision maker can choose a more stable algorithm, when the variance of noise is known. In this case, the algorithm with the lowest value of r_alg/d_alg should be chosen in order to limit the corresponding variance in performance.

Based on the above discussion, we favor the classifiers with the following properties:

- higher maximal performance M,
- larger span of the performance range r_alg,
- higher learning performance improvement over signal-to-noise ratio change r_alg/d_alg,
- shorter width of the active area of the algorithm d_alg, and
- larger slope indicator c.

We expect to get high performance from an algorithm if the level of noise in the dataset is very low, and low performance if the level of noise in the dataset is very high. Decision makers can easily formulate different criteria, based on the proposed dimensions (summarized in Table 2). Based on the above, the workflow to select an algorithm given some user preference is as follows: we select an algorithm; we sample the input data for training; we estimate the SRF parameters; we repeat the process for different algorithms; finally, we analyze the SRF parameters in order to select the algorithm with the most appropriate behavior, i.e., the one that best matches the user preferences.

6 Experimental evaluation

In the following paragraphs we describe the experimental setup, the datasets used, and the results of our experiments. We first consider non-sequential and then sequential classifiers.

6.1 Experimental setup for non-sequential classifiers

In our study we applied the following machine learning algorithms, implemented in Weka (Hall et al. 2009): (a) IBk, a k-nearest neighbor classifier; (b) the Naïve Bayes classifier; (c) SMO, a support vector classifier (cf. Keerthi et al. 2001); (d) NBTree, a decision tree with Naïve Bayes classifiers at the leaves; (e) JRip, a RIPPER (Cohen 1995) rule learner implementation. We have chosen representative algorithms from different families of classification approaches, covering popular classification schemes. The experiments were performed on a 2.4 GHz machine with 4 GB RAM.

Fig. 4 Dataset grouping labels (classes: low < 7 ≤ high; attributes: low < 10 ≤ high; instances: low, medium and high, with 5000 separating medium from high)

Fig. 5 Distribution of the dataset characteristics (number of features, number of classes, fractal dimension, log of instance count). Real data: triangles; artificial: circles

We used a total of 24 datasets for our experiments. Most of the real datasets come from the UCI machine learning repository (Lichman 2013), and one from the study of Giannakopoulos and Palpanas (2010). Fourteen of the datasets are real, while ten are synthetic. All the datasets are divided into groups according to the number of classes, attributes (features) and instances in the dataset, as shown in Fig. 4. There are 12 possible groups that include all combinations of the parameters. Two datasets from each group were employed for the experiments. We created artificial datasets in the cases where real datasets with a certain set of characteristics were not available. The distribution of the dataset characteristics is illustrated in Fig. 5. The traits of the datasets illustrated are the number of classes, the number of attributes (features), the number of instances, and the estimated intrinsic (fractal) dimension. Table 3 contains the description of the datasets and their sources.

We build the ten artificial datasets using the following procedure. Having randomly sampled the number of classes, features and instances, we sample the parameters of each feature distribution. We assume that the features follow the Gaussian distribution with mean value (μ) in the interval [−100, 100] and standard deviation (σ) in the interval [0.1, 30]. The μ and σ intervals allow overlapping features across classes. The description of the parameters of the generated datasets is shown in Table 3.
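A minimal sketch of this generation procedure, assuming one Gaussian per class and feature (the function name and the use of NumPy are ours):

```python
import numpy as np

def generate_artificial_dataset(n_classes, n_features, n_instances, seed=0):
    """Sample a synthetic dataset with Gaussian features, as described above.

    Per class and feature, the mean is drawn from [-100, 100] and the standard
    deviation from [0.1, 30], so feature distributions may overlap across classes.
    """
    rng = np.random.default_rng(seed)
    means = rng.uniform(-100, 100, size=(n_classes, n_features))
    stds = rng.uniform(0.1, 30, size=(n_classes, n_features))
    y = rng.integers(0, n_classes, size=n_instances)   # class label per instance
    X = rng.normal(means[y], stds[y])                  # one feature row per instance
    return X, y

X, y = generate_artificial_dataset(n_classes=3, n_features=5, n_instances=1000)
print(X.shape, np.bincount(y))
```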

Table 3 The description of the datasets used. For each dataset, the columns report the number of classes (x_1), the number of attributes (x_2), the number of instances (x_3), and the fractal dimensionality (x_4); the numeric values are not reproduced in this transcription. The datasets are: 1. Wine; 2. Australian credit approval; 3. Real dataset from Giannakopoulos and Palpanas (2010); 4. Yeast; 5-9. Artificial; 10. Breast cancer Wisconsin; 11. Breast tissue; 12. Letter recognition; 13-15. Artificial; 16. Vehicle Silhouettes; 17. MAGIC gamma telescope; 18. Waveform; 19. Shuttle; 20. Artificial; 21. Libras movement; 22. Image; 23. Wine quality; 24. Artificial. The footnotes of the original table point to the web sources of the individual real datasets.
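The fractal dimensionality x_4 reported in Table 3 is estimated with a box-counting process (de Sousa et al. 2006). The following is only a rough illustration of such an estimate, computing the correlation (D2) dimension from the slope of a log-log fit; it is not the code used in the paper, and all names are ours.

```python
import numpy as np

def correlation_fractal_dimension(X, n_scales=5):
    """Rough box-counting estimate of the correlation (D2) fractal dimension.

    For several grid sizes r, sum the squared occupancy fractions of the cells and
    fit the slope of log(sum p_i^2) versus log(r); that slope approximates D2.
    """
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)   # normalize to the unit hypercube
    log_r, log_s2 = [], []
    for k in range(1, n_scales + 1):
        r = 2.0 ** (-k)                                      # cell side length
        cells = np.floor(X / r).astype(int)                  # grid cell index per point
        _, counts = np.unique(cells, axis=0, return_counts=True)
        p = counts / len(X)
        log_r.append(np.log(r))
        log_s2.append(np.log(np.sum(p ** 2)))
    slope, _ = np.polyfit(log_r, log_s2, 1)
    return slope

rng = np.random.default_rng(0)
print(correlation_fractal_dimension(rng.random((2000, 2))))  # roughly 2 for uniform 2-D data
```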

The noise was induced as follows. We created stratified training sets, equal in size to the stratified test sets. To induce noise, we created noisy versions of the training sets by mislabeling the instances. Using different levels l_n of noise, l_n = 0, 0.05, ..., 0.95 [5], a training set with l_n noise is a set with a fraction l_n of mislabeled instances. Hence we obtained 20 versions of the datasets with varying noise levels.

[5] We note that high levels of noise, such as 95%, are often observed in the presence of concept drift, e.g., when learning computer-user browsing habits in a network environment with a single IP and several different users sharing it.

Equal percentages of instances from the different classes (for stratification) were put into the test and training sets (as in the original dataset with 0% noise), using the following procedure:

1. Calculate the number N of observations in the original dataset; the number of observations in the test or training set should be the integer part of N/2.
2. Calculate the percentages of each class in the original dataset.
3. Calculate the number of observations from each class that should be in the training and test sets based on the percentages from the previous step (rounding the numbers if necessary).
4. Divide the observations according to the classes to obtain class datasets, i.e., sets of instances with only one class per set.
5. From each class dataset, sample observations (based on the results of step 3) for the training set, and place the remaining observations of each class dataset into the test set.
6. Make noisy training datasets from the original training dataset with the different levels l_n of noise.
7. Get the performance of the model trained on each noisy dataset (l_n = 0, 0.05, ..., 0.95) and tested on the noise-free stratified test set.

Given the performance of a classifier for a particular dataset, we estimate the parameters of the sigmoid function for each algorithm-dataset pair using a genetic algorithm (described in detail in Appendix 1). The idea is that the algorithm tries to find parameters that minimize the distance between the distribution of values of the real data and that of the estimated sigmoid. To this end, we use as a fitness measure a function of the D statistic of the Kolmogorov-Smirnov test (Massey 1951), which measures the maximum distance between the cumulative distribution functions of the two sample sets. Having the sets of estimated sigmoids describing the performance for particular algorithms and datasets with certain characteristics, we can reason about the properties of the algorithms along the SRF dimensions.
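Putting the above together, the raw material for each sigmoid fit is a curve of accuracy versus noise level. The following compact sketch of this sweep uses a generic scikit-learn classifier as a stand-in for the Weka learners used in the paper, and all names are ours:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def mislabel(y, noise_level, n_classes, rng):
    """Copy of y in which a fraction noise_level of labels is flipped to another class."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(noise_level * len(y)), replace=False)
    y_noisy[idx] = (y_noisy[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y_noisy

def noise_sweep(X_train, y_train, X_test, y_test, n_classes, seed=0):
    """Accuracy of a classifier trained on increasingly mislabeled training sets."""
    rng = np.random.default_rng(seed)
    curve = []
    for l_n in np.arange(0.0, 1.0, 0.05):               # 20 noise levels: 0, 0.05, ..., 0.95
        clf = GaussianNB().fit(X_train, mislabel(y_train, l_n, n_classes, rng))
        z = np.log(1 + (1 - l_n) * len(y_train)) - np.log(1 + l_n * len(y_train))
        curve.append((z, accuracy_score(y_test, clf.predict(X_test))))
    return curve                                         # (Z, accuracy) pairs used to fit the sigmoid
```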

6.2 Experimental setup for sequential classifiers

We now consider sequential datasets, where the order of instances is important. Using these datasets, we study whether sequential classifiers adhere to the sigmoid rule framework. The main difference to the non-sequential case is that for the classification task we also use the information about the previous instances, in addition to the features of the current instance. For sequential data, we apply three classification algorithms based on hidden Markov models (HMM) (Rabiner and Juang 1986), conditional random fields (CRF) (Lafferty et al. 2001), and recurrent neural networks (RNN) (Elman 1990).

For the experiments, we consider the whole sequence of observations as an instance. For each algorithm, we define the order of dependency, and consider sequences of the length corresponding to that order. For the implementation of these sequential classifiers, we use the JAHMM (François 2013), Mallet (McCallum 2002), and Weka (Hall et al. 2009) libraries for HMM, CRF, and RNN, respectively. The experiments and the libraries are implemented in Java, and the classification algorithms are called as library functions. The experiments were performed on a 2.67 GHz machine with 4 GB RAM.

Hidden Markov models. For HMM-based classification, we perform the experiments in two settings, using HMMs of the third and the fourth order. In Fig. 8a and b, we depict the sigmoids of the HMM algorithm for the Robot walk 4 dataset. The green solid line corresponds to the true measurements of the performance of the HMM-based classifier for different values of the signal-to-noise ratio Z, while the dashed red line corresponds to the estimated sigmoid, the parameters being assessed with a genetic algorithm (see Appendix 1). Pearson's correlation test between the obtained performance and the estimated performance based on the sigmoid function shows statistically significant correlation values, with coef > 0.99 for significance level α = 0.05. This holds for both the first and the second setting (the third- and the fourth-order models, correspondingly). Similar results were obtained for the rest of the datasets and sequential learners. These results confirm that the sequential learners adhere to the sigmoid rule framework.

Fig. 6 Linear chain CRF architecture (factors connecting features, true labels, and noisy labels)

Conditional random fields. To better cover sequential classifiers, we employ and study linear and quadratic CRFs. These go beyond class labels, which HMMs use, and also take into account data attributes to determine a class. We experiment with CRF in two different settings, according to the order of the connection between the instances. To label an unseen instance, the most likely Viterbi labeling y* = argmax_y p(y^(i) | x^(i)) is computed. In the first setting we consider a linear chain CRF, in which an instance depends only on the previous one, as depicted in Fig. 6. Since we consider only label noise, we induce noise only in the labels of the instances. In the second setting, we consider the model of the second order, where an instance depends on the two previous instances. As an example, the sigmoids of the CRF algorithm for the Robot walk 4 dataset are shown in Fig. 8c and d. As in the case of HMMs, the sigmoids fit the performance curves of the CRF classifier well for all the datasets.

Recurrent neural networks. Whereas HMM takes into account only the data labels of the sequence, and CRF uses both class labels and the attributes of the previous instances in the sequence in order to

classify the current instance, RNN uses the attributes of the current instance, and only the labels of the previous instances.

Fig. 7 The architecture of the RNN

For RNN-based classification, we provide the previous label as input to calculate the output of the current instance (see Fig. 7), as in the work of Bodén (2002), and we use two settings with second- and third-order dependencies: in the first setting, the two previous labels are used, while in the second setting the three previous labels are used. The sigmoids of the RNN algorithms for the Robot walk 4 dataset are shown in Fig. 8e and f. According to Pearson's correlation test for setting 1 (coef = ..., p-value = ...) and setting 2 (coef = ..., p-value = ...) between the obtained performance and the estimated sigmoid function, we can say that the classifier based on RNN also follows the sigmoid rule. As in the previous cases of HMMs and CRFs, the sigmoids fit the performance curves well for all the datasets for the RNN classifier. Thus, the sigmoid rule fits RNN-based classifiers as well.

Datasets. We used 18 real datasets for our sequential experiments. All the datasets contain time series where the instances appear in chronological order. The label of the last observation is assigned to the sequence to imply causality. The distribution of the characteristics of these datasets is illustrated in Fig. 9. These characteristics are: the number of classes, the number of attributes, the number of instances, and the largest lag value that corresponds to significant autocorrelation of the labels. Autocorrelation is a linear dependence of a variable with itself at two points in time (Box et al. 2015). We use the autocorrelation function (ACF) implemented in R to calculate the autocorrelation with a maximum lag of 50 for all datasets. We choose the first lag that gives the smallest autocorrelation falling out of the significance band, as shown, for example, in Fig. 10 for the Pioneer gripper dataset, where we choose a lag equal to 37. Descriptions of the datasets are shown in Table 4.
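The lag selection described above can be reproduced, for instance, with the autocorrelation function in Python's statsmodels; the paper uses R's acf, so this sketch and its stopping rule (first lag whose autocorrelation falls inside the approximate 95% band ±1.96/√n) are ours.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def first_insignificant_lag(labels, max_lag=50):
    """First lag at which the label autocorrelation leaves the ~95% significance band.

    This mirrors the selection described above; the exact rule used in the paper
    (read off R's acf plot) may differ in detail.
    """
    values = acf(np.asarray(labels, dtype=float), nlags=max_lag, fft=False)
    band = 1.96 / np.sqrt(len(labels))      # approximate significance band of the ACF plot
    for lag in range(1, max_lag + 1):
        if abs(values[lag]) < band:
            return lag
    return max_lag

# Example with a synthetic, slowly changing label sequence.
rng = np.random.default_rng(0)
labels = np.repeat(rng.integers(0, 2, size=100), 25)   # long runs, so autocorrelation decays slowly
print(first_insignificant_lag(labels))
```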

Fig. 8 Sigmoid CTF of the two settings of the HMM-, CRF- and RNN-based classification algorithms for the Robot walk 4 dataset, indicating different algorithm behaviors. Green solid line: true measurements; dashed red line: estimated sigmoid. a Sigmoid CTF of HMM, 1st setting; b Sigmoid CTF of HMM, 2nd setting; c Sigmoid CTF of CRF, 1st setting; d Sigmoid CTF of CRF, 2nd setting; e Sigmoid CTF of RNN, 1st setting; f Sigmoid CTF of RNN, 2nd setting (Color figure online)

We induce noise into sequential data in a way similar to the independent data. We randomly split all data into equally sized training and test parts. Mislabeled instances are in the training data only. We induce varying levels of noise from 0 to 100% as follows. The noisy version of a class C_i of a training set is created by selecting the level of noise l_n, and replacing instances of that class with instances of another class with probability l_n. Then sequential classifiers are trained using the training data with various levels of noise, and are used to predict the test data. Given the classification of all the test instances, we compute the overall performance of the algorithm using the classification accuracy:

P = #CorrectClassifications / #TotalClassifications.
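A small sketch of this per-class corruption and of the accuracy P; the sequence order is left intact, only training labels are touched, and all names are ours:

```python
import numpy as np

def corrupt_class_labels(y_train, class_id, noise_level, n_classes, rng):
    """Noisy version of class class_id: each of its training instances is replaced
    by (the label of) another class with probability noise_level."""
    y_noisy = y_train.copy()
    for i in np.where(y_train == class_id)[0]:
        if rng.random() < noise_level:
            y_noisy[i] = (class_id + rng.integers(1, n_classes)) % n_classes
    return y_noisy

def accuracy(y_true, y_pred):
    """Overall performance P = #CorrectClassifications / #TotalClassifications."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])
print(corrupt_class_labels(y, class_id=0, noise_level=0.5, n_classes=3, rng=rng))
print(accuracy([1, 0, 1], [1, 1, 1]))
```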

Fig. 9 Distributions of the characteristics of the sequential datasets

Fig. 10 Autocorrelation plot of the Pioneer gripper dataset (autocorrelation value vs. lag)

We perform repeated random sub-sampling validation with 20 splits by randomly choosing half of the instances as training data and the remaining instances as test data, and calculate the average performance of the classifier. For the sequential datasets the correct order of the instances is kept for both the training and test phases. The parameters of the corresponding sigmoid are assessed using the values of the signal-to-noise ratio and the corresponding average performance levels. In the following sections we consider sequential classifiers in more detail.

6.3 Experiments on classifier behavior comparison

We perform experiments of noisy classification using repeated random sub-sampling validation [7] with 10 splits per algorithm, per dataset, per noise level, and calculate the average performance for each setting. Repeated random sub-sampling is a common technique of cross-validation, in which a dataset is randomly split into training and testing parts and the accuracy results are averaged over the splits. In the case of sequential learners, we use 20 splits. Given the 20 levels of the signal-to-noise ratio and the corresponding algorithm performance (i.e., classification accuracy), we estimated the parameters of the sigmoid. The search in the space of sigmoid parameters is performed by a genetic algorithm [8], as was proposed in Giannakopoulos and Palpanas (2010).

[7] Repeated random sub-sampling validation is also known as Monte Carlo cross-validation (Kuhn and Johnson 2013).
[8] The settings of the genetic algorithm can be found in the appendix.
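The repeated random sub-sampling (Monte Carlo cross-validation) loop can be sketched as follows, with a generic train-and-predict callback standing in for the Weka/JAHMM/Mallet learners; all names are ours.

```python
import numpy as np

def monte_carlo_accuracy(X, y, train_and_predict, n_splits=10, seed=0):
    """Average accuracy over repeated random half/half train-test splits.

    train_and_predict(X_tr, y_tr, X_te) must return predicted labels for X_te;
    indices are sorted so that, for sequential data, the chronological order of
    the selected instances is preserved within each part.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(y))
        half = len(y) // 2
        tr, te = np.sort(idx[:half]), np.sort(idx[half:])
        y_pred = train_and_predict(X[tr], y[tr], X[te])
        scores.append(np.mean(y_pred == y[te]))
    return float(np.mean(scores))
```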

Table 4 Dataset descriptions. For each dataset, the columns report the number of classes (x_1), the number of instances (x_2), the autocorrelation lag (x_3), and the number of attributes (features) (x_4); the numeric values are not reproduced in this transcription. The datasets are: 1. Climate; 2. Human activity; 3. Pioneer gripper; 4. Robot walk 4; 5. Diabetes; 6. Ozone level 1 h; 7. Pioneer turn; 8. Ozone level 8 h; 9. Opportunity; 10. Robot walk 2; 11. Robot walk 24; 12. Pioneer move; 13. Callt2; 14. Electric 2; 15. PAMAP; 16. Electric 3; 17. Daphnet freezing; 18. Dodger. The Climate, Electric 2 and Electric 3 datasets come from Giannakopoulos and Palpanas (2010); the footnotes of the original table point to the web sources of the other datasets.

A sample of the true and sigmoid-estimated performance graphs for varying levels of noise for a non-sequential classifier can be seen in Fig. 11. The corresponding graphs for the sequential classifiers are shown in Fig. 8. Although in our experiments the parameters of the sigmoid were estimated offline, the sigmoid rule framework can be applied in an online scenario, following a training period.
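As an illustration of this parameter search, the sketch below fits the sigmoid of Eq. 2 by minimizing a Kolmogorov-Smirnov D statistic between the observed and the modeled performance values; it uses SciPy's differential evolution as a generic global optimizer in place of the genetic algorithm described in the appendix of the paper, and all names are ours.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import ks_2samp

def fit_sigmoid(z, perf):
    """Fit (m, M, b, c, d) of Eq. 2 to (Z, accuracy) pairs using a KS-based fitness."""
    z, perf = np.asarray(z), np.asarray(perf)

    def sigmoid(params):
        m, M, b, c, d = params
        return m + (M - m) / (1.0 + b * np.exp(-c * (z - d)))

    def fitness(params):
        m, M, b, c, d = params
        if M < m:                                   # keep the constraint m <= M
            return 1e6
        # D statistic: maximum distance between the CDFs of the two value samples.
        return ks_2samp(perf, sigmoid(params)).statistic

    bounds = [(0, 1), (0, 1), (1e-3, 10), (1e-3, 10), (-5, 5)]   # m, M, b, c, d
    return differential_evolution(fitness, bounds, seed=0).x

# Example with synthetic noisy measurements of a known sigmoid.
z = np.linspace(-3, 5, 20)
true = 0.1 + 0.8 / (1 + 2.5 * np.exp(-1.2 * (z - 1.0)))
print(fit_sigmoid(z, true + np.random.default_rng(0).normal(0, 0.01, size=20)))
```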

Fig. 11 Sigmoid CTF of the SMO and IBk algorithms for the Wine dataset. Green solid line: true measurements; dashed red line: estimated sigmoid (Color figure online)

Fig. 12 Estimated SRF parameters per algorithm. X-axis labels (left-to-right): IBk, JRip, NB, NBTree, SMO. a Min performance m, b Max performance M, c Sigmoid slope c, d Performance span r_alg, e Active area width d_alg, f Perf. improvement r_alg/d_alg

Figure 12 illustrates the values of the mean and the standard deviation of the SRF parameters per algorithm, over all 24 datasets, for the non-sequential classifiers. As an example of an interpretation of the figure using SRF, the plots indicate that (for the range of datasets studied) SMO is expected to improve its performance faster on average than all other algorithms, when the signal-to-noise ratio increases. This conclusion is based on the r_alg/d_alg dimension and the values of the slope indicator c (refer to Fig. 12c, f). Figure 11 offers a visual comparison of the performances of the SMO and IBk algorithms for the Wine dataset, where we can see that SMO improves its performance twice as fast as IBk. At the same time, the confidence interval for r_alg/d_alg and c of the Naïve Bayes algorithm overlaps with that of SMO, so these algorithms can also be recommended if the

Fig. 13 SRF parameters per algorithm. X-axis labels (left-to-right): CRF 1st and 2nd setting, HMM 1st and 2nd setting, RNN 1st and 2nd setting. a Min performance m, b Max performance M, c Sigmoid slope c, d Performance span r_alg, e Active area width d_alg, f Perf. improvement r_alg/d_alg

same criteria are optimized. Nevertheless, the SMO and Naïve Bayes algorithms can be well differentiated by the d_alg criterion (see Fig. 12e). IBk has a smaller potential for improvement of performance (but also a smaller potential for loss) than SMO, when noise levels change, given that the span of the performance range r_alg is higher for SMO (Fig. 12d). This difference can also be seen in Fig. 11, where the distance between the minimum and the maximum performance values is larger for SMO.

Figure 13 illustrates the values of the mean and the standard deviation of the SRF parameters per algorithm, over all 18 datasets, for the sequential case. The plots indicate that HMM is expected to improve its performance faster than the other algorithms, when the signal-to-noise ratio increases (Fig. 13c, f). Though the two settings of RNN perform similarly along these parameters, they are clearly distinct from the rest of the algorithms. On the other hand, CRF has a smaller potential for improvement of its performance, but also a smaller risk for low performance than HMM, when noise levels change (Fig. 13d, f). An example of the fast improvement of HMM when compared to the other families of algorithms can be seen in Fig. 8. As can be seen from Fig. 13, all the families of the sequential classifiers have their specific behavior along the devised SRF dimensions, with certain variations depending on the order of the model.

We stress that the parameter estimation does not require previous knowledge of the noise levels, but is dataset dependent. In the special case of a classifier selection process, having an estimate of the noise level in the dataset helps to reach a decision through the use of SRF. In the following, we demonstrate the existence of this connection between the parameters of the sigmoid curve and the characteristics of the

Therefore, SRF can help in choosing the better learner, provided that we know the characteristics of the dataset.

7 Modeling expected performance in new datasets

We now study the connection between the dataset characteristics and the sigmoid parameters, irrespective of the choice of the algorithm. In essence, this study shows if and how the framework can be used to estimate the performance of algorithms in a noisy setting, in the presence of a new dataset. We consider the results obtained from all the algorithms as different samples of the SRF parameters for a particular dataset. We use regression analysis to observe the cumulative effect of the dataset characteristics on a single parameter, and we use correlation analysis to detect any connection between each pair of dataset characteristic and sigmoid parameter. We examine the connections between the dataset characteristics and the sigmoid parameters both individually and all together, in order to draw the complete picture.

7.1 Linear regression

We use the lm method (fitting linear models) implemented in the package stats in R to estimate the values of the parameters of the linear regression model for each chosen SRF parameter y:

y = α_0 + α_1 x_1 + α_2 x_2 + α_3 x_3 + α_4 x_4 + ε,

where the x_i, i = 1, 2, ..., correspond to the features of a dataset, such as the number of classes, the number of instances, and other dataset characteristics specific to the non-sequential and sequential cases, which are discussed below; ε is the error term, with E(ε) = 0. For choosing the best model, we use a modification of the bestlm function from the nsRFA package in R, which in turn exploits the function leaps of the R package leaps. The criterion for the best model is the adjusted R² statistic (the closer the adjusted R² is to one, the better the estimation). R² defines the proportion of variation in the dependent variable (y) that can be explained by the predictors (the X variables) in the regression model. As some variation of y can be predicted by chance, the adjusted value R²_a is used. The adjusted R² takes into account the number of observations (N) and the number of predictors (k), and is computed as:

R²_a = 1 − (1 − R²)(N − 1) / (N − k − 1).

After applying the linear model, we exclude the non-significant parameters and search for the model that maximizes the adjusted R² statistic (i.e., the best linear model). In addition to R²_a, we applied a statistical test with significance level α = 0.05 to each chosen regression model in order to check whether the model fits the data well. If the model describes the distribution of the values in a statistically significant way, then, after the hypothesis testing, the p-value should fall below the significance level.
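As an illustration of this fitting step, the R sketch below regresses one SRF dimension on the dataset characteristics and keeps the predictor subset with the highest adjusted R². It relies on the standard lm and leaps::regsubsets functions rather than the modified bestlm routine mentioned above, so it should be read as a simplified stand-in for the actual procedure; the data frame and its contents are hypothetical.

  library(leaps)  # exhaustive best-subset search, as used internally by bestlm

  # Hypothetical data: one row per (dataset, algorithm) sample; y is one SRF
  # dimension (e.g., r_alg), x1..x4 are the dataset characteristics.
  set.seed(42)
  d <- data.frame(y  = runif(120),
                  x1 = sample(2:10, 120, replace = TRUE),      # number of classes
                  x2 = sample(4:60, 120, replace = TRUE),      # number of features
                  x3 = sample(150:5000, 120, replace = TRUE),  # number of instances
                  x4 = runif(120, 1, 6))                       # intrinsic dimensionality

  # Rank all predictor subsets by adjusted R^2 and keep the best one
  subsets <- regsubsets(y ~ x1 + x2 + x3 + x4, data = d, nvmax = 4)
  best    <- which.max(summary(subsets)$adjr2)
  keep    <- names(coef(subsets, best))[-1]    # predictors of the best model (drop the intercept)

  # Refit the selected model with lm to obtain p-values and adjusted R^2
  fit <- lm(reformulate(keep, response = "y"), data = d)
  summary(fit)$adj.r.squared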

Table 5 Prediction error of the linear regression models for the non-sequential classifiers. For each SRF dimension (min performance m, max performance M, performance span r_alg, active area width d_alg, performance improvement r_alg/d_alg, sigmoid slope c) the table reports the error measures MSE, MAE, RMSE, RMAE and NRMSE, together with the average adjusted R²_a.

We applied a leave-one-out validation process, where one dataset is left out from training and used for testing on every run. We performed this analysis separately with each of the variables m, M, r_alg, d_alg, r_alg/d_alg, and c as the dependent variable y.

Non-sequential case. For the non-sequential classifiers, the following dataset parameters were chosen: the number of classes (x_1), the number of features (x_2), the number of instances (x_3), and the intrinsic dimensionality (x_4) of a dataset, i.e., the fractal correlation dimension (Camastra and Vinciarelli 2002; de Sousa et al. 2006), calculated using a box counting process (we would like to thank Christos Faloutsos for kindly providing the code for the fractal dimensionality estimation). The results of model fitting and prediction of the SRF dimensions are reported in Table 5, where the average errors between the observed and predicted SRF dimensions are shown. For each SRF dimension chosen, we observed 5 values (since 5 machine learning algorithms were used), and having estimated them for 24 datasets, we ended up with 120 predictions for a single SRF dimension. We calculated five types of errors: (1) MSE, the mean square error, MSE = (1/n) Σ_{i=1..n} (ŷ_i − y_i)²; (2) MAE, the mean absolute error, MAE = (1/n) Σ_{i=1..n} |ŷ_i − y_i|; (3) RMSE, the relative root mean square error, RMSE = sqrt( (1/n) Σ_{i=1..n} ((ŷ_i − y_i)/y_i)² ); (4) RMAE, the relative mean absolute error (also known as the mean absolute percentage error), RMAE = (1/n) Σ_{i=1..n} |ŷ_i − y_i| / |y_i|; and (5) NRMSE, the normalized root mean square error, NRMSE = sqrt(MSE) / (y_max − y_min). In the above formulas, ŷ_i stands for the predicted value of the parameter, y_i is the actual value, and n is the number of predictions. The last column of Table 5 shows the average of the adjusted R² statistic for the models that were estimated for all the SRF dimensions (averaged over the 24 datasets).

Figure 14 illustrates how our models fit the test data, showing that in most cases the true values of the sigmoid parameters for each dataset (illustrated by circles that correspond to the 5 algorithms for each test dataset i, i = 1, 2, ..., 24) are within the 95% confidence zone around the estimated values.
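These error measures translate directly into a few lines of R. The sketch below is an illustrative implementation under the formula readings given above (MSE without the square root, NRMSE as the root of MSE scaled by the observed range); it is not the original evaluation code, and the toy values are made up.

  # Error measures between predicted (y_hat) and actual (y) SRF values,
  # following definitions (1)-(5) above.
  srf_errors <- function(y_hat, y) {
    mse <- mean((y_hat - y)^2)                     # (1) mean square error
    c(MSE   = mse,
      MAE   = mean(abs(y_hat - y)),                # (2) mean absolute error
      RMSE  = sqrt(mean(((y_hat - y) / y)^2)),     # (3) relative root mean square error
      RMAE  = mean(abs((y_hat - y) / y)),          # (4) relative mean absolute error (MAPE)
      NRMSE = sqrt(mse) / (max(y) - min(y)))       # (5) normalized root mean square error
  }

  # Toy usage with hypothetical predictions for one SRF dimension
  y     <- c(0.62, 0.70, 0.55, 0.81)
  y_hat <- c(0.60, 0.74, 0.50, 0.79)
  srf_errors(y_hat, y)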

Fig. 14 Real and estimated values of the sigmoid parameters (m, M, r_alg, d_alg, r_alg/d_alg, c) per dataset for the non-sequential classifiers. Real values: black rectangles, estimated values: circles, gray zone: 95% prediction confidence interval

This finding further supports the connection between the dataset parameters and the SRF dimensions. According to the results, the chosen parameters of the datasets can be used to reason about the parameters of the sigmoid. This allows us to recommend algorithms for a new dataset with known characteristics, if we have estimated the SRF dimensions for a set of algorithms on datasets with similar characteristics.

The relation between the noise level and the average classification performance over the Climate dataset is illustrated in Fig. 15. There, we depict the curve of the estimated performance, based on the regression models, as well as the curves of the actual performances of several algorithms. The plot shows that the estimation is quite good for the IBk, NBTree and JRip algorithms. The estimation for the other algorithms is not as good, which may be attributed to the fact that, in our experiments, we trained the regression models using all available datasets, which have very different characteristics. We expect that training the models on a large number of datasets with similar characteristics can further improve the accuracy.

Sequential case. In this set of experiments, not all the sequential classification algorithms use the features of the data instances. Therefore, we examine the influence of the dataset on the CTF parameters by studying the number of classes (x_1), the number of instances (x_2), and the maximal autocorrelation lag of a dataset (x_3), which is described in more detail in Sect. 6.2.
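For a new dataset with known characteristics, the fitted regression model can then be queried for an expected value and a prediction interval of each SRF dimension, which is essentially what the gray zones in Figs. 14 and 16 depict. The R lines below show this step with the standard predict function; the characteristics of the new dataset are hypothetical, and fit refers to the linear model from the earlier sketch rather than to the models trained in the actual experiments.

  # Characteristics of a hypothetical new (non-sequential) dataset
  new_dataset <- data.frame(x1 = 3,      # number of classes
                            x2 = 20,     # number of features
                            x3 = 1500,   # number of instances
                            x4 = 2.4)    # intrinsic (fractal) dimensionality

  # Point estimate and 95% prediction interval for the chosen SRF dimension;
  # 'fit' is the lm object selected in the previous sketch.
  predict(fit, newdata = new_dataset, interval = "prediction", level = 0.95)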

Fig. 15 Sigmoid with regression-estimated parameters, together with the actual sigmoids of the corresponding classification algorithms, for the Climate dataset

Table 6 Prediction error of the linear regression models for the sequential data. For each SRF dimension (min performance m, max performance M, performance span r_alg, active area width d_alg, performance improvement r_alg/d_alg, sigmoid slope c) the table reports the error measures MSE, MAE, RMSE, RMAE and NRMSE, together with the average adjusted R²_a.

We also applied the leave-one-out method for the six dependent variables, as in the non-sequential case. The results of model fitting and prediction of the SRF dimensions are reported in Table 6. For each SRF dimension chosen, we observed 6 values, since 3 sequential learning algorithms were used and each algorithm was applied in two different settings of dependency order. These values are estimated for 18 datasets, so 108 predictions were made for each SRF dimension. We also calculated the five types of errors, as with the non-sequential datasets (Table 6). Just as for the non-sequential data, for sequential data classification we can see the relationship between the dataset parameters and the SRF dimensions.

Figure 16 illustrates how our models fit the test data. The results show that most of the true parameter values are within the 95% prediction confidence levels. Thus, we can use the dataset parameters in order to reason about the parameters of the sigmoid of the algorithms. This enables us to recommend the algorithm of choice in advance for a new dataset with known characteristics. We can additionally recommend the order of the sequential classifier, if we have computed the SRF dimensions for a set of algorithms on datasets with similar characteristics.
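A compact way to reproduce this evaluation protocol is a leave-one-dataset-out loop around the fitting and error-computation steps sketched earlier. The outline below illustrates the sequential case (18 datasets, 6 algorithm settings, 3 dataset characteristics); the generated data and identifiers are hypothetical, and srf_errors is the helper defined in the earlier sketch.

  # Leave-one-dataset-out evaluation for one SRF dimension (sequential case):
  # 18 datasets x 6 algorithm settings = 108 samples.
  set.seed(7)
  seq_d <- data.frame(dataset_id = rep(seq_len(18), each = 6),
                      y  = runif(108),                                        # e.g., max performance M
                      x1 = rep(sample(2:8, 18, replace = TRUE), each = 6),    # number of classes
                      x2 = rep(sample(200:3000, 18, replace = TRUE), each = 6), # number of instances
                      x3 = rep(sample(1:20, 18, replace = TRUE), each = 6))     # max autocorrelation lag

  loo <- lapply(unique(seq_d$dataset_id), function(held_out) {
    train <- seq_d[seq_d$dataset_id != held_out, ]
    test  <- seq_d[seq_d$dataset_id == held_out, ]
    m     <- lm(y ~ x1 + x2 + x3, data = train)          # refit without the held-out dataset
    data.frame(actual = test$y, predicted = predict(m, newdata = test))
  })

  preds <- do.call(rbind, loo)
  srf_errors(preds$predicted, preds$actual)               # error measures defined above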

Fig. 16 Real and estimated values of the sigmoid parameters for the sequential classifiers. Real values: black rectangles, estimated values: circles, gray zone: 95% prediction confidence interval

7.2 Logistic regression

We also employed logistic regression, applying the same leave-one-out process, for the purpose of predicting the SRF dimensions. When applying logistic regression, we use the Akaike information criterion (AIC) instead of the adjusted R²_a in order to choose the best model. Being a measure of the relative quality of a statistical model for given data, AIC provides the means for model selection. In the general case, the AIC is calculated as follows:

AIC = 2k − 2 ln(L),

where k is the number of parameters in the statistical model, and L is the maximized value of the likelihood function for the estimated model (Akaike 1974). We choose the best prediction model by minimizing the average AIC. We also calculate the five types of errors, as in the linear regression case. The results of model fitting and prediction of the SRF dimensions for the non-sequential datasets are shown in Table 7, while the results for the sequential datasets are shown in Table 8. According to the experimental evaluation, the logistic regression model gives better predictions for some SRF dimensions, producing relatively smaller errors compared to the linear regression results. For both the non-sequential (Tables 5 and 7) and sequential (Tables 6 and 8) cases, the parameters M and r_alg are better predicted with the logistic regression model. Additionally, for the non-sequential case, c is better predicted using the logistic regression model. The accurate prediction of the SRF dimensions supports the connection between the dataset characteristics and the SRF dimensions.
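As an illustration of AIC-based selection, the R sketch below compares a small set of candidate logistic regression models and keeps the one with the lowest AIC, also checking the formula above against R's built-in AIC function. The toy binary response and predictors are invented for the example and are not one of the SRF dimensions; the sketch demonstrates the selection mechanics rather than the exact models used in the evaluation.

  # Generic illustration of AIC-based model selection with logistic regression.
  set.seed(1)
  toy <- data.frame(x1 = rnorm(60), x2 = rnorm(60), x3 = rnorm(60))
  toy$y <- rbinom(60, 1, plogis(0.8 * toy$x1 - 0.5 * toy$x2))   # toy binary outcome

  candidates <- list(glm(y ~ x1,           family = binomial, data = toy),
                     glm(y ~ x1 + x2,      family = binomial, data = toy),
                     glm(y ~ x1 + x2 + x3, family = binomial, data = toy))

  aics <- sapply(candidates, AIC)           # built-in AIC = 2k - 2 ln(L)
  best <- candidates[[which.min(aics)]]     # model minimizing AIC

  # Manual check of the formula for the selected model
  k <- length(coef(best))                   # number of estimated parameters
  2 * k - 2 * as.numeric(logLik(best))      # equals AIC(best)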
