Constructing socio-demographic indicators for National Statistical Institutes using mobile phone data: estimating literacy rates in Senegal


Timo Schmid, Fabian Bruckschen, Nicola Salvati, Till Zbiranski
School of Business & Economics, Discussion Paper, Economics 2016/9

Timo Schmid*, Fabian Bruckschen*, Nicola Salvati**, and Till Zbiranski*
* Institute of Statistics and Econometrics, Freie Universität Berlin, Berlin, Germany
** Dipartimento di Economia e Management, Università di Pisa, Pisa, Italy

Abstract

Modern systems of official statistics require the accurate and timely estimation of socio-demographic indicators for disaggregated geographical regions. Traditional data collection methods such as censuses or household surveys impose great financial and organizational burdens on National Statistical Institutes. The rise of new information and communication technologies offers promising sources to mitigate these shortcomings. In this paper we propose a unified approach for National Statistical Institutes based on small area estimation that allows for the estimation of socio-demographic indicators by using mobile phone data. In particular, the methodology is applied to mobile phone data from Senegal for deriving sub-national estimates of the share of illiterates disaggregated by gender. The estimates are used to identify hot spots of illiterates with a need for additional infrastructure or policy adjustments. Although the paper focuses on literacy as a particular socio-demographic indicator, the proposed approach is applicable to indicators from national statistics in general.

Keywords: Indicators, Model-based estimation, Official statistics, Small area estimation.

1 Introduction

"If you can't measure it, you can't manage it." (Michael Bloomberg, former Mayor of New York City)

A country's budget can hardly be allocated efficiently if the country does not know where the money is needed the most. Reliable knowledge of the socio-demographic indicators of a country's population is essential for sound evidence-based policymaking. For instance, the geographic distribution of wealth is used to make decisions regarding the allocation of resources. Traditionally, this knowledge is collected via household surveys and is provided by National Statistical Institutes (NSI).
The surveys are generally designed to provide reliable estimates for the indicators only for larger domains such as the national or the regional level. One possible way to derive estimates on spatially disaggregated levels, like municipalities or communes, is by using small area methods (Rao, 2003). During the last decade there has been substantial growth in the development and application of model-based small area methods for the estimation of indicators. Examples are manifold in the literature: Elbers et al. (2003) and Molina and Rao (2010) used small area techniques for the estimation of poverty indicators and, recently, Lopez-Vizcaino et al. (2015) and Chambers et al. (2016) investigated the estimation of labour force indicators.

For a comprehensive review we refer to Pfeffermann (2013) and Rao and Molina (2015). However, the production of precise small area estimates of indicators relies on the availability of predictive auxiliary variables like census or register information. In many countries successive censuses and national surveys are conducted with long lag times. Both require a well-functioning infrastructure, ranging from cars for the interviewers to computers and well-trained personnel for the analysis. With national statistical systems in developing countries often being subject to unstable funding and a lack of human resources, the collection and processing of relevant data poses a great challenge, and often such data do not exist (Ghosh and Rao, 1994). For instance, in Angola the most recent census before 2014 was conducted in 1970 and the official population grew by more than 400% in that period (Blumenstock et al., 2015).

An alternative to the use of census information for small area estimation is to investigate different sources of passively collected data like social media sources (e.g. Facebook, Twitter etc.) or mobile phone data. Eagle et al. (2010) recently used social network data to measure economic growth in the UK. Nevertheless, social media data are rare in developing countries, whereas mobile phone data are a remarkable exception. The unique subscriber penetration is between 40% and 55% in developing countries, with a share of around 40% in Sub-Saharan Africa (GSMA, 2015).

In this paper we investigate how mobile phone data (in combination with survey data) can be used to predict socio-demographic indicators at regionally disaggregated levels when census information is not available. The motivation is that mobile phone data are collected as a by-product and include valuable information on the timing and frequency of communication events and patterns of location and travel choices (Blumenstock et al., 2015). Eagle et al. (2010) and Deville et al. (2014) showed that spatially aggregated measures of mobile phone usage and penetration have a high correlation with spatially aggregated statistics from censuses.
At this point we should make clear that the paper does not discuss whether the socio-demographic indicators can be directly estimated using only the mobile data. We are aware of some important recent work by Blumenstock et al. (2015). The authors predict poverty and wealth by using an individual's past history of mobile phone usage in combination with a phone survey.

In our paper we had access to the Demographic and Health Survey (DHS) 2011 and mobile phone data covering the year 2013 in Senegal. The Republic of Senegal is located in West Africa on the Atlantic Ocean between Mauritania to the north and Guinea-Bissau to the south. At the westernmost tip lies Dakar, the country's capital and also its largest city. The set-up of administrative areas in Senegal is complex, but can be divided into four different levels: 14 regions, 45 departments, 123 arrondissements and 431 communes. The total population is estimated at about 13.5 million (2013) and consists of several ethnic groups, e.g. the Wolof or the Serer.

From a methodological point of view the present article uses area-level small area models (Fay and Herriot, 1979) in combination with covariates from alternative data sources. The resulting estimates are benchmarked such that the aggregated small area estimates produce the official national estimate for the country. We also apply a transformation to restrict the indicator of interest, for instance the literacy rate, to particular intervals when necessary. However, the idea of alternative covariates is not new in the literature. Porter et al. (2014) applied functional covariates extracted from Google in spatial Fay-Herriot models (Pratesi and Salvati, 2009). Recently, Marchetti et al. (2015) gave a comprehensive overview of how alternative data sources can be used in the context of small area estimation. Nevertheless, none of these papers considered in detail the use of mobile phone data.
To the best of our knowledge, this paper is the first attempt to provide an easily applicable approach for NSIs to model a basket of regionally disaggregated socio-demographic indicators using survey data in combination with mobile phone data.

In particular, the paper investigates the usability of mobile phone data, in this case tower-to-tower traffic in Senegal from 2013, for constructing fine granular indicators, like literacy and poverty rates, access to electricity and safe water or religious affiliations. The application here aims at estimating the socio-demographic indicator literacy rate for women and men for regionally disaggregated areas because illiteracy is a common problem across Africa. From an applied point of view, the paper also discusses the processing, cleaning and handling of the mobile phone data used as an additional source of information.

Child labour, poverty and poor access to education in particular are common problems across the African continent (Ford, 2007). Poverty in developing countries is not only a result of low income, but also of a lack of opportunities to improve the situation (UNESCO, 2015). Literacy is one of the keys to improving people's chances to escape from the lowest poverty levels. Although there are countries in a worse situation than Senegal, the country is only ranked 117th out of 127 countries in the Education for All Development Index (EDI) published by UNESCO (2012). The literacy rate in particular is quite low compared to other African countries (literacy rate in 2011: 38% for women and 62% for men, ANSD (2012)).

The high number of illiterates can be partially explained by historical reasons. Senegal was a French colony until it gained independence from France in 1960. At that point the school attendance of children in primary school was at 36%, while the country's average literacy rate was around 34% (Schelle, 2013). The origin of this low share of literacy lies in the little interest of the colonial rulers in educating the indigenous people. Other colonial powers in West Africa like Germany (Togoland) or England (Gold Coast, now called Ghana) had a pupil count which was around four times as high as Senegal's count (Schelle, 2013).
Judging by the country's literacy rate from 2011, not much has changed in this regard since the withdrawal of the French power in 1960. Another problem of the educational situation is the slow development of a coherent education system due to opposing education concepts with different traditions. The indigenous African concept coexists with the Islamic and the western concept. Nowadays, if children attend school in Senegal, they often attend a public school and additionally a Qur'anic school. In 2002 a new system emerged, the so-called franco-arabic schools, a hybrid form of a bilingual (French and Arabic) school with a heavy curriculum. Although Villalón and Bodian (2012) predict that these franco-arabic schools could be the future and predominant form of public schools, Senegal is, after more than 50 years of independence, still in the development stage of a coherent education system.

The problem is doubtless not only due to a fragmented school system, but also caused by low attendance rates of children at any school. Although primary and secondary education is compulsory and free in Senegal, many parents still do not send their children to school, and drop-out rates are high (Ford, 2007). UNESCO (2012) reported that as the level of education increases, the enrollment ratios of women in particular decrease strongly. Although Senegal achieved gender parity in primary education, a marked disparity persists in secondary education. For every 100 boys attending secondary education in Senegal, only around 79 girls attend (UNESCO, 2012). This is one reason for low literacy rates especially among women. According to UNESCO (2012) more than two million women in Senegal lack basic literacy skills. Especially in the country's poor regions like Matam and Tambacounda, both located in the east, girls are involved in economic activities and parents therefore keep them out of school to earn some additional income.
Next to economic reasons, gender-based violence, early marriage and pregnancy as well as the traditional role of women in society are further issues which add to low literacy rates for women (UNESCO, 2012). The Senegalese government wants to significantly improve the literacy rate, especially for women. For instance, in the early 2000s, the government built community schools and literacy centers for disadvantaged people, like women who missed a basic school education. However, according to the literacy rates for 2011 there is still a large gender disparity and a persisting need to address this issue in Senegal.

Organizations like UNESCO and UNICEF are constantly working on this educational issue and have initiated several projects. Currently the Senegalese government and the UNESCO office in Dakar run a project to improve the literacy rate for women (UNESCO, 2015). In particular, the PAJEF project (Projet d'alphabétisation des jeunes filles et jeunes femmes) provides, for instance, access to organized literacy classes and develops training manuals. The project currently runs in seven regions identified by the National Agency of Statistics and Demography (ANSD - Agence Nationale de Statistique et de la Demographie) in Senegal. Further information is available in UNESCO (2015). So far Senegal is among the most successful countries in advancing gender equality in primary school enrollment, but the national number of illiterate women remains high. All the efforts mentioned above are experimental and not countrywide because of a lack of spatially disaggregated knowledge of where more support is needed. To obtain a higher countrywide literacy rate, areas of high illiteracy have to be identified.

In this paper we propose an approach for NSIs based on small area estimation for deriving estimates of the share of literates by gender by using mobile phone data for the 431 communes in Senegal. The estimates are used to identify hot spots of illiterate women for the PAJEF project with a need for additional infrastructure.

The structure of the paper is as follows. In Section 2 we describe the DHS survey and the mobile phone data including the cleaning and preparation. In Section 3 we review small area estimation using Fay-Herriot models. The methodological approach for constructing socio-demographic indicators based on mobile phone data is described and computational details are provided. In Section 4 we present the results of the application for the indicator literacy rate in Senegal by using the mobile phone data.
The performance of the proposed approach is empirically evaluated in a large-scale design-based simulation in Section 5. Finally, in Section 6 we conclude the paper with some final remarks and discuss limitations of the proposed approach. Additional results are presented in the supplementary materials.

2 Data sources: survey data and mobile phone data

In this section we describe the data sources used in the analysis. In particular, we had access to the Demographic and Health Survey (DHS) 2011 and mobile phone data covering the year 2013 in Senegal. We present details regarding the practical implementation of the time-intensive cleaning and preparation of the mobile phone data and discuss the construction of mobile phone covariates.

2.1 Demographic and Health Survey

The DHS program collects representative data on population, health, HIV and nutrition in over 90 countries. The data that we use are from the DHS survey 2011 carried out by the ANSD in Senegal. The survey includes a section on the production of socio-demographic indicators on household level and another part on assessing the availability of material and human resources. In particular, the DHS survey consists of three questionnaires: (i) a household questionnaire, (ii) a women's questionnaire and (iii) a men's questionnaire. The household survey collects information on the usual household members including, for instance, gender, age, education, survival of parents, and child labor. Additional information like household characteristics (source of water, availability of electricity, building material and type of toilet), ownership, use of mosquito nets and several health related questions is collected as well. The household survey is also used to identify men and women for the individual questionnaires. The questionnaire for women consists of 10 sections covering socio-demographic indicators (like age and date of birth, schooling, literacy, ethnicity), reproduction, use of contraception, pregnancy, marriage and female genital mutilation. The men's questionnaire is a short version of the questionnaire for women covering socio-demographic characteristics and health related questions. Note that, as socio-demographic characteristics are only available in the gender-specific questionnaires, the analysis in this paper focuses on the women's and men's questionnaires. For additional information regarding the variables and the questionnaires we refer to ANSD (2012).

The survey aims to cover the complete country and is based on a stratified two-stage cluster sampling design. The 28 strata are defined by a cross-classification of the 14 regions and rural/urban areas in Senegal. The survey is designed to produce reliable results for most indicators for the 14 regions. In the first sampling stage 391 census districts (147 urban and 244 rural) were drawn with probability proportional to size (number of households in the census districts). In the second sampling stage 21 households were selected with equal probability in each of the 391 sampled census districts. Among the 21 households selected for the women's survey, 8 households were drawn for the men's survey. All men (aged 15-59) and women (aged 15-49) in these households were interviewed. The interview was successfully conducted for 15,688 women (response rate of 92.7 percent) and for 4,929 men (response rate of 87 percent) (ANSD, 2012).

Figure 1 presents results based on the DHS survey 2011 for the indicator literacy rate by gender for the regions in Senegal. In particular, the variable literacy is collected by four different categories in the DHS survey. The categories "able to read only parts of sentence" and "able to read whole sentence" are grouped as literate. The answers "blind/visually impaired", "cannot read at all" and "no card with required language" are categorized as illiterate. The initial results indicate that the proportion of literate women (38%) in Senegal is lower than the proportion of literate men (62%).
The results are consistent with the officially published results of the ANSD (2012).

As the ANSD aims to estimate socio-demographic indicators for the 431 communes in Senegal, we allocated the information of the DHS survey to the administrative areas (communes). In particular, we had access to the geographical coordinates of the centroids of the 391 census districts. As the actual coverage of the census districts was not available, we matched the centroids of the census districts with the geographical boundaries of the 431 communes. Six out of the 391 census districts were excluded from the analysis because the coordinates of the centroids were missing. Direct survey estimates are only available for 242 out of the 431 communes given the data from the DHS survey 2011. A summary of the commune-specific sample sizes for the women's and men's questionnaires is provided in Table 1.

Figure 2 shows direct estimates for the literacy rate by gender on commune level for the capital Dakar (right panel) and for the rest of Senegal (left panel). Communes filled with white color represent areas with zero sample size, so direct estimates based on the DHS survey 2011 are not available. The spatial distribution of literacy on commune level is not clearly visible and the identification of hot spots of illiterates with a need for additional infrastructure might be difficult. The application of small area methods could significantly improve the interpretation of Figure 2 by providing results for the communes with zero sample size. This requires fitting an appropriate model to the survey data. The estimated model parameters are then combined with known population information. The reason that we relied on mobile phone data for predicting socio-demographic indicators is twofold: first, the predictive power of the covariates for socio-demographic indicators from the Senegalese census is limited and second, the ANSD is interested in a widely applicable approach based on the DHS survey for disaggregated indicators independent of census data.
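The matching of census district centroids to commune boundaries can be illustrated with a standard point-in-polygon test. The following is only a minimal sketch and not the routine used in the paper: the commune names, coordinates and function names are hypothetical, the boundaries are treated as simple planar polygons, and a district with missing coordinates stays unmatched, mirroring the exclusion described above.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude), treated as planar
Polygon = List[Point]         # boundary vertices in order, not closed


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count how often a ray going east crosses the boundary."""
    x, y = point
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_commune(point: Optional[Point],
                   communes: Dict[str, Polygon]) -> Optional[str]:
    """Return the first commune containing the centroid, or None if unmatched."""
    if point is None:  # missing coordinates: the district cannot be matched
        return None
    for name, polygon in communes.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical toy data: two square communes and three district centroids.
communes = {
    "commune_A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "commune_B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
centroids = {"district_1": (1.0, 1.0), "district_2": (3.0, 0.5), "district_3": None}
matched = {d: assign_commune(c, communes) for d, c in centroids.items()}
```

In practice a GIS library with proper geographic projections would be preferable; the sketch only shows the logic of assigning each centroid to exactly one commune.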

Figure 1: Estimates for the literacy rate by gender on regional level based on the DHS survey 2011. Panels: male literacy rate in % (by DHS) and female literacy rate in % (by DHS).

Table 1: Sample sizes over communes. Columns: Min., 1st Qu., Median, Mean, 3rd Qu., Max., NA. Rows: Women's questionnaire, Men's questionnaire. (The cell values are not preserved in this transcription.)

2.2 Mobile Phone Data

The mobile phone data used in this paper consist of anonymized call detail records (CDR) from the Senegalese telecommunication company Sonatel covering the year 2013. The dataset is based on more than 9 million unique mobile phone numbers and represents a market share of around 60%. In particular, we had access to the tower-to-tower traffic of all 1666 mobile phone towers in Senegal. In the following we discuss the practical implementation of the processing of the mobile phone data and present details regarding the construction of the mobile phone covariates.

2.2.1 Data processing and cleaning

The preprocessing of the raw mobile phone data is essential and accounts for a considerable amount of time in the whole analysis. The dataset is not dirty or noisy in the sense of an excessive amount of missing values or illogical recorded values. The data are collected automatically by machines and not gathered by hand. This means errors in the data are more likely a consequence of machine breakdowns than of human failure. The traffic of all 1666 towers in Senegal for 2013 amounts to about 1.1 terabytes of data stored in a cloud system. Because of the massive amount of data, the mobile phone records need to be preprocessed directly in the cloud system. In particular, the raw data are organized in a Hadoop cluster with one separate file by hour per day per month. Hadoop is an open-source software for storing and handling massive data. Each single row contains an interaction and has several characteristics, for example indicating whether it is an incoming or outgoing interaction, whether it is a phone call or SMS, which tower received and sent the interaction, or simply the duration of a call in minutes.

Figure 2: Estimates for the literacy rate by gender on commune level based on DHS survey 2011: Senegal (left panel) and Dakar (right panel). Panels: male literacy rate in % (by DHS) and female literacy rate in % (by DHS).

To process these data we used Apache Hive (a data warehouse infrastructure built on top of Hadoop for providing data summarization) and its SQL logic. MapReduce is applied to create daily, monthly and yearly aggregates of the variables of interest on the cluster. The programming model MapReduce is an implementation for processing large datasets with parallel algorithms on a cluster. For instance, the aggregated dataset includes the number of incoming and outgoing calls and SMS as well as the duration of each call for every tower on an hourly basis for the year 2013. Table 2 shows the head of a preprocessed dataset for the usage of SMS.

Table 2: Structure of the call detail records for SMS. Columns: DH, TO, TI, E. (The cell values are not preserved in this transcription.)

The first column is the observation indicator, which reaches around 50 million rows in January 2013 alone. Variable DH tracks the day and hour of a sent SMS; TO and TI are the tower numbers corresponding to outgoing and incoming, respectively; E gives the number of events happening, i.e. SMS being sent. So the first row says that on the 1st of January at midnight 1 SMS was sent from tower 1 to tower 61. We also had access to the exact geo-coordinates (longitude and latitude) of the towers provided by Sonatel.

2.2.2 Construction of mobile phone covariates

Mobile phone data are measured on tower level on an hourly basis with an excessive amount of observations over the year. To construct variables which can be used as covariates in a statistical model for estimating indicators on commune level, the data need to be aggregated along two dimensions: time and geographic level. First, in order to reduce the amount of data, the aggregation was done up to the whole year 2013 for each tower. Annual aggregates may disregard sub-annual trends, but since most of the socio-demographic indicators, especially the literacy rate, are time-insensitive variables, this fact can be neglected.
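The aggregation over time can be sketched in a few lines. The paper performs this step with Hive and MapReduce on the cluster; the Python below is only an illustration on a handful of hypothetical rows in the layout of Table 2. The format of the DH field and the function name are assumptions.

```python
from collections import defaultdict

# Hypothetical rows in the layout of Table 2: (DH, TO, TI, E).
# The DH string format is an assumption made for this illustration.
records = [
    ("2013-01-01 00", 1, 61, 1),
    ("2013-01-01 00", 1, 2, 3),
    ("2013-01-01 01", 2, 1, 2),
    ("2013-02-01 10", 61, 1, 4),
]


def aggregate_by_tower(rows):
    """Sum the event counts over all hours, separately for the sending
    (outgoing) and the receiving (incoming) tower."""
    outgoing = defaultdict(int)
    incoming = defaultdict(int)
    for _dh, tower_out, tower_in, events in rows:
        outgoing[tower_out] += events
        incoming[tower_in] += events
    return dict(outgoing), dict(incoming)


outgoing_sms, incoming_sms = aggregate_by_tower(records)
```

Dropping the hour field in this way yields one yearly total per tower, which is the level at which the covariates are subsequently averaged over communes.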
Second, to have the covariates on the same geographical level as the DHS survey, we used the time-aggregated covariates on tower level and averaged them for higher geographic levels like communes or regions. Note that, as the actual coverage of the mobile towers is unknown, we matched the geo-coordinates of the towers with the geographical boundaries of the 431 communes. In total we constructed around 70 mobile phone covariates on commune level based on the call detail records. The aggregation routine is done in R by using the package data.table. The package extends data.frames in R based on SQL logic and focuses on fast aggregation of large data (Dowle et al., 2014).

For instance, we construct the sum of the number of calls starting from/ending in a specific tower and denote these variables as outgoing calls / incoming calls, respectively. In addition, we also build the variable call volume, which sums up the minutes of calls. In the following we label SMS and phone calls together as events. For each event type we also calculated the ratio of the number of outgoing events to the number of incoming events. The variable mean distance is defined as the average distance in kilometers for an event. In particular, the distance is computed on the tower level by taking the distance from the outgoing tower to the incoming tower for each event and dividing it by the number of events between the two towers. The covariate distance-to-dakar measures the distance from each tower to a centroid of the region Dakar.

Table 3: Mobile phone towers over communes. Columns: Min., 1st Qu., Median, Mean, 3rd Qu., 90%, Max., NA. Row: Number of towers. (The cell values are not preserved in this transcription.)

Following Smith et al. (2013) we construct the variable isolation, which quantifies the diversity of interactions by users of a tower. The variable is defined for an outgoing tower $t_i$ by

$$\mathrm{Isolation}(t_i) = \sum_{j=1}^{1666} I\{E(t_i, t_j)\}, \tag{1}$$

where the indicator function $I$ is 1 if the condition $E(t_i, t_j)$ is true, i.e. an event happened between the towers $t_i$ and $t_j$, and 0 otherwise. The variable ranges between 0 and 1666 (total number of towers). We measure the average amount of information an event contains by the variable Entropy (Montjoye et al., 2014). The intuition behind Entropy is that the more unlikely an event is to happen, the more information it contains once it happens. Entropy for a tower $t_i$ is defined by

$$\mathrm{Entropy}(t_i) = -\sum_{j=1}^{1666} p(t_i, t_j) \log\left[ p(t_i, t_j) \right], \tag{2}$$

where $p(t_i, t_j)$ is the probability of an event between the towers $t_i$ and $t_j$. In addition, we calculated the monthly growth and the variation (i.e. variance) of monthly aggregates for the number and volume of events respectively. The variables calls-to-dakar and sms-to-dakar reflect the amount of calls or SMS for each tower that were directed to towers located in the capital Dakar. A complete list and description of the covariates is provided in the supplementary materials.

In addition to the variables described above and in the supplementary materials, we created behavioral indicators based on the mobile phone data with the open-source Python toolkit bandicoot (Montjoye et al., 2013). A list of these variables can be found in the bandicoot documentation. As the bandicoot indicators are constructed for analyzing individual patterns based on the mobile behavior of each single user, we summarized the information to tower level. In particular, a bandicoot indicator on tower level is calculated as a weighted average of all individuals' indicators where this tower was part of the interaction.
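The isolation and entropy covariates in (1) and (2) can be computed directly from tower-to-tower event counts. The sketch below is illustrative only: the function name and the toy counts are hypothetical, and it assumes that $p(t_i, t_j)$ is estimated by the share of tower $t_i$'s outgoing events that go to tower $t_j$, which the text does not spell out.

```python
import math
from collections import defaultdict


def isolation_and_entropy(events):
    """events maps (outgoing tower, incoming tower) -> yearly event count.

    Returns {outgoing tower: (isolation, entropy)}, where isolation counts
    the distinct towers reached, as in (1), and entropy follows (2) with
    p estimated by the share of the tower's outgoing events (assumption).
    """
    by_origin = defaultdict(dict)
    for (t_out, t_in), count in events.items():
        if count > 0:  # pairs without events contribute to neither measure
            by_origin[t_out][t_in] = count

    result = {}
    for t_out, targets in by_origin.items():
        total = sum(targets.values())
        isolation = len(targets)
        entropy = -sum((c / total) * math.log(c / total)
                       for c in targets.values())
        result[t_out] = (isolation, entropy)
    return result


# Hypothetical toy counts: tower 1 splits its events evenly over two towers,
# tower 2 only ever contacts tower 1.
events = {(1, 61): 5, (1, 2): 5, (2, 1): 7, (2, 2): 0}
stats = isolation_and_entropy(events)
```

A tower whose events all go to a single partner has entropy zero, while an even split over many partners gives the largest values.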
The steps are as follows: first, we calculated the bandicoot indicators on a monthly level for all single users. Second, we extracted the number of interactions (calls and SMS) during that month for each user and tower combination from the call detail records. Third, we used the number of interactions as a weight to average the individuals' indicators on tower level for each month. Finally, we averaged the monthly values to obtain a yearly indicator for each tower.

2.2.3 First descriptive statistics

Figure 3 gives a first impression of the spatial distribution of the 1666 mobile phone towers (red points) in Senegal. The towers are spread over the whole country with higher densities in regions with higher population densities. For instance, most of the towers are located in the region of the capital Dakar, which itself is located on the Cap-Vert Peninsula on the Atlantic coast in the west. Table 3 shows summary statistics of the number of mobile phone towers over the communes. The mean number of towers per commune is 4.1 with a maximum of 60.

Figure 3: Location of mobile phone towers in Senegal.

Although Figure 3 suggests a good coverage of the country by mobile phone towers, there are 30 communes without mobile phone towers. Most of these communes are quite small and they are mainly covered by nearby towers. For instance, the map at the top right of Figure 3 shows the area around the commune Badegne Ouolof, which has no tower information. Badegne Ouolof is located in north-western Senegal within the Louga Region and covers around 300 square kilometers. The centroid of Badegne Ouolof is represented by a blue triangle. In order to apply small area estimation methods for the out-of-covariate communes, the covariates are constructed by inverse distance weighting from neighboring mobile towers. In particular, the covariates assigned to out-of-covariate communes are calculated as a weighted average of the covariates available at known tower locations. We used the Euclidean distance function and a power parameter of 2 in the weighting.

3 Description of the small area estimation method

In this section we describe the methodological approach for constructing socio-demographic indicators based on mobile phone data. Since our aim is to provide an easily applicable approach for the production of official statistics, especially for the ANSD in Senegal, we apply relatively simple small area estimation methods and correct for misspecifications by adjustments. The implemented approach should meet three conditions:

1. the method should provide estimates for all 431 communes in Senegal;
2. the estimates should be close to the direct estimators for communes with large sample sizes;
3. the aggregated estimates for the communes should produce the official national estimate for the country.

Note that the Ministry of Chile recently conducted a small area project for the estimation of poverty in Chile based on similar guidelines (Casas-Cordero et al., 2016). In addition, the mobile phone covariates are only available on area level (communes) and it is not possible to link the individuals in the survey with the mobile phone numbers because of confidentiality constraints. Based on the mentioned conditions and available data we consider a benchmarked transformed Fay-Herriot estimator in this paper. MSE estimation is performed by a parametric bootstrap approach.

3.1 Transformed Fay-Herriot estimator

We assume that the population $U$, consisting of $N$ units, is divided into $m$ disjoint small areas. The sample $s$ is selected from the population by using a complex sampling design. The population is separated into $n$ sampled and $N - n$ non-sampled units, indexed by $s$ and $r$, respectively. We use the subscript $i$ to indicate the restriction to area $i$; for instance, $n_i$ and $N_i$ denote the sample size and the population size in area $i$, respectively. Let $y$ denote a continuous variable of interest, $y_{ij}$ the response value of unit $j$ in area $i$ and $\omega_{ij}$ the corresponding sampling weights. A design-unbiased estimator for the population mean $\theta_i$ of the variable of interest $y$ in area $i$ is given by

$$\hat{\theta}_i^{\mathrm{direct}} = \frac{1}{N_i} \sum_{j=1}^{n_i} \omega_{ij} y_{ij}, \tag{3}$$

and $\mathrm{Var}(\hat{\theta}_i^{\mathrm{direct}})$ denotes the corresponding variance. The area-level model proposed by Fay and Herriot (1979) (hereafter FH model) links the direct estimates with area-level covariates. The FH model is based on two stages:

$$\text{Sampling model (first stage):} \quad \hat{\theta}_i^{\mathrm{direct}} = \theta_i + \varepsilon_i \tag{4}$$

$$\text{Linking model (second stage):} \quad \theta_i = x_i^T \beta + u_i, \tag{5}$$

where $x_i$ and $\beta$ denote the $(k \times 1)$ vectors of area-level covariates and regression parameters, respectively. The sampling errors are assumed to be normally distributed and independent with $\varepsilon_i \sim N(0, \sigma^2_{\varepsilon_i})$. Furthermore, $\sigma^2_{\varepsilon_i}$ is estimated based on the design of the survey and treated as known, for instance, $\sigma^2_{\varepsilon_i} = \mathrm{Var}(\hat{\theta}_i^{\mathrm{direct}})$. The random effects $u_i$ are assumed to be independently normally distributed with $u_i \sim N(0, \sigma_u^2)$. For additional details we refer to Rao and Molina (2015). The combination of both models leads to an area-level linear mixed model given by

$$\hat{\theta}_i^{\mathrm{direct}} = x_i^T \beta + u_i + \varepsilon_i.$$
(6)

Let $\hat{\beta}$ denote the empirical best linear unbiased estimator (EBLUE) of $\beta$ and $\hat{u}_i$ the empirical best linear unbiased predictor (EBLUP) of $u_i$ (Henderson, 1950; Searle, 1971), where the variance component $\sigma_u^2$ can be estimated by maximum likelihood or restricted maximum likelihood (Datta and Lahiri, 2000; Rao, 2003). The EBLUP under the FH model is obtained by

$$\hat{\theta}_i^{\mathrm{FH}} = x_i^T\hat{\beta} + \hat{u}_i \tag{7}$$

$$\hphantom{\hat{\theta}_i^{\mathrm{FH}}} = \gamma_i\hat{\theta}_i^{\mathrm{direct}} + (1-\gamma_i)\,x_i^T\hat{\beta}, \tag{8}$$

where $\gamma_i = \hat{\sigma}_u^2(\hat{\sigma}_u^2 + \sigma_{\varepsilon_i}^2)^{-1}$ denotes the shrinkage factor for area $i$. In practice, many of the small areas may have zero sample sizes, so a direct estimator is not available. In this case we rely on synthetic estimation as follows (Rao and Molina, 2015):

$$\hat{\theta}_i^{\mathrm{FH,out}} = x_i^T\hat{\beta}. \tag{9}$$

The MSE of the EBLUP in (7) can be obtained by analytic solutions following Prasad and Rao (1990) and Datta et al. (2005).

Some socio-demographic indicators are restricted to a specific range. For instance, the share of literates in an area should be within the interval [0, 1]. However, there is no guarantee that the FH model produces estimates within a particular range. Following Carter and Rolph (1974) and Raghunathan et al. (2007) we use an arcsine transformation in modeling. Let $y$ now denote a binary variable of interest and $y_{ij}$ the 0-1 response value of unit $j$ in area $i$. The steps of the estimation are as follows:

1. Transform the direct estimator via $\vartheta_i = f(\hat{\theta}_i^{\mathrm{direct}}) = \arcsin\sqrt{\hat{\theta}_i^{\mathrm{direct}}}$.
2. The sampling variance of $\vartheta_i$ is approximated by $\sigma_{\varepsilon_i}^2 = 1/(4\tilde{n}_i)$, where $\tilde{n}_i$ stands for the effective sample size (Carter and Rolph, 1974). In particular, the effective sample size is the sample size divided by an estimate of the design effect.
3. Estimate $\hat{\theta}_i^{\mathrm{FH}}\{\vartheta_i, 1/(4\tilde{n}_i)\}$ according to (7), that is, with $\vartheta_i$ as the direct estimate and $1/(4\tilde{n}_i)$ as its sampling variance. $\hat{\theta}_i^{\mathrm{FH}}$ is truncated to the interval $[0, \pi/2]$ if necessary.
4. Back-transform the estimator $\hat{\theta}_i^{\mathrm{FH}}$ to the original scale via

$$\hat{\theta}_i^{\mathrm{FH,trans}} = f^{-1}(\hat{\theta}_i^{\mathrm{FH}}) = \sin^2(\hat{\theta}_i^{\mathrm{FH}}) \quad \text{for } i = 1, \ldots, m, \tag{10}$$

where $\hat{\theta}_i^{\mathrm{FH,trans}}$ denotes the transformed FH estimator.

For the MSE estimation of $\hat{\theta}_i^{\mathrm{FH,trans}}$ we use a parametric bootstrap procedure following Gonzalez-Manteiga et al. (2008). The steps are as follows:

1. For given $\hat{\beta}$ and $\hat{\sigma}_u^2$, estimated with the transformed direct estimator $\vartheta_i$, sampling variance $1/(4\tilde{n}_i)$ and covariates $x_i$, we generate $u_i^*$ from $N(0, \hat{\sigma}_u^2)$ and $\varepsilon_i^*$ from $N(0, 1/(4\tilde{n}_i))$.
2. Use $u_i^*$ and $\varepsilon_i^*$ to generate the bootstrap sample

$$\hat{\theta}_i^{*,(b)} = x_i^T\hat{\beta} + u_i^* + \varepsilon_i^* \tag{11}$$

and the corresponding bootstrap population

$$\theta_i^{*,(b)} = x_i^T\hat{\beta} + u_i^*. \tag{12}$$

3. Using the bootstrap sample, we estimate the model parameters in (6). Based on the estimated model parameters from the bootstrap sample, we compute the corresponding FH estimator (7) in area $i$, $\hat{\theta}_i^{\mathrm{FH},(b)}$.
4. Using the $B$ bootstrap samples, the MSE estimator of $\hat{\theta}_i^{\mathrm{FH,trans}}$ is given by

$$\widehat{\mathrm{MSE}}(\hat{\theta}_i^{\mathrm{FH,trans}}) = \frac{1}{B}\sum_{b=1}^{B}\left( f^{-1}\{\hat{\theta}_i^{\mathrm{FH},(b)}\} - f^{-1}\{\theta_i^{*,(b)}\} \right)^2. \tag{13}$$

The properties of this bootstrap scheme are empirically evaluated in Section 5. Furthermore, we use the MSE estimates of $\hat{\theta}_i^{\mathrm{FH,trans}}$ for the benchmarked transformed FH estimator introduced in Section 3.2.
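The point-estimation steps of the transformed FH estimator described above can be sketched for a single area as follows. This is only a minimal illustration in Python, not the authors' R implementation: it assumes that the synthetic part $x_i^T\hat{\beta}$ (on the arcsine scale) and $\hat{\sigma}_u^2$ have already been estimated elsewhere, and all function and argument names are hypothetical.

```python
import math


def arcsine_transform(p):
    """Step 1: map a proportion in [0, 1] to the arcsine scale."""
    return math.asin(math.sqrt(p))


def back_transform(theta):
    """Step 4: map an arcsine-scale value back to a proportion, eq. (10)."""
    return math.sin(theta) ** 2


def effective_sample_size(n, design_effect):
    """Sample size divided by an estimate of the design effect."""
    return n / design_effect


def transformed_fh(direct, n_eff, synthetic, sigma2_u):
    """Transformed FH estimate for one area (hypothetical helper).

    direct    -- direct estimate of the proportion, or None if the area
                 is out-of-sample
    n_eff     -- effective sample size (ignored if direct is None)
    synthetic -- x_i' beta-hat on the arcsine scale (assumed given)
    sigma2_u  -- estimated variance of the area effect (assumed given)
    """
    if direct is None:
        # Synthetic estimation for out-of-sample areas, eq. (9).
        theta = synthetic
    else:
        var_eps = 1.0 / (4.0 * n_eff)            # step 2
        gamma = sigma2_u / (sigma2_u + var_eps)  # shrinkage factor
        theta = gamma * arcsine_transform(direct) + (1.0 - gamma) * synthetic  # eq. (8)
    # Step 3: truncate to [0, pi/2] if necessary.
    theta = min(max(theta, 0.0), math.pi / 2.0)
    return back_transform(theta)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the DHS data.
    n_eff = effective_sample_size(n=120, design_effect=2.4)
    print(transformed_fh(0.2, n_eff, arcsine_transform(0.4), sigma2_u=0.005))
```

With $\hat{\sigma}_u^2$ equal to the sampling variance the shrinkage factor is 0.5, so the result lies between the direct and the synthetic value; the back-transformation guarantees a value in [0, 1].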

3.2 Benchmarked transformed Fay-Herriot estimator

Although the model-based estimator in (10) provides estimates for all communes (small areas) in Senegal, the aggregated estimates on national level can differ substantially from the corresponding direct estimator. Following Datta et al. (2010) we use a benchmark approach to achieve internal consistency with the direct estimator on national level. We seek a benchmarked FH estimator $\hat{\theta}_i^{\mathrm{FH,bench}}$ such that

$$\sum_{i=1}^{m} w_i\,\hat{\theta}_i^{\mathrm{FH,bench}} = \alpha, \quad \text{where } \alpha = \sum_{i=1}^{m} w_i\,\hat{\theta}_i^{\mathrm{direct}}.$$

We define the weights by $w_i = N_i/N$. We define the benchmarked transformed FH estimator (Datta et al., 2010) by

$$\hat{\theta}_i^{\mathrm{FH,trans,bench}} = \hat{\theta}_i^{\mathrm{FH,trans}} + \left(\sum_{j=1}^{m}\frac{w_j^2}{\phi_j}\right)^{-1}\left(\alpha - \sum_{j=1}^{m} w_j\,\hat{\theta}_j^{\mathrm{FH,trans}}\right)\frac{w_i}{\phi_i} \quad \text{for } i = 1, \ldots, m. \tag{14}$$

There are several ways to define the weights $\phi_i$ (Datta et al., 2010). For instance, $\phi_i = w_i/\hat{\theta}_i^{\mathrm{FH,trans}}$ leads to a ratio adjustment of the FH estimator, where small areas with larger estimates receive a larger adjustment and vice versa. As $\hat{\theta}_i^{\mathrm{FH,trans}}$ is restricted to [0, 1], we define the weights by $\phi_i = w_i/\widehat{\mathrm{MSE}}(\hat{\theta}_i^{\mathrm{FH,trans}})$. This means that small areas with higher variability in terms of MSE receive a larger adjustment. Note that the benchmarked FH estimator for (7) is defined analogously.

For the MSE estimation of $\hat{\theta}_i^{\mathrm{FH,trans,bench}}$ we also apply a parametric bootstrap procedure following Gonzalez-Manteiga et al. (2008). The steps are as follows:

1. For given $\hat{\beta}$ and $\hat{\sigma}_u^2$, estimated with the transformed direct estimator $\vartheta_i$, sampling variance $1/(4\tilde{n}_i)$ and covariates $x_i$, we generate $u_i^*$ from $N(0, \hat{\sigma}_u^2)$ and $\varepsilon_i^*$ from $N(0, 1/(4\tilde{n}_i))$.
2. Use $u_i^*$ and $\varepsilon_i^*$ to generate the bootstrap sample $\hat{\theta}_i^{*,(l)}$ and the corresponding bootstrap population $\theta_i^{*,(l)}$ according to equations (11) and (12), respectively.
3. Using the bootstrap sample, estimate the transformed FH estimator (10) and the corresponding MSE estimator (13) in area $i$. Note that this step involves the $B$ bootstrap replications described in (13).
4. Compute the corresponding benchmarked transformed FH estimator (14) in area $i$, $\hat{\theta}_i^{\mathrm{FH,trans,bench},(l)}$.
5. Using the $L$ bootstrap samples, the MSE estimator of $\hat{\theta}_i^{\mathrm{FH,trans,bench}}$ is given by

$$\widehat{\mathrm{MSE}}(\hat{\theta}_i^{\mathrm{FH,trans,bench}}) = \frac{1}{L}\sum_{l=1}^{L}\left( \hat{\theta}_i^{\mathrm{FH,trans,bench},(l)} - f^{-1}\{\theta_i^{*,(l)}\} \right)^2. \tag{15}$$

The MSE estimation for the proposed benchmarked transformed FH estimator is computationally demanding because it involves $B \times L$ bootstrap replications.
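To make the benchmarking adjustment in (14) concrete, the following Python sketch applies it to a handful of areas. It is a minimal illustration under stated assumptions: the function names are hypothetical, the inputs (area estimates, weights $w_i$, MSE estimates and benchmark $\alpha$) are made-up numbers, and the MSE estimates are taken as given rather than produced by the bootstrap.

```python
def mse_based_phi(weights, mse):
    """Weights phi_i = w_i / MSE_i, the choice described in the text."""
    return [w / m for w, m in zip(weights, mse)]


def benchmark(estimates, weights, phi, alpha):
    """Adjust area estimates so that their weighted sum equals alpha, eq. (14)."""
    if not (len(estimates) == len(weights) == len(phi)):
        raise ValueError("estimates, weights and phi must have equal length")
    # Gap between the benchmark and the aggregated model-based estimate.
    gap = alpha - sum(w * t for w, t in zip(weights, estimates))
    # Normalizing constant: sum over areas of w_j^2 / phi_j.
    scale = sum(w * w / p for w, p in zip(weights, phi))
    # Each area receives a share of the gap proportional to w_i / phi_i.
    return [t + gap * (w / p) / scale for t, w, p in zip(estimates, weights, phi)]


if __name__ == "__main__":
    # Illustrative inputs only; they are not results from the paper.
    weights = [0.5, 0.3, 0.2]      # w_i = N_i / N
    fh_trans = [0.30, 0.40, 0.50]  # transformed FH estimates
    mse = [0.001, 0.004, 0.002]    # assumed MSE estimates
    alpha = 0.42                   # assumed national direct estimate
    bench = benchmark(fh_trans, weights, mse_based_phi(weights, mse), alpha)
    print(bench)
    print(sum(w * b for w, b in zip(weights, bench)))  # equals alpha up to rounding
```

With $\phi_i = w_i/\widehat{\mathrm{MSE}}_i$ the factor $w_i/\phi_i$ reduces to the MSE estimate itself, so the adjustment of each area is proportional to its estimated MSE, which is the behaviour described above.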

4 Application: estimating literacy rates in Senegal

In this section the benefits of using the presented Fay-Herriot-type estimators in combination with mobile phone covariates for the estimation of socio-demographic indicators are illustrated in an application which uses the data from the DHS survey 2011 and the mobile phone data described in Section 2. The application aims at estimating the literacy rate by gender on commune level in Senegal. The analysis is carried out using the variables literacy women and literacy men from the gender-specific questionnaires introduced in Section 2. The estimates are used to identify hot spots of illiterate women for the PAJEF project with a need for additional infrastructure and financial support from the government.

4.1 Model selection and model checking

Before proceeding with the analysis of literacy in Senegal, we discuss the model selection and present some diagnostic plots. The model selection in this paper is done using the classic Akaike information criterion (AIC) based on a linear model. Although we are aware of more complex methods for Fay-Herriot model selection discussed in Marhuenda et al. (2014), we used a simple approach which is implemented in standard statistical software. Based on the wide range of mobile phone covariates discussed in Section 2 we identified the final set of covariates by a stepwise selection procedure using the AIC. The final models on commune level for the variables literacy women and literacy men include 26 and 30 mobile phone covariates with an adjusted $R^2$ of 68% and 52%, respectively.

Based on the transformed direct estimates from the DHS survey 2011 and the set of selected mobile phone covariates on commune level we fitted area level mixed models (6) by gender. As discussed in Section 3, the sampling variances of the direct estimates are approximated by $1/(4\tilde{n}_i)$, where $\tilde{n}_i$ denotes the sample size divided by the design effect. Following Casas-Cordero et al. (2016), we used the design effect on regional level as an approximation for the design effect on commune level.
The reason is that the variance estimation of the direct estimator is unstable because of a low number of clusters, or not directly possible at all because only one cluster is nested in some communes. We refer to Opsomer et al. (2012) for a recent discussion of this issue in the context of forestry data. Table 4 reports the design effects of the direct estimators by gender on regional level in Senegal. The estimates are consistent with official results published by the ANSD (2012) in Senegal and show high values of the design effect of the direct estimator using the DHS survey.

Table 4: Design effects of the direct estimator in Senegal by region. Columns: Region, Female, Male. Regions: Dakar, Diourbel, Fatick, Kaffrine, Kaolack, Kedougou, Kolda, Louga, Saint Louis, Matam, Sedhiou, Tambacounda, Thies, Ziguinchor.

Figure 4 shows normal probability plots of level 1 and level 2 residuals obtained from fitting the female model (left panel) and the male model (right panel). The figure indicates some small departures from normality, especially in the tails of the distribution. However, the departures are not severe. The Shapiro-Wilk test supports the lack of evidence against the normality assumption for the level 1 standardized residuals (p-values: male model = and female model = ) and level 2 standardized residuals (p-values: male model = and female model = ). Using the transformed Fay-Herriot model (10) may therefore be advisable for estimating the literacy of women and men.

Figure 4: Normal probability plots of level 1 and level 2 standardized residuals (top down) for the female model (left panel) and for the male model (right panel). Axes: theoretical quantiles (horizontal), sample quantiles (vertical).

4.2 Small area estimates on commune level

Estimates of the literacy rate by gender for each commune are calculated using the transformed FH estimator (10) (FH Trans) and the benchmarked transformed FH estimator (14) (FH Bench). MSE estimation for the FH Trans and FH Bench is implemented with the parametric bootstrap approaches discussed in Section 3. We performed B = 200 replicates for the bootstrap of the FH Trans (13) and B = 200 with L = 200 replicates for the bootstrap of the FH Bench (15). We also include the direct estimator to assess the resulting estimates, as the model-based estimators should be consistent with the unbiased direct estimators but with a higher precision. Note that direct estimation is not an option for the DHS survey 2011 on commune level because around 45% of the communes are out-of-sample. The estimators are implemented by computationally efficient algorithms using R. The code is available from the authors upon request.

Table 5 reports the distribution of estimated literacy rates for women in the communes in Senegal, the corresponding estimated RMSE and the coefficient of variation (CV). Note that we do not report variance estimates for the direct estimator because there was only one sampling cluster nested in most of the communes. Our first observation is that the estimates for the literacy rate are higher for the FH Bench compared to the FH Trans. The reason is that the aggregated FH Trans estimates (36.1%) on national level slightly underestimate the national share of literate women (38%).
However, the differences are more pronounced for the out-of-sample communes. In order to investigate the reason for these differences, we had a closer look at the estimated RMSE of the FH Trans in Table 5. As expected, the estimated RMSE are smaller for the in-sample communes compared to the out-of-sample communes. As the weights for the FH Bench are defined by $\phi_i = w_i/\widehat{\mathrm{MSE}}(\hat{\theta}_i^{\mathrm{FH,trans}})$, we expect that communes with higher variability in terms of RMSE receive a larger adjustment. This is also confirmed by Figure 5. The plot shows the differences between the FH Bench and the FH Trans (solid line) in relation to the size of the estimated RMSE of the FH Trans (dashed line) on commune level. We can note that (i) the adjustments due to the benchmarking are larger than zero for all communes; (ii) the adjustments are proportional to the size of the estimated RMSE; (iii) the adjustments for the in-sample communes are smaller compared to the out-of-sample communes because of the smaller RMSE indicated by the dashed line.

Table 5: Distribution of the female literacy rates, estimated RMSE and coefficient of variation over communes in Senegal. Panels: 233 in-sample communes, out-of-sample communes, out-of-covariate communes. Columns: Indicator, Estimator, Min., 1st Qu., Median, Mean, 3rd Qu., Max. Rows per panel: Point Est. (Direct for in-sample communes only, FH Trans, FH Bench), RMSE (FH Trans, FH Bench), CV (FH Trans, FH Bench).

In order to save space, the corresponding table and figures for the estimated literacy rates for men on commune level are reported in the supplementary materials. As the required approach for the ANSD should meet the third guideline, namely that the aggregated estimates for the communes should reproduce the official national estimate for Senegal, we focus in the following only on the benchmarked transformed FH estimator for women and men. To assess the resulting estimates of the FH Bench for female and male literacy we compare the estimates with the direct unbiased estimates in Figure 6.
The figure shows the FH Bench versus the direct estimates for the literacy rate by gender on commune level. We expect that the estimates of the FH Bench are similar to the direct estimates, especially in communes with larger sample size. We observe that the direct estimates and the FH Bench behave similarly for the female model. In contrast, the model-based estimates differ more from the direct estimates for the male model. This is because the sample size in the women's questionnaire is almost three times as big as the one in the men's questionnaire. We note that the model-based estimates are larger than the direct estimates for small values and smaller for large values.

Figure 5: Differences between the FH Bench and the FH Trans on commune level for the female model.

Figure 6: Estimates for literacy for women (left panel) and men (right panel): direct estimates vs. model-based estimates based on the FH Bench.

4.3 Literacy rates by gender in Senegal

Having assessed the results of the estimators from a statistical perspective, we now discuss the results of the benchmarked transformed FH estimator in the context of female and male literacy in Senegal. Figure 7 shows the estimates for literacy by gender on commune level for the capital Dakar (right panel) and for the rest of Senegal (left panel). In order to simplify the interpretation of the results, Figure 7 presents geographical maps for Dakar and for Senegal which are extracted from Google Maps. As a first comment, we note that the relative spatial distributions of male and female literacy rates are very similar in the Dakar region and in the rest of Senegal. Having a closer look at the Dakar region (right panel), we observe that the coastal area, where the city of Dakar and its harbor are located, shows a very high rate of literates for men and women. This trend continues when moving from the peninsula closer to the mainland and is only interrupted by a pocket of lower literacy around the district of Pikine (located to the east of the lake in the middle of Dakar). The district was founded in 1952 by the French colonial government for the former residents of the coastal area around the harbor. Since 1967, it has been forbidden by law to build houses on this land because of problems with flooding. Today, however, illegal housing of migrant workers and refugees dominates this area, which is reflected in remarkably low literacy rates. Moving further into the interior of the country, the area gets more rural and the literacy rate shrinks.

We now turn to the estimated literacy rates for the rest of Senegal in Figure 7 (left panel). Besides the Dakar region, the region around Ziguinchor, south of Gambia, reveals a high literacy rate for men and women. The high literacy rates can be explained by its strategic position between Guinea-Bissau and Gambia as well as by its closeness to the Atlantic Ocean. Ziguinchor is Senegal's second largest city and it is also the trade center of the Casamance region (the area of Senegal south of Gambia including the Casamance river). Another reason is that the Casamance region is ethnically different from the other parts of Senegal. The region consists mainly of Jola people with a strong influence of Christianity, whereas Islam is the predominant religion in most other parts of the country (Hel, 2014). Another finding is that communes closer to the ocean and to the borders with Mauritania in the north and with Guinea-Bissau in the south have higher literacy rates for men and women.
In contrast, communes located on the borders with Mali (south-east) and with Gambia tend to have lower ones. As expected, the density of mobile phone towers in Figure 3 is higher in communes with higher literacy rates. Rural communes with a low coverage of mobile phone towers seem to have lower literacy rates in general. Especially the central part of Senegal in the Matam and Tambacounda regions reveals high shares of illiterate men and women.

Although the relative distributions are very similar across Senegal, Figure 7 reveals clear differences in terms of absolute values. The literacy rate for women is around 20% lower compared to men. The reasons are manifold: especially in poor regions like Matam and Tambacounda in the eastern part of the country, girls are involved in economic activities, and parents therefore keep them out of school to earn some additional income. Next to economic reasons, unsafe and long roads to school, gender-based violence, early marriage and pregnancy, the traditional role of women in society and the low quality of the education system are further issues which add to low literacy rates for women.

The PAJEF project, already mentioned in the introduction, aims to boost literacy among women in Senegal and is currently conducted by UNESCO Dakar and the government of Senegal (UNESCO, 2015). The project runs in the seven regions (Dakar, Diourbel, Fatick, Kedougou, Matam, Saint-Louis and Tambacounda) with the lowest literacy rates identified by the ANSD based on the DHS survey. The seven regions and the corresponding literacy rates for women are displayed in Figure 8 (right panel). The regions cover around 50% of the country. The left panel shows the communes whose literacy rate for women falls within the lowest 20%, estimated using the DHS survey 2011 in combination with mobile phone covariates. There are some hot spots, for example around Gambia in the Ziguinchor region or in the western part of Senegal, with low literacy rates for women but without any financial support.
In contrast, the PAJEF project provides financial support to the Saint-Louis region in the north of Senegal and to Dakar, where the female literacy rates are above average.

Figure 7: Estimates for the literacy rate by gender on commune level based on a benchmarked FH model: Senegal (left panel) and Dakar (right panel). Panels show the male and female literacy rate in %; axes: lon, lat.

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

On Outlier Robust Small Area Mean Estimate Based on Prediction of Empirical Distribution Function

On Outlier Robust Small Area Mean Estimate Based on Prediction of Empirical Distribution Function On Outler Robust Small Area Mean Estmate Based on Predcton of Emprcal Dstrbuton Functon Payam Mokhtaran Natonal Insttute of Appled Statstcs Research Australa Unversty of Wollongong Small Area Estmaton

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Chapter 3 Describing Data Using Numerical Measures

Chapter 3 Describing Data Using Numerical Measures Chapter 3 Student Lecture Notes 3-1 Chapter 3 Descrbng Data Usng Numercal Measures Fall 2006 Fundamentals of Busness Statstcs 1 Chapter Goals To establsh the usefulness of summary measures of data. The

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Basically, if you have a dummy dependent variable you will be estimating a probability.

Basically, if you have a dummy dependent variable you will be estimating a probability. ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

University, Bogor, Indonesia.

University, Bogor, Indonesia. ROBUST SMALL AREA ESTIMATION FOR HOUSEHOLD CONSUMPTION EXPENDITURE QUANTILES USING M-QUANTILE APPROACH (CASE STUDY: POVERTY INDICATOR DATA IN BOGOR DISTRICT) Kusman Sadk 1,a), Grnoto 1,b), Indahwat 1,c)

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Uncertainty as the Overlap of Alternate Conditional Distributions

Uncertainty as the Overlap of Alternate Conditional Distributions Uncertanty as the Overlap of Alternate Condtonal Dstrbutons Olena Babak and Clayton V. Deutsch Centre for Computatonal Geostatstcs Department of Cvl & Envronmental Engneerng Unversty of Alberta An mportant

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Efficient nonresponse weighting adjustment using estimated response probability

Efficient nonresponse weighting adjustment using estimated response probability Effcent nonresponse weghtng adjustment usng estmated response probablty Jae Kwang Km Department of Appled Statstcs, Yonse Unversty, Seoul, 120-749, KOREA Key Words: Regresson estmator, Propensty score,

More information

Bias-correction under a semi-parametric model for small area estimation

Bias-correction under a semi-parametric model for small area estimation Bas-correcton under a sem-parametrc model for small area estmaton Laura Dumtrescu, Vctora Unversty of Wellngton jont work wth J. N. K. Rao, Carleton Unversty ICORS 2017 Workshop on Robust Inference for

More information

Statistics for Economics & Business

Statistics for Economics & Business Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting

Online Appendix to: Axiomatization and measurement of Quasi-hyperbolic Discounting Onlne Appendx to: Axomatzaton and measurement of Quas-hyperbolc Dscountng José Lus Montel Olea Tomasz Strzaleck 1 Sample Selecton As dscussed before our ntal sample conssts of two groups of subjects. Group

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data Lab : TWO-LEVEL NORMAL MODELS wth school chldren popularty data Purpose: Introduce basc two-level models for normally dstrbuted responses usng STATA. In partcular, we dscuss Random ntercept models wthout

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

Topic- 11 The Analysis of Variance

Topic- 11 The Analysis of Variance Topc- 11 The Analyss of Varance Expermental Desgn The samplng plan or expermental desgn determnes the way that a sample s selected. In an observatonal study, the expermenter observes data that already

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Chapter 14: Logit and Probit Models for Categorical Response Variables

Chapter 14: Logit and Probit Models for Categorical Response Variables Chapter 4: Logt and Probt Models for Categorcal Response Varables Sect 4. Models for Dchotomous Data We wll dscuss only ths secton of Chap 4, whch s manly about Logstc Regresson, a specal case of the famly

More information

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes 25/6 Canddates Only January Examnatons 26 Student Number: Desk Number:...... DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR Department Module Code Module Ttle Exam Duraton

More information

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method Appled Mathematcal Scences, Vol. 7, 0, no. 47, 07-0 HIARI Ltd, www.m-hkar.com Comparson of the Populaton Varance Estmators of -Parameter Exponental Dstrbuton Based on Multple Crtera Decson Makng Method

More information
