Supplementary Information - On the Predictability of Future Impact in Science

Size: px
Start display at page:

Download "Supplementary Information - On the Predictability of Future Impact in Science"

Transcription

1 Supplementary Information - On the Predictability of uture Impact in Science Orion Penner, Raj K. Pan, lexander M. Petersen, Kimmo Kaski, and Santo ortunato Laboratory of Innovation Management and conomics, IMT Institute for dvanced Studies Lucca, 55 Lucca, Italy epartment of iomedical ngineering and omputational Science, alto University School of Science, P.O. ox, I-76, inland Laboratory for the nalysis of omplex conomic Systems, IMT Institute for dvanced Studies Lucca, 55 Lucca, Italy S. MTHOS. isambiguation strategy The disambiguation problem is a major hurdle in the analysis of careers, as multiple authors having the same initials, and even the same complete name, can appear as a single author. Here we use disambiguated distinct author data from Thomson Reuters Web of Knowledge, isiknowledge.com using their matching algorithms to identify publication profiles of distinct authors. urther, we use its website portal ResearcherI.com, where users upload and maintain their publication profiles. This ISI online database is host to comprehensive data that is well-suited for developing testable models for scientific impact [, ] and career achievement [, ].. Selection of scientists We seek to compare variations in productivity and impact across distinct scientific fields as well as within fields. To do this we analyze a total 76 scientists divided into broad disciplines: 76, 6 biologists, and 5 mathematicians. ataset : or the selection of high-impact, we aggregate all authors who published in Physical Review Letters (PRL) over the 5-year period 95- into a common dataset. rom this dataset, we rank the scientists using the citations shares metric defined in [], and select the top. Such metric divides equally the total number of citations a paper receives among the k coauthors, and also normalizes the total number of citations by a time-dependent factor to account for citation variations across time and discipline. Hence, for each scientist in the PRL database, we calculate a cumulative number of citation shares received from only their PRL publications. This tally serves as a proxy for his/her scientific impact in all journals. We also choose from our ranked PRL list, randomly, additional highly prolific. The selection criteria for that dataset is that an author must have published between and 5 papers in PRL []. This likely ensures that the total publication [] uthors contributed equally. history, in all journals, be on the order of articles for each author selected. These two lists were curated in such a fashion in a previous publications, and as both of them represent prominent, are merged and analyzed together for some of the analysis. The average h-index of these scientists is h = 56 ±. ataset : or the selection of high-impact cell biologists we choose the top careers based on publications in the journal LL. These scientist s have average h- index of h = 97 ±. ataset : or the selection of high-impact mathematicians we selected the 5 authors with the most publications in the prestigious journal nnals of Mathematics. The average h-index of these scientist is h = ±. We choose only 5 since the variation in collaboration and productivity across mathematics is significantly smaller than in the experimental and theoretical natural sciences. The above three datasets consist of high-impact senior scientists with average academic age ±, ± 7 and 6 ±, respectively. ataset : We also consider relatively young assistant professors from physics. To select the scientists in this dataset, we choose two assistant professors from each of the top 5 U.S. physics and astronomy departments ranked according to the magazine U.S. News. The average h-index of these scientists is h = ± 7. or datasets []-[] we used the istinct uthor Sets function provided by ISI in order to increase the likelihood that only papers published by each given author are analyzed. On a case by case basis, we performed further author disambiguation for each author. Other datasets are comprised of a broad range of scientists with profiles on ResearcherI.com who satisfied the criterion of having more than publications. ataset : This dataset consists of 7 scientists who have published in the field of graphene research. dditionally, we also include notable leaders of this field (Nobel Prize Laureates. K. Geim and K. S. Novoselov). The average h-index of this group is h = ±. ataset : This dataset consists of 6 molecular biology and 76 neuroscience ResearcherI scientists. We assume that such ResearcherI scientists have uploaded a representative (approximately update and complete) set of publications. The average h-index of the scientists in this group is h = 7 ±. The last three datasets consist of relatively young scientists with average academic age ±6, ±7 and 7±9,

2 ..5. h-index.5 academic age biologists. 5 5 R Top Prolific t=ll t=7 t= igure S: Variation of the standard coefficient indicating the change in the contribution of each factor over time for different disciplines. The shaded region indicates the 95% confidence error bars. The standard coefficients are shown for t =ll case, where all career ages were lumped together. respectively. In summary, we group the 76 scientists that we analyze into 6 sets. We downloaded datasets in Jan., and in pr., in Oct., and, in October. biologists 5 5 t=7 h-index biologists 5 5 h-index igure S: Variation of the standard coefficient indicating the change in the contribution of each factor over time for different disciplines. The shaded region indicates the 95% confidence error bars. The standard coefficients are shown for different age cohorts t =, 7. S. LSTI NT RGULRIZTION Regression analysis is a statistical technique for estimating the relationships among the dependent and independent variables. If X represent the independent vari- igure S: The predictive power of the regression model of the h-index for physicist for different career age cohorts (years since first publication t =, 5, 7), with () top and () prolific plotted separately. In the curve for t =ll, all career ages were lumped together. or all the cases, overall regression model is significant (p < 6, calculated from -statistic). The first group show slightly larger predictive power as compared to the second group, especially for large t. able and y represent the dependent variable then the regression model relates these two variables as y = f(x, w), (S) where w are the unknown parameters. If the dependent variable is expected to be a linear combination of the independent variables, then in mathematical notation, the predicted value is expressed as ŷ = w + w x + + w p x p (S) Here, w is the intercept and (w,..., w p ) are the coefficients of the model. In general one can use an ordinary least square method to fit a linear model with coefficients to minimize the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation. Thus, it can be represented as min w Xw y. (S) However, coefficient estimates for ordinary least squares rely on the independence of the model terms. When terms are correlated (also termed as collinear) this method becomes highly sensitive to random errors in the observed response, producing a large variance. urther if the number of features is large, it is possible to reduce the complexity of the model by forcing some coefficients to be small or zero. The elastic net regularization does this by imposing preferred solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. This method can be mathematically represented as min w n Xw y + λα w + λ( α) w (S)

3 Here,... and... are the L and L norm of the vector respectively. λ is a complexity parameter that controls the amount of shrinkage: the larger the value of λ, the greater the amount of shrinkage and thus the coefficients become more robust to collinearity. The parameter α controls how much collinearity is expected between features. In all our analysis we set α =., whereas the best λ is determined by cross-validation. We also checked that our results are qualitatively similar for other alpha values, say α =.. S. NULL MOLS N RNOMIZ RRS We used two different shuffling methods to create a set of randomized career profiles that do no have any inherent correlations. We use these careers as a benchmark for determining whether the predictability in the regression model is due to correlations in the scientific careers or to the career measure used. Paper shuffle model: In this null model we start by shuffling papers only within a specific career age t. To be specific, let the paper repository {p} t be the set of papers p published among all authors i =... in career year t. To distribute papers to randomized career profiles, we randomly assign papers from the set {p} t until each author has n i (t) papers, where n i (t) is the value observed in his/her real career. This method approximately retains the collaboration patterns of each scientist (and his/her sub-discipline) which are largely responsible for growth in n i (t) over the career []. We tested this shuffling method by pooling together prestigious from datasets [] and [] into a single paper repository. istinguishing papers according to year t we approximately retain the properties of older papers versus younger papers, while washing out the historical, aging, and reputation effects that make empirical career profiles author specific. Using this repository we constructed, synthetic careers profiles used in our analysis. h model: In this null model we first obtain the distribution of single year h-index increases for all careers in a given dataset. Next, we generate a career by constructing a sequence of yearly h-index increases, drawn randomly from the distribution generated in the previous step. This null model is not based on the number of papers published by a specific scientist but rather on the length of career of each scientist, which is kept fixed as in the original dataset. [] Petersen,. M., Wang,. & Stanley, H.. Methods for measuring the citations and productivity of scientists across time and discipline. Phys. Rev., 6 (). [] Petersen,. M., Stanley, H.. & Succi, S. Statistical regularities in the rank-citation profile of scientists. Scientific Reports (). [] Petersen,. M., Jung, W.-S., Yang, J.-S. & Stanley, H.. Quantitative and empirical demonstration of the matthew effect in a study of career longevity. Proc. Natl. cad. Sci. U.S.., (). [] Petersen,. M., Riccaboni, M., Stanley, H.. & Pammolli,. Persistence and uncertainty in the academic career. Proc. Natl. cad. Sci. U.S.. 9, 5 5 ().

4 ..5. h-index.5 academic age Top Prolific. 5 5 t=5 Top Prolific 5 5 h-index Top Prolific 5 5 t=7 h-index Top Prolific 5 5 h-index igure S: Variation of the standard coefficient indicating the change in the contribution of each factor over time for physicist, with top and () prolific modeled separately. The shaded region indicates the 95% confidence error bars for different career age cohorts (years since first publication t =, 5, 7) for both the groups. The spread of the coefficients within a group and the differences in the coefficients across the two groups, indicate that parameters from one group is not suitable to predict the other group...7 biologists mathematicians.6.5 R Physics assistant prof iologists Graphene researcher R igure S5: The predictive power of h-index increments h(t, t) for different disciplines and age-cohorts. or prominent, biologists and mathematicians t =,...,. s the careers of young scientists are short, for the assistant professors in physics, biologists and graphene researchers t =,...,. In all the cases, overall regression model is significant (p < ). or young researchers, as there were very few career with t >, using regression model leads to over fitting and non-significant results (p >.5).

5 5 Std. oeff. Std. oeff. Std. oeff biologists h N journals top journals t=5 t= t= t= igure S6: Variation of the standard coefficient indicating the change in the contribution of each factor over t, the period for the calculation of h. The top panels show the coefficients for early careers (t = ), the mid range careers (t = 5), and late careers (t = ). 6 5 R =.5 6 R = R = R = R =.5. R = igure S7: orrelation between the past and the true future for prominent mathematics careers. Scatter plot of the (,) number of papers, (,) number of total citations, and (,) non-cumulative h-index h(t {T j}) calculated for each author using non-intersecting sets of papers published in early - mid - and late - career periods. The correlation coefficient R for each plot is also shown. The plot also confirms that for mathematicians, fluctuations in impact measure is low and hence many data points lay on top of each other. 6 R = R =. 5 R =.5 5 R = R =. 5 6 R =.5 5 igure S: orrelation between the past and the true future for prominent biology careers. Scatter plot of the (,) number of papers, (,) number of total citations, and (,) noncumulative h-index h(t {T j}) calculated for each author using non-intersecting sets of papers published in early - mid - and late - career periods. 6 5 R = R =.6 R =. 5 5 R = R = R =. 5 5 igure S9: orrelation between the past and the true future for assistant professors in physics. Scatter plot of the (,) number of papers, (,) number of total citations, and (,) non-cumulative h-index h(t {T j}) calculated for each author using non-intersecting sets of papers published in early - mid - and late - career periods.

6 6 R =. 5 6 R =. R =.6 6 R = R = R =. 6 6 R =. 6 5 R =. R =. R = R =. 5 R =. igure S: orrelation between the past and the true future for careers in biology. Scatter plot of the (,) number of papers, (,) number of total citations, and (,) noncumulative h-index h(t {T j}) calculated for each author using non-intersecting sets of papers published in early - mid - and late - career periods. The correlation coefficient R for each plot is also shown. igure S: orrelation between the past and the true future for graphene researchers. Scatter plot of the (,) number of papers, (,) number of total citations, and (,) noncumulative h-index h(t {T j}) calculated for each author using non-intersecting sets of papers published in early - mid - and late - career periods. The correlation coefficient R for each plot is also shown.

arxiv: v1 [physics.soc-ph] 27 Aug 2013

arxiv: v1 [physics.soc-ph] 27 Aug 2013 The Z-index: A geometric representation of productivity and impact which accounts for information in the entire rank-citation profile Alexander M. Petersen 1 and Sauro Succi 2, 3 1 IMT Lucca Institute

More information

The Memory of Science: Inflation, Myopia, and the Knowledge Network

The Memory of Science: Inflation, Myopia, and the Knowledge Network 1 Supplementary Material The Memory of Science: Inflation, Myopia, and the Knowledge Network Raj K. Pan 1, Alexander M. Petersen 2,3, Fabio Pammolli 2, Santo Fortunato 3 1 Department of Computational Science,

More information

Supplementary Materials for

Supplementary Materials for www.sciencemag.org/content/354/63/aaf539/suppl/dc Supplementary Materials for Quantifying the evolution of individual scientific impact Roberta Sinatra, Dashun Wang, Pierre Deville, Chaoming Song, Albert-László

More information

Persistence and Uncertainty in the Academic Career

Persistence and Uncertainty in the Academic Career Persistence and Uncertainty in the Academic Career Alexander M. Petersen, Massimo Riccaboni, 2, 3 H. Eugene Stanley, 4, 2, 3, 4 and Fabio Pammolli Laboratory for the Analysis of Complex Economic Systems,

More information

The Retraction Penalty: Evidence from the Web of Science

The Retraction Penalty: Evidence from the Web of Science Supporting Information for The Retraction Penalty: Evidence from the Web of Science October 2013 Susan Feng Lu Simon School of Business University of Rochester susan.lu@simon.rochester.edu Ginger Zhe Jin

More information

arxiv: v1 [physics.soc-ph] 10 May 2011

arxiv: v1 [physics.soc-ph] 10 May 2011 How citation boosts promote scientific paradigm shifts and Nobel Prizes Amin Mazloumian, 1 Young-Ho Eom, 2 Dirk Helbing, 1 Sergi Lozano, 1 and Santo Fortunato 2 1 ETH Zürich, CLU E1, Clausiusstrasse 50,

More information

THE H- AND A-INDEXES IN ASTRONOMY

THE H- AND A-INDEXES IN ASTRONOMY Organizations, People and Strategies in Astronomy I (OPSA I), 245-252 Ed. A. Heck,. THE H- AND A-INDEXES IN ASTRONOMY HELMUT A. ABT Kitt Peak National Observatory P.O. Box 26732 Tucson AZ 85726-6732, U.S.A.

More information

Quantifying statistical regularities in the career achievements of scientists and professional athletes

Quantifying statistical regularities in the career achievements of scientists and professional athletes Quantifying statistical regularities in the career achievements of scientists and professional athletes Alexander M. Petersen Department of Physics, Boston University Thesis Advisor: H. Eugene Stanley

More information

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University Probability theory and statistical analysis: a review Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University Concepts assumed known Histograms, mean, median, spread, quantiles Probability,

More information

Evaluation of research publications and publication channels in astronomy and astrophysics

Evaluation of research publications and publication channels in astronomy and astrophysics https://helda.helsinki.fi Evaluation of research publications and publication channels in astronomy and astrophysics Isaksson, Eva Kristiina 2018-07-27 Isaksson, E K & Vesterinen, H 2018, ' Evaluation

More information

5-Star Analysis Tutorial!

5-Star Analysis Tutorial! 5-Star Analysis Tutorial This tutorial was originally created by Aaron Price for the Citizen Sky 2 workshop. It has since been updated by Paul York to reflect changes to the VStar software since that time.

More information

Web of Science: showing a bug today that can mislead scientific research output s prediction

Web of Science: showing a bug today that can mislead scientific research output s prediction Web of Science: showing a bug today that can mislead scientific research output s prediction Pablo Diniz Batista 1 *, Igor Carneiro Marques 1, Leduc Hermeto de Almeida Fauth 1 and Márcia de Oliveira Reis

More information

Group comparisons in logit and probit using predicted probabilities 1

Group comparisons in logit and probit using predicted probabilities 1 Group comparisons in logit and probit using predicted probabilities 1 J. Scott Long Indiana University May 27, 2009 Abstract The comparison of groups in regression models for binary outcomes is complicated

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND

DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND DEPARTMENT OF ECONOMICS AND FINANCE COLLEGE OF BUSINESS AND ECONOMICS UNIVERSITY OF CANTERBURY CHRISTCHURCH, NEW ZEALAND Article Influence Score = 5YIF divided by 2 Chia-Lin Chang, Michael McAleer, and

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

MODIFIED STEPWISE REGRESSION MODELS ON SIMPLIFYING THE ARWU S INDICATORS

MODIFIED STEPWISE REGRESSION MODELS ON SIMPLIFYING THE ARWU S INDICATORS PP. 95-116 ISSN: 2394-5788 MODIFIED STEPWISE REGRESSION MODELS ON SIMPLIFYING THE ARWU S INDICATORS Farrah Pei-Chen Chang & Liang-Yuh Ouyang Department of Management Sciences, Tamkang University, 25137,

More information

Analysis of bibliometric indicators for individual scholars in a large data set

Analysis of bibliometric indicators for individual scholars in a large data set Analysis of bibliometric indicators for individual scholars in a large data set Filippo Radicchi 1, and Claudio Castellano 2, 1 Departament d Enginyeria Quimica, Universitat Rovira i Virgili, Av. Paisos

More information

Outline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract

Outline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract Abstract The analysis of extremely large, complex datasets is becoming an increasingly important task in the analysis of scientific data. This trend is especially prevalent in astronomy, as large-scale

More information

arxiv: v1 [physics.soc-ph] 23 May 2017

arxiv: v1 [physics.soc-ph] 23 May 2017 On the persistency of scientific publications: introducing an h-index for journals Roberto Piazza arxiv:1705.09390v1 [physics.soc-ph] 23 May 2017 Department CMIC Giulio Natta, Politecnico di Milano, P.zza

More information

Stat 401B Final Exam Fall 2016

Stat 401B Final Exam Fall 2016 Stat 40B Final Exam Fall 0 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Model Fitting. Jean Yves Le Boudec

Model Fitting. Jean Yves Le Boudec Model Fitting Jean Yves Le Boudec 0 Contents 1. What is model fitting? 2. Linear Regression 3. Linear regression with norm minimization 4. Choosing a distribution 5. Heavy Tail 1 Virus Infection Data We

More information

Methods for measuring the citations and productivity of scientists across time and discipline

Methods for measuring the citations and productivity of scientists across time and discipline Methods for measuring the citations and productivity of scientists across time and discipline Alexander M. Petersen, Fengzhong Wang, and H. Eugene Stanley Center for Polymer Studies and Department of Physics,

More information

Bob Stembridge Patent analysis and graphene 2020 programme

Bob Stembridge Patent analysis and graphene 2020 programme Bob Stembridge Patent analysis and graphene 2020 programme Bob Stembridge graduated from University of Sussex, UK with an Honours degree in Chemistry. He joined Derwent (one of the founding components

More information

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical

More information

A stochastic model for Career Longevity

A stochastic model for Career Longevity A stochastic model for Career Longevity Alexander Petersen Thesis Advisor: H. Eugene Stanley Department of Physics, Boston University 12/09/2008 I ) A. Petersen, W.-S. Jung, H. E. Stanley, On the distribution

More information

(Directions for Excel Mac: 2011) Most of the global average warming over the past 50 years is very likely due to anthropogenic GHG increases

(Directions for Excel Mac: 2011) Most of the global average warming over the past 50 years is very likely due to anthropogenic GHG increases (Directions for Excel Mac: 2011) Most of the global average warming over the past 50 years is very likely due to anthropogenic GHG increases How does the IPCC know whether the statement about global warming

More information

Week 8: Correlation and Regression

Week 8: Correlation and Regression Health Sciences M.Sc. Programme Applied Biostatistics Week 8: Correlation and Regression The correlation coefficient Correlation coefficients are used to measure the strength of the relationship or association

More information

Innovative Web Tools for User Outreach: ChemWeb.com Alerting Services

Innovative Web Tools for User Outreach: ChemWeb.com Alerting Services Innovative Web Tools for User Outreach: ChemWeb.com Alerting Services Jan Kuras, ChemWeb Inc., 84 Theobalds Road, London WC1X 8RR, UK jan.kuras@chemweb.com Introduction ChemWeb.com aims to maintain a one-stop

More information

Workshop on Statistical Applications in Meta-Analysis

Workshop on Statistical Applications in Meta-Analysis Workshop on Statistical Applications in Meta-Analysis Robert M. Bernard & Phil C. Abrami Centre for the Study of Learning and Performance and CanKnow Concordia University May 16, 2007 Two Main Purposes

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK

EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK E v a l u a t i n g t h e U s a g e a n d I m p a c t o f e - J o u r n a l s i n t h e U K p a g e 1 EVALUATING THE USAGE AND IMPACT OF E-JOURNALS IN THE UK BIBLIOMETRIC INDICATORS FOR CASE STUDY INSTITUTIONS

More information

Journal Impact Factor Versus Eigenfactor and Article Influence*

Journal Impact Factor Versus Eigenfactor and Article Influence* Journal Impact Factor Versus Eigenfactor and Article Influence* Chia-Lin Chang Department of Applied Economics National Chung Hsing University Taichung, Taiwan Michael McAleer Econometric Institute Erasmus

More information

EXTENDING PARTIAL LEAST SQUARES REGRESSION

EXTENDING PARTIAL LEAST SQUARES REGRESSION EXTENDING PARTIAL LEAST SQUARES REGRESSION ATHANASSIOS KONDYLIS UNIVERSITY OF NEUCHÂTEL 1 Outline Multivariate Calibration in Chemometrics PLS regression (PLSR) and the PLS1 algorithm PLS1 from a statistical

More information

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017 PDF file location: http://www.murraylax.org/rtutorials/regression_anovatable.pdf

More information

Section 11: Quantitative analyses: Linear relationships among variables

Section 11: Quantitative analyses: Linear relationships among variables Section 11: Quantitative analyses: Linear relationships among variables Australian Catholic University 214 ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced or

More information

Computational Biology, University of Maryland, College Park, MD, USA

Computational Biology, University of Maryland, College Park, MD, USA 1 Data Sharing in Ecology and Evolution: Why Not? Cynthia S. Parr 1 and Michael P. Cummings 2 1 Institute for Advanced Computer Studies, 2 Center for Bioinformatics and Computational Biology, University

More information

BOOtstrapping the Generalized Linear Model. Link to the last RSS article here:factor Analysis with Binary items: A quick review with examples. -- Ed.

BOOtstrapping the Generalized Linear Model. Link to the last RSS article here:factor Analysis with Binary items: A quick review with examples. -- Ed. MyUNT EagleConnect Blackboard People & Departments Maps Calendars Giving to UNT Skip to content Benchmarks ABOUT BENCHMARK ONLINE SEARCH ARCHIVE SUBSCRIBE TO BENCHMARKS ONLINE Columns, October 2014 Home»

More information

Bootstrapping, Randomization, 2B-PLS

Bootstrapping, Randomization, 2B-PLS Bootstrapping, Randomization, 2B-PLS Statistics, Tests, and Bootstrapping Statistic a measure that summarizes some feature of a set of data (e.g., mean, standard deviation, skew, coefficient of variation,

More information

Single-Trial Neural Correlates. of Arm Movement Preparation. Neuron, Volume 71. Supplemental Information

Single-Trial Neural Correlates. of Arm Movement Preparation. Neuron, Volume 71. Supplemental Information Neuron, Volume 71 Supplemental Information Single-Trial Neural Correlates of Arm Movement Preparation Afsheen Afshar, Gopal Santhanam, Byron M. Yu, Stephen I. Ryu, Maneesh Sahani, and K rishna V. Shenoy

More information

Job Announcement for an Assistant Professor at the Institute of Space and Astronautical Science, the Japan Aerospace Exploration Agency

Job Announcement for an Assistant Professor at the Institute of Space and Astronautical Science, the Japan Aerospace Exploration Agency Job Announcement for an Assistant Professor at the Institute of Space and Astronautical Science, the Japan Aerospace Exploration Agency The Japan Aerospace Space Exploration Agency (JAXA) is seeking to

More information

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION

CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION CHAPTER 4: DATASETS AND CRITERIA FOR ALGORITHM EVALUATION 4.1 Overview This chapter contains the description about the data that is used in this research. In this research time series data is used. A time

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Frequency in % of each F-scale rating in outbreaks

Frequency in % of each F-scale rating in outbreaks Frequency in % of each F-scale rating in outbreaks 2 1 1 1 F/EF1 F/EF2 F/EF3-F/EF5 1954 196 197 198 199 2 21 Supplementary Figure 1 Distribution of outbreak tornadoes by F/EF-scale rating. Frequency (in

More information

Differentially Private Linear Regression

Differentially Private Linear Regression Differentially Private Linear Regression Christian Baehr August 5, 2017 Your abstract. Abstract 1 Introduction My research involved testing and implementing tools into the Harvard Privacy Tools Project

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)

Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Pepi Zulvia, Anang Kurnia, and Agus M. Soleh Citation: AIP Conference

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING

A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING A NEW MODEL FOR THE FUSION OF MAXDIFF SCALING AND RATINGS DATA JAY MAGIDSON 1 STATISTICAL INNOVATIONS INC. DAVE THOMAS SYNOVATE JEROEN K. VERMUNT TILBURG UNIVERSITY ABSTRACT A property of MaxDiff (Maximum

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3 Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details Section 10.1, 2, 3 Basic components of regression setup Target of inference: linear dependency

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). Basic Statistics There are three types of error: 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). 2. Systematic error - always too high or too low

More information

CS Homework 3. October 15, 2009

CS Homework 3. October 15, 2009 CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website

More information

Coercive Journal Self Citations, Impact Factor, Journal Influence and Article Influence

Coercive Journal Self Citations, Impact Factor, Journal Influence and Article Influence Coercive Journal Self Citations, Impact Factor, Journal Influence and Article Influence Chia-Lin Chang Department of Applied Economics Department of Finance National Chung Hsing University Taichung, Taiwan

More information

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting)

Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Advanced Machine Learning Practical 4b Solution: Regression (BLR, GPR & Gradient Boosting) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch,

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Guide to the Development of a Deterioration Rate Curve Using Condition State Inspection Data

Guide to the Development of a Deterioration Rate Curve Using Condition State Inspection Data Guide to the Development of a Deterioration Rate Curve Using Condition State Inspection Data by Guillermo A. Riveros and Elias Arredondo PURPOSE: The deterioration of elements of steel hydraulic structures

More information

Passing-Bablok Regression for Method Comparison

Passing-Bablok Regression for Method Comparison Chapter 313 Passing-Bablok Regression for Method Comparison Introduction Passing-Bablok regression for method comparison is a robust, nonparametric method for fitting a straight line to two-dimensional

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression

Behavioral Data Mining. Lecture 7 Linear and Logistic Regression Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Multiple Regression Analysis

Multiple Regression Analysis Chapter 4 Multiple Regression Analysis The simple linear regression covered in Chapter 2 can be generalized to include more than one variable. Multiple regression analysis is an extension of the simple

More information

22s:152 Applied Linear Regression

22s:152 Applied Linear Regression 22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial

More information

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports

Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Pseudocode for calculating Eigenfactor TM Score and Article Influence TM Score using data from Thomson-Reuters Journal Citations Reports Jevin West and Carl T. Bergstrom November 25, 2008 1 Overview There

More information

Expert assessment for the recruitment of one Professor in Physics with focus on Magnetism at Lund University.

Expert assessment for the recruitment of one Professor in Physics with focus on Magnetism at Lund University. Expert assessment for the recruitment of one Professor in Physics with focus on Magnetism at Lund University. Expert: Prof. Giacomo Ghiringhelli, Politecnico di Milano, Italy Due: 14 February, 2017 I have

More information

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 3 Jakub Mućk Econometrics of Panel Data Meeting # 3 1 / 21 Outline 1 Fixed or Random Hausman Test 2 Between Estimator 3 Coefficient of determination (R 2

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Increasing Competitiveness of Investigators and Concentration Modeling: Quantitative Approaches

Increasing Competitiveness of Investigators and Concentration Modeling: Quantitative Approaches SCIENCE & TECHNOLOGY POLIC Y INSTITUTE Increasing Competitiveness of Investigators and Concentration Modeling: Quantitative Approaches Thomas W. Jones Brian Q. Rieksts Brian L. Zuckerman October 2014 Approved

More information

The Effect of Level of Significance (α) on the Performance of Hotelling-T 2 Control Chart

The Effect of Level of Significance (α) on the Performance of Hotelling-T 2 Control Chart The Effect of Level of Significance (α) on the Performance of Hotelling-T 2 Control Chart Obafemi, O. S. 1 Department of Mathematics and Statistics, Federal Polytechnic, Ado-Ekiti, Ekiti State, Nigeria

More information

Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships

Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships Hypothesis Registration: Structural Predictions for the 2013 World Schools Debating Championships Tom Gole * and Simon Quinn January 26, 2013 Abstract This document registers testable predictions about

More information

Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong. Keywords: Museumpark; Museum Demand; Spill-over Effects; Data Mining

Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong. Keywords: Museumpark; Museum Demand; Spill-over Effects; Data Mining Chi Fung Lam The Chinese University of Hong Kong Jian Ming Luo City University of Macau Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong It is important for tourism managers to understand

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Lecture on Null Hypothesis Testing & Temporal Correlation

Lecture on Null Hypothesis Testing & Temporal Correlation Lecture on Null Hypothesis Testing & Temporal Correlation CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Acknowledgement Resources used in the slides

More information

QUANTITATIVE INTERPRETATION

QUANTITATIVE INTERPRETATION QUANTITATIVE INTERPRETATION THE AIM OF QUANTITATIVE INTERPRETATION (QI) IS, THROUGH THE USE OF AMPLITUDE ANALYSIS, TO PREDICT LITHOLOGY AND FLUID CONTENT AWAY FROM THE WELL BORE This process should make

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1 Summary statistics 1. Visualize data 2. Mean, median, mode and percentiles, variance, standard deviation 3. Frequency distribution. Skewness 4. Covariance and correlation 5. Autocorrelation MSc Induction

More information

Statistical View of Least Squares

Statistical View of Least Squares May 23, 2006 Purpose of Regression Some Examples Least Squares Purpose of Regression Purpose of Regression Some Examples Least Squares Suppose we have two variables x and y Purpose of Regression Some Examples

More information

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns

Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Making Our Cities Safer: A Study In Neighbhorhood Crime Patterns Aly Kane alykane@stanford.edu Ariel Sagalovsky asagalov@stanford.edu Abstract Equipped with an understanding of the factors that influence

More information

Determining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey *

Determining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey * Applied Mathematics, 2014,, 3421-3430 Published Online December 2014 in SciRes. http://www.scirp.org/journal/am http://dx.doi.org/10.4236/am.2014.21319 Determining Sufficient Number of Imputations Using

More information

Ordinary Least Squares Regression Explained: Vartanian

Ordinary Least Squares Regression Explained: Vartanian Ordinary Least Squares Regression Eplained: Vartanian When to Use Ordinary Least Squares Regression Analysis A. Variable types. When you have an interval/ratio scale dependent variable.. When your independent

More information

Predicting Future Energy Consumption CS229 Project Report

Predicting Future Energy Consumption CS229 Project Report Predicting Future Energy Consumption CS229 Project Report Adrien Boiron, Stephane Lo, Antoine Marot Abstract Load forecasting for electric utilities is a crucial step in planning and operations, especially

More information

Prediction of Bike Rental using Model Reuse Strategy

Prediction of Bike Rental using Model Reuse Strategy Prediction of Bike Rental using Model Reuse Strategy Arun Bala Subramaniyan and Rong Pan School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, USA. {bsarun, rong.pan}@asu.edu

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions

Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions Technometrics ISSN: 0040-1706 (Print) 1537-2723 (Online) Journal homepage: http://www.tandfonline.com/loi/utch20 Two-Level Designs to Estimate All Main Effects and Two-Factor Interactions Pieter T. Eendebak

More information

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data

Outline. Mixed models in R using the lme4 package Part 3: Longitudinal data. Sleep deprivation data. Simple longitudinal data Outline Mixed models in R using the lme4 package Part 3: Longitudinal data Douglas Bates Longitudinal data: sleepstudy A model with random effects for intercept and slope University of Wisconsin - Madison

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Tables Table A Table B Table C Table D Table E 675

Tables Table A Table B Table C Table D Table E 675 BMTables.indd Page 675 11/15/11 4:25:16 PM user-s163 Tables Table A Standard Normal Probabilities Table B Random Digits Table C t Distribution Critical Values Table D Chi-square Distribution Critical Values

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as: 1 Joint hypotheses The null and alternative hypotheses can usually be interpreted as a restricted model ( ) and an model ( ). In our example: Note that if the model fits significantly better than the restricted

More information

Department of Astrophysical Sciences Peyton Hall Princeton, New Jersey Telephone: (609)

Department of Astrophysical Sciences Peyton Hall Princeton, New Jersey Telephone: (609) Princeton University Department of Astrophysical Sciences Peyton Hall Princeton, New Jersey 08544-1001 Telephone: (609) 258-3808 Email: strauss@astro.princeton.edu January 18, 2007 Dr. George Helou, IPAC

More information

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores

More information

Lecture 18 Miscellaneous Topics in Multiple Regression

Lecture 18 Miscellaneous Topics in Multiple Regression Lecture 18 Miscellaneous Topics in Multiple Regression STAT 512 Spring 2011 Background Reading KNNL: 8.1-8.5,10.1, 11, 12 18-1 Topic Overview Polynomial Models (8.1) Interaction Models (8.2) Qualitative

More information

IEOR165 Discussion Week 5

IEOR165 Discussion Week 5 IEOR165 Discussion Week 5 Sheng Liu University of California, Berkeley Feb 19, 2016 Outline 1 1st Homework 2 Revisit Maximum A Posterior 3 Regularization IEOR165 Discussion Sheng Liu 2 About 1st Homework

More information

Simulating Properties of the Likelihood Ratio Test for a Unit Root in an Explosive Second Order Autoregression

Simulating Properties of the Likelihood Ratio Test for a Unit Root in an Explosive Second Order Autoregression Simulating Properties of the Likelihood Ratio est for a Unit Root in an Explosive Second Order Autoregression Bent Nielsen Nuffield College, University of Oxford J James Reade St Cross College, University

More information