The Mathematics of Scientific Research: Scientometrics, Citation Metrics, and Impact Factors

Wayne State University
Library Scholarly Publications, Wayne State University Libraries
1-1-2016

The Mathematics of Scientific Research: Scientometrics, Citation Metrics, and Impact Factors
Clayton Hayes, Wayne State University, as6348@wayne.edu

Recommended Citation: Hayes, Clayton, "The Mathematics of Scientific Research: Scientometrics, Citation Metrics, and Impact Factors" (2016). Library Scholarly Publications. Paper 109. http://digitalcommons.wayne.edu/libsp/109

This Presentation is brought to you for free and open access by the Wayne State University Libraries at DigitalCommons@WayneState. It has been accepted for inclusion in Library Scholarly Publications by an authorized administrator of DigitalCommons@WayneState.

The Mathematics of Scientific Research Scientometrics, Citation Metrics, and Impact Factors C. Hayes April 1, 2016 C. Hayes The Mathematics of Scientific Research April 1, 2016 1 / 29

What We Will Cover
In this talk, I will cover:
1. The definition of and foundation for Scientometrics
2. Price and his observations on citation networks
3. Garfield and his development of journal impact factors
   3.1 Bergstrom & West and the Eigenfactor
4. Hirsch and the development of the h-index
   4.1 Further developments in author metrics

A Brief Note on Citations
[Figure: BIMEASURE ALGEBRAS ON LCA GROUPS (Graham & Schreiber 1984) cites ON LINEAR TRANSFORMATIONS (Phillips 1940) and is cited by ADVANCES IN QUANTIZED FUNCTIONAL ANALYSIS (Effros 1986)]

Scientometrics
Scientometrics is best described as the quantitative study of scientific communication. Its foundations were laid out in a paper published by S. C. Bradford in 1934 [1].
[Figure: journals arranged in three groups: J01; J11 to J15; J21 to J28]

The Foundations of Scientometrics
Scientometrics had its true germination in the Cold War era, in the 1960s and 1970s. There are three main reasons for this:
- Computers were more accessible and could handle much of the necessary work
- The onset of Big Science made the meta-analysis of scientific research more relevant
- The founding of the Institute for Scientific Information in 1960 and of the Science Citation Index in 1964
This situation laid the groundwork for Price and Garfield, the two most influential figures in the development of Scientometrics.

Derek de Solla Price
Derek de Solla Price was a physicist and mathematician who developed an interest in the history of science in his late 20s. He developed a theory that the study of science was growing exponentially...
[Figure: volumes of the Philosophical Transactions of the Royal Society, labelled 1655 (v1), 1720 (v1 to v2), 1785 (v1 to v4), 1850 (v1 to v8)]

Citation Networks
Based on his background in math and physics, and because of the availability of the Science Citation Index, Price introduced the idea of a Citation Network in 1965 [2].
[Figure: citation network with nodes A, A11, A12, A13, A21, A22, A23, A31, A32]

Price's Observations
Price's primary concern was to develop a reasonable picture of the nature of scholarship in the sciences, and he drew some conclusions based on his observations of the SCI data:
- Each year, about 10% of papers die, i.e. are never cited again
- For the live papers, the odds of being cited in a particular year are 60%
- About 1% of papers are completely isolated
- The age of a paper has a significant impact on the number of times it is cited
  - Around 70% of citations are randomly distributed over all published papers
  - The other 30% are to a more selective collection of recent literature
Price developed this further by referring to that small, selective collection of recent literature as a research front.

Price's Observations
[Figure: diagram with the labels Research Front, Half of published papers (one each), and New Papers]

Price and Cumulative Advantage
Price developed these theories further, and outlined what he called cumulative advantage in a 1976 paper [3]. You are probably familiar with the usual example of drawing items from an urn holding $n$ red and $m$ black items, where drawing red is a success and drawing black is a failure:
- Initially: $n$ red, $m$ black, $P(\text{red}) = \frac{n}{n+m}$
- After a success, with replacement: $n$ red, $m$ black, $P(\text{red}) = \frac{n}{n+m}$
- After a success, without replacement: $n-1$ red, $m$ black, $P(\text{red}) = \frac{n-1}{n+m-1}$

Cumulative Advantage
Consider the following instead, starting again from $n$ red and $m$ black with $P(\text{red}) = \frac{n}{n+m}$:
- If red is drawn (success): $n+c$ red, $m$ black, $P(\text{red}) = \frac{n+c}{n+c+m}$
- If black is drawn (failure): $n$ red, $m$ black, $P(\text{red}) = \frac{n}{n+m}$

The Cumulative Advantage Distribution
Price extrapolated this to develop what he termed the Cumulative Advantage Distribution. Consider the base case, with $n = m = c = 1$, and let $k$ denote the number of successes:

k   Urn contents      P(red)
0   1 red, 1 black    1/2
1   2 red, 1 black    2/3
2   3 red, 1 black    3/4
3   4 red, 1 black    4/5

So the probability of success after $k$ successes is then $\frac{1+k}{2+k}$.
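The base case generalizes directly to other starting values. As a minimal sketch (the helper name `p_red_after` and its signature are illustrative assumptions, not taken from the talk), the chance of drawing red after $k$ successes, for an urn that starts with $n$ red and $m$ black items and gains $c$ red items per success, is $\frac{n+kc}{n+kc+m}$:

```python
from fractions import Fraction

def p_red_after(k, n=1, m=1, c=1):
    """Probability of drawing red after k successes (hypothetical helper).

    Assumes the urn starts with n red and m black items and that each
    success adds c red items, as in the cumulative advantage scheme.
    """
    red = n + k * c
    return Fraction(red, red + m)

# Base case n = m = c = 1: the values for k = 0, 1, 2, 3.
base_case = [p_red_after(k) for k in range(4)]
```

With the default arguments, `base_case` holds the four probabilities listed for the base case, 1/2, 2/3, 3/4 and 4/5.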

The Cumulative Advantage Distribution
Do this for $K$ urns, stopping each whenever you draw your first failure. One would then expect about
- $K/2$ urns to reach $k = 1$,
- $K/3$ urns to reach $k = 2$,
- $K/4$ urns to reach $k = 3$, and so on.
The expected number of urns with at least $k$ successes is then
$$\frac{K}{k+1}$$
while the expected number of urns with exactly $k$ successes is
$$\frac{K}{k+1} - \frac{K}{k+2} = \frac{K}{(k+1)(k+2)}$$
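These counts can be checked both exactly and by simulation. The sketch below is only illustrative: the function names are made up for this example, and the `max_successes` cap is an added assumption so that every simulated urn is guaranteed to stop. It multiplies the per-draw success probabilities $\frac{1+i}{2+i}$ to get the chance that a single urn reaches $k$ successes, then runs the urn experiment directly.

```python
import random
from fractions import Fraction

def p_at_least(k):
    """Exact chance that one urn (n = m = c = 1) reaches k successes."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(1 + i, 2 + i)  # success probability after i successes
    return p

def p_exactly(k):
    """Exact chance that an urn stops with exactly k successes."""
    return p_at_least(k) - p_at_least(k + 1)

def simulate(num_urns, rng, max_successes=10**6):
    """Run each urn until its first failure; return the success counts.

    max_successes is an illustrative safety cap, not part of the model.
    """
    counts = []
    for _ in range(num_urns):
        k = 0
        # After k successes the urn holds k + 1 red items and 1 black item.
        while k < max_successes and rng.random() < (k + 1) / (k + 2):
            k += 1
        counts.append(k)
    return counts
```

Multiplying `p_at_least(k)` by $K$ gives $K/(k+1)$, and `p_exactly(k)` by $K$ gives $K/((k+1)(k+2))$; for a large number of urns, the share of entries in `simulate(...)` that are at least 1 should sit close to one half.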

The Cumulative Advantage Distribution
What you will get is something like the following:

Urn    Contents          k
Urn1   6 red, 1 black    5
Urn2   1 red, 1 black    0
Urn3   2 red, 1 black    1
Urn4   4 red, 1 black    3
Urn5   1 red, 1 black    0
Urn6   2 red, 1 black    1

The Cumulative Advantage Distribution
If you're looking to actually model a citation network, taking into account new articles and new citations, it's much more complex. What Price showed is that, if $f(k)$ represents the fraction of the population in state $k$ (e.g. having exactly $k$ successes),
f(k) = CP j B(k, j + 2)
where C is Euler's constant, P is the size of the population, j is a constant parameter, and $B(\cdot, \cdot)$ is the Beta Function:
$$B(a, b) = B(b, a) = \int_0^1 x^{a-1} (1-x)^{b-1}\,dx = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}$$
He defined the Cumulative Advantage Distribution as having density function
$$f(k) = (j+1)\,B(k, j+2)$$

The Cumulative Advantage Distribution
$$f(k) = (j+1)\,B(k, j+2)$$
The parameter j mentioned previously more or less entirely defines the distribution. For example, j = 0 gives the simplified Urn model shown previously. Generally, for given j > 1,
- B(1, j) gives the total number of successes
- B(1, j + 1) gives the population size
- B(k, j + 1) gives the number of members with at least k successes
- B(k, j + 2) gives the number of members with exactly k successes
One can adjust for the population size using constants, while j (very roughly speaking) helps to model the average number of successes per item.
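To make the density concrete, here is a small sketch that evaluates it numerically. The function names are illustrative assumptions, the Beta function is computed through the standard library's log-gamma, and $k$ is assumed to start at 1. The "at least $k$" quantity is scaled by the same factor $(j+1)$ as the density, so it is a fraction of the population.

```python
import math

def beta(a, b):
    """Beta function B(a, b) for a, b > 0, computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def cad_density(k, j):
    """f(k) = (j + 1) B(k, j + 2): fraction with exactly k successes."""
    return (j + 1) * beta(k, j + 2)

def cad_at_least(k, j):
    """(j + 1) B(k, j + 1): fraction with at least k successes."""
    return (j + 1) * beta(k, j + 1)
```

For $j = 0$ the density reduces to $\frac{1}{k(k+1)}$, which is the urn result $\frac{1}{(k+1)(k+2)}$ per urn with $k$ shifted by one. The two functions are tied together by the identity $B(k, j+1) - B(k+1, j+1) = B(k, j+2)$, so the difference of consecutive "at least" values gives the "exactly" value.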

Further Developments
Cumulative Advantage was later given the name Preferential Attachment, which is still used today. Preferential attachment processes can give rise to Power Law Distributions, where the proportion $P(k)$ of a population in state $k$ is given by
$$P(k) \sim k^{-\gamma}$$
for sufficiently large $k$, where $\gamma$ is some parameter. If we consider a citation network in which members are nodes and a success is a connection from one node to another, this power law distribution means that the citation network forms what is called a scale-free network.
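As a rough numerical illustration, assuming the Cumulative Advantage density $f(k) = (j+1)B(k, j+2)$ from the earlier slide, one can estimate the slope of $\log f$ against $\log k$ for large $k$. Since $B(k, y)$ behaves like $\Gamma(y)\,k^{-y}$ for large $k$, the slope should approach $-(j+2)$, i.e. a power law with $\gamma = j + 2$. The function names below are made up for this sketch.

```python
import math

def log_cad_density(k, j):
    """Logarithm of f(k) = (j + 1) B(k, j + 2), computed with log-gamma."""
    return (math.log(j + 1) + math.lgamma(k) + math.lgamma(j + 2)
            - math.lgamma(k + j + 2))

def local_exponent(k, j):
    """Slope of log f against log k, estimated between k and 2k."""
    return (log_cad_density(2 * k, j) - log_cad_density(k, j)) / math.log(2)
```

Evaluating `local_exponent(1000, 1)` gives a value within a hundredth of $-3$, and `local_exponent(1000, 0)` a value within a hundredth of $-2$. Working with logarithms avoids underflow when $k$ is large.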

Notes on Citation Spaces
A citation network can be considered to have an inherent geometric structure, based on the fact that it forms a Directed Acyclic Graph whose causal connections are constrained by time. Consider our little citation network model from before:
[Figure: citation network with nodes A, A11, A12, A13, A21, A22, A23, A31, A32]

Notes on Citation Spaces
Clough and Evans [4] have proposed modeling these citation spaces with a Lorentzian manifold, wherein one dimension (time) is considered separately from the other (spatial) dimensions. The simplest manifold of this type is Minkowski space, which is also used for special relativity.
By leveraging existing methods related to modeling quantum gravity, Clough and Evans attempted to estimate the dimension of several sections of the Physics arXiv network by considering them to be embedded in a Minkowski space:
- High-Energy Theory (hep-th): 2
- High-Energy Phenomenology (hep-ph): 3
- Astrophysics (astro-ph): 3.5
- Quantum Physics (quant-ph): 3
This may be used to help differentiate between the research practices of closely-related fields, or to provide a way to define the distance between papers in a particular discipline.

Garfield and Impact Factor
For better or for worse, Eugene Garfield is largely responsible for how we decide which journals in the sciences are top and which are not. This is all possible because of the factors mentioned previously. Once you are able to quickly determine how many citations are coming in to a particular journal, you have some idea of how influential that journal is.
Garfield introduced the notion of impact factor in a 1972 paper [5] as the number of times a journal has been cited divided by the number of articles it has produced:
$$IF_{y,t} = \frac{\sum_{i=y}^{y+t} c_i}{\sum_{i=y}^{y+t} a_i}$$
where $y$ is a particular year, $t$ is the number of years, $c_i$ is the number of citations received, and $a_i$ is the number of articles.

Impact Factor When used today, the Impact Factor for a given year y is IF y = c y 1 + c y 2 a y 1 + a y 2 where the c i are the number of citations from year y to articles in year i and the a i are the number of citable articles in year i. The 5-year Impact Factor in a year y is defined similarly: IF 5y = c y 1 + c y 2 + + c y 5 a y 1 + a y 2 + + a y 5 There is also a kind of zero case called the Immediacy Index: II y = c y a y C. Hayes The Mathematics of Scientific Research April 1, 2016 21 / 29
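The three ratios above can be sketched in a few lines of Python. This is only an illustration: the function names, the `window` parameter, and the journal data are hypothetical, and the dictionary layout (`citations[y][i]` for citations made in year y to items published in year i) is an assumption chosen for the example, not the format of any real database.

```python
def impact_factor(citations, articles, year, window=2):
    """Citations made in `year` to items from the previous `window` years,
    divided by the number of citable items published in those years."""
    years = range(year - window, year)  # year-window, ..., year-1
    cited = sum(citations[year].get(y, 0) for y in years)
    published = sum(articles.get(y, 0) for y in years)
    return cited / published if published else float("nan")


def immediacy_index(citations, articles, year):
    """Citations made in `year` to items published in that same year."""
    published = articles.get(year, 0)
    return citations[year].get(year, 0) / published if published else float("nan")


# Hypothetical journal (made-up numbers, for illustration only).
# articles[i]     = citable items published in year i
# citations[y][i] = citations made in year y to items published in year i
articles = {2010: 40, 2011: 50, 2012: 45, 2013: 55, 2014: 50, 2015: 60}
citations = {2015: {2010: 30, 2011: 45, 2012: 50, 2013: 80, 2014: 70, 2015: 12}}

if_2 = impact_factor(citations, articles, 2015)            # (80 + 70) / (55 + 50)
if_5 = impact_factor(citations, articles, 2015, window=5)  # 275 / 240
ii = immediacy_index(citations, articles, 2015)            # 12 / 60
print(round(if_2, 3), round(if_5, 3), round(ii, 3))
```

Treating the window length as a parameter makes the 2-year and 5-year Impact Factors two calls to the same function, which mirrors how the slide defines one "similarly" to the other.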

Sample Impact Factors
Journal Title                     IF        5-yr IF
Annals of Mathematics             3.263     3.654
Duke Mathematical Journal         1.578     2.009
Advances in Applied Probability   0.709     0.831
Proceedings of the AMS            0.681     0.680
Algebraic & Geometric Topology    0.445     0.581
Nature                            41.456    41.296
JAMA                              35.289    31.026
PLoS One                          3.234     3.702
This data comes from the Journal Citation Reports (JCR) database, owned by Thomson-Reuters, who also bought the ISI and SCI in the 90s.

The Eigenfactor
Introduced by Bergstrom and West in 2007 [6], the Eigenfactor score rests on one general idea: tracking a researcher on a random walk through the body of literature in a particular year.
[Figure: an example random walk between journals, revealed one step at a time: Bulletin AMS, Science, Astérisque, Science, Duke Math]
Unfortunately, the actual algorithm is patented, so there isn't much more to say about it!
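To make the random-walk picture concrete, here is a generic sketch of a walker hopping between journals by following citations, with the long-run share of time spent at each journal found by repeated multiplication (power iteration). It is not the Eigenfactor algorithm itself. The three-journal citation counts and the function name are hypothetical, and the sketch assumes every journal cites at least one other journal.

```python
def stationary_distribution(counts, iterations=200):
    """Long-run visit frequencies of a random walk on a citation graph.

    counts[i][j] = number of citations from journal i to journal j.
    Assumes every row has at least one citation (no dead ends).
    """
    n = len(counts)
    # Row-normalise: probability of stepping from journal i to journal j.
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row])

    dist = [1.0 / n] * n  # start the walker at a uniformly random journal
    for _ in range(iterations):
        # One step of the walk: redistribute probability along citations.
        dist = [sum(dist[i] * probs[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical citation counts between three journals (illustration only).
counts = [
    [0, 3, 1],
    [2, 0, 2],
    [4, 1, 0],
]
dist = stationary_distribution(counts)
print([round(p, 4) for p in dist])
```

A journal that attracts citations from journals which are themselves frequently visited ends up with a larger share, which is the intuition the slide's walk between journals is meant to convey.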

Sample Impact Factors & Eigenfactors
Journal Title                     Eigenfactor*   IF        5-yr IF
Annals of Mathematics             0.03843        3.263     3.654
Duke Mathematical Journal         0.02013        1.578     2.009
Advances in Applied Probability   0.00380        0.709     0.831
Proceedings of the AMS            0.03015        0.681     0.680
Algebraic & Geometric Topology    0.00713        0.445     0.581
Nature                            1.49869        41.456    41.296
JAMA                              0.26099        35.289    31.026
PLoS One                          1.53341        3.234     3.702
* Note that Eigenfactors are scaled so that the sum of all Eigenfactors over the Journal Citation Reports database is 100.

Hirsch and the h-index
If you can rank journals based on citation data, why not rank researchers, too?
Hirsch introduced the h-index in 2005 [7], and it's still the most common author metric. An author A has an h-index of $h_A$ if $h_A$ is the largest number such that $h_A$ of their articles have been cited at least $h_A$ times.
Author A's Papers: 15 3 10 7 105 16 7      $h_A = 6$
Author B's Papers: 7 16 97 17 83 9 13 5    $h_B = 7$

Hirsch and the h-index
Generally speaking, it's easiest to order papers according to number of citations, from largest to smallest:
Author A's Papers: 105 16 15 10 7 7 3
Author B's Papers: 97 83 17 16 13 9 7 5
Reading from the left, the h-index is the last position whose citation count is at least as large as the position itself: position 6 for Author A and position 7 for Author B.
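The sort-and-scan procedure translates directly into code. The sketch below is one straightforward way to do it; the function name is hypothetical and the citation counts are the two author examples from the slides.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the paper at this rank still clears the bar
        else:
            break      # counts only decrease from here, so stop
    return h


author_a = [15, 3, 10, 7, 105, 16, 7]
author_b = [7, 16, 97, 17, 83, 9, 13, 5]

print(h_index(author_a))  # 6
print(h_index(author_b))  # 7
```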

Variations on the h-index
The $h_I$-index seeks to control for differing author interests [8]:
\[ h_I = \frac{h^2}{N_a^{(T)}} \]
where $h$ is the usual h-index and $N_a^{(T)}$ is the total number of authors in those $h$ papers.
The g-index seeks to weight highly cited papers more heavily [9]. An author A has g-index $g_A$ if $g_A$ is the largest number such that a collection of $g_A$ of their articles has been cited a total of at least $(g_A)^2$ times.
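Both variations are small changes to the sort-and-scan idea behind the h-index. The sketch below follows the two definitions as stated on the slide. The function names, the `(citations, authors)` pair layout, and the sample author are hypothetical, and when several papers tie at the edge of the top h the sketch simply takes them in sorted order.

```python
def g_index(citations):
    """Largest g such that the g most-cited papers total at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    g, total = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites               # citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


def h_i_index(papers):
    """h**2 divided by the total number of authors of the top h papers.

    papers: list of (citation_count, author_count) pairs.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    h = 0
    for rank, (cites, _) in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    authors_in_core = sum(n_authors for _, n_authors in ranked[:h])
    return h * h / authors_in_core if h else 0.0


# Hypothetical author: (citations, number of authors) for each paper.
papers = [(10, 2), (5, 3), (3, 1), (1, 4), (0, 1)]

print(g_index([c for c, _ in papers]))  # top 4 papers total 19 >= 16, so g = 4
print(h_i_index(papers))                # h = 3, authors = 2 + 3 + 1, so 9 / 6 = 1.5
```

For this author the g-index (4) exceeds the h-index (3), because the single 10-citation paper lifts the running total.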

Variations on the h-index
The contemporary h-index seeks to control for the age of an article [10]. It first defines the score $S^c(i)$ of an article $i$ as:
\[ S^c(i) = \gamma \cdot \bigl(Y(\mathrm{now}) - Y(i) + 1\bigr)^{-\delta} \cdot |C(i)| \]
where $Y(i)$ is the publication year of $i$, $C(i)$ is the set of articles citing $i$, $\delta$ is the degree to which you desire age to be a factor, and $\gamma$ is a scaling coefficient. The developers of this index used $\delta = 1$ and $\gamma = 4$.
An author has contemporary h-index $h^c$ if $h^c$ is the largest number such that $h^c$ of their articles have a score of $S^c(i) \geq h^c$.
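A short sketch shows how the age weighting changes the picture. It scores each paper as gamma times the citation count divided by the paper's age in years (counting the current year as age 1) raised to the power delta, then runs the usual h-index scan over the scores. The function name, the `(year, citations)` pair layout, and the sample papers are hypothetical. The defaults use the values quoted on the slide.

```python
def contemporary_h_index(papers, now, gamma=4.0, delta=1.0):
    """Largest h such that h papers have an age-weighted score of at least h.

    papers: list of (publication_year, citation_count) pairs.
    """
    scores = sorted(
        (gamma * (now - year + 1) ** (-delta) * cites for year, cites in papers),
        reverse=True,
    )
    h = 0
    for rank, score in enumerate(scores, start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical author (made-up data): (publication year, citations).
papers = [(2015, 3), (2010, 20), (2000, 50), (2014, 2)]

# Scores in 2016: 6.0, about 11.43, about 11.76, about 2.67
print(contemporary_h_index(papers, now=2016))  # 3
```

With gamma = 4 and delta = 1, a paper's citations count fourfold in its publication year and at face value once it is four years old, so recent work is boosted and older work is discounted.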

References
[1] S.C. Bradford, Sources of specific information, Engineering 137 (1934) 85-86.
[2] D.S. Price, Networks of scientific papers, Science 149 (1965) 510-515.
[3] D.S. Price, A general theory of bibliometric and other cumulative advantage processes, J. Am. Soc. Inf. Sci. 27 (1976) 292-306.
[4] J.R. Clough, T.S. Evans, What is the dimension of citation space? Physica A 448 (2016) 235-247.
[5] E. Garfield, Citation analysis as a tool in journal evaluation, Science 178 (1972) 471-479.
[6] C.T. Bergstrom, J.D. West, M.A. Wiseman, The Eigenfactor metrics, J. Neurosci. 28 (2008) 11433-4.
[7] J.E. Hirsch, An index to quantify an individual's scientific research output, Proc. Natl. Acad. Sci. USA 102 (2005) 16569-72.
[8] P.D. Batista, M.G. Campiteli, O. Kinouchi, Is it possible to compare researchers with different scientific interests? Scientometrics 68 (2006) 179-189.
[9] L. Egghe, Theory and practise of the g-index, Scientometrics 69 (2006) 131-152.
[10] A. Sidiropoulos, D. Katsaros, Y. Manolopoulos, Generalized Hirsch h-index for disclosing latent facts in citation networks, Scientometrics 72 (2007) 253-280.
Photo of Eugene Garfield provided by the Chemical Heritage Foundation under a CC-BY-SA license.