On the correction of the h-index for career length

by L. Egghe

Universiteit Hasselt (UHasselt), Campus Diepenbeek, Agoralaan, B-3590 Diepenbeek, Belgium (permanent address)
and
Universiteit Antwerpen (UA), IBW, Stadscampus, Venusstraat 35, B-2000 Antwerpen, Belgium

leo.egghe@uhasselt.be

ABSTRACT

We describe mathematically the age-independent version of the h-index, defined by Abt (Scientometrics 91(3), 863-868, 2012), and explain when this indicator is constant with age. We compare this index with the one where not the h-index is divided by career length but where all citation numbers are divided by career length and where we then calculate the new h-index. Both mathematical models are compared. A variant of this second method consists of calculating the h-index of the citation data divided by article age. Examples are given.

Key words and phrases: age-independent, h-index (Hirsch-index)

Introduction

Let us have a researcher with $T$ publications and let $c_i$ ($i = 1, \ldots, T$) be the number of citations received by paper $i$. We suppose that the papers are arranged in decreasing order of number of received citations (i.e. $c_i \ge c_j$ if and only if $i \le j$). Then the Hirsch-index (Hirsch (2005)) (or h-index) is the largest rank $r = h$ such that all papers on ranks $1, \ldots, h$ have at least $h$ citations (i.e. the largest rank $r = h$ such that $c_h \ge h$ and hence $c_i \ge h$ for all $i = 1, \ldots, h$).

It is clear that the h-index is age-dependent (i.e. it depends on career length). Long careers usually have higher values of $T$ (total number of publications) and of $c_i$ ($i = 1, \ldots, T$) (number of citations received by paper $i$) than shorter careers (e.g. younger researchers). This fact was already noted in the defining paper Hirsch (2005). So, with the h-index, one should not compare researchers with different career lengths (just as one should not compare researchers from different fields, but that is the case for all citation-based indicators; in this paper we do not deal with this problem).

This has led Abt (2012) to the following age-independent h-index (also implicit in Hirsch (2005)). Denote by $h_t$ the h-index of a researcher at career length $t$ (starting at the time of the researcher's first published paper). Then define

$$ a_t = \frac{h_t}{t/10} \tag{1} $$

i.e. the h-index (at time $t$) divided by the (fractional) number of decades since the first published paper. The factor 10 is only useful in practical examples; in theoretical models we might use

$$ a_t^{*} = \frac{h_t}{t} \tag{2} $$

as well.
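The two definitions above translate directly into a few lines of code. The following is a minimal sketch, not the author's own implementation; the function names `h_index` and `abt_index` and the toy citation record are assumptions made for illustration.

```python
def h_index(citations):
    """Largest rank h such that the h most-cited papers each have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def abt_index(citations, career_years, per_decade=True):
    """Abt's index: h divided by career length in decades (1) or in years (2)."""
    if career_years <= 0:
        raise ValueError("career_years must be positive")
    length = career_years / 10 if per_decade else career_years
    return h_index(citations) / length


# Hypothetical toy record: five papers, career length 20 years.
toy = [10, 8, 5, 4, 3]
print(h_index(toy))        # 4
print(abt_index(toy, 20))  # 2.0
```

With `per_decade=False` the same function returns the theoretical variant (2).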

Abt (2012) claims that $a_t$ is constant in $t$ and gives practical evidence for it. Incidentally, constant $a_t$ implies that $h_t$ increases linearly, as is trivial from (1) or (2).

In this paper, based on the models developed in Egghe and Rousseau (2006) and Egghe (2009), we give a mathematical model for $a_t$ and $a_t^{*}$ and present necessary and sufficient conditions for $a_t$ and $a_t^{*}$ to be constant. Practical data on $h_t$ and $a_t$ for this author's career are given. This is done in the next section.

In the third section we describe a second way to reach an age-independent indicator: we do not divide $h_t$ by $t$ (or $t/10$) but we divide all citation numbers $c_i$ by this career length and then we calculate the h-index of this transformed set of data. We denote this indicator by $b_t$ (if we use $t/10$) and by $b_t^{*}$ (if we use $t$). Also for these indicators we present a mathematical model based on Egghe and Rousseau (2006) and Egghe (2008a, b) and give necessary and sufficient conditions for $b_t$ or $b_t^{*}$ to be constant. An example from this author's career is given.

Also in this section we remark that a third indicator can be constructed to reach age-independence. In the previous two cases we always divide by career length $t$ or $t/10$, which does not take into account the different ages of the published papers. So, theoretically, it makes more sense to divide the number of citations $c_i$ of the $i$th paper by the age of the paper, which in 2012 is given by

$$ 2012 - \text{publication year} + 1 \tag{3} $$

and then calculate the h-index of this transformed set of data. If we use age/10 we denote this indicator $c_t$ and if we use age we denote this indicator $c_t^{*}$ (as we did with the previous two indicators). We prove relations (inequalities) between these three indicators and present an example from this author's career.

The paper closes with some conclusions and suggestions for further research.

A model for Abt's age-independent index

In Egghe and Rousseau (2006) we assumed that the paper-citation system is Lotkaian (Egghe (2005)) with Lotka exponent $\alpha > 1$. We showed there that, if there are $T$ papers in total, the h-index of this system is given by

$$ h = T^{1/\alpha} \tag{4} $$

In Egghe (2009) we assumed that we have a number (density) of publications per time unit (at $t$) equal to $d\,t^{\beta}$, where $d > 0$, $\beta \ge 0$ ($d$ was denoted $b$ in Egghe (2009) but we avoid this in order not to confuse it with the second indicator $b_t$, discussed in the introduction). Note that the case $\beta = 0$ is the case where we have a constant number of publications per year. The total number of publications $T$, dependent on $t$ (denoted as $T(t)$), is given by

$$ T(t) = \int_0^t d\,t'^{\beta}\,dt' \tag{5} $$

$$ T(t) = \frac{d}{\beta+1}\,t^{\beta+1} \tag{6} $$

Combining (4) and (6) (supposing $\alpha$ to be independent of $t$) and denoting the time-dependent h-index $h$ by $h_t$ yields

$$ h_t = \left(\frac{d}{\beta+1}\right)^{1/\alpha} t^{\frac{\beta+1}{\alpha}} \tag{7} $$
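As a quick sanity check of (5) to (7), the closed form for $T(t)$ can be compared with a numerical approximation of the integral. This is only a sketch: the function names and the parameter values for $d$, $\alpha$ and $\beta$ are illustrative assumptions, not values fitted to any data.

```python
def total_publications(t, d, beta):
    """Closed form (6): T(t) = d / (beta + 1) * t**(beta + 1)."""
    return d / (beta + 1) * t ** (beta + 1)


def total_publications_numeric(t, d, beta, steps=100_000):
    """Midpoint-rule approximation of the integral (5), as an independent check."""
    width = t / steps
    return sum(d * ((k + 0.5) * width) ** beta for k in range(steps)) * width


def model_h(t, d, alpha, beta):
    """Model h-index (7): h_t = T(t)**(1/alpha)."""
    return total_publications(t, d, beta) ** (1 / alpha)


# Illustrative parameters only (assumed, not fitted): alpha = beta + 1.
d, alpha, beta = 2.0, 2.0, 1.0
print(total_publications(10, d, beta))                    # 100.0
print(round(total_publications_numeric(10, d, beta), 6))  # 100.0
print(round(model_h(10, d, alpha, beta), 6))              # 10.0
```

With these parameters $\alpha = \beta + 1$, so the model h-index grows linearly in $t$ (here $h_t = t$).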

Note that $h_t$ is a concavely increasing function of $t$ if and only if $\alpha > \beta + 1$, is linear in $t$ if and only if $\alpha = \beta + 1$ and is a convexly increasing function of $t$ if and only if $\alpha < \beta + 1$. By definition of Abt's indicator $a_t$ we have, by (1) and (7),

$$ a_t = 10 \left(\frac{d}{\beta+1}\right)^{1/\alpha} t^{\frac{\beta+1}{\alpha}-1} \tag{8} $$

(and the same for $a_t^{*}$ with the factor 10 deleted). We now have Proposition 1, yielding a necessary and sufficient condition for Abt's claim to be valid.

Proposition 1: The indicator $a_t$ is constant if and only if

$$ \alpha = \beta + 1 \tag{9} $$

The same result is true for $a_t^{*}$.

Proof: This follows trivially from (8).

Note: The case $\alpha = \beta + 1$ is a classical informetric case. The most classical Lotka exponent is $\alpha = 2$ (see Egghe (2005)). This implies that $\beta = 1$ and by (7) we have a linearly increasing $h_t$ function, much in line with Fig. 1 ($h_t$ for this author's career). So Proposition 1 indicates that a constant $a_t$ function is classical and hence supports the finding in Abt (2012).

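We further have Proposition 2.

Proposition 2:

(i) $a_t$ is convexly decreasing if and only if

$$ \alpha > \beta + 1 \tag{10} $$

(ii) $a_t$ is increasing if and only if

$$ \alpha < \beta + 1 \tag{11} $$

(iii) $a_t$ is convexly increasing if and only if

$$ \alpha < \frac{\beta+1}{2} \tag{12} $$

(iv) $a_t$ is concavely increasing if and only if

$$ \frac{\beta+1}{2} < \alpha < \beta + 1 \tag{13} $$

(v) $a_t$ is linearly increasing if and only if

$$ \alpha = \frac{\beta+1}{2} \tag{14} $$

The same results are true for $a_t^{*}$.

Proof: All these results follow trivially from (8).

These results are illustrated by this author's citation data, yielding Fig. 1 ($h_t$) and Fig. 2 ($a_t$).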
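Propositions 1 and 2 amount to reading off the exponent of $t$ in (8), namely $(\beta+1)/\alpha - 1$, and comparing it with 0 and 1. A small sketch of that case distinction follows; the function names and the tolerance `tol` are assumptions made for illustration.

```python
def shape_of_power(exponent, tol=1e-12):
    """Shape of t -> K * t**exponent for K > 0 and t > 0."""
    if abs(exponent) <= tol:
        return "constant"
    if exponent < 0:
        return "convexly decreasing"
    if abs(exponent - 1) <= tol:
        return "linearly increasing"
    return "convexly increasing" if exponent > 1 else "concavely increasing"


def shape_of_abt_index(alpha, beta):
    """Shape of a_t under model (8); the exponent of t is (beta + 1)/alpha - 1."""
    return shape_of_power((beta + 1) / alpha - 1)


print(shape_of_abt_index(2.0, 1.0))  # constant (Proposition 1)
print(shape_of_abt_index(3.0, 1.0))  # convexly decreasing (Proposition 2 (i))
print(shape_of_abt_index(1.5, 1.0))  # concavely increasing (Proposition 2 (iv))
```

The same helper `shape_of_power` applies to any indicator that the model expresses as a constant times a power of $t$.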

Fig. 1: $h(t)$ sequence for this author's career

Fig. 2: $a(t)$ sequence for this author's career

Fig. 1 is an updated version of Fig. 2 in Egghe (2009), updated to 35 career years ($t = 1$ is 1978, the year of the first publication, and $t = 35$ is 2012, the present year). We can say that $h_t$ is convexly increasing ($\alpha < \beta + 1$) but is close to linear ($\alpha \approx \beta + 1$), leading to an increasing $a_t$ (see also Proposition 2 (ii)) but with a relatively constant middle part (see also Proposition 1).

A variant of Abt's age-independent index

There are several ways to correct the h-index for career length. One of them is Abt's index, discussed in the previous section: the simple idea there is to divide the h-index by the career length $t$ (or $t/10$). A similar idea is as follows. We take the citation data $c_i$ ($i = 1, \ldots, T$) and divide all these numbers by $t$ (or $t/10$). For these normalized citation data we calculate the h-index. We denote this age-independent index by $b_t$ (using $t/10$) and by $b_t^{*}$ (using $t$). To model it we invoke a result from Egghe (2008a, b) on transformations of the h-index.

Proposition 3 (Egghe (2008a, b)): Let $h$ be the h-index of the system $c_i$ ($i = 1, \ldots, T$), the number of received citations for paper $i$, where there are $T$ papers in total. If we do not change the total number $T$ of papers but multiply each $c_i$ by a positive number $B$, then the h-index $h^{*}$ of this transformed system is given by

$$ h^{*} = B^{\frac{\alpha-1}{\alpha}}\, h \tag{15} $$

where $\alpha$ is the Lotka exponent of the original system.
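Proposition 3 is quoted without proof. One way to see why the exponent $(\alpha-1)/\alpha$ appears is the following sketch, which assumes a continuous Lotkaian size-frequency function $f(j) = C/j^{\alpha}$ on $j \ge 1$; this assumption and the auxiliary notation $N(j)$ are introduced here for illustration and may differ in detail from the argument in Egghe (2008a, b).

```latex
% Sketch only. Assumption: continuous Lotkaian size-frequency function
% f(j) = C / j^{\alpha}, j \ge 1, \alpha > 1. N(j) is auxiliary notation
% for the number of papers with at least j citations.
\begin{align*}
T &= \int_1^{\infty} \frac{C}{j^{\alpha}}\,dj = \frac{C}{\alpha-1}, \\
N(j) &= \int_j^{\infty} \frac{C}{x^{\alpha}}\,dx = T\,j^{1-\alpha}, \\
N(h) = h &\;\Longrightarrow\; h = T^{1/\alpha}
  && \text{(this is (4))}, \\
N^{*}(j) &= N(j/B) = T\,B^{\alpha-1}\,j^{1-\alpha}
  && \text{(every $c_i$ multiplied by $B$; valid for $j \ge B$)}, \\
N^{*}(h^{*}) = h^{*} &\;\Longrightarrow\;
  h^{*} = T^{1/\alpha}\,B^{\frac{\alpha-1}{\alpha}} = B^{\frac{\alpha-1}{\alpha}}\,h
  && \text{(this is (15))}.
\end{align*}
```

Under this assumption the h-index reacts less than proportionally to a rescaling of the citation counts, since $0 < (\alpha-1)/\alpha < 1$ for $\alpha > 1$.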

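So we have the following formulae for $b_t$ and $b_t^{*}$:

$$ b_t = \left(\frac{1}{t/10}\right)^{\frac{\alpha-1}{\alpha}} h_t \tag{16} $$

$$ b_t^{*} = \left(\frac{1}{t}\right)^{\frac{\alpha-1}{\alpha}} h_t \tag{17} $$

Combining (16) with (7) yields

$$ b_t = 10^{\frac{\alpha-1}{\alpha}} \left(\frac{d}{\beta+1}\right)^{1/\alpha} t^{\frac{\beta+2}{\alpha}-1} \tag{18} $$

and similarly for $b_t^{*}$ (with the factor $10^{\frac{\alpha-1}{\alpha}}$ deleted). From (8) and (18) we now see that

$$ b_t = \frac{1}{10^{1/\alpha}}\, a_t\, t^{1/\alpha} \tag{19} $$

and similarly

$$ b_t^{*} = a_t^{*}\, t^{1/\alpha} \tag{20} $$

Note that, since $t \ge 1$, it follows from (20) that

$$ b_t^{*} \ge a_t^{*} $$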
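Relations (19) and (20) can be checked numerically under the model. The sketch below evaluates $a_t$ from (1), (2) and (7) and $b_t$ from (16), (17) and (7), then asserts the two relations; the function names and the parameter values are illustrative assumptions, not fitted values.

```python
import math


def model_h(t, d, alpha, beta):
    """Model h-index (7)."""
    return (d / (beta + 1)) ** (1 / alpha) * t ** ((beta + 1) / alpha)


def model_a(t, d, alpha, beta, per_decade=True):
    """Abt's index under the model: (1) per decade, (2) per year."""
    return model_h(t, d, alpha, beta) / (t / 10 if per_decade else t)


def model_b(t, d, alpha, beta, per_decade=True):
    """Second index via (16)/(17): rescale h_t with B = 1/(t/10) or B = 1/t."""
    scale = 1 / (t / 10) if per_decade else 1 / t
    return scale ** ((alpha - 1) / alpha) * model_h(t, d, alpha, beta)


# Illustrative parameters only (assumed, not fitted to any data).
d, alpha, beta = 1.5, 2.5, 0.8
for t in (1, 5, 20, 35):
    a, b = model_a(t, d, alpha, beta), model_b(t, d, alpha, beta)
    # Relation (19): b_t = (t / 10)**(1/alpha) * a_t
    assert math.isclose(b, (t / 10) ** (1 / alpha) * a)
    a_star = model_a(t, d, alpha, beta, per_decade=False)
    b_star = model_b(t, d, alpha, beta, per_decade=False)
    # Relation (20), and b*_t >= a*_t for t >= 1
    assert math.isclose(b_star, t ** (1 / alpha) * a_star)
    assert b_star >= a_star or math.isclose(b_star, a_star)
```

At $t = 1$ the two starred indices coincide; for $t > 1$ the second index is strictly larger.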

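for all $t$. This shows that $b_t$ increases faster than $a_t$ (and similarly for $b_t^{*}$ and $a_t^{*}$). We have the following propositions, similar to Propositions 1 and 2.

Proposition 4: The indicator $b_t$ is constant if and only if

$$ \alpha = \beta + 2 \tag{21} $$

The same result is true for $b_t^{*}$.

Proposition 5:

(i) $b_t$ is convexly decreasing if and only if

$$ \alpha > \beta + 2 \tag{22} $$

(ii) $b_t$ is increasing if and only if

$$ \alpha < \beta + 2 \tag{23} $$

(iii) $b_t$ is convexly increasing if and only if

$$ \alpha < \frac{\beta+2}{2} \tag{24} $$

(iv) $b_t$ is concavely increasing if and only if

$$ \frac{\beta+2}{2} < \alpha < \beta + 2 \tag{25} $$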

(v) $b_t$ is linearly increasing if and only if

$$ \alpha = \frac{\beta+2}{2} \tag{26} $$

The same results are true for $b_t^{*}$.

The proofs of Propositions 4 and 5 follow trivially from (18) and the similar result for $b_t^{*}$. From Proposition 4 we see that, for the classical Lotka exponent $\alpha = 2$, $b_t$ is constant if $\beta = 0$. From (7) it follows that $h_t$ is then concavely increasing (since $\alpha > \beta + 1$).

The calculation of $b_t$ is illustrated on this author's data for $t = 35$ (the year 2012). The citation data are as in Table 1. From this table we see that $h = h_{35} = 22$. Hence, by (1),

$$ a_{35} = \frac{h_{35}}{35/10} = 6.2857 \tag{27} $$

For $b_{35}$ we have to divide the $c_i$-values by $35/10$. This is done in Table 2, from which it follows that

$$ b_{35} = 10 \tag{28} $$

Table 1. Citation data of this author, retrieved from the Web of Science on August 21, 2012, yielding $h = h_{35} = 22$

 i     c_i
 1     258
 2     137
 3     106
 4      61
 5      60
 6      52
 7      48
 8      41
 9      41
10      35
11      31
12      31
13      30
14      29
15      26
16      26
17      24
18      24
19      24
20      24
21      23
22      23
23      20
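The values $h_{35} = 22$, $a_{35} = 6.2857$ and $b_{35} = 10$ can be rechecked from the Table 1 data. The sketch below is an illustration, with the helper name `h_index` assumed. Table 1 lists only the 23 most-cited papers; this suffices because the paper at rank 23 already has fewer than 23 citations, so lower-ranked papers cannot change any of the three values.

```python
def h_index(values):
    """h-index of a list of (possibly fractional) citation scores."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, v in enumerate(ranked, start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h


# Citation counts c_i of Table 1 (ranks 1 to 23).
TABLE_1 = [258, 137, 106, 61, 60, 52, 48, 41, 41, 35, 31, 31,
           30, 29, 26, 26, 24, 24, 24, 24, 23, 23, 20]
CAREER_YEARS = 35
decades = CAREER_YEARS / 10

h35 = h_index(TABLE_1)                         # plain h-index
a35 = h35 / decades                            # first method, (1)
b35 = h_index([c / decades for c in TABLE_1])  # second method

print(h35, round(a35, 4), b35)                 # 22 6.2857 10
```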

Table 2. Citation data from Table 1, divided by 3.5, yielding $b_{35} = 10$

 i     c_i/3.5
 1     73.71
 2     39.14
 3     30.29
 4     17.43
 5     17.14
 6     14.86
 7     13.71
 8     11.71
 9     11.71
10     10
11      8.86

Both methods of normalizing the h-index use the career length $t$. There is a third normalizing method. It is the same as the one yielding $b_t$ (or $b_t^{*}$) but instead of dividing each $c_i$ by $t/10$ (or $t$) we now divide each $c_i$ by (article age)/10, respectively by the article age, and calculate the h-index of this new table. The new age-independent indices are denoted $c_t$ and $c_t^{*}$ respectively. Note that, by definition, $c_t \ge b_t$ and $c_t^{*} \ge b_t^{*}$, since article age $\le t$.

The disadvantage of this third method is that it is complex to calculate: we have to check every article, and the new table obtained is no longer in decreasing order. A manual check of this author's citation data in the Web of Science on August 21, 2012 showed that $c_{35} = 19$. Here $c_{35} < h_{35} = 22$, but the simple example in Table 3 shows that $c_t > h_t$ is possible. This occurs when articles are cited quickly. This is a good property.
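The third method can be sketched as follows, using the two-paper example of Table 3 (citation counts 10 and 1, both of age 1). The function names are assumptions made for illustration, and the publication year 2012 is chosen here only so that formula (3) gives age 1. Because the normalized scores are no longer ordered, the helper re-sorts them before ranking.

```python
def h_index(values):
    """h-index of a list of (possibly fractional) citation scores."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, v in enumerate(ranked, start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h


def age_normalized_index(papers, current_year, per_decade=True):
    """h-index of citations divided by article age (age/10 if per_decade)."""
    scores = []
    for citations, publication_year in papers:
        age = current_year - publication_year + 1  # article age as in (3)
        if age <= 0:
            raise ValueError("publication year lies after current_year")
        scores.append(citations / (age / 10 if per_decade else age))
    return h_index(scores)  # h_index sorts the unordered normalized scores


# Table 3 example as (citations, publication year); year assumed so that age = 1.
papers = [(10, 2012), (1, 2012)]
print(h_index([c for c, _ in papers]))     # 1  (h_t)
print(age_normalized_index(papers, 2012))  # 2  (c_t)
```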

Table 3. Example of $h_t = 1 < c_t = 2$

 i     c_i     age     c_i/(age/10)
 1     10      1       100
 2      1      1        10

Discussion on the three correction methods

As discussed above, we have selected three methods for correcting the h-index for career length. The first method is Abt's original proposal (Abt (2012)): simply divide the h-index of a researcher by the career length. A second method presented here is to divide all citation data by this career length and then calculate the h-index of this set of normalized data. A third and last method presented here is to divide each citation number by the age of the cited paper and then calculate the h-index of this set of normalized data.

Clearly the first method is the simplest, followed by the second method. The third method is the most logical one (since each article's citation number is divided by its age) but is the most intricate one, since the order of the normalized citation data is different from the original one and hence one is obliged to consider all papers of the researcher. In this sense, the second method is an acceptable alternative to the third method, since one divides each article's citation number by the career length. This is more logical than simply dividing the h-index by the career length as in the first method (Abt's method). The first method is not logical in this sense, since it suggests that a normalization is obtained by simply dividing the h-index by career length, which would only be logical if the h-index were a linear function of time (career length), which is, by (7), not always the case.

From the above discussion one would be inclined to say that the second method is to be preferred, since it is more logical than the first one and simpler than the third one. However, from (20):

$$ a_t^{*} = \frac{b_t^{*}}{t^{1/\alpha}} \tag{29} $$

indicating that the first method is equivalent to the second method, since it is obtained by dividing the normalized h-index $b_t^{*}$ by a power of the career length, as indicated in (29). So the first method (although too simple in its definition) performs equally well as the second method; hence the first method should be preferred due to its simplicity. We also repeat that the first method yields a constant function (as indicated by Abt) in a classical informetric case: $\alpha = \beta + 1$ (see Proposition 1), which is e.g. the case for $\alpha = 2$ (the most classical Lotka exponent, see Egghe (2005)) and $\beta = 1$ (linear growth of the h-index).

Conclusions and suggestions for further research

This paper presented a mathematical model for the age-independent indicator of Abt. We give characterizations of the different shapes of this function of the career length $t$, among which a characterization of when this function is constant. We show that this happens in classical informetric cases, giving evidence for Abt's claim that this indicator is often $t$-independent.

A second type of age-independent indicator is obtained by not dividing the h-index by $t$ but by dividing each citation number $c_i$ by $t$ and then calculating the h-index of this transformed table. Also for this indicator a mathematical model is presented and characterizations of the different shapes are given, among which a characterization of when this function is constant. Both models are also compared.

A third method of normalizing the h-index for career length $t$ is as in the second method but, instead of dividing every citation number by $t$, we divide by the age of each article. Although this third index is more difficult to calculate, it has the good property that the faster articles are cited, the higher this index becomes.

We encourage the reader to conduct further experiments on these three types of age-independent indices and to define new variants of these three methods.

References

H.A. Abt (2012). A publication index that is independent of age. Scientometrics 91(3), 863-868.

L. Egghe (2005). Power Laws in the Information Production Process: Lotkaian Informetrics. Elsevier, Oxford, UK.

L. Egghe (2008a). The influence of transformations on the h-index and the g-index. Journal of the American Society for Information Science and Technology 59(8), 1304-1312.

L. Egghe (2008b). Examples of simple transformations of the h-index: Qualitative and quantitative conclusions and consequences for other indices. Journal of Informetrics 2(2), 136-148.

L. Egghe (2009). Mathematical study of h-index sequences. Information Processing and Management 45(2), 288-297.

L. Egghe and R. Rousseau (2006). An informetric model for the Hirsch-index. Scientometrics 69(1), 121-129.

J.E. Hirsch (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America 102, 16569-16572.