Significant Digit (Benford's) Law in Publication Citations

10 Feb 2015

I expect that any decent-sized sample of a convex process will have the most
numbers with leading significant digit 1, followed by digit 2, and so on down
to the fewest with digit 9, since an increasing convex function $f(t)$ of
uniformly distributed $t$ grows slowly at first and faster later, so within each
decade it lingers longest among the values that start with 1. To see this in
action I thought to plot the histogram of significant digits of publication
citations, since I think it’s reasonable that the more citations a paper has,
the more likely it is to be cited again. This meets the convex criterion. For
a roughly uniform sampling of $t$, we should collect the citations of papers
of senior researchers (although I make one exception out of curiosity).
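
As a quick sanity check of the convexity argument, here is a minimal Python sketch. It assumes $f(t) = e^t$ as the convex process and uses an evenly spaced grid as a stand-in for a uniform sample of $t$; both are my choices for illustration, as are the names `leading_digit` and `digit_fractions`. It prints each digit's observed share next to $\log_{10}(1 + 1/d)$.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def digit_fractions(values):
    """Fraction of values whose leading digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Assumption: f(t) = exp(t) is the convex process, observed over exactly six decades.
n = 60_000
t_max = 6 * math.log(10)
ts = [(i + 0.5) / n * t_max for i in range(n)]  # evenly spaced stand-in for uniform t
fractions = digit_fractions(math.exp(t) for t in ts)

# Observed share of each digit next to the logarithmic prediction.
for d, frac in fractions.items():
    print(d, round(frac, 4), round(math.log10(1 + 1 / d), 4))
```

The exponential is a special case: its digit shares follow the logarithmic law closely, whereas a general convex function only guarantees the decreasing order within each decade.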

To get the data, I used the Publish or Perish
application, a Windows interface to Google Scholar, and downloaded six
CSV files, one per researcher. Here’s the J code I used to plot the histograms:
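
For readers who don't use J, a rough Python counterpart might look like the sketch below. It is not a translation of the J code. It assumes each CSV export has a header row with a citation column named `Cites`, and the function names and the sample numbers are hypothetical.

```python
import csv
from collections import Counter


def leading_digit_counts(citations):
    """Count leading digits 1-9 over positive citation counts; zeros are skipped."""
    counts = Counter(int(str(c)[0]) for c in citations if c > 0)
    return [counts[d] for d in range(1, 10)]


def read_citations(path, column="Cites"):
    """Read one researcher's citation counts from a CSV export.

    Assumption: the export has a header row and a citation column named "Cites".
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return [int(row[column]) for row in csv.DictReader(handle) if row[column].strip()]


def text_histogram(counts, width=40):
    """Render the nine digit counts as a simple text bar chart."""
    peak = max(counts) or 1
    return "\n".join(
        f"{digit} {'#' * round(width * n / peak)} {n}"
        for digit, n in enumerate(counts, start=1)
    )


# Hypothetical citation counts, for illustration only (not the data from this post).
sample = [0, 1, 3, 12, 15, 18, 104, 27, 9, 130, 2, 41]
print(text_histogram(leading_digit_counts(sample)))
```

With a real export, the call would be something like `leading_digit_counts(read_citations("researcher.csv"))`, where the file name is a placeholder. Papers with zero citations are skipped because they have no leading significant digit.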

So the results match intuition, but the next question is why the distributions
(except for that of the less senior researcher) so closely match
the logarithmic distribution $\log_{10}(1 + 1/x)$, where $x$ is the leading digit. Here’s one answer by Hill 1995.
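
To put a number on "closely match", the expected shares can be computed directly and compared with observed digit counts. The Python sketch below is only illustrative: the `observed` counts are hypothetical, and the largest absolute gap is just one simple distance measure I picked, not something taken from Hill's paper.

```python
import math


def benford_probabilities():
    """Expected share of each leading digit d = 1..9 under Benford's law."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def max_deviation(counts):
    """Largest absolute gap between observed digit shares and Benford's law."""
    total = sum(counts)
    expected = benford_probabilities()
    return max(abs(n / total - p) for n, p in zip(counts, expected))


# Hypothetical digit counts for one researcher (illustration only).
observed = [30, 18, 12, 10, 8, 7, 6, 5, 4]
print([round(p, 3) for p in benford_probabilities()])
print(round(max_deviation(observed), 3))
```

The nine probabilities sum to 1 because the product of the terms $(d + 1)/d$ for $d = 1, \dots, 9$ telescopes to 10.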

I'm a software engineer currently living in Melbourne, working remotely for the
Psychiatry Neuroimaging Laboratory in Boston and a startup in Canada. My
interests are data analysis pipelines and inference, and I'm unduly obsessed
with understanding design principles behind concise, uncomplicated software
systems.