This page is prepared by Wentian Li of North Shore LIJ Research Institute.
Zipf's law, named after the Harvard linguistics professor
George Kingsley Zipf (1902-1950), is the observation that
the frequency of occurrence of some event (P), as a function of
its rank (i), where the rank is determined by that frequency
of occurrence, is a power-law function P_i ~ 1/i^a
with the exponent a close to unity.
The most famous example of Zipf's law is the frequency of English
words. A linked page (also available as a PDF file of the class note)
gives a count of the top 50 words in 423 TIME magazine articles
(245,412 word occurrences in total), with "the" as number one (appearing
15,861 times), "of" as number two (appearing 7,239 times), "to" as number
three (6,331 times), etc. When the number of occurrences is plotted as
a function of the rank (1, 2, 3, etc.), the functional form is a power-law
function with exponent close to 1.
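As a minimal sketch of how such a rank-frequency count and exponent estimate could be done (an illustration only, not the procedure used for the TIME count): the function names rank_frequency and fit_exponent are hypothetical, the tokenizer is a simple assumption, and an ordinary least-squares line on the log-log plot is only one rough way to estimate the exponent a.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent.

    Assumption: a "word" is any run of lowercase letters or apostrophes.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [count for _, count in Counter(words).most_common()]


def fit_exponent(counts):
    """Estimate a in count ~ 1/rank^a.

    Fits a least-squares line to log(count) versus log(rank) and
    returns the negated slope. Assumes counts are positive and
    already sorted in decreasing order (rank 1 first).
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# The three TIME magazine counts quoted in the text (ranks 1, 2, 3).
time_counts = [15861, 7239, 6331]
print(round(fit_exponent(time_counts), 2))
```

Three points are far too few for a reliable estimate; with a full text (for example one downloaded from Project Gutenberg), rank_frequency(text) can be passed directly to fit_exponent.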
If you want to download English texts and analyze them yourself,
get them from Project Gutenberg (National Clearinghouse for Machine Readable Texts);
one mirror site is at UIUC.
The second example Zipf showed in his book was the population of cities
(or the population of communities). The population of a city,
plotted as a function of its rank (the most populous city is ranked
number one, etc.), is a power-law function with exponent close to 1.
The income or revenue of a company as a function of its rank
is also an example of Zipf's law (also in Zipf's book).
This should also be called Pareto's law, because Pareto observed
it at the end of the 19th century.
Does Zipf's law describe rare or common events?
(new on sept-15-1999)
Well, both! It depends on the quantity used in ordering
the events. If an event is number 1 because it is
most popular, Zipf's plot describes the common
events (e.g. the use of English words). On the other
hand, if an event is number 1 because it is unusual
(biggest, highest, largest...), then it describes the
rare events (e.g. city population).
Actually, in Miller's preface of Zipf's book, he
distinguished Zipf's "first law" and "second law",
one for rare events and another for common events.
We don't make such a distinction here (it's hard
to remember which is the first law and which is the
second law!).
Power-law or "stretched exponential" (Weibull) or "log-normal" or "Yule distribution"?
(new on dec-02-2002)
I have yet to find a more complete list, so let me start
compiling papers that question whether a seemingly
power-law function is really a power-law function:
-
J Aitchison, JAC Brown (1954),
"On criteria for descriptions of income distribution",
Metroeconomica, 6:88-98.
-
Colin Martindale, Andrzej K Konopka (1996),
"Oligonucleotide frequencies in DNA follow a
Yule distribution",
Computers & Chemistry, 20(1):35-38. (Yule distribution?)
- Richard Perline, "Zipf's law, the central limit theorem,
and the random division of the unit interval",
Physical Review E, 54(1):220-223 (1996). (Log-normal distribution?)
-
Jean Laherrere,
D Sornette (1998),
"Stretched exponential distributions in Nature and Economy:
'Fat tails' with characteristic scales",
European Physical Journal B, 2:525-539.
( http://xxx.lanl.gov/abs/cond-mat/9801293)
(Stretched exponential distribution?)
-
Ronald Rousseau (1999),
"A weak goodness-of-fit test for rank-frequency distributions",
in Proceedings of the Seventh Conference of the International
Society for Scientometrics and Informetrics,
ed. C. Macias-Chapula, Universidad de Colima (Mexico), pages 421-430.
-
Carlos M Urzua (2000),
"A simple and efficient test for Zipf's Law",
Economics Letters, 66:257-260.
[PDF]
-
Bill Reed (2001),
"The double Pareto-lognormal distribution - A new parametric model
for size distribution", preprint. [note: this paper is on size
distribution, not on rank-frequency distribution.]
-
Allen Downey (2001),
"The structural cause of file size distributions",
Technical Report (Wellesley College).
-
E Limpert, WA Stahl,
M Abbt (2001),
"Lognormal distributions across the sciences: keys and clues",
Bioscience, 51(5):341-352. [a general discussion on the
lognormal distribution] [ PDF ]
-
Z Bi, C Faloutsos, F Korn (2001),
"The 'DGX' distribution for mining massive, skewed data",
Conference on Knowledge Discovery and Data Mining (KDD) 2001.
[PDF ]
-
Michael Mitzenmacher (2002), "A brief history of
generative models for power law and lognormal distributions",
preprint (EECS, Harvard Univ).
[PDF]
Zipf's original work
-
GK Zipf, Selected Studies of the Principle of Relative
Frequency in Language (?, 1932)
-
GK Zipf, The Psycho-Biology of Language
(Houghton-Mifflin, 1935; MIT Press, 1965).
[Zipf actually thought about this 10 years earlier,
i.e., around 1925.]
-
GK Zipf, Human Behavior and the Principle
of Least Effort (Addison-Wesley, 1949).
-
GK Zipf, National Unity and Disunity:
The Nation As a Bio-Social Organism
(Principia Press, Bloomington Indiana, 1941).
pre-Zipf work: "Pareto-Estoup-Zipf law"
-
V Pareto,
Cours d'economie politique
(Droz, Geneva Switzerland, 1896)
(Rouge, Lausanne et Paris, 1897)
-
JB Estoup,
Gammes Stenographiques
(Institut Stenographique de France, Paris, 1916).
-
JC Willis,
Age and area
(Cambridge Univ Press, 1922).
-
GU Yule,
"A mathematical theory of evolution based on the conclusions
of Dr. J.C. Willis, F.R.S. ", Philosophical Transactions
of the Royal Society of London (Series B), 213:21-87 (1925).
-
GU Yule, Statistical Study of Literary Vocabulary
(Cambridge Univ Press, 1944).
Mandelbrot's early work
-
BB Mandelbrot, "Adaptation d'un message a la ligne de
transmission. I & II",
Comptes Rendus (Paris), 232, 1638-1640 & 2003-2005 (1951).
-
BB Mandelbrot,
in "Contribution a la Theorie Mathematique
des Jeux de communication" (Institute of Statistics,
Univ of Paris, page 124, 1953)
-
BB Mandelbrot,
"An informational theory of the statistical
structure of languages", in Communication Theory,
ed. W. Jackson (Butterworth, 1953), pp. 486-502.
-
BB Mandelbrot,
"Simple games of strategy occurring
in communication through natural languages", symposium on
statistical methods in communication engineering (Berkeley,
Aug 17-18, 1953). Appearing in Transactions of IRE
(professional groups on information theory), 3, 124-137 (1954).
-
GA Miller,
"Communication", Annual Review of Psychology, 5,
401-420 (1954).
[a summary of Mandelbrot's result.]
-
BB Mandelbrot, "Information theory and psycholinguistics", in
Scientific Psychology: Principles and Approaches,
eds. B. Wolman, E. Nagel (Basic Books, 1965), pp. 550-562.
-
BB Mandelbrot,
"Les constantes chiffrees du discours", in
Encyclopedie de la Pleiade: Linguistique,
ed. J. Martinet (Gallimard, 1968), pp. 46-56.
Mandelbrot and Simon's debate
- HA Simon (1955),
"On a class of skew distribution functions",
Biometrika, 42:425-440.
[ PDF]
-
BB Mandelbrot, "A note on a class of skew distribution
functions: analysis and critique of a paper by H.A. Simon",
Information and Control, 2, 90-99 (1959).
[ABSTRACT:
This note is a discussion of H.A.
Simon's model (1955) concerning the class of frequency
distributions generally associated with the name of G.K.
Zipf. The main purpose is to show that Simon's model is
analytically circular in the case of the linguistic laws
of Estoup-Zipf and Willis-Yule. Insofar as the economic
law of Pareto is concerned, Simon has himself noted that
his model is a particular case of that of Champernowne;
this is correct, with some reservation. A simplified
version of Simon's model is included. ]
-
HA Simon, "Some further notes on a class of skew distribution
functions", Information and Control, 3, 80-88 (1960).
[ABSTRACT:
This note takes issue with a recent criticism by Dr. B. Mandelbrot
of a certain stochastic model to explain word-frequency data.
Dr. Mandelbrot's principal empirical and mathematical objections
to the model are shown to be unfounded. A central question is whether the
basic parameter of the distributions is larger or smaller than unity.
The empirical data show it is almost always very close to unity,
sometimes slightly larger, sometimes smaller. Simple stochastic
models can be constructed for either case, and give a special status,
as a limiting case, to instances where the parameter is unity.
More generally, the empirical data can be explained by two types
of stochastic models as well as by models assuming efficient information
coding. The three types of models are briefly characterized and compared.
]
-
BB Mandelbrot,
"Final note on a class of skew distribution functions:
analysis and critique of a model due to H.A. Simon",
Information and Control, 4, 198-216 (1961).
[ABSTRACT:
We shall restate in detail our 1959 objections to Simon's 1955 model
for the Pareto-Yule-Zipf distribution. Our objections are valid
quite irrespectively of the sign of p-1, so that most of Simon's
(1960) reply was irrelevant. We shall also analyze the other
points brought up in that reply.
]
-
HA Simon, "Reply to 'final note' by Benoit Mandelbrot",
Information and Control, 4, 217-223 (1961).
[ABSTRACT:
Dr. Mandelbrot's original objections (1959) to using the Yule
process to explain the phenomena of word frequencies were refuted
in Simon (1960), and are now mostly abandoned. The present
"reply" refutes the almost entirely new arguments introduced by
Dr. Mandelbrot in his "final note", and demonstrates again
the adequacy of the models in (1955).
]
-
BB Mandelbrot, "Post scriptum to 'final note'", Information
and Control, 4, 300-304 (1961).
[ABSTRACT:
My criticism has not changed since I first had the privilege
of commenting upon a draft of Simon (1955).
]
-
HA Simon, "Reply to Dr. Mandelbrot's post scriptum",
Information and Control, 4, 305-308 (1961).
[ABSTRACT:
Dr. Mandelbrot has proposed a new set of
objections to my 1955 models of the Yule distribution.
Like his earlier objections, these are invalid.
]
Editorial note: Dr. Mandelbrot feels that no further comment
is needed and this debate terminates herewith.
Zipf's law in natural languages
(updated on december-10-2001)
-
GA Miller, EB Newman (1958), "Tests of a statistical
explanation of the rank-frequency relation
for words in written English", American Journal
of Psychology, 71, 209-218.
-
GA Miller, EB Newman, EA Friedman (1958),
"Length-frequency statistics for written
English", Information and Control, 1, 370-389.
-
Henry Kucera, W Nelson Francis (1967), Computational
Analysis of Present-Day American English
(Brown Univ Press).
[out of print: see
Amazon]
-
Ronald E Wyllys (1975),
"Measuring scientific prose with rank-frequency ('Zipf')
curves: a new use for an old phenomenon," Proceedings
of the American Society for Information Science 12, 30-31.
Washington, DC: American Society for Information Science.
-
H Dahl (1979),
Word Frequencies of Spoken American
(Verbatim).
[rank-frequency of spoken words. the top twenty is:
I, and, the, to, that, you, it, of, a, know, was,
uh, in, but, is, this, me, about, just, don't]
-
R Rousseau, Qiaoqiao Zhang (1992),
"Zipf's data on the frequency of Chinese words revisited",
Scientometrics, 24(2):201-220.
-
EG Bard,
RC Shillcock (1993),
"Competitor effects during lexical access: Chasing Zipf's tail",
In Cognitive Models of Speech Processing: The Second
Sperlonga Meeting,
Eds. GTM Altmann and RC Shillcock (Lawrence Erlbaum Associates).
-
DR Ridley, EA Gonzales (1994),
"Zipf's law extended to small samples of adult speech",
Percept. Mot. Skills, 79:153-154.
-
J Cooke, S Gregor, J Luck, JL Clark,
KT Lua, J McCallum, "Analyzing the conformance of
Chinese text to Zipf's law and Automatic
indexing of natural language text in the
UNIX environment", (transcript of slides, 1996?
Univ of Central Queensland, Australia)
-
J Tuldava (1996),
"The frequency spectrum of text and vocabulary",
Journal of Quantitative Linguistics, 3(1):?-?.
[ABSTRACT: The present paper deals with some problems
of the analysis of the word-frequency distribution and
the possibility of its analytical description ]
-
Colin Martindale, SM Gusein-Zade, Dean McKenzie, and Mark Yu. Borodovsky
(1996), "Comparison of equations describing the ranked frequency distributions
of graphemes and phonemes",
Journal of Quantitative Linguistics, 3(2):?-?.
-
VK Balasubrahmanyan, S Naranan (1996),
"Quantitative linguistics and complex system studies",
Journal of Quantitative Linguistics, 3(3):?-?.
-
S Naranan, VK Balasubrahmanyan (1998),
"Models for power law relations in linguistics and information science",
Journal of Quantitative Linguistics, 5(3):?-?.
-
W Li, Letters to the editor,
Complexity, 3:9-10 (1998).
-
B K Sen, Khong Wye Keen, Lee Soo Hoon, Lim Bee Ling, Mohd Rafae Abdullah,
Ting Chang Nguan, Wee Siu Hiang (1998),
"Zipf's law and writings on LIS",
Malaysian Journal of Library & Information Science, 3(2):93-98.
[ abstract ]
-
R Rousseau (1998), "George Kingsley Zipf: life, ideas and
recent developments of his theories", preprint (talk presented
at the Beijing International Seminar of Quantitative Evaluation of
R&D in Universities, and Fifth All-China Annual Meeting for
Scientometrics and Informetrics. Dec 4-6, 1998).
-
Leo Egghe (1999),
"On the law of Zipf-Mandelbrot for multi-word phrases",
Journal of the American Society for Information Science, 50:?-?.
-
Claudia Prun (1999),
"G.K. Zipf's conception of language as an
early prototype of synergetic linguistics",
Journal of Quantitative Linguistics, 6(1):?-?.
-
MA Nowak (2000),
"The basic reproductive ratio of a word, the maximum size of a lexicon",
Journal of Theoretical Biology, 204(2):179-189.
-
Marcelo A Montemurro (2001),
"Beyond the Zipf-Mandelbrot law in quantitative linguistics",
arxiv.org e-print, cond-mat/0104066,
[ abstract ]
-
Alexander Gelbukh, Grigori Sidorov (2001),
"Zipf and Heaps laws' coefficients depend on language",
Proceedings of the Conference on Intelligent Text Processing
and Computational Linguistics (CICLing'2001), ed. Alexander Gelbukh,
Lecture Notes in Computer Science, Vol 2004 (Springer-Verlag),
pp. 332-335.
-
AB Downey (2001),
"Evidence for long-tailed distributions in the Internet",
Proceedings of ACM SIGCOMM Internet Measurement Workshop 2001.
online reports (new on sept-15-1999)
Zipf's law in natural languages
(papers written in non-English languages)
(new on feb-05-2002, I would like to thank Dr. Gabriel Altmann
for this collection)
-
JB Estoup (1916),
Les Gammes Stenographiques
Paris, Institut Stenographique. (in French)
-
W Skalmowski (1961),
"Polskie przeklady Hafiza w swietle prawa Zipfa-Mandelbrota",
Sprawozdania Kom. Orient. PAN 125-127.
-
VM Kalinin (1964),
"O statistike literaturnogo teksta",
Voprosy jazykoznanija Nr. 1, ?-?.
-
VM Kalinin (1964),
Razvitie schemy Puassona i ee primenenie dlja statisticeskich svojstv reci,
Leningrad: Diss.
(in Russian)
-
Ju A Srejder (1967),
"O vozmoznosti teoreticeskogo vyvoda statisticeskich zakonomernostej teksta
(k obosnovaniju zakona Cipfa)",
in Problemy peredaci informacii, Vol 3, 57-63. Moskva.
-
EA Kalinina (1968),
"Izucenie leksiko-statisticeskich zakonomernostej na osnove verojatnostnoj modeli",
in Statistika reci i avtomaticeskij analiz teksta,
Leningrad, ?-?.
-
G Billmeier (1969),
Worthaufigkeiten vom Zipfschen Typ, uberprüft an deutschem
Textmaterial, Hamburg: Buske. (in German)
-
Ju K Orlov (1970),
"Statisticeskaja struktura soobscenij, optymal'nych dlja celoveceskogo
vosprijatija",
Naucno-techniceskaja informacija, 2m(8):11-16.
-
PM Alekseev, ST Naval'na (1971),
"Pro graficnij opis zaleznosti 'rang-castota' lingvisticeskich odinic",
Visnik Char'kivskogo universitetu 64, filologija, vyp. 8:?-?.
-
GG Belonogov, AP Novoselov (1971),
"Nekotorye kolicestvennye zakonomernosti v automatizirovannych informacionnych
sistemach",
in Avtomaticeskaja pererabotka teksta metodami prikladnoj lingvistiki.
Materialy vsesojuznoj konferencii: 219-220. Kisinev.
-
BA Volosin, JK Orlov (1972),
Obobscennyj zakon Cipfa-Mandel'brota i raspredelenie cvetovych ploscadej
v proizvedenijach zivopisi, Tbilisi, AN GSSR Institut kibernetiki.
-
LS Kozackov (1973),
Sistemy potokov naucnoj informacii,
Kiev: Naukova dumka.
-
MV Arapov, EN Efimova (1975),
"Ponjatie leksiceskoj struktury teksta",
Naucno-techniceskaja informacija, 2:3-7.
-
MV Arapov, EN Efimova, Ju A Srejder (1975),
"O smysle rangovych raspredelenij",
Naucno-techniceskaja informacija, 2:9-20.
-
MV Arapov, EN Efimova, Ju A Srejder (1975),
"Rangovye raspredelenija v tekste i jazyke",
Naucno-techniceskaja informacija, 2:?-? .
-
AT Micevic (1975),
"Issledovanija struktury potokov naucno-techniceskoj informacii
po masinostroenii",
Naucno-techniceskaja informacija, 2(5):3-16.
-
Ju K Orlov (1976),
"O svjazi mezdu raspredeleniem Pareto i obobscennym zakonom Cipfa-Mandel'brota",
Bulletin of the Academy of Sciences of the Georgian SSR, 83:57-60.
-
Ju K Orlov (1976),
"Obobscennyj zakon Cipfa-Mandelbrota i castotnye struktury
informacionnych edinic razlicnych urovnej",
in Vycislitel'naja lingvistika,
ed. EK Guseva, pp. 179-202. Moskva: Nauka.
-
E Schurer (1976),
Das Zipfsche Gesetz in der fruhen Kindersprache,
Munchen: Diss. (in German)
-
MV Arapov, JA Srejder (1977),
"Klassifikacija i rangovye raspredelenija",
Naucno-techniceskaja informacija, 2(1-12):15-21.
-
MV Arapov (1977),
"Dve modeli rangovogo raspredelenija",
Voprosy informacionnoj teorii i praktiki, 4: 3-42.
-
AI Jablonskij (1977),
"Struktura i dinamika sovremennoj nauki",
in Sistemnye issledovanija. Ezegodnik 1976 ,
ed. DM Gvisiani, pp. 66-90. Moskva: Nauka.
-
SV Kopejkin, VE Ostapenko (1977),
"Zakon Cipfa i sopostavitel'nyj analiz castotnych struktur anglijskogo,
francuzskogo, rumynskogo i russkogo jazykov na baze matematiceskich modelej",
Naucnye trudy Kujbysevskogo pedagogiceskogo instituta, 193:91-94.
-
PM Alekseev (1978),
"O nelinejnych formulirovkach zakona Cipfa",
Voprosy kibernetiki 41:53-65.
-
MV Arapov, JA Srejder (1978),
"Zakon Cipfa i princip dissimetrii sistemy",
Semiotika i informatika, 10:74-95.
-
LS Kozackov (1978),
"Informacionnye sistemy s ierarchiceskoj ('rangovoj') strukturoj",
Naucno-techniceskaja informacija, 2(8):15-24.
-
W Marx, E Schurer-Necker (1978),
"Uberlegungen zur Interpretation des Zipfschen Gesetzes am Beispiel der fruhen Kindersprache",
Glottometrika, 1:154-167. (in German)
-
A Rouault (1978),
"Loi de Zipf et sources markoviennes",
Annales de l'Institut H. Poincare, 14:169-188.
(in French)
-
H Birkhahn (1979),
"Das 'Zipfsche Gesetz', das schwache Prateritum und die germanische Lautverschiebung",
Sitzungsberichte der osterreichischen Akademie der Wissenschaften,
philosophisch-historische Klasse 348.
(in German)
-
L Hoffmann, RG Piotrowski (1979),
Beitrage zur Sprachstatistik, Leipzig: ?
-
C Muller (1979),
"Du nouveau sur les distributions lexicales: la formule de Waring-Herdan",
in Langue Francaise et Linguistique Quantitative,
ed. C Muller, pp. 177-195. Geneve: Slatkine (in French).
-
A Babanarov (1980),
"Castotnyj slovnik i avtomaticeskij slovar' dlja masynnogo perevoda
tereckich gazetnych textov",
in Inzenernaja lingvistika i optimizacija prepodavanija inostrannych
jazykov, Leningrad, pp.?-?.
-
MG Boroda (1980),
"Haufigkeitsstrukturen musikalischer Texte",
Glottometrika, 3:36-69.
(in German)
-
Ju K Orlov (1980),
"Informacionnye potoki: statisticeskij analiz i prognozirovanie",
Naucno-techniceskaja informacija, 2(2):23-30.
-
Ju K Krylov (1982),
"Stacionarnaja model' porozdenija svjaznogo teksta",
Acta et Commentationes Universitatis Tartuensis, 774:81-102.
-
Ju K Orlov (1982),
"Dynamik der Haufigkeitsstrukturen",
in Studies on Zipf's Law, eds. H Guiter,
MV Arapov, pp. 116-153. Bochum: Brockmeyer.
(in German)
-
Ju K Orlov (1982),
"Ein Modell der Haufigkeitsstruktur des Vokabulars",
in Studies on Zipf's Law, eds. H Guiter,
MV Arapov, pp. 154-233. Bochum: Brockmeyer.
(in German)
-
Ju K Orlov (1982),
"Linguostatistik: Aufstellung von Sprachnormen oder Analyse
des Redeprozesses? Die Antinomie 'Sprache-Rede' in der statistischen
Linguistik", in ? , eds. Ju K Orlov, MG Boroda,
IS Nadarejsvili, pp. 1-55.
-
Ju K Orlov, MG Boroda, IS Nadarejsvili (1982),
Sprache, Text, Kunst. Quantitative Analysen,
Bochum, Brockmeyer.
(in German)
-
AN Lebedev (1983),
"Zakonomernosti postroenija slov v reci",
Psichologiceskij zurnal, 4/5:11-23.
-
SD Haitun (1983),
Naukometrika. Sostojanie i perspektivy,
Moskva: Nauka.
-
Ju K Orlov, RY Chitashvili (1983),
"Generalized Z-distribution generating the well-known 'Rank-Distributions'",
Bulletin of the Academy of Sciences of the Georgian SSR, 110(2):269-272.
-
VN Byckov (1984),
"K probleme obobscenija i interpretacija rangovych raspredelenij v
statisticeskoj lingvistike",
Ucenye zapiski TGU, 689:61-70.
-
RG Piotrowski, KB Bektaev, AA Piotrovskaja (1985),
Mathematische Linguistik , Bochum, Brockmeyer.
(in German)
-
J Tuldava (1985),
"Castotnaja struktura teksta i zakon Cipfa",
Ucenye zapiski, TGU 711, 93-116.
-
G Altmann (1988),
Wiederholungen in Texten, Bochum, Brockmeyer.
(in German)
-
Ju K Orlov (1988),
"Unsichtbare Harmonie",
Musikometrika, 1:281-315.
-
C Prun (1995),
Die linguistischen Hypothesen von G.K. Zipf aus systemtheoretischer Sicht,
Trier: Magisterarbeit.
-
A Knuppel (1997),
Untersuchungen zum Zipf-Mandelbrot Gesetz an deutschen Texten,
Gottingen: Staatsexamensarbeit.
(in German)
-
RG Piotrovskij, KB Bektaev, AA Piotrovskaja (1997),
Matematiceskaja lingvistika, Moskva: Nauka.
-
J Tuldava (1998),
Probleme und Methoden der quantitativ-systemischen Lexikologie,
Trier: WVT.
-
A Knuppel (2001),
"Untersuchungen zum Zipf-Mandelbrot-Gesetz an deutschen Texten",
in Haufigkeitsverteilungen in Texten ed. KH Best,
pp. 248-280. Gottingen: Peust & Gutschmidt.
(in German)
Zipf's law in monkey-typing texts
(updated on feb-12-2002)
-
GA Miller (1957),
"Some effects of intermittent silence",
American Journal of Psychology, 70:311-314.
-
GA Miller, N Chomsky (1963), in
Handbook of Mathematical Psychology II,
eds. R. Luce, R. Bush, E. Galanter
(Wiley), pp. 419-491.
-
J Nicolis (1991), Chaos and Information
Processing: A Heuristic Outline (World Scientific).
[out of print, see
Amazon]
-
W Li (1992),
"Random texts exhibit Zipf's-law-like word frequency distribution",
IEEE Transactions on Information Theory, 38(6):1842-1845.
-
W Li (1996),
Comments to "Bell curves and monkey languages" (letter to the
editor), Complexity, 1(6):6.
- Richard Perline (1996), "Zipf's law, the central limit theorem,
and the random division of the unit interval",
Physical Review E, 54(1):220-223.
-
G Troll, P beim Graben (1998), "Zipf's law is not a
consequence of the central limit theorem",
Physical Review E, 57(2), 1347-1355.
-
Leo Egghe (2000),
"General study of the distribution of N-tuples of letters or
words based on the distribution of the single letters or words",
Mathematical and Computer Modelling, 31:35-41.
-
Leo Egghe (2000),
"The distribution of N-grams", Scientometrics, 47(2):237-252.
-
Ramon Ferrer, Richard V Sole (2002),
"Zipf's law and random texts",
Advances in Complex Systems, to appear.
Turing's formula?
-
Christer Samuelsson (1996),
"Relating Turing's formula and Zipf's law",
Proceedings of the 4th Workshop on Very Large Corpora, Copenhagen, Denmark.
[ abstract ]
Connection with information theory
(added on may-10-2002)
-
P Harremoees, F Topsoe (2001),
"Maximum entropy fundamentals",
Entropy, 3:227-292.
-
P Harremoees, F Topsoe (2002),
"Zipf's law, hyperbolic distributions and entropy loss",
IEEE International Symposium on Information Theory (ISIT) Proceedings, in press.
Zipf's law discussed in popular books/Tutorial
-
BB Mandelbrot (1982), The Fractal Geometry of Nature
(W.H. Freeman and Company).
section 38 "scaling and power laws without geometry".
[ Amazon entry]
-
George A Miller (1991),
The Science of Words
(Scientific American Library, a division of HPHLP, distributed
by W.H. Freeman and Company).
[
Amazon entry]
-
Manfred Schroeder (1991),
Fractals, Chaos, Power Laws
(W.H. Freeman and Company), pp. 35-38.
[ Amazon entry]
-
Murray Gell-Mann (1994), The Quark and the Jaguar
(W.H. Freeman and Company), pp.92-97.
[
Amazon entry]
-
Lada A Adamic,
Zipf, Power-laws, and Pareto - a ranking tutorial (online tutorial:
http://ginger.hpl.hp.com/shl/papers/ranking/ )
Zipf's law in city populations
(updated on jul-30-2001)
-
F Auerbach (1913),
"Das Gesetz der Bevolkerungskonzentration",
Petermanns Geographische Mitteilungen, LIX:73-76.
- Bruce M Hill (1970),
"Zipf's law and prior distributions for the composition
of a population", Journal of the American Statistical
Association, 65:1220-1232.
-
R Gunther, L Levitin, B Schapiro, P Wagner (1996),
"Zipf's law and the effect of ranking on probability distribution",
International Journal of Theoretical Physics, 35(2):395-417.
-
Hernan A Makse, Shlomo Havlin, H Eugene Stanley (1995),
"Modelling urban growth patterns",
Nature, 377:608-612.
-
P Krugman (1996),
The Self-Organizing Economy (Blackwell, Cambridge, MA).
-
DH Zanette
and SC Manrubia (1997),
"Role of intermittency in urban development: a model of large-scale
city formation",
Physical Review Letters, 79:523-526.
[ PDF]
comments by M Marsili, S Maslov and Y-C Zhang, and reply
at Physical Review Letters, 80:4831(1998).
(note: the x-axis in the paper is city population, not rank)
-
SC Manrubia,
DH Zanette
(1998),
"Intermittency model for urban development",
Physical Review E, 58:295-302.
-
Matteo Marsili, Yi-Cheng Zhang (1998),
"Interacting individuals leading to Zipf's law",
Physical Review Letters, 80(12):2741-2744.
[ PDF]
-
X Gabaix (1999),
"Zipf's law for cities: an explanation", Quarterly Journal of Economics,
114:739-767.
-
Bill Reed (2002),
"On the rank-size distribution for human settlements",
J Regional Science, 41:1-17.
[ PDF ]
-
LC Malacarne, RS Mendes, EK Lenzi (2002),
"q-exponential distribution in urban agglomeration",
Physical Review E, 65(1):article 017106.
Zipf's law in Web Access Statistics and Internet Traffic
(updated on mar-07-2001)
See also:
Mark Crovella's publication list
Jakob Nielsen's column "Zipf curve and website popularity"
Jakob Nielsen's column "Traffic from referring sites"
Hewlett-Packard's information dynamics group
-
Steve Glassman, "A caching relay for the world wide web",
In First International World-Wide Web Conference, pages 69-76
(May 1994).
( html)
-
WE Leland,
MS Taqqu,
W Willinger, DV Wilson (1994),
"On the self-similar nature of Ethernet traffic ",
IEEE/ACM Transactions on Networking, 2:1-15.
-
Carlos R Cunha, Azer Bestavros,
Mark E Crovella ,
"Characteristics of WWW client-based traces",
Technical Report TR-95-010, Boston University Computer
Science Department, June 1995.
-
Virgilio Almeida, Azer Bestavros,
Mark Crovella, and Adriana de Oliveira
(1996), "Characterizing reference locality in the WWW",
Boston University Computer Science Department, TR-96-11, June 1996.
In Proceedings of the Fourth International Conference on Parallel and Distributed
Information Systems (PDIS '96), December 1996.
-
Martin F Arlitt, Carey L Williamson (1997),
"Internet web server: workload characterization and performance
implications", IEEE/ACM Transactions on Networking, 5(5):631-645.
-
ME Crovella,
A Bestavros (1997),
"Self-similarity in world wide web traffic: evidence and possible
causes", IEEE/ACM Transactions on Networking, 5(6):835-846.
-
P Barford,
ME Crovella ,
"Generating representative web workloads for network
and server performance evaluation,"
in Proceedings of Performance '98/ACM SIGMETRICS '98,
151-160, Madison WI. [Slightly expanded version
appears as BUCS-TR-1997-006, November 4, 1997.]
-
ME Crovella,
Murad S Taqqu, Azer Bestavros (1998),
"Heavy-tailed probability distributions in the world wide web",
in A Practical Guide To Heavy Tails,
eds RJ Adler, RE Feldman,
MS Taqqu, Chapter 1, 3-26 (Chapman & Hall)
-
N Nishikawa, T Hosokawa, Y Mori, K Yoshida, H Tsuji (1998),
"Memory-based architecture for distributed WWW caching proxy",
Computer Networks and ISDN Systems, 30:205-214.
-
BA Huberman, PLT Pirolli, JE Pitkow, RM Lukose,
"Strong regularities in world wide web surfing",
Science, 280:95-97 (April 3, 1998).
-
M Harchol-Balter,
ME Crovella,
CD Murta (1998),
"On choosing a task assignment policy for a distributed
server system,"
in Proceedings of Performance Tools '98, Lecture Notes
in Computer Science Vol 1469, pp. 231-242, 1998.
-
ME Crovella,
R Frangioso, M Harchol-Balter (1999),
"Connection Scheduling in Web Servers,"
Boston University Computer Science Technical Report BUCS-TR-99-003.
-
ME Crovella,
MS Taqqu (1999),
"Estimating the heavy tail index from scaling properties,"
Methodology and Computing in Applied Probability, 1(1):?-?.
-
P Barford, A Bestavros, A Bradley, and
ME Crovella (1999),
"Changes in Web client access patterns: characteristics and
caching implications," to appear in World Wide Web, Special
Issue on Characterization and Performance Evaluation.
-
Albert-Laszlo Barabasi, Reka Albert (1999),
"Emergence of scaling in random networks", Science,
286(5439):509-512. (may be relevant, but I haven't checked)
An ABC News online article on this work can be found at
http://abcnews.go.com/sections/science/WhosCounting/whoscounting991201.html
(Dec 1, 1999)
-
JM Carlson, J Doyle (1999),
"Highly optimized tolerance: a mechanism for power laws in designed systems",
Physical Review E, 60(2):1412-1427. [PDF]
(This paper describes a general theory for power laws, not just
in internet traffic, but there is a section on this particular
application.)
-
Lee Breslau, Pei Cao, Li Fan, Graham Phillips, Scott Shenker
(1999),
"Web caching and Zipf-like distributions: evidence and implications",
Proceedings of INFOCOM'99 (IEEE Press).
[ abstract]
[ PDF]
-
Sidney Resnick, Holger Rootzen (2000),
"Self-similar communication models and very heavy tails",
Annals of Applied Probability, 10(3):753-778.
-
Lada A Adamic,
Bernardo A Huberman (2000),
"The nature of markets in the World Wide Web",
Quarterly Journal of Electronic Commerce, 1:5-12.
[ PDF]
-
Anders Johansen, Didier Sornette (2000),
"Download relaxation dynamics on the WWW following newspaper
publication of URL",
Physica A, 276:338-345.
-
AB Downey (2001),
"Evidence for long-tailed distributions in the Internet",
ACM SIGCOMM Internet Measurement Workshop (November 2001).
-
AB Downey (2001), "The structural causes of file size distributions",
Ninth International Symposium on
Modeling, Analysis and Simulation of Computer and
Telecommunication Systems (MASCOTS'2001).
-
Michael Mitzenmacher (2002), "Improved
models for file size distribution",
preprint (EECS, Harvard Univ).
Zipf's law in bibliometrics, informetrics, scientometrics,
and library science
(updated on mar-07-2001)
This is similar to Zipf's law in natural languages, but
discussed in the context of information retrieval and library
science.
Some links to conferences:
7th International Conference on Scientometrics and Informetrics (July 5-9, 1999, Mexico)
6th International Conference on Scientometrics and Informetrics
(June 16-19, 1997, Israel)
a collection
of links on bibliometrics
-
AJ Lotka (1926),
"The frequency distribution of scientific productivity",
Journal of the Washington Academy of Sciences, 16:317-323.
-
RA Fairthorne (1969),
"Empirical hyperbolic distributions (Bradford Zipf Mandelbrot)
for bibliometric description and prediction",
Journal of Documentation, 25:319-343.
-
Bertram Brookes (1977),
"Theory of the Bradford law",
Journal of Documentation, 33:180-209.
-
Ronald E Wyllys (1981),
"Empirical and theoretical bases of Zipf's law,"
Library Trends. Summer; 30(1):53-64.
-
Bertram Brookes (1982),
"Quantitative analysis in the humanities: the advantage of
ranking techniques", in Studies on Zipf's law, ed. H Guiter,
MV Arapov (Brockmeyer), pages 65-115.
-
J Fedorowicz (1982),
"A Zipfian model of an automatic bibliographic system: an application
to MEDLINE", Journal of American Society of Information Science,
33:223-232.
-
Bertram Brookes (1984),
"Towards informetrics: Haitun, Laplace, Zipf, Bradford
and Alvey programme", Journal of Documentation, 40:120-143.
- Linus Ikpaahindi (1985),
"An overview of bibliometrics: its measurements,
laws and their applications", Libri, 35(2):163-177.
-
Ye-Sho Chen, Ferdinand F Leimkuhler (1986),
"A relationship between Lotka's law, Bradford's law, and
Zipf's law", Journal of the American Society for Information
Science, 37:307-314.
-
Ye-Sho Chen, Ferdinand F Leimkuhler (1987),
"Analysis of Zipf's law: an index approach",
Information Processing and Management, 23:171-182.
-
Ye-Sho Chen, Ferdinand F Leimkuhler (1987),
"Bradford's law: an index approach",
Scientometrics, 11:183-198.
-
Leo Egghe (1989),
The Duality of Informetric Systems with Applications to
the Empirical Laws, Ph.D Thesis (City University, London).
-
Michael J Nelson (1989),
"Stochastic models for the distribution of index terms",
Journal of Documentation, 45:227-237.
-
Howard White, Katherine W McCain (1989),
"Bibliometrics", Annual Review of Information Science and
Technology, 24:119-186.
-
Abraham Bookstein (1990),
"Informetric distributions. Part I: unified overview",
Journal of the American Society for Information Science,
41:368-375.
-
Leo Egghe (1990),
"The duality of informetric systems with applications to the
empirical laws", Journal of Information Science, 16:17-27.
-
Leo Egghe,
Ronald Rousseau (1990),
Introduction to Informetrics: Quantitative Methods in Library,
Documentation and Information Science (Elsevier).
-
Liwen Qiu (1990),
"An empirical examination of the existing models for
Bradford's law", Information Processing and Management,
26:655-672.
-
Ronald Rousseau (1990),
"Relations between continuous versions of bibliometric laws",
Journal of the American Society for Information Science,
41(3):197-203.
-
Leo Egghe (1991),
"The exact place of Zipf's and Pareto's law amongst
the classical informetric laws", Scientometrics, 20:93-106.
-
Ronald Rousseau, Qiaoqiao Zhang (1992),
"Zipf's data on the frequency of Chinese words revisited",
Scientometrics, 24:201-220.
-
Ronald Rousseau, Sandra Rousseau (1993),
"Informetric distributions: a tutorial review",
CJILS/RCSIB, 18(2):51-63.
-
Quoniam Luc, Balme Frederic, Rostaing Herve, Giraud Eric, Dou Jean Marie
(1997),
"Bibliometric law used for information retrieval",
in Proceedings of the Sixth Conference of the International
Society for Scientometrics and Informetrics,
eds. Bluma C Peritz, Leo Egghe, Hebrew Univ of Jerusalem.
-
S Redner (1998),
"How popular is your paper?
An empirical study of the citation distribution",
European Physical Journal B, 4:131-134.
(http://xxx.lanl.gov/abs/cond-mat/9804163)
-
ZK Silagadze's preprint:
"Citations and the Zipf-Mandelbrot's law",
arxiv.org e-print , physics/9901035
[ abstract ],
Complex Systems, 11:487-499 (1997).
-
Another preprint, C Tsallis, MP de Albuquerque,
"Are citations of scientific papers a case of nonextensivity?",
(March 1999)
(http://xxx.lanl.gov/abs/cond-mat/9903433)
-
ZK Silagadze (2000),
"Citations and the Zipf-Mandelbrot law", Complex Systems, 11(6):?-?.
-
Robert Losee (2001), "Term dependence: a basis for Luhn and
Zipf models", Journal of the American Society for Information Science and Technology,
52(12):1019-1025. [ PDF ]
Zipf's law in finance and business
(updated on sep-09-2001)
- Of course, Pareto's paper should be listed here.
- If the distribution is plotted not as a rank-frequency
plot but as the number of companies in each revenue/sales/income/whatever
category (this is actually the other type of Zipf's plot,
see Zipf [1935]), the log-normal distribution is usually relevant (I haven't
had the chance to trace the references...)
- D Champernowne (1953), "A model of income distribution",
Economic Journal, 63:318-351.
- BB Mandelbrot (1963),
"", Journal of Business, 36:394-?.
- BB Mandelbrot (1963),
"New methods in statistical economics",
Journal of Political Economy, 71:421-440.
-
E Fama (1965),
" ". Management Science, 11:404-419.
-
JP Bouchaud (1995),
"More Levy distributions in physics", in Levy Flights
and Related Topics in Physics, Lecture Notes in Physics 450 (Springer),
pp 239-250.
-
MHR Stanley, SV Buldyrev, S Havlin, RN Mantegna, MA Salinger,
HE Stanley (1995), "Zipf's plots and the size distribution
of firms",
Economics Letters, 49:453-457.
-
BB Mandelbrot (1997),
Fractals and Scaling in Finance : Discontinuity,
Concentration, Risk (Springer-Verlag, Nov 1997)
[ Amazon entry ]
-
D. Sornette, D. Zajdenweber,
"Economic returns of research: the Pareto law and its implications",
European Physical Journal B, 8:653-664 (1998).
( abstract)
-
JP Bouchaud,
D Sornette, C Walter, JP Aguilar,
"Taming large events: optimal portfolio theory for
strongly fluctuating assets",
International Journal of Theoretical and
Applied Finance, 1:25-41 (1998).
-
N. Vandewalle and M. Ausloos,
"The n-Zipf analysis of financial data series
and biased data series",
Physica A, 268:170-176 (1999).
-
Greg Ip, "Analyst discovers the order in
internet stocks valuations",
Wall Street Journal, Dec 27 (1999).
[ http://interactive.wsj.com/articles/SB946246776318315015.htm ]
[ a local copy ]
-
J J Ramsden, Gy Kiss-Haypál (2000),
"Company size distribution in different countries",
Physica A, 277:220-227.
-
Sorin Solomon, Peter Richmond (2000),
"Stability of Pareto-Zipf law in non-stationary economies",
arxiv.org e-print, cond-mat/0012479.
[ abstract]
-
H Aoyama, W Souma, Y Nagahara, M P Okazaki, H Takayasu,
M Takayasu (2000),
"Pareto's law for income of individuals and debt of bankrupt companies",
Fractals, 8(3):293-300.
-
A Dragulescu, VM Yakovenko (2001),
"Evidence for the exponential distribution of income in the USA",
European Physical Journal B, 20:585-589.
-
Bill Reed (2000),
"The Pareto law of incomes - an explanation and an extension",
submitted.
-
Bill Reed (2001),
"The Pareto, Zipf and other power laws",
Economics Letters, in press.
[note: the paper also contains a model for Zipf's law in general.]
[ PDF]
-
Robert L Axtell (2001),
"Zipf distribution of US firm sizes",
Science, 293(5536):1818-1820. [note: it's a frequency-size
plot, not the size-rank plot.] [ PDF]
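The two kinds of plot distinguished in this section (the size-rank plot, and the count of companies in each size category) can be built from the same list of sizes. Below is a minimal Python sketch, not taken from any paper listed here. The function names, the made-up revenue list, and the choice of factor-of-two size categories are all assumptions for illustration.

```python
import math

def rank_size(sizes):
    """Size-rank (Zipf) plot data: (rank, size) pairs, largest size first."""
    ordered = sorted(sizes, reverse=True)
    return list(enumerate(ordered, start=1))

def size_histogram(sizes):
    """Frequency-size plot data: number of items in each category
    [2**k, 2**(k+1)), keyed by k. All sizes must be positive."""
    counts = {}
    for s in sizes:
        k = math.floor(math.log2(s))
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical revenues (arbitrary units), invented for illustration only.
revenues = [1000 / r for r in range(1, 21)]

for rank, size in rank_size(revenues)[:3]:
    print(rank, size)            # points of the size-rank plot
print(size_histogram(revenues))  # counts per size category
```

Plotting the first output on double-logarithmic axes gives the size-rank picture; plotting the second gives the per-category count, which is the form where the log-normal question arises.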
Zipf's law in ecological systems
(updated on dec-02-2002)
(well, I haven't checked the original papers,
so I'm not sure the papers are in the right
place ...)
-
BM Hill, "The rank-frequency form of Zipf's law",
Journal of the American Statistical Association, 69:1017-1026 (1974).
-
Juan Camacho, Richard V Sole
(2001) "Scaling in ecological size spectra",
Europhysics Letters, 55:774-780.
-
WJ Reed, BD Hughes (2002),
"On the size distribution of live genera",
Journal of Theoretical Biology, 217:?-?
-
WJ Reed, BD Hughes (2002),
"From gene families and genera to incomes and internet file sizes:
why power-laws are so common in nature",
Physical Review E, to appear.
Zipf's law in earthquakes?
-
D Sornette, L Knopoff, YY Kagan, C Vanneste,
"Rank-ordering statistics of extreme events:
application to the distribution of large earthquakes",
Journal of Geophysical Research, 101(B6):13883-13894 (1996).
[ PDF ]
Biomolecular sequences, Genomics
(note that I didn't use the words "Zipf's law", because these are not!)
-
G Gamow, M Ycas (1955),
"Statistical correlation of protein and ribonucleic
acid composition", Proceedings of the National Academy of Sciences, 41(12):1011-1019
(Dec 15, 1955).
- I wouldn't list the recent papers on the so-called Zipf's law for
subsequences of DNA sequences, because these rank-frequency plots do not
follow a power law well, and the slope in the double-logarithmic plot is
far from -1. These are rank-frequency plots, but they are not Zipf's law!
-
E Bornberg-Bauer (1997), "How are model protein structures distributed
in sequence space?", Biophysical Journal, 73(5):2393-2403.
[If I understood correctly, some protein structures correspond to many
protein sequences, whereas other structures correspond to fewer sequences.
So structures can be ranked...]
-
M Gerstein, H Hegyi (1998),
"Comparing genomes in terms of protein structure: surveys of a
finite parts list", FEMS Microbiology Reviews, 22(4):277-304. [well, the words "Zipf's
law" are mentioned in the abstract...]
-
Vladimir A Kuznetsov (2001),
"Distribution associated with stochastic processes of
gene expression in a single eukaryotic cell",
EURASIP Journal on Applied Signal Processing, 4:285-296.
[ PDF ]
-
W Li, Y Yang (2002),
"Zipf's law in importance of genes for cancer classification using
microarray data", Journal of Theoretical Biology,
219:539-551.
or: arxiv.org e-print,
[ physics/0104028 ]
-
Vladimir A Kuznetsov (2002),
"Statistics of the numbers of transcripts and protein sequences
encoded in the genome", in
Computational and Statistical Approaches to
Genomics (Kluwer). [ PDF]
-
NM Luscombe, J Qian, Z Zhang, T Johnson, M Gerstein (2002),
"The dominance of the population by a selected few: power-law behaviour
applies to a wide variety of genomic properties", Genome Biology, 3:research0040.
-
WJ Reed, BD Hughes (2002),
"A model explaining the size distribution of gene and protein
families", submitted to Discrete and Continuous Dynamical Systems - B.
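The objection raised earlier in this section, that a rank-frequency plot only counts as Zipf's law when it is close to a straight line with slope near -1 on double-logarithmic axes, can be checked roughly with a least-squares fit. The sketch below is only a quick diagnostic under that assumption, not a rigorous estimator. The function name and both test series are hypothetical.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Frequencies must be positive, with at least two values.
    Rank 1 is assigned to the largest frequency.
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(f) for f in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Two made-up series: an exact 1/rank law and a much flatter one.
zipf_like = [1000 / r for r in range(1, 51)]
flat_like = [1000 / r ** 0.3 for r in range(1, 51)]

print(loglog_slope(zipf_like))  # close to -1
print(loglog_slope(flat_like))  # close to -0.3, far from -1
```

A slope near -1 alone is not enough; the points should also lie close to the fitted line, which is worth checking by eye on the plot itself.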
Estimation issues
- BM Hill, "A simple general approach to inference about the tail
of a distribution", Annals of Statistics, 3, 1163-1174 (1975).
-
G.S. Lo, "Asymptotic behavior of Hill's estimate and application",
Journal of Applied Probability, 23, 922-936 (1986).
-
BM Hill, "Bayesian forecasting of
extreme values in an exchangeable sequence",
Journal of Research of the National Institute of Standards and Technology,
99:521-538 (1994).
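For readers who want to try the tail estimate associated with Hill's 1975 paper above, here is a minimal Python sketch of what is usually called the Hill estimator. It assumes positive data with a power-law tail P(X > x) ~ x^(-alpha). The function name, the choice of k (how many of the largest values to use), and the synthetic test data are assumptions of this sketch.

```python
import math

def hill_estimator(data, k):
    """Hill estimate of the tail index alpha from the k largest values.

    Uses the (k+1)-th largest value as the threshold:
        1/alpha = (1/k) * sum_{i=1..k} log(X_(i) / X_(k+1))
    All values involved must be positive.
    """
    ordered = sorted(data, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(data)")
    threshold = ordered[k]  # the (k+1)-th largest value
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess

# Synthetic data: evenly spaced quantiles of a Pareto law with alpha = 2.
n = 10000
sample = [((j - 0.5) / n) ** (-1 / 2.0) for j in range(1, n + 1)]
print(hill_estimator(sample, 1000))  # should come out near 2
```

Under this tail assumption, a tail index alpha corresponds to a rank-size exponent of 1/alpha, so alpha near 1 is the Zipf case. The estimate can be sensitive to k, so it is worth trying several values.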
Miscellaneous
(updated on jul-05-2001)
-
CJ Brackenridge (1978),
"A study of phenotypic arrays derived from seven genetic systems in
an Australian population sample", Ann. Human Biology, 5:381-388.
-
P Schuster, PF Stadler (1994),
"Landscapes: complex optimization problems and biopolymer structures",
Computers & Chemistry, 18(3):295-324.
-
P Schuster, W Fontana, PF Stadler, IL Hofacker (1994),
"From sequences to shapes and back: a case study in RNA secondary
structures",
Proceedings of Royal Society of London (B. Biological Sciences),
255:279-284.
-
P Schuster (1995),
"How to search for RNA structures. Theoretical concepts
in evolutionary biotechnology", Journal of Biotechnology,
41(2-3):239-257.
["The frequency with which a structure is realized in sequence space
is inversely proportional to some power c > 1 of the structure's
frequency rank, thus following a (generalized) Zipf law"]
-
MS Watanabe (1996),
"Zipf's law in percolation", Physical Review E, 53(4):4187-4190.
-
JD Burgos, P Moreno-Tovar (1996),
"Zipf-scaling behavior in the immune system",
Biosystems, 39(3):227-232.
-
YG Ma (1999),
"Zipf's law in the liquid gas phase transition of nuclei",
European Physical Journal A, 6:367-371.
-
Piqueira JR, Monteiro LH, de Magalhaes TM, Ramos RT, Sassi RB, Cruz EG
(1999),
"Zipf's law organizes a psychiatric ward", Journal of Theoretical Biology,
198:439-443. [what?]
-
J Kalda, M Sakki, M Vainu, M Laan (Oct 2001),
"Zipf's law in human heartbeat dynamics",
arxiv.org e-print, physics/0110075.
[ abstract]
-
WJ Reed, BD Hughes (2002),
"On the distribution of family names",
Physica A, to appear.
Relation with Benford's Law (also called first-digit law)?
(new on sep-19-2001)
-
L Pietronero, E Tosatti, V Tosatti, A Vespignani (2001),
"Explaining the uneven distribution of
numbers in nature: the laws of Benford and Zipf",
Physica A, 293:297-304.
More links to Benford's law:
- S Newcomb, "Note on the frequency of use of
the different digits in natural numbers", American
Journal of Mathematics, 4:39-40 (1881).
- Frank Benford, "The law of anomalous numbers",
Proc. American Phil Society, 78:551-572 (1938).
- RA Raimi, "The peculiar distribution of
first digits", Scientific American, 221:109-119 (Dec 1969)
- J Burke, E Kincanon (1991),
"Benford's law and physical constants: the
distribution of initial digits", American
Journal of Physics, 14:59-63.
-
Mark J Nigrini,
The Detection of Income
Tax Evasion Through an Analysis
of Digital Frequencies
(Ph.D Thesis, Univ Cincinnati, 1992)
(currently a professor of accountancy at the
Southern Methodist University, Dallas, TX)
-
"He's got their number: Scholar uses
math to foil financial fraud" (Wall Street Journal, July 10, 1995)
- E Ley, "On the peculiar distribution of the US stock
indices digits", American Statistician, 1995
- Theodore P Hill,
"A statistical derivation of the significant-digit law",
Statistical Science, 10(4):354-363 (1995).
-
M Nigrini, "A taxpayer compliance application of
Benford's law", Journal of the American Taxation Association,
18:72-91 (1996).
- TP Hill,
"The first digit phenomenon", American
Scientist, 86:358-363 (1998).
- Matthews,
"The power of one", New Scientist, July 10, 1999.
- Eric Weisstein's
Treasure Troves of Science
http://www.treasure-troves.com/math/BenfordsLaw.html
-
Alexander Bogomolny's
Interactive Mathematics Miscellany and Puzzles
http://www.cut-the-knot.com/do_you_know/zipfLaw.html
-
New York Times, Aug 4, 1998
"Following Benford's Law, or Looking Out for No. 1"
(a copy from
http://courses.nus.edu.sg/course/mathelmr/080498sci-benford.htm)
-
LM Leemis, BW Schmeiser, DL Evans (2000),
"Survival distributions satisfying Benford's law",
The American Statistician, 54:1-6.
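For a quick feel for the first-digit law that these references discuss, Benford's law in its usual form gives the probability of leading digit d as log10(1 + 1/d). The Python sketch below compares that with the observed first digits of a data set. The function names are hypothetical, and the powers of 2 are used only as a convenient made-up test series.

```python
import math
from collections import Counter

def benford_expected():
    """Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """Leading nonzero decimal digit of a nonzero number."""
    # Scientific notation always starts with the leading digit.
    return int(f"{float(abs(x)):.15e}"[0])

def first_digit_frequencies(values):
    """Observed share of each leading digit 1..9 (zeros are skipped)."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one nonzero value")
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Illustrative data only: the first 1000 powers of 2.
powers = [2 ** n for n in range(1000)]
expected = benford_expected()
observed = first_digit_frequencies(powers)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Digit 1 should lead roughly 30% of the time in both columns, with the share falling steadily toward digit 9.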