This page attempts to show the first uses of various words used in probability and statistics. It contains words related to probability and statistics extracted, with his permission, from the Earliest Known Uses of Some of the Words of Mathematics pages of Jeff Miller. Research for his pages is ongoing, and the uses cited in this page should not be assumed to be the first uses that occurred unless it is stated that the term was introduced or coined by the mathematician named. If you are able to antedate any of the entries herein, please contact Jeff Miller, a teacher at Gulf High School in New Port Richey, Florida, who maintains the aforementioned pages. See also Jeff Miller's Earliest Uses of Various Mathematical Symbols. Texts in red are by Kees Verduin.

**Analysis of variance.**

In The History of Statistics: The Measurement of Uncertainty before 1900, Stephen M. Stigler writes, "Yule derived what we now, following Fisher, call the analysis of variance breakdown." [James A. Landau]

The form of diagram, however, is much older; there is an example from William Playfair's
*Commercial and Political Atlas* of 1786 at
http://www.york.ac.uk/depts/maths/histstat/playfair.gif.

*Bar graph* is found in 1925 in *Statistics* by B. F. Young:
"Bar-graphs in the form of progress charts are used to represent a changing
condition such as the output of a factory" (OED2).

*Biased sample* is found in 1911 in *An Introduction to the
Theory of Statistics* by G. U. Yule: "Any sample, taken in the
way supposed, is likely to be definitely *biassed,* in the sense
that it will not tend to include, even in the long run, equal
proportions of the A's and [alpha]'s in the original material"
(OED2).

*Biased sampling* is found in F. Yates, "Some examples of
biassed sampling," *Ann. Eugen.* 6 (1935) [James A. Landau].

*Central limit theorem* appears in the title "Ueber den
zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung," *Math.
Z.,* 15 (1920) by George Polya (1887-1985) [James A. Landau].
Polya apparently coined the term in this paper.

*Central limit theorem* appears in English in 1937 in *Random
Variables and Probability Distributions* by H. Cramér
(David, 1995).

*Central tendency* is found in 1929 in Kelley & Shen in C.
Murchison, *Found. Exper. Psychol.* 838: "Some investigators
have often preferred the median to the mean as a measure of central
tendency" (OED2).

**Conditional probability.**

Let *A* and *B* be two events whose probabilities are (A) and (B). It is understood that the probability (A) is determined without any regard to *B* when nothing is known about the occurrence or nonoccurrence of *B*. When it *is* known that *B* occurred, *A* may have a different probability, which we shall denote by the symbol (A, B) and call "conditional probability of *A,* given that *B* has actually happened." [James A. Landau]
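The notion quoted above is the modern P(A|B) = P(A and B)/P(B). It can be illustrated with a small enumeration (the two-dice example is mine, not from the quoted text):

```python
from fractions import Fraction
from itertools import product

# 36 equally likely rolls of two dice.
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if sum(o) == 8}      # event A: the sum is 8
B = {o for o in outcomes if o[0] % 2 == 0}    # event B: the first die is even

p_B = Fraction(len(B), len(outcomes))
p_AB = Fraction(len(A & B), len(outcomes))
p_A_given_B = p_AB / p_B                      # the conditional probability (A, B)

print(p_A_given_B)                            # 1/6
```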

The form of this solution consists in determining certain intervals, which I propose to call the confidence intervals..., in which we may assume are contained the values of the estimated characters of the population, the probability of an error in a statement of this sort being equal to or less than 1 − ε, where ε is any number 0 < ε < 1, chosen in advance.

In the modern literature this notion is usually called
*Fisher-consistency* (a name suggested by Rao) to distinguish it
from the more standard notion linked to the limiting behavior of a
sequence of estimators. The latter is hinted at in Fisher's writings
but was perhaps first set out rigorously by Hotelling in "The
Consistency and Ultimate Distribution of Optimum Statistics,"
*Transactions of the American Mathematical Society* (1930).
[This entry was contributed by John Aldrich, based on David (1995).]

This result enables us to start from the mathematical theory of independent probability as developed in the elementary text books, and build up from it a generalised theory of association, or, as I term it, contingency. We reach the notion of a pure contingency table, in which the order of the sub-groups is of no importance whatever. [This citation was provided by James A. Landau.]

The term *coefficient of correlation* was apparently originated
by Edgeworth in 1892, according to Karl Pearson's "Notes on the
History of Correlation" (reprinted in Pearson & Kendall, 1970).
It appears in 1892 in F. Y. Edgeworth, "Correlated Averages,"
*Philosophical Magazine, 5th Series,* 34, 190-204.

*Correlation coefficient* appears in a paper published in 1895
[James A. Landau].

The OED2 shows a use of *coefficient of correlation* in 1896 by
Pearson in *Proc. R. Soc.* LIX. 302: "Let *r*_{0}
be the coefficient of correlation between parent and offspring."
David (1995) gives the 1896 paper by Karl Pearson, "Regression,
Heredity, and Panmixia," *Phil. Trans. R. Soc., Ser. A.* 187,
253-318. This paper introduced the product moment formula for
estimating correlations--Galton and Edgeworth had used different
methods.

**Partial correlation.** G. U. Yule introduced "net coefficients"
for "coefficients of correlation between any two of the variables
while eliminating the effects of variations in the third" in "On the
Correlation of Total Pauperism with Proportion of Out-Relief" (in
Notes and Memoranda) *Economic Journal,* Vol. 6, (1896), pp.
613-623. Pearson argued that partial and total are more appropriate
than net and gross in Karl Pearson & Alice Lee "On the
Distribution of Frequency (Variation and Correlation) of the
Barometric Height at Divers Stations," *Phil. Trans. R. Soc.,*
Ser. A, 190 (1897), pp. 423-469. Yule went fully partial with his
1907 paper "On the Theory of Correlation for any Number of Variables,
Treated by a New System of Notation," *Proc. R. Soc. Series A,*
79, pp. 182-193.

**Multiple correlation.** At first multiple correlation referred
only to the general approach, e.g. by Yule in *Economic
Journal* (1896). The coefficient arrives later. "On the Theory of
Correlation" (*J. Royal Statist. Soc.,* 1897, p. 833) refers to
a coefficient of double correlation *R*_{1} (the
correlation of the first variable with the other two). Yule (1907)
discussed the coefficient of n-fold correlation
*R*^{2}_{1(23...n)}. Pearson used the phrases
"coefficient of multiple correlation" in his 1914 "On Certain Errors
with Regard to Multiple Correlation Occasionally Made by Those Who
Have not Adequately Studied this Subject," *Biometrika,* 10, pp.
181-187, and "multiple correlation coefficient" in his 1915 paper "On
the Partial Correlation Ratio," *Proc. R. Soc. Series A,* 91,
pp. 492-498.

[This entry was largely contributed by John Aldrich.]

**Covariance.**

Earlier uses of the term *covariance* are found in mathematics,
in a non-statistical sense.


*Decile* appears in 1882 in Francis Galton, *Rep. Brit. Assoc.
1881* 245: "The Upper Decile is that which is exceeded by
one-tenth of an infinitely large group, and which the remaining
nine-tenths fall short of. The Lower Decile is the converse of this"
(OED2).

*Dependent variable* appears in 1831 in the second edition of
*Elements of the Differential Calculus* (1836) by John Radford
Young: "On account of this dependence of the value of the function
upon that of the variable the former, that is *y,* is called the
*dependent* variable, and the latter, *x,* the
*independent* variable" [James A. Landau].

*Directly proportional* is found in 1796 in *A Mathematical
and Philosophical Dictionary*: "Quantities are said to be directly
proportional, when the proportion is according to the order of the
terms" (OED2).

*Direct variation* is found in 1856 in
*Ray's higher arithmetic. The principles of arithmetic, analyzed and practically applied*
by Joseph Ray (1807-1855):

Variation is a general method of expressing proportion often used, and is either direct or inverse. Direct variation exists between two quantities when they increase together, or decrease together. Thus the distance a ship goes at a uniform rate, varies directly as the time it sails; which means that the ratio of any two distances is equal to the ratio of the corresponding times taken in the same order. Inverse variation exists between two quantities when one increases as the other decreases. Thus, the time in which a piece of work will be done, varies inversely as the number of men employed; which means that the ratio of any two times is equal to the ratio of the numbers of men employed for these times, taken in reverse order. [This citation was taken from the University of Michigan Digital Library; James A. Landau.]

See also W. G. Cochran and C. I. Bliss, "Discriminant functions with
covariance," *Ann. Math. Statist.* 19 (1948) [James A. Landau].


The English term appears in J. L. Doob's "The Limiting Distributions of Certain Statistics,"
*Annals of Mathematical Statistics,* **6**, (1935), 160-169.


In regression analysis a **DUMMY VARIABLE** indicates the presence (value 1)
or absence (value 0) of an attribute.

A JSTOR search found "dummy variables" for social class and for region in H. S.
Houthakker's "The Econometrics of Family Budgets" *Journal of the Royal Statistical Society A,*
**115**, (1952), 1-28.

A 1957 article by D. B. Suits, "Use of Dummy Variables in Regression Equations" *Journal of the
American Statistical Association,* **52**, 548-551, consolidated both the device and the name.

The International Statistical Institute's *Dictionary of Statistical Terms*
objects to the name: the term is "used, rather laxly, to denote an artificial variable expressing
qualitative characteristics .... [The] word 'dummy' should be avoided."

Apparently these variables were not dummy enough for Kendall & Buckland, for whom a dummy variable signifies "a quantity written in a mathematical expression in the form of a variable although it represents a constant", e.g. when the constant in the regression equation is represented as a coefficient times a variable that is always unity.

The indicator device, without the name "dummy variable" or any other, was also used by writers on
experiments who put the analysis of variance into the format of the general linear hypothesis, e.g.
O. Kempthorne in his *Design and Analysis of Experiments* (1952) [John Aldrich].
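The device is easily sketched (a hypothetical illustration of mine, not taken from any of the papers cited above): with a single 0/1 dummy regressor, least squares reduces to comparing group means, the slope being the difference of the means and the intercept the mean of the absent-attribute group.

```python
from statistics import mean

# Hypothetical data: response y, dummy d = 1 if the attribute is present.
y = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
d = [0, 0, 0, 1, 1, 1]

y0 = [yi for yi, di in zip(y, d) if di == 0]
y1 = [yi for yi, di in zip(y, d) if di == 1]

b0 = mean(y0)              # OLS intercept: mean response when d = 0
b1 = mean(y1) - mean(y0)   # OLS slope: shift attributable to the attribute

print(b0, b1)              # 11.0 10.0
```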

Dorothy Geddes and Sally I. Lipsey, "The Hazards of Sets," *The
Mathematics Teacher,* October 1969 has: "The fact that
mathematicians refer to *the* empty set emphasizes the rather
unique nature of this set."

An older term is *null set,* q. v.

The terms *estimation* and *estimate* were introduced in R.
A. Fisher's "On the Mathematical Foundations of Theoretical
Statistics" (*Phil. Trans. R. Soc.* 1922). He writes (none too
helpfully!): "Problems of estimation are those in which it is
required to estimate the value of one or more of the population
parameters from a random sample of the population." Fisher uses
*estimate* as a substantive sparingly in the paper.

The phrase *unbiassed estimate* appears in Fisher's
*Statistical Methods for Research Workers* (1925, p. 54)
although the idea is much older.

The expression *best linear unbiased estimate* appears in 1938
in F. N. David and J. Neyman, "Extension of the Markoff Theorem on
Least Squares," *Statistical Research Memoirs,* 2, 105-116.
Previously in his "On the Two Different Aspects of the Representative
Method" (*Journal of the Royal Statistical Society,* 97,
558-625) Neyman had used *mathematical expectation estimate* for
unbiased estimate and *best linear estimate* for best linear
unbiased estimate (David, 1995).

The term *estimator* was introduced in 1939 in E. J. G. Pitman,
"The Estimation of the Location and Scale Parameters of a Continuous
Population of any Given Form," *Biometrika,* 30, 391-421. Pitman
(pp. 398 & 403) used the term in a specialised sense: his
estimators are estimators of location and scale with natural
invariance properties. Now estimator is used in a much wider sense so
that Neyman's best linear unbiased *estimate* would be called a
best linear unbiased *estimator* (David, 1995). [This entry was
contributed by John Aldrich.]

*Event* took on a technical existence when Kolmogorov in the
*Grundbegriffe der Wahrscheinlichkeitsrechnung*
(1933) identified "elementary events" ("elementare Ereignisse") with the elements of a collection
*E* (now called the "sample space") and "random events" ("zufällige Ereignisse")
with the elements of a set of subsets of *E* [John Aldrich].

According to Burton (p. 461), the word *expectatio* first
appears in van Schooten's translation of a tract by Huygens.

The two references above point to the same text, as Huygens's *De
Ratiociniis in Ludo Alae* was a translation by van Schooten. NB The word
expectatio is used quite frequently throughout the text.

This is the Latin translation by Van Schooten of the first proposition:

Si a vel b expectem, quorum utriusque aeque facile mihi obtingere possit, expectatio mea dicenda est (a+b)/2.

This is the Dutch text of Huygens' first proposition:

Als ick gelijcke kans hebbe om a of b te hebben, dit is my so veel weerdt als (a+b)/2.

The literal translation of the Dutch text is: "If I have an equal chance to have a or b, this is worth as much to me as (a+b)/2."

*Expectation* appears in English in Browne's 1714 translation of
Huygens's *De Ratiociniis in Ludo Alae* (David 1995).

This is Browne's 1714 translation of the first proposition:

If I expect a or b, and have an equal chance of gaining either of them, my Expectation is worth (a+b)/2

See also *mathematical expectation.*

See also L. H. C. Tippett, "On the extreme individuals and the range
of samples taken from a normal population," *Biometrika* 17
(1925) [James A. Landau].

The term *F distribution* is found in Leo A. Aroian, "A study of
R. A. Fisher's *z* distribution and the related *F*
distribution," *Ann. Math. Statist.* 12, 429-448 (1941).


*Gaussian distribution* and *Gaussian law* were used by
Karl Pearson in 1905 in *Biometrika* IV: "Many of the other
remedies which have been proposed to supplement what I venture to
call the universally recognised inadequacy of the Gaussian law ..
cannot .. effectively describe the chief deviations from the Gaussian
distribution" (OED2).

In an essay in the 1971 book *Reconsidering Marijuana,* Carl
Sagan, using the pseudonym "Mr. X," wrote, "I can remember one
occasion, taking a shower with my wife while high, in which I had an
idea on the origins and invalidities of racism in terms of gaussian
distribution curves. I wrote the curves in soap on the shower wall,
and went to write the idea down."


The term was also used by Aristotle.

According to the *Catholic Encyclopedia,* the
word *harmonic* first appears in a work on conics by Philippe de
la Hire (1640-1718) published in 1685.

*Harmonical mean* is found in English in the 1828 *Webster*
dictionary:

Harmonical mean, in arithmetic and algebra, a term used to express certain relations of numbers and quantities, which are supposed to bear an analogy to musical consonances.

*Harmonic mean* is also found in 1851 in
*The principles of the solution of the Senate-house 'riders,' exemplified by the solution of those proposed
in the earlier parts of the examinations of the years 1848-1851* by Francis James Jameson:
"Prove that the discount on a sum of money is half the harmonic mean between the principal and the interest"
[University of Michigan Digital Library].
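Jameson's rider checks out numerically. A quick verification (with figures of my own choosing; here the "sum" is the principal *S,* the interest is *S·rt,* and the true discount is the interest divided by 1 + *rt*):

```python
# Hypothetical figures: a sum of 1200 with interest-rate-times-time of 5%.
principal = 1200.0
rate_time = 0.05

interest = principal * rate_time            # simple interest on the sum
discount = interest / (1 + rate_time)       # true (banker's) discount on the sum

# Harmonic mean of principal and interest: 2PI/(P + I).
harmonic_mean = 2 * principal * interest / (principal + interest)

# The discount equals half the harmonic mean, as the 1851 rider asserts.
print(discount, harmonic_mean / 2)
```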

Many authors prefer the spelling *heteroskedasticity.* J. Huston
McCulloch (*Econometrica* 1985) discusses the linguistic aspects
and decides for the *k*-spelling. Pearson recalled that when he
set up *Biometrika* in 1901 Edgeworth had insisted the name be
spelled with a *k.* By 1932 when *Econometrica* was founded
standards had fallen or tastes had changed. [This entry was
contributed by John Aldrich, referring to OED2 and David, 1995.]

In *Philos. Trans. R. Soc. A.* CLXXXVI, (1895) 399 Pearson explained that
the term was "introduced by the writer in his lectures on statistics as a term for a common form of graphical
representation, i.e., by columns marking as areas the frequency corresponding to the range of their base."

S. M. Stigler writes in his *History of Statistics* that Pearson used the term in his 1892 lectures
on the geometry of statistics.

The earliest citation in the OED2 is from 1891, quoted in E. S. Pearson's
*Karl Pearson* (1938).

According to Heinz Lueneburg, the term *numero sano* "was used
extensively by Luca Pacioli in his *Summa.* Before Pacioli, it
was already used by Piero della Francesca in his *Trattato
d'abaco.* I also find it in the second edition of Pietro Cataneo's
*Le pratiche delle due prime matematiche* of 1567. I haven't
seen the first edition. Counting also Fibonacci's Latin *numerus
sanus,* the word *sano* was used for at least 350 years to
denote an integral (untouched, virginal) number. Besides the words
*sanus, sano,* the words *integer, intero, intiero* were
also used during that time."

The first citation for *whole number* in the OED2 is
from about 1430 in *Art of Nombryng* ix. EETS 1922:

Of nombres one is lyneal, ano(th)er superficialle, ano(th)er quadrat, ano(th)er cubike or hoole. In the above quotation (th) represents a thorn.

*Whole number* is found in its modern sense in the title of one
of the earliest and most popular arithmetics in the English language,
which appeared in 1537 at St. Albans. The work is anonymous, and its
long title runs as follows: "An Introduction for to lerne to reken
with the Pen and with the Counters, after the true cast of arismetyke
or awgrym in hole numbers, and also in broken" (Julio González
Cabillón).

Oresme used *intégral.*

*Integer* was used as a noun in English in 1571 by Thomas Digges
(1546?-1595) in * A geometrical practise named Pantometria*:
"The containing circles Semidimetient being very nighe 11 19/21 for
exactly nether by integer nor fraction it can be expressed" (OED2).

*Integral number* appears in 1658 in Phillips: "In Arithmetick
integral numbers are opposed to fraction[s]" (OED2).

*Whole number* is most frequently defined as Z+, although it is
sometimes defined as Z. In *Elements of the Integral Calculus*
(1839) by J. R. Young, the author refers to "a whole number or 0" but
later refers to "a positive whole number."

See also W. Feller, "On the Kolmogorow-Smirnov limit theorems for
empirical distributions," *Ann. Math. Statist.* 19 (1948)
[James A. Landau].


*Latin square* appears in English in 1890 in the title of a
paper by Arthur Cayley, "On Latin Squares" in *Messenger of
Mathematics.*

The term was introduced into statistics by R. A. Fisher, according to
Tankard (p. 112). Fisher used the term in 1925 in *Statistical
Methods Res. Workers* (OED2).

*Graeco-Latin square* appears in 1934 in R. A. Fisher and F.
Yates, "The 6 x 6 Latin Squares," *Proceedings of the Cambridge
Philosophical Society* 30, 492-507.

**Law of large numbers.** According to Porter (p. 12), Poisson coined the term in 1835.

Formerly, likelihood was a synonym for probability, as it still is in everyday English.
(See the entry on *maximum likelihood* and the passage quoted there for Fisher's attempt
to distinguish the two. In 1921 Fisher referred to the value that maximizes the likelihood as "the optimum.")

*Likelihood* first appeared in a Bayesian context in H. Jeffreys's *Theory of Probability* (1939)
[John Aldrich, based on David (2001)].

The principle (without a name) can be traced back to R. A. Fisher's writings of the 1920s though
its clearest earlier manifestation is in Barnard's 1949 "Statistical Inference" (*Journal of the Royal
Statistical Society. Series B,* **11,** 115-149). On these earlier outings the principle attracted
little attention.


The standing of "likelihood ratio" was confirmed by S. S. Wilks's "The Large-Sample Distribution of the
Likelihood Ratio for Testing Composite Hypotheses," *Annals of Mathematical Statistics,* **9,**
(1938), 60-62 [John Aldrich, based on David (2001)].

See also *expectation.*

The term *maximum likelihood* was introduced by R. A. Fisher in "On the Mathematical Foundations of Theoretical Statistics" (*Phil. Trans. R. Soc.* 1922), where he writes:

The solution of the problems of calculating from a sample the parameters of the hypothetical population, which we have put forward in the method of maximum likelihood, consists, then, simply of choosing such values of these parameters as have the maximum likelihood. Formally, therefore, it resembles the calculation of the mode of an inverse frequency distribution. This resemblance is quite superficial: if the scale of measurement of the hypothetical quantity be altered, the mode must change its position, and can be brought to have any value, by an appropriate change of scale; but the optimum, as the position of maximum likelihood may be called, is entirely unchanged by any such transformation. Likelihood also differs from probability in that it is not a differential element, and is incapable of being integrated: it is assigned to a particular point of the range of variation, not to a particular element of it.
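Fisher's point about invariance can be checked numerically (a small illustration of mine, not Fisher's own computation): the value of *p* that maximizes a binomial likelihood is unchanged if we rescale the parameter, here to the log-odds.

```python
import math

# Binomial data: k = 7 successes in n = 10 trials; L(p) = p^k (1-p)^(n-k).
n, k = 10, 7

def likelihood(p):
    return p**k * (1 - p)**(n - k)

# Maximize over p directly on a fine grid.
grid_p = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid_p, key=likelihood)

# Maximize over the rescaled parameter theta = log(p/(1-p)), then map back.
grid_theta = [i / 1000 for i in range(-5000, 5001)]
theta_hat = max(grid_theta, key=lambda t: likelihood(1 / (1 + math.exp(-t))))
p_back = 1 / (1 + math.exp(-theta_hat))

# Both routes locate the same optimum, p = k/n = 0.7.
print(p_hat, round(p_back, 3))
```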

In 1571, *A geometrical practise named Pantometria* by Thomas
Digges (1546?-1595) has: "When foure magnitudes are...in continual
proportion, the first and the fourth are the extremes, and the second
and thirde the meanes" (OED2).

*Mean* is found in 1755 in Thomas Simpson, "An ATTEMPT to shew
the Advantage, arising by Taking the Mean of a Number of
Observations, in practical Astronomy," *Philosophical Transactions
of the Royal Society of London.*

*Mean error* is found in 1853 in *A dictionary of arts, manufactures, and mines;
containing a clear exposition of their principles and practice* by Andrew Ure
[University of Michigan Digital Library].

*Mean error* is found in English in an 1857 translation of Gauss's *Theoria motus*:
"Consequently, if we desire the greatest accuracy, it will be necessary to compute the geocentric
place from the elements for the same time, and afterwards to free it from the mean error A, in order that
the most accurate position may be obtained. But it will in general be abundantly sufficient if the mean
error is referred to the observation nearest to the mean time" [University of Michigan Digital Library].

In 1894 in *Phil. Trans. Roy. Soc,* Karl Pearson has "error of
mean square" as an alternate term for "standard-deviation" (OED2).

In *Higher Mathematics for Students of Chemistry and Physics*
(1912), J. W. Mellor writes:

In Germany, the favourite method is to employ the *mean error,* which is defined as *the error whose square is the mean of the squares of all the errors,* or the "error which, if it alone were assumed in all the observations indifferently, would give the same sum of the squares of the errors as that which actually exists." ... The mean error must not be confused with the "mean of the errors," or, as it is sometimes called, the *average error,* another standard of comparison defined as the mean of all the errors regardless of sign.

In a footnote, Mellor writes, "Some writers call our 'average error' the 'mean error,' and our 'mean error' the 'error of mean square'" [James A. Landau].
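The two standards Mellor contrasts are, in modern terms, the root-mean-square error and the mean absolute error. A minimal sketch (with hypothetical residuals of my own):

```python
import math

# Hypothetical observation errors.
errors = [1.0, -2.0, 3.0, -4.0]

# Mellor's "mean error": the error whose square is the mean of the squared errors.
mean_error = math.sqrt(sum(e**2 for e in errors) / len(errors))

# Mellor's "average error": the mean of the errors regardless of sign.
average_error = sum(abs(e) for e in errors) / len(errors)

print(round(mean_error, 4), average_error)   # 2.7386 2.5
```

The RMS figure always weakly exceeds the mean absolute figure, which is why the two must not be confused.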


*Median* was used in English by Francis Galton in *Report of
the British Association for the Advancement of Science* in 1881:
"The Median, in height, weight, or any other attribute, is the value
which is exceeded by one-half of an infinitely large group, and which
the other half fall short of" (OED2).

The term *method of least squares* is a translation of Legendre's *méthode des moindres carrés,* introduced in 1805.

"Minimum" and "small" were the early English translations of
*moindres* (David, 1995).

*Method of least squares* occurs in English in 1825 in the title
"On the Method of Least Squares" by J. Ivory in *Philosophical
Magazine,* 65, 3-10.

*Modulus* (a coefficient that expresses the degree to which a
body possesses a particular property) appears in the 1738 edition of
*The Doctrine of Chances: or, a Method of Calculating the
Probability of Events in Play* by Abraham De Moivre (1667-1754)
[James A. Landau].

(Corollary 6) ... To apply this to particular Examples, it will be necessary to estimate the frequency of an Event's happening or failing by the Square-root of the number which denotes how many Experiments have been, or are designed to be taken, and this Square-root, according as it has been already hinted at in the fourth Corollary, will be as it were the *Modulus* by which we are to regulate our Estimation, and therefore suppose the number of Experiments to be taken is 3600, and that it were required to assign the Probability of the Event's neither happening oftner than 1850 times, nor more rarely than 1750, which two numbers may be varied at pleasure, provided they be equally distant from the middle Sum 1800, then make the half difference between the two numbers 1850 and 1750, that is, in this case, 50 = s√n; now having supposed 3600 = n, then √n will be 60, which will make it that 50 will be = 60s, and consequently s = 50/60 = 5/6, and therefore if we take the proportion, which in an infinite power, the double Sum of the Terms corresponding to the Interval 5/6 √n, bears to the Sum of all the Terms, we shall have the Probability required exceeding near.
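De Moivre's numerical example can be checked in modern dress (a sketch of mine; the binomial model with chance 1/2 and the normal approximation via the error function are my reading, not spelled out in the quotation). The standard deviation of the count is √n/2 = 30, so a margin of 50 successes is z = 5/3 standard deviations:

```python
import math

n = 3600

# Exact binomial chance of between 1750 and 1850 successes in 3600 fair trials.
exact = sum(math.comb(n, j) for j in range(1750, 1851)) / 2**n

# Normal approximation: P(|Z| <= z) = erf(z / sqrt(2)); here z = 50 / (sqrt(n)/2).
z = 50 / (math.sqrt(n) / 2)
approx = math.erf(z / math.sqrt(2))

print(round(exact, 3), round(approx, 3))
```

Both figures come out near 0.9, the "Probability required exceeding near" of De Moivre's corollary.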

See also Stigler (1986), page 83.
The Egyptologist Flinders Petrie (1883) refers to the modulus as a measure
of dispersion. His sources are Airy's *Theory of Errors* (1875^{2})
and De Morgan's *Essay on Probability* (1838). The modulus equals
√2 s. F. Y. Edgeworth also uses the modulus in 1885.

*Modulus* (in number theory) was introduced by Gauss in 1801 in
*Disquisitiones arithmeticae*:

Si numerus *a* numerorum *b, c* differentiam metitur, *b* et *c* secundum *a* congrui dicuntur, sin minus, incongrui; ipsum *a* modulum appellamus. Uterque numerorum *b, c* priori in casu alterius residuum, in posteriori vero nonresiduum vocatur. [If a number *a* measure the difference between two numbers *b* and *c,* *b* and *c* are said to be congruent with respect to *a,* if not, incongruent; *a* is called the modulus, and each of the numbers *b* and *c* the residue of the other in the first case, the non-residue in the latter case.]
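Gauss's definition translates directly into code (a small illustration of my own): *b* and *c* are congruent modulo *a* exactly when *a* divides their difference.

```python
# b and c are congruent modulo a when a measures (divides) b - c.
def congruent(b, c, a):
    return (b - c) % a == 0

print(congruent(15, 4, 11), congruent(15, 5, 11))   # True False
```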

*Modulus* (the length of the vector *a* + *bi*) is due
to Jean Robert Argand (1768-1822) (Cajori 1919, page 265). The term
was first used by him in 1814, according to William F. White in *A
Scrap-Book of Elementary Mathematics* (1908).

*Modulus* for √(*a*^{2} + *b*^{2})
was used by Augustin-Louis Cauchy (1789-1857) in 1821.
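This sense survives unchanged in modern usage (an illustration of mine): the modulus of *a* + *bi* is √(a² + b²), which Python exposes as `abs()` on complex numbers.

```python
import math

z = 3 + 4j
# Cauchy's modulus of a + bi: sqrt(a^2 + b^2).
print(abs(z), math.hypot(3, 4))   # 5.0 5.0
```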

*Moment* appears in English in the obsolete sense of "momentum"
in 1706 in *Synopsis Palmariorum Matheseos* by William Jones:
"Moment..is compounded of Velocity..and..Weight" (OED2).

*Moment of a force* appears in 1830 in *A Treatise on
Mechanics* by Henry Kater and Dionysius Lardner (OED2).

*Moment* was used in a statistics sense by Karl Pearson in
October 1893 in *Nature*: "Now the centre of gravity of the
observation curve is found at once, also its area and its first four
moments by easy calculation" (OED2).

The phrase *method of moments* was used in a statistics sense in
the first of Karl Pearson's "Contributions to the Mathematical Theory
of Evolution" (*Phil. Trans. R. Soc.* 1894). The method was used
to estimate the parameters of a mixture of normal distributions. For
several years Pearson used the method on different problems but the
name only gained general currency with the publication of his 1902
*Biometrika* paper "On the systematic fitting of curves to
observations and measurements" (David 1995). In "On the Mathematical
Foundations of Theoretical Statistics" (*Phil. Trans. R. Soc.*
1922), Fisher criticized the method for being inefficient compared to
his own maximum likelihood method (Hald pp. 650 and 719). [This
paragraph was contributed by John Aldrich.]
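The idea can be shown in miniature (a toy case of my own devising, not Pearson's normal-mixture problem): equate the first two sample moments to the model's moments and solve for the parameters.

```python
from statistics import fmean

# Hypothetical sample.
data = [4.0, 5.0, 5.0, 6.0, 7.0, 9.0]

m1 = fmean(data)                     # first sample moment
m2 = fmean(x**2 for x in data)       # second sample moment

# For a normal model, E[X] = mu and Var[X] = E[X^2] - E[X]^2, so the
# method-of-moments estimates are:
mu_hat = m1
var_hat = m2 - m1**2

print(mu_hat, var_hat)
```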

According to W. L. Winston, the term was coined by Ulam and von Neumann during the atomic-bomb feasibility project, in which nuclear fission was studied by simulation; they gave these simulations the code name Monte Carlo.

According to several Internet web pages, the term was coined in 1947 by Nicholas Metropolis, inspired by Ulam's interest in poker during the Manhattan Project of World War II.

*Monte Carlo method* occurs in the title "The Monte Carlo
Method" by Nicholas Metropolis in the *Journal of the American
Statistical Association* 44 (1949).

*Monte Carlo method* also appears in 1949 in *Math. Tables
& Other Aids to Computation* III: "This method of solution of
problems in mathematical physics by sampling techniques based on
random walk models constitutes what is known as the 'Monte Carlo'
method. The method as well as the name for it were apparently first
suggested by John von Neumann and S. M. Ulam" (OED2).
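The "sampling techniques based on random walk models" of the 1949 description can be conveyed by the standard textbook miniature (mine, not from the papers cited): estimating π from the fraction of random points landing inside a quarter circle.

```python
import random

random.seed(0)                  # fixed seed so the sketch is reproducible
n = 100_000

# Count points (x, y) in the unit square that fall inside the quarter circle.
inside = sum(random.random()**2 + random.random()**2 <= 1.0 for _ in range(n))
pi_estimate = 4 * inside / n

print(pi_estimate)
```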

*Normal probability curve* was used by Karl Pearson (1857-1936)
in 1893 in *Nature* 26 Oct. 615/2: "As verification note that
for the normal probability curve 3µ_{2}^{2} =
µ_{4} and µ_{3} = 0" (OED2).

Pearson used *normal curve* in 1894 in "Contributions to the
Mathematical Theory of Evolution":

When a series of measurements gives rise to a normal curve, we may probably assume something approaching a stable condition; there is production and destruction impartially around the mean. [The above quotation is from Porter.]

Pearson used *normal curve* in 1894 in *Phil. Trans. R.
Soc.* A. CLXXXV. 72: "A frequency-curve, which for practical
purposes, can be represented by the error curve, will for the
remainder of this paper be termed a normal curve."

*Normal distribution* appears in 1897 in *Proc. R. Soc.*
LXII. 176: "A random selection from a normal distribution" (OED2).

According to Hald, p. 356:

The new error distribution was first of all called the law of error, but many other names came to be used, such as the law of facility of errors, the law of frequency of errors, the Gaussian law of errors, the exponential law, and the typical law of errors. In his paper "Typical laws of heredity" Galton (1877) studied biological variation, and he therefore replaced the term "error" with "deviation," and referring to Quetelet, he called the distribution "the mathematical law of deviation." Chapter 5 in Galton's *Natural Inheritance* (1889a) is entitled "Normal Variability," and he writes consistently about "The Normal Curve of Distributions," an expression that caught on.

According to Walker (p. 185), Karl Pearson did not coin the term.

Nevertheless, "...Pearson's consistent and exclusive use of this term in his epoch-making publications led to its adoption throughout the statistical community" (DSB).

However, Porter (p. 312) calls *normal curve* a "Pearsonian
neologism."

The "null hypothesis" is often identified with the "hypothesis tested"
of J. Neyman and E. S. Pearson's 1933 paper, "On the Problems of the Most
Efficient Tests of Statistical Hypotheses" *Phil. Trans. Roy. Soc. A*
(1933), 289-337, and represented by their symbol *H _{0}.*
Neyman did not like the "null hypothesis," arguing (

*Parameter* is found in 1922 in R. A. Fisher, "On the
Mathematical Foundations of Theoretical Statistics,"
*Philosophical Transactions of the Royal Society of London,*
Ser. A. 222, 309-368 (David, 1995).

The term was introduced by Fisher, according to Hald, p. 716.

According to Hald (p. 604), Galton introduced the term.

Earlier, Leibniz had used the term *variationes* and Wallis had
adopted *alternationes* (Smith vol. 2, page 528).

*Poisson distribution* appears in 1922 in *Ann. Appl.
Biol.* IX. 331: "When the statistical examination of these data
was commenced it was not anticipated that any clear relationship with
the Poisson distribution would be obtained" (OED2).

**Probability.**

As, if any one shou'd lay that he wou'd throw the Number 6 with a single die the first throw, it is indeed uncertain whether he will win or lose; but how much more probability there is that he shou'd lose than win, is easily determin'd, and easily calculated.

This is from the Latin translation by van Schooten of Huygens' introduction:

Ut si quis primo jactu una tessera senarium jacere contendat, incertum quidem an vincet; at quanto verisimilius sit eum perdere quam vincere, reipsa definitum est, calculoque subducitur.

This is the Dutch text of the introduction of Huygens':

Als, by exempel. Die met een dobbel-stee(n) ten eerste(n) een ses neemt te werpen / het is onseecker of hy het winnen sal of niet; maer hoe veel minder kans hy heeft om te winnen als om te verliesen / dat is in sich selven seecker / en werdt door reeckeningh uyt-gevonden.

and

TO resolve which, we must observe, First, That there are six several Throws upon one Die, which all have an equal probability of coming up.

This is from the Latin translation by van Schooten of Huygens' 9th proposition:

Ad quas solvendas advertendum est. Primo unius tesserae sex esse jactus diversos, quorum quivis aeque facile eveniat.

This is the Dutch text from the 9th proposition of Huygens'

Om welcke te solveeren / so moet hier op worden acht genomen. Eerstelijck dat op 1 steen zijn 6 verscheyde werpen / die even licht konnen gebeuren.

Although Huygens uses the word Kans (Chance) repeatedly in his Dutch text, van Schooten seems in his Latin translation to rephrase the text every time just to circumvent the use of a single term for probability. (See p. 11-13 in Waerden, B. L. van der (ed.), 1975.)

If *p* is the number of chances by which a certain event may happen, and *q* is the number of chances by which it may fail; the happenings as much as the failings have their degree of probability: But if all the chances by which the event may happen or fail were equally easy; the probability of happening will be to the probability of failing as *p* to *q*.

Pascal did not use the term (DSB).

*Wahrscheinlichkeitsdichte* appears in 1912 in
*Wahrscheinlichkeitsrechnung* by A. A. Markoff (David, 1998).

In J. V. Uspensky, *Introduction to Mathematical Probability*
(1937), page 264 reads "The case of continuous F(t), having a
continuous derivative f(t) (save for a finite set of points of
discontinuity), corresponds to a continuous variable distributed with
the density f(t), since F(t) = integral from -infinity to t f(x)dx"
[James A. Landau].

*Probability density* appears in 1939 in H. Jeffreys, *Theory
of Probability*: "We shall usually write this briefly P(dx|p) =
f'(x)dx, dx on the left meaning the proposition that x lies in a
particular range dx. f'(x) is called the probability density" (OED2).

*Probability density function* appears in 1946 in an English
translation of *Mathematical Methods of Statistics* by Harald
Cramér. The original appeared in Swedish in 1945 [James A.
Landau].

According to Hald (p. 360), Friedrich Wilhelm Bessel (1784-1846)
introduced the term *probable error* (*wahrscheinliche
Fehler*) without detailed explanation in 1815 in "Ueber den Ort
des Polarsterns" in *Astronomische Jahrbuch für das Jahr
1818*, and in 1816 defined the term in "Untersuchungen über
die Bahn des Olbersschen Kometen" in *Abh. Math. Kl. Kgl. Akad.
Wiss., Berlin.* Bessel used the term for the 50% interval around
the least-squares estimate.

Also in 1816 Gauss published a paper, *Bestimmung der
Genauigkeit der Beobachtungen,* in which he showed several methods to
calculate the probable error. He wrote: "... wir wollen diese Grösse
... den *wahrscheinlichen Fehler* nennen, und ihn mit r bezeichnen"
("... we wish to call this quantity ... the *probable error,* and denote it by r").
His calculations were based on a general dispersion measure E_{k} =
(S(d^{k})/n)^{1/k}, where S denotes the sum over the n errors d. Gauss showed that
k = 2 yields the most precise value of the probable error: r =
0.6744897 * E_{2}. Notice that E_{2} is the mean error
(i.e. the sample standard deviation).

All calculations and constants
related to the probable error and starting with Gauss are based on the
assumption that the errors follow a normal distribution. The modern
value of the ratio r/E_{2} is 0.6744897501960817, the upper quartile
of the standard normal distribution.
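Since half of the errors of a normal distribution fall within ± r of the centre, the ratio r/E_{2} is simply the upper quartile of the standard normal distribution. It can be checked in a few lines of Python (a sketch using only the standard library):

```python
from statistics import NormalDist

# Half of normally distributed errors fall within +/- r of the centre,
# so r is the 75% point of the error distribution, and r/E_2 is the
# upper quartile of the standard normal distribution.
ratio = NormalDist(mu=0.0, sigma=1.0).inv_cdf(0.75)
print(ratio)  # approximately 0.67448975, Gauss's 0.6744897 to seven places
```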

*Probable error* is found in 1852 in
*Report made to the Hon. Thomas Corwin, secretary of the treasury*
by Richard Sears McCulloh. This book uses the term four times, but on the one
occasion where a computation can be seen the writer takes two measurements
and refers to the difference between them as the "probable error"
[University of Michigan Digital Library].

*Probable error* is found in 1853
in *A dictionary of science, literature & art*
edited by William Thomas Brande:
"... the probable error is the quantity, which is such that there is the same probability
of the difference between the determination and the true absolute value of the thing to
be determined exceeding or falling short of it. Thus, if twenty measurements of an angle have been
made with the theodolite, and the arithmetical mean or average of the whole gives 50° 27' 13";
and if it be an equal wager that the error of this result (either in excess or defect) is less than two
seconds, or greater than two seconds, then the probable error of the determination is two seconds"
[University of Michigan Digital Library].

*Probable error* is found in 1853 in
*A collection of tables and fromulae (=formulae) useful in surveying, geodesy, and practical astronomy*
by Thomas Jefferson Lee. The term is defined,
in modern terminology, as the sample standard deviation times .674489 divided by the square
root of the number of observations
[James A. Landau; University of Michigan Digital Library].

Actually, on page 238 of the book mentioned above T. J. Lee
presents two versions of the probable error, r and R. The one called r is
the PE of a single observation, with r = 0.674489 * E_{2} and
E_{2} = s, and the one called R is the PE
of the final result (i.e. of the mean), with R = r / √n.
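Lee's two quantities can be sketched in Python. The readings below are hypothetical, and the sketch assumes, as in the Gauss entry above, that E_{2} is the root-mean-square deviation with divisor n:

```python
import math

def probable_errors(observations):
    """Lee's r (PE of a single observation) and R (PE of the mean).

    Assumes E_2 is the root-mean-square deviation with divisor n,
    as in Gauss's dispersion measure E_k above.
    """
    n = len(observations)
    mean = sum(observations) / n
    e2 = math.sqrt(sum((x - mean) ** 2 for x in observations) / n)
    r = 0.674489 * e2            # PE of one observation
    big_r = r / math.sqrt(n)     # PE of the final result (the mean)
    return r, big_r

# hypothetical repeated readings of the same angle (seconds of arc)
r, big_r = probable_errors([13.0, 11.5, 14.2, 12.8, 13.6])
```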

*Probable error* is found in 1855 in
*A treatise on land surveying* by William Mitchell Gillespie:
"When a number of separate observations of an angle have been made, the mean or average of them all, (obtained
by dividing the sum of the readings by their number,) is taken as the true reading. The 'Probable error'
of this mean, is the quantity, (minutes or seconds) which is such that there is an even chance of the real
error being more or less than it. Thus, if ten measurements of an angle gave a mean of 35° 18', and it was an
equal wager that the error of this result, too much or too little, was half a minute, then half a minute would
be the 'Probable error' of this determination. This probable error is equal to the square root of the sum
of the squares of the errors (i. e. the differences of each observation from the mean) divided by the number of
observations, and multiplied by the decimal 0.674489.
The same result would be obtained by using what is called 'The weight' of the observation. It is equal to
the square of the number of observations divided by twice the sum of the squares of the errors. The
'Probable error' is equal to 0.476936 divided by the square root of the weight"
[University of Michigan Digital Library].

*Probable error* is found in 1865 in
*Spherical astronomy*
by Franz Brünnow (an English translation by the author of the second German edition):
"In any series of errors written in the order of their absolute magnitude and each written as often as it actually
occurs, we call that error which stands exactly in the middle, the probable error"
[University of Michigan Digital Library].

In 1872 *Elem. Nat. Philos.* by Thomson & Tait has: "The
probable error of the sum or difference of two quantities, affected
by independent errors, is the square root of the sum of the squares
of their separate probable errors" (OED2).
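For normally distributed errors, Thomson and Tait's rule follows because each probable error is the same multiple (about 0.6745) of its standard deviation, and independent standard deviations combine in quadrature. A quick numerical check in Python (the two σ values are arbitrary):

```python
import math
from statistics import NormalDist

def probable_error(sigma):
    # PE of a zero-mean normal error with standard deviation sigma:
    # the point below which 75% of the distribution lies.
    return NormalDist(mu=0.0, sigma=sigma).inv_cdf(0.75)

# arbitrary example spreads for two independent errors
s1, s2 = 3.0, 4.0
# the standard deviation of the sum is sqrt(s1^2 + s2^2) = hypot(s1, s2)
pe_of_sum = probable_error(math.hypot(s1, s2))
pe_by_rule = math.hypot(probable_error(s1), probable_error(s2))
assert math.isclose(pe_of_sum, pe_by_rule)
```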

In 1889 in *Natural Inheritance,* Galton criticized the term
*probable error,* saying the term was "absurd" and "quite
misleading" because it does not refer to what it seems to, the most
probable error, which would be zero. He suggested the term
*Probability Deviation* be substituted, opening the way for
Pearson to introduce the term *standard deviation* (Tankard, p.
48).

*Higher* and *lower quartile* are found in 1879 in D.
McAlister, *Proc. R. Soc.* XXIX: "As these two measures, with
the mean, divide the curve of facility into four equal parts, I
propose to call them the 'higher quartile' and the 'lower quartile'
respectively. It will be seen that they correspond to the ill-named
'probable errors' of the ordinary theory" (OED2).

*Upper* and *lower quartile* appear in 1882 in F. Galton,
"Report of the Anthropometric Committee," *Report of the 51st
Meeting of the British Association for the Advancement of Science,
1881,* p. 245-260 (David, 1995).

See also L. H. C. Tippett, "Random Sampling Numbers 1927," *Tracts
for Computers,* No. 15 (1927) [James A. Landau].

*Random choice* appears in the *Century Dictionary*
(1889-1897).

*Random selection* occurs in 1897 in *Proc. R. Soc.* LXII.
176: "A random selection from a normal distribution" (OED2).

*Random sampling* was used by Karl Pearson in 1900 in the title,
"On the criterion that a given system of deviations from the probable
in the case of a correlated system of variables is such that it can
be reasonably supposed to have arisen from random sampling,"
*Philosophical Magazine* 50, 157-175 (OED2).

*Random sample* is found in 1903 in *Biometrika* II. 273:
"If the whole of a population were taken we should have certain
values for its statistical constants, but in actual practice we are
only able to take a sample, which should if possible be a random
sample" (OED2).

*Random variable* is found in 1934 in A. Wintner, "On Analytic
Convolutions of Bernoulli Distributions," *American Journal of
Mathematics,* 56, 659-663 (David, 1998).

According to Tankard (p. 112), R. A. Fisher "may ... have coined the
term *randomization*; at any rate, he certainly gave it the
important position in statistics that it has today."

*Rank correlation* appears in 1907 in *Drapers' Company Res.
Mem.* (Biometric Ser.) IV. 25: "No two rank correlations are in
the least reliable or comparable unless we assume that the frequency
distributions are of the same general character .. provided by the
hypothesis of normal distribution. ... Dr. Spearman has suggested
that rank in a series should be the character correlated, but he has
not taken this rank correlation as merely the stepping stone..to
reach the true correlation" (OED2).

Porter (page 289), referring to Galton, writes:

He did, however, change his terminology from "reversion" to "regression," a shift whose significance is not entirely clear. Possibly he simply felt that the latter term expressed more accurately the fact that offspring returned only part way to the mean. More likely, the change reflected his new conviction, first expressed in the same papers in which he introduced the term "regression," that this return to the mean reflected an inherent stability of type, and not merely the reappearance of remote ancestral gemmules.

In 1859 Charles Darwin used

Galton used the term *reversion coefficient* in "Typical laws of
heredity," *Nature* 15 (1877), 492-495, 512-514 and 532-533 =
*Proceedings of the Royal Institution of Great Britain* 8 (1877)
282-301.

Galton used *regression* in a genetics context in "Section H.
Anthropology. Opening Address by Francis Galton," *Nature,* 32,
507-510 (David, 1995).

Galton also used *law of regression* in 1885, perhaps in the
same address.

Karl Pearson used *regression* and *coefficient of
regression* in 1897 in *Phil. Trans. R. Soc.*:

The coefficient of regression may be defined as the ratio of the mean deviation of the fraternity from the mean off-spring to the deviation of the parentage from the mean parent. ... From this special definition of regression in relation to parents and offspring, we may pass to a general conception of regression. Let A and B be two correlated organs (variables or measurable characteristics) in the same or different individuals, and let the sub-group of organs B, corresponding to a sub-group of A with a definite value a, be extracted. Let the first of these sub-groups be termed an array, and the second a type. Then we define the coefficient of regression of the array on the type to be the ratio of the mean-deviation of the array from the mean B-organ to the deviation of the type a from the mean A-organ. [OED2]

The phrase "*multiple regression* coefficients" appears in the
1903 *Biometrika* paper "The Law of Ancestral Heredity" by Karl
Pearson, G. U. Yule, Norman Blanchard, and Alice Lee. From around
1895 Pearson and Yule had worked on multiple regression and the
phrase "double regression" appears in Pearson's paper "Mathematical
Contributions to the Theory of Evolution. III. Regression, Heredity,
and Panmixia" (*Phil. Trans. R. Soc.* 1896). [This paragraph was
contributed by John Aldrich.]

The term may have been used earlier by Richard von Mises (1883-1953).

*Scattergram* is found in 1938 in A. E. Waugh, *Elem.
Statistical Method*: "This is the method of plotting the data on a
scatter diagram, or scattergram, in order that one may see the
relationship" (OED2).

*Scatterplot* is found in 1939 in *Statistical Dictionary of
Terms and Symbols* by Kurtz and Edgerton (David, 1998).

In 1946 - still in the genetic context - Fisher ("A System of Scoring Linkage
Data, with Special Reference to the Pied Factors in Mice," *Amer. Nat.,* 80:
568-578) described an iterative method for obtaining the maximum likelihood
value. Rao's 1948 *J. Roy. Statist. Soc.* B paper treats the method in a
more general framework and the phrase "Fisher's method of scoring" appears in a
comment by Hartley. Fisher had already used the method in a general context in
his 1925 "Theory of Statistical Estimation" paper (*Proc. Cambr. Philos.
Soc.* 22: 700-725) but it attracted neither attention nor name. [This entry
was contributed by John Aldrich, with some information taken from David (1995).]

*Menge* (set) is found in *Geometrie der Lage* (2nd ed., 1856) by
Carl Georg Christian von Staudt: "Wenn man die Menge aller in einem und
demselben reellen einförmigen Gebilde enthaltenen reellen Elemente durch n + 1
bezeichnet und mit diesem Ausdrucke, welcher dieselbe Bedeutung auch in den acht
folgenden Nummern hat, wie mit einer endlichen Zahl verfährt, so ..." [Ken
Pledger].

Georg Cantor (1845-1918) did not define the concept of a set in his early
works on set theory, according to Walter Purkert in *Cantor's Philosophical
Views.*

Cantor's first definition of a set appears in an 1883 paper: "By a set I
understand every multitude which can be conceived as an entity, that is every
embodiment [*Inbegriff*] of defined elements which can be joined into an
entirety by a rule." This quotation is taken from *Über unendliche lineare
Punctmannichfaltigkeiten,* Mathematische Annalen, 21 (1883).

In 1895 Cantor used the word *Menge* in *Beiträge zur Begründung der
Transfiniten Mengenlehre,* Mathematische Annalen, 46 (1895):

By a set we understand every collection [*Zusammenfassung*] *M* of defined, well-distinguished objects *m* of our intuition or our thinking (which are called the elements of *M*) brought together to form an entirety.

*Significance* is found in 1888 in *Logic of Chance* by
John Venn: "As before, common sense would feel little doubt that such
a difference was significant, but it could give no numerical estimate
of the significance" (OED2).

*Test of significance* and *significance test* are found in
1907 in *Biometrika* V. 183: " Several other cases of probable
error tests of significance deserve reconsideration" (OED2).

*Testing the significance* is found in "New tables for testing
the significance of observations," *Metron* 5 (3) pp 105-108
(1925) [James A. Landau].

*Statistically significant* is found in 1931 in L. H. C.
Tippett, *Methods Statistics*: "It is conventional to regard all
deviations greater than those with probabilities of 0.05 as real, or
statistically significant" (OED2).

*Statistical significance* is found in 1938 in *Journal of
Parapsychology*: "The primary requirement of statistical
significance is met by the results of this investigation" (OED2).

See also *rank correlation.*

A quantity of bones are taken from an *ossuarium,* and are put together in groups which are asserted to be those of individual skeletons. To test this a biologist takes the triplet femur, tibia, humerus, and seeks the correlation between the indices *femur/humerus* and *tibia/humerus.* He might reasonably conclude that this correlation marked organic relationship, and believe that the bones had really been put together substantially in their individual grouping. As a matter of fact ... there would be ... a correlation of about 0.4 to 0.5 between these indices had the bones been sorted absolutely at random.

The term has been applied to other correlation scenarios with potential for misleading inferences. In Student's "The Elimination of Spurious Correlation due to Position in Time or Space" (

The term "standard deviation" was introduced in a lecture of 31 January, 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error."

*Standard score* is dated 1928 in MWCD10.

The earliest citation in the OED2 is from the *Baltimore Sun,* Oct. 1,
1945, "The result .. was a 'stanine' rating (stanine being an invented word,
from 'standard of nine')."

Stanines were first used to describe an examinee's performance on a battery of tests constructed for the U. S. Army Air Force during World War II.

This term was introduced in 1922 by Fisher, according to Tankard (p. 112).

The term *statistic* was not well-received initially. Arne
Fisher (no relation) asked Fisher, "Where ... did you get that
atrocity, *a statistic*?" (letter (p. 312) in J. H. Bennett,
*Statistical Inference and Analysis: Selected Correspondence of R.
A. Fisher,* 1990). Karl Pearson objected, "Are we also to
introduce the words a mathematic, a physic, an electric etc., for
parameters or constants of other branches of science?" (p. 49n of
*Biometrika,* 28, 34-59, 1936). [These two quotations were
provided by John Aldrich.]

In *Webster's* dictionary of 1828 the definition of statistics
is: "A collection of facts respecting the state of society, the
condition of the people in a nation or country, their health,
longevity, domestic economy, arts, property and political strength,
the state of the country, &c."

In its modern sense, the term was used in 1917 by Ladislaus Josephowitsch
Bortkiewicz (1868-1931) in *Die Iterationen* 3: "Die an der
Wahrscheinlichkeitstheorie orientierte, somit auf 'das Gesetz der Grossen
Zahlen' sich gründende Betrachtung empirischer Vielheiten möge als Stochastik
... bezeichnet werden" ("The study of empirical multiplicities oriented toward
probability theory, and thus founded on 'the law of large numbers,' may be
designated Stochastik ...") (OED2).

*Stochastic process* is found in A. N. Kolmogorov, "Sulla forma generale
di un processo stocastico omogeneo," *Rend. Accad. Lincei Cl. Sci. Fis.
Mat.* 15 (1) page 805 (1932) [James A. Landau].

*Stochastic process* is also found in A. Khintchine, "Korrelationstheorie
der stationären stochastischen Prozesse," *Math. Ann.* 109 (1934) [James A.
Landau].

*Stochastic process* occurs in English in "Stochastic processes and
statistics," *Proc. Natl. Acad. Sci. USA* 20 (1934).

In his 1908 paper, "The Probable Error of a Mean," *Biometrika*
6, 1-25, Gosset introduced the statistic, *z,* for testing
hypotheses on the mean of the normal distribution. Gosset used the
divisor *n,* not the modern (*n* - 1), when he estimated
σ, and his *z* is proportional
to *t* with *t* = *z* √(*n* - 1). Fisher
introduced the *t* form because it fitted in with his theory of
degrees of freedom. Fisher's treatment of the distributions based on
the normal distribution and the role of degrees of freedom was given
in "On a Distribution Yielding the Error Functions of Several Well
Known Statistics," *Proceedings of the International Congress of
Mathematics,* Toronto, 2, 805-813. The *t* symbol appears in
this paper, but although the paper was presented in 1924, it was not
published until 1928 (Tankard, page 103; David, 1995). According to
the OED2, the letter *t* was chosen arbitrarily. A new symbol
suited Fisher, for he was already using *z* for a statistic of
his own (see entry for *F*).
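The relation between the two statistics is easy to verify numerically. The following Python sketch, with made-up data, checks that *t* = *z* √(*n* - 1):

```python
import math

# hypothetical sample and hypothesised mean
x = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4]
mu = 4.0
n = len(x)
m = sum(x) / n

s_n = math.sqrt(sum((v - m) ** 2 for v in x) / n)      # Gosset's divisor n
s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))  # modern divisor n - 1

z = (m - mu) / s_n                  # Gosset's 1908 statistic
t = (m - mu) / (s / math.sqrt(n))   # the modern t statistic

assert math.isclose(t, z * math.sqrt(n - 1))
```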

*Student's distribution* (without "*t*") appears in 1925 in
R. A. Fisher, "Applications of 'Student's' Distribution,"
*Metron* 5, 90-104 and in *Statistical Methods for Research
Workers* (1925). The book made Student's distribution famous; it
presented new uses for the tables and made the tables generally
available.

"Student's" t-distribution appears in 1929 in Nature (OED2).

t-distribution appears (without Student) in A. T. McKay,
"Distribution of the coefficient of variation and the extended
'*t*' distribution," *J. Roy. Stat. Soc., n. Ser. 95*
(1932).

*t*-test is found in 1932 in R. A. Fisher, *Statistical
Methods for Research Workers*: "The validity of the *t*-test,
as a test of this hypothesis, is therefore absolute" (OED2).

Eisenhart (1979) is the best reference for the evolution of *t,*
although Tankard and Hald also discuss it.

[This entry was largely contributed by John Aldrich.]

*Studentized D^{2} statistic* is found in R. C. Bose and S. N.
Roy, "The exact distribution of the Studentized D

The statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency. ... In the case of the normal curve of distribution it is evident that the second moment is a sufficient statistic for estimating the standard deviation.

According to Hald (page 452), Fisher introduced the term

*Errors of first and second kind* is found in 1933 in J. Neyman
and E. S. Pearson, "On the Problem of the Most Efficient Tests of
Statistical Hypotheses," *Philosophical Transactions of the Royal
Society of London,* Ser. A (1933), 289-337 (David, 1995).

*Type I error* and *Type II error* are found in 1933 in J.
Neyman and E. S. Pearson, "The Testing of Statistical Hypotheses in
Relation to Probabilities A Priori," *Proceedings of the
Cambridge Philosophical Society,* 24, 492-510 (David, 1995).

*Uniformly distributed* is found in H. Sakamoto, "On the
distributions of the product and the quotient of the independent and
uniformly distributed random variables," *Tohoku Math. J.* 49
(1943).

*Variance* was introduced by Ronald Aylmer Fisher in 1918 in
"The Correlation Between Relatives on the Supposition of Mendelian
Inheritance," *Transactions of the Royal Society of Edinburgh,*
52, 399-433: "It is ... desirable in analysing the causes of
variability to deal with the square of the standard deviation as the
measure of variability. We shall term this quantity the Variance."

*Euler's system of notation* appears
in 1863 in *An outline of the necessary laws of thought: a treatise on
pure and applied logic* by William Thomson (University of Michigan Digital Library).

*Euler's notation* appears in about 1869 in *The principles of logic, for high schools and colleges*
by Aaron Schuyler (University of Michigan Digital Library).

*Euler's diagram* appears in 1884 in
*Elementary Lessons in Logic* by W. Stanley Jevons: "Euler's
diagram for this proposition may be constructed in the same manner as
for the proposition I as follows:..."

*Euler's circles* appears in 1893 in *Logic* by William
Minto (1845-1893): "The relations between the terms in the four forms
are represented by simple diagrams known as Euler's circles."

*Euler's circles* appears in October 1937 in George W. Hartmann,
"Gestalt Psychology and Mathematical Insight," *The Mathematics
Teacher*: "But in the case of 'Euler's circles' as used in
elementary demonstrations of formal logic, one literally 'sees' how
intimately syllogistic proof is linked to direct sensory perception
of the basic pattern. It seems that the famous Swiss mathematician of
the eighteenth century was once a tutor by correspondence to a
dull-witted Russian princess and devised this method of convincing
her of the reality and necessity of certain relations established
deductively."

*Venn diagram* appears in 1918 in *A Survey of Symbolic
Logic* by Clarence Irving Lewis: "This method resembles nothing so
much as solution by means of the Venn diagrams" (OED2).

URL: http://members.aol.com/jeff570/mathword.html

Sources : http://members.aol.com/jeff570/sources.html