Last update 12 March 2003

Earliest Known Uses of Some of the Words of Probability & Statistics

This page attempts to show the first uses of various words used in Probability & Statistics. It contains words related to probability & statistics that are extracted from the Earliest Known Uses of Some of the Words of Mathematics pages of Jeff Miller with his permission. Research for his pages is ongoing, and the uses cited in this page should not be assumed to be the first uses that occurred unless it is stated that the term was introduced or coined by the mathematician named. If you are able to antedate any of the entries herein, please contact Jeff Miller, a teacher at Gulf High School in New Port Richey, Florida, who maintains the aforementioned pages. See also Jeff Miller's Earliest Uses of Various Mathematical Symbols. Texts in red are by Kees Verduin.

ANCILLARY in the theory of statistical estimation. The term "ancillary statistic" first appears in R. A. Fisher's 1925 "Theory of Statistical Estimation," Proc. Cambr. Philos. Soc. 22. 700-725, although interest in ancillary statistics only gathered momentum in the mid-1930s when Fisher returned to the topic and other authors started contributing to it [John Aldrich, David (1995)].
The phrase ANALYSIS OF VARIANCE appears in 1918 in Sir Ronald Aylmer Fisher, "The Causes of Human Variability," Eugenics Review, 10, 213-220 (David, 1995).

It appears in a paper by Sir Ronald Aylmer Fisher published in 1924, used as if Fisher expected the reader to know what an analysis of variance was. In a 1920 paper, Fisher used the phrase "analysis of total variance" as if he had to explain what such a procedure is.

In The History of Statistics: The Measurement of Uncertainty before 1900, Stephen M. Stigler writes, "Yule derived what we now, following Fisher, call the analysis of variance breakdown." [James A. Landau]

ASSOCIATION (in statistics) is found in 1900 in G. U. Yule, "On the Association of Attributes in Statistics," Philosophical Transactions of the Royal Society of London, Ser. A, 194, 257-319 (David, 1998).
AVERAGE ERROR: more to be added
BAR CHART occurs in Nov. 1914 in W. C. Brinton, "Graphic Methods for Presenting Data. IV. Time Charts," Engineering Magazine, 48, 229-241 (David, 1998).

The form of diagram, however, is much older; there is an example from William Playfair's Commercial and Political Atlas of 1786 at http://www.york.ac.uk/depts/maths/histstat/playfair.gif.

BAR GRAPH is dated 1924 in MWCD10.

Bar graph is found in 1925 in Statistics by B. F. Young: "Bar-graphs in the form of progress charts are used to represent a changing condition such as the output of a factory" (OED2).

BERNOULLI TRIAL is dated 1951 in MWCD10, although James A. Landau has found the phrases "Bernoullian trials" and "Bernoullian series of trials" in 1937 in Introduction to Mathematical Probability by J. V. Uspensky.
BIASED and UNBIASED. Biased errors and unbiased errors (meaning "errors with zero expectation") are found in 1897 in A. L. Bowley, "Relations Between the Accuracy of an Average and That of Its Constituent Parts," Journal of the Royal Statistical Society, 60, 855-866 (David, 1995).

Biased sample is found in 1911 in An Introduction to the Theory of Statistics by G. U. Yule: "Any sample, taken in the way supposed, is likely to be definitely biassed, in the sense that it will not tend to include, even in the long run, equal proportions of the A’s and [alpha]'s in the original material" (OED2).

Biased sampling is found in F. Yates, "Some examples of biassed sampling," Ann. Eugen. 6 (1935) [James A. Landau].

BIMODAL appears in 1903 in S. R. Williams, "Variation in Lithobius Forficatus," American Naturalist, 37, 299-312 (David, 1998).
BINOMIAL DISTRIBUTION is found in 1911 in An Introduction to the Theory of Statistics by G. U. Yule: "The binomial distribution,..only becomes approximately normal when n is large, and this limitation must be remembered in applying the table..to cases in which the distribution is strictly binomial" (OED2).
BIVARIATE is found in 1920 in Biometrika XIII. 37: "Thus in 1885 Galton had completed the theory of bi-variate normal correlation" (OED2).
CENTRAL LIMIT THEOREM. In 1919 R. von Mises called the limit theorems Fundamentalsätze der Wahrscheinlichkeitsrechnung in a paper of the same name in Math Z. 4, 1-97.

Central limit theorem appears in the title "Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung," Math. Z., 15 (1920) by George Polya (1887-1985) [James A. Landau]. Polya apparently coined the term in this paper.

Central limit theorem appears in English in 1937 in Random Variables and Probability Distributions by H. Cramér (David, 1995).

CENTRAL TENDENCY is dated ca. 1928 in MWCD10.

Central tendency is found in 1929 in Kelley & Shen in C. Murchison, Found. Exper. Psychol. 838: "Some investigators have often preferred the median to the mean as a measure of central tendency" (OED2).

CHI SQUARE. Karl Pearson introduced the chi-squared test and the name for it in an article in 1900 in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Pearson had been in the habit of writing the exponent in the multivariate normal density as -1/2 chi-squared [James A. Landau, John Aldrich].
CLASSICAL PROBABILITY. This term for probability as defined by Laplace and earlier writers came into use in the 1930s when alternative definitions were widely canvassed. J. V. Uspensky (Introduction to Mathematical Probability, 1937, p. 8) gave the "classical definition," which he favored, and criticized the "new definitions" (von Mises) and "the attempt to build up the theory of probability as an axiomatic science" (Kolmogorov) [John Aldrich].
CLASSICAL statistical inference. The polar pair "classical" and "Bayesian" have figured in discussions of the foundations of statistical inference since the 1960s. The body of work to which "classical" was attached went back only to the 1920s and -30s but, as Schlaifer wrote in 1959 (Probability and Statistics for Business Decisions, p. 607), "it is expounded in virtually every course on statistics [in the United States] and is adhered to by the great majority of practicing statisticians." Schlaifer and a few others were sponsoring a rejuvenated Bayesian alternative. The "classical" tag may have derived some authority from Neyman's "Outline of a Theory of Statistical Estimation based on the Classical Theory of Probability" (Philosophical Transactions of the Royal Society, 236, (1937), 333-380), one of the classics of classical statistics. The non-classical possibility Neyman had in mind and rejected was the Bayesian theory of Jeffreys. Confusingly Neyman's "classical theory of probability" has more to do with Kolmogorov and von Mises than with Laplace [John Aldrich].
CLUSTER ANALYSIS is found in 1939 in Cluster Analysis by R. C. Tryon [James A. Landau].
The term COEFFICIENT OF VARIATION appears in 1896 in Karl Pearson, "Regression, Heredity, and Panmixia," Philosophical Transactions of the Royal Society of London, Ser. A. 187, 253-318 (David, 1995). The term is due to Pearson (Cajori 1919, page 382). According to the DSB, he introduced the term in this paper.
CONDITIONAL PROBABILITY is found in J. V. Uspensky, Introduction to Mathematical Probability, New York: McGraw-Hill, 1937, page 31:
Let A and B be two events whose probabilities are (A) and (B). It is understood that the probability (A) is determined without any regard to B when nothing is known about the occurrence or nonoccurrence of B. When it is known that B occurred, A may have a different probability, which we shall denote by the symbol (A, B) and call 'conditional probability of A, given that B has actually happened.'
[James A. Landau]
CONFIDENCE INTERVAL was coined by Jerzy Neyman (1894-1981) in 1934 in "On the Two Different Aspects of the Representative Method," Journal of the Royal Statistical Society, 97, 558-625:
The form of this solution consists in determining certain intervals, which I propose to call the confidence intervals..., in which we may assume are contained the values of the estimated characters of the population, the probability of an error in a statement of this sort being equal to or less than 1 - (epsilon), where (epsilon) is any number 0 < (epsilon) < 1, chosen in advance.

CONSISTENCY. The term consistency applied to estimation was introduced by R. A. Fisher in "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). Fisher wrote: "A statistic satisfies the criterion of consistency, if, when it is calculated from the whole population, it is equal to the required parameter."

In the modern literature this notion is usually called Fisher-consistency (a name suggested by Rao) to distinguish it from the more standard notion linked to the limiting behavior of a sequence of estimators. The latter is hinted at in Fisher's writings but was perhaps first set out rigorously by Hotelling in "The Consistency and Ultimate Distribution of Optimum Statistics," Transactions of the American Mathematical Society (1930). [This entry was contributed by John Aldrich, based on David (1995).]

CONTINGENCY TABLE was introduced by Karl Pearson in "On the Theory of Contingency and its Relation to Association and Normal Correlation," which appeared in Drapers' Company Research Memoirs (1904) Biometric Series I:
This result enables us to start from the mathematical theory of independent probability as developed in the elementary text books, and build up from it a generalised theory of association, or, as I term it, contingency. We reach the notion of a pure contingency table, in which the order of the sub-groups is of no importance whatever.
This citation was provided by James A. Landau.
CORRELATION, CORRELATION COEFFICIENT and COEFFICIENT OF CORRELATION. Francis Galton introduced the measurement of correlation (Hald, p. 604). The index of co-relation appears in 1888 in his "Co-Relations and Their Measurement," Proc. R. Soc., 45, 135-145: "The statures of kinsmen are co-related variables; thus, the stature of the father is correlated to that of the adult son,..and so on; but the index of co-relation ... is different in the different cases" (OED2). "Co-relation" soon gave way to "correlation" as in W. F. R. Weldon's "The Variations Occurring in Certain Decapod Crustacea-I. Crangon vulgaris," Proc. R. Soc., 47. (1889 - 1890), pp. 445-453.

The term coefficient of correlation was apparently originated by Edgeworth in 1892, according to Karl Pearson's "Notes on the History of Correlation" (reprinted in Pearson & Kendall (1970)). It appears in 1892 in F. Y. Edgeworth, "Correlated Averages," Philosophical Magazine, 5th Series, 34, 190-204.

Correlation coefficient appears in a paper published in 1895 [James A. Landau].

The OED2 shows a use of coefficient of correlation in 1896 by Pearson in Proc. R. Soc. LIX. 302: "Let r0 be the coefficient of correlation between parent and offspring." David (1995) gives the 1896 paper by Karl Pearson, "Regression, Heredity, and Panmixia," Phil. Trans. R. Soc., Ser. A. 187, 253-318. This paper introduced the product moment formula for estimating correlations--Galton and Edgeworth had used different methods.

Partial correlation. G. U. Yule introduced "net coefficients" for "coefficients of correlation between any two of the variables while eliminating the effects of variations in the third" in "On the Correlation of Total Pauperism with Proportion of Out-Relief" (in Notes and Memoranda) Economic Journal, Vol. 6, (1896), pp. 613-623. Pearson argued that partial and total are more appropriate than net and gross in Karl Pearson & Alice Lee "On the Distribution of Frequency (Variation and Correlation) of the Barometric Height at Divers Stations," Phil. Trans. R. Soc., Ser. A, 190 (1897), pp. 423-469. Yule went fully partial with his 1907 paper "On the Theory of Correlation for any Number of Variables, Treated by a New System of Notation," Proc. R. Soc. Series A, 79, pp. 182-193.

Multiple correlation. At first multiple correlation referred only to the general approach, e.g. by Yule in Economic Journal (1896). The coefficient arrives later. "On the Theory of Correlation" (J. Royal Statist. Soc., 1897, p. 833) refers to a coefficient of double correlation R1 (the correlation of the first variable with the other two). Yule (1907) discussed the coefficient of n-fold correlation R21(23...n). Pearson used the phrases "coefficient of multiple correlation" in his 1914 "On Certain Errors with Regard to Multiple Correlation Occasionally Made by Those Who Have not Adequately Studied this Subject," Biometrika, 10, pp. 181-187, and "multiple correlation coefficient" in his 1915 paper "On the Partial Correlation Ratio," Proc. R. Soc. Series A, 91, pp. 492-498.

[This entry was largely contributed by John Aldrich.]

The term CORRELOGRAM was introduced by H. Wold in 1938 (A Study in the Analysis of Stationary Time Series). There is a plot of empirical serial correlations, i.e. an empirical correlogram, in Yule's "Why Do We Sometimes Get Nonsense Correlations between Time-series ..." Journal of the Royal Statistical Society, 89, (1926), 1-69 (David 2001).
COVARIANCE is found in 1930 in The Genetical Theory of Natural Selection by R. A. Fisher (David, 1998).

Earlier uses of the term covariance are found in mathematics, in a non-statistical sense.

The term CRITERION OF SUFFICIENCY was used by Sir Ronald Aylmer Fisher in his paper "On the Mathematical Foundations of Theoretical Statistics," in Philosophical Transactions of the Royal Society, April 19, 1922: "The complete criterion suggested by our work on the mean square error (7) is: -- That the statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency" [James A. Landau].
DECILE (in statistics) was introduced by Francis Galton (Hald, p. 604).

Decile appears in 1882 in Francis Galton, Rep. Brit. Assoc. 1881 245: "The Upper Decile is that which is exceeded by one-tenth of an infinitely large group, and which the remaining nine-tenths fall short of. The Lower Decile is the converse of this" (OED2).

DEGREES OF FREEDOM. (See also chi-squared, F-distribution and Student's t-distribution.) Fisher introduced degrees of freedom in connection with Pearson's chi-squared test in the 1922 paper "On the Interpretation of chi-squared from Contingency Tables, and the Calculation of P," J. Royal Statist. Soc., 85, pp. 87-94. He applied the number of degrees of freedom to distributions related to chi-squared (Student's distribution and his own z distribution) in his 1924 paper, "On a Distribution Yielding the Error Functions of Several well Known Statistics," Proceedings of the International Congress of Mathematics, Toronto, 2, 805-813 [John Aldrich].
DEPENDENT VARIABLE. Subordinate variable appears in English in the 1816 translation of Differential and Integral Calculus by Lacroix: "Treating the subordinate variables as implicit functions of the indepdndent [sic] ones" (OED2).

Dependent variable appears in 1831 in the second edition of Elements of the Differential Calculus (1836) by John Radford Young: "On account of this dependence of the value of the function upon that of the variable the former, that is y, is called the dependent variable, and the latter, x, the independent variable" [James A. Landau].

DIRECT VARIATION. Directly is found in 1743 in W. Emerson, Doctrine Fluxions: "The Times of describing any Spaces uniformly are as the Spaces directly, and the Velocities reciprocally" (OED2).

Directly proportional is found in 1796 in A Mathematical and Philosophical Dictionary: "Quantities are said to be directly proportional, when the proportion is according to the order of the terms" (OED2).

Direct variation is found in 1856 in Ray's higher arithmetic. The principles of arithmetic, analyzed and practically applied by Joseph Ray (1807-1855):

Variation is a general method of expressing proportion often used, and is either direct or inverse. Direct variation exists between two quantities when they increase together, or decrease together. Thus the distance a ship goes at a uniform rate, varies directly as the time it sails; which means that the ratio of any two distances is equal to the ratio of the corresponding times taken in the same order. Inverse variation exists between two quantities when one increases as the other decreases. Thus, the time in which a piece of work will be done, varies inversely as the number of men employed; which means that the ratio of any two times is equal to the ratio of the numbers of men employed for these times, taken in reverse order.
This citation was taken from the University of Michigan Digital Library [James A. Landau].
DISCRIMINANT ANALYSIS is found in Palmer O. Johnson, "The quantification of qualitative data in discriminant analysis," J. Am. Stat. Assoc. 45, 65-76 (1950).

See also W. G. Cochran and C. I. Bliss, "Discriminant functions with covariance," Ann. Math. Statist. 19 (1948) [James A. Landau].

DISPERSION (in statistics) is found in 1876 in Catalogue of the Special Loan Collection of Scientific Apparatus at the South Kensington Museum by Francis Galton (David, 1998).
The term DISTRIBUTION FUNCTION of a random variable is a translation of the Verteilungsfunktion of R. von Mises "Grundlagen der Wahrscheinlichkeitsrechnung," Math. Zeit. 5, (1919) 52-99.

The English term appears in J. L. Doob's "The Limiting Distributions of Certain Statistics," Annals of Mathematical Statistics, 6, (1935), 160-169.

The term DUMMY VARIABLE is often used when describing the status of a variable like x in a definite integral. A. Church seems to be describing an established usage when he wrote in 1942, "A variable is free in a given expression ... if the expression can be considered as representing a function with that variable as an argument. In the contrary case the variable is called a bound (or apparent or dummy) variable." ("Differentials", American Mathematical Monthly, 49, 390.) [John Aldrich].

In regression analysis a DUMMY VARIABLE indicates the presence (value 1) or absence of an attribute (0).
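
As a minimal sketch of the 0/1 coding just described, the Python below turns a qualitative attribute into indicator columns. The region labels and the helper name `dummy_columns` are invented for illustration and do not come from any of the works cited in this entry.

```python
def dummy_columns(values, baseline):
    """Return one 0/1 indicator column per category other than the baseline."""
    categories = sorted(set(values) - {baseline})
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical data: the region of each of five households.
regions = ["north", "south", "south", "west", "north"]
columns = dummy_columns(regions, baseline="north")
print(columns)
```

One category is left out as a baseline so that, in a regression with an intercept, the indicator columns do not add up to the constant column.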

A JSTOR search found "dummy variables" for social class and for region in H. S. Houthakker's "The Econometrics of Family Budgets" Journal of the Royal Statistical Society A, 115, (1952), 1-28.

A 1957 article by D. B. Suits, "Use of Dummy Variables in Regression Equations" Journal of the American Statistical Association, 52, 548-551, consolidated both the device and the name.

The International Statistical Institute's Dictionary of Statistical Terms objects to the name: the term is "used, rather laxly, to denote an artificial variable expressing qualitative characteristics .... [The] word 'dummy' should be avoided."

Apparently these variables were not dummy enough for Kendall & Buckland, for whom a dummy variable signifies "a quantity written in a mathematical expression in the form of a variable although it represents a constant", e.g. when the constant in the regression equation is represented as a coefficient times a variable that is always unity.

The indicator device, without the name "dummy variable" or any other, was also used by writers on experiments who put the analysis of variance into the format of the general linear hypothesis, e.g. O. Kempthorne in his Design and Analysis of Experiments (1952) [John Aldrich].

EFFICIENCY. The terms efficiency and efficient applied to estimation were introduced by R. A. Fisher in "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). He described the criterion of efficiency as "satisfied by those statistics which, when derived from large samples, tend to a normal distribution with the least possible standard deviation." He also wrote: "To calculate the efficiency of any given method, we must therefore know the probable error of the statistic calculated by that method, and that of the most efficient statistic which could be used. The square of the ratio of these two quantities then measures the efficiency." Fisher seems not to have known that such calculations had been done by Gauss a century earlier (Gauss (1816) Bestimmung der Genauigkeit der Beobachtungen). However the idea of efficiency in extracting information was novel. [This entry was contributed by John Aldrich, based on David (1995).]
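
Fisher's ratio can be illustrated with a small simulation. The sketch below is not taken from Fisher's paper: it compares the sample median with the sample mean for normal data, and the sample size, number of replications, seed and function name are arbitrary choices made for the illustration. For large normal samples the efficiency of the median is known to approach 2/π, about 0.64.

```python
import random
import statistics

def simulated_efficiency(n=101, replications=4000, seed=1):
    """Estimate the efficiency of the median relative to the mean
    for samples of size n from a standard normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Squared ratio of the two standard deviations, i.e. the ratio of
    # variances. For a normally distributed statistic the probable error
    # is a fixed multiple of the standard deviation, so the multiple
    # cancels in the ratio.
    return statistics.pvariance(means) / statistics.pvariance(medians)

efficiency = simulated_efficiency()
print(round(efficiency, 2))
```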
EMPTY SET is found in Walter J. Bruns, "The Introduction of Negative Numbers," The Mathematics Teacher, October 1940: "For our purposes we still need a symbol for an 'empty' set, that means for a multitude containing no element."

Dorothy Geddes and Sally I. Lipsey, "The Hazards of Sets," The Mathematics Teacher, October 1969 has: "The fact that mathematicians refer to the empty set emphasizes the rather unique nature of this set."

An older term is null set, q. v.

EQUIPROBABLE was used in 1921 by John Maynard Keynes in A Treatise on Probability: "A set of exclusive and exhaustive equiprobable alternatives" (OED2).
ESTIMATION. Long before the terminology stabilized around estimation the activity was called calculation, determination or fitting.

The terms estimation and estimate were introduced in R. A. Fisher's "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922). He writes (none too helpfully!): "Problems of estimation are those in which it is required to estimate the value of one or more of the population parameters from a random sample of the population." Fisher uses estimate as a substantive sparingly in the paper.

The phrase unbiassed estimate appears in Fisher's Statistical Methods for Research Workers (1925, p. 54) although the idea is much older.

The expression best linear unbiased estimate appears in 1938 in F. N. David and J. Neyman, "Extension of the Markoff Theorem on Least Squares," Statistical Research Memoirs, 2, 105-116. Previously in his "On the Two Different Aspects of the Representative Method" (Journal of the Royal Statistical Society, 97, 558-625) Neyman had used mathematical expectation estimate for unbiased estimate and best linear estimate for best linear unbiased estimate (David, 1995).

The term estimator was introduced in 1939 in E. J. G. Pitman, "The Estimation of the Location and Scale Parameters of a Continuous Population of any Given Form," Biometrika, 30, 391-421. Pitman (pp. 398 & 403) used the term in a specialised sense: his estimators are estimators of location and scale with natural invariance properties. Now estimator is used in a much wider sense so that Neyman's best linear unbiased estimate would be called a best linear unbiased estimator (David, 1995). [This entry was contributed by John Aldrich.]

EVENT has been in probability in English from the beginning. A. De Moivre's The Doctrine of Chances (1718) begins "The Probability of an Event is greater or less, according to the number of chances by which it may happen, compared with the whole number of chances by which it may either happen or fail."

Event took on a technical existence when Kolmogorov in the Grundbegriffe der Wahrscheinlichkeitsrechnung (1933) identified "elementary events" ("elementare Ereignisse") with the elements of a collection E (now called the "sample space") and "random events" ("zufällige Ereignisse") with the elements of a set of subsets of E [John Aldrich].

EXPECTATION. According to A. W. F. Edwards, expectatio occurs in 1657 in Huygens's De Ratiociniis in Ludo Aleae (David 1995).

According to Burton (p. 461), the word expectatio first appears in van Schooten's translation of a tract by Huygens.

The two references above point to the same text, as Huygens's De Ratiociniis in Ludo Aleae was a translation by van Schooten. NB The word expectatio is used quite frequently throughout the text.
This is the Latin translation by Van Schooten of the first proposition:

Si a vel b expectem, quorum utriusque aeque facile mihi obtingere possit. expectatio mea dicenda est (a+b)/2
This is the Dutch text of Huygens' Van Rekeningh in Spelen van Geluck. This text was published in 1660 but already written in 1656.
Als ick gelijcke kans hebbe om a of b te hebben, dit is my so veel weerdt als (a+b)/2
The literal translation of the Dutch text is: If I have an equal chance to get either a or b, this to me is worth as much as (a+b)/2. There is no explicit mention of expectation, only of value, but as the rest of the explanation of the first proposition concentrates on the possible outcomes of a game of chance, expectation is implicitly present.

Expectation appears in English in Browne's 1714 translation of Huygens's De Ratiociniis in Ludo Aleae (David 1995).
This is Browne's 1714 translation of the first proposition:

If I expect a or b, and have an equal chance of gaining either of them, my Expectation is worth (a+b)/2
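
The arithmetic of the first proposition, and its natural extension to unequal chances, can be written out as a short sketch. The stakes 3 and 7 and the function name are invented for the example; exact fractions are used so that the result is not blurred by rounding.

```python
from fractions import Fraction

def expectation(outcomes, chances):
    """Chance-weighted average of the outcomes (chances are whole numbers)."""
    total = sum(chances)
    return sum(Fraction(x) * Fraction(c, total) for x, c in zip(outcomes, chances))

a, b = 3, 7
equal = expectation([a, b], [1, 1])      # equal chance of a or b
unequal = expectation([a, b], [2, 1])    # two chances of a to one of b
print(equal)  # 5
```

With equal chances the result is (a+b)/2, as in the proposition quoted above.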

See also mathematical expectation.

EXTREME VALUE appears in E. J. Gumbel, "Les valeurs extrêmes des distributions statistiques," Ann. Inst. H. Poincaré, 5 (1934).

See also L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a normal population," Biometrika 17 (1925) [James A. Landau].

F DISTRIBUTION. The F distribution was tabulated, and the letter introduced, by G. W. Snedecor in Calculation and Interpretation of Analysis of Variance and Covariance (1934) (David, 1995). The letter was chosen to honor Fisher.

The term F distribution is found in Leo A. Aroian, "A study of R. A. Fisher's z distribution and the related F distribution," Ann. Math. Statist. 12, 429-448 (1941).

The term FACTOR ANALYSIS was introduced by Louis L. Thurstone (1887-1955) in 1931 in "Multiple Factor Analysis," Psychological Review, 38, 406-427: "It is the purpose of this paper to describe a more generally applicable method of factor analysis which has no restrictions as regards group factors and which does not restrict the number of general factors that are operative in producing the correlations" (OED2).
FIDUCIAL PROBABILITY and FIDUCIAL DISTRIBUTION first appeared in R. A. Fisher's 1930 paper "Inverse Probability," Proceedings of the Cambridge Philosophical Society, 26, 528-535 (David (2001)).
The term FLUCTUATION was introduced by F. Y. Edgeworth in 1885 as a measure of dispersion. The fluctuation equals 2s² (where s is the standard deviation), which is the modulus squared. It could be interpreted as a precursor of Fisher's variance.
FREQUENCY DISTRIBUTION is found in 1895 in Karl Pearson, Phil. Trans. R. Soc. A. CLXXXVI. 412: "A method is given of expressing any frequency distribution by a series of differences of inverse factorials with arbitrary constants" (OED2).
The term FREQUENTIST (one who believes that the probability of an event should be defined as the limit of its relative frequency in a large number of trials) was used by M. G. Kendall in 1949 in Biometrika XXXVI. 104: "It might be thought that the differences between the frequentists and the non-frequentists (if I may call them such) are largely due to the differences of the domains which they purport to cover" (OED2).
GAUSSIAN CURVE (normal curve) appears in a 1902 paper by Karl Pearson [James A. Landau].

Gaussian distribution and Gaussian law were used by Karl Pearson in 1905 in Biometrika IV: "Many of the other remedies which have been proposed to supplement what I venture to call the universally recognised inadequacy of the Gaussian law .. cannot .. effectively describe the chief deviations from the Gaussian distribution" (OED2).

In an essay in the 1971 book Reconsidering Marijuana, Carl Sagan, using the pseudonym "Mr. X," wrote, "I can remember one occasion, taking a shower with my wife while high, in which I had an idea on the origins and invalidities of racism in terms of gaussian distribution curves. I wrote the curves in soap on the shower wall, and went to write the idea down."

GAUSSIAN DISTRIBUTION and GAUSSIAN LAW were used by Karl Pearson in 1905 in Biometrika (OED2).
The name GAUSS-MARKOV THEOREM for the chief result on least squares and best linear unbiassed estimation in the linear (regression) model has a curious history. David (1998) refers to H. Scheffé's 1959 book Analysis of Variance where the expression "Gauss-Markoff theorem" appears. Before that the name "Markoff theorem" had been popularized by J. Neyman, starting with his "On the Two Different Aspects of the Representative Method" (Journal of the Royal Statistical Society, 97, 558-625). Neyman thought that this contribution from the Russian A. A. Markov had been overlooked in the West. However in 1949 Plackett (Biometrika, 36, 149-157) showed that Markov had done no more than Gauss nearly a century before in 1821/3. (In the nineteenth century the theorem was often referred to as "Gauss's second proof of the method of least squares" - the "first" being a Bayesian argument Gauss published in 1809). Following Plackett, a few authors adopted the expression "Gauss theorem" but "Markov" was well-entrenched and the compromise "Gauss-Markov theorem" has become standard. [This entry was contributed by John Aldrich.]
GEOMETRIC MEAN. The term geometrical mean is found in the 1771 edition of the Encyclopaedia Britannica [James A. Landau].
The term GOODNESS OF FIT is found in the sentence, "The 'percentage error' in ordinate is, of course, only a rough test of the goodness of fit, but I have used it in default of a better." This citation is a footnote in Karl Pearson, "Contributions to the Mathematical Theory of Evolution II Skew Variation in Homogeneous Material," which was in Philosophical Transactions of the Royal Society of London (1895) Series A, vol 186, pp 343-414 [James A. Landau].
The term HARMONIC MEAN is due to Archytas of Tarentum, according to the University of St. Andrews website, which also states that it had been called sub-contrary in earlier times.

The term was also used by Aristotle.

According to the Catholic Encyclopedia, the word harmonic first appears in a work on conics by Philippe de la Hire (1640-1718) published in 1685.

Harmonical mean is found in English in the 1828 Webster dictionary:

Harmonical mean, in arithmetic and algebra, a term used to express certain relations of numbers and quantities, which are supposed to bear an analogy to musical consonances.
Harmonic mean is found in 1851 in Problems in illustration of the principles of plane coordinate geometry by William Walton [University of Michigan Digital Library].

Harmonic mean is also found in 1851 in The principles of the solution of the Senate-house 'riders,' exemplified by the solution of those proposed in the earlier parts of the examinations of the years 1848-1851 by Francis James Jameson: "Prove that the discount on a sum of money is half the harmonic mean between the principal and the interest" [University of Michigan Digital Library].
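
The exercise quoted from Jameson can be checked numerically. The sketch below rests on an assumed reading that is not stated in the source: "discount" is taken to be the true discount under simple interest (the sum due minus its present value), and "interest" is the simple interest on the sum due. The figures are invented.

```python
from fractions import Fraction

def harmonic_mean(x, y):
    """Harmonic mean of two positive quantities."""
    return 2 * x * y / (x + y)

principal = Fraction(1000)           # hypothetical sum due
rate_times_time = Fraction(1, 20)    # e.g. 5 per cent for one year
interest = principal * rate_times_time
present_value = principal / (1 + rate_times_time)
discount = principal - present_value  # true discount (assumed meaning)

print(discount)  # 1000/21
print(harmonic_mean(principal, interest) / 2)  # 1000/21
```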

HETERO- and HOMOSCEDASTICITY. The terms heteroscedasticity and homoscedasticity were introduced in 1905 by Karl Pearson in "On the general theory of skew correlation and non-linear regression," Drapers' Company Res. Mem. (Biometric Ser.) II. Pearson wrote, "If ... all arrays are equally scattered about their means, I shall speak of the system as a homoscedastic system, otherwise it is a heteroscedastic system." The words derive from the Greek skedastos (capable of being scattered).

Many authors prefer the spelling heteroskedasticity. J. Huston McCulloch (Econometrica 1985) discusses the linguistic aspects and decides for the k-spelling. Pearson recalled that when he set up Biometrika in 1901 Edgeworth had insisted the name be spelled with a k. By 1932 when Econometrica was founded standards had fallen or tastes had changed. [This entry was contributed by John Aldrich, referring to OED2 and David, 1995.]

HISTOGRAM. The term histogram was coined by Karl Pearson.

In Philos. Trans. R. Soc. A. CLXXXVI, (1895) 399 Pearson explained that the term was "introduced by the writer in his lectures on statistics as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base."

S. M. Stigler writes in his History of Statistics that Pearson used the term in his 1892 lectures on the geometry of statistics.

The earliest citation in the OED2 is dated 1891 and comes from E. S. Pearson, Karl Pearson (1938).

HYPOTHESIS TESTING. Test of hypothesis is found in 1928 in J. Neyman and E. S. Pearson, "On the use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference. Part I," Biometrika, 20 A, 175-240 (David, 1995).
INDEPENDENT EVENT and DEPENDENT EVENT are found in 1738 in The Doctrine of Chances by De Moivre: "Two Events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other. Two events are dependent, when they are so connected together as that the Probability of either's happening is alter'd by the happening of the other."
INDEPENDENT VARIABLE is found in the 1816 translation of Differential and Integral Calculus by Lacroix: "Treating the subordinate variables as implicit functions of the independent ones" (OED2).
INFORMATION, AMOUNT OF, QUANTITY OF in the theory of statistical estimation. R. A. Fisher first wrote about "the whole of the information which a sample provides" in 1920 (Mon. Not. Roy. Ast. Soc., 80, 769). In 1922-5 he developed the idea that information could be given quantitative expression as minus the expected value of the second derivative of the log-likelihood. The formula for "the amount of information in a single observation" appears in the 1925 "Theory of Statistical Estimation," Proc. Cambr. Philos. Soc. 22. 700-725. In the modern literature the qualification Fisher's information is common, distinguishing Fisher's measure from others originating in the theory of communication as well as in statistics. [John Aldrich and David (1995)].
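In modern notation, the quantity described above can be written as follows. The symbols are not Fisher's own and are used here only for illustration, together with a worked case for a single normal observation whose variance is assumed known:

```latex
I(\theta) = -\,\mathrm{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta)\right]

% Example: X \sim N(\theta, \sigma^{2}) with \sigma^{2} known
\log f(x;\theta) = -\tfrac{1}{2}\log(2\pi\sigma^{2}) - \frac{(x-\theta)^{2}}{2\sigma^{2}},
\qquad
\frac{\partial^{2}}{\partial\theta^{2}} \log f(x;\theta) = -\frac{1}{\sigma^{2}},
\qquad
I(\theta) = \frac{1}{\sigma^{2}}.
```

In this example the amount of information in a single observation is the reciprocal of the variance, so a more precise measurement carries more information about the mean.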
INTEGER and WHOLE NUMBER. Writing in Latin, Fibonacci used numerus sanus.

According to Heinz Lueneburg, the term numero sano "was used extensively by Luca Pacioli in his Summa. Before Pacioli, it was already used by Piero della Francesca in his Trattato d'abaco. I also find it in the second edition of Pietro Cataneo's Le pratiche delle due prime matematiche of 1567. I haven't seen the first edition. Counting also Fibonacci's Latin numerus sanus, the word sano was used for at least 350 years to denote an integral (untouched, virginal) number. Besides the words sanus, sano, the words integer, intero, intiero were also used during that time."

The first citation for whole number in the OED2 is from about 1430 in Art of Nombryng ix. EETS 1922:

Of nombres one is lyneal, ano(th)er superficialle, ano(th)er quadrat, ano(th)er cubike or hoole.
In the above quotation (th) represents a thorn. In this use, whole number has the obsolete definition of "a number composed of three prime factors," according to the OED2.

Whole number is found in its modern sense in the title of one of the earliest and most popular arithmetics in the English language, which appeared in 1537 at St. Albans. The work is anonymous, and its long title runs as follows: "An Introduction for to lerne to reken with the Pen and with the Counters, after the true cast of arismetyke or awgrym in hole numbers, and also in broken" (Julio González Cabillón).

Oresme used intégral.

Integer was used as a noun in English in 1571 by Thomas Digges (1546?-1595) in A geometrical practise named Pantometria: "The containing circles Semidimetient being very nighe 11 19/21 for exactly nether by integer nor fraction it can be expressed" (OED2).

Integral number appears in 1658 in Phillips: "In Arithmetick integral numbers are opposed to fraction[s]" (OED2).

Whole number is most frequently defined as Z+, although it is sometimes defined as Z. In Elements of the Integral Calculus (1839) by J. R. Young, the author refers to "a whole number or 0" but later refers to "a positive whole number."

INTERQUARTILE RANGE is found in 1882 in Francis Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, pp. 245-260: "This gave the upper and lower 'quartile' values, and consequently the 'interquartile' range (which is equal to twice the 'probable error')" (OED2).
INTERSECTION (in set theory) is found in Webster's New International Dictionary of 1909.
k-STATISTICS. k-statistics are sample cumulants and were introduced with them by R. A. Fisher in 1929. The term "k-statistic" appears in the 1932 edition of his Statistical Methods for Research Workers [John Aldrich].
KOLMOGOROV-SMIRNOV TEST appears in F. J. Massey Jr., "The Kolmogorov-Smirnov test of goodness of fit," J. Amer. Statist. Ass. 46 (1951).

See also W. Feller, "On the Kolmogorow-Smirnov limit theorems for empirical distributions," Ann. Math. Statist. 19 (1948) [James A. Landau].

KURTOSIS was used by Karl Pearson in 1905 in "Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und Pearson. A Rejoinder," Biometrika, 4, 169-212, in the phrase "the degree of kurtosis." He states therein that he has used the term previously (OED2).
The term LATIN SQUARE was introduced by Euler (as quarré latin) in 1782 in Verh. uitgegeven door het Zeeuwsch Genootschap d. Wetensch. te Vlissingen.

Latin square appears in English in 1890 in the title of a paper by Arthur Cayley, "On Latin Squares" in Messenger of Mathematics.

The term was introduced into statistics by R. A. Fisher, according to Tankard (p. 112). Fisher used the term in 1925 in Statistical Methods Res. Workers (OED2).

Graeco-Latin square appears in 1934 in R. A. Fisher and F. Yates, "The 6 x 6 Latin Squares," Proceedings of the Cambridge Philosophical Society 30, 492-507.

LAW OF LARGE NUMBERS. La loi de grands nombres appears in 1835 in Siméon-Denis Poisson (1781-1840), "Recherches sur la Probabilité des Jugements, Principalement en Matière Criminelle," Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences, 1, 473-494 (James, 1998).

According to Porter (p. 12), Poisson coined the term in 1835.

LEPTOKURTIC (and platykurtic and mesokurtic) were introduced by Karl Pearson, who wrote in Biometrika (1905) IV. 173: "Given two frequency distributions which have the same variability as measured by the standard deviation, they may be relatively more or less flat-topped than the normal curve. If more flat-topped I term them platykurtic, if less flat-topped leptokurtic, and if equally flat-topped mesokurtic" (OED2).
LIKELIHOOD. The term was first used in its modern sense in R. A. Fisher's "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, 1, (1921), 3-32.

Formerly, likelihood was a synonym for probability, as it still is in everyday English. (See the entry on maximum likelihood and the passage quoted there for Fisher's attempt to distinguish the two. In 1921 Fisher referred to the value that maximizes the likelihood as "the optimum.")

Likelihood first appeared in a Bayesian context in H. Jeffreys's Theory of Probability (1939) [John Aldrich, based on David (2001)].

LIKELIHOOD PRINCIPLE. This expression burst into print in 1962, appearing in "Likelihood Inference and Time Series" by G. A. Barnard, G. M. Jenkins, C. B. Winsten (Journal of the Royal Statistical Society A, 125, 321-372), "On the Foundations of Statistical Inference" by A. Birnbaum (Journal of the American Statistical Association, 57, 269-306), and L. J. Savage et al. (1962), The Foundations of Statistical Inference. It must have been current for some time because the Savage volume records a conference in 1959; the term appears in Savage's contribution so the expression may have been his coining.

The principle (without a name) can be traced back to R. A. Fisher's writings of the 1920s though its clearest earlier manifestation is in Barnard's 1949 "Statistical Inference" (Journal of the Royal Statistical Society. Series B, 11, 115-149). On these earlier outings the principle attracted little attention.

The LIKELIHOOD RATIO figured in the test theory of J. Neyman and E. S. Pearson from the beginning, "On the Use of Certain Test Criteria for Purposes of Statistical Inference, Part I" Biometrika, (1928), 20A, 175-240. They usually referred to it as the likelihood although the phrase "likelihood ratio" appears incidentally in their "Problem of k Samples," Bulletin Académie Polonaise des Sciences et Lettres, A, (1931) 460-481. This phrase was more often used by others writing about Neyman and Pearson's work, e.g. Brandner "A Test of the Significance of the Difference of the Correlation Coefficients in Normal Bivariate Samples," Biometrika, 25, (1933), 102-109.

The standing of "likelihood ratio" was confirmed by S. S. Wilks's "The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses," Annals of Mathematical Statistics, 9, (1938), 60-62 [John Aldrich, based on David (2001)].

LOSS and LOSS FUNCTION in statistical decision theory. In the paper establishing the subject ("Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Annals of Mathematical Statistics, 10, (1939), 299-326) Wald referred to "loss" but used "weight function" for the (modern) loss function. He continued to use weight function, for instance in his book Statistical Decision Functions (1950), while others adopted loss function. Arrow, Blackwell & Girshick's "Bayes and Minimax Solutions of Sequential Decision Problems" (Econometrica, 17, (1949) 213-244) wrote L rather than W for the function and called it the loss function. A paper by Hodges & Lehmann ("Some Problems in Minimax Point Estimation," Annals of Mathematical Statistics, 21, (1950), 182-197) used loss function more freely but retained Wald's W. [John Aldrich, based on David (2001) and JSTOR]
MATHEMATICAL EXPECTATION was used by De Morgan in 1838 in An Essay on Probabilities (1841) 97: "The balance is the average required, and is known by the name of mathematical expectation" (OED2).

See also expectation.

MATHEMATICAL STATISTICS. Mathematische Statistik is found in 1867 in the title Mathematische Statistik und deren Anwendung auf National-Oekonomie und Versicherungs-Wissenschaft by T. Wittstein (David, 1998).
The term MAXIMUM LIKELIHOOD was introduced by Sir Ronald Aylmer Fisher in his paper "On the Mathematical Foundations of Theoretical Statistics," in Philosophical Transactions of the Royal Society, April 19, 1922. In this paper he made clear for the first time the distinction between the mathematical properties of "likelihoods" and "probabilities" (DSB).
The solution of the problems of calculating from a sample the parameters of the hypothetical population, which we have put forward in the method of maximum likelihood, consists, then, simply of choosing such values of these parameters as have the maximum likelihood. Formally, therefore, it resembles the calculation of the mode of an inverse frequency distribution. This resemblance is quite superficial: if the scale of measurement of the hypothetical quantity be altered, the mode must change its position, and can be brought to have any value, by an appropriate change of scale; but the optimum, as the position of maximum likelihood may be called, is entirely unchanged by any such transformation. Likelihood also differs from probability in that it is not a differential element, and is incapable of being integrated: it is assigned to a particular point of the range of variation, not to a particular element of it.

MEAN occurs in English in the sense of a geometric mean in a Middle English manuscript of circa 1450 known as The Art of Numbering: "Lede the rote of o quadrat into the rote of the oþer quadrat, and þan wolle þe meene shew" [Mark Dunn].

In 1571, A geometrical practise named Pantometria by Thomas Digges (1546?-1595) has: "When foure magnitudes are...in continual proportion, the first and the fourth are the extremes, and the second and thirde the meanes" (OED2).

Mean is found in 1755 in Thomas Simpson, "An ATTEMPT to shew the Advantage, arising by Taking the Mean of a Number of Observations, in practical Astronomy," Philosophical Transactions of the Royal Society of London.

MEAN ERROR. The 1845 Encyclopedia Metropolitana has "mean risk of error" (OED2).

Mean error is found in 1853 in A dictionary of arts, manufactures, and mines; containing a clear exposition of their principles and practice by Andrew Ure [University of Michigan Digital Library].

Mean error is found in English in an 1857 translation of Gauss's Theoria motus: "Consequently, if we desire the greatest accuracy, it will be necessary to compute the geocentric place from the elements for the same time, and afterwards to free it from the mean error A, in order that the most accurate position may be obtained. But it will in general be abundantly sufficient if the mean error is referred to the observation nearest to the mean time" [University of Michigan Digital Library].

In 1894 in Phil. Trans. Roy. Soc., Karl Pearson has "error of mean square" as an alternate term for "standard-deviation" (OED2).

In Higher Mathematics for Students of Chemistry and Physics (1912), J. W. Mellor writes:

In Germany, the favourite method is to employ the mean error, which is defined as the error whose square is the mean of the squares of all the errors, or the "error which, if it alone were assumed in all the observations indifferently, would give the same sum of the squares of the errors as that which actually exists." ...

The mean error must not be confused with the "mean of the errors," or, as it is sometimes called, the average error, another standard of comparison defined as the mean of all the errors regardless of sign.

In a footnote, Mellor writes, "Some writers call our 'average error' the 'mean error,' and our 'mean error' the 'error of mean square'" [James A. Landau].
MEAN SQUARE is found in 1845 in Encycl. Metrop. (OED2).
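Mellor's two standards of comparison can be contrasted in a short sketch. The list of errors is hypothetical and the variable names are illustrative only; "mean error" follows Mellor's definition (the error whose square is the mean of the squared errors) and "average error" is the mean of the errors regardless of sign.

```python
from math import sqrt

errors = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical errors of observation

# Mellor's "mean error": square root of the mean of the squared errors.
mean_error = sqrt(sum(e * e for e in errors) / len(errors))

# Mellor's "average error": mean of the errors taken without regard to sign.
average_error = sum(abs(e) for e in errors) / len(errors)

print(f"mean error = {mean_error:.4f}, average error = {average_error:.4f}")
```

For any set of errors that are not all equal in magnitude, the mean error exceeds the average error, which is one reason the two names should not be interchanged.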
The term MEAN SQUARE DEVIATION (apparently meaning variance) appears in a paper published by Sir Ronald Aylmer Fisher in 1920 [James A. Landau].
MEDIAN (in statistics). Valeur médiane was used by Antoine A. Cournot in 1843 in Exposition de la Théorie des Chances et des Probabilités (David, 1998).

Median was used in English by Francis Galton in Report of the British Association for the Advancement of Science in 1881: "The Median, in height, weight, or any other attribute, is the value which is exceeded by one-half of an infinitely large group, and which the other half fall short of" (OED2).

The term METHOD OF LEAST SQUARES was coined by Adrien Marie Legendre (1752-1833), appearing in Sur la Méthode des moindres quarrés [On the method of least squares], the title of an appendix to Nouvelles méthodes pour la détermination des orbites des comètes (1805). The appendix is dated March 6, 1805 [James A. Landau].

"Minimum" and "small" were the early English translations of moindres (David, 1995).

Method of least squares occurs in English in 1825 in the title "On the Method of Least Squares" by J. Ivory in Philosophical Magazine, 65, 3-10.

MODE was coined by Karl Pearson (1857-1936). He used the term in 1895 in "Skew Variation in Homogeneous Material," Philosophical Transactions of the Royal Society of London, Ser. A, 186, 343-414: "I have found it convenient to use the term mode for the abscissa corresponding to the ordinate of maximum frequency. Thus the "mean," the "mode," and the "median" have all distinct characters."
MODULUS (in logarithms) was used by Roger Cotes (1682-1716) in 1722 in Harmonia Mensurarum: Pro diversa magnitudine quantitatis assumptae M, quae adeo vocetur systematis Modulus. Cotes also coined the term ratio modularis (modular ratio) in this work.

Modulus (a coefficient that expresses the degree to which a body possesses a particular property) appears in the 1738 edition of The Doctrine of Chances: or, a Method of Calculating the Probability of Events in Play by Abraham De Moivre (1667-1754) [James A. Landau].

(Corollary 6)...To apply this to particular Examples, it will be necessary to estimate the frequency of an Event's happening or failing by the Square-root of the number which denotes how many Experiments have been, or are designed to be taken, and this Square-root, according as it has been already hinted at in the fourth Corollary, will be as it were the Modulus by which we are to regulate our Estimation, and therefore suppose the number of Experiments to be taken is 3600, and that it were required to assign the Probability of the Event's neither happening oftner than 1850 times, nor more rarely than 1750, which two numbers may be varied at pleasure, provided they be equally distant from the middle Sum 1800, then make the half difference between the two numbers 1850 and 1750, that is, in this case, 50=s√n; now having supposed 3600=n, then √n will be 60, which will make it that 50 will be =60s, and consequently s=50/60=5/6, and therefore if we take the proportion, which in an infinite power, the double Sum of the Terms corresponding to the Interval 5/6 √n, bears to the Sum of all the Terms, we shall have the Probability required exceeding near.

See also Stigler (1986), page 83. The Egyptologist Flinders Petrie (1883) refers to the modulus as a measure of dispersion. His sources are Airy's Theory of Errors (2nd ed., 1875) and De Morgan's Essay on Probability (1838). The modulus equals √2 σ, that is, √2 times the standard deviation. F. Y. Edgeworth also uses the modulus in 1885.
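De Moivre's example can be retraced with a short computation. This is a sketch in modern terms, not De Moivre's own procedure: it assumes equal chances of happening and failing (consistent with the middle sum 1800 for 3600 experiments), takes the limits as 1750 and 1850 (the two numbers used in the half-difference step), and compares the exact binomial sum with the normal approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 3600                   # number of experiments in the quoted example
low, high = 1750, 1850     # limits equally distant from the middle sum 1800

half_difference = (high - low) / 2     # 50
s = half_difference / sqrt(n)          # 50/60 = 5/6, as in the quotation

# Exact probability, assuming equal chances of happening and failing:
# the binomial terms from 1750 to 1850 divided by the sum of all terms.
favourable = sum(comb(n, k) for k in range(low, high + 1))
exact = favourable / 2 ** n

# Normal approximation: the standard deviation is sqrt(n)/2, so the limits
# lie 2*s standard deviations on either side of the middle sum.
approx = 2 * NormalDist().cdf(2 * s) - 1

print(f"s = {s:.4f}, exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two figures differ slightly because the exact sum includes both end terms in full, whereas the approximation treats the interval as continuous.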

Modulus (in number theory) was introduced by Gauss in 1801 in Disquisitiones arithmeticae:

Si numerus a numerorum b, c differentiam metitur, b et c secundum a congrui dicuntur, sin minus, incongrui; ipsum a modulum appellamus. Uterque numerorum b, c priori in casu alterius residuum, in posteriori vero nonresiduum vocatur. [If a number a measure the difference between two numbers b and c, b and c are said to be congruent with respect to a, if not, incongruent; a is called the modulus, and each of the numbers b and c the residue of the other in the first case, the non-residue in the latter case.]
Modulus (in number theory) is found in English in 1811 in An Elementary Investigation of the Theory of Numbers by Peter Barlow [James A. Landau].

Modulus (the length of the vector a + bi) is due to Jean Robert Argand (1768-1822) (Cajori 1919, page 265). The term was first used by him in 1814, according to William F. White in A Scrap-Book of Elementary Mathematics (1908).

Modulus for √(a² + b²) was used by Augustin-Louis Cauchy (1789-1857) in 1821.

MOMENT was used in the obsolete sense of "an infinitesimal increment or decrement of a varying quantity" by Isaac Newton in 1704 in De Quadratura Curvarum: "Momenta id est incrementa momentanea synchrona" (OED2).

Moment appears in English in the obsolete sense of "momentum" in 1706 in Synopsis Palmariorum Matheseos by William Jones: "Moment..is compounded of Velocity..and..Weight" (OED2).

Moment of a force appears in 1830 in A Treatise on Mechanics by Henry Kater and Dionysius Lardner (OED2).

Moment was used in a statistics sense by Karl Pearson in October 1893 in Nature: "Now the centre of gravity of the observation curve is found at once, also its area and its first four moments by easy calculation" (OED2).

The phrase method of moments was used in a statistics sense in the first of Karl Pearson's "Contributions to the Mathematical Theory of Evolution" (Phil. Trans. R. Soc. 1894). The method was used to estimate the parameters of a mixture of normal distributions. For several years Pearson used the method on different problems but the name only gained general currency with the publication of his 1902 Biometrika paper "On the systematic fitting of curves to observations and measurements" (David 1995). In "On the Mathematical Foundations of Theoretical Statistics" (Phil. Trans. R. Soc. 1922), Fisher criticized the method for being inefficient compared to his own maximum likelihood method (Hald pp. 650 and 719). [This paragraph was contributed by John Aldrich.]

MONTE CARLO. The method as well as the name for it were apparently first suggested by John von Neumann and Stanislaw M. Ulam. In an unpublished manuscript, "The Origin of the Monte Carlo Method," dated Apr. 12, 1983, Ulam wrote that the method came to him while playing solitaire during an illness in 1946, and that what seems to be the first written account of the method was given by von Neumann in a letter to Robert Richtmyer of Los Alamos in early 1947.

According to W. L. Winston, the term was coined by Ulam and von Neumann during work on the feasibility of the atomic bomb, which relied on simulations of nuclear fission; they gave these simulations the code name Monte Carlo.

According to several Internet web pages, the term was coined in 1947 by Nicholas Metropolis, inspired by Ulam's interest in poker during the Manhattan Project of World War II.

Monte Carlo method occurs in the title "The Monte Carlo Method" by Nicholas Metropolis and S. Ulam in the Journal of the American Statistical Association 44 (1949).

Monte Carlo method also appears in 1949 in Math. Tables & Other Aids to Computation III: "This method of solution of problems in mathematical physics by sampling techniques based on random walk models constitutes what is known as the 'Monte Carlo' method. The method as well as the name for it were apparently first suggested by John von Neumann and S. M. Ulam" (OED2).

MULTIVARIATE is found in J. Wishart, "The generalized product moment distribution in samples from a normal multivariate population," Biometrika 20A, 32 (1928) [James A. Landau].
NON-NORMAL appears in 1929 in Biometrika in the heading: "On the distribution of the ratio of mean to standard deviation in small samples from non-normal universes" (OED2).
NONPARAMETRIC (referring to a statistical inference) is found in 1942 in Jacob Wolfowitz (1910-1981), "Additive Partition Functions and a Class of Statistical Hypotheses," Annals of Mathematical Statistics, 13, 247-279 (David, 1995).
NORMAL (statistics). Normal was used by F. Galton in 1889 in Natural Inheritance. David (1995) writes that Stigler informs him that this is the first use of "normal" unambiguously as a term for the distribution.

Normal probability curve was used by Karl Pearson (1857-1936) in 1893 in Nature 26 Oct. 615/2: "As verification note that for the normal probability curve 3µ₂² = µ₄ and µ₃ = 0" (OED2).
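Pearson's verification, that the fourth central moment of the normal curve is three times the square of the second and that the third is zero, can be illustrated numerically. The sketch below is only an illustration: the mean and standard deviation are hypothetical, and central_moment is a helper written for this example that integrates by the midpoint rule.

```python
from statistics import NormalDist

def central_moment(dist, order, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the central moment of the given order."""
    start = dist.mean - half_width * dist.stdev
    step = 2 * half_width * dist.stdev / steps
    total = 0.0
    for i in range(steps):
        x = start + (i + 0.5) * step
        total += (x - dist.mean) ** order * dist.pdf(x)
    return total * step

curve = NormalDist(mu=1.0, sigma=2.0)   # hypothetical normal curve

m2 = central_moment(curve, 2)
m3 = central_moment(curve, 3)
m4 = central_moment(curve, 4)

print(f"3*m2^2 = {3 * m2 ** 2:.6f}, m4 = {m4:.6f}, m3 = {m3:.6f}")
```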

Pearson used normal curve in 1894 in "Contributions to the Mathematical Theory of Evolution":

When a series of measurements gives rise to a normal curve, we may probably assume something approaching a stable condition; there is production and destruction impartially around the mean.
The above quotation is from Porter.

Pearson used normal curve in 1894 in Phil. Trans. R. Soc. A. CLXXXV. 72: "A frequency-curve, which for practical purposes, can be represented by the error curve, will for the remainder of this paper be termed a normal curve."

Normal distribution appears in 1897 in Proc. R. Soc. LXII. 176: "A random selection from a normal distribution" (OED2).

According to Hald, p. 356:

The new error distribution was first of all called the law of error, but many other names came to be used, such as the law of facility of errors, the law of frequency of errors, the Gaussian law of errors, the exponential law, and the typical law of errors. In his paper "Typical laws of heredity" Galton (1877) studied biological variation, and he therefore replaced the term "error" with "deviation," and referring to Quetelet, he called the distribution "the mathematical law of deviation." Chapter 5 in Galton's Natural Inheritance (1889a) is entitled "Normal Variability," and he writes consistently about "The Normal Curve of Distributions," an expression that caught on.
According to Walker (p. 185), Karl Pearson did not coin the term normal curve. She writes, "Galton used it, as did also Lexis, and the writer has not found any reference which seems to be its first use."

Nevertheless, "...Pearson's consistent and exclusive use of this term in his epoch-making publications led to its adoption throughout the statistical community" (DSB).

However, Porter (p. 312) calls normal curve a "Pearsonian neologism."

NORMAL CORRELATION appears in W. F. Sheppard, "On the application of the theory of error to cases of normal distributions and normal correlations," Phil. Trans. A, 192, page 1091, and Proc. Roy. Soc. 62, page 170 (1898) [James A. Landau].
NORMAL DEVIATE is found in 1925 in R. A. Fisher, Statistical Methods: "Table I. shows that the normal deviate falls outside the range +/-1.598193 in 10 per cent of cases" (OED2).
NORMAL LAW was coined by Karl Pearson in 1894, according to Porter (p. 13).
NORMAL POPULATION appears in E. S. Pearson, "A further note on the distribution of range in samples taken from a normal population," Biometrika 18, page 173 (1926) [James A. Landau]. Also see extreme value.
NORMAL SAMPLES is found in R. A. Fisher, "The moments of the distribution for normal samples of measures of departure from normality," Proc. Roy. Soc. A, 130 (1930).
NULL HYPOTHESIS is used in 1935 by Ronald Aylmer Fisher in The Design of Experiments. He writes, "We may speak of this hypothesis as the 'null hypothesis,' and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation."

The "null hypothesis" is often identified with the "hypothesis tested" of J. Neyman and E. S. Pearson's 1933 paper, "On the Problems of the Most Efficient Tests of Statistical Hypotheses" Phil. Trans. Roy. Soc. A (1933), 289-337, and represented by their symbol H0. Neyman did not like the "null hypothesis," arguing (First Course in Probability and Statistics, 1950, p. 259) that "the original term 'hypothesis tested' seems more descriptive." It is not clear, however, that "hypothesis tested" was ever floated as a technical term [John Aldrich].

NULL SET. Null-set appears in 1906 in Theory of Sets and Points by W. H. and G. C. Young (OED2).
ORDINAL. The earliest citation for this term in the OED2 is in 1599 in Percyvall's Dictionarie in Spanish and English enlarged by J. Minsheu, in which the phrase ordinall numerals is found.
P-VALUE is found in 1943 in Statistical Adjustment of Data by W. E. Deming (David, 1998).
PARAMETER (in statistics) is found in 1914 in E. Czuber, Wahrscheinlichkeitsrechnung, Vol. I (David, 1998).

Parameter is found in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A. 222, 309-368 (David, 1995).

The term was introduced by Fisher, according to Hald, p. 716.

PERCENTILE appears in 1885 in Francis Galton, "Some Results of the Anthropometric Laboratory," Journal of the Anthropological Institute, 14, 275-287: "The value which 50 per cent. exceeded, and 50 per cent. fell short of, is the Median Value, or the 50th per-centile, and this is practically the same as the Mean Value; its amount is 85 lbs." (OED2).

According to Hald (p. 604), Galton introduced the term.

PERMUTATION first appears in print with its present meaning in Ars Conjectandi by Jacques Bernoulli: "De Permutationibus. Permutationes rerum voco variationes..." (Smith vol. 2, page 528).

Earlier, Leibniz had used the term variationes and Wallis had adopted alternationes (Smith vol. 2, page 528).

PIE CHART is found in 1922 in A. C. Haskell, Graphic Charts in Business (OED2).
POISSON DISTRIBUTION. Poisson's exponential binomial limit appears in 1914 in the title "Tables of Poisson's Exponential Limit" by Herbert Edward Soper in Biometrika, 10, 25-35 (David, 1995).

Poisson distribution appears in 1922 in Ann. Appl. Biol. IX. 331: "When the statistical examination of these data was commenced it was not anticipated that any clear relationship with the Poisson distribution would be obtained" (OED2).

POPULATION. See sample.
POSTERIOR PROBABILITY and PRIOR PROBABILITY. These contractions of "probability a priori" and "probability a posteriori" were introduced by Wrinch and Jeffreys ("On Certain Fundamental Principles of Scientific Inquiry," Philosophical Magazine, 42, (1921), 369-390). The longer forms were used by Lubbock & Drinkwater-Bethune (On Probability, 1830?) presumably following Laplace (Théorie Analytique des Probabilités (1812)) who wrote of "la probabilité de l'évenement observé, déterminée à priori" though Laplace did not use the à posteriori form [John Aldrich, using David (2001) and Hald (1998, p. 162)].
POWER (of a test) is found in 1933 in J. Neyman and E. S. Pearson, "The Testing of Statistical Hypotheses in Relation to Probabilities A Priori," Proceedings of the Cambridge Philosophical Society, 29, 492-510 (David (2001)).
The term PROBABILITY may appear in Latin in De Ratiociniis in Ludo Aleae (1657) by Christiaan Huygens, since the 1714 English translation has:
As, if any one shou'd lay that he wou'd throw the Number 6 with a single die the first throw, it is indeed uncertain whether he will win or lose; but how much more probability there is that he shou'd lose than win, is easily determin'd, and easily calculated.
This is from the Latin translation by van Schooten of Huygens' introduction:
Ut si quis primo jactu una tessera senarium jacerere contendat, incertum quidem an vincet; at quanto verisimilius sit eum perdere quam vincere, reipsa definitum est, calculoque subducitur.
This is the Dutch text of the introduction of Huygens' Van Rekeningh in Spelen van Geluck. This text was published in 1660 but had already been written in 1656.
Als, by exempel. Die met een dobbel-stee(n) ten eerste(n) een ses neemt te werpen / het is onseecker of hy het winnen sal of niet; maer hoe veel minder kans hy heeft om te winnen als om te verliesen / dat is in sich selven seecker / en werdt door reeckeningh uyt-gevonden.
TO resolve which, we must observe, First, That there are six several Throws upon one Die, which all have an equal probability of coming up.
This is from the Latin translation by van Schooten of Huygens' 9th proposition:
Ad quas solvendas advertendum est. Primo unius tesserae sex esse jactus diversos, quorum quivis aeque facile eveniat.
This is the Dutch text from the 9th proposition of Huygens' Van Rekeningh in Spelen van Geluck.
Om welcke te solveeren / so moet hier op worden acht genomen. Eerstelijck dat op 1 steen zijn 6 verscheyde werpen / die even licht konnen gebeuren.
Although Huygens uses the word Kans (Chance) repeatedly in his Dutch text, van Schooten seems in his Latin translation to rephrase the text every time just to circumvent the use of a single term for probability. (See pp. 11-13 in Waerden, B. L. van der (ed., 1975), Die Werke von Jacob Bernoulli, Band 3, Birkhäuser Verlag, Basel.)

The opening sentence of De Mensura Sortis (1712) by Abraham de Moivre (1667-1754) is translated:
If p is the number of chances by which a certain event may happen, and q is the number of chances by which it may fail; the happenings as much as the failings have their degree of probability: But if all the chances by which the event may happen or fail were equally easy; the probability of happening will be to the probability of failing as p to q.
The first citation for probability in the OED2 is in 1718 in the title The Doctrine of Chances: or, a Method of Calculating the Probability of Events in Play by De Moivre.

Pascal did not use the term (DSB).

PROBABILITY DENSITY FUNCTION. Probability function appears in J. E. Hilgard, "On the verification of the probability function," Rep. Brit. Ass. (1872).

Wahrscheinlichkeitsdichte appears in 1912 in Wahrscheinlichkeitsrechnung by A. A. Markoff (David, 1998).

In J. V. Uspensky, Introduction to Mathematical Probability (1937), page 264 reads "The case of continuous F(t), having a continuous derivative f(t) (save for a finite set of points of discontinuity), corresponds to a continuous variable distributed with the density f(t), since F(t) = integral from -infinity to t f(x)dx" [James A. Landau].

Probability density appears in 1939 in H. Jeffreys, Theory of Probability: "We shall usually write this briefly P(dx|p) = f'(x)dx, dx on the left meaning the proposition that x lies in a particular range dx. f'(x) is called the probability density" (OED2).

Probability density function appears in 1946 in an English translation of Mathematical Methods of Statistics by Harald Cramér. The original appeared in Swedish in 1945 [James A. Landau].

PROBABILITY DISTRIBUTION appears in a paper published by Sir Ronald Aylmer Fisher in 1920 [James A. Landau].
PROBABLE ERROR appears in 1812 in Phil. Mag.: "All that can be gained is, that the errors are as trifling as possible--that they are equally distributed--and that none of them exceed the probable errors of the observation" (OED2).

According to Hald (p. 360), Friedrich Wilhelm Bessel (1784-1846) introduced the term probable error (wahrscheinliche Fehler) without detailed explanation in 1815 in "Ueber den Ort des Polarsterns" in Astronomische Jahrbuch für das Jahr 1818, and in 1816 defined the term in "Untersuchungen über die Bahn des Olbersschen Kometen" in Abh. Math. Kl. Kgl. Akad. Wiss., Berlin. Bessel used the term for the 50% interval around the least-squares estimate.
Also in 1816 Gauss published a paper, Bestimmung der Genauigkeit der Beobachtungen, in which he showed several methods to calculate the probable error. He wrote: "... wir wollen diese Grösse ... der wahrscheinlichen Fehler nennen, und ihn mit r bezeichnen." His calculations were based on a general dispersion measure E_k = (Σ(d^k)/n)^(1/k). Gauss showed that k = 2 results in the most precise value of the probable error: r = 0.6744897 * E_2. Notice that E_2 is the mean error (i.e. the sample standard deviation).

All calculations and constants related to the probable error and starting with Gauss are based on the assumption that the errors follow a normal distribution. A modern value of the ratio r/E_2, to fifteen decimal places, is 0.674489750196082.
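Under the normality assumption just stated, the ratio of the probable error to the mean error is the 0.75 quantile of the standard normal distribution, so that half of all errors fall within r of zero. A minimal Python sketch of that relationship follows; the function name and the sample deviations are illustrative assumptions, not taken from Gauss.

```python
from math import sqrt
from statistics import NormalDist

# Ratio of probable error to mean error under normality:
# the 0.75 quantile of the standard normal distribution.
PROBABLE_ERROR_RATIO = NormalDist().inv_cdf(0.75)


def probable_error(deviations):
    """Return ratio * mean error, with mean error = sqrt(sum(d^2) / n)."""
    n = len(deviations)
    mean_error = sqrt(sum(d * d for d in deviations) / n)
    return PROBABLE_ERROR_RATIO * mean_error


# Hypothetical deviations of observations from the true value.
deviations = [0.3, -1.2, 0.8, -0.1, 1.5, -0.7]
r = probable_error(deviations)

# Share of a standard normal population lying within the ratio of zero.
std_normal = NormalDist()
share_within = std_normal.cdf(PROBABLE_ERROR_RATIO) - std_normal.cdf(-PROBABLE_ERROR_RATIO)
```

The computed ratio agrees with Gauss's 0.6744897 to the seven decimals he quoted, and `share_within` comes out at one half, which is the sense in which the error is "probable".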

Probable error is found in 1852 in Report made to the Hon. Thomas Corwin, secretary of the treasury by Richard Sears McCulloh. This book uses the term four times, but on the one occasion where a computation can be seen the writer takes two measurements and refers to the difference between them as the "probable error" [University of Michigan Digital Library].

Probable error is found in 1853 in A dictionary of science, literature & art edited by William Thomas Brande: "... the probable error is the quantity, which is such that there is the same probability of the difference between the determination and the true absolute value of the thing to be determined exceeding or falling short of it. Thus, if twenty measurements of an angle have been made with the theodolite, and the arithmetical mean or average of the whole gives 50° 27' 13"; and if it be an equal wager that the error of this result (either in excess or defect) is less than two seconds, or greater than two seconds, then the probable error of the determination is two seconds" [University of Michigan Digital Library].

Probable error is found in 1853 in A collection of tables and formulae useful in surveying, geodesy, and practical astronomy by Thomas Jefferson Lee. The term is defined, in modern terminology, as the sample standard deviation times .674489 divided by the square root of the number of observations [James A. Landau; University of Michigan Digital Library].
Actually, on page 238 of the book mentioned above, T. J. Lee presents two versions of the probable error, r and R. The one called r is the probable error of a single observation, r = 0.674489 * E_2 with E_2 = s; the one called R is the probable error of the final result (i.e. of the mean), R = r / √n.
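Lee's two quantities can be computed side by side. The sketch below is only an illustration: the readings are hypothetical, and the use of the n - 1 divisor for s (Python's statistics.stdev) is an assumption, since the divisor is not stated here.

```python
from math import sqrt
from statistics import stdev

RATIO = 0.674489  # constant as quoted from Lee


def probable_errors(observations):
    """Return (r, R): probable error of one observation and of the mean."""
    s = stdev(observations)  # assumption: sample standard deviation, n - 1 divisor
    r = RATIO * s
    big_r = r / sqrt(len(observations))
    return r, big_r


# Hypothetical readings of one angle, in seconds of arc.
readings = [13.0, 11.5, 14.2, 12.8, 13.5]
r, big_r = probable_errors(readings)
```

Because R shrinks with the square root of the number of observations, quadrupling the number of readings halves the probable error of the mean.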

Probable error is found in 1855 in A treatise on land surveying by William Mitchell Gillespie: "When a number of separate observations of an angle have been made, the mean or average of them all, (obtained by dividing the sum of the readings by their number,) is taken as the true reading. The 'Probable error' of this mean, is the quantity, (minutes or seconds) which is such that there is an even chance of the real error being more or less than it. Thus, if ten measurements of an angle gave a mean of 35° 18', and it was an equal wager that the error of this result, too much or too little, was half a minute, then half a minute would be the 'Probable error' of this determination. This probable error is equal to the square root of the sum of the squares of the errors (i. e. the differences of each observation from the mean) divided by the number of observations, and multiplied by the decimal 0.674489. The same result would be obtained by using what is called 'The weight' of the observation. It is equal to the square of the number of observations divided by twice the sum of the squares of the errors. The 'Probable error' is equal to 0.476936 divided by the square root of the weight" [University of Michigan Digital Library].
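Gillespie's two rules can be checked against each other numerically. The sketch below assumes one reading of his wording, namely that the square root applies to the sum of squares only and the result is then divided by the number of observations; that reading is chosen because it makes the two quoted rules agree, since 0.476936 times the square root of 2 is approximately 0.674489. The error values are hypothetical.

```python
from math import sqrt


def probable_error_direct(errors):
    """0.674489 * sqrt(sum of squared errors) / number of observations."""
    return 0.674489 * sqrt(sum(e * e for e in errors)) / len(errors)


def probable_error_from_weight(errors):
    """0.476936 / sqrt(weight), with weight = n^2 / (2 * sum of squared errors)."""
    n = len(errors)
    weight = n * n / (2 * sum(e * e for e in errors))
    return 0.476936 / sqrt(weight)


# Hypothetical errors (differences of each observation from the mean), in seconds.
errors = [12.0, -20.0, 5.0, -8.0, 15.0, -4.0]
direct = probable_error_direct(errors)
via_weight = probable_error_from_weight(errors)
```

The two results differ only in the sixth significant figure, which reflects the rounding of the two printed constants.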

Probable error is found in 1865 in Spherical astronomy by Franz Brünnow (an English translation by the author of the second German edition): "In any series of errors written in the order of their absolute magnitude and each written as often as it actually occurs, we call that error which stands exactly in the middle, the probable error" [University of Michigan Digital Library].
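Brünnow's definition is a ranking rule and needs no distributional constant. A minimal sketch, with hypothetical errors; for an even number of errors it averages the two middle values, which is an assumption beyond the quoted definition.

```python
from statistics import median


def probable_error_by_rank(errors):
    """Middle value of the errors when ordered by absolute magnitude."""
    return median(abs(e) for e in errors)


# Hypothetical errors; repeated magnitudes are kept, as the definition requires.
errors = [0.4, -1.1, 0.2, -0.4, 2.3, 0.9, -0.2]
pe = probable_error_by_rank(errors)  # 0.4
```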

In 1872 Elem. Nat. Philos. by Thomson & Tait has: "The probable error of the sum or difference of two quantities, affected by independent errors, is the square root of the sum of the squares of their separate probable errors" (OED2).
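The Thomson and Tait rule is a root-sum-of-squares combination. A short sketch with hypothetical probable errors:

```python
from math import hypot


def combined_probable_error(pe_a, pe_b):
    """Probable error of a + b or a - b when the two errors are independent."""
    return hypot(pe_a, pe_b)  # sqrt(pe_a**2 + pe_b**2)


# Hypothetical probable errors of two independent measurements.
pe_sum = combined_probable_error(3.0, 4.0)  # 5.0
```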

In 1889 in Natural Inheritance, Galton criticized the term probable error, saying the term was "absurd" and "quite misleading" because it does not refer to what it seems to, the most probable error, which would be zero. He suggested the term Probability Deviation be substituted, opening the way for Pearson to introduce the term standard deviation (Tankard, p. 48).

The term QUARTILE was introduced by Francis Galton (Hald, p. 604).

Higher and lower quartile are found in 1879 in D. McAlister, Proc. R. Soc. XXIX: "As these two measures, with the mean, divide the curve of facility into four equal parts, I propose to call them the 'higher quartile' and the 'lower quartile' respectively. It will be seen that they correspond to the ill-named 'probable errors' of the ordinary theory" (OED2).

Upper and lower quartile appear in 1882 in F. Galton, "Report of the Anthropometric Committee," Report of the 51st Meeting of the British Association for the Advancement of Science, 1881, p. 245-260 (David, 1995).

QUINTILE is found in 1922 in "The Accuracy of the Plating Method of Estimating the Density of Bacterial Populations," Annals of Applied Biology by R. A. Fisher, H. G. Thornton, and W. A. Mackenzie: "Since the 3-plate sets are relatively scanty, we can best test their agreement with theory by dividing the theoretical distribution of 43 values at its quintiles, so that the expectation is the same in each group." There are much earlier uses of this term in astrology [James A. Landau].
RANDOM NUMBER. The phrase "this table of random numbers" is found in 1927 in Tracts for Computers (OED2).

See also L. H. C. Tippett, "Random Sampling Numbers 1927," Tracts for Computers, No. 15 (1927) [James A. Landau].

RANDOM SAMPLE is found in April 1870 in "Notices of Recent Publications," The Princeton review: "We confess that we have never suspected Satan as capable of poetizing in the manner attributed to him in Book IX, of which the following is a random sample."

Random choice appears in the Century Dictionary (1889-1897).

Random selection occurs in 1897 in Proc. R. Soc. LXII. 176: "A random selection from a normal distribution" (OED2).

Random sampling was used by Karl Pearson in 1900 in the title, "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling," Philosophical Magazine 50, 157-175 (OED2).

Random sample is found in 1903 in Biometrika II. 273: "If the whole of a population were taken we should have certain values for its statistical constants, but in actual practice we are only able to take a sample, which should if possible be a random sample" (OED2).

RANDOM WALK. Karl Pearson posed "The Problem of the Random Walk," in the July 27, 1905, issue of Nature (vol. LXXII, p. 294). "A man starts from a point O and walks l yards in a straight line; he then turns through any angle whatever and walks another l yards in a second straight line. He repeats this process n times. I require the probability that after these stretches he is at a distance between r and r + dr from his starting point O." Pearson's objective was to develop a mathematical theory of random migration. In the next issue (vol. LXXII, p. 318) Lord Rayleigh translated the problem into one involving sound, "the composition of n iso-periodic vibrations of unit amplitude and of phases distributed at random," and reported that he had given the solution for large n in 1880 [John Aldrich].
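Pearson's question can be approximated by simulation. The following Python sketch is a Monte Carlo estimate, not Rayleigh's analytical solution; the function names, the seed, and the chosen values of n, l, r and dr are all illustrative assumptions.

```python
import math
import random


def final_distance(n, step, rng):
    """Distance from O after n stretches of length `step` in random directions."""
    x = y = 0.0
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += step * math.cos(angle)
        y += step * math.sin(angle)
    return math.hypot(x, y)


def probability_between(n, step, r, dr, trials=20000, seed=1905):
    """Monte Carlo estimate of P(r <= distance < r + dr)."""
    rng = random.Random(seed)
    hits = sum(r <= final_distance(n, step, rng) < r + dr for _ in range(trials))
    return hits / trials


# Sanity check: the mean squared distance should be close to n * step**2.
rng = random.Random(1905)
distances = [final_distance(10, 1.0, rng) for _ in range(20000)]
mean_square = sum(d * d for d in distances) / len(distances)

estimate = probability_between(10, 1.0, r=2.0, dr=1.0)
```

The mean-square check relies on the standard result that the cross terms between independent random directions vanish on average, leaving n times the squared step length.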
RANDOM VARIABLE. Variabile casuale is found in 1916 in F. P. Cantelli, "La Tendenza ad un limite nel senso del calcolo delle probabilità," Rendiconti del Circolo Matematico di Palermo, 41, 191-201 (David, 1998).

Random variable is found in 1934 in A. Wintner, "On Analytic Convolutions of Bernoulli Distributions," American Journal of Mathematics, 56, 659-663 (David, 1998).

RANDOMIZATION appears in 1926 in R. A. Fisher, "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513 (David, 1995).

According to Tankard (p. 112), R. A. Fisher "may ... have coined the term randomization; at any rate, he certainly gave it the important position in statistics that it has today."

RANGE (in statistics) is found in 1848 in H. Lloyd, "On Certain Questions Connected with the Reduction of Magnetical and Meteorological Observations," Proceedings of the Royal Irish Academy, 4, 180-183 (David, 1995).
RANK CORRELATION. Kendall & Stuart vol ii page 494 say that the rank correlation coefficient was introduced by "the eminent psychologist" Spearman in 1906. Pearson's biography of Galton also uses the term "correlation of ranks" [James A. Landau].

Rank correlation appears in 1907 in Drapers' Company Res. Mem. (Biometric Ser.) IV. 25: "No two rank correlations are in the least reliable or comparable unless we assume that the frequency distributions are of the same general character .. provided by the hypothesis of normal distribution. ... Dr. Spearman has suggested that rank in a series should be the character correlated, but he has not taken this rank correlation as merely the stepping stone..to reach the true correlation" (OED2).

REGRESSION. According to the DSB, Francis Galton (1822-1911) discovered the statistical phenomenon of regression and used this term, although he originally termed it "reversion."

Porter (page 289), referring to Galton, writes:

He did, however, change his terminology from "reversion" to "regression," a shift whose significance is not entirely clear. Possibly he simply felt that the latter term expressed more accurately the fact that offspring returned only part way to the mean. More likely, the change reflected his new conviction, first expressed in the same papers in which he introduced the term "regression," that this return to the mean reflected an inherent stability of type, and not merely the reappearance of remote ancestral gemmules.
In 1859 Charles Darwin used reversion in a biological context in The Origin of Species (1860): "We could not have told, whether these characters in our domestic breeds were reversions or only analogous variations" (OED2).

Galton used the term reversion coefficient in "Typical laws of heredity," Nature 15 (1877), 492-495, 512-514 and 532-533 = Proceedings of the Royal Institution of Great Britain 8 (1877) 282-301.

Galton used regression in a genetics context in 1885 in "Section H. Anthropology. Opening Address by Francis Galton," Nature, 32, 507-510 (David, 1995).

Galton also used law of regression in 1885, perhaps in the same address.

Karl Pearson used regression and coefficient of regression in 1897 in Phil. Trans. R. Soc.:

The coefficient of regression may be defined as the ratio of the mean deviation of the fraternity from the mean off-spring to the deviation of the parentage from the mean parent. ... From this special definition of regression in relation to parents and offspring, we may pass to a general conception of regression. Let A and B be two correlated organs (variables or measurable characteristics) in the same or different individuals, and let the sub-group of organs B, corresponding to a sub-group of A with a definite value a, be extracted. Let the first of these sub-groups be termed an array, and the second a type. Then we define the coefficient of regression of the array on the type to be the ratio of the mean-deviation of the array from the mean B-organ to the deviation of the type a from the mean A-organ.

The phrase "multiple regression coefficients" appears in the 1903 Biometrika paper "The Law of Ancestral Heredity" by Karl Pearson, G. U. Yule, Norman Blanchard, and Alice Lee. From around 1895 Pearson and Yule had worked on multiple regression and the phrase "double regression" appears in Pearson's paper "Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia" (Phil. Trans. R. Soc. 1896). [This paragraph was contributed by John Aldrich.]

RISK and RISK FUNCTION (referring to the expected value of the loss in statistical decision theory) first appear in Wald's "Contributions to the Theory of Statistical Estimation and Testing Hypotheses," Annals of Mathematical Statistics, 10, (1939), 299-326 [John Aldrich, based on David (2001)].
SAMPLE. The juxtaposition of sample and population seems to have originated with Karl Pearson writing in 1903 in Biometrika 2, 273. The relevant passage appears in OED2: "If the whole of a population were taken we should have certain values for its statistical constants, but in actual practice we are only able to take a sample ...." Pearson's colleague, the zoologist W. F. R. Weldon, had been using "sample" to refer to collections of observations since 1892. (See also random sample.) [John Aldrich]
SAMPLE SPACE was introduced into statistical theory by J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. A (1933), 289-337. It was associated with the representation of a sample comprising n numbers as a point in n-dimensional space, a representation R. A. Fisher had exploited in articles going back to 1915. W. Feller used this notion of sample space in his "Note on regions similar to the sample space," Statist. Res. Mem., Univ. London 2, 117-125 (1938) but in the Introduction to Probability Theory and its Applications, volume one of 1950 Feller used the term quite abstractly for the set of outcomes of an experiment. He attributed this general concept to Richard von Mises (1883-1953) who had referred to the Merkmalraum (label space) in writings on the foundations of probability from 1919 onwards [John Aldrich].

The term may have been used earlier by Richard von Mises (1883-1953).

SAMPLING DISTRIBUTION. R. A. Fisher seems to have introduced this term. It appears incidentally in 1922 (JRSS, 85, 598) and then in the title of his 1928 paper "The General Sampling Distribution of the Multiple Correlation Coefficient," Proc. Roy. Soc. A, 213, p. 654.
SCATTER DIAGRAM is found in 1925 in F. C. Mills, Statistical Methods X. 366: "The equation to a straight line, fitted by the method of least squares to the points on the scatter diagram, will express mathematically the average relationship between these two variables" (OED2).

Scattergram is found in 1938 in A. E. Waugh, Elem. Statistical Method: "This is the method of plotting the data on a scatter diagram, or scattergram, in order that one may see the relationship" (OED2).

Scatterplot is found in 1939 in Statistical Dictionary of Terms and Symbols by Kurtz and Edgerton (David, 1998).

SCORE and METHOD OF SCORING in the theory of statistical estimation. The derivative of the log-likelihood function played an important part in R. A. Fisher's theory of maximum likelihood from its beginnings in the 1920s but the name score is more recent. The "score" was originally associated with a particular genetic application; a family is assigned a score based on the number of children of each category and there were different ways of scoring associated with different ways of estimating linkage. In a 1935 paper ("The Detection of Linkage with Dominant Abnormalities," Annals of Eugenics, 6, 193) Fisher wrote that, because of the efficiency of maximum likelihood, the "ideal score" is provided by the derivative of the log-likelihood function. In 1948 C. R. Rao used the phrase efficient score (Proc. Cambr. Philos. Soc. 44, 50-57) and score by itself (J. Roy. Statist. Soc., B, 10: 159-203) when writing about maximum likelihood in general, i.e. without reference to the linkage application. Today "score" is so established in this derivative of the log-likelihood sense that the phrases "non-ideal score" or "inefficient score" convey nothing.

In 1946 - still in the genetic context - Fisher ("A System of Scoring Linkage Data, with Special Reference to the Pied Factors in Mice," Amer. Nat., 80: 568-578) described an iterative method for obtaining the maximum likelihood value. Rao's 1948 J. Roy. Statist. Soc. B paper treats the method in a more general framework and the phrase "Fisher's method of scoring" appears in a comment by Hartley. Fisher had already used the method in a general context in his 1925 "Theory of Statistical Estimation" paper (Proc. Cambr. Philos. Soc. 22: 700-725) but it attracted neither attention nor name. [This entry was contributed by John Aldrich, with some information taken from David (1995).]

SERIAL CORRELATION. The term was introduced by G. U. Yule in his 1926 paper "Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series," Journal of the Royal Statistical Society, 89, 1-69 (David 2001).
The term SET first appears in Paradoxien des Unendlichen (Paradoxes of the Infinite), Hrsg. aus dem schriftlichen Nachlasse des Verfassers von Fr. Prihonsky, C. H. Reclam sen., xi, pp. 157, Leipzig, 1851. This small tract by Bernhard Bolzano (1781-1848) was published three years after his death by a student Bolzano had befriended (Burton, page 592).

Menge (set) is found in Geometrie der Lage (2nd ed., 1856) by Carl Georg Christian von Staudt: "Wenn man die Menge aller in einem und demselben reellen einfoermigen Gebilde enthaltenen reellen Elemente durch n + 1 bezeichnet und mit diesem Ausdrucke, welcher dieselbe Bedeutung auch in den acht folgenden Nummern hat, wie mit einer endlichen Zahl verfaehrt, so ..." [Ken Pledger].

Georg Cantor (1845-1918) did not define the concept of a set in his early works on set theory, according to Walter Purkert in Cantor's Philosophical Views.

Cantor's first definition of a set appears in an 1883 paper: "By a set I understand every multitude which can be conceived as an entity, that is every embodiment [Inbegriff] of defined elements which can be joined into an entirety by a rule." This quotation is taken from Über unendliche lineare Punctmannichfaltigkeiten, Mathematische Annalen, 21 (1883).

In 1895 Cantor used the word Menge in Beiträge zur Begründung der Transfiniten Mengenlehre, Mathematische Annalen, 46 (1895):

By a set we understand every collection [Zusammenfassung] M of defined, well-distinguished objects m of our intuition [Anschauung] or our thinking (which are called the elements of M) brought together to form an entirety.
This translation was taken from Cantor's Philosophical Views by Walter Purkert.
SIGN TEST appears in W. MacStewart, "A note on the power of the sign test," Ann. Math. Statist. 12 (1941) [James A. Landau].
SIGNIFICANCE. Significant is found in 1885 in F. Y. Edgeworth, "Methods of Statistics," Jubilee Volume, Royal Statistical Society, pp. 181-217: "In order to determine whether the observed difference between the mean stature of 2,315 criminals and the mean stature of 8,585 British adult males belonging to the general population is significant [etc.]" (OED2).

Significance is found in 1888 in Logic of Chance by John Venn: "As before, common sense would feel little doubt that such a difference was significant, but it could give no numerical estimate of the significance" (OED2).

Test of significance and significance test are found in 1907 in Biometrika V. 183: "Several other cases of probable error tests of significance deserve reconsideration" (OED2).

Testing the significance is found in "New tables for testing the significance of observations," Metron 5 (3) pp 105-108 (1925) [James A. Landau].

Statistically significant is found in 1931 in L. H. C. Tippett, Methods Statistics: "It is conventional to regard all deviations greater than those with probabilities of 0.05 as real, or statistically significant" (OED2).

Statistical significance is found in 1938 in Journal of Parapsychology: "The primary requirement of statistical significance is met by the results of this investigation" (OED2).

See also rank correlation.

SKEW DISTRIBUTION appears in 1895 in a paper by Karl Pearson [James A. Landau].
SPURIOUS CORRELATION. The term was introduced by Karl Pearson in "On a Form of Spurious Correlation Which May Arise When Indices Are Used in the Measurement of Organs," Proc. Royal Society, 60, (1897), 489-498. Pearson showed that correlation between indices u (= x/z) and v (= y/z) was a misleading guide to correlation between x and y. His illustration is:
A quantity of bones are taken from an ossuarium, and are put together in groups which are asserted to be those of individual skeletons. To test this a biologist takes the triplet femur, tibia, humerus, and seeks the correlation between the indices femur/humerus and tibia/humerus. He might reasonably conclude that this correlation marked organic relationship, and believe that the bones had really been put together substantially in their individual grouping. As a matter of fact ... there would be ... a correlation of about 0.4 to 0.5 between these indices had the bones been sorted absolutely at random.
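Pearson's ossuary illustration can be reproduced by simulation. In the sketch below the bone lengths are drawn independently, so the lengths themselves are uncorrelated, yet the two indices sharing the humerus as divisor are correlated. The means, standard deviations, seed and function name are hypothetical choices (with roughly equal coefficients of variation), not Pearson's data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1897)
n = 5000
# Hypothetical bone lengths in cm, drawn independently of one another,
# i.e. bones "sorted absolutely at random".
femur = [rng.gauss(45.0, 2.5) for _ in range(n)]
tibia = [rng.gauss(37.0, 2.0) for _ in range(n)]
humerus = [rng.gauss(33.0, 1.8) for _ in range(n)]

r_lengths = pearson_r(femur, tibia)  # near zero by construction
r_indices = pearson_r(
    [f / h for f, h in zip(femur, humerus)],
    [t / h for t, h in zip(tibia, humerus)],
)  # clearly positive, induced only by the shared divisor
```

With these settings the correlation between the indices falls in the region Pearson describes, while the correlation between the raw lengths stays close to zero.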
The term has been applied to other correlation scenarios with potential for misleading inferences. In Student's "The Elimination of Spurious Correlation due to Position in Time or Space" (Biometrika, 10, (1914), 179-180) the source of the spurious correlation is the common trends in the series. In H. A. Simon's "Spurious Correlation: A Causal Interpretation," Journal of the American Statistical Association, 49, (1954), pp. 467-479 the source of the spurious correlation is a common cause acting on the variables. In the recent spurious regression literature in time series econometrics (Granger & Newbold, Journal of Econometrics, 1974) the misleading inference comes about through applying the correlation theory for stationary series to non-stationary series. The dangers of doing this were pointed out by G. U. Yule in his 1926 "Why Do We Sometimes Get Nonsense Correlations between Time-series? A Study in Sampling and the Nature of Time-series," Journal of the Royal Statistical Society, 89, 1-69. (Based on Aldrich 1995)
The term STANDARD DEVIATION was introduced by Karl Pearson (1857-1936) in 1893, "although the idea was by then nearly a century old" (Abbott; Stigler, page 328). According to the DSB:
The term "standard deviation" was introduced in a lecture of 31 January, 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error."
The OED2 shows a use of standard deviation in 1894 by Pearson in "Contributions to the Mathematical Theory of Evolution," Philosophical Transactions of the Royal Society of London, Ser. A. 185, 71-110: "Then σ will be termed its standard-deviation (error of mean square)."
STANDARD ERROR is found in 1897 in G. U. Yule, "On the Theory of Correlation," Journal of the Royal Statistical Society, 60, 812-854: "We see that σ_1 √(1 - r^2) is the standard error made in estimating x" (OED2).
STANDARD SCORE. In 1921 Univ. Illin. Bur. Educ. Res. Bull. has: "Provision is made for comparing a pupil's achievement score..with the norm corresponding to his mental age by dividing his achievement age by the standard score for his mental age. This quotient is called the Achievement Quotient" (OED2).

Standard score is dated 1928 in MWCD10.

STANINE is dated 1944 in MWCD10.

The earliest citation in the OED2 is from the Baltimore Sun, Oct. 1, 1945, "The result .. was a 'stanine' rating (stanine being an invented word, from 'standard of nine')."

Stanines were first used to describe an examinee's performance on a battery of tests constructed for the U. S. Army Air Force during World War II.

STATISTIC (as opposed to parameter) is found in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A., 222, 309-368: "These involve the choice of methods of calculating from a sample statistical derivates, or as we shall call them statistics, which are designed to estimate the values of the parameters of the hypothetical population" (OED2).

This term was introduced in 1922 by Fisher, according to Tankard (p. 112).

The term statistic was not well received initially. Arne Fisher (no relation) asked Fisher, "Where ... did you get that atrocity, a statistic?" (letter (p. 312) in J. H. Bennett, Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher (1990)). Karl Pearson objected, "Are we also to introduce the words a mathematic, a physic, an electric etc., for parameters or constants of other branches of science?" (p. 49n of Biometrika, 28, 34-59 1936). [These two quotations were provided by John Aldrich.]

STATISTICS originally referred to political science and it is difficult to determine when the word was first used in a purely mathematical sense. The earliest citation of the word statistics in the OED2 is in 1770 in W. Hooper's translation of Bielfeld's Elementary Universal Education: "The science, that is called statistics, teaches us what is the political arrangement of all the modern states of the known world." However, there are earlier citations for statistical and Latin and German forms of statistic, all used in a political sense.

In Webster's dictionary of 1828 the definition of statistics is: "A collection of facts respecting the state of society, the condition of the people in a nation or country, their health, longevity, domestic economy, arts, property and political strength, the state of the country, &c."

STOCHASTIC is found in English as early as 1662 with the obsolete meaning "pertaining to conjecture."

In its modern sense, the term was used in 1917 by Ladislaus Josephowitsch Bortkiewicz (1868-1931) in Die Iterationen 3: "Die an der Wahrscheinlichkeitstheorie orientierte, somit auf 'das Gesetz der Grossen Zahlen' sich gründende Betrachtung empirischer Vielheiten möge als Stochastik ... bezeichnet werden" (OED2).

Stochastic process is found in A. N. Kolmogorov, "Sulla forma generale di un processo stocastico omogeneo," Rend. Accad. Lincei Cl. Sci. Fis. Mat. 15 (1) page 805 (1932) [James A. Landau].

Stochastic process is also found in A. Khintchine, "Korrelationstheorie der stationären stochastischen Prozesse," Math. Ann. 109 (1934) [James A. Landau].

Stochastic process occurs in English in "Stochastic processes and statistics," Proc. Natl. Acad. Sci. USA 20 (1934).

STRATIFIED SAMPLING occurs in J. Neyman, "On the two different aspects of the representative method; the method of stratified sampling and the method of purposive selection," J. R. Statist. Soc. 97 (1934) [James A. Landau].
STRONG LAW OF LARGE NUMBERS is found in A. N. Kolmogorov, "Sur la loi forte des grands nombres," Comptes Rendus de l'Académie des Sciences, Paris 191 page 910 (1930) [James A. Landau].
STUDENT'S t-DISTRIBUTION. "Student" was the pseudonym of William Sealy Gosset (1876-1937). Gosset once wrote to R. A. Fisher, "I am sending you a copy of Student's Tables as you are the only man that's ever likely to use them!" The letter appears in Letters from W. S. Gosset to R. A. Fisher, 1915-1936 (1970). Student's tables became very important in statistics but not in the form he first constructed them.

In his 1908 paper, "The Probable Error of a Mean," Biometrika 6, 1-25, Gosset introduced the statistic z for testing hypotheses on the mean of the normal distribution. Gosset used the divisor n, not the modern (n - 1), when he estimated σ, and his z is proportional to t, with t = z √(n - 1). Fisher introduced the t form because it fitted in with his theory of degrees of freedom. Fisher's treatment of the distributions based on the normal distribution and the role of degrees of freedom was given in "On a Distribution Yielding the Error Functions of Several Well Known Statistics," Proceedings of the International Congress of Mathematics, Toronto, 2, 805-813. The t symbol appears in this paper but, although the paper was presented in 1924, it was not published until 1928 (Tankard, page 103; David, 1995). According to the OED2, the letter t was chosen arbitrarily. A new symbol suited Fisher, for he was already using z for a statistic of his own (see entry for F).
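The relation between Gosset's z and the modern t stated above can be verified numerically. In this sketch the sample values and the function names are hypothetical; pstdev supplies the divisor-n standard deviation and stdev the divisor-(n - 1) one.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def student_z(sample, mu):
    """Gosset's 1908 form: deviation of the mean over the divisor-n std. deviation."""
    return (mean(sample) - mu) / pstdev(sample)


def modern_t(sample, mu):
    """Modern form: deviation of the mean over its estimated standard error."""
    return (mean(sample) - mu) / (stdev(sample) / sqrt(len(sample)))


# Hypothetical sample; mu is the hypothesised mean.
sample = [2.1, 0.4, 1.3, -0.5, 3.2, 1.8, 0.9, 2.6]
z = student_z(sample, mu=0.0)
t = modern_t(sample, mu=0.0)
rescaled_z = z * sqrt(len(sample) - 1)  # equals t up to rounding
```

The agreement is algebraic rather than approximate: replacing the divisor n by n - 1 and dividing by the square root of n multiplies the statistic by exactly the square root of n - 1.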

Student's distribution (without "t") appears in 1925 in R. A. Fisher, "Applications of 'Student's' Distribution," Metron 5, 90-104 and in Statistical Methods for Research Workers (1925). The book made Student's distribution famous; it presented new uses for the tables and made the tables generally available.

"Student's" t-distribution appears in 1929 in Nature (OED2).

t-distribution appears (without Student) in A. T. McKay, "Distribution of the coefficient of variation and the extended 't' distribution," J. Roy. Stat. Soc., n. Ser. 95 (1932).

t-test is found in 1932 in R. A. Fisher, Statistical Methods for Research Workers: "The validity of the t-test, as a test of this hypothesis, is therefore absolute" (OED2).

Eisenhart (1979) is the best reference for the evolution of t, although Tankard and Hald also discuss it.

[This entry was largely contributed by John Aldrich.]

STUDENTIZATION. According to Hald (p. 669), William Sealy Gosset (1876-1937) used the term Studentization in a letter to E. S. Pearson of Jan. 29, 1932.

Studentized D² statistic is found in R. C. Bose and S. N. Roy, "The exact distribution of the Studentized D² statistic," Sankhya 3 pt. 4 (1935) [James A. Landau].

SUFFICIENT STATISTIC. Criterion of Sufficiency and sufficient statistic appear in 1922 in R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, Ser. A, 222, 309-368:
The statistic chosen should summarise the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency. ... In the case of the normal curve of distribution it is evident that the second moment is a sufficient statistic for estimating the standard deviation.
According to Hald (page 452), Fisher introduced the term sufficiency in a 1922 paper.
TIME SERIES appears in W. M. Persons's "The Correlation of Economic Statistics," Publications of the American Statistical Association, 12, (1910), 287-322 [John Aldrich].
The phrase TIME SERIES ANALYSIS entered circulation at the end of the 1920s, e.g. in S. Kuznets's "On the Analysis of Time Series," Journal of the American Statistical Association, 23, (1928), 398-410, although it only became really popular much later [John Aldrich].
TYPE I ERROR and TYPE II ERROR. In their first joint paper "On the Use of Certain Test Criteria for Purposes of Statistical Inference, Part I," Biometrika, (1928) 20A, 175-240 Neyman and Pearson referred to "the first source of error" and "the second source of error" (David, 1995).

Errors of first and second kind is found in 1933 in J. Neyman and E. S. Pearson, "On the Problems of the Most Efficient Tests of Statistical Hypotheses," Philosophical Transactions of the Royal Society of London, Ser. A (1933), 289-337 (David, 1995).

Type I error and Type II error are found in 1933 in J. Neyman and E. S. Pearson, "The Testing of Statistical Hypotheses in Relation to Probabilities A Priori," Proceedings of the Cambridge Philosophical Society, 29, 492-510 (David, 1995).

UNIFORMLY DISTRIBUTED. Uniform distribution appears in 1937 in Introduction to Mathematical Probability by J. V. Uspensky. Page 237 reads, "A stochastic variable is said to have uniform distribution of probability if probabilities attached to two equal intervals are equal." This is a slight variant of the modern terminology, which would be "a variable is said to be uniformly distributed" or "a variable from the uniform distribution" [James A. Landau].

Uniformly distributed is found in H. Sakamoto, "On the distributions of the product and the quotient of the independent and uniformly distributed random variables," Tohoku Math. J. 49 (1943).

The phrase UNIFORMLY MOST POWERFUL occurs in R. A. Fisher, "Two New Properties of Mathematical Likelihood," Proceedings of the Royal Society, Series A, vol. 144 (1934) [James A. Landau].
UNIMODAL is found in 1904 in F. de Helguero, "Sui massimi delle curve dimorfiche," Biometrika, 3, 84-98 (David, 1995).
UNIVARIATE is found in 1928 in Biometrika XXa. 32: "Various writers struggled with the problems that arise when samples are taken from uni-variate and bi-variate populations" (OED2).
VARIANCE. Edgeworth used fluctuation for twice the square of the standard deviation.

Variance was introduced by Ronald Aylmer Fisher in 1918 in "The Correlation Between Relatives on the Supposition of Mendelian Inheritance," Transactions of the Royal Society of Edinburgh, 52, 399-433: "It is ... desirable in analysing the causes of variability to deal with the square of the standard deviation as the measure of variability. We shall term this quantity the Variance."

VENN DIAGRAM. Euler's scheme of notation is found in 1858 in Elements of logic by Henry Coppée (1821-1895): "Euler's scheme of notation is altogether the one best suited to our purpose, and we shall limit ourselves to the explanation of that. It is essentially an arrangement of three circles, to represent the three terms of a syllogism, and, by their combination, the three propositions" [University of Michigan Digital Library].

Euler's system of notation appears in 1863 in An outline of the necessary laws of thought: a treatise on pure and applied logic by William Thomson (University of Michigan Digital Library).

Euler's notation appears in about 1869 in The principles of logic, for high schools and colleges by Aaron Schuyler (University of Michigan Digital Library).

Euler's diagram appears in 1884 in Elementary Lessons in Logic by W. Stanley Jevons: "Euler's diagram for this proposition may be constructed in the same manner as for the proposition I as follows:..."

Euler's circles appears in 1893 in Logic by William Minto (1845-1893): "The relations between the terms in the four forms are represented by simple diagrams known as Euler's circles."

Euler's circles appears in October 1937 in George W. Hartmann, "Gestalt Psychology and Mathematical Insight," The Mathematics Teacher: "But in the case of 'Euler's circles' as used in elementary demonstrations of formal logic, one literally 'sees' how intimately syllogistic proof is linked to direct sensory perception of the basic pattern. It seems that the famous Swiss mathematician of the eighteenth century was once a tutor by correspondence to a dull-witted Russian princess and devised this method of convincing her of the reality and necessity of certain relations established deductively."

Venn diagram appears in 1918 in A Survey of Symbolic Logic by Clarence Irving Lewis: "This method resembles nothing so much as solution by means of the Venn diagrams" (OED2).

WINSORIZED is found in 1960 in W. J. Dixon, "Simplified Estimation from Censored Normal Samples," The Annals of Mathematical Statistics, 31, 385-391 (David, 1998).
The terms z-STATISTIC and z-DISTRIBUTION were introduced by R. A. Fisher in "On a distribution yielding the error functions of several well-known statistics," Proceedings of the International Mathematics Congress, Toronto (1924) [James A. Landau].
Sources: http://members.aol.com/jeff570/sources.html