/docs/statistics/order/ Directory Listing

Directories

Files

  • 2016-dewinter.pdf: “Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data”⁠, Joost C. F. de Winter, Samuel D. Goslin, Jeff Potter

  • 2005-dosanjos.pdf: ⁠, Ulisses U. dos Anjos, Nikolai Kolev, Nelson I. Tanaka (2005-12; backlinks):

    We exhibit a copula representation of the (r, s)-th bivariate order statistics from an independent sample of size n. We give conditions when such a representation converges weakly to a bivariate Gaussian copula. A recurrence relationship between the density of the order statistics is presented and related Fréchet bounds are given. The usefulness of those results is stressed through examples.

    [Keywords: Bivariate binomial, copula, Fréchet bounds, normal asymptotics, order statistics.]

  • 2004-barakat.pdf: ⁠, H. M. Barakat, M. A. El-Shandidy (2004-01):

    This work gives a new representation of the distribution and expected value of the concomitant rank of order statistics.

    An advantage of this representation is its ability to extend without any complexity to the multivariate case. Moreover, it gives a new direct approach to compute an approximate formula for the distribution and expected value of the concomitant rank of order statistics. Finally, an upper bound is derived for the confidence level of the tolerance region of the original bivariate (resp., multivariate) d.f., from which the sample is drawn.

    [Keywords: order statistics, concomitants, ranking, tolerance region]

  • 1999-chen.pdf: ⁠, Chien-Chung Chen, Christopher W. Tyler (1999; backlinks):

    Evaluation of the integral properties of Gaussian Statistics is problematic because the Gaussian function is not analytically integrable. We show that the expected value of the greatest order statistics in Gaussian samples (the max distribution) can be accurately approximated by the expression Φ⁻¹(0.5264^(1/n)), where n is the sample size and Φ⁻¹ is the inverse of the Gaussian cumulative distribution function. The expected value of the least order statistics in Gaussian samples (the min distribution) is correspondingly approximated by −Φ⁻¹(0.5264^(1/n)). The standard deviation of both extreme order distributions can be approximated by the expression 0.5[Φ⁻¹(0.8832^(1/n)) − Φ⁻¹(0.2142^(1/n))]. We also show that the probability density function of the extreme order distribution can be well approximated by gamma distributions with appropriate parameters. These approximations are accurate, computationally efficient, and readily implemented by built-in functions in many commercial mathematical software packages such as Matlab, Mathematica, and Excel.
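
    The approximations in this abstract are short enough to sketch directly. The block below is a minimal illustration, not the authors' code: it reads the constants as 0.5264, 0.8832, and 0.2142 each raised to the power 1/n, and uses Python's standard-library `statistics.NormalDist` for Φ⁻¹. The function names are made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def approx_expected_max(n: int) -> float:
    """Approximate E[max] of n i.i.d. standard normal draws.

    Uses the abstract's expression, read as inv_cdf(0.5264 ** (1/n)).
    """
    return _STD_NORMAL.inv_cdf(0.5264 ** (1.0 / n))


def approx_expected_min(n: int) -> float:
    """Approximate E[min]; by symmetry, the negative of the max."""
    return -approx_expected_max(n)


def approx_sd_extreme(n: int) -> float:
    """Approximate standard deviation of the max (or min) of n draws."""
    upper = _STD_NORMAL.inv_cdf(0.8832 ** (1.0 / n))
    lower = _STD_NORMAL.inv_cdf(0.2142 ** (1.0 / n))
    return 0.5 * (upper - lower)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, approx_expected_max(n), approx_sd_extreme(n))
```

    For a sample mean μ and standard deviation σ other than 0 and 1, the results would be rescaled as μ + σ · (expected extreme) and σ · (standard deviation).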

  • 1996-lubinski-2.pdf: ⁠, David Lubinski, Lloyd G. Humphreys (1996; backlinks):

    When measures of individual differences are used to predict group performance, the reporting of correlations computed on samples of individuals invites misinterpretation and dismissal of the data. In contrast, if regression equations are used in which the required correlations are computed on bivariate means, as are the distribution statistics, it is difficult to underappreciate or lightly dismiss the utility of psychological predictors.

    Given sufficient sample size and linearity of regression, this technique produces cross-validated regression equations that forecast criterion means with almost perfect accuracy. This level of accuracy is provided by correlations approaching unity between bivariate samples of predictor and criterion means, and this holds true regardless of the magnitude of the “simple” correlation (e.g., r_xy = 0.20, or r_xy = 0.80).

    We illustrate this technique empirically using a measure of general intelligence as the predictor and other measures of individual differences and socioeconomic status as criteria. In addition to theoretical applications pertaining to group trends, this methodology also has implications for applied problems aimed at developing policy in numerous fields.
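
    The claim that correlations between group means approach unity even when the individual-level correlation is modest can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it draws synthetic bivariate-normal data with a chosen r, groups cases into equal-sized bins by predictor score (the binning into 10 groups is an assumption of this example), and correlates the bin means. All names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(r=0.20, n=20_000, n_groups=10, seed=0):
    """Return (individual-level r, r between group means)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y has unit variance and population correlation r with x.
        y = r * x + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        pairs.append((x, y))

    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    # Assumed grouping: equal-sized bins ordered by predictor score.
    pairs.sort()
    size = n // n_groups
    x_means, y_means = [], []
    for g in range(n_groups):
        chunk = pairs[g * size:(g + 1) * size]
        x_means.append(sum(p[0] for p in chunk) / len(chunk))
        y_means.append(sum(p[1] for p in chunk) / len(chunk))

    return pearson(xs, ys), pearson(x_means, y_means)


if __name__ == "__main__":
    individual_r, group_r = simulate()
    print(individual_r, group_r)
```

    With these defaults the individual-level correlation stays near the chosen 0.20, while the correlation between bin means comes out close to 1, because averaging within bins removes most of the individual-level noise.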

    …To summarize, psychological variables generating modest correlations frequently are discounted by those who focus on the magnitude of unaccounted for criterion variance, large standard errors, and frequent false positive and false negative errors in predicting individuals. Dismissal of modest correlations (and the utility of their regressions) by professionals based on this psychometric-statistical reasoning has spread to administrators, journalists, and legislative policy makers. Some examples of this have been compiled by Dawes (1979, 1988) and Linn (1982). They range from squaring a correlation of 0.345 (i.e., 0.12) and concluding that for 88% of students, “An SAT score will predict their grade rank no more accurately than a pair of dice” (cf. Linn, 1982, p. 280) to evaluating the differential utility of two correlations 0.20 and 0.40 (based on different procedures for selecting graduate students) as “twice of nothing is nothing” (cf. Dawes, 1979, p. 580).

    …Tests are used, however, in ways other than the prediction of individuals or of a specific outcome for Johnny or Jane. And policy decisions based on tests frequently have broader implications for individuals beyond those directly involved in the assessment and selection context (see the discussion later in this article). For example, selection of personnel in education, business, industry, and the military focuses on the criterion performance of groups of applicants whose scores on selection instruments differ. Selection psychologists have long made use of modest predictive correlations when the ratio of applicants to openings becomes large. The relation of utility to size of correlation, relative to the selection ratio and base rate for success (if one ignores the test scores), is incorporated in the well-known Taylor-Russell (1939) tables. These tables are examples of how psychological tests have revealed convincingly economic and societal benefits (), even when a correlation of modest size remains at center stage. For example, given a base rate of 30% for adequate performance and a predictive validity coefficient of 0.30 within the applicant population, selecting the top 20% on the predictor test will result in 46% of hires ultimately achieving adequate performance (a 16% gain over base rate). To be sure, the prediction for individuals within any group is not strong—about 9% of the variance in job performance. Yet, when training is expensive or time-consuming, this can result in huge savings. For analyses of groups composed of anonymous persons, however, there is a more unequivocal way of illustrating the importance of modest correlations than even the Taylor-Russell tables provide.
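
    The Taylor-Russell figure quoted above (base rate 30%, validity 0.30, top 20% selected, about 46% successful) can be approximately reproduced numerically. The sketch below assumes predictor and criterion are standard bivariate normal, which is the usual model behind such tables but is an assumption here; the function name and the simple midpoint integration are choices made for this example.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def success_rate_among_selected(validity, base_rate, selection_ratio,
                                steps=4000):
    """P(criterion above cutoff | predictor in the top selection_ratio).

    Assumes a standard bivariate normal with correlation `validity`.
    """
    x_cut = STD.inv_cdf(1.0 - selection_ratio)  # predictor cutoff
    y_cut = STD.inv_cdf(1.0 - base_rate)        # criterion cutoff
    resid_sd = math.sqrt(1.0 - validity ** 2)   # SD of y given x

    # Midpoint-rule integral over x from x_cut to x_cut + 8,
    # which is effectively +infinity for a standard normal.
    h = 8.0 / steps
    total = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * h
        p_success = 1.0 - STD.cdf((y_cut - validity * x) / resid_sd)
        total += STD.pdf(x) * p_success * h
    return total / selection_ratio


if __name__ == "__main__":
    print(success_rate_among_selected(0.30, 0.30, 0.20))
```

    Setting the validity to 0 returns the base rate, as it should: without a valid predictor, selection does not change the proportion of successes.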

    Rationale for an Alternative Approach: Applied psychologists discovered decades ago that it is more advantageous to report correlations between a continuous predictor and a dichotomous criterion graphically rather than as a number that varies between zero and one. For example, the correlation () of about 0.40 with the pass-fail pilot training criterion and an ability-stanine predictor looks quite impressive when graphed in the manner of Figure 1a. In contrast, in Figure 1b, a scatter plot of a correlation of 0.40 between two continuous measures looks at first glance like the pattern of birdshot on a target. It takes close scrutiny to perceive that the pattern in Figure 1b is not quite circular for the small correlation. Figure 1a communicates the information more effectively than Figure 1b. When the data on the predictive validity of the pilot ability-stanine were presented in the form of Figure 1a (rather than, say, as a scatter plot of a correlation of 0.40; Figure 1b), general officers in recruitment, training, logistics, and operations immediately grasped the importance of the data for their problems. Because the Army Air Forces were an attractive career choice, there were many more applicants for pilot training than could be accommodated and selection was required…A small gain on a criterion for a unit of gain on the predictor, as long as it is predicted with near-perfect accuracy, can have high utility.

    Figure 1. a: Percentage of pilots eliminated from a training class as a function of pilot aptitude rating in stanines. Number of trainees in each stanine is shown on each bar. (From DuBois 1947). b: A synthetic example of a correlation of 0.40 (n = 400).
  • 1989-hartigan-fairnessinemploymenttesting.pdf: ⁠, John A. Hartigan, Alexandra K. Wigdor (1989; backlinks):

    Declining American competitiveness in world economic markets has renewed interest in employment testing as a way of putting the right workers in the right jobs. A new study of the U.S. Department of Labor’s General Aptitude Test Battery (GATB) Referral System sheds light on key questions for America’s employers: How well does the GATB predict job success? Are there scientific justifications for adjusting minority test scores? Will increased use of the GATB result in substantial increases in productivity?

    Fairness in Employment Testing evaluates both the validity generalization techniques used to justify the use of the GATB across the spectrum of U.S. jobs and the policy of adjusting test scores to promote equal opportunity.


    This volume is one of a number of studies conducted under the aegis of the National Research Council/National Academy of Sciences that deal with the use of standardized ability tests to make decisions about people in employment or educational settings. Because such tests have a sometimes important role in allocating opportunities in American society, their use is quite rightly subject to questioning and not infrequently to legal scrutiny. At issue in this report is the use of a federally sponsored employment test, the General Aptitude Test Battery (GATB), to match job seekers to requests for job applicants from private-sector and public-sector employers. Developed in the late 1940s by the U.S. Employment Service (USES), a division of the Department of Labor, the GATB is used for vocational counseling and job referral by state-administered Employment Service (also known as Job Service) offices located in some 1,800 communities around the country.


    • Front Matter
    • Summary
    1. The Policy Context
    2. Issues in Equity and Law
    3. The Public Employment Service
    4. The GATB: Its Character and Psychometric Properties
    5. Problematic Features of the GATB: Test Administration, Speedness, and Coachability
    6. The Theory of Validity Generalization
    7. Validity Generalization Applied to the GATB
    8. GATB Validities
    9. Differential Validity and Differential Prediction
    10. The VG-GATB Program: Concept, Promotion, and Implementation
    11. In Whose Interest: Potential Effects of the VG-GATB Referral System
    12. Evaluation of Economic Claims
    13. Recommendations for Referral and Score Reporting
    14. Central Recommendations
    • References
    • Appendix A: A Synthesis of Research on Some Psychometric Properties of the GATB
    • Appendix B: Tables Summarizing GATB Reliabilities
    • Appendix C: Biographical Sketches, Committee Members and Staff
    • Index
  • 1987-galambos-theasymptotictheoryofextremeorderstatistics2nded.pdf: “The Asymptotic Theory of Extreme Order Statistics, Second Edition”⁠, Janos Galambos (backlinks)

  • 1982-royston.pdf: “Expected Normal Order Statistics (Exact and Approximate)”⁠, J. P. Royston (backlinks)

  • 1967-srivastava.pdf: ⁠, O. P. Srivastava (1967-06; backlinks):

    The exact distribution of extremes in a sample and its limiting forms are well known in the univariate case. The limiting form for the largest observation in a sample was derived by Fisher and Tippett (1928) as early as 1927 by a functional equation, and that for the smallest was studied by Smirnov (1952). Though the joint distribution of two extremes has not been fully studied yet gave a necessary and sufficient condition for the asymptotic independence of two largest extremes in a bivariate distribution. In this paper a necessary and sufficient condition for the asymptotic independence of two smallest observations in a bivariate sample has been derived, and the result has been used to find the condition for the asymptotic independence of any pair of extreme order statistics, one in each component of the bivariate sample. This result is further extended to find the condition for asymptotic independence of the pair of distances between two order statistics, arising from each component.

  • 1967-deakin.pdf: “Estimating Bounds on Athletic Performance”⁠, Michael Deakin

  • 1964-mardia.pdf: ⁠, K. V. Mardia (1964-11-01; backlinks):

    has given a necessary and sufficient condition for asymptotic independence of two extremes for a sample from a bivariate population. We shall obtain such a condition for asymptotic independence of all the four extremes X, X’, Y and Y’. It assumes a very simple form when f(x,y) is symmetrical in x and y, and the marginal p.d.f.s of x and y have the same form. Under these conditions on the p.d.f., a modification is possible in the condition given by Sibuya (1960) which reduces to one given by Watson (1954) for another purpose. It is further shown that extremes for samples from a bivariate normal population satisfy our condition if |ρ| < 1, where ρ is the population correlation coefficient. Geffroy (1958) and Sibuya (1960) have proved a particular result for asymptotic independence of only two extremes X and Y in the normal case.

  • 1961-harter.pdf: “Expected Values of Normal Order Statistics”⁠, H. Leon Harter (backlinks)

  • 1958-blom-orderstatistics.pdf: “Statistical Estimates and Transformed Beta-Variables”⁠, Gunnar Blom (backlinks)

  • 1947-dubois-theclassificationprogram.pdf: “The Classification Program”⁠, Philip H. Dubois (backlinks)

  • 1947-elfving.pdf: ⁠, G. Elfving (1947; backlinks):

    Consider a sample of n observations, taken from an infinite normal population with the mean 0 and the standard deviation 1. Let a be the smallest and b the greatest of the observed values. Then w = b − a is the range of the sample. For certain statistical purposes knowledge of the sampling distribution of range is needed. The distribution function, however, involves a rather complicated integral, whose exact calculation is, for n > 2, impossible…it seems to be at least of theoretical interest to investigate the asymptotical distribution of range for n → ∞. This is the purpose of the present paper.
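
    Although the integral for the distribution of the range has no closed form for n > 2, it is easy to evaluate numerically. The sketch below uses the standard textbook expression P(w ≤ t) = n ∫ φ(x) [Φ(x + t) − Φ(x)]^(n−1) dx for n ≥ 2; whether this is the exact form the paper works with is an assumption, and the function name, integration limits, and step count are choices made for this example.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def range_cdf(t, n, lo=-10.0, hi=10.0, steps=20_000):
    """P(range <= t) for n >= 2 i.i.d. standard normal observations.

    Midpoint-rule evaluation of
        n * integral of pdf(x) * (cdf(x + t) - cdf(x)) ** (n - 1) dx,
    where x plays the role of the smallest observation.
    """
    if t <= 0.0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += STD.pdf(x) * (STD.cdf(x + t) - STD.cdf(x)) ** (n - 1) * h
    return n * total


if __name__ == "__main__":
    # For n = 2 the range is |X1 - X2|, so a closed form exists
    # and can be compared against the numerical result.
    closed_form = 2.0 * STD.cdf(1.0 / math.sqrt(2.0)) - 1.0
    print(range_cdf(1.0, 2), closed_form)
```

    The n = 2 case serves as a sanity check, since the range then reduces to the absolute difference of two normal draws.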

  • 1989-husler.pdf

  • 1923-kelley.pdf (backlinks)