Classical test theory

  • Jul 26, 2021

A test is a scientific instrument insofar as it measures what it claims to measure (it is valid) and measures it well (it is precise, or reliable). If we cannot trust the measurements an instrument provides because they vary from one occasion to the next when we measure the same object, we say that it is not reliable. To measure correctly, an instrument must be precise; if it is not, then whatever it measures, it measures badly. Precision is therefore a necessary but not a sufficient condition. The instrument must also be valid: what it measures with precision must be what it is intended to measure, and not something else.

Absolute and relative reliability: We can approach the problem of the reliability of a test in two different ways, although basically they coincide.

Reliability as the precision of its measurements: When a subject takes a test, he obtains an empirical score, which is affected by error. If there were no error, the subject would obtain his true score. The test is imprecise to the extent that the empirical score does not match the true score. The difference between the two scores is the measurement error. The standard error of measurement is the standard deviation of the measurement errors. It indicates the absolute precision of the test, since it allows us to estimate the difference between the measurement obtained and the one that would be obtained if there were no error.

Reliability as the stability of its measurements: A test is more reliable the more constant or stable its results remain when it is repeated. The more stable the results are on two occasions, the greater the correlation between them. This correlation is called the reliability coefficient. It tells us not the amount of error but the consistency of the test with itself and of the information it offers. The reliability coefficient expresses the relative reliability of the test.

The reliability coefficient and the reliability index:

  • The reliability coefficient of a test is the correlation of the test with itself, obtained, for example, from two parallel forms: $r_{xx'}$.
  • The reliability index (or precision index) is the correlation between the empirical scores of a test and its true scores: $r_{xv}$.

The reliability index is never smaller than the reliability coefficient, and it is strictly greater whenever the coefficient lies between 0 and 1. Three classical methods for finding the reliability coefficient stand out:

  • Correlation between the test and its repetition (the repetition or test-retest method): The same test is applied to the same group twice, and the correlation between the two series of scores is calculated. This correlation is the reliability coefficient. This method usually gives a higher reliability coefficient than other procedures and may be contaminated by disturbing factors.
  • Correlation between two parallel forms of the test (the parallel forms method): Two parallel forms of the same test are prepared, that is, two equivalent forms that give the same information, and both are applied to the same group of subjects. The correlation between the two forms is the reliability coefficient. Because the same test is not repeated, the disturbing factors of the test-retest method are avoided.
  • Correlation between two parallel halves of the test (the two-halves or split-half method): The test is divided into two equivalent halves and the correlation between them is found. It is the preferred method, since it is simple and avoids the limitations of the previous procedures. For example, the odd-numbered elements of the test can form one half and the even-numbered elements the other.
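
As an illustration of the two-halves method, here is a minimal Python sketch that splits a test into odd and even elements and correlates the two half scores. The data set `scores` and the function names `pearson` and `split_half_correlation` are hypothetical, introduced only for this example.

```python
from math import sqrt

# Hypothetical data: one row per subject, one column per element (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_correlation(item_scores):
    """Correlate the odd-element half score with the even-element half score."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # elements 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # elements 2, 4, 6, ...
    return pearson(odd_half, even_half)


r_halves = split_half_correlation(scores)
print(round(r_halves, 3))
```

The same `pearson` function would serve for the other two methods: pass it the two series of scores from the test and its repetition, or from the two parallel forms.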

The reliability coefficient and the correlation between parallel tests

The reliability coefficient of a test indicates the proportion of the empirical variance that is true variance: $r_{xx'} = \sigma_v^2 / \sigma_x^2$, where $\sigma_v^2$ is the variance of the true scores and $\sigma_x^2$ is the variance of the empirical scores. The reliability coefficient varies between 0 and 1. For example, if the correlation between two parallel tests is $r_{xx'} = 0.80$, then 80% of the variance of the test is due to the true measure and the remaining 20% is due to error.

The reliability index of a test is the correlation between its empirical scores and its true scores. It is equal to the square root of the reliability coefficient: $r_{xv} = \sqrt{r_{xx'}}$.
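
The 80% example can be checked with a few lines of Python. This is only a sketch: it assumes the usual classical decomposition in which the empirical variance is the sum of the true variance and the error variance, and the function names and the variance values 80 and 20 are made up for illustration.

```python
from math import sqrt


def reliability_coefficient(true_variance, error_variance):
    """Proportion of the empirical variance that is true variance.

    Assumes empirical variance = true variance + error variance.
    """
    empirical_variance = true_variance + error_variance
    return true_variance / empirical_variance


def reliability_index(coefficient):
    """Correlation between empirical and true scores."""
    return sqrt(coefficient)


r_xx = reliability_coefficient(true_variance=80.0, error_variance=20.0)
r_xv = reliability_index(r_xx)
print(round(r_xx, 2), round(r_xv, 3))  # 0.8 0.894
```

Since the coefficient lies between 0 and 1, its square root is never smaller than the coefficient itself, which is why the index here (about 0.894) exceeds the coefficient (0.80).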

Once two parallel forms of a test have been developed, analysis of variance is applied to check the homogeneity of the variances and the difference between the means. If the variances are homogeneous, the difference between the means is not significant, and the two forms are built from the same number of elements of the same type and psychological content, the forms can be considered parallel. If not, they must be revised until they are. A complete lack of reliability corresponds to the value $r_{xx'} = 0$.

The standard error of measurement: The difference between the empirical score and the true score is a random error, called the measurement error. The standard deviation of the measurement errors is called the standard error of measurement. It allows us to estimate the absolute reliability of the test, that is, how much measurement error affects a score.
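
The text does not give a formula for the standard error of measurement. A common classical expression, assumed here, is the standard deviation of the empirical scores multiplied by the square root of one minus the reliability coefficient. The sketch below applies it to made-up values (a standard deviation of 10 and a coefficient of 0.80); the function name is hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Standard deviation of the measurement errors.

    Assumed formula: score_sd * sqrt(1 - reliability), which follows when the
    error variance is the share of empirical variance not explained by true
    scores.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * sqrt(1.0 - reliability)


sem = standard_error_of_measurement(score_sd=10.0, reliability=0.80)
print(round(sem, 2))  # 4.47
```

Under this formula a perfectly reliable test has a standard error of 0, and a test with a coefficient of 0 has a standard error equal to the standard deviation of its scores.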

Reliability and length: The length of a test is the number of its elements, and its reliability depends on this length. If a test consists of three elements, a subject may obtain a score of 1 on one occasion and a score of 2 on another occasion or on a parallel form.

From one occasion to the other, the score has varied by one point; one point out of three is a variation of 33%, which is high. If subjects show accidental variations of this kind, the correlation of the test with itself, or between its two parallel forms, will be greatly lowered and cannot be high. If the test is much longer, with 100 elements for example, a subject may obtain 70 points on one occasion and 67 on a parallel form. The score has varied by 3 points, a small variation relative to the whole test: 3%. Accidental changes of this magnitude in subjects' scores from one parallel form to the other are relatively unimportant and will not lower the correlation between the forms as much as before.

The reliability coefficient will therefore be much higher than in the previous case. The Spearman-Brown equation expresses the relationship between reliability and length. The precision of a test is zero when the length is 0, and it increases as the length increases, although the increase becomes relatively smaller as the test gets longer. In other words, precision grows quickly at first and more slowly afterwards. When the length tends to infinity, the reliability coefficient tends to 1.
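
The Spearman-Brown relationship can be sketched in Python as follows. In its usual form, the reliability of a test lengthened n times is n·r / (1 + (n − 1)·r), where r is the reliability of the original test. The function name and the starting value r = 0.50 are chosen only for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when the test length is multiplied by length_factor.

    Assumes 0 <= reliability < 1 and that the added elements are parallel to
    the existing ones.
    """
    n = length_factor
    return n * reliability / (1 + (n - 1) * reliability)


base_reliability = 0.50  # hypothetical reliability of the original test

# Reliability rises quickly at first and then levels off towards 1.
for factor in (0, 1, 2, 4, 8, 100):
    print(factor, round(spearman_brown(base_reliability, factor), 3))

doubled = spearman_brown(base_reliability, 2)  # 2 * 0.5 / 1.5, about 0.667
```

With a factor of 2, the same function gives the customary correction that turns a correlation between two halves into an estimate for the full-length test.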

As the length of a test increases, its precision increases because the true variance grows at a higher rate than the error variance. In other words, the precision of the test increases because the proportion of variance due to error decreases. The formulas of Rulon and of Flanagan and Guttman are used to calculate the reliability coefficient and are especially applicable with the two-halves method.
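
The text names these formulas without stating them. In their commonly cited forms, assumed here, Rulon's formula is 1 minus the variance of the differences between the two half scores divided by the variance of the total scores, and the Flanagan-Guttman formula is 2 × (1 minus the sum of the two half variances divided by the total variance). The sketch below uses made-up half scores and hypothetical function names.

```python
from statistics import pvariance

# Hypothetical half scores for six subjects (odd elements and even elements).
odd_half = [3, 3, 2, 1, 1, 0]
even_half = [3, 2, 1, 1, 0, 0]


def rulon(half_a, half_b):
    """Rulon: 1 - variance of half differences / variance of total scores."""
    differences = [a - b for a, b in zip(half_a, half_b)]
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 1 - pvariance(differences) / pvariance(totals)


def flanagan_guttman(half_a, half_b):
    """Flanagan-Guttman: 2 * (1 - sum of half variances / total variance)."""
    totals = [a + b for a, b in zip(half_a, half_b)]
    half_variances = pvariance(half_a) + pvariance(half_b)
    return 2 * (1 - half_variances / pvariance(totals))


r_rulon = rulon(odd_half, even_half)
r_fg = flanagan_guttman(odd_half, even_half)
print(round(r_rulon, 3), round(r_fg, 3))
```

The two expressions are algebraically equivalent, so both calls print the same value. Unlike the Spearman-Brown correction, they do not require the two halves to have equal variances.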

Reliability and consistency: The reliability coefficient can also be found in another way, through the alpha coefficient, also called the coefficient of generalizability or representativeness (Cronbach). The alpha coefficient indicates how accurately a set of items measures an aspect of personality or behavior. It can be interpreted as:

  • An estimate of the mean correlation among all possible items for a given aspect.
  • A measure of the precision of the test based on its coherence or internal consistency (the interrelation between its elements, that is, the extent to which the test items all measure the same thing) and on its length.
  • An indicator of the representativeness of the test, that is, the degree to which the sample of items that composes it is representative of the population of possible items of the same type and psychological content.

The alpha coefficient mainly reflects two basic concepts in the precision of a test:

1. The interrelation between its elements: the extent to which they all measure the same thing well.
2. The length of the test: just as increasing the number of cases in a sample, once systematic errors are eliminated, makes the sample represent its population better and reduces chance error, so does increasing the number of items in a test.

If the test items are dichotomous (yes or no, 1 or 0, agree or disagree, etc.), the equation of the alpha coefficient simplifies and gives rise to the Kuder-Richardson equations (KR20 and KR21). For a given number of items, a test is more reliable the more homogeneous it is. The alpha coefficient expresses reliability insofar as it represents the homogeneity, coherence, or internal consistency of the elements of a test.
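
A minimal Python sketch of the alpha coefficient and of KR20 follows. It assumes the usual formulas: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and KR20 as the same expression with each item variance written as p × q, the proportions of 1s and 0s on that item. The data and function names are hypothetical.

```python
from statistics import pvariance

# Hypothetical dichotomous data: one row per subject, one column per item.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def cronbach_alpha(item_scores):
    """Alpha from item variances and the variance of the total scores."""
    k = len(item_scores[0])
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kr20(item_scores):
    """KR20 for 0/1 items: each item variance is p * q."""
    n = len(item_scores)
    k = len(item_scores[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion scoring 1
        sum_pq += p * (1 - p)
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_pq / total_variance)


alpha = cronbach_alpha(scores)
kr = kr20(scores)
print(round(alpha, 3), round(kr, 3))
```

Because population variances are used throughout, the variance of a 0/1 item equals p × q exactly, so the two functions agree on dichotomous data. `cronbach_alpha` also accepts items scored on wider scales.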

According to the item sample space model, the objective of the test is to estimate the measurement that would be obtained if all the items in the sample space were used. This measurement would be the true score, which the actual measurements approximate more or less closely. The more closely the scores on a sample of items correlate with the true scores, the more reliable the test is. Central to this model is the matrix of correlations between all the items in the sample space. This sampling model insists more directly on internal consistency, and to the extent that it achieves it, it indirectly guarantees stability.

The linear model of parallel tests insists more on the stability of the scores, and to the extent that it achieves stability, it indirectly favors internal consistency. If a test is used to establish individual diagnoses and prognoses, the reliability coefficient should be 0.90 or higher. For collective forecasts and classifications the requirement is less strict, although it is not advisable to fall far below the range of 0.80 to 0.90.

In certain kinds of tests, such as personality tests, it is sometimes difficult to achieve coefficients above 0.70. If the parallel forms, or parallel halves, are applied after a more or less long interval, the chance errors may be more numerous than those affecting the alpha coefficient. This is because the correlation is lowered not only by the random errors intrinsic to the test on a single occasion, which are the ones the alpha coefficient takes into account, but also by all the errors that arise from the two different situations, which can differ in many details. Therefore, the alpha coefficient is usually higher than the other coefficients.

The exception is the coefficient found by repeating the same test: there is a greater probability that the random errors of the first application are repeated in the second, and instead of lowering the correlation between the two, they raise it. Care must be taken that the second application is completely independent of the first. If this is achieved, the repetition method is the easiest and cheapest one, and it is advisable when the aim is to assess the stability of the scores, especially over long periods of time and with complex tests.
