Test theories: CTT and IRT

Tests are used in psychology as measuring instruments that bring us a little closer to a concept. Just as we use the meter to measure length, could we use a test to measure intelligence, memory, or attention? One difference between the two is that tests are neither as easy to build nor as easy to apply.

Moreover, just as a single measurement does not allow us to talk about the volume of an object, the administration of a single test does not allow us to diagnose or propose an intervention. Tests are therefore important for evaluation, but not decisive.

  • This is where, one way or another, the psychologist plays the most important role.
  • The psychologist must use the information obtained from the test and from other sources to shape a coherent assessment that leads to intervention planning.

That is, it is in the integration of results from different sources that the quality of the professional shows most. This is a skill acquired with knowledge, but also with years of experience.

The origin of tests is often traced to the examinations held by Chinese emperors around 3000 BC, which were designed to assess the professional competence of the officials who would be serving them. (1)

Modern tests have a more recent origin in the tests carried out by Galton (1822-1911) in his laboratory; however, it was James Cattell who first used the term "mental test", in 1890.

Because these early tests were not highly predictive of human cognitive ability, researchers such as Binet and Simon (1905) introduced cognitive tasks into their new scale to assess aspects such as judgment, understanding, and reasoning.

Binet’s scale marked the beginning of a tradition of individual scales. In addition to cognitive testing, there have been significant advances in personality testing.

In view of all these advances, measurement theories (test theories), which bear directly on tests as instruments, began to develop.

Out of the concern to build instruments that measure what we want them to measure, and that do so with as little error as possible, psychometrics emerged: a discipline that requires any test or measuring instrument to be valid and reliable.

Recall that reliability means the stability or consistency of measurements when the measurement process is repeated. That is, a test is more reliable the more closely its results are replicated when it measures two subjects who have the same level of the measured trait, or the same subject at different times.

On the other hand, validity refers to the extent to which empirical evidence and theory support the interpretation of test results. (2)

Thus, there are two major test theories, or approaches to analyzing and constructing such instruments: classical test theory (CTT) and item response theory (IRT).

Classical test theory is the dominant theory in the construction and analysis of tests. It is relatively easy to create tests that meet the minimum requirements of this paradigm, and it is also relatively easy to evaluate a test on the parameters described above: reliability and validity.

It originated in Spearman's work in the early 20th century. Then, in 1968, Lord and Novick reformulated the theory and paved the way for the new IRT approach.

This theory is based on the classical linear model proposed by Spearman, which assumes that the score a person obtains on a test, called the empirical score and usually denoted by the letter X, consists of two components. (2)

On the one hand, there is the subject's true score on the test (V) and, on the other, the error (e). The model is written as follows: X = V + e.

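Spearman adds three assumptions to this model: the true score is the expected value of the empirical score, V = E(X); true scores are uncorrelated with errors; and the errors on one test are uncorrelated with the errors on another test.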

To derive the rest of the theory, Spearman defined parallel tests as those that measure the same thing but with different items.
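The linear model and the idea of parallel tests can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal distributions, the sample size, the means and standard deviations, and the names `pearson` and `simulate_parallel_forms` are hypothetical choices made for illustration, not part of Spearman's formulation. It relies on the standard classical result that the correlation between two parallel forms estimates the reliability coefficient, that is, the ratio of true-score variance to empirical-score variance.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate_parallel_forms(n_people=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate X = V + e for two parallel forms of a hypothetical test."""
    rng = random.Random(seed)
    # True scores V (assumed normal, mean 50; an illustrative choice)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Parallel forms: same true score, independent errors with equal variance
    form_a = [v + rng.gauss(0.0, error_sd) for v in true_scores]
    form_b = [v + rng.gauss(0.0, error_sd) for v in true_scores]
    return true_scores, form_a, form_b


true_scores, form_a, form_b = simulate_parallel_forms()

# Reliability estimated as the correlation between the two parallel forms
empirical_reliability = pearson(form_a, form_b)

# Reliability implied by the model: var(V) / (var(V) + var(e)) = 100 / 125
theoretical_reliability = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)

print(round(empirical_reliability, 2), theoretical_reliability)
```

With a large simulated sample, the correlation between the two forms should land close to the value implied by the chosen variances; a smaller error variance would push it toward 1.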

The first limitation of classical test theory is that its measurements are not invariant with respect to the instrument used. That is, if a psychologist evaluates the intelligence of three people with a different test for each, the results will not be directly comparable. Why does this happen?

The results of the three measuring instruments are not on the same scale: each test has its own. To compare, for example, the intelligence of several people who have been evaluated with different intelligence tests, the raw scores obtained from each test must first be transformed to a common scale.
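As a rough illustration of such a transformation, the sketch below converts raw scores from two tests to a common standard scale using each test's norm-group mean and standard deviation. The norm values, the raw scores, and the function names `z_score` and `to_scale` are hypothetical; the target scale with mean 50 and standard deviation 10 is the conventional T-score scale, chosen here only as an example.

```python
def z_score(raw, norm_mean, norm_sd):
    """Standardize a raw score against its norm group."""
    return (raw - norm_mean) / norm_sd


def to_scale(raw, norm_mean, norm_sd, scale_mean=50.0, scale_sd=10.0):
    """Re-express a raw score on a common scale (T scores by default)."""
    return scale_mean + scale_sd * z_score(raw, norm_mean, norm_sd)


# Hypothetical norms for two different intelligence tests
test_a = {"norm_mean": 30.0, "norm_sd": 6.0}
test_b = {"norm_mean": 100.0, "norm_sd": 15.0}

# Two people, each assessed with a different test
person_1 = to_scale(36.0, **test_a)    # one SD above test A's norm mean
person_2 = to_scale(115.0, **test_b)   # one SD above test B's norm mean

print(person_1, person_2)  # 60.0 60.0
```

Both people end up with the same transformed score, but that equality is only meaningful if the two norm groups are themselves comparable.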

The problem is that, when converting scores to a common scale, it is assumed that the normative groups on which the scales of the different tests were built are comparable (same mean, same standard deviation), which is difficult to guarantee in practice. (1)

The IRT approach was therefore a big step forward in this respect: IRT ensures that the results obtained with different instruments are on the same scale.

The second limitation of this approach is the lack of invariance of test properties with respect to the people used to estimate them: in classical test theory (CTT), important psychometric properties of a test depend on the type of sample used to calculate them. This problem also finds a solution, at least a partial one, in the IRT approach.
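One way to see this sample dependence is through the classical reliability coefficient, which under the linear model equals the true-score variance divided by the sum of true-score and error variance. The sketch below applies that ratio to two hypothetical samples that take the same test with the same error variance; the variance values and the function name `reliability` are assumptions made for illustration.

```python
def reliability(true_var, error_var):
    """Classical reliability coefficient: var(V) / (var(V) + var(e))."""
    return true_var / (true_var + error_var)


# Same test, hence the same (hypothetical) error variance in both samples
error_var = 9.0

# A heterogeneous sample has widely spread true scores
heterogeneous = reliability(true_var=100.0, error_var=error_var)

# A homogeneous sample (e.g. a preselected group) has little spread
homogeneous = reliability(true_var=16.0, error_var=error_var)

print(round(heterogeneous, 3), round(homogeneous, 3))  # 0.917 0.64
```

Nothing about the instrument has changed between the two computations; only the variability of the people measured has, and the reliability estimate changes with it.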

Item response theory (IRT) was born as a complement to classical test theory. That is, CTT and IRT can both be used to evaluate the same test and to assign a score or weight to each item, which in turn may give a different result for each person.

On the other hand, although IRT gives us a much better calibrated instrument, this paradigm has a much higher cost and requires the participation of specialized professionals.

IRT makes several assumptions, but perhaps the most important is this: there is a functional relationship between the values of the variable that the items measure and the probability of answering them correctly. This function is called the item characteristic curve (ICC). So, what do we assume?

Well, something that from the outside may seem very logical and that CTT does not build into its model: for example, that the most difficult items are answered correctly only by the smartest people, whereas a question that everyone answers correctly would be of little value, because it would have no power to discriminate; that is, it would provide no information. This is just a small glimpse of the revolution proposed by IRT.
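To make the idea of an item characteristic curve more concrete, here is a minimal sketch that assumes one common functional form, the two-parameter logistic model, which the text above does not specify. The parameter names `a` (discrimination) and `b` (difficulty), the ability values `theta`, and the two example items are all hypothetical. The information function used is the standard one for this model, a² · P · (1 − P).

```python
import math


def icc(theta, a=1.0, b=0.0):
    """Probability of a correct answer at ability theta (2PL model, assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a=1.0, b=0.0):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = icc(theta, a, b)
    return a * a * p * (1.0 - p)


# A difficult item and an item that practically everyone answers correctly
hard_item = {"a": 1.5, "b": 2.0}
trivial_item = {"a": 1.5, "b": -6.0}

for theta in (-1.0, 0.0, 2.0):
    print(
        theta,
        round(icc(theta, **hard_item), 3),
        round(icc(theta, **trivial_item), 3),
        round(item_information(theta, **trivial_item), 4),
    )
```

Under these assumptions, the hard item is answered correctly with probability 0.5 only by people whose ability equals its difficulty, while the trivial item is answered correctly by almost everyone in the range shown and therefore contributes almost no information there.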

To see the differences between one measurement model and the other a little better, we can refer to the comparison table in José Muñiz (2010).

This is how the two test theories relate to each other. Although the two are almost contemporaries, it seems clear that IRT was born in response to the limitations or problems that classical test theory can present. It is equally clear, however, that research in psychometrics still has a long way to go.
