Measures of dispersion in statistics

In a data distribution, dispersion measures play a very important role: they complement the measures of central position by characterizing the variability of the data.

Measures of central tendency indicate the values around which the data appear to be grouped, and they are recommended for inferring the behavior of variables in populations and samples. Examples include the arithmetic mean, the mode, and the median (1).

  • Dispersion measures complement these measures of central tendency.
  • In addition, they are essential in describing a data distribution.
  • In fact, they characterize the variability of the data.
  • Their relevance to statistical training was raised by Wild and Pfannkuch (1999).

The perception of the variability of the data is one of the basic components of statistical thinking. This variability tells us how dispersed the data are relative to an average.

Arithmetic means are widely used in practice but can often be misinterpreted. This occurs when the values of the variable are very dispersed; on these occasions the mean must be accompanied by dispersion measures (2).
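A minimal sketch of why a mean on its own can mislead. Both samples below are invented for illustration; they share the same mean even though one is tightly grouped and the other is not.

```python
from statistics import mean

# Two hypothetical samples, invented for this illustration
tight = [49, 50, 50, 51]
spread = [0, 10, 90, 100]

mean_tight = mean(tight)
mean_spread = mean(spread)

# Largest distance of any value from its own mean
max_gap_tight = max(abs(x - mean_tight) for x in tight)
max_gap_spread = max(abs(x - mean_spread) for x in spread)

print(mean_tight, mean_spread)        # both means are 50
print(max_gap_tight, max_gap_spread)  # 1 versus 50
```

Reporting only the value 50 would hide the fact that no observation in the second sample is anywhere near it.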

In dispersion measures, there are three important components related to random variability (2).

In a statistical study, when generalizing from a sample to a population, dispersion measures are very important because they directly affect the error we work with: the more dispersion we find in a sample, the larger the sample size we need in order to work with the same error.
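The link between dispersion and sample size can be sketched with the usual textbook formula for estimating a mean, n = (z·σ / E)². This assumes a normal approximation and a known standard deviation σ, neither of which the text specifies; the function name and the numbers are hypothetical.

```python
import math

def required_sample_size(sigma, margin, z=1.96):
    """Smallest n such that z * sigma / sqrt(n) <= margin.

    Assumes a normal approximation; z=1.96 corresponds to about 95% confidence.
    """
    return math.ceil((z * sigma / margin) ** 2)

# Same margin of error, two hypothetical levels of dispersion
n_low = required_sample_size(sigma=5, margin=1)
n_high = required_sample_size(sigma=10, margin=1)

print(n_low, n_high)  # 97 and 385
```

Doubling the standard deviation roughly quadruples the sample needed for the same error.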

These measures also help us determine whether our data deviate too far from the central value, and therefore tell us whether this central value is appropriate to represent the population studied.

These measures are very useful for comparing distributions and understanding risks in decision-making (1): the greater the dispersion, the less representative the central value. The most commonly used are:

First, the range is recommended for a preliminary comparison. It considers only the two extreme observations, so it is recommended only for small samples (1). It is defined as the difference between the last and the first value of the variable once the values are sorted, that is, the maximum minus the minimum (3).
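As a small sketch (the data and the helper name `value_range` are invented), the range takes one line to compute, and a single extreme observation is enough to change it completely:

```python
def value_range(values):
    """Difference between the largest and the smallest observation."""
    return max(values) - min(values)

data = [4, 8, 15, 16, 23, 42]   # hypothetical sample
with_outlier = data + [400]     # same sample plus one extreme value

print(value_range(data))          # 38
print(value_range(with_outlier))  # 396
```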

In turn, the mean deviation indicates where the data would be concentrated if they were all at the same distance from the arithmetic mean (1). The deviation of a value of the variable is the absolute difference between that value and the arithmetic mean of the series; the mean deviation is therefore the arithmetic mean of these deviations (3).
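The definition above can be sketched directly in Python. The function name and the sample values are assumptions made for the example.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    centre = mean(values)
    return mean(abs(x - centre) for x in values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

mad = mean_absolute_deviation(data)
print(mad)  # 1.5
```

Here the absolute deviations are 3, 1, 1, 1, 0, 0, 2 and 4, which sum to 12 over 8 values.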

Variance is an algebraic function of all the values, suited to inferential statistical tasks (1). It can be defined as the mean of the squared deviations from the arithmetic mean.
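A brief sketch of that definition, checked against Python's standard `statistics` module. The text does not say whether it means the population form (divide by n) or the sample form (divide by n - 1), so both are shown; the data are invented.

```python
from statistics import pvariance, variance

def population_variance(values):
    """Mean of the squared deviations from the arithmetic mean (divides by n)."""
    centre = sum(values) / len(values)
    return sum((x - centre) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

by_hand = population_variance(data)  # 32 / 8
library = pvariance(data)            # same definition from the standard library
sample = variance(data)              # divides by n - 1, i.e. 32 / 7

print(by_hand, library, round(sample, 3))
```

The n - 1 version is the one normally used when the data are a sample and the goal is to estimate the population variance.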

For samples drawn from the same population, the standard deviation is the most widely used measure (1). It is the square root of the variance (3).
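One practical consequence of taking the square root is that the standard deviation is expressed in the same units as the data, whereas the variance is in squared units. A short sketch with invented heights, using the population forms from the standard library:

```python
import math
from statistics import pstdev, pvariance

heights_cm = [160, 165, 170, 175, 180]  # hypothetical heights in centimetres

var_cm2 = pvariance(heights_cm)  # in squared centimetres
sd_cm = pstdev(heights_cm)       # back in centimetres

print(var_cm2, round(sd_cm, 2))
print(math.isclose(sd_cm, math.sqrt(var_cm2)))  # True
```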

The coefficient of variation is mainly used to compare the variability of two data sets measured in different units, for example the height and body weight of students in a sample. It is used to determine in which distribution the data are more tightly grouped and the mean is more representative (1).

It is a more representative dispersion measure than the previous ones because it is a dimensionless number, that is, it is independent of the units in which the values of the variable are expressed. It is generally expressed as a percentage (3).
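The height and weight comparison can be sketched as follows. The coefficient is computed here as the standard deviation divided by the mean, times 100; using the population standard deviation is an assumption, and all values are invented.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (mean must be nonzero)."""
    return 100 * pstdev(values) / mean(values)

heights_cm = [160, 165, 170, 175, 180]     # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]          # hypothetical weights
heights_m = [h / 100 for h in heights_cm]  # same heights in metres

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
cv_height_m = coefficient_of_variation(heights_m)

print(round(cv_height, 1), round(cv_weight, 1))  # 4.2 20.2
```

In this made-up sample the weights vary far more, relative to their mean, than the heights do, and converting the heights to metres leaves their coefficient unchanged apart from floating-point rounding.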

Thus, these dispersion measures indicate, on the one hand, the degree of variability in the sample and, on the other, the representativeness of the central value: if a small value is obtained, the values are concentrated around that center.

This means that there is little variability in the data and that the center represents them all appropriately. Conversely, if the value obtained is high, the values are not concentrated but scattered; there is a lot of variability and the center will not be very representative. Moreover, to make inferences we will need a larger sample size if we want to reduce the error, which increases with greater variability.
