exploRations
Statistical tests for ordinal variables

This tutorial is the third in a series of four. This part shows you how to apply and interpret the tests for ordinal and interval variables. This link will get you back to the first part of the series.

Descriptive

Sometimes you just want to describe one variable. Although these descriptions don’t need statistical tests, I’ll describe them here since they should be part of interpreting the statistical test results. Statistical tests tell you whether variables change; descriptions of their distributions tell you in what direction they change.

Ordinal median

The median, the value or quantity lying at the midpoint of a frequency distribution, is the appropriate central tendency measure for ordinal variables. Ordinal variables are implemented in R as ordered factor variables. Strangely enough the standard R median function doesn’t support ordered factors, so here’s a function you can use to compute it:

median_ordinal <- function(x) {
  d <- table(x)                  # frequency of each level
  cfd <- cumsum(d / sum(d))      # cumulative relative frequencies
  idx <- min(which(cfd >= .5))   # first level at or past the 50% mark
  return(levels(x)[idx])
}

You can use this function on the cut variable of the diamonds data set from the ggplot2 library, which has the following distribution:

cut          n       prop   cum_prop
Fair          1,610    3%      3%
Good          4,906    9%     12%
Very Good    12,082   22%     34%
Premium      13,791   26%     60%
Ideal        21,551   40%    100%
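
For reference, a summary like the one above can be built with dplyr; a minimal sketch (the proportions come out as fractions here rather than the rounded percentages shown above):

library(ggplot2)   # for the diamonds data set
library(dplyr)

diamonds %>%
  count(cut) %>%                    # frequency of each cut level
  mutate(prop = n / sum(n),         # relative frequency
         cum_prop = cumsum(prop))   # cumulative relative frequency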

With this distribution I’d expect Premium to be the median cut. Our function call,

median_ordinal(diamonds$cut)

shows exactly that:

[1] "Premium"

Ordinal interquartile range

The interquartile range (IQR) is implemented in base R’s IQR function, which, like its median counterpart, does not work with ordered factor variables. So again we’re on our own and have to create a function for this:

IQR_ordinal <- function(x) {
  d <- table(x)                                              # frequency of each level
  cfd <- cumsum(d / sum(d))                                  # cumulative relative frequencies
  idx <- c(min(which(cfd >= .25)), min(which(cfd >= .75)))   # levels at the 25% and 75% marks
  return(levels(x)[idx])
}
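
Applying this function to the cut variable of the diamonds data set:

IQR_ordinal(diamonds$cut)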

This returns, as expected:

[1] "Very Good" "Ideal" 

One sample

One sample tests are done when you want to find out whether your measurements differ from some kind of theoretical distribution. For example, you might want to find out whether a die is fair: you’d expect it to land on each of the numbers 1 to 6 about 1/6th of the time.

Wilcoxon one sample test

With the Wilcoxon one sample test, you test whether your ordinal data fits a hypothetical distribution you’d expect. In this example we’ll examine the diamonds data set included in the ggplot2 library. We’ll test the hypothesis that the diamond cut quality is centered around the middle value “Very Good” (our null hypothesis). First let’s look at the factor levels of the cut quality:

library(ggplot2)
levels(diamonds$cut)

Output:

[1] "Fair"      "Good"      "Very Good" "Premium"   "Ideal"

The R function for the Wilcoxon one-sample test doesn’t take non-numeric variables, so we have to convert the cut vector to numeric and ‘translate’ our null hypothesis to a number as well. As we can derive from the levels function’s output, the value “Very Good” corresponds to the number 3. We pass this to the wilcox.test function like this:

wilcox.test(as.numeric(diamonds$cut), 
            mu=3,
            conf.int=TRUE ) 

Output:

	Wilcoxon signed rank test with continuity correction

data:  as.numeric(diamonds$cut)
V = 781450000, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 3
95 percent confidence interval:
 4.499983 4.499995
sample estimates:
(pseudo)median 
      4.499967

We can see that our null hypothesis doesn’t hold: the diamond cut quality doesn’t center around “Very Good”. Somewhat nonsensically we also passed TRUE to the conf.int argument, but this also gave us the pseudomedian, so we can interpret what the cut quality does center around: “Premium” (the level counterpart of the numeric value 4.499967).
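
Coming back to the die example from the start of this section, here is a minimal sketch of the same test on simulated throws (the seed and the number of throws are arbitrary choices):

set.seed(1)                                      # arbitrary seed, for reproducibility
throws <- sample(1:6, 200, replace = TRUE)       # 200 throws of a fair die
wilcox.test(throws, mu = 3.5, conf.int = TRUE)   # H0: the throws center around 3.5

For a fair die you would expect the test to fail to reject this null hypothesis most of the time.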

Two unrelated samples

Two unrelated sample tests can be used for analysing marketing tests; you apply some kind of marketing voodoo to two different groups of prospects/customers and you want to know which method was best.

Mann-Whitney U test

The Mann–Whitney U test can be used to test whether two sets of unrelated samples are equally distributed. Let’s take our trusted mtcars data set: we can test whether automatic and manual transmission cars differ in gas mileage. For this we use the wilcox.test function:

wilcox.test(mpg ~ am, data=mtcars)

Output:

        Wilcoxon rank sum test with continuity correction 
 
data:  mpg by am 
W = 42, p-value = 0.001871 
alternative hypothesis: true location shift is not equal to 0 

Since the p-value is well below 0.05, we can assume automatic and manual transmission cars differ in gas mileage. This plot shows us the differences:

library(dplyr)

mtcars %>% 
  mutate(am = ifelse(am == 1, "Manual", "Automatic")) %>%   # in mtcars am = 1 means manual transmission
  ggplot() +
    geom_density(aes(x = mpg, fill = am), alpha = 0.8)

[Figure: density plot of mpg for automatic and manual transmission cars]

The plot shows why the null hypothesis doesn’t hold: the automatic transmission cars are likely to have a lower gas mileage than the manual ones.

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test answers much the same question as the Mann-Whitney U test (whether two samples come from the same distribution), but has a much cooler name; the only reason I included this test here. And I don’t like name dropping, but drop this name in certain circles: it will result in snickers of approval for obvious reasons. The ks.test function call isn’t as elegant as the wilcox.test call, but still does the job:

automatic <- (mtcars %>% filter(am == 0))$mpg   # am = 0 means automatic transmission
manual <- (mtcars %>% filter(am == 1))$mpg      # am = 1 means manual transmission
ks.test(automatic, manual)

Output:

	Two-sample Kolmogorov-Smirnov test

data:  automatic and manual
D = 0.63563, p-value = 0.003911
alternative hypothesis: two-sided

Not surprisingly, automatic and manual transmission cars aren’t equal in gas mileage, but the p-value is higher than the one the Mann-Whitney U test gave us. This suggests that, at least in this case, the Kolmogorov-Smirnov test is a tad more conservative than its Mann-Whitney U counterpart.

Two related samples

Related samples tests are used to determine whether there are differences before and after some kind of treatment. They are also useful when verifying the predictions of machine learning algorithms.

Wilcoxon Signed-Rank Test

The Wilcoxon Signed-Rank Test is used to see whether observations changed direction between two sets of ordinal variables. It’s useful, for example, when comparing the results of questionnaires with ordered scales for the same person at two points in time.
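
None of the data sets used so far contain a natural before/after pairing, so here is a minimal sketch with made-up questionnaire scores; the respondents, the scores and the 1-to-5 coding are all hypothetical:

# Hypothetical ordinal answers (1 = "very poor" ... 5 = "very good"),
# already converted to numeric, for the same ten respondents at two moments
before <- c(2, 3, 1, 4, 2, 3, 2, 4, 3, 2)
after  <- c(3, 4, 2, 5, 3, 5, 3, 5, 4, 1)

wilcox.test(after, before, paired = TRUE)   # H0: no shift between the two moments

A small p-value would indicate that the answers shifted between the two moments.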

Association between 2 variables

Tests of association determine the strength of the relation between two variables. They can be used if you want to know whether there is any relation between the amount a customer spends and the number of orders that customer has already placed.

Spearman Rank Correlation

The Spearman rank correlation is a test of association for ordinal or interval variables. In this example we’ll see whether vocabulary is related to education, using the Vocab data set from the car library. The cor.test function can be used to see if they are related:

library(car)

cor.test( ~ education + vocabulary, 
          data=Vocab,
          method = "spearman",
          continuity = FALSE,
          conf.level = 0.95)

Output:

	Spearman's rank correlation rho

data:  education and vocabulary
S = 8.5074e+11, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4961558 

It looks like there is a relation between education and vocabulary. The plot below shows that this relation is quite a positive one.

ggplot(Vocab, aes(x = education, y = vocabulary)) +
  geom_jitter(alpha = 0.1) +
  stat_ellipse(geom = "polygon", alpha = 0.4, size = 1)

[Figure: jittered scatter plot of vocabulary against education, with a data ellipse]

Association measures can be useful for more than one pair of variables at a time. For example, you might want to consider a range of variables from your data set for inclusion in your predictive model. Luckily the corrplot library contains the corrplot function to quickly visualise the relations between all variables in a data set. Let’s take the mtcars variables as an example:

library(corrplot)
mat_corr <- cor(mtcars, 
                method = "spearman", 
                use = "pairwise.complete.obs")

p_values <- cor.mtest(mtcars, 
                      method = "spearman", 
                      use = "pairwise.complete.obs")
corrplot(mat_corr, 
         order = "AOE", 
         type = "lower", 
         cl.pos = "b", 
         p.mat = p_values[["p"]], 
         sig.level = .05, 
         insig = "blank")

The matrix mat_corr contains all the Spearman rank correlation values, calculated with the cor function by passing it the whole mtcars data frame, the spearman method, and the instruction to use only the pairwise complete observations. Then the p-values for each of the correlations are calculated with corrplot’s cor.mtest function, using the same arguments. Both are used in the corrplot function, where the order, type and cl.pos arguments specify some of the layout, which I won’t go into further. The p.mat argument is the p-value matrix, which the sig.level and insig arguments use to leave out the correlations that aren’t significant at the 0.05 level.

[Figure: correlation plot of the mtcars variables]

In this plot, blue dots indicate positive correlations and red dots negative ones. The size of the dots and the intensity of the colour show how strong the association is; cells without a dot don’t have a significant correlation. The number of cylinders (cyl), for example, is strongly negatively correlated with the miles per gallon (mpg), while the horsepower (hp) is positively associated with the number of carburetors (carb), albeit not as strongly.

When you make a correlation matrix like this, be on the lookout for spurious correlations. Putting variables into your model indiscriminately, without reasoning, may lead to some unintended results…
