exploRations
Statistical tests

This tutorial is the first in a series of four. The goal of the series is to show you how to choose a statistical test and how to apply and interpret it. This first part tells you which test to choose; the other three parts are about applying and interpreting the tests for categorical, ordinal and Gaussian variables respectively.

Goal | Categorical | Ordinal | Gaussian
Descriptive | Proportion, Mode | Median, Interquartile Range | Central tendency, Distribution spread
1 Sample | Chi-square, Binomial test | Wilcoxon one-sample test | Normality, One-sample t-test
2 Unrelated samples | Chi-square | Mann-Whitney U test, Kolmogorov-Smirnov test | Unpaired t-test
2 Related samples | McNemar’s test | Wilcoxon Signed-Rank Test | Paired t-test
Association 2 variables | Contingency coefficients | Spearman rank correlation | Pearson correlation

Statistical tests allow us to draw conclusions about the distribution of a population, comparisons between populations or relations between variables. Statistical testing is about checking whether your data are consistent with the so-called null hypothesis, which I sometimes refer to as the ‘nothing to see here’ conclusion. General examples of null hypothesis statements are “there are no differences between groups”, “this didn’t show any effect” or “there is no relation between…”.

Statistical tests results: p-values

The outcome of a statistical test can be read from the p-value. The p-value is very powerful, because it incorporates effect size, sample size, and variability of the data into a single number that objectively tells you how consistent your data are with the null hypothesis. Low p-values indicate strong evidence against the null hypothesis, so they lead you to reject the null hypothesis; generally a p-value of 0.05 or less is taken as a significant deviation from the null hypothesis.
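As a rough illustration of how effect size and sample size end up in one number, the sketch below computes an exact two-sided binomial p-value with only the Python standard library. The function name `binomial_two_sided_p` and the coin-flip counts are made up for this example: the null hypothesis is a fair coin, and the observed share of heads is 60% each time, only the sample size differs.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value: the total probability, under the
    null hypothesis, of every outcome that is no more likely than the
    observed one."""
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)

    # Small relative tolerance so outcomes that tie with the observed one
    # are not dropped because of floating-point rounding.
    cutoff = pmf(successes) * (1 + 1e-7)
    total = sum(pmf(k) for k in range(trials + 1) if pmf(k) <= cutoff)
    return min(1.0, total)

# Hypothetical data: the same 60% heads at three sample sizes.
experiments = [(6, 10), (60, 100), (600, 1000)]
p_values = {trials: binomial_two_sided_p(heads, trials) for heads, trials in experiments}

for heads, trials in experiments:
    print(f"{heads} heads in {trials} flips: p = {p_values[trials]:.4g}")
```

The same 60% deviation is unremarkable in 10 flips and overwhelming in 1000, which is why a p-value should always be read together with the sample size and the size of the effect.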

There are many ways in which p-values are misinterpreted (see this blog for an in-depth discussion). Do not read the p-value as the percentage chance that you are wrong in rejecting the null hypothesis, which might tempt you into thinking a p-value of 0.1 is fine too: a p-value of 0.1 is not equal to a 10% chance of being wrong! In the table below you can see just how wrong this interpretation is. You might even prefer to be more stringent in your p-value choice after seeing this… Just remember: a p-value tells you how likely data at least as extreme as yours would be if the null hypothesis were true. A low value (p-value < 0.05) means your sample is hard to reconcile with the null hypothesis, a high value (p-value > 0.05) means it is not; neither tells you how likely this result is to hold in all other samples.

p-value | Probability of incorrectly rejecting a true null hypothesis
0.05 | At least 23% (and typically close to 50%)
0.01 | At least 7% (and typically close to 15%)

Choosing your statistical test

The exact test you use is determined by two things: the type of conclusion you want to reach and the level of measurement of your variable.

Types of conclusions

Mainly there are four groups of statistics we’ll be discussing: descriptives, statistical tests for one group, statistical tests for two samples and statistical tests describing how variables are associated.

Sometimes you just want to describe one variable. Although these types of descriptions don’t need statistical tests, I’ll describe them here since they should be a part of interpreting the statistical test results. Statistical tests tell you whether something changed, but descriptions of the distributions tell you in what direction it changed.
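A minimal sketch of the descriptives, using Python’s standard `statistics` module. The helper name `describe` and the order counts are invented for illustration; which of the numbers is meaningful depends on the level of measurement of your variable.

```python
import statistics

def describe(values):
    """Return the descriptives this tutorial pairs with each variable level."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mode": statistics.mode(values),      # categorical and up
        "median": statistics.median(values),  # ordinal and up
        "iqr": q3 - q1,                       # ordinal and up
        "mean": statistics.mean(values),      # Gaussian
        "sd": statistics.stdev(values),       # Gaussian (sample SD)
    }

# Hypothetical data: number of orders placed by nine customers.
orders_per_customer = [1, 1, 2, 2, 2, 3, 4, 7, 12]
summary = describe(orders_per_customer)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the mean sits well above the median because of two heavy buyers, which is the kind of direction information a test result alone does not give you.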

One sample tests are done when you want to find out whether your measurements differ from some kind of theoretical distribution. For example: you might want to find out whether your die gives the random results you’d expect from a fair die. In this case you’d expect each of the faces 1 to 6 to come up about 1/6th of the time.
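The die example can be sketched as a chi-square goodness-of-fit test. The roll counts below are invented, and instead of computing a p-value the sketch compares the statistic with 11.07, the standard chi-square table value for 5 degrees of freedom at the 0.05 level; a statistics package would normally give you the p-value directly.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Chi-square table value for 6 - 1 = 5 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.07

def die_looks_unfair(face_counts):
    """True if the counts deviate significantly (at 0.05) from a fair die."""
    total_rolls = sum(face_counts)
    expected = [total_rolls / 6] * 6  # a fair die: each face 1/6th of the time
    return chi_square_statistic(face_counts, expected) > CRITICAL_VALUE_DF5_ALPHA_05

# Hypothetical counts for faces 1..6 after 60 rolls each.
ordinary_die = [8, 9, 11, 10, 7, 15]
loaded_die = [5, 5, 5, 5, 5, 35]

print("ordinary die unfair?", die_looks_unfair(ordinary_die))
print("loaded die unfair?", die_looks_unfair(loaded_die))
```

Note that the ordinary die still shows 15 sixes against 10 expected; with only 60 rolls that is well within what chance produces.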

Two sample tests come in two flavors: unrelated and related samples. Unrelated sample tests can be used for analysing marketing tests; you apply some kind of marketing voodoo to two different groups of prospects/customers and you want to know which method was best. Related sample tests are used to determine whether there are differences before and after some kind of treatment. They are also useful when verifying the predictions of machine learning algorithms.
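As an example of a related-samples test, here is a sketch of the exact version of McNemar’s test, applied to two models that were scored on the same test cases. The function name `mcnemar_exact_p` and the results are hypothetical. Only the cases on which the models disagree carry information; under the null hypothesis each disagreement is equally likely to favor either model.

```python
from math import comb

def mcnemar_exact_p(only_first_correct, only_second_correct):
    """Exact two-sided McNemar p-value based on the discordant pairs only."""
    discordant = only_first_correct + only_second_correct
    if discordant == 0:
        return 1.0  # the two models never disagree
    smaller = min(only_first_correct, only_second_correct)
    # Binomial(discordant, 0.5) probability of a split at least this lopsided.
    tail = sum(comb(discordant, k) for k in range(smaller + 1)) / 2 ** discordant
    return min(1.0, 2 * tail)

# Hypothetical results for 100 test cases: True means the prediction was right.
model_a_correct = [True] * 70 + [True] * 15 + [False] * 5 + [False] * 10
model_b_correct = [True] * 70 + [False] * 15 + [True] * 5 + [False] * 10

only_a = sum(a and not b for a, b in zip(model_a_correct, model_b_correct))
only_b = sum(b and not a for a, b in zip(model_a_correct, model_b_correct))
p_value = mcnemar_exact_p(only_a, only_b)

print(f"A right / B wrong: {only_a}, B right / A wrong: {only_b}, p = {p_value:.4f}")
```

The pairing matters here: the 80 cases where both models agree are ignored, which is exactly what an unrelated-samples test would fail to do.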

Tests of association determine how strongly two variables move together. They can be used if you want to know whether there is any relation between the amount a customer spent and the number of orders that customer already placed.
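A small sketch of the Spearman rank correlation, written out by hand so the mechanics are visible: rank both variables (ties get the average rank), then take the Pearson correlation of the ranks. The helper names and the customer figures are made up for this example.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical customers: number of orders placed and total amount spent.
orders_placed = [1, 2, 3, 4, 5, 6]
amount_spent = [20, 35, 30, 80, 150, 400]

rho = spearman(orders_placed, amount_spent)
r = pearson(orders_placed, amount_spent)
print(f"Spearman: {rho:.3f}, Pearson: {r:.3f}")
```

Because Spearman only looks at the ordering, it is not thrown off by the one customer who spent far more than the rest, whereas Pearson measures how close the relation is to a straight line.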

Levels of measurement

A variable can be categorized as one of the following levels of measurement, in order of increasing information value: categorical (nominal), ordinal, interval and ratio.

The level of measurement of the variable determines which type of test you can use. The main distinction is between parametric and nonparametric tests. Nonparametric tests are the tests that I’ve categorized here under the Categorical and Ordinal variables. Parametric tests can only be done on interval and ratio variables, and additionally these tests make assumptions about the defining properties of the variable’s distribution in the population. Mostly they assume that the variable is normally distributed in the population (i.e. Gaussian). To check whether your variable is in this category you can use this link. In my experience, marketing variables never follow a Gaussian distribution (please let me know if you have an example in which they do). So for me, this rules out parametric tests, and whenever I think I have an interval or ratio level variable I still take the tests associated with ordinal variables. The downside to my approach is that nonparametric tests are less powerful for detecting effects, so it makes my conclusions more conservative.
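One possible quick check of the Gaussian assumption is sketched below. This is not necessarily the method behind the link mentioned above: the Jarque-Bera test is used here only because its p-value has a simple closed form (chi-square with 2 degrees of freedom). It is an asymptotic approximation, so treat it as unreliable for small samples. The function name and the simulated order amounts are assumptions for the example.

```python
import random
from math import exp

def jarque_bera(values):
    """Return (statistic, approximate p-value) of the Jarque-Bera normality test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a Gaussian distribution
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return statistic, exp(-statistic / 2)

# Simulated data (seeded for repeatability): a bell-shaped variable and a
# right-skewed one, the latter standing in for something like order amounts.
rng = random.Random(42)
bell_shaped = [rng.gauss(100, 15) for _ in range(500)]
right_skewed = [rng.lognormvariate(3, 1) for _ in range(500)]

for name, sample in [("bell-shaped", bell_shaped), ("right-skewed", right_skewed)]:
    statistic, p_value = jarque_bera(sample)
    print(f"{name}: JB = {statistic:.2f}, p = {p_value:.4g}")
```

A low p-value here means the sample is hard to reconcile with a Gaussian distribution, which in my approach is the signal to fall back on the tests for ordinal variables.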

Putting it all together

The previous sections should have given you enough to go on to find out what kind of test you need: by knowing what type of conclusion you want to reach, and what level of measurement your variable is at, you can find the test you need at the corresponding crossing in the table below:

Goal | Categorical | Ordinal | Gaussian
Descriptive | Proportion, Mode | Median, Interquartile Range | Mean, SD
1 Sample | Chi-square, Binomial test | Wilcoxon one-sample test | One-sample t-test
2 Unrelated samples | Chi-square | Mann-Whitney U test, Kolmogorov-Smirnov test | Unpaired t-test
2 Related samples | McNemar’s test | Wilcoxon Signed-Rank Test | Paired t-test
Association 2 variables | Contingency coefficients | Spearman correlation | Pearson correlation
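If you prefer code over tables, the crossing in the table above can be sketched as a plain dictionary lookup. The names `TEST_TABLE` and `choose_test` are invented for this example; the entries simply mirror the table.

```python
LEVELS = ("categorical", "ordinal", "gaussian")

# One row per goal, one entry per variable level, in the order of LEVELS.
TEST_TABLE = {
    "descriptive": (["Proportion", "Mode"], ["Median", "Interquartile range"], ["Mean", "SD"]),
    "1 sample": (["Chi-square", "Binomial test"], ["Wilcoxon one-sample test"], ["One-sample t-test"]),
    "2 unrelated samples": (
        ["Chi-square"],
        ["Mann-Whitney U test", "Kolmogorov-Smirnov test"],
        ["Unpaired t-test"],
    ),
    "2 related samples": (["McNemar's test"], ["Wilcoxon signed-rank test"], ["Paired t-test"]),
    "association 2 variables": (
        ["Contingency coefficients"],
        ["Spearman correlation"],
        ["Pearson correlation"],
    ),
}

def choose_test(goal, level):
    """Look up the suggested methods for a goal and a variable level."""
    goal_key, level_key = goal.strip().lower(), level.strip().lower()
    if goal_key not in TEST_TABLE:
        raise ValueError(f"unknown goal: {goal!r}")
    if level_key not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return TEST_TABLE[goal_key][LEVELS.index(level_key)]

# Example: before/after measurements on an ordinal variable.
print(choose_test("2 Related samples", "Ordinal"))
```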

All descriptive methods and statistical tests associated with the lower information value variables can be applied to the higher information variables. So you can calculate a mode, which I’ve here associated with categorical variables, for ordinal and Gaussian variables as well. Although you can do the same for statistical tests, you should prefer the test associated with the variable level if you can: these tests have the best chance of detecting an effect that is really there, since they use the most of the variable’s information value. The next tutorials will zoom in on the tests for categorical variables, ordinal variables and Gaussian variables.
