Principal Component Analysis

Dec 10, 2017 • Mark

Principal Component Analysis (PCA) is a method for reducing a dataset with many variables to a smaller set of new variables (the principal components), ‘juicing’ as much of the original information (variance) as possible out of the full set of variables. In data science it is mostly used to achieve one or more of the following goals:

Reducing the number of variables in a dataset reduces the number of degrees of freedom of a statistical model, which in turn lowers the risk of overfitting.
Machine learning algorithms generally train and run faster when fewer variables are included.
It can simplify the interpretation of the data by showing which variables play the biggest role in describing the dataset.
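To make the idea concrete, here is a minimal sketch of PCA in Python with NumPy. It is an illustration under stated assumptions, not the author's downloadable script: it centers each variable, computes the covariance matrix, takes its eigenvectors as the principal components (sorted by eigenvalue, i.e. by explained variance), and projects the data onto the top ones. The function name `pca` and the toy data are hypothetical; the third variable is built as nearly the sum of the first two, so two components should capture almost all of the variance.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: project the rows of X onto the top principal components.

    Returns (scores, components, eigvals), with eigvals sorted in descending order.
    """
    X = np.asarray(X, dtype=float)
    # Center each variable (column) at zero mean
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the variables (columns are variables)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column of `components` is one principal direction
    components = eigvecs[:, :n_components]
    # Coordinates of every sample in the reduced space
    scores = X_centered @ components
    return scores, components, eigvals

# Hypothetical toy data: 200 samples, 3 variables, the third almost equal to a + b
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])

scores, components, eigvals = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
# Share of total variance explained by each component (first two should sum to nearly 1)
print(eigvals / eigvals.sum())
```

Going through the covariance matrix keeps the link to "explained variance" explicit; many libraries instead compute the same components from a singular value decomposition of the centered data, which is numerically more robust for large inputs.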

In this tutorial I’ll explain the concept behind Principal Component Analysis, and with an example I’ll show you how to perform a PCA, how to choose the principal components and how to interpret them.
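The excerpt does not say which rule the tutorial uses to choose the principal components. One common rule of thumb, assumed here and not necessarily the author's approach, is to keep the smallest number of components whose cumulative share of explained variance reaches a threshold such as 90%. The helper `n_components_for` below is hypothetical and takes the covariance eigenvalues (one per component) as input.

```python
import numpy as np

def n_components_for(eigvals, threshold=0.9):
    """Hypothetical helper: smallest k whose top-k eigenvalues explain >= threshold of the variance.

    Returns (k, cumulative), where cumulative[i] is the variance share of the top i+1 components.
    """
    # Sort descending and clip tiny negative values caused by floating-point error
    eigvals = np.clip(np.sort(np.asarray(eigvals, dtype=float))[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals / eigvals.sum())
    # First index where the cumulative share reaches the threshold
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding leaving the last cumulative value just below 1.0
    return min(k, len(eigvals)), cumulative

# Illustrative eigenvalues (not from the tutorial's data)
k, cumulative = n_components_for([8.0, 4.0, 2.0, 1.0, 1.0], threshold=0.9)
print(cumulative)  # [0.5 0.75 0.875 0.9375 1.]
print(k)  # 4
```

The threshold is a judgment call; plotting the eigenvalues (a scree plot) and looking for an "elbow" is another frequently used way to pick the number of components.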

You can download the script here