noun: Statistics The simultaneous change in value of two numerically valued random variables: the positive correlation between cigarette smoking and the incidence of lung cancer; the negative correlation between age and normal vision.

In data science and machine learning, there are several ways to measure the strength and direction of the relationship between two variables. The result is typically expressed as a numerical coefficient ranging from -1 to 1.

Correlation is covariance that has been standardized by the standard deviations of X and Y. Covariance(X, Y) is sum((X - X.mean()) * (Y - Y.mean())) / N. Dividing the covariance by the product of the standard deviations of X and Y gives the correlation coefficient (Pearson's r).
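The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the function names `covariance` and `correlation` and the sample values in `hours` and `score` are made up for the example.

```python
import numpy as np

def covariance(x, y):
    """Population covariance: sum of products of deviations, divided by N."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / len(x)

def correlation(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # ndarray.std() defaults to ddof=0 (population), matching the / N above.
    return covariance(x, y) / (x.std() * y.std())

# Hypothetical sample data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 79]

r = correlation(hours, score)
print(r)
```

The result should agree with `np.corrcoef(hours, score)[0, 1]`, which computes the same coefficient.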

In simpler terms, it measures how two variables change together: whether they move in the same direction (positive) or in opposite directions (negative), and how closely that relationship follows a straight line.

It always takes values between -1 and 1: 1 means perfect positive correlation, -1 means perfect negative correlation, and 0 means no linear relationship.
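A quick way to see the ends of this range is to correlate a variable with exact linear functions of itself. The sketch below assumes NumPy and uses made-up values; the third case shows that a coefficient near 0 rules out only a linear relationship, since y = x² depends entirely on x.

```python
import numpy as np

# Made-up values, symmetric around zero.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
r_up = np.corrcoef(x, 2 * x + 1)[0, 1]    # increasing straight line
r_down = np.corrcoef(x, -3 * x)[0, 1]     # decreasing straight line
r_curve = np.corrcoef(x, x ** 2)[0, 1]    # curved, non-linear relationship

print(r_up, r_down, r_curve)
```

Up to floating-point rounding, the three values are 1, -1, and 0.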

Correlation and Causation:

Many statistical tests calculate correlations between variables, and when two variables are found to be correlated, it is tempting to assume that one causes the other. The claim that “correlation proves causation” is a questionable-cause logical fallacy, in which two events occurring together are taken to have established a cause-and-effect relationship.
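One common way the fallacy arises is a third variable driving both of the others. The simulation below is a sketch on synthetic data, assuming NumPy; the variable names (`temperature`, `ice_cream`, `sunburn`) and the `residuals` helper are hypothetical and chosen only to make the idea concrete. Neither simulated outcome influences the other, yet they are clearly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 10_000

# Synthetic data: temperature drives both outcomes, each with its own noise.
temperature = rng.normal(size=n)
ice_cream = temperature + rng.normal(size=n)
sunburn = temperature + rng.normal(size=n)

# The two outcomes are correlated even though neither causes the other.
r_raw = np.corrcoef(ice_cream, sunburn)[0, 1]

def residuals(y, x):
    """What is left of y after removing a least-squares straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# After removing the shared driver, the correlation largely disappears.
r_adjusted = np.corrcoef(
    residuals(ice_cream, temperature),
    residuals(sunburn, temperature),
)[0, 1]

print(r_raw, r_adjusted)
```

With this setup, `r_raw` comes out around 0.5 and `r_adjusted` close to 0, which is what you would expect when the shared driver accounts for the whole association.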

Last Updated: April 07, 2019