noun (Statistics): The simultaneous change in value of two numerically valued random variables: the positive correlation between cigarette smoking and the incidence of lung cancer; the negative correlation between age and normal vision.
In data science and machine learning, there are several ways to measure the strength and direction of the relationship between two variables. The result is typically expressed as a numerical coefficient ranging from -1 to 1:
Pearson’s Correlation (Product-Moment Coefficient): a measure of the strength and direction of the linear relationship between two variables.
Pearson’s correlation coefficient
Correlation is covariance that has been standardized by the standard deviations of X and Y. Covariance(X, Y) is sum((X - X.mean()) * (Y - Y.mean())) / N. Dividing the covariance by the product of the two standard deviations (each computed with the same divisor N) gives the correlation coefficient.
In simpler terms, it measures how closely two variables change together along a straight line, and whether they move in the same (positive) or opposite (negative) direction.
It always takes values between -1 and 1: 1 means a perfect positive linear correlation, -1 means a perfect negative linear correlation, and 0 means no linear correlation.
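As a rough illustration of the covariance-over-standard-deviations recipe described above, here is a minimal sketch in plain Python. The function name pearson_r and the sample data are illustrative assumptions, not part of any particular library.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance: sum((X - X.mean()) * (Y - Y.mean())) / N
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations, using the same divisor N
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (std_x * std_y)


# Hypothetical sample data: y rises (or falls) exactly in step with x.
x = [1, 2, 3, 4]
r_up = pearson_r(x, [2, 4, 6, 8])    # close to 1: perfect positive linear relationship
r_down = pearson_r(x, [8, 6, 4, 2])  # close to -1: perfect negative linear relationship
```

Because the covariance and both standard deviations use the same divisor, the N terms cancel, so using N - 1 throughout would give the same coefficient.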
Matthews Correlation Coefficient (MCC): used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives, and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes.
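For the binary case, the coefficient can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration, assuming labels are encoded as 0 and 1; the function name mcc_binary and the sample labels are hypothetical, and returning 0 when the denominator is zero is a common convention rather than the only possible choice.

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for labels encoded as 0 (negative) and 1 (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: treat the undefined case as 0 (no better than chance).
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced data: nine negatives, one positive, and a classifier
# that always predicts the negative class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = mcc_binary(y_true, y_pred)
```

In this example the accuracy is 0.9 while the coefficient is 0, which shows why the measure is considered more informative than accuracy when class sizes differ widely.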
Rank Correlation: any of several statistics that measure an ordinal association, that is, the relationship between rankings of different ordinal variables or different rankings of the same variable, where a “ranking” is the assignment of the ordering labels “first”, “second”, “third”, etc. to different observations of a particular variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.
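One widely used rank correlation coefficient is Spearman's rho, which can be computed by replacing each observation with its rank and then applying the Pearson formula to the ranks. The sketch below picks Spearman's rho purely as an example (the text above does not single out a specific statistic); the helper names average_ranks and spearman_rho are illustrative, and tied values are assumed to share their average rank.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when all ranks in a variable are tied")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y grows with x, but not along a straight line (y = x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)  # close to 1, because the two orderings agree exactly
```

Since only the ordering matters, any strictly increasing relationship yields a coefficient of about 1 even when the relationship is not linear.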
Correlation and Causation:
Many statistical tests calculate correlations between variables, and when two variables are found to be correlated, it is tempting to assume that one causes the other. The claim that “correlation proves causation” is a questionable-cause logical fallacy: two events occurring together are taken to establish a cause-and-effect relationship.