Principal Component Analysis

Wed 23 May 2018

Finds patterns to reduce the dimensions of the dataset with minimal loss of information.

Finds the directions (components) that maximize the variance in the dataset, as opposed to MDA (Multiple Discriminant Analysis), which also finds directions, but ones that maximize class separation instead.


Mean

The average value of a feature.

$$\frac{\displaystyle\sum_{i=1}^{n}(x_i)} {n}$$
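As a quick check, the mean can be computed directly from the formula (toy data, illustrative only):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
mean = x.sum() / len(x)  # sum of the values divided by n
print(mean)              # same result as np.mean(x)
```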


Variance

The average variability of a feature from the mean.

$$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$
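The population variance follows directly from the formula above (toy data again):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
mu = x.mean()
variance = ((x - mu) ** 2).sum() / len(x)  # population variance (divide by n)
print(variance)                            # same result as np.var(x)
```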

Standard Deviation

The square root of the variance. Indicated by lowercase sigma (σ).

$$\sigma = \sqrt\frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$
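And the standard deviation is just the square root of that variance:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
sigma = np.sqrt(((x - x.mean()) ** 2).sum() / len(x))  # sqrt of population variance
print(sigma)                                           # same result as np.std(x)
```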


Covariance

Measures the amount and direction of correlation between two random variables. Use \(n\) for the denominator, instead of \(n - 1\), if \(n\) is the total population rather than a sample.

$$cov(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n-1}$$
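A minimal sketch of the sample covariance (the \(n - 1\) denominator), using made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
# sum of products of deviations from each mean, divided by n - 1
cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
print(cov)  # same result as np.cov(x, y)[0, 1]
```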

Pearson Correlation Coefficient

Similar to covariance, except the divisor is the product of the two standard deviations. This gives the "product moment", the moment about the origin.

$$\rho _{X,Y}={\frac {\operatorname {cov} (X,Y)}{\sigma _{X}\sigma _{Y}}}$$
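Dividing the covariance by the product of the standard deviations normalizes it to the range \([-1, 1]\). A sketch with the same toy data (here \(y = 2x\), so the correlation is perfect):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
cov = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
# divide by the product of the sample standard deviations
rho = cov / (x.std(ddof=1) * y.std(ddof=1))
print(rho)  # ≈ 1.0; same result as np.corrcoef(x, y)[0, 1]
```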

Spearman Rank Correlation Coefficient

Measures how well the relationship between two ranked variables can be described using a monotonic function; as the formula shows, it is the Pearson correlation applied to the ranks of the values.

$${\displaystyle r_{s}=\rho _{\operatorname {rg} _{X},\operatorname {rg} _{Y}}={\frac {\operatorname {cov} (\operatorname {rg} _{X},\operatorname {rg} _{Y})}{\sigma _{\operatorname {rg} _{X}}\sigma _{\operatorname {rg} _{Y}}}}}$$
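A sketch of the rank-then-correlate idea (the `rank` helper is illustrative and assumes no ties; here \(y = x^2 / 100\) is non-linear but monotonic, so Spearman still reports a perfect relationship):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
y = np.array([1.0, 4.0, 9.0, 16.0])  # monotonic in x, but not linear

def rank(a):
    # assign ranks 1..n (no handling of ties in this toy example)
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

# Pearson correlation of the ranks = Spearman rank correlation
r_s = np.corrcoef(rank(x), rank(y))[0, 1]
print(r_s)  # ≈ 1.0: the relationship is perfectly monotonic
```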


PCA Algorithm

  1. Find the mean of each column.
  2. Create the covariance matrix (the covariance of each column with every other column).
  3. Calculate the eigendecomposition of the covariance matrix.
  4. Eigenvectors are the directions, or components, of the reduced subspace.
  5. Eigenvalues represent the magnitudes of those directions.
  6. Rank the eigenvectors from high to low by their corresponding eigenvalues and choose the top k eigenvectors.
  7. If all eigenvalues are similar, the projection may not be effective, as the data is already compressed.
  8. Eigenvalues close to zero represent components or axes that may be discarded.
  9. Project the original data onto the new subspace using the chosen eigenvectors.
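The steps above can be sketched directly with NumPy, using the same 3×2 matrix as the scikit-learn example below (note that eigenvector signs are arbitrary, so the projection may differ in sign between implementations):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
# 1. centre each column on its mean
M = A - A.mean(axis=0)
# 2. covariance matrix of the columns
C = np.cov(M.T)
# 3. eigendecomposition (eigh suits symmetric matrices and
#    returns eigenvalues in ascending order)
values, vectors = np.linalg.eigh(C)
# 4-6. re-order so the largest eigenvalue comes first, keep top k = 1
values, vectors = values[::-1], vectors[:, ::-1]
# 9. project the centred data onto the chosen component
P = M @ vectors[:, :1]
print(values)  # largest eigenvalue ≈ 8, the other ≈ 0
print(P)       # matches the sklearn projection below, up to sign
```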
```python
from numpy import array
from sklearn.decomposition import PCA

# define a 3x2 matrix (two features)
A = array([[1, 2], [3, 4], [5, 6]])
# number of components to keep = 1
pca = PCA(1)
# fit the model before reading components_ or transforming
pca.fit(A)
print('components', pca.components_)
print('variance', pca.explained_variance_)
# project the data onto the principal component
B = pca.transform(A)
print('transformed', B)
```

```
components [[0.70710678 0.70710678]]
variance [8.]
transformed [[-2.82842712]
 [ 0.        ]
 [ 2.82842712]]
```