# Principal Component Analysis

Wed 23 May 2018

## Mean

The average value of a feature over $n$ observations. $$\mu = \frac{\displaystyle\sum_{i=1}^{n} x_i} {n}$$

## Variance

The average squared deviation of a feature from its mean $\mu$. $$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$

## Standard Deviation

The square root of the variance, denoted by lowercase sigma (σ). $$\sigma = \sqrt{\frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}}$$

## Covariance

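Measures how two random variables vary together: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. Its magnitude depends on the units of both variables, so it is not a normalized measure of correlation strength. The denominator is $n - 1$ for a sample; use $n$ instead if the $n$ values are the total population and not a sample. $$\operatorname{cov}(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n-1}$$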
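
The four definitions above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the arrays `x` and `y` and the helper function names are illustrative only and do not come from any library. The variance and standard deviation use the denominator $n$, while the covariance uses $n - 1$, matching the formulas as written.

```python
import numpy as np

# Illustrative data (hypothetical values chosen for easy arithmetic).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 50.0])


def mean(v):
    """Sum of the values divided by their count."""
    return v.sum() / len(v)


def variance(v):
    """Population variance: mean squared deviation, denominator n."""
    return ((v - mean(v)) ** 2).sum() / len(v)


def std(v):
    """Population standard deviation: square root of the variance."""
    return np.sqrt(variance(v))


def covariance(u, v):
    """Sample covariance: denominator n - 1."""
    return ((u - mean(u)) * (v - mean(v))).sum() / (len(u) - 1)


print(mean(x), variance(x), std(x), covariance(x, y))
```

With these denominators, `variance` and `std` should agree with NumPy's defaults for `np.var` and `np.std`, and `covariance` with the off-diagonal entry of `np.cov(x, y)`.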

## PCA

### TensorFlow

Use `tf.contrib.distributions.moving_mean_variance` (TensorFlow 1.x; `tf.contrib` was removed in TensorFlow 2), or for a single tensor:

    import tensorflow as tf

    def variance(x):
        """Population variance: the average squared deviation from the mean.

        E[(X_i - mu)^2]
        """
        mu = tf.reduce_mean(x)
        return tf.reduce_mean(tf.pow(x - mu, 2))


### NumPy

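    >>> import numpy as np
    >>> a = [1, 2, 3, 4, 5]
    >>> np.var(a)  # population variance (divides by n)
    2.0

    >>> b = [1, 2, 3, 4, 50]
    >>> np.var(b)
    362.0

    >>> np.cov(a, b)  # sample covariance matrix (divides by n - 1)
    array([[  2.5,  25. ],
           [ 25. , 452.5]])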
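
The covariance matrix returned by `np.cov` is the usual starting point for PCA: its eigenvectors give the principal directions, and its eigenvalues give the variance along each direction. The following is a minimal sketch, assuming `a` and `b` are treated as two features measured over five observations; the function name `pca` and its return layout are illustrative choices, not a library API.

```python
import numpy as np


def pca(data):
    """PCA via eigendecomposition of the sample covariance matrix.

    data: array of shape (n_samples, n_features).
    Returns (eigenvalues, components, projected), ordered by decreasing variance.
    """
    # Center each feature on its mean.
    centered = data - data.mean(axis=0)
    # Sample covariance matrix (denominator n - 1), features in columns.
    cov = np.cov(centered, rowvar=False)
    # eigh is suited to symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]
    # Coordinates of each observation along the principal directions.
    projected = centered @ components
    return eigenvalues, components, projected


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 50]
data = np.column_stack([a, b]).astype(float)

eigenvalues, components, projected = pca(data)
print(eigenvalues)                      # variance along each component
print(eigenvalues / eigenvalues.sum())  # fraction of total variance explained
```

The eigenvalues sum to the trace of the covariance matrix, so dividing by their sum gives the share of total variance captured by each component.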


### Beam

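A global mean can be computed with a custom `CombineFn`, as in this test from the Beam Python SDK:

https://github.com/apache/beam/blob/9d75d06643f0d443ede4d172cca2c5d8b3c5ef65/sdks/python/apache_beam/transforms/ptransform_test.py#L359

    # `self._MeanCombineFn` is defined on the test class in the linked file.
    result = pcoll | 'Mean' >> beam.CombineGlobally(self._MeanCombineFn())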
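
The accumulator pattern behind a mean combiner can be sketched in plain Python, without depending on Beam. This is an assumption about the general shape of such a combiner, not a copy of the linked `_MeanCombineFn`: the class `MeanCombineSketch` and the helper `run_combine` are hypothetical, and only the four method names mirror Beam's `CombineFn` interface.

```python
class MeanCombineSketch:
    """Plain-Python stand-in using the method names of Beam's CombineFn."""

    def create_accumulator(self):
        # Accumulator is a (running sum, element count) pair.
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return (total + element, count + 1)

    def merge_accumulators(self, accumulators):
        # Partial results from different workers are summed component-wise.
        total, count = 0.0, 0
        for part_total, part_count in accumulators:
            total += part_total
            count += part_count
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def run_combine(fn, partitions):
    """Simulate a runner: one accumulator per partition, then merge them."""
    partials = []
    for partition in partitions:
        acc = fn.create_accumulator()
        for element in partition:
            acc = fn.add_input(acc, element)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))


result = run_combine(MeanCombineSketch(), [[1, 2], [3, 4, 50]])
print(result)
```

Keeping the sum and count separate until `extract_output` is what lets partial results be merged in any order; averaging per-partition means directly would give the wrong answer when partitions differ in size.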