Principal Component Analysis

Wed 23 May 2018


Mean

The average value of a feature, denoted by lowercase mu (μ). $$\mu = \frac{\displaystyle\sum_{i=1}^{n}x_i} {n}$$

Variance

The average squared deviation of a feature from its mean. $$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$

Standard Deviation

The square root of the variance, denoted by lowercase sigma (σ). $$\sigma = \sqrt{\frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}}$$
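As a quick check of the three formulas above, here is a minimal sketch in plain Python. The function names (`mean`, `population_variance`, `std_dev`) are made up for illustration, and the sample list is just an example input.

```python
import math


def mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)


def population_variance(xs):
    """Average squared deviation from the mean (divides by n)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    """Square root of the population variance."""
    return math.sqrt(population_variance(xs))


a = [1, 2, 3, 4, 5]
print(mean(a))                 # 3.0
print(population_variance(a))  # 2.0
print(std_dev(a))              # sqrt(2), roughly 1.414
```

All three divide by $n$, so they treat the data as the whole population rather than a sample.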

Covariance

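Measures how two random variables vary together: positive when they tend to move in the same direction, negative when they tend to move in opposite directions. Its magnitude depends on the units of the variables, so on its own it does not indicate the strength of the relationship. The denominator is $n - 1$ for a sample; use $n$ instead if the $n$ observations are the total population. $$cov(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n-1}$$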
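The choice of denominator can be sketched in plain Python. The `covariance` helper and its `ddof` parameter are hypothetical names chosen for this example (the parameter name is borrowed from NumPy's convention); `ddof=1` divides by $n - 1$ and `ddof=0` divides by $n$.

```python
def covariance(xs, ys, ddof=1):
    """Covariance of two equal-length sequences.

    ddof=1 gives the sample covariance (divide by n - 1),
    ddof=0 gives the population covariance (divide by n).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # Sum of the products of paired deviations from each mean.
    total = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 50]
print(covariance(a, b))          # 25.0 (sample)
print(covariance(a, b, ddof=0))  # 20.0 (population)
```

The covariance of a variable with itself is its variance, so `covariance(a, a, ddof=0)` returns 2.0 here.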

Pearson Correlation Coefficient

PCA


TensorFlow

Use tf.contrib.distributions.moving_mean_variance (TensorFlow 1.x; tf.contrib was removed in 2.x) for an exponentially weighted moving estimate, or, for a single tensor:

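import tensorflow as tf

def variance(x):
  """Average of the squared deviation from the mean: E[(X_i - mu)^2]."""
  # x should be a float tensor; reduce_mean on integers truncates.
  mu = tf.reduce_mean(x)
  # Population variance: divides by n, not n - 1.
  return tf.reduce_mean(tf.pow(x - mu, 2))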

NumPy

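>>> import numpy as np
>>> a = [1, 2, 3, 4, 5]
>>> np.var(a)  # population variance: divides by n (ddof=0)
2.0
>>> b = [1, 2, 3, 4, 50]
>>> np.var(b)
362.0
>>> np.cov(a, b)  # sample covariance matrix: divides by n - 1 (ddof=1)
array([[  2.5,  25. ],
       [ 25. , 452.5]])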

Beam

The Beam Python SDK tests compute a global mean with a custom CombineFn:

https://github.com/apache/beam/blob/9d75d06643f0d443ede4d172cca2c5d8b3c5ef65/sdks/python/apache_beam/transforms/ptransform_test.py#L359

result = pcoll | 'Mean' >> beam.CombineGlobally(self._MeanCombineFn())
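To show what a mean combiner has to do, here is a plain-Python sketch that follows the four-method shape of Beam's CombineFn interface (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`). It is not the `_MeanCombineFn` from the linked test. The class below is a hypothetical stand-in that does not import Beam; in a real pipeline it would subclass `beam.CombineFn`. The two shards simulate partial results from separate workers.

```python
class MeanCombineFn:
    """Stand-in with the CombineFn method shape; accumulator is (sum, count)."""

    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, accumulator, element):
        total, count = accumulator
        return total + element, count + 1

    def merge_accumulators(self, accumulators):
        # Combine partial (sum, count) pairs produced independently.
        totals, counts = zip(*accumulators)
        return sum(totals), sum(counts)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


# Simulate two workers, each accumulating its own shard of the data.
fn = MeanCombineFn()
shards = [[1, 2, 3], [4, 50]]
partials = []
for shard in shards:
    acc = fn.create_accumulator()
    for element in shard:
        acc = fn.add_input(acc, element)
    partials.append(acc)

result = fn.extract_output(fn.merge_accumulators(partials))
print(result)  # 12.0
```

Carrying the sum and count separately, rather than a running mean, is what lets partial results be merged in any order.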