Math Glossary

Mon 01 January 2018

Associative property (Binary Operators)

Grouping of terms does not matter.

$a + b + c = (a + b) + c = a + (b + c)$

$abc = (ab)c = a(bc)$

Commutative property (Binary Operators)

Order of terms does not matter.

$a + b = b + a$

$ab = ba$

Distributive property (Binary Operators)

Multiplication distributes over addition.

$a(b + c) = ab + ac$


Bijective function

A function f is bijective if it is both injective and surjective. In this case, f is a one-to-one correspondence between the input set and the output set: for each of the possible outputs y ∈ Y (surjective part), there exists exactly one input x ∈ X (injective part).


Bootstrapping

Sampling the data with replacement. Randomly sample from the ENTIRE dataset N times, where N * M (M being the number of samples drawn on each pass) equals the size of the observational set.
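A minimal Python sketch of sampling with replacement. It assumes each resample has the same size as the original dataset, which is a common convention rather than something stated above; `bootstrap_means` and its parameters are hypothetical names used only for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample of `data`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    size = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw `size` items from the ENTIRE dataset, with replacement.
        resample = rng.choices(data, k=size)
        means.append(statistics.fmean(resample))
    return means


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = bootstrap_means(data, n_resamples=500)
print(min(means), max(means))
```

The spread of the resampled means gives a rough picture of how uncertain the sample mean is.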


Codomain

The set of output types.


Combination

Number of items in set: $n$. Number of items taken: $k$.

$$_nC_k = \frac{n!}{k!(n-k)!}$$

For example, $n = 5$ and $k = 3$ gives $_5C_3 = 10$.
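The n = 5, k = 3 example can be checked with a short Python sketch. `n_choose_k` is a hypothetical helper written for illustration; `math.comb` is the standard-library equivalent (Python 3.8+).

```python
import math


def n_choose_k(n: int, k: int) -> int:
    """Count the k-item subsets of an n-item set: n! / (k! (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


print(n_choose_k(5, 3))  # 10
print(math.comb(5, 3))   # 10
```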


Correlation

Covariance normalized by the standard deviations: $cov(X, Y) / (\sigma_X \sigma_Y)$


Covariance

Measures how much, and in which direction, two random variables vary together. Its magnitude depends on the units, so it is easiest to compare for features that are measured the same way; for example, height, arm length, and foot size are all measured as lengths.

$$cov(X, Y) = \frac{1}{n}\displaystyle\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)$$

For a sample, the denominator is $n - 1$: $$cov(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n-1}$$ The denominator would be $n$, instead of $n - 1$, if the $n$ observations were the total population and not a sample.
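A small Python sketch of the two denominators. The function name `covariance`, its `sample` flag, and the data are assumptions made for illustration.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n - 1 (sample); sample=False divides by n (population).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
print(covariance(xs, ys, sample=True))   # 5.0
print(covariance(xs, ys, sample=False))  # 4.0
```

The positive sign says the two variables move in the same direction; the magnitude changes with the units of `xs` and `ys`.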

Covariate Shift

When training and test samples follow different input distributions but the conditional distribution of output values, P(y|x), remains unchanged.

Dependent variable

is the effect. Its value depends on changes in the independent variable.


Domain

The set of allowed input values.

Euler's/Napier's Number

Exponents with e as a base are known as natural exponents, and here's the reason. If you plot a graph of $y = e^x$ you'll get a curve that increases exponentially, just as you would if you plotted the curve with base 10 or any other base greater than 1. However, the curve $y = e^x$ has two special properties. For any value of $x$, the value of $y$ equals the slope of the graph at that point, and it also equals the area under the curve up to that point (measured from $-\infty$). This makes e an especially important number in calculus and in all the areas of science that use calculus.
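The slope property can be checked numerically. The sketch below approximates the derivative with a central difference; `slope_of_exp` and the step size `h` are illustrative choices, not part of any standard API.

```python
import math


def slope_of_exp(x, h=1e-6):
    """Approximate the slope of y = e^x at x with a central difference."""
    return (math.exp(x + h) - math.exp(x - h)) / (2 * h)


for x in (-1.0, 0.0, 2.0):
    # The value of the curve and its approximate slope should nearly match.
    print(x, math.exp(x), slope_of_exp(x))
```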

The logarithmic spiral, which is represented by the equation $r = ae^{b\theta}$, is found throughout nature, in seashells, fossils, and flowers. Moreover, e turns up in numerous scientific contexts, including the studies of electric circuits, the laws of heating and cooling, and spring damping. Even though it was discovered 350 years ago, scientists continue to find new examples of Euler's number in nature.



Factorial

The product of all positive integers up to $n$. $$n = 3$$ $$n! = 3 \times 2 \times 1 = 6$$
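The same product can be written as a short loop in Python. `factorial` here is a hand-rolled illustration; the standard library already provides `math.factorial`.

```python
import math


def factorial(n: int) -> int:
    """Multiply 1 * 2 * ... * n (the empty product, 0!, is 1)."""
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial(3))       # 6
print(math.factorial(3))  # 6
```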

Gini impurity

A measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.
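This definition is commonly computed as one minus the sum of squared label proportions. A minimal Python sketch, where `gini_impurity` is a hypothetical helper name:

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i ** 2), where p_i is the share of each label."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["a", "a", "b", "b"]))  # 0.5
print(gini_impurity(["a", "a", "a", "a"]))  # 0.0
```

A pure subset scores 0, and an even two-way split scores 0.5.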

Expectation

Probability-weighted sum of all values of a random variable. $$\text{E}[X] = \sum_{x \in \mathcal{X}}xP(x)$$
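As a worked example, the expectation of a fair six-sided die can be sketched in Python. The dictionary representation and the name `expectation` are assumptions for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def expectation(distribution):
    """Sum x * P(x) over a mapping of value -> probability."""
    return sum(x * p for x, p in distribution.items())


# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2
```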

Independent variable

is the cause. Its value is independent of other variables in your study.

Independently and Identically Distributed (I.I.D.)

Each random variable has the same probability distribution as the others, and all are mutually independent. Mutually independent means one random value does not affect another. "Same probability distribution" means the values come from the same function, and the function is basically the data collection process. For instance, click logs on their own or form submissions on their own could each be identically distributed, but the two mixed together would not be.


Mean

Average value for a feature. $$\mu = \frac{\displaystyle\sum_{i=1}^{n}x_i} {n}$$

Monotonic Relationship

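As one variable increases, the other consistently moves in one direction: it either never decreases (monotonically increasing) or never increases (monotonically decreasing). The rate of change does not have to be constant.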

Pearson Correlation Coefficient

Similar to covariance, except that the covariance is divided by the product of the standard deviations. The "product moment" in its full name refers to the mean (the first moment about the origin) of the product of the mean-adjusted variables. $$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$
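A minimal Python sketch of this ratio, using population (divide by n) forms for both the covariance and the standard deviations so the factors cancel consistently. The name `pearson` and the data are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(xs, [2.0, 4.0, 6.0, 8.0, 10.0]))  # close to 1: perfect positive line
print(pearson(xs, [5.0, 4.0, 3.0, 2.0, 1.0]))   # close to -1: perfect negative line
```

Unlike covariance, the result is unit-free and always falls between -1 and 1.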


Permutation

Number of items in set: $n$. Number of items taken: $k$.

$$_nP_k = n(n-1)(n-2)(n-3)\cdots(n-k+1)$$

Therefore, the number of permutations of $n$ distinct objects taken $k$ at a time can be written as:

$$_nP_k = \frac{n!}{(n-k)!}$$

For example, $n = 5$ and $k = 3$ gives $_5P_3 = 60$.
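The n = 5, k = 3 example can be checked in Python. `n_permute_k` is a hypothetical helper that multiplies the falling product directly; `math.perm` is the standard-library equivalent (Python 3.8+).

```python
import math


def n_permute_k(n: int, k: int) -> int:
    """Count ordered selections: n * (n - 1) * ... * (n - k + 1)."""
    result = 1
    for factor in range(n, n - k, -1):
        result *= factor
    return result


print(n_permute_k(5, 3))  # 60
print(math.perm(5, 3))    # 60
```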

Poisson Bootstrapping

Instead of resampling and adding rows to a new dataset, weights are used to capture the distribution: each observation receives a random count drawn from a Poisson distribution with mean 1. This is useful in a streaming environment, as you can accumulate counts for each feature.
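A rough Python sketch of the idea, assuming Poisson(1) weights and a weighted mean as the statistic of interest. Both function names are hypothetical, and the Poisson sampler uses Knuth's multiplication method because the standard library has no built-in Poisson generator.

```python
import math
import random


def poisson_sample(rng, lam=1.0):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_bootstrap_means(stream, n_replicates=200, seed=0):
    """One pass over `stream`, keeping running weighted sums per replicate."""
    rng = random.Random(seed)
    weighted_sums = [0.0] * n_replicates
    weight_totals = [0] * n_replicates
    for value in stream:
        for b in range(n_replicates):
            weight = poisson_sample(rng)  # how many times this item "appears"
            weighted_sums[b] += weight * value
            weight_totals[b] += weight
    # Skip the (very unlikely) replicates that received zero total weight.
    return [s / t for s, t in zip(weighted_sums, weight_totals) if t > 0]


data = [float(v) for v in range(1, 51)]
means = poisson_bootstrap_means(iter(data), n_replicates=100)
print(min(means), max(means))
```

Because only running sums are stored, the data never has to be held in memory or revisited.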

Probability Distribution

The mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.


Range

The set of all possible output values of the function.

Ridge Regression

"Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors. It is hoped that the net effect will be to give estimates that are more reliable."

Standard Deviation

The square root of the variance, indicated by lowercase sigma (σ). Variance is a measure of dispersion that shows how spread out the numbers are. Standard deviation is a statistic that indicates how far values typically lie from the mean. $$\sigma = \sqrt{\frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}}$$

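Spearman Rank Correlation Coefficient

Measures how well the relationship between two ranked variables can be described using a monotonic function. It is the Pearson correlation coefficient computed on the ranks $\operatorname{rg}_X$ and $\operatorname{rg}_Y$ of the variables. $$r_{s} = \rho_{\operatorname{rg}_X,\operatorname{rg}_Y} = \frac{\operatorname{cov}(\operatorname{rg}_X, \operatorname{rg}_Y)}{\sigma_{\operatorname{rg}_X}\sigma_{\operatorname{rg}_Y}}$$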
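A minimal Python sketch that ranks each variable and then applies the Pearson formula to the ranks. It assumes there are no tied values (ties would need average ranks); `ranks`, `pearson`, and `spearman` are hypothetical helper names.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
cubes = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman(xs, cubes), pearson(xs, cubes))
```

For the cubic data the Spearman value is essentially 1 because the ranks agree perfectly, while the Pearson value is lower because the relationship is not a straight line.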


Surjective function

A function is surjective if it covers the entire output set (in other words, if the image of the function is equal to the function’s codomain).


Variance

The average squared deviation of a feature from its mean. In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.

$$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$
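A short Python sketch of the population formulas for variance and its square root, the standard deviation. The function names and the data are illustrative assumptions; the standard library offers `statistics.pvariance` and `statistics.pstdev` for the same calculation.

```python
import math


def population_variance(values):
    """Mean of squared deviations from the mean (divides by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_stdev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_variance(data))  # 4.0
print(population_stdev(data))     # 2.0
```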


Objective Function Uses

| Optimization objective | Problem type | API value | Use this objective if you want to... |
|---|---|---|---|
| AUC ROC | Classification | - | Distinguish between classes. |
| Log loss | | | Keep prediction probabilities as accurate as possible. Only supported objective for multi-class classification. |
| AUC PR | | | Optimize results for predictions for the less common class. |
| Precision at Recall | | | Optimize precision at a specific recall value. |
| Recall at Precision | | | Optimize recall at a specific precision value. |
| Root Mean Squared Error (RMSE) | Regression | MINIMIZE_RMSE | Capture more extreme values accurately. |
| MAE | Regression | | View extreme values as outliers with less impact on model. |
| RMSLE | Regression | | Penalize error on relative size rather than absolute value. Especially helpful when both predicted and actual values can be quite large. |
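To make the three regression objectives concrete, here is a Python sketch using their standard textbook definitions. How any particular API computes them is not stated above, so the function names and the `log1p` form of RMSLE are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: large misses are squared, so they dominate."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: every miss counts in proportion to its size."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmsle(actual, predicted):
    """RMSE on log(1 + value): penalizes relative rather than absolute error."""
    # Assumes all values are greater than -1 so that log1p is defined.
    squared = ((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared) / len(actual))


actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]  # one prediction misses by 2
print(rmse(actual, predicted), mae(actual, predicted), rmsle(actual, predicted))
```

With a single large miss, RMSE comes out higher than MAE, which matches the table's description of RMSE as the objective that weights extreme values more.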

$\mathbb{R}$ denotes the real numbers and $n$ is the number of dimensions. $$\mathbb{R}^n$$

Note that := means “is defined as”.