Math Glossary
Mon 01 January 2018
Associative property (Binary Operators)
Grouping of terms does not matter.
$a + b + c = (a + b) + c = a + (b + c)$
$abc = (ab)c = a(bc)$
Commutative property (Binary Operators)
Order of terms does not matter.
$a + b = b + a$
$ab = ba$
Distributive property (Binary Operators)
Multiplication distributes over addition.
$a(b + c) = ab + ac$
Bijective
A function f is bijective if it is both injective and surjective. In this case, f is a one-to-one correspondence between the input set and the output set: for each of the possible outputs y ∈ Y there exists (surjective part) exactly one (injective part) input x ∈ X such that f(x) = y.
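For functions on small finite sets, the injective, surjective, and bijective properties can be checked directly. The sketch below is only an illustration: it represents a function as a Python dict, and the helper names `is_injective`, `is_surjective`, and `is_bijective` are made up for this example.

```python
def is_injective(mapping):
    """True if no two inputs share the same output."""
    outputs = list(mapping.values())
    return len(outputs) == len(set(outputs))


def is_surjective(mapping, codomain):
    """True if every element of the codomain is produced by some input."""
    return set(mapping.values()) == set(codomain)


def is_bijective(mapping, codomain):
    """True if the mapping is both injective and surjective."""
    return is_injective(mapping) and is_surjective(mapping, codomain)


# f(x) = 2x on {1, 2, 3}
double = {x: 2 * x for x in [1, 2, 3]}
# f(x) = x^2 on {-1, 0, 1}: two inputs map to 1, so it is not injective
square = {x: x * x for x in [-1, 0, 1]}

print(is_bijective(double, {2, 4, 6}))     # True
print(is_bijective(double, {2, 4, 6, 8}))  # False: 8 is never produced
print(is_bijective(square, {0, 1}))        # False: not injective
```

Note that the same rule (`double`) is bijective or not depending on which codomain is declared.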
Bootstrapping
Sampling the data with replacement. Randomly sample from the ENTIRE dataset N times, where N * M (M being the number of samples drawn in each pass) equals the size of the observational set.
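A minimal sketch of sampling with replacement, assuming the common variant in which each resample has the same size as the original dataset (the entry above describes a slightly different sizing). The function names, the toy data, and the choice of the mean as the statistic are all assumptions made for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        means.append(sum(sample) / len(sample))
    return means


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = bootstrap_means(data, n_resamples=1000)
print(min(means), max(means))  # spread of the resampled means
```

The spread of `means` gives a rough picture of how much the sample mean would vary from one dataset to another.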
Codomain
set of permitted output values, that is, the set into which every output of the function must fall
Combinations
Number of items in set: $n$. Number of items taken: $k$.
$$_nC_k = \frac{n!}{k!(n-k)!}$$
Example: $n = 5$, $k = 3$ gives $\frac{5!}{3! \, 2!} = 10$.
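The $n = 5$, $k = 3$ example can be checked three ways: with the factorial formula, with the standard library's `math.comb` (Python 3.8+), and by listing the combinations. The variable names are just for illustration.

```python
import math
from itertools import combinations

n, k = 5, 3

# n! / (k! (n - k)!)
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Built-in binomial coefficient
by_library = math.comb(n, k)

# Enumerate every unordered selection of k items out of n and count them
by_enumeration = len(list(combinations(range(n), k)))

print(by_formula, by_library, by_enumeration)  # 10 10 10
```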
Correlation
Covariance normalized by the product of the standard deviations: $\frac{cov(x, y)}{stdev(x) \cdot stdev(y)}$
Covariance
Measures the direction and strength of how two random variables vary together. It is easiest to compare across features that are measured the same way; for example, height, arm length, and foot size are all lengths.
If $n$ is the total population and not a sample, use $n$ for the denominator: $$cov(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n}$$
For a sample, use $n - 1$ instead: $$cov(X, Y) = \displaystyle\frac{\sum_{i=1}^{n}(x_i - \mu_x)(y_i - \mu_y)} {n-1}$$
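The difference between the $n$ and $n - 1$ denominators can be seen in a small sketch. The function name `covariance`, its `sample` flag, and the height/arm-length numbers are invented for illustration.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n - 1; sample=False divides by n (population).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


heights = [150.0, 160.0, 170.0, 180.0]
arm_lengths = [60.0, 65.0, 70.0, 75.0]

print(covariance(heights, arm_lengths, sample=False))  # 62.5
print(covariance(heights, arm_lengths, sample=True))   # about 83.33
```

A positive result means the two features tend to rise together; the size of the number depends on the units, which is why correlation is often preferred for comparison.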
Covariate Shift
When training and test samples follow different input distributions but the conditional distribution of output values, P(y|x), remains unchanged
Dependent variable
is the effect. Its value depends on changes in the independent variable.
Domain
set of allowed input values
Euler's/Napier's Number
Exponents with e as a base are known as natural exponents, and here's the reason. If you plot a graph of $y = e^x$ you'll get a curve that increases exponentially, just as you would if you plotted the curve with base 10 or any other number. However, the curve $y = e^x$ has two special properties. For any value of $x$, the value of $y$ equals the value of the slope of the graph at that point, and it also equals the area under the curve up to that point (from $-\infty$ to $x$). This makes e an especially important number in calculus and in all the areas of science that use calculus.
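Both properties of $e^x$ can be checked numerically. This is a rough sketch, not a proof: the slope is estimated with a central difference, and the area is estimated with the trapezoidal rule starting from $-40$ as a stand-in for $-\infty$. The function names, step sizes, and the test point $x = 1.5$ are arbitrary choices for illustration.

```python
import math


def slope(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def area_up_to(f, x, lower=-40.0, steps=200_000):
    """Trapezoidal estimate of the integral of f from lower to x."""
    width = (x - lower) / steps
    total = 0.5 * (f(lower) + f(x))
    for i in range(1, steps):
        total += f(lower + i * width)
    return total * width


x = 1.5
value = math.exp(x)
print(value)                      # e^x itself
print(slope(math.exp, x))         # approximately the same number
print(area_up_to(math.exp, x))    # approximately the same number
```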
The logarithmic spiral, which is represented by the equation $r = ae^{b\theta}$, is found throughout nature, in seashells, fossils, and flowers. Moreover, e turns up in numerous scientific contexts, including the studies of electric circuits, the laws of heating and cooling, and spring damping. Even though it was discovered 350 years ago, scientists continue to find new examples of Euler's number in nature.
** reference **
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
Factorial
The product of all positive integers from $n$ down to 1. For example, with $n = 3$: $$n! = 3 \times 2 \times 1 = 6$$
Gini impurity
is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.
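Gini impurity is commonly computed as $1 - \sum_i p_i^2$, where $p_i$ is the fraction of elements carrying label $i$. That formula is not stated in the entry above, so treat the sketch below as an assumption about which formulation is meant; the function name is also illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the label proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


print(gini_impurity(["a", "a", "a", "a"]))  # 0.0, a pure set
print(gini_impurity(["a", "b", "a", "b"]))  # 0.5, an even two-way split
print(gini_impurity(["a", "a", "a", "b"]))  # 0.375
```

A value of 0 means every element has the same label; the value grows as the labels become more evenly mixed.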
Expectation
Probability-weighted sum of all values of a random variable. $\text{E}[X] = \sum_{x \in \mathcal{X}}xP(x)$
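As a worked instance of $\text{E}[X] = \sum_{x} xP(x)$, the sketch below computes the expected value of a fair six-sided die. The die example and the function name are illustrative; exact fractions are used so the probabilities sum to exactly 1.

```python
from fractions import Fraction


def expectation(distribution):
    """distribution maps each value x to its probability P(x)."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())


# Fair die: each face 1..6 has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_roll = expectation(die)
print(expected_roll)  # 7/2
```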
Independent variable
is the cause. Its value is independent of other variables in your study.
Independently and Identically Distributed (I.I.D.)
Each random variable has the same probability distribution as the others, and all are mutually independent. Mutually independent means one random value does not affect another. "Same probability distribution" means the values come from the same function, where the function is basically the data collection process. For instance, click logs on their own, or form submissions on their own, could each be identically distributed, but the two together would not be.
Mean
Average value for a feature. $$\frac{\displaystyle\sum_{i=1}^{n}(x_i)} {n}$$
Monotonic Relationship
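As one variable increases, the other moves in a single direction: it either never decreases or never increases. The rate of change does not need to be constant.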
Permutations
Number of items in set: $n$. Number of items taken: $k$.
$$_nP_k = n(n-1)(n-2)(n-3)\cdots(n-k+1)$$
Therefore, the number of permutations of $n$ distinct objects taken $k$ at a time can be written as:
$$_nP_k = \frac{n!}{(n-k)!}$$
Example: $n = 5$, $k = 3$ gives $5 \times 4 \times 3 = 60$.
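The $n = 5$, $k = 3$ example can be verified with the falling product, the factorial formula, the standard library's `math.perm` (Python 3.8+), and by listing the ordered selections. The variable names are only for illustration.

```python
import math
from itertools import permutations

n, k = 5, 3

# n (n - 1) ... (n - k + 1), here 5 * 4 * 3
by_product = math.prod(range(n - k + 1, n + 1))

# n! / (n - k)!
by_formula = math.factorial(n) // math.factorial(n - k)

# Built-in permutation count
by_library = math.perm(n, k)

# Enumerate every ordered selection of k items out of n and count them
by_enumeration = len(list(permutations(range(n), k)))

print(by_product, by_formula, by_library, by_enumeration)  # 60 60 60 60
```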
Poisson Bootstrapping
Instead of resampling and adding to a new dataset, weights are used to capture the distribution. This is useful in a streaming environment, as you can accumulate counts for each feature.
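One way to picture the weighting idea: as each value arrives from the stream, every bootstrap replicate draws an integer weight saying how many times that value "appears" in it. The sketch below assumes the weights come from a Poisson distribution with rate 1, which is a common choice but is not stated in the entry above. The function names and the use of the mean as the statistic are also assumptions.

```python
import math
import random


def poisson_1(rng):
    """Draw from Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def poisson_bootstrap_means(stream, n_replicates, seed=0):
    """One pass over stream; each replicate keeps a weighted sum and weight total."""
    rng = random.Random(seed)
    weighted_sums = [0.0] * n_replicates
    weight_totals = [0] * n_replicates
    for value in stream:
        for r in range(n_replicates):
            weight = poisson_1(rng)  # how many copies of value replicate r sees
            weighted_sums[r] += weight * value
            weight_totals[r] += weight
    # Skip any replicate that happened to receive zero total weight
    return [s / t for s, t in zip(weighted_sums, weight_totals) if t > 0]


stream = iter([1.0, 2.0, 3.0, 4.0, 5.0] * 40)
replicate_means = poisson_bootstrap_means(stream, n_replicates=50)
print(min(replicate_means), max(replicate_means))
```

Because only running sums are kept, the data never has to be stored or revisited.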
Probability Distribution
The mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.
Range
set of all possible output values of the function
Ridge Regression
"Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors. It is hoped that the net effect will be to give estimates that are more reliable." https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Ridge_Regression.pdf
Surjective
if it covers the entire output set (in other words, if the image of the function is equal to the function’s codomain).
Variance
The average variability of a feature from the mean. In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value.
$$\sigma^2 = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}$$
Standard Deviation
The square root of the variance, indicated by lowercase sigma (σ). Variance is a measure of dispersion that shows how far apart the numbers are spread. Standard deviation is a statistic that indicates how far from the mean a feature/sample typically is. $$\sigma = \sqrt{\frac{\displaystyle\sum_{i=1}^{n}(x_i - \mu)^2} {n}}$$
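The population formulas for variance and standard deviation (denominator $n$) translate directly into code. The function names and the sample values below are chosen for illustration so that the results come out as round numbers.

```python
import math


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def standard_deviation(values):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(variance(values))            # 4.0
print(standard_deviation(values))  # 2.0
```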
Pearson Correlation Coefficient
Similar to covariance except the divisor is the product of the standard deviations. This gives the "product moment", that is, the mean (the first moment about the origin) of the product of the mean-adjusted variables. $$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$
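A small sketch of the coefficient as covariance divided by the product of the standard deviations. It uses population formulas throughout (the $n$ cancels, so the sample versions give the same ratio). The function name and the test data are illustrative.

```python
import math


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))          # perfectly linear, close to 1
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))          # perfectly inverse, close to -1
print(pearson([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # curved, a bit below 1
```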
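Spearman Rank Correlation Coefficient
Measures how well the relationship between two ranked variables can be described using a monotonic function. It is the Pearson correlation computed on the ranks $\operatorname{rg}_X$ and $\operatorname{rg}_Y$ of the variables. $$r_s = \rho_{\operatorname{rg}_X, \operatorname{rg}_Y} = \frac{\operatorname{cov}(\operatorname{rg}_X, \operatorname{rg}_Y)}{\sigma_{\operatorname{rg}_X} \sigma_{\operatorname{rg}_Y}}$$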
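Spearman's coefficient can be sketched as the Pearson correlation applied to ranks instead of raw values. The helper below assigns tied values the average of their positions, which is a common convention assumed here; all function names are illustrative.

```python
import math


def ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def spearman(xs, ys):
    """Pearson correlation of the ranks of xs and ys."""
    return pearson(ranks(xs), ranks(ys))


print(ranks([10, 20, 20, 30]))                       # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # monotonic, close to 1
```

The squared values are not linearly related to the inputs, yet the rank correlation is still essentially 1 because the relationship is monotonic.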
| Optimization objective | Problem type | API value | Use this objective if you want to... |
| --- | --- | --- | --- |
| AUC ROC | Classification | - | Distinguish between classes. |
| Log loss | Classification | | Keep prediction probabilities as accurate as possible. Only supported objective for multi-class classification. |
| AUC PR | Classification | | Optimize results for predictions for the less common class. |
| Precision at Recall | Classification | | Optimize precision at a specific recall value. |
| Recall at Precision | Classification | | Optimize recall at a specific precision value. |
| Root Mean Squared Error (RMSE) | Regression | MINIMIZE_RMSE | Capture more extreme values accurately. |
| MAE | Regression | | View extreme values as outliers with less impact on model. |
| RMSLE | Regression | | Penalize error on relative size rather than absolute value. Especially helpful when both predicted and actual values can be quite large. |
$\mathbb{R}$ denotes the real numbers and $n$ is the number of dimensions. $$\mathbb{R}^n$$
Note that := means “is defined as”.
t-Test
Test statistic for a correlation coefficient $r$ computed from $n$ paired observations: $$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
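The formula can be evaluated directly. The sketch below assumes $r$ is a correlation coefficient strictly between $-1$ and $1$ and $n$ is the number of paired observations; the function name and the example numbers are illustrative. Turning $t$ into a p-value would additionally need a t-distribution with $n - 2$ degrees of freedom, which is left out here.

```python
import math


def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2))."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


t = correlation_t_statistic(r=0.5, n=27)
print(round(t, 4))  # 2.8868
```

For a fixed $r$, the statistic grows with $n$: the same correlation is stronger evidence when it comes from more observations.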