such that g is defined as returning the y value that gives the highest score:
$g(x) = \operatorname{argmax}_y f(x, y)$
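As a minimal sketch of such a g in Python (the scoring function and the label set here are illustrative assumptions, not from the text):

```python
# Hypothetical argmax predictor: `score` plays the role of f(x, y),
# `labels` the role of the candidate set Y.
def predict(score, x, labels):
    """Return the label y that maximizes score(x, y)."""
    return max(labels, key=lambda y: score(x, y))

# Toy usage: a made-up score that prefers labels close to x.
score = lambda x, y: -(x - y) ** 2
print(predict(score, 3.2, labels=[0, 1, 2, 3, 4]))  # -> 3
```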
Loss function
In order to measure how well a function fits the training data, a loss function is defined:
$L : Y \times Y \to \mathbb{R}_{\geq 0}$
I.e., if we have the training samples $(x_i, y_i)$, then the loss of predicting the value $\hat{y}$ is $L(y_i, \hat{y})$.
Usually, in the same context, the term cost function is used, which can be regarded as a generalization of the loss function.
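As a sketch, two standard instances of such a loss (the zero-one loss for classification and the squared loss for regression; the function names are our own):

```python
# Zero-one loss: 0 if the prediction is exactly right, 1 otherwise.
def zero_one_loss(y_true, y_pred):
    return 0.0 if y_true == y_pred else 1.0

# Squared loss: penalizes large errors quadratically.
def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

print(zero_one_loss(1, 1), zero_one_loss(1, 0))  # 0.0 1.0
print(squared_loss(2.0, 3.5))                    # 2.25
```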
Important Concepts
Feature Selection
Features are chosen with a specific task in mind
Curse of Dimensionality: The more features you include, the more data you need (exponentially) to produce an equally accurate model
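A quick back-of-the-envelope illustration of this growth (the bin count is an arbitrary assumption): if every feature axis is split into just 10 bins, the number of cells that need data grows as 10^d:

```python
# Number of cells to cover when each of d feature axes has 10 bins.
bins_per_axis = 10
for d in (1, 2, 3, 5, 10):
    print(f"{d} features -> {bins_per_axis ** d:,} cells")
```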
'No Free Lunch' theorem
'No Free Lunch' theorem: "If an algorithm performs better than random search on some class of problems then it must perform worse than random search on the remaining problems".
Generalization
The ability of an algorithm to perform well on new (previously unseen) data.
Overfitting and Underfitting
Overfitting happens when a model learns all the detail (and noise) in the training data to the extent that it negatively impacts the performance of the model on new data.
Underfitting is the case where the model has "not learned enough", resulting in low generalization and unreliable predictions.
Overfitting and underfitting are the two biggest causes of poor performance in machine learning algorithms.
Overfitting vs Underfitting
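A minimal sketch of both failure modes using polynomial regression (numpy only; the target curve, noise level, and degrees are illustrative choices): a degree-1 polynomial underfits a noisy sine curve, while a degree-15 polynomial overfits it:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)

# Noise-free test points to measure generalization.
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: test MSE = {test_mse:.3f}")
```

Typically the degree-1 model has high error everywhere (underfitting), the degree-15 model fits the training noise and scores poorly on the test points (overfitting), and the moderate degree generalizes best.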
Bias and Variance
Bias is the difference between the estimator's expected value and the true value of the parameter being estimated. In ML, biased results are usually due to faulty assumptions
In ML, bias is inevitable; remember the 'No Free Lunch' theorem.
Variance is the expectation of the squared deviation of a random variable from its mean.
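Written out, these are the standard statistical definitions:

$$\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta, \qquad \operatorname{Var}(X) = \mathbb{E}\!\left[(X - \mathbb{E}[X])^2\right]$$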
Cross Validation
Cross validation is a technique for testing the effectiveness of machine learning models.
Cross validation can give insight into how the model will generalize to an independent dataset.
The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting.
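A minimal k-fold cross-validation sketch with scikit-learn (the dataset and model are illustrative choices, not from the text):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: fit on 4/5 of the data, score on the held-out 1/5, repeat.
scores = cross_val_score(model, X, y, cv=5)
print(f"fold accuracies: {scores.round(3)}, mean = {scores.mean():.3f}")
```

A large gap between training accuracy and the cross-validated accuracy is a common sign of overfitting.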