Wednesday, February 22, 2017

Importance of a Baseline Model

One of the most important steps in building a machine learning model is understanding the data first. Most of us forget this and jump right into modelling. A related mistake is that we often forget to build a baseline model before building something complicated.

What is a Baseline Model and a Baseline Accuracy?

A baseline model, in simple words, is the simplest model that you can build on the provided data. The accuracy achieved by a baseline model is the baseline accuracy: the lower bound against which you evaluate the performance of your actual model.

A baseline model usually does not involve any machine learning; it takes a statistical approach instead. It may use heuristics, randomness, or simple statistics to come up with a prediction.

Scikit-learn (sklearn) supports baseline models for classification in the form of the DummyClassifier, which offers the following strategies:

  • “stratified”: generates predictions by respecting the training set’s class distribution.
  • “most_frequent”: always predicts the most frequent label in the training set.
  • “prior”: always predicts the class that maximizes the class prior.
  • “uniform”: generates predictions uniformly at random.
  • “constant”: always predicts a constant label that is provided by the user. This is useful for metrics that evaluate a non-majority class.
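
As a rough illustration of the strategies above, here is a minimal sketch that fits a DummyClassifier with the “most_frequent” strategy and measures its accuracy. The toy data and the variable names are made up for this example; only DummyClassifier and its strategy names come from sklearn.

```python
from sklearn.dummy import DummyClassifier

# Toy, made-up data for illustration. DummyClassifier ignores the
# feature values and only looks at the training labels.
X_train = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y_train = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # class 0 is the majority
X_test = [[10], [11], [12], [13]]
y_test = [0, 0, 1, 0]

# Always predict the most frequent label seen during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)

baseline_predictions = baseline.predict(X_test)
baseline_accuracy = baseline.score(X_test, y_test)  # mean accuracy

print("Baseline predictions:", list(baseline_predictions))
print("Baseline accuracy:", baseline_accuracy)
```

Any real model trained on the same split should beat this number; swapping the strategy string for one of the other options listed above gives the other baselines.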

In the case of regression, a baseline model could be any of the following:

  • Median or average
  • Constant
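
A regression baseline of this kind can be sketched in a few lines of plain Python. The target values and the helper function below are hypothetical, chosen only to show how a mean baseline and a median baseline are scored; sklearn also ships a DummyRegressor for the same purpose.

```python
from statistics import mean, median

# Made-up target values for illustration; note the outlier (54.0).
y_train = [10.0, 12.0, 11.0, 13.0, 54.0]
y_test = [11.0, 12.0, 14.0]


def mean_absolute_error(y_true, prediction):
    """Average absolute error when predicting one constant for every row."""
    return sum(abs(y - prediction) for y in y_true) / len(y_true)


# Each baseline predicts a single statistic of the training targets.
mean_baseline = mean(y_train)
median_baseline = median(y_train)

mae_mean = mean_absolute_error(y_test, mean_baseline)
mae_median = mean_absolute_error(y_test, median_baseline)

print("Mean baseline:", mean_baseline, "MAE:", mae_mean)
print("Median baseline:", median_baseline, "MAE:", mae_median)
```

On this toy data the outlier pulls the mean away from most of the values, so the median baseline ends up with the lower error. Which statistic makes the fairer baseline depends on the data and on the metric you care about.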

Ideally, the machine learning model should perform substantially better than these simple statistical baselines. If it does not, the added complexity is hard to justify.

When a model has already been implemented, we can use its performance as the frame of reference: the existing model becomes the baseline for any new one.
