There are many metrics for measuring the performance of a machine learning model, and the right one depends on the type of machine learning you are conducting and on the problem you set out to solve. In this article, we take a look at performance measures for classification and regression models and discuss which ones are worth optimizing.

*Examples of metrics in Machine Learning: Source*

**Optimization for Classification Problems**

*Example of a classification chart: Source*

**1. True Positive Rate (Recall)**

The **True Positive Rate**, also called **Recall**, is the go-to performance measure in binary and multi-class classification problems. Most of the time, if not all of the time, we are mainly interested in correctly predicting one class. For example, if you were predicting diabetes, you would care more about correctly identifying a person who has diabetes than about correctly identifying a person who does not. In this situation, the positive class is, “This person has diabetes,” and the negative class is, “This person does not have diabetes.” Recall is simply the accuracy of predicting the positive class: the number of true positives divided by the number of actual positives, TP / (TP + FN).

**This is not the same as the Accuracy performance metric. See number 4 below for more details.**
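As a minimal sketch of the definition, recall can be computed directly from the true and predicted labels. The labels below are made-up illustration data (1 = has diabetes, 0 = does not), and the helper name `recall` is an assumption for this example, not a library function.

```python
def recall(y_true, y_pred, positive=1):
    """True Positive Rate: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    # Actual positives that the model also predicted as positive
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # Actual positives that the model missed
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Hypothetical data: 4 people have diabetes, 6 do not
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

print(recall(y_true, y_pred))  # 3 of the 4 actual positives were found
```

Note that the false positive in the data (a healthy person flagged as diabetic) does not affect recall at all; only the actual positives enter the calculation.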

**2. ROC Curve (Receiver Operating Characteristic Curve)**

An **ROC Curve** shows the performance of your classification model at different thresholds (the probability cutoff above which an example is classified into a certain class). It plots the True Positive Rate against the False Positive Rate. Lowering the threshold will increase your True Positive Rate, but at the cost of a higher False Positive Rate; raising the threshold does the opposite.
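The threshold trade-off can be sketched by computing one (False Positive Rate, True Positive Rate) point per threshold. The scores below are invented for illustration, and `tpr_fpr` is a hypothetical helper name; a real ROC curve would sweep many more thresholds.

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are labelled positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical predicted probabilities for 4 positives and 4 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Each threshold gives one point on the ROC curve
for threshold in (0.75, 0.5, 0.25):
    tpr, fpr = tpr_fpr(y_true, scores, threshold)
    print(f"threshold={threshold}: TPR={tpr}, FPR={fpr}")
```

On this data, moving the threshold from 0.75 down to 0.25 raises the True Positive Rate and the False Positive Rate together, which is the trade-off the curve visualizes.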

**3. AUC (Area Under the Curve)**

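AUC is also known as ‘Area Under the ROC Curve’. Simply put, the AUC tells you how well your model separates the classes: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example. An AUC of 0.5 is no better than random guessing, and a higher AUC represents a better model.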
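One way to make AUC concrete is its ranking interpretation: the fraction of (positive, negative) pairs in which the positive example gets the higher score. The sketch below computes it that way on invented scores; the function name `auc` is an assumption for this example, and ties are counted as half a win by convention.

```python
def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for 4 positives and 4 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(auc(y_true, scores))  # 13 of the 16 pairs are ordered correctly
```

This pairwise version is quadratic in the number of examples, so it is only meant to illustrate the idea; it does not depend on any particular threshold.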

**4. Accuracy**

Accuracy is, by default, the first thing most people look at. However, an experienced data scientist knows that Accuracy can be misleading. A better name for it would be the **Average Accuracy** of predicting all classes. As I mentioned above, the True Positive Rate is usually the more appropriate metric to optimize. Accuracy is the sum of the **True Positives** and **True Negatives** divided by the total number of predictions. In unbalanced classification problems, the **Negative class** is usually more heavily represented than the **Positive class**, so you are likely to have a very high **True Negative Rate**. The **Accuracy** will then be dominated by correct predictions of the **Negative class**, which might not interest anyone.
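A small, made-up example shows how this happens. Assume a data set with 5 positive and 95 negative examples and a useless model that always predicts the negative class; the numbers are invented purely for illustration.

```python
# Hypothetical unbalanced data: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the negative class
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)   # share of all predictions that are correct
recall = tp / (tp + fn)              # share of actual positives that were found

print(f"accuracy={accuracy}, recall={recall}")
```

The accuracy looks excellent because the 95 correct negative predictions dominate it, while the recall shows that not a single positive case was detected.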

**Regression Optimization in Machine Learning**

*Example of a regression chart: Source*

**5. Error**

Often overlooked next to R^{2}, the error tells us how closely the observed values fall to the regression line (i.e., the average distance between the observed values and the line of best fit). This is more important when calculating confidence and prediction intervals for your model. It is also more interpretable because it uses the natural units of the response variable, whereas R^{2} has no units and typically lies between 0 and 1.

There are different types of errors such as **Mean Absolute Error** and **Root Mean Squared Error**. Each has its own pros and cons and must be treated independently to assess a model.
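To illustrate one of those trade-offs, the sketch below computes both errors on the same invented predictions. The helper names `mae` and `rmse` are assumptions for this example. Because RMSE squares each error before averaging, a single large miss moves it much more than it moves MAE.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical values; the last prediction is off by 4 units
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 13.0]

print(f"MAE={mae(y_true, y_pred)}, RMSE={rmse(y_true, y_pred):.4f}")
```

Both results are in the units of the response variable, but here RMSE comes out noticeably larger than MAE because of the one large error.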

**6. R^{2}**

Now, although the error is important, R^{2} has become the de facto measure of a good regression model. It tells us how much of the variation in the dependent variable is explained by the independent variables in the model. A higher R^{2} generally indicates a better fit; however, a value that is very high, close to 99%, can sometimes be a sign of overfitting. R^{2} can also be misleading because of the **correlation vs causation** debate: variables that are correlated without any causal link can give an illogically high R^{2}.
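For reference, R^{2} is commonly computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below follows that definition on invented data; `r_squared` is a hypothetical helper name for this example.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Variation left unexplained by the model
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total variation of the response around its mean
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and two sets of predictions
y_true = [3.0, 5.0, 7.0, 9.0]
good_fit = [2.5, 5.5, 7.0, 9.0]
mean_only = [6.0, 6.0, 6.0, 6.0]  # always predicts the mean of y_true

print(r_squared(y_true, good_fit))
print(r_squared(y_true, mean_only))
```

A model that only ever predicts the mean explains none of the variation and scores 0, which is the baseline that any R^{2} value is implicitly compared against.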

**The Objective of the User Will Affect the Performance of the Model — So Choose Carefully**

Accuracy is not always the best measure in a classification problem, and R^{2} might not be the best for regression. They are both definitely the easiest to understand, especially for non-technical stakeholders (who are probably the biggest reason for building a model in the first place). The best possible approach may be to consider a variety of performance metrics and weigh them against your initial objective. The performance of a model is always subject to the objective of the user. What counts as poor performance from one person’s point of view might be perfectly acceptable from another’s.