This blog series discusses some of the most widely used concepts in the task of 'Classification' in Data Science.
Prerequisite: familiarity with basic Machine Learning terminology.
The Confusion Matrix (also known as a Contingency Table) plays an important role in assessing the strength of a classification model in Machine Learning.
The four numbers inside this Table:
1. True Positive
2. True Negative
3. False Positive
4. False Negative
can be helpful in telling your data story. One can easily plot a confusion matrix using the R package 'caret'.
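The four counts can also be tallied directly from the actual and predicted labels. The following is a minimal, hypothetical Python sketch (the post itself uses R's 'caret'; the function name and example data here are illustrative only):

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    return tp, tn, fp, fn

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # (3, 3, 1, 1)
```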
Looking at the Confusion Matrix alone, one can
calculate:
1. Precision
2. Recall
3. F1 Score
4. Accuracy
Figure 1: Confusion Matrix
'Precision' represents the 'exactness' of the classifier. It is also known as Positive Predictive Value (PPV). It tells us how likely a positive prediction is to be correct.
'Recall' represents the 'completeness' of the classifier. It is also known as Sensitivity or True Positive Rate. It tells us what percentage of the total positives the model catches.
'F1 score' is the Harmonic Mean of Precision and Recall.
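These three definitions translate directly into formulas over the four counts in the Confusion Matrix. A minimal sketch (the function names are illustrative):

```python
def precision(tp, fp):
    # Positive Predictive Value: correct positives / all predicted positives
    return tp / (tp + fp)

def recall(tp, fn):
    # True Positive Rate: correct positives / all actual positives
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    # Harmonic mean of precision and recall
    return 2 * p * r / (p + r)

# Example: TP=3, FP=1, FN=1 gives precision 0.75, recall 0.75, F1 0.75
print(f1_score(3, 1, 1))  # 0.75
```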
The 'Accuracy Paradox' is well known in the Data Science world: a model may look highly accurate yet be misleading in practice when the dataset is imbalanced. It is therefore imperative to check the distribution of the positive class and the negative class in the dataset.
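The paradox is easy to demonstrate with made-up numbers. Suppose only 1% of the examples are positive; a degenerate "classifier" that always predicts negative still scores 99% accuracy while catching none of the positives:

```python
# Imbalanced dataset: 990 negatives, 10 positives
actual    = [0] * 990 + [1] * 10
# A useless model that always predicts the negative class
predicted = [0] * 1000

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.99 -- yet recall on the positive class is 0
```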
Hence, exploring topics like the CAP (Cumulative Accuracy Profile) and the ROC (Receiver Operating Characteristic) curve is a vital step. Both visualizations are quite popular and are widely used to assess the discriminatory capability of a model.
The ROC curve can be plotted, and the AUC (Area Under the Curve) calculated, using the open-source R function roc.curve. Read more on ROC here.
The CAP can easily be visualized in an Excel sheet.
The CAP is a very powerful technique for improving the hit ratio of your marketing efforts within the predetermined budget, time, and manpower of the campaign.
Figure 2: Example for a CAP-Curve
AR = 2 × AUC − 1, where AR is the Accuracy Ratio of the CAP curve.
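This relation lets you read the CAP's Accuracy Ratio straight off the ROC's AUC: a random model (AUC = 0.5) has AR = 0, and a perfect model (AUC = 1) has AR = 1. A trivial illustrative snippet:

```python
def accuracy_ratio(auc):
    # Accuracy Ratio (Gini coefficient) of the CAP curve: AR = 2*AUC - 1
    return 2 * auc - 1

print(accuracy_ratio(0.5))   # 0.0 -- random model
print(accuracy_ratio(0.85))  # ~0.7
print(accuracy_ratio(1.0))   # 1.0 -- perfect model
```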