Understanding receiver operating characteristic (ROC) curves and ROC analysis

By Priya Chetty on September 18, 2018

Previous articles in this module on logistic regression and discriminant analysis explained how to classify a group of observations based on selected variables. Those articles predicted a binary outcome (in the case of logistic regression) and classified the observations accordingly (for example, student hired or not hired). The receiver operating characteristic (ROC) curve is an extension of such classifications: ROC analysis tests the performance of a binary classifier.

ROC is a graphical plot that shows the performance of the classifier at different threshold levels. For instance, suppose a group of students is classified as hired or not hired based on a probability score. The students with a probability above 0.75 get hired and the rest do not. Suppose that, at a threshold of 0.75, 70 out of 100 students get hired. When the threshold is raised from 0.75 to 0.80, only 50 out of 100 students get hired. Thus, as the threshold changes, the classification results change accordingly. The ROC curve summarizes how the classifier performs across all such thresholds.
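The effect of moving the threshold can be sketched in a few lines of Python. The ten probability scores below are made-up illustrative values, not data from the article, and the helper name count_hired is hypothetical.

```python
# Hypothetical probability scores for ten students (illustrative values only).
scores = [0.95, 0.90, 0.85, 0.82, 0.78, 0.76, 0.70, 0.60, 0.45, 0.30]


def count_hired(scores, threshold):
    """Count students whose score is strictly above the threshold."""
    return sum(1 for s in scores if s > threshold)


# Raising the threshold can only keep or reduce the number classified as hired.
hired_at_075 = count_hired(scores, 0.75)
hired_at_080 = count_hired(scores, 0.80)
print(hired_at_075, hired_at_080)
```

With these assumed scores, fewer students clear the 0.80 threshold than the 0.75 threshold, which mirrors the drop from 70 to 50 hires described above.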

Plotting the true positive rate against the false positive rate at different thresholds generates the ROC curve. In other words, the ROC curve shows the trade-off between ‘sensitivity’ and ‘specificity’ across a range of threshold values when predicting a dichotomous outcome. ‘Sensitivity’ (the true positive rate) is the proportion of actual positives that the system correctly predicts as positive. ‘Specificity’ (the true negative rate) is the proportion of actual negatives that the system correctly predicts as negative; the false positive rate, the proportion of actual negatives incorrectly predicted as positive, equals 1 - specificity. Sensitivity is measured on the Y-axis and the false positive rate (1 - specificity) on the X-axis. Thus, the higher the specificity of the system, meaning fewer false positives, the further the ROC curve lies towards the left. The ROC curve looks something like the figure below:

The blue curve represents the ROC curve. It bows towards the Y-axis and the top-left corner, indicating high sensitivity at a low false positive rate. A curve that lies above the diagonal line from (0, 0) to (1, 1) means the system performs better than random guessing at every threshold.
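To make the two axes concrete, here is a minimal sketch of how sensitivity, specificity and the false positive rate could be computed at a single threshold. The labels and scores are invented for illustration, the function name rates_at_threshold is hypothetical, and the convention that a score equal to the threshold counts as positive is an assumption.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (sensitivity, specificity, false_positive_rate) at one threshold.

    labels: 1 for an actual positive, 0 for an actual negative.
    Assumed convention: a case is predicted positive when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1  # actual positive, predicted positive
        elif y == 0 and predicted == 1:
            fp += 1  # actual negative, predicted positive
        elif y == 0 and predicted == 0:
            tn += 1  # actual negative, predicted negative
        else:
            fn += 1  # actual positive, predicted negative
    sensitivity = tp / (tp + fn)  # true positive rate (Y-axis)
    specificity = tn / (tn + fp)  # true negative rate
    false_positive_rate = fp / (fp + tn)  # 1 - specificity (X-axis)
    return sensitivity, specificity, false_positive_rate


# Hypothetical data: five actual positives and five actual negatives.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

sens, spec, fpr = rates_at_threshold(labels, scores, 0.5)
print(sens, spec, fpr)
```

Each threshold yields one (false positive rate, sensitivity) pair, which is one point on the ROC curve.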

Example case of using ROC analysis

Take the case of the test scores of 30 students. Based on the scores, classify the results as binary, with values ‘0’ for ‘fail’ and ‘1’ for ‘pass’. Now apply the ROC curve to this case. The ROC analysis will assess the data and test the results at different thresholds. For example, it can treat different scores such as 60, 55, 78 and 56 as thresholds and check how many students pass the test at each one.
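The threshold sweep described here can be sketched as follows. This is an illustration under stated assumptions, not the article's own procedure: it uses eight made-up students instead of 30, assumes each student's actual pass/fail outcome is recorded separately from the score, treats a score at or above the threshold as a predicted pass, and uses the hypothetical helper names roc_points and auc. The area under the curve is estimated with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs along the ROC curve.

    Every distinct score is tried as a threshold, from highest to lowest.
    Assumed convention: a student is predicted to pass when score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Starting point: a threshold above every score predicts no passes at all.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Estimate the area under the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test scores and actual outcomes (1 = pass, 0 = fail).
scores = [78, 70, 65, 60, 56, 55, 50, 40]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

An area close to 1 suggests the scores separate passes from fails well, while an area near 0.5 suggests performance no better than random guessing.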