# Understanding receiver operating characteristic (ROC) analysis

Previous articles in this module on logistic regression and discriminant analysis explained how to classify a group of observations based on selected variables. Those articles predicted a binary outcome for each observation, such as a student being hired or not hired. **Receiver operating characteristic (ROC)** analysis is an extension of such classifications: it tests the performance of a binary classifier system.

## Receiver operating characteristic curve

The **ROC** curve is a graphical plot that shows the performance of a classifier at different threshold levels. For instance, suppose a group of students is classified as hired or not hired based on a probability score. Students with a probability above 0.75 are classified as hired and the rest are not. Suppose that, at a threshold of 0.75, 70 out of 100 students are classified as hired. Raising the threshold from 0.75 to 0.80 reduces this to, say, 50 out of 100 students. Thus, as the threshold changes, the classification results change accordingly. The **ROC** curve summarises how the classifier performs across all such thresholds.
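The effect of moving the threshold can be illustrated with a minimal Python sketch. The probability scores below are hypothetical (ten students rather than the 100 in the example), and `count_hired` is an illustrative helper, not part of any library.

```python
# Hypothetical predicted probabilities of being hired for ten students.
probabilities = [0.95, 0.88, 0.82, 0.79, 0.77, 0.74, 0.66, 0.52, 0.40, 0.15]


def count_hired(probs, threshold):
    """Count students whose probability is strictly above the threshold."""
    return sum(1 for p in probs if p > threshold)


# Raising the threshold can only keep or reduce the number classified as hired.
for threshold in (0.75, 0.80):
    hired = count_hired(probabilities, threshold)
    print(f"threshold={threshold:.2f}: {hired} of {len(probabilities)} hired")
```

With these made-up scores, fewer students are classified as hired at 0.80 than at 0.75, which mirrors the drop from 70 to 50 in the example.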

## Using the receiver operating characteristic curve

Plotting the true positive rate against the false positive rate at different thresholds generates the ROC curve. The curve therefore shows the trade-off between ‘sensitivity’ and ‘specificity’ across a range of threshold values when predicting a dichotomous outcome. ‘Sensitivity’ (the true positive rate) is the proportion of actual positive cases that the system correctly predicts as positive. ‘Specificity’ (the true negative rate) is the proportion of actual negative cases that the system correctly predicts as negative. Sensitivity is measured on the Y-axis, while the X-axis measures 1 - specificity, which is the false positive rate: the proportion of actual negative cases incorrectly predicted as positive. Thus, the higher the sensitivity and specificity of the system, the closer the ROC curve lies to the top-left corner of the plot. The ROC curve looks something like the one below:

The blue curve represents the ROC curve. It bends towards the Y-axis and the top-left corner, which indicates high sensitivity at a low false positive rate (that is, at high specificity). A curve that stays above the diagonal line from (0, 0) to (1, 1) means that the system performs better than random guessing at every threshold.
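Sensitivity and specificity at a single threshold can be computed from the four confusion-matrix counts. The sketch below is a minimal illustration in plain Python. The labels and scores are hypothetical, the function names are illustrative, and it assumes that a score strictly above the threshold is predicted as positive.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score > threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1  # actual positive predicted as positive
        elif predicted == 1 and label == 0:
            fp += 1  # actual negative predicted as positive
        elif predicted == 0 and label == 0:
            tn += 1  # actual negative predicted as negative
        else:
            fn += 1  # actual positive predicted as negative
    return tp, fp, tn, fn


def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) at one threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical data: 1 = positive (e.g. hired), 0 = negative.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.75, 0.45, 0.25):
    sens, spec = sensitivity_specificity(labels, scores, threshold)
    # Each threshold gives one ROC point: x = 1 - specificity, y = sensitivity.
    print(f"threshold={threshold}: TPR={sens:.2f}, FPR={1 - spec:.2f}")
```

Each threshold contributes one point to the ROC curve. Lowering the threshold raises sensitivity at the cost of specificity.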

## Example case of using **ROC** analysis

Take the case of the test scores of 30 students. Based on the scores, the results are classified as binary, with the value ‘0’ for ‘fail’ and ‘1’ for ‘pass’. Now apply **ROC** analysis to this case. The analysis tests the results at different thresholds. For example, it can treat different scores, such as 60, 55, 78 and 56, as thresholds and check how many students get through the test at each one.

Using SPSS, run the **ROC** analysis on the above data of students. The **ROC** curve looks something like the one below. The curve bends towards the sensitivity axis (the Y-axis) rather than following the diagonal, which means that at the thresholds evaluated by the software, the true positive rate is higher than the false positive rate.
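Independently of SPSS, the threshold sweep described in this example can be sketched in plain Python. The scores and outcomes below are hypothetical (ten students rather than 30) and are not the data analysed above. The sketch assumes the pass/fail outcome is recorded separately from the score, and that a score at or above the threshold is predicted as a pass. `roc_point` is an illustrative helper name.

```python
def roc_point(outcomes, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    # A student is predicted to pass when the score is at or above the threshold.
    tp = sum(1 for y, s in zip(outcomes, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(outcomes, scores) if s >= threshold and y == 0)
    return fp / negatives, tp / positives


# Hypothetical test scores and actual outcomes (1 = pass, 0 = fail).
scores = [82, 78, 74, 66, 60, 58, 56, 55, 49, 41]
outcomes = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# The thresholds mentioned in the example, in ascending order.
for threshold in (55, 56, 60, 78):
    fpr, tpr = roc_point(outcomes, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.1f}, TPR={tpr:.1f}")
```

Plotting the resulting (FPR, TPR) pairs, together with the end points (0, 0) and (1, 1), traces out the ROC curve for this small data set.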

## Applications of **ROC** analysis

**ROC** analysis is used to assess the performance of predictive classification techniques. Therefore, wherever techniques like logistic regression, discriminant analysis, nearest neighbor or naïve Bayes are used, ROC analysis can be used to assess how well the model separates the two classes.
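When comparing models, one common single-number summary of a ROC curve is the area under the curve (AUC), which this article does not otherwise cover. The sketch below is a minimal illustration with hypothetical scores for two hypothetical models. It uses the rank-based interpretation of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case) and not the output of any particular software package.

```python
def auc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical true classes and predicted scores from two models.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
model_b = [0.6, 0.4, 0.7, 0.65, 0.3, 0.5, 0.2, 0.8]

print("AUC model A:", auc(labels, model_a))
print("AUC model B:", auc(labels, model_b))
```

An AUC near 1 corresponds to a curve close to the top-left corner, while an AUC near 0.5 corresponds to a curve close to the diagonal, so model A separates the classes better than model B on this made-up data.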

Software that supports ROC analysis includes R, SAS, MATLAB, STATA and SPSS. **ROC** analysis can be performed easily in any of these packages.
