Understanding receiver operating characteristic (ROC) analysis

Previous articles in this module on logistic regression and discriminant analysis explained how to classify a group of observations based on selected variables. Those articles predicted a binary outcome (in the case of logistic regression) and assigned observations to classes (such as student hired or not hired). Receiver operating characteristic (ROC) analysis extends such classifications: it tests the performance of a binary classifier.

Receiver operating characteristic curve

The ROC curve is a graphical plot that shows the performance of a classifier at different threshold levels. For instance, suppose a group of students is classified as hired or not hired based on a probability score. Students with a probability above 0.75 are classified as hired and the rest are not. Suppose that, at a threshold of 0.75, 70 out of 100 students are classified as hired. Raising the threshold from 0.75 to 0.80 reduces this to 50 out of 100 students. Thus, as the threshold changes, the classification results change accordingly. The ROC curve summarises how the classifier performs across all such thresholds.
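The effect of moving the threshold can be sketched in a few lines of Python. The probability scores below are hypothetical values chosen only for illustration (they are not the 100 students from the example), and `count_hired` is a helper name introduced here.

```python
# Hypothetical probability scores for ten students (illustrative values only).
scores = [0.92, 0.85, 0.81, 0.79, 0.77, 0.74, 0.66, 0.58, 0.41, 0.30]


def count_hired(probabilities, threshold):
    """Count students whose probability is strictly above the threshold."""
    return sum(1 for p in probabilities if p > threshold)


# Raising the threshold can only keep or reduce the number classified as hired.
hired_at_075 = count_hired(scores, 0.75)
hired_at_080 = count_hired(scores, 0.80)
print(hired_at_075, hired_at_080)  # prints: 5 3
```

With these made-up scores, five students clear the 0.75 threshold and only three clear 0.80, mirroring the drop described in the example.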

Using the ROC curve

The ROC curve is generated by plotting the true positive rate against the false positive rate at different thresholds. It therefore shows the trade-off between ‘sensitivity’ and ‘specificity’ across a range of threshold values when predicting a dichotomous outcome. ‘Sensitivity’ (the true positive rate) is the proportion of actual positives that the system correctly predicts as positive. ‘Specificity’ (the true negative rate) is the proportion of actual negatives that the system correctly predicts as negative; the false positive rate equals 1 - specificity. Sensitivity is plotted on the Y-axis and 1 - specificity on the X-axis. Thus, the more specific the system is at a given level of sensitivity, the further the ROC curve lies towards the left. A typical ROC curve looks like the one below:

Figure 1: ROC curve after plotting the true positive rate against the false positive rate

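The blue curve represents the ROC curve, which bows towards the Y-axis and the top-left corner of the plot. This indicates that the system achieves high sensitivity while keeping the false positive rate low. A curve that lies above the diagonal has a true positive rate higher than its false positive rate at every threshold, so the system performs better than random classification.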
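As a minimal sketch of how a single point on the ROC curve is obtained, the Python snippet below computes sensitivity, specificity and the false positive rate at one threshold. The labels and scores are hypothetical, the function names are introduced here for illustration, and the rule "predict positive when the score is above the threshold" is an assumption.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), predicting positive when score > threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 1 = positive (e.g. hired), 0 = negative.
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(labels, scores, 0.5)
false_positive_rate = 1 - spec  # X-axis value of the ROC point
print(sens, spec)  # prints: 0.8 0.8
```

The pair (false positive rate, sensitivity) is one point on the curve; repeating the calculation at other thresholds traces out the rest.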

Example case of using ROC analysis

Take the case of the test scores of 30 students. Each student's result is coded as binary, with ‘0’ for ‘fail’ and ‘1’ for ‘pass’. Now apply ROC analysis to this case. The analysis tests the results at different thresholds: it treats different scores, such as 60, 55, 78 and 56, as the threshold and checks how many students get through the test at each one.
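The threshold sweep described above can be sketched in Python as follows. The ten scores and pass/fail outcomes are hypothetical stand-ins (not the 30 students analysed in this article), `roc_points` is a name introduced here, and the rule "a score at or above the threshold counts as a predicted pass" is an assumption.

```python
def roc_points(labels, scores, thresholds):
    """Return (threshold, false positive rate, true positive rate) tuples,
    ordered from the highest threshold to the lowest."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(thresholds, reverse=True):
        # Predict "pass" when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points


# Hypothetical test scores and actual outcomes (1 = pass, 0 = fail).
test_scores = [82, 78, 74, 69, 63, 60, 57, 55, 48, 40]
passed = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(passed, test_scores, [60, 55, 78, 56])
for threshold, fpr, tpr in points:
    print(f"threshold={threshold}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the true positive rate against the false positive rate for these points gives a coarse ROC curve; statistical software typically uses every distinct score as a threshold rather than a hand-picked few.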

Figure 2: Results for test scores of 30 students using ROC analysis

Using SPSS, run the ROC analysis on the above student data. The resulting ROC curve is shown below. The curve is tilted more towards sensitivity than specificity, lying towards the top-left corner of the plot, which means that, across the thresholds selected by the system, the true positive rate is high relative to the false positive rate.

Figure 3: Receiver operating curve for the test case using SPSS software

Applications of ROC analysis

ROC analysis is used to assess the performance of predictive classification techniques. Therefore, wherever techniques such as logistic regression, discriminant analysis, nearest neighbour or naïve Bayes are used for binary classification, ROC analysis can be used to assess how well the model separates the two classes.

Software that supports ROC analysis includes R, SAS, MATLAB, Stata and SPSS. In each of these, an ROC analysis needs only the observed binary outcome and the predicted score or probability for each observation.

Priya Chetty

Partner at Project Guru
Priya is a master in business administration with majors in marketing and finance. She is fluent with data modelling, time series analysis, various regression models, forecasting and interpretation of the data. She has assisted data scientists, corporates, scholars in the field of finance, banking, economics and marketing.
