How to conduct survival analysis?

Survival analysis is a predictive modeling method in which the dependent variable is time. It is therefore a form of time-to-event modeling: the outcome variable is the time until a certain event occurs. This outcome is known as the survival time, failure time or event time. Survival data is generally continuous. For example, survival analysis applies to the event of a person having a heart attack. The analysis helps ascertain the time period (in days, weeks, years, etc.) until a person has a heart attack.

In survival analysis, the subjects (such as people at risk of a heart attack) are observed over a certain period of time. The focus is on the time at which the event of interest occurs. Another example is estimating the period until employee attrition. Survival analysis can help estimate the probable time when an employee will leave the company.

Difference between survival analysis and regression

Ordinary regression is not suitable for survival data for two major reasons: the non-normality of the outcome and the censoring of information. Censoring means that the information about a subject's survival time is incomplete. For example, a patient under observation in a study for 30 days does not experience a heart attack during that period. In this case, the patient's survival time is termed ‘right censored’. Another example of right censoring is a person who drops out of the study prematurely without having experienced the event. This person’s survival time is said to be ‘censored’, since the event of interest did not happen while the person was under observation. Although censoring implies that information is missing from the data, accounting for it avoids bias in survival analysis. In addition, survival times are always positive numbers, and it is difficult in ordinary regression to restrict the predictions to positive values.
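The effect of censoring on bias can be illustrated with a small sketch in Python. The six records, the function names and the constant-hazard (exponential) assumption used for the second estimate are all hypothetical choices made for illustration; they are not part of the case study.

```python
# Hypothetical follow-up records: (days observed, event observed?)
# True  -> the heart attack occurred on that day
# False -> right censored (the study ended or the patient dropped out)
records = [(12, True), (30, False), (7, True), (30, False), (18, False), (25, True)]


def mean_time_dropping_censored(records):
    """Average time to event using only patients who had the event."""
    times = [days for days, event in records if event]
    return sum(times) / len(times)


def mean_time_constant_hazard(records):
    """Average time to event when censored follow-up still counts as time at risk.

    Assumes a constant event rate over time, which is a simplifying assumption.
    """
    total_days_at_risk = sum(days for days, _ in records)
    events = sum(1 for _, event in records if event)
    return total_days_at_risk / events


biased = mean_time_dropping_censored(records)
adjusted = mean_time_constant_hazard(records)
```

Dropping the censored patients discards the 78 days during which they were observed without a heart attack, so `biased` comes out well below `adjusted`. Survival methods keep those patients in the calculation for as long as they were under observation.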

Example of survival analysis

This section explains the process of conducting survival analysis through a case study. The dataset covers 30 patients suffering from heart disease. A healthcare institute developed three drugs for heart patients, namely Drug A, Drug B and Drug C. The researcher needs to test whether these three drugs are able to cure the heart disease in the 30 patients, and how long this takes. The data thus comprises three variables:

  1. Time: the time period for which each patient took the drug,
  2. Treatment: which of the three drugs the patient took and
  3. Status: whether the patient no longer suffers from the disease after taking the drug (Drug A, Drug B or Drug C) for the given time period, such as 10 months for the 1st patient.

Figure 1: Dataset for survival analysis
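One way to picture the three variables is as one record per patient. The sketch below is a minimal illustration in Python; the class name `PatientRecord`, the field names and the six sample rows are hypothetical and are not the values shown in Figure 1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PatientRecord:
    time_months: int  # Time: months for which the patient took the drug
    treatment: str    # Treatment: "A", "B" or "C"
    cured: bool       # Status: True if cured in that time, False if censored


# Illustrative values only, not the 30 records of the case study
sample = [
    PatientRecord(10, "A", True),
    PatientRecord(4, "A", False),
    PatientRecord(7, "B", True),
    PatientRecord(9, "B", True),
    PatientRecord(3, "C", True),
    PatientRecord(9, "C", False),
]


def summarize(records):
    """Return {treatment: (number of patients, number of observed cures)}."""
    patients = Counter(r.treatment for r in records)
    cures = Counter(r.treatment for r in records if r.cured)
    return {drug: (patients[drug], cures[drug]) for drug in sorted(patients)}


summary = summarize(sample)
```

A quick summary like this is a useful check before the analysis, since a treatment group with very few observed events gives an unreliable survival curve.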

Step 1

Here, survival analysis helps investigate the time patients take to be cured of the disease under the treatments described above. Figure 2 below presents the results from SPSS. It consists of three curves representing the three drugs (Drug A, Drug B and Drug C). The horizontal axis shows the time for which patients took the medicines. The curves start at 1 and decline to zero, showing how the share of patients not yet cured falls over the months of treatment. For instance, as shown in the figure below, patients taking Drug A and Drug B (blue and green curves) are cured within 10 months, whereas patients taking Drug C (yellow curve) are cured within 9 months.

Figure 2: Survival Analysis Curve in SPSS
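Survival curves of this kind are commonly produced with the Kaplan-Meier estimator. The article does not state which SPSS procedure was used, so the following Python sketch is offered under that assumption. The function name `kaplan_meier` and the six data points are hypothetical; an "event" here corresponds to a patient being cured.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time of each subject
    events -- True if the event was observed, False if right censored
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Subjects whose event happens exactly at time t
        occurred = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - occurred / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical months of treatment for six patients on one drug
times = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)
```

Each step multiplies the running probability by the fraction of at-risk patients who did not have the event at that time. Censored patients count in the at-risk group until they leave, which is how the curve uses their partial information. Running the function once per drug gives one curve per treatment.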

Step 2

Next, investigate whether the results of the three drugs are similar or different. The survival analysis output also contains the table below, which reports three test statistics. The null hypothesis is that the survival distributions of the three drugs are equal. Check all three p values (as indicated in the figure below). If a p value is less than 0.05, the null hypothesis is rejected; if it is greater than 0.05, the null hypothesis cannot be rejected. Since in this case all the p values are far greater than 0.05 (5%), the null hypothesis cannot be rejected. Therefore the conclusion is that there is no evidence of a difference between the three drugs; they have similar effects.

Figure 3: Test of equality of survival distributions for the different levels of treatment
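A widely used test of equality of survival distributions is the log-rank test. The Python sketch below is a simplified two-group version, so it would compare the drugs pairwise rather than all three levels jointly as in the SPSS table. The function name `logrank_test` and the sample data are hypothetical, and the p value relies on the usual chi-square approximation with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)  # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)  # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # events expected in A if the groups are equal
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical months to cure for two drugs, all events observed
drug_a = [1, 2, 3, 4, 5]
drug_b = [6, 7, 8, 9, 10]
observed = [True] * 5
chi_square, p_value = logrank_test(drug_a, observed, drug_b, observed)
decision = "reject" if p_value < 0.05 else "cannot reject"
```

The decision rule matches the one used in this step: a p value below 0.05 leads to rejecting the null hypothesis of equal survival distributions, and a larger p value means it cannot be rejected.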

Application of survival analysis

Survival analysis is important for studying how long it takes for events to occur, including events that recur. It is therefore applicable in the social sciences, medical research and any other field where the timing of such events matters.

Software supporting survival analysis

Software packages that support survival analysis include SAS, MATLAB, Stata and SPSS.

Indra Giri

Senior Analyst at Project Guru
He completed his Master's in Development Economics at South Asian University, New Delhi. His areas of interest include various socio-development issues such as poverty, inequality and unemployment in South Asia. Apart from writing for Project Guru, he loves to travel and play football in his spare time.