How to conduct a survival analysis?

By Indra Giri & Priya Chetty on October 30, 2017

Survival analysis is a predictive modeling method in which the dependent variable is time. More precisely, the outcome variable is the time until a certain event occurs, which is why the method is also called time-to-event modeling. This outcome is known as the survival time, failure time or event time, and it is generally continuous. For example, survival analysis can be applied to the event of a person having a heart attack: the analysis helps ascertain the time (in days, weeks, years, etc.) until the person has a heart attack.

In survival analysis, the subjects (such as persons at risk of a heart attack) are observed over a certain period of time. The focus is on the time at which the event of interest occurs. Another example is employee attrition: survival analysis can help estimate the probable time at which an employee will leave the company.

Difference between survival analysis and regression

Ordinary regression is not suitable for modeling survival times for two major reasons: the outcome is usually not normally distributed, and the information is often censored. Censoring means that the survival time of a subject is only partly known. For example, a patient who is observed in a study for 30 days and does not experience a heart attack during that period is termed ‘right censored’: the event, if it happens at all, happens after the last observation. Another example of right censoring is a person who drops out of the study prematurely without having experienced the event. This person’s survival time is said to be ‘censored’, since the event of interest did not happen while the person was under observation. Although censoring means that some information is missing, keeping censored subjects in the analysis instead of discarding them avoids bias. In addition, survival times cannot be negative, and it is difficult to restrict the predictions of an ordinary regression to positive numbers only.
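
The way censoring is recorded can be sketched in a few lines of Python. This is only an illustration of the idea described above: the function name `observe`, its parameters and the example values are hypothetical and are not part of the case study.

```python
from typing import Optional, Tuple


def observe(event_day: Optional[float], follow_up_days: float) -> Tuple[float, int]:
    """Return (observed_time, status) for one subject.

    status = 1: the event was observed during follow-up.
    status = 0: the subject is right censored at the end of follow-up.
    event_day is None when no event is ever recorded for the subject.
    """
    if event_day is not None and event_day <= follow_up_days:
        return (event_day, 1)
    return (follow_up_days, 0)


# Followed for 30 days with no heart attack: right censored at day 30.
no_event = observe(None, 30)
# Heart attack on day 12 of a 30-day study: event observed.
event = observe(12, 30)
# Dropped out on day 18 without the event: right censored at day 18.
dropout = observe(None, 18)

print(no_event, event, dropout)
```

Each subject is reduced to a pair of values, the observed time and a status flag, which is the same layout as the Time and Status variables used in the case study.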

Example of survival analysis

This section explains the process of conducting survival analysis through a case study. The dataset covers 30 patients who are suffering from a heart disease. A healthcare institute has developed three drugs for heart patients, namely Drug A, Drug B and Drug C. The researcher needs to test whether these three drugs cure the disease in the 30 patients, and how long the cure takes. The data thus comprises three variables:

  1. Time: the period for which each patient took the drug,
  2. Treatment: the drug taken by the patient (Drug A, Drug B or Drug C), and
  3. Status: whether the patient no longer suffers from the disease after taking the drug for the given time period (for example, 10 months for the 1st patient).
Figure 1: Dataset for survival analysis
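
For readers working outside SPSS, the same three variables can be held as plain records. The sketch below uses Python; the rows are hypothetical placeholders (only the variable names and the 10-month value for the first patient come from the description above), and status 1 is assumed to mean "cured".

```python
from collections import Counter

# Hypothetical rows: time in months, treatment, status (1 = cured, 0 = censored).
patients = [
    {"time": 10, "treatment": "Drug A", "status": 1},
    {"time": 7, "treatment": "Drug A", "status": 0},
    {"time": 8, "treatment": "Drug B", "status": 1},
    {"time": 10, "treatment": "Drug B", "status": 1},
    {"time": 5, "treatment": "Drug C", "status": 1},
    {"time": 9, "treatment": "Drug C", "status": 0},
]


def summarize(rows):
    """Count patients, events and censored observations per treatment."""
    total = Counter()
    events = Counter()
    for row in rows:
        total[row["treatment"]] += 1
        events[row["treatment"]] += row["status"]
    return {
        drug: {
            "n": total[drug],
            "events": events[drug],
            "censored": total[drug] - events[drug],
        }
        for drug in sorted(total)
    }


summary = summarize(patients)
for drug, counts in summary.items():
    print(drug, counts)
```

A quick count of events and censored cases per treatment is a useful check before any curve is estimated.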

Step 1

Here, survival analysis helps investigate the time taken by the patients to be cured of the disease under the treatments described above. Figure 2 below presents the results from SPSS software. It consists of three curves, one for each drug (Drug A, Drug B and Drug C). The horizontal axis shows the time, in months, for which patients took the drugs. Each curve starts at 1 and falls towards zero, showing the proportion of patients who are still not cured as the months pass. For instance, as shown in the figure below, patients taking Drug A and Drug B (blue and green curves) are cured within 10 months, whereas patients taking Drug C (yellow curve) are cured within 9 months.

Figure 2: Survival Analysis Curve in SPSS
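
Step curves of this kind are commonly Kaplan-Meier estimates. The article does not name the estimator that SPSS used, so treating it as Kaplan-Meier is an assumption. The following Python sketch shows how such a curve can be computed for one treatment group; the function name and the data are hypothetical.

```python
def kaplan_meier(times, statuses):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient.
    statuses: 1 = event observed (cured), 0 = right censored.
    """
    data = sorted(zip(times, statuses))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Collect every patient whose follow-up time equals t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events > 0:
            # Multiply by the share of at-risk patients without the event at t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical months of follow-up for one treatment group.
times = [2, 4, 4, 6, 9, 10]
statuses = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, statuses)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

The curve only drops at months where a cure is recorded. Censored patients do not cause a drop, but they shrink the number at risk for later months.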

Step 2

Next, investigate whether the results of the three drugs are similar or different. The survival analysis output also contains the table below, which reports three test statistics. Check all three p values (as indicated in the figure below) against 0.05. The null hypothesis is that all three drugs have the same survival distribution, that is, similar effects. If a p value is less than 0.05, the null hypothesis is rejected; if it is greater than 0.05, the null hypothesis cannot be rejected. Since in this case all the p values are much greater than 0.05 (5%), the null hypothesis cannot be rejected. Therefore, the data give no evidence that the three drugs differ, and they can be treated as having similar effects.

Figure 3: Test of equality of survival distributions for the different levels of treatment
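
One widely used test of equality of survival distributions is the log-rank test. Whether it is one of the three tests in the SPSS table is an assumption, since the article does not list them. The Python sketch below implements the two-group version only, for example to compare Drug A with Drug C; comparing all three drugs at once needs the multi-group form with 2 degrees of freedom. The function name and the data are hypothetical.

```python
import math


def logrank_two_groups(times_a, status_a, times_b, status_b):
    """Two-group log-rank test; returns (chi_square, p_value), 1 degree of freedom."""
    pooled = [(t, s, 0) for t, s in zip(times_a, status_a)]
    pooled += [(t, s, 1) for t, s in zip(times_b, status_b)]
    event_times = sorted({t for t, s, _ in pooled if s == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)  # at risk overall
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk in A
        d = sum(1 for u, s, _ in pooled if u == t and s == 1)  # events overall
        d_a = sum(1 for u, s, g in pooled if u == t and s == 1 and g == 0)
        # Observed events in group A minus the number expected under H0.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical months of follow-up; status 1 = cured, 0 = censored.
drug_a_times, drug_a_status = [3, 5, 7, 10, 10], [1, 1, 0, 1, 1]
drug_c_times, drug_c_status = [2, 4, 6, 9, 9], [1, 1, 1, 0, 1]

chi_square, p_value = logrank_two_groups(
    drug_a_times, drug_a_status, drug_c_times, drug_c_status
)
decision = "reject H0" if p_value < 0.05 else "cannot reject H0"
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}: {decision}")
```

The decision rule in the last lines mirrors the one used for the SPSS table: a p value below 0.05 leads to rejecting the hypothesis of equal survival distributions, and a larger p value does not.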

Application of survival analysis

Survival analysis is important for studying the time until an event occurs, including events that recur. It is therefore applicable in the social sciences, medical research and any other field that deals with such events.

Software packages that support survival analysis include SAS, MATLAB, STATA and SPSS.

Priya is the co-founder and Managing Partner of Project Guru, a research and analytics firm based in Gurgaon. She is responsible for the human resource planning and operations functions. Her expertise in analytics has been used in a number of service-based industries like education and financial services.

Her foundational education is from St. Xaviers High School (Mumbai). She also holds an MBA degree in Marketing and Finance from the Indian Institute of Planning and Management, Delhi (2008).

Some of the notable projects she has worked on include:

  • Using systems thinking to improve sustainability in operations: A study carried out in Malaysia in partnership with Universiti Kuala Lumpur.
  • Assessing customer satisfaction with in-house doctors of Jiva Ayurveda (a project executed for the company)
  • Predicting the potential impact of green hydrogen microgrids (a project executed for the Government of South Africa)

She is a key contributor to the in-house research platform Knowledge Tank.

She currently holds over 300 citations from her contributions to the platform.

She has also been a guest speaker at various institutes such as JIMS (Delhi), BPIT (Delhi), and SVU (Tirupati).