Statistical analysis tools fall into two types: descriptive statistics and inferential statistics. This article introduces the tools used in both.
Descriptive statistics tools are used to summarize the information about the variables in the data set. The most common tools available are:
- Standard deviation.
- Inter-quartile range, etc.
Each of these tools has been discussed in my previous article.
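As a quick illustration of the two tools listed above, the following is a minimal sketch using Python's standard `statistics` module. The function name `summarize_spread` and the data values are hypothetical, and the quartiles use the module's default ("exclusive") method, which may differ slightly from other software.

```python
import statistics


def summarize_spread(values):
    """Return the sample standard deviation and inter-quartile range."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "stdev": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "iqr": q3 - q1,                     # spread of the middle 50% of the data
    }


# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(summarize_spread(data))
```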
Inferential statistics are procedures that allow researchers to infer or generalize observations made on samples to the larger population from which the samples were drawn. Descriptive statistics stay local to the sample, describing its central tendency and variability, whereas inferential statistics focus on making statements about the population.
Hypothesis testing is closely tied to inferential statistics: the researcher seeks to determine whether the sample characteristics observed during statistical testing deviate from the null hypothesis enough to justify rejecting it. To test a hypothesis, the researcher first needs to define the statistical model that describes the behaviour of the data and the population parameter to be tested. Most of the commonly used statistical models are parametric, that is, they assume the data follow a normal distribution.
Non-parametric models also exist, such as:
- Wilcoxon rank sum test.
- Wilcoxon signed rank test.
- Kruskal Wallis test.
These will be discussed in future articles. Once the model is decided, the test statistic is computed from the data, and its value helps in deciding whether to reject or fail to reject the null hypothesis.
Statistical formulas used in Hypothesis testing
1. t-Test: This test is conducted to compare the means of two samples, even if they have different numbers of replicates. To conduct a t-test, the values of samples 1 and 2 are listed and their means are calculated. Next, the standard deviations of both samples are calculated, followed by the variance. The t-value is then the difference between the two sample means divided by the standard error (SE) of that difference:
$$t = \frac{x_1 - x_2}{\mathrm{SE}(x_1 - x_2)}$$
where x1 and x2 are the means of samples 1 and 2.
The tabulated t-value is then looked up for the appropriate degrees of freedom and level of significance (p = 0.05). If the calculated t-value exceeds the tabulated value, we may say that the means are significantly different at that level of probability. A significant difference means that we can reject the null hypothesis that the means are equal, with roughly a 5% chance of being wrong in doing so.
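The t-test procedure can be sketched in Python as follows. This is a minimal sketch, not necessarily the exact formula the article intends: it assumes the unequal-variance (Welch) form of the standard error and the Welch-Satterthwaite approximation for the degrees of freedom. The function name `welch_t` and the sample values are hypothetical.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t, df) for two independent samples, assuming Welch's form."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Sample variances (n - 1 in the denominator).
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Each sample's contribution to the variance of the difference in means.
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom (assumption).
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples with different numbers of replicates.
t_value, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(f"t = {t_value:.3f}, df = {df:.1f}")
```

The absolute value of `t_value` would then be compared with the tabulated t-value for the nearest degrees of freedom at the chosen level of significance.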
2. Chi-square Test (χ2): The chi-square test is conducted to determine whether there is a significant difference between the expected and observed frequencies in one or more categories. It tests the null hypothesis, which states that there is no significant difference between the expected and observed results. The value of chi-square is calculated using the following formula:
$$\chi^2_{df} = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency in each category,
E is the expected frequency in the corresponding category,
df is the degrees of freedom (n - 1), and
χ2 is the chi-square statistic.
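The chi-square calculation can be sketched in a few lines of Python. This is a minimal sketch that assumes a simple goodness-of-fit setting, with n taken as the number of categories; the function name `chi_square` and the frequencies are hypothetical.

```python
def chi_square(observed, expected):
    """Return (chi2, df) for observed vs. expected category frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Sum (O - E)^2 / E over all categories.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: number of categories minus one (assumption).
    df = len(observed) - 1
    return chi2, df


# Hypothetical frequencies for three categories.
observed = [50, 30, 20]
expected = [40, 40, 20]
chi2, df = chi_square(observed, expected)
print(chi2, df)  # 5.0 2
```

With these made-up numbers, χ2 = 5.0 at 2 degrees of freedom, which is below the tabulated value of 5.991 at the 0.05 level, so the null hypothesis would not be rejected.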
3. Analysis of variance (ANOVA): Analysis of variance is similar to regression in that it is used to investigate and model the relationship between a response variable and one or more independent variables. There are two types of ANOVA tests:
- One way ANOVA test.
- Two way ANOVA test.
The one way ANOVA tests the equality of population means when the classification is by one variable, and the two way ANOVA tests the equality of population means when the classification is by two variables.
The test statistic is the F-ratio:
F = Differences among means / Error variance within groups.
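For the one way case, the F-ratio can be sketched in Python as below. This is a minimal sketch under the usual one way ANOVA definitions (mean square between groups divided by mean square within groups); the function name `one_way_anova_f` and the group values are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return the F-ratio for a list of groups (one way classification)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Differences among means: between-group sum of squares.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Error variation: within-group sum of squares.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (k - 1)       # df between = k - 1
    ms_within = ss_within / (n_total - k)   # df within = N - k
    return ms_between / ms_within


# Hypothetical data: three groups of three observations each.
f_ratio = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_ratio:.2f}")
```

The resulting F-ratio would be compared with the tabulated F-value for (k - 1, N - k) degrees of freedom at the chosen level of significance.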