All articles by Prateek Sharma

How to use K-Nearest Neighbor (KNN) algorithm on a dataset?

K-Nearest Neighbor (KNN) is an algorithm that predicts the properties of a new data point from the properties of the existing data points closest to it. KNN is applicable to both classification and regression predictive problems. It is a simple non-parametric method: it does not build an explicit internal model during training and makes no assumptions about the underlying distribution of the data. To classify a new data point, it takes a majority vote among the labels of its k nearest neighbors; for regression, it averages their values.
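A minimal sketch of KNN in plain Python, assuming numeric features, Euclidean distance and a small in-memory training set. The function names `knn_classify` and `knn_regress` and the toy points are hypothetical, made up for illustration.

```python
import math
from collections import Counter

def nearest_indices(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    # Pair each distance with its index; sorting breaks distance ties by index.
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in dists[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbours."""
    votes = Counter(train_y[i] for i in nearest_indices(train_X, query, k))
    # most_common(1) returns the label with the most votes
    # (ties go to the label first encountered among the neighbours).
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: average target value of the k nearest neighbours."""
    idx = nearest_indices(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Hypothetical toy data: two well-separated clusters.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
targets = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

print(knn_classify(train_X, labels, (1.5, 1.5)))  # A
print(knn_classify(train_X, labels, (6.5, 6.2)))  # B
print(knn_regress(train_X, targets, (6.5, 6.5)))
```

The choice of k matters: a small k follows noise in the training data, while a large k smooths over genuine local structure. An odd k avoids tied votes in two-class problems.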

How to use an instrumental variable?

An instrumental variable is a third variable used to estimate causal relationships in regression analysis when an endogenous variable is present. Instrumental variables are useful when an independent variable in the regression model is correlated with the model's error term. Such endogenous regressors are a major complication in econometrics because they make ordinary least squares parameter estimates inconsistent. A valid instrument must be correlated with the endogenous regressor but uncorrelated with the error term.
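With one instrument z and one endogenous regressor x, the instrumental-variable estimate of the slope is cov(z, y) / cov(z, x), which is what two-stage least squares reduces to in this just-identified case. The sketch below uses a tiny hand-built (hypothetical) dataset in which the true slope is 2, x is correlated with the error u, and z is uncorrelated with u, and compares the IV estimate with ordinary least squares.

```python
def cross_dev(a, b):
    """Sum of products of deviations from the mean.

    Equals (n - 1) * sample covariance; the 1/(n - 1) factors cancel in
    each ratio below, so they are omitted.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))

def ols_slope(x, y):
    """OLS slope of y on x (with intercept): cov(x, y) / var(x)."""
    return cross_dev(x, y) / cross_dev(x, x)

def iv_slope(z, x, y):
    """Simple IV estimator with a single instrument z: cov(z, y) / cov(z, x)."""
    return cross_dev(z, y) / cross_dev(z, x)

# Hypothetical data built so that y = 2*x + u exactly.
z = [1, -1, 1, -1]                          # instrument: drives x, uncorrelated with u
u = [1, 1, -1, -1]                          # unobserved error term
x = [zi + ui for zi, ui in zip(z, u)]       # endogenous: x depends on u
y = [2 * xi + ui for xi, ui in zip(x, u)]

print(ols_slope(x, y))    # 2.5 (biased, because x is correlated with u)
print(iv_slope(z, x, y))  # 2.0 (recovers the true slope)
```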

How to perform LASSO regression test?

In statistics, the Least Absolute Shrinkage and Selection Operator (LASSO) is a popular method for improving the prediction accuracy and interpretability of a model. It is a regression procedure that performs both variable selection and regularisation, introduced under that name by Robert Tibshirani in 1996. LASSO regression is an extension of linear regression that uses shrinkage: it constrains the sum of the absolute values of the model parameters to be no larger than a specified constant. This constraint shrinks the regression coefficients towards zero (hence ‘shrinkage’) and sets some of them exactly to zero, which is why LASSO performs automatic feature or variable selection. It is also useful when predictors are highly correlated, a situation in which standard regression often produces large, unstable regression coefficients.
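LASSO is usually fitted numerically. The sketch below uses cyclic coordinate descent with soft-thresholding, one common way of solving the penalised form of the problem, minimise (1/(2n))·||y − Xβ||² + λ·||β||₁, which corresponds to the constraint form described above for a suitable λ. It assumes centred data with no intercept, plain Python lists and a fixed number of sweeps instead of a convergence check; the function names and toy data are illustrative, not taken from any library.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    Assumes centred columns and no intercept term.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (fit without feature j).
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta

# Hypothetical toy data with orthogonal columns; y = 2*x1 + 1*x2 exactly.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3, 1, -1, -3]

for lam in (0.0, 1.5, 3.0):
    print(lam, lasso_cd(X, y, lam))
# 0.0 [2.0, 1.0]
# 1.5 [0.5, 0.0]
# 3.0 [0.0, 0.0]
```

With λ = 0 the sketch returns the ordinary least-squares coefficients; as λ grows, the first coefficient shrinks and the second is set exactly to zero, which is the selection behaviour described above.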

How to apply missing data imputation?

Missing data is one of the most common problems in statistical analysis. If values are not available for some observations of the variables in the model, the dataset has ‘missing data’. Missing data occur in almost all research and are a common problem in most scientific domains, such as biology and medicine. If missing values are not treated properly, they complicate the handling and analysis of the data and can bias the results.
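As one simple illustration, the sketch below applies mean imputation, replacing each missing value in a numeric column with the mean of that column's observed values. Representing missing entries as `None` and storing the table as a dict of column lists are assumptions of this example. Mean imputation is only the most basic option: it understates the variability of the variable and ignores its relationships with other variables, so methods such as regression or multiple imputation are often preferred in practice.

```python
def mean_impute(column):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def mean_impute_table(table):
    """Apply mean imputation to every column of a dict-of-lists table."""
    return {name: mean_impute(values) for name, values in table.items()}

# Hypothetical dataset with gaps.
data = {
    "age": [34, None, 29, 41],
    "weight_kg": [70.0, 82.5, None, None],
}
print(mean_impute_table(data))
```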

Markov chain and its use in solving real world problems

Markov chains generalise independent trials processes, which underlie much of classical probability theory and statistics. Two principal theorems for independent trials processes are the ‘Law of Large Numbers’ and the ‘Central Limit Theorem’. When probability experiments form an independent trials process, every experiment has the same set of possible outcomes, each occurring with the same probability, and the outcomes of earlier experiments do not influence later ones. A Markov chain relaxes this: it is a process in which the outcome of a given experiment can affect the outcome of the next experiment.
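A Markov chain on a finite set of states can be described by a transition matrix P, where P[i][j] is the probability of moving from state i to state j. If p is the probability distribution over the states now, the distribution one step later is the row vector pP. The sketch below uses a hypothetical two-state weather chain (sunny/rainy, with made-up probabilities) to show both this distribution update and a simulated path, in which each next state is drawn using only the current state.

```python
import random

STATES = ["sunny", "rainy"]
# Hypothetical transition matrix: row = today's state, column = tomorrow's state.
P = [
    [0.9, 0.1],  # sunny -> sunny, sunny -> rainy
    [0.5, 0.5],  # rainy -> sunny, rainy -> rainy
]

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(dist, P, steps):
    """Distribution over states after `steps` transitions."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

def simulate(start, P, steps, seed=0):
    """Simulate a path; each next state depends only on the current one."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(rng.choices(range(len(P)), weights=P[current])[0])
    return path

print(distribution_after([1.0, 0.0], P, 1))    # [0.9, 0.1]
print(distribution_after([1.0, 0.0], P, 200))  # close to [5/6, 1/6]
print([STATES[s] for s in simulate(0, P, 10)])
```

For this chain the long-run (stationary) distribution π solves π = πP, which gives π = (5/6, 1/6): whatever the starting state, about five days in six are sunny in the long run.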

How to perform bootstrap and jackknife analysis?

Bootstrap and jackknife are superficially similar statistical techniques that involve resampling the data. Both are nonparametric resampling techniques that can estimate the standard errors and confidence intervals of a population parameter, such as a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. They differ in how they resample: the bootstrap draws many samples with replacement from the data, while the jackknife recomputes the estimate with one observation left out at a time. Bradley Efron introduced the bootstrap method in 1979 for evaluating the variance of an estimator.
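The sketch below estimates the standard error of the sample mean both ways on a small made-up dataset. The function names, the 2,000 bootstrap resamples and the fixed random seed are illustrative choices, not a prescribed implementation.

```python
import math
import random
import statistics

def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Bootstrap SE: standard deviation of `stat` over resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = [stat(rng.choices(data, k=len(data))) for _ in range(n_resamples)]
    return statistics.stdev(estimates)

def jackknife_se(data, stat):
    """Jackknife SE from the n leave-one-out estimates of `stat`."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in loo))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(bootstrap_se(data, statistics.mean))
print(jackknife_se(data, statistics.mean))
print(statistics.stdev(data) / math.sqrt(len(data)))  # textbook SE of the mean
```

For the mean, the jackknife estimate reproduces the textbook formula s/√n exactly. The bootstrap estimate is a Monte Carlo approximation and tends to be slightly smaller here, because resampling with replacement uses the variance with divisor n rather than n − 1. The same two functions work unchanged for other statistics, such as `statistics.median`.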
