Challenges of using different bibliometrics and data analysis software

The preceding article presented the challenges of selecting a field of study and collecting data in bibliometrics. As mentioned there, bibliometrics is the statistical analysis of bibliographic data from journals, books, documents and conference proceedings (Kurtz & Bollen, 2010). Challenges arise at every step of a bibliometric study, including the selection of specific software and the type of analysis performed. This article therefore presents both the challenges and the solutions that arise in the data analysis stage of bibliometrics.

The basic flow or process of bibliometrics is:

The process of bibliometrics

Selecting the software and the method of data analysis is one of the major hurdles in a bibliometric study. Software is available with both free and limited access, and the choice of software depends entirely on the type of data analysis chosen.

Types of software and analyses for bibliometrics

A simple frequency analysis can be done in MS Excel, BibExcel, R or SPSS; network and mapping analyses use Pajek, CiteSpace, VOSviewer, MATLAB and others; and citation analyses use Publish or Perish, Clarivate Analytics, BibExcel and others (Chen, Lin, Huang, & Huang, 2010; Harzing, 2008; Salini, 2012; Singh & Komal, 2009; Van Eck, Waltman, Dekker, & Van Den Berg, 2010). The most commonly used software packages are BibExcel, SPSS, Pajek, CiteSpace, Publish or Perish and VOSviewer.
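To make "simple frequency analysis" concrete, the sketch below counts publications per year using only the Python standard library. The records, field names and the function name `publications_per_year` are invented for illustration and do not come from any real dataset or from the tools named above.

```python
from collections import Counter

# Hypothetical bibliographic records; titles and years are invented.
records = [
    {"title": "Paper A", "year": 2015},
    {"title": "Paper B", "year": 2016},
    {"title": "Paper C", "year": 2016},
    {"title": "Paper D", "year": 2018},
    {"title": "Paper E", "year": 2016},
]


def publications_per_year(recs):
    """Return a dict mapping each year to its record count, ordered by year."""
    counts = Counter(r["year"] for r in recs)
    return dict(sorted(counts.items()))


year_counts = publications_per_year(records)
for year, n in year_counts.items():
    print(year, n)
```

The same counting pattern applies to any other field, such as journal or author, by changing the key that is read from each record.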

Description of the software

BibExcel is an indispensable tool for converting and processing bibliographic data, and it is freely available (Arsenova, 2013). It also contains many routines for producing different types of bibliometric analyses. Publish or Perish, on the other hand, uses data from Google Scholar to produce bibliometric analyses of researchers, including the h-index and other citation metrics (Harzing, 2008). Pajek is used for different kinds of network analyses and visualizations. CiteSpace helps to analyze, visualize and cluster (mainly) bibliographic data (Guler, Waaijer, & Palmblad, 2016). Lastly, VOSviewer is software for bibliometric mapping and network analysis, provided the data is available in the required file format (Van Eck et al., 2010).
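As a worked example of one citation metric mentioned above, the sketch below computes the h-index from a list of citation counts, following the usual definition (the largest h such that at least h papers have at least h citations each). It illustrates the metric only; it is not a description of how Publish or Perish computes it internally, and the citation counts are invented.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Invented citation counts for five papers by one hypothetical researcher.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

For the example list, four papers have at least four citations each, but there are not five papers with at least five, so the h-index is 4.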

Challenges associated with the software

The main challenge is that, although various software packages are available, no single one supports every type of data analysis. Moreover, each package has its own file format or extension for analysis; when the data is not available in that format, the package cannot perform the analysis. However, “.ris” is the most common file extension and can be read by most of the available software (Guler et al., 2016; Salini, 2012). Another issue is that the manuals available for the software are not well constructed, so a new researcher finds it very difficult to choose the appropriate software for data analysis. In addition, many packages are directly linked to either Web of Science or Scopus. Hence, researchers with no access to such web libraries face limitations (Salini, 2012).
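Because ".ris" is the common denominator between tools, it can help to see what such a file looks like. The sketch below is a minimal reader, assuming the usual RIS layout in which each line holds a two-character tag, two spaces, a hyphen and a value, and each record ends with an ER tag. The function name `parse_ris` and the sample records are hypothetical, and real exports may contain variations that this sketch does not handle.

```python
import re

# An RIS line: two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records (dict of tag -> list of values)."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)  # tolerate a missing final ER line
    return records


# Invented sample with two records.
sample = """TY  - JOUR
AU  - Author, First
AU  - Author, Second
TI  - An invented title for illustration
PY  - 2016
ER  -

TY  - CONF
AU  - Author, Third
TI  - Another invented title
PY  - 2018
ER  -
"""

parsed = parse_ris(sample)
print(len(parsed), parsed[0]["AU"])
```

Values are stored as lists because tags such as AU (author) can repeat within one record.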

Problem in data analysis and software selection


Overcoming the challenge

With respect to data analysis and software selection, it is important to know which type of analysis each software package performs. It is also important to understand the purpose of each package and how it operates, so that the selection is congruent with the aim of the bibliometric study. The choice of software depends on whether it is required for descriptive analysis, citation analysis, bibliometric mapping or network analysis. In many cases, the researcher also combines two or more types of data analysis for more complicated analyses and in-depth studies.

The following table helps to identify which software to use in which case. Note that certain statistics are computed manually where automated software is not applicable, and that web-based information from Clarivate Analytics or Google Scholar is used for descriptive or citation analyses. The researcher must also know how to integrate two or more software packages while conducting a bibliometric study.

Table of software by usage

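| Data analysis | Type of software | File formats supported | Limited/open access | Recommendation level |
| --- | --- | --- | --- | --- |
| Descriptive analyses | BibExcel | .bibx, .txt, .csv | Open source | High |
| Descriptive analyses | SPSS | .xls, .sav | Limited | Low |
| Descriptive analyses | MS Excel/Access | .xls, .accdb/.mdb | Limited | Moderate |
| Descriptive analyses | Publish or Perish | .txt, .ris, .csv, .bibtex | Open source | High |
| Citation analyses | Clarivate Analytics | .ris, .bibtex | Limited | Low |
| Citation analyses | Scopus | .ris, .csv, .txt | Limited | Low |
| Citation analyses | Publish or Perish | .txt, .ris, .csv, .bibtex | Open source | High |
| Bibliometric mapping | Pajek | .net, .svg | Limited | Low |
| Bibliometric mapping | CiteSpace | .txt, .ris, .csv, .bibtex, .xls | Limited | Low |
| Bibliometric mapping | VOSviewer | .txt, .ris, .csv, .bibtex, .xls | Open source | High |
| Bibliometric mapping | MATLAB | .csv, .xls | Limited | Moderate |
| File conversion software | Zotero | All formats | Open source | High |
| File conversion software | EndNote | .ris, .csv, .txt, .xls, .bibtex | Limited | High |
| File conversion software | Mendeley | .ris, .csv, .txt | Open source | Moderate |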
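
The table above can also be used as a simple lookup: given the type of analysis and the file format of the data at hand, list the candidate software. The sketch below transcribes the analysis rows of the table into Python; the names `TOOLS` and `tools_for` are hypothetical, and the file conversion rows are left out because "All formats" is not a concrete extension.

```python
# Rows transcribed from the table above: (analysis type, software, formats).
TOOLS = [
    ("Descriptive analyses", "BibExcel", {".bibx", ".txt", ".csv"}),
    ("Descriptive analyses", "SPSS", {".xls", ".sav"}),
    ("Descriptive analyses", "MS Excel/Access", {".xls", ".accdb", ".mdb"}),
    ("Descriptive analyses", "Publish or Perish", {".txt", ".ris", ".csv", ".bibtex"}),
    ("Citation analyses", "Clarivate Analytics", {".ris", ".bibtex"}),
    ("Citation analyses", "Scopus", {".ris", ".csv", ".txt"}),
    ("Citation analyses", "Publish or Perish", {".txt", ".ris", ".csv", ".bibtex"}),
    ("Bibliometric mapping", "Pajek", {".net", ".svg"}),
    ("Bibliometric mapping", "CiteSpace", {".txt", ".ris", ".csv", ".bibtex", ".xls"}),
    ("Bibliometric mapping", "VOSviewer", {".txt", ".ris", ".csv", ".bibtex", ".xls"}),
    ("Bibliometric mapping", "MATLAB", {".csv", ".xls"}),
]


def tools_for(analysis_type, file_format):
    """Return the software listed for an analysis type that accepts a format."""
    return [
        name
        for kind, name, formats in TOOLS
        if kind == analysis_type and file_format in formats
    ]


print(tools_for("Bibliometric mapping", ".ris"))
print(tools_for("Citation analyses", ".csv"))
```

An empty result for a given format suggests converting the data first with one of the file conversion tools listed in the table.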


Avishek Majumder

Research Analyst at Project Guru
Avishek is a Master in Biotechnology and has previously worked with Lifecell International Private Limited. Apart from data analysis and biological research, he loves photography and reading. He loves to play football and basketball in his spare time with an avid interest in adventure and nature. He was also a member of the Scouts in his school and has attended Military training.