Challenges of using different bibliometrics and data analysis software

By Avishek Majumder & Priya Chetty on February 6, 2018

The preceding article presented the challenges of selecting the field of study and collecting data in bibliometrics. As mentioned there, bibliometrics is the statistical analysis of bibliographic data from journals, books, documents, and conference proceedings (Kurtz & Bollen, 2010). Challenges arise at every step of a bibliometric study, including the selection of the specific software and the type of analysis performed. This article therefore presents both the challenges and the solutions that occur in the data analysis stage of bibliometrics.

The basic flow or process of bibliometrics is:

The process of bibliometrics

Selecting the software and the type of data analysis is one of the major hurdles in a bibliometric study. Several software packages are available, some free and some with limited access. The choice of software depends entirely on the data analyses chosen.

Types of software and analyses for bibliometrics

A simple frequency analysis can be done in MS Excel, BibExcel, R or SPSS, whereas network and mapping analyses use Pajek, CiteSpace, VOSviewer, MATLAB and others, and citation analyses use Publish or Perish, Clarivate Analytics, BibExcel and others (Chen, Lin, Huang, & Huang, 2010; Harzing, 2008; Salini, 2012; Singh & Komal, 2009; Van Eck, Waltman, Dekker, & Van Den Berg, 2010). The most commonly used software are BibExcel, SPSS, Pajek, CiteSpace, Publish or Perish and VOSviewer.
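
As a rough illustration of the simple frequency analysis mentioned above, the following Python sketch counts publications per year and per journal. The records and their field names are invented for illustration; they do not come from any real database export.

```python
from collections import Counter

# Hypothetical bibliographic records; in practice these would come from
# a database export (for example a .csv or .ris file).
records = [
    {"title": "Paper A", "year": 2008, "journal": "Journal X"},
    {"title": "Paper B", "year": 2010, "journal": "Journal Y"},
    {"title": "Paper C", "year": 2010, "journal": "Journal X"},
    {"title": "Paper D", "year": 2012, "journal": "Journal X"},
]


def frequency(records, field):
    """Count how often each value of `field` occurs across the records."""
    return Counter(r[field] for r in records if field in r)


per_year = frequency(records, "year")
per_journal = frequency(records, "journal")

# Print the yearly publication counts in chronological order.
for year in sorted(per_year):
    print(year, per_year[year])
```

The same counting could be done with a pivot table in MS Excel or a frequency table in SPSS; the sketch only shows what the analysis amounts to.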

Description of the software

BibExcel is an indispensable tool for converting and processing bibliographic data, and it is freely available (Arsenova, 2013). It also contains many routines for creating different types of bibliometric analyses. Publish or Perish, on the other hand, uses data from Google Scholar to create bibliometric analyses of researchers, including the h-index and other citation metrics (Harzing, 2008). Pajek is used for different kinds of network analyses and visualizations. CiteSpace helps to analyze, visualize and cluster (mainly) bibliographic data (Guler, Waaijer, & Palmblad, 2016). Lastly, VOSviewer is software for bibliometric mapping and network analysis, provided the data is available in a file format it supports (Van Eck et al., 2010).
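
To make the h-index mentioned above concrete, here is a minimal Python sketch of its standard definition: the largest number h such that h papers each have at least h citations. The citation counts are invented, and this is not a description of how Publish or Perish computes the metric internally.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's papers.
example_citations = [10, 8, 5, 4, 3, 0]
print(h_index(example_citations))
```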

Challenges associated with the software

The main challenge is that although various software packages are available, no single one supports every type of data analysis. Moreover, each software has its own format or file extension for analysis, and it cannot perform bibliometrics when the data is not available in that format. However, “.ris” is the most common file extension and can be read by most of the available software (Guler et al., 2016; Salini, 2012). Another issue is that the manuals available for the software are not well constructed, so a new researcher finds it very difficult to choose the appropriate software for data analysis. In addition, various software are directly linked to either Web of Science or Scopus. Hence, researchers with no access to such web libraries face limitations (Salini, 2012).
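
Since “.ris” is described above as the most widely readable format, it may help to see what such a file contains. An RIS file is plain text in which each line starts with a two-character tag (for example `TY` for the record type, `AU` for an author, `PY` for the year), and `ER` closes a record. The following Python sketch reads that layout; the sample record is invented, and the parser is deliberately simplified (it ignores continuation lines).

```python
import re

# A RIS line is a two-character tag, two spaces, a hyphen, and the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of dicts mapping each tag to a list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank or continuation lines in this simple sketch
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:
        records.append(current)  # tolerate a missing final ER line
    return records


# Invented sample record, for illustration only.
sample = """TY  - JOUR
AU  - Author, First
AU  - Author, Second
TI  - An invented example title
PY  - 2010
ER  -
"""

parsed = parse_ris(sample)
print(parsed[0]["AU"], parsed[0]["PY"])
```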

Problem in data analysis and software selection

Overcoming the challenge

With respect to data analysis and software selection, it is important to know well the type of analysis each software performs. It is also important to understand what each software is for and how it operates, so that the selection of suitable data is congruent with the aim of the bibliometric study. The selection of software depends on whether it is required for descriptive analysis, citation analysis, bibliometric mapping or network analysis. In many cases the researcher combines two or more types of data analysis for more complicated analyses and in-depth studies.

The following table helps to identify which software is used in which case. Note also that certain statistics are computed manually where automated software is not applicable, and that web-based information from Clarivate Analytics or Google Scholar is used for descriptive or citation analyses. The researcher must also know how to integrate two or more software while conducting a bibliometric study.
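
One way to picture integrating two tools is preparing data in one environment and exporting it in a format another tool reads. The Python sketch below, offered only as an illustration, builds a co-authorship network from author lists and renders it in the `.net` layout that Pajek reads (a `*Vertices` section followed by an `*Edges` section). The author names are invented, and the function names are hypothetical helpers rather than part of any package.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(author_lists):
    """Count how many papers each pair of authors wrote together."""
    edges = Counter()
    for authors in author_lists:
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def to_pajek(edges):
    """Render weighted undirected edges as text in Pajek's .net layout.

    Only authors that appear in at least one edge become vertices.
    """
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[n]} "{n}"' for n in names]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(lines) + "\n"


# Invented author lists, one per paper.
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author B", "Author C"],
    ["Author C"],
]

net_text = to_pajek(coauthor_edges(papers))
print(net_text)
```

The resulting text could be saved with a `.net` extension and opened in a network tool; the edge weight is the number of papers a pair of authors share.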

List of software by type of data analysis

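| Data analysis | Type of software | File formats supported | Limited/open access | Recommendation level |
|---|---|---|---|---|
| Descriptive analyses | BibExcel | .bibx, .txt, .csv | Open source | High |
| Descriptive analyses | SPSS | .xls, .sav | Open source | Low |
| Descriptive analyses | MS Excel/Access | .xls, .accdb/.mdb | Open source | Moderate |
| Descriptive analyses | Publish or Perish | .txt, .ris, .csv, .bibtex | Open source | High |
| Citation analyses | Clarivate Analytics | .ris, .bibtex | Limited | Low |
| Citation analyses | SCOPUS | .ris, .csv, .txt | Limited | Low |
| Citation analyses | Publish or Perish | .txt, .ris, .csv, .bibtex | Open source | High |
| Bibliometric mapping | Pajek | .net, .svg | Limited | Low |
| Bibliometric mapping | CiteSpace | .txt, .ris, .csv, .bibtex, .xls | Limited | Low |
| Bibliometric mapping | VOSviewer | .txt, .ris, .csv, .bibtex, .xls | Open source | High |
| Bibliometric mapping | MATLAB | .csv, .xls | Limited | Moderate |
| File conversion software | Zotero | All formats | Open source | High |
| File conversion software | EndNote | .ris, .csv, .txt, .xls, .bibtex | Limited | High |
| File conversion software | Mendeley | .ris, .csv, .txt | Open source | Moderate |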
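
As a small aid for using the table above, the following Python sketch encodes its analysis rows so that candidate software can be looked up by analysis type and file format. The values are transcribed from the table as given; the file conversion tools are left out, and the function name `candidates` is a hypothetical helper introduced here.

```python
# Rows transcribed from the table above:
# (analysis type, software, supported formats, access, recommendation level).
SOFTWARE = [
    ("Descriptive analyses", "BibExcel", {".bibx", ".txt", ".csv"}, "Open source", "High"),
    ("Descriptive analyses", "SPSS", {".xls", ".sav"}, "Open source", "Low"),
    ("Descriptive analyses", "MS Excel/Access", {".xls", ".accdb", ".mdb"}, "Open source", "Moderate"),
    ("Descriptive analyses", "Publish or Perish", {".txt", ".ris", ".csv", ".bibtex"}, "Open source", "High"),
    ("Citation analyses", "Clarivate Analytics", {".ris", ".bibtex"}, "Limited", "Low"),
    ("Citation analyses", "SCOPUS", {".ris", ".csv", ".txt"}, "Limited", "Low"),
    ("Citation analyses", "Publish or Perish", {".txt", ".ris", ".csv", ".bibtex"}, "Open source", "High"),
    ("Bibliometric mapping", "Pajek", {".net", ".svg"}, "Limited", "Low"),
    ("Bibliometric mapping", "CiteSpace", {".txt", ".ris", ".csv", ".bibtex", ".xls"}, "Limited", "Low"),
    ("Bibliometric mapping", "VOSviewer", {".txt", ".ris", ".csv", ".bibtex", ".xls"}, "Open source", "High"),
    ("Bibliometric mapping", "MATLAB", {".csv", ".xls"}, "Limited", "Moderate"),
]


def candidates(analysis, file_format):
    """Return the software listed for `analysis` that supports `file_format`."""
    return [
        name
        for kind, name, formats, _access, _level in SOFTWARE
        if kind == analysis and file_format in formats
    ]


print(candidates("Bibliometric mapping", ".ris"))
print(candidates("Citation analyses", ".csv"))
```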

References
