Collection of data and data file extensions in Hamlet II

The previous article discussed the interface of the Hamlet II software. This article presents the methods of data collection and the data file extensions used in Hamlet II. The software uses different file formats, or extensions, depending on the type of analysis. It is therefore important to know what each file format is for and how the data should be collected.

Types of data file extensions in Hamlet II

Brier & Hopp (2015) provide the list of data file extensions in Hamlet II.

Data file extension
.voc Vocabulary lists for use with Hamlet joint frequencies.
.mat Matrix of co-occurrences of word pairs, from Hamlet Joint Frequencies, used as input to MINISSA, Cluster Analysis, or INDSCAL. It is also used to store arc-distances in the analysis of subject spaces, expressed as similarities. These files must not be confused with Microsoft Access (.mat) files.
.xpr Context profiles created by Hamlet joint frequencies and latent Dirichlet allocation for use with singular value decomposition (MDPREF) and correspondence analysis.
.svd Results of singular value decomposition (MDPREF).
.xpc Results of correspondence analysis.
.ham Output listing from Hamlet joint frequencies.
.txt Listing of word clusters identified by cluster analysis.
.min MINISSA output listing, accessed by SELECT to generate input to PINDIS.
.stp ‘Stoplist’ file used in association with WORDLIST, VOCEDIT and LDA.
.ins INDSCAL input file.
.ind INDSCAL output listing.
.inp PINDIS input file.
.pin PINDIS output listing.
.bmp, .jpg Graphic files to store results displayed by MINISSA, PINDIS, singular value decomposition (MDPREF), correspondence analysis, INDSCAL and PROFILE.
.lst Wordlist file, generated by Wordlist or Compare.
.kwc Key-Word-In-Context listing.
.win, .cfg Language convention files.
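
As a quick reference, the extensions listed above can be captured in a small lookup table. The sketch below is written in Python, which is not part of Hamlet II. The names `HAMLET_EXTENSIONS` and `describe_file` are hypothetical, and the descriptions are shortened from the list above.

```python
from pathlib import Path

# Shortened descriptions of the extensions listed above (illustrative only).
HAMLET_EXTENSIONS = {
    ".voc": "vocabulary list for joint frequencies",
    ".mat": "matrix of word-pair co-occurrences",
    ".xpr": "context profiles",
    ".svd": "singular value decomposition (MDPREF) results",
    ".xpc": "correspondence analysis results",
    ".ham": "joint frequencies output listing",
    ".txt": "plain text (input transcripts, cluster listings)",
    ".min": "MINISSA output listing",
    ".stp": "stoplist",
    ".ins": "INDSCAL input",
    ".ind": "INDSCAL output listing",
    ".inp": "PINDIS input",
    ".pin": "PINDIS output listing",
    ".bmp": "graphic output",
    ".jpg": "graphic output",
    ".lst": "wordlist",
    ".kwc": "key-word-in-context listing",
    ".win": "language convention file",
    ".cfg": "language convention file",
}


def describe_file(name: str) -> str:
    """Return a short description of a file based on its extension."""
    suffix = Path(name).suffix.lower()  # compare case-insensitively
    return HAMLET_EXTENSIONS.get(suffix, "not a listed Hamlet II extension")
```

For example, `describe_file("survey.mat")` returns the description of the co-occurrence matrix, while a Word document such as `notes.docx` is reported as not listed.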

Collection of data

Data collected through questionnaires, interviews, or other qualitative instruments is in text form. It is typically stored as a Word document (.doc) or converted to plain text (.txt). Hamlet II accepts only .txt files, so Word documents must be converted before analysis.

In social research, qualitative studies typically consist of interviews or focus group discussions. The three most common qualitative methods are participant observation, in-depth interviews, and focus groups. Each method is particularly suited to obtaining a specific type of data.

Computer-generated transcription

Computer-generated transcription is derived from audio or video recordings of interviews and similar data collection techniques. Several software packages convert voice or audio files to text, of which Dragon and NVivo are popular. Google Docs also offers a speech-to-text option. However, no software produces a 100% accurate transcription, so cross-checking the transcript for omitted text is important for a sound analysis.

Transcript for Hamlet II

The following points are important when preparing a transcript for analysis in Hamlet II.

  1. The transcript must be in .txt format.
  2. Interview questions should be left out of the transcript, to reduce needless repetition of words.
  3. Words such as ‘interviewer’ and ‘respondent’ must be removed because they add no value to the analysis.
  4. Avoid numerals and special characters.
  5. The transcript must be in a single language. For instance, if a researcher is using English, the complete transcript must consist of English text only.
  6. Automated transcription may be used, but the final transcript must be proofread manually.
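
Several of these points can be applied automatically before the manual proofread. The following is a minimal Python sketch, not a Hamlet II feature. It assumes that each speaker turn sits on its own line and starts with a label such as "Interviewer:" or "Respondent 1:", and that basic punctuation (full stops, commas, apostrophes) may stay. The function name `clean_transcript` and the label patterns are hypothetical and should be adapted to the transcript at hand.

```python
import re

# Assumed label formats, e.g. "Interviewer:" or "Respondent 2 -".
QUESTION_LINE = re.compile(r"^\s*interviewer\s*\d*\s*[:\-]", re.IGNORECASE)
ANSWER_LABEL = re.compile(r"^\s*respondent\s*\d*\s*[:\-]\s*", re.IGNORECASE)
# Anything that is not an English letter, whitespace, or basic punctuation.
UNWANTED = re.compile(r"[^A-Za-z\s.,']")


def clean_transcript(raw: str) -> str:
    """Drop interviewer lines, strip respondent labels, numerals and symbols."""
    kept = []
    for line in raw.splitlines():
        # Skip blank lines and the interviewer's questions.
        if not line.strip() or QUESTION_LINE.match(line):
            continue
        line = ANSWER_LABEL.sub("", line)           # remove the speaker label
        line = UNWANTED.sub(" ", line)              # remove numerals and symbols
        line = re.sub(r"\s+", " ", line).strip()    # collapse leftover spaces
        if line:
            kept.append(line)
    return "\n".join(kept)
```

The cleaned text can then be saved with a .txt extension. Because the character filter keeps only unaccented English letters, it would need to be changed for transcripts in other languages, and the result should still be proofread by hand.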

Difference between good and bad transcripts

The two figures present the difference between good and bad transcripts for use in Hamlet II.

Figure 1: Example of a bad transcript for Hamlet II

Figure 2: Example of a good transcript for Hamlet II

As Figure 2 shows, a good transcript for Hamlet II has numerals and special characters removed. Words such as ‘respondent’ and ‘interviewer’ are also removed. Moreover, the interview questions are left out, and a single language is used throughout.
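
A transcript can be screened for these problems before it is loaded into Hamlet II. The Python sketch below only reports issues and changes nothing. The function name `find_transcript_issues` is hypothetical, and treating full stops, commas and apostrophes as acceptable is an assumption rather than a documented Hamlet II rule.

```python
import re


def find_transcript_issues(text: str) -> list:
    """Return a list of problems that mark a transcript as 'bad'."""
    issues = []
    if re.search(r"\d", text):
        issues.append("contains numerals")
    # Assumed allow-list: letters, digits (reported above), spaces, . , '
    if re.search(r"[^A-Za-z0-9\s.,']", text):
        issues.append("contains special characters")
    if re.search(r"\b(interviewer|respondent)\b", text, re.IGNORECASE):
        issues.append("contains speaker labels")
    return issues
```

An empty list means none of the three checks fired. It does not confirm that the transcript is in a single language or free of transcription errors, so the manual proofread is still needed.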

Text analysis and interpretation

This article presented the importance of careful qualitative data collection for Hamlet II. The next article in this module focuses on textual analyses and their interpretation.


Avishek Majumder

Research Analyst at Project Guru
Avishek holds a Master's degree in Biotechnology and has previously worked with Lifecell International Private Limited. Apart from data analysis and biological research, he loves photography and reading. He plays football and basketball in his spare time and has an avid interest in adventure and nature. He was also a member of the Scouts at school and has attended military training.