Performing wordlist comparison, KWIC and text profiling in Hamlet II

The previous article presented the steps for creating a wordlist in the Hamlet II software. This article presents the steps for comparing wordlists, keyword in context (KWIC) and text profiling, along with how to interpret each result.

Comparing wordlist

Wordlist displays a sorted list of the words found in a text file. To compare two wordlists:

  • Step 1: Run the wordlist analysis described in the previous article on two different text files or transcripts.
  • Step 2: Save both wordlists in .LST format.
  • Step 3: Compare the wordlists:
    1. Click ‘Tools’.
    2. Then click ‘Compare’.
    3. Select the files you want to compare, as shown in the image below.
    Figure 1: Comparing wordlist steps

  • Step 4: Locate the .LST files saved from the wordlist analysis.
  • Step 5: Click the ‘Create list’ button in the dialog box.
    Figure 2: Result from compare wordlist

    The image above presents the comparison between the two text files produced by the compare wordlist analysis.

  • Step 6: Save the result in .LST format.
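To see what the compare step does with two saved wordlists, the following Python sketch reproduces the idea outside Hamlet II. It is not Hamlet II's implementation: it assumes a word is a lowercased run of letters (with internal apostrophes), and the function names `wordlist` and `compare_wordlists` are hypothetical. It builds a frequency list for each text and keeps only the words that occur in both, sorted by combined frequency.

```python
import re
from collections import Counter

# Assumption: a "word" is a lowercased run of letters, optionally with
# internal apostrophes. Hamlet II's own tokenisation may differ.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def wordlist(text):
    """Hypothetical helper: return a word -> frequency mapping for one text."""
    return Counter(WORD.findall(text.lower()))


def compare_wordlists(freq_a, freq_b):
    """Hypothetical helper: list (word, count_in_a, count_in_b) for every word
    found in both frequency lists, most frequent overall first."""
    common = set(freq_a) & set(freq_b)
    rows = [(word, freq_a[word], freq_b[word]) for word in common]
    # Sort by combined count (descending), then alphabetically to break ties.
    rows.sort(key=lambda row: (-(row[1] + row[2]), row[0]))
    return rows


if __name__ == "__main__":
    list_a = wordlist("They like music. They dance.")
    list_b = wordlist("They play music and sing.")
    print(compare_wordlists(list_a, list_b))
    # [('they', 2, 1), ('music', 1, 1)]
```

Lowercasing merges ‘They’ and ‘they’ into one entry; whether Hamlet II does the same depends on its settings, so adjust the tokeniser if case matters for your data.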

Interpreting compare wordlist

The results show the comparison between the two wordlist files. The word ‘they’ appears 55 times in file one and 179 times in file two, and it is one of the most frequent words in both files. ‘They’ accounts for 0.5% of all words in the first transcript and 1.7% of all words in the second. By contrast, ‘music’ appears 239 times in file one and 53 times in file two. In this way, the compare wordlist function assesses the frequency lists for every word that appears in both lists.

On a frequency basis, the comparison ranks ‘music’ as the 7th most common word. What matters, therefore, is a word's average frequency across both files rather than its count in a single file. At the end of the result page, Hamlet II reports the total number of word-strings common to both wordlists, which is 373 in this sample. The accompanying percentage shows that 22.87% of the total vocabulary is common to both files. This is how the results of compare wordlist are interpreted.
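The two kinds of percentage in this interpretation are simple ratios. The sketch below shows one way to compute them, under two assumptions the article does not confirm: a word's percentage is its count divided by that file's total number of words, and the shared-vocabulary percentage is the number of common words divided by the number of distinct words across both files (their union). The function names are hypothetical and the inputs are toy data, not the article's results.

```python
def relative_frequency(count, total_words):
    """Hypothetical helper: percentage of a file's running words taken up by one word."""
    return 100.0 * count / total_words


def shared_vocabulary_percent(freq_a, freq_b):
    """Hypothetical helper: percentage of distinct words that occur in both lists.
    Assumption: the denominator is the union of both vocabularies."""
    common = set(freq_a) & set(freq_b)
    union = set(freq_a) | set(freq_b)
    return 100.0 * len(common) / len(union)


if __name__ == "__main__":
    # Toy frequency lists, not data from the article.
    list_a = {"they": 2, "like": 1, "music": 1, "dance": 1}
    list_b = {"they": 1, "play": 1, "music": 1, "and": 1, "sing": 1}
    total_a = sum(list_a.values())  # 5 running words in the first toy file
    print(relative_frequency(list_a["they"], total_a))  # 40.0
    print(round(shared_vocabulary_percent(list_a, list_b), 2))  # 28.57
```

If Hamlet II uses a different denominator (for example, the vocabulary of only one file), the shared percentage changes accordingly, so check its output against a small test file before relying on this reading.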

KWIC or keyword in context

Keyword in context (KWIC) lists every occurrence of a given word or phrase in a text file, together with the line or paragraph in which it occurs. It can present one or more keywords in the centre of a line of text, or display each keyword within a larger block of text (Brier & Hopp, 2015). Follow these steps for KWIC:

  1. Go to Tools > KWIC to open the KWIC dialog box.
  2. Choose the file for KWIC (a .TXT file).
  3. Set the keywords for KWIC. In this example, the words chosen are Music, Dance and Course.
  4. Choose either ‘as phrase’ or ‘listing words separately’.
  5. Set the context length in lines (1 in this example). Choose a value that suits the text sample file.
  6. Click the ‘Create’ button, following the numbered steps shown in the image.
  7. The results appear as follows:
    Results from KWIC

As the image shows, the results differ between ‘as phrase’ and ‘listing words separately’. With ‘as phrase’, the keywords are treated as one continuous sequence of words, so only lines containing the whole phrase are returned. Listing words separately, by contrast, finds each keyword on its own across different sentences or lines of the text file. The display context setting controls how many lines of the text file are shown with each keyword found.
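The difference between the two search modes can be sketched in Python. This is an illustration of the general KWIC technique, not Hamlet II's code: the `kwic` function and its parameters are hypothetical, matching is assumed to be whole-word and case-insensitive, and phrase mode only finds a phrase that sits on a single line.

```python
import re


def kwic(lines, keywords, as_phrase=False, context=0):
    """Hypothetical KWIC helper.

    Returns (key, line_number, text) tuples, where text is the matching
    line joined with up to `context` lines on either side.
    as_phrase=True  -> keywords must occur together, in order, on one line.
    as_phrase=False -> each keyword is searched for on its own.
    """
    if as_phrase:
        # One pattern: the keywords in sequence, separated by whitespace.
        phrase = r"\s+".join(re.escape(kw) for kw in keywords)
        patterns = {" ".join(keywords): r"\b" + phrase + r"\b"}
    else:
        # One pattern per keyword, matched as a whole word.
        patterns = {kw: r"\b" + re.escape(kw) + r"\b" for kw in keywords}

    hits = []
    for key, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        for i, line in enumerate(lines):
            if regex.search(line):
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                hits.append((key, i + 1, " / ".join(lines[start:end])))
    return hits


if __name__ == "__main__":
    text = [
        "We study music and dance.",
        "The course covers theory.",
        "Music and dance are linked.",
    ]
    # Separate listing: four hits (music and dance on lines 1 and 3).
    for hit in kwic(text, ["music", "dance"]):
        print(hit)
    # Phrase search: only lines containing "music and dance" in order.
    for hit in kwic(text, ["music", "and", "dance"], as_phrase=True, context=1):
        print(hit)
```

Searching for ‘music dance’ as a phrase in this toy text returns nothing, because no line contains those two words next to each other, which mirrors why the two modes in Hamlet II give different result counts.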

Interpreting KWIC

When interpreting KWIC results, it is important to report the total number of keyword hits and the lines or phrases in which they appear, and to understand the context of each line. Of the chosen keywords, ‘music’ returns 221 hits from the text file; in other words, the word ‘music’ is used 221 times in total. Both the word listing and the phrase results indicate that ‘music’ is used densely: on average, it appears once every 8 lines of the text file. ‘Music’ is therefore an important keyword and a key component of the research topic.
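The "once every 8 lines" figure can be read as the number of lines in the file divided by the number of hits. The sketch below shows that arithmetic under the assumption that a hit is any whole-word, case-insensitive match. The article gives 221 hits but not the file's line count, so the example uses toy lines, and `keyword_density` is a hypothetical name.

```python
import re


def keyword_density(lines, keyword):
    """Hypothetical helper: count every whole-word, case-insensitive
    occurrence of `keyword` and the average number of lines per occurrence."""
    regex = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = sum(len(regex.findall(line)) for line in lines)
    lines_per_hit = len(lines) / hits if hits else float("inf")
    return hits, lines_per_hit


if __name__ == "__main__":
    sample = ["Music matters.", "Nothing here.", "More music.", "The end."]
    print(keyword_density(sample, "music"))  # (2, 2.0)
```

A smaller lines-per-hit value means the keyword is used more densely, which is the basis for treating ‘music’ as a central term.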

Text Profile

Text profile displays the distributions of word lengths and sentence lengths for a given text file (Brier & Hopp, 2015). The analysis reports the mean word and sentence lengths, with dispersion statistics and a histogram for each distribution.

Follow these steps to create a text profile:

  1. Go to Tools > Text profile to open the pop-up box.
  2. Choose the file.
  3. Change the character settings if anything other than the default is needed.
  4. Click the ‘Create’ button.
    Steps to create text profile

  5. Save the file or result.
    Result from text profile
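The kind of output a text profile produces can be approximated with Python's standard library. This is a sketch, not Hamlet II's algorithm: it assumes words are runs of letters (with internal apostrophes), that sentences end at ".", "!" or "?", and that the standard deviation is the sample form (dividing by n − 1). `text_profile` and `summarise` are hypothetical names.

```python
import re
import statistics
from collections import Counter

# Assumptions: words are runs of letters (with internal apostrophes) and
# sentences end at ".", "!" or "?". Hamlet II's rules may differ.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")
SENTENCE_END = re.compile(r"[.!?]+")


def summarise(values):
    """Mean, sample standard deviation, range and histogram of a non-empty list."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "histogram": dict(sorted(Counter(values).items())),
    }


def text_profile(text):
    """Hypothetical helper: word-length and sentence-length distributions."""
    word_lengths = [len(w) for w in WORD.findall(text)]
    # Count words per sentence, skipping empty fragments after the last stop.
    sentence_lengths = [
        n for n in (len(WORD.findall(s)) for s in SENTENCE_END.split(text)) if n > 0
    ]
    return {"words": summarise(word_lengths), "sentences": summarise(sentence_lengths)}


if __name__ == "__main__":
    profile = text_profile("I like music. We dance and sing!")
    print(profile["words"]["histogram"])  # {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
    print(profile["sentences"]["mean"])  # 3.5
```

Abbreviations such as "Dr." would be counted as sentence ends by this simple splitter, which is one reason its figures may not match Hamlet II's exactly.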

Interpreting text profile

The results comprise dispersion statistics and a histogram for each distribution, built from the wordlist. The text profile counts 11456 words in total. On average, a word in the text file is 4 letters long; the longest word has 14 letters and the shortest has 1. The standard deviation of 6.31 indicates how much word lengths vary around this mean. The histogram shows that most words are 2 or 4 letters long, followed by 3-letter and 5-letter words.

The text file contains 957 sentences in total. The longest sentence has 83 words and the shortest has 2. On average, a sentence contains 11 words, with a standard deviation of ±9.96 words.
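For reference, the usual definitions behind these statistics are given below, where x_i is the length of the i-th word (or sentence) and n is the number of words (or sentences). The article does not state whether Hamlet II divides by n or by n − 1, so the sample form shown here is an assumption. Taking the reported sentence figures at face value, one standard deviation either side of the mean spans roughly 1 to 21 words per sentence.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

\bar{x} \pm s = 11 \pm 9.96 \;\Rightarrow\; [1.04,\ 20.96] \text{ words per sentence}
```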

This article presented methods for comparing wordlists, performing KWIC and creating text profiles with their graphical histograms, along with how to interpret each result. The next article covers creating a vocabulary, comparison and joint frequencies.


Avishek Majumder

Research Analyst at Project Guru
Avishek holds a Master's in Biotechnology and has previously worked with Lifecell International Private Limited. Apart from data analysis and biological research, he loves photography and reading. He enjoys playing football and basketball in his spare time and has an avid interest in adventure and nature. He was also a member of the Scouts in school and has attended military training.