Performing wordlist comparison, KWIC and text profile in Hamlet II

The previous article presented the steps for producing a wordlist in Hamlet II. This article presents the steps for comparing wordlists, running keyword in context (KWIC) and text profiling.

Comparing wordlist

Wordlist displays a sorted list of the words found in a text file. To compare wordlists:

  • Step 1: Repeat the wordlist analysis described in the previous article for two different text files or transcripts.
  • Step 2: Save each wordlist in .LST file format.
  • Step 3: Compare the wordlists by:
    1. Clicking on ‘Tools’.
    2. Then on ‘Compare’.
    3. Selecting the files you want to compare, as shown in the image below.
    Figure 1: Comparing wordlist steps

  • Step 4: Search for the .LST files saved from the wordlist analysis.
  • Step 5: Choose the ‘create list’ button in the above dialog box.
    Figure 2: Result from compare wordlist

    The above image presents the findings of the compare wordlist analysis for the two text files.

  • Step 6: Save the resulting wordlist in .LST format.

Interpreting compare wordlist

The results show the comparison between the two wordlist files. The compare wordlist function assesses the frequencies of every word that appears in both lists. The word ‘they’ appears 55 times in file one and 179 times in file two, which makes it one of the words that occur frequently in both files. ‘They’ accounts for 0.5% of all word occurrences in the first transcript and 1.7% in the second. Similarly, ‘music’ appears 239 times in file one and 53 times in file two.

On a frequency basis, the comparison ranks the word as the 7th most common. The average number of appearances across both files therefore matters more than the count in a single file. At the end of the result page, the total number of word-strings common to both wordlists is presented, which in the sample is 373. The accompanying percentage indicates that 22.87% of the total vocabulary is common to both files. This is how the results of compare wordlist are interpreted.
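The logic behind such a comparison can be sketched outside Hamlet II. The Python below is a minimal illustration, not Hamlet II's own algorithm or its .LST file format: the function names `wordlist` and `compare_wordlists` are hypothetical, the tokenisation is a simple assumption, and the shared-vocabulary percentage is assumed to be the number of common words divided by the number of distinct words across both texts.

```python
import re
from collections import Counter


def wordlist(text):
    """Count lower-cased word tokens in a text (simple assumed tokeniser)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def compare_wordlists(list_a, list_b):
    """Compare two frequency lists.

    Returns one row per word found in both lists, sorted by combined
    frequency, plus the percentage of the combined vocabulary that is shared.
    """
    total_a = sum(list_a.values())
    total_b = sum(list_b.values())
    common = set(list_a) & set(list_b)

    rows = []
    for word in common:
        rows.append({
            "word": word,
            "freq_a": list_a[word],
            "pct_a": 100 * list_a[word] / total_a,  # share of all tokens in text A
            "freq_b": list_b[word],
            "pct_b": 100 * list_b[word] / total_b,  # share of all tokens in text B
        })
    # Most frequent across both texts first; ties broken alphabetically.
    rows.sort(key=lambda r: (-(r["freq_a"] + r["freq_b"]), r["word"]))

    # Assumption: "shared vocabulary" = common words / distinct words overall.
    union = set(list_a) | set(list_b)
    shared_pct = 100 * len(common) / len(union) if union else 0.0
    return rows, shared_pct


if __name__ == "__main__":
    a = wordlist("They play music and they dance.")
    b = wordlist("They like music and art.")
    rows, shared_pct = compare_wordlists(a, b)
    for row in rows:
        print(row)
    print(f"Shared vocabulary: {shared_pct:.2f}%")
```

Sorting by combined frequency mirrors the point that the count across both files matters more than the count in either file alone.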

KWIC or keyword in context

Keyword in context (KWIC) lists the occurrences of a given word or phrase in a text file, each shown within its line or paragraph. It can present one or more keywords in the centre of the text or display the keyword within a larger block of text (Brier & Hopp, 2015). Follow these steps for KWIC:

  1. Tools > KWIC > KWIC dialog box.
  2. Choose the file for KWIC (.TXT file).
  3. Set the words for KWIC. The words chosen in this example are Music, Dance and Course.
  4. Choose between ‘as phrase’ and ‘listing words separately’.
  5. Set the context length in lines (1 in this example). Choose a number suited to the sample text file.
  6. Click the create button. The image below numbers these steps.
    KWIC

  7. The results appear as follows:
    Results from KWIC

As the image shows, the results differ between ‘as phrase’ and ‘listing words separately’. With ‘as phrase’, two or more consecutive words are searched for as a single phrase in the lines of the text. Listing words separately finds each keyword on its own, across different sentences or lines of the text file. The display context setting controls the number of lines used around each keyword.
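The difference between the two modes can be illustrated with a short Python sketch. This is a simplified stand-in, not Hamlet II's implementation: the function `kwic` and its parameters `extra_lines` and `as_phrase` are hypothetical names, and `extra_lines` is assumed to mean the number of neighbouring lines shown on each side of a hit, which may not match how Hamlet II counts its context length.

```python
import re


def kwic(lines, keywords, extra_lines=0, as_phrase=False):
    """Return (line_number, target, context) for every keyword hit.

    as_phrase=True joins the keywords into one phrase that must occur as
    consecutive words; otherwise each keyword is searched for on its own.
    """
    targets = [" ".join(keywords)] if as_phrase else list(keywords)
    patterns = [
        (t, re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE))
        for t in targets
    ]

    hits = []
    for i, line in enumerate(lines):
        for target, pattern in patterns:
            for _ in pattern.finditer(line):
                # Context = the hit line plus extra_lines on each side.
                start = max(0, i - extra_lines)
                end = min(len(lines), i + extra_lines + 1)
                hits.append((i + 1, target, " ".join(lines[start:end])))
    return hits


if __name__ == "__main__":
    sample = [
        "The music course starts today.",
        "Students dance after class.",
        "Music and dance go together.",
    ]
    for hit in kwic(sample, ["music", "dance", "course"]):
        print(hit)
    print(kwic(sample, ["music", "course"], as_phrase=True))
```

Counting the returned hits for one target gives the total number of hits for that keyword, the figure used when interpreting KWIC results.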

Interpreting KWIC

When interpreting KWIC results, it is important to report the total number of keyword hits and the lines or phrases in which they appear, and to understand the context of each line. Here, the keyword ‘music’ returns 221 hits; in other words, ‘music’ is used 221 times in the text file. The word listing and phrase results indicate that ‘music’ is used densely and is therefore an important component of the research topic. On average, ‘music’ occurs once in every 8 lines of the text file, which confirms that it is an important keyword in the research.

Text profile

Text profile displays the distributions of word and sentence lengths for a given text file (Brier & Hopp, 2015). The analysis presents mean word and sentence lengths, with dispersion statistics and a histogram for each distribution.

Follow these steps for text profile:

  1. Tools > Text profile > Pop-up box.
  2. Choose the file.
  3. Set the characters, if any other than the default.
  4. Click the create button.
    Steps to create text profile

  5. Save the file or result.
    Result from text profile

Interpreting text profile

The results comprise dispersion statistics and a histogram for each distribution. The distribution is based on the wordlist. The text profile found 11456 words in total. On average, a word in the text file has 4 letters; the longest word has 14 letters and the shortest has 1. The standard deviation of 6.31 describes the variation in word length. The histogram shows that most words have either 2 or 4 letters, followed by 3-letter and 5-letter words.

The text file contains 957 sentences in total. The longest sentence has 83 words, whilst the shortest has 2. On average, a sentence contains 11 words, with a deviation from the average of ±9.96 words.
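The statistics in a text profile can be reproduced in outline with Python's standard library. The sketch below is an approximation under stated assumptions, not Hamlet II's procedure: `text_profile` and `summarise` are hypothetical names, words are taken to be runs of letters, sentences are split on `.`, `!` and `?`, and the population standard deviation is used, so figures may differ from those Hamlet II reports.

```python
import re
import statistics
from collections import Counter


def summarise(values):
    """Count, mean, standard deviation, minimum and maximum of a list."""
    if not values:
        return {"count": 0, "mean": None, "sd": None, "min": None, "max": None}
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),  # assumption: population SD
        "min": min(values),
        "max": max(values),
    }


def text_profile(text):
    """Profile word lengths (in letters) and sentence lengths (in words)."""
    word_lengths = [len(w) for w in re.findall(r"[A-Za-z]+", text)]

    sentence_lengths = []
    for sentence in re.split(r"[.!?]+", text):
        n_words = len(re.findall(r"[A-Za-z]+", sentence))
        if n_words:  # skip empty fragments left by the split
            sentence_lengths.append(n_words)

    return {
        "words": summarise(word_lengths),
        "sentences": summarise(sentence_lengths),
        # Histogram: word length -> number of words of that length.
        "word_histogram": dict(sorted(Counter(word_lengths).items())),
    }


if __name__ == "__main__":
    profile = text_profile("I like music. They dance a lot!")
    print(profile["words"])
    print(profile["sentences"])
    for length, count in profile["word_histogram"].items():
        print(f"{length:2d} letters | {'#' * count}")
```

Reading the `words` and `sentences` summaries side by side corresponds to the two halves of the interpretation above: word length in letters, then sentence length in words.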

This article presented methods to compare wordlists, perform KWIC and produce a text profile with its graphical presentation, along with their interpretations. The next article covers the creation of vocabulary, comparison and joint frequencies.

Avishek Majumder

Research Analyst at Project Guru
Avishek is a Master in Biotechnology and has previously worked with Lifecell International Private Limited. Apart from data analysis and biological research, he loves photography and reading. He loves to play football and basketball in his spare time with an avid interest in adventure and nature. He was also a member of the Scouts in his school and has attended Military training.