Performing wordlist comparison, KWIC and text profile in Hamlet II

By Avishek Majumder & Priya Chetty on May 14, 2018

The previous article presented the steps for performing a Wordlist analysis in Hamlet II software. This article presents the steps for comparing wordlists, keyword in context (KWIC) and text profiling.

Comparing wordlist

Wordlist displays a sorted list of words found in a file of text. To compare wordlist:

  • Step 1: Repeat the wordlist analysis described in the previous article for two different text files or transcripts.
  • Step 2: Save both results in .LST file format.
  • Step 3: Compare the wordlist by:
    1. Clicking on ‘Tools’.
    2. Then on ‘Compare’.
    3. Select the files you want to compare. This step is represented in the image below.
    Figure 1: Comparing wordlist steps
  • Step 4: Search for the .LST files saved from the Wordlist analysis.
  • Step 5: Click the ‘create list’ button in the above dialog box.
    Figure 2: Result from compare wordlist

    The above image presents the findings of the compare wordlist analysis for two different text files.

  • Step 6: Save the wordlist or result in .LST format.

Interpreting compare wordlist

The results show the comparison between two wordlist files. The compare wordlist function reports the frequencies of any words that appear in both lists. The word ‘they’ appeared 55 times in file one and 179 times in file two, and it is reported as appearing the most in both files. ‘They’ accounts for 0.5% of all word occurrences in the first transcript and 1.7% in the second. Similarly, ‘music’ appears 239 times in file one and 53 times in file two.

On a frequency basis, the comparison ranks the word as the 7th most common. The number of appearances across both files, on average, therefore matters more than the count in a single file. At the end of the result page, the total number of word-strings common to both wordlists is presented, which in the sample is 373. The accompanying percentage indicates that 22.87% of the total vocabulary is common to both files. The results from compare wordlist are interpreted in this way.
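
The comparison described above can be approximated outside Hamlet II with a short Python sketch. This is not Hamlet II's own algorithm: the function names (word_counts, compare_wordlists), the tokenising rule, and the definition of the shared-vocabulary percentage (shared words divided by all distinct words in either file) are assumptions made for illustration, and the two sample texts are invented.

```python
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count each word (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def compare_wordlists(counts_a, counts_b):
    """Return one row per shared word, the number of shared words,
    and the shared words as a percentage of the combined vocabulary.

    Assumption: "percentage of vocabulary in common" is taken here as
    shared words / distinct words in either file; Hamlet II may define it differently.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    shared = set(counts_a) & set(counts_b)

    rows = []
    for word in shared:
        rows.append({
            "word": word,
            "freq_a": counts_a[word],
            "freq_b": counts_b[word],
            # Share of all word occurrences in each file, in percent.
            "pct_a": 100 * counts_a[word] / total_a,
            "pct_b": 100 * counts_b[word] / total_b,
        })
    # Most frequent across both files first; ties broken alphabetically.
    rows.sort(key=lambda r: (-(r["freq_a"] + r["freq_b"]), r["word"]))

    all_words = set(counts_a) | set(counts_b)
    shared_pct = 100 * len(shared) / len(all_words) if all_words else 0.0
    return rows, len(shared), shared_pct


# Invented sample texts standing in for the two transcripts.
text_a = "they play music and they dance"
text_b = "they like music"
rows, n_shared, shared_pct = compare_wordlists(word_counts(text_a), word_counts(text_b))
```

Each row corresponds to one line of the compare wordlist output: the word, its frequency in each file, and its percentage of each file's total words. The last two return values mirror the summary figures at the end of the result page.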

KWIC or keyword in context

Keyword in context (KWIC) lists the occurrences of a given word or phrase in a text file. KWIC presents each keyword within its line or paragraph: it either displays one or more keywords in the centre of a line of text or displays the keyword within a larger block of text (Brier & Hopp, 2015). Follow these steps for KWIC:

  1. Tools > KWIC > KWIC dialog box.
  2. Now choose the file for KWIC (.TXT file).
  3. Set the words for KWIC. In this example the words chosen are Music, Dance, and Course.
  4. Choose between ‘as phrase’ and ‘listing words separately’.
  5. Set the context length in lines (here, 1). Choose any number suited to the sample text file.
  6. Click the create button, following the numbered steps in the image.
    KWIC
  7. The results appear as follows:
    Results from KWIC

As the image shows, the results differ between ‘as phrase’ and ‘listing words separately’. With ‘as phrase’, the words entered are searched for as one continuous sequence in the text or lines. Listing words separately, by contrast, finds each keyword on its own across different sentences or lines of the text file. The display context setting controls how many lines of text are displayed with each keyword.
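
The difference between the two modes can be illustrated with a minimal Python sketch. It is not how Hamlet II is implemented: the function name kwic, its parameters, and the way the context window is centred on the hit line are all assumptions, and the sample lines are invented.

```python
import re


def kwic(lines, keywords, as_phrase=False, context=1):
    """Return (line_number, search_term, context_text) for every hit.

    as_phrase=True joins the keywords and searches for them as one
    continuous phrase; otherwise each keyword is searched for separately.
    context is the number of lines displayed per hit (assumed to include
    the hit line itself, with any extra lines split before and after it).
    """
    terms = [" ".join(keywords)] if as_phrase else list(keywords)
    compiled = [
        (term, re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE))
        for term in terms
    ]

    extra = max(context - 1, 0)
    before = extra // 2
    after = extra - before

    hits = []
    for i, line in enumerate(lines):
        for term, pattern in compiled:
            # One hit per occurrence of the term in this line.
            for _ in pattern.finditer(line):
                start = max(i - before, 0)
                end = min(i + after + 1, len(lines))
                hits.append((i + 1, term, " ".join(lines[start:end])))
    return hits


# Invented sample lines standing in for a .TXT transcript.
sample = [
    "The music course starts in May.",
    "Students dance to live music.",
    "No keyword here.",
]
separate = kwic(sample, ["music", "course"])
phrase = kwic(sample, ["music", "course"], as_phrase=True)
```

With these sample lines, the separate listing finds ‘music’ and ‘course’ wherever each occurs, while the phrase search only matches the line where the two words appear together in that order.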

Interpreting KWIC

When interpreting KWIC results, it is important to report the total number of keyword hits and the lines or phrases in which they appear, and to understand the context of each line. For the keywords chosen here, ‘music’ returns 221 hits; in other words, the word ‘music’ is used 221 times in total in the text file. Moreover, the word listing and the phrase listing both indicate that ‘music’ is used densely, which suggests it is an important component of the research topic. On average, ‘music’ occurs once in every 8 lines of the text file. ‘Music’ is therefore an important keyword in the research.
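
A figure such as "once in every 8 lines" can be obtained by dividing the number of lines in the file by the number of hits. The sketch below assumes that reading of the statistic; the function name and the sample numbers are hypothetical and are not taken from the Hamlet II output.

```python
def average_lines_per_hit(total_lines, hits):
    """Average number of lines per keyword hit (total lines / hits)."""
    if hits <= 0:
        raise ValueError("the keyword must have at least one hit")
    return total_lines / hits


# Hypothetical numbers: a 40-line file in which the keyword occurs 5 times.
spacing = average_lines_per_hit(total_lines=40, hits=5)
```

A smaller value means the keyword is used more densely in the text.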

Text profile

Text profile displays a profile of the distributions of word and sentence lengths for a given text file (Brier & Hopp, 2015). This analysis presents mean word and sentence lengths, with dispersion statistics and a histogram for each distribution.

Follow these steps for text profile:

  1. Tools > Text profile > pop-up box.
  2. Choose the file.
  3. Set the characters, if any other than the default.
  4. Click the create button.
    Steps to create text profile
  5. Save the file or result.
    Result from text profile

Interpreting text profile

The results comprise dispersion statistics and a histogram for each distribution. The distribution is based on the wordlist. Text profiling found a total of 11456 words. On average, a word in the text file is 4 letters long; the longest word has 14 letters and the shortest has 1. The reported standard deviation of 6.31 indicates the variation in word length. The histogram shows that the majority of words have either 2 or 4 letters, followed by 3-letter and 5-letter words.

The text file contains 957 sentences in total. The longest sentence has 83 words, whilst the shortest has 2. On average, a sentence contains 11 words, with a standard deviation of ±9.96 words around that average.
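
The statistics reported in a text profile can be reproduced approximately with Python's standard library. The sketch below is only an illustration: the function names, the rule for splitting sentences (on full stops, exclamation marks and question marks), and the use of the population standard deviation are assumptions, and Hamlet II may count words and sentences differently.

```python
import re
import statistics
from collections import Counter


def profile(values):
    """Summarise a list of lengths: count, mean, dispersion, range, histogram."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Population standard deviation; an assumption about the statistic used.
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        # Histogram as {length: number of items with that length}.
        "histogram": dict(sorted(Counter(values).items())),
    }


def text_profile(text):
    """Return (word-length profile, sentence-length profile) for a text."""
    word_pattern = r"[A-Za-z']+"
    word_lengths = [len(w) for w in re.findall(word_pattern, text)]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words_per_sentence = [len(re.findall(word_pattern, s)) for s in sentences]
    return profile(word_lengths), profile(words_per_sentence)


# Invented sample text standing in for the transcript.
sample = "Music is fun. They dance and they sing!"
word_profile, sentence_profile = text_profile(sample)
```

The first profile corresponds to the word-length figures (mean letters per word, shortest and longest word, histogram of lengths) and the second to the sentence-length figures (number of sentences, mean words per sentence, shortest and longest sentence).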

This article presented methods to perform KWIC, compare wordlists and graphical presentation of wordlists, along with their interpretations. The next article covers the creation of vocabulary, comparison and joint frequencies.

References
