The previous article presented the steps for running a Wordlist analysis in Hamlet II. This article presents the steps for comparing wordlists, keyword in context (KWIC) and text profiling.
Wordlist displays a sorted list of the words found in a text file. To compare wordlists:
- Step 1: Repeat the wordlist analysis described in the previous article for two different text files or transcripts.
- Step 2: Save each wordlist in .LST file format.
- Step 3: Compare the wordlists by:
- Clicking on ‘Tools’.
- Then on ‘Compare’.
- Selecting the files you want to compare. This step is represented in the image below.
- Step 4: Search for the .LST files saved from the wordlist analysis.
- Step 5: Choose the ‘create list’ button in the above dialog box.
The image above presents the findings of the compare wordlist analysis for the two text files.
- Step 6: Save the wordlist or result in .LST format.
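For readers who want to see what such a comparison involves, the idea can be sketched outside Hamlet II in a few lines of Python. This is only an illustrative sketch, not Hamlet II's implementation: the function names `word_frequencies` and `compare_wordlists` and the simple tokenisation rule are assumptions, and it works on raw text rather than on .LST files.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case (assumed tokenisation)."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def compare_wordlists(freq_a, freq_b):
    """Return (word, count_a, count_b) for words present in both lists,
    sorted by combined frequency, highest first."""
    common = set(freq_a) & set(freq_b)
    rows = [(w, freq_a[w], freq_b[w]) for w in common]
    return sorted(rows, key=lambda r: (-(r[1] + r[2]), r[0]))


# Two tiny made-up transcripts, for illustration only.
text_one = "They like music. They play music every day."
text_two = "They dance. They sing. They say music helps."

rows = compare_wordlists(word_frequencies(text_one), word_frequencies(text_two))
for word, count_one, count_two in rows:
    print(f"{word:<10}{count_one:>5}{count_two:>5}")
```

Only words that occur in both texts are listed, each with its count in the first and second text, which mirrors the layout of a compare wordlist result.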
Interpreting compare wordlist
The results show the comparison between the two wordlist files. The compare wordlist function assesses the frequencies of any words that appear in both lists. The word ‘they’ appeared 55 times in file one and 179 times in file two, and appeared the most in both files. ‘They’ contributes 0.5% of the total word appearances in the first transcript and 1.7% in the second. Similarly, ‘music’ appears 239 times in file one and 53 times in file two.
On a frequency basis, the comparison ranks this word as the 7th most common. Here, the average number of appearances across both files matters more than the count in a single file. At the end of the result page, the total number of word-strings common to both wordlists is presented, which in the sample is 373. The accompanying percentage indicates that 22.87% of the total vocabulary is common to both files. This is how the results of compare wordlist are interpreted.
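The share of common vocabulary can be illustrated with a small calculation. The sketch below assumes the percentage is the number of common word-strings divided by the number of distinct word-strings across both lists. The article does not state how Hamlet II defines the denominator, so this is one plausible reading only, and the helper name `shared_vocabulary_percent` is hypothetical.

```python
def shared_vocabulary_percent(words_a, words_b):
    """Percentage of distinct words (across both lists) found in both lists.

    Assumption: the denominator is the combined vocabulary of the two files.
    """
    vocab_a, vocab_b = set(words_a), set(words_b)
    combined = vocab_a | vocab_b
    if not combined:
        return 0.0
    return 100 * len(vocab_a & vocab_b) / len(combined)


# Made-up vocabularies, for illustration only.
file_one = ["they", "music", "dance", "play"]
file_two = ["they", "music", "course", "sing", "say", "helps"]
print(round(shared_vocabulary_percent(file_one, file_two), 2))  # 2 of 8 words -> 25.0

# Under the same assumption, 373 common word-strings at 22.87%
# would imply a combined vocabulary of roughly:
print(round(373 / 0.2287))  # -> 1631
```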
KWIC or keyword in context
Keyword in context (KWIC) lists the occurrences of a given word or phrase in a text file. KWIC presents each keyword within its line or paragraph. It can present one or more keywords in the centre of the surrounding text, or display the keyword within a larger block of text (Brier & Hopp, 2015). Follow these steps for KWIC:
- Tools > KWIC > KWIC dialog box.
- Now choose the file for KWIC (.TXT file).
- Set the words for KWIC. In this example, the words chosen are Music, Dance, and Course.
- Choose between ‘as phrase’ and ‘listing words separately’.
- Set the context length in lines (1 is chosen here). Choose any number that suits the text sample file.
- Click the create button, following the numbered steps in the image.
- The results appear as shown in the image.
As the image shows, the results differ between ‘as phrase’ and ‘listing words separately’. With ‘as phrase’, several consecutive words are searched for together as one phrase in the lines of the text. Listing words separately finds each keyword on its own, in whichever sentences or lines of the text file it occurs. The display context setting controls how many lines of the text file are displayed for each keyword hit.
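The difference between the two modes can be sketched in Python. This is a rough illustration rather than Hamlet II's own algorithm: the function `kwic`, its parameters, the whole-word matching rule and the meaning of `context` (lines shown before and after the hit) are all assumptions made for the example.

```python
import re


def kwic(lines, keywords, as_phrase=False, context=1):
    """Return (line_number, term, context_block) for each line containing a term.

    as_phrase=True  -> the keywords are joined and searched as one phrase.
    as_phrase=False -> each keyword is searched for separately.
    context         -> lines shown before and after the hit line (assumed meaning).
    """
    terms = [" ".join(keywords)] if as_phrase else list(keywords)
    # Whole-word, case-insensitive matching (an assumption of this sketch).
    patterns = [
        (t, re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)) for t in terms
    ]
    hits = []
    for i, line in enumerate(lines):
        for term, pattern in patterns:
            if pattern.search(line):
                start = max(0, i - context)
                block = " / ".join(lines[start:i + context + 1])
                hits.append((i + 1, term, block))
    return hits


# Made-up sample lines, for illustration only.
lines = [
    "The course begins with music theory.",
    "Students then study dance.",
    "Music and dance are taught together.",
    "The music dance course runs all year.",
]
keywords = ["music", "dance", "course"]

separate = kwic(lines, keywords, as_phrase=False, context=0)
phrase = kwic(lines, keywords, as_phrase=True, context=1)
print(len(separate), "hits when listing words separately")
print(len(phrase), "hit(s) as a phrase, on line", [h[0] for h in phrase])
```

Note that this sketch records one hit per line for each term, so a keyword repeated within a single line is counted once.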
When interpreting the results, it is important to report the total number of keyword hits, the lines or phrases in which they appear, and the context of each line. Here, the keyword ‘music’ returns 221 hits; in other words, ‘music’ is used 221 times in total in the text file. The word listing and phrase results indicate that the density of use of ‘music’ is high, making it an important component of the research topic. On average, ‘music’ occurs once in every 8 lines of the text file. ‘Music’ is therefore an important keyword in the research.
Text profile displays a profile of the distributions of word and sentence lengths for a given text file (Brier & Hopp, 2015). This analysis presents the mean word and sentence lengths, with dispersion statistics and a histogram for each distribution.
Follow these steps for text profile:
- Tools > Text profile > pop-up box.
- Choose the file.
- Set the characters, if any other than the default.
- Click the create button.
- Save the file or result.
- The result from the text profile then appears.
Interpreting text profile
The results comprise dispersion statistics and a histogram for each distribution. The distribution is based on the wordlist. The text profile counts 11456 words in total. On average, each word in the text file is 4 letters long; the longest word has 14 letters and the shortest has 1. The standard deviation of 6.31 indicates the variation in word length. The histogram shows that the majority of words are 2 or 4 letters long, followed by 3-letter and 5-letter words.
The text file contains 957 sentences in total. The longest sentence has 83 words, whilst the shortest has 2 words. On average, a sentence contains 11 words, with a standard deviation of ±9.96 words.
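As a rough illustration of what a text profile computes, the following Python sketch derives the same kinds of statistics: word and sentence counts, mean lengths, standard deviations and a simple text histogram of word lengths. The tokenisation and sentence-splitting rules, the use of the population standard deviation and the function name `text_profile` are assumptions; Hamlet II's own rules (for example, which characters count as part of a word) may differ.

```python
import re
import statistics
from collections import Counter


def text_profile(text):
    """Summarise word lengths (in letters) and sentence lengths (in words)."""
    # Assumed rule: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    sentence_lengths = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    word_lengths = [len(w) for w in re.findall(r"[A-Za-z']+", text)]
    return {
        "words": len(word_lengths),
        "sentences": len(sentence_lengths),
        "mean_word_length": statistics.mean(word_lengths),
        # Population standard deviation; the tool may use the sample version.
        "sd_word_length": statistics.pstdev(word_lengths),
        "mean_sentence_length": statistics.mean(sentence_lengths),
        "sd_sentence_length": statistics.pstdev(sentence_lengths),
        "word_length_histogram": Counter(word_lengths),
    }


# Made-up sample text, for illustration only.
sample = "They like music. They play music every day. Dance is fun!"
profile = text_profile(sample)
print(profile["words"], "words in", profile["sentences"], "sentences")
for length, count in sorted(profile["word_length_histogram"].items()):
    print(f"{length:>2} {'*' * count}")
```

Each histogram row shows a word length followed by one asterisk per word of that length.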
This article presented methods to compare wordlists, perform KWIC and produce a text profile, along with their interpretations. The next article covers the creation of vocabulary, comparison and joint frequencies.
- Brier, A., & Hopp, B. (2015). HAMLET II 3.0: Software for computer assisted text analysis. Southampton/Cologne. Retrieved from http://apb.newmdsx.com/hamlet2.html.