Joint frequencies analysis using Hamlet II

By Avishek Majumder & Priya Chetty on September 26, 2018

The previous article explained how to perform wordlist comparison, KWIC and text profiling using Hamlet II. This article focuses on the creation and editing of a vocabulary list. It also explains how to perform joint frequencies analysis using Hamlet II.

Joint frequencies analysis helps to identify inter-connections between a number of keywords or character strings occurring in the text. It produces matrices of joint frequencies of the items of a specified vocabulary list with respect to a suitably chosen unit of context. After calculating a matrix of joint frequencies of all pairs of main entries in the vocabulary list, Hamlet II offers a simple cluster analysis procedure and a correspondence analysis to explore word and category associations. However, the first step in performing joint frequencies analysis is to create a vocabulary list.
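To make the idea of a joint frequency matrix concrete, the following is a minimal Python sketch of the counting logic, not Hamlet II's own implementation. It assumes that each context unit is one short string and that an item "occurs" in a unit if it appears as a whole lower-cased word. The function name `joint_frequencies` and the sample sentences are illustrative only.

```python
from collections import Counter
from itertools import combinations


def joint_frequencies(context_units, vocabulary):
    """Count the context units containing each item and each pair of items."""
    vocab = sorted(set(vocabulary))
    counts = Counter()  # number of units in which each item occurs
    joint = Counter()   # number of units in which both items of a pair occur
    for unit in context_units:
        words = set(unit.lower().split())
        present = [item for item in vocab if item in words]
        counts.update(present)
        # Pairs are stored in alphabetical order, e.g. ("dance", "music").
        joint.update(combinations(present, 2))
    return counts, joint


# Hypothetical context units (here, one sentence each).
units = [
    "music and dance filled the hall",
    "the dance ended late",
    "music played while the dance went on",
]
counts, joint = joint_frequencies(units, ["music", "dance", "hall"])

for pair, value in sorted(joint.items()):
    print(pair, value)
```

In this sketch, changing the unit of context (for example, whole paragraphs instead of sentences) changes the joint counts, which is why the choice of context unit matters for the analysis.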

Creation of vocabulary list

Before conducting a joint frequencies analysis, it is important to check the text file or transcript for spelling mistakes. If a word occurs multiple times in the content but with a different spelling each time, Hamlet II will interpret each spelling as a different word. Therefore, it is important to first create a ‘vocabulary list’ in Hamlet II to group the misspelt variants together before conducting the joint frequencies analysis.

Follow the steps below to create a vocabulary list:
  1. In Hamlet II, go to Tools > Create a vocabulary list (steps 1 and 2 in the figure below).
  2. Set the number of characters or letters for keyword analysis. The ‘maximum number of words in the search list’ option sets the number of words for the vocabulary list (step 3 in the figure below).
  3. Create a new file with the .VOC file extension and save it (steps 4 & 5 in the figure below).
Figure 1: Steps for creating a vocabulary list on Hamlet II
Figure 2: The two sections of a vocabulary list on Hamlet II

The vocabulary list comprises two sections. The main section contains words or keywords from the transcript or the text file, whereas the other section contains their synonyms or related words. For instance, if the main entry is ‘music’, its related entries could be musical, music, Music, and Musik.
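The relationship between main entries and related entries can be sketched in Python as a simple lookup table. This is only an illustration of the idea. It does not reproduce the .VOC file format, and the names `build_lookup` and `normalise` are hypothetical.

```python
# Main entries mapped to their related entries (synonyms, spelling variants).
vocabulary = {
    "music": ["musical", "music", "Music", "Musik"],
}


def build_lookup(vocabulary):
    """Map every related entry (and the main entry itself) to its main entry."""
    lookup = {}
    for main_entry, related in vocabulary.items():
        lookup[main_entry] = main_entry
        for variant in related:
            lookup[variant] = main_entry
    return lookup


def normalise(tokens, lookup):
    """Replace each known variant with its main entry; keep other tokens as-is."""
    return [lookup.get(token, token) for token in tokens]


lookup = build_lookup(vocabulary)
tokens = "The Musik was musical".split()
normalised = normalise(tokens, lookup)
print(normalised)  # ['The', 'music', 'was', 'music']
```

Once variants are mapped to one main entry in this way, all of them are counted together rather than as separate words.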

Editing a vocabulary list

  1. In Hamlet II, go to Tools > Edit Vocabulary list > Edit popup box.
  2. Choose the .VOC file for editing.
  3. Edit the main entries and related entries as required.
Figure 3: Editing a vocabulary list

How to perform joint frequencies analysis on Hamlet II?

  1. In Hamlet II, go to Tool Bar > Hamlet (step 1 in the figure below).
  2. Choose the appropriate .TXT file and the .VOC file created earlier (step 2 in the figure below).
  3. Click on ‘Count joint frequencies for the specified vocabulary’ (step 3 in the figure below).
    Figure 4: Steps 1, 2 & 3 of conducting joint frequencies analysis using Hamlet II
  4. After choosing the files, the vocabulary list appears along with a confirmation window.
    Figure 5: Step 5 of conducting joint frequencies analysis on Hamlet II
  5. Click on ‘Count joint frequencies for the specified vocabulary’ as shown in the figure above.
  6. Lastly, choose the appropriate options. Most importantly, choose the variable context, without which it is not possible to perform the joint frequencies analysis.
    Figure 6: Step 6 of conducting joint frequencies analysis on Hamlet II
  7. The results show the category or word counts, the joint frequencies and the chosen coefficients (figure below). Instead of the default coefficient, one can also choose the Sokal coefficient or the van Eck/Waltman coefficient.
    Figure 7: Step 7 of joint frequencies analysis using Hamlet II

The results depend on the coefficient of similarity applied to the raw frequency counts. The Jaccard coefficient, applied by default, is suitable for most purposes. The Sokal coefficient also takes account of joint non-occurrences. Lastly, van Eck and Waltman’s test provides a probabilistic measure.
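The difference between the first two coefficients can be illustrated with a short Python sketch. The Jaccard formula below is the standard one. For the Sokal coefficient, the sketch assumes the Sokal-Michener simple matching coefficient, which may not be exactly the formula Hamlet II applies. The counts are invented for illustration, and van Eck and Waltman's measure is left out because its exact form in Hamlet II is not described here.

```python
def jaccard(both, only_first, only_second):
    """Share of units containing at least one item that contain both items."""
    union = both + only_first + only_second
    return both / union if union else 0.0


def simple_matching(both, only_first, only_second, neither):
    """Sokal-Michener simple matching (assumed here for the Sokal coefficient).

    Unlike Jaccard, units containing neither item also count as agreement.
    """
    total = both + only_first + only_second + neither
    return (both + neither) / total if total else 0.0


# Hypothetical counts over 100 context units for one pair of entries.
both, only_first, only_second, neither = 5, 5, 10, 80

j = jaccard(both, only_first, only_second)
s = simple_matching(both, only_first, only_second, neither)
print(f"Jaccard: {j:.2f}")          # Jaccard: 0.25
print(f"Simple matching: {s:.2f}")  # Simple matching: 0.85
```

With these counts the two entries rarely occur together, yet the simple matching value is high because most units contain neither entry. This shows how much counting joint non-occurrences can change the result.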

Interpreting joint frequencies analysis using Hamlet II

By default, the results of the joint frequencies analysis are presented as the ‘Jaccard similarity index’. The Jaccard similarity index compares the members of two sets to see which members are shared and which are distinct. It is a measure of similarity between the two sets, with a range from 0% to 100%. The higher the percentage, the more similar the two sets.

For instance, if the two entries of a pair appear 7 times and 16 times respectively, a Jaccard similarity index of 23% describes the share of their total appearances in which the two occur together. Similar implications can be drawn from the ‘Sokal coefficient’ and ‘van Eck and Waltman’s coefficient’. The joint frequencies in turn form the basis for correspondence analysis, cluster analysis, multi-dimensional scaling, and multiple text comparisons.

Importance of joint frequencies

Joint frequencies analysis is useful for a number of reasons.

  1. It helps assess inter-connections between a number of keywords or, more generally, character strings.
  2. Joint frequencies analysis also produces a matrix of joint frequencies of all pairs of main entries in the vocabulary list.
  3. It allows exploratory single-linkage cluster analysis, and also a correspondence analysis of the profiles of the context units.
  4. Lastly, it allows analysis of the similarity matrix through multidimensional scaling techniques and cluster analysis. The findings are presented in a dendrogram, with clusters formed on the basis of minimum similarity.
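The idea of forming clusters on the basis of minimum similarity can be sketched in Python as follows. This is a simplified illustration of single-linkage clustering at one similarity threshold, not Hamlet II's procedure. The function name `single_linkage_clusters` and the similarity values are hypothetical.

```python
def single_linkage_clusters(items, similarity, min_similarity):
    """Group items linked, directly or through a chain, by pairs whose
    similarity is at least min_similarity (single linkage at one cut level)."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (first, second), value in similarity.items():
        if value >= min_similarity:
            parent[find(first)] = find(second)  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical similarity values for pairs of main entries.
items = ["music", "dance", "hall", "tax"]
similarity = {
    ("music", "dance"): 0.60,
    ("dance", "hall"): 0.40,
    ("music", "hall"): 0.10,
    ("hall", "tax"): 0.05,
}

clusters = single_linkage_clusters(items, similarity, 0.30)
print(clusters)  # [['dance', 'hall', 'music'], ['tax']]
```

Here ‘music’ and ‘hall’ end up in the same cluster even though their own similarity is low, because both are linked to ‘dance’. Raising the threshold splits clusters apart, which corresponds to reading a dendrogram at a different level.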

This article focused on the steps for creating and editing a vocabulary list, which make joint frequencies analysis possible. The next article discusses how the results of the joint frequencies analysis are used in correspondence analysis, cluster analysis and other multidimensional analyses.
