Searching for collocates
When you are dealing with frequent words, you can easily find yourself sifting through hundreds of concordance lines to make sure you understand how these words are used in context.
Sorting can help, but there is an even more practical way of showing corpus results in a compact form: the collocation display.
Imagine that you want to investigate the use of the adjective sheer on the web (e.g., for a lexicography project). You would probably start by looking at which nouns it typically modifies. But the Leeds Internet Corpus has 947 occurrences of sheer, most of them followed by nouns. You do not have time to go through them all in detail, so what can you do?
To get a quick first impression, you can use the collocation facility.
- Go to the Leeds Internet Corpus;
- Type in the word sheer in the search box;
- Tick the "Compute collocation statistics" and the "Loglikelihood score" check boxes;
- Submit the query as you would in a normal search for concordances.
Instead of returning a concordance, the programme should have produced a list of Adjective (sheer) + Noun sequences, ordered according to a statistical measure of "relatedness" known as log-likelihood.
- The collocates seem to fall into two rather clear sets.
- Taking the first 20 entries in the list, can you assign the nouns to the two sets?
- How could you label each set in terms of its semantic preference?
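To get a feel for what the log-likelihood ranking does, here is a minimal sketch in Python. It assumes the score is the standard log-likelihood ratio (G2) computed from a 2x2 contingency table of observed against expected frequencies; the formula actually used by the corpus interface may differ in detail. The function names, the corpus size and the collocate counts (noun_a, noun_b, noun_c) are invented for illustration; only the figure of 947 occurrences of sheer comes from the text above.

```python
import math


def log_likelihood(cooc, node_freq, colloc_freq, corpus_size):
    """Log-likelihood ratio (G2) for a node word and one candidate collocate.

    cooc:        how often node and collocate occur together
    node_freq:   total frequency of the node word (e.g. sheer)
    colloc_freq: total frequency of the candidate collocate
    corpus_size: total number of positions counted in the corpus
    """
    # Observed cells of the 2x2 table: both words, node only,
    # collocate only, neither.
    observed = [
        cooc,
        node_freq - cooc,
        colloc_freq - cooc,
        corpus_size - node_freq - colloc_freq + cooc,
    ]
    # Expected cells if the two words were statistically independent.
    expected = [
        node_freq * colloc_freq / corpus_size,
        node_freq * (corpus_size - colloc_freq) / corpus_size,
        (corpus_size - node_freq) * colloc_freq / corpus_size,
        (corpus_size - node_freq) * (corpus_size - colloc_freq) / corpus_size,
    ]
    # Empty observed cells contribute nothing to the sum.
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def rank_collocates(node_freq, corpus_size, candidates):
    """Sort candidate collocates by descending log-likelihood score."""
    scored = [
        (word, log_likelihood(cooc, node_freq, freq, corpus_size))
        for word, (cooc, freq) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts: collocate -> (co-occurrences with the node,
# total frequency of the collocate). The corpus size is also invented.
toy_candidates = {
    "noun_a": (60, 500),
    "noun_b": (60, 50_000),
    "noun_c": (5, 4_000),
}
ranked = rank_collocates(947, 1_000_000, toy_candidates)
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

In this toy data noun_a and noun_b co-occur with the node equally often, but noun_a is much rarer overall, so its co-occurrences are far more surprising and it receives the higher score. This is why a collocation list ordered by log-likelihood can differ from one ordered by raw frequency.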