Searching for collocates

Overwhelmed by the sheer quantity of evidence?

When you are dealing with frequent words, you can easily find yourself having to sift through hundreds of concordance lines to understand how these words are used in context.

Sorting can help, but there is an even more practical way of showing corpus results in a compact form: the collocation display.

Collocates of sheer

Imagine that you want to investigate the use of the adjective sheer on the web (e.g., for a lexicography project). You would probably start by looking at the nouns it typically modifies. But the Leeds Internet Corpus contains 947 occurrences of sheer, most of them followed by nouns, and you do not have time to go through them all in detail. What can you do?

To get a quick first impression, you can use the collocation facility.

Go to the Leeds Internet Corpus;
Type the word sheer into the search box;
Tick the "Compute collocation statistics" and the "Loglikelihood score" check boxes;
Submit the query as you would in a normal search for concordances.
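A collocation display starts by counting which words occur next to the node word. The sketch below is a simplified, hypothetical illustration in Python. It counts only the word immediately to the right of sheer (position R1), which fits the Adjective + Noun pattern. It does not reproduce the Leeds interface, which may rely on part-of-speech tags or a wider context window. The function name `right_collocates` and the sample tokens are invented for this example.

```python
from collections import Counter


def right_collocates(tokens, node):
    """Count the words occurring immediately to the right (R1) of `node`."""
    counts = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i].lower() == node:
            counts[tokens[i + 1].lower()] += 1
    return counts


# Hypothetical sample; a real corpus would have many more tokens
tokens = "the sheer scale and the sheer joy of sheer luck and sheer joy".split()
print(right_collocates(tokens, "sheer").most_common())
```

Raw frequencies like these favour very common words, which is why the collocation facility ranks candidates with a statistical score rather than by frequency alone.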

Instead of returning a concordance, the programme should have produced a list of Adjective (sheer) + Noun sequences, ordered according to a statistical measure of "relatedness" known as log-likelihood.
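In corpus tools, "log-likelihood" usually refers to Dunning's G² statistic, computed from a 2×2 contingency table: G² = 2 Σ O ln(O / E), where each expected frequency E is the row total times the column total divided by N. Whether the Leeds interface uses exactly this formulation is an assumption. The Python function below is a minimal sketch, and the frequencies in the usage line are invented for illustration.

```python
import math


def log_likelihood(f_xy, f_x, f_y, n):
    """Dunning's log-likelihood (G2) for a word pair.

    f_xy: how often node and collocate occur together (e.g. "sheer joy")
    f_x:  frequency of the node word (e.g. "sheer")
    f_y:  frequency of the collocate (e.g. "joy")
    n:    total number of tokens (or word pairs) in the corpus
    """
    observed = [
        [f_xy, f_x - f_xy],                  # node present: with / without collocate
        [f_y - f_xy, n - f_x - f_y + f_xy],  # node absent: with / without collocate
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / n
            if o > 0:  # by convention, 0 * ln(0) contributes nothing
                g2 += o * math.log(o / e)
    return 2 * g2


# Hypothetical frequencies, not taken from the Leeds Internet Corpus
print(round(log_likelihood(120, 947, 5000, 1_000_000), 1))
```

G² is also high when a pair occurs less often than expected. A ranking of collocates would therefore normally keep only pairs whose observed frequency exceeds the expected one.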

The collocates seem to fall into two rather clear sets.

Taking the first 20 entries in the list, can you assign the nouns to the two sets?

How could you label each set in terms of its semantic preference?

When you have finished your analysis, compare it with the suggested answer below.


Abstract nouns (esp. referring to emotions)    Quantity nouns
joy                                            volume
luck                                           number
terror                                         size
stupidity                                      scale
pleasure                                       magnitude
force                                          quantity
willpower                                      amount
nonsense                                       weight
coincidence
incompetence
complexity