Lexical priming and the properties of text

© Michael Hoey

University of Liverpool

This paper is divided into two very uneven parts. In the first, I want to briefly sketch out a theory of language, first proposed in Bertinoro at TALC (July 2002) but expanded upon here. In the second, I want to look at a short piece of popular scientific writing and to comment on the way the lexis contributes to its key textual features.

First, the theory. The classical theory of the word is well reflected in the two central compendia of linguistic scholarship of the 19th century - the Oxford English Dictionary and Roget’s Thesaurus. According to such texts, lexical items have pronunciation, grammar(s), meaning(s), etymology, and relationships with other words, especially synonymous and co-hyponymous relations. The theory of language to which the dictionary and the thesaurus contribute is a theory of the lexical item as an isolated element organised by syntax, realised by phonology, and latterly cross-referenced by text. Corpus linguistics has, however, shown such a theory to be suspect, and the first great feature standing against it is collocation.

Collocations are both pervasive and subversive. Their pervasiveness has often been noted, both from the perspective of the lexical item – probably all lexical items have collocations (Sinclair, 1991, Stubbs, 1996) – and from the perspective of the sentence – it is possible to show that whole clauses are made up of interlocking collocations such that the sentence could be said to be a reproduction with variation of an earlier sentence (Hoey 2002). In such cases it is not simply the case that the sentence uses collocation; the sentence only exists because of the collocations it manifests.

The subversiveness of collocation has rarely been noted, but it is as important a property as its pervasiveness and stems from it. The ubiquity of collocation challenges current theories of language because it demands explanation, and the only explanation that seems to account for the existence of collocation is that each lexical item is primed for collocational use. By primed, I mean that as the word is learnt through encounters with it in speech and writing, it is loaded with the cumulative effects of those encounters such that it is part of our knowledge of the word that it co-occurs with other words.
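The idea of a word being loaded with the cumulative effects of encounters can be pictured, very roughly, as a running tally of the words found near it. The following Python sketch is offered only as an illustration: the function name collocates, the four-word window and the three sample sentences are assumptions made for the example, not part of the theory or of any corpus used in this paper.

```python
from collections import Counter
import re

def collocates(texts, node, window=4):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        # Crude tokenisation: lower-cased runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == node:
                lo = max(0, i - window)
                # Left and right context, excluding the node word itself.
                context = tokens[lo:i] + tokens[i + 1:i + 1 + window]
                counts.update(context)
    return counts

# Invented "encounters" with the word result (illustrative data only).
encounters = [
    "It was a good result for the team.",
    "The result was a good one.",
    "A good result is expected.",
]
counts = collocates(encounters, "result")
print(counts.most_common(3))
```

Each further encounter simply adds to the tally, which is one way of reading the claim that priming is cumulative; raw counts of this kind are of course far cruder than the priming described above.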

Collocational priming is not a permanent feature of the word. Each use we make of the word, and each new encounter, either reinforces the priming or loosens it, if we use it in defiance of the priming. It may accordingly shift in the course of an individual’s life-time, and if it does so, and to the extent that it does so, the lexical item shifts slightly in meaning and/or function. This may be referred to as drift in the priming.

Collocational priming is sensitive to the domain in which the lexical item is encountered. Part of our knowledge of a lexical item is that it is used in certain combinations in certain kinds of text. So the phrase in winter is primed for use in travel writing whereas the phrase during the winter months, which means more or less the same thing, is primed for use in gardening writing.
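Domain sensitivity of this kind could in principle be checked by comparing how often a phrase occurs in texts from different domains. The sketch below is a minimal illustration under stated assumptions: the function phrase_rate_by_domain, the domain labels and the two sample texts are all invented for the example, and matching is by simple substring search rather than proper tokenisation.

```python
from collections import defaultdict

def phrase_rate_by_domain(docs, phrase):
    """Occurrences of `phrase` per 1,000 words, computed separately per domain."""
    hits = defaultdict(int)
    words = defaultdict(int)
    for domain, text in docs:
        lowered = text.lower()
        # Substring matching is a simplification (assumption of this sketch).
        hits[domain] += lowered.count(phrase.lower())
        words[domain] += len(lowered.split())
    return {d: 1000 * hits[d] / words[d] for d in words if words[d]}

# Invented mini-corpus with hypothetical domain labels.
docs = [
    ("travel", "Visit Prague in winter. In winter the city is quiet."),
    ("gardening", "Protect tender plants during the winter months."),
]
rates_in_winter = phrase_rate_by_domain(docs, "in winter")
rates_months = phrase_rate_by_domain(docs, "during the winter months")
print(rates_in_winter, rates_months)
```

With a real corpus the comparison would need far more text per domain than this toy example provides.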

If we accept this position, and I see no alternative explanation of how collocations come into being, it becomes possible to argue that lexical priming accounts for much more than just collocations. Everything we know about a word is a product of our encounters with it; everything we know about a word can be formulated in terms of its priming. So the grammatical category a word belongs to can, for example, be seen as its grammatical priming. Instead of saying breakfast is a noun, unemployed is an adjective, I would argue that one should say ‘breakfast is primed for use as a noun’, ‘unemployed is primed for use as an adjective’. Just as it is possible to use a lexical item without recourse to its collocational priming, so one can use a lexical item without recourse to its grammatical priming (or more accurately with recourse to its less dominant grammatical priming), e.g. Are you going to breakfast here? The unemployed are always with us – or even you have unemployed me. In a recent charity appeal letter from ITDG, the following sentence appears:

If your supporter number ends in “D”, you already Gift Aid your donations.

Ignoring the temptations offered by supporter and D, let me focus on Gift Aid. The capitals reflect the phrase’s grammatical priming as nominal group, but the point about priming is that it can be ignored, and here it is indeed being ignored. Of course if every subsequent charity appeal includes a similar sentence, then the phrase will become primed for use as (part of) a verbal group in the domain of charity appeals and the genre of appeal letters. Priming, as already remarked, never stops and consequently is open to drift.

Some words are so primed that grammatical categories cannot be applied conveniently to them. Sinclair (1991) has argued, for example, that of is not a true preposition, and I have never seen a grammatical account of ago that convinced me. What we call grammatical categories may be post-hoc generalisations from the individual instances of lexical primings.

So, I am arguing, lexical items are collocationally and grammatically primed. If we accept this, i.e. if we accept that learning a lexical item entails learning what it occurs with and what grammar it tends to have, it opens the door to other types of priming. Indeed it opens the door to a more radical view of priming as an account of how language is constructed. In the first place, lexical items are also primed for semantic association. Semantic association occurs when a word associates with a semantic set or class, some members of which will normally also be collocates. The existence of those collocates will in part explain the existence of, and in part be explained by their membership of, the semantic set or class in question.

An example of this kind of priming is the lexical item undercover, which has a semantic association with ‘official (representative of) organisation with the role of defending or attacking something’. Thus we attest undercover action by a range of recognisable collocates of undercover, e.g. agents, cops, detectives, police, SAS. But also in the set we have instances such as the Automobile Association, Alpha 66 (a paramilitary group) and the Special Operations Group (the plain clothes branch of the RSPCA). Preliminary research strongly supports the idea of semantic association, and under different labels and with varying but related meanings (semantic prosody, semantic preference) it has received attention from Louw (1993), Stubbs (1996), Sinclair (1997) and myself (Hoey, 1997).

Another, very important, kind of lexical priming is that of colligation; lexical items are primed to occur in, or avoid, certain grammatical functions or structures. The term comes from Firth (1957), who introduced the term as part of a pair with collocation, and Halliday (1959). Unlike its more famous sibling, colligation has received little attention until the past five years when it was independently resurrected by Sinclair (1996, 1997) and myself (1997, 1998) in very similar ways. My definition, which does not differ greatly from Sinclair’s but does vary somewhat from Firth’s original conception, is as follows. Colligation, for my purposes, can be defined as

•  the grammatical company a word keeps, or avoids keeping, either within its own group or at a higher rank;
•  the grammatical functions that a word’s group prefers or avoids;
•  the place in a sequence that a word prefers or avoids.


Thus undercover agent, identified above both as a collocation and as an illustration of a particular semantic association, is primed to colligate with indefiniteness. Only one in five instances of undercover agent are definite. (This also illustrates the important fact that primings nest; indefiniteness is not necessarily a priming of undercover, but it is a priming of the collocating pair undercover agent.) It should be noted that the above priming could have been formulated negatively: undercover agent has a negative preference for definiteness. All primings can be formulated in terms of avoidance as well as preference, but it is usually only the colligational and textual primings where such formulations are economical.
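A proportion such as "one in five definite" can be obtained from concordance lines by inspecting the word immediately before each occurrence of the phrase. The sketch below shows one simple way this might be done; the function definite_share, the list of definite determiners and the five concordance lines are assumptions invented for illustration, and the figures they yield are not the corpus findings reported above.

```python
import re

# Determiners treated as marking definiteness (an assumption of this sketch).
DEFINITE = {"the", "this", "that", "these", "those",
            "his", "her", "its", "their", "our", "my", "your"}

def definite_share(lines, phrase):
    """Return (definite, total) counts for occurrences of `phrase` in `lines`."""
    # Optionally capture the word immediately before the phrase; allow a plural -s.
    pattern = re.compile(r"(?:(\w+)\s+)?" + re.escape(phrase) + r"s?\b",
                         re.IGNORECASE)
    definite = total = 0
    for line in lines:
        for match in pattern.finditer(line):
            total += 1
            previous = match.group(1)
            if previous is not None and previous.lower() in DEFINITE:
                definite += 1
    return definite, total

# Invented concordance lines (illustrative only, not corpus data).
lines = [
    "He was approached by an undercover agent in Miami.",
    "The undercover agent gave evidence.",
    "Undercover agents posed as buyers.",
    "She worked as an undercover agent for years.",
    "Police said an undercover agent made the purchase.",
]
definite, total = definite_share(lines, "undercover agent")
print(definite, "of", total, "instances are definite")
```

The check looks only at the word directly preceding the phrase, so a case such as the young undercover agent would be miscounted; a fuller treatment would need grammatical annotation.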

I mentioned a moment ago ‘textual primings’. The third element of my definition of colligation in the previous paragraph in fact slipped in yet another posited priming – textual colligation. The implication of the third kind of colligation is that lexical items are also primed for textual position (e.g. beginning of sentence, beginning of paragraph). Thus preliminary investigation suggests that, for example, x years ago has a powerful tendency to begin both paragraphs and texts. I shall have more to say on this below.
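A tendency of this kind can be quantified by recording, for each occurrence of the phrase, whether it opens a paragraph, opens a later sentence, or falls elsewhere. The following sketch is illustrative only: the function position_profile, the naive sentence splitter and the three-way classification are assumptions of the example, and the sample sentences are taken from the newspaper article analysed in the second part of this paper.

```python
import re

# Matches e.g. "Sixty years ago", "60 years ago" (simplifying assumption).
YEARS_AGO = re.compile(r"\b\w+ years ago\b", re.IGNORECASE)

def position_profile(paragraphs):
    """Classify each 'x years ago' as paragraph-initial, sentence-initial or other."""
    profile = {"paragraph_initial": 0, "sentence_initial": 0, "other": 0}
    for paragraph in paragraphs:
        # Naive sentence split on terminal punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        for index, sentence in enumerate(sentences):
            for match in YEARS_AGO.finditer(sentence):
                if match.start() == 0 and index == 0:
                    profile["paragraph_initial"] += 1
                elif match.start() == 0:
                    profile["sentence_initial"] += 1
                else:
                    profile["other"] += 1
    return profile

sample = [
    "Sixty years ago today Pluto was discovered, now the hunt is on for an "
    "elusive tenth planet.",
    "He is using similar techniques to those used by Clyde Tombaugh 60 years "
    "ago to discover Pluto. The young astronomer had detected a sure sign of "
    "an object orbiting the Sun.",
]
profile = position_profile(sample)
print(profile)
```

The three categories are mutually exclusive here, so a paragraph-initial instance is not also counted as sentence-initial; text-initial position would need a further category.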

I hypothesise that when we acquire a lexical item, it becomes primed for collocation, grammatical category, semantic associations and colligation, and it is not properly acquired unless it has all this priming. So, to illustrate, result is primed for collocation with good, it is primed for use as a noun or as a verb, it is primed for semantic association with positiveness (a good result, a great result, an excellent result, a brilliant result, etc.), and it is primed for use in certain grammatical contexts, e.g. definiteness (the result v. a result).

The formulation of priming thus far, and all my examples, seem to prioritise the lexical item. In fact, though, this is a conscious simplification. Initially it must be sounds or stretches of sound that are primed. A phonetic/phonological starting-point allows priming to explain word-play, malapropisms and rhyme. There is, for the purposes of priming theory, no qualitative distinction amongst the priming of sl as in slip (but also slimy, slope) to indicate a slippery quality, the priming of slip to have a quasi-collocational preference for ery (but also way, shod), the priming of slippery to have a collocational preference for slope (but also customer) and the negative priming of slippery slope for Subject function in the clause – it likes to end clauses either as prepositional Object or as direct Object; it is also positively primed for Range (for the term, see Halliday, 1994). All but the first of these involve some nesting.

If we see priming as attaching itself in the first place to syllables and sounds, we can see that syllables such as ing, to, and ed have their own priming. Looked at from one perspective, what we think of as grammar is in my terms the accumulation and interweaving of the commonest sounds, syllables and lexical items of the language. So grammar is, in such terms, the sum of the collocations, colligations and semantic associations of words like is, was, the, a and of, syllables like ing, er and ly, and sounds like [t] (at the end of syllables) and [s] and [z] (likewise at the end of syllables).

Looked at from another perspective (and I believe the two perspectives to be compatible), what we think of as grammar and semantics are the products of the accumulation of all the lexical primings of an individual’s life-time. Grammar and semantics are the products of the cumulative primings of all the lexis we encounter. As we connect up collocational primings, so we create semantic associations; as we connect up semantic associations, we create an incomplete, inconsistent and leaky, but workable semantic system. As we connect up grammatical primings, so we create colligations. As we connect up colligational primings we create an incomplete, inconsistent and leaky, but workable grammatical system.

These systems, post-hoc and incomplete as they are, nevertheless, when reflected upon, may in turn be brought to bear on the lexical primings that gave birth to them and these may then be adjusted to bring them in line with the grammatical system or semantic system that the language user has postulated. Alternatively a tension may be created between the data and the leaky system. In such circumstances cracks in the priming may occur as a result of conflict between the original priming and the self-reflexivity of the post hoc systems.

Cracks may also occur as a result of education – another form of self-reflexivity. If as a child you have a word primed to function in a particular way and you are told that it is incorrectly primed, the result is again a potential crack in the priming. Cracks can be resolved either by adjustment of the original priming or by rejection of the attack on the priming. Or, worse than these, the result can be uncertainty about the priming, a codification of the crack, leading to long-term linguistic insecurity. Primings are of course, as already noted, domain specific. Potential cracks can most successfully be prevented by assigning one set of primings to one domain or social context (e.g. family and friends) and the other to another domain or social context (e.g. education, science, the middle class etc.).

If this view of language is correct, if grammar and semantics are post-hoc effects of the way lexical items have been primed, there are three increasingly alarming implications. The first is that grammar is less central to our understanding of the way language works, and semantics as a system is also less central (which in reality it has always been in linguistics).

Secondly, there is no right or wrong in language. It makes little sense to talk of something being ungrammatical. All one can say is that a lexical item or items are used in a way not predicted by your priming.

The third and most alarming implication is that everybody’s language is truly unique, in that all our lexical items are primed differently as a result of different encounters. This last implication, though, raises other questions – after all, strangers from the same linguistic community usually understand each other without trouble, and you are presumably understanding what I am writing despite our sharing little or nothing in the way of common linguistic experience. What then are the controlling factors that bring our primings into line with each other?

One of these controlling mechanisms is our own self-reflexivity. As we find cracks in our priming, we seek to regularise these, using the makeshift grammar we are constructing (note the present tense). Our grammar is never finished, at least in theory, though in practice its development may slow in old age, along with the ossification of colligation, vocabulary etc. Analogy, referred to by Chomsky et al. as proof of innate grammar – children never hear goed but say it – is better seen as an attempt to resolve a priming crack. The sound [d] and the syllable [ɪd] will normally be primed for use with action lexis, but go will not be primed for co-occurrence with [d]. Note also that [d] will be part of the priming of pronouns and names in the narrative genres to which children are exposed. Consequently, since [d] is more frequent than go, and since the need to produce a past form of GO would typically occur in conjunction with a name or pronoun in a narrative context, the priming of the more common item [d] temporarily overwhelms the negative priming of the less common item (go).

To return, though, to our question of how our disparate primings are controlled, self-reflexive harmonising will only go some way to ensuring a degree of consistency of primings across speakers. In principle at least, self-reflexive harmonising might result in different and clashing grammatical and semantic systems for each language user. So there is still a need for other controlling mechanisms to harmonise the primings. The controlling mechanisms in a culture are necessarily of great importance and consequently tend to be areas of great controversy in that culture.

The most important controlling mechanism in the great majority of cultures is education. Examinations are, amongst other things, attempts to ensure that only those whose primings harmonise with those already in positions of influence or power move into jobs that will in turn exercise influence or power over others. So in the UK, examination boards are required specifically to look out for grammar and spelling. Mastery of a subject is, again amongst other things, mastery of the collocations, colligations and semantic associations of the vocabulary of the discipline – mastery, in fact, of the domain-specific and genre-specific primings.

The second way in which a culture may attempt to harmonise its primings is through its shared literary and religious tradition, whether that tradition is formulated in terms of great literary works, an oral performance tradition or sacred texts. If I talk to my mother and you talk to yours, the priming effects may be quite different, but if I read David Copperfield and you read David Copperfield, the priming effects should be (but actually need not be, dependent on previous primings) the same. Arguments that (used to) break out about the status of the literary canon or about changes to the Anglican order of service (for example) are partly attempts to prevent the "wrong" harmonisation of the primings. It is however only in isolated and self-contained cultures and in sub-cultures within the larger cultures that this kind of harmonisation can be effective.

The third way in which modern cultures harmonise the primings of a linguistic community is through the mass media, which are second only to education in this respect (and possibly have more importance for some speakers). But of course the priming may be different from that promoted by education and is domain specific. There are also issues here with regard to receptive priming versus productive priming (which apply to the literary canon as well). Productive priming occurs when the source(s) of the priming include members of a linguistic sub-community with whom the speaker has an affinity or that s/he aspires to join. Receptive priming occurs when there are reasons why the speaker might not want to claim membership of the community from which the sources come. So a newscaster on TV is a member of a linguistic community that the great majority of listeners are unlikely to want to emulate, and the same applies to nineteenth century novelists.

Harmonisation comes in one final resource: dictionaries and grammars. This is why there has been over the past three decades so much distress at the repeated claim of linguists, including grammarians and lexicographers, that they describe, not prescribe. Such a posture is seen by those who instinctively recognise the need for harmonisation as a betrayal. The problem of course is that linguistic scientists find it hard to be linguistic legislators (and vice versa). Every time a new dictionary comes out, I am interviewed about it; and journalists always ask about whether certain new words should have been included. Dictionaries enshrine and enable a degree of harmonisation of priming. (But they may contribute to cracks too.)

Since texts are permanent rather than evanescent, literacy becomes central to a nation’s attempt to harness the dangerous effects of priming. This raises the central question of creativity. If items are primed, are we bound to say what we have been primed to say? Only a moment’s thought shows that this is not the implication – the whole thrust of my argument concerning priming has been that lexical items (or syllables or sounds) are loaded with the effects of all the encounters we have had with them, but that this loading is a matter of weighting, not a matter of requirement.

Creativity involves a selective overriding of the primings. Thus if, for example, a word is positively primed for one particular collocation, positively primed also for one particular colligation, and negatively primed for another particular colligation, the creative writer may utilise both the positive primings but override the negative, or s/he may override the collocational priming but conform to the colligational primings.

That raises other issues, though. Lexical priming has so far been talked about in mostly local terms. But texts are organised on an altogether different scale. I move at this point to the second part of this paper. If a theory of lexical priming is not to stall, it must be able to suggest how text organisation might be affected or created by lexical priming, text here being neutral as to speech or writing. I do not at present have the data to undertake a detailed investigation of the interaction between lexical priming and spoken text. What follows is therefore regrettably and damagingly restricted to written text.

The properties of a written text are many, but certainly include the following:


•  Written texts are interactively produced and processed. There is a writer-reader interaction, of which the text is the site or residue, depending whose side you take (the writer’s or the reader’s).
•  All but the shortest written texts are chunked. As writers construct their texts, they group sentences that seem to belong more closely together; as readers process texts, they make use of paragraph, section and chapter divisions in their interpretation, as well as becoming aware of other groupings not orthographically marked.
•  Notwithstanding the chunking just referred to, written texts are linearly developed. On a moment by moment basis, both in the construction and the processing of a text, each sentence/paragraph/section/chapter builds upon those before and creates a basis for those following. One effect of the first three properties is that sentences (and parts of, and groups of, sentences) relate to each other semantically in predictable ways, e.g. cause-consequence, contrast, generalisation-exemplification.
•  Written texts are cohesive. It is of course still disputed whether cohesion contributes to coherence or is a by-product of coherence; either way the phenomenon exists and contributes to the recognisability of certain relations, e.g. matching compatibility, if not to their creation.
•  Notwithstanding the third point in this list, written texts are also non-linearly developed. Connections can typically be found between non-adjacent sentences/paragraphs/sections, independently of those mediated by intervening sentences/paragraphs/sections. These connections are certainly reflected in, and may in part be generated by, the cohesion of the text.


The question I have set myself, then, is: to what extent can the properties of a text be accounted for in terms of lexical priming?

A full answer would require a great deal of highly detailed research and its report would fill a book. All I can do here is sketch a possible answer, or rather a set of possible answers. One of the problems one has is that if one is investigating collocation, 250 lines may only just suffice in some circumstances. If one is investigating the special use of a word in text organisation, 250 texts may likewise barely suffice. While a corpus of 100 million words will often generate sufficient lines to permit a useful collocational analysis of a lexical item, it does not always generate sufficient texts for an analysis of similar reliability. Furthermore, it takes relatively little time to inspect the immediate surroundings of a lexical item and there is plenty of software to assist one in this task. It takes, on the other hand, a long time to inspect a text for examples of a cohesive chain (to take just one feature explored).

I have already attempted a partial answer to the question I set myself above in Hoey (2003). In that paper, I argued that paragraphing was a lexical phenomenon and that lexical items might be primed either negatively or positively with regard to paragraph boundaries. In other words, some lexical items like to begin paragraphs; others avoid occurring at the beginning of paragraphs. I also argued there that in the same way some lexical items are primed to begin texts. I shall not rehearse those arguments again here; I have in any case briefly touched on this topic in my discussion of textual colligation above. Instead I want to look at two of the other properties of text listed above. As noted in the list of properties of written text, it is generally agreed that texts normally manifest cohesion, though there is no agreement about the relationship, if any, of cohesion to coherence, and it is generally agreed that texts manifest semantic relations amongst their parts, though there is no agreement as to how best to characterise these semantic relations. I want to suggest that both cohesion and semantic relations are best described in terms of lexical priming.

Consider the following text, taken from The Independent on Sunday, 18-02-90. The reason I have chosen this text is that I have analysed its cohesion in a previous paper and its cohesive patterns are therefore familiar (Hoey, 1995). Sentences have been numbered; the author’s by-line has not however been included in the numbering. The original’s typography and punctuation have been retained, but the columnar presentation has been ignored and a picture and accompanying caption have been omitted.


The invisible influence of Planet X

(1) Sixty years ago today Pluto was discovered, now the hunt is on for an elusive tenth planet. Simon Mitton reports

(2) THE TEXTBOOKS say there are nine planets in our solar system. (3) The most distant is Pluto, discovered on 18 February 1930 by astronomers at the Lowell Observatory, Arizona. (4) But some astronomers have continued to suspect there may be a tenth planet lurking even further away which has somehow discovered detection . (5) One, Robert Harrington, of the US Naval Observatory in Washington, has begun a new search for “Planet X”.

(6)   He is using similar techniques to those used by Clyde Tombaugh 60 years ago to discover Pluto. (7) The young astronomer had detected a sure sign of an object orbiting the Sun by comparing two photographs showing that a speck of light had shifted position against the stars.

(8)   For a number of years before Mr Tombaugh’s discovery, the existence of a ninth planet was suspected, because something large was affecting the orbital path of Uranus around the Sun. (9) The locating of Pluto was thought to explain the Uranus effect. (10) But Dr Harrington and other sceptics say Pluto is too small to explain the orbits of the planets in the outer regions of the solar system, such as Uranus and Neptune.

(11)   Most astronomers nowadays work on such exotic problems as the origin of our universe or the properties of black holes, but Dr Harrington is cast in a traditional mould. (12) As director of the astrometry section at the Naval Observatory he prefers the classical work of finding the positions of stars and planets with the greatest accuracy.

(13)   Dr Harrington is continuing a tradition of planet-searching which began thousands of years ago, when ancient astronomers identified the five planets visible to the naked eye; Jupiter, for instance, is the brilliant object high up in the southern sky, and Venus can be easily seen in the east at sunrise.

(14)   In 1781, the British astronomer Sir William Herschel found a new planet, Uranus, setting off feverish planet-hunting. (15) This led to the discovery of the asteroids, or minor planets, in greatest profusion between Mars and Jupiter. (16) Meanwhile, mathematicians had become involved in the great planet hunt. (17) They found they could not match the orbital path of Uranus around the Sun to that predicted from the laws of gravity; there must be another planet, or planets, pulling it off course. (18) Neptune, located in 1846, appeared to offer an incomplete solution. (19) This century, Percival Lowell, an American astronomer, who achieved notoriety for suggesting Mars had canals and life, urged a search for a further planet using wide-angle cameras.

(20)   For 20 years, astronomers at the Lowell Observatory searched the skies. (21) Mr Tombaugh finally spotted his moving speck of light after a year at the job. (22) However, Pluto seemed surprisingly small and faint, and astronomers almost immediately suspected it was not massive enough to pull Uranus off-course. (23) Mr Tombaugh himself had doubts. (24) Still a professional astronomer at 83, he said last week: "It was much fainter than we expected, and I carried on searching, just in case there was another one, for 14 years, until May 1943." (25) More recent evidence confirms that Pluto is much too small to influence Uranus and Neptune's orbits. (26) In 1978, James Christy at the US Naval Observatory accidentally found a moon, subsequently called Charon, orbiting Pluto. (27) Its motion showed that the mass of Pluto is a thousand times too small to influence the giant planets.

(28)   This has become the strongest evidence for the mysterious tenth planet. (29) David Dewhirst, of the Cambridge Institute of Astronomy, sees the current search as more promising. (30) "There are another 20 years of data for a start, and that helps. (31) But more significant perhaps are the great advances in computing. (32) The computer models used by the Jet Propulsion Laboratory in Pasadena can, for instance, handle much more complex calculations."

(33)   At JPL, where the tracks of space probes through our solar system are computed with phenomenal accuracy, theorists find that recent observations of Uranus and Neptune do not fit computer predictions using a nine-planet model. (34) The laboratory's observers found that Uranus is drifting out of its predicted orbit by 1,000 miles a year. (35) "One possible explanation is an unseen planet," Dr Harrington says. (36) Nevertheless, Mr Tombaugh remains sceptical. (37) "I did my searching very thoroughly and very slowly. (38) If it's there, it should have shown on my plates. (39) However, I only covered two-thirds of the sky and the weakest part of my search was in the south. (40) I think the case for Planet X is marginal; maybe it's there, and maybe it isn't. (41) Let's see."

(42)   Dr Harrington has now begun work on two fronts, running new computer calculations in Washington and making fresh observations in New Zealand. (43) "My computer strategy is to make model solar systems that include the nine known planets plus a guess at Planet X. (44) I then run lots of these 10-planet simulations to give the smallest possible deviation of Uranus and Neptune from their observed positions. (45) Each time we do this, we predict a position for Planet X in 1990. (46) What we are finding is that the permitted positions for Planet X cluster in a small region in the sky."

(47) The inclusion of the irregularities in Neptune's orbit is new, and that could be why computer models are showing a narrower search area. (48) Neptune's true position is accurately known, following the Voyager 2 encounter in August 1989.

(49)   Dr Harrington says the most remarkable feature predicted for Planet X is that its orbit is tilted 30 degrees away from the ecliptic, the main plane of the solar system, where all previous searches have concentrated. (50) His models also predict a greater distance from the Sun, about 10 billion miles, or between two or three times as distant as Pluto.

(51) In April the new sweep starts in earnest at the Black Birch Observatory in New Zealand. (52) A modest 8in telescope, similar to that used by Mr Tombaugh, will examine the northern part of the constellation Centaurus. (53) Pairs of photographs of the same region of sky taken on successive nights will be sent to Washington. (54) Using a blink comparator, a device that compares two photographs, Dr Harrington hopes to locate any faint object that has moved during the interval between the two pictures.

(55) A serious problem is that the search area falls close to the Milky Way, and every plate will include millions of faint stars in our galaxy. (56) The planet, if it exists, must be picked out from this crowded background.

(57) Dr Harrington says astronomers still do not understand the outer regions of our solar system. (58) He hopes Planet X will explain the mysterious "wobble" of Uranus and Neptune. (59) "I think we have a 50-50 chance of showing that the anomalies are due to another planet orbiting 10 billion miles from the Sun."


My first textual claim is that lexical items are primed for cohesion. This priming occurs in two ways. Firstly, each item is primed to occur in cohesion chains or to avoid such chains; if it is primed to avoid chains, it may be primed either to also avoid isolated ties or to occur in them. Thus far, the priming concerns the availability of an item for participation in cohesion; it says nothing about the nature of the cohesion, should cohesion occur. However, a second kind of priming relates to the nature of the cohesion: those items that are primed to occur in cohesive chains (or cohesive ties) may also be primed to occur in chains constituted in particular kinds of ways. Thus the lexical item Bush is primed to occur in chains making use of pronouns and co-referents; if you doubt this, ask yourself whether you would expect a text that contains Bush in the first sentence not to contain a cohesive chain stemming from the first occurrence made up of instances of he and the President. Texts without such a chain can exist and, no doubt, do, but they have the same status as lexical items occurring without their collocates and avoiding their colligations.

It is my claim that it is an inherent property of the lexical item that it has these cohesive characteristics. We are told that many of the characteristics of the human being discovered in adulthood are latent in the genes from conception. So also many of the characteristics of the text are latent in the lexical items from the moment of their selection.

My second claim is that every lexical item may be primed to occur as part of a textual semantic relation (what Winter, 1977, called ‘clause relations’).

As evidence of this claim, we turn to a few lexical items from the text and examine their relationship to their immediate textual environment and to the text as a whole. In each case I shall first consider the item’s cohesive and semantic relations with the rest of the Planet X text and then examine the concordance evidence for the item’s lexical priming with respect to cohesion and the different kinds of semantic relations. The corpus on which I draw is 100 million words made up of Guardian news text. Since the article above is drawn from another British broadsheet newspaper, this would appear to constitute a valid corpus for purposes of comparison. Since, further, all claims about priming are domain and genre specific, a general language corpus would in any case be for my purposes theoretically objectionable (except of course as a way of checking the claim about domain and genre specificity).

The Planet X text begins with the words Sixty years ago today Pluto, and it makes sense therefore to consider these first. We will then look at the words that begin the second sentence: The textbooks say there are. Reference will be made to other words in these sentences in less detail.

Beginning with sixty, we find that in the Planet X text it has a single cohesive link with 60 in sentence 6; otherwise there is little in the way of plausible cohesion in which it participates in this text (a connection with 20 in sentence 20 being a marginal candidate). In short it makes little or no contribution to the cohesion of the text.

The question then was whether the cohesive behaviour just described is characteristic of sixty. To determine this, it was necessary to consult a concordance of sixty. Because of the work involved in examining the cohesion of every text from which a concordance line is drawn, I restricted myself for this purpose to a concordance of 100 instances of sixty, drawn from a corpus of newspaper writing. Examination of these instances revealed that 56 formed no cohesive links with any other item, 25 formed one link, 11 had two or more links across two or fewer sentences (i.e. not forming a chain) and just 8 participated in a cohesive chain. In other words, 92% of the 100 concordance instances were not part of chains. The calculation can alternatively be done with the number of texts involved as the base rather than the number of concordance lines. In the case of sixty, this has no appreciable effect on the statistics, with 86 texts out of 93 (92% again) having no cohesive chain involving sixty.
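The two ways of calculating the proportion can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the counts reported above; the category labels and the function name are illustrative and are not part of the analysis itself.

```python
from collections import Counter

# Counts reported in the text for the 100 concordance lines of "sixty".
# The category labels are illustrative names chosen for this sketch.
line_counts = Counter({
    "no_link": 56,
    "one_link": 25,
    "multiple_links_no_chain": 11,
    "chain": 8,
})

def share_outside_chains(counts: Counter) -> float:
    """Return the proportion of items that do not take part in a chain."""
    total = sum(counts.values())
    return (total - counts["chain"]) / total

# Line-based figure: 92 of the 100 lines are outside chains.
by_lines = share_outside_chains(line_counts)

# Text-based figure: 86 of the 93 source texts have no chain involving "sixty".
by_texts = 86 / 93

print(f"by lines: {by_lines:.0%}, by texts: {by_texts:.0%}")
```

Both bases round to 92%, which is why the choice of base makes no appreciable difference for this item.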

The concordance evidence suggests therefore that sixty is primed to avoid cohesive chains, and this is in accordance with the fact that in the Planet X text it does not form a cohesive chain. In the eight instances in the corpus where sixty occurred in chains, the cohesion was in every case with a figure. Again this accords with our text where sixty is cohesive with 60 (and possibly 20).

In the Planet X text, the first clause sixty years ago today Pluto was discovered contrasts with the second now the hunt is on for an elusive tenth planet, with the following parallelism:

Sixty years ago today     Pluto                       was discovered
now                       an elusive tenth planet     the hunt is on for

The two clauses are also in a time sequence relation marking change.

If we look at the same concordance of 100 instances of sixty, we find that 41% occur in a semantic relation of contrast, with a further 16% being involved in a relation of comparison without a focus on difference; another 37% occurred within the Problem statement of a Problem-Solution pattern (Hoey, 1983/1991; 2001). So it can be argued that sixty is primed for use in contrast relations and the Problem part of Problem-Solution patterns, and its behaviour in the Planet X text is in accordance with the first of these primings.

Turning now to years, we find a quite different picture as regards the item’s cohesive behaviour. In the Planet X text, years participates in the following chain; sentence numbers are in brackets after the participating items:

years (1) – 1930 (3) – years (6) – years (8) – years (13) – 1781 (14) – 1846 (18) – years (20) – years, 1943 (24) – 1978 (26) – years (30) – year (34) – 1990 (45) – 1989 (48).

It will be seen that years here has formed a chain with simple repetition and with hyponyms (e.g. 1930).

Inspection of 100 lines of concordance of years shows that 56% of instances of the lexical item participated in chains and a further 17% formed an isolated cohesive link. (There were no instances of two or more links between two or fewer sentences.) Consequently 73% of instances of years were cohesive in some way. Taken from the perspective of texts, rather than concordance lines, the proportion drops slightly. The 100 lines came from 69 texts, and 41% (28) of these texts contained cohesive chains involving years, with a further 20% (14) containing an isolated cohesive link. Thus 61% of texts had a cohesive instance of years. Both sets of statistics suggest that years is positively primed for cohesion, which is of course in accord with its use in the Planet X text. Of the 28 chains in my data, 20 (71%) contained simple repetition and 21 (75%) contained a hyponym (e.g. the 1970s, 1986). This too is in accord with the way years is used in our text, though as remarked earlier the statistics here can only be suggestive of a hypothesis to be explored.
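The reason the proportion can drop when texts are taken as the base is that one text may contribute several concordance lines. The following Python sketch illustrates the point with hypothetical annotations; the function name, the category labels and the toy data are all assumptions made for illustration, as is the decision to classify each text by the strongest cohesive behaviour found in it (chain over link over none).

```python
from collections import Counter

# Strength ordering used to classify a whole text by its strongest behaviour.
RANK = {"none": 0, "link": 1, "chain": 2}

def by_line_and_text(annotations):
    """Return (line_shares, text_shares) for (text_id, category) pairs,
    where each pair describes one annotated concordance line."""
    annotations = list(annotations)
    line_tally = Counter(category for _, category in annotations)
    strongest = {}
    for text_id, category in annotations:
        current = strongest.get(text_id, "none")
        strongest[text_id] = max(current, category, key=RANK.get)
    text_tally = Counter(strongest.values())
    line_shares = {c: line_tally[c] / len(annotations) for c in RANK}
    text_shares = {c: text_tally[c] / len(strongest) for c in RANK}
    return line_shares, text_shares

# Hypothetical annotations: five concordance lines drawn from three texts.
toy = [("t1", "chain"), ("t1", "chain"), ("t1", "chain"),
       ("t2", "link"), ("t3", "none")]
line_shares, text_shares = by_line_and_text(toy)
print(line_shares["chain"], round(text_shares["chain"], 2))
```

In the toy data three of the five lines are in chains, but all three come from the same text, so only one text in three contains a chain.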

Examination of 100 concordance instances of years revealed that it appeared in contrast relations 38% of the time and in temporal change relations 12% of the time; a further 6% were in comparison relations without a focus on change or contrast. Allowing for a few cases where the item was in a multiple relation, 56% of instances were in some form of comparison relation, whether contrastive or otherwise. Thus the evidence suggests that years is primed for use in comparison relations and particularly contrast relations, and as we have seen its use in the Planet X text is compatible with such a priming.

Turning now to ago, we find that in the Planet X text it forms a single link between the use in sentence 1 and that in sentence 6. Looking at 50 concordance instances, only six participated in chains, with the remainder split evenly between forming a single link and participating in no cohesive relations whatsoever. In terms of texts, this meant that only two texts out of 40 (5%) contained chains of ago, with 16 containing a cohesive link (40%) and 22 (55%) forming neither links nor chains. Chains and links make little use of any cohesive feature other than simple repetition, the only exception being arguable links with yesterday (= a day ago) and last year (?= a year ago). Thus it can be claimed tentatively that ago is negatively primed for cohesive chains. When it is cohesive, whether in links or chains, it is positively primed for simple repetition.

I inspected 100 instances of thematised ago to see what semantic relations the item appeared in. This analysis revealed that 71% were involved in some kind of comparison relation, with 55% being part of a contrast relation. Here, again, therefore, ago’s behaviour in the Planet X text is entirely compatible with its behaviour in the concordances. Put in the terms I have been proposing, ago, when thematised, is primed for use in contrast relations and the Planet X text is an instance of the item being used in accordance with its priming.

The word today is not cohesive in the Planet X text. When however 100 instances of today were examined in my corpus, 43% were found to participate in chains and a further 25% participated in links. From the point of view of the number of texts involved, there were 75 texts examined with today appearing within them, of which 32% (24) had cohesive chains and 25% (again) contained cohesive links. Clearly, then, today is primed for cohesion. Inspection of the kinds of chain in which it participated showed that they were predictably chains largely made up of time references – yesterday – last night – today – next week. (This of course is a good example of the domain/genre specificity of priming claims; today is primed, I would argue, to occur in cohesive chains of this sort in newspaper text. It certainly need not behave like this in all other kinds of writing.)

So our Planet X text instance of today is not in conformity with its dominant priming (though its non-cohesive appearances in my corpus are hardly rare). However, closer attention to its use in this context suggests that its priming for occurrence with ago leads to a cancellation of the cohesion priming. Examination of 18 instances of ago today found only three chains and two links; 13 were not cohesive. (Incidentally, of these 18 instances of ago today, exactly half were text-initial and another four were in the second sentence of the text. The average length of the texts concerned was 21 sentences for the text-initial cases and 27 sentences for the second-sentence cases. This strongly suggests that today when collocating with ago is primed for beginning texts.)

Looking at a sample of 85 instances of Pluto first, 91% of instances of Pluto (with planetary as opposed to Disney reference) occur in cohesive chains. If texts are taken as the base (a more satisfactory procedure), 70% of texts containing instances of Pluto also contain chains in which Pluto participates (though the number of instances in my database was too low, at 34, to permit any safe generalisation). If a larger sample replicated this distribution, we would have evidence that Pluto was primed for cohesion. It is of course part of a cohesive chain in the Planet X text (which might indeed be referred to with almost equal propriety as the Pluto text). Further, the chains in which Pluto participates in the concordance data make use of co-hyponymy (Uranus, Neptune etc.), superordinates (planet(s)) and meronymy (surface, solar system), and this is again the case in our chosen text.

Perhaps more surprisingly Pluto also appears, on the basis again of course of an unsatisfactorily small sample of 85, to be primed for certain semantic relations. Of the 85 examples in my corpus, 31 (36%) are involved in a relation of comparison. Pluto is typically compared with other planets or is described in terms of a superlative (the smallest, the coldest, the most distant, the faintest etc). It is instructive to examine the sentences with Pluto in our chosen text (which of course did not contribute to the 85 examples examined). They are:


(1) Sixty years ago today Pluto was discovered, now the hunt is on for an elusive tenth planet;

where Pluto is part of a comparison (though Pluto itself is not being compared).


(3) The most distant is Pluto, discovered on 18 February 1930 by astronomers at the Lowell Observatory, Arizona;


where Pluto is described in superlative terms;

(6) He is using similar techniques to those used by Clyde Tombaugh 60 years ago to discover Pluto;


where a similar situation to sentence 1 pertains;

(9) The locating of Pluto was thought to explain the Uranus effect.

where there is no comparison;


(10) But Dr Harrington and other sceptics say Pluto is too small to explain the orbits of the planets in the outer regions of the solar system, such as Uranus and Neptune.


where Pluto is being compared with what was expected/required;


(21) Mr Tombaugh finally spotted his moving speck of light after a year at the job.


where there is no comparison;


(22) However, Pluto seemed surprisingly small and faint, and astronomers almost immediately suspected it was not massive enough to pull Uranus off-course;


where the word surprisingly indicates that Pluto is again being compared with what was expected;


(24) Still a professional astronomer at 83, he said last week: "It was much fainter than we expected, and I carried on searching, just in case there was another one, for 14 years, until May 1943."


where a comparative form is used;


(25) More recent evidence confirms that Pluto is much too small to influence Uranus and Neptune's orbits.


where Pluto is yet again being compared with what was expected/required;


(26) In 1978, James Christy at the US Naval Observatory accidentally found a moon, subsequently called Charon, orbiting Pluto.


where there is no comparison;


(27) Its motion showed that the mass of Pluto is a thousand times too small to influence the giant planets.


where the situation is identical to that in sentence 25;


(50) His models also predict a greater distance from the Sun, about 10 billion miles, or between two or three times as distant as Pluto.


where the comparison is explicit.

Thus three-quarters of the occurrences of Pluto in the Planet X text are directly or indirectly involved in a comparison relation. Once again our chosen data conforms to the priming tentatively ascribed to the item on the basis of corpus data.
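The figure of three-quarters follows directly from the sentence-by-sentence commentary above, and can be checked with a short tally. The Python below is only a bookkeeping sketch: the True/False values simply restate the commentary, and the variable names are illustrative.

```python
# Sentence number -> whether the commentary above treats the sentence as
# directly or indirectly involved in a comparison.
pluto_sentences = {
    1: True,    # part of a comparison
    3: True,    # described in superlative terms
    6: True,    # similar situation to sentence 1
    9: False,   # no comparison
    10: True,   # compared with what was expected/required
    21: False,  # no comparison
    22: True,   # "surprisingly": compared with what was expected
    24: True,   # comparative form
    25: True,   # compared with what was expected/required
    26: False,  # no comparison
    27: True,   # identical to the situation in sentence 25
    50: True,   # explicit comparison
}

with_comparison = sum(pluto_sentences.values())
share = with_comparison / len(pluto_sentences)
print(f"{with_comparison} of {len(pluto_sentences)} = {share:.0%}")
```

Nine of the twelve sentences are marked as involving comparison, which gives the 75% stated above.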

In sentence 2 of Planet X we have the statement The textbooks say there are nine planets in our solar system . This statement is set up in the text as an Aunt Sally in order that it can be challenged by ‘Dr Harrington and other sceptics’ (sentence 10).

Looking first at the cohesion patterns of textbooks, there is no clear priming for text*book(s) in my data. Of a hundred instances, 33% were involved in cohesive chains, 10% participated in a cohesive link and the remainder (57%) were not cohesive. From the point of view of the texts in which the items appear (of which there were 68), 15% (10) had text*book(s) chains, 12% (8) had cohesive links and 74% (50) had neither. So the evidence is that there is only weak priming for non-cohesion, to which our text conforms. Informally I observe that the priming for non-cohesion is much stronger when the item appears in premodifying function; I also informally notice that when textbooks is associated with a verb of utterance (e.g. say, write, claim, note) it again has a stronger tendency to be non-cohesive. But these are matters for further investigation.

Of greater interest perhaps are the semantic relations observed for textbooks when it has a semantic association with content and/or act of communication (e.g. Constitutional textbooks all explain…; …gleaned from their textbooks; it will be up to teachers, museums and textbooks to transmit the truth…; the standard textbooks don’t mention it; … two inter-related principles that can be found in textbooks). I identified 136 such instances in my data; of these, 86 (63%) were in a context where the claim in the textbook was being denied or challenged or where it was being noted that the textbooks did not make a claim that might have been expected of them. There is therefore a clear priming of textbooks, in conjunction with one or both of these semantic associations, for denial, disagreement or challenge.

Because semantic association is a new concept that might itself need support, I also looked at 100 instances of textbooks in the following contexts:

•  (part of) Subject
•  in (x) textbooks
•  by (x) textbooks
•  According to (x) textbooks

where ‘x’ represents premodification of some kind. I found that 47% were part of either a claim-denial relation or a denial-affirmation relation (Winter 1986, Hoey 2001), e.g.

The rays of the sun tend to diverge when reflected from either a flat or concave mirror. “Almost all textbooks, whether of school or University level, are curiously silent on this commonplace phenomenon.” [quotation marks in the original.]
The textbooks say he will not survive, but surgeons at Great Ormond Street Children’s Hospital believe they have performed the impossible.
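The three prepositional contexts listed above lend themselves to a simple pattern search. The Python sketch below is one possible way of retrieving them and is not a description of the software actually used; the regular expression, the limit of three premodifying words and the sample sentences are all assumptions made for illustration. The Subject context cannot be found by string matching alone and would need grammatical analysis.

```python
import re

# Matches "in (x) textbooks", "by (x) textbooks" and "according to (x) textbooks",
# where (x) is an assumed maximum of three premodifying words.
CONTEXTS = re.compile(
    r"\b(in|by|according to)\s+((?:[\w'-]+\s+){0,3}?)text[ -]?books?\b",
    re.IGNORECASE,
)

def find_contexts(text):
    """Return (preposition, premodification) pairs for each match in text."""
    return [(m.group(1).lower(), m.group(2).strip())
            for m in CONTEXTS.finditer(text)]

# Illustrative sample; only the last sentence is taken from the examples above.
sample = ("Such principles can be found in textbooks. "
          "According to most medical textbooks, recovery is unlikely. "
          "The textbooks say he will not survive.")

for preposition, premodifier in find_contexts(sample):
    print(preposition, "|", premodifier)
```

The third sample sentence is not retrieved, because there textbooks is Subject rather than part of one of the prepositional contexts.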


Once again, our Planet X text does the expected. The second sentence establishes a claim that the remainder of the text seeks to challenge.

It was natural to follow this up with an examination of 42 instances of say in the context say (that) there BE. This revealed that disagreement is what say (that) there BE is also primed to occur with. At least 18 (43%) and possibly 20 were associated with a disagreement. Perhaps tellingly, all nine of the instances in my data of textbooks SAY are associated with challenge/disagreement.
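A concordance for a pattern such as say (that) there BE can be approximated with a regular expression and a key-word-in-context display. The Python below is a minimal sketch under stated assumptions: the lists of word forms for SAY and BE, the function name kwic and the context width are my own choices for illustration, and the second sample sentence is invented. It says nothing about how the instances were then judged for disagreement, which remains a matter of reading each context.

```python
import re

# Assumed word forms: SAY = say/says/said/saying; BE = is/are/was/were/be/been/being/'s.
SAY_THERE_BE = re.compile(
    r"\b(?:say|says|said|saying)\s+(?:that\s+)?"
    r"there(?:'s|\s+(?:is|are|was|were|be|been|being))\b",
    re.IGNORECASE,
)

def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for each match in text."""
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines

# The first and third sentences are quoted in this paper; the second is invented.
sample = ("The textbooks say there are nine planets in our solar system. "
          "Officials said that there was no risk. "
          "The textbooks say he will not survive.")

hits = kwic(sample, SAY_THERE_BE)
for left, node, right in hits:
    print(f"{left:>30} [{node}] {right}")
```

Only the first two sample sentences are retrieved; the third contains say but not the there BE continuation.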

To conclude then, I have offered evidence in the latter part of this paper that lexical priming, as a way of conceiving how language is learnt and used, does not stumble at the hurdle of text-linguistics. I want to conclude by making the further, but actually logically inevitable, claim that choices between primings include the choice to switch off a priming. Creativity comes from the switching off of primings. Fluency comes from conformity to them.

I have argued that lexical primings can drift, crack and be harmonised. Fluency and currency are associated with drift; so, self-evidently, is language. Disfluency is associated with cracks, and education paradoxically can be a prime creator of cracks.

Fluency and power are associated with harmonisation. Those in a position to harmonise the primings of others have power in our society.

If lexical priming is accepted as a way of talking about language, it does not mean that everything else has to be rejected. You will, for example, be able to spot close connections with Susan Hunston’s work and with Douglas Biber’s as reported in other papers in this volume. But it does mean that half of the work is still to be done and the other half has barely been started. Priming is linguistic but it is also, I have sought to show, text-linguistic. It is sociolinguistic but it also offers a way forward for critical discourse analysis. It offers a way out for grammarians – they can become chemists rather than alchemists.

Lexical priming is still just an infant theory supported by inadequate data and over-confident generalisations based on those inadequate data. I hope, though, I have offered enough to suggest that it might grow up in time into a fine young theory with a long life ahead of it, given proper care and attention and an appropriate mixture of criticism and encouragement.



Firth, J R (1957) ‘A synopsis of linguistic theory, 1930-1955’ in Studies in Linguistic Analysis , 1-32, reprinted in Selected Papers of J R Firth 1952-59 (ed. F Palmer), London: Longman, 168-205.


Halliday, M A K (1959) The Language of the Chinese ‘Secret History of the Mongols’, Publication 17 of the Philological Society, Oxford: Blackwell.

Hoey, M (1995) ‘The lexical nature of intertextuality: a preliminary study’ in S-K Tanskanen & B Warvik (eds) Organisation in Discourse: Proceedings from the Turku Conference, Anglicana Turkuensia, 14, 73-94.


Halliday, M A K (1994) An Introduction to Functional Grammar (2nd ed) London: Arnold.

Hoey, M (1997) ‘From concordance to text structure: new uses for computer corpora’ in PALC ’97 : Proceedings of Practical Applications of Linguistic Corpora Conference, University of Lodz , 2-23.


Hoey, M (1998) ‘Some text properties of certain nouns’ in T McEnery & S Botley (eds) Proceedings of the Colloquium on Discourse Anaphora and Reference Resolution , Lancaster: University of Lancaster, 1998.


Hoey, M (2002) ‘Lexis as choice: what is chosen?’ Paper given at International Systemics Congress, University of Liverpool, July 2002.

Hoey, M (2003) ‘Textual colligation – a special kind of lexical priming’ to appear in K Aijmer & B Altenberg (eds) Proceedings of ICAME 2002, Göteborg.

Louw, B (1993) ‘Irony in the text or insincerity in the writer? The diagnostic potential of semantic prosodies’ in M Baker et al. (eds) Text and Technology, Amsterdam: John Benjamins, 157-76.

Sinclair, J McH (1991) Corpus, Concordance, Collocation Oxford: O.U.P.


Sinclair, J McH (1996) ‘The search for units of meaning’……

Sinclair, J McH (1997) ‘The lexical item’ ……..

Stubbs, M (1996) Text and Corpus Analysis Oxford: Blackwell.