Language Is Arbitrary

Language is arbitrary. I recently stated on Twitter that language is arbitrary in response to the erroneous claim that language is a code. Language is not a code because language is arbitrary. When I state that language is arbitrary, I often receive bewildered and sometimes disdainful replies such as (1) English is 80% predictable, (2) written language codifies spoken language, and (3) language is rule-governed. I shall therefore clarify the linguistic tenet of the arbitrariness of language by unpacking the above statements.

Arbitrariness

First, what is meant by arbitrary? Some dictionary definitions of arbitrary include (1) based on random choice or personal whim, rather than any reason or system, (2) existing or coming about seemingly at random or by chance or as a capricious and unreasonable act of will, (3) based on chance rather than being planned or based on reason, (4) something that is determined by judgment or whim and not for any specific reason or rule, and (5) based on individual discretion or judgment; not based on any objective distinction, perhaps even made at random.

Sign, Signifier, Signified

Why is language arbitrary? Arbitrariness refers to the quality of “being determined by randomness and not for a specific reason.” In linguistics, a sign refers to a basic unit of communication composed of a signifier and a signified. A signifier is a form such as a sound, morpheme, word, phrase, clause, or sign. The signified is that to which a signifier refers such as an object, action, quality, or quantity. For example, the English word book refers to the object “book.” The Spanish word correr refers to the action “run.” The German word zwei refers to a quantity of two. Semiotics is the branch of linguistics concerned with the study of signs. Signs are the basic building blocks of language, conveying meaning through the arbitrary relationship between a signifier and a signified.

Language consists of signifiers that represent the signified. But the signifier is not the signified. A sign conveys meaning through an arbitrary relationship between the signifier and signified. Language is arbitrary because a language form does not have an innate or natural relationship with its meaning. English uses the words turkey and dog to refer to “turkey” and “dog.” But the signified “turkey” does not possess “turkeyness” and the signified “dog” does not possess “dogness.” If “turkey” and “dog” possessed turkeyness and dogness, then German would not use the words Truthahn or Pute and Hund and Spanish could not use the words pavo and perro. If the affixation of an -s or -es suffix possessed inherent plurality, then Italian would not change the suffix of gatto meaning “cat” to gatti meaning “cats.” Sotho could not use loti meaning “singular money, currency” and maloti meaning “plural money, currency.”
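
As a rough illustration only, the relationship between signified and signifier can be sketched in Python as a lookup table. The table name SIGNS and the function signifiers_for are hypothetical names chosen for this sketch, and the word forms are the ones cited above. Several unrelated forms point at the same signified, which is what an arbitrary relationship looks like in data.

```python
# Each signified (concept) maps to language-specific signifiers (word forms).
# SIGNS and signifiers_for are illustrative names, not an established API.
SIGNS = {
    "turkey": {
        "English": ["turkey"],
        "German": ["Truthahn", "Pute"],
        "Spanish": ["pavo"],
    },
    "dog": {
        "English": ["dog"],
        "German": ["Hund"],
        "Spanish": ["perro"],
    },
}


def signifiers_for(signified):
    """Collect every form that any listed language uses for one signified."""
    return {form for forms in SIGNS[signified].values() for form in forms}


for concept in SIGNS:
    # Several different forms share one signified; no form is the "natural" one.
    print(concept, sorted(signifiers_for(concept)))
```

Nothing in the table predicts one form from another. Knowing that English uses dog gives no way to derive Hund or perro.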

Written Language

Written language is likewise arbitrary. Graphemes in an alphabetic writing system bear no innate relationship with the phones or markers that the graphemes spell. The written word <cat> spelled with the three graphemes <c.a.t> lacks a natural relationship with the phones [kæt]. Neither <cat> nor [kæt] has an innate relationship with “cat.” The grapheme <th> does not have a natural relationship with the phones [θ] or [ð]. Old English used the letter <ð> interchangeably with the letter <þ> to represent the dental fricative phoneme /θ/ and its voiced allophone [ð]. If letters were not arbitrary, Old English and Modern English could not use different graphemes for the same phones. Languages could also not use the same graphemes for different phones. The grapheme <au> could not spell [ɔ] in Englishes without the cot~caught merger but [ɑ] in Englishes with the merger.

Languages and Codes

Because language is arbitrary, language is not a code. Some definitions of code include (1) a system for representing information with signs or symbols that are not ordinary language, or the signs or symbols themselves, (2) a system of words, letters, figures, or other symbols substituted for other words, letters, etc., especially for the purposes of secrecy, and (3) a system of replacing the words in a message with other words or symbols, so that nobody can understand it unless they know the system. A code is not arbitrary. The symbols of a code (the signifiers) share an inherent relationship with that which the symbols represent (the signified). Language is arbitrary, so language is not a code.

Rule-Based Arbitrariness

One recent reply that I received after stating that language is arbitrary was that “English is 80% predictable.” The 80%-predictable argument comes from phonics. Phonics is a method of teaching reading that correlates sounds to graphemes. Phonics fails because not all graphemes spell sounds. Some graphemes can spell the zero phone. The <p> in <pteranodon> spells the zero phone while the <p> in <helicopter> spells the phone [p]. Graphemes can also function as markers such as the <d> in <Wednesday> and the <b> in <debt>. There is also the single final non-syllabic <e>, or replaceable <e>, which is the lynchpin of the English spelling system. Phonics programs like to make claims such as “20% of English phonemes have predictable spellings 90% of the time” and “10 phonemes are predictable 80% of the time.”

Phonics is a flawed method for teaching reading, but English orthography is still rule-based, which is the point that the 80%-predictable argument tries to make. <fish> can spell [fɪʃ]. <*ghoti> cannot spell [fɪʃ] because of English spelling rules. <gh> is an English grapheme, but one that spells [g], not [f], at the beginning of a word. The <o> in <women>, which comes from Old English <wimmen>, spells the phone [ɪ]. <*ti> is not an English grapheme. Many phonics programs erroneously claim that <*tion> is a suffix. Only <ion> is a suffix as evidenced by <act + ion -> action>. Graphemes cannot cross morphemic boundaries. However, <phish> can spell [fɪʃ]. As homophones, <fish> and <phish> are spelled differently and have different meanings, but <phish> does not violate the rules of English spelling. The grapheme <ph> can spell the phone [f] at the beginning of a word.

Language is arbitrary, but language is also rule-governed. A construction such as *Cat the bitten has dog the is grammatically incorrect in English. In English grammar, a determiner such as the precedes a nominal form such as cat and dog. An auxiliary verb such as has likewise precedes a main verb such as bitten. Language is conventional. The conventionality of language refers to language rules being established by accepted usage. In English, determiners precede nominal forms and auxiliary verbs precede main verbs. But there is no inherent reason for either grammar rule, or any grammar rule. A determiner could follow a nominal form. Furthermore, a language does not even have to have the distinct category of determiner. Language is rule-governed, but the rules are also arbitrary.

The formation of words in a given language is also rule-based but arbitrary. In English, the suffix <ion> can form a noun from a verb such as action from act and flexion from flex. Without knowing the meaning of a word like <dehydrogenation>, I can ascertain that the word is a noun formed from a verb because of the final <ion> suffix. I can also peel off an <ate> suffix before the <ion> and a <de> prefix at the beginning of the word. The full word sum for <dehydrogenation> is <de + hydr + o + g(e)n(e) + ate + ion>. Knowing bases and affixes allows me to figure out the spellings, pronunciations, and meanings of words. Rules for word formation also help me learn and even create new words. But the bases, affixes, and word formation rules are all arbitrary. <ion> can form a noun from a verb in English. <ung> can form a noun from a verb in German. Language is arbitrary. Otherwise, all languages would use the same forms to perform the same functions.
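
The word sums above can be resolved mechanically, which is one way to see that the system is rule-based. The following Python sketch is a minimal illustration under stated assumptions, not a full model of English spelling. The function names add_suffix and word_sum are hypothetical, the base written <g(e)n(e)> is simplified to <gene>, and any stem-final <e> is treated as the replaceable <e> that drops before a vowel-initial suffix.

```python
VOWELS = set("aeiou")


def add_suffix(stem, suffix):
    """Attach a suffix, dropping a final <e> before a vowel-initial suffix.

    Simplification: every stem-final <e> is treated as the replaceable <e>.
    """
    if stem.endswith("e") and suffix[0] in VOWELS:
        stem = stem[:-1]
    return stem + suffix


def word_sum(prefixes, base_elements, suffixes):
    """Resolve a word sum (prefixes + base elements + suffixes) into a spelling."""
    # Prefixes and base elements are joined as written.
    word = "".join(prefixes) + "".join(base_elements)
    # Suffixes are attached one at a time so the <e> rule can apply at each join.
    for suffix in suffixes:
        word = add_suffix(word, suffix)
    return word


print(word_sum([], ["act"], ["ion"]))
print(word_sum([], ["flex"], ["ion"]))
print(word_sum(["de"], ["hydr", "o", "gene"], ["ate", "ion"]))
```

The rule inside add_suffix is a convention of English spelling. A different language, or a different stage of English, could use a different rule with the same elements.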

Language Change

Language conventions can also change. Nouns in Modern German express grammatical number, gender, and case. A noun is singular or plural. A noun is masculine, feminine, or neuter. A noun is also nominative, accusative, dative, or genitive. Nouns in Modern English express only number and possessive. A noun is singular or plural. A noun is non-possessive or possessive. But nouns in Old English more closely resembled nouns in Modern German, expressing grammatical number, gender, and case. Old English nouns were singular or plural; masculine, feminine, or neuter; and nominative, accusative, genitive, dative, or instrumental. Modern English nouns are no longer declined for case. If language were not arbitrary, grammar would not and could not change.

The arbitrariness of language additionally allows for the creation of new words and meanings. A lack of an inherent connection between the form of a word and its meaning allows for the creation of new combinations of sounds or symbols to represent new concepts or ideas. Language is not limited by the physical or sensory characteristics of the signified. For example, the Germanic tribes who spoke Old English had no buildings that soared dozens of stories into the air and thus had no word denoting the concept. Modern English developed the word skyscraper to name such buildings. Modern French and Modern German developed the calques gratte-ciel and Wolkenkratzer. The meaning of existing words can also change over time through a process called semantic shift. For example, the word gay originally meant “lighthearted, joyful” as in the lyrics “Don we now our gay apparel.” By the 1890s, the word had developed a tinge of promiscuity. The word gay continued to change, coming to mean “homosexual” by the 1940s. Semantic shift would not be possible if language were not arbitrary.

Language is arbitrary, conventional, and rule-governed. Language codification can also occur. Codification refers to the process of selecting, developing, and prescribing a model for standard language usage. A standard language is a variety of language that has undergone codification of grammar and usage. Written language is largely codified. I, a speaker of the St. Louis Corridor variety of Inland North American English, can read written English just as well as a speaker of any other English. I pronounce <been> as [bɛn] while speakers of other Englishes might say [bɪn] or [biːn]. But we all read <been> as the past participle of <be> regardless of pronunciation. If you look at Old English writing, you might find multiple spellings for the same word across manuscripts or even within the same text. Codification continues today. American English prefers the <plow> spelling while British English tends to use <plough>. A search of the Corpus of Contemporary American English (COCA) reveals that <plow> (1665) is much more common than <plough> (145). Both spell [plaʊ]. Neither bears an innate relationship to “a farm implement dragged through the ground to break up the dirt for planting seeds.”

Conclusion

Language is arbitrary. The signifier is not the signified. Language is rule-based, but the rules of language are also arbitrary. Language is conventional. Language rules develop out of usage. Language is not a code. A code is not arbitrary. Finally, because language is arbitrary, all languages are linguistically equal. Some language varieties are considered more socially prestigious for non-linguistic reasons. Language can be codified, resulting in a standard language considered more prestigious than varieties that depart from the standard. But all languages, even standard languages, are arbitrary. Because language is arbitrary.

This post was originally published on December 14, 2019 and updated on April 30, 2023.