:: wikimiki.org ::
| Abjad |
This article discusses the recent neologistic use of the word "abjad" to refer to a certain class of writing systems in linguistic scholarship. For information on the original and more established use of the term to denote a certain ordering of the letters of the Arabic alphabet, see Abjadi order.
An abjad is a type of writing system in which there is one symbol per consonantal phoneme, sometimes also called a consonantary. Abjads differ from alphabets, in that in an abjad, each basic grapheme represents a consonant, although vowels may be indicated by vowel marks on the basic graphemes. An alphabet has basic graphemes for both consonants and vowels. Abjads also differ from abugidas. In an abjad, each basic grapheme represents only a consonant. In an abugida, each basic grapheme represents a syllable consisting of a consonant and a vowel; the same consonant with a different vowel -- or with no vowel -- is represented by a modified or marked form of the same basic grapheme.
The system takes its name from the first nonsense 'word' of the mnemonic sequence for the letters of the Arabic alphabet in the older abjadi order. It has been suggested that the word 'Abjad' may have earlier roots in Phoenician or Ugaritic.
All known abjads belong to the Semitic family of scripts, and derive from the original Northern Linear Abjad. The reason for this is that Semitic languages have a morphemic structure which makes the denotation of vowels redundant or unnecessary in most cases.
"Impure" abjads (such as Arabic) may have characters for some vowels as well (called Matres lectionis, 'mothers of reading', singular mater lectionis), or optional vowel diacritics, or both; however, the term's originator, Peter T. Daniels, insists that it should be applied only to scripts entirely lacking in vowel indicators, thus excluding Arabic, Hebrew, and Syriac.
Impure abjads develop when, due to phonetic change, a former consonant or diphthong becomes a vowel. Later generations, who receive their orthography without knowing that the letter originally signified a consonant there, understand it to mean a vowel, as it is in their spoken language. They then use that letter as a vowel in other places where it was never a consonant. For example, the Hebrew word הורישׁ probably underwent the following pronunciation change: hiwriʃ > howriʃ > horiʃ. The ו, which was originally the consonant w, became the vowel o. Later, probably in the Second Temple period, the vowel use of ו was expanded to places where no consonant ever existed.
Many scripts derived from abjads have been extended with vowel symbols to become full alphabets. This has mostly happened when the script was adapted to a non-Semitic language, the most famous case being the derivation of the Greek alphabet from the Phoenician abjad. The Greeks did not need the letters for the guttural (א, ה, ח, ע) and co-articulated (ט, צ, ק) consonants. They dropped some of them and turned others into vowels.
In other cases, the vowel signs come in the form of little points or hooks attached to the consonant letters, producing an abugida such as the system of writing Amharic.
Surprisingly, many non-Semitic languages such as English can be written without vowels and read with little difficulty. (For example, the previous sentence could be written Srprsngly, mny nn-Smtc lnggs sch `s `nglsh cn b wrttn wtht vwls `nd rd wth lttl dffclty.)
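The consonant-only rendering above can be reproduced mechanically. The following is a minimal Python sketch, assuming the convention that the example appears to follow: the letters a, e, i, o, u are dropped, a word-initial vowel is replaced by a backtick placeholder, and y is kept as a consonant. The function names are illustrative only.

```python
import re

VOWELS = "aeiouAEIOU"

def devowel_word(word: str) -> str:
    """Drop vowels from one word; mark a word-initial vowel with a backtick."""
    head = "`" if word[0] in VOWELS else word[0]
    tail = "".join(ch for ch in word[1:] if ch not in VOWELS)
    return head + tail

def devowel(text: str) -> str:
    """Apply devowel_word to every run of ASCII letters, leaving punctuation alone."""
    return re.sub(r"[A-Za-z]+", lambda m: devowel_word(m.group(0)), text)

print(devowel("Surprisingly, many non-Semitic languages such as English "
              "can be written without vowels and read with little difficulty."))
```

This is only a toy transformation of English spelling; a real abjad records the consonants of the spoken language, not the consonant letters of another orthography.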
See also
- Abjad numerals
External Links
- [http://www.abjad.com/ Abjad - The Arabic Alphabet learning system]
Writing system
A writing system, also called a script, is a type of symbolic system used to represent elements or statements expressible in language.
General properties
Writing systems are distinguished from other possible symbolic communication systems in that one must usually understand something of the associated language in order to successfully read and comprehend the text. Contrast this with other possible symbolic systems such as information signs, painting, maps, and mathematics, which do not necessarily depend upon prior knowledge of a given language in order to extract their meaning.
Every human community possesses language, a feature regarded by many as an innate and defining condition of humankind. However, the development and adoption of writing systems has occurred only sporadically. Once established, writing systems are on the whole modified more slowly than their spoken counterparts, and often preserve features and expressions which are no longer current in the discourse of the speech community. The great benefit conferred by writing systems is their ability to maintain a persistent record of information expressed in a language, which can be retrieved independently of the initial act of formulation.
All writing systems require:
- a set of defined base elements or symbols (termed characters or graphemes);
- a set of rules and conventions understood and shared by a community, which arbitrarily assign meaning to the base elements, their ordering, and relations to one another;
- a language (generally a spoken language) whose constructions are represented and able to be recalled by the interpretation of these elements and rules;
- some physical means of distinctly representing the symbols by application to a permanent or semi-permanent medium, so that they may be interpreted (usually visually, but tactile systems have also been devised).
Basic terminology
The study of writing systems has developed along partially independent lines in the examination of individual scripts, and as such the terminology employed differs somewhat from field to field.
The generic term text may be used to refer to an individual product of a writing system. The act of composing a text may be referred to as writing, and the act of interpreting the text as reading. In the study of writing systems, orthography refers to the method and rules of observed writing structure (literal meaning, "correct writing"), and in particular for alphabetic systems, includes the concept of spelling.
A grapheme is the technical term coined to refer to the specific base or atomic units of a given writing system. Graphemes are the minimally significant elements which taken together comprise the set of "building blocks" out of which texts of a given writing system may be constructed, along with rules of correspondence and use. The concept is similar to that of the phoneme used in the study of spoken languages. For example, in the Latin-based writing system of standard contemporary English, examples of graphemes include the majuscule and minuscule forms of the twenty-six letters of the alphabet (corresponding to various phonemes), marks of punctuation (mostly non-phonemic), and a few other symbols such as those for numerals (logograms for numbers).
Note that an individual grapheme may be represented in a wide variety of ways, where each variation is visually distinct in some regard, but all are interpreted as representing the "same" grapheme. These individual variations are known as allographs of a grapheme (compare with the term allophone used in linguistic study). For example, the minuscule letter a has different allographs when written as a cursive, block, or typed letter. The selection between different allographs may be influenced by the medium used, the writing instrument, the stylistic choice of the writer, and the largely unconscious features of an individual's handwriting.
The terms glyph, sign and character are sometimes used to refer to a grapheme. Common usage varies from discipline to discipline; compare cuneiform sign, Maya glyph, Chinese character. The glyphs of most writing systems are made up of lines (or strokes) and are therefore called linear, but there are glyphs in non-linear writing systems made up of other types of marks, such as Cuneiform and Braille.
Writing systems are conceptual systems, as are the languages to which they refer. Writing systems may be regarded as complete according to the extent to which they are able to represent all that may be expressed in the spoken language.
History of writing systems
Proto-writing
Before there was writing, there was proto-writing. However, few examples survive, and some authorities question whether the inscriptions are early writing at all. Some believe them to be ideographic, early mnemonic devices of sorts, which may have been invented over time by creative prehistoric individuals. The two best known examples are:
- Old European Script, 6000 BC - 4000 BC
- Tărtăria inscriptions, 4500 BC
Whether the Old European script is actual proto-writing or symbolic but non-linguistic artwork is disputed.
Invention of writing
The invention of the first writing systems is roughly contemporary with the beginning of the Bronze Age in the late Neolithic of the late 4th millennium BC. The first writing system is generally believed to have been the Sumerian script, which developed into cuneiform. However, Egyptian hieroglyphs and the undeciphered Proto-Elamite script also date to this era. Other early writing systems likely influenced by these innovations include the undeciphered Indus valley script, though its status as a writing system is unclear.
The Chinese script may have originated independently of the Middle Eastern scripts, around 1200 BC. The pre-Columbian writing systems of the Americas (including among others Olmec and Mayan) are also generally believed to have had independent origins.
The first pure alphabets emerged around 2000 BC in Ancient Egypt, but by then alphabetic principles had already been incorporated into Egyptian hieroglyphs for a millennium (see Middle Bronze Age alphabets).
Types of writing system
The oldest-known forms of writing were primarily logographic in nature, based on pictographic and ideographic elements. Most writing systems can be broadly divided into three categories: logographic, syllabic and alphabetic (or segmental); however, all three may be found in any given writing system in varying proportions, often making it difficult to categorise a system uniquely. The term complex system is sometimes used to describe those where the admixture makes classification problematic.
| Type of writing system | What each symbol represents | Example |
| Logographic | morpheme | Chinese hanzi |
| Syllabic | syllable | Japanese kana |
| Alphabetic | phoneme (consonant or vowel) | Latin |
| Abugida | phoneme (consonant+vowel) | Indian devanagari |
| Abjad | phoneme (consonant) | Arabic |
| Featural | phonetic feature | Korean hangul |
See also: phonemic and phonetic orthography.
Logographic writing systems
Main article: Logogram
A logogram is a single written character which represents a complete grammatical word. Most Chinese characters are classified as logograms.
As each character represents a single word (or, more precisely, a morpheme), many logograms are required to write all the words of a language. The vast array of logograms and the memorization of what they mean are the major disadvantages of logographic systems compared with alphabetic systems. However, since the meaning is inherent to the symbol, the same logographic system can theoretically be used to represent different languages. In practice, this is only true for closely related languages, like the Chinese languages, as syntactical constraints reduce the portability of a given logographic system. Both Korean and Japanese use Chinese logograms in their writing systems, with most of the symbols carrying the same or similar meanings. However, the semantics, and especially the grammar, are different enough that a Chinese text is not readily understandable by a Japanese or Korean reader.
While most languages do not use wholly logographic writing systems, many languages use some logograms. Good examples of modern western logograms are the Arabic numerals: everyone who uses those symbols understands what 1 means whether he or she calls it one, eins, uno, or ichi. Other western logograms include the ampersand &, used for and, and the at sign @, used in many contexts for at.
Logograms are sometimes called ideograms, a word that refers to symbols which graphically represent abstract ideas, but linguists avoid this use, as Chinese characters are often semantic–phonetic compounds, symbols which include an element that represents the meaning and an element that represents the pronunciation. Some nonlinguists distinguish between lexigraphy and ideography, where symbols in lexigraphies represent words, and symbols in ideographies represent words or morphemes.
The most important (and, to a degree, the only surviving) modern logographic writing system is the Chinese one, whose characters are used, with varying degrees of modification, in Chinese, Japanese, Korean, Vietnamese, and other east Asian languages. Ancient Egyptian hieroglyphics and the Mayan writing system are also systems with certain logographic features, although they have marked phonetic features as well, and are no longer in current use.
See List of writing systems for a list of predominantly-logographic writing systems.
Syllabic writing systems
Main article: Syllabary
Whereas logographic writing systems use a single symbol for an entire word, a syllabary is a set of written symbols that represent (or approximate) syllables, which make up words. A symbol in a syllabary typically represents a consonant sound followed by a vowel sound, or just a vowel alone. In a true syllabary there is no systematic graphic similarity between phonetically related characters (though some do have graphic similarity for the vowels). That is, the characters for "ke", "ka", and "ko" have no similarity to indicate their common "k"-ness. Compare abugida, where each grapheme typically represents a syllable but where characters representing related sounds are similar graphically (typically, a common consonantal base is annotated in a more or less consistent manner to represent the vowel in the syllable).
Syllabaries are best suited to languages with relatively simple syllable structure, such as Japanese. The English language, on the other hand, allows complex syllable structures, with a relatively large inventory of vowels and complex consonant clusters, making it cumbersome to write English words with a syllabary. To write English using a syllabary, every possible syllable in English would have to have a separate symbol, and whereas the number of possible syllables in Japanese is no more than one hundred or so, in English there are many thousands.
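The difference in scale can be illustrated with a rough count. The sketch below assumes a deliberately simplified syllable model (onset, nucleus, coda, with every combination allowed). The inventory sizes are hypothetical round numbers chosen only to show the order of magnitude; they are not measured figures for Japanese or English.

```python
def syllable_count(onsets: int, nuclei: int, codas: int) -> int:
    """Upper bound on distinct syllables when every onset, nucleus and coda
    option can combine freely. Each count includes the 'empty' option."""
    return onsets * nuclei * codas

# Hypothetical simple-structure language:
# 10 onset options, 5 vowels, 2 coda options (none, or one nasal).
simple = syllable_count(10, 5, 2)

# Hypothetical complex-structure language:
# 50 onset options (including clusters), 15 vowels, 60 coda options.
complex_ = syllable_count(50, 15, 60)

print(simple, complex_)  # 100 45000
```

Under these assumptions the first language needs about a hundred syllable signs, while the second would need tens of thousands, which is why a syllabary becomes impractical as syllable structure grows more complex.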
Other languages that use syllabic writing include Mycenaean Greek (Linear B) and Native American languages such as Cherokee. Several languages of the Ancient Near East used forms of cuneiform, which is a syllabary with some non-syllabic elements.
See List of writing systems for a list of syllabaries.
Alphabetic writing systems
Main article: Alphabet
An alphabet is a small set of letters — basic written symbols — each of which roughly represents or represented historically a phoneme of a spoken language. The word alphabet is derived from alpha and beta, the first two symbols of the Greek alphabet.
In a perfectly phonological alphabet, the phonemes and letters would correspond perfectly in two directions: a writer could predict the spelling of a word given its pronunciation, and a speaker could predict the pronunciation of a word given its spelling. Each language has general rules that govern the association between letters and phonemes, but, depending on the language, these rules may or may not be consistently followed.
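The two-directional correspondence can be stated as a simple check. The sketch below is illustrative only: it assumes an orthography can be summarized as a list of (phoneme, spelling) pairs, which ignores context-dependent spelling rules, and the function name and sample data are hypothetical.

```python
from collections import defaultdict

def correspondence(pairs):
    """Return (spelling_predictable, pronunciation_predictable).

    pairs: iterable of (phoneme, spelling) correspondences observed in an
    orthography. Spelling is predictable if every phoneme has exactly one
    spelling; pronunciation is predictable if every spelling has one phoneme.
    """
    spellings_for = defaultdict(set)
    phonemes_for = defaultdict(set)
    for phoneme, spelling in pairs:
        spellings_for[phoneme].add(spelling)
        phonemes_for[spelling].add(phoneme)
    spelling_ok = all(len(s) == 1 for s in spellings_for.values())
    pronunciation_ok = all(len(p) == 1 for p in phonemes_for.values())
    return spelling_ok, pronunciation_ok

# Toy one-to-one alphabet: predictable in both directions.
print(correspondence([("k", "k"), ("a", "a"), ("t", "t")]))  # (True, True)

# English "th" spells two different sounds, so reading it is ambiguous.
print(correspondence([("\u03b8", "th"), ("\u00f0", "th")]))  # (True, False)
```

An alphabet is perfectly phonological in the sense described above only when both values are true for the whole inventory.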
Perfectly phonological alphabets are very easy to use and learn, and languages that have them (for example, Finnish) have much lower barriers to literacy than languages such as English, which has a very complex and irregular spelling system. As languages often evolve independently of their writing systems, and writing systems have been borrowed for languages they were not designed for, the degree to which letters of an alphabet correspond to phonemes of a language varies greatly from one language to another and even within a single language. In modern times, when linguists invent a writing system for a language that didn't previously have one, the goal is usually to make a perfectly phonological alphabet. An example of such a writing system is the "IPA" (International Phonetic Alphabet).
See alphabet for more information about alphabets.
See List of writing systems for a list of alphabetic writing systems.
Abjads
Main article: Abjad
The first type of alphabet that was developed was the abjad. An abjad is an alphabetic writing system where there is one symbol per consonant. Abjads differ from regular alphabets in that they only have characters for consonantal sounds. Vowels are not usually marked in an abjad.
All known abjads (except maybe Tifinagh) belong to the Semitic family of scripts, and derive from the original Northern Linear Abjad. The reason for this is that Semitic languages and the related Berber languages have a morphemic structure which makes the denotation of vowels redundant in most cases.
Some abjads (like Arabic and Hebrew) have markings for vowels as well, but only use them in special contexts, such as for teaching. Many scripts derived from abjads have been extended with vowel symbols to become full alphabets, the most famous case being the derivation of the Greek alphabet from the Phoenician abjad. This has mostly happened when the script was adapted to a non-Semitic language.
The term abjad takes its name from the old order of the Arabic alphabet's consonants Alif, Bá, Jim, Dál, though the word may have earlier roots in Phoenician or Ugaritic.
Abjad is still the word for alphabet in Arabic and Indonesian.
See List of writing systems for a list of abjad-based writing systems.
Abugidas
Main article: Abugida
An abugida is an alphabetic writing system whose basic signs denote consonants with an inherent vowel and where consistent modifications of the basic sign indicate following vowels other than the inherent one.
Thus, in an abugida there is no sign for "k", but instead one for "ka" (if "a" is the inherent vowel), and "ke" is written by modifying the "ka" sign in a way that is consistent with how one would modify "la" to get "le". In many abugidas the modification is the addition of a vowel sign, but other possibilities are imaginable (and used), such as rotation of the basic sign, addition of diacritical marks, and so on.
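The "ka" to "ke" pattern can be seen directly in the Unicode encoding of Devanagari, an abugida of the Brahmic family. The sketch below is illustrative: the constant and function names are hypothetical, while the code points are the standard Unicode ones, whose official names the script prints.

```python
import unicodedata

KA = "\u0915"      # base sign read "ka" (inherent vowel a)
LA = "\u0932"      # base sign read "la"
SIGN_E = "\u0947"  # dependent vowel sign that replaces the inherent vowel with e

def with_vowel(base: str, vowel_sign: str) -> str:
    """Modify a base consonant sign by attaching a dependent vowel sign."""
    return base + vowel_sign

ke = with_vowel(KA, SIGN_E)  # "ke": the same mark is added ...
le = with_vowel(LA, SIGN_E)  # ... to "la" to give "le"

for ch in (KA, LA, SIGN_E):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The same vowel sign is attached to both base signs, which is the consistency that separates an abugida from a true syllabary.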
The obvious contrast is with syllabaries, which have one distinct symbol per possible syllable, and the signs for each syllable have no systematic graphic similarity. The graphic similarity in abugidas comes from the fact that most of them are derived from abjads: the consonant letters become the symbols with the inherent vowel, and the new vowel symbols are markings added to the base symbol.
The Ethiopic script is an abugida, although the vowel modifications in Ethiopic are not entirely systematic. Canadian Aboriginal Syllabics can be considered abugidas, although they are rarely thought of in those terms. The largest single group of abugidas is the Brahmic family of scripts, however, which includes nearly all the scripts used in India and Southeast Asia.
The name abugida is derived from the first four characters of an order of the Ge'ez script used in some religious contexts. The term was coined by Peter T. Daniels.
See List of writing systems for a list of abugida-based writing systems.
Featural writing systems
A featural script represents finer detail than an alphabet. Here symbols do not represent whole phonemes, but rather the elements (features) that make up the phonemes, such as voicing or its place of articulation. Theoretically, each feature could be written with a separate letter; and abjads or abugidas, or indeed syllabaries, could be featural, but the only prominent system of this sort is Korean Hangul. In Hangul, the featural symbols are combined into alphabetic letters, and these letters are in turn joined into syllabic blocks, so that the system combines three levels of phonological representation.
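The step from letters to syllabic blocks is regular enough to be computed. The sketch below uses the arithmetic composition rule for precomposed Hangul syllables from the Unicode Standard. The function name is hypothetical, and the indices follow Unicode's jamo ordering. It illustrates only the letter and syllable-block levels, not the featural design of the letters themselves.

```python
import unicodedata

S_BASE = 0xAC00             # first precomposed Hangul syllable
V_COUNT, T_COUNT = 21, 28   # number of vowels, number of finals (incl. "none")

def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Combine jamo indices into one precomposed syllable block."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)

# Lead consonant h (index 18), vowel a (index 0), final n (index 4).
han = compose(18, 0, 4)
print(f"U+{ord(han):04X}", unicodedata.name(han))

# Cross-check: the three conjoining jamo normalize to the same block.
jamo = chr(0x1100 + 18) + chr(0x1161 + 0) + chr(0x11A7 + 4)
print(unicodedata.normalize("NFC", jamo) == han)
```

The second check uses Unicode normalization to confirm that three separate letters and one syllable block are treated as equivalent encodings of the same text.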
See List of writing systems for a list of featural writing systems.
Directionality
Different scripts are written in different directions. The early alphabet could be written in any direction: either horizontal (left-to-right or right-to-left) or vertical (up or down). It could also be written boustrophedon: starting horizontally in one direction, then turning at the end of the line and reversing direction. Egyptian hieroglyphic writing is one such script: the beginning of a horizontally written line is indicated by the direction in which the animal and human ideograms are looking.
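The boustrophedon layout described above can be sketched in a few lines. This is a simplified illustration with a hypothetical function name: it reverses the order of characters on every second line, and does not model the mirroring of individual letter shapes that may accompany a change of direction.

```python
def boustrophedon(text: str, width: int) -> list[str]:
    """Split text into lines of `width` characters and reverse every second
    line, so that reading direction alternates from line to line."""
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return [line if n % 2 == 0 else line[::-1] for n, line in enumerate(lines)]

for line in boustrophedon("ABCDEFGHIJ", 4):
    print(line)
# ABCD
# HGFE
# IJ
```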
The Greek alphabet and its successors settled on a left-to-right pattern, from the top to the bottom of the page. Other scripts, such as Arabic and Hebrew, came to be written right-to-left. Many East Asian scripts, such as Chinese and Japanese, are written top-to-bottom, from the right to the left of the page. There are even scripts that are written from bottom to top, such as those formerly used in the Philippines and other Western Pacific Islands.
See also
- Artificial script
- Calligraphy
- ISO 15924 - list of "codes for the representation of names of scripts"
- List of writing systems
- List of inventors of writing systems
- Majuscule
- Minuscule
- Nü Shu
- Official script
- Orthography
- Pasigraphy
- Penmanship
- Shorthand
- Spelling
- Transliteration
- Written language
In computers and telecommunication systems, graphemes and other grapheme-like units required for text processing are represented by "characters" that typically manifest in encoded form. For technical aspects of computer support for various writing systems, see the articles CJK (Chinese, Japanese, Korean) and Bi-directional text, as well as :Category:Character encoding.
External links
- About African writing systems by the John Henrik Clarke Africana Library at Cornell University:
- http://www.library.cornell.edu/africana/Writing_Systems/Welcome.html
- General about writing systems
- http://www.omniglot.com/index.htm
- [http://omniglot.com/writing/alphabetic.htm Alphabetic Writing Systems]
- Michael Everson's [http://www.evertype.com/alphabets/index.html Alphabets of Europe]
- The [http://www.unicode.org/ Unicode Consortium]
- [http://www.digitas.harvard.edu/cgi-bin/wiki/ken/ATypographicOutcry A Typographic Outcry]: a curious perspective
References
- Coulmas, Florian. 1996. The Blackwell encyclopedia of writing systems. Oxford: Blackwell.
- Daniels, Peter T., and William Bright, eds. 1996. The world's writing systems. ISBN 0-19-507-993-0.
- DeFrancis, John. 1990. The Chinese Language: Fact and Fantasy. Honolulu: University of Hawaii Press. ISBN 0824810686
- Hannas, William. C. 1997. Asia's Orthographic Dilemma. University of Hawaii Press. ISBN 082481892X (paperback); ISBN 0824818423 (hardcover)
- Rogers, Henry. 2005. Writing Systems: A Linguistic Approach. Oxford: Blackwell. ISBN 0-631-23463-2 (hardcover); ISBN 0-631-23464-0 (paperback)
- Sampson, Geoffrey. 1985. Writing Systems. Stanford, California: Stanford University Press. ISBN 0-8047-1756-7 (paper), ISBN 0-8047-1254-9 (cloth).
- Smalley, W.A. (ed.) 1964. Orthography studies: articles on new writing systems, United Bible Society, London.
Consonant
:See also consonance in music.
A consonant is a sound in spoken language that is characterized by a closure or stricture of the vocal tract sufficient to cause audible turbulence. The word consonant comes from Latin and means "sounding with" or "sounding together", the idea being that consonants don't sound on their own, but only occur with a nearby vowel, which is the case in Latin. This conception of consonants, however, does not reflect the modern linguistic understanding which defines consonants in terms of vocal tract constriction.
Since the number of consonants in the world's languages is much greater than the number of consonant letters in any one alphabet, linguists have devised systems such as the International Phonetic Alphabet (IPA) to assign a unique symbol to each possible consonant. In fact, the Latin alphabet, which is used to write English, has fewer consonant letters than English has consonant sounds, so some letters represent more than one consonant, and digraphs like "sh" and "th" are used to represent some sounds. Many speakers aren't even aware that the "th" sound in "this" is a different sound from the "th" sound in "thing" (in the IPA they're [ð] and [θ], respectively).
Each consonant can be distinguished by several features:
- The manner of articulation is the method by which the consonant is articulated, such as nasal (through the nose), stop (complete obstruction of air), or approximant (vowel-like).
- The place of articulation is where in the vocal tract the obstruction of the consonant occurs, and which speech organs are involved. Places include bilabial (both lips), alveolar (tongue against the gum ridge), and velar (tongue against soft palate). Additionally, there may be a simultaneous narrowing at another place of articulation, such as palatalisation or pharyngealisation.
- The phonation of a consonant is how the vocal cords vibrate during the articulation. When the vocal cords vibrate fully, the consonant is called voiced; when they do not vibrate at all, it's voiceless.
- The voice onset time (VOT) indicates the timing of the phonation. Aspiration is a feature of VOT.
- The airstream mechanism is how the air moving through the vocal tract is powered. Most languages have exclusively pulmonic egressive consonants, which use the lungs and diaphragm, but ejectives, clicks, and implosives use different mechanisms.
- The length is how long the obstruction of a consonant lasts. This feature is not distinctive in English, but various languages such as Italian, Japanese and Finnish have two length levels, "single" and "geminate". Estonian and some Sami languages have three lengths on the phonetic level: short, geminate, and long geminate.
- The articulatory force is how much muscular energy is involved. This has been proposed many times, but no distinction relying exclusively on force has ever been demonstrated.
All English consonants can be classified by a combination of these features, such as "voiceless alveolar stop consonant" [t]. In this case the airstream mechanism is omitted.
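A feature-based classification of this kind maps naturally onto a small record type. The following is a minimal sketch with hypothetical class and function names. It models only three of the features listed above (phonation, place, manner) and, like the label for [t], leaves the airstream mechanism out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Consonant:
    symbol: str   # IPA symbol
    voicing: str  # phonation: "voiced" or "voiceless"
    place: str    # place of articulation
    manner: str   # manner of articulation

    def describe(self) -> str:
        return f"{self.voicing} {self.place} {self.manner}"

T = Consonant("t", "voiceless", "alveolar", "stop")
D = Consonant("d", "voiced", "alveolar", "stop")
M = Consonant("m", "voiced", "bilabial", "nasal")

def differing_features(a: Consonant, b: Consonant) -> list[str]:
    """List the features in which two consonants differ."""
    return [f for f in ("voicing", "place", "manner")
            if getattr(a, f) != getattr(b, f)]

print(T.describe())              # voiceless alveolar stop
print(differing_features(T, D))  # ['voicing']
```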
Some pairs of consonants like p::b, t::d are sometimes called fortis and lenis, but this is a phonological rather than phonetic distinction.
Consonant as a symbol
The word consonant is also used to refer to a letter of an alphabet that denotes a consonant sound. Consonant letters in the English alphabet are B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X, Z, and usually Y: The letter Y stands for the consonant [j] in "yoke" but for the vowel in "myth", for example.
See also
- Table of consonants
- List of consonants
- List of phonetics topics
Links
- [http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html interactive manner and place of articulation]
- [http://www.oneletterwords.com Dictionary of All-Consonant Words]: a free online dictionary with over 1,000 words with no vowels and examples of usage from literature.
Phoneme
In human language, a phoneme is a set of phones (speech sounds or sign elements) that are cognitively equivalent. It is the basic unit that distinguishes words and morphemes. That is, changing an element of a word from one phoneme to another produces either a different word or obvious nonsense; whereas changing an element from one phone to another, when both belong to the same phoneme, produces the same word with an odd or incomprehensible pronunciation.
Phonemes are not the physical segments themselves, but mental abstractions of them. A phoneme is a family of phones, called allophones, that the speakers of a language think of, and hear or see, as being the same.
In sign languages, the phoneme was formerly called a chereme (or cheireme), but usage changed to phoneme when it was recognized that the mental abstractions involved are essentially the same as in oral languages.
A "perfect" alphabet is one that has a single symbol for each phoneme.
Phonemics, a branch of phonology, is the study of the systems of phonemes of languages.
Although it is fundamental to most phonological theories, some linguists reject the theoretical validity of the phoneme. Some think that phonemes are more a product of literacy (i.e., the need to categorize the phonetics of a language in order to write it down systematically with a minimum number of letters). Other critics charge that the mind processes sub-phonemic elements of speech (e.g., features) in meaningful ways.
A common test to determine whether two phones are allophones or separate phonemes relies on finding so-called minimal pairs: words that differ only in the phones in question.
Background and related ideas
The term phonème was reportedly first used by Dufriche-Desgenettes in 1873, but it referred only to a speech sound. The term phoneme as an abstraction was developed by the Polish linguist Jan Niecisław Baudouin de Courtenay and his student Mikołaj Kruszewski during 1875-1895. The term used by these two was fonema, the basic unit of what they called psychophonetics. The concept of the phoneme was elaborated in the works of Nikolai Trubetzkoi and others of the Prague School (during the years 1926-1935), as well as in those of structuralists like Ferdinand de Saussure, Edward Sapir, and Leonard Bloomfield. Later, it was also used in generative linguistics, most famously by Noam Chomsky and Morris Halle, and remains central to any account of the development of virtually all modern schools of phonology.
The phoneme can be defined as "the smallest meaningful psychological unit of sound." The phoneme has mental, physiological, and physical substance: our brains process the sounds; the sounds are produced by the human speech organs; and the sounds are physical entities that can be recorded and measured.
For an example of phonemes, consider the English words pat and sat, which differ only in their initial consonants. This difference, known as contrastiveness or opposition, is sufficient to distinguish these words, and therefore the P and S sounds are said to be different phonemes. A pair of words that are identical except for such a sound are known as a minimal pair; this is the most frequent demonstration that two sounds are separate phonemes.
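The minimal-pair test can be expressed as a short search over a transcribed word list. The sketch below is illustrative: the function names are hypothetical, and the tiny lexicon uses simplified broad transcriptions entered by hand, one symbol per segment.

```python
from itertools import combinations

def is_minimal_pair(a, b) -> bool:
    """True if two transcriptions have equal length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return all word pairs in the lexicon whose transcriptions form a minimal pair."""
    return [(w1, w2)
            for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2)
            if is_minimal_pair(p1, p2)]

LEXICON = {
    "pat": ("p", "æ", "t"),
    "sat": ("s", "æ", "t"),
    "sad": ("s", "æ", "d"),
    "pit": ("p", "ɪ", "t"),
}

print(minimal_pairs(LEXICON))
# [('pat', 'sat'), ('pat', 'pit'), ('sat', 'sad')]
```

Each pair found is evidence that the two differing segments contrast, as with the initial sounds of pat and sat.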
If no minimal pair can be found to demonstrate that two sounds are distinct, it may be that they are allophones. Allophones are variant phones (i.e., sounds) that are not recognized as distinct by a speaker, are not meaningfully different in the language, and so are perceived as "the same". This is especially likely if they consistently occur in different environments. For example, the "dark" L sound at the end of the English word "wool" is quite different from the "light" L sound at the beginning of the word "leaf", but this difference is meaningless in English, and is determined by whether the sound is at the beginning or end of a word. A native English speaker might have a hard time hearing the difference at first, but in Turkish the difference between "light" and "dark" L is sufficient to distinguish words. That is, they are two separate phonemes in Turkish, but allophones of a single phoneme in English.
The phonemic relationship of two sounds may not be obvious to a non-native speaker, which is why minimal pairs and an understanding of phonetic environments are important. For example, in Korean, there is a phoneme /r/ that is a flapped r between vowels, and is an l-sound next to other consonants. These sound very different to an English speaker, who is attuned to hearing them because the differences are meaningful in English. However, the native Korean speaker has learned from an early age to filter out the difference, as it is not meaningful in Korean. In Korean, for instance, it is impossible to distinguish the two words "ram" and "lam", despite the fact that both R and L sounds occur in the language.
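The idea that allophones "consistently occur in different environments" can be checked mechanically. The sketch below is a rough illustration with hypothetical names and hand-entered observations. It reduces an environment to a single label, and disjoint environments are only suggestive evidence of allophony, not proof.

```python
def environments(observations, phone):
    """Collect the set of environments in which a phone was observed."""
    return {env for p, env in observations if p == phone}

def in_complementary_distribution(observations, a, b) -> bool:
    """True if phones a and b never share an environment in the observations."""
    return environments(observations, a).isdisjoint(environments(observations, b))

# (phone, environment) observations from the English examples in the text.
OBSERVATIONS = [
    ("l", "word-initial"),  # light L, as in "leaf"
    ("ɫ", "word-final"),    # dark L, as in "wool"
    ("p", "word-initial"),  # as in "pat"
    ("s", "word-initial"),  # as in "sat"
]

print(in_complementary_distribution(OBSERVATIONS, "l", "ɫ"))  # True
print(in_complementary_distribution(OBSERVATIONS, "p", "s"))  # False
```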
The exact number of phonemes in English depends on the speaker and the method of determining phoneme vs. allophone, but estimates typically range from 40 to 45, which is above average across all languages. Pirahã has only 10, while !Xóõ has 141.
Depending on the language and the alphabet used, a phoneme may be written consistently with one letter; however there are many exceptions to this rule — see Writing systems below.
Some languages make use of pitch for precisely the same purpose. In this case, the tones used are called tonemes. Some languages distinguish words made up of the same phonemes (and tonemes) by using different durations of some elements, which are called chronemes. However, the chroneme is not employed by the majority of scholars working on languages with distinctive duration, and the term itself may not even be recognized by most linguists. Usually, long vowels and consonants are represented either by a length indicator or by doubling the sound in question.
In sign languages, phonemes may be classified as Tab (elements of location, from Latin tabula), Dez (the hand shape, from designator), Sig (the motion, from signation), and with some researchers, Ori (orientation). Facial expressions and mouthing are also phonemic.
Notation
A transcription that only indicates the different phonemes of a language is said to be phonemic. Such transcriptions are enclosed within virgules (slashes), / /; these show that each enclosed symbol is claimed to be phonemically meaningful. On the other hand, a transcription that indicates finer detail, including allophonic variation like the two English L's, is said to be phonetic, and is enclosed in square brackets, [ ].
The common notation used in linguistics employs virgules (slashes) (/ /) around the symbol that stands for the phoneme. For example, the phoneme for the initial consonant sound in the word "phoneme" would be written as /f/. In other words, the graphemes are <ph>, but this digraph represents one sound, /f/. Allophones, real speech variants of a phoneme, are often denoted in linguistics by the use of diacritical or other marks added to the phoneme symbols and then placed in square brackets ([ ]) to differentiate them from the phoneme in slant brackets (/ /). The conventions of orthography are then kept separate from both phonemes and allophones by the use of the markers < > to enclose the spelling.
The symbols of the International Phonetic Alphabet (IPA) and extended sets adapted to a particular language are often used by linguists to write phonemes of oral languages, with the principle being one symbol equals one categorical sound. Due to problems displaying some symbols in the early days of the Internet, systems such as X-SAMPA and Kirshenbaum were developed to represent IPA symbols in plain text. As of 2004, any modern web browser can display IPA symbols (as long as the operating system provides the appropriate fonts), and we use this system in this article.
The only published set of phonemic symbols for a sign language is the Stokoe notation developed for American Sign Language, which has since been applied to British Sign Language by Kyle and Woll, and to Australian Aboriginal sign languages by Adam Kendon. However, there are several phonetic systems, such as SignWriting.
Examples
Examples of phonemes in the English language would include sounds from the set of English consonants, like and . These two are most often written consistently with one letter for each sound. However, phonemes might not be so apparent in written English, such as when they are typically represented with combined letters, called digraphs, like <sh> (pronounced ) or <ch> (pronounced ).
To see a list of the phonemes in the English language, see IPA for English.
Two sounds that may be allophones (sound variants belonging to the same phoneme) in one language may belong to separate phonemes in another language or dialect. In English, for example, has aspirated and non-aspirated allophones: aspirated as in , and non-aspirated as in . However, in many languages (e.g. Chinese), aspirated is a phoneme distinct from unaspirated . As another example, there is no distinction between and in Japanese; there is only one phoneme in Japanese, although the Japanese has allophones that make it sound more like an , , or to English speakers. The sounds and are distinct phonemes in English, but allophones in Spanish. (as in run) and (as in rung) are phonemes in English, but allophones in Italian and Spanish.
An important phoneme is the chroneme, a phonemically relevant extension of the duration of a consonant or vowel. Some languages or dialects, such as Finnish or Japanese, allow chronemes after both consonants and vowels. Others, like Italian or Australian English, use it after only one (in the case of Italian, consonants; in the case of Australian English, vowels).
Arguments against the phoneme
Some think that the phoneme, rather than being a basic mental unit of language, may well be a perceptual artifact of alphabetic literacy (see the terms Phonemic awareness and Phonological awareness). If not that, it may be an epiphenomenal aspect to listening removed from face-to-face encounters, that is, text-like listening (qv phone and feature). It could be said that the unit of the phoneme is a necessary construct if we wish to set a dynamic, complex spoken language into static, written form expressed at a sub-syllabic level, though the model is a simplification and nowhere near phonologically or phonetically complete. The phoneme also has a theoretical weakness from the perspective of phonology: it uses, in part, lexical criteria to determine something that is supposed to be phonological (i.e., minimal pairs of words to point out phonological categories).
Much of phonology, while accepting the phoneme as a possible model or unit of language for description, has largely moved past the segmental phoneme as a basic unit of speech, of speech processing or of language acquisition. This is because the concept of the 'feature' is viewed as beneath the level of the phoneme while also spanning across segments. Meanwhile, attempts at capturing a phonological picture of the psychological control and structure underlying real speech flounder on the inadequacies of the phoneme for such purposes (that is, the phoneme cannot account for co-articulation or assimilation of controlled speech, among other phenomena). However, the term, though variably defined and delimited, remains a widely and uncritically accepted concept in foreign language teaching and native literacy (especially for alphabetic languages, such as English).
Restricted phonemes
A restricted phoneme is a phoneme that can only occur in a certain environment: There are restrictions as to where it can occur. English has several restricted phonemes:
- /ŋ/, as in sing, occurs only at the end of a syllable, never at the beginning. (In many other languages, such as Swahili, /ŋ/ can start a word.)
- /h/ occurs only at the beginning of a syllable, never at the end. (A few languages such as Arabic allow /h/ at the ends of words.)
- In many American dialects with the cot-caught merger, occurs only before /r/, /l/, and in the diphthong .
- In non-rhotic dialects, /r/ can only occur before a vowel, never at the end of a word or before a consonant.
- Under most interpretations, and occur only before a vowel, never at the end of a syllable. However, many phonologists interpret a word like boy as either or .
Neutralization, archiphoneme, underspecification
Phonemes that are contrastive in certain environments may not be contrastive in all environments. In the environments where they don't contrast, the contrast is said to be neutralized. In English there are three nasal phonemes, /m, n, ŋ/, as shown by the minimal triplet,
However, these sounds are not contrastive before plosives such as /p, t, k/. Although all three phones appear before plosives, for example in limp, lint, link, only one of these may appear before each of the plosives. That is, the distinction is neutralized before each of the plosives:
- Only /m/ occurs before /p/,
- only /n/ before /t/, and
- only /ŋ/ before /k/.
Thus these phonemes are not contrastive in these environments, and according to some theorists, there is no evidence as to what the underlying representation might be. If we hypothesize that we are dealing with only a single underlying nasal, there is no reason to pick one of the three phonemes over the other two.
(In some languages there is only one phonemic nasal anywhere, and due to obligatory assimilation, it surfaces as in just these environments, so this idea is not as far-fetched as it might seem at first glance.)
In certain schools of phonology, such a neutralized distinction is known as an archiphoneme (Nikolai Trubetzkoy of the Prague school is often associated with this analysis). Archiphonemes are often notated with a capital letter. Following this convention, the neutralization of /m, n, ŋ/ before /p, t, k/ could be notated as |N|, and limp, lint, link would be represented as |lɪNp, lɪNt, lɪNk|. (The |pipes| indicate underlying representation.) Other ways this archiphoneme could be notated are |m-n-ŋ|, , or |n - |.
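One way to picture the archiphoneme analysis is as a rule that fills in the underspecified nasal from the following plosive. The sketch below is an illustration only, covering just the three environments from limp, lint, link; the names `NASAL_BEFORE` and `realize` are assumptions, and each character of the input string is treated as one segment.

```python
# Nasal that surfaces before each plosive (limp, lint, link).
NASAL_BEFORE = {"p": "m", "t": "n", "k": "ŋ"}

def realize(underlying):
    """Replace each archiphoneme N with the nasal matching the next plosive."""
    out = []
    for i, seg in enumerate(underlying):
        if seg == "N":
            nxt = underlying[i + 1] if i + 1 < len(underlying) else None
            if nxt not in NASAL_BEFORE:
                raise ValueError("N is only defined before p, t, k in this sketch")
            out.append(NASAL_BEFORE[nxt])
        else:
            out.append(seg)
    return "".join(out)

print(realize("lɪNp"), realize("lɪNt"), realize("lɪNk"))
```

The rule deliberately refuses to guess in any other environment, mirroring the point that the underlying form makes no claim about which of the three nasals is basic.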
Another example from English is the neutralization of the plosives /k, g/ following /s/. Phonetically, the unaspirated tenuis plosive in sky is closer to English /g/, which is partially voiceless in initial position, than to aspirated /k/. This can be heard by comparing the sky with this guy; also, young children who are not yet able to produce consonant clusters often pronounce sky as what sounds like to adult ears. That is, /k/ and /g/ are contrastive word-initially,
But not after an /s/,
Thus one cannot say whether the underlying representation of the plosive in sky is /skai/ without aspiration, or /sgai/ without voicing. This neutralization can instead be represented as an archiphoneme |G|, in which case the underlying representation of sky would be |sGai|.
Another way to talk about archiphonemes involves the concept of underspecification. Phonemes can be considered fully specified segments while archiphonemes are underspecified segments. In Tuvan, phonemic vowels are specified with the features of tongue height, backness, and lip rounding. The archiphoneme |U| is an underspecified high vowel where only the tongue height is specified.
Whether |U| is pronounced as front or back and whether rounded or unrounded depends on vowel harmony. If |U| occurs following a front unrounded vowel, it will be pronounced as the phoneme ; if following a back unrounded vowel, it will be as an ; and if following a back rounded vowel, it will be an . This can be seen in the following words:
Not all phonologists accept the concept of archiphonemes; many doubt that it reflects how people process language.
Non-phonemes
Prothesis, epenthesis and paragoge due to phonotactics add sounds into words without adding meaning. Nevertheless, the sound is added, and thus the phoneme status may be ambiguous. For example, Spanish prothetic e- must be added before consonant clusters, e.g. estres.
Phonological extremes
Of all the sounds that a human vocal tract can create, different languages vary considerably in the number of these sounds that are considered to be distinctive phonemes in the speech of that language. Ubyx and some dialects of Abkhaz have only two phonemic vowels, and many Native American languages have three. At the other extreme, the Bantu language Ngwe has fourteen vowel qualities, twelve of which may occur long or short, for twenty-six oral vowels, plus six nasalized vowels, long and short, for thirty-eight vowels; while !Xóõ achieves thirty-one pure vowels (not counting vowel length, which it also has) by varying the phonation. Rotokas has only six consonants, while !Xóõ has somewhere in the neighborhood of seventy-seven, and Ubyx eighty-one. French has no phonemic tone or stress, while several of the Kam-Sui languages have nine tones, and one of the Kru languages, Wobe, has been claimed to have fourteen, though this is disputed. The total number of phonemes in languages varies from as few as eleven in Rotokas to as many as 112 in !Xóõ (including four tones). These may range from familiar sounds like , , or to very unusual ones produced in extraordinary ways (see: Click consonant, phonation, airstream mechanism). The English language itself uses a rather large set of thirteen to twenty-two vowels, including diphthongs, though its twenty-two to twenty-six consonants are close to average. (There are twenty-one consonant and five vowel letters in the English alphabet, but this does not correspond to the number of consonant and vowel sounds.)
The most common vowel system consists of the five vowels . The most common consonants are . A very few languages lack one of these: standard Hawai‘ian lacks , Mohawk lacks and , Hupa lacks both and a simple , colloquial Samoan lacks and , while Rotokas and Quileute lack and . While most of these languages have very small inventories, Quileute and Hupa have quite complex consonant systems.
The ways that sounds are pronounced can vary slightly from language to language even if the same IPA symbol is used. For example, the Finnish word maat ("countries") sounds different from the British English (Received Pronunciation) word mart even though both are transcribed as IPA [http://www.helsinki.fi/hum/hyfl/projektit/vokaalikartat_eng.html#sweswedish_vowels]; the Spanish word sin ("without") has a somewhat different vowel from the American English seen though both are transcribed as .
Writing systems
In a phonemic writing system, a given symbol represents a single phoneme and each phoneme is represented by a single symbol. This may differ from a phonetic orthography, which only requires that the spelling be unambiguously determined by the pronunciation, and the pronunciation unambiguously indicated by the spelling. English spelling is the classic example of a nonphonemic, and indeed unphonetic, spelling system. Welsh and Irish are, by contrast, among the more predictable orthographies of languages using the Latin alphabet. In French, rules to predict pronunciation from spelling are quite simple and have few exceptions, as long as there are some clues such as context or part of speech, but guessing spelling from pronunciation is quite difficult, especially because of the many silent letters. Italian, Spanish and especially Finnish have a very close letter-to-phoneme correspondence. Karelian, which has no standard language but does have a complete spelling system, is spelled perfectly phonemically.
However, the split between phonemic and nonphonemic orthographies is exaggerated. All languages are written with conventions that represent both meaning and pronunciation. This is true at both ends of the scale: Chinese characters are first and foremost symbols of words, but they have some phonetic information as well. At the other extreme, there are a few orthographies which are perfect phonemic representations of an artificial national standard, but since they make no effort to represent variation in pronunciation within the language, they too are conventional.
Other languages fall somewhere in between. Although English is often given as an example of an unphonetic orthography, its system is nowhere near as purely conventional as Chinese writing. English spelling conveys etymological information, but also vast amounts of phonetic information. Spanish is often given as an example of a phonetic orthography, but it has numerous imperfections including silent letters. It is, at least, possible to tell the correct pronunciation of any written Spanish word. Another phonemic orthography is Serbian. Its phonemicity was established by Vuk Stefanović Karadžić, the Serbian "Webster". He followed a strict phonemic principle, which is best told in his own words: "Write as you speak and read as it is written." Hindi, a descendant of Sanskrit, is an example of a language with a phonetic orthography written in a non-Roman alphabet.
See also
- Minimal pair
- Phone
- Phonology
- Emic and etic
- Tone (linguistics)
- Morphophonology
- List of phonetics topics
- Initial-stress-derived noun
External links
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhoneme.htm What is a phoneme? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAnAllophone.htm What is an allophone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhone.htm What is a phone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAPhoneticallySimilarSegm.htm What is a phonetically similar segment? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAMinimalPair.htm What is a minimal pair? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsComplementaryDistributio.htm What is complementary distribution? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAnEnvironment.htm What is an environment? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsContrastInIdenticalEnvir.htm What is contrast in identical environments? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsContrastInAnalogousEnvir.htm What is contrast in analogous environments? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/ComparisonOfMorphemeMorphAllom.htm Comparison of morpheme-morph-allomorph & phoneme-phone-allophone? (SIL)]
- [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/Phonology.htm What is phonology? (SIL)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=phoneme Phoneme (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=allophony Allophony (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=transcription Transcription (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=Grapheme-phoneme+conversion Grapheme-Phoneme Conversion (Lexicon of Linguistics)]
- [http://www2.let.uu.nl/UiL-OTS/Lexicon/zoek.pl?lemma=Phoneme+restoration Phoneme Restoration (Lexicon of Linguistics)]
- [http://moodle.ed.uiuc.edu/wiked/index.php/Phonemic_awareness phonemic awareness]
Category:Phonetics
Category:Phonology
Grapheme
A grapheme designates the atomic unit in written language. Graphemes include letters, Chinese ideograms, numerals, punctuation marks, and other symbols.
In a phonological orthography a grapheme corresponds to one phoneme. In spelling systems that are non-phonemic — such as the spellings used most widely for written English — multiple graphemes may represent a single phoneme. These are called digraphs (two graphemes for a single phoneme) and trigraphs (three graphemes). For example, the word ship contains four graphemes (s, h, i, and p) but only three phonemes, because sh is a digraph. An example of a trigraph is the tch in itch.
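The ship and itch examples can be reproduced with a small greedy segmenter that prefers the longest known multi-letter unit at each position. This is a sketch under stated assumptions: the list `MULTIGRAPHS` is illustrative and far from exhaustive, and `segment` is a name invented for the example.

```python
# Illustrative, non-exhaustive list of English digraphs and trigraphs.
MULTIGRAPHS = ["tch", "sh", "ch", "th", "ph"]

def segment(word):
    """Split a spelling into units, preferring the longest multigraph."""
    units = []
    i = 0
    while i < len(word):
        for m in sorted(MULTIGRAPHS, key=len, reverse=True):
            if word.startswith(m, i):
                units.append(m)
                i += len(m)
                break
        else:
            # No multigraph starts here: the single letter is its own unit.
            units.append(word[i])
            i += 1
    return units

print(segment("ship"))  # four letters, three units
print(segment("itch"))  # four letters, two units
```

A purely greedy approach will mis-segment words where adjacent letters belong to different sounds (for example the s and h of mishap), so real grapheme-to-phoneme conversion needs more than a lookup list.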
Different glyphs can represent the same grapheme. For example, the minuscule letter a can be seen in two variants, with a hook at the top, and without. Not all glyphs are graphemes; for example the logogram ampersand (&) represents the Latin word et (English word and), which contains two phonemes.
See also
- Digraph (orthography)
- Trigraph (orthography)
- Allograph (orthography)
- Tilde
Category:Linguistics
Vowel mark:This article concerns the vowel points or vowel marks of Hebrew. For those of Arabic, see Harakat.
In Hebrew orthography, Niqqud or Nikkud (Standard Hebrew , Biblical Hebrew ְקֻ, Tiberian Hebrew "vowels") is the system of diacritical vowel points (or vowel marks) in the Hebrew alphabet. Several orthographic systems for representing Hebrew vowels were developed in the early middle ages. The most widespread system (and the only one still used to a significant degree today) was created by the Masoretes of Tiberias (see Masoretic Text, Tiberian Hebrew).
Niqqud marks are small compared to the consonants they are positioned adjacent to, and thus can be added without requiring the retranscription of texts whose writers did not anticipate their eventual addition.
Non-speakers of Hebrew give their greatest attention to vowel points (usually without using the word "niqqud") in the context of controversy over the interpretation of those written with the Tetragrammaton -- written as ְהָה in Hebrew. The interpretation affects discussion of the authentic ancient pronunciation of the name whose other conventional English forms are "Jehovah" and "Yahweh".
The signs of the niqqud
This table uses the consonants , or , where appropriate, to demonstrate where the niqqud is placed in relation to the consonant it is pronounced after. Any other consonants shown are actually part of the vowel. Note that there is some variation among different traditions in exactly how some vowel points are pronounced. The table below shows how most Israelis would pronounce them, but the classic Ashkenazi pronunciation, for example, differs in several respects.
:This demonstration is known to work in Internet Explorer and Mozilla browsers in at least some circumstances, but in most other Windows browsers the niqqud do not properly combine with the consonants. This is because, currently, the Windows text display engine does not combine the niqqud automatically. Except as noted, the vowel pointings should appear directly beneath the consonants, although the accompanying "vowel letter" consonants for the mālê (unchangeable long) forms appear after.
See also
- The Arabic equivalent Harakat
Category:Jewish texts
Category:Hebrew language
Arabic alphabet
The Arabic alphabet is the script used for writing in the Arabic language.
Because the Qur'an, the holy book of Islam, is written with this alphabet, its influence spread with that of Islam. As a result, the Arabic alphabet is used to write many other languages, including languages that belong to families other than Semitic (the family to which Arabic belongs), for example Persian and Urdu.
In order to accommodate the phonetics of other languages, the alphabet has been adapted by the addition of letters and other symbols. (See #Arabic alphabets of other languages below).
The alphabet presents itself in different styles such as Nasta'līq, Thuluth, Kufic and others (see Arabic calligraphy), just like different handwriting styles and typefaces for the Roman alphabet. Superficially, these styles appear quite different, but the basic letterforms remain the same.
Structure of the Arabic alphabet
The Arabic alphabet is written from right to left and is composed of 28 basic letters. Adaptations of the script for other languages such as Persian and Urdu have additional letters. There is no difference between written and printed letters; the writing is unicase (i.e. the concept of upper and lower case letters does not exist). On the other hand, most of the letters are attached to one another, even when printed, and their appearance changes as a function of whether they connect to preceding or following letters. Some combinations of letters form special ligatures.
The Arabic alphabet is an "impure" abjad—short vowels are not written, though long ones are—so the reader must know the language in order to restore the vowels. However, in editions of the Qur'an or in didactic works a vocalization notation in the form of diacritic marks is used. Moreover, in vocalized texts, there is a series of other diacritics of which the most modern are an indication of vowel omission (sukūn) and the lengthening of consonants (šadda).
The names of Arabic letters can be thought of as abstractions of an older version where the names of the letters signified meaningful words in the Proto-Semitic language.
There are two orders for Arabic letters in the alphabet. The original Abjadī order matches the ordering of letters in all alphabets derived from the Phoenician alphabet, including the English ABC. The standard order used today, and shown in the table, is the Hejā'ī order, where letters are grouped according to their shape.
Abjadi order
The special Abjadī order (or two slightly variant orders) was devised by matching an Arabic letter of the fully consonant-dotted 28-letter Arabic alphabet to each of the 22 letters of the Aramaic alphabet (in their old Phoenician/Hebrew/Aramaic alphabetic order) — leaving six remaining Arabic letters at the end. The Abjadī order is not a simple historically-continuous preservation of the earlier north Semitic alphabetic order, since it contains a position corresponding to the Aramaic letter semkat/samekh , yet no letter of the Arabic alphabet historically comes from ס. Similarly, the Abjad orders include in their first 22 positions some letters ( and ) which did not exist until the Arabic alphabet was expanded by consonant dotting.
The most common Abjad sequence is:
This is commonly vocalized as follows:
: - .
Another vocalization is:
: -
Another (probably older) Abjad sequence, now mainly confined to the Maghreb, is:
which can be vocalized as:
: -
Presentation of the alphabet
The following table provides all of the Unicode characters for Arabic, and none of the supplementary letters used for other languages. Current browser technology still has not caught up, so some of these forms may not display correctly. The table also shows some of the many Latin-alphabet characters that have been used. There are at least a half dozen standards for transliterating Arabic characters. Multiple methods have proliferated due to various conflicting goals. See the article Arabic transliteration for more on this topic and the different transliteration methods.
To complicate the entire question still further, there are regional differences in the way Arabic speakers pronounce the various letters, even when speaking the standard, literary language (Fusha). This chart only attempts to set forth the "standard" pronunciation as taught in universities. The phonetic equivalents are given in the Continental version of the International Phonetic Alphabet. For more details concerning the pronunciation of Arabic, consult the article Arabic phonology.
Primary letters
Letters lacking an initial or medial version are never tied to the following letter, even within a word. As to hamza, it has only a single graphic, since it is never tied to a preceding or following letter. However, it is sometimes 'seated' on a waw, ya or alif, and in that case the seat behaves like an ordinary waw, ya or alif.
Technically, hamza is not a letter, but a diacritic.
Other characters
The following are not actual letters, but rather different orthographical shapes for letters, and in the case of the , a ligature.
Notes
The 'alif maqṣūra, commonly encoded as U+0649 (ى) in Arabic, is sometimes replaced in Persian or Urdu with U+06CC (ی), called "Farsi Yeh". This is appropriate to its pronunciation in those languages. The glyphs are identical in isolated and final form (ﻯ ﻰ), but not in initial and medial form, in which the Farsi Yeh gains two dots below (ﯾ ﯿ) while the 'alif maqṣūra has neither an initial nor a medial form.
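The two code points mentioned in this note can be inspected with Python's standard `unicodedata` module. The helper name `describe` is an assumption made for the example; the character names it prints come from the Unicode database bundled with the interpreter.

```python
import unicodedata

def describe(ch):
    """Return the code point (as U+XXXX) and the Unicode name of a character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in ("\u0649", "\u06CC"):
    print(*describe(ch))
```

Because the two characters are distinct code points, text that mixes them may look the same in isolated or final position yet fail to match in a plain string search.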
Writing the hamza
Initially, the letter indicated an occlusive glottal, or glottal stop, transcribed by , confirming the alphabet came from the same Phoenician origin. Now it is used in the same manner as in other abjads, with and , as a mater lectionis, that is to say, a consonant standing in for a long vowel (see below). In fact, over the course of time its original consonantal value has been obscured, since now serves either as a long vowel or as graphic support for certain diacritics (madda or hamza).
The Arabic alphabet now uses the hamza to indicate a glottal stop, which can appear anywhere in a word. This letter, however, does not function like the others: it can be written alone or on a support in which case it becomes a diacritic:
- alone: ;
- with a support: (above and under a ), (above a ), (above a without points or ).
The details of writing of the hamza are discussed below, after that of the vowels and syllable-division marks, because their functions are related.
Ligatures
The only compulsory ligature is lām+'alif. All other ligatures (yaa - mīm, etc.) are optional.
Some fonts include a glyph for the phrase Salla-llahu 'alayhi wasallam.
Muslims normally use this phrase after any mention of the prophet Muhammad.
Fonts also include a special glyph for the word "li-llah", which means "to God".
Combined with the letter 'alif, it becomes Allah:
The latter is a work-around for the shortcomings of most text processors, which are incapable of displaying the correct vowel marks for the word "Allah". Compare the display below, which depends on your browser and installed fonts:
:
Alternatively, some fonts may be designed to replace the sequence lam-lam-hā' or alif-lam-lam-hā' with the ligature U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM, but which sequence is replaced seems to depend on the font, with Arial, Bitstream Cyberbit and Times New Roman as examples of the former, and Arial Unicode MS as an example of the latter. This is probably because some font designers interpret U+FDF2 as "li-llah", and others as "Allah", and so design the glyph and replacement mapping accordingly.
lam-lam-hā':
alif-lam-lam-hā':
U+FDF2:
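How the Unicode character database itself treats U+FDF2 can be checked with Python's standard `unicodedata` module. The sketch below makes no claim about any particular font; it only prints the character's name and whatever compatibility decomposition the local Unicode data records. The variable names are assumptions made for the example.

```python
import unicodedata

LIGATURE = "\uFDF2"

# Official character name and raw decomposition field from the database.
print(unicodedata.name(LIGATURE))
print(unicodedata.decomposition(LIGATURE))

# NFKC normalization replaces the ligature with its component letters.
expanded = unicodedata.normalize("NFKC", LIGATURE)
print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in expanded])
```

Comparing the printed component letters with the two sequences above shows which reading the standard's own decomposition corresponds to, independently of how individual fonts draw the glyph.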
Diacritics
See also article Harakat.
Vowels
Arabic short vowels are generally not written, except sometimes in sacred texts (such as the Qurʼan) and didactics, which are known as vocalised texts. Occasionally short vowels are marked where the word would otherwise be ambiguous and cannot be resolved simply from context.
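The relationship between a vocalised and an unvocalised spelling can be illustrated by stripping the combining marks from a string. This is a minimal sketch, assuming that every mark to be removed has Unicode general category Mn (non-spacing mark); the function name `strip_marks` is invented for the example, and the sample is the word for "heart" written with a fatḥa and a sukūn.

```python
import unicodedata

def strip_marks(text):
    """Drop non-spacing combining marks, leaving the consonantal skeleton."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

# qaf, fatha, lam, sukun, ba (escaped so the marks survive copy and paste)
vocalised = "\u0642\u064E\u0644\u0652\u0628"
skeleton = strip_marks(vocalised)
print(len(vocalised), len(skeleton))
```

Note that this filter removes every non-spacing mark, including a combining hamza, so it is too blunt for text where such marks carry consonantal information.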
Short vowels may be written with diacritics placed above or below the consonant that precedes them in the syllable. (All Arabic vowels, long and short, follow a consonant; contrary to appearances: there is a consonant at the start of a name like Ali in Arabic or a word like .)
Long "a" following a consonant other than hamzah is written with a short-"a" mark on the consonant plus an alif after it. Long "i" is a mark for short "i" plus a yāʼ, and long "u" is a mark for short "u" plus a waaw (so aā = ā, iy = ī and uw = ū). Long "a" following a hamzah sound may be represented by an alif-madda or by a floating hamzah followed by an alif.
In an un-vocalised text (one in which the short vowels are not marked), the long vowels are represented by the consonant in question (alif, yaa, waaw).
Long vowels written in the middle of a word are treated like consonants taking sukūn (see below) in a text that has full diacritics.
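The equivalences aā = ā, iy = ī and uw = ū can be applied mechanically to a letter-by-letter romanization. The sketch below is a simplification under stated assumptions: the input is a hypothetical romanization in which each written mark and letter has been transliterated separately, and y or w is treated as a vowel lengthener only when no vowel follows it. The function name `collapse_long_vowels` is invented for the example.

```python
import re

VOWELS = "aiuāīū"

def collapse_long_vowels(s):
    """Rewrite short vowel + lengthening letter as a single long vowel."""
    s = s.replace("aā", "ā")
    # y and w lengthen the vowel only when they do not start a new syllable.
    s = re.sub(rf"iy(?![{VOWELS}])", "ī", s)
    s = re.sub(rf"uw(?![{VOWELS}])", "ū", s)
    return s

for word in ("kitaāb", "kabiyr", "nuwr", "qiyam"):
    print(word, "->", collapse_long_vowels(word))
```

Geminated y or w and the diphthongs ay and aw are not handled here; they would need the sukūn and shadda information that a fully vocalised text provides.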
For clarity, vowels will be placed above or below the letter so it is necessary to read the results etc. Please note, is one of the six letters that do not connect to the left, and is used in this demonstration for clarity. Most other letters connect to , and .
Other Symbols and Signs
Shadda
The shadda marks the gemination (doubling) of a consonant; a kasra (when present) moves to between the shadda and the geminated (doubled) consonant.
Sukūn
An Arabic syllable can be open (ended by a vowel) or closed (ended by a consonant).
- open: CV[consonant-vowel] (long or short vowel)
- closed: CVC (short vowel only)
When the syllable is closed, we can indicate that the consonant that closes it does not carry a vowel by marking it with a sign called sukūn, which takes the form "°", to remove any ambiguity, especially when the text is not vocalised: it's necessary to remember that a standard text is only composed of series of consonants; thus, the word , "heart", is written . Sukūn allows us to know where not to place a vowel: could, in effect, be read , but written with a sukūn over the and the , it can only be interpreted as the form (as for knowing which vowel to use, the word has to be memorised); we write this .
You might think that in a vocalised text sukūn is not necessary, because the lack of a vowel after a consonant could be signalled by simply not writing any mark above it, so that sukūn would be redundant. That is not so, because such a convention ("lack of any vowel mark means lack of vowel sound") does not exist: an unmarked consonant may indeed be read with a vowel. Such a rule would make sense only if everybody writing a vowel mark were forced to write all the vowel marks in the same word, and that is not the case. In fact, you may write as many or as few of the vowel marks as you like. In the Qurʼan, however, all vowel marks must be written: there, a sukuun over a letter (other than the alif indicating long "a") indicates that it is pronounced but not followed by a short vowel, while the lack of any sign over a letter (other than alif) indicates that the consonant is not pronounced.
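Because writers may include as many or as few marks as they like, software that compares Arabic words often strips the marks first. The following is a minimal sketch, assuming the marks of interest are the eight code points U+064B to U+0652 (the three tanwin marks, fatha, damma, kasra, shadda and sukūn); the function names are hypothetical.

```python
import re

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
MARKS = re.compile("[\u064B-\u0652]")
SUKUN = "\u0652"

def strip_marks(text):
    """Return the bare consonantal skeleton of a (partly) vocalised string."""
    return MARKS.sub("", text)

def same_skeleton(a, b):
    """True if two spellings differ only in which optional marks were written."""
    return strip_marks(a) == strip_marks(b)

# q-l-b written with fatha on the first letter and sukun on the second.
qalb_marked = "\u0642\u064E\u0644\u0652\u0628"
qalb_bare = "\u0642\u0644\u0628"

print(same_skeleton(qalb_marked, qalb_bare))  # True
print(SUKUN in qalb_marked)                   # True
```

Comparing skeletons in this way treats a fully marked, partly marked and unmarked spelling of a word as the same string, which mirrors how a reader treats them.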
Outside of the Qurʼan, putting a sukuun above a yaa' that indicates long ee, or above a waaw that stands for long oo, is extremely rare, to the point that yaa with sukuun will be unambiguously read as the diphthong ai (as in English "eye") and waaw with sukuun will be read au (as in English "cow").
So the word zawj, "husband", can be written simply as zwj (which might also be read "zooj" if such a word existed); or with a sukūn over the waaw, which is unambiguously "zawj"; or with sukūn and vowels.
The letters موسيقى (with an alif maqṣūra at the end of the word) will be read most naturally as the word "mooseekaa" ("music"). If you were to write sukuuns above the waaw, yaa and alif, you would get a form that looks like "mowsaykay" (note that an alif maqṣūra is an alif and never takes sukūn).
You cannot place a sukuun on the final letter of a noun even if you do not pronounce a vowel there, because fully vocalised texts are always written as if the iʻrāb (case-ending) vowels were in fact pronounced, and such a word can never have a sukuun as its iʻrāb. Take a sentence meaning "Ahmed is a bad husband". Although most people pronounce the word for "husband" without its case ending, you cannot write the mark for sukuun over its final letter; you either leave it markless or use the mark for the case ending. By the same token, you can leave the final letter of this sentence either completely unmarked or topped with a shadda plus the case-ending mark, but a sukuun never belongs there, even though the only correct pronunciation of that word at the end of an utterance has no final vowel.
Rules for hamza
Summary
- Initial hamza is always written over or under an alif. Otherwise, surrounding vowels determine the seat of the hamza, but preceding long vowels or diphthongs are ignored (as are final short vowels).
- i over u over a if there are two conflicting vowels that “count”; on the line if there are none.
- As a special case, āʼa, ūʼa and awʼa require hamza on the line, instead of over an alif as you would expect from rule #1. (See III.1b below.)
- Two adjacent alifs are never allowed. If the rules call for this, replace the combination by a single alif-madda.
Detailed Description
- Logically, hamza is just like any other letter, but it may be written in different ways. It has no effect on the way other letters are written. In particular, surrounding long vowels are written just as they always are, regardless of the “seat” of the hamza – even if this results in the appearance of two consecutive waws or yaas.
- Hamza can be written in four ways – on its own (“on the line”) or over an alif, waw, or yaa, called the “seat” of the hamza. When written over yaa, the dots that would normally be written underneath disappear.
- When, according to the rules below, a hamza with an alif seat would occur before another alif, a single alif is written instead, with the madda symbol over it.
- The rules for hamza depend on whether it occurs as the initial, middle, or final letter (not sound) in a word. (Thus, final short inflectional vowels do not count, but when –an is written as alif-tanwiin, it does count and the hamza is considered middle.)
I. If the hamza is initial:
- It is always written on an alif: over it if the following sound is a or u, under it if i follows.
- If long ā follows, alif-madda will occur.
II. If the hamza is final:
- If a short vowel precedes, the hamza is written over the letter (alif, waw, or yaa) corresponding to the short vowel.
- Otherwise (i.e. long vowel, diphthong or consonant preceding), the hamza is written on the line.
III. If the hamza is middle:
- If a long vowel or diphthong precedes, the seat of the hamza is determined mostly by what follows:
: - If i or u follows, the hamza is written over yaa or waw, accordingly.
: - Otherwise, the hamza wants to be written on the line. If a yaa precedes, however, this would conflict with the stroke joining the yaa to the following letter, so the hamza is (in print, at least) written over yaa.
- Otherwise, both preceding and following vowels have an effect on the hamza.
: - If there is only one vowel (or two of the same kind), that vowel determines the seat (alif, waw, or yaa).
: - If there are two conflicting vowels, i takes precedence over u, and u over a.
: - Alif-madda will occur if appropriate.
- Not surprisingly given the complexity of these rules, there is some disagreement.
: - Barron’s "201 Arabic Verbs" follows these rules exactly (although the sequence ūʼū does not occur; see below).
: - John Mace’s "Teach Yourself Arabic Verbs and Essential Grammar" presents alternative forms in almost all cases when hamza is followed by a long ū. The motivation appears to be to avoid two waws in a row. Generally, the choice is between the form following the rules here and an alternative form using hamza over yaa in all cases. Exceptions:
:: - In the sequence ūʼū, the alternatives are hamza on the line or hamza over yaa, where the rules here would call for hamza over waw. Perhaps the resulting sequence of three waws would be especially repugnant?
:: - In the sequence , the alternative form has hamza over alif, not yaa.
:: - The forms have no alternative form. (But note with the same sequence of vowels!)
: - Haywood and Nahmad’s "A new Arabic grammar" doesn’t write the paradigms out in full but in general agrees with John Mace’s book, including the alternative forms – and sometimes lists a third alternative where the entire sequence is written as a single hamza over waw instead of as two letters.
: - "Al-Kitaab fii Ta:allum ..." presents paradigms with hamza written the same way throughout, regardless of what the rules above say. Thus with hamza only over alif, with hamza only over yaa, with hamza only over alif although this is not allowed in any of the previous three books. (This appears to be an over-generalization on the part of the Al-Kitaab writers.)
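The rules in sections I to III can be read as a small decision procedure. The sketch below is one possible rendering, not an authoritative implementation: the function name `hamza_seat` and the string encoding of vowels ("a", "i", "u" for short vowels, "aa", "ii", "uu" for long vowels, "ay" and "aw" for diphthongs, `None` for no vowel) are assumptions made for this example, and the precedence i over u over a is the commonly taught ordering. It ignores the textbook disagreements listed above.

```python
SEAT = {"a": "alif", "u": "waw", "i": "yaa"}
RANK = {"i": 3, "u": 2, "a": 1}          # i beats u, u beats a
LONG_OR_DIPHTHONG = {"aa", "ii", "uu", "ay", "aw"}

def quality(vowel):
    """'aa' -> 'a', 'u' -> 'u', None -> None."""
    return vowel[0] if vowel else None

def hamza_seat(position, before=None, after=None):
    """position is 'initial', 'middle' or 'final' (by letter, not sound)."""
    if position == "initial":                      # rule I
        if after == "aa":
            return "alif-madda"
        return "under alif" if quality(after) == "i" else "over alif"
    if position == "final":                        # rule II
        return SEAT[before] if before in SEAT else "line"
    if before in LONG_OR_DIPHTHONG:                # rule III.1
        following = quality(after)
        if following in ("i", "u"):
            return SEAT[following]
        # A preceding yaa forces the yaa seat; otherwise on the line.
        return "yaa" if before in ("ii", "ay") else "line"
    # Rule III.2: short vowels on either side compete.
    counted = [q for q in (quality(before), quality(after)) if q]
    if not counted:
        return "line"
    winner = max(counted, key=RANK.get)
    if winner == "a" and after == "aa":
        return "alif-madda"                        # two alifs merge
    return SEAT[winner]

print(hamza_seat("middle", before="u", after="aa"))   # waw
print(hamza_seat("middle", before="aa", after="a"))   # line
print(hamza_seat("final", before="a"))                # alif
```

Encoding the precedence as a numeric rank keeps the "two conflicting vowels" case to a single `max` call, and the special case of hamza on the line after a long vowel or diphthong falls out of rule III.1 without extra code.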
Arabic numerals
There are two kinds of numerals used in Arabic writing: standard Arabic numerals, and "East Arab" numerals, used in Iran, Pakistan and India. In Arabic, these numbers are referred to as "Indian numbers". In most of present-day North Africa, the usual Western numerals are used; in medieval times, a slightly different set (from which, via Italy, Western "Arabic numerals" derive) was used. Unlike Arabic alphabetic characters, Arabic numerals are written from left to right.
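As an illustration of how these digit sets are handled in software, the sketch below converts between Western digits and the two digit ranges Unicode provides for Arabic script: ARABIC-INDIC DIGIT (U+0660 to U+0669) and EXTENDED ARABIC-INDIC DIGIT (U+06F0 to U+06F9). Treating these two ranges as the article's two kinds of numerals is an assumption, and the function names are made up for this example.

```python
WESTERN = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + d) for d in range(10))
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + d) for d in range(10))

def to_arabic_indic(number):
    """Render an integer with Arabic-Indic digits, most significant digit first."""
    return str(number).translate(str.maketrans(WESTERN, ARABIC_INDIC))

def to_extended_arabic_indic(number):
    """Same, using the extended (Persian/Urdu style) digit shapes."""
    return str(number).translate(str.maketrans(WESTERN, EXTENDED_ARABIC_INDIC))

written = to_arabic_indic(1928)
print(written)
# Python's int() accepts any Unicode decimal digit, so parsing needs no table.
print(int(written) + 1)
```

The digits are stored most significant first, just as in Western text, which matches the remark that numerals run left to right even inside right-to-left writing.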
In addition, the Arabic alphabet can be used to represent numbers (Abjad numerals), a usage rare today. This usage is based on the Abjadi order of the alphabet: alif is 1, bāʼ is 2, jīm is 3, and so on until yāʼ = 10, kāf = 20, lām = 30, ..., rāʼ = 200, ..., ghayn = 1000. This is sometimes used to produce chronograms.
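The letter values follow mechanically from a letter's position in the Abjadi order: the first nine letters are the units, the next nine the tens, the next nine the hundreds, and the last letter is 1000. A minimal sketch, assuming the common eastern Abjadi ordering (the western ordering differs for a few letters) and using hypothetical names:

```python
# Eastern Abjadi order, grouped by its traditional mnemonic words.
ABJAD_ORDER = "ابجد" "هوز" "حطي" "كلمن" "سعفص" "قرشت" "ثخذ" "ضظغ"

# Position i gives digit (i % 9 + 1) in decade 10 ** (i // 9).
ABJAD_VALUE = {
    letter: (i % 9 + 1) * 10 ** (i // 9)
    for i, letter in enumerate(ABJAD_ORDER)
}

def abjad_value(word):
    """Sum the letter values of a word, as is done for chronograms.

    Characters outside the 28 basic letters (spaces, marks) count as 0.
    """
    return sum(ABJAD_VALUE.get(ch, 0) for ch in word)

print(len(ABJAD_ORDER))      # 28
print(abjad_value("ابجد"))   # 1 + 2 + 3 + 4
```

A chronogram is then just a phrase whose `abjad_value` equals the year being commemorated.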
History
The Arabic alphabet can be traced back to the Nabataean alphabet used to write the Nabataean dialect of Aramaic, itself descended from Phoenician. The first known text in the Arabic alphabet is a late fourth-century inscription from Jabal Ram (50 km east of Aqaba), but the first dated one is a trilingual inscription at Zebed in Syria from 512. However, the epigraphic record is extremely sparse, with only five certainly pre-Islamic Arabic inscriptions surviving, though some others may be pre-Islamic. Later, dots were added above and below the letters to differentiate them (the Aramaic model had fewer phonemes than the Arabic, and some originally distinct Aramaic letters had become indistinguishable in shape, so in the early writings 15 distinct letter-shapes had to do duty for 28 sounds). The first surviving document that definitely uses these dots is also the first surviving Arabic papyrus (PERF 558), dated April 643, although they did not become obligatory until much later. Important texts like the Qurʼan were frequently memorized; this practice, which survives even today, probably arose partially from a desire to avoid the great ambiguity of the script.
Later still, vowel signs and hamzas were added, beginning sometime in the second half of the seventh century, roughly contemporaneous with the first invention of Syriac and Hebrew vocalization. Initially, this was done by a system of red dots, said to have been commissioned by an Umayyad governor of Iraq, Hajjaj ibn Yusuf: a dot above = a, a dot below = i, a dot on the line = u, and doubled dots gave tanwin. However, this was cumbersome and easily confused with the letter-distinguishing dots, so about 100 years later, the modern system was adopted. The system was finalized around 786 by al-Farahidi.
Arabic alphabets of other languages
Arabic script is not used solely for writing Arabic, but for a variety of languages. In each language, it has been modified to fit the language's sound system. Languages such as Persian, Kurdish, Malay and Urdu have sounds that are not found in Arabic and so have no letter in the Arabic alphabet. For example, the Arabic language lacks a letter for the sound p, so many languages add their own letter for p to the script, though the symbol used may differ between languages. These modifications tend to fall into groups: all the Indian and Turkic languages written in Arabic script tend to use the Persian modified letters (these are the languages that are geographically closer to Persia), whereas West African languages tend to imitate those of Ajami, and Indonesian ones those of Jawi. A writing system in which the Persian modified letters are used is called Perso-Arabic script by scholars.
Generally, in countries where national education is effective and where the national language is written in Arabic script, Arabic script is also used to write the other languages used in that country.
Current uses of the alphabet for other languages
The Arabic alphabet is currently used for:
- Kurdish and Turkmen in Northern Iraq. (In Turkey, the Latin alphabet is now used for Kurdish);
- Official language Persian and regional languages including Azeri, Sorani-Kurdish and Baluchi in Iran;
- Official languages Dari and Pashto and regional languages including Uzbek in Afghanistan;
- Official language Urdu and regional languages including Punjabi (where the script is known as Shahmukhi), Sindhi, Kashmiri, and Baluchi in Pakistan;
- Urdu and Kashmiri in India (see List of national languages of India);
- Uyghur (changed to Roman script in 1969 and back to a simplified, fully voweled, Arabic script in 1983), Kazakh, and Kyrgyz (by a minority of Kyrgyz) in the Xinjiang Uyghur Autonomous Region in northwest China;
- Malay in the Arabic script known as Jawi is co-official in Brunei, and used for religious purposes in Malaysia, Indonesia, and Singapore;
- Comorian in the Comoros, currently side by side with the Latin alphabet (neither is official);
- Hausa for many purposes, especially religious (known as Ajami);
- Mandinka, widely but unofficially (another alphabet used is N'Ko);
- Wolof (at zaouias), known as Wolofal.
- Tamazight and other Berber languages were traditionally written in Arabic in the Maghreb. There is now a competing 'revival' of neo-Tifinagh.
Former uses of the alphabet for other languages
In the past, it has also been used to represent other languages. Most education was once religious instead of governmental and uniform within a state, so choice of script was determined by the user's religion and Muslims would use Arabic script to write any language they used. See also Languages of Muslim countries.
- Afrikaans (as it was first written among the "Cape Malays");
- Albanian;
- Azeri in Azerbaijan (now written in the Latin and Cyrillic alphabets in Azerbaijan);
- Belarusian (among ethnic Tatars);
- Berber in North Africa, particularly Tachelhit in Morocco (still being considered, along with Tifinagh and Latin for Tamazight);
- Bashkir (for some years: from the October Revolution (1917) until 1928);
- Bosnian (only for literary purposes; presently written in the Latin and Cyrillic alphabets);
- Chaghatai across Central Asia;
- Chechen (for some years: from the October Revolution (1917) until 1928);
- Chinese and Dungan, among the Chinese Hui Muslims[http://www.aa.tufs.ac.jp/~kmach/xiaoerjin/xiaoerjin-e.htm];
- Fulani, where the script is known as Ajami script;
- Kazakh in Kazakhstan;
- Kyrgyz in Kyrgyzstan;
- Malay in Malaysia and Indonesia;
- Mozarabic, when the Moors ruled Spain (and later Aragonese, Portuguese, and Spanish proper; see aljamiado);
- Nubian;
- Polish (among ethnic Tatars);
- Sanskrit has also been written in Arabic script, though it is better known for using Devanagari, the script currently used for writing Hindi;
- Swahili;
- Somali (has used the Latin alphabet since 1972);
- Songhay in West Africa, particularly in Timbuktu;
- Tatar (iske imlâ) before 1928 (changed to Latin), reformed in the 1880s and in 1918 (deletion of some letters);
- Turkish in the Ottoman Empire was written in Arabic script until Mustafa Kemal Atatürk declared the change to Roman script in 1928. This form of Turkish is now known as Ottoman Turkish and is held by many to be a different language, due to its much higher percentage of Persian and Arabic loanwords;
- Turkmen in Turkmenistan;
- Uzbek in Uzbekistan;
- All the Muslim peoples of the USSR between 1918 and 1928 (many also earlier), including Bashkir, Chechen, Kazakh, Tajik etc. After 1928 their script became Latin, then later Cyrillic.
Computers and the Arabic alphabet
The Arabic alphabet can be encoded using several character sets, including ISO-8859-6 and Unicode, in the latter thanks to the "Arabic segment", entries U+0600 to U+06FF. However, neither of these sets indicates the form each character should take in context. It is left to the rendering engine to select the proper glyph to display for each character.
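A short sketch of what this means in practice, using Python's standard codecs. The sample string and the helper name `in_arabic_block` are illustrative only.

```python
def in_arabic_block(ch):
    """True if the character lies in the Unicode Arabic block U+0600..U+06FF."""
    return 0x0600 <= ord(ch) <= 0x06FF

# Four letters: ain, reh, beh, yeh ("Arabic"), stored as abstract characters.
text = "\u0639\u0631\u0628\u064A"

legacy = text.encode("iso8859_6")   # one byte per letter
utf8 = text.encode("utf-8")         # two bytes per letter in this block

print(all(in_arabic_block(ch) for ch in text))  # True
print(len(legacy), len(utf8))                   # 4 8
print(legacy.decode("iso8859_6") == text)       # True
```

In both encodings each letter is stored once, with no indication of whether it should appear in its initial, medial, final or isolated shape; that choice is made when the text is rendered.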
When one wants to encode a particular written form of a character, there are extra code points provided in Unicode which can be used to express the exact written form desired. The Arabic presentation forms A (U+FB50 to U+FDFF) and Arabic presentation forms B (U+FE70 to U+FEFF) contain most of the characters with contextual variation as well as the extended characters appropriate for other languages. These effects are better achieved in Unicode by using the zero width joiner and non-joiner, as these presentation forms are deprecated in Unicode, and should generally only be used within the internals of text-rendering software, when using Unicode as an intermediate form for conversion between character encodings, or for backwards compatibility with implementations that rely on the hard-coding of glyph forms.
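The relationship between a presentation form and its underlying letter can be inspected with Python's `unicodedata` module. The sketch below uses the four presentation forms of the letter beh (U+FE8F to U+FE92) as an example; the dictionary and function names are made up for illustration.

```python
import unicodedata

BEH = "\u0628"
# Presentation Forms-B code points for the letter beh.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def to_base_letters(text):
    """Fold presentation forms back to ordinary letters (compatibility normalisation)."""
    return unicodedata.normalize("NFKC", text)

for shape, form in BEH_FORMS.items():
    # Each form carries a compatibility decomposition naming its shape.
    print(shape, unicodedata.decomposition(form), to_base_letters(form) == BEH)
```

Normalising legacy data this way before storing or comparing it leaves glyph selection to the rendering engine, in line with the recommendation above.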
Finally, the Unicode encoding of Arabic is in logical order, that is, the characters are entered, and stored in computer memory, in the order that they are written and pronounced without worrying about the direction in which they will be displayed on paper or on the screen. Again, it is left to the rendering engine to present the characters in the correct direction, using Unicode's bi-directional text features. In this regard, if the Arabic words on this page are written left to right, it is an indication that the Unicode rendering engine used to display them is out-of-date. For more information about encoding Arabic, consult the Unicode manual available at http://www.unicode.org/
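The direction of each character is not stored in the text itself; it comes from a property that the bidirectional algorithm consults at display time. A minimal sketch using Python's `unicodedata.bidirectional`, with an illustrative helper name:

```python
import unicodedata

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character, in logical order."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Latin letter, Arabic letter beh, Arabic-Indic digit one, Western digit one.
sample = "a\u0628\u06611"
print(bidi_classes(sample))  # ['L', 'AL', 'AN', 'EN']
```

Here "L" marks left-to-right letters, "AL" Arabic letters, and "AN" and "EN" Arabic and European numbers. The stored order of the string never changes; a rendering engine uses these classes to decide which runs to display right to left.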
- [http://www.nclrc.org/readings/inst-arabic3.pdf Multilingual Computing in Arabic with Windows, major word processors, web browsers, Arabic keyboards, and Arabic transliteration fonts]
Arabic keyboard layout
bi-directional text
See also
- Arabic calligraphy - considered an art form in its own right
- Arabic numerals
- Arabic transliteration
- Arabic Chat Alphabet
- ArabTeX - provides Arabic support for TeX and LaTeX
- Jawi - an adapted Arabic alphabet for the Malay language
External links
- [http://www.lexilogos.com/clavier/araby.htm online Arabic Keyboard]
- [http://www.nicoweb.com/sirpus/learn%20arabic%20course%20mp3.htm Arabic Writing and Reading never been Easier with MP3]
- [http://www.al-bab.com/arab/visual/calligraphy.htm Arab writing and calligraphy]
- [http://www.omniglot.com/writing/arabic.htm Article about Arabic alphabet]
- [http://www.islamicart.com/main/calligraphy/ Arabic alphabet and calligraphy]
- [http://members.aol.com/OlivThill/ aralpha (freeware) to learn the characters]
- [http://www.uga.edu/islam/arabic_windows.html Guide to the use of Arabic in Windows, major word processors and web browsers]
- [http://www.declan-software.com/arabic/ Arabic Alphabet teaching software]
- [http://www.theiling.de/schrift/#arabic Learn the Arabic Script Online]
----------------
This article contains major sections of text from the very detailed article Arabic alphabet/from the French Wikipedia, which has been partially translated into English. Further translation of that page, and its incorporation into the text here, are welcomed.
Category:Abjad writing systems
Phoenician languages
Phoenician was a language originally spoken in the coastal region then called Pūt in Phoenician, Canaan in Phoenician, Hebrew and Aramaic, and Phoenicia in Greek and Latin. This area includes modern-day Lebanon, coastal Syria and northern Israel. Phoenician is a Semitic language of the Canaanite subgroup, closely related to Hebrew and Aramaic. Its speakers called their own language (dabarīm) Pōnnīm/Kana'nīm "Phoenician/Canaanite (speech)".
Phoenician is known only from inscriptions such as Ahiram's coffin, Kilamuwa's tomb, Yehawmilk's in Byblos, and occasional glosses in books written in other languages; Roman authors such as Sallust allude to some books written in Punic, but none have survived except occasionally in translation (e.g. Mago's treatise) or in snippets (e.g. in Plautus' plays).
Punic and its influences
The significantly divergent later-form of the language that was spoken in the Tyrian Phoenician colony of Carthage is known as Punic; it remained in use there for considerably longer than Phoenician did | | |