Language Supplement, UK English
Nuance Recognizer 10.0.0
October 25, 2010
This guide is for the application developer who wishes to create speech recognition applications. It supplements the Nuance product documentation and details the grammars and pronunciation guidelines for UK English. The Language Pack Guide covers general details about grammars.
For more information on language-related application development topics, see the documentation included with your set of licensed Nuance speech products.
Be sure to review the release notes distributed with this product for the latest information, as well as restrictions and known problems.
This document uses the font Arial Unicode MS to display IPA characters and any other special characters (in the Vocabulary items and pronunciations section). Arial Unicode MS is normally distributed with Microsoft Office. If these characters don't appear when you view this document, choose View > Encoding > Unicode (UTF-8) in your browser.
Grammar documents
Built-in grammars for UK English (en-GB):
- alphanum_lc built-in grammar
- alphanum built-in grammar
- boolean built-in grammar
- ccexpdate built-in grammar
- creditcard built-in grammar
- currency built-in grammar
- date built-in grammar
- digits built-in grammar
- national insurance built-in grammar
- number built-in grammar
- phone built-in grammar
- postcode built-in grammar
- time built-in grammar
Creating grammars
The following subsections describe key issues for working with grammar documents in the UK English language. For detailed information on creating grammars, see your product documentation.
Character encoding
UK English grammars must specify Latin-1 character encoding, also known as ISO-8859-1. (Incorrect encoding will result in grammar compilation errors.) The first line of each grammar should be:
<?xml version="1.0" encoding="ISO-8859-1" ?>
Testing grammars with the ParseTool program
The ParseTool program lets you type sentences into a grammar and returns the grammar's results. The tool is stored in the bin directory of the installation baseline. See your product documentation for information.
When entering UK English text for the built-in grammars, note the following:
- Digits in the alphanum grammar are written as digits (0-9).
- In all other built-in grammars, numbers must be spelled out; Arabic numerals are not allowed. For example, the number grammar does not parse "25" but it does parse "twenty five".
- Numbers between 0 and 99 are written as strings of individual words separated by spaces; there are no hyphens. For example, "seventy five" is correct, but not "seventy-five" or "seventyfive."
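The rules above can be pre-checked before text is typed into ParseTool. The following is a minimal sketch only; the function name `number_text_problems` is hypothetical and is not part of ParseTool or any Nuance tool. It flags digits and hyphens, and cannot detect run-together words such as "seventyfive" because that would need the grammar's vocabulary.

```python
def number_text_problems(text: str) -> list:
    """Return a list of rule violations for text meant for a built-in
    grammar other than alphanum (hypothetical helper, not a Nuance API)."""
    problems = []
    # Rule: Arabic numerals are not allowed; numbers must be spelled out.
    if any(ch.isdigit() for ch in text):
        problems.append("contains digits")
    # Rule: words are separated by spaces; there are no hyphens.
    if "-" in text:
        problems.append("contains a hyphen")
    return problems


if __name__ == "__main__":
    for sample in ("twenty five", "25", "seventy-five"):
        print(repr(sample), number_text_problems(sample))
```

An empty list means the text passes these two checks; it does not guarantee that the grammar parses the phrase.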
Valid characters in grammars
To determine which characters can be used with this language pack, read the sections "Valid characters in grammars" and "Checking pronunciations with dicttest" in the Grammar Developer guide (accessible through the Product Documentation program shortcut).
alphanum_lc built-in grammar
The alphanum_lc built-in grammar recognizes a connected string of up to 20 digits and lowercase alphabetic characters, such as "a8f9h23". For example, this grammar could be used to recognize a product code or user id. The “lc” in the name of this built-in means lowercase. The possible characters are the lowercase letters a-z and the digits 0-9. The application layer can adjust the case of the returned letters as needed for further processing.
Note: This grammar replaces the alphanum built-in grammar.
alphanum built-in grammar
Note: The alphanum built-in grammar has been replaced by the alphanum_lc built-in grammar and is retained for backward compatibility only. For new implementations, use the alphanum_lc built-in grammar.
The alphanum built-in grammar recognizes a connected string of up to 20 digits and uppercase or lowercase alphabetic characters, such as “A8f9h23”. For example, this grammar could be used to recognize a product code or order number. The possible characters are the uppercase letters A-Z, lowercase letters a-z, and digits 0-9. Uppercase and lowercase letters are homonyms (e.g., “B” and “b”), so the inclusion of both is redundant for the purposes of speech recognition of case insensitive items such as product codes. Thus, the alphanum built-in grammar has been replaced by the alphanum_lc grammar.
boolean built-in grammar
The boolean grammar collects an affirmative or negative response.
Properties
The y and n parameters let you associate any two touchtone buttons as synonyms for yes and no.
Parameter | Description
y | Desired DTMF digit to be equivalent to "yes" (default = 1)
n | Desired DTMF digit to be equivalent to "no" (default = 2)
Examples
Caller says... | MEANING key
yes, correct | true
no | false
ccexpdate built-in grammar
The ccexpdate grammar understands the expiration date on a credit card. Expiration dates are usually a month and a year, and are often embossed on a credit card in the form "mm/yy." The grammar recognizes variations on the date, for example, "December 2007," "twelve oh seven," "twelve of two thousand and seven," "twelve slash zero seven," etc.
creditcard built-in grammar
The creditcard grammar understands a caller saying a credit card number, optionally preceding the number with the credit card name, or the words "account number" or "account." For example, a caller can say, "visa account number four oh one seven...," "mastercard five zero zero two...," or "three seven three five...."
currency built-in grammar
The currency grammar collects currency amounts in pounds and pence (or "p" or "penny"). The grammar also accepts euros and cents.
Return keys/values
MEANING
Contains a string in the following form:
<currency><main_unit_amount>.<subunit_amount>
If the caller explicitly says "euros" or "cents", then a currency value of EUR is added as a prefix.
If the caller explicitly says "pounds," "pence," "penny," or "p" then a currency value of GBP is added as a prefix.
If the caller does not explicitly indicate the currency type, then no prefix is added.
If the caller omits the main unit or subunit amount, then that field is zero. The string contains a leading zero if the subunit amount is collected without the main unit.
SWI_literal
Contains the exact text that was recognized.
Examples
Caller says | MEANING
five pounds | GBP5.00
five euros | EUR5.00
five pence; five p | GBP0.05
five cents | EUR0.05
five pounds and five pence | GBP5.05
five euros and five cents | EUR5.05
five pounds and twenty-five; five pounds twenty-five | GBP5.25
five twenty-five | 5.25
six hundred twenty-five thousand four hundred sixty-four pounds | GBP625464.00
one pound zero pence | GBP1.00
one twenty two | 1.22
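An application usually needs to split the MEANING string into its parts. The sketch below does this under two assumptions drawn only from the examples above: the optional prefix is GBP or EUR, and the subunit amount always has two digits. The names `Amount` and `parse_currency_meaning` are hypothetical and are not part of the Nuance API.

```python
import re
from typing import NamedTuple, Optional


class Amount(NamedTuple):
    currency: Optional[str]  # "GBP", "EUR", or None if the caller named no currency
    main_units: int          # pounds or euros
    subunits: int            # pence or cents


# Assumed shape: optional GBP/EUR prefix, main amount, ".", two-digit subunit.
_MEANING = re.compile(r"(GBP|EUR)?([0-9]+)\.([0-9]{2})")


def parse_currency_meaning(meaning: str) -> Amount:
    """Split a currency MEANING string such as 'GBP5.05' into its fields."""
    match = _MEANING.fullmatch(meaning)
    if match is None:
        raise ValueError("unexpected currency MEANING: %r" % meaning)
    return Amount(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    for value in ("GBP5.05", "EUR0.05", "5.25"):
        print(value, parse_currency_meaning(value))
```

Keeping the two amounts as integers avoids the rounding problems that come with storing money as floating-point values.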
date built-in grammar
The date grammar accepts a date spoken in any of several formats.
Recognized phrases include "4 June," "4 June 2006," "4, 6, 2006," "the 4th," "4th June," and "Monday, the 4th of June."
The grammar also accepts "yesterday," "today," and "tomorrow," which return values of -1, 0, and +1 respectively in the MEANING key.
Examples
Caller says | MEANING key
the 5th of January, 2000 | 20000105
Yesterday | -1
Today | 0
Tomorrow | +1
the fourth | ??????04
Wednesday | (Phrase not recognized)
Wednesday the 12th | ??????12
June 4; June 4th | ????0604
June 4, 1997 | 19970604
June 4, 97 | ??970604
Wednesday, June 4, 1997 | 19970604
the 6th | ??????06
4, 6 | ????0604
10, 12 | ????1210
10, 12, 97 | ??971210
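The examples above suggest that the MEANING value is either a relative day offset (-1, 0, +1) or an eight-character yyyymmdd string in which "?" marks a digit the caller did not supply. The following sketch decodes the value under that assumption; the function name `parse_date_meaning` and the dictionary keys are hypothetical, not part of the Nuance API.

```python
import re


def _field(text):
    """Return the field as an int, or None if any digit is unknown ('?')."""
    return int(text) if text.isdigit() else None


def parse_date_meaning(meaning: str) -> dict:
    """Decode a date MEANING value (assumed format: offset or yyyymmdd with '?')."""
    if meaning in ("-1", "0", "+1"):
        # "yesterday", "today", "tomorrow"
        return {"offset_days": int(meaning)}
    if re.fullmatch(r"[0-9?]{8}", meaning) is None:
        raise ValueError("unexpected date MEANING: %r" % meaning)
    year_text = meaning[:4]
    return {
        "year": _field(year_text),
        # A value such as "??97" carries only the last two digits of the year.
        "year_2digit": _field(year_text[2:]) if year_text.startswith("??") else None,
        "month": _field(meaning[4:6]),
        "day": _field(meaning[6:8]),
    }


if __name__ == "__main__":
    for value in ("20000105", "??970604", "??????04", "+1"):
        print(value, parse_date_meaning(value))
```

How the application fills in the unknown fields (for example, assuming the current year) is a dialog design decision and is left out of the sketch.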
digits built-in grammar
Valid characters are the digits 0-9. The digit "0" can be pronounced as either "oh" or "zero."
national insurance built-in grammar
The national_insurance grammar understands NI numbers. The valid format is aannnnnnb (where a is alphabetic, n is a digit, and the optional b is the alphabetic A, B, C, or D). Illegal numbers are rejected.
The advantage of using this grammar rather than an alphanum grammar is that NI numbers have constraints that reduce the set of possible recognition hypotheses (and thus increase recognition accuracy).
Return keys/values
Upon return, the key MEANING is assigned to the recognized number.
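The aannnnnnb format can be expressed as a regular expression, for example to sanity-check test data. This sketch checks the shape only: two letters, six digits, and an optional suffix A, B, C, or D. The grammar additionally rejects illegal numbers, and those rules are not described in this document, so they are not modeled here. The function name `has_ni_shape` is hypothetical, and accepting either letter case is an assumption.

```python
import re

# Shape only: aa + nnnnnn + optional suffix A-D (case-insensitive by assumption).
_NI_SHAPE = re.compile(r"[A-Za-z]{2}[0-9]{6}[A-Da-d]?")


def has_ni_shape(text: str) -> bool:
    """Return True if text has the aannnnnnb shape of an NI number."""
    return _NI_SHAPE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ("ab123456c", "ab123456", "ab123456e", "a1234567"):
        print(sample, has_ni_shape(sample))
```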
number built-in grammar
The number grammar recognizes numbers spoken as whole quantities, such as "twenty five" (the caller must not speak the individual digits).
Examples
Numbers from -999,999,999.99 to 999,999,999.99 are recognized, but by default the minallowed parameter is set to zero, which limits recognition to non-negative values.
Caller says | MEANING key
twenty five | 25
twelve thousand three hundred forty five | 12345
twelve hundred | 1200
minus two | -2
negative two | (Phrase not recognized; the word "negative" is not allowed)
fourteen point five six | 14.56
fourteen dot fifty six | (Phrase not recognized; the words "dot" and "fifty six" are not allowed)
phone built-in grammar
The phone built-in grammar accepts telephone numbers (landline and cellular) using the following conventions:
Optional leading 1 before a number.
- 6 digits [local]: NNNNNN
- 8 digits [local]: NNNN NNNN
- 10 digits [cellular]: 07 NNN NNNNN
- 11 digits [national]: 01 N NNNN NNNN or 02 N NNNN NNNN
- 11 digits [national]: 01 NNN NNNNNN or 02 NNN NNNNNN
- 10 digits [special rate]: 08 NNN NNNNN
- 10 digits [premium rate]: 09 NNN NNNNN
Numbers reserved for emergency services: 999 (emergency) and 100 (operator)
- 150 residential customer service
- 152 business customer service
- 153 international directory assistance
- 155 international operator assistance
- 192 directory assistance
- 195 blind & disabled assistance (no charge)
The grammar does not allow natural-number phrases in the telephone number itself, such as "three two four five double two." Callers can, however, generally speak telephone extension numbers as natural numbers (for example, saying "extension fifty two" instead of "extension five two").
Return keys/values
Upon return, the MEANING key is assigned to a variable length character result representing the recognized phone number.
Parameter properties
Additionally, as stipulated in the VoiceXML specification, the caller may specify an extension, for example, "five four two three five six seven extension two thousand." By default, extensions one to four digits long are supported.
Property | Description
minextension | Minimum numeric value allowed for an extension (default is 1).
maxextension | Maximum numeric value allowed for an extension. Set this to 0 to disallow extensions. (Default is 9999.)
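The interaction of the two properties can be summarized in a few lines. This is only an illustration of the rules stated above, with the documented defaults; the function `extension_allowed` is hypothetical and does not exist in the product.

```python
def extension_allowed(extension: int, minextension: int = 1,
                      maxextension: int = 9999) -> bool:
    """Return True if an extension value falls inside the configured range.

    Mirrors the property descriptions above: maxextension = 0 disallows
    extensions altogether; otherwise the value must lie in
    [minextension, maxextension].
    """
    if maxextension == 0:
        return False
    return minextension <= extension <= maxextension


if __name__ == "__main__":
    print(extension_allowed(2000))                   # within the default range
    print(extension_allowed(2000, maxextension=0))   # extensions disallowed
```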
postcode built-in grammar
The postcode grammar recognizes valid alphanumeric postcodes in the UK. The following table shows valid formats ("A" indicates an alphabetic character and "N" indicates a digit):
Format | Example
AN NAA | B1 3AW
ANN NAA | M15 4PT
AAN NAA | CH4 9GB
AANN NAA | SW19 3PZ
ANA NAA | W1A 4RR
AANA NAA | SW1A 1AA
All the formats consist of two parts, separated by a space. The first part of the postcode only accepts values found in the Royal Mail database.
Return keys/values
Upon return, the key MEANING is assigned to the recognized postal code. The string is alphanumeric, all lowercase, and contains no spaces. For example, "SW1A 1AA" is returned as "sw1a1aa".
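The six formats and the returned MEANING string can be sketched as follows. The regular expression covers only the shapes in the table above; it does not check the first part against the Royal Mail database as the grammar does. The function names `has_postcode_shape` and `postcode_to_meaning` are hypothetical.

```python
import re

# First part: A or AA, one digit, then an optional digit or letter
# (AN, ANN, AAN, AANN, ANA, AANA). Second part: NAA.
_POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}")


def has_postcode_shape(postcode: str) -> bool:
    """Return True if the text matches one of the six formats in the table."""
    return _POSTCODE_SHAPE.fullmatch(postcode) is not None


def postcode_to_meaning(postcode: str) -> str:
    """Return the MEANING form: lowercase, alphanumeric, no spaces."""
    if not has_postcode_shape(postcode):
        raise ValueError("not a recognized postcode format: %r" % postcode)
    return postcode.replace(" ", "").lower()


if __name__ == "__main__":
    for sample in ("B1 3AW", "M15 4PT", "SW1A 1AA"):
        print(sample, "->", postcode_to_meaning(sample))
```

A helper like this is mainly useful for preparing expected results in test scripts, where written postcodes must be compared with the values the grammar returns.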
time built-in grammar
The time grammar recognizes a time of day.
Examples
For each entry, the values returned in the MEANING and QUALIFIER keys are shown. (Not shown are the values of the HOUR, MINUTE, and AMPM keys.)
Caller says | MEANING | QUALIFIER
now, immediately... | (Phrase not recognized) | --
in a half hour | (Phrase not recognized) | --
at noon | 1200p | exact
at midnight | 1200a | exact
before noon | 1200p | before
after thirteen thirty | 1330h | after
twenty twenty | 2020h | exact
eight twenty in the morning | 0820a | exact
half past eight | 0830? | exact
half eight | (Phrase not recognized) | --
seven fifteen pm; quarter past seven in the evening | 0715p | exact
twenty four hundred hours; twenty four hundred | 0000h | exact
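Judging from the examples above, the MEANING value is a four-digit hhmm string followed by a one-character suffix: "a" for a.m., "p" for p.m., "h" for 24-hour time, and "?" when the caller did not say which half of the day was meant. The sketch below turns such a value into the possible 24-hour times under that reading; the function name `candidate_times` is hypothetical and not part of the Nuance API.

```python
import re

_TIME = re.compile(r"([0-9]{2})([0-9]{2})([aph?])")


def candidate_times(meaning: str) -> list:
    """Return the possible (hour, minute) pairs on a 24-hour clock.

    Suffix reading (an assumption based on the examples): a = a.m.,
    p = p.m., h = 24-hour time, ? = a.m. or p.m. not specified.
    """
    match = _TIME.fullmatch(meaning)
    if match is None:
        raise ValueError("unexpected time MEANING: %r" % meaning)
    hour, minute, suffix = int(match.group(1)), int(match.group(2)), match.group(3)
    if suffix == "h":
        return [(hour, minute)]
    am = (hour % 12, minute)        # 1200a is midnight, hour 0
    pm = (hour % 12 + 12, minute)   # 1200p is noon, hour 12
    if suffix == "a":
        return [am]
    if suffix == "p":
        return [pm]
    return [am, pm]                 # ambiguous: the application must resolve it


if __name__ == "__main__":
    for value in ("1200a", "1200p", "0715p", "0830?", "2020h"):
        print(value, candidate_times(value))
```

Returning a list keeps the ambiguous case explicit, so the dialog can ask a follow-up question such as "morning or evening?" when two candidates come back.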
Vocabulary items and pronunciations
This chapter describes considerations for vocabularies and their pronunciations in UK English (en-GB).
Specially tuned pronunciations
The following table shows common words that are fine-tuned by Nuance. Each of these words contains "word-specific phonemes;" that is, phonemes and associated models created especially for the words.
Words with tuned pronunciations (do not modify):
- All letters of the alphabet, a-z
- yes, no
- Monetary units: pound, pence
- Cardinal numbers: 0-99, 100, and 1000
- Ordinal numbers: 1st through 31st
UK English pronunciations
This section provides detailed reference information to help create pronunciation dictionaries. It is intended for people who have sufficient knowledge of the English language as spoken in the United Kingdom. It provides information about transcription and pronunciation.
The reference pronunciation dictionary is:
Wells, John C.: Longman Pronunciation Dictionary. Burnt Mill: Longman 1990. (ISBN 0-582-96411-3)
This dictionary gives both the UK English and the American English pronunciations.
If you are not sure how a certain word is pronounced, you can refer to the IPA transcriptions given there and convert them into the SAMPA symbols listed in "The UK English symbol set in alphabetical order."
The UK English phoneme system
The UK English phoneme system can be divided into two groups:
- Consonants
- Vowels
Furthermore, it is possible to define six different types of consonants:
- Plosives
- Fricatives
- Affricates
- Nasals
- Laterals
- Semivowels
Within the vowel group, further distinctions can be made between front, central and back vowels and diphthongs.
UK English spelling does have a certain complexity, since the orthography of most of its constituent words does not necessarily reflect their pronunciation. This lack of rigid structure means that the relationship between spelling (grapheme) and sound (phoneme) is difficult to define. Generally speaking, the phonetic transcription of a word is influenced by:
- Specific phonetic rules
- Pronunciation peculiarities that have developed over time
UK English symbol set grouped by phoneme classes
The following table shows all phonemes used in UK English transcriptions. These are listed according to their phoneme classes with their SAMPA and IPA representations.
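Phoneme class | SAMPA | IPA | Examples of usage
Consonants
Plosives | b | b | bin /bIn/
Plosives | p | p | pin /pIn/
Plosives | g | g | give /gIv/
Plosives | k | k | skin /skIn/
Plosives | d | d | dummy /dVmi:/
Plosives | t | t | tin /tIn/
Fricatives | v | v | saving /seYvIN/
Fricatives | f | f | coffee /kQfi:/
Fricatives | D | ð | this /DIs/
Fricatives | T | θ | thin /TIn/
Fricatives | z | z | crazy /kreYzi:/
Fricatives | s | s | sin /sIn/
Fricatives | S | ʃ | ship /SIp/
Fricatives | Z | ʒ | vision /vIZ@n/
Fricatives | h | h | hit /hIt/
Affricates | tS | ʧ | chat /tS{t/
Affricates | dZ | ʤ | ginger /dZIndZ@/
Nasals | m | m | mock /mQk/
Nasals | n | n | knock /nQk/
Nasals | N | ŋ | thing /TIN/
Laterals | l | l | long /lQN/
Semivowels | r | r | run /rVn/
Semivowels | j | j | yes /jes/
Semivowels | w | w | wet /wet/
Vowels
Single vowels | I | ɪ | pit /pIt/
Single vowels | i: | i: | ease /i:z/
Single vowels | e | e | pet /pet/
Single vowels | u: | u: | lose /lu:z/
Single vowels | @ | ə | away /@weY/
Single vowels | { | æ | bad /b{d/
Single vowels | A: | ɑ: | stars /stA:z/
Single vowels | Q | ɒ | pot /pQt/
Single vowels | O: | ɔ: | north /nO:T/
Single vowels | V | ʌ | cut /kVt/
Single vowels | 3: | ɜ: | furs /f3:z/
Single vowels | U | ʊ | put /pUt/
Diphthongs | eY | eɪ | raise /reYz/
Diphthongs | aY | aɪ | rise /raYz/
Diphthongs | QY | ɔɪ | noise /nQYz/
Diphthongs | @W | əʊ | nose /n@Wz/
Diphthongs | aW | au̬ / aʊ̬ | rouse /raWz/
Diphthongs | eR | eə | stairs /steRz/
Diphthongs | IR | ɪə | appear /@pIR/
Diphthongs | UR | ʊə | tourist /tURrIst/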
UK English consonants
English consonants typically consist of
- six plosives
- nine fricatives
- two affricates
- three nasals
- one lateral
- three semivowels
Plosives
There are three voiced and three voiceless plosives in UK English, which can be arranged in pairs as shown here:
Voiced | Voiceless
/b/: bit /bIt/, rabid /r{bId/, cab /k{b/ | /p/: pit /pIt/, rapid /r{pId/, cap /k{p/
/g/: gold /g@Wld/, degree /dIgri:/, bag /b{g/ | /k/: cold /k@Wld/, decree /dIkri:/, back /b{k/
/d/: down /daWn/, medal /med@l/, sad /s{d/ | /t/: town /taWn/, metal /met@l/, sat /s{t/
Fricatives
There are nine fricatives in the UK English SAMPA symbol set, five voiceless and four voiced:
Voiced | Voiceless
/v/: vine /vaYn/, even /i:v@n/, prove /pru:v/ | /f/: fine /faYn/, rougher /rVf@/, proof /pru:f/
/D/: this /DIs/, worthy /w3:Di:/, with /wID/ | /T/: thin /TIn/, earthy /3:Ti:/, oath /@WT/
/z/: zone /z@Wn/, razor /reYz@/, plays /pleYz/ | /s/: sign /saYn/, racer /reYs@/, place /pleYs/
/Z/: gendarme /ZQndA:m/, vision /vIZ@n/ | /S/: shine /SaYn/, mission /mIS@n/, dish /dIS/
-- | /h/: house /haWs/, behind /bIhaYnd/
In UK English the voiceless fricative /h/ does not appear in the final position.
Affricates
In UK English there are two affricates: /dZ/ and /tS/.
Note that in SAMPA affricates are always represented by two single phonemes.
Voiced | Voiceless
/dZ/: gin /dZIn/, ridges /rIdZIz/, large /lA:dZ/ | /tS/: chin /tSIn/, riches /rItS@z/, much /mVtS/
Nasals
There are three nasals in UK English, /m/, /n/, and /N/. The velar nasal /N/ (back of the tongue touches the soft palate) never appears in the initial position.
- /m/: man /m{n/, hammer /h{m@/, ham /h{m/
- /n/: net /net/, enter /ent@/, run /rVn/
- /N/: sing /sIN/, finger /fINg@/
Pronunciation note: The grapheme <n> before <c>, <g>, <k>, <q>, and <x> is pronounced as /N/.
Syllabic /m/ and /n/ are represented as /@m/ and /@n/ respectively, for example: garden /gA:d@n/.
Laterals
There is one lateral in UK English: /l/.
- /l/: long /lQN/, falling /fO:lIN/, roll /r@Wl/
Syllabic /l/ is represented as /@l/, for example: level /lev@l/.
Semivowels
A semivowel is articulated by allowing air to escape over the center of the tongue through a stricture (in the case of /w/ two strictures) that is not so narrow as to cause audible friction. Semivowels are articulated like vowels, but function as consonants since they are not syllabic. They can also be referred to as approximants.
There are three semivowels in UK English, /r/, /j/, and /w/, shown below:
- /r/: rich /rItS/, blurring /bl3:rIN/
- /j/: young /jVN/, view /vju:/
- /w/: win /wIn/, away /@weY/
In UK English final <r> is usually not pronounced, unless it appears in combined words as a linking-r, for example: faraway /fA:r@weY/.
UK English vowels
Front, central and back vowels
UK English single vowels (monophthongs) can be divided into three groups according to their place of articulation: front, central or back. Within each group vowels differ in their degree of mouth opening. Length is of minor importance in the UK English vowel system, and the length of a particular vowel in a given word may change considerably in connected speech. Thus the colon, which appears in some phonetic symbols to denote length, is used in the transcription of UK English to denote a different vowel quality rather than quantity (length).
The three vowel groups are shown in the following table, ranging in each group from closed (top) to open (bottom) mouth:
Front
- /i:/: ease /i:z/, believe /bIli:v/, free /fri:/
- /I/: itch /ItS/, pit /pIt/
- /e/: pet /pet/, ever /ev@/
- /{/: apt /{pt/, sad /s{d/
Central
- /@/: about /@baWt/, success /s@kses/, liver /lIv@/
- /3:/: urban /3:b@n/, nurse /n3:s/, fur /f3:/
- /V/: utter /Vt@/, cut /kVt/
Back
- /u:/: ooze /u:z/, goose /gu:s/, two /tu:/
- /U/: umlaut /UmlaWt/, put /pUt/
- /O:/: awe /O:/, north /nO:T/, cause /kO:z/
- /A:/: start /stA:t/, father /fA:D@/, bar /bA:/
- /Q/: optimistic /QptImIstIk/, pot /pQt/
Pronunciation note: The short-o sound is regularly transcribed as /Q/, as in moral /mQr@l/.
Diphthongs
There are eight diphthongs in the UK English phoneme inventory:
- /eY/: aim /eYm/, face /feYs/, hay /heY/
- /aY/: ice /aYs/, price /praYs/, high /haY/
- /QY/: oyster /QYst@/, toys /tQYz/, boy /bQY/
- /@W/: omen /@Wm@n/, home /h@Wm/, blow /bl@W/
- /aW/: our /aW@/, house /haWs/, now /naW/
- /IR/: ear /IR/, near /nIR/
- /eR/: air /eR/, area /eRrIR/, square /skweR/
- /UR/: cure /kjUR/
Diphthongs can artificially emerge in a transcription when the individual phonemes that usually form a diphthong are placed adjacent in a word. For example, autoimmune. However, instances of such words are rare and can be ignored.
Specific pronunciation transcription methods
Linking-r
In UK English, a word pronounced in isolation never ends in /r/. However, in connected speech the final <r> is pronounced when it is followed by a vowel, as in combined words such as faraway /fA:r@weY/.
The inserted-r sound is known as "linking-r" and should be transcribed to avoid liaison problems.
Syllabic consonants
The consonants <l>, <m>, and <n> can sometimes form a syllable on their own. In these cases they are transcribed as /@l/, /@m/, and /@n/ respectively.
Pronunciation of foreign words
To transcribe foreign words, you must use the UK English SAMPA symbols.
If you use a different symbol set your system will be incapable of understanding the input.
Every language has a different phoneme inventory, so you may have problems in covering each and every sound. For the most common cases we offer the following transcription examples.
French nasals
Try to apply a pronunciation that has been adapted to UK English, for example:
- bonbon /bQnbQn/
The original transcription /bo~bo~/ cannot be realized because the French phoneme /o~/ is not part of the UK English SAMPA symbol set.
Vowel 'y' in German and French
The vowel 'y', found in some German or French words, can be represented by /u:/ or /jU/, such as:
- Duchenne /du:Sen/
- Dubonnet /djUbQneY/
Conveniently, this reflects the pronunciation commonly used by UK English speakers who are not fully conversant with the language in question.
German fricative 'x'
Palatal and velar fricatives that occur in, for example, German can be transcribed as /k/ instead of 'x', as in:
- Reich /raYk/
Multiple pronunciations (variants)
Since it is possible to have more than one pronunciation for a word by using pronunciation variants, it may be difficult to determine how many pronunciation variants should be created. The general rule is: Variants should only be created if the pronunciation differs in more than one phoneme. Minor systematic variations can usually be reflected in the training material for the phonemes, and need not be covered by pronunciation variants. If such a word causes recognition errors, the creation of pronunciation variants may help to solve the problem.
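The general rule can be approximated by counting how many phonemes separate two candidate pronunciations. The sketch below reads "differs in more than one phoneme" as an edit distance (substitutions, insertions, and deletions) greater than one over SAMPA symbols; that reading is an assumption, and the functions `phoneme_distance` and `needs_variant` are hypothetical helpers, not part of any Nuance tool. Pronunciations are given with spaces between symbols so that no tokenizer is needed.

```python
def phoneme_distance(first: list, second: list) -> int:
    """Levenshtein distance between two lists of phoneme symbols."""
    previous = list(range(len(second) + 1))
    for i, phoneme_a in enumerate(first, 1):
        current = [i]
        for j, phoneme_b in enumerate(second, 1):
            current.append(min(
                previous[j] + 1,                              # deletion
                current[j - 1] + 1,                           # insertion
                previous[j - 1] + (phoneme_a != phoneme_b),   # substitution
            ))
        previous = current
    return previous[-1]


def needs_variant(first: str, second: str) -> bool:
    """Return True if two space-separated SAMPA strings differ in more
    than one phoneme (edit-distance reading of the general rule)."""
    return phoneme_distance(first.split(), second.split()) > 1


if __name__ == "__main__":
    # "either": one phoneme differs, so the rule suggests no variant.
    print(needs_variant("aY D @", "i: D @"))
    # "garage": two phonemes differ, so a variant is a candidate.
    print(needs_variant("g { r A: Z", "g { r I dZ"))
```

The result is only a first filter; as the paragraph above notes, a variant may still be worth adding when a word causes recognition errors.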
The UK English symbol set in alphabetical order
The following table shows the UK English symbol set in alphabetical order:
SAMPA | IPA | Examples of usage
@ | ə | away /@weY/
@W | əʊ | nose /n@Wz/
{ | æ | bad /b{d/
3: | ɜ: | furs /f3:z/
A: | ɑ: | stars /stA:z/
aW | au̬ / aʊ̬ | rouse /raWz/
aY | aɪ | rise /raYz/
b | b | bin /bIn/
d | d | dummy /dVmi:/
D | ð | this /DIs/
dZ | ʤ | ginger /dZIndZ@/
e | e | pet /pet/
eR | eə | stairs /steRz/
eY | eɪ | raise /reYz/
f | f | coffee /kQfi:/
g | g | give /gIv/
h | h | hit /hIt/
I | ɪ | pit /pIt/
i: | i: | ease /i:z/
IR | ɪə | appear /@pIR/
j | j | yes /jes/
k | k | skin /skIn/
l | l | long /lQN/
m | m | mock /mQk/
n | n | knock /nQk/
N | ŋ | thing /TIN/
O: | ɔ: | north /nO:T/
p | p | pin /pIn/
Q | ɒ | pot /pQt/
QY | ɔɪ | noise /nQYz/
r | r | run /rVn/
s | s | sin /sIn/
S | ʃ | ship /SIp/
t | t | tin /tIn/
T | θ | thin /TIn/
tS | ʧ | chat /tS{t/
U | ʊ | put /pUt/
u: | u: | lose /lu:z/
UR | ʊə | tourist /tURrIst/
v | v | saving /seYvIN/
V | ʌ | cut /kVt/
w | w | wet /wet/
z | z | crazy /kreYzi:/
Z | ʒ | vision /vIZ@n/
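When checking transcriptions against a printed dictionary, it can help to convert the SAMPA symbols back to IPA. The following sketch builds a lookup from the table above and reads a transcription with a greedy longest-match scan. The function name `sampa_to_ipa` is hypothetical. Two simplifications are assumptions of this sketch: the IPA value for aW is written as "aʊ" (the table lists variant spellings), and a /t/ followed by /S/ is always read as the affricate /tS/ (likewise /d/ followed by /Z/).

```python
SAMPA_TO_IPA = {
    "@": "ə", "@W": "əʊ", "{": "æ", "3:": "ɜ:", "A:": "ɑ:",
    "aW": "aʊ",  # simplified; the table lists variant spellings for this symbol
    "aY": "aɪ", "b": "b", "d": "d", "D": "ð", "dZ": "ʤ", "e": "e",
    "eR": "eə", "eY": "eɪ", "f": "f", "g": "g", "h": "h", "I": "ɪ",
    "i:": "i:", "IR": "ɪə", "j": "j", "k": "k", "l": "l", "m": "m",
    "n": "n", "N": "ŋ", "O:": "ɔ:", "p": "p", "Q": "ɒ", "QY": "ɔɪ",
    "r": "r", "s": "s", "S": "ʃ", "t": "t", "T": "θ", "tS": "ʧ",
    "U": "ʊ", "u:": "u:", "UR": "ʊə", "v": "v", "V": "ʌ", "w": "w",
    "z": "z", "Z": "ʒ",
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a SAMPA transcription (without slashes) to IPA.

    Scans left to right and prefers two-character symbols such as
    "tS" or "@W" over one-character symbols.
    """
    result = []
    position = 0
    while position < len(transcription):
        for size in (2, 1):
            symbol = transcription[position:position + size]
            if symbol in SAMPA_TO_IPA:
                result.append(SAMPA_TO_IPA[symbol])
                position += len(symbol)
                break
        else:
            raise ValueError("unknown SAMPA symbol at position %d in %r"
                             % (position, transcription))
    return "".join(result)


if __name__ == "__main__":
    for word, sampa in (("nose", "n@Wz"), ("chat", "tS{t"), ("vision", "vIZ@n")):
        print(word, "/%s/" % sampa, "->", sampa_to_ipa(sampa))
```

Raising an error on unknown symbols is deliberate: it catches symbols from other SAMPA sets, such as the French /o~/ mentioned earlier, before they reach a pronunciation dictionary.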
Online documentation and technical support
This language pack includes online documentation in HTML format. To access additional product documentation, look for the Product Documentation program shortcut. Send comments on Nuance documentation to techdoc@nuance.com.
Technical support is provided online at Nuance Network.