
NATURAL LANGUAGE UNDERSTANDING
1. Introduction to Language Processing
Language is one of the most distinctive behaviours that sets humans apart from
other animals. Other animals communicate with signals, but not with language.
Here are some defining characteristics of language (after Edward Sapir, ~1920):
Arbitrariness: There is no relationship between a sound or sign and its meaning.
Cultural transmission: Language is passed from one language user to the next,
consciously or unconsciously.
Discreteness: Language is composed of discrete units that are used in
combination to create meaning.
Displacement: Languages can be used to communicate ideas about things that
are not in the immediate vicinity either spatially or temporally.
Duality: Language works on two levels at once, a surface level and a semantic
(meaningful) level.
Metalinguistics: Ability to discuss language itself.
Productivity: A finite number of units can be used to create an infinite number of
utterances.
Devising systems that can understand natural language has always been one of
the prime objectives of Artificial Intelligence.
Language is more than just a communication method.
Language can be used in a variety of speech acts:
Query
Inform
Request (directly or indirectly)
Direct (or command)
Acknowledge
Promise
Declare (pronounce)
The full task of understanding language requires recognizing the nature of the
speech act as well as its context, along with the plans that surround the utterance.
When we study language, we usually think in terms of an idealized, grammatically
correct form. But everyday use of language is often fragmented, and ungrammatical.
Consider this exchange from Annie Hall:
ANNIE: Come on, Alvy, they're only baby ones, for God's sake.
ALVY: If they're only babies, then you pick 'em up.
ANNIE: Oh, all right. All right! It's all right. Here.
ALVY: Don't give it to me. Don't!
ANNIE: Oooh! Here! Here!
ALVY: Look! Look, one crawled behind the refrigerator. It'll turn up in our bed at night. Will you get outta here with that thing? Jesus!
ANNIE: Get him!
ALVY: Talk to him. You speak shellfish! Hey, look ... put it in the pot.
ANNIE: I can't! I can't put him in the pot. I can't put a live thing in hot water.
ALVY: Gimme! Gimme! Let me do it! What-what's he think we're gonna do, take him to the movies?
ANNIE: Oh, God! Here yuh go! Oh, good, now he'll think- Aaaah! Okay.
Understanding language requires appropriate knowledge. There could never be a
completely general artificial intelligence natural language understanding system,
any more than any human could understand every sentence in a particular
language. Consider the following sentence:
“It is a killer wet miller aventurine rebel rainbow”
(See also: the McGurk effect.)
Steps in Language Communication:

intention: Speaker S identifies proposition (or idea) P to communicate.

generation: S plans how P can be turned into an utterance consisting of words W. This is based on the perception of the current situation, and on expectations of the listener's ability to understand and interpret.

synthesis: W is encoded into a series of vocalizations W′.

perception: W′ → W2′ → W2. The listener perceives vocalization W2′, and converts it into words W2.

analysis: The listener carries out parsing to identify the role of words in sentences, and carries out semantic interpretation to assign meaning and reference to the utterance, then carries out pragmatic interpretation to relate the meaning to the context, and to the current situation. These steps involve disambiguation, to select among feasible interpretations, and incorporation, to decide what (if any) of the utterance is to be believed or acted upon.
Language is usually studied as a formal system. In particular, it is studied as if there
were some form of underlying grammar:

G = {S, P, NT, T}
S: starting symbol (for the application of production rules)
P: production rules (such as sentence → noun-phrase verb-phrase)
NT: non-terminal symbols (such as verb-phrase, noun-phrase)
T: terminal symbols (words)

L(G) is the language produced by G: the set of possible terminal strings that can be
produced using G.
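To make this concrete, here is a minimal sketch (Python, with an invented toy grammar) of G represented as data, together with a breadth-first enumeration of the strings of L(G):

    from collections import deque

    # A toy grammar G = {S, P, NT, T}: non-terminals are uppercase strings,
    # terminals are lowercase words, P maps each non-terminal to its expansions.
    P = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"]],
        "VP": [["V", "NP"]],
        "N":  [["pawn"], ["queen"]],
        "V":  [["threatens"]],
    }
    NT = set(P)        # the non-terminal symbols
    START = "S"        # the starting symbol

    def strings_of_L(limit=10):
        """Enumerate terminal strings of L(G), breadth-first, up to `limit`."""
        results, queue = [], deque([[START]])
        while queue and len(results) < limit:
            form = queue.popleft()
            i = next((j for j, s in enumerate(form) if s in NT), None)
            if i is None:                      # all terminals: a string of L(G)
                results.append(" ".join(form))
                continue
            for rhs in P[form[i]]:             # expand the leftmost non-terminal
                queue.append(form[:i] + rhs + form[i + 1:])
        return results

    print(strings_of_L())    # ['the pawn threatens the pawn', ...]

This toy L(G) is finite; a single recursive production (say <NP> → <NP> <PP>) makes it infinite, which is the productivity property noted earlier.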
But many aspects of natural language use can’t be captured with grammar rules (* marks the unacceptable forms):
He beat him up
* He beat up him
He beat John up
He beat up John
He beat the man up
He beat up the man
* He beat the very big man up
He beat up the very big man
2. Devising a Natural Language Understanding System
Most Natural Language systems in Artificial Intelligence perform only the analysis
step, involving the stages of parsing, and semantic analysis, with some pragmatic
analysis. It is usually centred on the task of providing an interface for a database.
In devising a natural language understanding system, here are some prerequisites:
1. Decide upon a domain of discourse. What is the purpose of the system?
example: a system that asks questions about, and maintains a
chessboard configuration.
2. Determine how your domain of knowledge is to be maintained and
accessed apart from the language processing.
(a) what are the objects and their representations?
board
squares = A1, B1, .. H8
pieces = pawn, rook, ..
rows (rank)
columns (file)
colours
boardparts
etc.
(b) what are the actions and responses that your system will take?
(setup n)
(clear-board)
(display)
(move square1 square2)
(place piece square)
(castle colour side)
. . .
(print-take square1 square2)
(print-move square1 square2)
(past) “Gee ... I don’t really remember that”
(reply) “That’s nice, .. I’ll try to remember that”
(c) what supporting functions and predicates will be necessary?
(piece-on square)
(possible-takes square)
(occupied square)
(colour piece)
(can-take square 1 square2)
3. How are the sentences of the language to be represented in terms of
access to the database, and in terms of responses from your system?

What can take the black queen?
Which pieces threaten the black queen?

(for all MV1 piece
  (for the MV2 piece
    (queen MV2)
    (black MV2)
    (and (can-take MV1 MV2)
         (print-take MV2 MV1))))

Is there a pawn on F6?

(test (for a MV1 piece
        (pawn MV1)
        (for the MV2 square
          (square-is MV2 F6)
          (piece-on-square MV1 MV2)
          (print-piece MV1))))
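A minimal sketch (Python) of how the first form might be executed against the database. The board table, the can_take stub, and all of the names here are invented for illustration:

    # Hypothetical board: piece id -> (kind, colour, square).
    board = {
        "wN1": ("knight", "white", "C3"),
        "bQ":  ("queen",  "black", "D5"),
        "wP1": ("pawn",   "white", "C4"),
    }

    def can_take(attacker, target):
        """Stub capture test; a real system would apply the movement rules."""
        reachable = {"wN1": {"D5", "B5"}, "wP1": {"D5", "B5"}}
        return board[target][2] in reachable.get(attacker, set())

    # "What can take the black queen?"
    # (for all MV1 piece (for the MV2 piece (queen MV2) (black MV2) ...))
    mv2 = next(p for p, (kind, colour, _) in board.items()
               if kind == "queen" and colour == "black")   # "for the": the unique match
    for mv1 in board:                                      # "for all": every piece
        if mv1 != mv2 and can_take(mv1, mv2):
            print(mv1, "can take", mv2)                    # stands in for print-take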
4. What general classes of sentences are you going to use?
declarative
interrogative
imperative
past tense
present tense
future tense
- “The black king is on E5”: just parse it, no action
- “Does a pawn threaten a white bishop?”: obtain the information, and report (e.g., respond “yes”)
- “Move the pawn to B3”: check validity, make changes – or, if imperatives are ruled out, just parse, no action
(Note: having ruled out imperative sentences, something like “Tell me how
many pawns there are” would not be responded to.)
5. For each part of speech, devise many examples of what is intended to be
processed:
example: for Noun Phrases
the black pawn
pawn on A1
the pawn that can take the rook
all of the pawns
any pawn
position a1
any position
the pieces
any pieces
6. Establish the lexicon: that is, a list of all the words that can be used in the
system, and for each, the specification of its lexical category.
<Noun> → pawn | rook | knight | bishop | queen | king | row | column |
file | rank | centre | edge | corner | square | place | position |
space | board | men | man | danger | attack | check | move |
moves | colour | left | right | front | behind | back | white |
black | player
<Verb> → threaten | endanger | attack | capture | remove | display |
show | print | setup | prepare | castle | place | put | move |
moves | take | takes | took | taken | taking | have | has | had |
having | do | does | did | done | doing | can | could | be | is |
am | are | being | was | were | been
etc
As you can see, many of the allowed words are formed from the same
root, for use in different tenses, or with different numbers. All those
relationships will have to be made explicit (later).
Also, many words are just synonyms in terms of the action that
your system will take: colour and player, take and capture, threaten and
endanger, ...
7. Devise the grammar for the language that your system is going to employ.
<S> → <TYPE> <NP> <VP>
<TYPE> → <DECL> | <QYN> | <SREL> | <COMM>
<VP> → <V> | <VP> <NP> | <VP> <ADJ> | <VP> <PP> | <VP> <ADV>
<NP> → ......
In devising the grammar, you will have to make extensive lists of
a. The types of specifications that you intend to allow:
[place] [piece] on [ square]
[remove] [piece]
questions about [piece] [takes] [piece]
questions about [piece] [be] [complement]
...
b. Example sentences that you intend the system to process
Does the row the white queen is in contain a bishop?
How many moves does white have?
What are the moves that the black rook in row two can make?
Are there any pawns that the black bishop is threatening?
Move the pawn to the square behind the white knight.
Which player has four pawns left?
As one proceeds in constructing a natural language understanding system, there
will be discoveries of other words that are applicable, other operations in the
database that are desirable, and other desired ways of saying things, and
therefore other productions that are required for the grammar.
That’s one reason why it is important for NLU systems to be data-driven, as much
as possible.
[Pipeline diagram: input sentence → PARSER (using grammar, lexicon) → parse tree → SEMANTIC INTERPRETER (using semantic templates) → database query → DATABASE (using database content) → response]
3. Parsing
[Parse tree diagram for “the pawn is threatening the queen”: <S> splits into <NP> (<ARTICLE> the, <NOUN> pawn) and <VP> (<AUX> is, <VERB> threatening, and <NP>: <ARTICLE> the, <NOUN> queen)]
Parse trees are specifications of the grammar rules that are applied to produce
particular sentences. Note that a parse tree does not encode the order in which
the rules were applied.
We can represent such parse trees as lists:
(<S> (<NP> (<ARTICLE> the) (<NOUN> pawn))
(<VP> (<AUX> is) (<VERB> threatening) (<NP> (<ARTICLE> the)
(<NOUN> queen))))
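The same list encoding carries over directly to code. A small sketch (Python; the tree is the example above) that reads the role of each word off the tree:

    # The parse tree above, as nested lists: [label, child, child, ...];
    # a leaf is [category, word].
    tree = ["S", ["NP", ["ARTICLE", "the"], ["NOUN", "pawn"]],
                 ["VP", ["AUX", "is"], ["VERB", "threatening"],
                        ["NP", ["ARTICLE", "the"], ["NOUN", "queen"]]]]

    def word_roles(t):
        """Yield (word, category) pairs, left to right."""
        label, *children = t
        if len(children) == 1 and isinstance(children[0], str):
            yield children[0], label        # a leaf: one word under its category
        else:
            for child in children:
                yield from word_roles(child)

    print(list(word_roles(tree)))
    # [('the', 'ARTICLE'), ('pawn', 'NOUN'), ('is', 'AUX'), ...]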
The parse tree is critical, because it tells us the role that each word plays in the
organization of the sentence. Some words can be both nouns and verbs, such as
move, and meaning can hinge on what parts of speech are being modified by what.
When we understand language, we don’t notice the ambiguities, because they
usually don’t make sense.
Can the queen take a pawn in the row with the bishop?
We know that the bishop is not helping the queen take the pawn, but that is a
possible interpretation, though it would require a different parse tree from the one
for the usual meaning.
Parsing is the process that starts with a sentence, and derives a parse tree for the
sentence. We can search through a space of possible parse trees that is
generated either searching top-down or searching bottom-up.
A top-down version starts with <S> and branches to all of its possible expansions
in accordance with the rules of the grammar.
(<S> ?)
(<S> (<TYPE> ?) (<NP> ?) (<VP> ?))
(<S> (<TYPE> <DECL>) (<NP> ?) (<VP> ?))
(<S> (<TYPE> <QYN>) (<NP> ?) (<VP> ?))
Generated top-down, we can continue expanding in the space of parse trees until
the individual parse trees have no further expansion, except into words. Any parse
tree for which the words of the sentence are valid is a valid parse tree for the
sentence. We can search the space of parse trees either depth-first or breadth-first.
Depth-first search entails a difficulty: with left-recursive productions, such as
<A> → <A><B>, we may continue to expand the tree forever. But this can
easily be remedied by ordering the use of productions in expanding the search tree.
Breadth-first search will always find a parse tree if one exists, but may not
terminate if the sentence is not grammatically correct.
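Here is a minimal top-down, depth-first parser along these lines (Python; the toy grammar and lexicon are invented, with productions ordered so that no left recursion arises):

    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["ARTICLE", "NOUN"]],
        "VP": [["AUX", "VERB", "NP"], ["VERB", "NP"]],
    }
    LEXICON = {"the": "ARTICLE", "pawn": "NOUN", "queen": "NOUN",
               "is": "AUX", "threatening": "VERB"}

    def parse(symbol, words, i):
        """Try to expand `symbol` at position i; return (tree, next position) or None."""
        if symbol in GRAMMAR:
            for rhs in GRAMMAR[symbol]:         # try each production in order
                trees, j = [], i
                for sub in rhs:
                    result = parse(sub, words, j)
                    if result is None:
                        break                   # this production fails: backtrack
                    tree, j = result
                    trees.append(tree)
                else:
                    return [symbol] + trees, j
            return None
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return [symbol, words[i]], i + 1    # terminal category matches the word
        return None

    words = "the pawn is threatening the queen".split()
    tree, end = parse("S", words, 0)
    assert end == len(words)                    # the whole sentence was consumed
    print(tree)

Each failed production simply falls through to the next alternative, which is the backtracking described above.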
We can also generate possible parse trees in a bottom-up fashion.
(move the pawn to A6)
((<VERB> move) the pawn to A6)
((<NOUN> move) the pawn to A6)
Whether bottom-up, or top-down, and whether searched depth-first, or breadth-first,
or even searched with heuristics, there are always inefficiencies in parsing.
One problem arises from the fact that parts of a sentence can be processed
identically more than once.
Have the students in your section of CISC 101 taken the exam?
Have the students in your section of CISC 101 take the exam.
How many of the inmates escaped from the prison?
How many of the inmates escaped is still a mystery.
There is interest in devising parsers that will retain partial results in case they are
required for subsequent, different parses of the same sentence (see pp. 800-805)
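One way to retain partial results is to memoize on (symbol, position), which is the core idea behind chart parsing. A sketch, reusing the GRAMMAR and LEXICON tables from the parser above:

    from functools import lru_cache

    words = tuple("the pawn is threatening the queen".split())

    @lru_cache(maxsize=None)
    def parse_at(symbol, i):
        """As parse() above, but each (symbol, position) pair is analysed
        at most once; later attempts reuse the stored result."""
        if symbol in GRAMMAR:
            for rhs in GRAMMAR[symbol]:
                trees, j = [], i
                for sub in rhs:
                    result = parse_at(sub, j)
                    if result is None:
                        break
                    tree, j = result
                    trees.append(tree)
                else:
                    return [symbol] + trees, j
            return None
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            return [symbol, words[i]], i + 1
        return None

A real chart parser records every analysis of each span, not just the first successful one, but the principle of reuse is the same.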
In speech, we offer disambiguation cues about the sentence in the form of
intonation differences, hand gestures, etc. But in written communication, we are
more prone to misinterpretations, as anyone who frequents discussion boards
knows very well : )  (LOL) (ROTFLMHO)
There are many characteristics of natural language that complicate the process of
parsing. We will return to them later, but first we look at an example of a parsing
system.
4. Augmented Transition Networks (ATN)
There are several ways to represent a simple regular grammar. Take, for example,
the language of all bit-strings with an odd number of 1’s:
(1) a description: all bit-strings with an odd number of 1’s
(2) a regular expression: 0*10*(10*10*)*
(3) a grammar:
S → A
A → 0A
A → 1B
B → 1A
B → 0B
B → ε
(3′) an alternative grammar, without the empty production:
S → 0S
S → 1B
S → 1
B → 1S
B → 0B
B → 0
(4) a transition network:
[Diagram: two states; arcs labelled 0 loop on each state, arcs labelled 1 cross between them, and the state reached after an odd number of 1’s is the accepting state.]
With method (4) above, it is very easy to follow a top-down parse,
and easy to visualize the search through alternative paths.
These transition networks can easily be extended to cover context-free grammars,
rather than just regular grammars. An arc can specify (when capitalized) entry
(referred to as a “push”) into a different sub-network. (Note that across all
sub-networks, the node names are kept unique.)
[S network diagram: from S, a push NP arc leads to Q1 and a cat aux arc leads to Q2; aux, NP, and v arcs connect Q1, Q2, and Q3, so that declaratives (NP before any aux) and questions (aux before the NP) both reach Q4 after the verb; from Q4 an NP arc leads to Q5, which has a PP loop; Q4 and Q5 can pop. The network corresponds to productions such as:]
<S>→<NP> <aux> <v>
<S>→<NP> <v>
<S>→<aux> <NP> <v>
<S>→<NP> <aux> <v> <NP>
<S>→<NP> <v> <NP>
<S>→<aux> <NP> <v> <NP>
<S>→<NP> <aux> <v> <NP> <PP>
<S>→<NP> <v> <NP> <PP>
<S>→<aux> <NP> <v> <NP> <PP>
.....
<S>→<NP> <aux> <VP>
<S>→<NP> <VP>
<S>→<aux> <NP> <VP>
<VP>→<v> <NP>
<VP>→<v> <NP><Q5>
<Q5>→<Q5><PP>
<Q5>→<PP>
<S>→<NP> [<aux>] <v>[<NP><PP>*]
<S>→ <aux><NP> <v>[<NP><PP>*]
[Diagram: the S network above, together with its sub-networks. NP network: a det arc leads to Q6 (a prn arc leads to Q8 instead); at Q6, an adj arc loops; an n arc leads to Q7, which has a PP loop; Q7 and Q8 can pop. PP network: a prep arc leads to Q9, a push NP arc leads to Q10, and Q10 pops.]
Dark-circled nodes indicate that they have a
“pop” back to the point from which they were
entered. At each category arc (lower
case), one word of input is advanced. If no
arc can be taken, the system backtracks.
These networks are specified in a way that includes the use of some special
procedural operations that will accompany the use of a node.
(S (push NP t
     (setr subj *)
     (setr type 'decl)
     (to Q1))
   (cat aux t
     (setr aux *)
     (setr type 'q)
     (to Q2)))

setr is “set register”. Registers are restored on backtracking, so they always end up
with the values as set during the successful path through the networks.
When a category arc is traversed, * is set to the word that is consumed. On a push
arc, it is set to the structure returned by the sub-network that was entered.
(Q4 (pop (buildq (S (TYPE +) + (AUX +) (VP +)) type subj aux v) t)
    (push NP t
      (setr vp (buildq (VP (V +) *) v))
      (to Q5)))
So, assuming that the registers were set:
type = decl (via a transition from S)
subj = (NP (ART the) (NOUN pawn)), from the NP sub-network
aux = is, from having gone to Q2
v = threatened, from going between Q3 and Q4,
Q4 would return
(S (TYPE decl)
   (NP (ART the) (NOUN pawn))
   (AUX is)
   (VP (V threatened)))
[NP network diagram: a det arc leads from NP to Q6 (or a jump goes directly to Q6); at Q6 an adj arc loops; an n arc leads to Q7, which has a PP loop and pops.]
(NP (jump Q6 t
      (setr np '(NP (DET nil))))
    (push DET t
      (setr np (buildq (NP (DET *))))
      (to Q6)))
(Q6 (cat adj t
      (addr np (buildq (ADJ *)))
      (to Q6))
    (cat n t
      (addr np (NOUN *))
      (to Q7)))
.....

addr treats the register as a list, and adds the required element to the end of the list.
A nice thing about this system is that you can make some very simple diagrams,
code it, and quickly have the parser working for a small subset of the language.
Then you can add more and more, testing it after each addition, until it is able to
parse the full language that you desire.
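As a minimal illustration of that style of development, here is a stripped-down ATN walker (Python). The networks are a simplified invention: registers, setr, and buildq are omitted, and recognized constituents simply accumulate into a list:

    LEXICON = {"the": "art", "pawn": "n", "queen": "n",
               "is": "aux", "threatening": "v"}

    # Each network maps a state to its arcs, in simplified form:
    # ("cat", category, next-state) consumes one word;
    # ("push", network, next-state) enters a sub-network;
    # ("pop",) returns what this network has collected.
    NETWORKS = {
        "S":  {"S":  [("push", "NP", "Q1")],
               "Q1": [("cat", "aux", "Q2"), ("cat", "v", "Q4")],
               "Q2": [("cat", "v", "Q4")],
               "Q4": [("push", "NP", "Q5"), ("pop",)],
               "Q5": [("pop",)]},
        "NP": {"NP": [("cat", "art", "N1")],
               "N1": [("cat", "n", "N2")],
               "N2": [("pop",)]},
    }

    def traverse(net, state, words, i, collected):
        """Depth-first ATN walk; returns (structure, next position) or None."""
        for arc in NETWORKS[net][state]:
            if arc[0] == "pop":
                return [net] + collected, i
            if arc[0] == "cat":
                _, cat, nxt = arc
                if i < len(words) and LEXICON.get(words[i]) == cat:
                    r = traverse(net, nxt, words, i + 1,
                                 collected + [[cat, words[i]]])
                    if r:
                        return r
            if arc[0] == "push":
                _, sub, nxt = arc
                r = traverse(sub, sub, words, i, [])
                if r:
                    subtree, j = r
                    r = traverse(net, nxt, words, j, collected + [subtree])
                    if r:
                        return r
        return None                # no arc applies: backtrack

    words = "the pawn is threatening the queen".split()
    print(traverse("S", "S", words, 0, []))

Adding a PP sub-network, an adjective loop, or the register machinery is a matter of adding arcs and arc actions, which is what makes the incremental development described above practical.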
(parse (can the pawn in the row beside the king take a bishop?) S)
(S QYN (NP (DET (ART the))
(NOUN pawn)
(NUMBER SG)
(PP (PREP in)
(NP (DET (ART the))
(NOUN row)
(NUMBER SG)
(PP (PREP beside)
(NP (DET (ART the))
(NOUN king)
(NUMBER SG)))))
(VP (AUX can)
(TNS PRESENT)
(V take)
(NP (DET
(ART a))
(NOUN bishop)
(NUMBER SG))))
5. Augmented Grammars
The Context-free grammars for English, that we have used so far, will
overgenerate: That is, they will produce sentences which are normally thought
to be invalid in the language.
One example of overgeneration is sentences such as Me eats apples
It would be created with productions such as
S → NP VP
NP → PRONOUN | NOUN
VP → VERB NP
VP → VERB
providing that we do not discriminate between the different cases for the subject
and object of sentences.
subjective case (nominative): I love her
objective case (accusative): she loves me
(There is one other, the dative (indirect object) case, which is not distinctly marked in English.)
How could we take this distinction into account, and have our grammar only
generate valid sentences?
One approach would be to divide up the grammar, using two different kinds of NP,
and two different kinds of PRONOUNS.
S → NP-SUBJ VP
NP-SUBJ → PRONOUN-SUBJ | NOUN
NP-OBJ → PRONOUN-OBJ | NOUN
VP → VERB NP-OBJ
VP → VERB
This solution would lose some of the expressive power of the grammar by
replicating some aspects of the definitions of language components, such as NP,
and is not a good idea, because there are other similar problems which require
much the same solution, such as agreement.
Agreement refers to the different forms of many words, which are restricted to go
together, as evidenced by the ungrammatical form
He eat apples
Most verbs have two forms which must correspond in number (whereas to be has
three forms: I am, you are, he is). These issues would require further
partitioning of the (already partitioned) grammar.
An alternative is to augment the non-terminals of the grammar with parameters
which can take on values indicative of case and number:
S → NP(s) VP            (this NP would have to be subjective)
NP(s) → PRONOUN(s) | NOUN(s)
VP → VERB NP(s)         (this NP would have to be objective)
VP → VERB
When the productions are applied, or considered in the parse, rules are enforced
concerning the values of the parameter.
When productions such as NP(s) → PRONOUN(s) are applied, the value of the
parameter for the PRONOUN is inherited by the parameter for the NP
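A sketch of the enforcement in code (Python; the case table and helper names are invented):

    # Each pronoun carries its case; NP(case) -> PRONOUN(case) lets the NP
    # inherit it, and the productions for S and VP then constrain its value.
    PRONOUN_CASE = {"I": "subj", "he": "subj", "she": "subj",
                    "me": "obj", "him": "obj", "her": "obj"}

    def np_case(word):
        """NP(case) -> PRONOUN(case) | NOUN: plain nouns fit either case."""
        return PRONOUN_CASE.get(word, "any")

    def fits(case, required):
        return case in (required, "any")

    def valid_sentence(subject, verb, obj):
        """S -> NP(subjective) VERB NP(objective)."""
        return fits(np_case(subject), "subj") and fits(np_case(obj), "obj")

    print(valid_sentence("I", "love", "her"))       # True
    print(valid_sentence("me", "eats", "apples"))   # False: "me" is objective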
Another way to represent this involves the use of Definite Clause Grammars.
They are based on the observation that productions have logic counterparts.
S → NP VP expresses the same as
NP(s1) ∧ VP(s2) ⇒ S(s1 + s2)
where s1 and s2 are substrings of the input, and + denotes concatenation of
strings.
The use of a grammar in this form allows the use of a theorem prover to carry
out the parsing.
The parameters involved in case and number can easily be incorporated as
additional variables in the logic statements.
NP(case) → PRONOUN(case)
becomes
PRONOUN(case, s1) ⇒ NP(case, s1)
This indicates that if s1 is a pronoun, with its case specified by the parameter
case, then s1 is also a noun phrase with the same case.
A definite clause grammar is effective in expressing the intentions in processing the
variations of form that words can take on. But in practice, these operations are
usually isolated from the grammatical specification through the use of a
morphological analysis that takes place on the words as they are encountered
before they are accessed by the parser.
A dictionary is set up which holds all the root words (and irregular forms).
for example, it could contain:
capture   v   s-d   trans passive fixtime

indicating that capture is a regular verb, with normal s and d endings which specify
tense, number, and person. The verb is transitive – meaning that it requires an
object – it can exist in a passive form (Can the pawn be captured?), and the verb
represents an action that takes place in a fixed short time period.

is   v   (be (tns present) (pncode 3sg))

specifying is as an irregular variant of the verb be which is used when indicating
the present tense, with the 3rd person singular form (She is).
In addition, there would be a table of endings, which would allow variants of words to
be correctly interpreted. It might contain
s   v   s-d, s-ed   (pncode 3sg) (tns present)

This would indicate that whenever a word is not found in the dictionary, but after
removing a trailing s it is found to be a verb which is regular with either the s-d or
s-ed ending specification, then we return the root word with a person-number code
specifying 3rd person singular, and the present tense. So if we found threatens, we
would follow that rule.
As a sentence is made available to the parser, all these details about the words
are made available through the morphological analysis.
The pawn threatens the queen
the        article
pawn       noun (pncode sg)
threatens  threaten, verb (pncode 3sg) (tns present)
the        article
queen      noun (pncode sg)
These characteristics of words do not change much from one application to
another, so any parsing system will already have an extensive set of words with
default values, to which only slight changes will have to be made.
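A sketch of this morphological step (Python; the dictionary, the single endings rule, and the feature names are abbreviated inventions in the spirit of the notes):

    # Root dictionary: word -> (part of speech, features).
    DICTIONARY = {
        "capture":  ("v", {"endings": "s-d"}),
        "threaten": ("v", {"endings": "s-ed"}),
        "is":       ("v", {"root": "be", "tns": "present", "pncode": "3sg"}),
        "pawn":     ("n", {"pncode": "sg"}),
        "queen":    ("n", {"pncode": "sg"}),
        "the":      ("article", {}),
    }

    def analyze(word):
        """Look the word up; failing that, apply the endings table (here just
        one rule: a trailing s stripped from a regular verb gives 3sg present)."""
        if word in DICTIONARY:
            pos, feats = DICTIONARY[word]
            return dict(feats, root=feats.get("root", word), pos=pos)
        if word.endswith("s"):
            entry = DICTIONARY.get(word[:-1])
            if entry and entry[0] == "v" and entry[1].get("endings") in ("s-d", "s-ed"):
                return {"root": word[:-1], "pos": "v",
                        "pncode": "3sg", "tns": "present"}
        return {"root": word, "pos": "unknown"}

    for w in "the pawn threatens the queen".split():
        print(w, analyze(w))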
The parser is then able to enforce whatever restrictions that are desired.
When <S> → <NP> <VP> is being applied (that is, when that point in a transition
network is entered), the pncodes for the verb and for the NP can be checked to
make sure they are appropriately matched.
When using a system such as ATNs, there are many working example languages
available to the designer. Surprisingly few modifications of any substance will have
to be made to those examples to make them work effectively within your domain of
discourse.
6. Semantic Processing
Thus far we have only seen how to detect syntactic errors and how to produce a
parse tree that indicates the role that each word plays in the sentence. That
information is critical, but it remains to convert the sentence into an operation that
can take place within a database of knowledge, resulting in a response. That last
step is called semantic processing.
The objective is to create an executable form from the parse tree (or during the
parse).
It is possible to specify a semantic action associated with each production of the
language:
Number(x) → Digit(x)
Number(x) → Number(x1) Digit(x2)   {x = 10·x1 + x2}
Digit(x) → 1 {x = 1} | 2 {x = 2} | ...
then the form that is associated with the final Number can be executed to determine
the value of the number.
This captures the idea of compositional semantics: that the meaning of the whole is
composed out of the meanings of the components.
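Carried out directly in code (Python), with a left-to-right fold playing the role of Number(x) → Number(x1) Digit(x2) {x = 10·x1 + x2}:

    # Digit(x) -> 1 {x=1} | 2 {x=2} | ...
    DIGIT_MEANING = {str(d): d for d in range(10)}

    def number_meaning(digits):
        """The meaning of the whole Number is composed from its Digits:
        Number(x) -> Number(x1) Digit(x2)  {x = 10*x1 + x2}."""
        x = DIGIT_MEANING[digits[0]]        # base case: Number(x) -> Digit(x)
        for d in digits[1:]:
            x = 10 * x + DIGIT_MEANING[d]
        return x

    print(number_meaning("142"))    # 142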
This could easily be extended to a grammar for expressions (as seen on page 811).
But it seems a bit strange to be given something like (4*6)+(5*19) and to use a
grammar to determine that the semantics of the expression is to execute (4*6)+(5*19).
When the target execution is logic derived from English, the task is more apparent
(and more difficult)
If using logic within our knowledge base, we would want to convert a sentence such
as
John loves Mary into the form loves(John, Mary)
this logic form would have to be constructed when the production
S → NP VP
is applied.
we can specify the component meanings: John and Loves(?, Mary), but they do not
combine in as straightforward a manner as the numbers, and expressions did.
But if we use lambda calculus expressions for the semantics of the components,
then we can just concatenate the components to derive the logic form.
S(rel(obj)) → NP(obj) VP(rel)
where rel = λx Loves(x, Mary) and obj = John;
the result will be (λx Loves(x, Mary))(John) = Loves(John, Mary).
In deriving the VP from VP → V NP, we can insert Mary into the result in the same way:
S(rel(obj)) → NP(obj) VP(rel)
VP(rel(obj)) → V(rel) NP(obj)
NP(obj) → NAME(obj)
NAME → John | Mary
V(λy λx Loves(x, y)) → loves
(Note the argument order: the verb takes its object first, so that applying V's semantics to Mary yields λx Loves(x, Mary).)
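These attachments can be played out literally with lambdas (Python; the logic term is represented as a tuple):

    # V(λy λx Loves(x, y)) -> loves: the object is supplied first, then the subject.
    loves = lambda y: lambda x: ("Loves", x, y)

    obj = "Mary"         # NAME -> Mary, so NP(obj) carries obj = Mary
    rel = loves(obj)     # VP(rel(obj)) -> V(rel) NP(obj): rel is now λx Loves(x, Mary)
    s = rel("John")      # S(rel(obj)) -> NP(obj) VP(rel): apply rel to John
    print(s)             # ('Loves', 'John', 'Mary')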
As we attempt to include more aspects of English expressions, the logic form
becomes more complex:
The black queen threatens the white king
The black queen threatened the white king
requires the use of an event logic such as:
V(λy λx ∃e  e ∈ Threatens(x, y) ∧ During(Now, e)) → threatens
V(λy λx ∃e  e ∈ Threatens(x, y) ∧ After(Now, e)) → threatened
A problem occurs as we try to extend this semantic processing method to the use
of quantifiers.
for Every bishop threatens a pawn we want
∀b  b ∈ Bishops ⇒ (∃p  p ∈ Pawns ∧ (∃e  e ∈ Threaten(b, p) ∧ During(Now, e)))
It appears this could be composed from the components used for the NP and VP:
NP(∀b  b ∈ Bishops ⇒ P)
VP(∃p  p ∈ Pawns ∧ (∃e  e ∈ Threaten(b, p) ∧ During(Now, e)))
But this would require both inserting the ∃p ... During(Now, e) part in place
of P, and inserting b in its place within Threaten(b, p). So our previous simple
concatenation of logic forms will no longer work.
The solution to this problem is to develop special forms which can be built up in a
more general way. This could be done through more and more sophisticated logical
forms, or by performing matches within the parse tree, looking for subtrees that can
be represented as simple database actions, and then building them into more
complex actions with higher level matches within the parse trees.
In any event, semantics is the most difficult part, and almost always requires
implementers to resort to some ad hoc techniques, for which methods that have
procedural components have an advantage over methods that are entirely
declarative (such as pure logic representations).
Beyond semantics, there are further difficulties with pragmatics. For example, the
sentence:
Every bishop threatens a pawn
could mean that every bishop threatens the same pawn, which in turn would only
make sense if one of the players had already lost both bishops. So the actual
situation on the chessboard can contribute to the preferred meaning associated with
a sentence.
AIMA references (2e)
communication as action: pages 790 to 795(bottom)
formal grammar for English fragment: pages 795(bottom) to 798(top)
parsing: pages 798(top) to 800(top)
augmented grammars: pages 806 to 810(mid)
semantic interpretation: pages 810 to 814(top)
AIMA references (3e)
Chapters 22 & 23 include the same material (see especially Ch 23) but with
revised emphasis and considerable new material