LING2011 Language and Literacy

Introduction to Cognitive Science
Linguistics Component
Lecture 2
September 22, 2005.
(2.00 p.m. – 3.50 p.m.)
Venue: Meng Wah Complex Room 324
Lecturer: Dr. A. B. Bodomo
Department of Linguistics
<[email protected]>
Topic 3:
Formal Grammar:
Parsing and Generation
Introduction
• In my previous lectures, we discussed how tacit
linguistic knowledge can be represented at
various levels of phonology, morphology, syntax,
semantics, pragmatics, and their interfaces,
including morphophonology, morphosyntax, and
the syntax-semantics interrelationships.
• In this lecture, we shall look closely at how
these linguistic knowledge representations can
be formalised into an algorithm, a computational
procedure for processing this linguistic
knowledge.
Keywords
• constituent structure rules
• initial symbol
• terminal symbol
• non-terminal symbol
• generative grammar
• formal grammar
Formal devices and notation
• The symbol ‘→’
– indicates that a node is ‘rewritten as…’, ‘consists
of…’, or ‘has the constituents…’
• This is used in rewrite rules of the type:
– S → NP + VP
• a sentence, S, has the constituents: noun phrase (NP)
and verb phrase (VP)
• A choice between alternatives in the grammar is
expressed as {X, Y}.
– This means apply either X or Y but not both
Formal devices and notation (cont’d)
• Initial symbol: the symbol from which a rewrite
rule begins (e.g. S)
• Terminal symbol: the end symbols from which no
constituent structure can be further developed
(N, V, Art). All others are non-terminal symbols
(e.g. NP, VP).
• The symbol # is used to indicate constituent
boundary
– e.g. #_ is word initial while _# is word final
• The notation X (Y) implies that X is obligatory
and may be followed by Y
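The distinction between terminal and non-terminal symbols can be sketched in a few lines of Python. This is only an illustration: the names RULES, INITIAL_SYMBOL and classify_symbols are made up for this sketch, and the VP and NP rules are assumed so that every branch can end in a terminal symbol. Rewrite rules are stored as a mapping from each left-hand symbol to its ordered right-hand symbols; any symbol that never appears on a left-hand side is terminal.

```python
# Rewrite rules stored as: left-hand symbol -> ordered right-hand symbols.
# S -> NP + VP is the rule from the slide; the VP and NP rules are assumed
# here for illustration only.
RULES = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["Art", "N"],
}
INITIAL_SYMBOL = "S"  # the symbol from which rewriting begins


def classify_symbols(rules):
    """Split the symbols of a grammar into non-terminals and terminals."""
    nonterminals = set(rules)  # symbols that can be rewritten further
    mentioned = {sym for rhs in rules.values() for sym in rhs}
    terminals = mentioned - nonterminals  # symbols with no rewrite rule
    return nonterminals, terminals


nonterminals, terminals = classify_symbols(RULES)
print(sorted(nonterminals))  # ['NP', 'S', 'VP']
print(sorted(terminals))     # ['Art', 'N', 'V']
```

A dictionary is used here because each non-terminal in this small grammar has exactly one rewrite rule; a grammar with alternatives {X, Y} would need a list of right-hand sides per symbol.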
Two main aspects of grammatical
information processing:
Generating and Parsing sentences
• Before we begin let us illustrate
with a simple grammar and
lexicon, using the following
sentence:
– The students greeted the teacher.
The students greeted the teacher.
• Grammar:
– S → NP + VP
– VP → V + NP
– NP → Art + N
• Lexicon 1:
– greeted: V, _NP
– students: N
– the: Art
– teacher: N
This grammar can also generate (i.e. produce) the following sentences:
The teacher greeted the students
The teacher scared the students
The child ate an apple
But for the last two you have to augment (i.e. enlarge) the lexicon as follows:
Lexicon 2:
an: Art
the: Art
teacher: N
students: N
apple: N
child: N
greeted: V, _NP
scared: V, _NP
ate: V, _NP
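To see what the grammar and the augmented lexicon generate together, here is a rough Python sketch that enumerates every sentence. The names GRAMMAR, LEXICON2 and expand are illustrative only, and the subcategorization frames are left out because every verb in this lexicon has the same frame.

```python
from itertools import product

GRAMMAR = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}
LEXICON2 = {
    "an": "Art", "the": "Art",
    "teacher": "N", "students": "N", "apple": "N", "child": "N",
    "greeted": "V", "scared": "V", "ate": "V",
}


def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten as."""
    if symbol not in GRAMMAR:
        # Terminal symbol: one option per word of that class.
        return [[word] for word, cls in LEXICON2.items() if cls == symbol]
    # Non-terminal: combine the options of its daughters, in order.
    daughter_options = [expand(daughter) for daughter in GRAMMAR[symbol]]
    return [sum(combo, []) for combo in product(*daughter_options)]


sentences = {" ".join(words) for words in expand("S")}
print(len(sentences))                                 # 192
print("the child ate an apple" in sentences)          # True
print("the teacher scared the students" in sentences) # True
```

The enumeration also contains strings such as "an teacher ate the students", because this small grammar says nothing about the choice between "a" and "an" or about which nouns a verb can sensibly combine with.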
Sentence Generation: the algorithm
• To produce a sentence we need three
things:
– A set of phrase structure rules (as illustrated
above)
– A lexicon (as illustrated above), and
– A lexical insertion rule (as explained below)
• A lexical insertion rule is an instruction to
select the right word from a lexicon
• The following is an example of a lexical
insertion rule:
Lexical insertion rule
• For each terminal symbol of a phrase structure rule,
select a word from the lexicon that satisfies the
following conditions:
– it is a member of the class of the terminal symbol
(e.g. N, V);
– its subcategorization frame matches that of the
terminal symbol (e.g. V, _NP).
Attach this word as the daughter of this terminal symbol.
• The set of rules above constitutes what is known as
a sentence generator.
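The lexical insertion rule can be sketched as a simple filter over the lexicon. This is a minimal illustration, not a definitive implementation: the function name lexical_insertion and the representation of a lexical entry as a (word class, frame) pair are assumptions made for this sketch.

```python
# Each lexical entry: word -> (word class, subcategorization frame or None).
LEXICON = {
    "greeted": ("V", "_NP"),
    "students": ("N", None),
    "the": ("Art", None),
    "teacher": ("N", None),
}


def lexical_insertion(terminal, frame=None, lexicon=LEXICON):
    """Return the words that may be attached as daughter of `terminal`."""
    return [
        word
        for word, (word_class, word_frame) in lexicon.items()
        if word_class == terminal   # condition 1: member of the class
        and word_frame == frame     # condition 2: matching frame
    ]


print(lexical_insertion("V", "_NP"))  # ['greeted']
print(lexical_insertion("N"))         # ['students', 'teacher']
print(lexical_insertion("V"))         # [] (no verb without an object frame)
```

Any word in the returned list is a legitimate choice; a generator picks one of them and attaches it under the terminal symbol.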
• The whole procedure of beginning with an
initial symbol, working through the phrase
structure rules, and then adding the lexical
items via lexical insertion rules is driven
by an algorithm, i.e. a set of instructions.
• Let us set out an algorithm for the
generation (production) of the sentence
"The students greeted the teacher", using a
grammar and a lexicon as follows:
The students greeted the teacher
Grammar:
PS Rule (a): S → NP + VP
PS Rule (b): VP → V + NP
PS Rule (c): NP → Art + N
Lexicon 1:
greeted: V, _NP
students: N
the: Art
teacher: N
Rule 1
Start with the initial symbol, S.
Rule 2
For every non-terminal symbol, X, find a phrase structure rule
with X as the left-hand symbol, and develop a partial tree with
X as the mother and the right-hand symbols as ordered
daughters.
Rule 3
Apply rule 2 until all branches end in terminal symbols.
Rule 4
Apply the lexical insertion rule iteratively until every terminal
symbol has a lexical item attached as its daughter.
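Rules 1 to 4 above can be sketched as a short Python program. This is only one possible rendering under stated assumptions: trees are nested lists of the form [label, daughter, ...], the function names develop, insert_words, leaves and generate are invented for the sketch, and the subcategorization frame is omitted because the only verb in Lexicon 1 takes an NP object.

```python
import random

GRAMMAR = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}
LEXICON = {"greeted": "V", "students": "N", "the": "Art", "teacher": "N"}


def develop(symbol):
    """Rules 2 and 3: rewrite until every branch ends in a terminal symbol."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol, no daughter attached yet
    return [symbol] + [develop(daughter) for daughter in GRAMMAR[symbol]]


def insert_words(tree, rng):
    """Rule 4: attach a word of the right class under each terminal symbol."""
    if len(tree) == 1:
        candidates = [w for w, cls in LEXICON.items() if cls == tree[0]]
        return [tree[0], rng.choice(candidates)]
    return [tree[0]] + [insert_words(sub, rng) for sub in tree[1:]]


def leaves(tree):
    """Read the words off a completed tree from left to right."""
    if isinstance(tree[1], str):
        return [tree[1]]
    return [word for sub in tree[1:] for word in leaves(sub)]


def generate(seed=None):
    rng = random.Random(seed)
    tree = develop("S")  # Rule 1: start with the initial symbol, S
    return insert_words(tree, rng)


tree = generate(seed=0)
print(tree)
print(" ".join(leaves(tree)))
```

With Lexicon 1 every generated sentence has the shape "the N greeted the N", where each N is "students" or "teacher"; which nouns are chosen depends on the random seed.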
Illustrating the algorithm
Applying Rule 1:
S
Applying Rules 2, 3:
[S [NP Art N] VP]
Applying Rule 3:
[S [NP Art N] [VP V [NP Art N]]]
Applying Rule 4:
[S [NP [Art The] [N professor]] [VP [V greeted] [NP [Art the] [N students]]]]
• From the above we can see that we have
started from an initial symbol and have
ended with terminal symbols with lexical
items as their daughters. A sentence has
thus been generated (produced), telling us
how this sentence is built up.
• Now, let us see how we can begin with an
existing sentence and then break it down
into its component parts by applying rules.
Sentence parsing:
the algorithm
• To parse a sentence means to
analyse it into its constituent parts
by the systematic application of
lexical insertion rules and some
phrase structure rules.
• It is like the reverse process of
generation.
Types of Parsing
• Top-down: Begin with the symbol S.
• Bottom-up: Begin with terminal symbols
(words).
Possible research: Which type of parsing yields the most
cognitively realistic and efficient parser for natural
languages?
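A top-down parser can be sketched as follows. This is a hedged illustration rather than a general parser: the function names parse_top_down and recognise are invented here, the grammar and lexicon are the small ones used for generation, and no backtracking is needed only because each non-terminal in this grammar has exactly one rewrite rule.

```python
GRAMMAR = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}
LEXICON = {"greeted": "V", "students": "N", "the": "Art", "teacher": "N"}


def parse_top_down(symbol, words, pos=0):
    """Try to derive words[pos:] from `symbol`.

    Returns (tree, next_position) on success and None on failure.
    """
    if symbol not in GRAMMAR:  # terminal symbol: must match the next word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            return [symbol, words[pos]], pos + 1
        return None
    children = []
    for daughter in GRAMMAR[symbol]:  # expand daughters left to right
        result = parse_top_down(daughter, words, pos)
        if result is None:
            return None
        subtree, pos = result
        children.append(subtree)
    return [symbol] + children, pos


def recognise(sentence):
    """True if the whole sentence can be derived from the symbol S."""
    words = sentence.lower().rstrip(".").split()
    result = parse_top_down("S", words)
    return result is not None and result[1] == len(words)


print(recognise("The students greeted the teacher."))  # True
print(recognise("Greeted the students the teacher."))  # False
```

The parser begins with S and only looks at the words when it reaches a terminal symbol, which is what distinguishes it from a bottom-up parser that begins with the words.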
Some sentence parsing rules which
constitute a PARSER
• For a sentence, S
– Rule 1: Determine from the lexicon the word class of every item and
develop a partial tree for each word where the word class label
dominates the word.
– Rule 2: Find a PS rule of the type X → Y, Z where the right-hand
symbols match some sequence of categories in the structure
so far, and develop a partial tree with X as the mother and the
right-hand symbols as ordered daughters.
– Rule 3: Continue rule 2 until the root, S, is reached and there are no
unattached strings.
The man drank the tea.
Grammar:
PS Rule 1: S → NP + VP
PS Rule 2: VP → V + NP
PS Rule 3: NP → Art + N
Lexicon 1:
drank: V, _NP
man: N
the: Art
tea: N
Applying Rule 1:
[Art The] [N man] [V drank] [Art the] [N tea]
Applying Rule 2:
[NP [Art The] [N man]] [V drank] [NP [Art the] [N tea]]
Applying Rule 3:
[NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]]
[S [NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]]]
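The three parsing rules and the worked example above can be sketched as a bottom-up parser in Python. The names reduce_once and parse_bottom_up are illustrative, trees are nested lists of the form [label, daughter, ...], and the greedy strategy (always apply the first rule that matches, never undo it) is an assumption that happens to be sufficient for this small, unambiguous grammar but not for grammars in general.

```python
GRAMMAR = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("NP", ["Art", "N"])]
LEXICON = {"drank": "V", "man": "N", "the": "Art", "tea": "N"}


def reduce_once(forest):
    """Rule 2: replace the first category sequence matching a PS rule."""
    labels = [tree[0] for tree in forest]
    for mother, daughters in GRAMMAR:
        n = len(daughters)
        for i in range(len(forest) - n + 1):
            if labels[i:i + n] == daughters:
                new_tree = [mother] + forest[i:i + n]
                return forest[:i] + [new_tree] + forest[i + n:]
    return None  # no PS rule applies to the structure so far


def parse_bottom_up(sentence):
    words = sentence.lower().rstrip(".").split()
    if any(word not in LEXICON for word in words):
        return None  # a word is missing from the lexicon
    # Rule 1: one partial tree per word, word class dominating the word.
    forest = [[LEXICON[word], word] for word in words]
    # Rule 3: keep applying Rule 2 until nothing more can be combined.
    while True:
        reduced = reduce_once(forest)
        if reduced is None:
            break
        forest = reduced
    # Success only if the root S is reached with nothing left unattached.
    if len(forest) == 1 and forest[0][0] == "S":
        return forest[0]
    return None


print(parse_bottom_up("The man drank the tea."))
print(parse_bottom_up("Drank the man the tea."))  # None
```

For "The man drank the tea." the reductions happen in the same order as in the worked example: the two NPs are built first, then the VP, and finally S.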
Conclusion
• Parsing and generation of natural language
data is a very important area of linguistics,
especially in computer applications of natural
language, which have become an important
aspect of the computer and information
processing industry.
Topic 4:
Language and Literacy
Acquisition
Keywords
• language acquisition
• innateness hypothesis
• language faculty / Language
Acquisition Device (LAD)
• literacy
• levels of literacy
• literacy acquisition
Introduction
• Theme
– A survey of how linguistic knowledge is acquired/learnt
by speakers of a language, from the point of view of
spoken language and from the point of view of literacy
(reading and writing).
• Objective
– an understanding of the basic terms and issues in
language and literacy acquisition
– an interface approach: rather than rigidly treating
language acquisition as separate and different from
literacy acquisition, we will look at how language
acquisition relates to literacy acquisition.
What is language acquisition?
• Gleitman and Bloom 1999:434
– ‘refers to the process of attaining a specific variant of
human language…the fundamental puzzle in
understanding this process has to do with the open-ended
nature of what is learned: children appropriately
use words acquired in one context to make reference in
the next, and they construct novel sentences to make
known their changing thoughts and desires’ (in MIT
Encyclopedia of the Cognitive Sciences).
• Crystal 1997: 430
– The process of learning a first language in children.
– The analogous process of gaining a foreign or second
language.
Explaining how languages are acquired
• In previous lectures we have tried to account for
how all and only the grammatical sentences of a
language are produced and represented in the
brain of the speakers of a language.
• However, a complete account of linguistic
knowledge representation must address the issue
of how we acquire a language as children and
how we learn foreign languages as adults.
• We will mainly be concerned with first language
acquisition and not foreign language learning.
Stages of language development
• the single word stage (12-18 months)
– the language of the child consists of just a few isolated
words of the target language, e.g. ‘mamma’,
‘daddy’, etc.
– very little grammatical development
• the grammar stage (19-29 months)
– marked by the emergence of a few nominal and verbal
inflections in languages that have these.
– a few phrases and word utterances apparently strung
together: ‘mammy, milk’; ‘daddy bye bye’, etc.
• 30 months
– can produce more adult-like speech: ‘Where's daddy ?’
‘Daddy, I want to go with you.’
Explaining language acquisition:
• The uniformity and rapidity of child language acquisition are
explained by the innateness hypothesis.
• This is, at least, the position of Chomsky and most cognitive
approaches to linguistic explanation.
• In this hypothesis, language acquisition is determined by a
biologically endowed innate language faculty (also called
Language Acquisition Device (LAD)).
• The LAD, or language-learning ‘program’ in children’s brains, provides
them with a set of procedures (let us call it an ‘algorithm’ since
we are computer/cognitive science inclined) for developing a
grammar.
– Input: the linguistic experience they get from their parents and
teachers.
The nature of the language faculty
• Children can acquire any language as their native
tongue.
– e.g. a child of Cantonese speaking parents
growing up in England can learn to speak
perfect English as her native tongue.
• Those aspects of language innately determined
are universal
– language faculty does not vary significantly
from human to human
An important aspect of the study of the language faculty is the search
for principles of Universal Grammar!
Universal Grammar (UG)
• A theory of the human language faculty, i.e. a
module of the mind/brain involved in the basic design
of language (Noam Chomsky)
• It is part of an innate biologically endowed language
faculty, an innate mental organ specific to the human
species
• It allows us to perceive and interpret information
governed by certain formal constraints
• These formal constraints refer to a system of rules
and representations and one of its operations (its
grammar) by which the acceptable sentences of a
language can be generated
– Examples of formal universals, linguistic constraints of an
abstract nature: the binding principles determining what can
or cannot be the antecedent of an anaphoric, pronominal, or
fully referential nominal element, etc.
Literacy Acquisition
• Literacy: the ability to read, write and calculate basic
numbers
• Difficult to define:
– can mean different things to different people in
different areas: computer literacy, investment
literacy, etc.
• Is literacy part of our mental, cognitive faculty?
– Yes, because any human can acquire literacy i.e.
learn how to read, write and calculate basic
numbers given the right environment
Levels of Literacy
(cf. Stages of language acquisition)
• 6 stages of reading (Daswani 1999)
– Stages 1-3: Pre-reading, decoding, fluency
(approx. grades 1 – 3)
– Stage 4: Acquiring new knowledge
(approx. grades 4 – 8)
– Stage 5: Reading a range of complex materials
critically (grades 9 – 12)
– Stage 6: Mature reader: able to read for
various purposes: professional, personal, civic
(university and beyond)
The relationship between language
and literacy acquisition
• Traditional/historical view of child language acquisition:
– learning to speak happens up to the age of five years,
while learning to read happens after five.
• Now they are seen as very intertwined i.e. very related:
learning to speak and learning to be literate both deal
with learning to use language
• The basis of learning to speak has been outlined to
provide an ecology for literacy. The most important
lesson is that learning to speak and learning to read are
very much interwoven.
Evidence of the interface of
language and literacy acquisition
• They are both part of learning to USE language.
• Both need input from the environment.
– can be compared with Vygotsky's idea of ZOPED, zone of
proximal development, i.e. the distance between child initiative
and ability of child to do things under the influence of parental
support.
– The learning environment: participants, situation, activity and a
mechanism
• Literacy acquisition is like language acquisition (cf. Givon's idea of
literacy acquisition as a weak reflex of language acquisition).
• Literacy is best acquired in a language one has acquired.
Conclusion
• Literacy (reading and writing) is then another
level/kind of linguistic knowledge representation.
• Spoken and written linguistic knowledge
representation interface with each other and are
very intertwined.
• Language and literacy acquisition have very
important social, educational and cognitive
implications.
• Language and literacy acquisition should
therefore form an integral part of cognitive
science.
References
• David Barton. 1994. The roots of literacy. In Literacy: An Introduction to the Ecology of Written
Language. Oxford UK and Cambridge USA: Blackwell. Chapter 9, p. 130-139.
• C. J. Daswani. 1999. Literacy. In Bernard Spolsky (ed.) 1999. Concise Encyclopedia of
Educational Linguistics. Oxford: Elsevier Science Ltd.
• Viv Edwards and David Corson (eds.) 1997. Encyclopedia of Language and Education,
Volume 2: Literacy. Netherlands: Kluwer Academic Publishers.
• Talmy Givon. 1998. The grammar of literacy. In Syntaxis, 1, 1998: 1-40.
• Elfrieda Hiebert. 1994. Literacy in preschool programs. In Alan C. Purves et al. (eds.) 1994.
Encyclopedia of English Studies and Language Arts. New York: Scholastic. 754-756.
• Ernest Lepore and Zenon Pylyshyn (eds.) 1999. What Is Cognitive Science. Blackwell
Publishers. (especially chapters 10, 11, 12, and 13)
• Neil Stillings and others. 1995. Cognitive Science: An Introduction. MIT Press. (especially
chapters 6, 9, 10, and 11)
• Daniel A. Wagner. 1994. Literacy: definitions. In Alan C. Purves et al. (eds.) 1994.
Encyclopedia of English Studies and Language Arts. New York: Scholastic. 748-752.
• R. Wilson and Frank C. Keil (eds.) 1999. The MIT Encyclopedia of the Cognitive Sciences.
MIT Press.
– Lila Gleitman and Paul Bloom. Language Acquisition. p. 434-438
– David Olson. Literacy. p. 481-482
Tentative List of research topics for
Cognitive Science Students
• Supervisor: Dr. Adams BODOMO ([email protected])
• Topics in Syntax: Theory, Description and Application
– Building human language components in Computational Systems
– The LFG treatment of serial verbs, Complex Predicates, and other verbal
constructions in various languages: French, Norwegian, Japanese, Chinese,
Dagaare, etc.
• Topics in Language and Literacy as cognitive processes
– Chinese writing and computer technology: Survey and evaluation of various
inputting systems.
– New forms and functions of language and literacy in the age of Information
technology (emails, ICQ, bulletin boards, mobile phone texting, etc.): A survey of
SMS texting as a cognitive and communicative process in HK
• The grammar of aphasic patients
Further studies - courses by Dr Bodomo
• LING1002 - Language.com: Language in the
Contemporary World (1st year undergraduate,
co-taught with other staff members)
• LING2011 - Language and Literacy in the
Information Age
• LING2032 - Syntactic Theory
• LING2018 - Lexical-Functional Grammar
• LING2041 - Language and Information Technology
• LING2050 – Grammatical Description
• LING2051 – French Syntax and Universal Grammar
• Also consider B.A. in Human Language Technology
(HLT) as an option for a minor
Take-home Quiz
• Please submit your answers to your tutor
on or before September 22, 2005.
- The End -