Lecture 1 String and Language

String
• A string is a finite sequence of symbols.
For example,
string (symbols: s, t, r, i, n, g)
CS4384 (symbols: C, S, 4, 3, 8)
101001 (symbols: 1, 0)
• Symbols are drawn from an alphabet.
• An alphabet is a finite set of symbols.
Examples of Alphabet
• {a, b, c, ..., x, y, z} (Roman alphabet)
• {0, 1, ..., 9}
• {0, 1} (binary alphabet)
Length of a String
• The length of a string x is the number of
symbols contained in the string x, denoted
by |x|.
• For example, |string| = 6, |CS5400| = 6, and |101001| = 6.
• The empty string is the string with no symbols, denoted by ε; its length is |ε| = 0.
Equal
• Two strings x1x2···xn and y1y2···ym are
equal if and only if
(1) n=m and
(2) xi=yi for all i.
• For example, 01 ≠ 010 and 1010 ≠1110.
Substring
• s is a substring of x if there exist strings
y and z such that x = ysz.
• In particular,
when x = sz (y = ε), s is called a prefix of x;
when x = ys (z = ε), s is called a suffix of x.
• For example, CS is a prefix of CS5400 and 5400 is a suffix of CS5400.
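The three definitions can be checked mechanically. The following is a minimal Python sketch, assuming strings are ordinary Python `str` values; the function names `is_substring`, `is_prefix`, and `is_suffix` are illustrative and not part of the lecture.

```python
def is_substring(s: str, x: str) -> bool:
    """True if x = y + s + z for some (possibly empty) strings y and z."""
    # Try every position where s could start inside x.
    return any(x[i:i + len(s)] == s for i in range(len(x) - len(s) + 1))


def is_prefix(s: str, x: str) -> bool:
    """True if x = s + z for some string z (the case y = empty string)."""
    return x[:len(s)] == s


def is_suffix(s: str, x: str) -> bool:
    """True if x = y + s for some string y (the case z = empty string)."""
    return len(s) <= len(x) and x[len(x) - len(s):] == s
```

Under these definitions the empty string is a prefix, a suffix, and a substring of every string, and every prefix or suffix is also a substring.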
Concatenation
• The concatenation of two strings x and y is
a string xy, i.e., x is followed by y.
• For example, CS5400 is a concatenation
of CS and 5400.
• In particular, we denote
xx = x^2, xxx = x^3, xxxx = x^4, ..., and define x^0 = ε.
• For example, 101010 = (10)^3 and (10)^0 = ε.
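Solve equation 011x = x011
• If x = ε, then the equation holds.
• If |x| = 1, then there is no solution.
• If |x| = 2, then there is no solution.
• If |x| ≥ 3, then x = 011y for some string y. Hence, 011x = 011y011, so x = y011, and therefore 011y = y011. The same equation now holds for the shorter string y.
• By induction on |x|, x = (011)^k for k ≥ 0.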
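The case analysis can be sanity-checked by exhaustive search over short binary strings. This is only a sketch: it assumes the alphabet is {0, 1}, the helper name `solutions` is hypothetical, and a finite search supports the claim without proving it.

```python
from itertools import product


def solutions(max_len: int) -> list:
    """All binary strings x with |x| <= max_len satisfying 011x = x011."""
    found = []
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            x = "".join(symbols)
            if "011" + x == x + "011":
                found.append(x)
    return found
```

For example, `solutions(9)` can be compared against the list of powers `"011" * k` for k = 0, 1, 2, 3.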
Language
• A language is a set of strings.
For example, {0, 1}, {all English words}, and {0^0, 0^1, 0^2, ...} are all languages.
• The following are operations on sets and hence also on languages.
Union: A U B
Intersection: A ∩ B
Difference: A \ B (also written A - B when B ⊆ A)
Complement: Ā = Σ* - A, where Σ* is the set of all strings on alphabet Σ.
Concatenation of Languages
• Concatenation: AB = {ab | a ∈ A, b ∈ B}
• For example, {0, 1}{1, 2} = {01, 02, 11, 12}.
• In particular, we denote A^1 = A, A^2 = AA, ..., and define A^0 = {ε}.
If AB = B for every language B, then A = {ε}.
• Choose B = {ε}. Then A = A{ε} = {ε}, so A is nonempty and cannot contain a nonempty string.
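For finite languages, concatenation and powers can be computed directly. Below is a minimal Python sketch, assuming a language is represented as a `set` of `str` and the empty string ε as `""`; the names `concat` and `power` are illustrative.

```python
def concat(A: set, B: set) -> set:
    """AB = {ab | a in A, b in B}."""
    return {a + b for a in A for b in B}


def power(A: set, k: int) -> set:
    """A^k, with A^0 = {empty string} by definition."""
    result = {""}
    for _ in range(k):
        result = concat(result, A)
    return result
```

With this representation, `concat({""}, B)` returns `B` unchanged, while concatenating with the empty language `set()` always yields `set()`.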
Examples
• For Σ = {0, 1}, Σ^2 = {00, 01, 10, 11}.
• (Σ^k is the set of all strings of length k on Σ.)
Therefore,
• Σ* = Σ^0 U Σ^1 U Σ^2 U ···.
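Since Σ* is infinite, a program can only enumerate it up to a length bound. The sketch below assumes single-character symbols and uses hypothetical helper names `sigma_power` and `sigma_star_up_to`.

```python
from itertools import product


def sigma_power(sigma: set, k: int) -> set:
    """All strings of length exactly k over the alphabet sigma."""
    return {"".join(t) for t in product(sorted(sigma), repeat=k)}


def sigma_star_up_to(sigma: set, max_len: int) -> set:
    """The finite part of sigma* consisting of strings of length <= max_len."""
    result = set()
    for k in range(max_len + 1):
        result |= sigma_power(sigma, k)
    return result
```

Each level contains |Σ|^k strings, so for the binary alphabet the bounded union up to length 3 has 1 + 2 + 4 + 8 = 15 elements.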
Kleene Closure
• Kleene closure:
A* = A^0 U A^1 U A^2 U ···
• Notation:
A^+ = A^1 U A^2 U A^3 U ···
• A={grand, ε}, B={father, mother}.
What is A*B?
• A*B={father, mother, grandfather,
grandmother, …}
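The Kleene closure of a nonempty language other than {ε} is infinite, so the sketch below computes only the strings up to a chosen length bound. The names `concat` and `star_up_to` and the bound of 10 are assumptions made for illustration.

```python
def concat(A: set, B: set) -> set:
    """AB = {ab | a in A, b in B}."""
    return {a + b for a in A for b in B}


def star_up_to(A: set, max_len: int) -> set:
    """All strings of A* whose length is at most max_len."""
    result = {""}        # A^0 = {empty string}
    frontier = {""}      # strings discovered in the previous round
    while frontier:
        longer = {w for w in concat(frontier, A) if len(w) <= max_len}
        frontier = longer - result   # keep only strings not seen before
        result |= frontier
    return result


A = {"grand", ""}
B = {"father", "mother"}
family = concat(star_up_to(A, 10), B)
```

Here `family` contains father, mother, grandfather, grandmother, and the two "grandgrand" forms; raising the bound adds longer chains of "grand".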
• What is ∅^0?
• What is ∅*?
• What is ∅^+?
• Where ∅ is the empty language.
A* = A^+ if and only if ε is in A
• Recall that A* = {ε} U A^+.
• If ε is in A, then ε is in A^+. Hence A* = A^+.
• If ε is not in A, then ε is not in A^+, while ε is in A*. Hence A* ≠ A^+.
{0, 10}* is the language of strings
not containing substring 11 and not
ending with 1.
• What is the language of strings not
containing substring 11 and ending with
0?
• {0, 10}^+
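Both characterizations can be tested by brute force on short strings. The following sketch assumes a length bound chosen by the caller and uses hypothetical helper names; agreement up to a bound is evidence, not a proof.

```python
from itertools import product


def star_up_to(A: set, max_len: int) -> set:
    """All strings of A* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        longer = {u + a for u in frontier for a in A if len(u + a) <= max_len}
        frontier = longer - result
        result |= frontier
    return result


def binary_strings(max_len: int):
    """Yield every binary string of length at most max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)


def check_star(max_len: int) -> bool:
    """Compare {0, 10}* with 'no substring 11 and not ending with 1'."""
    described = {w for w in binary_strings(max_len)
                 if "11" not in w and not w.endswith("1")}
    return star_up_to({"0", "10"}, max_len) == described


def check_plus(max_len: int) -> bool:
    """Compare {0, 10}^+ with 'no substring 11 and ending with 0'."""
    A = {"0", "10"}
    plus = {a + w for a in A for w in star_up_to(A, max_len)
            if len(a + w) <= max_len}
    described = {w for w in binary_strings(max_len)
                 if "11" not in w and w.endswith("0")}
    return plus == described
```

The only difference between the two languages is the empty string, which does not end with 1 but also does not end with 0.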
Puzzle
• How many strings of length at most 40 are in
the following language ?
{ ,0,0 ,0 }
2
5 5
Lecture 2 Regular Language and Regular Expression
Regular Languages
• The concept of regular languages on an
alphabet Σ is defined recursively as follows:
(1) The empty language ∅ is regular.
(2) For every symbol a ∈ Σ, {a} is regular.
(3) If A and B are regular languages, then
A U B, AB, and A* are regular.
(4) Nothing else is a regular language.
{ε} is regular.
• Because the empty language ∅ is regular,
∅* = {ε} is regular.
For Σ={0,1}, {011} is regular.
• Since {0} and {1} are regular,
{011}={0}{1}{1} is regular
• Remark: Every language containing only
one string is regular.
{011,100} is regular.
• Because {011} and {100} are regular,
{011, 100} = {011}U{100} is regular.
• Remark: Every finite language is regular.
• Remark: Every infinite regular language
must be obtained with Kleene closure.
Operation Precedence
• ({0}*U{0}{1}{1}*){0}{0}{1}*
• (1) Kleene closure has higher precedence than union and concatenation.
• (2) Concatenation has higher precedence than union.
The language of all binary strings starting
with 01 is regular.
Proof. Every string in this language has the form
01x1···xn
where x1···xn ∈ {0,1}*. Therefore, the
language can be written as
{01} {0,1}* = ({0}{1})({0} U {1})*,
which is regular.
The language of all binary strings ending with
01 is regular.
Proof. Every string in this language has the form
x1···xn01
where x1···xn ∈ {0,1}*. Therefore, the
language can be written as
{0,1}*{01} = ({0} U {1})*({0}{1}),
which is regular.
The language of all binary strings having
substring 01 is regular.
Proof. Every string in this language has the form
x1···xn01y1···ym
where x1···xn, y1···ym ∈ {0,1}*. Therefore,
the language can be written as
{0,1}* {01} {0,1}* =({0}U{1})*({0}{1})({0}U{1})*,
which is regular.
Question:
Do you feel that the expression of the regular
set in the above example contains too many
parentheses?
• Here is a simpler notation: the regular expression.
Regular Expression
• (1) ∅ is a regular expression of the empty language.
• (2) ε is a regular expression of {ε}.
• (3) For any symbol a, a is a regular
expression of {a}.
• (4) If rA and rB are regular expressions of languages A
and B, then rA+rB is a regular expression of A U B, rArB is
a regular expression of AB, and rA* is a regular
expression of A*.
Examples
• 011 is a regular expression of {0}{1}{1}.
• 0+1 is a regular expression of {0,1}.
• (0+1)* is a regular expression of {0,1}*.
• Remark: (0+1)^+ is also considered to be a regular expression of {0, 1}^+.
• The language of all binary strings starting
with 01 has a regular expression
01(0+1)*.
• The language of all binary strings ending with
01 has a regular expression
(0+1)*01.
• The language of all binary strings having
substring 01 has a regular expression
(0+1)*01(0+1)*.
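These three expressions can be tried out with Python's standard `re` module. Note the notational difference: the lecture writes union as `+`, whereas Python writes it as `|` (in Python, `+` means "one or more"). The sketch below uses `fullmatch` so that the whole string must belong to the language; the constant and function names are illustrative.

```python
import re

# Lecture notation      Python notation
# 01(0+1)*           -> 01(0|1)*
# (0+1)*01           -> (0|1)*01
# (0+1)*01(0+1)*     -> (0|1)*01(0|1)*
STARTS_WITH_01 = re.compile(r"01(0|1)*")
ENDS_WITH_01 = re.compile(r"(0|1)*01")
CONTAINS_01 = re.compile(r"(0|1)*01(0|1)*")


def in_language(pattern: re.Pattern, w: str) -> bool:
    """True if the entire string w matches the regular expression."""
    return pattern.fullmatch(w) is not None
```

For instance, `in_language(CONTAINS_01, "1100")` is false because no 0 in 1100 is immediately followed by a 1.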
Induction Proof
• Because regular languages are defined recursively, we can prove that every regular language has a given property by proving the following:
(1) ∅ has the property.
(2) For any symbol a ∈ Σ, {a} has the property.
(3) If A and B have the property, then A U B, AB, and A* all have the property.
• Actually, this is an induction proof: (1) and (2) serve as the basis step and (3) is the induction step.
• For a string x = x1x2…xn, x^R = xn…x2x1.
• For a language A, A^R = {x^R | x ∈ A}.
• Show that if A is regular, so is A^R.
Proof. (1) ∅^R = ∅ is regular.
(2) For any symbol a, {a}^R = {a} is regular.
(3) Suppose that for regular languages A and B, A^R and B^R are regular. Then
(A U B)^R = A^R U B^R is regular,
(AB)^R = B^R A^R is regular,
(A*)^R = (A^R)* is regular.
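The induction proof translates directly into a recursive procedure on the structure of a regular expression. The sketch below is one possible encoding, not the lecture's: expressions are nested tuples tagged `"empty"`, `"sym"`, `"union"`, `"concat"`, and `"star"`, and `language` enumerates only strings up to a length bound so that the identity can be spot-checked.

```python
def reverse(r):
    """Build an expression for the reversed language, case by case."""
    kind = r[0]
    if kind in ("empty", "sym"):      # basis: reversal changes nothing
        return r
    if kind == "union":               # (A U B)^R = A^R U B^R
        return ("union", reverse(r[1]), reverse(r[2]))
    if kind == "concat":              # (AB)^R = B^R A^R (order swaps)
        return ("concat", reverse(r[2]), reverse(r[1]))
    if kind == "star":                # (A*)^R = (A^R)*
        return ("star", reverse(r[1]))
    raise ValueError(f"unknown expression kind: {kind}")


def language(r, max_len):
    """Strings of length <= max_len in the language denoted by r."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "sym":
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "union":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            longer = {u + v for u in frontier for v in base
                      if len(u + v) <= max_len}
            frontier = longer - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


zero, one = ("sym", "0"), ("sym", "1")
# 01(0+1)*: binary strings starting with 01
starts_with_01 = ("concat", ("concat", zero, one), ("star", ("union", zero, one)))
```

Reversing `starts_with_01` yields an expression for (0+1)*10, the strings ending with 10, which is exactly the set of reversals of strings starting with 01.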
Find a regular expression for
{x w x^R | x ∈ (0+1)*, w ∈ (0+1)*}
• {x w x^R | x ∈ (0+1)*, w ∈ (0+1)*} = (0+1)*, since taking x = ε yields every string w.
Find a regular expression for
{x w x^R | x ∈ (0+1)^+, w ∈ (0+1)*}
• {x w x^R | x ∈ (0+1)^+, w ∈ (0+1)*} = 0(0+1)*0 + 1(0+1)*1
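The answer for nonempty x can be spot-checked by comparing the set definition against the regular expression on all short binary strings. This is a sketch under stated assumptions: the helper names are hypothetical, the lecture's union `+` is written `|` in Python's `re` syntax, and agreement up to a length bound is supporting evidence rather than a proof.

```python
import re
from itertools import product

# 0(0+1)*0 + 1(0+1)*1 in Python notation
PATTERN = re.compile(r"0(0|1)*0|1(0|1)*1")


def in_set(s: str) -> bool:
    """True if s = x w x^R for some nonempty x and arbitrary w."""
    # x is the first k symbols; its reversal must be the last k symbols.
    return any(s.endswith(s[:k][::-1]) for k in range(1, len(s) // 2 + 1))


def agree_up_to(max_len: int) -> bool:
    """Check that the set and the expression agree on all short strings."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            s = "".join(t)
            if in_set(s) != (PATTERN.fullmatch(s) is not None):
                return False
    return True
```

The intuition is that only the first symbol of x matters: any string of length at least 2 whose first and last symbols agree can be split with |x| = 1.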
Puzzle
• How many regular expressions can a
language have?