Off-line Arabic handwritten words recognition based on hierarchical

Off-line Arabic handwritten words recognition based
on hierarchical Bayesian networks
Khlifia Jayech 1, 1, Mohamed Ali Mahjoub 1, Najoua Essoukri Ben A mara1
1
Research Unit SAGE, Team Signals and Image Documents, National Engineering School
of Sousse
[email protected]
[email protected]
[email protected]
Abstract. This paper present new probabilistic graphical model used to model
and recognize Arabic Handwriting words representing names of Tunisian cities.
In fact, this work is based on Hierarchical Bayesian networks. The aim is to
find the best model of Arabic Handwriting to reduce the complexity of
recognition process by permitting the partial recognition. Indeed, we propose a
hierarchical division of the word and then we extract the characteristics of each
block using Zernike and Hu moments, which are invariant to rotation,
translation and scaling. Next, we model each block by a latent model and the
word by latent hierarchical Bayesian network. In such models, the task of
determining the cardinality of the latent variables is difficult, that we solve
using K-means. Our approach is tested using IFN / ENIT database, experiments
results are very promising.
Keywords: Bayesian networks, handwriting recognition, hierarchical latent
class model.
1
Introduction
Segmentation and recognition of Arabic handwritten words are t wo important
areas of research. The recognition of Arabic handwritten words has many applications
in bank check reading, mail sorting, forms processing in admin istration and insurance.
Off-line Arabic Handwriting Recognition is a difficult task due to the high variab ility
and uncertainty of human writ ing (infinite variat ions of shapes resulting from the
writing habit, scanning methods, fusion of d iacritical points, overlappin g …). Many
works proposed different approaches such as statistical, structural, neural , and
Morkov, but all these approaches have many limits.
Thus we propose, in this paper, an original method of recognition based on
hierarchical Bayesian networks. The originality of our approach also relies on the use
of k-means to estimate the cardinality of hidden nodes.
In the follo wing section, we expose a state of art of the related work and then the
theory of Bayesian networks and hierarchical Bayesian networks, respectively. The
developed approach is addressed in section 4. Experiments are the subject of section
5. Conclusion and some perspectives are addressed in the last section.
2 Related works
In recent years, many researchers have addressed the recognition o f Arabic text.
In [1], Benouareth et al. proposed a MMC model in wh ich the state duration is
modeled exp licit ly. They begin by division the handwriting word into vertical band
(division may be uniform or not). Then, for each band, they extract a vector (33
features) to be used for training the model. They have shown experimentally that the
use of these models DHMM (MMC duration of explicit statements) is more efficient
than the use of standard semi-continuous HMMs. They also showed that the division
into non uniform bands gives better results than division in uniform bands.
Experiments on this model were performed based on IFN / ENIT. For the best system
(Gamma to model duration of states and non-uniform division into bands), the authors
obtained a recognition rate of 89.79%.
In [2], S. Masmoudi Touj proposed a recognition system based on pseudo 2D or
MMC planar. This approach is based on a decomposition of writ ing characteristics
into five zones which are the upper and lo wer diacrit ics, extensions top and bottom
and the middle zone. This decomposition allo ws obtaining homogeneous groups of
entities with morphological characteristics specific to each area which can be
characterized by appropriate techniques. Each horizontal zone is associated with a
super model status MMCP and each super state was modeled by an HMM that adapts
well to semi cursive nature of Arabic script. The correlat ion between different
horizontal zones is provided by the vertical model o f MMCP.
In [16], the author proposed an alphabet of letters body that can take into account
the redundancy of forms of letters of the Arab ic alphabet. He has shown that although
the points are essential to differentiate the Arabic letters, it is possible to propose a
recognition system on an application to average vocabulary size (937 Tunisian town
names) without it being necessary to take account the information carried by
diacrit ics. The recognition rate of this system evaluated on the basis of IFN / ENIT is
89.98%. He then proposed a mechanis m for dealing with d iacritical marks, and a
strategy to combine this information with the recognition results without diacritics.
This strategy helped to increase the recognition rate to 92%.
In [15], M. Pechwitz and V. Maergner use a sliding window of width three pi xels
on a normalized image. On each window, they form a single vector co mprising values
grayscale fro m the three bands. They use Hidden Markov Models to model semicontinuous letters. The model word is obtained by concatenation of model letters.
This system was tested on IFN / ENIT and gave a recognition rate of 89.5%.
3 Bayesian Networks
Bayesian networks (BNs), also known as belief networks, belong to the family of
probabilistic graphical models (GMs). A Bayesian network is a directed acyclic graph
(DA G) co mposed of nodes and directed links. The nodes correspond to random
variables and directed lin ks between nodes represent probabilistic dependencies. The
conditioning variables and the dependent variables are called parent nodes and child
nodes, respectively. For a link between two variab les, X Y, the overall joint
distribution is specified by the product of the prior probability P(X) and the
conditional probability P (Y|X).
In general, given a set of N variables H1:N =H1 ,…,HN , the joint probability
distribution P(H1:N )=P(H1 ,…,HN ) can be factored into a spare set of conditional
probabilit ies as follows according to the conditional independency:
Where
is the set of parent nodes of node
in the DA G.
3.1 Hierarchical B ayesian Networks
Hierarchical Bayesian networks are a generalization of standard Bayesian
networks, where a node in the network may be an aggregate data type. This allows the
random variables of the network to represent arbitrarily structured types. Within a
single node, there may also be links between co mponents, representing probabilistic
dependencies among parts of the structure. Hierarchical Bayesian networks encode
conditional probability dependencies the same way as standard Bayesian networks.
Hierarchical Bayesian networks are used in various domains such as event recognition
of human actions and interactions [17], genetic association studies [18], Discovering
Handwriting Strategies of Primary School Children [19].
3.2 Inference i n Hierarchical B ayesian Networks
Inference algorith ms used for standard Bayesian networks can also be applied to the
hierarchical Bayesian network model. Given a part ially observed set e of variables
referred to as “evidence”, the set of beliefs is established and updated and established
to reflect both prior and observed information depending on the evidence:
B (h) = P (h|e)
Where B(h) is the belief in the value h of variab le H given the evidence E.
The most probable explanation h* of the hidden variable H give the evidence e is
determined by:
The belief update is achieved by a message-passing process distributed along the
network through the explo itation of the local dependencies and global independencies
of various variables.
4
The developed approach
Database
Feature
extraction
Pretreatment
Discretization
Determination
cardinality
hidden nodes
Classification
Inference
C1
…
Recognition
Structure learning
Parameter learning
Ci
Class
Cn
Fig. 1. Developed system architecture
In the next section, we detail the different steps of the developed system:
4.1 Pretreatment
of
of
After the pre-processing: normalization and thinning (fig. 2.), we d ivide the word in a
set of blocks using a threshold to avoid having empty blocks .
(a)
(b)
(c)
Fig. 2. (a) Original Word (b) word normalization and thinning (c) divided word.
4.2 Feature extraction
After the pretreatment of the word, we use the descriptors such as mo ment invariants
of Zern ike and Hu descriptors, which are invariant to rotation, t ranslation and scaling,
to extract the characteristics of each block.
Zernike descriptor:
They are constructed using a set of complex polynomials which form a comp lete
orthogonal set on the unit disk with
:
Where m and n define the order of the mo ment and I (x, y) the gray level of a pixel in
the image. The Zern ike polynomials
are exp ressed in polar coordinates:
Where
is the radial o rthogonal polynomial:
Hu descri ptor:
Hu made the first step in the use of mo ment invariants for pattern recognition by
offering the seven Hu mo ments which are calculated fro m the normalized mo ments.
Hu mo ments are invariant to translation, rotation and scaling.
We introduce the normalized mo ments as follows:
The seven mo ments of Hu are:
For examp le the mo ment invariants of the word “sidi dhaher” are:
block 1
0,0003 0,0032 0,6098 ..
..
..
..
0,1760
0,0453
block 2
0,0011 0,0002 0,6861 ..
..
..
..
0,1732
0,0399
block 3
0,0053 0,0004 0,6828 ..
..
..
..
0,1767
0,0555
block 4
0,0001 0,0001 0,7132 ..
..
..
..
0,1718
0,0356
4.3 Discretization
The descriptors of mo ment invariants, described in the previous section, give us
continuous signatures. However, Bayesian networks require discrete variables. So, we
process to the next stage of pre-treat ment, which consist to discretize the obtained
variables. We opt for the method proposed in [8], because it allows a significant
reduction of the data without loss of informat ion. Th is method consists to represent
the laws o f probability in the fo rm of histograms. These histograms must optimally
approximate, in the sense of maximu m likelihood and the mean square error, the
unknown probability laws of a random process with a single n -sample. To obtain the
bin number of histogram and distribution for each bin, the Akaike information
criterion (AIC) [9] has been generalized to our p roblem. Specifically, we first
construct a histogram with m bins fro m the samp le, to reduce the size of the
histogram, we have merged the two ad jacent bins, that maximize the d ifference
between AIC before and after merg ing. This process is repeated until the d ifferen ce
would be negative.
4.4 Determinati on of cardi nality of hi dden variable
Although this task is difficult, some methods have been proposed for estimating
the cardinality of the latent variables such as cross-validation method used in [11],
which is to maximize a criterion score on the structure obtained for examp le the BIC
criterion, the algorith m known as Structural EM [12] which was adapted for the latent
hierarchical models to optimize the size of the latent variab les in the learning phase.
In our work, we used the clustering method (k-means) for a first attempt to
approximate the size of our h idden nodes.
4.5 Learning
The construction of a Bayesian network requires the determination of its structure
and its parameters.
 Structure Learning
Structure learn ing is a difficult task, we consider the recognition process as a
classification problem and we propose to use latent hierarchical models ( fig.3). These
models generalize the latent structure models (fig. 4).
Word Model: hierarchical Bayesian Networks
0
Ci=1…8
Fig. 3. Hierarchical Bayesian Networks
We model our problem with a hierarchical model where the local structures of each
block are latent discrete variables that are independent of each other, but all depend
on a global structure that is latent.
Block Model: latent model
Considering now that each block is represented by its own local model, the task is
to determine the local structure between observed variables and latent variables.
Several solutions are available to us. One solution is to use conventional algorith ms
for structure learning in Bayesian networks. Another solution is to use structures
"classic" for classification as naive Bayesian network. A third approach is to
determine the structure of knowledge fro m experts.
To simp lify and facilitate the recognition process , the first approach is to ignore. We
chose to use a Bayesian latent network structure often used in pattern recognition
Fig. 4. The latent model, the latent variable is colored in gray

Para meter learning
Once the structure and the cardinality of the latent variables fixed, the learning of
network parameters (i.e the conditional probabilit ies) fro m inco mplete data and / or
latent variables is achieved by the EM algorithm. The EM algorith m introduced by
Dempster is an iterat ive approach of maximu m likelihood estimators. Each iteration
of the EM algorithm is composed of two main steps: a step of estimat ing (E) and a
maximization step (M). The purpose of this algorith m is to maximize the log
likelihood function l(λ,Y) = log (L(λ, Y)) where λ is the set of model parameters and Y
is the set of observations on the problem.
5
Experime nts and discussion
The tests have been done on a corpus of Tunisian city names that is extracted fro m
the IFN/ENIT data base. We have selected a subset of 8 models and generated a
database of 1600 words, composed of 200 different images per model. Each word of
city name corresponds to a model. On this database we have defined various training
and test sets of different sizes. We have used the samples of the sub basis “a” and
”b” for the training and those of the sub basis “c” for the validation and those of the
sub basis “d” for the test.
100
50
0
1 3 5 7 9 11 13 15
Cardinality of latent variable (×10)
Recognition Rate (%)
Recognition Rate (%)
After ext racted and discretized the continuous variables. The first experiments are
to find, using the validation dataset, the cardinality of hidden nodes for the optimal
model fo r each class. The criteria used to determine the cardinality of hidden nodes is
the recognition rates and the cost of inference in terms of t imes. The fo llo wing figure
shows the different results for some classes:
100
50
0
1
50
0
1 3 5 7 9 11 13 15
Cardinality of latent variable (×10)
C7
5
7
9 11 13 15
C4
Recognition Rate (%)
Recognition Rate (%)
C2
100
3
Cardinality of latent variable (×10)
100
50
0
1 3 5 7 9 11 13 15
Cardinality of latent variable (×10)
C8
Fig. 5. Recognition rates according to the cardinality of latent variab les for
class 2, 4, 7 and 8.
For class C2, the value o f the card inality of h idden nodes varies fro m 10 to 150. The
results in fig. 5 show that the recognition rates increase with the cardinality of hidden
nodes. This rate reaches a maximu m at k=110. There isn’t improvement fo r values of
k>110. The optimal value of k is set at 110.
For class C7, the recognition rates reaches a maximu m value for k= 40 and 100. The
cost of recognition rates, when k=100, is high in terms of memo ry space and time
inference. So, we choose k=40 for this class, this value p rovides the best compro mise
between computational comp lexity and the recognition rates.
All results concerning the optimal cardinality of hidden nodes are illustrated in the
following table:
Class
Cardinality
of
latent
variable
Table 1. Optimum cardinality of the latent variable per class
C1
C2
C3
C4
C5
C6
C7
80
110
50
90
90
150
40
C8
90
For the following results, learn ing and inference in each model are made taking into
account the corresponding optimal cardinality of the latent variable determined in
previous experiments.
Table 2 shows the recognition rates obtained with the training set:
Class
Recognition
rates (%)
Table 2. Recognition rates obtained by training set
C1
C2
C3
C4
C5
C6
C7
C8
97
99
95
98
94
98
99
98
Table 3 shows the recognition rates obtained with the test set:
Class
Recognition
rates (%)
C1
Table 3. Recognition rates obtained by test set
C2
C3
C4
C5
C6
88
88
86
90
90
92
C7
C8
90
86
Comparison with other systems
The IFN/ENIT database is available to the scientific co mmunity and this makes
system co mparison possible. As mentioned in the abstract, we co mpared the
developed system to three other systems tested also on IFN/ ENIT database; the ones
of El-hajj [21], Burrow [20], and Masmoudi [2], it may be noted fro m table 4 that the
highest accuracy was obtained by El-hajj fo llo wing our system, this is due to the use
of a segmentation phase, on the other side our system achieves a good accuracy
compared with Burro w’s and Masmoudi’s systems which prove the performance of
Hierarchical Bayesian network as classifier.
Table 4. Co mparison results
Systems
Classifiers
Co mbin ing rule
Accuracy
El-Hajj’s system [21]
Burro w’s system [20]
3 HMMs
Several K-NN
Neural Net work
Majority vote
94.44 %
74%
Masmoudi’s system
[2]
Our system
6
PHMM
Hierarchical
Bayesian
network
Max ru le
70%
Max ru le
90%
Conclusion and pers pectives
In this paper, we proposed a new approach for the offline Arabic handwrit ing
words recognition based on hierarchical Bayesian networks. Moreover, the Zernike
and Hu mo ments, invariant to rotation, translation and scaling, allowed us to
effectively address the major problems of mo rphological Arab ic handwriting. We
proposed the k-means method to solve the cardinality problem of the h idden nodes.
The experimental results are pro mising and show that our method enables to improve
the recognition rates when we find the optimal card inality of hidden nodes.
References
1. Benouareth, A., Ennaji, A., Sellami, M .: Arabic handwritten word recognition using HMMs
with explicit state duration. EURASIP Journal on Advances in Signal Processing, 2008.
2. M asmoudi Touj, S., Ben Amara, N., Amiri, H.: Arabic handwritten words recognition based
on a planar hidden markov model. the International Arab Journal of Information
Technology, vol.2, no.4, 2005.
3. M asmoudi Touj S., Ben Amara N., Amiri H. : M odélisation markovienne planaire pour la
reconnaissance de l’écriture arabe. Colloque International Francophone sur l’Ecrit et le
Document, CIFED, 2004.
4. Cho S.J., Kim J.H.: Bayesian Network M odeling of Hangul Characters for On-line
Handwriting Recognition. Proc. of ICDAR, 2003.
5. Tlemsani, R.,Benyettou, A. : Application des réseaux bayésiens dynamiques à la
reconnaissance en-ligne des caractères isolés. 4th International Conference: Sciences of
Electronic, Technologies of Information and Telecommunications, 2007, tunisia.
6. Hallouli K. : Reconnaissance de caractères par méthodes markoviennes et réseaux
bayésiens. PhD thèses. Ecole Nationale supérieure des Télécommunication, mai ,2004.
7. Hallouli, K., Likforman, L., Sigelle, M .: A comparative study between decision fusion and
fusion data in M arkovian printed character recognition. Proceedings of 16th International
Conference on Pattern Recognition. pp.147-150, 2002.
8. Colot, O., Olivier, C., Courtellemont, P., El-M atouat, A., Brucq, D.: Information criteria
and abrupt changes in probability laws. Signal Processing VII : Theories and Applications,
vol. 1, pp.387–391, 2004.
9. Barrat, S. : M odèles graphiques probabilistes pour la reconnaissance de formes. thése de
doctorat, 2009.
10. Leray, P. : Réseaux bayésiens : apprentissage et modélisation de systèmes complexes.
Rapport de recherche, 2006.
11. Zhang, N. L., Nielsen, T.D., Jensen, F.V.: Latent variable discovery in classification
models. to appear in Artificial Intelligence in M edicine, 2003.
12. Zhang, N. L.: Structural EM for hierarchical latent class model. Technical report. HKUSTCS03-06, 2003.
13. Peña, J.M ., Lozano, J.A., Larrañaga, P.: An improved Bayesian structural EM algorithm for
learning Bayesian networks for clustering. Pattern Recognition Letters. pp.779-786, 2001.
14. Tibshirani, R.: Regression Shrinkage and Selection via the Lasso. Journal of the Royal
Statistical Society. Series B (M ethodological), vol. 58, no. 1, pp: 267–288, 1996.
15. Pechwitz, M ., and M argner, V.: HMM Based Approach for Handwritten Arabic Word
Recognition Using the IFN/ENIT – Database. In Proc. of the 7th Inter. Conf. on Document
Analysis and Recognition (ICDAR), Vol. 2, pp 890– 894, 2003.
16. M enasri, F. : Contributions à la reconnaissance de l’écriture arabe manuscrite, Ph. D. thesis,
l’Université Paris Descartes, France 2008
17. Sangho, P., Aggarwal, J.K.: A hierarchical Bayesian network for event recognition of
human actions and interactions, Springer-Verlag 2004.
18. M ourad, R., Sinoquet, C., Leray, P. : Apprentissage de réseaux bayésiens hiérarchiques
latents pour les études d’association pangénomiques. Proc. JFRB, 2010.
19.Leray, Ph., Zaarour, I., Heutte, L., El Eter, B., Labiche, J., M ellier, D.: A Bayesian Network
M odel for Discovering Handwriting Strategies of Primary School Children. Proceedings of
14ème Congrès Francophone Reconnaissance des Formes et Intelligence Artificielle, 2004.
20. Burrow, P.: Arabic Handwriting Recognition. Report of M aster of Science School of
Informatics, University of Edinburgh, 2004.
21. El-Hajj, R., M okbel, C., and Likforman-Sulem, L.: Combination of HMM -Based classifiers
for the recognition of Arabic handwritten words. Document Analysis and Recognition,
2007.