2014 14th International Conference on Frontiers in Handwriting Recognition
An MQDF-CNN Hybrid Model for Offline Handwritten Chinese Character
Recognition
Yanwei Wang, Xin Li, Changsong Liu, Xiaoqing Ding
State Key Laboratory of Intelligent Technology and Systems,
Tsinghua National Laboratory for Information Science
and Technology, Department of Electronic Engineering,
Tsinghua University
Beijing, China
{wangyw, lixin08,lcs,dxq}@ocrserv.ee.tsinghua.edu.cn
Youxin Chen
Beijing Samsung Telecom R&D Center
Beijing, China
[email protected]
Abstract—An MQDF-CNN hybrid model is presented for
offline handwritten Chinese character recognition. The main
idea behind the MQDF-CNN hybrid model is that MQDF and CNN
differ significantly in their features and classification
mechanisms, so they can complement each other. Linear
confidence accumulation and multiplication confidence criteria
are used to fuse the outputs of MQDF and CNN. Experiments
have been conducted on the CASIA-HWDB1.1 and ICDAR2013
offline handwritten Chinese character recognition competition
datasets. On both datasets, CNN beats MQDF by more than 1%
in accuracy, and the MQDF-CNN hybrid model achieves
test accuracies of 92.03% and 94.44%,
respectively. The result on the competition dataset is comparable
to the state-of-the-art result, although fewer training samples and
only one CNN are used.
Keywords-CNN; MQDF; MQDF-CNN hybrid model;
handwritten Chinese character recognition
I. INTRODUCTION
Offline handwritten Chinese character recognition is a
large-scale pattern recognition problem that remains
unsolved. Cursive characters are written without
constraints and vary drastically in writing style and
shape distortion, which makes recognition exceptionally
difficult.
The discriminative information contained in the extracted
features determines the upper limit of a recognition system,
and the classifier design determines how closely that limit is
approached. Therefore, feature extraction and classifier
design are the most important parts of a character recognition
system. For Chinese character recognition, the modified
quadratic discriminant function (MQDF) [10] implemented
with the gradient feature [19] has been commonly used because
it offers relatively high performance at low computational complexity.
The gradient feature is well designed for discriminating most
character classes. However, it cannot adaptively extract
discriminative features for each class because it is not a
learning-based feature. Great effort has been devoted to classifier
design, and the approaches can be divided into two categories. One is the
generative model optimized by integrating discriminative
information. MQDF can be improved by adjusting its
parameters directly under objective functions [1][4][5] or
indirectly by sample reweighting [2][3]. MQDF assumes that
the features follow a Gaussian distribution; however, the features
of cursive characters do not conform to this assumption to some
extent. This mismatch between the real data distribution and the
model assumption means that MQDF cannot solve the
problem thoroughly. The other category is the
discriminative model, such as the support vector machine (SVM)
[6] and deep learning using SVM [9]. It models the
classification boundary directly by minimizing the empirical risk
or structural risk without assuming a data distribution.
2167-6445/14 $31.00 © 2014 IEEE
DOI 10.1109/ICFHR.2014.49
The leading position of MQDF changed when
deep convolutional neural networks (CNN) made a
breakthrough [7][8]. Hierarchical features are discriminatively
learned by the CNN from the classifier's perspective and contain
more discriminative information than the conventional
gradient feature. However, the computational complexity of
CNN is extremely high for large-scale classification, and
training a robust CNN needs a large number of samples.
Since the MQDF-based method and CNN employ different
features and different classification mechanisms, it is
reasonable to suppose that they will complement each other.
Based on this idea, an MQDF-CNN hybrid model is
proposed for offline Chinese character recognition. MQDF is
implemented with the gradient feature and CNN makes use of
hierarchical features. To combine MQDF and CNN, two
fusion criteria are evaluated. The experiments demonstrate
that the results are very promising.
II. SYSTEM OVERVIEW
The diagram of the MQDF-CNN hybrid model for the offline
handwritten Chinese character recognition system is given in
Fig. 1. The system consists of three main parts:
feature extraction, classifier training and recognition.
MQDF and CNN are trained separately and combined
depending on the output of MQDF in the recognition stage.
Figure 1. Diagram of MQDF-CNN hybrid model for offline handwritten
Chinese character recognition system.
A. MQDF
MQDF is derived from the quadratic discriminant
function (QDF) classifier, which is a Bayesian classifier
that assumes the samples of each class follow a Gaussian
distribution. QDF can be expressed as

$$h_i(x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log|\Sigma_i| \tag{1}$$

where $x$ is the feature vector. The mean vector $\mu_i$ and
covariance matrix $\Sigma_i$ need to be estimated. Applying
eigenvalue decomposition to $\Sigma_i$ and replacing the small
eigenvalues by a constant $\sigma^2$, the MQDF is derived as

$$h_i(x) = \sum_{j=1}^{k} \frac{1}{\lambda_{ij}} [\varphi_{ij}^T (x - \mu_i)]^2 + \sum_{j=k+1}^{n} \frac{1}{\sigma^2} [\varphi_{ij}^T (x - \mu_i)]^2 + \ln \prod_{j=1}^{k} \lambda_{ij} + (n - k) \ln \sigma^2 \tag{2}$$

where $k$ is the truncation dimensionality, meaning that
the $k$ largest eigenvalues are kept without regularization, $n$ is
the feature dimensionality, $\lambda_{ij}$ is the jth eigenvalue of the
covariance matrix of the ith class and $\varphi_{ij}$ is the corresponding
eigenvector.
B. CNN
As reported in [11], there are many techniques to improve
CNN performance, such as dropout, rectifying nonlinearity and
unsupervised pre-training. Since this paper
is more concerned with the efficiency of the MQDF-CNN hybrid
model, the CNN used is a plain one without any special
improvement. The CNN structure is shown in Fig. 2 and
follows the structure of LeNet-5 [7].
The CNN is structured as 10x48x48-100C3-MP2-200C2-MP2-300C2-MP2-400C2-MP2-1000N-3755N.
This means that the input layer has 10 input images of size 48x48 and
is followed by a convolutional layer with 100 filters of size
3x3 and a max-pooling layer over 2x2 regions. The
following convolutional and max-pooling layer
parameters have similar meanings. After the last max-pooling
layer, the features are mapped by a hidden layer with
1000 hidden units. Finally, the fully connected output layer
has 3755 neurons, each corresponding to one
character class.
The 10 input images include 8 gradient images and two
images normalized by bilinear interpolation and density
equalization [15], respectively. The parameters of each layer
are initialized uniformly from the symmetric interval

$$\left[ -\sqrt{6 / (n_{in} + n_{out})},\ \sqrt{6 / (n_{in} + n_{out})} \right].$$

The soft-max activation function is
used for the output layer. Training ends once the program
has run a specified number of iterations or there is no significant
reduction in the validation error.
C. Combining MQDF and CNN
As the MQDF-based method can recognize most
character classes well, only samples that are likely to be
recognized incorrectly need to be recognized again by CNN.
First, the output of MQDF is evaluated by the general
recognition confidence (RC) [12], which is an effective
measure of recognition reliability. MQDF can output
the top Q candidate recognition distances. From the top two
recognition distances $d_1$, $d_2$ ($d_1 < d_2$), RC is defined as

$$RC = 1 - d_1 / d_2. \tag{3}$$

When $d_1$ and $d_2$ are comparable (RC approaches zero),
misclassification is most likely to occur.
Therefore, if a sample is recognized with RC larger than a
constant threshold Th, the output of MQDF is taken
directly as the final result. Otherwise, the sample is
fed to the CNN and a fusion result is obtained.
The fusion is performed at the level of recognition scores.
Before the combination schemes are applied, the
outputs of MQDF and CNN are normalized to comparable
measurement values. Let $w_i$ denote the ith candidate class
label, and let $x_g$ and $x_h$ denote the gradient feature and hierarchical
feature of the unknown pattern for MQDF and CNN,
respectively. The conditional probability $p(x_g|w_i)$ and
recognition distance $h_i(x_g)$ of the MQDF satisfy

$$p(x_g | w_i) \propto e^{-h_i(x_g)/2}. \tag{4}$$
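As a rough illustration of the MQDF distance in (2), whose value enters the likelihood relation (4), the following sketch evaluates $h_i(x)$ for one class. It is not the authors' implementation. The function name and argument layout are hypothetical, and the sketch assumes orthonormal eigenvectors so that the minor-subspace sum can be obtained from the residual of the squared Euclidean distance instead of storing all n eigenvectors.

```python
import math

def mqdf_distance(x, mean, top_eigvals, top_eigvecs, sigma2):
    """Sketch of the MQDF distance in (2) for a single class.

    top_eigvals / top_eigvecs hold the k largest eigenvalues of the class
    covariance matrix and their eigenvectors (assumed orthonormal);
    sigma2 is the constant that replaces the remaining n - k eigenvalues.
    """
    n, k = len(x), len(top_eigvals)
    diff = [xi - mi for xi, mi in zip(x, mean)]

    # First sum of (2): projections onto the k principal eigenvectors.
    dist = 0.0
    principal_energy = 0.0
    for lam, phi in zip(top_eigvals, top_eigvecs):
        proj = sum(p * d for p, d in zip(phi, diff))
        dist += proj * proj / lam
        principal_energy += proj * proj

    # Second sum of (2): with orthonormal eigenvectors, the squared projections
    # onto the n - k minor eigenvectors equal ||x - mu||^2 minus the principal part.
    residual = sum(d * d for d in diff) - principal_energy
    dist += residual / sigma2

    # Log-determinant terms of (2).
    dist += sum(math.log(lam) for lam in top_eigvals)
    dist += (n - k) * math.log(sigma2)
    return dist
```

For a toy two-dimensional class with zero mean, one principal eigenvalue 4 along the first axis and $\sigma^2 = 0.5$, the point (2, 1) gives 1 + 2 + ln 4 + ln 0.5 = 3 + ln 2.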
Based on (4), the posterior probability $p(w_i|x_g)$ of MQDF can
be computed over the top Q candidate classes [16]. CNN
directly outputs the class label and a probability. The
posterior probability $p(w_i|x_h)$ of each CNN output neuron is
recomputed by normalizing over the top Q candidate classes.
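The normalization step can be sketched as follows. This is a minimal sketch that assumes equal class priors over the Q candidates, which the text does not state; the confidence evaluation of [16] may differ in detail. Both function names are hypothetical.

```python
import math

def mqdf_posteriors(distances):
    """Posteriors over the top-Q candidates from MQDF distances.

    Uses p(x_g | w_i) proportional to exp(-h_i / 2), as in (4), and ASSUMES
    equal priors, so the posterior is the normalized likelihood.
    """
    h_min = min(distances)
    # Subtracting the smallest distance avoids underflow; it cancels on normalization.
    weights = [math.exp(-(h - h_min) / 2.0) for h in distances]
    total = sum(weights)
    return [w / total for w in weights]

def renormalize_top_q(cnn_probs):
    """Rescale the CNN soft-max outputs of the Q candidates so they sum to one."""
    total = sum(cnn_probs)
    return [p / total for p in cnn_probs]
```

After this step both classifiers give a probability vector over the same Q candidates, which is what the fusion rules operate on.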
Based on $p(w_i|x_g)$ and $p(w_i|x_h)$, the new probability
$p(w_i|x_h,x_g)$ of the MQDF-CNN hybrid model can be calculated
under specific fusion rules. Linear confidence accumulation
(LCA) [13] and multiplication confidence (MC) [14]
methods are considered in this paper.
Figure 2. The diagram of CNN structure: input layer, four convolution layers each followed by a max-pooling layer, hidden layer and output layer. (Size labels recovered from the figure: 48x48, 23x23, 11x11, 5x5, 2x2.)
According to LCA, the new probability $p(w_i|x_h,x_g)$ is
calculated as a linear weighted combination of $p(w_i|x_g)$ and
$p(w_i|x_h)$:

$$p(w_i | x_h, x_g) = \alpha \cdot p(w_i | x_g) + \beta \cdot p(w_i | x_h) \tag{5}$$

where $\alpha$ and $\beta$ are weighting factors, which can be defined
based on the performance of MQDF and CNN. The fusion
recognition result for the current character image is found as

$$w_i = \arg\max_{i=1,\dots,Q} p(w_i | x_h, x_g). \tag{6}$$
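The LCA rule of (5) and (6) can be sketched in a few lines. The sketch takes $\beta = 1 - \alpha$, which is the normalization the paper adopts for the two weights. The function names and the toy probabilities are hypothetical.

```python
def lca_fuse(p_mqdf, p_cnn, alpha):
    """Linear confidence accumulation, (5), with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return [alpha * pg + beta * ph for pg, ph in zip(p_mqdf, p_cnn)]

def best_candidate(labels, probs):
    """Arg-max over the Q candidates, (6)."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]

# Toy example with Q = 3 candidates (made-up probabilities).
labels = ["A", "B", "C"]
fused = lca_fuse([0.5, 0.3, 0.2], [0.2, 0.7, 0.1], alpha=0.4)
winner = best_candidate(labels, fused)
```

In this toy case MQDF alone prefers "A", while the fused scores (0.32, 0.54, 0.14) select "B".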
As the absolute values of $\alpha$ and $\beta$ do not influence the
result of (6), it is assumed that $\alpha + \beta = 1$.
MC explores the multiplication of the two probabilities with
weighting factors:

$$p(w_i | x_h, x_g) = p(w_i | x_g)^{\alpha} \cdot p(w_i | x_h)^{\beta} \tag{7}$$

where $\alpha$ and $\beta$ can be defined in the same way as in LCA. The
final result is also found by maximizing $p(w_i|x_h,x_g)$. The final
decision of the MQDF-CNN hybrid model is derived as

$$w(x) = \begin{cases} w_{MQDF} & \text{if } RC > Th \\ \arg\max_{i=1,\dots,Q} p(w_i | x_h, x_g) & \text{otherwise} \end{cases} \tag{8}$$

where $w_{MQDF}$ is the top candidate result of MQDF.
III. EXPERIMENTS
MQDF-CNN has been verified on the CASIA-HWDB1.1
database and the ICDAR2013 offline handwritten Chinese
character recognition competition dataset [17]. CASIA-HWDB1.1
contains 300 subsets, 240 for training and
60 for testing. Each subset has samples of the 3755 character
classes of GB2312-80 level I. The ICDAR2013 competition
data also contains 60 subsets, including 224,419 characters.
588-dimensional gradient features are extracted and
projected onto a 200-dimensional subspace. For testing on the
CASIA-HWDB1.1 dataset, MQDF and CNN are both
trained on the HWDB1.1 training sets (from 1001-f to 1240-f).
MQDF is trained under the maximum likelihood estimation rule.
For testing on the ICDAR2013 competition dataset, MQDF and
CNN are trained on the CASIA-HWDB1.0 database and the
CASIA-HWDB1.1 training sets. MQDF is learned by
sample reweighting [2]. The experiment uses the CASIA-HWDB1.1
testing set as the validation set. The sizes of the
MQDF and CNN classifiers on both datasets are 51M and
16.5M, respectively.
A. RC and recognition reliability
Denote M as the total number of training samples and Ms
as the number of samples recognized by MQDF with RC
smaller than Th. Rr(Th) is defined as the recognition
accuracy of MQDF on the M-Ms samples with RC larger
than Th. In Fig. 3, Rr(Th) is plotted for varying Th values.
As the figure shows, there is a strong positive correlation
between Rr(Th) and RC, which indicates that if Th is
carefully set, the samples that are misclassified or prone to be
misclassified by MQDF can be selected reliably.
Figure 3. Statistical analysis of Rr for MQDF with varying Th values on
the training set. (Horizontal axis: Th, 0.00 to 0.25; vertical axis: Rr(Th)(%), 96.00 to 100.00.)
B. Comparison of MQDF, CNN and hybrid MQDF-CNN
Let CNN-v and CNN-t denote the CNN parameters
determined by the best validation accuracy and the best test
accuracy, respectively. MQDF runs on a 3.16 GHz CPU at a
speed of 9.37 ms per character. CNN is implemented on a
Tesla K20Xm graphics card since its computational
complexity is high. It takes 4.5 ms on average to recognize one
character.
Fig. 4 shows the performance of the MQDF-CNN hybrid
model on both datasets under LCA with Th varying from
0.08 to 1.0. On CASIA-HWDB1.1, MQDF-CNN-t gives the best
performance at Th=1.0 for $\alpha$ in [0.1, 0.9]. Since the
samples of CASIA-HWDB1.1 are very cursive, MQDF-CNN
benefits the final recognition accuracy more with a
larger Th, because CNN is better suited to cursive character
recognition than MQDF. On the ICDAR2013 competition
dataset, the performance of MQDF-CNN-v varies slightly
with different Th. The experiments show that, on both
datasets, a better $\alpha$ falls in the interval [0.4, 0.6].
The test accuracies on the CASIA-HWDB1.1 and ICDAR2013
competition datasets are listed in TABLE I. As the table
shows, CNN beats MQDF by more than 1% on both datasets.
Moreover, the accuracies are further improved by the
MQDF-CNN hybrid model. The minimum, mean and
maximum accuracies of MQDF-CNN are listed
in that order in square brackets under the LCA and MC criteria
with varying Th and $\alpha$ (Th and $\alpha$ vary in the same manner as
in Fig. 4). MQDF-CNN achieves mean accuracies of
91.59% and 91.60% under the LCA and MC criteria on
CASIA-HWDB1.1. The highest test accuracy is 92.03%, obtained
with Th=1.0 and $\alpha$=0.4. On the ICDAR2013 competition dataset,
MQDF-CNN-v achieves an accuracy of 94.44% with Th=0.08
and $\alpha$=0.4 under MC, which is comparable to the best result,
94.77% (4 CNNs), in the ICDAR2013 competition [18]. The mean
accuracies on both datasets indicate that the MQDF-CNN hybrid
model performs stably. The results also show that the LCA and
MC criteria perform comparably for the MQDF-CNN hybrid model.
TABLE I. RECOGNITION ACCURACY ON TESTING DATASETS
Testing sets            CASIA-HWDB1.1           ICDAR2013 competition dataset
MQDF(%)                 89.24                   91.79
CNN-v(%)                -                       92.86
CNN-t(%)                90.46                   92.88
MQDF-CNN-v/LCA(%)       -                       [93.47 94.10 94.43]
MQDF-CNN-v/MC(%)        -                       [93.73 94.15 94.44]
MQDF-CNN-t/LCA(%)       [90.96 91.59 92.02]     [93.47 94.11 94.44]
MQDF-CNN-t/MC(%)        [90.93 91.61 92.03]     [93.74 94.15 94.44]
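To express the accuracies in Table I as error rates, the relative error reduction can be computed as below. This is only a worked arithmetic sketch, and the function name is hypothetical. It compares the single CNN (CNN-t) with the maximum MQDF-CNN-t/MC accuracy, which is an optimistic choice because the maximum is taken over Th and $\alpha$ on the test set.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction (%) of the error rate when accuracy rises from acc_before to acc_after."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return 100.0 * (err_before - err_after) / err_before

# Accuracies taken from Table I: CNN-t versus the best MQDF-CNN-t/MC result.
casia = relative_error_reduction(90.46, 92.03)   # error 9.54% -> 7.97%
icdar = relative_error_reduction(92.88, 94.44)   # error 7.12% -> 5.56%
```

With these numbers the error rate drops by roughly 16.5% (relative) on CASIA-HWDB1.1 and roughly 21.9% on the ICDAR2013 competition dataset.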
Figure 4. (a) and (b) are test accuracies of MQDF-CNN under the LCA fusion criterion on the CASIA-HWDB1.1 and ICDAR2013 competition testing datasets. (Panel titles: CASIA-HWDB1.1 and ICDAR2013 competition data; vertical axis: Accuracy(%); horizontal axis ticks: 0.1 to 0.9; one curve each for Th = 0.08, 0.10, 0.12, 0.15, 0.20, 0.30 and 1.00.)
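A sweep of the kind plotted in Fig. 4 can be sketched by combining the RC gate of (3) with the decision rule of (8) and looping over Th and $\alpha$. This is a minimal sketch, not the authors' code. The sample layout (candidate labels, MQDF distances in ascending order, the two posterior vectors over the same candidates, and the true label) is a hypothetical convention, and the MQDF distances are assumed to be positive so that RC lies in [0, 1).

```python
def recognition_confidence(d1, d2):
    """RC of (3) from the two smallest MQDF distances (d1 < d2, assumed positive)."""
    return 1.0 - d1 / d2

def hybrid_decide(labels, dists, p_mqdf, p_cnn, th, alpha, rule="lca"):
    """Decision rule (8): keep the MQDF top candidate if RC > Th, otherwise fuse."""
    if recognition_confidence(dists[0], dists[1]) > th:
        return labels[0]
    beta = 1.0 - alpha
    if rule == "lca":   # linear confidence accumulation, (5)
        fused = [alpha * pg + beta * ph for pg, ph in zip(p_mqdf, p_cnn)]
    else:               # multiplication confidence, (7)
        fused = [pg ** alpha * ph ** beta for pg, ph in zip(p_mqdf, p_cnn)]
    return labels[max(range(len(fused)), key=fused.__getitem__)]

def sweep_accuracy(samples, thresholds, alphas, rule="lca"):
    """Accuracy (%) of the hybrid decision for every (Th, alpha) pair."""
    results = {}
    for th in thresholds:
        for alpha in alphas:
            correct = sum(
                hybrid_decide(labels, dists, pg, ph, th, alpha, rule) == truth
                for labels, dists, pg, ph, truth in samples
            )
            results[(th, alpha)] = 100.0 * correct / len(samples)
    return results

# Two made-up samples: one confident MQDF decision, one ambiguous one.
samples = [
    (["A", "B"], [1.0, 4.0], [0.8, 0.2], [0.3, 0.7], "A"),
    (["A", "B"], [3.0, 3.3], [0.55, 0.45], [0.1, 0.9], "B"),
]
acc = sweep_accuracy(samples, thresholds=[0.05, 0.5, 1.0], alphas=[0.5, 0.9])
```

Since RC never exceeds 1, setting Th = 1.0 sends every sample through the fusion branch, while a very small Th leaves almost every decision to MQDF alone.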
IV. CONCLUSIONS
This paper presents an MQDF-CNN hybrid model for
offline handwritten Chinese character recognition. The
MQDF is implemented with hand-designed gradient
features while the CNN exploits hierarchically learned
features. The complementarity between MQDF and CNN makes
their combination outperform each of them alone.
Although only one CNN is used, the combination method
reduces the recognition error rate greatly.
In future work, a multi-column CNN focusing
on different types of character samples will be learned and
combined with MQDF. In order to further explore the combination
of MQDF and CNN, fusion rules for the recognition
results will be studied comprehensively.
ACKNOWLEDGMENT
This work was supported by the National Natural Science
Foundation of China under Grant Nos. 60933010 and
61032008 and the National Basic Research Program of
China (973 program) under Grant No. 2013CB329403.
REFERENCES
[1] C.L. Liu, H. Sako and H. Fujisawa, "Discriminative Learning Quadratic Discriminant Function for Handwriting Recognition," IEEE Trans. on Neural Networks, vol.15, Mar. 2004, pp.430-444.
[2] Y.W. Wang, X.Q. Ding and C.S. Liu, "MQDF Discriminative Learning Based Offline Handwritten Chinese Character Recognition," Proc. of International Conference on Document Analysis and Recognition, Sep. 2011, pp.1100-1104.
[3] Y.W. Wang, X.Q. Ding and C.S. Liu, "MQDF Retrained on Selected Sample Set," IEICE Transactions on Information and Systems, vol.E94-D (10), 2011, pp.1933-1936.
[4] T.H. Su, C.L. Liu and X.Y. Zhang, "Perceptron Learning of Modified Quadratic Discriminant Function," Proc. of International Conference on Document Analysis and Recognition, Sep. 2011, pp.1007-1011.
[5] X.Y. Zhang and C.L. Liu, "Locally Smoothed Modified Quadratic Discriminant Function," Proc. of International Conference on Document Analysis and Recognition, Aug. 2013, pp.8-12.
[6] J.X. Dong, A. Krzyzak and C.Y. Suen, "Fast SVM Training Algorithm with Decomposition on Very Large Datasets," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.27, 2005, pp.603-618.
[7] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based Learning Applied to Document Recognition," Proceedings of the IEEE, vol.86, 1998, pp.2278-2324.
[8] D. Cireşan and J. Schmidhuber, "Multi-Column Deep Neural Networks for Offline Handwritten Chinese Character Classification," Technical Report arXiv:1309.0261v1, 2013.
[9] Y. Tang, "Deep Learning using Support Vector Machines," arXiv preprint arXiv:1306.0239, 2013.
[10] F. Kimura, K. Takashina, S. Tsuruoka and Y. Miyake, "Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol.9, Jan. 1987, pp.149-153.
[11] Y. Bengio, A. Courville and P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.35, 2013, pp.1798-1828.
[12] X.F. Lin, X.Q. Ding, M. Chen, R. Zhang and Y.S. Wu, "Adaptive Confidence Transform Based Classifier Combination for Chinese Character Recognition," Pattern Recognition Letters, vol.19, 1998, pp.975-988.
[13] Y.S. Huang and C.Y. Suen, "A Method of Combining Multiple Experts for the Recognition of Unconstrained Handwritten Numerals," IEEE Trans. Pattern Analysis and Machine Intelligence, vol.17, 1995, pp.90-94.
[14] J.H. Kim, K.K. Kim and C.Y. Suen, "An HMM-MLP Hybrid Model for Cursive Script Recognition," Pattern Analysis & Applications, vol.3, 2000, pp.314-324.
[15] H. Yamada, K. Yamamoto and T. Saito, "A Nonlinear Normalization Method for Handprinted Kanji Character Recognition: Line Density Equalization," Pattern Recognition, vol.23, 1990, pp.1023-1029.
[16] C.L. Liu and M. Nakagawa, "Precise Candidate Selection for Large Character Set Recognition by Confidence Evaluation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol.22, 2000, pp.636-642.
[17] C.L. Liu, F. Yin, D.H. Wang and Q.F. Wang, "Online and Offline Handwritten Chinese Character Recognition: Benchmarking on New Databases," Pattern Recognition, vol.46, 2013, pp.155-162.
[18] F. Yin, Q.F. Wang, X.Y. Zhang and C.-L. Liu, "ICDAR 2013 Chinese Handwriting Recognition Competition," Proc. of International Conference on Document Analysis and Recognition, Aug. 2013, pp.1464-1470.
[19] H.L. Liu and X.Q. Ding, "Handwritten Character Recognition Using Gradient Feature and Quadratic Classifier with Multiple Discrimination Schemes," Proc. of International Conference on Document Analysis and Recognition, Sep. 2005, pp.19-23.