2014 14th International Conference on Frontiers in Handwriting Recognition

An MQDF-CNN Hybrid Model for Offline Handwritten Chinese Character Recognition

Yanwei Wang, Xin Li, Changsong Liu, Xiaoqing Ding
State Key Laboratory of Intelligent Technology and Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing, China
{wangyw, lixin08, lcs, dxq}@ocrserv.ee.tsinghua.edu.cn

Youxin Chen
Beijing Samsung Telecom R&D Center, Beijing, China
[email protected]

Abstract—An MQDF-CNN hybrid model is presented for offline handwritten Chinese character recognition. The main idea behind the MQDF-CNN hybrid model is that the significant differences in features and classification mechanisms between MQDF and CNN allow the two classifiers to complement each other. Linear confidence accumulation and multiplication confidence criteria are used to fuse the outputs of MQDF and CNN. Experiments have been conducted on CASIA-HWDB1.1 and the ICDAR2013 offline handwritten Chinese character recognition competition dataset. On both datasets, CNN beats MQDF by more than 1% in accuracy, and the MQDF-CNN hybrid model achieves test accuracies of 92.03% and 94.44%, respectively. The result on the competition dataset is comparable to the state-of-the-art result even though fewer training samples and only one CNN are used.

Keywords-CNN; MQDF; MQDF-CNN hybrid model; handwritten Chinese character recognition

I. INTRODUCTION

Offline Chinese handwritten character recognition is a large scale pattern recognition problem and remains unsolved. Cursive characters are written unconstrainedly and vary drastically in writing style and shape distortion, which makes recognition exceptionally difficult. The discriminative information contained in the extracted features determines the upper limit of a recognition system, and classifier design explores the way to approach that limit. Therefore, feature extraction and classifier design are the most important parts of a character recognition system.

For Chinese character recognition, the modified quadratic discriminant function (MQDF) [10] implemented with the gradient feature [19] has been widely used for its relatively high performance at low computational complexity. The gradient feature is well designed for discriminating most character classes. However, it cannot adaptively extract discriminative features for each class because it is not a learning-based feature. Great efforts have therefore focused on classifier design, which can be divided into two categories. One is the generative model optimized with discriminative information. MQDF can be improved by modulating its parameters directly under discriminative objective functions [1][4][5] or indirectly by sample reweighting [2][3]. MQDF assumes that the features follow a Gaussian distribution; however, features of cursive characters do not fully conform to this assumption. The gap between the real data distribution and the model assumption means that MQDF cannot solve the problem thoroughly. The other category is the discriminative model, such as the support vector machine (SVM) [6] and deep learning using SVMs [9]. It models the classification boundary directly by minimizing the empirical or structural risk without any assumption on the data distribution. The leading position of MQDF changed when the deep convolutional neural network (CNN) made a breakthrough [7][8].
Hierarchical features are discriminatively learned by the CNN from the classifier's perspective and contain more discriminative information than the conventional gradient feature. However, the computational complexity of CNN is extremely high for large scale classification, and training a robust CNN needs a large number of samples. Since the MQDF based method and the CNN employ different features and different classification mechanisms, it is reasonable to expect that they complement each other. Based on this idea, an MQDF-CNN hybrid model is proposed for offline Chinese character recognition. MQDF is implemented with the gradient feature and CNN makes use of hierarchical features. To combine MQDF and CNN, two fusion criteria are evaluated. The experiments demonstrate promising results.

II. SYSTEM OVERVIEW

The diagram of the MQDF-CNN hybrid model for offline handwritten Chinese character recognition is given in Fig. 1. The system consists of three main parts: feature extraction, classifier training and recognition. MQDF and CNN are trained separately and combined in the recognition stage depending on the output of MQDF.

Figure 1. Diagram of the MQDF-CNN hybrid model for offline handwritten Chinese character recognition.

A. MQDF

MQDF is derived from the quadratic discriminant function (QDF) classifier, a Bayesian classifier that assumes the samples of each class follow a Gaussian distribution. QDF can be expressed as

    h_i(x) = (x − μ_i)^T Σ_i^{-1} (x − μ_i) + \log |Σ_i|,    (1)

where x is the feature vector, and the mean vector μ_i and covariance matrix Σ_i need to be estimated. Applying eigenvalue decomposition to Σ_i and regularizing the small eigenvalues by a constant σ², the MQDF is derived as

    h_i(x) = \sum_{j=1}^{k} \frac{1}{λ_{ij}} [φ_{ij}^T (x − μ_i)]^2 + \frac{1}{σ^2} \sum_{j=k+1}^{n} [φ_{ij}^T (x − μ_i)]^2 + \ln \prod_{j=1}^{k} λ_{ij} + (n − k) \ln σ^2,    (2)

where k is the truncation dimensionality, meaning that the k largest eigenvalues are left unregularized, n is the feature dimensionality, λ_{ij} is the jth eigenvalue of the covariance matrix of the ith class and φ_{ij} is the corresponding eigenvector.
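To make Eq. (2) concrete, the following Python/NumPy sketch evaluates the MQDF distance of a feature vector to one class from precomputed class parameters. It is an illustration only: the function name, the argument layout and the trick of obtaining the minor-subspace energy from the residual norm are assumptions made here, not the authors' implementation.

import numpy as np

def mqdf_distance(x, mean, eigvals, eigvecs, sigma2):
    """MQDF distance of Eq. (2) for one class (illustrative sketch).

    x       : feature vector, shape (n,)
    mean    : class mean vector mu_i, shape (n,)
    eigvals : k largest eigenvalues lambda_{i1..ik}, shape (k,)
    eigvecs : corresponding eigenvectors phi_{ij} as columns, shape (n, k)
    sigma2  : constant that regularizes the remaining n - k eigenvalues
    """
    n = x.shape[0]
    k = eigvals.shape[0]
    d = x - mean
    proj = eigvecs.T @ d                      # projections onto the k principal axes
    # Because the eigenvectors are orthonormal, the energy in the discarded
    # (n - k)-dimensional subspace equals ||d||^2 minus the retained energy.
    residual = d @ d - proj @ proj
    return (np.sum(proj ** 2 / eigvals)
            + residual / sigma2
            + np.sum(np.log(eigvals))
            + (n - k) * np.log(sigma2))

The unknown pattern is assigned to the class with the smallest distance; storing only the k leading eigenpairs per class keeps both memory usage and computation low.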
B. CNN

As reported in [11], there are many techniques to improve the performance of a CNN, such as dropout, rectifying nonlinearities and unsupervised pre-training. Since this paper is concerned mainly with the efficiency of the MQDF-CNN hybrid model, the CNN used here is a plain one without any special improvement.

The CNN structure is shown in Fig. 2 and adheres to the structure of LeNet-5 [7]. The CNN is structured as 10x48x48-100C3-MP2-200C2-MP2-300C2-MP2-400C2-MP2-1000N-3755N. This means the input layer takes 10 input images of size 48x48 and is followed by a convolutional layer with 100 filters of size 3x3 and a max-pooling layer over 2x2 regions; the parameters of the following convolutional and max-pooling layers have analogous meanings (the successive pooling layers produce feature maps of size 23x23, 11x11, 5x5 and 2x2). After the last max-pooling layer, the features are mapped by a hidden layer with 1000 hidden units. Finally, the fully connected output layer has 3755 neurons, each corresponding to one character class. The 10 input images include 8 gradient images and two images normalized by bilinear interpolation and density equalization [15], respectively.

The parameters of each layer are initialized uniformly from the symmetric interval [−\sqrt{6/(n_{in}+n_{out})}, \sqrt{6/(n_{in}+n_{out})}]. A soft-max activation function is used for the output layer. Training ends once the program has iterated a specified number of times or there is no significant reduction of the validation error.

Figure 2. Diagram of the CNN structure.
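For illustration, the layer layout above can be written down as the following sketch. The PyTorch framework, the sigmoid activations inside the network and the Glorot-style initializer call are assumptions made here for concreteness; only the layer sizes come from the structure string quoted above.

import torch.nn as nn

class PlainCNN(nn.Module):
    """Sketch of the 10x48x48-100C3-MP2-200C2-MP2-300C2-MP2-400C2-MP2-1000N-3755N layout."""

    def __init__(self, num_classes=3755):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(10, 100, kernel_size=3), nn.Sigmoid(),   # 48x48 -> 46x46 (activation assumed)
            nn.MaxPool2d(2),                                   # -> 23x23
            nn.Conv2d(100, 200, kernel_size=2), nn.Sigmoid(),  # -> 22x22
            nn.MaxPool2d(2),                                   # -> 11x11
            nn.Conv2d(200, 300, kernel_size=2), nn.Sigmoid(),  # -> 10x10
            nn.MaxPool2d(2),                                   # -> 5x5
            nn.Conv2d(300, 400, kernel_size=2), nn.Sigmoid(),  # -> 4x4
            nn.MaxPool2d(2),                                   # -> 2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(400 * 2 * 2, 1000), nn.Sigmoid(),
            nn.Linear(1000, num_classes),                      # soft-max is applied inside the loss
        )

    def forward(self, x):                                      # x: (batch, 10, 48, 48)
        return self.classifier(self.features(x))

def init_symmetric_uniform(m):
    # one interpretation of the symmetric uniform initialization described above
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = PlainCNN()
model.apply(init_symmetric_uniform)

Training such a network with a cross-entropy loss reproduces the soft-max output layer described above.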
C. Combining MQDF and CNN

Since the MQDF based method already recognizes most character classes well, only samples that are likely to be recognized incorrectly need to be recognized again by the CNN. First, the output of MQDF is evaluated by the general recognition confidence (RC) [12], which is an effective measurement of recognition reliability. MQDF outputs the top Q candidate recognition distances. On the top two recognition distances d_1 and d_2 (d_1 < d_2), RC is defined as

    RC = 1 − d_1 / d_2.    (3)

When d_1 and d_2 are comparable (RC approaches zero), misclassification is likely. Therefore, if a sample is recognized with RC larger than a constant threshold Th, the output of MQDF is directly taken as the final result; otherwise, the sample is fed to the CNN and receives a fused result.

The fusion is performed at the level of recognition scores. Before the combination schemes are applied, the outputs of MQDF and CNN are normalized to comparable measurement values. Let w_i denote the ith candidate class label, and let x_g and x_h denote the gradient feature and the hierarchical feature of the unknown pattern used by MQDF and CNN, respectively. The conditional probability p(x_g | w_i) and the recognition distance h_i(x_g) of MQDF satisfy

    p(x_g | w_i) ∝ e^{−h_i(x_g)/2}.    (4)

Based on (4), the posterior probability p(w_i | x_g) of MQDF can be computed over the top Q candidate classes [16]. The CNN directly outputs a class label and a probability; its posterior probability p(w_i | x_h) is recomputed by normalizing over the top Q candidate classes. Based on p(w_i | x_g) and p(w_i | x_h), the new probability p(w_i | x_h, x_g) of the MQDF-CNN hybrid model is calculated under a specific fusion rule. The linear confidence accumulation (LCA) [13] and multiplication confidence (MC) [14] methods are considered in this paper.

According to LCA, the new probability p(w_i | x_h, x_g) is a linear weighted combination of p(w_i | x_g) and p(w_i | x_h),

    p(w_i | x_h, x_g) = α · p(w_i | x_g) + β · p(w_i | x_h),    (5)

where α and β are weighting factors, which can be set according to the performance of MQDF and CNN. The fusion recognition result for the current character image is found as

    w = \arg\max_{i=1,\dots,Q} p(w_i | x_h, x_g).    (6)

As the absolute values of α and β do not influence the result of (6), it is assumed that α + β = 1. MC explores the multiplication of the two probabilities with weighting factors,

    p(w_i | x_h, x_g) = p^{α}(w_i | x_g) · p^{β}(w_i | x_h),    (7)

where α and β are defined in the same way as for LCA, and the final result is again found by maximizing p(w_i | x_h, x_g). The overall decision of the MQDF-CNN hybrid model is

    w(x) = \begin{cases} w_{MQDF}, & \text{if } RC > Th \\ \arg\max_{i=1,\dots,Q} p(w_i | x_h, x_g), & \text{otherwise} \end{cases}    (8)

where w_{MQDF} is the top candidate result of MQDF.
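The decision rule of Eqs. (3)-(8) can be summarized by the short Python sketch below. It is a simplified illustration: the interface (a sorted array of top-Q MQDF distances with their candidate labels, and the CNN probabilities for the same candidates) and all names are assumptions, not the authors' code.

import numpy as np

def hybrid_decision(mqdf_dists, labels, cnn_probs, Th=0.1, alpha=0.5, rule="LCA"):
    """MQDF-CNN hybrid decision following Eqs. (3)-(8) (illustrative sketch).

    mqdf_dists : top-Q MQDF distances h_i(x_g) in ascending order, shape (Q,)
    labels     : the corresponding Q candidate class labels
    cnn_probs  : CNN output probabilities for the same Q candidates, shape (Q,)
    """
    mqdf_dists = np.asarray(mqdf_dists, dtype=float)
    cnn_probs = np.asarray(cnn_probs, dtype=float)

    d1, d2 = mqdf_dists[0], mqdf_dists[1]
    rc = 1.0 - d1 / d2                                    # Eq. (3)
    if rc > Th:                                           # confident MQDF result is kept, Eq. (8)
        return labels[0]

    # Eq. (4): distances -> posteriors over the Q candidates.
    # Shifting by the smallest distance avoids numerical underflow and does not
    # change the normalized probabilities.
    p_mqdf = np.exp(-(mqdf_dists - mqdf_dists[0]) / 2.0)
    p_mqdf /= p_mqdf.sum()
    p_cnn = cnn_probs / cnn_probs.sum()                   # renormalize over the Q candidates

    beta = 1.0 - alpha                                    # alpha + beta = 1
    if rule == "LCA":
        p = alpha * p_mqdf + beta * p_cnn                 # Eq. (5)
    else:                                                 # "MC"
        p = (p_mqdf ** alpha) * (p_cnn ** beta)           # Eq. (7)
    return labels[int(np.argmax(p))]                      # Eqs. (6)/(8)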
III. EXPERIMENTS

The MQDF-CNN hybrid model has been verified on the CASIA-HWDB1.1 database and the ICDAR2013 offline handwritten Chinese character recognition competition dataset [17]. CASIA-HWDB1.1 contains 300 subsets, 240 for training and 60 for testing, and each subset contains samples of the 3755 character classes of GB2312-80 level 1. The ICDAR2013 competition data also contains 60 subsets including 224,419 characters. 588-dimensional gradient features are extracted and projected onto a 200-dimensional subspace. For testing on CASIA-HWDB1.1, MQDF and CNN are both trained on the HWDB1.1 training sets (from 1001-f to 1240-f), and MQDF is trained under the maximum likelihood estimation rule. For testing on the ICDAR2013 competition dataset, MQDF and CNN are trained on the CASIA-HWDB1.0 database and the CASIA-HWDB1.1 training sets, and MQDF is learned by sample reweighting [2]; the CASIA-HWDB1.1 testing set is used as the validation set. The sizes of the MQDF and CNN classifiers on both datasets are 51M and 16.5M, respectively.

A. RC and recognition reliability

Denote by M the total number of training samples and by Ms the number of samples recognized by MQDF with RC smaller than Th. Rr(Th) is defined as the recognition accuracy of MQDF on the M − Ms samples with RC larger than Th. In Fig. 3, Rr(Th) is plotted for varying Th. As the figure shows, there is a strong positive correlation between Rr(Th) and RC, which indicates that if Th is carefully chosen, the samples misclassified or prone to be misclassified by MQDF can be selected reliably.

Figure 3. Statistical analysis of Rr(Th) (%) for MQDF with Th varying from 0 to 0.25 on the training set.

B. Comparison of MQDF, CNN and the hybrid MQDF-CNN

Let CNN-v and CNN-t denote the CNN parameters determined by the best validation accuracy and the best test accuracy, respectively. MQDF runs on a 3.16 GHz CPU at a speed of 9.37 ms per character. The CNN is implemented on a Tesla K20Xm graphics card because of its high computational complexity and takes 4.5 ms on average to recognize one character.

Fig. 4 shows the performance of the MQDF-CNN hybrid model on both datasets under LCA with Th varying from 0.08 to 1.0. On CASIA-HWDB1.1, MQDF-CNN-t gives the best performance at Th = 1.0 with α falling in [0.1, 0.9]. Since the samples of CASIA-HWDB1.1 are very cursive, MQDF-CNN benefits more in final recognition accuracy from a larger Th, because the CNN adapts to cursive character recognition better than MQDF. On the ICDAR2013 competition dataset, the performance of MQDF-CNN-v varies only slightly with Th. The experiments also show that on both datasets a good α falls in the interval [0.4, 0.6].

The test accuracies on CASIA-HWDB1.1 and the ICDAR2013 competition dataset are listed in TABLE I. As the table shows, CNN beats MQDF by more than 1% on both datasets, and the accuracies are further improved by the MQDF-CNN hybrid model. The minimum, mean and maximum accuracies of MQDF-CNN under the LCA and MC criteria over the varying Th and α (Th and α vary in the same manner as in Fig. 4) are given in square brackets. MQDF-CNN achieves mean accuracies of 91.59% and 91.60% under the LCA and MC criteria on CASIA-HWDB1.1, and the highest test accuracy, 92.03%, is obtained with Th = 1.0 and α = 0.4. On the ICDAR2013 competition dataset, MQDF-CNN-v reaches an accuracy of 94.44% with Th = 0.08 and α = 0.4 under MC, which is comparable to the best result of the ICDAR2013 competition, 94.77% (4 CNNs) [18]. The mean accuracies on both datasets indicate that the MQDF-CNN hybrid model performs stably, and the results also show that the LCA and MC criteria act comparably for the MQDF-CNN hybrid model.

TABLE I. RECOGNITION ACCURACY ON TESTING DATASETS

Testing sets           CASIA-HWDB1.1            ICDAR2013 competition dataset
MQDF (%)               89.24                    91.79
CNN-v (%)              -                        92.86
CNN-t (%)              90.46                    92.88
MQDF-CNN-v/LCA (%)     -                        [93.47, 94.10, 94.43]
MQDF-CNN-v/MC (%)      -                        [93.73, 94.15, 94.44]
MQDF-CNN-t/LCA (%)     [90.96, 91.59, 92.02]    [93.47, 94.11, 94.44]
MQDF-CNN-t/MC (%)      [90.93, 91.61, 92.03]    [93.74, 94.15, 94.44]

(Entries in square brackets are the [minimum, mean, maximum] accuracies over the Th and α settings of Fig. 4.)

Figure 4. (a) and (b) are test accuracies of MQDF-CNN under the LCA fusion criterion on the CASIA-HWDB1.1 and ICDAR2013 competition testing datasets, respectively (curves for Th = 0.08, 0.10, 0.12, 0.15, 0.20, 0.30 and 1.00, plotted against α from 0.1 to 0.9).
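For completeness, the reliability statistic Rr(Th) used in Section III.A to choose the gating threshold can be computed as in the short sketch below; the array-based interface and all names are illustrative assumptions rather than the authors' code.

import numpy as np

def reliability_curve(rc_values, correct, thresholds):
    """Rr(Th): MQDF accuracy over the samples whose RC of Eq. (3) exceeds Th (sketch).

    rc_values  : RC value of every training sample, shape (M,)
    correct    : True where the MQDF top candidate equals the true label, shape (M,)
    thresholds : Th values at which to evaluate Rr
    """
    rc_values = np.asarray(rc_values, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    curve = {}
    for th in thresholds:
        kept = rc_values > th                  # the M - Ms samples accepted by the RC gate
        curve[th] = float(correct[kept].mean()) if kept.any() else float("nan")
    return curve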
IV. CONCLUSIONS

This paper presents an MQDF-CNN hybrid model for offline handwritten Chinese character recognition. The MQDF is implemented with hand-designed gradient features while the CNN exploits hierarchically learned features. The complementarity between MQDF and CNN makes their combination outperform each of them alone. Although only one CNN is used, the combination greatly reduces the recognition error rate. In future work, multiple CNN columns focusing on different types of character samples will be learned and combined with MQDF, and the rules for fusing the recognition results of MQDF and CNN will be studied more comprehensively.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China under Grants No. 60933010 and 61032008, and by the National Basic Research Program of China (973 Program) under Grant No. 2013CB329403.

REFERENCES

[1] C.L. Liu, H. Sako and H. Fujisawa, "Discriminative Learning Quadratic Discriminant Function for Handwriting Recognition," IEEE Trans. on Neural Networks, vol. 15, Mar. 2004, pp. 430-444.
[2] Y.W. Wang, X.Q. Ding and C.S. Liu, "MQDF Discriminative Learning Based Offline Handwritten Chinese Character Recognition," Proc. of International Conference on Document Analysis and Recognition, Sep. 2011, pp. 1100-1104.
[3] Y.W. Wang, X.Q. Ding and C.S. Liu, "MQDF Retrained on Selected Sample Set," IEICE Transactions on Information and Systems, vol. E94-D(10), 2011, pp. 1933-1936.
[4] T.H. Su, C.L. Liu and X.Y. Zhang, "Perceptron Learning of Modified Quadratic Discriminant Function," Proc. of International Conference on Document Analysis and Recognition, Sep. 2011, pp. 1007-1011.
[5] X.Y. Zhang and C.L. Liu, "Locally Smoothed Modified Quadratic Discriminant Function," Proc. of International Conference on Document Analysis and Recognition, Aug. 2013, pp. 8-12.
[6] J.X. Dong, A. Krzyzak and C.Y. Suen, "Fast SVM Training Algorithm with Decomposition on Very Large Datasets," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, 2005, pp. 603-618.
[7] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, 1998, pp. 2278-2324.
[8] D. Ciresan and J. Schmidhuber, "Multi-Column Deep Neural Networks for Offline Handwritten Chinese Character Classification," Technical Report arXiv:1309.0261v1, 2013.
[9] Y. Tang, "Deep Learning using Support Vector Machines," arXiv preprint arXiv:1306.0239, 2013.
[10] F. Kimura, K. Takashina, S. Tsuruoka and Y. Miyake, "Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, Jan. 1987, pp. 149-153.
[11] Y. Bengio, A. Courville and P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 35, 2013, pp. 1798-1828.
[12] X.F. Lin, X.Q. Ding, M. Chen, R. Zhang and Y.S. Wu, "Adaptive Confidence Transform Based Classifier Combination for Chinese Character Recognition," Pattern Recognition Letters, vol. 19, 1998, pp. 975-988.
[13] Y.S. Huang and C.Y. Suen, "A Method of Combining Multiple Experts for the Recognition of Unconstrained Handwritten Numerals," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, 1995, pp. 90-94.
[14] J.H. Kim, K.K. Kim and C.Y. Suen, "An HMM-MLP Hybrid Model for Cursive Script Recognition," Pattern Analysis & Applications, vol. 3, 2000, pp. 314-324.
[15] H. Yamada, K. Yamamoto and T. Saito, "A Nonlinear Normalization Method for Handprinted Kanji Character Recognition - Line Density Equalization," Pattern Recognition, vol. 23, 1990, pp. 1023-1029.
[16] C.L. Liu and M. Nakagawa, "Precise Candidate Selection for Large Character Set Recognition by Confidence Evaluation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, 2000, pp. 636-642.
[17] C.L. Liu, F. Yin, D.H. Wang and Q.F. Wang, "Online and Offline Handwritten Chinese Character Recognition: Benchmarking on New Databases," Pattern Recognition, vol. 46, 2013, pp. 155-162.
[18] F. Yin, Q.F. Wang, X.Y. Zhang and C.L. Liu, "ICDAR 2013 Chinese Handwriting Recognition Competition," Proc. of International Conference on Document Analysis and Recognition, Aug. 2013, pp. 1464-1470.
[19] H.L. Liu and X.Q. Ding, "Handwritten Character Recognition Using Gradient Feature and Quadratic Classifier with Multiple Discrimination Schemes," Proc. of International Conference on Document Analysis and Recognition, Sep. 2005, pp. 19-23.