Road Detection from a Single Image

Si Chen
Department of Electrical and Computer Engineering
University of California San Diego
La Jolla, California 92037
Email: [email protected]

Abstract—Road detection is important in both computer vision and intelligent vehicle systems. In this project, road areas are automatically detected by a Deep Neural Network (DNN) [3]. The DNN result fails mainly for two reasons: the model is too general to label each input image precisely, and the similarity between sidewalk and road makes them hard to discriminate. I therefore use the Color Plane Fusion method proposed in [1] together with edge information to refine the labeled image. Experiments show that these refinement steps improve performance.

I. INTRODUCTION

Road detection is a significant problem in computer vision, with applications such as autonomous driving, pedestrian detection, scene parsing, and 3D reconstruction. Many algorithms have been developed to let machines understand a scene and recognize many kinds of objects through statistical modeling. These statistical models often describe local information such as texture and color; some also exploit contextual information through generative models such as Markov Random Fields or discriminative models such as Conditional Random Fields. These approaches give impressive results but are not robust enough. In this project, I focus on segmenting the road, which is one of the most challenging tasks because of its many variations.

Information from various kinds of sensors can be used for road detection, such as stereo maps, depth information, and RGB images. Typically, the more information you use, the more reasonable the results, at the expense of more processing time and more devices. I choose RGB images as the only input because they are the most accessible. Current algorithms for road detection mainly consist of two steps: feature extraction followed by a classifier.
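The two-step pipeline just mentioned (feature extraction followed by a classifier) can be sketched as follows. This is a minimal illustrative sketch, not the DNN pipeline of this report: the patch size, stride, and linear sigmoid classifier are assumptions chosen only for demonstration.

```python
import numpy as np

def extract_patches(image, size=(15, 20), stride=10):
    """Feature extraction: slide a window over a grayscale image and
    flatten each patch into one feature row."""
    h, w = size
    feats = []
    for r in range(0, image.shape[0] - h + 1, stride):
        for c in range(0, image.shape[1] - w + 1, stride):
            feats.append(image[r:r + h, c:c + w].ravel())
    return np.array(feats)

def classify_patches(feats, weights, bias):
    """Classifier stage: a generic linear model with a sigmoid score
    in [0, 1] per patch (a stand-in for any trained classifier)."""
    return 1.0 / (1.0 + np.exp(-(feats @ weights + bias)))

# Example: a 30x40 image yields 2x3 overlapping 15x20 patches.
image = np.zeros((30, 40))
feats = extract_patches(image)                         # shape (6, 300)
scores = classify_patches(feats, np.zeros(300), 0.0)   # 0.5 for zero weights
```

Overlapping patches, as used later in the experiments, give the classifier a degree of translation invariance.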
Some algorithms also contain a segmentation stage that uses color information to exploit edges. It has been shown in [1] that machine-generated features are superior to hand-crafted features, which can be too restrictive to model the complex patterns of road and non-road areas. Hence, I use a DNN to learn features automatically from raw data. Sidewalks are often misclassified as road, which is one of the most challenging problems in road segmentation. I address this problem by adding an online-learning stage using Color Plane Fusion, exploiting the fact that the color of the road is typically different from the color of the sidewalk. In addition, the Hough transform is used to detect the boundaries of the road, which further refines the result.

The rest of the report is organized as follows: details of the approach are described in Section III; experimental results and analysis are presented in Section IV; Section V contains future work and conclusions.

II. RELATED RESEARCH

As far as I know, road detection has two main approaches. The first relies on statistical learning methods to model the objects in typical road scenes (usually sky, vertical regions, and road) using labeled ground-truth images [4], [1]. The road region is then predicted by the trained model. [4] also uses edge information by first computing superpixels, but that method is too general for the task of road segmentation. The second approach uses the dominant orientations of textures. It is based on the strong perspective effect of road scenes with a single vanishing point: the vanishing point and road boundaries can be detected from the orientation of textures, which has been shown to be effective but not very robust [2]. The assumption behind the dominant-texture-orientation approach is too strong, especially in urban areas where many complex objects appear on the road. Therefore, I prefer the former approach.

[1] is the work I refer to most; it uses a Convolutional Neural Network as the classifier and refines the result for each specific image by color plane fusion. My method has three differences. First, I use a DNN instead of a Convolutional Neural Network to exploit its deep architecture for classification. Second, I use a different measure of uniformity in the color plane fusion step. Third, I use a line-detection algorithm to refine the result of the combined model.

III. METHOD

In this section, I first briefly introduce the Deep Neural Network and Color Plane Fusion, then present the combined learning algorithm, and finally describe the line refinement.

Fig. 2: Flowchart of the Learning Algorithm
Fig. 3: Flowchart of Line Refinement

A. Learning Stage

1) Deep Neural Network: In the last few years, a large amount of research has been conducted on deep learning. The key idea is to learn increasingly abstract representations of the input data layer by layer using unsupervised learning. Traditional neural networks were also intended to learn such deep representations, but this is difficult with gradient descent alone because training easily gets stuck in poor local optima. The DNN circumvents this with a greedy, layer-wise unsupervised pre-training phase, which gives the supervised training stage a good initialization. The DNN is constructed from stacked Restricted Boltzmann Machines (RBMs), as shown in Fig. 1.

Fig. 1: Structure of Deep Neural Network

An RBM consists of two layers of neurons, a visible layer and a hidden layer. It is special in that each neuron is fully connected to the neurons of the other layer, but there are no connections within a layer, which makes the RBM convenient to train with Contrastive Divergence (CD). The Theano library in Python is well suited to implementing DNNs; in this project, however, I use the MATLAB code provided by Geoffrey E.
Hinton on his homepage.

2) Color Plane Fusion: The pixel values of the road area are expected to be more uniformly distributed than those of other regions, which is a very useful cue. The key idea of color plane fusion is to find a linear combination of the values from different color planes (R, G, B, nr, ng, H, S, V, etc.) that minimizes the variance of the intensity values within small local patches. The transformation can be expressed as

    y(i) = \sum_{j=1}^{N} w_j x_j(i)

The goal is to find the optimal w such that var(y) is minimized subject to the constraint \sum_{j=1}^{N} w_j = 1, where N is the number of color planes used. The optimal solution for w is

    w = \Sigma^{-1} I (I^T \Sigma^{-1} I)^{-1}

where \Sigma is the data covariance matrix and I is an N-element vector of ones.

In this project, I first choose a small area of road for training w, then use w to transform each pixel in the image. For each pixel, I take a small surrounding patch and assume that the fused values within it follow a Gaussian distribution. The probability that the pixel belongs to the road is evaluated as

    p = \Phi(k / \hat{\sigma}) - \Phi(-k / \hat{\sigma})

where \Phi(x) is the CDF of N(0, 1) and \hat{\sigma} is the estimated standard deviation of the corresponding patch; p is the probability that a sample falls in the interval [-k, k]. The parameter k controls the relative importance of Color Plane Fusion versus the DNN, which is discussed later. This evaluation is intuitive because patches of the road area are expected to have smaller \hat{\sigma}. I also tried using the DNN result to select the road patch for training w, but it brought little improvement at a higher cost; in practice, the assumption that the mid-bottom patch is road is quite robust.

3) Combination: Finally, the probability of each pixel x_i being road surface is computed in the same way as in [1], as the product of the two probabilities estimated by the DNN and by Color Plane Fusion. Fig. 2 shows the overview of the learning algorithm. The confidence map clearly reflects the improvement obtained by combining offline and online learning.

B. Line Refinement Stage

The line refinement stage is introduced mainly to differentiate road from sidewalk. Edge information, the most useful cue for this problem, has not been considered in the previous stage. As shown in Fig. 3, the objective of the line refinement stage is to find the boundaries of the road region; road pixels outside the boundaries are considered false alarms and discarded. The first step applies the Canny edge detector to the road region (obtained by morphological dilation of the learning-stage result). Long line segments are then detected by the Hough transform, and lines that are nearly horizontal are removed. Let M and N be the numbers of pixels classified as road by the learning stage on either side of a line. Whether the line is a road boundary is decided by the heuristic criterion

    min(M, N) / (M + N) < \beta

where \beta is the threshold. If \beta is high, lines are more easily regarded as road boundaries, so genuine road area may also be discarded; if \beta is low, only regions that are very likely sidewalk are removed, but real road boundaries may be missed.

IV. RESULTS AND ANALYSIS

A. Experiment

I test the proposed approach on the KITTI-ROAD dataset. 62,500 patches with ground-truth labels are used to train the DNN. Each patch is 15x20 pixels, and the patches overlap in order to achieve a certain degree of translation invariance. The network has [902 500 500 2000 2] units per layer; its input is the raw data, 900 RGB values plus 2 position values. There are 50 epochs of pre-training (using CD-1) and 200 epochs of back-propagation. For Color Plane Fusion, I choose the mid-bottom 76x101 patch for learning w, and k is set to 1. For line refinement, \beta is set to 0.2. Pixels with a posterior probability higher than 0.3 are regarded as road, and only the bottom half of the image is used for detecting road. Some test results are shown in Fig. 4.

B. Result Analysis

1) Deep Neural Network: I compare the test error of the DNN against two shallow models (Logistic Regression and a Multilayer Perceptron) on the same raw input. Fig. 5 shows the benefit of the deep architecture over the shallow networks.

Fig. 5: Probability of Error

The confusion matrix is:

                Road      non-Road
    Road        0.8426    0.0682
    non-Road    0.1574    0.9318

2) The Rule of k: As mentioned above, k decides the relative importance of online versus offline learning. If k is small, the online-learning result dominates, and vice versa. Fig. 6 shows the result of the learning stage for different values of k.

Fig. 6: The Rule of k: from top to bottom, k = 0.5, 1.0, 1.5, 2.0

Qualitatively, a smaller k makes the method generalize more because it relies more on the online information; a larger k makes the method more robust because the DNN model contains more information than Color Plane Fusion. In my experiments, I set k to 1.

3) Why use Color Plane Fusion?: The model captured by the DNN is too general to work accurately on every image; as shown in Fig. 7, it may classify a car surface as road. Through online learning, the road information in that particular image is taken into account, and combining the two models lets the algorithm generalize better.

Fig. 7: Improvement by Color Plane Fusion: images from left to right are the results of the DNN, Color Plane Fusion, and the combination

4) Why use edge information?: It helps separate sidewalks (which are typically not drivable) from the road, since sidewalks usually share similar features with roads. Fig. 8 shows the improvement.

Fig. 8: Improvement by Edge Detection
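The Color Plane Fusion step and the role of k discussed above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the report's implementation: `planes` is assumed to be a (pixels x N) matrix of color-plane values from the training road patch, and `patch` the color-plane values of one local patch; both names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """CDF of the standard normal distribution, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def learn_weights(planes):
    """Closed-form fusion weights w = Sigma^-1 I (I^T Sigma^-1 I)^-1,
    minimizing var(planes @ w) subject to sum(w) = 1."""
    cov = np.cov(planes, rowvar=False)        # N x N data covariance Sigma
    ones = np.ones(cov.shape[0])              # the vector I of ones
    cinv_ones = np.linalg.solve(cov, ones)    # Sigma^-1 I
    return cinv_ones / (ones @ cinv_ones)     # normalize so weights sum to 1

def road_probability(patch, w, k=1.0):
    """p = Phi(k / sigma_hat) - Phi(-k / sigma_hat): the mass of
    N(0, sigma_hat^2) inside [-k, k]; uniform patches score near 1."""
    y = patch @ w                             # fused intensity values
    sigma_hat = y.std(ddof=1)                 # estimated standard deviation
    return norm_cdf(k / sigma_hat) - norm_cdf(-k / sigma_hat)
```

Raising k pushes p toward 1 for every patch, so the product with the DNN output is driven more by the offline model; lowering k sharpens the online cue, matching the qualitative behaviour described for Fig. 6.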
5) Reasons for failure: There are two main causes of wrong results:
• severe illumination changes, such as shadows
• road markings
Both act as noise on the road surface, which makes road detection a hard problem. Since there are so many variations in road areas, the road can never be robustly detected using heuristic tricks alone.

Fig. 4: Experiment Results: each row shows one experiment; each column shows the result of a different stage
Fig. 9: Failed Examples

V. FUTURE WORK AND CONCLUSION

Road detection is an important and hard problem in computer vision. In this project, features learned by a deep neural network are used to address it, and two heuristic refinements, Color Plane Fusion and Line Refinement, improve the result. The road area is hard to model and to differentiate from the other parts of a street scene. Future work may include road-mark removal and shadow removal algorithms to eliminate the "noise" on the road area, as well as a more powerful model that takes contextual information into account. Pedestrian and car detection may also be useful.

ACKNOWLEDGMENT

I would like to thank Alfredo Ramirez for providing the dataset and Professor Mohan Trivedi for teaching the class.

REFERENCES

[1] Jose M. Alvarez, Theo Gevers, Yann LeCun, and Antonio M. Lopez. Road scene segmentation from a single image. In Computer Vision–ECCV 2012, pages 376–389. Springer, 2012.
[2] Jiwon Choi, Wonjun Kim, Haejung Kong, and Changick Kim. Real-time vanishing point detection using the local dominant orientation signature. In 3DTV Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pages 1–4. IEEE, 2011.
[3] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[4] Derek Hoiem, Alexei A. Efros, and Martial Hebert. Recovering surface layout from an image. International Journal of Computer Vision, 75(1):151–172, 2007.