Road Detection from a Single Image
Si Chen
Department of Electrical and Computer Engineering
University of California San Diego
La Jolla, California 92037
Email: [email protected]
Abstract—Road detection is important both in computer vision and in intelligent vehicle systems. In this project, road areas are detected automatically by a Deep Neural Network (DNN) [3]. The DNN result fails mainly for two reasons: the model is too general to label the input image precisely, and the similarity between sidewalk and road makes them hard to discriminate. I therefore use the Color Plane Fusion method proposed in [1] and edge information to refine the labeled image. Experiments show that these refinement steps do improve performance.
I. INTRODUCTION
Road detection is a significant problem in computer vision, with applications such as autonomous driving, pedestrian detection, scene parsing, and 3D reconstruction. Many algorithms have been developed to let machines understand a scene and recognize many kinds of objects by statistical modeling. These statistical models often describe local information such as texture and color, and some also exploit contextual information through generative models like Markov Random Fields or discriminative models like Conditional Random Fields, which give some impressive results but are not robust enough. In this project, I focus only on segmenting the road, one of the most challenging tasks because of its many variations. Information from various kinds of sensors can be used for road detection, such as stereo maps, depth information, and RGB pictures. Typically, the more information you use, the more reasonable the results you may get, at the expense of more processing time and more devices. I choose RGB images as the only input because they are the most accessible.
Current algorithms for road detection mainly consist of two steps: feature extraction followed by a classifier. Some also contain a segmentation stage that uses color information to exploit edge information. It has been shown in [1] that machine-generated features are superior to human-designed features, which may be too restrictive to model the complex patterns of road and non-road areas. Hence, I use a DNN to learn features automatically from raw data. Sidewalks are often misclassified as road, which is one of the most challenging problems in road segmentation. I try to deal with this problem by adding an online learning stage using Color Plane Fusion, given that the color of the road is typically different from the color of the sidewalk. The Hough transform is also used to detect the boundaries of the road, which helps further refine the result.
The rest of the report is organized as follows: details of the approach are described in Section III; experimental results and analysis are presented in Section IV; Section V contains future work and the conclusion.

Fig. 1: Structure of Deep Neural Network

II. RELATED RESEARCH

As far as I know, road detection has two main approaches. The first relies on statistical learning methods to model the objects in typical road scenes (usually sky, vertical regions, and road) using labeled ground-truth images [4], [1]; the road region is then predicted by the trained model. [4] also uses edge information by first computing superpixels, but it is too general for the task of road segmentation. The other approach uses the dominant orientations of textures. It relies on the strong perspective effect of road scenes with a single vanishing point: the vanishing point and the road boundaries can be detected from texture orientation, which has been shown to be effective but not very robust [2]. The assumption behind the dominant-texture-orientation approach is too strong, especially in urban areas where many complex objects appear on the road. Therefore, I prefer the former solution. [1] is the work I refer to most; it uses a Convolutional Neural Network as the classifier and refines the result for each specific image by color plane fusion. My method differs in three ways. First, I use a DNN instead of a Convolutional Neural Network to exploit its deep architecture for classification. Second, I use a different measure of uniformity in the Color Plane Fusion step. Third, I use a line detection algorithm to refine the result of the combined model.
III. METHOD
In this section, I first briefly introduce the Deep Neural Network and Color Plane Fusion, then present the combined learning algorithm, and finally describe the line refinement stage.
Fig. 2: Flowchart of the Learning Algorithm

Fig. 3: Flowchart of Line Refinement
A. Learning Stage
1) Deep Neural Network: In the last few years, a large amount of research has been conducted on deep learning. The key idea of deep learning is to learn increasingly abstract representations of the input data layer by layer using unsupervised learning. Traditional neural networks were intended to learn such deep representations, but this is difficult with gradient descent, since training easily gets stuck in poor local optima. The DNN circumvents this with a greedy layer-wise unsupervised pre-training phase, which tends to give the supervised training stage a good initialization.
The DNN is constructed from several stacked Restricted Boltzmann Machines (RBMs), as shown in Fig. 1. An RBM consists of two layers of neurons: a visible layer and a hidden layer. The RBM is special because each neuron is fully connected to the neurons of the other layer, but there are no connections between neurons within the same layer, which makes it convenient to train with Contrastive Divergence (CD). The Theano library in Python is very good for implementing DNNs; in this project, however, I use the MATLAB code provided by Geoffrey E. Hinton on his homepage.
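To make the pre-training concrete, below is a minimal numpy sketch of one CD-1 update for a single binary RBM. This is an illustration of the general algorithm, not Hinton's actual code; the learning rate and the omission of momentum and weight decay are simplifying assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, b_vis, b_hid, v0, lr=0.1):
        # Up-pass: hidden probabilities and a binary sample given the data
        h0_prob = sigmoid(v0 @ W + b_hid)
        h0 = (np.random.rand(*h0_prob.shape) < h0_prob).astype(float)
        # Down-pass and second up-pass: one-step reconstruction statistics
        v1_prob = sigmoid(h0 @ W.T + b_vis)
        h1_prob = sigmoid(v1_prob @ W + b_hid)
        # CD-1 gradient: data statistics minus reconstruction statistics
        batch = v0.shape[0]
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
        b_vis += lr * (v0 - v1_prob).mean(axis=0)
        b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b_vis, b_hid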
2) Color Plane Fusion: The pixel values of the road area are expected to be more uniformly distributed than those of other regions, which is a very useful cue. The key idea of color plane fusion is to find a linear combination of the values from different color planes (R, G, B, nr, ng, H, S, V, etc.) that minimizes the variance of intensity values within small local patches. The transformation can be expressed as

y(i) = ∑_{j=1}^{N} w_j x_j(i)

The goal is to find the optimal w such that var(y) is minimized subject to the constraint ∑_{j=1}^{N} w_j = 1, where N is the number of color planes used. The optimal solution is

w = Σ^{-1} 1 (1^T Σ^{-1} 1)^{-1}

where Σ is the data covariance and 1 is an N-element vector of ones. In this project, I first choose a small area of road for training w and then use w to transform every pixel in the image. For each pixel, I take a small surrounding patch and assume that the pixel values in that patch follow a Gaussian distribution. The probability that the pixel belongs to the road is evaluated as

p = Φ(k/σ̂) − Φ(−k/σ̂)

where Φ(x) is the CDF of N(0, 1) and σ̂ is the estimated standard deviation of the corresponding patch; p is the probability that x falls in the interval [-k, k]. The parameter k controls the relative importance of Color Plane Fusion versus the DNN, which is discussed later. This evaluation is intuitive, since patches of the road area should have a smaller σ̂. I tried using the DNN result to find the road patch for training w, but it gave little improvement and cost more time; in practice, the assumption that the mid-bottom patch is road is quite robust.
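As a minimal numpy/scipy sketch of this stage (the window size, the stability floor, and all function names are my assumptions, not details from [1]):

    import numpy as np
    from scipy.stats import norm
    from scipy.ndimage import uniform_filter

    def learn_fusion_weights(patch_planes):
        # patch_planes: (num_pixels, N) training road patch stacked over N color planes.
        # Minimizing var(y) subject to sum(w) = 1 gives w = (Sigma^-1 1) / (1^T Sigma^-1 1).
        Sigma = np.cov(patch_planes, rowvar=False)
        ones = np.ones(patch_planes.shape[1])
        w = np.linalg.solve(Sigma, ones)
        return w / (ones @ w)

    def road_probability(img_planes, w, k=1.0, win=5):
        # img_planes: (H, W, N) image in N color planes.
        y = img_planes @ w                                 # fused single-channel image
        mean = uniform_filter(y, size=win)                 # local patch mean
        var = uniform_filter(y ** 2, size=win) - mean ** 2
        sigma = np.sqrt(np.maximum(var, 1e-12))            # estimated local std, sigma-hat
        return norm.cdf(k / sigma) - norm.cdf(-k / sigma)  # p = Phi(k/s) - Phi(-k/s)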
3) Combination: Finally, the probability of each pixel x_i being road surface is determined in the same way as in [1]: as the product of the two probabilities estimated by the DNN and by Color Plane Fusion. Fig. 2 gives an overview of the learning algorithm. The confidence map clearly reflects the improvement obtained by combining offline and online learning.
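In code, this step is just an element-wise product of the two confidence maps followed by a threshold; a sketch assuming both maps are (H, W) arrays in [0, 1]:

    def combine(p_dnn, p_cpf, threshold=0.3):
        # Product of offline (DNN) and online (Color Plane Fusion) probabilities,
        # as in [1]; 0.3 is the posterior threshold used in the experiment section.
        return (p_dnn * p_cpf) > threshold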
B. Line Refinement Stage
The line refinement stage is introduced mainly to differentiate road from sidewalk. Edge information, the most useful cue for this problem, has not been considered in the previous stage. As shown in Fig. 3, the objective of the line refinement stage is to find the boundary of the road region; road pixels outside the boundaries are considered false alarms and are discarded. The first step applies the Canny edge detector to the road region (obtained by morphological dilation of the result of the learning stage). Long line segments are then detected by the Hough transform, and nearly horizontal lines are removed first. Let M and N be the numbers of pixels classified as road by the learning stage on either side of a line. Whether the line is a road boundary is decided by a heuristic criterion:
min(M, N)/(M + N) < β
where β is the threshold. If β is high, lines are more easily accepted as road boundaries, so genuine road area may also be discarded; if β is low, only regions very likely to be sidewalk are removed, but real road boundaries may be missed.
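An OpenCV-based sketch of the whole refinement stage follows. β = 0.2 matches the experiment section, but the dilation kernel, the Canny and Hough thresholds, and the slope test for "nearly horizontal" are illustrative assumptions:

    import numpy as np
    import cv2

    def refine_with_lines(road_mask, beta=0.2):
        # road_mask: (H, W) uint8 binary map (1 = road) from the learning stage.
        dilated = cv2.dilate(road_mask, np.ones((15, 15), np.uint8))
        edges = cv2.Canny(dilated * 255, 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                                minLineLength=100, maxLineGap=10)
        ys, xs = np.nonzero(road_mask)
        for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
            if abs(y2 - y1) < 0.2 * abs(x2 - x1):      # skip nearly horizontal lines
                continue
            # Sign of the cross product tells which side of the line each road pixel is on.
            side = (x2 - x1) * (ys - y1) - (y2 - y1) * (xs - x1)
            M, N = np.sum(side > 0), np.sum(side < 0)
            if min(M, N) / max(M + N, 1) < beta:       # heuristic boundary criterion
                minority = side > 0 if M < N else side < 0
                road_mask[ys[minority], xs[minority]] = 0   # discard false alarms
        return road_mask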
IV. RESULTS AND ANALYSIS
A. Experiment
I test the proposed approach on the KITTI-ROAD dataset. 62,500 patches with their ground-truth labels are used to train the DNN. Each patch is 15×20 pixels, and the patches overlap in order to achieve a certain degree of translation invariance. The network has [902 500 500 2000 2] units in its layers; the input is just raw data, 900 RGB values plus 2 position values. Training uses 50 epochs of pre-training (with CD-1) followed by 200 epochs of backpropagation.
Fig. 6: The Role of k: from top to bottom, k is 0.5, 1.0, 1.5, 2.0
Fig. 7: Improvement by Color Plane Fusion: images from left to right are the results of the DNN, Color Plane Fusion, and the combined model
For Color Plane Fusion, I choose the mid-bottom 76×101 patch for learning w, and k is set to 1. For line refinement, β is set to 0.2. Pixels with a posterior probability higher than 0.3 are regarded as road.
Fig. 8: Improvement by Edge Detection
Only the bottom half of each image is used for detecting road. Some test results are shown in Fig. 4.

Fig. 4: Experiment Result: each row shows one experiment result; each column stands for the result of a different stage
B. Result Analysis
1) Deep Neural Network: I compare the test error of the DNN against two kinds of shallow neural networks (Logistic Regression and a Multilayer Perceptron) using the same raw input. Fig. 5 shows the benefit of using a deep architecture over shallow networks.

Fig. 5: Probability of Error

The confusion matrix is:
              Road      non-Road
Road          0.8426    0.1574
non-Road      0.0682    0.9318
2) The Role of k: As mentioned, k is a parameter that decides the relative importance of online learning versus offline learning. If k is small, the online learning result becomes dominant, and vice versa. Fig. 6 shows the results of the learning stage for different values of k. Qualitatively, a smaller k makes the method generalize more, because it relies more on online information; a larger k makes the method more robust, because the DNN model contains more information than Color Plane Fusion. In my experiments, I set k to 1.
3) Why use Color Plane Fusion?: The model captured by the DNN is too general to work accurately on every image; as shown in Fig. 7, it may classify a car surface as road. Through online learning, the road information in that particular image is taken into account, and by combining the two models the algorithm generalizes better.
4) Why use edge information?: It helps to separate sidewalks (which are typically not drivable) from the road. Sidewalks usually have features similar to those of roads. Fig. 8 shows the improvement.
5) Reasons for failure: There are mainly two causes of wrong results:

• severe illumination changes, such as shadows
• road marks

Both act as noise on the road surface, which makes road detection a hard problem. Since there is so much variation in road areas, roads can never be robustly detected using heuristic tricks alone. Fig. 9 shows some failed examples.

Fig. 9: Failed Examples
V. FUTURE WORK AND CONCLUSION
Road detection is an important and hard problem in computer vision. In this project, features learned by a deep neural network are used to address this problem, and two heuristic refinements, Color Plane Fusion and Line Refinement, help improve the result. The road area is hard to model and to differentiate from other parts of the street scene. Future work may include road-mark removal and shadow removal algorithms to eliminate the "noise" on the road area, as well as a more powerful model that takes contextual information into account. Pedestrian and car detection may also be useful.
ACKNOWLEDGMENT

I would like to thank Alfredo Ramirez for providing the dataset and Professor Mohan Trivedi for teaching the class.

REFERENCES

[1] Jose M. Alvarez, Theo Gevers, Yann LeCun, and Antonio M. Lopez. Road scene segmentation from a single image. In Computer Vision – ECCV 2012, pages 376–389. Springer, 2012.
[2] Jiwon Choi, Wonjun Kim, Haejung Kong, and Changick Kim. Real-time vanishing point detection using the local dominant orientation signature. In 3DTV Conference: The True Vision – Capture, Transmission and Display of 3D Video (3DTV-CON), 2011, pages 1–4. IEEE, 2011.
[3] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[4] Derek Hoiem, Alexei A. Efros, and Martial Hebert. Recovering surface layout from an image. International Journal of Computer Vision, 75(1):151–172, 2007.