
Exercises in Python and Image Processing
2015-11-24
白井英俊 Seminar Handout
1. Introduction --- How to Make Pickles
Lutz, M. (2009) 『初めての Python』 [Learning Python], O'Reilly Japan, p. 11
Python's standard pickle module makes object persistence easy. That is, you can easily write
programs that save Python objects to a file and later restore the saved objects.
Ibid., p. 193
Saving objects directly to a file with pickle
The pickle module lets you save any Python object directly to a file, without first converting it
to a string. It is a broadly useful tool for saving data to a file and reading it back in. For
example, to save a dictionary to a file with pickle, you write code like the following.
>>> D = {'a': 1, 'b': 2}
>>> F = open('datafile.txt', 'w')
>>> import pickle
>>> pickle.dump(D, F)    # save the object to the file with pickle
>>> F.close()
To retrieve this dictionary from the file, write code like the following.
>>> F = open('datafile.txt')
>>> E = pickle.load(F)    # load the object back from the file
>>> E
{'a': 1, 'b': 2}
When a dictionary is saved to a file this way, retrieving it requires none of the splitting and
conversion steps shown earlier. The pickle module automatically performs what is called
serialization: the conversion from an object to a byte stream, and from a byte stream back to
an object. The programmer has to do almost nothing for this to happen. However, if you read
the byte-stream data back in and display it as-is, it is nearly meaningless to a human, as shown
below (depending on the serialization mode, the format can be even more complex and inscrutable).
>>> open('datafile.txt').read()
"(dp0¥nS'a'¥np1¥nI1¥nsS'b'¥np2¥nI2¥ns."
pickle converts this format back to the original object automatically, so the programmer does
not need to write code for that either. For details on the pickle module, see Python's standard
library manual, or import pickle in the interactive interpreter and call the help function on it
(with pickle as the argument). It is also worth looking into the shelve module at the same time.
shelve is basically the same as pickle, but differs in that the saved file becomes a kind of
"database" whose entries can be accessed by key, like a dictionary.
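A minimal sketch of the key-based access just described (the filename here is our own choice):
>>> import shelve
>>> db = shelve.open('datafile_shelf')   # open (or create) the shelf file
>>> db['a'] = 1                          # store entries by key, like a dictionary
>>> db['b'] = 2
>>> db.close()
>>> db = shelve.open('datafile_shelf')
>>> db['b']                              # read one entry back by its key
2
>>> db.close()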
2. Capturing Images from the Camera
桑井・豊沢・永田(2014)『実践 OpenCV2.4 for Python』カットシステム
Section 2.5: Displaying Camera Images
Download the "complete set of sample programs for OpenCV2.4 for Python" from the seminar materials page:
DL/2_5/2_5.py
Key parts:
import cv2                   # use OpenCV 2
src = cv2.VideoCapture(0)    # prepare to capture images from the camera
retval, frame = src.read()   # grab one frame
cv2.imshow("Camera", frame)  # display the frame
key = cv2.waitKey(33)        # without this, the display never actually happens
src.release()                # close the video source (whatever you open, close it)
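Pieced together, a minimal sketch of a continuous capture-and-display loop built from these key
parts (the window name and the Esc-to-quit convention are our own choices, not the sample's):

import cv2

src = cv2.VideoCapture(0)            # open the default camera
while True:
    retval, frame = src.read()       # grab one frame
    if not retval:                   # stop if no frame could be read
        break
    cv2.imshow("Camera", frame)      # show the current frame
    key = cv2.waitKey(33)            # wait ~33 ms (about 30 fps) and redraw
    if key & 0xFF == 27:             # Esc quits
        break
src.release()                        # release the camera
cv2.destroyAllWindows()              # close the display window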
3. Face Detection
桑井・豊沢・永田(2014)『実践 OpenCV2.4 for Python』カットシステム
Section 5.6: Object Detection (faces, eyes, people)
Download the "complete set of sample programs for OpenCV2.4 for Python" from the seminar materials page:
DL/5_6/5_6.py
Key parts (in addition to those in 2_5):
HAAR_FILE = 'haarcascade_frontalface_default.xml'
# one of the Haar-like feature files: frontal face detection
cascade = cv2.CascadeClassifier(HAAR_FILE)   # load the Haar-like feature file
# Face detection (returns the coordinates etc. of "face-like" regions of various sizes)
objects = cascade.detectMultiScale(frame, scaleFactor=1.1, minNeighbors=3,
                                   flags=cv2.CASCADE_SCALE_IMAGE, minSize=(0, 0))
for x, y, w, h in objects:   # loop over each detection
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 4, cv2.CV_AA, 0)  # draw a rectangle
cv2.imshow(windowName_1, frame)   # show the detection results
Haar-like features capture structure through light/dark differences in the image, so they are
affected by scene brightness; LBP features are an alternative. Both methods recognize the target
object by combining many small features. More recently, HOG features have been attracting attention.
OpenCV provides data files for Haar-like and LBP face detection. You can also create such files
yourself with OpenCV's programs. To build data for face detection, you need to prepare a large
number of "positive images" that contain faces and "negative images" that do not. For good
accuracy, reportedly at least 7,000 positive images and at least 3,000 negative images are desirable.
Table 1. Data files related to detection (many others exist; only the main ones are listed. Stored in OpenCV's data directory)
Type of cascade classifier                    XML filename
Face detector (default)                       haarcascade_frontalface_default.xml
Face detector (fast Haar)                     haarcascade_frontalface_alt2.xml
Face detector (fast LBP)                      lbpcascade_frontalface.xml
Profile (side-looking) face detector          haarcascade_profileface.xml
Eye detector (separate for left and right)    haarcascade_lefteye_2splits.xml
Mouth detector                                haarcascade_mcs_mouth.xml
Nose detector                                 haarcascade_mcs_nose.xml
Whole person detector                         haarcascade_fullbody.xml
Explanation of the arguments of the detectMultiScale method, from Baggio, D. L. et al. (2012)
Mastering OpenCV with Practical Computer Vision Projects, Packt, p. 268 (available on the seminar
materials page):
• minFeatureSize: This parameter determines the minimum face size that we care about, typically
20 x 20 or 30 x 30 pixels, but this depends on your use case and image size. If you are performing
face detection on a webcam or smartphone where the face will always be very close to the camera,
you could enlarge this to 80 x 80 to have much faster detections, or if you want to detect far away
faces, such as on a beach with friends, then leave this as 20 x 20.
• searchScaleFactor: This parameter determines how many different sizes of faces to look for;
typically it would be 1.1 for good detection, or 1.2 for faster detection that does not find the face
as often.
• minNeighbors: This parameter determines how sure the detector should be that it has detected a
face, typically a value of 3, but you can set it higher if you want more reliable faces, even if many
faces are not detected.
• flags: This parameter allows you to specify whether to look for all faces (default) or only look for
the largest face (CASCADE_FIND_BIGGEST_OBJECT). If you only look for the largest face, it
should run faster. There are several other parameters you can add to make the detection about one
percent or two percent faster, such as CASCADE_DO_ROUGH_SEARCH or CASCADE_SCALE_IMAGE.
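For reference, a sketch of the same call under the cv2 (Python) keyword names; the mapping of the
book's C++ names (minFeatureSize → minSize, searchScaleFactor → scaleFactor) and the input
filename are our own assumptions:

import cv2

img = cv2.imread('photo.jpg')                    # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # detection runs on grayscale
cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
faces = cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,                # searchScaleFactor: 1.1 thorough, 1.2 faster
    minNeighbors=3,                 # higher = more reliable, but misses more faces
    flags=cv2.CASCADE_SCALE_IMAGE,
    minSize=(20, 20))               # minFeatureSize: smallest face to report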
References: http://opencv.blog.jp/python/anime_face_detect
http://blog.adjust-work.com/212/
http://www.takunoko.com/blog/python で遊んでみる-part1-opencv で顔認識/
http://rest-term.com/archives/3131/
http://shkh.hatenablog.com/entry/2012/11/03/052251
4. The Road to Face Identification
Howse, J. (2013) OpenCV Computer Vision with Python. Packt
Chapter 4 (code available)
Baggio, D. L. et al. (2012) Mastering OpenCV with Practical Computer Vision Projects. Packt. Chapter 8
(1) Face detection --- the topic covered so far
Use the Haar or LBP cascades provided with OpenCV, or
use a cascade obtained by training on feature data yourself
(2) Face preprocessing
Face recognition is extremely vulnerable to changes in lighting conditions, face orientation, face
expression, and so on, so it is very important to reduce these differences as much as possible.
The easiest form of face preprocessing is just to apply histogram equalization using the equalizeHist()
function, but for reliability in real-world conditions, we need many sophisticated techniques,
including facial feature detection (for example, detecting eyes, nose, mouth and eyebrows). For
simplicity, this chapter will just use eye detection and ignore other facial features such as the mouth
and nose, which are less useful.
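A minimal sketch of that simplest preprocessing step (the filename is hypothetical):

import cv2

face = cv2.imread('face.png', cv2.IMREAD_GRAYSCALE)  # equalization works on one channel
face_eq = cv2.equalizeHist(face)  # flatten the intensity histogram to reduce lighting differences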
Eye detectors that detect open or closed eyes are as follows:
• haarcascade_mcs_lefteye.xml (and haarcascade_mcs_righteye.xml)
• haarcascade_lefteye_2splits.xml (and haarcascade_righteye_2splits.xml)
Eye detectors that detect open eyes only are as follows:
• haarcascade_eye.xml
• haarcascade_eye_tree_eyeglasses.xml
Search region for the eyes (as a fraction of the detected face size):

Cascade Classifier               EYE_SX  EYE_SY  EYE_SW  EYE_SH
haarcascade_eye.xml              0.16    0.26    0.30    0.28
haarcascade_mcs_lefteye.xml      0.10    0.19    0.40    0.36
haarcascade_lefteye_2splits.xml  0.12    0.17    0.37    0.36

Reliability and detection speed after LBP face detection (Core i7, 2.2 GHz):

Cascade Classifier                   Reliability*  Speed**  Eyes found      Glasses
haarcascade_mcs_lefteye.xml          80%           18 msec  Open or closed  no
haarcascade_lefteye_2splits.xml      60%           7 msec   Open or closed  no
haarcascade_eye.xml                  40%           5 msec   Open only       no
haarcascade_eye_tree_eyeglasses.xml  15%           10 msec  Open only       yes
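A sketch of how the fractions in the first table are used, following the quoted book's approach:
they select the sub-rectangle of the detected face image in which to run the (left) eye detector.

def left_eye_region(face, EYE_SX=0.16, EYE_SY=0.26, EYE_SW=0.30, EYE_SH=0.28):
    # Defaults are the haarcascade_eye.xml row above; face is a grayscale
    # numpy array cropped to the detected face rectangle.
    h, w = face.shape[:2]
    x, y = int(EYE_SX * w), int(EYE_SY * h)
    return face[y:y + int(EYE_SH * h), x:x + int(EYE_SW * w)]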
Preprocessing steps:
 Geometrical transformation and cropping: This process would include scaling, rotating, and
translating the images so that the eyes are aligned, followed by the removal of the forehead, chin,
ears, and background from the face image (see the sketch after this list).
• Rotate the face so that the two eyes are horizontal.
• Scale the face so that the distance between the two eyes is always the same.
• Translate the face so that the eyes are always centered horizontally and at a desired height.
• Crop the outer parts of the face, since we want to crop away the image background, hair,
forehead, ears, and chin.
 Separate histogram equalization for left and right sides: This process standardizes the
brightness and contrast on both the left- and right-hand sides of the face independently.
 Smoothing: This process reduces the image noise using a bilateral filter.
 Elliptical mask: The elliptical mask removes some remaining hair and background from the face
image.
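A rough sketch of the geometrical step in the first bullet, assuming the two eye centers have
already been detected; the output size and the target eye positions are illustrative choices,
not the book's exact values:

import cv2
import numpy as np

def align_face(gray, left_eye, right_eye, size=70):
    # Angle of the line between the eyes; rotating by it makes the eyes horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    # Scale so the eye distance becomes a fixed fraction of the output width.
    scale = (0.4 * size) / np.hypot(dx, dy)
    # Rotate and scale about the midpoint between the eyes ...
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    # ... then translate so that midpoint lands centered, about 1/3 from the top;
    # warping into a small fixed-size output also crops hair, chin, and background.
    M[0, 2] += size / 2.0 - center[0]
    M[1, 2] += size / 3.0 - center[1]
    return cv2.warpAffine(gray, M, (size, size))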
(3) Collect face images and learn from them
Collecting faces can be just as simple as putting each newly preprocessed face into an array of
preprocessed faces from the camera, as well as putting a label into an array (to specify which person
the face was taken from). The face recognition algorithm will then learn how to distinguish between
the faces of the different people. This is referred to as the training phase and the collected faces are
referred to as the training set.
It is important that you provide a good training set that covers the types of variations you expect to
occur in your testing set. For example, if you will only test with faces that are looking perfectly
straight ahead (such as ID photos), then you only need to provide training images with faces that are
looking perfectly straight ahead. But if the person might be looking to the left or up, then you should
make sure the training set will also include faces of that person doing this, otherwise the face
recognition algorithm will have trouble recognizing them, as their face will appear quite different.
One way to obtain a good training set that will cover many different real-world conditions is
for each person to rotate their head from looking left, to up, to right, to down then looking directly
straight.
After you have collected enough faces for each person to recognize, you must train the system to
learn the data using a machine-learning algorithm suited for face recognition. There are many different
face recognition algorithms in the literature, the simplest of which are Eigenfaces and Artificial Neural
Networks. Eigenfaces tends to work better than ANNs, and despite its simplicity, it tends to work
almost as well as many more complex face recognition algorithms, so it has become very popular as
the basic face recognition algorithm for beginners as well as for new algorithms to be compared to.
Any reader who wishes to work further on face recognition is recommended to read the theory behind:
• Eigenfaces (also referred to as Principal Component Analysis (PCA))
• Fisherfaces (also referred to as Linear Discriminant Analysis (LDA))
• Other classic face recognition algorithms (many are available at
http://www.face-rec.org/algorithms/)
• Newer face recognition algorithms in recent Computer Vision research papers (such as CVPR and
ICCV at http://www.cvpapers.com/), as there are hundreds of face recognition papers published each
year
Thanks to the OpenCV team and Philipp Wagner's libfacerec contribution, OpenCV v2.4.1 provided
cv::Algorithm as a simple and generic method to perform face recognition using one of several
different algorithms (even selectable at runtime) without necessarily understanding how they are
implemented.
Here are the three face recognition algorithms available in OpenCV v2.4.1:
• FaceRecognizer.Eigenfaces: Eigenfaces, also referred to as PCA, first used by Turk and Pentland in
1991.
• FaceRecognizer.Fisherfaces: Fisherfaces, also referred to as LDA, invented by Belhumeur,
Hespanha and Kriegman in 1997.
• FaceRecognizer.LBPH: Local Binary Pattern Histograms, invented by Ahonen, Hadid and
Pietikäinen in 2004.
These face recognition algorithms are available through the FaceRecognizer class in OpenCV.
Both the Eigenfaces and Fisherfaces algorithms first calculate the average face that is the
mathematical average of all the training images, so they can subtract the average image from each
facial image to have better face recognition results.
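A hedged sketch of the corresponding calls in the OpenCV 2.4 Python bindings; faces (a list of
equally sized grayscale images), labels (one integer person ID per face), and test_face are
assumed to have been collected as described above:

import cv2
import numpy as np

model = cv2.createEigenFaceRecognizer()   # or createFisherFaceRecognizer(),
                                          # or createLBPHFaceRecognizer()
model.train(faces, np.array(labels))      # the training phase

label, confidence = model.predict(test_face)   # identify the person in one image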
(4) Face recognition
Thanks to OpenCV's FaceRecognizer class, we can identify the person in a photo simply by calling
the FaceRecognizer::predict() function on a facial image.
The problem with this identification is that it will always predict one of the given people, even if
the input photo is of an unknown person or of a car. It would still tell you which person is the most
likely person in that photo, so it can be difficult to trust the result! The solution is to obtain a confidence
metric so we can judge how reliable the result is, and if it seems that the confidence is too low then we
assume it is an unknown person.
To confirm if the result of the prediction is reliable or whether it should be taken as an unknown
person, we perform face verification (also referred to as face authentication), to obtain a confidence
metric showing whether the single face image is similar to the claimed person (as opposed to face
identification, which we just performed, comparing the single face image with many people).
OpenCV's FaceRecognizer class can return a confidence metric when you call the predict() function
but unfortunately the confidence metric is simply based on the distance in eigen-subspace, so it is not
very reliable.
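Continuing the sketch above, the naive confidence gate that this passage warns is unreliable
would look like this; the threshold value is an arbitrary assumption that depends on the
algorithm and the data:

label, confidence = model.predict(test_face)
UNKNOWN_THRESHOLD = 4000.0        # hypothetical; must be tuned empirically
if confidence > UNKNOWN_THRESHOLD:
    label = -1                    # treat the input as an unknown person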
One way to overcome this: reconstruct the face image from the eigenspace, adjust its brightness and so on, and then:
We can now calculate how similar this reconstructed face is to the input face by using the same
getSimilarity() function we created previously for comparing two images, where a value less than 0.3
implies that the two images are very similar.
For Eigenfaces, there is one eigenvector for each face, so reconstruction tends to work well and
therefore we can typically use a threshold of 0.5, but Fisherfaces has just one eigenvector for each
person, so reconstruction will not work as well and therefore it needs a higher threshold, say 0.7.
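A sketch of the getSimilarity() idea referenced above, as the text describes it: the L2 distance
between two equally sized grayscale images, normalized by the number of pixels, so that values
below about 0.3 mean "very similar".

import cv2

def get_similarity(a, b):
    # relative L2 error between two same-sized grayscale images
    error_l2 = cv2.norm(a, b, cv2.NORM_L2)
    return error_l2 / float(a.shape[0] * a.shape[1])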