Facial Recognition — a visual step by step
About a month ago I wrote an article about how I performed facial recognition with a single LinkedIn profile picture.
After that post, a co-worker asked me how facial recognition actually works. I tried to explain it, textbook style, but I realized that is not very effective. When working with a visual technology, being able to see how all of the parts come together, visually, is really powerful.
So I put together a Jupyter Notebook, which you can find on my GitHub, that shows each step visually.
Since Medium is obviously not a Jupyter notebook, I wanted to run through the static highlights here, but I encourage you to clone the repo and try it. The one caveat is that I have only tested the notebook on a Mac. The notebook uses the video camera to show you the HOG image, the detected face, and the facial landmarks, all running in realtime.
Step 1 — Find the faces in an image
Create a Histogram of Oriented Gradients (HOG)
What that means is: look at every pixel, along with its surrounding pixels, and ‘draw an arrow’ toward the darkest neighboring pixel. Do this for every pixel in the image. The reason to look at the gradients instead of just pixel intensity is that the gradients typically do not change with variations in the lightness and darkness of an image. A light or bright picture of a person will have roughly the same gradients as a darker picture of the same person.
Dealing with the data at the pixel level can mean a great deal of data, so typically the algorithm considers 16x16 pixel segments and creates a single resulting gradient ‘arrow’ from the majority vote of the contained gradients. With the 16x16 segments, you can start to see the outline of the image below.
If you are running the code from my GitHub repo, you would execute the following:
python video_hog.py
Press ‘q’ to quit out of the application. This is consistent for all of the applications in this post.
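If you would rather generate a single static HOG image outside of the repo, here is a minimal sketch using scikit-image and OpenCV (the image path is a placeholder; video_hog.py applies the same idea to every webcam frame):

# A quick way to see a static HOG image with scikit-image and OpenCV.
import cv2
from skimage import exposure
from skimage.feature import hog

# hypothetical input path; any picture with a face will do
image = cv2.imread("images/dataset/pat_ryan/example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 16x16 cells: each cell contributes one group of 'arrows' voted on
# by the gradients of the pixels it contains
_, hog_image = hog(
    gray,
    orientations=9,
    pixels_per_cell=(16, 16),
    cells_per_block=(1, 1),
    visualize=True,
)

# rescale the float image so OpenCV can display it
hog_image = exposure.rescale_intensity(hog_image, out_range=(0, 255)).astype("uint8")
cv2.imshow("HOG", hog_image)
cv2.waitKey(0)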
Use the HOG to find the faces
The face_recognition and dlib libraries provide a programmatic interface to pre-trained models that can take a HOG and determine the area of the face.
You can see this by running the following from the GitHub repo:
python video_hog_face_detect.py
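Under the hood, the detection is essentially one library call. Here is a minimal sketch of HOG-based face detection with face_recognition (the image path is a placeholder):

# Detect faces using the HOG-based detector in face_recognition.
import cv2
import face_recognition

image = face_recognition.load_image_file("images/dataset/pat_ryan/example.png")

# model="hog" selects the HOG detector rather than the CNN detector
boxes = face_recognition.face_locations(image, model="hog")

# draw a rectangle around each detected face
for top, right, bottom, left in boxes:
    cv2.rectangle(image, (left, top), (right, bottom), (0, 255, 0), 2)

cv2.imshow("Faces", cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
cv2.waitKey(0)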
Step 2 — Locate the 68 Facial Landmarks
Dlib comes with a pre-trained facial landmark detector that locates 68 distinct facial landmarks.
These facial landmarks will then be used as input to a CNN to generate 128 values representing the face.
Here is what that looks like:
You can run this for yourself from the GitHub repo:
python face_landmarks.py
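For a single image, the same landmarks are available through one call in the face_recognition library, which wraps dlib's 68-point shape predictor. A small sketch (the image path is a placeholder):

# Pull the 68 facial landmarks, grouped by facial feature.
import face_recognition

image = face_recognition.load_image_file("images/dataset/pat_ryan/example.png")
landmarks = face_recognition.face_landmarks(image)

for face in landmarks:
    for feature, points in face.items():
        # e.g. 'chin', 'nose_bridge', 'left_eye', each a list of (x, y) points
        print(feature, len(points), "points")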
Step 3 — Create a 128-value encoding of the facial landmarks
The 128 encoding values are generated with a pre-trained CNN. The model has already been trained on millions of training images and can therefore create reliable encodings for faces it has not seen before. Pictures of the same person should give roughly the same 128-value encoded vector.
The process looks a little like the following (disclaimer — I am not sure what the exact CNN architecture looked like so this is just a loose interpretation):
The face_recognition library provides a very simple interface to get the encodings from an image, along with the collection of rectangles that represent the faces in the image.
As an example, below is a picture of the values for my face. I am showing indexes 0, 32, 64, 96, and 127.
You can try this yourself by running:
python encodings_display.py
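Getting those encodings for a single image looks roughly like this; a minimal sketch with the face_recognition library (the image path is a placeholder):

# Generate the 128-value encoding for each face found in an image.
import face_recognition

image = face_recognition.load_image_file("images/dataset/pat_ryan/example.png")
boxes = face_recognition.face_locations(image, model="hog")
encodings = face_recognition.face_encodings(image, known_face_locations=boxes)

for encoding in encodings:
    print(encoding.shape)                  # (128,)
    print(encoding[[0, 32, 64, 96, 127]])  # the same sample indexes shown above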
Once you have the 128-value encoded vector, and the name of the person as the label, determining which person a face belongs to becomes a simple matter of calculating the distance from the new facial encoding to each of the known encodings and selecting the one with the lowest distance value.
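That lookup is just a nearest-neighbor search. Here is a rough sketch, assuming known_encodings is a list of 128-value NumPy arrays and known_names holds the matching labels:

# Find the known encoding closest to a new one by Euclidean distance.
import numpy as np

def closest_match(new_encoding, known_encodings, known_names):
    distances = np.linalg.norm(np.array(known_encodings) - new_encoding, axis=1)
    best = np.argmin(distances)
    return known_names[best], distances[best]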
For this article and the notebook demonstration I am using a dataset from PyImageSearch.com. I highly recommend Adrian’s website and his books. He is a gifted instructor and I have learned a tremendous amount from him.
In Adrian’s article, he has a dataset of Jurassic Park characters. Using that dataset, I created a collection of encodings for each of those characters, along with encodings of myself. That combined dataset was used to test whether the face_recognition library could tell that a new picture was me.
To encode all of the pictures, create a directory that contains folders for each person and execute the following:
python create_facial_encodings.py -d images/dataset -e encodings/facial_encodings.pkl -r true
The -d option is the root directory containing the directories for each person. The -e option is where to output the encodings file, and the -r option indicates whether any existing encodings file should be removed first. If you do not use -r, the script will add to the existing encodings file.
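This is not the repo's script, but the core of such an encoding pass looks roughly like the following sketch; the pickle layout shown here is an assumption, not necessarily what create_facial_encodings.py writes:

# Walk the dataset, encode every face, and pickle the results.
import os
import pickle
import face_recognition

dataset_dir = "images/dataset"
known_encodings, known_names = [], []

for name in os.listdir(dataset_dir):
    person_dir = os.path.join(dataset_dir, name)
    if not os.path.isdir(person_dir):
        continue
    for filename in os.listdir(person_dir):
        image = face_recognition.load_image_file(os.path.join(person_dir, filename))
        for encoding in face_recognition.face_encodings(image):
            known_encodings.append(encoding)
            known_names.append(name)  # the folder name is the label

with open("encodings/facial_encodings.pkl", "wb") as f:
    pickle.dump({"encodings": known_encodings, "names": known_names}, f)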
Step 4 — Calculate the distance of a new image against the encodings
Given the 128-value encoding of a new image, calculate the Euclidean distance to each known encoding and select the one with the shortest distance. By default the face_recognition library uses a threshold of 0.6, but you should experiment with that value to reduce false positives. I found a value of 0.5 or 0.55 worked better.
Here is the picture I used from my LinkedIn profile:
Let’s create an encoding of the LinkedIn picture and compare it against the encodings of the other images.
python calculate_encoding_distance.py -i images/pat_ryan_linkedin/pat.ryan.smaller.png -e encodings/facial_encodings.pkl
The script above takes the image to compare and the encodings file, and produces an ordered list of distances. Below is a shortened list of its output.
(0.32661354282556054, 'pat_ryan')
(0.3396320380221242, 'pat_ryan')
(0.37282213715155005, 'pat_ryan')
(0.3873396941553151, 'pat_ryan')
(0.3931766472928196, 'pat_ryan')
(0.5899434975941679, 'john_hammond')
(0.5915515133378403, 'pat_ryan')
(0.5931512383511677, 'pat_ryan')
(0.6181329489347294, 'john_hammond')
(0.6220354071101851, 'pat_ryan')
(0.6288404317148943, 'claire_dearing')
(0.6289871731418288, 'pat_ryan')
(0.6362522359834057, 'ellie_sattler')
(0.6385852659752145, 'ian_malcolm')
(0.6402352852332767, 'pat_ryan')
(0.64030341755603, 'ian_malcolm')
(0.6481591277972383, 'john_hammond')
(0.6492390421627161, 'pat_ryan')
(0.6536934378322224, 'pat_ryan')
(0.6558663581518828, 'john_hammond')
(0.6581323325011266, 'ian_malcolm')
(0.6581323325011266, 'ian_malcolm')
(0.9935973296143709, 'claire_dearing')
You can see from the list above that even with the default threshold of 0.6, most, but not all, of the matching faces were mine; John Hammond snuck in there. A threshold of 0.5 would have matched only my encodings.
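You can produce a list like this with the library's own helpers as well. A sketch, assuming the pickle layout from the earlier snippet:

# Compare the LinkedIn picture against every stored encoding.
import pickle
import face_recognition

with open("encodings/facial_encodings.pkl", "rb") as f:
    data = pickle.load(f)  # assumed layout: {"encodings": [...], "names": [...]}

image = face_recognition.load_image_file("images/pat_ryan_linkedin/pat.ryan.smaller.png")
new_encoding = face_recognition.face_encodings(image)[0]

# face_distance returns the Euclidean distance to each known encoding
distances = face_recognition.face_distance(data["encodings"], new_encoding)
for distance, name in sorted(zip(distances, data["names"])):
    print((distance, name))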
Try it yourself
You might want to try this yourself, using your own face. In the Jupyter Notebook there is a section called Photo Booth. You can use your computer’s webcam to collect some pictures, encode them, and run the facial recognition for yourself.
To collect 10 pictures, you can run the following (obviously, for the -n parameter you should use your own name):
python capture_webcam_face_images.py -d images/dataset -n ernest_t_bass -c 10
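If you are curious what such a capture loop involves, here is a bare-bones sketch with OpenCV; the save-on-keypress behavior is an assumption, not necessarily how capture_webcam_face_images.py works:

# Capture webcam frames and save them as training pictures.
import os
import cv2

out_dir = "images/dataset/ernest_t_bass"  # use your own name here
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(0)
count = 0
while count < 10:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("c"):    # press 'c' to save the current frame
        cv2.imwrite(os.path.join(out_dir, f"{count:02d}.png"), frame)
        count += 1
    elif key == ord("q"):  # press 'q' to quit, as elsewhere in this post
        break

cap.release()
cv2.destroyAllWindows()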
Then create a new set of encodings that includes your pictures:
python create_facial_encodings.py -d images/dataset/ernest_t_bass -e encodings/facial_encodings.pkl
Notice that in this case the -d option points at the directory of your pictures, NOT the dataset directory; we only want to add your pictures to the collection. Also notice that the script does not use the -r option, because we do not want to remove the existing encodings, just add to them.
After we have the encodings, run the video script to see if the video camera recognizes you:
python video_facial_recognition.py -e encodings/facial_encodings.pkl --distance-tolerance 0.5
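The realtime loop ties all four steps together. Here is a condensed sketch of the idea (the repo's video_facial_recognition.py handles the details more carefully):

# Detect, encode, and label faces on every webcam frame.
import pickle
import cv2
import face_recognition

with open("encodings/facial_encodings.pkl", "rb") as f:
    data = pickle.load(f)  # assumed layout: {"encodings": [...], "names": [...]}

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    boxes = face_recognition.face_locations(rgb, model="hog")
    encodings = face_recognition.face_encodings(rgb, boxes)
    for (top, right, bottom, left), encoding in zip(boxes, encodings):
        distances = face_recognition.face_distance(data["encodings"], encoding)
        name = "Unknown"
        if len(distances) and distances.min() <= 0.5:  # distance tolerance
            name = data["names"][distances.argmin()]
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, name, (left, top - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Facial Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()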
In closing…
I hope this gives you a more visual, hands-on insight into how facial recognition works.
Sources
This post is really a collection of my readings and my journey to understand how facial recognition works. I used a couple of sources that were invaluable in learning about this, and I recommend you check them out.
Adrian has amazing articles and equally amazing books. His material is clearly written and accurate. I highly recommend his website and books.
Adam is the creator and maintainer of the face_recognition library, which provides a fantastic wrapper around the dlib library. His library makes the task very straightforward and accessible to those without a deep machine learning background. His articles cover more than just computer vision, and his book is also a great read.