Streaming Video Facial Recognition on Amazon Web Services
AWS is back with yet another great tool that’s been approved for HIPAA use: Amazon Rekognition, a robust video and image analysis service for whenever your application needs to identify objects, people, or activities. It can even detect text in images, sifting through large volumes of unstructured visual data with deep learning technology developed by Amazon’s computer vision experts.
Today’s walkthrough is a technical introduction to using Rekognition Video to detect faces within a video stream.
Rekognition Video uses a Kinesis video stream for input, along with `FaceSearch` settings that tell the stream processor which face collection to match against and how confident a match must be.
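That face collection has to exist before you create the stream processor. Here’s a minimal sketch of building one with boto3; the bucket and image names are placeholders, and the collection ID matches the one used in the sample request below.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Create the collection referenced by the stream processor's FaceSearch settings.
rekognition.create_collection(CollectionId="collection-with-100-faces")

# Index a known face into the collection from an image stored in S3.
# The bucket and object names here are placeholders.
rekognition.index_faces(
    CollectionId="collection-with-100-faces",
    Image={"S3Object": {"Bucket": "my-face-images", "Name": "matchedImage.jpeg"}},
    ExternalImageId="matchedImage.jpeg",
)
```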
The following diagram shows our completed stream processor architecture:
Although certain implementations of live-streaming analysis will be more robust (terrain analysis, multi-input sourcing, etc.), this walkthrough is meant to provide a high-level overview of Rekognition and its most basic workflow.
Here are the steps to follow to begin your live-stream analysis:
Send streaming video to Rekognition Video via Kinesis video stream
Kinesis video streams can ingest streaming video from just about any source imaginable, including webcams (via GStreamer), mobile phones, video cameras, drones, etc.
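The input stream itself is just a Kinesis Video Streams resource, so you can create it programmatically. Here’s a minimal sketch with boto3, assuming the `inputVideo` stream name used throughout this walkthrough; the retention period is an arbitrary choice for this example.

```python
import boto3

kinesisvideo = boto3.client("kinesisvideo", region_name="us-east-1")

# Create the Kinesis video stream that will carry the live video feed.
# A two-hour retention window is an arbitrary choice for this example.
response = kinesisvideo.create_stream(
    StreamName="inputVideo",
    DataRetentionInHours=2,
)
print(response["StreamARN"])  # you'll need this ARN for CreateStreamProcessor
```

From there, a producer (for example, the Kinesis Video Streams producer SDK’s GStreamer plugin) pushes the actual video frames into the stream.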
Use the `CreateStreamProcessor` API to begin analyzing your Kinesis video data
The Rekognition Video stream processor holds information about the input Kinesis video stream and the output Kinesis data stream, along with the ID of the collection of faces you want to recognize. Here is a sample `CreateStreamProcessor` request:
```json
{
    "Name": "streamProcessorForCam",
    "Input": {
        "KinesisVideoStream": {
            "Arn": "arn:aws:kinesisvideo:us-east-1:nnnnnnnnnnnn:stream/inputVideo"
        }
    },
    "Output": {
        "KinesisDataStream": {
            "Arn": "arn:aws:kinesis:us-east-1:nnnnnnnnnnnn:stream/outputData"
        }
    },
    "RoleArn": "arn:aws:iam::nnnnnnnnnnn:role/roleWithKinesisPermission",
    "Settings": {
        "FaceSearch": {
            "CollectionId": "collection-with-100-faces",
            "FaceMatchThreshold": 85.5
        }
    }
}
```
The `Name` parameter is for your application to use; the stream processor can be named whatever you want. The facial recognition collection and match threshold live in the `Settings` object.
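In code, the same request looks like the following. This is a minimal sketch using boto3, with the ARNs and names carried over from the JSON above (the account numbers remain placeholders). Note that creating a processor doesn’t start it; `StartStreamProcessor` is the call that actually kicks off analysis.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Create the stream processor, mirroring the JSON request above.
rekognition.create_stream_processor(
    Name="streamProcessorForCam",
    Input={
        "KinesisVideoStream": {
            "Arn": "arn:aws:kinesisvideo:us-east-1:nnnnnnnnnnnn:stream/inputVideo"
        }
    },
    Output={
        "KinesisDataStream": {
            "Arn": "arn:aws:kinesis:us-east-1:nnnnnnnnnnnn:stream/outputData"
        }
    },
    RoleArn="arn:aws:iam::nnnnnnnnnnn:role/roleWithKinesisPermission",
    Settings={
        "FaceSearch": {
            "CollectionId": "collection-with-100-faces",
            "FaceMatchThreshold": 85.5,
        }
    },
)

# Creating a processor doesn't start it; analysis begins here.
rekognition.start_stream_processor(Name="streamProcessorForCam")
```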
Read streaming video analysis from Amazon Kinesis Data Streams output stream
Rekognition Video will place a JSON frame record for each analyzed frame into the Kinesis output stream. **Worth noting:** Rekognition doesn’t analyze every single frame that comes via the Kinesis video stream.
The following is a sample record analysis streamed by Rekognition Video:
```json
{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:us-west-2:nnnnnnnnnnnn:stream/stream-name",
            "FragmentNumber": "91343852333289682796718532614445757584843717598",
            "ServerTimestamp": 1510552593.455,
            "ProducerTimestamp": 1510552593.193,
            "FrameOffsetInSeconds": 2
        }
    },
    "StreamProcessorInformation": {
        "Status": "RUNNING"
    },
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.075,
                    "Width": 0.05625,
                    "Left": 0.428125,
                    "Top": 0.40833333
                },
                "Confidence": 99.975174,
                "Landmarks": [
                    { "X": 0.4452057, "Y": 0.4395594, "Type": "eyeLeft" },
                    { "X": 0.46340984, "Y": 0.43744427, "Type": "eyeRight" },
                    { "X": 0.45960626, "Y": 0.4526856, "Type": "nose" },
                    { "X": 0.44958648, "Y": 0.4696949, "Type": "mouthLeft" },
                    { "X": 0.46409217, "Y": 0.46704912, "Type": "mouthRight" }
                ],
                "Pose": {
                    "Pitch": 2.9691637,
                    "Roll": -6.8904796,
                    "Yaw": 23.84388
                },
                "Quality": {
                    "Brightness": 40.592964,
                    "Sharpness": 96.09616
                }
            },
            "MatchedFaces": [
                {
                    "Similarity": 88.863960,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.557692,
                            "Width": 0.749838,
                            "Left": 0.103426,
                            "Top": 0.206731
                        },
                        "FaceId": "ed1b560f-d6af-5158-989a-ff586c931545",
                        "Confidence": 99.999201,
                        "ImageId": "70e09693-2114-57e1-807c-50b6d61fa4dc",
                        "ExternalImageId": "matchedImage.jpeg"
                    }
                }
            ]
        }
    ]
}
```
`FaceSearchResponse` will contain information about any matches found against the input collection you supplied when you created your stream processor. Note that the facial geometry (left eye, right eye, nose, mouth corners) lives in the `DetectedFace` object’s `Landmarks` array, while each item in the `MatchedFaces` array identifies a face from your collection along with a `Similarity` score.
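Reading those records is ordinary Kinesis consumption. Here’s a minimal polling sketch with boto3 that assumes a single-shard output stream named `outputData`; a production consumer would more likely use the Kinesis Client Library or a Lambda trigger.

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Assumes the output stream has exactly one shard, to keep the sketch short.
shard_id = kinesis.describe_stream(StreamName="outputData")[
    "StreamDescription"
]["Shards"][0]["ShardId"]

iterator = kinesis.get_shard_iterator(
    StreamName="outputData",
    ShardId=shard_id,
    ShardIteratorType="LATEST",
)["ShardIterator"]

while True:
    response = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in response["Records"]:
        frame = json.loads(record["Data"])  # payload arrives as raw bytes
        for result in frame.get("FaceSearchResponse", []):
            for match in result.get("MatchedFaces", []):
                print(match["Face"]["FaceId"], match["Similarity"])
    iterator = response["NextShardIterator"]
    time.sleep(1)  # stay under the per-shard read limits
```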
Delete your stream processor for easy cleanup
Eventually, you will gather what you need from your streamed data and will need to do a little housekeeping. Call `DeleteStreamProcessor` with the `Name` value you set during the `CreateStreamProcessor` stage. Note that it’s good to have a few stream processor names available to your application, as a given name can remain unavailable for a few seconds after deletion.
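A minimal cleanup sketch with boto3, stopping the processor before removing it:

```python
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

# Stop the running processor, then remove it entirely.
rekognition.stop_stream_processor(Name="streamProcessorForCam")
rekognition.delete_stream_processor(Name="streamProcessorForCam")
```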
Analyze everything!
This technical overview of Amazon Rekognition Video is meant to be short and sweet, but the library is anything but. I'm curious: how do you see this technology being used in healthcare? My initial thoughts revolve around video-based monitoring for elderly populations, but it could even be used for user verification on healthcare apps!