Update: Nov 28 2011 – The OpenCV framework has been rebuilt using OpenCV SVN revision 7017
Introduction
Hot on the heels of our last article, in which we showed you how to build an OpenCV framework for iOS, we are turning our attention to capturing live video and processing video frames with OpenCV. This is the foundation for augmented reality, the latest buzz topic in computer vision. The article is accompanied by a demo app that detects faces in a real-time video feed from your iOS device’s camera. You can check out the source code for the app at GitHub or follow the direct download link at the end of the article.

As shown in our last article, OpenCV supports video capture on iOS devices using the cv::VideoCapture class from the highgui module. Calling the read method of this class (equivalently, grab followed by retrieve) captures a single video frame and stores it in a cv::Mat object for processing. However, the class is not optimized for processing live video:
- Each video frame is copied several times before being made available to your app for processing.
- You are required to ‘pull’ frames from cv::VideoCapture at a rate that you decide rather than being ‘pushed’ frames in real time as they become available.
- No video preview is supported. You are required to display frames manually in your UI.
In designing image processing apps for iOS devices we recommend that you use OpenCV for what it excels at – image processing – but use standard iOS support for accessing hardware and implementing UI. It may be a philosophical standpoint, but we find that cross-platform layers such as OpenCV’s highgui module always incur performance and design restrictions in trying to support the lowest common denominator. With that in mind, we have implemented a reusable view controller subclass (VideoCaptureViewController) that enables high performance processing of live video using video capture support provided by the AVFoundation framework. The controller automatically manages a video preview layer and throttles the rate at which video frames are supplied to your processing implementation to accommodate processing load. The components of the underlying AVFoundation video capture stack are also made available to you so that you can tweak behaviour to match your exact requirements.
The Video Capture View Controller
The AVFoundation video capture stack and video preview layer are conveniently wrapped up in the VideoCaptureViewController class provided with the demo source code. This class handles creation of the video capture stack, insertion of the video preview layer into the controller’s view hierarchy and conversion of video frames to cv::Mat instances for processing with OpenCV. It also provides convenience methods for turning the iPhone 4’s torch on and off, switching between the front and back cameras while capturing video and displaying the current frames per second.
The details of how to set up the AVFoundation video capture stack are beyond the scope of this article and we refer you to the documentation from Apple and the canonical application sample AVCam. If you are interested in how the stack is created, however, then take a look at the implementation of the createCaptureSessionForCamera:qualityPreset:grayscale: method, which is called from viewDidLoad. There are a number of interesting aspects of the implementation, which we will go into next.
Hardware-acceleration of grayscale capture
For many image processing applications the first processing step is to reduce the full-color BGRA data received from the video hardware to a grayscale image to maximize processing speed when color information is not required. With OpenCV, this is usually achieved using the cv::cvtColor function, which produces a single channel image by calculating the weighted average of the R, G and B components of the original image. In VideoCaptureViewController we perform this conversion in hardware using a little trick and save processor cycles for the more interesting parts of your image processing pipeline.
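For reference, the software conversion that grayscale capture avoids can be sketched in plain C++ as below. It assumes a tightly packed BGRA buffer with no row padding and uses the standard luma weights (0.299 R + 0.587 G + 0.114 B) that cv::cvtColor applies for color-to-gray conversion; OpenCV itself uses fixed-point arithmetic internally, so its output may differ by a gray level in places. `bgraToGray` is a hypothetical helper, not part of OpenCV or the demo project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a tightly packed BGRA buffer (4 bytes per pixel, no row padding) into an
// 8-bit grayscale buffer using the weighted average 0.299 R + 0.587 G + 0.114 B.
std::vector<std::uint8_t> bgraToGray(const std::vector<std::uint8_t>& bgra,
                                     std::size_t width, std::size_t height) {
    std::vector<std::uint8_t> gray(width * height);
    for (std::size_t i = 0; i < width * height; ++i) {
        const double b = bgra[4 * i + 0];
        const double g = bgra[4 * i + 1];
        const double r = bgra[4 * i + 2];
        // bgra[4 * i + 3] is alpha, which does not contribute to luminance.
        gray[i] = static_cast<std::uint8_t>(std::lround(0.299 * r + 0.587 * g + 0.114 * b));
    }
    return gray;
}
```

A pure red pixel maps to gray level 76, pure green to 150 and pure blue to 29. Every pixel costs three multiplications and two additions, which is exactly the work the hardware trick described next saves.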
If grayscale mode is enabled then the video format is set to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange. The video hardware will then supply YUV formatted video frames in two planes: the first (Y) plane contains luminance data and the second plane contains the color information, encoded as interleaved U and V (Cb and Cr) chrominance values at reduced resolution. The controller uses the luminance plane to create a single-channel grayscale image and ignores the chrominance plane. Note that the video preview layer will still display the full-color video feed whether grayscale mode is enabled or not.
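To see why the luminance plane can be used as-is, it helps to look at the memory layout. The sketch below computes the plane sizes of a 4:2:0 bi-planar frame, assuming no row padding (real pixel buffers may pad rows, so production code should use each plane’s bytes-per-row). `FrameSizes` and `frameSizes` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Byte counts for one video frame, assuming no row padding.
struct FrameSizes {
    std::size_t lumaBytes;    // plane 0: one Y byte per pixel, usable directly as grayscale
    std::size_t chromaBytes;  // plane 1: interleaved Cb/Cr pairs, subsampled 2x2 (4:2:0)
    std::size_t bgraBytes;    // full-color BGRA frame, for comparison
};

FrameSizes frameSizes(std::size_t width, std::size_t height) {
    // 4:2:0 subsampling halves the chroma resolution in each direction (rounding up).
    const std::size_t chromaWidth = (width + 1) / 2;
    const std::size_t chromaHeight = (height + 1) / 2;
    return FrameSizes{width * height,
                      chromaWidth * chromaHeight * 2,  // 2 bytes (Cb, Cr) per chroma sample
                      width * height * 4};             // 4 bytes (B, G, R, A) per pixel
}
```

For a 640x480 frame this gives 307,200 luma bytes, 153,600 chroma bytes and 1,228,800 bytes for the equivalent BGRA frame, so grayscale mode works on a quarter of the data of a full-color frame and needs no per-pixel arithmetic to obtain it.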
Processing video frames
VideoCaptureViewController implements the AVCaptureVideoDataOutputSampleBufferDelegate protocol and is set as the delegate for receiving video frames from AVFoundation via the captureOutput:didOutputSampleBuffer:fromConnection: method. This method takes the supplied sample buffer containing the video frame and creates a cv::Mat object. If grayscale mode is enabled then a single-channel cv::Mat is created; for full-color mode a BGRA format cv::Mat is created. This cv::Mat object is then passed on to processFrame:videoRect:videoOrientation: where the OpenCV heavy-lifting is implemented. Note that no video data is copied here: the cv::Mat that is created points right into the hardware video buffer and must be processed before captureOutput:didOutputSampleBuffer:fromConnection: returns. If you need to keep references to video frames then use the cv::Mat::clone method to create a deep copy of the video data.
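The difference between wrapping a buffer and copying it can be illustrated without OpenCV. In the sketch below, `GrayView` plays the role of a cv::Mat header over external memory (pointer, size and row stride, no ownership) and `deepCopy` plays the role of cv::Mat::clone. Both names are hypothetical; the point is that the view always reflects whatever the underlying buffer currently holds, so once the hardware reuses the buffer only the deep copy still contains the original frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Non-owning view over external pixel memory, like a cv::Mat header wrapping a video buffer.
struct GrayView {
    const std::uint8_t* data;
    std::size_t rows, cols, step;  // step = bytes per row, which may exceed cols (padding)

    std::uint8_t at(std::size_t r, std::size_t c) const { return data[r * step + c]; }
};

// Owning, tightly packed image, standing in for the result of cv::Mat::clone.
struct GrayImage {
    std::size_t rows, cols;
    std::vector<std::uint8_t> pixels;  // cols bytes per row

    std::uint8_t at(std::size_t r, std::size_t c) const { return pixels[r * cols + c]; }
};

// Copy the visible pixels out of the view, dropping any row padding.
GrayImage deepCopy(const GrayView& v) {
    GrayImage img{v.rows, v.cols, std::vector<std::uint8_t>(v.rows * v.cols)};
    for (std::size_t r = 0; r < v.rows; ++r)
        for (std::size_t c = 0; c < v.cols; ++c)
            img.pixels[r * v.cols + c] = v.at(r, c);
    return img;
}
```

Overwriting the underlying buffer after taking both the view and the copy changes what the view reports but leaves the copy untouched.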
Note that captureOutput:didOutputSampleBuffer:fromConnection: is called on a private GCD queue created by the view controller. Your overridden processFrame:videoRect:videoOrientation: method is also called on this queue. If you need to update UI based on your frame processing then you will need to use dispatch_sync or dispatch_async to dispatch those updates on the main application queue.
VideoCaptureViewController also monitors video frame timing information and uses it to calculate a running average of performance measured in frames per second. Set the showDebugInfo property of the controller to YES to display this information in an overlay on top of the video preview layer.
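The exact averaging used by the controller is not described here, so the following is only one plausible way to compute a running frames-per-second figure: keep the timestamps of the last N frames and divide the number of intervals by the time they span. `FpsMeter` is a hypothetical class; on iOS the timestamps could come from each sample buffer’s presentation time (CMSampleBufferGetPresentationTimeStamp).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical frame-rate meter: keeps the timestamps (in seconds) of the most recent
// frames and reports the average rate over that sliding window.
class FpsMeter {
public:
    explicit FpsMeter(std::size_t window) : window_(window) {}

    void addFrame(double timestampSeconds) {
        times_.push_back(timestampSeconds);
        if (times_.size() > window_) times_.pop_front();  // keep only the last N frames
    }

    // Average frames per second over the window; 0 until two frames have been seen.
    double fps() const {
        if (times_.size() < 2) return 0.0;
        const double span = times_.back() - times_.front();
        return span > 0.0 ? (times_.size() - 1) / span : 0.0;
    }

private:
    std::size_t window_;
    std::deque<double> times_;
};
```

A small window reacts quickly to changes in processing load; a larger one gives a steadier number for an on-screen overlay.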
Video orientation and the video coordinate system
Video frames are supplied by the iOS device hardware in landscape orientation irrespective of the physical orientation of the device. Specifically, the front camera orientation is AVCaptureVideoOrientationLandscapeLeft (as if you were holding the device in landscape with the Home button on the left) and the back camera orientation is AVCaptureVideoOrientationLandscapeRight (as if you were holding the device in landscape with the Home button on the right). The video preview layer automatically rotates the video feed to the upright orientation and also mirrors the feed from the front camera to give the reflected image that we are used to seeing when we look in a mirror. The preview layer also scales the video according to its current videoGravity mode: either stretching the video to fill its full bounds or fitting the video while maintaining the aspect ratio.
All these transformations create a problem when we need to map from a coordinate in the original video frame to the corresponding coordinate in the view as seen by the user and vice versa. For instance, you may have the location of a feature detected in the video frame and need to draw a marker at the corresponding position in the view. Or a user may have tapped on the view and you need to convert that view coordinate into the corresponding coordinate in the video frame.
All this complexity is handled in -[VideoCaptureViewController affineTransformForVideoRect:orientation:], which creates an affine transform that you can use to convert CGPoints and CGRects between the video coordinate system and the view coordinate system. If you need to convert in the opposite direction then create the inverse transform using the CGAffineTransformInvert function. If you are not sure what an affine transform is then just look at the following code snippet for how to use them to convert CGPoints and CGRects between different coordinate systems.
// Create the affine transform for converting from the video coordinate system to the view coordinate system
CGAffineTransform t = [self affineTransformForVideoRect:videoRect orientation:videoOrientation];
// Convert CGPoint from video coordinate system to view coordinate system
viewPoint = CGPointApplyAffineTransform(videoPoint, t);
// Convert CGRect from video coordinate system to view coordinate system
viewRect = CGRectApplyAffineTransform(videoRect, t);
// Create inverse transform for converting from view coordinate system to video coordinate system
CGAffineTransform invT = CGAffineTransformInvert(t);
// Apply the inverse transform (not t) to map view coordinates back into the video frame
videoPoint = CGPointApplyAffineTransform(viewPoint, invT);
videoRect = CGRectApplyAffineTransform(viewRect, invT);
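If the affine transform arithmetic is unfamiliar, the sketch below reproduces it in plain C++. `Affine` uses the same field layout and conventions as CGAffineTransform (x' = a*x + c*y + tx, y' = b*x + d*y + ty), `invert` computes the same inverse that CGAffineTransformInvert returns, and the rectangle overload returns the smallest rectangle containing the four transformed corners, which is how CGRectApplyAffineTransform is documented to behave. The type and function names are hypothetical, and the example rotation is chosen for illustration only; it is not necessarily the transform that affineTransformForVideoRect:orientation: returns on any particular device.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point { double x, y; };
struct Rect { double x, y, width, height; };

// Same layout and conventions as CGAffineTransform:
//   x' = a*x + c*y + tx
//   y' = b*x + d*y + ty
struct Affine { double a, b, c, d, tx, ty; };

Point apply(const Affine& t, Point p) {
    return {t.a * p.x + t.c * p.y + t.tx, t.b * p.x + t.d * p.y + t.ty};
}

// Inverse transform; assumes the transform is invertible (a*d - b*c != 0).
Affine invert(const Affine& t) {
    const double det = t.a * t.d - t.b * t.c;
    return {t.d / det, -t.b / det, -t.c / det, t.a / det,
            (t.c * t.ty - t.d * t.tx) / det, (t.b * t.tx - t.a * t.ty) / det};
}

// Smallest axis-aligned rectangle containing the four transformed corners.
Rect apply(const Affine& t, Rect r) {
    const Point corners[4] = {apply(t, Point{r.x, r.y}),
                              apply(t, Point{r.x + r.width, r.y}),
                              apply(t, Point{r.x, r.y + r.height}),
                              apply(t, Point{r.x + r.width, r.y + r.height})};
    double minX = corners[0].x, maxX = corners[0].x;
    double minY = corners[0].y, maxY = corners[0].y;
    for (const Point& p : corners) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    return {minX, minY, maxX - minX, maxY - minY};
}
```

With the example transform (x, y) → (480 - y, x), a 90-degree clockwise turn of a 640x480 frame, the video point (100, 50) maps to (430, 100), and applying `invert` of the same transform maps it back to (100, 50). A round trip only works with the inverse: applying the original transform a second time would rotate the point again rather than undo the rotation.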
Using VideoCaptureViewController in your own projects
VideoCaptureViewController is designed to be reusable in your own projects by subclassing it just as you would subclass Apple-provided controllers like UIViewController and UITableViewController. Add the header and implementation files (VideoCaptureViewController.h and VideoCaptureViewController.mm) to your project and modify your application-specific view controller(s) to derive from VideoCaptureViewController instead of UIViewController. If you want to add additional controls over the top of the video preview you can use Interface Builder and connect up IBOutlets as usual. See the demo app for how this is done to overlay the video preview with UIButtons. You implement your application-specific video processing by overriding the processFrame:videoRect:videoOrientation: method in your controller. Which leads us to face tracking…
Face tracking
Face tracking seems to be the ‘Hello World’ of computer vision and judging by the number of questions about it on StackOverflow many developers are looking for an iOS implementation. We couldn’t resist choosing it as the subject for our demo app either. The implementation can be found in the DemoVideoCaptureViewController class. This is a subclass of VideoCaptureViewController and, as described above, we’ve added our app-specific processing code by overriding the processFrame:videoRect:videoOrientation: method of the base class. We have also added three UIButton controls in Interface Builder to demonstrate how to extend the user interface. These buttons allow you to turn the iPhone 4 torch on and off, switch between the front and back cameras and toggle the frames-per-second display.
Processing the video frames
The VideoCaptureViewController base class handles capturing frames and wrapping them up as cv::Mat instances. Each frame is supplied to our app-specific subclass via the processFrame:videoRect:videoOrientation: method, which is overridden to implement the detection.
The face detection is performed using OpenCV’s CascadeClassifier and the ‘haarcascade_frontalface_alt2’ cascade provided with the OpenCV distribution. The details of the detection are beyond the scope of this article but you can find lots of information about the Viola-Jones method and Haar-like features on Wikipedia.
The first task is to rotate the video frame from the hardware-supplied landscape orientation to portrait orientation. We do this to match the orientation of the video preview layer and also because OpenCV’s CascadeClassifier only detects upright features in an image. Using this technique, the app can only detect faces when the device is held in the portrait orientation. Alternatively, we could have rotated the video frame based on the current physical orientation of the device to allow faces to be detected when the device is held in any orientation.
The rotation is performed quickly by combining a cv::transpose, which swaps the x axis and y axis of a matrix, and a cv::flip, which mirrors a matrix about a specified axis. A transpose on its own is equivalent to a 90° rotation combined with a mirror. Video frames from the front camera need to be mirrored to match the video preview display anyway, so for the front camera we can perform the rotation with just a transpose and no flip.
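The transpose-and-flip trick can be checked on a tiny matrix. The sketch below uses a hypothetical row-major `Image` struct in place of cv::Mat; `transpose` and `flipAroundYAxis` do what cv::transpose and cv::flip with flip code 1 do. For the back camera the text calls for a transpose plus a flip; assuming the flip is about the y axis (flip code 1), the result is a 90° clockwise rotation, while a transpose alone yields the rotated and mirrored frame used for the front camera.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major single-channel image, standing in for a grayscale cv::Mat.
struct Image {
    std::size_t rows, cols;
    std::vector<int> px;  // rows * cols values
    int at(std::size_t r, std::size_t c) const { return px[r * cols + c]; }
};

// Swap the x and y axes, as cv::transpose does.
Image transpose(const Image& in) {
    Image out{in.cols, in.rows, std::vector<int>(in.px.size())};
    for (std::size_t r = 0; r < in.rows; ++r)
        for (std::size_t c = 0; c < in.cols; ++c)
            out.px[c * out.cols + r] = in.at(r, c);
    return out;
}

// Mirror left-to-right about the y axis, as cv::flip with flip code 1 does.
Image flipAroundYAxis(const Image& in) {
    Image out{in.rows, in.cols, std::vector<int>(in.px.size())};
    for (std::size_t r = 0; r < in.rows; ++r)
        for (std::size_t c = 0; c < in.cols; ++c)
            out.px[r * in.cols + (in.cols - 1 - c)] = in.at(r, c);
    return out;
}

// Transpose followed by a flip about the y axis: a 90-degree clockwise rotation.
Image rotateClockwise(const Image& in) { return flipAroundYAxis(transpose(in)); }
```

Rotating the 2x3 landscape matrix with rows (1, 2, 3) and (4, 5, 6) gives a 3x2 portrait matrix with rows (4, 1), (5, 2) and (6, 3): the left column of the landscape frame (1 above 4) has become the top row, as expected for a clockwise turn. Both steps are simple memory reorderings, which is why the combination is fast.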
Once the video frame is in the correct orientation, it is passed to the CascadeClassifier for detection. Detected faces are returned as an STL vector of rectangles. The classification is run using the CV_HAAR_FIND_BIGGEST_OBJECT flag, which instructs the classifier to look for faces at decreasing size and stop when it finds the first face. If you remove this flag (it is set at the start of DemoVideoCaptureViewController.mm), the classifier instead starts small, looks for faces at increasing size and returns all the faces it detects in the frame.
The STL vector of face rectangles (if any) is passed to the displayFaces:forVideoRect:videoOrientation: method for display. We use GCD’s dispatch_sync here to dispatch the call on the main application thread. Remember that processFrame:videoRect:videoOrientation: is called on our private video processing thread but UI updates must be performed on the main application thread. We use dispatch_sync rather than dispatch_async so that the video processing thread is blocked while the UI updates are being performed on the main thread. This will cause AVFoundation to discard video frames automatically while our UI updates are taking place and ensures that we are not processing video frames faster than we can display the results. In practice, processing the frame will take longer than any UI update associated with the frame, but it’s worth bearing in mind if your app is doing simple processing accompanied by lengthy UI updates.
// Dispatch updating of face markers to main queue
dispatch_sync(dispatch_get_main_queue(), ^{
    [self displayFaces:faces forVideoRect:videoRect videoOrientation:videoOrientation];
});
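The throttling effect of blocking on dispatch_sync can be modelled with a simplified, single-threaded simulation. It assumes, as described above, that a frame arriving while the previous frame is still being handled (processing plus any synchronous UI update) is discarded rather than queued. Times are in whole milliseconds for simplicity, and `framesProcessed` is a hypothetical helper, not part of AVFoundation or the demo.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of "drop frames while busy": frames arrive every frameIntervalMs, and a
// frame is accepted only if the work for the previous accepted frame has finished;
// otherwise it is discarded instead of being queued.
std::size_t framesProcessed(long frameIntervalMs, long busyMsPerFrame, std::size_t frameCount) {
    std::size_t processed = 0;
    long busyUntil = 0;  // time at which the processing queue becomes free again
    for (std::size_t i = 0; i < frameCount; ++i) {
        const long arrival = static_cast<long>(i) * frameIntervalMs;
        if (arrival < busyUntil) continue;  // still busy: this frame is dropped
        busyUntil = arrival + busyMsPerFrame;
        ++processed;
    }
    return processed;
}
```

With frames arriving every 33 ms (roughly 30 fps) and 250 ms of work per frame, only 4 of 30 frames are processed; with 10 ms of work every frame is processed. Because late frames are dropped instead of queued, the results drawn on screen never fall further and further behind the live preview.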
Displaying the face markers
For each detected face, displayFaces:forVideoRect:videoOrientation: creates an empty CALayer of the appropriate size with a 10-point red border and adds it into the layer hierarchy above the video preview layer. These ‘FaceLayers’ are reused from frame to frame and repositioned within a CATransaction block to disable the default layer animation. This technique gives us a high-performance method for adding markers without having to do any drawing.
// Create a new feature marker layer
featureLayer = [[CALayer alloc] init];
featureLayer.name = @"FaceLayer";
featureLayer.borderColor = [[UIColor redColor] CGColor];
featureLayer.borderWidth = 10.0f;
[self.view.layer addSublayer:featureLayer];
// The superlayer now retains the new layer, so release our reference (manual reference counting)
[featureLayer release];
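The reuse strategy can be sketched independently of Core Animation. In the hypothetical C++ model below, `Marker` stands in for a FaceLayer and `updateMarkers` reuses existing markers, creates new ones only when a frame contains more faces than any earlier frame, and hides markers left over from a previous frame. Hiding surplus markers is an assumption about how unused layers are handled; in the app the repositioning happens inside a CATransaction with the default animations disabled, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { double x, y, width, height; };

// Hypothetical stand-in for a marker CALayer: just a frame and a visibility flag.
struct Marker {
    Rect frame;
    bool hidden;
};

// Reuse existing markers for this frame's faces, create new ones only when the pool is too
// small, and hide any markers left over from an earlier frame.
// Returns how many markers had to be created.
std::size_t updateMarkers(std::vector<Marker>& pool, const std::vector<Rect>& faces) {
    std::size_t created = 0;
    for (std::size_t i = 0; i < faces.size(); ++i) {
        if (i == pool.size()) {
            pool.push_back(Marker{faces[i], false});  // pool exhausted: create a marker
            ++created;
        } else {
            pool[i].frame = faces[i];                 // reuse: just reposition and show
            pool[i].hidden = false;
        }
    }
    for (std::size_t i = faces.size(); i < pool.size(); ++i) pool[i].hidden = true;
    return created;
}
```

After a frame with two faces the pool holds two markers; a following frame with one face creates nothing, repositions the first marker and hides the second.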
The face rectangles passed to this method are in the video frame coordinate space. For them to line up correctly with the video preview they need to be transformed into the view’s coordinate space. To do this we create a CGAffineTransform using the affineTransformForVideoRect:orientation: method of the VideoCaptureViewController class and use this to transform each rectangle in turn.
The displayFaces:forVideoRect:videoOrientation: method supports display of multiple face markers even though, with the current settings, OpenCV’s CascadeClassifier will return the single largest face that it detects. Remove the CV_HAAR_FIND_BIGGEST_OBJECT flag at the start of DemoVideoCaptureViewController.mm to enable detection of multiple faces in a frame.
Performance
On an iPhone 4 using the CV_HAAR_FIND_BIGGEST_OBJECT option the demo app achieves up to 4 fps when a face is in the frame. This drops to around 1.5 fps when no face is present. Without the CV_HAAR_FIND_BIGGEST_OBJECT option multiple faces can be detected in a frame at around 1.8 fps. Note that the live video preview always runs at the full 30 fps irrespective of the processing frame rate and processFrame:videoRect:videoOrientation: is called at 30 fps if you only perform minimal processing.
The face detection could obviously be optimized to achieve a faster effective frame rate and this has been discussed at length elsewhere. However, the purpose of this article is to demonstrate how to efficiently capture live video on iOS devices. What you do with those frames and how you process them is really up to you. We look forward to seeing all your augmented reality apps in the App Store!
Links to demo project source code
Git – https://github.com/aptogo/FaceTracker
Download zip – https://github.com/aptogo/FaceTracker/zipball/master

Comments
Thanks, nice example; it helped me a lot.
Now I am working on eye blink detection. Can you please guide me on the same?
Thanks & Regards
Hitesh
For each face detected, perform another detection using the haarcascade_eye_tree_eyeglasses.xml cascade. You can restrict the eye detection to the upper portion of the face rectangle to speed things up. If you detect two eyes then everything is OK. No eyes = a blink, one eye = a wink
I actually implemented this in the demo but stripped it out because it was getting too complicated and moving away from the original purpose of just showing how to process live video.
Thanks for the reply.
If I blink, it still detects both eyes, so I am not able to detect an actual blink.
Can you provide me some sample code for it.
Thanks
Can you please give me some sample for eye tracking?
Thanks
Hi great articles!!
1) ‘NSInvalidArgumentException’, reason: ‘videoMirrored cannot be set because it is not supported by this connection. Use -isVideoMirroringSupported’ When I run this.
2) On previous article red/blue colors reversed on video capture, how to fix?
thanks great stuff!!!
1) Thanks for pointing that out! The lines setting the videoMirrored property were left over from an experiment. I’m developing on iOS5 where this property is supported so I didn’t see the exception. I’ve updated the Git repo. Look out for articles on new iOS5 features for image processing when the NDA is lifted.
2) OpenCV defaults to BGRA channel order rather than the RGBA order that you might be used to. It is the developer’s responsibility to ensure that cv::Mat objects are ordered correctly before display. Use cv::cvtColor (for example with cv::COLOR_BGRA2RGBA) if you want to switch between BGRA and RGBA.
Thanks for the excellent articles. Can you give me some guidance about how I would modify your code to apply a convolution kernel to the pixels within the face rect and then display the result in the rect?
Thanks in advance!
Jim
How to restrict the eye detection to the upper portion of the face rectangle to speed things up? For eye blink detection. Can you please give me some idea?
Thanks
Any idea why i can’t add images to a view? I’m trying to convert a cv::Mat back to a UIImage, but it never shows in the view…
This worked with the code from your older post, but in this version it isn’t working for some reason… Any idea why?
- (void)viewDidLoad
{
[super viewDidLoad];
UIImage *testImage = [UIImage imageNamed:@"ie.png"];
cv::Mat tempMat = [testImage CVMat];
cv::cvtColor(tempMat, tpl, cv::COLOR_RGB2GRAY);
myView.image = [UIImage imageWithCVMat:tpl];
}
I’ve tried your code above and it works as expected. I had to declare ‘tpl’ as follows because it was missing from your code:
cv::Mat tpl;
There is no difference between the UIImage extensions between this project and the previous one so I’m not sure why you are seeing problems. Can you step through with the debugger and check that ‘testImage’ is being loaded (i.e. not nil)?
Hello, I am not able to track face in landscape mode.
Can you please provide me with a solution?
Thanks
Thanks so much for putting this together… just starting OpenCV and it’s so helpful to have some working code to start with.
One problem: I’m trying to run on an old iPhone 3, so need to use armv6… but Xcode tells me that “aptogo-FaceTracker/OpenCV.framework/OpenCV, file is universal but does not contain a(n) armv6 slice for architecture armv6”…
reading your earlier post, it seemed like the build script was doing an armv6 build? perhaps you haven’t tested with an armv6 device?
makesensical?
Mark,
I recently re-built the framework for svn revision 7017. You are right that the armv6 slice is missing. I tested the original build on an iPhone 3 but didn’t re-test with this new version. My fault, sorry!
I’ll fix this and push a new version to GitHub. Thanks very much for pointing this out.
I’ve updated the project on GitHub to re-instate the missing armv6 slice in the framework and also to address a compatibility problem with iPhone 3.
Hi,
I have a problem using “some” functions from OpenCV… When I compile your project, everything is OK and works well. But if, for example, I want to use the function cvSmooth() (it doesn’t matter for what), it doesn’t work.
I don’t understand why? Any suggestion?
PS: Interesting is, that I can use Erode(), Dilate(), but cvSmooth() doesn’t.
You can use cv::medianBlur, cv::GaussianBlur etc. to do the job.
The application in Part 1 built and worked fine, but the code from Part 2 gives the following error:
2012-01-29 00:06:37.343 FaceTracker[1596:10703] *** Terminating app due to uncaught exception ‘NSUnknownKeyException’, reason: ‘[ setValue:forUndefinedKey:]: this class is not key value coding-compliant for the key view.’
I have absolutely no experience with iPad or Objective C, so I can’t do much. Judging from the fact that others are able to build this without any problems, unless you have made a change to the code since you submitted the article, it seems there is a problem with my Xcode settings. A compiler issue perhaps?
Thanks
Hi CK,
This is a problem that I haven’t seen before. What device are you trying to run this on and which version of iOS is it? Do you see any more information in the stack backtrace showing where in the code the exception was thrown?
Robin
How do I convert this to run with a video stream from the machine? It doesn’t seem to do anything in the simulator. How do I use this to run in the simulator?
You cannot capture video using the simulator. You must run this on an iOS device with a camera.
Thank you Robin.
Can you help me run this program by taking video from the machine instead of from the camera? What changes do I have to make? Thanks.
Hi,
Is there a way I can get the curve of the face instead of a rectangle? I tried searching for it but was not able to find anything for iOS. Can you please guide me through the same? Any help would be great.