iOS SDK Augmented Reality: Camera Setup

The concept of Augmented Reality (AR) has received a groundswell of interest in recent years as the iPhone and other mobile devices have placed ever more powerful processors, sensors, and cameras into the hands of millions of people across the globe. For the first time since the term was coined in the 1990s, the average consumer can now peer through the screen of their smartphone and find a layer of reality they never knew existed. This Mobiletuts+ premium series will venture into the world of both Augmented Reality and Mixed Reality. It will demonstrate, step-by-step, how to merge the iPhone camera and sensors with computer graphics to create an enhanced, modified, or even distorted version of the world around you.

The emphasis of this tutorial series is on understanding the components available on the iPhone that enable Augmented Reality and Mixed Reality experiences. This will be achieved by creating a simple AR demo, and this tutorial will start us off on that path by tackling the most fundamental task in many AR apps: gaining programmatic access to the device video feed.


Step 1: Create a New Xcode Project

Begin by launching Xcode and creating a new project. Select "View-based Application" as the template type and click "Next":


On the next screen, enter "ARDemo" as the product name (this is the project name used throughout the rest of this tutorial).

For the Company Identifier field, I've used "com.mobiletuts.ardemo", but you will need to provide the unique identifier that matches your own Developer or Ad Hoc Distribution provisioning profile.

The process of setting up a device for application testing is beyond the scope of this tutorial, but documentation on this process is available in Apple's iOS Developer Library and the iOS Provisioning Portal.

Note that you will need to be a member of the paid iOS Developer Program in order to test applications on physical iOS devices, and a physical device is essential for this series: the iOS Simulator does not provide a camera feed.

Select "iPhone" for the Device Family dropdown. While this tutorial will focus on building a simple AR application for the iPhone, the principles demonstrated will apply to any other iOS device with the required hardware.

Uncheck the "Include Unit Tests" box, and then click "Next". Unit testing is unnecessary for this AR demo project, but can be a great help in the real-world software development lifecycle (official Apple documentation on unit testing).


The final step in this process is to select where you'd like to store the project on your hard drive and click "Create".


Step 2: Import the Framework Requirements

To begin working with camera data and complete this tutorial, we're going to need the AVFoundation framework. The AVFoundation framework provides the necessary functionality to capture images from the device camera, which is the primary focus of this tutorial. It also provides methods to create, manage, and play back other media resources.

For more detailed information, check out Apple's official AVFoundation Reference and Programming Guide.

The process of importing a framework into an Xcode 4 project is straightforward and likely well understood, but, for those who are new to Xcode 4, it will be demonstrated nonetheless.

Select the "ARDemo" project in the Xcode 4 Project Navigator. Next click the "ARDemo" target and then click the "Build Phases" tab before expanding the "Link Binary With Libraries" dropdown.


Click the "+" symbol and select the "AVFoundation.framework" value from the list. Click "Add".


After the framework has been added and is displayed in the Xcode Project Navigator, drag it into the "Frameworks" folder to keep the project orderly.

Repeat this process to add both "CoreMedia.framework" and "CoreVideo.framework".

Now that we have the above frameworks in our project, we need to make them available to our code. Open ARDemoViewController.h and add the following line:
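That line is the standard AVFoundation umbrella header import:

    #import <AVFoundation/AVFoundation.h>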


Step 3: Configure the View Controller Interface

For this lesson, we're going to need to declare two data members and one method in the ARDemoViewController.h file:
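A minimal sketch of those declarations might look like the following (the ivar and method names are the ones referenced throughout the rest of the tutorial; the exact layout is an assumption):

    @interface ARDemoViewController : UIViewController
    {
        AVCaptureSession *cameraCaptureSession;
        AVCaptureVideoPreviewLayer *cameraPreviewLayer;
    }

    - (void)initializeCaptureSession;

    @end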

On line 3, an instance of AVCaptureSession is declared. The AVCaptureSession class is the core component used to manage video input received from the device camera. In addition to managing the camera input, the class also provides delegate methods that make each frame available to your application for custom processing. For this reason, I like to think of it as a media "pipeline".

On line 4, an instance of AVCaptureVideoPreviewLayer is declared. This is a special subclass of CALayer designed to work with an AVCaptureSession by displaying the video output streamed from the camera.

On line 7 the -(void)initializeCaptureSession method is declared. This method will initialize cameraCaptureSession, link cameraPreviewLayer to cameraCaptureSession, and add cameraPreviewLayer to the view.


Step 4: Check for Camera Availability

The rest of this tutorial will focus on the -initializeCaptureSession method declared in Step 3. This method should be called from -viewDidLoad because we want the device camera to immediately begin streaming when the application loads. Let's go ahead and set this up:
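A sketch of that hook, using the -viewDidLoad stub the view-based template already provides:

    - (void)viewDidLoad
    {
        [super viewDidLoad];

        //Begin configuring and streaming the camera as soon as the view loads
        [self initializeCaptureSession];
    }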

If you've done much work with the camera in UIKit, you're probably already familiar with the UIImagePickerController class method +availableMediaTypesForSourceType:, which returns an NSArray of the media types available for a given source type. You can then iterate over the returned array looking for the desired media type to determine whether or not the device can take video.
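For reference, that UIKit-style check might look something like the sketch below; it relies on the kUTTypeMovie constant, which requires the MobileCoreServices framework:

    #import <MobileCoreServices/MobileCoreServices.h>

    //Ask which media types the camera source offers, then
    //look for the movie type to confirm video capability
    NSArray *cameraMediaTypes = [UIImagePickerController availableMediaTypesForSourceType:UIImagePickerControllerSourceTypeCamera];
    BOOL deviceCanTakeVideo = [cameraMediaTypes containsObject:(NSString *)kUTTypeMovie];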

The AVFoundation framework provides similar functionality for finding available sources of input (called "capture devices").
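A sketch of that lookup, placed at the start of -initializeCaptureSession, might look like the following (the UIAlertView in the else branch is just one simple way to respond when no rear camera is found):

    //Attempt to find the rear-facing camera
    NSArray *videoDevices = [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo];
    AVCaptureDevice *captureDevice = nil;
    for (AVCaptureDevice *device in videoDevices)
    {
        if ([device position] == AVCaptureDevicePositionBack)
        {
            captureDevice = device;
            break;
        }
    }

    if (captureDevice)
    {
        //Rear camera found; the capture session is configured in Step 5
    }
    else
    {
        //No rear camera available; let the user know
        UIAlertView *alert = [[UIAlertView alloc] initWithTitle:@"Camera Unavailable"
                                                        message:@"Unable to find a rear camera on this device."
                                                       delegate:nil
                                              cancelButtonTitle:@"OK"
                                              otherButtonTitles:nil];
        [alert show];
        [alert release];
    }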

On line 2 an array of all the available capture devices capable of streaming video is created. On lines 4 - 11, that array is iterated over in an attempt to find a device with the position value AVCaptureDevicePositionBack (if you'd rather complete this tutorial with the front-facing camera, you could instead search for AVCaptureDevicePositionFront).

The rest of the code above simply sets up a conditional that tests captureDevice to allow us to dynamically respond if, for whatever reason, the desired device isn't available.


Step 5: Initialize the Capture Session

Having made sure that the desired input device is in fact available, let's go ahead and set up our video capture session:
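Replacing the placeholder comment inside the if (captureDevice) branch from the previous step, a sketch of the session setup might look like this (the videoIn name and the NSError variable are my own additions; videoOut is the name referenced in the discussion below):

    if (captureDevice)
    {
        //Allocate the capture session and
        //specify the quality of captured video

        cameraCaptureSession = [[AVCaptureSession alloc] init];
        [cameraCaptureSession setSessionPreset:AVCaptureSessionPresetMedium];
        NSError *error = nil;
        AVCaptureDeviceInput *videoIn = [AVCaptureDeviceInput deviceInputWithDevice:captureDevice error:&error];
        [cameraCaptureSession addInput:videoIn];

        //Add a video data output that discards
        //any frames arriving late
        AVCaptureVideoDataOutput *videoOut = [[AVCaptureVideoDataOutput alloc] init];
        [videoOut setAlwaysDiscardsLateVideoFrames:YES];
        [cameraCaptureSession addOutput:videoOut];
        [videoOut release];

        //...continued in Steps 6 and 7
    }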

On line 6, the cameraCaptureSession is allocated (this will be released later).

On line 7, we specify that the quality of the video output generated from this capture session should be AVCaptureSessionPresetMedium. The official Apple documentation lists the following available choices for this setting:

AVCaptureSessionPresetPhoto
Specifies capture settings suitable for high-resolution, photo-quality output.
AVCaptureSessionPresetHigh
Specifies capture settings suitable for high-quality video and audio output.
AVCaptureSessionPresetMedium
Specifies capture settings for video and audio output bitrates suitable for sharing over WiFi.
AVCaptureSessionPresetLow
Specifies capture settings for video and audio output bitrates suitable for sharing over 3G.
AVCaptureSessionPreset640x480
Specifies capture settings suitable for VGA-quality (640x480 pixel) video output.
AVCaptureSessionPreset1280x720
Specifies capture settings suitable for 720p-quality (1280x720 pixel) video output.

Setting a higher output value means you have more data available for processing and analysis in addition to a final output image of higher visual quality. However, those benefits come with a price: slower processing speeds. When working with video or attempting dynamic object recognition, you'll find that tweaking this setting can help dramatically.

Also note that while specific capture presets are available (e.g. 1280x720, 640x480), it's generally advisable to use one of the more generic "High", "Medium", or "Low" values. This makes your application more portable and robust as it can run on devices (current or future!) that may not support a given hard-coded value.
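If you do need to request a specific preset, you can ask the session whether it supports that value first and fall back to a generic one; a quick sketch using the cameraCaptureSession from this step:

    //Prefer an exact preset where available, falling back
    //to a generic value on hardware that doesn't support it
    if ([cameraCaptureSession canSetSessionPreset:AVCaptureSessionPreset1280x720])
    {
        [cameraCaptureSession setSessionPreset:AVCaptureSessionPreset1280x720];
    }
    else
    {
        [cameraCaptureSession setSessionPreset:AVCaptureSessionPresetMedium];
    }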

Lines 8 - 10 create an AVCaptureDeviceInput object from the capture device we found earlier and then add that input to our capture session.

Lines 14 - 17 set up an AVCaptureVideoDataOutput object and add it to our capture session. On line 15, we configure videoOut to always discard frames that are received late. Keep in mind that the capture session will be managing many frames every single second, and, due to variances in processor load, it's possible that some of those frames may arrive later than others. Because we are building an application that attempts to provide a live window into the world behind the iPhone, we don't care about late frames, so we discard them. We only want to be processing and displaying images that are as close to the current millisecond as possible.


Step 6: Add a Video Preview Layer

Having configured the input and output of the capture session, it's time to bind the preview layer declared in the interface to the capture session video stream. This will provide users with a digital window of the reality around them.
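Still inside the if (captureDevice) branch, a sketch of that binding might look like the following (layerRect is my own name for the view's bounds):

    cameraPreviewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:cameraCaptureSession];
    CGRect layerRect = [[[self view] layer] bounds];
    [cameraPreviewLayer setBounds:layerRect];
    [cameraPreviewLayer setPosition:CGPointMake(CGRectGetMidX(layerRect),
                                                CGRectGetMidY(layerRect))];
    [cameraPreviewLayer setVideoGravity:AVLayerVideoGravityResizeAspectFill];
    [[[self view] layer] addSublayer:cameraPreviewLayer];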

The first five lines of the above code snippet allocate the preview layer and resize the bounds of the layer to fill the device screen.

The next line sets the videoGravity property. This setting controls how video frames should be rendered in the preview layer. The official Apple documentation outlines the following possibilities:

AVLayerVideoGravityResize
Specifies that the video should be stretched to fill the layer’s bounds.
AVLayerVideoGravityResizeAspect
Specifies that the player should preserve the video’s aspect ratio and fit the video within the layer’s bounds.
AVLayerVideoGravityResizeAspectFill
Specifies that the player should preserve the video’s aspect ratio and fill the layer’s bounds.

As you can tell from the code above, I think that AVLayerVideoGravityResizeAspectFill best fits our use case.

The final line above is responsible for actually adding the preview CALayer to the view controller layer, making it visible to the user.


Step 7: Begin Streaming Video

In the six steps above, we've created our project and configured an AVCaptureSession bound to the rear camera for input and a custom preview layer for output. All that remains at this stage is to actually begin streaming video by starting the capture session:
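At the end of -initializeCaptureSession (still within the conditional), that comes down to a single message to the session:

    //Begin streaming video from the camera into the preview layer
    [cameraCaptureSession startRunning];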

With the capture session initialization complete, now seems like a good time to release the memory we allocated. Do so with the following code:
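This project uses manual reference counting, so one reasonable place for that cleanup is -dealloc, releasing the preview layer allocated in Step 6 along with the capture session:

    - (void)dealloc
    {
        [cameraaCaptureSession release];
        [cameraPreviewLayer release];
        [super dealloc];
    }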

If you save, build, and run the project after adding the above lines of code, you should be able to take a glimpse at the world around you straight through the iPhone screen.

Wrap Up

This tutorial has walked you through the process of using the AVFoundation framework to initiate video capture and display each frame in near real-time on an iPhone screen. Due to the ubiquity of video cameras all around us (as well as the iPhone Camera app), this may not seem like much of an accomplishment, but it really is! This is the first step in many augmented reality applications: digitizing the user's view of the perceived world. From here we could continue in any number of directions, from simply overlaying custom views on the real-world frame to actually manipulating the pixel data of each frame to enhance or alter our vision of reality.

The next Mobiletuts+ premium tutorial in this series will continue by taking our digital window of the world and making it a bit more interesting. Stay tuned, and thanks for reading!
