Tuesday, June 3, 2014

Sixth Sense Image Capturing Technology

1 Introduction
The Sixth Sense image capturing technology was developed by Pranav Mistry at the MIT Media Lab, and a demo of the Sixth Sense device was presented at TED Talks. Image capture is based on gesture detection. Four colored strips are worn on the fingers of the two hands, as shown in figure 1. A camera records video, which is processed frame by frame to detect the gesture shape through color detection of the strips. When all four strips are detected for a certain amount of time, the image is captured and cropped to the Region of Interest (ROI) evaluated from the four points.

Figure 1: Gesture detection using 4 points
An improved design of this technology obtains the ROI from only two points, placed diagonally opposite in the rectangle. This has the advantage of lower computational cost and higher processing speed, so the device can be built on a low-cost embedded system. The image processing requirement then reduces to color detection and finding the position of each color. Circles are used as the colored markers, since a circle is the most compact shape for a given area, and circle detection can be achieved with the Circular Hough Transform.
2 Hough Transform
The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.
In automated analysis of digital images, a sub problem often arises of detecting simple shapes, such as straight lines, circles or ellipses. In many cases an edge detector can be used as a pre-processing stage to obtain image points or image pixels that are on the desired curve in the image space. Due to imperfections in either the image data or the edge detector, however, there may be missing points or pixels on the desired curves as well as spatial deviations between the ideal line/circle/ellipse and the noisy edge points as they are obtained from the edge detector. For these reasons, it is often non-trivial to group the extracted edge features to an appropriate set of lines, circles or ellipses. The purpose of the Hough transform is to address this problem by making it possible to perform groupings of edge points into object candidates by performing an explicit voting procedure over a set of parameterized image objects.
The simplest case of Hough transform is the linear transform for detecting straight lines. In the image space, the straight line can be described as,
y = mx + b
where, the parameter m is the slope of the line, and b is the intercept (y-intercept).
This is called the slope-intercept model of a straight line. The main idea of the Hough transform is to characterize the straight line not by discrete image points (x1, y1), (x2, y2), etc., but by its parameters in the slope-intercept model, i.e., the slope m and the intercept b. In general, the straight line y = mx + b can then be represented as a point (b, m) in the parameter space. However, vertical lines pose a problem: they are more naturally described as x = a and give rise to unbounded values of the slope parameter m. For computational reasons, Duda and Hart therefore proposed a different pair of parameters, denoted r and θ, for the lines in the Hough transform. These two values, taken together, define a polar coordinate, as shown in figure 2: the points of a line in the (x, y) plane map to (r, θ) coordinates in Hough space.

It is therefore possible to associate with each line of the image a pair (r, θ), which is unique if θ ∈ [0, π) and r ∈ R, or if θ ∈ [0, 2π) and r ≥ 0.

Figure 2: Hough’s space denotation as (r, θ) for line.
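As a small illustration of this parameterization, the following Python sketch (a hypothetical helper, not from the original text) computes the (r, θ) pair for the line through two given points, with θ folded into [0, π):

```python
import math

def line_rho_theta(p1, p2):
    """Return (r, theta) for the line through p1 and p2, with
    r = x*cos(theta) + y*sin(theta) and theta in [0, pi)."""
    (x1, y1), (x2, y2) = p1, p2
    # The direction of the line is (dx, dy); its normal is (dy, -dx),
    # and theta is the angle of that normal.
    dx, dy = x2 - x1, y2 - y1
    theta = math.atan2(-dx, dy)
    if theta < 0:                        # fold into [0, pi)
        theta += math.pi
    r = x1 * math.cos(theta) + y1 * math.sin(theta)
    return r, theta

# Horizontal line y = 2: the normal points along +y, so theta = pi/2, r = 2.
r, theta = line_rho_theta((0, 2), (5, 2))
print(round(r, 3), round(theta, 3))   # 2.0 1.571
```

Because r may be negative under this convention, the pair stays unique over θ ∈ [0, π), matching the first of the two conventions above.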
The linear Hough transform algorithm uses a two-dimensional array, called an accumulator, to detect the existence of a line described by r = x cosθ + y sinθ. The dimension of the accumulator equals the number of unknown parameters, i.e., two, considering quantized values of r and θ in the pair (r,θ). For each pixel at (x,y) and its neighborhood, the Hough transform algorithm determines if there is enough evidence of a straight line at that pixel. If so, it will calculate the parameters (r,θ) of that line, and then look for the accumulator's bin that the parameters fall into, and increment the value of that bin. By finding the bins with the highest values, typically by looking for local maxima in the accumulator space, the most likely lines can be extracted, and their (approximate) geometric definitions read off. The simplest way of finding these peaks is by applying some form of threshold, but other techniques may yield better results in different circumstances - determining which lines are found as well as how many. Since the lines returned do not contain any length information, it is often necessary, in the next step, to find which parts of the image match up with which lines. Moreover, due to imperfection errors in the edge detection step, there will usually be errors in the accumulator space, which may make it non-trivial to find the appropriate peaks, and thus the appropriate lines.
The final result of the linear Hough transform is a two-dimensional array (matrix) similar to the accumulator -- one dimension of this matrix is the quantized angle θ and the other dimension is the quantized distance r. Each element of the matrix has a value equal to the number of points or pixels that are positioned on the line represented by quantized parameters (r, θ). So the element with the highest value indicates the straight line that is most represented in the input image.
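The voting procedure described above can be sketched in plain Python (an illustrative toy, not an optimized implementation): each point votes into a quantized (r, θ) accumulator, and the peak bin recovers the dominant line.

```python
import math

def hough_lines(points, n_theta=180, r_max=20.0, n_r=200):
    """Minimal linear Hough transform: vote each point into a
    quantized (r, theta) accumulator and return the peak bin."""
    acc = {}
    for (x, y) in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            r = x * math.cos(theta) + y * math.sin(theta)
            ri = int(round((r + r_max) / (2 * r_max) * (n_r - 1)))
            if 0 <= ri < n_r:
                acc[(ri, t)] = acc.get((ri, t), 0) + 1
    (ri, t), votes = max(acc.items(), key=lambda kv: kv[1])
    r = ri / (n_r - 1) * 2 * r_max - r_max      # de-quantize the bin index
    theta = math.pi * t / n_theta
    return r, theta, votes

# Five collinear points on y = x + 1, plus one outlier.
pts = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (7, 0)]
r, theta, votes = hough_lines(pts)
print(votes)   # 5 (all collinear points land in one bin)
```

The recovered (r, θ) is approximately (√2/2, 3π/4), i.e., the line x − y + 1 = 0, and the outlier never accumulates enough votes to compete.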
The following example explains how a line can be found in an image using the Hough transform.
Figure 3: Accumulator tables for (r, θ) for three points
The following steps find the actual (r, θ) pair representing the line passing through the three points given in figure 3:
  • For each data point, a number of lines are plotted going through it, all at different angles. These are shown here as solid lines.
  • For each solid line a line is plotted which is perpendicular to it and which intersects the origin. These are shown as dashed lines.
  • The length (i.e. perpendicular distance to the origin) and angle of each dashed line is measured. In the diagram above, the results are shown in tables.
  • This is repeated for each data point.
  • A graph of the line lengths for each angle, known as a Hough space graph, is then created. It is shown in figure 4. The intersection point of all 3 curves is the actual value of (r, θ) representing the line passing through 3 points.
Figure 4: (r, θ) graph for 3 points
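The intersection in figure 4 can be illustrated numerically (a toy example with assumed points, not the ones in the figure): each point traces a sinusoid r(θ) = x cos θ + y sin θ, and for collinear points all the sinusoids cross at the same (r, θ).

```python
import math

def r_curve(x, y, theta):
    """Sinusoid r(theta) = x*cos(theta) + y*sin(theta) traced by one point."""
    return x * math.cos(theta) + y * math.sin(theta)

# Three collinear points on the line x + y = 4.
pts = [(1, 3), (2, 2), (3, 1)]

# At theta = 45 degrees every curve gives the same r = 4/sqrt(2),
# which is where all three sinusoids intersect.
theta = math.pi / 4
rs = [r_curve(x, y, theta) for (x, y) in pts]
print([round(v, 3) for v in rs])   # [2.828, 2.828, 2.828]
```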

3 Circular Hough Transform
Unlike the linear HT, the CHT relies on equations for circles. The equation of a circle is,
 r² = (x − a)² + (y − b)²
Here, a and b represent the coordinates for the center, and r is the radius of the circle. The parametric representation of this circle is
  x = a + r*cosθ
  y = b + r*sinθ
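A quick check of the parametric form (an illustrative snippet with assumed center and radius values): sweeping θ generates points that all satisfy the implicit equation r² = (x − a)² + (y − b)².

```python
import math

def circle_point(a, b, r, theta):
    """Parametric point on the circle centered at (a, b) with radius r."""
    return a + r * math.cos(theta), b + r * math.sin(theta)

# Every parametric point satisfies r^2 = (x - a)^2 + (y - b)^2.
a, b, r = 5.0, 3.0, 2.0
for k in range(8):
    x, y = circle_point(a, b, r, 2 * math.pi * k / 8)
    assert abs((x - a) ** 2 + (y - b) ** 2 - r ** 2) < 1e-9
print("parametric form matches the implicit equation")
```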
In contrast to a linear HT, a CHT relies on 3 parameters, which requires a larger computation time and memory for storage, increasing the complexity of extracting information from our image. For simplicity, most CHT programs set the radius to a constant value (hard coded) or provide the user with the option of setting a range (maximum and minimum) prior to running the application. 
For each edge point, a circle is drawn with that point as its center and radius r. The CHT uses a 3-D accumulator array, with the first two dimensions representing the coordinates of the candidate circle center and the third its radius. The accumulator value is incremented every time one of these circles with a desired radius passes through a cell. The accumulator thus keeps count of how many circles pass through each coordinate, and a vote finds the highest count: the coordinates with the highest counts are the centers of the circles in the image.
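This voting scheme can be sketched in plain Python for the common fixed-radius case mentioned above (a toy implementation on synthetic edge points; with the radius fixed, the 3-D accumulator collapses to a 2-D one):

```python
import math

def hough_circle_fixed_r(edge_points, radius, width, height):
    """Vote each edge point's candidate centers into a 2-D accumulator
    and return the center with the most votes."""
    acc = [[0] * width for _ in range(height)]
    for (x, y) in edge_points:
        # Candidate centers lie on a circle of the same radius around (x, y).
        for t in range(360):
            theta = math.radians(t)
            a = int(round(x - radius * math.cos(theta)))
            b = int(round(y - radius * math.sin(theta)))
            if 0 <= a < width and 0 <= b < height:
                acc[b][a] += 1
    best = max((acc[b][a], a, b) for b in range(height) for a in range(width))
    return best[1], best[2]   # (a, b) of the strongest center

# Edge points sampled from a circle of radius 10 centered at (20, 20).
edges = [(20 + round(10 * math.cos(math.radians(t))),
          20 + round(10 * math.sin(math.radians(t)))) for t in range(0, 360, 10)]
print(hough_circle_fixed_r(edges, 10, 40, 40))   # center recovered near (20, 20)
```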
4 Block diagram for circle detection
The block diagram for circle detection is shown in figure 5. The input image is preprocessed to remove noise, enhanced by adjusting contrast and brightness, and converted into grayscale. Next, the image is segmented by an edge detector to find the edges. If a solid circle is present, its boundary is extracted in white in the output of the segmentation process. Finally, the image is passed to the Hough transform block to detect lines or circles according to the transformation parameters.
Figure 5: Block diagram for Circle detection 
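The preprocessing steps of the block diagram (grayscale conversion and edge-based segmentation) can be sketched in plain Python; the luminance weights and gradient threshold below are common choices, not values from the original text.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (nested lists of (R, G, B) tuples) to
    grayscale using the common luminance weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]

def simple_edges(gray, threshold=50):
    """Crude gradient-based edge map: mark a pixel white (255) when the
    horizontal or vertical intensity jump exceeds the threshold."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h):
        for x in range(1, w):
            gx = abs(gray[y][x] - gray[y][x - 1])
            gy = abs(gray[y][x] - gray[y - 1][x])
            if max(gx, gy) > threshold:
                out[y][x] = 255
    return out

# A 3x3 image with a bright center pixel produces edges around it.
img = [[(0, 0, 0)] * 3 for _ in range(3)]
img[1][1] = (255, 255, 255)
print(simple_edges(to_grayscale(img)))
```

In practice an OpenCV pipeline would use its own conversion and edge functions; this sketch only mirrors the stages of the diagram.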

5 Result
Detecting two different circles is necessary to obtain the two diagonal points of the rectangle. For that purpose, the Hough circle detection algorithm is applied twice with different preprocessing parameters. The preprocessing block involves filtering and thresholding on the RGB values of pixels. Red can be detected by keeping the (R, G, B) filtering range from (150, 0, 0) to (255, 30, 30). The first RGB triple is the minimum value a pixel may have to be mapped to white (grayscale value 255); the second triple is the maximum. If a pixel's value lies between the minimum and maximum, the output image contains white (grayscale 255) at the corresponding position. If the pixel value falls below the minimum or above the maximum, the output image contains black (grayscale 0) at that position.
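A minimal sketch of this RGB range filtering in plain Python (the image here is a nested list of (R, G, B) tuples; in a real pipeline OpenCV's own thresholding functions would normally be used instead):

```python
def color_mask(rgb_image, lo, hi):
    """Binary mask: 255 where every channel of the pixel lies inside
    [lo, hi] (inclusive), 0 elsewhere."""
    return [[255 if all(lo[c] <= px[c] <= hi[c] for c in range(3)) else 0
             for px in row]
            for row in rgb_image]

# Red-marker range from the text: (150, 0, 0) .. (255, 30, 30).
img = [[(200, 10, 10), (10, 10, 200), (160, 20, 5)]]
print(color_mask(img, (150, 0, 0), (255, 30, 30)))   # [[255, 0, 255]]
```

The blue mask uses the same function with the blue range given below.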
In the same way, blue can be detected by setting the RGB minimum to (0, 0, 150) and the maximum to (30, 30, 255).
Hence, two preprocessed images are available, filtered with the red and blue color criteria. Applying Hough circle detection to these images produces two points, the centers of the red and blue circles. Assume that (x1, y1) and (x2, y2) are the centers of the red and blue circles, respectively.

From the two circle centers (x1, y1) and (x2, y2), a rectangle can be drawn with these points as its diagonal. This rectangle is taken as the ROI for cropping the image, as shown in figure 6.

Figure 6: Deciding the Region of Interest (ROI) from two points
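Deriving the ROI from the two detected centers can be sketched as follows (a plain-Python toy on a nested-list image; the min/max handles the two points arriving in either order):

```python
def roi_from_points(p1, p2):
    """Axis-aligned rectangle (left, top, right, bottom) from two
    diagonally opposite points, given in either order."""
    (x1, y1), (x2, y2) = p1, p2
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)

def crop(image, p1, p2):
    """Crop a nested-list image to the ROI spanned by the two centers."""
    left, top, right, bottom = roi_from_points(p1, p2)
    return [row[left:right + 1] for row in image[top:bottom + 1]]

img = [[10 * y + x for x in range(6)] for y in range(5)]
print(crop(img, (4, 1), (1, 3)))   # rows 1..3, columns 1..4
```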

Finally, the cropped image is stored. The image processing above can be done with the OpenCV libraries. The device can be designed around a PandaBoard; a Raspberry Pi, which is cheaper and has a lower specification than the PandaBoard, can also be used, since this optimized Sixth Sense capture needs only two-color detection.


The image is captured using gesture detection, and it will be stored on Google Drive through cloud computing.
