1 Introduction
The Sixth Sense image-capturing technology was developed by Pranav Mistry at the MIT Media Lab, and a demo of the SixthSense device was presented at TED. Its image-capturing technique is based on gesture detection: four colored strips are worn on the fingers of the two hands, as shown in figure 1. A camera records video, and the video is processed frame by frame to detect the gesture shape based on color detection of the strips. When all four strips have been detected for a certain amount of time, an image is captured and cropped to the Region of Interest (ROI) evaluated from the four points.
Figure 1: Gesture detection using 4 points
An improved design of this technology obtains the ROI from only two points, placed at diagonally opposite corners of a rectangle. This requires less computational power and increases processing speed, so the device can be built on a low-cost embedded system. The image-processing requirements are then color detection and locating the position of each color. Circles are a good choice of marker shape, since a circle encloses a given area with the smallest possible perimeter. Circles can be detected with the Circular Hough Transform.
2 Hough Transform
The Hough
transform is a feature extraction technique used in image analysis, computer
vision, and digital image processing. The purpose of the technique is to find
imperfect instances of objects within a certain class of shapes by a voting
procedure. This voting procedure is carried out in a parameter space, from
which object candidates are obtained as local maxima in a so-called accumulator
space that is explicitly constructed by the algorithm for computing the Hough
transform.
In the automated analysis of digital images, a subproblem that often arises is detecting simple shapes, such as straight lines, circles, or ellipses. In many cases an edge
detector can be used as a pre-processing stage to obtain image points or image
pixels that are on the desired curve in the image space. Due to imperfections
in either the image data or the edge detector, however, there may be missing
points or pixels on the desired curves as well as spatial deviations between
the ideal line/circle/ellipse and the noisy edge points as they are obtained
from the edge detector. For these reasons, it is often non-trivial to group the
extracted edge features to an appropriate set of lines, circles or ellipses.
The purpose of the Hough transform is to address this problem by making it
possible to perform groupings of edge points into object candidates by
performing an explicit voting procedure over a set of parameterized image
objects.
The simplest case of the Hough transform is the linear transform for detecting straight lines. In image space, a straight line can be described as
y = mx + b
where the parameter m is the slope of the line and b is the y-intercept.
This is called the slope-intercept model of a straight line. A main idea of the Hough transform is to consider the characteristics of the straight line not as discrete image points (x1, y1), (x2, y2), etc., but in terms of its parameters according to the slope-intercept model, i.e., the slope m and the intercept b. In general, the straight line y = mx + b can be represented as a point (b, m) in the parameter space. However, vertical lines pose a problem: they are more naturally described as x = a, and would give rise to unbounded values of the slope parameter m. Thus, for computational reasons, Duda and Hart proposed a different pair of parameters, denoted r and θ, for the lines in the Hough transform. Taken together, these two values define a polar coordinate, as shown in figure 2: each line in the (x, y) plane maps to a single point (r, θ) in Hough space.
It is
therefore possible to associate with each line of the image a pair (r, θ) which
is unique if θ ∈ [0, π]
and r ∈ R, or if θ ∈ [0, 2π] and r > 0.
Figure 2: Hough’s space denotation as (r, θ) for line.
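The (r, θ) pair for a given line can also be computed directly. The following is a minimal Python sketch (the function name `line_to_polar` is illustrative, not from the original design): it derives the normal parameters of the line through two points, folding θ into [0, π].

```python
import math

def line_to_polar(p1, p2):
    """Return the (r, theta) normal parameterisation of the line
    through points p1 and p2, with theta folded into [0, pi]."""
    (x1, y1), (x2, y2) = p1, p2
    # Direction of the line; the normal vector (dy, -dx) makes angle
    # theta with the x-axis.
    dx, dy = x2 - x1, y2 - y1
    theta = math.atan2(-dx, dy)
    if theta < 0:                 # fold into [0, pi], allowing r of either sign
        theta += math.pi
    # Perpendicular distance from the origin: r = x cos(theta) + y sin(theta).
    r = x1 * math.cos(theta) + y1 * math.sin(theta)
    return r, theta
```

For example, the vertical line x = 3 yields (r, θ) = (3, 0), and the horizontal line y = 2 yields (2, π/2), matching the r = x cosθ + y sinθ form used below.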
The linear
Hough transform algorithm uses a two-dimensional array, called an accumulator,
to detect the existence of a line described by r = x cosθ + y sinθ. The
dimension of the accumulator equals the number of unknown parameters, i.e.,
two, considering quantized values of r and θ in the pair (r,θ). For each pixel
at (x,y) and its neighborhood, the Hough transform algorithm determines if
there is enough evidence of a straight line at that pixel. If so, it will
calculate the parameters (r,θ) of that line, and then look for the
accumulator's bin that the parameters fall into, and increment the value of
that bin. By finding the bins with the highest values, typically by looking for
local maxima in the accumulator space, the most likely lines can be extracted,
and their (approximate) geometric definitions read off. The simplest way of
finding these peaks is by applying some form of threshold, but other techniques
may yield better results in different circumstances - determining which lines
are found as well as how many. Since the lines returned do not contain any
length information, it is often necessary, in the next step, to find which
parts of the image match up with which lines. Moreover, due to imperfections in the edge-detection step, there will usually be errors in the accumulator space, which may make it non-trivial to find the appropriate peaks, and thus the appropriate lines.
The final
result of the linear Hough transform is a two-dimensional array (matrix)
similar to the accumulator -- one dimension of this matrix is the quantized
angle θ and the other dimension is the quantized distance r. Each element of
the matrix has a value equal to the number of points or pixels that are
positioned on the line represented by quantized parameters (r, θ). So the
element with the highest value indicates the straight line that is most
represented in the input image.
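The voting procedure described above can be sketched in a few lines of Python with NumPy. This is an illustrative implementation only, assuming a binary edge image as input; the function name `hough_lines` and the quantization choices are assumptions, not a reference implementation.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote each edge pixel of a binary image into a quantised
    (r, theta) accumulator, as described above."""
    h, w = edges.shape
    r_max = int(np.ceil(np.hypot(h, w)))          # largest possible |r|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # Rows index r shifted by r_max so negative r fits; columns index theta.
    acc = np.zeros((2 * r_max + 1, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One vote per quantised theta: r = x cos(theta) + y sin(theta).
        rs = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rs + r_max, np.arange(n_theta)] += 1
    return acc, thetas, r_max
```

Reading off a peak at acc[i, j] gives r = i − r_max and θ = thetas[j]; the strongest lines correspond to the largest bins.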
A line can be found in an image using the Hough transform, as the following example explains.
Figure 3: Accumulator tables for (r, θ) for three points
The following steps find the actual (r, θ) point representing the line passing through the three points given in figure 3:
- For each data point, a number of
lines are plotted going through it, all at different angles. These are
shown here as solid lines.
- For each solid line a line is
plotted which is perpendicular to it and which intersects the origin.
These are shown as dashed lines.
- The length (i.e., the perpendicular
distance to the origin) and angle of each dashed line are measured. The
results are shown in the tables in figure 3.
- This is repeated for each data
point.
- A graph of the line lengths for
each angle, known as a Hough space graph, is then created. It is shown in
figure 4. The intersection point of all 3 curves is the actual value of
(r, θ) representing the line passing through 3 points.
Figure 4: (r, θ) graph for 3 points
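The steps above can be reproduced numerically. Below is a hedged Python sketch using three collinear points on the hypothetical line y = x + 1 (chosen only for illustration): each point contributes a sinusoid r(θ) = x cosθ + y sinθ, and their common intersection is found where the spread between the three curves vanishes.

```python
import numpy as np

# Three collinear points on the illustrative line y = x + 1.
points = [(0, 1), (1, 2), (2, 3)]

thetas = np.linspace(0.0, np.pi, 1800, endpoint=False)
# One r(theta) sinusoid per point: r = x cos(theta) + y sin(theta).
curves = np.array([x * np.cos(thetas) + y * np.sin(thetas)
                   for x, y in points])

# The intersection of all three curves is where they agree, i.e. where
# the spread of r values across the points is (near) zero.
spread = curves.max(axis=0) - curves.min(axis=0)
i = spread.argmin()
r, theta = curves[:, i].mean(), thetas[i]
```

For these points the curves meet near θ = 3π/4 and r = √2/2, which are exactly the normal parameters of the line y = x + 1.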
3 Circular Hough Transform
Unlike the linear HT, the CHT relies on the equation of a circle,
r² = (x – a)² + (y – b)²
Here, a and b represent the coordinates of the center, and r is the radius of the circle. The parametric representation of this circle is
x = a + r cosθ
y = b + r sinθ
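The equivalence of the implicit and parametric forms can be checked numerically. A small sketch follows; the center and radius values are arbitrary illustrative choices.

```python
import math

# Arbitrary illustrative center (a, b) and radius r.
a, b, r = 3.0, -2.0, 5.0

for k in range(8):
    theta = 2 * math.pi * k / 8
    # Parametric form of the circle...
    x = a + r * math.cos(theta)
    y = b + r * math.sin(theta)
    # ...always satisfies the implicit equation r^2 = (x-a)^2 + (y-b)^2.
    assert math.isclose(r**2, (x - a)**2 + (y - b)**2)
```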
In contrast to
a linear HT, a CHT relies on 3 parameters, which requires a larger computation
time and memory for storage, increasing the complexity of extracting
information from our image. For simplicity, most CHT programs set the radius to
a constant value (hard coded) or provide the user with the option of setting a
range (maximum and minimum) prior to running the application.
For each edge point, a circle is drawn with that point as its center and radius r. The CHT uses a 3D accumulator array, with the first two dimensions representing the coordinates of candidate circle centers and the third representing the radius. The values in the accumulator are incremented every time such a circle is drawn with the desired radii over every edge point. The accumulator, which keeps count of how many circles pass through each coordinate, then votes to find the highest count: the coordinates with the highest counts are the centers of the circles in the image.
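As a sketch of this voting scheme, the fixed-radius simplification mentioned above collapses the 3D accumulator to a 2D one over candidate centers. The following Python is illustrative only; the function name `hough_circle` and the angular quantization are assumptions.

```python
import numpy as np

def hough_circle(edges, radius, n_angles=360):
    """Vote for circle centres of a known, fixed radius: each edge
    pixel votes along a circle of that radius centred on itself."""
    h, w = edges.shape
    acc = np.zeros((h, w), dtype=np.int32)
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    ca, sa = np.cos(angles), np.sin(angles)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # Candidate centres (a, b) satisfying x = a + r cos, y = b + r sin.
        a = np.round(x - radius * ca).astype(int)
        b = np.round(y - radius * sa).astype(int)
        ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
        # De-duplicate so each edge pixel votes at most once per centre.
        centres = np.unique(np.stack([b[ok], a[ok]], axis=1), axis=0)
        acc[centres[:, 0], centres[:, 1]] += 1
    return acc
```

The cell with the highest vote count is taken as the circle's center, exactly as in the voting description above.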
4 Block diagram for circle detection
The block diagram for circle detection is shown in figure 5. The input image is preprocessed to remove noise, enhanced by adjusting contrast and brightness, and converted to grayscale. Next, the image is segmented by an edge detector to find the edges. If a solid circle is present, its boundary is extracted in white in the output of the segmentation stage. Finally, the image is passed to the Hough transform block, which detects lines or circles according to the transformation parameters.
Figure 5: Block diagram for Circle detection
5 Result
Detecting the two diagonal corner points of the rectangle requires detecting two different circles. For that purpose, the Hough circle-detection algorithm is applied twice with different preprocessing parameters. The preprocessing block involves filtering and thresholding on the RGB values of pixels. Red can be detected by setting the (R, G, B) filtering range from (150, 0, 0) to (255, 30, 30). The first RGB triple is the minimum value a pixel may have to be mapped to white (grayscale value 255); the second triple is the maximum. If a pixel's value lies between the minimum and the maximum, the output image contains white (grayscale 255) at the corresponding position; if it falls outside this range, the output image contains black (grayscale 0) at that position.
In the same way, blue can be detected by setting the RGB minima to (0, 0, 150) and the maxima to (30, 30, 255).
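The thresholding rule described above can be sketched as a per-pixel mask. This is an illustrative NumPy version (the function name `color_mask` is hypothetical), not the implementation used on the device.

```python
import numpy as np

def color_mask(img, lo, hi):
    """Binarise an RGB image: 255 where every channel lies inside
    the per-channel range [lo, hi], 0 elsewhere."""
    lo = np.array(lo)
    hi = np.array(hi)
    inside = np.all((img >= lo) & (img <= hi), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)

# Thresholds from the text:
# red_mask  = color_mask(img, (150, 0, 0), (255, 30, 30))
# blue_mask = color_mask(img, (0, 0, 150), (30, 30, 255))
```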
Hence, two preprocessed images are available, filtered with the red and blue color criteria respectively. Applying Hough circle detection to these images produces two points, the centers of the red and blue circles; call them (x1, y1) and (x2, y2).
From the two circle centers (x1, y1) and (x2, y2), a rectangle can be drawn with these points as diagonally opposite corners. This rectangle is taken as the ROI for cropping the image. It is shown in figure 6.
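A minimal sketch of deriving and cropping this ROI from the two detected centers, assuming NumPy-style images indexed as [row, column] (the function name `crop_roi` is illustrative):

```python
import numpy as np

def crop_roi(img, p1, p2):
    """Crop the axis-aligned rectangle whose diagonal joins the two
    detected circle centres p1 = (x1, y1) and p2 = (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    # Order the corners so the slice is valid whichever marker is higher.
    x_lo, x_hi = sorted((x1, x2))
    y_lo, y_hi = sorted((y1, y2))
    # Rows are y, columns are x; the +1 keeps both corners inside the crop.
    return img[y_lo:y_hi + 1, x_lo:x_hi + 1]
```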
Figure 6: Deciding the Region of Interest (ROI) from two points
Finally, the cropped image is stored. The image processing above can be implemented with the OpenCV libraries. The device can be built on a PandaBoard; since this optimized Sixth Sense capture needs only two-color detection, a lower-cost, lower-specification Raspberry Pi is also sufficient.
The image is captured using gesture detection and stored on Google Drive through cloud computing.