Result
The system detects various objects, living or non-living. Using a Logitech C270 webcam and code built on OpenCV library functions, certain day-to-day activities can be tracked and reproduced in the form of the required outputs. Face detection and tracking, and movement detection and tracking, can be executed efficiently; they have been executed with almost 70% efficiency with respect to time delay.
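The webcam experiments used OpenCV functions that are not listed here. As a hedged illustration of how movement detection and tracking usually works, the sketch below applies frame differencing. Pixels that change by more than a threshold between two grayscale frames are marked as motion, and the bounding box of those pixels can be followed from frame to frame as a simple form of tracking. It uses plain Python lists so it runs without OpenCV; in OpenCV the same steps would normally use `cv2.absdiff` and `cv2.threshold`. The function names and threshold values are assumptions, not the project's code.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale image, one int (0-255) per pixel
Mask = List[List[bool]]


def motion_mask(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> Mask:
    """Mark pixels whose intensity changed by more than pixel_threshold."""
    return [
        [abs(c - p) > pixel_threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def motion_bbox(mask: Mask, min_pixels: int = 1) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) around the changed pixels, or None if too few changed."""
    coords = [(r, c) for r, row in enumerate(mask) for c, moved in enumerate(row) if moved]
    if not coords or len(coords) < min_pixels:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    prev = [[10] * 6 for _ in range(4)]
    curr = [row[:] for row in prev]
    curr[1][2] = curr[1][3] = curr[2][3] = 200  # a small bright object appears
    print(motion_bbox(motion_mask(prev, curr)))  # (1, 2, 2, 3)
```

Raising `min_pixels` is a cheap way to ignore camera noise, at the cost of missing very small movements.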
Using the Microsoft Xbox 360 sensor (Kinect), more detailed tasks can be carried out efficiently in terms of time delay. The sensor uses three different cameras, viz. RGB, IR and depth. Using these together and running basic functions, many tasks can be carried out: head pose estimation and face detection and tracking can be performed effectively, and depth detection, distance measurement, use of the camera in low light, object detection and tracking, and motion detection are also possible. The Microsoft Xbox 360 sensor also has a built-in directional microphone array, which helps in detecting an object or person based on the type of sound. It also helps a blind person to identify a person correctly and to store that person's details.
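For distance measurement, the Kinect reports an 11-bit raw depth value per pixel rather than a distance in metres. A widely circulated approximation, published on the OpenKinect project wiki, fits a reciprocal curve to these raw values; the sketch below applies it. The coefficients are that community calibration, not values measured for this device, and the function name is hypothetical.

```python
from typing import Optional

NO_READING = 2047  # largest 11-bit value; the sensor uses it when depth is unknown


def raw_depth_to_metres(raw: int) -> Optional[float]:
    """Approximate distance in metres for an 11-bit raw Kinect depth value.

    Uses the reciprocal fit 1 / (raw * -0.0030711016 + 3.3309495161) circulated on the
    OpenKinect wiki; it is an approximation and each sensor differs slightly.
    """
    if raw < 0 or raw >= NO_READING:
        return None
    denominator = raw * -0.0030711016 + 3.3309495161
    if denominator <= 0:
        return None  # beyond the range where the fitted curve is meaningful
    return 1.0 / denominator


if __name__ == "__main__":
    for raw in (500, 800, 1000, 2047):
        print(raw, raw_depth_to_metres(raw))
```

For example, reading the raw value at the centre of a detected face and passing it to `raw_depth_to_metres` would give an approximate distance to that person.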
The other functions include text-to-speech conversion and vice versa, as well as conversion of handwritten letters into speech. Here the handwritten letters are compared with built-in letters and numbers, and a probable outcome is provided, which can then be turned into speech. This is effective as long as the writing is legible and the detection is accurate. Text to speech also helps a blind person note down a person's name in the directory, where it can be saved as a file name.
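The comparison of a handwritten character against built-in letters and numbers can be illustrated with nearest-template matching: the drawing is compared cell by cell with each stored template, and the closest one is returned as the probable outcome. The 3x5 templates and the scoring below are hypothetical and far smaller than a real system would use; the returned label is what would be passed on to text to speech.

```python
from typing import Dict, List, Tuple

Grid = List[str]  # one string per row; '#' = ink, '.' = blank

# Hypothetical 3x5 templates standing in for the built-in letters and numbers.
TEMPLATES: Dict[str, Grid] = {
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
}


def hamming(a: Grid, b: Grid) -> int:
    """Count the cells that differ between two grids of the same size."""
    return sum(ca != cb for row_a, row_b in zip(a, b) for ca, cb in zip(row_a, row_b))


def recognise(drawing: Grid) -> Tuple[str, float]:
    """Return the closest template's label and the fraction of matching cells."""
    label, template = min(TEMPLATES.items(), key=lambda item: hamming(drawing, item[1]))
    cells = len(template) * len(template[0])
    return label, 1.0 - hamming(drawing, template) / cells


if __name__ == "__main__":
    sloppy_seven = ["###", "..#", "..#", ".#.", ".#."]
    print(recognise(sloppy_seven))
```

The match score gives a rough measure of how "proper" the writing was; a low score could be used to ask the user to write the character again.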
Object detection is an advanced form of handwriting-to-speech detection. Here pre-built images are compared with real-time images of the object in front of the camera, and the system gives the approximate detected object and its name, along with a sound output.
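Comparing a stored image of an object with the live frame can be sketched as sliding-window template matching with the sum of squared differences, the idea behind OpenCV's `cv2.matchTemplate` in `TM_SQDIFF` mode. The sketch below finds where each stored object fits best and picks the object with the lowest mean difference per pixel as the detected one. The image format, names and per-pixel normalisation are assumptions for illustration, not the project's method.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale, one int per pixel


def ssd_at(scene: Image, tmpl: Image, top: int, left: int) -> int:
    """Sum of squared differences between tmpl and the scene window at (top, left)."""
    return sum(
        (scene[top + r][left + c] - tmpl[r][c]) ** 2
        for r in range(len(tmpl))
        for c in range(len(tmpl[0]))
    )


def best_match(scene: Image, tmpl: Image) -> Tuple[int, int, int]:
    """Slide tmpl over every position; return (top, left, score) of the lowest SSD."""
    th, tw = len(tmpl), len(tmpl[0])
    score, top, left = min(
        (ssd_at(scene, tmpl, y, x), y, x)
        for y in range(len(scene) - th + 1)
        for x in range(len(scene[0]) - tw + 1)
    )
    return top, left, score


def identify(scene: Image, library: Dict[str, Image]) -> Tuple[str, int, int]:
    """Name the stored object whose best window has the lowest mean squared difference."""
    results = []
    for name, tmpl in library.items():
        top, left, score = best_match(scene, tmpl)
        results.append((score / (len(tmpl) * len(tmpl[0])), name, top, left))
    _, name, top, left = min(results)
    return name, top, left
```

Dividing by the template area keeps objects of different sizes comparable; the returned name is what would be spoken as the sound output.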
The following screenshot shows the final WebcamFaceRec project, including a small rectangle at the top-right corner highlighting the recognized person. Also notice the confidence bar next to the pre-processed face (a small face at the top-centre of the rectangle marking the face), which in this case shows roughly 70 percent confidence that it has recognized the correct person.
Fig R.1: Webcam Face Recognition
The next result is text-to-speech synthesis, which works well. The sound can only be heard when the program is run, so this document shows a snapshot of the program with a text file and the terminal command used to execute it. This successful implementation of text-to-speech synthesis can be used in the WebcamFaceRec program: when a face is recognized, the program can speak the person's name with pre-defined words such as “Hey PandaBoard User, XYZ is in front of you!”
Fig R.2: Text to Speech Synthesis
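The greeting described above could be spoken with an off-the-shelf synthesiser. The document does not name the engine used, so the sketch below assumes the open-source `espeak` command-line tool (available in Ubuntu's repositories); the function names and the speaking rate are illustrative choices.

```python
import shutil
import subprocess
from typing import List


def greeting(name: str) -> str:
    """Message spoken when WebcamFaceRec recognises a known person."""
    return f"Hey PandaBoard User, {name} is in front of you!"


def espeak_command(text: str, words_per_minute: int = 150) -> List[str]:
    """Build the espeak call; -s sets the speaking rate in words per minute."""
    return ["espeak", "-s", str(words_per_minute), text]


def speak(text: str) -> bool:
    """Speak text through espeak if it is installed; return False if it is not."""
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_command(text), check=True)
    return True
```

Calling `speak(greeting("XYZ"))` after a successful recognition would read the message aloud. Passing the text as a list argument, rather than building a shell string, avoids quoting problems with names that contain spaces or apostrophes.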
With face recognition and text-to-speech synthesis in place, the project is almost complete. The last hurdle is that we cannot store the face of every individual in advance, so the faces of unknown people must be stored as and when they come in front of the camera. One idea was to use speech recognition (speech to text) when a new person's face is detected, but we did not have working speech-to-text software or an algorithm for the Ubuntu operating system, and this remains an open area for us. To get around this, we are thinking out of the box and using Sound Marking.
In sound marking, we record the voice of an unknown person who comes in front of the device, saying his or her own name. To do this, we used a Kinect sensor and developed an application that records sound for 5 seconds; whatever is spoken is recorded and saved with the date and time of that moment as its file name. At the same instant, a picture of the unknown person is captured and saved under the same date-and-time name, so that his or her face is stored. The next time that person comes in front of the camera, the face is detected and recognized, and from the file name of the matched face we can play the recorded voice containing the person's name. This solves the problem of unknown faces.
We have developed the sound-recording application for Windows, shown in Fig R.3, but have yet to develop it for Ubuntu.
Fig R.3: Audio Recording for Sound Marking
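The naming scheme used for sound marking, where the face picture and the 5-second voice clip share the same date-and-time name, can be sketched as follows. The directory names, file extensions and timestamp format are assumptions, and the actual recording and image capture are not shown.

```python
from datetime import datetime
from pathlib import Path
from typing import Tuple


def marking_stem(moment: datetime) -> str:
    """Date-and-time name shared by the photo and the voice clip."""
    return moment.strftime("%Y-%m-%d_%H-%M-%S")


def marking_paths(moment: datetime, faces_dir: Path, voices_dir: Path) -> Tuple[Path, Path]:
    """Where to save the captured face image and the 5-second voice recording."""
    stem = marking_stem(moment)
    return faces_dir / f"{stem}.png", voices_dir / f"{stem}.wav"


def voice_for_face(face_file: Path, voices_dir: Path) -> Path:
    """Map the face image matched by the recogniser back to its voice clip."""
    return voices_dir / f"{face_file.stem}.wav"


if __name__ == "__main__":
    face, voice = marking_paths(datetime.now(), Path("faces"), Path("voices"))
    print(face, voice, voice_for_face(face, Path("voices")))
```

Because both files share one stem, the recogniser only has to return the file name of the matched face, and `voice_for_face` derives the clip to play from it. The format avoids colons so the names are also valid on Windows, where the recorder was built; with one-second resolution, two captures within the same second would collide.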
Another application we developed is basic Optical Character Recognition (OCR): any number drawn by the user is detected by the application with an error rate of 11.00%. This could help blind people to write or read digital letters and numbers.
Fig R.4: Optical Character Recognition
Conclusion
We have developed a device that can be used easily by the visually impaired community. The device measures 4.5 by 4.0 inches, the camera measures 3 x 8.2 x 6 inches, and the whole device weighs 332 grams. The approximate cost of the device is INR 16,000. The device has the following salient features:
- Face Detection, Tracking and Person Identity Detection
- Face Tagging and Storing New Person’s Face
- Optical Character Recognition: Handwritten Text Recognition
- Text to Speech
- Object Detection, Tracking and Tagging
- Motion Detection and Tracking
- Capturing an Image Using a Hand Gesture and Uploading It to the Internet
- Colour Detection and Generation of Different Audio for Different Colours
- Sound Marking for Saving New Faces and Names
Nothing is perfect, so there is always room for improvement. The following future work can be done in this project:
- Integrating Face Recognition, Text to Speech Synthesis and Sound Marking in a Single Application
- Developing Speech to Text (Speech Recognition)
- Improving Number and Character Recognition Accuracy
- Increasing the Video Frame Rate for Processing
- Loading the PandaBoard with Only the Needed Applications and Software
- Working without an Operating System GUI, e.g. Controlling the Board over PuTTY