Remote-Control Object Detection Robot

A Project by Yiqi Yu and Weitao Jiang.


Introduction Video


Objective

Our project aims to design a remote-control object detection robot with three main functions: remote control from a web page, real-time video transfer, and object detection.


Introduction

We implemented all three planned functions on our robot: remote control through a web interface, real-time video transfer, and object detection.


Design

To realize the remote control, we had to build a communication link between the robot and the terminal. It naturally occurred to us that we could host a local server on our own router. For the back end, we developed the web server on Flask, Python's micro-framework, and the back-end code handles HTTP requests from the front end. When the user clicks a button on the front end, JSON data is sent to the server via HTTP POST, and the server calls servo_control.py to change the robot's movement, such as moving forward or backward. The server then sends the current status back to the front end as JSON.
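A minimal sketch of one such route (condensed from server.py in the Code Appendix; the status payload here is simplified for illustration):

from flask import Flask, request, jsonify
import servo_control

app = Flask(__name__)

@app.route('/forward', methods=['POST'])
def forward():
    data = request.form['mydata']      # payload sent by the front-end button
    servo_control.forward()            # drive the robot forward
    return jsonify({'status': 'forward'})

app.run(host="192.168.1.100", port=8080, threaded=True)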

As for the front end, we wrote a JavaScript handler corresponding to the on-click event of each button. We used AJAX to retrieve the video stream from the server side, since AJAX lets the page send and retrieve data from a server asynchronously without interfering with the display and behavior of the existing page. The whole system design is shown in Figure 1.


Figure 1. Whole system design
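The front-end JavaScript is not reproduced in the Code Appendix, but the same round trip can be exercised with a small Python client (a hypothetical test script; it assumes the server is running at 192.168.1.100:8080 as configured in server.py):

import requests

# mimic the front end: POST the button's payload to an endpoint
resp = requests.post('http://192.168.1.100:8080/forward',
                     data={'mydata': 'forward'})
print(resp.json())                     # status JSON sent back by the server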

Inspired by the face recognition system project completed by Jingyao Ren and Andre Heil, we decided to use the OpenCV library and the Haar cascade algorithm to implement object detection. Object detection using Haar feature-based cascade classifiers is an effective method proposed by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features". It is a machine-learning-based approach in which a cascade function is trained from many positive and negative images and then used to detect objects in other images. More details about the Haar cascade algorithm can be found in the references. The object detection system design is shown in Figure 2.


Figure 2. Object Detection System Design
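In OpenCV the whole detection pipeline is only a few calls: load a trained cascade, convert the frame to grayscale (Haar features operate on intensity), and ask detectMultiScale for bounding boxes. A minimal sketch (file names are placeholders):

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
img = cv2.imread('test.jpg')                  # any BGR test image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # cascades run on grayscale
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # one box per detection
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)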

First, we installed OpenCV and got the OpenCV source onto the Raspberry Pi:

sudo apt-get install libopencv-dev python-opencv

We tried the existing haarcascade_frontalface_default.xml file to test the function and performance of detection on the Raspberry Pi. However, we ran into a big problem: the FPS (frames per second) was far too low, about one or two, when capturing video from the camera. At first we did not understand the cause, because VideoCapture worked well on a laptop. Then we realized that the performance of a single core on the Raspberry Pi is limited, so we used multiprocessing across three of the Pi's cores, leaving one core for the remote-control part. That approach did raise the FPS to around 5 for face detection and around 12 for detecting a banana, as Figures 5 and 6 show below, but it could still be improved.

After talking to Joe, he suggested that we could sacrifice real-time detection for better FPS by skipping some frames, so that the Raspberry Pi would not have to do the heavy computation for every frame captured. Another cause of the low FPS was the resolution we set. We initially chose 640×480 to get a wide perspective, which put a much larger load on the CPU; we finally settled on 320×240 to balance the performance.

A further problem occurred when I wanted to train my own XML file. Because Haar cascade training is so lengthy, usually taking a whole week to generate an XML file, I trained for only one target object, a football. After the training finished, however, the performance of my own XML file was disastrous, with only around a 60% accuracy rate. For a better demo, I therefore used a pre-trained XML file from the web.
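A sketch of the frame-skipping idea mentioned above (DETECT_EVERY is an illustrative tuning knob, and capture_frames() is a hypothetical stand-in for the real capture loop in camera_pi.py): detection runs on every Nth frame while the last result is redrawn in between.

import cv2

banana_cascade = cv2.CascadeClassifier('banana_classifier.xml')
DETECT_EVERY = 4                  # classify every 4th frame only
boxes = []
for idx, frame in enumerate(capture_frames()):   # hypothetical BGR-frame generator
    if idx % DETECT_EVERY == 0:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = banana_cascade.detectMultiScale(gray)   # the expensive call
    for (x, y, w, h) in boxes:    # reuse the last detection in between
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)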

The following commands are the steps for training your own OpenCV Haar classifier.

Install OpenCV and get the OpenCV source:

brew tap homebrew/science
brew install --with-tbb opencv
wget http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.4.9/opencv-2.4.9.zip
unzip opencv-2.4.9.zip

Put your positive images in the ./positive_images folder and create a list of them:

find ./positive_images -iname "*.jpg" > positives.txt

Put your negative images in the ./negative_images folder and create a list of them:

find ./negative_images -iname "*.jpg" > negatives.txt

Create positive samples with the bin/createsamples.pl script and save them to the ./samples folder:

perl bin/createsamples.pl positives.txt negatives.txt samples 1500\
   "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1\
   -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 80 -h 40"

Use tools/mergevec.py to merge the samples in ./samples into one file:

python ./tools/mergevec.py -v samples/ -o samples.vec

Start training the classifier with opencv_traincascade, which comes with OpenCV, and save the results to ./classifier:

opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt\
   -numStages 20 -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 1000\
   -numNeg 600 -w 80 -h 40 -mode ALL -precalcValBufSize 1024\
   -precalcIdxBufSize 1024
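When training completes, opencv_traincascade leaves the final model at classifier/cascade.xml (the -data directory), and it loads just like the stock cascades:

import cv2

football_cascade = cv2.CascadeClassifier('classifier/cascade.xml')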

With each module implemented, we proceeded to combine all the modules into the complete embedded system. First, we had to connect the front end and the back end to implement the remote control. We built the website interface in HTML and connected it to the back end in Python. More specifically, once the front end receives a button press, it forwards the event to the back end, which calls the corresponding function.

For the robot control, we extended the functions from Lab 3 to fit our project, adding two more functions: speeding up and slowing down.
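At the 50 Hz PWM frequency used in servo_control.py, duty cycle maps directly to pulse width (7.5% of the 20 ms period is the 1.5 ms neutral pulse), so changing speed just pushes the two duty cycles further from or closer to neutral. A simplified sketch, where pl and pr are the PWM objects from servo_control.py:

speed1, speed2 = 6, 9         # right / left duty cycles; 7.5 would be neutral

def speed_up():
    global speed1, speed2
    speed1 -= 0.5             # 0.5% of 20 ms = 0.1 ms further from neutral
    speed2 += 0.5
    pl.ChangeDutyCycle(speed2)
    pr.ChangeDutyCycle(speed1)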

We encountered a big problem when combining video transfer and object detection, because both need the camera, and only one camera port is available. Worse, the two tasks capture video in different formats: video transfer needs a JPEG stream to speed up transmission, while object detection only accepts a BGR video array for its computation.
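One way to ease the conflict (a sketch of the idea only, not what our final code does) is to capture each frame once as a BGR array and derive the streaming JPEG from the same frame with cv2.imencode:

import cv2
from picamera import PiCamera
from picamera.array import PiRGBArray

camera = PiCamera()
camera.resolution = (320, 240)
rawCapture = PiRGBArray(camera, size=(320, 240))

for _ in camera.capture_continuous(rawCapture, format="bgr", use_video_port=True):
    frame = rawCapture.array                 # BGR array: feed the detector
    ok, jpeg = cv2.imencode('.jpg', frame)   # JPEG bytes: feed the video stream
    if ok:
        stream_bytes = jpeg.tobytes()
    rawCapture.truncate(0)                   # reset the buffer for the next frame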


Testing

Following the modularity principle, we first tested each module separately to confirm that each development step performed as planned.

Figure 3. Remote Control Testing


Figure 4. Real-Time Video Transfer Testing


Figure 5. Existing haarcascade_frontalface_default.xml Testing


Figure 6. banana_detection.xml Testing


Figure 7. Two Objects Detection Testing

After testing each module, we proceeded to combine all the modules together.


Results and Conclusions

We realized all three functions of the original goal, as our demo shows.

Still, some improvements can be made. First, the conflict between real-time video transfer and object detection should be resolved: once object detection starts, the FPS drops sharply. Also, since the robot stops immediately upon detecting the target, it will stop even when the target is far away, as long as it is captured by the camera.

We were also unable to meet our goal of having the robot recognize a football. The main reason is that the training process takes a long time; completing it properly would have taken about two weeks. To meet the deadline, I sped up training by skipping some stages, which made the results disastrous. Given enough time, however, we believe we could make it work.


Work Contributions

For the whole project, Weitao Jiang took responsibility for the front-end and back-end design, while Yiqi Yu completed the video capturing and transfer as well as the object detection. In the last week, we worked together to debug the code.


Future Work

Our future work will mainly focus on the problems mentioned above. We will search for other ways to realize the object detection if the camera conflict cannot be solved; installing two cameras would not be a bad idea. We also want the robot to send a message back to the terminal to inform the user that the object has been found, instead of just stopping. Finally, we will restart the training process to get a satisfying final result for soccer-ball detection.


Demo

Demo Video


Parts List


References

PiCamera Documentation
Standard Parallax Servo Datasheet
OpenCV Haar Cascade Training
R-Pi GPIO Documentation
Figure 2 Source

Code Appendix


#!/usr/bin/env python
#server.py
#Date: 11/30/2017
#Design ideas: Building a local host 192.168.1.100:8080 to allow connections
#              and handle HTTP requests from the front end
#Author: Weitao Jiang
#NetId: wj236

from importlib import import_module
import os
import sys
from flask import Flask, render_template, request, Response
from flask import jsonify
#import multiprocessing as mp
import threading
import servo_control
import time
import json
import camera_pi

from multiprocessing import Process


if os.environ.get('CAMERA'):
    Camera = import_module('camera_' + os.environ['CAMERA']).Camera
else:
    from camera_pi import Camera

app = Flask(__name__)
#app.config['SECRET_KEY'] = "dfdfdffdad"

@app.route('/')
def index():
    return render_template('index.htm')


def gen(camera):
    """Video streaming generator function."""
    while True:
        frame = camera.get_frame()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')


@app.route('/video_feed')
def video_feed():
    """Video streaming route. Put this in the src attribute of an img tag."""
     
    return Response(gen(Camera()),
                   mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/start', methods=['GET', 'POST'])
def start():
    print("start")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        
        camera_pi.detect()
        print("100")
    d = {'name': 'xmr', 'age': 18}   # placeholder status JSON for the front end's AJAX callback
    return jsonify(d)

@app.route('/left', methods=['GET', 'POST'])
def left():
    print("left")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        servo_control.left()
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/right', methods=['GET', 'POST'])
def right():
    print("right")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        servo_control.right()
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/forward', methods=['GET', 'POST'])
def forward():
    print("forward")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        servo_control.forward()
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/backward', methods=['GET', 'POST'])
def backward():
    print("backward")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        servo_control.backward()
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/stop', methods=['GET', 'POST'])
def stop():
    print("stop")
    if request.method == 'POST':
        a = request.form['mydata']
        print(a)
        servo_control.stop()
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/quit', methods=['GET', 'POST'])
def quit():
    if request.method == 'POST':
        shutdown()
        servo_control.myquit()
        camera_pi.myquit()
        return "server shutdown"
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/up', methods=['GET', 'POST'])
def up():
    print("up")
    
    if request.method == 'POST':
        
        servo_control.speed_up()
        a = request.form['mydata']
        print(a)
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

@app.route('/down', methods=['GET', 'POST'])
def down():
    print("down")
    
    if request.method == 'POST':
        servo_control.speed_down()
        a = request.form['mydata']
        print(a)
    d = {'name': 'xmr', 'age': 18}
    return jsonify(d)

def shutdown():
    func = request.environ.get('werkzeug.server.shutdown')
    if func is None:
        raise RuntimeError('Not running with the Werkzeug server')
    func()
    
    
if __name__ == '__main__':
    app.run(host="192.168.1.100", port=8080, threaded=True)

#servo_control.py
#Date: 11/30/2017
#Design ideas: Using GPIO PWM output to drive the servos
#              Adding speed-up and slow-down functions
#              Displaying the command history on the piTFT
#Author: Yiqi Yu
#NetId: yy757

import pygame
from pygame.locals import *
import os
import time
import RPi.GPIO as GPIO
import subprocess
import threading

os.putenv('SDL_VIDEODRIVER', 'fbcon') # Display on piTFT 
os.putenv('SDL_FBDEV', '/dev/fb1')
os.putenv('SDL_MOUSEDRV', 'TSLIB') # Track mouse clicks on piTFT 
os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')
#=================== display setup ================================
# s1-s3 / s4-s6: the last three commands shown in the left / right history
# columns; t1-t6: the matching timestamps (seconds since start)
s1='stop'
s2='stop'
s3='stop'
s4='stop'
s5='stop'
s6='stop'
t1=0
t2=0
t3=0
t4=0
t5=0
t6=0
GPIO.setmode(GPIO.BCM)
GPIO.setup(19, GPIO.OUT)       #  set GPIO 
pl=GPIO.PWM(19,50)
GPIO.setup(13, GPIO.OUT)
pr=GPIO.PWM(13,50)
pygame.init()
starttime=time.time()
pygame.mouse.set_visible(False)
WHITE = 255, 255, 255
black = 0,0,0
screen = pygame.display.set_mode((320, 240))
i = 0
#====================button set========================================#
def prf():
    # Redraw the piTFT screen: the control buttons plus the left/right
    # command-history columns with their timestamps.
    global i
    my_font = pygame.font.Font(None, 20)
    if i == 0:
        my_buttons = {'left history': (80, 40), 'right history': (240, 40), 'stop': (160, 100)}
    else:
        my_buttons = {'left history': (80, 40), 'right history': (240, 40), 'resume': (160, 100)}
    # (text, position) pairs; a list is used because history entries can repeat
    history = [(s1, (60, 80)),  (s2, (60, 120)),  (s3, (60, 160)),
               (s4, (240, 80)), (s5, (240, 120)), (s6, (240, 160)),
               (str(t1), (110, 80)), (str(t2), (110, 120)), (str(t3), (110, 160)),
               (str(t4), (300, 80)), (str(t5), (300, 120)), (str(t6), (300, 160))]
    screen.fill(black)
    for my_text, text_pos in list(my_buttons.items()) + history:
        text_surface = my_font.render(my_text, True, WHITE)
        rect = text_surface.get_rect(center=text_pos)
        screen.blit(text_surface, rect)
    pygame.display.flip()




def log_action(action):
    # Shift both three-entry history columns down, record the new action
    # with its elapsed-time stamp, then refresh the piTFT display.
    global s1, s2, s3, s4, s5, s6
    global t1, t2, t3, t4, t5, t6
    now = int(time.time() - starttime)
    s3, s2, s1 = s2, s1, action
    t3, t2, t1 = t2, t1, now
    s6, s5, s4 = s5, s4, action
    t6, t5, t4 = t5, t4, now
    prf()

#=====================forward=====================#
def forward():
    pl.start(0)
    pr.start(0)
    log_action('forward')
    pl.ChangeDutyCycle(9)       # 9% duty at 50 Hz = 1.8 ms pulse
    pr.ChangeDutyCycle(6)       # 6% duty = 1.2 ms pulse (opposite sense)

#=====================stop=====================#
def stop():
    pl.start(0)
    pr.start(0)
    log_action('stop')
    pl.ChangeDutyCycle(0)
    pr.ChangeDutyCycle(0)

#=====================backward=====================#
def backward():
    pl.start(0)
    pr.start(0)
    log_action('backward')
    pl.ChangeDutyCycle(6)
    pr.ChangeDutyCycle(9)

#=====================left=====================#
def left():
    pl.start(0)
    pr.start(0)
    log_action('left')
    pl.ChangeDutyCycle(0)       # stop the left wheel, drive the right one
    pr.ChangeDutyCycle(6)

#=====================right=====================#
def right():
    pl.start(0)
    pr.start(0)
    log_action('right')
    pl.ChangeDutyCycle(9)
    pr.ChangeDutyCycle(0)

#=====================speed up / speed down=====================#
speed1 = 6
speed2 = 9

def speed_up():
    global speed1, speed2
    speed1 -= 0.5               # move both duty cycles further from the
    speed2 += 0.5               # 7.5% neutral point -> faster rotation
    pl.start(0)
    pr.start(0)
    log_action('speed up')
    pl.ChangeDutyCycle(speed2)
    pr.ChangeDutyCycle(speed1)

def speed_down():
    global speed1, speed2
    speed1 += 0.5
    speed2 -= 0.5
    pl.start(0)
    pr.start(0)
    log_action('speed down')
    pl.ChangeDutyCycle(speed2)
    pr.ChangeDutyCycle(speed1)

#======================= quit =========================== 
def myquit():
    pygame.quit()
    GPIO.cleanup()

			  

#camera_pi.py
#Date: 11/30/2017
#Design ideas: Using camera.capture_continuous to capture the frames
#              Using a trained Haar cascade classifier to detect the target
#              Using multiprocessing to improve FPS
#              Using three cores, leaving one core for the remote control
#Author: Yiqi Yu
#NetId: yy757

import io
import time
import picamera
from base_camera import BaseCamera
from picamera.array import PiRGBArray
from picamera import PiCamera
from multiprocessing import Queue
import cv2
import multiprocessing as mp
import servo_control
import threading
import os


camera = PiCamera()
camera.resolution = (320, 240)
camera.framerate = 60

flag=False
class Camera(BaseCamera):
    @staticmethod
    def frames():
        # let the camera warm up
        time.sleep(0.5)
        stream = io.BytesIO()
        for _ in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
            if flag:              # set by myquit() to stop the stream
                break
            stream.seek(0)
            yield stream.read()
            stream.seek(0)
            stream.truncate()
            


def myquit():
    global flag
    flag=True
    camera.close()
    print("camera alreay closed")



#================= image capture ===================================

count = 0
start_time = time.time()
frames = 0
fps = 0            # recomputed in draw_frame()

#================= object detection ================================

cars_cascade = cv2.CascadeClassifier('/home/pi/5725/project/xml_file/cars.xml') #use haar cascade model 
banana_cascade = cv2.CascadeClassifier('/home/pi/5725/project/xml_file/banana_classifier.xml')
iphone_cascade = cv2.CascadeClassifier('/home/pi/5725/project/xml_file/iphone.xml')

#================= object capture ==================================

def object_capture ( img ):
    gray = cv2.cvtColor ( img, cv2.COLOR_BGR2GRAY)
    cars = cars_cascade.detectMultiScale( gray )
    banana = banana_cascade.detectMultiScale( gray )
    iphone = iphone_cascade.detectMultiScale( gray )
    return cars, banana, iphone, img

#================= draw frame ======================================
def draw_frame ( cars, banana, iphone, img ):
    global start_time, fps, frames    # globals must be declared to be reassigned

    for (x, y, w, h) in cars:
            cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
            servo_control.stop()
    for (x, y, w, h) in banana:
            cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
            servo_control.stop()
    for (x, y, w, h) in iphone:
            cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
            servo_control.stop()

    frames = frames + 1
    fps = frames / (time.time()-start_time)
    cv2.waitKey(1)
    

def detect():
    global count, start_time, frames
    rawCapture = PiRGBArray(camera, size=(320, 240))
    camera.capture(rawCapture, format="bgr")

    pool = mp.Pool( processes = 3)
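    # Prime a round-robin pipeline: three worker processes each hold one frame
    # in flight, so the loop below can always collect a finished result while
    # the newest frame is still being classified (the fourth core is left free
    # for the Flask server).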

    p1 = pool.apply_async(object_capture, [rawCapture.array])
    p2 = pool.apply_async(object_capture, [rawCapture.array])
    p3 = pool.apply_async(object_capture, [rawCapture.array])

    c1, b1, ip1, i1 = p1.get()         # (cars, banana, iphone, image) from worker 1
    c2, b2, ip2, i2 = p2.get()         # from worker 2
    c3, b3, ip3, i3 = p3.get()         # from worker 3
    

    while True:
        if flag:
            print("cvquit")
            break
        rawCapture = PiRGBArray(camera, size=(320, 240))
        camera.capture(rawCapture, format="bgr")
        image = rawCapture.array

  
        if count == 1:
            p1 = pool.apply_async(object_capture, [image])
            c2, b2, ip2, i2 = p2.get()
            draw_frame(c2, b2, ip2, i2)

        elif count == 2:
            p2 = pool.apply_async(object_capture, [image])
            c3, b3, ip3, i3 = p3.get()
            draw_frame(c3, b3, ip3, i3)

        elif count == 3:
            p3 = pool.apply_async(object_capture, [image])
            c1, b1, ip1, i1 = p1.get()
            draw_frame(c1, b1, ip1, i1)
            count = 0

        count += 1
        
        rawCapture.truncate(0)        # reset the stream for the next capture
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break