ECE5725 Project

RaspberryPi-based Smartphone with Voice Control

By Zhongyu Lin (zl579), Xingyu Chen (xc374).
2017 - 12 - 08



Objective

Voice recognition has advanced rapidly in recent years, and many applications have incorporated it to provide more user-oriented services. In this project, a Raspberry Pi based smartphone (the "Piphone") is designed with voice control for two basic phone functions: making calls and playing music.



Introduction

To make the device truly portable, custom circuitry and off-the-shelf components are used so that the physical design is free of trailing wires. The Piphone's user interface is built with the Pygame library, an easy-to-use Python package for constructing neat interfaces, and links to the on-board applications through the touch screen. Our DIY Piphone contains two voice-controlled applications: one for making and receiving calls (communication) and one for playing music (entertainment). Both online and offline voice recognition systems were tried on the phone applications. The online system uses Google's speech recognition API and delivers the most complete and accurate performance, while the offline system provides basic functionality without internet access and is designed to recognize only a limited vocabulary. Additionally, the music player application is built on a MySQL database for customized playlist generation and efficient batch selection.



Design & Testing

Hardware Design

The first step was to construct the circuitry needed for the Piphone. A FONA GSM module is the most important component of this project, since it establishes the communication protocols that link the Pi with other devices. A 3.7 V 1200 mAh LiPo battery is required for its operation, but this falls short of the roughly 5 V required by the Raspberry Pi. We could have used an external battery pack to power the Pi separately, but that would clutter the layout. Instead, we purchased a DC boost converter that steps the 3.7 V supplied by the LiPo battery up to 5 V, so a single battery powers everything. Also, since a smartphone should be portable, we built a small board that compacts all the circuitry and connects everything together.
The actual layout and wiring diagram of the complete hardware design are shown below:

Generic placeholder image

The front view and rear view of Piphone



Generic placeholder image

The wiring diagram of the piphone system

For the wiring of the system, we referred to the instructions in [1], with several improvements. In [1], the designer used one power switch at the LiPo battery end to power everything, and the KEY pin of the GSM module is permanently tied to ground, meaning the FONA stays powered on while the LiPo battery is charging. In practice, however, a phone should be chargeable even when it is shut down. We therefore placed a switch between the KEY pin and ground so the GSM module can be turned off during charging. The function of each pinout is documented on the official Adafruit website [5].
There are a few things to note about the wiring diagram:

  • The Vin pin is connected to VBat to set the logic-level converter in the GSM module; otherwise the FONA will not work.
  • The RI signal pin is pulled low for 120 ms (its default state is high) when a call is received, which is faster than polling with AT commands in software; this might be useful for performance enhancement in the future.


Voice Recognition

    Online voice recognition system:
    The first thing we did was explore the Google speech recognition API package [2]; the corresponding library can be installed quickly with pip. Google speech recognition requires an internet connection: the recorded voice is uploaded to a server, and the recognition result is sent back to the local device. This round trip takes significant time, so recognizing voice continuously would delay other processes too much. We therefore use a button to start voice recognition on demand instead of listening in the background all the time. Because Google's recognizer is backed by its own deep-learning models, a voice command with more words tends to be recognized more accurately: a sequence of numbers is recognized more reliably than a single number, and the word group "pop music" more easily than the single word "pop". In our Python script, voice recognition is performed by a single function voice_reg() using PyAudio [3]. The user records a 5-second WAV file to be recognized; the function returns the recognition result as a string, or an error message if the voice cannot be recognized or the server connection fails.
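    The recognition flow can be sketched as follows. This is a minimal illustration of the steps described above, not our exact voice_reg(): the file name output.wav, the error strings, and the digit filter (mirroring the post-processing used later for dialing) are assumptions.

```python
def voice_reg(wav_path="output.wav"):
    # Send a recorded clip to Google's recognizer; return text or an error string.
    import speech_recognition as sr  # pip install SpeechRecognition
    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = r.record(source)  # read the entire audio file
    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        return "ERROR: voice not recognized"
    except sr.RequestError:
        return "ERROR: server connection failed"

def digits_only(text):
    # Keep only digit characters from the recognized phrase (used when dialing)
    return "".join(c for c in text if c.isdigit())
```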

    Offline voice recognition system:
    We also tried to implement our own speech recognition system. The basic idea is to record the user's voice when the program runs and compare it against ground truth (pre-recorded samples of our voices). The matching metric between two .wav files is cross-correlation, which handles misaligned recordings and is fast to compute. Each .wav file is recorded in a fixed format to simplify matching: 5 seconds long, 44.1 kHz sample rate, 1024-sample chunks. Each file is read with the wave library and normalized by its maximum value to reduce the effect of differing volumes. One caveat when recording: keep some distance from the microphone, otherwise the recorded signal clips and the matching degrades significantly. We implemented cross-correlation in both the time domain and the frequency domain. The time domain is simple to start with: we directly cross-correlate the normalized voice samples. For the frequency domain, we first apply an FFT to the normalized time-domain samples, then convert the complex FFT values to real power values by multiplying each by its complex conjugate and taking the absolute value into a new array [4]. Because the raw frequency spectrum is very noisy, we average every 100 adjacent bins to smooth it, and normalize the result by dividing by its sum. We then compare the test file against each ground-truth file by aggregating the products of their smoothed spectra. Either approach should give the same result; we used the FFT-based approach because it is faster. Initially we tried to distinguish the numbers 0-9, but the matching results were very unstable.
This suggests that the approach cannot distinguish such a large vocabulary without appropriate training. With fewer commands, such as distinguishing 'play', 'stop', and 'next', the performance was much more stable; we therefore concluded that distinguishing numbers fails because they all have short, similar syllables. Given the large number of commands our applications require, we decided to use the Google speech recognition system for the remaining design.
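    The frequency-domain matching described above can be sketched as below. This is an illustrative reconstruction assuming NumPy and 100-bin smoothing; the function names are ours, not the project's.

```python
import numpy as np

def spectral_signature(samples, bin_size=100):
    # Normalize by peak amplitude to reduce the effect of recording volume
    x = np.asarray(samples, dtype=float)
    x = x / np.max(np.abs(x))
    # Power spectrum: each FFT value times its complex conjugate (a real number)
    f = np.fft.rfft(x)
    spec = (f * np.conj(f)).real
    # Smooth the noisy spectrum by averaging each run of `bin_size` adjacent bins
    n = len(spec) // bin_size * bin_size
    smooth = spec[:n].reshape(-1, bin_size).mean(axis=1)
    # Normalize the signature so it sums to one
    return smooth / smooth.sum()

def match_score(sig_a, sig_b):
    # Aggregate the products of two smoothed spectra; higher = better match
    m = min(len(sig_a), len(sig_b))
    return float(np.dot(sig_a[:m], sig_b[:m]))
```

A 440 Hz tone scores higher against itself than against an 880 Hz tone with this metric, which is the behavior the command matching relies on.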
    Example waveforms of a wav file in the time domain and frequency domain are shown below:

    Generic placeholder image

    Example waveform of wav file in time domain

    Generic placeholder image

    Example waveform of wav file in frequency domain


    With the voice recognition system ready, we set about designing our applications: making calls and playing music.


    Phone Call Application Design

    Our phone supports basic functions such as making, receiving, and hanging up a phone call. In addition, the user can dial a number via either the touch screen or voice control.
    GSM module setup & testing:
    Communication is handled by the FONA GSM module; its wiring is described in the hardware design section above. The module's LEDs indicate several communication statuses between the GSM module and the RPi. When powered on, the blue LED turns on. If the network is unavailable, the red NET LED blinks 64 ms on and 800 ms off; once a solid connection is established, it blinks much more slowly, about 64 ms on and 3 s off, as described in the official module documentation [5]. The LiPo charging process is also indicated by LED: the orange LED is lit while charging is in progress, and the green LED turns on once the battery is fully charged.

    To check whether serial communication between the Adafruit FONA module and the Raspberry Pi works, the application PuTTY can be used to send AT commands to the GSM module. Before testing AT commands, several setup steps are needed specifically for the "Jessie" kernel [6]:

  • Ensure the terminal over serial is disabled in raspi-config. Run sudo raspi-config, select "Serial" under "Interfacing Options", and disable the login shell over serial.
  • Ensure there is no ttyAMA0 entry in /boot/cmdline.txt. Afterwards, /boot/cmdline.txt looks like this:
 dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
Note that ttyAMA0 no longer appears in the file.
  • Disable serial-getty. 
Run: sudo systemctl mask serial-getty@ttyAMA0.service
  • Modify the UART pins (wPi 15 and 16). Run gpio readall; initially wPi 15 and 16 were in mode IN.
Run: gpio mode 15 ALT0; gpio mode 16 ALT0
After rebooting the RPi, the two pins were still in IN mode, so we enabled the UART in /boot/config.txt by changing enable_uart from 0 to 1. After another reboot, wPi pins 15 and 16 were in mode ALT5, and ttyS0 appeared under the /dev directory. This port is used as the serial port to send signals between the GSM module and the Pi.
  • Since our project is written in Python, the "serial" library is imported in our script and the serial port is initialized as:
    serialport = serial.Serial("/dev/ttyS0", 9600, timeout=0.5)
    Several AT commands were used for different functions in our application, as shown in the following table [7]:

    Generic placeholder image

    AT Command Table

    At this step, we used PuTTY to check each AT command. In the Python script, each command can be written with serialport.write("[AT Command]\r"), and the response can be read into a string with response = serialport.readline().
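    A small helper wrapping this write/read pattern might look like the following. This is a sketch: send_at is our name, and real code should also handle timeouts and multi-line replies.

```python
def send_at(port, command):
    # Write an AT command terminated by a carriage return, then read one reply line
    port.write((command + "\r").encode())
    return port.readline().decode().strip()

# Usage with pyserial (hardware required):
#   import serial
#   serialport = serial.Serial("/dev/ttyS0", 9600, timeout=0.5)
#   send_at(serialport, "AT")  # the FONA replies "OK" when alive
```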

    Phone call interface design:
    We had five screen modes (including the main menu). The transitions between them are shown in the following diagram:

    Generic placeholder image

    Screenmode Flowchart

    Generic placeholder image

    Main Menu

    Main Menu:
    The main menu contains two icons, one for the phone call application and one for the music player. The top-right corner indicates the current battery status. Battery detection is achieved by extracting part of the response string of "AT+CBC": for example, from the response "+CBC: 0,92,3877" we extract the "92" between the two commas as the remaining battery percentage. The battery indicator is not updated in real time, since repeatedly writing and reading the serial port to poll the battery status would slow the response of other actions; instead, the battery status is refreshed each time the main menu is entered.
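    The extraction step can be sketched as a one-line split on the commas (the helper name is ours):

```python
def battery_percent(response):
    # "+CBC: 0,92,3877" -> "92": the field between the first and second commas
    return response.split(",")[1]
```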

    Generic placeholder image

    Dial Number Screenmode

    Generic placeholder image

    Connecting Screenmode

    Dial Number Screenmode:
    In this menu we designed icons for the digits 0-9 to dial the phone number, a 'delete' icon to remove a digit, a 'call' icon to place the call, and a 'microphone' icon to run voice recognition. After pressing the microphone icon, the user has 8 seconds to speak the numbers, after which the screen shows the digits recognized during that time. If non-numeric words are spoken, nothing is shown. The dial buttons serve as a failsafe alternative to voice input.
    Calling Screenmode:
    The 'calling' screen appears after pressing the call button and disappears once the call is picked up. As listed in the AT command table, "AT+CLCC" returns status information. For example, from the response "+CLCC: 1,1,2,0,0,"*********",129" we extract the "2", which represents the dialing status. Since the status is always the third digit in the string, we can strip the string down to its digits and return the third one. When the returned digit becomes "0", the call is connected and we enter the connecting screen mode.
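    The digit-stripping step can be sketched as follows (the helper name is ours):

```python
def call_status(response):
    # Reduce the +CLCC reply to its digits and return the third one (the call status)
    digits = [c for c in response if c.isdigit()]
    return digits[2] if len(digits) >= 3 else None
```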
    Connecting Screenmode:
    The 'connecting' screen shows the number being called and the elapsed call time, updated in real time. When the status changes from non-zero (disconnected) to zero (connected), we start recording the time; when it changes from zero back to non-zero (hung up), we clear old_time and return to the Dial Number screen mode.
    Incoming Screenmode:
    The incoming screen displays the caller's number and lets the user answer or reject the call. When there is an incoming call, the serial port read returns the string "RING"; once this is detected, we enter the incoming screen mode. "AT+CLCC" is used again to obtain the incoming number by extracting the characters between the first two quotation marks.
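    Extracting the quoted field can be sketched as (the helper name is ours):

```python
def incoming_number(response):
    # The caller's number is the first quoted field of the +CLCC reply
    parts = response.split('"')
    return parts[1] if len(parts) >= 3 else ""
```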
    All of the above screen modes are shown in the Result section.



    Music Player Application Design

    Next is the design of our own music player. The challenging part is building a database of songs that supports automatic selection of the user's favorite songs and customized playlists. The songs are stored locally on the SD card, and the MySQL database management system is used to index them for fast batch selection.
    The idea of using a MySQL database to organize songs came from the experience of selecting items on a touchscreen: it is tedious to select all the songs of a specific genre or singer by touch when the songs are unsorted. With MySQL, the songs can be indexed on specific attributes, enabling efficient batch selection with simple but powerful queries. More importantly, combined with speech recognition, expressing the desired choice becomes much more convenient, and song selection becomes more user-oriented. For example, with voice input active, saying 'rap music' automatically shows every song of the 'rap' genre in the playlist.
    The user can add any songs in the database to the playlist and store the playlist for later access. Even if the whole system is rebooted, the playlist can be recovered because it is stored in the database. Storing a playlist creates another table under the same database, which the user can update or delete. If a user selects the same song twice, the database eliminates the duplicate, so the generated playlist contains only distinct songs.
    The detailed design process of the music player is as follows:

    First, we create our own database using the mysql-python connector, which lets us manage the database from a Python script. Each song is stored in the database with its attributes: song id, song name, singer name, album, and genre. Note: song_id is a unique identifier for each song, so we can use it to identify a specific song and distinguish it from the others.
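    The schema and the genre-based batch selection can be sketched as follows. To keep the sketch self-contained it uses Python's built-in sqlite3 in place of MySQL Connector/Python, and the column names are assumed from the attribute list in this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL song database
cur = conn.cursor()
# song_id is the unique identifier distinguishing each song
cur.execute("""CREATE TABLE songs (
    song_id INTEGER PRIMARY KEY,
    song_name TEXT, singer TEXT, album TEXT, genre TEXT)""")
cur.executemany("INSERT INTO songs VALUES (?,?,?,?,?)", [
    (1, "Song A", "Singer X", "Album 1", "pop"),
    (2, "Song B", "Singer Y", "Album 2", "rap"),
    (3, "Song C", "Singer Y", "Album 2", "rap"),
])
# Batch selection by genre -- the query behind a voice command like 'rap music'
rap = cur.execute(
    "SELECT song_name FROM songs WHERE genre=?", ("rap",)).fetchall()
# Saving a playlist as another table, keeping only distinct songs
cur.execute(
    "CREATE TABLE playlist AS SELECT DISTINCT * FROM songs WHERE genre='rap'")
```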

    Then we build our interfaces with Pygame. The challenging parts are implementing a scroll-down menu and displaying and tracking playback progress.

    Because the piTFT screen is too small to display the whole playlist, we designed two buttons to scroll the screen up or down [8]. First, we initialize a large surface called "intermediate" of size (240, 960) and draw all the songs with their information onto it. Second, we initialize another surface at the piTFT screen size (240, 320) called "screen" and draw all the static buttons on it. The call screen.blit(intermediate, (0, scroll_y)) combines the two surfaces; changing the scroll_y value scrolls the list up or down.
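    The scrolling arithmetic can be sketched independently of Pygame (the clamp bounds are assumptions based on the surface sizes above):

```python
SCREEN_H, CONTENT_H = 320, 960  # piTFT height and 'intermediate' surface height

def clamp_scroll(scroll_y, step):
    # Move the tall surface by `step` pixels, but never past either end of the list
    new_y = scroll_y + step
    return max(min(new_y, 0), SCREEN_H - CONTENT_H)

# Each frame: screen.blit(intermediate, (0, scroll_y)) draws the visible window
```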

    We also designed a progress bar to indicate playback progress. The duration of a WAV file in seconds is frames/rate, and the current position in seconds is easily obtained as pygame.mixer.music.get_pos()/1000. A helper function converts seconds to "minute:second" format for display. The ratio of the progress bar length to the full bar length equals the ratio of the current position to the duration. Additionally, the volume is controlled by two buttons: GPIO 17 raises the volume and GPIO 22 lowers it, using the Pygame functions pygame.mixer.music.get_volume() and pygame.mixer.music.set_volume(). The volume bar is drawn in the same way as the progress bar.
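    The time conversion and the bar-length ratio can be sketched as (the helper names are ours):

```python
def mmss(seconds):
    # Translate seconds into "minute : second" display form, e.g. 75 -> "1 : 15"
    m, s = divmod(int(seconds), 60)
    return "%d : %02d" % (m, s)

def bar_px(position_s, duration_s, full_px):
    # progress bar length / full bar length == elapsed time / duration
    return int(full_px * position_s / duration_s)
```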

    We then wrote mysql-python functions to implement the database operations [9]. Each function corresponds to a particular operation on the song and playlist tables; the code is shown in the Appendix. Finally, we built the interface in Python with the Pygame library. The menu interface and playlist interface are shown below:

    Generic placeholder image

    Local Song List

    Generic placeholder image

    Playlist Menu

    The functions implemented in the music player are briefly summarized below:

  • Play, pause, and replay a song
  • Play the previous or next song
  • Track the progress of the playing song with a progress bar; display the current position and total duration
  • Indicate which song is currently playing and display all of its attributes (song id, song name, singer name, album, genre) on the interface
  • Select a song by its ID
  • Select songs by genre or singer
  • Volume control, with a volume bar indicating the volume level
  • Clear the playlist (for display purposes only)
  • Save/load/delete the playlist from the database
  • Create/update the playlist
  • Switch between the menu interface and the playlist; a back button to exit
  • Update song attributes (for correcting data, e.g. changing a song's genre or singer)

    Note:
    All operations are user friendly: if an operation goes wrong (voice unrecognized or a conflict occurs), a window pops up to report the error. For example, if a playlist already exists, the user is informed to delete the pre-existing playlist before storing a new one in the database.
    Touch-screen buttons for some of the commands listed above are also placed on the interface as a failsafe. To reduce interference from noise, the program responds as long as the detected word sequence contains the target word; exact matching is not required.
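    This containment test can be sketched as follows (the command list is illustrative):

```python
def matched_command(heard, commands=("play", "stop", "next")):
    # Accept a command if the recognized phrase merely contains the target word
    heard = heard.lower()
    for cmd in commands:
        if cmd in heard:
            return cmd
    return None
```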



    Result

    The successful results of our project are demonstrated in the following video:

    Demonstration Video

    In the demo video, voice recognition with the Google API proved very accurate, though problems such as slow response and over-interpretation of the voice can occasionally cause incorrect recognition. The offline voice recognition system still needs improvement, since it is constrained by the size of the vocabulary it can recognize. All of the application functions were carefully designed, tested, and shown to work without bugs.



    Conclusion

    To sum up, we successfully designed two applications for our Piphone, a phone call application and a music player, both controllable by voice. We developed our own offline voice recognition system based on frequency-domain cross-correlation analysis; however, given the large number of voice commands required, its performance was limited, so we ultimately implemented all the applications with the Google Speech Recognition API for its highly accurate recognition. The phone call application is built on the FONA GSM module: the user can place a call via either the touch screen or voice, the call duration and phone number are displayed during the call, and hang-up can be detected on either end. The music player application is built on a database system: users can create a playlist by selecting a specific genre or singer via voice command, store the playlist in the database, and load it back when needed. In addition, we designed a battery indicator for the main menu. From this project we learned how to use the Pygame library to design a user-oriented interface and how to link a MySQL database with a Python script. We also learned that, without training, cross-correlation is not very effective at measuring the degree of matching between voice files.



    Future Work

    In the future, the performance of the offline voice recognition system could be enhanced by applying training to the voice files, so that our own voice recognition algorithm could drive the applications: making calls and playing music. The interface could be improved as well: for the phone call application, we could use the database to store caller information in a contact list, so that next time the phone number can be loaded directly from the database instead of being dialed again. For the music player, we could support multiple playlists and allow the user to choose among them. We could also upgrade the music player to access not only local files but also cloud files over the internet, and enable it to sort and select songs across different distributed databases.





    Parts List

    Generic placeholder image

    Parts Cost List


    References

    [1] D. Hunt, “PiPhone - A Raspberry Pi based Cellphone,” Adafruit Learning System, 2015. Available at: https://cdn-learn.adafruit.com/downloads/pdf/piphone-a-raspberry-pi-based-cellphone.pdf. [Accessed: December 2017]
    [2] A. Zhang, “SpeechRecognition 3.8.1,” Python library for the Google speech recognition API, 2017. Available at: https://pypi.python.org/pypi/SpeechRecognition. [Accessed: November 2017]
    [3] “PyAudio Documentation.” Available at: https://people.csail.mit.edu/hubert/pyaudio/docs/. [Accessed: November 2017]
    [4] A. Kaushal and N. Vyas, “Voice Recognition Using FFT Transformation,” Digital Signal Processing Lab, 2008. Available at: http://ee301.wdfiles.com/local-files/dsp1/VOICE%20RECOGNITION%20USING%20FFT%20TRANSFORMATION.pdf. [Accessed: November 2017]
    [5] Lady Ada, “Adafruit FONA,” Adafruit Learning System, 2017. Available at: https://cdn-learn.adafruit.com/downloads/pdf/adafruit-fona-mini-gsm-gprs-cellular-phone-module.pdf. [Accessed: December 2017]
    [6] “Problem with ttyAMA0 on Raspbian Jessie,” Raspberry Pi Stack Exchange, 2016. Available at: https://raspberrypi.stackexchange.com/questions/47671/why-my-program-wont-communicate-through-ttyama0-on-raspbian-jessie. [Accessed: November 2017]
    [7] “GSM AT Command Set, Application Note 010,” UbiNetics, 2001. Available at: http://www.zeeman.de/wp-content/uploads/2007/09/ubinetics-at-command-set.pdf. [Accessed: December 2017]
    [8] “Pygame scrolling down page,” Stack Overflow, 2017. Available at: https://stackoverflow.com/questions/24518573/pygame-scrolling-down-page. [Accessed: December 2017]
    [9] Official MySQL Connector/Python documentation, MySQL, 2017. Available at: https://dev.mysql.com/downloads/connector/python. [Accessed: November 2017]


    Code Appendix

    Main_menu.py

    import serial
    import pygame
    from pygame.locals import *
    import RPi.GPIO as GPIO
    import time
    import subprocess
    import os
    import speech_recognition as sr
    from os import path
    os.putenv('SDL_VIDEODRIVER', 'fbcon') # Display on piTFT
    os.putenv('SDL_FBDEV', '/dev/fb1') #
    os.putenv('SDL_MOUSEDRV', 'TSLIB') # Track mouse clicks on piTFT
    os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(27, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    #quit
    def GPIO27_callback(channel):
      exit (0)
    GPIO.add_event_detect(27,GPIO.FALLING,callback=GPIO27_callback)
    
    def returnnumber(st):
     n=""
     for char in st:
      if char.isdigit():
       n=n+char
     return n
    # get incoming number
    def get_incoming(st):
     n=""
     in_flag = 0
     for char in st:
      if in_flag<2 and in_flag>0: n=n+char
      if char == '"':
       in_flag = in_flag +1
     return n[:-1]
    # get connecting status
    def get_status(st):
     n=0
     for char in st:
      if char.isdigit():
       n=n+1
      if n==3:
       return char
    
    def check_over(st):
      return st[0]
    # get battery status
    def get_battery(st):
     n=""
     in_flag = 0
     for char in st:
      if in_flag<2 and in_flag>0: n=n+char
      if char == ',':
       in_flag = in_flag +1
     return n[:-1]
    # translate time to minute:second mode
    def get_progress(t):
        minutes=int(t/60)
        second=int(t-minutes*60)
        if second<10:
         time_string=str(minutes)+" : 0"+str(second)
        else:
         time_string=str(minutes)+" : "+str(second)
        return time_string
    
    class Background(pygame.sprite.Sprite):
        def __init__(self, image_file, location):
            pygame.sprite.Sprite.__init__(self)  #call Sprite initializer
            self.image = pygame.image.load("/home/pi/project/"+image_file)
            self.rect = self.image.get_rect()
            self.rect.left, self.rect.top = location
    
    pygame.init()
    pygame.mouse.set_visible(False)
    clock=pygame.time.Clock()
    size = width, height = 240,320
    black = 0, 0, 0
    WHITE = 255, 255, 255
    screen = pygame.display.set_mode(size)
    background = Background("background.png",[0,0])
    call = Background("call.png",[10,10])
    music = Background("music.png",[80,10])
    one = Background("1.png",[30,70])
    two = Background("2.png",[100,70])
    three = Background("3.png",[170,70])
    four = Background("4.png",[30,130])
    five = Background("5.png",[100,130])
    six = Background("6.png",[170,130])
    seven = Background("7.png",[30,190])
    eight = Background("8.png",[100,190])
    nine = Background("9.png",[170,190])
    zero = Background("0.png",[100,250])
    micro = Background("microphone.png",[40,25])
    micro2 = Background ("microphone2.png",[40,25])
    back = Background("back.png",[0,10])
    call2 = Background("call2.png",[40,260])
    delete = Background("delete.png",[180,260])
    hang_up = Background("hang_up.png",[90,220])
    hang_up2 = Background("hang_up.png",[130,220])
    answer = Background("answer.png",[50,220])
    battery = Background("battery.png",[200,10])
    
    calling={'Calling...':(100,70)}
    ring={'Ring...':(100,70)}
    x1 = [20,40,60,80,100,120,140,160,180,200,220]
    y1 = 40
    my_font = pygame.font.Font(None, 15)
    my_font2 = pygame.font.Font(None,25)
    menu = 0
    number = " "
    press = 0
    voice_recog=0
    old_time=0
    status='1'
    previous_status=0
    
    serialport = serial.Serial("/dev/ttyS0", 9600, timeout=0.5)
    serialport.write("AT\r")
    response = serialport.readlines(None)
    serialport.write("ATE0\r")
    response = serialport.readlines(None)
    serialport.write("AT\r")
    response = serialport.readlines(None)
    serialport.write("AT+CBC\r")
    response = serialport.readline()
    while len(response)<5:
     response = serialport.readline()
    battery_n=get_battery(response)
    
    while 1:
     #main menu 
     if menu == 0:
      screen.blit(background.image,background.rect)
      screen.blit(call.image,call.rect)
      screen.blit(music.image,music.rect)
      screen.blit(battery.image,battery.rect)
      screen.blit(my_font.render(battery_n,True,(0,0,0)),(208,35))
      pygame.display.flip()
    
      for event in pygame.event.get():
       if(event.type is MOUSEBUTTONDOWN):
        pos = pygame.mouse.get_pos()
        x,y = pos
        if y>10 and y<70 and x>10 and x<70:
         menu = 1
        if y>10 and y<70 and x>80 and x<145:
         GPIO.cleanup()
         import m_player2
    
     #phone call menu
     if menu == 1:
      screen.fill(WHITE)
      screen.blit(one.image,one.rect)
      screen.blit(two.image,two.rect)
      screen.blit(three.image,three.rect)
      screen.blit(four.image,four.rect)
      screen.blit(five.image,five.rect)
      screen.blit(six.image,six.rect)
      screen.blit(seven.image,seven.rect)
      screen.blit(eight.image,eight.rect)
      screen.blit(nine.image,nine.rect)
      screen.blit(zero.image,zero.rect)
      screen.blit(back.image,back.rect)
      screen.blit(call2.image,call2.rect)
      screen.blit(delete.image,delete.rect)
      if voice_recog == 0:
        screen.blit(micro.image,micro.rect)
      screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
      pygame.display.flip()
      #voice recognition to dial    
      if voice_recog == 1:
       screen.blit(micro2.image,micro2.rect)
       pygame.display.flip()
       cmd = 'arecord -d 8 -D plughw:1,0 output.wav'
       print subprocess.check_output(cmd, shell=True)
       voice_recog = 0
       AUDIO_FILE = 'output.wav'  # arecord saved the clip in the current working directory
       r=sr.Recognizer()
       with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file
       try:
           number = number + returnnumber(r.recognize_google(audio))
       except sr.UnknownValueError:
           number = number
       except sr.RequestError as e:
           number = number
    
      for event in pygame.event.get():
       if(event.type is MOUSEBUTTONDOWN):
        pos = pygame.mouse.get_pos()
        x,y = pos
        if x>0 and x<20 and y>0 and y<30:
         menu = 0
        if x>30 and x<80 and y>65 and y<115:
         number = number+'1'
        if x>100 and x<150 and y>65 and y<115:
         number = number+'2'
        if x>170 and x<220 and y>65 and y<115:
         number = number+'3'
        if x>30 and x<80 and y>125 and y<175:
         number = number+'4'
        if x>100 and x<150 and y>125 and y<175:
         number = number+'5'
        if x>170 and x<220 and y>125 and y<175:
         number = number+'6'
        if x>30 and x<80 and y>185 and y<235:
         number = number+'7'
        if x>100 and x<150 and y>185 and y<235:
         number = number+'8'
        if x>170 and x<220 and y>185 and y<235:
         number = number+'9'
        if x>100 and x<150 and y>245 and y<295:
         number = number+'0'
        if x>170 and x<220 and y>245 and y<295:
         number = number[:-1]
        #make call 
        if x>30 and x<80 and y>245 and y<295:
         menu = 2
         voice_recog=0
         print("Calling " + number);
         serialport.write("AT\r")
         response = serialport.readlines(None)
         serialport.write("ATD " + number + ';\r')
         response = serialport.readlines(None)
         print response
    
        if x>40 and x<70 and y>20 and y<60:
         voice_recog = 1
     #calling menu    
     if menu == 2:

       s=""  # clear the reply buffer so a fresh AT+CLCC poll is forced each pass
       while len(s)<5:
        serialport.write("AT+CLCC\r")
        s=serialport.readline()
        status=get_status(s)
    
       if status == '0':
        previous_status=1
        if old_time==0:
         old_time=time.time()
        else:
         current_time=time.time()-old_time
         screen.fill(WHITE)
         screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
         screen.blit(my_font2.render(get_progress(current_time),True,(0,0,0)),(100,60))
         screen.blit(hang_up.image,hang_up.rect)
         pygame.display.flip()
       else:
        if previous_status==1:
         menu=1
         previous_status=0
        else:
         old_time=0
         screen.fill(WHITE)
         screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
         for my_text, text_pos in calling.items():
          text_surface = my_font2.render(my_text, True, black)
          rect = text_surface.get_rect(center=text_pos)
          screen.blit(text_surface, rect)
         screen.blit(hang_up.image,hang_up.rect)
         pygame.display.flip()
    
       #hang up
       for event in pygame.event.get():
        if(event.type is MOUSEBUTTONDOWN):
         pos = pygame.mouse.get_pos()
         x,y = pos
         print (x,y)
         if x>100 and x<160 and y>220 and y<270:
          old_time=0
          previous_status=0
          menu = 1
          n=0
          while n<2000:
           serialport.write("ATH\r")
           n=n+1
     #incoming call detection
     s = serialport.readline()
     if (s =='RING\r\n'):
         menu = 3
         serialport.write("AT+CLCC\r")
         s = serialport.readline()
         while len(s)<10:
          s = serialport.readline()
         incoming_number=get_incoming(s)
         number=incoming_number
     #incoming menu
     if menu == 3:
      screen.fill(WHITE)
      screen.blit(my_font2.render(incoming_number,True,(0,0,0)),(60,90))
      for my_text, text_pos in ring.items():
        text_surface = my_font2.render(my_text, True, black)
        rect = text_surface.get_rect(center=text_pos)
        screen.blit(text_surface, rect)
      screen.blit(hang_up2.image,hang_up2.rect)
      screen.blit(answer.image,answer.rect)
    
      pygame.display.flip()
      for event in pygame.event.get():
        if(event.type is MOUSEBUTTONDOWN):
         pos = pygame.mouse.get_pos()
         x,y = pos
         if x>50 and x<110 and y>220 and y<270:
          menu = 2
          n=0
          while n<800:
           serialport.write("ATA\r")
           n=n+1
          status='0'
         if x>130 and x<190 and y>220 and y<270:
          menu = 0
          n=0
          while n<2000:
           serialport.write("ATH\r")
           n=n+1
    
    
    
    

    m_player.py

    import serial
    import pygame
    from pygame.locals import *
    import RPi.GPIO as GPIO
    import time
    import subprocess
    import os
    import speech_recognition as sr
    from os import path
    os.putenv('SDL_VIDEODRIVER', 'fbcon') # Display on piTFT
    os.putenv('SDL_FBDEV', '/dev/fb1') #
    os.putenv('SDL_MOUSEDRV', 'TSLIB') # Track mouse clicks on piTFT
    os.putenv('SDL_MOUSEDEV', '/dev/input/touchscreen')
    GPIO.setmode(GPIO.BCM)
    GPIO.setup(27, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    #quit
    def GPIO27_callback(channel):
      exit (0)
    GPIO.add_event_detect(27,GPIO.FALLING,callback=GPIO27_callback)
    
    def returnnumber(st):
     n=""
     for char in st:
      if char.isdigit():
       n=n+char
     return n
    # get incoming number
    def get_incoming(st):
     n=""
     in_flag = 0
     for char in st:
      if in_flag<2 and in_flag>0: n=n+char
      if char == '"':
       in_flag = in_flag +1
     return n[:-1]
    # get connecting status
    def get_status(st):
     n=0
     for char in st:
      if char.isdigit():
       n=n+1
      if n==3:
       return char
    
    def check_over(st):
      return st[0]
    # get battery status
    def get_battery(st):
     n=""
     in_flag = 0
     for char in st:
      if in_flag<2 and in_flag>0: n=n+char
      if char == ',':
       in_flag = in_flag +1
     return n[:-1]
    # translate time to minute:second mode
    def get_progress(t):
        minutes=int(t/60)
        second=int(t-minutes*60)
        if second<10:
         time_string=str(minutes)+" : 0"+str(second)
        else:
         time_string=str(minutes)+" : "+str(second)
        return time_string
    
    class Background(pygame.sprite.Sprite):
        def __init__(self, image_file, location):
            pygame.sprite.Sprite.__init__(self)  #call Sprite initializer
            self.image = pygame.image.load("/home/pi/project/"+image_file)
            self.rect = self.image.get_rect()
            self.rect.left, self.rect.top = location
    
    pygame.init()
    pygame.mouse.set_visible(False)
    clock=pygame.time.Clock()
    size = width, height = 240,320
    black = 0, 0, 0
    WHITE = 255, 255, 255
    screen = pygame.display.set_mode(size)
    background = Background("background.png",[0,0])
    call = Background("call.png",[10,10])
    music = Background("music.png",[80,10])
    one = Background("1.png",[30,70])
    two = Background("2.png",[100,70])
    three = Background("3.png",[170,70])
    four = Background("4.png",[30,130])
    five = Background("5.png",[100,130])
    six = Background("6.png",[170,130])
    seven = Background("7.png",[30,190])
    eight = Background("8.png",[100,190])
    nine = Background("9.png",[170,190])
    zero = Background("0.png",[100,250])
    micro = Background("microphone.png",[40,25])
    micro2 = Background ("microphone2.png",[40,25])
    back = Background("back.png",[0,10])
    call2 = Background("call2.png",[40,260])
    delete = Background("delete.png",[180,260])
    hang_up = Background("hang_up.png",[90,220])
    hang_up2 = Background("hang_up.png",[130,220])
    answer = Background("answer.png",[50,220])
    battery = Background("battery.png",[200,10])
    
    calling={'Calling...':(100,70)}
    ring={'Ring...':(100,70)}
    x1 = [20,40,60,80,100,120,140,160,180,200,220]
    y1 = 40
    my_font = pygame.font.Font(None, 15)
    my_font2 = pygame.font.Font(None,25)
    menu = 0
    number = " "
    press = 0
    voice_recog=0
    old_time=0
    status='1'
    previous_status=0
    
    serialport = serial.Serial("/dev/ttyS0", 9600, timeout=0.5)
    serialport.write("AT\r")
    response = serialport.readlines(None)
    serialport.write("ATE0\r")
    response = serialport.readlines(None)
    serialport.write("AT\r")
    response = serialport.readlines(None)
    serialport.write("AT+CBC\r")
    response = serialport.readline()
    while len(response)<5:
     response = serialport.readline()
    battery_n=get_battery(response)
    
    while 1:
     #main menu 
     if menu == 0:
      screen.blit(background.image,background.rect)
      screen.blit(call.image,call.rect)
      screen.blit(music.image,music.rect)
      screen.blit(battery.image,battery.rect)
      screen.blit(my_font.render(battery_n,True,(0,0,0)),(208,35))
      pygame.display.flip()
    
      for event in pygame.event.get():
       if(event.type is MOUSEBUTTONDOWN):
        pos = pygame.mouse.get_pos()
        x,y = pos
        if y>10 and y<70 and x>10 and x<70:
         menu = 1
        if y>10 and y<70 and x>80 and x<145:
         GPIO.cleanup()
         import m_player2
    
     #phone call menu
     if menu == 1:
      screen.fill(WHITE)
      screen.blit(one.image,one.rect)
      screen.blit(two.image,two.rect)
      screen.blit(three.image,three.rect)
      screen.blit(four.image,four.rect)
      screen.blit(five.image,five.rect)
      screen.blit(six.image,six.rect)
      screen.blit(seven.image,seven.rect)
      screen.blit(eight.image,eight.rect)
      screen.blit(nine.image,nine.rect)
      screen.blit(zero.image,zero.rect)
      screen.blit(back.image,back.rect)
      screen.blit(call2.image,call2.rect)
      screen.blit(delete.image,delete.rect)
      if voice_recog == 0:
        screen.blit(micro.image,micro.rect)
      screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
      pygame.display.flip()
      #voice recognition to dial    
      if voice_recog == 1:
       screen.blit(micro2.image,micro2.rect)
       pygame.display.flip()
       cmd = 'arecord -d 8 -D plughw:1,0 output.wav'
       print subprocess.check_output(cmd, shell=True)
       voice_recog = 0
        # resolve output.wav relative to this script's directory
        AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), 'output.wav')
       r=sr.Recognizer()
       with sr.AudioFile(AUDIO_FILE) as source:
        audio = r.record(source)  # read the entire audio file
       try:
           number = number + returnnumber(r.recognize_google(audio))
       except sr.UnknownValueError:
           number = number
       except sr.RequestError as e:
           number = number
    
      for event in pygame.event.get():
       if(event.type is MOUSEBUTTONDOWN):
        pos = pygame.mouse.get_pos()
        x,y = pos
        if x>0 and x<20 and y>0 and y<30:
         menu = 0
        if x>30 and x<80 and y>65 and y<115:
         number = number+'1'
        if x>100 and x<150 and y>65 and y<115:
         number = number+'2'
        if x>170 and x<220 and y>65 and y<115:
         number = number+'3'
        if x>30 and x<80 and y>125 and y<175:
         number = number+'4'
        if x>100 and x<150 and y>125 and y<175:
         number = number+'5'
        if x>170 and x<220 and y>125 and y<175:
         number = number+'6'
        if x>30 and x<80 and y>185 and y<235:
         number = number+'7'
        if x>100 and x<150 and y>185 and y<235:
         number = number+'8'
        if x>170 and x<220 and y>185 and y<235:
         number = number+'9'
        if x>100 and x<150 and y>245 and y<295:
         number = number+'0'
        if x>170 and x<220 and y>245 and y<295:
         number = number[:-1]
        #make call 
        if x>30 and x<80 and y>245 and y<295:
         menu = 2
         voice_recog=0
         print("Calling " + number)
         serialport.write("AT\r")
         response = serialport.readlines(None)
         serialport.write("ATD " + number + ';\r')
         response = serialport.readlines(None)
         print response
    
        if x>40 and x<70 and y>20 and y<60:
         voice_recog = 1
     #calling menu    
     if menu == 2:
    
       while len(s)<5:
        serialport.write("AT+CLCC\r")
        s=serialport.readline()
        status=get_status(s)
    
       if status == '0':
        previous_status=1
        if old_time==0:
         old_time=time.time()
        else:
         current_time=time.time()-old_time
         screen.fill(WHITE)
         screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
         screen.blit(my_font2.render(get_progress(current_time),True,(0,0,0)),(100,60))
         screen.blit(hang_up.image,hang_up.rect)
         pygame.display.flip()
       else:
        if previous_status==1:
         menu=1
         previous_status=0
        else:
         old_time=0
         screen.fill(WHITE)
         screen.blit(my_font2.render(number,True,(0,0,0)),(80,30))
         for my_text, text_pos in calling.items():
          text_surface = my_font2.render(my_text, True, black)
          rect = text_surface.get_rect(center=text_pos)
          screen.blit(text_surface, rect)
         screen.blit(hang_up.image,hang_up.rect)
         pygame.display.flip()
    
       #hang up
       for event in pygame.event.get():
        if(event.type is MOUSEBUTTONDOWN):
         pos = pygame.mouse.get_pos()
         x,y = pos
         print (x,y)
         if x>100 and x<160 and y>220 and y<270:
          old_time=0
          previous_status=0
          menu = 1
          n=0
          while n<2000:
           serialport.write("ATH\r")
           n=n+1
     #incoming call detection
     s = serialport.readline()
     if (s =='RING\r\n'):
         menu = 3
         serialport.write("AT+CLCC\r")
         s = serialport.readline()
         while len(s)<10:
          s = serialport.readline()
         incoming_number=get_incoming(s)
         number=incoming_number
     #incoming menu
     if menu == 3:
      screen.fill(WHITE)
      screen.blit(my_font2.render(incoming_number,True,(0,0,0)),(60,90))
      for my_text, text_pos in ring.items():
        text_surface = my_font2.render(my_text, True, black)
        rect = text_surface.get_rect(center=text_pos)
        screen.blit(text_surface, rect)
      screen.blit(hang_up2.image,hang_up2.rect)
      screen.blit(answer.image,answer.rect)
    
      pygame.display.flip()
      for event in pygame.event.get():
        if(event.type is MOUSEBUTTONDOWN):
         pos = pygame.mouse.get_pos()
         x,y = pos
         if x>50 and x<110 and y>220 and y<270:
          menu = 2
          n=0
          while n<800:
           serialport.write("ATA\r")
           n=n+1
          status='0'
         if x>130 and x<190 and y>220 and y<270:
          menu = 0
          n=0
          while n<2000:
           serialport.write("ATH\r")
           n=n+1
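The helper functions get_incoming and get_status above parse SIM800-style +CLCC responses from the FONA. The same parsing logic can be exercised standalone; the sample response line below is illustrative, not captured from a real module:

```python
# Standalone sketch of the +CLCC parsing done by get_incoming and
# get_status above; the sample response line is made up for illustration.

def get_incoming(st):
    # collect the characters between the first pair of double quotes
    n = ""
    in_flag = 0
    for char in st:
        if 0 < in_flag < 2:
            n = n + char
        if char == '"':
            in_flag = in_flag + 1
    return n[:-1]  # drop the closing quote that was collected

def get_status(st):
    # the third digit in a +CLCC line is the call-state field
    # ('0' = active, '4' = incoming, ...)
    n = 0
    for char in st:
        if char.isdigit():
            n = n + 1
        if n == 3:
            return char

line = '+CLCC: 1,1,4,0,0,"+16075551234",145'
print(get_status(line))    # -> '4' (call-state digit)
print(get_incoming(line))  # -> '+16075551234'
```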
    
    
    
    

    set_up.py

     #Database-related Python functions to implement projection and selection of songs on the music player
     #This script is a list of functions. Import this file to use all the functions.
     # the code refers to https://dev.mysql.com/doc/connector-python/en/connector-python-tutorial-cursorbuffered.html
    import mysql.connector
    from mysql.connector import errorcode, Error
    import numpy as np
    
    #connect to database
    def connect():
        """ Connect to MySQL database """
        try:
            print('Connecting to MySQL database...')
            conn = mysql.connector.connect(host='localhost',
                                           database='project5725',
                                           user='root',
                                           password='tbybzm6287065')
            if conn.is_connected():
                print('Connected to MySQL database')
    
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
             print("Something is wrong with your user name or password")
            elif err.errno == errorcode.ER_BAD_DB_ERROR:
             print("Database does not exist")
            else:
             print(err)
         finally:
             # close only if the connection attempt succeeded
             if 'conn' in locals():
                 conn.close()
                 print('Connection closed')
    
    #create database
    def create_database(name):
        try:
            conn = mysql.connector.connect(host='localhost',
                                           user='root',
                                           password='tbybzm6287065')
            cursor = conn.cursor()
    
            cursor.execute(
                "CREATE DATABASE {} DEFAULT CHARACTER SET 'utf8'".format(name))
        except mysql.connector.Error as err:
            print("Failed creating database: {}".format(err))
            exit(1)
         finally:
             try:
                 conn.database = name
             except mysql.connector.Error as err:
                 if err.errno == errorcode.ER_BAD_DB_ERROR:
                    create_database(name)
                    conn.database = name
                 else:
                    print(err)
                    exit(1)
             else:
                 if conn.is_connected():
                      print('connection established.')
                 else:
                      print('connection failed.')
    
    
     #table format (note: the 'songs' key holds the DDL for the playlist table)
    TABLES = {}
    TABLES['songs'] = (
        "CREATE TABLE `playlist` ("
        "  `song_id` int NOT NULL,"
        "  `song_name` varchar(100) NOT NULL,"
        "  `singer_name` varchar(100) NOT NULL,"
        "  `genre` varchar(100) NOT NULL,"
        "  `album_name` varchar(100) NOT NULL,"
        "  PRIMARY KEY (`song_id`)"
        ") ENGINE=InnoDB")
    
     #create table in a specific database
    def create_tables(TABLES,db_name):
        response='0'
        conn = mysql.connector.connect(host='localhost',
                                           database=db_name,
                                           user='root',
                                           password='tbybzm6287065')
        if conn.is_connected():
                     print('connection established.')
        else:
                     print('connection failed.')
        cursor = conn.cursor()
        for name, ddl in TABLES.items():
            try:
                print("Creating table {}: ".format(name))
                cursor.execute(ddl)
            except mysql.connector.Error as err:
                if err.errno == errorcode.ER_TABLE_EXISTS_ERROR:
    
                    response="playlist already exists."
                else:
    
                    response=err.msg
            else:
                print("OK")
    
        cursor.close()
        conn.close()
    
        return response
    
     #add a tuple (song) to the music table
    def add_music(info):
       conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
       cursor = conn.cursor()
        # complete the ON DUPLICATE KEY UPDATE clause so re-inserting an
        # existing song_id updates its fields instead of raising an error
        add_music = ("INSERT INTO Songs "
                    "(song_id, song_name, singer_name, genre, album_name) "
                    "VALUES (%s, %s, %s, %s, %s) "
                    "ON DUPLICATE KEY UPDATE song_name = VALUES(song_name), "
                    "singer_name = VALUES(singer_name), genre = VALUES(genre), "
                    "album_name = VALUES(album_name)")
       cursor.execute(add_music, info)
       conn.commit()
    
       cursor.close()
       conn.close()
    
    
    #select music with its ID and print the result onto screen
    def select_music(ID):
        conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
        cursor = conn.cursor()
    
         # pass ID as a bound parameter instead of string interpolation
         query = ("SELECT * FROM Songs "
                  "WHERE song_id = %s")

         cursor.execute(query, (ID,))
         #fetch each row of the table
         row = cursor.fetchone()
    
        while row is not None:
         print(row)
         row = cursor.fetchone()
    
        cursor.close()
        conn.close()
    
    #delete a song from the table list
    def delete_music(ID):
        conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
        cursor = conn.cursor()
    
        query = ("DELETE FROM Songs "
                 "WHERE song_id = %s")
    
    
    
    
        cursor.execute(query,(ID,))
    
        conn.commit()
        cursor.close()
        conn.close()
    
    
    #select all the songs from music table and store into a numpy array
    def select_all_music():
        try:
             conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
             cursor = conn.cursor()
    
             query = ("SELECT * FROM Songs ")
    
    
             result=[]
             counter=0
             cursor.execute(query)
             row = cursor.fetchone()
    
          #encoding the python list object into numpy array
             while row is not None:
              result=np.append(result,row)
              counter=counter+1
              row = cursor.fetchone()
        except Error as error:
             print(error)
    
        finally:
             result=np.reshape(result,(counter,5))
             cursor.close()
             conn.close()
        return result
    
    #select all the songs from playlist and store into a numpy array 
    def select_playlist():
        response='0'
        try:
             conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
             cursor = conn.cursor()
    
             query = ("SELECT * FROM playlist ")
    
    
             result=[]
             counter=0
             cursor.execute(query)
             row = cursor.fetchone()
          #encoding the python list object into numpy array
             while row is not None:
              result=np.append(result,row)
              counter=counter+1
              row = cursor.fetchone()
        except Error as error:
             print(error)
             response='No playlist exists'
        finally:
             result=np.reshape(result,(counter,5))
             cursor.close()
             conn.close()
        return result, response
    
    # delete the playlist from database 
    def delete_list(name):
       conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
       cursor = conn.cursor()
       response='0'
       try:
              # table names cannot be bound as parameters, so format the string
              delete_music = "DROP TABLE %s" % (name,)

              cursor.execute(delete_music)
             response='Deletion of playlist completed'
       except Error as error:
             response='No playlist exists'
    
       finally:
             cursor.close()
             conn.close()
       return response
    
    # add song into the playlist database
    def add_list(info):
       conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
       cursor = conn.cursor()
       try:
    
             add_music = ("INSERT IGNORE INTO playlist"
                   "(song_id, song_name, singer_name, genre, album_name) "
                   "VALUES (%s, %s, %s, %s, %s)")
             cursor.execute(add_music,info,)
             conn.commit()
       except Error as error:
             print(error)
        else:
              print("OK")
       finally:
             cursor.close()
             conn.close()
    
     #update a specific attribute of a song; column names cannot be bound
     #as parameters, so string formatting is used (trusted input only)
     def update_music(attribute,ID,content):
         # prepare query and data (note the space before WHERE)
         query =("UPDATE Songs "
                 "SET %s = '%s' "
                 "WHERE song_id = %s")%(attribute,content,ID,)
    
        try:
            conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
            cursor = conn.cursor()
    
            cursor.execute(query)
    
            # accept the changes
            conn.commit()
    
        except Error as error:
            print(error)
    
        finally:
            cursor.close()
            conn.close()
    #update_music('album_name',23,'Kingdom Come') 
    
     #selection of songs based on the singer's name
    def select_singer(name):
        try:
             query = ("SELECT * FROM Songs "
                 "WHERE singer_name = '%s'")%(name)
    
             conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
             cursor = conn.cursor()
             result=[]
             counter=0
             cursor.execute(query)
    
             row = cursor.fetchone()
    
             while row is not None:
               result=np.append(result,row)
               counter=counter+1
               row = cursor.fetchone()
    
        except Error as error:
             print(error)
    
        finally:
             result=np.reshape(result,(counter,5))
             cursor.close()
             conn.close()
        return result
    
    #selection of songs based on genre 
    def select_genre(name):
        try:
             query = ("SELECT * FROM Songs "
                 "WHERE genre = '%s'")%(name)
    
             conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
             cursor = conn.cursor()
             result=[]
             counter=0
             cursor.execute(query)
    
             row = cursor.fetchone()
    
             while row is not None:
               result=np.append(result,row)
               counter=counter+1
               row = cursor.fetchone()
    
        except Error as error:
             print(error)
    
        finally:
             result=np.reshape(result,(counter,5))
             cursor.close()
             conn.close()
        return result
    
    #selection of songs based on album name  
    def select_album(name):
        try:
             query = ("SELECT * FROM Songs "
                 "WHERE album_name = '%s'")%(name)
    
             conn = mysql.connector.connect(user='root', database='project5725',password='tbybzm6287065')
             cursor = conn.cursor()
             result=[]
             counter=0
             cursor.execute(query)
    
             row = cursor.fetchone()
    
             while row is not None:
               result=np.append(result,row)
               counter=counter+1
               row = cursor.fetchone()
    
        except Error as error:
             print(error)
    
        finally:
             result=np.reshape(result,(counter,5))
             cursor.close()
             conn.close()
        return result
    
    
    #testing the connection
    if __name__ == '__main__':
        connect()
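As a quick sanity check of the query patterns above without a MySQL server, the same table layout and parameterized INSERT/SELECT flow can be sketched with the stdlib sqlite3 module (sqlite3 binds with ? where mysql.connector binds with %s; the song values here are illustrative):

```python
import sqlite3

# Illustrative sketch: the same Songs table and parameterized-query
# pattern as set_up.py, using sqlite3 so it runs without a MySQL server.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE Songs ("
            "song_id INTEGER PRIMARY KEY,"
            "song_name TEXT NOT NULL,"
            "singer_name TEXT NOT NULL,"
            "genre TEXT NOT NULL,"
            "album_name TEXT NOT NULL)")
# sqlite3 uses ? placeholders; mysql.connector uses %s
cur.execute("INSERT INTO Songs VALUES (?, ?, ?, ?, ?)",
            (1, 'Song A', 'Singer X', 'pop', 'Album 1'))
conn.commit()
cur.execute("SELECT song_name FROM Songs WHERE singer_name = ?",
            ('Singer X',))
rows = cur.fetchall()
print(rows)
conn.close()
```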
    

    record_groundtruth.py

     # this script automatically generates a predefined number of ground truth voice samples
     # instructions are printed before and during each recording
     # the number of samples is defined by the variable 'nrecord'
    import pyaudio
    import wave
    import os
    import time
    import struct
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.fftpack import fft
    
    
    # Options
    CHUNK = 1024 # The size of each audio chunk coming from the input device.
    FORMAT = pyaudio.paInt16 # Should not be changed, as this format is best for speech recognition.
    RATE = 44100 # Speech recognition only works well with this rate.  
    RECORD_SECONDS = 5 # Number of seconds to record, can be changed.
    number=0
    nrecord=4
    CHANNELS=1
    FILENAME=np.empty(nrecord,dtype=object)
    
    for number in range (nrecord):
    
        FILENAME[number] = ("voice_sample%d"%number+".wav")
    
    
    
    def save_audio(wav_file,k):
        """
        Stream audio from an input device and save it.
        """
        p = pyaudio.PyAudio()
    
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            output=True,
            frames_per_buffer=CHUNK
        )
    
        print("* recording  %d"%(k+1))
    
        frames = []
    
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
    
        print("* done recording %d"%(k+1))
    
        stream.stop_stream()
        stream.close()
    
        p.terminate()
    
        wf = wave.open(wav_file, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
    
     # Record nrecord ground-truth files; k indexes the current recording
    if __name__ == '__main__':
         for k in range (nrecord):
              print("* prepare recording %d"%(k+1))
              time.sleep(2)
              save_audio(FILENAME[k],k)
         print("***********************************************")
         print("* Done all the recordings")
    
        #result = recognize(WAVE_OUTPUT_FILENAME)
        # print "You just said: {0}".format(result[0])
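The byte packing that save_audio performs (and that time_domain.py and frequency_domain.py later undo with struct.unpack) can be checked with a stdlib-only round trip; the file name demo.wav is arbitrary:

```python
import wave, struct, os

# Stdlib-only round trip of the wav bookkeeping used by save_audio:
# pack 16-bit mono samples, write them out, read them back, and decode.
RATE = 44100
samples = [0, 1000, -1000, 32767]
frames = struct.pack('%dh' % len(samples), *samples)  # 'h' = 16-bit, matching paInt16

wf = wave.open('demo.wav', 'wb')
wf.setnchannels(1)   # mono, matching CHANNELS = 1
wf.setsampwidth(2)   # 2 bytes per sample, matching pyaudio.paInt16
wf.setframerate(RATE)
wf.writeframes(frames)
wf.close()

rf = wave.open('demo.wav', 'rb')
decoded = struct.unpack('%dh' % rf.getnframes(),
                        rf.readframes(rf.getnframes()))
rf.close()
os.remove('demo.wav')
print(decoded)
```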
    

    time_domain.py

     # A script that uses time-domain cross correlation to evaluate how well a test wav file matches the ground truth.
    import pyaudio
    import wave
    import os
    import time
    import struct
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.fftpack import fft
    #format the wav
    CHUNK = 1024 # The size of each audio chunk coming from the input device.
    FORMAT = pyaudio.paInt16 # Should not be changed, as this format is best for speech recognition.
    RATE = 44100 # Speech recognition only works well with this rate.  Don't change unless your microphone demands it.
    RECORD_SECONDS = 5 # Number of seconds to record, can be changed.
    
    
     CHANNELS=1
     nrecord=4 # number of ground-truth template files
    
    #record a wav file 
    def save_audio(wav_file,k):
        """
        Stream audio from an input device and save it.
        """
    
        p = pyaudio.PyAudio()
    
        #device = find_device(p, ["input", "mic", "audio"])
        #device_info = p.get_device_info_by_index(device)
        #channels = int(device_info['maxInputChannels'])
    
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            output=True,
            frames_per_buffer=CHUNK
        )
    
        print("* recording  %d"%(k+1))
    
        frames = []
    
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
    
        print("* done recording %d"%(k+1))
    
        stream.stop_stream()
        stream.close()
    
        p.terminate()
    
        wf = wave.open(wav_file, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
    
    
    
    nrecord=4
    records=[]
    #encode the ground truth into numpy array 
    for NUMBER in range (nrecord):
    #encoding string into float number
     a=wave.open("voice_sample%d"%NUMBER+".wav", 'rb')
     datasize=a.getsampwidth()*a.getnframes()
     nframe=int(datasize/2)
     data=a.readframes(nframe)
     data_int=struct.unpack(str(nframe)+'h',data)
     data_np=np.array(data_int, float)
     # normalization
     total=np.sum(data_np,axis=0)
     data_np=data_np/total*1000000
     a.close()
    
     data_np=np.reshape(data_np,(1,nframe))
    
     records=np.append(records,data_np)
     records=records.reshape(nrecord,nframe)
     
     #average every 100 samples of each ground-truth record so the template
     #length matches the averaged test signal computed below
     baseline=[]
     for i in range (0,nrecord):
        r=[]
        for j in range (0,int(nframe/100)):
           a=j*100
           s=np.sum(records[i][a:a+100])
           s=s/100
           r=np.append(r,s)
        baseline=np.append(baseline,r)
     baseline=np.reshape(baseline,(nrecord,int(nframe/100)))
    
    
     #record the test wav file
    
    save_audio("voice_signal.wav", 0)
    #encoding of testing voice file
    t=wave.open("voice_signal.wav", 'rb')
    datasize=t.getsampwidth()*t.getnframes()
    nframe=int(datasize/2)
    test_data=t.readframes(nframe)
    test_int=struct.unpack(str(nframe)+'h',test_data)
    test_data=np.array(test_int, float)
    total=np.sum(test_data,axis=0)
    test_data=test_data/total*1000000
    t.close()
    
    #average every 100 bins
    t_s=[]
    for i in range (0,int(nframe/100)):
       a=i*100
       s=np.sum(test_data[a:a+100])
       s=s/100
       t_s=np.append(t_s,s)
    t_s=np.reshape(t_s,(int(nframe/100)))
    t_s.shape
    
    
    
     ##cross correlation using the np.correlate function
     sum_result=[]
     for j in range (nrecord):
          score = np.correlate(t_s, baseline[j])
          sum_result=np.append(sum_result,score)
     sum_result=np.reshape(sum_result,(nrecord,))
     print(sum_result)
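The script above ends by computing one correlation score per template; a hypothetical next step (the function name best_match is ours, not part of the project code) picks the template with the largest correlation peak:

```python
import numpy as np

# Hypothetical helper (not in the project code): choose the ground-truth
# template whose cross-correlation peak with the test signal is largest.
def best_match(test, templates):
    scores = [float(np.max(np.correlate(test, t, mode='full')))
              for t in templates]
    return int(np.argmax(scores)), scores

# toy signals: the test matches the first template exactly
idx, scores = best_match(np.array([1., 2., 3.]),
                         [np.array([1., 2., 3.]), np.array([3., 2., 1.])])
print(idx, scores)  # idx is 0: the identical template correlates best
```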
    

    frequency_domain.py

     # A script that uses FFT-based cross correlation to evaluate how well a test wav file matches the ground truth.
    import pyaudio
    import wave
    import os
    import time
    import struct
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.fftpack import fft
     #format the wav (constants mirror the other recording scripts; they are
     #used below but were missing from this listing)
     CHUNK = 1024 # The size of each audio chunk coming from the input device.
     FORMAT = pyaudio.paInt16
     RATE = 44100
     RECORD_SECONDS = 5
     CHANNELS = 1
     NUMBER=0
     records2=[]
     nrecord=4
    #record a wav file 
    def save_audio(wav_file,k):
        """
        Stream audio from an input device and save it.
         """
     
         p = pyaudio.PyAudio()
     
         #device = find_device(p, ["input", "mic", "audio"])
        #device_info = p.get_device_info_by_index(device)
        #channels = int(device_info['maxInputChannels'])
    
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            output=True,
            frames_per_buffer=CHUNK
        )
    
        print("* recording  %d"%(k+1))
    
        frames = []
    
        for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
            data = stream.read(CHUNK)
            frames.append(data)
    
        print("* done recording %d"%(k+1))
    
        stream.stop_stream()
        stream.close()
    
        p.terminate()
    
        wf = wave.open(wav_file, 'wb')
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
    
    
    #encode the ground-truth samples into a numpy array
    #(assumes all voice_sample*.wav files have the same length)
    for NUMBER in range(nrecord):
        #decode the 16-bit samples into float numbers
        a = wave.open("voice_sample%d.wav" % NUMBER, 'rb')
        datasize = a.getsampwidth() * a.getnframes()
        print("number_of_data:", datasize)
        nframe = int(datasize / 2)
        print("number_of_frame:", nframe)
        data = a.readframes(nframe)
        data_int = struct.unpack(str(nframe) + 'h', data)
        print("Length=", len(data_int))
        a.close()

        #normalization
        data_np = np.array(data_int, float)
        total = np.sum(data_np, axis=0)
        data_np = data_np / total * 1000000

        #fft: keep the magnitude of the first half of the spectrum
        yf = np.abs(fft(data_np))
        length = len(yf)
        yf = yf[0:int(length / 2)] / 65536 / CHUNK

        #append the spectrum as one row per ground-truth sample
        yf = np.reshape(yf, (1, int(nframe / 2)))
        records2 = np.append(records2, yf)

    records2 = records2.reshape(nrecord, int(nframe / 2))
    print(records2.shape)
    
    # generate the ground truth by averaging adjacent 100 bins to reduce noise
    baseline = []
    for i in range(0, nrecord):
        r = []
        for j in range(0, int(nframe / 100)):
            a = j * 100
            s = np.sum(records2[i][a:a + 100]) / 100
            r = np.append(r, s)
        r = np.reshape(r, (int(nframe / 100), 1))
        baseline = np.append(baseline, r)
    baseline = np.reshape(baseline, (nrecord, int(nframe / 100)))
    print(baseline.shape)
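
    The nested averaging loops above can also be written with a reshape-and-mean, a common NumPy idiom for block averaging; a minimal sketch, assuming the bin count is an exact multiple of 100:

    ```python
    import numpy as np

    # hypothetical spectrum with 400 bins; average each group of 100 adjacent bins
    spectrum = np.arange(400, dtype=float)

    # reshape to (groups, 100) and average along the second axis
    averaged = spectrum.reshape(-1, 100).mean(axis=1)
    print(averaged)  # [ 49.5 149.5 249.5 349.5]
    ```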
    
    
    #record the testing wav file

    save_audio("voice_signal3.wav", 0)
    
    #encode the testing wav file
    t=wave.open("voice_signal3.wav", 'rb')
    datasize=t.getsampwidth()*t.getnframes()
    nframe=int(datasize/2)
    data=t.readframes(nframe)
    data_int=struct.unpack(str(nframe)+'h',data)
    t.close()
    data_np=np.array(data_int, float)
    
    #normalization
    total=np.sum(data_np,axis=0)
    data_np=data_np/total*1000000
    
    #fft: keep the magnitude of the first half of the spectrum
    tf = fft(data_np)
    length = len(tf)
    tf = np.abs(tf[0:int(length / 2)] / 65536 / CHUNK)
    
    # average adjacent 100 bins of the test spectrum to match the ground-truth format
    t_s = []
    for i in range(0, int(nframe / 100)):
        a = i * 100
        s = np.sum(tf[a:a + 100]) / 100
        t_s = np.append(t_s, s)
    t_s = np.reshape(t_s, (int(nframe / 100),))
    sum_result = []

    # element-wise multiplication and sum: one cross-correlation score per sample
    for j in range(nrecord):
        score = 0
        for i in range(int(nframe / 100)):
            score = score + t_s[i] * baseline[j][i]
        sum_result = np.append(sum_result, score)
    sum_result = np.reshape(sum_result, (nrecord,))
    print(sum_result)
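
    The script leaves sum_result as an array with one correlation score per ground-truth sample; the recognized command is then the sample with the largest score. A minimal sketch of that selection step, using a hypothetical score array in place of the computed sum_result:

    ```python
    import numpy as np

    # hypothetical correlation scores, one per ground-truth voice sample
    sum_result = np.array([0.12, 0.87, 0.33, 0.45])

    # the best match is the index of the highest cross-correlation score
    best = int(np.argmax(sum_result))
    print("best match: voice_sample%d.wav" % best)  # best match: voice_sample1.wav
    ```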