Following on from the R2 performance testing and subsequent ASR tuning, I have spent the last week working on further speech recognition analysis.
The Release 2 (R2) solution includes the rollout of Nuance automatic speech recognition (ASR) for identifying existing customers. Callers are identified by saying their postcode followed by the first line of their address.
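To illustrate the kind of lookup this identification step implies, here is a minimal sketch that takes a recognised postcode and first address line and returns candidate addresses ranked by similarity. It is not the production lookup: the `customers` table, its `postcode_key` and `address_line1` columns, and the use of `difflib` for fuzzy matching are all hypothetical choices made for illustration, with SQLite standing in for the real customer database.

```python
import sqlite3
from difflib import SequenceMatcher


def normalise_postcode(postcode: str) -> str:
    """Uppercase and strip spaces so 'ab1 2cd' and 'AB12CD' compare equal."""
    return "".join(postcode.split()).upper()


def candidate_addresses(
    conn: sqlite3.Connection, postcode: str, first_line: str
) -> list[tuple[str, float]]:
    """Return addresses at the postcode, best match to the spoken first line first.

    Assumes a hypothetical table customers(postcode_key, address_line1) where
    postcode_key holds postcodes already normalised by normalise_postcode().
    """
    rows = conn.execute(
        "SELECT address_line1 FROM customers WHERE postcode_key = ?",
        (normalise_postcode(postcode),),
    ).fetchall()
    # Score each stored first line against the recognised text (0.0 to 1.0).
    scored = [
        (line, SequenceMatcher(None, first_line.lower(), line.lower()).ratio())
        for (line,) in rows
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Matching on a normalised postcode first keeps the candidate list small, so a simple string-similarity ranking on the first line is usually enough to put the likely address at the top.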
At this client we have a total of 9 Nuance Recognizer servers, so pulling off the Nuance log files, analysing them to identify calls with 5 or more utterances, pulling off and listening to each individual utterance WAV file, and then manually looking up addresses in our customer database was all becoming very time-consuming and monotonous!
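The log analysis step can be sketched as: parse each log line, count utterance events per call, combine the counts from every server, and flag calls at or above the retry threshold. This is a minimal sketch, not the Log Analyser's actual code. The `KEY=VALUE|KEY=VALUE` line format and the `CALLID`, `EVNT` and `UTTERANCE` names are placeholders, not Nuance's real call log schema, so check the Recognizer documentation for the field names your logs use. It also assumes call IDs are unique across all 9 servers.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable

# Hypothetical log schema: replace these with the field names and event code
# that your Recognizer call logs actually use.
CALL_KEY = "CALLID"
EVENT_KEY = "EVNT"
UTTERANCE_EVENT = "UTTERANCE"
RETRY_THRESHOLD = 5  # flag calls with this many utterances or more


def parse_record(line: str) -> dict[str, str]:
    """Split an assumed 'KEY=VALUE|KEY=VALUE' log line into a dict."""
    fields: dict[str, str] = {}
    for part in line.strip().split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments with no '=' in them
            fields[key] = value
    return fields


def count_utterances(lines: Iterable[str]) -> Counter:
    """Count utterance events per call ID."""
    counts: Counter = Counter()
    for line in lines:
        record = parse_record(line)
        if record.get(EVENT_KEY) == UTTERANCE_EVENT and CALL_KEY in record:
            counts[record[CALL_KEY]] += 1
    return counts


def calls_over_threshold(counts: Counter, threshold: int = RETRY_THRESHOLD) -> list[str]:
    """Return call IDs whose utterance count meets the retry threshold."""
    return sorted(call for call, n in counts.items() if n >= threshold)


def flag_calls(log_files: Iterable[Path], threshold: int = RETRY_THRESHOLD) -> list[str]:
    """Combine the logs pulled from every server, then flag high-retry calls."""
    counts: Counter = Counter()
    for path in log_files:
        with path.open(encoding="utf-8", errors="replace") as handle:
            counts.update(count_utterances(handle))
    return calls_over_threshold(counts, threshold)
```

Counting per call before filtering means a call whose utterances are spread across several log files is still totalled correctly.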
I therefore decided to extend my custom Nuance Log Analyser tool to do all of this at the click of a button! I also experimented with Microsoft Speech to Text, using the dictation grammar to transcribe the audio utterances into text automatically.
For each call flagged for further analysis (a high utterance count indicates retries at both the postcode and address line prompts), the tool outputs 4 files: a WAV file containing the merged utterances separated by a "beep"; a text file containing a possible transcription of the audio; a text file containing the actual ASR interpretations; and a text file containing possible addresses returned from the customer database.
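Merging a call's utterances into one WAV with a beep between each can be done with Python's standard `wave` module. This is a minimal sketch, assuming every utterance file is 16-bit linear PCM with the same channel count and sample rate; telephony audio stored as mu-law or A-law would need converting first. The 1 kHz tone and quarter-second beep length are arbitrary choices, not values taken from the tool.

```python
import math
import struct
import wave
from pathlib import Path
from typing import Sequence

BEEP_HZ = 1000        # assumed tone frequency for the separator
BEEP_SECONDS = 0.25   # assumed separator length


def make_beep(sample_rate: int, channels: int) -> bytes:
    """Return a short 16-bit PCM sine tone, duplicated across each channel."""
    n_frames = int(sample_rate * BEEP_SECONDS)
    frames = bytearray()
    for i in range(n_frames):
        value = int(0.5 * 32767 * math.sin(2 * math.pi * BEEP_HZ * i / sample_rate))
        frames += struct.pack("<h", value) * channels
    return bytes(frames)


def merge_utterances(wav_paths: Sequence[Path], out_path: Path) -> int:
    """Concatenate utterance WAVs with a beep between each; return frames written."""
    if not wav_paths:
        raise ValueError("no utterance files to merge")
    with wave.open(str(wav_paths[0]), "rb") as first:
        params = first.getparams()
    if params.sampwidth != 2:
        raise ValueError("sketch assumes 16-bit linear PCM input")
    fmt = (params.nchannels, params.sampwidth, params.framerate)
    beep = make_beep(params.framerate, params.nchannels)
    beep_frames = len(beep) // (params.sampwidth * params.nchannels)
    total = 0
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params.nchannels)
        out.setsampwidth(params.sampwidth)
        out.setframerate(params.framerate)
        for index, path in enumerate(wav_paths):
            with wave.open(str(path), "rb") as src:
                if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != fmt:
                    raise ValueError(f"{path} has a different audio format")
                if index > 0:  # separator goes between utterances, not before the first
                    out.writeframes(beep)
                    total += beep_frames
                out.writeframes(src.readframes(src.getnframes()))
                total += src.getnframes()
    return total
```

Checking the format of every file up front avoids silently producing a merged WAV that plays at the wrong speed.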
Reviewing each call now takes only 10–15 seconds!
Here are some screenshots: