Recognize simple keyword lists as well as more complex grammars using the speech recognition features of Windows 10 and Unity. When I started this project, I assumed I could wrap the simple-looking API in a plugin and build a small demo scene in just a few hours. However, building a reasonable plugin UI, implementing error handling and the necessary performance features, and working around VaM quirks with file dependencies took a bit longer.

(Watch video with sound!)


Features
  • This is not "AI". It only understands the phrases you teach it, and it simply triggers whatever you hook up to a phrase when that phrase is recognized.
  • Keyword Phrase: Simple list of keyword variants. Works great for things like "Yes", "Yeah!", "Yep", ... or "No", "Nope", ...
  • Grammar Phrase: Provide an XML file that defines a grammar structure. This helps the system make more sense of what you are saying, lets you provide synonyms for individual words, etc. I recommend the Microsoft documentation for more details on what is possible.
  • The plugin allows you to listen for multiple phrases simultaneously. Each has its own trigger actions. To provide context, you can enable/disable phrases via triggers depending on what makes sense at that point in the scene.
  • UI should be familiar if you have used LogicBricks before.
  • Simple Demo scene from the video.
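The way several phrases coexist, each with its own trigger and an enabled flag, can be pictured with a small conceptual model. This is only a sketch in Python, not the plugin's code: the class names `Phrase` and `PhraseSet` and the exact-match lookup are assumptions made for illustration, and the real recognition is done by Windows, not by string comparison.

```python
class Phrase:
    """A named set of keyword variants with one trigger callback (hypothetical model)."""

    def __init__(self, name, variants, on_recognized, enabled=True):
        self.name = name
        self.variants = {v.lower() for v in variants}
        self.on_recognized = on_recognized
        self.enabled = enabled


class PhraseSet:
    """Holds several phrases and dispatches recognized text to the enabled ones."""

    def __init__(self):
        self.phrases = {}

    def add(self, phrase):
        self.phrases[phrase.name] = phrase

    def set_enabled(self, name, enabled):
        self.phrases[name].enabled = enabled

    def recognize(self, text):
        """Fire the trigger of every enabled phrase matching `text`; return their names."""
        spoken = text.strip().lower()
        fired = []
        for phrase in self.phrases.values():
            if phrase.enabled and spoken in phrase.variants:
                phrase.on_recognized(text)
                fired.append(phrase.name)
        return fired


log = []
phrases = PhraseSet()
phrases.add(Phrase("yes", ["Yes", "Yeah", "Yep"], lambda text: log.append("agreed")))
phrases.add(Phrase("no", ["No", "Nope"], lambda text: log.append("declined")))

phrases.set_enabled("no", False)  # "no" makes no sense at this point in the scene
phrases.recognize("Yep")          # fires the "yes" trigger
phrases.recognize("Nope")         # ignored while the phrase is disabled
```

Disabling phrases that make no sense in the current context keeps the set of active candidates small, which is the same idea as toggling phrases via triggers in the scene.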
Examples
Some sentences that should be recognized by the demo scene's grammar definitions:
  • "launch the terminator robots"
  • "deploy killer androids"
  • "release the mutant penguins"
  • "retrieve the sharks"
  • "recall ninja sharks"
In this case I stuck to simple sentences structured as "action - fluff - object". Of course, you could define variations not just in words but also in sentence structure.
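To make the "action - fluff - object" structure concrete, here is a minimal sketch of what such a grammar could look like in the W3C SRGS XML format, together with a small Python helper that lists every sentence it accepts. The grammar below is hypothetical: the rule names and word lists are assumptions for illustration and are not copied from the demo scene's files. The helper only handles the few SRGS elements used here (`rule`, `ruleref` to a local rule, `one-of`, `item`, and `repeat="0-1"`).

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical SRGS grammar: an action, an optional "the", then an object.
GRAMMAR_XML = """<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="command">
    <ruleref uri="#action"/>
    <item repeat="0-1">the</item>
    <ruleref uri="#object"/>
  </rule>
  <rule id="action">
    <one-of>
      <item>launch</item>
      <item>deploy</item>
      <item>release</item>
    </one-of>
  </rule>
  <rule id="object">
    <one-of>
      <item>terminator robots</item>
      <item>killer androids</item>
      <item>mutant penguins</item>
    </one-of>
  </rule>
</grammar>
"""

NS = "{http://www.w3.org/2001/06/grammar}"


def expand(node, rules):
    """Return every word sequence (list of words) that `node` can match."""
    tag = node.tag.replace(NS, "")
    if tag == "ruleref":
        # Only local references such as "#action" are supported in this sketch.
        return expand(rules[node.get("uri").lstrip("#")], rules)
    if tag == "one-of":
        # Alternatives: the union of what each child matches.
        return [seq for child in node for seq in expand(child, rules)]
    # <rule> and <item>: own text first, then the children in sequence.
    parts = [[node.text.split()]] if node.text and node.text.strip() else []
    parts += [expand(child, rules) for child in node]
    sequences = [sum(combo, []) for combo in itertools.product(*parts)]
    if node.get("repeat") == "0-1":
        sequences.append([])  # the optional item may be left out
    return sequences


def sentences(grammar_xml):
    """Return the set of sentences accepted by the grammar's root rule."""
    root = ET.fromstring(grammar_xml)
    rules = {rule.get("id"): rule for rule in root.findall(NS + "rule")}
    return {" ".join(words) for words in expand(rules[root.get("root")], rules)}


accepted = sentences(GRAMMAR_XML)
print(len(accepted), "sentences, for example:", sorted(accepted)[:3])
```

Listing the accepted sentences like this is a quick way to check a grammar for gaps before testing it with a microphone. For use with the plugin, the XML itself would be saved with a ".json" extension as described in the notes.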

Notes / FAQ
  • This plugin is using your system default microphone, not your microphone setting in Oculus or SteamVR software! Make sure your mic is enabled and has a reasonable volume setting.
  • While SpeechRecognition works with other languages, to run the demo you need to go to your Windows 10 speech settings and make sure they are set to English (US). You may have to install the voice package as well. Note that other English variants, like UK or Australian, won't work; it has to be US! This is a different setting than your regular Windows region/language setting. You can try to enable "Recognize non-native accents", but at least for me it actually reduced the recognition reliability. See screenshot for details:
    • 1678647940526.png
  • WHY on earth does this use a ".json" file extension for ".xml" files??! Because....VaM. When creating a VAR package this allows dependencies to be handled automatically, making sure your scene does not break. As the list of extensions properly supported by VaM is fixed, ".json" is simply the least terrible choice.
  • If your XML files refer to other XML files using the ruleref tag, you can/should add a dependency hint for each file needed, so VaM can find them. For each file, add a comment line like this:
<!-- "VaM Dependency Hint" : "common.json" -->
  • Of course Windows 10 Speech Recognition does not know what a VAR package is or how to get files out of it. Therefore, when the plugin encounters XML files (".json") that are located inside a VAR package, all ".json" files in that package are extracted to "Custom\PluginData\SpeechRecognition" to make them accessible. Sadly, that includes other files like scenes that don't really need to be extracted. If you have lots of those, you might want to put your XML files into a separate VAR package to avoid extracting more than needed.
  • The problem with speech recognition is that a creator will have a hard time thinking of all the possible things a player could say and setting up recognition and reactions for those. There is simply no way to implement everything. Usually you will have to give the user some kind of hint about what types of actions can be done.
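The dependency hints and the VAR extraction described in the notes above can be sketched in a few lines of Python. This is not the plugin's implementation, only an illustration under stated assumptions: the helper names `find_dependency_hints` and `extract_json_files` are made up, the regular expression simply mirrors the comment format shown above, and a VAR package is assumed to be a regular ZIP archive.

```python
import re
import zipfile

# Matches comment lines like: <!-- "VaM Dependency Hint" : "common.json" -->
HINT_PATTERN = re.compile(r'<!--\s*"VaM Dependency Hint"\s*:\s*"([^"]+)"\s*-->')


def find_dependency_hints(xml_text):
    """Return the file names listed in dependency-hint comments, in order."""
    return HINT_PATTERN.findall(xml_text)


def extract_json_files(var_path, target_dir):
    """Extract every ".json" entry of a VAR archive (assumed to be a ZIP file).

    Like the behavior described above, this does not distinguish grammar
    files from scenes: anything ending in ".json" is extracted.
    """
    extracted = []
    with zipfile.ZipFile(var_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                archive.extract(name, target_dir)
                extracted.append(name)
    return extracted
```

Because the extraction step cannot tell grammars from scenes by extension, a package that contains only grammar files keeps the extracted set small, which is why a separate VAR package for the XML files is suggested above.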
Credits for EvilCorpHQ scene
  • Speech synthesis for assistant voice by Amazon Polly (via ttsmp3.com)
Dependencies for EvilCorpHQ scene
  • All the needed dependencies can be found for free on the Hub, just use VaM's handy "Scan Hub For Missing Packages" function.
License
  • This was an EarlyAccess release! Download is now available for free under CC BY-SA license. You are allowed to reference this package in your own VAR packages, even if they are paid or use a different license. Links to my Patreon are always appreciated.
Author: MacGruber
Downloads: 142,651
Views: 142,651
Dependencies: 3
Packages: 2
Total Size: 0.13 MB
Version: 3 (free)
Rating: 5.00 star(s), 9 ratings


Latest updates

  1. Version 3 (free)

    Changelog Fixed null reference exception that was apparently introduced due to an API change...
  2. Version 2 (free)

    Changelog for EvilCorpHQ demo Upgraded to current plugin versions. This mainly fixes the broken...
  3. Version 1 (free) (forgot the demo scene)

    Uploaded missing demo scene as well ;)

Latest reviews

Very cool library based speech recognition plugin that sidesteps some of the trendy attempts at ML integrations.
I loved the demo :).
This is an amazing feature, especially for VR, making many controls so much easier to access. It's been quite an inspiration for me, leading to my recently published Voice Commander, which uses some of the same basic features showcased here to add speech recognition for all UI buttons to any scene. Thank you very much for inspiring me to create that.
I love it! So much more natural to interact through speech than buttons.
Logic bricks and speech recognition take time to understand and effort to implement, but vam is infinitely more immersive when they're properly woven into a scene.

One note on the demo scene MacGruber has included: you can have all the discussed setup in Windows / vam correct (e.g. default mic), and the scene still might not work. For the demo to work correctly, you need to have EN-US set as your Speech inside the Language setting in Windows. Otherwise, the demo scene will throw a pretty obvious error (mismatch) and fail to recognize anything you say, even if you're speaking in a US English dialect. Your Windows display / keyboard language doesn't matter, just the speech setting.

Overall fantastic work and fantastic demo. This scene is actually a great way for beginners to get started on logic bricks as well; just have a look through the scene controller and voice recognition nodes at the corner of the desk, and it's really easy to see what's possible. MacGruber ftw
Acid Bubbles and MacGruber ftw!
Just played around with this a bit. Amazing plugin! Recognizes the words very good. Also supports different languages :)
Awesome!! Very slick work, the 'add keyword' option is really cool, thank you for making and sharing this!
The speech recognition works quite well for me and I've found that it really increases immersion.