The good thing is that once the learning is done, the resulting dataset can be copied and reused easily, so the learning process doesn't need to be repeated. The "playback" mode has learning disabled and just behaves the way it did at the end of the final round of learning.
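To make the "learn once, copy, play back" idea concrete, here is a minimal sketch in Python. It is not the system discussed here, just a toy I made up for illustration: a tabular Q-learning agent in a six-cell corridor that has to walk from the right end to cell 0. The names `TinyAgent`, `run_episode`, `demo`, and the `learned.json` file are all hypothetical.

```python
import json
import os
import random
import tempfile


class TinyAgent:
    """Toy tabular Q-learning agent on a 1-D corridor (hypothetical example)."""

    def __init__(self, n_states=6, learning=True, seed=0):
        self.n_states = n_states
        self.learning = learning
        self.rng = random.Random(seed)
        # q[state][action]; action 0 = step left, action 1 = step right
        self.q = [[0.0, 0.0] for _ in range(n_states)]

    def act(self, state):
        if self.learning:
            # Explore uniformly at random while learning. Q-learning is
            # off-policy, so it still learns values for the greedy policy.
            return self.rng.randrange(2)
        # Playback: always take the best known action (ties go right).
        left, right = self.q[state]
        return 0 if left > right else 1

    def update(self, state, action, reward, next_state, alpha=0.5, gamma=0.9):
        if not self.learning:
            return  # playback mode: the learned table is frozen
        target = reward + gamma * max(self.q[next_state])
        self.q[state][action] += alpha * (target - self.q[state][action])

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"n_states": self.n_states, "q": self.q}, f)

    @classmethod
    def load(cls, path):
        """Load a saved table into an agent with learning disabled."""
        with open(path) as f:
            data = json.load(f)
        agent = cls(n_states=data["n_states"], learning=False)
        agent.q = data["q"]
        return agent


def run_episode(agent, max_steps=200):
    """Start at the right end; the goal is cell 0.

    Returns the number of steps taken, or None if the step limit is hit.
    """
    state = agent.n_states - 1
    for step in range(1, max_steps + 1):
        action = agent.act(state)
        move = -1 if action == 0 else 1
        next_state = min(max(state + move, 0), agent.n_states - 1)
        reward = 1.0 if next_state == 0 else 0.0
        agent.update(state, action, reward, next_state)
        state = next_state
        if state == 0:
            return step
    return None


def demo():
    # An untrained agent in playback mode has nothing to go on.
    steps_untrained = run_episode(TinyAgent(learning=False))

    # Do the learning once...
    trainer = TinyAgent(learning=True)
    for _ in range(300):
        run_episode(trainer)

    # ...then copy the result to a file and reuse it without retraining.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "learned.json")
        trainer.save(path)
        player = TinyAgent.load(path)

    before = [row[:] for row in player.q]
    steps_trained = run_episode(player)
    return steps_untrained, steps_trained, player.q == before


if __name__ == "__main__":
    print(demo())
```

In this toy setup the untrained playback agent never reaches the goal (`run_episode` returns `None`), while the agent loaded from the file walks straight there in 5 steps and its table is left untouched. A real character controller would store network weights rather than a small table, but the save, copy, and frozen-playback pattern is the same.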
With a dynamic reward system, where you can give it new "goals" in real time, you could easily control movement, such as making it want to touch a certain collider with its hand or walk to a certain location.
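As a rough illustration of what a swappable goal could look like, here is a small Python sketch of a reward function that takes the current goal as an argument. Everything in it is my own assumption for the example: the `Goal` class, the `"touch"` and `"walk"` goal kinds, the `state` dictionary with `"hand"` and `"body"` positions, and the `reach_radius` threshold.

```python
import math
from dataclasses import dataclass


@dataclass
class Goal:
    kind: str      # "touch" (hand to target) or "walk" (body to target)
    target: tuple  # (x, y, z) position of the collider or location


def goal_reward(state, goal, reach_radius=0.1):
    """Reward that depends on whichever goal is active right now.

    `state` is assumed to be a dict holding "hand" and "body" positions
    as (x, y, z) tuples.
    """
    if goal.kind == "touch":
        point = state["hand"]
    elif goal.kind == "walk":
        point = state["body"]
    else:
        raise ValueError(f"unknown goal kind: {goal.kind}")

    distance = math.dist(point, goal.target)
    # Closer is better, plus a bonus once the target is actually reached.
    bonus = 1.0 if distance <= reach_radius else 0.0
    return -distance + bonus


if __name__ == "__main__":
    state = {"hand": (1.0, 0.0, 0.0), "body": (0.0, 0.0, 0.0)}
    print(goal_reward(state, Goal("touch", (1.0, 0.0, 0.0))))
    print(goal_reward(state, Goal("walk", (3.0, 4.0, 0.0))))
```

One caveat: for goals to be changed during playback, the goal would also have to be fed to the agent as an input, and the agent would have to be trained across many different goals. Changing only the reward has no effect once learning is disabled.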
All of this is easily achievable with stuff like this by a clever person. Sadly, I am not an AI guru; it needs somebody way smarter than me.