Impressive Plugin, nice work!
Btw, you can try the 7B models instead of the 13B ones. They are faster and perform better in chat situations than the 13B models, because chat is their main role.
I prefer this one: Undi95/Toppy-M-7B. It's also from Undi, like the lewd models. Maybe you can also find it on TheBloke's page as GPTQ etc. But if you can run it as the original Transformers model, it gives the best inference.
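In case it helps, here is a rough sketch of running the original (unquantized) model with the Hugging Face `transformers` library. It assumes `torch`, `transformers` and `accelerate` are installed and that you have enough VRAM for a 7B model in fp16. The function name `generate_reply`, the plain-text prompt and the sampling settings are just my placeholders, not something the model card prescribes.

```python
def generate_reply(prompt: str,
                   model_id: str = "Undi95/Toppy-M-7B",
                   max_new_tokens: int = 128) -> str:
    """Load the model and generate a continuation for `prompt` (illustrative helper)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # half precision to fit a 7B model on one GPU
        device_map="auto",          # needs the `accelerate` package
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # assumed value, tune to taste
    )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Usage (downloads the weights on first call):
# print(generate_reply("Hello, how are you today?"))
```

For real use you would load the model once and reuse it, this version reloads on every call just to keep the example short.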
You can also check the LLM Leaderboard for good new models:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
The best models for this usage have high HellaSwag and Winogrande scores.
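If you copy a few scores from the leaderboard by hand, something like this could shortlist candidates. Averaging the two benchmarks is only my assumption for a simple ranking, and the model names and numbers below are made-up placeholders, not real leaderboard results.

```python
def rank_for_chat(scores):
    """Return model names sorted by mean HellaSwag/Winogrande score, best first."""
    def mean_score(name):
        entry = scores[name]
        return (entry["hellaswag"] + entry["winogrande"]) / 2
    return sorted(scores, key=mean_score, reverse=True)

# Placeholder values for illustration only, NOT real benchmark results.
example_scores = {
    "model-a": {"hellaswag": 80.0, "winogrande": 70.0},
    "model-b": {"hellaswag": 84.0, "winogrande": 78.0},
    "model-c": {"hellaswag": 75.0, "winogrande": 79.0},
}

ranking = rank_for_chat(example_scores)
print(ranking)
```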