audio-inference reverse proxy: use /stt/ (speech-to-text) or /tts/ (text-to-speech)
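
The hint above implies routing by path prefix. A minimal sketch of that idea follows; it is not the proxy's actual implementation. The upstream addresses, the name `resolve_upstream`, and the choice to strip the prefix before forwarding are all assumptions made for illustration. Only the `/stt/` and `/tts/` prefixes come from the message itself.

```python
from typing import Optional

# Hypothetical upstream addresses; the source names only the two path prefixes.
UPSTREAMS = {
    "/stt/": "http://127.0.0.1:9001",
    "/tts/": "http://127.0.0.1:9002",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Map a request path to an upstream URL, or None if no prefix matches.

    Assumption: the prefix is stripped before forwarding, so
    /stt/transcribe is sent to <stt-upstream>/transcribe.
    """
    for prefix, base in UPSTREAMS.items():
        if path.startswith(prefix):
            # Drop the matched prefix and re-add a single leading slash.
            return base + "/" + path[len(prefix):]
    # No known prefix: a caller would answer with the usage hint instead.
    return None
```

Under these assumptions, a path such as `/` or `/stt` (no trailing slash) matches neither prefix, which is the case where a proxy would reply with the usage hint rather than forward the request.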