See my fork of the AI Pronunciation Trainer repository for more details.
Right now this tool uses:
faster_whisper as STT (speech-to-text) model; other supported models are:
48000 Hz as the input sample rate (in empirical tests, 48000 Hz gave the best results)
16000 Hz as the resampled sample rate
16000 Hz as the TTS (text-to-speech) sample rate
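
To illustrate how the 48000 input rate relates to the 16000 resampled rate, here is a minimal sketch of an integer-factor downsampler in plain Python. It is not the resampler this tool uses: `downsample_by_averaging` and the two constant names are hypothetical, and a real pipeline would normally rely on a dedicated audio library with a proper anti-aliasing filter.

```python
# Values taken from the list above; the constant names are assumptions.
INPUT_SAMPLE_RATE = 48000
RESAMPLED_SAMPLE_RATE = 16000


def downsample_by_averaging(samples, src_rate=INPUT_SAMPLE_RATE,
                            dst_rate=RESAMPLED_SAMPLE_RATE):
    """Crude integer-factor downsampler (hypothetical helper).

    Averages each group of `factor` consecutive samples, which acts as a
    very rough low-pass filter before decimation.
    """
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = src_rate // dst_rate  # 48000 // 16000 == 3
    # Ignore a trailing group that has fewer than `factor` samples.
    usable = len(samples) - len(samples) % factor
    return [
        sum(samples[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]


# One second of silence at the input rate becomes 16000 samples.
one_second = [0.0] * INPUT_SAMPLE_RATE
resampled = downsample_by_averaging(one_second)
print(len(resampled))  # 16000
```

Because 48000 is an exact multiple of 16000, every output sample corresponds to exactly three input samples, so no fractional interpolation is needed in this sketch.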
Speech accuracy output