thinking out loud...
run the video's audio through speech-to-text software,
translate the text into your desired language,
read the translation with text-to-speech software and record it,
then mux the new audio into the video in place of the original track.
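
Rough sketch of those four steps in Python, just to make it concrete. The three stage functions (`transcribe`, `translate`, `speak`) are placeholders I made up, so plug in whatever speech-to-text, translation and text-to-speech tools you end up picking. The `Line` type and the `line_0000.wav` naming are also my own assumptions. Only the ffmpeg flags in the last step are real options.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Line:
    start: float  # seconds into the video
    end: float
    text: str


def dub_lines(
    audio_path: str,
    transcribe: Callable[[str], List[Line]],  # hypothetical speech-to-text stage
    translate: Callable[[str], str],          # hypothetical translation stage
    speak: Callable[[str, str], None],        # hypothetical text-to-speech stage
) -> List[Line]:
    """Steps 1-3: transcribe, translate, and synthesize one clip per line."""
    translated = []
    for i, line in enumerate(transcribe(audio_path)):
        # Keep the original timing so the clip can be placed later.
        new_line = Line(line.start, line.end, translate(line.text))
        speak(new_line.text, f"line_{i:04d}.wav")  # assumed file naming
        translated.append(new_line)
    return translated


def mux_command(video: str, dubbed_audio: str, out: str) -> List[str]:
    """Step 4: ffmpeg arguments that keep the video stream and swap the audio."""
    return [
        "ffmpeg", "-i", video, "-i", dubbed_audio,
        "-map", "0:v", "-map", "1:a",  # video from input 0, audio from input 1
        "-c:v", "copy",                # do not re-encode the video
        "-shortest", out,
    ]
```

This leaves out mixing the per-line clips into one audio file at the right offsets, which would have to happen before the mux. The command list can be handed to `subprocess.run` once ffmpeg is installed.
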
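Problems I can think of:
timestamps: the first step has to keep the timing of each line, otherwise the new audio won't line up with the video,
sound effects: they share the audio track with the speech, so swapping the track drops them too,
accuracy: speech-to-text and machine translation both make mistakes, and the errors compound.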
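
For the timestamp problem, one possible approach (an assumption on my part, not something I've tested on real video) is to compare how long each synthesized clip is with the slot the original line occupied, and speed the clip up when it runs over. The function name and the 1.5x cap are made up for illustration.

```python
def fit_tempo(slot_start: float, slot_end: float, clip_seconds: float,
              max_tempo: float = 1.5) -> float:
    """Return the playback speed-up needed to fit a clip into its time slot.

    1.0 means the clip already fits (pad the rest with silence).
    Values above 1.0 mean the clip must be played faster, capped at
    max_tempo (an assumed limit for still sounding natural).
    """
    slot = slot_end - slot_start
    if slot <= 0 or clip_seconds <= 0:
        raise ValueError("slot and clip must have positive duration")
    tempo = clip_seconds / slot
    # Never slow a clip down; never exceed the cap.
    return min(max(tempo, 1.0), max_tempo)
```

A 3 second clip that has to fit a 2 second slot gives `fit_tempo(0.0, 2.0, 3.0) == 1.5`. If the cap is hit the clip will still overrun, so the translation for that line probably needs to be shortened. The resulting factor could be applied with something like ffmpeg's `atempo` audio filter.
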
thoughts?