Spot-on demonstration of where we are on Mori's curve right now. The prosody issue you mention is critical: AI voices nail lexical stress but often miss the pragmatic layer, where humans modulate pitch and timing based on discourse context, not just word meaning. What's wild is how the valley's depth varies with listener familiarity with synthetic voices. People who interact with voice agents daily seem to have shifted their threshold; they're less bothered by mismatches that would have triggered revulsion a year ago. Adaptation is happening faster than the technology is improving.
Thanks for the feedback. We're mostly comfortable with humans code-switching, so perhaps this increased tolerance is a version of that?