Read aloud problem

skil-phil · 09-28-2024, 05:13 AM

Hi,
I have just tried to use the read aloud feature.
Using the flite and speech dispatcher engines I get a voice (but a terrible one).
Using the Piper engine, the only thing that happens is that a word or phrase is highlighted, but there is no sound. The same word remains highlighted and nothing else happens.
Tried a different voice. Still nothing.
Any ideas?
Thanks
Phil

JSWolf · 09-28-2024, 05:15 AM

Quote:

Originally Posted by skil-phil

Hi,
I have just tried to use the read aloud feature.
Using the flite and speech dispatcher engines I get a voice (but a terrible one).
Using the Piper engine, the only thing that happens is that a word or phrase is highlighted, but there is no sound. The same word remains highlighted and nothing else happens.
Tried a different voice. Still nothing.
Any ideas?
Thanks
Phil

Which version of calibre? Which OS?

skil-phil · 09-28-2024, 06:22 AM

Quote:

Originally Posted by JSWolf

Which version of calibre? Which OS?

7.19
Linux mint 21

JSWolf · 09-28-2024, 06:35 AM

Quote:

Originally Posted by skil-phil

7.19
Linux mint 21

Are you by any chance using Wayland?

skil-phil · 09-28-2024, 07:11 AM

No
Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: modesetting unloaded: fbdev,vesa
gpu: i915 display-ID: :0 screens: 1

JSWolf · 09-28-2024, 07:12 AM

Quote:

Originally Posted by skil-phil

No
Display: x11 server: X.Org v: 1.21.1.4 driver: X: loaded: modesetting unloaded: fbdev,vesa
gpu: i915 display-ID: :0 screens: 1

Then I'm out of ideas. Someone else will have to help.

skil-phil · 09-28-2024, 07:14 AM

Quote:

Originally Posted by JSWolf

Then I'm out of ideas. Someone else will have to help.

Thanks for the suggestions.

kovidgoyal · 09-28-2024, 10:21 PM

If you are not getting sound then it's because the Qt Multimedia module is unable to connect to the audio device. Check that you have working pipewire/pulseaudio/alsa and try changing the default audio device in the read aloud configuration.

skil-phil · 09-29-2024, 12:39 AM

Quote:

Originally Posted by kovidgoyal

If you are not getting sound then it's because the Qt Multimedia module is unable to connect to the audio device. Check that you have working pipewire/pulseaudio/alsa and try changing the default audio device in the read aloud configuration.

There is only one option: 'Built-in Audio Analog Stereo'.
I do have pulseaudio working in other apps.
Will fiddle some more.
Thanks for the suggestion.
Phil

noodler · 10-18-2024, 02:03 PM

Quote:

Originally Posted by kovidgoyal

If you are not getting sound then it's because the Qt Multimedia module is unable to connect to the audio device. Check that you have working pipewire/pulseaudio/alsa and try changing the default audio device in the read aloud configuration.

I have a similar looking problem on Ubuntu 22.04 but I have found something unusual that suggests it's not the sound settings.

1. If I start read aloud, it highlights a sentence but does nothing - no sound and does not advance. I see a piper process max out 12/24 cores and allocate up to 1.5GB of memory.

2. However if I open the read aloud settings from the toolbar and click "cancel", it will read aloud the current sentence, then highlight the next, but not proceed further.

3. It will read a sentence each time I open and cancel the read aloud settings. If I click "ok" on the read aloud settings, it does not read the sentence.

4. I get no relevant messages / errors in the terminal or syslog

Hope that helps!

kovidgoyal · 10-18-2024, 02:06 PM

piper maxing out your CPUs is normal: it proceeds to synthesize audio for the entire current chapter regardless of how slowly the actual speaking goes.

Run the viewer as

calibre-debug -w /path/to/ebook

and you will get plenty of messages in the terminal.

noodler · 10-18-2024, 02:51 PM

It looks like an audio clip is only getting sent to the device when the audio state is toggled.

These are the logs from the moment I cancel the settings dialog, after 10s without audio, which causes one sentence to be read out loud:

Code:

[10.45] Audio state: State.IdleState
[10.45] Utterance 1 audio output finished
[10.47] Audio sent to output: maxlen=16384 len(ans)=16384
[10.47] Audio state: State.ActiveState

At all other times, all I see is piper zooming through the synthesis and "Waiting for audio to finish playing..." with no errors, but no sound and no "audio output finished" messages, e.g.

Code:

[1.22] Utterance 3 synthesis started
[1.22] Synthesized data read: 36864 bytes
[1.22] [piper-debug] Phonemizing text: “I hold at your neck the gom jabbar,” she said.
[1.22] [piper-debug] Converting 50 phoneme(s) to ids: aɪ hˈoʊld æt jʊɹ nˈɛk ðə ɡˈɑːm dʒˈæbɑːɹ, ʃiː sˈɛd.
[1.22] [piper-debug] Converted 50 phoneme(s) to 103 phoneme id(s): xxx
[1.22] [piper-debug] Synthesizing audio for 103 phoneme id(s)
[1.50] [piper-debug] Synthesized 2.2639455782312927 second(s) of audio in 0.280647179 second(s)
[1.50] Synthesized data read: 65536 bytes
[1.50] [piper-info] Waiting for audio to finish playing...
[1.50] [piper-info] Real-time factor: 0.13556893212154525 (infer=0.9396494800000001 sec, audio=6.931156462585034 sec)
[1.50] Utterance 3 got 102400 bytes of audio data from piper
[1.50] Utterance 4 synthesis started
[1.50] Synthesized data read: 34304 bytes
[1.50] [piper-debug] Phonemizing text: “The gom jabbar, the highhanded enemy.
[1.50] [piper-debug] Converting 40 phoneme(s) to ids: ðə ɡˈɑːm dʒˈæbɑːɹ, ðə hˈaɪhændᵻd ˈɛnəmi.
[1.50] [piper-debug] Converted 40 phoneme(s) to 83 phoneme id(s): xxx
[1.50] [piper-debug] Synthesizing audio for 83 phoneme id(s)
[1.79] [piper-debug] Synthesized 2.345215419501134 second(s) of audio in 0.28609584 second(s)
[1.79] Synthesized data read: 65536 bytes
[1.79] [piper-info] Waiting for audio to finish playing...
[1.79] [piper-info] Real-time factor: 0.13213628513180542 (infer=1.2257453200000001 sec, audio=9.276371882086167 sec)
[1.79] Utterance 4 got 99840 bytes of audio data from piper
[1.79] Utterance 5 synthesis started

Please find the full log attached.
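
As a sanity check on the log excerpt above: the "Real-time factor" that piper prints appears to be inference time divided by seconds of audio produced. That reading is an assumption based on the numbers in the log, not on piper's source. A minimal sketch that recomputes it from the two logged lines (the function name real_time_factor is made up for illustration):

```python
def real_time_factor(infer_seconds, audio_seconds):
    """Ratio of synthesis time to audio duration (assumed definition).

    A value below 1 means synthesis runs faster than playback.
    """
    return infer_seconds / audio_seconds


# (infer, audio) pairs copied from the two [piper-info] lines in the log above.
logged = [
    (0.9396494800000001, 6.931156462585034),
    (1.2257453200000001, 9.276371882086167),
]

for infer, audio in logged:
    rtf = real_time_factor(infer, audio)
    print(f"infer={infer:.3f}s audio={audio:.3f}s -> real-time factor {rtf:.4f}")
```

Both values come out well below 1, so synthesis is several times faster than playback. That is consistent with the stall being on the audio output side rather than in piper.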

kovidgoyal · 10-18-2024, 11:39 PM

Yes, as I said, it's an issue with your audio device. For whatever reason it's not reading the audio data that is available to it. Sadly audio on Linux is such an absolute cluster fuck that it could be anything, I haven't the first clue where you would go to debug it. The calibre piper code will emit the readyRead() signal when synthesized data is available. It is then up to the audio device to read that data, which it isn't doing on your system.

noodler · 10-19-2024, 11:46 AM

I've taken a look at the code and found a fix by removing the line that sets a large buffer on the QAudioSink in piper.py:

Code:

self._audio_sink.setBufferSize(2 * 1024 * 1024)

With that line removed:
- the default buffer size on Ubuntu 22.04 is just 3794 bytes rather than 2097152 bytes
- piper tts now plays fine - no stutter or glitches (sounds great btw, looking forward to using it!)

It feels like a Nagle's-algorithm-type issue, i.e. Qt/Linux is waiting for enough data to be written into the buffer before processing it, to avoid underruns, but calibre only writes one utterance at a time and waits for it to be processed so it can sync the highlight. A single utterance is tiny relative to a 2MB buffer, so processing stalls.

Might count as a Qt bug?

I guess the buffer size might need to be exposed as a setting, given that different platforms seem quite sensitive to it.
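
To put the sizes in this post in perspective, here is a rough back-of-the-envelope calculation. It assumes 16-bit mono PCM at 22050 Hz, which is a guess that fits the durations in the log earlier in the thread (2.2639455782312927 s is exactly 49920 samples at that rate) but is not confirmed by the calibre source:

```python
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM
CHANNELS = 1           # assumption: mono
SAMPLE_RATE = 22050    # assumption: Hz, inferred from the logged durations


def seconds_of_audio(num_bytes):
    """Playback time represented by num_bytes of raw PCM."""
    return num_bytes / (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE)


large_buffer = 2 * 1024 * 1024   # value passed to setBufferSize() in piper.py
default_buffer = 3794            # default reported on Ubuntu 22.04
utterance = 102400               # "Utterance 3 got 102400 bytes" in the log

for label, size in [("large buffer", large_buffer),
                    ("default buffer", default_buffer),
                    ("one utterance", utterance)]:
    print(f"{label}: {size} bytes = {seconds_of_audio(size):.2f} s of audio")

print(f"one utterance fills {utterance / large_buffer:.1%} of the large buffer")
```

Under those assumptions the 2MB buffer holds about 47.6 seconds of audio, while a single utterance is a little over 2 seconds, i.e. under 5% of the buffer. If the backend really does wait for the buffer to fill up substantially, one utterance would never get it there.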

kovidgoyal · 10-19-2024, 11:00 PM

Definitely a bug in either Qt or the underlying audio driver. It's pretty ridiculous for an audio driver to refuse to output audio until its buffer is full. What if the user is playing short, intermittent audio sounds? I will note it works fine on all the Linux, Windows and macOS systems I have. If you can reproduce it easily I suggest writing a small PySide-based script to reproduce it (just create an audio device with a large buffer and write some random raw audio data into it) and open a bug report at Qt with your reproducer script.

And in the next release there will be a tool to generate Piper-based audio overlays, which means you can pre-create the TTS audio files and play them directly using the calibre viewer to work around the Qt bug.
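
For anyone who wants to try the reproducer suggested above, a starting point might look like the sketch below. It is untested against the affected systems and makes several assumptions: PySide6 is installed, the audio format is 16-bit mono at 22050 Hz, and the sink is used in push mode (writing into the device returned by start()). The calibre code instead appears to hand the sink a device that signals readyRead(), so a faithful reproducer may need a QIODevice subclass. The names make_noise and run_reproducer are made up for illustration.

```python
import random
import struct

SAMPLE_RATE = 22050              # assumption: Hz
BUFFER_SIZE = 2 * 1024 * 1024    # the large buffer size from the thread


def make_noise(seconds, rate=SAMPLE_RATE):
    """Return `seconds` of quiet random 16-bit mono PCM as bytes."""
    n = int(seconds * rate)
    samples = (random.randint(-2000, 2000) for _ in range(n))
    return struct.pack(f"<{n}h", *samples)


def run_reproducer(buffer_size=BUFFER_SIZE, seconds=2.0):
    """Write one short clip into a QAudioSink and log its state changes."""
    # Imported lazily so make_noise() works without PySide6 installed.
    from PySide6.QtCore import QCoreApplication, QTimer
    from PySide6.QtMultimedia import QAudioFormat, QAudioSink, QMediaDevices

    app = QCoreApplication([])

    fmt = QAudioFormat()
    fmt.setSampleRate(SAMPLE_RATE)
    fmt.setChannelCount(1)
    fmt.setSampleFormat(QAudioFormat.SampleFormat.Int16)

    sink = QAudioSink(QMediaDevices.defaultAudioOutput(), fmt)
    sink.setBufferSize(buffer_size)
    sink.stateChanged.connect(lambda state: print("Audio state:", state))

    device = sink.start()  # push mode: returns a QIODevice to write into
    # write() may accept fewer bytes than offered if the buffer is small.
    written = device.write(make_noise(seconds))
    print("bytes written:", written, "buffer size:", sink.bufferSize())

    # Quit a few seconds after the clip should have finished playing.
    QTimer.singleShot(int((seconds + 3) * 1000), app.quit)
    app.exec()
```

Calling run_reproducer() once with the default buffer_size and once with a small value such as 4096 should show whether the clip is only audible with the small buffer. With a small buffer, a single write() will probably not accept the whole clip, so only part of it may play.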
