Force alignment (synchronize audio with text)

ctzogka · July 2, 2019, 8:07am

Hi, I am trying to build a Greek dataset for the DeepSpeech project. I checked LibriVox and other available datasets, and I think I can get a big enough dataset. The problem arises when I apply forced alignment with the Aeneas library (see the tutorial: https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf).
When I export the JSON file, I see that the start/end times deviate considerably from the correct ones, and I have to synchronize them manually. Is there any other tool I could use to synchronize audio with text (one that supports many languages, including Greek) and export a JSON file or a similar format?
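
Editor's sketch: before fixing timings by hand, it may help to narrow down which fragments are likely misaligned. The Python below assumes the exported file follows the aeneas JSON sync map layout (a top-level "fragments" list whose items carry "begin", "end", and "lines"). The helper names, the sample data, and the characters-per-second thresholds are illustrative assumptions, not part of aeneas, and would need tuning for real Greek recordings.

```python
import json

def load_fragments(sync_map_json):
    """Parse an aeneas-style JSON sync map into (begin, end, text) tuples."""
    data = json.loads(sync_map_json)
    fragments = []
    for frag in data["fragments"]:
        begin = float(frag["begin"])  # times are stored as strings, in seconds
        end = float(frag["end"])
        text = " ".join(frag["lines"]).strip()
        fragments.append((begin, end, text))
    return fragments

def flag_suspicious(fragments, min_cps=5.0, max_cps=25.0):
    """Return indices of fragments whose characters-per-second rate looks implausible.

    min_cps and max_cps are made-up defaults for illustration only.
    """
    flagged = []
    for i, (begin, end, text) in enumerate(fragments):
        duration = end - begin
        if duration <= 0:
            flagged.append(i)  # empty or inverted interval
            continue
        cps = len(text) / duration
        if cps < min_cps or cps > max_cps:
            flagged.append(i)
    return flagged

# Hypothetical sample: the second fragment is far too short for its text.
SAMPLE = """{"fragments": [
  {"begin": "0.000", "end": "2.000", "id": "f000001", "language": "ell",
   "lines": ["Καλημέρα σε όλους σας"], "children": []},
  {"begin": "2.000", "end": "2.100", "id": "f000002", "language": "ell",
   "lines": ["Αυτή είναι μια δοκιμή"], "children": []}
]}"""

if __name__ == "__main__":
    print(flag_suspicious(load_fragments(SAMPLE)))
```

Only the flagged fragments would then need manual review, which does not remove the manual step but may shrink it.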

kdavis · July 2, 2019, 9:37am

@Tilman_Kamp Do you have any suggestions?

Tilman_Kamp · July 2, 2019, 10:02am

@ctzogka We are currently also working on forced alignment based on DeepSpeech, but this is pre-alpha and not tested on any other language than English. There is a quite comprehensive list of forced alignment tools. For your case the CMU Sphinx aligner seems to be a good first bet, as there is a Greek model. There is also an example on how to use it.

ctzogka · July 2, 2019, 11:07am

Thank you for the response, I am glad to hear that you are also working on forced alignment! The CMU Sphinx aligner isn’t what I was expecting… as far as I can see, it exports a CSV with word timestamps plus phonemic pronunciation. Also, I can’t run the project, and I think that’s because of the notice “This repository has been archived by the owner. It is now read-only.”
In my opinion, Aeneas is convenient for DeepSpeech, but it demands fine-tuning, which is time-consuming. Moreover, it would be nice to find or create an interface where, besides fine-tuning, users could edit the inferred transcript when it doesn’t match the ground truth.

srkn · July 3, 2019, 10:13am

Hey @Tilman_Kamp, I am also interested in a DeepSpeech-based forced alignment tool. Can you share a bit more info about it? What is the release date?

srkn · July 5, 2019, 7:45am

Here is the repo link for the tool mentioned by @Tilman_Kamp: https://github.com/mozilla/DSAlign/tree/master

Tilman_Kamp · July 5, 2019, 8:05am

That’s the repo, yes… The plan is to get it production-ready (for our purposes) in a couple of weeks. The main focus is labeling audio data on a phrase-by-phrase basis for training DeepSpeech.

kimonode · August 16, 2019, 8:02pm

Hey,

I am also interested in forced alignment, to automatically generate karaoke-style timestamped lyrics (e.g. .lrc files) from an audio file and a lyrics file (.mp3 and .txt).
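
Editor's sketch: once an aligner has produced a start time per lyric line, writing the .lrc file is the easy part. The Python below assumes alignment results are already available as (begin, end, text) tuples in seconds, and uses the common [mm:ss.xx] line tag. The function names are hypothetical, and no particular aligner's output format is implied.

```python
def to_lrc_timestamp(seconds):
    """Format a time in seconds as the [mm:ss.xx] tag used in .lrc files."""
    centis = int(round(seconds * 100))      # work in hundredths of a second
    minutes, rem = divmod(centis, 6000)
    secs, cs = divmod(rem, 100)
    return "[%02d:%02d.%02d]" % (minutes, secs, cs)

def fragments_to_lrc(fragments):
    """Turn (begin, end, text) tuples into LRC text, one tagged line per fragment."""
    lines = []
    for begin, _end, text in fragments:
        # LRC only records when a line starts; the end time is not written.
        lines.append(to_lrc_timestamp(begin) + text)
    return "\n".join(lines)

if __name__ == "__main__":
    # Made-up timings, for illustration only.
    demo = [(0.0, 2.0, "first line"), (75.5, 80.0, "second line")]
    print(fragments_to_lrc(demo))
```

Singing is typically harder to align than read speech, so the timings feeding this step may still need manual correction.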

I am looking forward to seeing where this thread is heading!

Jendker · September 11, 2019, 2:48pm

Hi,

What is the state of forced alignment with DeepSpeech? The repo looks quite good already, but could you confirm what its status is?

tensorfoo · October 28, 2019, 5:51am

Hi @Tilman_Kamp, I’ve been using DSAlign and thank you so much for the work you’ve put in, it’s fantastic.

I have a question about the workflow for generating new training data for DeepSpeech using DSAlign. We can do it using EXPORT, but is there a way to avoid generating WAV files etc. and just use the new transcriptions to train DeepSpeech once we’ve generated them in DSAlign? Kind of like how DSAlign calls DeepSpeech for inference, could we also call DeepSpeech for training? It would make the workflow a lot smoother to have that sort of integration.
