Metadata in JSON format with speaker info

Tortoise · October 10, 2019, 9:26am

Dear Support.

I have seen your help for metadata on the ticket link below.

This does not help me yet.
Can you please give me the command line for using client.cc while the STT model is running with DeepSpeech, so that I get proper JSON output with speaker info? I have tried several ways, but I am confused and still unsuccessful.
This is how I have tried to use client.cc:

$python3 …/DeepSpeech-0.5.1/DeepSpeech.py --model models/en/output_graph.pbmm --alphabet models/en/alphabet.txt --lm models/en/lm.binary --trie models/en/trie --audio data/test4/dialog.wav --extended …/DeepSpeech-0.5.1/native_client/client.cc --json --extended > data/test4/transcript_plain.txt

Greetings.

lissyx · October 10, 2019, 1:43pm

There are multiple questions here. Can you explain exactly what you want to achieve?
We don’t have any “speaker info” anyhow.

This command line makes no sense at all. What is this --extended …/DeepSpeech-0.5.1/native_client/client.cc --json --extended? What are you trying to achieve?

Tortoise · October 10, 2019, 1:55pm

Dear Respected sir,

First of all, thank you so much for your kind help.
I have a simple .wav file. I want to transcribe it to text, and later I align it with Mozilla's alignment tool.
But I realized that after alignment, the speaker is missing from the JSON output. The text I get from DeepSpeech is also a plain text file with nothing else in it.

Can you guide me on how and where to get that metadata, such as speaker info?
Or where am I making a mistake?

lissyx · October 10, 2019, 2:21pm

This is expected.

We don’t have any such thing, so I don’t know what you are talking about.

Tortoise · October 10, 2019, 2:27pm

Dear Sir, there is

where
after alignment

[
  // …
  {
    "start": 7491960,
    "end": 7493040,
    "transcript": "good shepherd"
  }
]

this format. and I am looking for

[
  // …
  {
    "start": 7491960,
    "end": 7493040,
    "transcript": "good shepherd",
    "text-start": 98302,
    "text-end": 98316,
    "meta": {
      "speaker": [
        "Phebe"
      ]
    }
  }
]

reuben · October 10, 2019, 2:34pm

@Tilman_Kamp might be able to help.

lissyx · October 10, 2019, 4:27pm

This is DSAlign, not DeepSpeech directly. I still can’t figure out exactly what you want to do with DeepSpeech itself.

Tortoise · October 11, 2019, 8:33am

Metadata search with specific info.

lissyx · October 11, 2019, 8:35am

Please, can you write a complete and descriptive sentence? I absolutely don’t understand what you want. I wish to help you, but honestly, I am losing my time right now trying to do divination out of five words.

Tortoise · October 11, 2019, 8:38am

Dear Sir,
suppose I have an audio recording of a conference or of a professor's lecture, and I am transcribing it to text. From that text, I need only what the key speaker said, or what the professor explained. That text can only be extracted if I have metadata with info such as speaker 1, speaker 2 and so on. I think this is explained in DSAlign, but I don’t know how to get it out.

lissyx · October 11, 2019, 8:41am

Ok, then the problem is that you did not read what I said earlier: we don’t have that information.

Tortoise · October 11, 2019, 1:54pm

Ok. Is there any open source tool or library that helps with getting this? Could you guide me, as you are much more of an expert in this domain? Or do you have any advice on which steps to follow?

lissyx · October 11, 2019, 1:56pm

This is not something we have any use for, so I have no advice to share.

reuben · October 11, 2019, 2:01pm

Search for speaker diarization.
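
Once a diarization tool has produced speaker turns, one way to get the "meta"/"speaker" field asked for above is to merge those turns into the aligned fragments. The following is only a minimal sketch under stated assumptions: the `turns` list, its field names, and the labels `speaker_1` and `speaker_2` are hypothetical (neither DeepSpeech nor this thread provides them), and the turn times are assumed to use the same units as the fragments' "start" and "end" values.

```python
import json
from typing import Dict, List


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def tag_speakers(fragments: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Copy each fragment and attach the speaker whose turn overlaps it most.

    Fragments that overlap no turn are returned without a "meta" field.
    """
    tagged = []
    for frag in fragments:
        best_speaker = None
        best_overlap = 0
        for turn in turns:
            ov = overlap(frag["start"], frag["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = turn["speaker"], ov
        out = dict(frag)
        if best_speaker is not None:
            out["meta"] = {"speaker": [best_speaker]}
        tagged.append(out)
    return tagged


# Fragment taken from the alignment output quoted earlier in the thread.
fragments = [{"start": 7491960, "end": 7493040, "transcript": "good shepherd"}]

# Hypothetical diarization output: who spoke when.
turns = [
    {"speaker": "speaker_1", "start": 7490000, "end": 7492000},
    {"speaker": "speaker_2", "start": 7492000, "end": 7495000},
]

tagged = tag_speakers(fragments, turns)
print(json.dumps(tagged, indent=2))

# Keep only what one speaker said, e.g. the professor.
only_speaker_2 = [
    f for f in tagged if "speaker_2" in f.get("meta", {}).get("speaker", [])
]
```

Picking the turn with the largest overlap is a simple heuristic; fragments that span a speaker change would need to be split rather than assigned to one speaker.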

lissyx · October 11, 2019, 6:41pm

FTR there was also this: https://github.com/mozilla/DeepSpeech/issues/2169

Tortoise · October 14, 2019, 8:30am

Thank you so much. You are all really great people.
One more question.
In alignment, is there any flag or option so that, instead of fragments, I get JSON for each word?

lissyx · October 14, 2019, 9:58am

You can implement that yourself, from the Metadata structure.
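
A rough sketch of what that could look like, assuming the Metadata structure yields one item per character, each with the character and its start time in seconds. The `(character, start_time)` tuples below are stand-ins for those items, not the real API objects, and the function name is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def words_from_items(items: List[Tuple[str, float]]) -> List[Dict[str, object]]:
    """Group (character, start_time) pairs into one entry per word.

    A word starts at the start time of its first character. It ends at the
    start time of the following space, or at the start time of its last
    character when no space follows.
    """
    words = []
    current = ""
    word_start = 0.0
    last_time = 0.0
    for char, start_time in items:
        if char == " ":
            # A space closes the word collected so far, if any.
            if current:
                words.append({"word": current, "start": word_start, "end": start_time})
                current = ""
            continue
        if not current:
            word_start = start_time
        current += char
        last_time = start_time
    if current:
        # Final word with no trailing space.
        words.append({"word": current, "start": word_start, "end": last_time})
    return words


# Made-up per-character timings, for illustration only.
items = [
    ("g", 0.10), ("o", 0.14), ("o", 0.18), ("d", 0.22),
    (" ", 0.30),
    ("d", 0.40), ("a", 0.44), ("y", 0.48),
]

words = words_from_items(items)
print(json.dumps(words, indent=2))
```

With real metadata, the tuples would be built from whatever character and timing fields the installed DeepSpeech version exposes on each item.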