Metadata in JSON format with speaker info

Dear Support,

I have seen your help for metadata on the ticket link below.

This does not help me yet.
Can you please give me the command line to use when running the STT model with DeepSpeech, so that I get proper JSON output with speaker info? I tried several ways. I am confused and still unsuccessful.
This is what I tried:

$python3 …/DeepSpeech-0.5.1/ --model models/en/output_graph.pbmm --alphabet models/en/alphabet.txt --lm models/en/lm.binary --trie models/en/trie --audio data/test4/dialog.wav --extended …/DeepSpeech-0.5.1/native_client/ --json --extended > data/test4/transcript_plain.txt


There are multiple questions here. Can you explain exactly what you want to achieve?
We don’t have any “speaker info” anyhow.

This command line makes no sense at all. What is this --extended …/DeepSpeech-0.5.1/native_client/ --json --extended? What are you trying to achieve?

Dear Sir,

First of all, thank you so much for your kind help.
I have a simple .wav file. I want to convert it to text and later align it with the Mozilla aligner.
But I realized that, after alignment, the speaker is missing from the JSON output. The text I get from DeepSpeech is also a plain text file with nothing else in it.

Can you guide me on how and where to get that metadata, such as speaker info?
Or where am I making a mistake?

This is expected.

We don’t have such a thing, so I don’t know what you are talking about.

Dear Sir, after alignment the output looks like this:

// …
"start": 7491960,
"end": 7493040,
"transcript": "good shepherd"

That is the format I get. I am looking for this:

// …
"start": 7491960,
"end": 7493040,
"transcript": "good shepherd",
"text-start": 98302,
"text-end": 98316,
"meta": {
  "speaker": [

@Tilman_Kamp might be able to help.

This is DSAlign, not DeepSpeech directly. I still can’t figure out what exactly you want to do with DeepSpeech itself.

Metadata search with specific info.

Please, can you articulate a complete and descriptive sentence? I absolutely don’t understand what you want. I wish to help you, but honestly, I am losing my time right now trying to do divination out of five words.

Dear Sir,
Suppose I have an audio recording of a conference or a professor’s lecture. I transcribe it into text. Then, from that text, I need only what the key speaker said, or what the professor explained. That text can only be extracted if I have metadata with info like speaker 1, speaker 2, and so on. I think this is explained in DSAlign, but I don’t know how to get it out.

OK, then the problem is that you did not read what I said earlier: we don’t have that information.

OK. Is there any open-source tool or library that helps with this? Could you guide me, as you are much more of an expert in this domain? Or do you have any advice on what steps I need to follow?

This is not something we have any use for, so I have no advice to share.

Search for speaker diarization.
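
To make the suggestion above concrete: a diarization tool typically produces speaker turns (who spoke from when to when), and those turns can be matched against the aligned fragments by time overlap. The following is only a minimal sketch under stated assumptions. The `turns` list, the "speaker 1" / "speaker 2" labels, the second fragment, and the helper names are all hypothetical, and it assumes the fragment "start"/"end" values and the turn boundaries use the same time unit. Neither DeepSpeech nor DSAlign produces the turns.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def label_fragments(fragments, turns):
    """Attach to each fragment the speaker whose turn overlaps it the most.

    fragments: dicts with "start", "end", "transcript" (as in the aligned output).
    turns: hypothetical diarization output, dicts with "start", "end", "speaker".
    """
    labeled = []
    for frag in fragments:
        best_speaker, best_overlap = None, 0
        for turn in turns:
            ov = overlap(frag["start"], frag["end"], turn["start"], turn["end"])
            if ov > best_overlap:
                best_speaker, best_overlap = turn["speaker"], ov
        # Copy the fragment and add the speaker (None if no turn overlaps).
        labeled.append({**frag, "speaker": best_speaker})
    return labeled


# First fragment is taken from this thread; the second one is made up.
fragments = [
    {"start": 7491960, "end": 7493040, "transcript": "good shepherd"},
    {"start": 7493500, "end": 7495000, "transcript": "hypothetical second fragment"},
]
# Made-up speaker turns, standing in for the output of a diarization tool.
turns = [
    {"start": 7490000, "end": 7493200, "speaker": "speaker 1"},
    {"start": 7493200, "end": 7496000, "speaker": "speaker 2"},
]

labeled = label_fragments(fragments, turns)
# Keep only what "speaker 1" (e.g. the professor) said.
professor_text = [f["transcript"] for f in labeled if f["speaker"] == "speaker 1"]
print(professor_text)
```

Picking the turn with the largest overlap is just one simple choice; a fragment that spans two speakers would still get a single label.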

FTR there was also this:

Thank you so much. You are all really great people.
One more question:
In alignment, is there any flag or option so that, instead of fragments, I get JSON for each word?

You can implement that yourself, from the Metadata structure.
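
A rough sketch of what that could look like, assuming the Metadata structure exposes a sequence of per-character items, each with a `character` and a `start_time` in seconds (this matches my understanding of the 0.5.x bindings, but please check it against your version). The `Item` namedtuple below is a hypothetical stand-in so the example runs without a model; with the real bindings you would pass the items from the metadata object instead.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for one per-character metadata item.
Item = namedtuple("Item", ["character", "start_time"])


def words_from_items(items):
    """Group per-character items into words, splitting on spaces.

    Each word records the start time of its first character.
    """
    words = []
    chars, word_start = [], None
    for item in items:
        if item.character == " ":
            # A space closes the current word, if there is one.
            if chars:
                words.append({"word": "".join(chars), "start_time": word_start})
            chars, word_start = [], None
        else:
            if not chars:
                word_start = item.start_time
            chars.append(item.character)
    # Flush the last word (the transcript usually does not end with a space).
    if chars:
        words.append({"word": "".join(chars), "start_time": word_start})
    return words


# Made-up timings: one character every 20 ms.
items = [Item(c, round(0.02 * i, 2)) for i, c in enumerate("good shepherd")]
words = words_from_items(items)
print(json.dumps(words, indent=2))
```

This gives one JSON entry per word with its start time. An end time or duration per word would have to be derived the same way, for example from the start time of the following space.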