Transcription having lot of spelling errors and giving wrong spaces for words

raghavk92 · December 24, 2018, 1:53pm

Hi,
I was trying to transcribe two different audio samples.
One has a bit of background music. I extracted the audio from an Apple ad in which Jonathan Ive speaks in a really clear voice, but over background music. I converted it to 16000 samples per second, as required by DeepSpeech, and found a lot of spelling errors in the transcription.
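
A quick way to double-check the converted file before transcribing is to read its header. This is only a sketch using Python's standard wave module. The function names are made up for illustration, and the mono / 16-bit checks are an assumption on top of the 16000 samples per second mentioned above.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_deepspeech_ready(path, expected_rate=16000):
    """Check the header against the assumed input format.

    Assumption: besides the 16 kHz rate stated in the post, the audio
    should be mono (1 channel) with 16-bit (2-byte) samples.
    """
    rate, channels, width = describe_wav(path)
    return rate == expected_rate and channels == 1 and width == 2
```

If `is_deepspeech_ready("audio.wav")` returns False, the conversion step is the first thing to recheck before blaming the model.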

Mistakes like "evolution" being spelt "evil lution", even though it is an Apple Watch ad. How do I correct this? I tried the latest lm and trie files, but the transcription is still bad.

I’ll list what I used, but please tell me what I should use.

I used the output_graph from reuben’s release because the 0.3.0 models were giving very bad results: the transcription was just gibberish with nothing of value in it. This fix was provided in the GitHub issue: Language model incorrectly drops spaces for out-of-vocabulary words · Issue #1156 · mozilla/DeepSpeech · GitHub

output graph of reuben’s release:

The lm and trie I used are from DeepSpeech/data/lm at master · mozilla/DeepSpeech · GitHub

The alphabet.txt I used is from the 0.3.0 models release linked in the GitHub README. The alphabet.txt may be from this link, but I am not sure right now: DeepSpeech/data at master · mozilla/DeepSpeech · GitHub

This is the Apple ad I transcribed: https://www.youtube.com/watch?v=6EiI5_-7liQ

transcription is : e e e in i an an an enemple agh seres for is more than an evil lution erepresents a fundamental redesin anryengineering of apple watchretaining the riginal i comicg design veloped ury find the for olsimanaging to make it fine be new display is now oven birty percen larger and is seemlessly integrated into the product the interface as been read deigned fron you tiplay providing more information with rich a detail the heard wore hand the software combine to define a very new and truly intergrated singular design novigating with the digital crown olready one of the most intricat makhalisms wit ever created has been intirely igreengineeredwith hapti feeback dilivering a presise ecannical field as idrol in addition to an obtea hasanco the is a new applepizine ilectrical hars and se to the lousutitake in electra cardia graham or easy ge to share with your doctor a momnentesichievement for a were of a divice placing a finger on the tigital crownd i eeplose cerkid with a lectrods on the bank providing dater the easy g busesanaliz your harid whole understanding hea health is a sential to ou well bei aditional features in in harmsmans in courag es ti live and overall healther or tantive life the excela romiter girescove an alfliter allow you to recall youtypes of workelse measure runs withincreased presision and tra your all day activity with great accuracy in hart selilar connectiv ity in tabu something prulyliberating the obility distaklinected with just your wach fon case music streaming and even a mergency essistence ol immediately evolable from your restch eries for is a device so powerful so postnal so liperating i con change the way ou liveach day

The link for the other file is: https://www.youtube.com/watch?v=GnGI76__sSA

The transcription with the VAD transcriber is:
DEBUG:root:Processing chunk 00
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.720s for 5.880s audio file.
DEBUG:root:Transcript: stevies to um saye o me and heused to saye is a lut
DEBUG:root:Processing chunk 01
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.292s for 1.470s audio file.
DEBUG:root:Transcript: jonny
DEBUG:root:Processing chunk 02
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.337s for 1.620s audio file.
DEBUG:root:Transcript: is it that the idea
DEBUG:root:Processing chunk 03
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.282s for 1.530s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 04
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.772s for 3.750s audio file.
DEBUG:root:Transcript: and sometimes they wore
DEBUG:root:Processing chunk 05
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.639s for 3.180s audio file.
DEBUG:root:Transcript: really do pe
DEBUG:root:Processing chunk 06
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.918s for 4.410s audio file.
DEBUG:root:Transcript: sometimes they would tru to dreadful
DEBUG:root:Processing chunk 07
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.632s for 3.090s audio file.
DEBUG:root:Transcript: sometimes they of the air from the room
DEBUG:root:Processing chunk 08
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.638s for 3.000s audio file.
DEBUG:root:Transcript: an me liftis poth completely silent
DEBUG:root:Processing chunk 09
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.845s for 4.200s audio file.
DEBUG:root:Transcript: od crazy magninificen ideas
DEBUG:root:Processing chunk 10
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.403s for 2.010s audio file.
DEBUG:root:Transcript: whire simple ones
DEBUG:root:Processing chunk 11
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.371s for 1.890s audio file.
DEBUG:root:Transcript: hin this sufflety
DEBUG:root:Processing chunk 12
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.288s for 1.470s audio file.
DEBUG:root:Transcript: tee tal
DEBUG:root:Processing chunk 13
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.352s for 1.740s audio file.
DEBUG:root:Transcript: eatto e profound
DEBUG:root:Processing chunk 14
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.366s for 1.860s audio file.
DEBUG:root:Transcript: just i speve
DEBUG:root:Processing chunk 15
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.382s for 1.950s audio file.
DEBUG:root:Transcript: loved ydeas
DEBUG:root:Processing chunk 16
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.160s audio file.
DEBUG:root:Transcript: an loved maan stuff
DEBUG:root:Processing chunk 17
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.513s for 2.550s audio file.
DEBUG:root:Transcript: he treated the process
DEBUG:root:Processing chunk 18
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.094s for 5.370s audio file.
DEBUG:root:Transcript: treativeity with the rare and a wonderful reverence
DEBUG:root:Processing chunk 19
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.871s for 4.260s audio file.
DEBUG:root:Transcript: is the i think he better than any one understood
DEBUG:root:Processing chunk 20
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.017s for 5.010s audio file.
DEBUG:root:Transcript: wile ideas oltemately can be so powerful
DEBUG:root:Processing chunk 21
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.598s for 2.970s audio file.
DEBUG:root:Transcript: egin as fratile
DEBUG:root:Processing chunk 22
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.383s for 1.920s audio file.
DEBUG:root:Transcript: e fomd thoughts
DEBUG:root:Processing chunk 23
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.123s for 5.490s audio file.
DEBUG:root:Transcript: so esily mistd so easily compromise so isily josquift
DEBUG:root:Processing chunk 24
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.909s for 4.230s audio file.
DEBUG:root:Transcript: on love the way that he listened so intendly
DEBUG:root:Processing chunk 25
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.432s for 2.190s audio file.
DEBUG:root:Transcript: loved his perseption
DEBUG:root:Processing chunk 26
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.582s for 2.910s audio file.
DEBUG:root:Transcript: is remarkable sensitive ity
DEBUG:root:Processing chunk 27
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.700s audio file.
DEBUG:root:Transcript: nd his surgecly preciseieinion
DEBUG:root:Processing chunk 28
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.350s for 1.920s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 29
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.551s for 2.700s audio file.
DEBUG:root:Transcript: i really believe there was a beuty
DEBUG:root:Processing chunk 30
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.869s for 4.410s audio file.
DEBUG:root:Transcript: e sehela how meen his insih was
DEBUG:root:Processing chunk 31
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.456s for 2.280s audio file.
DEBUG:root:Transcript: sometimes et could spey
DEBUG:root:Processing chunk 32
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.585s for 3.030s audio file.
DEBUG:root:Transcript: as um suremany you know
DEBUG:root:Processing chunk 33
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.022s for 4.920s audio file.
DEBUG:root:Transcript: steve didn’t comfined his sensif excellent to make him products
DEBUG:root:Processing chunk 34
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.544s for 2.610s audio file.
DEBUG:root:Transcript: you a wo we travel together
DEBUG:root:Processing chunk 35
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.356s for 1.770s audio file.
DEBUG:root:Transcript: wold check hin
DEBUG:root:Processing chunk 36
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.387s for 1.920s audio file.
DEBUG:root:Transcript: t gop to my room
DEBUG:root:Processing chunk 37
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.868s for 4.260s audio file.
DEBUG:root:Transcript: nat leave my bags thery needly but te door
DEBUG:root:Processing chunk 38
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.239s for 6.390s audio file.
DEBUG:root:Transcript: with numat
DEBUG:root:Processing chunk 39
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.814s for 4.080s audio file.
DEBUG:root:Transcript: gon si on the bed
DEBUG:root:Processing chunk 40
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.061s for 5.220s audio file.
DEBUG:root:Transcript: on si on the bed next to the fhun
DEBUG:root:Processing chunk 41
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.283s for 1.470s audio file.
DEBUG:root:Transcript: wat
DEBUG:root:Processing chunk 42
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.434s for 2.130s audio file.
DEBUG:root:Transcript: n evetible fone cal
DEBUG:root:Processing chunk 43
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.631s for 12.990s audio file.
DEBUG:root:Transcript: ony this hoodself soctless go
DEBUG:root:Processing chunk 44
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.308s for 1.560s audio file.
DEBUG:root:Transcript: used to joe
DEBUG:root:Processing chunk 45
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.631s for 3.150s audio file.
DEBUG:root:Transcript: lunitics a takean over the assinem
DEBUG:root:Processing chunk 46
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.576s for 2.760s audio file.
DEBUG:root:Transcript: swe shard gedioxsignment
DEBUG:root:Processing chunk 47
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.090s for 5.070s audio file.
DEBUG:root:Transcript: spending months and months working on a part of a product
DEBUG:root:Processing chunk 48
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.493s for 2.310s audio file.
DEBUG:root:Transcript: nobody with ever see
DEBUG:root:Processing chunk 49
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: owith the rese
DEBUG:root:Processing chunk 50
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.872s for 4.020s audio file.
DEBUG:root:Transcript: did it because we because we really believed that it was right
DEBUG:root:Processing chunk 51
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.276s for 1.410s audio file.
DEBUG:root:Transcript: cause we cared
DEBUG:root:Processing chunk 52
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.542s for 2.520s audio file.
DEBUG:root:Transcript: elieved that there was a grammidty
DEBUG:root:Processing chunk 53
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.751s for 3.570s audio file.
DEBUG:root:Transcript: umast ascensive civic responsibility
DEBUG:root:Processing chunk 54
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.452s for 2.280s audio file.
DEBUG:root:Transcript: so care wavbyyongs
DEBUG:root:Processing chunk 55
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.619s for 2.940s audio file.
DEBUG:root:Transcript: and e sot of functional imperative
DEBUG:root:Processing chunk 56
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.108s for 0.630s audio file.
DEBUG:root:Transcript:
DEBUG:root:Processing chunk 57
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.340s for 1.800s audio file.
DEBUG:root:Transcript: wok
DEBUG:root:Processing chunk 58
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.488s for 2.340s audio file.
DEBUG:root:Transcript: hoopfully appeared in evi table
DEBUG:root:Processing chunk 59
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.309s for 1.560s audio file.
DEBUG:root:Transcript: hid simple
DEBUG:root:Processing chunk 60
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.225s for 1.140s audio file.
DEBUG:root:Transcript: teasy
DEBUG:root:Processing chunk 61
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.301s for 1.500s audio file.
DEBUG:root:Transcript: really cost
DEBUG:root:Processing chunk 62
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.323s for 1.650s audio file.
DEBUG:root:Transcript: cost te soledin i
DEBUG:root:Processing chunk 63
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.460s for 2.190s audio file.
DEBUG:root:Transcript: you know i cost him most
DEBUG:root:Processing chunk 64
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.312s for 1.500s audio file.
DEBUG:root:Transcript: cared the most
DEBUG:root:Processing chunk 65
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.956s for 4.620s audio file.
DEBUG:root:Transcript: he wo in the most deeply he constantly questioned
DEBUG:root:Processing chunk 66
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.290s for 1.380s audio file.
DEBUG:root:Transcript: this good enough
DEBUG:root:Processing chunk 67
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.245s for 1.230s audio file.
DEBUG:root:Transcript: this right
DEBUG:root:Processing chunk 68
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.530s for 2.610s audio file.
DEBUG:root:Transcript: dispite all his successis
DEBUG:root:Processing chunk 69
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.404s for 2.040s audio file.
DEBUG:root:Transcript: his achievements
DEBUG:root:Processing chunk 70
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.089s for 5.220s audio file.
DEBUG:root:Transcript: never presued he never assumed thet we would get there in the end
DEBUG:root:Processing chunk 71
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.397s for 2.010s audio file.
DEBUG:root:Transcript: nideas didn’t come
DEBUG:root:Processing chunk 72
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.529s for 2.640s audio file.
DEBUG:root:Transcript: the proace it types faled
DEBUG:root:Processing chunk 73
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.778s for 3.840s audio file.
DEBUG:root:Transcript: it was with great intent with faith
DEBUG:root:Processing chunk 74
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.477s for 2.400s audio file.
DEBUG:root:Transcript: he decided to believe
DEBUG:root:Processing chunk 75
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.298s for 1.530s audio file.
DEBUG:root:Transcript: then shally
DEBUG:root:Processing chunk 76
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.317s for 1.530s audio file.
DEBUG:root:Transcript: a something greaght
DEBUG:root:Processing chunk 77
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.539s for 2.730s audio file.
DEBUG:root:Transcript: joy of getting man
DEBUG:root:Processing chunk 78
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.640s audio file.
DEBUG:root:Transcript: i loved is infhusiasm
DEBUG:root:Processing chunk 79
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.484s for 2.430s audio file.
DEBUG:root:Transcript: simple thelight
DEBUG:root:Processing chunk 80
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.474s for 2.370s audio file.
DEBUG:root:Transcript: ma i mixed with serilief
DEBUG:root:Processing chunk 81
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.423s for 2.130s audio file.
DEBUG:root:Transcript: the year we got there
DEBUG:root:Processing chunk 82
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: we got there in the end
DEBUG:root:Processing chunk 83
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.233s for 1.140s audio file.
DEBUG:root:Transcript: ahe was good
DEBUG:root:Processing chunk 84
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.448s for 2.250s audio file.
DEBUG:root:Transcript: conceise smile conye
DEBUG:root:Processing chunk 85
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.010s for 4.710s audio file.
DEBUG:root:Transcript: selebration of making something grat for everybody
DEBUG:root:Processing chunk 86
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.662s for 3.270s audio file.
DEBUG:root:Transcript: enjoying the defeat of sinisism
DEBUG:root:Processing chunk 87
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.439s for 6.600s audio file.
DEBUG:root:Transcript: rjection of reason the rejection of being told a hundred times in condo that
DEBUG:root:Processing chunk 88
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.733s for 3.570s audio file.
DEBUG:root:Transcript: so hes i think was in victory for beauty
DEBUG:root:Processing chunk 89
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.307s for 1.560s audio file.
DEBUG:root:Transcript: pperity
DEBUG:root:Processing chunk 90
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.605s for 2.970s audio file.
DEBUG:root:Transcript: he would say for givein at dham
DEBUG:root:Processing chunk 91
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.840s for 4.140s audio file.
DEBUG:root:Transcript: he was my closeess and we must loa friend
DEBUG:root:Processing chunk 92
DEBUG:root:Running inference…
DEBUG:root:Inference took 2.090s for 9.300s audio file.
DEBUG:root:Transcript: together fornerly fitteen years and he still laughed to the way i sad ali minum
DEBUG:root:Processing chunk 93
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.487s for 2.340s audio file.
DEBUG:root:Transcript: past tothe weeks
DEBUG:root:Processing chunk 94
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.968s for 4.410s audio file.
DEBUG:root:Transcript: wh we ill bing struggling to find ways to save tood by
DEBUG:root:Processing chunk 95
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.342s for 1.620s audio file.
DEBUG:root:Transcript: t smooning
DEBUG:root:Processing chunk 96
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.380s for 1.920s audio file.
DEBUG:root:Transcript: smply once who weren
DEBUG:root:Processing chunk 97
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.372s for 1.860s audio file.
DEBUG:root:Transcript: ank you staye
DEBUG:root:Processing chunk 98
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.628s for 3.000s audio file.
DEBUG:root:Transcript: f youl remarkable vision
DEBUG:root:Processing chunk 99
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.332s for 1.620s audio file.
DEBUG:root:Transcript: ichis inited
DEBUG:root:Processing chunk 100
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.319s for 1.590s audio file.
DEBUG:root:Transcript: nspired
DEBUG:root:Processing chunk 101
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.526s for 2.550s audio file.
DEBUG:root:Transcript: this extraordinary groups of people
DEBUG:root:Processing chunk 102
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.525s for 2.580s audio file.
DEBUG:root:Transcript: for the oll the weav hof men from you
DEBUG:root:Processing chunk 103
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.781s for 3.660s audio file.
DEBUG:root:Transcript: nfor all thet we will continue to learn from each other
DEBUG:root:Processing chunk 104
DEBUG:root:Running inference…
DEBUG:root:Inference took 0.200s for 1.050s audio file.
DEBUG:root:Transcript: st
DEBUG:root:Processing chunk 105
DEBUG:root:Running inference…
DEBUG:root:Inference took 1.926s for 9.900s audio file.
DEBUG:root:Transcript: ee

So how should I improve this transcription? Should I use different models, and if so, where do I get them? How can I improve this without training? I don’t have annotated samples.

And if it does need training, what is the minimum amount of training required, and how do I train in the most minimal way that still gives a good transcription? How many samples, at minimum, would I need to annotate and train on?

Thanks in advance
Raghav

dan0 · January 4, 2019, 9:14am

VAD tends to make mincemeat of audio; we were seeing similar results after adding WebRTCVAD to our DeepSpeech frontend. Honestly, your chunking looks quite aggressive. I would try splitting the audio every 50 seconds or so and see how those samples perform. Larger audio files mean more RAM usage, but generally better accuracy.
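
Fixed-length splitting like this could be sketched with Python's standard wave module. The function name and the output naming scheme are made up for illustration, and the 50 second default just mirrors the suggestion above.

```python
import wave


def split_wav(path, out_prefix, chunk_seconds=50):
    """Split a WAV file into consecutive chunks of at most chunk_seconds.

    Returns the list of chunk paths, named <out_prefix>_000.wav and so on.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        frames_per_chunk = int(rate * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy the source format so each chunk stays 16 kHz etc.
                dst.setnchannels(channels)
                dst.setsampwidth(width)
                dst.setframerate(rate)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Note that this cuts at fixed times, so a cut can land in the middle of a word. It is meant for a quick comparison against the VAD chunks, not as a replacement for proper segmentation.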

raghavk92 · January 15, 2019, 8:33am

@dan0 the models for 0.4.0 have resolved most of the transcription errors. Thanks for the response, but I have a few issues with transfer learning.

I have 3 questions regarding a few problems I am facing:

We tried transfer learning on the model with a few of our own samples: 4 large audio files (technical talks) cut by voice activity detection into 740 chunks of around 5 seconds each, with the transcriptions listed in a CSV file. We used 500 chunks for training, 100 for dev and 140 for test.

Some transcriptions got better and some got worse.
So how many files are needed for good transfer learning to happen?
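
For reference, a 500/100/140 split like the one described above could be produced roughly as follows. This is a sketch, not the script that was actually used: the function names are hypothetical, and the column names assume the usual DeepSpeech CSV layout of wav_filename, wav_filesize and transcript.

```python
import csv
import random


def split_rows(rows, n_train, n_dev, seed=0):
    """Shuffle rows reproducibly and cut them into train, dev and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # everything left over
    return train, dev, test


def write_csv(path, rows,
              fieldnames=("wav_filename", "wav_filesize", "transcript")):
    """Write rows (dicts keyed by fieldnames) to a CSV file with a header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

With 740 rows, `split_rows(rows, 500, 100)` leaves 140 rows for test. A random split like this puts chunks from the same talk into all three sets, so the test numbers may look better than they would on a new speaker.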

While transcribing with 0.4.0 I found that when the person speaks fast the transcription goes wrong: either two words merge into one wrong word, or a word splits into separate wrong words. How do I improve this? Will this also happen after transfer learning when people speak fast, and how many samples are ideal?
I tried transcribing files with background music and got around 75% accuracy. I then removed the noise with Audacity.
Procedure:

remove voice from audio
get noise profile
remove noise from the original sample with the noise profile

The accuracy was 85% after this.
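
The post does not say how these accuracy figures were measured. One common way to put a number on transcription quality is the word error rate (word-level edit distance divided by the number of reference words), with accuracy loosely read as one minus that. A minimal sketch, with hypothetical function names:

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n_ref
```

For example, `word_error_rate("more than an evolution", "more than an evil lution")` gives 0.5: one substitution plus one inserted word against four reference words. It also shows why wrongly placed spaces are expensive, since a single split word counts as two errors.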

But I tried to automate this with the sox package for Ubuntu.
Procedure:

Remove voice:
sox audio.wav music.wav oops
Create noise profile:
sox music.wav -n noiseprof noise.prof
Remove noise from the wav using the profile:
sox audio.wav output.wav noisered noise.prof 0.21

(I also tried different levels of aggressiveness, like 0.3, 0.05, 0.1 etc., but there was not much change in the transcription.)
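
To compare several aggressiveness levels without retyping the commands, the three sox steps above could be wrapped in a small script. This is only a sketch: the helper names are made up, the file names are the ones from the commands above, and it assumes sox is installed and on the PATH.

```python
import subprocess


def sox_noise_reduction_commands(audio, output, amount=0.21,
                                 music="music.wav", profile="noise.prof"):
    """Build the three sox commands from the procedure as argument lists."""
    return [
        # 1. Remove the voice. oops works on the difference between the
        #    left and right channels, so it assumes a stereo input.
        ["sox", audio, music, "oops"],
        # 2. Create a noise profile from the music-only file.
        ["sox", music, "-n", "noiseprof", profile],
        # 3. Remove the noise from the original file using the profile.
        ["sox", audio, output, "noisered", profile, str(amount)],
    ]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A loop such as `for amount in (0.05, 0.1, 0.21, 0.3): run_commands(sox_noise_reduction_commands("audio.wav", f"output_{amount}.wav", amount))` would then write one output file per level for transcription.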

The transcription became bad. I think sox damaged the voice audio while reducing the noise. Do you know a better way to do noise reduction and get a better transcription? And if I need to better transcribe a file that has background music, is there any other way (for example, would training help, and how many samples would I need)?

Thanks

lissyx · January 15, 2019, 10:41am

@josh_meyer can likely help on that

This can only be solved by a better dataset, including improving Common Voice.

raghavk92 · January 17, 2019, 8:16am

Hi,
Thanks for the information. @josh_meyer could you please tell me how many samples are needed for good transfer learning to happen? We had around 700 samples (around 5 seconds each) split into (500 train, 150 test and 150 test samples). We got better results for some audio files, but for others the results were worse than with the original model.

How much silence can there be between two words, or does it need to be removed? What is the optimal length of an audio sample, and is there a recommended procedure for cutting audio? (We got our audio by cutting technical conference recordings with voice activity detection.)

And @lissyx, I tried to remove the background noise as I mentioned in my post: Transcription having lot of spelling errors and giving wrong spaces for words

Do you know if I made some mistake in removing the background noise (such as the aggressiveness being too high, or anything else) that caused the voice to become damaged? I tried the same procedure with Audacity and got a good transcription result, but there I didn’t set the noise reduction aggressiveness (the 0.21 that I used with sox).

Am I doing something wrong?

And @josh_meyer, if I train with files that have background noise, how many files would I need for transfer learning to get a good transcription of files with background noise?

Please tell me.
Thanks

lissyx · January 17, 2019, 8:17am

I have no idea, I can’t help.
