Public Safety radio traffic as a new language

I have an ever-growing collection of public safety radio transmissions. I am working on labeling them, and once I have enough labeled I intend to use DeepSpeech to build a model to understand them. Does anyone else have an interest in doing this?
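For anyone picking this up: DeepSpeech's training script reads CSV files with the columns `wav_filename`, `wav_filesize`, and `transcript`. A minimal sketch of exporting labeled clips into that layout might look like the following. `LabeledClip` and `write_deepspeech_csv` are hypothetical names, and the sketch assumes each clip has already been cut into its own WAV file at the sample rate the model is trained on.

```python
import csv
import os
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LabeledClip:
    # Hypothetical record: one WAV clip plus the text a labeler typed for it.
    wav_path: str
    transcript: str


def write_deepspeech_csv(clips: Iterable[LabeledClip], out_path: str) -> int:
    """Write clips to a CSV in DeepSpeech's wav_filename,wav_filesize,transcript layout.

    Returns the number of data rows written (header excluded).
    """
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for clip in clips:
            # wav_filesize is just the clip's size on disk in bytes.
            writer.writerow([clip.wav_path, os.path.getsize(clip.wav_path), clip.transcript])
            rows += 1
    return rows
```

Splitting the output into separate train/dev/test CSVs is left out here; the same function can simply be called once per split.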

I'm interested in this too. How would you label them? I can help if you like.

I am building a tool that plays a transmission and allows the user to type in what they heard. It has a text box per talker (based on radio ID). The user plays the audio, can seek back 10 seconds to re-hear a part, and then hits Next to submit the transcription and hear another.

I hope to have it online in the next week or so.

I currently have a couple of counties around St. Louis covered and am hoping to work with the OpenMHz guy to get traffic from other areas.

That’s awesome, I would love to help in any way. Will the tool be online to use? How do you plan on deploying it?

I know this is ancient! I got some help from a friend, and we now have an interface that presents each transmission. The audio is captured from a digital system, so we have metadata indicating which radio ID is speaking at each timestamp. This lets us break the transmission into pieces so the user can hear the whole thing or play each speaker’s piece separately. The user types in what they heard for each part and submits it to get another transmission, which lets labelers work through the radio traffic relatively quickly. A backend system receives and decodes the radio traffic, provides the metadata and audio to the UI, and accepts the labelling and adds it to the metadata.
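The per-speaker split described above could be sketched roughly like this. It assumes (hypothetically, since the thread doesn't describe the backend's actual metadata format) that each transmission comes with a list of `(offset_seconds, radio_id)` entries marking when each radio keyed up, plus the recording's total duration, and that the audio is an uncompressed WAV file. `Segment`, `split_by_talker`, and `write_clip` are illustrative names, not part of any existing tool.

```python
import wave
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    radio_id: int
    start: float  # seconds from the start of the transmission
    end: float


def split_by_talker(changes: List[Tuple[float, int]], duration: float) -> List[Segment]:
    """Turn (offset_seconds, radio_id) key-up points into per-talker segments.

    Each entry runs until the next entry (or the end of the recording).
    Back-to-back entries from the same radio ID are merged into one segment.
    """
    ordered = sorted(changes)
    segments: List[Segment] = []
    for i, (start, radio_id) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else duration
        if end <= start:
            continue  # skip zero-length entries
        if segments and segments[-1].radio_id == radio_id:
            segments[-1].end = end  # same radio keyed up again: extend its segment
        else:
            segments.append(Segment(radio_id, start, end))
    return segments


def write_clip(in_path: str, out_path: str, seg: Segment) -> None:
    """Copy the frames covered by one segment into its own WAV file."""
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        src.setpos(int(seg.start * rate))
        frames = src.readframes(int((seg.end - seg.start) * rate))
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)  # the wave module fixes up the header's frame count after writing
        dst.writeframes(frames)
```

Each resulting segment would correspond to one text box in the UI, and whatever the labeler types into that box becomes the transcript for that clip.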

An example would be “Dispatch 235, 235 go ahead, I’m clear this subject, citation given, clear at 19:35”. This represents a police officer talking to dispatch and would have 3 boxes to fill in.
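One wrinkle this example shows is numbers. DeepSpeech's stock English alphabet is lowercase letters, space, and apostrophe, so text like "235" or "19:35" has to be written out the way it was actually spoken; whether "235" was said as "two three five" or "two thirty-five" can't be recovered from the digits. A minimal validation sketch, assuming that alphabet (`normalize_transcript` is a hypothetical helper):

```python
import re

# Assumed target alphabet: DeepSpeech's stock English alphabet (a-z, space, apostrophe).
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def normalize_transcript(text: str) -> str:
    """Lowercase a label, drop unspoken punctuation, and reject anything else.

    Digits are deliberately not converted: "235" might have been said as
    "two three five" or "two thirty five", so the labeler must spell numbers
    out the way they were spoken.
    """
    text = text.replace("\u2019", "'").lower()  # curly apostrophe -> ASCII
    text = re.sub(r"[-/]", " ", text)           # "thirty-five" -> "thirty five"
    text = re.sub(r'[.,!?;:"]', "", text)       # punctuation has no sound
    text = re.sub(r"\s+", " ", text).strip()
    bad = sorted(set(text) - ALLOWED)
    if bad:
        raise ValueError(f"characters outside the alphabet: {bad}")
    return text
```

With this, `normalize_transcript("I’m clear this subject, citation given")` returns `"i'm clear this subject citation given"`, while `normalize_transcript("clear at 19:35")` raises `ValueError`, which the UI could use to prompt the labeler to write the time out in words.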

I run several capture boxes recording radio traffic in a few metro areas, and I run the UI and backend locally in a Docker container. I will be working to clean it up a little and then get it hosted publicly. I think I can get help from the RadioReference and other scanner communities to label traffic if I explain what it’s for. Eventually the labelled dataset could be used to accurately transcribe radio traffic, perform sentiment analysis, and auto-generate events based on what is happening.


Apologies, I know this is an old thread. However, I am working on the exact same thing right now, but my dataset is a little small. Would you be willing to share your dataset by chance?