Downloading the Common Voice dataset with a url command

I have a problem.
My shell script needs to download the Common Voice dataset with a url command.

I tried url and wget, e.g. for English:

url https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ3GQRTO3JGWOOX7U%2F20210317%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20210317T134200Z&X-Amz-Expires=43200&X-Amz-Security-Token=FwoGZXIvYXdzEAYaDO5GZMqYTXeYcmNFPSKSBPtettqfZHflnu3E4mEKcmWYcQUEPCoOTbd42wN6Ln7BKdb3nbeFshI%2F2f8ZWjz3V%2BM8MTgHdCU5SeqMv49Cwa13xC%2BXCu2AvAoaAVdq4Y9RT%2FlnGnovkwm%2FGPjBO0hHPSwnEZzONSW8qSUcHFHWn5g327W8Ny7W5H970vHtki7%2Fijf6uSsauNBTC23pGIcq75AhnXPbMrP1u6jx8CcVxRjoRylnbvN6JsAWUCcBRzdw4pXxjWmOMN8d2R3lM7blaq4i%2BgV02k7TfUSAFR5Pn%2FxUJPAZ1h9EVi%2BBKh1mHy8OIv2GsMEBlIKAdIFJOY1KCEjibXdZpaU07G%2FDZtSi8av88vrBXAjeWj1Kam1Z1E35x4W9fnanqUgt5NBK5Hgbz2yRlih5celwJwkJu47p427%2FQxPrVOFAJWSCT2QqlOXy2XvYjFudapuMDEcTLyX%2BSvU%2BMgMg1rWPca9xyVYlj%2BaC6uslw8MvTTajMQOer1ue4yXpuP8gh5GrFZ%2FqwWQ6PprM2C6ZOU%2FaL2XJdn9xJa7kztWg2SRczuZ4AfsIo5avQB20zKegQUn1Iz%2BisvhuEkZhotA5A%2F719UUChCR8FlTQTtbuOaHlB2j2I0jYXoZ%2FD3tMxgMWcY%2BRdyHt0LS6D6qqYkpZJlv4EREAsaXKHE%2FB0ksroZLsXreKoFZH345MwZODX2K%2Bsct8DDaoAjTnScBfKLL2x4IGMir32AKBrdeFplizApfGnAgv84NTOjvxNkpC%2F768K4XgJejjhchFi%2FFWDOg%3D&X-Amz-Signature=9b62ade68adb2ee0ce85a05350b7138455f259a22cb453cc2637341cc36c2d91&X-Amz-SignedHeaders=host

Apparently neither worked with the link that is given for browser download after submitting an email address.

I also tried only this part:

url https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz

Can you guide me on how to resolve this issue?

If you don’t share precisely what “didn’t work”, people cannot help you.

Instead of manually downloading the data via the web, I want to download it from the terminal with a shell script on Linux: put the url command in the script, so that one script downloads the data, uncompresses it, and prepares it.

In simple words: how can I download the Common Voice dataset from the terminal using the **url** command?

You didn’t answer my question; you need to share the error you get from your wget URL call.

OK. I tried two ways.

Below are both commands and their errors.
corpus_url=https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ3GQRTO3NNKWBM5N%2F20210317%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20210317T143800Z&X-Amz-Expires=43200&X-Amz-Security-Token=FwoGZXIvYXdzEAgaDAo76o9mztZTJyK7wiKSBHzTBmQpuVL12exr3vmp9V3D3DylreCAQWzgRSZI3HOOnkZUiWBrkCFMt%2BQH5u5jYG0p4IFvINJRdONvaE2SEZZkn5NLjJugKI6qa1SV%2FgJGdcXChe8cmB%2FCSqd4tyAJnHR%2F7kYjdxe90SyuWanl%2BBYWslLlcKRnB0QdX2HqPAsKpjDFtZuHBNpKYtYRrAbmvDtx%2FJ%2BKEfdeZAiA4Y9eQkoFlkFqTi7uB00X6lwXYFLrojDZLftZwZGEGEFR9aVqY5S8J4LJ4%2Bfe0uRxn2azfgv2kHmJ4MTV3OYNcs%2FXbQWktb9ffItp0FEC%2Fxx1JApv6OZl3Y46epnswXIdXnM%2BF37yadgWKBAcvkjWFlpqJ%2BX4cvoisPXB6MhGOCgKgA7aJtCINTfZSywkyq2e96lsYBFbSPhbSXrEw39thThECILMJMLH91nios9uBjFfjQOLCPzLB00GvYy3iiSVy%2FxCtJXy5p4VXgrQPgbai%2BD6g34TErVn1EdGwyYRT14tXkDPkPRaAZnflB3g3Y2EpdeXeynYztDQ%2FnWTM9sCZhg6hV5OEMHqijRSCzhLbVeB0AhbUqC6gb2RJ5Ohor75%2Fbc1ejbcL0QTAkUbjQLv8dJEH1cxp%2FDzuQ2xmfEgR7f%2BtqHEl3C5c48lzXgKnItlxV3HhL8Tg8sh5gITy%2BY4wXh%2FDmqRWXDfPGL9N%2BZnFAvV5UtOwvGHKOihyIIGMio5TGIujH4cSgwiAhVAu5YjHR84flC7v4rVM9TIbI1HF9GZ3qKFNfXYbGc%3D&X-Amz-Signature=b9334af7eda27b09267acb58d3060ba380f92a621ce9e05378840fd20141d50c&X-Amz-SignedHeaders=host

and the error is

./run.sh: line 12: X-Amz-Credential=ASIAQ3GQRTO3NNKWBM5N%2F20210317%2Fus-west-2%2Fs3%2Faws4_request: command not found
./run.sh: line 12: X-Amz-Security-Token=FwoGZXIvYXdzEAgaDAo76o9mztZTJyK7wiKSBHzTBmQpuVL12exr3vmp9V3D3DylreCAQWzgRSZI3HOOnkZUiWBrkCFMt%2BQH5u5jYG0p4IFvINJRdONvaE2SEZZkn5NLjJugKI6qa1SV%2FgJGdcXChe8cmB%2FCSqd4tyAJnHR%2F7kYjdxe90SyuWanl%2BBYWslLlcKRnB0QdX2HqPAsKpjDFtZuHBNpKYtYRrAbmvDtx%2FJ%2BKEfdeZAiA4Y9eQkoFlkFqTi7uB00X6lwXYFLrojDZLftZwZGEGEFR9aVqY5S8J4LJ4%2Bfe0uRxn2azfgv2kHmJ4MTV3OYNcs%2FXbQWktb9ffItp0FEC%2Fxx1JApv6OZl3Y46epnswXIdXnM%2BF37yadgWKBAcvkjWFlpqJ%2BX4cvoisPXB6MhGOCgKgA7aJtCINTfZSywkyq2e96lsYBFbSPhbSXrEw39thThECILMJMLH91nios9uBjFfjQOLCPzLB00GvYy3iiSVy%2FxCtJXy5p4VXgrQPgbai%2BD6g34TErVn1EdGwyYRT14tXkDPkPRaAZnflB3g3Y2EpdeXeynYztDQ%2FnWTM9sCZhg6hV5OEMHqijRSCzhLbVeB0AhbUqC6gb2RJ5Ohor75%2Fbc1ejbcL0QTAkUbjQLv8dJEH1cxp%2FDzuQ2xmfEgR7f%2BtqHEl3C5c48lzXgKnItlxV3HhL8Tg8sh5gITy%2BY4wXh%2FDmqRWXDfPGL9N%2BZnFAvV5UtOwvGHKOihyIIGMio5TGIujH4cSgwiAhVAu5YjHR84flC7v4rVM9TIbI1HF9GZ3qKFNfXYbGc%3D: command not found
./run.sh: line 12: X-Amz-Date=20210317T143800Z: command not found
./run.sh: line 12: X-Amz-Signature=b9334af7eda27b09267acb58d3060ba380f92a621ce9e05378840fd20141d50c: command not found
./run.sh: line 12: X-Amz-SignedHeaders=host: command not found
./run.sh: line 12: X-Amz-Expires=43200: command not found
./run.sh: line 31: corpus_url: unbound variable

The other way:

corpus_url=https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz

Only this part of the link, and the error is:

./run.sh
local/download_and_untar.sh: downloading data from https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz.  This may take some time, please be patient.
--2021-03-17 15:45:19--  https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz
Resolving mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)... 52.218.242.88, 2600:1fa0:402c:9890:34da:8801::
Connecting to mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)|52.218.242.88|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-03-17 15:45:21 ERROR 403: Forbidden.

local/download_and_untar.sh: error executing wget https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/en.tar.gz

This is not really a Common Voice issue; you are just using your system incorrectly. Place the URL between quotes:

$ wget "URL"
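To spell out why the quotes matter: in the shell, `&` ends a command and sends it to the background. An unquoted assignment therefore stops at the first `&`, runs in a background subshell (so `corpus_url` is never set in the script itself), and every remaining `X-Amz-...=...` chunk is executed as a command of its own. That matches the "command not found" and "unbound variable" messages from `run.sh` above. A minimal sketch, assuming a shortened placeholder link on `example.com` rather than a real download link:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical, shortened placeholder; a real presigned link carries more parameters.
# Single quotes keep '?' and '&' literal, so the whole link stays one value.
corpus_url='https://example.com/en.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Expires=43200'

# Quote the expansion too, so it is passed on as a single argument.
printf '%s\n' "$corpus_url"

# The download itself would then look like this (commented out, because the
# placeholder does not point at a real file):
# wget -O en.tar.gz "$corpus_url"
```

Single quotes are the safer choice for a pasted link, since nothing inside them is interpreted by the shell.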

I have tried that too:

$  wget "https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/nl.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ3GQRTO3OBNKRXGU%2F20210317%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20210317T152026Z&X-Amz-Expires=43200&X-Amz-Security-Token=FwoGZXIvYXdzEAgaDG8wFq2Uc4P5Rb3moCKSBMhB1GH%2BnmRKFS3AAuEiieqX3yq5YR47vurx0qnDLaBwg4WYTPFVBoQR4zOTXX5Ngj1rFI0B57%2Fmuqhyxf%2FkI4zBEgsSXl%2FStBsuoiEnBkkwLFbPIgOuFqTa0X1hx9IhaaRPhb5ZzBX3r4RSHHcXdU4BX4LQ8YQpIitCAjb0QYjZ6uEta%2BWuOq4s1K3YmJNJP0u1YHireMqwJrkNb75GuHmQ6oVu3GWBTIoi6uHe4kj1QQ41Ri9TWj%2Bx56nTBpzfQ8UEGBdU3CrNBNajHklFGAaVX97SaHwNF6DjadCLpPFXFtcHA6q%2BjcmuRUZK9CHqX5L2Kqv878Ivj2iCgJ%2FtZ%2FB3eAuhijp%2BGwVtBsJieat%2BLMfEMx%2F6512wR4hLq%2Bm9Ajx%2FpdLGhIxBzwEkPeGoXDVN9JX7wGldftL0AJk2sQ0XduBQeiMsAxXYOvNjqVvlYFKM7BqvRGwdrQo%2FXuLMFSVbG80loss7WlGhw870BcoZ1FfHM8LMHVhzqNJkRsYFuXN88ostRUGCP69asPZ4EJQbfnC6VyE0n8%2BeURaAG5t1GTO5DLNVKFLJ446mTPusogoqYct%2BYostI%2F6KrSoSvAO4ktcAJMcXZg0iS4NDF%2FwoQH2vzBswRudHH4gHWde7koV2bshnbynxU328jaUiJuwefpYgTcbEmGTBpXNmzavzO35Zc%2FMvMao2ABy0spd7EBduKKqhyIIGMirxN2ro00lCSOa74RppxTFDH6cQgas4Di%2Bgczgxh4qyN0NzqKpEQRwi9Zs%3D&X-Amz-Signature=9699d69e86e659a887e257fbcffcabd4bc7419b6112e78f686e6fb029646351f&X-Amz-SignedHeaders=host"
    --2021-03-17 16:23:48--  https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/nl.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAQ3GQRTO3OBNKRXGU%2F20210317%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20210317T152026Z&X-Amz-Expires=43200&X-Amz-Security-Token=FwoGZXIvYXdzEAgaDG8wFq2Uc4P5Rb3moCKSBMhB1GH%2BnmRKFS3AAuEiieqX3yq5YR47vurx0qnDLaBwg4WYTPFVBoQR4zOTXX5Ngj1rFI0B57%2Fmuqhyxf%2FkI4zBEgsSXl%2FStBsuoiEnBkkwLFbPIgOuFqTa0X1hx9IhaaRPhb5ZzBX3r4RSHHcXdU4BX4LQ8YQpIitCAjb0QYjZ6uEta%2BWuOq4s1K3YmJNJP0u1YHireMqwJrkNb75GuHmQ6oVu3GWBTIoi6uHe4kj1QQ41Ri9TWj%2Bx56nTBpzfQ8UEGBdU3CrNBNajHklFGAaVX97SaHwNF6DjadCLpPFXFtcHA6q%2BjcmuRUZK9CHqX5L2Kqv878Ivj2iCgJ%2FtZ%2FB3eAuhijp%2BGwVtBsJieat%2BLMfEMx%2F6512wR4hLq%2Bm9Ajx%2FpdLGhIxBzwEkPeGoXDVN9JX7wGldftL0AJk2sQ0XduBQeiMsAxXYOvNjqVvlYFKM7BqvRGwdrQo%2FXuLMFSVbG80loss7WlGhw870BcoZ1FfHM8LMHVhzqNJkRsYFuXN88ostRUGCP69asPZ4EJQbfnC6VyE0n8%2BeURaAG5t1GTO5DLNVKFLJ446mTPusogoqYct%2BYostI%2F6KrSoSvAO4ktcAJMcXZg0iS4NDF%2FwoQH2vzBswRudHH4gHWde7koV2bshnbynxU328jaUiJuwefpYgTcbEmGTBpXNmzavzO35Zc%2FMvMao2ABy0spd7EBduKKqhyIIGMirxN2ro00lCSOa74RppxTFDH6cQgas4Di%2Bgczgxh4qyN0NzqKpEQRwi9Zs%3D&X-Amz-Signature=9699d69e86e659a887e257fbcffcabd4bc7419b6112e78f686e6fb029646351f&X-Amz-SignedHeaders=host
    Resolving mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)... 52.218.250.160, 2600:1fa0:40ac:1148:34da:e561::
    Connecting to mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)|52.218.250.160|:443... connected.
    HTTP request sent, awaiting response... 400 Bad Request
    2021-03-17 16:23:54 ERROR 400: Bad Request.

and

$ wget "https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/nl.tar.gz"
--2021-03-17 16:25:13--  https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/nl.tar.gz
Resolving mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)... 52.218.238.56, 2600:1fa0:40cc:488:34da:d071::
Connecting to mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)|52.218.238.56|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-03-17 16:25:15 ERROR 403: Forbidden.

Maybe the link became invalid in the meantime?

Within a few seconds? I copy-pasted it directly into the terminal.
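For what it's worth, a presigned S3 link states its own validity window in the query string: `X-Amz-Date` is the signing time in UTC and `X-Amz-Expires` is the lifetime in seconds. The links in this thread carry `X-Amz-Expires=43200`, i.e. 12 hours. A small sketch for reading these values in a script; the helper name `get_param` is made up for illustration, and the link is a shortened version of the nl link above that would not itself download anything:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the value of one query parameter.
# $1 = URL, $2 = parameter name
get_param() {
  # Put each "name=value" pair on its own line, then keep the requested one.
  printf '%s\n' "$1" | tr '?&' '\n\n' | sed -n "s/^$2=//p"
}

# Shortened for readability; the token and signature parameters are left out.
url='https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-6.1-2020-12-11/nl.tar.gz?X-Amz-Date=20210317T152026Z&X-Amz-Expires=43200'

echo "signed at: $(get_param "$url" X-Amz-Date)"
echo "valid for: $(get_param "$url" X-Amz-Expires) seconds"
```

This only shows when a link was signed and for how long; it does not check whether the server still accepts the signature.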

I just tested generating a link here, and it’s working …

And yours is not working here either.

Maybe I am making a mistake while generating the link. I am simply copying the URL from the address bar at the step after submitting the email address, when the dialog to save the tar file manually pops up. Is there a better way to generate the link?

I just let the download start, cancel it, and right-click / “Copy source URL”.
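Once the link has been copied that way, the download and unpack steps can go into one script. A rough sketch, assuming `wget` and `tar` are installed; the function name `download_and_extract` and the default names `en.tar.gz` and `data` are placeholders, not anything Common Voice prescribes:

```shell
#!/usr/bin/env bash
set -eu

# Download a presigned link to a fixed file name, then unpack it.
# $1 = full link copied from the browser
# $2 = archive file to write
# $3 = directory to unpack into
download_and_extract() {
  url=$1
  archive=$2
  dest=$3
  mkdir -p "$dest"
  # -O sets the file name explicitly; otherwise wget may keep the long
  # query string as part of the saved file name.
  wget -O "$archive" "$url"
  tar -xzf "$archive" -C "$dest"
}

# Run only when a link is passed on the command line.
if [ "$#" -ge 1 ]; then
  download_and_extract "$1" "${2:-en.tar.gz}" "${3:-data}"
fi
```

Saved as, say, `get_cv.sh`, it would be called as `./get_cv.sh 'PASTED_LINK'`, with the link in single quotes. The link still has to be generated in the browser first, since requests without the signed query string are answered with 403 Forbidden, as the logs above show.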


Thanks a lot. This trick worked for me!