Downloading Common Voice Corpus 7.0 en_2637h_2021-07-21 (65 GB) fails

I have tried three ways to download this dataset, but all of them fail.
1. wget -c “url”: after downloading about 1%, it fails with this output:
Connecting to |52.218.235.40|:443… connected.
HTTP request sent, awaiting response… 403 Forbidden

2. Google Chrome with Downloads Plus: sometimes the connection is lost after a few minutes, sometimes after a few hours. In the end I am unable to download the file.
3. Internet Download Manager: same as 2. After I have downloaded about 3 GB, the connection is lost.

Thank you for contributing this open source data, but the download problem prevents more people from using it. I hope this issue can be solved.

Regarding (1), is your IP changing when it fails? It isn’t possible to post direct URLs because you have to agree to the checkbox to get the URL, but you should be able to find the URL and use that, provided your IP does not change.

Thanks for your reply, but what do you mean by the IP changing? I checked the IP with the ifconfig command (Ubuntu) and ipconfig (Windows), and it is always the same.
The first method was on Ubuntu; the second and third were on Windows.

I mean your external IP (from your router). Also, what is the exact URL you are trying to download?

The following command is what I use to download the dataset:

wget -c

and the IDM download fails too.

Can you show the output on the terminal for the wget command?

I found a way to download this dataset. So far, 52% has downloaded.

1. Use IDM to download. When the speed drops to 0, right-click in the tool to change the download address (on the Common Voice page, enter your email address and it gives you a new download address).
2. Use the same download address to start a download in Xunlei. Then pause IDM and restart it. IDM will continue to download.
3. Twice so far the speed dropped to 0 and the download stopped. Following the steps above, it continued successfully each time.
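
For anyone who prefers a script to IDM, the same idea (keep the partial file and continue it with a freshly issued download address) can be sketched in Python using a standard HTTP Range request. This is only a minimal sketch: the function names are made up for illustration, and it assumes the server honours Range requests, which this thread does not confirm.

```python
import os
import shutil
import urllib.request


def build_resume_request(url, dest_path):
    """Return (request, offset) asking only for the bytes not yet on disk."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Standard HTTP Range header: continue from the current file size.
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def resume_download(url, dest_path):
    """Continue a partial download, using a (possibly new) download address."""
    request, _ = build_resume_request(url, dest_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range, so append to the file.
        # 200 means it is sending the whole file again, so overwrite instead.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            shutil.copyfileobj(response, out)
```

Each time the transfer stalls, get a new address from the Common Voice page and call `resume_download(new_url, "cv-corpus-7.0-2021-07-21-en.tar.gz")` again. If the file on disk is already complete, the server will typically answer 416 and `urlopen` raises an `HTTPError`.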

Some URLs are not allowed here, so I removed them to be able to post this comment.
root@1e11c64a3d17:~/data/audio_data# wget -c [“url”]
The name is too long, 1196 chars total.
Trying to shorten…
New name is cv-corpus-7.0-2021-07-21-en.tar.gz?X-Amz-Algorithm…
--2021-09-15 01:41:04-- [url]
Resolving … 52.218.237.224, 2600:1fa0:405f:9e41:345c:a03a::
Connecting to |52.218.237.224|:443… connected.
HTTP request sent, awaiting response… 403 Forbidden
2021-09-15 01:41:05 ERROR 403: Forbidden.

Can you show the whole URL?

Does it tend to fail at around a common time? E.g. after 12 hours?
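
One way to check this: the shortened file name in the wget log above contains X-Amz-Algorithm, which suggests the link is an AWS pre-signed URL. Such URLs normally also carry X-Amz-Date (when the link was signed) and X-Amz-Expires (its lifetime in seconds), and the server answers 403 once that lifetime is over. The sketch below assumes both parameters are present in your URL; the function name and the example URL are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url):
    """Return the UTC time at which a pre-signed URL stops being valid."""
    params = parse_qs(urlparse(url).query)
    # X-Amz-Date uses the compact ISO 8601 form, e.g. 20210915T000000Z.
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the link lifetime in seconds.
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at + lifetime


if __name__ == "__main__":
    # Hypothetical URL, not the real download address.
    example = (
        "https://example.com/cv-corpus.tar.gz"
        "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
        "&X-Amz-Date=20210915T000000Z"
        "&X-Amz-Expires=43200"
    )
    expires = presigned_expiry(example)
    print("Link expires at", expires.isoformat())
    print("Already expired:", datetime.now(timezone.utc) > expires)
```

If the failures line up with the computed expiry time, the 403 is probably the link timing out rather than an IP change, and requesting a new link before resuming should help.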

Hey, thanks for creating this topic.

To help us understand your issue, could you provide an update on your situation?

Many thanks,

Hillary