I have tried three ways to download this dataset, but all of them fail.
1. wget -c "url": after downloading about 1%, it fails. The output is:
Connecting to |52.218.235.40|:443… connected.
HTTP request sent, awaiting response… 403 Forbidden
2. Google Chrome with Downloads Plus: sometimes the connection gets lost after a few minutes, sometimes after a few hours. In the end I'm unable to download the file.
3. Internet Download Manager: the same as 2. After I have downloaded about 3 GB, the connection gets lost.
Thanks for contributing this open source data, but the download problem prevents more people from using it. I hope this issue can be solved.
Regarding (1), is your IP changing when it fails? It isn't possible to post direct URLs because you have to agree to the checkbox to get the URL, but you should be able to find the URL and use it, provided your IP does not change.
Thanks for your reply, but what do you mean by the IP changing? I checked the IP with the ifconfig command (Ubuntu) and ipconfig (Windows), and it is always the same.
The first method was on Ubuntu, and the second and third were on Windows.
I found a way to download this dataset. So far, 52% has been downloaded.
1. Use IDM to download. When the speed drops to 0, right-click in the tool to change the download address (on the Common Voice page, enter your email address and you are given a new download address).
2. Start downloading the same address with Xunlei. Then pause IDM and restart it. IDM will continue downloading.
3. Twice so far the speed has dropped to 0 and the download has stopped. Following the steps above, it continued successfully each time.
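For anyone who would rather script this than use IDM, the steps above boil down to "ask for a fresh download address, then continue from the byte where the old one stopped". Below is a rough Python sketch of that idea using an HTTP Range request. It is not an official tool; the function names `range_header` and `resume_download` are made up for illustration, and it assumes the server honours Range requests, which this thread does not confirm.

```python
import os
import urllib.request


def range_header(offset: int) -> dict:
    """Build the HTTP Range header asking for everything from `offset` on."""
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Append the missing part of `url` to `dest`; return the size on disk."""
    # Continue from however many bytes the previous attempt already saved.
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(offset))
    # urlopen raises urllib.error.HTTPError on 403 (get a new address and
    # call again) and on 416 (the file on disk is already complete).
    with urllib.request.urlopen(request, timeout=60) as response:
        # 206 means the server honoured the Range header; 200 means it is
        # sending the whole file again, so the local file is restarted.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return os.path.getsize(dest)
```

When the transfer stalls or fails, request a new address from the Common Voice page and call `resume_download(new_url, dest)` again with the same `dest`; the part already on disk is kept.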
Some URLs are forbidden here, so I removed them in order to post this comment.
root@1e11c64a3d17:~/data/audio_data# wget -c [“url”]
The name is too long, 1196 chars total.
Trying to shorten…
New name is cv-corpus-7.0-2021-07-21-en.tar.gz?X-Amz-Algorithm…
--2021-09-15 01:41:04-- [url]
Resolving … 52.218.237.224, 2600:1fa0:405f:9e41:345c:a03a::
Connecting to |52.218.237.224|:443… connected.
HTTP request sent, awaiting response… 403 Forbidden
2021-09-15 01:41:05 ERROR 403: Forbidden.
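The shortened file name in the log above starts its query string with X-Amz-Algorithm, which is what a pre-signed Amazon S3 link looks like. Such links normally also carry X-Amz-Date and X-Amz-Expires, and S3 answers 403 Forbidden once that lifetime is over. Whether that is what happened here cannot be told from the log, so treat it as an assumption. A small Python sketch for checking a link before retrying; the function names and the example URL are made up, and the real URL is not reproduced.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC time at which a pre-signed URL stops being valid."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time, e.g. 20210915T010000Z (always UTC).
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    # X-Amz-Expires is the lifetime of the link in seconds.
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the link's lifetime is already over at `now` (default: now)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_expiry(url)


# Hypothetical link, for illustration only.
example = (
    "https://example.com/cv-corpus.tar.gz"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Date=20210915T010000Z"
    "&X-Amz-Expires=3600"
)
print(presigned_expiry(example).isoformat())
```

If `is_expired` returns True, restarting wget -c with the old address will keep failing, and a new address has to be requested from the Common Voice page first.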