Random sampling size for bulk submission


I have a question about the random sample size for bulk submission. Using online calculators with a confidence level of 99% and a margin of error of 2%, the sample size comes out at around 4,000 sentences whether the set has 50,000, 100,000 or a million sentences.
Is this accurate? I want to make sure I am not overdoing it.



These sample size calculations work well for large sets. For a small set, e.g. 1,000, the result will be about 800… For large sets, e.g. 100,000 to 1 million, the required sample size barely changes.
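To see where those numbers come from, here is a minimal sketch of the usual calculation, assuming the online calculators use Cochran's formula with a finite population correction and the most conservative proportion p = 0.5 (individual calculators may round or correct slightly differently). The function name `sample_size` and its parameters are just for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.99, margin=0.02, p=0.5):
    """Sample size for a simple random sample (assumed formula, see above)."""
    # Two-sided z-score for the confidence level (about 2.576 for 99%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / margin ** 2
    # Finite population correction: shrinks the result for small sets.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for size in (1_000, 50_000, 100_000, 1_000_000):
    print(size, sample_size(size))
```

With these assumptions the result is roughly 800 for a set of 1,000 and stays close to 4,000 for anything from 50,000 to a million, because the correction term barely matters once the set is much larger than the uncorrected sample size.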

The original process is based on 100k sentences (so a large number), which is why that calculation is provided. I think, for smaller sets, e.g. 10k, two different random sets of (say) 5-10% should be more than enough… You would already cover 20-40% of the whole anyway.
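Drawing two different random sets from a smaller collection could look like the sketch below. It assumes the sentences are already loaded into a list and that the two sets should not overlap; the function name `two_disjoint_samples` and the placeholder data are hypothetical.

```python
import random


def two_disjoint_samples(sentences, fraction=0.05, seed=None):
    """Return two non-overlapping random samples, each `fraction` of the list."""
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] for two disjoint samples")
    rng = random.Random(seed)
    k = round(len(sentences) * fraction)
    # Draw 2*k distinct positions at once, then split them in half,
    # so no sentence can land in both samples.
    picked = rng.sample(range(len(sentences)), 2 * k)
    first = [sentences[i] for i in picked[:k]]
    second = [sentences[i] for i in picked[k:]]
    return first, second


# Placeholder data standing in for a 10k-sentence set.
sentences = [f"sentence {i}" for i in range(10_000)]
first, second = two_disjoint_samples(sentences, fraction=0.05, seed=42)
print(len(first), len(second))
```

Fixing the seed makes the selection reproducible, which helps if someone else needs to check the same sample later.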

80% (800 out of 1,000) is not really a sample…

For some mathematics: https://stattrek.com/sample-size/simple-random-sample.aspx