We have been collecting them from users lately, but in the past I have used public domain books and movies (War of the Worlds, It’s a Wonderful Life). The problem is that those texts are old and the language is outdated, so we are trying to get people to donate stuff they would actually say. It hasn’t been a very scalable approach yet, but I think it’s a good direction.
It really depends. Either can be pretty technical and use a lot of proper nouns, which is less useful, I think. But the right forum might have some good stuff; you just have to find it.
Books also have the “not very conversational” problem. Movie scripts are better. Call center calls are really good; that’s how Google trained their original engine.
Is it public domain? License-free? If so, this would be an amazing source, provided we could use it.