I’ve not got my laptop on me at the moment so can’t look at the WhatsApp Web thing, but I’ve done a fair bit of web automation in Selenium and suspect it’d be reasonably easy because the site is largely stable and pretty much unmaintained.
This is the “brute force and ignorance” method of how I’d do it if I couldn’t work out how to snag the JSON data directly from the phone:
Visit web.whatsapp.com and scan the QR code with your phone, and it sets up an IPv6/peer.js connection direct to your phone. There’s a list of conversations on the left; you’d extract those using a CSS or XPath selector on the browser’s DOM, then loop over them by clicking on each in turn. For each one, you’d get the height of the message panel on the right side, scroll up as far as you can, then see if the height has changed. Once it’s not got any bigger for, say, 10 seconds, you’d use a CSS or XPath selector to get all the green message bubbles and copy them into an empty doc. Once all the conversations are done, let the user select the ones they want to share.
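The scroll-until-it-stops-growing step could be sketched roughly like this. It’s only a guess at the shape of it: the function name, the parameters and the half-second poll interval are all made up for illustration, and the Selenium wiring in the comments assumes you’ve already located the message panel element somehow (I haven’t checked what the real selectors are).

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_top: Callable[[], None],
    settle_seconds: float = 10.0,
    poll_seconds: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll up repeatedly until the panel has not grown for settle_seconds.

    Returns the final height. All names here are hypothetical.
    """
    last_height = get_height()
    last_change = clock()
    while clock() - last_change < settle_seconds:
        scroll_to_top()          # ask the page to load older messages
        sleep(poll_seconds)      # give it a moment to fetch and render
        height = get_height()
        if height > last_height:
            # The panel grew, so more history arrived: restart the timer.
            last_height = height
            last_change = clock()
    return last_height


# Possible wiring to Selenium, assuming `driver` is a WebDriver and `panel`
# is the already-located message panel element (both are assumptions):
#   get_height    = lambda: driver.execute_script(
#       "return arguments[0].scrollHeight", panel)
#   scroll_to_top = lambda: driver.execute_script(
#       "arguments[0].scrollTop = 0", panel)
#   scroll_until_stable(get_height, scroll_to_top)
```

Passing the height and scroll actions in as callables keeps the loop testable without a browser, and the clock and sleep hooks let you fake the 10 second wait.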
Facebook would be more difficult and would require constant maintenance, as they work to prevent scraping, so the available scrapers tend to die a couple of times a year. It’d need some thought, and data extraction would be a pain once the pages have made it to disk.
Search history would be easy enough. Just dump all URLs from the browser’s history and filter them against the search URL patterns of the major search engines. Bonus points for language selection using the top-level domain.
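A minimal sketch of that filter, assuming the history has already been dumped to a plain list of URLs. The engine table is my own guess at a starting point (host label mapped to the query-string parameter that holds the search terms), not an exhaustive or verified list, and the function names are made up.

```python
from urllib.parse import urlparse, parse_qs

# Assumed table: host label -> query parameter carrying the search terms.
SEARCH_ENGINES = {
    "google": "q",
    "bing": "q",
    "duckduckgo": "q",
    "yahoo": "p",
}


def extract_search(url):
    """Return (engine, query, tld) for a search URL, or None otherwise."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    params = parse_qs(parts.query)  # decodes %xx escapes and '+' as space
    for engine, param in SEARCH_ENGINES.items():
        if engine in labels and param in params:
            # The last host label is the top-level domain, usable as a
            # rough language/country hint (e.g. "de", "fr", "uk").
            return engine, params[param][0], labels[-1]
    return None


def filter_history(urls):
    """Keep only the URLs that look like searches on a known engine."""
    return [hit for hit in map(extract_search, urls) if hit is not None]
```

The top-level domain is only a rough hint, since plenty of people search on the .com site in other languages.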
IRC would be easy. Have a bot that logs text to a file, with an on-JOIN event handler for a channel that messages new users asking them to opt in. Anyone who is identified with NickServ and responds with the trigger word gets added to the logging whitelist. Comments in a public channel are generally not private anyway, so no privacy issues there. Bonus points for reminding the user that they’ve opted in each time they join, and giving them stats and the chance to opt out. The project would of course need permission from each channel where the bot operates, and we’d need to avoid technical channels and focus purely on chatty ones.
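The opt-in bookkeeping could look something like this. It’s deliberately not tied to any IRC library: the class, the method names and the trigger words are all hypothetical, the `identified` flag is assumed to come from whatever NickServ check the real bot does, and an in-memory list stands in for the log file.

```python
class OptInLogger:
    """Tracks who has opted in and logs only their channel messages."""

    def __init__(self, trigger="optin", opt_out="optout"):
        self.trigger = trigger      # hypothetical trigger word
        self.opt_out = opt_out      # hypothetical opt-out word
        self.whitelist = set()
        self.log = []               # stand-in for appending to a file

    def on_join(self, nick):
        """Return the private message to send to a user who just joined."""
        if nick in self.whitelist:
            return (f"Reminder: you opted in to logging. "
                    f"Reply '{self.opt_out}' to stop.")
        return (f"This channel is logged for opted-in users only. "
                f"Reply '{self.trigger}' to opt in.")

    def on_private_message(self, nick, text, identified):
        """Handle a reply to the bot; `identified` means NickServ-verified."""
        word = text.strip().lower()
        if word == self.trigger and identified:
            self.whitelist.add(nick)
        elif word == self.opt_out:
            self.whitelist.discard(nick)

    def on_channel_message(self, channel, nick, text):
        """Record a channel message only if the speaker has opted in."""
        if nick in self.whitelist:
            self.log.append((channel, nick, text))
```

A real bot would call these three methods from its JOIN, private message and channel message handlers, and would want the whitelist saved to disk so opt-ins survive a restart.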
If people are interested then these are things I could probably write given a couple of days of effort.