Assessing the health of the internet: What data is missing?

This session is facilitated by Stefan Baack

About this session

After a short kick-off presentation from the host with some examples of missing data he wishes existed, participants are encouraged to introduce themselves and share the data gaps they have encountered. Afterwards, participants select some of those issues and brainstorm what can be done about the lack of data or research. For example, is there alternative data available that would allow the problem to be approached from a different angle? Could some of the participants team up to start small-scale research and data collection that highlights the gap? Ideas like these will be collected in the group.

Goals of this session

There are numerous reports and research collectives trying to assess the ‘state of the internet’, but this task often remains surprisingly difficult. Researchers and activists interested in assessing the security, openness, or inclusiveness of the internet constantly face a lack of data on the issues they care about: data that we often assume must be available somewhere, but that is either nonexistent or very difficult to acquire. Knowing what is missing, and understanding why, can be just as important as analyzing what is available. What is measured can be managed, and is thus considered relevant; shedding light on unmeasured blind spots enables us to advocate for internet health more effectively. This session therefore brings together activists and researchers to share what data and research they sorely miss and to discuss how these pain points can be addressed.

Measuring language online: there are lots of studies, but once you dig deeper there are gaps. How do you know whether an internet user speaks more than one language? You can combine data on languages spoken and internet connectivity in the same region to get a proxy, but what about less homogeneous areas where many languages are spoken? Do people care whether internet content is published in their first language? Opinions vary widely depending on culture (e.g. attitudes in Germany are very different from those in France).
ideas & reactions: survey users through the browser; ask users to opt in to sharing their browsing history and use that as a data set; run ads in different languages and see what gets clicked on.
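The opt-in browsing-history idea could feed a rough measure of multilingual use. A minimal sketch, assuming per-page language labels are already detected client-side (the detection step, the sample data, and the 10% threshold are all illustrative assumptions, not part of any real study):

```python
from collections import Counter

def language_profile(pages):
    """Share of browsing per language, from a list of per-page language codes."""
    counts = Counter(pages)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}

def is_multilingual(pages, threshold=0.1):
    """Count a user as multilingual if at least two languages each exceed
    `threshold` of their browsing (the threshold is an arbitrary assumption)."""
    profile = language_profile(pages)
    return sum(1 for share in profile.values() if share > threshold) >= 2

# Hypothetical opt-in data: per-user lists of detected page languages.
histories = {
    "user_a": ["de", "de", "en", "de"],
    "user_b": ["fr", "fr"],
}
```

With this sketch, `user_a` (75% German, 25% English) would count as multilingual while `user_b` would not; the interesting design questions are the threshold and how to handle misdetected page languages.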

Putting UN data on Wikipedia to improve articles: the idea is that this leads to good outcomes (people learn more and can then act on it). But it is hard to collect data on how much impact this is having. You can say how many people read a specific page, but not how many read a particular section of that page or what happened afterwards. How do you make the case for impact?
ideas & reactions: put out a survey; share the challenge openly/vulnerably and ask for people's input; ask (lobby?) Wikipedia to gather more data on this and/or survey their users.

Trying to find information about who owns my data in Canada and whether I can control it. It is hard to gather and maintain research focused on this kind of qualitative, complex data.
ideas & reactions: use and support existing crowdsourced research tools, like Wikidata

Chatbot prototype on Facebook Messenger: the challenge of working with clients without a high level of digital literacy is that they don't necessarily know what data they need or want to capture, and the tools available on platforms are often very limited. Often it's not until you see the data at the end of the project that you realize you wish you'd captured something else (but you can't go back, or don't have the budget to do so).

Citizen Lab's Security Planner: the tool doesn't collect data from users, but they want to link to data/research within it. A lot of data and research about security issues is collected by people who create devices/software to improve security, so you need to be skeptical about the findings, as they often steer people back to the product their authors created. The research methodology can also be questionable. How do you find good, vendor-agnostic statistics and information that address a more representative, global audience?

Internet shutdowns: a growing group of researchers is looking to document shutdowns and share the data with activists fighting against them. Internet shutdowns are happening more frequently; they are not always reported by the media, can happen in very short windows, can be localized to relatively small areas (is it a shutdown, or is the electricity just off?), and access to some platforms/sites might be cut while the rest of the internet stays up. It feels like this should be measurable, but it is actually very difficult to track worldwide. How do researchers collect and present this data and ensure it is credible? How quickly can it be done?
ideas & reactions: it is hard to report an internet shutdown while it is happening, because you don't have internet access to report it. Could you go to satellite network partners and ask them to provide support in these cases? How do you marry this to some of the automated detection solutions that already exist?
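The automated detection mentioned above can be approximated, at its simplest, by repeatedly probing a set of endpoints and recording failures. A minimal sketch, assuming a hand-picked target list (real measurement projects use far larger, carefully vetted target sets and must separate local outages from actual blocking):

```python
import socket
from datetime import datetime, timezone

# Illustrative probe targets; any real study would need a vetted list.
TARGETS = ["example.com", "wikipedia.org"]

def probe(host, port=443, timeout=3.0):
    """Attempt a TCP connection and return a timestamped measurement record."""
    ts = datetime.now(timezone.utc).isoformat()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:  # covers timeouts, refused connections, DNS failures
        reachable = False
    return {"host": host, "port": port, "time": ts, "reachable": reachable}

def summarize(records):
    """Fraction of probes that failed; a crude signal of local blocking."""
    failed = sum(1 for r in records if not r["reachable"])
    return failed / len(records) if records else 0.0
```

Running `summarize([probe(t) for t in TARGETS])` from many vantage points over time is the core idea; the hard parts the session notes point to, distinguishing a shutdown from a power cut and getting results out of an affected region, are exactly what this sketch does not solve.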

Working to digitize the shipping industry. Clients are often not open to sharing enough data. With just one image of a person you can build a facial recognition system; the same approach is being applied to documents to train a template recognition system.

Internet Health Report: there is much more data on Western countries than on other parts of the world (e.g. lots of research on online harassment in North America, far less from Sub-Saharan Africa, and the two are not comparable). On net neutrality: not many countries have laws that are actually called “net neutrality laws”, and the best research we found covered 66 countries on only two continents. We considered combining it with another study that looked at more countries, but it didn't use the same scoring system, so we couldn't merge them and didn't end up using this in the report. One approach taken with other topics was to combine qualitative and quantitative studies (e.g. on online harassment of journalists, we combined quantitative research from The Guardian with a qualitative study).

One great approach is a network! Let’s stay in contact.