Verifying information from the crowd–can it be done?


Image by Gobierno de Aragón via Flickr

Whenever I mention the concept of obtaining situational-awareness information from citizens, the people in logo shirts cringe. The question of data veracity is always the chief concern, as demonstrated by the discussion on this blog a couple of weeks ago about the Oil Spill Crisis Map (which displays an aggregation of citizen reports regarding the BP Oil Spill). Others in emergency management dismiss the notion out of hand.

The international humanitarian response community, however, does not have the luxury of ignoring “real-time streams of data” from citizens impacted by either man-made or natural disaster events. Instead of throwing the baby out with the bathwater, as my Texas mother would say, processes, both human and technological, have been developed to address the issue. I should note that this effort is occurring mostly in the NGO sector. (But see a tangentially related initiative by the U.S. State Department called Civil Society 2.0, announced last Dec.)

The organization leading the way is the non-profit tech company, Ushahidi. What is Ushahidi?

Ushahidi …specializes in developing free and open source software for information collection, visualization and interactive mapping. We build tools for democratizing information, increasing transparency and lowering the barriers for individuals to share their stories. We’re a disruptive organization that is willing to fail in the pursuit of changing the traditional way that information flows.

Since the Ushahidi software is available for any organization (public or private) to use, the creators developed a guide for users that specifically addresses how to verify data from citizens. You can peruse the one-page document, but in general it touches on everything from communicating directly with the source to looking out for “poison data”, or intentionally misleading information.

Another way to verify data is to deploy their newly upgraded software, “SwiftRiver.” This software enables the user to do several things: mine intelligence from the web; aggregate data from multiple sources; monitor mentions of a company, organization, or agency; and categorize information based on semantic context. From their website:

SwiftRiver is a free and open source platform that helps people make sense of a lot of information in a short amount of time. …

In practice, SwiftRiver enables the filtering and verification of real-time data from channels such as Twitter, SMS, Email and RSS feeds. This free tool is especially useful for organizations who need to sort their data by authority and accuracy, as opposed to popularity. These organizations include the media, emergency response groups, election monitors and more.

The SwiftRiver platform offers organizations an easy way to combine natural language/artificial intelligence process, data-mining for SMS and Twitter, and verification algorithms for different sources of information. Swift’s user-friendly dashboard means that users need not be experts in artificial intelligence or algorithms to aggregate and validate information. The intuitive dashboard allows users to easily manage sources of information they wish to triangulate, such as email, Twitter, SMS and RSS feeds from the web.
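To make the “authority and accuracy, as opposed to popularity” idea concrete, here is a minimal sketch in Python. It is not SwiftRiver's code or algorithm: the `Report` class, the `SOURCE_AUTHORITY` scores, the source names, and the sample reports are all made up for illustration, and the scores are assumed to come from a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Report:
    text: str
    source: str   # who sent it (hypothetical source names below)
    channel: str  # e.g. "twitter", "sms", "email", "rss"
    shares: int   # popularity signal, e.g. retweets or forwards


# Hypothetical authority scores in [0, 1], assumed to be set by a human reviewer.
SOURCE_AUTHORITY = {
    "county_eoc": 0.9,
    "local_reporter": 0.7,
    "anonymous_sms": 0.3,
}

DEFAULT_AUTHORITY = 0.1  # assumed score for sources nobody has reviewed yet


def authority(report: Report) -> float:
    """Look up the reviewer-assigned score for the report's source."""
    return SOURCE_AUTHORITY.get(report.source, DEFAULT_AUTHORITY)


def rank_by_authority(reports):
    """Sort by source authority first; popularity only breaks ties."""
    return sorted(reports, key=lambda r: (authority(r), r.shares), reverse=True)


def rank_by_popularity(reports):
    """Sort by share count alone, ignoring who sent the report."""
    return sorted(reports, key=lambda r: r.shares, reverse=True)


# Made-up sample reports, one per line of an imaginary incoming stream.
reports = [
    Report("Oil sheen sighted near the marina", "anonymous_sms", "sms", 950),
    Report("Beach closed at Pier 4", "county_eoc", "email", 12),
    Report("Cleanup crews arriving at noon", "local_reporter", "twitter", 140),
    Report("Tar balls on the east shore", "unknown_account", "twitter", 400),
]

for r in rank_by_authority(reports):
    print(f"{authority(r):.1f}  {r.shares:>4}  {r.source}: {r.text}")
```

In this toy stream the widely shared anonymous text message drops from first place (by popularity) to third (by authority), while the rarely shared report from the reviewed official source moves to the top. The scores still have to come from a person, which is the point of the verification guide mentioned above.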

I think this is interesting because it is a completely different way to sort information during a response. Although currently Ushahidi might be one of the few companies developing these technologies, I suspect many more software applications will become available as organizations, response and otherwise, see the benefits in “mining data”. I also predict that privacy concerns will surface as these practices become more common.

This might make a lot of emergency managers uncomfortable. I like this quote from the article “Aid groups using cellphones to reach the world’s poor” in yesterday’s Washington Post:

“Tech is an enabler, not the end goal,” said David Edelstein, vice president of technology programs for Grameen. “It’s about putting information into people’s hands and empowering them.”

7 responses to “Verifying information from the crowd–can it be done?”

  1. The idea of “verifying” information is an artifact of a scarcity mindset in intelligence gathering. When we have only a few reports or a few sources, a relatively small error can severely bias interpretation, and so each datum deserves detailed scrutiny.

    But when we have whole rivers of information the veracity of individual reports becomes much less important… and that’s good news when, as in rapidly developing emergent situations, it becomes impossible to verify each individual report against any objective standard anyway.

    Likewise, even source reputation, itself largely a proxy for data reliability, becomes much less of a concern when we’ve got oodles of sources. Errors tend to cancel themselves out if the statistical sample is large enough. “Outlier” data points can be filtered out or, better, given special attention to see if they might indicate a systemic blind spot or an emerging trend.

    This is one of the areas where an excessively worshipful attitude toward officialness can cause us to miss the point. Even if “official” sources are actually more reliable than unofficial ones (which may not always be the case) larger samples are better when we can get them.

  2. At the same time, it’s one thing to create new aggregated views of disasters, another to use them to improve outcomes. Right now there’s a lot of self-congratulation going ’round about how disaster data is being collected and presented in new ways. That’s good, but the jury’s still out on how, or even whether, that’s actually making a difference.

    Who is consuming all this new information, and how is it influencing what they’d otherwise do?

    • I think the first step is documenting needs. If we can better understand needs then governments and NGOs are in a better position to meet those needs. I wonder, however, if a digital divide will be created. Those who know how to use social media to get their story heard and therefore get aid, vs those relying on traditional communications. To answer my own question, resources always run to those that scream the loudest–no matter what the medium.

  3. It’s a common mistake to assume that ‘verifying’ information can ever be something that’s completely machine driven. I firmly stand on the side of the line that it cannot, at least not with the technology available to us today.

    It’s also a common mistake to assume that verifying small quantities of data is less difficult than large quantities, as Art points out. However, I’ll disagree with Art on another point: “Errors tend to cancel themselves out if the statistical sample is large enough.” That statement assumes a lot.

    It assumes that the erroneous report has occurred in a vacuum, with no influence, no causation, no reaction. The problem with erroneous reports, especially online, is that if they come from a trusted source, they become more powerful. For example, in the past few years there have been a number of reports from major media outlets that a very public figure (Steve Jobs) was deceased, when in fact he was not. When a ‘trusted source’ is wrong, it makes them no less influential. So the validity of the source is always relevant, especially when crowdsourcing. Is the validity something that can be predicted? Again, I’m of the camp that believes no (at least not yet). However, those errors, inaccuracies or mistakes can’t just be ignored.

    As the developer of SwiftRiver, our goal is to simply maximize human time when verifying information. Before you had a cellphone could you communicate with other people? Yes. Is it more efficient to use a cellphone? Yes. Our goal at Swift is to empower those who seek to better manage streams of information.

    They might have one message, you can have one million, the goal remains the same: to attempt to add context to help the human make more informed decisions.


    • Thank you so much for contributing to our conversation. Your work at is very interesting. I’m guessing you used swift to find my blog posting. Am I correct?

    • Not meaning to attack your business model, Jon, but it seems like there might be a bit of circularity here. If a “trusted” source can distort the construction of meaning, isn’t that arguably a problem with the notion of “trusted” sources?

      We’ve inherited a culture of celebrity and authority from the mass media era; I’m afraid it’s going to take us some time to shake free of practices and attitudes from that era that don’t serve us as well anymore.

      And in a crowdsourced application I’m not sure I understand what it even means to assess the validity of a source. Do we mean the validity of the whole crowd? Or of certain selected members? Who gets to choose, and on what basis?

      The fear of runaway rumors has dogged the Internet since its inception. But what we’ve actually seen is that the system is raucous and noisy but ultimately self-stabilizing. That’s hard to accept for folks of my generation who’ve been accustomed most of our lives to heavily filtered (“validated”) media. But I hold with Kevin Kelly’s view that the price of admission to more complex and sophisticated systems is a willingness to surf the chaos.

  4. Thanks for this post, I didn’t know about the Gulf map. It was really interesting to see how social media played out in the Boulder wildfire (I recently blogged about this on my site). Very quickly a rich Google map was created with fire perimeters, locations of burned houses, closed roads, shelters and so on. People went to Twitter or live audio scanner feeds to get the latest information. The thing that surprises me is how organized and useful these resources are. We assume they would be full of rumors and misinformation, but communities actually seem to be able to curate and organize on the web quite well.
