After turning the MTurk crowdsourcing world loose on our project for about 30 hours, we decided to stop, take a look at the results of the experiment, and see what we’ve learned so far. We’ll start with the statistics, then follow up this post with an overview of our methodology and the things we learned on this initial test.

Recap

We took a list of 100 companies and asked our MTurk workers to find the Last Name, First Name, Suffix, and a link to a bio of each company’s General Counsel. We asked that workers look only on the company’s website to find the information, and we would not accept answers from external resources like ZoomInfo. The question was answered in a “double-blind” method where each question was given to two different people to answer.

Here’s what the actual MTurk questionnaire looked like:

The Statistics

  • Actual hours used to answer questions – 6.3 hours This meant that each answer took about 2 ½ minutes. Some of the questions were handled within a few seconds, while others took five minutes or more.
  • Average Pay Per Hour – $1.51 (+ 15¢ surcharge for MTurk per hour) When I looked at this, I immediately thought that I’d make a very good slum lord.
  • Total amount paid – $9.50 to workers + 95¢ to MTurk ($10.45 Total) Again, not a lot of money, but from my research on the topic, MTurk workers aren’t looking to make a lot of money; they are taking on these projects in their spare time. So, although I’d make a great slum lord, I guess I shouldn’t feel too guilty about it.
  • Over 330 individuals accepted the tasks, but only 161 (48.8%) returned an answer Since we were only paying 10¢ a question, it is apparent that some decided it wasn’t worth the work, while others probably couldn’t find the answer and gave up.
  • Questions Answered – 161 of 200 (80.5%) Initially, I was floored by the fact that over 80% of the questions were answered. Once I started diving into the answers, it became apparent that not all of the answers were what we were looking for.
  • Total number of Answers that were “acceptable” – 95 of 161 (59%) Of the 161 answers we received, only 95 were deemed as correctly identifying the General Counsel of the company.
  • Companies with at least one answer – 86 of 100 (86%) Although there was an 86% return rate, not all of these companies had correct answers (more below.)
  • Companies that received a double-blind entry – 72 of 86 (83.7%) These were companies that had two different individuals return answers.
  • Companies that received only one answer – 14 of 86 (16.3%) However, out of the 14, only about two were correct. The rest were either pointing to the Chief Executive Officer as the GC, or were just plain guesses at the answer.
  • Double-Blind Answers where both answers matched – 46 of 72 (63.9%) These were answers where both individuals found the same name and URL. Generally this is a good indicator that the information is correct. But we found out that this isn’t always true (more below.)
  • Companies where the General Counsel information was found – 52 of 86 (60.5%) We looked at the answers individually and discovered that for 34 (39.5%) of the companies where we received some type of data back, the MTurk worker had identified the wrong person as the General Counsel. Most of the incorrect answers identified the Chief Executive Officer as the General Counsel. In one case, the person answered “none” (which was technically correct, but wasn’t included in the stats above.)
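The rates in the list above can be rechecked from the raw counts. The following is a minimal sketch, not the method used for the post: the variable names and the `pct` helper are made up for illustration, and the "over 330" acceptances are treated as exactly 330.

```python
# Counts and dollar amounts copied from the statistics list above.
hours_worked = 6.3
paid_to_workers = 9.50
mturk_fee = 0.95
tasks_accepted = 330      # the post says "over 330"; 330 is assumed here
answers_returned = 161
questions_posted = 200
acceptable_answers = 95
companies_answered = 86
double_blind = 72
single_answer = 14
matching_pairs = 46
gc_found = 52


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


stats = {
    "minutes_per_answer": round(hours_worked * 60 / answers_returned, 2),
    "pay_per_hour": round(paid_to_workers / hours_worked, 2),
    "fee_per_hour": round(mturk_fee / hours_worked, 2),
    "return_rate": pct(answers_returned, tasks_accepted),
    "questions_answered": pct(answers_returned, questions_posted),
    "acceptable": pct(acceptable_answers, answers_returned),
    "double_blind": pct(double_blind, companies_answered),
    "single_answer": pct(single_answer, companies_answered),
    "matching_pairs": pct(matching_pairs, double_blind),
    "gc_found": pct(gc_found, companies_answered),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

Keeping the raw counts in one place makes it easy to rerun the numbers if a later batch changes any of them.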

When the test was over, we ended up with solid answers for 52 of the 100 companies, and a good guess that 20-25 companies probably didn’t have a GC on staff. That is a great result for something we’ve spent $10.45 to get. We’ll discuss more of the methodology and some of the surprises we found while conducting this test in later posts.
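For readers who want to try the double-blind comparison themselves, here is a rough sketch of how two answers for the same company might be checked against each other. Everything in it is an assumption for illustration: the company names, the answers, and the `normalize` and `classify` helpers are invented, and this is not the process we actually used to review the results.

```python
def normalize(answer):
    """Trim and lower-case the name fields and drop a trailing slash from the
    URL, so trivial differences do not count as a mismatch."""
    last, first, url = answer
    return (last.strip().lower(), first.strip().lower(), url.strip().rstrip("/"))


def classify(answers):
    """Label a company by how many answers came back and whether they agree."""
    if not answers:
        return "no answer"
    if len(answers) == 1:
        return "single answer"
    first, second = (normalize(a) for a in answers[:2])
    return "match" if first == second else "mismatch"


# Hypothetical responses, invented for illustration only.
responses = {
    "Example Corp": [
        ("Smith", "Jane", "https://example.com/bios/jsmith/"),
        ("smith ", "jane", "https://example.com/bios/jsmith"),
    ],
    "Sample Inc": [
        ("Doe", "John", "https://example.org/team/jdoe"),
        ("Roe", "Richard", "https://example.org/team/rroe"),
    ],
    "Placeholder LLC": [("Poe", "Ann", "https://example.net/about")],
    "Nobody Ltd": [],
}

labels = {company: classify(answers) for company, answers in responses.items()}
for company, label in labels.items():
    print(f"{company}: {label}")
```

A "match" here only means the two workers agreed with each other. As the statistics show, both can land on the same wrong person (the CEO, for instance), so matched answers still need a manual look.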

  • I like the idea of crowd sourcing and wow, do I see the possible benefits. But at the same time, I can’t help but feel that it is exploitative, particularly if I am gathering information for my own profit. I don’t take issue with your experiment, because the results were for public benefit. But what if, for example, you crowd sourced this kind of data (or other similarly desirable data not otherwise available), then bundled it and sold it for profit? Or what if a law firm decided to crowd source document review for $5/hour and billed it out at $100/hr? That does not seem right.

    At the same time, managing a crowd sourcing project can be difficult and the people who take the helm — the aggregators — deserve compensation as well. I’m curious to see how these issues resolve as companies begin to use crowd sourcing more and more.

  • Carolyn,

    I, too, had the same thoughts on the potential for exploitation of crowdsourcing. I’ve thought a lot about this very topic as I was conducting the test and still haven’t resolved all of the pros and cons of the issue.

    On the one hand, you could say that the market will take care of itself. So, for example, if I don’t pay enough for the work, I’ll either not get the work done, or I’ll find that I’m getting poor results back and it is probably costing me more money in weeding out the bad results than it would be if I paid a higher amount.

    On the other hand, I’m thinking that this type of resource will drive down costs in some areas (like electronic discovery) that are – in my opinion – way too high. By crowdsourcing some things, bringing the cost of the process down, and passing that substantial discount through to the client, there is a possible “win-win-win” situation. The client wins by reducing the overall costs, the firm wins because it doesn’t need to invest in the infrastructure and personnel costs associated with bringing in reviewers (we’ll assume this is low-level, non-attorney work), and the crowdsourcing people win because they can take on these projects in their spare time and earn a little extra money at the same time.

    There will always be a chance that someone will find a way to leverage crowdsourcing to make a huge profit on the backs of low-paid workers, but this type of business model isn’t limited to crowdsourcing; it is one that is used in almost all businesses that depend upon labor to produce a product.

    Toby and I will be covering some of the methodology we used in putting this experiment together, and some of the issues we’ve had to look at in what it means to have a crowdsourcing project.

  • Anonymous

    Greetings, I am one of your sourced crowd. An anonymous Mturker who accepted Greg’s “100 word summary of legal decision” Hit on Monday evening, May 18.
    I emailed Greg complaining that the Hit expired on me, and 60 minutes was not enough (for me) to summarize a patent law appellate decision.
    Greg sent a nice email back promising compensation (50 cents — gotta make the rent, y’know). But then he included something interesting as an attachment: a snapshot of the batch progress report which lists average time per assignment as 3 minutes. In all, 4 out of 5 assignments were finished in an 8 hour posting period. So 4 people took an average of 3 minutes to write a 100 word summary on this:
    Title of article: Federal Circuit affirms award of attorneys’ fees for litigation misconduct
    And this was my summary:
    ICU Medical Inc lost its patent infringement claim against Alaris Medical Systems in a California District Court. ICU appealed and the Ninth Circuit affirmed the lower court’s ruling, including imposing Rule 11 sanctions (awarding of attorneys fees) against ICU for falsely representing its own patents on medical devices (valves to allow injection of fluid into patients’ IV lines). (No. 08-1077 Fed. Cir. Mar. 13, 2009). At issue was ICU’s claim that its common patent specifications described a nonpiercing design. The Appellate Court justified sanctions because ICU claimed that drawings in its patent specifications clearly showed a spikeless device, but later admitted the figures did not.
    Now I ask you, ladies and gentlemen, what do you think of the 3 minute average? Seriously, I want to know what the blog lurkers and the blog commanders think the statistic says. Does it say that the 4 random Mturkers who accepted the assignment were so good at this kind of thing, (because they do it for a living, perhaps) that they could spit it out in 3 minutes? If so, and their summaries were acceptable, then I am very impressed, and very much chastened.
    And what does that have to do with the experiment? Well, it’s an important issue, quality is. Greg’s findings on a task of looking up general counsel on lawfirm websites — not exactly rocket science — returned a rate of 50% usable product. Which way does the statistic go when the assignment becomes less intellectually facile?
    Do tell.
    By the way, stop by and see my blog some time: Literary Birthdays (litbirthdays on wordpress) or twitter litbirthdays for the birthday of the day.
    Later —
    E.S. Dempsey