Understanding “Natural Language” – IBM's Watson and the Future of Search

By Greg Lambert on January 11, 2011

Anyone who has ever tried a “Natural Language” search… whether using something as generic as Google, or searching a more focused database like Westlaw or Lexis… knows that it is a hit-or-miss search strategy. The nuances of the English language make a sentence like “Last night I shot an elephant in my pajamas” nearly unintelligible for computers. (How did that elephant get in your pajamas, anyway??) Legal research providers have long dreamed of an algorithm that can take a normal sentence and interpret it the way a human does. A person reads that sentence fairly easily… understanding that the speaker was wearing the pajamas, not the elephant… by drawing on experience, knowledge and intuition to work out exactly what the sentence means, and can then give an appropriate response. It is this ability – this insight – that humans have, and that computers have so far been unable to replicate.

Enter IBM’s “Next Grand Challenge,” in which IBM scientists attempt to create a computer system that can not only handle natural language, but also understand the nuances found in the clues of the game show Jeopardy!

The IBM Jeopardy! Challenge poses a specific question with very real business implications: Can a system be designed that applies advanced data management and analytics to natural language in order to uncover a single, reliable insight – in a fraction of a second?

The IBMers are calling the project “Watson,” named after the company’s founder, Thomas J. Watson (not Sherlock Holmes’ sidekick, as I initially assumed). Sam Palmisano, IBM Chairman and CEO, says that, like its big brothers Deep Blue (the chess-playing computer) and Blue Gene (the supercomputer built for protein-folding research), Watson is attempting to do something that many people believe is impossible for technology to accomplish, namely “the ability of a computer to do something that’s far more challenging than chess: to understand natural human speech about a limitless range of topics, and to make informed judgments about them.”

Here are some videos that explain the Jeopardy! challenge, along with the glitches and accomplishments Watson has shown so far. If you are a legal researcher, watch these videos from that angle: think about the possibilities that come from applying the techniques IBM is using to answer the broad scope of questions on the game show, and start wondering how they could apply to the narrower set of legal topics and questions we face on a day-to-day basis.

The Next Grand Challenge

Pay attention to the part on “Open Question Answering,” around the 2:00 mark, where Dr. Katharine Frase discusses the differences between “searching” and “keywords,” and the challenge of understanding and interacting in “the way normal humans communicate.”

What is Watson? Why Jeopardy?

Because playing Jeopardy! requires truly understanding the complexity of the English language, not just retrieving pieces of information, the game presents a very good challenge for Watson: it must not only extract knowledge but also interpret it. Watson has to understand the nuances of the “answer” presented by the Jeopardy! host, respond quickly and accurately, and also recognize when not to answer because it is likely to be wrong (risk factors). That’s a very complex idea, and one that produced some funny answers at first. Over time, though, Watson started getting the “questions” right… 15% of the time, then 50%, then 60% (average player level), 70% (average champion level), 80% (3x champion level), and finally 90% (Grand Champion level). What was a little scary was the speed of the increase… in less than a year, it went from 15% accuracy to over 80%.
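To make the “know when not to answer” idea a bit more concrete, here is a minimal Python sketch of a risk-aware buzz decision. It is my own illustration under simple assumptions, not IBM’s method: it assumes the system already has a confidence score for each candidate response, and it uses standard Jeopardy! scoring, where a correct response adds the clue’s dollar value and an incorrect one subtracts it. The function names, the `threshold` parameter and the confidence numbers are all hypothetical.

```python
from typing import Dict, Optional


def expected_score_change(confidence: float, clue_value: int) -> float:
    """Expected change in score from buzzing in and answering.

    Assumes standard Jeopardy! scoring: a correct response adds clue_value,
    an incorrect one subtracts it. Staying silent changes nothing.
    """
    return confidence * clue_value - (1.0 - confidence) * clue_value


def choose_response(
    candidates: Dict[str, float], clue_value: int, threshold: float = 0.5
) -> Optional[str]:
    """Return the best candidate response, or None if it is too risky to answer.

    candidates maps each (hypothetical) candidate response to the system's
    confidence, from 0.0 to 1.0, that it is correct. threshold is a
    hypothetical tuning knob: the minimum confidence required to buzz in.
    """
    if not candidates:
        return None
    # Pick the candidate with the highest confidence score.
    best, confidence = max(candidates.items(), key=lambda item: item[1])
    # Buzz only if the expected gain is positive AND confidence clears the bar.
    if expected_score_change(confidence, clue_value) > 0 and confidence >= threshold:
        return best
    return None


if __name__ == "__main__":
    # Hypothetical confidence scores for two clues worth $400 each.
    print(choose_response({"Thomas J. Watson": 0.82, "Sherlock Holmes": 0.10}, 400))
    print(choose_response({"Deep Blue": 0.35, "Blue Gene": 0.30}, 400))
```

Run as-is, the first call answers “Thomas J. Watson” and the second prints `None` (stay silent). Under this simple scoring model the expected gain is positive only when confidence is above 50%, so 0.5 is the break-even default; a more cautious player could raise the threshold.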

In 2011, IBM’s Watson is scheduled to compete on an actual episode of Jeopardy! It will be interesting to see how the advances in “Open Question Answering” work not only for answering game show hosts… but also how this kind of progress in natural language processing for computer databases can improve the way we conduct what we call “search” today.

One interesting issue I noticed in another video, which goes into more depth on what Watson can do: one of the first questions Watson was asked was answered incorrectly (according to a commenter, and to the answer I got from WolframAlpha). Watson answered that “ln((12546798*pi)^2)/34567.46” was 0.00885, while other sources give 0.001011917. Will one of you with a degree in Mathematics (or at least a good calculator) double-check that, please? If Watson did answer this incorrectly, IBM may want to look at Watson’s math algorithms one more time before facing the Jeopardy! Challenge.

[Note: it seems Watson wasn’t wrong after all… see the comment below that explains the issue with the placement of the parentheses.]

[Screenshots: Watson’s question, Watson’s answer, and WolframAlpha’s answer]
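For anyone who wants to check the arithmetic, here is a short Python sketch (my own illustration, not anything from IBM or WolframAlpha) that evaluates the expression under two different placements of the parentheses. I am assuming “ln” means the natural logarithm and “pi” means π; exactly how Watson parsed the question is not something I know, so the second reading is just one plausible explanation for its answer.

```python
import math

# Shared pieces of the expression ln((12546798*pi)^2)/34567.46
X = 12546798 * math.pi
DIVISOR = 34567.46

# Reading 1: take the natural log of the squared product, then divide.
log_of_square = math.log(X ** 2) / DIVISOR

# Reading 2: take the natural log first, square it, then divide.
square_of_log = math.log(X) ** 2 / DIVISOR

print(f"ln((12546798*pi)^2)/34567.46 = {log_of_square:.9f}")
print(f"(ln(12546798*pi))^2/34567.46 = {square_of_log:.5f}")
```

The first reading comes out to about 0.001011917, the same as WolframAlpha, while the second comes out to about 0.00885, the same as Watson. The difference is simply whether the square is applied before or after taking the logarithm.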