Google Knowledge Graph & Vault, and web-aided truth scoring

On August 25th, Google announced Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion. The program, not yet public, is an extension of Google Knowledge Graph (see also the Wikipedia entry). The idea is to maintain a huge database of structured relationships between topics, which makes your search results more meaningful. Knowledge Graph drives the carousel at the top right of your search results and draws mainly on open sources such as the CIA World Factbook, Freebase (which Google owns), and Wikipedia. Knowledge Vault builds on Knowledge Graph by automating the structuring and validation of entity relationships, unlocking a web of knowledge that could extend far beyond pre-structured data sources.

Here's how it works in a nutshell. First, data extraction algorithms crawl the web looking for triples. Think of a triple as a subject and an object linked together by a verb, called the predicate. The extractors find triples using a combination of automated procedures that identify named entities and label text by part of speech. There are four types of extractors in Knowledge Vault: extractors based on Natural Language Processing algorithms that tokenize text strings, extractors based on Document Object Model (DOM) trees, extractors that use Natural Language Processing to make sense of HTML tables, and extractors that draw on annotations of web pages by actual human beings. Second, Knowledge Vault uses preexisting, well-validated triples from Freebase to validate the triples it extracts from the web. Finally, Knowledge Vault computes the probability that a particular triple is true based on the agreement between the various extraction and validation algorithms.
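The pipeline above can be caricatured in a few lines of Python. To be clear, this is a toy sketch of my own, not Google's fusion algorithm: the `Triple` class, the score averaging, and the 50/50 blend with a Freebase prior are all illustrative assumptions.

```python
# Toy sketch of a triple plus naive probability fusion. The representation
# and the fusion rule here are illustrative, not Google's actual algorithm.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str
    extractor_scores: List[float] = field(default_factory=list)  # one score per extractor
    prior: Optional[float] = None  # from a validated source such as Freebase

def truth_probability(triple: Triple) -> float:
    """Naive fusion: average the extractor scores, then blend with the prior."""
    if not triple.extractor_scores:
        return triple.prior if triple.prior is not None else 0.5
    evidence = sum(triple.extractor_scores) / len(triple.extractor_scores)
    if triple.prior is None:
        return evidence
    return 0.5 * evidence + 0.5 * triple.prior

t = Triple("Barack Obama", "was born in", "Honolulu",
           extractor_scores=[0.9, 0.8], prior=0.99)
print(round(truth_probability(t), 2))  # 0.92
```

The real system fuses extractor agreement and prior knowledge with learned models rather than a fixed 50/50 blend, but the shape of the computation is the same: independent noisy signals in, one calibrated probability out.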

Awesome. How could this be useful to SoundCheks? To explore the possibilities, let's review how SoundCheks will work.

  1. Somebody finds a public statement they want to annotate.
  2. They decompose the public statement into a series of premises and a conclusion. The decomposition, by the way, could be as simple as a triple, which hints at how SoundCheks could draw on Knowledge Vault already.
  3. They score each of the premises on the probability that it is true.
  4. They identify any fallacies that the argument commits, and present their own argument for why the argument commits these fallacies.
  5. We take the product of the premise truth scores and a normalized measure of validity (computed from the number of fallacies committed per premise) as the soundness score of the statement.
  6. The scores of the statement contributed by multiple raters are combined, and probable political biases are filtered out.
  7. The scores of public figures are combined into summary measures of their soundness.
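Steps 3 through 5 can be sketched as a toy scoring function. The exact normalization of validity is an assumption on my part (here it simply decays with the number of fallacies per premise); SoundCheks' actual formula is still to be worked out.

```python
# Toy version of the soundness score from steps 3-5: the product of the
# premise truth scores, discounted by a normalized validity measure.
# The validity formula is an illustrative assumption, not a settled design.
from math import prod

def soundness(premise_probs, n_fallacies):
    """Soundness = (product of premise truth scores) * (normalized validity)."""
    truth = prod(premise_probs)  # step 5: product of the step-3 scores
    # Validity in (0, 1], shrinking as fallacies per premise (step 4) grow.
    validity = 1.0 / (1.0 + n_fallacies / len(premise_probs))
    return truth * validity

# An argument with two fairly solid premises and one fallacy:
print(round(soundness([0.9, 0.8], 1), 2))  # 0.48
```

One consequence of using a product is that a single near-certainly-false premise drags the whole statement's soundness toward zero, no matter how solid the rest of the argument is, which seems like the right behavior for a soundness measure.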

One of the most time-consuming activities of SoundCheking will likely be determining the veracity of a premise. SoundCheks could speed up the process by plugging into Knowledge Graph (or, one day, Knowledge Vault) so that premises that form triples could easily be checked for their veracity, or at least given a good initial guess. In the case of Knowledge Vault, we could actually have an initial estimate of the probability that a premise is true. Alternatively, the annotator could supply their own probability score, along with an argument in support of their ruling. Another way Knowledge Vault could be useful is if we could tap into the path-ranking algorithm that drives its truth scoring. See, the Vault's truth scores are implicitly based in part on logical linkages between predicates such as "is parent of", "is married to", and "had sex with".
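Since Knowledge Vault isn't publicly queryable yet, here's a sketch of the lookup idea with a plain dictionary standing in for the knowledge base; a real version would call the Freebase API (or, eventually, a Vault endpoint) instead.

```python
# Sketch of premise pre-scoring: a premise that parses into a triple gets an
# initial truth probability from the knowledge base; anything else falls back
# to the annotator. The dict stands in for a real knowledge-base API.
KNOWLEDGE_BASE = {
    ("Barack Obama", "was born in", "Honolulu"): 0.99,
}

def initial_truth_estimate(subject, predicate, obj, default=None):
    """Return the knowledge base's probability for a triple, if it has one."""
    return KNOWLEDGE_BASE.get((subject, predicate, obj), default)

p = initial_truth_estimate("Barack Obama", "was born in", "Honolulu")
print(p)  # 0.99 -- a head start the annotator can still override
```

Even when the lookup misses (returning the default), the annotator is no worse off than today; when it hits, they start from a calibrated estimate instead of a blank slate.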

So that's how Knowledge Vault could be useful to SoundCheks. How could SoundCheks be useful to Google? Knowledge Vault scores the truthfulness of individual triples. The path-ranking algorithm is potentially useful for identifying formal fallacies. But it doesn't have any capabilities to assess informal fallacies of arguments constructed from the interaction of multiple triples. SoundCheks could serve as a rich source of training data for a future algorithm from Google that automates the scoring of informal arguments. Rather than putting SoundCheks out of business, this algorithm could speed up the SoundCheks annotation process further, increasing the rate at which public statements are checked. Provided that the soundness-scoring algorithm was good enough, a positive feedback loop might ensue wherein the rate of SoundCheking increases along with the set of informal arguments that the soundness-scoring algorithm could recognize.

So I'm very excited about Knowledge Vault. If it doesn't have a public API by the time SoundCheks launches, we'll at the very least plug into Google's Freebase API, along with similar resources, to facilitate SoundCheks' search for the truth.
