Knowledge Vault - World's Largest Database of Facts and Entities Being Built by Google


Google is building the world’s largest database of facts and entities, and it is doing so without human editing or intervention. This database is called the “Knowledge Vault”. It expands and enhances itself automatically by pulling in information from all over the web, and it is meant to give Google the information it needs to process queries based on entities and the relationships between them. This moves Google a step closer to becoming an answer engine. Conversational queries like:

“When was Shakespeare born?”
“How far is London from New York?”
“Who is the author of Harry Potter?”

could be answered directly from the information stored in the Knowledge Vault.
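To make “entities and their relationships” concrete: a fact can be stored as a (subject, predicate, object) triple, and a question such as “Who is the author of Harry Potter?” becomes a simple lookup once the query has been mapped to an entity and a relation. The sketch below assumes that mapping has already been done; the predicate names (`birth_year`, `author`) and the helper functions are hypothetical and do not reflect Knowledge Vault’s actual schema.

```python
from collections import defaultdict

# Toy facts for illustration only; a real knowledge base holds billions of triples.
FACTS = [
    ("William Shakespeare", "birth_year", "1564"),
    ("Harry Potter", "author", "J. K. Rowling"),
]


def build_index(facts):
    """Index triples by (subject, predicate) so answering is a single dict access."""
    index = defaultdict(list)
    for subject, predicate, obj in facts:
        index[(subject, predicate)].append(obj)
    return index


def answer(index, subject, predicate):
    """Return the stored objects for an entity and relation, or an empty list."""
    return index.get((subject, predicate), [])


index = build_index(FACTS)
print(answer(index, "William Shakespeare", "birth_year"))  # ['1564']
print(answer(index, "Harry Potter", "author"))  # ['J. K. Rowling']
```

The hard part in practice is not the lookup but filling the index with facts that are actually true, which is what the components described below are for.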

The database Google currently uses to answer such queries is the Knowledge Graph, which is built from human-edited sources such as Wikipedia and Freebase.

To date, Knowledge Vault has pulled in 1.6 billion facts, extracted from text, tabular data, page structure, and human annotations on web pages. It is composed of three major components:

Extractors – These extract (subject, predicate, object) triples from the web and assign a confidence score to each one.

Priors – These learn the prior probability of each possible triple from the facts already stored in an existing knowledge base.

Knowledge Fusion – This computes the probability that a triple is true, based on how well the extractors and the priors agree.
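The three components can be sketched as a toy pipeline: several extractors report confidences for the same triple, a prior says how plausible the triple is given the existing knowledge base, and a fusion step combines the two. The KDD paper trains a classifier for fusion; the rule used here (a noisy-OR over extractor confidences, whose log-odds are then added to the log-odds of the prior) is a simplifying assumption of ours, and all confidence and prior values are made up.

```python
import math


def noisy_or(confidences):
    """Chance that at least one extractor is right, assuming their errors are independent."""
    p_all_wrong = 1.0
    for c in confidences:
        p_all_wrong *= 1.0 - c
    return 1.0 - p_all_wrong


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def fuse(extractor_confidences, prior, eps=1e-6):
    """Hypothetical fusion rule: add the log-odds of the extraction evidence and the prior."""
    # Clamp away from 0 and 1 so logit() stays finite.
    p_extract = min(max(noisy_or(extractor_confidences), eps), 1.0 - eps)
    p_prior = min(max(prior, eps), 1.0 - eps)
    z = logit(p_extract) + logit(p_prior)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: each triple maps to confidences from different extractors
# (for example one over page text and one over HTML tables) and a prior from the existing KB.
candidates = {
    ("Harry Potter", "author", "J. K. Rowling"): ([0.7, 0.6], 0.8),
    ("Harry Potter", "author", "Stephen King"): ([0.9], 0.1),
}

for triple, (confidences, prior) in candidates.items():
    print(triple, round(fuse(confidences, prior), 3))
```

The first triple, backed by two extractors and a strong prior, scores about 0.97. The second is extracted confidently but only once and contradicts what the knowledge base already suggests, so it ends up at 0.5, which shows how priors can hold back a noisy extraction.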

The priors include a model based on the Path Ranking Algorithm (PRA), and the training labels are generated using the Local Closed World Assumption (LCWA).
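The LCWA labelling rule is simple enough to sketch. As described in the KDD 2014 paper listed in the sources, if the existing knowledge base already holds at least one object for a given subject and predicate, it is assumed to hold all of them: a candidate triple already in the knowledge base is labelled true, one with a different object is labelled false, and one whose subject-predicate pair the knowledge base knows nothing about is left unlabelled. The function name, label names, and toy facts below are our own illustrations.

```python
from enum import Enum


class Label(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"


def lcwa_label(triple, kb):
    """Label a candidate triple against an existing KB under the Local Closed World Assumption."""
    subject, predicate, _ = triple
    if triple in kb:
        return Label.POSITIVE
    # If the KB knows at least one object for (subject, predicate), assume it knows all of them,
    # so any other object for that pair is treated as false.
    if any(s == subject and p == predicate for s, p, _ in kb):
        return Label.NEGATIVE
    # Otherwise the KB is silent about this pair; leave the triple out of training.
    return Label.UNKNOWN


# Toy knowledge base standing in for Freebase.
kb = {
    ("Harry Potter", "author", "J. K. Rowling"),
    ("William Shakespeare", "birth_year", "1564"),
}

print(lcwa_label(("Harry Potter", "author", "J. K. Rowling"), kb))  # Label.POSITIVE
print(lcwa_label(("Harry Potter", "author", "Stephen King"), kb))  # Label.NEGATIVE
print(lcwa_label(("William Shakespeare", "spouse", "Anne Hathaway"), kb))  # Label.UNKNOWN
```

The UNKNOWN case is what makes the assumption “local”: treating every missing fact as false would wrongly penalise true facts that the knowledge base simply has not recorded yet.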

Sources and citations:

http://searchenginewatch.com/article/2362128/Move-Over-Google-Knowledge-Graph-Here-Comes-Knowledge-Vault
http://www.cs.cmu.edu/~nlao/publication/2014.kdd.pdf
http://www.newscientist.com/article/mg22329832.700-googles-factchecking-bots-build-vast-knowledge-bank.html
