Taxonomic Classification and the Golden Set
Taxonomic classification is a technique that uses a hierarchical, tree-like structure to classify each word that falls under one of the categories contained in the taxonomy. The taxonomy itself is built from a large set of documents: human readers identify specific words within that known set of documents and associate those words with separate labels. The labeled words then form a separate set known as the “golden set” or “training set”. A new classifier model is developed in conjunction with the original taxonomy.
In layman's terms, the model identifies particular labels and arranges them into a hierarchical structure of “top-level” and progressively “lower-level” categories, which can then be used to work out the correct meaning of a word. This is a continuous process: the model keeps growing as new words are constantly added as labels.
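The idea of human-labeled words hanging off a tree of top-level and lower-level categories can be sketched in a few lines. This is a minimal illustration, not the patent's classifier: the names `Category`, `build_taxonomy` and `classify`, the sample golden set, and the exact-match word lookup are all hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a hypothetical category tree (top-level categories sit under the root)."""
    name: str
    children: dict[str, Category] = field(default_factory=dict)
    words: set[str] = field(default_factory=set)  # words human readers labeled with this category

    def add_path(self, path: list[str]) -> Category:
        """Walk the top-to-bottom category path, creating any missing nodes."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Category(name))
        return node


def build_taxonomy(golden_set: list[tuple[str, list[str]]]) -> Category:
    """Attach each labeled word from the (hypothetical) golden set to the deepest category of its path."""
    root = Category("ROOT")
    for word, path in golden_set:
        root.add_path(path).words.add(word.lower())
    return root


def classify(root: Category, word: str) -> list[str] | None:
    """Depth-first search for the category path whose node lists `word` (exact match assumed)."""
    word = word.lower()
    stack = [(child, [child.name]) for child in root.children.values()]
    while stack:
        node, path = stack.pop()
        if word in node.words:
            return path
        stack.extend((c, path + [c.name]) for c in node.children.values())
    return None


golden = [
    ("Mustang", ["Automobiles", "Domestic Cars", "Ford"]),
    ("Civic", ["Automobiles", "Foreign Cars", "Honda"]),
]
taxonomy = build_taxonomy(golden)
print(classify(taxonomy, "civic"))  # ['Automobiles', 'Foreign Cars', 'Honda']

# The tree keeps growing: a newly labeled word extends it in place.
taxonomy.add_path(["Automobiles", "Foreign Cars", "Toyota"]).words.add("corolla")
```

Because `add_path` reuses existing nodes, adding new labels never rebuilds the tree, which mirrors the "ever-growing" behavior described above.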
Example
Here is a short example of how this model might work:
Automobiles
|
Foreign or Domestic Cars
|
Make of the Car
|
Model of the Car
|
Color of the Car
|
Car Price
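Each level in the chain above narrows the one before it, so a single car can be described by one path from "Automobiles" down to "Car Price". A minimal sketch, assuming a hypothetical `CarListing` record with one field per level and prices grouped into assumed $10,000 bands (the band width is not from the patent):

```python
from dataclasses import dataclass

# Levels of the example hierarchy, from the top level down to the lowest level.
LEVELS = ["Automobiles", "Foreign or Domestic Cars", "Make of the Car",
          "Model of the Car", "Color of the Car", "Car Price"]


@dataclass
class CarListing:
    """Hypothetical record describing one car."""
    origin: str  # "Foreign" or "Domestic"
    make: str
    model: str
    color: str
    price_usd: int


def price_band(price_usd: int, width: int = 10_000) -> str:
    """Bucket a price into a band such as '$20,000-$29,999' (assumed band width)."""
    low = (price_usd // width) * width
    return f"${low:,}-${low + width - 1:,}"


def label_path(car: CarListing) -> list[str]:
    """Return the category path for one car, top level first."""
    return ["Automobiles", f"{car.origin} Cars", car.make, car.model,
            car.color, price_band(car.price_usd)]


car = CarListing("Domestic", "Ford", "Mustang", "Red", 27_500)
for level, value in zip(LEVELS, label_path(car)):
    print(f"{level}: {value}")
```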
Now, the interesting part of this classification is that the data in these documents will be obtained from websites. A branded website with a proper categorization of cars and models might be used to produce a golden set, or data from several websites might be combined for this task.
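One way to picture building a golden set from website categorization is to read each page's breadcrumb as a label path. The sketch below is hypothetical throughout: the crawl data, the `A > B > C` breadcrumb format, and the use of majority voting to merge several sites are assumptions for illustration, not the patent's stated procedure. A single well-categorized site is simply the case where every item has one vote.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical crawl output: (site, item name, breadcrumb shown on that site's page).
crawled = [
    ("site-a.example", "Mustang", "Automobiles > Domestic Cars > Ford"),
    ("site-b.example", "Mustang", "Automobiles > Domestic Cars > Ford"),
    ("site-c.example", "Mustang", "Automobiles > Sports Cars > Ford"),
    ("site-a.example", "Civic", "Automobiles > Foreign Cars > Honda"),
]


def parse_breadcrumb(crumb: str) -> tuple[str, ...]:
    """Split an 'A > B > C' breadcrumb into a top-to-bottom label path."""
    return tuple(part.strip() for part in crumb.split(">") if part.strip())


def build_golden_set(rows, min_votes: int = 1) -> dict[str, tuple[str, ...]]:
    """Merge per-site categorizations, keeping each item's most common path.

    Majority voting is an assumed merge rule; items whose winning path has
    fewer than `min_votes` supporting sites are left out of the golden set.
    """
    votes: dict[str, Counter] = defaultdict(Counter)
    for _site, item, crumb in rows:
        votes[item][parse_breadcrumb(crumb)] += 1
    golden = {}
    for item, counter in votes.items():
        path, count = counter.most_common(1)[0]
        if count >= min_votes:
            golden[item] = path
    return golden


print(build_golden_set(crawled))
```

Raising `min_votes` trades coverage for agreement: with `min_votes=2`, "Civic" is dropped because only one site categorizes it.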
Summary of the Patent
Full patent information:
Training Set Construction for Taxonomic Classification
Inventors: | Juang; Philo (Los Angeles, CA), Testa; Christopher (Venice, CA), Mote; Nicolaus (Los Angeles, CA)
---|---
Assignee: | Google Inc. (Mountain View, CA)
Family ID: | 45572099
Appl. No.: | 13/350,213
Filed: | January 13, 2012