Answers for Searching

Exploring Relevancy

Nov 3, 2019 9:07:39 AM / by Curtis Wadsworth, Founder, CEO

The single most important determinant of whether a search was successful is whether the documents returned by the search engine are relevant to the user. As you might imagine, relevancy is a daily discussion at DorothyAI, since we want to have happy customers who view our returned results as being highly relevant.

You might be surprised to know that there are two definitions of “relevant” in software development:

  1. The returned document actually answers the question or solves the problem posed by the search query.
  2. The user can easily understand why the search engine retrieved the returned document.

You are probably thinking that, since DorothyAI is a patent search company, we are primarily interested in definition #1. You are correct. However, think back to the last time you looked at a result returned by your favorite search engine and asked yourself, “Where the heck did that come from?” That is why we have to focus on definition #2 as well. Patent lawyers and agents are very smart people; you should be able to understand how Dorothy works so that you can make appropriate changes if you’re not seeing the results you expect.

Most patent search engines allow the user to sort results by relevance or by filing date. Let’s take a moment to appreciate this. If sorting by relevance, which is typically the default setting, doesn’t produce the results you’re looking for, your next best option is sorting by filing date. My guess is that sorting by filing date is not going to surface the most important references for a given query. But people still use the USPTO search site, which returns results sorted by publication number, so I could be wrong.


Understanding the elements of a search query in context may be the most obvious advantage of natural language processing (NLP) engines, like Dorothy, over keyword-based search engines. NLP engines infer context by scoring the search query as a whole and identifying sentences, paragraphs, and sections of patent publications that were similarly scored during indexing. References containing sentences and paragraphs with similar scores describe similar concepts and are returned at the top of the results list.
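The post doesn’t say how Dorothy represents text, so what follows is only a toy sketch of the idea in the paragraph above. It assumes plain word-count vectors compared with cosine similarity, standing in for whatever learned representation a real NLP engine uses. Every passage is scored once at indexing time; at query time the whole query is scored the same way, and documents are ranked by how closely their best-matching passages resemble it. The functions `vectorize`, `build_index`, and `rank`, and the two sample documents, are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Toy stand-in for a learned text representation: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents: dict[str, list[str]]) -> dict[str, list[Counter]]:
    """Indexing time: score (vectorize) every passage of every document once."""
    return {doc_id: [vectorize(p) for p in passages]
            for doc_id, passages in documents.items()}


def rank(query: str, index: dict[str, list[Counter]], top_k: int = 2) -> list[tuple[str, float]]:
    """Query time: score the whole query, then rank documents by the
    average similarity of their top_k best-matching passages."""
    q = vectorize(query)
    results = []
    for doc_id, passage_vectors in index.items():
        sims = sorted((cosine(q, pv) for pv in passage_vectors), reverse=True)
        best = sims[:top_k]
        results.append((doc_id, sum(best) / len(best) if best else 0.0))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical mini-corpus: one reference about cutting food, one about making knives.
docs = {
    "cutter": ["A knife blade cuts food on a kitchen board.",
               "A laser beam cuts the food instead of a blade."],
    "welder": ["Laser welding joins the handle of a kitchen knife.",
               "The weld is polished."],
}
ranking = rank("kitchen knife that cuts food using a laser", build_index(docs))
print(ranking)  # "cutter" is ranked first
```

Averaging a document’s few best passage scores, rather than counting a single hit, is one simple way to favor references where several passages describe the query’s concept; it is a choice made for this sketch, not something the post specifies.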

There is no way to give context to the words in a keyword query. At best, you can specify the location of one term relative to another using a proximity operator such as “near,” “w/3,” or “near.3,” depending on the search engine you are using. Proximity operators are a mechanical constraint (the search terms are either within 3 words of each other or they are not); they do not supply the meaning that context provides. Many of the “Where the heck did that come from?” moments experienced by searchers result from terms being matched in a reference without context.
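To see why a proximity operator is purely mechanical, here is a hypothetical sketch of a “w/k”-style check. It only compares word positions, so it matches a sentence about engraving a knife handle while missing one that actually describes a laser slicing food. Search engines differ on exactly how the distance is counted (intervening words versus position difference); this sketch assumes position difference, and the function names and sample sentences are illustrative only.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())


def within(text: str, term_a: str, term_b: str, k: int) -> bool:
    """Hypothetical 'w/k' operator: True if term_a and term_b occur
    no more than k positions apart, in either order. Meaning is never consulted."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(i - j) <= k for i in positions_a for j in positions_b)


engraving = "A laser engraves the knife handle."
cutting = ("The knife in this kitchen appliance has no blade; "
           "instead, a focused laser slices the food.")

print(within(engraving, "laser", "knife", 3))  # True: off-topic, but the terms are close
print(within(cutting, "laser", "knife", 3))    # False: on-topic, but the terms are far apart
```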

Let’s say you are searching for a kitchen knife that cuts food using a laser. The query [laser “kitchen knife”] in Google Patents returns U.S. patent publications primarily describing methods for making kitchen knives using laser welding, laser engraving, laser annealing, etc., as well as robotic cookers that use lasers to scan the surface of food. Not exactly what we were looking for.

A modified query [laser NEAR “kitchen knife”] returns U.S. patent publications describing laser assemblies used in machining operations and surgical tools. The term “kitchen knife” does not appear in any of the first 5 results. These results seem further from our target.

Modifying our query to [laser NEAR knife kitchen] returns U.S. patent publications describing cooking apparatuses that use lasers to cook, not cut, food. The results are somewhat better, but the search seems directed more to cooking than to cutting.

The query [kitchen laser NEAR knife] returns similar cooking apparatuses. Here’s the description Google Patents provides for the most relevant result:

800 , the robotic hands 72 execute the mini-manipulation 770 of cracking an egg with a knife, where the optimal way to execute each movement in the cracking an egg operation 772 , the holding a knife operation 774 , the striking the egg with a knife operation 776 , and opening the cracked egg operation

Nope!

Reviewing the results reveals that the terms “laser cooking” and “cooking” are now highlighted. In attempting to put the query in context, Google Patents took the presence of “kitchen” in the query to mean that the search relates to “cooking,” and it returned results combining the concepts of lasers and cooking as highly relevant. Notably, none of the previous searches treated the phrase “kitchen knife” as a cutting utensil for food, so “knife” and “kitchen” were taken completely out of the context of cutting food.

With this in mind, we modify the search to [food cutting laser NEAR knife]. Finally, we get results that look promising. Notably, however, the most relevant result according to Google Patents is a publication entitled “Method and system for more accurately determining nutritional values and reducing waste of food items,” with the description:

Lend the concession stand personnel a warmer for the evening so the concessions can keep their pizzas warm throughout the game. Offer to include or recommend that the fundraiser purchase a roller blade cutter to cut the pizzas. Even though the equal slice cutting tool will cut equal slices at the…

At least we’re seeing synonyms for “knife” (“blade”) in the results. The presence of “pizza” in the search results could reflect user preferences, but I can’t say for sure.

Indexing also allows NLP-based engines to learn that the words “kitchen” and “knife” are often used together to describe a cutting utensil. Indeed, 100% of the results returned by Dorothy for the query “kitchen knife that cuts food using a laser” are directed to cutting utensils and apparatuses. The top references describe inventions that cut food broadly (#1), eggs (#2), rolls (#3), or tuna (#4), and all of them describe a laser cutting means somewhere in the specification. Notably absent from our results are the laser cooking apparatuses that made up the majority of the results returned by Google Patents.
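The post doesn’t describe how Dorothy learns that “kitchen” and “knife” belong together. One standard corpus statistic for spotting such word pairings (collocations) is pointwise mutual information (PMI), sketched below purely as a stand-in. It assumes a simple adjacent-word (bigram) model and a hypothetical four-sentence corpus; a positive score means the pair appears side by side more often than chance would predict.

```python
import math
import re
from collections import Counter


def bigram_pmi(corpus: list[str], w1: str, w2: str) -> float:
    """Pointwise mutual information of the adjacent pair (w1, w2):
    log2( P(w1 w2) / (P(w1) * P(w2)) ). Returns -inf if the pair never occurs."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for text in corpus:
        tokens = re.findall(r"[a-z]+", text.lower())
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    p_pair = bigrams[(w1, w2)] / sum(bigrams.values())
    p1 = unigrams[w1] / sum(unigrams.values())
    p2 = unigrams[w2] / sum(unigrams.values())
    return math.log2(p_pair / (p1 * p2))


# Hypothetical corpus standing in for indexed patent text.
corpus = [
    "A kitchen knife with a ceramic blade.",
    "The kitchen knife is stored in a block.",
    "A laser cooks food in the kitchen.",
    "The surgical knife is sterilized.",
]

print(bigram_pmi(corpus, "kitchen", "knife"))  # positive: the pair is a collocation
print(bigram_pmi(corpus, "laser", "knife"))    # -inf: never adjacent in this corpus
```

A real index would use far more text and usually a smoothed or windowed variant, but the principle is the same: pairs that co-occur more than chance suggest a single concept.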

Because NLP is much better at searching in context, the results returned by Dorothy are clearly more relevant (both because they solve the problem and because it’s obvious why they were returned) than the best results we obtained from Google Patents in this simple search. Adding a more detailed description in the Invention Disclosure field will produce more targeted results.

 

 

Tags: Patent Law, Natural Language Processing, AI, Legal Tech, Extreme Problem Solving, Startup