Answers for Searching

Exploring Relevancy 2: Live Free, Search Hard

Nov 8, 2019 11:05:35 AM / by Curtis Wadsworth, Founder, CEO

Last week we discussed relevance and the advantages that NLP-based search engines have over their keyword-searching counterparts. Because NLP understands the elements of a search query in context, NLP-based engines like Dorothy have a clear advantage over keyword-based search engines. We used relatively simple examples to illustrate this point. But there’s more.

The terminology used in patent applications can have different meanings in different contexts and when describing different technologies. Serious searchers work around this problem by limiting their searches to specific classifications. However, choosing the correct classification from the roughly 250,000 CPC categories is tricky, to say the least.

Searchin' ain't easy!

Most searchers add elements to, or otherwise limit, their search queries to eliminate spurious results, then run and analyze another search. This is the same approach people take with consumer search products like Google. For example, your search for “power drill” yields too many results, so you search “cordless power drill,” “DeWalt power drill,” “drill press,” or “hammer drill.” In each case, fewer, more targeted results are returned, but at what cost? By limiting your search to “DeWalt power drill,” you might miss a sale on a more highly rated, more powerful Milwaukee power drill.

This is exactly what happens when you add elements to a patent search query, or limit your search query to, for example, title, abstract, and claims: your search returns fewer results, but potentially important references that discuss key concepts only in the detailed description are left out. To avoid missing important references, a user may perform numerous searches with various combinations of key terms. Patent examiners use this tactic. I recently determined that four U.S. patent examiners examining related medical device patent applications performed, on average, 75 individual searches for each application. One examiner I talked to told me this average is low. Even so, the time required to perform 75 individual searches is extremely high.

NLP-based search engines overcome these deficiencies in two ways. First, by inferring classification information from the query, the engine limits the returned results to particular classes, groups, and subgroups of inventions. Second, by synonymizing the words in the search query, the engine eliminates the need to perform multiple searches with different words or to include long strings of synonyms. In effect, it performs multiple searches with different combinations of words simultaneously.
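To make the second point concrete, here is a minimal sketch of query synonymizing in Python. It is only an illustration of the general idea, not how Dorothy or any other engine actually works: the `SYNONYMS` table and the `expand_query` function are hypothetical names invented for this example, and a real NLP engine would derive synonyms from context rather than from a hand-written dictionary.

```python
from itertools import product

# Hypothetical, hand-written synonym table (an assumption for illustration).
# A real NLP engine would infer synonyms from the context of the query.
SYNONYMS = {
    "cordless": ["cordless", "battery-powered"],
    "drill": ["drill", "borer"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return every variant of the query formed by swapping in synonyms."""
    # For each word, collect its alternatives; unknown words stand alone.
    options = [synonyms.get(word, [word]) for word in query.lower().split()]
    # The Cartesian product yields one variant per combination of choices.
    return [" ".join(combo) for combo in product(*options)]


variants = expand_query("cordless power drill", SYNONYMS)
for variant in variants:
    print(variant)
```

With two alternatives each for “cordless” and “drill,” a single query stands in for four separate keyword searches. The number of combinations multiplies with every added term, which is why doing this by hand quickly turns into dozens of individual searches.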

Duplicates are a BIG problem in the patent world. With limited exceptions, every U.S. patent issued from an application filed on or after November 29, 2000 was first published 18 months after its earliest filing date, meaning that the patent database contains both a pre-issuance publication and the published issued patent, i.e., a duplicate of every issued patent. Add to this continuation applications, divisional applications, and foreign equivalents and their duplicates, and you get an idea of the expansiveness of the duplicate problem.

On the surface, the duplicate problem is annoying. You’re halfway through reviewing an application before you realize it’s a duplicate, or you have to page through results to find a non-duplicate result.

More importantly, the presence of duplicates pushes potentially relevant results further down on the list of returned results, making them more difficult to find. Coupled with the frustration of paging through page after page of results, the odds of finding these relevant but buried results become increasingly small, particularly if you are practicing at a law firm and your client won’t pay for time spent searching.

Deduplicating patent search results is an issue that all platforms are attempting to resolve, with limited success. In some cases, duplicates within a patent family can be removed, but foreign equivalents are very often not included in the family or are missed. If the application was filed in multiple countries, multiple duplicates still exist in the deduplicated results. In other cases, priority data is not entered properly, causing duplicates to persist in the returned results.
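For a sense of why the conventional approach falls short, here is a rough sketch of family-based deduplication in Python. It is not DorothyAI’s method, and it does not reflect any particular platform’s data model: the `Result` record, its `family_id` and `priority` fields, and the publication labels are all hypothetical, made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    publication: str           # made-up label, not a real publication number
    family_id: Optional[str]   # hypothetical family identifier, may be missing
    priority: Optional[str]    # hypothetical priority number, may be missing


def dedupe(results: list[Result]) -> list[Result]:
    """Keep the first (highest-ranked) result per family or priority."""
    seen_families: set[str] = set()
    seen_priorities: set[str] = set()
    kept = []
    for r in results:
        is_duplicate = (
            (r.family_id is not None and r.family_id in seen_families)
            or (r.priority is not None and r.priority in seen_priorities)
        )
        # Remember this record's identifiers even if it is a duplicate.
        if r.family_id is not None:
            seen_families.add(r.family_id)
        if r.priority is not None:
            seen_priorities.add(r.priority)
        if not is_duplicate:
            kept.append(r)
    return kept


results = [
    Result("US-PREGRANT", "F1", "P1"),
    Result("US-GRANT", "F1", "P1"),   # same family: removed
    Result("EP-EQUIV", None, "P1"),   # no family, same priority: removed
    Result("JP-EQUIV", None, None),   # no usable data: duplicate persists
    Result("UNRELATED", "F2", "P2"),
]
kept = dedupe(results)
print([r.publication for r in kept])
```

The hypothetical Japanese equivalent slips through because there is nothing to match it on. That is the failure described above: when family or priority data is missing or entered incorrectly, this kind of matching has no way to recognize the duplicate.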

At DorothyAI, we are solving this problem in what I believe to be a completely novel way. We won’t discuss it here. It will be live with our next update. I’ll let the results speak for themselves.


Tags: Patent Law, Natural Language Processing, Legal Tech, Machine Learning, Creative Solutions, Extreme Problem Solving