This week we are pushing a revolutionary update that will improve the results Dorothy returns across technologies. The update includes better relevancy scoring and reduced redundancy. By far the biggest improvement is the introduction of DiversiSEARCH™ technology to the platform.
Last week we discussed relevance and the advantages that NLP-based search engines have over their keyword-search counterparts. Because NLP understands the elements of a search query in context, NLP-based engines like Dorothy have a clear advantage over keyword-based search engines. We used relatively simple examples to illustrate this point. But there’s more.
The single most important determinant of whether a search was successful is whether the documents returned by the search engine are relevant to the user. As you might imagine, relevancy is a daily discussion at DorothyAI, since we want to have happy customers who view our returned results as being highly relevant.
Dorothy uses natural language processing to search the patent database. Many search platforms have semantic search capabilities, which seem to vary in their effectiveness. As with Dorothy, the query on these platforms is a plain-English description of the thing being searched for. You are probably asking yourself, “What’s the difference?”
Patents are REALLY important in the pharmaceutical industry. Taking a new drug to market costs $3-$5 billion and can take up to 16 years, thanks to the arduous FDA approval process. Even though many patents that cover new drugs have less than 5 years of term remaining after the drug is approved for sale, 80% of the overall revenue pharmaceutical companies make is tied directly to a patent claiming an approved drug. VCs and institutional investors understand this, and rarely invest in drugs that are not covered by at least one patent. Basically, if you are going to raise money for a biotech company, or you want to recoup the cost of bringing the drug to market, you are going to need a patent.
We invited 22 people to preview Dorothy, and I’ve had butterflies in my stomach for more than a week.
All things considered, the USPTO patent database is well curated. Millions of patent documents (issued patents and application publications) are available for search and download. The documents include the complete application (title, abstract, specification, and claims), along with various important dates (filing, publication, and issue dates) and lists of references submitted or cited during prosecution. Not bad.