Answers for Searching

Dorothy Saves Lives

Oct 4, 2019 12:47:38 PM / by Curtis Wadsworth, Founder, CEO

Patents are REALLY important in the pharmaceutical industry. Taking a new drug to market costs $3-$5 billion and can take up to 16 years thanks to the arduous FDA approval process. Even though many patents that cover new drugs have less than 5 years of term left after the drug is approved for sale, 80% of the overall revenue pharmaceutical companies make is tied directly to a patent claiming an approved drug. VCs and institutional investors understand this, and rarely invest in drugs that are not covered by at least one patent. Basically, if you are going to raise money for a biotech company and/or you want to recoup the cost of bringing the drug to market, you are going to need a patent.

As a Ph.D. biochemist, I have worked with many biotech companies to develop strategies that maximize the exclusivity period of a new drug candidate, making it more likely that the drug will produce enough revenue to offset the cost of approval. At the heart of this process is a good patent search.


Searching biotech patents is hard. Genes, proteins, and drug candidates are often referred to using nonsensical strings of letters and numbers. Adding to the confusion, many genes and proteins have more than one name or have closely related homologues. It’s impossible to know what is what without deep knowledge of the subject matter. On the small molecule side, most patents claim genuses that cover thousands of compounds, built from weirdly simple base structures with hundreds of optional “R” groups. It’s not impossible to figure out the breadth of compounds covered by these genuses, but it is extremely time-consuming and responsible for at least 3 changes in my eyeglass prescription over the last 10 years.

Having suffered through these searches for most of my career, I put making biotech searching easier high on the list of goals for DorothyAI. Achieving this goal is going to be a struggle. We knew that at the outset. The mere act of indexing biotech patents to give Dorothy deep knowledge of the subject matter is fraught with difficulties, many of which we have experienced ourselves. Without a well-indexed database, we can’t search anything efficiently. It’s a problem we have been working on since we started.

A breakthrough came this week. We (or rather, Jay) got creative with word embeddings, reindexed the database, and pushed an update that has dramatically improved biotech searching. In our test query, Dorothy previously placed only 1 or 2 of the 20 most important publications in the first 50 returned results.* After the update, Dorothy returns 15 of the top 20 publications in the first 50 returned results! The best of the platforms we compare our results against returned 12 of the top 20 publications. Like a King Size Snickers Bar, Dorothy delivers 25% more (relevant patent publications in the top 50 returned results)! Soooo satisfying.
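For anyone who wants to check the math, here is a minimal Python sketch of the metric described above: count how many known-important publications land in the first 50 results, then compare two counts. The function names and the publication IDs are made up for illustration; this is not Dorothy's actual test harness, just the arithmetic behind the 25% figure.

```python
def hits_in_top_k(ranked_ids, relevant_ids, k=50):
    """Count how many relevant publications appear in the first k results."""
    relevant = set(relevant_ids)
    return sum(1 for pub_id in ranked_ids[:k] if pub_id in relevant)


def relative_gain(ours, theirs):
    """Fractional improvement of one hit count over another."""
    return (ours - theirs) / theirs


# Toy ranking with hypothetical IDs, only to show how the count works.
ranked = [f"PUB-{i}" for i in range(100)]
important = {"PUB-3", "PUB-10", "PUB-75"}
toy_hits = hits_in_top_k(ranked, important, k=50)  # PUB-75 falls outside the top 50

# The counts reported in this post: 15 of the top 20 vs. 12 of the top 20.
gain = relative_gain(15, 12)
print(f"Dorothy returns {gain:.0%} more of the key publications in the top 50")
```

Counting hits in the first 50 results is a simple stand-in for what a searcher actually experiences: nobody reads result number 400.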

Here’s a graph showing how we compare to other services.

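[Chart: number of the top 20 publications returned in the first 50 results, Dorothy compared with other services]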

Keep in mind, this is only the beginning. I fully anticipate that our biotechnology search capabilities will get better. AND, we are going to find a way to search genuses. With this update, Dorothy is in the business of saving lives.

*Learn about our testing process here.

Tags: Patent Law, Natural Language Processing, AI, Legal Tech, Creative Solutions, Extreme Problem Solving