DrugBank Clinical Trial Data

At DrugBank, data quality is foundational, and our clinical trial data is no exception. While much of the publicly available trial data remain fragmented, inconsistently formatted, or difficult to work with, ours is cleaned, standardized, and deeply connected to our broader biomedical knowledgebase. This means you can easily move from a trial to related drugs, diseases, targets, and pathways, surfacing relationships that might otherwise remain hidden.

Whether your starting point is a drug, a protein, or a clinical trial itself, DrugBank gives you the ability to trace relevant relationships and uncover meaningful insights earlier and more reliably than with less connected datasets.

How We Source Our Clinical Trial Data

Our primary source for trial data is ClinicalTrials.gov. Once this information is brought into our data ecosystem, we carefully vet, clean, and link it to our knowledgebase as extensively as we can. This includes creating and verifying connections from trial data to drugs, diseases, and proteins.

What We Include and Exclude

At the time of writing, we have more than 218,000 clinical trials in our knowledgebase. Our trial coverage includes the full range of historical content available on ClinicalTrials.gov, which began in 1999 but includes trials dating as far back as 1965. This retrospective data, however, varies in completeness because reporting standards differed by year and sponsor, and because trials were not always reported accurately.

We regularly integrate forward-looking data by including recently registered trials as they become available. To be included in DrugBank, whether new or historical, a clinical trial must involve a drug intervention. This applies to trials across a variety of recruitment statuses, including ongoing trials, completed trials, and trials that haven’t started yet.

We exclude trials without drug interventions, as they typically fall outside the scope of drug discovery. As our mapping and filtering logic improves, we’ll continue refining our criteria to ensure we’re capturing the most relevant data for drug discovery.

How We Handle Data Maintenance

New trial data from ClinicalTrials.gov is added to DrugBank weekly. As new data comes in, we also update any related drug and disease associations. Any data that requires more focused human curation, such as our structured clinical trial termination reasons, will be manually curated within 1-4 weeks (typically within 2 weeks).

Clinical Trial Data: Mapped for Discovery

DrugBank’s data connectivity helps you avoid the data silos and dead ends that can stall your research. We have created a web of interconnected information that makes it easy to step from one protein, trial, or drug to the next indicated condition or intervention. By maintaining highly curated data and deep connections, we enable an exploration process that builds momentum.

Our connections are mapped multi-directionally, meaning you can move from drugs to trials, trials to diseases, diseases to drugs, and drugs to targets, as well as explore every other connection we’ve been able to define. These mappings let you not only search by particular keywords, but also browse and explore related concepts.

They also facilitate exploration of clinical trials within our Table Builder from a range of different starting points, including drugs, proteins, diseases or trials.

Table Builder: Connecting Drugs, Targets, and Trials

Table Builder is our interactive tool for creating custom data tables from the full extent of DrugBank data. It allows you to explore clinical trial data at both the micro and macro level. You can quickly move from specific drug details to a broader view of the clinical landscape.

Go From Molecule to Market Landscape

After starting a new clinical trial table, you could choose to focus on trials involving a particular drug, say Ivonescimab. As you look through the table, you might see a trial focused on pancreatic cancer. From there, you can build a new table filtered by condition (“pancreatic cancer”) and trial phase (“Phase 3”) to see what other therapies are nearing approval.

You could then add trial sponsors to the table to see which companies are active in this disease area. This would give you a clearer picture of the competitive space and ongoing research trends.

A Clinical Trial table filtered for Ivonescimab

A Clinical Trial table filtered for Ivonescimab and Phase 3 trials

A Clinical Trial table filtered for Ivonescimab, Pancreatic Cancer, and Phase 3 trials. You can then explore the details of the relevant trial directly in Table Builder.

A Clinical Trial table filtered for Ivonescimab and Phase 3 trials with Sponsor data added to explore the competitive landscape.
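The walkthrough above happens in the Table Builder interface, but the same drug-to-landscape pivot can be sketched in a few lines of Python over exported rows. This is a minimal illustration only: the field names, trial IDs, drug names other than Ivonescimab, and sponsors are all hypothetical placeholders, not DrugBank's actual schema or data.

```python
from collections import Counter

# Hypothetical trial rows; field names and values are placeholders,
# not DrugBank's real export format or real trial data.
trials = [
    {"id": "TRIAL-1", "drug": "Ivonescimab", "condition": "Pancreatic Cancer",
     "phase": "Phase 3", "sponsor": "Sponsor A"},
    {"id": "TRIAL-2", "drug": "Drug X", "condition": "Pancreatic Cancer",
     "phase": "Phase 3", "sponsor": "Sponsor B"},
    {"id": "TRIAL-3", "drug": "Drug Y", "condition": "Pancreatic Cancer",
     "phase": "Phase 2", "sponsor": "Sponsor B"},
    {"id": "TRIAL-4", "drug": "Drug Z", "condition": "Pancreatic Cancer",
     "phase": "Phase 3", "sponsor": "Sponsor B"},
    {"id": "TRIAL-5", "drug": "Ivonescimab", "condition": "Lung Cancer",
     "phase": "Phase 3", "sponsor": "Sponsor A"},
]


def filter_trials(records, **criteria):
    """Return rows whose fields match every criterion (case-insensitive)."""
    return [
        row for row in records
        if all(str(row.get(key, "")).lower() == str(value).lower()
               for key, value in criteria.items())
    ]


# Step 1: start from a single drug.
by_drug = filter_trials(trials, drug="Ivonescimab")

# Step 2: pivot to a condition and phase across all drugs.
landscape = filter_trials(trials, condition="pancreatic cancer", phase="Phase 3")

# Step 3: add sponsors to see who is active in the disease area.
sponsor_counts = Counter(row["sponsor"] for row in landscape)

print([row["id"] for row in by_drug])
print([row["id"] for row in landscape])
print(sponsor_counts.most_common())
```

Each step reuses the same filter with different criteria, which mirrors how one table in the walkthrough leads to the next.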

Data Normalization and Mapping

To support consistent, reliable navigation across DrugBank and ensure we are delivering valuable results, we maintain a combination of manual and automated mapping, along with rigorous data normalization processes.

A key example of this is investigational codes for experimental drugs and drugs in trials. These codes can vary depending on where a drug was developed, whether the original developer was acquired, or whether the drug was licensed to another company. Rather than leaving you the work of piecing many variations together, we normalize and map each identifier to a single drug data card. This way, no matter which term you use to search for a drug, you will find everything we have about it.

Take Ivonescimab as an example. You might know it as AK112, as SMT112, or simply as Ivonescimab. In DrugBank, any of those search terms will get you to the same unified drug entry. Outside of DrugBank, each of these terms can return a different number of results.
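The idea of resolving several identifiers to one entry can be sketched as a simple alias lookup. This is an illustration of the concept, not DrugBank's implementation: the alias table, the canonical key "ivonescimab" (a placeholder, not a real DrugBank ID), and the function names are assumptions made for the example.

```python
from typing import Optional

# Illustrative alias table. The canonical key "ivonescimab" is a placeholder,
# not a real DrugBank identifier.
ALIASES = {
    "ivonescimab": "ivonescimab",
    "ak112": "ivonescimab",
    "smt112": "ivonescimab",
}


def normalize(term: str) -> str:
    """Lowercase and drop non-alphanumeric characters, so that formatting
    variants such as 'AK-112' and 'ak112' compare equal."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def resolve_drug(term: str) -> Optional[str]:
    """Return the canonical drug key for a search term, or None if unknown."""
    return ALIASES.get(normalize(term))


for query in ["AK112", "SMT112", "Ivonescimab", "AK-112"]:
    print(query, "->", resolve_drug(query))
```

Normalizing the search term before the lookup keeps the alias table small, since spacing, case, and hyphenation variants do not each need their own entry.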

Condition Mapping

Like those on ClinicalTrials.gov, our conditions are all mapped to MeSH. We have also mapped our trial data to MedDRA, SNOMED, and ICD-10 to deliver stronger interoperability, seamless crosswalks between clinical and research vocabularies, and greater flexibility in how and where you use the data.

To see this in action, take the Phase 3 clinical trial of Ivonescimab (AK112) for NSCLC patients. From the trial page, you can explore the linked condition, Non-squamous Non-small-cell Lung Cancer, which opens up a dedicated condition card. There, you’ll find external sources already mapped, along with the condition’s position within the broader disease hierarchy. This transforms a static trial listing into a dynamic entry point for exploration, analysis, and integration.

Looking at the External Sources section of the Non-squamous Non-small cell Lung Cancer condition card.

Looking at the Family Tree section of the Non-squamous Non-small cell Lung Cancer condition card.
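A condition card with external mappings and a family tree can be pictured as a small linked data structure. The sketch below is a hypothetical model, not DrugBank's schema: the class and function names are invented for illustration, the parent chain is an assumed example hierarchy, and the external codes are deliberately left as placeholders rather than real MeSH, MedDRA, SNOMED, or ICD-10 values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Condition:
    """Hypothetical condition card: a name, an optional parent in the
    disease hierarchy, and codes in external vocabularies."""
    name: str
    parent: Optional["Condition"] = None
    external_ids: Dict[str, str] = field(default_factory=dict)


# Assumed example hierarchy; the codes are placeholders, not real values.
lung_cancer = Condition("Lung Cancer")
nsclc = Condition("Non-small Cell Lung Cancer", parent=lung_cancer)
non_squamous = Condition(
    "Non-squamous Non-small Cell Lung Cancer",
    parent=nsclc,
    external_ids={
        "MeSH": "<mesh-id>",
        "MedDRA": "<meddra-code>",
        "SNOMED": "<snomed-id>",
        "ICD-10": "<icd10-code>",
    },
)


def ancestors(condition: Condition) -> List[str]:
    """Walk up the hierarchy, nearest parent first."""
    chain = []
    node = condition.parent
    while node is not None:
        chain.append(node.name)
        node = node.parent
    return chain


def crosswalk(condition: Condition, vocabulary: str) -> Optional[str]:
    """Look up the condition's code in another vocabulary, if mapped."""
    return condition.external_ids.get(vocabulary)


print(ancestors(non_squamous))
print(crosswalk(non_squamous, "ICD-10"))
```

Storing the parent link on each condition is what lets a single trial's condition serve as an entry point into broader disease categories.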

Standardized Trial Termination Reasons

Currently, trial researchers report why a trial has stopped in unstructured text fields. This often results in inconsistent, vague, or even misspelled explanations. Because this data isn’t standardized, it can’t be easily searched, sorted, or analyzed at scale. As a result, attempting to use it can be time-intensive and unreliable. You often have to open each individual trial page and scroll deep into the “Researcher View” tab just to read through the raw text.

To streamline this, we’ve created a structured, searchable field for trial termination reasons that can be found in Table Builder. Instead of manually interpreting free-text entries like “wothdrawal (sic) of sponsor” (NCT00377780) or “withdrawn support from BMS” (NCT05200143), you can get clear, filterable categories like “business decision” or “funding.”

The DrugBank Why Stopped: Categorized filter allows you to quickly explore at scale.

Our internal curation team has standardized over 30 termination reasons to eliminate guesswork, correct typos, and remove formatting issues. This makes it possible to run real, scalable analyses, like pulling up all Pfizer-sponsored trials that were terminated due to safety concerns (73 results) in just a few clicks. On ClinicalTrials.gov, that same task would require reviewing 400+ trial pages manually.

A Clinical Trial table filtered for Pfizer-sponsored trials that were terminated for safety concerns.
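To show why a structured field makes this kind of query possible, here is a minimal sketch that maps free-text reasons to categories through a hand-maintained lookup and then filters on the result. It stands in for the manual curation described above and is not DrugBank's process: the raw strings, category labels, trial IDs, sponsors, and field names are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hand-maintained lookup standing in for manual curation.
# Raw strings and category labels are hypothetical examples.
CURATED_REASONS = {
    "lack of funding": "funding",
    "lack of fundng": "funding",  # misspelled variant maps to the same category
    "sponsor decision": "business decision",
    "safety concerns": "safety",
}


def categorize(raw_reason: str) -> Optional[str]:
    """Collapse case and whitespace, then look up the curated category.
    Returns None when the text has not been curated yet."""
    key = " ".join(raw_reason.lower().split())
    return CURATED_REASONS.get(key)


# Hypothetical terminated trials with free-text reasons.
trials = [
    {"id": "TRIAL-1", "sponsor": "Sponsor A", "why_stopped": "Lack of fundng"},
    {"id": "TRIAL-2", "sponsor": "Sponsor A", "why_stopped": "Safety  concerns"},
    {"id": "TRIAL-3", "sponsor": "Sponsor B", "why_stopped": "sponsor decision"},
    {"id": "TRIAL-4", "sponsor": "Sponsor A", "why_stopped": "see protocol"},
]

for trial in trials:
    trial["category"] = categorize(trial["why_stopped"])

# With a structured field, a sponsor-plus-reason query is a one-line filter.
safety_for_a = [t["id"] for t in trials
                if t["sponsor"] == "Sponsor A" and t["category"] == "safety"]

# Anything without a category is queued for human review.
needs_review = [t["id"] for t in trials if t["category"] is None]

category_counts = Counter(t["category"] for t in trials if t["category"])

print(safety_for_a)
print(needs_review)
print(category_counts)
```

Returning None for unrecognized text, rather than guessing a category, keeps ambiguous entries visible so a curator can decide how to classify them.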

If you're trying to understand why trials fail, and how that affects your own development strategy, this data saves time, removes ambiguity, and makes cross-trial insights possible.

These examples only scratch the surface. The strength of DrugBank lies in how each data point is systematically linked across drugs, trials, targets, and conditions. This level of connectivity supports more precise filtering, enables multi-directional queries, and reduces the need for redundant lookups.

Current Limitations and Planned Fixes

While we’ve made significant progress in structuring and linking clinical trial data, some limitations remain. Our rare disease coverage is currently focused on conditions with FDA orphan designations, which means not all rare diseases are represented yet. Some older trials also still use outdated identifiers, which can affect searchability.

In terms of investigational drug data, our connections are strongest in Phase 3 trials, where 80% of interventions are correctly mapped to their respective drugs. However, gaps remain, particularly in Phase 1, Phase 2, and Phase 4 trials. These often stem from poorly formatted entries, brand-name-only references, or complex regimens like chemotherapy combinations and vaccines with ambiguous naming.

We’re actively working to address these challenges. Currently, we’re focused on refining our drug-target-to-disease mapping with more precise scoring, improving coverage for early-phase trials, and tightening our handling of inconsistent data formats.

Our long-term goal is to achieve 100% coverage and accuracy in linking drugs to clinical trials across all phases of drug development. In the near term, we are working to ensure that all new trials are linked with high accuracy. As you continue to work within DrugBank, you’ll see continuous improvements in coverage as we expand and refine our mappings to better support complex research workflows.