On the biologically relevant
chemical space: BioReCS

José L. Medina-Franco1*, Edgar López-López1,2,
Juan F. Avellaneda-Tamayo1 and William J. Zamora3,4

1DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional
Autónoma de México, Avenida Universidad 3000, Mexico City, Mexico, 2Department of Chemistry and
Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National
Polytechnic Institute, Section 14-740, Mexico City, Mexico, 3CBio3 Laboratory, School of Chemistry,
University of Costa Rica, San José, Costa Rica, 4Laboratory of Computational Toxicology and Biological
Testing Laboratory (LEBi), University of Costa Rica, San José, Costa Rica

KEYWORDS

chemoinformatics, dark chemicalmatter, de novo design, food chemicals, metallodrugs,
natural products, odor chemicals, peptides

1 Introduction

The “chemical space” (CS), “chemical compound space,” or “chemical universe” terms
are frequently used in drug discovery and other areas, including chemical synthesis,
catalysis, materials science, food chemistry, and agrochemistry, among others (Kim
et al., 2024). While the concept is often used intuitively or colloquially, CS is inherently
complex, and numerous formal definitions have been proposed and reviewed (Medina-
Franco et al., 2022). A commonly accepted notion of CS relates to the number of chemical
compounds that could theoretically exist—the “size” of chemical space—which varies
greatly depending on the classes of compounds considered (e.g., small organic molecules,
peptides, odorants). Another perspective views CS as a multidimensional space in which
molecular properties (both structural and functional) define coordinates and relationships
between compounds (Virshup et al., 2013; Martinez-Mayorga and Medina-Franco, 2014).
These definitions give rise to the concept of chemical subspaces (ChemSpas): subsets of the
broader chemical universe distinguished by shared structural or functional features. Within
this framework, the biologically relevant chemical space (BioReCS) comprises molecules
with biological activity—both beneficial and detrimental. BioReCS spans diverse application
areas such as drug discovery, agrochemistry, sensory chemistry (e.g., flavor and odor), food
science, and natural product research. It also includes compounds with reactive molecules,
including promiscuous and poly-active molecules, as well as those with highly detrimental
or undesirable effects, such as toxic and allergic compounds.

Chemical compound databases are key resources for exploring the CS and are central to
chemoinformatics (Williams and Richard, 2025). Numerous public databases—varying in
size and specialization—target specific regions of BioReCS. Table 1 provides representative
examples of freely available libraries across several domains. Comprehensive reviews of
chemoinformatic and bioinformatic databases have been published elsewhere (Rigden and
Fernández, 2025; de Azevedo et al., 2024).

A systematic study of CS requires molecular descriptors that define the dimensionality
of the space. The choice of descriptors depends on project goals, compound classes (e.g.,
metal-containing vs purely organic molecules), and the dataset size and diversity. Large and
ultra-large chemical libraries that are highly used today in drug discovery projects (Lyu
et al., 2019; Corrêa Veríssimo et al., 2024), for example, demand descriptors that strike a
balance between computational efficiency and chemical relevance (Warr et al., 2022). The
rise of machine learning has led to the development of novel molecular representations
(Wigh et al., 2022). Visualization is another critical tool for CS analysis, because these spaces

OPEN ACCESS

EDITED BY

Rodolpho C. Braga,
InsilicAll, Brazil

REVIEWED BY

Ho Leung Ng,
Atomwise Inc, United States
Andrea Trabocchi,
University of Florence, Italy

*CORRESPONDENCE

José L. Medina-Franco,
medinajl@unam.mx

RECEIVED 27 July 2025
ACCEPTED 12 August 2025
PUBLISHED 25 August 2025

CITATION

Medina-Franco JL, López-López E,
Avellaneda-Tamayo JF and Zamora WJ (2025)
On the biologically relevant chemical space:
BioReCS.
Front. Drug Discov. 5:1674289.
doi: 10.3389/fddsv.2025.1674289

COPYRIGHT

© 2025 Medina-Franco, López-López,
Avellaneda-Tamayo and Zamora. This is an
open-access article distributed under the terms
of the Creative Commons Attribution License
(CC BY). The use, distribution or reproduction in
other forums is permitted, provided the original
author(s) and the copyright owner(s) are
credited and that the original publication in this
journal is cited, in accordance with accepted
academic practice. No use, distribution or
reproduction is permitted which does not
comply with these terms.

Frontiers in Drug Discovery frontiersin.org01

TYPE Opinion
PUBLISHED 25 August 2025
DOI 10.3389/fddsv.2025.1674289

https://www.frontiersin.org/articles/10.3389/fddsv.2025.1674289/full
https://www.frontiersin.org/articles/10.3389/fddsv.2025.1674289/full
https://crossmark.crossref.org/dialog/?doi=10.3389/fddsv.2025.1674289&domain=pdf&date_stamp=2025-08-25
mailto:medinajl@unam.mx
mailto:medinajl@unam.mx
https://doi.org/10.3389/fddsv.2025.1674289
https://creativecommons.org/licenses/by/4.0/
https://creativecommons.org/licenses/by/4.0/
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org/journals/drug-discovery#editorial-board
https://www.frontiersin.org/journals/drug-discovery#editorial-board
https://doi.org/10.3389/fddsv.2025.1674289


often involve many dimensions; dimensionality-reduction
techniques are commonly used to project them into two or three
dimensions for interpretation. Recent reviews detail advancements
in the visualization of chemical space (Sosnin, 2025).

In this article, we offer an integrative perspective on BioReCS,
highlighting common considerations for its consistent and
meaningful exploration. We also address its size, historical
evolution, and future expansion.

2 BioReCS

2.1 Current view

In many research projects, the chemical universe—and by
extension, BioReCS—is explored through distinct sections of
chemical subspaces (ChemSpas). For instance, CS analyses may
focus specifically on small-molecule drug candidates, peptides (Orsi
and Reymond, 2024), or proteolysis-targeting chimeras (PROTACs)
(Danishuddin et al., 2023; Sincere et al., 2023). Other studies target
agrochemicals, odorants, natural products, or metal-containing
compounds. Some research initiatives are at the intersection of
multiple ChemSpas, such as investigating bioactive compounds that
straddle both natural product and food chemical domains (Avellaneda-
Tamayo et al., 2024) or studying the overlap between flavor and odor
chemicals (Cui et al., 2025). Analyzing these intersecting regions of
chemical space often requires integrating methodologies from diverse
disciplines. In this section, we highlight both heavily explored and
underexplored regions of BioReCS.

2.2 Heavily explored chemical subspaces

In drug discovery, widely used public databases such as ChEMBL
(Zdrazil, 2025) and PubChem (Kim et al., 2024) serve asmajor sources of
biologically active small molecules, primarily organic compounds. Owing
to their extensive biological activity annotations, these databases are
major sources of poly-active compounds and promiscuous structures.
Table 1 summarizes these and other key databases that cover different
regions of BioReCS. The chemical space of drug-like molecules,
particularly small organic compounds and natural products, has been
extensively studied. Closely related areas, such as small peptides and other
beyond Rule of 5 (bRo5) entities, are also well-characterized using
computational approaches (Price et al., 2024; Capecchi and Reymond,
2021; López-López et al., 2023). Importantly, to fully chart the boundaries
of BioReCS, it is crucial to include negative biological data—that is,
compounds known to lack bioactivity (Williams et al., 2016; López-López
et al., 2022). These data help define the non-biologically relevant portions
of chemical space. A notable example is dark chemical matter, a large-
scale dataset comprising small molecules from corporate compound
collections that have repeatedly failed to show activity in high-throughput
screening assays (Wassermann et al., 2015). Also, a recent development is

the generation of InertDB, a compound collection with 3,205 curated
inactive compounds obtained from PubChem (An et al., 2025). The
database also includes 64,368 putative inactivemolecules generatedwith a
deep generative artificial intelligence (AI) model trained on the
experimentally determined inactive molecules (An et al., 2025).

2.3 Underexplored chemical subspaces

Certain types of chemical structures remain underrepresented in
chemoinformatics due to modeling challenges. A prominent example is
metal-containing molecules, which are often excluded during data
curation because most chemoinformatics tools are optimized for
small organic compounds (Fourches et al., 2016; Bento et al., 2020;
Valle-Núñez et al., 2025). Metallodrugs, therefore, represent a
structurally and functionally important class that is commonly
filtered out by default. However, the difficulty of modeling a region
of BioReCS should not justify its exclusion. Similarly, various
compound classes are rarely targeted in drug discovery efforts,
including large and complex natural products, macrocycles
(compounds containing rings of ≥12 atoms), protein-protein
interaction (PPI) modulators or inhibitors, PROTACs, and mid-
sized peptides. Many of these molecules fall into the beyond Rule of
5 (bRo5) category (Price et al., 2024; Whitty and Zhou, 2015; Schaub
et al., 2021) (Table 1). Despite their complexity, interest in
characterizing these regions of chemical space is growing. Recent
studies have addressed the CS of peptides (Orsi and Reymond,
2024; Capecchi et al., 2019), agrochemicals (Zhang et al., 2018),
metallodrugs (Meggers, 2007; López López and Medina-Franco,
2025), macrocycles (Viarengo-Baker et al., 2021; Kim et al., 2025),
and PPIs (Zhang et al., 2014; Choi et al., 2021).

2.3.1 Dark regions of the underexplored BioReCS
Beyond beneficial regions, BioReCS also encompasses gray-to-

dark areas—zones that include compounds with undesirable
biological effects, such as toxic chemicals (Tihányi et al., 2025;
(Annex on Chemicals, 2025). Understandably, these regions have
received less attention than areas linked to therapeutic or beneficial
activity. Nonetheless, distinguishing the characteristics that separate
harmful compounds from beneficial ones is vital for the design of
safer, human-beneficial, and ecologically responsible molecules.

3 Common considerations to
explore BioReCS

In this section, we highlight common challenges associated with
exploring BioReCS, along with possible workarounds and emerging
directions. While not exhaustive, these topics are meant to illustrate
recurring issues and encourage a holistic consideration of the BioReCS.

3.1 Towards universal descriptors

The structural diversity across underexplored regions of BioReCS
presents a major challenge to define a consistent chemical space using
molecular descriptors. Traditional descriptors, tailored to specific
ChemSpas such as small molecules, peptides, or metallodrugs, lack

Abbreviations: AI, artificial intelligence; bRo5, beyond Rule of 5; ChemSpa,
chemical subspace; CS, chemical space; BioReCS, biological-relevant
chemical space; PROTACs, proteolysis-targeting chimeras; PPI, protein-
protein interaction.

Frontiers in Drug Discovery frontiersin.org02

Medina-Franco et al. 10.3389/fddsv.2025.1674289

https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289


TABLE 1 Representative public compound data sets covering different regions of the BioReCS.a

Type of data set,
area covered

Exemplary data sets Size range Brief description

Drugs approved for
clinical use

DrugBank (Knox et al., 2023) | FDA
(Center for Drug Evaluation and Research,
2025)

17,481 entries | 4,563 approved chemical entities Comprehensive, manually curated resource
integrating detailed drug, drug–target, and
pharmacological data. The FDA set is
included in DrugBank

Metallodrugs MetAP DB (López López and
Medina-Franco, 2025)

61 Metal-based approved drug database.
Compounds are classified according to their
clinical uses: metallodrug, imaging,
radioimaging, radiotherapy, and
photodynamic

Compounds and tools for
drug repositioning

DrugRepoBank (Huang et al., 2024) Bioactive compounds: 49,652; Drug–target
interactions: 880,945; Drug–disease associations:
28,978; Drug–side effect associations: 109,698;
Target proteins: 4,221; Drug gene-expression
signatures: 473,647

A comprehensive, curated database and
discovery platform designed to accelerate
drug repositioning

Compounds in clinical trials ClinicalTrials (ClinicalTrials.gov, 2025) ≈530,000 entries Database of clinical research studies and
information about their results. Generated
by the U.S. National Institutes of Health
and other U.S. agencies. Data on clinical
entries from 200 countries

Compounds annotated with
biological activity

ChEMBL (Zdrazil et al., 2023; Zdrazil,
2025); PubChem (Kim et al., 2024);
CellMinerCDB (Shankavaram et al., 2009)

~2.4 M | > 322 M | >20,000 compounds Repositories of biologically annotated
compounds, integrating experimental
bioactivity data, clinical-phase molecules,
drug repurposing candidates, and chemical
probe information. | CellMiner Integrates
genomic and pharmacologic data for the
NCI-60 panel of 60 diverse human cancer
cell lines, representing 9 different cancer
types

Peptides Peptipedia v2.0 (Cabas-Mora et al., 2024) 3,983,654 sequences; 103,561 active labeled Largest bioactive peptide compilation
database to 2024, with more than
200 bioactivity types. Web-based tools
include secondary structure evaluation,
functional domain analysis,
physicochemical, and thermodynamic
properties

Proteomics ProteomicsDB (Schmidt et al., 2017) Number of LC-MS/MS experiments: ~19,000;
Human tissues/body fluids: ~41; Cell line
datasets: ~60

Protein-centric database designed for
exploration of large-scale quantitative mass
spectrometry proteomics data. Multi-omics
data types: transcriptomics, proteomics,
functional drug-sensitivity, and interaction
networks

Targeted covalent inhibitors
(TCIs)

CovBinderInPDB (Guo and Zhang, 2022)
CovalentInDB 2.0 (Du et al., 2024)

7,375 covalent modifications; 8,303 inhibitors Curated databases to support the design of
TCIs. Covalent interactions detailing
binders across diverse residues. Expand on
bioactivity data, target profiles, ligandability
predictions, and libraries of commercial
and natural product-derived covalent
compounds

Protein-protein interaction
(PPI) inhibitors

iPPI-DB (Torchet et al., 2021) | DLiP-PPI
(Ikeda et al., 2023) ref

2,374 compounds | 32,647 PPI-related
compounds

Manually curated, community-extendable
resource featuring annotated PPI
modulators and stabilizers | Newly
synthesized and literature-extracted
molecules, characterized by properties
tailored for PPI inhibition, along with
target-specific filtering, and activity data

Macrocycles MacrolactoneDB (Zin et al., 2020) ~14,000 Macrocyclic lactones integrating structural
and bioactivity data, designed to support
cheminformatics analysis and predictive
modeling of this compound class

Heterobifunctional degraders PROTACs (Srivastava et al., 2025) 10 Manual compilation of representative
PROTACs in clinical development

(Continued on following page)

Frontiers in Drug Discovery frontiersin.org03

Medina-Franco et al. 10.3389/fddsv.2025.1674289

http://ClinicalTrials.gov
https://doi.org/10.3389/fchem.2022.1090643
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289


universality. However, there are ongoing efforts to develop structure-
inclusive, general-purpose descriptors. Notable examples include
molecular quantum numbers (Nguyen et al., 2009) and the
MAP4 fingerprint (Capecchi et al., 2020 ref), which is designed to
accommodate entities ranging from small molecules to biomolecules
and evenmetabolomic data.More recently, neural network embeddings
derived from chemical language models have shown promise in
encoding chemically meaningful representations that can reconstruct
molecular structures or predict properties (Lžičař and Gamouh, 2024).
However, there is still a pressing need to develop systematic molecular
fingerprints for the study of biomaterials and inorganic molecules.

3.2 pH-dependent chemical space

Many bioactive compounds, especially drugs, are weak bases, acids,
or ampholytes that can ionize depending on the pH of their
environment. Pioneering studies have reported that 62.9% of
compounds in the World Drug Index (n = 582) are ionizable, with
the majority being bases, fewer acids, and some ampholytes (Manallack,
2007), however, chemogenomic analyses on contemporary drugs (n =
3766) have shown that this percentage can reach 80% (Manallack et al.,
2013). In consequence, the ionization state—charged or neutral—of a
bioactive compound profoundly impacts its solubility, permeability,
absorption, distribution, toxicity, and binding, making this distinction
essential in drug development and computational modeling. However,
CS analyses typically assume molecular structures with neutral charge,
which may not reflect the actual bioactive species of compounds under
physiological or environmental conditions. Even when the structural

representation of an ionizable compound is accurate, chemoinformatics
tools often calculate molecular descriptors such as lipophilicity (logP)
based solely on the neutral species, overlooking the dominant ionic
forms. Computing lipophilicity using logD at physiological pH is much
more relevant than using logP for small molecules (Bhal et al., 2007;
Zamora et al., 2017), including standard amino acid residues (Zamora
et al., 2019) to non-standard residues (Viayna et al., 2024). Those
limitations underscore the need for implementing chemoinformatics
tools capable of calculating molecular properties contingent on the
ionization state of bioactive compounds as a function of
environmental pH in CS research (Bertsch et al., 2023; Bertsch-
Aguilar et al., 2024). This highlights that neglecting the pH-
dependent behavior of bioactive compounds could limit the
biological relevance of BioReCS. Consequently, future efforts should
aim to incorporate protonation state dynamics to enhance their
representativeness in pH-dependent CS analysis.

3.3 De novo generated libraries: expanding
the BioReCS

In drug discovery and beyond, there is growing interest in creating
on-demand, synthetically accessible virtual libraries for high-throughput
screening (Perebyinis and Rognan, 2022; Grygorenko et al., 2020;
Chávez-Hernández et al., 2023). Advances in generative models have
accelerated the enumeration of the large and ultra-large chemical
libraries, expanding the known chemical space and enabling the
design of extensive libraries guided by structure or property
constraints (Ye, 2024). However, evaluating the usefulness of such

TABLE 1 (Continued) Representative public compound data sets covering different regions of the BioReCS.a

Type of data set,
area covered

Exemplary data sets Size range Brief description

Pharmacogenomics PharmGKB (Gong et al., 2021) Drugs: 715; Genes: 1,761; Diseases/phenotypes:
227; Clinical dosing guidelines: 165; Drug labels
annotated: 784; Variant annotations:
>5,000 individual variant–drug summaries

It specializes in curated information about
how human genetic variation affects drug
response—covering clinical dosing
guidelines, drug label annotations,
variant–drug associations, and
gene–pathway data to support both
research and clinical precision medicine

Natural product compounds COCONUT (Chandrasekhar et al., 2024) |
LANaPDB (Gómez-García et al., 2024)

695,119 | 13,578 Compilation of curated natural product
databases

Food chemicals FooDB (Harrington et al., 2019) >3 M records and observations, corresponding to
128,283 different foods

Database focused on the chemical
composition of foods and their associated
health effects

Flavor molecules Kou et al. compilation (Kou et al., 2023) |
Compilation for FlavorMiner
(Herrera-Rocha et al., 2024)

>14,000 unique flavor molecules (8982 molecules
with known taste and 5,046 with known aroma) |
13,387 compounds

Compilation of 25 flavor molecule
databases published within the last 20 years
| Compilation of molecules with
experimentally validated flavor profiles

Odor chemical Pyrfume (Hamel et al., 2024) |
OlfactionBase (Sharma et al., 2021)

>20,000 odorants | 2,871 entries related to
odorant/pheromone binding

Unified dataset of stimulus-linked olfactory
datasets | Includes odors, odorants, and
odorless compounds and their interactions
with different receptors

Toxic chemicals TOXNET (Davis et al., 2020) | OPCW
schedules (Annex on Chemicals, 2025)

103,062,149 toxicogenomic data, including
chemical–gene/protein interactions,
chemical–disease and gene–disease relationships
| >35,000 chemical weapons

A publicly available database that aims to
advance understanding about how
environmental exposures affect human
health
| Substances are organized into two
categories: Toxic and precursors

aThe list of compound databases is not exhaustive. Exemplary databases are shown.

Frontiers in Drug Discovery frontiersin.org04

Medina-Franco et al. 10.3389/fddsv.2025.1674289

https://doi.org/10.1186/s13321-020-00445-4
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289


libraries requires more than sheer size; chemical diversity, as assessed
through fingerprints, scaffolds, and physicochemical descriptors, is
equally critical. Notably, a recent historical analysis of ChEMBL,
PubChem, and DrugBank revealed that newer libraries are not
necessarily more diverse (Lopez Perez et al., 2025). A similar trend
could be observed for the continuously enumerated ultra-large chemical
libraries, highlighting the need to quantify their chemical diversity using
multiple structural representations. For BioReCS, we must consider not
only the scale and diversity of expansion but also its direction—whether
new molecules occupy unexplored regions or merely populate existing
subspaces. Depending on the application area (e.g., drug discovery), the
bioactivity profile should also be considered to avoid populating regions
of BioReCS with promiscuous compounds associated with undesirable
clinical effects.

3.4 Developing novel computational
approaches

As the concept and application of chemical space evolve, so too
must the computational tools used to explore it (Reymond, 2025).
Novel or less conventional regions of drug-like space, such as
bRo5 compounds discussed in Section 2.2, demand innovative
methodologies or adaptations of existing ones. For instance, a
recently developed hybrid fingerprint was designed specifically to
accommodate metal-containing molecules, extending traditional
organic-focused fingerprints by incorporating metal-specific features
(López López andMedina-Franco, 2025). Looking ahead, we anticipate
increasing use of hybrid computational workflows, which combine
descriptor-based, rule-based, and AI-driven methods (Medina-Franco
et al., 2024). In parallel, newmethods for analyzingmultiple dimensions
and types of information—such as chemical multiverse analysis and the
creation of consensus chemical spaces (Medina-Franco et al., 2022;
Medina-Franco et al., 2019; López-López and Medina-Franco, 2023)
—will enable more efficient use and integration of available data.
Finally, machine learning models trained in known regions of
BioReCS will play a pivotal role in navigating uncharted subspaces
and improving coverage of BioReCS.

4 Summary

In this opinion article, we offered a holistic perspective on the
biologically relevant chemical space (BioReCS) as a subset of the broader
chemical universe. Effective navigation of BioReCS requires not only
cataloging active compounds but also systematically reporting
biologically inactive molecules, which help define the limits of
relevance. While most of the explored regions focus on human-
beneficial activities—such as therapeutic development, agriculture,
and food sciences—BioReCS also includes dark regions populated by
undesirable or toxic compounds. Recognizing and learning from these
contrasts is essential for safer, ecologically responsible, andmore targeted
molecular design. The exploration of understudied ChemSpasmay drive
the development or refinement of computational tools, especially in cases
where current methods fall short. Broadening the scope of BioReCS
analysis—from both a structural and functional standpoint—could
reveal hidden subspaces containing compounds with novel or
unexpected biological activities. Importantly, training machine

learning models on known BioReCS data will enhance our capacity
to identify uncharted regions and optimize exploration strategies. As
chemical databases continue to grow, it is important to emphasize that
expansion alone does not equate to increased chemical diversity or
biological relevance. Future research should consider not only the scale of
these libraries but also their directionality, structural diversity, and
applicability to real-world biological contexts.

Author contributions

JM-F: Conceptualization, Funding acquisition, Resources,
Writing – review and editing, Writing – original draft, Project
administration, Supervision, Formal Analysis. EL-L: Formal Analysis,
Investigation, Writing – review and editing. JA-T: Formal Analysis,
Investigation, Writing – review and editing. WZ: Funding acquisition,
Formal Analysis, Writing – review and editing, Investigation.

Funding

The author(s) declare that financial support was received for the
research and/or publication of this article. We thank the Dirección
General de Cómputo y de Tecnologías de la Información y
Comunicación (DGTIC), Universidad Nacional Autónoma de
México, for the computational resources to use Miztli under
project LANCAD-UNAM-DGTIC-335. WZR thanks the Vice
Chancellor for Research of the University of Costa Rica for its
support via the research project 908-C3-610.

Acknowledgments

Insights and rich discussions with Karina Martinez-Mayorga
and Gerald M. Maggiora are highly acknowledged. EL-L and JFA-T
thank the Consejo Nacional de Humanidades, Ciencias y Tecnología
(CONAHCyT) for the PhD scholarships 762342 (No. CVU:
894234), and 1270553, respectively.

Conflict of interest

The authors declare that the research was conducted in the
absence of any commercial or financial relationships that could be
construed as a potential conflict of interest.

The author(s) declared that they were an editorial board
member of Frontiers, at the time of submission. This had no
impact on the peer review process and the final decision.

Generative AI statement

The author(s) declare that no Generative AI was used in the
creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this
article has been generated by Frontiers with the support of artificial
intelligence and reasonable efforts have been made to ensure
accuracy, including review by the authors wherever possible. If
you identify any issues, please contact us.

Frontiers in Drug Discovery frontiersin.org05

Medina-Franco et al. 10.3389/fddsv.2025.1674289

https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289


Publisher’s note

All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated

organizations, or those of the publisher, the editors and the
reviewers. Any product that may be evaluated in this article, or
claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.

References

An, S., Lee, Y., Gong, J., Hwang, S., Park, I. G., Cho, J., et al. (2025). InertDB as a
generative AI-expanded resource of biologically inactive small molecules from
PubChem. J. Cheminformatics 17 (1), 49–14. doi:10.1186/s13321-025-00999-1

Annex on Chemicals (2025). OPCW. Available online at: https://www.opcw.org/
chemical-weapons-convention/annexes/annex-chemicals/annex-chemicals (Accessed
July 25, 2025).

Avellaneda-Tamayo, J. F., Chávez-Hernández, A. L., Prado-Romero, D. L., and
Medina-Franco, J. L. (2024). Chemical multiverse and diversity of food chemicals.
J. Chem. Inf. Model 64 (4), 1229–1244. doi:10.1021/acs.jcim.3c01617

Bento, A. P., Hersey, A., Félix, E., Landrum, G., Gaulton, A., Atkinson, F., et al. (2020).
An open source chemical structure curation pipeline using RDKit. J. Cheminformatics
12 (1), 51. doi:10.1186/s13321-020-00456-1

Bertsch, E., Suñer, S., Pinheiro, S., and Zamora, W. J. (2023). Critical assessment of
ph-dependent lipophilicity profiles of small molecules: which one should we use and in
which cases? ChemPhysChem 24 (24), e202300548. doi:10.1002/cphc.202300548

Bertsch-Aguilar, E., Piedra, A., Acuña, D., Suñer, S., Pinheiro, S., and Zamora
Ramírez, W. J. (2024). LiProS: FAIR simulation workflow to predict accurate
lipophilicity profiles for small molecules. Am. Chem. Soc. (ACS). doi:10.26434/
chemrxiv-2024-znppb-v2

Bhal, S. K., Kassam, K., Peirson, I. G., and Pearl, G. M. (2007). The rule of five
revisited: applying log D in place of log P in drug-likeness filters. Mol. Pharm. 4 (4),
556–560. doi:10.1021/mp0700209

Cabas-Mora, G., Daza, A., Soto-García, N., Garrido, V., Alvarez, D., Navarrete, M.,
et al. (2024). Peptipedia v2.0: a peptide sequence database and user-friendly web
platform. A major update. Database 2024, baae113. doi:10.1093/database/baae113

Capecchi, A., and Reymond, J.-L. (2021). Peptides in chemical space. Med. Drug
Discov. 9, 100081. doi:10.1016/j.medidd.2021.100081

Capecchi, A., Zhang, A., and Reymond, J.-L. (2019). Populating chemical space with
peptides using a genetic algorithm. J. Chem. Inf. Model. 60 (1), 121–132. doi:10.1021/
acs.jcim.9b01014

Capecchi, A., Probst, D., and Reymond, J.-L. (2020). One molecular fingerprint to rule
them all: drugs, biomolecules, and the metabolome. J. Cheminformatics 12 (1), 43.
doi:10.1186/s13321-020-00445-4

Center for Drug Evaluation and Research (2025). Drugs@FDA data files. U.S. Food
Drug Adm. Available online at: https://www.fda.gov/drugs/drug-approvals-and-
databases/drugsfda-data-files (Accessed July 25, 2025).

Chandrasekhar, V., Rajan, K., Kanakam, S. R. S., Sharma, N., Weißenborn, V.,
Schaub, J., et al. (2024). COCONUT 2.0: a comprehensive overhaul and curation of the
collection of open natural products database. Nucleic Acids Res. 53 (D1), D634–D643.
doi:10.1093/nar/gkae1063

Chávez-Hernández, A. L., López-López, E., and Medina-Franco, J. L. (2023). Yin-
yang in drug discovery: rethinking de novo design and development of predictive
models. Front. Drug Discov. 3, 1222655. doi:10.3389/fddsv.2023.1222655

Choi, J., Yun, J. S., Song, H., Kim, N. H., Kim, H. S., and Yook, J. I. (2021). Exploring
the chemical space of protein–protein interaction inhibitors through machine learning.
Sci. Rep. 11 (1), 13369. doi:10.1038/s41598-021-92825-5

ClinicalTrials.gov (2025). ClinicalTrials.gov. Available online at: https://clinicaltrials.
gov/(Accessed July 25, 2025).

Corrêa Veríssimo, G., Salgado Ferreira, R., and Gonçalves Maltarollo, V. (2024).
Ultra-large virtual screening: definition, recent advances, and challenges in drug design.
Mol. Inf. 44 (1), e202400305. doi:10.1002/minf.202400305

Cui, Z., Qi, C., Zhou, T., Yu, Y., Wang, Y., Zhang, Z., et al. (2025). Artificial
intelligence and food flavor: how AI models are shaping the future and
revolutionary technologies for flavor food development. Compr. Rev. Food Sci. Food
Saf. 24 (1), e70068. doi:10.1111/1541-4337.70068

Danishuddin, M., Jamal, M. S., Song, K.-S., Lee, K.-W., Kim, J.-J., and Park, Y.-M.
(2023). Revolutionizing drug targeting strategies: integrating artificial intelligence and
structure-based methods in PROTAC development. Pharmaceuticals 16 (12), 1649.
doi:10.3390/ph16121649

Davis, A. P., Grondin, C. J., Johnson, R. J., Sciaky, D., Wiegers, J., Wiegers, T. C., et al.
(2020). Comparative toxicogenomics database (CTD): update 2021. Nucleic Acids Res.
49 (D1), D1138–D1143. doi:10.1093/nar/gkaa891

de Azevedo, D. Q., Campioni, B. M., Pedroz Lima, F. A., Medina-Franco, J. L.,
Castilho, R. O., and Maltarollo, V. G. (2024). A critical assessment of bioactive

compounds databases. Fut. Med. Chem. 16 (10), 1029–1051. doi:10.1080/17568919.
2024.2342203

Du, H., Zhang, X., Wu, Z., Zhang, O., Gu, S., Wang, M., et al. (2024). CovalentInDB
2.0: an updated comprehensive database for structure-based and ligand-based covalent
inhibitor design and screening. Nucleic Acids Res. 53 (D1), D1322–D1327. doi:10.1093/
nar/gkae946

Fourches, D., Muratov, E., and Tropsha, A. (2016). Trust, but verify II: a practical
guide to chemogenomics data curation. J. Chem. Inf. Model. 56 (7), 1243–1252. doi:10.
1021/acs.jcim.6b00129

Gómez-García, A., Acuña Jiménez, D. A., Zamora, W. J., Barazorda-Ccahuana, H. L.,
Chávez-Fumagalli, M. Á., Valli, M., et al. (2024). Latin American natural product
database (LANaPDB): an update. J. Chem. Inf. Model. 64 (22), 8495–8509. doi:10.1021/
acs.jcim.4c01560

Gong, L., Whirl-Carrillo, M., and Klein, T. E. (2021). PharmGKB, an integrated
resource of pharmacogenomic knowledge. Curr. Protoc. 1 (8), e226. doi:10.1002/
cpz1.226

Grygorenko, O. O., Radchenko, D. S., Dziuba, I., Chuprina, A., Gubina, K. E., and
Moroz, Y. S. (2020). Generating multibillion chemical space of readily accessible
screening compounds. iScience 23 (11), 101681. doi:10.1016/j.isci.2020.101681

Guo, X.-K., and Zhang, Y. (2022). CovBinderInPDB: a structure-based covalent
binder database. J. Chem. Inf. Model. 62 (23), 6057–6068. doi:10.1021/acs.jcim.2c01216

Hamel, E. A., Castro, J. B., Gould, T. J., Pellegrino, R., Liang, Z., Coleman, L. A., et al.
(2024). Pyrfume: a window to the world’s olfactory data. Sci. Data 11 (1), 1220. doi:10.
1038/s41597-024-04051-z

Harrington, R. A., Adhikari, V., Rayner, M., and Scarborough, P. (2019). Nutrient
composition databases in the age of big data: FoodDB, a comprehensive, real-time
database infrastructure. BMJ Open 9 (6), e026652. doi:10.1136/bmjopen-2018-026652

Herrera-Rocha, F., Fernández-Niño, M., Duitama, J., Cala, M. P., Chica, M. J.,
Wessjohann, L. A., et al. (2024). FlavorMiner: a machine learning platform for
extracting molecular flavor profiles from structural data. J. Cheminformatics 16 (1),
1–12. doi:10.1186/s13321-024-00935-9

Huang, Y., Dong, D., Zhang, W., Wang, R., Lin, Y.-C.-D., Zuo, H., et al. (2024).
DrugRepoBank: a comprehensive database and discovery platform for accelerating drug
repositioning. Database 2024, baae051. doi:10.1093/database/baae051

Ikeda, K., Maezawa, Y., Yonezawa, T., Shimizu, Y., Tashiro, T., Kanai, S., et al. (2023).
DLiP-PPI library: an integrated chemical database of small-to-medium-sized molecules
targeting protein–protein interactions. Front. Chem. 10, 1090643. doi:10.3389/fchem.
2022.1090643

Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., et al. (2024). PubChem
2025 update. Nucleic Acids Res. 53 (D1), D1516–D1525. doi:10.1093/nar/gkae1059

Kim, T., Baek, E., and Kim, J. (2025). Exploring macrocyclic chemical space: strategies
and technologies for drug discovery. Pharmaceuticals 18 (5), 617. doi:10.3390/
ph18050617

Knox, C., Wilson, M., Klinger, C. M., Franklin, M., Oler, E., Wilson, A., et al. (2023).
DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 52 (D1),
D1265–D1275. doi:10.1093/nar/gkad976

Kou, X., Shi, P., Gao, C., Ma, P., Xing, H., Ke, Q., et al. (2023). Data-driven elucidation
of flavor chemistry. J. Agric. Food Chem. 71 (18), 6789–6802. doi:10.1021/acs.jafc.
3c00909

López López, E., and Medina-Franco, J. L. (2025). Metal-FP: a hybrid molecular
fingerprint to encode metal-based approved drugs. ChemRxiv 2025. doi:10.26434/
chemrxiv-2025-6zh2h

Lopez Perez, K., López-López, E., Soulage, F., Felix, E., Medina-Franco, J. L., and
Miranda-Quintana, R. A. (2025). Growth vs diversity: a time-evolution analysis of the
chemical space. J. Chem. Inf. Model. 65 (13), 6788–6796. doi:10.1021/acs.jcim.5c00347

López-López, E., and Medina-Franco, J. L. (2023). Towards decoding hepatotoxicity
of approved drugs through navigation of multiverse and consensus chemical spaces.
Biomolecules 13 (1), 176. doi:10.3390/biom13010176

López-López, E., Fernández-de Gortari, E., and Medina-Franco, J. L. (2022). Yes SIR!
on the structure–inactivity relationships in drug discovery. Drug disc. Today 27 (8),
2353–2362. doi:10.1016/j.drudis.2022.05.005

López-López, E., Robles, O., Plisson, F., andMedina-Franco, J. L. (2023). Mapping the
structure–activity landscape of non-canonical peptides with MAP4 fingerprinting.
Digit. Disc. 2 (5), 1494–1505. doi:10.1039/d3dd00098b

Frontiers in Drug Discovery frontiersin.org06

Medina-Franco et al. 10.3389/fddsv.2025.1674289

https://doi.org/10.1186/s13321-025-00999-1
https://www.opcw.org/chemical-weapons-convention/annexes/annex-chemicals/annex-chemicals
https://www.opcw.org/chemical-weapons-convention/annexes/annex-chemicals/annex-chemicals
https://doi.org/10.1021/acs.jcim.3c01617
https://doi.org/10.1186/s13321-020-00456-1
https://doi.org/10.1002/cphc.202300548
https://doi.org/10.26434/chemrxiv-2024-znppb-v2
https://doi.org/10.26434/chemrxiv-2024-znppb-v2
https://doi.org/10.1021/mp0700209
https://doi.org/10.1093/database/baae113
https://doi.org/10.1016/j.medidd.2021.100081
https://doi.org/10.1021/acs.jcim.9b01014
https://doi.org/10.1021/acs.jcim.9b01014
https://doi.org/10.1186/s13321-020-00445-4
https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files
https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files
https://doi.org/10.1093/nar/gkae1063
https://doi.org/10.3389/fddsv.2023.1222655
https://doi.org/10.1038/s41598-021-92825-5
https://clinicaltrials.gov/
https://clinicaltrials.gov/
https://doi.org/10.1002/minf.202400305
https://doi.org/10.1111/1541-4337.70068
https://doi.org/10.3390/ph16121649
https://doi.org/10.1093/nar/gkaa891
https://doi.org/10.1080/17568919.2024.2342203
https://doi.org/10.1080/17568919.2024.2342203
https://doi.org/10.1093/nar/gkae946
https://doi.org/10.1093/nar/gkae946
https://doi.org/10.1021/acs.jcim.6b00129
https://doi.org/10.1021/acs.jcim.6b00129
https://doi.org/10.1021/acs.jcim.4c01560
https://doi.org/10.1021/acs.jcim.4c01560
https://doi.org/10.1002/cpz1.226
https://doi.org/10.1002/cpz1.226
https://doi.org/10.1016/j.isci.2020.101681
https://doi.org/10.1021/acs.jcim.2c01216
https://doi.org/10.1038/s41597-024-04051-z
https://doi.org/10.1038/s41597-024-04051-z
https://doi.org/10.1136/bmjopen-2018-026652
https://doi.org/10.1186/s13321-024-00935-9
https://doi.org/10.1093/database/baae051
https://doi.org/10.3389/fchem.2022.1090643
https://doi.org/10.3389/fchem.2022.1090643
https://doi.org/10.1093/nar/gkae1059
https://doi.org/10.3390/ph18050617
https://doi.org/10.3390/ph18050617
https://doi.org/10.1093/nar/gkad976
https://doi.org/10.1021/acs.jafc.3c00909
https://doi.org/10.1021/acs.jafc.3c00909
https://doi.org/10.26434/chemrxiv-2025-6zh2h
https://doi.org/10.26434/chemrxiv-2025-6zh2h
https://doi.org/10.1021/acs.jcim.5c00347
https://doi.org/10.3390/biom13010176
https://doi.org/10.1016/j.drudis.2022.05.005
https://doi.org/10.1039/d3dd00098b
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289


Lyu, J., Wang, S., Balius, T. E., Singh, I., Levit, A., Moroz, Y. S., et al. (2019). Ultra-large
library docking for discovering new chemotypes. Nature 566 (7743), 224–229. doi:10.
1038/s41586-019-0917-9

Lžičař, M., and Gamouh, H. (2024). CHEESE: 3D shape and electrostatic virtual
screening in a vector space. ChemRxiv 2024. doi:10.26434/chemrxiv-2024-cswth

Manallack, D. T. (2007). The pKa distribution of drugs: application to drug discovery.
Perspect. Med. Chem. 1, 1177391X0700100003. doi:10.1177/1177391x0700100003

Manallack, D. T., Prankerd, R. J., Nassta, G. C., Ursu, O., Oprea, T. I., and Chalmers,
D. K. (2013). A chemogenomic analysis of ionization constants—Implications for drug
discovery. ChemMedChem 8 (2), 242–255. doi:10.1002/cmdc.201200507

Martinez-Mayorga, K., and Medina-Franco, J. L. (2014). Foodinformatics:
applications of chemical information to food chemistry. Springer. doi:10.1007/978-3-
319-10226-9

Medina-Franco, J. L., Naveja, J. J., and López-López, E. (2019). Reaching for the bright
StARs in chemical space. Drug disc. Today 24 (11), 2162–2169. doi:10.1016/j.drudis.
2019.09.013

Medina-Franco, J. L., Rodríguez-Pérez, J. R., Cortés-Hernández, H. F., and López-
López, E. (2024). Rethinking the “best method” paradigm: the effectiveness of hybrid
and multidisciplinary approaches in chemoinformatics. Artif. Intell. Life Sci. 6, 100117.
doi:10.1016/j.ailsci.2024.100117

Medina-Franco, J. L., Chávez-Hernández, A. L., López-López, E., and Saldívar-
González, F. I. (2022). Chemical multiverse: an expanded view of chemical space.
Mol. Inf. 41 (11), 2200116. doi:10.1002/minf.202200116

Meggers, E. (2007). Exploring biologically relevant chemical space with metal
complexes. Curr. Op. Chem. Biol. 11 (3), 287–292. doi:10.1016/j.cbpa.2007.05.013

Nguyen, K. T., Blum, L. C., van Deursen, R., and Reymond, J. (2009). Classification of
organic molecules by molecular quantum numbers. ChemMedChem 4 (11), 1803–1805.
doi:10.1002/cmdc.200900317

Orsi, M., and Reymond, J. (2024). Navigating a 1E+60 chemical space of peptide/
peptoid oligomers. Mol. Inf. 44 (1), e202400186. doi:10.1002/minf.202400186

Perebyinis, M., and Rognan, D. (2022). Overlap of on-demand ultra-large
combinatorial spaces with on-the-shelf drug-like libraries. Mol. Inf. 42 (1), 2200163.
doi:10.1002/minf.202200163

Price, E., Weinheimer, M., Rivkin, A., Jenkins, G., Nijsen, M., Cox, P. B., et al. (2024).
Beyond rule of five and PROTACs in modern drug discovery: polarity reducers,
chameleonicity, and the evolving physicochemical landscape. J. Med. Chem. 67 (7),
5683–5698. doi:10.1021/acs.jmedchem.3c02332

Reymond, J.-L. (2025). Chemical space as a unifying theme for chemistry.
J. Cheminformatics 17 (1), 6. doi:10.1186/s13321-025-00954-0

Rigden, D. J., and Fernández, X. M. (2025). The 2025 nucleic acids research database
issue and the online molecular biology database collection. Nucleic Acids Res. 53 (D1),
D1–D9. doi:10.1093/nar/gkae1220

Schaub, J., Zielesny, A., Steinbeck, C., and Sorokina, M. (2021). Description and
analysis of glycosidic residues in the largest open natural products database.
Biomolecules 11 (4), 486. doi:10.3390/biom11040486

Schmidt, T., Samaras, P., Frejno, M., Gessulat, S., Barnert, M., Kienegger, H., et al.
(2017). ProteomicsDB. Nucleic Acids Res. 46 (D1), D1271-D1281–D1281. doi:10.1093/
nar/gkx1029

Shankavaram, U. T., Varma, S., Kane, D., Sunshine, M., Chary, K. K., Reinhold, W. C.,
et al. (2009). CellMiner: a relational database and query tool for the NCI-60 cancer cell
lines. BMC Genomics 10 (1), 277. doi:10.1186/1471-2164-10-277

Sharma, A., Saha, B. K., Kumar, R., and Varadwaj, P. K. (2021). OlfactionBase: a
repository to explore odors, odorants, olfactory receptors and odorant–receptor
interactions. Nucleic Acids Res. 50 (D1), D678–D686. doi:10.1093/nar/gkab763

Sincere, N. I., Anand, K., Ashique, S., Yang, J., and You, C. (2023). PROTACs:
emerging targeted protein degradation approaches for advanced druggable strategies.
Molecules 28 (10), 4014. doi:10.3390/molecules28104014

Sosnin, S. (2025). Chemical space visual navigation in the era of deep learning and big
data. Drug Discov. Today 30 (7), 104392. doi:10.1016/j.drudis.2025.104392

Srivastava, A., Pike, A., Swedrowska, M., Nash, S., and Grime, K. (2025). In vitro
ADME profiling of PROTACs: successes, challenges, and lessons learned from analysis
of clinical protacs from a diverse physicochemical space. J. Med. Chem. 68 (9),
9584–9593. doi:10.1021/acs.jmedchem.5c00358

Tihányi, J., Horváthová, E., Fábelová, L., Murínová, Ľ. P., Sisto, R., Moleti, A., et al.
(2025). Environmental ototoxicants: an update. Environ. Sci. Pollut. Res. 32, 8629–8642.
doi:10.1007/s11356-025-36230-9

Torchet, R., Druart, K., Ruano, L. C., Moine-Franel, A., Borges, H., Doppelt-Azeroual,
O., et al. (2021). The iPPI-DB initiative: a community-centered database of
protein–protein interaction modulators. Bioinformatics 37 (1), 89–96. doi:10.1093/
bioinformatics/btaa1091

Valle-Núñez, G., Cedillo-González, R., Avellaneda-Tamayo, J. F., Saldívar-González,
F. I., Prado-Romero, D. L., and Medina-Franco, J. L. (2025). Machine learning-driven
antiviral libraries targeting respiratory viruses. Digit. Discov. 4, 1239–1258. doi:10.1039/
d5dd00037h

Viarengo-Baker, L. A., Brown, L. E., Rzepiela, A. A., and Whitty, A. (2021). Defining
and navigating macrocycle chemical space. Chem. Sci. 12, 4309–4328. doi:10.1039/
d0sc05788f

Viayna, A., Matamoros, P., Blázquez-Ruano, D., and Zamora, W. J. (2024). From
canonical to unique: extension of a lipophilicity scale of amino acids to non-standard
residues. Explor Drug Sci. 2, 389–407. doi:10.37349/eds.2024.00053

Virshup, A. M., Contreras-García, J., Wipf, P., Yang, W., and Beratan, D. N. (2013).
Stochastic voyages into uncharted chemical space produce a representative library of all
possible drug-like compounds. J. Am. Chem. Soc. 135 (19), 7296–7303. doi:10.1021/
ja401184g

Warr, W. A., Nicklaus, M. C., Nicolaou, C. A., and Rarey, M. (2022). Exploration of
ultralarge compound collections for drug discovery. J. Chem. Inf. Model. 62 (9),
2021–2034. doi:10.1021/acs.jcim.2c00224

Wassermann, A. M., Lounkine, E., Hoepfner, D., Le Goff, G., King, F. J., Studer, C.,
et al. (2015). Dark chemical matter as a promising starting point for drug lead discovery.
Nat. Chem. Biol. 11, 958–966. doi:10.1038/nchembio.1936

Whitty, A., and Zhou, L. (2015). Horses for courses: reaching outside drug-like
chemical space for inhibitors of challenging drug targets. Fut. Med. Chem. 7 (9),
1093–1095. doi:10.4155/fmc.15.56

Wigh, D. S., Goodman, J. M., and Lapkin, A. A. (2022). A review of molecular
representation in the age of machine learning. WIREs Comp. Mol. Sci. 12 (5), e1603.
doi:10.1002/wcms.1603

Williams, A. J., and Richard, A. M. (2025). Three pillars for ensuring public access and
integrity of chemical databases powering cheminformatics. J. Cheminf. 17, 40. doi:10.
1186/s13321-025-00983-9

Williams, R. V., Amberg, A., Brigo, A., Coquin, L., Giddings, A., Glowienke, S., et al.
(2016). It’s difficult, but important, to make negative predictions. Regul. Toxicol.
Pharmacol. 76, 79–86. doi:10.1016/j.yrtph.2016.01.008

Ye, G. (2024). De novo drug design as GPT language modeling: large chemistry
models with supervised and reinforcement learning. J. Comput.-Aided Mol. Des. 38, 20.
doi:10.1007/s10822-024-00559-z

Zamora, W. J., Curutchet, C., Campanera, J. M., and Luque, F. J. (2017). Prediction of
pH-dependent hydrophobic profiles of small molecules from miertus–scrocco–tomasi
continuum solvation calculations. J. Phys. Chem. B 121 (42), 9868–9880. doi:10.1021/
acs.jpcb.7b08311

Zamora, W. J., Campanera, J. M., and Luque, F. J. (2019). Development of a
structure-based, ph-dependent lipophilicity scale of amino acids from continuum
solvation calculations. J. Phys. Chem. Lett. 10 (4), 883–889. doi:10.1021/acs.
jpclett.9b00028

Zdrazil, B. (2025). Fifteen years of ChEMBL and its role in cheminformatics and drug
discovery. J. Cheminf. 17, 32. doi:10.1186/s13321-025-00963-z

Zdrazil, B., Felix, E., Hunter, F., Manners, E. J., Blackshaw, J., Corbett, S., et al. (2023).
The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity
data types and time periods. Nuc. Acids Res. 52 (D1), D1180–D1192. doi:10.1093/nar/
gkad1004

Zhang, X., Betzi, S., Morelli, X., and Roche, P. (2014). Focused chemical
libraries – design and enrichment: an example of protein–protein interaction
chemical space. Fut. Med. Chem. 6 (11), 1291–1307. doi:10.4155/fmc.14.57

Zhang, Y., Lorsbach, B. A., Castetter, S., Lambert, W. T., Kister, J., Wang, N. X., et al.
(2018). Physicochemical property guidelines for modern agrochemicals. Pest Manag.
Sci. 74, 1979–1991. doi:10.1002/ps.5037

Zin, P. P. K., Williams, G. J., and Ekins, S. (2020). Cheminformatics analysis and
modeling with macrolactoneDB. Sci. Rep. 10, 6284. doi:10.1038/s41598-020-63192-4

Frontiers in Drug Discovery frontiersin.org07

Medina-Franco et al. 10.3389/fddsv.2025.1674289

https://doi.org/10.1038/s41586-019-0917-9
https://doi.org/10.1038/s41586-019-0917-9
https://doi.org/10.26434/chemrxiv-2024-cswth
https://doi.org/10.1177/1177391x0700100003
https://doi.org/10.1002/cmdc.201200507
https://doi.org/10.1007/978-3-319-10226-9
https://doi.org/10.1007/978-3-319-10226-9
https://doi.org/10.1016/j.drudis.2019.09.013
https://doi.org/10.1016/j.drudis.2019.09.013
https://doi.org/10.1016/j.ailsci.2024.100117
https://doi.org/10.1002/minf.202200116
https://doi.org/10.1016/j.cbpa.2007.05.013
https://doi.org/10.1002/cmdc.200900317
https://doi.org/10.1002/minf.202400186
https://doi.org/10.1002/minf.202200163
https://doi.org/10.1021/acs.jmedchem.3c02332
https://doi.org/10.1186/s13321-025-00954-0
https://doi.org/10.1093/nar/gkae1220
https://doi.org/10.3390/biom11040486
https://doi.org/10.1093/nar/gkx1029
https://doi.org/10.1093/nar/gkx1029
https://doi.org/10.1186/1471-2164-10-277
https://doi.org/10.1093/nar/gkab763
https://doi.org/10.3390/molecules28104014
https://doi.org/10.1016/j.drudis.2025.104392
https://doi.org/10.1021/acs.jmedchem.5c00358
https://doi.org/10.1007/s11356-025-36230-9
https://doi.org/10.1093/bioinformatics/btaa1091
https://doi.org/10.1093/bioinformatics/btaa1091
https://doi.org/10.1039/d5dd00037h
https://doi.org/10.1039/d5dd00037h
https://doi.org/10.1039/d0sc05788f
https://doi.org/10.1039/d0sc05788f
https://doi.org/10.37349/eds.2024.00053
https://doi.org/10.1021/ja401184g
https://doi.org/10.1021/ja401184g
https://doi.org/10.1021/acs.jcim.2c00224
https://doi.org/10.1038/nchembio.1936
https://doi.org/10.4155/fmc.15.56
https://doi.org/10.1002/wcms.1603
https://doi.org/10.1186/s13321-025-00983-9
https://doi.org/10.1186/s13321-025-00983-9
https://doi.org/10.1016/j.yrtph.2016.01.008
https://doi.org/10.1007/s10822-024-00559-z
https://doi.org/10.1021/acs.jpcb.7b08311
https://doi.org/10.1021/acs.jpcb.7b08311
https://doi.org/10.1021/acs.jpclett.9b00028
https://doi.org/10.1021/acs.jpclett.9b00028
https://doi.org/10.1186/s13321-025-00963-z
https://doi.org/10.1093/nar/gkad1004
https://doi.org/10.1093/nar/gkad1004
https://doi.org/10.4155/fmc.14.57
https://doi.org/10.1002/ps.5037
https://doi.org/10.1038/s41598-020-63192-4
https://www.frontiersin.org/journals/drug-discovery
https://www.frontiersin.org
https://doi.org/10.3389/fddsv.2025.1674289

	On the biologically relevant chemical space: BioReCS
	1 Introduction
	2 BioReCS
	2.1 Current view
	2.2 Heavily explored chemical subspaces
	2.3 Underexplored chemical subspaces
	2.3.1 Dark regions of the underexplored BioReCS


	3 Common considerations to explore BioReCS
	3.1 Towards universal descriptors
	3.2 pH-dependent chemical space
	3.3 De novo generated libraries: expanding the BioReCS
	3.4 Developing novel computational approaches

	4 Summary
	Author contributions
	Funding
	Acknowledgments
	Conflict of interest
	Generative AI statement
	Publisher’s note
	References