Kérwá
 

Benchmarking genome assemblers for four bacterial models based on contiguity, correctness, and completeness

dc.creator: Rojas Miranda, Hanzel Jesús
dc.creator: Madrigal Ly, Vanessa
dc.creator: Molina Mora, José Arturo
dc.date.accessioned: 2025-11-17T15:48:15Z
dc.date.issued: 2025
dc.description.abstract: De novo genome assembly allows an organism's genome to be reconstructed without a reference sequence. Assembly results depend on the sequencing technology, as different technologies generate data with differing fidelity, read lengths, and coverage levels, and on the performance of a wide variety of assembly algorithms. These attributes produce a diversity of assemblies for each genome, which collectively define the pan-assembly. In this study, we aimed to benchmark the pan-assemblies of the prokaryotic models Brucella henselae, Escherichia coli, Pseudomonas aeruginosa, and Xylella fastidiosa, evaluating how these attributes affect the metrics of the 3C criterion (contiguity, correctness, and completeness) in order to select the best conditions for de novo genome assembly. Results showed that short-read assembly strategies achieved higher accuracy with fewer errors (high correctness) and a high degree of completeness, but lower contiguity because the assemblies were fragmented. In contrast, long-read strategies showed high contiguity but lower completeness and accuracy. The hybrid strategy yielded the best overall results across all parameters by leveraging the strengths of both technologies. Among assembly algorithms, Unicycler performed best on the 3C metrics with every strategy: short reads (compared to Megahit), long reads (compared to Canu), and hybrid data (compared to Wengan). Overall, the hybrid approach with Unicycler proved to be the best general approach for genome assembly of the four bacterial models. Finally, increasing coverage depth did not significantly affect assembly quality as long as a minimum amount of data was maintained, indicating that high-quality assemblies can be achieved with moderate coverage.
Together, the pan-assembly results provide working conditions for de novo genome assembly that can be applied to bacterial models of interest, guiding the selection of optimized experimental and bioinformatics conditions while reducing the sequencing costs of generating high-quality sequences.
dc.description.procedence: UCR::Vice-Rectorate for Research::Research Units::Health Sciences::Centro de Investigación en Enfermedades Tropicales (CIET)
dc.description.procedence: UCR::Vice-Rectorate for Research::Research Units::Health Sciences::Centro de Investigación en Hematología y Trastornos Afines (CIHATA)
dc.description.sponsorship: Universidad de Costa Rica/[803-C1163]/UCR/Costa Rica
dc.description.sponsorship: Universidad de Costa Rica/[803-C4604]/UCR/Costa Rica
dc.identifier.codproyecto: 803-C1163
dc.identifier.codproyecto: 803-C4604
dc.identifier.uri: https://hdl.handle.net/10669/103234
dc.language.iso: eng
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.source: Universidad de Costa Rica
dc.subject: pan-assembly
dc.subject: prokaryotes
dc.subject: de novo genome assembly
dc.subject: 3C criterion
dc.subject: benchmarking
dc.title: Benchmarking genome assemblers for four bacterial models based on contiguity, correctness, and completeness
dc.type: preprint
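The abstract attributes the lower contiguity of short-read assemblies to fragmentation. One widely used way to quantify contiguity is N50 (the length of the shortest contig among the longest contigs that together cover at least half of the assembly) and its companion L50 (how many contigs that takes). The abstract does not state which contiguity metrics the study used, so the following is only a minimal sketch under that assumption: the function name `n50_l50` and the contig lengths are hypothetical and are not data from the study.

```python
from typing import Sequence


def n50_l50(contig_lengths: Sequence[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths in bp.

    N50: length of the shortest contig such that contigs of that length or
    longer cover at least half of the total assembly size.
    L50: number of contigs needed to reach that half.
    """
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    if any(length <= 0 for length in contig_lengths):
        raise ValueError("contig lengths must be positive")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for count, length in enumerate(sorted(contig_lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    # Unreachable for non-empty input: running eventually equals total.
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical contig lengths (bp) for a ~5 Mb genome; not results from the study.
    fragmented = [1_200_000, 900_000, 800_000, 600_000, 500_000,
                  400_000, 300_000, 200_000, 100_000]
    contiguous = [4_900_000, 100_000]
    print(n50_l50(fragmented))  # (800000, 3)
    print(n50_l50(contiguous))  # (4900000, 1)
```

Both hypothetical assemblies total 5 Mb, but the fragmented one needs three contigs to reach half the genome (small N50, larger L50), while the near-closed one reaches it with a single contig. N50 measures contiguity only and says nothing about correctness or completeness, which is why a criterion such as 3C also weighs those two dimensions.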

Files

Original bundle

Name: 13-11 Maintext Pan-ensamblaje v2ok.pdf
Size: 348.67 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.5 KB
Format: Item-specific license agreed to upon submission