Fast, low-memory detection and localization of large, polymorphic inversions from SNPs

dc.creator: Nowling, Ronald J.
dc.creator: Fallas Moya, Fabián
dc.creator: Sadovnik, Amir
dc.creator: Emrich, Scott
dc.creator: Aleck, Matthew
dc.creator: Leskiewicz, Daniel
dc.creator: Peters, John G.
dc.date.accessioned: 2025-06-20T16:16:03Z
dc.date.issued: 2022-01-20
dc.description.abstract: Large (>1 Mb), polymorphic inversions have substantial impacts on population structure and maintenance of genotypes. These large inversions can be detected from single nucleotide polymorphism (SNP) data using unsupervised learning techniques like PCA. Construction and analysis of a feature matrix from millions of SNPs requires a large amount of memory and limits the sizes of data sets that can be analyzed. We propose using feature hashing to construct a feature matrix from a VCF file of SNPs in order to reduce memory usage. The matrix is constructed in a streaming fashion such that the entire VCF file is never loaded into memory at one time. When evaluated on Anopheles mosquito and Drosophila fly data sets, our approach reduced memory usage by 97% with minimal reductions in accuracy for inversion detection and localization tasks. With these changes, inversions in larger data sets can be analyzed easily and efficiently on common laptop and desktop computers. Our method is publicly available through our open-source inversion analysis software, Asaph.
dc.description.procedence: UCR::Sedes Regionales::Sede del Atlántico
dc.description.procedence: UCR::Vicerrectoría de Docencia::Ingeniería::Facultad de Ingeniería::Escuela de Ciencias de la Computación e Informática
dc.description.sponsorship: National Science Foundation/[IIS-1947257]/NSF/United States
dc.identifier.doi: https://doi.org/10.7717/peerj.12831
dc.identifier.issn: 2167-8359
dc.identifier.pmid: 35116204
dc.identifier.uri: https://hdl.handle.net/10669/102344
dc.language.iso: eng
dc.rights: open access
dc.source: PeerJ, 10, Article e12831
dc.subject: principal component analysis
dc.subject: PCA
dc.subject: chromosomal inversions
dc.subject: feature hashing
dc.subject: single nucleotide polymorphisms
dc.subject: SNP
dc.subject: open-source software
dc.subject: Asaph
dc.subject: bioinformatics
dc.title: Fast, low-memory detection and localization of large, polymorphic inversions from SNPs
dc.type: original article
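
The abstract describes building a feature matrix from a VCF file with feature hashing, one record at a time. The Python sketch below illustrates that general idea only. It is not Asaph's implementation: the function names, the MD5-based hash, the signed-bucket scheme, the "chrom:pos:allele" feature key, and the tiny in-memory VCF text are all assumptions made for illustration.

```python
import hashlib
import io


def hash_feature(key, n_buckets):
    """Map a feature name to (column index, sign) using a stable hash.

    hashlib is used instead of the built-in hash() because the latter
    is salted per process for strings. (Illustrative choice.)
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "little")
    column = (value >> 1) % n_buckets
    sign = 1 if value & 1 else -1
    return column, sign


def hashed_matrix_from_vcf(lines, n_buckets=1024):
    """Stream VCF text lines into a samples x n_buckets matrix.

    Only one record is held at a time; the matrix width is fixed by
    n_buckets rather than by the number of SNPs in the file.
    """
    matrix = None
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns on this line
        chrom, pos = fields[0], fields[1]
        genotypes = fields[9:]
        if matrix is None:
            matrix = [[0.0] * n_buckets for _ in genotypes]
        for row, sample in zip(matrix, genotypes):
            # The GT value is assumed to be the first sample subfield.
            gt = sample.split(":")[0].replace("|", "/")
            for allele in gt.split("/"):
                if allele == ".":
                    continue  # missing call
                column, sign = hash_feature(f"{chrom}:{pos}:{allele}", n_buckets)
                row[column] += sign
    return matrix if matrix is not None else []


# Hypothetical three-sample VCF used only to exercise the sketch.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n"
    "2L\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\n"
    "2L\t250\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t./.\t1|1\n"
)
features = hashed_matrix_from_vcf(example_vcf, n_buckets=16)
```

The resulting matrix has one row per sample and a fixed number of columns, so it could be passed to a PCA routine regardless of how many SNPs the file contains. Distinct features may collide in the same bucket; the random sign is a common way to keep such collisions from biasing the column sums.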

Files

Original bundle

Name: Artículo_peerJ_2022.pdf
Size: 3.04 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.5 KB
Format: Item-specific license agreed upon to submission