BIGDATA: F: DKA: DKM: Novel Out-of-core and Parallel Algorithms for Processing Biological Big Data


This project is funded by the NSF (award 1447711). Its major goals are: 1) to develop out-of-core algorithms for fundamental problems in Biological Big Data (BBD) processing, such as sorting, graph problems, data structures, sequence analysis, and clustering; 2) to develop out-of-core algorithms for macro-problems, including de novo genome and metagenome assembly, variant calling, and split-read mapping; and 3) to implement these out-of-core and parallel algorithms and release them as a software library.

Start Date: September 1, 2014

Participants:

Sanguthevar Rajasekaran (University of Connecticut)
Reda Ammar (University of Connecticut)
Jinbo Bi (University of Connecticut)
Joerg Graf (University of Connecticut)
Sartaj Sahni (University of Florida)
George Weinstock (Jackson Laboratory)
Yufeng Wu (University of Connecticut)

Publications

  • C. Chu, X. Li, and Y. Wu (2015). SpliceJumper: a classification-based approach for calling splicing junctions from RNAseq data. BMC Bioinformatics. 16 (Suppl 17:S).
  • C. Chu, R. Nielsen, and Y. Wu (2016). REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads. PLoS ONE. 11 (3), e0150719.
  • A.-A. Mamun, R. Aseltine, and S. Rajasekaran (2016). Efficient Record Linkage Algorithms Using Complete Linkage Clustering. PLoS ONE. 11 (4), e0154446.
  • A.-A. Mamun, S. Pal, and S. Rajasekaran (2016). KCMBT: a k-mer Counter based on Multiple Burst Trees. Bioinformatics.
  • A.-A. Mamun and S. Rajasekaran (2016). An efficient Minimum Spanning Tree algorithm. Proc. IEEE Symposium on Computers and Communications (ISCC).
  • J.N. Marden, E.A. McClure, L. Beka, and J. Graf (2016). Host Matters: Medicinal Leech Digestive Tract Symbionts and their Pathogenic Potential. Frontiers in Microbiology.
  • M. Nicolae, S. Pathak, and S. Rajasekaran (2015). LFQC: A lossless compression algorithm for FASTQ files. Bioinformatics.
  • M. Nicolae and S. Rajasekaran (2016). On pattern matching with k mismatches and few don't cares. Information Processing Letters 118.
  • S. Pal, S. Pathak, and S. Rajasekaran (2016). On Speeding Up Parallel Jacobi Iterations for SVDs. Proc. IEEE International Conference on High Performance Computing and Communications (HPCC).
  • S. Pal and S. Rajasekaran (2015). Improved algorithms for finding edit distance based motifs. IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
  • S. Pal, T. Xu, T. Yang, S. Rajasekaran, and J. Bi (2017). HybridDCA: Hybrid Distributed Asynchronous Stochastic Dual Coordinate Ascent. Submitted to JPDC.
  • S. Rajasekaran and M. Nicolae (2016). An error correcting parser for context free grammars that takes less than cubic time. 10th International Conference on Language and Automata Theory and Applications (LATA), Prague, Czech Republic.
  • S. Rajasekaran and S. Saha (2016). Efficient Algorithms for the Three Locus Problem in Genome-Wide Association Study. Proc. IEEE International Conference on Data Mining (ICDM).
  • S. Rajasekaran and S. Saha (2016). Efficient Algorithms for the Two Locus Problem in Genome-Wide Association Study. Proc. ACM International Conference on Information and Knowledge Management (CIKM).
  • S. Saha and S. Rajasekaran (2015). NRRC: A Non-referential Reads Compression Algorithm. International Symposium on Bioinformatics Research and Applications (ISBRA).
  • S. Saha and S. Rajasekaran (2015). EC: an efficient error correction algorithm for short reads. BMC Bioinformatics. 16 (Suppl 17:S).
  • S. Saha and S. Rajasekaran (2015). REFECT: A novel paradigm for correcting short reads. The 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB).
  • S. Saha and S. Rajasekaran (2015). ERGC: An efficient referential genome compression algorithm. Bioinformatics. 31 (21).
  • S. Saha and S. Rajasekaran (2016). POMP: A powerful splice mapper for RNAseq data. The 7th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB).
  • S. Saha and S. Rajasekaran (2016). NRGC: a novel referential genome compression algorithm. Bioinformatics. 32 (22).
  • J. Sun, H.R. Kranzler, and J. Bi (2015). Refining Multivariate Disease Phenotype for High Chip Heritability. BMC Medical Genomics. 8 (Suppl 3:S3).
  • J. Sun, J. Lu, T. Xu, and J. Bi (2015). Multi-view Sparse Co-clustering via Proximal Alternating Linearized Minimization. Journal of Machine Learning Research.
  • P. Xiao, S. Pal, and S. Rajasekaran (2016). qPMS10: A randomized algorithm for efficiently solving quorum Planted Motif Search problem. Proc. IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
  • T. Xu, J. Sun, and J. Bi (2015). Longitudinal LASSO: Jointly Learning Features and Temporal Contingency for Outcome Prediction. ACM International Conference on Knowledge Discovery and Data Mining. Sydney, Australia.
  • C. Zhao and S. Sahni (2016). Cache and energy efficient algorithms for Nussinov RNA folding. 6th IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS).
  • C. Zhao and S. Sahni (2017). Cache and energy efficient algorithms for RNA folding using Zuker's method. Submitted.