DNA sequencing performance in Go, C++, and Java
Pascal Costanza (imec, Belgium)
February 3, 2018 at FOSDEM 2018 - Go Devroom
While Go is not primarily designed for parallel programming, it nevertheless has features that benefit parallelism, especially a work-stealing scheduler for goroutines and a concurrent, parallel garbage collector. For this reason, we recently included Go as one of several candidate programming languages in an evaluation of their suitability for expressing sequencing pipelines. The other languages we evaluated were C++ and Java. Go hits a sweet spot: it performs very close to the best results with little programming effort and few compromises in terms of safety and generality. This talk presents highlights of these experiments and the most important insights.

A DNA sequencer takes a DNA sample, such as human tissue, and applies chemical processes to read small fragments of the sample, which it outputs as large files. These files are then fed into software pipelines that, among other things, reconstruct the original DNA sequence from those fragments. Such sequencing pipelines need large amounts of storage, on disk and/or in RAM, and can benefit strongly from parallel execution to improve runtime performance. Data sets for human DNA samples are usually on the order of several hundred GB of uncompressed data, and runtimes are typically on the order of several hours for a single sample.
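To make the idea of running a pipeline step in parallel with goroutines concrete, here is a minimal sketch. It is not taken from the talk or from any real sequencing pipeline: the function names `countGC` and `processReads`, the toy per-read computation (counting G and C bases), and the sample reads are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// countGC is a hypothetical stand-in for a per-read pipeline step:
// it returns the number of G and C bases in one read.
func countGC(read string) int {
	n := 0
	for i := 0; i < len(read); i++ {
		switch read[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return n
}

// processReads applies countGC to every read using a fixed number of
// worker goroutines. Each worker writes only to the result slots whose
// indices it received, so no two goroutines write to the same element.
func processReads(reads []string, workers int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(reads))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = countGC(reads[i])
			}
		}()
	}

	// Hand out read indices, then signal that no more work is coming.
	for i := range reads {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Made-up example reads; real inputs would be read from sequencer output files.
	reads := []string{"ACGT", "GGCC", "ATAT"}
	fmt.Println(processReads(reads, runtime.NumCPU())) // prints [2 4 0]
}
```

The Go runtime schedules the worker goroutines across the available cores, so the code does not manage threads directly. A real pipeline would likely hand out reads in larger batches to reduce channel overhead.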

