4 SQLite3
This chapter describes our general approach to using sqlite3
for pyani-plus
application data storage.
4.1 Overview
We have used sqlite3
for persistent storage of analysis results in previous versions of pyani
, and we will continue to do so.
The main issue we have faced is performance. In previous versions of pyani
there was a single operation to populate the results database serially with output from each analysis. This was a detriment to performance, as it required
- all comparisons to complete before the database was populated
- each comparison to be added in turn
Needing to wait for all comparisons to complete, especially for large analyses, sometimes left “dead time” on the server where most cores were not conducting comparisons, but no database population was taking place. This time could have been used more efficiently.
The need to populate the database serially did not take advantage of sqlite
’s ability to manage concurrent connections.
The behaviour we would like is that, immediately after each analysis is concluded, the result is added to the sqlite3
database, removing the need to wait for all analyses to conclude.
4.2 Proposal
When we run a pairwise comparison, we carry out the comparison using a wrapper script that conducts the comparison, and then populates the database, e.g.
#!/usr/bin/env bash
run_comparison sequence_A sequence_B -o output.tab # run pairwise comparison
populate_db dbpath output.tab # populate sqlite database with data from the output file
4.3 Test implementation
A test implementation is available under examples/sqlite/concurrency
.