BactiSeq: Databases

Database Configuration

Table of contents

  1. Database Configuration
    1. Introduction
    2. Default Database Versions
    3. Database Parameters

Introduction

Databases are fundamental to genomic analysis, and BactiSeq’s customizable database parameters provide flexibility that broadens the pipeline’s applicability across diverse research contexts. This functionality enables researchers to:

  • Utilize specific databases to maintain consistency with previously executed analyses
  • Optimize database size and scope when computational resources are limited
  • Customize analyses for specific organisms or research questions

The pipeline automatically downloads the default databases and stores them in the path specified by --db_path. If no path is given, they are saved to the current directory under Databases/.
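
As a minimal sketch, the storage location could also be set in nextflow.config rather than on the command line. This assumes the pipeline declares its parameters in the standard Nextflow `params` scope; the directory shown is a hypothetical example.

```groovy
// nextflow.config (fragment)
params {
    // Hypothetical directory: databases downloaded by the pipeline would be
    // stored here instead of ./Databases in the current directory.
    db_path = "/data/bactiseq_databases"
}
```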

Default Database Versions

When no custom database paths are provided, BactiSeq downloads the following default versions:

  1. Bakta database - v1.10.4
  2. AMRFinderPlus database - v3.12.8
  3. CheckM2 database - v14897628
  4. RGI database - v6.0.3
  5. Busco database - v5.8.3 (bacteria_odb10)
  6. Kraken2 database - 8GB standard database
  7. Gambit database - v1.0

Database Parameters

| Parameter | Description | Default Value | Resources |
|---|---|---|---|
| db_path | Database storage directory for databases downloaded through pipeline | "./Databases" | custom local path |
| bakta_db | Path to Bakta annotation database | null | https://zenodo.org/records/14916843 |
| amr_db | Path to AMRFinderPlus database | null | https://github.com/ncbi/amr/wiki/AMRFinderPlus-database |
| card_db | Path to RGI database for AMR genes | null | https://github.com/arpcard/rgi/blob/master/docs/rgi_load.rst |
| checkm2_db | Path to CheckM2 database | null | https://zenodo.org/records/14897628 |
| checkm2_ver | Version of database that will be downloaded if no path is given | 14897628 | https://zenodo.org/records/14897628 |
| busco_db_type | Type of database that the Busco database contains | "bacteria_odb10" | https://busco.ezlab.org/ |
| busco_db | Path to Busco database | null | https://busco.ezlab.org/ |
| kraken2_db | Path to Kraken2 database | null | https://benlangmead.github.io/aws-indexes/k2 |
| gambit_db | Path to gambit database | null | https://gambit-genomics.readthedocs.io/en/latest/databases.html#database-releases |

If a different version of a database is needed, the path to the manually downloaded database can be set in the pipeline's nextflow.config file. This allows researchers to use a specific database to maintain consistency with previously executed analyses, or to optimize database size and scope when computational resources are limited.
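
For illustration, pointing the pipeline at manually downloaded Bakta and Kraken2 databases might look like the sketch below. The parameter names are those listed under Database Parameters; the directory paths are hypothetical, and the use of the `params` scope is an assumption about how the pipeline's nextflow.config is organized.

```groovy
// nextflow.config (fragment)
params {
    // Hypothetical local paths to manually downloaded databases
    bakta_db   = "/data/db/bakta/db"
    kraken2_db = "/data/db/kraken2/standard_8gb"

    // Left at the documented default (null), so the pipeline downloads
    // its default CheckM2 database instead of using a local copy.
    checkm2_db = null
}
```

Parameters that are not overridden keep their defaults, so only the databases that need to be pinned have to appear in the file.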

⚠️ WARNING: If you switch Bakta to a light database, you must also change the Bakta line in subworkflows/local/databasedownload/main.nf so that it detects db-light instead of just db.


