SRA Toolkit
The SRA Toolkit is a set of tools for working with data from the Sequence Read Archive (SRA). The two main tools used in this project are prefetch, which downloads runs by accession number, and fasterq-dump, which converts them to FASTQ files.
Download
The SRA Toolkit can be downloaded from GitHub. Its Docker image is available on Docker Hub (main, mirror).
Usage
prefetch
# Template
prefetch [SRA Accession Number]
# Example
prefetch SRR21470609
To download multiple runs, list the SRA accession numbers in a text file, one per line, and pass that file to prefetch with the --option-file option.
prefetch --option-file sra.txt
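Accession lists exported from spreadsheets or web pages often carry blank lines, stray whitespace, Windows line endings, or duplicates. The sketch below shows one way to tidy such a list before passing it to prefetch. The function name clean_accessions and the input file name raw_list.txt are hypothetical and are not part of the SRA Toolkit.

```shell
# Hypothetical helper (not part of the SRA Toolkit): normalise a list of
# accession numbers read from stdin and write the result to stdout.
# - removes carriage returns left by Windows line endings
# - trims leading and trailing spaces and tabs
# - drops blank lines and repeated accessions, keeping the original order
clean_accessions() {
    tr -d '\r' | awk '{
        gsub(/^[ \t]+|[ \t]+$/, "")
        if ($0 != "" && !seen[$0]++) print
    }'
}

# Usage, assuming raw_list.txt is your unprocessed list:
#   clean_accessions < raw_list.txt > sra.txt
#   prefetch --option-file sra.txt
```

The function only reformats text; it does not check that each accession actually exists in the archive.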
fasterq-dump
# Template
fasterq-dump --split-files [SRA Accession Number]
# Example
fasterq-dump --split-files SRR21470609
For a paired-end run such as this one, the command should produce two files, SRR21470609_1.fastq and SRR21470609_2.fastq, containing the first and second read of each pair.
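Before moving on to downstream steps, it can help to confirm that both files were actually written. The following is a minimal sketch, assuming the _1.fastq and _2.fastq naming described above; the function name check_paired_output is hypothetical, and the check only tests that the files exist and are non-empty, not that their contents are valid.

```shell
# Hypothetical helper: succeed only if both paired-end FASTQ files for an
# accession exist and are non-empty.
#   $1: SRA accession number
#   $2: directory holding the FASTQ files (defaults to the current directory)
check_paired_output() {
    cpo_acc=$1
    cpo_dir=${2:-.}
    for cpo_mate in 1 2; do
        cpo_file="$cpo_dir/${cpo_acc}_${cpo_mate}.fastq"
        if [ ! -s "$cpo_file" ]; then
            echo "missing or empty: $cpo_file" >&2
            return 1
        fi
    done
    return 0
}

# Usage:
#   check_paired_output SRR21470609 && echo "both read files present"
```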
To convert multiple runs, loop over the text file of SRA accession numbers in a Bash script. In the script below, $1 is the path to that file.
# $1: text file with one SRA accession number per line
sratoolkit/prefetch --option-file "$1"
while IFS= read -r sra; do
    sratoolkit/fasterq-dump --split-files "$sra"
done < "$1"
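The loop above can be made somewhat more defensive. The sketch below wraps it in a function that checks the list is readable, skips blank lines, handles a final line without a trailing newline, and reports conversions that fail instead of stopping silently. The names process_accessions, PREFETCH and FASTERQ_DUMP are assumptions introduced for this example, not toolkit settings; the two variables default to the sratoolkit/ paths used in the script above.

```shell
# Assumed variable names (not SRA Toolkit settings): paths to the two tools.
PREFETCH=${PREFETCH:-sratoolkit/prefetch}
FASTERQ_DUMP=${FASTERQ_DUMP:-sratoolkit/fasterq-dump}

# Hypothetical wrapper: download and convert every accession listed in a file.
#   $1: text file with one SRA accession number per line
process_accessions() {
    pa_list=$1
    if [ ! -r "$pa_list" ]; then
        echo "cannot read accession list: $pa_list" >&2
        return 1
    fi

    # Download all runs first, as in the original script.
    "$PREFETCH" --option-file "$pa_list" || return 1

    # IFS= and -r keep each line intact; the || clause also processes a
    # last line that has no trailing newline.
    while IFS= read -r pa_sra || [ -n "$pa_sra" ]; do
        [ -z "$pa_sra" ] && continue    # skip blank lines
        # Redirect stdin as a precaution so the tool cannot consume the list.
        "$FASTERQ_DUMP" --split-files "$pa_sra" < /dev/null \
            || echo "conversion failed: $pa_sra" >&2
    done < "$pa_list"
}

# Usage:
#   process_accessions sra.txt
```

A failed conversion is logged and the loop continues with the next accession, so one bad run does not block the rest of the list.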