SRA Toolkit

The SRA Toolkit is used to manipulate data from the Sequence Read Archive (SRA). The two main tools used in this project are prefetch, which downloads the data for an accession, and fasterq-dump, which converts that data to FASTQ files.

Download

The SRA Toolkit can be downloaded from GitHub. A Docker image is also available from Docker Hub (main, mirror).

Usage

prefetch

# Template
prefetch [SRA Accession Number]

# Example
prefetch SRR21470609

For multiple sequences, you can list the SRA accession numbers in a file, one per line, and pass that file to prefetch with the --option-file option.

prefetch --option-file sra.txt
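
The accession file can be written in any text editor or generated from the shell. The following is a minimal sketch, assuming the file is named sra.txt as in the command above. SRR00000000 is a made-up placeholder, not a real run, and should be replaced with your own accession numbers.

```shell
# Write one accession per line to sra.txt.
# SRR21470609 comes from the example above; SRR00000000 is a placeholder
# (assumption) standing in for a second accession of your own.
printf '%s\n' SRR21470609 SRR00000000 > sra.txt

# Sanity check: print the number of accessions listed in the file.
wc -l < sra.txt
```

Using printf with one argument per accession keeps each entry on its own line and ends the file with a newline, so line-based loops see every entry.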

fasterq-dump

# Template
fasterq-dump --split-files [SRA Accession Number]

# Example
fasterq-dump --split-files SRR21470609

For a paired-end run such as this one, the command should produce one pair of read files, SRR21470609_1.fastq and SRR21470609_2.fastq.
For multiple sequences, you can loop over the text file of SRA accession numbers in a Bash script.

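# $1: file listing one SRA accession number per line
sratoolkit/prefetch --option-file "$1"
while IFS= read -r sra; do
	# Skip blank lines so fasterq-dump is never called without an accession
	[ -n "$sra" ] || continue
	sratoolkit/fasterq-dump --split-files "$sra"
done < "$1"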
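
Once the loop finishes, it can be worth confirming that every accession produced both read files. The sketch below is one way to do that, assuming fasterq-dump wrote its output to the current directory using the _1.fastq and _2.fastq names shown earlier. The function name report_missing_pairs is hypothetical and is not part of the SRA Toolkit.

```shell
# Hypothetical helper (not part of the SRA Toolkit): print every accession
# from the list file whose paired FASTQ files are missing or empty.
# Assumes the files are in the current directory and named
# <accession>_1.fastq and <accession>_2.fastq.
report_missing_pairs() {
	list_file=$1
	# The "|| [ -n ... ]" part also handles a last line without a newline
	while IFS= read -r sra || [ -n "$sra" ]; do
		# Ignore blank lines in the accession list
		[ -n "$sra" ] || continue
		if [ ! -s "${sra}_1.fastq" ] || [ ! -s "${sra}_2.fastq" ]; then
			echo "$sra"
		fi
	done < "$list_file"
}
```

Calling report_missing_pairs with the same accession file that was passed to the script prints nothing when every pair is present, and otherwise lists the accessions that need to be rerun.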