Batch upload your raw data from your machine or local server
Installation and Setup
Download the bx tool. Downloads are available for Linux and MacOS. If you need Windows support for command-line upload, please contact support. The MacOS version only works on MacOS 11.3.1 and above; to run the tool on an older version of MacOS, see the Docker-based instructions below. The Linux version should work on all distributions. The tool is only available for the AMD64 architecture; ARM (including Apple M1) is not supported. If you are downloading the tool from a command line on a server, you can use
curl -fsSLO https://storage.googleapis.com/biobox-public/cli/v2.0.0/linux-amd64/bx
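Before downloading, it can help to confirm that the machine matches the supported platforms listed above. The following is a minimal sketch, not part of the bx tool: the function name bx_platform_supported is hypothetical, and it assumes uname reports x86_64 or amd64 on AMD64 machines.

```shell
# Returns 0 when the OS/architecture pair matches the platforms listed
# above (Linux or MacOS on AMD64), 1 otherwise.
bx_platform_supported() {
    os=$1
    arch=$2
    case "$os" in
        Linux|Darwin) ;;          # Darwin is what uname -s prints on MacOS
        *) return 1 ;;
    esac
    case "$arch" in
        x86_64|amd64) return 0 ;;
        *) return 1 ;;            # for example arm64 or aarch64
    esac
}

# uname -s prints the OS name, uname -m the machine architecture.
if bx_platform_supported "$(uname -s)" "$(uname -m)"; then
    echo "platform looks supported"
else
    echo "platform is not covered by the prebuilt bx downloads" >&2
fi
```

This check does not cover the minimum MacOS version, which still needs to be verified separately.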
Wherever you have downloaded the bx tool, navigate to this location using the command line. For example on MacOS if your username is “john”, and you downloaded into the Downloads folder, you would type
cd ~/Downloads or cd /Users/john/Downloads
into the Terminal.app program. Once you are in the folder where you downloaded bx, make the file executable: type chmod +x bx into the terminal and press enter.
You may wish to add the location of the bx tool into your terminal’s $PATH environment variable - otherwise when using the tool you will always need to type the full path, like
~/Downloads/bx (instead of just bx)
If the tool is in your Downloads folder, you would run
export PATH="$PATH:$HOME/Downloads"
at the beginning of every terminal session in which you intend to use the tool. To make this permanent, add that export line to the end of your shell configuration file, likely
~/.bash_profile or ~/.bashrc or ~/.profile
and restart your terminal application.
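If you script this step, it is worth avoiding duplicate export lines when the script runs more than once. A minimal sketch, assuming a POSIX shell; the helper name add_path_line is hypothetical and not part of bx:

```shell
# Append an export line for DIR to FILE unless the identical line is
# already present, so repeated runs do not pile up duplicates.
add_path_line() {
    dir=$1
    file=$2
    line="export PATH=\"\$PATH:$dir\""
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (not run here); pick whichever configuration file your shell reads:
# add_path_line "$HOME/Downloads" "$HOME/.bashrc"
```

The helper only edits the file; the change takes effect in new terminal sessions.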
Required Environment Variables
For each terminal session, you must set these environment variables before uploading. Each environment variable is set by running export <VARIABLE_NAME>=xxx in your terminal.
"BX_ORG" is your org ID. You can retrieve it by copying the first segment of the URL on any page within the organization. An example org ID looks like bx-1f264133-d3ac-4542-b7af-b3434b28a21c.
"BX_TOKEN" is a private credential that allows anyone with it to upload data into your organization. KEEP IT SECRET. The token lasts for one month; after that, uploads using it will fail and you will need to generate a new token. The token can be retrieved by visiting the Account tab within your account page and clicking the “GET API TOKEN” button:
Copy the entire string, being mindful of text scrolling horizontally:
Then set it with
export BX_TOKEN=<pasted token>
(not including < and >).
On MacOS you can also use
once the token is in your clipboard. Then verify the token is correct with
"BX_JSON" only needs to be set if you do not have write permissions in the folder you are running bx from. Set "BX_JSON" to a folder where you have write permissions; the file will end up being stored at
This file is used to keep track of upload progress should a file upload fail partially. For example,
Performing a Batch File Upload
Batch upload takes a TSV file with the columns filename, filetype, experiment, and technical replicate. Do not include a header row in the TSV itself. File paths can be either relative paths or full paths containing directories. All / within a path will be converted to _.
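To preview the name a path will end up with, the / to _ conversion can be reproduced locally. This is a sketch of the rule as described here, not the tool's own code; the function name display_name is hypothetical.

```shell
# Convert a path to the name described above: every / becomes _.
display_name() {
    printf '%s\n' "$1" | tr '/' '_'
}

display_name "run1/MB1_1.fastq.gz"   # prints run1_MB1_1.fastq.gz
```

Note that under this rule a full path starting with / produces a name starting with _.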
An example input TSV (referred to hereafter as example.tsv) looks like this (columns are separated by tabs, with no column headers):
MB1_1.fastq	FASTQ	P1 RNA	1
MB1_2.fastq	FASTQ	P1 RNA	1
MB2_1.fastq	FASTQ	P2 RNA	1
MB2_2.fastq	FASTQ	P2 RNA	2
aligned.bam	BAM	P3 DNA
The first column is the path to the file itself, relative to where you are running bx from. You may use filenames or full file paths, but be aware that full paths will have the / directory separators converted into _ in the resulting entity display name and filename property.
The second column is the type of file, which must be exactly FASTQ or BAM.
The third column is the name of the experiment to link this data record with.
The fourth column is the technical replicate number. You can only set this for FASTQ files.
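Because experiment names can contain spaces, a file that uses spaces instead of tabs between columns is easy to produce by accident. The column rules above can be checked locally before uploading. A minimal sketch using awk; the function name check_tsv_shape is hypothetical, and it assumes the replicate column is optional for FASTQ rows, since the text only says it cannot be set for other types.

```shell
# Check each row for column count, file type, and replicate placement.
# Prints one message per problem and returns non-zero if any were found.
check_tsv_shape() {
    awk -F '\t' '
        NF < 3 || NF > 4 {
            printf "line %d: expected 3 or 4 tab-separated columns, got %d\n", NR, NF
            bad = 1
            next
        }
        $2 != "FASTQ" && $2 != "BAM" {
            printf "line %d: filetype must be FASTQ or BAM, got \"%s\"\n", NR, $2
            bad = 1
        }
        $2 != "FASTQ" && NF == 4 && $4 != "" {
            printf "line %d: technical replicate can only be set for FASTQ\n", NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example (not run here). printf with \t guarantees real tab separators:
# printf '%s\t%s\t%s\t%s\n' MB1_1.fastq FASTQ 'P1 RNA' 1 > example.tsv
# check_tsv_shape example.tsv
```

This only checks the shape of the file; whether the experiment exists can only be checked by the tool itself.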
You can run the upload with:
bx library data upload-batch example.tsv
Pass --overwrite to replace the raw files stored in cloud object storage. The default flag is --skip-exists. Overwriting will only be permitted if filename and experiment name exactly match.
bx library data upload-batch --overwrite example.tsv
The default behaviour (same as not passing any options):
bx library data upload-batch --skip-duplicates example.tsv
The program will exit with exit code 1 if any validation fails or if an upload fails for any reason while in progress. Once a single upload fails, the remaining uploads are skipped.
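In a script, the exit code is what tells you whether the batch completed. A minimal sketch of a wrapper that reports the outcome and passes the exit code on; the name run_and_report is hypothetical, and the wrapper works with any command passed as arguments.

```shell
# Run the given command, report how it ended, and return its exit code.
run_and_report() {
    if "$@"; then
        echo "upload finished"
    else
        status=$?
        echo "upload stopped with exit code $status" >&2
        return "$status"
    fi
}

# Example (not run here):
# run_and_report bx library data upload-batch example.tsv
```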
Validations run against the name column:
File at path exists
File at path is non-empty (does not have a size of 0 bytes)
For FASTQ, file ends in .fastq.gz or .fq.gz or .fastq.gzipped or .fq.gzipped
For BAM, file ends in .bam
Validations run against the filetype column:
Must be one of FASTQ, BAM
Validations run against the experiment column:
An experiment whose display name equals the column value must exist
Validations run against the technical replicate column:
Replicate should not already be assigned by another FASTQ
Validations run after the above:
No duplicate URIs present (which are computed based on experiment name, filename, and replacing / in the path)
Validations run when checking if file/entity already exists:
Will confirm that a data record of the appropriate type (FASTQ or BAM) exists with the display name as the name column value, AND that exactly the same experiment with the display name as the experiment column value exists as a link from that data record. If both of these do not exactly match up, --overwrite will not overwrite the raw file.
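The file-level checks in the list above (exists, non-empty, extension) can be run locally before starting a long upload. A minimal sketch under two assumptions: that the FASTQ extensions apply to FASTQ rows and .bam to BAM rows, and that paths in the TSV are relative to the current directory. The function names precheck_file and precheck_tsv are hypothetical.

```shell
# Check one file against the filename rules. Usage: precheck_file PATH FILETYPE
precheck_file() {
    path=$1
    filetype=$2
    [ -f "$path" ] || { echo "$path: file not found" >&2; return 1; }
    [ -s "$path" ] || { echo "$path: file is empty" >&2; return 1; }
    case "$filetype:$path" in
        FASTQ:*.fastq.gz|FASTQ:*.fq.gz|FASTQ:*.fastq.gzipped|FASTQ:*.fq.gzipped) ;;
        BAM:*.bam) ;;
        *) echo "$path: extension does not match filetype $filetype" >&2; return 1 ;;
    esac
}

# Run the check for every row of a TSV; returns non-zero if any row fails.
precheck_tsv() {
    failed=0
    # Split rows on tabs only, so experiment names with spaces stay intact
    while IFS="$(printf '\t')" read -r row_path row_type _rest || [ -n "$row_path" ]; do
        precheck_file "$row_path" "$row_type" || failed=1
    done < "$1"
    return "$failed"
}

# Example (not run here):
# precheck_tsv example.tsv && bx library data upload-batch example.tsv
```

Unlike the tool, which stops at the first failed upload, this sketch reports every failing row in one pass.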
If the MacOS tool does not work on your machine because your MacOS version is below 11.3, and you have Docker installed, you can build an image and use the Linux version of the tool instead. Create a file called Dockerfile with these contents:
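# Assumption: the base image line is not given here; a Debian-based image is assumed because apt is used below
FROM debian:bookworm-slim
# Assumption: the package list is not given here; curl and ca-certificates are what the download step below needs
RUN apt update && apt install -y --no-install-recommends \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*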
RUN curl -fsSL -o /usr/local/bin/bx https://storage.googleapis.com/biobox-public/cli/v2.0.0/linux-amd64/bx && \
chmod +x /usr/local/bin/bx
ENV LC_ALL=C.UTF-8 LANG=C.UTF-8
ENTRYPOINT [ "/usr/local/bin/bx" ]
Then build the image from the folder containing the Dockerfile:
docker build -t bx .
To use the tool through Docker, export all the same environment variables as above (or set them here) and use:
docker run -it -e BX_ORG=$BX_ORG -e BX_TOKEN=$BX_TOKEN -v "$(pwd)":/data -w /data bx library data upload-batch example.tsv
Note the "-v $(pwd):/data" option: it mounts the current directory on the host machine at "/data" inside the container. Likewise, "-w /data" sets the working directory for the bx command inside the container to "/data". To make this more convenient, run:
alias bx='docker run -it -e BX_ORG=$BX_ORG -e BX_TOKEN=$BX_TOKEN -v "$(pwd)":/data -w /data bx'
Then you can use "bx" like normal, as if it was a simple binary.
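Aliases are normally only expanded in interactive shells, and docker run -t fails when no terminal is attached (for example under cron or in a pipeline). For scripted use, a shell function is one alternative. This is a sketch, not part of the official instructions: the function name bx_docker and the DOCKER override variable are hypothetical, and the image is assumed to be tagged bx as in the build step above.

```shell
# Run bx inside the container, mounting the current directory at /data.
# Set DOCKER=echo to print the docker arguments instead of running them.
bx_docker() {
    tty_flag=-i
    # Request a pseudo-terminal only when stdin really is a terminal
    if [ -t 0 ]; then
        tty_flag=-it
    fi
    "${DOCKER:-docker}" run "$tty_flag" \
        -e BX_ORG="$BX_ORG" -e BX_TOKEN="$BX_TOKEN" \
        -v "$(pwd)":/data -w /data \
        bx "$@"
}

# Example (not run here):
# bx_docker library data upload-batch example.tsv
```

Because only the current directory is mounted, every file referenced in the TSV has to live in or below the directory you run the function from.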
ValueError: tokens posted are invalid
If you see the error above, your JWT could not be authenticated. This can happen for a number of reasons, including an expired token or a mistake in setting the environment variable. You can always generate a new token using the steps above. Use
to confirm that the variable has been set correctly.
Call to gateway failed
Exception: Call to gateway failed
This error occurs because your authenticated user failed to communicate with the BioBox API. The most likely cause is that BX_ORG is unset or set incorrectly. Use
to confirm your orgID against the ID provided in the user profile page outlined in the first step.
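Both errors above come down to inspecting BX_TOKEN and BX_ORG. One way to do this without printing the secret token is sketched below. The function name check_bx_env is hypothetical, and the format warning assumes every org ID follows the same bx-<uuid> pattern as the example ID shown earlier, which is not confirmed here.

```shell
# Report whether the variables bx needs are set, without printing the token.
check_bx_env() {
    problems=0
    if [ -z "${BX_ORG:-}" ]; then
        echo "BX_ORG is not set" >&2
        problems=1
    else
        echo "BX_ORG=$BX_ORG"
        # Assumption: org IDs look like bx- followed by a lowercase UUID
        if ! printf '%s\n' "$BX_ORG" |
            grep -Eq '^bx-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
            echo "warning: BX_ORG does not look like bx-<uuid>" >&2
        fi
    fi
    if [ -z "${BX_TOKEN:-}" ]; then
        echo "BX_TOKEN is not set" >&2
        problems=1
    else
        # Show only the length so the secret stays out of scrollback and logs
        echo "BX_TOKEN is set (${#BX_TOKEN} characters)"
    fi
    return "$problems"
}

# Example (not run here):
# check_bx_env
```

A variable assigned without export is visible to this function but not to bx, so if the check passes and the error persists, confirm that both variables were set with export.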