Command Line Data Upload Installation and Setup

Batch upload your raw data from your machine or local server

Installation and Setup

Step 1

Download the bx tool. Downloads are available for Linux and macOS. If you need Windows support for command line upload, please contact support. The macOS version only works on macOS 11.3.1 and above; to run the tool on an older version of macOS, see the Docker-based instructions below. The Linux version should work on all distributions. The tool is only available for the AMD64 architecture; ARM, including Apple M1, is not supported. If you are downloading the tool from a command line on a Linux server, you can use:


curl -fsSLO https://storage.googleapis.com/biobox-public/cli/v2.0.0/linux-amd64/bx
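
Because the tool is only built for AMD64 on Linux and macOS, it can help to check the platform before downloading. The following is a minimal sketch, not part of bx itself: the function name bx_platform_supported is hypothetical, and it assumes that uname -m reports x86_64 (or amd64) on AMD64 machines. It does not check the macOS version.

```shell
# Hypothetical helper: succeeds only for the OS/architecture pairs the text lists as supported.
# Both arguments are optional and default to what `uname` reports for the current machine.
bx_platform_supported() {
    os=${1:-$(uname -s)}
    arch=${2:-$(uname -m)}
    case "$os" in
        Linux|Darwin) ;;          # Darwin is what uname reports on macOS
        *) return 1 ;;
    esac
    case "$arch" in
        x86_64|amd64) return 0 ;; # AMD64
        *) return 1 ;;            # e.g. arm64 / aarch64
    esac
}
```

For example, bx_platform_supported || echo "bx does not support this platform" prints a warning on an unsupported machine.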

Step 2

Using the command line, navigate to the location where you downloaded the bx tool. For example, on macOS, if your username is “john” and you downloaded the tool into the Downloads folder, you would type

cd ~/Downloads or cd /Users/john/Downloads

into the Terminal.app program. Once you are in the folder that contains bx, make the file executable: type chmod +x bx into the terminal and press Enter.

Step 3

You may wish to add the location of the bx tool to your terminal’s $PATH environment variable. Otherwise, you will always need to type the full path when using the tool, like

~/Downloads/bx (instead of just bx) 

If the tool is in your Downloads folder, you would run

export PATH="$PATH:$HOME/Downloads"  or
export PATH="$PATH:/Users/john/Downloads" 

at the beginning of every terminal session in which you intend to use the tool. To make this permanent, add that export line to the end of your shell configuration file, likely

~/.zshrc (zsh is the default shell on recent macOS versions) or ~/.bash_profile or ~/.bashrc or ~/.profile

and restart your terminal application.
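
If you script this step, it is worth making sure the export line is only added once. Below is a minimal sketch under that assumption; the helper name add_path_line is hypothetical, and you would pass whichever configuration file your shell actually reads.

```shell
# Hypothetical helper: append an export line for DIR to the shell config FILE,
# unless exactly that line is already present.
add_path_line() {
    file=$1
    dir=$2
    line="export PATH=\"\$PATH:$dir\""
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, add_path_line ~/.bash_profile "$HOME/Downloads" can be run repeatedly without creating duplicate lines.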


 

Required Environment Variables

For each terminal session, you must set these environment variables before uploading. Each environment variable is set by running export <VARIABLE_NAME>=xxx in your terminal.

  1. "BX_ORG" is your org ID. You can retrieve it by copying the first segment of the URL when on any page within the organization. An example org ID looks like bx-1f264133-d3ac-4542-b7af-b3434b28a21c.

    export BX_ORG=xxx
  2. "BX_TOKEN" is a private credential that allows anyone who has it to upload data into your organization. KEEP IT SECRET. The token lasts for one month; after that, attempts to upload files with it will fail and you must generate a new token. To retrieve the token, visit the Account tab within your account page and click the “GET API TOKEN” button.

Copy the entire string, being mindful that the text may scroll horizontally.

Then set it with

export BX_TOKEN=<pasted token> 

(not including < and >).

On macOS, you can also use

export BX_TOKEN=$(pbpaste) 

once the token is in your clipboard. Then verify the token is correct with

echo $BX_TOKEN

"BX_JSON" is optional and only needs to be set if you do not have write permissions in the folder you are running bx from. Set "BX_JSON" to a folder where you have write permissions; the file will be stored at

$BX_JSON/biobox.json

This file is used to keep track of upload progress should a file upload fail partially. For example,

export BX_JSON=/Users/john/Documents/
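
Since these variables must be set again in every terminal session, a quick check before uploading can save a failed run. The following is a minimal sketch; check_bx_env is a hypothetical helper, not a bx command, and it only checks that the two required variables are non-empty in the current shell (it does not check that they were exported or that their values are valid).

```shell
# Hypothetical pre-flight check: report each required variable that is unset or empty.
# Returns 0 if both are set, 1 otherwise.
check_bx_env() {
    missing=0
    for name in BX_ORG BX_TOKEN; do
        # Look up the variable whose name is stored in $name (POSIX-compatible indirection).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_bx_env && bx library data upload-batch example.tsv only starts the upload when both variables are set.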

Performing a Batch File Upload

Batch upload takes a TSV (tab-separated values) file with four columns: filename, filetype, experiment, and technical replicate. Do not include a header row in the TSV itself. Each file path can be either a bare filename or a full path containing directories. Every / within a path will be converted to _.

An example input TSV (referred to hereafter as example.tsv) looks like (no column headers):

MB1_1.fastq.gz	FASTQ	P1 RNA	1
MB1_2.fastq.gz	FASTQ	P1 RNA	1
MB2_1.fastq.gz	FASTQ	P2 RNA	1
MB2_2.fastq.gz	FASTQ	P2 RNA	2
aligned.bam	BAM	P3 DNA

The first column is the path to the file itself, relative to where you are running bx from. You may use filenames or full file paths, but be aware that full paths will have the / directory separators converted into _ in the resulting entity display name and filename property.
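
To preview what a path will look like after the separators are converted, you can reproduce the replacement locally. This is an illustration of the rule as described here, not the tool's own code; flatten_path is a hypothetical name, and bx may treat other characters differently.

```shell
# Illustration only: replace every / in a path with _, as the text describes.
flatten_path() {
    printf '%s\n' "$1" | tr '/' '_'
}

flatten_path "data/run1/MB1_1.fastq.gz"   # prints data_run1_MB1_1.fastq.gz
```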

The second column is the type of file, which must be exactly FASTQ or BAM.

The third column is the name of the experiment to link this data record with.

The fourth column is the technical replicate number. You can only set this for FASTQ files.
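
Text editors often replace tabs with spaces, which breaks the column layout when an experiment name contains a space. One way to avoid this is to write the file with printf, which emits a real tab for each \t. The sketch below writes a two-row example.tsv in the current directory; the file and experiment names are only placeholders taken from the example above, using the .fastq.gz extension that the validations require.

```shell
# Write example.tsv with real tab characters between columns and no header row.
{
    # FASTQ row: filename, filetype, experiment, technical replicate
    printf '%s\t%s\t%s\t%s\n' MB1_1.fastq.gz FASTQ 'P1 RNA' 1
    # BAM row: no technical replicate column
    printf '%s\t%s\t%s\n' aligned.bam BAM 'P3 DNA'
} > example.tsv
```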

You can run the upload with:

bx library data upload-batch example.tsv

Options

Pass --overwrite to replace the raw files stored in cloud object storage. By default, files that already exist are skipped. Overwriting is only permitted if the filename and experiment name match exactly.

Example usage:

bx library data upload-batch --overwrite example.tsv

The default behaviour (same as not passing any options):

bx library data upload-batch --skip-duplicates example.tsv

Exit Codes

The program exits with exit code 1 if any validation fails or if an upload fails for any reason while in progress. Once a single upload fails, the remaining uploads are skipped.
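
In a script, the exit code can be used to decide what happens next. The following is a minimal sketch; run_and_report is a hypothetical wrapper that runs whatever command it is given, and the messages are placeholders.

```shell
# Hypothetical wrapper: run the given command and report the outcome based on its exit code.
run_and_report() {
    if "$@"; then
        echo "upload finished"
    else
        status=$?   # exit code of the command that just failed
        echo "upload failed (exit code $status)" >&2
        return "$status"
    fi
}
```

For example, run_and_report bx library data upload-batch example.tsv prints a failure message and returns the same non-zero code when a validation or upload fails.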

Validations

Validations run against the filename column:

  • File at path exists

  • File at path is non-empty (does not have a size of 0 bytes)

  • For FASTQ,

    • File ends in .fastq.gz or .fq.gz or .fastq.gzipped or .fq.gzipped

  • For BAM,

    • File ends in .bam

Validations run against the filetype column:

  • Must be one of FASTQ, BAM

Validations run against the experiment column:

  • An experiment whose display name matches the column value must exist

Validations run against the technical replicate column:

  • Replicate should not already be assigned by another FASTQ

Validations run after the above:

  • No duplicate URIs are present (URIs are computed from the experiment name and the filename, with / in the path replaced)

Validations run when checking whether the file/entity already exists:

  • bx confirms that a data record of the appropriate type (FASTQ or BAM) exists whose display name matches the filename column value, AND that this data record is linked to an experiment whose display name exactly matches the experiment column value. Unless both conditions hold, --overwrite will not overwrite the raw file.
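
Some of the validations above (file exists, file is non-empty, file extension, filetype value) can be checked locally before running bx. The following is a minimal sketch under that assumption; precheck_tsv is a hypothetical helper, not part of bx, and it cannot check experiments, replicates, or duplicate URIs, which depend on data in your organization.

```shell
# Hypothetical pre-check mirroring the local validations listed above.
# Prints one message per problem and returns 1 if any were found, 0 otherwise.
precheck_tsv() {
    tsv=$1
    problems=0
    tab=$(printf '\t')
    # Split each row on tabs only, so experiment names may contain spaces.
    while IFS="$tab" read -r path type experiment replicate || [ -n "$path" ]; do
        [ -n "$path" ] || continue   # skip blank lines
        case "$type" in
            FASTQ)
                case "$path" in
                    *.fastq.gz|*.fq.gz|*.fastq.gzipped|*.fq.gzipped) ;;
                    *) echo "$path: not a gzipped FASTQ file name"; problems=1 ;;
                esac ;;
            BAM)
                case "$path" in
                    *.bam) ;;
                    *) echo "$path: does not end in .bam"; problems=1 ;;
                esac ;;
            *) echo "$path: filetype must be FASTQ or BAM"; problems=1 ;;
        esac
        # -s is true only if the file exists and is larger than 0 bytes
        [ -s "$path" ] || { echo "$path: missing or empty"; problems=1; }
    done < "$tsv"
    return "$problems"
}
```

For example, precheck_tsv example.tsv && bx library data upload-batch example.tsv starts the upload only if the local checks pass. Run it from the same folder you run bx from, since paths are resolved relative to the current directory.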

Using Docker

If you have Docker installed, you can build an image and use the Linux version of the tool on a macOS device. This is useful if your macOS version is older than 11.3.1 and the macOS tool does not work on your machine. Create a file called Dockerfile with these contents:

FROM ubuntu:20.04

# Install curl and CA certificates (needed for the HTTPS download), then clear the apt cache
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Download the Linux bx binary and make it executable
RUN curl -fsSL -o /usr/local/bin/bx https://storage.googleapis.com/biobox-public/cli/v2.0.0/linux-amd64/bx && \
    chmod +x /usr/local/bin/bx

ENV LC_ALL=C.UTF-8 LANG=C.UTF-8

ENTRYPOINT [ "/usr/local/bin/bx" ]

Then run:

docker build -t bx .

To use the tool through Docker, export all the same environment variables as above (or set them here) and use:

docker run -it -e BX_ORG=$BX_ORG -e BX_TOKEN=$BX_TOKEN -v "$(pwd)":/data -w /data bx library data upload-batch example.tsv

Note the "-v $(pwd):/data" option, which mounts the current directory on the host machine at "/data" inside the container. Likewise, "-w /data" sets the working directory for the bx command inside the container to "/data". To make this more convenient, run:

alias bx='docker run -it -e BX_ORG=$BX_ORG -e BX_TOKEN=$BX_TOKEN -v "$(pwd)":/data -w /data bx'

Then you can use "bx" as normal, as if it were a simple binary.
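
Aliases are normally not expanded inside bash scripts, so if you call the tool from a script, a shell function may be more convenient. The following is a minimal sketch of the same docker run command as a function. The DOCKER variable is a hypothetical addition that lets you substitute another program for docker (for example DOCKER=echo to print the command instead of running it); everything else mirrors the command above.

```shell
# Hypothetical wrapper function: forwards all arguments to bx inside the container.
# Set DOCKER=echo for a dry run that only prints the docker arguments.
bx() {
    "${DOCKER:-docker}" run -it \
        -e BX_ORG="$BX_ORG" -e BX_TOKEN="$BX_TOKEN" \
        -v "$(pwd)":/data -w /data \
        bx "$@"
}
```

After defining the function, bx library data upload-batch example.tsv behaves like the alias.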


Common Errors

  • TokenError:

ValueError: tokens posted are invalid

If you see the error above, your JWT cannot be authenticated. This can happen for a number of reasons, including an expired token or a mistake in setting the environment variable. You can always generate a new token using the steps above. Use

echo $BX_TOKEN 

to confirm that the variable has been set correctly.
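
If you prefer not to print the secret to the screen, you can check its shape instead. The following is a minimal sketch that assumes the token is a signed JWT in the standard compact form, that is, three non-empty segments separated by dots with no whitespace. The helper name looks_like_jwt is hypothetical, and the check does not verify the signature or the expiry date.

```shell
# Hypothetical sanity check for a copy/paste mistake in BX_TOKEN.
# Succeeds only for three non-empty, dot-separated segments without whitespace.
looks_like_jwt() {
    case "$1" in
        *[[:space:]]*) return 1 ;;   # stray space or newline from pasting
        *.*.*.*)       return 1 ;;   # more than two dots
        ?*.?*.?*)      return 0 ;;   # exactly three non-empty segments
        *)             return 1 ;;
    esac
}
```

For example, looks_like_jwt "$BX_TOKEN" || echo "BX_TOKEN looks truncated or malformed" flags a token that was only partially copied.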

  • Call to gateway failed

    raise Exception(e)
Exception: Call to gateway failed

This error means that your authenticated user failed to communicate with the BioBox API. The most likely cause is a missing or incorrect BX_ORG. Use

echo $BX_ORG 

to check the value against your org ID, retrieved as described under Required Environment Variables.
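
A format check can also catch a truncated or mistyped value. The sketch below assumes that every org ID follows the pattern of the example given earlier (bx- followed by a lowercase UUID); that pattern is inferred from a single example and may not hold for all organizations. The helper name looks_like_org_id is hypothetical.

```shell
# Hypothetical format check: "bx-" followed by a lowercase hexadecimal UUID (8-4-4-4-12).
looks_like_org_id() {
    printf '%s\n' "$1" |
        grep -Eq '^bx-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}
```

For example, looks_like_org_id "$BX_ORG" || echo "BX_ORG does not match the expected format" warns about a value that was cut off when copying.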