Installation#

Install MicroCAT#

MicroCAT runs on Python 3.8 and above. We provide several installation methods:

Pip Installation#

Use pip to quickly install microcat from PyPI:

pip install microcat

Then install the software required to run microcat, or pass the --use-conda option at run time to build the runtime environment automatically (see microcat's official documentation).

Docker Image#

The Docker image is still under construction.

To verify the installation, run microcat --help in a terminal. If the following output is displayed, MicroCAT has been installed successfully:

!microcat --help
Usage: microcat [OPTIONS] COMMAND [ARGS]...

          ███╗   ███╗██╗ ██████╗██████╗  ██████╗  ██████╗ █████╗ ████████╗
          ████╗ ████║██║██╔════╝██╔══██╗██╔═══██╗██╔════╝██╔══██╗╚══██╔══╝
          ██╔████╔██║██║██║     ██████╔╝██║   ██║██║     ███████║   ██║
          ██║╚██╔╝██║██║██║     ██╔══██╗██║   ██║██║     ██╔══██║   ██║
          ██║ ╚═╝ ██║██║╚██████╗██║  ██║╚██████╔╝╚██████╗██║  ██║   ██║
          ╚═╝     ╚═╝╚═╝ ╚═════╝╚═╝  ╚═╝ ╚═════╝  ╚═════╝╚═╝  ╚═╝   ╚═╝
          Microbiome Identification upon Cell Resolution from Omics-
          Computational Analysis Toolbox

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  config      Quickly adjust microcat's default configurations
  debug       Execute the analysis workflow on debug mode.
  download    Download necessary files for running microcat
  init        Init microcat style analysis project
  path        Print out microcat install path
  run-local   Execute the analysis workflow on local computer mode
  run-remote  Execute the analysis workflow on remote cluster mode

Installation of tools for host read mapping and counting#

For the read mapping and UMI counting step, microcat offers pre-defined rules for either Cell Ranger or STARsolo. Neither tool can be installed via conda, so the one you choose must be installed separately. Only one of the two is needed, depending on your preferred method.

STAR version 2.7.9a or later is recommended (2.7.10a was the latest release as of August 2022). Recent releases can correctly process multi-mapping reads and add many important options and bug fixes.
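STAR version strings carry a letter suffix (for example 2.7.9a), so comparing them as plain text gives the wrong order: "2.7.10a" sorts before "2.7.9a". The following is a minimal sketch of a version check against the recommended minimum. The helper names (parse_star_version, star_version_ok, MIN_STAR_VERSION) are hypothetical and not part of microcat or STAR, and the sketch assumes the version string has the bare form printed by STAR --version.

```python
import re


def parse_star_version(text):
    """Turn a STAR version string such as '2.7.10a' into a comparable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised STAR version string: {text!r}")
    major, minor, patch, suffix = match.groups()
    # Numeric parts compare as integers, the optional letter compares as text.
    return (int(major), int(minor), int(patch), suffix)


# Recommended minimum taken from the paragraph above.
MIN_STAR_VERSION = parse_star_version("2.7.9a")


def star_version_ok(text):
    """Return True if the given STAR version is at least the recommended minimum."""
    return parse_star_version(text) >= MIN_STAR_VERSION
```

In practice the argument would be the text printed by STAR --version, captured however is convenient in your environment.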

To use settings that closely mimic those of Cell Ranger v4 or above (see explanations below, particularly the --clipAdapterType CellRanger4 option), STAR needs to be recompiled from source with make STAR CXXFLAGS_SIMD="-msse4.2" (see this issue for more info). If you get an Illegal instruction error, recompiling this way is the fix.

You can check from the command line whether Cell Ranger has been installed successfully:

!cellranger --version
cellranger cellranger-7.1.0

STARsolo is invoked through the STAR command line, so checking STAR is sufficient:

!STAR --version
2.7.11b

Note

The microcat workflow assumes that cellranger and STAR are already on the PATH environment variable, and it invokes both tools by their command names. In the future, microcat will support user-defined software paths, for example for software that becomes available only after module load on a high-performance computing cluster.
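Since the workflow calls the tools by name, a quick way to confirm they resolve is to look them up on PATH before starting a run. This is a minimal sketch using the standard library's shutil.which. The function names are hypothetical and not part of microcat; the which parameter exists only so the lookup can be replaced in tests.

```python
import shutil

# Only one of the two mappers is required, depending on the chosen method.
MAPPERS = ("cellranger", "STAR")


def find_missing_tools(tools, which=shutil.which):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def any_mapper_available(which=shutil.which):
    """Return True if at least one of cellranger or STAR resolves on PATH."""
    return len(find_missing_tools(MAPPERS, which)) < len(MAPPERS)


if __name__ == "__main__":
    for tool in MAPPERS:
        print(tool, "->", shutil.which(tool) or "not found on PATH")
```

On a cluster that relies on module load, the script should be run after the modules are loaded, in the same shell session that will launch microcat.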

Adapting/Integrating rules in Snakemake#

Snakemake is a Python-based workflow management system for building and executing pipelines. A pipeline is made up of "rules" that represent single steps of the analysis. In a YAML config file, parameters and rule-specific input can be adjusted to a new analysis without changing the rules. In a "master" snake file, the desired end points of the analysis are specified. With the input and the desired output defined, Snakemake is able to infer all steps that have to be performed in between.

To change one of the steps, e.g. to a different software tool, one can create a new rule, insert a new code block into the config file, and include the input/output directory of this step in the master snake file. It is important to make sure that the format of the input and output of each rule is compatible with the previous and the subsequent rule. For more detailed information please have a look at the excellent online documentation of Snakemake.
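To illustrate how the steps can be inferred from the desired output, and why a replacement rule must keep the same input and output, here is a toy model in plain Python. It is not Snakemake's API or implementation: the rule names, file names, and the plan function are all hypothetical, and the sketch ignores wildcards and cyclic dependencies.

```python
# Toy model only: each entry maps one output file to the rule that
# produces it and the input files that rule needs (all names hypothetical).
RULES = {
    "counts.tsv": {"rule": "count_umis", "inputs": ["aligned.bam"]},
    "aligned.bam": {"rule": "map_reads", "inputs": ["reads.fastq"]},
}


def plan(target, rules, available):
    """Return the rule names, in execution order, needed to produce `target`."""
    steps = []

    def visit(path):
        if path in available:
            return  # file already exists, nothing to run
        if path not in rules:
            raise KeyError(f"no rule produces {path!r}")
        for dep in rules[path]["inputs"]:
            visit(dep)  # make sure every input is planned first
        name = rules[path]["rule"]
        if name not in steps:
            steps.append(name)

    visit(target)
    return steps


# Swapping the mapping tool: the new rule keeps the same input and output,
# so the downstream counting rule still fits without changes.
SWAPPED = dict(RULES)
SWAPPED["aligned.bam"] = {"rule": "map_reads_alt", "inputs": ["reads.fastq"]}
```

Calling plan("counts.tsv", RULES, {"reads.fastq"}) walks backwards from the end point to the existing input and returns the mapping step followed by the counting step. With SWAPPED, only the mapping step changes.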

Install snakemake cluster profile#

Snakemake keeps cluster settings in profiles, so users can quickly download the configuration files for their cluster and use them to configure and automate job scheduling. For details, see the Snakemake documentation.

Note

Starting with Snakemake 8.0, the cluster interface was changed to a plugin-based mode. microcat does not yet support this mode, so it currently works only with Snakemake 7.x (at least version 7 and below version 8).
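A small check can catch an unsupported Snakemake release before a run fails on the cluster. The sketch below reads the version constraint as "major version 7", which is an interpretation of the note above. The function names are hypothetical and not part of microcat; importlib.metadata is in the standard library from Python 3.8 onward.

```python
from importlib.metadata import PackageNotFoundError, version


def snakemake_version_supported(version_text):
    """Return True for Snakemake 7.x (assumed reading of the supported range)."""
    major = int(version_text.split(".")[0])
    return major == 7


def installed_snakemake_version():
    """Return the installed Snakemake version string, or None if it is not installed."""
    try:
        return version("snakemake")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_snakemake_version()
    if found is None:
        print("snakemake is not installed in this environment")
    elif snakemake_version_supported(found):
        print(f"snakemake {found} is in the supported 7.x series")
    else:
        print(f"snakemake {found} is outside the supported 7.x series")
```

If the check fails, one option is to pin the version at install time with a pip specifier such as "snakemake>=7,<8".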

We recommend the generic Snakemake profile, which can configure tasks dynamically according to the resource requirements of the different task nodes.

Users can download the corresponding profile file through microcat download profile and configure it.

!microcat download profile -h
Usage: microcat download profile [OPTIONS]

  Download profile config from Github

  $ microcat download profile --cluster lsf

  $ microcat download profile --cluster slurm

  $ microcat download profile --cluster sge

Options:
  --cluster [slurm|sge|lsf]  Cluster workflow manager engine, now support
                             generic
  -h, --help                 Show this message and exit.