Preprocess flow data

In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and compensate the data. Next, we apply different types of normalization to the data.

import readfcs
import pytometry as pm

%load_ext autoreload
%autoreload 2

Read the example data set that ships with the readfcs package.

path_data = readfcs.datasets.example()

adata = pm.io.read_fcs(path_data)

/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/fcsparser/api.py:440: UserWarning: The default channel names (defined by the $PnS parameter in the FCS file) were not unique. To avoid problems in downstream analysis, the channel names have been switched to the alternate channel names defined in the FCS file. To avoid seeing this warning message, explicitly instruct the FCS parser to use the alternate channel names by specifying the channel_naming parameter.
  warnings.warn(msg)

adata

AnnData object with n_obs × n_vars = 65016 × 16
    var: 'Channel Number', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
    uns: 'meta'

Reduce features

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the AnnData object. Notably, the function split_signal checks whether a feature name is FSC/SSC, or whether it ends with -A (area-related features) or -H (height-related features).
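The naming rule described above can be sketched in plain Python. This is an illustration only and not pytometry's implementation: the helper name classify_channel and its return labels are hypothetical, and the real split_signal may handle edge cases differently.

```python
def classify_channel(name: str) -> str:
    """Classify a channel name by the FSC/SSC and -A/-H naming rule.

    Hypothetical helper for illustration; it is not part of pytometry.
    """
    # Scatter channels are recognised by their FSC/SSC prefix.
    if name.startswith(("FSC", "SSC")):
        return "scatter"
    # Remaining channels are told apart by their suffix.
    if name.endswith("-A"):
        return "area"
    if name.endswith("-H"):
        return "height"
    return "other"


channels = ["FSC-A", "FSC-H", "SSC-A", "B515-A", "R780-A"]
labels = {name: classify_channel(name) for name in channels}
print(labels)
```

A name without a recognised prefix or suffix, such as a bare marker name like KI67, falls into the "other" bucket in this sketch, so the choice of column used for matching matters.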

Let us check the var_names of the features and the channel names. In this example, the var_names have been replaced by marker names, so none of them carries the -A or -H suffix; the suffixes survive only in the channel column.

adata.var

	Channel Number	channel	marker	$PnB	$PnR	$PnG
FSC-A	1	FSC-A		32	262207	1
FSC-H	2	FSC-H		32	262207	1
SSC-A	3	SSC-A		32	261588	1
KI67	4	B515-A	KI67	32	261588	1
CD3	5	R780-A	CD3	32	261588	1
CD28	6	R710-A	CD28	32	261588	1
CD45RO	7	R660-A	CD45RO	32	261588	1
CD8	8	V800-A	CD8	32	261588	1
CD4	9	V655-A	CD4	32	261588	1
CD57	10	V585-A	CD57	32	261588	1
CD14	11	V450-A	CD14	32	261588	1
CCR5	12	G780-A	CCR5	32	261588	1
CD19	13	G710-A	CD19	32	261588	1
CD27	14	G660-A	CD27	32	261588	1
CCR7	15	G610-A	CCR7	32	261588	1
CD127	16	G560-A	CD127	32	261588	1

We use the channel column of the adata.var data frame to split the matrix.

pm.pp.split_signal(adata, var_key="channel")

adata

AnnData object with n_obs × n_vars = 65016 × 13
    obs: 'FSC-A', 'FSC-H', 'SSC-A'
    var: 'Channel Number', 'channel', 'marker', '$PnB', '$PnR', '$PnG', 'signal_type'
    uns: 'meta'

The data matrix was reduced by three features (FSC-A, FSC-H and SSC-A).

Compensation

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

The compensate function matches the var_names of adata with the column names of the spillover matrix to compensate the correct channels.
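As a rough numerical picture of compensation: in the conventional model, the observed signal equals the true signal multiplied by the spillover matrix S (rows are fluorochromes, columns are detectors), so compensation multiplies the observed events by the inverse of S. The two-channel sketch below uses plain Python and made-up numbers. The function names invert_2x2 and compensate_events are hypothetical and this is not pytometry's code; a real panel has more channels and would use a linear algebra library.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists (illustrative helper)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate_events(events, spillover):
    """Return events multiplied by the inverse of the spillover matrix."""
    inverse = invert_2x2(spillover)
    return [
        [sum(row[k] * inverse[k][j] for k in range(2)) for j in range(2)]
        for row in events
    ]


# Made-up spillover: 10% of dye 1 leaks into detector 2,
# 5% of dye 2 leaks into detector 1.
spillover = [[1.0, 0.1], [0.05, 1.0]]

# Observed signals for two events that truly carry only one dye each.
observed = [[100.0, 10.0], [10.0, 200.0]]

compensated = compensate_events(observed, spillover)
print(compensated)
```

After compensation, each event shows signal only in the detector of the dye it actually carries, up to floating-point rounding.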

pm.pp.compensate(adata)

5499 NaN values found after compensation. Please adjust compensation matrix.
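When a message like the one above appears, a useful first step is to see which channels contain the NaN values. The sketch below counts them per channel on a small made-up table using only the standard library; count_nan_per_channel is a hypothetical helper, not a pytometry function. For a NumPy-backed adata.X, the equivalent would presumably be np.isnan(adata.X).sum(axis=0).

```python
import math


def count_nan_per_channel(rows, channel_names):
    """Count NaN entries per column of a row-major table of floats."""
    counts = dict.fromkeys(channel_names, 0)
    for row in rows:
        for name, value in zip(channel_names, row):
            if math.isnan(value):
                counts[name] += 1
    return counts


# Made-up events: three cells measured in two channels.
rows = [
    [1.0, float("nan")],
    [2.0, 3.0],
    [float("nan"), float("nan")],
]
nan_counts = count_nan_per_channel(rows, ["CD3", "CD8"])
print(nan_counts)
```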

Normalize data

In the next step, we normalize the data. By default, normalization is an in-place operation; a new AnnData object is created only if we set the argument inplace=False. We demonstrate three normalization methods that are built into pytometry:

arcsinh
logicle
bi-exponential

adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, inplace=False)
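For orientation, the arcsinh transform is commonly defined as arcsinh(x / cofactor): it is roughly linear for values well below the cofactor and roughly logarithmic far above it, and it accepts negative values. The sketch below applies that definition to a few made-up intensities using the standard library. arcsinh_transform is a hypothetical helper, and we assume, without verifying it here, that normalize_arcsinh follows the same definition.

```python
import math


def arcsinh_transform(values, cofactor=150.0):
    """Apply arcsinh(x / cofactor) to each value (illustrative helper)."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(value / cofactor) for value in values]


# Made-up intensities, including a negative value as can occur after compensation.
raw = [-150.0, 0.0, 150.0, 15000.0]
transformed = arcsinh_transform(raw, cofactor=150.0)
print(transformed)
```

A larger cofactor widens the near-linear region around zero, which is why the value is usually chosen per instrument type.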

adata_logicle = pm.tl.normalize_logicle(adata, inplace=False)

/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/pytometry/tools/_normalization.py:166: RuntimeWarning: invalid value encountered in double_scalars
  y = (ae2bx + p["f"]) - (ce2mdx + value)

adata_biex = pm.tl.normalize_biExp(adata, inplace=False)
