Read FCS files
In this notebook, we load an FCS file into the AnnData format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the AnnData object, and perform compensation on the data.
import readfcs
import pytometry as pm
Read the example data shipped with the readfcs package.
path_data = readfcs.datasets.example()
adata = pm.io.read_fcs(path_data)
/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/fcsparser/api.py:440: UserWarning: The default channel names (defined by the $PnS parameter in the FCS file) were not unique. To avoid problems in downstream analysis, the channel names have been switched to the alternate channel names defined in the FCS file. To avoid seeing this warning message, explicitly instruct the FCS parser to use the alternate channel names by specifying the channel_naming parameter.
warnings.warn(msg)
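The warning above is informational: the reader fell back to the alternate channel names. If it clutters the notebook output, it can be silenced locally with the standard library. The following is a minimal sketch; `read_with_duplicate_names` is a hypothetical stand-in for the real reader call, used only so that the example runs without an FCS file.

```python
import warnings


def read_with_duplicate_names():
    """Hypothetical stand-in for a reader that warns about channel names."""
    warnings.warn(
        "The default channel names were not unique.",
        UserWarning,
    )
    return "events"


# Silence UserWarning only inside this block; the previous filter
# settings are restored when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    result = read_with_duplicate_names()

print(result)
```

Silencing the message does not change which names are used, so it is still worth inspecting the channel table after reading.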
adata
AnnData object with n_obs × n_vars = 65016 × 16
var: 'Channel Number', 'channel', 'marker', '$PnB', '$PnR', '$PnG'
uns: 'meta'
The .var section of the AnnData object contains the channel information. By default, the marker names are used as var_names; channels without a marker (here FSC-A, FSC-H and SSC-A) keep their channel name. The channel names are also stored in the "channel" column.
adata.var
| | Channel Number | channel | marker | $PnB | $PnR | $PnG |
|---|---|---|---|---|---|---|
| FSC-A | 1 | FSC-A | | 32 | 262207 | 1 |
| FSC-H | 2 | FSC-H | | 32 | 262207 | 1 |
| SSC-A | 3 | SSC-A | | 32 | 261588 | 1 |
| KI67 | 4 | B515-A | KI67 | 32 | 261588 | 1 |
| CD3 | 5 | R780-A | CD3 | 32 | 261588 | 1 |
| CD28 | 6 | R710-A | CD28 | 32 | 261588 | 1 |
| CD45RO | 7 | R660-A | CD45RO | 32 | 261588 | 1 |
| CD8 | 8 | V800-A | CD8 | 32 | 261588 | 1 |
| CD4 | 9 | V655-A | CD4 | 32 | 261588 | 1 |
| CD57 | 10 | V585-A | CD57 | 32 | 261588 | 1 |
| CD14 | 11 | V450-A | CD14 | 32 | 261588 | 1 |
| CCR5 | 12 | G780-A | CCR5 | 32 | 261588 | 1 |
| CD19 | 13 | G710-A | CD19 | 32 | 261588 | 1 |
| CD27 | 14 | G660-A | CD27 | 32 | 261588 | 1 |
| CCR7 | 15 | G610-A | CCR7 | 32 | 261588 | 1 |
| CD127 | 16 | G560-A | CD127 | 32 | 261588 | 1 |
The .uns['meta'] section contains the header and metadata from the FCS file, including the spillover matrix ('spill') and the channel table ('channels').
adata.uns["meta"]
{'__header__': {'FCS format': 'FCS2.0',
'text start': 58,
'text end': 5099,
'data start': 5120,
'data end': 4166142,
'analysis start': 0,
'analysis end': 0},
'tot': 65016,
'par': 16,
'mode': 'L',
'byteord': '4,3,2,1',
'fil': '100715.fcs',
'nextdata': 0,
'datatype': 'F',
'beginstext': '0',
'btim': '15:36:28',
'cyt': 'Main Aria (FACSAria)',
'date': '17-JUL-2007',
'endstext': '0',
'etim': '15:38:06',
'inst': ' ',
'op': 'Administrator',
'src': 'Specimen_001',
'sys': 'Windows XP 5.1',
'timestep': '0.08',
'apply compensation': 'TRUE',
'autobs': 'TRUE',
'cd age': '19.6',
'CD CD4, %CM': '.',
'CD CD4, %EM': '.',
'CD CD4, %N': '.',
'CD CD4, %TM': '.',
'cd event censor': '0',
'cd first viral load': '2024',
'cd first viral load date': '11/09/1999',
'CD Gag/100 CD4 Cells': '.',
'CD Gag/100 CM Cells': '.',
'CD Gag/100 EM Cells': '.',
'CD Gag/100 N Cells': '.',
'CD GAG/100 TM CELLS': '.',
'cd seroconversion datae': '04/30/1999',
'cd survival time from seroconversion': '63',
'cd time from seroc to sample': '194',
'cytnum': '1',
'experiment name': '070717_AB02_tb',
'export time': '17-JUL-2007-16:04:38',
'export user name': 'Administrator',
'final pin': '100715',
'FJ_P17R': '262144',
'fj_timestep': '0.01',
'fj_compmatrixname': ' ',
'fsc asf': '0.63',
'guid': '0d8e743a-05fe-4e8b-9ec4-25993c124ee2',
'index': '416',
'LASER1ASF': '0.66',
'LASER1DELAY': '0.00',
'LASER1NAME': 'Blue',
'LASER2ASF': '0.55',
'LASER2DELAY': '-59.80',
'LASER2NAME': 'Red',
'LASER3ASF': '0.48',
'LASER3DELAY': '-24.40',
'LASER3NAME': 'Violet',
'LASER4ASF': '0.53',
'LASER4DELAY': '-82.60',
'LASER4NAME': 'Green',
'live cells recovered': ' ',
'pin': ' ',
'pin check': ' ',
'sort type': 'SORT',
'spill': KI67 CD3 CD28 CD45RO CD8 CD4 CD57 \
KI67 1.000000 0.000000 0.000000 0.000088 0.000249 0.000645 0.007198
CD3 0.000000 1.000000 0.071188 0.148448 0.338903 0.009717 0.000000
CD28 0.000000 0.331405 1.000000 0.061965 0.120979 0.004053 0.000000
CD45RO 0.000000 0.088621 0.389424 1.000000 0.029759 0.065553 0.000000
CD8 0.000000 0.136618 0.010757 0.000000 1.000000 0.000156 0.000000
CD4 0.000000 0.000124 0.019463 0.218206 0.004953 1.000000 0.003583
CD57 0.000000 0.000000 0.000000 0.000000 0.001056 0.002287 1.000000
CD14 0.000000 0.000000 0.000000 0.000000 0.000000 0.008118 0.170066
CCR5 0.003122 0.008526 0.001024 0.001163 0.125401 0.018142 0.193646
CD19 0.002015 0.069645 0.194715 0.001008 0.151611 0.001270 0.007133
CD27 0.001685 0.054340 0.277852 0.343008 0.061753 0.077523 0.004263
CCR7 0.000000 0.008713 0.048213 0.073190 0.150563 0.386293 0.101896
CD127 0.001684 0.000000 0.000000 0.000095 0.003463 0.015712 0.174122
CD14 CCR5 CD19 CD27 CCR7 CD127
KI67 0.0 0.000000 0.000131 0.000067 0.000582 0.002520
CD3 0.0 0.301380 0.007478 0.012354 0.000000 0.000000
CD28 0.0 0.109117 0.100314 0.005832 0.000000 0.000000
CD45RO 0.0 0.031294 0.039306 0.091375 0.000396 0.000057
CD8 0.0 0.483235 0.014858 0.000000 0.000000 0.000000
CD4 0.0 0.001311 0.029646 0.408902 0.006506 0.000119
CD57 0.0 0.000389 0.000194 0.000000 0.062551 0.132484
CD14 1.0 0.000000 0.000000 0.000000 0.000000 0.000000
CCR5 0.0 1.000000 0.066898 0.161456 0.286823 1.238037
CD19 0.0 1.150032 1.000000 0.016077 0.014674 0.055352
CD27 0.0 0.497488 0.743923 1.000000 0.010329 0.037635
CCR7 0.0 0.370277 0.613490 1.218024 1.000000 0.065211
CD127 0.0 0.023802 0.049474 0.132511 0.239216 1.000000 ,
'threshold': 'FSC,27000',
'tube name': 'Tube_025',
'viability': ' ',
'vial id': '100715',
'vrc id': ' ',
'window extension': '3.00',
'creator': 'LYSYS',
'P1BS': '0',
'P1DISPLAY': 'LIN',
'P1MS': '0',
'P2BS': '0',
'P2DISPLAY': 'LIN',
'P2MS': '0',
'P3BS': '0',
'P3DISPLAY': 'LOG',
'P3MS': '0',
'P4BS': '0',
'P4DISPLAY': 'LOG',
'P4MS': '0',
'P5BS': '2926',
'P5DISPLAY': 'LOG',
'P5MS': '0',
'P6BS': '1162',
'P6DISPLAY': 'LOG',
'P6MS': '0',
'P7BS': '1849',
'P7DISPLAY': 'LOG',
'P7MS': '0',
'P8BS': '2029',
'P8DISPLAY': 'LOG',
'P8MS': '0',
'P9BS': '3343',
'P9DISPLAY': 'LOG',
'P9MS': '0',
'P10BS': '331',
'P10DISPLAY': 'LOG',
'P10MS': '0',
'P11BS': '0',
'P11DISPLAY': 'LOG',
'P11MS': '0',
'P12BS': '14511',
'P12DISPLAY': 'LOG',
'P12MS': '0',
'P13BS': '6053',
'P13DISPLAY': 'LOG',
'P13MS': '0',
'P14BS': '9362',
'P14DISPLAY': 'LOG',
'P14MS': '0',
'P15BS': '557',
'P15DISPLAY': 'LOG',
'P15MS': '0',
'P16BS': '9808',
'P16DISPLAY': 'LOG',
'P16MS': '0',
'begindata': ' 5120',
'enddata': ' 4166143',
'_channels_': $PnN $PnS $PnB $PnR $PnG
Channel Number
1 FSC-A 32 262207 1
2 FSC-H 32 262207 1
3 SSC-A 32 261588 1
4 B515-A KI67 32 261588 1
5 R780-A CD3 32 261588 1
6 R710-A CD28 32 261588 1
7 R660-A CD45RO 32 261588 1
8 V800-A CD8 32 261588 1
9 V655-A CD4 32 261588 1
10 V585-A CD57 32 261588 1
11 V450-A CD14 32 261588 1
12 G780-A CCR5 32 261588 1
13 G710-A CD19 32 261588 1
14 G660-A CD27 32 261588 1
15 G610-A CCR7 32 261588 1
16 G560-A CD127 32 261588 1,
'_channel_names_': ['FSC-A',
'FSC-H',
'SSC-A',
'B515-A',
'R780-A',
'R710-A',
'R660-A',
'V800-A',
'V655-A',
'V585-A',
'V450-A',
'G780-A',
'G710-A',
'G660-A',
'G610-A',
'G560-A'],
'header': {'FCS format': 'FCS2.0',
'text start': 58,
'text end': 5099,
'data start': 5120,
'data end': 4166142,
'analysis start': 0,
'analysis end': 0},
'channels': $PnN $PnS $PnB $PnR $PnG
Channel Number
1 FSC-A 32 262207 1
2 FSC-H 32 262207 1
3 SSC-A 32 261588 1
4 B515-A KI67 32 261588 1
5 R780-A CD3 32 261588 1
6 R710-A CD28 32 261588 1
7 R660-A CD45RO 32 261588 1
8 V800-A CD8 32 261588 1
9 V655-A CD4 32 261588 1
10 V585-A CD57 32 261588 1
11 V450-A CD14 32 261588 1
12 G780-A CCR5 32 261588 1
13 G710-A CD19 32 261588 1
14 G660-A CD27 32 261588 1
15 G610-A CCR7 32 261588 1
16 G560-A CD127 32 261588 1}
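The 'spill' entry above is the input for compensation. As a rough illustration of the arithmetic, the sketch below takes only the CD3 and CD28 entries of that matrix and undoes the spillover for a single made-up event. It assumes the usual convention that rows are the emitting markers and columns the detecting channels, so that observed = true × spill. Restricting the matrix to two channels ignores spillover from all other markers, and the helper functions are written for this example; this is not how the full 13-channel compensation would be run in practice.

```python
# 2x2 excerpt of the spill matrix shown above, restricted to CD3 and CD28.
spill = [
    [1.000000, 0.071188],  # CD3 row: signal seen in the CD3 and CD28 columns
    [0.331405, 1.000000],  # CD28 row: signal seen in the CD3 and CD28 columns
]


def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spill matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def row_times_matrix(row, m):
    """Multiply a row vector by a matrix."""
    n_cols = len(m[0])
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(n_cols)]


# Made-up "true" intensities for one event (CD3, CD28).
true_signal = [1000.0, 500.0]

# Spillover mixes the channels: observed = true @ spill.
observed = row_times_matrix(true_signal, spill)

# Compensation undoes the mixing: compensated = observed @ inverse(spill).
compensated = row_times_matrix(observed, invert_2x2(spill))

print(observed)
print(compensated)
```

Up to floating-point error, the compensated values match the made-up true intensities, which is the property compensation relies on.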
Missing marker column
In some FCS files, the marker information does not follow the $P[0-9]S pattern, and reading the file might fail. In that case, set reindex=False when reading the FCS file.
adata = pm.io.read_fcs(path_data, reindex=False)
/home/runner/work/pytometry/pytometry/.nox/build-3-9/lib/python3.9/site-packages/fcsparser/api.py:440: UserWarning: The default channel names (defined by the $PnS parameter in the FCS file) were not unique. To avoid problems in downstream analysis, the channel names have been switched to the alternate channel names defined in the FCS file. To avoid seeing this warning message, explicitly instruct the FCS parser to use the alternate channel names by specifying the channel_naming parameter.
warnings.warn(msg)
adata
AnnData object with n_obs × n_vars = 65016 × 16
var: 'channel', 'marker', '$PnB', '$PnR', '$PnG'
uns: 'meta'
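To make the keyword pattern mentioned above concrete, the sketch below collects marker names from a small keyword dictionary with a regular expression. The dictionary is made up for illustration, including one deliberately malformed key, and the expression uses `\d+` so that two-digit channel numbers such as 16 also match. It does not reproduce how the parser works internally.

```python
import re

# Hypothetical keyword excerpt; a real FCS text segment has many more entries.
text_segment = {
    "$P1N": "FSC-A",   # channel 1 has a name but no marker keyword
    "$P4N": "B515-A",
    "$P4S": "KI67",    # well-formed marker keyword for channel 4
    "$P5N": "R780-A",
    "P5S": "CD3",      # malformed on purpose: the leading "$" is missing
}

# "$P", the channel number, then "S".
pns_key = re.compile(r"^\$P(\d+)S$")

markers = {}
for key, value in text_segment.items():
    match = pns_key.match(key)
    if match:
        markers[int(match.group(1))] = value

print(markers)  # {4: 'KI67'}
```

Only channel 4 yields a marker here. Channels whose marker keyword is absent or malformed are left without one, which is the situation where falling back to the channel number as index is useful.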
The .var section of the AnnData object contains the channel information. Here, the running channel number is used as var_names. If needed, the marker names can be created manually from the channel column.
adata.var
| Channel Number | channel | marker | $PnB | $PnR | $PnG |
|---|---|---|---|---|---|
| 1 | FSC-A | | 32 | 262207 | 1 |
| 2 | FSC-H | | 32 | 262207 | 1 |
| 3 | SSC-A | | 32 | 261588 | 1 |
| 4 | B515-A | KI67 | 32 | 261588 | 1 |
| 5 | R780-A | CD3 | 32 | 261588 | 1 |
| 6 | R710-A | CD28 | 32 | 261588 | 1 |
| 7 | R660-A | CD45RO | 32 | 261588 | 1 |
| 8 | V800-A | CD8 | 32 | 261588 | 1 |
| 9 | V655-A | CD4 | 32 | 261588 | 1 |
| 10 | V585-A | CD57 | 32 | 261588 | 1 |
| 11 | V450-A | CD14 | 32 | 261588 | 1 |
| 12 | G780-A | CCR5 | 32 | 261588 | 1 |
| 13 | G710-A | CD19 | 32 | 261588 | 1 |
| 14 | G660-A | CD27 | 32 | 261588 | 1 |
| 15 | G610-A | CCR7 | 32 | 261588 | 1 |
| 16 | G560-A | CD127 | 32 | 261588 | 1 |
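One way to build names by hand from this table is sketched below, using the first five rows copied from it. The fallback rule (use the marker when present, otherwise the channel name) is an assumption chosen for this example; any scheme that yields unique names would do.

```python
# (channel number, channel, marker) for the first five rows of the table.
rows = [
    (1, "FSC-A", ""),
    (2, "FSC-H", ""),
    (3, "SSC-A", ""),
    (4, "B515-A", "KI67"),
    (5, "R780-A", "CD3"),
]

# Prefer the marker; fall back to the channel name when the marker is empty.
names = [marker if marker else channel for _, channel, marker in rows]

# Variable names must be unique to be usable as an index.
if len(set(names)) != len(names):
    raise ValueError("derived names are not unique")

print(names)  # ['FSC-A', 'FSC-H', 'SSC-A', 'KI67', 'CD3']
```

A list built this way for all 16 channels could then be assigned to `adata.var_names`.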