Image Analysis#

In this tutorial I will show you how to perform quality control on your image processing and segmentation results.

# Import the libraries used throughout this tutorial
import skimage.io as io
import numpy as np
import opendvp as dvp

import napari
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

Part 1: Visualize segmentation in QuPath#

QuPath is open-source software for viewing and annotating large microscopy images, and it stays responsive even when thousands of segmented objects are overlaid on the image.

Here is our demo data. Your working directory should look like this:

! tree
.
├── Tutorial_1.ipynb
├── data
│   ├── image
│   │   └── mIF.ome.tif
│   ├── manual_artefact_annotations
│   │   └── artefacts.geojson
│   ├── quantification
│   │   └── quant.csv
│   └── segmentation
│       └── segmentation_mask.tif
└── outputs
    └── segmentation_for_qupath.geojson

7 directories, 6 files
# let's perform some quick QC
path_to_segmentation = "data/segmentation/segmentation_mask.tif"
seg = io.imread(path_to_segmentation)
# seg.shape is (rows, columns), i.e. (y, x)
print(f"Number of pixels in y,x: {seg.shape}")
print(f"Number of segmented objects {np.unique(seg).size -1}")
Number of pixels in y,x: (5000, 5000)
Number of segmented objects 16808
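Note that `np.unique(seg).size - 1` assumes the background label 0 is present in the mask. Here is a minimal sketch, on a made-up 6x6 mask rather than the demo data, that counts only non-zero labels and also reports the pixel area of each object:

```python
import numpy as np

# Synthetic 6x6 label mask: 0 is background, 1..3 are segmented objects.
mask = np.zeros((6, 6), dtype=np.int32)
mask[0:2, 0:2] = 1   # 4 pixels
mask[3:6, 3:5] = 2   # 6 pixels
mask[0, 5] = 3       # 1 pixel

labels = np.unique(mask)
n_objects = int((labels != 0).sum())  # count labels, ignoring background if present

# np.bincount gives the pixel count per label; index 0 is the background.
areas = np.bincount(mask.ravel())[1:]

print(f"objects: {n_objects}, areas: {areas.tolist()}")
```

Very small or very large areas at this stage are often the first hint of segmentation problems.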
#quick look
io.imshow(seg, vmax=1)
[Figure: segmentation mask rendered with vmax=1, so every labelled pixel appears at full intensity]

Let’s visualize interactively in Napari or QuPath#

For Napari#

#load image
image = io.imread("data/image/mIF.ome.tif")
# this should produce a napari window with image and segmentation mask
viewer = napari.Viewer()
viewer.add_image(image, name="mIF_image")
viewer.add_labels(seg, name='Segmentation')
<Labels layer 'Segmentation' at 0x2b9e33890>

For QuPath#

# transform mask into shapes
gdf = dvp.io.segmask_to_qupath(path_to_segmentation, simplify_value=1, save_as_detection=True)
INFO     no axes information specified in the object, setting `dims` to: ('y', 'x')
13:18:26.98 | INFO | Simplifying the geometry with tolerance 1
gdf.head()
geometry objectType
label
1 POLYGON ((60 43.5, 54 43.5, 46.5 39, 42.5 30, ... detection
2 POLYGON ((134 19.5, 129 19.5, 126 15.5, 119 11... detection
3 POLYGON ((167 31.5, 148 33.5, 142 30.5, 137.5 ... detection
4 POLYGON ((188 13.5, 178 13.5, 167 7.5, 160.5 1... detection
5 POLYGON ((235 48.5, 231 47.5, 220 39.5, 202.5 ... detection
gdf.to_file("outputs/segmentation_for_qupath.geojson")

After you have loaded the image in QuPath, just drag and drop this file into the viewer.
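To get a feeling for what is inside that file, here is a rough sketch of a GeoJSON feature carrying the two columns we saw in `gdf` (`label` and `objectType`). The helper `polygon_feature` is hypothetical, and the exact properties written by `dvp.io.segmask_to_qupath` may differ. The vertices are just the first few from the table above, closed artificially, so this is not a real cell outline.

```python
import json

def polygon_feature(label, ring, object_type="detection"):
    """Build one GeoJSON Feature; `ring` is a list of (x, y) vertices."""
    closed = list(ring)
    if closed[0] != closed[-1]:
        closed.append(closed[0])  # GeoJSON rings must repeat the first vertex
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in closed]]},
        "properties": {"label": label, "objectType": object_type},
    }

collection = {
    "type": "FeatureCollection",
    "features": [polygon_feature(1, [(60, 43.5), (54, 43.5), (46.5, 39), (42.5, 30)])],
}
text = json.dumps(collection)
print(text[:80])
```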

Quant to adata#

adata = dvp.io.quant_to_adata("data/quantification/quant.csv")
15:14:06.74 | INFO | Detected 0 in 'CellID' — shifting all values by +1 for 1-based indexing.
15:14:06.75 | INFO |  16808 cells and 15 variables
adata.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601
... ... ... ... ... ... ... ... ... ... ...
16803 16804 4993.579685 106.373030 571.0 52.291050 14.719719 0.959562 -1.563557 116.870058 0.976068
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756

16808 rows × 10 columns
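Conceptually, the quantification table is split in two: marker intensities become the data matrix and everything else becomes per-cell metadata. The sketch below uses pandas only and a made-up three-cell table. Selecting markers by the `mean_` prefix is my assumption for illustration, not necessarily how `quant_to_adata` decides. The ID shift mirrors the log message above, presumably so that IDs line up with the 1-based labels of the segmentation mask.

```python
import pandas as pd

# Tiny stand-in for quant.csv; column names follow the tables above.
quant = pd.DataFrame({
    "CellID": [0, 1, 2],
    "mean_DAPI_bg": [20.1, 15.3, 30.7],
    "mean_CD8": [1.2, 9.8, 0.4],
    "Y_centroid": [17.6, 6.6, 17.4],
    "X_centroid": [53.3, 126.0, 156.7],
    "Area": [1270.0, 576.0, 984.0],
})

# Shift to 1-based IDs when a 0 is present, as the log message describes.
if (quant["CellID"] == 0).any():
    quant["CellID"] = quant["CellID"] + 1

marker_cols = [c for c in quant.columns if c.startswith("mean_")]
X = quant[marker_cols].to_numpy()       # cells x markers
obs = quant.drop(columns=marker_cols)   # per-cell metadata

print(X.shape, list(obs.columns))
```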

Filter cells#

Filter cells that are too big or too small#

adata = dvp.tl.filter_by_abs_value(adata=adata, feature_name="Area", lower_bound=0.01, upper_bound=0.99, mode="quantile")
15:14:06.77 | INFO | Starting filter_by_abs_value for feature 'Area'...
15:14:06.77 | INFO | Feature 'Area' identified from adata.obs.
15:14:06.77 | INFO | Keeping cells with 'Area' >= 295.0000 (from quantile bound: 0.01).
15:14:06.77 | INFO | Keeping cells with 'Area' <= 2376.4400 (from quantile bound: 0.99).
15:14:06.77 | SUCCESS | 16475 of 16808 cells (98.02%) passed the filter.
15:14:06.78 | INFO | New boolean column 'Area_filter' added to adata.obs.
adata.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity Area_filter
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178 True
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154 True
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340 True
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751 True
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601 True
... ... ... ... ... ... ... ... ... ... ... ...
16803 16804 4993.579685 106.373030 571.0 52.291050 14.719719 0.959562 -1.563557 116.870058 0.976068 True
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637 True
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923 True
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856 True
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756 True

16808 rows × 11 columns
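The log above tells us what happened: the quantile bounds were converted into absolute thresholds, and a boolean column was added. A minimal pandas sketch of the same idea, on toy areas from 1 to 100 (this is not opendvp's code, just the logic as I read it from the log):

```python
import numpy as np
import pandas as pd

obs = pd.DataFrame({"Area": np.arange(1, 101, dtype=float)})  # toy areas 1..100

lower = obs["Area"].quantile(0.01)   # absolute threshold for the lower bound
upper = obs["Area"].quantile(0.99)   # absolute threshold for the upper bound

# Bounds are inclusive, matching the ">=" and "<=" in the log messages.
obs["Area_filter"] = obs["Area"].between(lower, upper)

print(f"{int(obs['Area_filter'].sum())} of {len(obs)} cells passed")
```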

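# Each filter adds a boolean column to adata.obs instead of dropping cells right away.
# This keeps every filter inspectable on its own and lets us combine them in a single step later.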
adata.var
mean_750_bg
mean_647_bg
mean_555_bg
mean_488_bg
mean_DAPI_bg
mean_Vimentin
mean_CD3e
mean_panCK
mean_CD8
mean_DAPI_1
mean_COL1A1
mean_CD20
mean_CD68
mean_Ki67
mean_DAPI_2

Filter by initial nuclear stain signal#

df = pd.DataFrame(data=adata.X, columns=adata.var_names)
sns.histplot(data=df, x="mean_DAPI_bg", bins=200)
[Figure: histogram of mean_DAPI_bg across all cells, 200 bins]
adata = dvp.tl.filter_by_abs_value(adata=adata, feature_name="mean_DAPI_bg", lower_bound=0.01, upper_bound=0.99, mode="quantile")
15:14:07.13 | INFO | Starting filter_by_abs_value for feature 'mean_DAPI_bg'...
15:14:07.13 | INFO | Feature 'mean_DAPI_bg' identified from adata.X.
15:14:07.13 | INFO | Keeping cells with 'mean_DAPI_bg' >= 8.0822 (from quantile bound: 0.01).
15:14:07.13 | INFO | Keeping cells with 'mean_DAPI_bg' <= 47.2041 (from quantile bound: 0.99).
15:14:07.13 | SUCCESS | 16470 of 16808 cells (97.99%) passed the filter.
15:14:07.13 | INFO | New boolean column 'mean_DAPI_bg_filter' added to adata.obs.
adata.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity Area_filter mean_DAPI_bg_filter
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178 True True
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154 True True
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340 True True
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751 True True
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601 True True
... ... ... ... ... ... ... ... ... ... ... ... ...
16803 16804 4993.579685 106.373030 571.0 52.291050 14.719719 0.959562 -1.563557 116.870058 0.976068 True True
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637 True True
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923 True True
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856 True True
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756 True True

16808 rows × 12 columns

Filter by ratio of nuclear stain between last and first DAPI images#

df = pd.DataFrame(data=adata.X, columns=adata.var_names)
df['ratio'] = df['mean_DAPI_2'] / df['mean_DAPI_bg']

fig,ax = plt.subplots()
sns.histplot(data=df, x="ratio", bins=200, ax=ax)
ax.set_xlim(0,1.5)
ax.set_yscale('log')
[Figure: histogram of the DAPI ratio (mean_DAPI_2 / mean_DAPI_bg), x-axis limited to 0 to 1.5, log-scaled counts]
adata = dvp.tl.filter_by_ratio(adata=adata, end_cycle="mean_DAPI_2", start_cycle="mean_DAPI_bg", label="DAPI", min_ratio=0.25, max_ratio=1.05)
15:14:07.44 | INFO | Starting filter_by_ratio...
15:14:07.44 | INFO | Number of cells with DAPI ratio < 0.25: 1035
15:14:07.44 | INFO | Number of cells with DAPI ratio > 1.05: 28
15:14:07.44 | INFO | Cells with DAPI ratio between 0.25 and 1.05: 15745
15:14:07.44 | INFO | Cells filtered: 6.32%
15:14:07.44 | SUCCESS | filter_by_ratio complete.
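The idea behind this filter is simple: divide the last DAPI measurement by the first one and keep cells whose ratio stays inside the chosen window. A low ratio suggests the nucleus lost signal between imaging cycles, for example because the cell detached. Below is a small pandas sketch with made-up intensities. The output column names follow what we will see in `adata.obs`, and the inclusive bounds are my reading of the log, not a confirmed detail of `filter_by_ratio`.

```python
import pandas as pd

df = pd.DataFrame({
    "mean_DAPI_bg": [40.0, 30.0, 20.0, 10.0],   # first DAPI image
    "mean_DAPI_2": [4.0, 15.0, 19.0, 12.0],     # last DAPI image
})
min_ratio, max_ratio = 0.25, 1.05

df["DAPI_ratio"] = df["mean_DAPI_2"] / df["mean_DAPI_bg"]
# Cells fail when the ratio is < min_ratio or > max_ratio, so the pass window is inclusive.
df["DAPI_ratio_pass"] = df["DAPI_ratio"].between(min_ratio, max_ratio)

print(df)
```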

Filter by manual annotations#

#check annotations
gdf = gpd.read_file("data/manual_artefact_annotations/artefacts.geojson")
gdf
id objectType classification geometry
0 9dbac0eb-6171-4da8-9c3f-846ecdb81dfb annotation { "name": "folded_tissue", "color": [ 176, 102... POLYGON ((722 2645, 702 2647, 689.93 2650.81, ...
1 cc4df5d0-fe6b-4285-849a-698851827e9c annotation { "name": "Antibody_clumps", "color": [ 32, 19... POLYGON ((4685 2530, 4682 2531, 4677 2531, 467...
2 e6aaf657-f4e7-401f-834b-a2fd5a072300 annotation { "name": "folded_tissue", "color": [ 176, 102... POLYGON ((3127 3675, 3119 3676, 3116 3677, 311...
3 be635097-4631-46e7-b1a8-878363184124 annotation { "name": "CD8_noise", "color": [ 51, 236, 220... POLYGON ((117 3008, 110 3009.62, 105 3010, 96....
4 baff029c-3349-4fa2-946a-0f5e55c46dc8 annotation { "name": "Antibody_clumps", "color": [ 32, 19... POLYGON ((3987 4058, 3984 4059, 3979 4059, 397...
5 ad114d98-3048-4e3c-9a94-982e72a95515 annotation { "name": "Antibody_clumps", "color": [ 32, 19... POLYGON ((4791 1546, 4788.47 1546.95, 4788 154...
6 99a65f87-acf2-434d-9ab5-7692e39af63b annotation { "name": "Antibody_clumps", "color": [ 32, 19... POLYGON ((4636 1840, 4628 1843, 4620 1847, 461...
7 7908eec6-399d-4e17-90ba-26446b5dcacd annotation { "name": "Antibody_clumps", "color": [ 32, 19... POLYGON ((4693 2599, 4690 2600, 4685 2600, 468...
8 f1021867-a0ec-4323-8cf9-d9d70063566a annotation { "name": "CD8_noise", "color": [ 51, 236, 220... POLYGON ((3 1994, 0 1994.23, 0 2047.23, 0 2052...
9 46b88b7d-3f7f-4a7d-812d-5ae37a6090cb annotation { "name": "folded_tissue", "color": [ 176, 102... POLYGON ((1745 18, 1743.71 18.43, 1738 19, 172...
# set the figure size here: geopandas ignores figsize when an existing ax is passed
fig, ax = plt.subplots(figsize=(8, 6))
gdf.plot(column="classification", legend=True, ax=ax)
ax.invert_yaxis()
plt.show()
[Figure: manual artefact annotations plotted as polygons, colored by classification, with the y-axis inverted to match image coordinates]
adata.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity Area_filter mean_DAPI_bg_filter DAPI_ratio DAPI_ratio_pass
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178 True True 0.065991 False
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154 True True 0.107462 False
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340 True True 0.098039 False
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751 True True 0.136228 False
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601 True True 0.104794 False
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16803 16804 4993.579685 106.373030 571.0 52.291050 14.719719 0.959562 -1.563557 116.870058 0.976068 True True 0.209647 False
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637 True True 0.477926 True
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923 True True 0.268536 True
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856 True True 0.523596 True
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756 True True 0.401436 True

16808 rows × 14 columns

adata = dvp.tl.filter_by_annotation(adata=adata, path_to_geojson="data/manual_artefact_annotations/artefacts.geojson")
15:14:08.83 | INFO |  Each class of annotation will be a different column in adata.obs
15:14:08.83 | INFO |  TRUE means cell was inside annotation, FALSE means cell not in annotation
15:14:08.84 | INFO | GeoJSON loaded, detected: 10 annotations
adata.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity Area_filter mean_DAPI_bg_filter DAPI_ratio DAPI_ratio_pass Antibody_clumps CD8_noise folded_tissue ANY annotation
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178 True True 0.065991 False False False False False Unannotated
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154 True True 0.107462 False False False False False Unannotated
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340 True True 0.098039 False False False False False Unannotated
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751 True True 0.136228 False False False False False Unannotated
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601 True True 0.104794 False False False False False Unannotated
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16803 16804 4993.579685 106.373030 571.0 52.291050 14.719719 0.959562 -1.563557 116.870058 0.976068 True True 0.209647 False False True False True CD8_noise
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637 True True 0.477926 True False True False True CD8_noise
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923 True True 0.268536 True False True False True CD8_noise
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856 True True 0.523596 True False True False True CD8_noise
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756 True True 0.401436 True False False False False Unannotated

16808 rows × 19 columns
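Under the hood this is a point-in-polygon question: does a cell fall inside an annotation? The sketch below is pure Python and tests cell centroids with the classic ray-casting method. Whether `filter_by_annotation` uses centroids, and how it performs the test, is an assumption on my side. The polygon and centroids are made up.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

annotations = {"folded_tissue": [(0, 0), (10, 0), (10, 10), (0, 10)]}
centroids = {1: (5.0, 5.0), 2: (15.0, 5.0)}   # CellID -> (x, y)

# One boolean per cell and annotation class, like the new columns in adata.obs.
flags = {
    cell_id: {name: point_in_polygon(x, y, ring) for name, ring in annotations.items()}
    for cell_id, (x, y) in centroids.items()
}
print(flags)
```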

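# New processed adata: keep cells that pass the area and DAPI ratio filters
# and lie outside the 'Antibody_clumps' and 'folded_tissue' annotations.
# Note: 'mean_DAPI_bg_filter' is not applied here, and 'CD8_noise' cells are kept.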
adata_processed = adata[
    (adata.obs["Area_filter"])
    & (adata.obs["DAPI_ratio_pass"])
    & (~adata.obs["Antibody_clumps"])
    & (~adata.obs["folded_tissue"])
].copy() # type: ignore
adata
AnnData object with n_obs × n_vars = 16808 × 15
    obs: 'CellID', 'Y_centroid', 'X_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Orientation', 'Extent', 'Solidity', 'Area_filter', 'mean_DAPI_bg_filter', 'DAPI_ratio', 'DAPI_ratio_pass', 'Antibody_clumps', 'CD8_noise', 'folded_tissue', 'ANY', 'annotation'
adata_processed
AnnData object with n_obs × n_vars = 15244 × 15
    obs: 'CellID', 'Y_centroid', 'X_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Orientation', 'Extent', 'Solidity', 'Area_filter', 'mean_DAPI_bg_filter', 'DAPI_ratio', 'DAPI_ratio_pass', 'Antibody_clumps', 'CD8_noise', 'folded_tissue', 'ANY', 'annotation'
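Because every filter lives in its own boolean column, it is easy to check how many cells each criterion rejects before combining them. A small pandas sketch with five made-up cells, using the same column names as above:

```python
import pandas as pd

obs = pd.DataFrame({
    "Area_filter":     [True, True, False, True, True],
    "DAPI_ratio_pass": [True, False, True, True, True],
    "Antibody_clumps": [False, False, False, True, False],
    "folded_tissue":   [False, False, False, False, False],
})

# Same logic as the subsetting above: pass both filters, sit outside both artefact classes.
keep = (
    obs["Area_filter"]
    & obs["DAPI_ratio_pass"]
    & ~obs["Antibody_clumps"]
    & ~obs["folded_tissue"]
)

# How many cells each criterion rejects on its own.
rejected = {
    "Area_filter": int((~obs["Area_filter"]).sum()),
    "DAPI_ratio_pass": int((~obs["DAPI_ratio_pass"]).sum()),
    "Antibody_clumps": int(obs["Antibody_clumps"].sum()),
    "folded_tissue": int(obs["folded_tissue"].sum()),
}
print(int(keep.sum()), rejected)
```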

QuPath QC#

gdf = gpd.read_file("outputs/segmentation_for_qupath.geojson")
gdf.head()
label objectType geometry
0 1 detection POLYGON ((60 43.5, 54 43.5, 46.5 39, 42.5 30, ...
1 2 detection POLYGON ((134 19.5, 129 19.5, 126 15.5, 119 11...
2 3 detection POLYGON ((167 31.5, 148 33.5, 142 30.5, 137.5 ...
3 4 detection POLYGON ((188 13.5, 178 13.5, 167 7.5, 160.5 1...
4 5 detection POLYGON ((235 48.5, 231 47.5, 220 39.5, 202.5 ...
adata.obs.head()
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity Area_filter mean_DAPI_bg_filter DAPI_ratio DAPI_ratio_pass Antibody_clumps CD8_noise folded_tissue ANY annotation
0 1 17.612598 53.337008 1270.0 48.198269 36.841132 0.644782 0.359469 146.669048 0.949178 True True 0.065991 False False False False False Unannotated
1 2 6.598958 126.006944 576.0 45.835698 18.372329 0.916152 1.513685 113.112698 0.886154 True True 0.107462 False False False False False Unannotated
2 3 17.416667 156.656504 984.0 40.751104 31.700565 0.628380 -1.528462 121.396970 0.955340 True True 0.098039 False False False False False Unannotated
3 4 4.982558 179.337209 344.0 34.620290 13.577757 0.919884 1.474818 82.627417 0.971751 True True 0.136228 False False False False False Unannotated
4 5 19.159558 228.598210 1899.0 54.446578 49.053930 0.433912 1.287374 196.610173 0.896601 True True 0.104794 False False False False False Unannotated
# check processed cells in QuPath
cells = dvp.io.adata_to_qupath(
    adata=adata_processed, 
    geodataframe=gdf,
    adataobs_on="CellID",
    gdf_on="label",
    classify_by=None,
    simplify_value=None,
    save_as_detection=True)
15:14:09.23 | INFO | Found 15244 matching IDs between adata.obs['CellID'] and geodataframe['label'].
cells.to_file("outputs/filtered_cells.geojson")
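The matching step reported in the log is essentially a join between the shapes and the filtered cell table. Here is a sketch with plain pandas, where strings stand in for the polygon geometries. This only illustrates the `adataobs_on` / `gdf_on` idea and is not the implementation of `adata_to_qupath`.

```python
import pandas as pd

shapes = pd.DataFrame({
    "label": [1, 2, 3, 4],
    "geometry": ["poly_1", "poly_2", "poly_3", "poly_4"],  # placeholders for polygons
})
obs = pd.DataFrame({"CellID": [2, 4], "Area": [576.0, 344.0]})  # cells that survived QC

# The inner join keeps only shapes whose label matches a CellID in the filtered table.
matched = shapes.merge(obs, left_on="label", right_on="CellID", how="inner")

print(f"Found {len(matched)} matching IDs")
```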

Napari QC is not practical without a SpatialData object, see Tutorial 3#

Phenotype my cells#

help(dvp.pp.impute_marker_with_annotation)
Help on function impute_marker_with_annotation in module opendvp.pp.impute_marker_with_annotation:

impute_marker_with_annotation(adata: anndata._core.anndata.AnnData, target_variable: str, target_annotation_column: str, quantile_for_imputation: float = 0.05) -> anndata._core.anndata.AnnData
    Change value of a feature in an AnnData object for rows matching a specific annotation.
    
    Using a specified quantile value from the variable's distribution.
    
    Parameters:
    ----------
    adata : ad.AnnData
        The annotated data matrix.
    target_variable : str
        The variable (gene/feature) to impute.
    target_annotation_column : str
        The column in adata.obs to use for selecting rows to impute.
    quantile_for_imputation : float, optional
        The quantile to use for imputation (default is 0.05).
    
    Returns:
    -------
    ad.AnnData
        A copy of the AnnData object with imputed values.
adata_CD8 = dvp.pp.impute_marker_with_annotation(
    adata=adata_processed,
    target_variable="mean_CD8", 
    target_annotation_column="CD8_noise",
    quantile_for_imputation=0.15
    )
15:14:09.64 | INFO | Imputing with 0.15% percentile value = 7.545454545454546
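Here `quantile_for_imputation=0.15` is a quantile, so the fill value is the 15th percentile of the marker. A NumPy sketch of the idea with made-up intensities follows. Computing the quantile over all cells, including the noisy ones, is my assumption; the docstring only says it comes from the variable's distribution.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 50.0, 60.0])              # toy CD8 intensities
in_noise = np.array([False, False, False, False, True, True])    # cells inside 'CD8_noise'

fill_value = np.quantile(values, 0.15)   # quantile of the whole distribution (assumption)

# Cells inside the annotation get the low fill value; all others keep their measurement.
imputed = np.where(in_noise, fill_value, values)

print(fill_value, imputed)
```

Replacing the noisy values with a low quantile keeps these cells in the dataset while preventing the artefact from producing false CD8-positive calls.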
gates = dvp.io.import_thresholds(gates_csv_path="data/phenotyping/gates.csv")
gates
15:14:09.66 | INFO | Filtering out all rows with value 0.0 (assuming not gated)
15:14:09.66 | INFO | Found 8 valid gates
15:14:09.66 | INFO | Markers found: ['mean_Vimentin' 'mean_CD3e' 'mean_panCK' 'mean_CD8' 'mean_COL1A1'
 'mean_CD20' 'mean_CD68' 'mean_Ki67']
15:14:09.66 | INFO | Samples found: ['TD_15_TNBC_subset']
15:14:09.66 | INFO | Applying log1p transformation to gate values and formatting for scimap.
15:14:09.66 | INFO |    Output DataFrame columns: ['markers', 'TD_15_TNBC_subset']
markers TD_15_TNBC_subset
5 mean_Vimentin 1.915886
6 mean_CD3e 2.138796
7 mean_panCK 1.287972
8 mean_CD8 2.890372
10 mean_COL1A1 3.169333
11 mean_CD20 3.205329
12 mean_CD68 1.436576
13 mean_Ki67 1.093354
# Build an AnnData for gating: keep only the gated markers and tag each cell with its sample ID
adata_phenotyping = adata_CD8[:, adata_CD8.var_names.isin(gates['markers'])].copy()
adata_phenotyping.obs['sample_id'] = "TD_15_TNBC_subset"
# Rescale the marker intensities using the gates
adata_rescaled = dvp.pp.rescale(
    adata=adata_phenotyping,
    gate=gates,
    method="all",
    imageid="sample_id",
)
Scaling Image: TD_15_TNBC_subset
Scaling mean_Vimentin (gate: 1.916)
Scaling mean_CD3e (gate: 2.139)
Scaling mean_panCK (gate: 1.288)
Scaling mean_CD8 (gate: 2.890)
Scaling mean_COL1A1 (gate: 3.169)
Scaling mean_CD20 (gate: 3.205)
Scaling mean_CD68 (gate: 1.437)
Scaling mean_Ki67 (gate: 1.093)
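To build some intuition for gate-based rescaling, here is a sketch of one common scheme: log1p-transform the intensities and the gate, then map values piecewise linearly to [0, 1] so that the gate lands on 0.5. I am assuming this scheme for illustration; `dvp.pp.rescale` may differ in its details. The function `rescale_with_gate` and the raw gate of 17.0 are hypothetical.

```python
import numpy as np

def rescale_with_gate(values, gate):
    """Map values to [0, 1] so that `gate` lands on 0.5 (piecewise linear).

    Assumes min(values) < gate < max(values); otherwise a division by zero occurs.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    below = 0.5 * (values - lo) / (gate - lo)            # [lo, gate) -> [0, 0.5)
    above = 0.5 + 0.5 * (values - gate) / (hi - gate)    # [gate, hi] -> [0.5, 1]
    return np.where(values < gate, below, above)

raw = np.array([0.0, 3.0, 17.0, 40.0, 99.0])   # toy raw intensities
raw_gate = 17.0                                 # hypothetical manual gate

# Gate and data must live in the same space, here log1p.
scaled = rescale_with_gate(np.log1p(raw), np.log1p(raw_gate))
print(scaled.round(3))
```

After such a rescaling, "positive" simply means a value above 0.5, whatever the original intensity range of the marker was.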
# load the phenotyping workflow
phenotype = pd.read_csv('data/phenotyping/celltype_matrix.csv')
phenotype.style.format(na_rep='')
  Unnamed: 0 Unnamed: 1 Vimentin CD3e panCK CD8 COL1A1 CD20 CD68 Ki67
0 all Epithelial pos
1 all Mesenchymal pos
2 all Immune anypos anypos anypos anypos
3 all Fibroblasts pos
4 Immune CD4_T_cell pos neg
5 Immune CD8_T_cell pos
6 Immune B_cell pos
7 Immune Macrophage pos
# Strip the "mean_" prefix so the marker names match the columns of the phenotype matrix
adata_phenotyping.var['feature_name'] = [name.split("_")[1] for name in adata_phenotyping.var_names]
adata_phenotyping.var.index = adata_phenotyping.var['feature_name'].values
adata_phenotyped = dvp.tl.phenotype_cells(
    adata_phenotyping, 
    phenotype=phenotype, 
    label="phenotype",
    verbose=True) 
Phenotyping Epithelial
Phenotyping Mesenchymal
Phenotyping Immune
Phenotyping Fibroblasts
-- Subsetting Immune
Phenotyping CD4_T_cell
Phenotyping CD8_T_cell
Phenotyping B_cell
Phenotyping Macrophage
Consolidating the phenotypes across all groups
adata_phenotyped.obs
CellID Y_centroid X_centroid Area MajorAxisLength MinorAxisLength Eccentricity Orientation Extent Solidity ... mean_DAPI_bg_filter DAPI_ratio DAPI_ratio_pass Antibody_clumps CD8_noise folded_tissue ANY annotation sample_id phenotype
6 7 28.019009 321.572980 1473.0 57.990373 34.056846 0.809381 0.702143 160.367532 0.943022 ... True 0.503327 True False False False False Unannotated TD_15_TNBC_subset Epithelial
7 8 13.004021 351.318365 1492.0 60.422292 35.485098 0.809380 1.459721 167.917785 0.965071 ... True 0.563896 True False False False False Unannotated TD_15_TNBC_subset Epithelial
8 9 8.195719 415.693170 981.0 62.489467 20.854657 0.942668 -1.541497 144.769553 0.974181 ... True 0.473116 True False False False False Unannotated TD_15_TNBC_subset Unknown
9 10 9.020833 482.729167 576.0 34.660038 21.747314 0.778660 -1.556045 93.698485 0.963211 ... True 0.526049 True False False False False Unannotated TD_15_TNBC_subset CD4_T_cell
10 11 11.670357 554.346863 813.0 36.506883 29.531264 0.587914 -1.567206 109.355339 0.975990 ... True 0.649908 True False False False False Unannotated TD_15_TNBC_subset Unknown
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16801 16802 4992.956322 561.156322 435.0 36.423914 15.810187 0.900884 -1.561879 87.798990 0.966667 ... True 0.461031 True False True False True CD8_noise TD_15_TNBC_subset Fibroblasts
16804 16805 4993.413242 810.385845 438.0 40.105247 14.334812 0.933940 1.531591 90.970563 0.962637 ... True 0.477926 True False True False True CD8_noise TD_15_TNBC_subset Unknown
16805 16806 4994.153535 645.010101 495.0 50.864135 13.112118 0.966202 1.538209 111.248737 0.951923 ... True 0.268536 True False True False True CD8_noise TD_15_TNBC_subset Fibroblasts
16806 16807 4993.835570 1244.718121 298.0 28.947821 13.495126 0.884686 -1.561806 69.213203 0.973856 ... True 0.523596 True False True False True CD8_noise TD_15_TNBC_subset Unknown
16807 16808 4994.250774 4186.876161 323.0 33.325316 13.027177 0.920429 -1.564638 77.213203 0.984756 ... True 0.401436 True False False False False Unannotated TD_15_TNBC_subset Unknown

15244 rows × 21 columns
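The hierarchical logic of the workflow (first assign a broad class, then refine "Immune" cells) can be sketched in plain Python. This is a deliberately simplified first-match version on scaled values with a 0.5 threshold, and the library may score phenotypes differently. The marker rules in `workflow` are a toy subset that I made up for illustration, not a transcription of celltype_matrix.csv.

```python
def matches(cell, rule, threshold=0.5):
    """Check one cell (marker -> scaled value) against one phenotype rule."""
    any_markers = [m for m, state in rule.items() if state == "anypos"]
    if any_markers and not any(cell[m] > threshold for m in any_markers):
        return False  # at least one "anypos" marker must be positive
    for marker, state in rule.items():
        if state == "pos" and not cell[marker] > threshold:
            return False
        if state == "neg" and cell[marker] > threshold:
            return False
    return True

# Two-level workflow: parent group -> phenotype -> marker rule (hypothetical subset).
workflow = {
    "all": {"Epithelial": {"panCK": "pos"}, "Immune": {"CD3e": "anypos", "CD20": "anypos"}},
    "Immune": {"CD8_T_cell": {"CD8": "pos"}, "B_cell": {"CD20": "pos"}},
}

def phenotype(cell):
    label = "Unknown"
    for name, rule in workflow["all"].items():
        if matches(cell, rule):
            label = name
            break
    # Refine the label if the assigned class has its own sub-level.
    for name, rule in workflow.get(label, {}).items():
        if matches(cell, rule):
            return name
    return label

cells = {
    "a": {"panCK": 0.9, "CD3e": 0.1, "CD20": 0.1, "CD8": 0.1},
    "b": {"panCK": 0.1, "CD3e": 0.8, "CD20": 0.2, "CD8": 0.7},
    "c": {"panCK": 0.1, "CD3e": 0.1, "CD20": 0.1, "CD8": 0.1},
}
labels = {cell_id: phenotype(cell) for cell_id, cell in cells.items()}
print(labels)
```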

# Carry the phenotype labels back to the AnnData that still holds all 15 variables
adata = adata_CD8.copy()
adata.obs = adata_phenotyped.obs.copy()
adata
AnnData object with n_obs × n_vars = 15244 × 15
    obs: 'CellID', 'Y_centroid', 'X_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Orientation', 'Extent', 'Solidity', 'Area_filter', 'mean_DAPI_bg_filter', 'DAPI_ratio', 'DAPI_ratio_pass', 'Antibody_clumps', 'CD8_noise', 'folded_tissue', 'ANY', 'annotation', 'sample_id', 'phenotype'

QC phenotypes#

# QuPath shapes

Cellular neighborhoods#