Integrating proteomics and imaging data#
Import necessary packages#
import opendvp as dvp
import geopandas as gpd
import ast
import spatialdata
import napari_spatialdata
from dask_image import imread
Load DIANN data into an adata object#
adata = dvp.io.DIANN_to_adata(
DIANN_path="data/proteomics/DIANN_subset_demo.pg_matrix.csv",
DIANN_sep="\t",
metadata_path="data/proteomics/DIANN_metadata_subset_demo.csv",
metadata_sep=";",
n_of_protein_metadata_cols=4
)
adata
13:50:10.86 | INFO | Starting DIANN matrix shape (7030, 14)
13:50:10.87 | INFO | Removing 264 contaminants
13:50:10.87 | INFO | Filtering 3 genes that are NaN
13:50:10.87 | INFO | ['A0A0G2JRQ6_HUMAN', 'A0A0J9YY99_HUMAN', 'YJ005_HUMAN']
13:50:10.87 | INFO | 10 samples, and 6763 proteins
13:50:10.88 | INFO | 52 gene lists (eg 'TMA7;TMA7B') were simplified to ('TMA7').
13:50:10.88 | SUCCESS | Anndata object has been created :)
AnnData object with n_obs × n_vars = 10 × 6763
obs: 'Precursors.Identified', 'Proteins.Identified', 'Average.Missed.Tryptic.Cleavages', 'LCMS_run_id', 'RCN', 'RCN_long', 'QuPath_class'
var: 'Protein.Group', 'Protein.Names', 'Genes', 'First.Protein.Description'
Let’s see the loaded metadata
adata.obs
Precursors.Identified | Proteins.Identified | Average.Missed.Tryptic.Cleavages | LCMS_run_id | RCN | RCN_long | QuPath_class | |
---|---|---|---|---|---|---|---|
Sample_filepath | |||||||
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241204_JoN_Evo2_80SPD_Rplate_P26E17_K20_S6-G4_1_8674.d | 26358 | 4760 | 0.139 | 8674 | RCN1 | Tumor enriched | P12_Tumor_1 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_H21_S3-F9_1_8452.d | 24705 | 4553 | 0.134 | 8452 | RCN1 | Tumor enriched | P12_Tumor_2 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_E4_S2-C2_1_8414.d | 26750 | 4835 | 0.134 | 8414 | RCN1 | Tumor enriched | P12_Tumor_3 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241204_JoN_Evo2_80SPD_Rplate_P26E17_L10_S4-H8_1_8551.d | 24268 | 4502 | 0.116 | 8551 | RCN1 | Tumor enriched | P12_Tumor_4 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_F4_S2-D2_1_8424.d | 27883 | 4858 | 0.136 | 8424 | RCN1 | Tumor enriched | P12_Tumor_5 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_C22_S3-A10_1_8521.d | 20460 | 4116 | 0.129 | 8521 | RCN3 | Immune enriched | P12_Immune_1 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_D4_S2-B2_1_8404.d | 20446 | 3955 | 0.129 | 8404 | RCN3 | Immune enriched | P12_Immune_2 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_D3_S2-B1_1_8403.d | 21077 | 4212 | 0.127 | 8403 | RCN3 | Immune enriched | P12_Immune_3 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_C10_S2-A8_1_8390.d | 18013 | 3749 | 0.117 | 8390 | RCN3 | Immune enriched | P12_Immune_4 |
E:\Proteomics\Jose\P26E17\rawfiles\Olive_20241129_JoN_Evo2_80SPD_P26E17_991_C23_S3-A11_1_8520.d | 19740 | 3799 | 0.123 | 8520 | RCN3 | Immune enriched | P12_Immune_5 |
Export adata for transparency#
dvp.io.export_adata(adata=adata, path_to_dir="data/checkpoints", checkpoint_name="1_loaded")
12:52:16.33 | INFO | Writing h5ad
12:52:16.37 | SUCCESS | Wrote h5ad file
Load shapes of cut samples#
gdf = gpd.read_file("data/proteomics/collection_shapes.geojson")
gdf.head()
id | objectType | classification | geometry | |
---|---|---|---|---|
0 | 78422794-3ed4-4e09-a9aa-6486c467149d | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3417 955.5, 3417.23 962.94, 3417.93 ... |
1 | 70a57ffd-50a0-4449-b873-2dd2a38bf190 | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3547 715.5, 3547.23 722.94, 3547.93 ... |
2 | edfda1ca-8de8-44f5-af4e-f16e15bcd4ca | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3324 568.5, 3324.23 575.94, 3324.93 ... |
3 | a0a868c6-c7e3-4fb0-8f4f-071454565080 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4398 2191.5, 4398.21 2198.06, 4398.8... |
4 | ce97d787-321e-4b7d-97e7-55b203fb0373 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4173 2349, 4173.22 2356.16, 4173.9 2... |
Create spatialdata object#
sdata = spatialdata.SpatialData()
Load multiplex immunofluorescence into spatialdata object#
# first load as array using dask-image
image_array = imread.imread("data/image/mIF.ome.tif")
image_array
|
# load image to spatialdata object
sdata['mIF'] = spatialdata.models.Image2DModel.parse(image_array)
INFO no axes information specified in the object, setting `dims` to: ('c', 'y', 'x')
sdata
SpatialData object
└── Images
└── 'mIF': DataArray[cyx] (15, 5000, 5000)
with coordinate systems:
▸ 'global', with elements:
mIF (Images)
Load proteomics matrix (adata object) to spatialdata object#
First we must label the matrix, to let spatialdata know which coordinate system to use.
In this case, this means labelling which slide it was.
adata.obs["Slide_id"] = "Slide_P12"
adata
AnnData object with n_obs × n_vars = 10 × 6763
obs: 'Precursors.Identified', 'Proteins.Identified', 'Average.Missed.Tryptic.Cleavages', 'LCMS_run_id', 'RCN', 'RCN_long', 'QuPath_class', 'Slide_id'
var: 'Protein.Group', 'Protein.Names', 'Genes', 'First.Protein.Description'
Now we will pass the matrix to spatialdata.
We need to tell spatialdata what parts of the adata object to use for visualization.
region
is which set of shapes to project these data into, in this case Slide_P12
.
region_key
is which column in adata.obs to use for finding the region
parameter.
instance_key
refers to the column that links this adata object to the shapes.
We use QuPath_class
because that is the column that matches between the adata and the geodataframe.
sdata['proteomics'] = spatialdata.models.TableModel.parse(
adata=adata,
region="Slide_P12",
region_key="Slide_id",
instance_key="QuPath_class"
)
sdata
SpatialData object
├── Images
│ └── 'mIF': DataArray[cyx] (15, 5000, 5000)
└── Tables
└── 'proteomics': AnnData (10, 6763)
with coordinate systems:
▸ 'global', with elements:
mIF (Images)
Load the geodataframe with shapes that match proteomic samples#
gdf.head()
id | objectType | classification | geometry | |
---|---|---|---|---|
0 | 78422794-3ed4-4e09-a9aa-6486c467149d | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3417 955.5, 3417.23 962.94, 3417.93 ... |
1 | 70a57ffd-50a0-4449-b873-2dd2a38bf190 | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3547 715.5, 3547.23 722.94, 3547.93 ... |
2 | edfda1ca-8de8-44f5-af4e-f16e15bcd4ca | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3324 568.5, 3324.23 575.94, 3324.93 ... |
3 | a0a868c6-c7e3-4fb0-8f4f-071454565080 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4398 2191.5, 4398.21 2198.06, 4398.8... |
4 | ce97d787-321e-4b7d-97e7-55b203fb0373 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4173 2349, 4173.22 2356.16, 4173.9 2... |
Here we see that the QuPath class name is inside that classification column.
Let’s get it out
gdf['QuPath_class'] = gdf['classification'].apply(
lambda row: ast.literal_eval(row).get('name') if isinstance(row,str) else row.get('name'))
# this line of code says, if classification cells are strings,
# convert to a dictionary and get name attribute,
# otherwise we assume it is already a dictionary and we get the name attribute.
gdf.head()
id | objectType | classification | geometry | QuPath_class | |
---|---|---|---|---|---|
0 | 78422794-3ed4-4e09-a9aa-6486c467149d | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3417 955.5, 3417.23 962.94, 3417.93 ... | P12_Immune_3 |
1 | 70a57ffd-50a0-4449-b873-2dd2a38bf190 | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3547 715.5, 3547.23 722.94, 3547.93 ... | P12_Immune_3 |
2 | edfda1ca-8de8-44f5-af4e-f16e15bcd4ca | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3324 568.5, 3324.23 575.94, 3324.93 ... | P12_Immune_3 |
3 | a0a868c6-c7e3-4fb0-8f4f-071454565080 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4398 2191.5, 4398.21 2198.06, 4398.8... | P12_Immune_4 |
4 | ce97d787-321e-4b7d-97e7-55b203fb0373 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4173 2349, 4173.22 2356.16, 4173.9 2... | P12_Immune_4 |
Now we see that the QuPath_class
column has our matching names.
One last thing we must do, is set these names as the index of the geodataframe.
gdf = gdf.set_index("QuPath_class")
gdf.head()
id | objectType | classification | geometry | |
---|---|---|---|---|
QuPath_class | ||||
P12_Immune_3 | 78422794-3ed4-4e09-a9aa-6486c467149d | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3417 955.5, 3417.23 962.94, 3417.93 ... |
P12_Immune_3 | 70a57ffd-50a0-4449-b873-2dd2a38bf190 | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3547 715.5, 3547.23 722.94, 3547.93 ... |
P12_Immune_3 | edfda1ca-8de8-44f5-af4e-f16e15bcd4ca | annotation | { "name": "P12_Immune_3", "color": [ 0, 255, 2... | POLYGON ((3324 568.5, 3324.23 575.94, 3324.93 ... |
P12_Immune_4 | a0a868c6-c7e3-4fb0-8f4f-071454565080 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4398 2191.5, 4398.21 2198.06, 4398.8... |
P12_Immune_4 | ce97d787-321e-4b7d-97e7-55b203fb0373 | annotation | { "name": "P12_Immune_4", "color": [ 0, 255, 2... | POLYGON ((4173 2349, 4173.22 2356.16, 4173.9 2... |
Now we see that QuPath_class
is the index.
Spatialdata needs this to match the proteomics adata object to these shapes.
Let’s add these prepared shapes to the spatialdata object.
sdata['Slide_P12'] = spatialdata.models.ShapesModel.parse(gdf)
sdata
SpatialData object
├── Images
│ └── 'mIF': DataArray[cyx] (15, 5000, 5000)
├── Shapes
│ └── 'Slide_P12': GeoDataFrame shape: (20, 4) (2D shapes)
└── Tables
└── 'proteomics': AnnData (10, 6763)
with coordinate systems:
▸ 'global', with elements:
mIF (Images), Slide_P12 (Shapes)
Visualize data interactively#
napari_spatialdata.Interactive(sdata)
<napari_spatialdata._interactive.Interactive at 0x2ad487610>
Write spatialdata object for future use#
sdata.write("outputs/spatialdata.zarr")
INFO The Zarr backing store has been changed from None the new file path: outputs/spatialdata.zarr