Counting cells using YOLOv5¶

YOLOv5: https://github.com/ultralytics/yolov5

Author of this notebook: Andre Telfer (andretelfer@cmail.carleton.ca)

pip install shapely

Requirement already satisfied: shapely in /home/andretelfer/anaconda3/envs/napari-env/lib/python3.9/site-packages (1.8.2)

Note: you may need to restart the kernel to use updated packages.

import matplotlib.pyplot as plt
import numpy as np
import shapely.geometry
import napari
import pandas as pd

from tqdm import tqdm
from pathlib import Path
from skimage.io import imread, imsave

What does our dataset look like?¶

Our dataset is simply a folder of .png images of c-Fos stains.

DATA_DIR = Path("/home/andretelfer/shared/curated/brenna/cfos-examples/original")

plt.figure(figsize=(20,20))
sample_image = next(DATA_DIR.glob('*.png'))
image = imread(sample_image)
plt.imshow(image)

<matplotlib.image.AxesImage at 0x7f55958bcf40>

../_images/creating-dataset-by-sampling-cell-images_5_1.png

Create a new dataset by sampling sections of the original dataset¶

SIZE = 200 # size of new images in pixels
SAMPLES_PER_IMAGE = 5

def subsample_image(imagepath, samples):
    """Cut `samples` random SIZE x SIZE crops out of the image at `imagepath`."""
    image = imread(imagepath)
    h, w = image.shape[:2]  # also works for grayscale images

    # Random top-left corners, chosen so that every crop fits inside the image
    locations = np.stack([
        np.random.randint(0, h - SIZE + 1, samples),
        np.random.randint(0, w - SIZE + 1, samples)
    ]).T

    images = []
    for (i, j) in locations:
        images.append({
            'image': image[i:i+SIZE, j:j+SIZE],
            'x': j,
            'y': i,
            'path': imagepath
        })

    return images

sampled_images = []
for imagepath in sorted(DATA_DIR.glob('*.png')):
    sampled_images += subsample_image(imagepath, SAMPLES_PER_IMAGE)
    
plt.imshow(sampled_images[5]['image'])

<matplotlib.image.AxesImage at 0x7f5594517c10>

../_images/creating-dataset-by-sampling-cell-images_7_1.png
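The crops above change on every run because the random positions are not seeded. As a minimal sketch of one way to make the sampling reproducible, the helper below picks only the top-left corners. The function name, the seed argument, and the 1400 x 1900 image size are assumptions for illustration, and it uses Python's standard `random` module rather than the `np.random.randint` call in the notebook.

```python
import random

def sample_crop_origins(height, width, size, n, seed=0):
    """Return n random (row, col) top-left corners for size x size crops."""
    if size > height or size > width:
        raise ValueError("crop size is larger than the image")
    rng = random.Random(seed)  # local, seeded generator for reproducibility
    # random.randint is inclusive on both ends, so the largest origin still fits
    return [
        (rng.randint(0, height - size), rng.randint(0, width - size))
        for _ in range(n)
    ]

# Hypothetical image size; the real value comes from image.shape
origins = sample_crop_origins(height=1400, width=1900, size=200, n=5, seed=42)
for row, col in origins:
    print(f"crop rows {row}:{row + 200}, cols {col}:{col + 200}")
```

Calling the helper again with the same seed returns the same corners, so the same sub-images can be regenerated later.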

Save the images to a new directory¶

OUTPUT_DIR = Path("/home/andretelfer/shared/curated/brenna/cfos-examples/subsamples")

! rm -rf {OUTPUT_DIR}
! mkdir -p {OUTPUT_DIR}

for item in sampled_images:
    image = item['image']
    fname = item['path'].stem  # file name without the extension

    # Encode the crop's top-left corner in the file name
    output_file = f"{fname}_{item['x']}x_{item['y']}y.png"
    imsave(OUTPUT_DIR / output_file, image)

ls {OUTPUT_DIR}

Rat2slide1sample3-L_305x_394y.png    Rat2slide1sample4-R_100x_1311y.png
Rat2slide1sample3-L_510x_487y.png    Rat2slide1sample4-R_1367x_1096y.png
Rat2slide1sample3-L_519x_748y.png    Rat2slide1sample4-R_1522x_349y.png
Rat2slide1sample3-L_55x_1035y.png    Rat2slide1sample4-R_1703x_387y.png
Rat2slide1sample3-L_624x_961y.png    Rat2slide1sample4-R_824x_172y.png
Rat2slide1sample3-R_1386x_11y.png    Rat2slide1sample5-L_258x_655y.png
Rat2slide1sample3-R_15x_1024y.png    Rat2slide1sample5-L_496x_1318y.png
Rat2slide1sample3-R_1620x_1176y.png  Rat2slide1sample5-L_693x_447y.png
Rat2slide1sample3-R_44x_328y.png     Rat2slide1sample5-L_705x_749y.png
Rat2slide1sample3-R_582x_896y.png    Rat2slide1sample5-L_79x_700y.png
Rat2slide1sample4-L_1376x_1171y.png  Rat2slide1sample5-R_1320x_1213y.png
Rat2slide1sample4-L_1546x_250y.png   Rat2slide1sample5-R_1474x_655y.png
Rat2slide1sample4-L_1550x_435y.png   Rat2slide1sample5-R_1519x_298y.png
Rat2slide1sample4-L_217x_33y.png     Rat2slide1sample5-R_476x_544y.png
Rat2slide1sample4-L_23x_960y.png     Rat2slide1sample5-R_601x_334y.png
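Because each file name encodes the source image and the crop's top-left corner, a crop can be traced back to its position in the original image. The sketch below parses that naming scheme; the helper name and the regular expression are assumptions written for illustration and are not part of the original notebook.

```python
import re
from pathlib import Path

# <source image name>_<x>x_<y>y, matching the f-string used when saving
CROP_NAME = re.compile(r"^(?P<source>.+)_(?P<x>\d+)x_(?P<y>\d+)y$")

def parse_crop_name(path):
    """Split a crop file name into (source image name, x, y)."""
    match = CROP_NAME.match(Path(path).stem)
    if match is None:
        raise ValueError(f"not a crop file name: {path}")
    return match["source"], int(match["x"]), int(match["y"])

print(parse_crop_name("Rat2slide1sample3-L_305x_394y.png"))
# ('Rat2slide1sample3-L', 305, 394)
```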

Labeling the Images¶

For this step, the YOLOv5 documentation recommended Roboflow.

I created a free-tier account and started labelling.

I modified the tutorial by leaving out the scaling preprocessing step. Cell size matters here, and I wanted to preserve that information:

There are some large splotches that are cell-shaped but are not cells.
There are small speckles that are not cells.

Training a Model¶

The YOLOv5 guide came with a Google Colab notebook that was easy to adapt to my own data (dataset, image sizes, etc.).

Following the guide, I changed the dataset to the one we created in Roboflow.
To preserve information about cell size, I set the training image size to the actual size of the sampled training images. When running inference/detection, I used the full image size (e.g. ~2000px in my case).
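As a rough sketch of what setting the training image size looks like, the helper below assembles a YOLOv5 `train.py` command line. The helper itself, the dataset path, the starting weights, the epoch count, and the batch size are placeholders chosen for illustration, not the values used for this notebook. YOLOv5 may also adjust an image size that is not a multiple of the model's stride, so the size actually used can differ slightly from the one requested.

```python
def build_train_command(data_yaml, img_size, weights="yolov5s.pt", epochs=100, batch=16):
    """Assemble the argument list for YOLOv5's train.py (run from the yolov5 repo)."""
    return [
        "python", "train.py",
        "--img", str(img_size),    # training image size, e.g. the crop size
        "--batch", str(batch),     # placeholder batch size
        "--epochs", str(epochs),   # placeholder number of epochs
        "--data", str(data_yaml),  # dataset description exported from Roboflow
        "--weights", weights,      # placeholder pretrained checkpoint
    ]

# Hypothetical dataset path; 200 matches SIZE from the sampling step
train_cmd = build_train_command("dataset/data.yaml", img_size=200)
print(" ".join(train_cmd))
```

The resulting list can be passed to `subprocess.run`, or the printed line can be pasted into a Colab cell prefixed with `!`.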

Inference and Results¶

I uploaded the original images to Google Drive separately and modified the notebook to use them.

The results were quite good overall. However, I later found that I should have labeled more of the lighter cells in the training data; as a result, the model also misses the lighter cells.
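For reference, a detection run on the full-size images could be assembled as in the sketch below. The helper and both paths are placeholders for illustration. The `--save-txt` and `--save-conf` flags are included on the assumption that detections were written out as text files with confidences, which is what the six-column label files loaded later in this notebook suggest.

```python
def build_detect_command(weights, source, img_size):
    """Assemble the argument list for YOLOv5's detect.py (run from the yolov5 repo)."""
    return [
        "python", "detect.py",
        "--weights", str(weights),  # checkpoint produced by training
        "--source", str(source),    # folder with the original, full-size images
        "--img", str(img_size),     # inference image size
        "--save-txt",               # write one label .txt file per image
        "--save-conf",              # append the confidence to each label line
    ]

# Placeholder paths; 2000 reflects the ~2000px images mentioned above
detect_cmd = build_detect_command("runs/train/exp/weights/best.pt", "original/", 2000)
print(" ".join(detect_cmd))
```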

Interacting with the results (correcting/viewing)¶

This lets us remove false detections and add cells that the model missed.

Loading the YOLO labels¶

# The YOLOv5 labels (must be defined before the loop below uses it)
LABEL_DIR = Path("assets/yolov5-results/labels")

all_vertices = []

for idx, p in enumerate(sorted(LABEL_DIR.glob("*.txt"))):
    with open(p, 'r') as fp:
        lines = fp.readlines()

    # Turn the data into a dataframe: class, box center (x, y), box size (w, h), confidence
    data = [l.strip().split(' ') for l in lines]
    df = pd.DataFrame(data, columns=['0', 'x', 'y', 'w', 'h', 'c']).astype(float)

    # Drop low confidence detections
    df = df[df.c > 0.2].copy()

    # YOLO coordinates are normalized to [0, 1], so scale by image size
    image = imread(DATA_DIR / f"{p.stem}.png")
    h, w = image.shape[:2]
    df[['y', 'h']] *= h
    df[['x', 'w']] *= w

    # Get (image index, y, x) vertices for each rectangle
    i = np.full(len(df), idx)
    vertices = np.array([
        [i, df.y-df.h/2, df.x-df.w/2],
        [i, df.y+df.h/2, df.x-df.w/2],
        [i, df.y+df.h/2, df.x+df.w/2],
        [i, df.y-df.h/2, df.x+df.w/2]
    ]).transpose(2, 0, 1)  # -> shape (n_boxes, 4 corners, 3 coordinates)
    all_vertices.append(vertices)

all_vertices = np.concatenate(all_vertices)
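The coordinate conversion in the loading loop is easier to follow for a single box. The sketch below takes one label line in the layout read above (class, x center, y center, width, height, confidence, all but the class and confidence normalized to the image size) and returns the four corners in pixels. The function name, the example line, and the 2000 x 1000 image size are made up for illustration.

```python
def yolo_box_to_corners(x_center, y_center, width, height, image_width, image_height):
    """Convert one normalized YOLO box to four (y, x) pixel corners."""
    cx, cy = x_center * image_width, y_center * image_height
    half_w, half_h = width * image_width / 2, height * image_height / 2
    # Same corner order as the vertices array in the loading loop
    return [
        (cy - half_h, cx - half_w),
        (cy + half_h, cx - half_w),
        (cy + half_h, cx + half_w),
        (cy - half_h, cx + half_w),
    ]

# Hypothetical label line: class x y w h confidence
line = "0 0.5 0.25 0.1 0.05 0.9"
cls, x, y, w, h, conf = (float(v) for v in line.split())
corners = yolo_box_to_corners(x, y, w, h, image_width=2000, image_height=1000)
print(corners)
```

The corners are ordered (y, x) because napari expects row coordinates before column coordinates.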

Viewing them with Napari¶

# The YOLOv5 labels
LABEL_DIR = Path("assets/yolov5-results/labels")

viewer = napari.Viewer()

# Add the images
images = np.array([imread(p) for p in sorted(DATA_DIR.glob("*.png"))])  # assumes all images share one shape
image_layer = viewer.add_image(images)

# Add the yolov5 labels
shape_layer = viewer.add_shapes(all_vertices, face_color=[1., 0., 0., 0.3])

Getting cell counts¶

shape_layer.save('assets/cells.csv')
cells_df = pd.read_csv('assets/cells.csv')
cells_df.head(3)

	shape-type	vertex-index	axis-1	axis-2
0	rectangle	0	1041.999713	255.000033
1	rectangle	1	1054.999711	255.000033
2	rectangle	2	1054.999711	264.000031

cells_df = cells_df.rename(columns={'axis-0': 'image', 'axis-1': 'y', 'axis-2' : 'x', 'index': 'cell'})
cells_df.head(5)

	cell	shape-type	vertex-index	y	x
0	0	rectangle	0	1041.999713	255.000033
1	0	rectangle	1	1054.999711	255.000033
2	0	rectangle	2	1054.999711	264.000031
3	0	rectangle	3	1041.999713	264.000031
4	1	rectangle	0	17.000033	257.000059

Finally, we can get the cell counts for each image

# Each cell has four rows (one per rectangle vertex), so count unique cell ids
cell_counts = cells_df.groupby('image')['cell'].nunique().rename('cell_count')
cell_counts

image
0    507
1    678
2    637
3    681
4    499
5    685
Name: cell_count, dtype: int64

Getting cells in an area¶

zone_layer = viewer.add_shapes(name='zone', ndim=3, edge_color='red', face_color=[0.,0.,1.,0.3])

# Draw one polygon per image in the 'zone' layer in napari before running this cell
zone_layer.save('assets/zones.csv')
zone_df = pd.read_csv('assets/zones.csv')
zone_df = zone_df.rename(columns={'axis-0': 'image', 'axis-1': 'y', 'axis-2' : 'x'})

cells_by_image = cells_df.groupby('image')
zones_by_image = zone_df.groupby('image')

for (idx, zone_rows), (_, cell_rows) in zip(zones_by_image, cells_by_image):
    # Assumes every image has both a zone and at least one cell, so the groups line up
    zone = shapely.geometry.Polygon(zone_rows[['x', 'y']].values)

    plt.figure(figsize=(16,16))
    plt.imshow(images[int(idx)])
    x, y = zone.exterior.xy
    plt.plot(x, y)
    ax = plt.gca()

    # Count the cells whose bounding box lies entirely inside the zone
    count = 0
    for _, vertices in tqdm(cell_rows.groupby('cell')):
        cell = shapely.geometry.Polygon(vertices[['x', 'y']].values)
        if zone.contains(cell):
            count += 1
            ax.add_patch(plt.Polygon(np.stack(cell.exterior.xy).T, color='red'))

    plt.show()

    print("Cell count", count)
    print("Density", count / zone.area * 1e6)  # cells per million square pixels
    print()

100%|██████████| 507/507 [00:00<00:00, 2086.06it/s]

../_images/creating-dataset-by-sampling-cell-images_26_1.png

Cell count 253
Density 212.8925598329929

100%|██████████| 678/678 [00:00<00:00, 2406.08it/s]

../_images/creating-dataset-by-sampling-cell-images_26_4.png

Cell count 237
Density 287.37546350839864

100%|██████████| 637/637 [00:00<00:00, 2235.87it/s]

../_images/creating-dataset-by-sampling-cell-images_26_7.png

Cell count 280
Density 277.3213753614854

100%|██████████| 681/681 [00:00<00:00, 2398.25it/s]

../_images/creating-dataset-by-sampling-cell-images_26_10.png

Cell count 231
Density 294.89183180541426

100%|██████████| 499/499 [00:00<00:00, 2362.33it/s]

../_images/creating-dataset-by-sampling-cell-images_26_13.png

Cell count 155
Density 180.41495343681075

100%|██████████| 685/685 [00:00<00:00, 2353.12it/s]

../_images/creating-dataset-by-sampling-cell-images_26_16.png

Cell count 238
Density 306.52820841199133
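The density values printed above are cell counts divided by the zone area in square pixels, multiplied by 1e6, so they read as cells per million square pixels. The sketch below reproduces that arithmetic on a made-up rectangular zone. It computes the area with the shoelace formula as a stand-in for shapely's `area` attribute used in the notebook, and the helper names, the zone, and the count are all assumptions for illustration.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

def density_per_megapixel(count, area_px):
    """Cells per million square pixels, as printed by the notebook."""
    return count / area_px * 1e6

# Hypothetical 1000 x 500 pixel zone containing 120 cells
zone_vertices = [(0, 0), (1000, 0), (1000, 500), (0, 500)]
area = polygon_area(zone_vertices)
print("Area (px^2):", area)
print("Density:", density_per_megapixel(120, area))
```

Converting these values to a physical density would require the pixel size of the microscope images, which is not given here.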
