Counting cells using YOLOv5

YOLOv5: https://github.com/ultralytics/yolov5


Author of this notebook: Andre Telfer (andretelfer@cmail.carleton.ca)

pip install shapely
Requirement already satisfied: shapely in /home/andretelfer/anaconda3/envs/napari-env/lib/python3.9/site-packages (1.8.2)
Note: you may need to restart the kernel to use updated packages.
import matplotlib.pyplot as plt
import numpy as np
import shapely.geometry
import napari
import pandas as pd

from tqdm import tqdm
from pathlib import Path
from skimage.io import imread, imsave

What does our dataset look like?

Our dataset is just a folder containing .png images of cfos stains

DATA_DIR = Path("/home/andretelfer/shared/curated/brenna/cfos-examples/original")
plt.figure(figsize=(20,20))
sample_image = next(DATA_DIR.glob('*.png'))
image = imread(sample_image)
plt.imshow(image)
<matplotlib.image.AxesImage at 0x7f55958bcf40>
../_images/creating-dataset-by-sampling-cell-images_5_1.png

Create a new dataset by sampling sections of the original dataset

SIZE = 200 # size of new images in pixels
SAMPLES_PER_IMAGE = 5

def subsample_image(imagepath, samples):
    image = imread(imagepath)
    h,w,c = image.shape
    locations = np.stack([
        np.random.randint(0,h-SIZE,samples),
        np.random.randint(0,w-SIZE,samples)
    ]).T
    
    images = []
    for (i,j) in locations:
        new_image = np.zeros(shape=(SIZE,SIZE))
        new_image = image[i:i+SIZE, j:j+SIZE]
        images.append({
            'image': new_image,
            'x': j,
            'y': i,
            'path': imagepath
        })
        
    return images

sampled_images = []
for image in DATA_DIR.glob('*.png'):
    sampled_images += subsample_image(image, SAMPLES_PER_IMAGE)
    
plt.imshow(sampled_images[5]['image'])
<matplotlib.image.AxesImage at 0x7f5594517c10>
../_images/creating-dataset-by-sampling-cell-images_7_1.png

Save the images to a new directory

OUTPUT_DIR = Path("/home/andretelfer/shared/curated/brenna/cfos-examples/subsamples")

! rm -rf {OUTPUT_DIR}
! mkdir -p {OUTPUT_DIR}

for item in sampled_images:
    image = item['image']
    fname = item['path'].parts[-1].split('.')[0]
    
    output_file = f"{fname}_{item['x']}x_{item['y']}y.png"
    imsave(OUTPUT_DIR / output_file, image)
ls {OUTPUT_DIR}
Rat2slide1sample3-L_305x_394y.png    Rat2slide1sample4-R_100x_1311y.png
Rat2slide1sample3-L_510x_487y.png    Rat2slide1sample4-R_1367x_1096y.png
Rat2slide1sample3-L_519x_748y.png    Rat2slide1sample4-R_1522x_349y.png
Rat2slide1sample3-L_55x_1035y.png    Rat2slide1sample4-R_1703x_387y.png
Rat2slide1sample3-L_624x_961y.png    Rat2slide1sample4-R_824x_172y.png
Rat2slide1sample3-R_1386x_11y.png    Rat2slide1sample5-L_258x_655y.png
Rat2slide1sample3-R_15x_1024y.png    Rat2slide1sample5-L_496x_1318y.png
Rat2slide1sample3-R_1620x_1176y.png  Rat2slide1sample5-L_693x_447y.png
Rat2slide1sample3-R_44x_328y.png     Rat2slide1sample5-L_705x_749y.png
Rat2slide1sample3-R_582x_896y.png    Rat2slide1sample5-L_79x_700y.png
Rat2slide1sample4-L_1376x_1171y.png  Rat2slide1sample5-R_1320x_1213y.png
Rat2slide1sample4-L_1546x_250y.png   Rat2slide1sample5-R_1474x_655y.png
Rat2slide1sample4-L_1550x_435y.png   Rat2slide1sample5-R_1519x_298y.png
Rat2slide1sample4-L_217x_33y.png     Rat2slide1sample5-R_476x_544y.png
Rat2slide1sample4-L_23x_960y.png     Rat2slide1sample5-R_601x_334y.png

Labeling the Images

For this step, the YOLOv5 documentation recommended roboflow.

I created a free-tier account and started labelling.

I modified the tutorial by not including a scaling preprocessing step. I did this because the size of the cell matters and I wanted to preserve that information

  • there are some large splotches that are cell shaped, but are not cells

  • there are small speckles which are not cells

Training a Model

The YOLOv5 guide came with a Google Colab notebook that was easy to modify to my own examples (dataset, image sizes, etc)

  • Following the guide, I changed the dataset to the one we created in RoboFlow

  • In order to preserve information about cell size, for training I set the image size to be the same as the actual image size for the sampled training images. When running inference/detection, I used the full image size (e.g. ~2000px in my case).

Inference and Results

I uploaded the original images to google drive separately and modified the notebook to use them

The results were overall quite good, although I later found I should’ve labeled more of the lighter cells in the training data; so the model also misses the lighter cells.

Interacting with the results (correcting/viewing)

This will allow us to add/remove cells that were missed

Loading the YOLO labels

all_vertices = []

for idx, p in enumerate(sorted(LABEL_DIR.glob("*.txt"))):
    with open(p, 'r') as fp:
        lines = fp.readlines()
    
    # Turn the data into a dataframe
    data = [l.strip().split(' ') for l in lines]
    df = pd.DataFrame(data, columns=['0', 'x', 'y', 'w', 'h', 'c']).astype(float)

    # Drop low confidence frames
    df = df[df.c > 0.2]
    
    # Scale by image size
    fname, _ = p.parts[-1].split('.')
    image = imread(DATA_DIR / f"{fname}.png")
    h, w, c = image.shape
    df.y *= h
    df.h *= h
    df.x *= w
    df.w *= w
    
    # Get x,y vertices for rectangle
    i = np.ones(shape=df.shape[0])*idx
    vertices = np.array([
        [i, df.y-df.h/2, df.x-df.w/2],
        [i, df.y+df.h/2, df.x-df.w/2],
        [i, df.y+df.h/2, df.x+df.w/2],
        [i, df.y-df.h/2, df.x+df.w/2]
    ]).transpose(2, 0, 1)
    all_vertices.append(vertices)
    
all_vertices = np.concatenate(all_vertices)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Input In [8], in <cell line: 3>()
      1 all_vertices = []
----> 3 for idx, p in enumerate(sorted(LABEL_DIR.glob("*.txt"))):
      4     with open(p, 'r') as fp:
      5         lines = fp.readlines()

NameError: name 'LABEL_DIR' is not defined

Viewing them with Napari

# The YOLOv5 labels
LABEL_DIR = Path("assets/yolov5-results/labels")

viewer = napari.Viewer()

# Add the images
images = np.array([imread(p) for p in sorted(DATA_DIR.glob("*.png"))])
image_layer = viewer.add_image(np.array(images))

# Add the yolov5 labels
shape_layer = viewer.add_shapes(all_vertices, face_color=[1., 0., 0., 0.3])

Getting cell counts

shape_layer.save('assets/cells.csv')
cells_df = pd.read_csv('assets/cells.csv')
cells_df.head(3)
index shape-type vertex-index axis-0 axis-1 axis-2
0 0 rectangle 0 0.0 1041.999713 255.000033
1 0 rectangle 1 0.0 1054.999711 255.000033
2 0 rectangle 2 0.0 1054.999711 264.000031
cells_df = cells_df.rename(columns={'axis-0': 'image', 'axis-1': 'y', 'axis-2' : 'x', 'index': 'cell'})
cells_df.head(5)
cell shape-type vertex-index image y x
0 0 rectangle 0 0.0 1041.999713 255.000033
1 0 rectangle 1 0.0 1054.999711 255.000033
2 0 rectangle 2 0.0 1054.999711 264.000031
3 0 rectangle 3 0.0 1041.999713 264.000031
4 1 rectangle 0 0.0 17.000033 257.000059

Finally, we can get the cell counts for each image

cell_counts = cells_df.groupby('image').apply(lambda x: len(x.cell.unique()))
image
0.0    507
1.0    678
2.0    637
3.0    681
4.0    499
5.0    685
Name: cell_count, dtype: int64

Getting cells in an area

zone_layer = viewer.add_shapes(name='zone', ndim=3, edge_color='red', face_color=[0.,0.,1.,0.3])
zone_layer.save('assets/zones.csv')
zone_df = pd.read_csv('assets/zones.csv')
zone_df = zone_df.rename(columns={'axis-0': 'image', 'axis-1': 'y', 'axis-2' : 'x'})

cells_by_image = cells_df.groupby('image')
zones_by_image = zone_df.groupby('image')

for (idx, zone), (idx, cells) in zip(zones_by_image, cells_by_image):
    zone = shapely.geometry.Polygon(zone[['x', 'y']].values)
    
    plt.figure(figsize=(16,16))
    plt.imshow(images[int(idx)])
    x,y = zone.exterior.xy
    plt.plot(x,y)
    ax = plt.gca()
    
    count = 0
    for _, cell in tqdm(cells.groupby('cell')):
        cell = shapely.geometry.Polygon(cell[['x', 'y']].values)
        if zone.contains(cell):
            count += 1
            ax.add_patch(plt.Polygon(np.stack(cell.exterior.xy).T, color='red'))
    
    plt.show()
    
    print("Cell count", count)
    print("Density", count / zone.area * 1e6)
    print()
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 507/507 [00:00<00:00, 2086.06it/s]
../_images/creating-dataset-by-sampling-cell-images_26_1.png
Cell count 253
Density 212.8925598329929
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 678/678 [00:00<00:00, 2406.08it/s]
../_images/creating-dataset-by-sampling-cell-images_26_4.png
Cell count 237
Density 287.37546350839864
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 637/637 [00:00<00:00, 2235.87it/s]
../_images/creating-dataset-by-sampling-cell-images_26_7.png
Cell count 280
Density 277.3213753614854
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 681/681 [00:00<00:00, 2398.25it/s]
../_images/creating-dataset-by-sampling-cell-images_26_10.png
Cell count 231
Density 294.89183180541426
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 499/499 [00:00<00:00, 2362.33it/s]
../_images/creating-dataset-by-sampling-cell-images_26_13.png
Cell count 155
Density 180.41495343681075
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 685/685 [00:00<00:00, 2353.12it/s]
../_images/creating-dataset-by-sampling-cell-images_26_16.png
Cell count 238
Density 306.52820841199133