Incorrect std calculation

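Currently, the standard deviation is calculated as the mean of the per-image standard deviations over the dataset. This ignores how much the image means differ from one another, so it tends to underestimate the spread of pixel values across the whole dataset.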

https://gitlab.ics.muni.cz/rationai/digital-pathology/pathology/breast-cancer/-/blob/master/preprocessing/calculate_mean_std.py

Ideally, each pixel is treated as one feature/attribute of the input data: just as a data vector might be (age, weight, height, …), an image is simply (pixel, pixel, …). Traditionally (as taught in courses), each feature is standardized independently: you compute its mean and standard deviation over the dataset, and then transform it as (x - mean) / std.
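As a minimal sketch of that textbook per-feature standardization, assuming NumPy and a made-up table whose columns are (age, weight, height); the data and the `standardize` helper are hypothetical and not taken from the repository:

```python
import numpy as np

# Hypothetical toy data: each row is a sample, each column a feature
# (age, weight, height).
X = np.array([
    [25.0, 60.0, 165.0],
    [35.0, 80.0, 180.0],
    [45.0, 70.0, 175.0],
])


def standardize(X):
    """Standardize each column independently as (x - mean) / std."""
    mean = X.mean(axis=0)  # one mean per feature
    std = X.std(axis=0)    # one population std (ddof=0) per feature
    return (X - mean) / std, mean, std


Z, mean, std = standardize(X)
# After the transform, every column of Z has mean 0 and std 1.
```

The statistics are computed per feature over all samples, never per sample; the image case only changes what counts as a "feature".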

If we apply this idea to images, the result is a “mean image” and a matching “std image”, with one value per pixel position. It’s usually assumed that images have no prior positional bias in what they show (for example, an image whose object is shifted to the side is still just “an image”). Under this assumption, the mean image will almost certainly come out as a uniform color (or very close to it), and the same goes for the std image.

In our case (tissue tiles), this assumption might not fully hold, since the borders will be white much more often than the centers, but we ignore that detail for simplicity.

So in the end, we pool all pixels regardless of their position or the image they come from, and voilà: we compute the mean and std over all pixels across the entire dataset, not per image.
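A minimal sketch of the pooled computation, compared against the mean-of-stds approach. It assumes NumPy, images given as H x W x C arrays, and per-channel statistics; the function names and the two toy images are hypothetical and are not the code in `calculate_mean_std.py`:

```python
import numpy as np


def _as_pixels(img):
    """Flatten an H x W x C image into an (H*W, C) array of float64 pixels."""
    arr = np.asarray(img, dtype=np.float64)
    return arr.reshape(-1, arr.shape[-1])


def pooled_mean_std(images):
    """Per-channel mean and std over all pixels of all images, in one pass."""
    count = 0
    total = 0.0
    total_sq = 0.0
    for img in images:
        pixels = _as_pixels(img)
        count += pixels.shape[0]
        total = total + pixels.sum(axis=0)
        total_sq = total_sq + (pixels ** 2).sum(axis=0)
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def mean_of_per_image_stds(images):
    """The approach described at the top of this issue, for comparison."""
    return np.mean([_as_pixels(img).std(axis=0) for img in images], axis=0)


# Two constant toy images: one all black (0.0), one all white (1.0).
images = [np.zeros((2, 2, 3)), np.ones((2, 2, 3))]

pooled_mean, pooled_std = pooled_mean_std(images)  # 0.5 and 0.5 per channel
naive_std = mean_of_per_image_stds(images)         # 0.0 per channel
```

Each toy image has zero std on its own, so the mean of stds is 0, while the pooled std is 0.5 because the two image means differ. Accumulating sums and sums of squares avoids holding the dataset in memory; for very large datasets a numerically more robust update such as Welford's algorithm may be preferable.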
