Metadata

This page is the technical reference for the metadata subpackage.

The metadata subpackage provides classes for managing information about the files handled by the FrameData and PairedFrameData classes. The BaseMetadata class defines the interface that a metadata class must implement, while ImagingMetadata and VideoMetadata are concrete, ready-to-use classes for image and video metadata.

`BaseMetadata`

Bases: ABC

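Base class that enforces the format of metadata classes. Each metadata class must implement its own `__init__` method, together with the `_numeric_fields` and `_text_fields` properties that list its metadata attributes. All metadata classes share a common interface for checking metadata matches via `is_match`.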

Source code in clair_torch/metadata/base.py

class BaseMetadata(ABC):
    """
    Base class for enforcing the format of metadata classes. Each metadata class must implement their
    own __init__ method and have a common interface for checking metadata matches via is_match.
    """
    @abstractmethod
    def __init__(self):
        ...

    @property
    @abstractmethod
    def _numeric_fields(self) -> list[str]:
        ...

    @property
    @abstractmethod
    def _text_fields(self) -> list[str]:
        ...

    def get_numeric_metadata(self) -> dict[str, float | int]:
        return {
            field: getattr(self, field)
            for field in self._numeric_fields
            if getattr(self, field) is not None
        }

    def get_text_metadata(self) -> dict[str, str]:
        return {
            field: getattr(self, field)
            for field in self._text_fields
            if getattr(self, field) is not None
        }

    def get_all_metadata(self) -> dict[str, str | int | float]:
        numeric_dict = self.get_numeric_metadata()
        text_dict = self.get_text_metadata()
        return text_dict | numeric_dict

    @typechecked
    def is_match(self, other: 'BaseMetadata', attributes: dict[str, None | int | float], *,
                 missing_key_fails: bool = True) -> bool:
        """
        Function for checking if two instances of metadata classes are a match or not based on the given
        mapping of attributes to tolerances.

        Args:
            other: an instance of a concrete implementation of a metadata class.
            attributes: dictionary mapping attribute names to their tolerances.
                        A None value means an exact match is required.
            missing_key_fails: whether a missing attribute results in a failed check or not.

        Returns:
            True if a match, False if not.
        """
        if not issubclass(type(other), BaseMetadata):
            return False

        for attr, tolerance in attributes.items():
            if attr in self.get_text_metadata():
                if getattr(self, attr, None) != getattr(other, attr, None):
                    return False

            elif attr in self.get_numeric_metadata():
                safe_tolerance = 0.0 if tolerance is None else tolerance
                if not isclose(getattr(self, attr, None), getattr(other, attr, None), rel_tol=safe_tolerance):
                    return False

            elif missing_key_fails:
                # Attribute not present in either text or numeric metadata
                return False

        return True

`is_match(other, attributes, *, missing_key_fails=True)`

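Checks whether two metadata instances match, based on a mapping of attribute names to tolerances. Text attributes are compared for exact equality and their tolerance is ignored. Numeric attributes are compared with isclose, with the tolerance passed as rel_tol, so it is a relative rather than an absolute tolerance; None is treated as a tolerance of 0.0. Each attribute is looked up among the non-None text and numeric fields of the calling instance. If it is not found there, the check fails only when missing_key_fails is True.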

Parameters:

Name	Type	Description	Default
`other`	`BaseMetadata`	an instance of a concrete implementation of a metadata class.	required
`attributes`	`dict[str, None | int | float]`	dictionary mapping attribute names to their tolerances. A None value means an exact match is required.	required
`missing_key_fails`	`bool`	whether a missing attribute results in a failed check or not.	`True`

Returns:

Type	Description
`bool`	True if a match, False if not.

`ImagingMetadata`

Bases: BaseMetadata

Class for managing the metadata of an optical microscope image, with fields for exposure time, magnification, illumination type and subject name. The metadata is parsed from the file name (without its extension), which is split on white space. For parsing to succeed, the name should contain the following parts, in any order:

1. Exposure time in milliseconds, with a point as the decimal separator and the suffix 'ms' with no white space in between, e.g. '10.5ms'. The parsed value is stored in seconds (divided by 1000).
2. Magnification as a float, with a point as the decimal separator and the suffix 'x' or 'X' with no white space in between, e.g. '50.0x'.
3. Illumination type, reserved as 'BF' or 'bf' for bright field and 'DF' or 'df' for dark field. The value is stored in lower case.
4. Subject name, taken as the first part that does not fit any of the categories above.

Source code in clair_torch/metadata/imaging_metadata.py

class ImagingMetadata(BaseMetadata):
    """
    Class for managing the metadata of an optical microscope image, containing metadata fields for exposure time,
    magnification, illumination type and subject name. The metadata is parsed based on the file name, which is assumed
    to have the following parts, in arbitrary order, for successful parsing. White space is reserved for separating
    parts.
    1. Exposure time in milliseconds with a point for decimal separator, ending in ms with no white space. e.g. '10.5ms'.
    2. Magnification as float with a point for decimal separator, ending in 'x' or 'X' with no white space. e.g. '50.0x'.
    3. Illumination type is reserved as 'BF' or 'bf' for bright field and 'DF' or 'df' for dark field.
    4. Subject name will be parsed as the first part that doesn't fit into any of the aforementioned categories.
    """
    @typechecked
    def __init__(self, val_input_path: str | Path):
        """
        Initialization of a ImagingMetadata instance. Metadata parsing based on the file name.
        Args:
            val_input_path: the path to the file for which to parse metadata.
        """
        if isinstance(val_input_path, str):
            val_input_path = Path(val_input_path)
        if not isinstance(val_input_path, Path):
            raise TypeError(f"Expected val_input_path as Path, got {type(val_input_path)}")

        self.exposure_time = None
        self.magnification = None
        self.illumination = None
        self.subject = None

        self._parse_file_name(val_input_path)

    @property
    def _text_fields(self) -> list[str]:
        return ["illumination", "subject"]

    @property
    def _numeric_fields(self) -> list[str]:
        return ["exposure_time", "magnification"]

    def _parse_file_name(self, val_input_path: Path):
        """
        Extracts metadata fields from the filename. Attempts to parse exposure time, magnification, illumination type
        and subject name. After successful parsing the parsed value is assigned to instance attributes.
        """
        file_name_array = val_input_path.stem.split()

        for element in file_name_array:
            lower_elem = element.casefold()

            # Try exposure time
            if self.exposure_time is None and re.match(r"^\d+.*ms$", element):
                try:
                    self.exposure_time = float(element.removesuffix('ms')) / 1000
                    continue
                except ValueError:
                    pass

            # Try magnification
            if self.magnification is None and re.match(r"^\d+.*[xX]$", element):
                try:
                    self.magnification = float(element.lower().removesuffix('x'))
                    continue
                except ValueError:
                    pass

            # Try illumination
            if self.illumination is None and lower_elem in {'bf', 'df'}:
                self.illumination = element.lower()
                continue

            # If none match and subject is not yet set
            if self.subject is None:
                self.subject = element

`__init__(val_input_path)`

Initializes an ImagingMetadata instance. The metadata is parsed from the file name.

Parameters:

Name	Type	Description	Default
`val_input_path`	`str | Path`	the path to the file for which to parse metadata.	required

`VideoMetadata`

Bases: ImagingMetadata

Class for managing the metadata of a video file, based on the ImagingMetadata class. It adds the numeric metadata field 'number_of_frames', which is obtained by a helper function that calls OpenCV to read the number of frames.

Source code in clair_torch/metadata/imaging_metadata.py

class VideoMetadata(ImagingMetadata):
    """
    Class for managing the metadata of a video file, based off of the ImagingMetadata class. Additional feature to that
    is the numeric metadata field 'number_of_frames', which is parsed using a function that calls OpenCV to get the
    number of frames.
    """
    @typechecked
    def __init__(self, val_input_path: str | Path):

        super().__init__(val_input_path)
        self.number_of_frames = _get_frame_count(val_input_path)

    @property
    def _numeric_fields(self) -> list[str]:
        return ["exposure_time", "magnification", "number_of_frames"]