API Reference

This section provides a detailed reference to the classes and functions in the vision-explanation-methods package.

vision_explanation_methods

Module for creating explanations for vision models.

vision_explanation_methods.explanations

Module for image explanation methods.

vision_explanation_methods.explanations.drise

Implementation of DRISE.

A black box explainability method for object detection.

vision_explanation_methods.explanations.drise.DRISE_saliency(model: vision_explanation_methods.explanations.common.GeneralObjectDetectionModelWrapper, image_tensor: torch.Tensor, target_detections: List[vision_explanation_methods.explanations.common.DetectionRecord], number_of_masks: int, mask_res: Tuple[int, int] = (16, 16), mask_padding: Optional[int] = None, device: str = 'cpu', verbose: bool = False) → List[torch.Tensor]

Compute DRISE saliency map.

Parameters
  • model (OcclusionModelWrapper) – Object detection model wrapped for occlusion

  • target_detections (List of Detection Records) – Baseline detections to get saliency maps for

  • number_of_masks (int) – Number of masks to use for saliency

  • mask_res (Tuple (int, int)) – Resolution of mask before scale up

  • mask_padding (Optional int) – How much to pad the mask before cropping

  • device (str) – Device to use to run the function

Returns

A list of tensors, one tensor for each image. Each tensor is of shape [D, 3, W, H], and [i, 3, W, H] is the saliency map associated with detection i.

Return type

List[torch.Tensor]
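
A minimal usage sketch follows. It is illustrative rather than definitive: wrapped_model stands in for your own subclass of GeneralObjectDetectionModelWrapper, and its predict method is assumed to return a list of DetectionRecord objects for a batched image tensor.

    import torch

    from vision_explanation_methods.explanations.drise import DRISE_saliency

    # wrapped_model is an assumed GeneralObjectDetectionModelWrapper subclass.
    image_tensor = torch.rand(1, 3, 480, 640)       # one RGB image, shape [1, 3, H, W]
    baseline_detections = wrapped_model.predict(image_tensor)

    saliency_maps = DRISE_saliency(
        model=wrapped_model,
        image_tensor=image_tensor,
        target_detections=baseline_detections,
        number_of_masks=50,      # more masks give smoother maps at higher cost
        mask_res=(16, 16),
        device="cpu",
        verbose=True,
    )
    # saliency_maps[0] has shape [D, 3, W, H]: one map per baseline detection.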

vision_explanation_methods.explanations.drise.DRISE_saliency_for_mlflow(model, image_tensor: pandas.core.frame.DataFrame, target_detections: List[vision_explanation_methods.explanations.common.DetectionRecord], number_of_masks: int, mask_res: Tuple[int, int] = (16, 16), mask_padding: Optional[int] = None, device: str = 'cpu', verbose: bool = False) → List[torch.Tensor]

Compute DRISE saliency map.

Parameters
  • model (OcclusionModelWrapper) – Object detection model wrapped for occlusion

  • target_detections (List of Detection Records) – Baseline detections to get saliency maps for

  • number_of_masks (int) – Number of masks to use for saliency

  • mask_res (Tuple (int, int)) – Resolution of mask before scale up

  • mask_padding (Optional int) – How much to pad the mask before cropping

  • device (str) – Device to use to run the function

Returns

A list of tensors, one tensor for each image. Each tensor is of shape [D, 3, W, H], and [i, 3, W, H] is the saliency map associated with detection i.

Return type

List[torch.Tensor]
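
A rough sketch of calling the MLflow variant is below. The mlflow_model object, the baseline_detections list, and in particular the "image" column name used for the base64-encoded input are assumptions made for illustration; match them to whatever your MLflow-wrapped detection model expects.

    import pandas as pd
    import torch

    from vision_explanation_methods.explanations.drise import (
        DRISE_saliency_for_mlflow,
        convert_tensor_to_base64,
    )

    # Encode an image tensor to base64 and place it in a DataFrame; the
    # "image" column name is an assumption, not confirmed by this reference.
    b64_image, image_size = convert_tensor_to_base64(torch.rand(3, 480, 640))
    image_df = pd.DataFrame({"image": [b64_image]})

    saliency_maps = DRISE_saliency_for_mlflow(
        model=mlflow_model,                  # assumed MLflow-wrapped detector
        image_tensor=image_df,
        target_detections=baseline_detections,
        number_of_masks=50,
        device="cpu",
    )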

class vision_explanation_methods.explanations.drise.MaskAffinityRecord(mask: torch.Tensor, affinity_scores: List[torch.Tensor])

Bases: object

Class for keeping track of masks and associated affinity score.

Parameters
  • mask (torch.Tensor) – 3xHxW mask

  • affinity_scores (List of Tensors) – Scores for each detection in each image associated with mask.

get_weighted_masks() → List[torch.Tensor]

Return the masks weighted by the affinity scores.

Returns

Masks weighted by affinity scores - N tensors of shape Dx3xHxW, where N is the number of images in the batch and D is the number of detections in an image (D may vary from image to image)

Return type

List of Tensors

to(device: str)

Move affinity record to accelerator.

Parameters

device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’
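
MaskAffinityRecord instances are normally created inside DRISE_saliency, but a hand-built example illustrates the interface; the mask values and scores below are arbitrary.

    import torch

    from vision_explanation_methods.explanations.drise import MaskAffinityRecord

    # One random 3xHxW mask and affinity scores for a single image with two
    # detections (values are arbitrary, for illustration only).
    mask = torch.rand(3, 480, 640)
    affinity_scores = [torch.tensor([0.8, 0.3])]

    record = MaskAffinityRecord(mask=mask, affinity_scores=affinity_scores)
    record.to("cpu")                          # or e.g. "cuda:0" for an accelerator
    weighted = record.get_weighted_masks()    # list of Dx3xHxW tensors, one per image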

vision_explanation_methods.explanations.drise.compute_affinity_scores(base_detections: vision_explanation_methods.explanations.common.DetectionRecord, masked_detections: vision_explanation_methods.explanations.common.DetectionRecord) → torch.Tensor

Compute highest affinity score between two sets of detections.

Parameters
  • base_detections (Detection Record) – Set of detections to get affinity scores for

  • masked_detections (Detection Record) – Set of detections to score against

Returns

Set of affinity scores associated with each detection

Return type

Tensor of shape D, where D is the number of base detections

vision_explanation_methods.explanations.drise.convert_base64_to_tensor(b64_img: str, device: str) → torch.Tensor

Convert base64 image to tensor.

Parameters
  • b64_img (str) – Base64 encoded image

  • device (str) – Torch string describing device, e.g. “cpu” or “cuda:0”

Returns

Image tensor

Return type

Tensor

vision_explanation_methods.explanations.drise.convert_tensor_to_base64(img_tens: torch.Tensor) → Tuple[str, Tuple[int, int]]

Convert image tensor to base64 string.

Parameters

img_tens (Tensor) – Image tensor

Returns

Base64 encoded image

Return type

str
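
The two conversion helpers can round-trip an image between tensor and base64 form. A minimal sketch, assuming a 3xHxW image tensor as input:

    import torch

    from vision_explanation_methods.explanations.drise import (
        convert_base64_to_tensor,
        convert_tensor_to_base64,
    )

    image = torch.rand(3, 224, 224)                    # any 3xHxW image tensor
    b64_string, image_size = convert_tensor_to_base64(image)
    restored = convert_base64_to_tensor(b64_string, device="cpu")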

vision_explanation_methods.explanations.drise.fuse_mask(img_tensor: torch.Tensor, mask: torch.Tensor) → torch.Tensor

Mask an image tensor.

Parameters
  • img_tensor (Tensor) – Image to be masked

  • mask (Tensor) – Mask for image

Returns

Masked image

Return type

Tensor

vision_explanation_methods.explanations.drise.generate_mask(base_size: Tuple[int, int], img_size: Tuple[int, int], padding: int, device: str) → torch.Tensor

Create a random mask for image occlusion.

Parameters
  • base_size (Tuple (int, int)) – Lower resolution mask grid shape

  • img_size (Tuple (int, int)) – Size of image to be masked (hxw)

  • padding (int) – Amount to offset mask

  • device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’

Returns

Occlusion mask for image, same shape as image

Return type

Tensor
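
The sketch below combines generate_mask and fuse_mask to occlude an image; the image size, grid resolution, and padding value are arbitrary choices for illustration.

    import torch

    from vision_explanation_methods.explanations.drise import fuse_mask, generate_mask

    image = torch.rand(3, 480, 640)           # image tensor to occlude
    mask = generate_mask(
        base_size=(16, 16),                   # low-resolution mask grid
        img_size=(480, 640),                  # image size (h x w)
        padding=32,                           # arbitrary offset for this sketch
        device="cpu",
    )
    masked_image = fuse_mask(image, mask)     # occluded copy, same shape as the image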

vision_explanation_methods.explanations.drise.saliency_fusion(affinity_records: List[vision_explanation_methods.explanations.drise.MaskAffinityRecord], device: str, normalize: Optional[bool] = True, verbose: bool = False) → torch.Tensor

Create a fused mask based on the affinity scores of the different masks.

Parameters
  • affinity_records (List of affinity records) – List of affinity records computed for mask

  • device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’

  • normalize (bool) – Normalize the image by subtracting off the average affinity score (optional), defaults to True

Returns

List of saliency maps - one entry for each image in the batch, and one map per detection in each image

Return type

List of Tensors - one tensor for each image, and each tensor of shape Dx3xHxW, where D is the number of detections in that image.
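
Affinity records are normally assembled inside DRISE_saliency, but the sketch below shows how the documented pieces compose. wrapped_model is an assumed GeneralObjectDetectionModelWrapper subclass whose predict method returns DetectionRecord lists, and the sizes and mask count are arbitrary.

    import torch

    from vision_explanation_methods.explanations.drise import (
        MaskAffinityRecord,
        compute_affinity_scores,
        fuse_mask,
        generate_mask,
        saliency_fusion,
    )

    image = torch.rand(1, 3, 480, 640)                    # batched input (assumed)
    base_detections = wrapped_model.predict(image)        # List[DetectionRecord]

    records = []
    for _ in range(50):                                   # one record per random mask
        mask = generate_mask((16, 16), (480, 640), padding=32, device="cpu")
        masked = fuse_mask(image[0], mask).unsqueeze(0)   # occlude, then re-batch
        masked_detections = wrapped_model.predict(masked)
        scores = [
            compute_affinity_scores(base, occluded)
            for base, occluded in zip(base_detections, masked_detections)
        ]
        records.append(MaskAffinityRecord(mask=mask, affinity_scores=scores))

    saliency_maps = saliency_fusion(records, device="cpu", normalize=True)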

vision_explanation_methods.evaluation

Module for evaluation.

vision_explanation_methods.evaluation.pointing_game

Defines a variety of explanation evaluation tools.

class vision_explanation_methods.evaluation.pointing_game.PointingGame(model: Any, device='auto')

Bases: object

A class for the high energy pointing game.

calculate_gt_salient_pixel_overlap(saliency_scores: List[torch.Tensor], gt_bbox: List)

Calculate percent of overlap between salient pixels and gt bbox.

Formula: (number of salient pixels in the gt bbox) / (number of pixels in the gt bbox)

Parameters
  • saliency_scores (List[Tensor]) – 2D matrix representing the saliency scores of each pixel in an image

  • gt_bbox (List) – bounding box for ground truth prediction

Returns

Percent of salient pixel overlap with the ground truth bounding box

Return type

Float

pointing_game(imagelocation: str, index: int, threshold: float = 0.8, num_masks: int = 100)

Calculate the saliency scores for a given object detection prediction.

The calculated value is a matrix of saliency scores. Values below the threshold are set to -1. The goal here is to filter out insignificant saliency scores, and identify highly salient pixels. That is why it is called a pointing game - we want to “point”, i.e. identify, all highly salient pixels. That way we can easily determine if these highly salient pixels overlap with the gt bounding box.

Parameters
  • imagelocation (str) – Path of the image location

  • index (int) – Index of the desired object within the given image to evaluate

  • threshold (float) – threshold between 0 and 1 to determine saliency of a pixel. If saliency score is below the threshold, then the score is set to -1

  • num_masks (int) – number of masks to run drise with

Returns

2d matrix of highly salient pixels

Return type

List[Tensor]
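
A hedged usage sketch follows; detection_model is a placeholder for any detector supported by this class, the image path is illustrative, and the ground-truth box format shown is an assumption.

    from vision_explanation_methods.evaluation.pointing_game import PointingGame

    game = PointingGame(model=detection_model, device="auto")   # placeholder model

    # Saliency scores for the first detected object, with scores below the
    # 0.8 threshold set to -1.
    salient_scores = game.pointing_game(
        "images/example.jpg", index=0, threshold=0.8, num_masks=100
    )

    # Fraction of highly salient pixels that fall inside the ground-truth box
    # (box format assumed to be [xmin, ymin, xmax, ymax]).
    overlap = game.calculate_gt_salient_pixel_overlap(salient_scores, [40, 60, 200, 220])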

visualize_highly_salient_pixels(img, saliency_scores, gt_bbox: Optional[List] = None)

Create figure of highly salient pixels.

Parameters
  • img (PIL.Image) – PIL test image

  • saliency_scores (List[Tensor]) – 2D matrix representing the saliency scores of each pixel in an image

  • gt_bbox (List) – bounding box for ground truth prediction. If None, no ground truth bounding box is drawn

Returns

Overlay of the saliency scores on top of the image

Return type

Figure

vision_explanation_methods.error_labeling

Module for error labeling.

vision_explanation_methods.error_labeling.error_labeling

Defines the Error Labeling Manager class.

class vision_explanation_methods.error_labeling.error_labeling.ErrorLabelType(value)

Bases: enum.Enum

Enum providing types of error labels.

If None, then the detection is not an error; it is a correct prediction.

BACKGROUND = 'background'
CLASS_LOCALIZATION = 'class_localization'
CLASS_NAME = 'class_name'
DUPLICATE_DETECTION = 'duplicate_detection'
LOCALIZATION = 'localization'
MATCH = 'match'
MISSING = 'missing'

class vision_explanation_methods.error_labeling.error_labeling.ErrorLabeling(task_type: str, pred_y: list, true_y: list, iou_threshold: float = 0.5)

Bases: object

Defines a wrapper class for error labeling in vision scenarios.

Only supported for object detection at this point.

compute_error_labels()

Compute labels for errors in an object detection prediction.

Note: if a row does not have a match, there is a missing ground truth detection

Returns

2d matrix of error labels

Return type

NDArray

compute_error_list()

Determine a complete list of errors encountered during prediction.

Note that it is possible to have more errors than actual objects in an image (because we account for missing detections and duplicate detections).

Returns

list of error labels

Return type

list
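
A hedged sketch of the error-labeling workflow; both the task-type string and the per-detection list format shown below are assumptions chosen for illustration and should be matched to your own data.

    from vision_explanation_methods.error_labeling.error_labeling import ErrorLabeling

    # Assumed format: one [class_id, xmin, ymin, xmax, ymax, score] row per detection.
    pred_y = [[1, 30, 40, 100, 120, 0.9]]
    true_y = [[1, 32, 38, 105, 118, 1.0]]

    labeler = ErrorLabeling("object_detection", pred_y, true_y, iou_threshold=0.5)
    label_matrix = labeler.compute_error_labels()   # 2D matrix of error labels
    error_list = labeler.compute_error_list()       # flat list of errors found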

vision_explanation_methods.DRISE_runner

Method for generating saliency maps for object detection models.

vision_explanation_methods.DRISE_runner.get_drise_saliency_map(imagelocation: str, model: Optional[object], numclasses: int, savename: str, nummasks: int = 25, maskres: Tuple[int, int] = (4, 4), maskpadding: Optional[int] = None, devicechoice: Optional[str] = None, max_figures: Optional[int] = None)

Run D-RISE on image and visualize the saliency maps.

Parameters
  • imagelocation (str) – Path of the image location

  • model (PyTorch model) – Input model for D-RISE. If None, Faster R-CNN model will be used.

  • numclasses (int) – Number of classes the model predicts

  • savename (str) – Path of the saved output figure

  • nummasks (int) – Number of masks to use for saliency

  • maskres (Tuple of ints) – Resolution of mask before scale up

  • maskpadding (Optional int) – How much to pad the mask before cropping

  • max_figures (Optional int) – Maximum number of figures to generate, in case of memory limitations

Returns

Tuple of Matplotlib figure list, path to where the output figure is saved, list of labels

Return type

Tuple of (list of Matplotlib figures, str, list)
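
A usage sketch; the file paths are placeholders, and numclasses=91 assumes the COCO label set used by the default detector.

    from vision_explanation_methods.DRISE_runner import get_drise_saliency_map

    # With model=None the built-in Faster R-CNN detector is used.
    figures, output_path, labels = get_drise_saliency_map(
        imagelocation="images/example.jpg",
        model=None,
        numclasses=91,                  # COCO label count for the default model (assumed)
        savename="output/drise_example",
        nummasks=25,
        maskres=(4, 4),
        max_figures=2,
    )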

vision_explanation_methods.DRISE_runner.get_instance_segmentation_model(num_classes: int)

Load a pre-trained Faster R-CNN model with a ResNet-50 backbone.

Parameters

num_classes (int) – Number of classes the model predicts

Returns

Faster R-CNN PyTorch model

Return type

PyTorch model
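
A short sketch; num_classes=91 assumes weights trained on the COCO label set.

    from vision_explanation_methods.DRISE_runner import get_instance_segmentation_model

    model = get_instance_segmentation_model(num_classes=91)   # COCO label count (assumed)
    model.eval()                                              # switch to inference mode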

vision_explanation_methods.DRISE_runner.plot_img_bbox(ax: matplotlib.axes._subplots.AxesSubplot, box: numpy.ndarray, label: str, color: str)

Plot predicted bounding box and label on the D-RISE saliency map.

Parameters
  • ax (Matplotlib AxesSubplot) – Axis on which the D-RISE saliency map was plotted

  • box (numpy.ndarray) – Bounding box the model predicted

  • label (str) – Label the model predicted

  • color (single letter color string) – Color of the bounding box based on predicted label

Returns

Axis with the predicted bounding box and label plotted on top of the D-RISE saliency map

Return type

Matplotlib AxesSubplot
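
An illustrative sketch; the axis would normally already contain a D-RISE saliency map, and the box coordinate order shown is an assumption.

    import matplotlib.pyplot as plt
    import numpy as np

    from vision_explanation_methods.DRISE_runner import plot_img_bbox

    fig, ax = plt.subplots()
    box = np.array([40, 60, 200, 220])      # assumed order: [xmin, ymin, xmax, ymax]
    ax = plot_img_bbox(ax, box, label="person", color="r")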

vision_explanation_methods.version

Metadata including name and version of package.