API Reference

This section provides a detailed reference to the classes and functions in the vision-explanation-methods package.

vision_explanation_methods

Module for creating explanations for vision models.

vision_explanation_methods.explanations

Module for image explanation methods.

vision_explanation_methods.explanations.drise

Implementation of DRISE.

A black box explainability method for object detection.

vision_explanation_methods.explanations.drise.DRISE_saliency(model: vision_explanation_methods.explanations.common.GeneralObjectDetectionModelWrapper, image_tensor: torch.Tensor, target_detections: List[vision_explanation_methods.explanations.common.DetectionRecord], number_of_masks: int, mask_res: Tuple[int, int] = (16, 16), mask_padding: Optional[int] = None, device: str = 'cpu', verbose: bool = False) → List[torch.Tensor]

Compute DRISE saliency map.

Parameters
  • model (OcclusionModelWrapper) – Object detection model wrapped for occlusion

  • target_detections (List of Detection Records) – Baseline detections to get saliency maps for

  • number_of_masks (int) – Number of masks to use for saliency

  • mask_res (Tuple (int, int)) – Resolution of mask before scale up

  • mask_padding (Optional int) – How much to pad the mask before cropping

  • device (str) – Device to use to run the function

Returns

A list of tensors, one tensor for each image. Each tensor is of shape [D, 3, W, H], and [i, 3, W, H] is the saliency map associated with detection i.

Return type

List[torch.Tensor]
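
A minimal usage sketch follows. It is illustrative rather than definitive: wrapped_model stands in for your own subclass of GeneralObjectDetectionModelWrapper, and its predict method is assumed to return a list of DetectionRecord objects for a batched image tensor.

    import torch

    from vision_explanation_methods.explanations.drise import DRISE_saliency

    # wrapped_model is an assumed GeneralObjectDetectionModelWrapper subclass.
    image_tensor = torch.rand(1, 3, 480, 640)       # one RGB image, shape [1, 3, H, W]
    baseline_detections = wrapped_model.predict(image_tensor)

    saliency_maps = DRISE_saliency(
        model=wrapped_model,
        image_tensor=image_tensor,
        target_detections=baseline_detections,
        number_of_masks=50,      # more masks give smoother maps at higher cost
        mask_res=(16, 16),
        device="cpu",
        verbose=True,
    )
    # saliency_maps[0] has shape [D, 3, W, H]: one map per baseline detection.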

vision_explanation_methods.explanations.drise.DRISE_saliency_for_mlflow(model, image_tensor: pandas.core.frame.DataFrame, target_detections: List[vision_explanation_methods.explanations.common.DetectionRecord], number_of_masks: int, mask_res: Tuple[int, int] = (16, 16), mask_padding: Optional[int] = None, device: str = 'cpu', verbose: bool = False) → List[torch.Tensor]

Compute DRISE saliency map.

Parameters
  • model (OcclusionModelWrapper) – Object detection model wrapped for occlusion

  • target_detections (List of Detection Records) – Baseline detections to get saliency maps for

  • number_of_masks (int) – Number of masks to use for saliency

  • mask_res (Tuple (int, int)) – Resolution of mask before scale up

  • mask_padding (Optional int) – How much to pad the mask before cropping

  • device (str) – Device to use to run the function

Returns

A list of tensors, one tensor for each image. Each tensor is of shape [D, 3, W, H], and [i, 3, W, H] is the saliency map associated with detection i.

Return type

List[torch.Tensor]
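
A rough sketch of calling the MLflow variant is below. The mlflow_model object, the baseline_detections list, and in particular the "image" column name used for the base64-encoded input are assumptions made for illustration; match them to whatever your MLflow-wrapped detection model expects.

    import pandas as pd
    import torch

    from vision_explanation_methods.explanations.drise import (
        DRISE_saliency_for_mlflow,
        convert_tensor_to_base64,
    )

    # Encode an image tensor to base64 and place it in a DataFrame; the
    # "image" column name is an assumption, not confirmed by this reference.
    b64_image, image_size = convert_tensor_to_base64(torch.rand(3, 480, 640))
    image_df = pd.DataFrame({"image": [b64_image]})

    saliency_maps = DRISE_saliency_for_mlflow(
        model=mlflow_model,                  # assumed MLflow-wrapped detector
        image_tensor=image_df,
        target_detections=baseline_detections,
        number_of_masks=50,
        device="cpu",
    )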

class vision_explanation_methods.explanations.drise.MaskAffinityRecord(mask: torch.Tensor, affinity_scores: List[torch.Tensor])

Bases: object

Class for keeping track of masks and associated affinity score.

Parameters
  • mask (torch.Tensor) – 3xHxW mask

  • affinity_scores (List of Tensors) – Scores for each detection in each image associated with mask.

get_weighted_masks() → List[torch.Tensor]

Return the masks weighted by the affinity scores.

Returns

Masks weighted by affinity scores - N tensors of shape Dx3xHxW, where N is the number of images in the batch and D is the number of detections in an image (D may vary from image to image)

Return type

List of Tensors

to(device: str)

Move affinity record to accelerator.

Parameters

device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’
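
MaskAffinityRecord instances are normally created inside DRISE_saliency, but a hand-built example illustrates the interface; the mask values and scores below are arbitrary.

    import torch

    from vision_explanation_methods.explanations.drise import MaskAffinityRecord

    # One random 3xHxW mask and affinity scores for a single image with two
    # detections (values are arbitrary, for illustration only).
    mask = torch.rand(3, 480, 640)
    affinity_scores = [torch.tensor([0.8, 0.3])]

    record = MaskAffinityRecord(mask=mask, affinity_scores=affinity_scores)
    record.to("cpu")                          # or e.g. "cuda:0" for an accelerator
    weighted = record.get_weighted_masks()    # list of Dx3xHxW tensors, one per image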

vision_explanation_methods.explanations.drise.compute_affinity_scores(base_detections: vision_explanation_methods.explanations.common.DetectionRecord, masked_detections: vision_explanation_methods.explanations.common.DetectionRecord) → torch.Tensor

Compute highest affinity score between two sets of detections.

Parameters
  • base_detections (Detection Record) – Set of detections to get affinity scores for

  • masked_detections (Detection Record) – Set of detections to score against

Returns

Set of affinity scores associated with each detection

Return type

Tensor of shape D, where D is the number of base detections

vision_explanation_methods.explanations.drise.convert_base64_to_tensor(b64_img: str, device: str) → torch.Tensor

Convert base64 image to tensor.

Parameters
  • b64_img (str) – Base64 encoded image

  • device (str) – Torch string describing device, e.g. “cpu” or “cuda:0”

Returns

Image tensor

Return type

Tensor

vision_explanation_methods.explanations.drise.convert_tensor_to_base64(img_tens: torch.Tensor) → Tuple[str, Tuple[int, int]]

Convert image tensor to base64 string.

Parameters

img_tens (Tensor) – Image tensor

Returns

Base64 encoded image

Return type

str
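
The two conversion helpers can round-trip an image between tensor and base64 form. A minimal sketch, assuming a 3xHxW image tensor as input:

    import torch

    from vision_explanation_methods.explanations.drise import (
        convert_base64_to_tensor,
        convert_tensor_to_base64,
    )

    image = torch.rand(3, 224, 224)                    # any 3xHxW image tensor
    b64_string, image_size = convert_tensor_to_base64(image)
    restored = convert_base64_to_tensor(b64_string, device="cpu")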

vision_explanation_methods.explanations.drise.fuse_mask(img_tensor: torch.Tensor, mask: torch.Tensor) → torch.Tensor

Mask an image tensor.

Parameters
  • img_tensor (Tensor) – Image to be masked

  • mask (Tensor) – Mask for image

Returns

Masked image

Return type

Tensor

vision_explanation_methods.explanations.drise.generate_mask(base_size: Tuple[int, int], img_size: Tuple[int, int], padding: int, device: str) → torch.Tensor

Create a random mask for image occlusion.

Parameters
  • base_size (Tuple (int, int)) – Lower resolution mask grid shape

  • img_size (Tuple (int, int)) – Size of image to be masked (hxw)

  • padding (int) – Amount to offset mask

  • device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’

Returns

Occlusion mask for image, same shape as image

Return type

Tensor
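
The sketch below combines generate_mask and fuse_mask to occlude an image; the image size, grid resolution, and padding value are arbitrary choices for illustration.

    import torch

    from vision_explanation_methods.explanations.drise import fuse_mask, generate_mask

    image = torch.rand(3, 480, 640)           # image tensor to occlude
    mask = generate_mask(
        base_size=(16, 16),                   # low-resolution mask grid
        img_size=(480, 640),                  # image size (h x w)
        padding=32,                           # arbitrary offset for this sketch
        device="cpu",
    )
    masked_image = fuse_mask(image, mask)     # occluded copy, same shape as the image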

vision_explanation_methods.explanations.drise.saliency_fusion(affinity_records: List[vision_explanation_methods.explanations.drise.MaskAffinityRecord], device: str, normalize: Optional[bool] = True, verbose: bool = False) → torch.Tensor

Create a fused mask based on the affinity scores of the different masks.

Parameters
  • affinity_records (List of affinity records) – List of affinity records computed for mask

  • device (String) – Torch string describing device, e.g. ‘cpu’ or ‘cuda:0’

  • normalize (bool) – Normalize the image by subtracting off the average affinity score (optional), defaults to True

Returns

List of saliency maps - one entry for each image in the batch, and one map per detection in each image

Return type

List of Tensors - one tensor for each image, and each tensor of shape Dx3xHxW, where D is the number of detections in that image.
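
Affinity records are normally assembled inside DRISE_saliency, but the sketch below shows how the documented pieces compose. wrapped_model is an assumed GeneralObjectDetectionModelWrapper subclass whose predict method returns DetectionRecord lists, and the sizes and mask count are arbitrary.

    import torch

    from vision_explanation_methods.explanations.drise import (
        MaskAffinityRecord,
        compute_affinity_scores,
        fuse_mask,
        generate_mask,
        saliency_fusion,
    )

    image = torch.rand(1, 3, 480, 640)                    # batched input (assumed)
    base_detections = wrapped_model.predict(image)        # List[DetectionRecord]

    records = []
    for _ in range(50):                                   # one record per random mask
        mask = generate_mask((16, 16), (480, 640), padding=32, device="cpu")
        masked = fuse_mask(image[0], mask).unsqueeze(0)   # occlude, then re-batch
        masked_detections = wrapped_model.predict(masked)
        scores = [
            compute_affinity_scores(base, occluded)
            for base, occluded in zip(base_detections, masked_detections)
        ]
        records.append(MaskAffinityRecord(mask=mask, affinity_scores=scores))

    saliency_maps = saliency_fusion(records, device="cpu", normalize=True)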

vision_explanation_methods.evaluation

Module for evaluation.

vision_explanation_methods.evaluation.pointing_game

Defines a variety of explanation evaluation tools.

class vision_explanation_methods.evaluation.pointing_game.PointingGame(model: Any, device='auto')

Bases: object

A class for the high energy pointing game.

calculate_gt_salient_pixel_overlap(saliency_scores: List[torch.Tensor], gt_bbox: List)

Calculate percent of overlap between salient pixels and gt bbox.

Formula: (number of salient pixels in the gt bbox) / (number of pixels in the gt bbox)

Parameters
  • saliency_scores (List[Tensor]) – 2D matrix representing the saliency scores of each pixel in an image

  • gt_bbox (List) – bounding box for ground truth prediction

Returns

Percent of salient pixel overlap with the ground truth bounding box

Return type

Float

pointing_game(imagelocation: str, index: int, threshold: float = 0.8, num_masks: int = 100)

Calculate the saliency scores for a given object detection prediction.

The calculated value is a matrix of saliency scores. Values below the threshold are set to -1. The goal here is to filter out insignificant saliency scores, and identify highly salient pixels. That is why it is called a pointing game - we want to “point”, i.e. identify, all highly salient pixels. That way we can easily determine if these highly salient pixels overlap with the gt bounding box.

Parameters
  • imagelocation (str) – Path of the image location

  • index (int) – Index of the desired object within the given image to evaluate

  • threshold (float) – threshold between 0 and 1 to determine saliency of a pixel. If saliency score is below the threshold, then the score is set to -1

  • num_masks (int) – number of masks to run drise with

Returns

2d matrix of highly salient pixels

Return type

List[Tensor]
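
A hedged usage sketch follows; detection_model is a placeholder for any detector supported by this class, the image path is illustrative, and the ground-truth box format shown is an assumption.

    from vision_explanation_methods.evaluation.pointing_game import PointingGame

    game = PointingGame(model=detection_model, device="auto")   # placeholder model

    # Saliency scores for the first detected object, with scores below the
    # 0.8 threshold set to -1.
    salient_scores = game.pointing_game(
        "images/example.jpg", index=0, threshold=0.8, num_masks=100
    )

    # Fraction of highly salient pixels that fall inside the ground-truth box
    # (box format assumed to be [xmin, ymin, xmax, ymax]).
    overlap = game.calculate_gt_salient_pixel_overlap(salient_scores, [40, 60, 200, 220])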

visualize_highly_salient_pixels(img, saliency_scores, gt_bbox: Optional[List] = None)

Create figure of highly salient pixels.

Parameters
  • img (PIL.Image) – PIL test image

  • saliency_scores (List[Tensor]) – 2D matrix representing the saliency scores of each pixel in an image

  • gt_bbox (List) – bounding box for ground truth prediction. If None, no ground truth bounding box is drawn

Returns

Overlay of the saliency scores on top of the image

Return type

Figure

vision_explanation_methods.error_labeling

Module for error labeling.

vision_explanation_methods.error_labeling.error_labeling

Defines the Error Labeling Manager class.

class vision_explanation_methods.error_labeling.error_labeling.ErrorLabelType(value)

Bases: enum.Enum

Enum providing types of error labels.

If None, then the detection is not an error; it is a correct prediction.

BACKGROUND = 'background'
CLASS_LOCALIZATION = 'class_localization'
CLASS_NAME = 'class_name'
DUPLICATE_DETECTION = 'duplicate_detection'
LOCALIZATION = 'localization'
MATCH = 'match'
MISSING = 'missing'

class vision_explanation_methods.error_labeling.error_labeling.ErrorLabeling(task_type: str, pred_y: list, true_y: list, iou_threshold: float = 0.5)

Bases: object

Defines a wrapper class for error labeling in vision scenarios.

Only supported for object detection at this point.

compute_error_labels()

Compute labels for errors in an object detection prediction.

Note: if a row does not have a match, there is a missing ground truth detection

Returns

2d matrix of error labels

Return type

NDArray

compute_error_list()

Determine a complete list of errors encountered during prediction.

Note that it is possible to have more errors than actual objects in an image (because we account for missing detections and duplicate detections).

Returns

list of error labels

Return type

list
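
A hedged sketch of the error-labeling workflow; both the task-type string and the per-detection list format shown below are assumptions chosen for illustration and should be matched to your own data.

    from vision_explanation_methods.error_labeling.error_labeling import ErrorLabeling

    # Assumed format: one [class_id, xmin, ymin, xmax, ymax, score] row per detection.
    pred_y = [[1, 30, 40, 100, 120, 0.9]]
    true_y = [[1, 32, 38, 105, 118, 1.0]]

    labeler = ErrorLabeling("object_detection", pred_y, true_y, iou_threshold=0.5)
    label_matrix = labeler.compute_error_labels()   # 2D matrix of error labels
    error_list = labeler.compute_error_list()       # flat list of errors found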

vision_explanation_methods.DRISE_runner

Method for generating saliency maps for object detection models.

vision_explanation_methods.DRISE_runner.get_drise_saliency_map(imagelocation: str, model: Optional[object], numclasses: int, savename: str, nummasks: int = 25, maskres: Tuple[int, int] = (4, 4), maskpadding: Optional[int] = None, devicechoice: Optional[str] = None, max_figures: Optional[int] = None)

Run D-RISE on image and visualize the saliency maps.

Parameters
  • imagelocation (str) – Path of the image location

  • model (PyTorch model) – Input model for D-RISE. If None, Faster R-CNN model will be used.

  • numclasses (int) – Number of classes the model predicts

  • savename (str) – Path of the saved output figure

  • nummasks (int) – Number of masks to use for saliency

  • maskres (Tuple of ints) – Resolution of mask before scale up

  • maskpadding (Optional int) – How much to pad the mask before cropping

  • max_figures (Optional int) – Maximum number of figures to generate, in case of memory limitations

Returns

Tuple of Matplotlib figure list, path to where the output figure is saved, list of labels

Return type

Tuple of (list of Matplotlib figures, str, list)
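
A usage sketch; the file paths are placeholders, and numclasses=91 assumes the COCO label set used by the default detector.

    from vision_explanation_methods.DRISE_runner import get_drise_saliency_map

    # With model=None the built-in Faster R-CNN detector is used.
    figures, output_path, labels = get_drise_saliency_map(
        imagelocation="images/example.jpg",
        model=None,
        numclasses=91,                  # COCO label count for the default model (assumed)
        savename="output/drise_example",
        nummasks=25,
        maskres=(4, 4),
        max_figures=2,
    )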

vision_explanation_methods.DRISE_runner.get_instance_segmentation_model(num_classes: int)

Load a pre-trained Faster R-CNN model with a ResNet-50 backbone.

Parameters

num_classes (int) – Number of classes the model predicts

Returns

Faster R-CNN PyTorch model

Return type

PyTorch model
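
A short sketch; num_classes=91 assumes weights trained on the COCO label set.

    from vision_explanation_methods.DRISE_runner import get_instance_segmentation_model

    model = get_instance_segmentation_model(num_classes=91)   # COCO label count (assumed)
    model.eval()                                              # switch to inference mode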

vision_explanation_methods.DRISE_runner.plot_img_bbox(ax: matplotlib.axes._subplots.AxesSubplot, box: numpy.ndarray, label: str, color: str)

Plot predicted bounding box and label on the D-RISE saliency map.

Parameters
  • ax (Matplotlib AxesSubplot) – Axis on which the D-RISE saliency map was plotted

  • box (numpy.ndarray) – Bounding box the model predicted

  • label (str) – Label the model predicted

  • color (single letter color string) – Color of the bounding box based on predicted label

Returns

Axis with the predicted bounding box and label plotted on top of the D-RISE saliency map

Return type

Matplotlib AxesSubplot
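
An illustrative sketch; the axis would normally already contain a D-RISE saliency map, and the box coordinate order shown is an assumption.

    import matplotlib.pyplot as plt
    import numpy as np

    from vision_explanation_methods.DRISE_runner import plot_img_bbox

    fig, ax = plt.subplots()
    box = np.array([40, 60, 200, 220])      # assumed order: [xmin, ymin, xmax, ymax]
    ax = plot_img_bbox(ax, box, label="person", color="r")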

vision_explanation_methods.version

Metadata including name and version of package.