Image Classes

Image

class perception.Image(data, frame='unspecified')

Bases: object

Abstract wrapper class for images.

__init__(data, frame='unspecified')

Create an image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (e.g. R, G, B values). Alternatively, if the array is one-dimensional, it will be interpreted as an N by 1 image with a single-element list at each pixel, and if the array is two-dimensional, it will be interpreted as an N by M image with a single-element list at each pixel.
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.
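
The handling of one-, two-, and three-dimensional input described above can be pictured with a small NumPy sketch. The helper name as_image_array is hypothetical and this is not the library's code; it only illustrates the documented shape convention.

```python
import numpy as np

def as_image_array(data):
    """Return a 3D (rows, cols, channels) view of 1D, 2D, or 3D data.

    Hypothetical helper that mirrors the documented convention only.
    """
    data = np.asarray(data)
    if data.ndim == 1:
        # N values -> N x 1 image with one element per pixel
        return data[:, np.newaxis, np.newaxis]
    if data.ndim == 2:
        # N x M values -> N x M image with one element per pixel
        return data[:, :, np.newaxis]
    if data.ndim == 3:
        return data
    raise ValueError("data must have 1, 2, or 3 dimensions")

print(as_image_array(np.zeros(5)).shape)          # (5, 1, 1)
print(as_image_array(np.zeros((4, 6))).shape)     # (4, 6, 1)
print(as_image_array(np.zeros((4, 6, 3))).shape)  # (4, 6, 3)
```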

shape

tuple of int – The shape of the data array.

height

int – The number of rows in the image.

width

int – The number of columns in the image.

center

numpy.ndarray of int – The xy indices of the center of the image.

channels

int – The number of channels in each pixel. For example, RGB images have 3 channels.

type

numpy.dtype – The data type of the image’s elements.

raw_data

numpy.ndarray – The 3D array of data. The first dim is rows, the second is columns, and the third is pixel channels.

data

numpy.ndarray – The data array, but squeezed to get rid of extraneous dimensions.

frame

str – The frame of reference in which the image resides.

resize(size, interp)

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
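
The three accepted forms of size can be read as follows. This is a plain-Python sketch, not the library's implementation: the helper name target_shape, the truncation to whole pixels, and the (height, width) ordering of the tuple are all assumptions made for illustration.

```python
def target_shape(height, width, size):
    """Compute an output (height, width) for the three documented forms of size."""
    if isinstance(size, tuple):
        return size                      # explicit output size
    if isinstance(size, float):
        scale = size                     # fraction of current size
    elif isinstance(size, int):
        scale = size / 100.0             # percentage of current size
    else:
        raise ValueError("size must be an int, float, or tuple")
    # Truncating to whole pixels is an assumption of this sketch
    return (int(height * scale), int(width * scale))

print(target_shape(480, 640, 50))          # (240, 320)
print(target_shape(480, 640, 0.25))        # (120, 160)
print(target_shape(480, 640, (100, 200)))  # (100, 200)
```

Note that 50 and 0.5 describe the same scaling, so the Python type of the argument matters.
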
transform(translation, theta, method='opencv')

Create a new image by translating and rotating the current image.

Parameters:
  • translation (numpy.ndarray of float) – The XY translation vector.
  • theta (float) – Rotation angle in radians, with positive meaning counter-clockwise.
  • method (str) – Method to use for image transformations (opencv or scipy)
Returns:

An image of the same type that has been rotated and translated.

Return type:

Image

gradients()

Return the gradient as a pair of numpy arrays.

Returns:The gradients of the image along each dimension.
Return type:tuple of numpy.ndarray of float
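
As an illustration of a per-dimension gradient pair, the sketch below uses numpy.gradient on a small array. Whether the library computes its gradients this way is an assumption; the example only shows what "one array per dimension" looks like.

```python
import numpy as np

depth = np.array([[0.0, 1.0, 2.0],
                  [0.0, 1.0, 2.0],
                  [0.0, 1.0, 2.0]])

# np.gradient returns one array per axis: rows first, then columns
grad_rows, grad_cols = np.gradient(depth)
# grad_rows is all zeros: values do not change from row to row
# grad_cols is all ones: values increase by 1 per column
```
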
ij_to_linear(i, j)

Converts row / column coordinates to linear indices.

Parameters:
  • i (numpy.ndarray of int) – A list of row coordinates.
  • j (numpy.ndarray of int) – A list of column coordinates.
Returns:

A list of linear coordinates.

Return type:

numpy.ndarray of int

linear_to_ij(linear_inds)

Converts linear indices to row and column coordinates.

Parameters:linear_inds (numpy.ndarray of int) – A list of linear coordinates.
Returns:A 2D ndarray whose first entry is the list of row indices and whose second entry is the list of column indices.
Return type:numpy.ndarray of int
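
The two conversions are inverses of each other. The sketch below assumes row-major ordering (linear = i * width + j), which the text above does not state explicitly, and uses standard NumPy helpers rather than the library's methods.

```python
import numpy as np

height, width = 3, 4
i = np.array([0, 1, 2])   # row coordinates
j = np.array([3, 0, 2])   # column coordinates

# Row-major linear index (assumed ordering): i * width + j
linear = np.ravel_multi_index((i, j), (height, width))
# linear holds 3, 4, 10

# Inverse: a 2 x N array whose first entry is rows and second is columns
ij = np.array(np.unravel_index(linear, (height, width)))
```
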
mask_by_ind(inds)

Create a new image by zeroing out data at locations not in the given indices.

Parameters:inds (numpy.ndarray of int) – A 2D ndarray whose first entry is the list of row indices and whose second entry is the list of column indices. The data at these indices will not be set to zero.
Returns:A new Image of the same type, with data not indexed by inds set to zero.
Return type:Image
mask_by_linear_ind(linear_inds)

Create a new image by zeroing out data at locations not in the given indices.

Parameters:linear_inds (numpy.ndarray of int) – A list of linear coordinates.
Returns:A new Image of the same type, with data not indexed by linear_inds set to zero.
Return type:Image
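
Both masking methods keep the indexed entries and zero everything else. A NumPy sketch of that behavior on a raw array follows; the row-major interpretation of the linear indices is an assumption.

```python
import numpy as np

data = np.arange(1, 13).reshape(3, 4)

# 2 x N index array: first entry rows, second entry columns
inds = np.array([[0, 2],
                 [1, 3]])

masked = np.zeros_like(data)
masked[inds[0], inds[1]] = data[inds[0], inds[1]]
# masked keeps data[0, 1] and data[2, 3]; every other entry is zero

# Same mask from linear indices, assuming row-major order
linear_inds = np.array([1, 11])
masked_linear = np.zeros_like(data)
masked_linear.flat[linear_inds] = data.flat[linear_inds]
```
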
is_same_shape(other_im, check_channels=False)

Checks if two images have the same height and width (and optionally channels).

Parameters:
  • other_im (Image) – The image to compare against this one.
  • check_channels (bool) – Whether or not to check equality of the channels.
Returns:

True if the images are the same shape, False otherwise.

Return type:

bool

static median_images(images)

Create a median Image from a list of Images.

Parameters:images (list of Image) – A list of Image objects.

Returns:A new Image of the same type whose data is the median of all of the images’ data.
Return type:Image
static min_images(images)

Create a min Image from a list of Images.

Parameters:images (list of Image) – A list of Image objects.

Returns:A new Image of the same type whose data is the min of all of the images’ data.
Return type:Image
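
A per-pixel median or minimum over several images can be sketched on raw arrays by stacking them along a new leading axis. This shows the idea only and assumes all images share the same shape.

```python
import numpy as np

frames = [np.array([[1.0, 5.0], [0.0, 2.0]]),
          np.array([[3.0, 4.0], [0.0, 8.0]]),
          np.array([[2.0, 9.0], [7.0, 5.0]])]

stack = np.stack(frames, axis=0)        # shape (num_images, rows, cols)
median_data = np.median(stack, axis=0)  # per-pixel median: [[2, 5], [0, 5]]
min_data = np.min(stack, axis=0)        # per-pixel minimum: [[1, 4], [0, 2]]
```
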
__getitem__(indices)

Index the image’s data array.

Parameters:indices (int or tuple of int) –
  • int - A linear index.
  • tuple - An ordered index in row, column, and (optionally) channel order.
Returns:The indexed item.
Return type:item
Raises:ValueError – If the index is poorly formatted or out of bounds.
apply(method, *args, **kwargs)

Create a new image by applying a function to this image’s data.

Parameters:
  • method (function) – A function to call on the data. This takes in a ndarray as its first argument and optionally takes other arguments. It should return a modified data ndarray.
  • args (arguments) – Additional args for method.
  • kwargs (keyword arguments) – Additional keyword arguments for method.
Returns:

A new Image of the same type with new data generated by calling method on the current image’s data.

Return type:

Image

copy()

Returns a copy of this image.

Returns:copy of this image
Return type:Image
crop(height, width, center_i=None, center_j=None)

Crop the image centered around center_i, center_j.

Parameters:
  • height (int) – The height of the desired image.
  • width (int) – The width of the desired image.
  • center_i (int) – The center height point at which to crop. If not specified, the center of the image is used.
  • center_j (int) – The center width point at which to crop. If not specified, the center of the image is used.
Returns:

A cropped Image of the same type.

Return type:

Image
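
A centered crop on a raw array might look like the sketch below. The standalone crop function is hypothetical, and its rounding and its clipping at the image border are assumptions rather than documented behavior.

```python
import numpy as np

def crop(data, height, width, center_i=None, center_j=None):
    """Return a height x width window of data centered at (center_i, center_j)."""
    if center_i is None:
        center_i = data.shape[0] // 2
    if center_j is None:
        center_j = data.shape[1] // 2
    # Clip the window to the image bounds (assumed edge handling)
    start_i = max(0, center_i - height // 2)
    start_j = max(0, center_j - width // 2)
    end_i = min(data.shape[0], start_i + height)
    end_j = min(data.shape[1], start_j + width)
    return data[start_i:end_i, start_j:end_j].copy()

data = np.arange(36).reshape(6, 6)
window = crop(data, 2, 4)   # rows 2-3, columns 1-4 of data
print(window.shape)         # (2, 4)
```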

focus(height, width, center_i=None, center_j=None)

Zero out all of the image outside of a crop box.

Parameters:
  • height (int) – The height of the desired crop box.
  • width (int) – The width of the desired crop box.
  • center_i (int) – The center height point of the crop box. If not specified, the center of the image is used.
  • center_j (int) – The center width point of the crop box. If not specified, the center of the image is used.
Returns:

A new Image of the same type and size that is zeroed out except within the crop box.

Return type:

Image
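
Unlike crop, focus keeps the image size and only blanks the outside of the box. A sketch on a raw array, with a hypothetical standalone focus function and the same assumed box placement as a centered crop:

```python
import numpy as np

def focus(data, height, width, center_i=None, center_j=None):
    """Zero everything outside a height x width box; the output keeps data's shape."""
    if center_i is None:
        center_i = data.shape[0] // 2
    if center_j is None:
        center_j = data.shape[1] // 2
    start_i = max(0, center_i - height // 2)
    start_j = max(0, center_j - width // 2)
    box = (slice(start_i, start_i + height), slice(start_j, start_j + width))
    focused = np.zeros_like(data)
    focused[box] = data[box]   # copy only the contents of the box
    return focused

data = np.ones((6, 6))
focused = focus(data, 2, 4)    # same 6 x 6 shape, 8 nonzero pixels
```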

center_nonzero()

Recenters the image on the mean of the coordinates of nonzero pixels.

Returns:A new Image of the same type and size that is re-centered at the mean location of the non-zero pixels.
Return type:Image
nonzero_pixels()

Return an array of the nonzero pixels.

Returns:Nx2 array of the nonzero pixels
Return type:numpy.ndarray
zero_pixels()

Return an array of the zero pixels.

Returns:Nx2 array of the zero pixels
Return type:numpy.ndarray
finite_pixels()

Return an array of the finite pixels.

Returns:Nx2 array of the finite pixels
Return type:numpy.ndarray
nonzero_data()

Returns the values in the image at the nonzero pixels

Returns:NxC array of the nonzero data
Return type:numpy.ndarray
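
The Nx2 and NxC shapes returned by these methods can be reproduced on a raw array with numpy.argwhere and boolean indexing. Treating a multi-channel pixel as nonzero when any channel is nonzero is an assumption of this sketch.

```python
import numpy as np

data = np.zeros((3, 4, 3), dtype=np.uint8)
data[0, 1] = [255, 0, 0]
data[2, 3] = [10, 20, 30]

# A pixel counts as nonzero here if any of its channels is nonzero (assumption)
nonzero_mask = np.any(data != 0, axis=2)
nonzero_px = np.argwhere(nonzero_mask)   # N x 2 array of (row, col)
zero_px = np.argwhere(~nonzero_mask)     # pixels whose channels are all zero
nonzero_vals = data[nonzero_mask]        # N x C array of channel values
```
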
replace_zeros(val, zero_thresh=0.0)

Replaces all zeros in the image with a specified value

Parameters:val (image dtype) – The value with which to replace zeros.
save(filename)

Writes the image to a file.

Parameters:filename (str) – The file to save the image to. Must be one of .png, .jpg, .npy, or .npz.
Raises:ValueError – If an unsupported file type is specified.
savefig(output_path, title, dpi=400, format='png', cmap=None)

Write the image to a file using pyplot.

Parameters:
  • output_path (str) – The directory in which to place the file.
  • title (str) – The title of the file in which to save the image.
  • dpi (int) – The resolution in dots per inch.
  • format (str) – The file format to save. Available options include .png, .pdf, .ps, .eps, and .svg.
  • cmap (Colormap, optional) – A Colormap object for pyplot.
static load_data(filename)

Loads a data matrix from a given file.

Parameters:filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
Returns:The data array read from the file.
Return type:numpy.ndarray
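
Of the supported extensions, .npy stores the array exactly, including its dtype, whereas .jpg is a lossy format. The round trip below uses NumPy directly to show what a lossless save and load looks like; the file name is arbitrary and no library call is involved.

```python
import os
import tempfile

import numpy as np

data = np.random.rand(4, 5).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "depth.npy")
    np.save(path, data)       # lossless write
    loaded = np.load(path)    # same shape, dtype, and values as data
```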

ColorImage

class perception.ColorImage(data, frame='unspecified')

Bases: perception.image.Image

An RGB color image.

__init__(data, frame='unspecified')

Create a color image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (i.e. R,G,B values). Alternatively, the image may have a single channel, in which case it is interpreted as greyscale.
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

r_data

numpy.ndarray of uint8 – The red-channel data.

g_data

numpy.ndarray of uint8 – The green-channel data.

b_data

numpy.ndarray of uint8 – The blue-channel data.

swap_channels(channel_swap)

Swaps the two channels specified in the tuple.

Parameters:channel_swap (tuple of int) – the two channels to swap
Returns:color image with the two channels swapped
Return type:ColorImage
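
Swapping two channels is how an RGB array becomes BGR, the ordering OpenCV uses. The sketch below works on a raw array; the standalone swap_channels function is hypothetical and only mirrors the documented signature.

```python
import numpy as np

def swap_channels(data, channel_swap):
    """Return a copy of data with the two given channel indices exchanged."""
    a, b = channel_swap
    swapped = data.copy()
    swapped[:, :, a] = data[:, :, b]
    swapped[:, :, b] = data[:, :, a]
    return swapped

rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[:, :, 0] = 255                 # pure red in RGB order
bgr = swap_channels(rgb, (0, 2))   # red and blue exchanged
```
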
resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

ColorImage

find_chessboard(sx=6, sy=9)

Finds the corners of an sx X sy chessboard in the image.

Parameters:
  • sx (int) – Number of chessboard corners in x-direction.
  • sy (int) – Number of chessboard corners in y-direction.
Returns:

A list containing the 2D points of the corners of the detected chessboard, or None if no chessboard found.

Return type:

list of numpy.ndarray

mask_binary(binary_im)

Create a new image by zeroing out data at locations where binary_im == 0.0.

Parameters:binary_im (BinaryImage) – A BinaryImage of the same size as this image, with pixel values of either zero or one. Wherever binary_im has zero pixels, the corresponding pixels of the new image are zeroed out.
Returns:A new Image of the same type, masked by the given binary image.
Return type:Image
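
On raw arrays, masking a color image by a binary image amounts to a boolean-indexed assignment. A sketch, assuming the mask is a 2D array in which zero marks the pixels to clear:

```python
import numpy as np

color = np.full((2, 2, 3), 100, dtype=np.uint8)
binary = np.array([[255, 0],
                   [0, 255]], dtype=np.uint8)

masked = color.copy()
masked[binary == 0] = 0   # zero every channel where the mask is zero
# masked[0, 0] stays [100, 100, 100]; masked[0, 1] becomes [0, 0, 0]
```
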
foreground_mask(tolerance, ignore_black=True, use_hsv=False, scale=8, bgmodel=None)

Creates a binary image mask for the foreground of an image against a uniformly colored background. The background is assumed to be the mode value of the histogram for each of the color channels.

Parameters:
  • tolerance (int) – A +/- level from the detected mean background color. Pixels within this range will be classified as background pixels and masked out.
  • ignore_black (bool) – If True, the zero pixels will be ignored when computing the background model.
  • use_hsv (bool) – If True, image will be converted to HSV for background model generation.
  • scale (int) – Size of background histogram bins – there will be 255/scale bins in the color histogram for each channel.
  • bgmodel (list of int) – A list containing the red, green, and blue channel modes of the background. If this is None, a background model will be generated using the other parameters.
Returns:

A binary image that masks out the background from the current ColorImage.

Return type:

BinaryImage

background_model(ignore_black=True, use_hsv=False, scale=8)

Creates a background model for the given image. The background color is given by the modes of each channel’s histogram.

Parameters:
  • ignore_black (bool) – If True, the zero pixels will be ignored when computing the background model.
  • use_hsv (bool) – If True, image will be converted to HSV for background model generation.
  • scale (int) – Size of background histogram bins – there will be 255/scale bins in the color histogram for each channel.
Returns:

A list containing the red, green, and blue channel modes of the background.

Return type:

list of int
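
The histogram-mode idea behind background_model and foreground_mask can be sketched in NumPy as below. This is an illustration under stated assumptions, not the library's algorithm: the standalone functions are hypothetical, the mode is taken as the center of the fullest bin, use_hsv is not modeled, and a pixel is called foreground when any channel differs from the model by more than tolerance.

```python
import numpy as np

def background_model(data, ignore_black=True, scale=8):
    """Estimate the background color as the per-channel histogram mode."""
    pixels = data.reshape(-1, data.shape[2])
    if ignore_black:
        pixels = pixels[np.any(pixels != 0, axis=1)]
    num_bins = 255 // scale
    modes = []
    for c in range(pixels.shape[1]):
        counts, edges = np.histogram(pixels[:, c], bins=num_bins, range=(0, 255))
        k = np.argmax(counts)
        modes.append(int((edges[k] + edges[k + 1]) / 2))  # center of fullest bin
    return modes

def foreground_mask(data, tolerance, bgmodel):
    """Return 255 where any channel differs from bgmodel by more than tolerance."""
    diff = np.abs(data.astype(np.int32) - np.array(bgmodel))
    return np.where(np.any(diff > tolerance, axis=2), 255, 0).astype(np.uint8)

image = np.full((20, 20, 3), 200, dtype=np.uint8)   # uniform gray background
image[5:10, 5:10] = [30, 60, 90]                    # small foreground object
bg = background_model(image)
mask = foreground_mask(image, tolerance=20, bgmodel=bg)
```

Because the model is a bin center, it can differ from the true background value by up to about half a bin width, which is one reason tolerance should not be set too tightly.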

draw_box(box)

Draw a white box on the image.

Parameters:box (autolab_core.Box) – A 2D box to draw in the image.

Returns:A new image that is the same as the current one, but with the white box drawn in.
Return type:ColorImage
nonzero_hsv_data()

Computes non zero hsv values.

Returns:array of the hsv values for the image
Return type:numpy.ndarray
segment_kmeans(rgb_weight, num_clusters, hue_weight=0.0)

Segment a color image using KMeans based on spatial and color distances. Black pixels will automatically be assigned to their own ‘background’ cluster.

Parameters:
  • rgb_weight (float) – weighting of RGB distance relative to spatial and hue distance
  • num_clusters (int) – number of clusters to use
  • hue_weight (float) – weighting of hue from hsv relative to spatial and RGB distance
Returns:

image containing the segment labels

Return type:

SegmentationImage

inpaint(win_size=3, rescale_factor=1.0)

Fills in the zero pixels in the image.

Parameters:
  • win_size (int) – size of window to use for inpainting
  • rescale_factor (float) – amount to rescale the image for inpainting, smaller numbers increase speed
Returns:

color image with zero pixels filled in

Return type:

ColorImage

to_binary(threshold=0.0)

Converts the color image to binary.

Returns:Binary image corresponding to the nonzero pixels of the original image
Return type:BinaryImage
to_grayscale()

Converts the color image to grayscale using OpenCV.

Returns:Grayscale image corresponding to original color image.
Return type:GrayscaleImage
static open(filename, frame='unspecified')

Creates a ColorImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new color image.

Return type:

ColorImage

DepthImage

class perception.DepthImage(data, frame='unspecified')

Bases: perception.image.Image

A depth image in which individual pixels have a single floating-point depth channel.

__init__(data, frame='unspecified')

Create a depth image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (depths as floating point numbers).
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

DepthImage

threshold(front_thresh=0.0, rear_thresh=100.0)

Creates a new DepthImage by setting all depths less than front_thresh or greater than rear_thresh to 0.

Parameters:
  • front_thresh (float) – The lower-bound threshold.
  • rear_thresh (float) – The upper bound threshold.
Returns:

A new DepthImage created from the thresholding operation.

Return type:

DepthImage
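
On a raw depth array, the thresholding described above reduces to one boolean mask. A minimal sketch with illustrative values:

```python
import numpy as np

depth = np.array([[0.2, 0.6],
                  [1.5, 3.0]])
front_thresh, rear_thresh = 0.5, 2.0

thresholded = depth.copy()
# Zero depths that are too close or too far
thresholded[(depth < front_thresh) | (depth > rear_thresh)] = 0.0
# thresholded is [[0.0, 0.6], [1.5, 0.0]]
```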

threshold_gradients(grad_thresh)

Creates a new DepthImage by zeroing out all depths where the magnitude of the gradient at that point is greater than grad_thresh.

Parameters:grad_thresh (float) – A threshold for the gradient magnitude.
Returns:A new DepthImage created from the thresholding operation.
Return type:DepthImage
threshold_gradients_pctile(thresh_pctile, min_mag=0.0)

Creates a new DepthImage by zeroing out all depths where the magnitude of the gradient at that point is greater than some percentile of all gradients.

Parameters:
  • thresh_pctile (float) – percentile to threshold all gradients above
  • min_mag (float) – minimum magnitude of the gradient
Returns:

A new DepthImage created from the thresholding operation.

Return type:

DepthImage
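
Both gradient-based thresholds zero out depths at sharp discontinuities; they differ only in how the cutoff is chosen. The sketch below assumes the gradient magnitude is the Euclidean norm of the per-axis gradients from numpy.gradient, which may not match the library, and it does not model min_mag.

```python
import numpy as np

depth = np.array([[1.0, 1.0, 1.0, 5.0],
                  [1.0, 1.0, 1.0, 5.0],
                  [1.0, 1.0, 1.0, 5.0]])

grad_rows, grad_cols = np.gradient(depth)
grad_mag = np.sqrt(grad_rows ** 2 + grad_cols ** 2)   # each row is [0, 0, 2, 4]

# Fixed cutoff, in the spirit of threshold_gradients
fixed = depth.copy()
fixed[grad_mag > 1.0] = 0.0          # zeros the last two columns

# Percentile cutoff, in the spirit of threshold_gradients_pctile
cutoff = np.percentile(grad_mag, 60.0)
by_pctile = depth.copy()
by_pctile[grad_mag > cutoff] = 0.0   # zeros only the last column
```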

inpaint(rescale_factor=1.0)

Fills in the zero pixels in the image.

Parameters:rescale_factor (float) – amount to rescale the image for inpainting, smaller numbers increase speed
Returns:depth image with zero pixels filled in
Return type:DepthImage
mask_binary(binary_im)

Create a new image by zeroing out data at locations where binary_im == 0.0.

Parameters:binary_im (BinaryImage) – A BinaryImage of the same size as this image, with pixel values of either zero or one. Wherever this image has zero pixels, we’ll zero out the pixels of the new image.
Returns:A new Image of the same type, masked by the given binary image.
Return type:Image
pixels_farther_than(depth_im)

Returns the pixels that are farther away than those in the corresponding depth image.

Parameters:depth_im (DepthImage) – depth image to compare against
Returns:the pixels
Return type:numpy.ndarray
combine_with(depth_im)

Replaces all zeros in the source depth image with the value of a different depth image

Parameters:depth_im (DepthImage) – depth image to combine with
Returns:the combined depth image
Return type:DepthImage
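
Filling the zeros of one depth array from another, as the description states, can be written with numpy.where. This sketch follows the one-line description only; any extra handling the library applies where both images are nonzero is not modeled.

```python
import numpy as np

source = np.array([[0.0, 0.7],
                   [0.0, 1.1]])
other = np.array([[0.9, 0.5],
                  [0.0, 1.3]])

# Take the other image's value wherever the source is zero
combined = np.where(source == 0, other, source)
# combined is [[0.9, 0.7], [0.0, 1.1]]
```
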
to_binary(threshold=0.0)

Creates a BinaryImage from the depth image. Points where the depth is greater than threshold are converted to ones, and all other points are zeros.

Parameters:threshold (float) – The depth threshold.
Returns:A BinaryImage where all 1 points had a depth greater than threshold in the DepthImage.
Return type:BinaryImage
to_color(normalize=False)

Convert to a color image.

Parameters:normalize (bool) – whether or not to normalize by the maximum depth
Returns:color image corresponding to the depth image
Return type:ColorImage
to_float()

Converts to 32-bit data.

Returns:depth image with 32 bit float data
Return type:DepthImage
point_normal_cloud(camera_intr)

Computes a PointNormalCloud from the depth image.

Parameters:camera_intr (CameraIntrinsics) – The camera parameters on which this depth image was taken.
Returns:A PointNormalCloud created from the depth image.
Return type:autolab_core.PointNormalCloud
static open(filename, frame='unspecified')

Creates a DepthImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new depth image.

Return type:

DepthImage

IrImage

class perception.IrImage(data, frame='unspecified')

Bases: perception.image.Image

An IR image in which individual pixels have a single uint16 channel.

__init__(data, frame='unspecified')

Create an IR image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (IR values as uint16’s).
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

IrImage

static open(filename, frame='unspecified')

Creates an IrImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new IR image.

Return type:

IrImage

GrayscaleImage

class perception.GrayscaleImage(data, frame='unspecified')

Bases: perception.image.Image

A grayscale image in which individual pixels have a single uint8 channel.

__init__(data, frame='unspecified')

Create a grayscale image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (grayscale values as uint8’s).
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

GrayscaleImage

static open(filename, frame='unspecified')

Creates a GrayscaleImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new grayscale image.

Return type:

GrayscaleImage

BinaryImage

class perception.BinaryImage(data, frame='unspecified', threshold=128)

Bases: perception.image.Image

A binary image in which individual pixels are either black or white (0 or 255).

__init__(data, frame='unspecified', threshold=128)

Create a BinaryImage from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (only one channel, all uint8). The data array will be thresholded and will end up only containing elements that are 255 or 0.
  • threshold (int) – A threshold value. Any value in the data array greater than threshold will be set to 255, and all others will be set to 0.
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.
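
The thresholding rule for the constructor (strictly greater than threshold becomes 255, everything else 0) can be checked on a raw array. The helper name binarize is hypothetical.

```python
import numpy as np

def binarize(data, threshold=128):
    """Map values greater than threshold to 255 and all others to 0."""
    return np.where(data > threshold, 255, 0).astype(np.uint8)

gray = np.array([[0, 128],
                 [129, 255]], dtype=np.uint8)
binary = binarize(gray)
# binary is [[0, 0], [255, 255]]: a value equal to the threshold maps to 0
```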

resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

BinaryImage

mask_binary(binary_im)

Takes AND operation with other binary image.

Parameters:binary_im (BinaryImage) – binary image for and operation
Returns:AND of this binary image and other image
Return type:BinaryImage
prune_contours(area_thresh=1000.0, dist_thresh=20, preserve_topology=True)

Removes all white connected components with area less than area_thresh.

Parameters:
  • area_thresh (float) – The minimum area for which a white connected component will not be zeroed out.
  • dist_thresh (int) – If a connected component is within dist_thresh of the top of the image, it will not be pruned out, regardless of its area.
Returns:The new pruned binary image.
Return type:BinaryImage
find_contours(min_area=0.0, max_area=inf)

Returns a list of connected components with an area between min_area and max_area.

Parameters:
  • min_area (float) – The minimum area for a contour.
  • max_area (float) – The maximum area for a contour.
Returns:A list of resulting contours
Return type:list of Contour
contour_mask(contour)

Generates a binary image with only the given contour filled in.

boundary_map()

Computes the boundary pixels in the image and sets them to nonzero values.

Returns:binary image with nonzeros on the boundary of the original image
Return type:BinaryImage
closest_nonzero_pixel(pixel, direction, w=13, t=0.5)

Starting at pixel, moves pixel by direction * t until there is a non-zero pixel within a radius w of pixel. Then, returns pixel.

Parameters:
  • pixel (numpy.ndarray of float) – The initial pixel location at which to start.
  • direction (numpy.ndarray of float) – The 2D direction vector in which to move pixel.
  • w (int) – A circular radius in which to check for non-zero pixels. As soon as the current pixel has some non-zero pixel within a radius w of it, this function returns the current pixel location.
  • t (float) – The step size with which to move pixel along direction.
Returns:

The first pixel location along the direction vector at which there exists some non-zero pixel within a radius w.

Return type:

numpy.ndarray of float
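
The stepping search described above can be sketched as follows. This standalone function is hypothetical: it assumes pixel and direction are given in (row, column) order, and it returns None when the walk leaves the image, a case the description does not cover.

```python
import numpy as np

def closest_nonzero_pixel(data, pixel, direction, w=13, t=0.5):
    """Step from pixel along direction until a nonzero pixel lies within radius w."""
    pixel = np.asarray(pixel, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if not np.any(direction):
        raise ValueError("direction must be nonzero")
    nonzero = np.argwhere(data != 0).astype(float)   # N x 2 array of (row, col)
    if len(nonzero) == 0:
        return None   # nothing to find (choice made for this sketch)
    height, width = data.shape
    while 0 <= pixel[0] < height and 0 <= pixel[1] < width:
        dists = np.linalg.norm(nonzero - pixel, axis=1)
        if np.any(dists <= w):
            return pixel
        pixel = pixel + t * direction
    return None       # left the image without a hit (sketch-only behavior)

data = np.zeros((50, 50), dtype=np.uint8)
data[25, 40] = 255
found = closest_nonzero_pixel(data, (25, 5), (0, 1), w=3, t=0.5)
# found is [25., 37.]: the first step at which (25, 40) is within 3 pixels
```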

add_frame(left_boundary, right_boundary, upper_boundary, lower_boundary)

Adds a frame to the image, i.e. turns the boundaries white.

Parameters:
  • left_boundary (int) – the leftmost boundary of the frame
  • right_boundary (int) – the rightmost boundary of the frame (must be greater than left_boundary)
  • upper_boundary (int) – the upper boundary of the frame
  • lower_boundary (int) – the lower boundary of the frame (must be greater than upper_boundary)
Returns:

binary image with white (255) on the boundaries

Return type:

BinaryImage

most_free_pixel()

Find the black pixel with the largest distance from the white pixels.

Returns:2-vector containing the most free pixel
Return type:numpy.ndarray
diff_with_target(binary_im)

Creates a color image to visualize the overlap between two images. Nonzero pixels that match in both images are green. Nonzero pixels of this image that aren’t in the other image are yellow. Nonzero pixels of the other image that aren’t in this image are red.

Parameters:binary_im (BinaryImage) – binary image to take the difference with
Returns:color image to visualize the image difference
Return type:ColorImage
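
The color coding can be reproduced on raw arrays with three boolean masks. A sketch, assuming RGB channel order in the output:

```python
import numpy as np

this_im = np.array([[255, 255],
                    [0, 0]], dtype=np.uint8)
other_im = np.array([[255, 0],
                     [255, 0]], dtype=np.uint8)

a = this_im > 0
b = other_im > 0

diff = np.zeros(this_im.shape + (3,), dtype=np.uint8)
diff[a & b] = [0, 255, 0]       # nonzero in both: green
diff[a & ~b] = [255, 255, 0]    # only in this image: yellow
diff[~a & b] = [255, 0, 0]      # only in the other image: red
```
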
num_adjacent(i, j)

Counts the number of adjacent nonzero pixels to a given pixel.

Parameters:
  • i (int) – row index of query pixel
  • j (int) – col index of query pixel
Returns:

number of adjacent nonzero pixels

Return type:

int

to_sdf()

Converts the 2D image to a 2D signed distance field.

Returns:2D float array of the signed distance field
Return type:numpy.ndarray
to_color()

Creates a ColorImage from the binary image.

Returns:The newly-created color image.
Return type:ColorImage
static open(filename, frame='unspecified')

Creates a BinaryImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new binary image.

Return type:

BinaryImage

SegmentationImage

class perception.SegmentationImage(data, frame='unspecified')

Bases: perception.image.Image

An image containing integer-valued segment labels.

__init__(data, frame='unspecified')

Create a Segmentation image from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (only one channel, all uint8). The integer-valued data should correspond to segment labels.
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

border_pixels(grad_sigma=0.5, grad_lower_thresh=0.1, grad_upper_thresh=1.0)

Returns the pixels on the boundary between all segments, excluding the zero segment.

Parameters:
  • grad_sigma (float) – standard deviation used for gaussian gradient filter
  • grad_lower_thresh (float) – lower threshold on the gradient magnitude used to determine the boundary pixels
  • grad_upper_thresh (float) – upper threshold on the gradient magnitude used to determine the boundary pixels
Returns:

Nx2 array of pixels on the boundary

Return type:

numpy.ndarray

segment_mask(segnum)

Returns a binary image of just the segment corresponding to the given number.

Parameters:segnum (int) – the number of the segment to generate a mask for
Returns:binary image data
Return type:BinaryImage
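
Extracting one segment from a label array is a single comparison. A sketch on raw data, using 255 for the selected segment to match the BinaryImage convention:

```python
import numpy as np

labels = np.array([[0, 1, 1],
                   [2, 2, 1],
                   [0, 2, 0]], dtype=np.uint8)
segnum = 2

# 255 where the label equals segnum, 0 elsewhere
mask = np.where(labels == segnum, 255, 0).astype(np.uint8)
```
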
resize(size, interp='nearest')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
static open(filename, frame='unspecified')

Opens a segmentation image

PointCloudImage

class perception.PointCloudImage(data, frame='unspecified')

Bases: perception.image.Image

A point cloud image in which individual pixels have three float channels.

__init__(data, frame='unspecified')

Create a PointCloudImage from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (three floats).
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

resize(size, interp='bilinear')

Resize the image.

Parameters:
  • size (int, float, or tuple) –
    • int - Percentage of current size.
    • float - Fraction of current size.
    • tuple - Size of the output image.
  • interp (str, optional) – Interpolation to use for re-sizing (‘nearest’, ‘lanczos’, ‘bilinear’, ‘bicubic’, or ‘cubic’)
Returns:

The resized image.

Return type:

PointCloudImage

to_point_cloud()

Convert the image to a PointCloud object.

Returns:The corresponding PointCloud.
Return type:autolab_core.PointCloud
normal_cloud_im()

Generate a NormalCloudImage from the PointCloudImage.

Returns:The corresponding NormalCloudImage.
Return type:NormalCloudImage
static open(filename, frame='unspecified')

Creates a PointCloudImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new PointCloudImage.

Return type:

PointCloudImage

NormalCloudImage

class perception.NormalCloudImage(data, frame='unspecified')

Bases: perception.image.Image

A normal cloud image in which individual pixels have three float channels.

__init__(data, frame='unspecified')

Create a NormalCloudImage from an array of data.

Parameters:
  • data (numpy.ndarray) – An array of data with which to make the image. The first dimension of the data should index rows, the second columns, and the third individual pixel elements (three floats).
  • frame (str) – A string representing the frame of reference in which this image lies.
Raises:

ValueError – If the data is not a properly-formatted ndarray or frame is not a string.

resize(size, interp='bilinear')

This method is not implemented for NormalCloudImage.

Raises:NotImplementedError
to_normal_cloud()

Convert the image to a NormalCloud object.

Returns:The corresponding NormalCloud.
Return type:autolab_core.NormalCloud
static open(filename, frame='unspecified')

Creates a NormalCloudImage from a file.

Parameters:
  • filename (str) – The file to load the data from. Must be one of .png, .jpg, .npy, or .npz.
  • frame (str) – A string representing the frame of reference in which the new image lies.
Returns:

The new NormalCloudImage.

Return type:

NormalCloudImage

RenderMode

class perception.RenderMode

Bases: object

Supported rendering modes.

ObjectRender

class perception.ObjectRender(image, T_camera_world=RigidTransform(rotation=[[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]], translation=[ 0. 0. 0.], from_frame=camera, to_frame=table), obj_key=None, stable_pose=None)

Bases: object

Class to encapsulate images of an object rendered from a virtual camera.

Note

In this class, the table’s frame of reference is the ‘world’ frame for the renderer.

__init__(image, T_camera_world=RigidTransform(rotation=[[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]], translation=[ 0. 0. 0.], from_frame=camera, to_frame=table), obj_key=None, stable_pose=None)

Create an ObjectRender.

Parameters:
  • image (Image) – The image to be encapsulated.
  • T_camera_world (autolab_core.RigidTransform) – A rigid transform from camera to world coordinates (positions the camera in the world). TODO – this should be renamed.
  • obj_key (str, optional) – A string identifier for the object being rendered.
  • stable_pose (meshpy.StablePose) – The object’s stable pose.
T_obj_camera

Returns the transformation from camera to object when the object is in the given stable pose.

Returns:The desired transform.
Return type:autolab_core.RigidTransform