learning

Classes to facilitate machine learning with Dex-Net, including wrappers for GQ-CNN training datasets and Multi-Armed Bandit methods for grasp robustness evaluation.

class dexnet.learning.Model

A predictor of some value of the input data

predict(x)

Predict the function of the data at some point x. For probabilistic models this returns the mean prediction

snapshot()

Returns a concise description of the current model for debugging and logging purposes

update()

Update the model based on current data

class dexnet.learning.DiscreteModel

Maintains a prediction over a discrete set of points

max_prediction()

Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean predicted value

num_vars()

Returns the number of variables in the model

sample()

Sample discrete predictions from the model. For deterministic models, returns the deterministic prediction

class dexnet.learning.Snapshot(best_pred_ind, num_obs)

Abstract class for storing the current state of a model

class dexnet.learning.BernoulliSnapshot(best_pred_ind, means, num_obs)

Stores the current state of a Bernoulli model

class dexnet.learning.BetaBernoulliSnapshot(best_pred_ind, alphas, betas, num_obs)

Stores the current state of a Beta Bernoulli model

class dexnet.learning.GaussianSnapshot(best_pred_ind, means, variances, sample_vars, num_obs)

Stores the current state of a Gaussian model

class dexnet.learning.BernoulliModel(num_vars, mean_prior=0.5)

Standard Bernoulli model for predictions over a discrete set of candidates

num_vars

int – the number of variables to track

prior_means

float – prior on the mean probability of success for candidates

static bernoulli_mean(p)

Mean of the Bernoulli distribution with success probability p

static bernoulli_variance(p, n)

Uses Wald interval for variance prediction
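
The textbook Wald approximation treats the estimate of a success probability p from n observations as normally distributed with variance p(1 - p)/n. The sketch below shows that formula only; the function names and the handling of n = 0 are assumptions for illustration, and the value returned by bernoulli_variance in Dex-Net may be scaled differently.

```python
import math


def wald_variance(p, n):
    """Normal-approximation variance of a Bernoulli mean estimate.

    p: estimated probability of success, n: number of observations.
    Returning infinity when n == 0 is an illustrative choice.
    """
    if n <= 0:
        return float("inf")
    return p * (1.0 - p) / n


def wald_interval(p, n, z=1.96):
    """Symmetric interval p +/- z * sqrt(p(1 - p)/n); z=1.96 is roughly 95%."""
    half_width = z * math.sqrt(wald_variance(p, n))
    return p - half_width, p + half_width
```

The interval shrinks at a rate of 1/sqrt(n), so quadrupling the number of samples per candidate halves its width.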

max_prediction()

Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean probability of success

predict(index)

Predicts the probability of success for the variable indexed by index

sample()

Samples probabilities of success from the given values

snapshot()

Return copies of the model params

update(index, value)

Update the model based on an observation of value at index index

class dexnet.learning.BetaBernoulliModel(num_vars, alpha_prior=1.0, beta_prior=1.0)

Beta-Bernoulli model for predictions over a discrete set of candidates

num_vars

int – the number of variables to track

alpha_prior

float – prior alpha parameter of the Beta distribution

beta_prior

float – prior beta parameter of the Beta distribution

static beta_mean(alpha, beta)

Mean of the beta distribution with params alpha and beta

static beta_variance(alpha, beta)

Variance of the beta distribution with params alpha and beta
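
For reference, a Beta(alpha, beta) distribution has mean alpha / (alpha + beta) and variance alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)). A minimal sketch of both helpers under those standard formulas; the function bodies are assumptions and are not copied from Dex-Net.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return float(alpha) / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = float(alpha + beta)
    return (alpha * beta) / (total ** 2 * (total + 1.0))
```

With the default uniform prior (alpha = beta = 1) the mean is 0.5 and the variance is 1/12.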

max_prediction()

Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean probability of success

predict(index)

Predicts the probability of success for the variable indexed by index

sample(vis=False, stop=False)

Samples probabilities of success from the given values

static sample_variance(alpha, beta)

Sample variance of the beta distribution with params alpha and beta

snapshot()

Return copies of the model params

update(index, value)

Update the model based on an observation of value at index index
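
The conjugate update behind a Beta-Bernoulli model is short enough to sketch. The class below is a hypothetical stand-in (TinyBetaBernoulli is not part of Dex-Net) and assumes that value is a success indicator in [0, 1]: a success adds to alpha and a failure adds to beta for the observed index only.

```python
class TinyBetaBernoulli:
    """Hypothetical stand-in for a Beta-Bernoulli model over num_vars candidates."""

    def __init__(self, num_vars, alpha_prior=1.0, beta_prior=1.0):
        self.alphas = [float(alpha_prior)] * num_vars
        self.betas = [float(beta_prior)] * num_vars

    def update(self, index, value):
        # Conjugate update: success mass goes to alpha, failure mass to beta.
        if not 0.0 <= value <= 1.0:
            raise ValueError("value must lie in [0, 1]")
        self.alphas[index] += value
        self.betas[index] += 1.0 - value

    def predict(self, index):
        # Posterior mean probability of success for one candidate.
        a, b = self.alphas[index], self.betas[index]
        return a / (a + b)


model = TinyBetaBernoulli(num_vars=3)
for outcome in (1, 1, 0):
    model.update(0, outcome)
print(model.predict(0))  # posterior mean for candidate 0
```

Candidates that were never observed keep the prior mean, which is why the choice of alpha_prior and beta_prior matters early in a run.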

class dexnet.learning.GaussianModel(num_vars)

Gaussian model for predictions over a discrete set of candidates.

num_vars

int – the number of variables to track

max_prediction()

Returns the index, mean, and variance of the variable(s) with the maximal predicted value.

predict(index)

Predict the value of the index’th variable.

Parameters: index (int) – the variable to find the predicted value for

sample(stop=False)

Sample discrete predictions from the model. Mean follows a t-distribution

snapshot()

Returns a concise description of the current model for debugging and logging purposes.

update(index, value)

Update the model based on current data.

Parameters:
  • index (int) – the index of the variable that was evaluated
  • value (float) – the value of the variable

variances

Confidence bounds on the mean

class dexnet.learning.CorrelatedBetaBernoulliModel(candidates, nn, kernel, tolerance=0.01, alpha_prior=1.0, beta_prior=1.0, p=0.5)

Correlated Beta-Bernoulli model for predictions over a discrete set of candidates.

candidates

list – the objects to track

nn

NearestNeighbor – nearest neighbor structure to use for neighborhood lookups

kernel

Kernel – kernel instance to measure similarities

tolerance

float – for computing radius of neighborhood, between 0 and 1

alpha_prior

float – prior alpha parameter of the Beta distribution

beta_prior

float – prior beta parameter of the Beta distribution

kernel_matrix

Create the full kernel matrix for debugging purposes

lcb_prediction(p=0.95)

Return the index with the highest lower confidence bound

snapshot()

Return copies of the model params

update(index, value)

Update the model based on current data

Parameters:
  • index (int) – the index of the variable that was evaluated
  • value (float) – the value of the variable

class dexnet.learning.TerminationCondition

Returns true when a condition is satisfied. Used for supplying different termination conditions to optimization algorithms

class dexnet.learning.MaxIterTerminationCondition(max_iters)

Terminate based on reaching a maximum number of iterations.

max_iters

int – the maximum number of allowed iterations

class dexnet.learning.ProgressTerminationCondition(eps)

Terminate based on lack of progress.

eps

float – the minimum admissible progress that must be made on each iteration to continue

class dexnet.learning.ConfidenceTerminationCondition(eps)

Terminate based on model confidence.

eps

float – the amount of confidence in the predicted objective value that the model must have to terminate

class dexnet.learning.OrTerminationCondition(term_conditions)

Terminate based on the OR of several termination conditions

term_conditions

list of TerminationCondition – termination conditions that are ORed to get the final termination results

class dexnet.learning.AndTerminationCondition(term_conditions)

Terminate based on the AND of several termination conditions

term_conditions

list of TerminationCondition – termination conditions that are ANDed to get the final termination results
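
Composite conditions are easiest to read as callables that delegate to their children. The sketch below is an assumption about shape only: the class names and the (k, cur_val, prev_val) call signature are hypothetical and may not match how Dex-Net invokes a TerminationCondition.

```python
class MaxIter:
    """Hypothetical condition: true once iteration k reaches max_iters."""

    def __init__(self, max_iters):
        self.max_iters = max_iters

    def __call__(self, k, cur_val, prev_val):
        return k >= self.max_iters


class Progress:
    """Hypothetical condition: true when the objective changed by less than eps."""

    def __init__(self, eps):
        self.eps = eps

    def __call__(self, k, cur_val, prev_val):
        return abs(cur_val - prev_val) < self.eps


class AnyOf:
    """OR composition: stop as soon as one child condition is satisfied."""

    def __init__(self, term_conditions):
        self.term_conditions = term_conditions

    def __call__(self, k, cur_val, prev_val):
        return any(t(k, cur_val, prev_val) for t in self.term_conditions)


class AllOf:
    """AND composition: stop only when every child condition is satisfied."""

    def __init__(self, term_conditions):
        self.term_conditions = term_conditions

    def __call__(self, k, cur_val, prev_val):
        return all(t(k, cur_val, prev_val) for t in self.term_conditions)


stop_either = AnyOf([MaxIter(100), Progress(1e-3)])
stop_both = AllOf([MaxIter(100), Progress(1e-3)])
```

An OR of an iteration cap and a progress test is the usual safeguard: the cap bounds the run time even if the progress test never fires.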

class dexnet.learning.ThompsonSelectionPolicy(model=None)

Chooses the next point using the Thompson sampling selection policy

choose_next(stop=False)

Returns the index of the maximal random sample, breaking ties uniformly at random
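
The selection rule described above can be sketched with the standard library alone. This assumes a Beta posterior per candidate; the function name, its arguments, and the use of random.betavariate are illustrative and do not reproduce the Dex-Net policy class.

```python
import random


def thompson_choose_next(alphas, betas, rng=random):
    """Draw one Beta sample per candidate and return the index of the largest.

    alphas, betas: per-candidate posterior parameters (all > 0).
    Ties between equal samples are broken uniformly at random.
    """
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    best = max(samples)
    tied = [i for i, s in enumerate(samples) if s == best]
    return rng.choice(tied)
```

Because each call draws fresh samples, candidates with wide posteriors are still selected occasionally, which is how the policy balances exploration against exploitation.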

class dexnet.learning.BetaBernoulliGittinsIndex98Policy(model=None)

Chooses the next point using the Beta-Bernoulli Gittins index policy with gamma = 0.98

choose_next()

Returns the index of the maximal random sample, breaking ties uniformly at random

class dexnet.learning.BetaBernoulliBayesUCBPolicy(horizon=1000, c=6, model=None)

Chooses the next point using the Bayes UCB selection policy

choose_next(stop=False)

Returns the index of the maximal random sample, breaking ties uniformly at random

class dexnet.learning.Objective

Acts as a function that returns a numeric value for classes of input data, with checks for valid input.

check_valid_input(x)

Return whether or not a point is valid for the objective.

Parameters: x (object) – point at which to evaluate the objective

evaluate(x)

Evaluates a function to be maximized at some point x.

Parameters: x (object) – point at which to evaluate the objective

class dexnet.learning.DifferentiableObjective

Objectives that are at least twice differentiable.

gradient(x)

Evaluate the gradient at x.

Parameters: x (object) – point at which to evaluate the objective

hessian(x)

Evaluate the hessian at x.

Parameters: x (object) – point at which to evaluate the objective

class dexnet.learning.MaximizationObjective(obj)

Wrapper for maximization of some supplied objective function. Included mainly for symmetry with MinimizationObjective.

obj

Objective – objective function to maximize

class dexnet.learning.MinimizationObjective(obj)

Wrapper for minimization of some supplied objective function. Used because internally all solvers attempt to maximize by default.

obj

Objective – objective function to minimize

evaluate(x)

Return negative, as all solvers will be assuming a maximization
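
Negating the objective is what lets a maximizing solver minimize. A minimal sketch, assuming the wrapped objective is any callable that returns a number; NegatedObjective and maximize_over are hypothetical names, not Dex-Net classes.

```python
class NegatedObjective:
    """Hypothetical wrapper that turns a minimization into a maximization."""

    def __init__(self, obj):
        self.obj = obj  # any callable returning a number

    def evaluate(self, x):
        # Smaller values of obj become larger values here.
        return -self.obj(x)


def maximize_over(candidates, objective):
    """Stand-in solver: return the candidate with the largest evaluate() value."""
    return max(candidates, key=objective.evaluate)


# Minimizing (x - 1)^2 over a few integers by maximizing its negation.
best = maximize_over([-2, -1, 0, 1, 3], NegatedObjective(lambda x: (x - 1) ** 2))
print(best)
```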

class dexnet.learning.NonDeterministicObjective(det_objective)

Wrapper for non-deterministic objective function evaluations. Samples random values of the input data x.

det_objective

Objective – deterministic objective function to optimize

evaluate(x)

Evaluates a function to be maximized at some point x.

Parameters: x (object with a sample() function) – point at which to evaluate the nondeterministic objective

class dexnet.learning.ZeroOneObjective(b=0)

Zero One Loss based on thresholding.

b

int – threshold value; the objective returns 1 if x > b and 0 otherwise

check_valid_input(x)

Check whether or not input is valid for the objective

class dexnet.learning.IdentityObjective

Just returns the value x

check_valid_input(x)

Check whether or not input is valid for the objective

class dexnet.learning.RandomBinaryObjective

Returns a 0 or 1 based on some underlying random probability of success for the data points. Evaluated data points must have a sample_success method that returns 0 or 1

check_valid_input(x)

Check whether or not input is valid for the objective

class dexnet.learning.RandomContinuousObjective

Returns a continuous value based on some underlying random probability of success for the data points. Evaluated data points must have a sample method

check_valid_input(x)

Check whether or not input is valid for the objective

class dexnet.learning.LeastSquaresObjective(A, b)

Classic least-squares loss 0.5 * norm(Ax - b)**2

A

numpy.ndarray – A matrix in least squares 0.5 * norm(Ax - b)**2

b

numpy.ndarray – b vector in least squares 0.5 * norm(Ax - b)**2
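
As a worked instance of the loss above, the value 0.5 * norm(Ax - b)**2 has gradient A^T (Ax - b). The sketch below uses plain Python lists so that it runs without NumPy; the function names are hypothetical and the library itself operates on numpy.ndarray inputs.

```python
def residual(A, x, b):
    """Compute Ax - b for a row-major list-of-lists matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
            for row, b_i in zip(A, b)]


def least_squares_value(A, x, b):
    """0.5 * ||Ax - b||^2"""
    return 0.5 * sum(r * r for r in residual(A, x, b))


def least_squares_gradient(A, x, b):
    """A^T (Ax - b), one entry per component of x."""
    r = residual(A, x, b)
    return [sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]


A = [[1, 0], [0, 2]]
b = [1, 2]
print(least_squares_value(A, [0, 0], b))     # loss at the origin
print(least_squares_gradient(A, [0, 0], b))  # gradient at the origin
```

At x = [1, 1] the residual is zero, so both the loss and the gradient vanish there.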

class dexnet.learning.LogisticCrossEntropyObjective(X, y)

Logistic cross entropy loss.

X

numpy.ndarray – X matrix in logistic function 1 / (1 + exp(-X^T beta))

y

numpy.ndarray – y vector, true labels

class dexnet.learning.CrossEntropyLoss(true_p)

Cross entropy loss.

true_p

numpy.ndarray – the true probabilities for all admissible datapoints

class dexnet.learning.SquaredErrorLoss(true_p)

Squared error (x - x_true)**2

true_p

numpy.ndarray – the true labels for all admissible inputs

class dexnet.learning.WeightedSquaredErrorLoss(true_p)

Weighted squared error w * (x - x_true)**2

true_p

numpy.ndarray – the true labels for all admissible inputs

evaluate(est_p, weights)

Evaluates the squared loss of the estimated p with given weights

Parameters: est_p (list of float) – points at which to evaluate the objective

class dexnet.learning.CCBPLogLikelihood(true_p)

CCBP log likelihood of the true params under a current posterior distribution

true_p

list of Number – true probabilities of datapoints

evaluate(alphas, betas)

Evaluates the CCBP likelihood of the true data under estimated CCBP posterior parameters alpha and beta

Parameters:
  • alphas (list of Number) – posterior alpha values
  • betas (list of Number) – posterior beta values

class dexnet.learning.SamplingSolver(objective)

Optimization methods based on a sampling strategy

class dexnet.learning.AdaptiveSamplingResult(best_candidates, best_pred_means, best_pred_vars, total_time, checkpt_times, iters, indices, vals, models)

Struct to store the results of sampling / optimization.

best_candidates

list of candidate objects – list of the best candidates as estimated by the optimizer

best_pred_means

list of floats – list of the predicted mean objective value for the best candidates

best_pred_vars

list of floats – list of the variance in the predicted objective value for the best candidates

total_time

float – the total optimization time

checkpt_times

list of floats – the time since start at which the snapshots were taken

iters

list of ints – the iterations at which snapshots were taken

indices

list of ints – the indices of the candidates selected at each snapshot iteration

vals

list of objective output values – the value returned by the evaluated candidate at each snapshot iteration

models

list of Model – the state of the current candidate objective value predictive model at each snapshot iteration

best_pred_ind

list of int – the indices of the candidate predicted to be the best by the model at each snapshot iteration

class dexnet.learning.BetaBernoulliBandit(objective, candidates, policy, alpha_prior=1.0, beta_prior=1.0)

Class for running Beta Bernoulli Multi-Armed Bandits

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy

DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

reset_model(candidates)

Needed to independently maximize over subsets of data

class dexnet.learning.UniformAllocationMean(objective, candidates, alpha_prior=1.0, beta_prior=1.0)

Uniform Allocation with Beta Bernoulli Multi-Armed Bandits

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

class dexnet.learning.ThompsonSampling(objective, candidates, alpha_prior=1.0, beta_prior=1.0)

Thompson Sampling with Beta Bernoulli Multi-Armed Bandits

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter
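
To show how an objective, a candidate list, and Beta priors fit together, here is a self-contained sketch of a Thompson sampling loop over Bernoulli arms. It does not call Dex-Net: Coin, run_thompson, and the fixed iteration budget are hypothetical, and ties between draws are resolved by taking the first maximum for brevity.

```python
import random


class Coin:
    """Hypothetical candidate with a hidden probability of success."""

    def __init__(self, p, rng):
        self.p, self.rng = p, rng

    def sample_success(self):
        return 1 if self.rng.random() < self.p else 0


def run_thompson(candidates, max_iters, rng, alpha_prior=1.0, beta_prior=1.0):
    alphas = [alpha_prior] * len(candidates)
    betas = [beta_prior] * len(candidates)
    for _ in range(max_iters):
        # Select: sample every posterior and pick the arm with the largest draw.
        draws = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        index = draws.index(max(draws))
        # Evaluate the chosen arm and update only its posterior.
        value = candidates[index].sample_success()
        alphas[index] += value
        betas[index] += 1 - value
    means = [a / (a + b) for a, b in zip(alphas, betas)]
    pulls = [a + b - alpha_prior - beta_prior for a, b in zip(alphas, betas)]
    return means.index(max(means)), means, pulls


rng = random.Random(0)
coins = [Coin(p, rng) for p in (0.2, 0.5, 0.9)]
best, means, pulls = run_thompson(coins, max_iters=500, rng=rng)
print(best, means, pulls)
```

The pulls list makes the adaptive allocation visible: arms that look poor after a few evaluations receive few further samples, in contrast to uniform allocation.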

class dexnet.learning.GittinsIndex98(objective, candidates, alpha_prior=1.0, beta_prior=1.0)

Gittins Index Policy using gamma = 0.98 with Beta Bernoulli Multi-Armed Bandits

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

class dexnet.learning.GaussianBandit(objective, candidates, policy)

Multi-Armed Bandit class using independent Gaussian random variables to model the objective value of each candidate.

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy

DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

class dexnet.learning.GaussianUniformAllocationMean(objective, candidates)

Uniform Allocation with Independent Gaussian Multi-Armed Bandit model

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.GaussianThompsonSampling(objective, candidates)

Thompson Sampling with Independent Gaussian Multi-Armed Bandit model

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.GaussianUCBSampling(objective, candidates)

UCB with Independent Gaussian Multi-Armed Bandit model

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.CorrelatedBetaBernoulliBandit(objective, candidates, policy, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)

Multi-Armed Bandit class using Continuous Correlated Beta Processes (CCBPs) to model the objective value of each candidate.

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy

DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

nn

NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel

Kernel – kernel to use in CCBP model

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

p

float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

reset_model(candidates)

Needed to independently maximize over subsets of data

class dexnet.learning.CorrelatedThompsonSampling(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)

Thompson Sampling with CCBP Multi-Armed Bandit model

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn

NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel

Kernel – kernel to use in CCBP model

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

p

float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.CorrelatedBayesUCB(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, horizon=1000, c=6, p=0.95)

Bayes UCB with CCBP Multi-Armed Bandit model (see “On Bayesian Upper Confidence Bounds for Bandit Problems” by Kaufmann et al.)

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn

NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel

Kernel – kernel to use in CCBP model

tolerance

float – TODO

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

horizon

int – horizon parameter for Bayes UCB

c

int – quantile parameter for Bayes UCB

p

float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.CorrelatedGittins(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)

Gittins Index Policy for gamma=0.98 with CCBP Multi-Armed Bandit model

objective

Objective – the objective to optimize via sampling

candidates

list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn

NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel

Kernel – kernel to use in CCBP model

alpha_prior

float – the prior to use on the alpha parameter

beta_prior

float – the prior to use on the beta parameter

p

float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.ConfusionMatrix(num_categories)

Confusion matrix for classification errors

class dexnet.learning.Tensor(shape, dtype=numpy.float32)

Abstraction for 4-D tensor objects.

add(datapoint)

Adds the datapoint to the tensor if room is available.

data_slice(slice_ind)

Returns a slice of datapoints

datapoint(ind)

Returns the datapoint at the given index.

static load(filename, compressed=True)

Loads a tensor from disk.

reset()

Resets the current index.

save(filename, compressed=True)

Save a tensor to disk.

set_datapoint(ind, datapoint)

Sets the value of the datapoint at the given index.
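
The add and reset descriptions suggest a preallocated buffer with a write cursor. The class below is a hypothetical pure-Python illustration of that pattern; FixedBuffer, its attribute names, and its return values are assumptions and do not reflect the actual Tensor implementation, which stores a NumPy array.

```python
class FixedBuffer:
    """Hypothetical fixed-capacity container with a write cursor."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.cur_index = 0

    @property
    def is_full(self):
        return self.cur_index == len(self.data)

    def add(self, datapoint):
        # Store the datapoint only if room is available.
        if self.is_full:
            return False
        self.data[self.cur_index] = datapoint
        self.cur_index += 1
        return True

    def datapoint(self, ind):
        # Only indices that have been written are readable.
        if not 0 <= ind < self.cur_index:
            raise IndexError("index %d has not been written" % ind)
        return self.data[ind]

    def reset(self):
        # Rewind the cursor; old entries are overwritten by later adds.
        self.cur_index = 0


buf = FixedBuffer(capacity=2)
results = [buf.add("a"), buf.add("b"), buf.add("c")]
print(results)
```

Rewinding a cursor instead of reallocating lets the same buffer be refilled after each write to disk.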

class dexnet.learning.TensorDataset(filename, config, access_mode='WRITE')

Encapsulates learning datasets and different training and test splits of the data.

add(datapoint)

Adds a datapoint to the file.

datapoint(ind)

Loads a tensor datapoint for a given global index.

Parameters: ind (int) – global index in the tensor
Returns: the desired tensor datapoint
Return type: TensorDatapoint

datapoint_indices

Returns an array of all dataset indices.

datapoint_indices_for_tensor(tensor_index)

Returns the indices for all datapoints in the given tensor.

flush()

Flushes the data tensors. Alias for write().

generate_tensor_filename(field_name, file_num, compressed=True)

Generate a filename for a tensor.

load_tensor(field_name, file_num)

Loads a tensor for a given field and file num.

Parameters:
  • field_name (str) – the name of the field to load
  • file_num (int) – the number of the file to load from
Returns: the desired tensor
Return type: Tensor

next()

Read the next datapoint.

Returns: the next datapoint
Return type: TensorDatapoint

static open(dataset_dir)

Opens a tensor dataset.

split(attribute, train_pct, val_pct)

Splits the dataset along the given attribute.

tensor_dir

Return the tensor directory.

tensor_index(datapoint_index)

Returns the index of the tensor containing the referenced datapoint.

tensor_indices

Returns an array of all tensor indices.

write()

Writes all tensors to the next file number.