learning¶

Classes to facilitate machine learning with Dex-Net, including wrappers for GQ-CNN training datasets and Multi-Armed Bandit methods for grasp robustness evaluation.

class dexnet.learning.Model¶

A predictor of some value of the input data

predict(x)¶: Predict the function of the data at some point x. For probabilistic models this returns the mean prediction

snapshot()¶: Returns a concise description of the current model for debugging and logging purposes

update()¶: Update the model based on current data

class dexnet.learning.DiscreteModel¶

Maintains a prediction over a discrete set of points

max_prediction()¶: Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean predicted value

num_vars()¶: Returns the number of variables in the model

sample()¶: Sample discrete predictions from the model. For deterministic models, returns the deterministic prediction

class dexnet.learning.Snapshot(best_pred_ind, num_obs)¶: Abstract class for storing the current state of a model

class dexnet.learning.BernoulliSnapshot(best_pred_ind, means, num_obs)¶: Stores the current state of a Bernoulli model

class dexnet.learning.BetaBernoulliSnapshot(best_pred_ind, alphas, betas, num_obs)¶: Stores the current state of a Beta Bernoulli model

class dexnet.learning.GaussianSnapshot(best_pred_ind, means, variances, sample_vars, num_obs)¶: Stores the current state of a Gaussian model

class dexnet.learning.BernoulliModel(num_vars, mean_prior=0.5)¶

Standard bernoulli model for predictions over a discrete set of candidates

num_vars¶: int – the number of variables to track

prior_means¶: float – prior on the mean probability of success for candidates

static bernoulli_mean(p)¶: Mean of the Bernoulli distribution with parameter p

static bernoulli_variance(p, n)¶: Uses Wald interval for variance prediction

max_prediction()¶: Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean probability of success

predict(index)¶: Predicts the probability of success for the variable indexed by index

sample()¶: Samples probabilities of success from the given values

snapshot()¶: Return copies of the model params

update(index, value)¶: Update the model based on an observation of value at index index
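
As a rough illustration of what a model of this kind tracks, the sketch below keeps a running success-rate estimate per candidate and uses the Wald-interval variance p(1 - p) / n. The class name `RunningBernoulli`, its attributes, and the choice to count the prior as one pseudo-observation are all hypothetical and are not taken from the Dex-Net source.

```python
class RunningBernoulli:
    """Hypothetical stand-in for a Bernoulli model; not the Dex-Net implementation."""

    def __init__(self, num_vars, mean_prior=0.5):
        # Assumption: the prior mean counts as one pseudo-observation, so the
        # variance estimate is defined before any real update arrives.
        self.means = [mean_prior] * num_vars
        self.num_obs = [1] * num_vars

    def update(self, index, value):
        # Incremental running mean of the 0/1 observations for one candidate.
        n = self.num_obs[index]
        self.means[index] = (self.means[index] * n + value) / (n + 1)
        self.num_obs[index] = n + 1

    def variance(self, index):
        # Wald-interval variance estimate: p * (1 - p) / n.
        p = self.means[index]
        return p * (1.0 - p) / self.num_obs[index]

    def max_prediction(self):
        # Indices of all candidates that tie for the largest mean.
        best = max(self.means)
        return [i for i, m in enumerate(self.means) if m == best]


model = RunningBernoulli(num_vars=3)
for value in (1, 1, 0):
    model.update(0, value)   # candidate 0: two successes, one failure
model.update(1, 0)           # candidate 1: one failure
```

Candidate 2 is never observed here, so it stays at its prior mean.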

class dexnet.learning.BetaBernoulliModel(num_vars, alpha_prior=1.0, beta_prior=1.0)¶

Beta-Bernoulli model for predictions over a discrete set of candidates.

num_vars¶: int – the number of variables to track

alpha_prior¶: float – prior alpha parameter of the Beta distribution

beta_prior¶: float – prior beta parameter of the Beta distribution

static beta_mean(alpha, beta)¶: Mean of the beta distribution with params alpha and beta

static beta_variance(alpha, beta)¶: Variance of the beta distribution with params alpha and beta

max_prediction()¶: Returns the index (or indices), posterior mean, and posterior variance of the variable(s) with the maximal mean probability of success

predict(index)¶: Predicts the probability of success for the variable indexed by index

sample(vis=False, stop=False)¶: Samples probabilities of success from the given values

static sample_variance(alpha, beta)¶: Sample variance of the beta distribution with params alpha and beta

snapshot()¶: Return copies of the model params

update(index, value)¶: Update the model based on an observation of value at index index
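
The quantities named above follow the standard Beta distribution formulas, so a minimal sketch is possible without the library. The function `beta_update` is a hypothetical helper showing the usual conjugate update for one candidate; whether Dex-Net's `update` is written exactly this way is an assumption.

```python
def beta_mean(alpha, beta):
    # Mean of Beta(alpha, beta).
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    # Variance of Beta(alpha, beta).
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1.0))


def beta_update(alpha, beta, value):
    # Conjugate update for one observation with value in [0, 1]:
    # successes add to alpha, failures add to beta.
    return alpha + value, beta + (1.0 - value)


alpha, beta = 1.0, 1.0          # uniform prior, matching the documented defaults
for outcome in (1, 1, 0, 1):    # three successes, one failure
    alpha, beta = beta_update(alpha, beta, outcome)

posterior_mean = beta_mean(alpha, beta)
posterior_var = beta_variance(alpha, beta)
```

After three successes and one failure the posterior is Beta(4, 2), whose mean is 2/3.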

class dexnet.learning.GaussianModel(num_vars)¶

Gaussian model for predictions over a discrete set of candidates.

num_vars¶: int – the number of variables to track

max_prediction()¶: Returns the index, mean, and variance of the variable(s) with the maximal predicted value.

predict(index)¶

Predict the value of the index’th variable.

Parameters:	index (int) – the variable to find the predicted value for

sample(stop=False)¶: Sample discrete predictions from the model. Mean follows a t-distribution

snapshot()¶: Returns a concise description of the current model for debugging and logging purposes.

update(index, value)¶

Update the model based on current data.

Parameters:	index (int) – the index of the variable that was evaluated
	value (float) – the value of the variable

variances¶: Confidence bounds on the mean

class dexnet.learning.CorrelatedBetaBernoulliModel(candidates, nn, kernel, tolerance=0.01, alpha_prior=1.0, beta_prior=1.0, p=0.5)¶

Correlated Beta-Bernoulli model for predictions over a discrete set of candidates.

candidates¶: list – the objects to track

nn¶: NearestNeighbor – nearest neighbor structure to use for neighborhood lookups

kernel¶: Kernel – kernel instance to measure similarities

tolerance¶: float – for computing radius of neighborhood, between 0 and 1

alpha_prior¶: float – prior alpha parameter of the Beta distribution

beta_prior¶: float – prior beta parameter of the Beta distribution

kernel_matrix¶: Create the full kernel matrix for debugging purposes

lcb_prediction(p=0.95)¶: Return the index with the highest lower confidence bound

snapshot()¶: Return copies of the model params

update(index, value)¶

Update the model based on current data

Parameters:	index (int) – the index of the variable that was evaluated
	value (float) – the value of the variable

class dexnet.learning.TerminationCondition¶: Returns true when a condition is satisfied. Used for supplying different termination conditions to optimization algorithms

class dexnet.learning.MaxIterTerminationCondition(max_iters)¶

Terminate based on reaching a maximum number of iterations.

max_iters¶: int – the maximum number of allowed iterations

class dexnet.learning.ProgressTerminationCondition(eps)¶

Terminate based on lack of progress.

eps¶: float – the minimum admissible progress that must be made on each iteration to continue

class dexnet.learning.ConfidenceTerminationCondition(eps)¶

Terminate based on model confidence.

eps¶: float – the amount of confidence in the predicted objective value that the model must have to terminate

class dexnet.learning.OrTerminationCondition(term_conditions)¶

Terminate based on the OR of several termination conditions

term_conditions¶: list of TerminationCondition – termination conditions that are ORed to get the final termination results

class dexnet.learning.AndTerminationCondition(term_conditions)¶

Terminate based on the AND of several termination conditions

term_conditions¶: list of TerminationCondition – termination conditions that are ANDed to get the final termination results
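
The way these conditions compose can be sketched with plain callables. The classes below are hypothetical stand-ins, and the call signature `(k, cur_val, prev_val)` is an assumption made for illustration, not the documented Dex-Net interface.

```python
class MaxIter:
    """Stop once the iteration counter reaches max_iters."""

    def __init__(self, max_iters):
        self.max_iters = max_iters

    def __call__(self, k, cur_val, prev_val):
        return k >= self.max_iters


class Progress:
    """Stop when the objective changed by less than eps since the last iteration."""

    def __init__(self, eps):
        self.eps = eps

    def __call__(self, k, cur_val, prev_val):
        return abs(cur_val - prev_val) < self.eps


class Or:
    """Stop when any wrapped condition is satisfied."""

    def __init__(self, term_conditions):
        self.term_conditions = term_conditions

    def __call__(self, k, cur_val, prev_val):
        return any(t(k, cur_val, prev_val) for t in self.term_conditions)


class And:
    """Stop only when every wrapped condition is satisfied."""

    def __init__(self, term_conditions):
        self.term_conditions = term_conditions

    def __call__(self, k, cur_val, prev_val):
        return all(t(k, cur_val, prev_val) for t in self.term_conditions)


stop_either = Or([MaxIter(100), Progress(1e-3)])
stop_both = And([MaxIter(100), Progress(1e-3)])
```

An OR of an iteration cap and a progress check is a common pairing: the cap bounds the run time, and the progress check ends the run early once it has converged.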

class dexnet.learning.ThompsonSelectionPolicy(model=None)¶

Chooses the next point using the Thompson sampling selection policy

choose_next(stop=False)¶: Returns the index of the maximal random sample, breaking ties uniformly at random
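
For a Beta-Bernoulli model, the selection rule described above can be sketched as follows. The function `thompson_choose_next` and its arguments are hypothetical; it takes the posterior parameters directly instead of a Dex-Net model object.

```python
import random


def thompson_choose_next(alphas, betas, rng=random):
    # Draw one sample from each candidate's Beta posterior.
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    # Pick the largest sample, breaking exact ties uniformly at random.
    best = max(samples)
    tied = [i for i, s in enumerate(samples) if s == best]
    return rng.choice(tied)


rng = random.Random(0)
# Candidate 0 has a posterior concentrated near 1, candidate 1 near 0.
picks = [thompson_choose_next([50.0, 1.0], [1.0, 50.0], rng) for _ in range(200)]
```

Because the two posteriors barely overlap, candidate 0 is selected on essentially every draw. With more similar posteriors the choice becomes random, which is how the policy explores.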

class dexnet.learning.BetaBernoulliGittinsIndex98Policy(model=None)¶

Chooses the next point using the BetaBernoulli gittins index policy with gamma = 0.98

choose_next()¶: Returns the index of the candidate with the maximal Gittins index, breaking ties uniformly at random

class dexnet.learning.BetaBernoulliBayesUCBPolicy(horizon=1000, c=6, model=None)¶

Chooses the next point using the Bayes UCB selection policy

choose_next(stop=False)¶: Returns the index of the candidate with the maximal Bayes UCB value, breaking ties uniformly at random

class dexnet.learning.Objective¶

Acts as a function that returns a numeric value for classes of input data, with checks for valid input.

check_valid_input(x)¶

Return whether or not a point is valid for the objective.

Parameters:	x (`object`) – point at which to evaluate the objective

evaluate(x)¶

Evaluates a function to be maximized at some point x.

Parameters:	x (`object`) – point at which to evaluate the objective

class dexnet.learning.DifferentiableObjective¶

Objectives that are at least twice differentiable.

gradient(x)¶

Evaluate the gradient at x.

Parameters:	x (`object`) – point at which to evaluate the objective

hessian(x)¶

Evaluate the Hessian at x.

Parameters:	x (`object`) – point at which to evaluate the objective

class dexnet.learning.MaximizationObjective(obj)¶

Wrapper for maximization of some supplied objective function. Provided mainly for symmetry with MinimizationObjective, since solvers already maximize by default.

obj¶: Objective – objective function to maximize

class dexnet.learning.MinimizationObjective(obj)¶

Wrapper for minimization of some supplied objective function. Used because internally all solvers attempt to maximize by default.

obj¶: Objective – objective function to minimize

evaluate(x)¶: Return negative, as all solvers will be assuming a maximization
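
The sign flip can be illustrated with a small wrapper. `Quadratic` and `NegatedObjective` are hypothetical names used only for this sketch; they mirror the idea of the wrapper rather than its actual code.

```python
class Quadratic:
    """Hypothetical objective: f(x) = (x - 3)^2, smallest at x = 3."""

    def evaluate(self, x):
        return (x - 3.0) ** 2


class NegatedObjective:
    """Sketch of a minimization wrapper: negate so that a maximizer minimizes."""

    def __init__(self, obj):
        self.obj = obj

    def evaluate(self, x):
        return -self.obj.evaluate(x)


wrapped = NegatedObjective(Quadratic())
candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
# Maximizing the wrapped objective finds the minimizer of the original one.
best = max(candidates, key=wrapped.evaluate)
```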

class dexnet.learning.NonDeterministicObjective(det_objective)¶

Wrapper for non-deterministic objective function evaluations. Samples random values of the input data x.

det_objective¶: Objective – deterministic objective function to optimize

evaluate(x)¶

Evaluates a function to be maximized at some point x.

Parameters:	x (`object` with a sample() function) – point at which to evaluate the nondeterministic objective

class dexnet.learning.ZeroOneObjective(b=0)¶

Zero One Loss based on thresholding.

b¶: int – threshold value, 1 iff x > b, 0 otherwise

check_valid_input(x)¶: Check whether or not input is valid for the objective

class dexnet.learning.IdentityObjective¶

Just returns the value x

check_valid_input(x)¶: Check whether or not input is valid for the objective

class dexnet.learning.RandomBinaryObjective¶

Returns a 0 or 1 based on some underlying random probability of success for the data points. Evaluated data points must have a sample_success method that returns 0 or 1.

check_valid_input(x)¶: Check whether or not input is valid for the objective

class dexnet.learning.RandomContinuousObjective¶

Returns a continuous value based on some underlying random probability of success for the data points. Evaluated data points must have a sample method.

check_valid_input(x)¶: Check whether or not input is valid for the objective

class dexnet.learning.LeastSquaresObjective(A, b)¶

Classic least-squares loss 0.5 * norm(Ax - b)**2

A¶: numpy.ndarray – A matrix in least squares 0.5 * norm(Ax - b)**2

b¶: numpy.ndarray – b vector in least squares 0.5 * norm(Ax - b)**2
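
To make the loss concrete, the sketch below evaluates 0.5 * norm(Ax - b)**2 and its gradient A^T (Ax - b). It uses plain Python lists instead of numpy arrays so that it has no dependencies, and the function names are hypothetical.

```python
def matvec(A, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def residual(A, b, x):
    # r = Ax - b
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]


def least_squares_value(A, b, x):
    # 0.5 * ||Ax - b||^2
    return 0.5 * sum(ri * ri for ri in residual(A, b, x))


def least_squares_gradient(A, b, x):
    # Gradient of the loss with respect to x: A^T (Ax - b)
    r = residual(A, b, x)
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


A = [[1.0, 0.0], [0.0, 2.0]]
b = [1.0, 2.0]

value_at_origin = least_squares_value(A, b, [0.0, 0.0])
grad_at_origin = least_squares_gradient(A, b, [0.0, 0.0])
value_at_solution = least_squares_value(A, b, [1.0, 1.0])
```

At x = (1, 1) the residual is zero, so both the loss and the gradient vanish there.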

class dexnet.learning.LogisticCrossEntropyObjective(X, y)¶

Logistic cross entropy loss.

X¶: numpy.ndarray – X matrix in logistic function 1 / (1 + exp(- X^T beta))

y¶: numpy.ndarray – y vector, true labels
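
A minimal sketch of the logistic cross entropy is given below. It treats each row of `X` as one datapoint and sums the per-datapoint losses. Both choices are assumptions made for illustration and may differ from the library's layout and normalization.

```python
import math


def sigmoid(z):
    # Logistic function 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cross_entropy(X, y, beta):
    # Sum over datapoints of -[y log p + (1 - y) log(1 - p)],
    # where p is the logistic function of the datapoint's dot product with beta.
    total = 0.0
    for row, label in zip(X, y):
        p = sigmoid(sum(xi * bi for xi, bi in zip(row, beta)))
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total


X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, 0]

# With beta = 0 every prediction is 0.5, so each datapoint contributes log(2).
loss_at_zero = logistic_cross_entropy(X, y, [0.0, 0.0])
# Moving beta toward the labels lowers the loss.
loss_better = logistic_cross_entropy(X, y, [2.0, -2.0])
```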

class dexnet.learning.CrossEntropyLoss(true_p)¶

Cross entropy loss.

true_p¶: numpy.ndarray – the true probabilities for all admissible datapoints

class dexnet.learning.SquaredErrorLoss(true_p)¶

Squared error (x - x_true)**2

true_p¶: numpy.ndarray – the true labels for all admissible inputs

class dexnet.learning.WeightedSquaredErrorLoss(true_p)¶

Weighted squared error w * (x - x_true)**2

true_p¶: numpy.ndarray – the true labels for all admissible inputs

evaluate(est_p, weights)¶

Evaluates the squared loss of the estimated p with given weights

Parameters:	est_p (`list` of `float`) – points at which to evaluate the objective
	weights (`list` of `float`) – weights applied to the squared error terms

class dexnet.learning.CCBPLogLikelihood(true_p)¶

CCBP log likelihood of the true params under a current posterior distribution

true_p¶: list of Number – true probabilities of datapoints

evaluate(alphas, betas)¶

Evaluates the CCBP likelihood of the true data under estimated CCBP posterior parameters alpha and beta

Parameters:	alphas (`list` of `Number`) – posterior alpha values
	betas (`list` of `Number`) – posterior beta values
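
One plausible reading of this likelihood is the sum of Beta log densities of the true probabilities under each candidate's posterior. The sketch below works under that assumption; the function names are hypothetical and the library's exact formula may differ.

```python
import math


def beta_log_pdf(p, alpha, beta):
    # Log of the Beta(alpha, beta) density at p, for 0 < p < 1.
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(p) + (beta - 1.0) * math.log(1.0 - p) - log_norm


def beta_log_likelihood(true_p, alphas, betas):
    # Sum of per-candidate log densities of the true probabilities.
    return sum(beta_log_pdf(p, a, b) for p, a, b in zip(true_p, alphas, betas))


# Under a uniform Beta(1, 1) posterior every probability has log density 0.
flat = beta_log_likelihood([0.3, 0.7], [1.0, 1.0], [1.0, 1.0])
# Beta(2, 2) has density 6 p (1 - p), which is 1.5 at p = 0.5.
peaked = beta_log_likelihood([0.5], [2.0], [2.0])
```

A higher value means the posterior places more density on the true probabilities, which makes the quantity useful for comparing models during evaluation.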

class dexnet.learning.SamplingSolver(objective)¶: Optimization methods based on a sampling strategy

class dexnet.learning.AdaptiveSamplingResult(best_candidates, best_pred_means, best_pred_vars, total_time, checkpt_times, iters, indices, vals, models)¶

Struct to store the results of sampling / optimization.

best_candidates¶: list of candidate objects – list of the best candidates as estimated by the optimizer

best_pred_means¶: list of floats – list of the predicted mean objective value for the best candidates

best_pred_vars¶: list of floats – list of the variance in the predicted objective value for the best candidates

total_time¶: float – the total optimization time

checkpt_times¶: list of floats – the time since start at which the snapshots were taken

iters¶: list of ints – the iterations at which snapshots were taken

indices¶: list of ints – the indices of the candidates selected at each snapshot iteration

vals¶: list of objective output values – the value returned by the evaluated candidate at each snapshot iteration

models¶: list of Model – the state of the current candidate objective value predictive model at each snapshot iteration

best_pred_ind¶: list of int – the indices of the candidate predicted to be the best by the model at each snapshot iteration

class dexnet.learning.BetaBernoulliBandit(objective, candidates, policy, alpha_prior=1.0, beta_prior=1.0)¶

Class for running Beta Bernoulli Multi-Armed Bandits

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy¶: DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

reset_model(candidates)¶: Needed to independently maximize over subsets of data
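
The overall select, evaluate, update loop of a Beta-Bernoulli bandit can be sketched with the standard library. Everything here is hypothetical scaffolding (`Coin`, `run_thompson_bandit`, the fixed iteration budget). It illustrates the general technique with Thompson sampling as the policy and is not the Dex-Net solver.

```python
import random


class Coin:
    """Hypothetical candidate exposing sample_success(), returning 0 or 1."""

    def __init__(self, p, rng):
        self.p = p
        self.rng = rng

    def sample_success(self):
        return 1 if self.rng.random() < self.p else 0


def run_thompson_bandit(candidates, max_iters, rng, alpha_prior=1.0, beta_prior=1.0):
    alphas = [alpha_prior] * len(candidates)
    betas = [beta_prior] * len(candidates)
    for _ in range(max_iters):
        # Select: sample each posterior and take the argmax.
        samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        index = samples.index(max(samples))
        # Evaluate: one noisy 0/1 observation of the chosen candidate.
        value = candidates[index].sample_success()
        # Update: conjugate Beta-Bernoulli posterior update.
        alphas[index] += value
        betas[index] += 1 - value
    # Report the candidate with the highest posterior mean.
    means = [a / (a + b) for a, b in zip(alphas, betas)]
    return means.index(max(means)), alphas, betas


rng = random.Random(0)
coins = [Coin(0.1, rng), Coin(0.9, rng), Coin(0.2, rng)]
best_index, alphas, betas = run_thompson_bandit(coins, max_iters=500, rng=rng)
```

Each iteration adds exactly one observation to one candidate, so the posterior parameters record how the evaluation budget was allocated across candidates.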

class dexnet.learning.UniformAllocationMean(objective, candidates, alpha_prior=1.0, beta_prior=1.0)¶

Uniform Allocation with Beta Bernoulli Multi-Armed Bandits

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

class dexnet.learning.ThompsonSampling(objective, candidates, alpha_prior=1.0, beta_prior=1.0)¶

Thompson Sampling with Beta Bernoulli Multi-Armed Bandits

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

class dexnet.learning.GittinsIndex98(objective, candidates, alpha_prior=1.0, beta_prior=1.0)¶

Gittins Index Policy using gamma = 0.98 with Beta Bernoulli Multi-Armed Bandits

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

class dexnet.learning.GaussianBandit(objective, candidates, policy)¶

Multi-Armed Bandit class using independent Gaussian random variables to model the objective value of each candidate.

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy¶: DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

class dexnet.learning.GaussianUniformAllocationMean(objective, candidates)¶

Uniform Allocation with Independent Gaussian Multi-Armed Bandit model

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.GaussianThompsonSampling(objective, candidates)¶

Thompson Sampling with Independent Gaussian Multi-Armed Bandit model

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.GaussianUCBSampling(objective, candidates)¶

UCB with Independent Gaussian Multi-Armed Bandit model

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

class dexnet.learning.CorrelatedBetaBernoulliBandit(objective, candidates, policy, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)¶

Multi-Armed Bandit class using Continuous Correlated Beta Processes (CCBPs) to model the objective value of each candidate.

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

policy¶: DiscreteSelectionPolicy – a policy to use to select the next candidate to evaluate (e.g. ThompsonSampling)

nn¶: NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel¶: Kernel – kernel to use in CCBP model

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

p¶: float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

reset_model(candidates)¶: Needed to independently maximize over subsets of data

class dexnet.learning.CorrelatedThompsonSampling(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)¶

Thompson Sampling with CCBP Multi-Armed Bandit model

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn¶: NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel¶: Kernel – kernel to use in CCBP model

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

p¶: float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.CorrelatedBayesUCB(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, horizon=1000, c=6, p=0.95)¶

Bayes UCB with CCBP Multi-Armed Bandit model (see “On Bayesian Upper Confidence Bounds for Bandit Problems” by Kaufmann et al.)

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn¶: NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel¶: Kernel – kernel to use in CCBP model

tolerance¶: float – for computing radius of neighborhood, between 0 and 1

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

horizon¶: int – horizon parameter for Bayes UCB

c¶: int – quantile parameter for Bayes UCB

p¶: float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.CorrelatedGittins(objective, candidates, nn, kernel, tolerance=0.0001, alpha_prior=1.0, beta_prior=1.0, p=0.95)¶

Gittins Index Policy for gamma=0.98 with CCBP Multi-Armed Bandit model

objective¶: Objective – the objective to optimize via sampling

candidates¶: list of arbitrary objects that can be evaluated by the objective – the list of candidates to optimize over

nn¶: NearestNeighbor – nearest neighbor structure for fast lookups during model updates

kernel¶: Kernel – kernel to use in CCBP model

alpha_prior¶: float – the prior to use on the alpha parameter

beta_prior¶: float – the prior to use on the beta parameter

p¶: float – the lower confidence bound used for best arm prediction (e.g. 0.95 -> return the 5th percentile of the belief distribution as the estimated objective value for each candidate)

class dexnet.learning.ConfusionMatrix(num_categories)¶: Confusion matrix for classification errors

class dexnet.learning.Tensor(shape, dtype=<type 'numpy.float32'>)¶

Abstraction for 4-D tensor objects.

add(datapoint)¶: Adds the datapoint to the tensor if room is available.

data_slice(slice_ind)¶: Returns a slice of datapoints

datapoint(ind)¶: Returns the datapoint at the given index.

static load(filename, compressed=True)¶: Loads a tensor from disk.

reset()¶: Resets the current index.

save(filename, compressed=True)¶: Save a tensor to disk.

set_datapoint(ind, datapoint)¶: Sets the value of the datapoint at the given index.

class dexnet.learning.TensorDataset(filename, config, access_mode='WRITE')¶

Encapsulates learning datasets and different training and test splits of the data.

add(datapoint)¶: Adds a datapoint to the file.

datapoint(ind)¶

Loads a tensor datapoint for a given global index.

Parameters:	ind (int) – global index in the tensor
Returns:	the desired tensor datapoint
Return type:	`TensorDatapoint`

datapoint_indices¶: Returns an array of all dataset indices.

datapoint_indices_for_tensor(tensor_index)¶: Returns the indices for all datapoints in the given tensor.

flush()¶: Flushes the data tensors. Alternate handle to write.

generate_tensor_filename(field_name, file_num, compressed=True)¶: Generate a filename for a tensor.

load_tensor(field_name, file_num)¶

Loads a tensor for a given field and file num.

Parameters:	field_name (str) – the name of the field to load
	file_num (int) – the number of the file to load from
Returns:	the desired tensor
Return type:	`Tensor`

next()¶

Read the next datapoint.

Returns:	the next datapoint
Return type:	`TensorDatapoint`

static open(dataset_dir)¶: Opens a tensor dataset.

split(attribute, train_pct, val_pct)¶: Splits the dataset along the given attribute.

tensor_dir¶: Return the tensor directory.

tensor_index(datapoint_index)¶: Returns the index of the tensor containing the referenced datapoint.

tensor_indices¶: Returns an array of all tensor indices.

write()¶: Writes all tensors to the next file number.
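
The relationship between global datapoint indices and tensor (file) indices can be sketched as simple integer arithmetic, assuming every tensor file holds a fixed number of datapoints. The parameter `datapoints_per_file` and both function bodies are assumptions for illustration, not the library's code.

```python
def tensor_index(datapoint_index, datapoints_per_file):
    # Index of the tensor file that contains the given global datapoint index.
    return datapoint_index // datapoints_per_file


def datapoint_indices_for_tensor(tensor_ind, datapoints_per_file, num_datapoints):
    # Global indices stored in one tensor file; the last file may be partly full.
    start = tensor_ind * datapoints_per_file
    stop = min(start + datapoints_per_file, num_datapoints)
    return list(range(start, stop))


# A dataset of 250 datapoints stored 100 per file spans three files.
last_file = tensor_index(249, 100)
last_file_indices = datapoint_indices_for_tensor(last_file, 100, 250)
```

Under these assumptions the final file holds only the 50 datapoints numbered 200 through 249.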