Some commonly used functions for datasets.
Checking
check_duplicate_in_dataset
check_duplicate_in_dataset (x, dataset)
Check if ‘x’ is in ‘dataset’
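A minimal sketch of such a membership test, assuming `x` is a single sample and `dataset` stacks samples along its first dimension. The helper name `is_in_dataset` is hypothetical and is not the library's implementation.

```python
import torch

def is_in_dataset(x: torch.Tensor, dataset: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in: True if row `x` equals any row of `dataset`."""
    # Broadcast x against every row, then require all entries of a row to match.
    row_matches = (dataset == x).all(dim=-1)
    # x is in the dataset if at least one row matches completely.
    return row_matches.any(dim=-1)

d = torch.tensor([[0.11, 1.0, 0.5],
                  [0.70, 1.0, 0.5]])
found = bool(is_in_dataset(torch.tensor([0.70, 1.0, 0.5]), d))    # True
missing = bool(is_in_dataset(torch.tensor([0.30, 1.0, 0.5]), d))  # False
```

The comparison is exact, so floating-point samples only match when their values are bit-identical.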
check_duplicates_in_dataset
check_duplicates_in_dataset (xs, dataset, return_ind=False, invert=False)
Checks if the elements of xs are in dataset. The boolean invert switches between counting duplicates (False) and counting the elements that are not in dataset (True). Uses torch.vmap, which copies dataset for every element in xs.
Check if this works:
import torch

xs = torch.tensor([
    [0.7, 1, 0.5],
    [0.3, 1, 0.5],
    [0.0, 1, 0.5]])
d = torch.tensor([
    [0.11, 1, 0.5],
    [0.70, 1, 0.5],  # here a dup
    [0.71, 1, 0.5],
    [0.3,  1, 0.5]])  # here a dup as well
check_duplicates_in_dataset(xs, d, return_ind=True)
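The batched check could be sketched with `torch.vmap` roughly as follows. The helper names, the return values (a count, optionally with the matching indices) and the handling of `invert` are assumptions made for illustration; the library's actual return format may differ.

```python
import torch

def is_in_dataset(x, dataset):
    # Hypothetical single-sample check: does any row of dataset equal x?
    return (dataset == x).all(dim=-1).any(dim=-1)

def count_in_dataset(xs, dataset, return_ind=False, invert=False):
    # Map the single-sample check over the rows of xs.
    # in_dims=(0, None): xs is batched, dataset is shared by every call.
    mask = torch.vmap(is_in_dataset, in_dims=(0, None))(xs, dataset)
    if invert:
        mask = ~mask  # select the elements that are NOT in dataset instead
    count = int(mask.sum())
    if return_ind:
        return count, mask.nonzero(as_tuple=True)[0]
    return count

xs = torch.tensor([[0.7, 1, 0.5],
                   [0.3, 1, 0.5],
                   [0.0, 1, 0.5]])
d = torch.tensor([[0.11, 1, 0.5],
                  [0.70, 1, 0.5],
                  [0.71, 1, 0.5],
                  [0.3,  1, 0.5]])

num_dups, dup_ind = count_in_dataset(xs, d, return_ind=True)
num_new, new_ind = count_in_dataset(xs, d, return_ind=True, invert=True)
```

In this sketch the first two rows of `xs` are found in `d`, and the third is the only one reported when `invert=True`. The intermediate comparison has one entry per pair of `xs` and `dataset` elements, so memory use grows with the product of both sizes.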
Manipulating
shuffle_tensor_dataset
shuffle_tensor_dataset (x, y=None, *z)
Assumes NumPy or tensor objects of the same length.
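The key point of a joint shuffle is that one permutation is applied to every object, so that samples and labels stay aligned. A minimal sketch for tensors only, with the hypothetical name `shuffle_together`:

```python
import torch

def shuffle_together(x, *others):
    # Hypothetical helper: draw one permutation and apply it to all tensors.
    perm = torch.randperm(x.shape[0])
    return tuple(t[perm] for t in (x, *others))

x = torch.arange(10)
y = x * 2  # y[i] belongs to x[i]
x_shuffled, y_shuffled = shuffle_together(x, y)
```

After the call, `y_shuffled[i]` is still the label that belongs to `x_shuffled[i]`.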
get_unique_elements_indices
get_unique_elements_indices (tensor)
Returns the indices of the unique elements in tensor.
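One way to obtain such indices is to combine `torch.unique` with a scatter reduction that keeps the first occurrence of each unique row. This is a sketch under the assumption that "unique" refers to rows along the first dimension; `first_occurrence_indices` is a hypothetical name, not the library function.

```python
import torch

def first_occurrence_indices(t: torch.Tensor) -> torch.Tensor:
    # inverse[i] is the id of the unique row that t[i] maps to.
    uniques, inverse = torch.unique(t, dim=0, return_inverse=True)
    n = t.shape[0]
    positions = torch.arange(n)
    # Start every slot at n (larger than any valid index), then keep the minimum.
    first = torch.full((uniques.shape[0],), n, dtype=torch.long)
    first.scatter_reduce_(0, inverse, positions, reduce="amin")
    # Return the indices in their original order of appearance.
    return first.sort().values

t = torch.tensor([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]])
idx = first_occurrence_indices(t)  # tensor([0, 1, 3])
```

The same index tensor can then be used to filter companion tensors, for example `x[idx]` and `y[idx]`, so that samples and labels remain aligned.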
uniquify_tensor_dataset
uniquify_tensor_dataset (x, y=None, *z)
x has to be a tensor; y and z are assumed to be NumPy arrays or tensors.
balance_tensor_dataset
balance_tensor_dataset (x, y, *z, samples:int=None,
                        make_unique:bool=True, y_uniques=None,
                        shuffle_lables:bool=True,
                        add_balance_fn:callable=None)
Assumes x is a tensor and y is a tensor or a NumPy array.
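As an illustration of label balancing, the sketch below caps the number of samples per label. It assumes that `y` is a 1-D tensor of labels and that `samples` acts as a per-label cap, defaulting to the size of the smallest class. Both are assumptions; the options `make_unique`, `y_uniques`, `shuffle_lables` and `add_balance_fn` are not modelled, and `balance_by_label` is a hypothetical name.

```python
import torch

def balance_by_label(x, y, samples=None):
    # Hypothetical sketch: keep at most `cap` randomly chosen samples per label.
    labels, counts = torch.unique(y, return_counts=True)
    cap = int(counts.min()) if samples is None else samples
    keep = []
    for label in labels:
        idx = (y == label).nonzero(as_tuple=True)[0]
        # Random subset of this label's indices, at most `cap` of them.
        keep.append(idx[torch.randperm(idx.shape[0])[:cap]])
    keep = torch.cat(keep)
    return x[keep], y[keep]

y = torch.tensor([0, 0, 0, 0, 1, 1, 2])
x = torch.arange(7).unsqueeze(1)  # x[i, 0] == i, to track which rows are kept

xb, yb = balance_by_label(x, y)              # one sample per label
xc, yc = balance_by_label(x, y, samples=2)   # at most two samples per label
```

Capping instead of oversampling keeps every retained sample an original one, at the cost of discarding data from the larger classes.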