API - Dataflow¶
Dataflow list¶
DataLoader | Data loader.
Dataset | An abstract class to encapsulate methods and behaviors of datasets.
IterableDataset | An abstract class to encapsulate methods and behaviors of iterable datasets.
TensorDataset | Generate a dataset from a list of tensors.
ChainDataset | A Dataset which chains multiple iterable-style datasets.
ConcatDataset | Concat multiple datasets into a new dataset.
Subset | Subset of a dataset at specified indices.
random_split | Randomly split a dataset into non-overlapping new datasets of given lengths.
Sampler | Base class for all Samplers.
BatchSampler | Wraps another sampler to yield a mini-batch of indices.
RandomSampler | Samples elements randomly.
SequentialSampler | Samples elements sequentially, always in the same order.
WeightedRandomSampler | Samples elements from [0,..,len(weights)-1] with given probabilities (weights).
SubsetRandomSampler | Samples elements randomly from a given list of indices, without replacement.
Dataflow¶
DataLoader¶
class tensorlayerx.dataflow.DataLoader(dataset, batch_size=1, shuffle=False, drop_last=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, time_out=0, worker_init_fn=None, prefetch_factor=2, persistent_workers=False)
Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.
The tensorlayerx.dataflow.DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizable loading order, and optional automatic batching.
- Parameters
dataset (Dataset) – dataset from which to load the data.
batch_size (int) – how many samples per batch to load. Default is 1.
shuffle (bool) – set to True to have the data reshuffled at every epoch. Default is False.
drop_last (bool) – set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False and the dataset size is not divisible by the batch size, the last batch will be smaller. Default is False.
sampler (Sampler) – defines the strategy to draw samples from the dataset. If specified, shuffle must not be specified.
batch_sampler (Sampler) – returns a batch of indices at a time. If specified, shuffle, batch_size, drop_last, and sampler must not be specified.
num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in a single process. Default is 0.
collate_fn (callable) – merges a list of samples to form a mini-batch of Tensor(s). Used for batched loading from a map-style dataset.
time_out (numeric) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. Default is 0.
worker_init_fn (callable) – if not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. Default is None.
prefetch_factor (int) – number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers. Default is 2.
persistent_workers (bool) – if True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. Default is False.
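The collate_fn parameter above is described as a callable that merges a list of samples into a mini-batch. A minimal pure-Python sketch of that idea follows. The name simple_collate is hypothetical, and it returns plain lists, whereas a real collate function would usually stack the samples into tensors.

```python
def simple_collate(samples):
    """Merge a list of (data, label) samples into (data_batch, label_batch).

    Hypothetical helper for illustration only; it is not part of tensorlayerx.
    """
    # zip(*samples) transposes [(x0, y0), (x1, y1), ...] into (x0, x1, ...), (y0, y1, ...)
    data_batch, label_batch = zip(*samples)
    return list(data_batch), list(label_batch)


samples = [([0.1, 0.2], 0), ([0.3, 0.4], 1), ([0.5, 0.6], 0)]
data_batch, label_batch = simple_collate(samples)
print(data_batch)   # [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(label_batch)  # [0, 1, 0]
```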
Dataset¶
class tensorlayerx.dataflow.Dataset
An abstract class to encapsulate methods and behaviors of datasets. All map-style datasets (datasets whose samples can be retrieved by a given key) should be subclasses of tensorlayerx.dataflow.Dataset. All subclasses should implement the following methods:
__getitem__: get a sample from the dataset at a given index.
__len__: return the number of samples in the dataset.
__add__: concatenate two datasets.
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import Dataset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label, transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __getitem__(self, index):
>>>         data = self.data[index].astype('float32')
>>>         data = self.transform(data)
>>>         label = self.label[index].astype('int64')
>>>         return data, label
>>>     def __len__(self):
>>>         return len(self.data)
>>> # X_train, y_train and transform are assumed to be defined beforehand
>>> train_dataset = mnistdataset(data=X_train, label=y_train, transform=transform)
IterableDataset¶
class tensorlayerx.dataflow.IterableDataset
An abstract class to encapsulate methods and behaviors of iterable datasets. All iterable-style datasets (datasets whose samples can only be retrieved one by one, sequentially, like a Python iterator) should be subclasses of tensorlayerx.dataflow.IterableDataset. All subclasses should implement the following methods:
__iter__: yield samples sequentially.
Examples
With TensorLayerX
>>> # example 1:
>>> from tensorlayerx.dataflow import IterableDataset
>>> class mnistdataset(IterableDataset):
>>>     def __init__(self, data, label, transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __iter__(self):
>>>         for i in range(len(self.data)):
>>>             data = self.data[i].astype('float32')
>>>             data = self.transform(data)
>>>             label = self.label[i].astype('int64')
>>>             yield data, label
>>> train_dataset = mnistdataset(data=X_train, label=y_train, transform=transform)
>>> # example 2:
>>> iterable_dataset_1 = mnistdataset(data_1, label_1, transform_1)
>>> iterable_dataset_2 = mnistdataset(data_2, label_2, transform_2)
>>> new_iterable_dataset = iterable_dataset_1 + iterable_dataset_2
TensorDataset¶
class tensorlayerx.dataflow.TensorDataset(*tensors)
Generate a dataset from a list of tensors. Each sample will be retrieved by indexing tensors along the first dimension.
- Parameters
*tensors (list or tuple of tensors) – tensors that have the same size in the first dimension.
Examples
With TensorLayerX
>>> import numpy as np
>>> import tensorlayerx as tlx
>>> data = np.random.random([10,224,224,3]).astype(np.float32)
>>> label = np.random.random((10,)).astype(np.int32)
>>> data = tlx.convert_to_tensor(data)
>>> label = tlx.convert_to_tensor(label)
>>> # the signature is (*tensors), so the tensors are passed as separate arguments
>>> dataset = tlx.dataflow.TensorDataset(data, label)
>>> for i in range(len(dataset)):
>>>     x, y = dataset[i]
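The indexing rule described above (sample i is item i of every tensor along the first dimension) can be sketched without any framework. The class below is a hypothetical stand-in that uses nested Python lists in place of tensors; it is not the tensorlayerx implementation.

```python
class ListTensorDataset:
    """Hypothetical list-based stand-in for a tensor dataset (illustration only)."""

    def __init__(self, *tensors):
        if not tensors:
            raise ValueError("at least one tensor is required")
        first_len = len(tensors[0])
        # Every "tensor" must have the same size along the first dimension.
        if any(len(t) != first_len for t in tensors):
            raise ValueError("tensors differ in size along the first dimension")
        self.tensors = tensors

    def __getitem__(self, index):
        # One sample is the index-th slice of each tensor.
        return tuple(t[index] for t in self.tensors)

    def __len__(self):
        return len(self.tensors[0])


features = [[1, 2], [3, 4], [5, 6]]
labels = [0, 1, 0]
toy_dataset = ListTensorDataset(features, labels)
print(len(toy_dataset))  # 3
print(toy_dataset[1])    # ([3, 4], 1)
```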
ChainDataset¶
class tensorlayerx.dataflow.ChainDataset(datasets)
A Dataset which chains multiple iterable-style datasets.
- Parameters
datasets (list or tuple) – sequence of datasets to be chained.
Examples
With TensorLayerX
>>> import numpy as np
>>> from tensorlayerx.dataflow import IterableDataset, ChainDataset
>>> class mnistdataset(IterableDataset):
>>>     def __init__(self, data, label):
>>>         self.data = data
>>>         self.label = label
>>>     def __iter__(self):
>>>         for i in range(len(self.data)):
>>>             yield self.data[i], self.label[i]
>>> train_dataset1 = mnistdataset(data=X_train1, label=y_train1)
>>> train_dataset2 = mnistdataset(data=X_train2, label=y_train2)
>>> train_dataset = ChainDataset([train_dataset1, train_dataset2])
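Chaining iterable-style datasets amounts to exhausting the first dataset, then the second, and so on. A rough pure-Python sketch of that behavior using itertools.chain follows; LabelledStream and chain_datasets are hypothetical names, and this is not the library's own code.

```python
from itertools import chain


class LabelledStream:
    """Hypothetical iterable-style dataset yielding (value, label) pairs."""

    def __init__(self, values, label):
        self.values = values
        self.label = label

    def __iter__(self):
        for value in self.values:
            yield value, self.label


def chain_datasets(datasets):
    """Iterate over each dataset in turn, in the order given."""
    return chain.from_iterable(datasets)


chained = list(chain_datasets([LabelledStream([1, 2], "a"), LabelledStream([3], "b")]))
print(chained)  # [(1, 'a'), (2, 'a'), (3, 'b')]
```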
ConcatDataset¶
class tensorlayerx.dataflow.ConcatDataset(datasets)
Concat multiple datasets into a new dataset.
- Parameters
datasets (list or tuple) – sequence of datasets to be concatenated
Examples
With TensorLayerX
>>> import numpy as np
>>> from tensorlayerx.dataflow import Dataset, ConcatDataset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label, transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __getitem__(self, index):
>>>         data = self.data[index].astype('float32')
>>>         data = self.transform(data)
>>>         label = self.label[index].astype('int64')
>>>         return data, label
>>>     def __len__(self):
>>>         return len(self.data)
>>> train_dataset1 = mnistdataset(data=X_train1, label=y_train1, transform=transform1)
>>> train_dataset2 = mnistdataset(data=X_train2, label=y_train2, transform=transform2)
>>> train_dataset = ConcatDataset([train_dataset1, train_dataset2])
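One common way to concatenate map-style datasets is to keep cumulative sizes and translate a global index into a (dataset, local index) pair. The sketch below assumes that approach; ConcatSketch is a hypothetical class, and whether tensorlayerx uses the same bookkeeping internally is not confirmed here.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical concatenation of map-style datasets (illustration only)."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals, e.g. sizes [3, 2] give cumulative sizes [3, 5].
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose cumulative size exceeds the index.
        dataset_idx = bisect_right(self.cumulative_sizes, index)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][index - offset]


combined = ConcatSketch([["a", "b", "c"], ["d", "e"]])
print(len(combined))                                 # 5
print([combined[i] for i in range(len(combined))])   # ['a', 'b', 'c', 'd', 'e']
```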
Subset¶
class tensorlayerx.dataflow.Subset(dataset, indices)
Subset of a dataset at specified indices.
- Parameters
dataset (Dataset) – the whole dataset
indices (list or tuple) – indices in the whole dataset selected for the subset
Examples
With TensorLayerX
>>> import numpy as np
>>> from tensorlayerx.dataflow import Dataset, Subset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label):
>>>         self.data = data
>>>         self.label = label
>>>     def __getitem__(self, index):
>>>         # a map-style dataset is indexed, so it needs __getitem__ and __len__
>>>         return self.data[index], self.label[index]
>>>     def __len__(self):
>>>         return len(self.data)
>>> train_dataset = mnistdataset(data=X_train, label=y_train)
>>> sub_dataset = Subset(train_dataset, indices=[1,2,3])
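Conceptually, a subset is a view that remaps position i to indices[i] in the whole dataset. A minimal sketch of that remapping, using a plain list as the underlying dataset (SubsetSketch is a hypothetical name, not the library class):

```python
class SubsetSketch:
    """Hypothetical view over a map-style dataset at the given indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, position):
        # Position within the subset -> index within the whole dataset.
        return self.dataset[self.indices[position]]

    def __len__(self):
        return len(self.indices)


whole = ["a", "b", "c", "d", "e"]
subset = SubsetSketch(whole, indices=[1, 2, 3])
print(len(subset))                               # 3
print([subset[i] for i in range(len(subset))])   # ['b', 'c', 'd']
```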
random_split¶
tensorlayerx.dataflow.random_split(dataset, lengths)
Randomly split a dataset into non-overlapping new datasets of given lengths.
- Parameters
dataset (Dataset) – dataset to be split
lengths (list or tuple) – lengths of splits to be produced
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import random_split
>>> random_split(range(10), [3, 7])
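The idea behind a random, non-overlapping split can be sketched as: shuffle all indices once, then cut the permutation into consecutive chunks of the requested lengths. The function below is a hypothetical pure-Python illustration; its seed argument is an assumption added for reproducibility and is not a documented parameter of random_split. It returns lists of items rather than dataset objects.

```python
import random


def random_split_sketch(dataset, lengths, seed=None):
    """Hypothetical illustration of a non-overlapping random split."""
    if sum(lengths) != len(dataset):
        raise ValueError("sum of lengths must equal the dataset size")
    rng = random.Random(seed)
    permutation = list(range(len(dataset)))
    rng.shuffle(permutation)

    splits, start = [], 0
    for length in lengths:
        # Each split takes the next `length` shuffled indices, so splits never overlap.
        chunk = permutation[start:start + length]
        splits.append([dataset[i] for i in chunk])
        start += length
    return splits


first, second = random_split_sketch(range(10), [3, 7], seed=0)
print(len(first), len(second))  # 3 7
```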
Sampler¶
class tensorlayerx.dataflow.Sampler
Base class for all Samplers. All subclasses should implement the following methods:
__iter__: provides a way to iterate over the indices of dataset elements.
__len__: returns the length of the returned iterator.
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import Sampler
>>> class MySampler(Sampler):
>>>     def __init__(self, data):
>>>         self.data = data
>>>     def __iter__(self):
>>>         return iter(range(len(self.data)))
>>>     def __len__(self):
>>>         return len(self.data)
BatchSampler¶
class tensorlayerx.dataflow.BatchSampler(sampler=None, batch_size=1, drop_last=False)
Wraps another sampler to yield a mini-batch of indices.
- Parameters
sampler (Sampler) – base sampler.
batch_size (int) – size of mini-batch
drop_last (bool) – if True, the sampler will drop the last batch if its size would be less than batch_size
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import BatchSampler, SequentialSampler
>>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False))
>>> #[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
>>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True))
>>> #[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
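The grouping behavior shown in the example above can be reproduced with a short generator. This is a sketch of the technique under the assumption that the base sampler is any iterable of indices; batch_sampler_sketch is a hypothetical name, not the library implementation.

```python
def batch_sampler_sketch(sampler, batch_size=1, drop_last=False):
    """Hypothetical generator that groups indices from `sampler` into batches."""
    batch = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A leftover batch is smaller than batch_size; emit it unless drop_last is set.
    if batch and not drop_last:
        yield batch


kept = list(batch_sampler_sketch(range(10), batch_size=3, drop_last=False))
dropped = list(batch_sampler_sketch(range(10), batch_size=3, drop_last=True))
print(kept)     # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(dropped)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```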
RandomSampler¶
class tensorlayerx.dataflow.RandomSampler(data, replacement=False, num_samples=None, generator=None)
Samples elements randomly. If without replacement, then sample from a shuffled dataset. If with replacement, then the user can specify num_samples to draw.
- Parameters
data (Dataset) – dataset to sample
replacement (bool) – samples are drawn on-demand with replacement if True, default=False
num_samples (int) – number of samples to draw, default=len(dataset). This argument is supposed to be specified only when replacement is True.
generator (Generator) – generator used in sampling. Default is None.
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import RandomSampler, Dataset
>>> import numpy as np
>>> class mydataset(Dataset):
>>>     def __init__(self):
>>>         self.data = [np.random.random((224,224,3)) for i in range(100)]
>>>         self.label = [np.random.randint(1, 10, (1,)) for i in range(100)]
>>>     def __getitem__(self, item):
>>>         x = self.data[item]
>>>         y = self.label[item]
>>>         return x, y
>>>     def __len__(self):
>>>         return len(self.data)
>>> sampler = RandomSampler(data = mydataset())
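The two modes described above differ in what they guarantee: without replacement every index appears exactly once (a shuffle), while with replacement indices are drawn independently and may repeat. A pure-Python sketch of both modes follows; random_sampler_sketch and its seed argument are hypothetical and stand in for the library's generator handling.

```python
import random


def random_sampler_sketch(num_items, replacement=False, num_samples=None, seed=None):
    """Hypothetical illustration of random index sampling over range(num_items)."""
    rng = random.Random(seed)
    if replacement:
        # Independent draws; the number of draws defaults to the dataset size.
        count = num_items if num_samples is None else num_samples
        return [rng.randrange(num_items) for _ in range(count)]
    # Without replacement: a random permutation of all indices.
    order = list(range(num_items))
    rng.shuffle(order)
    return order


permutation = random_sampler_sketch(5, seed=0)
with_repeats = random_sampler_sketch(5, replacement=True, num_samples=8, seed=0)
print(sorted(permutation))  # [0, 1, 2, 3, 4]
print(len(with_repeats))    # 8
```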
SequentialSampler¶
class tensorlayerx.dataflow.SequentialSampler(data)
Samples elements sequentially, always in the same order.
- Parameters
data (Dataset) – dataset to sample
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import SequentialSampler, Dataset
>>> import numpy as np
>>> class mydataset(Dataset):
>>>     def __init__(self):
>>>         self.data = [np.random.random((224,224,3)) for i in range(100)]
>>>         self.label = [np.random.randint(1, 10, (1,)) for i in range(100)]
>>>     def __getitem__(self, item):
>>>         x = self.data[item]
>>>         y = self.label[item]
>>>         return x, y
>>>     def __len__(self):
>>>         return len(self.data)
>>> sampler = SequentialSampler(data = mydataset())
WeightedRandomSampler¶
class tensorlayerx.dataflow.WeightedRandomSampler(weights, num_samples, replacement=True)
Samples elements from [0,..,len(weights)-1] with given probabilities (weights).
- Parameters
weights (list or tuple) – a sequence of weights, not necessarily summing to one
num_samples (int) – number of samples to draw
replacement (bool) – if True, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.
Examples
With TensorLayerX
>>> from tensorlayerx.dataflow import WeightedRandomSampler, Dataset
>>> import numpy as np
>>> sampler = list(WeightedRandomSampler(weights=[0.2,0.3,0.4,0.5,4.0], num_samples=5, replacement=True))
>>> #[4, 4, 1, 4, 4]
>>> sampler = list(WeightedRandomSampler(weights=[0.2,0.3,0.4,0.5,0.6], num_samples=5, replacement=False))
>>> #[4, 1, 3, 0, 2]
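Weighted sampling of indices can be sketched with the standard library's random.choices, which accepts unnormalized weights. The without-replacement branch below removes each drawn index before the next draw; this is one simple way to get that behavior and is not necessarily how the library does it. The function name and the seed argument are hypothetical.

```python
import random


def weighted_sampler_sketch(weights, num_samples, replacement=True, seed=None):
    """Hypothetical illustration of weighted index sampling over range(len(weights))."""
    rng = random.Random(seed)
    if replacement:
        return rng.choices(range(len(weights)), weights=weights, k=num_samples)
    if num_samples > len(weights):
        raise ValueError("cannot draw more samples than indices without replacement")

    remaining = list(range(len(weights)))
    remaining_weights = list(weights)
    drawn = []
    for _ in range(num_samples):
        # Pick a position among the remaining indices, proportional to its weight.
        pick = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        drawn.append(remaining.pop(pick))
        remaining_weights.pop(pick)
    return drawn


with_replacement = weighted_sampler_sketch([0.2, 0.3, 0.4, 0.5, 4.0], 5, seed=0)
without_replacement = weighted_sampler_sketch(
    [0.2, 0.3, 0.4, 0.5, 0.6], 5, replacement=False, seed=0
)
print(len(with_replacement))        # 5
print(sorted(without_replacement))  # [0, 1, 2, 3, 4]
```

With a dominant weight such as 4.0, the with-replacement draw is expected to return index 4 most of the time, which matches the flavor of the example output above.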