API - Optimizers
TensorLayer provides rich layer implementations tailored for various benchmarks and domain-specific problems. In addition, we also support transparent access to native TensorFlow parameters. For example, we provide not only layers for local response normalization, but also layers that allow users to apply tf.nn.lrn on network.outputs. More functions can be found in the TensorFlow API.
TensorLayerX provides a simple API and tools to ease research and development and to reduce the time to production. To that end, it provides state-of-the-art optimizers that work with TensorFlow, MindSpore, PaddlePaddle and PyTorch. The optimizer functions provided by these frameworks can be used in TensorLayerX directly, and wrapped versions for each framework can be found in tensorlayerx.optimizers. In addition, TensorLayerX provides dynamic learning rate schedulers that work with the same four frameworks.
Optimizers List
Adadelta | Optimizer that implements the Adadelta algorithm.
Adagrad | Optimizer that implements the Adagrad algorithm.
Adam | Optimizer that implements the Adam algorithm.
Adamax | Optimizer that implements the Adamax algorithm.
Ftrl | Optimizer that implements the FTRL algorithm.
Nadam | Optimizer that implements the NAdam algorithm.
RMSprop | Optimizer that implements the RMSprop algorithm.
SGD | Gradient descent (with momentum) optimizer.
Momentum | Optimizer that implements the Momentum algorithm.
Lamb | Optimizer that implements the Layer-wise Adaptive Moments (LAMB).
LARS | LARS is an optimization algorithm employing a large batch optimization technique.
Optimizers Dynamic Learning Rate List
LRScheduler | LRScheduler base class.
StepDecay | Update the learning rate of optimizer by gamma every step_size epochs.
CosineAnnealingDecay | Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate.
NoamDecay | Applies Noam decay to the initial learning rate.
PiecewiseDecay | Piecewise learning rate scheduler.
NaturalExpDecay | Applies natural exponential decay to the initial learning rate.
InverseTimeDecay | Applies inverse time decay to the initial learning rate.
PolynomialDecay | Applies polynomial decay to the initial learning rate.
LinearWarmup | Linear learning rate warm-up strategy.
ExponentialDecay | Update learning rate by gamma each epoch.
MultiStepDecay | Update the learning rate by gamma once epoch reaches one of the milestones.
LambdaDecay | Sets the learning rate of optimizer by function lr_lambda.
ReduceOnPlateau | Reduce learning rate when metrics has stopped descending.
Adadelta
class tensorlayerx.optimizers.Adadelta(learning_rate=0.001, rho=0.95, epsilon=1e-07, *args, **kwargs)
Optimizer that implements the Adadelta algorithm. Equivalent to tf.optimizers.Adadelta.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
rho (float or constant float tensor) – The decay rate. Defaults to 0.95.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adadelta(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
Adagrad
class tensorlayerx.optimizers.Adagrad(learning_rate=0.001, rho=0.95, epsilon=1e-07, *args, **kwargs)
Optimizer that implements the Adagrad algorithm. Equivalent to tf.optimizers.Adagrad.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
initial_accumulator_value (float) – Starting value for the accumulators (per-parameter momentum values). Must be non-negative. Defaults to 0.95.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adagrad(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
Adam
class tensorlayerx.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, *args, **kwargs)
Optimizer that implements the Adam algorithm. Equivalent to tf.optimizers.Adam.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adam(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
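To make the roles of beta_1, beta_2 and epsilon concrete, the Adam update can be sketched in plain Python for a single scalar weight. This is a textbook illustration of the algorithm, not TensorLayerX's implementation; the helper name adam_step is hypothetical, and its defaults simply mirror the signature above.

```python
import math


def adam_step(w, g, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update for a scalar weight w with gradient g at step t (t >= 1).

    Hypothetical helper for illustration only; not part of tensorlayerx.
    """
    m = beta_1 * m + (1.0 - beta_1) * g        # 1st moment estimate
    v = beta_2 * v + (1.0 - beta_2) * g * g    # 2nd moment estimate
    m_hat = m / (1.0 - beta_1 ** t)            # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v


# Minimise f(w) = w**2, whose gradient is 2*w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=0.01)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is close to 1, so the weight moves by roughly lr regardless of the gradient's scale.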
Adamax
class tensorlayerx.optimizers.Adamax(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, *args, **kwargs)
Optimizer that implements the Adamax algorithm. Equivalent to tf.optimizers.Adamax.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the exponentially weighted infinity norm. Defaults to 0.999.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adamax(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
Ftrl
class tensorlayerx.optimizers.Ftrl(learning_rate=0.001, learning_rate_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, beta=0.0, l2_shrinkage_regularization_strength=0.0, **kwargs)
Optimizer that implements the FTRL algorithm. Equivalent to tf.optimizers.Ftrl.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
learning_rate_power (float) – Controls how the learning rate decreases during training. Use zero for a fixed learning rate.
initial_accumulator_value (float) – The starting value for accumulators. Only zero or positive values are allowed.
l1_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.
l2_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.
l2_shrinkage_regularization_strength (float) – This differs from l2_regularization_strength above: that L2 term is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. When the input is sparse, shrinkage only happens on the active weights. Defaults to 0.0.
beta (float) – A float value, representing the beta value from the paper. Defaults to 0.0.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Ftrl(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
Nadam
class tensorlayerx.optimizers.Nadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, *args, **kwargs)
Optimizer that implements the NAdam algorithm. Equivalent to tf.optimizers.Nadam.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Nadam(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
RMSprop
class tensorlayerx.optimizers.RMSprop(learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-07, centered=False, *args, **kwargs)
Optimizer that implements the RMSprop algorithm. Equivalent to tf.optimizers.RMSprop.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
rho (float) – Discounting factor for the history/coming gradient. Defaults to 0.9.
momentum (float) – A scalar or a scalar Tensor. Defaults to 0.0.
epsilon (float) – A small constant for numerical stability. Defaults to 1e-7.
centered (bool) – If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.RMSprop(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
SGD
class tensorlayerx.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False, *args, **kwargs)
Gradient descent (with momentum) optimizer. Equivalent to tf.optimizers.SGD.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate. Defaults to 0.01.
momentum (float) – A float hyperparameter >= 0 that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0, i.e., vanilla gradient descent.
nesterov (bool) – Whether to apply Nesterov momentum. Defaults to False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.SGD(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
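The effect of momentum and nesterov can be sketched for a single scalar weight. The rule below follows the update documented for tf.keras.optimizers.SGD, which this class is stated to be equivalent to; the helper name sgd_step is hypothetical and the code is an illustration, not the library's implementation.

```python
def sgd_step(w, g, velocity, lr=0.01, momentum=0.0, nesterov=False):
    """One SGD update for a scalar weight w with gradient g.

    Hypothetical helper for illustration only; not part of tensorlayerx.
    """
    velocity = momentum * velocity - lr * g
    if nesterov:
        # Look ahead along the freshly updated velocity.
        w = w + momentum * velocity - lr * g
    else:
        w = w + velocity
    return w, velocity


# Two plain-momentum steps on f(w) = w**2 (gradient 2*w).
w, vel = 1.0, 0.0
w, vel = sgd_step(w, 2.0 * w, vel, lr=0.1, momentum=0.9)
w, vel = sgd_step(w, 2.0 * w, vel, lr=0.1, momentum=0.9)
```

With momentum=0.0 the velocity reduces to -lr * g, which is vanilla gradient descent.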
Momentum
class tensorlayerx.optimizers.Momentum(learning_rate, momentum=0.0, *args, **kwargs)
Optimizer that implements the Momentum algorithm. Equivalent to tf.compat.v1.train.MomentumOptimizer.
Parameters
learning_rate (A Tensor, floating point value) – The learning rate (required; the signature defines no default).
momentum (float) – A Tensor or a floating point value. The momentum. Defaults to 0.
use_locking (bool) – If True, use locks for update operations.
use_nesterov (bool) – If True, use Nesterov Momentum. See (Sutskever et al., 2013). This implementation always computes gradients at the value of the variable(s) passed to the optimizer. Using Nesterov Momentum makes the variable(s) track the values called theta_t + mu*v_t in the paper. This implementation is an approximation of the original formula, valid for high values of momentum. It computes the "adjusted gradient" in NAG by assuming that the new gradient will be estimated by the current average gradient plus the product of momentum and the change in the average gradient.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Momentum(0.0001)
>>> # grad and train_weights are the gradients and trainable weights from your training step
>>> optimizer.apply_gradients(zip(grad, train_weights))
Lamb
LARS
LRScheduler
class tensorlayerx.optimizers.lr.LRScheduler(learning_rate=0.1, last_epoch=-1, verbose=False)
LRScheduler base class. Defines the common interface of a learning rate scheduler.
To write a custom scheduler, import the base class with from tensorlayerx.optimizers.lr import LRScheduler, subclass it, and override get_lr() with your own implementation.
Parameters
learning_rate (A floating point value) – The learning rate. Defaults to 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> # Here is an example of a simple StepDecay implementation.
>>> import tensorlayerx as tlx
>>> from tensorlayerx.optimizers.lr import LRScheduler
>>> class StepDecay(LRScheduler):
...     def __init__(self, learning_rate, step_size, gamma=0.1, last_epoch=-1, verbose=False):
...         if not isinstance(step_size, int):
...             raise TypeError("The type of 'step_size' must be 'int', but received %s." % type(step_size))
...         if gamma >= 1.0:
...             raise ValueError('gamma should be < 1.0.')
...         self.step_size = step_size
...         self.gamma = gamma
...         super(StepDecay, self).__init__(learning_rate, last_epoch, verbose)
...
...     def get_lr(self):
...         i = self.last_epoch // self.step_size
...         return self.base_lr * (self.gamma ** i)
StepDecay
class tensorlayerx.optimizers.lr.StepDecay(learning_rate, step_size, gamma=0.1, last_epoch=-1, verbose=False)
Update the learning rate of optimizer by gamma every step_size epochs.
\[new\_learning\_rate = learning\_rate * gamma^{\lfloor epoch / step\_size \rfloor}\]
Parameters
learning_rate (float) – The learning rate.
step_size (int) – the interval to update.
gamma (float) – The ratio by which the learning rate is reduced: new_lr = origin_lr * gamma. It should be less than 1.0. Default: 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.StepDecay(learning_rate=0.1, step_size=10, gamma=0.1, last_epoch=-1, verbose=False)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for batch in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each batch
...     # scheduler.step()  # if you update the learning rate each epoch
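The StepDecay formula can be checked without any framework. The following is a minimal sketch that evaluates it directly; step_decay_lr is a hypothetical helper name, not a TensorLayerX function.

```python
def step_decay_lr(learning_rate, step_size, gamma, epoch):
    """Evaluate learning_rate * gamma ** (epoch // step_size)."""
    return learning_rate * gamma ** (epoch // step_size)


# The rate stays constant within each block of step_size epochs.
schedule = {epoch: step_decay_lr(0.1, 10, 0.1, epoch) for epoch in (0, 9, 10, 25)}
```

Here epochs 0 and 9 share the initial rate, epoch 10 uses one tenth of it, and epoch 25 one hundredth.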
CosineAnnealingDecay
class tensorlayerx.optimizers.lr.CosineAnnealingDecay(learning_rate, T_max, eta_min=0, last_epoch=-1, verbose=False)
Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate. \(T_{cur}\) is the number of epochs since the last restart in SGDR.
\[\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}\]
Parameters
learning_rate (float or int) – The initial learning rate, that is \(\eta_{max}\) . It can be set to python float or int number.
T_max (int) – Maximum number of iterations. It is half of the decay cycle of learning rate.
eta_min (float or int) – Minimum learning rate, that is \(\eta_{min}\) . Default: 0.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.CosineAnnealingDecay(learning_rate=0.1, T_max=10, eta_min=0, last_epoch=-1, verbose=False)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
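As a quick sanity check of the cosine schedule, the sketch below evaluates only the closed-form first branch of the formula above; it does not reproduce the recursive branch or the scheduler's internal state. The name cosine_annealing_lr is hypothetical.

```python
import math


def cosine_annealing_lr(eta_max, eta_min, T_max, T_cur):
    """Closed-form cosine annealing between eta_max (T_cur = 0) and eta_min (T_cur = T_max)."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * T_cur / T_max))


start = cosine_annealing_lr(0.1, 0.0, 10, 0)    # equals eta_max
middle = cosine_annealing_lr(0.1, 0.0, 10, 5)   # halfway between the two bounds
end = cosine_annealing_lr(0.1, 0.0, 10, 10)     # equals eta_min
```

Because T_max is half of the decay cycle, the rate falls from eta_max to eta_min over T_max epochs.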
NoamDecay
class tensorlayerx.optimizers.lr.NoamDecay(d_model, warmup_steps, learning_rate=1.0, last_epoch=-1, verbose=False)
Applies Noam decay to the initial learning rate.
\[new\_learning\_rate = learning\_rate * d_{model}^{-0.5} * \min(epoch^{-0.5}, epoch * warmup\_steps^{-1.5})\]
References
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/NoamDecay_cn.html
Attention Is All You Need: https://arxiv.org/pdf/1706.03762.pdf
Parameters
d_model (int) – The dimensionality of input and output feature vector of model. It is a python int number.
warmup_steps (int) – The number of warm-up steps. A hyperparameter. It is a python int number.
learning_rate (float) – The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NoamDecay(d_model=512, warmup_steps=100, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
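The Noam formula rises during warm-up and decays afterwards, peaking at epoch = warmup_steps. The sketch below evaluates the formula as written above; noam_lr is a hypothetical helper, and epoch is assumed to start at 1, since the formula is undefined at 0.

```python
def noam_lr(d_model, warmup_steps, epoch, learning_rate=1.0):
    """Evaluate the Noam decay formula for epoch >= 1."""
    return learning_rate * d_model ** -0.5 * min(epoch ** -0.5, epoch * warmup_steps ** -1.5)


warming_up = noam_lr(512, 100, 50)    # linear growth phase
peak = noam_lr(512, 100, 100)         # both arguments of min() coincide here
decaying = noam_lr(512, 100, 400)     # inverse square-root decay phase
```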
PiecewiseDecay
class tensorlayerx.optimizers.lr.PiecewiseDecay(boundaries, values, last_epoch=-1, verbose=False)
Piecewise learning rate scheduler. The algorithm can be described as the code below.
    boundaries = [100, 200]
    values = [1.0, 0.5, 0.1]
    if epoch < 100:
        learning_rate = 1.0
    elif 100 <= epoch < 200:
        learning_rate = 0.5
    else:
        learning_rate = 0.1
References
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/PiecewiseDecay_cn.html
Parameters
boundaries (list) – A list of epoch numbers at which the learning rate changes.
values (list) – A list of learning rate values, one for each interval defined by boundaries, so it has one more element than boundaries.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PiecewiseDecay(boundaries=[100, 200], values=[0.1, 0.5, 0.1], verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
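The piecewise rule amounts to a sorted lookup. The sketch below generalizes the if/elif description above to any number of boundaries using the standard library's bisect module; piecewise_lr is a hypothetical helper, not the library's code.

```python
from bisect import bisect_right


def piecewise_lr(boundaries, values, epoch):
    """Pick values[i], where i is the number of boundaries that are <= epoch."""
    return values[bisect_right(boundaries, epoch)]


boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
picked = [piecewise_lr(boundaries, values, e) for e in (0, 99, 100, 199, 200)]
```

bisect_right places an epoch equal to a boundary in the later interval, matching the 100 <= epoch < 200 condition in the description.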
NaturalExpDecay
class tensorlayerx.optimizers.lr.NaturalExpDecay(learning_rate, gamma, last_epoch=-1, verbose=False)
Applies natural exponential decay to the initial learning rate.
\[new\_learning\_rate = learning\_rate * e^{- gamma * epoch}\]
Parameters
learning_rate (float) – The initial learning rate.
gamma (float) – The decay rate used in the exponent (required; the signature defines no default).
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NaturalExpDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
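For intuition about how fast this schedule falls, the formula can be evaluated directly. This is a minimal sketch with a hypothetical helper name, natural_exp_lr.

```python
import math


def natural_exp_lr(learning_rate, gamma, epoch):
    """Evaluate learning_rate * e ** (-gamma * epoch)."""
    return learning_rate * math.exp(-gamma * epoch)


# With gamma = 0.1 the rate shrinks by a factor of e every 10 epochs.
lr_0 = natural_exp_lr(0.1, 0.1, 0)
lr_10 = natural_exp_lr(0.1, 0.1, 10)
```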
InverseTimeDecay
class tensorlayerx.optimizers.lr.InverseTimeDecay(learning_rate, gamma, last_epoch=-1, verbose=False)
Applies inverse time decay to the initial learning rate.
\[new\_learning\_rate = \frac{learning\_rate}{1 + gamma * epoch}\]
Parameters
learning_rate (float) – The initial learning rate.
gamma (float) – The decay rate that scales epoch in the denominator (required; the signature defines no default).
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.InverseTimeDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
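Inverse time decay falls off hyperbolically rather than exponentially, so it shrinks more slowly in the long run. The sketch below evaluates the formula directly; inverse_time_lr is a hypothetical helper name.

```python
def inverse_time_lr(learning_rate, gamma, epoch):
    """Evaluate learning_rate / (1 + gamma * epoch)."""
    return learning_rate / (1.0 + gamma * epoch)


# With gamma = 0.1 the rate is halved after 10 epochs and divided by 11 after 100.
lr_10 = inverse_time_lr(0.1, 0.1, 10)
lr_100 = inverse_time_lr(0.1, 0.1, 100)
```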
PolynomialDecay
class tensorlayerx.optimizers.lr.PolynomialDecay(learning_rate, decay_steps, end_lr=0.0001, power=1.0, cycle=False, last_epoch=-1, verbose=False)
Applies polynomial decay to the initial learning rate.
If cycle is set to True, then:
\[\begin{aligned} decay\_steps & = decay\_steps * \left\lceil \frac{epoch}{decay\_steps} \right\rceil \\ new\_learning\_rate & = (learning\_rate - end\_lr) * \left(1 - \frac{epoch}{decay\_steps}\right)^{power} + end\_lr \end{aligned}\]
If cycle is set to False, then:
\[\begin{aligned} epoch & = \min(epoch, decay\_steps) \\ new\_learning\_rate & = (learning\_rate - end\_lr) * \left(1 - \frac{epoch}{decay\_steps}\right)^{power} + end\_lr \end{aligned}\]
Parameters
learning_rate (float) – The initial learning rate.
decay_steps (int) – The decay step size. It determines the decay cycle.
end_lr (float) – The minimum final learning rate. Default: 0.0001.
power (float) – Power of polynomial. Default: 1.0.
cycle (bool) – Whether the learning rate rises again. If True, the learning rate rises again once it has decreased to end_lr. If False, the learning rate decreases monotonically. Default: False.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PolynomialDecay(learning_rate=0.1, decay_steps=50, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
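The two branches of the polynomial schedule can be compared with a small sketch. polynomial_decay_lr is a hypothetical helper that follows the formulas above; the max(1, ...) guard for epoch 0 in the cycle branch is an assumption added here to avoid a zero horizon, and is not stated in this page.

```python
import math


def polynomial_decay_lr(learning_rate, decay_steps, epoch,
                        end_lr=0.0001, power=1.0, cycle=False):
    """Evaluate the polynomial decay formula for a given epoch."""
    if cycle:
        # Stretch the horizon to the next multiple of decay_steps.
        # Assumption: use at least one full cycle so epoch 0 does not divide by zero.
        decay_steps = decay_steps * max(1, math.ceil(epoch / decay_steps))
    else:
        # Clamp, so the rate stays at end_lr once decay_steps is reached.
        epoch = min(epoch, decay_steps)
    return (learning_rate - end_lr) * (1.0 - epoch / decay_steps) ** power + end_lr


halfway = polynomial_decay_lr(0.1, 50, 25)                 # no cycle, mid-decay
clamped = polynomial_decay_lr(0.1, 50, 75)                 # no cycle, past decay_steps
restarted = polynomial_decay_lr(0.1, 50, 75, cycle=True)   # horizon stretched to 100
```

With cycle=True the rate at epoch 75 is higher than end_lr because the decay horizon has been extended to 100 epochs.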
LinearWarmup
class tensorlayerx.optimizers.lr.LinearWarmup(learning_rate, warmup_steps, start_lr, end_lr, last_epoch=-1, verbose=False)
Linear learning rate warm-up strategy. Updates the learning rate preliminarily, before the normal learning rate scheduler takes over.
When epoch < warmup_steps, the learning rate is updated as:
\[lr = start\_lr + (end\_lr - start\_lr) * \frac{epoch}{warmup\_steps}\]
where start_lr is the initial learning rate and end_lr is the final learning rate of the warm-up.
When epoch >= warmup_steps, the learning rate is updated as:
\[lr = learning\_rate\]
where learning_rate is a float or any subclass of LRScheduler.
References
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/LinearWarmup_cn.html
Bag of Tricks for Image Classification with Convolutional Neural Networks
Parameters
learning_rate (float or LRScheduler) – The learning rate used after warm-up. It is a float or any subclass of LRScheduler.
warmup_steps (int) – Total number of warm-up steps.
start_lr (float) – Initial learning rate of the warm-up.
end_lr (float) – Final learning rate of the warm-up.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LinearWarmup(learning_rate=0.1, warmup_steps=20, start_lr=0.0, end_lr=0.5, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
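The two phases of the warm-up can be sketched as a single function. This simplified illustration assumes learning_rate is a plain float (the wrapped-scheduler case is not covered), and linear_warmup_lr is a hypothetical helper name.

```python
def linear_warmup_lr(learning_rate, warmup_steps, start_lr, end_lr, epoch):
    """Linear ramp from start_lr towards end_lr, then the post-warm-up learning_rate."""
    if epoch < warmup_steps:
        return start_lr + (end_lr - start_lr) * epoch / warmup_steps
    return learning_rate


# Same settings as the example above: ramp 0.0 -> 0.5 over 20 epochs, then 0.1.
ramp = [linear_warmup_lr(0.1, 20, 0.0, 0.5, e) for e in (0, 10, 19)]
after = linear_warmup_lr(0.1, 20, 0.0, 0.5, 20)
```

Note that with these settings the rate drops from just under end_lr to learning_rate when the warm-up ends, because end_lr and learning_rate differ.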
ExponentialDecay
class tensorlayerx.optimizers.lr.ExponentialDecay(learning_rate, gamma, last_epoch=-1, verbose=False)
Update learning rate by gamma each epoch.
The learning rate is updated as:
\[new\_learning\_rate = last\_learning\_rate * gamma\]
Parameters
learning_rate (float) – The initial learning rate.
gamma (float) – The ratio by which the learning rate is reduced each epoch. It should be less than 1.0 (required; the signature defines no default).
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ExponentialDecay(learning_rate=0.1, gamma=0.9, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
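Applying the recurrence new = last * gamma repeatedly gives learning_rate * gamma ** epoch. The sketch below checks that equivalence in plain Python; exponential_decay_lr is a hypothetical helper name.

```python
def exponential_decay_lr(learning_rate, gamma, epoch):
    """Closed form of multiplying the rate by gamma once per epoch."""
    return learning_rate * gamma ** epoch


# Unroll the recurrence for three epochs and compare with the closed form.
lr = 0.1
for _ in range(3):
    lr = lr * 0.9
closed_form = exponential_decay_lr(0.1, 0.9, 3)
```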
MultiStepDecay
class tensorlayerx.optimizers.lr.MultiStepDecay(learning_rate, milestones, gamma=0.1, last_epoch=-1, verbose=False)
Update the learning rate by gamma once epoch reaches one of the milestones. The algorithm can be described as the code below.
    learning_rate = 0.1
    milestones = [50, 100]
    gamma = 0.1
    if epoch < 50:
        learning_rate = 0.1
    elif epoch < 100:
        learning_rate = 0.01
    else:
        learning_rate = 0.001
References
https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/MultiStepDecay_cn.html
Parameters
learning_rate (float) – The initial learning rate.
milestones (list) – List or tuple of epoch boundaries. Must be increasing.
gamma (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.MultiStepDecay(learning_rate=0.1, milestones=[50, 100], gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
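The milestone rule is equivalent to multiplying by gamma once for every milestone already reached. The sketch below expresses that with bisect; multi_step_lr is a hypothetical helper and only illustrates the description above.

```python
from bisect import bisect_right


def multi_step_lr(learning_rate, milestones, gamma, epoch):
    """Multiply learning_rate by gamma once per milestone that is <= epoch."""
    return learning_rate * gamma ** bisect_right(milestones, epoch)


rates = [multi_step_lr(0.1, [50, 100], 0.1, e) for e in (0, 49, 50, 99, 100)]
```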
LambdaDecay
class tensorlayerx.optimizers.lr.LambdaDecay(learning_rate, lr_lambda, last_epoch=-1, verbose=False)
Sets the learning rate of optimizer by function lr_lambda. lr_lambda is a function which receives epoch.
The algorithm can be described as the code below.
    learning_rate = 0.5                       # init learning_rate
    lr_lambda = lambda epoch: 0.95 ** epoch
    learning_rate = 0.5                       # epoch 0, 0.5*0.95**0
    learning_rate = 0.475                     # epoch 1, 0.5*0.95**1
    learning_rate = 0.45125                   # epoch 2, 0.5*0.95**2
Parameters
learning_rate (float) – The initial learning rate.
lr_lambda (function) – A function which computes a factor from epoch; the initial learning rate is then multiplied by this factor.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LambdaDecay(learning_rate=0.1, lr_lambda=lambda x: 0.9 ** x, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()  # if you update the learning rate each epoch
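Because the factor is always applied to the initial learning rate rather than to the previous one, the schedule can be reproduced with one multiplication per epoch. The sketch below recomputes the three values listed in the description; lambda_decay_lr is a hypothetical helper name.

```python
def lambda_decay_lr(learning_rate, lr_lambda, epoch):
    """Scale the initial learning rate by the factor lr_lambda(epoch)."""
    return learning_rate * lr_lambda(epoch)


rates = [lambda_decay_lr(0.5, lambda epoch: 0.95 ** epoch, e) for e in range(3)]
```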
ReduceOnPlateau
class tensorlayerx.optimizers.lr.ReduceOnPlateau(learning_rate, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, epsilon=1e-08, verbose=False)
Reduce the learning rate when metrics has stopped descending. Models often benefit from reducing the learning rate by a factor of 2 to 10 once model performance no longer improves.
metrics is the value passed into step; it must be a 1-D Tensor with shape [1]. When metrics stops descending for patience epochs, the learning rate is reduced to learning_rate * factor. (mode can also be set to 'max'; in that case, the learning rate is reduced when metrics stops ascending for patience epochs.)
In addition, after each reduction the scheduler waits cooldown epochs before resuming the above operation.
Parameters
learning_rate (float) – The initial learning rate.
mode (str) – 'min' or 'max'. Normally it is 'min', which means that the learning rate is reduced when loss stops descending. If it is set to 'max', the learning rate is reduced when loss stops ascending. Default: 'min'.
factor (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.
patience (int) – When loss doesn't improve for this number of epochs, the learning rate is reduced. Default: 10.
threshold (float) – threshold and threshold_mode determine the minimum change of loss, so that tiny changes of loss are ignored. Default: 1e-4.
threshold_mode (str) – 'rel' or 'abs'. In 'rel' mode, the minimum change of loss is last_loss * threshold, where last_loss is loss in the last epoch. In 'abs' mode, the minimum change of loss is threshold. Default: 'rel'.
cooldown (int) – The number of epochs to wait before resuming normal operation. Default: 0.
min_lr (float) – The lower bound of the learning rate after reduction. Default: 0.
epsilon (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than epsilon, the update is ignored. Default: 1e-8.
verbose (bool) – If True, prints a message to stdout for each update. Default: False.
Examples
With TensorLayerX
>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
>>> sgd = tlx.optimizers.SGD(learning_rate=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model and compute loss
...         scheduler.step(loss)  # pass the monitored metric; if you update the learning rate each step
...     # scheduler.step(loss)  # if you update the learning rate each epoch
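The interplay of patience, threshold, cooldown and epsilon is easier to follow in code. The class below is a plain-Python sketch of the behaviour described above, restricted to mode='min' and threshold_mode='rel'. It is not the library's implementation: PlateauSketch is a hypothetical name, comparing against the best value seen so far is an assumption, and so is the choice to reduce only once the stagnant-epoch count exceeds patience.

```python
class PlateauSketch:
    """Sketch of reduce-on-plateau logic ('min' mode, 'rel' threshold only)."""

    def __init__(self, learning_rate, factor=0.1, patience=10, threshold=1e-4,
                 cooldown=0, min_lr=0.0, epsilon=1e-8):
        self.lr = learning_rate
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.cooldown = cooldown
        self.min_lr = min_lr
        self.epsilon = epsilon
        self.best = None            # best metric seen so far (assumption)
        self.num_bad_epochs = 0     # epochs without a significant improvement
        self.cooldown_counter = 0   # epochs left before monitoring resumes

    def step(self, metric):
        if self.cooldown_counter > 0:
            self.cooldown_counter -= 1
            return self.lr
        # 'rel' mode: an improvement must beat best by at least best * threshold.
        if self.best is None or metric < self.best - abs(self.best) * self.threshold:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            new_lr = max(self.lr * self.factor, self.min_lr)
            if self.lr - new_lr > self.epsilon:  # ignore negligible reductions
                self.lr = new_lr
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
        return self.lr


sched = PlateauSketch(learning_rate=1.0, factor=0.5, patience=2)
history = [sched.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.9]]
```

In this run the loss improves once and then stalls; the rate is halved on the third stagnant epoch, that is, once the count exceeds patience=2.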