Shortcuts

torch.nn.functional

Convolution functions

conv1d

Applies a 1D convolution over an input signal composed of several input planes.

conv2d

Applies a 2D convolution over an input image composed of several input planes.

conv3d

Applies a 3D convolution over an input image composed of several input planes.

conv_transpose1d

Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".

conv_transpose2d

Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

conv_transpose3d

Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution"

unfold

Extracts sliding local blocks from a batched input tensor.

fold

Combines an array of sliding local blocks into a large containing tensor.

Pooling functions

avg_pool1d

Applies a 1D average pooling over an input signal composed of several input planes.

avg_pool2d

Applies 2D average-pooling operation in kH×kWkH \times kW regions by step size sH×sWsH \times sW steps.

avg_pool3d

Applies 3D average-pooling operation in kT×kH×kWkT \times kH \times kW regions by step size sT×sH×sWsT \times sH \times sW steps.

max_pool1d

Applies a 1D max pooling over an input signal composed of several input planes.

max_pool2d

Applies a 2D max pooling over an input signal composed of several input planes.

max_pool3d

Applies a 3D max pooling over an input signal composed of several input planes.

max_unpool1d

Computes a partial inverse of MaxPool1d.

max_unpool2d

Computes a partial inverse of MaxPool2d.

max_unpool3d

Computes a partial inverse of MaxPool3d.

lp_pool1d

Applies a 1D power-average pooling over an input signal composed of several input planes.

lp_pool2d

Applies a 2D power-average pooling over an input signal composed of several input planes.

adaptive_max_pool1d

Applies a 1D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool2d

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool3d

Applies a 3D adaptive max pooling over an input signal composed of several input planes.

adaptive_avg_pool1d

Applies a 1D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool2d

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool3d

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

fractional_max_pool2d

Applies 2D fractional max pooling over an input signal composed of several input planes.

fractional_max_pool3d

Applies 3D fractional max pooling over an input signal composed of several input planes.

Attention Mechanisms

scaled_dot_product_attention

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified.

Non-linear activation functions

threshold

Thresholds each element of the input Tensor.

threshold_

In-place version of threshold().

relu

Applies the rectified linear unit function element-wise.

relu_

In-place version of relu().

hardtanh

Applies the HardTanh function element-wise.

hardtanh_

In-place version of hardtanh().

hardswish

Applies the hardswish function, element-wise, as described in the paper:

relu6

Applies the element-wise function ReLU6(x)=min(max(0,x),6)\text{ReLU6}(x) = \min(\max(0,x), 6).

elu

Applies the Exponential Linear Unit (ELU) function element-wise.

elu_

In-place version of elu().

selu

Applies element-wise, SELU(x)=scale(max(0,x)+min(0,α(exp(x)1)))\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1))), with α=1.6732632423543772848170429916717\alpha=1.6732632423543772848170429916717 and scale=1.0507009873554804934193349852946scale=1.0507009873554804934193349852946.

celu

Applies element-wise, CELU(x)=max(0,x)+min(0,α(exp(x/α)1))\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1)).

leaky_relu

Applies element-wise, LeakyReLU(x)=max(0,x)+negative_slopemin(0,x)\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)

leaky_relu_

In-place version of leaky_relu().

prelu

Applies element-wise the function PReLU(x)=max(0,x)+weightmin(0,x)\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x) where weight is a learnable parameter.

rrelu

Randomized leaky ReLU.

rrelu_

In-place version of rrelu().

glu

The gated linear unit.

gelu

When the approximate argument is 'none', it applies element-wise the function GELU(x)=xΦ(x)\text{GELU}(x) = x * \Phi(x)

logsigmoid

Applies element-wise LogSigmoid(xi)=log(11+exp(xi))\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)

hardshrink

Applies the hard shrinkage function element-wise

tanhshrink

Applies element-wise, Tanhshrink(x)=xTanh(x)\text{Tanhshrink}(x) = x - \text{Tanh}(x)

softsign

Applies element-wise, the function SoftSign(x)=x1+x\text{SoftSign}(x) = \frac{x}{1 + |x|}

softplus

Applies element-wise, the function Softplus(x)=1βlog(1+exp(βx))\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x)).

softmin

Applies a softmin function.

softmax

Applies a softmax function.

softshrink

Applies the soft shrinkage function elementwise

gumbel_softmax

Samples from the Gumbel-Softmax distribution (Link 1 Link 2) and optionally discretizes.

log_softmax

Applies a softmax followed by a logarithm.

tanh

Applies element-wise, Tanh(x)=tanh(x)=exp(x)exp(x)exp(x)+exp(x)\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}

sigmoid

Applies the element-wise function Sigmoid(x)=11+exp(x)\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}

hardsigmoid

Applies the element-wise function

silu

Applies the Sigmoid Linear Unit (SiLU) function, element-wise.

mish

Applies the Mish function, element-wise.

batch_norm

Applies Batch Normalization for each channel across a batch of data.

group_norm

Applies Group Normalization for last certain number of dimensions.

instance_norm

Applies Instance Normalization for each channel in each data sample in a batch.

layer_norm

Applies Layer Normalization for last certain number of dimensions.

local_response_norm

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.

normalize

Performs LpL_p normalization of inputs over specified dimension.

Linear functions

linear

Applies a linear transformation to the incoming data: y=xAT+by = xA^T + b.

bilinear

Applies a bilinear transformation to the incoming data: y=x1TAx2+by = x_1^T A x_2 + b

Dropout functions

dropout

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

alpha_dropout

Applies alpha dropout to the input.

feature_alpha_dropout

Randomly masks out entire channels (a channel is a feature map, e.g.

dropout1d

Randomly zero out entire channels (a channel is a 1D feature map, e.g., the jj-th channel of the ii-th sample in the batched input is a 1D tensor input[i,j]\text{input}[i, j]) of the input tensor).

dropout2d

Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj-th channel of the ii-th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j]) of the input tensor).

dropout3d

Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj-th channel of the ii-th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j]) of the input tensor).

Sparse functions

embedding

A simple lookup table that looks up embeddings in a fixed dictionary and size.

embedding_bag

Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.

one_hot

Takes LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that have zeros everywhere except where the index of last dimension matches the corresponding value of the input tensor, in which case it will be 1.

Distance functions

pairwise_distance

See torch.nn.PairwiseDistance for details

cosine_similarity

Returns cosine similarity between x1 and x2, computed along dim.

pdist

Computes the p-norm distance between every pair of row vectors in the input.

Loss functions

binary_cross_entropy

Function that measures the Binary Cross Entropy between the target and input probabilities.

binary_cross_entropy_with_logits

Function that measures Binary Cross Entropy between target and input logits.

poisson_nll_loss

Poisson negative log likelihood loss.

cosine_embedding_loss

See CosineEmbeddingLoss for details.

cross_entropy

This criterion computes the cross entropy loss between input logits and target.

ctc_loss

The Connectionist Temporal Classification loss.

gaussian_nll_loss

Gaussian negative log likelihood loss.

hinge_embedding_loss

See HingeEmbeddingLoss for details.

kl_div

The Kullback-Leibler divergence Loss

l1_loss

Function that takes the mean element-wise absolute value difference.

mse_loss

Measures the element-wise mean squared error.

margin_ranking_loss

See MarginRankingLoss for details.

multilabel_margin_loss

See MultiLabelMarginLoss for details.

multilabel_soft_margin_loss

See MultiLabelSoftMarginLoss for details.

multi_margin_loss

See MultiMarginLoss for details.

nll_loss

The negative log likelihood loss.

huber_loss

Function that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.

smooth_l1_loss

Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.

soft_margin_loss

See SoftMarginLoss for details.

triplet_margin_loss

See TripletMarginLoss for details

triplet_margin_with_distance_loss

See TripletMarginWithDistanceLoss for details.

Vision functions

pixel_shuffle

Rearranges elements in a tensor of shape (,C×r2,H,W)(*, C \times r^2, H, W) to a tensor of shape (,C,H×r,W×r)(*, C, H \times r, W \times r), where r is the upscale_factor.

pixel_unshuffle

Reverses the PixelShuffle operation by rearranging elements in a tensor of shape (,C,H×r,W×r)(*, C, H \times r, W \times r) to a tensor of shape (,C×r2,H,W)(*, C \times r^2, H, W), where r is the downscale_factor.

pad

Pads tensor.

interpolate

Down/up samples the input to either the given size or the given scale_factor

upsample

Upsamples the input to either the given size or the given scale_factor

upsample_nearest

Upsamples the input, using nearest neighbours' pixel values.

upsample_bilinear

Upsamples the input, using bilinear upsampling.

grid_sample

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.

affine_grid

Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.

DataParallel functions (multi-GPU, distributed)

data_parallel

torch.nn.parallel.data_parallel

Evaluate module(input) in parallel across the GPUs given in device_ids.

Docs

Access comprehensive developer documentation for PyTorch

View Docs

Tutorials

Get in-depth tutorials for beginners and advanced developers

View Tutorials

Resources

Find development resources and get your questions answered

View Resources