torch.cuda
This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but use GPUs for computation.
It is lazily initialized, so you can always import it and use is_available() to determine whether your system supports CUDA.
CUDA semantics has more details about working with CUDA.
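As a minimal sketch of the lazy-initialization point above, the snippet below picks a device with is_available() and falls back to the CPU. The helper name `pick_device` is hypothetical and not part of torch.cuda.

```python
import torch


def pick_device() -> torch.device:
    """Return a CUDA device when one is usable, otherwise the CPU."""
    # is_available() is safe to call on machines without a GPU or driver.
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(3, device=device)  # allocated on the GPU only if one was found
total = (x * 2).sum().item()
print(device.type, total)
```

The same script runs unchanged on CPU-only and GPU machines; only the device the tensor lives on differs.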
StreamContext |
Context-manager that selects a given stream. |
can_device_access_peer |
Checks if peer access between two devices is possible. |
current_blas_handle |
Returns a cublasHandle_t pointer to the current cuBLAS handle. |
current_device |
Returns the index of a currently selected device. |
current_stream |
Returns the currently selected Stream for a given device. |
default_stream |
Returns the default Stream for a given device. |
device |
Context-manager that changes the selected device. |
device_count |
Returns the number of GPUs available. |
device_of |
Context-manager that changes the current device to that of given object. |
get_arch_list |
Returns the list of CUDA architectures this library was compiled for. |
get_device_capability |
Gets the CUDA capability of a device. |
get_device_name |
Gets the name of a device. |
get_device_properties |
Gets the properties of a device. |
get_gencode_flags |
Returns NVCC gencode flags this library was compiled with. |
get_sync_debug_mode |
Returns the current value of the debug mode for CUDA synchronizing operations. |
init |
Initialize PyTorch's CUDA state. |
ipc_collect |
Force collects GPU memory after it has been released by CUDA IPC. |
is_available |
Returns a bool indicating if CUDA is currently available. |
is_initialized |
Returns whether PyTorch's CUDA state has been initialized. |
memory_usage |
Returns the percent of time over the past sample period during which global (device) memory was being read or written. |
set_device |
Sets the current device. |
set_stream |
Sets the current stream. This is a wrapper API to set the stream. |
set_sync_debug_mode |
Sets the debug mode for CUDA synchronizing operations. |
stream |
Wrapper around the Context-manager StreamContext that selects a given stream. |
synchronize |
Waits for all kernels in all streams on a CUDA device to complete. |
utilization |
Returns the percent of time over the past sample period during which one or more kernels was executing on the GPU as given by nvidia-smi. |
temperature |
Returns the average temperature of the GPU sensor in degrees Celsius. |
power_draw |
Returns the average power draw of the GPU sensor in milliwatts (mW). |
clock_rate |
Returns the clock speed of the GPU SM in hertz (Hz) over the past sample period, as given by nvidia-smi. |
OutOfMemoryError |
Exception raised when CUDA is out of memory |
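The device-query functions in the table above can be combined as in the following sketch. The helper `describe_devices` and the dictionary keys are illustrative assumptions; the torch.cuda calls are the ones listed above.

```python
import torch


def describe_devices() -> list:
    """Collect basic facts about each visible CUDA device (empty list if none)."""
    if not torch.cuda.is_available():
        return []
    infos = []
    for index in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(index)
        props = torch.cuda.get_device_properties(index)
        infos.append(
            {
                "index": index,
                "name": torch.cuda.get_device_name(index),
                "capability": f"{major}.{minor}",
                "total_memory_bytes": props.total_memory,
            }
        )
    return infos


devices = describe_devices()
for info in devices:
    print(info)

if devices:
    last = devices[-1]["index"]
    # The context manager selects `last` and restores the previous device on exit.
    with torch.cuda.device(last):
        print("selected:", torch.cuda.current_device())
        torch.cuda.synchronize()  # wait for all queued kernels on this device
    print("restored:", torch.cuda.current_device())
```

On a machine without CUDA the list is simply empty and nothing is selected.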
Random Number Generator
get_rng_state |
Returns the random number generator state of the specified GPU as a ByteTensor. |
get_rng_state_all |
Returns a list of ByteTensor representing the random number states of all devices. |
set_rng_state |
Sets the random number generator state of the specified GPU. |
set_rng_state_all |
Sets the random number generator state of all devices. |
manual_seed |
Sets the seed for generating random numbers for the current GPU. |
manual_seed_all |
Sets the seed for generating random numbers on all GPUs. |
seed |
Sets the seed for generating random numbers to a random number for the current GPU. |
seed_all |
Sets the seed for generating random numbers to a random number on all GPUs. |
initial_seed |
Returns the current random seed of the current GPU. |
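A short sketch of how the RNG state functions fit together: save the state of the current GPU's generator, draw samples, restore the state, and draw again. The helper name `sample_twice_with_restore` and the seed value are assumptions made for illustration.

```python
import torch


def sample_twice_with_restore(device: torch.device):
    """Draw two batches from the same saved generator state on `device`."""
    torch.cuda.manual_seed(1234)               # seeds the current GPU
    state = torch.cuda.get_rng_state(device)   # ByteTensor snapshot of the generator
    first = torch.rand(4, device=device)
    torch.cuda.set_rng_state(state, device)    # rewind the generator
    second = torch.rand(4, device=device)
    return first, second


results = None
if torch.cuda.is_available():
    current = torch.device("cuda", torch.cuda.current_device())
    results = sample_twice_with_restore(current)
    print("identical draws:", torch.equal(results[0], results[1]))
```

Because the generator is rewound to the saved state, the second draw is expected to repeat the first.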
Communication collectives
comm.broadcast |
Broadcasts a tensor to specified GPU devices. |
comm.broadcast_coalesced |
Broadcasts a sequence of tensors to the specified GPUs. |
comm.reduce_add |
Sums tensors from multiple GPUs. |
comm.scatter |
Scatters a tensor across multiple GPUs. |
comm.gather |
Gathers tensors from multiple GPU devices. |
Streams and events
Stream |
Wrapper around a CUDA stream. |
ExternalStream |
Wrapper around an externally allocated CUDA stream. |
Event |
Wrapper around a CUDA event. |
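The sketch below shows one common way Stream and Event are used together: run work on a non-default stream and time it with a pair of events. The function name `timed_matmul_on_side_stream` and the matrix size are illustrative assumptions.

```python
import torch


def timed_matmul_on_side_stream(n: int = 256):
    """Run a matmul on a side stream; return (result_on_cpu, elapsed_ms) or None."""
    if not torch.cuda.is_available():
        return None
    side = torch.cuda.Stream()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    # The inputs were produced on the current stream, so the side stream waits for them.
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        start.record()  # recorded on the side stream, which is current here
        c = a @ b
        end.record()
    end.synchronize()  # block the host until the end event has completed
    # Make the original stream wait for the side stream before it reads c.
    torch.cuda.current_stream().wait_stream(side)
    return c.cpu(), start.elapsed_time(end)


outcome = timed_matmul_on_side_stream()
if outcome is not None:
    print(f"matmul took {outcome[1]:.3f} ms")
```

The explicit wait_stream calls are what keep the two streams ordered; without them the side stream could read the inputs before they are written.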
Graphs (beta)
is_current_stream_capturing |
Returns True if CUDA graph capture is underway on the current CUDA stream, False otherwise. |
graph_pool_handle |
Returns an opaque token representing the id of a graph memory pool. |
CUDAGraph |
Wrapper around a CUDA graph. |
graph |
Context-manager that captures CUDA work into a torch.cuda.CUDAGraph object for later replay. |
make_graphed_callables |
Accepts callables (functions or nn.Modules) and returns graphed versions. |
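As a rough sketch of capture and replay with CUDAGraph and the graph context manager: warm up on a side stream, capture a small computation, then change the input in place and replay. The function name `capture_and_replay` and the tensor names are assumptions. Since this API is in beta, behavior may vary between PyTorch and CUDA versions.

```python
import torch


def capture_and_replay():
    """Capture `static_input * 2` into a CUDA graph and replay it with new data."""
    if not torch.cuda.is_available():
        return None
    static_input = torch.zeros(5, device="cuda")

    # Warm up on a side stream so one-time initialization happens outside the capture.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            static_output = static_input * 2
    torch.cuda.current_stream().wait_stream(side)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        # Work here is recorded into the graph rather than executed eagerly.
        static_output = static_input * 2

    static_input.fill_(3.0)  # update the captured input buffer in place
    g.replay()               # rerun the recorded kernels on the new contents
    return static_output.cpu()


replayed = capture_and_replay()
print(replayed)
```

Replays reuse the same memory addresses, which is why the input is updated in place rather than replaced with a new tensor.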
Memory management
empty_cache |
Release all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and is visible in nvidia-smi. |
list_gpu_processes |
Return a human-readable printout of the running processes and their GPU memory use for a given device. |
mem_get_info |
Return the global free and total GPU memory for a given device using cudaMemGetInfo. |
memory_stats |
Return a dictionary of CUDA memory allocator statistics for a given device. |
memory_summary |
Return a human-readable printout of the current memory allocator statistics for a given device. |
memory_snapshot |
Return a snapshot of the CUDA memory allocator state across all devices. |
memory_allocated |
Return the current GPU memory occupied by tensors in bytes for a given device. |
max_memory_allocated |
Return the maximum GPU memory occupied by tensors in bytes for a given device. |
reset_max_memory_allocated |
Reset the starting point in tracking maximum GPU memory occupied by tensors for a given device. |
memory_reserved |
Return the current GPU memory managed by the caching allocator in bytes for a given device. |
max_memory_reserved |
Return the maximum GPU memory managed by the caching allocator in bytes for a given device. |
set_per_process_memory_fraction |
Set memory fraction for a process. |
memory_cached |
Deprecated; see memory_reserved(). |
max_memory_cached |
Deprecated; see max_memory_reserved(). |
reset_max_memory_cached |
Reset the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. |
reset_peak_memory_stats |
Reset the "peak" stats tracked by the CUDA memory allocator. |
caching_allocator_alloc |
Perform a memory allocation using the CUDA memory allocator. |
caching_allocator_delete |
Delete memory allocated using the CUDA memory allocator. |
get_allocator_backend |
Return a string describing the active allocator backend as set by PYTORCH_CUDA_ALLOC_CONF. |
CUDAPluggableAllocator |
CUDA memory allocator loaded from a shared object (.so) file. |
change_current_allocator |
Change the currently used memory allocator to be the one provided. |
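The following sketch illustrates the difference between memory occupied by tensors (memory_allocated) and memory held by the caching allocator (memory_reserved), before and after an allocation. The helper `memory_report` and its dictionary keys are hypothetical names chosen for this example.

```python
import torch


def memory_report(device_index: int) -> dict:
    """Snapshot a few allocator counters (all values in bytes)."""
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    return {
        "allocated": torch.cuda.memory_allocated(device_index),
        "reserved": torch.cuda.memory_reserved(device_index),
        "peak_allocated": torch.cuda.max_memory_allocated(device_index),
        "free": free_bytes,
        "total": total_bytes,
    }


reports = None
if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    torch.cuda.reset_peak_memory_stats(dev)
    before = memory_report(dev)
    x = torch.empty(1024, 1024, device="cuda")  # 1024 * 1024 float32 values, 4 MiB
    during = memory_report(dev)
    del x                       # the block returns to the allocator's cache
    torch.cuda.empty_cache()    # unused cached blocks are released to the driver
    after = memory_report(dev)
    reports = (before, during, after)
    for label, report in zip(("before", "during", "after"), reports):
        print(label, report["allocated"], report["reserved"])
```

Deleting a tensor lowers memory_allocated right away, while memory_reserved typically drops only after empty_cache().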
NVIDIA Tools Extension (NVTX)
nvtx.mark |
Describe an instantaneous event that occurred at some point. |
nvtx.range_push |
Push a range onto a stack of nested range spans. |
nvtx.range_pop |
Pop a range off of a stack of nested range spans. |
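A minimal sketch of annotating a region of work with NVTX ranges, so that it shows up as a named span in a profiler such as Nsight Systems. The function name `annotated_step` and the message strings are illustrative assumptions.

```python
import torch


def annotated_step(x: torch.Tensor) -> torch.Tensor:
    """Wrap a matmul in an NVTX range with one instantaneous marker."""
    torch.cuda.nvtx.range_push("annotated_step")  # open a nested range
    try:
        torch.cuda.nvtx.mark("before matmul")     # instantaneous event
        y = x @ x
    finally:
        torch.cuda.nvtx.range_pop()               # always close the range
    return y


out = None
if torch.cuda.is_available():
    out = annotated_step(torch.randn(64, 64, device="cuda"))
    torch.cuda.synchronize()
    print(out.shape)
```

Pairing every range_push with a range_pop in a try/finally keeps the range stack balanced even when the wrapped code raises.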
Jiterator (beta)
jiterator._create_jit_fn |
Create a jiterator-generated CUDA kernel for an elementwise op. |
jiterator._create_multi_output_jit_fn |
Create a jiterator-generated CUDA kernel for an elementwise op that supports returning one or more outputs. |
Stream Sanitizer (prototype)
CUDA Sanitizer is a prototype tool for detecting synchronization errors between streams in PyTorch. See the documentation for information on how to use it.