PyTorch Tutorial for Beginners

Learn PyTorch from scratch. Aimed at beginners who already know basic Python.

April 23, 2026

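1. Introduction to PyTorch

PyTorch is an open-source deep learning framework originally developed by Meta (formerly Facebook). Its core design idea is the dynamic computation graph, which lets you build and debug neural networks the same way you write ordinary Python code.

Core strengths of PyTorch:

Dynamic computation graph: the graph is built at run time, which makes debugging easy and control flow flexible.
Pythonic: the API follows Python conventions, so the learning curve is gentle.
Strong GPU acceleration: moving a computation to the GPU takes a single line of code, such as a .to(device) call.
Rich ecosystem: official libraries such as torchvision, torchaudio, and torchtext cover many domains.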

2. Installing PyTorch

Installing with pip is recommended. Visit pytorch.org to get the install command that matches your system.

bash

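# Default build (enough for learning on CPU; on Linux this wheel may also bundle CUDA support)
pip install torch torchvision

# CUDA 12.x GPU build (cu121 wheels; requires an NVIDIA GPU)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

Verify the installation:

python

import torch
print(torch.__version__)         # e.g. 2.x.x
print(torch.cuda.is_available()) # whether a GPU is usable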

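3. Tensor basics

A tensor is the most basic data structure in PyTorch. You can think of it as a multi-dimensional array, similar to NumPy's ndarray, but with support for GPU acceleration and automatic differentiation.

3.1 Creating tensors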

python

import torch

# Create from a Python list
a = torch.tensor([1, 2, 3])
print(a)  # tensor([1, 2, 3])

# Create a 2-D tensor (matrix)
b = torch.tensor([[1, 2], [3, 4]])
print(b)
# tensor([[1, 2],
#         [3, 4]])

# Common factory functions
zeros = torch.zeros(3, 4)       # 3x4 tensor of zeros
ones = torch.ones(2, 3)         # 2x3 tensor of ones
rand = torch.rand(2, 3)         # 2x3 random tensor (uniform on [0, 1))
randn = torch.randn(2, 3)       # 2x3 random tensor (standard normal)
arange = torch.arange(0, 10, 2) # tensor([0, 2, 4, 6, 8])
eye = torch.eye(3)              # 3x3 identity matrix
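The examples above leave the element type implicit. As a small sketch (variable names are illustrative, and the default float type is assumed to be the unchanged float32): `torch.tensor` infers the dtype from the Python values, and factory functions accept an explicit `dtype` argument.

```python
import torch

# Integer literals give int64; float literals give the default float type (float32).
ints = torch.tensor([1, 2, 3])
floats = torch.tensor([1.0, 2.0, 3.0])
print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32

# Factory functions accept an explicit dtype.
zeros_int = torch.zeros(2, 3, dtype=torch.int32)
print(zeros_int.dtype)  # torch.int32

# .to() returns a converted tensor; the original is left unchanged.
as_float = ints.to(torch.float32)
print(as_float)  # tensor([1., 2., 3.])
```

Mixing dtypes is a common source of errors later on (for example, integer inputs fed to a float layer), so it helps to check `dtype` early.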

3.2 Tensor attributes

python

x = torch.rand(3, 4)

print(x.shape)   # torch.Size([3, 4]): shape
print(x.dtype)   # torch.float32: data type
print(x.device)  # cpu: device the tensor lives on
print(x.ndim)    # 2: number of dimensions
print(x.numel()) # 12: total number of elements

3.3 Tensor operations

python

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Basic arithmetic (element-wise)
print(a + b)   # tensor([5., 7., 9.])
print(a * b)   # tensor([ 4., 10., 18.])
print(a ** 2)  # tensor([1., 4., 9.])

# Matrix multiplication
m1 = torch.rand(2, 3)
m2 = torch.rand(3, 4)
result = m1 @ m2          # equivalent to torch.matmul(m1, m2)
print(result.shape)        # torch.Size([2, 4])

# Reductions
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
print(x.sum())    # tensor(10.)
print(x.mean())   # tensor(2.5000)
print(x.max())    # tensor(4.)
print(x.argmax()) # tensor(3), the index of the maximum
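The reductions above collapse the whole tensor to one value. The following sketch, with made-up values, shows two related behaviours worth knowing: reducing along a single dimension with `dim`, and broadcasting a smaller tensor across a larger one.

```python
import torch

m = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# dim=0 collapses the rows (one result per column); dim=1 collapses the columns.
col_sums = m.sum(dim=0)
row_means = m.mean(dim=1)
print(col_sums)   # tensor([5., 7., 9.])
print(row_means)  # tensor([2., 5.])

# Broadcasting: the shape-(3,) tensor is applied to every row of the (2, 3) matrix.
shifted = m - torch.tensor([1.0, 2.0, 3.0])
print(shifted)
# tensor([[0., 0., 0.],
#         [3., 3., 3.]])
```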

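3.4 Changing shape

python

x = torch.arange(12)
print(x)  # tensor([ 0,  1,  2, ..., 11])

# reshape: change the shape
y = x.reshape(3, 4)
print(y)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

# view: like reshape, but never copies, so it needs a compatible (e.g. contiguous) memory layout
z = x.view(4, 3)

# Flatten
flat = y.flatten()  # back to one dimension

# Add / remove dimensions
a = torch.rand(3, 4)
b = a.unsqueeze(0)   # shape: (1, 3, 4), a new dimension inserted at position 0
c = b.squeeze(0)     # shape: (3, 4), dimension 0 removed because its size is 1

# Transpose
t = a.T              # 2-D transpose
t = a.permute(1, 0)  # general reordering of dimensions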
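The difference between `view` and `reshape` only shows up on non-contiguous tensors. A minimal sketch, using a transposed tensor as the non-contiguous case:

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = a.T  # transposing only changes strides, so t is not contiguous

print(t.is_contiguous())  # False

view_failed = False
try:
    t.view(6)             # view cannot reinterpret this memory layout
except RuntimeError:
    view_failed = True
print(view_failed)        # True

flat_copy = t.reshape(6)            # reshape falls back to copying when it has to
flat_view = t.contiguous().view(6)  # or make the memory contiguous first
print(flat_copy)  # tensor([0, 3, 1, 4, 2, 5])
```

In practice `reshape` is the safer default; `view` is useful when you want a guarantee that no copy is made.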

3.5 Indexing and slicing

python

x = torch.arange(12).reshape(3, 4)
# tensor([[ 0,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, 10, 11]])

print(x[0])        # tensor([0, 1, 2, 3]), row 0
print(x[:, 1])     # tensor([1, 5, 9]), column 1
print(x[0:2, 1:3]) # tensor([[1, 2], [5, 6]]), a sub-matrix
print(x[x > 5])    # tensor([ 6,  7,  8,  9, 10, 11]), boolean indexing
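Indexing can also be used to write values. The sketch below illustrates one subtlety under that heading: basic slices are views into the original tensor, while a boolean mask on the left-hand side of an assignment modifies the tensor in place.

```python
import torch

x = torch.arange(12).reshape(3, 4)

# Basic indexing returns a view: writing through it changes x as well.
row = x[0]
row[0] = 100
print(x[0, 0])  # tensor(100)

# Assigning through a boolean mask updates the selected elements in place.
x[x > 9] = -1
print(x)
# tensor([[-1,  1,  2,  3],
#         [ 4,  5,  6,  7],
#         [ 8,  9, -1, -1]])
```

Call `.clone()` on a slice when an independent copy is needed.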

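3.6 Converting to and from NumPy

python

import numpy as np

# Tensor → NumPy (CPU tensors share memory with the array: changing one changes the other)
t = torch.ones(3)
n = t.numpy()

# NumPy → Tensor (from_numpy shares memory as well)
n = np.array([1, 2, 3])
t = torch.from_numpy(n)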
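The shared-memory behaviour is easy to verify. A short sketch, assuming NumPy is installed and the tensors live on the CPU:

```python
import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()   # no copy: n and t use the same buffer
t.add_(1)       # in-place update on the tensor
print(n)        # [2. 2. 2.]

arr = np.array([1, 2, 3])
shared = torch.from_numpy(arr)  # shares memory with arr
copied = torch.tensor(arr)      # always makes a copy
arr[0] = 99
print(shared.tolist())  # [99, 2, 3]
print(copied.tolist())  # [1, 2, 3]
```

Use `torch.tensor(arr)` (or `.clone()`) when later changes to the array should not leak into the tensor.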

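4. Automatic differentiation (Autograd)

Automatic differentiation is at the heart of deep learning: PyTorch computes gradients for you, so you do not have to derive them by hand.

4.1 Basic concepts

python

# requires_grad=True tells PyTorch to track operations on this tensor so it can differentiate later
x = torch.tensor(3.0, requires_grad=True)

# Define a computation
y = x ** 2 + 2 * x + 1  # y = x² + 2x + 1

# Backpropagate to compute dy/dx
y.backward()

# Inspect the gradient: dy/dx = 2x + 2 = 8.0
print(x.grad)  # tensor(8.)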

4.2 Gradients with several variables

python

x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

# A simple linear computation
y = w * x + b  # y = 2*1 + 3 = 5

y.backward()

print(x.grad)  # tensor(2.), since dy/dx = w = 2
print(w.grad)  # tensor(1.), since dy/dw = x = 1
print(b.grad)  # tensor(1.), since dy/db = 1
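One behaviour that the examples above do not show is that gradients accumulate: each `backward()` call adds to `.grad` rather than overwriting it. The sketch below illustrates this with a single scalar; it is also the reason training loops clear gradients before every step.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

(x ** 2).backward()
first = x.grad.item()    # 6.0, since d(x^2)/dx = 2x

(x ** 2).backward()      # a second pass adds to x.grad instead of replacing it
second = x.grad.item()   # 12.0

x.grad.zero_()           # reset, as optimizer.zero_grad() does for model parameters
(x ** 2).backward()
third = x.grad.item()    # 6.0

print(first, second, third)  # 6.0 12.0 6.0
```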

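4.3 Disabling gradient tracking

During inference (prediction) gradients are not needed, and turning tracking off saves memory and speeds up computation:

python

# Option 1: torch.no_grad()
with torch.no_grad():
    y = x * 2  # no computation graph is recorded

# Option 2: detach()
y = x.detach()  # a tensor cut off from the graph; it shares storage with x (it is not a copy)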
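The two options can be checked directly. This is a minimal sketch with illustrative variable names; it also shows that `detach()` shares storage with the original, so `clone()` is needed for an independent copy.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

d = x.detach()
print(d.requires_grad)               # False
print(d.data_ptr() == x.data_ptr())  # True: same underlying storage

# Add clone() when an independent copy is required.
c = x.detach().clone()
print(c.data_ptr() == x.data_ptr())  # False
```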

5. Building neural networks

PyTorch uses the torch.nn module to build neural networks.

5.1 The simplest network

python

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the layers
        self.fc1 = nn.Linear(784, 128)  # fully connected layer: 784 → 128
        self.fc2 = nn.Linear(128, 64)   # fully connected layer: 128 → 64
        self.fc3 = nn.Linear(64, 10)    # fully connected layer: 64 → 10

    def forward(self, x):
        # Define the forward pass
        x = torch.relu(self.fc1(x))  # layer 1 + ReLU activation
        x = torch.relu(self.fc2(x))  # layer 2 + ReLU activation
        x = self.fc3(x)              # output layer (no activation)
        return x

# Create the model
model = SimpleNet()
print(model)
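To see the network in action, the sketch below pushes a dummy batch through it and counts its parameters. The class is repeated only so the snippet runs on its own; the batch size of 32 is an arbitrary choice.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):  # same layers as above, repeated to keep the sketch self-contained
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

model = SimpleNet()
batch = torch.randn(32, 784)  # dummy batch: 32 samples with 784 features each
logits = model(batch)         # calling the module runs forward()
print(logits.shape)           # torch.Size([32, 10])

# Weights plus biases: (784*128 + 128) + (128*64 + 64) + (64*10 + 10)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)               # 109386
```

Call the model as `model(batch)` rather than `model.forward(batch)`, so that any registered hooks also run.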

5.2 Using nn.Sequential (more concise)

python

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
)

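5.3 Common layers

python

# Fully connected layer
nn.Linear(in_features, out_features)

# Activation functions
nn.ReLU()        # max(0, x)
nn.Sigmoid()     # 1 / (1 + e^(-x))
nn.Tanh()        # hyperbolic tangent
nn.Softmax(dim=1) # normalizes to probabilities along dim 1

# Convolution layer (for images)
nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)

# Pooling layers
nn.MaxPool2d(kernel_size=2)
nn.AvgPool2d(kernel_size=2)

# Regularization and normalization
nn.Dropout(p=0.5)       # randomly zeroes 50% of activations during training
nn.BatchNorm1d(128)     # batch normalization

# Flatten layer
nn.Flatten()  # flattens every dimension except the batch dimension (start_dim=1)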
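How these layers change tensor shapes is easiest to see by chaining a few of them. The following sketch assumes a batch of eight 28x28 single-channel images; the sizes are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 1, 28, 28)  # (batch, channels, height, width)

conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)
flatten = nn.Flatten()

h = conv(x)
print(h.shape)  # torch.Size([8, 32, 28, 28]): padding=1 keeps height and width
h = pool(h)
print(h.shape)  # torch.Size([8, 32, 14, 14]): pooling halves height and width
h = flatten(h)
print(h.shape)  # torch.Size([8, 6272]): 32 * 14 * 14 features per sample
```

The flattened size (6272 here) is what a following `nn.Linear` would need as `in_features`.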

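6. Training a complete model

The core training procedure is: forward pass → compute the loss → backward pass → update the parameters.

6.1 Loss functions and optimizers

python

import torch.optim as optim

model = SimpleNet()

# Loss function
criterion = nn.CrossEntropyLoss()  # classification (expects raw logits)
# Other common loss functions:
# nn.MSELoss()          mean squared error (regression)
# nn.BCELoss()          binary cross-entropy (expects probabilities in [0, 1])
# nn.L1Loss()           mean absolute error

# Optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Other common optimizers:
# optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# optim.RMSprop(model.parameters(), lr=0.001)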
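A detail that often trips up beginners is that `nn.CrossEntropyLoss` takes raw scores (logits), not probabilities, which is why the networks in this tutorial end without a softmax layer. The sketch below, using random logits and made-up targets, checks that the loss equals `log_softmax` followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples and 10 classes
targets = torch.tensor([1, 0, 4, 9])  # class indices (int64)

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_manual))  # True
```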

6.2 The training loop

python

# Synthetic data
X_train = torch.randn(1000, 784)  # 1000 samples, 784 features each
y_train = torch.randint(0, 10, (1000,))  # 1000 labels in 0-9

# Training
model.train()  # switch to training mode
epochs = 10

for epoch in range(epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass + update
    optimizer.zero_grad()  # clear old gradients (important!)
    loss.backward()        # compute gradients
    optimizer.step()       # update parameters

    # Print progress
    if (epoch + 1) % 2 == 0:
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}")
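The loop above feeds the whole dataset in one step per epoch. Real training usually works on mini-batches; the following sketch shows one way to do that with `TensorDataset` and `DataLoader`. The small model and the batch size of 100 are illustrative choices, not part of the example above.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(1000, 784)
y = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True)

# Hypothetical small model, used only to keep the sketch short.
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

model.train()
steps = 0
for epoch in range(2):
    for xb, yb in loader:        # one parameter update per mini-batch
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        steps += 1

print(steps)  # 20: 10 batches per epoch, 2 epochs
```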

6.3 Evaluating the model

python

model.eval()  # switch to evaluation mode

with torch.no_grad():  # no gradient tracking
    X_test = torch.randn(200, 784)
    y_test = torch.randint(0, 10, (200,))

    outputs = model(X_test)
    _, predicted = torch.max(outputs, 1)  # class with the highest score (logit)
    accuracy = (predicted == y_test).float().mean()
    print(f"Accuracy: {accuracy.item():.2%}")
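Why `model.train()` and `model.eval()` matter becomes clear with a layer that behaves differently in the two modes. A minimal sketch using `nn.Dropout` on a tensor of ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
out_train = drop(x)  # entries are zeroed with probability p; survivors are scaled by 1/(1-p) = 2

drop.eval()
out_eval = drop(x)   # dropout is a no-op in evaluation mode

print(set(out_train.tolist()) <= {0.0, 2.0})  # True
print(torch.equal(out_eval, x))               # True
```

Batch normalization layers switch behaviour in the same way, so forgetting `model.eval()` before evaluation can produce noisy or misleading results.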

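7. Saving and loading models

python

# Save the model parameters (recommended)
torch.save(model.state_dict(), "model.pth")

# Load the model parameters
model = SimpleNet()
model.load_state_dict(torch.load("model.pth"))
model.eval()

# Save the whole model, structure included (not recommended)
torch.save(model, "full_model.pth")
# PyTorch 2.6 and later default to weights_only=True, which rejects pickled modules;
# pass weights_only=False only for files you trust.
model = torch.load("full_model.pth", weights_only=False)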
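To resume training later, the optimizer state is usually saved alongside the weights. The sketch below is one common pattern, not the only one; the dictionary keys (`epoch`, `model_state`, `optimizer_state`), the tiny model, and the temporary file path are all illustrative assumptions.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Hypothetical checkpoint location and key names.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save({
    "epoch": 5,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, path)

# map_location lets a checkpoint saved on a GPU be loaded on a CPU-only machine.
checkpoint = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint["model_state"])
restored_opt = optim.Adam(restored.parameters(), lr=0.001)
restored_opt.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1
print(start_epoch)  # 6
```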

8. Using GPU acceleration

python

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move the model to the GPU
model = SimpleNet().to(device)

# Move the data to the GPU
x = torch.randn(64, 784).to(device)
y = torch.randint(0, 10, (64,)).to(device)

# Subsequent computation runs on the GPU
output = model(x)

Note: the model and the data must be on the same device, otherwise PyTorch raises an error.
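A few habits help keep code device-agnostic. The sketch below runs on either CPU or GPU and shows creating tensors directly on the target device and moving results back to the CPU; the variable names are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Creating a tensor directly on the target device avoids an extra host-to-device copy.
a = torch.zeros(2, 3, device=device)

# The *_like helpers inherit device and dtype from an existing tensor.
b = torch.ones_like(a)
same_device = (a + b).device == a.device
print(same_device)  # True

# Move results to the CPU before converting them to Python numbers or NumPy arrays.
total = (a + b).sum().cpu().item()
print(total)  # 6.0
```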

9. Hands-on: handwritten digit recognition (MNIST)

This section ties together everything covered so far in a real classification task.

python

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# =====================
# 1. Prepare the data
# =====================
transform = transforms.Compose([
    transforms.ToTensor(),                # convert to a tensor and scale pixel values to [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # mean and standard deviation of MNIST
])

# Download and load the datasets
train_dataset = datasets.MNIST(root="./data", train=True,
                               download=True, transform=transform)
test_dataset = datasets.MNIST(root="./data", train=False,
                              download=True, transform=transform)

# Use DataLoader for batching and shuffling
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

# =====================
# 2. Define the model
# =====================
class MNISTNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),             # 1x28x28 → 784 per sample
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        return self.net(x)

# =====================
# 3. Set up training
# =====================
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MNISTNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# =====================
# 4. Train
# =====================
def train(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()

    avg_loss = total_loss / len(loader)
    accuracy = correct / total
    return avg_loss, accuracy

# =====================
# 5. Evaluate
# =====================
def evaluate(model, loader, criterion, device):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs, labels)

            total_loss += loss.item()
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += predicted.eq(labels).sum().item()

    avg_loss = total_loss / len(loader)
    accuracy = correct / total
    return avg_loss, accuracy

# =====================
# 6. Run training
# =====================
epochs = 10
for epoch in range(epochs):
    train_loss, train_acc = train(model, train_loader, criterion,
                                  optimizer, device)
    test_loss, test_acc = evaluate(model, test_loader, criterion, device)

    print(f"Epoch {epoch+1:2d}/{epochs} | "
          f"Train Loss: {train_loss:.4f}, Acc: {train_acc:.2%} | "
          f"Test Loss: {test_loss:.4f}, Acc: {test_acc:.2%}")

# Save the trained model
torch.save(model.state_dict(), "mnist_model.pth")
print("Model saved!")

After running the script, you should see output similar to this:

plain text

Epoch  1/10 | Train Loss: 0.3842, Acc: 88.92% | Test Loss: 0.1320, Acc: 96.05%
Epoch  2/10 | Train Loss: 0.1589, Acc: 95.28% | Test Loss: 0.0987, Acc: 96.93%
...
Epoch 10/10 | Train Loss: 0.0483, Acc: 98.52% | Test Loss: 0.0712, Acc: 97.89%

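10. Where to go next

Once you are comfortable with the material above, you can go deeper in these directions:

Convolutional neural networks (CNN): used for image recognition; learn how to combine nn.Conv2d and nn.MaxPool2d.
Recurrent neural networks (RNN/LSTM): used for sequence data (text, time series); learn nn.LSTM and nn.GRU.
Transformer: the foundation of modern NLP and vision models; learn nn.Transformer and the attention mechanism.
Transfer learning: fine-tune pretrained models from torchvision.models (ResNet, EfficientNet, and others).
Data augmentation: use the transforms module to enrich the training data.
Learning-rate scheduling: use torch.optim.lr_scheduler to adjust the learning rate dynamically.
TensorBoard visualization: use torch.utils.tensorboard to monitor training.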
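As a small taste of one item from the list, here is a sketch of learning-rate scheduling with `StepLR`. The model, the starting rate of 0.1, and the schedule (halve the rate every 2 epochs) are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])  # learning rate used in this epoch
    optimizer.step()   # a real loop would compute a loss and call backward() first
    scheduler.step()   # advance the schedule once per epoch

print(lrs)  # [0.1, 0.1, 0.05, 0.05, 0.025, 0.025]
```

Calling `scheduler.step()` after `optimizer.step()` is the expected order in PyTorch.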

This tutorial is written for PyTorch 2.x. Happy learning!
