🔍 1. Why Do We Need Residual Networks (ResNet)?
As the number of layers in a neural network grows, its representational power should increase in theory. In practice, however, two problems appear:
- Vanishing / exploding gradients
- Degradation: training accuracy drops as the network gets deeper
In other words, deeper networks become harder to train and can even perform worse than shallower ones.
ResNet (Residual Network) was designed precisely to solve this problem.
⚙️ 2. Core Idea: the Residual Connection
A conventional stack of convolutional layers computes:
$$
y = F(x)
$$
ResNet adds a skip connection (shortcut connection):
$$
y = F(x) + x
$$
where:
- $x$: the input features
- $F(x)$: the output of the convolutional layers (the residual)
- $y$: the final output
Intuition:
the model no longer has to learn the full mapping $H(x)$; it only needs to learn the residual $F(x) = H(x) - x$.
This makes training easier and convergence faster.
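As a minimal sketch of this idea (the tensor and channel sizes below are illustrative, not taken from the text), a residual connection in PyTorch is nothing more than an element-wise addition of a block's output and its input:

```python
import torch
import torch.nn as nn

# F(x): two 3x3 convolutions that keep the channel count unchanged
F = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

x = torch.randn(1, 64, 56, 56)   # input features
y = F(x) + x                     # residual connection: y = F(x) + x
print(y.shape)                   # torch.Size([1, 64, 56, 56])
```

Because the shortcut carries $x$ through unchanged, gradients also have a direct path back to earlier layers, which is what makes very deep stacks trainable.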
🧩 3. Structure of a Residual Block
A basic residual block (BasicBlock) looks like this:
```
x ─► Conv3x3 → BN → ReLU → Conv3x3 → BN ─►(+)─► ReLU ─► y
│                                          ▲
└───────────────── identity ───────────────┘
```
If the input and output dimensions do not match (different channel count or spatial size), a 1×1 convolution is applied on the shortcut path to adjust them (shortcut projection).
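For instance, a projection shortcut that halves the spatial size and doubles the channel count could look like the following sketch (the 64→128 channel counts are illustrative, matching the conv2_x→conv3_x transition described below):

```python
import torch
import torch.nn as nn

# 1x1 projection shortcut: matches both the channel count (64 -> 128)
# and the spatial size (stride 2) so the residual addition is well defined
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

x = torch.randn(1, 64, 56, 56)
print(shortcut(x).shape)  # torch.Size([1, 128, 28, 28])
```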
🧱 4. ResNet-18 Architecture Overview
ResNet-18 is the shallow member of the residual-network family (18 weighted layers). Its structure is shown below; the output sizes assume a 224×224 input, and a short sketch for verifying them follows the table.
| Stage | Output size | Layers |
|---|---|---|
| conv1 | 112×112 | 7×7 conv, 64 channels, stride 2 |
| maxpool | 56×56 | 3×3 max pooling, stride 2 |
| conv2_x | 56×56 | 2 BasicBlocks, 64 channels |
| conv3_x | 28×28 | 2 BasicBlocks, 128 channels |
| conv4_x | 14×14 | 2 BasicBlocks, 256 channels |
| conv5_x | 7×7 | 2 BasicBlocks, 512 channels |
| avgpool | 1×1 | global average pooling |
| fc | 1×1 | fully connected layer (classification) |
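One way to check these sizes (not part of the original text; this sketch assumes torchvision is installed and uses its reference resnet18 rather than the model built below):

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None)   # untrained reference model; torchvision >= 0.13 (older versions use pretrained=False)
x = torch.randn(1, 3, 224, 224)

# Walk through the top-level stages and print each output size
for name, module in model.named_children():
    if name == "fc":
        x = torch.flatten(x, 1)  # flatten before the classifier
    x = module(x)
    print(f"{name:8s} -> {tuple(x.shape)}")
```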
🧰 5. Implementing ResNet-18 in PyTorch
1️⃣ Import dependencies
```python
import torch
import torch.nn as nn
```
2️⃣ Define the residual block (BasicBlock)
```python
class BasicBlock(nn.Module):
    expansion = 1  # channel expansion factor (1 for BasicBlock)

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.downsample = downsample  # shortcut layer that adjusts dimensions

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)

        out += identity   # residual connection
        out = self.relu(out)
        return out
```
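A quick sanity check of the block (the shapes below are illustrative and not from the original text):

```python
# Identity shortcut: stride 1 and unchanged channel count -> shape is preserved
block = BasicBlock(64, 64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 64, 56, 56])

# Projection shortcut: stride 2 and more channels need a matching downsample
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
block2 = BasicBlock(64, 128, stride=2, downsample=downsample)
print(block2(x).shape)  # torch.Size([1, 128, 28, 28])
```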
3️⃣ Define the main ResNet structure
```python
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64

        # Stem: 7x7 conv + BN + ReLU + 3x3 max pooling
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Four stages of residual blocks (conv2_x .. conv5_x)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        # Classifier head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # Projection shortcut when the spatial size or channel count changes
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
```
4️⃣ Build the ResNet-18 model
```python
def ResNet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes)

model = ResNet18(num_classes=100)
print(model)
```
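As an optional check (not in the original text), counting the trainable parameters should give roughly 11 million for ResNet-18:

```python
# ResNet-18 has roughly 11M trainable parameters (the exact count depends on num_classes)
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {num_params / 1e6:.2f}M")
```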
⚡ 6. Verifying That the Model Runs
```python
x = torch.randn(1, 3, 224, 224)
model = ResNet18(num_classes=10)
y = model(x)
print(y.shape)
```
Output:
```
torch.Size([1, 10])
```
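To go one step further, a minimal training step could look like the sketch below (the loss function, optimizer settings, and random data are illustrative assumptions, not taken from the original text):

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# One illustrative training step on a random mini-batch
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```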
🧠 7. Summary and Optimization Notes
| Optimization point | Description |
|---|---|
| BatchNorm | Keeps activation distributions stable and speeds up convergence |
| ReLU(inplace=True) | Saves GPU memory |
| AdaptiveAvgPool2d | Adapts to different input sizes (see the sketch after the table) |
| Residual connection | Counters vanishing gradients and eases optimization |
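For example, because the head uses AdaptiveAvgPool2d, the same model also accepts other input resolutions; a quick illustrative check (not from the original text):

```python
# AdaptiveAvgPool2d always reduces the feature map to 1x1, so other resolutions work too
for size in (160, 224, 320):
    out = model(torch.randn(1, 3, size, size))
    print(size, out.shape)  # torch.Size([1, 10]) in every case
```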
✅ The success of ResNet demonstrated that deeper networks can indeed be trained effectively, and it became a foundation of modern CNNs; architectures such as ResNeXt, DenseNet, and EfficientNet all build on its idea.