Getting Started with PyTorch: VGG for Kaggle Dogs vs. Cats

Kaggle Dogs vs. Cats competition page

Given $12500$ images of cats and $12500$ images of dogs as the training set, plus $12500$ unlabeled images of cats or dogs, output the probability that each unlabeled image is a dog.

We train a VGG network with PyTorch (using CUDA acceleration) to solve this task.

Data Preprocessing

Image Processing

Note that the provided images come in all shapes and sizes, so their dimensions must be standardized. VGG expects a uniform $3 \times 224 \times 224$ input, so every image is resized accordingly. We use torchvision.transforms to resize each image and convert it to a torch.tensor:

TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor()
                                ])

Randomly cropping the training images introduces extra randomness: it effectively enlarges the training data (different epochs see different crops), and it reduces overfitting.
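The extra variety is easy to quantify: cropping a $224 \times 224$ window out of a $256 \times 256$ image leaves $(256-224+1)^2$ distinct top-left positions. A quick pure-Python check:

```python
# Number of distinct positions for a 224x224 crop inside a 256x256 image:
# (256 - 224 + 1) choices per axis.
resized, crop = 256, 224
positions_per_axis = resized - crop + 1
total_crops = positions_per_axis ** 2
print(total_crops)  # 1089 distinct crops per image
```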

Another important measure against overfitting is normalization. Normalizing an image is not simply dividing every value by $256.0$; each channel is standardized as $(x - \text{mean})/\text{std}$. Even without a deep theoretical justification, it demonstrably speeds up convergence and curbs overfitting: "Well, it does work." PyTorch already implements this as a transform whose only parameters are the per-channel mean and standard deviation. Standing on the shoulders of giants, the values commonly used in image recognition are the ImageNet statistics (this makes each channel roughly zero-mean with unit variance, rather than mapping pixels exactly into $[-1,1]$):

Normalize(mean = [0.485, 0.456, 0.406], 
          std = [0.229, 0.224, 0.225])
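Concretely, Normalize maps each channel value $x$ (already scaled to $[0,1]$ by ToTensor) to $(x-\text{mean})/\text{std}$. A pure-Python sketch of the resulting range for the red channel, using the values above:

```python
def normalize(x, mean, std):
    # The same element-wise arithmetic that transforms.Normalize applies
    return (x - mean) / std

mean, std = 0.485, 0.229      # ImageNet statistics for the R channel
lo = normalize(0.0, mean, std)  # darkest possible pixel
hi = normalize(1.0, mean, std)  # brightest possible pixel
print(round(lo, 3), round(hi, 3))  # roughly -2.118 and 2.249
```

Note the output range is roughly $[-2.1, 2.2]$, not exactly $[-1,1]$.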

So the training images are preprocessed as follows:

TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                                ])

The test set needs no random cropping; simply resize the original image:

TRANSFORM = transforms.Compose([transforms.Resize((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                                ])


Dataset and DataLoader

Compare the hand-rolled training loop ("reinventing the wheel") from before: Getting Started with PyTorch: Kaggle Titanic Survivor Prediction. The Dataset family is said to be the canonical PyTorch approach; it is more convenient and also helps parallelize data loading.

PyTorch's built-in Dataset machinery handles data loading automatically. First, define a subclass of torch.utils.data.Dataset. There are two ways to implement it.

One is lazy loading: keep only each image's file path in memory, and whenever an image is needed, read it from disk, preprocess that single image, and return it. No up-front preprocessing is required, so training can start immediately. The drawback is that over many epochs the repeated disk reads add up. Since training proceeds batch by batch, memory usage (beyond the network parameters) is controlled entirely by batch_size and can be tuned to fit. The budget option:

class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = image_list
        self.label = label_list

    def __getitem__(self, index):
        global TRANSFORM
        img = Image.open(self.data[index])
        data = TRANSFORM(img)
        img.close()
        return data.cuda(),torch.cuda.FloatTensor([self.label[index]])

    def __len__(self):
        return len(self.data)

The other is preloading: run the whole image library through preprocessing and hold it in memory, then simply return images on demand. In testing, the initial load takes about $40$ s, which is acceptable, and subsequent accesses are noticeably faster. But the memory cost is enormous: storing as numpy.ndarray appears to need around $20$ GB (no exact measurement), and torch.tensor would blow up even further. The big-spender option:
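That memory figure is easy to sanity-check: 25,000 preprocessed images at $3 \times 224 \times 224$ float32 values each come to roughly 14 GiB of raw tensor data, before any container overhead (the exact total depends on how the arrays are stored):

```python
images = 25000
floats_per_image = 3 * 224 * 224             # CHW tensor after TRANSFORM
bytes_total = images * floats_per_image * 4  # float32 = 4 bytes
print(bytes_total / 2**30)  # ~14 GiB of raw pixel data alone
```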

class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = []
        self.label = []
        for i in range(len(image_list)):
            img = Image.open(image_list[i])
            self.data.append(TRANSFORM(img))
            img.close()
            self.label.append(label_list[i])

    def __getitem__(self, index):
        return self.data[index].cuda(),torch.cuda.FloatTensor([self.label[index]])

    def __len__(self):
        return len(self.data)


Loading the Data

def load():
    np.random.seed(998244353)
    torch.manual_seed(998244353)
    image_list = []
    label_list = []
    for i in range(ORIGIN_DATA_SIZE):
        image_list.append(INPUT_PATH+"/train/cat.{0}.jpg".format(i))
        label_list.append(0)
        image_list.append(INPUT_PATH+"/train/dog.{0}.jpg".format(i))
        label_list.append(1)
    n = int(ORIGIN_DATA_SIZE*2*RATIO)
    train_data = ImageDataset(image_list[:n],label_list[:n])
    validate_data = ImageDataset(image_list[n:],label_list[n:])
    image_list = []
    for i in range(TARGET_DATA_SIZE):
        image_list.append(INPUT_PATH+"/test/{0}.jpg".format(i+1))
    test_data = ImageDataset(image_list,[0]*TARGET_DATA_SIZE)
    np.random.seed()
    torch.seed()
    return train_data,validate_data,test_data

A fixed random seed is used here so that every run produces the same split into validation set and actual training set.
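The effect can be illustrated with Python's own random module (the script seeds numpy and torch, but the principle is identical): the same seed always reproduces the same shuffle, so a split derived from it is stable across runs.

```python
import random

def shuffled(seed):
    rng = random.Random(seed)  # fixed seed -> deterministic sequence
    items = list(range(10))
    rng.shuffle(items)
    return items

# Reseeding with the same value reproduces the exact same order.
a = shuffled(998244353)
b = shuffled(998244353)
print(a == b)  # True
```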


Implementing VGG in PyTorch

VGG Architecture

Original VGG paper: VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION

(Figure: the classic VGG13 architecture)

(Figure: table of the classic VGG variants)

We now implement a generic VGG following the table of VGG variants, mainly using PyTorch's torch.nn.Sequential() together with add_module(), which appends layers dynamically. The basic skeleton:

class VGG(nn.Module):
    def __init__(self, name="11"):
        super(VGG, self).__init__()
        self.name = "VGG"+name
        self.conv = nn.Sequential()
        i = 1; p = 1

        # ... Different versions of VGG

        self.fc = nn.Sequential(
            nn.Linear(512*7*7,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,1000),
            nn.ReLU(),
            nn.Linear(1000,1),
            nn.Sigmoid()
        )

The fully connected part is identical across all VGG variants; only the omitted middle section differs. When creating a VGG, pass a name chosen from ["11","11-LRN","13","16-1","16","19"] according to the structure table; the default, or any name not in the list, is treated as VGG11. Because add_module() requires layer names, i and p number the convolutional layers and the pooling layers respectively.


VGG Implementation

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=  3, out_channels= 64, kernel_size=3, stride=1, padding=1))

The line above creates a convolutional layer with every parameter written out explicitly. Note that the channel counts of consecutive convolutional layers must match. To keep the spatial size unchanged, every $3 \times 3$ convolution uses padding=1 and every $1 \times 1$ convolution uses padding=0; this is also why the network takes $3 \times 224 \times 224$ inputs directly rather than $3 \times 227 \times 227$.
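That padding choice follows from the standard output-size formula, out = (in + 2·padding − kernel) // stride + 1, which is quick to verify:

```python
def conv_out(n, kernel, stride=1, padding=0):
    # Output size of a convolution/pooling along one spatial axis
    return (n + 2 * padding - kernel) // stride + 1

print(conv_out(224, kernel=3, padding=1))  # 224: 3x3 conv with padding=1 preserves size
print(conv_out(224, kernel=1, padding=0))  # 224: 1x1 conv needs no padding
print(conv_out(224, kernel=2, stride=2))   # 112: the 2x2/stride-2 max-pooling halves it
```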

self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1

The line above creates an activation layer (the standard choice is $\mathtt{ReLU}$). Every convolutional layer is followed by one, and the two can share the same index, so i is incremented after each activation layer.

self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1

The line above creates a pooling layer. VGG uses only this one kind of pooling: $2 \times 2$ $\mathtt{MaxPooling}$ with stride $2$. Remember to increment the pooling index p.

self.conv.add_module('LRN',nn.LocalResponseNorm(size=2))

The line above creates an $\mathtt{LRN}$ (Local Response Normalization) layer, used exactly once, in VGG11-LRN. The idea is to normalize signals across neighboring channels inside the network: when one neuron fires strongly, the relative responses of its neighbors shrink, mimicking the biological phenomenon where an excited neuron inhibits its neighbors. Its main purpose is to combat overfitting, and it is reported to be most effective in networks using the $\mathtt{ReLU}$ activation.
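The idea can be sketched with a deliberately simplified variant, summing over all channels with $\alpha=\beta=k=1$ (PyTorch's nn.LocalResponseNorm uses a sliding channel window and different default constants):

```python
def lrn_simplified(activations, k=1.0, alpha=1.0, beta=1.0):
    # b_c = a_c / (k + alpha/n * sum_c' a_c'^2)^beta,
    # summed over all channels here instead of a sliding window.
    n = len(activations)
    denom = (k + alpha / n * sum(a * a for a in activations)) ** beta
    return [a / denom for a in activations]

out = lrn_simplified([1.0, 2.0, 3.0])
print(out)  # every response shrinks when its neighbours are large: [3/17, 6/17, 9/17]
```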


From here, simply follow the structure table (mind the padding):

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=  3, out_channels= 64, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["13","16-1","16","19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels= 64, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["11-LRN"]:
    self.conv.add_module('LRN',nn.LocalResponseNorm(size=2))
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 224 -> 112

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels=128, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["13","16-1","16","19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 112 -> 56

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16","19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16-1"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1, padding=0))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 56 -> 28

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16","19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16-1"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 28 -> 14

self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16","19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["16-1"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
if name in ["19"]:
    self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
    self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 14 -> 7

The input has shape batch_size $\times 3 \times 224 \times 224$ and the convolutional part outputs batch_size $\times 512 \times 7 \times 7$, which then feeds a fully connected layer expecting batch_size $\times 25088$. The tensor therefore has to be flattened, so the forward pass is implemented as follows (x.shape[0] is batch_size; -1 means the remaining dimension is inferred automatically):
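The $512 \times 7 \times 7$ figure follows from the five halving pool stages tracked in the comments above; a quick check of the spatial size and the flattened width:

```python
size = 224
for _ in range(5):          # five 2x2/stride-2 max-pooling stages
    size //= 2              # 224 -> 112 -> 56 -> 28 -> 14 -> 7
print(size)                 # 7
print(512 * size * size)    # 25088 inputs to the first Linear layer
```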

def forward(self, x):
    x = self.conv(x)
    x = x.view(x.shape[0], -1)
    x = self.fc(x)
    return x

Now constructing a VGG is easy!

net = VGG("19")


The RMSprop Optimizer

Hyperparameter tuning is a headache, but RMSprop adapts the effective per-parameter step size automatically. Just call PyTorch's built-in RMSprop optimizer:

optimizer = optim.RMSprop(net.parameters(), lr=LR, alpha=0.9)

Note that the optimizer is constructed with net.parameters(), which means the network must not be modified by hand afterwards (other than through optimizer steps); otherwise the optimizer has to be recreated.
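Under the hood, RMSprop keeps a running average of squared gradients and divides each step by its root, which is what makes the effective step size adaptive. A minimal scalar sketch of the core update, without momentum or centering (alpha=0.9 as in the call above; eps is PyTorch's default 1e-8):

```python
def rmsprop_step(x, grad, sq_avg, lr=1e-4, alpha=0.9, eps=1e-8):
    # v <- alpha*v + (1-alpha)*g^2 ;  x <- x - lr*g / (sqrt(v) + eps)
    sq_avg = alpha * sq_avg + (1 - alpha) * grad * grad
    x = x - lr * grad / (sq_avg ** 0.5 + eps)
    return x, sq_avg

x, v = 1.0, 0.0
for _ in range(3):               # three steps against a constant gradient of 2.0
    x, v = rmsprop_step(x, 2.0, v)
print(x)  # x decreases; steps shrink as the squared-gradient average grows
```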


Saving, Loading, and CUDA Acceleration of the VGG Model

Save your checkpoints promptly, or you will pay dearly. Don't ask how I know; the answer is all tears…

(Figure: everyday crashes)


With the network architecture fixed, saving the parameters is enough. Save and load the VGG parameters as follows:

torch.save(net.state_dict(),FILENAME)

net.load_state_dict(torch.load(FILENAME))


Many of the definitions above already use cuda. The network and the loss function also need to be moved to the GPU: just call net.cuda() and LOSS_FUNC.cuda().

Note that net.cuda() also modifies the network, so the optimizer should be created after calling net.cuda().


The Learning Process

Training and Prediction

With the Dataset machinery in place, training is straightforward. Remember to save along the way: due to storage limits (one save of parameters plus optimizer state exceeds 1 GB), a checkpoint is written every UPDATE epochs, recording the validation error and validation loss in the filename.

def train(optimizer):
    global net
    global EPOCH
    global BATCH_SIZE
    global train_data
    global validate_data
    global PROGRAM_START
    print('['+net.name+'] with optimizer ['+str(type(optimizer))+']:')
    train_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    validate_loader = data.DataLoader(validate_data, batch_size=1, shuffle=False, num_workers=0)
    BATCH = len(train_loader)
    m = len(validate_loader)
    for epoch in range(EPOCH):
        EPOCH_START = time.time()
        print("\tEpoch #{0}/{1}:".format(epoch+1,EPOCH))
        for batch,(x,y) in enumerate(train_loader):
            optimizer.zero_grad()
            t = net(x)
            loss = LOSS_FUNC(t,y)
            print("\t\tBatch #{0}/{1}: ".format(batch+1,BATCH) + "Loss = %.6f"%float(loss))
            loss.backward()
            optimizer.step()
        with torch.no_grad():
            L = 0.
            E = 0.
            for batch,(x,y) in enumerate(validate_loader):
                t = net(x)
                L += float(LOSS_FUNC(t,y))
                E += float((float(t[0][0])>0.5)!=y)
            print("\t Validation Loss = %.6f. Error Rate = %.3f%%"%(L/m,E*100/m))
        if((epoch+1)%UPDATE==0):
            torch.save(net.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f).pt"%(L/m,E*100/m))
            torch.save(optimizer.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f)-optimizer.pt"%(L/m,E*100/m))
        print("\t Finish epoch #{0}".format(epoch+1)+" in %.4f s."%(time.time()-EPOCH_START)+" Total Time Cost = %.4f s."%(time.time()-PROGRAM_START))
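The checkpoint cadence above is driven by the condition (epoch+1) % UPDATE == 0, which saves on epochs 5, 10, 15, … (counting from 1). A quick check with UPDATE = 5:

```python
UPDATE, EPOCH = 5, 15
# Which (1-indexed) epochs trigger a checkpoint save
saved = [epoch + 1 for epoch in range(EPOCH) if (epoch + 1) % UPDATE == 0]
print(saved)  # [5, 10, 15]
```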

Prediction is essentially the same; just run it:

def run(filename):
    global net
    global test_data
    prediction = []
    test_loader = data.DataLoader(test_data, batch_size=1, shuffle=False, num_workers=0)
    with torch.no_grad():
        for i,(x,y) in enumerate(test_loader):
            t = net(x)
            prediction.append([i+1,float(t[0][0])])
    submission = pd.DataFrame(prediction)
    submission.columns = ['id','label']
    submission.to_csv(filename+".csv",index=0)


Full Code

The training set has $25000$ images; $99\%$ of them, i.e. $24750$ images, form the actual training set, and the remaining $250$ form the validation set. A checkpoint is saved every $5$ epochs.

A larger batch_size would be better, but as a poor man… it is set to $75$. One trick is to use a binary SWITCH to control the program: bit $0$ means train, bit $1$ means load an existing model. So SWITCH=1 trains from scratch and then predicts, SWITCH=2 loads an existing model and then predicts, and SWITCH=3 loads an existing model, continues training, and then predicts.
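Decoding the two bits of SWITCH can be checked directly (SWITCH//2 is bit 1, the load flag; SWITCH%2 is bit 0, the train flag):

```python
def decode(switch):
    # bit 1 -> load an existing model, bit 0 -> train before predicting
    return {"load": switch // 2 == 1, "train": switch % 2 == 1}

print(decode(1))  # train from scratch, then predict
print(decode(2))  # load a saved model, predict only
print(decode(3))  # load, continue training, then predict
```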

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from PIL import Image
import pandas as pd
import numpy as np
import math
import time
import os
import gc

# Configurations
INPUT_PATH = ""
OUTPUT_PATH = ""
ORIGIN_DATA_SIZE = 12500
TARGET_DATA_SIZE = 12500
RATIO = 0.99
EPOCH = 15
BATCH_SIZE = 75
LOSS_FUNC = nn.BCELoss()
LR = 0.0001
SWITCH = 3
UPDATE = 5
TRANSFORM = transforms.Compose([transforms.Resize((256,256)),
                                transforms.RandomCrop((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                                ])
PARAMETERS = ""

class ImageDataset(data.Dataset):
    def __init__(self, image_list, label_list):
        self.data = image_list
        self.label = label_list

    def __getitem__(self, index):
        global TRANSFORM
        img = Image.open(self.data[index])
        data = TRANSFORM(img)
        img.close()
        return data.cuda(),torch.cuda.FloatTensor([self.label[index]])

    def __len__(self):
        return len(self.data)

class VGG(nn.Module):
    def __init__(self, name="11"):
        super(VGG, self).__init__()
        self.name = "VGG"+name
        self.conv = nn.Sequential()
        i = 1; p = 1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=  3, out_channels= 64, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["13","16-1","16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels= 64, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["11-LRN"]:
            self.conv.add_module('LRN',nn.LocalResponseNorm(size=2))
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 224 -> 112

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels= 64, out_channels=128, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["13","16-1","16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 112 -> 56

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 56 -> 28

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 28 -> 14

        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
        self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16","19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["16-1"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=1, stride=1, padding=0))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        if name in ["19"]:
            self.conv.add_module('conv-{0}'.format(i),nn.Conv2d(in_channels=512, out_channels=512, kernel_size=3, stride=1, padding=1))
            self.conv.add_module('ReLU-{0}'.format(i),nn.ReLU());i+=1
        self.conv.add_module('MaxPooling-{0}'.format(p),nn.MaxPool2d(kernel_size=2, stride=2));p+=1 # 14 -> 7

        self.fc = nn.Sequential(
            nn.Linear(512*7*7,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096,1000),
            nn.ReLU(),
            nn.Linear(1000,1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.conv(x)
        x = x.view(x.shape[0], -1)
        x = self.fc(x)
        return x

def train(optimizer):
    global net
    global EPOCH
    global BATCH_SIZE
    global train_data
    global validate_data
    global PROGRAM_START
    print('['+net.name+'] with optimizer ['+str(type(optimizer))+']:')
    train_loader = data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True, num_workers=0)
    validate_loader = data.DataLoader(validate_data, batch_size=1, shuffle=False, num_workers=0)
    BATCH = len(train_loader)
    m = len(validate_loader)
    for epoch in range(EPOCH):
        EPOCH_START = time.time()
        print("\tEpoch #{0}/{1}:".format(epoch+1,EPOCH))
        for batch,(x,y) in enumerate(train_loader):
            optimizer.zero_grad()
            t = net(x)
            loss = LOSS_FUNC(t,y)
            print("\t\tBatch #{0}/{1}: ".format(batch+1,BATCH) + "Loss = %.6f"%float(loss))
            loss.backward()
            optimizer.step()
        with torch.no_grad():
            L = 0.
            E = 0.
            for batch,(x,y) in enumerate(validate_loader):
                t = net(x)
                L += float(LOSS_FUNC(t,y))
                E += float((float(t[0][0])>0.5)!=y)
            print("\t Validation Loss = %.6f. Error Rate = %.3f%%"%(L/m,E*100/m))
        if((epoch+1)%UPDATE==0):
            torch.save(net.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f).pt"%(L/m,E*100/m))
            torch.save(optimizer.state_dict(),OUTPUT_PATH+"/{0}[{1}]".format(net.name,epoch+1)+"-L(%.6f)E(%.3f)-optimizer.pt"%(L/m,E*100/m))
        print("\t Finish epoch #{0}".format(epoch+1)+" in %.4f s."%(time.time()-EPOCH_START)+" Total Time Cost = %.4f s."%(time.time()-PROGRAM_START))

def run(filename):
    global net
    global test_data
    prediction = []
    test_loader = data.DataLoader(test_data, batch_size=1, shuffle=False, num_workers=0)
    with torch.no_grad():
        for i,(x,y) in enumerate(test_loader):
            t = net(x)
            prediction.append([i+1,float(t[0][0])])
    submission = pd.DataFrame(prediction)
    submission.columns = ['id','label']
    submission.to_csv(filename+".csv",index=0)

def load():
    np.random.seed(998244353)
    torch.manual_seed(998244353)
    image_list = []
    label_list = []
    for i in range(ORIGIN_DATA_SIZE):
        image_list.append(INPUT_PATH+"/train/cat.{0}.jpg".format(i))
        label_list.append(0)
        image_list.append(INPUT_PATH+"/train/dog.{0}.jpg".format(i))
        label_list.append(1)
    n = int(ORIGIN_DATA_SIZE*2*RATIO)
    train_data = ImageDataset(image_list[:n],label_list[:n])
    validate_data = ImageDataset(image_list[n:],label_list[n:])
    image_list = []
    for i in range(TARGET_DATA_SIZE):
        image_list.append(INPUT_PATH+"/test/{0}.jpg".format(i+1))
    test_data = ImageDataset(image_list,[0]*TARGET_DATA_SIZE)
    # np.random.seed()
    # torch.seed()
    return train_data,validate_data,test_data

print("*****Start")
PROGRAM_START = time.time()
train_data,validate_data,test_data = load()
print("Finish reading data in %.4f s."%(time.time()-PROGRAM_START))
net = VGG("19")
if SWITCH//2==1 and PARAMETERS!="":
    net.load_state_dict(torch.load(PARAMETERS+".pt"))
    print("Load Model ["+PARAMETERS+".pt] Success!")
net.cuda()
optimizer = optim.RMSprop(net.parameters(), lr=LR, alpha=0.9)
if SWITCH//2==1 and PARAMETERS!="":
    optimizer.load_state_dict(torch.load(PARAMETERS+"-optimizer.pt"))
    print("Load Optimizer ["+PARAMETERS+"-optimizer.pt] Success!")
LOSS_FUNC.cuda()
if SWITCH%2==1:
    train(optimizer)
TRANSFORM = transforms.Compose([transforms.Resize((224,224)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.485,0.456,0.406),(0.229,0.224,0.225))
                                ])
run(OUTPUT_PATH+"/{0}".format(net.name))
print("*****Finish")

Note: without a GPU, simply remove all the .cuda() calls to skip CUDA acceleration. Also, Kaggle Kernels apparently do not accept square brackets '[' ']' in uploaded data; brackets in file names get dropped. Note that INPUT_PATH and OUTPUT_PATH are left undefined above, and PARAMETERS defaults to empty. Kaggle Kernel version


Results

Only VGG11 and VGG19 were tested.

As a baseline, blindly outputting $0.5$ for every image gives a loss of about $0.69$.
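That baseline is exactly the binary cross-entropy of a constant $0.5$ prediction, which equals $\ln 2 \approx 0.693$ regardless of the true label:

```python
import math

p = 0.5
# BCE for one sample: -(y*ln(p) + (1-y)*ln(1-p)); both terms give ln(2) at p = 0.5
loss = -math.log(p)
print(round(loss, 5))  # 0.69315
```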

The very first attempt, without normalization, with SGD instead of RMSprop, without random cropping, and with a much smaller convolutional network, ran overnight on CPU and reached $0.56046$.

VGG11, trained for $20$ epochs (about $40$ minutes on GPU; potentially hours otherwise), reached $0.34516$.

VGG19, trained for $25$ epochs (three to four hours on GPU; otherwise…), reached $0.21873$.

The prediction accuracy is already quite good, at roughly $95\%$, but the loss is still far from the $0.01$-level scores achieved by pretrained models of other well-known architectures (AlexNet, ResNet, Inception, etc.). Still, it shows that an individual can train a relatively large network in a day or two and get real results.

UPD 2020.01.22: fine-tuning a pretrained VGG19 model reached $0.13763$.


Author: Magolor
Link: https://magolor.cn/2020/01/14/2020-01-14-blog-01/
Copyright: Unless otherwise noted, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit Magolor when reposting.