In PyTorch, one common way to save and load a model is as follows:

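import torch

# save (PATH is the path of the checkpoint file)
torch.save(model.state_dict(), PATH)

# load: first rebuild a model with the same architecture, then restore its parameters
model = MyModel(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()  # switch layers such as dropout and batch norm to inference mode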
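The snippet above relies on placeholders (`PATH`, `MyModel`, `args`, `kwargs`). Below is a minimal runnable sketch of the same save/load round trip. It assumes a plain `nn.Linear(4, 2)` as the model and uses an in-memory `io.BytesIO` buffer in place of a real file path, so it leaves nothing on disk.

```python
import io

import torch
import torch.nn as nn

# Save: serialize only the parameter/buffer dictionary, not the module object itself.
model = nn.Linear(4, 2)
buffer = io.BytesIO()  # in-memory stand-in for the file at PATH
torch.save(model.state_dict(), buffer)

# Load: construct a model with the same architecture, then copy the saved values in.
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))
restored.eval()

print(torch.equal(model.weight, restored.weight))  # True
```

Because only the state_dict is saved, the loading side must be able to construct `nn.Linear(4, 2)` (or `MyModel(...)`) again by itself. The checkpoint holds values, not code.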

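model.state_dict() actually returns an OrderedDict that maps the names of the network's parameters (and buffers) to their values. Let's look at how the source code implements it.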

state_dict

# torch/nn/modules/module.py
class Module(object):
    def state_dict(self, destination=None, prefix='', keep_vars=False):
        if destination is None:
            destination = OrderedDict()
            destination._metadata = OrderedDict()
        destination._metadata[prefix[:-1]] = local_metadata = dict(version=self._version)
        for name, param in self._parameters.items():
            if param is not None:
                destination[prefix + name] = param if keep_vars else param.data
        for name, buf in self._buffers.items():
            if buf is not None:
                destination[prefix + name] = buf if keep_vars else buf.data
        for name, module in self._modules.items():
            if module is not None:
                module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
        for hook in self._state_dict_hooks.values():
            hook_result = hook(self, destination, prefix, local_metadata)
            if hook_result is not None:
                destination = hook_result
        return destination

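As you can see, state_dict iterates over four kinds of elements: _parameters, _buffers, _modules, and _state_dict_hooks. The differences among the first three were covered in a previous article. The last one holds hooks that run when the state_dict is produced; it is usually empty, so we won't consider it here. One more point worth noting: submodules are read recursively, and their names are joined with "." so that load_state_dict can later locate each parameter.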
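To make the recursion and the "."-joined prefixes concrete, here is a simplified re-implementation of the traversal above. `collect_state` is a hypothetical helper written for illustration. It drops the hooks, the `_metadata` bookkeeping and `keep_vars`, and it does not skip the non-persistent buffers that newer PyTorch versions leave out. It still produces the same keys as `state_dict()` for an ordinary nested model.

```python
from collections import OrderedDict

import torch.nn as nn


def collect_state(module, destination=None, prefix=''):
    """Simplified sketch of Module.state_dict: no hooks, no _metadata, no keep_vars."""
    if destination is None:
        destination = OrderedDict()
    # 1. this module's own parameters
    for name, param in module._parameters.items():
        if param is not None:
            destination[prefix + name] = param.detach()
    # 2. this module's own buffers
    for name, buf in module._buffers.items():
        if buf is not None:
            destination[prefix + name] = buf.detach()
    # 3. recurse into children, extending the prefix with "<child name>."
    for name, child in module._modules.items():
        if child is not None:
            collect_state(child, destination, prefix + name + '.')
    return destination


model = nn.Sequential(
    nn.Linear(2, 3),
    nn.Sequential(nn.BatchNorm1d(3), nn.Linear(3, 1)),
)
print(list(collect_state(model).keys()))
# ['0.weight', '0.bias', '1.0.weight', '1.0.bias', '1.0.running_mean',
#  '1.0.running_var', '1.0.num_batches_tracked', '1.1.weight', '1.1.bias']
```

A nested child such as the `Linear` inside the inner `Sequential` ends up under the key `1.1.weight`: one prefix segment per level of recursion.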

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_tensor = torch.randn(1)  # plain tensor stored directly as a member attribute
        self.register_buffer('my_buffer', torch.randn(1))  # tensor registered as a buffer
        self.my_param = nn.Parameter(torch.randn(1))
        self.fc = nn.Linear(2, 2, bias=False)
        self.conv = nn.Conv2d(2, 1, 1)
        self.fc2 = nn.Linear(2, 2, bias=False)
        self.f3 = self.fc

    def forward(self, x):
        return x

model = MyModel()
print(model.state_dict())

>>>OrderedDict([('my_param', tensor([-0.3052])), ('my_buffer', tensor([0.5583])), ('fc.weight', tensor([[ 0.6322, -0.0255],
        [-0.4747, -0.0530]])), ('conv.weight', tensor([[[[ 0.3346]],

         [[-0.2962]]]])), ('conv.bias', tensor([0.5205])), ('fc2.weight', tensor([[-0.4949,  0.2815],
        [ 0.3006,  0.0768]])), ('f3.weight', tensor([[ 0.6322, -0.0255],
        [-0.4747, -0.0530]]))])

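As shown, the output indeed contains three kinds of entries: the parameter my_param, the buffer my_buffer, and the parameters of the submodules (fc.weight, conv.weight, conv.bias, fc2.weight, f3.weight). The plain attribute my_tensor does not appear. Because f3 is the same module object as fc, f3.weight holds the same values as fc.weight.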
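The observations above can be checked programmatically. The sketch below also shows what the `keep_vars` flag from the `state_dict` source changes. `Tiny` is a hypothetical model made up for this example. The assumption is a recent PyTorch, where `keep_vars=False` returns detached plain tensors; the older source quoted above uses `.data`, which likewise returns a plain tensor.

```python
import torch
import torch.nn as nn


class Tiny(nn.Module):  # hypothetical model, for illustration only
    def __init__(self):
        super().__init__()
        self.plain = torch.randn(1)                  # plain attribute: not tracked
        self.register_buffer('buf', torch.randn(1))  # buffer: saved
        self.p = nn.Parameter(torch.randn(1))        # parameter: saved
        self.fc = nn.Linear(2, 2, bias=False)
        self.alias = self.fc                         # same module under a second name


m = Tiny()
sd = m.state_dict()
print(list(sd.keys()))  # ['p', 'buf', 'fc.weight', 'alias.weight']

# keep_vars=False (default): plain tensors; keep_vars=True: the Parameter objects themselves
print(isinstance(sd['p'], nn.Parameter))                           # False
print(isinstance(m.state_dict(keep_vars=True)['p'], nn.Parameter))  # True

# The aliased entries share storage: they are the same weight, listed twice
print(sd['fc.weight'].data_ptr() == sd['alias.weight'].data_ptr())  # True
```

Because the recursion in `state_dict` visits every entry of `_modules`, a module registered under two names is saved twice. The two entries point to the same memory, but they are stored as two keys.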

load_state_dict

The code below can be read in two parts:

load(self)

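This function restores the model's parameters recursively. The source of _load_from_state_dict, which it calls, is included at the end of this article.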

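First, be clear about the two variables involved. state_dict holds the parameters you saved earlier. local_state, inside _load_from_state_dict, holds the parameters and buffers owned directly by the current module of the model defined in your code.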

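Put simply, _load_from_state_dict works like this. Suppose we need to restore the parameter conv.weight. load first recurses into the conv submodule with the prefix "conv.". There, _load_from_state_dict walks over local_state and builds the full key conv.weight. If that key exists in state_dict and the shapes match, it runs param.copy_(input_param), which completes the copy of conv.weight. If the key is absent from state_dict, it is recorded in missing_keys. In the other direction, any key in state_dict that starts with the prefix, but whose next name component is neither a local parameter/buffer nor a child module, is recorded in unexpected_keys. In the version quoted here, both lists are filled only when strict is True.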
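Both bookkeeping lists can be observed directly. The sketch below assumes a recent PyTorch release, where `load_state_dict` returns a named tuple with `missing_keys` and `unexpected_keys` fields and collects them even with `strict=False`. The older source quoted in this article returns nothing, so this is not how that version behaves. The two `nn.Sequential` models are made up for the example.

```python
import torch
import torch.nn as nn

src = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 1))  # checkpoint has two layers
dst = nn.Sequential(nn.Linear(2, 2))                   # current model has only one

result = dst.load_state_dict(src.state_dict(), strict=False)
print(result.unexpected_keys)  # ['1.weight', '1.bias']: "1" is not a child of dst
print(result.missing_keys)     # []
print(torch.equal(dst[0].weight, src[0].weight))  # True: the matching key was copied

# The reverse direction produces missing keys instead
small = nn.Sequential(nn.Linear(2, 2))
big = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 1))
print(big.load_state_dict(small.state_dict(), strict=False).missing_keys)
# ['1.weight', '1.bias']
```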

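if strict:

This part checks whether the copying step above produced any unexpected_keys or missing_keys. If so, they are added to error_msgs and a RuntimeError is raised, so execution cannot continue. If strict=False, these key mismatches are ignored. Note, however, that the final check on error_msgs sits outside if strict, so errors recorded during copying, such as size mismatches, are raised even when strict=False.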
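The sketch below shows which errors `strict` controls and which ones it does not. It assumes a recent PyTorch, whose error messages keep the same wording as the source quoted below. `try_load` is a hypothetical helper that returns the error text, or `None` if loading succeeded.

```python
import torch.nn as nn


def try_load(dst, sd, strict):
    """Hypothetical helper: return the RuntimeError message, or None on success."""
    try:
        dst.load_state_dict(sd, strict=strict)
        return None
    except RuntimeError as e:
        return str(e)


sd = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 1)).state_dict()

# Extra keys ('1.weight', '1.bias') are fatal only when strict=True
msg_strict = try_load(nn.Sequential(nn.Linear(2, 2)), sd, strict=True)
msg_loose = try_load(nn.Sequential(nn.Linear(2, 2)), sd, strict=False)

# A shape mismatch lands in error_msgs, which is checked regardless of strict
msg_shape = try_load(nn.Linear(3, 2), nn.Linear(2, 2).state_dict(), strict=False)

print('Unexpected key(s) in state_dict' in msg_strict)  # True
print(msg_loose)                                        # None
print('size mismatch' in msg_shape)                     # True
```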

def load_state_dict(self, state_dict, strict=True):
    missing_keys = []
    unexpected_keys = []
    error_msgs = []

    # copy state_dict so _load_from_state_dict can modify it
    metadata = getattr(state_dict, '_metadata', None)
    state_dict = state_dict.copy()
    if metadata is not None:
        state_dict._metadata = metadata

    def load(module, prefix=''):
        local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
        module._load_from_state_dict(
            state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
        for name, child in module._modules.items():
            if child is not None:
                load(child, prefix + name + '.')

    load(self)

    if strict:
        error_msg = ''
        if len(unexpected_keys) > 0:
            error_msgs.insert(
                0, 'Unexpected key(s) in state_dict: {}. '.format(
                    ', '.join('"{}"'.format(k) for k in unexpected_keys)))
        if len(missing_keys) > 0:
            error_msgs.insert(
                0, 'Missing key(s) in state_dict: {}. '.format(
                    ', '.join('"{}"'.format(k) for k in missing_keys)))

    if len(error_msgs) > 0:
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
                           self.__class__.__name__, "\n\t".join(error_msgs)))

_load_from_state_dict

def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                          missing_keys, unexpected_keys, error_msgs):
    for hook in self._load_state_dict_pre_hooks.values():
        hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

    local_name_params = itertools.chain(self._parameters.items(), self._buffers.items())
    local_state = {k: v.data for k, v in local_name_params if v is not None}

    for name, param in local_state.items():
        key = prefix + name
        if key in state_dict:
            input_param = state_dict[key]

            # Backward compatibility: loading 1-dim tensor from 0.3.* to version 0.4+
            if len(param.shape) == 0 and len(input_param.shape) == 1:
                input_param = input_param[0]

            if input_param.shape != param.shape:
                # local shape should match the one in checkpoint
                error_msgs.append('size mismatch for {}: copying a param with shape {} from checkpoint, '
                                  'the shape in current model is {}.'
                                  .format(key, input_param.shape, param.shape))
                continue

            if isinstance(input_param, Parameter):
                # backwards compatibility for serialized parameters
                input_param = input_param.data
            try:
                param.copy_(input_param)
            except Exception:
                error_msgs.append('While copying the parameter named "{}", '
                                  'whose dimensions in the model are {} and '
                                  'whose dimensions in the checkpoint are {}.'
                                  .format(key, param.size(), input_param.size()))
        elif strict:
            missing_keys.append(key)

    if strict:
        for key, input_param in state_dict.items():
            if key.startswith(prefix):
                input_name = key[len(prefix):]
                input_name = input_name.split('.', 1)[0]  # get the name of param/buffer/child
                if input_name not in self._modules and input_name not in local_state:
                    unexpected_keys.append(key)
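The key matching in `_load_from_state_dict` can be isolated from the tensor copying. Below is a pure-Python sketch of just that bookkeeping, the `strict=True` branch of the version quoted above. `classify_keys` is a hypothetical helper that works on plain names instead of tensors, and the example keys (`conv.scale` and so on) are made up.

```python
def classify_keys(prefix, local_names, child_names, state_dict_keys):
    """Hypothetical helper mirroring the key bookkeeping of _load_from_state_dict
    (strict=True branch), using plain names instead of tensors."""
    keys = set(state_dict_keys)
    # Every parameter/buffer this module owns should appear as prefix + name.
    missing = [prefix + name for name in local_names if prefix + name not in keys]
    unexpected = []
    for key in state_dict_keys:
        if key.startswith(prefix):
            # First dotted component after the prefix: a param/buffer name or a child name.
            first = key[len(prefix):].split('.', 1)[0]
            if first not in child_names and first not in local_names:
                unexpected.append(key)
    return missing, unexpected


missing, unexpected = classify_keys(
    prefix='conv.',
    local_names=['weight', 'bias'],
    child_names=[],
    state_dict_keys=['conv.weight', 'conv.scale', 'fc.weight'],
)
print(missing)     # ['conv.bias']
print(unexpected)  # ['conv.scale']
```

`fc.weight` is ignored at this level because it does not start with `conv.`. It is checked when the recursion in `load` reaches the module that owns the `fc.` prefix.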
