One way to save and load a model in PyTorch is as follows:

# save
torch.save(model.state_dict(), PATH)

# load
model = MyModel(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()

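model.state_dict() actually returns an OrderedDict that maps the name of each parameter and buffer in the network to its tensor. Let's look at how the source code implements it.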
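A quick way to confirm this is to inspect the return value directly. The sketch below uses a small nn.Linear layer chosen only for illustration; it is not part of the original example.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

# A tiny layer used only for illustration.
layer = nn.Linear(3, 2)
sd = layer.state_dict()

# state_dict() returns an OrderedDict mapping names to tensors.
is_ordered = isinstance(sd, OrderedDict)
keys = list(sd.keys())
shapes = {k: tuple(v.shape) for k, v in sd.items()}

print(is_ordered)
print(keys)
print(shapes)
```

The keys are the attribute names under which the layer registered its parameters, which is what load_state_dict relies on later.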

state_dict

# torch.nn.modules.module.py

class Module(object):
    def state_dict(self, destination=None, prefix='', keep_vars=False):
        if destination is None:
            destination = OrderedDict()
            destination._metadata = OrderedDict()
        destination._metadata[prefix[:-1]] = local_metadata = dict(version=self._version)
        for name, param in self._parameters.items():
            if param is not None:
                destination[prefix + name] = param if keep_vars else param.data
        for name, buf in self._buffers.items():
            if buf is not None:
                destination[prefix + name] = buf if keep_vars else buf.data
        for name, module in self._modules.items():
            if module is not None:
                module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
        for hook in self._state_dict_hooks.values():
            hook_result = hook(self, destination, prefix, local_metadata)
            if hook_result is not None:
                destination = hook_result
        return destination

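As the code shows, state_dict iterates over four collections: _parameters, _buffers, _modules, and _state_dict_hooks. The differences among the first three were covered in an earlier article. The last one holds hooks to run when state_dict is read; it is usually empty, so we ignore it here. Also note that submodules are read recursively and that names are joined with '.', which makes it easy for load_state_dict to match parameters later.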
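The recursive naming scheme can be sketched without PyTorch at all. ToyModule below is a hypothetical stand-in for nn.Module that keeps only the two dictionaries needed to show how the prefix grows; it is an illustration, not the library's implementation.

```python
from collections import OrderedDict


class ToyModule:
    """Hypothetical stand-in for nn.Module, holding only names and values."""

    def __init__(self, params=None, children=None):
        self._parameters = OrderedDict(params or {})
        self._modules = OrderedDict(children or {})

    def state_dict(self, destination=None, prefix=""):
        if destination is None:
            destination = OrderedDict()
        # Own entries first, stored under prefix + name.
        for name, value in self._parameters.items():
            destination[prefix + name] = value
        # Then recurse into children, extending the prefix with "name.".
        for name, child in self._modules.items():
            child.state_dict(destination, prefix + name + ".")
        return destination


conv = ToyModule(params={"weight": 1.0, "bias": 0.5})
block = ToyModule(children={"conv": conv})
net = ToyModule(params={"scale": 2.0}, children={"block": block})

keys = list(net.state_dict().keys())
print(keys)  # ['scale', 'block.conv.weight', 'block.conv.bias']
```

Each level of nesting contributes one dotted component, so a key such as block.conv.weight encodes the full path from the root module to the tensor.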

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_tensor = torch.randn(1)  # tensor stored as a plain attribute of the model class
        self.register_buffer('my_buffer', torch.randn(1))  # tensor registered as a buffer
        self.my_param = nn.Parameter(torch.randn(1))
        self.fc = nn.Linear(2,2,bias=False)
        self.conv = nn.Conv2d(2,1,1)
        self.fc2 = nn.Linear(2,2,bias=False)
        self.f3 = self.fc

    def forward(self, x):
        return x

model = MyModel()
print(model.state_dict())

>>>OrderedDict([('my_param', tensor([-0.3052])), ('my_buffer', tensor([0.5583])), ('fc.weight', tensor([[ 0.6322, -0.0255],

        [-0.4747, -0.0530]])), ('conv.weight', tensor([[[[ 0.3346]],

         [[-0.2962]]]])), ('conv.bias', tensor([0.5205])), ('fc2.weight', tensor([[-0.4949,  0.2815],

        [ 0.3006,  0.0768]])), ('f3.weight', tensor([[ 0.6322, -0.0255],

        [-0.4747, -0.0530]]))])

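As the output shows, all three kinds of entries are present: the parameter registered directly on the model (my_param), the buffer (my_buffer), and the parameters of the submodules. The plain tensor attribute my_tensor is absent because it is neither a parameter nor a buffer, and f3.weight repeats the values of fc.weight because f3 is just another name for the same module.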
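These observations can be checked programmatically. The sketch below defines SmallModel, a trimmed-down variant of the model above introduced only for this check, and compares storage pointers to see whether the two names of the shared layer point at the same data.

```python
import torch
import torch.nn as nn


class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.my_tensor = torch.randn(1)                    # plain attribute
        self.register_buffer("my_buffer", torch.randn(1))  # buffer
        self.my_param = nn.Parameter(torch.randn(1))       # parameter
        self.fc = nn.Linear(2, 2, bias=False)              # submodule
        self.f3 = self.fc                                  # second name for fc


model = SmallModel()
sd = model.state_dict()
keys = list(sd.keys())

# A plain tensor attribute is never collected by state_dict().
has_plain_tensor = "my_tensor" in sd
# Both names refer to the same underlying storage.
shares_storage = sd["fc.weight"].data_ptr() == sd["f3.weight"].data_ptr()

print(keys)
print(has_plain_tensor, shares_storage)
```

If a tensor should be saved with the model but not trained, registering it as a buffer is the usual choice; a plain attribute is silently left out of the checkpoint.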

load_state_dict

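The code below can be read in two parts.

load(self)

This function restores the parameters recursively, module by module. The source of _load_from_state_dict, which it calls, is included at the end of this article.

First, to be clear: the variable state_dict holds the model parameters you saved earlier, while local_state inside _load_from_state_dict holds the parameters and buffers of the current module as defined in your code.

Put simply, suppose we want to restore the parameter conv.weight. load first handles the top-level module and then calls itself on the child conv with the prefix 'conv.'. For each name in that module's local_state, _load_from_state_dict builds the key prefix + name, here conv.weight, and looks it up in state_dict. If the key is present, it runs param.copy_(input_param), which completes the copy for conv.weight; if it is absent and strict is set, the key is added to missing_keys. Conversely, a key in state_dict whose first name component matches neither a child module nor an entry of local_state is added to unexpected_keys.

if strict:

This part checks whether any unexpected_keys or missing_keys were collected during the copy. If so, an error is raised and execution stops. With strict=False these key mismatches are ignored, although shape mismatches recorded in error_msgs still raise an error.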

def load_state_dict(self, state_dict, strict=True):
    missing_keys = []
    unexpected_keys = []
    error_msgs = []

    # copy state_dict so _load_from_state_dict can modify it
    metadata = getattr(state_dict, '_metadata', None)
    state_dict = state_dict.copy()
    if metadata is not None:
        state_dict._metadata = metadata

    def load(module, prefix=''):
        local_metadata = {} if metadata is None else metadata.get(prefix[:-1], {})
        module._load_from_state_dict(
            state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
        for name, child in module._modules.items():
            if child is not None:
                load(child, prefix + name + '.')

    load(self)

    if strict:
        error_msg = ''
        if len(unexpected_keys) > 0:
            error_msgs.insert(
                0, 'Unexpected key(s) in state_dict: {}. '.format(
                    ', '.join('"{}"'.format(k) for k in unexpected_keys)))
        if len(missing_keys) > 0:
            error_msgs.insert(
                0, 'Missing key(s) in state_dict: {}. '.format(
                    ', '.join('"{}"'.format(k) for k in missing_keys)))

    if len(error_msgs) > 0:
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
                           self.__class__.__name__, "\n\t".join(error_msgs)))
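The effect of the strict flag can be observed with a small experiment. The checkpoints below are hand-built dictionaries made up for illustration (one with a wrong key, one with a wrong shape); they are assumptions of this sketch, not files from a real training run.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2, bias=False)

# A checkpoint with one extra key and no 'weight' entry (constructed for illustration).
bad_keys = {"extra": torch.zeros(1)}

try:
    model.load_state_dict(bad_keys, strict=True)
    strict_raised = False
except RuntimeError:
    strict_raised = True

# With strict=False the key mismatches are tolerated.
model.load_state_dict(bad_keys, strict=False)

# A shape mismatch is reported even when strict=False.
bad_shape = {"weight": torch.zeros(3, 3)}
try:
    model.load_state_dict(bad_shape, strict=False)
    shape_raised = False
except RuntimeError:
    shape_raised = True

print(strict_raised, shape_raised)
```

This matches the listing above: the key checks sit inside the if strict branch, while the final check on error_msgs runs regardless of the flag.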

_load_from_state_dict

def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                          missing_keys, unexpected_keys, error_msgs):
    for hook in self._load_state_dict_pre_hooks.values():
        hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)

    local_name_params = itertools.chain(self._parameters.items(), self._buffers.items())
    local_state = {k: v.data for k, v in local_name_params if v is not None}

    for name, param in local_state.items():
        key = prefix + name
        if key in state_dict:
            input_param = state_dict[key]

            # Backward compatibility: loading 1-dim tensor from 0.3.* to version 0.4+
            if len(param.shape) == 0 and len(input_param.shape) == 1:
                input_param = input_param[0]

            if input_param.shape != param.shape:
                # local shape should match the one in checkpoint
                error_msgs.append('size mismatch for {}: copying a param with shape {} from checkpoint, '
                                  'the shape in current model is {}.'
                                  .format(key, input_param.shape, param.shape))
                continue

            if isinstance(input_param, Parameter):
                # backwards compatibility for serialized parameters
                input_param = input_param.data
            try:
                param.copy_(input_param)
            except Exception:
                error_msgs.append('While copying the parameter named "{}", '
                                  'whose dimensions in the model are {} and '
                                  'whose dimensions in the checkpoint are {}.'
                                  .format(key, param.size(), input_param.size()))
        elif strict:
            missing_keys.append(key)

    if strict:
        for key, input_param in state_dict.items():
            if key.startswith(prefix):
                input_name = key[len(prefix):]
                input_name = input_name.split('.', 1)[0]  # get the name of param/buffer/child
                if input_name not in self._modules and input_name not in local_state:
                    unexpected_keys.append(key)
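The final loop, which collects unexpected keys, is easy to misread, so here is a plain-Python sketch of the same test for a single module. find_unexpected_keys and the checkpoint key list are hypothetical names and data introduced for illustration only.

```python
def find_unexpected_keys(keys, prefix, child_names, local_names):
    """Hypothetical helper mirroring the strict-mode loop for a single module."""
    unexpected = []
    for key in keys:
        if key.startswith(prefix):
            # First name component after the prefix.
            input_name = key[len(prefix):].split(".", 1)[0]
            if input_name not in child_names and input_name not in local_names:
                unexpected.append(key)
    return unexpected


checkpoint_keys = ["my_param", "fc.weight", "conv.weight", "conv.bias", "old_head.weight"]

# Root module: prefix "", children fc and conv, one local parameter.
root_unexpected = find_unexpected_keys(
    checkpoint_keys, "", child_names={"fc", "conv"}, local_names={"my_param"})

# Child module conv: prefix "conv.", no children, local weight only.
conv_unexpected = find_unexpected_keys(
    checkpoint_keys, "conv.", child_names=set(), local_names={"weight"})

print(root_unexpected)  # ['old_head.weight']
print(conv_unexpected)  # ['conv.bias']
```

Each module only judges the first name component after its own prefix. The root accepts conv.bias because conv is one of its children, and it is the conv module itself that later flags the key as unexpected.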
