I've previously implemented an MLP, a CNN, and an LSTM in pure NumPy; this time it's a bigger model: BERT, again in pure NumPy. The main point is that inference can then run on a Raspberry Pi or any other board where PyTorch can't be installed.

The model is one I picked more or less at random on Hugging Face: a news classifier with 7 classes.

Here is the full list of model parameters. The details aren't that important in themselves; what matters is that the model alone takes 400+ MB of disk space (a quick size check is sketched right after the list).

bert.embeddings.word_embeddings.weight torch.Size([21128, 768])
bert.embeddings.position_embeddings.weight torch.Size([512, 768])
bert.embeddings.token_type_embeddings.weight torch.Size([2, 768])
bert.embeddings.LayerNorm.weight torch.Size([768])
bert.embeddings.LayerNorm.bias torch.Size([768])
bert.encoder.layer.0.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.0.attention.self.query.bias torch.Size([768])
bert.encoder.layer.0.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.0.attention.self.key.bias torch.Size([768])
bert.encoder.layer.0.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.0.attention.self.value.bias torch.Size([768])
bert.encoder.layer.0.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.0.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.0.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.0.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.0.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.0.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.0.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.0.output.dense.bias torch.Size([768])
bert.encoder.layer.0.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.0.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.1.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.1.attention.self.query.bias torch.Size([768])
bert.encoder.layer.1.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.1.attention.self.key.bias torch.Size([768])
bert.encoder.layer.1.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.1.attention.self.value.bias torch.Size([768])
bert.encoder.layer.1.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.1.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.1.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.1.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.1.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.1.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.1.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.1.output.dense.bias torch.Size([768])
bert.encoder.layer.1.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.1.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.2.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.2.attention.self.query.bias torch.Size([768])
bert.encoder.layer.2.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.2.attention.self.key.bias torch.Size([768])
bert.encoder.layer.2.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.2.attention.self.value.bias torch.Size([768])
bert.encoder.layer.2.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.2.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.2.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.2.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.2.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.2.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.2.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.2.output.dense.bias torch.Size([768])
bert.encoder.layer.2.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.2.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.3.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.3.attention.self.query.bias torch.Size([768])
bert.encoder.layer.3.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.3.attention.self.key.bias torch.Size([768])
bert.encoder.layer.3.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.3.attention.self.value.bias torch.Size([768])
bert.encoder.layer.3.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.3.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.3.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.3.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.3.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.3.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.3.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.3.output.dense.bias torch.Size([768])
bert.encoder.layer.3.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.3.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.4.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.4.attention.self.query.bias torch.Size([768])
bert.encoder.layer.4.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.4.attention.self.key.bias torch.Size([768])
bert.encoder.layer.4.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.4.attention.self.value.bias torch.Size([768])
bert.encoder.layer.4.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.4.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.4.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.4.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.4.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.4.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.4.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.4.output.dense.bias torch.Size([768])
bert.encoder.layer.4.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.4.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.5.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.5.attention.self.query.bias torch.Size([768])
bert.encoder.layer.5.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.5.attention.self.key.bias torch.Size([768])
bert.encoder.layer.5.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.5.attention.self.value.bias torch.Size([768])
bert.encoder.layer.5.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.5.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.5.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.5.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.5.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.5.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.5.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.5.output.dense.bias torch.Size([768])
bert.encoder.layer.5.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.5.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.6.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.6.attention.self.query.bias torch.Size([768])
bert.encoder.layer.6.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.6.attention.self.key.bias torch.Size([768])
bert.encoder.layer.6.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.6.attention.self.value.bias torch.Size([768])
bert.encoder.layer.6.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.6.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.6.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.6.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.6.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.6.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.6.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.6.output.dense.bias torch.Size([768])
bert.encoder.layer.6.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.6.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.7.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.7.attention.self.query.bias torch.Size([768])
bert.encoder.layer.7.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.7.attention.self.key.bias torch.Size([768])
bert.encoder.layer.7.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.7.attention.self.value.bias torch.Size([768])
bert.encoder.layer.7.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.7.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.7.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.7.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.7.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.7.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.7.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.7.output.dense.bias torch.Size([768])
bert.encoder.layer.7.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.7.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.8.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.8.attention.self.query.bias torch.Size([768])
bert.encoder.layer.8.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.8.attention.self.key.bias torch.Size([768])
bert.encoder.layer.8.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.8.attention.self.value.bias torch.Size([768])
bert.encoder.layer.8.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.8.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.8.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.8.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.8.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.8.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.8.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.8.output.dense.bias torch.Size([768])
bert.encoder.layer.8.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.8.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.9.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.9.attention.self.query.bias torch.Size([768])
bert.encoder.layer.9.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.9.attention.self.key.bias torch.Size([768])
bert.encoder.layer.9.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.9.attention.self.value.bias torch.Size([768])
bert.encoder.layer.9.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.9.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.9.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.9.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.9.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.9.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.9.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.9.output.dense.bias torch.Size([768])
bert.encoder.layer.9.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.9.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.10.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.10.attention.self.query.bias torch.Size([768])
bert.encoder.layer.10.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.10.attention.self.key.bias torch.Size([768])
bert.encoder.layer.10.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.10.attention.self.value.bias torch.Size([768])
bert.encoder.layer.10.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.10.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.10.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.10.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.10.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.10.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.10.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.10.output.dense.bias torch.Size([768])
bert.encoder.layer.10.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.10.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.11.attention.self.query.weight torch.Size([768, 768])
bert.encoder.layer.11.attention.self.query.bias torch.Size([768])
bert.encoder.layer.11.attention.self.key.weight torch.Size([768, 768])
bert.encoder.layer.11.attention.self.key.bias torch.Size([768])
bert.encoder.layer.11.attention.self.value.weight torch.Size([768, 768])
bert.encoder.layer.11.attention.self.value.bias torch.Size([768])
bert.encoder.layer.11.attention.output.dense.weight torch.Size([768, 768])
bert.encoder.layer.11.attention.output.dense.bias torch.Size([768])
bert.encoder.layer.11.attention.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.11.attention.output.LayerNorm.bias torch.Size([768])
bert.encoder.layer.11.intermediate.dense.weight torch.Size([3072, 768])
bert.encoder.layer.11.intermediate.dense.bias torch.Size([3072])
bert.encoder.layer.11.output.dense.weight torch.Size([768, 3072])
bert.encoder.layer.11.output.dense.bias torch.Size([768])
bert.encoder.layer.11.output.LayerNorm.weight torch.Size([768])
bert.encoder.layer.11.output.LayerNorm.bias torch.Size([768])
bert.pooler.dense.weight torch.Size([768, 768])
bert.pooler.dense.bias torch.Size([768])
classifier.weight torch.Size([7, 768])
classifier.bias torch.Size([7])
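
That 400+ MB figure is easy to sanity-check by summing the array sizes in the exported parameter file. A minimal sketch, assuming the export script shown further down has already written bert_model_params.npz:

import numpy as np

params = np.load('bert_model_params.npz')
n_params = sum(params[name].size for name in params.files)
n_bytes = sum(params[name].nbytes for name in params.files)
print(f"{n_params / 1e6:.1f}M parameters, {n_bytes / 2**20:.0f} MiB as float32")
# for this checkpoint that comes to roughly 102M parameters, i.e. ~400 MB stored as float32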

Getting the NumPy BERT to work took two days of stepping into pits, comparing against the Hugging Face source code step by step. It was genuinely hard~~~

Below is the BERT code implemented in NumPy. The score differs from Hugging Face's by a tiny amount, probably because the model is so large that small errors in the saved parameters accumulate!
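
For anyone repeating that step-by-step comparison, one way to do it is to pull the reference hidden states out of the Hugging Face model and diff them against the corresponding NumPy arrays. A minimal sketch, where numpy_layer_output is a placeholder for whatever intermediate result you are checking:

import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')

inputs = tokenizer('马拉松决赛', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, hidden_states[i] the output of encoder layer i
reference = outputs.hidden_states[1].numpy()
# numpy_layer_output: the matching array from the NumPy implementation (placeholder)
# print(np.abs(reference - numpy_layer_output).max())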

Reading the code below is genuinely helpful for understanding the BERT model structure directly; every detail is simple and in the right place. I'm honestly quite pleased with myself for digging into this~~~

import numpy as np

def word_embedding(input_ids, word_embeddings):
    return word_embeddings[input_ids]


def position_embedding(position_ids, position_embeddings):
    return position_embeddings[position_ids]


def token_type_embedding(token_type_ids, token_type_embeddings):
    return token_type_embeddings[token_type_ids]


def softmax(x, axis=None):
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    sum_ex = np.sum(e_x, axis=axis, keepdims=True).astype(np.float32)
    return e_x / sum_ex


def scaled_dot_product_attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = np.matmul(Q, K.transpose(0, 2, 1)) / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, scores, np.full_like(scores, -np.inf))
    attention_weights = softmax(scores, axis=-1)
    output = np.matmul(attention_weights, V)
    return output, attention_weights


def multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O):
    q = np.matmul(input, W_Q.T) + B_Q
    k = np.matmul(input, W_K.T) + B_K
    v = np.matmul(input, W_V.T) + B_V
    # split the projections into num_heads heads
    q = np.split(q, num_heads, axis=-1)
    k = np.split(k, num_heads, axis=-1)
    v = np.split(v, num_heads, axis=-1)
    outputs = []
    for q_, k_, v_ in zip(q, k, v):
        output, attention_weights = scaled_dot_product_attention(q_, k_, v_)
        outputs.append(output)
    outputs = np.concatenate(outputs, axis=-1)
    outputs = np.matmul(outputs, W_O.T) + B_O
    return outputs


def layer_normalization(x, weight, bias, eps=1e-12):
    mean = np.mean(x, axis=-1, keepdims=True)
    variance = np.var(x, axis=-1, keepdims=True)
    std = np.sqrt(variance + eps)
    normalized_x = (x - mean) / std
    return weight * normalized_x + bias


def feed_forward_layer(inputs, weight, bias, activation='relu'):
    linear_output = np.matmul(inputs, weight) + bias
    if activation == 'relu':
        activated_output = np.maximum(0, linear_output)  # ReLU
    elif activation == 'gelu':
        # GELU (tanh approximation)
        activated_output = 0.5 * linear_output * (1 + np.tanh(np.sqrt(2 / np.pi) * (linear_output + 0.044715 * np.power(linear_output, 3))))
    elif activation == 'tanh':
        activated_output = np.tanh(linear_output)
    else:
        activated_output = linear_output  # no activation
    return activated_output


def residual_connection(inputs, residual):
    # residual connection
    return inputs + residual


def tokenize_sentence(sentence, vocab_file='vocab.txt'):
    with open(vocab_file, 'r', encoding='utf-8') as f:
        vocab = [line.strip() for line in f.readlines()]
    # character-level tokens, with [CLS] prepended and [SEP] appended
    tokenized_sentence = ['[CLS]'] + list(sentence) + ['[SEP]']
    token_ids = [vocab.index(token) for token in tokenized_sentence]
    return token_ids


# load the saved model parameters
model_data = np.load('bert_model_params.npz')
word_embeddings = model_data["bert.embeddings.word_embeddings.weight"]
position_embeddings = model_data["bert.embeddings.position_embeddings.weight"]
token_type_embeddings = model_data["bert.embeddings.token_type_embeddings.weight"]


def model_input(sentence):
    token_ids = tokenize_sentence(sentence)
    input_ids = np.array(token_ids)                  # token ids
    word_embedded = word_embedding(input_ids, word_embeddings)
    position_ids = np.array(range(len(input_ids)))   # position ids
    # position embedding matrix of shape (max_position, embedding_size)
    position_embedded = position_embedding(position_ids, position_embeddings)
    token_type_ids = np.array([0] * len(input_ids))  # segment ids
    # token type embedding matrix of shape (num_token_types, embedding_size)
    token_type_embedded = token_type_embedding(token_type_ids, token_type_embeddings)
    embedding_output = np.expand_dims(word_embedded + position_embedded + token_type_embedded, axis=0)
    return embedding_output


def bert(input, num_heads):
    ebd_LayerNorm_weight = model_data['bert.embeddings.LayerNorm.weight']
    ebd_LayerNorm_bias = model_data['bert.embeddings.LayerNorm.bias']
    input = layer_normalization(input, ebd_LayerNorm_weight, ebd_LayerNorm_bias)  # matches the HF model output here

    for i in range(12):
        # weights of the i-th encoder layer
        W_Q = model_data['bert.encoder.layer.{}.attention.self.query.weight'.format(i)]
        B_Q = model_data['bert.encoder.layer.{}.attention.self.query.bias'.format(i)]
        W_K = model_data['bert.encoder.layer.{}.attention.self.key.weight'.format(i)]
        B_K = model_data['bert.encoder.layer.{}.attention.self.key.bias'.format(i)]
        W_V = model_data['bert.encoder.layer.{}.attention.self.value.weight'.format(i)]
        B_V = model_data['bert.encoder.layer.{}.attention.self.value.bias'.format(i)]
        W_O = model_data['bert.encoder.layer.{}.attention.output.dense.weight'.format(i)]
        B_O = model_data['bert.encoder.layer.{}.attention.output.dense.bias'.format(i)]
        attention_output_LayerNorm_weight = model_data['bert.encoder.layer.{}.attention.output.LayerNorm.weight'.format(i)]
        attention_output_LayerNorm_bias = model_data['bert.encoder.layer.{}.attention.output.LayerNorm.bias'.format(i)]
        intermediate_weight = model_data['bert.encoder.layer.{}.intermediate.dense.weight'.format(i)]
        intermediate_bias = model_data['bert.encoder.layer.{}.intermediate.dense.bias'.format(i)]
        dense_weight = model_data['bert.encoder.layer.{}.output.dense.weight'.format(i)]
        dense_bias = model_data['bert.encoder.layer.{}.output.dense.bias'.format(i)]
        output_LayerNorm_weight = model_data['bert.encoder.layer.{}.output.LayerNorm.weight'.format(i)]
        output_LayerNorm_bias = model_data['bert.encoder.layer.{}.output.LayerNorm.bias'.format(i)]

        # multi-head self-attention + residual + LayerNorm
        output = multihead_attention(input, num_heads, W_Q, B_Q, W_K, B_K, W_V, B_V, W_O, B_O)
        output = residual_connection(input, output)
        output1 = layer_normalization(output, attention_output_LayerNorm_weight, attention_output_LayerNorm_bias)  # matches the HF model output here

        # feed-forward block + residual + LayerNorm
        output = feed_forward_layer(output1, intermediate_weight.T, intermediate_bias, activation='gelu')
        output = feed_forward_layer(output, dense_weight.T, dense_bias, activation='')
        output = residual_connection(output1, output)
        output2 = layer_normalization(output, output_LayerNorm_weight, output_LayerNorm_bias)  # matches
        input = output2

    # pooler: dense + tanh
    bert_pooler_dense_weight = model_data['bert.pooler.dense.weight']
    bert_pooler_dense_bias = model_data['bert.pooler.dense.bias']
    output = feed_forward_layer(output2, bert_pooler_dense_weight.T, bert_pooler_dense_bias, activation='tanh')  # matches
    return output


id2label = {0: 'mainland China politics', 1: 'Hong Kong - Macau politics', 2: 'International news',
            3: 'financial news', 4: 'culture', 5: 'entertainment', 6: 'sports'}

if __name__ == "__main__":
    # example usage
    sentence = '马拉松决赛'
    output = bert(model_input(sentence), num_heads=12)
    classifier_weight = model_data['classifier.weight']
    classifier_bias = model_data['classifier.bias']
    # classification head on the [CLS] position
    output = feed_forward_layer(output[:, 0, :], classifier_weight.T, classifier_bias, activation='')
    output = softmax(output, axis=-1)
    label_id = np.argmax(output, axis=-1)
    label_score = output[0][label_id]
    print(id2label[label_id[0]], label_score)
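
One small note for slow boards like the Raspberry Pi: tokenize_sentence calls vocab.index(token) for every character, which is a linear scan over the 21128-entry vocabulary. A dict lookup does the same job in constant time. A possible variant, not part of the original code (it also maps out-of-vocabulary characters to [UNK] instead of raising an error):

def tokenize_sentence_fast(sentence, vocab_file='vocab.txt'):
    # build a token -> id dict once instead of scanning the list for every token
    with open(vocab_file, 'r', encoding='utf-8') as f:
        vocab = {line.strip(): idx for idx, line in enumerate(f)}
    tokens = ['[CLS]'] + list(sentence) + ['[SEP]']
    unk_id = vocab.get('[UNK]')
    return [vocab.get(token, unk_id) for token in tokens]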

The checkpoint is someone else's fine-tuned model from Hugging Face, a RoBERTa model for 7-class news classification. The script below runs it once through the transformers pipeline and then saves its parameters in NumPy format so the code above can load them.

import numpy as np
from transformers import AutoModelForSequenceClassification,AutoTokenizer,pipeline
model = AutoModelForSequenceClassification.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-finetuned-chinanews-chinese')
text_classification = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
print(text_classification("马拉松决赛"))

# print the shapes of the BERT model weights
for name, param in model.named_parameters():
    print(name, param.data.shape)

# save the model parameters in NumPy format
model_params = {name: param.data.cpu().numpy() for name, param in model.named_parameters()}
np.savez('bert_model_params.npz', **model_params)
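
One thing this export snippet does not produce is the vocab.txt that the NumPy script reads for tokenization. Assuming the standard transformers tokenizer API, saving the tokenizer next to the .npz should cover it (a sketch; the exact files written depend on the tokenizer class):

# for this BERT-style tokenizer this writes vocab.txt (plus tokenizer config files) into the current directory
tokenizer.save_pretrained('.')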

Comparing the two results:

Hugging Face: [{'label': 'sports', 'score': 0.9929242134094238}]

NumPy: sports [0.9928773]
