Technical Background

This article describes how to convert a bin-format model file on Hugging Face into a safetensors-format model file and download it locally.

Converting bin to safetensors

First, install safetensors:

$ python3 -m pip install safetensors --upgrade

Then clone the safetensors repository from GitHub:

$ git clone https://github.com/huggingface/safetensors.git

Cloning into 'safetensors'...

remote: Enumerating objects: 4812, done.

remote: Counting objects: 100% (1486/1486), done.

remote: Compressing objects: 100% (406/406), done.

remote: Total 4812 (delta 1340), reused 1082 (delta 1079), pack-reused 3326 (from 2)

Receiving objects: 100% (4812/4812), 1.15 MiB | 1.22 MiB/s, done.

Resolving deltas: 100% (2457/2457), done.

Enter the Python bindings subdirectory:

$ cd safetensors/bindings/python/

$ ll

total 84

drwxrwxr-x 6 dechin dechin  4096 Feb 21 16:37 ./

drwxrwxr-x 3 dechin dechin  4096 Feb 21 16:37 ../

drwxrwxr-x 2 dechin dechin  4096 Feb 21 16:37 benches/

-rw-rw-r-- 1 dechin dechin   476 Feb 21 16:37 Cargo.toml

-rw-rw-r-- 1 dechin dechin  1454 Feb 21 16:37 convert_all.py

-rw-rw-r-- 1 dechin dechin 14769 Feb 21 16:37 convert.py

-rw-rw-r-- 1 dechin dechin   729 Feb 21 16:37 fuzz.py

-rw-rw-r-- 1 dechin dechin   685 Feb 21 16:37 .gitignore

-rw-rw-r-- 1 dechin dechin  1103 Feb 21 16:37 Makefile

-rw-rw-r-- 1 dechin dechin   190 Feb 21 16:37 MANIFEST.in

-rw-rw-r-- 1 dechin dechin  2419 Feb 21 16:37 pyproject.toml

drwxrwxr-x 3 dechin dechin  4096 Feb 21 16:37 py_src/

-rw-rw-r-- 1 dechin dechin   852 Feb 21 16:37 README.md

-rw-rw-r-- 1 dechin dechin   891 Feb 21 16:37 setup.cfg

drwxrwxr-x 2 dechin dechin  4096 Feb 21 16:37 src/

-rw-rw-r-- 1 dechin dechin  5612 Feb 21 16:37 stub.py

drwxrwxr-x 3 dechin dechin  4096 Feb 21 16:37 tests/

Among these files is convert.py, a format conversion script. Check its usage:

$ python3 convert.py --help

usage: convert.py [-h] [--revision REVISION] [--force] [-y] model_id

Simple utility tool to convert automatically some weights on the hub to `safetensors` format. It is PyTorch

exclusive for now. It works by downloading the weights (PT), converting them locally, and uploading them back as a

PR on the hub.

positional arguments:

  model_id             The name of the model on the hub to convert. E.g. `gpt2` or `facebook/wav2vec2-base-960h`

options:

  -h, --help           show this help message and exit

  --revision REVISION  The revision to convert

  --force              Create the PR even if it already exists of if the model was already converted.

  -y                   Ignore safety prompt

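This script takes a model ID on the Hub (not a local path), downloads the weights, converts them to safetensors locally, and uploads the result back as a PR. Running it directly, however, fails with an error: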

$ python3 convert.py --force -y Salesforce/blip-image-captioning-base

config.json: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 15.3MB/s]

pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:06<00:00, 7.82MB/s]

Traceback (most recent call last):

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status

    response.raise_for_status()

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status

    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/datb/DeepSeek/safetensors/bindings/python/convert.py", line 369, in <module>

    commit_info, errors = convert(api, model_id, revision=args.revision, force=args.force)

  File "/datb/DeepSeek/safetensors/bindings/python/convert.py", line 313, in convert

    new_pr = api.create_commit(

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn

    return fn(*args, **kwargs)

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 1524, in _inner

    return fn(self, *args, **kwargs)

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 3961, in create_commit

    self.preupload_lfs_files(

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/hf_api.py", line 4184, in preupload_lfs_files

    _fetch_upload_modes(

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn

    return fn(*args, **kwargs)

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/_commit_api.py", line 542, in _fetch_upload_modes

    hf_raise_for_status(resp)

  File "/home/dechin/anaconda3/envs/llama/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status

    raise _format(RepositoryNotFoundError, message, response) from e

huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67b83d70-5af65b805cde0ba55c72abd1;ce2be295-8da7-4230-a8db-f505919ddb85)

Repository Not Found for url: https://huggingface.co/api/models/Salesforce/blip-image-captioning-base/preupload/main?create_pr=1.

Please make sure you specified the correct `repo_id` and `repo_type`.

If you are trying to access a private or gated repo, make sure you are authenticated.

Invalid username or password.

Note: Creating a commit assumes that the repo already exists on the Huggingface Hub. Please use `create_repo` if it's not the case.

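The 401 error arises because uploading the converted weights back to the Hub as a PR requires authentication. We therefore first register a Hugging Face account and then obtain an access token for free (tokens are created under Access Tokens in the account settings).

Add the token login as the first two lines of convert.py: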

$ head -n 2 convert.py

from huggingface_hub import login

login("your_token")
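Hard-coding the token in convert.py works, but the token then sits in a file that is easy to share or commit by accident. A minimal sketch of an alternative, assuming the token has been exported in an environment variable (HF_TOKEN is used here; the helper names read_token and login_from_env are hypothetical and not part of convert.py):

```python
import os


def read_token(env=None, var="HF_TOKEN"):
    """Return the access token stored in the given environment mapping."""
    env = os.environ if env is None else env
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; export your Hugging Face token first")
    return token


def login_from_env():
    """Log in with the token from the environment instead of a literal string."""
    # Imported lazily so read_token() can be used without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_token())
```

Calling login_from_env() at the top of convert.py would take the place of the login("your_token") line. Since the script opens a PR, the token probably needs write permission rather than read-only access.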

Run the conversion script again:

$ python3 convert.py --force -y Salesforce/blip-image-captioning-base

config.json: 100%|█████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 16.2MB/s]

pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 990M/990M [02:04<00:00, 7.94MB/s]

Pr created at https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42

### Success

Yay! This model was successfully converted and a PR was open using your token, here:

[https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42](https://huggingface.co/Salesforce/blip-image-captioning-base/discussions/42)

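The safetensors file was built successfully and a Pull Request was submitted. Even if this PR is never merged, we can still download the model files from our own PR.

Downloading the repository from HF

We can use git-lfs to download the model files from the Hugging Face PR. First, clone the main branch with only the small (non-LFS) files. Setting GIT_LFS_SKIP_SMUDGE=1 for this command leaves every LFS-tracked file as a small pointer instead of downloading its content: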

$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/Salesforce/blip-image-captioning-base

Cloning into 'blip-image-captioning-base'...

remote: Enumerating objects: 76, done.

remote: Counting objects: 100% (76/76), done.

remote: Compressing objects: 100% (38/38), done.

remote: Total 76 (delta 39), reused 72 (delta 37), pack-reused 0 (from 0)

Unpacking objects: 100% (76/76), 323.20 KiB | 1.05 MiB/s, done.

With the lightweight files downloaded, enter the cloned directory:
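The same clone can be scripted. A minimal sketch, assuming git and git-lfs are installed; the helper names lfs_skip_env and clone_without_lfs are hypothetical. The variable is set only in the child process's environment, so later git commands run from the shell still download LFS content as usual:

```python
import os
import subprocess


def lfs_skip_env(base=None):
    """Return a copy of the environment with LFS smudging disabled."""
    env = dict(os.environ if base is None else base)
    env["GIT_LFS_SKIP_SMUDGE"] = "1"
    return env


def clone_without_lfs(repo_url, dest=None):
    """Clone repo_url, leaving LFS-tracked files as small pointer files."""
    cmd = ["git", "clone", repo_url]
    if dest is not None:
        cmd.append(dest)
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(cmd, env=lfs_skip_env(), check=True)
```

For example, clone_without_lfs("https://huggingface.co/Salesforce/blip-image-captioning-base") corresponds to the shell command above.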

$ cd blip-image-captioning-base/

$ ll

total 976

drwxrwxr-x 3 dechin dechin   4096 Feb 21 17:22 ./

drwxrwxr-x 3 dechin dechin   4096 Feb 21 17:22 ../

-rw-rw-r-- 1 dechin dechin   4563 Feb 21 17:22 config.json

drwxrwxr-x 9 dechin dechin   4096 Feb 21 17:22 .git/

-rw-rw-r-- 1 dechin dechin   1477 Feb 21 17:22 .gitattributes

-rw-rw-r-- 1 dechin dechin    287 Feb 21 17:22 preprocessor_config.json

-rw-rw-r-- 1 dechin dechin    134 Feb 21 17:22 pytorch_model.bin

-rw-rw-r-- 1 dechin dechin   6359 Feb 21 17:22 README.md

-rw-rw-r-- 1 dechin dechin    125 Feb 21 17:22 special_tokens_map.json

-rw-rw-r-- 1 dechin dechin    134 Feb 21 17:22 tf_model.h5

-rw-rw-r-- 1 dechin dechin    506 Feb 21 17:22 tokenizer_config.json

-rw-rw-r-- 1 dechin dechin 711396 Feb 21 17:22 tokenizer.json

-rw-rw-r-- 1 dechin dechin 231508 Feb 21 17:22 vocab.txt
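The 134-byte pytorch_model.bin and tf_model.h5 entries in this listing are Git LFS pointer files, not weights. A pointer is a short text file whose first line names the LFS spec, followed by oid and size lines. A minimal sketch for telling a pointer from real content, using only the standard library (the function names parse_lfs_pointer and read_pointer are hypothetical):

```python
LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(data):
    """Return {'oid': ..., 'size': ...} if data is an LFS pointer, else None."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary content, so not a pointer
    lines = text.splitlines()
    if not lines or lines[0] != LFS_SPEC_LINE:
        return None
    # Remaining lines are "key value" pairs such as "oid sha256:<hex>" and "size <bytes>".
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    if "oid" not in fields or not fields.get("size", "").isdigit():
        return None
    return {"oid": fields["oid"], "size": int(fields["size"])}


def read_pointer(path, max_bytes=1024):
    """Read only the start of the file, so large real weights are never loaded."""
    with open(path, "rb") as fh:
        return parse_lfs_pointer(fh.read(max_bytes))
```

read_pointer("pytorch_model.bin") would return the recorded hash and the size of the real file when run in this directory, and None once the actual content has been fetched.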

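As the 134-byte pytorch_model.bin and tf_model.h5 show, the large model files have not been downloaded at this point. Next, from this directory, pull the contents of our own PR branch: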

$ git pull origin refs/pr/42

remote: Enumerating objects: 4, done.

remote: Counting objects: 100% (4/4), done.

remote: Compressing objects: 100% (3/3), done.

remote: Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)

Unpacking objects: 100% (3/3), 1.42 KiB | 1.42 MiB/s, done.

From https://huggingface.co/Salesforce/blip-image-captioning-base

 * branch            refs/pr/42 -> FETCH_HEAD

Updating 82a3776..0f2f8a0

Fast-forward

 model.safetensors | 3 +++

 1 file changed, 3 insertions(+)

 create mode 100644 model.safetensors
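The ref used in the pull can be derived from the PR URL that convert.py printed. A minimal sketch, assuming the URL has the form https://huggingface.co/<repo_id>/discussions/<number>; the helper names pr_ref_from_url and download_from_pr are hypothetical. The second helper assumes that hf_hub_download from huggingface_hub accepts a PR ref as its revision argument, which would fetch a single file without git:

```python
import re

PR_URL = re.compile(
    r"^https://huggingface\.co/(?P<repo_id>[^/]+(?:/[^/]+)?)/discussions/(?P<num>\d+)/?$"
)


def pr_ref_from_url(url):
    """Split a Hub PR URL into (repo_id, git ref), e.g. refs/pr/42."""
    match = PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a Hub discussion/PR URL: {url!r}")
    return match.group("repo_id"), f"refs/pr/{match.group('num')}"


def download_from_pr(url, filename="model.safetensors"):
    """Fetch one file from the PR revision (assumption: revision accepts refs/pr/N)."""
    # Imported lazily so pr_ref_from_url() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, ref = pr_ref_from_url(url)
    return hf_hub_download(repo_id=repo_id, filename=filename, revision=ref)
```

The ref returned by pr_ref_from_url is what follows origin in the git pull command.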

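The argument after origin is the Git reference of our PR, refs/pr/<number>, where 42 is the PR number shown in the PR's URL on Hugging Face. Because GIT_LFS_SKIP_SMUDGE was set only for the clone command, this pull downloads the actual LFS content of the new file. Check the local directory again: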

$ ll

total 967508

drwxrwxr-x 3 dechin dechin      4096 Feb 21 17:24 ./

drwxrwxr-x 3 dechin dechin      4096 Feb 21 17:22 ../

-rw-rw-r-- 1 dechin dechin      4563 Feb 21 17:22 config.json

drwxrwxr-x 9 dechin dechin      4096 Feb 21 17:24 .git/

-rw-rw-r-- 1 dechin dechin      1477 Feb 21 17:22 .gitattributes

-rw-rw-r-- 1 dechin dechin 989721336 Feb 21 17:24 model.safetensors

-rw-rw-r-- 1 dechin dechin       287 Feb 21 17:22 preprocessor_config.json

-rw-rw-r-- 1 dechin dechin       134 Feb 21 17:22 pytorch_model.bin

-rw-rw-r-- 1 dechin dechin      6359 Feb 21 17:22 README.md

-rw-rw-r-- 1 dechin dechin       125 Feb 21 17:22 special_tokens_map.json

-rw-rw-r-- 1 dechin dechin       134 Feb 21 17:22 tf_model.h5

-rw-rw-r-- 1 dechin dechin       506 Feb 21 17:22 tokenizer_config.json

-rw-rw-r-- 1 dechin dechin    711396 Feb 21 17:22 tokenizer.json

-rw-rw-r-- 1 dechin dechin    231508 Feb 21 17:22 vocab.txt

model.safetensors has been downloaded in full (989721336 bytes, about 990 MB), while pytorch_model.bin and tf_model.h5 remain 134-byte pointers. This completes the process of converting the model format online and downloading the result locally.
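A quick sanity check on the downloaded file is to read its header. A minimal sketch using only the standard library, relying on the safetensors layout of an 8-byte little-endian header length, a JSON header, and then the raw tensor bytes (the function names are hypothetical):

```python
import json
import struct


def read_safetensors_header(fh):
    """Read the JSON header from a binary file object positioned at offset 0."""
    (header_len,) = struct.unpack("<Q", fh.read(8))
    header = json.loads(fh.read(header_len).decode("utf-8"))
    return header_len, header


def expected_file_size(header_len, header):
    """Total size implied by the header: prefix + header + tensor data."""
    ends = [
        info["data_offsets"][1]
        for name, info in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    ]
    return 8 + header_len + max(ends, default=0)
```

Opening model.safetensors with open("model.safetensors", "rb") and passing the file object to read_safetensors_header lists every tensor name with its dtype, shape, and byte offsets. If expected_file_size disagrees with the size on disk, the download is probably incomplete or the file is still an LFS pointer.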

Summary

This article described a way to convert a bin-format large model file on Hugging Face to the safetensors format online and then download it locally.

Copyright Notice

This article was first published at: https://www.cnblogs.com/dechinphy/p/bin-safetensors.html

Author ID: DechinPhy

More original articles: https://www.cnblogs.com/dechinphy/

Buy the author a coffee: https://www.cnblogs.com/dechinphy/gallery/image/379634.html
