DeepSeek 多模态模型 Janus-Pro 本地部署
一、概述
Janus-Pro是DeepSeek最新开源的多模态模型,是一种新颖的自回归框架,统一了多模态理解和生成。通过将视觉编码解耦为独立的路径,同时仍然使用单一的、统一的变压器架构进行处理,该框架解决了先前方法的局限性。这种解耦不仅缓解了视觉编码器在理解和生成中的角色冲突,还增强了框架的灵活性。Janus-Pro 超过了以前的统一模型,并且匹配或超过了特定任务模型的性能。
代码链接:https://github.com/deepseek-ai/Janus
模型链接:https://modelscope.cn/collections/Janus-Pro-0f5e48f6b96047
体验页面:https://modelscope.cn/studios/AI-ModelScope/Janus-Pro-7B
二、虚拟环境
环境说明
本文使用WSL2运行的ubuntu系统来进行演示,参考链接:https://www.cnblogs.com/xiao987334176/p/18864140
创建虚拟环境
conda create --name vll-Janus-Pro-7B python=3.12.7
激活虚拟环境,执行命令:
conda activate vll-Janus-Pro-7B
查看CUDA版本,执行命令:
# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
三、安装Janus-Pro
创建项目目录
mkdir vllm
cd vllm
克隆代码
git clone https://github.com/deepseek-ai/Janus
安装依赖包,注意:这里要手动安装pytorch,指定版本。
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
安装其他依赖组件
pip3 install transformers attrdict einops timm
下载模型
可以用modelscope下载,安装modelscope,命令如下:
pip install modelscope modelscope download --model deepseek-ai/Janus-Pro-7B
效果如下:
# modelscope download --model deepseek-ai/Janus-Pro-7B
Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/deepseek-ai/Janus-Pro-7B
Downloading [config.json]: 100%|███████████████████████████████████████████████████| 1.42k/1.42k [00:00<00:00, 5.29kB/s]
Downloading [configuration.json]: 100%|████████████████████████████████████████████████| 68.0/68.0 [00:00<00:00, 221B/s]
Downloading [README.md]: 100%|█████████████████████████████████████████████████████| 2.49k/2.49k [00:00<00:00, 7.20kB/s]
Downloading [processor_config.json]: 100%|███████████████████████████████████████████████| 210/210 [00:00<00:00, 590B/s]
Downloading [janus_pro_teaser1.png]: 100%|██████████████████████████████████████████| 95.7k/95.7k [00:00<00:00, 267kB/s]
Downloading [preprocessor_config.json]: 100%|████████████████████████████████████████████| 346/346 [00:00<00:00, 867B/s]
Downloading [janus_pro_teaser2.png]: 100%|███████████████████████████████████████████| 518k/518k [00:00<00:00, 1.18MB/s]
Downloading [special_tokens_map.json]: 100%|███████████████████████████████████████████| 344/344 [00:00<00:00, 1.50kB/s]
Downloading [tokenizer_config.json]: 100%|███████████████████████████████████████████████| 285/285 [00:00<00:00, 926B/s]
Downloading [pytorch_model.bin]: 0%|▏ | 16.0M/3.89G [00:00<03:55, 17.7MB/s]
Downloading [tokenizer.json]: 100%|████████████████████████████████████████████████| 4.50M/4.50M [00:00<00:00, 6.55MB/s]
Processing 11 items: 91%|█████████████████████████████████████████████████████▋ | 10.0/11.0 [00:19<00:00, 14.1it/s]
Downloading [pytorch_model.bin]: 100%|█████████████████████████████████████████████| 3.89G/3.89G [09:18<00:00, 7.48MB/s]
Processing 11 items: 100%|███████████████████████████████████████████████████████████| 11.0/11.0 [09:24<00:00, 51.3s/it]
可以看到下载目录为/root/.cache/modelscope/hub/models/deepseek-ai/Janus-Pro-1B
把下载的模型移动到vllm目录里面
mv /root/.cache/modelscope/hub/models/deepseek-ai /home/xiao/vllm
四、测试图片理解
vllm目录有2个文件夹,结构如下:
# ll
total 20
drwxr-xr-x 4 root root 4096 May 8 18:59 ./
drwxr-x--- 5 xiao xiao 4096 May 8 14:50 ../
drwxr-xr-x 8 root root 4096 May 8 18:59 Janus/
drwxr-xr-x 4 root root 4096 May 8 16:01 deepseek-ai/
进入deepseek-ai目录,会看到一个文件夹Janus-Pro-7B
# ll
total 16
drwxr-xr-x 4 root root 4096 May 8 16:01 ./
drwxr-xr-x 4 root root 4096 May 8 18:59 ../
drwxr-xr-x 2 root root 4096 May 7 18:32 Janus-Pro-7B/
返回上一级,在Janus目录,创建image_understanding.py文件,代码如下:
import torch
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor
from janus.utils.io import load_pil_images model_path = "../deepseek-ai/Janus-Pro-7B" image='aa.jpeg'
question='请说明一下这张图片'
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
model_path, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval() conversation = [
{
"role": "<|User|>",
"content": f"<image_placeholder>\n{question}",
"images": [image],
},
{"role": "<|Assistant|>", "content": ""},
] # load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
conversations=conversation, images=pil_images, force_batchify=True
).to(vl_gpt.device) # # run image encoder to get the image embeddings
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs) # # run the model to get the response
outputs = vl_gpt.language_model.generate(
inputs_embeds=inputs_embeds,
attention_mask=prepare_inputs.attention_mask,
pad_token_id=tokenizer.eos_token_id,
bos_token_id=tokenizer.bos_token_id,
eos_token_id=tokenizer.eos_token_id,
max_new_tokens=512,
do_sample=False,
use_cache=True,
) answer = tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(f"{prepare_inputs['sft_format'][0]}", answer)
下载一张图片,地址:https://pics6.baidu.com/feed/09fa513d269759ee74c8d049640fcc1b6f22df9e.jpeg
将此图片,重命名为aa.jpeg,存放在Janus目录
最终Janus目录,文件如下:
# ll
total 2976
drwxr-xr-x 8 root root 4096 May 8 18:59 ./
drwxr-xr-x 4 root root 4096 May 8 18:59 ../
drwxr-xr-x 8 root root 4096 May 7 18:11 .git/
-rw-r--r-- 1 root root 115 May 7 18:11 .gitattributes
-rw-r--r-- 1 root root 7301 May 7 18:11 .gitignore
-rw-r--r-- 1 root root 1065 May 7 18:11 LICENSE-CODE
-rw-r--r-- 1 root root 13718 May 7 18:11 LICENSE-MODEL
-rw-r--r-- 1 root root 3069 May 7 18:11 Makefile
-rwxr-xr-x 1 root root 26781 May 7 18:11 README.md*
-rw-r--r-- 1 root root 62816 May 8 14:59 aa.jpeg
drwxr-xr-x 2 root root 4096 May 7 18:11 demo/
drwxr-xr-x 2 root root 4096 May 8 17:19 generated_samples/
-rw-r--r-- 1 root root 4515 May 7 18:11 generation_inference.py
-rw-r--r-- 1 xiao xiao 4066 May 8 18:50 image_generation.py
-rw-r--r-- 1 root root 1594 May 8 18:58 image_understanding.py
drwxr-xr-x 2 root root 4096 May 7 18:11 images/
-rw-r--r-- 1 root root 2642 May 7 18:11 inference.py
-rw-r--r-- 1 root root 5188 May 7 18:11 interactivechat.py
drwxr-xr-x 6 root root 4096 May 7 19:01 janus/
drwxr-xr-x 2 root root 4096 May 7 18:11 janus.egg-info/
-rw-r--r-- 1 root root 2846268 May 7 18:11 janus_pro_tech_report.pdf
-rw-r--r-- 1 root root 1111 May 7 18:11 pyproject.toml
-rw-r--r-- 1 root root 278 May 7 18:11 requirements.txt
运行代码,效果如下:
# python image_understanding.py
Python version is above 3.10, patching the collections module.
/root/anaconda3/envs/vll-Janus-Pro-7B/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:604: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.18s/it]
You are a helpful language and vision assistant. You are able to understand the visual content that the user provides, and assist the user with a variety of tasks using natural language. <|User|>: <image_placeholder>
请说明一下这张图片 <|Assistant|>: 这张图片展示了一位身穿传统服饰的女性,她正坐在户外,双手合十,闭着眼睛,似乎在进行冥想或祈祷。背景是绿色的树木和植物,阳光透过树叶洒在她的身上,营造出一种宁静、祥和的氛围。她的服装以淡雅的白色和粉色为主,带有精致的花纹,整体风格非常优雅。
描述还是比较准确的
五、测试图片生成
在Janus目录,新建image_generation.py脚本,代码如下:
import os
import torch
import numpy as np
from PIL import Image
from transformers import AutoModelForCausalLM
from janus.models import MultiModalityCausalLM, VLChatProcessor model_path = "../deepseek-ai/Janus-Pro-7B"
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(
model_path, trust_remote_code=True
)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval() conversation = [
{"role": "<|User|>", "content": "超写实8K渲染,一位具有东方古典美的中国女性,瓜子脸,西昌的眉毛如弯弯的月牙,双眼明亮而深邃,犹如夜空中闪烁的星星。高挺的鼻梁,樱桃小嘴微微上扬,透露出一丝诱人的微笑。她的头发如黑色的瀑布般垂直落在减胖两侧,微风轻轻浮动发色。肌肤白皙如雪,在阳光下泛着微微的光泽。她身着乙烯白色的透薄如纱的连衣裙,裙摆在海风中轻轻飘动。"},
{"role": "<|Assistant|>", "content": ""},
] sft_format = vl_chat_processor.apply_sft_template_for_multi_turn_prompts(
conversations=conversation,
sft_format=vl_chat_processor.sft_format,
system_prompt=""
)
prompt = sft_format + vl_chat_processor.image_start_tag @torch.inference_mode()
def generate(
mmgpt: MultiModalityCausalLM,
vl_chat_processor: VLChatProcessor,
prompt: str,
temperature: float = 1,
parallel_size: int = 1, # 减小 parallel_size
cfg_weight: float = 5,
image_token_num_per_image: int = 576,
img_size: int = 384,
patch_size: int = 16,
):
input_ids = vl_chat_processor.tokenizer.encode(prompt)
input_ids = torch.LongTensor(input_ids) tokens = torch.zeros((parallel_size * 2, len(input_ids)), dtype=torch.int).cuda()
for i in range(parallel_size * 2):
tokens[i, :] = input_ids
if i % 2 != 0:
tokens[i, 1:-1] = vl_chat_processor.pad_id inputs_embeds = mmgpt.language_model.get_input_embeddings()(tokens) generated_tokens = torch.zeros((parallel_size, image_token_num_per_image), dtype=torch.int).cuda() for i in range(image_token_num_per_image):
outputs = mmgpt.language_model.model(inputs_embeds=inputs_embeds, use_cache=True,
past_key_values=outputs.past_key_values if i != 0 else None)
hidden_states = outputs.last_hidden_state logits = mmgpt.gen_head(hidden_states[:, -1, :])
logit_cond = logits[0::2, :]
logit_uncond = logits[1::2, :] logits = logit_uncond + cfg_weight * (logit_cond - logit_uncond)
probs = torch.softmax(logits / temperature, dim=-1) next_token = torch.multinomial(probs, num_samples=1)
generated_tokens[:, i] = next_token.squeeze(dim=-1)
next_token = torch.cat([next_token.unsqueeze(dim=1),
next_token.unsqueeze(dim=1)], dim=1).view(-1)
img_embeds = mmgpt.prepare_gen_img_embeds(next_token)
inputs_embeds = img_embeds.unsqueeze(dim=1)
# 添加显存清理
del logits, logit_cond, logit_uncond, probs
torch.cuda.empty_cache() dec = mmgpt.gen_vision_model.decode_code(generated_tokens.to(dtype=torch.int),
shape=[parallel_size, 8, img_size // patch_size, img_size // patch_size])
dec = dec.to(torch.float32).cpu().numpy().transpose(0, 2, 3, 1) dec = np.clip((dec + 1) / 2 * 255, 0, 255) visual_img = np.zeros((parallel_size, img_size, img_size, 3), dtype=np.uint8)
visual_img[:, :, :] = dec os.makedirs('generated_samples', exist_ok=True)
for i in range(parallel_size):
save_path = os.path.join('generated_samples', f"img_{i}.jpg")
img = Image.fromarray(visual_img[i])
img.save(save_path) generate(
vl_gpt,
vl_chat_processor,
prompt,
)
注意:提示词是可以写中文的,不一定非要是英文。
运行代码,效果如下:
# python image_generation.py
Python version is above 3.10, patching the collections module.
/root/anaconda3/envs/vll-Janus-Pro-7B/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:604: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
warnings.warn(
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 1/1 [00:09<00:00, 4.58s/it]
注意观察一下GPU使用情况,这里会很高。
RTX 5080显卡,16GB显存,几乎已经占满了。
等待30秒左右,就会生成一张图片。
打开小企鹅,进入目录\home\xiao\vllm\Janus\generated_samples
这里会出现一张图片
打开图片,效果如下:
效果还算可以。
DeepSeek 多模态模型 Janus-Pro 本地部署的更多相关文章
- windows下百度离线人脸识别本地部署与使用(nodejs做客户端,c++做服务端,socket做通信)
1.离线人脸识别本地部署 详情请阅读百度人脸识别官网 2.nodejs做socket通信的客户端 为什么不直接通过调用c++编译的exe获得人脸识别结果? 原因:exe运行时会加载很多模型而消耗很多时 ...
- 零样本文本分类应用:基于UTC的医疗意图多分类,打通数据标注-模型训练-模型调优-预测部署全流程。
零样本文本分类应用:基于UTC的医疗意图多分类,打通数据标注-模型训练-模型调优-预测部署全流程. 1.通用文本分类技术UTC介绍 本项目提供基于通用文本分类 UTC(Universal Text C ...
- 本地部署arcgis by eclipse
首次来博客园发帖,从本地部署arcgis api开始吧: 首先还是下载arcgis的api包开始,在中国区官网下载arcgis包: 1.http://support.esrichina.com.cn/ ...
- ArcGIS server开发之API for js 本地部署
ArcGIS Server for javascript 本地部署 第一次使用arcgis server for js开发,在经验方面还有很多的不足,所以将自己在开发过程中遇到的问题写出来与大家共享. ...
- Exceptionless 本地部署
免费开源分布式系统日志收集框架 Exceptionless 前两天看到了这篇文章,亲身体会了下,确实不错,按照官方的文档试了试本地部署,折腾一番后终于成功,记下心得在此,不敢独享. 本地部署官方wik ...
- ArcGIS JavaScript API本地部署离线开发环境[转]
原文地址:http://www.cnblogs.com/brawei/archive/2012/12/28/2837660.html 1 获取ArcGIS JavaScript API API的下载地 ...
- Exceptionless 本地部署踩坑记录
仅已此文记录 Exceptionless 本地部署所遇到的问题 1.安装ElasticSearch文本 执行elasticsearch目录中的elasticsearch.bat 没有执行成功. 使用命 ...
- jsbin本地部署
jsbin 本地运行 1.首先安装node.js,下载地址http://nodejs.org/ 安装完成后,使用node.js安装jsbin,如下:进入node环境,执行下面语句: $ npm ins ...
- 解决fiddler无法抓取本地部署项目的请求问题
在本地部署了几个应用,然后想用fiddler抓取一些请求看看调用了哪些接口,然鹅,一直抓不到... 比如访问地址是这样的: 在网上搜罗半天,找到一个解决方法 在localhost或127.0.0.1后 ...
- ArcGIS API for JavaScript 4.x 本地部署之Apache(含Apache官方下载方法)
IIS.Nginx都说了,老牌的Apache和Tomcat也得说一说(如果喜欢用XAMPP另算) 本篇先说Apache. 安装Apache 这个...说实话,比Nginx难找,Apache最近的版本都 ...
随机推荐
- 重磅发布!DeepSeek 微调秘籍揭秘,一键解锁升级版全家桶,AI 玩家必备神器!
DeepSeek V3/R1 火爆全网,基于原始模型的解决方案和 API 服务已随处可见,陷入低价和免费内卷. 如何站在巨人肩膀上,通过后训练(post-training)结合专业领域数据,低成本打造 ...
- LVGL 8.3.0常用函数快速使用
LVGL 8.3.0使用快速上手教程(使用篇) 定义页面通用样式style // 创建页面样式 static lv_style_t style; lv_style_init(&style); ...
- 全面详解C语言使用cJSON解析JSON字符[转载]
cJSON对象的实现采用了树形结构,每个对象是树的一个节点,每个节点由cJSON这个结构体组成,对象中的元素也由cJSON这个结构体组成.同一层的对象和元素是双向链表结构,由next和prev指针链接 ...
- Typecho 如何开启 HTTPS
一般来说,我们直接开启 HTTPS 就行,开启后进去网站后台修改网站的 URL 即可. 但是我昨天发现,我的工具箱迁移服务器之后,前台看着是很正常的,但是后台的登陆页面引入的依然的 http 标头,所 ...
- C#语法糖foreach语句和using语句联合使用
foreach语句可以和using语句联合使用,比如你需要对多个相机设备进行一些设置,设置完就调用 Dispose() 释放相机资源, 这时可以这样写: 模拟的设备类: class Device : ...
- PVE 配置显卡直通
博客链接:PVE 配置显卡直通 配置 Device: Dell PowerEdge T630 CPU: Intel(R) Xeon(R) E5-2696 v4 x2 GPU 1: Matrox Ele ...
- Dify 和 Manus 的技术架构差异
Dify 框架能够部分实现 Manus 的功能效果,但在复杂任务自动化.多代理协作等领域存在技术差距. 一.核心功能对比 1. 任务拆解与执行能力 Dify:支持通过 Agent 模式 进行任务分解, ...
- Qt 设置QTableView表格列宽自动均分表格
文章目录 Qt 设置QTableView表格列宽自动均分表格 前言 setSectionResizeMode 通过获取字体占的像素来设置 Qt 设置QTableView表格列宽自动均分表格 前言 最近 ...
- Windows编程----线程管理
系统中,进程主要有两部分组成:进程内核对象和进程地址空间.操作系统通过进程内核对象来管理进程,进程地址空间用于维护进程所需的资源:如代码.全局变量.资源文件等. 那么线程也是有两部分组成:线程内核对象 ...
- 在GNU Hurd中感受Mach微内核的进程通信(IPC)
什么是GNU Hurd 具体的时间线已经在官方维基页面得到详细描述[0],笔者在此就简单叙述一下.在1983年Richard Stallman开启了GNU项目,目的是创建一个自由的操作系统[1].在接 ...