This article is shared from the Huawei Cloud community post "Open-Sora text-to-video can now be tried on AI Gallery", by 码上开花_Lancer.

Try it here: Open-Sora text-to-video case

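Not long ago, OpenAI Sora quickly rose to fame on the strength of its striking video generation results, standing out among text-to-video models and drawing global attention. The Colossal-AI team then released an open-source alternative, "Open-Sora 1.0", which covers the entire training pipeline, including data processing, all training details, and model checkpoints, so that AI enthusiasts around the world can push video creation forward together.

For details, see: https://hpc-ai.com/blog/open-sora-v1.0

Open-Sora 1.1 followed in April 2024. It can generate videos 2s~15s long at resolutions from 144p to 720p, and supports text-to-image, text-to-video, and image-to-video generation. Let's look at what Open-Sora 1.1 actually produces: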

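Hands-on case

This case requires the Pytorch-2.0.1 GPU-V100 flavor or higher.

Click Run in ModelArts to open ModelArts CodeLab. You need to log in with a Huawei Cloud account; if you do not have one, register and complete real-name verification by following "How to create a Huawei Cloud account and complete real-name verification". After logging in, wait a moment and the CodeLab runtime environment will open.

If you hit Out Of Memory, check whether your parameter settings are too high. Lower them and restart the kernel, or switch to a higher-spec resource.

1. Download the code and models

This step takes about 1 minute; please be patient!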

import os
import moxing as mox

if not os.path.exists('Open-Sora'):
    mox.file.copy_parallel('obs://modelbox-course/open-sora_1.1/Open-Sora', 'Open-Sora')

if not os.path.exists('/home/ma-user/.cache/huggingface'):
    mox.file.copy_parallel('obs://modelbox-course/huggingface', '/home/ma-user/.cache/huggingface')

if not os.path.exists('Open-Sora/opensora/models/sd-vae-ft-ema'):
    mox.file.copy_parallel('obs://modelbox-course/sd-vae-ft-ema', 'Open-Sora/opensora/models/sd-vae-ft-ema')

if not os.path.exists('Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl'):
    mox.file.copy_parallel('obs://modelbox-course/t5-v1_1-xxl', 'Open-Sora/opensora/models/text_encoder/t5-v1_1-xxl')

if not os.path.exists('/home/ma-user/work/t5.py'):
    mox.file.copy_parallel('obs://modelbox-course/open-sora_1.1/t5.py', '/home/ma-user/work/t5.py')

if not os.path.exists('Open-Sora/opus-mt-zh-en'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/opus-mt-zh-en', 'Open-Sora/opus-mt-zh-en')

if not os.path.exists('/home/ma-user/work/frpc_linux_amd64'):
    mox.file.copy_parallel('obs://modelarts-labs-bj4-v2/course/ModelBox/frpc_linux_amd64', '/home/ma-user/work/frpc_linux_amd64')
INFO:root:Using MoXing-v2.1.6.879ab2f4-879ab2f4

INFO:root:List OBS time cost: 0.02 seconds.

INFO:root:Copy parallel total time cost: 41.71 seconds.

INFO:root:List OBS time cost: 0.14 seconds.

INFO:root:Copy parallel total time cost: 2.91 seconds.
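
The guarded copies in the download cell all follow one pattern: skip the transfer when the destination already exists. Below is a minimal sketch of the same idea as a table-driven loop. `sync_assets` and its `copy_fn` parameter are hypothetical names introduced here for illustration; in the notebook you would pass `mox.file.copy_parallel` as `copy_fn`.

```python
import os
from typing import Callable, Iterable, List, Tuple


def sync_assets(
    assets: Iterable[Tuple[str, str]],
    copy_fn: Callable[[str, str], None],
) -> List[str]:
    """Copy each (source, destination) pair unless the destination exists.

    Returns the destinations that were actually copied, in order.
    """
    copied = []
    for src, dst in assets:
        if os.path.exists(dst):
            continue  # already fetched by an earlier run
        copy_fn(src, dst)
        copied.append(dst)
    return copied
```

In the notebook this could be called as `sync_assets([('obs://modelbox-course/open-sora_1.1/Open-Sora', 'Open-Sora')], mox.file.copy_parallel)`, with the remaining pairs added to the list. Re-running the cell then only downloads whatever is still missing.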

2. Configure the runtime environment

This case requires Python 3.10.10 or later, so we first create a virtual environment:

!/home/ma-user/anaconda3/bin/conda clean -i
!/home/ma-user/anaconda3/bin/conda create -n python-3.10.10 python=3.10.10 -y --override-channels --channel https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
!/home/ma-user/anaconda3/envs/python-3.10.10/bin/pip install ipykernel
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

environment location: /home/ma-user/anaconda3/envs/python-3.10.10

added / updated specs:
- python=3.10.10
The following packages will be downloaded:

package | build
---------------------------|-----------------
_libgcc_mutex-0.1 | main 3 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
_openmp_mutex-5.1 | 1_gnu 21 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
bzip2-1.0.8 | h5eee18b_6 262 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ca-certificates-2024.3.11 | h06a4308_0 127 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ld_impl_linux-64-2.38 | h1181459_1 654 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libffi-3.4.4 | h6a678d5_1 141 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgcc-ng-11.2.0 | h1234567_1 5.3 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libgomp-11.2.0 | h1234567_1 474 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libstdcxx-ng-11.2.0 | h1234567_1 4.7 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
libuuid-1.41.5 | h5eee18b_0 27 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
ncurses-6.4 | h6a678d5_0 914 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
openssl-1.1.1w | h7f8727e_0 3.7 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
pip-24.0 | py310h06a4308_0 2.7 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
python-3.10.10 | h7a1cb2a_2 26.9 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
readline-8.2 | h5eee18b_0 357 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
setuptools-69.5.1 | py310h06a4308_0 1012 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
sqlite-3.45.3 | h5eee18b_0 1.2 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tk-8.6.14 | h39e8969_0 3.4 MB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
tzdata-2024a | h04d1e81_0 116 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
wheel-0.43.0 | py310h06a4308_0 110 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
xz-5.4.6 | h5eee18b_1 643 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
zlib-1.2.13 | h5eee18b_1 111 KB https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
------------------------------------------------------------
Total: 52.8 MB

The following NEW packages will be INSTALLED:

_libgcc_mutex anaconda/pkgs/main/linux-64::_libgcc_mutex-0.1-main
_openmp_mutex anaconda/pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
bzip2 anaconda/pkgs/main/linux-64::bzip2-1.0.8-h5eee18b_6
ca-certificates anaconda/pkgs/main/linux-64::ca-certificates-2024.3.11-h06a4308_0
ld_impl_linux-64 anaconda/pkgs/main/linux-64::ld_impl_linux-64-2.38-h1181459_1
libffi anaconda/pkgs/main/linux-64::libffi-3.4.4-h6a678d5_1
libgcc-ng anaconda/pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
libgomp anaconda/pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
libstdcxx-ng anaconda/pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
libuuid anaconda/pkgs/main/linux-64::libuuid-1.41.5-h5eee18b_0
ncurses anaconda/pkgs/main/linux-64::ncurses-6.4-h6a678d5_0
openssl anaconda/pkgs/main/linux-64::openssl-1.1.1w-h7f8727e_0
pip anaconda/pkgs/main/linux-64::pip-24.0-py310h06a4308_0
python anaconda/pkgs/main/linux-64::python-3.10.10-h7a1cb2a_2
readline anaconda/pkgs/main/linux-64::readline-8.2-h5eee18b_0
setuptools anaconda/pkgs/main/linux-64::setuptools-69.5.1-py310h06a4308_0
sqlite anaconda/pkgs/main/linux-64::sqlite-3.45.3-h5eee18b_0
tk anaconda/pkgs/main/linux-64::tk-8.6.14-h39e8969_0
tzdata anaconda/pkgs/main/noarch::tzdata-2024a-h04d1e81_0
wheel anaconda/pkgs/main/linux-64::wheel-0.43.0-py310h06a4308_0
xz anaconda/pkgs/main/linux-64::xz-5.4.6-h5eee18b_1
zlib anaconda/pkgs/main/linux-64::zlib-1.2.13-h5eee18b_1
Downloading and Extracting Packages
libffi-3.4.4 | 141 KB | ##################################### | 100%
_openmp_mutex-5.1 | 21 KB | ##################################### | 100%
xz-5.4.6 | 643 KB | ##################################### | 100%
tzdata-2024a | 116 KB | ##################################### | 100%
_libgcc_mutex-0.1 | 3 KB | ##################################### | 100%
zlib-1.2.13 | 111 KB | ##################################### | 100%
bzip2-1.0.8 | 262 KB | ##################################### | 100%
libuuid-1.41.5 | 27 KB | ##################################### | 100%
ca-certificates-2024 | 127 KB | ##################################### | 100%
libstdcxx-ng-11.2.0 | 4.7 MB | ##################################### | 100%
ncurses-6.4 | 914 KB | ##################################### | 100%
openssl-1.1.1w | 3.7 MB | ##################################### | 100%
wheel-0.43.0 | 110 KB | ##################################### | 100%
python-3.10.10 | 26.9 MB | ##################################### | 100%
pip-24.0 | 2.7 MB | ##################################### | 100%
readline-8.2 | 357 KB | ##################################### | 100%
tk-8.6.14 | 3.4 MB | ##################################### | 100%
setuptools-69.5.1 | 1012 KB | ##################################### | 100%
libgcc-ng-11.2.0 | 5.3 MB | ##################################### | 100%
ld_impl_linux-64-2.3 | 654 KB | ##################################### | 100%
libgomp-11.2.0 | 474 KB | ##################################### | 100%
sqlite-3.45.3 | 1.2 MB | ##################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate python-3.10.10
#
# To deactivate an active environment, use
#
# $ conda deactivate

Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting ipykernel
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/53/9d/40d5207db523363d9b5698f33778c18b0d591e3fdb6e0116b894b2a2491c/ipykernel-6.29.4-py3-none-any.whl (117 kB)
......

Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/80/03/6ea8b1b2a5ab40a7a60dc464d3daa7aa546e0a74d74a9f8ff551ea7905db/executing-2.0.1-py2.py3-none-any.whl (24 kB)
Collecting asttokens>=2.1.0 (from stack-data->ipython>=7.23.1->ipykernel)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/45/86/4736ac618d82a20d87d2f92ae19441ebc7ac9e7a581d7e58bbe79233b24a/asttokens-2.4.1-py2.py3-none-any.whl (27 kB)
Collecting pure-eval (from stack-data->ipython>=7.23.1->ipykernel)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/2b/27/77f9d5684e6bce929f5cfe18d6cfbe5133013c06cb2fbf5933670e60761d/pure_eval-0.2.2-py3-none-any.whl (11 kB)
Installing collected packages: wcwidth, pure-eval, ptyprocess, typing-extensions, traitlets, tornado, six, pyzmq, pygments, psutil, prompt-toolkit, platformdirs, pexpect, parso, packaging, nest-asyncio, executing, exceptiongroup, decorator, debugpy, python-dateutil, matplotlib-inline, jupyter-core, jedi, comm, asttokens, stack-data, jupyter-client, ipython, ipykernel
Successfully installed asttokens-2.4.1 comm-0.2.2 debugpy-1.8.1 decorator-5.1.1 exceptiongroup-1.2.1 executing-2.0.1 ipykernel-6.29.4 ipython-8.25.0 jedi-0.19.1 jupyter-client-8.6.2 jupyter-core-5.7.2 matplotlib-inline-0.1.7 nest-asyncio-1.6.0 packaging-24.0 parso-0.8.4 pexpect-4.9.0 platformdirs-4.2.2 prompt-toolkit-3.0.46 psutil-5.9.8 ptyprocess-0.7.0 pure-eval-0.2.2 pygments-2.18.0 python-dateutil-2.9.0.post0 pyzmq-26.0.3 six-1.16.0 stack-data-0.6.3 tornado-6.4 traitlets-5.14.3 typing-extensions-4.12.1 wcwidth-0.2.13
import json
import os

data = {
    "display_name": "python-3.10.10",
    "env": {
        "PATH": "/home/ma-user/anaconda3/envs/python-3.10.10/bin:/home/ma-user/anaconda3/envs/python-3.7.10/bin:/modelarts/authoring/notebook-conda/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/modelarts/ma-cli/bin:/home/ma-user/anaconda3/envs/PyTorch-1.8/bin"
    },
    "language": "python",
    "argv": [
        "/home/ma-user/anaconda3/envs/python-3.10.10/bin/python",
        "-m",
        "ipykernel",
        "-f",
        "{connection_file}"
    ]
}

if not os.path.exists("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/"):
    os.mkdir("/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/")

with open('/home/ma-user/anaconda3/share/jupyter/kernels/python-3.10.10/kernel.json', 'w') as f:
    json.dump(data, f, indent=4)
conda env list
/home/ma-user/anaconda3/lib/python3.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
# conda environments:
#
base * /home/ma-user/anaconda3
python-3.10.10 /home/ma-user/anaconda3/envs/python-3.10.10
python-3.7.10 /home/ma-user/anaconda3/envs/python-3.7.10
Note: you may need to restart the kernel to use updated packages.
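
The kernel registration above hard-codes every path for one environment. As a rough sketch, the same `kernel.json` structure can be built for any conda environment name. `build_kernel_spec` and `write_kernel_spec` are hypothetical helpers, and the `PATH` value here is simply the environment's `bin` directory prepended to a caller-supplied path, which is a simplification of the long hard-coded `PATH` in the cell above.

```python
import json
import os


def build_kernel_spec(env_name: str, conda_root: str, base_path: str) -> dict:
    """Build a Jupyter kernel spec that launches ipykernel from a conda env."""
    env_bin = os.path.join(conda_root, "envs", env_name, "bin")
    return {
        "display_name": env_name,
        # Put the env's bin directory first so `!python` resolves to it.
        "env": {"PATH": env_bin + os.pathsep + base_path},
        "language": "python",
        "argv": [
            os.path.join(env_bin, "python"),
            "-m", "ipykernel", "-f", "{connection_file}",
        ],
    }


def write_kernel_spec(spec: dict, kernels_dir: str) -> str:
    """Write <kernels_dir>/<display_name>/kernel.json and return its path."""
    target_dir = os.path.join(kernels_dir, spec["display_name"])
    os.makedirs(target_dir, exist_ok=True)  # no error if it already exists
    path = os.path.join(target_dir, "kernel.json")
    with open(path, "w") as f:
        json.dump(spec, f, indent=4)
    return path
```

Using `os.makedirs(..., exist_ok=True)` also avoids the separate existence check, so the cell can be re-run safely.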

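Once the environment is created, wait a moment or refresh the page, then click the kernel selector in the top-right corner and choose python-3.10.10.

Check the Python version: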

!python -V
Python 3.10.10

Check the available GPU; at least 32 GB of GPU memory is required:

!nvidia-smi
Wed Jun  5 16:22:37 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... On | 00000000:00:0D.0 Off | 0 |
| N/A 28C P0 25W / 250W | 0MiB / 32510MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
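
The memory check can also be done programmatically instead of by reading the table. The sketch below parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which prints one MiB value per GPU. The function names are hypothetical, and the 32000 MiB threshold is an assumption chosen because a 32 GB V100 reports 32510 MiB, slightly below 32 × 1024.

```python
import subprocess
from typing import List


def parse_memory_mib(query_output: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi query output."""
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]


def gpus_with_enough_memory(query_output: str, required_mib: int) -> List[int]:
    """Return the indices of GPUs whose total memory meets the requirement."""
    return [
        index
        for index, mib in enumerate(parse_memory_mib(query_output))
        if mib >= required_mib
    ]


def query_total_memory() -> str:
    """Ask nvidia-smi for each GPU's total memory in MiB, one value per line."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

On the instance shown above, `gpus_with_enough_memory(query_total_memory(), 32000)` should list GPU 0; an empty list means a larger flavor is needed.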

Install the dependencies:

!pip install --upgrade pip
!pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 xformers==0.0.22
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Requirement already satisfied: pip in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (24.0)
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting torch==2.0.1
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/8c/4d/17e07377c9c3d1a0c4eb3fde1c7c16b5a0ce6133ddbabc08ceef6b7f2645/torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
Collecting torchvision==0.15.2
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/87/0f/88f023bf6176d9af0f85feedf4be129f9cf2748801c4d9c690739a10c100/torchvision-0.15.2-cp310-cp310-manylinux1_x86_64.whl (6.0 MB)
Collecting torchaudio==2.0.2
......
Collecting certifi>=2017.4.17 (from requests->torchvision==0.15.2)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/5b/11/1e78951465b4a225519b8c3ad29769c49e0d8d157a070f681d5b6d64737f/certifi-2024.6.2-py3-none-any.whl (164 kB)
Collecting mpmath<1.4.0,>=1.1.0 (from sympy->torch==2.0.1)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, lit, urllib3, sympy, pillow, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, numpy, networkx, MarkupSafe, idna, filelock, cmake, charset-normalizer, certifi, requests, nvidia-cusolver-cu11, nvidia-cudnn-cu11, jinja2, triton, torch, xformers, torchvision, torchaudio
Successfully installed MarkupSafe-2.1.5 certifi-2024.6.2 charset-normalizer-3.3.2 cmake-3.29.3 filelock-3.14.0 idna-3.7 jinja2-3.1.4 lit-18.1.6 mpmath-1.3.0 networkx-3.3 numpy-1.26.4 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 pillow-10.3.0 requests-2.32.3 sympy-1.12.1 torch-2.0.1 torchaudio-2.0.2 torchvision-0.15.2 triton-2.0.0 urllib3-2.2.1 xformers-0.0.22
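
Because the later steps depend on these exact pins, it can help to confirm what actually got installed. A minimal sketch using the standard-library `importlib.metadata`; `find_mismatches` is a hypothetical helper, and stripping a local version suffix such as `+cu117` before comparing is an assumption about how some torch builds report their version.

```python
from importlib import metadata
from typing import Callable, Dict


def find_mismatches(
    pins: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: found_version} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = "not installed"
        # Ignore a local build suffix such as "+cu117" when comparing.
        if found.split("+", 1)[0] != wanted:
            mismatches[name] = found
    return mismatches
```

For the install above, `find_mismatches({"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2", "xformers": "0.0.22"})` returning an empty dict means every pin was honored.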
%cd Open-Sora
/home/ma-user/work/ma_share/open-spra_1/Open-Sora
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
self.shell.db['dhist'] = compress_dhist(dhist)[-100:]
'/home/ma-user/work/ma_share/open-spra_1/Open-Sora'
!pip install colossalai==0.3.6 accelerate==0.29.2 diffusers==0.27.2 ftfy==6.2.0 gdown==5.1.0 mmengine==0.10.3 pre-commit==3.7.0 pyav==12.0.5 tensorboard==2.16.2 timm==0.9.16 transformers==4.39.3 wandb==0.16.6
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Collecting colossalai==0.3.6
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/05/ed/57e80620ea8e35c3aa63a3207720b1890700fd12eea38b6592e9833e5c1b/colossalai-0.3.6.tar.gz (1.1 MB)
 Preparing metadata (setup.py) ... done
Collecting accelerate==0.29.2
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/1b/e8/2fc7af3fa77ddac89a9c9b390d2d31d1db0612247ba2274009946959604e/accelerate-0.29.2-py3-none-any.whl (297 kB)
Collecting diffusers==0.27.2
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/75/c5/3b84fd731dd93c549a0c25657e4ce5a957aeccd32d60dba2958cd3cdac23/diffusers-0.27.2-py3-none-any.whl (2.0 MB)
!pip install .
Looking in indexes: http://repo.myhuaweicloud.com/repository/pypi/simple
Processing /home/ma-user/work/ma_share/open-spra_1/Open-Sora
Preparing metadata (setup.py) ... done
Requirement already satisfied: colossalai in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.3.6)
Requirement already satisfied: accelerate in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.29.2)
Requirement already satisfied: diffusers in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.27.2)
Requirement already satisfied: ftfy in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (6.2.0)
Requirement already satisfied: gdown in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (5.1.0)
Requirement already satisfied: mmengine in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.10.3)
Collecting pandas (from opensora==1.1.0)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/89/1b/12521efcbc6058e2673583bb096c2b5046a9df39bd73eca392c1efed24e5/pandas-2.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.0 MB)
Requirement already satisfied: pre-commit in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (3.7.0)
Collecting pyarrow (from opensora==1.1.0)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/91/83/57572c088ec185582f04b607d545a4a6ef7599c0a3c1e60d397743b0d609/pyarrow-16.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (40.9 MB)
Collecting av (from opensora==1.1.0)
Downloading http://repo.myhuaweicloud.com/repository/pypi/packages/0a/11/2b501d0a4de22826217a0b909e832f52fb5d503df50f424f3e31023a7bcc/av-12.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.3 MB)
Requirement already satisfied: tensorboard in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (2.16.2)
Requirement already satisfied: timm in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.9.16)
Requirement already satisfied: tqdm in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (4.66.4)
Requirement already satisfied: transformers in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (4.39.3)
Requirement already satisfied: wandb in /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages (from opensora==1.1.0) (0.16.6)
Collecting rotary_embedding_torch (from opensora==1.1.0)
Downloading
Building wheels for collected packages: opensora, pandarallel
Building wheel for opensora (setup.py) ... done
 Created wheel for opensora: filename=opensora-1.1.0-py3-none-any.whl size=195249 sha256=86c66de7ded305b2e4fb07992d0147c0408086cc31cdc31d97bcea44d8f69596
Stored in directory: /home/ma-user/.cache/pip/wheels/ae/34/85/7f84dd36f2e448d8d4455272d3358f557d0a570011d1701074
Building wheel for pandarallel (setup.py) ... done
 Created wheel for pandarallel: filename=pandarallel-1.6.5-py3-none-any.whl size=16673 sha256=b97386c92d34443f19cc88ea717c6cca143ef2b8f1f1ac79f4645c37d230bafc
Stored in directory: /home/ma-user/.cache/pip/wheels/f6/dd/25/a1c3775e721641ff67c71b3652e901e7e52611c6c3091784c9
Successfully built opensora pandarallel
Installing collected packages: pytz, tzdata, pyarrow, dill, beartype, av, pandas, pandarallel, rotary_embedding_torch, opensora
Successfully installed av-12.1.0 beartype-0.18.5 dill-0.3.8 opensora-1.1.0 pandarallel-1.6.5 pandas-2.2.2 pyarrow-16.1.0 pytz-2024.1 rotary_embedding_torch-0.6.2 tzdata-2024.1
!pip install spaces gradio MoviePy -i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn
!cp /home/ma-user/work/frpc_linux_amd64 /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
!chmod +x /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/gradio/frpc_linux_amd64_v0.2
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting spaces
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/b2/3c/6205090507ea96e6e56d0deda8d0fc4c507026ef3772e55b637a5d0b7c61/spaces-0.28.3-py3-none-any.whl (18 kB)
Collecting gradio
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d1/37/f49320600cdf1fa856cc605a2e20e9debd34b5425b53f49abdb2ea463716/gradio-4.32.2-py3-none-any.whl (12.3 MB)

Successfully uninstalled decorator-5.1.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fabric 3.2.2 requires decorator>=5, but you have decorator 4.4.2 which is incompatible.
Successfully installed MoviePy-1.0.3 aiofiles-23.2.1 altair-5.3.0 anyio-4.4.0 decorator-4.4.2 dnspython-2.6.1 email_validator-2.1.1 fastapi-0.111.0 fastapi-cli-0.0.4 ffmpy-0.3.2 gradio-4.32.2 gradio-client-0.17.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 imageio-2.34.1 imageio_ffmpeg-0.5.1 importlib-resources-6.4.0 orjson-3.10.3 proglog-0.1.10 pydub-0.25.1 python-dotenv-1.0.1 python-multipart-0.0.9 ruff-0.4.7 semantic-version-2.10.0 shellingham-1.5.4 sniffio-1.3.1 spaces-0.28.3 starlette-0.37.2 tomlkit-0.12.0 toolz-0.12.1 typer-0.12.3 ujson-5.10.0 uvicorn-0.30.1 uvloop-0.19.0 watchfiles-0.22.0 websockets-11.0.3

3. Generate a video

Modify the model configuration file:

%%writefile configs/opensora-v1-1/inference/sample.py
num_frames = 16
frame_interval = 3
fps = 24
image_size = (240, 426)
multi_resolution = "STDiT2"

# Define model
model = dict(
    type="STDiT2-XL/2",
    from_pretrained="hpcai-tech/OpenSora-STDiT-v2-stage3",  # use the model weights already downloaded from Hugging Face
    input_sq_size=512,
    qk_norm=True,
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",  # changed to load from the current directory
    cache_dir=None,
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",  # changed to load from the current directory
    cache_dir=None,
    model_max_length=200,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

# Condition
prompt_path = "./assets/texts/t2v_samples.txt"
prompt = None  # prompt has higher priority than prompt_path

# Others
batch_size = 1
seed = 42
save_dir = "./samples/samples/"
Overwriting configs/opensora-v1-1/inference/sample.py
import os

os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
!cp /home/ma-user/work/t5.py /home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/opensora/models/text_encoder/t5.py
# text to video
!python scripts/inference.py configs/opensora-v1-1/inference/sample.py --prompt "A fashion girl walking on the streets of Tokyo" --num-frames 32 --image-size 240 426
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
Config (path: configs/opensora-v1-1/inference/sample.py): {'num_frames': 32, 'frame_interval': 3, 'fps': 24, 'image_size': [240, 426], 'multi_resolution': 'STDiT2', 'model': {'type': 'STDiT2-XL/2', 'from_pretrained': 'hpcai-tech/OpenSora-STDiT-v2-stage3', 'input_sq_size': 512, 'qk_norm': True, 'enable_flash_attn': True, 'enable_layernorm_kernel': True}, 'vae': {'type': 'VideoAutoencoderKL', 'from_pretrained': './opensora/models/sd-vae-ft-ema', 'cache_dir': None, 'micro_batch_size': 4}, 'text_encoder': {'type': 't5', 'from_pretrained': './opensora/models/text_encoder/t5-v1_1-xxl', 'cache_dir': None, 'model_max_length': 200}, 'scheduler': {'type': 'iddpm', 'num_sampling_steps': 100, 'cfg_scale': 7.0, 'cfg_channel': 3}, 'dtype': 'fp16', 'prompt_path': './assets/texts/t2v_samples.txt', 'prompt': ['A fashion girl walking on the streets of Tokyo'], 'batch_size': 1, 'seed': 42, 'save_dir': './samples/samples/', 'config': 'configs/opensora-v1-1/inference/sample.py', 'prompt_as_path': False, 'reference_path': None, 'loop': 1, 'sample_name': None, 'num_sample': 1}
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:35<00:00, 17.87s/it]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
100%|█████████████████████████████████████████| 100/100 [02:11<00:00, 1.32s/it]
Prompt: A fashion girl walking on the streets of Tokyo
Saved to ./samples/samples/sample_0.mp4
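
To relate `num_frames` to the 2s~15s clip lengths mentioned earlier, here is a small worked calculation. It assumes the output file is written at `fps // frame_interval` frames per second; that is an assumption about the Open-Sora 1.1 inference script and should be checked against `scripts/inference.py` in your copy. `clip_seconds` is a hypothetical helper.

```python
def clip_seconds(num_frames: int, fps: int, frame_interval: int) -> float:
    """Estimate clip length, assuming playback at fps // frame_interval."""
    playback_fps = fps // frame_interval  # 24 // 3 = 8 with the config above
    return num_frames / playback_fps


# Config default (num_frames = 16) and the command-line override (--num-frames 32).
print(clip_seconds(16, fps=24, frame_interval=3))
print(clip_seconds(32, fps=24, frame_interval=3))
```

Under that assumption the config default of 16 frames gives a 2-second clip and the `--num-frames 32` run above gives 4 seconds. Longer clips raise memory use, which is one of the parameters to lower if Out Of Memory occurs.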

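The generated videos are saved under the Open-Sora/samples folder (in samples/samples, as set by save_dir). Pick one at random to view: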

import os
import random
from moviepy.editor import VideoFileClip
from IPython.display import Image

# Directory where the videos are stored
video_root = 'samples/samples'
# List all files
videos = os.listdir(video_root)
# Pick one video at random
video = random.sample(videos, 1)[0]
# Input path of the video
video_path = os.path.join(video_root, video)
# Load the original video
clip = VideoFileClip(video_path)
# Save it as a GIF file
clip.write_gif("output_animation.gif", fps=10)
# Display the result
Image(open('output_animation.gif','rb').read())
MoviePy - Building file output_animation.gif with imageio.
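
The preview cell picks from everything in the folder, so a stray non-video file would make `VideoFileClip` fail. A minimal sketch of a safer picker that only considers `.mp4` files; `pick_random_video` is a hypothetical helper, and the optional `rng` argument exists only to make the choice reproducible.

```python
import os
import random
from typing import Optional


def pick_random_video(video_root: str, rng: Optional[random.Random] = None) -> str:
    """Return the path of a randomly chosen .mp4 file in video_root."""
    rng = rng or random.Random()
    candidates = sorted(
        name for name in os.listdir(video_root) if name.lower().endswith(".mp4")
    )
    if not candidates:
        raise FileNotFoundError(f"no .mp4 files found in {video_root}")
    return os.path.join(video_root, rng.choice(candidates))
```

The returned path can be passed straight to `VideoFileClip` in place of `video_path`.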

4. Gradio interface

Modify the configuration file:

%%writefile configs/opensora-v1-1/inference/sample-ref.py
num_frames = 16
frame_interval = 3
fps = 24
image_size = (240, 426)
multi_resolution = "STDiT2"

# Condition
prompt_path = None
prompt = [
    "A car driving on the ocean.",
    "In an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.",
]

loop = 2
condition_frame_length = 4
# (
# loop id, [the loop index of the condition image or video]
# reference id, [the index of the condition image or video in the reference_path]
# reference start, [the start frame of the condition image or video]
# target start, [the location to insert]
# length, [the number of frames to insert]
# edit_ratio [the edit rate of the condition image or video]
# )
# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/config.md#advanced-inference-config for more details
# See https://github.com/hpcaitech/Open-Sora/blob/main/docs/commands.md#inference-with-open-sora-11 for more examples
mask_strategy = [
    "0,0,0,0,8,0.3",
    None,
    "0",
]
reference_path = [
    "https://cdn.openai.com/tmp/s/interp/d0.mp4",
    None,
    "assets/images/condition/wave.png",
]

# Define model
model = dict(
    type="STDiT2-XL/2",
    from_pretrained="hpcai-tech/OpenSora-STDiT-v2-stage3",  # use the model weights already downloaded from Hugging Face
    input_sq_size=512,
    qk_norm=True,
    enable_flash_attn=True,
    enable_layernorm_kernel=True,
)
vae = dict(
    type="VideoAutoencoderKL",
    from_pretrained="./opensora/models/sd-vae-ft-ema",  # changed to load from the current directory
    cache_dir=None,
    micro_batch_size=4,
)
text_encoder = dict(
    type="t5",
    from_pretrained="./opensora/models/text_encoder/t5-v1_1-xxl",  # changed to load from the current directory
    cache_dir=None,
    model_max_length=200,
)
scheduler = dict(
    type="iddpm",
    num_sampling_steps=100,
    cfg_scale=7.0,
    cfg_channel=3,  # or None
)
dtype = "fp16"

# Others
batch_size = 1
seed = 42
save_dir = "./samples/samples/"
Overwriting configs/opensora-v1-1/inference/sample-ref.py
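
Each `mask_strategy` string in the config packs up to six comma-separated fields, and several groups can be chained with `;`. The sketch below expands such a string into named fields. The field names follow the comment block in the config, and the defaults for omitted fields mirror `process_mask_strategy` in the Gradio app later in this article; `expand_mask_strategy` itself is a hypothetical helper, not part of Open-Sora.

```python
from typing import Dict, List, Union

FIELDS = ["loop_id", "reference_id", "reference_start", "target_start", "length", "edit_ratio"]
# Defaults applied to whichever trailing fields are omitted.
DEFAULTS = ["0", "0", "0", "0", "1", "0"]


def expand_mask_strategy(mask_strategy: str) -> List[Dict[str, Union[int, float]]]:
    """Expand 'a,b,...;a,b,...' into one dict of named fields per group."""
    entries = []
    for group in mask_strategy.split(";"):
        parts = group.split(",")
        if not 1 <= len(parts) <= 6:
            raise ValueError(f"Invalid mask strategy: {group}")
        parts = parts + DEFAULTS[len(parts):]  # fill in the missing tail
        entry = {name: int(value) for name, value in zip(FIELDS[:5], parts[:5])}
        entry["edit_ratio"] = float(parts[5])
        entries.append(entry)
    return entries


print(expand_mask_strategy("0,0,0,0,8,0.3"))
print(expand_mask_strategy("0"))
```

Read this way, `"0,0,0,0,8,0.3"` takes 8 frames from the start of reference 0 in loop 0, places them at the start of the target, and uses an edit ratio of 0.3. The bare `"0"` expands to a single conditioning frame with an edit ratio of 0.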

Modify the Gradio app:

%%writefile gradio/app-ref.py
import argparse
import importlib
import os
import subprocess
import sys
import re
import json
import math
import spaces
import torch
import gradio as gr
from tempfile import NamedTemporaryFile
import datetime
from transformers import pipeline

zh2en = pipeline("translation", model="./opus-mt-zh-en")

MODEL_TYPES = ["v1.1-stage2", "v1.1-stage3"]
CONFIG_MAP = {
    "v1.1-stage2": "configs/opensora-v1-1/inference/sample-ref.py",
    "v1.1-stage3": "configs/opensora-v1-1/inference/sample-ref.py",
}
HF_STDIT_MAP = {
    "v1.1-stage2": "hpcai-tech/OpenSora-STDiT-v2-stage2",
    "v1.1-stage3": "hpcai-tech/OpenSora-STDiT-v2-stage3",
}
RESOLUTION_MAP = {
    "144p": {
        "16:9": (256, 144),
        "9:16": (144, 256),
        "4:3": (221, 165),
        "3:4": (165, 221),
        "1:1": (192, 192),
    },
    "240p": {
        "16:9": (426, 240),
        "9:16": (240, 426),
        "4:3": (370, 278),
        "3:4": (278, 370),
        "1:1": (320, 320),
    },
    "360p": {
        "16:9": (640, 360),
        "9:16": (360, 640),
        "4:3": (554, 416),
        "3:4": (416, 554),
        "1:1": (480, 480),
    },
    "480p": {
        "16:9": (854, 480),
        "9:16": (480, 854),
        "4:3": (740, 555),
        "3:4": (555, 740),
        "1:1": (640, 640),
    },
    "720p": {
        "16:9": (1280, 720),
        "9:16": (720, 1280),
        "4:3": (1108, 832),
        "3:4": (832, 1110),
        "1:1": (960, 960),
    },
}


# ============================
# Utils
# ============================
def collect_references_batch(reference_paths, vae, image_size):
    from opensora.datasets.utils import read_from_path

    refs_x = []
    for reference_path in reference_paths:
        if reference_path is None:
            refs_x.append([])
            continue
        ref_path = reference_path.split(";")
        ref = []
        for r_path in ref_path:
            r = read_from_path(r_path, image_size, transform_name="resize_crop")
            r_x = vae.encode(r.unsqueeze(0).to(vae.device, vae.dtype))
            r_x = r_x.squeeze(0)
            ref.append(r_x)
        refs_x.append(ref)
    # refs_x: [batch, ref_num, C, T, H, W]
    return refs_x


def process_mask_strategy(mask_strategy):
    mask_batch = []
    mask_strategy = mask_strategy.split(";")
    for mask in mask_strategy:
        mask_group = mask.split(",")
        assert len(mask_group) >= 1 and len(mask_group) <= 6, f"Invalid mask strategy: {mask}"
        if len(mask_group) == 1:
            mask_group.extend(["0", "0", "0", "1", "0"])
        elif len(mask_group) == 2:
            mask_group.extend(["0", "0", "1", "0"])
        elif len(mask_group) == 3:
            mask_group.extend(["0", "1", "0"])
        elif len(mask_group) == 4:
            mask_group.extend(["1", "0"])
        elif len(mask_group) == 5:
            mask_group.append("0")
        mask_batch.append(mask_group)
    return mask_batch


def apply_mask_strategy(z, refs_x, mask_strategys, loop_i):
    masks = []
    for i, mask_strategy in enumerate(mask_strategys):
        mask = torch.ones(z.shape[2], dtype=torch.float, device=z.device)
        if mask_strategy is None:
            masks.append(mask)
            continue
        mask_strategy = process_mask_strategy(mask_strategy)
        for mst in mask_strategy:
            loop_id, m_id, m_ref_start, m_target_start, m_length, edit_ratio = mst
            loop_id = int(loop_id)
            if loop_id != loop_i:
                continue
            m_id = int(m_id)
            m_ref_start = int(m_ref_start)
            m_length = int(m_length)
            m_target_start = int(m_target_start)
            edit_ratio = float(edit_ratio)
            ref = refs_x[i][m_id]  # [C, T, H, W]
            if m_ref_start < 0:
                m_ref_start = ref.shape[1] + m_ref_start
            if m_target_start < 0:
                # z: [B, C, T, H, W]
                m_target_start = z.shape[2] + m_target_start
            z[i, :, m_target_start : m_target_start + m_length] = ref[:, m_ref_start : m_ref_start + m_length]
            mask[m_target_start : m_target_start + m_length] = edit_ratio
        masks.append(mask)
    masks = torch.stack(masks)
    return masks


def process_prompts(prompts, num_loop):
    from opensora.models.text_encoder.t5 import text_preprocessing

    ret_prompts = []
    for prompt in prompts:
        if prompt.startswith("|0|"):
            prompt_list = prompt.split("|")[1:]
            text_list = []
            for i in range(0, len(prompt_list), 2):
                start_loop = int(prompt_list[i])
                text = prompt_list[i + 1]
                text = text_preprocessing(text)
                end_loop = int(prompt_list[i + 2]) if i + 2 < len(prompt_list) else num_loop
                text_list.extend([text] * (end_loop - start_loop))
            assert len(text_list) == num_loop, f"Prompt loop mismatch: {len(text_list)} != {num_loop}"
            ret_prompts.append(text_list)
        else:
            prompt = text_preprocessing(prompt)
            ret_prompts.append([prompt] * num_loop)
    return ret_prompts


def extract_json_from_prompts(prompts):
    additional_infos = []
    ret_prompts = []
    for prompt in prompts:
        parts = re.split(r"(?=[{\[])", prompt)
        assert len(parts) <= 2, f"Invalid prompt: {prompt}"
        ret_prompts.append(parts[0])
        if len(parts) == 1:
            additional_infos.append({})
        else:
            additional_infos.append(json.loads(parts[1]))
    return ret_prompts, additional_infos


# ============================
# Model-related
# ============================
def read_config(config_path):
    """
    Read the configuration file.
    """
    from mmengine.config import Config

    return Config.fromfile(config_path)


def build_models(model_type, config, enable_optimization=False):
    """
    Build the models for the given model type and configuration.
    """
    # build vae
    from opensora.registry import MODELS, build_module

    vae = build_module(config.vae, MODELS).cuda()

    # build text encoder
    text_encoder = build_module(config.text_encoder, MODELS)  # T5 must be fp32
    text_encoder.t5.model = text_encoder.t5.model.cuda()

    # build stdit
    # we load model from HuggingFace directly so that we don't need to
    # handle model download logic in HuggingFace Space
    from opensora.models.stdit.stdit2 import STDiT2

    stdit = STDiT2.from_pretrained(
        HF_STDIT_MAP[model_type],
        enable_flash_attn=enable_optimization,
        trust_remote_code=True,
    ).cuda()

    # build scheduler
    from opensora.registry import SCHEDULERS

    scheduler = build_module(config.scheduler, SCHEDULERS)

    # hack for classifier-free guidance
    text_encoder.y_embedder = stdit.y_embedder

    # cast the VAE and STDiT to fp16 and put every model in eval mode
    vae = vae.to(torch.float16).eval()
    text_encoder.t5.model = text_encoder.t5.model.eval()  # t5 must be in fp32
    stdit = stdit.to(torch.float16).eval()

    # clear cuda
    torch.cuda.empty_cache()
    return vae, text_encoder, stdit, scheduler


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model-type",
        default="v1.1-stage3",
        choices=MODEL_TYPES,
        help=f"The type of model to run for the Gradio App, can only be {MODEL_TYPES}",
    )
    parser.add_argument("--output", default="./outputs", type=str, help="The path to the output folder")
    parser.add_argument("--port", default=None, type=int, help="The port to run the Gradio App on.")
    parser.add_argument("--host", default=None, type=str, help="The host to run the Gradio App on.")
    parser.add_argument("--share", action="store_true", help="Whether to share this gradio demo.")
    parser.add_argument(
        "--enable-optimization",
        action="store_true",
        help="Whether to enable optimization such as flash attention and fused layernorm",
    )
    return parser.parse_args()


# ============================
# Main Gradio Script
# ============================
# as `run_inference` needs to be wrapped by `spaces.GPU` and the input can only be the prompt text
# so we can't pass the models to `run_inference` as arguments.
# instead, we need to define them globally so that we can access these models inside `run_inference`

# read config
args = parse_args()
config = read_config(CONFIG_MAP[args.model_type])

# make outputs dir
os.makedirs(args.output, exist_ok=True)

# disable torch jit as it can cause failure in gradio SDK
# gradio sdk uses torch with cuda 11.3
torch.jit._state.disable()

# import after installation
from opensora.datasets import IMG_FPS, save_sample
from opensora.utils.misc import to_torch_dtype

# some global variables
dtype = to_torch_dtype(config.dtype)
device = torch.device("cuda")

# build model
vae, text_encoder, stdit, scheduler = build_models(args.model_type, config, enable_optimization=args.enable_optimization)


def run_inference(mode, prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    torch.manual_seed(seed)
    with torch.inference_mode():
        # ======================
        # 1. Preparation
        # ======================
        # parse the inputs
        resolution = RESOLUTION_MAP[resolution][aspect_ratio]

        # gather args from config
        num_frames = config.num_frames
        frame_interval = config.frame_interval
        fps = config.fps
        condition_frame_length = config.condition_frame_length

        # compute number of loops
        if mode == "Text2Image":
            num_frames = 1
            num_loop = 1
        else:
            num_seconds = int(length.rstrip('s'))
            if num_seconds <= 16:
                num_frames = num_seconds * fps // frame_interval
                num_loop = 1
            else:
                config.num_frames = 16
                total_number_of_frames = num_seconds * fps / frame_interval
                num_loop = math.ceil((total_number_of_frames - condition_frame_length) / (num_frames - condition_frame_length))

        # prepare model args
        # single-frame (image) generation is conditioned on the image FPS constant
        if num_frames == 1:
            fps = IMG_FPS

        model_args = dict()
        height_tensor = torch.tensor([resolution[0]], device=device, dtype=dtype)
        width_tensor = torch.tensor([resolution[1]], device=device, dtype=dtype)
        num_frames_tensor = torch.tensor([num_frames], device=device, dtype=dtype)
        ar_tensor = torch.tensor([resolution[0] / resolution[1]], device=device, dtype=dtype)
        fps_tensor = torch.tensor([fps], device=device, dtype=dtype)
        model_args["height"] = height_tensor
        model_args["width"] = width_tensor
        model_args["num_frames"] = num_frames_tensor
        model_args["ar"] = ar_tensor
        model_args["fps"] = fps_tensor

        # compute latent size
        input_size = (num_frames, *resolution)
        latent_size = vae.get_latent_size(input_size)

        # process prompt: translate the Chinese prompt to English first
        prompt = zh2en(prompt_text)[0].get("translation_text")
        prompt_raw = [prompt]
        print(prompt_raw)
        prompt_raw, _ = extract_json_from_prompts(prompt_raw)
        prompt_loops = process_prompts(prompt_raw, num_loop)
        video_clips = []

        # prepare mask strategy
        if mode == "Text2Image":
            mask_strategy = [None]
        elif mode == "Text2Video":
            if reference_image is not None:
                mask_strategy = ['0']
            else:
                mask_strategy = [None]
        else:
            raise ValueError(f"Invalid mode: {mode}")

        # =========================
        # 2. Load reference images
        # =========================
        if mode == "Text2Image":
            refs_x = collect_references_batch([None], vae, resolution)
        elif mode == "Text2Video":
            if reference_image is not None:
                # save image to disk
                from PIL import Image
                im = Image.fromarray(reference_image)

                # encode the reference while the temporary file still exists
                with NamedTemporaryFile(suffix=".jpg") as temp_file:
                    im.save(temp_file.name)
                    refs_x = collect_references_batch([temp_file.name], vae, resolution)
            else:
                refs_x = collect_references_batch([None], vae, resolution)
        else:
            raise ValueError(f"Invalid mode: {mode}")

        # 4.3. long video generation
        for loop_i in range(num_loop):
            # 4.4 sample in hidden space
            batch_prompts = [prompt[loop_i] for prompt in prompt_loops]
            z = torch.randn(len(batch_prompts), vae.out_channels, *latent_size, device=device, dtype=dtype)

            # 4.5. apply mask strategy
            masks = None

            # if cfg.reference_path is not None:
            if loop_i > 0:
                ref_x = vae.encode(video_clips[-1])
                for j, refs in enumerate(refs_x):
                    if refs is None:
                        refs_x[j] = [ref_x[j]]
                    else:
                        refs.append(ref_x[j])
                    if mask_strategy[j] is None:
                        mask_strategy[j] = ""
                    else:
                        mask_strategy[j] += ";"
                    mask_strategy[j] += f"{loop_i},{len(refs)-1},-{condition_frame_length},0,{condition_frame_length}"

            masks = apply_mask_strategy(z, refs_x, mask_strategy, loop_i)

            # 4.6. diffusion sampling
            # hack to update num_sampling_steps and cfg_scale
            scheduler_kwargs = config.scheduler.copy()
            scheduler_kwargs.pop('type')
            scheduler_kwargs['num_sampling_steps'] = sampling_steps
            scheduler_kwargs['cfg_scale'] = cfg_scale

            scheduler.__init__(
                **scheduler_kwargs
            )
            samples = scheduler.sample(
                stdit,
                text_encoder,
                z=z,
                prompts=batch_prompts,
                device=device,
                additional_args=model_args,
                mask=masks,  # scheduler must support mask
            )
            samples = vae.decode(samples.to(dtype))
            video_clips.append(samples)

            # 4.7. save video
            if loop_i == num_loop - 1:
                video_clips_list = [video_clips[0][0]] + [
                    video_clips[i][0][:, config.condition_frame_length :] for i in range(1, num_loop)
                ]
                video = torch.cat(video_clips_list, dim=1)
                current_datetime = datetime.datetime.now()
                timestamp = current_datetime.timestamp()
                save_path = os.path.join(args.output, f"output_{timestamp}")
                saved_path = save_sample(video, save_path=save_path, fps=config.fps // config.frame_interval)
                return saved_path

@spaces.GPU(duration=200)
def run_image_inference(prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    return run_inference("Text2Image", prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale)

@spaces.GPU(duration=200)
def run_video_inference(prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale):
    return run_inference("Text2Video", prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale)


def main():
    # create demo
    with gr.Blocks() as demo:
        with gr.Row():
            with gr.Column():
                gr.HTML("""<h1 align="center">Open-Sora 1.1</h1>""")

        with gr.Row():
            with gr.Column():
                prompt_text = gr.Textbox(
                    label="Prompt",
                    placeholder="请输入中文提示词",
                    lines=4,
                )
                resolution = gr.Radio(
                    choices=["144p", "240p", "360p", "480p", "720p"],
                    value="240p",
                    label="Resolution",
                )
                aspect_ratio = gr.Radio(
                    choices=["9:16", "16:9", "3:4", "4:3", "1:1"],
                    value="9:16",
                    label="Aspect Ratio (H:W)",
                )
                length = gr.Radio(
                    choices=["2s", "4s", "8s", "16s"],
                    value="2s",
                    label="Video Length (only effective for video generation)",
                    info="8s may fail as Hugging Face ZeroGPU has the limitation of max 200 seconds inference time."
                )

                with gr.Row():
                    seed = gr.Slider(
                        value=1024,
                        minimum=1,
                        maximum=2048,
                        step=1,
                        label="Seed"
                    )

                    sampling_steps = gr.Slider(
                        value=100,
                        minimum=1,
                        maximum=200,
                        step=1,
                        label="Sampling steps"
                    )
                    cfg_scale = gr.Slider(
                        value=7.0,
                        minimum=0.0,
                        maximum=10.0,
                        step=0.1,
                        label="CFG Scale"
                    )

                reference_image = gr.Image(
                    label="Reference Image (Optional)",
                )

            with gr.Column():
                output_video = gr.Video(
                    label="Output Video",
                    height="100%"
                )

        with gr.Row():
            image_gen_button = gr.Button("Generate image")
            video_gen_button = gr.Button("Generate video")

        image_gen_button.click(
            fn=run_image_inference,
            inputs=[prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale],
            outputs=reference_image
        )
        video_gen_button.click(
            fn=run_video_inference,
            inputs=[prompt_text, resolution, aspect_ratio, length, reference_image, seed, sampling_steps, cfg_scale],
            outputs=output_video
        )

    # launch
    demo.launch(share=True, inbrowser=True)


if __name__ == "__main__":
    main()
Writing gradio/app-ref.py
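
The mask strategy strings that app-ref.py passes around are terse, so a small standalone sketch may help when reading process_mask_strategy and apply_mask_strategy. Each semicolon-separated group holds up to six comma-separated fields, and missing trailing fields are padded with defaults. The field names are taken from apply_mask_strategy. The helper name parse_mask_strategy, the dict output, and the value 4 used for condition_frame_length are assumptions made for illustration.

```python
# Field order as unpacked in apply_mask_strategy.
FIELDS = ["loop_id", "m_id", "m_ref_start", "m_target_start", "m_length", "edit_ratio"]
# Defaults equivalent to the padding branches in process_mask_strategy.
DEFAULTS = ["0", "0", "0", "0", "1", "0"]


def parse_mask_strategy(mask_strategy):
    """Split a strategy string into one dict per group, padding missing fields."""
    groups = []
    for group in mask_strategy.split(";"):
        values = group.split(",")
        if not 1 <= len(values) <= len(FIELDS):
            raise ValueError(f"Invalid mask strategy: {group}")
        values = values + DEFAULTS[len(values):]
        groups.append(dict(zip(FIELDS, values)))
    return groups


# "0" is the strategy the app uses when a reference image is supplied.
print(parse_mask_strategy("0"))
# The second group has the shape appended for later loops of a long video,
# here with an assumed condition_frame_length of 4.
print(parse_mask_strategy("0;1,0,-4,0,4"))
```

Read this way, "0" means: in loop 0, copy one frame from the start of reference 0 to the start of the latent and keep it fixed (edit_ratio 0). A negative m_ref_start counts back from the end of the reference clip.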

Run the Gradio application. Once it starts successfully, click the web link shown after "Running on public URL" to try it out!

!python gradio/app-ref.py
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/transformers/models/marian/tokenization_marian.py:197: UserWarning: Recommended: pip install sacremoses.
warnings.warn("Recommended: pip install sacremoses.")
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/colossalai/shardformer/layer/normalization.py:45: UserWarning: Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel
warnings.warn("Please install apex from source (https://github.com/NVIDIA/apex) to use the fused layernorm kernel")
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:32<00:00, 16.15s/it]
/home/ma-user/anaconda3/envs/python-3.10.10/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://64147712240bbb3753.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
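
Once the page is open, the resolution, aspect ratio and length options determine the size of the tensor the model has to generate, which is also what drives memory use. The sketch below repeats the lookup and the frame-count arithmetic from run_inference for videos of at most 16 seconds. The 240p entries are copied from RESOLUTION_MAP. The function name input_size_for and the values fps=24 and frame_interval=3 are assumptions, since the real values come from the config file.

```python
# Subset of RESOLUTION_MAP from app-ref.py; each value is (height, width).
RESOLUTION_MAP = {
    "240p": {"16:9": (426, 240), "9:16": (240, 426), "1:1": (320, 320)},
}


def input_size_for(resolution, aspect_ratio, length, fps, frame_interval):
    """Return (num_frames, height, width) for a video request of at most 16 s."""
    height, width = RESOLUTION_MAP[resolution][aspect_ratio]
    num_seconds = int(length.rstrip("s"))
    num_frames = num_seconds * fps // frame_interval
    return (num_frames, height, width)


# UI defaults (240p, 9:16, 2s) with the assumed fps and frame_interval.
print(input_size_for("240p", "9:16", "2s", fps=24, frame_interval=3))  # (16, 240, 426)
```

Because the aspect ratio is labelled H:W, the default "9:16" gives a frame that is wider than it is tall. If generation runs out of memory, lowering the resolution or the length shrinks this tensor.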

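We have also prepared some prompts for reference (the app expects Chinese input; an English gloss follows each prompt):

一只穿着紫色长袍的胖兔子穿过奇幻的风景 (A fat rabbit in a purple robe walks through a fantasy landscape)

海浪冲击着孤零零的灯塔,不祥的灯光 (Waves crash against a lonely lighthouse, ominous lighting)

一个神秘的森林展示了旅行者的冒险经历 (A mysterious forest showcases the adventures of travelers)

一个蓝头发的法师在唱歌 (A blue-haired mage singing)

一个超现实的景观,漂浮的岛屿和空中的瀑布 (A surreal landscape with floating islands and waterfalls in the sky)

一只蓝鸟站在水里 (A blue bird standing in water)

一个年轻人独自走在海边 (A young man walks alone by the seaside)

粉红色的玫瑰在玻璃表面滴,特写 (A pink rose dripping on a glass surface, close-up)

驱车远眺,一列地铁正从隧道中驶出 (Looking into the distance from a moving car, a subway train is emerging from a tunnel)

太空中所有的行星都是绿色和粉色的,背景是明亮的白色恒星 (All the planets in space are green and pink, with bright white stars in the background)

一座漂浮在星体空间的城市,有星星和星云 (A city floating in astral space, with stars and nebulae)

高楼顶上的日出 (A sunrise seen from the top of a tall building)

粉色和青色粉末爆炸 (An explosion of pink and cyan powder)

树林里的鹿在阳光下凝视着相机 (A deer in the woods gazes at the camera in the sunlight)

一道闪电,一个巫师从稀薄的空气中出现了,他的长袍在风中翻腾 (In a flash of lightning, a wizard appears out of thin air, his robe billowing in the wind)

夜晚的未来赛博朋克城市景观,高耸的霓虹灯照亮的摩天大楼 (A futuristic cyberpunk cityscape at night, with towering neon-lit skyscrapers)

在这里,树木、花朵和动物聚集在一起,谱写出一曲大自然的交响乐 (Here, trees, flowers and animals come together to compose a symphony of nature)

一艘幽灵般的船在云层中航行,在月光下的天空中航行 (A ghostly ship sails through the clouds, across a moonlit sky)

日落和美丽的海滩 (A sunset and a beautiful beach)

一个年轻人独自走在森林里 (A young man walks alone in the forest)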
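
Each of these prompts is translated to English before it reaches the text encoder: run_inference in app-ref.py calls the zh2en translation pipeline and keeps the translation_text of the first result. The sketch below mirrors that one step. translate_prompt and fake_zh2en are hypothetical names, and the stub only stands in for the real model so that the example runs without downloading any weights.

```python
from typing import Callable, Dict, List

# Shape of a translation pipeline call: text in, list of result dicts out.
Translator = Callable[[str], List[Dict[str, str]]]


def translate_prompt(prompt_text: str, translator: Translator) -> str:
    """Take the first result's 'translation_text', as run_inference does."""
    return translator(prompt_text)[0]["translation_text"]


def fake_zh2en(text: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for the zh2en pipeline; returns a canned translation."""
    table = {"一只蓝鸟站在水里": "A blue bird standing in the water"}
    return [{"translation_text": table.get(text, text)}]


print(translate_prompt("一只蓝鸟站在水里", fake_zh2en))

# In the notebook the real translator is the one app-ref.py already builds:
#   from transformers import pipeline
#   zh2en = pipeline("translation", model="./opus-mt-zh-en")
#   translate_prompt("一只蓝鸟站在水里", zh2en)
```

The app prints the translated prompt to the notebook output, which is a quick way to check how a Chinese prompt was understood before judging the generated video.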

You can also add a soundtrack to the generated videos with MusicGen and use AI to create short videos.

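5. Video results showcase

Prompt: An extreme close-up of a gray-haired man with a beard in his 60s, deep in thought pondering the history of the universe as he sits at a café in Paris. His eyes follow people as they walk by while he sits mostly motionless. He is dressed in a wool coat and suit jacket over a shirt, and wears a brown beret and glasses, giving him a very professorial appearance. At the end he offers a subtle closed-mouth smile, as if he has found the answer to the mystery of life. The lighting is very cinematic, with golden light and the streets and city of Paris in the background; depth of field, cinematic 35mm film.

Prompt: Drone footage of waves crashing against the rugged cliffs along Garay Point beach in Big Sur. The blue water breaks into white waves, and the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubs cover the cliff's edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff's edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

Prompt: A soaring drone shot captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and set against the vibrant turquoise sea. Seabirds can be seen flying around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged texture of the cliff and the calm sea around it. The water gently laps at the rock base and the greenery clinging to the top of the cliff, and the scene conveys a sense of peaceful isolation at the edge of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.

Prompt: A majestic, beautiful waterfall cascades down a cliff into a serene lake. The waterfall, with its powerful flow, is the central focus of the video. The surrounding landscape is lush, with trees and foliage adding to the natural beauty. The camera angle provides a bird's-eye view of the waterfall, letting viewers appreciate its full height and grandeur. The video is a stunning display of the power and beauty of nature.

Prompt: A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlamps. The scene is a blur of motion, with cars speeding by and pedestrians making their way across the crosswalks. The cityscape is a mix of towering buildings and illuminated signs, creating a vibrant and dynamic atmosphere. The video is shot from a high angle, providing a bird's-eye view of the street and its surroundings. The overall style is dynamic and energetic, capturing the essence of city life at night.

Prompt: A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled night sky. The second frame shows a beautiful sunset casting a warm glow over the landscape. The third frame shows the night sky, filled with stars and a vibrant Milky Way. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the tranquility of the forest.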
