Training Your Own LoRAs
The WebUI seeks to make training your own LoRAs as easy as possible. It comes down to just a few simple steps:
Step 1: Make a plan.
- What base model do you want to use? The LoRA you make has to be matched up to a single architecture (eg LLaMA-13B) and cannot be transferred to others (eg LLaMA-7B, StableLM, etc. would all be different). Derivatives of the same model (eg Alpaca finetune of LLaMA-13B) might be transferrable, but even then it’s best to train exactly on what you plan to use.
- What model format do you want? At time of writing, 8-bit models are most stable, and 4-bit are supported but experimental. In the near future it is likely that 4-bit will be the best option for most users.
- What are you training it on? Do you want it to learn real information, a simple format, …?
Step 2: Gather a dataset.
- If you use a dataset similar to the Alpaca format, that is natively supported by the `Formatted Dataset` input in the WebUI, with premade formatter options.
- If you use a dataset that isn’t matched to Alpaca’s format, but uses the same basic JSON structure, you can make your own format file by copying `training/formats/alpaca-format.json` to a new file and editing its content.
- If you can get the dataset into a simple text file, that works too! You can train using the `Raw text file` input option.
- This means you can for example just copy/paste a chatlog/documentation page/whatever you want, shove it in a plain text file, and train on it.
- If you use a structured dataset not in this format, you may have to find an external way to convert it - or open an issue to request native support.
Step 3: Do the training.
- 3.1: Load the WebUI, and your model.
- Make sure you don’t have any LoRAs already loaded (unless you want to train for multi-LoRA usage).
- 3.2: Open the `Training` tab at the top, `Train LoRA` sub-tab.
- 3.3: Fill in the name of the LoRA, select your dataset in the dataset options.
- 3.4: Select other parameters to your preference. See parameters below.
- 3.5: Click `Start LoRA Training`, and wait.
- It can take a few hours for a large dataset, or just a few minutes if doing a small run.
- You may want to monitor your loss value while it goes.
Step 4: Evaluate your results.
- Load the LoRA under the Models Tab.
- You can go test-drive it on the `Text generation` tab, or you can use the `Perplexity evaluation` sub-tab of the `Training` tab. (A short sketch of checking a LoRA outside the WebUI follows this list.)
- If you used the `Save every n steps` option, you can grab prior copies of the model from sub-folders within the LoRA model’s folder and try them instead.
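If you would rather sanity-check a finished LoRA outside the WebUI, the sketch below loads it with the `peft` library. This is an illustrative sketch, not part of the WebUI itself; the model path `models/llama-7b` and the LoRA folder `loras/my-lora` are placeholder names for your own paths.

```python
# Minimal sketch: load a base model, apply a trained LoRA folder
# (the one containing adapter_model.bin), and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("models/llama-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("models/llama-7b")
model = PeftModel.from_pretrained(base, "loras/my-lora")  # placeholder LoRA folder

prompt = "User: answer my question\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```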
Step 5: Re-run if you’re unhappy.
- Make sure to unload the LoRA before training it.
- You can simply resume a prior run - use `Copy parameters from` to select your LoRA, and edit parameters. Note that you cannot change the `Rank` of an already created LoRA.
- If you want to resume from a checkpoint saved along the way, simply copy the contents of the checkpoint folder into the LoRA’s folder.
- (Note: `adapter_model.bin` is the important file that holds the actual LoRA content).
- This will restart the Learning Rate and Steps from the beginning. If you want to resume as if you were midway through, you can adjust your Learning Rate to the last reported LR in the logs and reduce your epochs.
- Or, you can start over entirely if you prefer.
- If your model is producing corrupted outputs, you probably need to start over and use a lower Learning Rate.
- If your model isn’t learning detailed information but you want it to, you might need to just run more epochs, or you might need a higher Rank.
- If your model is enforcing a format you didn’t want, you may need to tweak your dataset, or start over and not train as far.
Format Files
If using JSON formatted datasets, they are presumed to be in the following approximate format:
```json
[
    {
        "somekey": "somevalue",
        "key2": "value2"
    },
    {
        // etc
    }
]
```
Where the keys (eg `somekey`, `key2` above) are standardized, and relatively consistent across the dataset, and the values (eg `somevalue`, `value2`) contain the content actually intended to be trained.
For Alpaca, the keys are `instruction`, `input`, and `output`, wherein `input` is sometimes blank.
A simple format file for Alpaca to be used as a chat bot is:
```json
{
    "instruction,output": "User: %instruction%\nAssistant: %output%",
    "instruction,input,output": "User: %instruction%: %input%\nAssistant: %output%"
}
```
Note that the keys (eg `instruction,output`) are a comma-separated list of dataset keys, and the values are a simple string that uses those keys with `%%`.
So for example if a dataset has `"instruction": "answer my question"`, then the format file’s `User: %instruction%\n` will be automatically filled in as `User: answer my question\n`.
If you have different sets of key inputs, you can make your own format file to match it. This format-file is designed to be as simple as possible to enable easy editing to match your needs.
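To make the substitution rule concrete, here is a minimal standalone sketch of how a format file’s `%key%` placeholders get filled in from one dataset entry. It illustrates the idea only - it is not the WebUI’s actual code, and the example entry is made up.

```python
# Illustrative sketch: apply an Alpaca-style format file to one dataset entry.
format_file = {
    "instruction,output": "User: %instruction%\nAssistant: %output%",
    "instruction,input,output": "User: %instruction%: %input%\nAssistant: %output%",
}

entry = {"instruction": "answer my question", "input": "", "output": "Sure, here you go."}

# Pick the template whose comma-separated key list matches the non-blank fields.
keys = ",".join(k for k in ("instruction", "input", "output") if entry.get(k))
text = format_file[keys]
for key, value in entry.items():
    text = text.replace(f"%{key}%", value)

print(text)
# User: answer my question
# Assistant: Sure, here you go.
```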
Raw Text File Settings
When using raw text files as your dataset, the text is automatically split into chunks based on your `Cutoff Length`, and you get a few basic options to configure them (a rough code sketch of the chunking ideas follows the list below).
- `Overlap Length` is how much to overlap chunks by. Overlapping chunks helps prevent the model from learning strange mid-sentence cuts, and instead learn continual sentences that flow from earlier text.
- `Prefer Newline Cut Length` sets a maximum distance in characters to shift the chunk cut towards newlines. Doing this helps prevent lines from starting or ending mid-sentence, preventing the model from learning to cut off sentences randomly.
- `Hard Cut String` sets a string that indicates there must be a hard cut without overlap. This defaults to `\n\n\n`, meaning 3 newlines. No trained chunk will ever contain this string. This allows you to insert unrelated sections of text in the same text file, but still ensure the model won’t be taught to randomly change the subject.
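For illustration, here is a rough character-level sketch of how overlap and the hard cut string interact. The real WebUI implementation works on tokens and differs in detail (this sketch also ignores `Prefer Newline Cut Length`), so treat it only as a picture of the idea.

```python
# Rough sketch: split text into overlapping chunks, never crossing a hard cut.
def chunk_text(text, cutoff_len=256, overlap_len=64, hard_cut_string="\n\n\n"):
    chunks = []
    step = cutoff_len - overlap_len
    # Hard cuts split the file into unrelated sections first.
    for section in text.split(hard_cut_string):
        for start in range(0, max(len(section), 1), step):
            chunk = section[start:start + cutoff_len]
            if chunk.strip():
                chunks.append(chunk)
    return chunks

example = "First topic, several sentences long..." + "\n\n\n" + "Second, unrelated topic..."
for chunk in chunk_text(example, cutoff_len=20, overlap_len=5):
    print(repr(chunk))  # no chunk ever spans the hard cut
```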
Parameters
The basic purpose and function of each parameter is documented on-page in the WebUI, so read through them in the UI to understand your options.
That said, here’s a guide to the most important parameter choices you should consider:
VRAM
- First, you must consider your VRAM availability.
- Generally, under default settings, VRAM usage for training is very close to VRAM usage when generating text (with 1000+ tokens of context) (ie, if you can generate text, you can train LoRAs).
- Note: worse by default in the 4-bit monkeypatch currently. Reduce `Micro Batch Size` to `1` to restore this to expectations.
- If you have VRAM to spare, setting higher batch sizes will use more VRAM and get you better quality training in exchange.
- If you have large data, setting a higher cutoff length may be beneficial, but will cost significant VRAM. If you can spare some, set your batch size to `1` and see how high you can push your cutoff length.
- If you’re low on VRAM, reducing batch size or cutoff length will of course improve that.
- Don’t be afraid to just try it and see what happens. If it’s too much, it will just error out, and you can lower settings and try again.
Rank
- Second, you want to consider the amount of learning you want.
- For example, you may wish to just learn a dialogue format (as in the case of Alpaca), in which case setting a low `Rank` value (32 or lower) works great.
- Or, you might be training on project documentation you want the bot to understand and be able to answer questions about, in which case the higher the rank, the better.
- Generally, higher Rank = more precise learning = more total content learned = more VRAM usage while training.
Learning Rate and Epochs
- Third, how carefully you want it to be learned.
- In other words, how okay or not you are with the model losing unrelated understandings.
- You can control this with 3 key settings: the Learning Rate, its scheduler, and your total epochs.
- The learning rate controls how much change is made to the model by each token it sees.
- It’s in scientific notation normally, so for example `3e-4` means `3 * 10^-4`, which is `0.0003`. The number after `e-` controls how many `0`s are in the number.
- Higher values let training run faster, but also are more likely to corrupt prior data in the model.
- You essentially have two variables to balance: the LR, and Epochs.
- If you make LR higher, you can set Epochs equally lower to match. High LR + low epochs = very fast, low quality training.
- If you make LR low, set epochs high. Low LR + high epochs = slow but high-quality training.
- The scheduler controls change-over-time as you train - it starts high, and then goes low. This helps balance getting data in, and having decent quality, at the same time.
- You can see graphs of the different scheduler options in the HuggingFace docs; the short sketch below shows how one scheduler reshapes the learning rate over a run.
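As a hedged illustration (the step counts, warmup, and dummy parameter below are made up, and this is not the WebUI’s internal code), the `transformers.get_scheduler` helper can show the start-high-then-decay behaviour directly:

```python
# Sketch: watch a cosine scheduler decay the learning rate over a short run.
import torch
from transformers import get_scheduler

param = torch.nn.Parameter(torch.zeros(1))       # dummy parameter, just to build an optimizer
optimizer = torch.optim.AdamW([param], lr=3e-4)  # 3e-4 = 0.0003

scheduler = get_scheduler(
    "cosine",                # also try "linear", "constant_with_warmup", etc.
    optimizer=optimizer,
    num_warmup_steps=10,
    num_training_steps=100,
)

for step in range(100):
    optimizer.step()
    scheduler.step()
    if step % 20 == 0:
        print(step, scheduler.get_last_lr()[0])  # rises during warmup, then decays toward 0
```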
Loss
When you’re running training, the WebUI’s console window will log reports that include, among other things, a numeric value named `Loss`. It will start as a high number, and gradually get lower and lower as it goes.
“Loss” in the world of AI training theoretically means “how close is the model to perfect”, with `0` meaning “absolutely perfect”. This is calculated by measuring the difference between the model outputting exactly the text you’re training it to output, and what it actually outputs.
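In concrete terms, that “difference” is a cross-entropy between the model’s predicted next-token probabilities and the tokens the dataset actually contains. A tiny hedged sketch of that calculation, with made-up numbers unrelated to any real model:

```python
# Sketch: the reported loss is essentially cross-entropy over next-token predictions.
import torch
import torch.nn.functional as F

vocab_size = 8
logits = torch.randn(5, vocab_size)            # model scores for 5 token positions
targets = torch.randint(0, vocab_size, (5,))   # the "correct" tokens from the dataset

loss = F.cross_entropy(logits, targets)
print(loss.item())  # 0 only if the model is perfectly confident and correct everywhere
```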
In practice, a good LLM should have a very complex, variable range of ideas running in its artificial head, so a loss of `0` would indicate that the model has broken and forgotten how to think about anything other than what you trained it on.
So, in effect, Loss is a balancing game: you want to get it low enough that it understands your data, but high enough that it isn’t forgetting everything else. Generally, if it goes below `1.0`, it’s going to start forgetting its prior memories, and you should stop training. In some cases you may prefer to take it as low as `0.5` (if you want it to be very very predictable). Different goals have different needs, so don’t be afraid to experiment and see what works best for you.
Note: if you see Loss start at or suddenly jump to exactly `0`, it is likely something has gone wrong in your training process (eg model corruption).
Note: 4-Bit Monkeypatch
The 4-bit LoRA monkeypatch works for training, but has side effects:
- VRAM usage is higher currently. You can reduce the `Micro Batch Size` to `1` to compensate.
- Models do funky things. LoRAs apply themselves, or refuse to apply, or spontaneously error out, etc. It can be helpful to reload the base model or restart the WebUI between training and usage to minimize the chance of anything going haywire.
- Loading or working with multiple LoRAs at the same time doesn’t currently work.
- Generally, recognize and treat the monkeypatch as the dirty temporary hack it is - it works, but isn’t very stable. It will get better in time when everything is merged upstream for full official support.
Legacy notes
LoRA training was contributed by mcmonkey4eva in PR #570.
Using the original alpaca-lora code
Kept here for reference. The Training tab has many more features than this method.
```
conda activate textgen
git clone https://github.com/tloen/alpaca-lora
```
Edit those two lines in `alpaca-lora/finetune.py` to use your existing model folder instead of downloading everything from decapoda:
```python
model = LlamaForCausalLM.from_pretrained(
    "models/llama-7b",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(
    "models/llama-7b", add_eos_token=True
)
```
Run the script with:
```
python finetune.py
```
It just works. It runs at 22.32s/it, with 1170 iterations in total, so about seven and a half hours for training a LoRA. RTX 3090, 18153MiB VRAM used, drawing maximum power (350W, room heater mode).