AIGC Notes--Using LoRA with the PEFT Library


1--Related Explanations

LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS

Three Applications of LoRA in Stable Diffusion: Principles and Code Examples

PEFT-LoRA

2--Basic Principle

Freeze the original layers, then add and train two low-rank matrices to achieve the effect of fine-tuning the model.
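To make the idea concrete, here is a minimal sketch of a LoRA-wrapped linear layer in plain PyTorch. The class name `LoRALinear` and its arguments are hypothetical and chosen only for illustration; this is not PEFT's internal implementation, just the same idea under the assumption y = Wx + (alpha / r) * BAx, with B initialized to zero.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper around a frozen nn.Linear (hypothetical helper)."""

    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        # Freeze the original layer; only A and B are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A: [r, in_features], B: [out_features, r]
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) / r)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scaling * B A x
        return self.base(x) + self.scaling * (x @ self.lora_A.t() @ self.lora_B.t())


torch.manual_seed(0)
base = nn.Linear(64, 128)
layer = LoRALinear(base, r=4, alpha=4.0)
x = torch.rand(2, 64)

# B starts at zero, so the wrapped layer initially matches the original one.
same_at_init = torch.allclose(layer(x), base(x))
trainable = [name for name, p in layer.named_parameters() if p.requires_grad]
print(same_at_init, trainable)
```

Only `lora_A` and `lora_B` receive gradients here, which is what keeps the number of trained parameters small.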

3--Simple Code

```python
import copy

import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model, LoraModel
from peft.utils import get_peft_model_state_dict


# Define the model
class Simple_Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(64, 128)
        self.linear2 = nn.Linear(128, 256)

    def forward(self, x: torch.Tensor):
        x = self.linear1(x)
        x = self.linear2(x)
        return x


if __name__ == "__main__":
    # Initialize the original model
    origin_model = Simple_Model()

    # LoRA config
    model_lora_config = LoraConfig(
        r=32,
        lora_alpha=32,  # scaling = lora_alpha / r; lora_alpha is usually set equal to r, i.e. scale = 1
        init_lora_weights="gaussian",  # weight initialization scheme
        target_modules=["linear1", "linear2"],  # layers that receive LoRA layers
        lora_dropout=0.1,
    )

    # Test data
    input_data = torch.rand(2, 64)
    origin_output = origin_model(input_data)

    # Weights of the original model
    origin_state_dict = origin_model.state_dict()

    # Two ways to build the LoRA model. Both modify the wrapped model in place,
    # so the second wrapper gets its own copy, made before the first call.
    origin_model_copy = copy.deepcopy(origin_model)
    new_model1 = get_peft_model(origin_model, model_lora_config)
    new_model2 = LoraModel(origin_model_copy, model_lora_config, "default")

    output1 = new_model1(input_data)
    output2 = new_model2(input_data)
    # lora_B is initialized to all zeros, so at first y = WX + (alpha/r) * BA * X == WX
    # origin_output == output1 == output2

    # Get the LoRA weights; the key names differ between the two wrappers
    new_model1_lora_state_dict = get_peft_model_state_dict(new_model1)
    new_model2_lora_state_dict = get_peft_model_state_dict(new_model2)

    # origin_state_dict['linear1.weight'].shape -> [output_dim, input_dim]
    # new_model1_lora_state_dict['base_model.model.linear1.lora_A.weight'].shape -> [r, input_dim]
    # new_model1_lora_state_dict['base_model.model.linear1.lora_B.weight'].shape -> [output_dim, r]
    print("All Done!")
```

4--Saving and Merging Weights

The core formula is: new_weights = origin_weights + alpha * (BA), where alpha is the merge factor (set to 1 in the code of this section).

```python
# Save the LoRA weights with save_lora_weights from diffusers
from diffusers import StableDiffusionPipeline

save_path = "./"
global_step = 0
StableDiffusionPipeline.save_lora_weights(
    save_directory=save_path,
    unet_lora_layers=new_model1_lora_state_dict,
    safe_serialization=True,
    weight_name=f"checkpoint-{global_step}.safetensors",
)

# Load the LoRA weights (following Stable Diffusion); a simpler version could be written
from safetensors import safe_open

alpha = 1.0  # merge factor
lora_path = "./" + f"checkpoint-{global_step}.safetensors"
state_dict = {}
with safe_open(lora_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        state_dict[key] = f.get_tensor(key)

# Note: new_model1 still carries its active LoRA branches. For real use, merge
# into a model without adapters so the update is not applied twice.
all_lora_weights = []
for key in state_dict:
    # Only process the lora_A (down) keys; lora_B is looked up from them
    if "lora_B." in key:
        continue

    up_key = key.replace(".lora_A.", ".lora_B.")  # derive the lora_B key from the lora_A key
    model_key = key.replace("unet.", "").replace("lora_A.", "").replace("lora_B.", "")
    layer_infos = model_key.split(".")[:-1]

    # Walk down the module tree to the target layer
    curr_layer = new_model1
    while len(layer_infos) > 0:
        temp_name = layer_infos.pop(0)
        curr_layer = getattr(curr_layer, temp_name)

    weight_down = state_dict[key].to(curr_layer.weight.data.device)
    weight_up = state_dict[up_key].to(curr_layer.weight.data.device)

    # Merge the LoRA weights into the original weights: new_W = origin_W + alpha * (BA)
    delta = torch.mm(weight_up, weight_down)
    curr_layer.weight.data += alpha * delta
    all_lora_weights.append([model_key, delta.t()])

print("Load Lora Done")
```
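As the comment in the code above says, a simpler version can be written. The sketch below isolates just the merge step on a plain `nn.Linear`. The `lora_A` and `lora_B` tensors are random stand-ins rather than weights loaded from a checkpoint (an assumption made purely for illustration), and the check at the end confirms that the merged layer reproduces the base output plus the low-rank branch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_dim, out_dim, r = 64, 128, 32
alpha = 1.0  # merge factor, as in the formula new_W = origin_W + alpha * (BA)

layer = nn.Linear(in_dim, out_dim)
# Stand-ins for trained LoRA weights (random here, purely for illustration).
lora_A = torch.randn(r, in_dim) * 0.01   # [r, input_dim]
lora_B = torch.randn(out_dim, r) * 0.01  # [output_dim, r]

x = torch.rand(2, in_dim)
with torch.no_grad():
    # Unmerged: base output plus the low-rank branch.
    expected = layer(x) + alpha * (x @ lora_A.t() @ lora_B.t())
    # Merged: fold the low-rank update into the weight matrix in place.
    layer.weight += alpha * (lora_B @ lora_A)
    merged = layer(x)

max_diff = (merged - expected).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

After merging, the layer is an ordinary `nn.Linear` again, so inference needs no extra LoRA branch.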

5--Complete Code

PEFT_LoRA

Summary

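### Article Summary: LoRA in Stable Diffusion and Code Examples
#### 1. Introduction
LoRA (Low-Rank Adaptation of Large Language Models) is a method for fine-tuning large pretrained models. By freezing the original layers and adding and training two low-rank matrices (A and B), LoRA fine-tunes a model at lower compute cost and with far fewer trainable parameters. This article points to an explanation of three applications of LoRA in Stable Diffusion and gives a PEFT-LoRA walkthrough with code examples.
#### 2. Basic Principle
The core idea of LoRA is to keep the original parameters fixed and fine-tune the model through two trainable low-rank matrices, A and B. Their rank r is much smaller than the dimensions of the original matrix, which greatly reduces the number of trainable parameters and the amount of computation. Concretely, if the weight of the original linear layer is W, LoRA updates the weight as follows: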
\[ \text{new\_weights} = \text{origin\_weights} + \alpha \cdot (B \cdot A) \]
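where \(\alpha\) is the scaling factor. In PEFT the effective scale is `lora_alpha / r`, so `lora_alpha` is usually set equal to the rank r, which gives an initial scale of 1.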
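As a rough worked example of the parameter savings, the arithmetic below compares a full weight matrix with its LoRA pair A and B. The helper `lora_param_counts` is hypothetical. The first shape matches `linear1` from the example model with r = 32; the second (a 4096 x 4096 layer with r = 8) is an assumed size used only for illustration.

```python
def lora_param_counts(in_dim: int, out_dim: int, r: int) -> tuple:
    """Return (full weight parameters, LoRA parameters) for one linear layer."""
    full = in_dim * out_dim          # W: [out_dim, in_dim]
    lora = r * in_dim + out_dim * r  # A: [r, in_dim], B: [out_dim, r]
    return full, lora


small = lora_param_counts(64, 128, 32)
large = lora_param_counts(4096, 4096, 8)
print(small)  # (8192, 6144)
print(large)  # (16777216, 65536)
```

With r = 32 on such a small layer the saving is modest; the reduction becomes substantial only when r is much smaller than the layer dimensions.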
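#### 3. Simple Code Example
The article gives a simple PyTorch example that applies LoRA to a custom model with the PEFT library. It covers defining the model, setting up the LoRA config, wrapping the model with LoRA, and comparing the outputs before and after wrapping. The core code is: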
```python
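# Model definition
class Simple_Model(nn.Module):
    ...  # layer structure
# Configure LoRA
lora_config = LoraConfig(r=32, lora_alpha=32, ...)
# Create and initialize the original model
origin_model = Simple_Model()
# Build the LoRA model (two ways)
new_model1 = get_peft_model(origin_model, lora_config)
new_model2 = LoraModel(origin_model, lora_config, "default")
# Compare the original output with the LoRA model outputs
# Initially the B matrix is all zeros, so all three outputs are identical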
```
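#### 4. Saving and Merging Weights
The article also shows how to save the LoRA parameters and merge them back into the original model weights. Computing \( \alpha \cdot (B \cdot A) \) gives the low-rank update, which is added to the weight matrix in place. The `save_lora_weights` function from the diffusers library saves the LoRA weights, and the example then loads them from a `.safetensors` file and applies them to the model. The code snippet includes: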
```python
# Save the LoRA weights
StableDiffusionPipeline.save_lora_weights(...)
# Load and merge the LoRA weights
for key in state_dict:
    # ... extract the A and B weights, compute BA, and update the model parameters ...
```
#### 5. Summary
This article covered how LoRA is used to fine-tune large models, with Stable Diffusion as the motivating case. Through the example code, readers can see how to configure and apply LoRA to a custom model, and how to save and merge LoRA weights. As an efficient task-specific adaptation technique, LoRA is likely to see wider use in other domains.