AIGC: Fine-Tuning Techniques (Datawhale X ModelScope AI Summer Camp)


Fine-tuning is a technique widely used in deep learning and machine learning: the parameters of a pretrained model are adjusted so that the model better fits a specific task. Understanding its basic principle and its parameters is essential for getting good results.

Preface

This task covers the basic principle of fine-tuning and gives a clearer picture of its various parameters so you can achieve better results. It also introduces ComfyUI, a workflow-based tool for text-to-image generation that enables highly customized pipelines.

Part 1. First Look at the Tool: Exploring ComfyUI Application Scenarios

ComfyUI is a node- and workflow-based web UI for AI image generation. It focuses on precise workflow customization: by splitting the Stable Diffusion pipeline into individual nodes, it makes workflows both customizable and reproducible.

1. Install ComfyUI in About 20 Minutes

We use the Notebook environment and free GPU compute provided by the ModelScope community to try out ComfyUI.

2. Download the Script Files

Download the script files for installing and running ComfyUI, together with the LoRA file fine-tuned in Task 1:

git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
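If everything went through, the LoRA checkpoint fine-tuned in Task 1 should now sit at the path the workflow below expects. A quick sanity check, as a minimal sketch (the path is the one used in the commands above):

import os

ckpt = "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt"
print("LoRA checkpoint in place:", os.path.exists(ckpt))  # should print True after the mv step above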

3. Open the ComfyUI Installation File

4. Run the One-Click Installer (about 10 minutes)

5. Open the Preview Page

When the last node finishes executing and prints an access link, copy that link into your browser to open the UI.

PS: If the page is blank or shows an error, wait a moment and retry; the program may not have finished starting up yet.

6. A First Taste of ComfyUI Workflows

1) Workflow example without LoRA

Create a .json file with the following content:

{ "last_node_id": 15, "last_link_id": 18, "nodes": [ { "id": 11, "type": "VAELoader", "pos": [ 1323, 240 ], "size": { "0": 315, "1": 58 }, "flags": {}, "order": 0, "mode": 0, "outputs": [ { "name": "VAE", "type": "VAE", "links": [ 12 ], "shape": 3 } ], "properties": { "Node name for S&R": "VAELoader" }, "widgets_values": [ "sdxl.vae.safetensors" ] }, { "id": 10, "type": "VAEDecode", "pos": [ 1368, 369 ], "size": { "0": 210, "1": 46 }, "flags": {}, "order": 6, "mode": 0, "inputs": [ { "name": "samples", "type": "LATENT", "link": 18 }, { "name": "vae", "type": "VAE", "link": 12, "slot_index": 1 } ], "outputs": [ { "name": "IMAGE", "type": "IMAGE", "links": [ 13 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "VAEDecode" } }, { "id": 14, "type": "KolorsSampler", "pos": [ 1011, 371 ], "size": { "0": 315, "1": 222 }, "flags": {}, "order": 5, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 16 }, { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "link": 17 } ], "outputs": [ { "name": "latent", "type": "LATENT", "links": [ 18 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsSampler" }, "widgets_values": [ 1024, 1024, 1000102404233412, "fixed", 25, 5, "EulerDiscreteScheduler" ] }, { "id": 6, "type": "DownloadAndLoadKolorsModel", "pos": [ 201, 368 ], "size": { "0": 315, "1": 82 }, "flags": {}, "order": 1, "mode": 0, "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 16 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "DownloadAndLoadKolorsModel" }, "widgets_values": [ "Kwai-Kolors/Kolors", "fp16" ] }, { "id": 3, "type": "PreviewImage", "pos": [ 1366, 468 ], "size": [ 535.4001724243165, 562.2001106262207 ], "flags": {}, "order": 7, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 13 } ], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 12, "type": "KolorsTextEncode", "pos": [ 519, 529 ], "size": [ 457.2893696934723, 225.28656056301645 ], "flags": {}, "order": 4, "mode": 0, "inputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "link": 14, "slot_index": 0 } ], "outputs": [ { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "links": [ 17 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsTextEncode" }, "widgets_values": [ "cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl", "", 1 ] }, { "id": 15, "type": "Note", "pos": [ 200, 636 ], "size": [ 273.5273818969726, 149.55464588512064 ], "flags": {}, "order": 2, "mode": 0, "properties": { "text": "" }, "widgets_values": [ "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine" ], "color": "#432", "bgcolor": "#653" }, { "id": 13, "type": "DownloadAndLoadChatGLM3", "pos": [ 206, 522 ], "size": [ 274.5334274291992, 58 ], "flags": {}, "order": 3, "mode": 0, "outputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "links": [ 14 ], "shape": 3 } ], "properties": { "Node name for S&R": "DownloadAndLoadChatGLM3" }, "widgets_values": [ "fp16" ] } ], "links": [ [ 12, 11, 0, 10, 1, "VAE" ], [ 13, 10, 0, 3, 0, "IMAGE" ], [ 14, 13, 0, 12, 0, "CHATGLM3MODEL" ], [ 16, 6, 0, 14, 0, "KOLORSMODEL" ], [ 17, 12, 0, 14, 1, "KOLORS_EMBEDS" ], [ 18, 14, 0, 10, 0, 
"LATENT" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 1.1, "offset": { "0": -114.73954010009766, "1": -139.79705810546875 } } }, "version": 0.4 }

Load the models and complete your first image generation.

PS: The first time you click to generate an image, the resources are loaded, which takes quite a while; please be patient.

2) Workflow example with LoRA

Create a .json file with the following content:

 { "last_node_id": 16, "last_link_id": 20, "nodes": [ { "id": 11, "type": "VAELoader", "pos": [ 1323, 240 ], "size": { "0": 315, "1": 58 }, "flags": {}, "order": 0, "mode": 0, "outputs": [ { "name": "VAE", "type": "VAE", "links": [ 12 ], "shape": 3 } ], "properties": { "Node name for S&R": "VAELoader" }, "widgets_values": [ "sdxl.vae.safetensors" ] }, { "id": 10, "type": "VAEDecode", "pos": [ 1368, 369 ], "size": { "0": 210, "1": 46 }, "flags": {}, "order": 7, "mode": 0, "inputs": [ { "name": "samples", "type": "LATENT", "link": 18 }, { "name": "vae", "type": "VAE", "link": 12, "slot_index": 1 } ], "outputs": [ { "name": "IMAGE", "type": "IMAGE", "links": [ 13 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "VAEDecode" } }, { "id": 15, "type": "Note", "pos": [ 200, 636 ], "size": { "0": 273.5273742675781, "1": 149.5546417236328 }, "flags": {}, "order": 1, "mode": 0, "properties": { "text": "" }, "widgets_values": [ "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine" ], "color": "#432", "bgcolor": "#653" }, { "id": 13, "type": "DownloadAndLoadChatGLM3", "pos": [ 206, 522 ], "size": { "0": 274.5334167480469, "1": 58 }, "flags": {}, "order": 2, "mode": 0, "outputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "links": [ 14 ], "shape": 3 } ], "properties": { "Node name for S&R": "DownloadAndLoadChatGLM3" }, "widgets_values": [ "fp16" ] }, { "id": 6, "type": "DownloadAndLoadKolorsModel", "pos": [ 201, 368 ], "size": { "0": 315, "1": 82 }, "flags": {}, "order": 3, "mode": 0, "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 19 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "DownloadAndLoadKolorsModel" }, "widgets_values": [ "Kwai-Kolors/Kolors", "fp16" ] }, { "id": 12, "type": "KolorsTextEncode", "pos": [ 519, 529 ], "size": { "0": 457.28936767578125, "1": 225.28656005859375 }, "flags": {}, "order": 4, "mode": 0, "inputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "link": 14, "slot_index": 0 } ], "outputs": [ { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "links": [ 17 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsTextEncode" }, "widgets_values": [ "二次元,长发,少女,白色背景", "", 1 ] }, { "id": 3, "type": "PreviewImage", "pos": [ 1366, 469 ], "size": { "0": 535.400146484375, "1": 562.2001342773438 }, "flags": {}, "order": 8, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 13 } ], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 16, "type": "LoadKolorsLoRA", "pos": [ 606, 368 ], "size": { "0": 317.4000244140625, "1": 82 }, "flags": {}, "order": 5, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 19 } ], "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 20 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "LoadKolorsLoRA" }, "widgets_values": [ "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", 2 ] }, { "id": 14, "type": "KolorsSampler", "pos": [ 1011, 371 ], "size": { "0": 315, "1": 266 }, "flags": {}, "order": 6, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 20 }, { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "link": 17 }, { "name": "latent", "type": "LATENT", "link": null } ], "outputs": [ { "name": "latent", "type": "LATENT", 
"links": [ 18 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsSampler" }, "widgets_values": [ 1024, 1024, 0, "fixed", 25, 5, "EulerDiscreteScheduler", 1 ] } ], "links": [ [ 12, 11, 0, 10, 1, "VAE" ], [ 13, 10, 0, 3, 0, "IMAGE" ], [ 14, 13, 0, 12, 0, "CHATGLM3MODEL" ], [ 17, 12, 0, 14, 1, "KOLORS_EMBEDS" ], [ 18, 14, 0, 10, 0, "LATENT" ], [ 19, 6, 0, 16, 0, "KOLORSMODEL" ], [ 20, 16, 0, 14, 0, "KOLORSMODEL" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 1.2100000000000002, "offset": { "0": -183.91309381910426, "1": -202.11110769225016 } } }, "version": 0.4 } 

7. Shut Down the ModelScope GPU Environment

Part 2. LoRA Fine-Tuning

LoRA fine-tuning, short for Low-Rank Adaptation, is an efficient model fine-tuning technique that is particularly well suited to large pretrained models. It keeps most of the pretrained parameters unchanged by introducing low-rank matrices, so that only a small number of parameters are adjusted to adapt the model to a specific task.

1. The Basic Principle of LoRA Fine-Tuning

Low-rank approximation of the parameter matrices: large models are typically over-parameterized; their weight matrices are high-dimensional, yet for a specific task only a small subset of the parameters does most of the work. LoRA applies the idea of low-rank matrix decomposition: it introduces two small matrices A and B (A with shape d×r, B with shape r×d, where r is much smaller than d) to approximate the update to the original weight matrix. The product AB has a rank far lower than that of the original weight matrix, yet it is enough to preserve the model's performance on the target task.

Bypass structure: a bypass branch formed by the product of A and B is added alongside the original network. During training, the original network's parameters are frozen and only the bypass parameters A and B are trained. Because A and B contain far fewer parameters than the original network, the GPU memory required for training is reduced dramatically, as sketched in the code below.
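In code, this amounts to adding a small trainable branch next to a frozen weight matrix. A minimal PyTorch sketch of a LoRA-wrapped linear layer, for illustration only (it is not the actual DiffSynth-Studio/Kolors implementation):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d x r, zero-initialized so the bypass starts as a no-op
        self.scale = alpha / r  # a common LoRA scaling convention

    def forward(self, x):
        # frozen path plus low-rank bypass: base(x) + scale * (B A) x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Only A and B receive gradients, which is why the memory and compute cost of training stays low.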

2. The Fine-Tuning Code in Task 2

The code is as follows:

import os

# Notes on the arguments (explanations carried over from the original command):
# - train_kolors_lora.py: the Kolors LoRA training script in DiffSynth-Studio
# - --pretrained_unet_path: the UNet weights to fine-tune
# - --pretrained_text_encoder_path: the text encoder
# - --pretrained_fp16_vae_path: the VAE weights
# - --lora_rank 16: rank 16 balances expressiveness against training cost, cutting compute and memory via LoRA without a noticeable drop in quality
# - --lora_alpha 4.0: the LoRA alpha value, which controls the strength of the adaptation
# - --dataset_path: the dataset used for training
# - --output_path: where the trained model is saved
# - --max_epochs 1: train for at most one epoch
# - --center_crop: center-crop images during preprocessing
# - --use_gradient_checkpointing: enable gradient checkpointing to save memory
# - --precision "16-mixed": mixed 16-bit precision, which speeds up training and reduces GPU memory use
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()

os.system(cmd)  # run the Kolors LoRA training
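To get a feel for why rank 16 is cheap, compare the trainable parameters of the low-rank branch with a full weight matrix. A back-of-the-envelope calculation (the 4096x4096 layer size is an illustrative assumption, not the actual Kolors UNet shape):

d, r = 4096, 16
full = d * d        # parameters in one full d x d weight matrix
lora = 2 * d * r    # parameters in A (r x d) plus B (d x r)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%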

Summary

AIGC, as an important branch of artificial intelligence, is gradually changing the way we live and work. As the technology keeps developing and maturing, there is good reason to believe it will show its unique appeal and value in many more fields.
