AIGC: Fine-Tuning Techniques (Datawhale X ModelScope AI Summer Camp)


Fine-tuning is a technique widely used in deep learning and machine learning: the parameters of a pretrained model are adjusted so that the model better fits a specific task. Understanding its basic principle and its parameters is essential for getting good results.

Preface

This task covers the basic principle of fine-tuning and gives a clearer picture of its various parameters so you can achieve better results. It also introduces ComfyUI, a workflow-based tool for text-to-image generation that enables highly customized pipelines.

Part 1. First Look at the Tool: Exploring ComfyUI Application Scenarios

ComfyUI is a node- and workflow-based web UI for AI image generation. It focuses on precise workflow customization: by splitting the Stable Diffusion pipeline into individual nodes, it makes workflows both customizable and reproducible.

1. Install ComfyUI in About 20 Minutes

We use the Notebook environment and free GPU compute provided by the ModelScope community to try out ComfyUI.

2. Download the Script Files

Download the script files for installing and running ComfyUI, together with the LoRA file fine-tuned in Task 1:

git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
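If everything went through, the LoRA checkpoint fine-tuned in Task 1 should now sit at the path the workflow below expects. A quick sanity check, as a minimal sketch (the path is the one used in the commands above):

import os

ckpt = "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt"
print("LoRA checkpoint in place:", os.path.exists(ckpt))  # should print True after the mv step above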

3. Open the ComfyUI Installation File

4. Run the One-Click Installer (about 10 minutes)

5. Open the Preview Page

When the last node finishes executing and prints an access link, copy that link into your browser to open the UI.

PS: If the page is blank or shows an error, wait a moment and retry; the program may not have finished starting up yet.

6. A First Taste of ComfyUI Workflows

1) Workflow example without LoRA

Create a .json file with the following content:

{ "last_node_id": 15, "last_link_id": 18, "nodes": [ { "id": 11, "type": "VAELoader", "pos": [ 1323, 240 ], "size": { "0": 315, "1": 58 }, "flags": {}, "order": 0, "mode": 0, "outputs": [ { "name": "VAE", "type": "VAE", "links": [ 12 ], "shape": 3 } ], "properties": { "Node name for S&R": "VAELoader" }, "widgets_values": [ "sdxl.vae.safetensors" ] }, { "id": 10, "type": "VAEDecode", "pos": [ 1368, 369 ], "size": { "0": 210, "1": 46 }, "flags": {}, "order": 6, "mode": 0, "inputs": [ { "name": "samples", "type": "LATENT", "link": 18 }, { "name": "vae", "type": "VAE", "link": 12, "slot_index": 1 } ], "outputs": [ { "name": "IMAGE", "type": "IMAGE", "links": [ 13 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "VAEDecode" } }, { "id": 14, "type": "KolorsSampler", "pos": [ 1011, 371 ], "size": { "0": 315, "1": 222 }, "flags": {}, "order": 5, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 16 }, { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "link": 17 } ], "outputs": [ { "name": "latent", "type": "LATENT", "links": [ 18 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsSampler" }, "widgets_values": [ 1024, 1024, 1000102404233412, "fixed", 25, 5, "EulerDiscreteScheduler" ] }, { "id": 6, "type": "DownloadAndLoadKolorsModel", "pos": [ 201, 368 ], "size": { "0": 315, "1": 82 }, "flags": {}, "order": 1, "mode": 0, "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 16 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "DownloadAndLoadKolorsModel" }, "widgets_values": [ "Kwai-Kolors/Kolors", "fp16" ] }, { "id": 3, "type": "PreviewImage", "pos": [ 1366, 468 ], "size": [ 535.4001724243165, 562.2001106262207 ], "flags": {}, "order": 7, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 13 } ], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 12, "type": "KolorsTextEncode", "pos": [ 519, 529 ], "size": [ 457.2893696934723, 225.28656056301645 ], "flags": {}, "order": 4, "mode": 0, "inputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "link": 14, "slot_index": 0 } ], "outputs": [ { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "links": [ 17 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsTextEncode" }, "widgets_values": [ "cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl", "", 1 ] }, { "id": 15, "type": "Note", "pos": [ 200, 636 ], "size": [ 273.5273818969726, 149.55464588512064 ], "flags": {}, "order": 2, "mode": 0, "properties": { "text": "" }, "widgets_values": [ "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine" ], "color": "#432", "bgcolor": "#653" }, { "id": 13, "type": "DownloadAndLoadChatGLM3", "pos": [ 206, 522 ], "size": [ 274.5334274291992, 58 ], "flags": {}, "order": 3, "mode": 0, "outputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "links": [ 14 ], "shape": 3 } ], "properties": { "Node name for S&R": "DownloadAndLoadChatGLM3" }, "widgets_values": [ "fp16" ] } ], "links": [ [ 12, 11, 0, 10, 1, "VAE" ], [ 13, 10, 0, 3, 0, "IMAGE" ], [ 14, 13, 0, 12, 0, "CHATGLM3MODEL" ], [ 16, 6, 0, 14, 0, "KOLORSMODEL" ], [ 17, 12, 0, 14, 1, "KOLORS_EMBEDS" ], [ 18, 14, 0, 10, 0, 
"LATENT" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 1.1, "offset": { "0": -114.73954010009766, "1": -139.79705810546875 } } }, "version": 0.4 }

Load the models and complete your first image generation.

PS: The first time you click to generate an image, the resources are loaded, which takes quite a while; please be patient.

2) Workflow example with LoRA

Create a .json file with the following content:

 { "last_node_id": 16, "last_link_id": 20, "nodes": [ { "id": 11, "type": "VAELoader", "pos": [ 1323, 240 ], "size": { "0": 315, "1": 58 }, "flags": {}, "order": 0, "mode": 0, "outputs": [ { "name": "VAE", "type": "VAE", "links": [ 12 ], "shape": 3 } ], "properties": { "Node name for S&R": "VAELoader" }, "widgets_values": [ "sdxl.vae.safetensors" ] }, { "id": 10, "type": "VAEDecode", "pos": [ 1368, 369 ], "size": { "0": 210, "1": 46 }, "flags": {}, "order": 7, "mode": 0, "inputs": [ { "name": "samples", "type": "LATENT", "link": 18 }, { "name": "vae", "type": "VAE", "link": 12, "slot_index": 1 } ], "outputs": [ { "name": "IMAGE", "type": "IMAGE", "links": [ 13 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "VAEDecode" } }, { "id": 15, "type": "Note", "pos": [ 200, 636 ], "size": { "0": 273.5273742675781, "1": 149.5546417236328 }, "flags": {}, "order": 1, "mode": 0, "properties": { "text": "" }, "widgets_values": [ "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine" ], "color": "#432", "bgcolor": "#653" }, { "id": 13, "type": "DownloadAndLoadChatGLM3", "pos": [ 206, 522 ], "size": { "0": 274.5334167480469, "1": 58 }, "flags": {}, "order": 2, "mode": 0, "outputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "links": [ 14 ], "shape": 3 } ], "properties": { "Node name for S&R": "DownloadAndLoadChatGLM3" }, "widgets_values": [ "fp16" ] }, { "id": 6, "type": "DownloadAndLoadKolorsModel", "pos": [ 201, 368 ], "size": { "0": 315, "1": 82 }, "flags": {}, "order": 3, "mode": 0, "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 19 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "DownloadAndLoadKolorsModel" }, "widgets_values": [ "Kwai-Kolors/Kolors", "fp16" ] }, { "id": 12, "type": "KolorsTextEncode", "pos": [ 519, 529 ], "size": { "0": 457.28936767578125, "1": 225.28656005859375 }, "flags": {}, "order": 4, "mode": 0, "inputs": [ { "name": "chatglm3_model", "type": "CHATGLM3MODEL", "link": 14, "slot_index": 0 } ], "outputs": [ { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "links": [ 17 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsTextEncode" }, "widgets_values": [ "二次元,长发,少女,白色背景", "", 1 ] }, { "id": 3, "type": "PreviewImage", "pos": [ 1366, 469 ], "size": { "0": 535.400146484375, "1": 562.2001342773438 }, "flags": {}, "order": 8, "mode": 0, "inputs": [ { "name": "images", "type": "IMAGE", "link": 13 } ], "properties": { "Node name for S&R": "PreviewImage" } }, { "id": 16, "type": "LoadKolorsLoRA", "pos": [ 606, 368 ], "size": { "0": 317.4000244140625, "1": 82 }, "flags": {}, "order": 5, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 19 } ], "outputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "links": [ 20 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "LoadKolorsLoRA" }, "widgets_values": [ "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt", 2 ] }, { "id": 14, "type": "KolorsSampler", "pos": [ 1011, 371 ], "size": { "0": 315, "1": 266 }, "flags": {}, "order": 6, "mode": 0, "inputs": [ { "name": "kolors_model", "type": "KOLORSMODEL", "link": 20 }, { "name": "kolors_embeds", "type": "KOLORS_EMBEDS", "link": 17 }, { "name": "latent", "type": "LATENT", "link": null } ], "outputs": [ { "name": "latent", "type": "LATENT", 
"links": [ 18 ], "shape": 3, "slot_index": 0 } ], "properties": { "Node name for S&R": "KolorsSampler" }, "widgets_values": [ 1024, 1024, 0, "fixed", 25, 5, "EulerDiscreteScheduler", 1 ] } ], "links": [ [ 12, 11, 0, 10, 1, "VAE" ], [ 13, 10, 0, 3, 0, "IMAGE" ], [ 14, 13, 0, 12, 0, "CHATGLM3MODEL" ], [ 17, 12, 0, 14, 1, "KOLORS_EMBEDS" ], [ 18, 14, 0, 10, 0, "LATENT" ], [ 19, 6, 0, 16, 0, "KOLORSMODEL" ], [ 20, 16, 0, 14, 0, "KOLORSMODEL" ] ], "groups": [], "config": {}, "extra": { "ds": { "scale": 1.2100000000000002, "offset": { "0": -183.91309381910426, "1": -202.11110769225016 } } }, "version": 0.4 } 

7. Shut Down the ModelScope GPU Environment

Part 2. LoRA Fine-Tuning

LoRA fine-tuning, short for Low-Rank Adaptation, is an efficient model fine-tuning technique that is particularly well suited to large pretrained models. It keeps most of the pretrained parameters unchanged by introducing low-rank matrices, so that only a small number of parameters are adjusted to adapt the model to a specific task.

1. The Basic Principle of LoRA Fine-Tuning

Low-rank approximation of the parameter matrices: large models are typically over-parameterized; their weight matrices are high-dimensional, yet for a specific task only a small subset of the parameters does most of the work. LoRA applies the idea of low-rank matrix decomposition: it introduces two small matrices A and B (A with shape d×r, B with shape r×d, where r is much smaller than d) to approximate the update to the original weight matrix. The product AB has a rank far lower than that of the original weight matrix, yet it is enough to preserve the model's performance on the target task.

Bypass structure: a bypass branch formed by the product of A and B is added alongside the original network. During training, the original network's parameters are frozen and only the bypass parameters A and B are trained. Because A and B contain far fewer parameters than the original network, the GPU memory required for training is reduced dramatically, as sketched in the code below.
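In code, this amounts to adding a small trainable branch next to a frozen weight matrix. A minimal PyTorch sketch of a LoRA-wrapped linear layer, for illustration only (it is not the actual DiffSynth-Studio/Kolors implementation):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)  # r x d
        self.B = nn.Parameter(torch.zeros(base.out_features, r))        # d x r, zero-initialized so the bypass starts as a no-op
        self.scale = alpha / r  # a common LoRA scaling convention

    def forward(self, x):
        # frozen path plus low-rank bypass: base(x) + scale * (B A) x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Only A and B receive gradients, which is why the memory and compute cost of training stays low.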

2. The Fine-Tuning Code in Task 2

The code is as follows:

import os

# Notes on the arguments (explanations carried over from the original command):
# - train_kolors_lora.py: the Kolors LoRA training script in DiffSynth-Studio
# - --pretrained_unet_path: the UNet weights to fine-tune
# - --pretrained_text_encoder_path: the text encoder
# - --pretrained_fp16_vae_path: the VAE weights
# - --lora_rank 16: rank 16 balances expressiveness against training cost, cutting compute and memory via LoRA without a noticeable drop in quality
# - --lora_alpha 4.0: the LoRA alpha value, which controls the strength of the adaptation
# - --dataset_path: the dataset used for training
# - --output_path: where the trained model is saved
# - --max_epochs 1: train for at most one epoch
# - --center_crop: center-crop images during preprocessing
# - --use_gradient_checkpointing: enable gradient checkpointing to save memory
# - --precision "16-mixed": mixed 16-bit precision, which speeds up training and reduces GPU memory use
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()

os.system(cmd)  # run the Kolors LoRA training
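To get a feel for why rank 16 is cheap, compare the trainable parameters of the low-rank branch with a full weight matrix. A back-of-the-envelope calculation (the 4096x4096 layer size is an illustrative assumption, not the actual Kolors UNet shape):

d, r = 4096, 16
full = d * d        # parameters in one full d x d weight matrix
lora = 2 * d * r    # parameters in A (r x d) plus B (d x r)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%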

Summary

AIGC, as an important branch of artificial intelligence, is gradually changing the way we live and work. As the technology keeps developing and maturing, there is good reason to believe it will show its unique appeal and value in many more fields.
