Debug Stable Diffusion webui

WAP站长网发布于 2025-6-24 16:16 阅读：33 SEO教程

文章目录

SD 前期预备一些惊喜 TorchHijackForUnet Txt2Img 搭配 Lora 使用单独运行 txt2img.py 获取所有资源代码地址参数 sd model 主程序代码地址参数(同上) 模型Inference LORA应用重构并使用LORA模型用Lora重构后的网络做 sampler 后处理 API 运行指南根据自己的需要编写Api api.py models.py Samplers 误区 Euler LMS DDIM DPM solver++ Patch Diffusion

以下内容是最近的学习笔记，如果有不对的地方，还望同志们指出~共勉

SD

前期预备

深入Stable diffusion时，可以不按照官方指导来。官方指导对于AIGC爱好者比较的友好~.

可以选择Anaconda 按照之前AI传统，安装环境（需要从代码包里找到所有需要安装的库，有点麻烦，但是能用），也可以直接运行webui.sh安装虚拟的python环境（个人推荐VENV，非常省事，删掉VENV重新安装，科学上网后耗时只在半小时内）。

安装完环境后，可以设置IDE的python环境，然后debug ‘launch.py’, voila, 你可以开始各种探索了。

一些惊喜

TorchHijackForUnet

CondFunc(‘modules.models.diffusion.ddpm_edit.LatentDiffusion.decode_first_stage’, first_stage_sub, first_stage_cond)

用于dynamicly import modules.

Txt2Img 搭配 Lora 使用

单独运行 txt2img.py

可以自行改写以下(放在txt2img.py的最后)：

if __name__ == "__main__": txt2img(id_task= 'task(lt4vr6pvvx26gfm)', prompt= 'pandas', negative_prompt= 'cats', prompt_styles=[], steps= 20, sampler_index= 0, restore_faces= False, tiling= False, n_iter= 1, #(对应GUI上的Batch count) batch_size= 1, cfg_scale= 7, seed= -1, subseed= -1, subseed_strength= 0, seed_resize_from_h= 0, seed_resize_from_w= 0, seed_enable_extras= False, height= 512, width= 512, enable_hr= False, denoising_strength= 0.7, hr_scale= 2, hr_upscaler= 'Latent', hr_second_pass_steps= 0, hr_resize_x= 0, hr_resize_y= 0, hr_sampler_index= 0, hr_prompt= '', hr_negative_prompt='', override_settings_texts=[], )

获取所有资源

代码地址

modules\txt2img.py

参数

是一个Object processing.StableDiffusionProcessingTxt2Img
参数地址： Line 871 和 Line 105

modules\processing.py

p = processing.StableDiffusionProcessingTxt2Img( sd_model=shared.sd_model, outpath_samples=opts.outdir_samples or opts.outdir_txt2img_samples, outpath_grids=opts.outdir_grids or opts.outdir_txt2img_grids, prompt=prompt, styles=prompt_styles, negative_prompt=negative_prompt, seed=seed, subseed=subseed, subseed_strength=subseed_strength, seed_resize_from_h=seed_resize_from_h, seed_resize_from_w=seed_resize_from_w, seed_enable_extras=seed_enable_extras, sampler_name=sd_samplers.samplers[sampler_index].name, batch_size=batch_size, n_iter=n_iter, steps=steps, cfg_scale=cfg_scale, width=width, height=height, restore_faces=restore_faces, tiling=tiling, enable_hr=enable_hr, denoising_strength=denoising_strength if enable_hr else None, hr_scale=hr_scale, hr_upscaler=hr_upscaler, hr_second_pass_steps=hr_second_pass_steps, hr_resize_x=hr_resize_x, hr_resize_y=hr_resize_y, hr_sampler_name=sd_samplers.samplers_for_img2img[hr_sampler_index - 1].name if hr_sampler_index != 0 else None, hr_prompt=hr_prompt, hr_negative_prompt=hr_negative_prompt, override_settings=override_settings, )

sd model

除了：

unet = p.sd_model.model.diffusion_model text_encoder = p.sd_model.cond_stage_model

sd model还包含：

first_stage_model configure_shareded_model

主程序

process_images() process_images_inner()

代码地址

modules\processing.py

参数(同上)

是一个Object processing.StableDiffusionProcessingTxt2Img
参数地址： Line 871 和 Line 105

modules\processing.py

模型Inference

LORA应用

地址：Line 186

extensions\sd-webui-additional-networks\scripts\additional_networks.py

Line 236 - Line 251:

加载Lora模型到latest_networks 这个list中通过 du_state_dict = load_file(model_path)导入LORA模型每层权重（类似于一个ZIP里包含每一个layer的权重文件，然后通过文件名的index对应到每一层的名字上（单独一个name list）） network, info = lora_compvis.create_network_and_apply_compvis(du_state_dict, weight_tenc, weight_unet, text_encoder, unet) 将权重和LORA模型进行结合：

dimension: {96}, alpha: {48.0}, multiplier_unet: 1, multiplier_tenc: 1 create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. original forward/weights is backed up. enable LoRA for text encoder enable LoRA for U-Net shapes for 0 weights are converted.

重构并使用LORA模型

地址：

extensions\sd-webui-additional-networks\scripts\lora_compvis.py

靠 model里是否包含"MultiheadAttention"判断LORA的版本。有：V2，没有：V1

modules 里的dim 和 alpha 都是传入的一个list，可以潜在不唯一（如果满足网络结构）

comp_vis_loras_dim_alpha[comp_vis_lora_name] = (dim, alpha)

把SD与lora结合就需要根据几点去筛选出Unet和CLIP里有用的那几层，条件：

选择：Linear or Conv2d相关的不选 _resblocks_23_, 即StabilityAi Text Encoder最后一个block 不选不在comp_vis_loras_dim_alpha这个dict的keys里的选择：MultiheadAttention相关的在这个范围内的每一层，都会通过加入 [“q_proj”, “k_proj”, “v_proj”, “out_proj”] 这些suffix 来构建关于 Q，K， V， O 的重复项不选 _resblocks_23_, 即StabilityAi Text Encoder最后一个block 不选不在comp_vis_loras_dim_alpha这个dict的keys里的

以上条件都满足的会通过：

lora = LoRAModule(lora_name, child_module, multiplier, dim, alpha)

重构针对Lora使用的Unet 或者 CLIP的每一层/block，形成新的Unet & CLIP，也因此，我们的SD model 融入好了我们自己的LORA

用Lora重构后的网络做 sampler

地址：

modules\processing.py

通过 def sample() 来运行：

组装sampler 设定latent scale mode create_random_tensors noise的shape =(4, 64, 64) noise 被 slerp(subseed_strength, noise, subnoise)做interpolation的平滑优化

samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x)

后处理

地址：

modules\processing.py

通过以下代码得到最终模型输出得结果：

with devices.without_autocast() if devices.unet_needs_upcast else devices.autocast(): samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)

从Line 741 开始进入后处理得阶段（根据用户需要进行处理），以下为一个例子：
改颜色范围以及调换shape(从（3, 512, 512）==> (512, 512, 3)).

x_sample = 255. * np.moveaxis(x_sample.cpu().numpy(), 0, 2)

API

运行指南

修改webui-user.bat

@echo off set PYTHON= set GIT= set VENV_DIR= set COMMANDLINE_ARGS=--api call webui.bat

打开powershell, 运行stablediffusion

webui-user.bat

webui 会被locate 在： http://127.0.0.1:7860

假设你要运行txt2img 并且需要额外辅助

def run(): import json import requests import io import base64 import cv2 from PIL import Image, PngImagePlugin img_path = r'C:\Users\Me\Desktop\tryout.jpg' img = cv2.imread(img_path) retval, bytes = cv2.imencode('.png', img) encoded_image = base64.b64encode(bytes).decode('utf-8') url = "http://127.0.0.1:7860" height, width = 800, 800 payload = { "prompt": "puppy dog", "sampler_name":'DPM++ 2S a Karras', "steps": 100, "width": width, "height":height, "hr_scale":2, "hr_upscaler":'Latent', "alwayson_scripts": { "ControlNet": { "args": [ { "enabled": True, "input_image": encoded_image, "module": "inpaint_only", "model": "control_v11p_sd15_inpaint [ebff9138]", "resize_mode":"Resize and Fill", "control_mode": "ControlNet is more important", }, ] } } } response = requests.post(url=f'{url}/sdapi/v1/txt2img', json=payload) r = response.json() result = r['images'][0] image = Image.open(io.BytesIO(base64.b64decode(result.split(",", 1)[0]))) image.save('output.png') if __name__ == "__main__": run()

根据自己的需要编写Api

api.py

有2个东西需要复制用作你自己的api: 假设你有个api叫做：xxx

添加route

self.add_api_route("/sdapi/v1/xxx", self.xxxapi, methods=["POST"], response_model=models.xxxResponse)

写自己的api

def xxxapi(self, xxxreq: models.StableDiffusionXXXProcessingAPI): ...

models.py

有1个东西需要复制用作你自己的api:

input Object
假设这里你还是需要做txt2img, 那么保留，但是由于你有自己的API，那么名字是不能重复的，所有要用 StableDiffusionProcessingXXX

StableDiffusionXXXProcessingAPI = PydanticModelGenerator( "StableDiffusionProcessingXXX", StableDiffusionProcessingTxt2Img, [ {"key": "sampler_index", "type": str, "default": "Euler"}, {"key": "script_name", "type": str, "default": None}, {"key": "script_args", "type": list, "default": []}, {"key": "send_images", "type": bool, "default": True}, {"key": "save_images", "type": bool, "default": False}, {"key": "alwayson_scripts", "type": dict, "default": {}}, ] ).generate_model()

Samplers

误区

因为要和制造一个好的图联系在一起，这里的sampler 比较容易让我和Adam 等optimisation混在一起.

如果有接触 Monte Carlo，这个也比较迷惑我。

sampler是用来求一个解的方式。例如穷举，通过得到整个解的空间，找到最终解（如果solution space小），类似的还有 Monte Carlo, 当你solution space巨大，那么就根据已知的概率分布，随机方式采样（sampling），获得近似解。

但是， Euler sampler实际上做的是估计一个函数本身，而不是估计一个函数的解

Euler sampler可以当作是，已知一个Function A的Differential equation，记为B, 通过对这Differential equation 求 N个解，估计出Function A。或者说已知 10个点，但是Euler Sampler可以得到一个完整的曲线，并且确保这个10个点也属于这个完整曲线的解也就是说，我一定已经有了一个Differential Equation。从一个初始值开始，通过微分方程逐步推进解，从而得到整个解的轨迹或曲线。 DDPM这个内容里，Difussion是一个扩散的过程这样的”扩散“ 更应当往动力学上一个小球的滚动路径上想象。因为Diffusion的机制是依赖feedforward 和 backforward，这里运用到了概率上 bayesian的思想，DDPM的做法也使得扩散（diffusion）是随机的。说扩散随机，主要源自于在推导每步高斯分布的noise基础上，额外加了random noise。但实际上整个DDPM过程求解，只是用一个function更好的更高端的推测Difussion扩散的过程。
因为是随机的，这样的动力function更可以用一个SDE（随机的 Differential Equation，不是Diffusion）来重新归纳。

https://learn.rundiffusion.com/sampling-methods/
https://github.com/crowsonkb/k-diffusion

Euler

Euler’s method is used to approximate solutions to ordinary differential equations
https://tutorial.math.lamar.edu/classes/de/eulersmethod.aspx
这属于老套路的解法，除此之外还有基于Exponential intergrators的。这个有在DPM-Solver ++ 介绍。

LMS

DDIM

对DDPM做了优化，去掉了“ramdom noise”,让stochastic Diffusion变成 Deterministic，它可以重新用 ODE（Ordinary　Differential　Equation）做归纳。

DPM solver++

Diffusion Probabilistic model

之前的DPM是针对noise去预测，这个版本的DPM＋＋是针对image x 去

Patch Diffusion

https://github.com/ericl122333/PatchDiffusion-Pytorch

codediffusionapipromptstablediffusionwebwebuiscripturlcreateidejsonstoreclippnggitpythonclipromptssharegithubappcontrolneterplmsatscpupandasshellpytorchstable diffusiondebugnistpowershelltpusat动力学学习笔记guinumpygenerator高斯分布

Debug Stable Diffusion webui

文章目录

SD

前期预备

一些惊喜

TorchHijackForUnet

Txt2Img 搭配 Lora 使用

单独运行 txt2img.py

获取所有资源

代码地址

参数

sd model

主程序

代码地址

参数(同上)

模型Inference

LORA应用

重构并使用LORA模型

用Lora重构后的网络 做 sampler

后处理

API

运行指南

根据自己的需要编写Api

api.py

models.py

Samplers

误区

Euler

LMS

DDIM

DPM solver++

Patch Diffusion

用Lora重构后的网络做 sampler