Code structure analysis of stable diffusion webui

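Reference: "stable-diffusion-webui source code analysis (1): Gradio" (Zhihu, by 罗培羽), https://zhuanlan.zhihu.com/p/617742414. AUTOMATIC1111's webui is a recently popular stable-diffusion application. It bundles the commonly used stable-diffusion features and supports techniques such as ControlNet and LoRA through extensions.

The core goal here is to extract interfaces. stable-diffusion-webui is very widely used and embodies a lot of accumulated experience, so it can probably be abstracted into reasonably good interfaces. The interfaces still have to be read against the code that webui originally embeds, though; extracting them directly from inside webui is fairly painful.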

cmd_args.py:

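Each entry is listed as name: default, description. The first few flags were recorded without a description.

update-all-extensions:
skip-python-version-check:
skip-torch-cuda-test:
reinstall-torch:
update-check:
tests:
no-tests:
skip-install:
skip-version-check: False, do not check the versions of torch and xformers.
data-dir: directory where user data is stored.
config: configs/stable-diffusion/v1-inference.yaml, path to the config used to construct the model.
ckpt: model.ckpt, path to the Stable Diffusion checkpoint. If specified, this checkpoint is added to the checkpoint list and loaded.
ckpt-dir: None, directory containing Stable Diffusion checkpoints.
no-download-sd-model: False, do not download the SD1.5 model automatically even if no model is found.
vae-dir: None, directory containing VAE files.
gfpgan-dir: GFPGAN directory.
gfpgan-model: GFPGAN model file name.
codeformer-models-path: directory containing Codeformer model files.
gfpgan-models-path: directory containing GFPGAN model files.
esrgan-models-path: directory containing ESRGAN model files.
bsrgan-models-path: directory containing BSRGAN model files.
realesrgan-models-path: directory containing RealESRGAN model files.
clip-models-path: None, directory containing CLIP model files.
embeddings-dir: embeddings/, directory for Textual Inversion embeddings (default: embeddings).
textual-inversion-templates-dir: textual_inversion_templates, directory containing Textual Inversion templates.
hypernetwork-dir: models/hypernetworks/, Hypernetwork directory.
localizations-dir: localizations/, localization (translation) directory.
ui-config-file: ui-config.json, file name of the UI config.
no-progressbar-hiding: False, do not hide the progress bar in the Gradio UI (it is hidden by default because it slows down machine learning when hardware acceleration is enabled in the browser).
max-batch-count: 16, maximum batch count value in the UI.
ui-settings-file: config.json, file name of the UI settings.
allow-code: False, allow custom scripts to be run from the WebUI.
share: False, generate a Gradio URL after startup so that the WebUI can be reached from external networks.
listen: False, launch Gradio with 0.0.0.0 as the server name so that it can respond to network requests.
port: 7860, launch Gradio on the given port. Ports below 1024 require root privileges. Defaults to 7860 if available.
hide-ui-dir-config: False, hide the directory configuration from the WebUI.
freeze-settings: False, disable editing of settings.
enable-insecure-extension-access: False, enable the extensions tab regardless of other options.
gradio-debug: False, launch Gradio with the --debug option.
gradio-auth: None, set Gradio authentication, e.g. "username:password", or comma-separated as "u1:p1,u2:p2,u3:p3".
gradio-auth-path: None, path to a Gradio authentication file, e.g. "/路径/", using the same format as `--gradio-auth`.
disable-console-progressbars: False, do not print progress bars to the console.
enable-console-prompts: False, print prompts to the console when generating with txt2img and img2img.
api: False, launch the WebUI with the API enabled.
api-auth: None, set API authentication, e.g. "username:password", or comma-separated as "u1:p1,u2:p2,u3:p3".
api-log: False, enable logging of all API requests.
nowebui: False, launch only the API, without the WebUI.
ui-debug-mode: False, do not load the model, so the WebUI starts faster.
device-id: None, select the CUDA device to use (for example, by setting export CUDA_VISIBLE_DEVICES=0 or 1 in the launch script).
administrator: False, use administrator privileges.
cors-allow-origins: None, allowed CORS origins as a comma-separated list, without spaces.
cors-allow-origins-regex: None, allowed CORS origins as a single regular expression.
tls-keyfile: None, partially enables TLS; requires --tls-certfile to work properly.
tls-certfile: None, partially enables TLS; requires --tls-keyfile to work properly.
server-name: None, set the server host name.
gradio-queue: False, use the Gradio queue. Experimental; breaks the restart button.
no-hashing: False, disable SHA-256 hashing of checkpoints to speed up loading.
xformers: False, enable xformers for the cross attention layers.
force-enable-xformers: False, force-enable xformers for the cross attention layers.
reinstall-xformers: False, force a reinstall of xformers. Useful when upgrading, but remove it afterwards to avoid reinstalling on every launch.
xformers-flash-attention: False, enable Flash Attention in xformers for better reproducibility (only supported for SD2.x or models based on it).
opt-split-attention: False, force-enable Doggettx's cross-attention layer optimization. Enabled by default on systems with CUDA.
opt-split-attention-invokeai: False, force-enable InvokeAI's cross-attention layer optimization. Enabled by default on systems without CUDA.
opt-split-attention-v1: False, enable the older split attention optimization, which does not consume all available VRAM.
opt-sub-quad-attention: False, enable the memory-efficient sub-quadratic cross-attention layer optimization.
sub-quad-q-chunk-size: 1024, query chunk size used by the sub-quadratic cross-attention layer optimization.
sub-quad-kv-chunk-size: None, kv chunk size used by the sub-quadratic cross-attention layer optimization.
sub-quad-chunk-threshold: None, VRAM threshold above which the sub-quadratic cross-attention layer optimization uses chunking.
opt-channelslast: False, enable the alternative layout for 4d tensors. May speed up inference, only on Nvidia cards with Tensor cores (16xx series and higher).
disable-opt-split-attention: False, force-disable the cross-attention layer optimization.
disable-nan-check: False, do not check whether generated images/latent spaces contain NaNs. Useful when running in CI without a checkpoint.
use-cpu: None, use the CPU as the PyTorch device for the specified modules.
no-half: False, do not convert the model to half-precision floats.
precision: autocast, evaluate at this precision.
no-half-vae: False, do not convert the VAE model to half-precision floats.
upcast-sampling: False, upcast sampling. Has no effect together with --no-half. Results are close to those with --no-half, with better performance and lower memory use.
medvram: False, enable Stable Diffusion model optimizations that trade some speed for lower VRAM usage.
lowvram: False, enable Stable Diffusion model optimizations that trade a lot of speed for even lower VRAM usage.
lowram: False, load the Stable Diffusion checkpoint weights into VRAM instead of RAM.
always-batch-cond-uncond: False, disable the cond/uncond batching behavior that --medvram or --lowvram use to save memory.
--autolaunch: False, open the system's default browser automatically after the WebUI starts.
--theme: Unset, launch the WebUI with the given theme (light or dark); if unset, the browser's default theme is used.
--use-textbox-seed: False, use a textbox for the seed field in the WebUI (no up/down arrows, but long seeds can be entered).
--disable-safe-unpickle: False, do not check PyTorch models for malicious code.
--ngrok: None, ngrok auth token; an alternative to --share.
--ngrok-region: us, region in which ngrok is started.

Options that no longer have any effect:

--show-negative-prompt: False, no effect.
--deepdanbooru: False, no effect.
--unload-gfpgan: False, no effect.
--gradio-img2img-tool: None, no effect.
--gradio-inpaint-tool: None, no effect.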
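As a rough illustration of how options like these are declared and read, the sketch below uses Python's standard argparse with a handful of the flags and the defaults listed above. It is not the contents of cmd_args.py. The helper `parse_auth` is a hypothetical name introduced here, and it only assumes the "u1:p1,u2:p2" format described for gradio-auth.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a few of the flags listed above, with the defaults given there."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--listen", action="store_true")
    parser.add_argument("--api", action="store_true")
    parser.add_argument("--precision", choices=["full", "autocast"], default="autocast")
    parser.add_argument("--gradio-auth", type=str, default=None)
    return parser


def parse_auth(value):
    """Hypothetical helper: split "u1:p1,u2:p2" into (user, password) pairs."""
    if not value:
        return []
    pairs = []
    for item in value.split(","):
        user, _, password = item.strip().partition(":")
        pairs.append((user, password))
    return pairs


# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = build_parser().parse_args(["--listen", "--gradio-auth", "u1:p1,u2:p2"])
creds = parse_auth(args.gradio_auth)
```

Flags declared with `store_true` default to False and flip to True when passed, which matches the many False defaults in the list.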

Code directories:

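localizations: UI translations for the various languages; zh_CN.json is the Chinese one.
extensions, extensions-builtin: both directories hold additional extensions.
modules:
ui.py: the Gradio pages; this is where the modules interfaces are called.
shared.py: input variables.
paths: dependencies on libraries and files such as ldm, including the model paths.
extensions.py: corresponds to the extensions directory.
sd_samplers.py: wraps the create_sampler method (ddim and so on).
sd_samplers_kdiffusion.py: wraps many sampling methods.
sd_samplers_compvis.py: same as above; wraps sampling methods such as DDIM and PLMS.
scripts.py: the functions that control the whole flow.
script_callbacks.py: wraps the stages of the various runners as hooks; extensions can supply these.
api/api.py: wraps the API with FastAPI.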
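The hook idea behind the callbacks module can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the webui implementation: the names `on_event` and `fire` and the event name "model_loaded" are hypothetical.

```python
from collections import defaultdict

# Hypothetical registry: event name -> list of callbacks, in registration order.
_callbacks = defaultdict(list)


def on_event(name, func):
    """Register func to run whenever the event `name` fires."""
    _callbacks[name].append(func)


def fire(name, *args):
    """Call every callback registered for `name`; a failing hook does not stop the rest."""
    results = []
    for func in _callbacks[name]:
        try:
            results.append(func(*args))
        except Exception as exc:  # keep the core flow alive if an extension misbehaves
            results.append(exc)
    return results


# An "extension" registers a hook; the core code fires it at the matching stage.
loaded = []
on_event("model_loaded", lambda model: loaded.append(model))
fire("model_loaded", "sd_model")
```

The core flow only needs to know the event names, so extensions can attach behavior without the core importing them.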

// webui
modules.ui ->
modules.txt2img ->
modules.processing: StableDiffusionProcessingTxt2Img (a subclass that inherits the sampler-related methods) -> StableDiffusionProcessing -> sample ->
sampler = sd_samplers.create_sampler(sampler_name, sd_model) ->

// api txt2img
modules.api.api.py ->
text2imgapi ->
p = StableDiffusionProcessingTxt2Img(sd_model, *args), initialization -> StableDiffusionProcessing
process_images(p) ->
res = process_images_inner(p) ->
prompts / negative_prompts (webui introduces negative prompts) ->
uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts) ->
conds = model.get_learned_conditioning(texts) ->
c = get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts) ->
samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompt=prompts) ->
    # sampler = sd_samplers.create_sampler() ->
    # x = create_random_tensor() ->
    # samples = sampler.sample(x, conditioning, unconditional_conditioning, image_conditioning=txt2img_image_conditioning(x)) ->
x_samples_ddim = [decode_first_stage(p.sd_model, samples_ddim[i:i+1])] ->
x_samples_ddim = torch.stack(x_samples_ddim).float() ->
x_samples_ddim = torch.clamp((x_samples_ddim + 1.0) / 2, min=0, max=1)
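The trace calls get_conds_with_caching once for the negative prompts and once for the positive prompts. The sketch below shows the caching idea in isolation. Only the function name comes from the trace; the signature, the two-slot cache, and `fake_encode` are simplifying assumptions made for illustration and do not reproduce the webui code.

```python
def get_conds_with_caching(encode, prompts, steps, cache):
    """Return encode(prompts, steps), reusing the last result when inputs are unchanged.

    `cache` is a two-slot list: [key, value].
    """
    key = (tuple(prompts), steps)
    if cache[0] == key:
        return cache[1]
    cache[1] = encode(prompts, steps)
    cache[0] = key
    return cache[1]


calls = []


def fake_encode(prompts, steps):
    """Stand-in for the text encoder; records each real invocation."""
    calls.append(tuple(prompts))
    return [len(p) for p in prompts]


cached_uc = [None, None]
uc_first = get_conds_with_caching(fake_encode, ["blurry"], 20, cached_uc)
uc_second = get_conds_with_caching(fake_encode, ["blurry"], 20, cached_uc)  # cache hit
uc_third = get_conds_with_caching(fake_encode, ["blurry, lowres"], 20, cached_uc)  # recomputed
```

Keeping separate caches for `uc` and `c` means that changing only the positive prompt between generations leaves the negative conditioning untouched.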

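Later I will refactor the original stable diffusion webui code according to my own understanding. The webui code is not updated very frequently, so a refactor is still worthwhile.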

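launch.py: checks and installs the environment and resources, and pulls files automatically with git. This part matters little to us because our platform cannot reach the external network.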

webui.py:

modules/paths.py: adds the repositories directory to sys.path
modules/paths_internal.py: loads the locations of the config files and other directories
modules/shared.py -> modules/cmd_args.py -> ckpt:

webui.py ->
initialize()  # the core is initializing the weights ->
- check_version() ->
- extensions.list_extensions() ->
- modules.sd_models.setup_model() ->
-- modules.sd_models.list_models() ->
- modules.scripts.load_scripts() ->
- modules.sd_vae.refresh_vae_list() ->
- modules.sd_models.load_model() ->
-- get_checkpoint_state_dict() ->
-- read_state_dict(checkpoint.filename) ->
-- pl_sd = torch.load() -> get_state_dict_from_checkpoint(pl_sd) ->
-- sd = get_state_dict_from_checkpoint() ->
-- checkpoint_config = sd_models_config.find_checkpoint(state_dict, checkpoint_info)  # defaults to v1-inference.yaml ->
-- sd_config = OmegaConf.load(checkpoint_config) ->
-- load_model_weights() ->
-- sd_hijack.model_hijack.hijack(sd_model) ->
-- sd_model.eval() ->
-- script_callbacks.model_loaded_callback(sd_model) ->
- shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights())) ->
- shared.opts.onchange("sd_vae", wrap_queued_call(lambda: modules.sd_vae.reload_vae_weights()), call=False) ->
- shared.opts.onchange("sd_vae_as_default", wrap_queued_call(lambda: modules.sd_vae.reload_vae_weights()), call=False) ->

### Start FastAPI
app = FastAPI() ->
api = create_api(app) ->
- modules.api.Api ->
- add_api_route("/sdapi/v1/txt2img", self.text2imgapi, methods=['POST'], response_model=TextToImageResponse) ->
- modules.script_callbacks.app_started_callback(None, app)
modules.api.Api ->
text2imgapi(txt2imgreq: StableDiffusionTxt2ImgProcessingAPI) ->
modules.scripts.load_scripts() ->

### webui
modules.script_callbacks.before_ui_callback() ->
modules.ui.create_ui() ->
- modules.scripts.scripts_current = modules.scripts.scripts_txt2img ->
- modules.scripts.scripts_txt2img.initialize_scripts(is_img2img=False) ->
-- auto_processing_scripts = modules.scripts_auto_postprocessing.create_auto_preprocessing_script_data()  # preprocessing scripts ->
-- scripts_data: the .py files under scripts, inheriting from Scripts ->
- txt2img_prompt, txt2img_prompt_styles, txt2img_negative_prompt, submit, _, _, txt2img_prompt_style_apply, txt2img_save_style, txt2img_paste, extra_networks_button, token_counter, token_button, negative_token_counter, negative_token_button = create_toprow() ->
- ...
- txt2img_args = dict(fn=wrap_gradio_gpu_call(modules.txt2img.txt2img)) ->
- submit
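The three shared.opts.onchange(...) registrations in the trace follow a listener pattern: a setting change triggers a reload. Here is a minimal, self-contained sketch of that pattern. The `Options` class below is a stand-in written for this note, not the webui class, and its `set` method is an assumption about how changes get applied.

```python
class Options:
    """Tiny stand-in for a settings object with change listeners (not webui's class)."""

    def __init__(self, **values):
        self._values = dict(values)
        self._listeners = {}

    def onchange(self, key, func, call=True):
        """Register func for key; optionally run it once right away."""
        self._listeners.setdefault(key, []).append(func)
        if call:
            func()

    def set(self, key, value):
        """Store value and notify listeners only when it actually changed."""
        if self._values.get(key) == value:
            return False
        self._values[key] = value
        for func in self._listeners.get(key, []):
            func()
        return True


events = []
opts = Options(sd_model_checkpoint="model.ckpt", sd_vae="auto")
opts.onchange("sd_model_checkpoint", lambda: events.append("reload model"))
opts.onchange("sd_vae", lambda: events.append("reload vae"), call=False)
opts.set("sd_vae", "vae.pt")
opts.set("sd_vae", "vae.pt")  # unchanged, so no second reload
```

With `call=False` the listener is only registered, which avoids reloading the VAE a second time during startup.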

Model loading time: Model loaded in 3271.0s (calculate hash: 60.8s, load weights from disk: 773.7s, find config: 1294.1s, load config: 4.2s, create model: 18.4s, apply weights to model: 383.6s, apply half(): 172.5s, apply dtype to VAE: 0.2s, load VAE: 1.1s, move model to device: 91.3s, hijack: 416.4s, load textual inversion embeddings: 23.5s, scripts callbacks: 31.3s).
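To see where the time goes, the log line can be split into stages and ranked. The sketch below parses exactly the line quoted above; `parse_load_log` is a helper name made up for this note, and the parsing assumes the "stage: seconds" layout seen in that line.

```python
import re

LOG = (
    "Model loaded in 3271.0s (calculate hash: 60.8s, load weights from disk: 773.7s, "
    "find config: 1294.1s, load config: 4.2s, create model: 18.4s, "
    "apply weights to model: 383.6s, apply half(): 172.5s, apply dtype to VAE: 0.2s, "
    "load VAE: 1.1s, move model to device: 91.3s, hijack: 416.4s, "
    "load textual inversion embeddings: 23.5s, scripts callbacks: 31.3s)."
)


def parse_load_log(line):
    """Return (total_seconds, {stage: seconds}) from a 'Model loaded in ...' line."""
    total = float(re.search(r"Model loaded in ([\d.]+)s", line).group(1))
    # Take everything between the first "(" and the last ")" so "apply half()" survives.
    inner = line[line.index("(") + 1 : line.rindex(")")]
    stages = {}
    for part in inner.split(", "):
        name, _, seconds = part.rpartition(": ")
        stages[name] = float(seconds.rstrip("s"))
    return total, stages


total, stages = parse_load_log(LOG)
slowest = sorted(stages.items(), key=lambda kv: kv[1], reverse=True)[:3]
for name, seconds in slowest:
    print(f"{name}: {seconds:.1f}s ({seconds / total:.0%} of total)")
```

For this log the three slowest stages are find config, load weights from disk, and hijack, which together account for most of the 3271.0s.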

Installation:

1. Comment out the following line in modules/safe.py:

TypedStorage = torch.storage.TypedStorage if hasattr(torch.storage, 'TypedStorage') else torch.storage._TypedStorage

and change the default of --disable-safe-unpickle to True:

parser.add_argument("--disable-safe-unpickle", action='store_true', help="disable checking pytorch models for malicious code", default=True)

2. Install the following packages:

websockets-10.0 gradio-3.23.0-py3-none-any.whl altair-4.2.0-py3-none-any.whl anyio-2.2.0 piexif fonts font-roboto safetensors lark git blendmodes jsonmerge torchdiffeq clean_fid-0.1.35-py3-none-any.whl resize_right-0.0.2-py3-none-any.whl torchsde-0.2.5-py3-none-any.whl facexlib-0.2.5-py3-none-any.whl basicsr-1.4.2.tar.gz gfpgan-1.3.8-py3-none-any.whl markdown_it_py-2.2.0-py3-none-any.whl huggingface_hub-0.13.0-py3-none-any.whl Markdown-3.4.1-py3-none-any.whl importlib_metadata-4.4.0-py3-none-any.whl mdit_py_plugins-0.3.3-py3-none-any.whl realesrgan-0.3.0-py3-none-any.whl uvicorn-0.18.0-py3-none-any.whl
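On a machine without internet access it is easy to miss one of these wheels. A quick check with the standard library's importlib.metadata can confirm what is actually installed. This is a sketch under my own assumptions: the `REQUIRED` table copies only a few pins from the list above, and `check_requirements` is a hypothetical helper name.

```python
from importlib import metadata

# A few pins taken from the list above; extend as needed.
REQUIRED = {
    "gradio": "3.23.0",
    "altair": "4.2.0",
    "uvicorn": "0.18.0",
    "huggingface_hub": "0.13.0",
}


def check_requirements(required):
    """Return {package: (wanted, found)} for packages that are missing or at another version."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


for name, (wanted, found) in check_requirements(REQUIRED).items():
    print(f"{name}: want {wanted}, found {found or 'not installed'}")
```

An empty result means every listed package is present at the pinned version.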

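3. The P40 does not support autocast

No other change is needed for the option itself; in cmd_args, set the default precision to full: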

parser.add_argument("--precision", type=str, help="evaluate at this precision", choices=["full", "autocast"], default="full")

In processing, comment out `with devices.autocast()` at lines 586, 593, 627, and 652.

In ldm/modules/diffusionmodules/util.py, at line 126, comment out the autocast entries:

ctx.gpu_autocast_kwargs = {"enabled": torch.is_autocast_enabled(),
                           # "dtype": torch.get_autocast_gpu_dtype(),
                           # 'dtype':torch.cuda.amp(),
                           # "cache_enabled": torch.is_autocast_cache_enabled()
                           }
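Instead of commenting out every `with devices.autocast()` by hand, one option is a single helper that returns a no-op context when full precision is requested. The sketch below shows the idea without needing a GPU. It is my own simplification: the function signature, `make_autocast`, and `fake_autocast` are assumptions, not the webui code.

```python
import contextlib


def autocast(precision, dtype_is_fp32, make_autocast):
    """Return a context manager: a no-op for full precision, real autocast otherwise.

    make_autocast is injected so the sketch runs without a GPU; in practice it
    might be something like `lambda: torch.autocast("cuda")`.
    """
    if precision == "full" or dtype_is_fp32:
        return contextlib.nullcontext()
    return make_autocast()


entered = []


@contextlib.contextmanager
def fake_autocast():
    """Stand-in that only records that mixed precision would have been enabled."""
    entered.append("autocast")
    yield


with autocast("full", False, fake_autocast):
    pass  # runs in full precision; fake_autocast is never entered

with autocast("autocast", False, fake_autocast):
    pass  # mixed precision path
```

With this shape, the call sites stay unchanged and the precision setting alone decides whether autocast is active.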

4. In modules/sd_hijack_clip.py, at line 247:

tokens = torch.from_numpy(np.array(remade_batch_tokens)).to(devices.device)

At line 260:

batch_multipliers = torch.from_numpy(np.array(batch_multipliers)).to(devices.device)

5. RuntimeError: expected scalar type Half but found Float

I also ran into this error with CompVis 1.5. The fix is to set these defaults directly in cmd_args:

parser.add_argument("--no-half", action='store_true', help="do not switch the model to 16-bit floats",default=True)
parser.add_argument("--no-half-vae", action='store_true', help="do not switch the VAE model to 16-bit floats",default=True)
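One side effect of combining `action='store_true'` with `default=True` is that the option is True whether or not the flag is passed, so it can no longer be switched off from the command line. The sketch below demonstrates this with plain argparse and shows one possible alternative, `argparse.BooleanOptionalAction` (Python 3.9+). The option name `--full-precision` is a made-up example, not a webui flag.

```python
import argparse

# Patched declaration as in the text: store_true combined with default=True.
patched = argparse.ArgumentParser()
patched.add_argument("--no-half", action="store_true", default=True)

# The value is True with or without the flag, so it cannot be turned off anymore.
without_flag = patched.parse_args([]).no_half
with_flag = patched.parse_args(["--no-half"]).no_half

# One alternative (Python 3.9+): BooleanOptionalAction adds a matching --no-<flag> form.
# "--full-precision" is a hypothetical option name used only for this illustration.
flexible = argparse.ArgumentParser()
flexible.add_argument("--full-precision", action=argparse.BooleanOptionalAction, default=True)
default_on = flexible.parse_args([]).full_precision
switched_off = flexible.parse_args(["--no-full-precision"]).full_precision
```

For a machine that always needs full precision, the patched default is a reasonable shortcut; the alternative only matters if the same code must also run on hardware where half precision works.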

6. AttributeError: module 'torch.cuda' has no attribute 'mem_get_info'

In modules/sd_hijack_optimizations.py, at line 117, I simply commented out the whole block that determines steps.

Alternatively, return only: return psutil.virtual_memory().available
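A slightly more defensive variant of this workaround is to use `mem_get_info` when the installed torch provides it and fall back otherwise. The sketch below injects the `cuda` object and the fallback so it runs without torch or psutil; `free_memory_bytes` and both fake objects are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def free_memory_bytes(cuda, system_available):
    """Return free memory in bytes, preferring cuda.mem_get_info() when it exists.

    cuda: an object shaped like torch.cuda (injected so the sketch runs anywhere).
    system_available: zero-argument callable used as the fallback, e.g.
        lambda: psutil.virtual_memory().available
    """
    mem_get_info = getattr(cuda, "mem_get_info", None)
    if mem_get_info is None:
        return system_available()
    free, _total = mem_get_info()  # mem_get_info returns (free, total) in bytes
    return free


# Fake stand-ins for an old and a new torch.cuda module.
old_cuda = SimpleNamespace()
new_cuda = SimpleNamespace(mem_get_info=lambda: (6 * 1024**3, 24 * 1024**3))

fallback = free_memory_bytes(old_cuda, lambda: 8 * 1024**3)
direct = free_memory_bytes(new_cuda, lambda: 8 * 1024**3)
```

Note that `psutil.virtual_memory().available` reports free system RAM rather than VRAM, so the fallback is only a rough stand-in for the GPU figure.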

7. RuntimeError: expected scalar type Double but found Float

In extensions-builtin/Lora/lora.py, at line 308:

return torch.nn.Linear_forward_before_lora(self, input.float())
