Stable Diffusion - Upgrading to SD v1.6+ Breaks the BLIP Interrogate CLIP (CLIP Interrogation) Feature with a RuntimeError

Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/132994678

The image comes from the 麦橘写实_MajicMIX_Realistic_v6 model.

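After upgrading to SD v1.6, the CLIP interrogation (Interrogate CLIP) feature no longer works.

Reference: Image Interrogation (Interrogate) Prompt Algorithms (BLIP and DeepBooru)

Error log: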

# ...
  File "stable_diffusion_webui/repositories/BLIP/models/med.py", line 277, in forward
    self_outputs = self.self(
  File "stable_diffusion_webui/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "stable_diffusion_webui/repositories/BLIP/models/med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (2) must match the size of tensor b (4) at non-singleton dimension 0
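The RuntimeError in this log is a broadcasting failure: torch.matmul broadcasts the leading (batch) dimensions of its two operands, and sizes 2 and 4 cannot be broadcast together. The following is a minimal pure-Python sketch of that compatibility rule, not code from BLIP or PyTorch. The function name batch_broadcast_mismatch and the concrete shapes are assumptions, chosen only so that dimension 0 has sizes 2 and 4 as in the log.

```python
def batch_broadcast_mismatch(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first batch dimension that cannot
    be broadcast, or None if all batch dimensions are compatible.

    Batch dimensions are everything except the last two (the matrix dims).
    Two sizes are compatible when they are equal or one of them is 1.
    """
    batch_a, batch_b = list(shape_a[:-2]), list(shape_b[:-2])
    # Left-pad the shorter shape with 1s so both have the same rank.
    rank = max(len(batch_a), len(batch_b))
    batch_a = [1] * (rank - len(batch_a)) + batch_a
    batch_b = [1] * (rank - len(batch_b)) + batch_b
    for dim, (a, b) in enumerate(zip(batch_a, batch_b)):
        if a != b and a != 1 and b != 1:
            return dim, a, b
    return None


# Hypothetical attention shapes: (batch, heads, seq_len, head_dim).
query_shape = (2, 12, 5, 64)
key_t_shape = (4, 12, 64, 577)  # shape after key_layer.transpose(-1, -2)
print(batch_broadcast_mismatch(query_shape, key_t_shape))
```

A batch size of 1 on either side would broadcast without error, which is why the message refers to a "non-singleton" dimension.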

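Solution: SD's CLIP interrogation feature calls GitHub - salesforce/BLIP. That project was last updated in September 2022 and is built against an older Transformers release; at present it only works with transformers 4.26.1, that is: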

pip install transformers==4.26.1
pip install tokenizers==0.11.1
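To confirm which versions are actually installed in the WebUI virtual environment, a small check such as the one below can help. It is a sketch using only the standard library's importlib.metadata; the REQUIRED mapping and the check_pins helper are illustrative names, not part of the WebUI.

```python
from importlib import metadata

# Versions taken from the pip commands above.
REQUIRED = {"transformers": "4.26.1", "tokenizers": "0.11.1"}


def check_pins(required, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package whose installed
    version differs from the wanted one; found is None if not installed."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


for name, (wanted, found) in check_pins(REQUIRED).items():
    print(f"{name}: expected {wanted}, found {found}")
```

Run it with the Python interpreter inside the venv directory so that it inspects the same packages the WebUI uses.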

However, SD v1.6 recommends upgrading transformers to 4.30.2, which causes the conflict. See requirements.txt and requirements_versions.txt:

transformers==4.30.2

So change the pin to transformers==4.26.1 and the feature works again. BLIP is currently unmaintained, so the transformers version that BLIP requires has to take priority.
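Editing the pin by hand is enough, but the change can also be scripted. The sketch below rewrites the transformers line in the text of a requirements file. The helper name pin_requirement is hypothetical, and the file path in the usage comment assumes the script runs from the WebUI root directory.

```python
import re


def pin_requirement(text, package, version):
    """Return `text` with the requirement line for `package` replaced by an
    exact pin `package==version`; all other lines are left untouched."""
    # Match the package name followed by a version operator, so that
    # packages sharing a prefix (e.g. "transformers-foo") are not touched.
    pattern = re.compile(rf"^\s*{re.escape(package)}\s*(==|>=|<=|~=|!=|<|>)")
    lines = []
    for line in text.splitlines():
        if pattern.match(line):
            lines.append(f"{package}=={version}")
        else:
            lines.append(line)
    trailing_newline = "\n" if text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


sample = "torch\ntransformers==4.30.2\ntokenizers\n"
print(pin_requirement(sample, "transformers", "4.26.1"))

# Usage against a real file (path is an assumption):
# from pathlib import Path
# path = Path("requirements_versions.txt")
# path.write_text(pin_requirement(path.read_text(), "transformers", "4.26.1"))
```

The same call would need to be applied to both requirements.txt and requirements_versions.txt if both carry the 4.30.2 pin.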

References:

GitHub - Dreambooth extension causes BLIP interrogation to give error (if number of beams is changed to anything greater then 1)
GitHub - Bug: Interrogate CLIP

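In addition, editing the stable-diffusion-webui/modules/launch_utils.py script to route downloads through the GitHub proxy https://ghproxy.com/ speeds up the preparation step when launching the WebUI project. To update a component, update it from its corresponding repository address. The relevant locations are:

The BLIP project is at stable-diffusion-webui/stable_diffusion_webui/repositories/BLIP
The BLIP model is at stable_diffusion_webui/models/BLIP/model_base_capfilt_large.pth

That is: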

def prepare_environment():
    # ...
    clip_package = os.environ.get('CLIP_PACKAGE', "https://ghproxy.com/https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip")
    openclip_package = os.environ.get('OPENCLIP_PACKAGE', "https://ghproxy.com/https://github.com/mlfoundations/open_clip/archive/bb6e834e9c70d9c27d0dc3ecedeebeaeb1ffad6b.zip")
    stable_diffusion_repo = os.environ.get('STABLE_DIFFUSION_REPO', "https://ghproxy.com/https://github.com/Stability-AI/stablediffusion.git")
    stable_diffusion_xl_repo = os.environ.get('STABLE_DIFFUSION_XL_REPO', "https://ghproxy.com/https://github.com/Stability-AI/generative-models.git")
    k_diffusion_repo = os.environ.get('K_DIFFUSION_REPO', 'https://ghproxy.com/https://github.com/crowsonkb/k-diffusion.git')
    codeformer_repo = os.environ.get('CODEFORMER_REPO', 'https://ghproxy.com/https://github.com/sczhou/CodeFormer.git')
    blip_repo = os.environ.get('BLIP_REPO', 'https://ghproxy.com/https://github.com/salesforce/BLIP.git')
    # ...
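Rather than pasting the proxy prefix into every default URL, the prefixing could be done by a small helper. This is only a sketch: with_proxy and GITHUB_PROXY are hypothetical names and do not exist in launch_utils.py.

```python
GITHUB_PROXY = "https://ghproxy.com/"


def with_proxy(url, proxy=GITHUB_PROXY):
    """Prefix GitHub URLs with the proxy; leave every other URL unchanged.

    URLs that already carry the proxy prefix no longer start with
    "https://github.com/", so calling this twice is harmless.
    """
    if url.startswith("https://github.com/"):
        return proxy + url
    return url


print(with_proxy("https://github.com/salesforce/BLIP.git"))
# → https://ghproxy.com/https://github.com/salesforce/BLIP.git
```

Because each default is read through os.environ.get, the same effect can also be had without editing the script, by exporting variables such as BLIP_REPO with the proxied URL before launching.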

Note: the checkpoint on the official site is https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth , which is larger than the model_base_caption_capfilt_large.pth checkpoint that SD recommends: 2.0G versus 800M.

 files = modelloader.load_models(
     model_path=os.path.join(paths.models_path, "BLIP"),
-    model_url='https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth',
+    model_url='https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth',
     ext_filter=[".pth"],
-    download_name='model_base_caption_capfilt_large.pth',
+    download_name='model_base_capfilt_large.pth',
 )

Image description, from New Bing:

a picture of a person sitting on a chair in a luxurious room,
wearing a black and white zebra print dress and black high heels,
chair is a light beige color with a curved back and armrests,
room has a large window with white curtains and a gold-framed mirror on the wall,
floor is made of light-colored wood,
shows a contrast between the bold and striking pattern of the dress and the soft and elegant colors of the room,
seems to be relaxed and comfortable as they are leaning back on the chair and crossing their legs,
picture might be taken for a fashion magazine or a personal blog as it showcases the style and taste of the person,
It might also be taken for a hotel advertisement or a travel diary as it shows the beauty and luxury of the room,
The picture creates an impression of sophistication and glamour as well as curiosity and interest,

Full prompt:

(masterpiece, best quality:1.2),highly detailed,extremely detailed,real photo,
looking at viewer,body facing viewer,240D wrap hip very thick pantyhose,
a picture of a person sitting on a chair in a luxurious room,
wearing a black and white zebra print dress and black high heels,
chair is a light beige color with a curved back and armrests,
room has a large window with white curtains and a gold-framed mirror on the wall,
floor is made of light-colored wood,
shows a contrast between the bold and striking pattern of the dress and the soft and elegant colors of the room,
seems to be relaxed and comfortable as they are leaning back on the chair and crossing their legs,
The picture might be taken for a fashion magazine or a personal blog as it showcases the style and taste of the person,
It might also be taken for a hotel advertisement or a travel diary as it shows the beauty and luxury of the room,
picture creates an impression of sophistication and glamour as well as curiosity and interest,
(pair shoes,pair legs:1.2),nice hand,nice figure,
(photorealistic,realistic:1.2),
<lora:more_details:0.4>,<lora:clothing_adjuster_v2:-0.8>,
Negative prompt: (ng_deepnegative_v1_75t:1.3),(negative_hand),(badhandv4),
(negative_feet_v2:0.5),
cleavage,buttocks,
missing arm,missing leg,extra arms,extra legs,mutated legs,extra limbs,malformed limbs,floating limbs,disconnected limbs,
bad anatomy,bad proportions,disfigured,long neck,long leg,
worst quality,bad quality,jpeg artifacts,lowres,normal quality,low quality,
EasyNegative,
Steps: 30, Sampler: DPM++ 2M SDE Karras, CFG scale: 7, Seed: 2386674497, Size: 512x768, Model hash: e4a30e4607, Model: 麦橘写实_MajicMIX_Realistic_v6, Denoising strength: 0.3, ADetailer model: face_yolov8n.pt, ADetailer prompt: “asian face,beatiful face,”, ADetailer confidence: 0.3, ADetailer dilate/erode: 4, ADetailer mask blur: 4, ADetailer denoising strength: 0.4, ADetailer inpaint only masked: True, ADetailer inpaint padding: 32, ADetailer version: 23.9.2, Hires upscale: 2, Hires steps: 5, Hires upscaler: 4x-UltraSharp, Lora hashes: “more_details: 3b8aa1d351ef, clothing_adjuster_v2: f038e3a5b67b”, TI hashes: “ng_deepnegative_v1_75t: 54e7e4826d53, negative_hand: 73b524a2da12, badhandv4: 5e40d722fc3d, negative_feet_v2: df90b1ff666d, EasyNegative: 66a7279a88dd”, Version: v1.6.0
