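The core ideas:
1. Setting algorithm design aside and looking only at the engineering side, how much you can get out of Stable Diffusion depends almost entirely on the input text.
2. Multimodal models such as BLIP2 take the route of "extending and empowering an LLM": the idea is simple and clear, and the payoff is obvious. Pairing an LLM with Stable Diffusion should not be much of a problem either.
3. ChatGLM-6B and Stable Diffusion both have pipelines on Hugging Face, so running an experiment is not hard.
4. This post only verifies the effect of connecting the basic pipelines. If the results are decent, there is a lot more to play with, for example fine-tuning the LLM with LoRA or P-Tuning. If you are interested, keep an eye out for later updates.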
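Steps:
1. Install ChatGLM-6B: GitHub - THUDM/ChatGLM-6B: ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
2. Prompt template for empowering ChatGLM-6B
Q:
The following prompts are used to guide an AI painting model in creating images. They cover details such as the character's appearance, the background, colors and lighting effects, as well as the theme and style of the image. The format of these prompts usually includes numbers in parentheses acting as weights, which specify the importance or emphasis of certain details. For example, "(masterpiece:1.4)" means that the quality of the work is very important. Here are some examples: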
1. (8k, RAW photo, best quality, masterpiece:1.2),(realistic, photo-realistic:1.37), ultra-detailed, 1girl, cute, solo, beautiful detailed sky, detailed cafe, night, sitting, dating, (nose blush), (smile:1.1),(closed mouth), medium breasts, beautiful detailed eyes, (collared shirt:1.1), bowtie, pleated skirt, (short hair:1.2), floating hair, ((masterpiece)), ((best quality))
2. (masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl, blue hair black hair, japanese clothes, cherry blossoms, tori, street full of cherry blossoms, detailed background, realistic, volumetric light, sunbeam, light rays, sky, cloud,
3. highres, highest quallity, illustration, cinematic light, ultra detailed, detailed face, (detailed eyes, best quality, hyper detailed, masterpiece, (detailed face), blue hairlwhite hair, purple eyes, highest details, luminous eyes, medium breats, black halo, white clothes, backlighting, (midriff:1.4), light rays, (high contrast), (colorful)
Following the prompts above, write a prompt that describes these elements: happy girl
A:
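(high quality, high resolution, masterpiece:1.2), (realistic, photo-realistic:1.37), (ultra-detailed, girl happy, fashion elements, rich in detail,) ( collared shirt:1.1), (curly hair:1.2), (sportswear:1.3), (singing:1.1), (dancing:1.2), (high contrast, colorful), ((masterpiece))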
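The template above has three parts: a fixed instruction, a numbered list of example prompts, and a final request naming the elements to describe. A minimal sketch of assembling it programmatically might look like the following. The names `INSTRUCTION` and `build_template` are hypothetical, and the example prompts are shortened versions of the ones above to keep the block small.

```python
# Hypothetical helper for filling in the few-shot template shown above.
# INSTRUCTION is a shortened version of the post's wording; adjust as needed.
INSTRUCTION = (
    "The following prompts are used to guide an AI painting model in creating "
    "images. The format usually includes numbers in parentheses acting as "
    'weights, e.g. "(masterpiece:1.4)". Here are some examples:'
)


def build_template(examples, elements):
    """Join the instruction, numbered examples and the final request into one query."""
    lines = [INSTRUCTION]
    for index, example in enumerate(examples, start=1):
        lines.append(f"{index}. {example}")
    lines.append(
        "Following the prompts above, write a prompt that describes these elements: "
        + " ".join(elements)
    )
    return "\n".join(lines)


# Abbreviated example prompts (assumption: any few-shot examples can be used here).
examples = [
    "(8k, RAW photo, best quality, masterpiece:1.2), ultra-detailed, 1girl, (smile:1.1)",
    "(masterpiece, finely detailed beautiful eyes: 1.2), illustration, cherry blossoms",
]

query = build_template(examples, ["happy", "girl"])
print(query)
```

The resulting `query` string is what would be typed after `Q:` in the dialogue; swapping the element list changes only the last line of the template.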
3. Install diffusers and verify that the text_to_image pipeline runs: GitHub - huggingface/diffusers: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
4. Assembly
ChatGLM
import os
import platform
import signal
from transformers import AutoTokenizer, AutoModel
import readline  # line editing for input(); not available on stock Windows Python

# Load ChatGLM-6B in half precision on the GPU
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
model = model.eval()

os_name = platform.system()
clear_command = 'cls' if os_name == 'Windows' else 'clear'
stop_stream = False
WELCOME = ("Welcome to the ChatGLM-6B model. Type to start a conversation, "
           "clear to reset the dialogue history, stop to exit")


def build_prompt(history):
    # Render the whole conversation so far below the welcome banner
    prompt = WELCOME
    for query, response in history:
        prompt += f"\n\nUser: {query}"
        prompt += f"\n\nChatGLM-6B: {response}"
    return prompt


def signal_handler(signum, frame):
    # Ctrl-C interrupts the current streamed answer instead of killing the program
    global stop_stream
    stop_stream = True


def main():
    history = []
    global stop_stream
    # Register the handler once, before any generation starts
    signal.signal(signal.SIGINT, signal_handler)
    print(WELCOME)
    while True:
        query = input("\nUser: ")
        if query.strip() == "stop":
            break
        if query.strip() == "clear":
            history = []
            os.system(clear_command)
            print(WELCOME)
            continue
        count = 0
        for response, history in model.stream_chat(tokenizer, query, history=history):
            if stop_stream:
                stop_stream = False
                break
            else:
                count += 1
                # Redraw the screen every 8 streamed chunks
                if count % 8 == 0:
                    os.system(clear_command)
                    print(build_prompt(history), flush=True)
        os.system(clear_command)
        print(build_prompt(history), flush=True)


if __name__ == "__main__":
    main()
Stable Diffusion
from diffusers import DiffusionPipeline

# Load Stable Diffusion
generator = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
generator.to("cuda")

# Replace the placeholder string with the prompt produced by ChatGLM
image = generator("ChatGLM_result_xxxx").images[0]
image.save("result_image.jpg")
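The two snippets above run separately, with the ChatGLM reply pasted into the Stable Diffusion call by hand. A minimal sketch of joining them in one script could look like this. The function names `enhance_prompt`, `text_to_image` and `run` are hypothetical, and the models are passed in as plain callables so the glue logic can be tried without a GPU; the commented wiring at the bottom assumes the `model`, `tokenizer` and `generator` objects created in the snippets above.

```python
from typing import Callable, Tuple


def enhance_prompt(chat: Callable[[str], str], template: str) -> str:
    """Send the filled-in template to the LLM and return its reply as a single line."""
    reply = chat(template)
    # Collapse newlines and repeated spaces so the reply is one prompt string
    return " ".join(reply.split())


def text_to_image(generate: Callable[[str], object], prompt: str, path: str) -> str:
    """Render the prompt with the image model and save the result to `path`."""
    image = generate(prompt)
    image.save(path)
    return path


def run(chat, generate, template: str, path: str = "result_image.jpg") -> Tuple[str, str]:
    """LLM first, image model second; returns the enhanced prompt and the output path."""
    prompt = enhance_prompt(chat, template)
    return prompt, text_to_image(generate, prompt, path)


# Wiring to the real models (not executed here; assumes the objects defined above):
#   chat = lambda q: model.chat(tokenizer, q, history=[])[0]
#   generate = lambda p: generator(p).images[0]
#   prompt, path = run(chat, generate, template)
```

Passing the models in as callables is only one possible design; it keeps the glue independent of how each model is loaded, which also makes it easy to swap in a fine-tuned LLM later.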
5. Comparison of generated results
Input only: happy girl
Result with the ChatGLM-enhanced input