Distinguishing channel count from tensor dimensions in Stable Diffusion




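Preface: the channel count and the number of tensor dimensions both switch between the values 3 and 4, so the two are easy to confuse.

1. Channel count:

1.1 channel = 3

An RGB image has 3 channels (red, green, and blue).

1.2 channel = 4

Stable Diffusion has 4 latent channels.
See also: "How to understand channels in convolutional neural networks".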

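2. Tensor shape

2.1 3D tensor

The shape is (C, H, W), where C is the number of channels, H the height, and W the width. This describes a single image.

2.2 4D tensor

2.2.1 In general

The shape is (B, C, H, W), where B is the batch size, C the number of channels, H the height, and W the width. This describes several images at once (for example, a batch).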
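The difference between the two layouts can be checked directly in PyTorch. The following is a minimal sketch; the sizes (64 x 64 image, 8 x 8 latents) are arbitrary values chosen for illustration, not taken from any pipeline.

```python
import torch

# A single RGB image: 3 channels, 64 x 64 pixels -> 3D tensor (C, H, W).
image = torch.zeros(3, 64, 64)

# Adding a batch axis in front gives a 4D tensor (B, C, H, W) with B = 1.
batch = image.unsqueeze(0)

# A batch of two 4-channel latents is also 4D; only the channel count differs.
latents = torch.zeros(2, 4, 8, 8)

print(image.dim(), tuple(image.shape))   # 3 (3, 64, 64)
print(batch.dim(), tuple(batch.shape))   # 4 (1, 3, 64, 64)
print(latents.dim(), latents.shape[1])   # 4 4
```

Note that `dim()` counts axes, while `shape[1]` reads the channel count. These are the two quantities that are easy to mix up.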

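2.2.2 Stable Diffusion

In img2img, the image is encoded with the VAE and noised according to the timestep:

    # This code is copied from diffusers' pipeline_controlnet_img2img.py
    # 6. Prepare latent variables
    latents = self.prepare_latents(
        image,
        latent_timestep,
        batch_size,
        num_images_per_prompt,
        prompt_embeds.dtype,
        device,
        generator,
    )

A single input image has 3 dimensions (C, H, W), while latents has 4 (B, C, H, W). (By the time image reaches prepare_latents it is already a batched 4D tensor, which is why the img2img code further down can test image.shape[1].)
Let us first look at the text2img prepare_latents function: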

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
        if isinstance(generator, list) and len(generator) != batch_size:
            raise ValueError(
                f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                f" size of {batch_size}. Make sure the batch size matches the length of the generators."
            )

        if latents is None:
            latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
        else:
            latents = latents.to(device)

        # scale the initial noise by the standard deviation required by the scheduler
        latents = latents * self.scheduler.init_noise_sigma
        return latents

Clearly, shape already fixes both the number of dimensions of latents (4) and their order: (batch, channels, height, width).
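To make the shape arithmetic concrete, here is a small standalone sketch. The helper name `latent_shape` is hypothetical, and the default `vae_scale_factor=8` is an assumption that matches common Stable Diffusion v1.x checkpoints; a real pipeline reads the value from `self.vae_scale_factor`.

```python
def latent_shape(batch_size, num_channels_latents, height, width, vae_scale_factor=8):
    """Mirror the `shape` tuple built in prepare_latents (hypothetical standalone helper)."""
    return (
        batch_size,
        num_channels_latents,
        height // vae_scale_factor,  # assumed downscale factor of 8
        width // vae_scale_factor,
    )


shape = latent_shape(batch_size=1, num_channels_latents=4, height=512, width=512)
print(shape, len(shape))  # (1, 4, 64, 64) 4
```

Under the same assumption, ten 768 x 768 images give (10, 4, 96, 96): four dimensions, four channels.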
In img2img:

    # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.StableDiffusionImg2ImgPipeline.prepare_latents
    def prepare_latents(self, image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None):
        if not isinstance(image, (torch.Tensor, PIL.Image.Image, list)):
            raise ValueError(
                f"`image` has to be of type `torch.Tensor`, `PIL.Image.Image` or list but is {type(image)}"
            )

        image = image.to(device=device, dtype=dtype)

        batch_size = batch_size * num_images_per_prompt

        if image.shape[1] == 4:
            init_latents = image

        else:
            if isinstance(generator, list) and len(generator) != batch_size:
                raise ValueError(
                    f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
                    f" size of {batch_size}. Make sure the batch size matches the length of the generators."
                )

            elif isinstance(generator, list):
                init_latents = [
                    self.vae.encode(image[i : i + 1]).latent_dist.sample(generator[i]) for i in range(batch_size)
                ]
                init_latents = torch.cat(init_latents, dim=0)
            else:
                init_latents = self.vae.encode(image).latent_dist.sample(generator)

            init_latents = self.vae.config.scaling_factor * init_latents

        if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
            # expand init_latents for batch_size
            deprecation_message = (
                f"You have passed {batch_size} text prompts (`prompt`), but only {init_latents.shape[0]} initial"
                " images (`image`). Initial images are now duplicating to match the number of text prompts. Note"
                " that this behavior is deprecated and will be removed in a version 1.0.0. Please make sure to update"
                " your script to pass as many initial images as text prompts to suppress this warning."
            )
            deprecate("len(prompt) != len(image)", "1.0.0", deprecation_message, standard_warn=False)
            additional_image_per_prompt = batch_size // init_latents.shape[0]
            init_latents = torch.cat([init_latents] * additional_image_per_prompt, dim=0)
        elif batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] != 0:
            raise ValueError(
                f"Cannot duplicate `image` of batch size {init_latents.shape[0]} to {batch_size} text prompts."
            )
        else:
            init_latents = torch.cat([init_latents], dim=0)

        shape = init_latents.shape
        noise = randn_tensor(shape, generator=generator, device=device, dtype=dtype)

        # get latents
        init_latents = self.scheduler.add_noise(init_latents, noise, timestep)
        latents = init_latents

        return latents
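The batch-expansion branch above only ever grows the first axis. The sketch below isolates that step; the random tensor is a stand-in for the VAE output, and the 8 x 8 spatial size and `batch_size = 3` are assumptions chosen for illustration.

```python
import torch

# Stand-in for the VAE output of one image: (B=1, C=4, H=8, W=8). Random values, illustration only.
init_latents = torch.randn(1, 4, 8, 8)
batch_size = 3  # e.g. three prompts for a single input image

if batch_size > init_latents.shape[0] and batch_size % init_latents.shape[0] == 0:
    repeats = batch_size // init_latents.shape[0]
    # Concatenating along dim 0 grows the batch axis only; C, H and W are untouched.
    init_latents = torch.cat([init_latents] * repeats, dim=0)

print(init_latents.shape)  # torch.Size([3, 4, 8, 8])
```

The result is still a 4D tensor with 4 channels; only B changed from 1 to 3.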

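3. Application

3.1 The problem

    new_map = texture.permute(1, 2, 0)
    RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3

This error is about the tensor's number of dimensions and has nothing to do with the channel count: texture is a 4D tensor, but permute was given only 3 axis indices.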
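A minimal reproduction and two possible fixes are sketched below. The tensor is a hypothetical stand-in for `texture` (its real shape is not given in the article), and which fix is appropriate depends on whether the batch axis is needed downstream.

```python
import torch

# Hypothetical stand-in for `texture`: a batch of one 3-channel map, (B, C, H, W).
texture = torch.zeros(1, 3, 16, 16)

try:
    texture.permute(1, 2, 0)  # 3 indices for a 4D tensor
except RuntimeError as err:
    print("RuntimeError:", err)

# Option 1: pick one item from the batch, then permute the remaining 3 axes.
hwc = texture[0].permute(1, 2, 0)    # (H, W, C)

# Option 2: keep the batch axis and list all 4 axes.
bhwc = texture.permute(0, 2, 3, 1)   # (B, H, W, C)

print(hwc.shape, bhwc.shape)
```

In both fixes the number of indices passed to `permute` equals `dim()` of the tensor it is called on.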

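3.2 Example

Q: For a 4D tensor of shape (B, C, H, W), can C be 3?
A: Yes. In a 4D tensor of shape (B, C, H, W), C is the number of channels. C = 3 is common and corresponds to the three color channels of an RGB image (red, green, and blue).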

3.3 A tensor can be thought of as a multidimensional array

    print("sample:", sample.shape)
    print("sample:", sample[0].shape)
    print("sample:", sample[0][0].shape)

    >> sample: torch.Size([10, 4, 96, 96])
    sample: torch.Size([4, 96, 96])
    sample: torch.Size([96, 96])

So a tensor of shape torch.Size([10, 4, 96, 96]) can be read as a 4-dimensional array: each index peels off the leading dimension.

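3.4 Converting a tensor to a list

3.4.1

    # sample: torch.Size([10, 4, 96, 96])
    views = [view for view in sample]
    print("views:", views.shape)

    >> AttributeError: 'list' object has no attribute 'shape'

A Python list has no shape attribute; inspect one of its elements instead:

    print("views:", views[0].shape)

    >> views: torch.Size([4, 96, 96])

3.4.2

    # Method 2: walk the items by index (each view is written into slot i of pred_prev_sample)
    for i, view in enumerate(prev_views):
        pred_prev_sample[i] = view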
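For reference, a few equivalent ways to split a tensor along its first axis are sketched here. The zero tensor is a placeholder with the same shape as `sample` above.

```python
import torch

sample = torch.zeros(10, 4, 96, 96)

views_a = [view for view in sample]      # list comprehension, as in 3.4.1
views_b = list(sample)                   # same result, shorter
views_c = torch.unbind(sample, dim=0)    # returns a tuple of 10 tensors

print(len(views_a), views_a[0].shape)    # 10 torch.Size([4, 96, 96])
```

In every variant the container holds 10 tensors of shape (4, 96, 96); the shape lives on the elements, not on the list or tuple.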

3.5 Converting a list to a tensor

3.5.1

    import torch

    # Define a Python list
    my_list = [1, 2, 3, 4, 5]
    # Convert the Python list to a PyTorch tensor
    my_tensor = torch.tensor(my_list)
    print(my_tensor)

    >> tensor([1, 2, 3, 4, 5])

3.5.2

    # Suppose you have a list of tensors
    tensor_list = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6]), torch.tensor([7, 8, 9])]
    # Use torch.stack to stack them into a new tensor
    stacked_tensor = torch.stack(tensor_list)
    print(stacked_tensor)

    >> tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

Tensor operations very often take an axis argument. In NumPy it is usually called axis and in PyTorch dim, but the two mean the same thing.
See also: "A complete guide to axis/dim in deep learning".

    # By default, torch.stack stacks the tensors along a new dimension (dimension 0).
    # views is a list, and views[0].shape is ([4, 96, 96]).
    views = torch.stack(views, dim=0)  # ([10, 4, 96, 96])

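3.5.3 Concatenating along an existing dimension / adding a new dimension

    import torch

    # Suppose you have a list of tensors
    tensor_list = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6]), torch.tensor([7, 8, 9])]

    # torch.cat joins the tensors along an existing dimension: shape (9,)
    concatenated_tensor = torch.cat(tensor_list, dim=0)

    # torch.unsqueeze inserts a new dimension of size 1: shape (1, 9)
    # (this differs from torch.stack, which would give shape (3, 3))
    unsqueezed_tensor = torch.unsqueeze(concatenated_tensor, dim=0)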
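The relationship between `torch.cat` and `torch.stack` can be verified side by side. This is a small sketch using the same three-element tensors as above; the variable names are illustrative.

```python
import torch

tensor_list = [torch.tensor([1, 2, 3]), torch.tensor([4, 5, 6]), torch.tensor([7, 8, 9])]

cat_result = torch.cat(tensor_list, dim=0)      # existing axis grows: shape (9,)
stack_result = torch.stack(tensor_list, dim=0)  # new leading axis: shape (3, 3)

# stack is equivalent to unsqueezing every tensor first, then concatenating.
via_cat = torch.cat([t.unsqueeze(0) for t in tensor_list], dim=0)

print(cat_result.shape, stack_result.shape)  # torch.Size([9]) torch.Size([3, 3])
print(torch.equal(stack_result, via_cat))    # True
```

So `cat` keeps the number of dimensions unchanged, while `stack` raises it by one, which is how a list of 3D (C, H, W) tensors becomes a 4D (B, C, H, W) batch.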

  • Author: 李琛
  • Article link: https://wapzz.net/post-16746.html
  • Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0.