tf.extract_image_patches以及pytorch的extract_patches

心已赠人 2024-05-24 02:24 47阅读 0赞

### 前言 ###

由于最近一直在研究上下文注意力机制，当对特征图FM提取patch，需要用到函数extract\_image\_patches，特此记录一下。温馨提示：提笔计算一下，更容易理解。

### 1、 tensorflow的extract\_image\_patches函数提取patch ###

[参考1 https://blog.csdn.net/qq\_38675570/article/details/81239774][1 https_blog.csdn.net_qq_38675570_article_details_81239774]  
[参考2 http://www.voidcn.com/article/p-svnrphay-btg.html][2 http_www.voidcn.com_article_p-svnrphay-btg.html]

def extract_image_patches(  
        images,      # 输入的特征图[b,h,w,c]
        ksizes=None, # patch大小，图像这块一般常用1,3，5。参数为[1, k_size, k_size, 1]
        strides=None,# 步长，[1, stride, stride, 1]，滑动的像素
        rates=None,  # 参考扩张卷积，相邻像素填0个数
        padding=None,# “VALID”,每个补丁必须完全包含在图像中；“SAME”,允许补丁不完整
        name=None,
        sizes=None):

这里以特征图bg=fg\_in=\[8,32,32,192\]为例：

# 先从背景区域中提取块
    patch1 = tf.extract_image_patches(bg, [1, k_size, k_size, 1], [1, stride, stride, 1], [1, 1, 1, 1], 'VALID')
    print("patch1 size",patch1.shape) # [8,30,30,1728]
    
    # 再从前景区域中提取块，添加了padding
    patch2 = tf.extract_image_patches(fg_in, [1,k_size,k_size,1], [1,1,1,1], [1,1,1,1], 'SAME')
    print("patch2 size", patch2.shape) # [8,32,32,1728]

输出维度分析：  
h=32，stride=1，k\_size=3。当padding=0时，计算出的块个数为c=900；当padding=1时，块的个数为32\*32=1024。  
提出的patch的维度为1728 = 192 \* 3 \* 3，也就是输出的最后一维是c\_in \* k\_size \* k\_size。在对patch计算余弦相似度时，处理的就是这一部分。

kn = int((k_size - 1) / 2)
    c = 0 # 计算块的个数
     for p in range(kn, h - kn, stride):
         for q in range(kn, w - kn, stride):
             c += 1

### 2、pytorch的ford和unfold ###

[参考1 https://blog.csdn.net/LoseInVain/article/details/88139435][1 https_blog.csdn.net_LoseInVain_article_details_88139435]  
上面这条链接将网络层的torch.nn.unfold()函数讲得很透彻，看明白之后大概就理解了。

# extract patches
    def extract_patches(x, kernel=3, stride=1):
        if kernel != 1:
            x = nn.ZeroPad2d(1)(x)
        x = x.permute(0, 2, 3, 1)
        all_patches = x.unfold(1, kernel, stride).unfold(2, kernel, stride)
        return all_patches

调用函数，输入的维度x2=\[8,192,32,32\]，ksize=3，stride=1，提取patch后输出的维度为\[8,32,32,192,3,3\]，和tensorflow的用法差不多，将tensor进行view一下可以达到一样的效果。

w = extract_patches(x2, kernel=self.ksize, stride=self.stride)
    print("w",w.size()) # [8,32,32,192,3,3]

下面给出2019 CVPR PEPSI : Fast Image Inpainting With Parallel Decoding Network论文中注意力模块，有做图像修复的朋友可以瞅一瞅，我自己修改了一些添加了一些代码。  
论文：[地址][Link 1]  
翻译：[https://blog.csdn.net/baidu\_33256174/article/details/102570361][https_blog.csdn.net_baidu_33256174_article_details_102570361]  
code:[https://github.com/Forty-lock/PEPSI][https_github.com_Forty-lock_PEPSI]

from __future__ import division
    # from ops import *
    import tensorflow as tf
    from tensorflow.keras.layers import Conv2D,ELU
    
    
    def softmax(input):
    
        k = tf.exp(input - 3)
        k = tf.reduce_sum(k, 3, True)
        # k = k - num * tf.ones_like(k)
    
        ouput = tf.exp(input - 3) / k
    
        return ouput
    
    def reduce_var(x, axis=None, keepdims=False):
        """Variance of a tensor, alongside the specified axis.
    
        # Arguments
            x: A tensor or variable.
            axis: An integer, the axis to compute the variance.
            keepdims: A boolean, whether to keep the dimensions or not.
                If `keepdims` is `False`, the rank of the tensor is reduced
                by 1. If `keepdims` is `True`,
                the reduced dimension is retained with length 1.
    
        # Returns
            A tensor with the variance of elements of `x`.
        """
        m = tf.reduce_mean(x, axis=axis, keepdims=True)
        devs_squared = tf.square(x - m)
        return tf.reduce_mean(devs_squared, axis=axis, keepdims=keepdims)
    
    def reduce_std(x, axis=None, keepdims=False):
        """Standard deviation of a tensor, alongside the specified axis.
    
        # Arguments
            x: A tensor or variable.
            axis: An integer, the axis to compute the standard deviation.
            keepdims: A boolean, whether to keep the dimensions or not.
                If `keepdims` is `False`, the rank of the tensor is reduced
                by 1. If `keepdims` is `True`,
                the reduced dimension is retained with length 1.
    
        # Returns
            A tensor with the standard deviation of elements of `x`.
        """
        return tf.sqrt(reduce_var(x, axis=axis, keepdims=keepdims))
    
    def contextual_block(bg_in, fg_in, mask, k_size, lamda, name, stride=1):
        with tf.variable_scope(name):
            b, h, w, dims = [i.value for i in bg_in.get_shape()]
            temp = tf.image.resize_nearest_neighbor(mask, (h, w)) # 进行插值扩张到（h,w）
            temp = tf.expand_dims(temp[:, :, :, 0], 3) # b 128 128 1
            mask_r = tf.tile(temp, [1, 1, 1, dims]) # b 128 128 128 ，复制128次
            bg = bg_in * mask_r
    
            kn = int((k_size - 1) / 2)
            c = 0 # 计算块的个数
            for p in range(kn, h - kn, stride):
                for q in range(kn, w - kn, stride):
                    c += 1
            # 先从背景区域中提取块
            patch1 = tf.extract_image_patches(bg, [1, k_size, k_size, 1], [1, stride, stride, 1], [1, 1, 1, 1], 'VALID')
            print("patch1 size",patch1.shape) # [8,30,30,1728]
            # 推算一下
            patch1 = tf.reshape(patch1, (b, 1, c, k_size*k_size*dims)) # [b,out_rows,out_cols,k_size*k_size*c]
            patch1 = tf.reshape(patch1, (b, 1, 1, c, k_size * k_size * dims))
            patch1 = tf.transpose(patch1, [0, 1, 2, 4, 3]) # 扩展成5维
            print("patch1 size", patch1.shape) # [8,1,1,1728,900]
            # 再从前景区域中提取块，添加了padding
            patch2 = tf.extract_image_patches(fg_in, [1,k_size,k_size,1], [1,1,1,1], [1,1,1,1], 'SAME')
            print("patch2 size", patch2.shape) # [8,32,32,1728]
            ACL = []
    
            for ib in range(b): # 一张一张图的计算
    
                k1 = patch1[ib, :, :, :, :] # 一张图像的背景
                print("k1",k1.shape) # [1,1,1728,900]
                k1d = tf.reduce_sum(tf.square(k1), axis=2)
                print("k1d",k1d.shape)  # [1,1,900]
                k2 = tf.reshape(k1, (k_size, k_size, dims, c))
                print("k2",k2.shape) # (3, 3, 192, 900)
    
                ww = patch2[ib, :, :, :]
                print("ww", ww.shape) # [32,32,1728]
                wwd = tf.reduce_sum(tf.square(ww), axis=2, keepdims=True)
                print("wwd",wwd.shape) # [32,32,1]
                ft = tf.expand_dims(ww, 0) # [1,32,32,1728]
    
                CS = tf.nn.conv2d(ft, k1, strides=[1, 1, 1, 1], padding='SAME')
                print("CS",CS.shape) # [1,32,32,900]
    
                tt = k1d + wwd
                print("tt",tt.shape) # [1,32,32,900]
                # 在axis=0处增加一个为1的维度
                DS1 = tf.expand_dims(tt, 0) - 2 * CS
                # 规范化处理
                DS2 = (DS1 - tf.reduce_mean(DS1, 3, True)) / reduce_std(DS1, 3, True)
                print('DS2',DS2.shape) # [1,32,32,900]
                DS2 = -1 * tf.nn.tanh(DS2)
    
                CA = softmax(lamda * DS2) # 计算得到相似性分数
                print("CA",CA.shape) # [1,32,32,900]
                # K2为卷积核
                ACLt = tf.nn.conv2d_transpose(CA, k2, output_shape=[1, h, w, dims], strides=[1, 1, 1, 1], padding='SAME')
                print("ACLt",ACLt.shape) # (1, 32, 32, 192)
                ACLt = ACLt / (k_size ** 2)
    
                if ib == 0:
                    ACL = ACLt
                else:
                    ACL = tf.concat((ACL, ACLt), 0)
    
            ACL = bg + ACL * (1.0 - mask_r) # mask,0表示缺失
    
            con1 = tf.concat([bg_in, ACL], 3) # 整合背景和重构的前景
            ACL2 = Conv2D(dims, [1, 1], strides=[1, 1], padding='VALID')(con1)
            ACL2 = tf.nn.elu(ACL2)
    
            return ACL2
    
    def ca_test(args):
        from PIL import Image
        import matplotlib.pyplot as plt
        import os
        import numpy as np
        os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    
        def float_to_uint8(img):
            img = img * 255
            return img.astype('uint8')
    
        def default_loader(path):
            with open(path, 'rb') as f:
                image = Image.open(f).convert('RGB')
            return image
    
        rate = 2
        stride = 2
        grid = rate * stride
    
        # b = default_loader(args.imageA)
        # w, h = b.size
        # # b = b.resize((w//grid*grid//2, h//grid*grid//2), Image.ANTIALIAS)
        # b = b.resize((w // (grid * grid*2), h // (grid * grid*2)), Image.ANTIALIAS)
        # print('Size of imageA: {}'.format(b.size))
        #
        # f = default_loader(args.imageB)
        # w, h = f.size
        # f = f.resize((w // (grid *grid*2), h // (grid *grid*2)), Image.ANTIALIAS)
        # # print('Size of imageB: {}'.format(f.size))
    
        b = np.zeros([8,32,32,192],dtype=np.float32)
        print(type(b))
    
        mask = default_loader(args.mask)
        print("mask",mask.size)
        w, h = mask.size
        mask = mask.resize((w // (grid*2), h // (grid*2)), Image.ANTIALIAS)
        mask = mask.convert("L")
        print("Size of mask:{}".format(mask.size))
    
        # variable
        batch_size, Height, Width = 8,32,32
    
        bg_var = tf.placeholder(tf.float32, [batch_size, Height, Width, 192])
        fg_var = tf.placeholder(tf.float32, [batch_size, Height, Width, 192])
        mask_var = tf.placeholder(tf.float32, [batch_size, Height, Width, 1])
    
        y = contextual_block(bg_in=bg_var, fg_in=fg_var, mask=mask_var, k_size=3, lamda=80, name='CA')
    
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.9  # 限制GPU使用率为40%
        config.gpu_options.allow_growth = True  # 动态申请显存
    
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            # b = tf.expand_dims(b, axis=0).eval()
            # f = tf.expand_dims(f, axis=0).eval()
            mask = tf.expand_dims(mask,axis=2)
            mask = tf.expand_dims(mask, axis=0)
            mask = tf.tile(mask, [8, 1, 1, 1]).eval()
            print(type(mask))
            print("b:",b.shape,"f:",b.shape,"mask:",mask.shape)
    
            y = sess.run([y],feed_dict={
        bg_var:b,fg_var:b,mask_var:mask})
    
        # plt.imshow(y)
        # plt.show()
    
    
    if __name__ == '__main__':
        import argparse
        parser = argparse.ArgumentParser()
        parser.add_argument('--imageA', default='../image/000147.jpg.jpg', type=str,
                            help='Image A as background patches to reconstruct image B.')
        parser.add_argument('--imageB', default='../image/000201.jpg.jpg', type=str,
                            help='Image B is reconstructed with image A.')
        parser.add_argument('--mask', default='../image/mask_0001.jpg', type=str)
        parser.add_argument('--imageOut', default='../image/result.png', type=str,
                            help='Image B is reconstructed with image A.')
        args = parser.parse_args()
        ca_test(args)

[1 https_blog.csdn.net_qq_38675570_article_details_81239774]: https://blog.csdn.net/qq_38675570/article/details/81239774
[2 http_www.voidcn.com_article_p-svnrphay-btg.html]: http://www.voidcn.com/article/p-svnrphay-btg.html
[1 https_blog.csdn.net_LoseInVain_article_details_88139435]: https://blog.csdn.net/LoseInVain/article/details/88139435
[Link 1]: http://openaccess.thecvf.com/content_CVPR_2019/papers/Sagong_PEPSI__Fast_Image_Inpainting_With_Parallel_Decoding_Network_CVPR_2019_paper.pdf
[https_blog.csdn.net_baidu_33256174_article_details_102570361]: https://blog.csdn.net/baidu_33256174/article/details/102570361
[https_github.com_Forty-lock_PEPSI]: https://github.com/Forty-lock/PEPSI