Faster RCNN (Keras version) code walkthrough (6): ROI Pooling layer details

古城微笑少年丶 2022-05-16 02:50

Index of the Faster RCNN (Keras version) code walkthrough posts:

[1. Faster RCNN (Keras version) code walkthrough (1): Overview][1.faster RCNN_keras_1_-]
[2. Faster RCNN (Keras version) code walkthrough (2): Data preparation][2.faster RCNN_keras_2_-]
[3. Faster RCNN (Keras version) code walkthrough (3): Training pipeline details][3.faster RCNN_keras_3_-]
[4. Faster RCNN (Keras version) code walkthrough (4): Shared convolutional layers][4.faster RCNN_keras_4_-]
[5. Faster RCNN (Keras version) code walkthrough (5): RPN layer details][5.faster RCNN_keras_5_-RPN]
[6. Faster RCNN (Keras version) code walkthrough (6): ROI Pooling layer details][6.faster RCNN_keras_6_-ROI Pooling]

1. Principle and purpose of ROI Pooling

ROI Pooling compresses feature-map regions of different sizes to one fixed size. For example, a (16\*16) feature map is compressed to (4, 4) by taking a max pooling over each (4\*4) block; a region of another size is split into the same 4\*4 grid of cells, so its output is also (4, 4). This avoids the information loss caused by cropping or warping the original image to a fixed size. It also turns the feature maps produced from images of different sizes into feature maps of one size, which the fully connected layers or classifier that follow can accept.

In this Keras version of Faster RCNN, however, the author uses ROI Pooling only when the backend is Theano. When the backend is TensorFlow, the region is cropped and resized instead, in the spirit of ROI Align ([Mask RCNN code][mask rcnn]), which works better than ROI Pooling. Note that the code truncates the ROI coordinates to integers before cropping, so it approximates ROI Align rather than reproducing it exactly.

    # Assumes the older Keras 2.x / TensorFlow 1.x API that this code base targets
    # (K.image_dim_ordering, tf.image.resize_images).
    from keras.engine.topology import Layer
    import keras.backend as K

    if K.backend() == 'tensorflow':
        import tensorflow as tf


    class RoiPoolingConv(Layer):
        '''ROI pooling layer for 2D inputs.
        See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
        K. He, X. Zhang, S. Ren, J. Sun
        # Arguments
            pool_size: int
                Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
            num_rois: number of regions of interest to be used
        # Input shape
            list of two tensors [X_img, X_roi] with shape:
            X_img: 4D tensor
            `(1, channels, rows, cols)` if dim_ordering='th'
            or 4D tensor with shape:
            `(1, rows, cols, channels)` if dim_ordering='tf'.
            X_roi: 3D tensor
            `(1, num_rois, 4)` list of rois, with ordering (x, y, w, h)
        # Output shape
            5D tensor with shape:
            `(1, num_rois, channels, pool_size, pool_size)` if dim_ordering='th'
            or `(1, num_rois, pool_size, pool_size, channels)` if dim_ordering='tf'.
        '''

        def __init__(self, pool_size, num_rois, **kwargs):
            self.dim_ordering = K.image_dim_ordering()
            assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'

            self.pool_size = pool_size
            self.num_rois = num_rois

            super(RoiPoolingConv, self).__init__(**kwargs)

        def build(self, input_shape):
            # The channel axis depends on the backend's dimension ordering.
            if self.dim_ordering == 'th':
                self.nb_channels = input_shape[0][1]
            elif self.dim_ordering == 'tf':
                self.nb_channels = input_shape[0][3]

        def compute_output_shape(self, input_shape):
            if self.dim_ordering == 'th':
                return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
            else:
                return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels

        def call(self, x, mask=None):
            assert(len(x) == 2)

            # Feature map
            img = x[0]
            # ROI boxes as (x, y, w, h). They index the feature map directly below,
            # so they are expected in feature-map units.
            rois = x[1]

            input_shape = K.shape(img)

            outputs = []

            for roi_idx in range(self.num_rois):
                x = rois[0, roi_idx, 0]
                y = rois[0, roi_idx, 1]
                w = rois[0, roi_idx, 2]
                h = rois[0, roi_idx, 3]

                # Width and height of one pooling cell inside the ROI
                row_length = w / float(self.pool_size)
                col_length = h / float(self.pool_size)

                num_pool_regions = self.pool_size

                # NOTE: the RoiPooling implementation differs between theano and tensorflow
                # due to the lack of a resize op in theano. The theano implementation is much
                # less efficient and leads to long compile times

                # True ROI pooling: max over each cell of the grid
                if self.dim_ordering == 'th':
                    for jy in range(num_pool_regions):
                        for ix in range(num_pool_regions):
                            x1 = x + ix * row_length
                            x2 = x1 + row_length
                            y1 = y + jy * col_length
                            y2 = y1 + col_length

                            x1 = K.cast(x1, 'int32')
                            x2 = K.cast(x2, 'int32')
                            y1 = K.cast(y1, 'int32')
                            y2 = K.cast(y2, 'int32')

                            # Make sure every cell covers at least one pixel
                            x2 = x1 + K.maximum(1, x2 - x1)
                            y2 = y1 + K.maximum(1, y2 - y1)

                            new_shape = [input_shape[0], input_shape[1],
                                         y2 - y1, x2 - x1]

                            x_crop = img[:, :, y1:y2, x1:x2]
                            xm = K.reshape(x_crop, new_shape)
                            pooled_val = K.max(xm, axis=(2, 3))
                            outputs.append(pooled_val)

                elif self.dim_ordering == 'tf':
                    x = K.cast(x, 'int32')
                    y = K.cast(y, 'int32')
                    w = K.cast(w, 'int32')
                    h = K.cast(h, 'int32')

                    # Crop the ROI and resize it to pool_size x pool_size. The author describes
                    # this as the ROIAlign approach from Mask RCNN, which gives better results.
                    rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
                    # print("resize_images", img[:, y:y+h, x:x+w, :].shape)
                    # print("resize_result", rs.shape)
                    outputs.append(rs)

            final_output = K.concatenate(outputs, axis=0)
            final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

            if self.dim_ordering == 'th':
                # Move channels in front of the spatial axes
                final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))
            else:
                # Identity permutation: the layout is already channels-last
                final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))

            return final_output

[1.faster RCNN_keras_1_-]: https://blog.csdn.net/u011311291/article/details/81004067
[2.faster RCNN_keras_2_-]: https://blog.csdn.net/u011311291/article/details/81021731
[3.faster RCNN_keras_3_-]: https://blog.csdn.net/u011311291/article/details/81121519
[4.faster RCNN_keras_4_-]: https://blog.csdn.net/u011311291/article/details/81221145
[5.faster RCNN_keras_5_-RPN]: https://blog.csdn.net/u011311291/article/details/81221893
[6.faster RCNN_keras_6_-ROI Pooling]: https://blog.csdn.net/u011311291/article/details/81673460
[mask rcnn]: https://github.com/matterport/Mask_RCNN
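
To make the max-pooling idea from section 1 concrete, here is a minimal NumPy sketch that follows the same cell arithmetic as the Theano branch of `RoiPoolingConv`. It is only an illustration under a few assumptions: the function name `roi_max_pool` and the toy feature map are made up for this example and are not part of the repository, the feature map is a single `(rows, cols, channels)` array, and the ROI is given as `(x, y, w, h)` in feature-map units and lies inside the map.

```python
import numpy as np


def roi_max_pool(feature_map, roi, pool_size):
    """Max-pool one ROI to a (pool_size, pool_size, channels) array.

    Hypothetical helper for illustration only.
    feature_map: array of shape (rows, cols, channels)
    roi: (x, y, w, h) in feature-map units
    """
    x, y, w, h = roi
    cell_w = w / float(pool_size)  # width of one pooling cell
    cell_h = h / float(pool_size)  # height of one pooling cell
    channels = feature_map.shape[2]
    pooled = np.zeros((pool_size, pool_size, channels), dtype=feature_map.dtype)

    for jy in range(pool_size):
        for ix in range(pool_size):
            # Cell corners, truncated to integers as in the Theano branch
            x1 = int(x + ix * cell_w)
            x2 = int(x + ix * cell_w + cell_w)
            y1 = int(y + jy * cell_h)
            y2 = int(y + jy * cell_h + cell_h)
            # Every cell must cover at least one pixel
            x2 = x1 + max(1, x2 - x1)
            y2 = y1 + max(1, y2 - y1)
            # Max over the spatial axes, kept separately per channel
            pooled[jy, ix] = feature_map[y1:y2, x1:x2].max(axis=(0, 1))
    return pooled


# Toy 16x16 single-channel feature map whose value at (row, col) is row * 16 + col
fmap = np.arange(16 * 16, dtype=np.float32).reshape(16, 16, 1)

# The whole 16x16 map and a smaller 8x12 region both come out as 4x4
pooled = roi_max_pool(fmap, roi=(0, 0, 16, 16), pool_size=4)
small = roi_max_pool(fmap, roi=(2, 3, 8, 12), pool_size=4)
print(pooled.shape, small.shape)  # (4, 4, 1) (4, 4, 1)
```

Both regions end up with the same output shape even though their sizes differ, which is the property the layers after ROI Pooling rely on. The TensorFlow branch reaches the same fixed shape by resizing the cropped region instead of taking a max per cell.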