1. Overview

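Introduction: Before this paper, object tracking was usually done by learning a model of the target's appearance online, from the video being tracked. That approach achieved fairly good results, but it is restrictive and generalizes poorly. The paper instead builds a new fully-convolutional Siamese network on top of a CNN. The network can be trained end to end, and it also removes the real-time bottleneck of earlier CNN-based trackers. The method is simple and can be regarded as one of the early successful uses of CNNs in tracking; the family of Siamese trackers derived from it still performs well today.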

2. Method Design

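The method exploits the weight sharing of a Siamese network: the reference image and the current frame pass through the same network to obtain their embedding features, and a correlation match then computes the similarity between the two feature maps, so that the response is highest in the region where the target lies. The inference flow is shown in the figure below (during training the images are cropped and padded, so the centers are always aligned):
[Figure: inference flow of the fully-convolutional Siamese network]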
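
To make the correlation match concrete, here is a minimal NumPy sketch, not taken from the repository. It assumes the two embeddings are already computed; the helper name `cross_correlate`, the random features, and the planted position are all made up for illustration. The reference feature is slid over the search feature, and the score map peaks where the two match.

```python
import numpy as np


def cross_correlate(search_feat, ref_feat):
    """Slide ref_feat over search_feat and sum the elementwise products.

    search_feat: (C, Hs, Ws), ref_feat: (C, Hr, Wr).
    Returns a score map of shape (Hs - Hr + 1, Ws - Wr + 1).
    """
    _, hs, ws = search_feat.shape
    _, hr, wr = ref_feat.shape
    out = np.zeros((hs - hr + 1, ws - wr + 1), dtype=np.float64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[:, i:i + hr, j:j + wr]
            out[i, j] = np.sum(window * ref_feat)
    return out


# Hypothetical embeddings: weak noise with the reference planted at row 2, col 5.
rng = np.random.default_rng(0)
ref_feat = rng.standard_normal((4, 3, 3))
search_feat = 0.1 * rng.standard_normal((4, 9, 9))
search_feat[:, 2:5, 5:8] = ref_feat

score_map = cross_correlate(search_feat, ref_feat)
peak = np.unravel_index(np.argmax(score_map), score_map.shape)
print(score_map.shape, peak)
```

The explicit loops are only for readability; a real tracker would express the same operation as a convolution.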

3. Code Walkthrough

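The code discussed here comes from the "reference code" linked at the top of the article, and every snippet below is taken from that repository. Only the key steps are covered; the rest is omitted.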

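Step1. Generating the training labels
The training label is generated at the size of the given correlation feature map. Positive and negative thresholds are supplied to separate the center from the border; Gaussian smoothing could be used instead.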

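import numpy as np


def create_BCELogit_loss_label(label_size, pos_thr, neg_thr, upscale_factor=4):
    """Return a (label_size, label_size, 2) array: channel 0 holds the binary
    targets, channel 1 the per-pixel loss weights."""
    # Rescale the thresholds by the upscale factor.
    pos_thr = pos_thr / upscale_factor
    neg_thr = neg_thr / upscale_factor
    # Note that the center might be between pixels if the label size is even.
    center = (label_size - 1) / 2
    line = np.arange(0, label_size)
    line = line - center
    line = line**2
    line = np.expand_dims(line, axis=0)
    # Squared distance of every pixel from the center.
    dist_map = line + line.transpose()
    label = np.zeros([label_size, label_size, 2]).astype(np.float32)
    # Targets: 1 within pos_thr of the center, 0 elsewhere.
    label[:, :, 0] = dist_map <= pos_thr**2
    # Weights: 1 for positives and for negatives beyond neg_thr; pixels in
    # between get weight 0 and do not contribute to the loss.
    label[:, :, 1] = (dist_map <= pos_thr**2) | (dist_map > neg_thr**2)
    return label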
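
The Gaussian alternative mentioned above could look roughly like the sketch below. It is not part of the repository: the function name `create_gaussian_label` and the `sigma` parameter are hypothetical, and dividing `sigma` by `upscale_factor` simply mirrors how the thresholds are rescaled in the hard-label version.

```python
import numpy as np


def create_gaussian_label(label_size, sigma, upscale_factor=4):
    """Soft label that decays smoothly with the distance from the center."""
    # Assumption: sigma is rescaled the same way as pos_thr and neg_thr.
    sigma = sigma / upscale_factor
    center = (label_size - 1) / 2
    line = (np.arange(label_size) - center) ** 2
    # Squared distance of every pixel from the center, via broadcasting.
    dist_sq = line[None, :] + line[:, None]
    return np.exp(-dist_sq / (2.0 * sigma ** 2)).astype(np.float32)


soft_label = create_gaussian_label(label_size=5, sigma=8)
print(soft_label.round(2))
```

Such a label has no neutral band, so every pixel would carry a target in (0, 1]. Binary cross-entropy accepts soft targets of this kind, but whether it trains better than the thresholded label is not something this sketch shows.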

Step2. Computing the correlation match between ref and search

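import torch.nn.functional as F


# Excerpt: a method of the network class in the repository.
def match_corr(self, embed_ref, embed_srch):
    b, c, h, w = embed_srch.shape
    # Fold the batch into the channel dimension and use one group per sample,
    # so each search embedding is correlated only with its own reference.
    match_map = F.conv2d(embed_srch.view(1, b * c, h, w),
                         embed_ref, groups=b)
    # Here we reorder the dimensions to get back the batch dimension.
    match_map = match_map.permute(1, 0, 2, 3)
    match_map = self.match_batchnorm(match_map)    # torch.nn.BatchNorm2d(1)
    if self.upscale:
        match_map = F.interpolate(match_map, self.upsc_size, mode='bilinear',
                                  align_corners=False)
    return match_map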
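
The grouped-convolution trick in `match_corr` can be checked in isolation. The sketch below is an illustration under assumed shapes (batch of 3, 8 channels, 6x6 reference and 22x22 search embeddings, all chosen arbitrarily); it compares the batched call against a plain loop that correlates each pair separately, and leaves out the batch norm and the upscaling.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, c = 3, 8
embed_ref = torch.randn(b, c, 6, 6)      # one reference embedding per sample
embed_srch = torch.randn(b, c, 22, 22)   # one search embedding per sample

# Batched version: fold the batch into channels, one group per sample.
_, _, h, w = embed_srch.shape
batched = F.conv2d(embed_srch.view(1, b * c, h, w), embed_ref, groups=b)
batched = batched.permute(1, 0, 2, 3)    # back to (b, 1, H_out, W_out)

# Reference version: correlate each (search, reference) pair on its own.
looped = torch.cat(
    [F.conv2d(embed_srch[i:i + 1], embed_ref[i:i + 1]) for i in range(b)],
    dim=0,
)

same = torch.allclose(batched, looped, atol=1e-3)
print(batched.shape, same)
```

Both paths give one score map per sample, with spatial size equal to the search size minus the reference size plus one; the batched form just avoids the Python loop.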

Step3. Computing the loss

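import torch.nn.functional as F


def BCELogit_Loss(score_map, labels):
    # Add a channel axis so the labels line up with the score map.
    labels = labels.unsqueeze(1)
    # Channel 0 holds the targets, channel 1 the per-pixel weights.
    loss = F.binary_cross_entropy_with_logits(score_map, labels[:, :, :, :, 0],
                                              weight=labels[:, :, :, :, 1],
                                              reduction='mean')
    return loss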
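
To show what the weight channel does, here is a small NumPy re-implementation of a weighted binary cross-entropy on logits. It is a sketch for illustration only: the helper name `weighted_bce_with_logits` and the toy values are made up, and it follows the usual numerically stable formula rather than the repository's code.

```python
import numpy as np


def weighted_bce_with_logits(logits, targets, weights):
    # Stable form of -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    per_pixel = (np.maximum(logits, 0) - logits * targets
                 + np.log1p(np.exp(-np.abs(logits))))
    # Weight each pixel, then average over all pixels, including weight-0 ones.
    return np.mean(weights * per_pixel)


targets = np.array([[1.0, 0.0, 0.0]])
weights = np.array([[1.0, 1.0, 0.0]])   # the last pixel is neutral

loss = weighted_bce_with_logits(np.array([[2.0, -1.0, 0.5]]), targets, weights)
# Changing the score of the neutral pixel leaves the loss untouched.
changed = weighted_bce_with_logits(np.array([[2.0, -1.0, 50.0]]), targets, weights)
print(loss, changed)
```

Because the mean is taken over every pixel, the neutral pixels still count in the denominator; they only drop out of the numerator.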