Script for Converting Pascal VOC Annotations to COCO Format (Object Detection)

妖狐艹你老母 2022-01-30 14:59

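1. Introduction

1.1 The COCO dataset

COCO stands for Common Objects in COntext. It is a dataset released by a Microsoft team for image recognition tasks. The images in MS COCO are divided into training, validation, and test sets. The images were collected by searching Flickr for 80 object categories and a variety of scene types, and Amazon's Mechanical Turk (AMT) was used to gather the annotation data. COCO currently provides three annotation types, all stored as JSON files: object instances, object keypoints, and image captions.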

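1.2 Basic structure of COCO data

All three annotation types share the basic types listed below: info, image, and license. The annotation type, by contrast, is polymorphic: its fields differ depending on the task.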

    {
        "info" : info,
        "images" : [image],
        "annotations" : [annotation],
        "licenses" : [license],
    }

    info{
        "year" : int,
        "version" : str,
        "description" : str,
        "contributor" : str,
        "url" : str,
        "date_created" : datetime,
    }

    image{
        "id" : int,
        "width" : int,
        "height" : int,
        "file_name" : str,
        "license" : int,
        "flickr_url" : str,
        "coco_url" : str,
        "date_captured" : datetime,
    }

    license{
        "id" : int,
        "name" : str,
        "url" : str,
    }

Examples of the types other than annotation follow.
1) info: an example instance of the info type:

    "info":{
        "description":"This is stable 1.0 version of the 2014 MS COCO dataset.",
        "url":"http:\/\/mscoco.org",
        "version":"1.0",
        "year":2014,
        "contributor":"Microsoft COCO group",
        "date_created":"2015-01-27 09:11:52.357475"
    }

2) images: images is an array of image instances. An example image instance:

    {
        "license":3,
        "file_name":"COCO_val2014_000000391895.jpg",
        "coco_url":"http:\/\/mscoco.org\/images\/391895",
        "height":360,
        "width":640,
        "date_captured":"2013-11-14 11:18:45",
        "flickr_url":"http:\/\/farm9.staticflickr.com\/8186\/8119368305_4e622c8349_z.jpg",
        "id":391895
    }

3) licenses: licenses is an array of license instances. An example license instance:

    {
        "url":"http:\/\/creativecommons.org\/licenses\/by-nc-sa\/2.0\/",
        "id":1,
        "name":"Attribution-NonCommercial-ShareAlike License"
    }
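
As a rough illustration of the shared structure described in this section, the sketch below assembles the info, licenses, and images sections as plain Python dictionaries and serializes them with the standard json module. The helper name make_coco_skeleton and the sample values are made up for this example; they are not part of any COCO tool.

```python
import json
from datetime import datetime


def make_coco_skeleton(description, year=2014, version="1.0"):
    """Return a dict holding the top-level sections shared by COCO files."""
    return {
        "info": {
            "year": year,
            "version": version,
            "description": description,
            "contributor": "",
            "url": "",
            # JSON has no datetime type, so the timestamp is stored as a string.
            "date_created": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        },
        "licenses": [],
        "images": [],
        "annotations": [],
    }


coco = make_coco_skeleton("Toy dataset")
coco["licenses"].append({"id": 1, "name": "Example License", "url": ""})
coco["images"].append({
    "id": 391895,
    "width": 640,
    "height": 360,
    "file_name": "COCO_val2014_000000391895.jpg",
    "license": 1,  # refers to the id of an entry in "licenses"
    "flickr_url": "",
    "coco_url": "",
    "date_captured": "2013-11-14 11:18:45",
})

text = json.dumps(coco, indent=2)
print(text)
```

Each image refers to its license by id rather than embedding it, which is why licenses is kept as a separate top-level array.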

1.3 Annotation format for the Object Instance type

1) Overall JSON file layout
An Object Instance file consists of the following sections, in order from start to end:

    {
        "info": info,
        "licenses": [license],
        "images": [image],
        "annotations": [annotation],
        "categories": [category]
    }

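If you open the two Object Instance files, you will find that, although they are large, they consist of exactly these five sections in this order from start to end. The info, licenses, and images structures were covered in the previous section; they are the same in every JSON file, and their definitions are shared. What is not shared are the annotation and category structures, which differ between the different types of JSON file.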

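PS: the images, annotations, and categories arrays generally do not have the same number of elements. images has one entry per image, annotations has one entry per annotated object (so an image can have zero, one, or many), and categories has one entry per category.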
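
One quick way to inspect how the three arrays relate is to group the annotations by image_id and count them. The sketch below does this on a tiny hand-written dictionary; the data and the helper name annotations_by_image are invented for the example.

```python
from collections import defaultdict

# A tiny hand-written COCO-style dict: 2 images, 3 annotations, 2 categories.
coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "000002.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [50, 60, 70, 80]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 10, 10]},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    ],
}


def annotations_by_image(coco_dict):
    """Map each image id to the list of annotations that reference it."""
    grouped = defaultdict(list)
    for ann in coco_dict["annotations"]:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


grouped = annotations_by_image(coco)
counts = {image_id: len(anns) for image_id, anns in grouped.items()}
print(len(coco["images"]), len(coco["annotations"]), len(coco["categories"]))
print(counts)
```

For a real file, the same function can be applied to the dictionary returned by json.load.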

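2) annotations field
The annotations field is an array of annotation instances. The annotation type itself contains a series of fields, such as the object's category id and segmentation mask. The segmentation format depends on whether the instance is a single object (iscrowd=0, in which case polygons are used) or a group of objects (iscrowd=1, in which case RLE is used), as shown below: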

    annotation{
        "id": int,
        "image_id": int,
        "category_id": int,
        "segmentation": RLE or [polygon],
        "area": float,
        "bbox": [x,y,width,height],
        "iscrowd": 0 or 1,
    }

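Note that a single object (iscrowd=0) may need several polygons, for example when the object is partly occluded in the image. When iscrowd=1 (the annotation covers a group of objects, such as a crowd of people), the segmentation uses the RLE format.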
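
The two segmentation forms can be told apart by their JSON type: polygons are stored as a list of flat coordinate lists, while RLE is stored as an object with counts and size entries. The sketch below shows one way to check this; the function name segmentation_kind and both sample annotations are invented for illustration.

```python
def segmentation_kind(annotation):
    """Classify an annotation's segmentation as 'polygon' or 'rle'."""
    seg = annotation["segmentation"]
    if isinstance(seg, list):
        # Polygon form: a list of flat [x1, y1, x2, y2, ...] coordinate lists.
        return "polygon"
    if isinstance(seg, dict) and "counts" in seg and "size" in seg:
        # RLE form: run lengths in "counts" plus the mask "size" [height, width].
        return "rle"
    raise ValueError("unrecognized segmentation: %r" % (seg,))


# One object split into two visible parts, hence two polygons.
single = {
    "iscrowd": 0,
    "segmentation": [[10, 10, 10, 50, 40, 50, 40, 10],
                     [60, 10, 60, 50, 90, 50, 90, 10]],
}
# A group of objects stored as run-length encoding for a 3x4 mask.
crowd = {
    "iscrowd": 1,
    "segmentation": {"counts": [0, 3, 5, 4], "size": [3, 4]},
}

print(segmentation_kind(single), len(single["segmentation"]))
print(segmentation_kind(crowd))
```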

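In addition, every object (whether iscrowd=0 or iscrowd=1) has a rectangular bounding box, bbox. It is given as an array holding the coordinates of the box's top-left corner followed by the box's width and height; the first element is the x coordinate of the top-left corner.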
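
Pascal VOC stores a box as two corners (xmin, ymin, xmax, ymax), so converting to the COCO layout means turning the second corner into a width and a height. The sketch below is a minimal version of that arithmetic. The function names are hypothetical, and the one_based flag is an assumption that mirrors what the conversion script in section 2 does (it subtracts 1 from xmin and ymin).

```python
def voc_box_to_coco(xmin, ymin, xmax, ymax, one_based=True):
    """Convert VOC corner coordinates to a COCO [x, y, width, height] box."""
    if one_based:
        # Shift the top-left corner from 1-based to 0-based pixel indices.
        xmin -= 1
        ymin -= 1
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box back to two corners."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


bbox = voc_box_to_coco(48, 240, 195, 371)
print(bbox)                        # [47, 239, 148, 132]
print(coco_bbox_to_corners(bbox))  # [47, 239, 195, 371]
```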

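Here, area is the area of the annotated region. In the COCO files it is the area of the encoded segmentation mask in pixels, not the area of the box; the conversion script in section 2 has no masks and therefore uses the box area instead.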
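
For a polygon segmentation, the enclosed area can be approximated with the shoelace formula. The sketch below is only an illustration: the function name polygon_area is made up, and values computed this way may differ slightly from the area stored in a COCO file, which is derived from the encoded mask.

```python
def polygon_area(flat_xy):
    """Area of a polygon given as a flat [x1, y1, x2, y2, ...] list."""
    xs = flat_xy[0::2]
    ys = flat_xy[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    # The sign depends on the vertex order, so take the absolute value.
    return abs(twice_area) / 2.0


# A rectangle written in the same corner order the conversion script uses:
# (xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin).
rect = [10, 20, 10, 60, 40, 60, 40, 20]
print(polygon_area(rect))                # 1200.0 (30 wide, 40 high)
print(polygon_area([0, 0, 4, 0, 0, 3]))  # 6.0 (right triangle)
```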

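3) categories field
Each annotation stores the id of the category it belongs to in category_id. The top-level categories field then gives, for each id, the category's name and the name of its supercategory.
categories is an array of category instances; the category structure is described as follows: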

    {
        "id": int,
        "name": str,
        "supercategory": str,
    }

Two category instances taken from instances_val2017.json are shown below:

    {
        "supercategory": "person",
        "id": 1,
        "name": "person"
    },
    {
        "supercategory": "vehicle",
        "id": 2,
        "name": "bicycle"
    },
    ......
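
In practice the categories array is usually turned into lookup tables so that a category_id can be resolved to a name and back. The sketch below builds such tables from the two entries shown above; the variable names are chosen only for this example.

```python
from collections import defaultdict

categories = [
    {"supercategory": "person", "id": 1, "name": "person"},
    {"supercategory": "vehicle", "id": 2, "name": "bicycle"},
]

# id -> name and name -> id lookups
id_to_name = {cat["id"]: cat["name"] for cat in categories}
name_to_id = {cat["name"]: cat["id"] for cat in categories}

# supercategory -> list of category names
by_supercategory = defaultdict(list)
for cat in categories:
    by_supercategory[cat["supercategory"]].append(cat["name"])

print(id_to_name[2])
print(name_to_id["person"])
print(dict(by_supercategory))
```

Looking ids up through a table like this, rather than using them as list indices, avoids assuming that the ids are contiguous.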

2. Conversion script
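
The script in this section reads Pascal VOC XML annotation files. As a minimal sketch of the input it expects, the example below parses a hand-written VOC-style annotation with xml.etree.ElementTree and pulls out the fields the conversion uses (filename, size, and each object's name and bndbox). The sample XML and the helper name parse_voc are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC annotation layout.
VOC_XML = """<annotation>
    <filename>000001.jpg</filename>
    <size><width>353</width><height>500</height><depth>3</depth></size>
    <object>
        <name>dog</name>
        <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
    </object>
    <object>
        <name>person</name>
        <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
    </object>
</annotation>"""


def parse_voc(xml_text):
    """Extract file name, image size, and labelled boxes from VOC XML text."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    record = {
        "file_name": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        corners = [int(box.findtext(tag))
                   for tag in ("xmin", "ymin", "xmax", "ymax")]
        record["objects"].append((obj.findtext("name"), corners))
    return record


record = parse_voc(VOC_XML)
print(record["file_name"], record["width"], record["height"])
for name, corners in record["objects"]:
    print(name, corners)
```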

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import json
    import os
    import random
    import shutil
    import xml.etree.ElementTree as ET

    # Starting value for bounding-box (annotation) IDs
    START_BOUNDING_BOX_ID = 1
    # The category table does not have to be created in advance; it is built
    # and updated from the category names found in the annotation files.
    PRE_DEFINE_CATEGORIES = {}
    # If necessary, pre-define category and its id
    # PRE_DEFINE_CATEGORIES = {"aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4,
    #                          "bottle": 5, "bus": 6, "car": 7, "cat": 8, "chair": 9,
    #                          "cow": 10, "diningtable": 11, "dog": 12, "horse": 13,
    #                          "motorbike": 14, "person": 15, "pottedplant": 16,
    #                          "sheep": 17, "sofa": 18, "train": 19, "tvmonitor": 20}


    def get(root, name):
        """Return all child elements of root with the given tag (may be empty)."""
        return root.findall(name)


    def get_and_check(root, name, length):
        """Return the children with the given tag, checking how many there are."""
        found = root.findall(name)
        if len(found) == 0:
            raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
        if length > 0 and len(found) != length:
            raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'
                                      % (name, length, len(found)))
        if length == 1:
            found = found[0]
        return found


    def get_filename_as_int(filename):
        """Use the numeric file stem (e.g. 000012.jpg -> 12) as the unique image ID."""
        stem = os.path.splitext(filename)[0]
        try:
            return int(stem)
        except ValueError:
            raise NotImplementedError('Filename %s is supposed to be an integer.' % stem)


    def convert(xml_list, xml_dir, json_file):
        """Convert Pascal VOC XML files into one COCO JSON file.

        :param xml_list: names of the XML files to convert
        :param xml_dir: directory that holds the XML files
        :param json_file: path of the JSON file to write
        :return: None
        """
        # Basic annotation structure
        json_dict = {"images": [],
                     "type": "instances",
                     "annotations": [],
                     "categories": []}
        # This dict is shared across calls, which keeps category IDs
        # consistent between the validation and training files.
        categories = PRE_DEFINE_CATEGORIES
        bnd_id = START_BOUNDING_BOX_ID
        for line in xml_list:
            line = line.strip()
            print("Processing {}".format(line))
            # Parse the XML file
            xml_f = os.path.join(xml_dir, line)
            tree = ET.parse(xml_f)
            root = tree.getroot()
            path = get(root, 'path')
            # Get the image file name
            if len(path) == 1:
                filename = os.path.basename(path[0].text)
            elif len(path) == 0:
                filename = get_and_check(root, 'filename', 1).text
            else:
                raise NotImplementedError('%d paths found in %s' % (len(path), line))
            # The filename must be a number
            image_id = get_filename_as_int(filename)  # image ID
            size = get_and_check(root, 'size', 1)
            # Basic image information
            width = int(get_and_check(size, 'width', 1).text)
            height = int(get_and_check(size, 'height', 1).text)
            image = {'file_name': filename,
                     'height': height,
                     'width': width,
                     'id': image_id}
            json_dict['images'].append(image)
            # Currently we do not support segmentation
            # segmented = get_and_check(root, 'segmented', 1).text
            # assert segmented == '0'
            # Process every annotated bounding box
            for obj in get(root, 'object'):
                # Get the category name of the box
                category = get_and_check(obj, 'name', 1).text
                # Update the category ID table. IDs start at 1, so a new ID
                # cannot collide with a pre-defined table numbered 1..N.
                if category not in categories:
                    new_id = len(categories) + 1
                    categories[category] = new_id
                category_id = categories[category]
                bndbox = get_and_check(obj, 'bndbox', 1)
                # VOC pixel coordinates are 1-based; shift the top-left corner
                xmin = int(get_and_check(bndbox, 'xmin', 1).text) - 1
                ymin = int(get_and_check(bndbox, 'ymin', 1).text) - 1
                xmax = int(get_and_check(bndbox, 'xmax', 1).text)
                ymax = int(get_and_check(bndbox, 'ymax', 1).text)
                assert xmax > xmin
                assert ymax > ymin
                o_width = abs(xmax - xmin)
                o_height = abs(ymax - ymin)
                annotation = dict()
                annotation['area'] = o_width * o_height
                annotation['iscrowd'] = 0
                annotation['image_id'] = image_id
                annotation['bbox'] = [xmin, ymin, o_width, o_height]
                annotation['category_id'] = category_id
                annotation['id'] = bnd_id
                annotation['ignore'] = 0
                # Segmentation is the box itself; points run counter-clockwise
                annotation['segmentation'] = [[xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin]]
                json_dict['annotations'].append(annotation)
                bnd_id = bnd_id + 1
        # Write the category ID table
        for cate, cid in categories.items():
            cat = {'supercategory': 'none', 'id': cid, 'name': cate}
            json_dict['categories'].append(cat)
        # Export to JSON
        with open(json_file, 'w') as json_fp:
            json_fp.write(json.dumps(json_dict))


    if __name__ == '__main__':
        root_path = os.getcwd()
        xml_dir = os.path.join(root_path, 'Annotations')
        # Keep only XML files; listdir may return other entries as well
        xml_labels = [f for f in os.listdir(xml_dir) if f.endswith('.xml')]
        random.shuffle(xml_labels)
        # Hold out 10% of the files as validation data, the rest is train data
        split_point = int(len(xml_labels) / 10)
        splits = [('val2014', xml_labels[0:split_point]),
                  ('train2014', xml_labels[split_point:])]
        for split_name, xml_list in splits:
            json_file = './instances_%s.json' % split_name
            convert(xml_list, xml_dir, json_file)
            # Copy the matching images, creating the target folder if needed
            image_dir = os.path.join(root_path, split_name)
            os.makedirs(image_dir, exist_ok=True)
            for xml_file in xml_list:
                img_name = xml_file[:-4] + '.jpg'
                shutil.copy(os.path.join(root_path, 'JPEGImages', img_name),
                            os.path.join(image_dir, img_name))
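
After a conversion like this, it can be worth checking that the output is internally consistent before training on it. The sketch below is one possible sanity check, not part of the original script: the function name check_coco_dict and the sample data are hypothetical. It verifies that every annotation points at an existing image and category, that annotation ids are unique, and that each box lies inside its image.

```python
def check_coco_dict(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    seen_ann_ids = set()
    for ann in coco["annotations"]:
        if ann["id"] in seen_ann_ids:
            problems.append("duplicate annotation id %d" % ann["id"])
        seen_ann_ids.add(ann["id"])
        if ann["category_id"] not in category_ids:
            problems.append("annotation %d: unknown category_id %d"
                            % (ann["id"], ann["category_id"]))
        image = images.get(ann["image_id"])
        if image is None:
            problems.append("annotation %d: unknown image_id %d"
                            % (ann["id"], ann["image_id"]))
            continue
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append("annotation %d: bbox outside image" % ann["id"])
    return problems


sample = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 100, "height": 80}],
    "categories": [{"id": 1, "name": "dog", "supercategory": "none"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 10, 20, 20]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [90, 10, 20, 20]},
        {"id": 3, "image_id": 7, "category_id": 2, "bbox": [0, 0, 5, 5]},
    ],
}

problems = check_coco_dict(sample)
for problem in problems:
    print(problem)
```

To check a generated file, load it with json.load and pass the resulting dictionary to the function; an empty list means none of these checks failed.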
