Converting a labelme dataset to COCO format

The original labelme data directory is laid out as follows:

|-- images

|     |---  1.jpg

|     |---  1.json

|     |---  2.jpg

|     |---  2.json

|     |---  .......

|-- labelme2coco.py

|-- labels.txt

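The images directory holds your dataset's original images together with the JSON files produced by labelme.
The source of labelme2coco.py is given at the end.
labels.txt lists your class labels. Suppose there are two classes (lm and ls); labels.txt then contains: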

__ignore__

_background_

lm

ls
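To make the mapping from labels.txt to COCO category ids concrete, here is a minimal sketch of the numbering rule the script applies. The function name load_labels is hypothetical, and this sketch additionally skips blank lines, which the script itself does not do.

```python
def load_labels(lines):
    """Map class names to category ids: the first entry gets -1, the next 0, and so on."""
    class_name_to_id = {}
    # Assumption of this sketch: blank lines are ignored.
    names = [line.strip() for line in lines if line.strip()]
    for i, class_name in enumerate(names):
        class_id = i - 1  # numbering starts at -1
        if class_id == -1:
            # The first entry must be the __ignore__ placeholder; it gets no category.
            if class_name != "__ignore__":
                raise ValueError("the first label must be __ignore__")
            continue
        class_name_to_id[class_name] = class_id
    return class_name_to_id


labels = load_labels(["__ignore__\n", "_background_\n", "lm\n", "ls\n"])
print(labels)  # {'_background_': 0, 'lm': 1, 'ls': 2}
```

So with the labels.txt above, lm ends up as category 1 and ls as category 2.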

In the directory that contains labelme2coco.py, open a command line and run:

 python labelme2coco.py --input_dir images --output_dir coco --labels labels.txt

--input_dir: the images folder
--output_dir: your output folder
--labels: your labels.txt file
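Before running the command, it can help to check that every image in the input folder has a matching JSON file. The following is a small sketch of such a check; find_unpaired is a hypothetical helper and is not part of labelme2coco.py.

```python
import os.path as osp


def find_unpaired(filenames):
    """Return (images without a JSON, JSONs without an image) as sorted base names."""
    image_bases = {osp.splitext(n)[0] for n in filenames if n.lower().endswith(".jpg")}
    json_bases = {osp.splitext(n)[0] for n in filenames if n.lower().endswith(".json")}
    return sorted(image_bases - json_bases), sorted(json_bases - image_bases)


missing_json, missing_image = find_unpaired(["1.jpg", "1.json", "2.jpg", "3.json"])
print(missing_json)   # ['2']
print(missing_image)  # ['3']
```

On a real dataset you would pass os.listdir("images") instead of the hand-written list.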

The result of the run is shown in the figure below:

The generated COCO dataset has the following directory structure:

|-- annotations

|     |---  instances_train2017.json

|     |---  instances_val2017.json

|-- train2017

|     |---  2.jpg

|     |---  5.jpg

|     |---  .......

|-- val2017

|     |---  1.jpg

|     |---  3.jpg

|     |---  .......

|-- visualization

|     |---  1.jpg

|     |---  2.jpg

|     |---  .......

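For training you only need the first three folders: annotations, train2017 and val2017. The visualization folder is useful for checking how your annotations look. The source code is commented throughout, and most of it comes from labelme's official source: labelme/labelme2coco.py · GitHub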
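A quick way to sanity-check a generated annotation file is to count images and annotations per class. The sketch below assumes only the COCO fields the script writes (images, categories, annotations); summarize_coco is a hypothetical helper, and the sample dict is a hand-written stand-in for instances_train2017.json.

```python
import collections


def summarize_coco(data):
    """Return (number of images, {category name: annotation count})."""
    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    per_class = collections.Counter(
        id_to_name[a["category_id"]] for a in data["annotations"]
    )
    return len(data["images"]), dict(per_class)


# Tiny hand-written stand-in for a generated annotation file.
sample = {
    "images": [{"id": 0, "file_name": "2.jpg", "height": 480, "width": 640}],
    "categories": [
        {"id": 0, "name": "_background_"},
        {"id": 1, "name": "lm"},
        {"id": 2, "name": "ls"},
    ],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 1, "bbox": [10, 20, 30, 40], "iscrowd": 0},
        {"id": 1, "image_id": 0, "category_id": 1, "bbox": [50, 60, 20, 20], "iscrowd": 0},
        {"id": 2, "image_id": 0, "category_id": 2, "bbox": [5, 5, 10, 10], "iscrowd": 0},
    ],
}
print(summarize_coco(sample))  # (1, {'lm': 2, 'ls': 1})
```

For a real file, load it with json.load and pass the resulting dict to the function.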

To change the ratio between the training and validation sets, search for test_size in the labelme2coco.py source.
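To illustrate what test_size controls, here is a standard-library stand-in for the random split. It is a sketch, not the scikit-learn implementation the script calls; split_files and its seed parameter are hypothetical, and rounding the validation count up is an assumption that I believe matches scikit-learn but is worth verifying.

```python
import math
import random


def split_files(files, test_size=0.3, seed=None):
    """Shuffle the files and return (train, val), with val holding a test_size fraction."""
    files = sorted(files)           # fixed starting order so a given seed is reproducible
    rng = random.Random(seed)
    rng.shuffle(files)
    n_val = math.ceil(len(files) * test_size)  # assumption: round the validation count up
    return files[n_val:], files[:n_val]


names = [f"{i}.json" for i in range(1, 11)]
train, val = split_files(names, test_size=0.3, seed=0)
print(len(train), len(val))  # 7 3
```

With 10 label files, test_size=0.3 sends 3 of them to val2017 and 7 to train2017; test_size=0.2 would give 2 and 8.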

labelme2coco.py source:

# Run from the command line:  python labelme2coco.py --input_dir images --output_dir coco --labels labels.txt

# The output directory must not exist yet; the script exits if it already does

import argparse

import collections

import datetime

import glob

import json

import os

import os.path as osp

import sys

import uuid

import imgviz

import numpy as np

import labelme

from sklearn.model_selection import train_test_split

try:

    import pycocotools.mask

except ImportError:

    print("Please install pycocotools:\n\n    pip install pycocotools\n")

    sys.exit(1)

def to_coco(args,label_files,train):

    # Build the top-level COCO annotation dict, data

    now = datetime.datetime.now()

    data = dict(

        info=dict(

            description=None,

            url=None,

            version=None,

            year=now.year,

            contributor=None,

            date_created=now.strftime("%Y-%m-%d %H:%M:%S.%f"),

        ),

        licenses=[dict(url=None, id=0, name=None,)],

        images=[

            # license, url, file_name, height, width, date_captured, id

        ],

        type="instances",

        annotations=[

            # segmentation, area, iscrowd, image_id, bbox, category_id, id

        ],

        categories=[

            # supercategory, id, name

        ],

    )

    # Build a {class name: id} dict and record each class in data["categories"].

    class_name_to_id = {}

    for i, line in enumerate(open(args.labels).readlines()):

        class_id = i - 1  # starts with -1

        class_name = line.strip()   # strip() removes leading and trailing whitespace, including the newline

        if class_id == -1:

            assert class_name == "__ignore__"   # the following lines get _background_: 0, class1: 1, ...

            continue

        class_name_to_id[class_name] = class_id

        data["categories"].append(

            dict(supercategory=None, id=class_id, name=class_name,)

        )

    if train:

        out_ann_file = osp.join(args.output_dir, "annotations","instances_train2017.json")

    else:

        out_ann_file = osp.join(args.output_dir, "annotations","instances_val2017.json")

    for image_id, filename in enumerate(label_files):

        label_file = labelme.LabelFile(filename=filename)

        base = osp.splitext(osp.basename(filename))[0]      # file name without its extension

        if train:

            out_img_file = osp.join(args.output_dir, "train2017", base + ".jpg")

        else:

            out_img_file = osp.join(args.output_dir, "val2017", base + ".jpg")

        print("| ",out_img_file)

        # ************************** image handling: start *******************************************

        # Save the image that belongs to this label file into the matching folder: train2017/ for the training set, val2017/ for the validation set

        img = labelme.utils.img_data_to_arr(label_file.imageData)   # the label file carries the image data; decode it into an array

        imgviz.io.imsave(out_img_file, img)     # write the image to the output path

        # ************************** image handling: end *******************************************

        # ************************** annotation handling: start *******************************************

        data["images"].append(

            dict(

                license=0,

                url=None,

                file_name=base+".jpg",              # store only the image file name

                # file_name=osp.relpath(out_img_file, osp.dirname(out_ann_file)),  # alternative: path to the image relative to the annotation file's directory

                ##   out_img_file : "/coco/train2017/1.jpg"

                ##   out_ann_file : "/coco/annotations/instances_train2017.json"

                ##   osp.dirname(out_ann_file) : "/coco/annotations"

                ##   file_name : ..\train2017\1.jpg   (path from out_ann_file's directory to out_img_file)

                height=img.shape[0],

                width=img.shape[1],

                date_captured=None,

                id=image_id,

            )

        )

        masks = {}  # for area

        segmentations = collections.defaultdict(list)  # for segmentation

        for shape in label_file.shapes:

            points = shape["points"]

            label = shape["label"]

            group_id = shape.get("group_id")

            shape_type = shape.get("shape_type", "polygon")

            mask = labelme.utils.shape_to_mask(

                img.shape[:2], points, shape_type

            )

            if group_id is None:

                group_id = uuid.uuid1()

            instance = (label, group_id)

            if instance in masks:

                masks[instance] = masks[instance] | mask

            else:

                masks[instance] = mask

            if shape_type == "rectangle":

                (x1, y1), (x2, y2) = points

                x1, x2 = sorted([x1, x2])

                y1, y2 = sorted([y1, y2])

                points = [x1, y1, x2, y1, x2, y2, x1, y2]

            else:

                points = np.asarray(points).flatten().tolist()

            segmentations[instance].append(points)

        segmentations = dict(segmentations)

        for instance, mask in masks.items():

            cls_name, group_id = instance

            if cls_name not in class_name_to_id:

                continue

            cls_id = class_name_to_id[cls_name]

            mask = np.asfortranarray(mask.astype(np.uint8))

            mask = pycocotools.mask.encode(mask)

            area = float(pycocotools.mask.area(mask))

            bbox = pycocotools.mask.toBbox(mask).flatten().tolist()

            data["annotations"].append(

                dict(

                    id=len(data["annotations"]),

                    image_id=image_id,

                    category_id=cls_id,

                    segmentation=segmentations[instance],

                    area=area,

                    bbox=bbox,

                    iscrowd=0,

                )

            )

        # ************************** annotation handling: end *******************************************

        # ************************** visualization: start *******************************************

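        # Skip the visualization when no shape in this image has a label listed in labels.txt (zip(*[]) below would fail to unpack)

        if not args.noviz and any(cnm in class_name_to_id for cnm, gid in masks):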

            labels, captions, masks = zip(

                *[

                    (class_name_to_id[cnm], cnm, msk)

                    for (cnm, gid), msk in masks.items()

                    if cnm in class_name_to_id

                ]

            )

            viz = imgviz.instances2rgb(

                image=img,

                labels=labels,

                masks=masks,

                captions=captions,

                font_size=15,

                line_width=2,

            )

            out_viz_file = osp.join(

                args.output_dir, "visualization", base + ".jpg"

            )

            imgviz.io.imsave(out_viz_file, viz)

        # ************************** visualization: end *******************************************

    with open(out_ann_file, "w") as f:  # after all label files are merged into data, write the combined annotation file

        json.dump(data, f)

# Main program

def main():

    parser = argparse.ArgumentParser(

        formatter_class=argparse.ArgumentDefaultsHelpFormatter

    )

    parser.add_argument("--input_dir", help="input annotated directory")

    parser.add_argument("--output_dir", help="output dataset directory")

    parser.add_argument("--labels", help="labels file", required=True)

    parser.add_argument("--noviz", help="no visualization", action="store_true")

    args = parser.parse_args()

    if osp.exists(args.output_dir):

        print("Output directory already exists:", args.output_dir)

        sys.exit(1)

    os.makedirs(args.output_dir)

    print("| Creating dataset dir:", args.output_dir)

    if not args.noviz:

        os.makedirs(osp.join(args.output_dir, "visualization"))

    # Create the output folders

    if not os.path.exists(osp.join(args.output_dir, "annotations")):

        os.makedirs(osp.join(args.output_dir, "annotations"))

    if not os.path.exists(osp.join(args.output_dir, "train2017")):

        os.makedirs(osp.join(args.output_dir, "train2017"))

    if not os.path.exists(osp.join(args.output_dir, "val2017")):

        os.makedirs(osp.join(args.output_dir, "val2017"))

    # List all .jpg files in the input directory (sorted for a stable order)

    feature_files = sorted(glob.glob(osp.join(args.input_dir, "*.jpg")))

    print('| Image number: ', len(feature_files))

    # List all .json label files in the input directory

    label_files = sorted(glob.glob(osp.join(args.input_dir, "*.json")))

    print('| Json number: ', len(label_files))

    # Only the label files are split: to_coco() obtains each image through its label file,

    # so splitting the image list as well is unnecessary and fails when the two counts differ.

    # test_size: fraction of the samples assigned to the validation set

    y_train, y_test = train_test_split(label_files, test_size=0.3)

    print("| Train number:", len(y_train), '\t Val number:', len(y_test))

    # Convert the training labels to COCO format and save the corresponding images to train2017/

    print("—"*50)

    print("| Train images:")

    to_coco(args,y_train,train=True)

    # Convert the validation labels to COCO format and save the corresponding images to val2017/

    print("—"*50)

    print("| Test images:")

    to_coco(args,y_test,train=False)

if __name__ == "__main__":

    print("—"*50)

    main()

    print("—"*50)
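The rectangle branch in to_coco() turns labelme's two corner points into a four-corner polygon. The sketch below isolates that step and derives a COCO-style [x, y, width, height] box from the polygon. Note that polygon_to_bbox is a hypothetical helper: the script itself computes bbox from the rasterized mask with pycocotools, so its values can differ slightly from this geometric version.

```python
def rectangle_to_polygon(points):
    """Convert two opposite corners into a flat polygon [x1, y1, x2, y1, x2, y2, x1, y2]."""
    (x1, y1), (x2, y2) = points
    x1, x2 = sorted([x1, x2])   # make sure x1 is the left edge
    y1, y2 = sorted([y1, y2])   # make sure y1 is the top edge
    return [x1, y1, x2, y1, x2, y2, x1, y2]


def polygon_to_bbox(flat_points):
    """Hypothetical helper: COCO-style [x, y, width, height] box around a flat polygon."""
    xs = flat_points[0::2]
    ys = flat_points[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


polygon = rectangle_to_polygon([[30, 40], [10, 20]])
print(polygon)                   # [10, 20, 30, 20, 30, 40, 10, 40]
print(polygon_to_bbox(polygon))  # [10, 20, 20, 20]
```

The corners can be given in either order, since both coordinates are sorted before the polygon is built.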
