Building your own dataset with TFRecord

Notes:

1. Input image format

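When reading images with io.imread(), set the as_grey argument (spelled as_gray in newer scikit-image releases) to match your input images. After converting the array to a byte string with image.tostring() (image.tobytes() in current NumPy), print the length of image_raw. Different image encodings are stored with different element types, and the length tells you which one you got.

The grayscale JPEGs I read came out as int64, so image_raw was 8 times the number of pixels, whereas RGB images consistently came out as uint8. Once you know the type, give decode_raw that same type when decoding. With any other type the decoded tensor has the wrong number of elements and the later reshape fails.

Work out the type from the length of image_raw and the original image size; the common ones are uint8, int32 and int64. Printing image.dtype is more direct, and it also distinguishes types of the same size (int64 versus float64, for example), which the byte length alone cannot.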
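
A small NumPy illustration of the note above: the byte length of image.tobytes() is the pixel count times the dtype's item size, and decoding with a different dtype gives a different element count. Here np.frombuffer stands in for tf.decode_raw, and infer_dtype is a hypothetical helper, not part of any library. The sketch also shows that the byte length alone cannot tell int64 from float64.

```python
import numpy as np

# Hypothetical helper: which candidate dtypes explain raw_len bytes for num_pixels values?
def infer_dtype(raw_len, num_pixels,
                candidates=(np.uint8, np.int32, np.int64, np.float64)):
    """raw_len is len(image_raw); num_pixels is height * width * channels."""
    if num_pixels == 0 or raw_len % num_pixels:
        return []
    itemsize = raw_len // num_pixels
    return [np.dtype(t) for t in candidates if np.dtype(t).itemsize == itemsize]

# A fake 2x3 single-channel "image" stored as int64, like the grayscale case above.
image = np.arange(6, dtype=np.int64).reshape(2, 3)
image_raw = image.tobytes()
print(len(image_raw))                           # 48 = 6 pixels * 8 bytes
print(infer_dtype(len(image_raw), image.size))  # both int64 and float64 fit

# Decoding with the matching dtype restores the pixels ...
restored = np.frombuffer(image_raw, dtype=np.int64).reshape(image.shape)
# ... while decoding as uint8 gives 48 values instead of 6, so reshaping
# back to (2, 3) would raise an error.
wrong = np.frombuffer(image_raw, dtype=np.uint8)
print(wrong.size)                               # 48
```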

2. Converting to TFRecords takes a while, so be patient.
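
One likely reason for the wait is volume: convert_tfrecord decodes every image and stores the raw pixels, which usually take much more space than the compressed JPEG files. A rough payload estimate shows how the dtype from note 1 multiplies the file size. estimate_payload_bytes is a hypothetical helper and the 1,000-image count is only an example; the estimate ignores the per-record framing and tf.train.Example overhead.

```python
def estimate_payload_bytes(num_images, height, width, channels, itemsize):
    """Rough lower bound on the pixel bytes stored in a raw-pixel TFRecord file.

    Each record holds height * width * channels * itemsize bytes of pixels;
    record framing and the tf.train.Example wrapper add a little on top.
    """
    return num_images * height * width * channels * itemsize

# 1,000 RGB uint8 images of 200x300 (the shape read_and_decode reshapes to):
print(estimate_payload_bytes(1000, 200, 300, 3, 1))  # 180000000 (~180 MB)
# The same images stored as single-channel 64-bit values, as in note 1:
print(estimate_payload_bytes(1000, 200, 300, 1, 8))  # 480000000 (~480 MB)
```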

# Written for TensorFlow 1.x (tf.python_io, queue runners, tf.Session).
import os
import tensorflow as tf
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt

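def get_data(file_path):
    """Collect image paths under file_path (one subfolder per class) and shuffle them.

    Each subfolder gets its index in sorted order as its integer label.
    """
    data = []
    label = []
    for class_id, dirs in enumerate(sorted(os.listdir(file_path))):
        temp_path = os.path.join(file_path, dirs)
        for dirss in os.listdir(temp_path):
            data.append(os.path.join(temp_path, dirss))
        num_img = len(os.listdir(temp_path))
        label = np.append(label, num_img * [class_id])
    # Stack paths and labels so they are shuffled together. This turns the
    # labels into strings, hence the int(float(...)) conversion below.
    temp = np.array([data, label])
    temp = temp.transpose()
    np.random.shuffle(temp)
    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(float(i)) for i in label_list]
    return image_list, label_list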

# Helpers that wrap a single value in a tf.train.Feature
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def convert_tfrecord(images, labels, save_filename):
    writer = tf.python_io.TFRecordWriter(save_filename)
    print("Transform start....")
    num_examples = len(labels)
    if len(images) != num_examples:
        raise ValueError('Images size %d does not match label size %d.' % (len(images), num_examples))
    for index in range(num_examples):
        try:
            # Set as_gray to match your images (see note 1); older scikit-image
            # releases call this argument as_grey.
            image = io.imread(images[index], as_gray=False)
            #image = tf.image.decode_jpeg(images[index])
            #print(image.shape, image.dtype)
            # Raw pixel bytes; ndarray.tostring() is the deprecated name of tobytes().
            image_raw = image.tobytes()
            #print(len(image_raw))
            example = tf.train.Example(features=tf.train.Features(feature={
                'label': _int64_feature(int(labels[index])),
                'image_raw': _bytes_feature(image_raw)
            }))
            writer.write(example.SerializeToString())
        except IOError as e:
            print('Could not read:', images[index])
            print('error: %s Skip it!\n' % e)
    writer.close()
    print("success!")

def read_and_decode(tfrecords_file, batch_size):
    reader = tf.TFRecordReader()
    filename_queue = tf.train.string_input_producer([tfrecords_file])
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'image_raw': tf.FixedLenFeature([], tf.string)
        }
    )
    capacity = 1000 + 3 * batch_size
    # The dtype must match the array written by convert_tfrecord
    # (uint8 for the RGB images here; see note 1).
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    label = tf.cast(features['label'], tf.int32)
    # Assumes every image is 200 pixels high, 300 wide, with 3 channels;
    # the reshape fails for any other size.
    image = tf.reshape(image, [200, 300, 3])
    image_batch, label_batch = tf.train.batch([image, label],
                                              batch_size=batch_size,
                                              capacity=capacity)
    # Center-crop (or pad) each image to 100x100, then scale pixels to [0, 1].
    image_batch = tf.image.resize_image_with_crop_or_pad(image_batch, 100, 100)
    image_batch = tf.cast(image_batch, tf.float32) * (1. / 255)
    return image_batch, label_batch

def plot_images(images, labels):
    '''Plot the images of one batch (up to 9, on a 3x3 grid).'''
    for i in range(min(len(images), 9)):
        plt.subplot(3, 3, i + 1)
        plt.axis('off')
        # plt.title((labels[i] - 1), fontsize = 14)
        plt.subplots_adjust(top=1)
        print(labels[i])
        print(images.shape)
        # print(images[i].shape)
        plt.imshow(images[i])
    plt.show()

def train():
    # Raw string so the backslash in the Windows path is not read as an escape.
    image, label = get_data(r'E:\syn_data')
    convert_tfrecord(image, label, '1.tfrecords')
    x_batch, y_batch = read_and_decode('1.tfrecords', batch_size=2)
    with tf.Session() as sess:
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            i = 0
            # Just plot a few batches (3 here).
            while not coord.should_stop() and i < 3:
                image, label = sess.run([x_batch, y_batch])
                plot_images(image, label)
                i += 1
        except tf.errors.OutOfRangeError:
            print('done!')
        finally:
            coord.request_stop()
        coord.join(threads)

#train()
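
For reference, each writer.write() call in convert_tfrecord appends one framed record to the file. Per TensorFlow's description of the TFRecord format, a record is a little-endian uint64 length, a masked CRC-32C of those 8 length bytes, the payload (here example.SerializeToString()), and a masked CRC-32C of the payload, so the framing adds 16 bytes per record. The pure-Python sketch below only illustrates that layout. The helper names (crc32c, masked_crc, frame_record, read_records) are hypothetical, and real code should keep using TFRecordWriter and the TensorFlow readers.

```python
import struct

def _make_crc32c_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table

_CRC32C_TABLE = _make_crc32c_table()

def crc32c(data):
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def masked_crc(data):
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def frame_record(payload):
    """Frame one record: length, crc(length), payload, crc(payload)."""
    length = struct.pack("<Q", len(payload))
    return (length + struct.pack("<I", masked_crc(length))
            + payload + struct.pack("<I", masked_crc(payload)))

def read_records(blob):
    """Yield payloads from a byte string of framed records, checking both CRCs."""
    pos = 0
    while pos < len(blob):
        length_bytes = blob[pos:pos + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (len_crc,) = struct.unpack("<I", blob[pos + 8:pos + 12])
        if len_crc != masked_crc(length_bytes):
            raise ValueError("corrupt length at offset %d" % pos)
        start = pos + 12
        payload = blob[start:start + length]
        (data_crc,) = struct.unpack("<I", blob[start + length:start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt payload at offset %d" % pos)
        yield payload
        pos = start + length + 4

blob = frame_record(b"hello") + frame_record(b"")
print(len(blob))                # 37 = (16 + 5) + (16 + 0)
print(list(read_records(blob)))
```

Pointing read_records at the bytes of a file written by convert_tfrecord should yield one serialized Example per image. The sketch loads the whole file into memory, so it only suits small files.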
