Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.

The tf.train.Example message (or protobuf) is a flexible message type that represents a {"string": value} mapping. It is designed for use with TensorFlow and is used throughout the higher-level APIs such as TFX.

Writing

tf.train.Example

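A tf.train.Example instance is built from a set of {"string": tf.train.Feature} mappings.

A tf.train.Feature holds exactly one of the following three list types. Other data formats can be described by combining one or more Features: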

tf.train.BytesList
tf.train.FloatList
tf.train.Int64List
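
As a minimal sketch of how the three list types are used, the helpers below wrap Python values in a tf.train.Feature and build one Example from them. The helper names (bytes_feature, float_feature, int64_feature) and the feature keys ("ids", "scores", "name") are hypothetical and chosen only for illustration.

```python
import tensorflow as tf


def bytes_feature(values):
    """Wrap a list of bytes objects in a Feature (hypothetical helper)."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))


def float_feature(values):
    """Wrap a list of floats in a Feature (hypothetical helper)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))


def int64_feature(values):
    """Wrap a list of ints in a Feature (hypothetical helper)."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


# One Example is a mapping from string keys to Features.
example = tf.train.Example(features=tf.train.Features(feature={
    "ids": int64_feature([1, 2, 3]),
    "scores": float_feature([0.5, 1.5]),
    "name": bytes_feature([b"sample-0"]),
}))

# Serialize to bytes and restore, to confirm that the mapping survives.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)
print(restored.features.feature["ids"].int64_list.value)
```

Strings have to be encoded to bytes before they go into a BytesList. Structured values such as images or tensors are usually serialized to bytes first as well.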

Template

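import tensorflow as tf

# feature0, feature1, feature2 and label are placeholders for one record's data:
# feature0 is a list of ints, feature1 a list of floats,
# feature2 a list of bytes objects, and label a single int.
with tf.io.TFRecordWriter("train.tfrecords", "GZIP") as writer:
    for i in range(200):  # Assume there are 200 records
        example_proto = tf.train.Example(
            features=tf.train.Features(
                feature={
                    # Each Feature sets the field that matches its list type.
                    'feature0':
                        tf.train.Feature(int64_list=tf.train.Int64List(value=feature0)),
                    'feature1':
                        tf.train.Feature(float_list=tf.train.FloatList(value=feature1)),
                    'feature2':
                        tf.train.Feature(bytes_list=tf.train.BytesList(value=feature2)),
                    'label':
                        tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
                }
            )
        )
        # Each record is stored as a serialized byte string.
        writer.write(example_proto.SerializeToString())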
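
The template above leaves feature0, feature1, feature2 and label undefined. The sketch below fills them with synthetic values so the loop can be run end to end, then counts the records that come back. The value shapes (4 ints, 3 floats, 1 byte string), the random data and the temporary file path are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

num_records = 200
rng = np.random.default_rng(0)
# Write into a temporary directory so the sketch leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")

with tf.io.TFRecordWriter(path, "GZIP") as writer:
    for i in range(num_records):
        # Synthetic stand-ins for one record's data (assumed shapes).
        feature0 = rng.integers(0, 10, size=4).tolist()   # list of ints
        feature1 = rng.random(3).tolist()                 # list of floats
        feature2 = [f"record-{i}".encode("utf-8")]        # list of bytes
        label = i % 2

        example_proto = tf.train.Example(features=tf.train.Features(feature={
            "feature0": tf.train.Feature(int64_list=tf.train.Int64List(value=feature0)),
            "feature1": tf.train.Feature(float_list=tf.train.FloatList(value=feature1)),
            "feature2": tf.train.Feature(bytes_list=tf.train.BytesList(value=feature2)),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example_proto.SerializeToString())

# The reader must use the same compression type that the writer used.
records_written = sum(
    1 for _ in tf.data.TFRecordDataset(path, compression_type="GZIP"))
print(records_written)
```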

Reading

tf.io.parse_single_example and tf.io.parse_example

One might see performance advantages by batching Example protos with parse_example instead of using parse_single_example directly.

Template

# with map_func using tf.io.parse_single_example
# len_feature0, len_feature1, len_feature2 and BATCH_SIZE are placeholders.
def map_func(example):
    # Create a dictionary describing the features.
    feature_description = {
        'feature0': tf.io.FixedLenFeature([len_feature0], tf.int64),
        'feature1': tf.io.FixedLenFeature([len_feature1], tf.float32),
        # feature2 is written as a BytesList, so it is parsed as tf.string.
        'feature2': tf.io.FixedLenFeature([len_feature2], tf.string),
        'label': tf.io.FixedLenFeature([1], tf.int64),
    }
    # Parse one serialized Example at a time.
    parsed_example = tf.io.parse_single_example(example, features=feature_description)
    feature0 = parsed_example["feature0"]
    feature1 = parsed_example["feature1"]
    feature2 = parsed_example["feature2"]
    label = parsed_example["label"]
    return (feature0, feature1, feature2), label

raw_dataset = tf.data.TFRecordDataset("train.tfrecords", "GZIP")
parsed_dataset = raw_dataset.map(map_func=map_func)
# Batch the parsed records.
parsed_dataset = parsed_dataset.batch(BATCH_SIZE)

The following code differs from the previous one in two ways: map_func uses tf.io.parse_example instead of tf.io.parse_single_example, and batch is called before map.

# with map_func using tf.io.parse_example
# len_feature0, len_feature1, len_feature2 and BATCH_SIZE are placeholders.
def map_func(example):
    # Create a dictionary describing the features.
    feature_description = {
        'feature0': tf.io.FixedLenFeature([len_feature0], tf.int64),
        'feature1': tf.io.FixedLenFeature([len_feature1], tf.float32),
        # feature2 is written as a BytesList, so it is parsed as tf.string.
        'feature2': tf.io.FixedLenFeature([len_feature2], tf.string),
        'label': tf.io.FixedLenFeature([1], tf.int64),
    }
    # Parse a whole batch of serialized Examples in one call.
    parsed_example = tf.io.parse_example(example, features=feature_description)
    # features can be modified here
    feature0 = parsed_example["feature0"]
    feature1 = parsed_example["feature1"]
    feature2 = parsed_example["feature2"]
    label = parsed_example["label"]
    return (feature0, feature1, feature2), label

raw_dataset = tf.data.TFRecordDataset(["./1.tfrecords", "./2.tfrecords"])
# Batch the serialized records first, then parse each batch.
raw_dataset = raw_dataset.batch(BATCH_SIZE)
parsed_dataset = raw_dataset.map(map_func=map_func)

The two figures above compare training performance when map_func uses parse_single_example and parse_example respectively. The latter (parse_example) performs noticeably better.
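
The two pipelines differ only in where parsing happens, so they should yield identical batches. The sketch below checks that on a small in-memory dataset. The feature keys ("x", "label"), the 8 synthetic records and the batch size of 4 are assumptions for illustration, and the sketch makes no claim about speed.

```python
import numpy as np
import tensorflow as tf

feature_description = {
    "x": tf.io.FixedLenFeature([3], tf.float32),
    "label": tf.io.FixedLenFeature([1], tf.int64),
}


def make_example(i):
    """Build one serialized Example from synthetic values (hypothetical helper)."""
    values = [float(i), float(i) + 0.5, float(i) + 1.0]
    return tf.train.Example(features=tf.train.Features(feature={
        "x": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[i % 2])),
    })).SerializeToString()


serialized = [make_example(i) for i in range(8)]
dataset = tf.data.Dataset.from_tensor_slices(serialized)

# Pipeline 1: parse record by record, then batch.
per_record = dataset.map(
    lambda s: tf.io.parse_single_example(s, feature_description)).batch(4)

# Pipeline 2: batch the serialized strings, then parse each batch at once.
per_batch = dataset.batch(4).map(
    lambda s: tf.io.parse_example(s, feature_description))

# Compare the two pipelines batch by batch.
same = all(
    np.array_equal(a["x"].numpy(), b["x"].numpy())
    and np.array_equal(a["label"].numpy(), b["label"].numpy())
    for a, b in zip(per_record, per_batch)
)
num_batches = sum(1 for _ in per_batch)
print(same, num_batches)
```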

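Reading and writing variable-length data: RaggedFeature

For variable-length data that has not been padded, writing works exactly as it does for fixed-length data, but reading requires tf.io.RaggedFeature instead of tf.io.FixedLenFeature.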

def map_func(example):
    # Create a dictionary describing the features.
    feature_description = {
        # Variable-length feature: parsed into a tf.RaggedTensor.
        'feature': tf.io.RaggedFeature(tf.float32),
        'label': tf.io.FixedLenFeature([1], tf.int64),
    }
    parsed_example = tf.io.parse_example(example, features=feature_description)
    # To keep the RaggedTensor instead, use: feature = parsed_example["feature"]
    # Pad or truncate each row to 100 values; None keeps the batch dimension.
    feature = parsed_example["feature"].to_tensor(shape=[None, 100])
    label = parsed_example["label"]
    return feature, label

raw_dataset = tf.data.TFRecordDataset("train_unpadding.tfrecords").batch(1000)
parsed_dataset = raw_dataset.map(map_func=map_func)
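
To see what RaggedFeature returns, the sketch below parses three in-memory Examples of different lengths and then pads them to a dense tensor. The sample values and the padded width of 4 are assumptions for illustration.

```python
import tensorflow as tf


def make_example(values, label):
    """Serialize one Example with a variable-length float feature (hypothetical helper)."""
    return tf.train.Example(features=tf.train.Features(feature={
        "feature": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    })).SerializeToString()


# Three records with 1, 3 and 2 values; no padding is applied when writing.
serialized = [
    make_example([1.0], 0),
    make_example([1.0, 2.0, 3.0], 1),
    make_example([4.0, 5.0], 0),
]

feature_description = {
    "feature": tf.io.RaggedFeature(tf.float32),
    "label": tf.io.FixedLenFeature([1], tf.int64),
}

parsed = tf.io.parse_example(tf.constant(serialized), feature_description)
ragged = parsed["feature"]  # tf.RaggedTensor with one row per record

# Pad every row with zeros to a fixed width; None keeps the batch dimension.
dense = ragged.to_tensor(default_value=0.0, shape=[None, 4])

print(ragged.row_lengths().numpy())
print(dense.numpy())
```

Passing a concrete number in the first position of shape would also fix the number of rows, so None is the safer choice for the batch dimension.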

The figure below compares file sizes with and without padding of the variable-length data, for both compressed and uncompressed files.
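
A comparison of this kind can be reproduced with a sketch like the one below, which writes the same synthetic sequences four ways and prints the resulting file sizes. The 500 random sequences, the maximum length of 100, the zero padding and the file names are all assumptions. Real data will compress differently, so the printed numbers are only indicative.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

MAX_LEN = 100  # assumed maximum sequence length
rng = np.random.default_rng(0)
# Synthetic variable-length sequences with lengths between 1 and MAX_LEN.
sequences = [
    rng.random(int(rng.integers(1, MAX_LEN + 1))).tolist() for _ in range(500)
]
# The padded variant fills every sequence with zeros up to MAX_LEN.
padded = [row + [0.0] * (MAX_LEN - len(row)) for row in sequences]


def write_records(path, rows, compression):
    """Write one float feature per record and return the file size in bytes."""
    with tf.io.TFRecordWriter(path, compression) as writer:
        for row in rows:
            example = tf.train.Example(features=tf.train.Features(feature={
                "feature": tf.train.Feature(
                    float_list=tf.train.FloatList(value=row)),
            }))
            writer.write(example.SerializeToString())
    return os.path.getsize(path)


out_dir = tempfile.mkdtemp()
sizes = {}
for name, rows in (("unpadded", sequences), ("padded", padded)):
    for compression in (None, "GZIP"):
        path = os.path.join(out_dir, f"{name}_{compression or 'raw'}.tfrecords")
        sizes[(name, compression)] = write_records(path, rows, compression)

for key, size in sizes.items():
    print(key, size)
```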
