TF Boys (TensorFlow Boys) Training Diary (2): Reading Data in TensorFlow

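The TensorFlow How-Tos cover the following topics:

1. Variables: creation, initialization, saving, loading, and sharing;

2. Visualizing learning in TensorFlow (Embedding Visualization was added as of r0.12);

3. Reading data;

4. Threading and queues;

5. Distributed TensorFlow;

6. Adding new Ops;

7. Custom data readers.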

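For various reasons I have only read the first five parts and have not gotten to the remaining two yet. Time is short and the workload is heavy, so this bus is leaving in a hurry; I will come back to those parts if I ever need them. While studying, I kept feeling that the official documentation is cluttered and highly redundant, especially part 3 on reading data, which is long-winded and cost me a lot of time. So I have organized part 3 below for the convenience of my passengers.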

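TensorFlow offers three ways to read data: 1) feeding, using a placeholder; 2) reading from files; 3) preloading the data into a constant or a variable, which is suitable only when the dataset is fairly small. There is not much to say about feeding: we have already seen it and it is easy to understand. So here we will briefly cover reading data from files.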

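In the official documentation, reading data from files is one very long description with link after link: you have barely read a few words past one link before the next one shows up.

It took me a long time to find my way, so I want to summarize this part and give my passengers a lift.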

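First, find out the format of the files you want to read and choose the matching file reader.

Then, point to the data folder and select the names of the files to read, using any of

["file0", "file1"]        # or

[("file%d" % i) for i in range(2)]    # or

tf.train.match_filenames_once

and pass them to tf.train.string_input_producer to create a filename queue. This function accepts shuffle=True to shuffle the queue and num_epochs=5 to go through the training data five times.

Finally, use the chosen file reader to read from the filename queue and decode the records, then feed the result into tf.train.shuffle_batch to build a batch queue that is passed on to the next layer.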
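To make the filename queue described above a bit more concrete, here is a minimal pure-Python sketch of the idea behind it: every filename is emitted once per epoch, and the order is reshuffled for each epoch. This is only an illustration, not how TensorFlow implements tf.train.string_input_producer (the real op fills a queue from background threads). The function name filename_queue and its seed parameter are made up for this example.

```python
import random


def filename_queue(filenames, num_epochs=1, shuffle=True, seed=None):
    """Yield every filename once per epoch, reshuffling at the start of each epoch."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch = list(filenames)  # copy so the caller's list is never mutated
        if shuffle:
            rng.shuffle(epoch)
        for name in epoch:
            yield name


filenames = ["file%d" % i for i in range(2)]
queue = list(filename_queue(filenames, num_epochs=5, shuffle=True, seed=0))
print(len(queue))  # 2 files x 5 epochs
```

Each consecutive pair in queue contains both files exactly once, so a reader that drains the queue sees the whole dataset five times, in a different order each time.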

1) If the files you want to read are text files such as CSV, the file reader and decoder to use are tf.TextLineReader and tf.decode_csv.
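As a rough picture of what decoding one CSV line involves, the sketch below mimics the role of the record_defaults argument of tf.decode_csv in plain Python: each default fixes the type of its column and is used when the field is empty. It is an analogy under that assumption, not the TensorFlow op itself, and decode_csv_line is a hypothetical helper name.

```python
import csv
import io


def decode_csv_line(line, record_defaults):
    """Split one CSV line and convert each field to the type of its default value."""
    fields = next(csv.reader(io.StringIO(line)))
    if len(fields) != len(record_defaults):
        raise ValueError("expected %d fields, got %d"
                         % (len(record_defaults), len(fields)))
    decoded = []
    for text, default in zip(fields, record_defaults):
        # An empty field falls back to the default; otherwise cast to its type.
        decoded.append(default if text == "" else type(default)(text))
    return decoded


# Three float features followed by an integer label; the third feature is missing.
row = decode_csv_line("5.1,3.5,,0", [0.0, 0.0, 1.0, 0])
print(row)
```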

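2) If the data you want to read is a binary file in .bin format, like cifar10, use tf.FixedLengthRecordReader and tf.decode_raw, the file reader and decoder for fixed-length records. My reference code is listed below. A detailed explanation will come later, so just get a rough idea for now: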

import os

import tensorflow as tf


class cifar10_data(object):
    """Reads one (label, image) example from the CIFAR-10 binary files."""

    def __init__(self, filename_queue):
        self.height = 32
        self.width = 32
        self.depth = 3
        # Each record is 1 label byte followed by 32 * 32 * 3 image bytes.
        self.label_bytes = 1
        self.image_bytes = self.height * self.width * self.depth
        self.record_bytes = self.label_bytes + self.image_bytes
        self.label, self.image = self.read_cifar10(filename_queue)

    def read_cifar10(self, filename_queue):
        reader = tf.FixedLengthRecordReader(record_bytes = self.record_bytes)
        key, value = reader.read(filename_queue)
        # Turn the raw string into a vector of uint8 values.
        record_bytes = tf.decode_raw(value, tf.uint8)
        # The first byte is the label.
        label = tf.cast(tf.slice(record_bytes, [0], [self.label_bytes]), tf.int32)
        # The remaining bytes are the image, stored as [depth, height, width].
        image_raw = tf.slice(record_bytes, [self.label_bytes], [self.image_bytes])
        image_raw = tf.reshape(image_raw, [self.depth, self.height, self.width])
        # Convert to [height, width, depth].
        image = tf.transpose(image_raw, (1,2,0))
        image = tf.cast(image, tf.float32)
        return label, image

def inputs(data_dir, batch_size, train = True, name = 'input'):
    """Builds shuffled (image, label) batches from the CIFAR-10 binary files."""
    with tf.name_scope(name):
        # Training reads data_batch_1.bin ... data_batch_5.bin;
        # evaluation reads test_batch.bin.
        if train:
            filenames = [os.path.join(data_dir,'data_batch_%d.bin' % ii)
                        for ii in range(1,6)]
        else:
            filenames = [os.path.join(data_dir,'test_batch.bin')]
        for f in filenames:
            if not tf.gfile.Exists(f):
                raise ValueError('Failed to find file: ' + f)
        # The rest of the pipeline is identical for both cases.
        filename_queue = tf.train.string_input_producer(filenames)
        read_input = cifar10_data(filename_queue)
        images = read_input.image
        # Scale each image to zero mean and unit variance. In TensorFlow 1.0
        # and later this op is named tf.image.per_image_standardization.
        images = tf.image.per_image_whitening(images)
        labels = read_input.label
        num_preprocess_threads = 16
        image, label = tf.train.shuffle_batch(
                                [images,labels], batch_size = batch_size,
                                num_threads = num_preprocess_threads,
                                min_after_dequeue = 20000, capacity = 20192)
        # label comes out as [batch_size, 1]; flatten it to [batch_size].
        return image, tf.reshape(label, [batch_size])
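To check your understanding of the record layout that cifar10_data assumes (1 label byte, then 32 * 32 * 3 image bytes stored plane by plane), here is a small pure-Python sketch that parses one record without TensorFlow. It only illustrates the byte layout. The name parse_record and the synthetic record are made up for this example.

```python
HEIGHT, WIDTH, CHANNELS = 32, 32, 3
LABEL_BYTES = 1
IMAGE_BYTES = HEIGHT * WIDTH * CHANNELS
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES  # 3073 bytes per record


def parse_record(record):
    """Split one record into (label, image), where image[row][col] is an (r, g, b) tuple."""
    if len(record) != RECORD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (RECORD_BYTES, len(record)))
    data = bytearray(record)  # indexing a bytearray yields ints
    label = data[0]
    pixels = data[LABEL_BYTES:]
    plane = HEIGHT * WIDTH  # size of one colour plane
    # The file is channel-major: all red values, then all green, then all blue.
    image = [[tuple(pixels[c * plane + row * WIDTH + col] for c in range(CHANNELS))
              for col in range(WIDTH)]
             for row in range(HEIGHT)]
    return label, image


# Synthetic record: label 7, red plane all 10, green plane all 20, blue plane all 30.
record = bytes(bytearray([7] + [10] * 1024 + [20] * 1024 + [30] * 1024))
label, image = parse_record(record)
print(label, image[0][0])
```

The index arithmetic c * plane + row * WIDTH + col does by hand what the reshape to [depth, height, width] followed by the transpose to (1, 2, 0) does in the TensorFlow code.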

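3) If the data you want to read is images, or some other format, you can first convert it to tfrecords, the standard format supported by TensorFlow, which is really just a kind of binary file. Fill in the Features of a tf.train.Example, serialize the protocol buffer to a string, and write the serialized string to a tfrecords file with tf.python_io.TFRecordWriter. The tfrecords file is then read in the same way as above, except that the reader becomes tf.TFRecordReader; each record then goes through the parser tf.parse_single_example and is finally decoded with tf.decode_raw.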

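For example, I used this form of input for a generative adversarial network (GAN). Part of the code is shown below. A detailed explanation will come later, so just get a rough idea for now: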

import os

import tensorflow as tf
from PIL import Image

# DEPTH and OUTPUT_SIZE are module-level constants defined elsewhere in the
# full script (their values are not shown in this excerpt).


def _int64_feature(value):
    return tf.train.Feature(int64_list = tf.train.Int64List(value = [value]))


def _bytes_feature(value):
    return tf.train.Feature(bytes_list = tf.train.BytesList(value = [value]))


def convert_to(data_path, name):
    """
    Converts a dataset to tfrecords
    """
    rows = 64
    cols = 64
    depth = DEPTH
    # List the directory once so every shard slices the same ordering.
    img_names = os.listdir(data_path)
    for ii in range(12):
        writer = tf.python_io.TFRecordWriter(name + str(ii) + '.tfrecords')
        for img_name in img_names[ii*16384 : (ii+1)*16384]:
            img_path = os.path.join(data_path, img_name)
            img = Image.open(img_path)
            # PIL reports size as (width, height).
            w, h = img.size
            # Centre crop; the box is (left, upper, right, lower).
            j, k = (w - OUTPUT_SIZE) // 2, (h - OUTPUT_SIZE) // 2
            box = (j, k, j + OUTPUT_SIZE, k + OUTPUT_SIZE)
            img = img.crop(box = box)
            img = img.resize((rows,cols))
            img_raw = img.tobytes()
            example = tf.train.Example(features = tf.train.Features(feature = {
                                    'height': _int64_feature(rows),
                                    'width': _int64_feature(cols),
                                    'depth': _int64_feature(depth),
                                    'image_raw': _bytes_feature(img_raw)}))
            writer.write(example.SerializeToString())
        writer.close()


def read_and_decode(filename_queue):
    """
    read and decode tfrecords
    """
#    filename_queue = tf.train.string_input_producer([filename_queue])
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # Only the raw image bytes are needed when reading the records back.
    features = tf.parse_single_example(serialized_example,features = {
                        'image_raw':tf.FixedLenFeature([], tf.string)})
    # The result is a flat uint8 vector; reshape it downstream as needed.
    image = tf.decode_raw(features['image_raw'], tf.uint8)
    return image

Here my data_path contains 16384*12 images. The loop writes Examples in 12 passes, turning the image data into 12 tfrecords files with 16384 images each.
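The two bits of arithmetic in convert_to, the centre-crop box and the slicing of the file list into shards, are easy to get wrong, so here is a tiny standalone sketch of both. The helper names center_crop_box and shard_slices, and the 178 x 218 image size with a 108-pixel crop, are assumptions picked only for illustration.

```python
def center_crop_box(width, height, size):
    """Return the (left, upper, right, lower) box of a centred size x size crop."""
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


def shard_slices(num_items, shard_size):
    """Return (start, stop) index pairs that split num_items into consecutive shards."""
    return [(start, min(start + shard_size, num_items))
            for start in range(0, num_items, shard_size)]


box = center_crop_box(178, 218, 108)
shards = shard_slices(16384 * 12, 16384)
print(box)
print(len(shards), shards[0], shards[-1])
```

Integer division keeps the box coordinates whole numbers, and the min() in shard_slices means a final, shorter shard would still be handled if the image count were not an exact multiple of the shard size.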

4) If you want to define your own data-reading operation, see https://www.tensorflow.org/how_tos/new_data_formats/.

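All right, today's bus has reached its stop. Please gather your belongings and get ready to get off. This old driver has another run tomorrow, so remember to be on time: the bus waits for no one.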

References:

1. https://www.tensorflow.org/how_tos/

2. That's all.
