#!/usr/bin/env python
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Process the ImageNet Challenge bounding boxes for TensorFlow model training. Associate the ImageNet 2012 Challenge validation data set with labels. The raw ImageNet validation data set is expected to reside in JPEG files
located in the following directory structure. data_dir/ILSVRC2012_val_00000001.JPEG
data_dir/ILSVRC2012_val_00000002.JPEG
...
data_dir/ILSVRC2012_val_00050000.JPEG This script moves the files into a directory structure like such:
data_dir/n01440764/ILSVRC2012_val_00000293.JPEG
data_dir/n01440764/ILSVRC2012_val_00000543.JPEG
...
where 'n01440764' is the unique synset label associated with
these images. This directory reorganization requires a mapping from validation image
number (i.e. suffix of the original file) to the associated label. This
is provided in the ImageNet development kit via a Matlab file. In order to make life easier and divorce ourselves from Matlab, we instead
supply a custom text file that provides this mapping for us. Sample usage:
./preprocess_imagenet_validation_data.py ILSVRC2012_img_val \
imagenet_2012_validation_synset_labels.txt
""" from __future__ import absolute_import
from __future__ import division
from __future__ import print_function import os
import sys from six.moves import xrange # pylint: disable=redefined-builtin if __name__ == '__main__':
if len(sys.argv) < 3: # sys.argv返回脚本本身的名字及给定脚本的参数.
print('Invalid usage\n'
'usage: preprocess_imagenet_validation_data.py '
'<validation data dir> <validation labels file>')
sys.exit(-1) # System.exit(-1)是指所有程序(方法,类等)停止,系统停止运行。
data_dir = sys.argv[1]
validation_labels_file = sys.argv[2] # Read in the 50000 synsets associated with the validation data set.
# imagenet_2012_validation_synset_labels.txt 这个文件中有50000行类别,有重复,与50000图片是一一对应的
labels = [l.strip() for l in open(validation_labels_file).readlines()] # strip() 方法用于移除字符串头尾指定的字符(默认为空格或换行符)。
unique_labels = set(labels) # set() 函数创建一个无序不重复元素集,可进行关系测试,删除重复数据,还可以计算交集、差集、并集等。 # Make all sub-directories in the validation data dir.
for label in unique_labels:
labeled_data_dir = os.path.join(data_dir, label)
if not os.path.exists(labeled_data_dir):
os.makedirs(labeled_data_dir) # Move all of the image to the appropriate sub-directory.
for i in xrange(len(labels)): # xrange() 函数用法与 range 完全相同,所不同的是生成的不是一个数组,而是一个生成器。
basename = 'ILSVRC2012_val_000%.5d.JPEG' % (i + 1)
original_filename = os.path.join(data_dir, basename)
if not os.path.exists(original_filename):
#print('Failed to find: ' % original_filename)
continue
#sys.exit(-1)
new_filename = os.path.join(data_dir, labels[i], basename)
os.rename(original_filename, new_filename)

82行的代码一加进去,就出错:

TypeError: not all arguments converted during string formatting

过程中还出现了以下错误:

Organizing the validation data into sub-directories.
Traceback (most recent call last):
File "F:/datasets/preprocess_imagenet_validation_data.py", line 86, in <module>
os.rename(original_filename, new_filename)
PermissionError: [WinError 32] ▒▒һ▒▒▒▒▒▒▒▒▒▒ʹ▒ô▒▒ļ▒▒▒▒▒▒▒▒޷▒▒▒▒ʡ▒: 'F:/ILSVRC2012_img_val/ILSVRC2012_val_00032304.JPEG' -> 'F:/ILSVRC2012_img_val/n02109961\\ILSVRC2012_val_00032304.JPEG'

可能是不能够一次性重命名太多文件,反正我重新运行了

./download_and_convert_imagenet.sh /f/ILSVRC2012_img_val_varified

preprocess_imagenet_validation_data.py这个程序可以继续重命名文件。

https://github.com/tensorflow/models/blob/master/research/slim/datasets/preprocess_imagenet_validation_data.py 改编版的更多相关文章

  1. https://github.com/chenghuige/tensorflow-exp/blob/master/examples/sparse-tensor-classification/

        https://github.com/chenghuige/tensorflow-exp/blob/master/examples/sparse-tensor-classification/ ...

  2. 结对项目https://github.com/bxoing1994/test/blob/master/源代码

    所选项目名称:文本替换      结对人:曲承玉 github地址 :https://github.com/bxoing1994/test/blob/master/源代码 结对人github地址:ht ...

  3. https://github.com/python/cpython/blob/master/Doc/library/contextlib.rst 被同一个线程多次获取的同步基元组件

    # -*- coding: utf-8 -*- import time from threading import Lock, RLock from datetime import datetime ...

  4. https://github.com/golang/crypto/blob/master/bcrypt/bcrypt.go

    https://github.com/golang/crypto/blob/master/bcrypt/bcrypt.go

  5. https://github.com/PyMySQL/PyMySQL/blob/master/pymysql/connections.py

    # Python implementation of the MySQL client-server protocol # http://dev.mysql.com/doc/internals/en/ ...

  6. 用swoole实现mysql的连接池--摘自https://github.com/153734009/doc/blob/master/php/mysql_pool.php

    <?php   $serv = new swoole_server("0.0.0.0", 9508);   $serv->set(['worker_num'=>1 ...

  7. GC 的认识(转) https://github.com/qcrao/Go-Questions/blob/master/GC/GC.md#1-什么是-gc有什么作用

    1. 什么是 GC,有什么作用? GC,全称 Garbage Collection,即垃圾回收,是一种自动内存管理的机制. 当程序向操作系统申请的内存不再需要时,垃圾回收主动将其回收并供其他代码进行内 ...

  8. tensorflow models flags 初步使用

    参考官方仓库:https://github.com/tensorflow/models/tree/master/official/utils/flags 测试Demo代码如下: from absl i ...

  9. Ubuntu18.04下安装、测试tensorflow/models Tensorflow Object Detection API 笔记

    参考:https://www.jianshu.com/p/1ed2d9ce6a88 安装 安装conda+tensorflow库 下载protoc linux x64版,https://github. ...

随机推荐

  1. IOS-企业开发人员账号&amp;邓白氏码申请记录

    Apple开发人员账号分三种,个人.公司,还有企业.个人和公司都称为标准账号. 另一种是教育机构的账号. 账号介绍 个人和公司的就不说了.如今仅仅说企业账号 首先是申请企业账号的地址: https:/ ...

  2. LoRa基础

    一.LoRa技术 LoRa 是LPWAN通信技术中的一种,是美国Semtech公司采用和推广的一种基于扩频技术的超远距离无线传输方案.这一方案改变了以往关于传输距离与功耗的折衷考虑方式,为用户提供一种 ...

  3. set names utf8;

    对应用程序来说,强制将它们发起的数据库链接设置成UTF8编码有什么办法? 每个链接建立时先执行set names utf8; [mysqld] init-connect=‘set names utf8 ...

  4. virtualbox 在物理机是无线网卡的时候做桥接配置

    在“计算机”图标上右键选择“管理”,在打开的“计算机管理”窗口中选择左侧的“设备管理器”,然后在右侧图示的地方右键选择“添加过时硬件”. 在打开的窗口中点击“下一步”. 选择“安装我手动从列表中选择的 ...

  5. 关系数据库(RDBMS)小记

    关系数据库三个范式 三个范式: 第一范式(1NF):数据表中的每一列(每个字段)必须是不可拆分的最小单元,也就是确保每一列的原子性 这里说的不可拆分通常是放在业务背景下而言的,是否可拆分视业务需求而定 ...

  6. 0710 mux协议的作用(ppp拨号时如何和gprs进行at指令交互)

    ppp拨号使gprs上网的同时如何和gprs模块进行at指令的交互,这是一个问题. 在linux中,ppp拨号上网是内核中支持的,只需要在内核配置中选上. ppp拨号的方式使gprs进行上网与at指令 ...

  7. Golang 笔记 2 函数、结构体、接口、指针

    一.函数 Go中函数是一等(first-class)类型.我们可以把函数当作值来传递和使用.Go中的函数可以返回多个结果.  函数类型字面量由关键字func.由圆括号包裹声明列表.空格以及可以由圆括号 ...

  8. 盘点 React 16.0 ~ 16.5 主要更新及其应用

    目录 0. 生命周期函数的更新 1. 全新的 Content API 2. React Strict Mode 3. Portal 4. Refs 5. Fragment 6. 其他 7. 总结 生命 ...

  9. VSCode远程调试Go程序方法(Attach)

    set launch.json { "name": "Attach", "type": "go", "requ ...

  10. 数据共享Manager

    将数据设置成共享数据,一个进程修改了数据,另外一个进程就能就接受的被修改的数据. 起50个进程让他们都去操作一个数据: from multiprocessing import Process, Man ...