最新超简单解读torchvision
torchvision
https://pytorch.org/docs/stable/torchvision/index.html#module-torchvision
The torchvision package consists of popular datasets(数据集), model architectures(模型结构), and common image transformations(通用图像转换) for computer vision.
torchvision.get_image_backend
():Gets the name of the package used to load images
torchvision.set_image_backend
(backend): Specifies the package used to load images.
torchvision.set_video_backend
(backend): Specifies the package used to decode videos.
- torchvision.datasets(目前共24个数据集):
MNIST;Fashion-MNIST;KMNIST;EMNIST;QMNIST;FakeData;COCO;LSUN;ImageFolder;DatasetFolder;ImageNet;CIFAR;STL10;SVHN;PhotoTour;SBU;Flickr;VOC;Cityscapes;SBD;USPS;Kinetics-400;HMDB51;UCF101.
- torchvision.io(目前只支持video):
Video
torchvision.io.read_video
(filename, start_pts=0, end_pts=None, pts_unit='pts')
Reads a video from a file, returning both the video frames as well as the audio frames.
- torchvision.models(目前只支持Classification, Semantic Segmentation, Object Detection, Instance Segmentation and Person Keypoint Detection和Video classification三类模型):
The models subpackage contains definitions for the following model architectures for image classification:
Inception v3
ShuffleNet v2
MobileNet v2
You can construct a model with random weights by calling its constructor:
import torchvision.models as models
resnet18 = models.resnet18()
alexnet = models.alexnet()
vgg16 = models.vgg16()
squeezenet = models.squeezenet1_0()
densenet = models.densenet161()
inception = models.inception_v3()
googlenet = models.googlenet()
shufflenet = models.shufflenet_v2_x1_0()
mobilenet = models.mobilenet_v2()
resnext50_32x4d = models.resnext50_32x4d()
wide_resnet50_2 = models.wide_resnet50_2()
mnasnet = models.mnasnet1_0()
pre-trained models, using the PyTorch torch.utils.model_zoo. These can be constructed by passing pretrained=True:
import torchvision.models as models
resnet18 = models.resnet18(pretrained=True)
alexnet = models.alexnet(pretrained=True)
squeezenet = models.squeezenet1_0(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
densenet = models.densenet161(pretrained=True)
inception = models.inception_v3(pretrained=True)
googlenet = models.googlenet(pretrained=True)
shufflenet = models.shufflenet_v2_x1_0(pretrained=True)
mobilenet = models.mobilenet_v2(pretrained=True)
resnext50_32x4d = models.resnext50_32x4d(pretrained=True)
wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)
mnasnet = models.mnasnet1_0(pretrained=True)
Instancing a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.
Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
The models subpackage contains definitions for the following model architectures for semantic segmentation:
As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded in to a range of [0,
1] and then normalized using mean
=
[0.485,
0.456,
0.406] and std
=
[0.229,
0.224,
0.225]. They have been trained on images resized such that their minimum size is 520.
The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:
['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
Object Detection, Instance Segmentation and Person Keypoint Detection:
The models subpackage contains definitions for the following model architectures for detection:
The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.
The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This option can be changed by passing the option min_size to the constructor of the models.
For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:
COCO_INSTANCE_CATEGORY_NAMES = [
'__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
For person keypoint detection, the pre-trained model return the keypoints in the following order:
COCO_PERSON_KEYPOINT_NAMES = [
'nose',
'left_eye',
'right_eye',
'left_ear',
'right_ear',
'left_shoulder',
'right_shoulder',
'left_elbow',
'right_elbow',
'left_wrist',
'right_wrist',
'left_hip',
'right_hip',
'left_knee',
'right_knee',
'left_ankle',
'right_ankle'
]
We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is a number of video frames in a clip. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].
NOTE
The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.
NOTE
For now, normalization code can be found in references/video_classification/transforms.py, see the Normalizefunction there. Note that it differs from standard normalization for images because it assumes the video is 4d.
Kinetics 1-crop accuracies for clip length 16 (16x112x112)
Network |
Clip acc@1 |
Clip acc@5 |
ResNet 3D 18 |
52.75 |
75.45 |
ResNet MC 18 |
53.90 |
76.29 |
ResNet (2+1)D |
57.50 |
78.81 |
- torchvision.ops(操作符):
torchvision.ops implements operators that are specific for Computer Vision.
支持:
torchvision.ops.nms
(boxes, scores, iou_threshold):Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).
torchvision.ops.roi_align
(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1): Performs Region of Interest (RoI) Align operator described in Mask R-CNN
torchvision.ops.roi_pool
(input, boxes, output_size, spatial_scale=1.0): Performs Region of Interest (RoI) Pool operator described in Fast R-CNN
- torchvision.transforms(转换操作)
torchvision.utils.make_grid
(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0), Make a grid of images.
torchvision.utils.save_image
(tensor, fp, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0, format=None), Save a given Tensor into an image file.
最新超简单解读torchvision的更多相关文章
- ssh框架整合---- spring 4.0 + struts 2.3.16 + maven ss整合超简单实例
一 . 需求 学了这么久的ssh,一直都是别人整合好的框架去写代码,自己实际动手时才发现框架配置真是很坑爹,一不小心就踏错,真是纸上得来终觉浅! 本文将记录整合struts + spring的过程 , ...
- 程序员,一起玩转GitHub版本控制,超简单入门教程 干货2
本GitHub教程旨在能够帮助大家快速入门学习使用GitHub,进行版本控制.帮助大家摆脱命令行工具,简单快速的使用GitHub. 做全栈攻城狮-写代码也要读书,爱全栈,更爱生活. 更多原创教程请关注 ...
- DCGAN 论文简单解读
DCGAN的全称是Deep Convolution Generative Adversarial Networks(深度卷积生成对抗网络).是2014年Ian J.Goodfellow 的那篇开创性的 ...
- DCGAN 代码简单解读
之前在DCGAN文章简单解读里说明了DCGAN的原理.本次来实现一个DCGAN,并在数据集上实际测试它的效果.本次的代码来自github开源代码DCGAN-tensorflow,感谢carpedm20 ...
- ECharts.js 超简单入门(本质canvas)
ECharts.js 超简单入门(本质canvas) 一.总结 一句话总结:echarts这些图标的本质都是canvas. 二.ECharts.js学习(一) 简单入门 EChart.js 简单入门 ...
- 超简单集成 HMS ML Kit 实现最大脸微笑抓拍
前言 如果大家对 HMS ML Kit 人脸检测功能有所了解,相信已经动手调用我们提供的接口编写自己的 APP 啦.目前就有小伙伴在调用接口的过程中反馈,不太清楚 HMS ML Kit 文档中的 ML ...
- 把C#程序(含多个Dll)合并成一个Exe的超简单方法
开发程序的时候经常会引用一些第三方的DLL,然后编译生成的exe文件就不能脱离这些DLL独立运行了. 但是,很多时候我们本想开发一款只需要一个exe就能完美运行的小工具.那该怎么办呢? 下文介绍一种超 ...
- 记住密码超简单实现(C#)
实现效果如下 实现过程 [Serializable] class User { //记住密码 private string loginID; public string LoginID { get { ...
- 超简单的JNI——NDK开发教程
不好意思各位,我按照网上一些教程进行JNI开发,折腾了半天也没成功,最后自己瞎搞搞定了,其实超简单的,网上的教程应该过时了,最新版的AS就包含了NDK编译的功能,完全不用手动javah,各种包名路径的 ...
随机推荐
- 【C/C++】static关键字
首先static的最主要功能是隐藏,其次因为static变量存放在静态存储区,所以它具备持久性和默认值0. static性质 隐藏 当同时编译多个文件时,未加static前缀的全局变量和函数都具有全局 ...
- P3386 【模板】二分图匹配(匈牙利算法)
题目背景 二分图 题目描述 给定一个二分图,结点个数分别为n,m,边数为e,求二分图最大匹配数 输入输出格式 输入格式: 第一行,n,m,e 第二至e+1行,每行两个正整数u,v,表示u,v有一条连边 ...
- linux命令之------部分细节点
创建文件夹/文件命令以及清除模式 mkdir +文件夹名字 touch +文件名字 rm -fr 删除文件,文件夹 -f强制删除 -r是递归 查询linux下所有启动的线程: ps -ef|grep ...
- windows安装IIS不成功的原因
一.背景 之前做过一段时间的实施,因此总结一下IIS安装不成功会有哪些原因导致的,希望给踩坑的人提供思路和帮助. 二.分析原因 1.系统问题,比如Windows家庭版本(独白:我之前花了一天的时间安装 ...
- 灵活的MyBatis
一.前言 将数据存储到数据库是开发中很重要的一环.曾经有程序员说自己做过最牛逼的事情就是增删改查.确实我们做了很多页面,后太代码写了很多,可是最终都离不开数据库的增删改查.Java有一套自己的JPA标 ...
- kafka如何实现高并发存储-如何找到一条需要消费的数据(阿里)
阿里太注重原理了:阿里问kafka如何实现高并发存储-如何找到一条需要消费的数据,kafka用了稀疏索引的方式,使用了二分查找法,其实很多索引都是二分查找法 二分查找法的时间复杂度:O(logn) ...
- 【技术博客】JWT的认证机制Django项目中应用
开发组在开发过程中,都不可避免地遇到了一些困难或问题,但都最终想出办法克服了.我们认为这样的经验是有必要记录下来的,因此就有了[技术博客]. JWT的认证机制Django项目中应用 这篇技术博客基于软 ...
- mybatis-generator 模板
generatorConfig.xml <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE ge ...
- Unity2019.1中文技术手册离线版
使用离线版优质.系统化的教程.经验文档.参考手册,为开发者节省时间,提高效率! 解压后打开UnityDocumentation_2019.1/Manual/index.html 需要的自取,下载地址: ...
- Sonar错误 Invoke method(s) only conditionally
sonarLint总是报错: Invoke method(s) only conditionally 代码如下: if(us != null){ logger.info("Log this: ...