A Quick and Simple Guide to torchvision

torchvision

https://pytorch.org/docs/stable/torchvision/index.html#module-torchvision

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

torchvision.get_image_backend(): Gets the name of the package used to load images.

torchvision.set_image_backend(backend): Specifies the package used to load images.

torchvision.set_video_backend(backend): Specifies the package used to decode videos.

torchvision.datasets (24 datasets at the time of writing):

MNIST; Fashion-MNIST; KMNIST; EMNIST; QMNIST; FakeData; COCO; LSUN; ImageFolder; DatasetFolder; ImageNet; CIFAR; STL10; SVHN; PhotoTour; SBU; Flickr; VOC; Cityscapes; SBD; USPS; Kinetics-400; HMDB51; UCF101.

torchvision.io (currently video only):

Video

torchvision.io.read_video(filename, start_pts=0, end_pts=None, pts_unit='pts')

Reads a video from a file, returning both the video frames as well as the audio frames.

torchvision.models (currently covers four categories of models: classification; semantic segmentation; object detection, instance segmentation and person keypoint detection; and video classification):

Classification:

The models subpackage contains definitions for the following model architectures for image classification:

AlexNet

VGG

ResNet

SqueezeNet

DenseNet

Inception v3

GoogLeNet

ShuffleNet v2

MobileNet v2

ResNeXt

Wide ResNet

MNASNet

You can construct a model with random weights by calling its constructor:

import torchvision.models as models

resnet18 = models.resnet18()

alexnet = models.alexnet()

vgg16 = models.vgg16()

squeezenet = models.squeezenet1_0()

densenet = models.densenet161()

inception = models.inception_v3()

googlenet = models.googlenet()

shufflenet = models.shufflenet_v2_x1_0()

mobilenet = models.mobilenet_v2()

resnext50_32x4d = models.resnext50_32x4d()

wide_resnet50_2 = models.wide_resnet50_2()

mnasnet = models.mnasnet1_0()

torchvision also provides pre-trained models, loaded through PyTorch's torch.utils.model_zoo. These can be constructed by passing pretrained=True:

import torchvision.models as models

resnet18 = models.resnet18(pretrained=True)

alexnet = models.alexnet(pretrained=True)

squeezenet = models.squeezenet1_0(pretrained=True)

vgg16 = models.vgg16(pretrained=True)

densenet = models.densenet161(pretrained=True)

inception = models.inception_v3(pretrained=True)

googlenet = models.googlenet(pretrained=True)

shufflenet = models.shufflenet_v2_x1_0(pretrained=True)

mobilenet = models.mobilenet_v2(pretrained=True)

resnext50_32x4d = models.resnext50_32x4d(pretrained=True)

wide_resnet50_2 = models.wide_resnet50_2(pretrained=True)

mnasnet = models.mnasnet1_0(pretrained=True)

Instantiating a pre-trained model will download its weights to a cache directory. This directory can be set using the TORCH_MODEL_ZOO environment variable. See torch.utils.model_zoo.load_url() for details.

Some models use modules which have different training and evaluation behavior, such as batch normalization. To switch between these modes, use model.train() or model.eval() as appropriate. See train() or eval() for details.
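To see why the mode matters, here is a toy sketch of batch normalization in plain Python. It is not PyTorch's implementation: the class TinyBatchNorm and its defaults are made up for illustration, and it leaves out the learned scale and shift as well as the unbiased variance that PyTorch tracks. It only shows the core difference between the two modes.

```python
import math


class TinyBatchNorm:
    """Toy 1-D batch normalization over a list of floats (illustration only)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.training = True
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, xs):
        if self.training:
            # Training mode: normalize with the statistics of the current batch
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            # and fold them into the running estimates for later use.
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Evaluation mode: use the stored running estimates, unchanged.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in xs]


bn = TinyBatchNorm()
batch = [2.0, 4.0, 6.0]

train_out = bn.train()(batch)  # depends on the batch's own mean and variance
eval_out = bn.eval()(batch)    # depends only on the running estimates
```

The same input gives different outputs in the two modes, which is why forgetting model.eval() before inference tends to produce unstable predictions.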

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
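The arithmetic behind this normalization is simply (value - mean) / std, applied per channel. A minimal plain-Python sketch for a single pixel follows; the helper normalize_pixel is a made-up name for illustration, not part of torchvision.

```python
# Per-channel statistics quoted in the text above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


# A pixel equal to the mean maps to zero in every channel.
mean_pixel = normalize_pixel([0.485, 0.456, 0.406])

# A pure white pixel (1.0 in every channel) maps to roughly [2.25, 2.43, 2.64].
white_pixel = normalize_pixel([1.0, 1.0, 1.0])
```

After normalization the values are no longer confined to [0, 1], so skipping the step (or applying it twice) shifts the inputs far from what the pre-trained weights expect.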

Semantic Segmentation:

The models subpackage contains definitions for the following model architectures for semantic segmentation:

FCN ResNet101

DeepLabV3 ResNet101

As with image classification models, all pre-trained models expect input images normalized in the same way. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. They have been trained on images resized such that their minimum size is 520.

The pre-trained models have been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset. You can see more information on how the subset has been selected in references/segmentation/coco_utils.py. The classes that the pre-trained model outputs are the following, in order:

['__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',

'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',

'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']
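The order of this list matters because the model scores every pixel once per class, and the index of the highest score is the predicted class. The sketch below assumes the scores for one image are laid out as scores[class][y][x] (in torchvision they come back as a tensor, as far as I know under the 'out' key of the result). It uses nested lists and a made-up helper label_map to keep the example dependency-free.

```python
VOC_CLASSES = [
    '__background__', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]


def label_map(scores, class_names):
    """Turn per-class score maps scores[c][y][x] into class names per pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # The predicted class is the index with the highest score.
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(class_names[best])
        result.append(row)
    return result


# Hypothetical scores for a 1 x 2 image: all zeros except two strong responses.
scores = [[[0.0, 0.0]] for _ in VOC_CLASSES]
scores[VOC_CLASSES.index('cat')][0][0] = 5.0
scores[VOC_CLASSES.index('dog')][0][1] = 3.0

labels = label_map(scores, VOC_CLASSES)  # [['cat', 'dog']]
```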

Object Detection, Instance Segmentation and Person Keypoint Detection:

The models subpackage contains definitions for the following model architectures for detection:

Faster R-CNN ResNet-50 FPN

Mask R-CNN ResNet-50 FPN

The pre-trained models for detection, instance segmentation and keypoint detection are initialized with the classification models in torchvision.

The models expect a list of Tensor[C, H, W], in the range 0-1. The models internally resize the images so that they have a minimum size of 800. This option can be changed by passing the option min_size to the constructor of the models.
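A rough sketch of how the internal resize picks its scale factor. The min_size of 800 comes from the text above; the max_size cap of 1333 on the longer side is an assumption based on the library defaults as I understand them, and detection_resize_scale is a made-up helper name.

```python
def detection_resize_scale(height, width, min_size=800, max_size=1333):
    """Scale so the shorter side becomes min_size, unless that would push
    the longer side past max_size (assumed default cap)."""
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return scale


# A 480 x 640 image: the shorter side is scaled up to 800.
scale_a = detection_resize_scale(480, 640)
size_a = (round(480 * scale_a), round(640 * scale_a))   # (800, 1067)

# A very wide 500 x 2000 image: the cap on the longer side wins instead.
scale_b = detection_resize_scale(500, 2000)
size_b = (round(500 * scale_b), round(2000 * scale_b))  # (333, 1333)
```

Because the model rescales on its own, there is no need to resize images to a fixed size before passing them in.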

For object detection and instance segmentation, the pre-trained models return the predictions of the following classes:

COCO_INSTANCE_CATEGORY_NAMES = [

'__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',

'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',

'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',

'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',

'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',

'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',

'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',

'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',

'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',

'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',

'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',

'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'

]
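The position of a name in this list is the label id the model predicts, so turning predictions into readable output is a lookup. The sketch below uses a hypothetical prediction written with plain lists (real predictions hold tensors; the keys 'boxes', 'labels' and 'scores' follow the detection output format as I understand it) and a short excerpt of the category list. The helper confident_detections is a made-up name.

```python
# Excerpt of the category list above, keyed by label id (the list index).
CATEGORY_NAMES = {1: 'person', 2: 'bicycle', 3: 'car', 18: 'dog'}

# Hypothetical prediction for one image; the values are invented.
prediction = {
    'boxes': [[10.0, 20.0, 110.0, 220.0],
              [30.0, 40.0, 90.0, 100.0],
              [0.0, 0.0, 50.0, 50.0]],
    'labels': [1, 18, 3],
    'scores': [0.98, 0.85, 0.30],
}


def confident_detections(pred, names, threshold=0.5):
    """Keep detections scoring at least `threshold`, with readable names."""
    results = []
    for box, label, score in zip(pred['boxes'], pred['labels'], pred['scores']):
        if score >= threshold:
            results.append((names.get(label, 'N/A'), score, box))
    return results


detections = confident_detections(prediction, CATEGORY_NAMES)
# Keeps the 'person' and 'dog' detections; the low-scoring 'car' is dropped.
```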

For person keypoint detection, the pre-trained model returns the keypoints in the following order:

COCO_PERSON_KEYPOINT_NAMES = [

'nose',

'left_eye',

'right_eye',

'left_ear',

'right_ear',

'left_shoulder',

'right_shoulder',

'left_elbow',

'right_elbow',

'left_wrist',

'right_wrist',

'left_hip',

'right_hip',

'left_knee',

'right_knee',

'left_ankle',

'right_ankle'

]
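Since the keypoints come back in this fixed order, pairing them with their names is a simple zip. The sketch assumes each keypoint is an [x, y, visibility] triple, which is how I understand the model's output; the coordinate values are invented and named_keypoints is a made-up helper.

```python
COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle',
]

# Hypothetical keypoints for one person: 17 triples of [x, y, visibility].
keypoints = [[10.0 * i, 5.0 * i, 1.0] for i in range(17)]
keypoints[16][2] = 0.0  # pretend the right ankle is not visible


def named_keypoints(points, names=COCO_PERSON_KEYPOINT_NAMES):
    """Map keypoint names to (x, y), skipping points marked not visible."""
    return {
        name: (x, y)
        for name, (x, y, visible) in zip(names, points)
        if visible > 0
    }


named = named_keypoints(keypoints)
# named['nose'] is (0.0, 0.0); 'right_ankle' is absent because it is hidden.
```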

Video classification:

We provide models for action recognition pre-trained on Kinetics-400. They have all been trained with the scripts provided in references/video_classification.

All pre-trained models expect input clips normalized in the same way, i.e. mini-batches of 3-channel RGB videos of shape (3 x T x H x W), where H and W are expected to be 112, and T is the number of video frames in a clip. The frames have to be loaded into a range of [0, 1] and then normalized using mean = [0.43216, 0.394666, 0.37645] and std = [0.22803, 0.22145, 0.216989].

NOTE

The normalization parameters are different from the image classification ones, and correspond to the mean and std from Kinetics-400.

NOTE

For now, normalization code can be found in references/video_classification/transforms.py; see the Normalize function there. Note that it differs from standard normalization for images because it assumes the video is 4D.
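To illustrate what normalizing a 4D clip means, here is a plain-Python sketch over nested lists laid out as clip[channel][frame][y][x]. It is not the code from the references folder, and normalize_clip is a made-up name. The point is only that each channel's mean and std are applied across every frame and pixel of that channel.

```python
# Per-channel statistics quoted in the text above.
VIDEO_MEAN = [0.43216, 0.394666, 0.37645]
VIDEO_STD = [0.22803, 0.22145, 0.216989]


def normalize_clip(clip, mean=VIDEO_MEAN, std=VIDEO_STD):
    """Normalize a clip laid out as clip[c][t][y][x], values in [0, 1]."""
    return [
        [[[(v - m) / s for v in row] for row in frame] for frame in channel]
        for channel, m, s in zip(clip, mean, std)
    ]


# Tiny hypothetical clip: 3 channels, T = 2 frames, 1 x 2 pixels, all 0.5.
clip = [[[[0.5, 0.5]] for _ in range(2)] for _ in range(3)]
normalized = normalize_clip(clip)
```

The layout is unchanged by normalization: the result is still 3 x T x H x W.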

Kinetics 1-crop accuracies for clip length 16 (16x112x112)

Network	Clip acc@1	Clip acc@5
ResNet 3D 18	52.75	75.45
ResNet MC 18	53.90	76.29
ResNet (2+1)D	57.50	78.81

torchvision.ops (operators):

torchvision.ops implements operators that are specific for Computer Vision.

Supported operators:

torchvision.ops.nms(boxes, scores, iou_threshold): Performs non-maximum suppression (NMS) on the boxes according to their intersection-over-union (IoU).

torchvision.ops.roi_align(input, boxes, output_size, spatial_scale=1.0, sampling_ratio=-1): Performs the Region of Interest (RoI) Align operator described in Mask R-CNN.

torchvision.ops.roi_pool(input, boxes, output_size, spatial_scale=1.0): Performs the Region of Interest (RoI) Pool operator described in Fast R-CNN.
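Of the operators above, NMS is the easiest to show in a few lines. The following is a plain-Python sketch of the greedy algorithm, not the library's implementation; greedy_nms and iou are made-up names. It assumes boxes in (x1, y1, x2, y2) format and returns the indices of the kept boxes in order of decreasing score.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Visit boxes from highest to lowest score; keep a box only if it does
    not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]

# Boxes 0 and 1 overlap with IoU of about 0.68, so box 1 is suppressed.
kept = greedy_nms(boxes, scores, iou_threshold=0.5)  # [0, 2]
```

This version is quadratic in the number of boxes, which is fine for understanding the behaviour but not a substitute for the library operator.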

torchvision.transforms (transform operations)

torchvision.utils (utilities):

torchvision.utils.make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0): Makes a grid of images.

torchvision.utils.save_image(tensor, fp, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0, format=None): Saves a given Tensor into an image file.
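One thing that trips people up with make_grid is that nrow is the number of images per row, not the number of rows. The sketch below computes the size of the resulting grid under my understanding of the layout (each cell is the image plus padding, with one extra padding strip at the border); grid_shape is a made-up helper and does not call torchvision.

```python
import math


def grid_shape(num_images, channels, height, width, nrow=8, padding=2):
    """Estimate the (C, H, W) shape of the grid produced for a batch."""
    cols = min(nrow, num_images)         # images per row
    rows = math.ceil(num_images / cols)  # rows needed to fit all images
    return (
        channels,
        rows * (height + padding) + padding,
        cols * (width + padding) + padding,
    )


# 16 images of 3 x 32 x 32 with the defaults: 2 rows of 8.
shape_16 = grid_shape(16, 3, 32, 32)  # (3, 70, 274)

# 5 images fit in a single row of 5.
shape_5 = grid_shape(5, 3, 32, 32)    # (3, 36, 172)
```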
