Bag of Visual Words (BoW, BoF)

Introduction

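BoW is a traditional computer vision method that represents an image by a set of features (vectors). The core idea is to pick a set of fairly generic features (the vocabulary) and describe every image in terms of them. Different images respond differently to the same feature, so each image ends up as a frequency histogram (a vector) over that set of features. BoW is commonly used for content-based image retrieval (CBIR).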
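A minimal sketch of the "image as a frequency histogram" idea, assuming the vocabulary is already given as an array of word vectors: each local descriptor is assigned to its nearest visual word by brute-force Euclidean distance, and the image becomes the count of each word. The function name `bow_histogram` and the toy 2-D vectors are illustrative assumptions; real SIFT descriptors are 128-dimensional.

```python
import numpy as np


def bow_histogram(descriptors: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Hypothetical helper: count how often each visual word occurs in an image.

    descriptors: (n, d) local features of one image (e.g. SIFT, d = 128)
    vocabulary:  (k, d) visual words (e.g. k-means cluster centers)
    returns:     (k,) integer histogram
    """
    # Squared Euclidean distance from every descriptor to every word: shape (n, k)
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest word for each descriptor
    # minlength keeps one bin per word, even for words that never occur
    return np.bincount(words, minlength=len(vocabulary))


vocab = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
desc = np.array([[1.0, 0.5], [9.0, 11.0], [0.2, -0.3], [10.5, 9.5]])
print(bow_histogram(desc, vocab))  # [2 2 0]
```

Two descriptors fall near word 0, two near word 1, and none near word 2, so the third bin stays zero but is still present.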

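For example, the figure below (from Brown Computer Vision 2021) illustrates BoW: starting from a collection of images, extract features from them, pick out representative generic features, compute how each image responds to those features, and so turn every image into a feature vector over that set.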
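The "representative generic features" are usually the centers of a k-means clustering of all training descriptors. The sketch below is plain Lloyd's k-means written with NumPy, for illustration only (the code later in this post uses scikit-learn's `MiniBatchKMeans` instead); `kmeans_vocabulary` and its parameters are hypothetical.

```python
import numpy as np


def kmeans_vocabulary(X: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; the returned k centers act as the visual words."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct descriptors chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: nearest center for every descriptor (labels has shape (n,))
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of the descriptors assigned to it
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


# Toy "descriptors": two tight blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
words = kmeans_vocabulary(X, k=2)
print(words[np.argsort(words[:, 0])].round(1))  # one center near (0, 0), one near (5, 5)
```

Mini-batch k-means replaces the full assignment step with small random batches, which is why it scales better to the hundreds of thousands of SIFT descriptors in a real training set.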

Practice

This post does not spend much time on theory; it mainly does hands-on work with OpenCV.

Dataset

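This post uses a fairly old dataset, ZuBuD, built by ETH Zurich and freely downloadable. It contains images of buildings in Zurich: the training set has 1005 images covering 201 buildings, and the test set has 115 images for testing image retrieval. Ground-truth information is included, specifying which images correspond to each other. Two randomly chosen images are shown below.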

Below is part of the ground truth. For example, the first row says that test image 001 should be matched to training object 100.

TEST	TRAIN
001	100
002	102
003	104
004	105
005	107
006	109
...
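A small sketch for reading this listing and matching image file names to its ids. It assumes the file is whitespace-separated with a one-line header, and that the first run of digits in a file name is the image or object number (for example `qimg0001.JPG` or `object0100.view03.png`; these names are assumptions, not taken from the dataset documentation). `parse_groundtruth` and `object_id` are hypothetical helpers; the code later in this post slices fixed character positions instead.

```python
import re


def parse_groundtruth(text: str) -> dict:
    """Map test id -> train object id from a 'TEST TRAIN' listing."""
    pairs = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()  # tolerant of tabs, spaces and trailing '\r'
        if len(fields) == 2:  # ignore blank or malformed lines
            test_id, train_id = fields
            pairs[test_id] = train_id
    return pairs


def object_id(filename: str) -> str:
    """Return the first number in a file name as a zero-padded 3-digit id."""
    match = re.search(r"\d+", filename)
    if match is None:
        raise ValueError(f"no digits in {filename!r}")
    return f"{int(match.group()):03d}"


gt = parse_groundtruth("TEST\tTRAIN\n001\t100\n\n002\t102\n")
print(gt)  # {'001': '100', '002': '102'}
print(object_id("qimg0001.JPG"), object_id("object0100.view03.png"))  # 001 100
```

Extracting ids with a regular expression keeps working if the naming scheme changes its prefix length, which fixed slices like `k[5:8]` do not.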

Overall approach

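1. Extract SIFT features from every image.
2. Pool all training-set features and cluster them.
3. Compute a histogram for every training image.
4. Compute a histogram for every test image.
5. For each test image, take the training image whose histogram is closest as the result.
6. Compute the accuracy.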
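Steps 5 and 6 can be sketched in pure Python as below, assuming histograms are plain lists keyed by name and that a helper (`obj_of`, hypothetical) maps a training image name to its object id. This uses plain cosine similarity; the `0.5 + 0.5 * cos` rescaling used in the code below is monotonic, so it selects the same best match.

```python
import math


def cosine(u, v):
    """Plain cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, database):
    """Step 5: name of the database histogram most similar to the query."""
    return max(database, key=lambda name: cosine(query, database[name]))


def top1_accuracy(test_hists, train_hists, gt, obj_of):
    """Step 6: fraction of test images whose best match is the right object."""
    correct = 0
    for test_id, hist in test_hists.items():
        if gt[test_id] == obj_of(retrieve(hist, train_hists)):
            correct += 1
    return correct / len(test_hists)


# Toy data: training names are "<object>_<view>" (an assumption for this sketch)
train = {"100_a": [5, 0, 1], "100_b": [4, 1, 0], "102_a": [0, 6, 2]}
test = {"001": [3, 0, 1], "002": [0, 5, 1], "003": [1, 1, 0]}
gt = {"001": "100", "002": "102", "003": "102"}
acc = top1_accuracy(test, train, gt, obj_of=lambda name: name.split("_")[0])
print(acc)  # 0.6666666666666666 ("003" is matched to object 100)
```

Because each building has several training views, a query counts as correct if its best match is any view of the right object.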

Code

With this plan in place the code logic is fairly clear. The full code follows, with detailed explanations in the comments.

# 1. Extract SIFT features from every image
# 2. Pool all training-set features and cluster them
# 3. Compute a histogram for every training image
# 4. Compute a histogram for every test image
# 5. For each test image, pick the training image with the closest histogram
# 6. Compute the accuracy

import os
import time

import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

DataPath = "../Dataset/ZuBuD"  # dataset root
TrainPath = os.path.join(DataPath, "png-ZuBuD")  # training images
TestPath = os.path.join(DataPath, "1000city", "qimage")  # test (query) images
TrainSIFTPath = "../Dataset/ZuBuD/Train_SIFT"  # where training SIFT would be saved (only used when writing to disk)
TestSIFTPath = "../Dataset/ZuBuD/Test_SIFT"  # where test SIFT would be saved (only used when writing to disk)

TrainSIFT = []  # training SIFT descriptors as a list, convenient for np.concatenate later
TestSIFT = []  # test SIFT descriptors
Train_SIFT_dict = {}  # same descriptors, indexed by image name
Test_SIFT_dict = {}

IMG_EXTS = {".png", ".jpg"}  # compared case-insensitively, so ".JPG" is accepted too


# Extract SIFT descriptors for every image in a directory
def genSIFT(dataDir, outdir, outlist, outdict):
    sift = cv2.SIFT_create()  # needs OpenCV >= 4.4, where SIFT is in the main module
    imgList = sorted(os.listdir(dataDir))
    os.makedirs(outdir, exist_ok=True)
    for count, name in enumerate(imgList):
        if os.path.splitext(name)[-1].lower() not in IMG_EXTS:
            continue
        # Read the image, convert it to grayscale, compute the descriptors
        imgdata = cv2.imread(os.path.join(dataDir, name))
        if imgdata is None:  # unreadable file
            continue
        gray = cv2.cvtColor(imgdata, cv2.COLOR_BGR2GRAY)
        _, des = sift.detectAndCompute(gray, None)
        if des is None:  # no keypoints found; would break np.concatenate and predict
            continue
        outlist.append(des)
        outdict[name] = des
        # np.save(os.path.join(outdir, name), des)
        print(len(imgList), count)  # progress


# Clustering builds the visual vocabulary. MiniBatchKMeans is much faster than
# KMeans and its quality is not much worse.
def cluster(featureList, n):
    # Stack the SIFT descriptors of all training images and cluster them together
    X = np.concatenate(featureList)
    return MiniBatchKMeans(n_clusters=n, random_state=0, verbose=1).fit(X)


# Cosine similarity rescaled from [-1, 1] to [0, 1], used as the matching score
def get_cos_similar(v1, v2):
    num = float(np.dot(v1, v2))
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return 0.5 + 0.5 * (num / denom) if denom != 0 else 0


# Read the ground-truth file into a {test id: train id} dict
def getGroundTruth(dataPath):
    gtpair = {}
    with open(os.path.join(dataPath, "zubud_groundtruth.txt")) as f:
        lines = f.readlines()[1:]  # skip the "TEST TRAIN" header
    for line in lines:
        fields = line.split()  # also strips '\n' (line[:-1] would cut a digit off an unterminated last line)
        if len(fields) == 2:
            test, train = fields
            gtpair[test] = train
    return gtpair


# Turn each image into a word-frequency vector over the vocabulary (the clusters)
def getFeatureHistogram(dataDict, kmeans):
    outDict = {}
    for name, feat in dataDict.items():
        # minlength pads words that never occur, so every vector has n_clusters bins
        outDict[name] = np.bincount(kmeans.predict(feat), minlength=kmeans.n_clusters)
    return outDict


# Naive evaluation: compare every test image with every training image and
# take the one with the highest cosine similarity as the match
def predict(testHisDict, trainHisDict, gtpair):
    predictions = {}
    for testk, testhis in testHisDict.items():
        score, index = -1.0, ""
        for traink, trainhis in trainHisDict.items():
            s = get_cos_similar(testhis, trainhis)
            if s > score:
                score, index = s, traink
        predictions[testk] = index
    suc = 0
    for k, pred in predictions.items():
        tk = k[5:8]  # the code assumes characters 5-7 of a test file name hold its id
        pk = pred[7:10]  # and characters 7-9 of a training file name hold the object id
        if gtpair[tk] == pk:
            suc += 1
    return suc / len(predictions)


# Chain the steps together and vary the number of clusters to compare accuracy
def pipeline(n_list):
    result = []
    # 1. Extract SIFT features for the training and test sets
    t0 = time.time()
    genSIFT(TrainPath, TrainSIFTPath, TrainSIFT, Train_SIFT_dict)
    genSIFT(TestPath, TestSIFTPath, TestSIFT, Test_SIFT_dict)
    t1 = time.time()
    # 2. Read the ground truth
    gtpair = getGroundTruth(DataPath)
    for n in n_list:
        # 3. Cluster the training SIFT descriptors into n visual words
        t3 = time.time()
        clu = cluster(TrainSIFT, n)
        t4 = time.time()
        # 4. Compute each image's histogram over the visual words
        train_his = getFeatureHistogram(Train_SIFT_dict, clu)
        test_his = getFeatureHistogram(Test_SIFT_dict, clu)
        t5 = time.time()
        # 5. Match by cosine similarity and measure accuracy
        acc = predict(test_his, train_his, gtpair)
        t6 = time.time()
        info = {"sift": t1 - t0, "clu": t4 - t3, "calvw": t5 - t4, "predict": t6 - t5, "acc": acc}
        result.append(info)
        print(info)
    return result


result = pipeline([50, 100, 300, 600, 1000, 2000])
print(result)
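The `predict` function above calls `get_cos_similar` once per (test, train) pair inside a Python loop, roughly 115 × 1005 calls here. A sketch of a vectorized alternative, assuming the histograms are stacked into NumPy matrices with one row per image: L2-normalize the rows once, and a single matrix product yields every cosine similarity. `l2_normalize` and `top1_matches` are hypothetical names.

```python
import numpy as np


def l2_normalize(M: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows are left at zero."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0, 1.0, norms)


def top1_matches(test: np.ndarray, train: np.ndarray) -> np.ndarray:
    """Index of the most cosine-similar train row for every test row."""
    sims = l2_normalize(test.astype(float)) @ l2_normalize(train.astype(float)).T
    return sims.argmax(axis=1)  # sims has shape (n_test, n_train)


train = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2]])
test = np.array([[3, 0, 1], [0, 5, 1], [1, 1, 0]])
print(top1_matches(test, train))  # [0 2 1]
```

To plug this into the pipeline, one could build `names = list(train_his)` and `T = np.stack([train_his[n] for n in names])` (likewise for the test set), then map the returned indices back through `names`. The ranking is the same as with `0.5 + 0.5 * cos`, since that rescaling is monotonic.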

Results

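Six vocabulary sizes were tested. Accuracy rises as the number of clusters grows, but with too many clusters it drops again. In the experiments each image yielded only about 1000 to 1500 feature points on average, so 2000 clusters is too many and each image's histogram becomes sparse. The accuracy line chart is shown below; since nothing between 1000 and 2000 clusters was tested, accuracy may still improve somewhere in that range. With 600 clusters the accuracy is 75.65% (87 of the 115 test images), and with 1000 clusters it is 78.26% (90 of 115).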
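A quick arithmetic check of the sparsity argument, as a sketch: an image with n descriptors can hit at most n distinct visual words, so at least `n_words - n` bins of its histogram stay empty. With the 1000 to 1500 descriptors per image reported above, this bound is zero for 600 and 1000 words but 25% to 50% for 2000 words. It is only a lower bound (many descriptors share a word), so it is consistent with, but does not prove, the explanation above. `min_empty_fraction` is a hypothetical helper.

```python
def min_empty_fraction(n_descriptors: int, n_words: int) -> float:
    """Lower bound on the fraction of histogram bins that must be zero.

    n descriptors can occupy at most n distinct words, so at least
    n_words - n_descriptors bins stay empty (never a negative count).
    """
    return max(0, n_words - n_descriptors) / n_words


for k in (600, 1000, 2000):
    print(k, [min_empty_fraction(n, k) for n in (1000, 1500)])
# 600 [0.0, 0.0]
# 1000 [0.0, 0.0]
# 2000 [0.5, 0.25]
```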

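Timings, measured on a 2020 Mac Pro:

Extracting SIFT features for all images: about 55 s.
Clustering into 600 clusters: about 191 s; into 1000 clusters: about 251 s.
Computing the frequency histograms: about 6 s for 600 clusters, 9 s for 1000.
Prediction: about 1.5 s in every case.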
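The timings above come from pairs of `time.time()` calls in `pipeline`. A small context manager built on `time.perf_counter()`, a monotonic high-resolution clock intended for measuring intervals, is a tidier way to collect them; `timed` is a hypothetical helper, sketched under that assumption.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, log: dict):
    """Store the elapsed wall-clock seconds of the with-block in log[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so partial runs still get a timing
        log[label] = time.perf_counter() - start


timings = {}
with timed("sum", timings):
    total = sum(range(1_000_000))
print(sorted(timings))  # ['sum']
```

Inside `pipeline` it could wrap a step, for example `with timed("clu", info): clu = cluster(TrainSIFT, n)`, provided `info` is created before the step.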
