Parsing Document Metadata with Bing Scraping

Set up the environment: install the goquery package.

https://github.com/PuerkitoBio/goquery

go get github.com/PuerkitoBio/goquery

If you are in China, you may need to change the Go module proxy setting before go get succeeds. Refer to: https://sum.golang.org/

Unzip an Office file and analyze the Open XML file structure. The "creator" and "lastModifiedBy" elements in docProps/core.xml and the "Application", "Company", and "AppVersion" elements in docProps/app.xml are of primary interest.
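The same inspection can be done from Go by treating the document as a ZIP archive and listing its entries. The following is a minimal sketch: sampleArchive and propertyFiles are hypothetical helper names, and the archive is a small in-memory stand-in with placeholder contents rather than a real Office file. With a real .docx on disk, zip.OpenReader would give access to the same File list.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"strings"
)

// sampleArchive builds a small in-memory ZIP that mimics the layout of an
// Office Open XML document. The entry contents are placeholders.
func sampleArchive() (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range []string{
		"[Content_Types].xml",
		"docProps/app.xml",
		"docProps/core.xml",
		"word/document.xml",
	} {
		f, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := f.Write([]byte("<placeholder/>")); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

// propertyFiles returns the names of the entries under docProps/, which is
// where core.xml and app.xml live.
func propertyFiles(r *zip.Reader) []string {
	var names []string
	for _, f := range r.File {
		if strings.HasPrefix(f.Name, "docProps/") {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	r, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	for _, name := range propertyFiles(r) {
		fmt.Println(name)
	}
}
```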

Define the metadata package and map the XML data to Go structs in order to open, parse, and extract metadata from Office Open XML documents.

package metadata

import (
	"archive/zip"
	"encoding/xml"
	"strings"
)

// Open XML type definition and version mapping

// OfficeCoreProperty holds the fields of interest from docProps/core.xml.
type OfficeCoreProperty struct {
	XMLName        xml.Name `xml:"coreProperties"`
	Creator        string   `xml:"creator"`
	LastModifiedBy string   `xml:"lastModifiedBy"`
}

// OfficeAppProperty holds the fields of interest from docProps/app.xml.
type OfficeAppProperty struct {
	XMLName     xml.Name `xml:"Properties"`
	Application string   `xml:"Application"`
	Company     string   `xml:"Company"`
	Version     string   `xml:"AppVersion"`
}

// OfficeVersion maps the major part of AppVersion to an Office release.
var OfficeVersion = map[string]string{
	"16": "2016",
	"15": "2013",
	"14": "2010",
	"12": "2007",
	"11": "2003",
}

// GetMajorVersion returns the Office release name for the AppVersion value,
// or "Unknown" if the value is malformed or not in the OfficeVersion map.
func (a *OfficeAppProperty) GetMajorVersion() string {
	tokens := strings.Split(a.Version, ".")
	if len(tokens) < 2 {
		return "Unknown"
	}
	v, ok := OfficeVersion[tokens[0]]
	if !ok {
		return "Unknown"
	}
	return v
}

// Processing Open XML archives and embedded XML documents

// NewProperties walks the archive and decodes core.xml and app.xml.
func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {
	var coreProps OfficeCoreProperty
	var appProps OfficeAppProperty

	for _, f := range r.File {
		switch f.Name {
		case "docProps/core.xml":
			if err := process(f, &coreProps); err != nil {
				return nil, nil, err
			}
		case "docProps/app.xml":
			if err := process(f, &appProps); err != nil {
				return nil, nil, err
			}
		default:
			continue
		}
	}
	return &coreProps, &appProps, nil
}

// process opens one archive entry and decodes its XML into prop,
// which must be a pointer to the target struct.
func process(f *zip.File, prop interface{}) error {
	rc, err := f.Open()
	if err != nil {
		return err
	}
	defer rc.Close()

	// prop is already a pointer, so it is passed to Decode directly.
	if err := xml.NewDecoder(rc).Decode(prop); err != nil {
		return err
	}
	return nil
}
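To see how these struct tags line up with the XML, the sketch below decodes two hand-written sample fragments. The values (Alice, Bob, Example Corp) are invented for illustration, and coreProps, appProps, and majorVersion are trimmed local stand-ins for the types above so that the example is self-contained. Real core.xml files prefix their elements with namespaces such as dc: and cp:, and encoding/xml matches a tag without a namespace against the local element name, which is why the plain "creator" tag still works.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// coreProps and appProps mirror the struct tags used in the metadata package.
type coreProps struct {
	XMLName        xml.Name `xml:"coreProperties"`
	Creator        string   `xml:"creator"`
	LastModifiedBy string   `xml:"lastModifiedBy"`
}

type appProps struct {
	XMLName     xml.Name `xml:"Properties"`
	Application string   `xml:"Application"`
	Company     string   `xml:"Company"`
	Version     string   `xml:"AppVersion"`
}

// Sample fragments with invented values, shaped like docProps/core.xml
// and docProps/app.xml.
const sampleCore = `<cp:coreProperties
  xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Alice</dc:creator>
  <cp:lastModifiedBy>Bob</cp:lastModifiedBy>
</cp:coreProperties>`

const sampleApp = `<Properties
  xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
  <Application>Microsoft Office Word</Application>
  <Company>Example Corp</Company>
  <AppVersion>16.0000</AppVersion>
</Properties>`

// majorVersion returns the part of AppVersion before the first dot, which is
// the key that the OfficeVersion map is indexed by.
func majorVersion(appVersion string) string {
	return strings.Split(appVersion, ".")[0]
}

// decodeSamples unmarshals both fragments into their structs.
func decodeSamples() (coreProps, appProps, error) {
	var c coreProps
	var a appProps
	if err := xml.Unmarshal([]byte(sampleCore), &c); err != nil {
		return c, a, err
	}
	if err := xml.Unmarshal([]byte(sampleApp), &a); err != nil {
		return c, a, err
	}
	return c, a, nil
}

func main() {
	c, a, err := decodeSamples()
	if err != nil {
		panic(err)
	}
	fmt.Println("creator:", c.Creator)
	fmt.Println("lastModifiedBy:", c.LastModifiedBy)
	fmt.Println("application:", a.Application)
	fmt.Println("company:", a.Company)
	fmt.Println("major version key:", majorVersion(a.Version))
}
```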

Figure out how to search for and retrieve files by using Bing.

1. Submit a search request to Bing with proper filters to retrieve targeted results.

2. Scrape the HTML response, extracting the HREF (link) data to obtain direct URLs for documents.

3. Submit an HTTP request for each direct document URL.

4. Parse the response body to create a zip.Reader.

5. Pass the zip.Reader into the code you already developed to extract metadata.

Analyze the search result elements in Bing.

Now scrape the Bing results and parse the document metadata.

