Go Pentester - HTTP CLIENTS(5)
Parsing Document Metadata with Bing Scaping
Set up the environment - install goquery package.
https://github.com/PuerkitoBio/goquery
go get github.com/PuerkitoBio/goquery

Modify the Proxy setting if in China. Refer to: https://sum.golang.org/

Unzip an Office file and analyze the Open XML file struct. "creator", "lastModifiedBy" in core.xml and "Application", "Company", "AppVersion" in app.xml are of primary interest.

Defining the metadata Package and mapping the data to structs in GO to open, parse, and extract Office Open XML documents.
package metadata import (
"archive/zip"
"encoding/xml"
"strings"
) // Open XML type definition and version mapping
type OfficeCoreProperty struct {
XMLName xml.Name `xml:"coreProperties"`
Creator string `xml:"creator"`
LastModifiedBy string `xml:"lastModifiedBy"`
} type OfficeAppProperty struct {
XMLName xml.Name `xml:"Properties"`
Application string `xml:"Application"`
Company string `xml:"Company"`
Version string `xml:"AppVersion"`
} var OfficeVersion = map[string]string{
"16": "2016",
"15": "2013",
"14": "2010",
"12": "2007",
"11": "2003",
} func (a *OfficeAppProperty) GetMajorVersion() string {
tokens := strings.Split(a.Version, ".") if len(tokens) < 2 {
return "Unknown"
}
v, ok := OfficeVersion[tokens[0]]
if !ok {
return "Unknown"
}
return v
} // Processing Open XML archives and embedded XML documents
func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {
var coreProps OfficeCoreProperty
var appProps OfficeAppProperty for _, f := range r.File {
switch f.Name {
case "docProps/core.xml":
if err := process(f, &coreProps); err != nil {
return nil, nil, err
}
case "docProps/app.xml":
if err := process(f, &appProps); err != nil {
return nil, nil, err
}
default:
continue
}
}
return &coreProps, &appProps, nil
} func process(f *zip.File, prop interface{}) error {
rc, err := f.Open()
if err != nil {
return err
}
defer rc.Close() if err := xml.NewDecoder(rc).Decode(&prop); err != nil {
return err
} return nil
}
Figure out how to search for and retrieve files by using Bing.
1. Submit a search request to Bing with proper filters to retrieve targeted results.
2. Scrape the HTML response, extracting the HRER(link) data to obtain direct URLs for documents.
3. Submit an HTTP request for each direct document URL.
4. Parse the response body to create a zip.Reader.
5. Pass the zip.Reader into the code you already developed to extract metadata.
Analyze the search result elements in Bing.

Now scrap Bing results and parse the document metadata.
package metadata import (
"archive/zip"
"encoding/xml"
"strings"
) // Open XML type definition and version mapping
type OfficeCoreProperty struct {
XMLName xml.Name `xml:"coreProperties"`
Creator string `xml:"creator"`
LastModifiedBy string `xml:"lastModifiedBy"`
} type OfficeAppProperty struct {
XMLName xml.Name `xml:"Properties"`
Application string `xml:"Application"`
Company string `xml:"Company"`
Version string `xml:"AppVersion"`
} var OfficeVersion = map[string]string{
"16": "2016",
"15": "2013",
"14": "2010",
"12": "2007",
"11": "2003",
} func (a *OfficeAppProperty) GetMajorVersion() string {
tokens := strings.Split(a.Version, ".") if len(tokens) < 2 {
return "Unknown"
}
v, ok := OfficeVersion[tokens[0]]
if !ok {
return "Unknown"
}
return v
} // Processing Open XML archives and embedded XML documents
func NewProperties(r *zip.Reader) (*OfficeCoreProperty, *OfficeAppProperty, error) {
var coreProps OfficeCoreProperty
var appProps OfficeAppProperty for _, f := range r.File {
switch f.Name {
case "docProps/core.xml":
if err := process(f, &coreProps); err != nil {
return nil, nil, err
}
case "docProps/app.xml":
if err := process(f, &appProps); err != nil {
return nil, nil, err
}
default:
continue
}
}
return &coreProps, &appProps, nil
} func process(f *zip.File, prop interface{}) error {
rc, err := f.Open()
if err != nil {
return err
}
defer rc.Close() if err := xml.NewDecoder(rc).Decode(&prop); err != nil {
return err
} return nil
}

Go Pentester - HTTP CLIENTS(5)的更多相关文章
- Go Pentester - HTTP CLIENTS(1)
Building HTTP Clients that interact with a variety of security tools and resources. Basic Preparatio ...
- Go Pentester - HTTP CLIENTS(4)
Interacting with Metasploit msf.go package rpc import ( "bytes" "fmt" "gopk ...
- Go Pentester - HTTP CLIENTS(3)
Interacting with Metasploit Early-stage Preparation: Setting up your environment - start the Metaspl ...
- Go Pentester - HTTP CLIENTS(2)
Building an HTTP Client That Interacts with Shodan Shadon(URL:https://www.shodan.io/) is the world' ...
- Creating a radius based VPN with support for Windows clients
This article discusses setting up up an integrated IPSec/L2TP VPN using Radius and integrating it wi ...
- Deploying JRE (Native Plug-in) for Windows Clients in Oracle E-Business Suite Release 12 (文档 ID 393931.1)
In This Document Section 1: Overview Section 2: Pre-Upgrade Steps Section 3: Upgrade and Configurati ...
- ZK 使用Clients.response
参考: http://stackoverflow.com/questions/11416386/how-to-access-au-response-sent-from-server-side-at-c ...
- MySQL之aborted connections和aborted clients
影响Aborted_clients 值的可能是客户端连接异常关闭,或wait_timeout值过小. 最近线上遇到一个问题,接口日志发现有很多超时报错,根据日志定位到数据库实例之后发现一切正常,一般来 ...
- 【渗透测试学习平台】 web for pentester -2.SQL注入
Example 1 字符类型的注入,无过滤 http://192.168.91.139/sqli/example1.php?name=root http://192.168.91.139/sqli/e ...
随机推荐
- Dubbo——服务目录
引言 前面几篇文章分析了Dubbo的核心工作原理,本篇将对之前涉及到但却未细讲的服务目录进行深入分析,在开始之前先结合前面的文章思考下什么是服务目录?它的作用是什么? 正文 概念及作用 清楚Dubbo ...
- 【JMeter_11】JMeter逻辑控制器__Switch控制器<Switch Controller>
Switch控制器<Switch Controller> 业务逻辑: 取得switch value的值,通过对节点下所有取样器.逻辑控制器的下标.名称匹配去执行,switch value的 ...
- Python 为什么推荐蛇形命名法?
关于变量的命名,这又是一个容易引发程序员论战的话题.如何命名才能更具有可读性.易写性与明义性呢?众说纷纭. 本期"Python为什么"栏目,我们将聚焦于变量命名中的连接方式,来切入 ...
- 《Java并发编程的艺术》第5章 Java中的锁 ——学习笔记
参考https://www.cnblogs.com/lilinzhiyu/p/8125195.html 5.1 Lock接口 锁是用来控制多个线程访问共享资源的方式. 一般来说一个锁可以防止多个线程同 ...
- 11.实战交付一套dubbo微服务到k8s集群(3)之dubbo微服务底包镜像制作
1.下载jre镜像并推送到harbor [root@hdss7- ~]# docker pull registry.cn-hangzhou.aliyuncs.com/yfhub/jre8:8u112 ...
- 阿里面试官最喜欢问的21个HashMap面试题
1.HashMap 的数据结构? A:哈希表结构(链表散列:数组+链表)实现,结合数组和链表的优点.当链表长度超过 8 时,链表转换为红黑树. transient Node<K,V>\[\ ...
- Spark HA搭建
正文 下载Spark版本,这版本又要求必须和jdk与hadoop版本对应. http://spark.apache.org/downloads.html tar -zxvf 解压到指定目录,进入con ...
- Python3-threading模块-多线程
什么是线程? 线程是操作系统能够进行运算调度的最小单位.它被包含在进程之中,是进程中的实际运作单位.一条线程指的是进程中一个单一顺序的控制流,一个进程中可以并发多个线程,每条线程并行执行不同的任务 P ...
- ceph bluestore与 filestore 数据存放的区别
一. filestore 对象所在的PG以文件方式放在xfs文件中 1 查看所有的osd硬盘,跟其他linux其他硬盘一样,被挂载一个目录中. [root@hz-storage1 ~]# df -h ...
- 实战笔记丨JDBC问题定位指南
JDBC(Java数据库连接性)是Java API,用于管理与数据库的连接,发出查询和命令以及处理从数据库获得的结果集.JDBC在1997年作为JDK 1.1的一部分发布,是为Java持久层开发的首批 ...