内网有个网页用了HTTP基本认证机制,想用gocolly爬取,不知道怎么登录,只好研究HTTP基本认证机制

参考这里:https://www.jb51.net/article/89070.htm

下面开始参考作者dotcoo了:-)

看了<<http权威指南>>第12章HTTP基本认证机制(本站下载地址://www.jb51.net/books/93254.html),感觉讲的蛮详细的,写了一个小小例子测试.

请求响应过程:

==>
GET /hello HTTP/1.1
Host: 127.0.0.1:
<==
HTTP/1.1 Unauthorized
WWW-Authenticate: Basic realm="Dotcoo User Login"
==>
GET /hello HTTP/1.1
Host: 127.0.0.1:
Authorization: Basic YWRtaW46YWRtaW5wd2Q=
<==
HTTP/1.1 OK
Content-Type: text/plain; charset=utf-

golang HTTP基本认证机制的实现代码

package main
import (
"fmt"
"io"
"net/http"
"log"
"encoding/base64"
"strings"
)
// hello world, the web server
func HelloServer(w http.ResponseWriter, req *http.Request) {
auth := req.Header.Get("Authorization")
if auth == "" {
w.Header().Set("WWW-Authenticate", `Basic realm="Dotcoo User Login"`)
w.WriteHeader(http.StatusUnauthorized)
return
}
fmt.Println(auth)
auths := strings.SplitN(auth, " ", )
if len(auths) != {
fmt.Println("error")
return
}
authMethod := auths[]
authB64 := auths[]
switch authMethod {
case "Basic":
authstr, err := base64.StdEncoding.DecodeString(authB64)
if err != nil {
fmt.Println(err)
io.WriteString(w, "Unauthorized!\n")
return
}
fmt.Println(string(authstr))
userPwd := strings.SplitN(string(authstr), ":", )
if len(userPwd) != {
fmt.Println("error")
return
}
username := userPwd[]
password := userPwd[]
fmt.Println("Username:", username)
fmt.Println("Password:", password)
fmt.Println()
default:
fmt.Println("error")
return
}
io.WriteString(w, "hello, world!\n")
}
func main() {
http.HandleFunc("/hello", HelloServer)
err := http.ListenAndServe(":8000", nil)
if err != nil {
log.Fatal("ListenAndServe: ", err)
}
}

试验了上面的例子后,基本明白了HTTP基本认证的过程。但是怎么用gocolly访问呢?

参考:https://stackoverflow.com/questions/50576248/using-colly-framework-i-cant-login-to-the-evernote-account

但是答复者Matías Insaurralde提供的模拟浏览器访问的例子编译不通过,不明白其中的hptsKey的意思。代码放在下面供参考(可跳过):

package evernote

import (
"bytes"
"errors"
"fmt"
"io/ioutil"
"net/http"
"net/http/cookiejar"
"net/url"
"regexp"
"strings"
) const (
evernoteLoginURL = "https://www.evernote.com/Login.action"
) var (
evernoteJSParamsExpr = regexp.MustCompile(`document.getElementById\("(.*)"\).value = "(.*)"`)
evernoteRedirectExpr = regexp.MustCompile(`Redirecting to <a href="(.*)">`) errNoMatches = errors.New("No matches")
errRedirectURL = errors.New("Redirect URL not found")
) // EvernoteClient wraps all methods required to interact with the website.
type EvernoteClient struct {
Username string
Password string
httpClient *http.Client // These parameters persist during the login process:
hpts string
hptsh string
} // NewEvernoteClient initializes a new Evernote client.
func NewEvernoteClient(username, password string) *EvernoteClient {
// Allocate a new cookie jar to mimic the browser behavior:
cookieJar, _ := cookiejar.New(nil) // Fill up basic data:
c := &EvernoteClient{
Username: username,
Password: password,
} // When initializing the http.Client, copy default values from http.DefaultClient
// Pass a pointer to the cookie jar that was created earlier:
c.httpClient = &http.Client{
Transport: http.DefaultTransport,
CheckRedirect: http.DefaultClient.CheckRedirect,
Jar: cookieJar,
Timeout: http.DefaultClient.Timeout,
}
return c
} func (e *EvernoteClient) extractJSParams(body []byte) (err error) {
matches := evernoteJSParamsExpr.FindAllSubmatch(body, -)
if len(matches) == {
return errNoMatches
}
for _, submatches := range matches {
if len(submatches) < {
err = errNoMatches
break
}
key := submatches[]
val := submatches[] if bytes.Compare(key, hptsKey) == {
e.hpts = string(val)
}
if bytes.Compare(key, hptshKey) == {
e.hptsh = string(val)
}
}
return nil
} // Login handles the login action.
func (e *EvernoteClient) Login() error {
// First step: fetch the login page as a browser visitor would do:
res, err := e.httpClient.Get(evernoteLoginURL)
if err != nil {
return err
}
if res.Body == nil {
return errors.New("No response body")
}
body, err := ioutil.ReadAll(res.Body)
if err != nil {
return err
}
err = e.extractJSParams(body)
if err != nil {
return err
} // Second step: we have extracted the "hpts" and "hptsh" parameters
// We send a request using only the username and setting "evaluateUsername":
values := &url.Values{}
values.Set("username", e.Username)
values.Set("evaluateUsername", "")
values.Set("analyticsLoginOrigin", "login_action")
values.Set("clipperFlow", "false")
values.Set("showSwitchService", "true")
values.Set("hpts", e.hpts)
values.Set("hptsh", e.hptsh) rawValues := values.Encode()
req, err := http.NewRequest(http.MethodPost, evernoteLoginURL, bytes.NewBufferString(rawValues))
if err != nil {
return err
}
req.Header.Set("Accept", "application/json")
req.Header.Set("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
req.Header.Set("x-requested-with", "XMLHttpRequest")
req.Header.Set("referer", evernoteLoginURL)
res, err = e.httpClient.Do(req)
if err != nil {
return err
}
body, err = ioutil.ReadAll(res.Body)
if err != nil {
return err
}
bodyStr := string(body)
if !strings.Contains(bodyStr, `"usePasswordAuth":true`) {
return errors.New("Password auth not enabled")
} // Third step: do the final request, append password to form data:
values.Del("evaluateUsername")
values.Set("password", e.Password)
values.Set("login", "Sign in") rawValues = values.Encode()
req, err = http.NewRequest(http.MethodPost, evernoteLoginURL, bytes.NewBufferString(rawValues))
if err != nil {
return err
}
req.Header.Set("Accept", "text/html")
req.Header.Set("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
req.Header.Set("x-requested-with", "XMLHttpRequest")
req.Header.Set("referer", evernoteLoginURL)
res, err = e.httpClient.Do(req)
if err != nil {
return err
} // Check the body in order to find the redirect URL:
body, err = ioutil.ReadAll(res.Body)
if err != nil {
return err
}
bodyStr = string(body)
matches := evernoteRedirectExpr.FindAllStringSubmatch(bodyStr, -)
if len(matches) == {
return errRedirectURL
}
m := matches[]
if len(m) < {
return errRedirectURL
}
redirectURL := m[]
fmt.Println("Login is ok, redirect URL:", redirectURL)
return nil
}
After you successfully get the redirect URL, you should be able to send authenticated requests as long as you keep using the HTTP client that was used for the login process, the cookie jar plays a very important role here. To call this code use: func main() {
evernoteClient := NewEvernoteClient("user@company", "password")
err := evernoteClient.Login()
if err != nil {
panic(err)
}
}

只好自己写,经反复试验,发现对于本文开头自己写的server,只需以下代码即可通过验证,输出了hello,world!(将访问方式改为POST也一样。)

package main

import (
"fmt" "io/ioutil"
"net/http"
) // Login handles the login action.
func Login() {
//生成client 参数为默认
client := &http.Client{}
//要访问的url
url := "http://localhost:8000/hello"
//要提交的请求
req, _ := http.NewRequest("GET", url, nil)
//最重要的一句,用户名和密码可随意写
req.SetBasicAuth("aa", "bb")
fmt.Println("POST访问")
//返回结果
res, _ := client.Do(req)
defer res.Body.Close()
fmt.Println("header:")
header := res.Header
fmt.Println(header)
fmt.Println("realm:")
basicRealm := res.Header.Get("Www-Authenticate")
fmt.Println(basicRealm)
fmt.Println("body:")
body, _ := ioutil.ReadAll(res.Body)
fmt.Println(string(body)) } func main() {
Login()
}

查看SetBasicAuth的定义为(liteide中在光标位置按Ctrl+shift+J):

func (r *Request) SetBasicAuth(username, password string) {
r.Header.Set("Authorization", "Basic "+basicAuth(username, password))
}

而basicAuth的定义为

func basicAuth(username, password string) string {
auth := username + ":" + password
return base64.StdEncoding.EncodeToString([]byte(auth))
}

那么,用gocolly访问的代码如下:

package main

import (
"encoding/base64"
"fmt"
"net/http" "github.com/gocolly/colly"
) func basicAuth(username, password string) string {
auth := username + ":" + password
return base64.StdEncoding.EncodeToString([]byte(auth))
}
func main() {
c := colly.NewCollector()
h := http.Header{}
h.Set("Authorization", "Basic "+basicAuth("aaaa", "bbbb")) c.OnResponse(func(r *colly.Response) {
//fmt.Println(r)
fmt.Println(string(r.Body))
}) c.Request("GET", "http://localhost:8000/hello", nil, nil, h)
}

注:对于其他网站,也许要用Fiddler抓包,设置相应的header和cookie才行。

(golang)HTTP基本认证机制及使用gocolly登录爬取的更多相关文章

  1. Golang+chromedp+goquery 简单爬取动态数据

    目录 Golang+chromedp+goquery 简单爬取动态数据 Golang的安装 下载golang软件 解压golang 配置golang 重新导入配置 chromedp框架的使用 实际的代 ...

  2. Atitit HTTP 认证机制基本验证 (Basic Authentication) 和摘要验证 (Digest Authentication)attilax总结

    Atitit HTTP认证机制基本验证 (Basic Authentication) 和摘要验证 (Digest Authentication)attilax总结 1.1. 最广泛使用的是基本验证 ( ...

  3. WPS 认证机制

    WPS 认证机制 WPS(Wi-Fi Protected Setup,Wi-Fi保护设置)(有的叫做AOSS.有的叫做QSS,不过功能都一致.)是由Wi-Fi联盟组织实施的认证项目,主要致力于简化无线 ...

  4. 基于Token的WEB后台认证机制

    几种常用的认证机制 HTTP Basic Auth HTTP Basic Auth简单点说明就是每次请求API时都提供用户的username和password,简言之,Basic Auth是配合RES ...

  5. HTTP中的摘要认证机制

    引子: 指定和服务器端交互的HTTP方法,URL地址,即其他请求信息: Method:表示http请求方法,一般使用"GET","POST". url:表示请求 ...

  6. USB Type-C 应用面临安全性考验,USB-IF 将推动新认证机制

    USB 应用已经达到空前盛况,横跨电脑.移动设备.周边设备.影音器材等范畴,是一个极为普遍常见的界面.进入 USB Type-C 世代由于一并推动 USB-PD,过去没有严格执行的认证要求,基于安全性 ...

  7. 一个NB的安全认证机制

    这是一个NB的安全认证机制. 1.这是一个安全认证机制 2.可以防止黑客截获到客户端发送的请求消息,避免了黑客冒充客户端向服务器发送操作的请求. 原理与步骤: 1.客户端与服务器端都会放着一份验证用的 ...

  8. 【pac4j】OAuth 认证机制 入门篇

    1,pac4j是什么? pac4j是一个支持多种支持多种协议的身份认证的Java客户端. 2,pac4j的12种客户端认证机制:目前我只有用过第一和第八种. OAuth (1.0 & 2.0) ...

  9. web安全认证机制知多少

    如今web服务随处可见,成千上万的web程序被部署到公网上供用户访问,有些系统只针对指定用户开放,属于安全级别较高的web应用,他们需要有一种认证机制以保护系统资源的安全,本文将探讨五种常用的认证机制 ...

随机推荐

  1. 为VIP解决问题时写的源码

    平时为学生们解决问题时,建立的项目源代码,方便大家学习与讨论. 开源DEMO列表 1. https://github.com/bfyxzls/student_orderBy 2. https://gi ...

  2. 从零打卡leetcode之day 4--无重复最长字符串

    题目描述: 给定一个字符串,找出不含有重复字符的最长子串的长度. 示例: 给定 "abcabcbb" ,没有重复字符的最长子串是 "abc" ,那么长度就是3. ...

  3. leetcode — subsets-ii

    import java.util.ArrayList; import java.util.Arrays; import java.util.List; /** * Source : https://o ...

  4. 解决ajax跨域问题

    JQuery ajax支持get方式的跨域,采用了jsonp来完成.完成跨域请求的有两种方式实现.一种是使用Jquery ajax最底层的Api实现跨域的请求,而另一种则是JQuery ajax的高级 ...

  5. Linux学习笔记之Django项目部署(CentOS)----进阶篇

    一.引入 当我们开发好了一个Django项目之后是需要部署到服务器上的,这样才能正式使用这个项目.之前用了一个运行.sh文件的方法让项目得以在后台运行,其实随着学习的深入,这种方法其实是有点low的, ...

  6. MySQL中 and or 查询的优先级

    这个可能是容易被忽略的问题,首选我们要清楚:MySQL中,AND的执行优先级高于OR.也就是说,在没有小括号()的限制下,总是优先执行AND语句,再执行OR语句.比如: select * from t ...

  7. MatrixTree速成

    前言 MatrixTree定理是用来解决生成树计数问题的有利工具 比如说这道题 MatrixTree定理的算法流程也非常简单 我们记矩阵\(A\)为无向图的度数矩阵 记矩阵\(D\)为无向图的邻接矩阵 ...

  8. 阿里云服务器部署Office online注意事项

    阿里云服务器部署Office online注意事项 一.参考配置 实例规格:4核8GB(IO优化) 网络带宽:5Mbps 系统盘:40G 存储盘:200G OS:Windows Server 2016 ...

  9. Android apk安装时出现“解析软件包错误”

    有时候在安装apk的时候会出现解析软件包出错 (Android studio)解决方法如下: 关闭Instant Run功能: File-Settings-...看下图: 将红色框内的勾取消. 如果还 ...

  10. Android ContenObserver 监听联系人数据变化

    一.知识介绍 1.ContentProvider是内容提供者 ContentResolver是内容解决者(对内容提供的数据进行操作) ContentObserver是内容观察者(观察内容提供者提供的数 ...