scrapy-cookie

Two ways to simulate a login

1. Send the cookie directly with the request

import re

import scrapy


class RenrenSpider(scrapy.Spider):
    name = 'renren'
    allowed_domains = ['renren.com']
    start_urls = ['http://renren.com/']

    # Override start_requests, which normally turns start_urls into requests,
    # so that the cookies can be attached.
    def start_requests(self):
        cookies = '''anonymid=juzai6na-g0fmvf; depovince=GW; _r01_=1; ick_login=9de0dec9-4f94-42e0-819b-22df9e9adf66; ick=75ca63f4-c056-4af0-ba6e-7683bb07d04d; 
jebecookies=747a7092-f53c-40ae-bc0b-90b3f9ab5e2d|||||; JSESSIONID=abcjUmG7wh1SragUBfEPw; _de=8B28AA93122391F898B641D1F469956B; p=9984be9e31957abbf89e6751ad2fd8f48; 
first_login_flag=1; ln_uact=18781736136; ln_hurl=http://head.xiaonei.com/photos/0/0/men_main.gif; t=59071958da717542e6a80ffd0df189c38; 
societyguester=59071958da717542e6a80ffd0df189c38; id=970578188; xnsid=a1ea20ee; ver=7.0; loginfrom=null; 
jebe_key=ed626104-9dc0-45aa-961c-2cfea0e1935d%7C570ae1432b36360003bbd95b7fb6661a%7C1556356655118%7C1%7C1556356654129; 
wp_fold=0; XNESSESSIONID=2d1bc0ef1740; vip=1'''
        # Convert the string to a dict. Strip each pair and split on the first
        # '=' only: the string spans several lines, and a value may contain '='.
        cookies = dict(i.strip().split('=', 1) for i in cookies.split(';') if '=' in i)
        start_urls = ['http://www.renren.com/970578188/profile?v=info_timeline']
        yield scrapy.Request(
            start_urls[0],
            callback=self.parse_detail,
            cookies=cookies
        )

    def parse_detail(self, response):
        res = response.xpath("//div[@class='love-infobox']/p/text()").extract_first()
        print(res)
        # '单身' means 'single' (relationship status) on the profile page
        # print(re.findall(r'单身', response.body.decode()))
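
The cookie string copied from the browser has to become a dict before it is passed to scrapy.Request. The following is a minimal sketch of that conversion as a standalone helper; the name parse_cookie_header and the sample string are illustrative assumptions, not part of the original spider.

```python
def parse_cookie_header(raw):
    """Turn a 'k1=v1; k2=v2' cookie string into a dict.

    Hypothetical helper for illustration: it tolerates line breaks between
    pairs and keeps any '=' that appears inside a value.
    """
    cookies = {}
    for pair in raw.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty fragments such as a trailing ';'
        name, value = pair.split('=', 1)  # split on the first '=' only
        cookies[name.strip()] = value.strip()
    return cookies


# Made-up sample data, not a real session
sample = 'a=1; \nb=x=y; c=; '
parsed = parse_cookie_header(sample)
```

The resulting dict can be passed as the cookies argument of scrapy.Request, as in start_requests above.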

2. Find the URL that the POST request is sent to, attach the form data, and send the request with scrapy.FormRequest

# Simulate logging in to GitHub

import scrapy


class Renren1Spider(scrapy.Spider):
    name = 'renren1'
    allowed_domains = ['github.com']
    start_urls = ['http://github.com/login']

    # Option 1: collect every form field by hand
    def parse(self, response):
        authenticity_token = response.xpath("//input[@name='authenticity_token']/@value").extract_first()
        utf8 = response.xpath("//input[@name='utf8']/@value").extract_first()
        commit = response.xpath("//input[@name='commit']/@value").extract_first()
        # All of the form data
        post_data = dict(
            login='tangpinggui',
            password='***********',
            authenticity_token=authenticity_token,
            utf8=utf8,
            commit=commit
        )
        yield scrapy.FormRequest(
            url="https://github.com/session",
            formdata=post_data,
            callback=self.after_login
        )

    # Option 2: an alternative to option 1. Python keeps only the last
    # definition of parse, so a real spider should contain just one of the two.
    def parse(self, response):
        # Only the login name and password are needed; FormRequest.from_response
        # picks up the remaining form data from the page.
        post_data = dict(
            login='tangpinggui',
            password='*********',
        )
        yield scrapy.FormRequest.from_response(
            response,  # the form is located in the response automatically
            formdata=post_data,
            callback=self.after_login
        )

    def after_login(self, response):
        res = response.xpath("//a[@class='btn btn-outline mt-2']/text()").extract_first()
        print(res)
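
To make the second option less of a black box, here is a rough sketch of the idea behind filling in form data from the page: read the input fields of the login form, then override them with the values you supply. It uses only the standard library and is not Scrapy's actual implementation; FormFieldCollector, build_form_data and SAMPLE_HTML are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self.done:
            self.in_form = True
        elif tag == 'input' and self.in_form and attrs.get('name'):
            # Inputs without a value attribute default to an empty string
            self.fields[attrs['name']] = attrs.get('value') or ''

    def handle_endtag(self, tag):
        if tag == 'form' and self.in_form:
            self.in_form = False
            self.done = True  # ignore any later forms on the page


def build_form_data(html, overrides):
    """Merge the form's existing fields with the caller's values."""
    collector = FormFieldCollector()
    collector.feed(html)
    data = dict(collector.fields)
    data.update(overrides)
    return data


# Made-up login form, loosely modelled on the fields used in option 1
SAMPLE_HTML = '''
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
  <input type="submit" name="commit" value="Sign in">
</form>
'''
form_data = build_form_data(SAMPLE_HTML, {'login': 'user', 'password': 'secret'})
```

In the sketch, the hidden authenticity_token and the commit value come from the page, while login and password come from the caller, which mirrors the split between option 1 and option 2.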

# Simulate logging in to Renren

import scrapy


class Renren1Spider(scrapy.Spider):
    name = 'renren1'
    allowed_domains = ['renren.com']
    start_urls = ['http://renren.com']

    """
    Form data of the login POST request:

    email: 18781736136
    icode:
    origURL: http://www.renren.com/970578188/profile?v=info_timeline
    domain: renren.com
    key_id: 1
    captcha_type: web_login
    password: 6af626fe325aa7fcea5e6ff3c541404d9104667d6d941a5c5c30390c2d5da8ad
    rkey: 86cfb8063d4b47d05407cc549819f975
    f:
    """

    # Option 1: collect every form field by hand
    def parse(self, response):
        origURL = response.xpath("//input[@name='origURL']/@value").extract_first()
        domain = 'renren.com'
        key_id = response.xpath("//input[@name='key_id']/@value").extract_first()
        captcha_type = response.xpath("//input[@name='captcha_type']/@value").extract_first()
        # rkey = response.xpath("//input[@name='rkey']/@value").extract_first()
        post_data = dict(
            email='1********',
            password='**********',
            origURL=origURL,
            domain=domain,
            key_id=key_id,
            captcha_type=captcha_type,
            # rkey='',  # not sure how to obtain it; login seems to work without it
            f=''
        )
        yield scrapy.FormRequest(
            url="http://www.renren.com/PLogin.do",
            formdata=post_data,
            callback=self.after_login
        )

    # Option 2: an alternative to option 1. Python keeps only the last
    # definition of parse, so a real spider should contain just one of the two.
    def parse(self, response):
        post_data = dict(
            email='1********',
            password='**************',
        )
        yield scrapy.FormRequest.from_response(
            response,  # the form is located in the response automatically
            formdata=post_data,
            callback=self.after_login
        )

    def after_login(self, response):
        print('start....')
        # Save the page so the login result can be inspected by hand
        with open('renren.html', 'w', encoding='utf-8') as f:
            f.write(response.body.decode())
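
Instead of opening the saved HTML by hand, the callback could also look for a piece of text that only appears after a successful login. The following is a minimal sketch under that assumption; looks_logged_in and the 'Log out' marker are hypothetical, and the right marker depends on the site being crawled.

```python
def looks_logged_in(body, marker, encoding='utf-8'):
    """Return True if `marker` occurs in the decoded page body.

    Hypothetical helper: `body` is the raw bytes of a response and `marker`
    is a string assumed to be present only on logged-in pages.
    """
    text = body.decode(encoding, errors='replace')
    return marker in text


# Made-up page fragments for illustration
sample_body = '<a href="/logout">Log out</a>'.encode('utf-8')
logged_in = looks_logged_in(sample_body, 'Log out')
anonymous = looks_logged_in(b'<form id="login"></form>', 'Log out')
```

Inside after_login this could be called as looks_logged_in(response.body, marker) with a marker chosen for the target site.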
