Writing a Renren photo album crawler with the Python requests library

Worried that Renren might shut down, I wrote a crawler to download all the photos in my albums. Here is the code:

```python
# -*- coding: utf-8 -*-
import json
import os

import requests

# Renren user whose albums are downloaded (the author's account ID)
USER_ID = '278382090'


def mkdir(path):
    """Create `path` if it does not exist; return True only if it was created."""
    path = path.strip().rstrip('\\')
    if not os.path.exists(path):
        os.makedirs(path)
        print(path + ' created')
        return True
    print(path + ' already exists')
    return False


def login_renren(s):
    # Form fields of Renren's AJAX login; the password and rkey values are
    # copied from a captured browser login request
    login_data = {
        'email': 'your_username',
        'domain': 'renren.com',
        'origURL': 'http://www.renren.com/home',
        'key_id': '',
        'captcha_type': 'web_login',
        'password': 'password_from_captured_request',
        'rkey': 'rkey_from_captured_request',
    }
    r = s.post('http://www.renren.com/ajaxLogin/login?1=1&uniqueTimestamp=2016742045262',
               data=login_data)
    # Treat a response body containing 'true' as a successful login
    if 'true' in r.text:
        print('Logged in to Renren')
    else:
        print('Login failed')
    return s


def get_albums(s):
    r = s.get('http://photo.renren.com/photo/' + USER_ID + '/albumlist/v7?showAll=1')
    content = r.text
    # The album list is embedded in the page as a JS object literal between
    # 'nx.data.photo = ' and 'nx.data.hasHiddenAlbum ='
    marker = 'nx.data.photo = '
    start = content.find(marker) + len(marker)
    end = content.find('nx.data.hasHiddenAlbum =')
    target_json = content[start:end].strip()
    target_json = target_json[:-1]  # drop the ';' that ends the statement
    # The literal uses single quotes; swap them so json.loads accepts it
    data = json.loads(target_json.replace("'", '"'))
    album_list = data['albumList']
    print('{} albums in total'.format(album_list['albumCount']))
    return [album['albumId'] for album in album_list['albumList']]


def download_albums(album_ids, s, root='d:\\'):
    for album_id in album_ids:
        album_url = 'http://photo.renren.com/photo/{}/album-{}/v7'.format(USER_ID, album_id)
        r = s.get(album_url)
        if 'photoId' not in r.text:
            continue  # page does not look like an album page
        print('Opened album')
        content = r.text
        marker = 'nx.data.photo = '
        start = content.find(marker) + len(marker)
        end = content.find('; define.config')
        target_json = content[start:end].strip()
        # Trim the wrapper around the album object; these offsets were
        # fitted to the page markup when the script was written
        target_json = target_json[13:-2]
        data = json.loads(target_json.replace("'", '"'))
        photos = data['photoList']
        album_name = data['albumName']
        album_path = os.path.join(root, album_name)
        # Skip albums whose directory already exists (downloaded earlier)
        if not mkdir(album_path):
            print('Album already downloaded')
            continue
        for photo in photos:
            image_name = str(photo['photoId'])
            # Reuse the logged-in session instead of a bare requests.get
            img = s.get(photo['url'])
            image_path = os.path.join(album_path, image_name + '.jpg')
            with open(image_path, 'wb') as f:
                f.write(img.content)
            print('Photo ' + image_name + ' downloaded')


if __name__ == '__main__':
    # One session, so the login cookies are sent with every later request
    s = requests.Session()
    s = login_renren(s)
    album_ids = get_albums(s)
    download_albums(album_ids, s)
```
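Both get_albums and download_albums pull the embedded data out with str.find, an end marker, and fixed slicing (the 13-character/2-character trim on album pages in particular), so a small change to the page markup breaks them. A more tolerant approach is sketched below. It assumes the page still contains an assignment like `nx.data.photo = {...};` whose object is valid JSON apart from its single quotes. The sketch finds the first `{` after the marker and lets `json.JSONDecoder.raw_decode` parse exactly one object, ignoring whatever follows. `extract_photo_data` and the sample HTML are hypothetical, made up for illustration. The quote swap keeps the original's weakness: an apostrophe inside a value would still break parsing.

```python
import json

MARKER = 'nx.data.photo = '


def extract_photo_data(html, marker=MARKER):
    """Parse the object literal assigned after `marker` (hypothetical helper).

    Assumes the literal is valid JSON except that it uses single quotes.
    """
    start = html.find(marker)
    if start == -1:
        raise ValueError('marker not found: ' + marker)
    brace = html.find('{', start + len(marker))
    if brace == -1:
        raise ValueError('no object literal after marker')
    # Same single-to-double quote swap as the original script
    text = html[brace:].replace("'", '"')
    # raw_decode parses one JSON value and reports where it ended, so the
    # trailing ';' and the rest of the page need no end marker or offsets
    parsed, _end = json.JSONDecoder().raw_decode(text)
    return parsed


# Made-up page fragment shaped like the album-list page described above
sample = ("<script>nx.data.photo = {'albumList': {'albumCount': 2, "
          "'albumList': [{'albumId': '101', 'albumName': 'Trip'}, "
          "{'albumId': '102', 'albumName': 'Family'}]}};\n"
          "nx.data.hasHiddenAlbum = false;</script>")
data = extract_photo_data(sample)
print([a['albumId'] for a in data['albumList']['albumList']])  # ['101', '102']
```

In get_albums this would replace the find/slice lines. For album pages the original also trims a wrapper with the 13/2 character slice. The post doesn't show what that wrapper looks like, so how to index into the parsed result there would need checking against a real page.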

Done! When run, the script prints a line for each album directory it creates and each photo it downloads.
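One fragile spot remains in download_albums: the album name is used directly as a directory name under `d:\`. On Windows, names containing any of `< > : " / \ | ? *` cannot be created, so an album titled, say, "Trip: 2010" would make `os.makedirs` fail. Windows also drops trailing dots and spaces. The sketch below replaces such characters with underscores and keeps everything else, including Chinese characters. `safe_dirname` is a hypothetical helper, not part of the original script, and underscore replacement is just one possible policy.

```python
import re

# Characters Windows does not allow in file or directory names,
# plus ASCII control characters
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_dirname(name, fallback='album'):
    """Return `name` with Windows-invalid characters replaced by '_' (hypothetical helper)."""
    cleaned = _INVALID.sub('_', name)
    # Windows silently drops trailing dots and spaces, so remove them explicitly
    cleaned = cleaned.rstrip(' .')
    # Fall back to a fixed name if nothing usable is left
    return cleaned or fallback


print(safe_dirname('Trip: 2010/05?'))  # Trip_ 2010_05_
print(safe_dirname('...'))             # album
```

In download_albums it would wrap the album name before the directory path is built, e.g. `safe_dirname(album_name, str(album_id))`. With that call, the album ID serves as the fallback directory name.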
