Python3 Web Scraping 04 (More Examples, e.g. Processing the Content of Fetched Web Pages)

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import os
import re
import requests
from bs4 import BeautifulSoup

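# Print the jokes ("duanzi") on the Qiushibaike front page
res = requests.get("https://www.qiushibaike.com/")
qiushi = res.content
soup = BeautifulSoup(qiushi, "html.parser")
duanzis = soup.find_all(class_="content")  # each joke sits in a class="content" element
for i in duanzis:
    # first child node of the inner <span>, normally the first line of text
    duanzi = i.span.contents[0]
    # i.span.string would return None whenever the span has more than one child
    print(duanzi)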

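# Download the images from a 699pic search results page into ./jpg
res = requests.get("http://699pic.com/sousuo-218808-13-1-0-0-0.html")
image = res.content
soup = BeautifulSoup(image, "html.parser")
images = soup.find_all(class_="lazy")  # lazy-loaded images

save_dir = os.path.join(os.getcwd(), "jpg")
os.makedirs(save_dir, exist_ok=True)  # create ./jpg if it does not exist yet

for i in images:
    original = i["data-original"]  # real image URL used by the lazy loader
    title = i["title"]             # image title, reused as the file name
    try:
        # download first, so a failed request does not leave an empty file behind
        data = requests.get(original).content
        with open(os.path.join(save_dir, title + ".jpg"), "wb") as file:
            file.write(data)
    except (requests.RequestException, OSError) as e:
        # skip failed downloads and titles that are not valid file names
        print("skipped:", title, e)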

r = requests.get("http://699pic.com/sousuo-218808-13-1.html")
fengjing = r.content
soup = BeautifulSoup(fengjing, "html.parser")
# find all elements with class="lazy"
images = soup.find_all(class_="lazy")
# print(images)  # a ResultSet, i.e. a list of Tag objects

for i in images:
    jpg_rl = i["data-original"]  # image URL
    title = i["title"]           # image title
    print(title)
    print(jpg_rl)
    print("")

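# Inspect the structure of a cnblogs blog page
r = requests.get("http://www.cnblogs.com/nicetime/")
blog = r.content
# soup = BeautifulSoup(blog, "html.parser")  # built-in parser
soup = BeautifulSoup(blog, features="lxml")   # lxml parser (requires: pip install lxml)
# soup.contents[0] is usually the <!DOCTYPE> string, which has no .contents,
# so start from the <html> tag instead
print(soup.html.contents)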

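# three ways to locate an element; each line overwrites tag
tag = soup.find("div")                            # first <div> on the page
tag = soup.find(class_="menu-bar menu clearfix")  # exact value of the class attribute
tag = soup.find(id="menu")                        # element with id="menu"
print(list(tag))                                  # iterating a Tag yields its direct children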

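tag01 = soup.find(class_="c_b_p_desc")  # first element with class="c_b_p_desc"

# .contents is a list of direct children, .children iterates over the same nodes,
# and .descendants walks every nested tag and string recursively
print(len(tag01.contents))
print(len(list(tag01.children)))
print(len(list(tag01.descendants)))

print(tag01.contents)
print(tag01.children)  # prints an iterator object, not the children themselves
for i in tag01.children:
    print(i)

# iterating over the tag itself is equivalent to iterating over .children
for i in tag01:
    print(i)

print(tag01.contents[0].string)  # a string node's .string is the string itself
print(tag01.contents[1])         # second child (assumes tag01 has at least two)
print(tag01.contents[1].string)  # None if that child itself has several children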

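# A GBK-encoded page whose text requests decodes incorrectly
url = "http://www.dygod.net/html/tv/oumeitv/109673.html"
s = requests.get(url)
# Without a charset in the Content-Type header, requests decodes the page as
# ISO-8859-1; encoding back to ISO-8859-1 restores the raw bytes, which are
# then decoded as GBK (in that case the same result as s.content.decode("gbk"))
print(s.text.encode("iso-8859-1").decode("gbk"))
# Extract the download links: href values whose link text starts with "ftp"
res = re.findall('href="(.*?)">ftp', s.text)
for resi in res:
    a = resi.encode("iso-8859-1").decode("gbk")
    print(a)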
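The Qiushibaike loop in the script above prints only `i.span.contents[0]`, the first child of each `<span>`. Its commented-out alternative, `i.span.string`, returns `None` as soon as the span holds more than one child.

The sketch below shows the difference. It assumes a joke is stored as lines of text separated by `<br/>` tags; the markup is made up for illustration and is not copied from the site. Calling `get_text()` with a separator and `strip=True` collects every line:

```python
from bs4 import BeautifulSoup

# Made-up markup imitating one joke block; multi-line jokes use <br/> line breaks.
html = '<div class="content"><span>First line<br/>Second line</span></div>'
span = BeautifulSoup(html, "html.parser").find(class_="content").span

print(span.string)                      # None: the span has three children
print(span.contents[0])                 # First line (only the first text node)
print(span.get_text("\n", strip=True))  # every text node, one per line
```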
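The image-download loop uses each image's `title` attribute directly as a file name. A title containing characters such as `:`, `?` or `/` therefore makes `open()` fail, or, in the case of `/`, point into a subdirectory that does not exist.

One way to avoid this is to sanitize the title first. The helpers `safe_filename` and `image_path` below are hypothetical. The rejected character set is an assumption based on the characters Windows forbids in file names:

```python
import os
import re

# Characters Windows forbids in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, ext: str = ".jpg", default: str = "untitled") -> str:
    """Hypothetical helper: turn a scraped image title into a usable file name."""
    name = _ILLEGAL.sub("_", title).strip()
    name = name.rstrip(". ")  # Windows silently drops trailing dots and spaces
    return (name or default) + ext


def image_path(save_dir: str, title: str) -> str:
    """Hypothetical helper: full path under save_dir for an image with this title."""
    return os.path.join(save_dir, safe_filename(title))


print(safe_filename("风景/山水:高清"))  # 风景_山水_高清.jpg
print(safe_filename("  ..  "))         # untitled.jpg
print(image_path("jpg", "sunset"))
```

In the download loop, the file name would then be built with `safe_filename(title)` instead of `title + '.jpg'`. Two images with the same title still overwrite each other; appending a counter is a simple extension.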
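Both 699pic loops read `data-original` rather than `src` because the images are lazy-loaded. In the common lazy-loading convention, `src` holds a placeholder and a script copies the real URL from `data-original` once the image scrolls into view. The exact 699pic markup is not verified here.

For comparison, here is a stdlib-only sketch of the same extraction, using `html.parser` instead of BeautifulSoup on made-up markup. It differs from `find_all(class_="lazy")` in two ways:

- it accepts only `<img>` tags;
- it skips tags that have no `data-original`.

```python
from html.parser import HTMLParser


class LazyImageParser(HTMLParser):
    """Collect (title, data-original) pairs from <img class="lazy"> tags."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for valueless attributes
        classes = (a.get("class") or "").split()
        if tag == "img" and "lazy" in classes and a.get("data-original"):
            self.images.append((a.get("title", ""), a["data-original"]))


# Made-up markup: src is a placeholder, the real URL sits in data-original.
html = """
<img class="lazy" src="placeholder.gif" data-original="https://example.com/1.jpg" title="mountain">
<img class="logo" src="logo.png">
<img class="lazy pic" data-original="https://example.com/2.jpg" title="lake">
"""
p = LazyImageParser()
p.feed(html)
print(p.images)
```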
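The three `soup.find(...)` calls on the cnblogs page each locate an element in a different way. The self-contained sketch below runs them on made-up markup modelled on that menu bar, and shows how the matching rules differ:

- A multi-class string such as `"menu-bar menu clearfix"` passed to `class_` matches only that exact attribute value.
- A single class name matches any element that carries that class.
- A CSS selector passed to `select_one()` does not depend on the order of the classes.

```python
from bs4 import BeautifulSoup

# Made-up markup modelled on the blog's navigation bar.
html = """
<div id="header">site title</div>
<div id="menu" class="menu-bar menu clearfix"><a href="/">Home</a><a href="/rss">RSS</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

first_div = soup.find("div")                            # first <div> in document order
by_class = soup.find(class_="menu-bar menu clearfix")   # exact class-attribute string
by_one_class = soup.find(class_="menu")                 # any single class also matches
by_id = soup.find(id="menu")
reordered = soup.find(class_="menu clearfix menu-bar")  # None: the string order differs
by_css = soup.select_one("div.menu.clearfix")           # CSS selector, order-independent

print(first_div["id"])    # header
print(by_class is by_id)  # True: both calls return the same Tag object
print(reordered)          # None
print([a.get_text() for a in by_id.find_all("a")])
```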
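The `tag01` part of the script compares `.contents`, `.children` and `.descendants`. Because the live page changes, here is the same comparison on a tiny made-up fragment that reuses the `c_b_p_desc` class name, so the counts are predictable:

```python
from bs4 import BeautifulSoup

# Made-up fragment: one text node followed by one <a> tag with its own text.
html = '<div class="c_b_p_desc">Intro text <a href="/p/1">read more</a></div>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.find(class_="c_b_p_desc")

print(tag.contents)                # list of the direct children
print(len(tag.contents))           # 2: the text node and the <a> tag
print(len(list(tag.children)))     # 2: the same nodes, produced by an iterator
print(len(list(tag.descendants)))  # 3: also includes the text inside <a>
print(tag.contents[0].string)      # a string node's .string is the node itself
print(tag.contents[1].string)      # read more: the <a> has a single string child
print(tag.string)                  # None: the div has more than one child
```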
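The dygod.net page is GBK-encoded. When the response's `Content-Type` header carries no charset, requests falls back to ISO-8859-1, so `s.text` comes out garbled. Calling `.encode("iso-8859-1")` recovers the original bytes, because ISO-8859-1 maps every byte to exactly one character. `.decode("gbk")` then reads those bytes correctly.

An offline sketch of that round trip (the function name `fix_mojibake` is made up):

```python
def fix_mojibake(text: str, wrong: str = "iso-8859-1", right: str = "gbk") -> str:
    """Re-decode text that was decoded with the wrong codec.

    ISO-8859-1 maps each byte 0x00-0xFF to exactly one character, so encoding
    the garbled text with it gives back the original bytes unchanged.
    """
    return text.encode(wrong).decode(right)


raw = "美剧下载".encode("gbk")      # bytes as a GBK-encoded page would send them
garbled = raw.decode("iso-8859-1")  # what s.text looks like after the fallback
print(garbled)                      # unreadable Latin-1 characters
print(fix_mojibake(garbled))        # 美剧下载
```

With requests itself there are two more direct options:

- set `s.encoding = "gbk"` before reading `s.text`;
- decode the raw bytes yourself with `s.content.decode("gbk")`.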
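The link extraction uses the non-greedy pattern `href="(.*?)">ftp`. The sketch below shows why the `?` matters, using a made-up snippet shaped like a list of FTP download links. A greedy `.*` would swallow everything up to the last `">ftp` on the line:

```python
import re

# Made-up snippet shaped like the page's list of download links.
page = '<a href="ftp://a/ep01.mkv">ftp download</a> <a href="ftp://a/ep02.mkv">ftp download</a>'

lazy = re.findall(r'href="(.*?)">ftp', page)   # non-greedy: stops at the first '">ftp'
greedy = re.findall(r'href="(.*)">ftp', page)  # greedy: runs on to the last '">ftp'
print(lazy)    # ['ftp://a/ep01.mkv', 'ftp://a/ep02.mkv']
print(greedy)  # a single match that spans both links
```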
