Python (1) A simple crawler: creating a folder and writing a file with Python on Windows

1. A simple crawler: scraping information about popular movies from Douban

Before we start: how to create a folder that does not exist yet and write a file into it

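 import os

 # "inn" does not exist yet: define the path first, create the directory
 # if it is missing, and after that it can be used
 t_path = "d:/py/inn"

 if not os.path.exists(t_path):
     os.makedirs(t_path)

 # 'a' opens the file for appending and creates it if necessary
 f = open(os.path.join(t_path, 'info.txt'), 'a')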
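
The same create-if-missing step can also be written with pathlib. The following is only a minimal sketch, assuming Python 3. The helper name `append_line` is hypothetical, and a temporary directory stands in for d:/py/inn so the example can run on any machine.

```python
import tempfile
from pathlib import Path


def append_line(folder, filename, line):
    """Create `folder` if needed, then append `line` to `filename` inside it."""
    target_dir = Path(folder)
    # parents=True also creates missing intermediate directories;
    # exist_ok=True makes the call a no-op when the directory already exists
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    with target.open('a', encoding='utf-8') as fh:
        fh.write(line + '\n')
    return target


# Demo in a throwaway directory (an assumption; the post uses d:/py/inn)
base = Path(tempfile.mkdtemp())
path = append_line(base / 'py' / 'inn', 'info.txt', 'hello')
append_line(base / 'py' / 'inn', 'info.txt', 'world')
print(path.read_text(encoding='utf-8'))
```

Calling the helper twice shows that the second call neither fails on the existing directory nor overwrites the first line.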

Skills covered: fetching a page's source, regular expressions, function calls, and declaring a global variable

 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-

 import re

 import requests

 # Output file; UTF-8 so that Chinese movie titles are written correctly
 f = open('info.txt', 'w', encoding='utf-8')

 # Running count of the movies written so far
 num = 0

 def write(htm):
     """Extract each movie's fields from the page and write them to the file."""
     global num  # num is defined at module level, so declare it global before assigning

     # Each match is the attribute text between "data-tit" and "data-enough"
     titl = re.findall('data-tit(.*?)data-enough', htm.text, re.S)
     for each in titl:
         info = {}
         # "each" starts with the rest of the data-title attribute: le="..."
         info['title'] = re.search('le="(.*?)"', each, re.S).group(1)
         info['year'] = re.search('data-release="(.*?)" data', each, re.S).group(1)
         info['Rating'] = re.findall('data-rate="(.*?)" data-star', each, re.S)[0]
         info['time'] = re.findall('data-duration="(.*?)" data-re', each, re.S)[0]
         info['reg'] = re.findall('data-region="(.*?)" data-dir', each, re.S)[0]
         info['act'] = re.findall('data-actors="(.*?)" data-in', each, re.S)[0]

         num = num + 1
         f.write('%d\n' % num)
         f.write('Title: ' + info['title'] + '\n')
         f.write('Cast: ' + info['act'] + '\n')
         f.write('Region: ' + info['reg'] + '\n')
         f.write('Release year: ' + info['year'] + '\n')
         f.write('Runtime: ' + info['time'] + '\n')
         f.write('Rating: ' + info['Rating'] + '\n\n')

 def getremen():
     """Fetch the Douban movie home page ("remen" means "popular") and pass it to write()."""
     url = 'http://movie.douban.com/'
     html = requests.get(url)
     html.encoding = 'utf-8'
     write(html)

 if __name__ == "__main__":
     getremen()
     f.close()  # make sure everything is flushed to disk
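
The extraction step can be tried offline on a fixed HTML fragment. What follows is a sketch under stated assumptions, not the original script. `SAMPLE_HTML` is made up to mimic the data-* attributes that the regular expressions above look for, and the real Douban markup may differ. The names `FIELDS`, `parse_movies` and `format_movies` are hypothetical. The sketch numbers the movies with `enumerate` instead of a global counter and returns an empty string when an attribute is missing.

```python
import re

# Made-up fragment imitating the attributes the script's regexes expect;
# this is an assumption, not a copy of the real page
SAMPLE_HTML = '''
<li data-title="Movie A" data-release="2015" data-rate="8.1" data-star="40"
    data-duration="120 min" data-region="USA" data-director="Someone"
    data-actors="X / Y" data-intro="" data-enough="true"></li>
<li data-title="Movie B" data-release="2014" data-rate="7.3" data-star="35"
    data-duration="98 min" data-region="Japan" data-director="Someone Else"
    data-actors="Z" data-intro="" data-enough="true"></li>
'''

# One non-greedy pattern per field; (.*?) stops at the first closing quote
FIELDS = {
    'title': r'data-title="(.*?)"',
    'year': r'data-release="(.*?)"',
    'rating': r'data-rate="(.*?)"',
    'time': r'data-duration="(.*?)"',
    'region': r'data-region="(.*?)"',
    'actors': r'data-actors="(.*?)"',
}


def parse_movies(html):
    """Return one dict per movie found in `html`."""
    movies = []
    # One chunk per movie: everything from data-title up to data-enough.
    # re.S lets "." match newlines, since attributes may span several lines.
    for chunk in re.findall(r'(data-title=.*?)data-enough', html, re.S):
        info = {}
        for key, pattern in FIELDS.items():
            match = re.search(pattern, chunk, re.S)
            # Fall back to '' instead of raising when an attribute is absent
            info[key] = match.group(1) if match else ''
        movies.append(info)
    return movies


def format_movies(movies):
    """Number the movies with enumerate rather than a global counter."""
    lines = []
    for index, movie in enumerate(movies, start=1):
        lines.append('%d: %s (%s), rating %s'
                     % (index, movie['title'], movie['year'], movie['rating']))
    return lines


movies = parse_movies(SAMPLE_HTML)
for line in format_movies(movies):
    print(line)
```

Separating parsing from writing makes it possible to check the regular expressions without touching the network or the output file.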
