Centos下安装Scrapy
Scrapy是一个开源的机遇twisted框架的python的单机爬虫,该爬虫实际上包含大多数网页抓取的工具包,用于爬虫下载端以及抽取端。
安装环境:
centos5.4
python2.7.3
安装步骤:
1.下载python2.7 http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz

[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt [root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz [root@zxy-websgs Python-2.7.3]# ./configure [root@zxy-websgs Python-2.7.3]# make && make install

验证python2.7安装
[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
2.安装setuptools,http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz
[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py install
3.安装Twisted
[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg
Twisted要安装zope.interface,可以从下面地址下载
zope.interface:http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz
twisted:http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2
5.安装w3lib

[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib

w3lib:http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz
6.安装libxml2或者用easy_install安装lxml
安装失败时参考:http://www.coder4.com/archives/3660
[root@zxy-websgs lxml-3.1.0]# easy_install lxml
验证lxml安装
[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
也可以安装libxml2,官网上推荐安装2.6.28或者以上的版本,但在官网上没找到,我先是安装的2.6.9的版本,运行scrapy时报以下错误

Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 5, in <module>
pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
execute()
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
cmds = _get_commands_dict(inproject)
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
cmds = _get_commands_from_module('scrapy.commands', inproject)
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
for cmd in _iter_command_classes(module):
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
for module in walk_modules(module_name):
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
submod = __import__(fullpath, {}, {}, [''])
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
from scrapy.shell import Shell
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
from scrapy.selector.libxml2sel import *
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
from .factories import xmlDoc_from_html, xmlDoc_from_xml
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'

升级到2.6.21版本以后解决了。
libxml2.6.1:ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
7.安装pyOpenSSL(这个是可选安装的,主要为了使scrapy能够支持https)
用easy_install pyOpenSSL安装的是pyOpenSSL-0.13版本,没安装成功,于是手动下载.011版本来进行安装。
[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install
pyOpenSSL:http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
8.安装scrapy
[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy
验证安装

[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project Usage:
scrapy <command> [options] [args] Available commands:
fetch Fetch a URL using the Scrapy downloader
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy [ more ] More commands available when run from project directory Use "scrapy <command> -h" to see more info about a command

scrapy:http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
总结:
pyOpenSSL单独安装的时候不成功,也可以先下载pyOpenSSL0.11进行安装,再使用easy_install -U Scrapy进行全程安装
yuanwen ::: http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
Centos下安装Scrapy的更多相关文章
- CentOS下安装hadoop
CentOS下安装hadoop 用户配置 添加用户 adduser hadoop passwd hadoop 权限配置 chmod u+w /etc/sudoers vi /etc/sudoers 在 ...
- CentOS下安装使用start-stop-daemon
CentOS下安装使用start-stop-daemon 在centos下下了个自启动的服务器脚本 执行的时候发现找不到start-stop-daemon命令 好吧 执行手动编译一下 加上这个命令 w ...
- 从零开始学 Java - CentOS 下安装 Tomcat
生活以痛吻我,我仍报之以歌 昨天晚上看到那个冯大辉老师的微信公众号,「小道消息」上的一篇文章,<生活以痛吻我,我仍报之以歌>.知乎一篇匿名回答,主题为<冯大辉到底是不是技术大牛,一个 ...
- CentOS 下安装
2016年12月5日15:25:58 ----------------------------------- 通常情况下在centos下安装软件就用yum. 关键是,使用yum你要知道安装包的名字是什 ...
- [原创] ubuntu下安装scrapy报错 error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Ubuntu14.04在virtualenv下安装scrapy报错,Failed building wheel for cffi,lxml,cryptography 等. error: command ...
- [Linux]CentOS下安装和使用tmux
前天随意点开博客园,看到了一篇关于tmux的文章 Tmux - Linux从业者必备利器,特意还点进去看了.毕竟Linux对于做游戏服务端开发的我来说,太熟悉不过了.不过我就粗略地看了一眼,就关掉了. ...
- CentOS下安装JDK7 转载
转载地址:http://www.cnblogs.com/rilley/archive/2012/02/02/2335395.html CentOS下安装JDK7 下载地址:http://www.ora ...
- Centos下安装mysql 总结
一.MySQL安装 Centos下安装mysql 请点开:http://www.centoscn.com/CentosServer/sql/2013/0817/1285.html 二.MySQL的几个 ...
- 在centos下安装django
这里有一个不错的Django的学习资料.先收藏一下,以备后用.谢谢 http://www.ziqiangxuetang.com/django/django-install.html 在centos下安 ...
随机推荐
- C# Winform ListView使用
以下内容均来自网上,个人收集整理,具体出处也难确认了,就没写出处了: 一.基本使用: listView.View = View.Details;//设置视图 listView.SmallImageLi ...
- js 默认行为取消
js 默认行为取消 可以简单的 return false; 看需求吧 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transit ...
- cocos2dx游戏开发——微信打飞机学习笔记(八)——EnemyLayer的搭建
一.创建文件= = EnemyLayer.h EnemyLayer.cpp Ps:我绝对不是在凑字数~. 二.How to do? (1)一些宏 ...
- BroadcastRecevier广播接受者
广播接收器的两种注册方式: 1)动态注册:在代码中注册,创建一个IntentFilter(意图过滤器)对象,设置想要就收的广播,在onCreate()方法中通过调用registerReceiver() ...
- PDA手持终端在ERP系统仓库管理出入库盘点环节的应用
PDA手持终端在ERP系统仓库管理出入库盘点环节的应用 传统库存管理的数据录入过程,常采用PC机录入数据,或在电脑上结合条码枪扫描条码进行管理(非实时),造成管理上的不便.因而,采用无线(WIFI)手 ...
- oracle处理考勤时间,拆分考勤时间段的sql语句
最近一直在用mysql数据库做云项目,有段时间没有接触oracle了,昨天有朋友叫我帮忙用oracle处理一个考勤记录的需求,我在考虑如何尽量精简实现上面花了一些时间.于是把这个实现做个总结. 需求如 ...
- 数学+高精度 ZOJ 2313 Chinese Girls' Amusement
题目传送门 /* 杭电一题(ACM_steps 2.2.4)的升级版,使用到高精度: 这次不是简单的猜出来的了,求的是GCD (n, k) == 1 最大的k(1, n/2): 1. 若n是奇数,则k ...
- java中equals和hashCode方法的解析
解析Java对象的equals()和hashCode()的使用 前言 在Java语言中,equals()和hashCode()两个函数的使用是紧密配合的,你要是自己设计其中一个,就要设计另外一个.在多 ...
- ZOJ3362 Beer Problem(最小费用任意流)
题目大概说有n个城市,由m条无向边相连,每条边每天最多运送cap桶酒且其运送一桶的花费是cost.现在从1号城市开始出发运酒,供应到2到n号城市,这些城市的收购单价是price,问最大的盈利是多少. ...
- BZOJ1580 : [Usaco2009 Hol]Cattle Bruisers 杀手游戏
以贝茜为参照物,则贝茜固定于原点,每个杀手是一个圆心在某条射线上的圆. 解出每个杀手可以射杀贝茜的时间区间,然后扫描线即可,时间复杂度$O(n\log n)$. #include<cstdio& ...