最近听说Python写爬虫很厉害,想体验一下

于是我就抱着这样的心态开始了我的Scrapy之旅

第一步当然是Scrapy的安装了,这玩意我发现在windows下倒是挺好装的,照着教程一路走就行了

官方的安装教程在这

http://doc.scrapy.org/en/latest/intro/install.html

想要麻烦点的话可以看开源中国的某大神写的教程

http://my.oschina.net/dragonblog/blog/173290

整个Scrapy的环境依赖如下:

Python 2.7
pip and setuptools Python packages. Nowadays pip requires and installs setuptools if not installed.
lxml. Most Linux distributions ships prepackaged versions of lxml. Otherwise refer to http://lxml.de/installation.html
OpenSSL. This comes preinstalled in all operating systems, except Windows where the Python installer ships it bundled.

这里我重点想说的不是windows下的安装,而是在RedHat系的CentOS下的安装,遇到了很多问题

之前看官方教程的时候看得快,一眼晃过去以为说非要2.7.9才能安装scrapy

Python 2.7.9 and later

于是下载了个Python2.7.9的源代码下来编译安装,结果出了一大堆问题

又因为yum本身就是基于python的,他们的那个python2.7.5是经过特殊扩展之后才发布的,所以装2.7.9的时候差点把yum也搞坏了

后来仔细一看,上面写着

pip included with Python

顿时就无语了,它的意思是说,在2.7.9版本之后,Python都自带了pip,好吧,怪我眼瞎,正确的安装方式如下

python是CentOS自带的,所以就不用再重复安装了

直接去下载get-pip.py

wget -c https://bootstrap.pypa.io/get-pip.py

下载完之后执行安装即可

python get-pip.py

装完之后直接输入pip回车看看,如果有显示参数列表的话就说明安装成功了

接着安装setuptools和lxml

pip install -U setuptools
pip install lxml

要是遇到了说找不到python.h的话,要先装一下python开发工具包

yum -y install python-devel

如果出现说没有找到合适的C Compiler的话,装一下gcc即可

yum -y install gcc

装完setuptools和lxml之后


可以安装openssl了,这里建议把devel也装上

yum -y install openssl openssl-devel

弄完之后最后就把Scrapy装上,这后面基本不会出现问题了

pip install scrapy

装完之后输入scrapy就可以知道是否安装成功了,安装完毕后的结果如下

# scrapy
Scrapy 1.0.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands      
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command