Setting Up a Proxy Pool with proxy_pool


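Introduction

Project: https://github.com/jhao104/proxy_pool

This GitHub project is a proxy pool built on a Python crawler. It periodically fetches free proxies, checks that they work, and adds the working ones to the pool.

Let's set it up step by step.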

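Steps

1. Install and configure Redis

The crawled proxies are stored in a Redis database, so Redis has to be installed first.

Redis officially recommends installing on Linux. There are two main ways to do it: install the package directly, or build it manually.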

- Install with the package manager

apt-get install redis-server

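- Manual installation

Download the Redis release archive from the official site (6.2.6 in the commands below) and copy it to the Linux machine.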

tar -zxvf redis-6.2.6.tar.gz

cd redis-6.2.6/

make

make install

cd /usr/local/bin

mkdir config

cp /opt/redis-6.2.6/redis.conf config		# assumes the archive was extracted under /opt

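Editing the configuration file

Edit the Redis configuration file. Its location depends on how Redis was installed: /etc/redis/redis.conf for the package install, /opt/redis-6.2.6/redis.conf for the manual install (if you copied it to /usr/local/bin/config as above, edit that copy, since it is the file redis-server is started with). Make the following changes: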

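# run Redis as a daemon
daemonize yes

# disable protected mode
protected-mode no

# this line allows local access only, so it must be commented out
# bind 127.0.0.1 ::1

# port Redis listens on (open it in the firewall if the server has one)
port 6379

With protected mode off and the bind line commented out, Redis accepts connections from any host, so set a password with requirepass or restrict access to the port in the firewall.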
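To double-check the edits before starting Redis, a small script can read the configuration text and report the active directives. This is only a sketch: parse_redis_conf is a hypothetical helper name, and SAMPLE stands in for the contents of your real redis.conf.

```python
def parse_redis_conf(text):
    """Return the active (uncommented) directives of a redis.conf as a dict.

    Directives that may legitimately repeat (for example "save") keep only
    their last value here, which is enough for this check.
    """
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)  # split on the first run of whitespace
        value = parts[1].strip() if len(parts) > 1 else ""
        directives[parts[0].lower()] = value
    return directives


# SAMPLE stands in for the file contents; read your own redis.conf instead.
SAMPLE = """\
daemonize yes
protected-mode no
# bind 127.0.0.1 ::1
port 6379
"""

conf = parse_redis_conf(SAMPLE)
wanted = {"daemonize": "yes", "protected-mode": "no", "port": "6379"}
for name, value in wanted.items():
    print(name, "ok" if conf.get(name) == value else "needs editing")
print("bind still active:", "bind" in conf)
```

To check the real file, pass open(path).read() to parse_redis_conf, where path is the configuration file you edited.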

Starting Redis

redis-server config/redis.conf

redis-cli

To stop it, run the following inside redis-cli:

shutdown

exit
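Since the pool will connect to Redis from another host or container, it can help to confirm that the port is reachable from outside. The sketch below only tests whether a TCP connection succeeds; port_open is a hypothetical helper, and it does not verify that the service behind the port is actually Redis.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_open("[ip]", 6379) with your server address in place of [ip] should return True once Redis is running and the firewall allows the port.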

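2. Pull and run the project

According to the project documentation, it can be configured manually or deployed with Docker (recommended).

For Docker usage, see my other blog post.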

docker pull jhao104/proxy_pool

docker run --env DB_CONN=redis://:[password]@[ip]:[port]/[db] -p 5010:5010 jhao104/proxy_pool:latest

password can be left empty if Redis has none

db defaults to 0
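The DB_CONN value is easy to get wrong by hand, so here is a minimal sketch that assembles it from its parts. build_db_conn is a hypothetical helper, and 192.0.2.10 is a placeholder address, not a real server. The password is inserted as-is, so special characters in it are not handled here.

```python
from urllib.parse import urlsplit


def build_db_conn(ip, port=6379, password="", db=0):
    """Assemble the redis://:[password]@[ip]:[port]/[db] string for DB_CONN."""
    return f"redis://:{password}@{ip}:{port}/{db}"


# Placeholder address; an empty password and db 0 are the defaults.
url = build_db_conn("192.0.2.10")
print(url)

# Parse the result back to confirm each part landed in the right place.
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.path)
```

The printed string is what goes after DB_CONN= in the docker run command above.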

If it starts successfully, the container output should look like this: [screenshot]
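Another way to confirm the pool is working is to query its HTTP API on the published port 5010. The sketch below assumes the /get/ endpoint and the "proxy" field as I understand them from the project README; api_url, fetch_proxy and proxy_address are hypothetical helper names, and the sample records are made up for illustration.

```python
import json
from urllib.request import urlopen


def api_url(host, path="get", port=5010):
    """Build the URL of a pool API endpoint (endpoint names are assumptions)."""
    return f"http://{host}:{port}/{path}/"


def fetch_proxy(host, port=5010, timeout=5):
    """Request one proxy from the pool and return the decoded JSON object."""
    with urlopen(api_url(host, "get", port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def proxy_address(record):
    """Return the 'host:port' string of a record, or None if it has no proxy."""
    return record.get("proxy") or None


# Made-up records showing the two cases the helper distinguishes.
print(proxy_address({"proxy": "203.0.113.5:8080", "https": False}))
print(proxy_address({"code": 0, "src": "no proxy"}))
```

Calling proxy_address(fetch_proxy("[ip]")) with your server address should return a proxy once the crawler has filled the pool, which can take a few minutes after the first start.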

3. Generate the configuration file and import it into Proxifier

First install the redis package with pip

pip install redis

Run the following script. Be sure to replace the Redis ip and port placeholders in the redis.ConnectionPool call.

# -*- coding:utf8 -*-
import json
from xml.etree import ElementTree

import redis


def RedisProxyGet():
    """Read every proxy record from the 'use_proxy' hash in Redis."""
    ConnectString = []
    # Replace [ip] and [port] with the address and port of your Redis server.
    pool = redis.ConnectionPool(host='[ip]', port=[port], db=0, decode_responses=True)
    use_proxy = redis.Redis(connection_pool=pool)
    key = use_proxy.hkeys('use_proxy')
    for temp in key:
        try:
            ConnectString.append(json.loads(use_proxy.hget('use_proxy', temp)))
        except (TypeError, json.JSONDecodeError):
            # Skip records that are not valid JSON or vanished since hkeys().
            pass
    return ConnectString


def xmlOutputs(data):
    """Write the proxies in data to ProxifierConf.ppx as a Proxifier profile."""
    i = 101
    ProxyIDList = []
    ProxifierProfile = ElementTree.Element("ProxifierProfile")
    ProxifierProfile.set("version", str(i))
    ProxifierProfile.set("platform", "Windows")
    ProxifierProfile.set("product_id", "0")
    ProxifierProfile.set("product_minver", "310")

    # Global options
    Options = ElementTree.SubElement(ProxifierProfile, "Options")
    Resolve = ElementTree.SubElement(Options, "Resolve")
    AutoModeDetection = ElementTree.SubElement(Resolve, "AutoModeDetection")
    AutoModeDetection.set("enabled", "false")
    ViaProxy = ElementTree.SubElement(Resolve, "ViaProxy")
    ViaProxy.set("enabled", "false")
    TryLocalDnsFirst = ElementTree.SubElement(ViaProxy, "TryLocalDnsFirst")
    TryLocalDnsFirst.set("enabled", "false")
    ExclusionList = ElementTree.SubElement(Resolve, "ExclusionList")
    ExclusionList.text = "%ComputerName%; localhost; *.local"
    Encryption = ElementTree.SubElement(Options, "Encryption")
    Encryption.set("mode", 'basic')
    # Remaining on/off options, in the same order as before.
    for tag, enabled in (
        ("HttpProxiesSupport", 'true'),
        ("HandleDirectConnections", 'false'),
        ("ConnectionLoopDetection", 'true'),
        ("ProcessServices", 'false'),
        ("ProcessOtherUsers", 'false'),
    ):
        ElementTree.SubElement(Options, tag).set("enabled", enabled)

    # Proxy list: one <Proxy> element per record from the pool
    ProxyList = ElementTree.SubElement(ProxifierProfile, "ProxyList")
    for temp in data:
        i += 1  # ids count up from 101, so the first proxy gets 102
        Proxy = ElementTree.SubElement(ProxyList, "Proxy")
        Proxy.set("id", str(i))
        if not temp['https']:
            Proxy.set("type", "HTTP")
        else:
            Proxy.set("type", "HTTPS")
            Proxy.text = str(i)
            ProxyIDList.append(i)  # only HTTPS proxies are added to the chain
        Address = ElementTree.SubElement(Proxy, "Address")
        Address.text = temp['proxy'].split(":", 1)[0]
        Port = ElementTree.SubElement(Proxy, "Port")
        Port.text = temp['proxy'].split(":", 1)[1]
        ProxyOptions = ElementTree.SubElement(Proxy, "Options")
        ProxyOptions.text = "48"

    # Chain made of the proxies collected in ProxyIDList
    ChainList = ElementTree.SubElement(ProxifierProfile, "ChainList")
    Chain = ElementTree.SubElement(ChainList, "Chain")
    # The chain takes the current value of i, i.e. the id of the last proxy above.
    Chain.set("id", str(i))
    Chain.set("type", "simple")
    Name = ElementTree.SubElement(Chain, "Name")
    Name.text = "AgentPool"
    for temp_id in ProxyIDList:
        Proxy = ElementTree.SubElement(Chain, "Proxy")
        Proxy.set("enabled", "true")
        Proxy.text = str(temp_id)

    RuleList = ElementTree.SubElement(ProxifierProfile, "RuleList")

    # Rule: connect directly for this application
    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule, "Name")
    Applications = ElementTree.SubElement(Rule, "Applications")
    Action = ElementTree.SubElement(Rule, "Action")
    Name.text = "御剑后台扫描工具.exe [auto-created]"
    Applications.text = "御剑后台扫描工具.exe"
    Action.set("type", "Direct")

    # Rule: connect directly to localhost
    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule, "Name")
    Targets = ElementTree.SubElement(Rule, "Targets")
    Action = ElementTree.SubElement(Rule, "Action")
    Name.text = "Localhost"
    Targets.text = "localhost; 127.0.0.1; %ComputerName%"
    Action.set("type", "Direct")

    # Rule: everything else goes through id 102 (the first proxy in ProxyList)
    Rule = ElementTree.SubElement(RuleList, "Rule")
    Rule.set("enabled", "true")
    Name = ElementTree.SubElement(Rule, "Name")
    Action = ElementTree.SubElement(Rule, "Action")
    Name.text = "Default"
    Action.text = "102"
    Action.set("type", "Proxy")

    tree = ElementTree.ElementTree(ProxifierProfile)
    tree.write("ProxifierConf.ppx", encoding="UTF-8", xml_declaration=True)


if __name__ == '__main__':
    proxy_data = RedisProxyGet()
    xmlOutputs(proxy_data)
    print("ProxifierConf.ppx configuration file created.")

A successful run generates the file ProxifierConf.ppx. Double-click it to import it into Proxifier.
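Before importing, it can be worth checking what actually ended up in the profile, since the free proxies in the pool change constantly. The following sketch summarizes a profile with ElementTree; summarize_profile is a hypothetical helper, and SAMPLE is a hand-written miniature profile using the same element names as the script, not real output.

```python
from xml.etree import ElementTree


def summarize_profile(root):
    """Count proxies by type and list the chain members and default action."""
    by_type = {}
    for proxy in root.findall("./ProxyList/Proxy"):
        kind = proxy.get("type")
        by_type[kind] = by_type.get(kind, 0) + 1
    chain = [p.text for p in root.findall("./ChainList/Chain/Proxy")]
    default_action = None
    for rule in root.findall("./RuleList/Rule"):
        if rule.findtext("Name") == "Default":
            default_action = rule.find("Action").text
    return {"by_type": by_type, "chain": chain, "default_action": default_action}


# Hand-written miniature profile with placeholder addresses.
SAMPLE = """
<ProxifierProfile version="101" platform="Windows">
  <ProxyList>
    <Proxy id="102" type="HTTP"><Address>203.0.113.5</Address><Port>8080</Port></Proxy>
    <Proxy id="103" type="HTTPS"><Address>203.0.113.6</Address><Port>3128</Port></Proxy>
  </ProxyList>
  <ChainList>
    <Chain id="103" type="simple"><Name>AgentPool</Name><Proxy enabled="true">103</Proxy></Chain>
  </ChainList>
  <RuleList>
    <Rule enabled="true"><Name>Default</Name><Action type="Proxy">102</Action></Rule>
  </RuleList>
</ProxifierProfile>
"""

summary = summarize_profile(ElementTree.fromstring(SAMPLE.strip()))
print(summary)
```

For the generated file, pass ElementTree.parse("ProxifierConf.ppx").getroot() to summarize_profile instead of the sample.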

The Proxifier version must not be too new, otherwise it reports an error. Version 3.3.1 is recommended.
