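In accordance with savefrom's terms of use, this example and tutorial are intended only for learning and discussion; all rights belong to savefrom.net.

The final code, comments included, comes to roughly 100 lines. The GitHub repository is the authoritative version (bugs may be fixed there); this article only walks through the code in detail.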

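Project address

Approach

Walkthrough

1. POST request

Following the first step of the approach article, we first need to send a POST request to obtain the encrypted JavaScript payload. I use the third-party requests library for this; for background on web scraping, see my earlier articles.

i. First, write out the headers for the POST request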

    # set the headers or the website will not return information

    # the cookies in here you may need to change

    headers = {

        "cache-Control": "no-cache",

        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"

                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",

        "accept-encoding": "gzip, deflate, br",

        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",

        "content-type": "application/x-www-form-urlencoded",

        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "

                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "

                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "

                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "

                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "

                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",

        "origin": "https://en.savefrom.net",

        "pragma": "no-cache",

        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",

        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",

        "sec-ch-ua-mobile": "?0",

        "sec-fetch-dest": "iframe",

        "sec-fetch-mode": "navigate",

        "sec-fetch-site": "same-origin",

        "sec-fetch-user": "?1",

        "upgrade-insecure-requests": "1",

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "

                      "Chrome/87.0.4280.88 Safari/537.36"}

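The cookie value may need changing; ideally, copy the one from your own browser. The meaning of each header is outside the scope of this article and is easy to look up with a search engine.

ii. Next, write out the form parameters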

    # set the parameters; they can be copied from Chrome

    kv = {"sf_url": url,

          "sf_submit": "",

          "new": "1",

          "lang": "en",

          "app": "",

          "country": "cn",

          "os": "Windows",

          "browser": "Chrome"}

The sf_url field is the URL of the YouTube video we want to download; the other parameters stay the same.

iii. Finally, send the POST request with `requests`

    # do the POST request

    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,

                      data=kv)

    r.raise_for_status()

Note that the parameters are passed as data=kv, so they are sent as a form-encoded request body.
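To make the effect of data=kv concrete, here is a minimal sketch using only the standard library. It shows roughly what a form-encoded body looks like for such a dict. The shortened kv below is for illustration only, and urlencode is used here as a stand-in for the encoding that requests performs internally.

```python
from urllib.parse import urlencode

# a shortened version of kv, for illustration only
kv = {
    "sf_url": "https://www.youtube.com/watch?v=YPvtz1lHRiw",
    "sf_submit": "",
    "new": "1",
}

# roughly the body that a form-encoded POST carries for this dict
body = urlencode(kv)
print(body)
```

Empty values such as sf_submit are still sent (as `sf_submit=`), which is why the empty fields are kept in kv.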

iv. Wrap it all in a function

import requests

def gethtml(url):

    # set the headers or the website will not return information

    # the cookies in here you may need to change

    headers = {

        "cache-Control": "no-cache",

        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"

                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",

        "accept-encoding": "gzip, deflate, br",

        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",

        "content-type": "application/x-www-form-urlencoded",

        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "

                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "

                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "

                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "

                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "

                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",

        "origin": "https://en.savefrom.net",

        "pragma": "no-cache",

        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",

        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",

        "sec-ch-ua-mobile": "?0",

        "sec-fetch-dest": "iframe",

        "sec-fetch-mode": "navigate",

        "sec-fetch-site": "same-origin",

        "sec-fetch-user": "?1",

        "upgrade-insecure-requests": "1",

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "

                      "Chrome/87.0.4280.88 Safari/537.36"}

    # set the parameter, we can get from chrome

    kv = {"sf_url": url,

          "sf_submit": "",

          "new": "1",

          "lang": "en",

          "app": "",

          "country": "cn",

          "os": "Windows",

          "browser": "Chrome"}

    # do the POST request

    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,

                      data=kv)

    r.raise_for_status()

    # get the result

    return r.text

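2. Calling the decryption function

i. Analysis

The hard part is running JavaScript code from Python. Solutions found online include PyV8 and others; this article uses execjs (installed as the PyExecJS package). As the approach article shows, the last few lines of the JavaScript contain the call to the decryption function, so we only need to load the whole script into execjs once and then evaluate the decryption call on its own.

ii. Extract the JavaScript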

    # target (YouTube address) url

    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"

    # get the target text

    reo = gethtml(url)

    # strip everything before and after the <script> body (the information is stored, encrypted, in the JavaScript)

    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

A regular expression would also work here, but since I am not yet fluent with regexes I simply used split.
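For readers who prefer the regex route, a minimal sketch follows. The html string is a hypothetical stand-in for the text returned by gethtml(url); the pattern assumes the same opening tag that the split version relies on.

```python
import re

# hypothetical stand-in for the response text returned by gethtml(url)
html = '<html><script type="text/javascript">\nvar a = 1;\n</script></html>'

# capture everything between the opening script tag and the first closing tag;
# re.S lets "." match newlines, since the script spans several lines
pattern = re.compile(r'<script type="text/javascript">(.*?)</script>', re.S)
match = pattern.search(html)
if match is None:
    raise ValueError("no inline script found in the response")
reo = match.group(1)
```

Unlike the split version, this raises a descriptive error when the tag is missing, instead of an IndexError.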

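iii. Take the first decryption call as the one we use

If you fetch the results for several different videos, you will notice that the decryption function is different every time, but it always sits at the same line position.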

    # split into lines (helps us find the decrypt function in the last few lines)

    reA = reo.split("\n")

    # get the decrypt function call (third line from the end)

    name = reA[-3].split(";")[0] + ";"

So name now holds our decryption call (not the best variable name, admittedly).
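The line-picking logic can be tried on a toy script. The sample below is invented purely for illustration (decode and window.run are hypothetical names, and the real script's layout is not reproduced here); it only demonstrates how the third line from the end is cut down to its first statement, and how the bare call used in the next step is derived from it.

```python
# invented sample; the real script is much longer and obfuscated
sample = "\n".join([
    "(function(){",
    "var x = 1;",
    'var r=decode("abc");window.run(r);',
    "})();",
    "",
])

lines = sample.split("\n")

# third line from the end, cut at the first semicolon
name = lines[-3].split(";")[0] + ";"

# keep only the right-hand side of the assignment, without the semicolon
expression = name.split("=", 1)[1].replace(";", "")

print(name)
print(expression)
```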

iv. Run it with execjs

    # use execjs to compile the js code

    ct = execjs.compile(reo)

    # do the decryption

    text = ct.eval(name.split("=", 1)[1].replace(";", ""))

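Taking only what follows the = and dropping the semicolon means we evaluate just the function call, without the assignment. Running the assignment and decryption first and then reading the variable would work as well.

But this immediately raises an error (if only it were that simple).

1. this (that is, the window variable) does not exist

If I remember correctly, the error is about this or $b. I tried removing every this, and also wrapping everything in a class (so that this would refer to that class), but neither worked. I then found that the npm package jsdom can emulate the window variable inside execjs (there is probably a better way). So we need to install npm and its jsdom package (for example with npm install -g jsdom, since cwd below points at the global modules directory), then rewrite the code above: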

    addition = """

    const jsdom = require("jsdom");

    const { JSDOM } = jsdom;

    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);

    window = dom.window;

    document = window.document;

    XMLHttpRequest = window.XMLHttpRequest;

    """

    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)

    ct = execjs.compile(addition + reo, cwd=r'C:\Users\xxx\AppData\Roaming\npm\node_modules')

Here:

cwd is the output of npm root -g, i.e. the path of npm's global modules directory
addition is the code that emulates window
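Rather than hard-coding the path, the output of npm root -g can be read at runtime. The helper below is a sketch of my own, not part of the original project; npm_global_root is a hypothetical name, and it assumes npm is reachable through PATH.

```python
import shutil
import subprocess


def npm_global_root(npm="npm"):
    """Return the output of `npm root -g`, or None when npm is not on PATH."""
    exe = shutil.which(npm)
    if exe is None:
        return None
    done = subprocess.run(
        [exe, "root", "-g"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    # npm prints the directory followed by a newline
    return done.stdout.strip()
```

The result could then be passed as cwd=npm_global_root() to execjs.compile; checking for None first gives a clearer error message when npm is not installed.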

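But then we hit the next error.

2. alert does not exist

This error occurs because calling alert under execjs is meaningless: there is no browser to show the popup. In addition, alert is normally defined on window, and we have supplied our own window. So we need to override alert before the code runs (in effect, define an alert of our own).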

    # override alert: the script calls it in one place, and alerting is meaningless under execjs

    # (there is no browser); without the override, the script raises an error

    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
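To see what this replacement does, here it is applied to a tiny invented script. The sample is hypothetical; the real script is much longer.

```python
# hypothetical stand-in for the extracted script
reo = "(function(){\nalert('hi');\n})();"

# inject a no-op alert as the first statement of the wrapper function
patched = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")
print(patched)
```

Note that str.replace rewrites every occurrence of "(function(){", so each such wrapper in the script receives its own no-op alert.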

v. Putting the code together

    # target (YouTube address) url

    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"

    # get the target text

    reo = gethtml(url)

    # strip everything before and after the <script> body (the information is stored, encrypted, in the JavaScript)

    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

    # override alert: the script calls it in one place, and alerting is meaningless under execjs

    # (there is no browser); without the override, the script raises an error

    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")

    # split each line(help us find the decrypt function in last few line)

    reA = reo.split("\n")

    # get the decrypt function call (third line from the end)

    name = reA[-3].split(";")[0] + ";"

    # add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)

    addition = """

    const jsdom = require("jsdom");

    const { JSDOM } = jsdom;

    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);

    window = dom.window;

    document = window.document;

    XMLHttpRequest = window.XMLHttpRequest;

    """

    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)

    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')

    # do the decryption

    text = ct.eval(name.split("=", 1)[1].replace(";", ""))

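3. Parsing the decrypted result

i. Extract the key JSON

After running the code above, the decrypted result is stored in text. As the approach article shows, what really matters to us is the JSON passed to window.parent.sf.videoResult.show(), so we use a regular expression to extract that JSON.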

    # get the result in json (raw string avoids invalid-escape warnings; group(1) is the captured JSON)

    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)
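The extraction can be checked offline against a small invented string. The text below is a hypothetical stand-in for the decrypted output; only the show(...);; wrapper is taken from the article.

```python
import re

# hypothetical decrypted text; the real payload is far larger
text = 'window.parent.sf.videoResult.show({"url": [{"url": "https://example.com/v.mp4"}]});;'

match = re.search(r'show\((.*?)\);;', text, re.I | re.M)
if match is None:
    raise ValueError("show(...) call not found in the decrypted text")

# group(1) is the part captured between "show(" and ");;"
result = match.group(1)
print(result)
```

Checking for None before calling group gives a readable error if the site changes its output format.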

ii. Parse the JSON

Python has many libraries that can parse JSON; here I use the built-in json module (remember to import it).

    # use `json` to load json

    j = json.loads(result)

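iii. Get the download URL

Now for the last step. From the approach article and a JSON formatter, we can see that j["url"][num]["url"] is the download link, where num selects the video format we want (different resolutions and types).

    # select the video format (in this case, num=1 means the video is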

    # - 360p known from j["url"][num]["quality"]

    # - MP4 known from j["url"][num]["type"]

    # - audio known from j["url"][num]["audio"]

    num = 1

    downurl = j["url"][num]["url"]

    # do some download

    # thanks :)

    # - EOF -
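Instead of hard-coding num, the index can be looked up from the quality and type fields. This is a sketch under assumptions: the sample JSON is invented, the field values ("360", "mp4") are guesses at the format, and pick_format is a hypothetical helper, not part of the original code.

```python
import json

# invented sample shaped like the fields mentioned above; real values may differ
sample = """
{"url": [
    {"url": "https://example.com/720.mp4", "quality": "720", "type": "mp4"},
    {"url": "https://example.com/360.mp4", "quality": "360", "type": "mp4"}
]}
"""


def pick_format(j, quality, ext):
    """Return the index of the first entry matching quality and type, else None."""
    for num, item in enumerate(j["url"]):
        same_quality = str(item.get("quality")) == quality
        same_type = str(item.get("type")).lower() == ext
        if same_quality and same_type:
            return num
    return None


j = json.loads(sample)
num = pick_format(j, "360", "mp4")
downurl = j["url"][num]["url"] if num is not None else None
print(num, downurl)
```

Printing every entry's quality and type once against a real response is the easiest way to confirm what values the site actually returns.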

4. Full code

# -*- coding: utf-8 -*-

# @Time: 2021/1/10

# @Author: Eritque arcus

# @File: Youtube.py

# @License: MIT

# @Environment:

#           - windows 10

#           - python 3.6.2

# @Dependencies:

#           - jsdom from npm (also works on Windows)

#           - requests, execjs, re, json in Python

import requests

import execjs

import re

import json

def gethtml(url):

    # set the headers or the website will not return information

    # the cookies in here you may need to change

    headers = {

        "cache-Control": "no-cache",

        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,"

                  "*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",

        "accept-encoding": "gzip, deflate, br",

        "accept-language": "zh-CN,zh;q=0.9,en;q=0.8",

        "content-type": "application/x-www-form-urlencoded",

        "cookie": "lang=en; country=CN; uid=fd94a82a406a8dd4; sfHelperDist=72; reference=14; "

                  "clickads-e2=90; poropellerAdsPush-e=63; promoBlock=64; helperWidget=92; "

                  "helperBanner=42; framelessHdConverter=68; inpagePush2=68; popupInOutput=9; "

                  "_ga=GA1.2.799702638.1610248969; _gid=GA1.2.628904587.1610248969; "

                  "PHPSESSID=030393eb0776d20d0975f99b523a70d4; x-requested-with=; "

                  "PHPSESSUD=islilfjn5alth33j9j8glj9776; _gat_helperWidget=1; _gat_inpagePush2=1",

        "origin": "https://en.savefrom.net",

        "pragma": "no-cache",

        "referer": "https://en.savefrom.net/1-youtube-video-downloader-4/",

        "sec-ch-ua": "\"Google Chrome\";v=\"87\", \"Not;A Brand\";v=\"99\",\"Chromium\";v=\"87\"",

        "sec-ch-ua-mobile": "?0",

        "sec-fetch-dest": "iframe",

        "sec-fetch-mode": "navigate",

        "sec-fetch-site": "same-origin",

        "sec-fetch-user": "?1",

        "upgrade-insecure-requests": "1",

        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "

                      "Chrome/87.0.4280.88 Safari/537.36"}

    # set the parameter, we can get from chrome

    kv = {"sf_url": url,

          "sf_submit": "",

          "new": "1",

          "lang": "en",

          "app": "",

          "country": "cn",

          "os": "Windows",

          "browser": "Chrome"}

    # do the POST request

    r = requests.post(url="https://en.savefrom.net/savefrom.php", headers=headers,

                      data=kv)

    r.raise_for_status()

    # get the result

    return r.text

if __name__ == '__main__':

    # target(youtube address) url

    url = "https://www.youtube.com/watch?v=YPvtz1lHRiw"

    # get the target text

    reo = gethtml(url)

    # strip everything before and after the <script> body (the information is stored, encrypted, in the JavaScript)

    reo = reo.split("<script type=\"text/javascript\">")[1].split("</script>")[0]

    # override alert: the script calls it in one place, and alerting is meaningless under execjs

    # (there is no browser); without the override, the script raises an error

    reo = reo.replace("(function(){", "(function(){\nthis.alert=function(){};")

    # split each line(help us find the decrypt function in last few line)

    reA = reo.split("\n")

    # get the decrypt function call (third line from the end)

    name = reA[-3].split(";")[0] + ";"

    # add jsdom into the execjs because the code will use(maybe there is a solution without jsdom, but i have no idea)

    addition = """

    const jsdom = require("jsdom");

    const { JSDOM } = jsdom;

    const dom = new JSDOM(`<!DOCTYPE html><p>Hello world</p>`);

    window = dom.window;

    document = window.document;

    XMLHttpRequest = window.XMLHttpRequest;

    """

    # use execjs to execute the js code, and the cwd is the result of `npm root -g`(the path of npm in your computer)

    ct = execjs.compile(addition + reo, cwd=r'C:\Users\19308\AppData\Roaming\npm\node_modules')

    # do the decryption

    text = ct.eval(name.split("=", 1)[1].replace(";", ""))

    # get the result in json

    result = re.search(r'show\((.*?)\);;', text, re.I | re.M).group(1)

    # use `json` to load json

    j = json.loads(result)

    # select the video format (in this case, num=1 means the video is

    # - 360p known from j["url"][num]["quality"]

    # - MP4 known from j["url"][num]["type"]

    # - audio known from j["url"][num]["audio"]

    num = 1

    downurl = j["url"][num]["url"]

    # do some download

    # thanks :)

    # - EOF -

102 lines in total.

Development environment

# @Environment:

#           - windows 10

#           - python 3.6.2

Dependencies

# @Dependencies:

#           - jsdom from npm (also works on Windows)

#           - requests, execjs, re, json in Python

-end-

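For web scraping

Copyright notice: this is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Author: https://www.cnblogs.com/Eritque-arcus/ or https://blog.csdn.net/qq_40832960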
