Batch-collecting fun download content

1. Download the title and URL

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re
import urllib2


def geturltitle(match, respread, outfile):
    # Extract the URL: strip the leading [mukio=file] tag from the match.
    downurl = re.sub(r'^\[mukio=file\]', '', match.group())
    print downurl
    if downurl:
        outfile.write(downurl + '\n')

    # Extract the title from the keywords meta tag.
    # The capture group stops at the closing quote, so the quote is not
    # written out as part of the title.
    match1 = re.search(r'<meta name="keywords" content="([^"]+)"', respread)
    if match1:
        title = match1.group(1)
        print title
        if title:
            outfile.write(title + '\n\n')


outfile = open('avfun1.txt', 'w')
try:
    for n in range(3600, 9000):
        try:
            resp = urllib2.urlopen('http://www.avfun1.com/forum.php?mod=viewthread&tid=' + repr(n) + '&mobile=yes', timeout=2)
            respread = resp.read()
            match = re.search(r'\[mukio=file\]\S.*mp4', respread)
            print "pid = " + repr(n)
            if match:
                # Called directly. The original
                # threading.Thread(target=geturltitle(match, file)) evaluated the
                # call first, so it already ran in the main thread.
                geturltitle(match, respread, outfile)
        except Exception, e:
            # Skip threads that time out or fail to load.
            print e
finally:
    outfile.close()
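
The extraction step can also be isolated into a small function that works on a page's text alone, which makes it easy to check without hitting the site. This is a minimal sketch written for Python 3 (the script above is Python 2). The name `extract_url_and_title`, the stricter URL pattern (no spaces, stops at the first `.mp4`), and the sample markup, including the closing `[/mukio]` tag, are assumptions for illustration and are not taken from the real pages.

```python
import re

# Patterns modelled on the ones in the script above. The URL pattern is a
# stricter variant (an assumption): no whitespace, stop at the first ".mp4".
URL_RE = re.compile(r'\[mukio=file\](\S+?\.mp4)')
TITLE_RE = re.compile(r'<meta name="keywords" content="([^"]+)"')


def extract_url_and_title(html):
    """Return (url, title) for a thread page, or None if it has no video URL.

    title is None when the keywords meta tag is missing.
    """
    url_match = URL_RE.search(html)
    if url_match is None:
        return None
    title_match = TITLE_RE.search(html)
    title = title_match.group(1) if title_match else None
    return url_match.group(1), title


# Hypothetical page fragment, made up for this example.
sample = (
    '<meta name="keywords" content="Sample title" />\n'
    '[mukio=file]http://example.com/videos/1234.mp4[/mukio]'
)
print(extract_url_and_title(sample))
```

Returning `None` for pages without a video keeps the calling loop simple: it can skip those thread ids without any special handling.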

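A note on the threading attempt: `threading.Thread(target=f(x))` calls `f(x)` immediately and passes its return value as `target`, so nothing runs in the background. If the page fetches really should run concurrently, one option is a thread pool. The sketch below is Python 3 and is not the original author's method. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real HTTP request so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(thread_ids, fetch, max_workers=8):
    """Call fetch(tid) for every id on a thread pool.

    Returns a dict mapping tid -> result for the ids that succeeded.
    Ids whose fetch raised an exception are reported and skipped.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Pass the function and its argument separately; do not call it here.
        futures = {tid: pool.submit(fetch, tid) for tid in thread_ids}
        for tid, future in futures.items():
            try:
                results[tid] = future.result()
            except Exception as exc:
                print("tid = %d failed: %s" % (tid, exc))
    return results


def fake_fetch(tid):
    # Hypothetical stand-in for the real request: odd ids "time out".
    if tid % 2:
        raise IOError("timed out")
    return "page %d" % tid


pages = fetch_all(range(3600, 3604), fake_fetch)
print(sorted(pages))
```

Collecting the results in the main thread and writing them to the file afterwards avoids having several threads write to the same file object at once.
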
2. Rename the downloaded files using the titles in the file

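#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import re
import sys

base_dir = "/Users/apple/Downloads/avfun1/"  # directory holding avfun1.txt
video_dir = os.path.join(base_dir, 'aaa')    # directory holding the downloaded files

# Check that the directory exists before going any further.
if os.path.isdir(base_dir):
    print ("Directory exists!")
else:
    print ("Directory does not exist.")
    sys.exit(1)

filelist = os.listdir(video_dir)

with open(os.path.join(base_dir, 'avfun1.txt'), 'rb') as f:
    text = f.read()

for name in filelist:
    # Find the URL line that ends with this file name, then capture the
    # title on the next line. re.escape keeps the '.' in the name literal.
    match = re.search(re.escape(name) + r'\n(\S.*)', text)
    if match:
        newfile = match.group(1) + '.mp4'  # new file name built from the title
        print name
        print newfile
        os.rename(os.path.join(video_dir, name), os.path.join(video_dir, newfile))
    else:
        print match  # None: no entry for this file in avfun1.txt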
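
The lookup can also be done by parsing avfun1.txt once into a mapping from file name to title, instead of running one regex search per file. The sketch below is Python 3 and assumes the layout written by the first script: a URL line, then a title line, then a blank line, where a URL may have no title. `parse_title_file` and `safe_filename` are hypothetical helper names, and the sample text is made up.

```python
import re


def parse_title_file(text):
    """Map each video file name to the title on the line after its URL."""
    titles = {}
    pending = None  # file name taken from the most recent URL line
    for line in text.splitlines():
        line = line.strip()
        if not line:
            pending = None
        elif line.endswith('.mp4') and '://' in line:
            # A URL line: keep only the last path component.
            pending = line.rsplit('/', 1)[-1]
        elif pending is not None:
            titles[pending] = line
            pending = None
    return titles


def safe_filename(title, ext='.mp4'):
    # Replace path separators and other characters that are awkward in file names.
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    return cleaned + ext


# Made-up sample in the assumed layout; the third URL has no title.
text = (
    'http://example.com/v/1234.mp4\nFirst / title\n\n'
    'http://example.com/v/5678.mp4\nSecond title\n\n'
    'http://example.com/v/9999.mp4\n'
)
titles = parse_title_file(text)
print(titles)
print(safe_filename(titles['1234.mp4']))
```

Sanitizing the title matters because a title containing `/` would otherwise be read by `os.rename` as a path into a subdirectory.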
