perl 爬取上市公司业绩预告

<pre name="code" class="python">use  LWP::UserAgent;

use utf8;

use DBI;

use POSIX;

use Data::Dumper;

use HTML::TreeBuilder;

  use HTML::TreeBuilder::XPath;

my $ua = LWP::UserAgent->new;

$ua->timeout(10);

$ua->env_proxy;

$ua->agent("Mozilla/8.0");

#my $response = $ua->get('http://data.10jqka.com.cn/financial/yjyg/date/2016-12-31/board/ALL/field/enddate/order/desc/page/1/ajax/1/');

#my $response = $ua->get('http://data.10jqka.com.cn/financial/yjyg/');

my @array=('2016-12-31','2016-03-31','2015-12-31','2015-09-30','2015-06-30','2015-03-31','2014-12-31','2014-09-30','2014-03-31');

foreach (@array){

print "\$_ is $_\n";

my $url="http://data.10jqka.com.cn/financial/yjyg/date/$_/board/ALL/field/enddate/order/desc/page/1/ajax/1/";

print "\$url is $url\n";

my $response = $ua->get($url);

if ($response->is_success) {

open DATAFH,">data.html" || die "open data file failed:$!";

print DATAFH "<html>";

print 	DATAFH "\n";

print DATAFH  $response->decoded_content;  # or whatever

print DATAFH "</html>";

print DATAFH "\n";

};

close DATAFH;

unlink("ths.html");

system('cp data.html ths.html');

$tree= HTML::TreeBuilder::XPath->new;

$tree->parse_file( "ths.html");

my $title="$_";

#my $title=  $tree->findvalue('/html/body//span[@class="text-value"]');

print "\$title is $title\n";

my @pages="";

my @titlepage="";

$max="";

my    @pages=$tree->find_by_tag_name('a');

print "\@pages is @pages\n";

                      #@urlall除了包含每个类别的文章,还包含阅读排行里的文章

                      foreach (@pages) {

                                               @titlepage = $_->attr('page');

                                               foreach (@titlepage) {

                                                 if ($_){

                                                 if ( $_ > $max ){

                                                   $max=$_;

							};				   ###获取版块中每个页面的url

                                                     };

                                           };

};

unless ($max){$max=1};

print "\$max is $max\n";

sleep (5);

for ($m=1;$m<=$max; $m++){

my $url="http://data.10jqka.com.cn/financial/yjyg/date/$_/board/ALL/field/enddate/order/desc/page/$m/ajax/1/";

my $response = $ua->get("$url");

if ($response->is_success) {

open DATAFH,">data.html" || die "open data file failed:$!";

print DATAFH "<html>";

print   DATAFH "\n";

print DATAFH  $response->decoded_content;  # or whatever

print DATAFH "</html>";

print DATAFH "\n";

close DATATH;

};

unlink("ths.html");

system('cp data.html ths.html');

$tree= HTML::TreeBuilder::XPath->new;

$tree->parse_file( "ths.html");

my @arr1= $tree->find_by_tag_name("tr") ;

#shift @arr1;

foreach my $row ( @arr1) {

   my @arr2= $row->content_list;

    my $str1= $arr2[0]->as_text;

    my $str2= $arr2[1]->as_text;

    my $str3= $arr2[2]->as_text;

    my $str4= $arr2[3]->as_text;

    my $str5= $arr2[4]->as_text;

    my $str6= $arr2[5]->as_text;

    my $str7= $arr2[6]->as_text;

    my $str8= $arr2[7]->as_text;

    print $str1, $str2, $str3, $str4, $str5, $str6, $str7,$str8."\n";

   open( E, ">>", "$title-$m.txt" );

      print E ($str1."|".$str2."|".$str3."|".$str4."|".$str5."|".$str6."|".$str7."|".$str8."\n");

      close E; 

                  }

    }

}

perl 爬取上市公司业绩预告的更多相关文章

Perl爬取江西失信执行
#! /usr/bin/perl use strict; use Encode qw(encode decode); binmode(STDIN,":encoding(utf8)" ...
Perl爬取铁路违章旅客信息
#! /usr/bin/perl use strict; use Encode qw(encode decode); binmode(STDIN,":encoding(utf8)" ...
perl 爬取某理财网站产品信息
use LWP::UserAgent; use utf8; use DBI; $user="root"; $passwd="xxxxx"; $dbh=" ...
perl 爬取数据<1>
use LWP::UserAgent; use POSIX; use DBI; $user="root"; $passwd="11111111"; $dbh=& ...
perl 爬取csdn
<pre name="code" class="python">use LWP::UserAgent; use POSIX; use HTML::T ...
perl 爬取同花顺数据
use LWP::UserAgent; use utf8; use DBI; $user="root"; $passwd='xxx'; $dbh=""; $db ...
Python爬取CSDN博客文章
0 url :http://blog.csdn.net/youyou1543724847/article/details/52818339Redis一点基础的东西目录 1.基础底层数据结构 2.win ...
用WebCollector制作一个爬取《知乎》并进行问题精准抽取的爬虫（JAVA）
简单介绍: WebCollector是一个无须配置.便于二次开发的JAVA爬虫框架(内核),它提供精简的的API.仅仅需少量代码就可以实现一个功能强大的爬虫. 怎样将WebCollector导入项目请 ...
Java爬虫_资源网站爬取实战
对 http://bestcbooks.com/ 这个网站的书籍进行爬取 (爬取资源分享在结尾) 下面是通过一个URL获得其对应网页源码的方法传入一个 url 返回其源码 (获得源码后,对源码进 ...

随机推荐

无法下载图片 App Transport Security has blocked a cleartext HTTP (http://) resource load since it is insecure. Temporary exceptions can be configured via your app's Info.plist file
刚学线程通信,提示: 2016-01-27 11:11:02.246 20-9 gcd3 communicationOfThread[5193:298643] App Transport Securi ...
hdu4507
数位dp,终于守得云开见月明了.建议初学者先试试两道比较简单的hdu2089,hdu3555. 鸣谢:http://blog.csdn.net/acm_cxlove/article/details/8 ...
Spring中ref local与ref bean区别
今天在做SSH框架Demo实例时,在ApplicationResources.properties文件时对<ref bean>与<ref local>感到不解,经查找资料才弄明 ...
css负边距自适应布局
单列定宽单列自适应布局: <!DOCTYPE HTML> <html> <head> <meta charset="UTF-8"> ...
Mysql语句的批量操作[修改]
UPDATE `cla_info` SET `comment` = CASE ) THEN 'A' ) THEN 'B' ) THEN 'C' ) THEN 'D' END, `collect` = ...
codeforces 659E . New Reform 强连通
题目链接对于每一个联通块, 如果有一个强连通分量, 那么这个联通块对答案的贡献就是0. 否则对答案贡献是1. #include <iostream> #include <vecto ...
JavaScript和php常用语法——切割字符串
在面向Web的应用中,前台和后台通信非常常用的一种格式就是字符串,所以,在通信中,我们不可避免的就需要进行字符串的拼切. 在js代码中,当我们传递一个字符串到后台代码时,我们在后台需要对字符串进行切割 ...
深入Android媒体存储服务（一）：APP与媒体存储服务的交互
简介: 本文介绍如何在 Android 中,开发者的 APP 如何使用媒体存储服务(包含MediaScanner.MediaProvider以及媒体信息解析等部分),包括如何把 APP 新增或修改的文 ...
Cocos2d-x CCNotificationCenter 通知中心
相信接触过ios开发的人来说对NSNotificationCenter都不陌生.而在cocos2d-x中也参照这个类,提供了CCNotificationCenter这个类,用作通知中心. 那么Noti ...
git Bug分支
Bug分支软件开发中,bug就像家常便饭一样.有了bug就需要修复,在Git中,由于分支是如此的强大,所以,每个bug都可以通过一个新的临时分支来修复,修复后,合并分支,然后将临时分支删除. 当你接 ...

perl 爬取上市公司业绩预告

perl 爬取上市公司业绩预告的更多相关文章

随机推荐

热门专题