Python Stock Information Scraping~

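I originally wanted to scrape stock price movements. Taking Huitong (汇通网, fx678.com) as the data source, I used its international forex quotes as the example.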

The page issues XHR requests, and each XHR request goes to a URL of this form: http://api.q.fx678.com/history.php?symbol=USD&limit=288&resolution=5&codeType=8100&st=0.43342772855649625
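To make the structure of that request explicit, here is a minimal Python 3 sketch that rebuilds the XHR URL from its query parameters with the standard library. The helper name history_url and its keyword defaults are hypothetical; the values are copied from the captured URL, and the meaning of codeType and st is not documented anywhere in this post.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "http://api.q.fx678.com/history.php"


def history_url(symbol, limit=288, resolution=5, code_type=8100,
                st="0.43342772855649625"):
    """Hypothetical helper: build the history.php URL seen in the XHR."""
    query = urlencode({
        "symbol": symbol,         # e.g. USD
        "limit": limit,           # number of bars to return
        "resolution": resolution, # bar length in minutes (inferred)
        "codeType": code_type,    # copied as-is, meaning unknown
        "st": st,                 # copied as-is, seems to affect validity
    })
    return BASE + "?" + query


url = history_url("USD")
print(url)
# http://api.q.fx678.com/history.php?symbol=USD&limit=288&resolution=5&codeType=8100&st=0.43342772855649625
print(parse_qs(urlparse(url).query)["symbol"])  # ['USD']
```

Building the query from a dict makes it easy to loop over symbols without string concatenation; urlencode also escapes any characters that are not URL-safe.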

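The request also contains limit=288, which means 288 time intervals. Huitong uses 5-minute intervals (the resolution=5 parameter appears to set this length), so one day has 24 × 60 / 5 = 288 intervals.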
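The 288 figure is plain arithmetic: 1440 minutes per day divided by the bar length. A small hypothetical helper makes the relationship explicit in case another resolution is tried (whether the API accepts other resolution values is not verified here):

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def bars_per_day(resolution_minutes):
    """Bars needed to cover one day at the given bar length (hypothetical helper)."""
    if MINUTES_PER_DAY % resolution_minutes:
        raise ValueError("resolution must divide 1440 minutes evenly")
    return MINUTES_PER_DAY // resolution_minutes


print(bars_per_day(5))   # 288, the limit used in the XHR
print(bars_per_day(15))  # 96
print(bars_per_day(60))  # 24
```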

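The st parameter in the XHR request seems to be tied to whether the page's request has expired: with this st value some symbols return data and others do not. Each XHR request returns the symbol's movements over the 24 hours up to the current interval, including the open, close, high, and low of every interval, which are then parsed one after another. (The script below only reads the h (high) and t (timestamp) fields of the JSON.)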
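The script reads the response as parallel columns: json['t'] holds timestamps and json['h'] holds highs, one entry per interval. Below is a minimal Python 3 sketch that turns such columns into one record per bar. The sample payload is made up, and the keys "o", "l", "c" for open, low, and close are assumptions that this post does not confirm; the function only uses the keys that are actually present.

```python
import json

# Made-up payload in the column-oriented shape the scraper reads.
# "t" and "h" are used by the original script; "o", "l", "c" are ASSUMED names.
SAMPLE = ('{"t": [1509030000, 1509030300], "o": [1.1, 1.2], '
          '"h": [1.3, 1.4], "l": [1.0, 1.1], "c": [1.2, 1.3]}')


def to_bars(payload):
    """Convert column lists into a list of per-bar dicts."""
    data = json.loads(payload)
    if not data.get("t"):
        return []  # no data, e.g. the st value was not accepted for this symbol
    keys = [k for k in ("t", "o", "h", "l", "c") if k in data]
    # zip(*columns) walks the columns row by row
    return [dict(zip(keys, row)) for row in zip(*(data[k] for k in keys))]


for bar in to_bars(SAMPLE):
    print(bar)
```

Checking for an empty "t" list first mirrors the original script's len(...) > 0 test, which is how it skips symbols for which the st value does not work.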

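During processing I only extract the symbol's high at each full hour. The timestamps in the XHR response are in seconds; I convert them with time.localtime (which yields Beijing time only when the machine's time zone is UTC+8), sort the points by time, and pick out the matching highs. Nothing else is processed. The times and highs are stored in plain lists, and the program runs as a single process. Updates to come~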
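A Python 3 sketch of the same filtering step that does not depend on the machine's time zone: it converts second-level timestamps with an explicit UTC+8 offset (China does not observe daylight saving time, so a fixed offset is enough) and keeps only the bars that fall on a full hour. The function name hourly_highs is hypothetical.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time, fixed UTC+8


def hourly_highs(timestamps, highs):
    """Keep on-the-hour bars as (Beijing datetime, high), in chronological order."""
    points = []
    for ts, h in zip(timestamps, highs):
        dt = datetime.fromtimestamp(float(ts), tz=BEIJING)
        if dt.minute == 0:  # full hour only, like st.tm_min == 0 in the script
            points.append((dt, h))
    points.sort(key=lambda p: p[0])  # sort by full datetime, not hour-of-day
    return points


# 1509030000 is 2017-10-26 15:00 UTC, i.e. 23:00 in Beijing
for dt, h in hourly_highs([1509030300, 1509030000, 1509033600], [2.0, 1.0, 3.0]):
    print(dt.isoformat(), h)
```

Sorting by the full datetime keeps a 24-hour window that crosses midnight in chronological order, whereas sorting by hour-of-day (as the script does) lays the points out on a 0 to 23 axis. Which one is wanted depends on what the x-axis should show.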

I have to say XPath is really handy. This was only my second time using it, and it made extracting the parameters easy.
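The script uses lxml's XPath '//a[@class="mar_name"]/@href' and keeps the last path segment of each link. The same attribute-predicate query can be tried with the standard library's xml.etree.ElementTree, which supports only a subset of XPath (no /@href step, so the attribute is read with .get). The HTML snippet and the /symbol/... link pattern are made up for illustration; the real page markup may differ.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet shaped like a quote listing (not the real page).
PAGE = """<html><body><table>
<tr><td><a class="mar_name" href="/symbol/USD">USD Index</a></td></tr>
<tr><td><a class="other" href="/news/1">news</a></td></tr>
<tr><td><a class="mar_name" href="/symbol/EURUSD">EUR/USD</a></td></tr>
</table></body></html>"""


def symbols_from(page):
    """Return the last path segment of every <a class="mar_name"> link."""
    root = ET.fromstring(page)
    hrefs = [a.get("href") for a in root.findall(".//a[@class='mar_name']")]
    return [h.rstrip("/").split("/")[-1] for h in hrefs]


print(symbols_from(PAGE))  # ['USD', 'EURUSD']
```

ElementTree needs well-formed markup; real-world HTML is usually not, which is one reason the script parses the page with lxml's forgiving etree.HTML instead.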

# -*- coding: utf-8 -*-
# Python 2 script: collect the forex symbols on quote.fx678.com and plot hourly highs.
import json
import time
from Queue import Queue, Empty

import requests
from lxml import etree
import matplotlib.pyplot as plt

URL = 'http://quote.fx678.com/exchange/WH'

nation_que = Queue()   # symbol codes taken from the listing page
nation = Queue()       # symbols whose history request returned data
high = Queue()         # per symbol: list of highs ("h" in the JSON)
start_time = Queue()   # per symbol: list of timestamps in seconds ("t" in the JSON)


class Share:
    # One (time, high) point; defined by the original script but not used below.
    def __init__(self, time, score):
        self.time = time
        self.score = score

    def __getitem__(self, key):
        # share['time'] / share['score']
        return getattr(self, key)

    def __cmp__(self, other):
        return cmp(self.time, other.time)

    def __str__(self):
        return str(self.time) + " " + str(self.score)


def download(url, headers, num_try=2):
    # GET url, retrying up to num_try times; return the body text or None.
    while num_try > 0:
        num_try -= 1
        try:
            content = requests.get(url, headers=headers, timeout=10)
            return content.text
        except requests.exceptions.RequestException as e:
            # requests raises its own exceptions, not urllib2.URLError
            print 'Download error', e
    return None


def get_type_url():
    # Scrape the listing page and queue every symbol code found on it.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
        'Referer': 'http://quote.fx678.com/exchange/WH',
        'Cookie': 'io=-voMclEjiizK9nWKALqB; UM_distinctid=15f5938ddc72db-089cf9ba58d9e5-31657c00-fa000-15f5938ddc8b24; Hm_lvt_d25bd1db5bca2537d34deae7edca67d3=1509030420; Hm_lpvt_d25bd1db5bca2537d34deae7edca67d3=1509031023',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Accept-Encoding': 'gzip, deflate',
        'Accept': '*/*'
    }
    content = download(URL, headers)
    if content is None:
        return
    html = etree.HTML(content)
    # the last path segment of each quote link is the symbol code, e.g. USD
    result = html.xpath('//a[@class="mar_name"]/@href')
    for each in result:
        print each
        nation_que.put(each.split('/')[-1])
    get_precent()


def get_precent():
    # Request the last 288 five-minute bars for each queued symbol.
    while not nation_que.empty():
        ss = nation_que.get(False)
        print ss
        url = ('http://api.q.fx678.com/history.php?symbol=' + ss +
               '&limit=288&resolution=5&codeType=8100&st=0.43342772855649625')
        print url
        headers = {'Accept': 'application/json, text/javascript, */*; q=0.01',
                   'Accept-Encoding': 'gzip, deflate',
                   'Accept-Language': 'zh-CN,zh;q=0.8',
                   'Connection': 'keep-alive',
                   'Host': 'api.q.fx678.com',
                   'Origin': 'http://quote.fx678.com',
                   'Referer': 'http://quote.fx678.com/symbol/USD',
                   'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
                   }
        num_try = 2
        while num_try > 0:
            num_try -= 1
            try:
                content = requests.get(url, headers=headers, timeout=10)
                data = json.loads(content.text)
                st = data['h']        # highs
                T_time = data['t']    # timestamps in seconds
                if len(st) > 0 and len(T_time) > 0:
                    nation.put(ss)
                    high.put(st)
                    start_time.put(T_time)
                break
            except (requests.exceptions.RequestException, ValueError, KeyError) as e:
                # network error, non-JSON body, or missing keys (e.g. st not accepted)
                print 'Download error', e
        nation_que.task_done()
        draw_pict()


def sub_sort(array, array1, low, high):
    # Partition array[low..high] around array[low], moving array1 in step.
    key = array[low]
    key1 = array1[low]
    while low < high:
        # from the right, find an element smaller than the pivot
        while low < high and array[high] >= key:
            high -= 1
        array[low] = array[high]
        array1[low] = array1[high]
        # from the left, find an element larger than the pivot
        while low < high and array[low] <= key:
            low += 1
        array[high] = array[low]
        array1[high] = array1[low]
    array[low] = key
    array1[low] = key1
    return low


def quick_sort(array, array1, low, high):
    # Sort array in place and apply the same reordering to array1.
    if low < high:
        key_index = sub_sort(array, array1, low, high)
        quick_sort(array, array1, low, key_index - 1)
        quick_sort(array, array1, key_index + 1, high)


def draw_pict():
    # Plot the on-the-hour highs of the next symbol waiting in the queues.
    try:
        Nation = nation.get(False)
        High = high.get(False)
        Time = start_time.get(False)
    except Empty:
        return  # nothing was fetched for the last symbol
    T_time = []
    High_Rate = []
    for each, high1 in zip(Time, High):
        # seconds -> local time (Beijing time on a UTC+8 machine)
        st = time.localtime(float(each))
        if st.tm_min == 0:  # keep on-the-hour points only
            T_time.append(st.tm_hour)
            High_Rate.append(high1)
    quick_sort(T_time, High_Rate, 0, len(T_time) - 1)
    plt.plot(T_time, High_Rate, marker='*')
    plt.title(Nation)  # set the title before show(), which blocks and renders
    plt.show()
    nation.task_done()
    high.task_done()
    start_time.task_done()


if __name__ == '__main__':
    get_type_url()
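The hand-written quick_sort/sub_sort pair exists only to sort two parallel lists (hours and highs) by the first one. For comparison, here is a minimal Python 3 sketch of the same effect using the built-in sorted on zipped pairs; sort_pairs is a hypothetical name.

```python
def sort_pairs(times, values):
    """Sort two parallel lists by `times`, keeping each (time, value) pair together."""
    if not times:
        return [], []
    # sorted() is stable, so equal times keep their original relative order
    pairs = sorted(zip(times, values), key=lambda p: p[0])
    t, v = zip(*pairs)
    return list(t), list(v)


print(sort_pairs([23, 0, 12], [1.3, 1.1, 1.2]))  # ([0, 12, 23], [1.1, 1.2, 1.3])
```

Pairing the values before sorting removes the risk of the two lists drifting out of step, which is the main thing the hand-written partition has to get right.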
