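Web Scraping with Python, Second Edition
Book introduction
Background research
Checking robots.txt
Most websites define a robots.txt file that tells crawlers which restrictions apply when crawling the site. These restrictions are only advisory, but a good web citizen should follow them.
For more information, see https://www.robotstxt.org
Example:
Visiting http://example.python-scraping.com/robots.txt returns the following: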
# section 1
User-agent: BadCrawler
Disallow: /

# section 2
User-agent: *
Disallow: /trap
Crawl-delay: 5

# section 3
Sitemap: http://example.python-scraping.com/sitemap.xml
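Section 1 forbids any crawler whose user agent is BadCrawler from crawling the site. This rule is unlikely to achieve much, though, because a malicious crawler will not honor robots.txt in the first place.
Section 2 specifies that, whatever the user agent, a crawler should leave a 5-second crawl delay between download requests; we should follow this advice to avoid overloading the server. There is also a /trap link, used to ban malicious crawlers that follow disallowed links. If you visit this link, the server bans your IP for one minute! A real website might ban your IP for much longer, or even permanently.
Section 3 defines a Sitemap file.
Checking the sitemap
A website's Sitemap file helps crawlers locate the site's latest content without crawling every page. For more detail, the sitemap standard is defined at https://www.sitemaps.org/protocol.html. Many web publishing platforms can generate a sitemap automatically. Here is the content of the Sitemap file located through robots.txt: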
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://example.python-scraping.com/places/default/view/Afghanistan-1</loc></url>
<url><loc>http://example.python-scraping.com/places/default/view/Aland-Islands-2</loc></url>
<url><loc>http://example.python-scraping.com/places/default/view/Albania-3</loc></url>
...
</urlset>
The sitemap provides links to all of the site's pages.
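As a rough sketch of how those links could be pulled out with a real XML parser rather than a regular expression, the snippet below uses the standard library's xml.etree.ElementTree. The SITEMAP_XML string is a shortened copy of the listing above, and the helper name sitemap_links is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sitemap shown above (a real file lists many more URLs).
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://example.python-scraping.com/places/default/view/Afghanistan-1</loc></url>
<url><loc>http://example.python-scraping.com/places/default/view/Aland-Islands-2</loc></url>
<url><loc>http://example.python-scraping.com/places/default/view/Albania-3</loc></url>
</urlset>"""

# The sitemap protocol puts every element in this XML namespace.
SITEMAP_NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}


def sitemap_links(xml_text):
    """Return the text of every <loc> element (hypothetical helper)."""
    root = ET.fromstring(xml_text.encode('utf-8'))
    return [loc.text.strip() for loc in root.findall('sm:url/sm:loc', SITEMAP_NS)]


for link in sitemap_links(SITEMAP_XML):
    print(link)
```

A namespace-aware parser is a little more verbose than a regex, but it does not depend on how the file happens to be laid out.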
Writing your first web crawler
Downloading a web page
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
    return html
Retrying downloads
The code below makes the download function retry when it encounters a 5xx error. You can try it against http://httpstat.us/500, which always returns a 500 status code.
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1)
    return html
Setting a user agent
By default, urllib downloads content with the user agent Python-urllib/3.x, where 3.x is the version of Python in use. Some websites block this default user agent, perhaps after suffering server overload caused by poorly written Python crawlers.
To make downloads more reliable, we need control over the user agent. The code below parameterizes the download function and sets a default user agent of 'wswp' (short for Web Scraping with Python).
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, user_agent='wswp'):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        html = urllib.request.urlopen(request).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors, keeping the same user agent
                return download(url, num_retries=num_retries - 1, user_agent=user_agent)
    return html
Sitemap crawler
import re
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, user_agent='wswp', charset='utf-8'):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        resp = urllib.request.urlopen(request)
        # use the charset from the response headers, else the fallback
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1, user_agent, charset)
    return html

def crawl_sitemap(url):
    # download the sitemap file
    sitemap = download(url)
    if sitemap is None:
        # nothing to crawl if the sitemap itself failed to download
        return
    # extract the sitemap links
    links = re.findall('<loc>(.*?)</loc>', sitemap)
    # download each link
    for link in links:
        html = download(link)
        # scrape html here
ID iteration crawler
The code below iterates over IDs and stops as soon as a download error occurs.
import itertools
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2):
    print('Downloading:', url)
    try:
        html = urllib.request.urlopen(url).read()
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1)
    return html

def crawl_site(url):
    for page in itertools.count(1):
        pg_url = '{0}{1}'.format(url, page)
        html = download(pg_url)
        if html is None:
            break
One weakness of this implementation is that records may have been deleted, so database IDs are not necessarily consecutive; as soon as the crawler hits one of these gaps, it exits.
The version below improves on this by exiting only after several consecutive download errors.
import itertools
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, user_agent='wswp', charset='utf-8'):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        resp = urllib.request.urlopen(request)
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1, user_agent, charset)
    return html

def crawl_site(url, max_errors=5):
    num_errors = 0
    for page in itertools.count(1):
        pg_url = '{}{}'.format(url, page)
        html = download(pg_url)
        if html is None:
            num_errors += 1
            if num_errors == max_errors:
                # reached max number of errors, so exit
                break
        else:
            num_errors = 0
            # success - can scrape the result
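To see how the consecutive-error counter behaves without touching the network, here is a minimal sketch that swaps the real download for a stand-in. The names crawl_ids, fake_fetch, EXISTING and the BASE URL are assumptions made for illustration; the counting logic mirrors crawl_site above.

```python
import itertools


def crawl_ids(fetch, url, max_errors=5):
    """Visit url + '1', url + '2', ... and stop after max_errors consecutive misses."""
    found = []
    num_errors = 0
    for page in itertools.count(1):
        pg_url = '{}{}'.format(url, page)
        if fetch(pg_url) is None:
            num_errors += 1
            if num_errors == max_errors:
                # reached max number of consecutive errors, so exit
                break
        else:
            # a success resets the counter
            num_errors = 0
            found.append(pg_url)
    return found


# Hypothetical data set: IDs 3 and 4 were deleted and nothing exists after 6.
EXISTING = {1, 2, 5, 6}


def fake_fetch(pg_url):
    """Stand-in for download(): returns HTML for existing IDs, None otherwise."""
    page_id = int(pg_url.rsplit('-', 1)[1])
    return '<html>page {}</html>'.format(page_id) if page_id in EXISTING else None


BASE = 'http://example.python-scraping.com/view/-'  # assumed URL pattern
print(crawl_ids(fake_fetch, BASE, max_errors=1))  # stops at the first gap
print(crawl_ids(fake_fetch, BASE, max_errors=5))  # skips over the gap
```

With max_errors=1 the crawl ends at the first missing ID, which is the behavior of the earlier version; with max_errors=5 it continues past the two deleted records and only stops after five misses in a row.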
Link crawler
The code below downloads links, converts relative links to absolute links, and removes duplicates.
import re
import urllib.request
from urllib.parse import urljoin
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, user_agent='wswp', charset='utf-8'):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        resp = urllib.request.urlopen(request)
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries - 1, user_agent, charset)
    return html

def link_crawler(start_url, link_regex):
    " Crawl from the given start URL following links matched by link_regex "
    crawl_queue = [start_url]
    # keep track of which URLs have been seen before
    seen = set(crawl_queue)
    while crawl_queue:
        url = crawl_queue.pop()
        html = download(url)
        if not html:
            continue
        # filter for links matching our regular expression
        for link in get_links(html):
            if re.match(link_regex, link):
                # convert the relative link to an absolute one
                abs_link = urljoin(start_url, link)
                if abs_link not in seen:
                    seen.add(abs_link)
                    crawl_queue.append(abs_link)

def get_links(html):
    " Return a list of links from html "
    # a regular expression to extract all links from the webpage
    webpage_regex = re.compile("""<a[^>]+href=["'](.*?)["']""", re.IGNORECASE)
    # list of all links from the webpage
    return webpage_regex.findall(html)
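The three steps (extract, filter, make absolute and de-duplicate) can be tried offline on a small HTML fragment. This is only a sketch: SAMPLE_HTML, START_URL and LINK_REGEX are made-up values for illustration, while get_links is copied from the code above.

```python
import re
from urllib.parse import urljoin


def get_links(html):
    """Return a list of links from html (same regex as above)."""
    webpage_regex = re.compile("""<a[^>]+href=["'](.*?)["']""", re.IGNORECASE)
    return webpage_regex.findall(html)


# Hypothetical page fragment with one duplicate and one link we do not want.
SAMPLE_HTML = '''
<a href="/places/default/index/1">Next</a>
<a href="/places/default/view/Afghanistan-1">Afghanistan</a>
<a href="/places/default/view/Afghanistan-1">Afghanistan again</a>
<a href="/places/default/user/login">Log In</a>
'''
START_URL = 'http://example.python-scraping.com'
LINK_REGEX = '/places/default/(index|view)/'

seen = set()
for link in get_links(SAMPLE_HTML):
    if re.match(LINK_REGEX, link):
        # relative link -> absolute link; the set drops duplicates
        seen.add(urljoin(START_URL, link))

print(sorted(seen))
```

The login link is filtered out by the regular expression, and the repeated Afghanistan link is stored only once.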
Parsing robots.txt
First, we need to parse the robots.txt file so that we avoid downloading URLs that are disallowed. The robotparser module in Python's urllib library makes this easy, as the following code shows:
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('http://example.python-scraping.com/robots.txt')
rp.read()
url = 'http://example.python-scraping.com/robots.txt'
user_agent = 'BadCrawler'
print(rp.can_fetch(user_agent, url))  # False
user_agent = 'GoodCrawler'
print(rp.can_fetch(user_agent, url))  # True
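The same checks can be run without a network connection by handing the rules to RobotFileParser.parse() as a list of lines. The sketch below assumes the robots.txt content shown earlier in these notes; crawl_delay() is available from Python 3.6 onward.

```python
from urllib import robotparser

# Copy of the example robots.txt shown earlier in these notes.
ROBOTS_TXT = """\
# section 1
User-agent: BadCrawler
Disallow: /

# section 2
User-agent: *
Disallow: /trap
Crawl-delay: 5

# section 3
Sitemap: http://example.python-scraping.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
# parse() takes an iterable of lines, so no download is needed
rp.parse(ROBOTS_TXT.splitlines())

base = 'http://example.python-scraping.com'
print(rp.can_fetch('BadCrawler', base + '/'))       # section 1 blocks this agent
print(rp.can_fetch('GoodCrawler', base + '/'))      # allowed by section 2
print(rp.can_fetch('GoodCrawler', base + '/trap'))  # disallowed for everyone
print(rp.crawl_delay('GoodCrawler'))                # Crawl-delay from section 2
```

The value returned by crawl_delay() is a natural candidate for the delay between requests when a site asks for one.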
To integrate robotparser into the link crawler, we first create a new function that returns the robotparser object.
from urllib import robotparser

def get_robots_parser(robots_url):
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp
We need a reliable way to set robots_url. We can do this by passing an extra keyword argument to the function, with a default value in case the caller does not supply it. We also need to define user_agent.
def link_crawler(start_url, link_regex, robots_url=None, user_agent='wswp'):
    ...
    if not robots_url:
        robots_url = '{}/robots.txt'.format(start_url)
    rp = get_robots_parser(robots_url)
Finally, we add the parser check to the crawl loop.
    ...
    while crawl_queue:
        url = crawl_queue.pop()
        if rp.can_fetch(user_agent, url):
            html = download(url, user_agent=user_agent)
            ...
        else:
            print('Blocked by robots.txt:', url)
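One caveat with building the default as start_url + '/robots.txt' is that it only points at the site root when start_url has no path of its own. A possible alternative, sketched here with a hypothetical helper named default_robots_url, is to let urljoin resolve the path against the host:

```python
from urllib.parse import urljoin


def default_robots_url(start_url):
    """Return the robots.txt URL at the root of start_url's host (hypothetical helper)."""
    # a leading slash makes urljoin discard any path in start_url
    return urljoin(start_url, '/robots.txt')


print(default_robots_url('http://example.python-scraping.com'))
print(default_robots_url('http://example.python-scraping.com/places/default/index/1'))
```

Both calls resolve to the same root-level robots.txt, whereas plain string formatting would append /robots.txt to the page path in the second case.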
Supporting proxies
Here is code that adds proxy support using urllib:
import urllib.request

proxy = 'http://myproxy.net:1234'
proxy_support = urllib.request.ProxyHandler({'http': proxy})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
Here is a new version of the download function with this feature integrated:
import urllib.request
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, user_agent='wswp', num_retries=2, charset='utf-8', proxy=None):
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        if proxy:
            # route http requests through the proxy passed by the caller
            proxy_support = urllib.request.ProxyHandler({'http': proxy})
            opener = urllib.request.build_opener(proxy_support)
            urllib.request.install_opener(opener)
        resp = urllib.request.urlopen(request)
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors with one fewer retry left
                return download(url, user_agent=user_agent, num_retries=num_retries - 1,
                                charset=charset, proxy=proxy)
    return html
At the time of writing (Python 3.5), the urllib module does not support https proxies by default.
Throttling downloads
If we crawl a website too fast, we risk being banned or overloading the server. To reduce these risks, we can throttle the crawler by adding a delay between downloads. Here is a class that implements this.
import time
from urllib.parse import urlparse

class Throttle:
    """ Add a delay between downloads to the same domain
    """
    def __init__(self, delay):
        # amount of delay between downloads for each domain
        self.delay = delay
        # timestamp of when a domain was last accessed
        self.domains = {}

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)

        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (time.time() - last_accessed)
            if sleep_secs > 0:
                # domain has been accessed recently
                # so need to sleep
                time.sleep(sleep_secs)
        # update the last accessed time
        self.domains[domain] = time.time()
The Throttle class records when each domain was last accessed and sleeps if the time since that access is shorter than the specified delay. We can call throttle before each download to rate-limit the crawler.
throttle = Throttle(delay)
throttle.wait(url)
html = download(url, user_agent=user_agent, num_retries=num_retries, charset=charset, proxy=proxy)
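A quick way to confirm that the delay applies per domain is to time a few wait() calls. The sketch below repeats the Throttle class so it can run on its own; the 0.2-second delay and the example.com / example.org URLs are arbitrary values chosen for illustration.

```python
import time
from urllib.parse import urlparse


class Throttle:
    """Add a delay between downloads to the same domain."""

    def __init__(self, delay):
        self.delay = delay   # seconds to leave between hits on one domain
        self.domains = {}    # domain -> timestamp of last access

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)
        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (time.time() - last_accessed)
            if sleep_secs > 0:
                # domain was accessed recently, so sleep for the remainder
                time.sleep(sleep_secs)
        self.domains[domain] = time.time()


throttle = Throttle(delay=0.2)
start = time.time()
throttle.wait('http://example.com/page1')  # first visit: no sleep
throttle.wait('http://example.org/page1')  # different domain: no sleep
throttle.wait('http://example.com/page2')  # same domain again: sleeps
elapsed = time.time() - start
print('elapsed: {:.2f}s'.format(elapsed))
```

Only the third call sleeps, because it is the only one that revisits a domain inside the delay window.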
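Avoiding spider traps
Currently our crawler follows every link it has not seen before. However, some websites generate page content dynamically and can therefore have an unlimited number of pages. For example, if a site has an online calendar with links to the next month and the next year, the next month's page will in turn link to the month after that, and the crawler will keep requesting pages until it reaches the latest date the widget allows (which may be far in the future). A site may offer the same behavior through simple pagination, where the crawler keeps requesting empty search-result pages until the maximum page number is reached. This situation is known as a spider trap.
A simple way to avoid getting stuck in a spider trap is to track how many links were followed to reach the current page, which we call its depth. Once the maximum depth is reached, the crawler stops adding links from that page to the queue. To implement this, we change the seen variable, which previously only recorded visited links, into a dictionary that also records the depth at which each link was discovered.
def link_crawler(..., max_depth=4):
    seen = {}
    ...
    if rp.can_fetch(user_agent, url):
        depth = seen.get(url, 0)
        if depth == max_depth:
            print('Skipping %s due to depth' % url)
            continue
        ...
        for link in get_links(html):
            if re.match(link_regex, link):
                abs_link = urljoin(start_url, link)
                if abs_link not in seen:
                    seen[abs_link] = depth + 1
                    crawl_queue.append(abs_link)
With this feature in place, we can be confident the crawl will eventually finish. To disable it, set max_depth to a negative number so that the current depth never equals it.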
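The effect of the depth limit can be simulated on an endless "calendar" without any downloads. In this sketch, crawl_with_depth and calendar_links are hypothetical names, and the link source is a stand-in that always returns a link to the next month; as in the fragment above, a page whose depth equals max_depth is skipped before it is fetched.

```python
def crawl_with_depth(start_url, links_of, max_depth=4):
    """Depth-limited crawl over an in-memory link source (illustrative only)."""
    crawl_queue = [start_url]
    seen = {}      # url -> depth at which it was discovered
    visited = []   # pages we would actually have downloaded
    while crawl_queue:
        url = crawl_queue.pop()
        depth = seen.get(url, 0)
        if depth == max_depth:
            # too deep: skip without downloading or following links
            continue
        visited.append(url)
        for link in links_of(url):
            if link not in seen:
                seen[link] = depth + 1
                crawl_queue.append(link)
    return visited


def calendar_links(url):
    """Hypothetical endless calendar: every month links to the next one."""
    month = int(url.rsplit('/', 1)[1])
    return ['/calendar/{}'.format(month + 1)]


print(crawl_with_depth('/calendar/0', calendar_links, max_depth=4))
```

Without the depth check this loop would never end; with max_depth=4 it visits four pages and stops.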
Full code
import re
import time
import urllib.request
from urllib import robotparser
from urllib.parse import urljoin, urlparse
from urllib.error import URLError, HTTPError, ContentTooShortError

def download(url, num_retries=2, user_agent='wswp', charset='utf-8', proxy=None):
    """ Download a given URL and return the page content
        args:
            url (str): URL
        kwargs:
            user_agent (str): user agent (default: wswp)
            charset (str): charset if website does not include one in headers
            proxy (str): proxy url, ex 'http://IP' (default: None)
            num_retries (int): number of retries if a 5xx error is seen (default: 2)
    """
    print('Downloading:', url)
    request = urllib.request.Request(url)
    request.add_header('User-agent', user_agent)
    try:
        if proxy:
            proxy_support = urllib.request.ProxyHandler({'http': proxy})
            opener = urllib.request.build_opener(proxy_support)
            urllib.request.install_opener(opener)
        resp = urllib.request.urlopen(request)
        cs = resp.headers.get_content_charset()
        if not cs:
            cs = charset
        html = resp.read().decode(cs)
    except (URLError, HTTPError, ContentTooShortError) as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries=num_retries - 1, user_agent=user_agent,
                                charset=charset, proxy=proxy)
    return html

def get_robots_parser(robots_url):
    " Return the robots parser object using the robots_url "
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp

def get_links(html):
    " Return a list of links (using simple regex matching) from the html content "
    # a regular expression to extract all links from the webpage
    webpage_regex = re.compile("""<a[^>]+href=["'](.*?)["']""", re.IGNORECASE)
    # list of all links from the webpage
    return webpage_regex.findall(html)

class Throttle:
    """ Add a delay between downloads to the same domain
    """
    def __init__(self, delay):
        # amount of delay between downloads for each domain
        self.delay = delay
        # timestamp of when a domain was last accessed
        self.domains = {}

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)

        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (time.time() - last_accessed)
            if sleep_secs > 0:
                # domain has been accessed recently
                # so need to sleep
                time.sleep(sleep_secs)
        # update the last accessed time
        self.domains[domain] = time.time()

def link_crawler(start_url, link_regex, robots_url=None, user_agent='wswp',
                 proxy=None, delay=3, max_depth=4):
    """ Crawl from the given start URL following links matched by link_regex. In the current
        implementation, we do not actually scrape any information.
        args:
            start_url (str): web site to start crawl
            link_regex (str): regex to match for links
        kwargs:
            robots_url (str): url of the site's robots.txt (default: start_url + /robots.txt)
            user_agent (str): user agent (default: wswp)
            proxy (str): proxy url, ex 'http://IP' (default: None)
            delay (int): seconds to throttle between requests to one domain (default: 3)
            max_depth (int): maximum crawl depth (to avoid traps) (default: 4)
    """
    crawl_queue = [start_url]
    # keep track of which URLs have been seen before (and at what depth)
    seen = {}
    if not robots_url:
        robots_url = '{}/robots.txt'.format(start_url)
    rp = get_robots_parser(robots_url)
    throttle = Throttle(delay)
    while crawl_queue:
        url = crawl_queue.pop()
        # check url passes robots.txt restrictions
        if rp.can_fetch(user_agent, url):
            depth = seen.get(url, 0)
            if depth == max_depth:
                print('Skipping %s due to depth' % url)
                continue
            throttle.wait(url)
            html = download(url, user_agent=user_agent, proxy=proxy)
            if not html:
                continue
            # TODO: add actual data scraping here
            # filter for links matching our regular expression
            for link in get_links(html):
                if re.match(link_regex, link):
                    abs_link = urljoin(start_url, link)
                    if abs_link not in seen:
                        seen[abs_link] = depth + 1
                        crawl_queue.append(abs_link)
        else:
            print('Blocked by robots.txt:', url)
requests version:
import re
import time
import requests
from urllib import robotparser
from urllib.parse import urljoin, urlparse

class Throttle:
    """ Add a delay between downloads to the same domain
    """
    def __init__(self, delay):
        # amount of delay between downloads for each domain
        self.delay = delay
        # timestamp of when a domain was last accessed
        self.domains = {}

    def wait(self, url):
        domain = urlparse(url).netloc
        last_accessed = self.domains.get(domain)

        if self.delay > 0 and last_accessed is not None:
            sleep_secs = self.delay - (time.time() - last_accessed)
            if sleep_secs > 0:
                # domain has been accessed recently
                # so need to sleep
                time.sleep(sleep_secs)
        # update the last accessed time
        self.domains[domain] = time.time()

def download(url, num_retries=2, user_agent='wswp', proxies=None):
    """ Download a given URL and return the page content
        args:
            url (str): URL
        kwargs:
            user_agent (str): user agent (default: wswp)
            proxies (dict): proxy dict w/ keys 'http' and 'https', values
                            are strs (i.e. 'http(s)://IP') (default: None)
            num_retries (int): # of retries if a 5xx error is seen (default: 2)
    """
    print('Downloading:', url)
    headers = {'User-Agent': user_agent}
    try:
        resp = requests.get(url, headers=headers, proxies=proxies)
        html = resp.text
        if resp.status_code >= 400:
            print('Download error:', resp.text)
            html = None
            if num_retries and 500 <= resp.status_code < 600:
                # recursively retry 5xx HTTP errors
                return download(url, num_retries=num_retries - 1,
                                user_agent=user_agent, proxies=proxies)
    except requests.exceptions.RequestException as e:
        print('Download error:', e)
        html = None
    return html

def get_robots_parser(robots_url):
    " Return the robots parser object using the robots_url "
    rp = robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp

def get_links(html):
    """ Return a list of links (using simple regex matching)
        from the html content """
    # a regular expression to extract all links from the webpage
    webpage_regex = re.compile("""<a[^>]+href=["'](.*?)["']""", re.IGNORECASE)
    # list of all links from the webpage
    return webpage_regex.findall(html)

def link_crawler(start_url, link_regex, robots_url=None, user_agent='wswp',
                 proxies=None, delay=3, max_depth=4):
    """ Crawl from the given start URL following links matched by link_regex.
        In the current implementation, we do not actually scrape any information.
        args:
            start_url (str): web site to start crawl
            link_regex (str): regex to match for links
        kwargs:
            robots_url (str): url of the site's robots.txt
                              (default: start_url + /robots.txt)
            user_agent (str): user agent (default: wswp)
            proxies (dict): proxy dict w/ keys 'http' and 'https', values
                            are strs (i.e. 'http(s)://IP') (default: None)
            delay (int): seconds to throttle between requests
                         to one domain (default: 3)
            max_depth (int): maximum crawl depth (to avoid traps) (default: 4)
    """
    crawl_queue = [start_url]
    # keep track of which URLs have been seen before (and at what depth)
    seen = {}
    if not robots_url:
        robots_url = '{}/robots.txt'.format(start_url)
    rp = get_robots_parser(robots_url)
    throttle = Throttle(delay)
    while crawl_queue:
        url = crawl_queue.pop()
        # check url passes robots.txt restrictions
        if rp.can_fetch(user_agent, url):
            depth = seen.get(url, 0)
            if depth == max_depth:
                print('Skipping %s due to depth' % url)
                continue
            throttle.wait(url)
            html = download(url, user_agent=user_agent, proxies=proxies)
            if not html:
                continue
            # TODO: add actual data scraping here
            # filter for links matching our regular expression
            for link in get_links(html):
                if re.match(link_regex, link):
                    abs_link = urljoin(start_url, link)
                    if abs_link not in seen:
                        seen[abs_link] = depth + 1
                        crawl_queue.append(abs_link)
        else:
            print('Blocked by robots.txt:', url)
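Scraping the data
So far we have learned how to build a crawler that downloads web pages. Now we want the crawler to extract data from each page and then do something with it, a practice known as scraping.
Regular expressions
Official documentation: https://docs.python.org/3/howto/regex.html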
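As a small illustration of the regular-expression approach, the sketch below pulls one value out of an HTML table row. The SAMPLE_HTML fragment, its id and class names, and the pattern are assumptions made up for this example rather than markup guaranteed to match any live site.

```python
import re

# Hypothetical fragment of a country page: one table row per attribute.
SAMPLE_HTML = '''
<tr id="places_area__row"><td class="w2p_fl">Area: </td><td class="w2p_fw">244,820 square kilometres</td></tr>
<tr id="places_population__row"><td class="w2p_fl">Population: </td><td class="w2p_fw">62,348,447</td></tr>
'''

# Anchor on the row id, then capture the contents of the value cell.
# Allowing either quote style and flexible whitespace makes the pattern
# a little more tolerant of small markup changes.
AREA_REGEX = r'''<tr id="places_area__row">.*?<td\s*class=["']w2p_fw["']>(.*?)</td>'''

matches = re.findall(AREA_REGEX, SAMPLE_HTML)
print(matches)
```

Patterns like this are quick to write but brittle: a layout change on the page can silently break them, which is one reason to consider a proper HTML parser.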
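Beautiful Soup
Chinese documentation: https://beautifulsoup.readthedocs.io/zh_CN/latest/
Installation command:
pip install beautifulsoup4
Install the html5lib parser:
pip install html5lib
With html5lib, BeautifulSoup correctly handles missing attribute quotes and unclosed tags, turning the input into a complete HTML document.
Lxml
Lxml is a Python library built on libxml2, an XML parsing library written in C, which makes it faster at parsing than BeautifulSoup.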