Programming Tips Collection

Below are some tips from my own programming, written down and summarized to help me improve.

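1. For lookups in Python collections, prefer dict or set.

Internally, Python implements both dict and set as hash tables, so a lookup takes O(1) time on average. That beats a linear scan of a list, which is O(n), and even binary search over sorted data, which is O(log n).

Once a program performs more than about 1K lookups, it is time to consider organizing the data in a dict or set.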

 # coding: utf-8
 from urllib.request import urlopen
 import re
 import string
 import operator
 import datetime

 commonWords = ["the", "be", "and", "of", "a", "in", "to", "have", "it", "i", "that", "for", "you", "he", "with", "on", "do", "say", "this", "they", "is", "an", "at", "but","we", "his", "from", "that", "not", "by", "she", "or", "as", "what", "go", "their","can", "who", "get", "if", "would", "her", "all", "my", "make", "about", "know", "will","as", "up", "one", "time", "has", "been", "there", "year", "so", "think", "when", "which", "them", "some", "me", "people", "take", "out", "into", "just", "see", "him", "your", "come", "could", "now", "than", "like", "other", "how", "then", "its", "our", "two", "more", "these", "want", "way", "look", "first", "also", "new", "because", "day", "more", "use", "no", "man", "find", "here", "thing", "give", "many", "well"]

 # Uncomment the next line to turn commonWords into a set, then rerun the
 # program and compare the timings to see which one is better.
 #commonWords = set(commonWords)

 def isCommon(word):
     # Membership test: a linear scan for a list, a hash lookup for a set.
     return word in commonWords

 def cleanText(text):
     # Collapse newlines, drop citation markers such as [12], squeeze spaces.
     text = re.sub(r'\n+', " ", text).lower()
     text = re.sub(r'\[[0-9]*\]', "", text)
     text = re.sub(r' +', " ", text)
     text = re.sub(r"u\.s\.", "us", text)
     # Discard any non-ASCII characters.
     text = bytes(text, "UTF-8")
     text = text.decode("ascii", "ignore")
     return text

 def cleanInput(text):
     text = cleanText(text)
     words = []
     for item in text.split(' '):
         item = item.strip(string.punctuation)
         # Keep multi-letter words plus the one-letter words "a" and "i".
         if len(item) > 1 or (item.lower() == 'a' or item.lower() == 'i'):
             words.append(item)
     # Filter out the common words; this is the lookup-heavy step.
     cleanContent = []
     for word in words:
         if not isCommon(word):
             cleanContent.append(word)
     return cleanContent

 def getNgrams(text, n):
     words = cleanInput(text)
     output = {}
     for i in range(len(words)-n+1):
         ngramTemp = " ".join(words[i:i+n])
         if ngramTemp not in output:
             output[ngramTemp] = 0
         output[ngramTemp] += 1
     return output

 def getFirstSentenceContaining(ngram, content):
     sentences = content.split(".")
     for sentence in sentences:
         if ngram in sentence:
             return sentence
     return ""

 content = str(urlopen("http://pythonscraping.com/files/inaugurationSpeech.txt").read(), 'utf-8')

 # Report which container type is actually being timed.
 print('commonWords is stored as a', type(commonWords).__name__)
 print('Begin:', datetime.datetime.now())
 for i in range(50):
     ngrams = getNgrams(content, 2)
     sortedNGrams = sorted(ngrams.items(), key=operator.itemgetter(1), reverse=True)
 print('End:', datetime.datetime.now())
 print(sortedNGrams)
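The program above needs network access to download the speech. As a smaller offline sketch of the same comparison, the following times membership tests against a list and a set built from the same data. The word list, the query list, and the helper `count_hits` are made up for illustration and are not part of the original program.

```python
import timeit

# Hypothetical data: 5,000 distinct words (not the speech text above).
words_list = ["word%d" % i for i in range(5000)]
words_set = set(words_list)

# 1,000 queries: half are present in the data, half are absent.
queries = ["word%d" % i for i in range(0, 10000, 10)]

def count_hits(container, queries):
    """Count how many queries are found in the container."""
    return sum(1 for q in queries if q in container)

# Run each variant a few times and report the total elapsed time.
list_time = timeit.timeit(lambda: count_hits(words_list, queries), number=3)
set_time = timeit.timeit(lambda: count_hits(words_set, queries), number=3)

print("list: %.4f s" % list_time)
print("set:  %.4f s" % set_time)
```

Both variants find the same number of hits; only the time differs, and the gap widens as the container grows.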

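2. Inserting data into a database from Python

Before inserting a record, remember to run a query first to check whether it is already in the database.

First, this makes the program more robust; second, it also saves you from having to query a second time later.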
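A minimal sketch of the check-before-insert pattern, assuming SQLite through the standard `sqlite3` module (the original does not say which database was used). The table `words` and the helper `insert_if_absent` are hypothetical names chosen for this example.

```python
import sqlite3

def insert_if_absent(conn, word):
    """Insert word only if it is not stored yet; return True if a row was added."""
    row = conn.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone()
    if row is not None:
        return False  # already present, skip the insert
    conn.execute("INSERT INTO words (word) VALUES (?)", (word,))
    conn.commit()
    return True

# In-memory database with a hypothetical single-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")

print(insert_if_absent(conn, "python"))  # first call adds the row
print(insert_if_absent(conn, "python"))  # second call finds it and skips
print(conn.execute("SELECT COUNT(*) FROM words").fetchone()[0])
```

Note that a separate check followed by an insert is not atomic: if several writers run at once, a unique constraint on the column is a safer guard against duplicates.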

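3. When creating database tables, it is best to add indexes

I recently had to insert millions of rows into a database. Once the table passed a few hundred thousand rows, the database became extremely slow and disk I/O was saturated!

Later I found that the program was issuing too many queries, so I recreated the table and added indexes along the way, in particular a unique index. My guess was that the underlying mechanism is a hash mapping, although in most relational databases (MySQL InnoDB, PostgreSQL, and SQLite, for example) the default index is a B-tree, which gives O(log n) lookups.

With the index in place, the load on the disk dropped sharply and query time stayed almost constant, though reads and writes still slowed down as the database grew (which is only to be expected).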
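The effect of an index can be seen in a query plan. The sketch below assumes SQLite (via the standard `sqlite3` module) rather than whatever database the original used; the table `words`, the index name `idx_words_word`, and the helper `query_plan` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany("INSERT INTO words (word) VALUES (?)",
                 [("w%d" % i,) for i in range(1000)])

def query_plan(conn):
    """Return SQLite's plan description for a lookup by word."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM words WHERE word = ?", ("w500",)
    ).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # no index yet: full table scan
conn.execute("CREATE UNIQUE INDEX idx_words_word ON words (word)")
plan_after = query_plan(conn)   # the lookup now goes through the index

print(plan_before)
print(plan_after)

# A unique index also rejects duplicate values at the database level.
try:
    conn.execute("INSERT INTO words (word) VALUES (?)", ("w500",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The exact wording of the plan text varies between SQLite versions, but the second plan names the index while the first does not.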

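4. Handling escape characters in Python strings

In Python, \t is a single tab character, so it counts as one character in the string. How wide it looks depends on the terminal or editor displaying it; in my environment it took up 4 columns rather than the usual 8. When tabs are expanded with str.expandtabs(), the default tab size is 8.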
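A short sketch of how a tab behaves inside a Python string. The sample string `s` is made up for illustration; `str.expandtabs` is the standard method for converting tabs to spaces at a chosen tab size.

```python
s = "a\tb"

# The tab is one character, whatever width the screen gives it.
print(len("\t"))

# expandtabs pads with spaces up to the next tab stop (default size 8).
wide = s.expandtabs()
narrow = s.expandtabs(4)

print(repr(wide), len(wide))
print(repr(narrow), len(narrow))
```

Calling `expandtabs` with an explicit size is one way to get the same column layout regardless of how the terminal renders tabs.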
