Effective Python2 读书笔记3
Item 22: Prefer Helper Classes Over Bookkeeping with Dictionaries and Tuples
For example, say you want to record the grades of a set of students whose names aren't konwn in advance. You can define a class to store the names in a dictionary instead of using a predefined attribute for each student.
class SimpleGradebook(object):
def __init__(self):
self._grades = {} def add_student(self, name):
self._grades[name] = [] def report_grade(self, name, score):
self._grades[name].append(score) def average_grade(self, name):
grades = self._grades[name]
return float(sum(grades)) / len(grades)
Using the class is simple.
book = SimpleGradebook()
book.add_student('Isaac Newton')
book.report_grade('Isaac Newton', 90)
# ...
print(book.average_grade('Isaac Newton')) >>>
90.0
Now say you want to extend the SimpleGradebook class to keep a list of grades by subject, not just overall.
class BySubjectGradebook(object):
def __init__(self):
self._grades = {} def add_student(self, name):
self._grades[name] = {} def report_grade(self, name, subject, grade):
by_subject = self._grades[name]
grade_list = by_subject.setdefault(subject, [])
grade_list.append(grade) def average_grade(self, name):
by_subject = self._grades[name]
total, count = 0.0, 0
for grades in by_subject.values():
total += sum(grades)
count += len(grades)
return total / count
Using the class remains simple.
book = BySubjectGradebook()
book.add_student('Albert Einstein')
book.report_grade('Albert Einstein', 'Math', 75)
book.report_grade('Albert Einstein', 'Math', 65)
book.report_grade('Albert Einstein', 'Gym', 90)
book.report_grade('Albert Einstein', 'Gym' 95)
Now, requirements change again. You also want to track the weight of each score toward the overall grade in the class so midterms and finals are more important than pop quizzes. One way to implement this feature is to change the innermost dictionary; instead of mapping subjects (the keys) to grades (the values), I can use the tuple (score, weight) as values.
def WeightedGradebook(object):
# ...
def report_grade(self, name, subject, score, weight):
by_subject = self._grades[name]
grade_list = by_subject.setdefault(subject, [])
grade_list.append((score, weight)) def average_grade(self, name):
by_subject = self._grades[name]
score_sum, score_count = 0.0, 0
for subject, scores in by_subject.items():
subject_avg, total_weight = 0.0, 0
for socre, weight in scores:
# ...
return score_sum / score_count
Using the class has also gotten more difficult. It's unclear what all of the numbers in the positional arguments mean.
When you see complexity like this happen, it's time to make the leap from dictionaries and tuples to a hierarchy of classes.
At first, you didn't know you'd need to support weighted grades, so the complexity of additional helper classes seemed unwarranted. Python's built-in dictionary and tuple types made it easy to keep going, adding layer after layer to the internal bookkeeping. But you should avoid doing this for more than one level of nesting (i.e.. avoid dictionaries that contain dictionaries). It makes your code hard to read by other programmers and sets you up for a maintenance nightmare.
As soon as you realize the bookkeeping is getting complicated, break it all out into classes. This lets you provide well-defined interfaces that better encapsulate your data. This also enables you to create a layer of abstranction between your interfaces and your concrete implementations.
Refactoring to Classes
You can start moving to classes at the bottom of the dependency tree: a single grade. A class seems too heavyweight for such simple information. A tuple, though, seems appropriate because grades are immutable.
grades = []
grades.append((95, 0.45))
# ...
total = sum(score * weight for score, weight in grades)
total_weight = sum(weight for _, weight in grades)
average_grade = total / total_weight
The problem is that plain tuples are positional. When you want to associate more information with grade, like a set of notes from the teacher, you'll need to rewrite every usage of the two-tuple to be aware that there are now three items present instead of two.
grades = []
grades.append((95, 0.45, 'Great job'))
# ...
total = sum(score * weight for score, weight, _ in grades)
total_weight = sum(weight for _, weight, _ in grades)
average_grade = total / total_weight
This pattern of extending tuples longer and longer is similar to deepening layers of dictionaries. As soon as you find yourself going longer than a two-tuple, it's time to consider another approach.
The namedtuple type in the collections module does exactly what you need. It lets you easily define tiny, immutable data classes.
from collections import namedtuple
Grade = namedtuple('Grade', ('score', 'weight'))
These classes can be constructed with positional or keyword arguments. The fields are accessible with named attributes. Having named attributes makes it easy to move from a namedtuple to your own class later if your requirements change again and you need to add behaviors to the simple data containers.
Limitations of namedtuple
Although useful in many circumstances, it's important to understand when namedtuple can cause more harm than good.
- You can't specify default argument values for namedtuple classes. This makes them unwieldy when your data may have many optional properities. If you find yourself using more than a handful of attributes, defining your own class may be a better choice.
- The attribute values of namedtuple instances are still accessible using numerical indexes and iteration. If you're not in control of all of the usage of your namedtuple instances, it's better to define your own class.
Next, you can write a class to represent a single subject that contains a set of grades.
class Subject(object):
def __init__(self):
self._grades = [] def report_grade(self, score, weight):
self._grades.append(Grade(score, weight)) def average_grade(self):
total, total_weight = 0, 0
for grade in self._grades:
total += grade.score * grade.weight
total_weight += grade.weight
return total / total_weight
Then you would write a class to represent a set of subjects that are being studied by a single student.
class Student(object):
def __init__(self):
self._subjects = {} def subject(self, name):
if name not in self._subjects:
self._subjects[name] = Subject()
return self._subjects[name] def average_grade(self):
total, count = 0, 0
for subject in self._subjects.values():
total += subjects.average_grade()
count += 1
return total / count
Finally, you'd write a container for all of the students keyed dynamically by their names.
class Gradebook(object):
def __init__(self):
self._students = {} def student(self, name):
if name not in self._students:
self._students[name] = Student()
return self._students[name]
The line count of these classes is almost double the previous implementation's size. But this code is much easier to read. The example driving the classes is also more clear and extensible.
book = Gradebook()
albert = book.student('Albert Einstein')
math = albert.subject('Math')
math.report_grade(score=80, weight=0.10)
# ...
print(albert.average_grade()) >>>
81.5
Item 23: Accept Functions for Simple Iterfaces Instead of Classes
Many of Python's built-in APIs allow you to customize behavior by passing in a function. These hooks are used by APIs to call back your code while they execute.
The __call__ special method enables instances of a class to be called like plain Python functions.
When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure.
Item 24: Use @classmethod Polymorphism to Construct Object Generically
Say you're writing a MapReduce implementation and you want a common class to respresent the input data. Here, I define such a class with a read method that must be defined by subclass:
class InputData(object):
def read(self):
raise NotImplementedError
Here, I have a concrete subclass of InputData that reads data from a file on disk:
class PathInputData(InputData):
def __init__(self, path):
super().__init__()
self.path = path def read(self):
return open(self.path).read()
You'd want a similar abstract interface for the MapReduce worker that consumes the input data in a standard way.
class Worker(object):
def __init__(self, input_data):
self.input_data = input_data
self.result = None def map(self):
raise NotImplementedError def reduce(self, other):
raise NotImplementedError
Here, I define a concrete subclass of Worker to implement the specific MapReduce function I want to apply: a simple newline counter:
class LineCountWorker(Worker):
def map(self):
data = self.input_data.read()
self.result = data.count('\n') def reduce(self, other):
self.result += other.result
It may look like this implementation is going great, but I've reached the biggest hurdle in all of this. What connects all of these pieces?
The simplest approach is to manually build and connect the objects with some helper functions. Here I list the contents of a directory and construct a PathInputData instance for each file it contains:
def generate_inputs(data_dir):
for name in os.listdir(data_dir):
yield PathInputData(os.path.join(data_dir, name))
Next, I create the LineCountWorker instances using the InputData instances returned by generate_inputs.
def create_workers(input_list):
workers = []
for input_data in input_list:
workers.append(LineCountWorker(input_data))
return workers
Then, I call reduce repeatedly to combine the results into one final value.
def execute(workers):
threads = [Thread(target=w.map) for w in workers]
for thread in threads: thread.start()
for thread in threads: thread.join() first, rest = workers[0], workers[1:]
for worker in rest:
first.reduce(worker)
return first.result
Finally, I connect all of these pieces together in a function to run each step.
def mapreduce(data_dir):
inputs = generate_inputs(data_dir)
workers = create_workers(inputs)
return execute(workers)
What's the problem? The huge issue is the mapreduce function is not generic at all. If you want to write another InputData or Worker subclass, you would also have to rewrite the generate_inputs, create_workers and mapreduce functions to match.
The best way to solve this problem is with @classmethod polymorphism.
Here, I extend the InputData class with a generic class method that's responsible for creating new InputData instances using a common interface.
class GenericInputData(object):
def read(self):
raise NotImplementedError @classmethod
def generate_inputs(cls, config):
raise NotImplementedError
I have generate_inputs take a dictionary with a set of configuration parameters that are up to the InputData concrete subclass to interpret. Here, I use the config to find the directory to list for input files.
class PathInputData(GenericInputData):
def __init__(self, path):
super().__init__()
self.path = path def read(self):
return open(self.path).read() @classmethod
def generate_inputs(cls, config):
data_dir = config['data_dir']
for name in os.listdir(data_dir)
yield cls(os.path.join(data_dir, name))
Here, I use the input_class parameter, which must be a subclass of GenericInputData, to generate the necessary inputs. I construct instances of the GenericWorker concrete subclass using cls() as a generic constructor.
class GenericWorker(object):
def __init__(self, input_data):
self.input_data = input_data
self.result = None def map(self):
raise NotImplementedError def reduce(self):
raise NotImplementedError @classmethod
def create_workers(cls, input_class, config):
workers = []
for input_data in input_class.generate_inputs(config):
workers.append(cls(input_data))
return workers
Note that the call to input_class.generate_inputs above is the class polymorphism I'm trying to show. You can also see how create_workers calling cls provides an alternate way to construct GenericWorker objects besides using the __init__ method directly.
LineCountWorker is nothing more than changing its parent class.
And finally, I can rewrite the mapreduce function to be completely generic.
def mapreduce(worker_class, input_class, config):
workers = worker_class.create_workers(input_class, config)
return execute(workers)
Item 25: Initialize Parent Classes with super
class MyBaseClass(object):
def __init__(self, value):
self.value = value
# old way, breaks down in multiple inheritance
class MyChildClass(MyBaseClass):
def __init__(self, value):
MyBaseClass.__init__(self, value) # use super
class MyChildClass(MyBaseClass):
def __init__(self, value):
super(MyChildClass, self).__init__(value)
Python's standard method resolution order (MRO) solves the problems of superclass initialization order and diamond inheritance.
Always use the super built-in function to initialize parent classes.
Item 26: Use Multiple Inheritance Only for Mix-in Utility Classes
It's too hard for me now.
Item 27: Perfer Public Attributes Over Private Ones
Private attributes aren't rigorously enforced by the Python compiler.
Plan from the beginning to allow subclasses to do more with you internal APIs and attributes instead of locking them out by default.
Use documentation of protected fields to guide subclasses instead of trying to force access control with private attributes.
Only consider using private attributes to avoid naming conflicts with subclasses that are out of your control.
Item 28: Inherit from collections.abc for Custom Container Types
Still a little too hard for me now. Fuck binary tree!
Effective Python2 读书笔记3的更多相关文章
- Effective Python2 读书笔记1
Item 2: Follow the PEP 8 Style Guide Naming Naming functions, variables, attributes lowercase_unders ...
- Effective Python2 读书笔记2
Item 14: Prefer Exceptions to Returning None Functions that returns None to indicate special meaning ...
- Effective STL 读书笔记
Effective STL 读书笔记 标签(空格分隔): 未分类 慎重选择容器类型 标准STL序列容器: vector.string.deque和list(双向列表). 标准STL管理容器: set. ...
- Effective STL读书笔记
Effective STL 读书笔记 本篇文字用于总结在阅读<Effective STL>时的笔记心得,只记录书上描写的,但自己尚未熟练掌握的知识点,不记录通用.常识类的知识点. STL按 ...
- effective c++读书笔记(一)
很早之前就听过这本书,找工作之前读一读.看了几页,个人感觉实在是生涩难懂,非常不符合中国人的思维方式.之前也有博主做过笔记,我来补充一些自己的理解. 我看有人记了笔记,还不错:http://www.3 ...
- Effective Java读书笔记完结啦
Effective Java是一本经典的书, 很实用的Java进阶读物, 提供了各个方面的best practices. 最近终于做完了Effective Java的读书笔记, 发布出来与大家共享. ...
- Effective java读书笔记
2015年进步很小,看的书也不是很多,感觉自己都要废了,2016是沉淀的一年,在这一年中要不断学习.看书,努力提升自己 计在16年要看12本书,主要涉及java基础.Spring研究.java并发.J ...
- Effective Objective-C 读书笔记
一本不错的书,给出了52条建议来优化程序的性能,对初学者有不错的指导作用,但是对高级阶段的程序员可能帮助不是很大.这里贴出部分笔记: 第2条: 使用#improt导入头文件会把头文件的内容全部暴露到目 ...
- 【Effective C++读书笔记】序
C++ 是一个难学易用的语言! [C++为什么难学?] C++的难学,不仅在其广博的语法,以及语法背后的语义,以及语义背后的深层思维,以及深层思维背后的对象模型: C++的难学还在于它提供了四种不同而 ...
随机推荐
- 2016-2017-2 《Java程序设计》预备作业2总结
2016-2017-2 <Java程序设计>预备作业2总结 古希腊学者普罗塔戈说过:「头脑不是一个要被填满的容器,而是一束需要被点燃的火把.」 在对计算机系的学生情况的调查中,我说: 最近 ...
- sql where and or优先级 待验证
where 后面如果有and,or的条件,则or自动会把左右的查询条件分开,即先执行and,再执行or.原因就是:and的执行优先级最高! 关系型运算符优先级高到低为:not and or 问题的解决 ...
- jq实现点击某元素之外触发事件
<script type="text/javascript"> $(document).bind("click",function(e){ var ...
- 转发 VS 重定向
转发:JSP容器将使用一个内部的方法来调用目标页面,新的页面继续处理同一个请求,而浏览器将不会知道这个过程.以前的request中存放的变量全部失效,并进入一个新的request作用域. 重定向:第一 ...
- 解决:Linux版百度云客户端 BCloud网络错误 问题
国内很多云盘渐渐停止服务支持,如新浪.华为.115.360等... 强大的百度云,你会继续免费让大家使用吗? 今天在Linux上使用了liulang的BCloud百度云客户端,登陆之后不显示主页,什么 ...
- jQuery插件 -- Cookie插件jquery.cookie.js(转)
Cookie是网站设计者放置在客户端的小文本文件.Cookie能为用户提供很多的使得,例如购物网站存储用户曾经浏览过的产品列表,或者门户网站记住用户喜欢选择浏览哪类新闻. 在用户允许的情况下,还可以存 ...
- 快速上手php:使用PhpStrom调试php
闲话 使用phpStrom的时候居然不打印到控制台,要打印测试的话就要输出到页面,目前我还不知道有什么好办法像jsp一样输出到页面的同时也打印到控制台.这种做法还是比较烦的,特别出问题需要调试的时候. ...
- bzoj 1014 splay维护hash值
被后缀三人组虐了一下午,写道水题愉悦身心. 题很裸,求lcq时二分下答案就行了,写的不优美会被卡时. (写题时精神恍惚,不知不觉写了快两百行...竟然调都没调就A了...我还是继续看后缀自动机吧... ...
- codevs 1772 歌词
1772 歌词 时间限制: 1 s 空间限制: 128000 KB 题目等级 : 白银 Silver 题目描述 Description 35痛过以后才知情已难寻吾爱至斯只剩飞花梦影回 ...
- 深夜重温JavaScript中的对象和数组
这一块实际上已经学过了,因为没有学好,在工作过程中遇到一些对象或者数组的操作,会去百度查找,浪费了许多宝贵的时间,所以特地再拐过头来重新学习. 对象 基本概念: 对象这种基本的数据结构还有其他很多种叫 ...