Python Study Notes (Part 1)

This post is formatted more comfortably on my personal blog: https://www.luozhiyun.com/archives/264

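Lists and Tuples

A list is dynamic: its length is not fixed, and you can freely add, remove, or change elements (mutable).
A tuple is static: its length is fixed, and you cannot add, remove, or change elements (immutable).

If you want to make any "change" to an existing tuple, the only option is to allocate a new block of memory and create a new tuple.

For example: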

tup = (1, 2, 3, 4)

new_tup = tup + (5,)  # creates a new tuple new_tup and copies the original values into it; new_tup is (1, 2, 3, 4, 5)

l = [1, 2, 3, 4]

l.append(5)  # appends element 5 to the end of the original list; l is [1, 2, 3, 4, 5]

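Differences in How Lists and Tuples Are Stored

Because a list is dynamic, it has to store pointers that refer to its elements. And because a list is mutable, it also has to store the size it has already allocated, so that it can expand in time.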

l = []

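l.__sizeof__()  # an empty list takes 40 bytes (on 64-bit CPython)

40

l.append(1)

l.__sizeof__()

72  # after adding element 1, the list allocated room for 4 elements: (72 - 40) / 8 = 4

l.append(2)

l.__sizeof__()

72  # space was already allocated, so adding element 2 does not change the size

l.append(3)

l.__sizeof__()

72  # same as above

l.append(4)

l.__sizeof__()

72  # same as above

l.append(5)

l.__sizeof__()

104  # after adding element 5 the list ran out of room, so it allocated space for 4 more elements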

For tuples, the situation is different. A tuple's length is fixed and its elements cannot be changed, so its storage size is fixed.
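The difference can be checked from a script as well. The following is a minimal sketch using sys.getsizeof; the exact byte counts depend on the Python version and platform, so it only compares sizes instead of printing expected values. The variable names are made up for illustration.

```python
import sys

# Compare the footprint of a list and a tuple holding the same elements.
items_list = [1, 2, 3, 4]
items_tuple = (1, 2, 3, 4)
list_size = sys.getsizeof(items_list)
tuple_size = sys.getsizeof(items_tuple)
print("list:", list_size, "bytes, tuple:", tuple_size, "bytes")

# Record the list's size after each append to make over-allocation visible:
# the size stays flat for several appends, then jumps when the list grows.
growing = []
sizes = []
for i in range(16):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))
print(sizes)
```

On CPython the printed sizes should repeat in runs, which matches the "allocate room for several elements at once" behavior described above.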

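Performance of Lists and Tuples

Tuples are somewhat more lightweight than lists, so overall a tuple performs slightly faster than a list.

Behind the scenes, Python does some resource caching for static data. Generally, thanks to garbage collection, when a variable is no longer used Python reclaims the memory it occupies and returns it to the operating system so that other variables or other applications can use it.

But for some static objects, such as tuples, if the object is no longer used and does not take up much space, Python temporarily caches that memory.

In the example below, initializing a tuple is about 5 times faster than initializing a list.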

python3 -m timeit 'x=(1,2,3,4,5,6)'

20000000 loops, best of 5: 9.97 nsec per loop

python3 -m timeit 'x=[1,2,3,4,5,6]'

5000000 loops, best of 5: 50.1 nsec per loop
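The same measurement can be made from inside Python with the timeit module. This is only a sketch: the loop count of 1,000,000 is an arbitrary choice, and the absolute numbers will differ from machine to machine.

```python
import timeit

# Time the same two statements as the shell commands above.
tuple_time = timeit.timeit("x = (1, 2, 3, 4, 5, 6)", number=1_000_000)
list_time = timeit.timeit("x = [1, 2, 3, 4, 5, 6]", number=1_000_000)

print(f"tuple: {tuple_time:.4f} s per 1,000,000 runs")
print(f"list:  {list_time:.4f} s per 1,000,000 runs")
```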

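Dictionaries and Sets

A set is basically the same as a dictionary. The only difference is that a set has no key-value pairing: it is an unordered collection of unique elements.

I'll skip how to access and use them and mention just two points:

In Python dictionaries and sets, both keys and values can be of mixed types.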

s = {1, 'hello', 5.0}

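You can access a dictionary by indexing a key directly, and if the key does not exist an exception is raised. You can also use the get(key, default) function; if the key does not exist, get() returns a default value.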

d = {'name': 'jason', 'age': 20}

d['name']

'jason'

d['location']

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

KeyError: 'location'

d = {'name': 'jason', 'age': 20}

d.get('name')

'jason'

d.get('location', 'null')

'null'

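How Dictionaries and Sets Work

Internally, both dictionaries and sets are hash tables.

For a dictionary, the table stores three things: the hash value (hash), the key, and the value.
For a set, the difference is that the hash table has no key-value pairing, only single elements.

In older versions of Python, the hash table was structured as follows: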

--+-------------------------------+

  |    hash       key     value

--+-------------------------------+

0 |    hash0      key0    value0

--+-------------------------------+

1 |    hash1      key1    value1

--+-------------------------------+

2 |    hash2      key2    value2

--+-------------------------------+

. |           ...

__+_______________________________+

As the hash table grows, it becomes more and more sparse. Take this dictionary as an example:

{'name': 'mike', 'dob': '1999-01-01', 'gender': 'male'}

It would be stored in a form like this:

entries = [

['--', '--', '--'],

[-230273521, 'dob', '1999-01-01'],

['--', '--', '--'],

['--', '--', '--'],

[1231236123, 'name', 'mike'],

['--', '--', '--'],

[9371539127, 'gender', 'male']

]

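This layout clearly wastes a lot of storage space.

To improve space utilization, the current hash table, apart from the dictionary structure itself, separates the indices from the hash values, keys, and values: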

Indices

----------------------------------------------------

None | index | None | None | index | None | index ...

----------------------------------------------------

Entries

--------------------

hash0   key0  value0

---------------------

hash1   key1  value1

---------------------

hash2   key2  value2

---------------------

        ...

---------------------

Under the new hash table structure, the example above is stored like this:

indices = [None, 1, None, None, 0, None, 2]

entries = [

[1231236123, 'name', 'mike'],

[-230273521, 'dob', '1999-01-01'],

[9371539127, 'gender', 'male']

]
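To make the two-array layout concrete, here is a minimal sketch that reuses the indices and entries lists from above. The helper entry_at is a hypothetical name used only for illustration, and the hash values are the made-up ones from the example, not real hash() results.

```python
# The compact layout from the example above: a sparse index array
# pointing into a dense entry list kept in insertion order.
indices = [None, 1, None, None, 0, None, 2]
entries = [
    [1231236123, 'name', 'mike'],
    [-230273521, 'dob', '1999-01-01'],
    [9371539127, 'gender', 'male'],
]


def entry_at(slot):
    """Return the [hash, key, value] entry a slot refers to, or None if the slot is empty."""
    index = indices[slot]
    return None if index is None else entries[index]


print(entry_at(4))  # slot 4 holds index 0, which is the 'name' entry
print(entry_at(0))  # an empty slot costs only one small index cell

# Walking the dense entry list yields the keys in insertion order.
print([key for _, key, _ in entries])
```

The empty slots now live only in the small indices array, while entries stays dense, which is where the space saving comes from.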

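Insertion

Each time an element is inserted into a dictionary or set, Python first computes the hash of the key (hash(key)), then ANDs it with mask = PyDict_MINSIZE - 1 (the table size minus one, for a table of the minimum size) to compute the position where the element should go in the hash table: index = hash(key) & mask.

If that position is empty, the element is inserted there. If it is already occupied, Python compares the hash values and keys of the two elements.

If both are equal, the element already exists, and for a dictionary the value is updated if it differs. If either one differs, we have what is usually called a hash collision: the two keys are not equal, yet they hash to the same position. In that case Python keeps looking for a free position in the table until it finds one.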
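The insertion steps can be sketched with a toy table. This is not CPython's implementation: the class name ToyHashTable is hypothetical, the table never resizes, and linear probing (trying the next slot) is an assumption made for brevity, whereas CPython uses a more elaborate probing sequence.

```python
class ToyHashTable:
    """A simplified open-addressing table; linear probing is a simplification."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot is None or (hash, key, value)
        self.mask = size - 1        # size must be a power of two

    def insert(self, key, value):
        h = hash(key)
        index = h & self.mask
        for _ in range(len(self.slots)):
            slot = self.slots[index]
            if slot is None:
                # Empty position: store the element here.
                self.slots[index] = (h, key, value)
                return index
            if slot[0] == h and slot[1] == key:
                # Same hash and same key: the element exists, update its value.
                self.slots[index] = (h, key, value)
                return index
            # Collision: a different key occupies this position, try the next one.
            index = (index + 1) & self.mask
        raise RuntimeError("table is full (this sketch never resizes)")


table = ToyHashTable()
first = table.insert(1, 'a')   # hash(1) & 7 is 1
second = table.insert(9, 'b')  # hash(9) & 7 is also 1, so the two keys collide
print(first, second)
```

The second key ends up in the slot after the first one, which is the "keep looking for a free position" step described above.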

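Deletion

For deletion, Python temporarily assigns a special value to the element at that position, and actually removes it when the hash table is next resized.

To stay efficient, the hash table inside a dictionary or set normally keeps at least 1/3 of its space free. As elements keep being inserted and the free space drops below 1/3, Python obtains a larger block of memory and expands the hash table.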
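Here is a rough sketch of both ideas, the placeholder for deleted elements and the 1/3 free-space rule. Everything in it is an assumption for illustration: ToySet and DELETED are hypothetical names, the table doubles on resize, and probing is linear. It models the behavior described above, not CPython's actual code.

```python
DELETED = object()  # hypothetical placeholder marking a removed element


class ToySet:
    """Open-addressing set sketch with placeholders and a 2/3 fill limit."""

    def __init__(self, size=8):
        self.slots = [None] * size
        self.filled = 0  # live elements plus placeholders

    def _probe(self, key):
        mask = len(self.slots) - 1
        index = hash(key) & mask
        while True:
            yield index
            index = (index + 1) & mask  # linear probing, chosen for brevity

    def add(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                self.slots[index] = key
                self.filled += 1
                break
            if slot is not DELETED and slot == key:
                return  # already present
        # Keep at least 1/3 of the table free.
        if self.filled * 3 > len(self.slots) * 2:
            self._resize()

    def discard(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return  # not found
            if slot is not DELETED and slot == key:
                self.slots[index] = DELETED  # mark now, reclaim on resize
                return

    def __contains__(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return False
            if slot is not DELETED and slot == key:
                return True

    def _resize(self):
        live = [s for s in self.slots if s is not None and s is not DELETED]
        self.slots = [None] * (len(self.slots) * 2)
        self.filled = 0
        for key in live:
            self.add(key)  # placeholders are dropped here


s = ToySet()
for n in range(5):
    s.add(n)
s.discard(2)
had_placeholder = any(slot is DELETED for slot in s.slots)
s.add(5)  # the sixth filled slot crosses the 2/3 limit and triggers a resize
has_placeholder = any(slot is DELETED for slot in s.slots)
print(had_placeholder, has_placeholder, len(s.slots))
```

The placeholder keeps the probe chain intact for elements stored after the deleted one, and it only disappears when the table is rebuilt.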

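Variable Scope in Functions

Local Variables

A variable defined inside a function is called a local variable and is only valid inside that function. Once the function finishes executing, its local variables are reclaimed and can no longer be accessed.

For nested functions, the inner function can read variables defined in the outer function but cannot modify them. To modify one, you must use the nonlocal keyword: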

def outer():

    x = "local"

    def inner():

        nonlocal x  # nonlocal means this x is the variable x defined in the outer function outer

        x = 'nonlocal'

        print("inner:", x)

    inner()

    print("outer:", x)

outer()

# Output

inner: nonlocal

outer: nonlocal

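Without the nonlocal keyword, if a variable in the inner function has the same name as one in the outer function, the inner function's variable shadows the outer one inside the inner function: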

def outer():

    x = "local"

    def inner():

        x = 'nonlocal'  # this x is a local variable of the function inner

        print("inner:", x)

    inner()

    print("outer:", x)

outer()

# Output

inner: nonlocal

outer: local
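A case where nonlocal really matters is updating an outer variable in place. The sketch below uses two hypothetical helpers, make_counter and make_broken_counter, to contrast the working version with the one that omits the keyword.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable owned by make_counter
        count += 1
        return count

    return increment


def make_broken_counter():
    count = 0

    def increment():
        count += 1  # without nonlocal, count is treated as local to increment
        return count

    return increment


counter = make_counter()
results = (counter(), counter())
print(results)

error = None
try:
    make_broken_counter()()
except UnboundLocalError as exc:
    error = exc
    print("UnboundLocalError:", exc)
```

The broken version fails because the assignment makes count local to increment, so the read on the right-hand side happens before any value has been bound.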

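Global Variables

Global variables are defined at the level of the whole file. They can be read anywhere in the file, but their values cannot be changed freely inside a function.

For example: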

MIN_VALUE = 1

MAX_VALUE = 10

def validation_check(value):

    ...

    MIN_VALUE += 1

    ...

validation_check(5)

# Output

UnboundLocalError: local variable 'MIN_VALUE' referenced before assignment

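This happens because the assignment makes the Python interpreter treat MIN_VALUE as a local variable of the function, and it then finds that the local MIN_VALUE has not been assigned yet, so the operation cannot be carried out.

If we really need to change a global variable's value inside a function, we must add a global declaration: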

MIN_VALUE = 1

MAX_VALUE = 10

def validation_check(value):

    global MIN_VALUE

    ...

    MIN_VALUE += 1

    ...

validation_check(5)

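If a local variable inside a function has the same name as a global variable, the local variable shadows the global one inside the function, as in this case: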

MIN_VALUE = 1

MAX_VALUE = 10

def validation_check(value):

    MIN_VALUE = 3

    ...
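The two behaviors can be put side by side in a small runnable sketch. The function names shadow and bump are made up for illustration; MIN_VALUE follows the examples above.

```python
MIN_VALUE = 1


def shadow():
    MIN_VALUE = 3  # a new local variable; the global is untouched
    return MIN_VALUE


def bump():
    global MIN_VALUE  # refer to the module-level variable
    MIN_VALUE += 1
    return MIN_VALUE


shadow_result = shadow()
after_shadow = MIN_VALUE
bump_result = bump()
after_bump = MIN_VALUE
print(shadow_result, after_shadow)
print(bump_result, after_bump)
```

Only the function that declares global changes the module-level value.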

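Closures

In a closure, the outer function returns a function. The returned function is usually assigned to a variable, which can then be called later.

For example, to compute the nth power of a number, we can write the following code with a closure: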

def nth_power(exponent):

    def exponent_of(base):

        return base ** exponent

    return exponent_of  # the return value is the function exponent_of

square = nth_power(2)  # computes the square of a number

cube = nth_power(3)  # computes the cube of a number

square

# Output

<function __main__.nth_power.<locals>.exponent_of(base)>

cube

# Output

<function __main__.nth_power.<locals>.exponent_of(base)>

print(square(2))  # compute 2 squared

print(cube(2))  # compute 2 cubed

# Output

4 # 2^2

8 # 2^3

Here the return value of the outer function nth_power() is the function exponent_of(), not a concrete number.
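One way to see what the returned function remembers is to inspect its closure cells. The sketch below repeats nth_power so it runs on its own; __closure__ and __code__.co_freevars are standard function attributes in CPython.

```python
def nth_power(exponent):
    def exponent_of(base):
        return base ** exponent
    return exponent_of


square = nth_power(2)
cube = nth_power(3)

# Each returned function carries its own captured value of exponent.
square_exponent = square.__closure__[0].cell_contents
cube_exponent = cube.__closure__[0].cell_contents
print(square_exponent, cube_exponent)

# The names of the captured variables are listed on the code object.
print(square.__code__.co_freevars)
```

Both functions share the same code, but each one holds a separate cell for exponent, which is why square and cube give different results.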

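Object-Oriented Programming

Functions

Static functions: they have no real tie to the class and can be used for simple, self-contained tasks, which makes testing easier and improves code structure. A static function is marked by putting @staticmethod on the line before it.

Class functions: the first parameter is conventionally cls, meaning a class must be passed in. Their most common use is to provide alternative init constructors, similar to constructors in Java. A class function is declared with the @classmethod decorator.

Member functions: these are the ordinary functions of a class and need no decorator. The first parameter, self, is a reference to the current object, and through it the function can query or modify attributes.

For example: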

class Document():

    WELCOME_STR = 'Welcome! The context for this book is {}.'

    def __init__(self, title, author, context):

        print('init function called')

        self.title = title

        self.author = author

        self.__context = context

    # class function

    @classmethod

    def create_empty_book(cls, title, author):

        return cls(title=title, author=author, context='nothing')

    # member function

    def get_context_length(self):

        return len(self.__context)

    # static function

    @staticmethod

    def get_welcome(context):

        return Document.WELCOME_STR.format(context)

empty_book = Document.create_empty_book('What Every Man Thinks About Apart from Sex', 'Professor Sheridan Simove')

print(empty_book.get_context_length())

print(empty_book.get_welcome('indeed nothing'))

########## Output ##########

init function called

7

Welcome! The context for this book is indeed nothing.

Inheritance

class Entity():

    def __init__(self, object_type):

        print('parent class init called')

        self.object_type = object_type

    def get_context_length(self):

        raise Exception('get_context_length not implemented')

    def print_title(self):

        print(self.title)

class Document(Entity):

    def __init__(self, title, author, context):

        print('Document class init called')

        Entity.__init__(self, 'document')

        self.title = title

        self.author = author

        self.__context = context

    def get_context_length(self):

        return len(self.__context)

class Video(Entity):

    def __init__(self, title, author, video_length):

        print('Video class init called')

        Entity.__init__(self, 'video')

        self.title = title

        self.author = author

        self.__video_length = video_length

    def get_context_length(self):

        return self.__video_length

harry_potter_book = Document('Harry Potter(Book)', 'J. K. Rowling', '... Forever Do not believe any thing is capable of thinking independently ...')

harry_potter_movie = Video('Harry Potter(Movie)', 'J. K. Rowling', 120)

print(harry_potter_book.object_type)

print(harry_potter_movie.object_type)

harry_potter_book.print_title()

harry_potter_movie.print_title()

print(harry_potter_book.get_context_length())

print(harry_potter_movie.get_context_length())

########## Output ##########

Document class init called

parent class init called

Video class init called

parent class init called

document

video

Harry Potter(Book)

Harry Potter(Movie)

77

120

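From Document and Video we can abstract a class called Entity to serve as their parent class.

Every class has a constructor. When a subclass that defines its own __init__() creates an object, the parent class's constructor is not called automatically, so you must call it explicitly inside __init__(). The execution order is subclass constructor -> parent class constructor.

Because the parent's get_context_length method exists to be overridden, creating an object directly from Entity and calling get_context_length() raises an error and interrupts the program.

The advantage of inheritance: it reduces duplicated code and lowers the entropy (that is, the complexity) of the system.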
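As a variation on the example above, the parent constructor can also be called through super(), and the built-in NotImplementedError can replace the generic Exception. This is a trimmed-down sketch, not the code from the example: the class bodies are shortened and the attribute _context is renamed for illustration.

```python
class Entity:
    def __init__(self, object_type):
        self.object_type = object_type

    def get_context_length(self):
        # NotImplementedError is the built-in exception intended for this case.
        raise NotImplementedError('get_context_length not implemented')


class Document(Entity):
    def __init__(self, title, context):
        # Here this has the same effect as Entity.__init__(self, 'document').
        super().__init__('document')
        self.title = title
        self._context = context

    def get_context_length(self):
        return len(self._context)


book = Document('Harry Potter(Book)', 'some text')
print(book.object_type, book.get_context_length())

error = None
try:
    Entity('entity').get_context_length()
except NotImplementedError as exc:
    error = exc
    print('error:', exc)
```

With super(), the subclass does not need to name its parent again, which helps if the class hierarchy is rearranged later.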

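Abstract Classes

An abstract class is a special kind of class that exists from the start to serve as a parent class; instantiating it raises an error. Likewise, abstract functions are defined in an abstract class, and a subclass must override them before it can be used. An abstract function is marked with the @abstractmethod decorator.

An abstract class embodies a top-down design style: you describe what needs to be done in a small amount of code, define the interfaces, and then hand the work to different developers to implement and integrate.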

from abc import ABCMeta, abstractmethod

class Entity(metaclass=ABCMeta):

    @abstractmethod

    def get_title(self):

        pass

    @abstractmethod

    def set_title(self, title):

        pass

class Document(Entity):

    def get_title(self):

        return self.title

    def set_title(self, title):

        self.title = title

document = Document()

document.set_title('Harry Potter')

print(document.get_title())

entity = Entity()

########## Output ##########

Harry Potter

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-7-266b2aa47bad> in <module>()

     21 print(document.get_title())

     22

---> 23 entity = Entity()

     24 entity.set_title('Test')

TypeError: Can't instantiate abstract class Entity with abstract methods get_title, set_title

In the code, entity = Entity() fails immediately; Entity can only be used by inheriting from it, as Document does.
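The same check applies to a subclass that overrides only some of the abstract functions. The sketch below uses abc.ABC, a helper class whose metaclass is ABCMeta, and a hypothetical subclass named Incomplete. The exact wording of the error message varies between Python versions, so the code only prints it.

```python
from abc import ABC, abstractmethod


class Entity(ABC):  # equivalent to declaring metaclass=ABCMeta
    @abstractmethod
    def get_title(self):
        pass

    @abstractmethod
    def set_title(self, title):
        pass


class Incomplete(Entity):
    # Overrides only one of the two abstract functions.
    def get_title(self):
        return 'untitled'


message = None
try:
    Incomplete()
except TypeError as exc:
    message = str(exc)
    print(message)
```

Because set_title is still abstract in Incomplete, the class cannot be instantiated until every abstract function has been overridden.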