Pandas | 27 Caveats & Gotchas

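A caveat is a warning, and a gotcha is a problem that is easy to overlook. This chapter covers the places where you need to be especially careful when working with pandas.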

Using If/Truth Statements with Pandas

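Pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if statement or when using the Boolean operators and, or and not. It is not clear what the result should be: should it be True because the Series is not zero-length, or False because it contains False values? Because the answer is ambiguous, pandas raises a ValueError instead -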

import pandas as pd

if pd.Series([False, True, False]):
    print('I am True')

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

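In an if condition, pandas has no way to decide how the Series should be interpreted. The error message suggests making the intent explicit with a.empty, a.bool(), a.item(), a.any() or a.all(), for example -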

import pandas as pd

if pd.Series([False, True, False]).any():
    print("I am any")
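To see the other suggestions from the error message side by side, here is a small sketch. The helper raises_value_error is hypothetical, written only for this illustration. It shows that bool(), not and and all raise ValueError on a multi-element Series, while .empty, .any() and .all() each return one unambiguous answer:

```python
import pandas as pd

s = pd.Series([False, True, False])


def raises_value_error(fn):
    """Hypothetical helper: return True if calling fn() raises ValueError."""
    try:
        fn()
    except ValueError:
        return True
    return False


# `if s:`, `not s` and `s and ...` all call bool(s) under the hood
if_raises = raises_value_error(lambda: bool(s))
not_raises = raises_value_error(lambda: not s)
and_raises = raises_value_error(lambda: s and True)

# Explicit reductions each give a single, unambiguous answer
is_empty = s.empty        # False: the Series has three elements
any_true = bool(s.any())  # True: at least one element is True
all_true = bool(s.all())  # False: not every element is True

print(if_raises, not_raises, and_raises)
print(is_empty, any_true, all_true)
```

Which reduction is right depends on the question being asked: "is anything True?" is .any(), "is everything True?" is .all(), and "is there any data at all?" is .empty.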

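To evaluate a single-element pandas object in a Boolean context, use the .bool() method. Note that .bool() has been deprecated since pandas 2.1; on newer versions, .item() returns the single element instead -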

import pandas as pd

# .bool() is deprecated since pandas 2.1; pd.Series([True]).item() is the replacement
print(pd.Series([True]).bool())

Output:

True
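Since .bool() is deprecated, the same check can be written with Series.item(), which returns the only element of a length-1 Series and raises ValueError for any other length. A minimal sketch:

```python
import pandas as pd

# .item() returns the single element of a length-1 Series as a Python scalar
flag = pd.Series([True]).item()
print(flag)

# With more than one element there is no single value to return, so it raises
try:
    pd.Series([True, False]).item()
    multi_raised = False
except ValueError:
    multi_raised = True

print(multi_raised)
```

Like the ambiguous if check, .item() refuses to guess: it only succeeds when there is exactly one value.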

Bitwise Boolean Operations

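Comparison operators such as == and != are applied element by element and return a Boolean Series, which is almost always what is required. Such Series are combined with the bitwise operators & (and), | (or) and ~ (not), not with the Python keywords and, or and not -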

import pandas as pd

s = pd.Series(range(5))
print(s == 4)

Output:

0    False
1    False
2    False
3    False
4     True
dtype: bool
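A minimal sketch of combining comparison results with the bitwise operators; the thresholds are arbitrary values chosen for illustration. The parentheses matter: & and | bind more tightly than the comparison operators, so s > 1 & s < 4 would not parse the way it reads.

```python
import pandas as pd

s = pd.Series(range(5))

# Combine Boolean Series element-wise with & (and), | (or) and ~ (not)
between = (s > 1) & (s < 4)   # True where 1 < value < 4
edges = (s == 0) | (s == 4)   # True for the first and last values
not_four = ~(s == 4)          # inverts the mask from the example above

print(between.tolist())   # [False, False, True, True, False]
print(edges.tolist())     # [True, False, False, False, True]
print(not_four.tolist())  # [True, True, True, True, False]

# A Boolean Series can be used directly as a mask to filter values
print(s[between].tolist())  # [2, 3]
```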

The isin Operation

isin returns a Boolean Series showing whether each element of the Series is exactly equal to one of the values in the passed sequence.

import pandas as pd

s = pd.Series(list('abc'))
s = s.isin(['a', 'c', 'e'])
print(s)

Output:

0     True
1    False
2     True
dtype: bool
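The mask returned by isin is usually used to filter the original Series. A small sketch, reusing the values from the example above plus one made-up string to show that matching is on whole values, not substrings:

```python
import pandas as pd

s = pd.Series(list('abc'))
wanted = ['a', 'c', 'e']

# The Boolean Series from isin works directly as a filter
mask = s.isin(wanted)
kept = s[mask].tolist()      # ['a', 'c']
dropped = s[~mask].tolist()  # ['b']: ~ inverts the mask

# Matching is on whole values: 'apple' is not matched by 'app'
partial = pd.Series(['apple']).isin(['app']).tolist()  # [False]

print(kept, dropped, partial)
```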

Reindexing vs. ix Gotcha

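Many users find themselves using the ix indexer as a concise way to select data from a pandas object. Be aware that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0; in current code, use .loc for label-based selection and .iloc for position-based selection -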

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'], index=list('abcdef'))

print(df)
print("=============================================")
# The original example used df.ix[['b', 'c', 'e']]. .ix was removed in
# pandas 1.0, so the label-based .loc indexer is used here instead.
print(df.loc[['b', 'c', 'e']])

Output:

        one       two     three      four
a -1.174632  0.951047 -0.177007  1.036567
b -0.806324 -0.562209  1.081449 -1.047623
c  0.107607  0.778843 -0.063531 -1.073552
d -0.277602 -0.962720  1.381249  0.868656
e  0.576266  0.986949  0.433569  0.539558
f -0.708917 -0.583124 -0.686753 -2.338110
=============================================
        one       two     three      four
b -0.806324 -0.562209  1.081449 -1.047623
c  0.107607  0.778843 -0.063531 -1.073552
e  0.576266  0.986949  0.433569  0.539558

In this case, that is of course exactly equivalent to using the reindex method -

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'], index=list('abcdef'))

print(df)
print("=============================================")
print(df.reindex(['b', 'c', 'e']))

Output:

        one       two     three      four
a -1.754084 -1.423820 -0.152234 -1.475104
b  1.508714 -0.216916 -0.184434 -2.117229
c -0.409298 -0.224142  0.308175 -0.681308
d  0.938517 -1.626353 -0.180770 -0.470252
e  0.718043 -0.730215 -0.716810  0.546039
f  2.313001  0.371286  0.359952  2.126530
=============================================
        one       two     three      four
b  1.508714 -0.216916 -0.184434 -2.117229
c -0.409298 -0.224142  0.308175 -0.681308
e  0.718043 -0.730215 -0.716810  0.546039
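The equivalence above holds only because every requested label exists. A minimal sketch, using a small made-up integer DataFrame, of what happens when one label is missing: reindex inserts a row of NaN, whereas .loc in pandas 1.0 and later raises a KeyError.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3),
                  columns=['one', 'two', 'three'], index=list('abcd'))

# reindex conforms the frame to exactly the requested labels;
# the unknown label 'z' gets a row of NaN
conformed = df.reindex(['b', 'z'])
print(conformed)

# .loc does not invent rows: a list containing a missing label raises KeyError
try:
    df.loc[['b', 'z']]
    loc_raised = False
except KeyError:
    loc_raised = True

print(loc_raised)
```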

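One might conclude from this that ix and reindex are 100% equivalent. That is true except when integers are involved: on an index that does not contain integers, ix treated a list of integers as positions, whereas reindex always treats it as labels. For example, the operation above could alternatively have been expressed with integer positions -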

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(6, 4), columns=['one', 'two', 'three', 'four'], index=list('abcdef'))

print(df)
print("=====================================")
# On this string index, the original df.ix[[1, 2, 4]] fell back to positions;
# .ix was removed in pandas 1.0, and .iloc is the positional equivalent
print(df.iloc[[1, 2, 4]])
print("=====================================")
# reindex always treats [1, 2, 4] as labels; none exist, so every value is NaN
print(df.reindex([1, 2, 4]))

Output:

        one       two     three      four
a  1.017408  0.594357 -0.760587  1.001547
b -1.480067  1.524270  0.455070  1.886959
c -0.136238 -0.165867 -0.589767 -1.078473
d  0.670576  1.600312  0.219578 -1.121352
e -0.224181  0.958156  0.013055 -0.013652
f  1.576155 -0.185003 -0.527204 -0.336275
=====================================
        one       two     three      four
b -1.480067  1.524270  0.455070  1.886959
c -0.136238 -0.165867 -0.589767 -1.078473
e -0.224181  0.958156  0.013055 -0.013652
=====================================
   one  two  three  four
1  NaN  NaN    NaN   NaN
2  NaN  NaN    NaN   NaN
4  NaN  NaN    NaN   NaN

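It is important to remember that reindex is strictly label-based indexing. This can lead to some surprising results in pathological cases, such as an index that contains both integers and strings.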
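A short sketch of such a pathological case, using a hypothetical Series whose index mixes integer and string labels. reindex looks the integers up as labels, while .iloc interprets them as positions, so the same list [0, 2] selects different rows:

```python
import pandas as pd

# Hypothetical Series whose index mixes integer and string labels
s = pd.Series([10, 20, 30], index=[2, 'a', 0])

# reindex treats 0 and 2 as labels: label 0 is the last row, label 2 the first
by_label = s.reindex([0, 2]).tolist()      # [30, 10]

# iloc treats 0 and 2 as positions: the first and the last row
by_position = s.iloc[[0, 2]].tolist()      # [10, 30]

# A label that does not exist, such as 1, silently becomes NaN
missing_is_nan = s.reindex([1]).isna().tolist()  # [True]

print(by_label, by_position, missing_is_nan)
```

Choosing .loc, .iloc or reindex explicitly, rather than relying on an indexer that guesses, keeps this kind of index from changing which rows are selected.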