np.where vs. pd.Series.where and pd.DataFrame.where: usage and differences

np.where works differently from pd.Series.where and pd.DataFrame.where. This post goes through each in turn and then summarizes the differences:

import numpy as np

import pandas as pd

help(np.where)

Help on built-in function where in module numpy.core.multiarray:

where(...)

    where(condition, [x, y])

    Return elements, either from `x` or `y`, depending on `condition`.

    If only `condition` is given, return ``condition.nonzero()``.

    Parameters

    ----------

    condition : array_like, bool

        When True, yield `x`, otherwise yield `y`.

    x, y : array_like, optional

        Values from which to choose. `x`, `y` and `condition` need to be

        broadcastable to some shape.

    Returns

    -------

    out : ndarray or tuple of ndarrays

        If both `x` and `y` are specified, the output array contains

        elements of `x` where `condition` is True, and elements from

        `y` elsewhere.

        If only `condition` is given, return the tuple

        ``condition.nonzero()``, the indices where `condition` is True.

    See Also

    --------

    nonzero, choose

    Notes

    -----

    If `x` and `y` are given and input arrays are 1-D, `where` is

    equivalent to::

        [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]

    Examples

    --------

    >>> np.where([[True, False], [True, True]],

    ...          [[1, 2], [3, 4]],

    ...          [[9, 8], [7, 6]])

    array([[1, 8],

           [3, 4]])

    >>> np.where([[0, 1], [1, 0]])

    (array([0, 1]), array([1, 0]))

    >>> x = np.arange(9.).reshape(3, 3)

    >>> np.where( x > 5 )

    (array([2, 2, 2]), array([0, 1, 2]))

    >>> x[np.where( x > 3.0 )]               # Note: result is 1D.

    array([ 4.,  5.,  6.,  7.,  8.])

    >>> np.where(x < 5, x, -1)               # Note: broadcasting.

    array([[ 0.,  1.,  2.],

           [ 3.,  4., -1.],

           [-1., -1., -1.]])

    Find the indices of elements of `x` that are in `goodvalues`.

    >>> goodvalues = [3, 4, 7]

    >>> ix = np.isin(x, goodvalues)

    >>> ix

    array([[False, False, False],

           [ True,  True, False],

           [False,  True, False]])

    >>> np.where(ix)

    (array([1, 1, 2]), array([0, 1, 1]))

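Usage of np.where

As the help text above shows, np.where takes a condition argument plus two optional arguments, x and y.

Whether x and y are supplied, and what shapes they have, determines what np.where returns. Without x and y it is equivalent to np.nonzero: it returns a tuple of index arrays, one per dimension of condition, locating the elements that are True or non-zero. With x and y, the three arguments condition, x and y are first broadcast to a common shape (if their shapes differ), and each output element is then taken from x where condition is True and from y otherwise.

(1) Without the optional arguments x and y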

a=np.random.randint(0,high=2,size=(3,3));a

array([[0, 1, 1],

       [1, 1, 0],

       [1, 1, 0]])

np.where(a)

(array([0, 0, 1, 1, 2, 2], dtype=int64),

 array([1, 2, 0, 1, 0, 1], dtype=int64))
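The tuple of index arrays is easier to read as (row, column) pairs. The following is a minimal sketch: it hard-codes the same array as above instead of drawing it randomly, so the result is reproducible. It checks that the one-argument form matches np.nonzero. np.argwhere is not part of the original discussion and is used here only for comparison.

```python
import numpy as np

# Same values as the random array shown above, hard-coded for reproducibility
a = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 1, 0]])

# One index array per dimension: row indices and column indices
rows, cols = np.where(a)

# The one-argument form returns the same tuple as np.nonzero
same = all(np.array_equal(p, q) for p, q in zip(np.where(a), np.nonzero(a)))

# Pair the two arrays up to get (row, col) coordinates of the non-zero entries
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)

# np.argwhere gives the same coordinates as a 2-D array, one row per match
print(np.argwhere(a).tolist())
```

The tuple form is the one to use for indexing, as in a[np.where(a)], while the paired form is more convenient for looping over the matching positions.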

(2) With x and y: the output has the shape obtained by broadcasting condition, x and y together, and each element is picked from x or y according to condition.

cond=np.array([True,False])

x=np.arange(6).reshape(3,2);x

array([[0, 1],

       [2, 3],

       [4, 5]])

y=np.array([[100,200]])

cond.shape

(2,)

x.shape

(3, 2)

y.shape

(1, 2)

So the broadcast shape should be (3, 2).

result=np.where(cond,x,y);result

array([[  0, 200],

       [  2, 200],

       [  4, 200]])

result.shape

(3, 2)
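The broadcasting step can be made explicit. The sketch below reuses the cond, x and y from this example, computes the common shape with np.broadcast, and rebuilds the result by hand with the list-comprehension rule quoted in the help text. The names common_shape and manual are introduced here only for illustration.

```python
import numpy as np

cond = np.array([True, False])        # shape (2,)
x = np.arange(6).reshape(3, 2)        # shape (3, 2)
y = np.array([[100, 200]])            # shape (1, 2)

# np.broadcast reports the common shape without copying any data
common_shape = np.broadcast(cond, x, y).shape
print(common_shape)

# Expand all three to the common shape, then choose element by element
cond_b, x_b, y_b = np.broadcast_arrays(cond, x, y)
manual = np.array([[xv if c else yv for c, xv, yv in zip(cr, xr, yr)]
                   for cr, xr, yr in zip(cond_b, x_b, y_b)])

result = np.where(cond, x, y)
print(np.array_equal(manual, result))
```

Because cond is broadcast along the rows, its first entry (True) selects column 0 from x and its second entry (False) selects column 1 from y in every row.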

where in pandas

help(pd.DataFrame.where)

Help on function where in module pandas.core.generic:

where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)

    Return an object of same shape as self and whose corresponding

    entries are from self where `cond` is True and otherwise are from

    `other`.

    Parameters

    ----------

    cond : boolean NDFrame, array-like, or callable

        Where `cond` is True, keep the original value. Where

        False, replace with corresponding value from `other`.

        If `cond` is callable, it is computed on the NDFrame and

        should return boolean NDFrame or array. The callable must

        not change input NDFrame (though pandas doesn't check it).

        .. versionadded:: 0.18.1

            A callable can be used as cond.

    other : scalar, NDFrame, or callable

        Entries where `cond` is False are replaced with

        corresponding value from `other`.

        If other is callable, it is computed on the NDFrame and

        should return scalar or NDFrame. The callable must not

        change input NDFrame (though pandas doesn't check it).

        .. versionadded:: 0.18.1

            A callable can be used as other.

    inplace : boolean, default False

        Whether to perform the operation in place on the data

    axis : alignment axis if needed, default None

    level : alignment level if needed, default None

    errors : str, {'raise', 'ignore'}, default 'raise'

        - ``raise`` : allow exceptions to be raised

        - ``ignore`` : suppress exceptions. On error return original object

        Note that currently this parameter won't affect

        the results and will always coerce to a suitable dtype.

    try_cast : boolean, default False

        try to cast the result back to the input type (if possible),

    raise_on_error : boolean, default True

        Whether to raise on invalid data types (e.g. trying to where on

        strings)

        .. deprecated:: 0.21.0

    Returns

    -------

    wh : same type as caller

    Notes

    -----

    The where method is an application of the if-then idiom. For each

    element in the calling DataFrame, if ``cond`` is ``True`` the

    element is used; otherwise the corresponding element from the DataFrame

    ``other`` is used.

    The signature for :func:`DataFrame.where` differs from

    :func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to

    ``np.where(m, df1, df2)``.

    For further details and examples see the ``where`` documentation in

    :ref:`indexing <indexing.where_mask>`.

    Examples

    --------

    >>> s = pd.Series(range(5))

    >>> s.where(s > 0)

    0    NaN

    1    1.0

    2    2.0

    3    3.0

    4    4.0

    >>> s.mask(s > 0)

    0    0.0

    1    NaN

    2    NaN

    3    NaN

    4    NaN

    >>> s.where(s > 1, 10)

    0    10.0

    1    10.0

    2    2.0

    3    3.0

    4    4.0

    >>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])

    >>> m = df % 3 == 0

    >>> df.where(m, -df)

       A  B

    0  0 -1

    1 -2  3

    2 -4 -5

    3  6 -7

    4 -8  9

    >>> df.where(m, -df) == np.where(m, df, -df)

          A     B

    0  True  True

    1  True  True

    2  True  True

    3  True  True

    4  True  True

    >>> df.where(m, -df) == df.mask(~m, -df)

          A     B

    0  True  True

    1  True  True

    2  True  True

    3  True  True

    4  True  True

    See Also

    --------

    :func:`DataFrame.mask`

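As the help text above shows, the where method of DataFrame and Series follows the if-then idiom: elements of the caller (a DataFrame or Series) are kept where cond is True and replaced by other (NaN by default) where cond is False. inplace defaults to False, so a new DataFrame or Series with the same shape as the caller is returned; if it is True, the caller is modified in place. The mask method does exactly the opposite: it replaces the elements where cond is True.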
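A short sketch of these behaviours on a small Series, assuming a reasonably recent pandas. The variable names are illustrative, and the exact dtype of the filled result may differ between pandas versions.

```python
import pandas as pd

s = pd.Series(range(5))

# Default other: positions where cond is False become NaN
kept = s.where(s > 1)

# Explicit other: positions where cond is False take the value 10
filled = s.where(s > 1, 10)

# mask is the mirror image: it replaces where cond is True
masked = s.mask(s <= 1, 10)

# cond may also be a callable that receives the caller
via_callable = s.where(lambda v: v > 1, 10)

print(kept.isna().tolist())
print(filled.tolist())
```

Here filled, masked and via_callable hold the same values: the three calls express the same replacement in different ways.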

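Differences between np.where and the where method of DataFrame or Series:

(1) In numpy, where is a module-level function, and ndarray objects have no where method. pandas has no module-level where function, so it can only be called on a DataFrame or Series object.

(2) In np.where, condition can be an array or a boolean value. In the where method of DataFrame and Series, cond can be an array or a boolean value, and also a callable.

(3) np.where takes x, the values to choose where condition is True. The pandas method follows the if-then idiom and takes only other, the replacement for positions where cond is False; the True positions keep the caller's own values.

(4) The shape of what np.where returns depends on condition, x and y: it is the shape the three broadcast to. The pandas method always returns an object with the same shape as the caller.

(5) The pandas method has an inplace parameter that decides whether a new object is returned or the caller is modified in place. np.where always builds a new array, so it has no such parameter.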
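Points (1) and (4) can be checked directly. The sketch below reuses the DataFrame from the pandas help example; the names has_ndarray_where, has_pd_where, from_pandas and from_numpy are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# (1) where is a function on the numpy module and a method on pandas objects
has_ndarray_where = hasattr(np.ndarray, "where")
has_pd_where = hasattr(pd, "where")
print(has_ndarray_where, has_pd_where)

# (4) same inputs, different return types
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
m = df % 3 == 0

from_pandas = df.where(m, -df)     # DataFrame, keeps df's index and columns
from_numpy = np.where(m, df, -df)  # plain ndarray, labels are dropped

print(type(from_pandas).__name__, type(from_numpy).__name__)
print(np.array_equal(from_pandas.to_numpy(), from_numpy))
```

The values agree, but only the pandas result carries the row and column labels, which matters when the result is combined with other labelled data afterwards.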
