String Manipulation with pandas

String object methods

import pandas as pd

import numpy as np

val='a,b, guido'

val.split(',') # Python's built-in str.split method

['a', 'b', ' guido']

pieces=[x.strip() for x in val.split(',')];pieces  # strip whitespace

['a', 'b', 'guido']

'::'.join(pieces)

'a::b::guido'

val.count(',')

2

val.count('guido')

1

val.replace(',',':')

'a:b: guido'

val.swapcase()

'A,B, GUIDO'

val[::-1]

'odiug ,b,a'
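The split, strip and join steps shown above are often used together. As a minimal sketch, they can be wrapped in one small helper; the name `normalize_fields` and its parameters are hypothetical and only serve this illustration.

```python
def normalize_fields(line, sep=',', joiner='::'):
    """Split on sep, strip whitespace from each piece, and rejoin with joiner."""
    pieces = [piece.strip() for piece in line.split(sep)]
    return joiner.join(pieces)


val = 'a,b, guido'
result = normalize_fields(val)
print(result)  # a::b::guido
```

Keeping the separator and the joiner as parameters lets the same helper handle other delimiters without changes.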

Regular expressions

The re module functions fall into three categories: pattern matching, substitution, and splitting.
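As a rough illustration of the three categories, the sketch below uses one function from each on the same input. The variable names are chosen for this example only.

```python
import re

text = 'foo   bar\t baz  \t qux'

# Pattern matching: find every run of word characters.
words = re.findall(r'\w+', text)

# Substitution: collapse each run of whitespace into a single space.
collapsed = re.sub(r'\s+', ' ', text)

# Splitting: break the string apart on runs of whitespace.
parts = re.split(r'\s+', text)

print(words)      # ['foo', 'bar', 'baz', 'qux']
print(collapsed)  # foo bar baz qux
print(parts)      # ['foo', 'bar', 'baz', 'qux']
```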

import re

text='foo   bar\t baz  \t qux'

re.split(r'\s+',text)

['foo', 'bar', 'baz', 'qux']

regex=re.compile(r'\s+')

regex.split(text)

['foo', 'bar', 'baz', 'qux']

regex.findall(text)

['   ', '\t ', '  \t ']

To avoid unwanted escaping with \ in a regular expression, use raw string literals such as r'C:\x' instead of the equivalent 'C:\\x'.
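A small sketch of why this matters: in a normal string literal Python itself interprets '\b' as a backspace character, while in a raw string the backslash survives and reaches the regex engine as a word boundary. The strings below are made up for this illustration.

```python
import re

# Normal literal: '\b' becomes a single backspace character.
plain = '\bcat\b'
# Raw literal: the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bcat\b'

print(len(plain), len(raw))  # 5 7

sentence = 'the cat sat'
print(re.search(plain, sentence))        # None, it looks for backspace characters
print(re.search(raw, sentence).group())  # cat
```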

text="""Dave dave@google.com

Steve steve@mail.com

Rob rob@mail.com

Ryan ryan@yahoo.com

"""

pattern=r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

regex=re.compile(pattern,re.I)

Using findall() produces a list of the email addresses.

regex.findall(text)

['dave@google.com', 'steve@mail.com', 'rob@mail.com', 'ryan@yahoo.com']

regex.findall(r' J.onepy+@w-m.co')

['J.onepy+@w-m.co']

search() returns a special match object for the first email address in the text.

m=regex.search(text);m

<re.Match object; span=(5, 20), match='dave@google.com'>

regex.match(text)

text[m.start():m.end()]

'dave@google.com'

regex.match(text) returns None, as it only matches if the pattern occurs at the start of the string.
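The difference between search() and match() can be seen side by side in the sketch below. It reuses the email pattern from above on a single line of the sample text; the variable names are only for this example.

```python
import re

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex = re.compile(pattern, flags=re.IGNORECASE)

line = 'Dave dave@google.com'

found = regex.search(line)             # scans the whole string
at_start = regex.match(line)           # only tries position 0
bare = regex.match('dave@google.com')  # here the address starts at position 0

print(found.group(), found.span())  # dave@google.com (5, 20)
print(at_start)                     # None
print(bare.group())                 # dave@google.com
```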

sub() returns a new string with occurrences of the pattern replaced by a new string.

print(regex.sub('REDACTED',text))

Dave REDACTED

Steve REDACTED

Rob REDACTED

Ryan REDACTED
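sub() can also reuse parts of each match. The sketch below assumes a variant of the email pattern with parentheses added around the username, domain and suffix (these capture groups are not in the pattern used above); \1, \2 and \3 in the replacement then refer to the captured groups.

```python
import re

# Assumption for illustration: the same address pattern, with three capture groups added.
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'
regex = re.compile(pattern, flags=re.IGNORECASE)

m = regex.match('wesm@bright.net')
print(m.groups())  # ('wesm', 'bright', 'net')

# Backreferences \1, \2, \3 insert the captured pieces into the replacement.
masked = regex.sub(r'Username: \1, Domain: \2, Suffix: \3', 'Dave dave@google.com')
print(masked)  # Dave Username: dave, Domain: google, Suffix: com
```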

Vectorized string functions in pandas

data={'Dave':'dave@google.com','Steve':'steve@gmeil.com','Rob':'rob@gmail.com','Wes':np.nan}

data=pd.Series(data);data

Dave     dave@google.com

Steve    steve@gmeil.com

Rob        rob@gmail.com

Wes                  NaN

dtype: object

data.isnull()

Dave     False

Steve    False

Rob      False

Wes       True

dtype: bool

data.str.contains('gmail')

Dave     False

Steve    False

Rob       True

Wes        NaN

dtype: object
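The NaN in the result above makes it awkward to use as a filter. One possible approach, sketched here, is to pass na=False to str.contains so that missing entries count as False and the result is a plain boolean mask. The names `mask` and `gmail_users` are only for this example.

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# na=False reports missing entries as False instead of NaN.
mask = data.str.contains('gmail', na=False)
print(mask)

# The mask can then be used for boolean indexing.
gmail_users = data[mask]
print(gmail_users)
```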

data

Dave     dave@google.com

Steve    steve@gmeil.com

Rob        rob@gmail.com

Wes                  NaN

dtype: object

data.map(lambda x:x[:2],na_action='ignore')  # x is each value in data; the returned Series has the same index as the caller (data here).

Dave      da

Steve     st

Rob       ro

Wes      NaN

dtype: object

help(data.map)

Help on method map in module pandas.core.series:

map(arg, na_action=None) method of pandas.core.series.Series instance

    Map values of Series using input correspondence (a dict, Series, or

    function).

    Parameters

    ----------

    arg : function, dict, or Series

        Mapping correspondence.

    na_action : {None, 'ignore'}

        If 'ignore', propagate NA values, without passing them to the

        mapping correspondence.

    Returns

    -------

    y : Series

        Same index as caller.

    Examples

    --------

    Map inputs to outputs (both of type `Series`):

    >>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])

    >>> x

    one      1

    two      2

    three    3

    dtype: int64

    >>> y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])

    >>> y

    1    foo

    2    bar

    3    baz

    >>> x.map(y)

    one   foo

    two   bar

    three baz

    If `arg` is a dictionary, return a new Series with values converted

    according to the dictionary's mapping:

    >>> z = {1: 'A', 2: 'B', 3: 'C'}

    >>> x.map(z)

    one   A

    two   B

    three C

    Use na_action to control whether NA values are affected by the mapping

    function.

    >>> s = pd.Series([1, 2, 3, np.nan])

    >>> s2 = s.map('this is a string {}'.format, na_action=None)

    0    this is a string 1.0

    1    this is a string 2.0

    2    this is a string 3.0

    3    this is a string nan

    dtype: object

    >>> s3 = s.map('this is a string {}'.format, na_action='ignore')

    0    this is a string 1.0

    1    this is a string 2.0

    2    this is a string 3.0

    3                     NaN

    dtype: object

    See Also

    --------

    Series.apply : For applying more complex functions on a Series.

    DataFrame.apply : Apply a function row-/column-wise.

    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Notes

    -----

    When `arg` is a dictionary, values in Series that are not in the

    dictionary (as keys) are converted to ``NaN``. However, if the

    dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.

    provides a method for default values), then this default is used

    rather than ``NaN``:

    >>> from collections import Counter

    >>> counter = Counter()

    >>> counter['bar'] += 1

    >>> y.map(counter)

    1    0

    2    1

    3    0

    dtype: int64

pattern

'[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}'

data.str.findall(pattern,flags=re.I)

Dave     [dave@google.com]

Steve    [steve@gmeil.com]

Rob        [rob@gmail.com]

Wes                    NaN

dtype: object

matches=data.str.match(pattern,flags=re.I);matches

Dave     True

Steve    True

Rob      True

Wes       NaN

dtype: object

matches.str.get(1)

Dave    NaN

Steve   NaN

Rob     NaN

Wes     NaN

dtype: float64

matches.str[0]

Dave    NaN

Steve   NaN

Rob     NaN

Wes     NaN

dtype: float64
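As the all-NaN outputs above suggest, the result of str.match here holds booleans, so indexing into it does not recover parts of the address. A possible alternative, sketched below, is str.extract with capture groups. The grouped pattern and the column names are assumptions for this illustration and are not part of the original example.

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# Assumption: the email pattern with three capture groups added.
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

# extract returns a DataFrame with one column per capture group.
parts = data.str.extract(pattern, flags=re.IGNORECASE)
parts.columns = ['user', 'domain', 'suffix']
print(parts)
```

Rows with missing input, such as Wes, come back as missing values in every column.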

data.str[:5]

Dave     dave@

Steve    steve

Rob      rob@g

Wes        NaN

dtype: object
