String manipulation with pandas

String object methods

import pandas as pd
import numpy as np
val='a,b, guido'
val.split(',') # Python's built-in str.split method
['a', 'b', ' guido']
pieces=[x.strip() for x in val.split(',')];pieces  # strip whitespace
['a', 'b', 'guido']
'::'.join(pieces)
'a::b::guido'
val.count(',')
2
val.count('guido')
1
val.replace(',',':')
'a:b: guido'
val.swapcase()
'A,B, GUIDO'
val[::-1]
'odiug ,b,a'
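The split, strip and join steps above are often used together. A minimal sketch that wraps them in a small helper; the name split_and_clean is made up for this illustration and is not part of Python or pandas:

```python
def split_and_clean(text, sep=","):
    """Split text on sep and strip surrounding whitespace from each piece."""
    # split_and_clean is a hypothetical helper name used only for this example
    return [piece.strip() for piece in text.split(sep)]


val = "a,b, guido"
pieces = split_and_clean(val)   # same result as the list comprehension above
joined = "::".join(pieces)      # join the cleaned pieces with a new delimiter

print(pieces)
print(joined)
```

Because strings are immutable, none of these calls change val; each one returns a new object.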

Regular expressions

The re module functions fall into 3 categories: pattern matching, substitution, and splitting.

import re
text='foo   bar\t baz  \t qux'
re.split(r'\s+',text)
['foo', 'bar', 'baz', 'qux']
regex=re.compile(r'\s+')
regex.split(text)
['foo', 'bar', 'baz', 'qux']
regex.findall(text)
['   ', '\t ', '  \t ']
  • To avoid unwanted escaping with \ in a regular expression, use raw string literals such as r'\s+'.
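A short sketch of what the raw string buys you, reusing the whitespace pattern from above. The variable names are only for illustration:

```python
import re

text = "foo   bar\t baz  \t qux"

# In a raw string the backslash is kept as-is, so the regex engine
# receives the two characters "\" and "s".
raw_pattern = r"\s+"
# The same pattern in a normal string literal needs a doubled backslash.
escaped_pattern = "\\s+"

same = raw_pattern == escaped_pattern
tokens = re.split(raw_pattern, text)

print(same)
print(tokens)
```

Both spellings produce the same pattern; the raw form is simply easier to read and harder to get wrong.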
text="""Dave dave@google.com
Steve steve@mail.com
Rob rob@mail.com
Ryan ryan@yahoo.com
"""
pattern=r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex=re.compile(pattern,re.I)

Using findall() produces a list of the email addresses.

regex.findall(text)
['dave@google.com', 'steve@mail.com', 'rob@mail.com', 'ryan@yahoo.com']
regex.findall(r' J.onepy+@w-m.co')
['J.onepy+@w-m.co']

search() returns a match object for only the first email address in the text.

m=regex.search(text)
m
<re.Match object; span=(5, 20), match='dave@google.com'>
regex.match(text)
text[m.start():m.end()]
'dave@google.com'

regex.match(text) returns None, because it matches only if the pattern occurs at the start of the string.
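The difference between search() and match() can be checked side by side. This is a minimal sketch with a shortened copy of the text; the second string passed to match() is invented for the example:

```python
import re

pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex = re.compile(pattern, re.I)

text = "Dave dave@google.com\nSteve steve@mail.com\n"

# search() scans the whole string and stops at the first hit
found = regex.search(text)

# match() only tries at position 0; "Dave " is not an email address
at_start = regex.match(text)

# match() succeeds when the string begins with an address (made-up input)
leading = regex.match("dave@google.com is Dave's address")

print(found.span(), found.group())
print(at_start)
print(leading.group())
```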

sub() returns a new string with occurrences of the pattern replaced by a new string.

print(regex.sub('REDACTED',text))
Dave REDACTED
Steve REDACTED
Rob REDACTED
Ryan REDACTED
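Two variations on sub() that may be useful here: the count argument limits how many replacements are made, and subn() also reports how many were made. A small sketch on a shortened copy of the text:

```python
import re

regex = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', re.I)
text = "Dave dave@google.com\nSteve steve@mail.com\n"

# Replace every address
redacted_all = regex.sub('REDACTED', text)

# Replace only the first address
redacted_first = regex.sub('REDACTED', text, count=1)

# subn() returns the new string together with the number of replacements
new_text, n_replaced = regex.subn('REDACTED', text)

print(redacted_all)
print(redacted_first)
print(n_replaced)
```

In every case the original text is left untouched, since a new string is returned.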

Vectorized string functions in pandas

data={'Dave':'dave@google.com','Steve':'steve@gmeil.com','Rob':'rob@gmail.com','Wes':np.nan}
data=pd.Series(data);data
Dave     dave@google.com
Steve    steve@gmeil.com
Rob        rob@gmail.com
Wes                  NaN
dtype: object
data.isnull()
Dave     False
Steve    False
Rob      False
Wes       True
dtype: bool
data.str.contains('gmail')
Dave     False
Steve    False
Rob       True
Wes        NaN
dtype: object
dtype: object
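The NaN in the result above means it cannot be used directly as a boolean mask. One way around this, sketched here, is the na argument of str.contains, which sets the value reported for missing entries:

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# na=False reports missing values as False instead of NaN
has_gmail = data.str.contains('gmail', na=False)

# The result can now be used to filter the Series
gmail_rows = data[has_gmail]

print(has_gmail)
print(gmail_rows)
```

Whether a missing value should count as False depends on the analysis; na=True is equally possible.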
data
Dave     dave@google.com
Steve    steve@gmeil.com
Rob        rob@gmail.com
Wes                  NaN
dtype: object
data.map(lambda x:x[:2],na_action='ignore')  # x is each value in data; the returned Series has the same index as the caller (data here).
Dave      da
Steve     st
Rob       ro
Wes      NaN
dtype: object
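For simple slicing, the str accessor gives the same values as map with na_action='ignore', and it skips missing values on its own. A small comparison, as a sketch:

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# Element-wise function; na_action='ignore' keeps NaN out of the lambda
via_map = data.map(lambda x: x[:2], na_action='ignore')

# Vectorized slicing through the str accessor; NaN is propagated
via_str = data.str[:2]

print(via_map)
print(via_str)
```

map stays useful when the per-element logic is more than a string method can express.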
help(data.map)
Help on method map in module pandas.core.series:

map(arg, na_action=None) method of pandas.core.series.Series instance
    Map values of Series using input correspondence (a dict, Series, or
    function).

    Parameters
    ----------
    arg : function, dict, or Series
        Mapping correspondence.
    na_action : {None, 'ignore'}
        If 'ignore', propagate NA values, without passing them to the
        mapping correspondence.

    Returns
    -------
    y : Series
        Same index as caller.

    Examples
    --------

    Map inputs to outputs (both of type `Series`):

    >>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])
    >>> x
    one      1
    two      2
    three    3
    dtype: int64

    >>> y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])
    >>> y
    1    foo
    2    bar
    3    baz

    >>> x.map(y)
    one   foo
    two   bar
    three baz

    If `arg` is a dictionary, return a new Series with values converted
    according to the dictionary's mapping:

    >>> z = {1: 'A', 2: 'B', 3: 'C'}

    >>> x.map(z)
    one   A
    two   B
    three C

    Use na_action to control whether NA values are affected by the mapping
    function.

    >>> s = pd.Series([1, 2, 3, np.nan])

    >>> s2 = s.map('this is a string {}'.format, na_action=None)
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3    this is a string nan
    dtype: object

    >>> s3 = s.map('this is a string {}'.format, na_action='ignore')
    0    this is a string 1.0
    1    this is a string 2.0
    2    this is a string 3.0
    3                     NaN
    dtype: object

    See Also
    --------
    Series.apply : For applying more complex functions on a Series.
    DataFrame.apply : Apply a function row-/column-wise.
    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.

    Notes
    -----
    When `arg` is a dictionary, values in Series that are not in the
    dictionary (as keys) are converted to ``NaN``. However, if the
    dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
    provides a method for default values), then this default is used
    rather than ``NaN``:

    >>> from collections import Counter
    >>> counter = Counter()
    >>> counter['bar'] += 1
    >>> y.map(counter)
    1    0
    2    1
    3    0
    dtype: int64
pattern
'[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}'
data.str.findall(pattern,flags=re.I)
Dave     [dave@google.com]
Steve    [steve@gmeil.com]
Rob        [rob@gmail.com]
Wes                    NaN
dtype: object
matches=data.str.match(pattern,flags=re.I);matches
Dave     True
Steve    True
Rob      True
Wes       NaN
dtype: object
matches.str.get(1)
Dave    NaN
Steve   NaN
Rob     NaN
Wes     NaN
dtype: float64
matches.str[0]
Dave    NaN
Steve   NaN
Rob     NaN
Wes     NaN
dtype: float64
data.str[:5]
Dave     dave@
Steve    steve
Rob      rob@g
Wes        NaN
dtype: object
dtype: object
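str.get and str[...] index into each element, so they are most useful on a Series whose elements are strings or lists. Applied to the lists returned by str.findall, they pull out the first address per row. A minimal sketch, assuming the same data and pattern as above:

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})
pattern = r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'

# Each non-missing element becomes a list of all matches in that string
found = data.str.findall(pattern, flags=re.I)

# Take element 0 of each list; rows that were NaN stay NaN
first_address = found.str.get(0)

print(first_address)
```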
