String Manipulation related with pandas
String object Methods
import pandas as pd
import numpy as np
val='a,b, guido'
val.split(',') # normal python built-in method split
['a', 'b', ' guido']
pieces=[x.strip() for x in val.split(',')];pieces # strip whitespace
['a', 'b', 'guido']
'::'.join(pieces)
'a::b::guido'
val.count(',')
2
val.count('guido')
1
val.replace(',',':')
'a:b: guido'
val.swapcase()
'A,B, GUIDO'
val[::-1]
'odiug ,b,a'
Regular expressions
The re module functions fall into three categories: pattern matching, substitution, and splitting.
import re
text='foo bar\t baz \t qux'
re.split(r'\s+',text)
['foo', 'bar', 'baz', 'qux']
regex=re.compile(r'\s+')
regex.split(text)
['foo', 'bar', 'baz', 'qux']
regex.findall(text)
[' ', '\t ', ' \t ']
- To avoid unwanted escaping with \ in a regular expression, use raw string literals such as r'\s+'.
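A minimal sketch of why the raw-string prefix helps. The variable names `escaped` and `raw` are only for illustration and do not appear in the original notes.

```python
import re

# Without the r prefix, each backslash must be doubled to reach the regex engine.
escaped = '\\s+'
raw = r'\s+'
print(escaped == raw)           # True: both spell the same three-character pattern

text = 'foo bar\t baz \t qux'
print(re.split(raw, text))      # ['foo', 'bar', 'baz', 'qux']

# Where it really matters: '\b' is a single backspace character in a normal
# string, but a two-character word-boundary token in a raw string.
print(len('\b'), len(r'\b'))    # 1 2
print(re.findall(r'\bba\w', text))  # ['bar', 'baz']
```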
text="""Dave dave@google.com
Steve steve@mail.com
Rob rob@mail.com
Ryan ryan@yahoo.com
"""
pattern=r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}'
regex=re.compile(pattern,re.I)
findall() returns a list of all the email addresses in the text.
regex.findall(text)
['dave@google.com', 'steve@mail.com', 'rob@mail.com', 'ryan@yahoo.com']
regex.findall(r' J.onepy+@w-m.co')
['J.onepy+@w-m.co']
search() returns a match object for the first email address in the text.
m=regex.search(text)
m
<re.Match object; span=(5, 20), match='dave@google.com'>
regex.match(text)
text[m.start():m.end()]
'dave@google.com'
regex.match(text) returns None, because match() succeeds only if the pattern occurs at the start of the string.
sub() returns a new string with each occurrence of the pattern replaced by the given replacement string.
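The difference between match() and search() can be seen on two short strings. This is only a sketch: `line` and `bare` are made-up inputs, and the pattern is the same email pattern used above.

```python
import re

regex = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', re.I)

line = 'Dave dave@google.com'      # the address is not at position 0
bare = 'dave@google.com is Dave'   # the address starts the string

print(regex.match(line))           # None: nothing matches at the very start
print(regex.search(line).group())  # dave@google.com
print(regex.match(bare).group())   # dave@google.com
```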
print(regex.sub('REDACTED',text))
Dave REDACTED
Steve REDACTED
Rob REDACTED
Ryan REDACTED
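As a small follow-up sketch, sub() leaves the original string untouched, accepts a `count` argument to limit the number of replacements, and has a sibling subn() that also reports how many replacements were made. The shortened `text` and the '***' replacement are chosen here just for illustration.

```python
import re

regex = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', re.I)
text = 'Dave dave@google.com\nSteve steve@mail.com\n'

masked = regex.sub('***', text)               # replace every address
first_only = regex.sub('***', text, count=1)  # replace only the first one
masked_again, n = regex.subn('***', text)     # also return the replacement count

print(masked)
print(first_only)
print(n)                                      # 2
print('dave@google.com' in text)              # True: the original is unchanged
```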
Vectorized string functions in pandas
data={'Dave':'dave@google.com','Steve':'steve@gmeil.com','Rob':'rob@gmail.com','Wes':np.nan}
data=pd.Series(data);data
Dave dave@google.com
Steve steve@gmeil.com
Rob rob@gmail.com
Wes NaN
dtype: object
data.isnull()
Dave False
Steve False
Rob False
Wes True
dtype: bool
data.str.contains('gmail')
Dave False
Steve False
Rob True
Wes NaN
dtype: object
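Because the missing value propagates as NaN, the result above cannot be used directly as a boolean mask. One possible way around this, sketched here, is the `na` parameter of str.contains, which sets the value to use for missing entries. The name `mask` is only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# Treat missing values as "does not contain", which yields a plain bool Series.
mask = data.str.contains('gmail', na=False)
print(mask)

# The mask can now be used for boolean indexing.
print(data[mask])
```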
data
Dave dave@google.com
Steve steve@gmeil.com
Rob rob@gmail.com
Wes NaN
dtype: object
data.map(lambda x:x[:2],na_action='ignore') # x is each value in data; the returned Series keeps the caller's index (data here)
Dave da
Steve st
Rob ro
Wes NaN
dtype: object
help(data.map)
Help on method map in module pandas.core.series:
map(arg, na_action=None) method of pandas.core.series.Series instance
Map values of Series using input correspondence (a dict, Series, or
function).
Parameters
----------
arg : function, dict, or Series
Mapping correspondence.
na_action : {None, 'ignore'}
If 'ignore', propagate NA values, without passing them to the
mapping correspondence.
Returns
-------
y : Series
Same index as caller.
Examples
--------
Map inputs to outputs (both of type `Series`):
>>> x = pd.Series([1,2,3], index=['one', 'two', 'three'])
>>> x
one 1
two 2
three 3
dtype: int64
>>> y = pd.Series(['foo', 'bar', 'baz'], index=[1,2,3])
>>> y
1 foo
2 bar
3 baz
>>> x.map(y)
one foo
two bar
three baz
If `arg` is a dictionary, return a new Series with values converted
according to the dictionary's mapping:
>>> z = {1: 'A', 2: 'B', 3: 'C'}
>>> x.map(z)
one A
two B
three C
Use na_action to control whether NA values are affected by the mapping
function.
>>> s = pd.Series([1, 2, 3, np.nan])
>>> s2 = s.map('this is a string {}'.format, na_action=None)
0 this is a string 1.0
1 this is a string 2.0
2 this is a string 3.0
3 this is a string nan
dtype: object
>>> s3 = s.map('this is a string {}'.format, na_action='ignore')
0 this is a string 1.0
1 this is a string 2.0
2 this is a string 3.0
3 NaN
dtype: object
See Also
--------
Series.apply : For applying more complex functions on a Series.
DataFrame.apply : Apply a function row-/column-wise.
DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
Notes
-----
When `arg` is a dictionary, values in Series that are not in the
dictionary (as keys) are converted to ``NaN``. However, if the
dictionary is a ``dict`` subclass that defines ``__missing__`` (i.e.
provides a method for default values), then this default is used
rather than ``NaN``:
>>> from collections import Counter
>>> counter = Counter()
>>> counter['bar'] += 1
>>> y.map(counter)
1 0
2 1
3 0
dtype: int64
pattern
'[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}'
data.str.findall(pattern,flags=re.I)
Dave [dave@google.com]
Steve [steve@gmeil.com]
Rob [rob@gmail.com]
Wes NaN
dtype: object
matches=data.str.match(pattern,flags=re.I);matches
Dave True
Steve True
Rob True
Wes NaN
dtype: object
matches.str.get(1)
Dave NaN
Steve NaN
Rob NaN
Wes NaN
dtype: float64
matches.str[0]
Dave NaN
Steve NaN
Rob NaN
Wes NaN
dtype: float64
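In the output shown above, `matches` holds True/False values rather than the matched text, so indexing into it with str.get or str[] gives only NaN. One way to pull out the pieces of each address is str.extract with capture groups. The sketch below assumes three groups are added to the pattern; the `grouped` pattern and the column names are illustrative and not part of the original notes.

```python
import re
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmeil.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})

# Same email pattern as before, with parentheses around user, domain and suffix.
grouped = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

# extract() returns a DataFrame with one column per capture group;
# rows that do not match (including NaN) are filled with NaN.
parts = data.str.extract(grouped, flags=re.I)
parts.columns = ['user', 'domain', 'suffix']
print(parts)
```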
data.str[:5]
Dave dave@
Steve steve
Rob rob@g
Wes NaN
dtype: object