Data Preparation in Pandas
Data cleaning
import pandas as pd
import numpy as np
string_data=pd.Series(['aardvark','artichoke',np.nan,'avocado']);string_data
0 aardvark
1 artichoke
2 NaN
3 avocado
dtype: object
string_data.isnull()
0 False
1 False
2 True
3 False
dtype: bool
string_data[2]
nan
from numpy import nan as NA
data=pd.Series([1,NA,3.5,NA,7])
data.dropna()
0 1.0
2 3.5
4 7.0
dtype: float64
data[[False,True,True,False,False]]
1 NaN
2 3.5
dtype: float64
data[data.notnull()]
0 1.0
2 3.5
4 7.0
dtype: float64
data=pd.DataFrame([[1,6.5,3],[1,NA,NA],[NA,NA,NA],[NA,6.5,3]]);data
|
0 |
1 |
2 |
| 0 |
1.0 |
6.5 |
3.0 |
| 1 |
1.0 |
NaN |
NaN |
| 2 |
NaN |
NaN |
NaN |
| 3 |
NaN |
6.5 |
3.0 |
data.dropna()
data.dropna(how='all')
|
0 |
1 |
2 |
| 0 |
1.0 |
6.5 |
3.0 |
| 1 |
1.0 |
NaN |
NaN |
| 3 |
NaN |
6.5 |
3.0 |
data[4]=NA;data
|
0 |
1 |
2 |
4 |
| 0 |
1.0 |
6.5 |
3.0 |
NaN |
| 1 |
1.0 |
NaN |
NaN |
NaN |
| 2 |
NaN |
NaN |
NaN |
NaN |
| 3 |
NaN |
6.5 |
3.0 |
NaN |
data.dropna(how='all',axis='columns')
|
0 |
1 |
2 |
| 0 |
1.0 |
6.5 |
3.0 |
| 1 |
1.0 |
NaN |
NaN |
| 2 |
NaN |
NaN |
NaN |
| 3 |
NaN |
6.5 |
3.0 |
df=pd.DataFrame(np.random.randn(7,3))
df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
-0.281787 |
-0.963212 |
| 1 |
-1.114174 |
0.024707 |
0.095524 |
| 2 |
0.879205 |
-1.272202 |
-0.317218 |
| 3 |
0.227725 |
-0.067809 |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
help(np.random.randn)
Help on built-in function randn:
randn(...) method of mtrand.RandomState instance
randn(d0, d1, ..., dn)
Return a sample (or samples) from the "standard normal" distribution.
If positive, int_like or int-convertible arguments are provided,
`randn` generates an array of shape ``(d0, d1, ..., dn)``, filled
with random floats sampled from a univariate "normal" (Gaussian)
distribution of mean 0 and variance 1 (if any of the :math:`d_i` are
floats, they are first converted to integers by truncation). A single
float randomly sampled from the distribution is returned if no
argument is provided.
This is a convenience function. If you want an interface that takes a
tuple as the first argument, use `numpy.random.standard_normal` instead.
Parameters
----------
d0, d1, ..., dn : int, optional
The dimensions of the returned array, should be all positive.
If no argument is given a single Python float is returned.
Returns
-------
Z : ndarray or float
A ``(d0, d1, ..., dn)``-shaped array of floating-point samples from
the standard normal distribution, or a single such float if
no parameters were supplied.
See Also
--------
random.standard_normal : Similar, but takes a tuple as its argument.
Notes
-----
For random samples from :math:`N(\mu, \sigma^2)`, use:
``sigma * np.random.randn(...) + mu``
Examples
--------
>>> np.random.randn()
2.1923875335537315 #random
Two-by-four array of samples from N(3, 6.25):
>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501, 4.00950034, -1.81814867, 7.29718677], #random
[ 0.39924804, 4.68456316, 4.99394529, 4.84057254]]) #random
df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
-0.281787 |
-0.963212 |
| 1 |
-1.114174 |
0.024707 |
0.095524 |
| 2 |
0.879205 |
-1.272202 |
-0.317218 |
| 3 |
0.227725 |
-0.067809 |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.iloc[:4,1]=NA;df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
NaN |
-0.963212 |
| 1 |
-1.114174 |
NaN |
0.095524 |
| 2 |
0.879205 |
NaN |
-0.317218 |
| 3 |
0.227725 |
NaN |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.iloc[:2,2]=NA;df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
NaN |
NaN |
| 1 |
-1.114174 |
NaN |
NaN |
| 2 |
0.879205 |
NaN |
-0.317218 |
| 3 |
0.227725 |
NaN |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.dropna()
|
0 |
1 |
2 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.dropna(thresh=2)
|
0 |
1 |
2 |
| 2 |
0.879205 |
NaN |
-0.317218 |
| 3 |
0.227725 |
NaN |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.fillna(0)
|
0 |
1 |
2 |
| 0 |
-1.744196 |
0.000000 |
0.000000 |
| 1 |
-1.114174 |
0.000000 |
0.000000 |
| 2 |
0.879205 |
0.000000 |
-0.317218 |
| 3 |
0.227725 |
0.000000 |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.fillna({1:0.5,2:0})
|
0 |
1 |
2 |
| 0 |
-1.744196 |
0.500000 |
0.000000 |
| 1 |
-1.114174 |
0.500000 |
0.000000 |
| 2 |
0.879205 |
0.500000 |
-0.317218 |
| 3 |
0.227725 |
0.500000 |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
NaN |
NaN |
| 1 |
-1.114174 |
NaN |
NaN |
| 2 |
0.879205 |
NaN |
-0.317218 |
| 3 |
0.227725 |
NaN |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df.fillna(0,inplace=True)
df
|
0 |
1 |
2 |
| 0 |
-1.744196 |
0.000000 |
0.000000 |
| 1 |
-1.114174 |
0.000000 |
0.000000 |
| 2 |
0.879205 |
0.000000 |
-0.317218 |
| 3 |
0.227725 |
0.000000 |
0.609824 |
| 4 |
-1.082470 |
-1.230476 |
-1.616135 |
| 5 |
-1.218976 |
0.018245 |
-0.155761 |
| 6 |
-0.607157 |
-0.641986 |
-0.406378 |
df=pd.DataFrame(np.random.randn(6,3))
df.iloc[2:,1]=NA
df.iloc[4:,2]=NA
df
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
NaN |
2.034865 |
| 3 |
0.088611 |
NaN |
-0.004141 |
| 4 |
0.792289 |
NaN |
NaN |
| 5 |
0.668345 |
NaN |
NaN |
df.fillna(method='ffill')
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
0.290834 |
2.034865 |
| 3 |
0.088611 |
0.290834 |
-0.004141 |
| 4 |
0.792289 |
0.290834 |
-0.004141 |
| 5 |
0.668345 |
0.290834 |
-0.004141 |
df
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
NaN |
2.034865 |
| 3 |
0.088611 |
NaN |
-0.004141 |
| 4 |
0.792289 |
NaN |
NaN |
| 5 |
0.668345 |
NaN |
NaN |
df.dropna()
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
df.dropna(thresh=2)
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
NaN |
2.034865 |
| 3 |
0.088611 |
NaN |
-0.004141 |
df.dropna(thresh=2,inplace=True)
df
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
NaN |
2.034865 |
| 3 |
0.088611 |
NaN |
-0.004141 |
data=pd.DataFrame({'K1':['one','two']*3+['two'],'K2':[1,1,2,3,3,4,4]});data
|
K1 |
K2 |
| 0 |
one |
1 |
| 1 |
two |
1 |
| 2 |
one |
2 |
| 3 |
two |
3 |
| 4 |
one |
3 |
| 5 |
two |
4 |
| 6 |
two |
4 |
data.duplicated()
0 False
1 False
2 False
3 False
4 False
5 False
6 True
dtype: bool
data.drop_duplicates()
|
K1 |
K2 |
| 0 |
one |
1 |
| 1 |
two |
1 |
| 2 |
one |
2 |
| 3 |
two |
3 |
| 4 |
one |
3 |
| 5 |
two |
4 |
data['v1']=range(7)
data
|
K1 |
K2 |
v1 |
| 0 |
one |
1 |
0 |
| 1 |
two |
1 |
1 |
| 2 |
one |
2 |
2 |
| 3 |
two |
3 |
3 |
| 4 |
one |
3 |
4 |
| 5 |
two |
4 |
5 |
| 6 |
two |
4 |
6 |
data.drop_duplicates(['K1','K2'])
|
K1 |
K2 |
v1 |
| 0 |
one |
1 |
0 |
| 1 |
two |
1 |
1 |
| 2 |
one |
2 |
2 |
| 3 |
two |
3 |
3 |
| 4 |
one |
3 |
4 |
| 5 |
two |
4 |
5 |
df
|
0 |
1 |
2 |
| 0 |
-0.970921 |
-1.311345 |
0.779965 |
| 1 |
-0.352837 |
0.290834 |
-0.440396 |
| 2 |
0.574406 |
NaN |
2.034865 |
| 3 |
0.088611 |
NaN |
-0.004141 |
data
|
K1 |
K2 |
v1 |
| 0 |
one |
1 |
0 |
| 1 |
two |
1 |
1 |
| 2 |
one |
2 |
2 |
| 3 |
two |
3 |
3 |
| 4 |
one |
3 |
4 |
| 5 |
two |
4 |
5 |
| 6 |
two |
4 |
6 |
data.drop_duplicates(['K1','K2'])
|
K1 |
K2 |
v1 |
| 0 |
one |
1 |
0 |
| 1 |
two |
1 |
1 |
| 2 |
one |
2 |
2 |
| 3 |
two |
3 |
3 |
| 4 |
one |
3 |
4 |
| 5 |
two |
4 |
5 |
Transforming Data Using a Function or Mapping
import pandas as pd
import numpy as np
data=pd.DataFrame({'food':['bacon','pulled pork','bacon','pastrami','corned beef','Bacon','Pastrami','honey ham','nova lox'],
'ounces':[4,3,12,6,7.5,8,3,5,6]});data
|
food |
ounces |
| 0 |
bacon |
4.0 |
| 1 |
pulled pork |
3.0 |
| 2 |
bacon |
12.0 |
| 3 |
pastrami |
6.0 |
| 4 |
corned beef |
7.5 |
| 5 |
Bacon |
8.0 |
| 6 |
Pastrami |
3.0 |
| 7 |
honey ham |
5.0 |
| 8 |
nova lox |
6.0 |
meat_to_animal={'bacon':'pig',
'pulled pork':'pig',
'pastrami':'cow',
'corned beef':'cow',
'honey ham':'pig',
'nova lox':'salmon'}
pd.Series.str.lower
<function pandas.core.strings._noarg_wrapper.<locals>.wrapper>
- str.lower above is a Series method.
lowercased=data['food'].str.lower()
data['animal']=lowercased
data
|
food |
ounces |
animal |
| 0 |
bacon |
4.0 |
bacon |
| 1 |
pulled pork |
3.0 |
pulled pork |
| 2 |
bacon |
12.0 |
bacon |
| 3 |
pastrami |
6.0 |
pastrami |
| 4 |
corned beef |
7.5 |
corned beef |
| 5 |
Bacon |
8.0 |
bacon |
| 6 |
Pastrami |
3.0 |
pastrami |
| 7 |
honey ham |
5.0 |
honey ham |
| 8 |
nova lox |
6.0 |
nova lox |
The map() method on a Series accepts a function or dict-like object containing a mapping.Using map() is a convenient way to perform element-wise transformations and other data cleaning related operations.
data['animal']=lowercased.map(meat_to_animal);data
|
food |
ounces |
animal |
| 0 |
bacon |
4.0 |
pig |
| 1 |
pulled pork |
3.0 |
pig |
| 2 |
bacon |
12.0 |
pig |
| 3 |
pastrami |
6.0 |
cow |
| 4 |
corned beef |
7.5 |
cow |
| 5 |
Bacon |
8.0 |
pig |
| 6 |
Pastrami |
3.0 |
cow |
| 7 |
honey ham |
5.0 |
pig |
| 8 |
nova lox |
6.0 |
salmon |
We could also have passed a function that does all the work.Such as the following:
data['food'].map(lambda x:meat_to_animal[x.lower()])
0 pig
1 pig
2 pig
3 cow
4 cow
5 pig
6 cow
7 pig
8 salmon
Name: food, dtype: object
Replacing values
data=pd.Series([1,-999,2,-999,-1000,3]);data
0 1
1 -999
2 2
3 -999
4 -1000
5 3
dtype: int64
data.replace(-999,np.nan) # Replcace one value with one value
0 1.0
1 NaN
2 2.0
3 NaN
4 -1000.0
5 3.0
dtype: float64
data.replace([-999,-1000],np.nan) # Replace multi-values with one value
0 1.0
1 NaN
2 2.0
3 NaN
4 NaN
5 3.0
dtype: float64
data.replace([-999,-1000],[np.nan,0])# Replace multi-values with multi-values
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
data.replace({-999:np.nan,0-1000:0}) # dict can also be passed into replace method
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
data1=pd.Series(['A','B','c',12])
help(data1.str.replace)
Help on method replace in module pandas.core.strings:
replace(pat, repl, n=-1, case=True, flags=0) method of pandas.core.strings.StringMethods instance
Replace occurrences of pattern/regex in the Series/Index with
some other string. Equivalent to :meth:`str.replace` or
:func:`re.sub`.
Parameters
----------
pat : string
Character sequence or regular expression
repl : string
Replacement sequence
n : int, default -1 (all)
Number of replacements to make from start
case : boolean, default True
If True, case sensitive
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
Returns
-------
replaced : Series/Index of objects
Renaming Axis indexes
data=pd.DataFrame(np.arange(12).reshape((3,4)),index=['Ohio','Colorado','New York'],columns=['One','Two','three','Four']);data
|
One |
Two |
three |
Four |
| Ohio |
0 |
1 |
2 |
3 |
| Colorado |
4 |
5 |
6 |
7 |
| New York |
8 |
9 |
10 |
11 |
data.index.map(lambda x:x[:4].upper())
array(['OHIO', 'COLO', 'NEW '], dtype=object)
data
|
One |
Two |
three |
Four |
| Ohio |
0 |
1 |
2 |
3 |
| Colorado |
4 |
5 |
6 |
7 |
| New York |
8 |
9 |
10 |
11 |
data.index=data.index.map(lambda x:x[:4].upper());data # Modify DataFrame in-place
|
One |
Two |
three |
Four |
| OHIO |
0 |
1 |
2 |
3 |
| COLO |
4 |
5 |
6 |
7 |
| NEW |
8 |
9 |
10 |
11 |
If you want to create a transformed version of a dataset without modifying the original,a useful method is rename().
data
|
One |
Two |
three |
Four |
| OHIO |
0 |
1 |
2 |
3 |
| COLO |
4 |
5 |
6 |
7 |
| NEW |
8 |
9 |
10 |
11 |
data.rename(index=str.title,columns=str.upper)
|
ONE |
TWO |
THREE |
FOUR |
| Ohio |
0 |
1 |
2 |
3 |
| Colo |
4 |
5 |
6 |
7 |
| New |
8 |
9 |
10 |
11 |
data
|
One |
Two |
three |
Four |
| OHIO |
0 |
1 |
2 |
3 |
| COLO |
4 |
5 |
6 |
7 |
| NEW |
8 |
9 |
10 |
11 |
To modify dataset in-place,pass inplace=True.
data.rename(index={'OHIO':'INDIANA'},inplace=True)
data
|
One |
Two |
three |
Four |
| INDIANA |
0 |
1 |
2 |
3 |
| COLO |
4 |
5 |
6 |
7 |
| NEW |
8 |
9 |
10 |
11 |
Discretization and Binning
help(pd.cut)
Help on function cut in module pandas.tools.tile:
cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)
Return indices of half-open bins to which each value of `x` belongs.
Parameters
----------
x : array-like
Input array to be binned. It has to be 1-dimensional.
bins : int or sequence of scalars
If `bins` is an int, it defines the number of equal-width bins in the
range of `x`. However, in this case, the range of `x` is extended
by .1% on each side to include the min or max values of `x`. If
`bins` is a sequence it defines the bin edges allowing for
non-uniform bin width. No extension of the range of `x` is done in
this case.
right : bool, optional
Indicates whether the bins include the rightmost edge or not. If
right == True (the default), then the bins [1,2,3,4] indicate
(1,2], (2,3], (3,4].
labels : array or boolean, default None
Used as labels for the resulting bins. Must be of the same length as
the resulting bins. If False, return only integer indicators of the
bins.
retbins : bool, optional
Whether to return the bins or not. Can be useful if bins is given
as a scalar.
precision : int
The precision at which to store and display the bins labels
include_lowest : bool
Whether the first interval should be left-inclusive or not.
Returns
-------
out : Categorical or Series or array of integers if labels is False
The return type (Categorical or Series) depends on the input: a Series
of type category if input is a Series else Categorical. Bins are
represented as categories when categorical data is returned.
bins : ndarray of floats
Returned only if `retbins` is True.
Notes
-----
The `cut` function can be useful for going from a continuous variable to
a categorical variable. For example, `cut` could convert ages to groups
of age ranges.
Any NA values will be NA in the result. Out of bounds values will be NA in
the resulting Categorical object
Examples
--------
>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3, retbins=True)
([(0.191, 3.367], (0.191, 3.367], (0.191, 3.367], (3.367, 6.533],
(6.533, 9.7], (0.191, 3.367]]
Categories (3, object): [(0.191, 3.367] < (3.367, 6.533] < (6.533, 9.7]],
array([ 0.1905 , 3.36666667, 6.53333333, 9.7 ]))
>>> pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3,
labels=["good","medium","bad"])
[good, good, good, medium, bad, good]
Categories (3, object): [good < medium < bad]
>>> pd.cut(np.ones(5), 4, labels=False)
array([1, 1, 1, 1, 1], dtype=int64)
ages=[20,22,25,27,21,23,37,31,61,45,41,32]
bins=[18,25,35,60,100]
cats=pd.cut(ages,bins)
cats
[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, object): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
len(ages)
12
type(cats)
pandas.core.categorical.Categorical
cats.codes
array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)
cats.categories
Index(['(18, 25]', '(25, 35]', '(35, 60]', '(60, 100]'], dtype='object')
type(pd.value_counts(cats))
pandas.core.series.Series
help(pd.value_counts)
Help on function value_counts in module pandas.core.algorithms:
value_counts(values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)
Compute a histogram of the counts of non-null values.
Parameters
----------
values : ndarray (1-d)
sort : boolean, default True
Sort by values
ascending : boolean, default False
Sort in ascending order
normalize: boolean, default False
If True then compute a relative histogram
bins : integer, optional
Rather than count values, group them into half-open bins,
convenience for pd.cut, only works with numeric data
dropna : boolean, default True
Don't include counts of NaN
Returns
-------
value_counts : Series
pd.value_counts([1,1,2,3,4,45,5])
1 2
5 1
45 1
4 1
3 1
2 1
dtype: int64
pd.value_counts(cats)
(18, 25] 5
(35, 60] 3
(25, 35] 3
(60, 100] 1
dtype: int64
You can also pass your bin names by passing a list or array to the labels option.
group_names=['Youth','YoungAdult','MiddleAged','Senior']
pd.cut(ages,bins,labels=group_names) # bin is a reserved key.
[Youth, Youth, Youth, YoungAdult, Youth, ..., YoungAdult, Senior, MiddleAged, MiddleAged, YoungAdult]
Length: 12
Categories (4, object): [Youth < YoungAdult < MiddleAged < Senior]
help(bin)
Help on built-in function bin in module builtins:
bin(number, /)
Return the binary representation of an integer.
>>> bin(2796202)
'0b1010101010101010101010'
bin(2)
'0b10'
bins can also be an integer, and in that case, the category will be equal-space.
data=np.random.rand(20)
pd.cut(data,4,precision=2)# precision limits the decimal precision to two digits.
[(0.25, 0.5], (0.25, 0.5], (0.25, 0.5], (0.75, 1], (0.5, 0.75], ..., (0.0024, 0.25], (0.25, 0.5], (0.25, 0.5], (0.25, 0.5], (0.0024, 0.25]]
Length: 20
Categories (4, object): [(0.0024, 0.25] < (0.25, 0.5] < (0.5, 0.75] < (0.75, 1]]
- A closely related function,
qcut,bins the data based on sample quantiles.Using cut will not usually result in each bin having the same number of data points.
data=np.random.randn(1000)
cats=pd.qcut(data,4);cats
[(0.0211, 0.689], (0.689, 3.225], (-0.62, 0.0211], (0.689, 3.225], (0.689, 3.225], ..., (0.689, 3.225], [-3.401, -0.62], (-0.62, 0.0211], (-0.62, 0.0211], (-0.62, 0.0211]]
Length: 1000
Categories (4, object): [[-3.401, -0.62] < (-0.62, 0.0211] < (0.0211, 0.689] < (0.689, 3.225]]
pd.value_counts(cats)
(0.689, 3.225] 250
(0.0211, 0.689] 250
(-0.62, 0.0211] 250
[-3.401, -0.62] 250
dtype: int64
cats1=pd.qcut(data,[0,0.1,0.5,0.9,1])
pd.value_counts(cats1)
(0.0211, 1.33] 400
(-1.201, 0.0211] 400
(1.33, 3.225] 100
[-3.401, -1.201] 100
dtype: int64
Detecting and filtering Outliers
data=pd.DataFrame(np.random.randn(1000,4))
data.describe()
|
0 |
1 |
2 |
3 |
| count |
1000.000000 |
1000.000000 |
1000.000000 |
1000.000000 |
| mean |
0.002634 |
-0.038263 |
0.001432 |
-0.040628 |
| std |
0.981600 |
0.996856 |
1.021248 |
1.030675 |
| min |
-3.400618 |
-3.427137 |
-4.309211 |
-4.375632 |
| 25% |
-0.656369 |
-0.713371 |
-0.681777 |
-0.754702 |
| 50% |
-0.005199 |
-0.026878 |
-0.019116 |
0.005450 |
| 75% |
0.649159 |
0.613807 |
0.690614 |
0.625859 |
| max |
3.408137 |
3.171119 |
3.784272 |
2.992607 |
col=data[2]
col[np.abs(col)>3]
322 3.059163
431 -3.089013
648 -4.309211
653 3.784272
834 3.007481
Name: 2, dtype: float64
help(pd.DataFrame.any)
Help on function any in module pandas.core.frame:
any(self, axis=None, bool_only=None, skipna=None, level=None, **kwargs)
Return whether any element is True over requested axis
Parameters
----------
axis : {index (0), columns (1)}
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result
will be NA
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a Series
bool_only : boolean, default None
Include only boolean columns. If None, will attempt to use everything,
then use only boolean data. Not implemented for Series.
Returns
-------
any : Series or DataFrame (if level specified)
(abs(data)>3) ==(np.abs(data)>3)
|
0 |
1 |
2 |
3 |
| 0 |
True |
True |
True |
True |
| 1 |
True |
True |
True |
True |
| 2 |
True |
True |
True |
True |
| 3 |
True |
True |
True |
True |
| 4 |
True |
True |
True |
True |
| 5 |
True |
True |
True |
True |
| 6 |
True |
True |
True |
True |
| 7 |
True |
True |
True |
True |
| 8 |
True |
True |
True |
True |
| 9 |
True |
True |
True |
True |
| 10 |
True |
True |
True |
True |
| 11 |
True |
True |
True |
True |
| 12 |
True |
True |
True |
True |
| 13 |
True |
True |
True |
True |
| 14 |
True |
True |
True |
True |
| 15 |
True |
True |
True |
True |
| 16 |
True |
True |
True |
True |
| 17 |
True |
True |
True |
True |
| 18 |
True |
True |
True |
True |
| 19 |
True |
True |
True |
True |
| 20 |
True |
True |
True |
True |
| 21 |
True |
True |
True |
True |
| 22 |
True |
True |
True |
True |
| 23 |
True |
True |
True |
True |
| 24 |
True |
True |
True |
True |
| 25 |
True |
True |
True |
True |
| 26 |
True |
True |
True |
True |
| 27 |
True |
True |
True |
True |
| 28 |
True |
True |
True |
True |
| 29 |
True |
True |
True |
True |
| ... |
... |
... |
... |
... |
| 970 |
True |
True |
True |
True |
| 971 |
True |
True |
True |
True |
| 972 |
True |
True |
True |
True |
| 973 |
True |
True |
True |
True |
| 974 |
True |
True |
True |
True |
| 975 |
True |
True |
True |
True |
| 976 |
True |
True |
True |
True |
| 977 |
True |
True |
True |
True |
| 978 |
True |
True |
True |
True |
| 979 |
True |
True |
True |
True |
| 980 |
True |
True |
True |
True |
| 981 |
True |
True |
True |
True |
| 982 |
True |
True |
True |
True |
| 983 |
True |
True |
True |
True |
| 984 |
True |
True |
True |
True |
| 985 |
True |
True |
True |
True |
| 986 |
True |
True |
True |
True |
| 987 |
True |
True |
True |
True |
| 988 |
True |
True |
True |
True |
| 989 |
True |
True |
True |
True |
| 990 |
True |
True |
True |
True |
| 991 |
True |
True |
True |
True |
| 992 |
True |
True |
True |
True |
| 993 |
True |
True |
True |
True |
| 994 |
True |
True |
True |
True |
| 995 |
True |
True |
True |
True |
| 996 |
True |
True |
True |
True |
| 997 |
True |
True |
True |
True |
| 998 |
True |
True |
True |
True |
| 999 |
True |
True |
True |
True |
1000 rows × 4 columns
data[(np.abs(data)>3).any(1)]
|
0 |
1 |
2 |
3 |
| 59 |
-3.400618 |
0.342563 |
0.649758 |
-2.629268 |
| 274 |
1.264869 |
-3.427137 |
0.991494 |
-0.906788 |
| 322 |
2.714233 |
-1.239436 |
3.059163 |
0.318054 |
| 431 |
-0.376058 |
-0.713530 |
-3.089013 |
-0.791221 |
| 460 |
0.411801 |
-0.323974 |
0.301139 |
-3.051362 |
| 465 |
0.054043 |
-1.046532 |
2.054820 |
-4.375632 |
| 587 |
0.857067 |
-3.162763 |
0.137409 |
-1.327873 |
| 648 |
-0.323629 |
0.325867 |
-4.309211 |
-0.477572 |
| 653 |
0.171840 |
0.148702 |
3.784272 |
0.269508 |
| 678 |
0.303109 |
3.171119 |
0.854269 |
0.489537 |
| 834 |
1.651314 |
1.303992 |
3.007481 |
0.494971 |
| 841 |
3.408137 |
0.869413 |
-0.111245 |
1.306775 |
| 960 |
-0.302520 |
-3.118445 |
2.116509 |
0.003669 |
np.sign([0,0.3,-0.3,20,-90])
array([ 0., 1., -1., 1., -1.])
data[np.abs(data)>3]=np.sign(data)*3
np.sign(data)*3
|
0 |
1 |
2 |
3 |
| 0 |
-3.0 |
-3.0 |
-3.0 |
3.0 |
| 1 |
-3.0 |
3.0 |
3.0 |
-3.0 |
| 2 |
3.0 |
3.0 |
3.0 |
3.0 |
| 3 |
-3.0 |
-3.0 |
-3.0 |
-3.0 |
| 4 |
3.0 |
-3.0 |
3.0 |
-3.0 |
| 5 |
-3.0 |
-3.0 |
-3.0 |
3.0 |
| 6 |
-3.0 |
-3.0 |
3.0 |
3.0 |
| 7 |
3.0 |
3.0 |
3.0 |
3.0 |
| 8 |
-3.0 |
3.0 |
-3.0 |
3.0 |
| 9 |
3.0 |
-3.0 |
3.0 |
3.0 |
| 10 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 11 |
-3.0 |
3.0 |
3.0 |
3.0 |
| 12 |
3.0 |
3.0 |
3.0 |
3.0 |
| 13 |
3.0 |
-3.0 |
3.0 |
3.0 |
| 14 |
3.0 |
3.0 |
3.0 |
3.0 |
| 15 |
3.0 |
3.0 |
3.0 |
-3.0 |
| 16 |
3.0 |
-3.0 |
3.0 |
3.0 |
| 17 |
3.0 |
-3.0 |
-3.0 |
3.0 |
| 18 |
-3.0 |
3.0 |
3.0 |
3.0 |
| 19 |
3.0 |
3.0 |
3.0 |
-3.0 |
| 20 |
-3.0 |
3.0 |
3.0 |
3.0 |
| 21 |
3.0 |
3.0 |
-3.0 |
3.0 |
| 22 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 23 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| 24 |
3.0 |
-3.0 |
3.0 |
3.0 |
| 25 |
-3.0 |
-3.0 |
-3.0 |
3.0 |
| 26 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| 27 |
3.0 |
-3.0 |
-3.0 |
-3.0 |
| 28 |
3.0 |
-3.0 |
-3.0 |
3.0 |
| 29 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| ... |
... |
... |
... |
... |
| 970 |
-3.0 |
-3.0 |
3.0 |
-3.0 |
| 971 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 972 |
-3.0 |
3.0 |
-3.0 |
3.0 |
| 973 |
3.0 |
3.0 |
3.0 |
3.0 |
| 974 |
3.0 |
-3.0 |
-3.0 |
3.0 |
| 975 |
-3.0 |
3.0 |
-3.0 |
3.0 |
| 976 |
-3.0 |
3.0 |
3.0 |
3.0 |
| 977 |
-3.0 |
-3.0 |
3.0 |
-3.0 |
| 978 |
3.0 |
-3.0 |
-3.0 |
-3.0 |
| 979 |
-3.0 |
3.0 |
-3.0 |
3.0 |
| 980 |
-3.0 |
-3.0 |
-3.0 |
3.0 |
| 981 |
3.0 |
3.0 |
3.0 |
-3.0 |
| 982 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 983 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 984 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| 985 |
3.0 |
3.0 |
-3.0 |
3.0 |
| 986 |
-3.0 |
-3.0 |
-3.0 |
3.0 |
| 987 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 988 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| 989 |
3.0 |
-3.0 |
-3.0 |
3.0 |
| 990 |
3.0 |
-3.0 |
3.0 |
-3.0 |
| 991 |
3.0 |
-3.0 |
3.0 |
3.0 |
| 992 |
-3.0 |
3.0 |
-3.0 |
-3.0 |
| 993 |
-3.0 |
3.0 |
-3.0 |
3.0 |
| 994 |
3.0 |
-3.0 |
-3.0 |
-3.0 |
| 995 |
3.0 |
-3.0 |
-3.0 |
-3.0 |
| 996 |
3.0 |
-3.0 |
3.0 |
-3.0 |
| 997 |
-3.0 |
-3.0 |
-3.0 |
-3.0 |
| 998 |
3.0 |
3.0 |
-3.0 |
-3.0 |
| 999 |
3.0 |
3.0 |
3.0 |
-3.0 |
1000 rows × 4 columns
data
|
0 |
1 |
2 |
3 |
| 0 |
-0.564062 |
-0.887969 |
-0.854782 |
0.107613 |
| 1 |
-1.364165 |
1.337851 |
1.671698 |
-0.814129 |
| 2 |
0.765877 |
1.916774 |
0.441002 |
2.128419 |
| 3 |
-0.581957 |
-1.024641 |
-1.983024 |
-2.757392 |
| 4 |
0.778034 |
-1.375845 |
0.044277 |
-1.037062 |
| 5 |
-0.796683 |
-0.540663 |
-0.120198 |
0.003503 |
| 6 |
-0.708554 |
-0.105414 |
1.037527 |
0.826310 |
| 7 |
1.233856 |
1.217529 |
1.097430 |
0.842746 |
| 8 |
-0.201433 |
0.249823 |
-1.620147 |
0.436595 |
| 9 |
1.328493 |
-0.396323 |
1.927629 |
1.615656 |
| 10 |
-0.560207 |
0.252996 |
-0.151543 |
-0.667813 |
| 11 |
-1.729057 |
1.144087 |
1.087689 |
0.520086 |
| 12 |
0.704758 |
1.707940 |
0.720834 |
0.447245 |
| 13 |
1.024834 |
-0.217376 |
1.340304 |
0.176801 |
| 14 |
0.075745 |
1.430761 |
0.193627 |
0.191701 |
| 15 |
0.536566 |
0.047559 |
1.715175 |
-1.115074 |
| 16 |
2.803965 |
-0.465377 |
1.127140 |
1.417856 |
| 17 |
0.677525 |
-1.091631 |
-0.572231 |
0.241533 |
| 18 |
-1.172228 |
1.049830 |
0.266288 |
0.836902 |
| 19 |
0.930699 |
0.379891 |
1.637741 |
-1.770379 |
| 20 |
-0.749769 |
0.711326 |
1.591292 |
1.099071 |
| 21 |
1.550585 |
1.276488 |
-0.214484 |
0.195340 |
| 22 |
-0.289236 |
1.882439 |
-0.275263 |
-0.247316 |
| 23 |
0.688167 |
0.357913 |
-1.675828 |
-0.305840 |
| 24 |
1.255532 |
-1.802804 |
0.889900 |
0.864982 |
| 25 |
-1.391447 |
-0.291022 |
-0.190022 |
0.540653 |
| 26 |
0.435101 |
2.444416 |
-1.235937 |
-0.428450 |
| 27 |
0.165456 |
-1.091942 |
-1.560662 |
-0.739435 |
| 28 |
1.469728 |
-0.123806 |
-2.071746 |
2.574603 |
| 29 |
1.287949 |
1.278130 |
-0.825906 |
-1.852465 |
| ... |
... |
... |
... |
... |
| 970 |
-0.379102 |
-0.778606 |
2.213794 |
-0.062573 |
| 971 |
-1.108557 |
0.723650 |
-2.436704 |
-0.068733 |
| 972 |
-0.518995 |
0.455508 |
-0.217321 |
1.363977 |
| 973 |
0.444636 |
1.625221 |
0.222103 |
1.236397 |
| 974 |
0.699354 |
-2.076747 |
-0.454499 |
0.383902 |
| 975 |
-1.759718 |
0.717117 |
-0.077413 |
1.698893 |
| 976 |
-1.230778 |
0.222673 |
0.151731 |
0.174875 |
| 977 |
-0.575290 |
-0.316810 |
0.380077 |
-0.048428 |
| 978 |
1.906133 |
-0.861802 |
-0.026937 |
-2.865641 |
| 979 |
-0.134489 |
0.607949 |
-0.821089 |
0.831827 |
| 980 |
-0.058894 |
-0.707492 |
-0.273980 |
0.129724 |
| 981 |
2.288519 |
0.149683 |
0.580679 |
-0.055218 |
| 982 |
-0.280748 |
0.861358 |
-0.254339 |
-0.596723 |
| 983 |
-1.322965 |
0.323534 |
-0.585862 |
-1.316894 |
| 984 |
0.793711 |
0.165646 |
-0.212855 |
-1.752453 |
| 985 |
0.310908 |
0.758156 |
-0.040923 |
0.538293 |
| 986 |
-0.589173 |
-1.688947 |
-0.501485 |
0.019880 |
| 987 |
-0.111807 |
1.007026 |
-0.853133 |
-0.249211 |
| 988 |
0.601993 |
0.690953 |
-1.168277 |
-0.516737 |
| 989 |
1.319895 |
-0.046141 |
-0.680194 |
1.443361 |
| 990 |
1.839785 |
-0.480675 |
0.056481 |
-0.097993 |
| 991 |
2.590916 |
-0.367057 |
1.110105 |
0.130826 |
| 992 |
-0.108846 |
1.717209 |
-0.580895 |
-0.985869 |
| 993 |
-1.152810 |
0.390732 |
-0.104866 |
1.553947 |
| 994 |
1.721177 |
-0.088994 |
-0.565308 |
-1.602808 |
| 995 |
0.922409 |
-0.027923 |
-1.258001 |
-1.933848 |
| 996 |
0.647699 |
-0.089378 |
1.455509 |
-0.598519 |
| 997 |
-1.590236 |
-0.544202 |
-0.764923 |
-0.329425 |
| 998 |
0.969542 |
0.106538 |
-0.188919 |
-1.474017 |
| 999 |
0.235337 |
0.232514 |
0.113181 |
-1.403455 |
1000 rows × 4 columns
np.sign(data).head(10) # return the first 10 rows.
|
0 |
1 |
2 |
3 |
| 0 |
-1.0 |
-1.0 |
-1.0 |
1.0 |
| 1 |
-1.0 |
1.0 |
1.0 |
-1.0 |
| 2 |
1.0 |
1.0 |
1.0 |
1.0 |
| 3 |
-1.0 |
-1.0 |
-1.0 |
-1.0 |
| 4 |
1.0 |
-1.0 |
1.0 |
-1.0 |
| 5 |
-1.0 |
-1.0 |
-1.0 |
1.0 |
| 6 |
-1.0 |
-1.0 |
1.0 |
1.0 |
| 7 |
1.0 |
1.0 |
1.0 |
1.0 |
| 8 |
-1.0 |
1.0 |
-1.0 |
1.0 |
| 9 |
1.0 |
-1.0 |
1.0 |
1.0 |
Permutation and random sample
df=pd.DataFrame(np.arange(20).reshape((5,4)))
sampler=np.random.permutation(5);sampler
array([4, 3, 1, 2, 0])
df.take(sampler)
|
0 |
1 |
2 |
3 |
| 4 |
16 |
17 |
18 |
19 |
| 3 |
12 |
13 |
14 |
15 |
| 1 |
4 |
5 |
6 |
7 |
| 2 |
8 |
9 |
10 |
11 |
| 0 |
0 |
1 |
2 |
3 |
df.sample(n=4)
|
0 |
1 |
2 |
3 |
| 2 |
8 |
9 |
10 |
11 |
| 0 |
0 |
1 |
2 |
3 |
| 1 |
4 |
5 |
6 |
7 |
| 4 |
16 |
17 |
18 |
19 |
df.sample(n=10,replace=True) # replace allows repeat choices.
|
0 |
1 |
2 |
3 |
| 1 |
4 |
5 |
6 |
7 |
| 2 |
8 |
9 |
10 |
11 |
| 1 |
4 |
5 |
6 |
7 |
| 2 |
8 |
9 |
10 |
11 |
| 0 |
0 |
1 |
2 |
3 |
| 3 |
12 |
13 |
14 |
15 |
| 3 |
12 |
13 |
14 |
15 |
| 2 |
8 |
9 |
10 |
11 |
| 0 |
0 |
1 |
2 |
3 |
| 4 |
16 |
17 |
18 |
19 |
choices=pd.Series([5,7,-1,6,4])
choices.sample(n=10,replace=True)
2 -1
4 4
1 7
0 5
0 5
3 6
4 4
2 -1
1 7
1 7
dtype: int64
Computing indicator/Dummy variables
df=pd.DataFrame({'Key':['b','b','a','c','a','b'],'data1':range(6)});df
|
Key |
data1 |
| 0 |
b |
0 |
| 1 |
b |
1 |
| 2 |
a |
2 |
| 3 |
c |
3 |
| 4 |
a |
4 |
| 5 |
b |
5 |
pd.get_dummies(df['Key'])
|
a |
b |
c |
| 0 |
0 |
1 |
0 |
| 1 |
0 |
1 |
0 |
| 2 |
1 |
0 |
0 |
| 3 |
0 |
0 |
1 |
| 4 |
1 |
0 |
0 |
| 5 |
0 |
1 |
0 |
pd.get_dummies(df['Key'],prefix='key')
|
key_a |
key_b |
key_c |
| 0 |
0 |
1 |
0 |
| 1 |
0 |
1 |
0 |
| 2 |
1 |
0 |
0 |
| 3 |
0 |
0 |
1 |
| 4 |
1 |
0 |
0 |
| 5 |
0 |
1 |
0 |
df[['data1']]
|
data1 |
| 0 |
0 |
| 1 |
1 |
| 2 |
2 |
| 3 |
3 |
| 4 |
4 |
| 5 |
5 |
df['data1']
0 0
1 1
2 2
3 3
4 4
5 5
Name: data1, dtype: int32
- so the difference between df[['data1']] and df['data1'] is apparent, the former one returns DataFrame,the latter one returns Series.
- [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn
In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...
- [Machine Learning with Python] Data Preparation through Transformation Pipeline
In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...
- 机器学习- Sklearn (交叉验证和Pipeline)
前面一节咱们已经介绍了决策树的原理已经在sklearn中的应用.那么这里还有两个数据处理和sklearn应用中的小知识点咱们还没有讲,但是在实践中却会经常要用到的,那就是交叉验证cross_valid ...
- Pandas的Categorical Data
http://liao.cpython.org/pandas15/ Docs » Pandas的Categorical Data类型 15. Pandas的Categorical Data panda ...
- 【Repost】A Practical Intro to Data Science
Are you a interested in taking a course with us? Learn about our programs or contact us at hello@zip ...
- A Complete Tutorial to Learn Data Science with Python from Scratch
A Complete Tutorial to Learn Data Science with Python from Scratch Introduction It happened few year ...
- pandas 之 字符串处理
import numpy as np import pandas as pd Python has long been a popular raw data manipulation language ...
- pandas 之 数据清洗-缺失值
Abstract During the course fo doing data analysis and modeling, a significant amount of time is spen ...
- 数据分析06-五个pandas可视化项目
数据分析-06 数据分析-06 pandas可视化 基本绘图 Series数据可视化 DataFrame数据可视化 高级绘图 代码总结 pandas可视化 基本绘图 pandas高级绘图 pandas ...
- Why Apache Spark is a Crossover Hit for Data Scientists [FWD]
Spark is a compelling multi-purpose platform for use cases that span investigative, as well as opera ...
随机推荐
- 音乐在线刮削容器部署(Music Tag Web)
『音乐标签』Web版是一款可以编辑歌曲的标题,专辑,艺术家,歌词,封面等信息的音乐标签编辑器程序, 支持FLAC, APE, WAV, AIFF, WV, TTA, MP3, M4A, OGG, MP ...
- Hbase - hbase hbck介绍
原文地址:https://bbs.huaweicloud.com/blogs/353332 HBaseFsck(hbck)是一种命令行工具,可检查hbase集群的region一致性和表完整性的问题,同 ...
- Visual Studio 卸载
1.找个安装镜像文件 2.必须以管理员身份运行cmd 3.在CMD里输入"G:\vs_professional.exe /uninstall /force" 4.企业版就把prof ...
- try catch异常捕获工具类
异常捕获工具类 package com.example.multiThreadTransaction_demo.utils; import lombok.extern.slf4j.Slf4j; imp ...
- windows mysql8安装zip
MySQL 是一种广泛使用的关系数据库管理系统,MySQL 8 是其最新的主要版本,结合了出色的性能和丰富的功能. 一.准备工作 1. 下载MySQL 8 zip包 首先,你需要获取MySQL 8的压 ...
- Qt设置QTextEdit的行高
Qt设置QTextEdit的行高 解决方法: QTextDocument* doc = ui->edtCountryIntroduce->document(); for(QTextBloc ...
- MySQL错误码大全
B.1. 服务器错误代码和消息服务器错误信息来自下述源文件:· 错误消息信息列在share/errmsg.txt文件中."%d"和"%s"分别代表编号和字符串, ...
- 【QT】解决生成的exe文件出现“无法定位程序入口”或“找不到xxx.dll”的问题
[QT]解决生成的exe文件出现"无法定位程序入口"或"找不到xxx.dll"的问题 零.问题 使用QT编译好项目后,想直接在文件资源管理器中运行exe程序或想 ...
- 【视频编辑】Pr视频编辑软件导出的视频声音有一段会变大怎么解决
导出视频后为什么有段声音会突然变大? 也就是可能存在编辑器导出的时候有自动增益声音的行为. 具体描述: 工程文件里我没动过声音,工程文件里听也是很正常的,但是导出后有一小段音乐会突然变大(存在自动增益 ...
- HTML5 定时通知
通过setInterval()和Notification来实现定时通知功能. demo <script> window.onload = function () { //每10秒弹出一个桌 ...