吴裕雄--天生自然 python数据分析:葡萄酒分析
# import pandas
import pandas as pd # creating a DataFrame
pd.DataFrame({'Yes': [50, 31], 'No': [101, 2]})
# another example of creating a dataframe
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland']})
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
'Sue': ['Pretty good.', 'Bland.']},
index = ['Product A', 'Product B'])
# creating a pandas series
pd.Series([1, 2, 3, 4, 5])
# we can think of a Series as a column of a DataFrame.
# we can assign index values to Series in same way as pandas DataFrame
pd.Series([10, 20, 30], index=['2015 sales', '2016 sales', '2017 sales'], name='Product A')
# reading a csv file and storing it in a variable
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv")
# we can use the 'shape' attribute to check size of dataset
wine_reviews.shape
# To show first five rows of data, use 'head()' method
wine_reviews.head()
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
wine_reviews.head().to_csv("F:\\wine_reviews.csv")
import pandas as pd
reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)
reviews
# access 'country' property (or column) of 'reviews'
reviews.country
# Another way to do above operation
# when a column name contains space, we have to use this method
reviews['country']
# To access first row of country column
reviews['country'][0]
# returns first row
reviews.iloc[0]
# returns first column (country) (all rows due to ':')
reviews.iloc[:, 0]
# retruns first 3 rows of first column
reviews.iloc[:3, 0]
# we can pass a list of indices of rows/columns to select
reviews.iloc[[0, 1, 2, 3], 0]
# We can also pass negative numbers as we do in Python
reviews.iloc[-5:]
# To select first entry in country column
reviews.loc[0, 'country']
# select columns by name using 'loc'
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
# 'set_index' to the 'title' field
reviews.set_index('title')
# 1. Find out whether wine is produced in Italy
reviews.country == 'Italy'
# 2. Now select all wines produced in Italy
reviews.loc[reviews.country == 'Italy'] #reviews[reviews.country == 'Italy']
# Add one more condition for points to find better than average wines produced in Italy
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)] # use | for 'OR' condition
reviews.loc[reviews.country.isin(['Italy', 'France'])]
reviews.loc[reviews.price.notnull()]
reviews['critic'] = 'everyone'
reviews.critic
# using iterable for assigning
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
吴裕雄--天生自然 python数据分析:葡萄酒分析的更多相关文章
- 吴裕雄--天生自然 PYTHON数据分析:所有美国股票和etf的历史日价格和成交量分析
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 python数据分析:健康指标聚集分析(健康分析)
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 PYTHON数据分析:基于Keras的CNN分析太空深处寻找系外行星数据
#We import libraries for linear algebra, graphs, and evaluation of results import numpy as np import ...
- 吴裕雄--天生自然 PYTHON数据分析:钦奈水资源管理分析
df = pd.read_csv("F:\\kaggleDataSet\\chennai-water\\chennai_reservoir_levels.csv") df[&quo ...
- 吴裕雄--天生自然 PYTHON数据分析:糖尿病视网膜病变数据分析(完整版)
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 PYTHON数据分析:人类发展报告——HDI, GDI,健康,全球人口数据数据分析
import pandas as pd # Data analysis import numpy as np #Data analysis import seaborn as sns # Data v ...
- 吴裕雄--天生自然 python数据分析:医疗费数据分析
import numpy as np import pandas as pd import os import matplotlib.pyplot as pl import seaborn as sn ...
- 吴裕雄--天生自然 python数据分析:基于Keras使用CNN神经网络处理手写数据集
import pandas as pd import numpy as np import matplotlib.pyplot as plt import matplotlib.image as mp ...
- 吴裕雄--天生自然 PYTHON数据分析:医疗数据分析
import numpy as np # linear algebra import pandas as pd # data processing, CSV file I/O (e.g. pd.rea ...
随机推荐
- Java 工厂模式登陆系统实现
没有工厂模式 设定一个登陆系统 UserServiceImp.java public class UserServiceImp { public boolean login(String userna ...
- 在gff中切fa的内容
#!/usr/bin/python import re def readfa(l): col={} arr =[] sca ='' li = open(l) for line in li: if re ...
- Ubuntu更改源地址列表
1. 备份源列表 sudo cp /etc/apt/sources.list /etc/apt/sources.list.backup 2.打开源列表 sudo gedit /etc/apt/sour ...
- Hibernate相关概念及序列化和持久化的区别
hibernate是一种ORM(object relation mapping,对象关系映射)框架,所谓的对象关系映射,通俗的说,就是把JAVA对象保存到关系型数据库中. hibernate要做的事, ...
- Linux svn 服务器配置--转
转自 http://my.oschina.net/lionel45/blog/298305 留存备份. Linux搭建SVN 服务器 作者: 沈小然 日期:2014年 8月 5日 1 ...
- Python运维中常用的_脚本
前言 file是一个类,使用file('file_name', 'r+')这种方式打开文件,返回一个file对象,以写模式打开文件不存在则会被创建.但是更推荐使用内置函数open()来打开一个文件. ...
- Nginx配置使用
1.黑色标注的得自己写入到nginx.conf文件中 upstream serverlb { server 127.0.0.1:9999; server 127.0.0.1:8888; } serve ...
- day06-迭代器
一.迭代器: 1.可迭代协议:含有__iter__方法. 2.迭代器协议:同时含有__iter__和__next__方法.迭代器是可迭代对象. iterator迭代器. 3.使用可迭代对象有什么好处? ...
- Relative-Frequency|frequency|pie chart |bar chart
2.2Organizing Qualitative Data The number of times a particular distinct value occurs is called its ...
- easyui 表单提交前的 confirm 处理
最近学习用 easyui,异步提交表单是遇到一个小问题 $('#fModiDetail').form('submit',{ url:'...', onSubmit:function(){ if($(t ...