吴裕雄--天生自然 Python Data Analysis: Wine Review Analysis

# import pandas
import pandas as pd

# creating a DataFrame
pd.DataFrame({'Yes': [50, 31], 'No': [101, 2]})

# another example: a DataFrame whose values are strings
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland']})

# row labels can be set explicitly with the 'index' parameter
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

# creating a pandas series
pd.Series([1, 2, 3, 4, 5])

# we can think of a Series as a column of a DataFrame.
# we can assign index values to Series in same way as pandas DataFrame
pd.Series([10, 20, 30], index=['2015 sales', '2016 sales', '2017 sales'], name='Product A')
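The comment above describes a Series as a column of a DataFrame. A minimal sketch of that relationship, using a small made-up DataFrame (the `sales` table and its `Product B` values are assumptions for illustration, not part of the wine data):

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
sales = pd.DataFrame(
    {'Product A': [10, 20, 30], 'Product B': [5, 15, 25]},
    index=['2015 sales', '2016 sales', '2017 sales'],
)

# Selecting a single column yields a Series that keeps the row labels
# and takes the column name as its own name.
column = sales['Product A']
print(type(column).__name__)
print(column.name)
print(list(column.index))
```

The selected column carries the same index and name that were passed by hand to `pd.Series` in the example above.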

# reading a csv file and storing it in a variable
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv")
# we can use the 'shape' attribute to check size of dataset
wine_reviews.shape

# To show first five rows of data, use 'head()' method
wine_reviews.head()

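# the first column of the file holds a saved row index;
# 'index_col=0' uses it as the index instead of loading it as a data column
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)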
wine_reviews.head()
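To see what `index_col=0` changes without needing the dataset file, the following sketch reads a tiny in-memory CSV twice. The CSV text here is a hypothetical stand-in; it only assumes, as in the wine file, that the first column has an empty header and holds row numbers.

```python
import io
import pandas as pd

# Hypothetical two-row CSV whose first, unnamed column is a saved index.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the unnamed column is loaded as ordinary data.
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column becomes the row index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())    # contains an extra 'Unnamed: 0' column
print(indexed.columns.tolist())  # only the real data columns
```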

# write the first five rows to a new csv file
wine_reviews.head().to_csv("F:\\wine_reviews.csv")

import pandas as pd
reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
# show at most 5 rows when a DataFrame is displayed
pd.set_option("display.max_rows", 5)
reviews

# access 'country' property (or column) of 'reviews'
reviews.country

# Another way to access the same column
# when a column name contains a space, only this bracket form works
reviews['country']

# To access first row of country column
reviews['country'][0]

# returns first row
reviews.iloc[0]

# returns first column (country) (all rows due to ':')
reviews.iloc[:, 0]

# returns first 3 rows of first column
reviews.iloc[:3, 0]

# we can pass a list of indices of rows/columns to select
reviews.iloc[[0, 1, 2, 3], 0]

# Negative positions count from the end, as in Python lists: this returns the last 5 rows
reviews.iloc[-5:]

# To select first entry in country column
reviews.loc[0, 'country']

# select columns by name using 'loc'
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
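One difference between `iloc` and `loc` that the examples above do not show is how they treat the end of a slice. The sketch below uses a small invented DataFrame (`toy` and its values are assumptions) with the default integer index:

```python
import pandas as pd

# Hypothetical toy data with the default 0..3 integer index.
toy = pd.DataFrame(
    {'country': ['Italy', 'Portugal', 'US', 'France'],
     'points': [87, 87, 90, 92]}
)

# iloc slices by position and excludes the stop value: positions 0 and 1.
by_position = toy.iloc[0:2]

# loc slices by label and includes the stop value: labels 0, 1 and 2.
by_label = toy.loc[0:2]

print(len(by_position), len(by_label))
```

The same slice bounds therefore return a different number of rows depending on which indexer is used.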

# use the 'title' field as the index; 'set_index' returns a new DataFrame
# and leaves 'reviews' itself unchanged
reviews.set_index('title')
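A short sketch of how `set_index` behaves, again on invented data (the `toy` frame and wine titles are hypothetical): the call returns a new DataFrame, so the result has to be kept in a variable if it is needed later.

```python
import pandas as pd

# Hypothetical toy data, not taken from the wine dataset.
toy = pd.DataFrame({'title': ['Wine A', 'Wine B'], 'points': [87, 90]})

# set_index returns a new DataFrame indexed by 'title'.
by_title = toy.set_index('title')

print(toy.index.tolist())       # the original keeps its integer index
print(by_title.index.tolist())  # the new frame is indexed by title
print(by_title.loc['Wine B', 'points'])  # label lookup by title
```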

# 1. Find out whether wine is produced in Italy
reviews.country == 'Italy'

# 2. Now select all wines produced in Italy
# (reviews[reviews.country == 'Italy'] gives the same result)
reviews.loc[reviews.country == 'Italy']

# Add one more condition for points to find better than average wines produced in Italy
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)] # use | for 'OR' condition

# select wines produced in Italy or France
reviews.loc[reviews.country.isin(['Italy', 'France'])]

# select wines whose price is not missing
reviews.loc[reviews.price.notnull()]
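The filters above all work by building a boolean Series and passing it to `loc`. The sketch below combines them on a four-row invented DataFrame (`toy` and its values are assumptions) so the result of each mask can be checked by eye:

```python
import pandas as pd

# Hypothetical toy data; the second row has a missing price.
toy = pd.DataFrame({
    'country': ['Italy', 'Italy', 'France', 'US'],
    'points': [92, 85, 91, 95],
    'price': [30.0, None, 45.0, 60.0],
})

is_italy = toy.country == 'Italy'  # boolean Series, one value per row
high_score = toy.points >= 90

# '&' keeps rows where both masks are True, '|' where at least one is True.
italy_and_high = toy.loc[is_italy & high_score]
italy_or_high = toy.loc[is_italy | high_score]

# isin and notnull also return boolean Series, so they combine the same way.
priced = toy.loc[toy.country.isin(['Italy', 'France']) & toy.price.notnull()]

print(italy_and_high.index.tolist())
print(italy_or_high.index.tolist())
print(priced.index.tolist())
```

Each comparison needs its own parentheses when written inline, because `&` and `|` bind more tightly than `==` and `>=` in Python.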

# assign a constant value to every row of a new column
reviews['critic'] = 'everyone'
reviews.critic

# assign an iterable of values, one per row
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
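The two assignment styles above can be tried on a small invented DataFrame (`toy` is an assumption for illustration). A scalar is repeated for every row, while an iterable supplies one value per row:

```python
import pandas as pd

# Hypothetical toy data with three rows.
toy = pd.DataFrame({'points': [87, 90, 92]})

# A scalar is broadcast to all rows.
toy['critic'] = 'everyone'

# An iterable is assigned element by element; here it counts down from 3 to 1.
toy['index_backwards'] = range(len(toy), 0, -1)

print(toy)
```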
