Kaggle:Titanic: Machine Learning from Disaster

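I had been meaning to scrape stock price data for a while. While reading a blog post on scraping stock data I came across Kaggle, browsed its competitions, found them refreshingly different, and decided to try one.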

The task is as shown in the figure: given train.csv, predict whether each passenger in test.csv survived. train.csv contains the following passenger information:

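PassengerId	Passenger ID
Survived	Whether the passenger survived
Pclass	Ticket class
Name	Passenger name
Sex	Passenger sex
Age	Passenger age
SibSp	Number of siblings/spouses aboard
Parch	Number of parents/children aboard
Ticket	Ticket information
Fare	Ticket fare
Cabin	Cabin
Embarked	Port of embarkation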

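Of these, Name, PassengerId, Ticket and Cabin are treated as irrelevant to the prediction and dropped. SibSp and Parch are dropped as well, but only after being summed into a single family-size feature.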
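That column selection can be sketched on a small made-up table. The rows below are hypothetical (not taken from train.csv), and the helper name select_features is an assumption used only for illustration.

```python
import pandas as pd

# Columns dropped before training; SibSp and Parch go only after being summed.
DROPPED = ['Cabin', 'Ticket', 'Name', 'SibSp', 'Parch', 'PassengerId']


def select_features(df):
    """Return a copy with family_member added and the unused columns removed."""
    out = df.copy()
    out['family_member'] = out['SibSp'] + out['Parch']
    return out.drop(DROPPED, axis=1)


# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    'PassengerId': [1, 2],
    'Survived': [0, 1],
    'Pclass': [3, 1],
    'Name': ['A', 'B'],
    'Sex': ['male', 'female'],
    'Age': [22.0, 38.0],
    'SibSp': [1, 1],
    'Parch': [0, 2],
    'Ticket': ['T1', 'T2'],
    'Fare': [7.25, 71.28],
    'Cabin': [None, 'C85'],
    'Embarked': ['S', 'C'],
})

features = select_features(toy)
print(list(features.columns))
```

Survived ends up as the first remaining column, which is why the script can later slice the label off with column index 0.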

The model used is a random forest.

# -*- coding: utf-8 -*-
import csv
import random as rnd
from subprocess import check_output

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation and sklearn.grid_search were merged into sklearn.model_selection
from sklearn.model_selection import cross_val_score, GridSearchCV, RandomizedSearchCV

train_df = pd.read_csv('train.csv', header=0)
test_df = pd.read_csv('test.csv', header=0)

# Stack train and test so both get the same preprocessing (test rows have no Survived value)
df = pd.concat([train_df, test_df])
# Rebuild a clean 0..n-1 index and discard the old one
df = df.reset_index(drop=True)
# Restore the column order of train.csv
df = df.reindex(columns=train_df.columns)

# Fill missing Age, Fare and Embarked values in the combined table
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Fare'] = df['Fare'].fillna(df['Fare'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# Map the categorical Sex and Embarked values to integer codes
df['Sex'] = pd.factorize(df['Sex'])[0]
df['Embarked'] = pd.factorize(df['Embarked'])[0]

# Family size aboard: siblings/spouses plus parents/children
df['family_member'] = df['SibSp'] + df['Parch']

# Drop the 'Cabin', 'Ticket', 'Name', 'SibSp', 'Parch' and 'PassengerId' columns
df = df.drop(['Cabin', 'Ticket', 'Name', 'SibSp', 'Parch', 'PassengerId'], axis=1)

# Rows with a Survived value are training data; rows without one are the test set
survived_member = df[df['Survived'].notnull()].values
test_message = df[df['Survived'].isnull()].values

# First column (Survived) of the labelled rows is the target
Y = survived_member[:, 0].astype(int)
# Every column except the first is a feature
X = survived_member[:, 1:].astype(int)

# Train the random forest
result = RandomForestClassifier(n_estimators=1000, random_state=312, min_samples_leaf=3).fit(X, Y)
# Cast the test features the same way as X before predicting
pre = result.predict(test_message[:, 1:].astype(int)).astype(int)

# Write the submission file
Id = test_df['PassengerId']
with open('result1.csv', 'w', newline='') as result_csv:
    result_fd = csv.writer(result_csv)
    result_fd.writerow(['PassengerId', 'Survived'])
    result_fd.writerows(zip(Id, pre))
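The script trains on every labelled row, so it gives no local estimate of accuracy before submitting. One way to get such an estimate is cross_val_score, which the script imports but never calls. The following is a minimal sketch under stated assumptions: the data is synthetic (X_demo and y_demo are made-up names and values, not Titanic data), and n_estimators is lowered to 100 only to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training matrix (NOT Titanic data):
# 200 rows, 6 numeric features, label driven mostly by the first feature.
rng = np.random.RandomState(312)
X_demo = rng.rand(200, 6)
y_demo = (X_demo[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=312, min_samples_leaf=3)

# 5-fold cross-validation: each fold is held out once and scored by accuracy.
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')
print(scores.mean())
```

With the real data, the X and Y arrays built in the script would take the place of X_demo and y_demo.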
