[PyData] 01 - Web Crawler
Preface
Let's go to https://www.kaggle.com/
Kaggle Notebooks contain worked examples that document hands-on practice.
1. Linear fitting of noisy data
[Sklearn] Linear regression models to fit noisy data
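As a minimal sketch of the idea (not the code from the linked post), the snippet below fits scikit-learn's `LinearRegression` to synthetic noisy data. The data-generating line y = 3x + 2 plus Gaussian noise is an assumption chosen purely for illustration. The fitted slope and intercept should land close to the true values, and R² stays high because the noise is small relative to the signal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))          # sklearn expects a 2-D feature matrix
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=200)

# Ordinary least squares fit
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(X, y)                          # coefficient of determination on the training data
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

Increasing the `scale` of the noise lowers R² while the slope estimate stays roughly unbiased, which is a quick way to see how noise affects the fit.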
2. Building a Pipeline
[Feature] Final pipeline: custom transformers
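A custom transformer plugs into a scikit-learn `Pipeline` by subclassing `BaseEstimator` and `TransformerMixin` and implementing `fit` and `transform`. The sketch below is a hypothetical example, not the linked post's code: `LogFeatureAdder` and the synthetic data are assumptions for illustration. The transformer appends `log1p` of selected columns before scaling and a linear model.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression


class LogFeatureAdder(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: appends log1p of the chosen columns."""

    def __init__(self, columns=(0,)):
        # Store constructor args unchanged so get_params/clone work
        self.columns = columns

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn, but fit must return self
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        logs = np.log1p(X[:, list(self.columns)])
        return np.hstack([X, logs])             # original columns + log features


pipeline = Pipeline([
    ("log_features", LogFeatureAdder(columns=(0,))),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])

# Synthetic data (assumption): target depends on log of column 0 and linearly on column 1
rng = np.random.default_rng(42)
X = rng.uniform(1, 100, size=(300, 2))
y = 2.0 * np.log1p(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

pipeline.fit(X, y)
r2 = pipeline.score(X, y)
print(f"R^2 = {r2:.3f}")
```

Because `TransformerMixin` supplies `fit_transform`, the pipeline can chain the custom step like any built-in transformer, and the whole pipeline can be cross-validated or grid-searched as a single estimator.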
Resource list
Contents
- Algorithmic Trading Challenge
- Allstate Purchase Prediction Challenge
- Amazon.com – Employee Access Challenge
- AMS 2013-2014 Solar Energy Prediction Contest
- Belkin Energy Disaggregation Competition
- Challenges in Representation Learning: Facial Expression Recognition Challenge
- Challenges in Representation Learning: The Black Box Learning Challenge
- Challenges in Representation Learning: Multi-modal Learning
- Detecting Insults in Social Commentary
- EMI Music Data Science Hackathon
- Galaxy Zoo – The Galaxy Challenge
- Global Energy Forecasting Competition 2012 – Wind Forecasting
- KDD Cup 2013 – Author-Paper Identification Challenge (Track 1)
- KDD Cup 2013 – Author Disambiguation Challenge (Track 2)
- Large Scale Hierarchical Text Classification
- Loan Default Prediction – Imperial College London
- Merck Molecular Activity Challenge
- MLSP 2013 Bird Classification Challenge
- Observing the Dark World
- PAKDD 2014 – ASUS Malfunctional Components Prediction
- Personalize Expedia Hotel Searches – ICDM 2013
- Predicting a Biological Response
- Predicting Closed Questions on Stack Overflow
- See Click Predict Fix
- See Click Predict Fix – Hackathon
- StumbleUpon Evergreen Classification Challenge
- The Analytics Edge (15.071x)
- The Marinexplore and Cornell University Whale Detection Challenge
- Walmart Recruiting – Store Sales Forecasting
A curated collection of source code and discussions from Kaggle competitions.
Algorithmic Trading Challenge
Allstate Purchase Prediction Challenge
- Rank 2 solution code by Alessandro Mariani.
- Rank 10 solution code by B1aine.
- Rank 36 solution code by Hiroyuki.
- Rank 159 solution code by MrCanard.
- Solution thread.
Amazon.com – Employee Access Challenge
- Rank 1 solution code by the Paul Duan and Benjamin Solecki team.
- Rank 1 solution Q&A by Paul Duan.
- Rank 2 solution code by Owen Zhang.
- Rank 3 solution code by Dmitry & Leustagos.
- Rank 289 solution code by Foxtrot, with the original blog post here.
- Solution thread.
AMS 2013-2014 Solar Energy Prediction Contest
- Rank 1 solution code and description by the Leustagos team.
- Rank 2 solution code and description by Toulouse.
- Rank 3 solution code and description by Owen Zhang.
- Rank 4 solution description by Peter Prettenhofer.
- Rank 5 solution description by Domcastro.
- Rank 58 solution code and description by Davit.
- Solution thread here.
- Ridge Regression starter code with MAE about 2.2M by Alec Radford, original thread here.
- Improved starter code by Foxtrot.
- Baseline code with MAE about 2.6M using Catmull-Rom spline interpolation, also available in R here and here.
Belkin Energy Disaggregation Competition
Challenges in Representation Learning: Facial Expression Recognition Challenge
- Rank 1 solution code and description by Charlie Tang.
- Rank 3 solution description by Maxim Milakov.
- Solution thread.
Challenges in Representation Learning: The Black Box Learning Challenge
- Rank 1 solution description by David Thaler.
- Rank 2 solution code and description by sayit.
Challenges in Representation Learning: Multi-modal Learning
- Rank 1 solution by MMDL.
- Solution thread.
Detecting Insults in Social Commentary
- Rank 1 solution description and code by Vivek Sharma.
- Rank 2 solution by tuzzeg.
- Rank 3 solution description by Andrei Olariu.
- Rank 4 solution by Chris Brew.
- Rank 5 solution description by Yasser Tabandeh.
- Rank 6 solution by Andreas Mueller, code available here.
- Rank 8 solution description by Steve Poulson.
- Solution thread.
EMI Music Data Science Hackathon
- Rank 4 solution description by Steffen Rindle.
- Rank 18 solution code and description by Vlad Gusev.
- Rank 34 solution code and description by zenog.
- Solution thread.
Galaxy Zoo – The Galaxy Challenge
- Rank 1 solution code and description by Sander Dieleman.
- Rank 2 solution code and description by Maxim Milakov.
- Rank 3 solution code and description by tund.
- Rank 5 solution code and description by Julian de Wit.
- Rank 9 solution code and description by Soumith Chintala.
- Rank 13 solution code and description by Xiaoxiang Zhang.
- Rank 28 solution code and description by utdiscant.
- Rank 38 solution code and description by sugi.
- Rank 57 solution code and description by hxu.
- Rank 58 solution code and description by yr.
- Solution thread.
Global Energy Forecasting Competition 2012 – Wind Forecasting
- Rank 1 solution by Leustagos.
- Solution thread here.
KDD Cup 2013 – Author-Paper Identification Challenge (Track 1)
- Rank 1 solution with code and description by Team Algorithm, GitHub link to code here.
KDD Cup 2013 – Author Disambiguation Challenge (Track 2)
- Rank 1 solution with code and description by Team Algorithm, GitHub link to code here.
- Rank 2 solution by SmallData Team.
- Rank 3 solution by hustmonk.
- Rank 4 solution by Ben S.
- Solution thread.
Large Scale Hierarchical Text Classification
- Rank 1 solution code and description by anttip.
- Rank 3 solution code and description by nagadomi.
- Solution thread one.
- Solution thread two.
Loan Default Prediction – Imperial College London
- Rank 2 solution and description by HelloWorld.
- Rank 12 solution and description by David McGarry.
- Solution thread.
Merck Molecular Activity Challenge
MLSP 2013 Bird Classification Challenge
- Rank 1 solution code and description by beluga.
- Rank 2 solution code and description by Herbal Candy (W and thomeou).
- Rank 3 solution description by Anil Thomas.
- Rank 4 solution description by Maxim Milakov.
- Solution thread.
Observing the Dark World
- Rank 2 solution by Iain Murray, code available here.
PAKDD 2014 – ASUS Malfunctional Components Prediction
Personalize Expedia Hotel Searches – ICDM 2013
- Presentation paper/slides for ICDM 2013.
- Solution thread.
Predicting a Biological Response
- Rank 6 solution by the Shea Parkes & Neil Schneider team.
- Rank 17 solution, an ensemble of RandomForests, GradientBoostingTrees and ExtraTreesRegressor, by Emanuele Olivetti.
- Another solution code using Oblique Random Forests (oRF) by the Shea Parkes & Neil Schneider team.
- "The code of my best submission" thread: discusses multi-core training of Oblique Random Forests, and stacking.
- "Question about the process of ensemble learning" thread: discusses applying ensembles in practice, the problems that can arise, and how to deal with them.
Predicting Closed Questions on Stack Overflow
- Rank 10 solution by Marco Lui.
- Rank 33 solution by Foxtrot.
See Click Predict Fix
See Click Predict Fix – Hackathon
StumbleUpon Evergreen Classification Challenge
- Benchmark beater 1.
- Benchmark beater 2.
- Benchmark beater 3.
- Solution thread.
- My own solution, which is a good example of what is overfitting. (Public rank: 57, Private rank: 291)
The Analytics Edge (15.071x)
- Rank 17 solution code and description by Foxtrot.
- Solution thread.
The Marinexplore and Cornell University Whale Detection Challenge
- Rank 1 solution by Nick Kridler.
- Rank 7 solution by Gilles Louppe and Peter Prettenhofer team.
- Rank 8 solution by Sander Dieleman.
- Rank 56 solution by Sudeep Juvekar.
- Solution discussion thread.
- Mean spectrogram thread.
- Official interview with Marinexplore and Cornell at Kaggle.
Walmart Recruiting – Store Sales Forecasting
- Rank 1 solution code and description by David Thaler.
- Rank 2 solution description by sriok.
- Rank 3 solution code and description by James King.
- Rank 5 solution description by ACS69.
- Rank 6 solution description by T. Henry.
- Rank 8 solution description by BreakfastPirate.
- Rank 9 solution description by Neil Summers.
- Rank 10 solution description by Gilberto Titericz Junior.
- Rank 11 solution description by citynight.
- Rank 16 solution code and description by yr.
- Rank 29 solution code and description by Mike Kim.
- Rank 30 solution description by dkay.
- Solution thread.
Thanks to Foxtrot, James Petterson, and Ben S for providing some of the links and solutions above.