Kaggle Forecasting
Notes on two Kaggle forecasting competitions.
1. Web Traffic Time Series Forecasting: https://www.kaggle.com/c/web-traffic-time-series-forecasting/overview
Arthur Suilin (1st place in this competition):
GitHub: https://github.com/Arturus/kaggle-web-traffic
- There are two main information sources for prediction: A) year/quarter seasonality; B) past trend. A good model should use both sources and combine them intelligently.
- Minimal feature engineering. A deep learning model is powerful enough to discover and use features on its own; my task is just to assist the model in using the incoming data in a meaningful way.
I'll describe the implementation problems I encountered and their solutions.
1. Learning takes too much time.
RNNs are inherently sequential and hard to parallelize. Today's most efficient RNN implementations are the CuDNN fused kernels created by NVIDIA. By default, TensorFlow uses its own generic but slow sequential RNNCell. Surprisingly, TF also supports CuDNN kernels (hard to find in the documentation and poorly described); I spent some time figuring out how to use the classes in the tf.contrib.cudnn_rnn module and got an amazing result: a ~10x decrease in computation time! I also used GRU instead of the classical LSTM: it gives better results and computes ~1.5x faster. Of course, CuDNN can be used only for the encoder. In the decoder, each step depends on customized processing of the previous step's outputs, so the decoder uses TensorFlow's GRUBlockCell, which is again slightly faster than the standard GRUCell (~1.2x).
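A minimal sketch of this encoder/decoder split (TF 1.x, where tf.contrib still exists; layer sizes and input shapes are placeholders of mine, not the competition values):

```python
import tensorflow as tf  # TF 1.x

# Encoder: fused CuDNN GRU kernel (the ~10x speedup mentioned above).
# CudnnGRU expects time-major input: [time, batch, features].
encoder = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=1, num_units=256)
inputs = tf.placeholder(tf.float32, [365, None, 16])  # placeholder shape
encoder_outputs, encoder_state = encoder(inputs)

# Decoder: each step consumes processed outputs of the previous step, so a
# fused kernel can't be used; GRUBlockCell is still ~1.2x faster than GRUCell.
decoder_cell = tf.contrib.rnn.GRUBlockCell(256)
```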
2. Long short-term memory is not so long.
The practical memory limit for LSTM-type cells is 100-200 steps. With longer sequences, LSTM/GRU simply forgets what was at the beginning. But to exploit yearly seasonality, we should use at least 365 steps. The conventional way to overcome this memory limit is attention: we can take encoder outputs from the distant past and feed them as inputs into the current decoder step. My first, very basic positional attention model: take the encoder outputs from steps current_day - 365 (year seasonality) and current_day - 92 (quarter seasonality), squeeze them through a FF layer (to reduce dimensionality and extract useful features), concatenate them, and feed them into the decoder. To compensate for random-walk noise and deviations in year and quarter lengths (leap/non-leap years, different numbers of days in months), I take a weighted average (in proportion 0.25:0.5:0.25) of 3 encoder outputs around the chosen step. Then I realized that 0.25:0.5:0.25 is just a 1D convolution kernel of size 3, and my model can learn the most effective kernel weights and attention offsets on its own. This learnable convolutional attention significantly improved the model's results.
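A minimal sketch of the fixed-weight version (the helper name and tensor layout are mine; making the kernel a variable turns it into the learnable convolutional attention):

```python
import tensorflow as tf

def lagged_attention(encoder_outputs, day, lag=365):
    # encoder_outputs: [batch, time, features]; take 3 steps around day - lag
    # and smooth them with the fixed 0.25:0.5:0.25 kernel. Replacing `kernel`
    # with a tf.Variable lets the model learn the weights.
    window = encoder_outputs[:, day - lag - 1 : day - lag + 2, :]
    kernel = tf.constant([0.25, 0.5, 0.25], dtype=tf.float32)
    return tf.reduce_sum(window * kernel[None, :, None], axis=1)
```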
But what if we just use lagged pageviews (year or quarter lag) as additional input features? Can lagged pageviews supplement or even replace attention? Yes, they can. When I added 4 additional features (3, 6, 9, and 12 month lagged pageviews) to the inputs, I got roughly the same improvement as from attention.
3. Overfitting.
I decided to limit the number of days used for training to 100-400 and use the remaining days to generate different samples for training. Example: if we have 500 days of data and use a 200-day window for training and 60 days for prediction, then the first 240 days are 'free space' from which to randomly choose a starting day for training. Each starting day produces a different time series: 145K pages x 250 starting days = 36.25M unique time series, not bad! For Stage 2, this number is even higher. This is an effective kind of data augmentation: models using a random starting point show very little overfitting, even without any regularization. With dropout and slight L2 regularization, overfitting is almost nonexistent.
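A minimal sketch of this windowing augmentation (the helper is mine):

```python
import numpy as np

def sample_window(series, train_len=200, predict_len=60, rng=np.random):
    # With 500 days of history, 500 - 200 - 60 = 240 days of 'free space'
    # remain; each random start yields a distinct training sample.
    free = len(series) - train_len - predict_len
    start = rng.randint(0, free + 1)
    x = series[start : start + train_len]
    y = series[start + train_len : start + train_len + predict_len]
    return x, y
```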
4. How can the model decide what to use: seasonality, past trend, or both?
Autocorrelation coefficients to the rescue. This turned out to be a very important input feature. If the year-to-year (lag 365) autocorrelation is positive and high, the model should mostly use year-to-year seasonality; if it's low or negative, the model should mostly use past trend information (or quarter seasonality, if that one is high). An RNN can't compute autocorrelation on its own (this would require an additional pass over all steps), so this is the only hand-crafted input feature in my models. It's important not to include leading/trailing zeros/NaNs in the autocorrelation calculation (the page either doesn't exist yet during the leading zeros or has been deleted by the trailing zeros).
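A minimal sketch of that calculation (my own helper, not the competition code):

```python
import numpy as np

def yearly_autocorr(series, lag=365):
    # Trim leading/trailing zeros/NaNs (page not created yet / deleted),
    # then correlate the live span of the series with its lag-365 shift.
    s = np.nan_to_num(np.asarray(series, dtype=float))
    live = np.where(s != 0)[0]
    if len(live) == 0 or live[-1] - live[0] + 1 <= lag:
        return 0.0  # not enough live history for this lag
    s = s[live[0] : live[-1] + 1]
    return float(np.corrcoef(s[lag:], s[:-lag])[0, 1])
```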
5. High variance
I used the following variance reduction methods:
- SGD weight averaging, decay=0.99. It didn't really reduce the observable variance, but it improved prediction quality by ~0.2 SMAPE points (see the sketch after this list).
- Checkpoints were created every 100 training steps, and the prediction results of the models at the last 10 checkpoints were averaged.
- The same model was trained with 3 different random seeds, and the prediction results were averaged. Again, this slightly improved prediction quality.
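A sketch of the weight averaging using TF's built-in exponential moving average (the dummy variable stands in for the model's weights; this is one plausible implementation, not necessarily the author's):

```python
import tensorflow as tf  # TF 1.x

w = tf.Variable(tf.zeros([10]), name="w")   # stands in for model weights
ema = tf.train.ExponentialMovingAverage(decay=0.99)
maintain_averages = ema.apply([w])          # run after each optimizer step,
# e.g. train_step = tf.group(optimizer_step, maintain_averages)

# At prediction time, load the shadow (averaged) weights instead:
saver = tf.train.Saver(ema.variables_to_restore())
```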
The prediction quality (predicting the last 60 days) of my models on Stage 2 data was ~35.2-35.5 SMAPE if the autocorrelation was calculated over all available data (including the prediction interval), and ~36 SMAPE if the autocorrelation was calculated on all data excluding the prediction interval. Let's see if the model holds the same quality on future data.
Tips from the winning solutions
Congratulations to all winners (including the organizers)! Thank you so much for creating, maintaining, competing, and sharing your solutions! Let me summarize some things I learned from the top solutions:
- Use medians as features.
- Use log1p to transform the data, and MAE as the evaluation metric (see the sketch after this list).
- XGBoost and deep learning models such as MLP, CNN, and RNN all work. However, performance hugely depends on how we create and train the models.
- For these deep learning models, skip connections work.
- Best trick to me: cluster the time series based on the performance of the best model, then train different models for each cluster.
- The Stage 2 period is easier to predict than the Stage 1 period. This affects how we choose our best model (should it capture the weird behavior of Stage 1 or not?).
- Don't wait until the last hour to submit models. I overslept, so I couldn't submit my best model =o= that model might have given me a gold (it improved my CV by a margin of 0.5) :D
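A sketch of the log1p/MAE pairing and of the competition's SMAPE metric (the numbers are stand-ins):

```python
import numpy as np

def smape(y_true, y_pred):
    # Symmetric MAPE, the competition metric; defined as 0 where both are 0.
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    safe = np.where(denom == 0, 1.0, denom)
    return 100.0 * np.mean(np.where(denom == 0, 0.0,
                                    np.abs(y_true - y_pred) / safe))

# Train any model on log1p(target) with an L1/MAE loss, then invert with
# expm1 before scoring or submitting:
y_true = np.array([0.0, 10.0, 120.0])
y_pred = np.expm1(np.log1p(np.array([0.0, 12.0, 100.0])))  # stand-ins
print(smape(y_true, y_pred))
```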
Various solutions (including 1st, 3rd, 4th,... places): https://www.kaggle.com/c/web-traffic-time-series-forecasting/discussion/39367
2nd place solution: https://www.kaggle.com/c/web-traffic-time-series-forecasting/discussion/39395
6th place: https://github.com/sjvasquez/web-traffic-forecasting
2. Corporación Favorita Grocery Sales Forecasting
1st place solution:
Congrats to all the winning teams and to the new grandmaster sjv. Thanks to Kaggle for hosting and to Favorita for sponsoring this great competition. Special thanks to @sjv, @senkin13, @tunguz, and @ceshine; we built our models based on your kernels:
- https://github.com/sjvasquez/web-traffic-forecasting/blob/master/cnn.py
- https://www.kaggle.com/senkin13/lstm-starter/code
- https://www.kaggle.com/tunguz/lgbm-one-step-ahead-lb-0-513
- https://www.kaggle.com/ceshine/lgbm-starter
Like in the Rossmann competition, the private leaderboard shook up again this time. I think luck was finally on our side.
Sample Selection
We used only 2017 data to extract features and construct samples.
Train data: 20170531 - 20170719 or 20170614 - 20170719; different models were trained with different data sets.
Validation: 20170726 - 20170810
In fact, we tried to use more data but failed. The gap between the public and private leaderboards is not very stable. If we train a single model for the 16 days, the gap is smaller (0.002-0.003).
Preprocessing
We simply filled missing or negative promotion and target values with 0.
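In pandas terms, that amounts to something like this (column names follow the Favorita dataset; the tiny frame is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"unit_sales": [3.0, -1.0, None],
                   "onpromotion": [True, None, False]})
df["unit_sales"] = df["unit_sales"].clip(lower=0).fillna(0.0)    # targets
df["onpromotion"] = df["onpromotion"].fillna(False).astype(int)  # promotions
```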
Feature Engineering
- basic features
  - categorical features: store, item, family, class, cluster...
  - promotion
  - day of week (only for model_3)
- statistical features: we use several methods to compute statistics of several targets, for different keys, over different time windows (see the sketch after this list)
  - time windows
    - nearest days: [1, 3, 5, 7, 14, 30, 60, 140]
    - equal time windows: [1] * 16, [7] * 20...
  - key: store x item, item, store x class
  - target: promotion, unit_sales, zeros
  - method
    - mean, median, max, min, std
    - days since last appearance
    - difference of mean value between adjacent time windows (only for equal time windows)
- useless features
  - holidays
  - other keys, such as: cluster x item, store x family...
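A sketch of these window statistics for one key (store x item) with the "nearest days" windows; t0 is the first prediction day, so every window ends the day before it (column names follow the Favorita dataset, the helper is mine):

```python
import pandas as pd

def window_means(df, t0, windows=(1, 3, 5, 7, 14, 30, 60, 140)):
    # One mean-sales feature per window, grouped by the store x item key.
    feats = {}
    for w in windows:
        span = df[(df["date"] >= t0 - pd.Timedelta(days=w)) & (df["date"] < t0)]
        feats["mean_%dd" % w] = span.groupby(["store_nbr", "item_nbr"])["unit_sales"].mean()
    return pd.DataFrame(feats).reset_index()
```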
Single Model
- model_1: 0.506 / 0.511, 16 LightGBM models, one trained per prediction day (see the sketch after this list). Source code: https://www.kaggle.com/shixw125/1st-place-lgb-model-public-0-506-private-0-511
- model_2: 0.507 / 0.513, 16 NN models, one trained per prediction day. Source code: https://www.kaggle.com/shixw125/1st-place-nn-model-public-0-507-private-0-513
- model_3: 0.512 / 0.515, 1 LightGBM model for all 16 days, with almost the same features as model_1
- model_4: 0.517 / 0.519, 1 NN model based on @sjv's code
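A sketch of the "16 models, one per day" scheme (function name and parameters are mine, not the winning code):

```python
import lightgbm as lgb

def train_per_day(X, Y, params=None, rounds=500):
    # Y has one column per day of the 16-day horizon; model d is trained
    # against the target for horizon day d, reusing the same features X.
    params = params or {"objective": "regression_l2", "metric": "l2"}
    boosters = []
    for d in range(Y.shape[1]):
        dtrain = lgb.Dataset(X, label=Y[:, d])
        boosters.append(lgb.train(params, dtrain, num_boost_round=rounds))
    return boosters
```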
Ensemble
Stacking didn't work well this time; our best submission is a linear blend of the 4 single models:
final submission = 0.42 * model_1 + 0.28 * model_2 + 0.18 * model_3 + 0.12 * model_4
public = 0.504, private = 0.509