[Repost] Making GTFS query more convenient
url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/
This says exactly what I have been thinking.
I have been spending a lot of time parsing GTFS data. On the surface it is just a set of simple CSV files. But extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do, but it is actually non-trivial with GTFS.
One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence, but actual service has a lot of variables. The schedule on weekends is often different from the one on weekdays, and so is the exact route covered. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.
This is why in GTFS one “route” may be associated with multiple “shapes”. To find out which shapes are associated with a route, we have to make a query like this:
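-- Join keys follow the standard GTFS field names.
SELECT
  route.route_id, trips.shape_id
FROM route
JOIN trips ON trips.route_id = route.route_id
JOIN shape ON shape.shape_id = trips.shape_id
GROUP BY route.route_id, trips.shape_id;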
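As a minimal runnable sketch of the same idea, the snippet below loads a few invented trips into an in-memory SQLite database and lists the shapes used by one route. The table layout is a small subset of the GTFS fields, and the IDs (`R10`, `S10_full`, and so on) and the helper name `shapes_for_route` are made up for illustration. Since `trips` already carries `shape_id`, this sketch does not join the shape table at all.

```python
import sqlite3

# Toy data, invented for illustration: route R10 runs a full-length
# variant on weekdays and a short-turn variant on weekends.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (
    trip_id    TEXT PRIMARY KEY,
    route_id   TEXT,
    service_id TEXT,
    shape_id   TEXT
);
INSERT INTO trips VALUES
  ('T1', 'R10', 'WKDY', 'S10_full'),
  ('T2', 'R10', 'WKDY', 'S10_full'),
  ('T3', 'R10', 'WKND', 'S10_short'),
  ('T4', 'R20', 'WKDY', 'S20');
""")


def shapes_for_route(conn, route_id):
    """Return the distinct shape_ids used by the trips of one route."""
    rows = conn.execute(
        "SELECT DISTINCT shape_id FROM trips "
        "WHERE route_id = ? ORDER BY shape_id",
        (route_id,),
    )
    return [row[0] for row in rows]


print(shapes_for_route(conn, "R10"))  # ['S10_full', 'S10_short']
```

Even in this tiny example, a single route maps to two shapes, which is the situation described above.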
Finding the stops is even more complex. Here we need to join one more table, stop_times. It is also the biggest table in GTFS, so this is also the most computationally intensive query.
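SELECT
  trips.shape_id, stop_times.stop_id
FROM route
JOIN trips ON trips.route_id = route.route_id
JOIN stop_times ON stop_times.trip_id = trips.trip_id
JOIN stops ON stops.stop_id = stop_times.stop_id
GROUP BY trips.shape_id, stop_times.stop_id;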
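The grouped query returns the set of stops per shape but not their order. One possible way to get the stops in travel order is sketched below: pick a single representative trip for the shape and sort its rows by the GTFS `stop_sequence` field. This rests on the assumption, which GTFS itself does not guarantee, that all trips sharing a shape serve the same stops in the same order. The data and the helper name `stops_in_order` are invented for illustration.

```python
import sqlite3

# Toy data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('T1', 'R10', 'S10_full'),
  ('T2', 'R10', 'S10_full'),
  ('T3', 'R10', 'S10_short');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3), ('T1', 'D', 4),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3), ('T2', 'D', 4),
  ('T3', 'A', 1), ('T3', 'B', 2);
""")


def stops_in_order(conn, shape_id):
    """Stops of one shape in travel order, read from one representative trip.

    Assumption: every trip with this shape_id serves the same stop sequence.
    """
    rows = conn.execute(
        """
        SELECT st.stop_id
        FROM stop_times AS st
        WHERE st.trip_id = (SELECT MIN(trip_id) FROM trips WHERE shape_id = ?)
        ORDER BY st.stop_sequence
        """,
        (shape_id,),
    )
    return [row[0] for row in rows]


print(stops_in_order(conn, "S10_full"))   # ['A', 'B', 'C', 'D']
print(stops_in_order(conn, "S10_short"))  # ['A', 'B']
```

Reading one trip instead of grouping over every trip also keeps the scan of stop_times small, at the cost of the assumption stated above.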
Still, most people have a clear idea of what a transit line is and where it runs. It shouldn’t be such a pain to compute. A more useful structure would look like the one below.
GTFS                  More Useful
Structure             Structure

route                 line
  |                     |
  |                     V
  |                   route*
  |                     |  \
  |   shape             |   +-> route_shape
  |     ^               |   |
  |    /                |   +-> route_stops*
  |   /                 |
  V  /                  V
trips                 trips
  |                     |
  |   stops             |   stops
  |     ^               |
  |    /                |
  V   /                 V
stop_times            stop_times
Here I shift the terminology a bit. The top-level entity is a line (i.e. GTFS’ route). This is the service that people know of, like a numbered bus line or a metro line. Below that are routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find them by querying all unique shape_id values using the first SQL. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saving. There is one assumption my structure makes, though: different lines do not share the same route. It should be a reasonable assumption. And if two lines do indeed share a route and shape, we should just replicate them as two separate entities.
The original GTFS structure seems to take a transit-operator-centric view. It gives operators maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. Adding the route and route_stops tables as indicated would greatly facilitate querying and working with transit information.
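The pre-computation could be sketched roughly as follows. The table names `route_shape` and `route_stops` come from the diagram, but their column layout, the `line_id` naming, and the `sample_trip_id` helper column are assumptions made for this example, as is the toy data. As above, it assumes every trip with a given shape serves the same stops in the same order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy GTFS subset, invented for illustration.
CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
INSERT INTO trips VALUES
  ('T1', 'R10', 'S10_full'),
  ('T2', 'R10', 'S10_full'),
  ('T3', 'R10', 'S10_short');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3),
  ('T3', 'A', 1), ('T3', 'B', 2);

-- One row per (line, route variant); sample_trip_id is a hypothetical helper.
CREATE TABLE route_shape AS
  SELECT route_id AS line_id, shape_id, MIN(trip_id) AS sample_trip_id
  FROM trips
  GROUP BY route_id, shape_id;

-- Ordered stops of each route variant, copied from its sample trip.
CREATE TABLE route_stops AS
  SELECT rs.line_id, rs.shape_id, st.stop_sequence, st.stop_id
  FROM route_shape AS rs
  JOIN stop_times AS st ON st.trip_id = rs.sample_trip_id;

-- Schedule-free lookups no longer need the large table.
DROP TABLE stop_times;
""")


def route_stops_for(conn, line_id, shape_id):
    """Look up the ordered stops of one route variant from the derived table."""
    rows = conn.execute(
        "SELECT stop_id FROM route_stops "
        "WHERE line_id = ? AND shape_id = ? ORDER BY stop_sequence",
        (line_id, shape_id),
    )
    return [row[0] for row in rows]


print(route_stops_for(conn, "R10", "S10_full"))   # ['A', 'B', 'C']
print(route_stops_for(conn, "R10", "S10_short"))  # ['A', 'B']
```

Dropping stop_times before the lookup is only there to show that the derived tables are sufficient on their own; a real application would more likely just leave the table out of its deployed database.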