[转] Making GTFS query more convenient
url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/
这简直说出了我的心声。
I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.
One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.
This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this
SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;
To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.
SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;
Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.
GTFS More Useful
Structure Structure route line
| |
| V
| route*
| | \
| shape | +-> route_shape
| ^ | |
| / | +-> route_stops*
| / |
V / V
trips trips
| |
| stops | stops
| ^ |
| / |
V / V
stop_times stop_times
Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.
The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.
[转] Making GTFS query more convenient的更多相关文章
- Spring Boot Reference Guide
Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch, ...
- Using dojo/query(翻译)
In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...
- Query classification; understanding user intent
http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...
- The 5th tip of DB Query Analyzer
The 5th tip of DB Query Analyzer Ma Genfeng (Guangdong UnitollServices incorporated, G ...
- Data access between different DBMS and other txt/csv data source by DB Query Analyzer
1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...
- How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer
How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...
- Install and run DB Query Analyzer 6.04 on Microsoft Windows 10
Install and run DB Query Analyzer 6.04 on Microsoft Windows 10 DB Query Analyzer is presented ...
- DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published
DB Query Analyzer 6.04 is distributed,78 articles concerned have been published DB Query Analyz ...
- FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】
官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...
随机推荐
- svg DOM的一些js操作
这是第一个实例,其中讲了如何新建svg,添加元素,保存svg document,查看svg. 下面将附上常用一些元素的添加方法:(为js的,但基本上跟java中操作一样,就是类名有点细微差别) Cir ...
- 如何解决xx列不在表中
在连接数据库的程序中常会出现xx列不在表中的问题?那么应该怎么解决呢? 产生此问题的原因有三种: 1.数据表没这个字段2.sql查询没将这个字段查出来3.字段名写错了 还有重要的是一定要检查你的数据库 ...
- Javascript中call函数和apply函数的使用
Javascript 中call函数和apply的使用: Javascript中的call函数和apply函数是对执行上下文进行切换,是将一个函数从当前执行的上下文切换到另一个对象中执行,例如: so ...
- redis-persist上线
九月份惨不忍睹,因为代码质量不够高,直接被Boss喷成了筛子.被反复教育说要高质量的代码,要可维护.高性能…… 幸而,最后一周终于在紧张的加班中,灰度上线redis-land-go了,项目也改名为re ...
- Scalding初探之二:动手来做做小实验
输入文件 Scalding既可以处理HDFS上的数据,也可以很方便地在本地运行处理一些test case便于debug,Source有好多种 1 TextLine(filename) TextLine ...
- LintCode Binary Tree Paths
Binary Tree Paths Given a binary tree, return all root-to-leaf paths. Given the following binary tre ...
- EntityFramework 实体映射到数据库
EntityFramework实体映射到数据库 在Entity Framework Code First与数据表之间的映射方式实现: 1.Fluent API映射 通过重写DbContext上的OnM ...
- oracle 小知识
oracle: 数值随机的函数是 dbms_random.value(最大值,最小值) 用法是select dbms_random(3,0) from dual; oracle: 获取前100条 ...
- yii框架便利类CVarDumper使用
1.类文件位置:path/to/yiiframework/utils/CVarDumper.php 2.作用:CVarDumper is intended to replace the buggy P ...
- 数迹学——Asp.Net MVC4入门指南(3):添加一个视图
方法返回值 ActionResult(方法执行后的结果) 例子1 public ActionResult methordName() { return "string"; } 例 ...