[转] Making GTFS query more convenient
url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/
这简直说出了我的心声。
I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.
One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.
This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this
SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;
To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.
SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;
Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.
GTFS More Useful
Structure Structure route line
| |
| V
| route*
| | \
| shape | +-> route_shape
| ^ | |
| / | +-> route_stops*
| / |
V / V
trips trips
| |
| stops | stops
| ^ |
| / |
V / V
stop_times stop_times
Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.
The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.
[转] Making GTFS query more convenient的更多相关文章
- Spring Boot Reference Guide
Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch, ...
- Using dojo/query(翻译)
In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...
- Query classification; understanding user intent
http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...
- The 5th tip of DB Query Analyzer
The 5th tip of DB Query Analyzer Ma Genfeng (Guangdong UnitollServices incorporated, G ...
- Data access between different DBMS and other txt/csv data source by DB Query Analyzer
1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...
- How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer
How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...
- Install and run DB Query Analyzer 6.04 on Microsoft Windows 10
Install and run DB Query Analyzer 6.04 on Microsoft Windows 10 DB Query Analyzer is presented ...
- DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published
DB Query Analyzer 6.04 is distributed,78 articles concerned have been published DB Query Analyz ...
- FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】
官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...
随机推荐
- 黑马程序员——【Java高新技术】——反射机制
---------- android培训.java培训.期待与您交流! ---------- 一.概述 1.Java反射机制:是指“在运行状态中”,对于任意一个类,都能够知道这个类中的所有属性和方法: ...
- [Java Basics] multi-threading
1, Process&Threads Most implementations of the Java virtual machine run as a single process. Thr ...
- 使用HelloCharts绘制柱状图
首先下载依赖库 ,有现成的jar包:hellocharts-library-1.5.8.jar 在需要的布局中直接使用: <lecho.lib.hellocharts.view.ColumnCh ...
- 【LeetCode OJ】Minimum Depth of Binary Tree
Problem Link: http://oj.leetcode.com/problems/minimum-depth-of-binary-tree/ To find the minimum dept ...
- uget和aria2
http://blog.csdn.net/luojiming1990/article/details/9078447 其中的aria2 -v要改成aria2c -v
- 该应用的登录功能版本较旧,无法使用QQ账号登录,请升级到最新版本,如果还无法解决,请联系开发者升级。(错误码:100044)
该原因应该是你的应用数据签名更改的原因 解决步骤已经写到我的公众号,二维码在下面. 欢迎观看我的CSDN学院课程,地址:http://edu.csdn.net/course/detail/2877 本 ...
- C#- 反射之 GetType()方法
Type.GetType()在跨程序集反射时返回null的解决方法 在开发中,经常会遇到这种情况,在程序集A.dll中需要反射程序集B.dll中的类型.如果使用稍有不慎,就会产生运行时错误.例如使用T ...
- ✡ leetcode 166. Fraction to Recurring Decimal 分数转换 --------- java
Given two integers representing the numerator and denominator of a fraction, return the fraction in ...
- [JSOI2008][BZOJ1012] 最大数(动态开点线段树)
题目描述 现在请求你维护一个数列,要求提供以下两种操作: 1. 查询操作. 语法:Q L 功能:查询当前数列中末尾L个数中的最大的数,并输出这个数的值. 限制:L不超过当前数列的长度. 2. 插入操作 ...
- Unity3d 引擎原理详细介绍
体系结构 为了更好地理解游戏的软件架构和对象模型,它获得更好的外观仅有一名Unity3D的游戏引擎和编辑器是非常有用的,它的主要原则. Unity3D 引擎 Unity3D的是一个屡获殊荣的工具,用于 ...