[转] Making GTFS query more convenient
url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/
这简直说出了我的心声。
I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.
One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.
This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this
SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;
To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.
SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;
Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.
GTFS More Useful
Structure Structure route line
| |
| V
| route*
| | \
| shape | +-> route_shape
| ^ | |
| / | +-> route_stops*
| / |
V / V
trips trips
| |
| stops | stops
| ^ |
| / |
V / V
stop_times stop_times
Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.
The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.
[转] Making GTFS query more convenient的更多相关文章
- Spring Boot Reference Guide
Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch, ...
- Using dojo/query(翻译)
In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...
- Query classification; understanding user intent
http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...
- The 5th tip of DB Query Analyzer
The 5th tip of DB Query Analyzer Ma Genfeng (Guangdong UnitollServices incorporated, G ...
- Data access between different DBMS and other txt/csv data source by DB Query Analyzer
1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...
- How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer
How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...
- Install and run DB Query Analyzer 6.04 on Microsoft Windows 10
Install and run DB Query Analyzer 6.04 on Microsoft Windows 10 DB Query Analyzer is presented ...
- DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published
DB Query Analyzer 6.04 is distributed,78 articles concerned have been published DB Query Analyz ...
- FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】
官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...
随机推荐
- Ztree使用笔记
在项目中需要用到树,使用了Ztree.(官网地址:http://www.treejs.cn/v3/main.php#_zTreeInfo,介绍很详细,有API,有demo) 1.初始化树: $.f ...
- SCC(强连通分量)
1.定义: 在有向图G中,如果两个顶点间至少存在一条路径,称两个顶点强连通(SC---strongly connected). 有向图中的极大强连通子图,成为强连通分量(SCC---strongly ...
- Angularjs中的promise
promise 是一种用异步方式处理值的方法,promise是对象,代表了一个函数最终可能的返回值或抛出的异常.在与远程对象打交道非常有用,可以把它们看成一个远程对象的代理. 要在Angular中创建 ...
- sed命令的基本使用
sed(Stream Editor):流编辑器 一次只读取一行 模式空间 1.sed语法: sed [option] "script" FILE... 2.选项: -n:静默模式, ...
- Bean
1. Bean配置项 1.1. ID 在整个IOC容器中Bean的唯一标识 1.2. Class 具体要实例化的类 1.3. Scope 范围,作用域 1.4. Constructor argumen ...
- isinstance(),issubclass()
isinstance(object,classinfo) 返回True,如果object是classinfo或者classinfo子class的实例. 如果classinfo是包含type和class ...
- Python-Jenkins 查询job是否存在
def check_jobs_from_jenkins(job_names): if isinstance(job_names, str): job_names = [job_names] job_d ...
- Hadoop的I/O操作
HDFS的数据完整性 检验数据是否损坏最常见的措施是:在数据第一次引入系统时计算校验和并在数据通过一个不可靠通道进行传输时再次计算校验和,这样就能发现数据是否被损坏.HDFS会对写入的所有数据计算校验 ...
- DB_MYSQL_mysql-5.7.10-winx64解压版安装笔记
1.http://dev.mysql.com/downloads/mysql/ 里面下载Windows (x86, 64-bit), ZIP Archive mysql-5.7.10-winx64.z ...
- Online Object Tracking: A Benchmark 翻译
来自http://www.aichengxu.com/view/2426102 摘要 目标跟踪是计算机视觉大量应用中的重要组成部分之一.近年来,尽管在分享源码和数据集方面的努力已经取得了许多进展,开发 ...