url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

这简直说出了我的心声。

I have been spending a lot of time parsing the GTFS database. On the surface it is just a simple CSV files. But to extract useful information from GTFS is often unexpected difficult. For example, find the stops from a bus line in sequential order might sounds like basic thing to do. But it is actually non-trivial with GTFS.

One reason is transit service is more complex it seems. It might seems a bus service just hit all the stops in sequence. But the actual service has a lot of variables. The schedule is often different in weekend compare to weekdays. And so does the exact route that it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex case there can be branching where there is a common main trunk and then the buses split to serve two or more alternative destination.

This is the reason why in GTFS one “route” may associate with multiple “shapes”. To find out what shapes are associate with a route, we will have to make a query like this

SELECT
shape_id
FROM route
JOIN trips
JOIN shape
GROUP BY shape_id;

To find out the stops is even more complex. Here we need to join one more table the stop_times. It is also the biggest tables in the GTFS. So this is also the most computation intensive query to do.

SELECT
shape_id, stop_id
FROM route
JOIN trips
JOIN stop_times
JOIN stops
GROUP BY shape_id, stop_id;

Still most people have a clear concept of what a transit line is where it runs. It shouldn’t be such a pain to compute. A more useful structure should look like below.

    GTFS             More Useful
  Structure           Structure     route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here a shift the terminology a bit. The top level entity is a line (i.e. GTFS’ route). This is service that people know of, like a numbered bus line or a metro line. Below that is routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find that by querying all unique shape_id using the first SQL. Another missing piece is the stops. If we can pre-compute all the route_stops using the second SQL once, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled time, this is a huge saver. The is one assumption my structure makes though. It is that different lines do not shape that same route. If should be a reasonable assumption. And if there is indeed share route and shape, we should just replicated them as two separate entities.

The original GTFS structure seems to have a transit operator centric view. It allows them maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. By adding the route and route_stops tables as indicated, it will greatly facilitate the query and operation of transit information.

[转] Making GTFS query more convenient的更多相关文章

  1. Spring Boot Reference Guide

    Spring Boot Reference Guide Authors Phillip Webb, Dave Syer, Josh Long, Stéphane Nicoll, Rob Winch,  ...

  2. Using dojo/query(翻译)

    In this tutorial, we will learn about DOM querying and how the dojo/query module allows you to easil ...

  3. Query classification; understanding user intent

    http://vervedevelopments.com/Blog/query-classification-understanding-user-intent.html What exactly i ...

  4. The 5th tip of DB Query Analyzer

    The 5th tip of DB Query Analyzer             Ma Genfeng   (Guangdong UnitollServices incorporated, G ...

  5. Data access between different DBMS and other txt/csv data source by DB Query Analyzer

        1 About DB Query Analyzer DB Query Analyzer is presented by Master Genfeng,Ma from Chinese Mainl ...

  6. How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer

    How to generate the complex data regularly to Ministry of Transport of P.R.C by DB Query Analyzer 1 ...

  7. Install and run DB Query Analyzer 6.04 on Microsoft Windows 10

          Install and run DB Query Analyzer 6.04 on Microsoft Windows 10  DB Query Analyzer is presented ...

  8. DB Query Analyzer 6.04 is distributed, 78 articles concerned have been published

        DB Query Analyzer 6.04 is distributed,78 articles concerned have been published  DB Query Analyz ...

  9. FluentData -Micro ORM with a fluent API that makes it simple to query a database 【MYSQL】

    官方地址:http://fluentdata.codeplex.com/documentation MYSQL: MySQL through the MySQL Connector .NET driv ...

随机推荐

  1. HDU2222

    http://acm.hdu.edu.cn/showproblem.php?pid=2222 注意: 1. keyword可以相同,因此计算时要累计:cur->num++. 2. 同一个keyw ...

  2. HDU 4123 (2011 Asia FZU contest)(树形DP + 维护最长子序列)(bfs + 尺取法)

    题意:告诉一张带权图,不存在环,存下每个点能够到的最大的距离,就是一个长度为n的序列,然后求出最大值-最小值不大于Q的最长子序列的长度. 做法1:两步,第一步是根据图计算出这个序列,大姐头用了树形DP ...

  3. asp.net将页面内容按需导入Excel,并设置excel样式,下载文件(解决打开格式与扩展名指定的格式不统一的问题)

    //请求一个excel类 Microsoft.Office.Interop.Excel.ApplicationClass excel = null; //创建 Workbook对象 Microsoft ...

  4. 一种工业级系统交互建模工具的应用--EventStudio System Designer

    一种工业级系统交互建模工具的应用 [摘要] 本文以探索如何维护大规模复杂系统交互设计模型为目的,以EventHelix公司的商业付费软件EventStudio System Designer为建模工具 ...

  5. 更新新网卡驱动,修复win7雷凌网卡Ralink RT3290在电脑睡眠时和启动网卡时出现蓝屏netr28x.sys驱动文件错误

    更新新网卡驱动,修复win7雷凌网卡Ralink RT3290在电脑睡眠时和启动网卡时出现蓝屏netr28x.sys驱动文件错误 我的本本是win7,雷凌网卡Ralink RT3290   802.1 ...

  6. C++学习笔记21:文件系统

    文件系统 实际文件系统 ext, ext2, ext3, ext4 虚拟文件系统 VFS 特殊文件系统/proc:从proc文件系统中抽取信息 实际文件系统:组成与功能描述 引导块,超级块,索引结点区 ...

  7. php冒泡排序

    <?php $arr = array(1,4,2,9,0,10,12,3,7); foreach($arr as $val) { echo $val."--"; } echo ...

  8. 有100个节点的AVL树最大深度是多少?

    首先说AVL树的概念 1 左右子树的深度差<=1 2 左右子树都是AVL树. 其实这样算,可以倒推的. 空树  DEPTH = 0; AVL_DEPTH = 2^0+2^1+......+2^k ...

  9. odoo.cli.main()指的是哪里?OpenERP的第二根线头儿

    接上回,odoo-bin中调用了odoo.cli.main(),去哪儿找? cli目录容易找 跟随__init__.py的脚步 import logging import sys import os ...

  10. 使用Condition Variables 实现一个线程安全队列

    使用Condition Variables实现一个线程安全队列 测试机: i7-4800MQ .7GHz, logical core, physical core, 8G memory, 256GB ...