url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

This says exactly what I have been thinking.

I have been spending a lot of time parsing GTFS data. On the surface it is just a set of simple CSV files. But extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do. But it is actually non-trivial with GTFS.

One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence. But the actual service has a lot of variables. The schedule is often different on weekends compared to weekdays, and so is the exact route it covers. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.

This is the reason why in GTFS one “route” may be associated with multiple “shapes”. To find out what shapes are associated with a route, we have to make a query like this:

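-- Table names are assumed to follow the GTFS file names (routes.txt, trips.txt, shapes.txt).
SELECT
  routes.route_id, trips.shape_id
FROM routes
JOIN trips ON trips.route_id = routes.route_id
JOIN shapes ON shapes.shape_id = trips.shape_id
GROUP BY routes.route_id, trips.shape_id;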
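As a rough illustration of the query above, here is a minimal sketch using Python's built-in sqlite3 module. The feed is made up: the route id "10", the shape ids and the reduced column set are assumptions for the example, not data from a real agency. The shapes table is left out because trips already carries the shape_id.

```python
import sqlite3

def shapes_for_route(conn, route_id):
    """Return the distinct shape_ids used by the trips of one GTFS route."""
    rows = conn.execute(
        """
        SELECT DISTINCT trips.shape_id
        FROM routes
        JOIN trips ON trips.route_id = routes.route_id
        WHERE routes.route_id = ?
        ORDER BY trips.shape_id
        """,
        (route_id,),
    ).fetchall()
    return [row[0] for row in rows]

# Tiny made-up feed: route "10" runs a full-length pattern on weekdays
# and a short-turn pattern on weekends.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE routes (route_id TEXT PRIMARY KEY, route_short_name TEXT);
    CREATE TABLE trips  (trip_id TEXT PRIMARY KEY, route_id TEXT,
                         service_id TEXT, shape_id TEXT);
    INSERT INTO routes VALUES ('10', '10'), ('20', '20');
    INSERT INTO trips VALUES
        ('t1', '10', 'WEEKDAY', 'S_FULL'),
        ('t2', '10', 'WEEKDAY', 'S_FULL'),
        ('t3', '10', 'WEEKEND', 'S_SHORT'),
        ('t4', '20', 'WEEKDAY', 'S_OTHER');
    """
)

print(shapes_for_route(conn, "10"))  # ['S_FULL', 'S_SHORT']
```

Even with four trips the route-to-shape relation only shows up after going through trips, which is the indirection the rest of this post tries to remove.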

Finding the stops is even more complex. Here we need to join one more table, stop_times. It is also the biggest table in GTFS, so this is also the most computationally intensive query.

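-- Table names are assumed to follow the GTFS file names.
SELECT
  trips.shape_id, stop_times.stop_id
FROM routes
JOIN trips ON trips.route_id = routes.route_id
JOIN stop_times ON stop_times.trip_id = trips.trip_id
JOIN stops ON stops.stop_id = stop_times.stop_id
GROUP BY trips.shape_id, stop_times.stop_id;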
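The query above returns the stops of each shape but not their order. One possible way to get the sequential order is to also carry stop_sequence from stop_times, sketched below with Python's sqlite3 module. It assumes that all trips sharing a shape_id serve the same stops with comparable stop_sequence values, which GTFS does not guarantee, and that no trip visits the same stop twice. The feed data and the helper name stops_by_shape are made up for the example.

```python
import sqlite3

# Order the stops of each shape by the smallest stop_sequence seen for them.
ORDERED_STOPS_SQL = """
SELECT trips.shape_id,
       stop_times.stop_id,
       MIN(stop_times.stop_sequence) AS first_seq
FROM trips
JOIN stop_times ON stop_times.trip_id = trips.trip_id
WHERE trips.route_id = ?
GROUP BY trips.shape_id, stop_times.stop_id
ORDER BY trips.shape_id, first_seq
"""

def stops_by_shape(conn, route_id):
    """Map each shape_id of a route to its stop_ids in travel order."""
    ordered = {}
    for shape_id, stop_id, _seq in conn.execute(ORDERED_STOPS_SQL, (route_id,)):
        ordered.setdefault(shape_id, []).append(stop_id)
    return ordered

# Made-up feed: two full-length trips and one short-turn trip on route "10".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
    INSERT INTO trips VALUES
        ('t1', '10', 'S_FULL'), ('t2', '10', 'S_FULL'), ('t3', '10', 'S_SHORT');
    INSERT INTO stop_times VALUES
        ('t1', 'A', 1), ('t1', 'B', 2), ('t1', 'C', 3), ('t1', 'D', 4),
        ('t2', 'A', 1), ('t2', 'B', 2), ('t2', 'C', 3), ('t2', 'D', 4),
        ('t3', 'A', 1), ('t3', 'B', 2);
    """
)

print(stops_by_shape(conn, "10"))
# {'S_FULL': ['A', 'B', 'C', 'D'], 'S_SHORT': ['A', 'B']}
```

Every stop_times row of the route is scanned to produce a handful of stops, which is why this is the expensive query.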

Still, most people have a clear concept of what a transit line is and where it runs. It shouldn’t be such a pain to compute. A more useful structure would look like the one below.

    GTFS             More Useful
  Structure           Structure

    route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here I shift the terminology a bit. The top-level entity is a line (i.e. GTFS’ route). This is the service that people know of, like a numbered bus line or a metro line. Below that are routes. These are the collection of alternative routes a line may run. The routes are not explicitly represented in GTFS. You can find them by querying all unique shape_id values using the first SQL. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saving. There is one assumption my structure makes though: different lines do not share the same route. It should be a reasonable assumption. And if two lines do indeed share a route and shape, we should just replicate them as two separate entities.
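The pre-computation could be sketched roughly as follows, again with Python's sqlite3 module. The table name line_route (standing in for the route* table in the diagram), the column names, and the choice of one sample trip per shape are all assumptions made for this example, not part of GTFS. It also assumes every trip of a shape serves the same stops in the same order.

```python
import sqlite3

BUILD_SQL = """
-- One row per (line, route): a "route" is one distinct shape of a line.
-- Keying by line_id as well replicates a shape that two lines happen to share.
CREATE TABLE line_route AS
SELECT trips.route_id     AS line_id,
       trips.shape_id     AS route_id,
       MIN(trips.trip_id) AS sample_trip_id
FROM trips
GROUP BY trips.route_id, trips.shape_id;

-- Copy the stop list of the sample trip; this is the only use of stop_times.
CREATE TABLE route_stops AS
SELECT lr.line_id, lr.route_id, st.stop_sequence, st.stop_id
FROM line_route AS lr
JOIN stop_times AS st ON st.trip_id = lr.sample_trip_id;
"""

def stops_on_route(conn, line_id, route_id):
    """Return the stop_ids of one route of a line, in travel order."""
    rows = conn.execute(
        "SELECT stop_id FROM route_stops "
        "WHERE line_id = ? AND route_id = ? ORDER BY stop_sequence",
        (line_id, route_id),
    )
    return [row[0] for row in rows]

# Made-up feed: line "10" has a full-length route and a short-turn route.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips (trip_id TEXT PRIMARY KEY, route_id TEXT, shape_id TEXT);
    CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);
    INSERT INTO trips VALUES
        ('t1', '10', 'S_FULL'), ('t2', '10', 'S_FULL'), ('t3', '10', 'S_SHORT');
    INSERT INTO stop_times VALUES
        ('t1', 'A', 1), ('t1', 'B', 2), ('t1', 'C', 3), ('t1', 'D', 4),
        ('t2', 'A', 1), ('t2', 'B', 2), ('t2', 'C', 3), ('t2', 'D', 4),
        ('t3', 'A', 1), ('t3', 'B', 2);
    """
)
conn.executescript(BUILD_SQL)

# Once the tables are built, stop_times is no longer needed for this lookup.
conn.execute("DROP TABLE stop_times")

print(stops_on_route(conn, "10", "S_FULL"))   # ['A', 'B', 'C', 'D']
print(stops_on_route(conn, "10", "S_SHORT"))  # ['A', 'B']
```

Dropping stop_times in the sketch is only there to show that the lookup no longer touches it. An application that needs scheduled times would of course keep the table.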

The original GTFS structure seems to take a transit-operator-centric view. It allows operators maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. Adding the route and route_stops tables as indicated would greatly facilitate querying and working with transit information.
