url:http://ontrakinfo.wordpress.com/2012/10/29/making-gtfs-query-more-convenient/

This says exactly what I have been thinking.

I have been spending a lot of time parsing GTFS data. On the surface it is just a set of simple CSV files. But extracting useful information from GTFS is often unexpectedly difficult. For example, finding the stops of a bus line in sequential order might sound like a basic thing to do, but it is actually non-trivial with GTFS.

One reason is that transit service is more complex than it seems. It might seem that a bus service just hits all the stops in sequence, but the actual service has a lot of variables. The schedule on weekends is often different from the schedule on weekdays, and so is the exact route covered. Sometimes a bus is scheduled to run a short route rather than covering the whole length. In more complex cases there can be branching, where there is a common main trunk and then the buses split to serve two or more alternative destinations.

This is why in GTFS one “route” may be associated with multiple “shapes”. To find out which shapes are associated with a route, we have to make a query like this:

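-- Assumes each GTFS file is loaded into a table named after it.
SELECT
  r.route_id, t.shape_id
FROM routes r
JOIN trips t ON t.route_id = r.route_id
JOIN shapes s ON s.shape_id = t.shape_id
GROUP BY r.route_id, t.shape_id;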
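
To try the query above, the CSV files first have to be loaded into a database. The following is a minimal sketch using Python's standard csv and sqlite3 modules. The helper name load_gtfs_table and the sample rows are made up for illustration, and every column is stored untyped. Because shape_id is already a column of trips, this version skips the join to shapes and uses DISTINCT instead of GROUP BY.

```python
import csv
import io
import sqlite3


def load_gtfs_table(conn, table, fileobj):
    """Load one GTFS CSV file into a table whose columns match the header row.

    Hypothetical helper: columns are created without types.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


# Made-up sample rows standing in for routes.txt and trips.txt.
routes_txt = io.StringIO("route_id,route_short_name\nR1,10\n")
trips_txt = io.StringIO(
    "route_id,service_id,trip_id,shape_id\n"
    "R1,WK,T1,full\n"
    "R1,WK,T2,full\n"
    "R1,SA,T3,short\n"
)

conn = sqlite3.connect(":memory:")
load_gtfs_table(conn, "routes", routes_txt)
load_gtfs_table(conn, "trips", trips_txt)

# Shapes used by one route: three trips collapse to two distinct shapes.
shapes = conn.execute(
    """
    SELECT DISTINCT t.shape_id
    FROM routes r
    JOIN trips t ON t.route_id = r.route_id
    WHERE r.route_id = ?
    ORDER BY t.shape_id
    """,
    ("R1",),
).fetchall()
print(shapes)
```

With a real feed, the in-memory strings would be replaced by files opened with newline="" and, if the feed starts with a byte order mark, encoding="utf-8-sig".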

Finding the stops is even more complex. Here we need to join one more table, stop_times. It is also the biggest table in GTFS, so this is also the most computationally intensive query.

SELECT
  t.shape_id, st.stop_id
FROM routes r
JOIN trips t ON t.route_id = r.route_id
JOIN stop_times st ON st.trip_id = t.trip_id
JOIN stops s ON s.stop_id = st.stop_id
GROUP BY t.shape_id, st.stop_id;
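
The query above returns the stops of each shape but not their order. One possible way to get them in sequential order is to carry stop_sequence through the grouping, as in the sketch below. It assumes that all trips sharing a shape number their stops consistently, which a real feed does not guarantee. The function name stops_by_shape and the sample rows are made up for illustration.

```python
import sqlite3

# Tiny in-memory stand-in for a GTFS feed; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

-- Two full-length trips and one short-turn trip on the same route.
INSERT INTO trips VALUES
  ('T1', 'R1', 'full'), ('T2', 'R1', 'full'), ('T3', 'R1', 'short');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3),
  ('T3', 'A', 1), ('T3', 'B', 2);
""")


def stops_by_shape(conn, route_id):
    """Return {shape_id: [stop_id, ...]} with stops ordered by stop_sequence.

    Assumption: trips that share a shape use the same stop_sequence numbers.
    """
    rows = conn.execute(
        """
        SELECT t.shape_id, st.stop_id, MIN(st.stop_sequence) AS seq
        FROM trips t
        JOIN stop_times st ON st.trip_id = t.trip_id
        WHERE t.route_id = ?
        GROUP BY t.shape_id, st.stop_id
        ORDER BY t.shape_id, seq
        """,
        (route_id,),
    ).fetchall()
    result = {}
    for shape_id, stop_id, _seq in rows:
        result.setdefault(shape_id, []).append(stop_id)
    return result


print(stops_by_shape(conn, "R1"))
```

The joins to routes and stops are left out here because route_id and stop_id are already available on trips and stop_times. They would only be needed to pull in names or coordinates.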

Still, most people have a clear concept of what a transit line is and where it runs. It shouldn’t be such a pain to compute. A more useful structure would look like the one below.

    GTFS             More Useful
  Structure           Structure

    route              line
     |                   |
     |                   V
     |                 route*
     |                   | \
     |    shape          |  +-> route_shape
     |     ^             |  |
     |    /              |  +-> route_stops*
     |   /               |
     V  /                V
    trips              trips
     |                   |
     |        stops      |          stops
     |        ^          |
     |       /           |
     V      /            V
    stop_times         stop_times

Here I shift the terminology a bit. The top-level entity is a line (i.e. a GTFS route). This is the service that people know, like a numbered bus line or a metro line. Below that are routes: the collection of alternative routes a line may run. Routes are not explicitly represented in GTFS; you can find them by querying all unique shape_id values with the first SQL query. Another missing piece is the stops. If we pre-compute all the route_stops once using the second SQL query, for the most part we don’t need the giant stop_times table. For applications that do not deal with scheduled times, this is a huge saving. There is one assumption my structure makes, though: different lines do not share the same route. It should be a reasonable assumption. If lines do in fact share a route and shape, we should just replicate them as two separate entities.
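
The pre-computation could be sketched as follows, again with Python's sqlite3 module and made-up sample rows. The table name line_route (standing in for the route* table in the diagram), the column names, and the helper functions are assumptions for illustration, not part of GTFS. Each alternative route is identified by its shape_id, and stop order relies on trips of the same shape numbering their stops consistently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (trip_id TEXT, route_id TEXT, shape_id TEXT);
CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, stop_sequence INTEGER);

-- Made-up sample data: one line with a full route and a short-turn route.
INSERT INTO trips VALUES
  ('T1', 'R1', 'full'), ('T2', 'R1', 'full'), ('T3', 'R1', 'short');
INSERT INTO stop_times VALUES
  ('T1', 'A', 1), ('T1', 'B', 2), ('T1', 'C', 3),
  ('T2', 'A', 1), ('T2', 'B', 2), ('T2', 'C', 3),
  ('T3', 'A', 1), ('T3', 'B', 2);
""")


def build_route_tables(conn):
    """Derive the line-level lookup tables once from the raw GTFS tables."""
    conn.executescript("""
    -- One row per (line, alternative route); a GTFS route_id is the line.
    CREATE TABLE line_route AS
    SELECT DISTINCT route_id AS line_id, shape_id
    FROM trips;

    -- One row per stop on each alternative route, with its position.
    CREATE TABLE route_stops AS
    SELECT t.shape_id, st.stop_id, MIN(st.stop_sequence) AS stop_sequence
    FROM trips t
    JOIN stop_times st ON st.trip_id = t.trip_id
    GROUP BY t.shape_id, st.stop_id;
    """)


def stops_on_line(conn, line_id):
    """List (shape_id, stop_id) pairs for a line using only derived tables."""
    return conn.execute(
        """
        SELECT lr.shape_id, rs.stop_id
        FROM line_route lr
        JOIN route_stops rs ON rs.shape_id = lr.shape_id
        WHERE lr.line_id = ?
        ORDER BY lr.shape_id, rs.stop_sequence
        """,
        (line_id,),
    ).fetchall()


build_route_tables(conn)
# The derived tables no longer depend on stop_times at query time.
conn.execute("DROP TABLE stop_times")
print(stops_on_line(conn, "R1"))
```

Dropping stop_times here is only to demonstrate that the lookup no longer touches it. An application that also needs schedules would keep the table and simply avoid it for route and stop queries.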

The original GTFS structure seems to take a transit-operator-centric view. It gives operators maximum flexibility to author and publish their service data. But for application developers, it is not structured for easy traversal. Adding the route and route_stops tables as indicated would greatly simplify querying and working with transit information.
