Motivation

[I already owe so many unfinished notes that one more will not hurt, so here is another direction.]

Data plays a core role in most business systems. Storing and retrieving data seems routine to application developers, and even to managers, but connecting or linking data to uncover more interesting patterns (more technically, data mining) is still hard. Many data processing and analysis frameworks focus on handling very large data volumes; this note instead records the data structuring problems that arise in data mining, with particular attention to the representation, storage, and navigation of relationships in data models.

Audience

Myself, and anyone else who is interested in Semantic Web techniques.

Scope

support a descriptive running example

linked data concepts

linked data application based on semantic web techniques

explore how Neo4j could be used with Semantic Web techniques or Linked Data

Related Notes
Apache Jena Fuseki notes

Progress

2015/06/26 initial plan
2015/06/27 1-5: introduction, FOAF, SPARQL, etc: need review
2015/06/28 6-7: RDFa and RDF storage
2015/06/29 Related Notes - Fuseki

Content

1 introducing Linked Data

5-star scoring system of Linked Data: P.4

The DBpedia project (http://dbpedia.org) extracts structured data from Wikipedia
articles and publishes it on the Web.

Linked Data has one amazing property: it may be easily combined with other
Linked Data to form new knowledge.
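
A minimal sketch of why combining is easy, assuming triples are modelled as plain Python tuples rather than with an RDF library. The `FAN_OF` predicate and the second source's data are made up for illustration; only the DBpedia resource URI and the FOAF `name` property are real identifiers.

```python
# Triples are modelled as (subject, predicate, object) tuples; this is a
# simplification for illustration, not a real RDF library.
OBAMA = "http://dbpedia.org/resource/Barack_Obama"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FAN_OF = "http://example.org/vocab/fanOf"  # hypothetical predicate

# Two independent sources that happen to use the same URI for the same thing.
source_a = {(OBAMA, FOAF_NAME, "Barack Obama")}
source_b = {(OBAMA, FAN_OF, "http://example.org/film/StarWars")}  # made-up data

# Merging two graphs is a set union: shared URIs join the data automatically.
merged = source_a | source_b


def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, OBAMA):
    print(predicate, "->", obj)
```

Because both sources name the subject with the same URI, no key mapping or schema alignment is needed before the merge.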

Another useful feature of Linked Data is that it’s self-documenting.

Linked Data is no silver bullet. It won’t protect you from issues of data quality or
from service failures.

Linked Data principles

  • Use URIs as names for things.
  • Use HTTP URIs so that people can look up those names.
  • When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  • Include links to other URIs, so people can discover more things.

see more in Tim Berners-Lee's thoughts on Linked Data principles

The Linking Open Data(LOD) project

The LOD project is a community activity started in 2007 by the W3C’s Semantic Web Education and Outreach (SWEO) Interest Group. The project’s stated goal is to “make data freely available to everyone.”


2 RDF

commonly used RDF prefixes: P.40

RDF formats

  • Turtle: human-readable format
  • RDF/XML: original RDF format in XML
  • RDFa: RDF embedded in HTML attributes
  • JSON-LD: a newer format aimed at web developers

JSON-LD ref1
JSON-LD ref2

RDF in the web
Media types:

RDF Format       Preferred Content-Type    Alternative Content-Type
RDF Turtle file  text/turtle
RDF/XML file     application/rdf+xml
RDFa             text/html
JSON-LD file     application/ld+json       application/json
OWL file         application/owl+xml       application/rdf+xml
N-Triples        application/n-triples     text/plain
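
A client can use these media types to ask a server for a particular serialization through the HTTP Accept header. The sketch below only builds the request and does not send it; the dictionary keys and the function name `build_rdf_request` are my own illustrative choices, and the content types are the preferred ones from the table above.

```python
from urllib.request import Request

# Preferred content types, taken from the table above.
# The short keys are hypothetical names chosen for this sketch.
PREFERRED_CONTENT_TYPE = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
    "rdfa": "text/html",
    "jsonld": "application/ld+json",
}


def build_rdf_request(uri, rdf_format="turtle"):
    """Build (but do not send) an HTTP request asking for one RDF format."""
    return Request(uri, headers={"Accept": PREFERRED_CONTENT_TYPE[rdf_format]})


request = build_rdf_request("http://dbpedia.org/resource/Star_Wars", "turtle")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; whether a given server honours the Accept header depends on how it is configured.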

file types and web server

publishing RDF content using Apache HTTP servers:
swbp-vocab-pub

platforms

Linked Data platforms or Semantic Web products include, for example, Callimachus. See semanticweb tool and sw wiki tool for more products.


3 consuming Linked Data

3.1 thinking the Web way

In using structured data, you’re enabling machine readability and indexing of this data.
In interlinking published data on the Web, you’re enabling reuse of your information.

3.2 find Linked Data on the web

a Question: is President Barack Obama a Star Wars fan?

3.3 retrieving Linked Data from web pages

tools for finding distributed Linked Data

  • Sindice: the semantic web index
  • SameAs.org: identifies URIs equivalent to the Linked Data URI entered and provides an entry point to perform a Sindice search on a general search term
  • Data Hub: a community-run catalog of useful sets of Linked Data on the Web

3.4 combine Linked Data from multiple sources

from known datasets

Product DB aims to be the world’s most comprehensive and open source of product data.

Its data comes from sources including ProductWiki, MusicBrainz, DBpedia, Freebase, and OpenLibrary, and is also gathered from search-engine crawls of sites that publish GoodRelations RDFa or Open Graph protocol data in their pages, for example BestBuy, IMDb, and Spotify.

from web pages using browser plug-ins

Mozilla add-ons: RDFa Developer

You can use the outcome of SameAs.org to help identify a canonical URL for a given item. A canonical URL is the best URL among available choices.

3.5 display Linked Data in HTML

Using Python to crawl the Linked Data Web

example: use the Python scripting language, RDFLib, and html5lib to access the RDFa data available from Best Buy for a sample product, the Darth Vader Alarm Clock Radio.

install python modules:

$ sudo pip install rdflib
$ sudo pip install html5lib

core code snippet:

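import rdflib
import html5lib  # not called directly; imported to confirm the HTML parser is installed

# Load the Product DB description of the product (Turtle).
graph = rdflib.Graph()
result = graph.parse('http://productdb.org/gtin/00681326152002.ttl', format='turtle')

# Load the RDFa embedded in the Best Buy product page.
bestBuyGraph = rdflib.Graph()
bestBuyResult = bestBuyGraph.parse('http://purl.org/net/BestBuyDarthVaderClock', format='rdfa')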

4 FOAF

FOAF vocabulary

FOAF profile generator


5 SPARQL

5.1 SPARQL syntax

Each SPARQL SELECT query is organized as follows:

  1. PREFIX (Namespace prefixes.)
  2. SELECT (Define what you wish to retrieve.)
  3. FROM (Specify the dataset from which to draw the results.)
  4. WHERE {
    (Describe the criteria on which to base the selection. This description is in the form of a query triple pattern.)
    }
  5. ORDER BY, LIMIT, and the like (Modifiers that affect the desired result.)
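
The ordering of the clauses can be sketched as a small string builder. This is only an illustration of the layout listed above: `build_select_query` and its parameters are hypothetical names, and the function does no validation or escaping of its inputs.

```python
def build_select_query(prefixes, variables, where_patterns,
                       from_graph=None, order_by=None, limit=None):
    """Assemble a SPARQL SELECT query string in the order listed above."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    lines.append("SELECT " + " ".join(variables))
    if from_graph:
        lines.append(f"FROM <{from_graph}>")
    lines.append("WHERE {")
    lines.extend(f"    {pattern} ." for pattern in where_patterns)
    lines.append("}")
    if order_by:
        lines.append(f"ORDER BY {order_by}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_select_query(
    prefixes={"foaf": "http://xmlns.com/foaf/0.1/"},
    variables=["?name"],
    where_patterns=["?person foaf:name ?name"],
    order_by="?name",
    limit=10,
)
print(query)
```

The FROM clause is optional here because many endpoints query a default dataset when it is omitted.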

types of SPARQL queries

5.2 SPARQL endpoint

DBpedia: an online playground

sample queries:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>

select ?location
where {
    ?person rdfs:label "George Washington"@en.
    ?location dbprop:namedFor ?person
}
LIMIT 100
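
A query like this can also be sent to an endpoint over HTTP. The sketch below assumes the endpoint accepts the query as a `query` URL parameter on a GET request, as the SPARQL Protocol describes, and uses DBpedia's public endpoint address; it builds the request without sending it, and `build_sparql_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # DBpedia's public endpoint

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbprop: <http://dbpedia.org/property/>

SELECT ?location
WHERE {
    ?person rdfs:label "George Washington"@en .
    ?location dbprop:namedFor ?person
}
LIMIT 100
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON results format rather than the XML default.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url[:60] + "...")
```

Passing the request to `urllib.request.urlopen` would run the query against the live endpoint.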

Query using Apache Jena ARQ:

JENA_HOME> ./bin/arq --query ./bin/query/location.rq
JENA_HOME> ./bin/arq --query ./bin/query/location.rq --results JSON
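
JSON output is convenient to post-process in a script. The sketch below assumes the output follows the W3C SPARQL Query Results JSON layout (`head.vars` plus `results.bindings`); the sample document and its binding value are made up for illustration, and `binding_values` is a hypothetical helper.

```python
import json

# Illustrative result document in the SPARQL JSON results layout;
# the binding value is made up for the example.
SAMPLE_RESULTS = """
{
  "head": {"vars": ["location"]},
  "results": {
    "bindings": [
      {"location": {"type": "uri",
                    "value": "http://dbpedia.org/resource/Washington,_D.C."}}
    ]
  }
}
"""


def binding_values(results_json, variable):
    """Return the values bound to one variable, in result order."""
    document = json.loads(results_json)
    return [row[variable]["value"]
            for row in document["results"]["bindings"]
            if variable in row]


for location in binding_values(SAMPLE_RESULTS, "location"):
    print(location)
```

Rows that leave the variable unbound are skipped, which can happen when a query uses OPTIONAL patterns.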

For a second, more thorough pass at SPARQL, consult reference [1].

Side findings:

Online map generator: Google Static Maps


6 Enhance results for search engines

purpose: provide semantic meaning to web content and enable the extraction of Linked Data. This enables your website to be both machine- and human-readable.

6.1 enhancing HTML by embedding RDFa

RDF in Attributes(RDFa) is a language that allows you to express RDF data within an HTML document.

Specifications:

RDFa 1.1
HTML+RDFa 1.1, Support for RDFa in HTML4 and HTML5

Tool: RDFa 1.1 Distiller and Parser

HTML5 and RDFa support

<!DOCTYPE HTML>
<html version="HTML+RDFa 1.1" lang="en">
<body id="me"
    prefix="
    rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs: http://www.w3.org/2000/01/rdf-schema#
    xsd: http://www.w3.org/2001/XMLSchema#
    dc: http://purl.org/dc/elements/1.1/
    foaf: http://xmlns.com/foaf/0.1/
    rel: http://purl.org/vocab/relationship/
    stars: http://www.starwars.com/explore/encyclopedia/characters/">
    <!-- page content marked up with RDFa attributes goes here -->
</body>
</html>

RDFa attributes in HTML5 tags:

  • vocab
  • resource
  • about
  • datatype
  • typeof
  • prefix
  • rel
  • inlist
  • property
  • content
  • rev
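
To see which of these attributes a page actually uses, a small scan with Python's standard `html.parser` is enough. This is only an attribute inventory, not an RDFa processor: it does not resolve prefixes or produce triples. The class name and the sample markup are illustrative.

```python
from html.parser import HTMLParser

RDFA_ATTRIBUTES = {"vocab", "resource", "about", "datatype", "typeof",
                   "prefix", "rel", "inlist", "property", "content", "rev"}


class RdfaAttributeScanner(HTMLParser):
    """Collect RDFa attributes per tag; this does not build RDF triples."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, {attribute: value})

    def handle_starttag(self, tag, attrs):
        rdfa = {name: value for name, value in attrs if name in RDFA_ATTRIBUTES}
        if rdfa:
            self.found.append((tag, rdfa))


# Made-up sample markup using the FOAF vocabulary.
SAMPLE_HTML = """
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person" about="#me">
  <span property="name">Anakin Skywalker</span>
  <a rel="knows" href="#obiwan">Obi-Wan</a>
</div>
"""

scanner = RdfaAttributeScanner()
scanner.feed(SAMPLE_HTML)
for tag, attributes in scanner.found:
    print(tag, attributes)
```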

6.2 embedding RDFa with a supporting official RDF vocabulary

[1] GoodRelations

GoodRelations is the most widely used RDF vocabulary for e-commerce. It enables you
to publish details of your products and services in a way that search engines, mobile
applications, and browser extensions can utilize the information and improve your click-through rates.

GoodRelations website
GoodRelations' concept model wiki

[2] schema.org

Schema.org is a collaborative initiative by three major search engines: Yahoo!, Bing, and Google.
Its purpose is to create and support a common set of schema for structured data markup on web pages
and to provide a common means for webmasters to mark up their pages so that the search results
are improved and human users have a more satisfying experience.

schema.org specification

6.3 extracting RDFa from HTML and applying SPARQL queries

RDF extracted from RDFa-enhanced HTML files can be queried using SPARQL.

TODO: document a programmatic procedure that does not rely on the RDFa 1.1 Distiller and Parser.


7 RDF datasets

RDF dataset in W3C technology stack

7.1 classification of RDF DB system

RDF abstract view:

RDF is a generic, graph-based data model that represents data in the form of triples. Each triple is a record of three values (subject, predicate, object), taking the form (URI, URI, URI) or (URI, URI, value).
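
The two triple shapes can be made concrete by writing them out in N-Triples syntax. This is a minimal sketch under stated assumptions: objects are classified as URIs purely by their `http://` or `https://` prefix, literals are emitted as plain strings without datatype or language tag, and the `example.org` URIs are made up.

```python
def is_uri(value):
    """Treat strings starting with http:// or https:// as URIs (a simplification)."""
    return value.startswith(("http://", "https://"))


def to_ntriples_term(value):
    """Render a URI as <...> and anything else as a quoted plain literal."""
    if is_uri(value):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"<{s}> <{p}> {to_ntriples_term(o)} ." for s, p, o in triples
    )


triples = [
    # (URI, URI, URI)
    ("http://example.org/people/luke",
     "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/people/leia"),
    # (URI, URI, value)
    ("http://example.org/people/luke",
     "http://xmlns.com/foaf/0.1/name",
     "Luke Skywalker"),
]
print(to_ntriples(triples))
```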

RDF stores implemented on relational databases:

Category                          Description
Vertical (triple) table stores    Each RDF triple is stored directly in a three-column table (subject, predicate, object).
Property (n-ary) table stores     Multiple RDF properties are modeled as n-ary table columns for a single subject.
Horizontal (binary) table stores  RDF triples are modeled as one horizontal table or as a set of vertically partitioned binary tables where each individual table represents an RDF property.
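
The three layouts can be tried out with Python's built-in `sqlite3` module. This is a sketch for comparing the table shapes, not a description of how any particular triplestore is implemented; the table names and the abbreviated `ex:` and `foaf:` identifiers are illustrative.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# Vertical (triple) table: one row per RDF triple.
connection.execute(
    "CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)"
)
connection.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:luke", "foaf:name", "Luke Skywalker"),
        ("ex:luke", "foaf:knows", "ex:leia"),
        ("ex:leia", "foaf:name", "Leia Organa"),
    ],
)

# Property (n-ary) table: several properties of one subject become columns.
connection.execute(
    """
    CREATE TABLE person AS
    SELECT n.subject AS subject, n.object AS name, k.object AS knows
    FROM triples AS n
    LEFT JOIN triples AS k
           ON k.subject = n.subject AND k.predicate = 'foaf:knows'
    WHERE n.predicate = 'foaf:name'
    """
)

# Vertically partitioned binary table: one two-column table per property.
connection.execute(
    """
    CREATE TABLE foaf_name AS
    SELECT subject, object FROM triples WHERE predicate = 'foaf:name'
    """
)

person_rows = connection.execute(
    "SELECT subject, name, knows FROM person ORDER BY subject"
).fetchall()
print(person_rows)
```

The property table leaves `knows` empty for subjects that lack the property, which illustrates the sparsity trade-off of the n-ary layout compared with the triple table.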

Commonly used triplestores:

7.2 battle between RDF storage and RDBMS

omitted :-)

7.3 convert anything to RDF

W3C's index of Converter to RDF tools

application

description:
integrate data from CSV/XML returned by a web service and from local files

tools:

  • python
  • Fuseki
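
The Python side of this application could start with a conversion like the one sketched below, which turns CSV rows into triples. Everything specific here is an assumption for illustration: the `example.org` base URI and vocabulary, the column names, and the sample rows. An in-memory string stands in for the web-service response or local file.

```python
import csv
import io

BASE = "http://example.org/product/"   # hypothetical base URI for subjects
VOCAB = "http://example.org/vocab/"    # hypothetical vocabulary for predicates

# Stand-in for a CSV file or web-service response; the rows are made up.
SAMPLE_CSV = """id,name,price
1,Alarm Clock Radio,29.99
2,Desk Lamp,15.50
"""


def csv_to_triples(text, key_column="id"):
    """Turn each non-key CSV cell into a (subject, predicate, object) tuple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, VOCAB + column, value))
    return triples


for triple in csv_to_triples(SAMPLE_CSV):
    print(triple)
```

The resulting tuples would still need to be serialized to an RDF format before they can be loaded into a store.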

References

[1] Hebeler J., Fisher M., et al. Web 3.0 and Semantic Web Programming (Web 3.0与Semantic Web编程) [M]. Beijing: Tsinghua University Press, 2010.

[2] Wood D., Zaidman M., Ruth L., et al. Linked Data: Structured Data on the Web [M]. Manning Publications Co., 2014.
