1 Introduction

TDB is a component of Apache Jena for RDF storage and query.

official guarantees and limitations

TDB supports the full range of Jena APIs
TDB can be used as a high-performance RDF store on a single machine
TDB can be accessed and managed with command line scripts and the Java API
A TDB dataset can be protected against corruption by using transactions
TDB supports the Serializable transaction isolation level
TDB supports ACID transactions through write-ahead logging
A TDB dataset should only be directly accessed from a single JVM at a time, otherwise data corruption may occur. From 1.1.0 onwards, TDB includes automatic protection against multi-JVM usage, which prevents this under most circumstances.
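
The write-ahead-logging idea behind the ACID guarantee can be pictured with a toy model. This is a minimal sketch and not TDB's implementation: the class ToyWalStore and all of its methods are hypothetical names invented for illustration, and triples are plain strings. Changes go to a journal first and reach the main store only on commit; an abort simply drops the journal.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of write-ahead logging; NOT TDB's actual implementation.
// All names here are hypothetical.
class ToyWalStore {
    private final Set<String> mainStore = new HashSet<>();   // stands in for the database files
    private final List<String> journal = new ArrayList<>();  // stands in for the write-ahead log
    private boolean inWrite = false;

    void begin() {
        if (inWrite) {
            throw new IllegalStateException("nested transactions are not supported");
        }
        inWrite = true;
    }

    // Record the change in the journal only; the main store is untouched.
    void add(String triple) {
        requireWrite();
        journal.add(triple);
    }

    // Commit: replay the journal into the main store, then clear it.
    void commit() {
        requireWrite();
        mainStore.addAll(journal);
        journal.clear();
        inWrite = false;
    }

    // Abort: discard the journal; the main store never saw the changes.
    void abort() {
        requireWrite();
        journal.clear();
        inWrite = false;
    }

    boolean contains(String triple) {
        return mainStore.contains(triple);
    }

    int size() {
        return mainStore.size();
    }

    private void requireWrite() {
        if (!inWrite) {
            throw new IllegalStateException("no active write transaction");
        }
    }

    public static void main(String[] args) {
        ToyWalStore store = new ToyWalStore();
        store.begin();
        store.add("<s> <p> <o1>");
        store.abort();                                        // o1 is dropped
        store.begin();
        store.add("<s> <p> <o2>");
        store.commit();                                       // o2 is applied
        System.out.println(store.contains("<s> <p> <o1>"));   // false
        System.out.println(store.contains("<s> <p> <o2>"));   // true
    }
}
```

Because the main store only changes inside commit(), an interrupted or aborted transaction leaves it exactly as it was, which is the property the corruption-protection item relies on.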

2 TDB Java API

The application obtains a model or an RDF dataset from TDB, then navigates it or uses it like any other model or dataset.

construct a model or dataset

2 ways to specify the data source:

directory

see Jena TDB API without Assembler

assembler file

for the assembler syntax and semantics, see 5 assembler: a DSL

bulkloader

load data into an empty dataset quickly, using the command line utility tdbloader

concurrency support

Use transactions throughout: concurrent access is guaranteed to be safe when every access happens inside a transaction.

cache

TDB supports caching at several levels, from RDF terms to disk blocks.
When not using transactions, call the synchronization method explicitly so that cached changes are written to disk:

Dataset dataset = ...  ;
TDB.sync(dataset) ;
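
Why an explicit sync matters can be shown with a generic write-back cache. This is only a sketch of the general idea and says nothing about TDB internals: ToyWriteBackCache and its methods are hypothetical names, and an in-memory map stands in for the disk.

```java
import java.util.HashMap;
import java.util.Map;

// Toy write-back cache; hypothetical names, NOT TDB internals.
class ToyWriteBackCache {
    private final Map<String, String> disk = new HashMap<>();   // stands in for persistent storage
    private final Map<String, String> dirty = new HashMap<>();  // cached writes, not yet persisted

    // Writes land in the cache only.
    void put(String key, String value) {
        dirty.put(key, value);
    }

    // Reads see cached writes first, then persisted data.
    String get(String key) {
        return dirty.containsKey(key) ? dirty.get(key) : disk.get(key);
    }

    // Plays the role of an explicit sync call: push cached writes to storage.
    void sync() {
        disk.putAll(dirty);
        dirty.clear();
    }

    boolean isPersisted(String key) {
        return disk.containsKey(key);
    }

    public static void main(String[] args) {
        ToyWriteBackCache cache = new ToyWriteBackCache();
        cache.put("k", "v");
        System.out.println(cache.isPersisted("k")); // false
        cache.sync();
        System.out.println(cache.isPersisted("k")); // true
    }
}
```

Until sync() runs, the write is readable through the cache but would be lost if the process stopped, which is the situation the explicit call above is meant to avoid.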

3 command line scripts

The scripts are located in JENA_HOME/bin.

datasource specification

// assembler file
--desc=assembler.ttl
--tdb=assembler.ttl

// directory
--loc=DIRECTORY
--location=DIRECTORY

If no data source is specified, --desc=tdb.ttl is assumed.
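
The defaulting rule can be sketched as a small helper. This is a hypothetical illustration only: DataSourceArgs and resolve are invented names and are not part of Jena. The sketch just mirrors the four option spellings and the default listed above.

```java
import java.util.List;

// Hypothetical helper, not part of Jena: picks the data source argument
// from a command line, falling back to --desc=tdb.ttl when none is given.
class DataSourceArgs {
    private static final List<String> PREFIXES =
            List.of("--desc=", "--tdb=", "--loc=", "--location=");

    // Returns the first argument that names a data source, or the default.
    static String resolve(List<String> args) {
        for (String arg : args) {
            for (String prefix : PREFIXES) {
                if (arg.startsWith(prefix)) {
                    return arg;
                }
            }
        }
        return "--desc=tdb.ttl"; // default when nothing is specified
    }

    public static void main(String[] args) {
        System.out.println(resolve(List.of("--time")));             // --desc=tdb.ttl
        System.out.println(resolve(List.of("--loc=DB", "--time"))); // --loc=DB
    }
}
```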

commands

tdbloader

Bulk loader and index builder.

tdbloader2

Bulk loader and index builder. Faster than tdbloader, but it only works on Linux and Mac OS X since it relies on some Unix system utilities.

Restriction: this bulk loader can only be used to create a database. It may overwrite existing data. It only accepts the --loc argument and a list of files to load.

tdbquery

Invoke a SPARQL query on a store.
Use --time for timing information.

tdbdump

Dump the store in N-Quads format.

tdbstats

Produce statistics for the dataset.

//TODO cmd tool usage

4 transaction

detailed limitations

Bulk loads: the TDB bulk loader is not transactional
Nested transactions are not supported.
Some active transaction state is held exclusively in-memory, limiting scalability.
Long-running read-transactions cause a build-up of pending changes.

API for transaction

read transaction

// use a directory to specify the data source
Dataset dataset = TDBFactory.createDataset(directoryPathStr);
dataset.begin(ReadWrite.READ);
try {
    ...
} finally {
    // always end a read transaction; it makes no changes, so there is nothing to commit
    dataset.end();
}

write transaction

dataset.begin(ReadWrite.WRITE) ;
try {
  ...
  dataset.commit();
} finally {
  dataset.end();
}

see Jena TDB 101 Java API without Assembler for a running example.

transaction with concurrency

2 methods:

shareable Dataset, sequential transaction behaviour
private Dataset, independent transaction

5 assembler: a DSL

see Jena TDB assembler

6 dataset and named graph

concept

An RDF dataset is composed of 1 unnamed default graph and 0+ named graphs. Here "graph" has the same meaning as in the SPARQL vocabulary.

storage

An RDF dataset uses its own operating system directory for storage.

The default graph is held as a single graph, while the named graphs are held in a set of indexes.

query

SPARQL queries over named graphs are fully supported for TDB-backed datasets.

2 special graph names

**urn:x-arq:UnionGraph**: the union of all named graphs in a dataset
**urn:x-arq:DefaultGraph**: the unnamed default graph in a dataset
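
What the two special names resolve to can be illustrated with a toy in-memory dataset. This is a sketch of the concept only, not of TDB's storage or of the Jena API: ToyDataset and its methods are hypothetical names, and triples are plain strings.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of an RDF dataset; NOT how TDB stores data.
// All names are hypothetical; triples are plain strings to keep it short.
class ToyDataset {
    static final String UNION_GRAPH = "urn:x-arq:UnionGraph";
    static final String DEFAULT_GRAPH = "urn:x-arq:DefaultGraph";

    private final Set<String> defaultGraph = new HashSet<>();
    private final Map<String, Set<String>> namedGraphs = new HashMap<>();

    void addToDefault(String triple) {
        defaultGraph.add(triple);
    }

    void addToNamed(String graphName, String triple) {
        namedGraphs.computeIfAbsent(graphName, k -> new HashSet<>()).add(triple);
    }

    // Resolve a graph name, handling the two special names first.
    Set<String> graph(String name) {
        Set<String> result = new HashSet<>();
        if (DEFAULT_GRAPH.equals(name)) {
            result.addAll(defaultGraph);
        } else if (UNION_GRAPH.equals(name)) {
            // Union of the named graphs only; the default graph is not included.
            for (Set<String> g : namedGraphs.values()) {
                result.addAll(g);
            }
        } else if (namedGraphs.containsKey(name)) {
            result.addAll(namedGraphs.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyDataset ds = new ToyDataset();
        ds.addToDefault("<d> <p> <o>");
        ds.addToNamed("http://example/g1", "<a> <p> <o>");
        ds.addToNamed("http://example/g2", "<b> <p> <o>");
        System.out.println(ds.graph(UNION_GRAPH).size());   // 2
        System.out.println(ds.graph(DEFAULT_GRAPH).size()); // 1
    }
}
```

A triple that appears in several named graphs shows up once in the union, since the union is a set.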

7 TDB configuration

TODO

8 TDB optimizer