SPARQL Lexer and Parser for RDF.rb

This is a Ruby implementation of a SPARQL parser for RDF.rb.

Features

Examples

require 'rubygems'
require 'sparql/grammar'

Executing a SPARQL query against a repository

queryable = RDF::Repository.load("http://usefulinc.com/ns/doap")
sse = SPARQL::Grammar.parse("SELECT * WHERE { ?s ?p ?o }")
sse.execute(queryable)

Parsing a SPARQL query string to SSE

sse = SPARQL::Grammar.parse("SELECT * WHERE { ?s ?p ?o }")
sse.to_sxp

Command line processing

sparql --default-graph http://usefulinc.com/ns/doap input.rq
sparql -e "SELECT * FROM <http://usefulinc.com/ns/doap> WHERE { ?s ?p ?o }"

sparql2sse input.rq
sparql2sse -e "SELECT * WHERE { ?s ?p ?o }"

Documentation

http://sparql.rubyforge.org/grammar/

Representation

The parser generates native SPARQL S-Expressions (SSE): a hierarchy of SPARQL::Algebra::Operator instances that can be executed against a queryable object, such as a Repository, in the same way as an RDF::Query.

Other elements within the hierarchy are generated using RDF objects, such as RDF::URI, RDF::Node, RDF::Literal, and RDF::Query.

See SPARQL::Grammar::Parser for a full listing of algebra operations and RDF objects generated by the parser.
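To make the idea of an executable operator hierarchy concrete, here is a minimal pure-Ruby sketch. The classes BGP, Project and Distinct are hypothetical stand-ins written for illustration. They are not the gem's SPARQL::Algebra::Operator or RDF::Query classes. Triples are plain arrays, IRIs are strings, and variables are symbols starting with "?". Each operator's execute method evaluates its operand and transforms the resulting solutions.

```ruby
# Hypothetical, simplified stand-ins for SPARQL::Algebra operators and
# RDF::Query; these are NOT the gem's classes. Triples are arrays of strings,
# variables are symbols starting with "?", and a solution is a Hash.

# Basic graph pattern: every triple pattern must match, sharing bindings.
class BGP
  def initialize(*patterns)
    @patterns = patterns
  end

  def execute(triples)
    @patterns.reduce([{}]) do |solutions, pattern|
      solutions.flat_map do |solution|
        triples.filter_map { |triple| match(pattern, triple, solution) }
      end
    end
  end

  private

  def variable?(term)
    term.is_a?(Symbol) && term.to_s.start_with?("?")
  end

  # Returns +solution+ extended so that +pattern+ matches +triple+, or nil.
  def match(pattern, triple, solution)
    bindings = solution.dup
    pattern.zip(triple).each do |term, value|
      if variable?(term)
        return nil if bindings.key?(term) && bindings[term] != value
        bindings[term] = value
      elsif term != value
        return nil
      end
    end
    bindings
  end
end

# Keeps only the listed variables in each solution of its operand.
class Project
  def initialize(variables, operand)
    @variables = variables
    @operand = operand
  end

  def execute(triples)
    @operand.execute(triples).map { |solution| solution.slice(*@variables) }
  end
end

# Removes duplicate solutions produced by its operand.
class Distinct
  def initialize(operand)
    @operand = operand
  end

  def execute(triples)
    @operand.execute(triples).uniq
  end
end

data = [
  ["<s1>", "<p>", "<o1>"],
  ["<s1>", "<p>", "<o2>"],
  ["<s2>", "<q>", "<o1>"]
]

# Roughly (distinct (project (?a) (bgp (triple ?a <p> ?c))))
query = Distinct.new(Project.new([:"?a"], BGP.new([:"?a", "<p>", :"?c"])))
solutions = query.execute(data) # one solution, binding ?a to "<s1>"
```

Nesting operators this way mirrors the shape of the SXP examples below: each outer operator only needs its operand to respond to execute.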

The native SSE representation may be serialized to SXP, a textual representation of SSE as general S-Expressions. The generated SXP closely follows that of OpenJena ARQ, which is intended principally for running the SPARQL rules. Additionally, SSE is generated for the CONSTRUCT, ASK, DESCRIBE and FROM operators.

SXP is generated by serializing the parser result as follows:

sse = SPARQL::Grammar.parse("SELECT * WHERE { ?s ?p ?o }")
sxp = sse.to_sxp
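Conceptually, the serialization step is a recursive walk that writes each operator as a parenthesized list. The sketch below is a hypothetical illustration, not the gem's to_sxp: it assumes the algebra tree has already been reduced to nested Ruby arrays of symbols and strings.

```ruby
# Hypothetical sketch: serialize a nested-array algebra tree to SXP text.
# Arrays become parenthesized lists; symbols and strings are written verbatim.
def to_sxp(node)
  case node
  when Array then "(" + node.map { |child| to_sxp(child) }.join(" ") + ")"
  else node.to_s
  end
end

tree = [:project, [:"?a", :"?b"], [:bgp, [:triple, :"?a", :"?b", :"?c"]]]
puts to_sxp(tree) # prints (project (?a ?b) (bgp (triple ?a ?b ?c)))
```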

The following examples illustrate SPARQL transformations:

SPARQL: SELECT * WHERE { ?a ?b ?c }

SSE: RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] }

SXP: (bgp (triple ?a ?b ?c))

SPARQL: SELECT * FROM <a> WHERE { ?a ?b ?c }

SSE: SPARQL::Algebra::Operator::Dataset.new( [RDF::URI("a")], RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] } )

SXP: (dataset (<a>) (bgp (triple ?a ?b ?c)))

SPARQL: SELECT * FROM NAMED <a> WHERE { ?a ?b ?c }

SSE: SPARQL::Algebra::Operator::Dataset.new( [[:named, RDF::URI("a")]], RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] } )

SXP: (dataset ((named <a>)) (bgp (triple ?a ?b ?c)))

SPARQL: SELECT DISTINCT * WHERE { ?a ?b ?c }

SSE: SPARQL::Algebra::Operator::Distinct.new( RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] } )

SXP: (distinct (bgp (triple ?a ?b ?c)))

SPARQL: SELECT ?a ?b WHERE { ?a ?b ?c }

SSE: SPARQL::Algebra::Operator::Project.new( [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b")], RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] } )

SXP: (project (?a ?b) (bgp (triple ?a ?b ?c)))

SPARQL: CONSTRUCT { ?a ?b ?c } WHERE { ?a ?b ?c FILTER (?a) }

SSE: SPARQL::Algebra::Operator::Construct.new( [RDF::Query::Pattern.new(RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c"))], SPARQL::Algebra::Operator::Filter.new( RDF::Query::Variable.new("a"), RDF::Query.new { pattern [RDF::Query::Variable.new("a"), RDF::Query::Variable.new("b"), RDF::Query::Variable.new("c")] } ) )

SXP: (construct ((triple ?a ?b ?c)) (filter ?a (bgp (triple ?a ?b ?c))))

SPARQL: SELECT * WHERE { <a> <b> <c> OPTIONAL { <d> <e> <f> } }

SSE: SPARQL::Algebra::Operator::LeftJoin.new( RDF::Query.new { pattern [RDF::URI("a"), RDF::URI("b"), RDF::URI("c")] }, RDF::Query.new { pattern [RDF::URI("d"), RDF::URI("e"), RDF::URI("f")] } )

SXP: (leftjoin (bgp (triple <a> <b> <c>)) (bgp (triple <d> <e> <f>)))

SPARQL: SELECT * WHERE { <a> <b> <c> { <d> <e> <f> } }

SSE: SPARQL::Algebra::Operator::Join.new( RDF::Query.new { pattern [RDF::URI("a"), RDF::URI("b"), RDF::URI("c")] }, RDF::Query.new { pattern [RDF::URI("d"), RDF::URI("e"), RDF::URI("f")] } )

SXP: (join (bgp (triple <a> <b> <c>)) (bgp (triple <d> <e> <f>)))

SPARQL: PREFIX : <http://example/>

SELECT *
{ 
   { ?s ?p ?o }
  UNION
   { GRAPH ?g { ?s ?p ?o } }
}

SSE: SPARQL::Algebra::Operator::Prefix.new( [[:":", RDF::URI("http://example/")]], SPARQL::Algebra::Operator::Union.new( RDF::Query.new { pattern [RDF::Query::Variable.new("s"), RDF::Query::Variable.new("p"), RDF::Query::Variable.new("o")] }, RDF::Query.new(:context => RDF::Query::Variable.new("g")) { pattern [RDF::Query::Variable.new("s"), RDF::Query::Variable.new("p"), RDF::Query::Variable.new("o")] } ) )

SXP: (prefix ((: <http://example/>)) (union (bgp (triple ?s ?p ?o)) (graph ?g (bgp (triple ?s ?p ?o)))))
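The join and leftjoin operators in the examples above combine the solutions of their two operands. As a rough sketch of the standard SPARQL algebra semantics (not this gem's implementation), with solutions modelled as hashes from variable to value, and with the helper names compatible?, join_solutions and left_join_solutions chosen purely for illustration:

```ruby
# Two solutions are compatible when every shared variable has the same value.
def compatible?(a, b)
  (a.keys & b.keys).all? { |k| a[k] == b[k] }
end

# (join A B): merge every compatible pair of solutions.
def join_solutions(left, right)
  left.flat_map do |l|
    right.select { |r| compatible?(l, r) }.map { |r| l.merge(r) }
  end
end

# (leftjoin A B) without a filter expression: like join, but a left solution
# with no compatible right solution is kept unchanged.
def left_join_solutions(left, right)
  left.flat_map do |l|
    matches = right.select { |r| compatible?(l, r) }
    matches.empty? ? [l] : matches.map { |r| l.merge(r) }
  end
end

people = [{ :"?s" => "<alice>" }, { :"?s" => "<bob>" }]
emails = [{ :"?s" => "<alice>", :"?m" => "<mailto:a>" }]
inner = join_solutions(people, emails)      # only alice, with her ?m binding
outer = left_join_solutions(people, emails) # alice with ?m, plus bob without it
```

In the join and leftjoin examples above the two basic graph patterns share no variables, so every pair of solutions is compatible and join reduces to a cross product.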

Implementation Notes

The parser is driven by a rules table contained in lib/sparql/grammar/parser/meta.rb. This includes branch rules indicating which production to take next, based on the current production.

The meta.rb file is generated from etc/sparql-selectors.n3, which is the result of parsing http://www.w3.org/2000/10/swap/grammar/sparql.n3 (along with bnf-token-rules.n3) with cwm, using the following command:

cwm ../grammar/sparql.n3 bnf-token-rules.n3 --think --purge --data > sparql-selectors.n3

sparql-selectors.n3 is itself used to generate lib/sparql/grammar/parser/meta.rb using script/build_meta.
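The branch-table idea can be illustrated with a toy LL(1) parser. This is a minimal sketch under stated assumptions: the three-production grammar, the BRANCH hash layout and the parse method are hypothetical and do not reflect the actual structure of meta.rb. For each nonterminal, the table maps the type of the next token to the right-hand side to expand, and the parser drives an explicit stack from it.

```ruby
# Hypothetical branch table for a toy grammar (NOT the format of meta.rb):
#   query    ::= "SELECT" proj "WHERE"
#   proj     ::= "*" | VAR var_rest
#   var_rest ::= VAR var_rest | (empty)
# Nonterminals are symbols; terminals are token-type strings.
BRANCH = {
  query:    { "SELECT" => ["SELECT", :proj, "WHERE"] },
  proj:     { "*" => ["*"], "VAR" => ["VAR", :var_rest] },
  var_rest: { "VAR" => ["VAR", :var_rest], "WHERE" => [] }
}.freeze

# Variables become VAR tokens; every other word is its own token type.
def token_type(word)
  word.start_with?("?") ? "VAR" : word
end

# Parses +input+ and returns the projected variable names.
def parse(input)
  tokens = input.split.map { |word| [token_type(word), word] } + [["$", nil]]
  stack = [:query]
  vars = []
  pos = 0
  until stack.empty?
    top = stack.pop
    type, text = tokens[pos]
    if top.is_a?(Symbol)
      # Choose the production from the current nonterminal and next token.
      rhs = BRANCH.fetch(top)[type]
      raise ArgumentError, "unexpected #{type} while parsing #{top}" if rhs.nil?
      stack.concat(rhs.reverse)
    else
      raise ArgumentError, "expected #{top}, got #{type}" unless top == type
      vars << text if type == "VAR"
      pos += 1
    end
  end
  raise ArgumentError, "unexpected trailing input" unless tokens[pos][0] == "$"
  vars
end

parse("SELECT ?a ?b WHERE") # returns ["?a", "?b"]
```

Because every decision is a single table lookup, the parser itself stays generic; only the table changes when the grammar does.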

Note that the SWAP version of sparql.n3 is an older version of the grammar; the newest is http://www.w3.org/2001/sw/DataAccess/rq23/parsers/sparql.ttl, which uses the EBNF form. The sparql.n3 file has been updated by hand to be consistent with the etc/sparql.ttl version. A future direction is to generate rules from etc/sparql.ttl that produce branch tables similar to those in meta.rb, but this requires rules that are not currently available.

Next Steps for Parsing EBNF

A more modern approach is to use the EBNF grammar (e.g., etc/sparql.bnf) to generate a Turtle/N3 representation of the grammar, transform this into an LL(1) representation, and use that to create meta.rb.

Using SWAP utilities, this would presumably be done as follows:

python http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py \
  http://www.w3.org/2001/sw/DataAccess/rq23/parsers/sparql.bnf \
  en \
  'http://www.w3.org/2001/sw/DataAccess/parsers/sparql#' > etc/sparql.ttl

python http://www.w3.org/2000/10/swap/cwm.py etc/sparql.ttl \
  http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3 \
  http://www.w3.org/2000/10/swap/grammar/first_follow.n3 \
  --think --data > etc/sparql-ll1.n3

At this point, a variation of script/build_meta should be able to extract first/follow information to re-create the meta branch tables.
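Extracting first/follow information is the classic way to build LL(1) branch tables. As a hedged, self-contained sketch (plain Ruby rather than the SWAP/cwm rules, with a hypothetical toy grammar and method names), FIRST and FOLLOW sets can be computed by fixed-point iteration and then combined into a table:

```ruby
require "set"

# Hypothetical toy grammar: nonterminals are symbols, terminals are strings,
# and an empty right-hand side [] is the empty production.
GRAMMAR = {
  query:    [["SELECT", :proj, "WHERE"]],
  proj:     [["*"], ["VAR", :var_rest]],
  var_rest: [["VAR", :var_rest], []]
}.freeze
EPSILON = :epsilon

# FIRST of a symbol sequence; includes EPSILON if the whole sequence can vanish.
def first_of_sequence(symbols, first)
  result = Set.new
  symbols.each do |sym|
    return result << sym if sym.is_a?(String)
    result.merge(first[sym] - [EPSILON])
    return result unless first[sym].include?(EPSILON)
  end
  result << EPSILON
end

def first_sets(grammar)
  first = Hash.new { |h, k| h[k] = Set.new }
  loop do
    changed = false
    grammar.each do |lhs, alternatives|
      alternatives.each do |rhs|
        before = first[lhs].size
        first[lhs].merge(first_of_sequence(rhs, first))
        changed ||= first[lhs].size != before
      end
    end
    return first unless changed
  end
end

def follow_sets(grammar, first, start)
  follow = Hash.new { |h, k| h[k] = Set.new }
  follow[start] << "$" # end-of-input marker
  loop do
    changed = false
    grammar.each do |lhs, alternatives|
      alternatives.each do |rhs|
        rhs.each_with_index do |sym, i|
          next if sym.is_a?(String)
          rest = first_of_sequence(rhs[(i + 1)..], first)
          before = follow[sym].size
          follow[sym].merge(rest - [EPSILON])
          follow[sym].merge(follow[lhs]) if rest.include?(EPSILON)
          changed ||= follow[sym].size != before
        end
      end
    end
    return follow unless changed
  end
end

# An alternative is chosen on its FIRST tokens, or on FOLLOW(lhs) if it can vanish.
def branch_table(grammar, first, follow)
  grammar.to_h do |lhs, alternatives|
    row = {}
    alternatives.each do |rhs|
      lookahead = first_of_sequence(rhs, first)
      lookahead = (lookahead - [EPSILON]) | follow[lhs] if lookahead.include?(EPSILON)
      lookahead.each do |token|
        raise ArgumentError, "not LL(1): #{lhs} on #{token}" if row.key?(token)
        row[token] = rhs
      end
    end
    [lhs, row]
  end
end

first = first_sets(GRAMMAR)
follow = follow_sets(GRAMMAR, first, :query)
table = branch_table(GRAMMAR, first, follow)
# var_rest's empty alternative is selected when the next token is WHERE.
```

A conflict (two alternatives claiming the same lookahead token) raises an error here, which is how a grammar that is not LL(1) would show up when regenerating the tables.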

Dependencies

Installation

The recommended installation method is via RubyGems. To install the latest official release of the SPARQL::Grammar gem, do:

% [sudo] gem install sparql-grammar

Download

To get a local working copy of the development repository, do:

% git clone git://github.com/bendiken/sparql-grammar.git

Alternatively, download the latest development version as a tarball as follows:

% wget http://github.com/bendiken/sparql-grammar/tarball/master

Mailing List

Author

Contributors

Refer to the accompanying CREDITS file.

Contributing

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you.

License

This is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying UNLICENSE file.