This document specifies VDoc, a JSON-LD-based interchange format for documentation between humans, machines and Valospace.

This document has not been reviewed. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

This document is part of the ValOS kernel specification.

The format is implemented and supported by the @valos/vdoc npm package.

§ Introduction

VDoc is an extensible JSON-LD interchange specification for extracting documents from various sources, passing the resulting machine-manipulable interchange document around and subsequently producing documents in specific formats such as Valospace resources, markdown, ReSpec HTML, and browser and ANSI-colored console output.

The motivation for this specification is to provide the foundation for Valospace hypertwins of documents by supporting ValOS resources as an emission target. This allows all kinds of documents to be accessible from within Valospace with minimal additional tooling. This is not an explicit design goal in itself; instead, the design goals are chosen to be generic in a way that satisfies it, as the original author believes this leads to a better design.

§ Goals and non-goals

VDoc design goals are:

  1. Writing VDoc manually should be robust and must rely on a minimal number of intuitive rules. The more there is to remember, the higher the threshold for writing docs.

  2. VDoc should be programmatically manipulable with minimal boilerplate. Complex nestings of arrays and other wrappers make introspection and comprehension harder for a less-than-dedicated developer.

  3. VDoc should have semantic structure with a globally referenceable underlying model. Documents should be combinable and allowed to evolve; identifying document parts by their position in a document is brittle.

  4. VDoc should be contextually extensible. Formats often have details which resist universalization but must still be accessible during document emission.

Design non-goals are:

  1. VDoc does not attempt to provide a unified ontology. Documentation formats are contextual and often evolve. Common structures may be represented in a unified manner using existing ontologies where possible, but providing an interchange ontology is outside the scope of this document.

  2. Documentation formats are contextual. Not all information needs to survive the roundtrip into the underlying unified model and back. As a corollary, a format-specific generator can explicitly know about other formats and consume their contextual data.

To satisfy these goals, VDoc chooses JSON-LD 1.1 as the primary interchange format and, as a consequence, RDF as the underlying document object model.

Additionally VDoc provides extensibility via custom VDoc extensions which can introduce domain-specific namespaces and ontologies, extraction and emission transformation operations and document output formats.

§ Document phases and transformations

VDoc defines the central document flow in terms of three document phases:

  1. Source graph is a possibly cyclic graph of native objects with some of its sub-graphs matching some of the VDoc extraction rules. It can be hand-written, programmatically generated or even dynamically introspected.
  2. VDocState is a JSON-LD construct and the primary VDoc interchange format. It is a normalized, complete and self-contained structure, potentially with multiple format-specific @contexts.

  3. Emission output is format-specific output produced by emission from a VDocState and a format-specific set of emission parameters.

VDoc defines two transformations between the phases:

  1. Extraction transforms a source graph into a VDocState by applying the idempotent VDoc extraction rules until the output no longer changes. Due to idempotence, the source graph can be wildly different from or arbitrarily close to the resulting VDocState; in fact, a VDocState is always its own source graph.
  2. Emission is a format-specific transformation which emits the format-specific output from a VDocState.
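The extraction loop described above is a fixed-point iteration. The following is a minimal sketch of that idea, not the @valos/vdoc implementation: `ExtractionStep` is a hypothetical function that applies every extraction rule once, and documents are compared by their JSON serialization for simplicity.

```typescript
// Hypothetical JSON value type standing in for source graphs and VDocStates.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical: one pass that applies every extraction rule once.
type ExtractionStep = (doc: Json) => Json;

// Re-applies the step until the output stops changing (a fixed point).
// Comparison via JSON.stringify assumes a stable key order.
function extractToFixedPoint(source: Json, step: ExtractionStep, maxRounds = 100): Json {
  let current = source;
  for (let round = 0; round < maxRounds; round += 1) {
    const next = step(current);
    if (JSON.stringify(next) === JSON.stringify(current)) return next;
    current = next;
  }
  throw new Error(`extraction did not converge within ${maxRounds} rounds`);
}

// Hypothetical single rule: a top-level bare string becomes a paragraph node;
// anything else is treated as already extracted and returned unchanged.
const wrapStrings: ExtractionStep = (doc) =>
  typeof doc === "string" ? { "@type": "VDoc:Paragraph", "VDoc:content": [doc] } : doc;

const state = extractToFixedPoint("Hello", wrapStrings);
// A VDocState is its own source graph: extracting it again changes nothing.
const again = extractToFixedPoint(state, wrapStrings);
```

The `maxRounds` guard is a defensive choice for the sketch; with genuinely idempotent rules the loop terminates after the first unchanged round.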

In addition to these phases and transformations, VDoc makes use of the JSON-LD 1.1 format, its API and algorithms, and possibly its framing, to provide a mapping from VDocState to the RDF model.

§ VDocState - primary interchange format

VDocState is a VState JSON-LD document with a well-formed tree structure consisting of three types of nodes, corresponding to the first, second and remaining levels of the tree:

  1. Document node is an always-first-level node identified by a global document IRI as its JSON-LD @id. Links to document nodes are in bold and italics.
  2. Resource node is an always-second-level node which is directly accessible from the first-level document node via its document-relative resource identifier as the dictionary key. Links to resource nodes are in italics.

  3. Element node is a third-or-more-level node. It might be anonymous and lacks a stable, unique identifier. It MAY have a locally unique identifier. If the element node and all its parent element nodes have locally unique identifiers, then the ordered set of those identifiers can be considered a document-local unique identifier of the element node, similar to the resource identifier.

There can be multiple first-level document nodes in a single VDocState (as per JSON-LD). The tree root node is the singular, implicit '0th-level' VDocState node without semantics defined by VDoc itself.
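To make the three levels concrete, the following TypeScript sketch builds a VDocState with one document node, one resource node and two element nodes. The document IRI, the resource identifier `introduction` and the element-level `id` key are hypothetical; `elementLocalId` is an illustrative helper computing the document-local element identifier described above.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// 3rd level: an element node with a hypothetical local "id" key...
const namedParagraph: JsonObject = { "@type": "VDoc:Paragraph", id: "first", "VDoc:content": ["Hello."] };
// ...and an anonymous element node.
const anonymousParagraph: JsonObject = { "@type": "VDoc:Paragraph", "VDoc:content": ["World."] };

const vdocState: Json = [
  {
    // 1st level: document node, identified by its global document IRI.
    "@id": "https://example.org/docs/demo",
    // 2nd level: resource node, keyed by its resource identifier.
    introduction: {
      "@type": "VDoc:Chapter",
      "dc:title": "Introduction",
      "VDoc:content": [namedParagraph, anonymousParagraph],
    },
  },
];

// Given an element node's chain of element ancestors (outermost first, the
// node itself last), returns the ordered local identifiers, or undefined if
// any element in the chain is anonymous.
function elementLocalId(chain: JsonObject[], idKey = "id"): string[] | undefined {
  const ids: string[] = [];
  for (const element of chain) {
    const id = element[idKey];
    if (typeof id !== "string") return undefined;
    ids.push(id);
  }
  return ids;
}

const namedId = elementLocalId([namedParagraph]);
const anonymousId = elementLocalId([anonymousParagraph]);
```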

§ Node keys

The keys of VDocState nodes fall into four categories depending on whether the key is an IRI and whether an IRI key has semantics defined by VDoc or by extension format specifications:

  1. VDoc node key is any IRI which matches a VDoc ontology context term. Its semantics are defined by this specification.
  2. Extension node key is any IRI which matches an extension ontology context term. Its semantics are defined by the corresponding extension specification.

  3. Generic IRI key is any IRI key lacking recognized ontology. It has no semantics in addition to what JSON-LD specifies.

  4. Identifier key is any non-IRI key. The semantics of an identifier key is defined by the node.
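The four key categories can be sketched as a small classifier. This is an illustration under stated assumptions, not normative: a key counts as an IRI if it contains ":" (absolute or compact IRI), the sets of recognized context terms are supplied by the caller, and JSON-LD keywords such as `@id` get a separate `keyword` bucket because they fall outside the four categories. The `ext:widget` term is hypothetical.

```typescript
type KeyCategory = "vdoc" | "extension" | "genericIri" | "identifier" | "keyword";

function classifyKey(
  key: string,
  vdocTerms: ReadonlySet<string>,
  extensionTerms: ReadonlySet<string>,
): KeyCategory {
  // Assumption: JSON-LD keywords are not node keys in the VDoc sense.
  if (key.startsWith("@")) return "keyword";
  // Assumption: any key without ":" is a non-IRI identifier key.
  if (!key.includes(":")) return "identifier";
  if (vdocTerms.has(key)) return "vdoc";
  if (extensionTerms.has(key)) return "extension";
  // An IRI without a recognized ontology: plain JSON-LD semantics only.
  return "genericIri";
}

const vdocTerms = new Set(["VDoc:content", "VDoc:entries", "VDoc:words", "VDoc:lines"]);
const extensionTerms = new Set(["ext:widget"]);
```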

§ Document nodes

The document IRI is a global identifier of a document. It must not have a fragment part. All identifier keys of a document node must have a resource node as their value.

§ Resource nodes

The resource identifier is a string, restricted to the character set of valid JavaScript identifiers, which is unique within a document and identifies a resource node inside that document. When the resource identifier is appended to the document IRI as an IRI fragment part, the resource node has a stable, global identity over time.
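The identity rules for document and resource nodes can be sketched as a helper that forms a resource node's global IRI. The function name is hypothetical, and the identifier check is an ASCII-only approximation of JavaScript identifier syntax; real JavaScript identifiers also admit many Unicode letters.

```typescript
// ASCII-only approximation of a JavaScript identifier.
const RESOURCE_ID = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

function resourceIri(documentIri: string, resourceId: string): string {
  // A document IRI must not have a fragment part.
  if (documentIri.includes("#")) {
    throw new Error(`document IRI must not have a fragment part: ${documentIri}`);
  }
  if (!RESOURCE_ID.test(resourceId)) {
    throw new Error(`not a valid resource identifier: ${resourceId}`);
  }
  // The resource identifier becomes the IRI fragment part.
  return `${documentIri}#${resourceId}`;
}
```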

§ Element nodes

Element nodes are structural document building blocks which lack a stable identity even within the document.

§ Transformations convert documents to and from VDocState

§ Extraction transforms source graphs into VDocState

§ Emission transforms VDocState into output targets

§ Extensions provide new ontologies and transformations

A VDoc extension is a specification which can extend the VDoc specification in different ways in the different transformation stages. An extension has two main parts: the extension ontology, which introduces new vocabulary for the VDoc interchange format itself, and the extension transformations, which introduce new algorithms for extracting and emitting said vocabulary.

Multiple VDoc extensions can co-exist; extensions must be specified so that a VDoc document itself and all transformation operations are well-defined and deterministic even when multiple extensions are used at the same time. There are two primary mechanisms for reaching this goal: global, DNS-registration-based ontology base IRIs and extension ordering primacy during transformations.
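One way to read extension ordering primacy is as deterministic rule lookup over an ordered extension list. The sketch below assumes, which the text above does not state, that the extension listed earliest wins when several define the same rule name; `ExtensionRules`, `resolveRule` and the example extension are hypothetical.

```typescript
interface ExtensionRules {
  baseIri: string; // globally unique, DNS-registration-based base IRI
  rules: Record<string, string>; // rule name -> rdf:type of the created node
}

// Assumption: earlier extensions take precedence over later ones.
function resolveRule(
  ordered: ExtensionRules[],
  ruleName: string,
): { baseIri: string; type: string } | undefined {
  for (const extension of ordered) {
    const type = extension.rules[ruleName];
    if (type !== undefined) return { baseIri: extension.baseIri, type };
  }
  return undefined;
}

const core: ExtensionRules = {
  baseIri: "https://valospace.org/vdoc/0#",
  rules: { chapter: "VDoc:Chapter", p: "VDoc:Paragraph" },
};
const hypotheticalExtension: ExtensionRules = {
  baseIri: "https://example.org/ext/0#",
  rules: { chapter: "ext:Section", note: "ext:Note" },
};
```

Because the base IRIs never collide and the order is explicit, the same inputs always resolve to the same rule.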

§ VDoc extension ontology

A VDoc extension ontology is the combination of the extension's
  1. preferred namespace prefix and its associated base IRI
  2. depended-on ontologies with their prefix definitions, i.e. mappings from short, document-local strings into globally unique IRIs
  3. RDF vocabulary: a collection of RDF classes, properties and other names, all of which have the ontology base IRI as a prefix
  4. JSON-LD context term definitions

Together these fully specify the semantics of an extension ontology. More specifically, the ontologies of all extensions listed in the @context section of a VDoc JSON-LD document fully define all the semantics of that particular document.
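The four parts of an extension ontology map naturally onto a plain record. In the following sketch the field names and both helpers are hypothetical; the values mirror the VDoc core ontology. `mergePrefixes` illustrates why combined ontologies stay deterministic: a prefix may not map to two different IRIs.

```typescript
interface TermDefinition { "@id": string; "@type"?: string; "@container"?: string }

interface ExtensionOntology {
  prefix: string;                   // preferred namespace prefix
  baseIri: string;                  // base IRI associated with the prefix
  prefixes: Record<string, string>; // depended-on ontology prefix definitions
  vocabulary: string[];             // full IRIs of classes and properties
  context: Record<string, TermDefinition>;
}

const vdocCore: ExtensionOntology = {
  prefix: "VDoc",
  baseIri: "https://valospace.org/vdoc/0#",
  prefixes: {
    rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    dc: "http://purl.org/dc/elements/1.1/",
  },
  vocabulary: ["https://valospace.org/vdoc/0#Node", "https://valospace.org/vdoc/0#content"],
  context: { "VDoc:content": { "@id": "#content", "@type": "@id", "@container": "@list" } },
};

// Vocabulary names that do not have the ontology base IRI as their prefix.
function foreignVocabulary(ontology: ExtensionOntology): string[] {
  return ontology.vocabulary.filter((iri) => !iri.startsWith(ontology.baseIri));
}

// Combines the prefix definitions of several ontologies, rejecting conflicts.
function mergePrefixes(ontologies: ExtensionOntology[]): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const ontology of ontologies) {
    const own: Record<string, string> = { ...ontology.prefixes, [ontology.prefix]: ontology.baseIri };
    for (const prefix of Object.keys(own)) {
      const existing = merged[prefix];
      if (existing !== undefined && existing !== own[prefix]) {
        throw new Error(`prefix "${prefix}" maps to both ${existing} and ${own[prefix]}`);
      }
      merged[prefix] = own[prefix];
    }
  }
  return merged;
}
```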

§ VDoc extension transformations: extraction and emission

An extension can specify an arbitrary number of extraction and emission transformation algorithms and rules from various source graphs via the VDoc interchange format into various output formats.

§ Extraction transformation rules

Extraction rules specify how a source graph is interpreted as mutations (usually additions) to a given target VDocState document. An extraction rule consists of two parts:
  • key pattern matcher is matched against each source graph node dictionary key to see if the key matches the rule and to parse the extraction rule parameters
  • extraction action specifies how the extraction rule parameters and the extraction context are interpreted as a set of mutations on the current target VDocState document node

The extraction context is defined as the collection of the rule key, the source graph parent node, the source graph node value, the target document parent node and the target document value. A typical extraction rule creates a new Node, adds it to the target document parent (directly or as a reference, depending on the Node type) and adds the source node value to a specific property of the new Node.
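The two-part rule structure can be sketched as follows. This is illustrative only: the key syntax `<ruleName>#<resourceId>` is a hypothetical convention invented for this example, only the "add directly as a resource node" branch is shown, and the rule-to-type mapping follows the `chapter` and `p` rows of the VDoc Core extraction rules table.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Mirrors the extraction context: rule key, source graph parent node, source
// graph node value, target document parent node and target document value.
interface ExtractionContext {
  ruleKey: string;
  sourceParent: JsonObject;
  sourceValue: Json;
  targetParent: JsonObject;
  targetValue: Json | undefined;
}

interface RuleParams { resourceId: string }

interface ExtractionRule {
  // Key pattern matcher: parsed rule parameters, or undefined on no match.
  match(key: string): RuleParams | undefined;
  // Extraction action: mutates the target document.
  action(params: RuleParams, context: ExtractionContext): void;
}

function makeRule(ruleName: string, type: string, bodyProperty: string): ExtractionRule {
  const pattern = new RegExp(`^${ruleName}#([A-Za-z_$][A-Za-z0-9_$]*)$`);
  return {
    match(key) {
      const m = pattern.exec(key);
      return m ? { resourceId: m[1] } : undefined;
    },
    action({ resourceId }, { sourceValue, targetParent }) {
      // Create a new node holding the source value as a list in its body
      // property and add it directly to the document node.
      const body = Array.isArray(sourceValue) ? sourceValue : [sourceValue];
      targetParent[resourceId] = { "@type": type, [bodyProperty]: body };
    },
  };
}

function extractDocument(source: JsonObject, rules: ExtractionRule[], documentIri: string): JsonObject {
  const docNode: JsonObject = { "@id": documentIri };
  for (const [key, value] of Object.entries(source)) {
    for (const rule of rules) {
      const params = rule.match(key);
      if (params === undefined) continue;
      rule.action(params, {
        ruleKey: key, sourceParent: source, sourceValue: value,
        targetParent: docNode, targetValue: docNode[params.resourceId],
      });
      break; // the first matching rule wins
    }
  }
  return docNode;
}

const coreRules = [
  makeRule("chapter", "VDoc:Chapter", "VDoc:content"),
  makeRule("p", "VDoc:Paragraph", "VDoc:content"),
];
```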

§ Extraction extractee tool APIs

An extension MAY specify an extractee API as a collection of WebIDL interfaces for constructing extension extraction source graphs. By doing this, native implementations gain the benefits of integrated toolchains:
  • Improved discoverability via integrated documentation and code completion
  • Implicitly well-formed primitives and structures where possible, validation of input where not

  • Improved readability of the document in contexts where the primary document source graph is expressed in native code

Taken together, the extraction APIs are intended to lower the threshold for adopting new extensions and thus make introducing new extensions easier.

§ Emission outputs

§ Emission transformation

§ VDoc Core namespace

The 'VDoc' namespace specifies the vocabulary for the human-facing document structure of the ValOS document interchange format. Human-facing structure includes primitives such as chapters, titles, lists, tables and cross-references: primitives which are sufficiently common and meaningful across all types of documents. The VDoc core ontology explicitly does not specify any semantics outside the document structure itself.

§ VDoc Core IRI prefixes

Prefix IRI
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
xsd http://www.w3.org/2001/XMLSchema#
owl http://www.w3.org/2002/07/owl#
dc http://purl.org/dc/elements/1.1/
VDoc https://valospace.org/vdoc/0#
VKernel https://valospace.org/kernel/0#
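The prefix table above is what turns compact IRIs such as `VDoc:Chapter` into full IRIs. The following is a minimal sketch of that expansion; the function name is hypothetical, and a full JSON-LD processor handles many more cases.

```typescript
const VDOC_CORE_PREFIXES: Record<string, string> = {
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  rdfs: "http://www.w3.org/2000/01/rdf-schema#",
  xsd: "http://www.w3.org/2001/XMLSchema#",
  owl: "http://www.w3.org/2002/07/owl#",
  dc: "http://purl.org/dc/elements/1.1/",
  VDoc: "https://valospace.org/vdoc/0#",
  VKernel: "https://valospace.org/kernel/0#",
};

// Expands "prefix:suffix"; unknown prefixes and absolute IRIs are returned unchanged.
function expandCompactIri(compact: string, prefixes: Record<string, string> = VDOC_CORE_PREFIXES): string {
  const colon = compact.indexOf(":");
  if (colon < 0) return compact;
  const prefix = compact.slice(0, colon);
  const suffix = compact.slice(colon + 1);
  // "https://..." has a "//" suffix and is treated as already absolute.
  if (suffix.startsWith("//") || !Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
    return compact;
  }
  return prefixes[prefix] + suffix;
}
```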

§ VDoc classes

#Class

rdfs:subClassOf rdfs:Class
description The class of classes which are defined by vdoc or a vdoc extension

#Property

rdfs:subClassOf rdf:Property
description The class of properties which are defined by vdoc or a vdoc extension

#Node

description A document tree Node
instance properties VDoc:resourceId, VDoc:content, VDoc:entries, VDoc:words, VDoc:lines, VDoc:lookup, VDoc:wide, VDoc:tall, VDoc:heading, VDoc:collapsed, VDoc:elidable, VDoc:map, VDoc:em, VDoc:strong, VDoc:ins, VDoc:del, VDoc:q, VDoc:blockquote

#Chapter

rdfs:subClassOf VDoc:Node
description A titled, possibly numbered chapter document node

#Paragraph

rdfs:subClassOf VDoc:Node
description A vertically segmented paragraph document node

#BulletList

rdfs:subClassOf VDoc:Node
description A bullet list document node

#NumberedList

rdfs:subClassOf VDoc:Node
description A numbered list document node

#Table

rdfs:subClassOf VDoc:Node
description A two-dimensional table document node
instance properties VDoc:columns

#Header

rdfs:subClassOf VDoc:Node
description A table cross-entry-section header node
instance properties VDoc:cell

#ContentSelector

description The class of selectors with explicit meaning for selecting content

#CharacterData

rdfs:subClassOf VDoc:Node
description A character data document node
instance properties VDoc:language

#Reference

rdfs:subClassOf VDoc:Node
description A reference document node

#ContextPath

rdfs:subClassOf VDoc:Node
description A context-based path document node
instance properties VDoc:context

#ContextBase

rdfs:subClassOf VDoc:ContextPath
description A context base setting document node

#HTMLElementProperty

rdfs:subClassOf VDoc:Property
description The class of vdoc properties which directly inherit the semantics of a specific HTML5 element
instance properties VDoc:elementName

§ VDoc properties

#resourceId

rdfs:domain VDoc:Node, rdfs:range xsd:string
description The resource identifier of a resource node

#content

rdfs:domain VDoc:Node, rdfs:range rdfs:List
description The primary visible content of a Node

#entries

rdfs:domain VDoc:Node, rdfs:range rdfs:List
description A visible list of vertically or horizontally segmented entries

#words

rdfs:domain VDoc:Node, rdfs:range rdfs:List
description A visible list of visually separate words

#lines

rdfs:domain VDoc:Node, rdfs:range rdfs:List
description A visible list of visually separate lines

#lookup

rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description A reference to a lookup structure for string literal entries

#wide

rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is wide

#tall

rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is tall

#heading

rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is a heading. An explicit numeric value denotes the heading level, with larger values denoting smaller headings

#collapsed

rdfs:domain VDoc:Node, rdfs:range xsd:boolean
description Node content is collapsed but expandable. If false, content is expanded but collapsible

#elidable

rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node should be hidden if its VDoc:content is empty

#columns

rdfs:domain VDoc:Table, rdfs:range rdfs:List
description A list of table cross-entry-section columns

#map

rdfs:domain VDoc:Node, rdfs:range VDoc:Node
description The content selector template mapped onto all entries of this node

#cell

rdfs:domain VDoc:Header, rdfs:range VDoc:Node
description The content selector template mapped onto current entry for this column

#selectField

rdfs:domain rdfs:Resource, rdfs:range rdfs:Literal
description A content selector for the entry field denoted by the object of this triple

#language

rdfs:domain VDoc:CharacterData, rdfs:range rdfs:Resource
description The language of the character data content

#context

rdfs:domain VDoc:ContextPath, rdfs:range rdfs:Resource
description Possibly non-visible context base (absolute or relative to current base)

#elementName

rdfs:domain VDoc:HTMLElementProperty, rdfs:range xsd:string
description The name of the html element associated with this property

§ VDoc:HTMLElementProperty vocabulary

Properties instanced from VDoc:HTMLElementProperty inherit HTML5 element semantics directly. Only those HTML5 elements with structural semantic meaning are exposed via the VDoc core ontology.

#em

HTML5 element em, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is <em>emphasised</em>

#strong

HTML5 element strong, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is <strong>strong</strong>

#ins

HTML5 element ins, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is an <ins>insertion</ins>

#del

HTML5 element del, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is <del>deleted</del>

#q

HTML5 element q, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is <q>quoted</q>

#blockquote

HTML5 element blockquote, rdfs:domain VDoc:Node, rdfs:range rdfs:Resource
description Node content is <blockquote>block quoted</blockquote>

§ VDoc remaining vocabulary

#selectKey

rdf:type VDoc:ContentSelector
description A content selector literal denoting the lookup key or index of the entry

#selectValue

rdf:type VDoc:ContentSelector
description A content selector literal denoting the value of the entry

§ VDoc Core JSON-LD context term definitions

@base https://valospace.org/vdoc/0#

VDoc:content @id #content, @type @id, @container @list
VDoc:entries @id #entries, @type @id, @container @list
VDoc:words @id #words, @type @id, @container @list
VDoc:lines @id #lines, @type @id, @container @list
VDoc:columns @id #columns, @type @id, @container @list
VDoc:map @id #map, @type @id
VDoc:cell @id #cell, @type @id
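The term definitions can be written out as a JSON-LD context object. The sketch below reads each "#name" entry as shorthand relative to the listed @base, which is an assumption about how the table is meant to be read; `resolveTerm` is a hypothetical helper handling only such fragment references, whereas a real JSON-LD processor performs general IRI expansion.

```typescript
interface TermDefinition { "@id": string; "@type"?: string; "@container"?: string }
type ContextEntry = string | TermDefinition;

const VDOC_CORE_CONTEXT: Record<string, ContextEntry> = {
  "@base": "https://valospace.org/vdoc/0#",
  "VDoc:content": { "@id": "#content", "@type": "@id", "@container": "@list" },
  "VDoc:entries": { "@id": "#entries", "@type": "@id", "@container": "@list" },
  "VDoc:words": { "@id": "#words", "@type": "@id", "@container": "@list" },
  "VDoc:lines": { "@id": "#lines", "@type": "@id", "@container": "@list" },
  "VDoc:columns": { "@id": "#columns", "@type": "@id", "@container": "@list" },
  "VDoc:map": { "@id": "#map", "@type": "@id" },
  "VDoc:cell": { "@id": "#cell", "@type": "@id" },
};

// Returns the full IRI of a term and whether its values form an ordered list.
function resolveTerm(
  term: string,
  context: Record<string, ContextEntry> = VDOC_CORE_CONTEXT,
): { iri: string; isList: boolean } | undefined {
  const definition = context[term];
  const base = context["@base"];
  if (typeof definition !== "object" || typeof base !== "string") return undefined;
  const id = definition["@id"];
  // A "#name" reference replaces the base IRI's fragment.
  const iri = id.startsWith("#") ? base.split("#")[0] + id : id;
  return { iri, isList: definition["@container"] === "@list" };
}
```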

§ VDoc Core transformations

VDoc defines a single extraction transformation from a native JavaScript source graph. To support this, VDoc defines an extractee API: native JavaScript convenience functions and libraries which construct typical VDoc source graph primitives. Extensions which make use of native source graph extractors can extend this API.

The extraction transformation is based around two basic mechanisms: array composition and extraction rules.

§ VDoc Core extraction rules

Rule name Inter-node rdf:type Owner property Body property ';rest' property Comment
VDoc:content VDoc:content Basic Node
chapter VDoc:Chapter VDoc:content VDoc:content dc:title Numbered, titled chapter
p VDoc:Paragraph VDoc:content VDoc:content Vertically segmented paragraph
c VDoc:CharacterData VDoc:content VDoc:content VDoc:language Character data
bulleted VDoc:BulletList VDoc:content VDoc:lines Bulleted list
numbered VDoc:NumberedList VDoc:content VDoc:lines Numbered list
table VDoc:Table VDoc:content VDoc:columns VDoc:lookup Table
column VDoc:Header VDoc:entries VDoc:content VDoc:cell Header
data Hidden data

§ VDoc Core extractee API

API identifier rdf:type
aggregate
c
bulleted
numbered
ref
identifize
cpath
cbase
em
strong
heading
ins
del
q
blockquote
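To illustrate what extractee functions do, the following sketches three of them as plain constructors of source graph objects. These are hypothetical stand-ins, NOT the signatures exported by @valos/vdoc: each returns an object whose `@type` and body property follow the corresponding row of the VDoc Core extraction rules table (for example `bulleted` produces a VDoc:BulletList with VDoc:lines), and the optional `language` parameter of `c` is an assumption based on that row's VDoc:language entry.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Character data, optionally tagged with a language.
function c(text: string, language?: string): JsonObject {
  const node: JsonObject = { "@type": "VDoc:CharacterData", "VDoc:content": [text] };
  if (language !== undefined) node["VDoc:language"] = language;
  return node;
}

// Bulleted list: each argument becomes one visually separate line.
function bulleted(...lines: Json[]): JsonObject {
  return { "@type": "VDoc:BulletList", "VDoc:lines": lines };
}

// Numbered list with the same structure as a bulleted list.
function numbered(...lines: Json[]): JsonObject {
  return { "@type": "VDoc:NumberedList", "VDoc:lines": lines };
}

const sample = bulleted("plain line", c("const x = 1;", "javascript"));
```

Constructor functions like these give the discoverability and implicit well-formedness benefits listed for extractee APIs: a misspelled property name becomes a compile-time error rather than silently ignored data.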

§ VDoc Core emission output

§ VDoc Core emission rules

ReVDoc provides HTML emission rules for the following types: null, string, number, array, object, VDoc:Node, VDoc:Chapter, VDoc:Paragraph, VDoc:BulletList, VDoc:NumberedList, VDoc:Table, VDoc:Reference and VDoc:CharacterData.
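Emission rules keyed on value type can be sketched as a recursive dispatcher. The output below is illustrative and is NOT ReVDoc's actual HTML; only a subset of the listed types (null, string, number, array, VDoc:Paragraph, VDoc:BulletList, VDoc:NumberedList, and a generic VDoc:Node fallback) is handled.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function emitHtml(value: Json): string {
  if (value === null) return "";
  if (typeof value === "string") return escapeHtml(value);
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  if (Array.isArray(value)) return value.map(emitHtml).join("");
  const node: JsonObject = value;
  // Reads a list-valued property, treating a missing property as empty.
  const list = (key: string): Json[] => {
    const entries = node[key];
    return Array.isArray(entries) ? entries : [];
  };
  const items = (key: string): string =>
    list(key).map((entry) => `<li>${emitHtml(entry)}</li>`).join("");
  switch (node["@type"]) {
    case "VDoc:Paragraph":
      return `<p>${list("VDoc:content").map(emitHtml).join("")}</p>`;
    case "VDoc:BulletList":
      return `<ul>${items("VDoc:lines")}</ul>`;
    case "VDoc:NumberedList":
      return `<ol>${items("VDoc:lines")}</ol>`;
    default:
      // Generic VDoc:Node fallback: emit the content without a wrapper.
      return list("VDoc:content").map(emitHtml).join("");
  }
}
```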