MinXSON - a fusion of JSON and XML

Why?

The design principle behind JSON is that it uses the literal syntax of languages such as Python and Javascript to neatly describe complex, hierarchical data. It rather elegantly confines itself to a shared syntactic core that these languages have in common. As a result it hits the sweet spot by being immediately familiar to a lot of programmers and pragmatically effective.

However, JSON is rather clumsy for representing data in languages such as Ginger that support XML-like literals. The central idea behind MinXSON is that the JSON and MinXML grammars can be fused by extending the JSON grammar to allow start and end tags, e.g.

<data time_recorded="12:14">0.1, 0.3, -3, 4</data>

Design Concepts

MinXSON is designed to be a superset of both MinXML and JSON: a MinXSON parser will accept either and interpret it exactly as a MinXML parser or JSON parser would, respectively. But since the reverse can't be true, it's natural to add extra features that make the language more readable. These extensions are intended to be in keeping with JSON's Javascript-like syntax and to make MinXML more like a programming language. We're also more favourably inclined towards extensions that other developers have experimented with (e.g. Relaxed JSON https://github.com/phadej/relaxed-json).

  • Javascript-style comments are allowed, matching the general Javascript inspiration of JSON. The lack of comments in JSON is too restrictive. Although XML comments would work, they are visually rather clumsy, especially inside start/end tags. Hence both // and /* … */ comments were added.
  • In arrays and objects, the comma may be used as a terminator as well as a separator (Python has a similar relaxation). This makes generating MinXSON a little more elegant. The semi-colon is also allowed as an alternative, as it reads more naturally as a terminator than a comma and is in keeping with the original programming languages.
  • Limiting named character entities to the five pre-defined ones of XML is needlessly restrictive, so the full set of HTML5 named character entities is included. This is stylistically consistent with MinXML and a useful feature.
  • The JSON string escape '\' is underused and safe to expand, so we use '\&' to introduce an HTML5 named character entity, e.g. "Copyright \&copy; Stephen Leach". This brings attribute values and strings closer together, reducing how much the coder has to remember.
  • JSON strings are double-quoted but MinXML attribute values may additionally be single-quoted. We extend the use of single quotes to include strings, reducing the differences between the two types of strings and so reducing how much the coder has to remember. It also makes it simpler to write strings containing one or the other kind of quote mark.
  • JSON objects are restricted to mapping from strings to arbitrary values, so they naturally represent namespaces, where the names are not at the same semantic level as strings in the rest of the expression. To signify this, we allow the keys of objects to be written without quotes provided that they adhere to the syntax of attribute keys; reusing the attribute key syntax reduces the load on programmers. And to make this read more fluidly, equal signs are allowed as an alternative to ':', just as '=' is the separator used in MinXML attributes, bringing the two syntaxes closer together, e.g.
    { "foo": true, "bar": false } // Same as
    { foo = true, bar = false }
  • UNIX scripting languages benefit from allowing '#!' to begin an end-of-line comment. A significant use-case of MinXSON is as a configuration for a 'shebang' script. So the shebang end-of-line comment is permitted at the start of an expression. However, it is not allowed anywhere else, in order to keep the '#' symbol as free as possible for future use. Example:
    #!/usr/local/bin/myscript
    {
        path="/usr/local/share/myscript",
        usage="myscript: a program to do my stuff",
        effective_user="admin"
    }
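
The '\&' escape described in the list above can be illustrated with a short Python sketch. It is not taken from any MinXSON implementation: the function name expand_entity_escapes is hypothetical, only named entities are handled (numeric references are left alone), and the lookup table is Python's own html.entities.html5 mapping of HTML5 named character references.

```python
import re
from html.entities import html5  # HTML5 named character references, keyed like "copy;"

# Assumption for this sketch: an escape is a backslash, '&', a name, then ';'.
_ENTITY_ESCAPE = re.compile(r"\\&([A-Za-z][A-Za-z0-9]*);")


def expand_entity_escapes(body):
    """Replace backslash-ampersand named-entity escapes in a string body."""

    def replace(match):
        name = match.group(1)  # e.g. "copy"
        # Unknown names are left exactly as written rather than guessed at.
        return html5.get(name + ";", match.group(0))

    return _ENTITY_ESCAPE.sub(replace, body)


print(expand_entity_escapes(r"Copyright \&copy; Stephen Leach"))
```

Leaving unknown names untouched is a choice made for the sketch; a real parser might prefer to report an error instead.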

One of the design goals was that it should be possible to cleanly read a stream of newline-terminated expressions off the input stream, so that MinXSON can be used as the basis for a message stream.
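
As a rough illustration of the message-stream idea, the Python sketch below reads one expression per line. Several things here are assumptions rather than part of MinXSON: the name read_messages is hypothetical, each expression is assumed to fit on a single line, a shebang is only skipped on the first line, and json.loads stands in for a real MinXSON parser, so only the JSON subset is accepted.

```python
import io
import json


def read_messages(stream, parse=json.loads):
    """Yield one parsed expression per non-blank line of a text stream."""
    for index, line in enumerate(stream):
        if index == 0 and line.startswith("#!"):
            continue  # skip a leading shebang comment
        text = line.strip()
        if text:
            # 'parse' is a stand-in; a MinXSON parser would be passed here.
            yield parse(text)


demo = io.StringIO('#!/usr/local/bin/myscript\n{"seq": 1}\n[0.1, 0.3, -3, 4]\n')
for message in read_messages(demo):
    print(message)
```

Passing the parser in as an argument keeps the line-splitting logic separate from the grammar, so the same loop could be reused once a full parser is available.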

Grammar for MinXSON

A complete grammar for MinXSON in EBNF is given below, together with railroad diagrams, courtesy of the excellent Railroad Diagram Generator. Note that comments and whitespace (represented by D for 'discard') must be discarded.

EBNF Grammar

MinXSON ::= Element | JSON | D MinXSON | Shebang MinXSON
Element ::= StartTag Children EndTag | FusedTag
Children ::= ( Child ( Separator Child )* )? Separator? D?
Separator ::= D? [,;]
Child ::= ( Element+ | JSON ) | D Child
StartTag ::= '<' D? NCName ( D Attribute )* D? '>'
EndTag ::= '</' D? NCName D? '>' | '</' D? '>'
FusedTag ::= '<' D? NCName ( D Attribute )* D? '/' '>'
Attribute ::= NCName D? ( '=' D? AttValue | ':' String )
NCName ::= [http://www.w3.org/TR/xml-names/#NT-NCName]
AttValue ::= '"' ([^&>"]|Reference)* '"' | "'" ([^&>']|Reference)* "'"
Reference ::= '&' (NamedCharacterReference|[0-9]+|'x'[0-9A-Fa-f]+)';'
NamedCharacterReference ::= [http://www.w3.org/TR/html5/syntax.html#named-character-references]
JSON ::= Reserved | Variable | Number | String | Array | Object
Variable ::= Identifier - Reserved
Reserved ::= Null | Boolean
Null ::= 'null'
Boolean ::= 'true' | 'false'
Identifier ::= [a-zA-Z_] [a-zA-Z0-9_]*
Number ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( ( 'e' | 'E' ) [0-9]+ )?
String ::= '"' ([^"\]|Escape)* '"' | "'" ([^'\]|Escape)* "'"
Escape ::= '\' ( ["'\/bfnrt] | 'u' Hex Hex Hex Hex | Reference )
Hex ::= [0-9a-fA-F]
Array ::= '[' Children  ']'
Object ::= '{' Entries '}'
Entries ::= ( Entry ( Separator Entry )* )? Separator? D?
Entry ::= D? EntryKey D? ( ':' | '=' ) MinXSON
EntryKey ::= Identifier | String
D ::= S | XMLComment
XMLComment ::= '<' [?!] [^>]* '>' 
Shebang ::= '#!' [^#xA]* #xA
S ::= (#x20 | #x9 | #xD | #xA)+ | '/*' ( [^*] | '*'+ [^*/] )* '*'* '*/' | '//' [^#xA]* #xA
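
To make a few of the lexical rules concrete, here is a small Python sketch that transcribes the Number, Identifier, Reserved and Variable rules as written above into regular expressions. The helper name classify and the "Other" fallback are hypothetical, and the sketch ignores every other rule in the grammar.

```python
import re

# Number ::= '-'? [0-9]+ ( '.' [0-9]+ )? ( ( 'e' | 'E' ) [0-9]+ )?
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][0-9]+)?")
# Identifier ::= [a-zA-Z_] [a-zA-Z0-9_]*
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Reserved ::= Null | Boolean
RESERVED = {"null", "true", "false"}


def classify(token):
    """Name the rule that a whole token matches, or 'Other' if none does."""
    if token in RESERVED:
        return "Reserved"
    if IDENTIFIER.fullmatch(token):
        return "Variable"  # Variable ::= Identifier - Reserved
    if NUMBER.fullmatch(token):
        return "Number"
    return "Other"


for token in ["null", "foo_1", "-3", "0.1", "1e5", "1."]:
    print(token, classify(token))
```

Checking the reserved words before the identifier pattern is what implements the set difference in the Variable rule.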

Railroad Diagram

[Railroad diagram images, one per rule: MinXSON, Element, Children, Separator, Child, StartTag, EndTag, FusedTag, Embedded, Attribute, NCName, AttValue, Reference, NamedCharacterReference, JSON, Variable, Reserved, Null, Boolean, Number, String, Escape, Hex, Array, Object, Entries, Entry, EntryKey, D, Shebang, XMLComment, S]

See Also