Line Delimited JSON

This article was considered for deletion at Wikipedia on November 6 2015. This is a backup of Wikipedia:Line_Delimited_JSON. All of its AfDs can be found at Wikipedia:Special:PrefixIndex/Wikipedia:Articles_for_deletion/Line_Delimited_JSON, the first at Wikipedia:Wikipedia:Articles_for_deletion/Line_Delimited_JSON.


Line Delimited JSON is a standard for delimiting JSON in stream protocols (such as TCP).

Introduction

This is a minimal specification for sending and receiving JSON over a stream protocol, such as TCP.

The Line Delimited JSON framing is so simple that no specification had previously been written for this ‘obvious’ way to do it.

Example output

(With \r\n line separators)

<source lang="javascript">
{"some": "thing"}
{"foo": 17, "bar": false, "quux": true}
{"may": {"include": "nested", "objects": ["and", "arrays"]}}
</source>

Motivation

There is currently no standard for transporting JSON within a stream protocol (primarily plain TCP), apart from WebSockets, which is unnecessarily complex for non-browser applications.

An important use case is processing a large number of JSON objects where the receiver of the data should not have to receive every single byte before it can begin decoding it. The processing time and memory usage of a JSON parser trying to parse a multi-gigabyte (or larger) string is often prohibitive. Thus, a "properly" encoded JSON list of millions of lines is not a practical way to pass and parse data.[1]

There were numerous possibilities for JSON framing, including counted strings and ASCII control characters or non-ASCII characters as delimiters (DLE STX, ETX or WebSocket's 0xFFs).

Scope

The primary use case for LDJSON is an unending stream of JSON objects, delivered at variable times over TCP, where each object needs to be processed as it arrives, e.g. a stream of stock quotes or chat messages.

Philosophy / requirements

The specification must be:

  • trivial to implement in multiple popular programming languages
  • flexible enough to handle arbitrary whitespace (pretty-printed JSON)
  • free of non-printable characters
  • netcat/telnet friendly

Functional specification


Software that supports Line Delimited JSON

PostgreSQL

As of version 9.2, PostgreSQL has a function called row_to_json.[2] In addition, PostgreSQL supports JSON as a field type, so this may output nested components in much the same way as MongoDB and other NoSQL databases. <source lang="json">

$ echo 'SELECT row_to_json(article) FROM article;' | sudo -u postgres psql --tuples-only
 {"article_id":1,"article_name":"ding","article_desc":"bellsound","date_added":null}
 {"article_id":2,"article_name":"dong","article_desc":"bellcountersound","date_added":null}

</source>

Apache

Apache logs can be formatted as JSON lines by setting the LogFormat directive. For an example of writing logs for consumption by Logstash and Kibana, see "Getting Apache to output JSON (for logstash 1.2.x)", http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/.

NGINX

NGINX logs can likewise be formatted as JSON lines by setting the log_format directive, as in this example: "Logging to Logstash JSON Format in Nginx", https://blog.pkhamre.com/logging-to-logstash-json-format-in-nginx/.

jline

An example [1] of command-line tools for manipulating JSON lines in much the same way that grep, sort and other Unix tools manipulate CSV.

jq

sed for JSON, implemented in C and compiled to a standalone binary. [2]

pigshell

A shell-in-a-browser whose pipelines are made up of objects.[3]

Sending

Each JSON object must be written to the stream followed by a carriage return and a line feed (the two bytes 0x0D 0x0A). The JSON objects themselves may contain newlines, carriage returns and any other permitted whitespace. See http://www.json.org/ for the full specification.

All serialized data must use the UTF-8 encoding.
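The sending rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names encode_ldjson and send_ldjson are hypothetical, and the stream is assumed to be any binary file-like object with a write method (for example one obtained from socket.makefile("wb")).

```python
import io
import json


def encode_ldjson(obj) -> bytes:
    """Serialize one object as UTF-8 JSON followed by CR LF (0x0D 0x0A)."""
    # ensure_ascii=False emits non-ASCII characters as UTF-8 rather than \uXXXX escapes.
    text = json.dumps(obj, ensure_ascii=False)
    return text.encode("utf-8") + b"\r\n"


def send_ldjson(stream, obj) -> None:
    """Write one framed object to a binary stream (hypothetical helper)."""
    stream.write(encode_ldjson(obj))


# Usage: an in-memory stream stands in for a TCP connection.
stream = io.BytesIO()
send_ldjson(stream, {"some": "thing"})
send_ldjson(stream, {"foo": 17, "bar": False, "quux": True})
```

json.dumps produces compact single-line output by default, which keeps each object on one line even though the format also permits pretty-printed objects.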

Receiving

The receiver should handle pretty-printed (multi-line) JSON.

The receiver must accept all common line endings: ‘0x0A’ (Unix), ‘0x0D’ (Mac), ‘0x0D0A’ (Windows).

Trivial implementation

A simple implementation is to accumulate received lines. Every time a line ending is encountered, an attempt must be made to parse the accumulated lines into a JSON object.

If the parsing of the accumulated lines is successful, the accumulated lines must be discarded and the parsed object given to the application code.

If the amount of unparsed, accumulated characters exceeds 16 MiB, the receiver may close the stream. Resource-constrained devices may close the stream at a lower threshold, though they must accept at least 1 KiB.
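The accumulate-and-retry approach described above might look like the following Python sketch. The class name LDJSONReceiver, the exception StreamTooLong and the feed method are assumptions made for illustration; the article itself does not define an API. The sketch accepts all three line endings, tolerates pretty-printed objects, and signals the caller when the unparsed text exceeds the threshold, leaving the decision to close the stream to the caller.

```python
import codecs
import json
import re

MAX_PENDING_CHARS = 16 * 1024 * 1024  # 16 MiB threshold mentioned in the text
_LINE_END = re.compile(r"\r\n|\r|\n")  # Windows, old Mac and Unix line endings


class StreamTooLong(Exception):
    """Raised when too much unparsed text has accumulated (hypothetical name)."""


class LDJSONReceiver:
    def __init__(self, max_chars=MAX_PENDING_CHARS):
        # An incremental decoder copes with UTF-8 sequences split across chunks.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""  # accumulated lines that have not parsed yet
        self._max_chars = max_chars

    def feed(self, data: bytes) -> list:
        """Consume a chunk of bytes and return the objects completed by it."""
        parsed = []
        text = self._decoder.decode(data)
        pos = 0
        for match in _LINE_END.finditer(text):
            self._pending += text[pos:match.end()]
            pos = match.end()
            if not self._pending.strip():
                self._pending = ""  # blank line, or the LF of a split CR LF
                continue
            try:
                obj = json.loads(self._pending)
            except json.JSONDecodeError:
                continue  # probably a pretty-printed object; keep accumulating
            parsed.append(obj)
            self._pending = ""
        self._pending += text[pos:]
        if len(self._pending) > self._max_chars:
            raise StreamTooLong("unparsed data exceeds limit")
        return parsed


# Usage: chunks may split objects and line endings at arbitrary points.
receiver = LDJSONReceiver()
first = receiver.feed(b'{"some": "thing"}\r\n{"foo": 17,\n "bar"')
second = receiver.feed(b': false}\r{"a": 1}\n')
```

Re-parsing the whole accumulated text at every line ending is simple but can be slow for very large pretty-printed objects; it is kept here because it mirrors the trivial implementation described in the text.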

Implementations

MIME type and file extensions

When using HTTP/email the MIME type for Line Delimited JSON should be application/x-ldjson.

When saved in a file, the file extension should be .ldjson or .ldj.

Many parsers handle Line Delimited JSON,[3] and the standard content type suggested for "streaming JSON" is application/json; boundary=NL.

See also

Notes and references

External links