sphinx-literalizer

sphinx-literalizer is a Sphinx extension for literalizer, which converts JSON, YAML, TOML, and JSON5 data structures to native language literal syntax (Python, TypeScript, Go, etc.).

Installation

Requires Python 3.12+.

pip install sphinx-literalizer

Usage

Add to your Sphinx conf.py:

extensions = [
    # ...
    "sphinx_literalizer",
]

Then use the literalizer directive in your .rst files:

.. literalizer:: path/to/data.json
   :language: python

This reads the data file and renders its contents as a native Python literal in a code block. Input format is auto-detected from the file extension (.json, .yaml/.yml, .toml, .json5), or can be set explicitly with :input-format:.

Directive options

:language: (required)

Target language name. Supported values: ada, bash, c, clojure, cobol, common-lisp, cpp, crystal, csharp, d, dart, dhall, elixir, elm, erlang, fortran, fsharp, forth, gleam, go, groovy, haskell, hcl, java, javascript, json5, jsonnet, julia, kotlin, lua, matlab, mojo, nim, nix, norg, objective-c, ocaml, occam, odin, perl, php, powershell, purescript, python, r, racket, raku, roc, ruby, rust, scala, scheme, sml, swift, systemverilog, tcl, toml, typescript, v, vb.net, wren, yaml, zig.

:input-format: (optional)

Input data format. If not specified, auto-detected from the file extension. Supported values: json, json5, yaml, toml.

:pre-indent-level: (optional)

Number of indent levels to prepend to each output line. Defaults to 0.

:indent: (optional)

Number of whitespace characters used for one level of indentation. Defaults to 4.

:indent-char: (optional)

Type of whitespace for indentation: spaces (default) or tabs.

:include-delimiters: (optional flag)

Include collection delimiters in the output ([] for arrays, {} for dicts).

:collection-layout: (optional)

How to render collections nested inside other collections. Supported values:

compact

Keep nested collections on one line (default).

multiline

Render non-empty nested collections with one element per line.

:include-preamble: (optional flag)

Include language preamble lines (imports, package declarations, etc.) before the generated code. For example, Go code will be preceded by package main, Rust by use std::collections::HashMap;, etc. Has no effect when the language does not require a preamble.

:language-version: (optional)

Target language version. Values are language-specific enum member names in lowercase, e.g. py39 for Python, jdk_11 for Java, and ada_2022 for Ada. Some languages expose more than one version (e.g. v2003 and v2008 for Fortran); unsupported values raise an error.

:ref-case: (optional)

Case conversion for reference markers in the input data. Supported values: camel, kebab, pascal, snake, upper_snake. When set, a single-key mapping such as {"$ref": "user_obj"} renders as a bare identifier instead of a literal dictionary.

:ref-key: (optional)

Marker key used with :ref-case: to detect references. Defaults to $ref. For example, with :ref-key: $reference the mapping {"$reference": "user_obj"} is treated as a reference marker.

:module-name: (optional)

Module or wrapper name used by languages whose :wrap-in-file: output introduces a named scope. The value is converted to the case expected by the selected language.

:record-struct-name-prefix: (optional)

Name prefix for the structs / records / classes generated by :heterogeneous-strategy: record (the auto-generated names are this prefix followed by an index, e.g. Record0). Defaults to Record. Available for Go, Java, Kotlin, Python, Rust, and Scala; using it with any other language raises an error.

:record-shape-names: (optional)

Custom names for specific record shapes generated by :heterogeneous-strategy: record, given as a semicolon-separated list of key1,key2=Name entries. Each entry maps a record’s set of keys to the name used instead of the auto-generated one; for example x,y=Point; a,b,c=Vec3. The name must be a valid PascalCase identifier for the selected language and must not collide with the auto-generated names or with another entry. Available for Go, Java, Kotlin, Rust, and Scala; using it with any other language raises an error.

:wrap-in-file: (optional flag)

Wrap the generated code in a complete file/module when the selected language supports that mode.

:date-format: (optional)

How to render YAML dates. Not all values are valid for every language. Supported values:

cpp

std::chrono::year_month_day type.

csharp

new DateOnly(...) constructor.

dart

DateTime.parse(...) constructor.

go

time.Date(...) call.

iso

Quoted ISO 8601 string (e.g. "2024-01-15"). This is the default for most languages.

java

LocalDate.of(...) constructor.

js

new Date(...) constructor (JavaScript/TypeScript).

julia

Date(...) constructor.

kotlin

LocalDate.of(...) constructor.

objc

Objective-C date representation.

php

PHP date representation.

python

datetime.date(...) constructor.

r

as.Date(...) call.

ruby

Date.new(...) constructor.

rust

NaiveDate::from_ymd_opt(...) call.

toml

TOML date literal.

yaml

YAML date literal.

:datetime-format: (optional)

How to render YAML datetimes. Not all values are valid for every language. Supported values:

cpp

std::chrono datetime type.

csharp

new DateTime(...) constructor.

dart

DateTime.parse(...) constructor.

epoch

Seconds since Unix epoch (e.g. 1705314600.0).

go

time.Date(...) call.

instant

Instant.parse(...) (Java).

iso

Quoted ISO 8601 string. This is the default for most languages.

js

new Date(...) constructor (JavaScript/TypeScript).

julia

DateTime(...) constructor.

kotlin

LocalDateTime.of(...) constructor.

objc

Objective-C datetime representation.

php

PHP datetime representation.

python

datetime.datetime(...) constructor.

r

as.POSIXct(...) call.

ruby

Time.new(...) constructor.

rust

NaiveDateTime::new(...) call.

toml

TOML datetime literal.

yaml

YAML datetime literal.

zoned

ZonedDateTime.of(...) (Java).

:sequence-format: (optional)

How to render sequences (arrays/lists). Not all values are valid for every language. Supported values:

array

Array delimiters. Available for Crystal (default), Julia (default), Rust, and many other languages.

cell_array

Cell array delimiters. Available for MATLAB (default).

initializer_list

Initializer list. Available for C++ (default).

list

List delimiters. Available for Elixir (default), Erlang (default), Python, and many other languages.

sequence

Sequence delimiters. Available for COBOL (default) and YAML (default).

slice

Slice delimiters. Available for Go (default).

table

Table delimiters. Available for Lua (default).

tuple

Tuple delimiters. Available for Crystal, Elixir, Erlang, Julia, Python (default for Python), and Rust.

vec

Vec macro (vec![...]). Available for Rust (default).

vector

Vector delimiters. Available for Clojure (default).

:set-format: (optional)

How to render sets (Python only). Supported values:

set

{} set literal (default).

frozenset

frozenset({}) constructor.

:bytes-format: (optional)

How to render binary data (Python only). Supported values:

hex

Hex-escaped bytes literal, e.g. b"\x48\x65" (default).

python

Python bytes literal, e.g. b"Hello".

:variable-name: (optional)

Wrap the output in a variable declaration or assignment using the given name. Use with :include-delimiters: to include the collection delimiters.

:existing-variable: (optional flag)

When combined with :variable-name:, produce an assignment to an existing variable (e.g. x = ...) instead of a new variable declaration (e.g. final x = ... in Dart). Has no effect without :variable-name:.

:modifiers: (optional)

Comma-separated modifier keywords to add to a new variable declaration, e.g. public,static,final. Requires :variable-name: and cannot be combined with :existing-variable:. Supported values depend on the language:

C++

static, const.

C#

public, private, protected, static, const, readonly.

Java

public, private, protected, static, final.

Rust

mut (a mutable let mut binding, so the bound value can be mutated through the binding).

:variable-type-hints: (optional)

Whether to add type hints to variable declarations. Supported values:

auto

Type hints are added when the language requires them (default).

always

Always add type annotation, e.g. my_var: dict[str, Any] = {...}. Currently available for Python only.

:comment-format: (optional)

How to render comments. Not all values are valid for every language. Supported values:

apostrophe

' comments. Available for Visual Basic.

block

Block comments (/* ... */ or equivalent). Available for C, C#, C++, Common Lisp, D, Dart, F#, Go, Groovy, Haskell, HCL, Java, JavaScript, Julia, Kotlin, Lua, MATLAB, Nim, Objective-C, PHP, PowerShell, Racket, Rust, Scala, Swift, TypeScript.

double_dash

-- comments. Available for Ada, Haskell, Lua, Occam.

double_slash

// comments. Available for C, C#, C++, D, Dart, F#, Go, Groovy, Java, JavaScript, Kotlin, Objective-C, PHP, Rust, Scala, Swift, TypeScript, Zig.

exclamation

! comments. Available for Fortran.

hash

# comments. Available for Bash, Crystal, Elixir, HCL, Julia, Mojo, Nim, Perl, PowerShell, Python, R, Ruby, TOML, YAML.

paren_star

(* ... *) comments. Available for OCaml.

percent

% comments. Available for Erlang, MATLAB, Norg.

semicolon

; comments. Available for Clojure, Common Lisp, Racket.

star_angle

*> comments. Available for COBOL.

:declaration-style: (optional)

How to declare variables. Not all values are valid for every language. Supported values:

assign

Plain assignment (x = ...). Available for Crystal, Elixir, Erlang, Haskell, HCL, Julia, MATLAB, Mojo, PHP, PowerShell, Python, R, Ruby, TOML, YAML.

auto

auto keyword. Available for C++, D.

block

Block-level declaration. Available for Norg.

const

const keyword. Available for JavaScript, TypeScript, Zig.

declare

Language-specific declaration keyword. Available for Ada, Bash.

def

def keyword. Available for Clojure, Groovy.

define

define form. Available for Racket.

defparameter

defparameter form. Available for Common Lisp.

dim

Dim keyword. Available for Visual Basic.

final

final keyword. Available for Dart.

let

let keyword. Available for F#, JavaScript, OCaml, Rust, Swift.

local

local keyword. Available for Lua.

my

my keyword. Available for Perl.

short

Short variable declaration (:=). Available for Go.

typed

Typed declaration. Available for C, COBOL, Fortran, Objective-C.

val

val keyword. Available for Kotlin, Occam, Scala.

var

var keyword. Available for C#, Java, Nim.

:dict-format: (optional)

How to render dictionaries / maps. Not all values are valid for every language. Supported values:

default

Language-default dict syntax. Available for most languages.

dict

Dict(...) constructor. Available for Julia.

dictionary

Dictionary constructor. Available for C#.

hash_map

HashMap constructor. Available for Rust.

map

Map constructor. Available for C++, JavaScript, Kotlin, Scala.

map_of_entries

Map.ofEntries(...) constructor. Available for Java.

object

Object literal syntax. Available for JavaScript, TypeScript.

struct

struct syntax. Available for MATLAB.

:integer-format: (optional)

How to render integer values. Not all values are valid for every language. Supported values:

decimal

Decimal integer literal (default for all languages).

hex

Hexadecimal integer literal. Available for JavaScript.

:numeric-separator: (optional)

Whether to use numeric separators in integer literals. Supported values:

none

No separators (default for all languages).

underscore

Underscore separators (e.g. 1_000_000). Available for JavaScript.

:numeric-style: (optional)

The numeric literal style. Supported values:

overloaded

Use overloaded numeric type classes (default for Haskell).

explicit

Wrap every numeric literal in an explicit constructor. Available for Haskell.

:string-format: (optional)

How to render string values. Supported values:

double

Double-quoted strings (default for all languages).

single

Single-quoted strings. Available for JavaScript.

:trailing-comma: (optional)

Whether to include a trailing comma after the last element in collections. Supported values:

yes

Include trailing comma. Available for C, C++, Crystal, D, Dart, Elixir, Go, Groovy, HCL, JavaScript, Julia, Kotlin, Lua, Mojo, Objective-C, Perl, PHP, Python, Ruby, Rust, Scala, Swift, TypeScript, Zig.

no

Omit trailing comma. Available for Ada, Bash, C#, Clojure, COBOL, Common Lisp, Erlang, F#, Fortran, Haskell, Java, JavaScript, MATLAB, Nim, Norg, OCaml, Occam, PowerShell, R, Racket, TOML, Visual Basic, YAML.

:empty-dict-key: (optional)

How to handle empty string keys in dictionaries. Supported values:

positional

Use positional arguments for empty keys (default). Available for R.

error

Raise an error on empty keys. Available for R.

:heterogeneous-strategy: (optional, defaults to auto)

How to render scalar collections whose elements have more than one language-level type. Defaults to auto; set it explicitly only to pin a specific representation (see the caveat under auto). Supported values:

auto (default)

Render the input with its natural representation first, and only if that fails because the data is heterogeneous, retry with each strategy the target language supports in the order given by the literalizer_heterogeneous_strategy_precedence configuration value. Homogeneous and genuinely map-shaped data keep their native form, so a single auto works across a mix of inputs without picking a strategy per project or per file. Because auto chooses a representation implicitly – record vs tuple vs tagged_enum changes the shape of the generated API – set :heterogeneous-strategy: explicitly when a specific representation is wanted for clarity.

error

Raise an error when a collection mixes scalar types. This was the default before sphinx-literalizer defaulted :heterogeneous-strategy: to auto; set it explicitly to keep a mixed-scalar collection a hard build failure.

tagged_enum

Emit a minimal tagged enum in the preamble and wrap each heterogeneous value at the call site (e.g. Value::I32(1)). Available for Rust.

record

Emit a generated struct / record / dataclass declaration in the preamble for each record-shaped mapping plus a matching literal (e.g. Record0{Name: "a", Items: []int{1}}). The generated names can be customised with :record-struct-name-prefix: and :record-shape-names:. Available for Go, Java, Kotlin, Python, Rust, and Scala.

tuple

Render a fixed-length heterogeneous scalar array as the language’s native tuple literal instead of raising (e.g. std::make_tuple(...)). Available for C++, Kotlin, Rust, Scala, and TypeScript.

See Heterogeneous strategies for the full set of strategies (including tuple, object_variant, union_type, interface, and variant), worked examples, and the per-language support matrix.

:skip-if-unrepresentable: (optional, flag)

Emit no node at all – instead of failing the build – when the input cannot be represented in the target language, including after :heterogeneous-strategy: auto exhausts its precedence. This lets a loop that renders the same data in several languages skip the languages a given input does not fit, without moving data-shape knowledge into the surrounding prose or template.

:call-style: (optional)

How literalizer-call renders function calls. Each language offers its own set of call styles; using a value a language does not support raises an error. Supported values:

positional

Pass arguments by position (e.g. myFunc({...})). Available for C, C#, C++, Crystal, D, F#, Go, Groovy, HCL, Haskell, Java, JavaScript, Julia, Kotlin, Lua, Perl, PHP, Python, R, Ruby, Rust, Scala, TypeScript.

keyword

Pass arguments by keyword (e.g. my_func(flag=True)). Available for Crystal, Groovy, Jsonnet, Julia, Kotlin, PHP, Python, R, Ruby, Scala, Swift.

named

Pass arguments by named colon syntax (e.g. MyFunc(flag: true)). Available for C#.

object

Pass arguments as a single object literal (e.g. myFunc({ obj: {...} })). Available for JavaScript, TypeScript.

prefix_keyword

Pass arguments as prefixed keyword pairs in an S-expression (e.g. (process :flag t)). Available for Common Lisp, Racket.

postfix

Push arguments before naming the function (e.g. 1 2 ADD). Available for Forth.

Example

Given a file _examples/literal.json containing:

[true, false, 42, "hello"]

the literalizer directive renders both its directive source and the generated code block:

.. literalizer:: _examples/literal.json
   :language: python
   :include-delimiters:
(
    True,
    False,
    42,
    "hello",
)

literalizer-call directive

The literalizer-call directive converts data files into function call expressions. Each top-level list element becomes a separate call (when :per-element: is set), with its values mapped positionally to the given parameter names.

Given a file _examples/calls.json containing:

[[true, 42, "hello"], [false, 99, "world"]]

the directive renders as:

.. literalizer-call:: _examples/calls.json
   :language: python
   :target-function: my_func
   :parameter-names: flag,count,name
   :per-element:
my_func(flag=True, count=42, name="hello")
my_func(flag=False, count=99, name="world")

For positional-call languages like Go, _examples/calls_go.json containing:

[[true, 42], [false, 99]]

renders as:

.. literalizer-call:: _examples/calls_go.json
   :language: go
   :target-function: myFunc
   :parameter-names: flag,count
   :per-element:
myFunc(true, 42)
myFunc(false, 99)

A no-argument constructor bound to a variable is expressed with an empty :parameter-names: (or omitted :parameter-names:), :per-element:, and a single-element source. Given _examples/no_args.yaml containing:

---
# A single row with no values: one call that takes no arguments.
- []

the directive renders the language-idiomatic no-argument construction:

.. literalizer-call:: _examples/no_args.yaml
   :language: rust
   :constructor-class: Playlist
   :per-element:
   :variable-name: p1
// A single row with no values: one call that takes no arguments.
let p1 = Playlist::new();

literalizer-call directive options

:target-function: (required unless :constructor-class: is set)

The function expression to call (e.g. my_func or throttler.should_send_notification).

:constructor-class: (required unless :target-function: is set)

A class/type name whose no-argument constructor should be called using the selected language’s idiom. For example, Playlist renders as Playlist(), new Playlist(), NewPlaylist(), Playlist.new(), or Playlist::new() depending on the language. This option cannot be combined with :target-function:.

:parameter-names: (optional)

Comma-separated parameter names, positionally mapped to each element in each row. For positional-call languages (like Go) these are unused in the output but still determine how many values to expect per row. An empty (or omitted) value means the call takes no arguments, which – combined with :per-element: over a single-element source and :variable-name: – renders a no-argument constructor bound to a variable (see the example below).

:per-element: (optional flag)

When set, each top-level list element becomes a separate function call. Without this flag, the whole literalized value is passed as a single argument.

:omit-code: (optional flag)

Omit the generated call expressions. Combine this with :include-preamble: to render only imports or other preamble lines, which is useful when a later literalizer-call block needs preamble lines at the top of a combined snippet.

:call-transform: (optional)

A template applied to each generated call. These placeholders are substituted: $call (and the $0 alias) for the rendered call expression, $index for the zero-based call position, and $zipped for the matching :zip-file: element rendered as a native literal (empty when no :zip-file: is given). For example, :call-transform: result_$index = $call.

:zip-file: (optional)

A data file whose top-level elements pair positionally with the generated calls. Each paired element is rendered as a native literal and made available to :call-transform: as $zipped, which is handy for generating assertions from a parallel file of expected results (e.g. :call-transform: assert $call == $zipped). The file is parsed with the same parser as the main source.

:zip-input-format: (optional)

The input format (json, json5, yaml, or toml) of the :zip-file:. Defaults to the format inferred from the :zip-file: extension.

:comment-file: (optional)

A text file with one line per generated call. Each non-blank line is emitted as a trailing source comment after the matching call, using the target language’s comment syntax, and a blank line emits no comment for that call. The number of lines must equal the number of generated calls. Unlike :call-transform:, the comment is placed after the statement terminator, so a comment can be attached without commenting out the terminator.

The literalizer-call directive also supports these shared options from the literalizer directive: :language:, :input-format:, :pre-indent-level:, :indent:, :indent-char:, :include-preamble:, :language-version:, :ref-case:, :ref-key:, :module-name:, :record-struct-name-prefix:, :record-shape-names:, :wrap-in-file:, :collection-layout:, :heterogeneous-strategy:, :skip-if-unrepresentable:, and all format options (e.g. :string-format:, :trailing-comma:, etc.).

Configuration

literalizer_heterogeneous_strategy_precedence

The order in which :heterogeneous-strategy: auto tries representational strategies after the natural representation fails. It defaults to ["record", "tuple", "tagged_enum", "object_variant", "variant", "union_type", "interface"] and is restricted, per directive, to the strategies the target language exposes (error is never a fallback). Set it in conf.py to change which representation auto prefers, for example to prefer a tagged union over a record:

literalizer_heterogeneous_strategy_precedence = [
    "tagged_enum",
    "record",
    "tuple",
]

Reference