CBOR extensions for Lisp and other dynamic languages
====================================================
Authors:
- Mihai Bazon <mihai.bazon@gmail.com>
- Eugene Zaikonnikov <eugene@funcall.org>
Reviewers:
- Carsten Bormann <cabo@tzi.org>
- Christian Amsüss <christian@amsuess.com>
Status: draft (discussion in progress; suggested tag values may change)
SUMMARY
| Tag | Description | Data item |
|-----+-------------+------------------|
| 280 | Symbol | STRING or ARRAY |
| 281 | Linked list | ARRAY |
| 282 | Character | UNSIGNED INTEGER |
| 283 | Object | ARRAY |
These tags allow more precise serialization of several data structures
commonly used in Lisp and in many other languages. The goal is that
deserialization yields a data structure isomorphic to the original.
In this specification, when we mention NULL, UNDEFINED, INTEGER, STRING,
ARRAY or MAP we refer to the basic CBOR data types.
1. TAG 280: SYMBOL
A Lisp symbol. Note that many other languages have a Symbol type [1].
1.1. Data item: STRING or ARRAY (clarified below)
When the data item is a STRING, it is the NAME of the symbol.
When it is an ARRAY, it has one or two elements:
[ NAME ] or [ PACKAGE, NAME ]
- NAME is a STRING: the symbol name.
- PACKAGE is a STRING or NULL.
In many languages, symbols can belong to repositories (called "packages" in
Common Lisp). We say that the symbol is "interned" into the package. In that
case, PACKAGE is the package name (STRING).
For uninterned symbols, the data item can be an array with only one element
(the symbol NAME), or it can have two elements and PACKAGE will be NULL.
Some languages support a "central" symbol repository, and reckoning that
such symbols may be more prevalent than uninterned symbols, we offer a
shortcut representation - the data item will be directly a STRING, the
symbol NAME. For example, in Common Lisp such symbols would be interned into
the KEYWORD package. In JavaScript, a decoder would use `Symbol.for` [2] to
register the symbol. Scheme too has the notion of keyword symbols [3],
although the exact syntax depends on the implementation. A decoder should
produce what is most sensible for the target language.
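The three forms of the Symbol data item can be sketched in Python as follows.
This is only an illustration: the `Tagged` class is a hypothetical stand-in
for whatever tagged-item type a CBOR library provides, and the function names
and the "keyword" / "interned" / "uninterned" labels are assumptions made for
this example, not part of the specification.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

SYMBOL_TAG = 280  # suggested value from the summary table; may change


@dataclass
class Tagged:
    """Hypothetical stand-in for a CBOR tagged item (tag number + data item)."""
    tag: int
    value: Any


def encode_symbol(name: str, package: Optional[str] = None,
                  keyword: bool = False) -> Tagged:
    """Build the tag 280 item for a symbol."""
    if keyword:
        # Shortcut form: a bare STRING names a symbol in the central repository.
        return Tagged(SYMBOL_TAG, name)
    if package is None:
        # Uninterned symbol: one-element ARRAY ([None, name] is equivalent).
        return Tagged(SYMBOL_TAG, [name])
    # Interned symbol: [PACKAGE, NAME].
    return Tagged(SYMBOL_TAG, [package, name])


def decode_symbol(item: Tagged) -> Tuple[str, Optional[str], str]:
    """Return (kind, package, name) for a tag 280 item."""
    if item.tag != SYMBOL_TAG:
        raise ValueError("not a symbol tag")
    data = item.value
    if isinstance(data, str):
        return ("keyword", None, data)
    if isinstance(data, list) and len(data) == 1:
        return ("uninterned", None, data[0])
    if isinstance(data, list) and len(data) == 2:
        package, name = data
        kind = "uninterned" if package is None else "interned"
        return (kind, package, name)
    raise ValueError("data item must be a STRING or an ARRAY of 1 or 2 elements")
```

A real decoder would map each kind to the target language's own construct
(for example the KEYWORD package in Common Lisp) instead of returning a label.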
2. TAG 281: LIST
A tag for pairs / linked lists. This data type can be used to represent a
wide range of structures, from simple lists to trees, graphs, sparse
matrices etc.
2.1. Data item: ARRAY
Items except the last one pack consecutive list elements. The last item is
special - the tail of the list. In this context, NULL marks the end of the
list (empty tail).
In the examples here, `LIST` and `LIST*` both denote an array carrying this
tag. In this notation, we include the asterisk when there is at least one
element, as this tag works similarly to the Common Lisp function `list*`,
which builds a list given a few elements and its tail.
When the length is 2, this tag effectively behaves like a pair (usually
called "cons" in Lisp). A simple encoder may choose to always use pairs to
represent a linked list, so the list (1 2 3 4) can be encoded as:
LIST*[1, LIST*[2, LIST*[3, LIST*[4, NULL]]]]
However, provided that no part of the structure is shared with other values,
an encoder may prefer to use the more compact encoding:
LIST*[1, 2, 3, 4, NULL]
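The two encodings of a proper list can be sketched in Python as below. The
`Tagged` class and both function names are hypothetical, introduced only for
this illustration. Python `None` stands for CBOR NULL.

```python
from dataclasses import dataclass
from typing import Any, List

LIST_TAG = 281  # suggested value from the summary table; may change


@dataclass
class Tagged:
    """Hypothetical stand-in for a CBOR tagged item (tag number + data item)."""
    tag: int
    value: Any


def encode_as_pairs(elements: List[Any]) -> Tagged:
    """Pair encoding: one tagged two-element array per cons cell."""
    if not elements:
        return Tagged(LIST_TAG, [])  # LIST[]
    encoded = None  # NULL marks the empty tail
    for element in reversed(elements):
        encoded = Tagged(LIST_TAG, [element, encoded])
    return encoded


def encode_compact(elements: List[Any]) -> Tagged:
    """Compact encoding: all elements in one array, then an explicit NULL tail."""
    if not elements:
        return Tagged(LIST_TAG, [])  # LIST[]
    return Tagged(LIST_TAG, list(elements) + [None])
```

The compact form loses the identity of intermediate cells, which is why it is
only appropriate when no tail of the list is shared with another value.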
By convention, the empty list can be represented as LIST[] (empty array). In
Common Lisp this would simply read back as NIL, but in other languages (such
as Scheme) the distinction might be important. A Scheme encoder may
therefore opt to use LIST[] instead of NULL to mark the end of the list /
empty tail. However, a proper decoder must treat NULL in tail position as
the empty list (end of list).
The tail is implied to be empty when the array has exactly one element.
Examples:
() => LIST[]
(1) => LIST*[1] or LIST*[1, NULL]
(1 2) => LIST*[1, 2, NULL]
(1 . 2) => LIST*[1, 2]
(1 2 3) => LIST*[1, 2, 3, NULL]
(1 2 . 3) => LIST*[1, 2, 3]
((1 . 2) (3 . 4)) => LIST*[LIST*[1, 2], LIST*[3, 4], NULL]
((1 . 2) . (3 . 4)) => LIST*[LIST*[1, 2], LIST*[3, 4]]
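A decoder for the rules above might look like the following Python sketch. It
assumes the array items have already been decoded, represents pairs with a
hypothetical `Cons` class, and maps both the empty list and CBOR NULL to
`None`. A language that distinguishes the two (such as Scheme) would use a
separate empty-list value instead.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Cons:
    """Hypothetical pair type used only for this illustration."""
    car: Any
    cdr: Any


NIL = None  # assumption: the empty list and CBOR NULL share one representation


def decode_list(items: List[Any]) -> Any:
    """Turn the data item of a tag 281 array into a chain of Cons cells."""
    if not items:
        return NIL                    # LIST[] is the empty list
    if len(items) == 1:
        return Cons(items[0], NIL)    # a single element implies an empty tail
    *elements, tail = items           # the last item is the tail
    result = NIL if tail is None else tail
    for element in reversed(elements):
        result = Cons(element, result)
    return result
```

For instance, `decode_list([1, 2])` yields the dotted pair (1 . 2), while
`decode_list([1, 2, None])` yields the proper list (1 2).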
3. TAG 282: Character (Unicode scalar value)
A single character.
3.1. Data item: UNSIGNED INTEGER
The Unicode code point.
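As a worked example of the resulting bytes, the Python sketch below encodes a
character using only the standard CBOR head encoding (major type 6 for the
tag, major type 0 for the unsigned integer). The function names are
hypothetical. Rejecting surrogate code points is an assumption drawn from the
"Unicode scalar value" wording of the heading.

```python
CHARACTER_TAG = 282  # suggested value from the summary table; may change


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if argument < 0:
        raise ValueError("argument must be non-negative")
    prefix = major_type << 5
    if argument < 24:
        return bytes([prefix | argument])
    # Additional information 24..27 selects a 1, 2, 4 or 8 byte argument.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([prefix | additional]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_character(ch: str) -> bytes:
    """Encode one character as tag 282 wrapping its code point."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(ch)
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption: surrogates are not Unicode scalar values, so reject them.
        raise ValueError("surrogate code points are not scalar values")
    return encode_head(6, CHARACTER_TAG) + encode_head(0, code_point)
```

With the suggested tag value, `encode_character("A").hex()` gives
`d9011a1841`: the tag head `d9 01 1a` followed by the integer 65 as `18 41`.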
4. TAG 283: Object Snapshot
Unlike tags 26 [4] and 27 [5], which specify how to construct an object, the
Object Snapshot tag records the internal state of an object at the time of
serialization. It encodes the object class name and properties/values map.
4.1. Data item: ARRAY (2) [ CLASS, SLOTS ]
CLASS is a Symbol (the object class name), and SLOTS (the object properties)
is a basic CBOR MAP whose keys are Symbols (property names) and whose values
are the values of those properties at the time of serialization. Unbound
slots may be omitted from the encoding, or encoded with the value UNDEFINED.
To save space, the class and slot names may be unwrapped symbols (the symbol
tag may be omitted); they are then encoded directly as a STRING (for keyword
symbols) or as an ARRAY for package-qualified symbols (refer to the Symbol
encoding above). This simplification may also be suitable for languages where
the class name and object properties are commonly plain strings (e.g. Java or
JavaScript).
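A minimal Python sketch of producing an Object Snapshot follows. It uses the
unwrapped STRING form for the class and slot names and simply omits unbound
slots. The `Tagged` class, the `snapshot` function and the `Point` example
class are all hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import Any

OBJECT_TAG = 283  # suggested value from the summary table; may change


@dataclass
class Tagged:
    """Hypothetical stand-in for a CBOR tagged item (tag number + data item)."""
    tag: int
    value: Any


def snapshot(obj: Any) -> Tagged:
    """Record the class name and the currently bound attributes of obj."""
    class_name = type(obj).__name__   # unwrapped symbol, encoded as a STRING
    slots = dict(vars(obj))           # attributes that are unbound are absent
    return Tagged(OBJECT_TAG, [class_name, slots])


class Point:
    """Hypothetical example class with two slots."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y
```

Here `snapshot(Point(1, 2))` produces the tagged array
`["Point", {"x": 1, "y": 2}]`. Nested objects would need the same treatment
applied recursively, which this sketch leaves out.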
[1] https://en.wikipedia.org/wiki/Symbol_(programming)
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Symbol/for
[3] https://docs.scheme.org/surveys/keyword-syntax/
[4] http://cbor.schmorp.de/perl-object
[5] http://cbor.schmorp.de/generic-object