1*46439007SCharles.ForsythNetwork Working Group R. Rivest 2*46439007SCharles.ForsythInternet Draft May 4, 1997 3*46439007SCharles.ForsythExpires November 4, 1997 4*46439007SCharles.Forsyth 5*46439007SCharles.Forsyth 6*46439007SCharles.Forsyth S-Expressions 7*46439007SCharles.Forsyth draft-rivest-sexp-00.txt 8*46439007SCharles.Forsyth 9*46439007SCharles.Forsyth 10*46439007SCharles.ForsythStatus of this Memo 11*46439007SCharles.Forsyth 12*46439007SCharles.Forsyth Distribution of this memo is unlimited. 13*46439007SCharles.Forsyth 14*46439007SCharles.Forsyth This document is an Internet-Draft. Internet Drafts are working 15*46439007SCharles.Forsyth documents of the Internet Engineering Task Force (IETF), its Areas, 16*46439007SCharles.Forsyth and its Working Groups. Note that other groups may also distribute 17*46439007SCharles.Forsyth working documents as Internet Drafts. 18*46439007SCharles.Forsyth 19*46439007SCharles.Forsyth Internet Drafts are draft documents valid for a maximum of six 20*46439007SCharles.Forsyth months, and may be updated, replaced, or obsoleted by other documents 21*46439007SCharles.Forsyth at any time. It is not appropriate to use Internet Drafts as 22*46439007SCharles.Forsyth reference material, or to cite them other than as a ``working draft'' 23*46439007SCharles.Forsyth or ``work in progress.'' 24*46439007SCharles.Forsyth 25*46439007SCharles.Forsyth To learn the current status of any Internet-Draft, please check the 26*46439007SCharles.Forsyth ``1id-abstracts.txt'' listing contained in the internet-drafts Shadow 27*46439007SCharles.Forsyth Directories on: ftp.is.co.za (Africa), nic.nordu.net (Europe), 28*46439007SCharles.Forsyth ds.internic.net (US East Coast), ftp.isi.edu (US West Coast), 29*46439007SCharles.Forsyth or munnari.oz.au (Pacific Rim) 30*46439007SCharles.Forsyth 31*46439007SCharles.Forsyth 32*46439007SCharles.ForsythAbstract 33*46439007SCharles.Forsyth 34*46439007SCharles.ForsythThis memo describes a data structure called "S-expressions" that are 35*46439007SCharles.Forsythsuitable for representing arbitrary complex data structures. We make 36*46439007SCharles.Forsythprecise the encodings of S-expressions: we give a "canonical form" for 37*46439007SCharles.ForsythS-expressions, described two "transport" representations, and also 38*46439007SCharles.Forsythdescribe an "advanced" format for display to people. 39*46439007SCharles.Forsyth 40*46439007SCharles.Forsyth 41*46439007SCharles.Forsyth 42*46439007SCharles.Forsyth1. Introduction 43*46439007SCharles.Forsyth 44*46439007SCharles.ForsythS-expressions are data structures for representing complex data. They 45*46439007SCharles.Forsythare either byte-strings ("octet-strings") or lists of simpler 46*46439007SCharles.ForsythS-expressions. Here is a sample S-expression: 47*46439007SCharles.Forsyth 48*46439007SCharles.Forsyth (snicker "abc" (#03# |YWJj|)) 49*46439007SCharles.Forsyth 50*46439007SCharles.ForsythIt is a list of length three: 51*46439007SCharles.Forsyth 52*46439007SCharles.Forsyth -- the octet-string "snicker" 53*46439007SCharles.Forsyth 54*46439007SCharles.Forsyth -- the octet-string "abc" 55*46439007SCharles.Forsyth 56*46439007SCharles.Forsyth -- a sub-list containing two elements: 57*46439007SCharles.Forsyth - the hexadecimal constant #03# 58*46439007SCharles.Forsyth - the base-64 constant |YWJj| (which is the same as "abc") 59*46439007SCharles.Forsyth 60*46439007SCharles.ForsythThis note gives a specific proposal for constructing and utilizing 61*46439007SCharles.ForsythS-expressions. The proposal is independent of any particular application. 62*46439007SCharles.Forsyth 63*46439007SCharles.ForsythHere are the design goals for S-expressions: 64*46439007SCharles.Forsyth 65*46439007SCharles.Forsyth -- generality: S-expressions should be good at representing arbitrary 66*46439007SCharles.Forsyth data. 67*46439007SCharles.Forsyth 68*46439007SCharles.Forsyth -- readability: it should be easy for someone to examine and 69*46439007SCharles.Forsyth understand the structure of an S-expression. 70*46439007SCharles.Forsyth 71*46439007SCharles.Forsyth -- economy: S-expressions should represent data compactly. 72*46439007SCharles.Forsyth 73*46439007SCharles.Forsyth -- tranportability: S-expressions should be easy to transport 74*46439007SCharles.Forsyth over communication media (such as email) that are known to be 75*46439007SCharles.Forsyth less than perfect. 76*46439007SCharles.Forsyth 77*46439007SCharles.Forsyth -- flexibility: S-expressions should make it relatively simple to 78*46439007SCharles.Forsyth modify and extend data structures. 79*46439007SCharles.Forsyth 80*46439007SCharles.Forsyth -- canonicalization: it should be easy to produce a unique 81*46439007SCharles.Forsyth "canonical" form of an S-expression, for digital signature purposes. 82*46439007SCharles.Forsyth 83*46439007SCharles.Forsyth -- efficiency: S-expressions should admit in-memory representations 84*46439007SCharles.Forsyth that allow efficient processing. 85*46439007SCharles.Forsyth 86*46439007SCharles.Forsyth 87*46439007SCharles.ForsythSection 2 gives an introduction to S-expressions. 88*46439007SCharles.ForsythSection 3 discusses the character sets used. 89*46439007SCharles.ForsythSection 4 presents the various representations of octet-strings. 90*46439007SCharles.ForsythSection 5 describes how to represent lists. 91*46439007SCharles.ForsythSection 6 discusses how S-expressions are represented for various uses. 92*46439007SCharles.ForsythSection 7 gives a BNF syntax for S-expressions. 93*46439007SCharles.ForsythSection 8 talks about how S-expressions might be represented in memory. 94*46439007SCharles.ForsythSection 9 briefly describes implementations for handling S-expressions. 95*46439007SCharles.ForsythSection 10 discusses how applications might utilize S-expressions. 96*46439007SCharles.ForsythSection 11 gives historical notes on S-expressions. 97*46439007SCharles.ForsythSection 12 gives references. 98*46439007SCharles.Forsyth 99*46439007SCharles.Forsyth2. S-expressions -- informal introduction 100*46439007SCharles.Forsyth 101*46439007SCharles.ForsythInformally, an S-expression is either: 102*46439007SCharles.Forsyth -- an octet-string, or 103*46439007SCharles.Forsyth -- a finite list of simpler S-expressions. 104*46439007SCharles.Forsyth 105*46439007SCharles.ForsythAn octet-string is a finite sequence of eight-bit octets. There may be 106*46439007SCharles.Forsythmany different but equivalent ways of representing an octet-string 107*46439007SCharles.Forsyth 108*46439007SCharles.Forsyth abc -- as a token 109*46439007SCharles.Forsyth 110*46439007SCharles.Forsyth "abc" -- as a quoted string 111*46439007SCharles.Forsyth 112*46439007SCharles.Forsyth #616263# -- as a hexadecimal string 113*46439007SCharles.Forsyth 114*46439007SCharles.Forsyth 3:abc -- as a length-prefixed "verbatim" encoding 115*46439007SCharles.Forsyth 116*46439007SCharles.Forsyth {MzphYmM=} -- as a base-64 encoding of the verbatim encoding 117*46439007SCharles.Forsyth (that is, an encoding of "3:abc") 118*46439007SCharles.Forsyth 119*46439007SCharles.Forsyth |YWJj| -- as a base-64 encoding of the octet-string "abc" 120*46439007SCharles.Forsyth 121*46439007SCharles.ForsythThese encodings are all equivalent; they all denote the same octet string. 122*46439007SCharles.Forsyth 123*46439007SCharles.ForsythWe will give details of these encodings later on, and also describe how to 124*46439007SCharles.Forsythgive a "display type" to a byte string. 125*46439007SCharles.Forsyth 126*46439007SCharles.ForsythA list is a finite sequence of zero or more simpler S-expressions. A list 127*46439007SCharles.Forsythmay be represented by using parentheses to surround the sequence of encodings 128*46439007SCharles.Forsythof its elements, as in: 129*46439007SCharles.Forsyth 130*46439007SCharles.Forsyth (abc (de #6667#) "ghi jkl") 131*46439007SCharles.Forsyth 132*46439007SCharles.ForsythAs we see, there is variability possible in the encoding of an 133*46439007SCharles.ForsythS-expression. In some cases, it is desirable to standardize or 134*46439007SCharles.Forsythrestrict the encodings; in other cases it is desirable to have no 135*46439007SCharles.Forsythrestrictions. The following are the target cases we aim to handle: 136*46439007SCharles.Forsyth 137*46439007SCharles.Forsyth -- a "transport" encoding for transporting the S-expression between 138*46439007SCharles.Forsyth computers. 139*46439007SCharles.Forsyth 140*46439007SCharles.Forsyth -- a "canonical" encoding, used when signing the S-expression. 141*46439007SCharles.Forsyth 142*46439007SCharles.Forsyth -- an "advanced" encoding used for input/output to people. 143*46439007SCharles.Forsyth 144*46439007SCharles.Forsyth -- an "in-memory" encoding used for processing the S-expression in 145*46439007SCharles.Forsyth the computer. 146*46439007SCharles.Forsyth 147*46439007SCharles.ForsythThese need not be different; in this proposal the canonical encoding 148*46439007SCharles.Forsythis the same as the transport encoding, for example. In this note we 149*46439007SCharles.Forsythpropose (related) encoding techniques for each of these uses. 150*46439007SCharles.Forsyth 151*46439007SCharles.Forsyth3. Character set 152*46439007SCharles.Forsyth 153*46439007SCharles.ForsythWe will be describing encodings of S-expressions. Except when giving 154*46439007SCharles.Forsyth"verbatim" encodings, the character set used is limited to the following 155*46439007SCharles.Forsythcharacters in US-ASCII: 156*46439007SCharles.Forsyth Alphabetic: A B ... Z a b ... z 157*46439007SCharles.Forsyth numeric: 0 1 ... 9 158*46439007SCharles.Forsyth whitespace: space, horizontal tab, vertical tab, form-feed 159*46439007SCharles.Forsyth carriage-return, line-feed 160*46439007SCharles.Forsyth The following graphics characters, which we call "pseudo-alphabetic": 161*46439007SCharles.Forsyth - hyphen or minus 162*46439007SCharles.Forsyth . period 163*46439007SCharles.Forsyth / slash 164*46439007SCharles.Forsyth _ underscore 165*46439007SCharles.Forsyth : colon 166*46439007SCharles.Forsyth * asterisk 167*46439007SCharles.Forsyth + plus 168*46439007SCharles.Forsyth = equal 169*46439007SCharles.Forsyth The following graphics characters, which are "reserved punctuation": 170*46439007SCharles.Forsyth ( left parenthesis 171*46439007SCharles.Forsyth ) right parenthesis 172*46439007SCharles.Forsyth [ left bracket 173*46439007SCharles.Forsyth ] right bracket 174*46439007SCharles.Forsyth { left brace 175*46439007SCharles.Forsyth } right brace 176*46439007SCharles.Forsyth | vertical bar 177*46439007SCharles.Forsyth # number sign 178*46439007SCharles.Forsyth " double quote 179*46439007SCharles.Forsyth & ampersand 180*46439007SCharles.Forsyth \ backslash 181*46439007SCharles.Forsyth The following characters are unused and unavailable, except in 182*46439007SCharles.Forsyth "verbatim" encodings: 183*46439007SCharles.Forsyth ! exclamation point 184*46439007SCharles.Forsyth % percent 185*46439007SCharles.Forsyth ^ circumflex 186*46439007SCharles.Forsyth ~ tilde 187*46439007SCharles.Forsyth ; semicolon 188*46439007SCharles.Forsyth ' apostrophe 189*46439007SCharles.Forsyth , comma 190*46439007SCharles.Forsyth < less than 191*46439007SCharles.Forsyth > greater than 192*46439007SCharles.Forsyth ? question mark 193*46439007SCharles.Forsyth 194*46439007SCharles.Forsyth 195*46439007SCharles.Forsyth4. Octet string representations 196*46439007SCharles.Forsyth 197*46439007SCharles.ForsythThis section describes in detail the ways in which an octet-string may 198*46439007SCharles.Forsythbe represented. 199*46439007SCharles.Forsyth 200*46439007SCharles.ForsythWe recall that an octet-string is any finite sequence of octets, and 201*46439007SCharles.Forsyththat the octet-string may have length zero. 202*46439007SCharles.Forsyth 203*46439007SCharles.Forsyth 204*46439007SCharles.Forsyth4.1 Verbatim representation 205*46439007SCharles.Forsyth 206*46439007SCharles.ForsythA verbatim encoding of an octet string consists of four parts: 207*46439007SCharles.Forsyth 208*46439007SCharles.Forsyth -- the length (number of octets) of the octet-string, 209*46439007SCharles.Forsyth given in decimal most significant digit first, with 210*46439007SCharles.Forsyth no leading zeros. 211*46439007SCharles.Forsyth 212*46439007SCharles.Forsyth -- a colon ":" 213*46439007SCharles.Forsyth 214*46439007SCharles.Forsyth -- the octet string itself, verbatim. 215*46439007SCharles.Forsyth 216*46439007SCharles.ForsythThere are no blanks or whitespace separating the parts. No "escape 217*46439007SCharles.Forsythsequences" are interpreted in the octet string. This encoding is also 218*46439007SCharles.Forsythcalled a "binary" or "raw" encoding. 219*46439007SCharles.Forsyth 220*46439007SCharles.ForsythHere are some sample verbatim encodings: 221*46439007SCharles.Forsyth 222*46439007SCharles.Forsyth 3:abc 223*46439007SCharles.Forsyth 7:subject 224*46439007SCharles.Forsyth 4::::: 225*46439007SCharles.Forsyth 12:hello world! 226*46439007SCharles.Forsyth 10:abcdefghij 227*46439007SCharles.Forsyth 0: 228*46439007SCharles.Forsyth 229*46439007SCharles.Forsyth4.2 Quoted-string representation 230*46439007SCharles.Forsyth 231*46439007SCharles.ForsythThe quoted-string representation of an octet-string consists of: 232*46439007SCharles.Forsyth 233*46439007SCharles.Forsyth -- an optional decimal length field 234*46439007SCharles.Forsyth 235*46439007SCharles.Forsyth -- an initial double-quote (") 236*46439007SCharles.Forsyth 237*46439007SCharles.Forsyth -- the octet string with "C" escape conventions (\n,etc) 238*46439007SCharles.Forsyth 239*46439007SCharles.Forsyth -- a final double-quote (") 240*46439007SCharles.Forsyth 241*46439007SCharles.ForsythThe specified length is the length of the resulting string after any 242*46439007SCharles.Forsythescape sequences have been handled. The string does not have any 243*46439007SCharles.Forsyth"terminating NULL" that C includes, and the length does not count such 244*46439007SCharles.Forsytha character. 245*46439007SCharles.Forsyth 246*46439007SCharles.ForsythThe length is optional. 247*46439007SCharles.Forsyth 248*46439007SCharles.ForsythThe escape conventions within the quoted string are as follows (these follow 249*46439007SCharles.Forsyththe "C" programming language conventions, with an extension for 250*46439007SCharles.Forsythignoring line terminators of just LF or CRLF): 251*46439007SCharles.Forsyth \b -- backspace 252*46439007SCharles.Forsyth \t -- horizontal tab 253*46439007SCharles.Forsyth \v -- vertical tab 254*46439007SCharles.Forsyth \n -- new-line 255*46439007SCharles.Forsyth \f -- form-feed 256*46439007SCharles.Forsyth \r -- carriage-return 257*46439007SCharles.Forsyth \" -- double-quote 258*46439007SCharles.Forsyth \' -- single-quote 259*46439007SCharles.Forsyth \\ -- back-slash 260*46439007SCharles.Forsyth \ooo -- character with octal value ooo (all three digits 261*46439007SCharles.Forsyth must be present) 262*46439007SCharles.Forsyth \xhh -- character with hexadecimal value hh (both digits 263*46439007SCharles.Forsyth must be present) 264*46439007SCharles.Forsyth \<carriage-return> -- causes carriage-return to be ignored. 265*46439007SCharles.Forsyth \<line-feed> -- causes linefeed to be ignored 266*46439007SCharles.Forsyth \<carriage-return><line-feed> -- causes CRLF to be ignored. 267*46439007SCharles.Forsyth \<line-feed><carriage-return> -- causes LFCR to be ignored. 268*46439007SCharles.Forsyth 269*46439007SCharles.ForsythHere are some examples of quoted-string encodings: 270*46439007SCharles.Forsyth 271*46439007SCharles.Forsyth "subject" 272*46439007SCharles.Forsyth "hi there" 273*46439007SCharles.Forsyth 7"subject" 274*46439007SCharles.Forsyth 3"\n\n\n" 275*46439007SCharles.Forsyth "This has\n two lines." 276*46439007SCharles.Forsyth "This has\ 277*46439007SCharles.Forsyth one." 278*46439007SCharles.Forsyth "" 279*46439007SCharles.Forsyth 280*46439007SCharles.Forsyth4.3 Token representation 281*46439007SCharles.Forsyth 282*46439007SCharles.ForsythAn octet string that meets the following conditions may be given 283*46439007SCharles.Forsythdirectly as a "token". 284*46439007SCharles.Forsyth 285*46439007SCharles.Forsyth -- it does not begin with a digit 286*46439007SCharles.Forsyth 287*46439007SCharles.Forsyth -- it contains only characters that are 288*46439007SCharles.Forsyth -- alphabetic (upper or lower case), 289*46439007SCharles.Forsyth -- numeric, or 290*46439007SCharles.Forsyth -- one of the eight "pseudo-alphabetic" punctuation marks: 291*46439007SCharles.Forsyth - . / _ : * + = 292*46439007SCharles.Forsyth (Note: upper and lower case are not equivalent.) 293*46439007SCharles.Forsyth (Note: A token may begin with punctuation, including ":"). 294*46439007SCharles.Forsyth 295*46439007SCharles.ForsythHere are some examples of token representations: 296*46439007SCharles.Forsyth 297*46439007SCharles.Forsyth subject 298*46439007SCharles.Forsyth not-before 299*46439007SCharles.Forsyth class-of-1997 300*46439007SCharles.Forsyth //microsoft.com/names/smith 301*46439007SCharles.Forsyth * 302*46439007SCharles.Forsyth 303*46439007SCharles.Forsyth 304*46439007SCharles.Forsyth4.4 Hexadecimal representation 305*46439007SCharles.Forsyth 306*46439007SCharles.ForsythAn octet-string may be represented with a hexadecimal encoding consisting of: 307*46439007SCharles.Forsyth 308*46439007SCharles.Forsyth -- an (optional) decimal length of the octet string 309*46439007SCharles.Forsyth 310*46439007SCharles.Forsyth -- a sharp-sign "#" 311*46439007SCharles.Forsyth 312*46439007SCharles.Forsyth -- a hexadecimal encoding of the octet string, with each octet 313*46439007SCharles.Forsyth represented with two hexadecimal digits, most significant 314*46439007SCharles.Forsyth digit first. 315*46439007SCharles.Forsyth 316*46439007SCharles.Forsyth -- a sharp-sign "#" 317*46439007SCharles.Forsyth 318*46439007SCharles.ForsythThere may be whitespace inserted in the midst of the hexadecimal 319*46439007SCharles.Forsythencoding arbitrarily; it is ignored. It is an error to have 320*46439007SCharles.Forsythcharacters other than whitespace and hexadecimal digits. 321*46439007SCharles.Forsyth 322*46439007SCharles.ForsythHere are some examples of hexadecimal encodings: 323*46439007SCharles.Forsyth 324*46439007SCharles.Forsyth #616263# -- represents "abc" 325*46439007SCharles.Forsyth 3#616263# -- also represents "abc" 326*46439007SCharles.Forsyth # 616 327*46439007SCharles.Forsyth 263 # -- also represents "abc" 328*46439007SCharles.Forsyth 329*46439007SCharles.Forsyth 330*46439007SCharles.Forsyth4.5 Base-64 representation 331*46439007SCharles.Forsyth 332*46439007SCharles.ForsythAn octet-string may be represented in a base-64 coding consisting of: 333*46439007SCharles.Forsyth 334*46439007SCharles.Forsyth -- an (optional) decimal length of the octet string 335*46439007SCharles.Forsyth 336*46439007SCharles.Forsyth -- a vertical bar "|" 337*46439007SCharles.Forsyth 338*46439007SCharles.Forsyth -- the rfc 1521 base-64 encoding of the octet string. 339*46439007SCharles.Forsyth 340*46439007SCharles.Forsyth -- a final vertical bar "|" 341*46439007SCharles.Forsyth 342*46439007SCharles.ForsythThe base-64 encoding uses only the characters 343*46439007SCharles.Forsyth A-Z a-z 0-9 + / = 344*46439007SCharles.ForsythIt produces four characters of output for each three octets of input. 345*46439007SCharles.ForsythIf the input has one or two left-over octets of input, it produces an 346*46439007SCharles.Forsythoutput block of length four ending in two or one equals signs, respectively. 347*46439007SCharles.ForsythOutput routines compliant with this standard MUST output the equals signs 348*46439007SCharles.Forsythas specified. Input routines MAY accept inputs where the equals signs are 349*46439007SCharles.Forsythdropped. 350*46439007SCharles.Forsyth 351*46439007SCharles.ForsythThere may be whitespace inserted in the midst of the base-64 encoding 352*46439007SCharles.Forsytharbitrarily; it is ignored. It is an error to have characters other 353*46439007SCharles.Forsyththan whitespace and base-64 characters. 354*46439007SCharles.Forsyth 355*46439007SCharles.ForsythHere are some examples of base-64 encodings: 356*46439007SCharles.Forsyth 357*46439007SCharles.Forsyth |YWJj| -- represents "abc" 358*46439007SCharles.Forsyth | Y W 359*46439007SCharles.Forsyth J j | -- also represents "abc" 360*46439007SCharles.Forsyth 3|YWJj| -- also represents "abc" 361*46439007SCharles.Forsyth |YWJjZA==| -- represents "abcd" 362*46439007SCharles.Forsyth |YWJjZA| -- also represents "abcd" 363*46439007SCharles.Forsyth 364*46439007SCharles.Forsyth 365*46439007SCharles.Forsyth4.6 Display hint 366*46439007SCharles.Forsyth 367*46439007SCharles.ForsythAny octet string may be preceded by a single "display hint". 368*46439007SCharles.Forsyth 369*46439007SCharles.ForsythThe purposes of the display hint is to provide information on how 370*46439007SCharles.Forsythto display the octet string to a user. It has no other function. 371*46439007SCharles.ForsythMany of the MIME types work here. 372*46439007SCharles.Forsyth 373*46439007SCharles.ForsythA display-hint is an octet string surrounded by square brackets. 374*46439007SCharles.ForsythThere may be whitespace separating the octet string from the 375*46439007SCharles.Forsythsurrounding brackets. Any of the legal formats may be used for the 376*46439007SCharles.Forsythoctet string. 377*46439007SCharles.Forsyth 378*46439007SCharles.ForsythHere are some examples of display-hints: 379*46439007SCharles.Forsyth 380*46439007SCharles.Forsyth [image/gif] 381*46439007SCharles.Forsyth [URI] 382*46439007SCharles.Forsyth [charset=unicode-1-1] 383*46439007SCharles.Forsyth [text/richtext] 384*46439007SCharles.Forsyth [application/postscript] 385*46439007SCharles.Forsyth [audio/basic] 386*46439007SCharles.Forsyth ["http://abc.com/display-types/funky.html"] 387*46439007SCharles.Forsyth 388*46439007SCharles.ForsythIn applications an octet-string that is untyped may be considered to have 389*46439007SCharles.Forsytha pre-specified "default" mime type. The mime type 390*46439007SCharles.Forsyth "text/plain; charset=iso-8859-1" 391*46439007SCharles.Forsythis the standard default. 392*46439007SCharles.Forsyth 393*46439007SCharles.Forsyth 394*46439007SCharles.Forsyth4.7 Equality of octet-strings 395*46439007SCharles.Forsyth 396*46439007SCharles.ForsythTwo octet strings are considered to be "equal" if and only if they 397*46439007SCharles.Forsythhave the same display hint and the same data octet strings. 398*46439007SCharles.Forsyth 399*46439007SCharles.ForsythNote that octet-strings are "case-sensitive"; the octet-string "abc" 400*46439007SCharles.Forsythis not equal to the octet-string "ABC". 401*46439007SCharles.Forsyth 402*46439007SCharles.ForsythAn untyped octet-string can be compared to another octet-string (typed 403*46439007SCharles.Forsythor not) by considering it as a typed octet-string with the default 404*46439007SCharles.Forsythmime-type. 405*46439007SCharles.Forsyth 406*46439007SCharles.Forsyth 407*46439007SCharles.Forsyth5. Lists 408*46439007SCharles.Forsyth 409*46439007SCharles.ForsythJust as with octet-strings, there are several ways to represent an 410*46439007SCharles.ForsythS-expression. Whitespace may be used to separate list elements, but 411*46439007SCharles.Forsyththey are only required to separate two octet strings when otherwise 412*46439007SCharles.Forsyththe two octet strings might be interpreted as one, as when one token 413*46439007SCharles.Forsythfollows another. Also, whitespace may follow the initial left 414*46439007SCharles.Forsythparenthesis, or precede the final right parenthesis. 415*46439007SCharles.Forsyth 416*46439007SCharles.ForsythHere are some examples of encodings of lists: 417*46439007SCharles.Forsyth 418*46439007SCharles.Forsyth (a b c) 419*46439007SCharles.Forsyth 420*46439007SCharles.Forsyth ( a ( b c ) ( ( d e ) ( e f ) ) ) 421*46439007SCharles.Forsyth 422*46439007SCharles.Forsyth (11:certificate(6:issuer3:bob)(7:subject5:alice)) 423*46439007SCharles.Forsyth 424*46439007SCharles.Forsyth ({3Rt=} "1997" murphy 3:{XC++}) 425*46439007SCharles.Forsyth 426*46439007SCharles.Forsyth 427*46439007SCharles.Forsyth6. Representation types 428*46439007SCharles.Forsyth 429*46439007SCharles.ForsythThere are three "types" of representations: 430*46439007SCharles.Forsyth 431*46439007SCharles.Forsyth -- canonical 432*46439007SCharles.Forsyth 433*46439007SCharles.Forsyth -- basic transport 434*46439007SCharles.Forsyth 435*46439007SCharles.Forsyth -- advanced transport 436*46439007SCharles.Forsyth 437*46439007SCharles.ForsythThe first two MUST be supported by any implementation; the last is 438*46439007SCharles.Forsythoptional. 439*46439007SCharles.Forsyth 440*46439007SCharles.Forsyth 441*46439007SCharles.Forsyth6.1 Canonical representation 442*46439007SCharles.Forsyth 443*46439007SCharles.ForsythThis canonical representation is used for digital signature purposes, 444*46439007SCharles.Forsythtransmission, etc. It is uniquely defined for each S-expression. It 445*46439007SCharles.Forsythis not particularly readable, but that is not the point. It is 446*46439007SCharles.Forsythintended to be very easy to parse, to be reasonably economical, and to 447*46439007SCharles.Forsythbe unique for any S-expression. 448*46439007SCharles.Forsyth 449*46439007SCharles.ForsythThe "canonical" form of an S-expression represents each octet-string 450*46439007SCharles.Forsythin verbatim mode, and represents each list with no blanks separating 451*46439007SCharles.Forsythelements from each other or from the surrounding parentheses. 452*46439007SCharles.Forsyth 453*46439007SCharles.ForsythHere are some examples of canonical representations of S-expressions: 454*46439007SCharles.Forsyth 455*46439007SCharles.Forsyth (6:issuer3:bob) 456*46439007SCharles.Forsyth 457*46439007SCharles.Forsyth (4:icon[12:image/bitmap]9:xxxxxxxxx) 458*46439007SCharles.Forsyth 459*46439007SCharles.Forsyth (7:subject(3:ref5:alice6:mother)) 460*46439007SCharles.Forsyth 461*46439007SCharles.Forsyth 462*46439007SCharles.Forsyth6.2 Basic transport representation 463*46439007SCharles.Forsyth 464*46439007SCharles.ForsythThere are two forms of the "basic transport" representation: 465*46439007SCharles.Forsyth 466*46439007SCharles.Forsyth -- the canonical representation 467*46439007SCharles.Forsyth 468*46439007SCharles.Forsyth -- an rfc-2045 base-64 representation of the canonical representation, 469*46439007SCharles.Forsyth surrounded by braces. 470*46439007SCharles.Forsyth 471*46439007SCharles.ForsythThe transport mechanism is intended to provide a universal means of 472*46439007SCharles.Forsythrepresenting S-expressions for transport from one machine to another. 473*46439007SCharles.Forsyth 474*46439007SCharles.ForsythHere are some examples of an S-expression represented in basic 475*46439007SCharles.Forsythtransport mode: 476*46439007SCharles.Forsyth 477*46439007SCharles.Forsyth (1:a1:b1:c) 478*46439007SCharles.Forsyth 479*46439007SCharles.Forsyth {KDE6YTE6YjE6YykA} 480*46439007SCharles.Forsyth 481*46439007SCharles.Forsyth (this is the same S-expression encoded in base-64) 482*46439007SCharles.Forsyth 483*46439007SCharles.ForsythThere is a difference between the brace notation for base-64 used here 484*46439007SCharles.Forsythand the || notation for base-64'd octet-strings described above. Here 485*46439007SCharles.Forsyththe base-64 contents are converted to octets, and then re-scanned as 486*46439007SCharles.Forsythif they were given originally as octets. With the || notation, the 487*46439007SCharles.Forsythcontents are just turned into an octet-string. 488*46439007SCharles.Forsyth 489*46439007SCharles.Forsyth 490*46439007SCharles.Forsyth6.3 Advanced transport representation 491*46439007SCharles.Forsyth 492*46439007SCharles.ForsythThe "advanced transport" representation is intended to provide more 493*46439007SCharles.Forsythflexible and readable notations for documentation, design, debugging, 494*46439007SCharles.Forsythand (in some cases) user interface. 495*46439007SCharles.Forsyth 496*46439007SCharles.ForsythThe advanced transport representation allows all of the representation 497*46439007SCharles.Forsythforms described above, include quoted strings, base-64 and hexadecimal 498*46439007SCharles.Forsythrepresentation of strings, tokens, representations of strings with 499*46439007SCharles.Forsythomitted lengths, and so on. 500*46439007SCharles.Forsyth 501*46439007SCharles.Forsyth 502*46439007SCharles.Forsyth7. BNF for syntax 503*46439007SCharles.Forsyth 504*46439007SCharles.ForsythWe give separate BNF's for canonical and advanced forms of S-expressions. 505*46439007SCharles.ForsythWe use the following notation: 506*46439007SCharles.Forsyth <x>* means 0 or more occurrences of <x> 507*46439007SCharles.Forsyth <x>+ means 1 or more occurrences of <x> 508*46439007SCharles.Forsyth <x>? means 0 or 1 occurrences of <x> 509*46439007SCharles.Forsyth parentheses are used for grouping, as in (<x> | <y>)* 510*46439007SCharles.Forsyth 511*46439007SCharles.ForsythFor canonical and basic transport: 512*46439007SCharles.Forsyth 513*46439007SCharles.Forsyth<sexpr> :: <string> | <list> 514*46439007SCharles.Forsyth<string> :: <display>? <simple-string> ; 515*46439007SCharles.Forsyth<simple-string> :: <raw> ; 516*46439007SCharles.Forsyth<display> :: "[" <simple-string> "]" ; 517*46439007SCharles.Forsyth<raw> :: <decimal> ":" <bytes> ; 518*46439007SCharles.Forsyth<decimal> :: <decimal-digit>+ ; 519*46439007SCharles.Forsyth -- decimal numbers should have no unnecessary leading zeros 520*46439007SCharles.Forsyth<bytes> -- any string of bytes, of the indicated length 521*46439007SCharles.Forsyth<list> :: "(" <sexp>* ")" ; 522*46439007SCharles.Forsyth<decimal-digit> :: "0" | ... | "9" ; 523*46439007SCharles.Forsyth 524*46439007SCharles.ForsythFor advanced transport: 525*46439007SCharles.Forsyth 526*46439007SCharles.Forsyth<sexpr> :: <string> | <list> 527*46439007SCharles.Forsyth<string> :: <display>? <simple-string> ; 528*46439007SCharles.Forsyth<simple-string> :: <raw> | <token> | <base-64> | <hexadecimal> | 529*46439007SCharles.Forsyth <quoted-string> ; 530*46439007SCharles.Forsyth<display> :: "[" <simple-string> "]" ; 531*46439007SCharles.Forsyth<raw> :: <decimal> ":" <bytes> ; 532*46439007SCharles.Forsyth<decimal> :: <decimal-digit>+ ; 533*46439007SCharles.Forsyth -- decimal numbers should have no unnecessary leading zeros 534*46439007SCharles.Forsyth<bytes> -- any string of bytes, of the indicated length 535*46439007SCharles.Forsyth<token> :: <tokenchar>+ ; 536*46439007SCharles.Forsyth<base-64> :: <decimal>? "|" ( <base-64-char> | <whitespace> )* "|" ; 537*46439007SCharles.Forsyth<hexadecimal> :: "#" ( <hex-digit> | <white-space> )* "#" ; 538*46439007SCharles.Forsyth<quoted-string> :: <decimal>? <quoted-string-body> 539*46439007SCharles.Forsyth<quoted-string-body> :: "\"" <bytes> "\"" 540*46439007SCharles.Forsyth<list> :: "(" ( <sexp> | <whitespace> )* ")" ; 541*46439007SCharles.Forsyth<whitespace> :: <whitespace-char>* ; 542*46439007SCharles.Forsyth<token-char> :: <alpha> | <decimal-digit> | <simple-punc> ; 543*46439007SCharles.Forsyth<alpha> :: <upper-case> | <lower-case> | <digit> ; 544*46439007SCharles.Forsyth<lower-case> :: "a" | ... | "z" ; 545*46439007SCharles.Forsyth<upper-case> :: "A" | ... | "Z" ; 546*46439007SCharles.Forsyth<decimal-digit> :: "0" | ... | "9" ; 547*46439007SCharles.Forsyth<hex-digit> :: <decimal-digit> | "A" | ... | "F" | "a" | ... | "f" ; 548*46439007SCharles.Forsyth<simple-punc> :: "-" | "." | "/" | "_" | ":" | "*" | "+" | "=" ; 549*46439007SCharles.Forsyth<whitespace-char> :: " " | "\t" | "\r" | "\n" ; 550*46439007SCharles.Forsyth<base-64-char> :: <alpha> | <decimal-digit> | "+" | "/" | "=" ; 551*46439007SCharles.Forsyth<null> :: "" ; 552*46439007SCharles.Forsyth 553*46439007SCharles.Forsyth8. In-memory representations 554*46439007SCharles.Forsyth 555*46439007SCharles.ForsythFor processing, the S-expression would typically be parsed and represented 556*46439007SCharles.Forsythin memory in a more more amenable to efficient processing. We suggest 557*46439007SCharles.Forsythtwo alternatives: 558*46439007SCharles.Forsyth 559*46439007SCharles.Forsyth -- "list-structure" 560*46439007SCharles.Forsyth 561*46439007SCharles.Forsyth -- "array-layout" 562*46439007SCharles.Forsyth 563*46439007SCharles.ForsythWe only sketch these here, as they are only suggestive. The code referenced 564*46439007SCharles.Forsythbelow illustrates these styles in more detail. 565*46439007SCharles.Forsyth 566*46439007SCharles.Forsyth 567*46439007SCharles.Forsyth8.1. List-structure memory representation 568*46439007SCharles.Forsyth 569*46439007SCharles.ForsythHere there are separate records for simple-strings, strings, and 570*46439007SCharles.Forsythlists. An S-expression of the form ("abc" "de") would require two 571*46439007SCharles.Forsythrecords for the simple strings, two for the strings, and two for the 572*46439007SCharles.Forsythlist elements. This is a fairly conventional representation, and 573*46439007SCharles.Forsythdetails are omitted here. 574*46439007SCharles.Forsyth 575*46439007SCharles.Forsyth8.2 Array-layout memory representation 576*46439007SCharles.Forsyth 577*46439007SCharles.ForsythHere each S-expression is represented as a contiguous array of bytes. 578*46439007SCharles.ForsythThe first byte codes the "type" of the S-expression: 579*46439007SCharles.Forsyth 580*46439007SCharles.Forsyth 01 octet-string 581*46439007SCharles.Forsyth 582*46439007SCharles.Forsyth 02 octet-string with display-hint 583*46439007SCharles.Forsyth 584*46439007SCharles.Forsyth 03 beginning of list (and 00 is used for "end of list") 585*46439007SCharles.Forsyth 586*46439007SCharles.ForsythEach of the three types is immediately followed by a k-byte integer 587*46439007SCharles.Forsythindicating the size (in bytes) of the following representation. Here 588*46439007SCharles.Forsythk is an integer that depends on the implementation, it might be 589*46439007SCharles.Forsythanywhere from 2 to 8, but would be fixed for a given implementation; 590*46439007SCharles.Forsythit determines the size of the objects that can be handled. The transport 591*46439007SCharles.Forsythand canonical representations are independent of the choice of k made by 592*46439007SCharles.Forsyththe implementation. 593*46439007SCharles.Forsyth 594*46439007SCharles.ForsythAlthough the length of lists are not given in the usual S-expression 595*46439007SCharles.Forsythnotations, it is easy to fill them in when parsing; when you reach a 596*46439007SCharles.Forsythright-parenthesis you know how long the list representation was, and 597*46439007SCharles.Forsythwhere to go back to fill in the missing length. 598*46439007SCharles.Forsyth 599*46439007SCharles.Forsyth 600*46439007SCharles.Forsyth8.2.1 Octet string 601*46439007SCharles.Forsyth 602*46439007SCharles.ForsythThis is represented as follows: 603*46439007SCharles.Forsyth 604*46439007SCharles.Forsyth 01 <length> <octet-string> 605*46439007SCharles.Forsyth 606*46439007SCharles.ForsythFor example (here k = 2) 607*46439007SCharles.Forsyth 608*46439007SCharles.Forsyth 01 0003 a b c 609*46439007SCharles.Forsyth 610*46439007SCharles.Forsyth8.2.2 Octet-string with display-hint 611*46439007SCharles.Forsyth 612*46439007SCharles.ForsythThis is represented as follows: 613*46439007SCharles.Forsyth 614*46439007SCharles.Forsyth 02 <length> 615*46439007SCharles.Forsyth 01 <length> <octet-string> /* for display-type */ 616*46439007SCharles.Forsyth 01 <length> <octet-string> /* for octet-string */ 617*46439007SCharles.Forsyth 618*46439007SCharles.ForsythFor example, the S-expression 619*46439007SCharles.Forsyth 620*46439007SCharles.Forsyth [gif] #61626364# 621*46439007SCharles.Forsyth 622*46439007SCharles.Forsythwould be represented as (with k = 2) 623*46439007SCharles.Forsyth 624*46439007SCharles.Forsyth 02 000d 625*46439007SCharles.Forsyth 01 0003 g i f 626*46439007SCharles.Forsyth 01 0004 61 62 63 64 627*46439007SCharles.Forsyth 628*46439007SCharles.Forsyth8.2.3 List 629*46439007SCharles.Forsyth 630*46439007SCharles.ForsythThis is represented as 631*46439007SCharles.Forsyth 632*46439007SCharles.Forsyth 03 <length> <item1> <item2> <item3> ... <itemn> 00 633*46439007SCharles.Forsyth 634*46439007SCharles.ForsythFor example, the list (abc [d]ef (g)) is represented in memory as (with k=2) 635*46439007SCharles.Forsyth 636*46439007SCharles.Forsyth 03 001b 637*46439007SCharles.Forsyth 01 0003 a b c 638*46439007SCharles.Forsyth 02 0009 639*46439007SCharles.Forsyth 01 0001 d 640*46439007SCharles.Forsyth 01 0002 e f 641*46439007SCharles.Forsyth 03 0005 642*46439007SCharles.Forsyth 01 0001 g 643*46439007SCharles.Forsyth 00 644*46439007SCharles.Forsyth 00 645*46439007SCharles.Forsyth 646*46439007SCharles.Forsyth9. Code 647*46439007SCharles.Forsyth 648*46439007SCharles.ForsythThere is code available for reading and parsing the various 649*46439007SCharles.ForsythS-expression formats proposed here. 650*46439007SCharles.Forsyth 651*46439007SCharles.ForsythSee http://theory.lcs.mit.edu/~rivest/sexp.html 652*46439007SCharles.Forsyth 653*46439007SCharles.Forsyth 654*46439007SCharles.Forsyth10. Utilization of S-expressions 655*46439007SCharles.Forsyth 656*46439007SCharles.ForsythThis note has described S-expressions in general form. Application writers 657*46439007SCharles.Forsythmay wish to restrict their use of S-expressions in various ways. Here are 658*46439007SCharles.Forsythsome possible restrictions that might be considered: 659*46439007SCharles.Forsyth 660*46439007SCharles.Forsyth -- no display-hints 661*46439007SCharles.Forsyth -- no lengths on hexadecimal, quoted-strings, or base-64 encodings 662*46439007SCharles.Forsyth -- no empty lists 663*46439007SCharles.Forsyth -- no empty octet-strings 664*46439007SCharles.Forsyth -- no lists having another list as its first element 665*46439007SCharles.Forsyth -- no base-64 or hexadecimal encodings 666*46439007SCharles.Forsyth -- fixed limits on the size of octet-strings 667*46439007SCharles.Forsyth 668*46439007SCharles.Forsyth11. Historical note 669*46439007SCharles.Forsyth 670*46439007SCharles.ForsythThe S-expression technology described here was originally developed 671*46439007SCharles.Forsythfor ``SDSI'' (the Simple Distributed Security Infrastructure by 672*46439007SCharles.ForsythLampson and Rivest [SDSI]) in 1996, although the origins clearly date 673*46439007SCharles.Forsythback to McCarthy's LISP programming language. It was further refined 674*46439007SCharles.Forsythand improved during the merger of SDSI and SPKI [SPKI] during the 675*46439007SCharles.Forsythfirst half of 1997. S-expressions are similar to, but more readable 676*46439007SCharles.Forsythand flexible than, Bernstein's "net-strings" [BERN]. 677*46439007SCharles.Forsyth 678*46439007SCharles.Forsyth12. References 679*46439007SCharles.Forsyth 680*46439007SCharles.Forsyth[SDSI] "A Simple Distributed Security Architecture", by 681*46439007SCharles.Forsyth Butler Lampson, and Ronald L. Rivest 682*46439007SCharles.Forsyth http://theory.lcs.mit.edu/~cis/sdsi.html 683*46439007SCharles.Forsyth 684*46439007SCharles.Forsyth[SPKI] <a href="http://www.clark.net/pub/cme/html/spki.html">SPKI--A 685*46439007SCharles.Forsyth Simple Public Key Infrastructure</a> 686*46439007SCharles.Forsyth 687*46439007SCharles.Forsyth[BERN] Dan Bernstein's "net-strings"; Internet Draft 688*46439007SCharles.Forsyth draft-bernstein-netstrings-02.txt 689*46439007SCharles.Forsyth 690*46439007SCharles.ForsythAuthor's Address 691*46439007SCharles.Forsyth 692*46439007SCharles.Forsyth Ronald L. Rivest 693*46439007SCharles.Forsyth Room 324, 545 Technology Square 694*46439007SCharles.Forsyth MIT Laboratory for Computer Science 695*46439007SCharles.Forsyth Cambridge, MA 02139 696*46439007SCharles.Forsyth 697*46439007SCharles.Forsyth rivest@theory.lcs.mit.edu 698*46439007SCharles.Forsyth 699*46439007SCharles.Forsyth 700