Titan
Databases for the people
Synopsis
Titan is a development platform for incremental optimisable search and enumeration. In Titan, programs are defined as relations between values, which can be defined by formulae according to rules for application and search. For this, Titan defines:
- A reference language in which to express these rules;
- A standard library by which to facilitate development; and
- Implementation constraints to provide performance guarantees.
Regarding the final point, a major intended objective is for implementations to be able to optimise a subset of the language enough to work in embedded programming.
Reference Language
Titan's reference language is extremely simple: conceptually, programs are databases of relations specified as sums of rows, each row relating the values in its columns together. The major tricks added to this, however, are:
- Relations can be partially applied, yielding the projection of the search for the rows with the given values in the corresponding columns onto the other columns;
- A row of a single column decays to the value in that column;
- A sum of a single row decays to that row; and
- Unary relations correspond to column types.
Semantically, sums represent searches over rows, and rows represent patterns whose values are related. The following outputs of an example implementation of the euclidean algorithm illustrate this succinctly:
A brief summary of the language's grammar, taken from the upcoming design document, is provided below.
Alphabet
The alphabet of the Titan reference language is in the Unicode 10 character set. Several subsets are named here for future reference:
- Letters, consisting of the Unicode General Category L, named letter;
- Digits, consisting of the Unicode General Category N, along with the following subsets:
- the decimal digits
0123456789
, named digit,
and the nonzero digits ndigit = digit - 0
;
- the binary digits
01
, named bdigit;
- the octal digits
01234567
, named odigit;
- the hexadecimal digits
0123456789ABCDEF
, named hdigit;
- Alphanumerics, named alpha = digit|letter;
- Punctuation and delimiters, consisting of
()[]{}'"@
;
- Symbols, consisting of
`.\/%^*+-#&|~=<>!?:;,
, named symbol;
- Whitespace, consisting of the union of the Unicode General Categories Zs, Zl, Zp and Cc; and
- The underscore
_
.
Tokens
- The delimiters
() [] {}
;
- Names, denoted by the regular expression name
(_
|letter|symbol)(_
|alpha|symbol)*
- Variables, defined by the regular expression variable
@
name
- Numbers, defined by the regular expressions number
-
?ndigit(digit +(.
digit +)?|(.
digit +)?(e-
?ndigit digit *)?)?
0
(.
digit+|b
bdigit+|o
odigit+|x
hdigit+)?
'
((\
|'
)-|\
(\
|'
))'
- Strings, defined by the regular expression string
"
("
-|\
("
|\
))*"
- Comments, defined by the regular expressions
\*
(*\
)-*\
\\
(newline)-newline
Grammar
The full grammar of the Titan reference language is:
E ::= | AE
A ::= name | variable | number | string | [
AE]
| {
AE}
| (
E)
Note that there are no reserved identifiers.
Standard Library
Titan's standard library is divided into three major parts: Language, Environment, and Algorithms. All definitions in the standard library are abstract data/negative types, meaning they are defined in terms of their projection relations. Only the Language part is mandatory, and the two other parts can be defined in terms of it.
Language
The Language part contains the abstract data types of the core constructs of the language.
Algorithms
The Algorithms part contains the abstract data types for complex operations on the core constructs of the language.
Environment
The Environment part contains data types for interfacing with other systems.
Reference Implementation
Titan's reference implementation will be code-named Othrys, after the mountain home of the Greek titans (as opposed to Olympus, the home of the Olympian deities).
Bootstrap Implementation
Titan's bootstrap implementation is code-named Chaos, after the root deity of the Greek Pantheon. An initial idea is to write it in Prolog, but Twelf might also be appropriate.