|I. Overview of Transcription and Visualization|
A source artifact is transcribed by a transcriber into a transcription file.
Typically, a transcription file is stored, together with other transcription files, on a server connected to the World Wide Web (WWW). The client (viz., an end-user's computer) retrieves a transcription file via the Web for visualization (i.e., visual rendering) or for other use on the client computer.
Intellectual Property Ownership: The source artifacts are hundreds of years old and, therefore, are in the public domain. Transcriptions, descriptions, and photographic images, however, may be protected by copyright. In such cases, please contact the copyright owner of the material for permission to use it.
|II. Architecture for a Distributed e-Library|
Transcription files can exist on a client computer and on the
Transcription files are designed to be "first class objects" on the Web. By this we mean that a transcription file is retrievable directly by its own URL.
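As a minimal sketch of what "retrievable directly by its own URL" means in practice, the following Python function fetches one transcription file and decodes it as UTF-8. The function name and the timeout value are illustrative assumptions, not part of any NEUMES tooling.

```python
from urllib.request import urlopen


def fetch_transcription(url: str, timeout: float = 10.0) -> str:
    """Retrieve one transcription file by its own URL and return its text.

    Hypothetical helper: the name and the default timeout are assumptions
    made for this sketch.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
    # Transcription data are described as UTF-8 encoded, so decode as UTF-8.
    return raw.decode("utf-8")
```

Any URL scheme that `urlopen` supports can be used, so the same call works for a file on a Web server or, for local testing, a `file://` URL on the client computer.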
Search engines and specialized databases for chant can act as indexes (or, catalogs) to help users find transcription files.
We call this type of architecture a "distributed e-library".
|III. Content of a Transcription File|
|NEUMES : Transcription Part|
A transcription file is an XML (Extensible Markup Language) file that has two principal parts:
1) the NeumesXML Description Part (detailed in Diagram IV, below); and
2) the NEUMES Transcription Part (discussed here, and detailed in Diagram V).
The Transcription Part records all prima-facie semantic content of a source artifact.
Specifically excluded from our definition of "semantic content" are: information about the handwriting of individual scribes (i.e., paleographic information); and illuminations, decorations, or other prima-facie content that does not inform the chant or the text. (Note, however, that such information can be recorded at a 'higher level', viz., in the NeumesXML markup.)
We say that an encoding scheme is a lossless data representation if it can capture all prima-facie semantic content of sources in the domain of discourse, and if the resultant data can satisfy all principal end-uses required by the domain of discourse. NEUMES (Neumed and Ekphonetic Universal Manuscript Encoding Standard) is a lossless data representation.
NEUMES is a formal language (i.e., a set of strings, where set membership is decided by a formal grammar) whose alphabet consists of Unicode-compatible characters encoded in the UTF-8 standard encoding scheme. NEUMES is optimized for content-search across potentially millions of records, and pattern-matching involving uncertainty and what we call "complex traversal" (viz., random access) in linear data streams.
|IV. Content of NeumesXML Meta-data|
of the Source
NeumesXML is an extension of
XML, and it is defined as an
NeumesXML plays several roles, principally as a wrapper (or, 'vehicle') for convenient disk storage
and Internet transmission of NEUMES transcription data.
Thus, transcriptions appear on disk and on the Web just as XML files (e.g., "NeumesExample.xml").
NEUMES data appear in such a file between the NeumesXML tags
<transcription_part> and </transcription_part>.
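The wrapper role can be illustrated with a short Python sketch that pulls the NEUMES data out of a NeumesXML file using the standard library. Only the `transcription_part` tag comes from the text above. The root element `neumes_transcription`, the `description_part` tag, and the sample characters are hypothetical placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file. Only <transcription_part> is taken from the text;
# the root element, <description_part>, and the data characters are assumptions.
SAMPLE = (
    "<neumes_transcription>"
    "<description_part>placeholder description</description_part>"
    "<transcription_part>Ky\ue000ri\ue001</transcription_part>"
    "</neumes_transcription>"
).encode("utf-8")


def extract_transcription_part(xml_bytes: bytes) -> str:
    """Return the character data found between the transcription_part tags."""
    root = ET.fromstring(xml_bytes)
    # iter() also visits the root itself, so this works at any nesting depth.
    node = next(root.iter("transcription_part"), None)
    if node is None:
        raise ValueError("no <transcription_part> element found")
    # itertext() also gathers text inside any nested markup elements.
    return "".join(node.itertext())
```

Calling `extract_transcription_part(SAMPLE)` returns the string stored between the two tags, ready for further processing as NEUMES data.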
NeumesXML also allows the transcriber to record descriptive information about the source artifact, the transcriber, the editorial methods used, and so on (see diagram), and--to a lesser extent--allows for markup of a transcription, such as to document the logical structure of the source and to insert in situ editorial comments.
Unlike NEUMES data (which are non-ASCII Unicode character strings), NeumesXML tags are written in plain ASCII. UTF-8 encodes each ASCII character as a single byte and every other Unicode character as a multi-byte sequence, so the two can be mixed freely in one file; this coding separation allows NEUMES content to be parsed unambiguously during "complex traversal" of linear data streams.
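The following Python sketch shows why this separation is unambiguous at the byte level: in UTF-8, every byte of a multi-byte character has its high bit set, so an ASCII byte such as `<` can never occur inside an encoded non-ASCII character. The function names and the sample code points (U+E000, U+F0000) are assumptions chosen for illustration, not actual NEUMES characters.

```python
def tag_open_offsets(data: bytes) -> list[int]:
    """Byte offsets of every '<' in UTF-8 data, found without decoding."""
    return [i for i, b in enumerate(data) if b == 0x3C]


def non_ascii_bytes_all_have_high_bit(text: str) -> bool:
    """Check that no multi-byte UTF-8 sequence contains an ASCII-range byte."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


# Hypothetical data: ASCII tags wrapped around two private-use characters.
sample_text = "<transcription_part>\U000f0000\ue000</transcription_part>"
sample_bytes = sample_text.encode("utf-8")
offsets = tag_open_offsets(sample_bytes)
```

Because tag delimiters can be located by scanning raw bytes, a reader can jump to an arbitrary position in the stream and still tell markup from data.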
|V. Decomposition of the Transcription Part|
The Transcription Part [cf., Diagram III for general
discussion] contains a string of Unicode characters, which
are treated as character data by XML.
This string has a pattern that can repeat many times in the Transcription Part.
(Such repetition is called a sequence and is denoted by wide-angle brackets.)
The pattern has two principal pieces that always occur in the same order, as follows.
1) The first piece is always a chant text segment. It is a sequence of characters that records part of the intoned or recited text.
2) The second piece is 'optional'; it contains zero or more cantillation segments. These are further decomposed, below.
A passage of recited text (i.e., intended to be spoken only) typically does not have any cantillation segments. If the source artifact contains a long passage of recited text without any neumation, then the entire passage can be encoded as one chant text segment. An intoned (or, chanted) text, however, is always segmented in the data, such that one chant text segment is one syllable or some other unit of text that the scribe treated individually for neumation in the source artifact.
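The repeating pattern described above can be sketched as a small data structure plus a splitter. This is a simplified illustration under two stated assumptions that the text does not confirm: chant text characters lie outside the Unicode private-use category, and each uninterrupted run of private-use characters is treated as a single cantillation segment (the real delimiting of multiple cantillation segments is not specified here). The names `PatternUnit` and `split_pattern_units` are hypothetical.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class PatternUnit:
    """One repetition of the pattern: a chant text segment, then neumes."""
    chant_text: str
    cantillation: list[str] = field(default_factory=list)  # zero or more


def is_private_use(ch: str) -> bool:
    # General category "Co" marks Unicode private-use characters.
    return unicodedata.category(ch) == "Co"


def split_pattern_units(transcription: str) -> list[PatternUnit]:
    """Split a transcription string into pattern units (simplified sketch)."""
    units: list[PatternUnit] = []
    text_buf: list[str] = []
    neume_buf: list[str] = []
    for ch in transcription:
        if is_private_use(ch):
            if not text_buf:
                # The first piece of the pattern is always a chant text segment.
                raise ValueError("cantillation data without preceding chant text")
            neume_buf.append(ch)
        else:
            if neume_buf:
                # Text after neumes starts the next repetition of the pattern.
                units.append(PatternUnit("".join(text_buf), ["".join(neume_buf)]))
                text_buf, neume_buf = [], []
            text_buf.append(ch)
    if text_buf:
        tail = ["".join(neume_buf)] if neume_buf else []
        units.append(PatternUnit("".join(text_buf), tail))
    return units
```

Under these assumptions, a long recited passage with no neumation comes back as a single unit with an empty `cantillation` list, which matches the case of recited text described above.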
A cantillation segment is a NEUMES sequence, which we define as a sequence of characters, such that each character is a member of the NEUMES character set, and the sequence conforms to the NEUMES language. The NEUMES character set consists of 24-bit characters from the Private Use Area of Unicode, as 'filtered' (or, restricted) by the NEUMES grammar.
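A first, necessary check on a candidate cantillation segment is that every character falls in a Unicode private-use range. The Python sketch below tests only that condition. It does not implement the NEUMES grammar, which restricts the character set further, and the function names are assumptions made for this example.

```python
# The three private-use ranges defined by the Unicode Standard.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def in_private_use_area(ch: str) -> bool:
    """True if the character's code point lies in a private-use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def passes_character_set_check(segment: str) -> bool:
    """Necessary, not sufficient: non-empty and all characters private-use.

    A segment that passes this check must still be accepted by the NEUMES
    grammar, which is not modeled in this sketch.
    """
    return len(segment) > 0 and all(in_private_use_area(ch) for ch in segment)
```

A check of this kind could run at data-entry time as a cheap pre-filter before full grammatical validation.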
The NEUMES language is 'generated' (or, defined) by the NEUMES grammar. The NEUMES grammar decides whether particular NEUMES sequences are grammatical according to the NEUMES language (this verification process is typically done at the time of data-entry and during transformation of a transcription file for visualization).