Like the Brown Corpus, which displays a balanced selection of text genres and sources, TIMIT includes a balanced selection of dialects, speakers, and materials. For each of eight dialect regions, 50 male and female speakers having a range of ages and educational backgrounds each read ten carefully chosen sentences. First, the corpus contains two layers of annotation, at the phonetic and orthographic levels. In general, a text or speech corpus may be annotated at many different linguistic levels, including morphological, syntactic, and discourse levels. A fourth feature of TIMIT is the hierarchical structure of the corpus. With 4 files per sentence, and 10 sentences for each of 500 speakers, there are 20,000 files.

Figure: Structure of the Published TIMIT Corpus. The CD-ROM contains doc, train, and test directories at the top level; the train and test directories both have 8 sub-directories, one per dialect region; each of these contains further subdirectories, one per speaker; the contents of the directory for female speaker …

Despite its complexity, the TIMIT corpus contains only two fundamental data types, namely lexicons and texts. As we saw in 2, most lexical resources can be represented using a record structure, i.e. a key plus one or more fields. A lexical resource could be a conventional dictionary or comparative wordlist. A thesaurus also consists of record-structured data, where we look up entries via non-key fields that correspond to topics. We can also construct special tabulations (known as paradigms) to illustrate contrasts and systematic variation, as shown in 1.3 for three verbs.

At the most abstract level, a text is a representation of a real or fictional speech event, and the time-course of that event carries over into the text itself. A text could be a small unit, such as a word or sentence, or a complete narrative or dialogue.

That a corpus as complex as TIMIT reduces to just these two data types is less surprising when we consider that text and record structures are the primary domains of the two subfields of computer science that focus on data management, namely text retrieval and databases.
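The record-structure view of a lexical resource, where a dictionary looks entries up by their key field and a thesaurus looks them up by a non-key field such as topic, can be made concrete with a small sketch. It assumes plain Python data structures rather than any NLTK corpus reader; the `Entry` class, its `topic` field, the `build_thesaurus` helper, and the sample entries are all hypothetical, invented only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One record: a key field plus further (non-key) fields."""
    headword: str  # key field
    pos: str       # non-key field: part of speech
    gloss: str     # non-key field: short definition
    topic: str     # non-key field: hypothetical thesaurus category


# A tiny hypothetical lexicon; these entries are invented for illustration.
LEXICON = [
    Entry("sleep", "v", "be asleep", "rest"),
    Entry("doze", "v", "sleep lightly", "rest"),
    Entry("walk", "v", "go on foot", "motion"),
    Entry("stroll", "v", "walk slowly", "motion"),
]

# Dictionary-style access: look a record up by its key field.
by_headword = {entry.headword: entry for entry in LEXICON}


def build_thesaurus(entries):
    """Thesaurus-style access: index the same records by a non-key field."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.topic].append(entry.headword)
    return dict(index)


thesaurus = build_thesaurus(LEXICON)

print(by_headword["doze"].gloss)  # sleep lightly
print(thesaurus["motion"])        # ['walk', 'stroll']
```

Building a second index keyed on a non-key field is the same idea a database uses for a secondary index; the records themselves are stored once and only the access path changes.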