Smithsonian National Museum of Natural History



Overview of the Data Field Listings


The following sections describe in detail the data fields of the ETE Database. They are primarily intended to be a reference for those researchers preparing data for entry into the database, but can also be employed by users seeking information about the database entries. Following this overview, each type of data entity is described and complete information about every data field is provided.

The Basic Entities of the Database

There are two entities about which the database records information -- fossil localities and fossil species. Although other relatively independent pieces of information, such as literature citations, also go into the database, they can be entered only as part of the entry (or update) of a locality or species. Species and localities can be entered separately. It is possible to enter or modify morphological or ecological attributes of a species without regard to the localities in which it may be found. Likewise, data on a locality can be entered without any ecological inferences about the species in its species lists.

The entry of a locality may include specific information about the locality; a list of sedimentary structures and taphonomic detail; lists of collecting methods, synonyms, and museum collections; a species list; and a list of (one or more) references. The entry of a species includes the information about the species, plus a list of (one or more) references.

Any data in the database can be retrieved independently, and complex queries with many conditions can be answered by the Explorer program that ETE uses. However, most queries will concern localities, species, or both.

Localities and Local Faunas and Floras

A fossil locality, as the ETE Consortium uses the term, refers to a significant collection of fossil specimens from a particular geographic location. There may be several "localities" that share the same geographic coordinates if each is of a different age (for example, if several localities are superposed in a single exposure or excavation). Likewise, the area of outcrop or the volume of rock representing a locality may vary with the nature of the fossil occurrence and with the traditions of the taxonomic subdisciplines concerned with its study. A general rule that the ETE Consortium tries to apply consistently is that a fossil locality is that collection-unit within which differences in age, sedimentary environment, and taphonomic context cannot be meaningfully distinguished. That is, a "locality" should be at the lowest scale usually considered by the subdiscipline, within which there is no significant structure -- at least as far as the Database is concerned. The descriptors we use for sedimentary context and taphonomy are intended to be applicable to all Database localities, and the entries under these fields will give a good idea of the scale that is intended. Also, a maximum and minimum age must be assignable to each locality, and it is of course desirable that this information be as specific as possible. Multiple collections from the same unit may be combined into one locality as long as these rather broad conditions are not seriously violated. Another way to express what we mean by a locality is to say that it is a sample of the living community or biota that was (probably) formed over a distinct time interval in a particular way.

Various higher-level groupings of what we are calling "localities" define local floras or faunas, as well as the species lists for facies, formations, biostratigraphic units, or even more inclusive regional biotas. We are aware that these entities are more often the subject of interest and comparative analysis than are the lower-level localities. For example, it is unlikely that a complete contemporaneous mammalian species list for a region will be represented at any one locality, but a good approximation might be had by combining the lists of several localities -- particularly ones from a variety of taphonomic contexts. It will not always be possible to characterize such higher-level groupings of the primary localities in terms of a single sedimentary or taphonomic context. Furthermore, there are many different criteria for grouping localities, depending upon the nature of the intended analysis. One could group by time, by geography, by sedimentary environment, by taxonomic composition, etc. A given primary locality can thus simultaneously play a part in many different classification schemes. For this reason, we do not store "faunas," "floras," and the like in the ETE Database. It is easier for the relational database software to create unified species lists from a set of localities that are individually recorded than it would be for it to break up floras or faunas (stored as such) into constituent localities. In recording only the lowest-level entities, we provide the maximum flexibility for analysis, and at the same time ensure that species lists of all higher-level groupings of localities are consistent with those of their contained localities. So, users must combine the appropriate localities needed for a fauna/flora list at the time of analysis. In addition, the Explorer allows users to store lists of localities under user-supplied names, and in turn to store these in files representing particular kinds of higher-level groupings. In this way users may permanently store a configuration of the data contained in the Database that is customized for the kinds of analyses that they usually undertake.
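
The idea of building a fauna or flora list at analysis time can be sketched in a few lines of Python. This is only an illustration of the principle, not the Explorer's actual mechanism: the locality names, species names, grouping name, and the function combined_species_list are all hypothetical placeholders.

```python
# Hypothetical species lists, one per primary locality (placeholder names only).
species_by_locality = {
    "loc_a": {"species_1", "species_2"},
    "loc_b": {"species_2", "species_3"},
    "loc_c": {"species_4"},
}

# A user-named grouping of localities, analogous to a stored locality list.
groupings = {"example_local_fauna": ["loc_a", "loc_b"]}


def combined_species_list(species_by_locality, locality_names):
    """Union the species lists of the named localities into one sorted list."""
    combined = set()
    for name in locality_names:
        combined |= species_by_locality[name]
    return sorted(combined)


fauna = combined_species_list(species_by_locality, groupings["example_local_fauna"])
print(fauna)
```

Because the combined list is always derived from the individual localities, it cannot drift out of step with them, and the same locality can appear in any number of groupings.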


Each locality or species record has a set of fields called the Data-Entry Authorization section. This is an important section and most of its fields are mandatory -- they must be filled out or the entry cannot be completed. The section includes the Data Coordinator and Data Authorizer fields. (See the descriptions of the individual fields in the following chapters for more information.) For each entry or update, the data-entry software automatically records the date of entry and which fields were changed.

The fields of this section are stored internally in a different table than the descriptive data for the locality or species. Each time that data are added to the locality or species (including initial entry) a new blank set of these fields is created and associated with the locality or species. Whenever an update occurs, new information replaces the old in the "ordinary" fields composing the record. However, the fields in the Data-Entry Authorization section are not written over; rather, a new set of entries is added to the old. Thus, one can retrieve the entire update history of a locality or a species from the database. References are keyed to particular updates, not directly to the locality or species. Thus, one can also see what references were used each time data were modified or added. In addition, one can see which fields were modified (however, the old values themselves are not recorded).
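
The behavior described above (descriptive fields overwritten, authorization records appended) can be illustrated with a small sketch using Python's built-in sqlite3 module. The table names, column names, and the save_locality function are assumptions made for this illustration; they are not the ETE Database's internal names, and the dates and people shown are placeholders.

```python
import sqlite3

COLUMNS = ("name", "max_age", "min_age")  # hypothetical descriptive fields


def open_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE locality (
            id INTEGER PRIMARY KEY, name TEXT, max_age REAL, min_age REAL);
        CREATE TABLE update_log (
            update_id INTEGER PRIMARY KEY AUTOINCREMENT,
            locality_id INTEGER NOT NULL,
            coordinator TEXT NOT NULL,
            authorizer TEXT NOT NULL,
            entry_date TEXT NOT NULL,
            fields_changed TEXT NOT NULL,
            comment TEXT);
    """)
    return conn


def save_locality(conn, loc_id, values, coordinator, authorizer,
                  entry_date, comment=""):
    """Overwrite descriptive fields; append (never overwrite) an update record."""
    row = conn.execute(
        "SELECT name, max_age, min_age FROM locality WHERE id = ?", (loc_id,)
    ).fetchone()
    old = dict(zip(COLUMNS, row)) if row else dict.fromkeys(COLUMNS)
    # Only the names of the changed fields are kept, not their old values.
    changed = [c for c in COLUMNS if c in values and values[c] != old[c]]
    new = {**old, **values}
    conn.execute(
        "INSERT OR REPLACE INTO locality (id, name, max_age, min_age) "
        "VALUES (?, ?, ?, ?)",
        (loc_id, new["name"], new["max_age"], new["min_age"]),
    )
    conn.execute(
        "INSERT INTO update_log (locality_id, coordinator, authorizer, "
        "entry_date, fields_changed, comment) VALUES (?, ?, ?, ?, ?, ?)",
        (loc_id, coordinator, authorizer, entry_date, ",".join(changed), comment),
    )


conn = open_db()
# The real software fills in the entry date itself; it is passed in here
# only to keep the sketch deterministic.
save_locality(conn, 1, {"name": "loc_a", "max_age": 10.5, "min_age": 9.0},
              "coord_1", "auth_1", "1995-01-01")
save_locality(conn, 1, {"min_age": 9.5},
              "coord_2", "auth_1", "1996-01-01", "revised minimum age")

current = conn.execute(
    "SELECT name, max_age, min_age FROM locality WHERE id = 1").fetchone()
history = conn.execute(
    "SELECT coordinator, fields_changed, comment FROM update_log "
    "WHERE locality_id = 1 ORDER BY update_id").fetchall()
```

After the two calls, the locality table holds only the latest values, while update_log holds one row per entry or update, which is what makes the full update history retrievable.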

One field that may seem strange to find here is the comment field. One would expect comments to refer to the species or locality, not to the update fields in the Data-Entry Authorization section. In fact, the comments in the comment field can refer to anything in the associated species or locality record. The idea here is that the comments usually refer to some action taken or decision made while compiling data for an update, and thus it is reasonable to key them, like references, to specific updates. Also, the most "active" species or localities -- those that are updated most often -- will also be the ones generating the most comments, and this is a mechanism to ensure that the space available for subsequent comments increases automatically as needed. The entire update history of a locality or species, including all comments, is stored in the database and is available on request.

Locality Ages

A few remarks about the handling of locality ages are in order. Most paleontologists are rightly skeptical about assigning numerical ages in millions of years to particular localities. Ages based on geochronological units of various kinds are usually preferred unless radiometric or other absolute dates are available. However, the graphical interface program needs to use numerical ages as a common denominator to perform its searches for particular time intervals. This means assigning numerical maximum and minimum ages to localities whose age is known only by stratigraphic or biostratigraphic correlation. At any given time there is some consensus about the age-span of geological time units in millions of years, but these estimates are revised regularly. Thus we must retain the information about what was used to generate our numerical ages for localities, so that their ages can be revised as research in geochronology progresses.

The mechanism for doing this is described in some detail under the locality fields Dating Method, Maximum Age, Minimum Age, MaxBFA, MinBFA, MaxFrac, and MinFrac (the basis-for-age and Frac fields). The ages assigned to localities may be based entirely on one or more geochronologic units (indicating a range of geologic time, represented by all or part of such a unit), on absolute dating methods, or on a combination of methods. Briefly, one first indicates (in the Dating Method field) whether the age assigned to the locality is an absolute one (such as a radiometric age), a time_unit (non-absolute, geochronologic) one, or a composite (dates for maximum and minimum based on different methods or criteria). If the date is absolute, one directly supplies the ages in the appropriate fields and indicates the method generating the date in the respective Bfa field. If the date is non-absolute, one supplies the geologic time unit's name and the computer looks up that time unit's current maximum and minimum ages and applies them to the locality. If, later, the accepted age range of the time unit changes, the locality can be updated because the time unit is associated with the locality: the time unit can be used as the target of a search, and the retrieved localities updated. Temporally overlapping localities with absolute ages will not be affected. Localities based upon geological time units retain their temporal relationship with each other. The Frac fields are used to specify particular fractions of a time unit's total range in age.
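
The look-up and revision mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the database's implementation: the Locality class, the resolve_ages function, and the method name "hypothetical_method" are invented for the example; composite dates and the Frac fields are deliberately left out; the Clarendonian bounds are the illustrative values used later in this overview; and the revised bounds are made up purely to show the effect of a revision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locality:
    name: str
    dating_method: str               # "absolute" or "time_unit" in this sketch
    max_bfa: str                     # basis for the maximum age
    min_bfa: str                     # basis for the minimum age
    max_age: Optional[float] = None  # Ma
    min_age: Optional[float] = None  # Ma


def resolve_ages(loc, time_units):
    """Fill in numerical ages from the time-unit table for time_unit dates.

    Absolute dates are supplied directly and are left untouched.
    """
    if loc.dating_method == "time_unit":
        loc.max_age = time_units[loc.max_bfa][0]
        loc.min_age = time_units[loc.min_bfa][1]
    return loc


# Hypothetical time-unit table: name -> (maximum age, minimum age) in Ma.
time_units = {"Clarendonian": (10.5, 9.0)}

by_unit = resolve_ages(
    Locality("loc_a", "time_unit", "Clarendonian", "Clarendonian"), time_units)
by_date = resolve_ages(
    Locality("loc_b", "absolute", "hypothetical_method", "hypothetical_method",
             max_age=10.0, min_age=9.8), time_units)

# A made-up revision of the unit's accepted range, followed by a search for
# the localities that cite the unit so that their ages can be refreshed.
time_units["Clarendonian"] = (11.0, 9.0)
affected = [loc for loc in (by_unit, by_date)
            if loc.dating_method == "time_unit"
            and "Clarendonian" in (loc.max_bfa, loc.min_bfa)]
for loc in affected:
    resolve_ages(loc, time_units)
```

Only the locality dated by time unit picks up the revised range; the locality with absolute ages keeps the values that were entered for it.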

It is important to recognize that entering the name of a time-unit into a Bfa field functions only to specify a (probable) age-range for the locality. That is, it assigns the locality only to a particular geochronologic unit (a segment of geologic time; e.g., the "Clarendonian Land Mammal Age"). However, most designations for geologic time-units possess a corresponding body of rock (a chronostratigraphic or biostratigraphic unit) believed to correspond to that segment of time. For example, by saying that a particular mammal locality is "Clarendonian" in age, I am usually not just saying that I think that it dates between 10.5 and 9.0 Ma (which might be the conventional boundaries assigned in the database to the Clarendonian Land Mammal Age -- a geochronologic unit), but I am also implying that the fauna of this locality belongs to a presumed biological entity, "the Clarendonian Fauna". This entity is presumed to be restricted to (and in some way to define) the time-span of the Clarendonian Land Mammal Age. This biological entity can (we hope) be recognized on purely faunal grounds. However, in the Bfa fields, we are not making any explicit assignment of the locality or its contents to a particular chronostratigraphic or biostratigraphic unit. Rather, we record geochronologic assignments, and the assignment to chronostratigraphic units, biozones or other biostratigraphic entities is often implicit through correspondence of nomenclature, implied correlation, or even the species list itself.

However, in the Chronostratigraphic Age field we can indicate membership in such entities separately from any implications of the time-unit entered in the Basis-for-Age field. This optional field may be useful for a number of reasons, though frequently it may simply repeat the Bfa information for time-units. Biotic entities used, defined, and justified in biochronology/biostratigraphy may in fact have biological significance. Also there may be well-known faunas or floras that do not carry with them any particular age-implication, since they at least at present do not explicitly define chronostratigraphic or geochronologic units, but nevertheless carry important biogeographic or other information. Membership in a particular biostratigraphic unit may be evident, but the temporal significance of this may be controversial, and we may wish to assign an age based on a more general time-unit, while still recording the information about the more specific unit -- for later revision. Finally, we may want to use reliable "absolute" dates for a particular locality, but that should not prevent us from recording the information that the locality can also be assigned to a particular chronostratigraphic or biostratigraphic unit on other grounds.

The system we use gives each locality an age range, not a particular age. When the Explorer searches for localities to display on the map screen, it uses a specified age range as well. Localities are retrieved if these two age ranges overlap. That is, the Explorer finds all the localities that might fall within the given time interval, not only those which certainly do so. If a locality's age is given as "Triassic," this is taken to mean that the actual age of the locality might be anywhere in the Triassic; a search of any time interval including any part of Triassic time will pull up this locality. Thus, one cannot guarantee that one obtains a set of localities similar in age merely by specifying a narrow time interval for the Explorer's search. (One can do so by subsequently searching for localities on screen whose maximum and minimum ages are both contained within the time interval.) Thus, localities whose age ranges are broad will tend to pop up all the time. Researchers should remember this when assigning ages to localities, and attempt to give the most specific age ranges that are consistent with sound scientific inference.
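
The difference between the Explorer's overlap search and the stricter follow-up search can be made concrete with two small tests on age ranges. This is a sketch only: the function names are hypothetical, the ages are placeholder numbers rather than real boundaries, and treating the range end-points as inclusive is an assumption, since the text does not say how exact ties are handled.

```python
def overlaps(loc_max, loc_min, query_max, query_min):
    """True if the locality's age range shares any time with the query range.

    Ages are in Ma, so the maximum (older) age is the larger number.
    """
    return loc_max >= query_min and loc_min <= query_max


def contained(loc_max, loc_min, query_max, query_min):
    """True if the locality's whole age range lies inside the query range."""
    return loc_max <= query_max and loc_min >= query_min


# Placeholder localities: name -> (maximum age, minimum age) in Ma.
localities = {
    "broad_range": (250.0, 200.0),
    "narrow_range": (230.0, 225.0),
    "outside": (180.0, 170.0),
}
query = (232.0, 224.0)  # (maximum, minimum) of the search interval

might_fall_within = sorted(
    name for name, (mx, mn) in localities.items() if overlaps(mx, mn, *query))
certainly_within = sorted(
    name for name, (mx, mn) in localities.items() if contained(mx, mn, *query))
```

The broadly dated locality is returned by the overlap test for this query, and would be for any other query touching its range, but only the narrowly dated one passes the containment test.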

Descriptions of Data Fields

The following chapters describe the data fields of the Database in some detail. Each field of the various entities is described in a standard format. For the benefit of those researchers using this manual as a reference, the fields are presented in alphabetical order rather than in logical subject groupings. A separate chapter treats the lists that are specific to localities.

At the top of each page is the data field name as it appears on the data fields sheet or data-entry screen, followed by a brief description.

Next the data type is described. Usually this is a character string, integer, or real number. The maximum length is given for all character strings. Fields whose data types are "predefined strings" require you to choose one of the strings listed at the bottom of the page under "Allowed Values." Pay attention to upper and lower case; most allowed values for predefined fields are entirely in lower case. Words that are normally capitalized (e.g., entries for genus, country, state) keep their capitalization. Some fields are described as being "MANDATORY". This means that they must be filled in before the entity can be loaded into the Database. If a person performing data-entry reaches a mandatory field and cannot enter a value for it, the entire entry must be aborted. (This is to prevent there being localities without ages or locations, species without names, updates without dates, etc.)

Next follows a fuller description of the data field.

Below that are the names of the internal table that the field is found in and the field's internal name. The table name precedes the period (".") and the field name follows it. You must use these names if you are writing an SQL query directly to the database management system.

Finally, there is a list of allowed values. These lists are subject to change in response to users' requests.
