Stipa Data Collection Protocol

Stipa is an easy-to-use and highly adaptable data collection solution for mobile devices. Stipa’s versatility arises from an XML document called the Stipa Data Collection Protocol, which specifies what information will be collected during a particular data collection effort. The document is human-readable and can be edited with a basic text editor, although more sophisticated graphical user interfaces are also available for that purpose.

A Data Collection Protocol is constructed from seven core XML elements. Figure 1 illustrates the hierarchical structure of these core elements and how they should be arranged in the XML document. All other elements used in a Data Collection Protocol (some are required; others are optional) have been omitted from Figure 1 for clarity, but a full description of them can be found in subsequent figures and tables.

<?xml version="1.0" encoding="UTF-8" ?>
<Protocol>
  <Forms>
    <Form>
      <ObservationSets>
        <ObservationSet>
          <Observations>
            <Observation></Observation>
          </Observations>
        </ObservationSet>
      </ObservationSets>
      <Attributes>
        <Attribute></Attribute>
      </Attributes>
    </Form>
  </Forms>
  <SharedLists>
    <SharedList>
      <Categories>
        <Category></Category>
      </Categories>
    </SharedList>
  </SharedLists>
</Protocol>

Figure 1. XML document structure – core elements.

Protocol. This is the root element of the Data Collection Protocol. It is identified by a universally unique identifier (UUID), specified in the ID child element, that should remain unchanged even if the protocol is modified. Subsequent modifications to the protocol should be identified using the Version child element, and the date of the most recent modification should be documented using the LastModified child element. Unique identifiers facilitate proper matching of data to metadata, as well as ingestion of data into external databases.
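
As a minimal sketch of these identifying elements, the fragment below shows what a Protocol header might look like. The UUIDs, label, and timestamp are hypothetical placeholders, and the timestamp format is an assumption, since the expected datetime format is not stated here.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Protocol>
  <!-- Hypothetical UUID; stays the same across all revisions of the protocol -->
  <ID>3f2b8c1e-7d4a-4e9b-a6c5-0123456789ab</ID>
  <Label>Grassland Monitoring</Label>
  <!-- Hypothetical UUID identifying this particular version of the protocol -->
  <Version>9a1d5e70-2c3f-4b8a-bd6e-ba9876543210</Version>
  <!-- Date and time of the last edit; the format shown is an assumption -->
  <LastModified>2024-05-01T09:30:00</LastModified>
</Protocol>
```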

Form. Each protocol must contain one or more Form elements. Form elements are listed as child elements of Forms, which is a direct descendant of Protocol. Forms provide a way of logically organizing the observations that will be performed on subjects during a data collection effort.

Attribute. Each Form element must contain one or more Attribute elements. Attribute elements are listed as child elements of Attributes, which is a direct descendant of Form. Attributes specify which characteristics of a subject need to be measured or observed. They also define how observations should be recorded: for example, as free-form text, selections from a list, or numbers within a range.
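
The fragment below is an illustrative sketch of a Form with two Attribute elements: a number restricted to a range and a free-form text field. The IDs, labels, and values are hypothetical; the child element names are those listed in Table 1.

```xml
<Form>
  <ID>plant-measurements</ID>
  <Label>Plant Measurements</Label>
  <Attributes>
    <!-- Required number attribute limited to 0 through 500 with one decimal place -->
    <Attribute>
      <ID>height</ID>
      <Label>Plant height</Label>
      <Type>number</Type>
      <Optional>no</Optional>
      <Min>0</Min>
      <Max>500</Max>
      <Precision>1</Precision>
      <Unit>cm</Unit>
    </Attribute>
    <!-- Free-form text attribute limited to 255 characters -->
    <Attribute>
      <ID>notes</ID>
      <Label>Notes</Label>
      <Type>text</Type>
      <MaxLength>255</MaxLength>
    </Attribute>
  </Attributes>
</Form>
```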

ObservationSet. Optionally, each Form element can contain one or more ObservationSet elements. ObservationSet elements are listed as child elements of ObservationSets, which is a direct descendant of Form. ObservationSet elements specify how forms should be repeated for the same study subject. For example, the same attributes may need to be measured at multiple points along a line transect, or the same information may need to be collected for multiple horizons within a soil excavation. If multiple ObservationSets are listed for a single Form, each later ObservationSet is interpreted as being nested within the one listed before it.
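
The following sketch shows one way the nesting rule could play out, assuming a hypothetical form in which soil horizons are recorded at each of two transect points. All IDs and labels are invented for illustration.

```xml
<Form>
  <ID>soil-profile</ID>
  <Label>Soil Profile</Label>
  <ObservationSets>
    <!-- Listed first, so this is the outer set -->
    <ObservationSet>
      <ID>transect-point</ID>
      <Label>Transect point</Label>
      <Observations>
        <Observation><ID>p1</ID><Label>0 m</Label></Observation>
        <Observation><ID>p2</ID><Label>25 m</Label></Observation>
      </Observations>
    </ObservationSet>
    <!-- Listed second, so it is interpreted as nested within each transect point -->
    <ObservationSet>
      <ID>horizon</ID>
      <Label>Soil horizon</Label>
      <Observations>
        <Observation><ID>h1</ID><Label>Horizon 1</Label></Observation>
        <Observation><ID>h2</ID><Label>Horizon 2</Label></Observation>
      </Observations>
    </ObservationSet>
  </ObservationSets>
  <Attributes>
    <Attribute>
      <ID>texture</ID>
      <Label>Soil texture</Label>
      <Type>text</Type>
    </Attribute>
  </Attributes>
</Form>
```

With this arrangement, the form would be completed once per horizon at each transect point.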

Observation. Each ObservationSet must contain one or more Observation elements. Observation elements are listed as child elements of Observations, which is a direct descendant of ObservationSet. If a form is completed multiple times for the same study subject, observations are used to identify these replicate entries. In some data entry scenarios, it may be desirable to have a static, pre-defined set of observations. In other cases, it may not be possible to predict how many times a form will be completed during data collection (for example, how many horizons a particular soil profile will contain). To allow observations to be added and removed during data entry, set the Mutable child element of ObservationSet to yes. Observation sets are immutable by default.
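
As a rough illustration, a mutable observation set for soil horizons might look like the fragment below. The IDs and labels are hypothetical, and LabelMethod is used as documented in Table 1.

```xml
<ObservationSet>
  <ID>horizon</ID>
  <Label>Soil horizon</Label>
  <!-- Allow observations to be added and removed during data entry -->
  <Mutable>yes</Mutable>
  <!-- Label new observations by numeric order -->
  <LabelMethod>increment</LabelMethod>
  <Observations>
    <!-- Starting observation; more can be added in the field -->
    <Observation>
      <ID>h1</ID>
      <Label>1</Label>
    </Observation>
  </Observations>
</ObservationSet>
```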

SharedList. Optionally, each protocol can contain one or more SharedList elements. SharedList elements are listed as child elements of SharedLists, which is a direct descendant of Protocol. SharedList elements define categories that apply to more than one Attribute. Referencing a SharedList, rather than replicating the same Category list for different Attributes, can reduce the size and complexity of the data collection protocol.
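
The sketch below shows two attributes drawing on one shared list. It assumes that the SharedList child element of Attribute holds the ID of the target SharedList; Table 1 describes that value only as the name of the shared list. The protocol's own ID and Label are omitted for brevity, and all identifiers are hypothetical.

```xml
<Protocol>
  <Forms>
    <Form>
      <ID>vegetation</ID>
      <Label>Vegetation</Label>
      <Attributes>
        <Attribute>
          <ID>dominant-species</ID>
          <Label>Dominant species</Label>
          <Type>category</Type>
          <!-- Assumed to match the ID of the SharedList defined below -->
          <SharedList>species</SharedList>
        </Attribute>
        <Attribute>
          <ID>secondary-species</ID>
          <Label>Secondary species</Label>
          <Type>category</Type>
          <SharedList>species</SharedList>
        </Attribute>
      </Attributes>
    </Form>
  </Forms>
  <SharedLists>
    <!-- Categories defined once and reused by both attributes -->
    <SharedList>
      <ID>species</ID>
      <Label>Species</Label>
      <Categories>
        <Category><ID>BOER</ID><Label>Bouteloua eriopoda</Label></Category>
        <Category><ID>PRGL</ID><Label>Prosopis glandulosa</Label></Category>
      </Categories>
    </SharedList>
  </SharedLists>
</Protocol>
```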

Category. Optionally, each Attribute can contain one or more Category elements. Category elements are listed as child elements of Categories, which is a direct descendant of Attribute or SharedList. Category elements define the choices available for a particular Attribute. Categories should be listed only for attributes that are of the “category” type.
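
For comparison with a shared list, the fragment below sketches an attribute that carries its own Category list. The IDs and labels are hypothetical; MaxCount and Priority are used as described in Table 1.

```xml
<Attribute>
  <ID>ground-cover</ID>
  <Label>Ground cover</Label>
  <Type>category</Type>
  <!-- Only one category can be selected -->
  <MaxCount>1</MaxCount>
  <Categories>
    <Category>
      <ID>bare</ID>
      <Label>Bare soil</Label>
      <!-- Moved to the top of the category list -->
      <Priority>yes</Priority>
    </Category>
    <Category>
      <ID>litter</ID>
      <Label>Litter</Label>
    </Category>
    <Category>
      <ID>rock</ID>
      <Label>Rock</Label>
    </Category>
  </Categories>
</Attribute>
```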

Table 1. XML Elements.

| Element | Description | Type | Maximum Length | Required |
|---|---|---|---|---|
| Protocol | | | | Yes |
| ID | Universally unique identifier | UUID | 36 | Yes |
| Label | Protocol display name | String | 100 | Yes |
| Description | Protocol description | String | 255 | No |
| Origin | Name of the individual or organization that created the protocol | String | 100 | No |
| SubjectLabel | Display name of data collection subjects | String | 100 | No |
| Version | Universally unique identifier of the protocol version | UUID | 36 | No |
| LastModified | Date and time of the last protocol edit | Datetime | NA | No |
| ShowAttributeValues | Yes if attribute values will be displayed by default in the attribute list and no otherwise. Default is yes. | String in set {no, yes} | NA | No |
| ShowCategoryDescription | Yes if category descriptions will be displayed by default in the category list and no otherwise. Default is yes. | String in set {no, yes} | NA | No |
| Forms | List of Form elements | Element list | NA | No |
| SharedLists | List of SharedList elements | Element list | NA | No |
| Form | | | | Yes |
| ID | Form identifier (must be unique within the scope of the protocol) | String | 40 | Yes |
| Label | Data form display name | String | 100 | Yes |
| Description | Data form description | String | 255 | No |
| AttributeLabel | Display name of data attributes. Default is ‘Attribute’. | String | 255 | No |
| ObservationSets | List of ObservationSet elements | Element list | NA | No |
| Attributes | List of Attribute elements | Element list | NA | No |
| Attribute | | | | Yes |
| ID | Attribute identifier (must be unique within the scope of the data form) | String | 40 | Yes |
| Label | Attribute display name | String | 100 | Yes |
| Description | Attribute description | String | 255 | No |
| ControlLabel | Label to display above the data entry control, if different from the attribute display name. Default is no label. | String | 255 | No |
| Figure | Name of the figure to display with the attribute description. Default is no figure. | String | 100 | No |
| Type | Attribute type | String in set {boolean, calendar, category, date, geometry, number, photo, text, time} | NA | Yes |
| Optional | Yes if the attribute is optional and no otherwise. Default is yes. | String in set {no, yes} | NA | No |
| MaxLength | Maximum number of characters that can be entered. Default is 1000 characters. | Integer in set {n>0} | 4 | No |
| Format (text type only) | Character sequence used to format the entry (n = number, a = lowercase or uppercase letter, A = uppercase letter). Default is no format. | String | 100 | No |
| Min (number type only) | Minimum allowable value. Default is no minimum bound. | Decimal | 40 | No |
| Max (number type only) | Maximum allowable value. Default is no maximum bound. | Decimal | 40 | No |
| Precision (number type only) | Maximum allowable decimal places. Default is unlimited decimal places. | Integer in set {0≤n≤10} | NA | No |
| Increment (number type only) | Increment amount. Default is no increment amount. | Integer | 40 | No |
| Unit (number type only) | Unit of measure. Default is no unit of measure. | String | 40 | No |
| Tool | Measurement tool. Default is no measurement tool. | String in set {clinometer, gps} | NA | No |
| StartYear (date type only) | First year to be displayed in the date or category control. Default is 2000. | Integer | 4 | No |
| EndYear (date type only) | Last year to be displayed in the date or category control. Default is 2050. | Integer | 4 | No |
| SharedList (category type only) | Name of the shared list used to populate the category list. Default is no shared list. | String | 40 | No |
| MaxCount (category type only) | Maximum number of categories that can be selected. Default is unlimited selections. | Integer in set {n>0} | 4 | No |
| AutoPrioritize (category type only) | Yes if most-used categories are moved to the top of the category list and no otherwise. Default is no. | String in set {no, yes} | NA | No |
| LabelObservation (category type only) | Yes if selected categories will be used to label observations and no otherwise. Default is no. | String in set {no, yes} | NA | No |
| CategoryModifier (category type only) | An optional modifier to be applied to selections. Only relevant if MaxCount is 1. | String | 100 | No |
| Categories (category or text type only) | List of Category elements | Element list | NA | No |
| Validations | List of Validation elements | Element list | NA | No |
| ObservationSet | | | | No |
| ID | Unique identifier within the scope of the data form | String | 40 | Yes |
| Label | Observation set display name | String | 100 | Yes |
| Description | Observation set description | String | 255 | No |
| Mutable | Yes if observations can be added/removed during data entry and no otherwise. Default is no. | String in set {no, yes} | NA | No |
| LabelMethod | Method used to label new observations. Observations can be labeled by numeric order (increment), current date (date), or user-specified label (manual). Default is manual. Only relevant if Mutable is yes. | String in set {date, increment, manual} | NA | No |
| Observations | List of Observation elements | Element list | NA | No |
| Observation | | | | No |
| ID | Unique identifier within the scope of the observation set | String | 40 | Yes |
| Label | Observation display name | String | 100 | Yes |
| Description | Observation description | String | 255 | No |
| SharedList | | | | No |
| ID | Unique identifier within the scope of the protocol | String | 40 | Yes |
| Label | Shared list display name | String | 100 | Yes |
| Description | Shared list description | String | 255 | No |
| Categories | List of Category elements | Element list | NA | No |
| Category | | | | No |
| ID | Unique identifier within the scope of the data attribute or shared list | String | 40 | Yes |
| Label | Category display name | String | 100 | Yes |
| Description | Category description | String | 255 | No |
| Priority | Yes if the category will be moved to the top of the category list and no otherwise. Default is no. | String in set {no, yes} | NA | No |
| Validation | | | | No |
| Type | Validation type | String in set {distinct value, exclusion set, exclusion switch, exclusive interval, inclusion set, inclusion switch, value combination} | NA | Yes |
| Attributes | List of Attribute elements containing a single string value and no child elements | Element list | 40 (Attribute value) | No |
| Dependencies | List of Dependency elements containing a single string value and no child elements | Element list | 255 (Dependency value) | No |
| Values | List of Value elements containing a single string value and no child elements | Element list | 255 (Value value) | No |

The code sample below illustrates the hierarchical structure of all required and optional XML elements in a Data Collection Protocol document. Stipa does not enforce the ordering of child elements, but following the order shown below makes the document easier to read.

<?xml version="1.0" encoding="UTF-8" ?>
<Protocol xmlns="https://webapps.jornada.nmsu.edu/stipa"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://webapps.jornada.nmsu.edu/stipa/stipa-protocol-v.beta.xml">
  <ID></ID>
  <Label></Label>
  <Description />
  <Origin />
  <SubjectLabel></SubjectLabel>
  <Version />
  <LastModified />
  <ShowAttributeValues />
  <ShowCategoryDescription />
  <Forms>
    <Form>
      <ID></ID>
      <Label></Label>
      <Description />
      <AttributeLabel></AttributeLabel>
      <ObservationSets>
        <ObservationSet>
          <ID></ID>
          <Label></Label>
          <Description />
          <Mutable />
          <LabelMethod />
          <Observations>
            <Observation>
              <ID></ID>
              <Label></Label>
              <Description />
            </Observation>
          </Observations>
        </ObservationSet>
      </ObservationSets>
      <Attributes>
        <Attribute>
          <ID></ID>
          <Label></Label>
          <Description />
          <ControlLabel />
          <Figure />
          <Type></Type>
          <Optional />
          <MaxLength />
          <Format />
          <Min />
          <Max />
          <Precision />
          <Increment />
          <Unit />
          <Tool />
          <StartYear />
          <EndYear />
          <SharedList />
          <MaxCount />
          <AutoPrioritize />
          <LabelObservation />
          <CategoryModifier />
          <Categories>
            <Category>
              <ID></ID>
              <Label></Label>
              <Description />
              <Priority />
            </Category>
          </Categories>
          <Validations>
            <Validation>
              <Type></Type>
              <Attributes>
                <Attribute></Attribute>
              </Attributes>
              <Dependencies>
                <Dependency></Dependency>
              </Dependencies>
              <Values>
                <Value></Value>
              </Values>
            </Validation>
          </Validations>
        </Attribute>
      </Attributes>
    </Form>
  </Forms>
  <SharedLists>
    <SharedList>
      <ID></ID>
      <Label></Label>
      <Description />
      <Categories>
        <Category>
          <ID></ID>
          <Label></Label>
          <Description />
          <Priority />
        </Category>
      </Categories>
    </SharedList>
  </SharedLists>
</Protocol>

Figure 2. XML document structure – all elements.
