7.4.2 Vocabulary mapping
There are many possible approaches to mapping terms from one vocabulary to another. One situation in which a standard process can be defined is when a controlled vocabulary is used to populate a field in the source data and that field maps directly to a field in the interchange format. For example, consider a source dataset whose ‘dominantLithology’ field holds the information used to populate the ‘representativeLithology_uri’ field of a GeologicUnitView feature. The recommended procedure in this case is:
- Produce a table of the unique ‘dominantLithology’ values in the source data.
- Add columns to this table for the corresponding term name and URI from the CGI standard lithology vocabulary (required for GeoSciML-Portrayal OneGeology services).
- Determine the best-matching term in the CGI standard lithology vocabulary for each unique lithology value.
- Use an SQL query like that in Code example 1, joining the ‘dominantLithology’ field to the corresponding field in the lookup table, to set the ‘representativeLithology_uri’ field to the correct standard lithology term URI.
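The four steps above can be sketched end to end with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not Code example 1 itself: the table and column names (source_units, lithology_lookup, cgi_term, cgi_uri) are hypothetical, the step 3 matches are hard-coded to stand in for the manual judgment the procedure calls for, and the base URI is illustrative only, so real term URIs should be taken from the vocabulary workbook.

```python
import sqlite3

# Illustrative base URI (assumption); confirm real term URIs against the vocabulary.
CGI_LITH = "http://resource.geosciml.org/classifier/cgi/lithology/"

conn = sqlite3.connect(":memory:")

# Hypothetical source table standing in for the real source dataset.
conn.execute(
    "CREATE TABLE source_units (unit_id INTEGER PRIMARY KEY, "
    "dominantLithology TEXT, representativeLithology_uri TEXT)"
)
conn.executemany(
    "INSERT INTO source_units (unit_id, dominantLithology) VALUES (?, ?)",
    [(1, "granite"), (2, "Granite, biotite"), (3, "sandstone"), (4, "granite")],
)

# Step 1: a table of the unique dominantLithology values.
conn.execute(
    "CREATE TABLE lithology_lookup AS "
    "SELECT DISTINCT dominantLithology FROM source_units"
)

# Step 2: columns for the matching CGI term name and URI.
conn.execute("ALTER TABLE lithology_lookup ADD COLUMN cgi_term TEXT")
conn.execute("ALTER TABLE lithology_lookup ADD COLUMN cgi_uri TEXT")

# Step 3: best match for each unique value. This is a human decision;
# it is hard-coded here for illustration. URIs are built by concatenation
# only because these example term names contain no spaces.
best_match = {
    "granite": "granite",
    "Granite, biotite": "granite",
    "sandstone": "sandstone",
}
conn.executemany(
    "UPDATE lithology_lookup SET cgi_term = ?, cgi_uri = ? "
    "WHERE dominantLithology = ?",
    [(term, CGI_LITH + term, src) for src, term in best_match.items()],
)

# Step 4: join on dominantLithology to set the URI in the source table.
conn.execute(
    """
    UPDATE source_units
    SET representativeLithology_uri = (
        SELECT l.cgi_uri
        FROM lithology_lookup AS l
        WHERE l.dominantLithology = source_units.dominantLithology
    )
    """
)
conn.commit()

# Any unique value still lacking a URI needs a mapping decision.
unmapped = conn.execute(
    "SELECT dominantLithology FROM lithology_lookup WHERE cgi_uri IS NULL"
).fetchall()
```

The correlated subquery in step 4 is used because it is widely portable; rows with no match in the lookup table are simply left NULL. SQLite 3.33 and later, and PostgreSQL, also accept an `UPDATE ... FROM` form of the same join.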
In general, use the most specific term from the interchange vocabulary that completely subsumes (encompasses) the meaning of the source term. If the source vocabulary has terms that are more specific than those in the controlled vocabulary, some information will be lost in this process; the original source terminology should therefore be preserved, as appropriate, in the description, lithology, and geologicHistory text fields of the interchange document. Remember that the primary purpose of the controlled vocabulary fields is to support data integration, search criteria, and standardized map legends.
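Where the source vocabulary records broader-term relationships, the "most specific subsuming term" rule can be applied mechanically by walking up from each source term until a term that exists in the interchange vocabulary is reached. The sketch below assumes a hypothetical, acyclic source hierarchy and a hypothetical subset of interchange terms; neither reflects the actual CGI vocabulary structure. The dictionary keys in `to_interchange` are also illustrative: the original wording is kept next to the mapped term so it can go into a free-text field, and in practice the mapped term would then be resolved to its URI.

```python
from typing import Dict, Optional

# Hypothetical broader-term links in a source vocabulary (term -> broader term).
SOURCE_BROADER: Dict[str, Optional[str]] = {
    "biotite granite": "granite",
    "granite": "granitoid",
    "granitoid": "igneous rock",
    "igneous rock": None,
    "quartz arenite": "sandstone",
    "sandstone": "sedimentary rock",
    "sedimentary rock": None,
}

# Hypothetical subset of terms available in the interchange vocabulary.
INTERCHANGE_TERMS = {"granite", "igneous rock", "sandstone"}


def most_specific_interchange_term(source_term: str) -> Optional[str]:
    """Return the nearest term (the source term itself or an ancestor)
    that exists in the interchange vocabulary, or None if there is none."""
    term: Optional[str] = source_term
    while term is not None:
        if term in INTERCHANGE_TERMS:
            return term
        # Unknown terms have no broader link, which ends the walk.
        term = SOURCE_BROADER.get(term)
    return None


def to_interchange(source_term: str) -> Dict[str, Optional[str]]:
    """Pair the mapped controlled term with the original source wording."""
    return {
        "representativeLithology": most_specific_interchange_term(source_term),
        "lithology": source_term,  # original terminology kept as free text
    }
```

For example, "biotite granite" maps to "granite" and "granitoid" falls back to "igneous rock"; a term with no path to any interchange term returns None, which flags it for manual review instead of forcing a poor match.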
In some cases, particularly for the representativeAge_uri, representativeLithology_uri, and genericSymbolizer fields, the mapping to interchange concepts can only be defined by unique combinations of values from several source fields. The procedure is the same, except that the unique-values query selects more than one source field and the queries that generate the output schema content must join on all of those fields.
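A two-field version of the same procedure is sketched below for representativeAge_uri, again using Python's sqlite3 module. The source columns olderAge and youngerAge, the table names, and the base URI are all hypothetical and illustrative. Each unique (olderAge, youngerAge) pair is assigned the most specific interval that spans both ends, for example Mesozoic for Jurassic to Cretaceous, and the update joins on both fields.

```python
import sqlite3

# Illustrative base URI (assumption); confirm real URIs against the vocabulary.
ICS = "http://resource.geosciml.org/classifier/ics/ischart/"

conn = sqlite3.connect(":memory:")

# Hypothetical source table with two age fields.
conn.execute(
    "CREATE TABLE units (unit_id INTEGER PRIMARY KEY, olderAge TEXT, "
    "youngerAge TEXT, representativeAge_uri TEXT)"
)
conn.executemany(
    "INSERT INTO units (unit_id, olderAge, youngerAge) VALUES (?, ?, ?)",
    [
        (1, "Jurassic", "Jurassic"),
        (2, "Jurassic", "Cretaceous"),
        (3, "Triassic", "Cretaceous"),
        (4, "Jurassic", "Cretaceous"),
    ],
)

# Unique combinations of the two source fields.
conn.execute(
    "CREATE TABLE age_lookup AS SELECT DISTINCT olderAge, youngerAge FROM units"
)
conn.execute("ALTER TABLE age_lookup ADD COLUMN age_uri TEXT")

# Hand-assigned: the most specific interval spanning both ends of each range.
conn.executemany(
    "UPDATE age_lookup SET age_uri = ? WHERE olderAge = ? AND youngerAge = ?",
    [
        (ICS + "Jurassic", "Jurassic", "Jurassic"),
        (ICS + "Mesozoic", "Jurassic", "Cretaceous"),
        (ICS + "Mesozoic", "Triassic", "Cretaceous"),
    ],
)

# Multiple-field join: both fields must match the lookup row.
conn.execute(
    """
    UPDATE units
    SET representativeAge_uri = (
        SELECT a.age_uri
        FROM age_lookup AS a
        WHERE a.olderAge = units.olderAge
          AND a.youngerAge = units.youngerAge
    )
    """
)
conn.commit()
```

If either source field can be NULL, the `=` comparisons above will never match that row, because NULL is not equal to NULL in SQL. SQLite's `IS` operator, or the standard `IS NOT DISTINCT FROM`, gives a null-safe comparison in that case.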
Standard vocabularies are listed in the accompanying Microsoft Excel workbook.
Section last modified: 08 October 2015