
Reads an XML file, as produced by SIVIM (version 1.2), and imports the most relevant data into R data.frames (the relevé table and the respective header), using package xml2.

Usage

read_SIVIM_xml(file, extract = "both", select.col = NA, report = FALSE)

Arguments

file

character The name or path to the file. If only the name is given, the working directory must point to the file location.

extract

character "both" extracts both the relevé table and respective header; "table" extracts only relevé table data (without the header); "header" extracts only the header data.

select.col

numeric A vector with the indices (of the original table order) of the relevés to extract. If NA (the default) all relevés are retrieved from the XML file.

report

logical Should the function report on non-empty elements or attributes that were not treated? Defaults to FALSE.

Value

If extract = "both", the function returns a list with the following two components:

table

a data.frame with the relevé table (without the header)

header

a data.frame with the respective header data

If extract = "table" or extract = "header" only the respective data.frame is returned.
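The calls below sketch how the arguments and return values described above fit together. The file name is hypothetical, and the example assumes that the package exporting read_SIVIM_xml is already attached and that the file holds at least five relevés. The guard means nothing runs when the file or the function is missing.

```r
# Hypothetical path to an XML file exported from SIVIM (replace with your own).
sivim_file <- "my_sivim_export.xml"

# Guarded so the sketch does nothing when the file or the function is absent.
if (file.exists(sivim_file) && exists("read_SIVIM_xml")) {
  # Default extract = "both": a list with components `table` and `header`.
  both <- read_SIVIM_xml(sivim_file)
  releve_table <- both$table
  releve_header <- both$header

  # Only the relevé table, restricted to the first five relevés
  # (assumes the file contains at least five).
  first_five <- read_SIVIM_xml(sivim_file, extract = "table", select.col = 1:5)

  # Only the header, reporting non-empty elements that were not treated.
  header_only <- read_SIVIM_xml(sivim_file, extract = "header", report = TRUE)
}
```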

Details

This function uses functions from the package xml2 to import the data inside a SIVIM XML file.

Duplicated lines are merged using the aggregate_repeated function, keeping the maximum value according to the scale recognized by releve_scale. If the scale of the relevé is not recognized, a warning is given and the highest value, considering the simple alphabetic order of the values in the relevés, is kept. Manual checking is strongly advisable in these cases.
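To illustrate the idea of keeping the maximum value on an ordinal scale when merging duplicated lines, here is a minimal base R sketch. It is not the implementation of aggregate_repeated or releve_scale. The scale vector, the helper max_on_scale and the toy data are all assumptions made for the example.

```r
# Assumed ordinal scale, from lowest to highest; "." stands for absence.
bb_levels <- c(".", "r", "+", "1", "2", "3", "4", "5")

# Toy relevé table with one taxon appearing on two lines (hypothetical data).
tab <- data.frame(
  taxon = c("Quercus suber", "Cistus ladanifer", "Quercus suber"),
  rel1  = c("2", "+", "."),
  rel2  = c("1", ".", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the value ranking highest on the given scale.
max_on_scale <- function(x, levels) {
  levels[max(match(x, levels))]
}

# Merge duplicated taxa, keeping the highest value per relevé column.
merged <- aggregate(
  tab[, c("rel1", "rel2")],
  by = list(taxon = tab$taxon),
  FUN = max_on_scale,
  levels = bb_levels
)

print(merged)
```

Ranking values by their position in an explicit scale vector avoids relying on alphabetic order, which is the fallback that triggers the warning described above.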

In very rare cases, SIVIM tables contain empty strings ("") or unexpected characters in place of the cover-abundance values. This usually reflects an error, probably from digitization, that needs manual correction. When an empty string (or any other unexpected character) is present, the function issues a warning because it does not recognize the scale in use. Such cases are imported as they are (possibly still with duplicated lines merged) and should be addressed manually.

Although it is not a conventional choice, a dot (".") is used as the absence value in the imported data.frame. This greatly facilitates the use of several other functions, such as aggregate_repeated, which would fail if a true NA were used. It also matches a common way of presenting phytosociological tables in the Iberian tradition, which improves readability.
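When a later analysis needs presence/absence data or true missing values, the dot can be converted after import. The following base R sketch uses a small hypothetical table in place of real imported data.

```r
# Toy table using "." as the absence value (hypothetical data).
tab <- data.frame(
  rel1 = c("2", ".", "+"),
  rel2 = c(".", ".", "1"),
  row.names = c("Taxon A", "Taxon B", "Taxon C"),
  stringsAsFactors = FALSE
)

# Logical presence/absence matrix: TRUE wherever the value is not a dot.
presence <- tab != "."

# Number of taxa present in each relevé.
richness <- colSums(presence)

# Copy of the table with dots replaced by NA, for functions that expect NA.
tab_na <- tab
tab_na[tab_na == "."] <- NA
```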

Author

Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.