Reads an XML file, as produced by SIVIM (version 1.2), importing the most relevant data to R data.frames (relevé table and the respective header), using package xml2.
Arguments
- file
character
The name or path to the file. If only the name is given, the working directory must point to the file location.
- extract
character
"both" extracts both the relevé table and the respective header; "table" extracts only the relevé table data (without the header); "header" extracts only the header data.
- select.col
numeric
A vector with the indices (in the original table order) of the relevés to extract. If NA (the default), all relevés are retrieved from the XML file.
- report
logical
Should the function report on non-empty elements or attributes that were not treated? Defaults to FALSE.
Value
If extract = "both", the function returns a list with the following two components:
- table
a data.frame with the relevé table (without the header)
- header
a data.frame with the respective header data
If extract = "table" or extract = "header", only the respective data.frame is returned.
Details
This function uses functions from the package xml2 to import the data inside a SIVIM XML file.
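As a rough illustration of this kind of import, the sketch below reads a tiny XML snippet with xml2 and flattens it into a data.frame. The element and attribute names (releves, releve, taxon, id, name, cover) are hypothetical and chosen only for the example; real SIVIM files follow their own structure, and this is not the function's actual implementation.

```r
library(xml2)

# Hypothetical XML snippet; real SIVIM files use their own element and
# attribute names.
xml_txt <- '<releves>
  <releve id="1">
    <taxon name="Quercus suber" cover="3"/>
    <taxon name="Cistus ladanifer" cover="+"/>
  </releve>
  <releve id="2">
    <taxon name="Cistus ladanifer" cover="1"/>
  </releve>
</releves>'

doc <- read_xml(xml_txt)
releves <- xml_find_all(doc, ".//releve")

# Build one long-format data.frame: one row per taxon occurrence.
long <- do.call(rbind, lapply(releves, function(rel) {
  taxa <- xml_find_all(rel, "./taxon")
  data.frame(
    releve = xml_attr(rel, "id"),
    taxon = xml_attr(taxa, "name"),
    cover = xml_attr(taxa, "cover"),
    stringsAsFactors = FALSE
  )
}))
```

The long format keeps one row per occurrence, which is a convenient intermediate step before reshaping into a taxa-by-relevés table.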
Duplicated lines are merged using the aggregate_repeated function, keeping the maximum value according to the scale recognized using releve_scale.
If the scale of the relevé is not recognized, a warning is given to the user and the highest value – considering the simple alphabetic order of the values in the relevés – is kept. Manual checking is strongly advisable for these cases.
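The idea of merging duplicated lines while keeping the maximum value can be sketched in base R as follows. This is only an illustration, not the code of aggregate_repeated or releve_scale: the ordering in scale_levels, the helper max_by_scale and the toy table are all assumptions made for the example.

```r
# Assumed ordering of cover-abundance values, from lowest to highest;
# "." denotes absence.
scale_levels <- c(".", "r", "+", "1", "2", "3", "4", "5")

# Toy relevé table with a duplicated taxon (made-up data).
tab <- data.frame(
  taxon = c("Quercus suber", "Cistus ladanifer", "Quercus suber"),
  r1 = c("2", "+", "3"),
  r2 = c(".", "1", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: highest value of x according to the given ordering.
max_by_scale <- function(x, levels) {
  levels[max(match(x, levels))]
}

# Merge duplicated taxa, keeping the maximum value in each relevé column.
merged <- aggregate(
  tab[c("r1", "r2")],
  by = list(taxon = tab$taxon),
  FUN = max_by_scale,
  levels = scale_levels
)
```

In this sketch the two "Quercus suber" lines collapse into one, with "3" in r1 and "+" in r2. When no ordering is known, a plain max() on the character values would be the analogue of the alphabetic fallback described above, and its result depends on the collation order of the locale.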
In very rare cases, SIVIM tables present empty strings ("") or strange characters in place of the cover-abundance values. This usually corresponds to a failure, probably from digitization, that needs manual correction. When an empty string (or any other strange character) is present, the function produces a warning, as it does not recognize the scale in use. Such cases are still imported (duplicated lines may still be merged) and should be addressed manually.
Although this may be unexpected, a dot (".") is used as the absence value in the imported data.frame. This greatly facilitates the use of several other functions, such as aggregate_repeated, which would fail if a true NA were used.
Additionally, it corresponds to a common way of presenting phytosociological tables in the Iberian tradition, with improved readability.
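For downstream analyses that do not expect the dot convention, the absence value can be converted after import. The following base R sketch uses a made-up table (taxa as row names, relevés as columns) and assumes nothing about the package beyond the "." convention described above.

```r
# Toy relevé table using "." as the absence value (made-up data).
tab <- data.frame(
  r1 = c("2", ".", "+"),
  r2 = c(".", "1", "."),
  row.names = c("Taxon A", "Taxon B", "Taxon C"),
  stringsAsFactors = FALSE
)

# Presence/absence matrix: TRUE wherever the value is not the absence dot.
presence <- as.matrix(tab) != "."

# Number of taxa recorded in each relevé.
richness <- colSums(presence)

# If true missing values are needed downstream, replace the dots with NA.
tab_na <- tab
tab_na[tab_na == "."] <- NA
```

Keeping the original table untouched and deriving presence or tab_na from it preserves the readable dot layout for display while still allowing NA-based workflows.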
Author
Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.