The Total Differential Value of a big phytosociological data set
Source: R/bigdata_tdv.R
Given a big phytosociological data set represented as a list, and a partition of the relevés in that list, this function calculates the respective Total Differential Value (TDV).
Usage
bigdata_tdv(
  phyto_list,
  p,
  n_rel,
  output_type = "normal",
  parallel = FALSE,
  mc_cores = getOption("mc.cores", 2L)
)
Arguments
- phyto_list
A list. This is a very light representation of what could be a usual phytosociological table, registering only taxa presences. Each component should uniquely represent a taxon and should contain a vector (of numeric values) with the id(s) of the relevé(s) where that taxon was observed. Relevé ids are expected to be consecutive integers, starting with 1. The components of the list might be named (e.g. using the taxon name) or unnamed (further reducing the memory burden). However, for output_type = "normal", taxa names are useful for interpreting the output.
- p
A vector of integer numbers with the partition of the relevés (i.e., a k-partition, consisting of a vector with values from 1 to k, with length equal to the number of relevés in phyto_list, ascribing each relevé to one of the k groups).
- n_rel
The number of relevés in phyto_list, obtained e.g. with length(unique(unlist(phyto_list))).
- output_type
A character value determining the amount of information returned by the function and also the amount of pre-validations. Possible values are "normal" (the default) and "fast".
- parallel
Logical. Should parallel::mclapply() be used to improve computation time by forking? Not available on Windows. Refer to that function's manual for more information. Defaults to FALSE.
- mc_cores
The number of cores to be passed to parallel::mclapply() if parallel = TRUE. See parallel::mclapply() for more information.
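As an illustration of the expected inputs, the following minimal sketch builds phyto_list, n_rel and p from a small presence/absence matrix using base R only. The toy matrix, the taxon names and the partition are hypothetical and are not part of the package.

```r
# Hypothetical toy presence/absence table: 4 taxa (rows) by 5 relevés (columns).
m_bin <- matrix(
  c(1, 1, 0, 0, 0,
    1, 0, 1, 0, 0,
    0, 0, 1, 1, 0,
    0, 0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c", "taxon_d"), NULL)
)

# One list component per taxon, holding the ids of the relevés where it occurs.
# lapply() is used so that the result is always a list.
phyto_list <- lapply(
  seq_len(nrow(m_bin)),
  function(i) which(m_bin[i, ] == 1)
)
names(phyto_list) <- rownames(m_bin)

# Number of relevés, as suggested for the n_rel argument.
n_rel <- length(unique(unlist(phyto_list, use.names = FALSE)))

# A 2-partition: relevés 1-2 in group 1, relevés 3-5 in group 2,
# given by increasing order of relevé id.
p <- c(1, 1, 2, 2, 2)

# Simple sanity checks mirroring the documented expectations:
# consecutive ids starting with 1, and one group label per relevé.
ids <- sort(unique(unlist(phyto_list, use.names = FALSE)))
stopifnot(
  identical(ids, seq_len(n_rel)),
  length(p) == n_rel,
  all(p %in% seq_len(max(p)))
)
```

These objects could then be passed as phyto_list, p and n_rel to bigdata_tdv().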
Value
If output_type = "normal" (the default), pre-validations are done (which can take some time) and a list is returned, with the following components (see tdv() for the mathematical notation):
- ifp
A matrix with the \(\frac{a}{b}\) values for each taxon in each group, for short called the 'inner frequency of presences'.
- ofda
A matrix with the \(\frac{c}{d}\) values for each taxon in each group, for short called the 'outer frequency of differentiating absences'.
- e
A vector with the \(e\) values for each taxon, i.e., the number of groups containing that taxon.
- diffval
A matrix with the \(DiffVal\) for each taxon.
- tdv
A numeric with the TDV of the data set in phyto_list, given the partition p.
If output_type = "fast", only TDV is returned and no pre-validations are done.
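To make the returned components more concrete, here is a minimal base R sketch that computes analogous quantities directly from a presence list. It assumes the DiffVal definition described in the tdv() documentation (a: presences of the taxon in group g; b: size of group g; c: relevés outside g belonging to groups from which the taxon is completely absent; d: number of relevés outside g; e: number of groups containing the taxon). It is not the package's implementation, the toy data are hypothetical, and it assumes at least two groups.

```r
# Hypothetical toy data: 3 taxa, 5 relevés, 2 groups.
phyto_list <- list(A = c(1, 2), B = c(1, 3), C = c(3, 4))
p <- c(1, 1, 2, 2, 2)

k <- max(p)                      # number of groups (assumed >= 2)
n_rel <- length(p)               # number of relevés
b <- tabulate(p, nbins = k)      # size of each group
d <- n_rel - b                   # relevés outside each group

# a: presences of each taxon (rows) in each group (columns).
a <- t(vapply(
  phyto_list,
  function(ids) tabulate(p[ids], nbins = k),
  numeric(k)
))

# Inner frequency of presences, a / b.
ifp <- sweep(a, 2, b, "/")

# c: for taxon s and group g, the total size of the groups other than g
# from which taxon s is completely absent.
absent <- (a == 0) * 1
c_mat <- as.vector(absent %*% b) - sweep(absent, 2, b, "*")

# Outer frequency of differentiating absences, c / d.
ofda <- sweep(c_mat, 2, d, "/")

# e: number of groups containing each taxon.
e <- rowSums(a > 0)

# DiffVal per taxon and the resulting TDV (mean DiffVal over taxa).
diffval <- rowSums(ifp * ofda) / e
tdv <- mean(diffval)
```

In this toy case taxon A occurs in every relevé of group 1 and nowhere else, so its DiffVal is 1; taxon B occurs in both groups, so its DiffVal is 0; taxon C gets 2/3, giving a TDV of 5/9.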
Details
This function accepts a list (phyto_list) representing a phytosociological data set, as well as a k-partition of its relevés (p), returning the corresponding TDV (see tdv() for an explanation on TDV).
Partition p gives the group to which each relevé is ascribed, by increasing order of relevé id.
Big phytosociological tables can occupy a significant amount of computer
memory, mostly because absences (usually more frequent than presences) are
also recorded in memory. Using a list that records only presences
significantly reduces both the memory needed to store the information
contained in a phytosociological table and the computation time of TDV,
making computations feasible for big data sets.
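The memory argument can be checked informally with base R. The sketch below simulates a sparse presence/absence table (the dimensions, the 2% presence rate and the seed are arbitrary assumptions) and compares the size of the matrix with that of the equivalent presence list using utils::object.size().

```r
set.seed(1)

# Simulated sparse table: 2000 taxa (rows) by 500 relevés (columns),
# with roughly 2% presences. These figures are arbitrary.
n_taxa <- 2000
n_relev <- 500
m_bin <- matrix(
  rbinom(n_taxa * n_relev, size = 1, prob = 0.02),
  nrow = n_taxa
)

# Equivalent list representation, storing presences only.
phyto_list <- lapply(
  seq_len(n_taxa),
  function(i) which(m_bin[i, ] == 1)
)

# Compare the memory footprint of the two representations.
size_matrix <- as.numeric(utils::object.size(m_bin))
size_list <- as.numeric(utils::object.size(phyto_list))
print(c(matrix = size_matrix, list = size_list))
```

The gap depends on how sparse the table is: the list pays a fixed overhead per taxon, so the saving shrinks as presences become more frequent.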
Author
Tiago Monteiro-Henriques. E-mail: tmh.dev@icloud.com.
Examples
# Getting the Taxus baccata forests data set
data(taxus_bin)
# Creating a group partition, as the one presented in the original article of
# the data set
groups <- rep(c(1, 2, 3), c(3, 11, 19))
# Removing taxa occurring in only one relevé, in order to reproduce exactly
# the example in the original article of the data set
taxus_bin_wmt <- taxus_bin[rowSums(taxus_bin) > 1, ]
# Calculating TDV using tdv()
tdv(taxus_bin_wmt, groups)$tdv
#> [1] 0.1958471
# Converting from the phytosociological matrix format to the list format
taxus_phyto_list <- apply(taxus_bin_wmt, 1, function(x) which(as.logical(x)))
# Getting the number of relevés in the list
n_rel <- length(unique(unlist(taxus_phyto_list)))
# Calculating TDV using bigdata_tdv(), even if this is not a big matrix
bigdata_tdv(
  phyto_list = taxus_phyto_list,
  p = groups,
  n_rel = n_rel,
  output_type = "normal"
)$tdv
#> [1] 0.1958471