
ohvbd is an R package for retrieving (and parsing) data
from a network of disease vector data sources.
This package was developed as part of the One Health Vector-Borne Diseases Hub.
ohvbd allows for searching and retrieving data
from the following data sources:
You can install the stable version of ohvbd from CRAN:
install.packages("ohvbd")
You can alternatively install the development version of ohvbd from GitHub, including any new or experimental features:
# install.packages("devtools")
devtools::install_github("fwimp/ohvbd")
The vignettes are all available online, but if you would like to
build them locally, add build_vignettes = TRUE to your
install_github() command. However, we do not recommend
doing this because of the number of extra R packages used in the
vignettes.
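As a sketch of the local-build option described above, the call below passes build_vignettes = TRUE to devtools::install_github(). It assumes devtools is already installed and that the extra packages the vignettes rely on can be installed on your machine; browseVignettes() is the standard utils helper for listing installed vignettes.

```r
# install.packages("devtools")
# Build the vignettes during installation (slower, and it needs the extra
# packages that the vignettes use).
devtools::install_github("fwimp/ohvbd", build_vignettes = TRUE)

# List the locally built vignettes once installation has finished.
browseVignettes("ohvbd")
```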
ohvbd has been designed to make finding and retrieving
data on disease vectors simple and straightforward.
Typically it uses a “piped”-style approach to find, get, and filter data from the supported databases. However, it aims to provide the data to you “as-is”, leaving further downstream analysis and filtering to you.
A basic pipeline for finding and retrieving data on Ixodes ricinus from the VecTraits database looks something like this:
library(ohvbd)
df <- search_hub("Ixodes ricinus") |>
  filter_db("vt") |>
  fetch() |>
  glean()
Major API change
extract_ functions are now glean_.
As a result, even if tidyverse is loaded after ohvbd, there are no direct namespace collisions.
Full list of function name changes:
extract() -> glean()
extract_ad() -> glean_ad()
extract_gbif() -> glean_gbif()
extract_vd() -> glean_vd()
extract_vt() -> glean_vt()
fetch_extract_vd_chunked() -> fetch_glean_vd_chunked()
fetch_extract_vt_chunked() -> fetch_glean_vt_chunked()
New functions & arguments:
ohvbd now interfaces with GBIF for occurrence data.
*_gbif functions (e.g. fetch_gbif()) allow for retrieving and extracting data from GBIF.
rgbif package are required to retrieve data from GBIF.
The tee() command allows one to extract data from the middle of a pipeline and save it to an environment.
ohvbd workflows, and can be used in any base R pipeline (|>). It has not been tested in magrittr pipelines but should work as-is.
The filter_db() command allows for extracting only one database’s results from hub searches.
check_db_status() now returns (invisibly) whether all databases are up or not.
The fetch_citation() and fetch_citation_* commands provide an interface to attempt to retrieve citations from a VectorByte dataset.
The force_db() function enables one to force ohvbd to consider a particular object as having a particular provenance.
The simplify argument to search_hub() makes hub searches return an ohvbd.ids object if only one database was searched for. This behaviour is on by default.
filter_db() will now transparently return ohvbd.ids objects if it gets them.
The taxonomy argument to search_hub() allows for filtering searches by GBIF backbone IDs.
The match_species() function allows for quick and flexible matching of species names to their GBIF backbone IDs.
The match_country() function allows for matching of country names to WKT polygons via naturalearth.
The ohvbd_db(), has_db(), and is_from() functions allow for quick testing of object provenance (according to ohvbd).
The get_default_ohvbd_cache() function allows for custom functions that interface with cached ohvbd data files.
The list_ohvbd_cache() and clean_ohvbd_cache() functions enable better interactive cache management.
clean_ad_cache() has been removed as it is now unnecessary.
search_x_smart() functions can now take "tags" as a search field, enabling support for tagged datasets.
Other:
Examples are no longer wrapped in \dontrun{}, so they should be runnable from an installed version of the package.
ohvbd is now covered with unit tests (using the vcr package).
fetch_vd() no longer tries to retrieve ids with no pages of data.
set_ohvbd_compat() as unexpected SSL errors should break pipelines by default.
Calling fetch() on an ohvbd.hub.search or glean() on an ohvbd.ids object now provides a hint that you may have forgotten something.
It is easy to forget the fetch() command and run search_hub() |> glean(), which previously did not give an interpretable error.
Vignettes now use vcr to massively reduce their build time. This should only matter to developers of ohvbd, or users who download from GitHub and build the vignettes themselves.
ohvbd.ids() now warns you and fixes the problem if you provide ids with duplicate values.
glean_vt() and glean_vd() now force the inclusion of the dataset ID when filtering columns (using the cols argument).
glean_ad() now correctly returns a matrix even when there is only 1 row or column.
fetch_vd_counts() is now significantly faster, more robust, and temporarily caches data.
This should also speed up fetch_vd() under the hood, particularly if you are running it multiple times in a day.
Term matching (e.g. in fetch_ad() for metrics and search_vt_smart() for operators and fields) is now fuzzy, allowing for a small amount of deviation from the actual term name.
assoc_ad() now tries to guess LatLong column names if none (or the wrong ones) are provided.
Function arguments now use NULL rather than NA for default missing values (except date arguments to AD-related functions, where NA is more reasonable in the grand scheme).
fetch_ad() now caches and tries to read from cache by default.
To override this, use refresh_cache = TRUE or use_cache = FALSE (depending on whether you want to replace your existing cache or not).
See changelog for patch notes for all versions.