RDataGet

This page contains the documentation for RDataGet.

RDataGet.RDataGet — Module

RDataGet

RDataGet fetches tabular R datasets from CRAN. It is an alternative to RDatasets.jl that downloads data on demand rather than bundling it.

The basic usage is similar to RDatasets.jl. You can install it as follows:

    using Pkg
    Pkg.add("RDataGet")

After installing RDataGet, you can load datasets with the dataset() function, which takes the name of a package and the name of a dataset as arguments:

    using RDataGet
    harman_political = dataset("psych", "Harman.political")
    neuro = dataset("boot", "neuro")

Limitations

This package currently just downloads source packages from CRAN and loads their datasets into memory in Julia. It does not depend on R itself.

The package has a few limitations, some of which are caused by this design, while others could be addressed in the future:

- Does not support built-in R datasets, including the datasets package; only datasets that can be downloaded from CRAN are supported
- Can only load rda/RData/csv.gz files in the data directory
  - As such, it does not support packages which generate their data using a build script
- Cannot get any descriptions or further documentation related to the datasets from Julia (maybe TODO, but needs .Rd parsing)
- Only supports getting the latest version of each package (TODO)
- Fixed, very limited caching strategy
  - The package index is re-downloaded every time any package needs to be downloaded, so as to find the latest version number (TODO: should be cached per session by default, with longer caching allowed)
  - Packages are downloaded exactly once per session, after which the same data is reused until Julia is restarted (TODO: should be customisable for longer caching)
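
The "downloaded exactly once per session" behaviour can be pictured as a simple in-memory memoisation. The sketch below is only an illustration of that idea, not RDataGet's actual implementation: `SESSION_CACHE`, `DOWNLOADS`, `fake_download` and `get_package` are hypothetical names, and the download is simulated so that the example runs without network access.

```julia
# Hypothetical session cache mapping a package name to its loaded data.
# It lives in memory only, so it is lost when Julia is restarted.
const SESSION_CACHE = Dict{String,Any}()

# Counter for simulated downloads, used to show when the cache is hit.
const DOWNLOADS = Ref(0)

# Stand-in for an expensive download (assumption: the real package
# would fetch a source tarball from CRAN at this point).
function fake_download(pkg::AbstractString)
    DOWNLOADS[] += 1
    return "contents of $pkg"
end

# Return the cached data if present; otherwise download once and store it.
get_package(pkg::AbstractString) =
    get!(() -> fake_download(pkg), SESSION_CACHE, String(pkg))

get_package("boot")    # triggers a simulated download
get_package("boot")    # served from the session cache
get_package("psych")   # a different package, so a second download

println(DOWNLOADS[])   # prints 2
```

Because the cache is an ordinary in-memory dictionary in this sketch, nothing persists across sessions, which mirrors the limitation described above.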

Exported functions

RDataGet.dataset — Function

dataset(package_name, dataset_name) -> Any
dataset(package_name, dataset_name, types) -> Any
dataset(
    package_name,
    dataset_name,
    types,
    cran_mirror
) -> Any

Tries to find dataset_name in the data directory of the R package package_name. The data table is loaded directly from an RData or CSV file in the package source. Sometimes not all columns of a CSV can be typed successfully, so types can be provided, which will be passed to CSV.File.

An alternative cran_mirror can be specified; by default, default_cran_mirror = "https://cloud.r-project.org/" is used.

After the first load, the data will be cached as an Arrow file.
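
To show what a types hint does once it reaches CSV.File, here is a minimal sketch that uses CSV.jl directly on an in-memory table. The column names and values are made up for illustration, and exactly how dataset() forwards its types argument is an assumption; only the `types` keyword of CSV.File itself is relied upon.

```julia
using CSV

# A small made-up table whose `code` column looks numeric
# but is really an identifier with leading zeros.
csv_text = """
code,value
007,1.5
010,2.5
"""

# Without hints, CSV.jl infers each column's type from its values.
inferred = CSV.File(IOBuffer(csv_text))

# With an explicit mapping, the `code` column is read as text instead.
typed = CSV.File(IOBuffer(csv_text); types = Dict(:code => String))

println(eltype(inferred.code))
println(eltype(typed.code))
```

A mapping of this kind is the sort of value one would presumably pass as the types argument of dataset() when a CSV-backed dataset is typed incorrectly by default.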

RDataGet.datasets — Function

datasets(package_name) -> DataFrame
datasets(package_name, cran_mirror) -> DataFrame

Lists the datasets found in the data directory of the package_name R package along with some basic metadata in a DataFrame.

An alternative cran_mirror can be specified; by default, default_cran_mirror = "https://cloud.r-project.org/" is used.

This will currently cause all datasets in the package to be cached.
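
As a rough usage sketch, the two exported functions can be combined: list what a package offers, then load one entry by name. The package and dataset names are the ones used earlier on this page. The explicit mirror URL is an assumption, chosen only because it has the same form as the default. The calls need network access, and the columns of the returned metadata table are not documented here, so the sketch only prints it.

```julia
using RDataGet

# List the datasets found in the data directory of the "boot" package.
# Per the note above, this currently caches every dataset in the package.
available = datasets("boot")
println(available)

# Load one of the listed datasets by name.
neuro = dataset("boot", "neuro")

# The same listing, with the CRAN mirror given explicitly as the second
# positional argument (assumed URL, same form as the default mirror).
available_again = datasets("boot", "https://cran.r-project.org/")
```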
