RDataGet
This page contains the documentation for RDataGet.
RDataGet.RDataGet — Module
RDataGet
RDataGet gets tabular R datasets from CRAN. It is an alternative to RDatasets.jl that downloads data on demand rather than bundling it.
The basic usage is similar to RDatasets.jl. You can install it as follows:
using Pkg
Pkg.add("RDataGet")
After installing the RDataGet package, you can load data sets using the dataset() function, which takes the name of a package and a data set as arguments:
using RDataGet
harman_political = dataset("psych", "Harman.political")
neuro = dataset("boot", "neuro")
Limitations
This package currently just downloads source packages from CRAN and loads their datasets into memory in Julia. It does not depend on R itself.
The package has a few limitations, some of which are caused by this design, while others could be addressed in the future:
- Does not support built-in R datasets, including the datasets package, only ones which can be downloaded from CRAN
- Can only load rda/RData/csv.gz files in the data directory
  - As such it does not support packages which generate their data using a build script
- Cannot get any descriptions or further documentation related to the datasets from Julia (maybe TODO but needs .Rd parsing)
- Only supports getting the latest version of each package (TODO)
- Fixed, very limited caching strategy
  - The package index is re-downloaded every time we need to download any package (so as to find the latest version number) (TODO: should be by-default cached per session + longer caching allowed)
  - Packages are downloaded exactly once per session, after which the same data is reused until Julia is restarted (TODO: should be customisable for longer caching)
Exported functions
RDataGet.dataset — Function
dataset(package_name, dataset_name) -> Any
dataset(package_name, dataset_name, types) -> Any
dataset(
package_name,
dataset_name,
types,
cran_mirror
) -> Any
Tries to find dataset_name in the data directory of the R package package_name. The data table is loaded directly from an RData or CSV file in the package source. Sometimes not all columns can be successfully typed from CSVs, so types can be provided, which will be passed to CSV.File.
An alternative cran_mirror can be specified; by default, default_cran_mirror = "https://cloud.r-project.org/" is used.
After the first load, the data is cached as an Arrow file.
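As a rough illustration of what a types value can look like: since types is described as being passed to CSV.File, the sketch below calls CSV.File directly on a small in-memory table. The column names and values are made up for illustration, and the example assumes CSV.jl is installed. It does not show how RDataGet works internally.

```julia
using CSV

# Made-up CSV text: the zip column has a leading zero, so it should be
# read as text rather than as a number.
raw = """
zip,count
02139,3
10001,7
"""

# A Dict mapping column names to types is one of the forms CSV.File accepts
# for its `types` keyword; columns not listed are still inferred.
col_types = Dict("zip" => String)

typed = CSV.File(IOBuffer(raw); types=col_types)

println(typed.zip[1])   # first zip code, read as a string
```

Assuming the value is forwarded unchanged, the same Dict could be given as the third argument, as in dataset(package_name, dataset_name, col_types), for a package whose data directory holds a csv.gz file.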
RDataGet.datasets — Function
datasets(package_name) -> DataFrame
datasets(package_name, cran_mirror) -> DataFrame
Lists the datasets found in the data directory of the package_name R package along with some basic metadata in a DataFrame.
An alternative cran_mirror can be specified; by default, default_cran_mirror = "https://cloud.r-project.org/" is used.
This will currently cause all datasets in the package to be cached.
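A minimal sketch of how listing and loading might be combined, reusing the "boot" package and "neuro" dataset from the usage example above. It needs network access to CRAN. The metadata column names are not documented on this page, so the sketch inspects them rather than assuming any.

```julia
using RDataGet

# List the datasets found in the data directory of the CRAN package "boot".
# As noted above, this also caches every dataset in the package.
index = datasets("boot")

# Inspect the shape and the metadata columns of the returned DataFrame.
println(size(index))
println(names(index))

# Load one of the listed datasets by name.
neuro = dataset("boot", "neuro")
println(size(neuro))
```

Because datasets() caches the whole package, calling dataset() directly may be the lighter option when the dataset name is already known.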