R Data Science Library Package
R packages are modules that contain R functions and data sets. Greenplum Database provides a collection of data science-related R libraries that can be used with the Greenplum Database PL/R language. You can download these libraries in .gppkg format from Pivotal Network.
This chapter contains the following information:
- R Data Science Libraries
- Installing the R Data Science Library Package
- Uninstalling the R Data Science Library Package
For information about the Greenplum Database PL/R Language, see Greenplum PL/R Language Extension.
R Data Science Libraries
abind, adabag, arm, assertthat, BH, bitops, car, caret, caTools, coda, colorspace, compHclust, curl, data.table, DBI, dichromat, digest, dplyr, e1071, flashClust, forecast, foreign, gdata, ggplot2, glmnet, gplots, gtable, gtools, hms, hybridHclust, igraph, labeling, lattice, lazyeval, lme4, lmtest, magrittr, MASS, Matrix, MCMCpack, minqa, MTS, munsell, neuralnet, nloptr, nnet, pbkrtest, plyr, quantreg, R2jags, R6, randomForest, RColorBrewer, Rcpp, RcppEigen, readr, reshape2, rjags, RobustRankAggreg, ROCR, rpart, RPostgreSQL, sandwich, scales, SparseM, stringi, stringr, survival, tibble, tseries, zoo
Installing the R Data Science Library Package
Before you install the R Data Science Library package, make sure that Greenplum Database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
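The environment-variable part of these prerequisites can be checked before running gppkg. The following is a minimal sketch, not a Greenplum utility: check_gp_env is a hypothetical helper name, and it only tests that the two variables are non-empty. It does not verify that the database is running.

```shell
# Hypothetical pre-installation helper (illustrative only).
# Returns 0 when both variables named in the prerequisites are set,
# and 1 (with a message on stderr) when either is missing or empty.
check_gp_env() {
    status=0
    for name in GPHOME MASTER_DATA_DIRECTORY; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name is not set" >&2
            status=1
        fi
    done
    return "$status"
}
```

One possible use is to run check_gp_env first and continue with the installation only if it succeeds.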
- Locate the R Data Science library package that you built or downloaded.
The file name format of the package is DataScienceR-<version>-rhel<N>-x86_64.gppkg.
- Copy the package to the Greenplum Database master host.
- Use the gppkg command to install the package. For example:
$ gppkg -i DataScienceR-<version>-rhel<N>-x86_64.gppkg
gppkg installs the R Data Science libraries on all nodes in your Greenplum Database cluster. The command also sets the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file.
- Restart Greenplum Database. You must re-source greenplum_path.sh before restarting your Greenplum cluster:
$ source /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r
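The R Data Science libraries are installed in the following directory:
$GPHOME/ext/DataScienceR/library
If $GPHOME is not /usr/local/greenplum-db, you can create a symbolic link from $GPHOME to /usr/local/greenplum-db on each host and make gpadmin the owner of the link. In these commands, all_hosts is a file that lists the hosts in the cluster:
$ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
$ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'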
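To see which libraries are present under the $GPHOME/ext/DataScienceR/library path shown above, you can list its subdirectories, since R keeps each installed library in its own directory. This is a minimal sketch; list_r_libs is a hypothetical helper name, and the default path is taken from this page rather than queried from Greenplum.

```shell
# Hypothetical helper (illustrative only): print one library name per line
# for every subdirectory of an R library directory.
# Usage: list_r_libs [DIR]   (DIR defaults to the path given on this page)
list_r_libs() {
    libdir=${1:-${GPHOME:-}/ext/DataScienceR/library}
    if [ ! -d "$libdir" ]; then
        echo "not a directory: $libdir" >&2
        return 1
    fi
    for entry in "$libdir"/*/; do
        # When the glob matches nothing it stays literal; skip that case.
        [ -d "$entry" ] || continue
        basename "$entry"
    done
}
```

Running list_r_libs with no argument on a host checks that host's default location. The function only inspects the local file system, so it says nothing about the other hosts in the cluster.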
Uninstalling the R Data Science Library Package
Use the gppkg utility to uninstall the R Data Science Library package. You must include the version number in the package name you provide to gppkg.
To determine your R Data Science Library package version number and remove this package:
$ gppkg -q --all | grep DataScienceR
DataScienceR-<version>
$ gppkg -r DataScienceR-<version>
The command removes the R Data Science libraries from your Greenplum Database cluster. It also removes the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file to their pre-installation values.
Re-source greenplum_path.sh and restart Greenplum Database after you remove the R Data Science Library package:
$ . /usr/local/greenplum-db/greenplum_path.sh
$ gpstop -r
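The version lookup and removal steps above can be combined so that the package name does not have to be copied by hand. The sketch below assumes that the query output contains the package name at the start of its own line, as in the DataScienceR-<version> sample above. find_datasciencer_pkg is a hypothetical helper name, not part of gppkg.

```shell
# Hypothetical helper (illustrative only): read package-query output on
# stdin and print the first line that starts with "DataScienceR-".
# Returns 1 if no such line is found.
find_datasciencer_pkg() {
    pkg=$(sed -n '/^DataScienceR-/{p;q;}')
    [ -n "$pkg" ] || return 1
    printf '%s\n' "$pkg"
}

# Possible use on the master host (left commented out so that sourcing
# this file never removes anything by itself):
# pkg=$(gppkg -q --all | find_datasciencer_pkg) && gppkg -r "$pkg"
```

Keeping the lookup in its own function means the removal command runs only when a matching package name was actually found.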