(This article was first published on RStudio, and kindly contributed to R-bloggers)
by Joseph Rickert
141 new packages landed on CRAN in August. The following are my picks for the most interesting packages in four categories. My selection criteria were brutally simple: to make the list, a package had to have enough documentation for me to have some idea about what it does, and also, in my judgment, provide some functionality that is likely to appeal to a broad class of users. I am sure that through my ignorance and biases I have overlooked some really good work; for this, I apologize.
One thing that struck me as peculiar during my review is the large number of packages lacking vignettes or a link to documentation describing what the package does or how it works. I can understand that explanatory documentation might be superfluous for a person writing a package for herself or her research team, but in that case, why put it on CRAN? I would think that a package developer who takes the trouble to put something on CRAN would want others to discover and use his work. With over 9,000 packages already on CRAN, and that number growing by well over 100 packages each month, I would not be surprised if some works of real merit go unnoticed due to lack of documentation.
The trend for new R packages written primarily to connect to diverse data sources, which was previously noted in the period from May through July, continued in August. Maybe it’s time to consider developing an R Task View for Data.
Belex v0.1.0: Provides functions for downloading historical financial data from the Belgrade Stock Exchange.
boxoffice v0.1.0: Enables downloads of daily box office information (how much each movie earned in theaters) using data from either Box Office Mojo or The Numbers.
dbhydroR v0.1-6: Provides access to the South Florida Water Management District’s DBHYDRO database, with functions for accessing hydrologic and water quality data. The vignette shows how to compose database queries.
getlandsat v0.1.0: Contains functions to get Landsat 8 data from Amazon Web Services (‘AWS’) public data sets.
IMFData v0.1.0: Provides an interface to International Monetary Fund data, enabling R users to search and extract data.
mdsr v0.1.3: Contains all of the data sets and code for the book Modern Data Science with R.
August was also a good month for new machine-learning packages. R package developers are making serious contributions to the world’s data science tool set.
algorithmia v0.0.1: Provides a set of REST wrappers to access the algorithms in the Algorithmia online marketplace. The vignette describes the Algorithmia R client. Look here for a list of Weka-based machine-learning algorithms.
arulesCBA v1.0: Provides a function to build an association rule-based classifier for data frames. The vignette shows how to get started.
blkbox v1.0: Allows multiple machine-learning algorithms to be run on a data set in parallel, while providing functions for feature selection, k-fold cross-validation, and nested cross-validation. The vignette shows how to get started.
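To make the idea of k-fold cross-validation concrete, here is a minimal base R sketch. It does not use blkbox or its API; the data set (`mtcars`), the model formula, and the helper name `kfold_rmse` are assumptions chosen purely for illustration.

```r
# Minimal k-fold cross-validation in base R (illustrative; not the blkbox API).
# kfold_rmse() is a hypothetical helper written for this sketch.
kfold_rmse <- function(data, formula, k = 5, seed = 1) {
  set.seed(seed)
  # Randomly assign each row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  sapply(seq_len(k), function(i) {
    train <- data[fold != i, , drop = FALSE]
    test  <- data[fold == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    # Root mean squared error on the held-out fold
    sqrt(mean((test[[response]] - pred)^2))
  })
}

rmse <- kfold_rmse(mtcars, mpg ~ wt + hp, k = 5)
mean(rmse)  # average out-of-fold error
```

Each row is held out exactly once, so the averaged error is an estimate of how the model would perform on data it has not seen.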
hyperSMURF v1.01: Uses a hyper-ensemble approach to classify data characterized by a high imbalance between the minority and majority class.
meanShiftR v0.50: Performs mean shift classification using linear and k-d tree nearest neighbor implementations for the Gaussian kernel. The blog post provides some benchmarks.
MetaheuristicFPA v1.0: Implements the standard flower pollination algorithm for global optimization. See the paper by Xin-She Yang for details.
ndjson v0.2.0: Provides a fast reader for streaming JSON (ndjson) files, which store one JSON record per line.
sunburstR v0.6.0: Creates sequence sunburst diagrams, which provide an interactive method for exploring sequence data, such as website navigation paths. The package contains a function to create interactive D3.js diagrams.
tpAUC v1.0.1: Provides tools for estimating partial areas under ROC curves and ordinal dominance curves. The vignette explains the method and provides a quick-start example. For a detailed explanation, have a look at the paper by Yang, Lu and Zhao.
Package developers also continued to advance R’s awesome array of packages for doing computational statistics. At least four out of the following five packages should be of interest to students of statistics.
DHARMa v0.1.0: Uses simulation to compute scaled quantile residuals from fitted generalized linear mixed models. Models fitted with ‘lm’, ‘glm’, and ‘lme4’ are supported. The vignette provides detailed examples.
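The general idea behind simulation-based scaled residuals can be sketched in base R. This is not DHARMa's implementation or API. The simulated Poisson data, the sample size, and the number of simulations are all assumptions made for the example.

```r
# Simulation-based scaled (quantile) residuals for a Poisson GLM, base R only.
# Illustrative sketch of the general idea; not the DHARMa implementation.
set.seed(42)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))
fit <- glm(y ~ x, family = poisson)

# Simulate new responses from the fitted model: an n x nsim matrix
nsim <- 250
sims <- as.matrix(simulate(fit, nsim = nsim))

# For each observation, locate the observed value within its simulated
# distribution; ties are broken at random because the response is discrete
below <- rowMeans(sims < y)
equal <- rowMeans(sims == y)
scaled_resid <- below + runif(n) * equal

# Under a correctly specified model these should look roughly Uniform(0, 1)
summary(scaled_resid)
```

Because the residuals are on a common 0 to 1 scale regardless of the model family, departures from uniformity can be read the same way for any fitted model.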
edfun v0.2.0: Provides a function for creating one-dimensional empirical distribution functions. The vignette shows how to compute the PDF, CDF, and quantiles, and how to draw random samples.
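For readers new to empirical distribution functions, the same four quantities can be approximated with base R alone. The sketch below deliberately avoids edfun's own function names, which are not shown here. The exponential sample is an assumption for illustration.

```r
# Empirical distribution tools in base R (illustrative; not the edfun API)
set.seed(123)
x <- rexp(1000, rate = 2)

Fhat <- ecdf(x)             # empirical CDF, returned as a function
Fhat(0.5)                   # estimated P(X <= 0.5)

quantile(x, probs = c(0.25, 0.5, 0.75))   # empirical quantiles

dens <- density(x, from = 0)        # kernel density estimate of the PDF
fhat <- approxfun(dens$x, dens$y)   # turn the estimate into a function
fhat(0.5)

# Draw new values from the empirical distribution by resampling the data
draws <- sample(x, size = 10, replace = TRUE)
```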
lmPerm v2.1.0: Enables a modern approach to linear regression by modifying the standard models to use permutation tests, rather than normal theory, to obtain p-values. The vignette provides several examples.
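As a rough illustration of what a permutation p-value for a regression slope means, here is a base R sketch. It is not the lmPerm API and does not claim to reproduce its algorithm. The simulated data, the heavy-tailed errors, and the number of permutations are assumptions for the example.

```r
# Permutation test for a regression slope in base R
# (illustrative sketch; not the lmPerm API or its algorithm)
set.seed(2016)
n <- 50
x <- rnorm(n)
y <- 0.8 * x + rt(n, df = 3)   # heavy-tailed errors, so normal theory is doubtful

observed <- coef(lm(y ~ x))[["x"]]

# Shuffling y breaks any association with x, which gives the slope's
# distribution under the null hypothesis of no relationship
n_perm <- 1000
perm_slopes <- replicate(n_perm, coef(lm(sample(y) ~ x))[["x"]])

# Two-sided p-value; the +1 counts the observed data as one permutation
p_value <- (sum(abs(perm_slopes) >= abs(observed)) + 1) / (n_perm + 1)
p_value
```

The p-value here is simply the share of shuffled data sets that produce a slope at least as extreme as the observed one, so it does not rely on normally distributed errors.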
pulsar v0.2.5: Provides functions to use the Stability Approach for model selection of penalized graphical models. There is a nice vignette on how to get started that includes multiple references.
stR v0.1: Provides functions for the seasonal decomposition of time series data. The methods allow for multiple seasonal components and multiple linear covariates, and provide confidence intervals for the estimated components. The vignette shows several interesting examples. For instance, one example decomposes Australian electricity consumption data using a weekly seasonal pattern and a daily seasonal pattern that takes weekends and holidays into account.
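For comparison, base R's `stl()` performs a simpler seasonal decomposition. It handles only a single seasonal component and no covariates, so the sketch below shows the basic idea and is not a substitute for stR. The built-in `co2` series is an assumption chosen for convenience.

```r
# Single-season decomposition with base R's stl() (not the stR API)
fit <- stl(log(co2), s.window = "periodic")

# The result splits the series into seasonal, trend, and remainder parts
components <- fit$time.series
head(components)

# The three parts add back up to the original (log) series
recon_error <- max(abs(rowSums(components) - as.numeric(log(co2))))
recon_error

plot(fit)  # one panel per component
```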
Finally, here are three packages on miscellaneous topics that ought to become popular over time.
forcats v0.1.0: Provides some very useful helper functions for working with factor levels.
modelr v0.1.0: Extends the workflow underlying Hadley Wickham’s tidyverse packages by integrating modeling tasks into a pipeline of data manipulation and visualization.
XR v0.7: Provides the new class structures, functions, and methods to begin implementing the new ideas for connecting R to other languages described in John Chambers’ book, Extending R.