tidytext 0.1.4

By Julia Silge

September 30, 2017

I am pleased to announce that tidytext 0.1.4 is now on CRAN!

This release of our package for text mining using tidy data principles has an excellent collection of delightfulness in it. First off, all the important functions in tidytext now support non-standard evaluation through the tidyeval framework.

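library(janeaustenr)
library(tidytext)
library(dplyr)

# capture the column names as quosures
input_var <- quo(text)
output_var <- quo(word)

# unquote them with !! inside unnest_tokens()
# (tibble() replaces data_frame(), which has since been deprecated)
tibble(text = prideprejudice) %>%
    unnest_tokens(!! output_var, !! input_var)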
## # A tibble: 122,204 x 1
##         word
##        <chr>
##  1     pride
##  2       and
##  3 prejudice
##  4        by
##  5      jane
##  6    austen
##  7   chapter
##  8         1
##  9        it
## 10        is
## # ... with 122,194 more rows

I have found the tidyeval framework useful already in my day job when writing functions using dplyr for complex data analysis tasks, so we are glad to have this support in tidytext. The older underscored functions (like unnest_tokens_()) that took only strings as arguments are still in the package for now, but tidyeval is the way to go, everybody!
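To make the "writing functions" use case concrete, here is a minimal sketch of a wrapper that takes a bare column name and passes it through to unnest_tokens(). The helper name count_words() and the toy data frame are made up for illustration and are not part of tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper (not part of tidytext): tokenize any text column
# and count the resulting words.
count_words <- function(df, text_col) {
  text_col <- enquo(text_col)             # capture the bare column name
  df %>%
    unnest_tokens(word, !!text_col) %>%   # unquote it inside unnest_tokens()
    count(word, sort = TRUE)
}

# Toy input; the column is deliberately not called `text`
lines <- tibble(line = c("It is a truth universally acknowledged",
                         "that a single man in possession of a good fortune"))

word_counts <- count_words(lines, line)
word_counts
```

The same function works unchanged on a column with any other name, which is the main thing the older string-based unnest_tokens_() was used for.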

I also used pkgdown to build a website to explore tidytext’s documentation and vignettes.

Our book website of course contains a lot of information about how to use tidytext, but the pkgdown site has a bit of a different focus in that you can explicitly see all the function documentation and such. Getting this site up and running went extremely smoothly, and I have not worked hard to customize it; this is just all the defaults. In my experience here, the relative bang for one’s buck in setting up a pkgdown site is extremely good.
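For anyone curious what "all the defaults" looks like, a rough sketch follows. The wrapper name build_default_site() and its pkg_path argument are hypothetical; the only real call is pkgdown::build_site(), and this assumes pkgdown is installed and that you point it at a package's source directory.

```r
# Hypothetical helper: build a pkgdown site with default settings.
# `pkg_path` is an assumed name for the package source directory.
build_default_site <- function(pkg_path = ".") {
  # build_site() renders the function reference, vignettes, and README
  pkgdown::build_site(pkg = pkg_path)
}

# Usage, from an R session at the root of a package:
# build_default_site(".")
```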

Another exciting addition to this release of tidytext is a set of tidiers for Structural Topic Models from the stm package, so these models can now be handled using tidy data principles. I am becoming a real fan of this implementation of topic modeling in R after experimenting with it for a while (no rJava! so fast!) and soon I’ll have a complete code-through using some example text, The Adventures of Sherlock Holmes.
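In the meantime, here is a small sketch of what tidying a Structural Topic Model can look like. It is not the promised code-through: it uses Pride and Prejudice from janeaustenr, treats each chapter as a document, and picks K = 4 topics arbitrarily. It also assumes a version of stm that accepts a sparse document-term matrix directly.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)
library(stm)

# Treat each chapter as one document (an assumption for illustration;
# the chapter regex is tailored to this particular text)
chapter_words <- tibble(text = prideprejudice) %>%
  mutate(chapter = cumsum(grepl("^Chapter [0-9]+", text))) %>%
  filter(chapter > 0) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(chapter, word)

# Cast the tidy counts to a sparse matrix: one row per chapter,
# one column per word
chapter_sparse <- chapter_words %>%
  cast_sparse(chapter, word, n)

# K = 4 is an arbitrary choice for this sketch
topic_model <- stm(chapter_sparse, K = 4, verbose = FALSE)

# One row per topic-term pair, with the per-topic word probability (beta)
td_beta <- tidy(topic_model)
td_beta
```

From here the result is an ordinary data frame, so the usual dplyr and ggplot2 tools apply, for example grouping by topic and keeping the terms with the highest beta.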

There are a few other minor changes and bug fixes in this release as well. Get the new version of tidytext and let us know on GitHub if you have any issues!
