Blog

Machine learning, text analysis, and more

Introducing tidylo

Today I am so pleased to introduce a new package for calculating weighted log odds ratios, tidylo. Often in data analysis, we want to measure how the usage or frequency of some feature, such as words, differs across some group or set, such as documents. One statistic often used to find these kinds of differences in text data is tf-idf. Another option is to use the log odds ratio, but the log odds ratio alone does not account for sampling variability.

July 8, 2019

Reordering and facetting for ggplot2

I recently wrote about the release of tidytext 0.2.1, and one of the most useful new features in this release is a couple of helper functions for making plots with ggplot2. These helper functions address a class of challenges that often arises when dealing with text data, so we’ve included them in the tidytext package. To show how to use these new functions, let’s walk through a more general example that does not deal with results that come from unstructured, free text.

July 1, 2019

Fixing your mistakes: sentiment analysis edition

Today tidytext 0.2.1 is available on CRAN! This new release of tidytext has a collection of nice new features: bug squashing 🐛; improvements to error messages and documentation 📃; switching from broom to generics for lighter dependencies; and the addition of some helper plotting functions I look forward to blogging about soon. An additional change is significant and may be felt by you, the user, so I want to share a bit about it.

June 14, 2019

Relaunching the qualtRics package

Note: cross-posted with the rOpenSci blog. rOpenSci is one of the first organizations in the R community I ever interacted with, when I participated in the 2016 rOpenSci unconf. I have since reviewed several rOpenSci packages and been so happy to be connected to this community, but I have never submitted or maintained a package myself. All that changed when I heard the call for a new maintainer for the qualtRics package.

April 30, 2019

Writing a letter to DataCamp

Since 2017 I have been an instructor for DataCamp, the VC-backed online data science education platform. What this means is that I am not an employee, but I have developed content for the company as a contractor. I have two courses there, one on text mining and one on practical supervised machine learning. About two weeks ago, DataCamp published a blog post outlining an incident of sexual misconduct at the company.

April 16, 2019