Blog

Machine learning, text analysis, and more

Stack Overflow questions around the world

I am so lucky to work with so many generous, knowledgeable, and amazing people at Stack Overflow, including Ian Allen and Kirti Thorat. Both Ian and Kirti are part of biweekly sessions we have at Stack Overflow where several software developers join me in practicing R, data science, and modeling skills. This morning, the two of them went to a high school outreach event in NYC for students who have been studying computer science, equipped with Stack Overflow ✨ SWAG ✨, some coding activities based on Stack Overflow internal tools and packages, and a Shiny app that I developed to share a bit about who we are and what we do.

April 11, 2018

The game is afoot! Topic modeling of Sherlock Holmes stories

In a recent release of tidytext, we added tidiers and support for building Structural Topic Models from the stm package. This is my current favorite implementation of topic modeling in R, so let’s walk through an example of how to get started with this kind of modeling, using The Adventures of Sherlock Holmes. You can watch along as I demonstrate how to start with the raw text of these short stories, prepare the data, and then implement topic modeling in this video tutorial!

January 25, 2018

tidytext 0.1.6

I am pleased to announce that tidytext 0.1.6 is now on CRAN! Most of this release, as well as the 0.1.5 release which I did not blog about, was for maintenance, updates to align with API changes from tidytext’s dependencies, and bug fixes. I just spent a good chunk of effort getting tidytext to pass R CMD check on older versions of R despite the fact that some of the packages in tidytext’s Suggests require recent versions of R.

January 10, 2018

One year as a data scientist at Stack Overflow

I recently passed my one-year anniversary of working at Stack Overflow as a data scientist. “I have some very exciting news! I am joining the data team at @StackOverflow. ✨📊✨📊✨” (Julia Silge, @juliasilge, December 13, 2016) Coming to Stack Overflow has been an adventure for me. This is my first time working at an actual tech company. I have been what I like to think of as “tech adjacent” my whole career, writing code and working on technical questions but never before working at a straight-up web company.

December 27, 2017

Tidy word vectors, take 2!

A few weeks ago, I wrote a post about finding word vectors using tidy data principles, based on an approach outlined by Chris Moody on the StitchFix tech blog. I’ve been pondering how to improve this approach, and whether it would be nice to wrap up some of these functions in a package, so here is an update! As in my previous post, let’s download half a million posts from the Hacker News corpus using the bigrquery package.

November 27, 2017