Reflecting on the art of Norwood Viviano

One of the first things I wanted to do after receiving my COVID-19 vaccination was to visit a museum (with masks and social distancing, of course), and my first such visit was to The Museum of Fine Arts in Houston, Texas.

While milling about the galleries, I was immediately struck by one exhibit of Norwood Viviano’s work titled Cities: Departure and Deviation. The exhibit is, at its core, a visualization of time series data representing long-term changes in the populations of 24 cities. …

10 key ways data roles are different in an institution where data scientists don’t exist

Photo by Jonathan Meyer from Pexels

There are so many articles and social media posts out there attempting to define the difference between data analyst and data scientist roles, a goal that I’ve never quite understood. This distinction varies so widely from industry to industry and company to company that it seems impossible to draw clear, generalizable lines between the two titles. One of the few commonalities that I’ve seen in these distinctions is the designation of data analysts as second-rate data professionals, which I believe leads many job-seekers to reject data analyst roles altogether. As someone who thoroughly enjoys her data analyst role, in part…

Exploring functionality beyond clean_names()

Photo by Pixabay from Pexels

The janitor Package

The janitor package is available on CRAN and was created by Sam Firke, Bill Denney, Chris Haid, Ryan Knight, Malte Grosser, and Jonathan Zadra. While arguably best known for its extraordinarily useful clean_names() function (which I will be covering later on in this article), the janitor package has a wide range of functions that facilitate data cleaning and exploration. The package is designed to be compatible with the tidyverse, and can therefore be seamlessly integrated into most data prep workflows. Useful overviews of janitor package functions that I have consulted can be found here and here. …

Conceptual overview and step by step guide for beginners


Shiny is used by many data scientists and data analysts to create interactive visualizations and web applications. While Shiny is an RStudio product and quite user-friendly, the development of a Shiny app differs significantly from the data visualization and exploration that you might do via the tidyverse in an RMarkdown file. There can therefore be a bit of a Shiny learning curve even for experienced R users, and this tutorial aims to introduce Shiny’s usage and capabilities through a brief conceptual overview as well as the step-by-step creation of an example Shiny app.

In this tutorial we’ll create the following Shiny…

Step by step guide with screenshots

Photo by Christina Morillo from Pexels

It’s fairly straightforward to set up a GitHub connection when creating a new R project. Sometimes, however, you end up with a repo that is only stored locally. Maybe you received a project from someone else who doesn’t use GitHub, or maybe you’re having an off day and simply forgot to start out with a GitHub connection.

Either way, you can easily copy this repo into GitHub and set up a connection. This step-by-step guide will show you how.

1. Download the GitHub for Mac app

First, head to the GitHub for Mac webpage:

You should see the following screen:

Why we need to rethink traditional polling


I published an analysis back in October in which I used Google searches to predict the results of the 2020 U.S. presidential election. I was largely focused on six swing states — Arizona, Florida, Michigan, North Carolina, Pennsylvania, and Wisconsin. This analysis was based upon the work of Seth Stephens-Davidowitz, a data scientist with extensive experience working with Google search data.

Anyone who followed the election knows that the polls missed the mark in several key states, but how did the predictions gleaned from Google search data hold up?

Part 1: Candidate Name Order

The first part of my analysis was based upon Stephens-Davidowitz’s finding…

Hands-on Tutorials, VIDEO TUTORIAL

How to use scatterplots, correlation coefficients, and linear regression effectively

Photo by Magda Ehlers from Pexels

One of the most common analyses conducted by data scientists is the evaluation of linear relationships between numeric variables. These relationships can be visualized using scatterplots, and this step should be taken regardless of any further analyses that are conducted. Regression analyses and correlation coefficients are both commonly used to statistically assess linear relationships, and these analytic techniques are closely related both conceptually and mathematically.

This article will describe scatterplots, correlation coefficients, and linear regression, as well as the relationships between all three statistical tools.
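To make the mathematical link between correlation and regression concrete, here is a minimal sketch in plain Python. The data values and the helper names (`pearson_r`, `ols_fit`, `sample_sd`) are made up for illustration and do not come from the article. It shows that the slope of a simple linear regression equals the Pearson correlation coefficient multiplied by the ratio of the standard deviations of y and x.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical example data, chosen only to illustrate the identity.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope, intercept = ols_fit(x, y)

# The regression slope can be recovered from r and the two standard deviations.
slope_from_r = r * sample_sd(y) / sample_sd(x)

print(f"r = {r:.4f}")
print(f"slope from regression = {slope:.4f}")
print(f"slope from r * sd_y / sd_x = {slope_from_r:.4f}")
```

The two slope values agree because both reduce to the same ratio of the x-y covariance to the variance of x. This is one way of seeing that the correlation coefficient is a standardized version of the regression slope.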

Video tutorial


Scatterplots are used to visually assess the relationship between two numeric variables. Typically…

Data for Change

Opportunities and Advancements

Photo by Dark Indigo from Pexels

Healthcare data science has been growing rapidly for several years, although the use of data to understand and address mental health problems has lagged behind the rest of the field. In many ways, however, mental health is perfectly suited for data science approaches: the burden of mental illness in the US is enormous, often unaddressed, and not fully understood, creating enormous potential for data-driven research and solutions. …

What the polls may have missed

Photo by Element5 Digital from Pexels


Most of us here in the U.S. are waiting with bated breath for the results of next week’s enormously consequential presidential election. Virtually all of the data providing insight into the likely outcome comes in the form of polling data, which, while extremely valuable, is also inherently imperfect. Selection bias arises because it is nearly impossible to get a random sample of voters with traditional polling methods, which means that polls often do not actually represent the population that they intend to capture. Polling data is also notoriously susceptible to social desirability bias: there is a…

Why these tools are even better together

Tableau has taken the data visualization world by storm, and for good reason. Beautiful and complex visualizations, dashboards, and reports can be created quickly and without any coding experience within its user-friendly interface. Tableau is particularly useful for the creation of interactive visualizations, as filters can be added to a single visualization or full dashboard with just a few clicks. However, Tableau is limited in its analytic capabilities. The calculated fields feature allows for simple measures such as means, sums, and date differences to be calculated, and Tableau has some built-in features for adding regression lines or identifying clustering. …

Emily A. Halford

I am currently a data analyst working in psychiatric epidemiology, and I am excited about the intersection of data science and mental health. Views are my own.
