Notes from Industry

There’s a very oddly cut stump along the Hudson River in Manhattan (as of 2018 anyways), almost like a chair. But if you look closer, the tree had grown around some old iron fencing, leaving a stump invincible to unmotivated chainsaw operators. (Photo: Randy Au)

TL;DR: Cleaning data is considered by some people [citation needed] to be menial work that’s somehow “beneath” the sexy “real” data science work. I call BS. The act of cleaning data imposes values/judgments/interpretations upon data intended to allow downstream analysis algorithms to function and give results. That’s exactly the same as doing data analysis. In fact, “cleaning” is just a spectrum of reusable data transformations on the path towards doing a full data analysis. Once we accept that framework, the steps we need to take to clean data flow more naturally. …


If you’re bad at Viz, like me, there’s hope for us

Chart of Charts (By: Randy Au)

I’ve got a confession to make: I’ve got the artistic and visualization skills of a turnip.

It’s not that I’ve ever lied to myself and pretended that I was good at it; I almost always rely on very simple everyday tools and get by decently. But at a very recent team off-site event, it struck me that I’ve grown super comfortable with it, which is a very peculiar feeling since I really should be improving on such a flaw.

The event that triggered this realization was a UX summit hosted by our group’s director, two org chart levels above me. There were…


It’s dangerous to go alone! Take this.

(Photo: Randy Au)

Special thanks to the person on Twitter who messaged me with this question and is letting me use it as the starting point for a post. Poking at real scenarios is real fun, and I can always take a bit of creative liberty in anonymizing details.

Here’s their (paraphrased) problem statement.

I recently became a data analyst at a company. It looks like I need to do a lot of organization database creation work first. A lot of data is in Excel files in different systems. I want to gather everything, organize it, make it queryable and visualizable for users…


What you can do when you actually collect email data

Smallest piledriver I’ve ever seen in person, Seattle (Photo: Randy Au)

Last week, we jumped off the deep end into just about every last scrap of data that could be collected from email. We noted that there are lots of little caveats and limitations in the fundamental technology. But we didn’t have time to go into how to actually use the data. That’s why we’re here.

Email tracking data has interpretation issues

The problem with email analytics is the lack of detailed visibility into what is going on at the user’s end by virtue of the email client being completely divorced from our analytics systems. …


Black boxes are hot again

Mailboxes in Greece (Photo: Randy Au)

Hats off to Vicki for sparking the motivation to write this. I’m always open to ideas and suggestions for topics to write about *hinthint*


Work has conditioned us to

(Photo: Randy Au)

Earlier last week I was seriously thinking about discussing how data people need to be very cautious about what they’re saying in public about COVID-19. There are a lot of contextless data and charts bouncing around that act like a magnet to us data folk. The temptation is high to make some charts like we would at work, come to some unrealistic conclusions, and accidentally cause more harm than good by spreading distrust in official sources from experts in the field (just look at the complexity of their model parameters on page 4).

An example of one such…


Some people are learning data science quickly without realizing it

Photo: Randy Au

These days, while doing my best not to touch my face, I’m watching the world at large take a crash course in basic data analysis. Since teaching data literacy is a big part of my job, I pay a bit more attention to people learning about data in my daily life. Yes, the way I’m coping with current events filled with increasing uncertainty and risk is to say “Look! This is a learning moment!”

I wish I knew who I can attribute this to, but I remember hearing something that proves true in these situations: Average people aren’t stupid, they’re just…


Don’t expect to land the title easily, heck, don’t even go for the title

Weird object in the middle of a garden. Just like qUXR. (Photo: Randy Au)

For some reason, I’ve been asked about becoming a Quantitative UXR multiple times in the past couple of months, apparently because of the post I wrote a year ago about how I wound up being a Quant UXR. Since the position is very slowly growing out in the wild, I figured I should put down what the current state of the world is for those curious about how to become one.

Quant UXR is just a newer label for a job that always (kinda) existed

A Quant UX Researcher right now is essentially a generalist form of data scientist that prefers to focus on understanding user behavior rather than building other kinds of models…


We seriously don’t need that sort of nonsense

QUACK (Photo: Randy Au)

This weekend, Medium’s Byzantine article recommendation algorithm decided to recommend me an article about being a “10x data scientist”. My initial reaction was “what in the absolute f---?”. I’m not linking to it since that would be rewarding the bad behavior of a self-aggrandizing twit (it takes a whole new level of hubris to essentially declare yourself to be a 10x anything on the open internet). Search and use the Google cache if you must read it.

I later realized that the article was published in October of 2019, thankfully not super recent but still less than 5…


It’s not every day we can watch an expert explaining how hard counting is

Today, amidst the growing global buzz of the 2019 Novel Coronavirus outbreak that is currently centered in China, I spotted this wonderful Twitter thread about a number: R0, “R naught”, the basic reproduction number. This term comes out of epidemiology and I’m most certainly not an epidemiologist, so I’ll try not to mangle the concept too badly.

Before we get to the thread, the simple gist of R0, from my cursory reading, is thus:

  • R0 < 1, every existing infection causes less than 1 new infection, disease will eventually die out
  • R0 = 1, every existing infection causes 1 more…
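
To make the R0 thresholds above concrete, here is a minimal sketch of the arithmetic, not an epidemiological model. It assumes a deliberately naive setup in which every case causes exactly R0 new cases in the next generation, with no randomness, no immunity, and no interventions. The function name `expected_new_infections` and its parameters are made up for illustration.

```python
from typing import List


def expected_new_infections(r0: float, generations: int,
                            initial_cases: float = 1.0) -> List[float]:
    """Expected new cases per generation if each case causes r0 more.

    Assumption (not from the thread): a naive deterministic model where
    generation n has initial_cases * r0**n new cases.
    """
    cases = [initial_cases]
    for _ in range(generations):
        # Each case in the latest generation causes r0 new cases.
        cases.append(cases[-1] * r0)
    return cases


if __name__ == "__main__":
    for r0 in (0.5, 1.0, 2.0):
        print(f"R0 = {r0}: {expected_new_infections(r0, 5)}")
```

Under these assumptions, an R0 of 0.5 halves the new cases each generation and heads toward zero, an R0 of 1.0 stays flat, and an R0 of 2.0 doubles each generation. Real outbreaks are far messier, which is much of why the number is so hard to estimate.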

Randy Au

I stress about data quality a lot. Data nerd/scientist, camera junkie. Quant UXR @Google Cloud. Formerly @bitly, @Meetup, @primarydotcom. Opinions are my own.
