Posts
FAIL
There’s a lot that I don’t recall from my time as an engineering student, but one thing I do remember is studying failures, like the Tacoma Narrows Bridge collapse. I’ve forgotten which other disasters we covered, along with most of the mechanical principles at play, but the meta-lesson may be the most important takeaway from my engineering education: sometimes, you can learn more about how something works from seeing where (and how) it breaks than from studying how it’s supposed to work.
Everyone You'll Ever Meet Knows Something You Don't
I wrote a guest post on Randy Au’s Counting Stuff about the joys - and value - of unstructured conversations with people you’re not already working with closely.
I recently watched Bill Nye talk about his approach to problem solving and scientific thinking, and one of his main points really resonated with me - “Everyone you’ll ever meet knows something you don’t.” As someone with raging, chronic imposter syndrome, the way that was phrased (implying the corollary, “every time you meet someone new, you’ll know something they don’t”) was especially powerful.
SMART Data Ops
Data quality checks are a critical tool in the data pro’s toolbox for ensuring SLAs are maintained, but poorly designed checks can lead to a life of on-call misery, a constant flow of “why is this data wrong?” inquiries, or (worst of all) unnecessarily Bad Data. But what makes a good data quality check?
Specific - Willy Wonka’s egg checker returned two possible values - “good” and “bad” - but data quality checks don’t work that way - nor should they.
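A specific check reports exactly what failed and where, not just a binary verdict. Here’s a minimal sketch of that idea; the function name, table, and column names are hypothetical, not from the post:

```python
# A "specific" data quality check: rather than a Wonka-style good/bad
# verdict, it identifies which rows failed and on what expectation.

def check_non_null(rows, column):
    """Return a result describing which rows are missing `column`."""
    failures = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "check": f"{column} is not null",
        "passed": not failures,       # True only if no rows failed
        "failing_rows": failures,     # specifics for the on-call human
    }

# Hypothetical example data
orders = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},
]
result = check_non_null(orders, "amount")
print(result["passed"], result["failing_rows"])  # False [1]
```

The payoff is in the failure message: an on-call engineer sees which rows violated which expectation, instead of triaging a bare “bad.”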
Dimensions of Data Quality
Everyone wants “good” data. Almost as universal is the sense that the data you’re working with is…not good. Being able to objectively measure data quality is important for ensuring downstream modeling and decision making are built on reliable data, but it can be hard to measure and report on data quality without a framework for identifying what features of the data are good/bad.
Data features with expectations that can be defined and measured against are dimensions of data quality.
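To make that concrete, one common dimension is completeness: the fraction of records with a value present for a given field. This is a minimal sketch of measuring it, with hypothetical field names; the specific dimensions the post covers may differ:

```python
# Completeness: one measurable dimension of data quality,
# defined here as the fraction of non-null values in a column.

def completeness(rows, column):
    """Fraction of rows (0.0-1.0) with a non-null value in `column`."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)

# Hypothetical example data
customers = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": "c@example.com"},
    {"email": "d@example.com"},
]
print(completeness(customers, "email"))  # 0.75
```

Once a dimension is defined this way - an expectation plus a measurement - it can be tracked over time and reported against a threshold rather than argued about subjectively.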
Minimizing the Cost Function in Data Projects (or, Keep it Simple, Stupid)
I’ve been working in data and computational science for nearly 20 years, in capacities spanning geophysics, finance, civic tech, social good, business analytics, and numerous hobbies. In that time, one universal truth I’ve found is that data work can get complicated fast. Even the so-called easy things - counting, time, measurements, and naming things - aren’t always easy, but sometimes we make them harder than they need to be. It’s a constant battle to remember that a complicated solution might be fun to build, but it’s better to start simple and only add complexity where it’s merited.