YongJin Lee

Engineering Data, Investing in Tomorrow, Journeying Through Life.

Category: Data Engineering

  • Pursuit of Canonical Data: Addressing Misconceptions

    In a world where data drives numerous businesses, we aim to establish sources of truth (sometimes called canonical data) characterized by correctness and reliability. In my view, companies like Spotify, Amazon, Meta, and Google are fundamentally data companies. While their customer offerings vary, their business objective remains consistent: to connect users with what they seek.… Read more

  • Tracking Consumption History in BigQuery Tables: A Guide to Retrieving the Latest Access Date

    In today’s post, I’ll share how to trace the consumption history of BigQuery tables. This isn’t just for the curious; it’s a necessary step in optimizing data management within BigQuery and keeping the data relevant. By understanding table utilization, organizations can reduce storage and computation costs, deprecate unused data, and offer more up-to-date datasets.… Read more

  • Dataform Explored: Harnessing Its Power and Addressing Opportunities for Improvements

    In an earlier post, I spotlighted Dataform’s transformative capabilities, emphasizing its potential to reshape the data transformation and pipelining landscape for teams. Like all sophisticated tools, a deeper examination reveals areas where refinement could enhance the user experience. In this piece, I’ll share the challenges I’ve encountered and suggest improvements to augment Dataform’s effectiveness. Areas… Read more

  • My Dive into MLOps: A Data Engineer’s Perspective

    Yesterday, on the recommendation of a colleague and friend, I took my first step into the Machine Learning Engineering for Production (MLOps) Specialization. It wasn’t a random decision but a reflection of where I see the future of data engineering and machine learning converging. Lessons from the Past: Throughout my time as a Data Engineer, I’ve tackled… Read more

  • Why I Love My Job as a Data Engineer

    I love my job as a Data Engineer. From my start as a Business Intelligence Analyst to my evolution into a Data Engineer, I’ve had the privilege of expanding my skills through exciting projects. I’ve heard it is hard to find a job you love, and I consider myself lucky. My career began as a… Read more

  • Google Dataplex: An Introduction and Its Advantages

    While exploring Google’s tools recently, I bumped into something called Google Dataplex. As most of us know, data plays a huge role in today’s businesses. Every day, big and small companies are making and using tons of data. But with all this data flowing in and out, handling it becomes a big challenge. That’s where… Read more

  • Why Understanding Source Data Is Important

    A Recurring Challenge in Data Management: In data management and analytics, a pattern has consistently surfaced in my experience: professionals rush to chart development strategies and pipelines without a deep understanding of the source data they’re handling. I’ll candidly admit – I’ve also been guilty of this oversight.… Read more

  • The High Level Benefits of Using Dataform with BigQuery

    Introduction: Having transitioned from data analyst to data engineer, I had the opportunity to explore and pilot Dataform and DBT for my organization. These tools offered exciting features that bridged the gap between data analysts and software engineers, with a minimal learning curve. In this post, I… Read more

  • Why I Chose Google Cloud Professional Data Engineer Certification

    The Spark: My Initial Interest in Google Cloud Professional Data Engineer Certification. In recent weeks, I’ve been pondering the decision to earn the Google Cloud Professional Data Engineer certification. Curiously, I found few accounts of why people choose this path. So, I set out to reflect on and document my motivations. In case… Read more

  • DBT Core vs. Dataform: A Comparative Analysis

    Recently, I evaluated two well-known data transformation tools: DBT Core and Dataform. Our aim was clear: to significantly improve data analysts’ experiences when working with data transformations in our data warehouse. Since our primary tools comprise BigQuery, Airflow, and other GCP products, the compatibility and integration of the chosen tool with… Read more