Data Engineering – Basic Skills – Git

Hi fellow data heroes!

Today, I want to dive into Git—a tool that’s as essential for data engineers as bash scripting. While bash scripting might make you feel like a real developer, Git takes it to the next level by facilitating collaboration and version control. After all, knowledge is only valuable if shared, and Git is the perfect tool to enable sharing and enhance our collective expertise. Remember, teamwork makes the dream work!

But what exactly is Git?

According to Wikipedia:

Git is a distributed version control system that tracks versions of files. It is often used to control source code by programmers collaboratively developing software.

I like to think of Git as a community mural where everyone has the opportunity to contribute. Each time someone adds a new section, the mural improves. If mistakes are made, more experienced artists can correct them. The more organized the community, the more beautiful the mural becomes. Git embodies the balance between freedom and control. Just as we start with drafts, seek feedback, and refine our work before committing to the final mural, Git allows us to manage code with similar care.

Git provides a space where code evolves through contributions from multiple developers. It keeps a history of all committed changes, allowing you to go back to any previous version of your code. This balance between creative freedom and controlled processes is crucial for data engineers. While we should have the liberty to experiment with new techniques or enhance data pipelines, Git branches offer the structure we need. Think of branches as drafts that can be reviewed and refined before merging into the main branch. Git itself does not enforce that review, but combined with a team convention such as pull requests and a protected main branch on your hosting platform, it ensures that no change reaches the main branch without peer approval.
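To make the "branches as drafts" idea concrete, here is a minimal sketch of that workflow in a throwaway repository. The file name pipeline.sql, the branch name feature/filter-orders, and the demo identity are all made up for illustration; your team's naming and review process will differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a temporary repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/main      # call the first branch "main" on any Git version
git config user.name "Data Hero"           # placeholder identity for this demo only
git config user.email "hero@example.com"

# First version of a (hypothetical) pipeline file on main.
echo "SELECT * FROM orders;" > pipeline.sql
git add pipeline.sql
git commit --quiet -m "Add initial pipeline query"

# Draft a change on a branch instead of editing main directly.
git checkout --quiet -b feature/filter-orders
echo "SELECT * FROM orders WHERE status = 'paid';" > pipeline.sql
git commit --quiet -am "Filter to paid orders"

# After review, merge the draft back into main.
git checkout --quiet main
git merge --quiet --no-ff -m "Merge feature/filter-orders" feature/filter-orders

# The history still holds the earlier version of the file.
original="$(git show HEAD~1:pipeline.sql)"
echo "Previous version: $original"
git log --oneline
```

In a real team the merge step would usually happen through a pull request on the hosting platform rather than a local git merge, which is where the peer review takes place.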

Git is an incredible tool for collaboration, but it’s important to establish guidelines within your team to use it effectively.

Here are some of my favorite Git commands that I use daily. While these commands are just the tip of the iceberg, they’re a great starting point for mastering Git.

I hope you find this helpful!

Data Engineering – Basic Skills – Bash Scripting

When I was a teenager, I watched the movie Swordfish. I barely remember the plot, but what stuck with me was Hugh Jackman as the fastest typist and most incredible software engineer I had ever seen. Before that, my idea of a software engineer was Al McWhiggin, the “chicken man” from Toy Story 2. But after Swordfish, my perception of what a software engineer should look like completely changed.

In my mind, a top-tier software engineer had to be:

  • Height: Over 6 feet
  • Body: Six-pack abs, strong arms, and killer legs—because how else could you type as fast as Wolverine?
  • Looks: Amazing hair, a perfect nose, and intense black eyes.
  • Hygiene: A bit of dirt on your face, a rugged, unshowered look.
  • Scars: A few battle wounds as proof that your coding skills have saved the world.

And, most importantly, you had to be able to solve a complex coding challenge in 60 seconds flat—a feat that would take most experienced engineers 16 minutes.

But thanks to my Computer Science degree and exposure to more tech-related movies, I eventually realized that none of these characteristics are necessary to be a successful software engineer—or to save the world.

In this new series, I want to share the essential skills every software engineer needs to kickstart a successful career. My focus will be on the core competencies a Data Engineer requires to thrive.

The first post in this series is about bash scripting. I love bash scripting because it makes me feel like a real software engineer: black screen, basic commands in white text—what more do you need to feel like Hugh Jackman saving the world from dark hackers?

Bash scripting is incredibly easy to learn. Memorize these 10 basic commands, and you’re good to go:

  1. grep – Filters input based on regex pattern matching.
  2. ls – Lists the contents of your directory.
  3. cd – Changes the current directory.
  4. rm – Removes files.
  5. mkdir – Creates directories.
  6. nano – Opens a file editor.
  7. echo – Prints a message.
  8. clear – Clears the console.
  9. | – Pipe (not a command, but essential for streaming data from one command to another).
  10. cat – Concatenates files and prints the result to standard output.
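As a rough sketch of how several of these fit together, the script below builds a scratch directory, writes two small log files, and counts the error lines with a pipe. The file names and log lines are invented sample data, not output from any real pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# mkdir + cd: create a scratch directory and move into it.
workdir="$(mktemp -d)"
mkdir "$workdir/logs"
cd "$workdir/logs"

# echo: write a few made-up log lines into two files.
echo "2024-01-01 INFO load started" > day1.log
echo "2024-01-01 ERROR missing column id" >> day1.log
echo "2024-01-02 ERROR timeout" > day2.log

# ls: confirm what is in the directory.
ls

# cat + | + grep: join both files and count the lines containing ERROR.
error_count="$(cat day1.log day2.log | grep -c "ERROR")"
echo "Errors found: $error_count"

# rm: clean up the scratch directory (step out of it first).
cd /
rm -r "$workdir"
```

Note that grep -c exits with a non-zero status when nothing matches, so with set -e a log without any ERROR lines would stop the script; that is fine for a demo but worth handling in a real job.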

I recently took a Bash Scripting course on DataCamp and have attached my notes for anyone looking to get started.

Yes, there’s more to bash than just 10 commands, but these are enough for the daily tasks of a Data Engineer. Thanks to developers like “Wolverine,” we now have user-friendly tools with powerful UIs that allow us to put our bash skills on the back burner. But don’t forget to dust off those skills from time to time—you never know when John Travolta might ask you to solve a coding challenge using only bash scripting in 60 seconds! The world needs those bash skills.