Top products from r/dataengineering
We found 10 product mentions on r/dataengineering. We ranked the 9 resulting products by number of redditors who mentioned them. Here are the top 20.
1. SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (4th Edition)
Sentiment score: 1
Number of reviews: 1
2. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
Sentiment score: 1
Number of reviews: 1
3. Algorithms in a Nutshell (In a Nutshell (O'Reilly))
Sentiment score: 1
Number of reviews: 1
4. Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema
Sentiment score: 1
Number of reviews: 1
5. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition
Sentiment score: 1
Number of reviews: 1
John Wiley & Sons
6. Learning Spark: Lightning-Fast Big Data Analysis
Sentiment score: 0
Number of reviews: 1
O'Reilly Media
7. Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark
Sentiment score: 2
Number of reviews: 1
https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
This is a highly recommended book for the data warehouse industry. Hope you enjoy it, and good luck.
I'm reading this book now:
https://www.amazon.com/Agile-Data-Science-2-0-Applications/dp/1491960116
OK, it's already 2 years old, but it is amazing: it depicts the complete agile data science process using Kafka, Spark (Core, Streaming, SQL, MLlib), Airflow, Elasticsearch, MongoDB, scikit-learn, and D3.js, and shows how to improve and deploy your pipeline.
The most important reading from a database design perspective, IMO, is one of Kimball’s books:
https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802
It’s less technically focused and more focused on how to build good datasets. It’s an older text, so its references to specific technologies are a bit out of date, but when it comes to describing how to design particular schemas (or at least speak the language of people who design schemas), it’s pretty much canon.
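For anyone who hasn't seen the kind of schema that comment is talking about, here is a minimal sketch of a star schema (one fact table surrounded by dimension tables), using Python's built-in sqlite3 module. The table names, columns, and rows are made up for illustration and are not taken from the book.

```python
import sqlite3

def build_star_schema(conn):
    # Two dimension tables plus one fact table that references them.
    # All names and rows here are hypothetical sample data.
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            full_date TEXT,
            month TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name TEXT,
            category TEXT
        );
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(20240101, 1, 2, 20.0), (20240101, 3, 1, 15.0),
         (20240201, 2, 1, 30.0), (20240201, 1, 1, 10.0)],
    )

def sales_by_category(conn):
    # Typical star-schema query: join the fact table to a dimension,
    # then aggregate the measure by a dimension attribute.
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_category(conn))
```

The point of the layout is that measures (quantity, amount) live in the fact table, while the descriptive attributes you group and filter by live in the dimensions.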
I attended UCI for CS and am working through a master's right now. I'm a data engineer / data platform engineer at a startup, and have been doing it for ~2 years or so. I find that traditional CS knowledge is a tool belt that you don't necessarily *need* to get through industry.
There are a lot of really good algorithm books out there. O'Reilly has Algorithms in a Nutshell, which covers O notation and then walks through some basic data structures and algorithms (linked lists, trees, sorting). DS and algos are really the *core* CS things that one would need. Some community colleges offer these courses, which might be better depending on your circumstances.
The upper-division classes are useful, I think. I took a few classes on distributed systems and computer architecture, which have been invaluable. I took a class on databases (useful I suppose but meh), and some classes on machine learning, artificial intelligence, and operating systems. Those have become more useful now that I'm doing data platform work.
All that being said, I think the only disadvantages you have are the terminology ("This will give O(n log n) lookup while retaining referential integrity") and the boxes to tick. The terminology you can learn. The boxes to tick might be tougher; I think some companies will be really stingy about that stuff. You did say that you have an undergraduate education, though, so I don't think that will matter.
Definitely. We actually used that book for the Business Intelligence course in my MIS master's program. I met a BI manager hiring for a data engineering role, and she recommended the following text as well. The content is pretty similar, since both focus on the Kimball method, but this one also goes over BEAM*, which is a requirements-gathering framework for designing data warehouses.
https://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203/ref=sr_1_1?s=books&ie=UTF8&qid=1511661160&sr=1-1&keywords=agile+data+warehouse+design
Reiterating what the previous posters said: the fundamentals are the same regardless of system. Learning how to get data out of a SQL system is all about learning how to write SQL.
To effectively learn how to write SQL for data engineering, I highly recommend grabbing a book like one of these*:
and grabbing a sample database for the system of your choice:
and then practicing exercises from your chosen book on the sample db.
Notes and words of warning:
(*I'm not affiliated w/ any of those books)
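If you don't have a sample database handy yet, a rough way to start practicing is an in-memory SQLite database via Python's built-in sqlite3 module. This is only a sketch: the tables, rows, and the `run` helper are made up for illustration, and a real sample database for your system of choice will be much richer.

```python
import sqlite3

# Throwaway in-memory database with hypothetical practice data.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        salary INTEGER,
        dept_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Web');
    INSERT INTO employees VALUES
        (1, 'Ana', 120, 1), (2, 'Ben', 100, 1), (3, 'Cy', 90, 2);
""")

def run(sql, params=()):
    # Small helper (hypothetical): execute a query and return all rows.
    return db.execute(sql, params).fetchall()

# Practice 1: filtering with a bound parameter instead of string formatting.
high_paid = run(
    "SELECT name FROM employees WHERE salary >= ? ORDER BY name", (100,)
)

# Practice 2: join plus aggregate, the bread and butter of pulling data out.
avg_by_dept = run("""
    SELECT d.name, AVG(e.salary)
    FROM employees AS e
    JOIN departments AS d ON d.id = e.dept_id
    GROUP BY d.name
    ORDER BY d.name
""")

print(high_paid)
print(avg_by_dept)
```

The same SELECT, JOIN, and GROUP BY patterns carry over to other SQL systems, though exact syntax and functions vary by dialect.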
No, this one: Learning Spark: Lightning-Fast Big Data Analysis https://www.amazon.com/dp/1449358624/ref=cm_sw_r_cp_apa_i_dav3DbS0DXT51