(Part 2) Best data mining books according to redditors
We found 160 Reddit comments discussing the best data mining books. We ranked the 58 resulting products by number of redditors who mentioned them. Here are the products ranked 21-40.
Designing Data-Intensive Applications is your ticket here. It takes you through a lot of the algorithms and architecture behind the distributed technologies out there.
In a data engineering role you will probably just be munging data through a pipeline to make it useful for the analysts and scientists, so a book recommendation for that depends on the technology you will be using. Here are some of my favorite resources for the various tools I used in my experience as a Data Engineer:
Good luck in your new position!
Yes, it's absolutely doable. That's a very standard sized fact table for SSAS Tabular.
Some points I'd like to highlight:
I’ll try to explain. The basic layout of a genetic algorithm is rather simple:
Some forms of genetic algorithms may differ from that basic layout, but that’s basically how they work. If it has selection, mutation and crossover, it’s a genetic algorithm. Without crossover, it’s an evolutionary method, but not a genetic algorithm.
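For illustration, the selection, crossover and mutation steps mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions, not the layout from the class or the book: the toy problem (maximise the number of 1s in a bit string), the tournament selection, the single-point crossover, the elitism, and all function names and parameter values are picked for the example.

```python
import random


def fitness(bits):
    # Toy objective (an assumption for this sketch): count the 1s.
    return sum(bits)


def tournament_select(population, rng, k=3):
    # Selection: draw k random individuals and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # Single-point crossover: head of one parent, tail of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.02):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Elitism: carry the current best over unchanged, so the best
        # fitness never decreases from one generation to the next.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament_select(population, rng)
            parent_b = tournament_select(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    return max(population, key=fitness)


best = run_ga()
print(fitness(best), best)
```

Drop the crossover call and breed each child from a single mutated parent, and by the definition above you have an evolutionary method rather than a genetic algorithm.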
Source: The genetic algorithms class I’m taking this semester. We’re using this book. I can certainly recommend it, but it’s probably quite geared towards people with a CS background (as are a lot of books on the subject).
Also, this is an interesting Scientific American article on the subject, if you can get hold of a PDF or something.
If you're interested in the algorithms themselves, I would suggest Bioinformatics Algorithms: An Active Learning Approach, by Phillip Compeau and Pavel Pevzner. The content is the same as in their Coursera MOOCs, so if you prefer online learning, that would be an equivalent route (they don't offer a digital copy of the textbook).
If you're interested in statistical methods in bioinformatics that are specifically relevant to data mining (e.g. classification, clustering, feature extraction), I really enjoyed Data Mining for Bioinformatics, by Sumeet Dua and Pradeep Chowriappa.
Honestly, to start out learning DAX I'd suggest the following two books:
Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016
and
Supercharge Power BI: Power BI Is Better When You Learn to Write DAX
Then you can move on to the SQLBI stuff.
/$0.02
I think this book gives the best overview of core functionality:-
https://www.amazon.co.uk/PostgreSQL-Running-Regina-O-Obe/dp/1449373194/ref=sr_1_1?ie=UTF8&qid=1479283169&sr=8-1&keywords=postgresql
And this is "reasonably" useful:-
https://www.amazon.co.uk/PostgreSQL-Development-Essentials-Manpreet-Kaur/dp/1783989009/ref=sr_1_3?ie=UTF8&qid=1479283169&sr=8-3&keywords=postgresql
I just bought these three books:
https://www.amazon.com/Hadoop-Definitive-Storage-Analysis-Internet/dp/1491901632/ref=sr_1_2?ie=UTF8&qid=1518547069&sr=8-2&keywords=hadoop+books
https://www.amazon.com/Hadoop-Quick-Start-Guide-Essentials-Addison-wesley/dp/0134049942/ref=pd_sim_14_13?_encoding=UTF8&pd_rd_i=0134049942&pd_rd_r=2BYRDQ84NJPJ7EPE8GZF&pd_rd_w=OwwMn&pd_rd_wg=XmDb6&psc=1&refRID=2BYRDQ84NJPJ7EPE8GZF
https://www.amazon.com/Hadoop-2-x-Administration-Cookbook-Administer/dp/1787126730/ref=sr_1_1_sspa?ie=UTF8&qid=1518547107&sr=8-1-spons&keywords=hadoop+books&psc=1
I'm going to shadow a contractor here at work and probably set up a sandbox.
I could only find a direct download link for one of the books I would recommend (the first one), but I'll mention all of them:
I dunno about the best, but PostgreSQL: Up and Running is decent. Though I second the advice to learn databases and SQL generally first, then worry about the specifics of PG. Most of what you will be doing in the early stages will be pretty generic across databases.
This has pretty good reviews on Amazon.
I've seen this used by a few of my mates as a supplement. It has good sample questions and exercises.
...And this is a bit cheaper, but the reviews aren't that good.
Foundations for Architecting Data Solutions by Ted Malaska and Jonathan Seidman
Here are my suggestions:
SQL for Dummies
Practical SQL
This book: Ensemble Methods in Data Mining
As mentioned, Working Effectively with Legacy Code is a standard. You could also look at The Mikado Method. It's about refactoring code, and it's a great match with Working Effectively with Legacy Code.
If you don't know where to start, start with bugs. If there's a bug, first you write a unit test that shows the failure and then you fix it.
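As a rough sketch of that bug-first workflow, here is what it might look like with Python's built-in unittest module. Everything in it is hypothetical: the `average` function, and the bug it stands in for (an imagined legacy version that crashed on an empty list), are made up for the example. The test for the empty list would be written first and fail against the old code; the guard in `average` is the fix that makes it pass.

```python
import unittest


def average(values):
    # Fixed version. The imagined legacy code divided by len(values)
    # unconditionally and raised ZeroDivisionError on an empty list.
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_returns_zero(self):
        # Written first: reproduces the bug report and fails
        # against the unfixed code.
        self.assertEqual(average([]), 0.0)

    def test_regular_values_unchanged(self):
        # Pins down existing behaviour so the fix doesn't break it.
        self.assertEqual(average([2, 4, 6]), 4.0)
```

Saved in a file, this runs with `python -m unittest`. The test stays in the suite afterwards, so the same bug can't quietly come back.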
Take a look at https://www.amazon.com/Data-Mining-Masses-Third-Implementations/dp/1727102479. It covers all the basics with examples in both RapidMiner and R.
I really found this book useful: https://www.amazon.com/Learning-PySpark-Tomasz-Drabas/dp/1786463709
You can “Look Inside” and preview this book on Amazon.com if that helps...
https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1789615321