(Part 2) Best data mining books according to redditors

We found 160 Reddit comments discussing the best data mining books. We ranked the 58 resulting products by number of redditors who mentioned them. Here are the products ranked 21-40.

21. Programming for Beginners: 10 Books in 1- 5 Books of Excel Programming+ 5 Books of Data Analytics

1 mention

22. Bitcoin & Cryptocurrency Technologies: Bitcoin: Invest In Digital Gold, Wallet Technology Book, Anonymous Altcoins (3 BOOKS IN 1)

1 mention

23. Data Mining for Bioinformatics

1 mention

24. Text Mining with MATLAB®

1 mention

25. Hands-On Data Analysis with Pandas: Efficiently perform data collection, wrangling, analysis, and visualization using Python

1 mention

26. Hadoop 2.x Administration Cookbook: Administer and maintain large Apache Hadoop clusters

1 mention

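27. Learning PySpark

1 mention

28. PostgreSQL Development Essentials

1 mention

29. Social Media Mining with R

1 mention

30. Data Mining for the Masses, Third Edition: With Implementations in RapidMiner and R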

1 mention

31. The Mikado Method

1 mention

32. Supercharge Power BI: Power BI Is Better When You Learn to Write DAX

1 mention

33. Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions (Synthesis Lectures on Data Mining and Knowledge Discovery)

1 mention

34. Practical SQL: A Beginner's Guide to Storytelling with Data

1 mention

35. Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications (Numerical Insights)

1 mention

36. Tabular Modeling in Microsoft SQL Server Analysis Services (Developer Reference)

1 mention

Microsoft Press

37. Foundations for Architecting Data Solutions: Managing Successful Data Projects

1 mention

38. PostgreSQL: Up and Running: A Practical Guide to the Advanced Open Source Database

1 mention

39. High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark

1 mention

40. Thoughtful Machine Learning with Python: A Test-Driven Approach

1 mention

O'Reilly Media

Top Reddit comments about Data Mining:

u/cfors · 22 points · r/datascience

Designing Data-Intensive Applications is your ticket here. It takes you through a lot of the algorithms and architecture present in the distributed technologies out there.

In a data engineering role you will probably just be munging data through a pipeline making it useful for the analysts/scientists to use, so a book recommendation for that depends on the technology you will be using. Here are some of my favorite resources for the various tools I used in my experience as a Data Engineer:

High Performance Spark
acloud.guru
Python Cookbook

Good luck in your new position!

u/Data_cruncher · 10 points · r/PowerBI

Yes, it's absolutely doable. That's a very standard-sized fact table for SSAS Tabular.

Some points I'd like to highlight:

Purchase and read this entire book: Tabular Modeling in Microsoft SQL Server Analysis Services (Developer Reference) 2nd Edition
Begin with "Import" mode. Import mode physically moves the data from SQL into the VertiPaq engine; this is best practice. If you've exhausted all options and avenues of attack, then you can fall back to "Direct Query" mode; this is not best practice.
When using Import mode, you must partition your table. This mechanism incrementally loads the data so you don't need to physically move 250M rows with every refresh. Note: this is NOT SQL Server "partitioning" - same name but completely different.
Download VertiPaq Analyzer and use it as soon as all 250M rows are loaded. It will highlight problem columns. Your goal is to remove high-cardinality columns (columns with high distinct counts). For example, instead of storing a column of type DateTime, you should split it into one column of type Date and another of type Time. You'll find big performance boosts. The book in my first bullet point goes through this and many other examples.
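
To make the cardinality point concrete, here is a small Python sketch that counts distinct values before and after splitting a timestamp column. The data is synthetic (one row per minute for a week, an assumption made purely for illustration), and it only demonstrates the distinct-count arithmetic, not how the engine itself stores or compresses columns.

```python
from datetime import datetime, timedelta

# Synthetic data (assumption): one timestamp per minute for seven days.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(minutes=i) for i in range(7 * 24 * 60)]

# A single DateTime column has one distinct value per row here.
distinct_datetimes = len(set(timestamps))

# Split into a Date column and a Time column and count again.
distinct_dates = len({ts.date() for ts in timestamps})
distinct_times = len({ts.time() for ts in timestamps})

print(distinct_datetimes, distinct_dates, distinct_times)  # prints "10080 7 1440"
```

The combined column grows with every new minute of data, while the Date column grows by one value per day and the Time column is capped at 1,440 values at minute precision.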

u/RockinRoel · 9 points · r/programming

I’ll try to explain. The basic layout of a genetic algorithm is rather simple:

1. Start with a (random) starting population, consisting of candidate solutions. The fitness of these solutions (how good they are) must be determined somehow; this is domain-specific and is most often the most computationally expensive step.
2. Pairs of candidate solutions (parents) are selected in the selection step. There are multiple ways to select these. Usually, those of higher fitness should have a greater chance to be selected. However, to have some diversity, there is still a chance for solutions that are less fit to be selected (because they may still contain parts that are valuable). If you’re trying to get red apples
3. Now, from these pairs, children are created through crossover. Depending on the domain, there must be some representation of solutions and ways to combine two of them. Finding these is one of the trickiest parts of genetic algorithms.
4. Mutation. With a certain (low) probability these children are mutated through a small change, so (hopefully desirable) properties can be created that may not have been present yet in the population.
5. These children become the new population, and it goes back to step 2. The algorithm terminates depending on a certain fitness being reached, or after a certain number of steps,…

Some forms of genetic algorithms may differ from that basic layout, but that’s basically how they work. If it has selection, mutation and crossover, it’s a genetic algorithm. Without crossover, it’s an evolutionary method, but not a genetic algorithm.
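
The layout above can be sketched in a few lines of Python. This is a minimal illustration and not taken from the book: the toy problem (OneMax, maximizing the number of 1 bits), the fitness-proportional selection, the one-point crossover, and all parameter values are assumptions chosen to keep the example short.

```python
import random

def fitness(bits):
    """Toy fitness (OneMax): count the 1 bits; higher is better."""
    return sum(bits)

def select_pair(population, rng):
    # Fitness-proportional selection: fitter candidates are more likely to be
    # picked, but weaker ones keep a nonzero chance (the +1 avoids zero weights).
    weights = [fitness(ind) + 1 for ind in population]
    return rng.choices(population, weights=weights, k=2)

def crossover(a, b, rng):
    # One-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.01):
    # Flip each bit independently with a small probability.
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(length=20, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    # Random starting population of candidate solutions.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        if fitness(best) == length:  # stop once the target fitness is reached
            break
        # Selection, crossover, and mutation produce the next generation.
        children = []
        for _ in range(pop_size):
            a, b = select_pair(population, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children  # the children become the new population
        best = max(population + [best], key=fitness)  # remember the best seen
    return best

if __name__ == "__main__":
    winner = run_ga()
    print(fitness(winner), winner)
```

Dropping the `crossover` call and mutating a single selected parent instead would give the evolutionary method without crossover mentioned above.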

Source: The genetic algorithms class I’m taking this semester. We’re using this book. I can certainly recommend it, but it’s probably quite geared towards people with a CS background (as are a lot of books on the subject).

Also, this is an interesting Scientific American article on the subject, if you can get hold of a PDF or something.

u/niemasd · 6 points · r/bioinformatics

If you're interested in the algorithms themselves, I would suggest Bioinformatics Algorithms: An Active Learning Approach, by Phillip Compeau and Pavel Pevzner. The content is the same as in their Coursera MOOCs, so if you prefer online learning, that would be an equivalent route (as they don't offer a digital copy of the textbook).

If you're interested in statistical methods in bioinformatics that are specifically relevant to data mining (e.g. classification, clustering, feature extraction, etc.), I really enjoyed Data Mining for Bioinformatics, by Sumeet Dua and Pradeep Chowriappa.

u/CarbonChauvinist · 5 points · r/PowerBI

Honestly, to start out learning DAX I'd suggest the following two books:

Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016

and

Supercharge Power BI: Power BI Is Better When You Learn to Write DAX

Then you can move on to the SQLBI stuff.

/$0.02

u/Citizenfishy · 2 points · r/PostgreSQL

I think this book gives the best overview of core functionality:-

https://www.amazon.co.uk/PostgreSQL-Running-Regina-O-Obe/dp/1449373194/ref=sr_1_1?ie=UTF8&qid=1479283169&sr=8-1&keywords=postgresql

And this is "reasonably" useful:-

https://www.amazon.co.uk/PostgreSQL-Development-Essentials-Manpreet-Kaur/dp/1783989009/ref=sr_1_3?ie=UTF8&qid=1479283169&sr=8-3&keywords=postgresql

u/OracleDBA · 2 points · r/financialindependence

I just bought these three books:

https://www.amazon.com/Hadoop-Definitive-Storage-Analysis-Internet/dp/1491901632/ref=sr_1_2?ie=UTF8&qid=1518547069&sr=8-2&keywords=hadoop+books

https://www.amazon.com/Hadoop-Quick-Start-Guide-Essentials-Addison-wesley/dp/0134049942/ref=pd_sim_14_13?_encoding=UTF8&pd_rd_i=0134049942&pd_rd_r=2BYRDQ84NJPJ7EPE8GZF&pd_rd_w=OwwMn&pd_rd_wg=XmDb6&psc=1&refRID=2BYRDQ84NJPJ7EPE8GZF

https://www.amazon.com/Hadoop-2-x-Administration-Cookbook-Administer/dp/1787126730/ref=sr_1_1_sspa?ie=UTF8&qid=1518547107&sr=8-1-spons&keywords=hadoop+books&psc=1

I'm going to shadow a contractor here at work and probably set up a sandbox.

u/seamanroses · 2 points · r/dataanalysis

I could only find a direct download link for one of the books I would recommend (the first one), but I'll mention all of them:

Fluent Python (keep in mind the version is unedited and therefore likely has some errors you should watch out for)
Illustrated Guide to Python 3
Python Tricks
Thoughtful Machine Learning with Python

u/khludge · 2 points · r/PostgreSQL

I dunno about the best, but PostgreSQL: Up and Running is decent. Though I second the advice to learn databases and SQL generally first, then worry about the specifics of PG. Most of what you will be doing in the early stages will be pretty generic across databases.

u/amazon-converter-bot · 1 point · r/FreeEBOOKS

Here are all the local Amazon links I could find: amazon.co.uk, amazon.ca, amazon.com.au, amazon.in, amazon.com.mx, amazon.de, amazon.it, amazon.es, amazon.com.br, amazon.nl, amazon.co.jp, amazon.fr

Beep bloop. I'm a bot to convert Amazon ebook links to local Amazon sites.
I currently look here: amazon.com, amazon.co.uk, amazon.ca, amazon.com.au, amazon.in, amazon.com.mx, amazon.de, amazon.it, amazon.es, amazon.com.br, amazon.nl, amazon.co.jp, amazon.fr. If you would like your local version of Amazon added, please contact my creator.

u/[deleted] · 1 point · r/EngineeringStudents

This has pretty good reviews on Amazon.

I've seen this used by a few of my mates as a supplement. It has good sample questions and exercises.

...And this is a bit cheaper, but the reviews aren't that good.

u/chad_m_i · 1 point · r/MLQuestions

Foundations for Architecting Data Solutions by Ted Malaska and Jonathan Seidman

u/joreddit14 · 1 point · r/SQL

Here are my suggestions:

SQL for Dummies

Practical SQL

u/shaggorama · 1 point · r/MachineLearning

This book: Ensemble Methods in Data Mining

u/Eligriv · 1 point · r/webdev

As mentioned, Working Effectively with Legacy Code is a standard. You could also look at The Mikado Method; it's about refactoring code, and it's a great match with Working Effectively.

If you don't know where to start, start with bugs. If there's a bug, first you write a unit test that shows the failure and then you fix it.
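
As a rough illustration of that test-first bug fix in Python: the `slugify` function, its bug, and the test name below are all hypothetical, invented only to show the workflow of reproducing a failure in a unit test before fixing it.

```python
import unittest

def slugify(title):
    """Hypothetical legacy function, shown here after the fix."""
    # The old (buggy) version was: return title.lower().replace(" ", "-")
    # which turned "Hello  World" (two spaces) into "hello--world".
    return "-".join(title.lower().split())

class SlugifyRegressionTest(unittest.TestCase):
    def test_collapses_repeated_spaces(self):
        # Written first to reproduce the bug report: it failed against the
        # old implementation and passes once the fix above is in place.
        self.assertEqual(slugify("Hello  World"), "hello-world")

if __name__ == "__main__":
    unittest.main()
```

The test stays in the suite afterwards, so the same bug cannot quietly come back during later refactoring.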

u/twentworth22 · 1 point · r/datascience

Take a look at https://www.amazon.com/Data-Mining-Masses-Third-Implementations/dp/1727102479. It covers all the basics with examples in both RapidMiner and R.

u/ingvay7 · 1 point · r/apachespark

I really found this book useful: https://www.amazon.com/Learning-PySpark-Tomasz-Drabas/dp/1786463709

u/chaimhaas · 1 point · r/Python

You can “Look Inside” and preview this book on Amazon.com if that helps...

https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1789615321
