(Part 2) Best data mining books according to redditors


We found 160 Reddit comments discussing the best data mining books. We ranked the 58 resulting products by number of redditors who mentioned them. Here are the products ranked 21-40.


Top Reddit comments about Data Mining:

u/cfors · 22 points · r/datascience

Designing Data Intensive Applications is your ticket here. It takes you through a lot of the algorithms and architecture present in the distributed technologies out there.

In a data engineering role, you will probably spend most of your time munging data through a pipeline to make it useful for the analysts and scientists, so a book recommendation for that depends on the technology you will be using. Here are some of my favorite resources for the various tools I used as a Data Engineer:

u/Data_cruncher · 10 points · r/PowerBI

Yes, it's absolutely doable. That's a very standard sized fact table for SSAS Tabular.

Some points I'd like to highlight:

  • Purchase and read this entire book: Tabular Modeling in Microsoft SQL Server Analysis Services (Developer Reference) 2nd Edition
  • Begin with "Import" mode. Import mode physically moves the data from SQL into the Vertipaq engine; this is best practice. If you've exhausted all options and avenues of attack, you can fall back to "DirectQuery" mode; this is not best practice.
  • When using Import mode, you must partition your table. This mechanism incrementally loads the data so you don't need to physically move 250M rows with every refresh. Note: this is NOT SQL Server "partitioning" - same name but completely different.
  • Download Vertipaq Analyzer and use it as soon as all 250M rows are loaded. It will highlight problem columns. Your goal is to remove high-cardinality columns (columns with high distinct counts). For example, instead of storing a column of type DateTime, you should split it into one column of type Date and another of type Time. You'll see big performance boosts. The book in my first bullet point goes through this and many other examples.
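To see why splitting a DateTime column helps, here is a rough Python sketch that only counts distinct values; it does not model how Vertipaq actually encodes or compresses anything. It assumes a hypothetical table with one reading per minute over a non-leap year, so the numbers are illustrative rather than measurements from a real model.

```python
from datetime import datetime, timedelta

# Hypothetical data: one reading per minute for a non-leap year (365 days).
start = datetime(2023, 1, 1)
timestamps = [start + timedelta(minutes=i) for i in range(365 * 24 * 60)]

# Distinct values if stored as a single DateTime column.
datetime_cardinality = len(set(timestamps))

# Distinct values after splitting into separate Date and Time columns.
date_cardinality = len({ts.date() for ts in timestamps})
time_cardinality = len({ts.time() for ts in timestamps})

print(datetime_cardinality)  # 525600
print(date_cardinality)      # 365
print(time_cardinality)      # 1440
```

The single DateTime column has one distinct value per row, while the two split columns together have only 365 + 1,440 distinct values, which is the kind of cardinality reduction the Vertipaq Analyzer advice is aiming for.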

u/RockinRoel · 9 points · r/programming

I’ll try to explain. The basic layout of a genetic algorithm is rather simple:

  1. Start with a (random) starting population, consisting of candidate solutions. The fitness of these solutions (how good they are) must be determined somehow; how to do that is domain-specific, and it is usually the most computationally expensive step.
  2. Pairs of candidate solutions (parents) are selected in the selection step. There are multiple ways to select these. Usually, those of higher fitness should have a greater chance to be selected. However, to have some diversity, there is still a chance for solutions that are less fit to be selected (because they may still contain parts that are valuable). If you’re trying to get red apples
  3. Now, from these pairs, children are created through crossover. Depending on the domain, there must be some representation of solutions and a way to combine two of them. Finding these is one of the trickiest parts of genetic algorithms.
  4. Mutation. With a certain (low) probability, these children are mutated through a small change, so that (hopefully desirable) properties that are not yet present in the population can emerge.
  5. These children become the new population, and the algorithm goes back to step 2. It terminates when a certain fitness is reached, or after a certain number of steps,…

    Some forms of genetic algorithms may differ from that basic layout, but that’s basically how they work. If it has selection, mutation and crossover, it’s a genetic algorithm. Without crossover, it’s an evolutionary method, but not a genetic algorithm.

    Source: The genetic algorithms class I’m taking this semester. We’re using this book. I can certainly recommend it, but it’s probably quite geared towards people with a CS background (as are a lot of books on the subject).

    Also, this is an interesting Scientific American article on the subject, if you can get hold of a PDF or something.
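The five steps listed above map fairly directly onto code. Below is a minimal Python sketch, assuming a toy "OneMax" problem (maximize the number of 1 bits in a string) as the fitness function; the problem, the parameter values, and the choice of tournament selection and single-point crossover are illustrative assumptions, not taken from the commenter's course or book.

```python
import random

# Hypothetical toy problem ("OneMax"): maximize the number of 1s in a bit string.
GENOME_LENGTH = 20
POPULATION_SIZE = 30
MUTATION_RATE = 0.01
MAX_GENERATIONS = 200


def fitness(genome: list[int]) -> int:
    # Domain-specific step; here it is just a count of 1 bits.
    return sum(genome)


def select_parent(population: list[list[int]], k: int = 3) -> list[int]:
    # Tournament selection: fitter genomes usually win, but weaker ones
    # are still picked sometimes, which keeps diversity in the population.
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)


def crossover(a: list[int], b: list[int]) -> list[int]:
    # Single-point crossover: prefix from one parent, suffix from the other.
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(genome: list[int]) -> list[int]:
    # Flip each bit with a small probability.
    return [1 - bit if random.random() < MUTATION_RATE else bit for bit in genome]


def run(seed: int = 0) -> tuple[list[int], int]:
    random.seed(seed)
    # Step 1: random starting population.
    population = [[random.randint(0, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    for generation in range(MAX_GENERATIONS):
        best = max(population, key=fitness)
        # Terminate once a target fitness is reached (here, all 1s).
        if fitness(best) == GENOME_LENGTH:
            return best, generation
        # Steps 2-5: select parents, cross them over, mutate the children,
        # and let the children become the new population.
        population = [mutate(crossover(select_parent(population),
                                       select_parent(population)))
                      for _ in range(POPULATION_SIZE)]
    return max(population, key=fitness), MAX_GENERATIONS


best, generations = run()
print(f"best fitness {fitness(best)} after {generations} generations")
```

Tournament selection is only one of several ways to implement the selection step: fitter candidates usually win, but a weaker one can still be chosen when a tournament happens to draw only weak contestants. Because this sketch keeps no elite individuals between generations, the best fitness can occasionally drop from one generation to the next.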

u/niemasd · 6 points · r/bioinformatics

If you're interested in the algorithms themselves, I would suggest Bioinformatics Algorithms: An Active Learning Approach, by Phillip Compeau and Pavel Pevzner. The content is the same as in their Coursera MOOCs, so if you prefer online learning, that would be an equivalent route (they don't offer a digital copy of the textbook).

If you're interested in statistical methods in Bioinformatics that are specifically relevant to data mining (e.g. classification, clustering, feature extraction, etc.), I really enjoyed Data Mining for Bioinformatics, by Sumeet Dua and Pradeep Chowriappa.

u/CarbonChauvinist · 5 points · r/PowerBI

Honestly, to start out learning DAX I'd suggest the following two books:

Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016

and

Supercharge Power BI: Power BI Is Better When You Learn to Write DAX

Then you can move on to the SQLBI stuff.

/$0.02

u/seamanroses · 2 points · r/dataanalysis

I could only find a direct download link for one of the books I would recommend (the first one), but I'll mention all of them:

u/khludge · 2 points · r/PostgreSQL

I dunno about the best, but PostgreSQL: Up and Running is decent. That said, I second the advice to learn databases and SQL generally first, then worry about the specifics of PG; most of what you will be doing in the early stages will be pretty generic across databases.

u/amazon-converter-bot · 1 point · r/FreeEBOOKS

Here are all the local Amazon links I could find:


amazon.co.uk

amazon.ca

amazon.com.au

amazon.in

amazon.com.mx

amazon.de

amazon.it

amazon.es

amazon.com.br

amazon.nl

amazon.co.jp

amazon.fr

Beep bloop. I'm a bot to convert Amazon ebook links to local Amazon sites.
I currently look here: amazon.com, amazon.co.uk, amazon.ca, amazon.com.au, amazon.in, amazon.com.mx, amazon.de, amazon.it, amazon.es, amazon.com.br, amazon.nl, amazon.co.jp, amazon.fr, if you would like your local version of Amazon adding please contact my creator.

u/[deleted] · 1 point · r/EngineeringStudents

This has pretty good reviews on Amazon.

I've seen this used by a few of my mates as a supplement. It has good sample questions and exercises.

...And this is a bit cheaper, but the reviews aren't that good.

u/chad_m_i · 1 point · r/MLQuestions

Foundations for Architecting Data Solutions by Ted Malaska and Jonathan Seidman

u/joreddit14 · 1 point · r/SQL

Here are my suggestions:

SQL for Dummies

Practical SQL


u/shaggorama · 1 point · r/MachineLearning

u/Eligriv · 1 point · r/webdev

As mentioned, Working Effectively with Legacy Code is a standard. You could also look at The Mikado Method; it's about refactoring code, and it's a great match with Working Effectively with Legacy Code.

If you don't know where to start, start with bugs. If there's a bug, first you write a unit test that shows the failure and then you fix it.
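The test-first bug fix can be sketched in Python. The `average` function and its empty-list bug are hypothetical, and the tests are plain pytest-style `assert` functions, which also run without pytest through the `__main__` block.

```python
def average(values: list[float]) -> float:
    """Hypothetical legacy function. The reported bug: it raised
    ZeroDivisionError on an empty list instead of returning 0.0."""
    if not values:  # the fix, added only after the test below failed
        return 0.0
    return sum(values) / len(values)


def test_empty_list_returns_zero():
    # Written first to reproduce the bug; it failed before the guard was added.
    assert average([]) == 0.0


def test_regular_values():
    # Pins down existing behaviour so the fix doesn't break it.
    assert average([1.0, 2.0, 3.0]) == 2.0


if __name__ == "__main__":
    test_empty_list_returns_zero()
    test_regular_values()
    print("all tests passed")
```

The order matters: the regression test is written and seen failing (here, with a ZeroDivisionError) before the guard clause is added, so it demonstrates the fix and keeps the bug from quietly coming back.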

u/twentworth22 · 1 point · r/datascience

Take a look at https://www.amazon.com/Data-Mining-Masses-Third-Implementations/dp/1727102479. It covers all the basics, with examples in both RapidMiner and R.

u/chaimhaas · 1 point · r/Python

You can “Look Inside” and preview this book on Amazon.com if that helps...

https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1789615321