Reddit reviews Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems)

We found 6 Reddit comments about Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems). Here are the top ones, ranked by their Reddit score.

6 Reddit comments about Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems):

u/bigshum · 15 points · r/compsci

I found the WEKA toolkit to be a nice centralised resource for learning about the multitude of techniques and parameters in use out there. There's a book too, which is a very informative read, if a little dry in places.

This was used in my Language Identification project for speech signals and it worked quite nicely.

u/tpintsch · 2 points · r/datascience

Hello, I am an undergrad student. I am taking a Data Science course this semester. It's the first time the course has ever been run, so it's a bit disorganized, but I am very excited about this field and I have learned a lot on my own. I have read 3 Data Science books that are all fantastic and are suited to very different types of classes. I'd like to share my experience and book recommendations with you.

Target - 200 level Business/Marketing or Science departments without a programming/math focus. 
Textbook - Data Science for Business https://www.amazon.com/gp/product/1449361323/ref=ya_st_dp_summary
My Comments - This book provides a good overview of Data Science concepts with a focus on business-related analysis. There is very little math or programming instruction, which makes it ideal for students who would benefit from an understanding of Data Science but do not have math/CS experience.
Pre-Reqs - None.

Target - 200 level Math/CS or Physics/Engineering departments.
Textbook - Data Mining: Practical Machine Learning Tools and Techniques https://www.amazon.com/gp/aw/d/0123748569/ref=pd_aw_sim_14_3?ie=UTF8&dpID=6122EOEQhOL&dpSrc=sims&preST=_AC_UL100_SR100%2C100_&refRID=YPZ70F6SKHCE7BBFTN3H
My comments: This book is more in depth than my first recommendation. It focuses on math and computer science approaches with machine learning applications. There are many opportunities for projects from this book. The biggest strength is the instruction on the open source workbench Weka. As an instructor you can easily demonstrate data cleaning, analysis, visualization, machine learning, decision trees, and linear regression. The GUI makes it easy for students to jump right into playing with data in a meaningful way, so they won't struggle with knowledge gaps in coding and statistics. Weka isn't used in industry as far as I can tell, and it also fails on large data sets. However, for an Intro to Data Science without many pre-reqs, this would be my choice.
Pre-Req - Basic Statistics, Computer Science 1 or Computer Applications.

Target - 300/400 level Math/CS majors
Textbook - Data Science from Scratch: First Principles with Python
http://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X
My comments: I am infatuated with this book. It delights me. I love math, and am quickly becoming enamored of computer science as well. This is the book I wish we used for my class. It moves quickly through some math and Python review into a thorough but captivating treatment of all things data science. If your goal is to prepare students for careers in Data Science, this book is my top pick.
Pre-Reqs - Computer Science 1 and 2 (hopefully using Python as the language), Linear Algebra, Statistics (basic will do, advanced preferred), and Calculus.
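
To give a flavor of the "first principles" style, here is a minimal sketch of simple linear regression in plain Python. It is an illustration only, not code taken from the book; the function names fit_line and predict and the toy data are made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x from its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of products of the x and y deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical toy data lying exactly on y = 2x + 1
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept)
```

Students who have only seen regression as a button in a GUI can compare the fitted slope and intercept with what a tool such as Weka reports on the same data.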

Additional suggestions:
Look into using Tableau for visualization. It's free for students, easy to get started with, and a popular tool. I like to use it for casual analysis and pictures for my presentations.

Kaggle is a wonderful resource and you may even be able to have your class participate in projects on this website.

Quantified Self is another great resource. http://quantifiedself.com
One of my assignments, a semester-long project, is to collect data that I generate myself and analyze it. I'm using Sleep as Android to track my sleep patterns all semester and will be giving a presentation on the analysis. The Quantified Self website has active forums and a plethora of good ideas on personal data analytics. It's been a really fun and fantastic learning experience so far.

As far as flow? Introduce visualization from the start, before wrangling and analysis. Show or share videos of exciting Data Science presentations. Once your students have had their curiosity sparked and have played around in Tableau or Weka, start in on the practicalities of really working with the data. To be honest, your example data sets are going to be pretty clean, small, and easy to work with. Wrangling won't really be necessary unless you are teaching advanced Data Science/Big Data techniques. You should focus more on Data Mining. The books I recommended are very easy to cover in a semester, so I would suggest that you model your course outline on the book. Good luck!

u/leokassio · 1 point · r/datascience

Many thanks for the Kaggle tip and the book!
In addition to the book you suggested (which I don't know), I'd like to recommend Data Mining, by the authors of Weka, which is a very good book too. (http://www.amazon.ca/Data-Mining-Practical-Learning-Techniques/dp/0123748569/ref=sr_1_1?s=books&ie=UTF8&qid=1425389007&sr=1-1&keywords=data+mining)

u/SupportVectorMachine · 1 point · r/MLQuestions

I used Weka a lot when I was first starting out, and I can confidently recommend it. Data Mining: Practical Machine Learning Tools and Techniques is essentially a companion volume to Weka and its documentation, and it provides a great introduction to machine learning methodology in general; I recommend it, too. For user friendliness and visualization, I think it's a very good place to start.

Over time, I moved to R, which has the advantage of being more likely to incorporate new, cutting-edge methods that people have coded and released in packages. (There are also other R-based ML suites, such as Rattle.) If you like Weka, the transition into R can be pretty smooth, since R and Weka can talk to each other through R's Java interface. R is also good for applying command-line options (which can also be done in Weka's console), which you will eventually want to do as you get more familiar with your techniques of choice, whether they're found in Weka or not.

Python is a popular option for a lot of users (and with it you can use, among other things, Google's open-source TensorFlow suite), and it has the advantage of generally having pretty easy-to-read code, good visualization options, and a huge and very dedicated user base.