(Part 2) Top products from r/datascience

We found 42 product mentions on r/datascience. We ranked the 205 resulting products by number of redditors who mentioned them. Here are the products ranked 21-40.

Top comments that mention products on r/datascience:

u/tpintsch · 2 points · r/datascience

Hello, I am an undergrad student. I am taking a Data Science course this semester. It's the first time the course has ever been run so it's a bit disorganized, but I am very excited about this field and I have learned a lot on my own. I have read 3 Data Science books that are all fantastic and are suited to very different types of classes. I'd like to share my experience and book recommendations with you.

Target - 200 level Business/Marketing or Science departments without a programming/math focus. 
Textbook - Data Science for Business https://www.amazon.com/gp/product/1449361323/ref=ya_st_dp_summary
My Comments - This book provides a good overview of Data Science concepts with a focus on business-related analysis. There is very little math or programming instruction, which makes this ideal for students who would benefit from an understanding of Data Science but do not have math/CS experience.
Pre-Reqs - None.

Target - 200 level Math/Cs or Physics/Engineering departments.
Textbook - Data Mining: Practical Machine Learning Tools and Techniques https://www.amazon.com/gp/aw/d/0123748569/ref=pd_aw_sim_14_3?ie=UTF8&dpID=6122EOEQhOL&dpSrc=sims&preST=_AC_UL100_SR100%2C100_&refRID=YPZ70F6SKHCE7BBFTN3H
My comments: This book is more in depth than my first recommendation. It focuses on math and computer science approaches with machine learning applications. There are many opportunities for projects from this book. The biggest strength is the instruction on the open source workbench Weka. As an instructor you can easily demonstrate data cleaning, analysis, visualization, machine learning, decision trees, and linear regression. The GUI makes it easy for students to jump right into playing with data in a meaningful way. They won't struggle with knowledge gaps in coding and statistics. As far as I can tell, Weka isn't used in industry, and it also fails on large data sets. However, for an Intro to Data Science without many pre-reqs this would be my choice.
Pre-Req - Basic Statistics, Computer Science 1 or Computer Applications.

Target - 300/400 level Math/Cs majors
Textbook - Data Science from Scratch: First Principles with Python
http://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X
My comments: I am infatuated with this book. It delights me. I love math, and am quickly becoming enamored by computer science as well. This is the book I wish we used for my class. It quickly moves through some math and Python review into a thorough but captivating treatment of all things data science. If your goal is to prepare students for careers in Data Science this book is my top pick.
Pre-Reqs - Computer Science 1 and 2 (hopefully using Python as the language), Linear Algebra, Statistics (basic will do, advanced preferred), and Calculus.

Additional suggestions:
Look into using Tableau for visualization. It's free for students, easy to get started with, and a popular tool. I like to use it for casual analysis and pictures for my presentations.

Kaggle is a wonderful resource and you may even be able to have your class participate in projects on this website.

Quantified Self is another great resource. http://quantifiedself.com
One of my assignments, a semester-long project, is to collect data I've created and analyze it. I'm using Sleep as Android to track my sleep patterns all semester and will be giving a presentation on the analysis. The Quantified Self website has active forums and a plethora of good ideas on personal data analytics. It's been a really fun and fantastic learning experience so far.

As far as flow? Introduce visualization from the start, before wrangling and analysis. Show or share videos of exciting Data Science presentations. Once your students have their curiosity sparked and have played around in Tableau or Weka, then start in on the practicalities of really working with the data. To be honest, your example data sets are going to be pretty clean, small, and easy to work with. Wrangling won't really be necessary unless you are teaching advanced Data Science/Big Data techniques. You should focus more on Data Mining. The books I recommended are very easy to cover in a semester, so I would suggest that you model your course outline according to the book. Good luck!

u/syntonicC · 13 points · r/datascience

I used R for about 4 years before I moved to Python to use it for deep learning. I have been using Python for about 2 years now.

>Are R and Python considered redundant, or are there some situations where one will be preferred over the other? If I become proficient at using Python for data wrangling, analysis, and visualization, will I have any reason to continue using R?

It depends. I haven't really found anything that I can do in Python that I could not already do in R. I still use R because I like it better as a functional programming language and because it has a wide variety of more specific statistical packages (many for biology) that are just not available for Python yet. There are some specific cases where I just find it more intuitive and simpler to implement a solution in R. And generally, I just prefer ggplot2 over any of the various Python plotting packages. Also, R has a high-level API for things like TensorFlow, so it's not like you can't do deep learning in R.

The biggest advantage for Python is its speed and ability to work within a larger programming framework. A lot of companies tend to use Python because the models they build are integrated into a larger system that needs the capabilities of a fully-fledged programming language. Python is generally faster and has better management of big data sets in memory. R is moving toward fixing these issues, but there are still limitations.

>Where should I start? I'm looking for a resource that isn't aimed at complete beginners, since I've been using R for a few years, and took a C class before that. At the same time I wouldn't claim to be an experienced programmer. I'm interested in learning Python both for data analysis and for general programming.

I learned Python syntax using Learn Python 3 the Hard Way. I learned about Pandas, data wrangling, etc. using Pandas for Everyone and Pandas Cookbook. If I were to suggest just one book, it would be Pandas for Everyone. You can learn Python syntax from YouTube, MOOCs, or online tutorials. The Pandas Cookbook is just extra practice. To be honest though, the general conventions used by Pandas for data analysis and manipulation are very similar to R in many ways, especially if you've used anything in Hadley Wickham's Tidyverse. Finally, I made a Pandas cheatsheet while I was learning and included equivalent R functions in some places. I would be happy to share this Google Sheets file with you if you are interested.

>What IDE(s) should I use, and what are some must learn packages? I'm hoping to find something similar to RStudio.

I started off using PyCharm. I've heard good things about Spyder. But now, I actually still use RStudio! It is fully integrated with Python thanks to the Reticulate package. You can pass data structures between the languages and use both in RMarkdown. You can also use virtual environments which are popular with Python. Once you install the package:

library(reticulate)
use_virtualenv("path_to_my_virtual_env") # Start virtual environment

# You can now run Python scripts directly in the RStudio console

# If you want a Python REPL to use interactively just like in R run:
repl_python()

It's really easy to use and even comes with auto-complete and everything else.

Hope that helped.

u/acid_wrappers · 16 points · r/datascience

Edit: Supposedly this guy is OG in data science. http://www.datasciencecentral.com/profiles/blogs/hitchhiker-s-guide-to-data-science-machine-learning-r-python

My friend has a bio background and is doing well as a data science consultant. I wouldn't shy away because of a lack of math.

I'm still an amateur, so take this with a grain of salt.
I'd also like to share my strategy for learning data science so far.

I have a math background, which is useful but not required. Knowing linear algebra, differential equations, and some analysis is useful for developing a deeper intuition into how the machine is learning, but not necessary. IMO data science is a lifelong journey, as it can be applied to many fields. It may be useful to learn more math later on as it gets deeper, but surface-level knowledge should suffice.

For linear algebra, I've found the first lecture to be the most useful. http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/ It basically describes how we can translate lines into vectors and find solutions. It may be useful to continue learning, but in the beginning I believe a surface understanding should suffice. If you're looking to build new data analytic tools, understanding the maths in depth is a must. But if your goal is to apply the tools already in existence, you can get by with a brief understanding.
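
As a small illustration of translating lines into vectors, here is a minimal sketch that solves a pair of line equations with Cramer's rule and then checks the same answer as a combination of column vectors. The particular system (2x - y = 0, -x + 2y = 3) and the function name `solve_2x2` are chosen for illustration and are not taken from the comment above.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f using Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("lines are parallel or identical: no unique solution")
    x = Fraction(e * d - b * f, det)
    y = Fraction(a * f - e * c, det)
    return x, y


# Row picture: two lines, 2x - y = 0 and -x + 2y = 3, meet at one point.
x, y = solve_2x2(2, -1, -1, 2, 0, 3)

# Column picture: x * (2, -1) + y * (-1, 2) should rebuild the right-hand side.
combo = (x * 2 + y * -1, x * -1 + y * 2)
print(x, y, combo)
```

The same numbers answer both questions: where the two lines cross, and how much of each column vector is needed to reach the target vector.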

For example, I have a weak statistics background; for the things I don't know, I look them up on Wikipedia, various sites, etc. The goal is not necessarily to learn the material as you would for an exam, but to develop a broader understanding of what the material is and how it relates to machine learning. When I read this material I probably retain only 5-15% of the information, but I read enough to let me move on. Never get stuck on one piece of information for too long. I've found if I get stuck, I can move on and the brain just kind of figures out how it fits into the puzzle.

With your background Andrew Ng's course on coursera https://www.coursera.org/learn/machine-learning should be suitable.

I watch these videos only once on 2x speed. My goal is not to retain the information but to index it. Much of what is useful will be learned by practice; watching the videos on 2x is like skimming a text. It allows you to index, so that you know where to look if you need greater depth in the future. For example, you don't have to memorize the cost function, but it's important to know why the cost function is constructed the way it is, and what its use is.
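
To make the cost function point concrete, here is a minimal sketch of the squared-error cost for one-variable linear regression, in the form commonly taught in introductory machine learning courses. The data points and the names `hypothesis` and `cost` are made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    """Straight-line prediction for a single input x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Half the mean squared error between predictions and targets."""
    m = len(xs)
    squared_errors = (
        (hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)
    )
    return sum(squared_errors) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect = cost(0.0, 2.0, xs, ys)  # the line y = 2x passes through every point
worse = cost(0.0, 1.0, xs, ys)    # the line y = x misses by 1, 2 and 3
print(perfect, worse)
```

Squaring keeps every error non-negative and penalizes large misses more than small ones, so the cost is zero only for a perfect fit. The factor of 1/2 is a convenience that cancels when the derivative is taken for gradient descent.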

I then supplement by reading this: http://neuralnetworksanddeeplearning.com/

and doing these problems http://www.cs.cmu.edu/~tom/10601_fall2012/hws.shtml

This is the most useful resource I've found tbh:
http://www.kdnuggets.com/

I have a weak programming background, so for learning Python I've found this text useful for practice and learning the language: https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994

This text is very basic and useful in general if you don't have a compsci/compeng background, but it doesn't have direct applications for data science. For more of a data focus with Python: https://www.coursera.org/specializations/python . You do not have to pay for any of these courses. Just search for the specific course and enroll, for example, https://www.coursera.org/learn/python-data

That's pretty much where I'm at.
I believe the most important thing is to train our brains to think as the machine would. It's important to utilize our intuition and natural parallel abilities of the brain, as ultimately these are the techniques we are attempting to replicate.

u/lemontheme · 3 points · r/datascience

Fellow NLP'er here! Some of my favorites so far:

u/[deleted] · 1 point · r/datascience

I did now. Any way of getting a sticky/wiki/FAQ of useful materials and common questions for noobs like me? People can vote on and review books, MOOCs, and Kaggle competitions, and say what was best for them. Give us newbies something to get started on so we don't have to flood the sticky. That gives more community support rather than one person's suggestions.

For instance

Applied Predictive Modeling

or the less theory version

Intro to Statistical Learning were two books that helped me with understanding statistical models and had applications and exercises in R

R for Data Science was decent enough and had updated packages for making tidy data.


I found the Data Science Coursera Specialization decently useful, but it didn't go deep enough. It did give me enough of a taste to know this is the direction I want my career to go in. So I'm hesitant to do more MOOCs.

I also don't have experience in Data Science hiring, but have it for consulting/actuarial. I'd be happy to help critique resumes during my free time for all the graduating students.

u/adcqds · 1 point · r/datascience

The pymc3 documentation is a good place to start if you enjoy reading through mini-tutorials: pymc3 docs

Also these books are pretty good: the first is a nice soft introduction to programming with pymc & Bayesian methods, and the second is quite nice too, albeit targeted at R/Stan.

u/uwjames · 5 points · r/datascience

There is a LOT you can learn. It can be very bewildering. Here are some links that should help you get started. There are a lot of other posts in this sub with good tips so you should browse a bit.

https://www.reddit.com/r/datascience/comments/7ou6qq/career_data_science_learning_path/

https://www.dataquest.io/blog/why-sql-is-the-most-important-language-to-learn/

https://www.becomingadatascientist.com/2016/08/13/podcast-episodes-0-3/

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

Sooner or later you'll want to start tackling some projects. That's basically where I am now in the process. I'm at the point where I know enough about Python, Statistics, and SQL to integrate some skills and hopefully do something interesting.

Best advice I can give you is

  1. Keep moving forward even if the task is daunting.

  2. Try to code for at least an hour every day.

u/core_dumpd · 3 points · r/datascience

Jose Portilla on Udemy has some good Python-based courses (and also frequents this subreddit). There are regularly sales or some sort of coupon code available to get any of the courses for $10-$15, so it's very reasonable.

For books:

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662/ref=asap_bc?ie=UTF8 ... it's not out yet, but due any day. You can also get preview access on sites like Safari Online (which would also have all the books below).

https://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X/ref=sr_1_1

For general python:

https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_1

https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_1

No Starch Press, O'Reilly, Apress and Manning generally have pretty good quality publications. I'd usually skip anything from Packt, unless it's specifically received good reviews.

u/Diet-Cocaine · 4 points · r/datascience

Hi /r/datascience. I'm an aspiring data scientist and I'm trying to put together a data science course that's self-taught and can be done on one's time. Any pointers would be appreciated.

Section A: Foundations in Mathematics

u/Blarglephish · 1 point · r/datascience

Awesome list! I'm a software engineer looking to make the jump over to data science, so I'm just getting my feet wet in this world. Many of these books were already on my radar, and I love your summaries to these!

One question: how much is R favored over Python in practical settings? This is just based on my own observation, but it seems to me that R is the preferred language for "pure" data scientists, while Python is more sought after by hiring managers due to its general adaptability to a variety of software and data engineering tasks. I noticed that François Chollet also has a book called Deep Learning with Python, which looks to have a near-identical description to the Deep Learning with R book, and they were released around the same time. I think it's the same material just translated for Python, and I was more interested in going this route. Thoughts?

u/Aidtor · 1 point · r/datascience

If you want to be valuable to companies post graduation you should learn more about programming (design templates, how to write tests, how to go from a paper to code). I recommend this book as a good starting place. Once you're comfortable with how the different methods work, pick up this book.

u/shex1627 · 5 points · r/datascience

Signal and the Noise by Nate Silver

Makes me think about what ML/AI/DS can and cannot do, and what I should be focusing on.

The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists

Good inspirations from many data scientists

u/bbowler86 · 2 points · r/datascience

Yeah, to be honest the only thing that you get with the Enterprise version is some visualization stuff which is meh at best, an Enterprise Scheduler which doesn't even have job dependencies, and support. We had a consultant come in from the Normandy Group before we started using it and do an evaluation between PDI and Informatica based on our needs, and his conclusion was that 95% of everything we needed to do we could do with PDI and we didn't have to pay for it. It hasn't let me down except for some export to Excel stuff, but you really shouldn't be doing reporting with an ETL tool anyway. There are of course performance tradeoffs between using any ETL tool and straight SQL/scripting, but the amount of time you save and being able to reproduce with a tool like Pentaho makes it worth it.

If you are serious about it I would suggest this book. And I mean read it. Bad code makes bad code regardless of whether you script this with Python or Pentaho. It is a bit of a learning curve but worth it in my opinion.

u/revgizmo · 56 points · r/datascience

I can’t recommend highly enough 3 books on good visualizations in business (and everywhere else)

  1. Storytelling with Data: A Data Visualization Guide for Business Professionals buy this, use this

  2. The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures

  3. Show Me the Numbers: Designing Tables and Graphs to Enlighten (the gold-standard usable textbook)

Report format for abstract/methods/etc vs PowerPoint for salespeople varies dramatically from company to company, so I don’t have any good recommendations there. But in the “a picture is worth a thousand words” world, visualizations really matter.

u/leokassio · 1 point · r/datascience

Many thanks for the Kaggle tip and the book!
I don't know the book you suggested, but I'd like to recommend Data Mining, from the authors of Weka, which is a very good book too (http://www.amazon.ca/Data-Mining-Practical-Learning-Techniques/dp/0123748569/ref=sr_1_1?s=books&ie=UTF8&qid=1425389007&sr=1-1&keywords=data+mining).

u/coffeecoffeecoffeee · 1 point · r/datascience

When I was an undergrad I took an MBA data mining class where we used Data Mining for Business Intelligence. I found it great for explaining why you'd use a technique to solve a particular problem. The only issue is it uses XLMiner to teach, which is an Excel add-on sold by the authors. But you should be able to follow it just fine without it.

u/YeahILiftBro · 3 points · r/datascience

Not mathematical, but Storytelling with Data: A Data Visualization Guide for Business Professionals https://www.amazon.com/dp/1119002257/ref=cm_sw_r_cp_apa_i_WhB.AbRPZ14ET

Is a good start to communicating results and really easy to understand. Almost mind blowing how much I was missing previously.

u/ThatOtherBatman · 14 points · r/datascience

I actually just made a post about this book today. It was a good book when it was first released, but doesn't appear to have kept up with the pace that Pandas has developed. There's a second version which was released relatively recently, and even that doesn't mention some not too new features, and does reference some things that are highly outdated.
I've heard good things about Python Cookbook

u/Wafzig · 1 point · r/datascience

This. The book that accompanies these videos link is one of my main go-to's. Very well put together. Great examples.

Another real good book is Practical Data Science with R.

I'm not sure what language the Johns Hopkins Coursera Data Science courses are done in, but I'd imagine either R or Python.

u/_Paxifist_ · 3 points · r/datascience

http://www.amazon.com/Visualize-This-FlowingData-Visualization-Statistics/dp/0470944889

Took a data viz class last year. This was the textbook. Nathan Yau's website FlowingData is a good resource as well. Also check out d3.js for an advanced/flexible technology for visualizing data.

u/ttelbarto · 1 point · r/datascience

Hi, There are so many resources out there I don't know where to start! I would work through some kind of beginner python book (recommendation below). Then maybe try Andrew Ng's Machine Learning Coursera course to get a taste of Machine Learning. Once you have completed both of those I would reassess what you would like to focus on. I will include some other books I would recommend below.

Beginner Python - https://www.amazon.co.uk/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_3?keywords=python+books&qid=1565035502&s=books&sr=1-3

Machine Learning Coursera - https://www.coursera.org/learn/machine-learning

Python Machine Learning - https://www.amazon.co.uk/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1491962291/ref=sr_1_7?crid=2QF98N9Q9GCJ9&keywords=hands+on+data+science&qid=1565035593&s=books&sprefix=hands+on+data+sc%2Cstripbooks%2C183&sr=1-7

https://www.amazon.co.uk/Data-Science-Scratch-Joel-Grus/dp/1492041130/ref=sr_1_1?crid=PJEJNNUBNQ8N&keywords=data+science+from+scratch&qid=1565035617&s=books&sprefix=data+science+from+s%2Cstripbooks%2C140&sr=1-1

Statistics (intro) - https://www.amazon.co.uk/Naked-Statistics-Stripping-Dread-Data/dp/039334777X/ref=sr_1_1?keywords=naked+statistics&qid=1565035650&s=books&sr=1-1

More stats (I haven't read this but it gets recommended) - https://www.amazon.co.uk/Think-Stats-Allen-B-Downey/dp/1491907339/ref=sr_1_1?keywords=think+stats&qid=1565035674&s=books&sr=1-1

u/SnOrfys · 3 points · r/datascience

I'm going through Practical Data Science With R right now, and it's fantastic. It's basically an end-to-end walk-through of how to prepare, plan, perform, test and deliver a data science project.

Though this book does go through some of the major modelling tasks, with examples in R using publicly available datasets, it's not a stats/ML text.

u/flipstables · 1 point · r/datascience

Highly recommend this book:

http://www.amazon.com/Pentaho-Kettle-Solutions-Building-Integration/dp/0470635177

I believe Pentaho has some free books out there, but this was already on the shelf at a company so I picked it up.

u/Watchestelltime · 1 point · r/datascience

Check out this book: https://www.amazon.com/Marketing-Research-Analytics-Use/dp/3319144359

I think it does a nice job of explaining the concepts using examples.

u/datadude · 6 points · r/datascience

I have an excellent statistics text book that I am using to learn stats: Discovering Statistics Using R by Andy Field. My approach is to do the exercise in R first, then try to reproduce the same result in Python. It's slow going, but it's a real learning experience.
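
A quick sketch of what reproducing an R result in Python can look like, using only the standard library. The sample values and the comparison mean of 5 are invented for illustration; the R calls in the comments are the usual base R equivalents.

```python
import math
import statistics

sample = [4.1, 5.0, 5.6, 6.2, 7.1]

mean = statistics.mean(sample)    # R: mean(sample)
sd = statistics.stdev(sample)     # R: sd(sample); both divide by n - 1
se = sd / math.sqrt(len(sample))  # standard error of the mean
t_stat = (mean - 5.0) / se        # statistic reported by R's t.test(sample, mu = 5)

print(round(mean, 3), round(sd, 3), round(t_stat, 3))
```

Checking small details like the n - 1 denominator in both languages is a large part of what makes the exercise a learning experience.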

u/666f6f626172 · 7 points · r/datascience

I doubt any courses you take would spend more than a day on the basics of a language. That's something you need to learn on your own. What's your background like? It sounds like you don't have much programming experience, so perhaps start with this. Then maybe this for learning numpy, pandas, and matplotlib.

EDIT: Didn't realize you were still in high school. I don't believe there's a specific data science undergrad program anywhere, but any STEM undergrad program will probably include an introductory programming course.

u/wouldeye · 4 points · r/datascience

Field's "introduction to statistics using R" is the best book for my money.

EDIT: sorry I got the title wrong:

https://www.amazon.com/Discovering-Statistics-Using-Andy-Field/dp/1446200469

u/killingRadio · 4 points · r/datascience

Get a copy of Visualize This. Visualize This: The FlowingData Guide to Design, Visualization, and Statistics https://www.amazon.com/dp/0470944889/ref=cm_sw_r_cp_api_jA9KzbF2FGX6M