Best statistical software books according to redditors

We found 115 Reddit comments discussing the best statistical software books. We ranked the 54 resulting products by number of redditors who mentioned them. Here are the top 20.


Top Reddit comments about Mathematical & Statistical Software:

u/LittleOlaf · 32 pointsr/humblebundles

Maybe this table can help some of you gauge how worthwhile the bundle is.

| Tier | Title | Kindle Price ($) | Amazon Avg | Amazon # of Ratings | Goodreads Avg | Goodreads # of Ratings |
|------|-------|------------------|------------|---------------------|---------------|------------------------|
| 1 | Painting with Numbers: Presenting Financials and Other Numbers So People Will Understand You | 25.99 | 3.9 | 20 | 4.05 | 40 |
| 1 | Presenting Data: How to Communicate Your Message Effectively | 26.99 | 2.9 | 4 | 4.25 | 8 |
| 1 | Stories that Move Mountains: Storytelling and Visual Design for Persuasive Presentations | - | 4.0 | 13 | 3.84 | 56 |
| 1 | Storytelling with Data: A Data Visualization Guide for Business Professionals (Excerpt) | 25.99 | 4.6 | 281 | 4.37 | 1175 |
| 2 | 101 Design Methods: A Structured Approach for Driving Innovation in Your Organization | 22.99 | 4.2 | 70 | 3.98 | 390 |
| 2 | Cool Infographics: Effective Communication with Data Visualization and Design | 25.99 | 4.3 | 39 | 3.90 | 173 |
| 2 | The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions | 31.71 | 3.8 | 43 | 3.03 | 35 |
| 2 | Visualize This: The FlowingData Guide to Design, Visualization, and Statistics | 25.99 | 3.9 | 83 | 3.88 | 988 |
| 3 | Data Points: Visualization That Means Something | 25.99 | 3.9 | 34 | 3.87 | 362 |
| 3 | Infographics: The Power of Visual Storytelling | 19.99 | 4.0 | 38 | 3.79 | 221 |
| 3 | Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data | 40.99 | 4.2 | 3 | 3.59 | 14 |
| 3 | Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software, 2nd Edition | 39.99 | 4.0 | 66 | 4.14 | 111 |
| 3 | Visualizing Financial Data | 36.99 | 4.7 | 4 | 3.83 | 6 |

u/SharpSightLabs · 13 pointsr/Python

Here's what I'd recommend.


GETTING STARTED WITH DATA SCIENCE


If you're interested in learning data science I'd suggest the following:
 

Tools

  1. I’d recommend learning R before Python (although Python is an exceptional tool). Here are a few reasons.
    1. Many of the hot tech companies in SF, the Valley, and NYC like Google, Apple, FB, LinkedIn, and Twitter are using R for much of their data science (not all of it, but a lot).
    2. R is the most common programming language among data scientists. O’Reilly Media just released their 2014 Data Science Salary Survey. I’ll caveat, though, that Python came in a close second. Which leads me to the third reason:
    3. R has 2 packages that dramatically streamline the DS workflow:
      • dplyr for data manipulation
      • ggplot2 for data visualization

        Learning these has several benefits: they streamline your workflow. They speed up your learning process, since they are very easy to use. And perhaps most importantly, they really teach you how to think about analyzing data. ggplot2 has a deep underlying structure to its syntax, based on the Grammar of Graphics theoretical framework. I won’t go into that too much, but suffice it to say, when you learn the ggplot2 syntax, you’re actually learning how to think about data visualization in a very deep way. You’ll eventually understand how to create complex visualizations without much effort.
         

        Skill Areas
        My recommendations are:

  2. Learn basic data visualizations first. Start with the essential plots:
    • the scatter plot
    • the bar chart
    • the line chart
      (But, again I recommend learning these in R’s ggplot2.) The reason I recommend these is
      1. They are, hands down, the most common plots. For entry-level jobs, you’ll use these every day.
      2. They are “foundational” in the sense that when you learn about the underlying structure of these plots, it begins to open up the world of complex data visualizations.
        As with any discipline, you need to learn the foundations first; this will dramatically speed your progress in the intermediate to advanced stages.
      3. You’ll need these plots as “data exploration” tools. Whether you’re finding insights for your business partners or investigating the results of a sophisticated ML algorithm, you’ll likely be exploring your data visually.
      4. These plots are your best “data communication” tools. As noted elsewhere in this thread, C-level execs need you to translate your data-driven insights into simple language that can be understood in a 1-hour meeting. Communicating visually with the basic plots will be your best method for communicating to a non-technical audience. Communicating to non-technical audiences is a critical (and rare) auxiliary skill, so if you can learn to do this you will be very highly valued by management.
        I usually suggest learning these with dummy data (for simplicity) but if you have a simple .csv file, that should work too.
  3. Learn data management second (AKA, data wrangling, data munging)
    After you learn data visualization, I suggest that you “back into” data management. For this, you should find a dataset and learn to reshape it.
    The core data management skills:
    • subsetting (filtering out rows)
    • selecting columns
    • sorting
    • adding variables
    • aggregating
    • joining
      You can start learning these here. Again, I recommend learning these in R’s dplyr because dplyr makes these tasks very straightforward. It also teaches you how to think about data wrangling in terms of workflow: the “chaining operator” in dplyr helps you wire these commands together in a way that really matches the analytics workflow. dplyr makes it seamless.
  4. Learn machine learning last.
    ML is sort of like the “data science 301” course vs. the 102 and 103 levels of the data-vis and data manipulation stuff I outlined above.
    Here, I’ll just give book recos:
  5. Nathan Yau of FlowingData is great. His blog shows excellent data visualization examples. I also highly recommend his books, in particular Data Points, which will help you learn how to think about visualization.
  6. The book ggplot2 by Hadley Wickham. This is a great resource (though a little outdated, as Hadley has since updated the ggplot2 package).
  7. I also really like Randal Olson’s work (AKA, /u/rhiever). He creates some great data visualizations that can serve as inspiration as you start learning.
     

    TL;DR

    I'd recommend learning R for data science before Python. Learn data visualization first (with R's ggplot2), using simple data or dummy data. Then find a more complicated dataset. Learn data manipulation second (with R's dplyr), and practice data manipulation on your more complex data. Learn machine learning last.
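For readers who end up in Python instead, the dplyr verbs above map almost one-to-one onto pandas method chains; a minimal sketch (the data and column names are invented purely for illustration):

```python
import pandas as pd

# Toy data standing in for a real dataset
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "units":  [10, 20, 30, 40],
    "price":  [2.0, 2.5, 2.0, 2.5],
})
regions = pd.DataFrame({"region": ["east", "west"],
                        "manager": ["Ann", "Bob"]})

result = (
    sales
    .loc[sales["units"] > 10]                 # subsetting rows (dplyr: filter)
    .loc[:, ["region", "units"]]              # selecting columns (dplyr: select)
    .sort_values("units")                     # sorting (dplyr: arrange)
    .assign(double=lambda d: d.units * 2)     # adding variables (dplyr: mutate)
    .groupby("region", as_index=False).sum()  # aggregating (dplyr: group_by + summarise)
    .merge(regions, on="region")              # joining (dplyr: left_join)
)
```

Each chained method mirrors one dplyr verb, and the chain itself plays the role of dplyr's chaining operator.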

u/w3woody · 12 pointsr/computerscience

Read about the topic.

Practice.

Set yourself little challenges that you work on, or learn a new language/platform/environment. If you're just starting, try different easy programming challenges you find on the 'net. If you've been doing this a while, do something more sophisticated.

The challenges I've set for myself in the recent past include writing a LISP interpreter in C, building a recursive descent parser for a simple language, and implementing different algorithms I've encountered in books like Numerical Recipes and Introduction to Algorithms.

(Yes, I know; you can download libraries that do these things. But there is something to be gained by implementing quicksort in code from the description of the algorithm.)
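To make that concrete, here is quicksort written directly from the usual textbook description (partition around a pivot, recurse on each side); a sketch in Python for brevity rather than the C you might write for practice:

```python
def quicksort(xs):
    """Sort a list by recursively partitioning around a pivot."""
    if len(xs) <= 1:
        return xs
    pivot = xs[len(xs) // 2]
    left   = [x for x in xs if x < pivot]    # elements smaller than the pivot
    middle = [x for x in xs if x == pivot]   # elements equal to the pivot
    right  = [x for x in xs if x > pivot]    # elements larger than the pivot
    return quicksort(left) + middle + quicksort(right)
```

Writing even this much by hand forces you to confront the details (base case, duplicates, pivot choice) that a library call hides.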

The trick is to find interesting things and write code which implements them. Generally you won't become a great programmer just by working on the problems you find at work--most programming jobs nowadays consist of fixing code (a different skill from writing code) and involve implementing the same design patterns for the same kind of code over and over again.

----

When I have free time I cast about for interesting new things to learn. The last big task I set for myself was to learn how to write code for the new iPhone when it came out back in 2008. I had no idea that this would change the course of my career for the next 9 years.

u/incogsteveo · 10 pointsr/psychologystudents

I've always had a knack for the stuff. When I TAed graduate classes I found this book to be helpful in explaining some advanced statistical concepts in plain language. If you are specifically learning to use the SPSS program, this book is by far the best. Good luck!

u/COOLSerdash · 9 pointsr/statistics
u/Sarcuss · 6 pointsr/statistics

I would say: Go for it as long as you are interested in the job :)

For study references for remembering R and Statistics, I think all you would need would be:

For R and data cleaning and such: http://r4ds.had.co.nz/. For basic applied statistics with R, probably Dalgaard, plus something like OpenIntro Statistics or Freedman for a review of stats.

u/CoolCole · 6 pointsr/tableau

Here's an "Intro to Tableau" Evernote link that has the detail below, but this is what I've put together for our teams when new folks join and want to know more about it.

http://www.evernote.com/l/AKBV30_85-ZEFbF0lNaDxgSMuG9Mq0xpmUM/

What is Tableau?

u/ThisIsMyOkCAccount · 5 pointsr/mathbooks

The book Ideals, Varieties and Algorithms by Cox, Little and O'Shea is a very good undergraduate level algebraic geometry book. It has the benefit of teaching you the commutative algebra you need along the way instead of assuming you know it.

I'm not really aware of any algebraic topology books I'd consider undergraduate, but most of them are accessible to first year grad students anyway, which isn't too far away from senior undergrad. Some of my favorite sources for that are Munkres' book and Fulton's Book.

For knot theory, I haven't really studied it myself, but I've heard that The Knot Book is quite good and quite accessible.

u/Flamdrags5 · 4 pointsr/statistics

Applied Predictive Modeling by Kuhn and Johnson

Gives good interpretations of different approaches as well as listing the strengths, weaknesses, and ways to mitigate the weaknesses of those approaches. If you're an R user, this book is an excellent reference.

u/SemaphoreBingo · 4 pointsr/math

You've almost got a quadratic form: https://en.wikipedia.org/wiki/Quadratic_form maybe you can add a dummy variable to homogenize the linear terms
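Spelled out, the homogenization trick being suggested: given a polynomial with quadratic, linear, and constant parts, introduce a dummy variable $t$ so that every term has degree two (the symbols here are generic placeholders, not from the thread):

$$q(x) = x^\top A x + b^\top x + c \quad\longrightarrow\quad Q(x, t) = x^\top A x + t\, b^\top x + c\, t^2$$

Then $Q$ is a genuine quadratic form in the variables $(x, t)$, and the original polynomial is recovered as $q(x) = Q(x, 1)$.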

That aside, (computational) algebraic geometry has a lot to say about this problem, in particular you might want to start here:
https://www.amazon.com/Ideals-Varieties-Algorithms-Computational-Undergraduate/dp/0387356509

u/iwontmakeyoursammich · 4 pointsr/statistics
u/7thSigma · 4 pointsr/Physics

Numerical Recipes is a veritable catalogue of different methods. Depending on what field you're interested in though there is surely a text with a title along the lines of 'Computational methods for [insert field] physics'

u/vbukkala · 4 pointsr/datascience

There is a second edition (2018) of APM.
Check it out here:
https://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485

u/Luonnon · 3 pointsr/rstats

Quick and dirty answer: speaking very broadly, random forests -- found in the "randomForest" package -- tend to win battle-of-the-algorithms type studies. If you just want to play with a single model, I'd recommend starting with that and looking at the help for it.

Longer and better answer: Your best bet to answering all these questions and getting a good handle on data mining/predictive analytics is this book: Applied Predictive Modeling. The book references the "caret" package quite a bit, since the package's author is the same person. With it, you can train a lot of different types of models for regression or classification optimizing for accuracy, RMSE, ROC, etc. It provides a standard API for playing with models and makes your life much, much easier. It has its own website here.

u/phaeries · 3 pointsr/AskEngineers

Not sure what your skill level is, or the application you're using MATLAB for, but here are a few resources:

u/DataWave47 · 3 pointsr/datascience

You're welcome. Thanks for providing some additional detail. This helps. I think if you read up on the CRISP-DM and use that framework to walk your way through some of these challenges it will be very beneficial to you. I'd recommend giving this document a read when you have the time. I think that if you show them that you are comfortable with these guidelines and know how to work your way through it to solve a problem it will go a long way. Model selection can be a bit tricky depending on the situation but I think most practitioners have a favorite model that they go to. Sounds like you're already familiar with Wolpert's "No Free Lunch Theorem" suggesting to try a wide variety of techniques. Personally, this is where I'd start digging deeper into tuning parameters (cross-validation, etc.) to help with that decision. Ultimately though, it's important to have a firm understanding of the strengths/weaknesses of the different models and their use cases so you can make an informed selection decision. Kuhn and Johnson's book Applied Predictive Modeling will be a good read to help you prepare.

u/Anarcho-Totalitarian · 3 pointsr/math

There are numerical methods that make essential use of randomness, such as Monte Carlo methods or simulated annealing.

And numerical methods can be applied to statistics problems. A nonlinear model is probably going to require a numerical scheme to implement. The book Numerical Recipes, which is all about actually implementing numerical methods on a computer, has four chapters covering randomness and statistics.
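For a taste of what those chapters cover, the classic Monte Carlo starter: estimate pi by sampling random points in the unit square and counting how many land inside the quarter circle. A sketch in Python (illustrative only, not taken from Numerical Recipes):

```python
import random

def estimate_pi(n_samples, seed=42):
    """Estimate pi from the fraction of uniform points in the unit
    square that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples
```

The error shrinks roughly like 1/sqrt(n), which is the hallmark of Monte Carlo methods.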

> My plan at present is to do a PhD in numerical PDEs and then go into industry in scientific computing as a researcher or developer.

I'd make sure that whoever it is you want to work with is heavy on the computation side. Even better if they work with supercomputers. I say this because even a topic like numerical PDEs can go very deep into theory (consider this paper). Industry likes computer skills.

u/SoSweetAndTasty · 3 pointsr/AskPhysics

To go with the other comments I also recommend picking up a book like numerical recipes which describes in detail many well tested algorithms.

u/gwippold · 3 pointsr/statistics

You could read the IBM manual OR you could buy this much more user friendly book:

http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182

u/icybrain · 3 pointsr/Rlanguage

It sounds like you're looking for time series material, but Applied Predictive Modeling may be of interest to you. For time series and R specifically, this text seems well-reviewed.

u/TheDataScientist · 3 pointsr/statistics

Many thanks. I can speak more on the topic, but you're wanting to learn a lot about Machine learning (well lasso and ridge regression technically count as statistics, but point stands).

If you learn best via online courses, I'd suggest starting with Andrew Ng's Machine Learning Course

If you learn best through reading, I'd recommend two books: Hastie, Tibshirani, & Friedman - Elements of Statistical Learning
and Kuhn & Johnson - Applied Predictive Modeling

Obviously, I'd also recommend my blog once I learn my audience.

u/Adamworks · 3 pointsr/statistics

Discovering Statistics using SPSS
This is the type of book you want. It has good examples and straightforward commentary.

u/[deleted] · 3 pointsr/programming

The recommended book for my computational biology degree is Dalgaard, which I've found very handy. It's very clear and accessible.

For the gritty details, you'll have the built-in R help which is excellent.

Edit: He works on the R project, as you can see here

u/sjgw137 · 3 pointsr/statistics

I really like this book:
http://www.amazon.co.uk/Discovering-Statistics-using-IBM-SPSS/dp/1446249182/ref=as_li_tf_sw?&linkCode=wsw&tag=statihell-21

Fun to read, easy to understand, entertaining. What stats book is entertaining???

u/grandzooby · 3 pointsr/javascript

I ran across this presentation last night by Max Kuhn, one of the authors of Applied Predictive Modeling (http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485).

https://static.squarespace.com/static/51156277e4b0b8b2ffe11c00/t/513e0b97e4b0df53689513be/1363020695985/KuhnENAR.pdf

It's a really great discussion of how they did the joint authoring of the book and the tools they used - and what they would do differently.

u/indigo-bunting · 2 pointsr/ecology

I'm taking a stats class right now and we're just working our way through Ben Bolker's book which is decent and goes through a lot of stats theory. It's completely R based though, which may or may not be helpful to you.

http://www.amazon.com/Ecological-Models-Data-Benjamin-Bolker/dp/0691125228

u/Turtleyflurida · 2 pointsr/sas

The structure of the claims might vary depending on the source but a good source of information are the videos you can find at https://www.resdac.org/workshops/intro-medicare. Check out the rest of the resdac website as well. This book is pretty good but might not match your claims exactly if you work with a contractor as opposed to a research organization.

There are lots of nuances using Medicare claims data that you will have to learn. Hopefully you have someone with experience to guide you. The learning curve is rather steep but not insurmountable. If you come across specific questions please post them here.

u/EorEquis · 2 pointsr/astrophotography

> I have struggled with noise reduction in PixInsight and even when I used Photoshop.

I think most of us (I KNOW I do) have the same difficulty, and it boils down to...well, quite frankly, to wanting our amateur images to look like Hubble results.

Harsh though he can sometimes be, I think Juan Conejero of the PixInsight team said it best :

> A mistake that we see too often these days is trying to get an extremely smooth background without the required data support. This usually leads to "plastic looking" images, background "blobs", incongruent or unbelievable results, and similar artifacts. Paraphrasing one of our reference books, this is like trying to substantiate a questionable hypothesis with marginal data. Observational data are uncertain by nature—that is why they have noise, and why we need noise reduction—, so please don't try to sell the idea that your data are pristine. We can only reduce or dissimulate the noise up to certain limits, but trying to remove it completely is a conceptual error: If you want less noise in your images, then what you need is to gather more signal.

Admittedly...agreeing with Juan doesn't mean I'll stop trying to "prove him wrong" anyway. I'll still mash away at TGVDenoise until the background looks like lumpy oatmeal and call it "noise reduction"...but I'll feel 2 minutes of shame when /u/spastrophoto calls me out on it. ;)

Having said that, I think the article linked above and this comparison probably did more for my "understanding" of PI's NR tools than any others I've read....for whatever that's worth. :)

> Glad to see a fellow hockey player on here...not many astrophotographer/hockey player hybrids out there!

Thought the username looked vaguely familiar. :) It IS an interesting combination, ain't it?

That's one more added to the list now... /u/themongoose85 is a hockey player too.

u/ohnoplus · 2 pointsr/statistics

For statistical models in ecology, I reccommend Ben Bolker's Ecological Models and Data in R.

u/elimeny · 2 pointsr/funny

If you liked that... you'd also love "Discovering Statistics using SPSS" by Andy Field (the second title is "And Sex And Drugs And Rock N' Roll")

http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182

u/0111001101110000 · 2 pointsr/datascience

I think you are looking for a book to show you how to do real Statistics and Probability on real data.

Obviously this requires a computer and some programming language. I think R is an excellent place to practice these skills and concepts. I have not read the R Cookbook, but I thought Introductory Statistics with R was good. It should be a good resource for practicing statistical programming. I don't see a free version, but even the table of contents is a good list of items to learn.

u/nauree · 2 pointsr/EngineeringStudents

Just got this book. Its pretty great so far. Very straight forward.

u/sazken · 2 pointsr/GetStudying

Yo, I'm not getting that image, but at a base level I can tell you this -

  1. I don't know if you know any R or Python, but there are good NLP (Natural Language Processing) libraries available for both

    Here's a good book for Python: http://www.nltk.org/book/

    A link to some more: http://nlp.stanford.edu/~manning/courses/DigitalHumanities/DH2011-Manning.pdf

    And for R, there's http://www.springer.com/us/book/9783319207018
    and
    https://www.amazon.com/Analysis-Students-Literature-Quantitative-Humanities-ebook/dp/B00PUM0DAA/ref=sr_1_9?ie=UTF8&qid=1483316118&sr=8-9&keywords=humanities+r

    There's also this https://www.amazon.com/Mining-Social-Web-Facebook-LinkedIn/dp/1449367615/ref=asap_bc?ie=UTF8 for web scraping with Python

    I know the R context better, and using R, you'd want to do something like this:

  2. Scrape a bunch of sites using the R library 'rvest'
  3. Put everything into a 'Corpus' using the 'tm' library
  4. Use some form of clustering (k-nearest neighbor, LDA, or Structural Topic Model using the libraries 'knn', 'lda', or 'stm' respectively) to draw out trends in the data

    And that's that!
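As a rough illustration of step 3, the corpus-building stage boils down to tokenizing documents and counting terms; a bare-bones Python sketch using only the standard library (the documents are invented, and real work would use tm, quanteda, or scikit-learn):

```python
import re
from collections import Counter

# Toy "scraped" documents standing in for real pages
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs and rain",
]

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Per-document term counts: the core of what tm's Corpus /
# DocumentTermMatrix gives you
doc_terms = [Counter(tokenize(d)) for d in docs]

# Corpus-wide term frequencies, the raw material for any clustering step
corpus_counts = Counter()
for c in doc_terms:
    corpus_counts.update(c)
```

The clustering step (kNN, LDA, STM) then operates on these document-term counts.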
u/spring_m · 2 pointsr/datascience

Also check out Applied Predictive Modeling - it's in a way the next book to read after ISLR - it goes a bit more in depth about good practices, plusses and minuses of different models, feature creation/extraction.

u/EmperorsNewClothes · 2 pointsr/Physics

In addition, this book will save your life. With a good programming base, it's almost like cheating.

u/ffualo · 2 pointsr/askscience

It's very clear for a book on mathematical statistics. It also considers the Bayesian (and even Empirical Bayesian) approach. I'm sometimes shocked at what it covers and how well it covers it in so few pages. For example, there's a nice section on the EM algorithm, which most books in the same class don't cover (unless they're huge).

Edit: I should mention... if you're a scientist looking for how statistics works this is the book for you. If you want to learn a ton about regression/ANOVA, time-series, covariance structures, blah, blah, blah, this book is not for you. A great introduction (for all scientists) that covers this stuff quickly and effectively (as well as MLE, optimization, and R) is Ecological Models and Data with R.

Edit 2: If you want applied linear models, Applied Linear Statistical Models is good, but doesn't use R. Luckily formula objects and delayed evaluation give R some beautiful expressivity here.

u/7buergen · 2 pointsr/IRstudies

Sure, the basics, but for advanced information gathering consider using SPSS. Andy Field gives a good introduction if you're interested.

u/DrewEugene17 · 2 pointsr/italy
u/tobbern · 1 pointr/norge

Google Forms is good, and your responses will be saved in a sheet that can be loaded into SPSS. SPSS reads .sav, .csv and Excel variants. Here is a video where Andy Field explains how to do it:

https://www.youtube.com/watch?v=nchjj4XzIWc

The only limitation of Google Forms you need to worry about is if you will have more than 400,000 respondents and over 256 questions. That is the limit of the dataset that will be created in Google Spreadsheets. (Those are not numbers to be underestimated, by the way.)

Because Google Forms is a free alternative and I have never seen it fail due to too much traffic, I recommend it very strongly. I use SurveyMonkey and FluidSurveys at work, and they only have advantages if you need more than half a million respondents in a short period (e.g. a week or a month). They also cost money, so I recommend Google Forms.

u/callinthekettleblack · 1 pointr/dataisbeautiful

Yep, humans perceive differences in length much better than differences in angle. Yau's book Data Points talks about this extensively with examples.

u/baialeph1 · 1 pointr/math

Not sure if this is exactly what you're looking for as it was written by physicists, but this is considered the bible of numerical methods in my field (computational physics): http://www.amazon.com/Numerical-Recipes-3rd-Edition-Scientific/dp/0521880688

u/workpsy · 1 pointr/IOPsychology

I highly recommend Andy Field's book Discovering Statistics Using IBM SPSS Statistics. He has a gift for simplifying complex statistical concepts. Additionally, you'll be learning to use SPSS, which is guaranteed to be useful in your graduate studies and career. Alternatively, he offers the same book for other statistical softwares.

u/Sampo · 1 pointr/MachineLearning

Apparently I have to go to the Amazon page to find a table of contents(?)

EDIT: ok they added a TOC.

u/alexandr_msu · 1 pointr/sas_ru

My answer would be: if there are strong arguments for and strong arguments against, it is best to combine SAS and R. This is very easy to do: SAS provides an API for calling R functions through SAS/IML (example). You can also use the %PROC_R macro. For SAS programmers unfamiliar with R there is an excellent book: "SAS and R: Data Management, Statistical Analysis, and Graphics"

u/vmsmith · 1 pointr/rstats

I didn't know about MLR until this post. So without having spent any time with it whatsoever, I would only say that one of the nice things about the caret package is that you can also leverage Kuhn and Johnson's book, Applied Predictive Modeling, as well as YouTube videos of Max Kuhn discussing caret.

u/ObnoxiousFactczecher · 1 pointr/Amd
u/fatangaboo · 1 pointr/AskEngineers

Yes there is software.

The first thing I would suggest is to try the Microsoft Excel "Solver". It is actually a wonderful piece of highly polished numerical analysis code, buried inside a stinky, steaming turd called Excel Spreadsheets. You and Google, working together, can find hundreds of tutorials about this, including

(link 1)

(link 2)

(link 3)

(link 4)

If you prefer to code up the algorithm(s) yourself, so you can incorporate them in other bigger software you've got, I suggest purchasing the encyclopaedic textbook NUMERICAL RECIPES. This tour-de-force textbook / reference book has an entire chapter devoted to optimization, including source code for several different algorithms. I recommend Nelder-Mead "amoeba" but other people recommend other code.
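If you do go the library route first, the Nelder-Mead "amoeba" is also available off the shelf; a minimal sketch using SciPy (the quadratic objective is a stand-in for whatever you're actually optimizing, and SciPy is assumed to be installed):

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    """Toy objective function with its minimum at (1, 2)."""
    x, y = p
    return (x - 1.0) ** 2 + (y - 2.0) ** 2

# Nelder-Mead needs only function values, no derivatives
result = minimize(objective, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
```

Because it is derivative-free, Nelder-Mead is a reasonable default when the objective is noisy or you can't differentiate it.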

u/krypton86 · 1 pointr/ECE

Get this book and start reading it asap.

u/shaggorama · 1 pointr/datascience

You'll probably find this article and its references interesting: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

I also strongly recommend this book: http://www.amazon.com/Guerrilla-Analytics-Practical-Approach-Working/dp/0128002182

If you're looking something more technical about actually doing analyses, this is book is very accessible: http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485

If you use R, this book is really great: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/

u/SomeOne10113 · 1 pointr/EngineeringStudents

We've been using this: http://www.amazon.com/MATLAB-Engineers-Edition-Holly-Moore/dp/0133485978

When I do use it, it has pretty helpful explanations and examples. It's also pretty easy to skim, which is nice.

u/coconutcrab · 1 pointr/sociology

I'm late to the game on this one, but learning these programs cannot be stressed enough. Different institutions have different preferences for programs, so you may hear about MATLAB, SPSS, STATA, R, etc etc. Pick one and go for it. My personal suggestion is to begin with SPSS. It's very user friendly and a great kickoff program for getting your feet wet.

Your school may have stats classes where you'll learn SPSS or a program like it, but if you want to go at it on your own for a headstart, I suggest two things: the first, YouTube, FOREVER. There are a ton of helpful videos which take you step by step through the processes of using almost any program you can think of, and the best part is that YouTube is free.

The second is that it's never a bad idea to pick up a great book, a go to reference guide if you need it. Discovering Statistics Using SPSS is written by a great author who (shockingly) does not make the subject matter seem dry. I own the R equivalent and am looking to pick up the SPSS version soon because I liked it so much.

Costs of textbooks/stats reference books are high, I know. But for my preference, nothing beats having that go to reference item on your shelf. If you decide to start shopping around, you can ask around in /r/booksuggestions or /r/asksocialscience and see what others use to find the best book for you.

u/SubDes · 1 pointr/statistics

If you want a bit of both worlds, the book SAS and R: Data Management, Statistical Analysis, and Graphics is also worth a look.
It might not be the best general introduction, because it covers both at the same time, but it is very to the point on how to do most normal data manipulations and statistical calculations.

u/comeUndon · 1 pointr/tableau

This will be your best bet. It should just ship with a license. :)

http://www.amazon.com/Tableau-Your-Data-Analysis-Software/dp/1118612043

u/mikeblas · 1 pointr/skeptic

Bayesian Methods in the Search for MH370 is a really great book on the analysis and critical thinking that one team did to help narrow down the search. It's an engaging read (for a technical book).

u/Niemand262 · 1 pointr/AskStatistics

I'm a graduate student who teaches an undergraduate statistics course, and I'm going to be brutally honest with you.


Because you have expressed a lack of understanding about what standard deviation is, I don't anticipate that you will be able to understand the advice that you receive here. I teach statistics at an undergraduate level. I teach standard deviations during week 1, and I teach ANOVA in the final 2 weeks. So, you are at least a full undergraduate course away from understanding the statistics you will need for this.

Honestly, you're probably in over your head on this and a few days spent on reddit aren't going to give you what you're looking for. Even if you're given the answers here, you'll need the statistical knowledge to understand what the answers actually mean about your data.


You have run an experiment, but the data analysis you want to do requires expertise. It's a LOT more nuanced and complex than you probably realized from the outset.


Some quick issues that I see here at a glance...

Mashing together different variables can make a real mess of the data, so the scores you have might not even be useful if you were to attempt to run an ANOVA (the test you would need to use) on them.

With what you have shown us in the post, we are unable to tell if group b's scores are higher because of the message they received or whether they just happen to be higher due to random chance. Without the complete "unmashed" dataset we won't be able to say which of the "mashed" measurements are driving the effect.
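To make that concrete, the ANOVA in question compares between-group variation to within-group variation via an F statistic; a hand-rolled sketch in Python with made-up scores (not the poster's data):

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: the ratio of between-group
    to within-group mean squares."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of scores around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Two made-up groups of scores
f_stat = one_way_anova_f([1, 2, 3], [4, 5, 6])
```

If F is large relative to the F distribution with (df_between, df_within) degrees of freedom, the group difference is unlikely to be chance; making that judgment properly is exactly why the full, unmashed dataset matters.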


I have worked with honors students that I wouldn't trust with the analysis you need. Because you are doing this for work, you really should consider contacting a professional. You can probably hire a graduate student to do the analysis for a few hundred dollars as a side job.


If you really want to learn how to do it yourself, I would encourage you to check out Andy Field's textbook. He also has a YouTube channel with lectures, but they aren't enough to teach you everything you need to understand. Chapter 11 covers ANOVA, but you'll need to work your way up to it.

u/vasili111 · 1 pointr/statistics

The author of the R survey package has also written a book with a really good explanation of how to use the package: https://www.amazon.com/Complex-Surveys-Analysis-Survey-Methodology-ebook/dp/B005PS6C9A

I had no experience with survey analysis before and that book helped me a lot.

u/sneddo_trainer · 1 pointr/chemistry

Personally I make a distinction between scripting and programming that doesn't really exist but highlights the differences, I guess. I consider myself to be scripting if I am connecting programs together by manipulating input and output data. There is lots of regular-expression pain and trial and error involved in this, and I have hated it since my first day of research, when I had to write a Perl script to extract the energies from thousands of Gaussian runs. I appreciate it, but I despise it in equal measure.

Programming I love, and I consider this to be implementing a solution to a physical problem in a stricter language and trying to optimise the solution. I've done a lot of this in Fortran and Java (I much prefer Java after a steep learning curve from procedural to OOP). I love the initial math and understanding, the planning, the implementing, and seeing the results. Debugging is as much of a pain as scripting, but I've found the more code I write the fewer stupid mistakes I make, and I know what to look for given certain error messages. If I could just do scientific programming I would, but sadly that's not realistic. When you get to do it, it's great though.
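The output-scraping chore described above might look like the following sketch, in Python rather than Perl for brevity. The exact "SCF Done" line format is an assumption about the Gaussian version, so the regular expression may need adjusting.

```python
# Pull SCF energies out of Gaussian log text with a regular expression.
import re

SCF_RE = re.compile(r"SCF Done:\s+E\([^)]+\)\s*=\s*(-?\d+\.\d+)")

def extract_energies(log_text: str) -> list[float]:
    """Return every SCF energy (in hartree) found in a Gaussian log."""
    return [float(m.group(1)) for m in SCF_RE.finditer(log_text)]

# Two hypothetical lines from an optimization run:
sample = """\
 SCF Done:  E(RHF) =  -76.0098706218     A.U. after   10 cycles
 ...optimization step...
 SCF Done:  E(RHF) =  -76.0107465230     A.U. after    8 cycles
"""
print(extract_energies(sample))
```

In practice the same function would be mapped over thousands of log files, which is exactly the glue work the comment calls scripting.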

The maths for comp chem is very similar to the maths used by all the physical sciences and engineering. My go to reference is Arfken but there are others out there. The table of contents at least will give you a good idea of appropriate topics. Your university library will definitely have a selection of lower-level books with more detail that you can build from. I find for learning maths it's best to get every book available and decide which one suits you best. It can be very personal and when you find a book by someone who thinks about the concepts similarly to you it is so much easier.
For learning programming, there are usually tutorials online that will suffice. I have used O'Reilly books with good results. I'd recommend that you follow the tutorials as if you need all of the functionality, even when you know you won't. Otherwise you get holes in your knowledge that can be hard to close later on. It is good supplementary exercise to find a method in a comp chem book, then try to implement it (using google when you get stuck). My favourite algorithms book is Numerical Recipes - there are older fortran versions out there too. It contains a huge amount of detailed practical information and is geared directly at computational science. It has good explanations of math concepts too.

For the actual chemistry, I learned a lot from Jensen's book and Leach's book. I have heard good things about this one too, but I think it's more advanced. For Quantum, there is always Szabo & Ostlund which has code you can refer to, as well as Levine. I am slightly divorced from the QM side of things so I don't have many other recommendations in that area. For statistical mechanics it starts and ends with McQuarrie for me. I have not had to understand much of it in my career so far though. I can also recommend the Oxford Primers series. They're cheap and make solid introductions/refreshers. I saw in another comment you are interested potentially in enzymology. If so, you could try Warshel's book which has more code and implementation exercises but is as difficult as the man himself.

Jensen comes closest to a detailed, general introduction from the books I've spent time with. Maybe focus on that first. I could go on for pages and pages about how I'd approach learning if I was back at undergrad so feel free to ask if you have any more questions.



Out of curiosity, is it DLPOLY that's irritating you so much?

u/berf · 0 pointsr/statistics

You could do worse than Dalgaard.