Best statistical software books according to redditors
We found 115 Reddit comments discussing the best statistical software books. We ranked the 54 resulting products by number of redditors who mentioned them. Here are the top 20.
Maybe this table can help some of you gauge how worthwhile the bundle is.
| Tier | Title | Amazon Kindle Price ($) | Amazon Average | Amazon # of Ratings | Goodreads Average | Goodreads # of Ratings |
|------|-------|-------------------------|----------------|---------------------|-------------------|------------------------|
| 1 | Painting with Numbers: Presenting Financials and Other Numbers So People Will Understand You | 25.99 | 3.9 | 20 | 4.05 | 40 |
| 1 | Presenting Data: How to Communicate Your Message Effectively | 26.99 | 2.9 | 4 | 4.25 | 8 |
| 1 | Stories that Move Mountains: Storytelling and Visual Design for Persuasive Presentations | - | 4.0 | 13 | 3.84 | 56 |
| 1 | Storytelling with Data: A Data Visualization Guide for Business Professionals (Excerpt) | 25.99 | 4.6 | 281 | 4.37 | 1175 |
| 2 | 101 Design Methods: A Structured Approach for Driving Innovation in Your Organization | 22.99 | 4.2 | 70 | 3.98 | 390 |
| 2 | Cool Infographics: Effective Communication with Data Visualization and Design | 25.99 | 4.3 | 39 | 3.90 | 173 |
| 2 | The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions | 31.71 | 3.8 | 43 | 3.03 | 35 |
| 2 | Visualize This: The FlowingData Guide to Design, Visualization, and Statistics | 25.99 | 3.9 | 83 | 3.88 | 988 |
| 3 | Data Points: Visualization That Means Something | 25.99 | 3.9 | 34 | 3.87 | 362 |
| 3 | Infographics: The Power of Visual Storytelling | 19.99 | 4.0 | 38 | 3.79 | 221 |
| 3 | Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data | 40.99 | 4.2 | 3 | 3.59 | 14 |
| 3 | Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software, 2nd Edition | 39.99 | 4.0 | 66 | 4.14 | 111 |
| 3 | Visualizing Financial Data | 36.99 | 4.7 | 4 | 3.83 | 6 |
Here's what I'd recommend.
GETTING STARTED WITH DATA SCIENCE
If you're interested in learning data science I'd suggest the following:
 
Tools
Learning these has several benefits: they streamline your workflow, they speed up your learning process since they are very easy to use, and perhaps most importantly, they really teach you how to think about analyzing data. ggplot2 has a deep underlying structure to its syntax, based on the Grammar of Graphics theoretical framework. I won’t go into that too much, but suffice it to say, when you learn the ggplot2 syntax, you’re actually learning how to think about data visualization in a very deep way. You’ll eventually understand how to create complex visualizations without much effort.
 
Skill Areas
My recommendations are:
(But, again I recommend learning these in R’s ggplot2.) The reason I recommend these is
As with any discipline, you need to learn the foundations first; this will dramatically speed your progress in the intermediate to advanced stages.
I usually suggest learning these with dummy data (for simplicity), but if you have a simple .csv file, that should work too.
After you learn data visualization, I suggest that you “back into” data management. For this, you should find a dataset and learn to reshape it.
The core data management skills:
You can start learning these here. Again, I recommend learning these in R’s dplyr because dplyr makes these tasks very straightforward. It also teaches you how to think about data wrangling in terms of workflow: the “chaining operator” in dplyr helps you wire these commands together in a way that really matches the analytics workflow. dplyr makes it seamless.
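The chaining idea is not specific to R. As a rough illustration only, here is a filter, derive, group, and summarise flow written in plain Python. The records, field names, and helper functions are all made up for this sketch, and none of it is dplyr's actual API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical dummy records, invented purely for illustration.
rows = [
    {"city": "Oslo", "year": 2015, "sales": 120.0},
    {"city": "Oslo", "year": 2016, "sales": 150.0},
    {"city": "Rome", "year": 2015, "sales": 90.0},
    {"city": "Rome", "year": 2016, "sales": 60.0},
    {"city": "Rome", "year": 2014, "sales": 30.0},
]

def filter_rows(data, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in data if predicate(r)]

def mutate(data, name, func):
    """Return copies of the rows with one new derived column."""
    return [{**r, name: func(r)} for r in data]

def summarise_by(data, key, column):
    """Group rows by key and return the mean of column per group."""
    groups = defaultdict(list)
    for r in data:
        groups[r[key]].append(r[column])
    return {k: mean(v) for k, v in groups.items()}

# Each step feeds the next, mirroring a left-to-right analytics workflow.
recent = filter_rows(rows, lambda r: r["year"] >= 2015)
scaled = mutate(recent, "sales_k", lambda r: r["sales"] / 1000)
result = summarise_by(scaled, "city", "sales_k")
print(result)
```

Every step takes a table and returns a table, which is the property that makes chaining read like the workflow itself.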
ML is sort of like the “data science 301” course vs. the 102 and 103 levels of the data-vis and data manipulation stuff I outlined above.
Here, I’ll just give book recos:
This is a highly regarded introduction
After you get these foundations, then you can move on to specialize in a particular area.
 
OTHER RESOURCES:
Data Visualization
 
TL;DR
I'd recommend learning R for data science before Python. Learn data visualization first (with R's ggplot2), using simple data or dummy data. Then find a more complicated dataset. Learn data manipulation second (with R's dplyr), and practice data manipulation on your more complex data. Learn machine learning last.
Read about the topic.
Practice.
Set yourself little challenges that you work on, or learn a new language/platform/environment. If you're just starting, try different easy programming challenges you find on the 'net. If you've been doing this a while, do something more sophisticated.
The challenges I've set for myself in the recent past include writing a LISP interpreter in C, building a recursive descent parser for a simple language, and implementing different algorithms I've encountered in books like Numerical Recipes and Introduction to Algorithms.
(Yes, I know; you can download libraries that do these things. But there is something to be gained by implementing quicksort in code from the description of the algorithm.)
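For a sense of scale, a first attempt at quicksort from the textbook description can be very short. This is a minimal list-based sketch in Python, not the in-place partitioning version most books present, and the function name is just a placeholder.

```python
def quicksort(items):
    """Return a new sorted list using the divide-and-conquer idea of quicksort."""
    if len(items) <= 1:
        return list(items)
    # Pick the middle element as pivot to avoid the worst case on sorted input.
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Sort the two sides recursively and stitch the pieces together.
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([3, 1, 2, 3, 0]))
```

Rewriting it to partition in place is a good follow-up challenge.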
The trick is to find interesting things and write code which implements them. Generally you won't become a great programmer just by working on the problems you find at work--most programming jobs nowadays consist of fixing code (a different skill from writing code) and involve implementing the same design patterns for the same kind of code over and over again.
----
When I have free time I cast about for interesting new things to learn. The last big task I set for myself was to learn how to write code for the new iPhone when it came out back in 2008. I had no idea that this would change the course of my career for the next 9 years.
I've always had a knack for the stuff. When I TAed graduate classes I found this book to be helpful in explaining some advanced statistical concepts in plain language. If you are specifically learning to use the SPSS program, this book is by far the best. Good luck!
As you wish to get into applied statistics (i.e. actually analyzing data), you'll need software. I'd strongly recommend learning and using R because it's completely free and incredibly powerful.
Here are some resources for learning statistics using R:
Then, these websites provide very valuable resources for doing statistics with R:
Hope that helps.
I would say: Go for it as long as you are interested in the job :)
For study references for remembering R and Statistics, I think all you would need would be:
For R, data cleaning, and such: http://r4ds.had.co.nz/. For basic statistics with R, probably Dalgaard for applied statistics with R, plus something like OpenIntro Statistics or Freedman for a review of stats.
Here's an "Intro to Tableau" Evernote link that has the detail below, but this is what I've put together for our teams when new folks join and want to know more about it.
http://www.evernote.com/l/AKBV30_85-ZEFbF0lNaDxgSMuG9Mq0xpmUM/
What is Tableau?
Where do I get it?
Now that I have it, how do I use it?
How do I become involved?
I want to know more
The book Ideals, Varieties, and Algorithms by Cox, Little, and O'Shea is a very good undergraduate-level algebraic geometry book. It has the benefit of teaching you the commutative algebra you need along the way instead of assuming you know it.
I'm not really aware of any algebraic topology books I'd consider undergraduate, but most of them are accessible to first year grad students anyway, which isn't too far away from senior undergrad. Some of my favorite sources for that are Munkres' book and Fulton's Book.
For knot theory, I haven't really studied it myself, but I've heard that The Knot Book is quite good and quite accessible.
Applied Predictive Modeling by Kuhn and Johnson
Gives good interpretations of different approaches as well as listing the strengths, weaknesses, and ways to mitigate the weaknesses of those approaches. If you're an R user, this book is an excellent reference.
You've almost got a quadratic form: https://en.wikipedia.org/wiki/Quadratic_form maybe you can add a dummy variable to homogenize the linear terms
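To spell out the dummy-variable trick, assume the expression is a quadratic part plus linear and constant terms. The symbols A, b, c, and x_0 are notation introduced here for illustration and are not taken from the original question.

```latex
q(x) = x^{\top} A x + b^{\top} x + c
\quad\longrightarrow\quad
\tilde{q}(x_0, x) = x^{\top} A x + x_0\, b^{\top} x + c\, x_0^{2}
= \begin{pmatrix} x_0 & x^{\top} \end{pmatrix}
  \begin{pmatrix} c & \tfrac{1}{2} b^{\top} \\ \tfrac{1}{2} b & A \end{pmatrix}
  \begin{pmatrix} x_0 \\ x \end{pmatrix},
\qquad \tilde{q}(1, x) = q(x).
```

Setting x_0 = 1 recovers the original expression, so the homogenized form carries the same information in one more variable.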
That aside, (computational) algebraic geometry has a lot to say about this problem, in particular you might want to start here:
https://www.amazon.com/Ideals-Varieties-Algorithms-Computational-Undergraduate/dp/0387356509
I recommend Andy Field's book
Numerical Recipes is a veritable catalogue of different methods. Depending on what field you're interested in though there is surely a text with a title along the lines of 'Computational methods for [insert field] physics'
There is the second edition (2018) of APM
Check out here:
https://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485
Quick and dirty answer: speaking very broadly, random forests -- found in the "randomForest" package -- tend to win battle-of-the-algorithms type studies. If you just want to play with a single model, I'd recommend starting with that and looking at the help for it.
Longer and better answer: Your best bet to answering all these questions and getting a good handle on data mining/predictive analytics is this book: Applied Predictive Modeling. The book references the "caret" package quite a bit, since the package's author is the same person. With it, you can train a lot of different types of models for regression or classification optimizing for accuracy, RMSE, ROC, etc. It provides a standard API for playing with models and makes your life much, much easier. It has its own website here.
Not sure what your skill level is, or the application you're using MATLAB for, but here are a few resources:
You may want a book as well, I've used MATLAB for Engineers, which has all the basic information you need, programming, plotting, etc.
I would recommend learning to program in the MATLAB environment, as it's probably the most useful skill to have and gives you exposure to the capabilities MATLAB offers. You will learn quickly that MathWorks wants to sell you add-ons that do the work for you, but if you can program, it will save a lot of money.
You're welcome. Thanks for providing some additional detail. This helps. I think if you read up on the CRISP-DM and use that framework to walk your way through some of these challenges it will be very beneficial to you. I'd recommend giving this document a read when you have the time. I think that if you show them that you are comfortable with these guidelines and know how to work your way through it to solve a problem it will go a long way. Model selection can be a bit tricky depending on the situation but I think most practitioners have a favorite model that they go to. Sounds like you're already familiar with Wolpert's "No Free Lunch Theorem" suggesting to try a wide variety of techniques. Personally, this is where I'd start digging deeper into tuning parameters (cross-validation, etc.) to help with that decision. Ultimately though, it's important to have a firm understanding of the strengths/weaknesses of the different models and their use cases so you can make an informed selection decision. Kuhn and Johnson's book Applied Predictive Modeling will be a good read to help you prepare.
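As a small illustration of using cross-validation to choose a tuning parameter, here is a sketch in plain Python. It assumes a deliberately simple model (one-feature ridge regression without an intercept) and synthetic data; the function names and candidate penalties are made up for the example and have nothing to do with caret's interface.

```python
import random

def fit_ridge_slope(xs, ys, penalty):
    """One-feature ridge regression without intercept: closed-form slope."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def k_fold_mse(xs, ys, penalty, k=5):
    """Average held-out mean squared error over k contiguous folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold * n // k, (fold + 1) * n // k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        # Fit on the training folds only, then score on the held-out fold.
        slope = fit_ridge_slope(train_x, train_y, penalty)
        errors = [(ys[i] - slope * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Synthetic data, generated only for this illustration: y = 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

# Hypothetical grid of candidate penalties.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {p: k_fold_mse(xs, ys, p) for p in candidates}
best_penalty = min(scores, key=scores.get)
print(scores, best_penalty)
```

The same loop works for any model with a knob to tune: fit on the training folds, score on the held-out fold, and keep the setting with the lowest average error.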
There are numerical methods that make essential use of randomness, such as Monte Carlo methods or simulated annealing.
And numerical methods can be applied to statistics problems. A nonlinear model is probably going to require a numerical scheme to implement. The book Numerical Recipes, which is all about actually implementing numerical methods on a computer, has four chapters covering randomness and statistics.
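A classic toy example of a Monte Carlo method is estimating pi by sampling random points in the unit square and counting how many fall inside the quarter circle. The sketch below is plain Python with a fixed seed for reproducibility; the function name and sample size are arbitrary choices.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi from the fraction of random points inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of unit square = pi / 4.
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))
```

The error shrinks roughly like one over the square root of the number of samples, which is why Monte Carlo is simple but slow to converge.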
> My plan at present is to do a PhD in numerical PDEs and then go into industry in scientific computing as a researcher or developer.
I'd make sure that whoever it is you want to work with is heavy on the computation side. Even better if they work with supercomputers. I say this because even a topic like numerical PDEs can go very deep into theory (consider this paper). Industry likes computer skills.
This book is all you will need.
https://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485/ref=nodl_
To go with the other comments I also recommend picking up a book like numerical recipes which describes in detail many well tested algorithms.
You could read the IBM manual OR you could buy this much more user friendly book:
http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182
It sounds like you're looking for time series material, but Applied Predictive Modeling may be of interest to you. For time series and R specifically, this text seems well-reviewed.
Many thanks. I can speak more on the topic, but you're wanting to learn a lot about Machine learning (well lasso and ridge regression technically count as statistics, but point stands).
If you learn best via online courses, I'd suggest starting with Andrew Ng's Machine Learning Course
If you learn best through reading, I'd recommend two books: Hastie, Tibshirani, & Friedman - Elements of Statistical Learning
and Kuhn & Johnson - Applied Predictive Modeling
Obviously, I'd also recommend my blog once I learn my audience.
Discovering Statistics using SPSS
This is the type of book you want. It has good examples and straightforward commentary.
The recommended book for my computational biology degree is Dalgaard which I've found very handy. It's very clear and accessible.
For the gritty details, you'll have the built-in R help which is excellent.
Edit: He works on the R project, as you can see here
I really like this book:
http://www.amazon.co.uk/Discovering-Statistics-using-IBM-SPSS/dp/1446249182/ref=as_li_tf_sw?&linkCode=wsw&tag=statihell-21
Fun to read, easy to understand, entertaining. What stats book is entertaining???
I ran across this presentation last night by Max Kuhn, one of the authors of Applied Predictive Modeling (http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485).
https://static.squarespace.com/static/51156277e4b0b8b2ffe11c00/t/513e0b97e4b0df53689513be/1363020695985/KuhnENAR.pdf
It's a really great discussion of how they did the joint authoring of the book and the tools they used - and what they would do differently.
I'm taking a stats class right now and we're just working our way through Ben Bolker's book which is decent and goes through a lot of stats theory. It's completely R based though, which may or may not be helpful to you.
http://www.amazon.com/Ecological-Models-Data-Benjamin-Bolker/dp/0691125228
The structure of the claims might vary depending on the source, but a good source of information is the set of videos you can find at https://www.resdac.org/workshops/intro-medicare. Check out the rest of the resdac website as well. This book is pretty good but might not match your claims exactly if you work with a contractor as opposed to a research organization.
There are lots of nuances using Medicare claims data that you will have to learn. Hopefully you have someone with experience to guide you. The learning curve is rather steep but not insurmountable. If you come across specific questions please post them here.
Did you even try?
> I have struggled with noise reduction in PixInsight and even when I used Photoshop.
I think most of us (I KNOW I do) have the same difficulty, and it boils down to...well, quite frankly, to wanting our amateur images to look like Hubble results.
Harsh though he can sometimes be, I think Juan Conejero of the PixInsight team said it best :
> A mistake that we see too often these days is trying to get an extremely smooth background without the required data support. This usually leads to "plastic looking" images, background "blobs", incongruent or unbelievable results, and similar artifacts. Paraphrasing one of our reference books, this is like trying to substantiate a questionable hypothesis with marginal data. Observational data are uncertain by nature—that is why they have noise, and why we need noise reduction—, so please don't try to sell the idea that your data are pristine. We can only reduce or dissimulate the noise up to certain limits, but trying to remove it completely is a conceptual error: If you want less noise in your images, then what you need is to gather more signal.
Admittedly...agreeing with Juan doesn't mean I'll stop trying to "prove him wrong" anyway. I'll still mash away at TGVDenoise until the background looks like lumpy oatmeal and call it "noise reduction"...but I'll feel 2 minutes of shame when /u/spastrophoto calls me out on it. ;)
Having said that, I think the article linked above and this comparison probably did more for my "understanding" of PI's NR tools than any others I've read....for whatever that's worth. :)
> Glad to see a fellow hockey player on here...not many astrophotographer/hockey player hybrids out there!
Thought the username looked vaguely familiar. :) It IS an interesting combination, ain't it?
That's one more added to the list now... /u/themongoose85 is a hockey player too.
For statistical models in ecology, I recommend Ben Bolker's Ecological Models and Data in R.
If you liked that... you'd also love "Discovering Statistics using SPSS" by Andy Field (the second title is "And Sex And Drugs And Rock N' Roll")
http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182
I think you are looking for a book to show you how to do real Statistics and Probability on real data.
Obviously this requires a computer and some programming language. I think R is an excellent place to practice these skills and concepts. I have not read the R Cookbook, but I thought Introductory Statistics with R was good. It should be a good resource for practicing statistical programming. I do not see a free version, but if you find the table of contents, it's a good list of items to learn.
Just got this book. It's pretty great so far. Very straightforward.
Here is Occupational Health Psychology: https://www.amazon.com/Handbook-Occupational-Health-Psychology-Second/dp/1433807769/ref=sr_1_2?crid=23K4PM6UI8F10&keywords=handbook+of+occupational+health+psychology&qid=1574832541&sprefix=handbook+of+occupation%2Caps%2C198&sr=8-2
​
Here is also a great stats textbook: https://www.amazon.com/Discovering-Statistics-Using-IBM-SPSS/dp/1526436566/ref=sr_1_1?keywords=andy+field+statistics&qid=1574833320&sr=8-1
The same author also has an interesting version of a stats book: https://www.amazon.com/Adventure-Statistics-Reality-Enigma/dp/1446210456/ref=sr_1_3?keywords=andy+field+statistics&qid=1574833347&sr=8-3
For Pandas, I recommend this book: https://www.amazon.com/dp/B01GIE03GW/ref=dp-kindle-redirect?_encoding=UTF8&btkr=1
Yo, I'm not getting that image, but at a base level I can tell you this -
Here's a good book for Python: http://www.nltk.org/book/
A link to some more: http://nlp.stanford.edu/~manning/courses/DigitalHumanities/DH2011-Manning.pdf
And for R, there's http://www.springer.com/us/book/9783319207018
and
https://www.amazon.com/Analysis-Students-Literature-Quantitative-Humanities-ebook/dp/B00PUM0DAA/ref=sr_1_9?ie=UTF8&qid=1483316118&sr=8-9&keywords=humanities+r
There's also this https://www.amazon.com/Mining-Social-Web-Facebook-LinkedIn/dp/1449367615/ref=asap_bc?ie=UTF8 for web scraping with Python
I know the R context better, and using R, you'd want to do something like this:
And that's that!
Also check out Applied Predictive Modeling - it's in a way the next book to read after ISLR - it goes a bit more in depth about good practices, plusses and minuses of different models, feature creation/extraction.
In addition, this book will save your life. With a good programming base, it's almost like cheating.
It's very clear for a book on mathematical statistics. It also considers the Bayesian (and even Empirical Bayesian) approach. I'm sometimes shocked at what it covers and how well it covers it in so few pages. For example, there's a nice section on the EM algorithm, which most books in the same class don't cover (unless they're huge).
Edit: I should mention... if you're a scientist looking for how statistics works this is the book for you. If you want to learn a ton about regression/ANOVA, time-series, covariance structures, blah, blah, blah, this book is not for you. A great introduction (for all scientists) that covers this stuff quickly and effectively (as well as MLE, optimization, and R) is Ecological Models and Data with R.
Edit 2: If you want applied linear models, Applied Linear Statistical Models is good, but doesn't use R. Luckily formula objects and delayed evaluation give R some beautiful expressivity here.
Oh also check out this book: https://www.amazon.com/Discovering-Statistics-Using-IBM-SPSS/dp/1526436566/ref=sr_1_3?s=books&ie=UTF8&qid=1524600936&sr=1-3&keywords=Discovering+Statistics+Using+IBM+SPSS+Statistics
I found it to be a lifesaver!!!!
Sure, the basics, but for advanced information gathering consider using SPSS. Andy Field gives a good introduction if you're interested.
The third one especially, if you can put a little money into it, is in my opinion the fastest. I don't mean strictly that free course in Italian, but all the courses available on the portal. Since they are directly hands-on, they flatten the learning curve many times over.
For the last book (and the Springer one too) there are also all the follow-ups in their respective series, depending on your field and what you want to do (medicine, finance, food, ecology, morphology, etc.)
edit: I believe there are various basic courses in English on YouTube, and some in Italian too.
Google Forms is good, and your responses will be saved in a sheet that can be loaded into SPSS. SPSS reads .sav, .csv, and Excel variants. Here is a video where Andy Field explains how you can do it:
https://www.youtube.com/watch?v=nchjj4XzIWc
The only limitation of Google Forms you need to worry about is if you are going to have more than 400,000 respondents and over 256 questions. This is the limit on the dataset that will be created in Google Spreadsheets. (These are not to be underestimated, by the way.)
Because Google Forms is a free option and I have never seen it fail because of too much traffic, I recommend it strongly. I use Surveymonkey and Fluidsurveys at work, and they only have advantages if you are going to have more than half a million respondents in a short period (e.g. a week or a month). They also cost money, so I recommend Google Forms.
Yep, humans perceive differences in length much better than differences in angle. Yau's book Data Points talks about this extensively with examples.
Not sure if this is exactly what you're looking for as it was written by physicists, but this is considered the bible of numerical methods in my field (computational physics): http://www.amazon.com/Numerical-Recipes-3rd-Edition-Scientific/dp/0521880688
I highly recommend Andy Field's book Discovering Statistics Using IBM SPSS Statistics. He has a gift for simplifying complex statistical concepts. Additionally, you'll be learning to use SPSS, which is guaranteed to be useful in your graduate studies and career. Alternatively, he offers the same book for other statistical software packages.
Apparently I have to go to the Amazon page to find a table of contents(?)
EDIT: ok they added a TOC.
My answer would be: if there are strong arguments FOR and strong arguments AGAINST, then the best option is to combine SAS and R. This is very simple to do: SAS provides an API for calling R functions through SAS/IML (example). You can also use the %PROC_R macro. For SAS programmers who are not familiar with R, there is a wonderful book: "SAS and R: Data Management, Statistical Analysis, and Graphics"
I didn't know about MLR until this post. So without having spent any time with it whatsoever, I would only say that one of the nice things about the caret package is that you can also leverage Kuhn and Johnson's book, Applied Predictive Modeling, as well as YouTube videos of Max Kuhn discussing caret.
Educate yourself.
Yes there is software.
The first thing I would suggest is to try the Microsoft Excel "Solver". It is actually a wonderful piece of highly polished numerical analysis code, buried inside a stinky, steaming turd called Excel Spreadsheets. You and Google, working together, can find hundreds of tutorials about this, including
(link 1)
(link 2)
(link 3)
(link 4)
If you prefer to code up the algorithm(s) yourself, so you can incorporate them in other bigger software you've got, I suggest purchasing the encyclopaedic textbook NUMERICAL RECIPES. This tour-de-force textbook / reference book has an entire chapter devoted to optimization, including source code for several different algorithms. I recommend Nelder-Mead "amoeba" but other people recommend other code.
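To give a feel for what coding it yourself involves, here is a compact sketch of the Nelder-Mead simplex method in plain Python. It is a generic textbook-style version with the usual coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), not the Numerical Recipes source; the function name, default tolerances, and test function are assumptions made for this example.

```python
def nelder_mead(f, start, step=0.5, ftol=1e-10, xtol=1e-6, max_iter=2000):
    """Minimise f over R^n with the Nelder-Mead simplex method."""
    n = len(start)
    # Initial simplex: the start point plus one offset point per coordinate.
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        size = max(abs(p[j] - simplex[0][j]) for p in simplex[1:] for j in range(n))
        if abs(values[-1] - values[0]) < ftol and size < xtol:
            break

        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):
            """Point centroid + t * (centroid - worst)."""
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            # Reflection is the new best: try going further.
            expanded = along(2.0)
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            # Better than the second worst: accept the reflection.
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Contract outside (t = 0.5) or inside (t = -0.5) the simplex.
            t = 0.5 if f_reflected < values[-1] else -0.5
            contracted = along(t)
            f_contracted = f(contracted)
            if f_contracted < min(f_reflected, values[-1]):
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]

# Made-up test function with its minimum at (1, -2).
point, value = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
print(point, value)
```

The method needs only function values, no derivatives, which is a large part of its appeal for quick fitting jobs.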
Get this book and start reading it asap.
You'll probably find this article and its references interesting: https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
I also strongly recommend this book: http://www.amazon.com/Guerrilla-Analytics-Practical-Approach-Working/dp/0128002182
If you're looking for something more technical about actually doing analyses, this book is very accessible: http://www.amazon.com/Applied-Predictive-Modeling-Max-Kuhn/dp/1461468485
If you use R, this book is really great: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR/
We've been using this: http://www.amazon.com/MATLAB-Engineers-Edition-Holly-Moore/dp/0133485978
When I do use it, it has pretty helpful explanations and examples. It's also pretty easy to skim, which is nice.
Stats :) https://www.amazon.com/Discovering-Statistics-Using-IBM-SPSS/dp/1446249182/ref=sr_1_1?ie=UTF8&qid=1494420070&sr=8-1&keywords=statistics+using+spss
(also other versions if you/your school doesn't use SPSS)
I'm late to the game on this one, but learning these programs cannot be stressed enough. Different institutions have different preferences for programs, so you may hear about MATLAB, SPSS, STATA, R, etc etc. Pick one and go for it. My personal suggestion is to begin with SPSS. It's very user friendly and a great kickoff program for getting your feet wet.
Your school may have stats classes where you'll learn SPSS or a program like it, but if you want to go at it on your own for a headstart, I suggest two things: the first, YouTube, FOREVER. There are a ton of helpful videos which take you step by step through the processes of using almost any program you can think of, and the best part is that YouTube is free.
The second is that it's never a bad idea to pick up a great book, a go to reference guide if you need it. Discovering Statistics Using SPSS is written by a great author who (shockingly) does not make the subject matter seem dry. I own the R equivalent and am looking to pick up the SPSS version soon because I liked it so much.
Costs of textbooks/stats reference books are high, I know. But for my preference, nothing beats having that go to reference item on your shelf. If you decide to start shopping around, you can ask around in /r/booksuggestions or /r/asksocialscience and see what others use to find the best book for you.
If you want a bit of both worlds, the book SAS and R: Data Management, Statistical Analysis, and Graphics is also worth a look.
It might not be the best general introduction, because it covers both at the same time, but it is very to the point on how to do most normal data manipulations and statistical calculations.
This will be your best bet. It should just ship with a license. :)
http://www.amazon.com/Tableau-Your-Data-Analysis-Software/dp/1118612043
Bayesian Methods in the Search for MH370 is a really great book on the analysis and critical thinking that one team did to help try and narrow down the search. It's a really great read (for a technical book.)
I'm a graduate student who teaches an undergraduate statistics course, and I'm going to be brutally honest with you.
Because you have expressed a lack of understanding about what standard deviation is, I don't anticipate that you will be able to understand the advice that you receive here. I teach statistics at an undergraduate level. I teach standard deviations during week 1, and I teach ANOVA in the final 2 weeks. So, you are at least a full undergraduate course away from understanding the statistics you will need for this.
Honestly, you're probably in over your head on this and a few days spent on reddit aren't going to give you what you're looking for. Even if you're given the answers here, you'll need the statistical knowledge to understand what the answers actually mean about your data.
You have run an experiment, but the data analysis you want to do requires expertise. It's a LOT more nuanced and complex than you probably realized from the outset.
Some quick issues that I see here at a glance...
Mashing together different variables can make a real mess of the data, so the scores you have might not even be useful if you were to attempt to run an ANOVA (the test you would need to use) on them.
With what you have shown us in the post, we are unable to tell if group b's scores are higher because of the message they received or whether they just happen to be higher due to random chance. Without the complete "unmashed" dataset we won't be able to say which of the "mashed" measurements are driving the effect.
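To make the "random chance" question concrete, this is roughly what a one-way ANOVA computes: the ratio of between-group variation to within-group variation. The sketch below is plain Python, and the scores are invented for illustration; they are not the data from the post.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n = len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up scores for two hypothetical message groups.
group_a = [4.0, 5.0, 6.0, 5.0]
group_b = [6.0, 7.0, 8.0, 7.0]
print(one_way_anova_f([group_a, group_b]))  # F = 12.0 with (1, 6) degrees of freedom
```

Turning F into a p-value requires the F distribution, which any statistics package provides. A large F only says the group means differ by more than the within-group spread would suggest, and interpreting it still depends on how the scores were constructed.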
I have worked with honors students that I wouldn't trust with the analysis you need. Because you are doing this for work, you really should consider contacting a professional. You can probably hire a graduate student to do the analysis for a few hundred dollars as a side job.
If you really want to learn how to do it for yourself, I would encourage you to check out Andy Field's text book. He also has a YouTube Channel with lectures, but they aren't enough to teach you everything you need to understand. Chapter 11 is ANOVA, but you'll need to work your way up to it.
The author of the R survey package also has a book with a really good explanation of how to use the package: https://www.amazon.com/Complex-Surveys-Analysis-Survey-Methodology-ebook/dp/B005PS6C9A
I had no experience with survey analysis before and that book helped me a lot.
Personally I make a distinction between scripting and programming that doesn't really exist but highlights the differences I guess. I consider myself to be scripting if I am connecting programs together by manipulating input and output data. There is lots of regular expression pain and trial-and-error involved in this and I have hated it since my first day of research when I had to write a perl script to extract the energies from thousands of gaussian runs. I appreciate it, but I despise it in equal measure. Programming I love, and I consider this to be implementing a solution to a physical problem in a stricter language and trying to optimise the solution. I've done a lot of this in fortran and java (I much prefer java after a steep learning curve from procedural to OOP). I love the initial math and understanding, the planning, the implementing and seeing the results. Debugging is as much of a pain as scripting, but I've found the more code I write the less stupid mistakes I make and I know what to look for given certain error messages. If I could just do scientific programming I would, but sadly that's not realistic. When you get to do it it's great though.
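The kind of scripting described here (pulling energies out of many output files) usually comes down to one regular expression. Here is a sketch in Python rather than perl. The line format is an assumption modelled on the "SCF Done" line that Gaussian prints, and the log fragment and its numbers are invented, so check the pattern against your own files.

```python
import re

# Assumed line format, modelled on Gaussian's "SCF Done" line;
# verify it against real output before relying on it.
SCF_PATTERN = re.compile(r"SCF Done:\s+E\(([^)]+)\)\s+=\s+(-?\d+\.\d+)")

def extract_energies(text):
    """Return (method, energy) pairs for every matching line in text."""
    return [(m.group(1), float(m.group(2))) for m in SCF_PATTERN.finditer(text)]

# Hypothetical log fragment with invented numbers.
sample_log = """
 SCF Done:  E(RHF) =  -74.9607232     A.U. after    8 cycles
 some unrelated line
 SCF Done:  E(RHF) =  -74.9659012     A.U. after    6 cycles
"""
print(extract_energies(sample_log))
```

Looping this over a directory of files is then a few more lines, which is exactly the glue work the comment calls scripting.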
The maths for comp chem is very similar to the maths used by all the physical sciences and engineering. My go to reference is Arfken but there are others out there. The table of contents at least will give you a good idea of appropriate topics. Your university library will definitely have a selection of lower-level books with more detail that you can build from. I find for learning maths it's best to get every book available and decide which one suits you best. It can be very personal and when you find a book by someone who thinks about the concepts similarly to you it is so much easier.
For learning programming, there are usually tutorials online that will suffice. I have used O'Reilly books with good results. I'd recommend that you follow the tutorials as if you need all of the functionality, even when you know you won't. Otherwise you get holes in your knowledge that can be hard to close later on. It is good supplementary exercise to find a method in a comp chem book, then try to implement it (using google when you get stuck). My favourite algorithms book is Numerical Recipes - there are older fortran versions out there too. It contains a huge amount of detailed practical information and is geared directly at computational science. It has good explanations of math concepts too.
For the actual chemistry, I learned a lot from Jensen's book and Leach's book. I have heard good things about this one too, but I think it's more advanced. For Quantum, there is always Szabo & Ostlund which has code you can refer to, as well as Levine. I am slightly divorced from the QM side of things so I don't have many other recommendations in that area. For statistical mechanics it starts and ends with McQuarrie for me. I have not had to understand much of it in my career so far though. I can also recommend the Oxford Primers series. They're cheap and make solid introductions/refreshers. I saw in another comment you are interested potentially in enzymology. If so, you could try Warshel's book which has more code and implementation exercises but is as difficult as the man himself.
Jensen comes closest to a detailed, general introduction from the books I've spent time with. Maybe focus on that first. I could go on for pages and pages about how I'd approach learning if I was back at undergrad so feel free to ask if you have any more questions.
Out of curiosity, is it DLPOLY that's irritating you so much?
These are the ones I've been looking at.
MATLAB: http://www.amazon.com/Getting-Started-MATLAB-Introduction-Scientists/dp/0199731241/ref=sr_1_4?ie=UTF8&qid=1373915421&sr=8-4&keywords=matlab
http://www.amazon.com/MATLAB-Demystified-David-McMahon/dp/0071485511/ref=sr_1_9?ie=UTF8&qid=1373915488&sr=8-9&keywords=matlab
http://www.amazon.com/Essential-Matlab-Engineers-Scientists-Edition/dp/0123943981/ref=sr_1_6?ie=UTF8&qid=1373915488&sr=8-6&keywords=matlab
Solidworks:
http://www.amazon.com/Engineering-Design-SolidWorks-2011-Multimedia/dp/1585036234/ref=sr_1_3?s=books&ie=UTF8&qid=1373915555&sr=1-3&keywords=solidworks+engineer
You could do worse than Dalgaard.