Reddit reviews The Grammar of Graphics (Statistics and Computing)
We found 9 Reddit comments about The Grammar of Graphics (Statistics and Computing). Here are the top ones, ranked by their Reddit score.
The "bible" is "The Grammar of Graphics" by Leland Wilkinson. (link to amazon). The "gg" of ggplot2 stands for grammar of graphics.
Then we go into other books, resources that help with actually showing visualizations:
Then we can look at the "Table of Elements of Data Visualization":
Then, we can look at some blogs to help you see what works and doesn't work:
Finally, some blog posts about other people in data visualization that you can learn from:
> What is/are the benefits of ggplot compared to matplotlib/pylab?
From my understanding, ggplot2 is an R package that aims to create graphics that follow the design principles from a book called "grammar of graphics". Also, they're pretty as hell with basically no effort required. This is an attempt to mimic that package in native Python.
Matplotlib is more powerful, and more customizable, but the default settings are ugly and sometimes almost illegible.
Super minor nitpick:
R Studio is the development environment.
R is the language.
Presumably you want to become well versed in the latter rather than the former. It's an easy mistake to make though, since the two are so intertwined for most people as to become almost indistinguishable.
More to your point though:
Before learning anything, it's a good idea to ask yourself why you want to learn it, and what you hope to be able to do with it. Now, you mentioned two things,
Both of these are relatively simple, and if you have even the most rudimentary understanding of R, you could learn to do them in a couple of minutes.
So, my question to you would be: in using R, is your goal to get quick, simple answers to straightforward questions OR are you ultimately looking to be able to do much more complicated tasks? This isn't a judgemental question. Not everyone needs to aspire to become an R god; just needing something quick and dirty is perfectly okay.
If the things you mentioned are more or less the extent of your needs, I'd suggest just googling what you need to do at the time and picking up what you need, more or less, through osmosis.
However, if you have designs on being able to do amazingly complicated things, if you want to push R to its fullest, you'll need a more structured approach.
One thing you absolutely must understand is that R is a package-based language. What this means for you is that beyond the numerous ways you can do any task in any language, people have written countless* packages which contain all sorts of handy functions to do just about anything you could conceivably want to do.
> * Okay, it's not really countless. There are (as of this writing) 12,620 packages on CRAN and 1,560 additional packages on Bioconductor. There are bunches more unofficial ones scattered about GitHub and others privately maintained, but you get the point: there's lots of them.
So, for anything you want to do, you can approach it in one of two, very broad, ways:
When you are starting out, I think it's very important to get a good handle on Base R.
I would start out with basically any introductory R book. Search on Amazon and just find one you like.
Personally, I can recommend Using R for Introductory Statistics by John Verzani. It isn't for everyone, but if you're truly a beginner to both R and statistics more generally, it's a good reference text.
After that, it's up to you where you want to take it. For me, the pantheon of R gods* I would pay tribute to are these four:
> * I'm sure every single person on that list would balk at being called a "god," but they'd be lying.
It's no mistake that 3/4 of them work for R Studio.
The god of tidiness.
Hadley must be a complete neat-freak because he's the driving force behind the tidyverse:
> The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Once you branch out of base R, the tidyverse should be your first destination. It's not quite a new language unto itself, more like a very sophisticated dialect of the language you already know. Once you can speak "tidy," you can still communicate with the "base" speaking plebs, you just won't be able to imagine ever wanting to.*
> * This is not exactly true, and might come across as gross and elitist, but the tidy paradigm really is substantially better. If you were designing a completely new language to do statistical computing, from scratch, today, the language would probably feel a lot like the tidyverse.
Anyway, any book by Hadley Wickham is gold, and they're all available online for free. But R for Data Science is a good first step into a larger world.
The god of speed.
I imagine Dirk is not a patient man. He's very active on forums, basically every meaningful response on stackexchange for an Rcpp related question is his (or his collaborator, lesser-god Romain Francois), but sometimes his responses can seem a little... terse?
Now, R is notoriously slow. It's much maligned for this, usually fairly, sometimes not.
Much of the perceived slowness can be mitigated in base R by learning the suite of apply functions, which are vectorized. That is, they take a multivalued variable (a vector, matrix, or list) and apply the same function to each element. It's typically much, much faster than using a for-loop. However, you can't always get away from needing a for-loop, and sometimes your loop will need to run thousands (or millions) of times. That's where the Rcpp package, which Dirk maintains, comes into play.
It is an interface between R and C++; there's not much to say about the package itself. You'll need to learn at least some rudimentary C++ to make use of it, but simply breaking out a computationally intensive for-loop into an Rcpp function can yield a huge improvement in run times: 10x-100x (or more) depending on how well (or poorly) optimized your R and C++ code is. There's some weirdness involved (like you can't call an Rcpp function in a parallel apply function (separate package) unless your Rcpp function is loaded as part of a package, so for maximum benefit you'll need to learn how to write your own packages - praise be to Hadley). Rcpp includes some semantic "sugar" which allows you to write some things in C++ more like you would in R, but that's yet a third thing to learn.
Also, Rcpp, much like the tidyverse, is more an ecosystem of interconnected packages than a single package.
The god of art.
Base R plots are ugly as sin. They just are; no one should use them ever, for any reason.*
> * Exaggeration.
That said, Winston's* ggplot2 is a revelation and a revolution in how graphics are created and presented.
> * Yes, technically ggplot2 is also Hadley's and is part of the tidyverse, but Winston literally wrote the book on it. Okay, okay, Hadley technically created the package and has written books about it; I just find Chang's book more fitting to my needs.
The "gg" in ggplot2 stands for "grammar of graphics", a common structure for describing the components of a visualization in a concise way.
Learning ggplot2 will take you a long way toward being able to make beautiful graphical visualizations.
The god of sharing.
After you've learned all of the above, you can wrangle your messy data into something tidy and manageable, you can work on it cleanly and power through massive computations, and you can create stunning images from your data. But it all means nothing if you're the only one who sees it.
This is where Yihui shines. He is the maintainer of the knitr package, and the author of Dynamic Documents with R and knitr. This will allow you to turn all of your work into PDFs or web pages to share with the world.
It's super easy to get started with, much more complicated to master, but definitely worth it.
To use it effectively, you'll need to learn rmarkdown, also by Yihui. You'll also want to start dabbling with LaTeX (if you're not proficient already), and to truly bend documents to your whim you'll need to learn to tinker with YAML.
Closing remarks.
It's a lot to master. Few ever will. Not everyone will agree on everything I've said, but I think the path to true mastery looks something like that.
Best of luck!
> You can get away with using Python now, in my mind, and this is a feat unimaginable 5 years ago. But I never want to.
Not even with the interactive beauty and wonderfulness of IPython Notebooks? :)
> Bokeh looks nicer than raw matplotlib, but I'm not sure why it reminds you of ggplot
Because both are explicitly based on The Grammar of Graphics (the "gg" in "ggplot").
> Copying Matlab style plotting has always been a mistake in my mind.
Again, it's explicitly a goal of Bokeh to leverage the experience of existing R/ggplot users in much the same way that matplotlib tried to appeal to Matlab users.
Agreed that I don't like matplotlib's imperative style, but much of its functionality is now exposed via multiple APIs — it's now possible to use it much "less imperatively".
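For anyone unsure what the two styles look like side by side, here is a minimal sketch. I'm assuming the "multiple APIs" above means matplotlib's pyplot state-machine interface versus its object-oriented interface; the data and titles are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]

# State-machine (Matlab-like) style: every call acts on an implicit
# "current" figure and axes that pyplot tracks behind the scenes.
plt.figure()
plt.plot(xs, ys)
plt.title("implicit state")
implicit_ax = plt.gca()  # grab the hidden current axes for inspection

# Object-oriented style: the figure and axes are explicit objects
# that you hold on to and call methods on directly.
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("explicit objects")
```

Both halves draw the same line; the second just keeps `fig` and `ax` in hand instead of relying on hidden global state, which is what makes it feel less imperative.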
FWIW, although Bokeh is itself not an implementation of the Grammar of Graphics, many of the original Bokeh authors, including and especially myself, were avid fans of Wilkinson's approach. I would say those ideas (and their structure and consistency) greatly inform the structure of Bokeh.
I've heard this one's a classic, but I haven't read it, so I can't comment personally:
https://smile.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448?sa-no-redirect=1
I haven't used Origin 9 before, but from what I can see online it looks like a GUI-driven plotter. How comfortable are you with basic coding? Depending on circumstances, I use either Matplotlib or ggplot2 for data visualization at work and school.
Matplotlib is a plotting library for Python. Its standard plots can look pretty bland, but the Seaborn library helps a lot with appearance. You can handle interactivity with Jupyter Notebooks and its Jupyter Widgets.
ggplot2 is a library for R that is built around the idea of The Grammar of Graphics. Examples of some of the available plots are here. You can also use the ggthemes package to change plot appearance on-the-fly. An option for interactive plots is Plotly's ggplot2 library.
If you are geeky about it, "The Grammar of Graphics" is a bit theory-heavy, but may be worth the dive.
After all, it's what the "gg" in the "ggplot" package is taken from.
ggplot is an implementation of the ideas from "Grammar of Graphics". It is so much more than just pretty looks.
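To make the "more than pretty looks" point a bit more concrete, here is a rough, stdlib-only Python sketch of the core idea as I understand it: a plot is described as data plus layers, and each layer maps data columns onto aesthetics such as x and y. `PlotSpec`, `Layer`, and `resolve` are names invented for this illustration and are not part of ggplot2 or any real library; nothing is actually drawn.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    geom: str   # kind of mark to draw, e.g. "point" or "line"
    aes: dict   # aesthetic name -> data column name, e.g. {"x": "wt"}


@dataclass
class PlotSpec:
    data: list                                  # list of row dicts
    layers: list = field(default_factory=list)  # layers added so far

    def __add__(self, layer):
        # Return a new spec with one more layer, loosely echoing
        # how ggplot2 composes a plot with `+`.
        return PlotSpec(self.data, self.layers + [layer])

    def resolve(self):
        # For each layer, look up the column behind every aesthetic.
        resolved = []
        for layer in self.layers:
            mapped = {
                aesthetic: [row[column] for row in self.data]
                for aesthetic, column in layer.aes.items()
            }
            resolved.append((layer.geom, mapped))
        return resolved


# Made-up rows, purely for illustration.
rows = [
    {"wt": 2.6, "mpg": 21.0},
    {"wt": 3.2, "mpg": 18.7},
    {"wt": 1.8, "mpg": 30.4},
]

spec = (
    PlotSpec(rows)
    + Layer("point", {"x": "wt", "y": "mpg"})
    + Layer("line", {"x": "wt", "y": "mpg"})
)

for geom, mapped in spec.resolve():
    print(geom, mapped)
```

The point of the sketch is that the description of the plot (what data, which mappings, which marks) is separate from the rendering, so swapping a "point" layer for a "line" layer changes one component rather than the whole plotting call.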