Reddit reviews Hadoop Security: Protecting Your Big Data Platform

We found 1 Reddit comment about Hadoop Security: Protecting Your Big Data Platform. Here it is, with its Reddit score.

Hadoop Security: Protecting Your Big Data Platform
O'Reilly Media

1 Reddit comment about Hadoop Security: Protecting Your Big Data Platform:

u/batoure · 3 points · r/hadoop

So it's been my experience that, because the meta-ecosystem of Hadoop evolves weirdly and sometimes at breakneck speed, I have yet to find one or two books to rule them all.


That being said, start with "Hadoop Security" by Ben Spivey.

https://www.amazon.com/Hadoop-Security-Protecting-Your-Platform/dp/1491900989/ref=sr_1_1?ie=UTF8&qid=1498101555&sr=8-1&keywords=hadoop+security

One of the blind spots shared by about 80% of the clusters I have interacted with is securing the ecosystem. Have you ever seen the interview where Berners-Lee says it terrifies him that most of the systems on the International Space Station run over HTTP? It's a similar situation here: most big data platforms trust by default, and configuring them properly requires a strategy and careful thought up front.

Monitoring and management is where things get complicated. Management is essentially what the various flavors of Hadoop are arguing about, the two majors being Cloudera with "Cloudera Manager" and Hortonworks with "Apache Ambari".

This isn't a plug per se, but I have found that, although it often feels like a Frankenstein's monster of tools knitted together, Cloudera Manager seems to provide a slightly less painful install process. Additionally, its parcels system lets you manage and experiment with various versions of tools pretty easily. At the end of the day, none of these tools is going to hold your hand, so expect one of two things to happen:

- You get the cluster up fast and working-ish in "let's please not call it production" mode.
- You get a clean, stable cluster running and, after working on it steadily for the next 3 months or so, start wondering what data to ingest.

The danger of option 1 is just how achievable it is: a couple of months from now, the company will start talking about this "vital infrastructure," and you will have a problem on your hands.