Reddit reviews The Pragmatic Programmer: Your Journey To Mastery, 20th Anniversary Edition (2nd Edition)

We found 6 Reddit comments about The Pragmatic Programmer: Your Journey To Mastery, 20th Anniversary Edition (2nd Edition). Here are the top ones, ranked by their Reddit score.

6 Reddit comments about The Pragmatic Programmer: Your Journey To Mastery, 20th Anniversary Edition (2nd Edition):

u/samort7 · 257 points · r/learnprogramming

Here's my list of the classics:

General Computing

u/hattivat · 11 points · r/datascience

IMHO,

step 1: Read https://www.amazon.com/Pragmatic-Programmer-20th-Anniversary-2nd/dp/0135957052/

step 2: Read https://realpython.com/python-pep8/ and https://docs.python-guide.org/dev/virtualenvs/

step 3: Write a REST API which takes arguments from the URL, uses those arguments to run some predictive model of your creation, and then returns the result. Since you already know Python, I'd recommend using Flask; there are many free tutorials, just google it. If using Python, I highly recommend using PyCharm (the free community edition is enough) over Jupyter or Anaconda - those will let you do many bad things that would trigger a red warning in PyCharm (such as doing an import in the middle of the file).
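
A minimal sketch of what that could look like in Flask (predict_price here is just a placeholder for whatever model you build):

    # Minimal Flask sketch of step 3: read arguments from the URL, feed them
    # to a (placeholder) predictive model, return the result as JSON.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def predict_price(bedrooms: int, area: float) -> float:
        # Stand-in for your trained model; swap in the real thing.
        return 50_000 + 10_000 * bedrooms + 1_200 * area

    @app.route("/predict")
    def predict():
        # Query-string arguments arrive as strings, so cast them explicitly.
        bedrooms = request.args.get("bedrooms", type=int)
        area = request.args.get("area", type=float)
        if bedrooms is None or area is None:
            return jsonify(error="bedrooms and area are required"), 400
        return jsonify(predicted_price=predict_price(bedrooms, area))

    if __name__ == "__main__":
        app.run(port=5000)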

step 4 (optional, but recommended): Learn the basics of Java (this tutorial should be more than enough https://www.tutorialspoint.com/java/index.htm ) and read https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882

step 5: Write a publisher application which reads a CSV, XML, or JSON file from disk (for bonus points: from someone else's public REST API for data, for example https://developer.walmartlabs.com/docs/read/Search_API ), and turns the data contained within into a list of Python dictionaries or serializable objects (btw, read up on serializing, it's important), and then sends the results into a Kafka or RabbitMQ queue. I would strongly recommend sending each item/record as a separate queue message instead of sending them all as one huge message.
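
A sketch of such a publisher using RabbitMQ via the pika library (the file name, queue name, and localhost broker are placeholder assumptions):

    # Step 5 sketch: read a CSV and publish each row as its own message.
    import csv
    import json

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="records", durable=True)

    with open("data.csv", newline="") as f:
        for row in csv.DictReader(f):
            # One record per message, as recommended above.
            channel.basic_publish(
                exchange="",
                routing_key="records",
                body=json.dumps(row),  # a dict of strings serializes cleanly
            )

    connection.close()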

step 6: Learn how to use cron (for bonus points: Airflow) to make the application from step 5 automatically run every second day at 8 am
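
For reference, a crontab entry along these lines (the script path is a placeholder; note that a */2 day-of-month step resets at month boundaries, so the 31st and the 1st can fire on consecutive days):

    # minute hour day-of-month month day-of-week  command
    0 8 */2 * * /usr/bin/python3 /path/to/publisher.py >> /var/log/publisher.log 2>&1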

step 7: read the closest thing in existence to a definitive data engineering book: https://dataintensive.net/

step 8: Write a consumer application which runs 24/7 waiting for something to appear in the queue; when it does, it calls your REST API from step 3 using the data received from the queue, adds the returned result (predicted price, or whatever) to the data, then runs some validation / cleaning on the data, and saves it in some database (SQLite is the easiest to have running on your local computer) using an ORM (such as SQLAlchemy).
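
A sketch of that consumer, again using pika, calling the Flask API from step 3 and saving through SQLAlchemy (the table and field names are made up for illustration):

    # Step 8 sketch: block on the queue, enrich each record via the REST API,
    # persist the result with an ORM.
    import json

    import pika
    import requests
    from sqlalchemy import Column, Float, Integer, String, create_engine
    from sqlalchemy.orm import declarative_base, sessionmaker

    Base = declarative_base()

    class Listing(Base):
        __tablename__ = "listings"
        id = Column(Integer, primary_key=True)
        name = Column(String)
        predicted_price = Column(Float)

    engine = create_engine("sqlite:///listings.db")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    def handle(ch, method, properties, body):
        record = json.loads(body)
        resp = requests.get("http://localhost:5000/predict", params=record, timeout=10)
        record["predicted_price"] = resp.json()["predicted_price"]
        with Session() as session:
            session.add(Listing(name=record.get("name"),
                                predicted_price=record["predicted_price"]))
            session.commit()
        ch.basic_ack(delivery_tag=method.delivery_tag)  # only ack once saved

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="records", durable=True)
    channel.basic_consume(queue="records", on_message_callback=handle)
    channel.start_consuming()  # blocks forever - this is the 24/7 part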

step 9: Add error handling - your applications should not crash if they encounter a data-related exception (TypeError, IndexError, etc.) but should instead write it to a log file (or, as a minimum, print it to the console) and continue running. External problems (losing the connection to the database, for example) should trigger a retry - sleep(1) - retry cycle, and only after, say, 5 failed retries should the application crash.
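
One way to structure that, as a sketch (validate is a hypothetical stand-in for your own cleaning function):

    # Step 9 sketch: skip-and-log for bad data, retry-then-crash for
    # external dependencies.
    import logging
    import time

    logging.basicConfig(filename="consumer.log", level=logging.INFO)

    def process_with_retries(record, save_fn, max_retries=5):
        try:
            cleaned = validate(record)  # hypothetical; may raise TypeError etc.
        except (TypeError, KeyError, IndexError, ValueError) as exc:
            logging.warning("skipping bad record %r: %s", record, exc)
            return  # lose just this one record, keep running

        for attempt in range(max_retries):
            try:
                save_fn(cleaned)  # e.g. the database write
                return
            except ConnectionError as exc:
                logging.error("attempt %d/%d failed: %s",
                              attempt + 1, max_retries, exc)
                time.sleep(1)  # the retry - sleep(1) - retry cycle
        raise RuntimeError("external dependency still down after retries")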

step 10: For bonus points, add process monitoring - every time your application processes a piece of data, record what category it was in a timeseries database, such as InfluxDB. Install Grafana and connect it to InfluxDB to make a pretty real-time dashboard of your system in action. Whenever your application encounters a problem, record that in InfluxDB as well. Set Grafana to send you an email alert whenever it records more than 10 errors in a minute.
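
A sketch using the influxdb-client Python package (assumed here; the token, org, bucket, and measurement names are placeholders):

    # Step 10 sketch: write one point per processed record; Grafana can then
    # chart counts per category and alert when the error series spikes.
    from influxdb_client import InfluxDBClient, Point
    from influxdb_client.client.write_api import SYNCHRONOUS

    client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
    write_api = client.write_api(write_options=SYNCHRONOUS)

    def record_event(category: str, ok: bool = True) -> None:
        point = (
            Point("pipeline_events")
            .tag("category", category)
            .tag("status", "ok" if ok else "error")
            .field("count", 1)
        )
        write_api.write(bucket="pipeline", record=point)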

step 11: More bonus points - add caching to your application from step 3, preferably in Redis (there are libraries with helpful decorators for that, e.g. https://pythonhosted.org/Flask-Cache/ )
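
A sketch using Flask-Caching (the maintained successor to the Flask-Cache library linked above) with a Redis backend; the timeout and Redis URL are illustrative:

    # Step 11 sketch: cache /predict responses in Redis, keyed by the URL args.
    from flask import Flask, jsonify, request
    from flask_caching import Cache

    app = Flask(__name__)
    cache = Cache(app, config={
        "CACHE_TYPE": "RedisCache",  # older Flask-Caching versions use "redis"
        "CACHE_REDIS_URL": "redis://localhost:6379/0",
    })

    @app.route("/predict")
    @cache.cached(timeout=300, query_string=True)  # key includes query args
    def predict():
        bedrooms = request.args.get("bedrooms", type=int)
        # ... the expensive model call runs only on cache misses ...
        return jsonify(predicted_price=50_000 + 10_000 * (bedrooms or 0))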

I'm assuming you are familiar with Spark; if not, add that to your learning list. A recommended intro project would be to run some aggregation on a big dataset and record the results into a dedicated database table allowing for fast and easy lookup (a typical batch computing task). You could also rewrite the applications from steps 5 and/or 8 to use Spark Streaming.
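
As an illustration, a minimal PySpark batch job of that shape (the input path and column names are made up):

    # Batch-style Spark sketch: aggregate a big CSV into a small lookup table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("category_stats").getOrCreate()

    df = spark.read.csv("big_dataset.csv", header=True, inferSchema=True)

    stats = df.groupBy("category").agg(
        F.count("*").alias("n_records"),
        F.avg("price").alias("avg_price"),
    )

    # Write somewhere cheap to query back - parquet here, or a database
    # table via df.write.jdbc() if you want SQL lookups.
    stats.write.mode("overwrite").parquet("category_stats.parquet")

    spark.stop()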

I also heavily recommend learning how to use Docker and Kubernetes (Minikube for local development); this is not only super useful professionally, but also makes it much easier to do stuff such as running Spark and Airflow on your home computer - downloading and running Docker images is way easier than installing any of those from scratch the traditional way.

One crucial piece of advice I can give concerns the mindset difference between data science and data engineering - unlike in data science, in data engineering you normally want to divide the process into the smallest possible units - the ideal is to be processing just one [document / record / whatever word is appropriate to describe an atomic unit of your data] at a time. You of course process thousands of them per second, but each should be a separate full "cycle" of the system. This minimizes the impact of any crashes/problems and maximizes easy scalability¹. That is of course assuming that the aim is to do some sort of ETL; if you are running batch aggregations then that is of course not atomic.

¹ As an example, if your application from step 5 loaded all the data as one queue message, then the step 8 application would have to process it all in some giant loop, so to parallelize it you would have to get into multi-threaded programming, and trust me - you don't want that if you can avoid it (a great humorous tale on the topic http://thecodelesscode.com/case/121 ). You also have to run it all under one process, so you can't easily spread across multiple machines, and there is a risk that one error will crash the whole thing. If on the other hand you divide the data into the tiniest possible batches - just one item per message, then it's a breeze to scale it - all you need to do is to run more copies of the exact same application consuming from the same queue (queue systems support this use case very well, don't worry). Want to use all 8 CPU cores? Just run 8 instances of the consumer application. Have 3 machines sitting idle that you could use? Run a few instances of the application on each, no problem. Want the results really fast? Use serverless to run as many instances of your app as you have chunks of data and thus complete the job in an instant. One record unexpectedly had a string "it's secret!" in a float-only field and it made your app crash? No problem, you only lost that one record, the rest of your data is safe. Then you can sit back and watch your application work just fine while the colleague who decided to use multi-threading for his part is on his fifth day of overtime trying to debug it.

u/akevinclark · 9 points · r/AskProgramming

These are great suggestions. The three books I typically give devs early (that fit in well with the two presented here) are:

Refactoring by Martin Fowler

This is a list of patterns of common refactorings and how to do them safely. It'll help you recognize transformations you need to make in your code as it changes.

The Pragmatic Programmer by Dave Thomas and Andy Hunt

This is a great guidebook for how to get better at being a software engineer. Essential read.

And while there are lots of options for design patterns books...

Head First Design Patterns was the one that helped me internalize them. Even if you aren’t writing much (or any) Java, the method of teaching is hugely valuable.

u/TracerBulletX · 5 points · r/iOSProgramming

I spent a lot of time learning specific architectures and patterns that were in common usage when I first started, but the specific patterns in vogue are constantly changing. I'd recommend reading all 3 of these books at some point early in your career. I think a lot of the popular software design practices are based on the foundation of ideas in these books, and if you read them you will start to naturally make the right choices when it comes to organizing your code.

https://www.amazon.com/Pragmatic-Programmer-journey-mastery-Anniversary/dp/0135957052/

https://www.amazon.com/Code-Complete-Practical-Handbook-Construction/dp/0735619670/

https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882/

u/nnomae · 3 points · r/csharp

The Pragmatic Programmer by Dave Thomas and Andrew Hunt.

If you want something you can download for free, I thoroughly recommend the Graphics Programming Black Book by Michael Abrash. Written just after Quake came out, it is an exploration of algorithms and optimisation that is just incredibly well written and entertaining.

u/Wh0_The_Fuck_Cares · 1 point · r/Unity3D

You're kidding, right? That's 100% personal preference... It doesn't matter how you place the braces as long as you're consistent.

This: The Microsoft C# Style Guide. It's literally garbage unless you work for Microsoft or a company that also follows this style guide religiously. They're suggestions by Microsoft on how to write clean code. If you want a real breakdown of what clean code is, read Clean Code or The Pragmatic Programmer and you'll learn what things are actually worth worrying about.