Reddit reviews Site Reliability Engineering: How Google Runs Production Systems
We found 18 Reddit comments about Site Reliability Engineering: How Google Runs Production Systems. Here are the top ones, ranked by their Reddit score.
O'Reilly Media
Basically any SRE advice for a normal service applies, but replace/complement HAProxy / nginx / ingress controller / ELB with the Tor daemon / OnionBalance.
I run Ablative Hosting and we have a few people who value uptime over anonymity etc and so we follow the usual processes for keeping stuff online.
Have multiples of everything (especially stuff that doesn't keep state), and make sure you monitor everything: connections, memory pressure, open files, free RAM, etc.
Just think of the Tor daemon onion service as a TCP reverse proxy with load-balancing capability, then follow any other advice when it comes to building reliable infrastructure.
Once you've got to grips with running a reliable service then you can start layering your Onion reverse proxy / load balancing on top.
All of this aside, check /u/alecmuffett's "Onions that don't suck" repo for examples that are both well set up and stable.
TL;DR: Tor is just a TCP reverse proxy with load-balancer capabilities; go learn some DevOps doodads.
Edit: As per Alec's comment - clarify that Tor is technically a reverse proxy with load-balancing capabilities rather than a straight up TCP load balancer.
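To make the "reverse proxy with load-balancing capabilities" idea concrete, here is a minimal sketch of round-robin backend selection that skips unhealthy backends. It is not how the Tor daemon or OnionBalance is implemented; the function name, the backend addresses, and the health check are all hypothetical and only illustrate the general technique.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator, Optional


def round_robin(
    backends: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Iterator[Optional[str]]:
    """Yield the next healthy backend in rotation, or None if all are down."""
    pool = list(backends)
    rotation = cycle(pool)
    while True:
        # Try each backend at most once per pick before giving up.
        for _ in range(len(pool)):
            candidate = next(rotation)
            if is_healthy(candidate):
                yield candidate
                break
        else:
            yield None


# Hypothetical backends; in practice the health check would come from monitoring.
down = {"10.0.0.2:8080"}
picker = round_robin(
    ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"],
    is_healthy=lambda backend: backend not in down,
)
picks = [next(picker) for _ in range(4)]
```

With the second backend marked down, the picks alternate between the first and third, which is why having multiples of everything stateless pays off.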
I don't really have many good resources for you. I don't read a lot of technical books or websites/blogs outside of my day-to-day job. I've heard some pretty amazing things about Site Reliability Engineering and Effective DevOps, but I have yet to read either of them.
Overall, as you move forward in your career, I would encourage you to learn as much as you can about the ecosystem your code lives in. A lot of people who go into DevOps have really broad and comprehensive knowledge about the entire stack, all the way from networking and servers, to writing code, to building/deploying/hosting that code, to performance tuning that code, to logging and monitoring issues within the code, etc. Some developers really get stuck on "well, I've written the application, so I'm done, right?" but really there's a lot more to it and that's what DevOps people know and do.
My advice: don't worry about coding or experience in this or that solution.
Read, Read, Read!
Learn the discipline, the fundamentals, know what DevOps is.
Start with this:
https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X
I wouldn't say too early for having a good high-level overview of best practices.
It's always good to have a solid theoretical/philosophical understanding before diving into the specifics.
I'll put my vote down for Site Reliability Engineering as a good reference for high level ideas and less on the low level details.
Site Reliability Engineering: How Google Runs Production Systems
Sure, I really enjoy these podcasts.
As for books, here are some tech books I have read and enjoyed:
There's obviously a ton of other books, but those immediately come to mind.
If I were you I would look into the site reliability engineering book. The first few chapters address your question pretty well. The tl;dr version is don't guess. Get stakeholders (specifically management) to agree to a specific SLA target based on your needs and provide the funding necessary to hit those numbers.
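As a rough illustration of why the agreed number matters, the sketch below converts an availability target into the downtime it allows over a window. The function name and the 30-day window are assumptions made for the example, not something taken from the book.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability target permits per window."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - target_percent / 100)


# Compare a few common targets over a 30-day window.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% allows about {minutes:.1f} minutes of downtime")
```

Each extra nine cuts the allowance by a factor of ten (about 43 minutes per 30 days at 99.9%, about 4 at 99.99%), which is why the target should be agreed and funded rather than guessed.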
Sysadmin, I've been doing Lead SysEng/DevOps/SRE for the past 4 years and literally have multiple written offers I'm trying to choose from right now.
I only started looking 3 weeks ago.
Learn multiple clouds (I've done the big 3 in prod and other ones for utils/tools/hobby/legacy systems), Kubernetes/Docker, Linux, distributed systems, and Ansible/Puppet/Chef.
Read this:
https://www.amazon.com/Practice-Cloud-System-Administration-Practices/dp/032194318X/
and this:
https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X/
See if you can buy time for the internship offer, having multiple offers is always better :)
Is the internship paid?
The SRE book
https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X
This one
Site Reliability Engineering: How Google Runs Production Systems
Free Online: https://landing.google.com/sre/book.html
Amazon: https://www.amazon.com/dp/149192912X/ref=cm_sw_r_cp_tai_wxJdAbNKAZKZH
The O'Reilly "Site Reliability Engineering" book was well worth reading imo
https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X/
Senior Level Software Engineer Reading List
Read This First
Fundamentals
Development Theory
Philosophy of Programming
Mentality
Software Engineering Skill Sets
Design
History
Specialist Skills
DevOps Reading List
Here's what I used to use:
https://www.amazon.com/gp/product/0521899435/ref=ox_sc_saved_title_2?smid=ATVPDKIKX0DER&psc=1
https://www.amazon.com/gp/product/149192912X/ref=ox_sc_saved_title_1?smid=A26MBA1SHJSNWL&psc=1
But here are some other good reads/materials:
http://www.aosabook.org/en/distsys.html
https://disco.ethz.ch/courses/podc_allstars/lecture/podc.pdf
I can't find the link I was thinking about, but here's a similar one: http://lms.uop.edu.jo/lms/pluginfile.php/2069/mod_resource/content/0/designing-distributed-systems-google-case-study.pdf
Google site reliability engineering wrote a book. Check it out.
You wanna be devops? Read how the best do it, then take classes to answer everything you have questions about: https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X
Grow a beard, read this book (http://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X) and apply for SRE jobs. Lots of them in Seattle, if my email inbox is any indicator.
I'm not joking about the beard.
>How did you get started in DevOps?
I watched https://www.youtube.com/watch?v=LdOe18KhtT4. I realised this was the future and if you wanted to be in a high performing organisation you need to do what they're doing.
Unless you're in an organisation that is willing to undergo the cultural change of Operations and Development working together you're probably not going to go far. Creating a devops organisation from scratch is HARD unless everyone is on board.
Looking into the technology is the simple part; try reading around the movement. The Phoenix Project (http://www.amazon.com/Phoenix-Project-DevOps-Helping-Business/dp/0988262509) is a good start. From there I'd look into Continuous Integration and Continuous Delivery (https://www.amazon.co.uk/Continuous-Integration-Improving-Software-Signature/dp/0321336380 & https://www.amazon.co.uk/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912).
If by this point you don't know a programming language you're going to be in serious trouble. Learn something, be it PowerShell (and honestly you probably will want to move on to C# if you want to be amazing at what you do) or Python/Ruby.
Honestly you should be working towards what Google does with SRE if you want to be at the leading edge. https://www.amazon.co.uk/Site-Reliability-Engineering-Production-Systems/dp/149192912X.
Have you read https://www.amazon.com/Site-Reliability-Engineering-Production-Systems/dp/149192912X already? It covers a bit of who/how they hire along with what they do. Might help some. Good luck!