Table of Contents

Chapter 1: Introduction
Chapter 2: What is Serverless?
Chapter 3: Where Does Ops Belong?
Chapter 4: Why Serverless?
Chapter 5: The Need for Ops
Chapter 6: The Need to Code
Chapter 7: The Work Operating Serverless Systems
Chapter 8: Build, Testing, Deploy, & Management Tooling
Chapter 9: Security
Chapter 10: Cost, Revenue, & FinDev
Chapter 11: Epilogue

Chapter 1: Introduction

"WHAT DO WE DO WHEN THE SERVER GOES AWAY?"

When I built my first serverless application using AWS Lambda, I was excited right from the start. It gave me the opportunity to spend more time building my application and less time focusing on the infrastructure required to run it. I didn't have to think about how to get the service up and running, or even ask permission for the necessary resources. The result was an application that was running and doing what I needed more quickly than I had ever experienced before.

But that experience also led me to ponder my future. If there were no servers to manage, what would I do? Would I be able to explain my job? Would I be able to explain to an employer (current or prospective) the value I provide? This was why I started ServerlessOps.

The Changing Landscape of Operations

I have seen, and personally been affected by, a shift in the operational needs of an organization due to changes in technology. I once sat in a meeting in which engineering leadership told me there would come a day when my skills would no longer be necessary. When that happened — and they assured me it would be soon — I would not be qualified to be on the engineering team any longer.

Now is the right time for us to begin discussing what operations will be in a serverless world. What happens if we don't?
It will be defined for us.

At one end of the spectrum, there are people proposing NoOps, where all operational responsibilities are transferred to software engineers. That view exposes a fundamental misunderstanding of operations and its importance. Fortunately, larger voices are already out there countering that attitude. At the other end, there are people who believe operations teams will always be necessary and the status quo will remain. That view simply ignores the change that has been occurring over the past several years. If DevOps and public cloud adoption haven't affected your job yet, it's only a matter of time. Adopting a they'll-always-need-me-as-I-am-today attitude leaves you unprepared for change.

Somewhere in between those views, an actual answer exists. Production operations, through its growth in complexity, is expanding and changing shape. As the traditional problems we deal with today become abstracted away by serverless, we'll see engineering teams and organizations change. (This will be particularly acute in SaaS product companies.)
But many of today's problems — system architecture design, deployment, security, observability, and more — will still exist. The serverless community largely recognizes the value of operations as a vital component of going serverless successfully. Operations never goes away; it simply evolves in practice and meaning. Operations engineers and their expertise still possess tremendous value. But, as a community, we will have to define a new role for ourselves.

Starting the Conversation

In this ebook, we will discuss:

• Operational concerns and responsibilities when much of the stack has been abstracted away
• A proposed description of the role of operations when serverless

This ebook is a start at defining what I see as the role for operations in a serverless environment. I don't believe, however, that it's the only way to define the role. I think of operations in the context of SaaS startup companies. It has been a while since I worked on traditional internal IT projects or thought of engineering without a more product growth-oriented mindset. My problems and experiences aren't necessarily your problems and experiences. This is the start of a conversation.

Personal Biases

As you read this, keep a few things in mind. What I discuss on a technical level is very Amazon Web Services (AWS) centric. This is just a matter of my own experience and the cloud platform I'm most familiar with. You can apply these same ideas to serverless on Microsoft Azure or Google Cloud. What I write, however, assumes public cloud provider serverless and not private platforms. The effects of public cloud serverless are more far-reaching and disruptive than private cloud serverless.

In addition, I've worked primarily in product SaaS companies and startups for the past several years. My work has contributed toward the delivery of a company's primary revenue-generating service. But you can take many of these lessons and reapply them. Your customer doesn't need to be external to your organization. They can just as easily be
your coworker.

With all that in mind, here's what I see as the future of serverless operations.

Chapter 2: What is Serverless?

"YES, SERVERLESS HAS SERVERS."

Before we can explain the impact of serverless on operations engineers, we need to be clear about what we're discussing. Serverless is a new concept and its meaning is still vague to many people. Even more confusing, people in the serverless world can disagree on what the word means. For that reason, we're going to establish what we mean by serverless.

What Is Serverless?

To start, let's give a brief explanation of what serverless is. Serverless is a cloud systems architecture that involves no servers, virtual machines, or containers to provision or manage. They still exist underneath the running application, but their presence is abstracted away from the developer or operator of the serverless application. Similarly, if you've adopted public cloud virtualization already, you know the underlying hardware is no longer your concern.

Serverless is often, incorrectly, reduced to Functions as a Service (FaaS). It's viewed as just another step in the Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Containers as a Service (CaaS) evolution. But it's more than that. You can build serverless applications without a FaaS component such as AWS Lambda. For example, you can have a web application composed of HTML, CSS, graphics, and client-side JavaScript, hosted with AWS CloudFront and S3, and it's a serverless application.

So what makes something serverless? What would make a simple web application serverless but an application inside of a Docker container not?
These four characteristics are used by AWS to classify what is serverless. They apply to serverless cloud services and applications as a whole. You can use these characteristics to reasonably distinguish what is and what is not serverless.

• No servers to manage or provision: You're not managing physical servers, virtual machines, or containers. While they may exist, they're managed by the cloud provider and inaccessible to you.

• Priced by consumption (not capacity): In the serverless community you often hear, "You never pay for idle time." If no one is using your service, then you aren't paying for it. With AWS Lambda you pay for the time your function ran, as opposed to an EC2 instance where you pay for the time the instance runs as well as the time it sits idle.

• Scales with usage: With non-serverless systems we're used to scaling services horizontally and vertically to meet demand. Typically, this work was done manually until cloud providers began offering auto-scaling services. Serverless services and applications have auto-scaling built in. As requests come in, a service or application scales to meet the demand. With auto-scaling, however, you're responsible for figuring out how to integrate the new service instance with the existing running instances, and some services are easier to integrate than others. Serverless takes care of that work for you.

• Availability and fault tolerance built in: You're not responsible for ensuring the availability and fault tolerance of the serverless offerings provided by your cloud provider. That's their job. That means you're not running multiple instances of a service to account for the possibility of failure. If you're running RabbitMQ, then you've set up multiple instances in case there's an issue with a host. If you're using AWS SQS, then you create a single queue. AWS provides an available and fault-tolerant queuing service.

Public vs Private Serverless

Increasingly, all organizations are becoming tech organizations in one form or
another. If you're not a cloud-hosting provider, then cloud infrastructure is undifferentiated work: work a typical organization requires to function. One of the key advantages of serverless is the reduction in responsibilities for operating cloud infrastructure. It provides the opportunity to reallocate time and people to problems unique to the organization. That means greater emphasis up the technical stack on the services that provide the most direct value in your organization. Serverless also allows for faster delivery of new services and features. By removing infrastructure as a potential roadblock, organizations can deliver with one less potential friction point.

There are public cloud provider serverless options from Amazon Web Services (AWS), Microsoft Azure, and Google Cloud, as well as private cloud serverless offerings like Apache OpenWhisk and Google Knative, which are both for Kubernetes. For the purposes of this piece, we're only considering public cloud serverless, and we use AWS examples.

We only consider public cloud serverless because, to start, private cloud serverless isn't particularly disruptive to ops. If your organization adopts serverless on top of Kubernetes, then the work of operations doesn't really change. You still need people to operate the serverless platform.

The second reason we only consider public cloud serverless is more philosophical. It goes back to the same reasons we largely don't consider on-prem "cloud" infrastructure in the same light as public cloud offerings. On-prem cloud offerings often negate the benefits of public cloud adoption. The same is true of public versus private serverless platforms. Private serverless violates all four characteristics that make something serverless: you still have to manage servers, you pay regardless of platform use, you still need to plan for capacity, and you're responsible for its availability and fault tolerance. More importantly, many of the benefits of serverless are erased. There's no reduction of
undifferentiated work. No reallocation of people and time to focus further up the application stack. And infrastructure still remains a potential roadblock.

I've made the argument previously that serverless infrastructure results in operations teams not making sense anymore. A monolithic operations team won't be as effective because it will be gradually minimized in the service delivery process. Ops teams should be dissolved and their members redistributed to cross-functional feature development teams.

The same issues of being bypassed in the service delivery process will happen to security professionals. Arguably, it already happens to them in many organizations. I'm not sure where security professionals belong in the future. But this is an important topic for that community to address, as I've been addressing the role of ops.

Chapter 10: Cost, Revenue, & FinDev

"LAMBDA COMPUTE IS HIGHLY ELASTIC, BUT YOUR WALLET IS NOT."

While many engineers may not be that interested in cloud costs, the organizations that pay our salaries are. Cost is ever present in their minds. The consumption-based — as opposed to provisioned capacity-based — cost of serverless creates new and unique challenges for controlling cloud spend. As your application's usage grows, your costs grow directly with it. But if you can track costs and revenue down to the system or function level (a practice called FinDev), you can help your organization save money, and even grow.

Additionally, as we progress up the application stack, and therefore up the value chain, those of us in product or SaaS companies will look at revenue numbers for our team's services. We typically treat cost alone as an important metric, when really it's the relationship between cost and revenue that matters more. An expensive system cost can be offset or made irrelevant by revenue generation. As members of product pods tasked with solving business problems, we'll be responsible for whether or not what we've delivered
has performed.

Cost Isn't the Only Consideration

To start, focusing on cost alone isn't helpful. It leads to poor decision-making because it's not the entirety of your financial picture. Costs exist within the context of a budget and revenue. If your cost optimization results in only fractions of a percent in savings on your budget, you have to ask if the work was worthwhile. Saving $100 a month matters when your budget is in the hundreds or low thousands per month. But it doesn't really matter when your budget is in the hundreds of thousands per month.

You have to spend money to make money, as well. Your revenue generation from a product or feature could also make your cost optimization work largely irrelevant. If your product is sold on a razor-thin margin, then cost efficiency is probably going to count. But if it's a high-margin product, then you're afforded a degree of latitude in cost inefficiency. That means you're going to have to understand to some degree how your organization works and how it generates revenue.

Stop looking at your work as just a cost, but as work that generates part of your organization's revenue, too! After all, how long does it take to start realizing cost savings once you factor in developer productivity?
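That question can be made concrete with a simple payback calculation. The following is a minimal sketch, not a formula from this ebook; the hourly rate and the savings figures are hypothetical placeholders. It estimates how many months of cloud savings it takes to repay the engineering time spent on an optimization.

```python
def payback_months(monthly_savings, eng_hours, hourly_rate=100.0):
    """Months until a cost optimization repays the engineering time
    invested in it. hourly_rate is a hypothetical loaded labor cost."""
    if monthly_savings <= 0:
        return float("inf")  # the work never pays for itself
    return (eng_hours * hourly_rate) / monthly_savings

# Saving $100/month after two engineer-weeks (80h) of optimization work:
print(payback_months(100, 80))   # 80.0 months -- probably not worthwhile

# Saving $5,000/month for the same effort pays back in under two months:
print(payback_months(5000, 80))  # 1.6 months
```

If the payback horizon is longer than the expected life of the system, the optimization was never really a saving at all.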
Serverless Financial Work

Serverless uses a consumption model for billing, where you only pay for what you use, and a change in cost month over month may or may not matter. A bill going from $4 per month to $6 per month doesn't really matter. A bill going from $40k per month to $60k per month probably will. You can begin to see the added billing complexity that serverless introduces.

Cost should become a first-class system metric. We should be tracking system cost fluctuations over time. Most likely we'll be tracking system costs correlated with deploys, so when costs jump we'll have the context to understand where we need to be looking.

Let's start with the immediate financial challenges of serverless. Its consumption-based pricing, and the resulting variability, creates some new and interesting challenges.

Calculating Cost

To start, it can be difficult to determine what is a suitable cost to run serverless, as opposed to in a container or on an EC2 instance. There's a lot of conflicting cost information. You'll find everything from serverless being significantly less expensive to serverless being significantly more expensive. The fact is, there's no simple answer. But that means there's useful work for you to do.

Determining the cost benefit of moving a few cron jobs, Slack bots, and low-use automation services is easy. But once you try to figure out the cost of a highly used complex system, the task becomes harder. If you're attempting a cost analysis, pay attention to these three things:

• Compute scale
• Operational inefficiency
• Ancillary services

When it comes to compute, start by ensuring you're comparing apples to apples as well as you can. That means first calculating what the cost of a serverless service would be today as well as, say, a year from now based on growth estimates. Likewise, make sure you're using EC2 instance sizes you would actually use in production, as well as the number of instances required. Next, account for operational inefficiency. An EC2-based
service may be oversized for a variety of reasons. You may need only one instance for your service, but you probably have more for redundancy. You may have more than you need because of traffic bursts, or because someone has not scaled down the service from a previous necessary high. Finally, think about ancillary services on each host: how much do your logging, metrics, and security SaaS providers cost per month? All of these will give you a more realistic approach to cost.

"This will cost you more than $5 per month on EC2 because you cannot run this on a single t2.nano with no metrics, monitoring, or logging in production."

The major cloud providers release case studies touting the cost savings of serverless, and I've had my own discussions with organizations. I've seen everything from "serverless will save you 80 percent" to "serverless costs 2X as much." Both could be true statements, but the devil is in the details. Does the organization that saved so much money have a similar architecture to yours? Is the 2X cost a paper calculation or one supported by accounting?
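As a sketch of that sort of analysis, here is a rough back-of-the-envelope comparison. The Lambda per-GB-second and per-request rates are AWS's long-published figures from the era when Lambda billed in 100ms increments (AWS has since moved to per-millisecond billing, so check current pricing); the traffic volume, EC2 hourly price, redundancy factor, and ancillary per-host cost are hypothetical placeholders. The point is the shape of the calculation: serverless is billed per request, while EC2 is billed for provisioned capacity plus redundancy plus per-host ancillary services.

```python
def lambda_monthly_cost(requests, avg_billed_ms, memory_gb,
                        gb_second_price=0.0000166667,
                        request_price=0.0000002):
    """Consumption-based estimate: pay only for billed compute time."""
    gb_seconds = requests * (avg_billed_ms / 1000.0) * memory_gb
    return gb_seconds * gb_second_price + requests * request_price

def ec2_monthly_cost(instances_needed, hourly_price,
                     redundancy_factor=2.0, ancillary_per_host=30.0):
    """Capacity-based estimate: pay for provisioned instances
    (including redundancy) plus per-host logging/metrics/security SaaS."""
    hosts = instances_needed * redundancy_factor
    return hosts * hourly_price * 730 + hosts * ancillary_per_host

# 5M requests/month billed at 200ms on a 512MB function, versus one
# required instance at a hypothetical $0.0416/hour, doubled for redundancy:
print(round(lambda_monthly_cost(5_000_000, 200, 0.5), 2))  # about $9.33
print(round(ec2_monthly_cost(1, 0.0416), 2))               # about $120.74
```

Change the request volume by an order of magnitude or two and the comparison can flip, which is exactly why there is no simple answer.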
The organization that gave me the 2X calculation followed up: operational inefficiency and ancillary service costs ate up most of their savings, to the point where they considered serverless and EC2 roughly even. For that reason, they now require a cost analysis and a convincing argument for any decision not to go serverless.

Preventing Waste

Next, let's talk about how to keep from wasting money. Being pay-per-use, reliability issues have a direct impact on cost. Bugs cost money. Let's look at how with a few examples.

Recursive functions, where a function invokes another instance of itself at the end, can be a valid design choice for certain problems. For example, attempting to parse a large file may require invoking another instance to continue parsing data from where the previous invocation left off. But anytime you see one, you should ensure that the loop will exit, or you may end up with some surprises in your bill. (That has happened to me. Ouch.)

Lambda has built-in retry behavior, as well. Retries are important not just for building a successful serverless application, but for a successful cloud application in general. But each retry costs you money.

You might look at your metrics and see a function has a regular rate of failed invocations. You know from other system metrics, however, that the function eventually processes events successfully, and the system as a whole is just fine. While the system works fine, those errors are an unnecessary cost. Do you adjust your retry rate or logic to save money?
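One defensive pattern for the recursive case is to carry an explicit depth counter in the event and refuse to recurse past a hard ceiling. This is a minimal sketch, not code from this ebook: the event shape, the MAX_DEPTH value, and the injected invoke callable are all assumptions (in a real Lambda you would pass a call to boto3's Lambda invoke API instead of a local stub).

```python
MAX_DEPTH = 50  # hard ceiling; a hypothetical safety margin

def handler(event, context=None, invoke=None):
    """Parse one chunk of a large file, then re-invoke self to continue.
    The depth counter guarantees the recursion terminates even if the
    'done' condition is buggy -- an unbounded loop bills every invocation."""
    depth = event.get("depth", 0)
    if depth >= MAX_DEPTH:
        raise RuntimeError("recursion ceiling hit -- investigate, don't pay")

    offset = event.get("offset", 0)
    next_offset = offset + event["chunk_size"]

    if next_offset < event["file_size"]:
        # More data remains: hand off to the next invocation.
        invoke({"offset": next_offset, "depth": depth + 1,
                "chunk_size": event["chunk_size"],
                "file_size": event["file_size"]})
    return {"processed_through": next_offset}

# Simulate the chain locally with a stub invoker:
calls = []
def fake_invoke(event):
    calls.append(event)
    handler(event, invoke=fake_invoke)

handler({"offset": 0, "chunk_size": 100, "file_size": 450}, invoke=fake_invoke)
print(len(calls))  # 4 follow-on invocations to cover 450 bytes
```

The guard trades a loud failure for a silent, ever-growing bill, which is usually the right trade.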
Before you start refactoring, take some time to weigh the potential savings from a refactor against its cost in time and effort.

There's also potential architectural cost waste that can occur. If you're familiar with building microservices and standardizing on HTTP as a communication transport, then your first inclination may be to replicate that using API Gateway and Lambda. But API Gateway can become expensive. Does it make more sense to switch to SNS, or to a Lambda fan-out pattern (where one Lambda directly invokes another Lambda function), for inter-service communication? There's no easy answer to that question, but someone will have to answer it as your team designs services.

Application Cost Monitoring

We should be monitoring system cost throughout the month. Is the system costing you the expected amount to run? If not, why? Is it because of inefficiency, or is it a reflection of growth? The ability to measure cost down to the function level — and potentially the feature level — is something I like to call application cost monitoring. To start, enable cost allocation tags in your AWS account. Then you can easily track cost-per-function invocation and overall system cost over time. Overlay feature deploy events with that data and you can understand system cost changes at a much finer level.

Picture how you scale a standard microservice running on a host or hosts. Your options are to scale instances either horizontally or vertically. When scaling horizontally, you're adjusting the number of instances that can service requests. With vertical scaling, you're adjusting the size of your instances, typically in relation to CPU and/or memory, so that an instance has enough resources to service a determined rate of requests. When the system falls outside of spec in terms of performance, you right-size it by scaling in the appropriate direction.

Each feature or change to a microservice's codebase usually has only a minimal effect on cost. (I say usually because some people like to ship
giant PRs and watch their services be ground into dust under real load, requiring frantic scaling.) An additional new feature does not have a direct effect on cost unless it results in a change in scaling the service vertically or horizontally. It's not individual changes, but the aggregate of them over time, that affects cost.

But with serverless systems it's different. System changes are tightly coupled with cost. Make an AWS Lambda function run slightly slower and you could be looking at a sizable jump in cost. Based on AWS's pricing of Lambda, where duration is rounded up to the nearest 100ms, a function that goes from executing with an average duration of 195ms costs roughly 45% more if it starts averaging 205ms. Additionally, increasing function RAM raises cost, but it might also result in shorter invocation durations, so you could end up saving money. And these calculations don't even take into account the situations where a system is reconfigured and new AWS resources such as SNS topics, SQS queues, and Kinesis streams are added or removed.

As you can see, cost needs to become a first-class system metric with serverless. We also need tools to help us model cost changes to our serverless systems. And if cost monitoring and projection inform us about money already spent or about to be spent, the next topic brings together money spent, money allocated, and money generated.

The Future Into FinDev

Application cost monitoring helps yield a new practice popularized by Simon Wardley called FinDev. We're going to be including more than just cost and budgets in our engineering decisions. If we can track cost down to the system or even function level, can we take this a step further and track revenue generation down to that level? If we can, then we can include revenue, either existing or projected, in addition to cost and budgets to form a fuller financial picture of engineering effort and productivity.

What Is FinDev?
This requires bridging finance, PM (either product or project management), and engineering (both at the practitioner and leadership levels), at a minimum. We want to track cash flow through the organization so engineering efforts can be directed toward making decisions with the greatest business impact. This efficiency has the potential to become a competitive advantage over those who are unable to prioritize their engineering time toward providing the most value.

It starts with tracking revenue into an organization and mapping it back to engineering work and running production systems. If there's a change in revenue month over month, then why? Has revenue picked up because a new feature or service has gone to production? Has it decreased because a service is failing to perform adequately? With this relationship established, we can assign monetary values to systems. And now we can look at our technical systems with a much fuller financial picture surrounding them.

We can also now establish a feedback loop in our organization. Revenue data and system value should be made available to engineering leadership and PMs in order to prioritize work properly. You have two serverless systems exhibiting issues; which do you prioritize? If there's a significant difference in value between the systems, prioritize the more valuable one. If you're evaluating enhancements, the question becomes even more interesting. Does your team prioritize generating more money out of an existing service, or does it prioritize enhancing an underperforming service?

Last but not least, engineering and PMs should close the loop by measuring success. Has engineering returned a system to its previous financially performant state? Has work decreased or increased revenue? Has a change increased revenue while its cost has eroded profit margin?
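To make that prioritization concrete, here is a toy sketch of the kind of bookkeeping FinDev implies. Nothing here is a real tool or API: the system names, revenue and cost figures, and the margin-based prioritization rule are all hypothetical illustrations of attaching dollar values to systems.

```python
def margin(system):
    """Monthly profit contribution attributed to a system."""
    return system["revenue"] - system["cost"]

def prioritize(systems):
    """Order degraded systems by the monetary value at risk (margin),
    highest first -- one possible FinDev-style prioritization rule."""
    return sorted(systems, key=margin, reverse=True)

systems = [
    {"name": "report-exporter", "revenue": 800.0, "cost": 350.0},
    {"name": "checkout-api", "revenue": 42_000.0, "cost": 1_200.0},
]

# checkout-api carries far more value at risk, so it gets attention first:
print([s["name"] for s in prioritize(systems)])
# ['checkout-api', 'report-exporter']
```

The rule itself is debatable; the point is that once systems carry dollar values, prioritization becomes a calculation you can argue about with data.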
These are all interesting questions to spur the next cycle of engineering work. Keep in mind, for many of the preceding questions there is no right answer. The data doesn't make decisions for you. It helps you make more informed decisions.

Applying FinDev Ideas Today

It should be noted that assigning value to systems and prioritizing work is not an entirely new concept. Many organizations already assign dollar values to systems and prioritize work based on that value. But the more granular we can get with assigning value, the more granular we can get within an organization's hierarchy for prioritizing work.

Already, good product companies attempt to measure the success of what they deliver through metrics like user adoption and retention. Their next step is to understand revenue generation down to the technical level. But even in non-product companies there will be room to apply these principles. For example, IT delivers a service that reduces friction in the sales process. Does that service lead to increased revenue for your organization?
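As a toy illustration of mapping revenue back to services, suppose (hypothetically) that each sale event is tagged with the service that fulfilled it; summing per tag yields the per-system revenue view described above. The event shape and service names are invented for the example.

```python
from collections import defaultdict

# Hypothetical sale events, each tagged with the fulfilling service:
sales = [
    {"service": "quote-generator", "amount": 250.0},
    {"service": "checkout-api", "amount": 99.0},
    {"service": "quote-generator", "amount": 125.0},
]

def revenue_by_service(events):
    """Roll individual revenue events up to the owning system."""
    totals = defaultdict(float)
    for e in events:
        totals[e["service"]] += e["amount"]
    return dict(totals)

print(revenue_by_service(sales))
# {'quote-generator': 375.0, 'checkout-api': 99.0}
```

The hard part in practice is not the summation but the tagging: deciding which system gets credit for a dollar of revenue.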
Imagine being able to answer that question through analytics before you even understand the reason why. Then picture how absurd it might be to deprecate a revenue-generating service using cost alone as a justification.

More To Define

There's still a lot of room to figure out what the new processes around FinDev, and their implementation, will be. Those processes will be highly dependent on your organization and business. In addition, how to tie revenue generated down to the system or function level is still a largely unanswered question. There's nothing out there in the market attempting to do that. The processes and practices for combining finance with engineering are still a growing and evolving area. Keep in mind, this is all closely aligned with why we adopted DevOps: to become more efficient and provide more value to our organizations. FinDev is just an extension of that, and a new space that serverless opens up.

We're Not There Just Yet

Admittedly, right now we're talking about small dollar amounts, and the cost saving over EC2 may already be enough not to care at the feature level. In the future, however, as we become accustomed to the low costs of serverless, we will care more. Similarly, many organizations still invest heavily in undifferentiated engineering. But as companies that focus more heavily on their core business logic through serverless begin to excel, we'll see more organizations become interested in achieving that level of efficiency.

Chapter 11: Epilogue

I wrote this ebook for the same reasons I started ServerlessOps. When I first started building serverless applications, I was immediately struck by how much I liked it, and by how it was truly something different in technology, unlike anything we had seen since public cloud appeared. However, I soon became apprehensive. I knew this would have an impact on my profession, and it brought up a familiar feeling. My career has not been untouched by the progress of technology; I have been negatively affected by it before. It's an experience I do not
wish to repeat.

After spending more time building and working with serverless applications, and given the success I've had, I don't fear serverless displacing me anytime in the future. In fact, I've found it a way to enhance my operations role and responsibilities. And the time spent focusing less on infrastructure and more on the code to solve my problems has leveled up my coding skills in ways I never expected.

On finishing this ebook, I'm far less worried about the place of operations as organizations adopt serverless. I've shown that there is a place for operations people, as there is still much engineering work to be done by us. But there's still work to be done educating and preparing operations engineers and their organizations to make them successful with serverless. It starts with a cultural and individual mindset shift that values the delivery of results most of all. Achieving those results requires adopting an engineering cadence that iteratively delivers work, obtains feedback, measures results, and is ready to course correct or pivot when needed. On the technical side, we need to level up the cloud architecture and coding skills of many in the operations field. Effort, combined with guidance and training, makes succeeding not just doable, but something we can excel at.

Good luck!

About ServerlessOps

Started in 2018, ServerlessOps provides advisory and technical services around DevOps transformation and AWS serverless cloud infrastructure. Our value-delivery-focused approach starts by working with leadership to identify needs, determine goals, and prioritize work. Taking advantage of the velocity serverless provides, we work with engineering to ensure not just delivery but also the meeting of established goals.

Our range of services includes:

• DevOps Cultural Transformation
• AWS Serverless Migration & Adoption
• AWS Serverless Training
• Startup Cloud Operations

See the ServerlessOps services page to learn more.