From the course: LPIC-2 Linux Engineer (201-450) Cert Prep
Capacity planning - Linux Tutorial
- So, how do you know if your server has enough horsepower to do the job? Let's talk about it. - [Announcer] You are watching ITProTV. (upbeat music) - Well, hello and welcome to LPIC-2. Very excited to be kicking off this course with the one and only Don Pezet. Don, how are you, sir? - I am doing great. Ready to dive into the world of capacity planning. You know, a lot of us buy hardware without really knowing its capabilities, like how many connections it can handle. So, in this episode, we're going to talk about capacity planning: the different things we need to measure, how we can measure them, and how we can get an estimate of exactly what our server, or any kind of hardware, is capable of handling. - So, what are the key things we need to watch when measuring performance? - All right, one thing to remember here is that laptops, desktops, and servers all really have the same hardware at the end of the day, right? They all have the same basic components. They might be from different vendors, and they'll certainly be at different price points. They might have different capabilities as far as the amount of work they can handle, but at the end of the day, they are the same components. So, we actually measure our servers the same way we measure our desktops and laptops. It's nice and consistent. We're always paying attention to what are called the big four of performance, right? The first one is the CPU, your central processing unit. The CPU is like the brain of your computer. It's what's doing all the computational work. I shouldn't say all, because sometimes we do have other things, like GPUs on graphics cards, that take some of the workload off. You might have network cards with TCP offload engines that take a bit off too. But in general, the CPU is doing the bulk of the general-purpose work on your system. And then we have memory, our RAM, right?
RAM is the fast random access memory where your system can store data and then reach out and grab it again as quickly as possible. Now, RAM is expensive, so we typically also have storage for writing data to disk that is slower but a lot more cost effective. And that's where our SSDs and our spinning disks come in. Now, we don't necessarily care about what the disk is. What we care about is how many input and output operations we can perform in a given period of time, and that's referred to as disk I/O. So, CPU, memory, disk I/O, and then the last one, network I/O, your network adapter. You can only send so much data in and out of your network card. Maybe you have a 100 Mbps connection or a gigabit connection, or 10 Gbps, 80 Gbps, who knows what you've got on your system. So, that's going to impact how much your system can do. Now, we can't just focus on one of these individually. We have to look at all of them, right? All four together determine exactly how much work your server, your laptop, or your desktop is capable of performing. And the type of workload you throw at a system may vary. You may have a service like a database. Databases are typically very RAM and disk I/O intensive. They're typically not using a lot on the network side, and often not even a lot on the CPU side. It's really all about the data storage and retrieval process. But then you take a look at a web server, which is very active on the CPU and RAM side. So, your workload is going to determine which of these metrics are more or less important to you. - Well, can we measure those all from within one tool? - You can, yeah.
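Each of the big four has a raw source on Linux. The kernel exposes cumulative CPU time in /proc/stat, memory in /proc/meminfo, per-device disk activity in /proc/diskstats, and per-interface network counters in /proc/net/dev, and tools like top and sar build their reports from files like these. As a minimal sketch, assuming a kernel that reports MemAvailable (3.14 or newer), here is a hypothetical Python helper that turns /proc/meminfo into a single "percent of RAM in use" number. This is one reasonable definition of "used". top and sar each compute their own memory columns, so don't expect an exact match.

```python
def memory_used_percent(meminfo_text: str) -> float:
    """Percent of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # most lines look like "<number> kB"
    total = values["MemTotal"]
    available = values["MemAvailable"]  # kernel's estimate of memory usable without swapping
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    # Linux only: /proc/meminfo is generated by the kernel each time it is read.
    with open("/proc/meminfo") as f:
        print(f"memory in use: {memory_used_percent(f.read()):.1f}%")
```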
And there are a couple of tools we're going to look at in this episode, and really in the next four episodes after this, that help us either be really surgical and look at one specific metric, or be a little more holistic and look at the picture as a whole. Whenever I'm just getting started, say somebody calls me up and says, "Don, I've got this server, and it's just slow." All right, well, I've never looked at the server before. I've never looked at the workload. I might not even know the application they're running. I may have never heard of it, right? So, I need some place to start, and I always try to start with some general-purpose tools that give you a big-picture view of the server. And one of the most useful ones out there is a tool called top. Now, if you've made it all the way here to LPIC-2, the odds are you've heard of the top command. But even though top is a basic, general tool, it's still really useful for finding out where we need to get started for troubleshooting or capacity planning on a system. So, let's take a look at top here on my laptop. I'm going to go ahead and just fire up the top utility. It's installed by default. In fact, I'll go out on a limb here: I have never encountered a Linux install that didn't have the top command. Even minimal installs typically have it. And so, you can just run top, and it will take you into a dynamic, real-time view of your system. And as you look across what it's showing you, there's a lot of information. We see information about our CPU right here, our memory utilization here, our virtual memory. We're getting a high-level view of how our system is performing. And this is a great place to start, because you can come in and say, "Well, who is using the most CPU right now? Who is using the most RAM?" And you start to figure out whether we have a problem or whether these are normal tolerances. We need to interpret that.
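The CPU percentages top shows are computed, not read directly. The kernel keeps cumulative counters of CPU time (in clock ticks) in /proc/stat, and a monitoring tool samples them twice and compares the differences. The Python sketch below shows that technique. The helper names are hypothetical, and treating iowait as "not busy" is a choice made here (top reports it separately as wa).

```python
import time


def read_cpu_times(stat_text: str) -> list[int]:
    """Return the counters from the aggregate "cpu" line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # "cpu0", "cpu1", ... are the per-core lines
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_busy_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent busy between two samples of /proc/stat.

    Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    Only the first eight are summed, because the kernel already counts guest
    time inside user and nice. idle and iowait count as "not busy".
    """
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas[:8])
    not_busy = deltas[3] + deltas[4]
    return 100.0 * (total - not_busy) / total if total else 0.0


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = read_cpu_times(f.read())
    time.sleep(1)  # take a second sample one second later
    with open("/proc/stat") as f:
        second = read_cpu_times(f.read())
    print(f"CPU busy over the last second: {cpu_busy_percent(first, second):.1f}%")
```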
Now, most people just come right into top, look at this data, and then get out. But there's actually a lot you can do in here. For example, by default it's sorted by CPU. So, right now I'm looking and I can see who my number one CPU consumer is. For a moment there it was my virtual machine tools, because this is a VMware virtual machine. And then I see GNOME Shell jumping up, and even top itself is registering as one of the top consumers of CPU. But if I were to start doing something like launching Google Chrome, well, Chrome is going to jump up right there as my number one CPU consumer. So, we can monitor that activity in real time. But it's all focused on CPU by default, because that is what a lot of people are looking for. Well, if I'm concerned about memory, I can just tap "M" on the keyboard, and now it's sorting by memory consumption, and I can see who's consuming the most memory. And actually... Did I make a mistake there? I did. I just tapped "m." I have to remember it's a capital "M." The lowercase "m" actually does something a little bit different, if you watch the top of my screen. See how it's changing the little bars up here to give me a graphical representation of my memory usage? So, that's kind of neat, though I generally find it pointless. But capital M is what sorts by the memory column, and now I can see it. Now, the reason I didn't remember it was a capital M is, honestly, there are a ton of keyboard shortcuts in here, and I don't remember most of them. The one I do remember is the important one, though, which is Shift + F. If you hit Shift + F, it gives you a menu where you can pick any field to sort by. And so, this is what I normally do. If I want to sort by memory, I hit Shift + F, come in, and pick memory, or I can pick CPU, and I can switch back and forth. But you'll see there's all sorts of other stuff in this list, and we can come in and add those fields to our screen so we see them.
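Pressing capital M in top sorts by %MEM, which is based on each process's resident memory (RES). As a rough, Linux-only sketch of the same ranking, the hypothetical Python function below reads the VmRSS line from each /proc/&lt;pid&gt;/status file and sorts the processes largest first. It illustrates where the numbers come from. It is not a replacement for top.

```python
import os


def rss_by_process(proc_root: str = "/proc") -> list[tuple[int, str, int]]:
    """Return (pid, name, resident kB) for each process, biggest resident set first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        rss = fields.get("VmRSS")
        if rss is None:
            continue  # kernel threads have no VmRSS line
        results.append((int(entry), fields["Name"].strip(), int(rss.split()[0])))
    return sorted(results, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for pid, name, rss_kb in rss_by_process()[:10]:
        print(f"{pid:>7}  {rss_kb:>10} kB  {name}")
```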
We can move them around and change their sort order. You can build a custom layout for top that shows exactly the information that you want, in the order that you want it. It's easy to overlook, but once you've got it rearranged, you can actually save this layout (W writes it to top's configuration file), and top brings it back the next time you launch it. Also, you know, top is an old tool, so by default it's black and white; it's not very fancy. If you hit z, it flips into a color mode, and Shift + Z lets you customize the colors, and W saves those along with the rest of your layout. You can really do a lot with top if you want. And some people have done that. You'll find some systems have a tool called htop available. And htop is really the same idea, but it's been pretty-ized, is that a word? (both laughing) Prettified. - Prettified. - Prettified. - It's a new word for us right here, "prettified." You heard it right here from Don Pezet. - And so, htop was written by Hisham Muhammad, and the "h" comes from his first name; that's why it's called htop. And he basically laid it out so it looked nicer and got you to some of the more critical information a little quicker. So, it's available. I will tell you htop is not installed by default on most systems, and I'm hesitant to use it on servers, because when you install htop, it brings additional dependencies along with it, and I always want to keep a minimal attack surface on my servers. So, it's just another tool that's available. Again, it gives you a general view of your server and how it's performing right now. - So, that takes care of real-time monitoring. But what if we need historical data? - All right, so if you want historical data, top does not do that, right? I'm seeing data right now. Top can give me some averages, like what you can see here. I've got the load average up here, which shows the average load over the last 1, 5, and 15 minutes. So, at most, that gives me 15 minutes of history.
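top's load average line shows three numbers: the averages over the last 1, 5, and 15 minutes. That short window is why it can't stand in for long-term history. The kernel publishes these values in /proc/loadavg, and Python also exposes them through os.getloadavg(). Below is a minimal Linux-only sketch. The parse_loadavg helper is hypothetical. Comparing the numbers with the CPU count is a common rule of thumb, but keep in mind that Linux also counts tasks waiting in uninterruptible sleep, typically on disk I/O.

```python
import os


def parse_loadavg(text: str) -> dict:
    """Parse /proc/loadavg, e.g. "0.52 0.58 0.59 1/389 12345"."""
    one, five, fifteen, tasks, last_pid = text.split()[:5]
    runnable, total = tasks.split("/")
    return {
        "1min": float(one),
        "5min": float(five),
        "15min": float(fifteen),
        "runnable": int(runnable),    # scheduling entities currently runnable
        "total_tasks": int(total),    # processes and threads that exist right now
        "last_pid": int(last_pid),    # most recently created PID
    }


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    print(f"load averages: {load['1min']} {load['5min']} {load['15min']} on {cpus} CPUs")
    print("os.getloadavg() returns the same three numbers:", os.getloadavg())
```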
But what if I want days or weeks, right? If I'm doing capacity planning, I need to know when my server is the busiest. Is it busiest on Thursdays or Tuesdays? Is it busier at 8:00 PM than at 6:00 PM? I need to be a little more specific, and I need this data over time. If you want to monitor over time, an easy way to do that is with a utility called sar, S-A-R. It's the System Activity Reporter, and it works with a collector that can run in the background in Linux, gather performance data, and log it so we can go and look at it later on. And it can track memory, CPU, disk I/O, and network I/O. The four big metrics that we want to watch, it can track all of those. It is not installed by default on most systems, and it's part of a package that's not called sar; it's called sysstat. So, on an Ubuntu system, I would say sudo apt install sysstat. I'm glad I remembered the two S's. I almost always mess that one up. So, I'll run it. And like I say, I've already got it installed on mine. But once you install it, that will give you the sar utility. You can quickly verify whether you already have it by running sar -V, and it'll show you the version of sar that you have installed. And I can see mine says sysstat version 12.2.0. There it is, and it's very exciting. So, once you get sar installed, you can actually use it to start collecting that historical data. And it'll drop it into log files. In fact, I should already have some. If I take a look at /var/log, I should have a folder in here where it's storing that. It should be a sysstat folder, which I'm probably just looking right past. I don't see it, oh, there it is. So, let me go into sysstat, and I can see where it's created log files here. I see sa17, sa18, sar17. These are the log files where it's storing that performance data: the sa files hold the raw data for each day of the month, and the sar files are daily text reports generated from them. So, now I can see data over time. - Well, does sar start recording data automatically, or do we have to configure it?
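The sa17, sa18, and sar17 files follow sysstat's default naming. saDD holds the binary data collected on day DD of the month, and sarDD is a daily text report generated from it. Where these files live depends on the distribution: /var/log/sysstat on Debian and Ubuntu, and /var/log/sa on Red Hat-family systems. Here is a small sketch, assuming that default two-digit naming. The helper is hypothetical. You'd normally read the binary files with sar -f rather than opening them yourself.

```python
import os
import re

# Debian/Ubuntu write sysstat data to /var/log/sysstat; Red Hat-family systems use /var/log/sa.
CANDIDATE_DIRS = ("/var/log/sysstat", "/var/log/sa")

# Default naming: saDD is the binary data for day DD, sarDD the text report built from it.
DAILY_FILE = re.compile(r"^(sar?)(\d{2})$")


def sysstat_files(directory: str) -> dict:
    """Split a sysstat log directory into binary data files and text reports."""
    groups = {"data": [], "reports": []}
    for name in sorted(os.listdir(directory)):
        match = DAILY_FILE.match(name)
        if match:
            groups["data" if match.group(1) == "sa" else "reports"].append(name)
    return groups


if __name__ == "__main__":
    for directory in CANDIDATE_DIRS:
        if os.path.isdir(directory):
            print(directory, sysstat_files(directory))
```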
- You know, this is something that people get tripped up on. When you install sysstat, that's not enough, okay? I went ahead and installed mine early just so it would be able to collect some data here for the show. But in real life, if I had just run sudo apt install sysstat, it installs the tools, but they're not doing anything. We've got to actually enable the tool and give it an interval to run on, to tell it what we want to collect and monitor. So, you do need to turn it on. Performance data is only good if you've actually got it. If you install the tool and don't enable it, it's a real sinking feeling a week later when you go and look, and there's no data there. So, we need to get it enabled. Fortunately, it's not that big of a deal to turn on. With sar, there are actually a couple of different pieces of the tool involved. I keep talking about sar itself, which is the system activity reporter. That's what's going to show me the performance data. But in the background, there's actually a piece called sadc, S-A-D-C, the system activity data collector. And that's the thing that needs to be running, collecting data, and writing it into the log files. Here at the command line, the first thing we need to do to get this enabled is tell sysstat that it should be enabled. So, that's where I'm going to start. I'm going to do a sudoedit /etc/default/sysstat. When you install the sysstat package, this file should have been created for you. And when you open it up, there's a line that says ENABLED="true" for me, but it actually says ENABLED="false" by default, and you've got to change it to true. Once you change that to true, it knows that it needs to run. But how often is it going to run? Well, when you install the sysstat package, it creates a cron job as well, and we can modify that cron job to set our interval. So, let me get out of this file, and we'll go take a look at that.
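A forgotten ENABLED="false" is exactly the "sinking feeling a week later" problem, so the check is worth scripting. This Python sketch assumes /etc/default/sysstat uses simple shell-style KEY="value" lines like the one shown in the episode. sysstat_enabled is a hypothetical helper, not something the sysstat package provides.

```python
import shlex


def sysstat_enabled(config_text: str) -> bool:
    """True if a shell-style config sets ENABLED to "true" (the last assignment wins)."""
    enabled = ""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ENABLED":
            words = shlex.split(value, comments=True)  # drops quotes and trailing comments
            enabled = words[0] if words else ""
    return enabled == "true"


if __name__ == "__main__":
    with open("/etc/default/sysstat") as f:
        print("sysstat collection enabled:", sysstat_enabled(f.read()))
```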
If I go into /etc/cron.d, I can see there's a sysstat file in there. I didn't create that, right? The package created it for me. And I'll edit that. When we take a look at it, I can see this job right here, which is basically reaching out and collecting data on the system. And I can see the note on this one right here says, "Activity reports every 10 minutes everyday." That's the default. I've changed mine to run every minute, because I wanted to have a good bit of data to show you guys today. So, basically the way the command works is, you're calling the sa1 script, which runs the sadc collector. Now, Debian and Ubuntu based systems have modified theirs, so mine's actually called debian-sa1. If you're on a Red Hat based system, Arch Linux, or another distribution that sticks closer to the upstream sysstat package, you'll see sa1 and sa2 used here, not debian-sa1. How often the job runs is set by the cron schedule at the start of the line. The two numbers after debian-sa1 are the sampling interval in seconds and the number of samples to take each time it runs; they don't set the schedule. With the default schedule, it runs once every 10 minutes; with my change, it runs once every minute. And so, it's going to run at that interval, collect that data, and store it in the log files. So, as long as the cron job exists and ENABLED="true" is set in /etc/default/sysstat, it's going to be running and collecting that data, and we'll be able to make use of it. - So, where does sar store all this data? - Well, we already saw the location when I was in /var/log/sysstat. That's where it stores it. But you don't really have to worry about that, because sar actually generates reports for us so we can see the information. So, if I want to look at my CPU, memory, disk, or network, we can bring that up a couple of different ways. Now, I have a little mnemonic I've memorized for this, which is how I remember to use sar: urban, you know, "urban," like urban development or an urban location. If you drop the A out, you have U-R-B-N.
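In a cron.d entry, the schedule is the first five fields (minute, hour, day of month, month, day of week), followed by the user and the command. The numbers after sa1 or debian-sa1 are passed through to the collector as a sampling interval in seconds and a sample count. The Python sketch below expands the minute field to show how often the job fires. The entry string is modeled on Debian's default and should be treated as an assumption. The parser handles only the common forms (*, ranges, steps, and lists), not every cron extension.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field such as "*", "*/15", "5-55/10" or "0,30"."""
    minutes = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


# Modeled on Debian's default /etc/cron.d/sysstat entry; check the file on your own system.
DEFAULT_ENTRY = "5-55/10 * * * * root command -v debian-sa1 > /dev/null && debian-sa1 1 1"

if __name__ == "__main__":
    minute_field = DEFAULT_ENTRY.split()[0]
    runs = expand_minute_field(minute_field)
    # Assumes the hour field is "*", so the job fires in every hour of the day.
    print(f"runs at minutes {runs} of every hour, {len(runs) * 24} runs per day")
```

Changing the minute field to * would make it fire 60 times an hour, which is the once-a-minute setup described in the episode.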
URBN are actually the command-line options you need to use to get at the CPU, the memory, the disk I/O, and the network. So, if I want to see the CPU, I would say sar -u. "U" for CPU is how we remember it, I guess. So, I'll say sar -u. And when I run that, here comes my performance data for the CPU. I see an entry every minute, because that's how mine is set. I can see user applications, system, I/O wait, and idle time, and I can see exactly when my CPU is the busiest, and when it's doing next to nothing, which is honestly most of the time. I can see at 9:58 AM, it got a little bit busy, right? My system got up to 21%. I was probably doing automatic updates, or maybe that's when I booted up. I don't know. But I can scroll back and look at this information, and it's all in text form, so we can filter, sort, and collapse it however we want. There are additional command-line options we can use for that, such as -s and -e to limit the report to a start and end time. So, "u" is CPU, and "r" is memory. I just think "R" for RAM. So, if we do sar -r, now it's going to show me memory. Memory shows a little extra data, so the word wrap is making it look ugly here. If I shrink the text down, it looks a little better, but then it's almost impossible for you guys to read. So, I'll put it back to the normal size. There we go. Let me get to these columns. So, we've got kilobytes of memory free, kilobytes available, kilobytes of memory used, and the percentage of memory used. We get good detail here, again trending over time. How much memory was I using at a given time? It's all right there. Let's see, that was CPU and memory, so "u" and "r," and then "b" is for disk I/O. I don't know why "b" became disk I/O.
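Because sar's output is plain text, you can find the busiest moment with a script instead of scrolling. Here is a minimal Python sketch, assuming the standard sar -u column layout with %idle as the last column. The helper is hypothetical. Running sar under LC_ALL=C is a choice made here to get predictable 24-hour timestamps and a dot as the decimal separator.

```python
import os
import shutil
import subprocess


def busiest_interval(sar_u_output: str) -> tuple[str, float]:
    """Return (timestamp, busy %) for the sample with the lowest %idle in `sar -u` text."""
    best = None
    for line in sar_u_output.splitlines():
        fields = line.split()
        # Data rows carry the CPU name "all"; skip headers and the "Average:" summary.
        if "all" not in fields or fields[0].startswith("Average"):
            continue
        split_at = fields.index("all")
        timestamp = " ".join(fields[:split_at])  # "09:58:01 AM" or "09:58:01"
        busy = 100.0 - float(fields[-1])          # %idle is the last column
        if best is None or busy > best[1]:
            best = (timestamp, busy)
    if best is None:
        raise ValueError("no sar -u data rows found")
    return best


if __name__ == "__main__" and shutil.which("sar"):
    env = dict(os.environ, LC_ALL="C")  # predictable time and number formatting
    report = subprocess.run(["sar", "-u"], capture_output=True, text=True, env=env, check=True)
    when, busy = busiest_interval(report.stdout)
    print(f"busiest sample today: {when} ({busy:.1f}% busy)")
```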
I just think of bytes, but you have bytes in RAM too, so that doesn't always work, but I'll use it anyway. sar -b, and there we go. We can start to see some of the disk information: blocks read and blocks written. Sometimes these are called blocks in and blocks out. Sending blocks to the disk shows up as blocks written, and getting blocks back from the disk shows up as blocks read. And so, we can see how much disk activity we've got, again laid out by time. And if we see heavy usage, or these numbers consistently creeping up too high, now we know that a disk might be one of our bottlenecks, so we can identify that as well. And then network, I left off. The -n option needs a keyword telling sar which network statistics to show, so when I run sar -n on its own, it's going to complain. Yeah. So, we have to get a little more specific with sar -n. For example, DEV shows per-interface statistics, and I would normally follow -n with TCP or IP to see that traffic, but my log file just doesn't have that data. So, we'd have to do some extra configuration of the collector to be able to pull that down. But the sar utility is a great way to collect data over time, and then look at it and evaluate the trends on our system. - Are there any other tools that we can use to help us narrow down the source of a problem? - Yeah, there's a handful of other tools available, and it all comes down to what you want to do. There's troubleshooting like you described, Zach, and then there's capacity planning: looking ahead and trying to prevent a problem before it happens. If a problem's already happening, a lot of the tools we just saw can show us that, and we can evaluate it, or with sar we can see the trend. Maybe my memory usage is going up a little bit each day.
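The sar man page describes the bread/s and bwrtn/s columns of sar -b in blocks, where a block is a 512-byte sector. Converting them to MiB per second makes the numbers easier to compare with a disk's rated throughput. Here is a minimal Python sketch, assuming sar -b's usual layout. The exact columns vary between sysstat versions, so the parser looks columns up by name. Both helpers are hypothetical.

```python
SECTOR_BYTES = 512  # sar's "blocks" are 512-byte sectors


def blocks_to_mib(blocks_per_sec: float) -> float:
    """Convert blocks per second (as printed by sar -b) into MiB per second."""
    return blocks_per_sec * SECTOR_BYTES / (1024 * 1024)


def read_rates(sar_b_output: str) -> list[tuple[str, float, float]]:
    """Return (timestamp, read MiB/s, written MiB/s) for each sample in `sar -b` text."""
    names = None
    rows = []
    for line in sar_b_output.splitlines():
        fields = line.split()
        if "bread/s" in fields:
            # Header row: map columns by name, since the set differs across versions.
            names = fields[fields.index("tps"):]
            continue
        if (names is None or len(fields) <= len(names)
                or fields[0].startswith("Average") or "RESTART" in fields):
            continue
        values = dict(zip(names, fields[-len(names):]))
        timestamp = " ".join(fields[:-len(names)])
        rows.append((timestamp,
                     blocks_to_mib(float(values["bread/s"])),
                     blocks_to_mib(float(values["bwrtn/s"]))))
    return rows
```

Feed read_rates the text output of sar -b, for example captured with subprocess.run, to get read and write throughput for each sample.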
Well, I can plot that trend out and say, "What's it going to be in a month? What's it going to be in three months or six months?" And then evaluate when I need to upgrade, or whether my system's already past what it can handle. We can see a lot of that. Other tools that are handy: uptime is a nice one. Uptime shows you how long your system has been online. I powered my system up, and it's been up for apparently 18 minutes and 45 seconds. That's not very long, right? Oh, wait, actually that's 18 hours and 45 minutes. There we go, it's not seconds. So, mine's been up for almost a day, right? And this keeps counting for long periods of time if you've got a system that's been online for weeks or months. I have a Linux server at home that's been online for hundreds of days now. So, they can stay up for a long time. Now, why do we care about uptime? Well, if you have an application with a memory leak that is slowly bleeding memory away from your system, it'll get worse over time. If your system was recently rebooted, that leaked memory is freed for now, but the leak will build up again. So, knowing how long your system has been online is important when you interpret the numbers. We also have ps for listing our processes, and pstree for showing how processes are related to each other. We're going to take a better look at those in the CPU episode. Let's see, w. W is a neat command, because it shows you who's logged into the server. If you're seeing high utilization, you want to make sure nobody else is logged in and doing things, like performing updates, that affect your numbers and skew the results. You want to be aware of that. But at the end of the day, most of these tools don't matter much for capacity planning. For capacity planning, we need to understand what effect the user load has on our system. And that's going to come down to your application.
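In its simplest form, plotting a trend and asking where it will be in one, three, or six months is a straight-line fit. The Python sketch below uses ordinary least squares. The sample numbers in it are hypothetical, and a straight line is itself an assumption. Real usage may grow in steps, follow weekly cycles, or reset at every reboot, as with the memory-leak example above, so check any projection against fresh data.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(xs: list[float], ys: list[float], x_future: float) -> float:
    """Value the fitted straight line predicts at x_future."""
    slope, intercept = linear_fit(xs, ys)
    return slope * x_future + intercept


def days_until(xs: list[float], ys: list[float], limit: float):
    """Days after the last sample until the trend reaches `limit`; None if it isn't rising."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope - xs[-1]


if __name__ == "__main__":
    # Hypothetical daily averages of %memused (for example taken from sar -r), one per day.
    days = [0, 1, 2, 3, 4, 5, 6]
    mem_pct = [52.0, 52.8, 53.1, 54.0, 54.6, 55.3, 56.1]
    for ahead in (30, 90, 180):
        print(f"in {ahead:>3} days: {project(days, mem_pct, days[-1] + ahead):.1f}% used")
    print(f"reaches 90% in about {days_until(days, mem_pct, 90.0):.0f} days")
```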
So, let's say that I'm tuning an Apache web server. I'm running Apache here, and the Apache Software Foundation actually has a utility called ab, ApacheBench. And ab lets me load test my Apache server. So, I can come in and say, "Hey, I want you to send a number of requests." We'll do 10,000 requests. "And I want you to do 10 requests concurrently." So, this would be like 10 people on the site simultaneously sending 10,000 requests, or 1,000 each, against my server. Those are the -n and -c options, and then I point it at my server's URL. And once I get that running, it's probably going to run pretty quickly. Oh, what did I leave off? Oh, the trailing slash on the URL, there. So, it's going to run. There it goes, firing off those requests. I'm doing this all locally, so it's pretty fast. But now I can look at my metrics and see, how well did my server just handle 10 concurrent users? Let's bump that up and say, "What if it wasn't 10? What if it was 100 concurrent users?" And I run that, and now I can come in, look, and say, "All right, how did my server perform this time?" And now I know what impact 100 users has. I can plot that out and say, "Here's the impact of 200 users, 400 users, 1,000 users." And I can establish the upper limit of my server. What's the maximum number of users my server can handle, or what do I need to upgrade to be able to handle that? Was disk the bottleneck? Was RAM the bottleneck? I can find that using sar, top, and other utilities. - Fantastic information. I sure enjoyed that. Don, you're the top. - I try, I try. (Zach laughing) - Capacity planning, wonderful episode. Before we leave, anything else you'd like to say? - Well, in this episode, we looked at a lot of different stuff. The key things I want you to take away are the four main metrics that we monitor: CPU, RAM, disk, and network. We also learned about the sar utility, top, htop, ab, and a handful of others.
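The load-test sweep described in the episode (10,000 requests at 10 concurrent users, then 100, 200, 400, and 1,000) is easy to script. Here is a minimal Python sketch, assuming ab is installed (on Ubuntu it ships in the apache2-utils package) and that the target URL, a placeholder here, points at a server you're allowed to load test. It uses only ab's -n (total requests) and -c (concurrency) options and reads the "Requests per second" line from ab's report. Watch the server with top or sar while it runs to see which of the big four runs out first.

```python
import re
import shutil
import subprocess

RPS_LINE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def ab_command(url: str, requests: int, concurrency: int) -> list[str]:
    """Build an ApacheBench command line: -n total requests, -c requests in flight at once."""
    if concurrency > requests:
        raise ValueError("ab needs at least as many requests as concurrent clients")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def requests_per_second(ab_output: str) -> float:
    """Pull the 'Requests per second' figure out of ab's report."""
    match = RPS_LINE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line in ab output")
    return float(match.group(1))


if __name__ == "__main__":
    url = "http://localhost/"  # placeholder target; ab wants a path, hence the trailing slash
    if shutil.which("ab") is None:
        raise SystemExit("ab not found (on Ubuntu it ships in apache2-utils)")
    for clients in (10, 100, 200, 400, 1000):
        result = subprocess.run(ab_command(url, 10_000, clients),
                                capture_output=True, text=True, check=True)
        print(f"{clients:>5} concurrent: {requests_per_second(result.stdout):.1f} req/s")
```

When requests per second stops rising as concurrency goes up, you've found the knee in the curve, and the sar data from that run tells you which resource to upgrade.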
Use whatever tools are in your arsenal to evaluate the capacity of your systems. - Excellent, and there is a lot more in LPIC-2, so make sure you watch each and every episode. You'll be glad you did. Bye-bye for now. (upbeat music) - [Announcer] Thank you for watching ITProTV.