10 steps to running a better network (for sys admins)
Written by Jason   
May 05, 2007 at 12:00 AM

Being a system administrator for a large network can be one of two types of jobs: either you're a professional firefighter that does nothing but move from one fire to the next, or you're sitting back most of the time surfing Slashdot or Digg because your network simply runs the way it should.

Here are some thoughts I've put together over the years to help you get control over your network.

First, let's get some semantics out of the way: there are network admins and system admins. In my mind, network admins (mostly) focus on OSI layer 3 and below, while system admins are responsible for layer 4 and up. This article is written with system admins with many systems on a network in mind; I'll write something later for the network admins of the world.

Some of these recommendations are of course easier than others. Be aware that each one may not apply to all environments, just the majority. As usual with any advice you're pulling off the Internet, you're responsible for what you do with it; if you get yourself fired, don't come running to me.

Number 1: The Rule of Thirds

A very long time ago back in college, I was fortunate enough to have an instructor that taught me what has probably been the most influential thing in my career: the Rule of Thirds.

The rule simply states that it takes three things to be successful with any project in any IT environment: one third politics, one third technical ability, and one third knowing the environment. If any one of these is lacking, success becomes much more difficult to obtain.

Since I originally heard this and pondered its meaning as if it were spoken by a monk on a tall mountain somewhere, I've found that it's true 100% of the time. Before each and every project I approach, I do a quick mental check of the Rule of Thirds and see if any particular third needs to be improved before the project begins. Do I know the technology that's being implemented well enough? Is there resistance to this project by a rogue manager that isn't on board? Do I know the location of the DNS servers, the datasets, etc.?

It sounds simple, but it's easy to miss a third and, halfway into the project, start getting bogged down with problems. Run a quick mental check of the Rule of Thirds before the start of any major project and learn the nirvana of meeting your deadlines ahead of schedule.

Number 2: Define your problem

If you put a horse behind a cart in an attempt to pull it, odds are the cart isn't going very far. One of the quickest ways to derail any network is to start implementing new technology and features without any consideration for what problem you're trying to solve.

Never take a solution and go looking for a problem on your network; it doesn't work that way. Don't get me wrong, technology is cool and I'm the first to want to play with a new product, but that's what isolated test networks are for. That's what VMWare Workstation is for. That is not what your production network is for.

Do your users need a specific feature of an application? Will they even use it if you roll it out? Do they all need it? Each of these questions is important to answer. If it will take three weeks of writing scripts, installing software, and tweaking security permissions to get it right and no one will even use it, is that a successful deployment?

Technology is a tool. Tools are used to fix problems. Using a tool without a problem to solve will only guarantee you that you'll create an even bigger problem down the road. 

Number 3: Manage the hype

Recently several of my users have requested Office 2007. "Why?", I asked. "Ummm...", they responded.

Your users, managers, and even your CEO get hit with tech ads trying to peddle the latest software release, and you're the one they come to when they want it installed even though they may have no use for it.

The best way to deal with these users is to know more about the product than they do, and to show why the current solution works just as well. Staying ahead of the game with new software applications is part of being a good administrator, and knowing the differences between releases helps to ease the anxiety of some eager early adopters.

Explain the learning curve between the two and the expense incurred by your company to facilitate a request out of cycle like this. Be courteous and let them know that their request hasn't fallen on deaf ears, now is just not the time to implement something like this.

Occasionally a request may make sense; then by all means consider what they're asking. But many times it's simply your job to make sure they understand why they can't have the latest hip application, at least not yet.

Number 4: Manage expectations

If your email server were to die entirely, how long would it take you to be back up and running? Does your manager know how long? Do your users?

Managing management and user expectation is a critical part of maintaining a network of systems. If your budget is only good enough to supply 99% uptime and your managers and users are expecting 99.9999%, then you have a serious problem.
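
The gap between those two numbers is worth spelling out: 99% uptime allows roughly 87.6 hours of downtime per year, while 99.9999% allows barely half a minute. A quick sketch of the arithmetic (the function name is my own, not anything standard):

```shell
# Convert an uptime percentage into allowed downtime, in seconds per
# non-leap year (365 * 24 * 3600 = 31,536,000 seconds).
uptime_budget() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", (100 - p) / 100 * 365 * 24 * 3600 }'
}

uptime_budget 99       # about 315360 seconds, i.e. ~87.6 hours a year
uptime_budget 99.9999  # about 31.5 seconds a year
```

If your managers expect the second number on the first number's budget, that conversation is better had now than mid-outage.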

You should already be keeping your management in the loop, but do they really understand the disaster recovery scenarios? If not, make them understand at all costs, otherwise when a disaster does strike they'll be hovering over you like a black cloud.

Keep end users in the loop as much as possible as well. One of the most frustrating things for an end user is finding out the hard way that a particular system is down, with no idea when (or even if) it will be back up. Providing a high-level view of what is currently planned helps your end users know when to expect outages. During unplanned outages, disseminating a rough (and usually overstated) timetable can also provide a sense of relief: at the very least, they know someone is aware of the problem and working on it.

Number 5: Centralize your logging

Odds are good that if you're the person that installed a specific network application if something goes wrong with it next week you'll know what the problem is, or at least where to start looking. But what if someone else installed the application? What if you installed it, but it was eight months ago? What if the server that the logs are on simply won't start back up? What if the server had a security compromise and the logs have been erased?

Centralizing the logging of all the servers and network devices on your network makes finding critical information in a hurry much faster, simplifies troubleshooting, and provides a redundant, easily accessible location for log files. The best part is, it's easy to set up even in the Microsoft world: all it takes on Windows is a useful tool like the open source Snare client to convert your logs into syslog format, while in the *nix world syslog itself does the job. Forward these to a centralized logging server running syslog-ng and a solution is born. If that's not enough, you can throw a web frontend and search engine on top of it like Splunk, which is free (as in beer) for up to 500MB of logs a day.
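
As a rough sketch of the server side, a minimal syslog-ng configuration only needs a network source, a file destination, and a log path connecting them. The port choice, file layout, and macros below are assumptions to adapt, not gospel; the function just prints the suggested config:

```shell
# Print a minimal syslog-ng server config to adapt for your distribution.
# Assumptions: plain UDP syslog on 514, per-host daily files under /var/log.
central_syslog_conf() {
  cat <<'EOF'
source s_net { udp(ip("0.0.0.0") port(514)); };
destination d_hosts {
    file("/var/log/hosts/$HOST/$YEAR-$MONTH-$DAY.log" create_dirs(yes));
};
log { source(s_net); destination(d_hosts); };
EOF
}
```

On classic syslogd clients, forwarding is then a single line in /etc/syslog.conf, along the lines of `*.* @loghost`.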

Given the tools available and the huge payoffs involved, there's no reason anyone shouldn't be implementing some form of centralized logging.

Number 6: Virtualize your critical applications

Virtualization is a trend that's been growing in strength as the software to run it gets stronger. Products such as VMWare or Xen can separate the application from the hardware, making restoration of systems and migration to other physical hardware a far easier task in many cases. Given that VMWare Server and Xen are both free (beer, freedom respectively) there's no reason not to start thinking about some level of virtualization in every organization.

It's important to note that virtualization isn't for every application, but many servers in many organizations run at very, very low CPU utilization even at peak hours, making them great candidates for virtualization. Imagine backing up an entire machine as just a few files, and having the ability to revert to previous images rapidly - it's the stuff we were promised years ago by the OS vendors, finally delivered by third parties.

On top of that, VMWare recently released a tool that (among other functions) allows you to migrate a physical machine to a virtual one. As long as you have the extra disk space, why not keep a virtual backup copy of your critical machines?
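
As a sketch of that idea: once a guest is powered off (or has a consistent snapshot), a VMWare Server guest is just a directory of .vmx and .vmdk files, so a backup can be an ordinary copy. The function name and path layout here are my own, not anything from VMWare:

```shell
# Copy a powered-off guest's directory to a backup location.
# Assumed layout: one directory per guest, e.g. /var/lib/vmware/mailserver.
backup_vm() {
  vm_dir="$1"      # directory holding the guest's .vmx/.vmdk files
  backup_root="$2" # where backup copies should land
  name=$(basename "$vm_dir")
  mkdir -p "$backup_root/$name" &&
  cp -r "$vm_dir/." "$backup_root/$name/"
}
```

Run something like this from cron against each critical guest and you have a crude but restorable machine-level backup.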

Number 7: Automate your inventory tracking

It's pretty difficult to manage a system that you don't even know exists. But without fail, almost every environment I've worked in has had rogue machines somewhere on the network - be it someone's home laptop, a wireless access point, or even a forgotten print server. Getting a handle on rogue devices on your network is critical to maintaining a stable network.

Many times lack of a decent hardware and software inventory solution is caused by underfunding. In these cases I turn to the poor man's inventory solution: scripting. With a few properly written scripts it's possible to start getting some baseline metrics on what devices are logging on to your network.

In the Windows world it's fairly easy to pull some quick metrics from machines with the Sysinternals tool psinfo. Run this from a startup script on each machine and copy the output to a share on a central server, and you've got a really simple way to start gathering metrics from Windows:

@echo off
cls
echo A Simple Inventory Script
echo.
echo Please wait ...

REM First run only: grab psinfo.exe from the NETLOGON share
IF EXIST C:\Script GOTO INVENTORY
md C:\Script
net use v: \\<SERVERNAME>\NETLOGON
copy v:\psinfo.exe C:\Script
net use v: /delete

:INVENTORY
REM Dump system info and drop it on the central collection share
C:\Script\psinfo.exe > C:\Script\%COMPUTERNAME%.txt
net use x: \\<SERVERNAME>\drop$
copy C:\Script\%COMPUTERNAME%.txt x:\
net use x: /delete

Very simple, but at least it gets you started. By cross-referencing these systems against your DHCP address leases, you can start to find out which systems aren't logging into the domain.
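
That cross-reference can itself be a one-liner. A hedged sketch using comm(1), assuming you can export one hostname per line from both the inventory share and your DHCP lease list (file names here are made up for the example):

```shell
# Print hosts that hold DHCP leases but never produced an inventory file.
# $1 = file of inventoried hostnames, $2 = file of DHCP lease hostnames.
find_rogues() {
  sort -u "$1" > /tmp/inv.sorted
  sort -u "$2" > /tmp/dhcp.sorted
  comm -13 /tmp/inv.sorted /tmp/dhcp.sorted  # only in the DHCP list
}
```

Anything this prints is a machine on your network that your inventory knows nothing about, which is exactly the list worth investigating.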

In the *nix world things are a bit more complex. Are you using an NFS server to store shared files? A NIS server to centralize user accounts? Either of those can help you deploy a Perl or shell script that pulls together good hardware information and FTPs it to a central server.
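
As a minimal stand-in for such a script, a few stock commands already yield a useful per-host baseline; the function name and output layout below are my own invention, and shipping the file (scp, FTP, an NFS drop directory) is left to taste:

```shell
# Write a basic hardware/OS snapshot to <dir>/<hostname>.txt.
gather_inventory() {
  out="$1/$(hostname).txt"
  {
    echo "host:   $(hostname)"
    echo "kernel: $(uname -sr)"   # OS name and kernel release
    echo "arch:   $(uname -m)"    # machine architecture
    echo "disk:"
    df -hP /                      # root filesystem usage
  } > "$out"
}
```

Run it from cron on each box against a shared directory and you have the skeleton of a *nix inventory.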

Another solution for both Windows and *nix is to take a page from the security sector and run a network scanner such as Nessus (free, as in beer) against your IP address range. Be cautious with this and plan to do it during off hours, in case a critical system runs into problems with the scan.

Just because you don't have the money for an overpriced solution doesn't mean you can't have any inventory solution.

Number 8: Cover your @$$

I've reached a point in my career that when a manager or customer doesn't take my advice about a technical recommendation I make sure I get it in writing. The reason for this is simple: if I make a recommendation for a new hard drive array plus maintenance and the customer purchases the array without maintenance, if the array goes bad I refuse to let my reputation go bad with it.

I've seen more than a few IT managers trying to eke by on a shoestring budget just to increase their own bonus in the eyes of their boss. These types of managers can be difficult to deal with, but not impossible: have them sign off that they aren't taking your advice. I've found that when something goes wrong, nobody wants to be the one left twisting in the wind, and without this paper trail vindicating you, most of the time they'll be pointing fingers at you to shift the blame.

I've had customers encourage me to load unlicensed or improperly licensed software, violate fire codes in server rooms, and even 'hide' certain data that violated the law. Asking them to sign off on something like this tends to fix problems like these with very little hassle. Sure, you're not playing ball with them, but if they're asking you to do one small, illegal thing, it won't be long before the next one.

Remember, your reputation as a system admin is one of your most important job qualifications. 

Number 9: Prepare for the worst, hope for the best

Once a server arrives in many datacenters, it's quickly separated from the accompanying software. The software is then locked in a cabinet and quickly forgotten about; that is, until the server fails and the software is required, leading to a mad dash to find it.

Preparing your servers should be a detailed process. Many times (depending on the server case) it's possible to rip a copy of the setup CDs and actually place them inside the case in a sleeve; this way, they're never lost. Maintaining a description of the BIOS configuration along with a description of how the RAID array is configured is critical as well.

Routine maintenance such as defragmentation of the hard drives and registry is often overlooked at the server level. Make time for it by scheduling at least one or two days a month when all systems will be down for maintenance. At the very least, regularly scheduled reboots of servers tend to keep unexpected crashes from memory leaks to a minimum.
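
As one hedged example of scheduling such a window, a root crontab entry for a 3 AM reboot on the first Sunday of the month might look like this (note that cron ORs the day-of-month and day-of-week fields, so the Sunday check has to live in the command itself, and `%` must be escaped in crontab):

```crontab
# min hr dom mon dow  command (runs 03:00 on days 1-7; fires only on Sunday)
0 3 1-7 * * [ "$(date +\%u)" = "7" ] && /sbin/shutdown -r now "monthly maintenance"
```

Announce the window well in advance; a scheduled reboot nobody knew about is indistinguishable from an outage.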

Just as important is keeping some level of documentation of the applications and OS settings on each server. Keeping accurate documentation is a nearly thankless task, but it's crucial if you ever want to take a vacation. Otherwise, the week you're in the Bahamas you can guarantee someone will find you to let you know that they have no idea how to restore the server. Writing restore instructions for individual servers so simply that anyone could follow them is a difficult task, but a job saver should something fail while you're away.

Number 10: Stay open to new technologies

There are some things Microsoft does well. There are some things that Linux does well. There are even some things Apple does well. Each of these has a place in most environments, and each doesn't perform nearly as well as the others do in certain areas. Become familiar with the strengths and weaknesses of major OSes and applications in relation to your environment. What do they do well? What do they fail at miserably?

Running a 100% homogeneous environment can lead to vendor lock-in in the best case, and to terrible disasters such as viruses and worms in the worst. Getting hold of your company's pocketbook is the goal of every IT salesman on the planet; always remember that when you speak with one. Any other concerns you have are secondary to them.

If you're hosting a critical database that requires 99.9999% uptime, then certain OSes may be a better option than others, regardless of what the marketing information from each says. Financial limitations of the project may sway certain decisions as well. Which platform was the application designed for? Which is easiest to support with in-house IT personnel? Keep these things in mind as you decide, since there's a good chance you'll be the one supporting it in the long run.

 

Each of these 10 things has helped to keep my systems running and myself employed, and I hope they help you as well. Have a recommendation of your own? Let me know!

Last Updated ( Jun 29, 2007 at 10:10 PM )