High availability is the nirvana of network engineers. When everything is humming, monitoring screens are green, and no notifications are in their inbox, network managers can spend time dreaming about how big their Christmas bonuses might be this year. But an unforeseen problem can interrupt their dreams and pop their bubbles at any given moment. Better to be prepared by anticipating what could go wrong – before the worst happens.
If you really want to undertake a fascinating study of network vulnerabilities, have a look around the OWASP website, and you might even consider downloading their Top Ten list. There you’ll find all kinds of clues about the extent to which the bad guys try to infiltrate your critical data. To make it easy, we’ll list them here:
The problem is, of course, that a hacker only has to get it right once, but the network security guys have to get it right every time. Ninety percent is just not good enough. Fail to secure your network in just one of these areas, and your company can suffer real damage.
And it’s not only these web application issues. There are any number of security issues that could leave your network infrastructure exposed to potential catastrophe. To get an idea of the scope of network security management, just look at the objectives in the CompTIA Security+ exam. The broad areas include architecture and design, identity and access management, and cryptography. Developing a thorough and robust security management strategy is not an overnight task. It takes a lot of planning and consideration.
“Don’t touch a live network.” That’s what they say in the network operations centers anyway. Any work that is done to “improve” a network can quite possibly make it worse – and it could shut the network down altogether.
We’ve written about it in our post called “Decrease Downtime with Change Management”. You’re taking chances when you mess around with a live network. That’s why it’s best to perform network changes only through a well-developed change control program. A planned change should include such things as a risk assessment, cost and benefits, requirements, a step-by-step method of procedure, and a way to “roll back” the change if things go wrong.
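As a rough illustration of the "roll back" idea, here is a minimal Python sketch of a change plan that applies its steps in order and undoes the completed ones in reverse if any step fails. The class names, fields, and the VLAN example are hypothetical and not taken from any particular change management tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChangeStep:
    description: str
    apply: Callable[[], None]      # performs the step
    roll_back: Callable[[], None]  # undoes the step


@dataclass
class ChangePlan:
    title: str
    risk_assessment: str
    steps: List[ChangeStep] = field(default_factory=list)

    def execute(self) -> bool:
        """Apply steps in order; on failure, undo completed steps in reverse."""
        done: List[ChangeStep] = []
        for step in self.steps:
            try:
                step.apply()
                done.append(step)
            except Exception:
                for finished in reversed(done):
                    finished.roll_back()
                return False
        return True


# Hypothetical two-step change where the second step fails.
log: List[str] = []


def fail() -> None:
    raise RuntimeError("simulated failure")


plan = ChangePlan(
    title="Add VLAN 20 (hypothetical)",
    risk_assessment="low",
    steps=[
        ChangeStep("create VLAN", lambda: log.append("apply 1"), lambda: log.append("undo 1")),
        ChangeStep("assign port", fail, lambda: log.append("undo 2")),
    ],
)
succeeded = plan.execute()
```

In this sketch only the steps that actually completed are rolled back, so the failed second step is never "undone". A real method of procedure would also record who approved the change and when.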
Sloppy moves and changes put your network at risk. It’s especially galling when cowboy engineers take it upon themselves to fix perceived network problems – on their own, without authorization, and on the fly. It’s dangerous, and it could cost your organization a considerable sum.
Network surveillance is all about noticing faults and failures as quickly as possible, allowing for quick action to bypass or resolve the problem. But despite our best efforts, some things are missed. Some experts refer to these problems as “monitoring gaps”. “Many users have monitoring gaps between what the user is experiencing and what the system is telling them.”
In talking about monitoring gaps, these experts generally refer to problems with the monitoring tools themselves. But there’s more to consider here. The fact is that there are so many parts of a network that need monitoring that it’s easy to forget some important network element. There’s even a whole area of network management dedicated to the development of network monitoring and alarm capture tools.
In the language of network managers, every piece of the network that needs to be monitored – whether hardware or software – is called a managed object. But it’s not enough to place a network element into the pools of managed objects. You need to ask how the element will be monitored. What measurements will be taken? How often will measurements take place? What threshold of measurement will trigger the creation of an alarm notification?
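Those questions map naturally onto a small record per managed object. The following Python sketch is one possible shape, assuming a single numeric metric and a simple upper threshold. The field names and the example device are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ManagedObject:
    name: str               # which element is monitored
    metric: str             # what measurement is taken
    poll_interval_s: int    # how often it is measured, in seconds
    alarm_threshold: float  # value at or above which an alarm is raised

    def evaluate(self, value: float) -> Optional[str]:
        """Return an alarm message if the value crosses the threshold, else None."""
        if value >= self.alarm_threshold:
            return (
                f"ALARM {self.name}: {self.metric}={value} "
                f"(threshold {self.alarm_threshold})"
            )
        return None


# Hypothetical managed object: CPU load on a core router, polled every minute.
router_cpu = ManagedObject("core-router-1", "cpu_percent", 60, 90.0)
```

Calling `router_cpu.evaluate(95.0)` returns an alarm string, while `router_cpu.evaluate(50.0)` returns `None`. A production system would typically add severity levels and alarm clearing as well.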
Networks that are inadequately monitored can easily go off track. Imagine driving your car without a gas gauge. Sooner or later you’re going to run out of gas – because you don’t really know what’s in there. And modern cars send signals that trigger a check engine light, leading to further investigation of possible problems. You need these measurements and notifications to keep your car going. Not only do they detect immediate problems, they can predict an imminent failure too, before it is too late. The same is true for your network.
You’ve got to have a backup plan. Sometimes that means keeping running systems on standby in case the primary machines fail. The dictionary defines a failsafe as “a system or plan that comes into operation in the event of something going wrong”. That’s what redundancy is about. If your critical application runs on a server that happens to go offline, you would be wise to have another similar server available to take over the duties of the first one. In fact, the best option is to have one ready that assumes the role of the primary server automatically as needed.
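Automatic takeover can be reduced to a simple rule: send traffic to the first server in a priority list that passes its health check. The Python sketch below shows only that selection logic. The server names and the `health` table are made up, and a real deployment would replace the lookup with an actual probe.

```python
from typing import Callable, Dict, Optional, Sequence


def pick_active(servers: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy server in priority order, or None if all are down."""
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Hypothetical health table standing in for real health checks.
health: Dict[str, bool] = {"primary": False, "standby": True}
priority = ["primary", "standby"]

active = pick_active(priority, lambda name: health.get(name, False))
```

Here the primary is marked as down, so `active` is `"standby"`. Once the primary recovers, the same call hands the role back to it.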
For more information, check out our article “Redundancy: When Too Much is Just Right”. There you’ll find an interesting discussion on the concept of redundancy and how it applies to computer networks.
The real goal of network management is high availability. Redundant systems allow networks to continue to provide services with little or no interruption. The ideal is to have 100% availability – total uptime – for all the critical information systems in your network. High availability is only possible with redundant networks and systems.
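To put rough numbers on this, the sketch below estimates the availability of a group of redundant systems, under the simplifying assumption that they fail independently of one another (which real systems sharing power, software, or operators often do not). The 99% figure is an example value, not a measurement.

```python
import math
from typing import Iterable


def combined_availability(availabilities: Iterable[float]) -> float:
    """Availability of redundant systems where any one can carry the load.

    Assumes independent failures: the service is down only when every
    system is down at the same time.
    """
    p_all_down = math.prod(1.0 - a for a in availabilities)
    return 1.0 - p_all_down


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime over a 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24


single = 0.99                                 # one server at 99% availability
pair = combined_availability([0.99, 0.99])    # two such servers in parallel
```

Under these assumptions, a single 99% server is down about 87.6 hours a year, while a redundant pair comes out at roughly 99.99%, or a little under one hour a year. Redundancy moves you toward 100% but never quite reaches it.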
Welcome to the 21st century! OK, we’ve been there for a while. But you wouldn’t know it from the way some networks are run, especially in the age of Software Defined Networking. It’s amazing to see so many people still doing routine tasks by hand. We have these amazing devices – computers – at our fingertips, and they can do remarkable things, if only we put them to good use.
Make a list of all the tasks that your IT support team handles during a given week. Are they still doing manual backups? Are their eyes glazed over from eight hours of staring at network monitoring dashboards? Are they still typing single commands rather than running scripts?
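Even the manual backup on that list can be scripted in a few lines. This Python sketch copies a directory of configuration files into a timestamped folder. The directory layout and the `configs-` prefix are assumptions for illustration, and it is not meant as a complete backup strategy.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_configs(source_dir: Path, backup_root: Path) -> Path:
    """Copy source_dir into a new timestamped folder under backup_root."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"configs-{stamp}"
    # copytree creates the target folder (and missing parents) itself.
    shutil.copytree(source_dir, target)
    return target
```

Scheduled with cron or a similar tool, a script like this runs the same way every time, which is the point of handing routine work to the machine.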
It’s a fact of the workplace that over time machines assume the responsibilities of humans. It’s true that automation and robotics eventually replace people. That’s progress. You can’t fight it. It even happens to network engineers. The device configuration tasks that used to take them days, weeks, or months can now be completed in minutes with well-designed software-defined networking (SDN) tools.
We may as well get used to it. And the best approach is to take advantage of automation to handle routine tasks. Let the human engineers go on to other activities and leave the routine work to automation. Besides, humans do make mistakes — which brings us to our next point.
If you’re really trying to figure out the weakest link in your network, look no further than the human parts of the chain. We’re not saying that machines are better than humans. But it is true that some technical professionals are better than others.
In a 2016 NetworkWorld article, “Top reasons for network downtime”, Ann Bednarz cites a problem that we all know to be a real one: human error. A survey of 315 network professionals by the startup Veriflow tells the tale. “Nearly all respondents (97%) agree that human error is a cause of network outages,” says Bednarz. The extent of human culpability in network outages remains a matter for debate. But we have all heard stories.
Point is, there’s no substitute for hiring good people, and training them well. We know that mistakes will happen. We’re only human, after all. But if the same guy tends to keep making noticeable mistakes, whether big or small – well, maybe it’s time for a roster change.
Assessing your network for weaknesses is a never-ending job. What kind of process do you have for evaluating the state of your network infrastructure? Do you routinely look for weak links so that you can strengthen your network and address problems before they get too big? We can’t tell you what the weakest link is in your network. You’ll have to figure that out for yourself. We can, however, help you regain control over the public-facing side of things by routing traffic over our platform. Internally, the best we can do is encourage you to ferret out any threats to network availability sooner rather than later. Your company’s future may depend on it.