A system for monitoring resources and services of a local area network. Monitoring in corporate networks

Management and monitoring of the IT infrastructure is one of the main tasks of the IT department of any company. HP Software solutions simplify the work of system administrators and help organize effective control of the organization's network.

Modern IT infrastructure is a complex heterogeneous network that combines telecommunication, server and software solutions from different manufacturers, operating on the basis of different standards. Its complexity and scale dictate the use of advanced automated monitoring and control tools to ensure reliable network operation. HP Software products help solve monitoring tasks at all levels, from infrastructure (network equipment, servers and storage systems) to quality control of business services and business processes.

Monitoring systems: what are they?

In modern IT monitoring platforms there are three directions for developing and bringing monitoring to a new level. The first is called "The Bridge" ("umbrella system", "manager of managers"). Its concept is to leverage the investment in existing systems that monitor individual parts of the infrastructure and to turn those systems into sources of information. This approach is a logical evolution of conventional IT infrastructure monitoring. Typical prerequisites for implementing a "Bridge" are a decision by the IT department to consolidate disparate monitoring systems in order to move to monitoring IT services and systems as a whole; disparate systems that cannot show the overall picture; a serious application failure that went undiagnosed; an excessive number of warnings and alarms; and a lack of uniform coverage, prioritization and identification of cause-and-effect relationships.

The result of the implementation is automated collection of all available events and metrics of the IT infrastructure and correlation of their state with its impact on the "health" of each service. In the event of a failure, the operator gets a panel that displays the root cause of the failure along with recommendations for resolving it. For typical failures, a script can be assigned that automates the necessary operator actions.
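The correlation idea can be illustrated with a minimal sketch in Python (the component names and the dependency map are hypothetical stand-ins for a real resource-service model; this is not HP's correlation engine): a failing component is reported as a root-cause candidate only if none of the components it depends on are failing themselves.

    # Minimal sketch of root-cause selection over a resource-service model.
    # Component names are invented for illustration.
    deps = {
        "online-banking": ["app-server-1", "app-server-2"],
        "app-server-1": ["db-cluster"],
        "app-server-2": ["db-cluster"],
        "db-cluster": ["san-lun-7"],
        "san-lun-7": [],
    }

    def root_causes(failing, deps):
        """Failing components none of whose dependencies are failing themselves."""
        return [c for c in failing
                if not any(d in failing for d in deps.get(c, []))]

    failing = {"online-banking", "app-server-1", "db-cluster", "san-lun-7"}
    print(root_causes(failing, deps))    # -> ['san-lun-7']

In a real umbrella system the model is far richer and is kept up to date automatically, but the principle of suppressing symptom events in favour of the deepest failing resource is the same.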

The next trend is called Anomaly Analytics. Here, as in the first case, metrics and events are collected from a number of infrastructure monitoring systems, and in addition the collection of IT and security logs is configured. A huge amount of information thus accumulates every minute, and the company wants to extract value from the data at its disposal. There are a number of reasons for implementing Anomaly Analytics: the difficulty of timely collection, storage and analysis of all the data; the fact that unknown problems can only be addressed reactively; the inability to quickly pick out the information that matters for troubleshooting; the labour of manually searching individual logs; and the need to identify deviations and recurring failures.

The implementation of the system will allow for the automated collection of events, metrics and logs, storage of this information for the required period of time, as well as the analysis of any information, including logs, performance information and system data. In addition, it will be possible to predict and resolve any type of problem and prevent known failures.

And finally, "Application Performance Management": identifying and fixing failures in end users' transactions. This solution can be a useful addition that works closely with the previous two, and by itself it can also deliver a quick return on implementation. It suits companies that run business-critical applications, where the availability and quality of a service whose key element is an application (Internet banking, CRM, billing, etc.) are essential. When the availability or quality of such a service drops, what matters is proactive detection and quick recovery. Such a system is usually implemented when it is necessary to increase the availability and performance of application services and to reduce the mean time to recovery. This approach is also good for eliminating unnecessary costs and risks associated with service level agreements (SLAs) and for preventing customer churn (protecting the business).

Implementation results differ depending on the main task. In general, the solution allows typical user actions to be performed by a "robot" from different regions and network segments, "mirrored" traffic to be parsed, the availability and quality of services to be checked with bottlenecks identified, and the operator to be informed that operability needs to be restored, with an indication of where the degradation occurred. If necessary, it becomes possible to diagnose the application in depth to find the reasons for systematic deterioration of the services.

The above approaches can be implemented using HP Software products, which will be discussed below.

"Bridge" from HP

HP Operations Bridge represents the latest generation of umbrella monitoring systems. The solution combines monitoring data from its own agents, various HP Software monitoring modules, and third-party monitoring tools. The flow of events from all information sources is mapped onto the resource-service model, and correlation mechanisms are applied to determine which events are causes, which are symptoms, and which are consequences.

The resource-service model deserves separate mention, or rather models, since there can be an unlimited number of them for analyzing information from different angles. The solution's ability to correlate the flow of events depends on the completeness and currency of the model. To keep the models up to date, discovery tools based on agent and agentless technologies are used to obtain detailed information about service components, the relationships between them and their mutual influence. Service topology data can also be imported from external sources such as other monitoring systems.

Another important aspect is ease of management. In complex and dynamically changing environments it is important that the monitoring system is adjusted when the structure of the systems changes and new services are added. Operations Bridge includes the Monitoring Automation component, which automatically configures systems brought into the monitoring perimeter, using data from the service resource models. Reconfiguration and modification of previously made monitoring settings are also supported.

Whereas previously administrators had to apply identical settings to each infrastructure component of the same type (for example, metrics on Windows, Linux or UNIX servers), which took a lot of time and effort, threshold values for a metric can now be configured dynamically and centrally in the context of a service.

Application Analytics

The traditional approach to monitoring assumes that it is known in advance which parameters to track and which events to watch for. The growing complexity and dynamism of IT infrastructures force other approaches to be sought, as it becomes more and more difficult to control every aspect of system operation.

HP Operations Analytics collects and stores all data about the operation of an application: log files, telemetry, business and performance metrics, system events, and more, and applies analytical engines to identify trends and make forecasts. The solution brings the collected data to a single format and then, based on the log-file data, shows on a timeline what happened, at what moment and on which system. The product provides several forms of data visualization (for example, an interactive "heat map" and a topology of log-file relationships) and offers a search function that retrieves the entire data set collected for a given period, either in the context of an event or by a query entered in the search bar. This helps the operator understand what caused a failure (or, when HP SHA data is used together with HP OA data, predict one) and identify both the culprit and the root cause. HP Operations Analytics can reproduce the picture of the service and its environment at the time of the failure and isolate it in context and time.

Another analytical tool is HP Service Health Analyzer. HP SHA detects anomalous behavior of monitored infrastructure elements in order to prevent possible service outages or violations of the specified parameters of service delivery. The product uses special algorithms for statistical data analysis based on the HP BSM topological service-resource model. With their help it builds a profile of normal values of the performance parameters, collected both from software and hardware platforms and from other BSM modules (for example, HP RUM, HP BPM), that characterize the state of services. Typical parameter values are entered into these profiles taking into account the day of the week and the time of day. SHA performs historical and statistical analysis of the accumulated data (to understand the nature of what has been detected) and compares current values against the existing dynamic profile (baselining).
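The notion of a dynamic profile built per day of week and time of day can be shown with a short sketch (a deliberate simplification with an arbitrary three-sigma rule, not the HP SHA algorithm):

    from collections import defaultdict
    from statistics import mean, pstdev

    # history: list of (timestamp, value) samples of one metric,
    # where timestamp is a datetime.datetime object.
    def build_profile(history):
        buckets = defaultdict(list)
        for ts, value in history:
            buckets[(ts.weekday(), ts.hour)].append(value)
        # mean and standard deviation per (weekday, hour) bucket
        return {k: (mean(v), pstdev(v)) for k, v in buckets.items() if len(v) > 1}

    def is_anomalous(profile, ts, value, k=3.0):
        key = (ts.weekday(), ts.hour)
        if key not in profile:
            return False            # no baseline yet for this time slot
        mu, sigma = profile[key]
        return abs(value - mu) > k * max(sigma, 1e-9)

A value is flagged only when it falls outside the band that is normal for that particular weekday and hour, which is exactly what distinguishes baselining from a fixed threshold.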

Application performance monitoring

When it comes to monitoring application performance, the following components of the HP solution should be highlighted:
  • HP Real User Monitoring (HP RUM) - monitoring the transactions of real users;
  • HP Business Process Monitoring (HP BPM) - monitoring application availability by emulating user actions;
  • HP Diagnostics - tracing requests inside the application.
HP RUM and HP BPM measure application availability from an end-user perspective.

HP RUM parses network traffic and identifies the transactions of real users. It can monitor the data exchange between application components: the client part, the application server and the database. This makes it possible to track user activity and the processing time of various transactions, and to relate user actions to business metrics. With HP RUM, monitoring service operators receive prompt notifications about service availability problems and information about the errors users encounter.

HP BPM is an active monitoring tool that performs synthetic user transactions that are indistinguishable from real ones for monitored systems. Monitoring data from HP BPM is useful for calculating a real SLA, since the “robot” performs identical checks at the same time intervals, ensuring constant quality control of the processing of typical (or most critical) requests. By configuring probes to perform synthetic transactions from multiple locations (for example, from different company offices), you can also assess the availability of the service for different users, taking into account their location and communication channels. HP BPM uses the Virtual User Generator (VuGen) to emulate activity, which is also used in the popular load testing product HP LoadRunner. VuGen supports a huge range of different protocols and technologies, so you can control the availability of almost any service, as well as use a single set of scripts for testing and monitoring.
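The essence of such active monitoring can be conveyed with a small sketch that uses only the Python standard library; it is a simplified stand-in for a VuGen script, and the URL and the threshold are assumptions of the example:

    import time
    import urllib.request

    URL = "https://example.com/login"   # assumed endpoint of the monitored service
    THRESHOLD_S = 2.0                   # assumed SLA threshold for this step

    def probe(url=URL, timeout=10):
        """Run one synthetic transaction step and measure how long it takes."""
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                ok = 200 <= resp.status < 400
        except OSError:
            ok = False
        elapsed = time.perf_counter() - start
        return ok, elapsed

    ok, elapsed = probe()
    if not ok or elapsed > THRESHOLD_S:
        print(f"ALERT: step failed or too slow ({elapsed:.2f} s)")

Run from several locations on a fixed schedule, checks of this kind produce exactly the evenly spaced, comparable measurements that make SLA calculations meaningful.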
If the cause of service failures or slowdowns lies within technologies such as Java, .NET, and so on, HP Diagnostics can help.

The solution provides deep visibility into Java, .NET and Python applications on Windows, Linux and Unix platforms. The product supports a variety of application servers (Tomcat, JBoss, WebLogic, Oracle, etc.), middleware and databases. Specialized HP Diagnostics agents are installed on application servers and collect technology-specific data. For a Java application, for example, you can see which queries are being executed, which methods are being called, and how much time their processing takes. The structure of the application is mapped automatically, making it clear how its components interact. HP Diagnostics traces business transactions across complex applications, identifies bottlenecks, and gives experts the information they need to make decisions.

Distribution of HP solutions in

27.06.2011 Nate McAlmond

I selected three candidates: WhatsUp Gold Premium from Ipswitch, OpManager Professional from ManageEngine, and ipMonitor from SolarWinds. None of these network monitoring products costs more than $3,000 (for 100 devices), and each has a trial period during which you can test the product for free.

I work for a mid-sized company, and we have been using the same network monitoring system for about seven years. It provides our administrators with basic information about the availability of servers and services, and sends SMS text messages to our mobile phones when problems occur. I have come to the conclusion that we need to update the system, or at least add an effective tool that can provide better performance and detailed information about the health of the terminal servers, Exchange systems and SQL systems on our network. Let's compare our candidates.

Discovery process

To prepare for testing, the first step was to enable the SNMP service on all devices, including the Windows servers. By changing the SNMP service settings, I set read-only access on all devices that the monitoring process should cover. On Windows Server 2003/2000 systems, SNMP is installed using the Windows Components wizard in the Add/Remove Programs applet, and on Windows Server 2008, SNMP components are added using the Server Manager wizard. After completing the wizard, you need to launch the Services snap-in located in the Control Panel folder and configure the SNMP service - it's easy. Managed network devices such as firewalls, switches, routers and printers also support SNMP management, and the configuration process is usually fairly straightforward. For more information on SNMP, see "Simple Network Management Protocol" (technet.microsoft.com/en-us/library/bb726987.aspx).
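A quick way to confirm that a device really answers read requests after this configuration is a short script; the sketch below assumes the third-party pysnmp library and its classic synchronous API, and the address and community string are placeholders:

    # pip install pysnmp  (assumed; not part of any of the reviewed products)
    from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                              ContextData, ObjectType, ObjectIdentity, getCmd)

    errorIndication, errorStatus, errorIndex, varBinds = next(getCmd(
        SnmpEngine(),
        CommunityData('public', mpModel=1),         # read-only community, SNMPv2c
        UdpTransportTarget(('192.0.2.10', 161)),    # placeholder device address
        ContextData(),
        ObjectType(ObjectIdentity('1.3.6.1.2.1.1.5.0'))))   # sysName.0

    if errorIndication or errorStatus:
        print('SNMP read failed:', errorIndication or errorStatus.prettyPrint())
    else:
        for varBind in varBinds:
            print(' = '.join(x.prettyPrint() for x in varBind))

If the device name comes back, the monitoring systems will be able to poll the device with the same community string.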

Next, I installed all three monitoring systems on one of my two working systems with Windows XP SP3. Once installed, each system consisted of two parts: a database and a web server. Each of the selected systems can be managed through the web interface by multiple administrators, and you have the ability to configure accounts with different levels of access. Common to the three systems is that each user has the ability to add, remove and move panels in his workspace. Panels display data of the same type, such as CPU load or memory usage for different devices on the network.

Before starting the network scan (called the discovery process), I set the account parameters that each system should use to gain access to the devices discovered on the network. As shown in the comparison table, Ipswitch WhatsUp Gold Premium allows you to configure an account for SNMP, WMI, Telnet, SSH, ADO, and VMware services. ManageEngine OpManager Professional supports SNMP, WMI, Telnet, SSH, and URL, while SolarWinds ipMonitor supports SNMP, WMI, and URL.

After configuring SNMP service on network devices and accounts (Windows and SNMP) for each of the network monitoring systems, I started the discovery process for a range of IP addresses on my local network. All systems detected about 70 devices. Using the default scan settings, the systems under test performed well in identifying device types and provided detailed information on device status. All three systems contain sensors for basic device and server performance such as CPU utilization, memory utilization, disk utilization / fullness, packet loss / latency, status of Exchange services, Lotus, Active Directory, and all Windows services. Each of the systems had the ability to add sensors both for individual devices and for large groups of devices.

OpManager and WhatsUp Gold have an interface for identifying and collecting VMware service events from servers and guests. In addition, both products have a switch port manager polling feature that shows which devices are connected to different ports on the managed switches. The information obtained will help you determine which port on the switch connects to a particular business application, without the need to manually trace cables in server rooms. In the future, you can configure alerts for specific switch ports. When working with the OpManager package, to get the results of port polling, just select the switch and run the Switch Port Mapper tool - the system will return the results in a few seconds. A similar tool included with WhatsUp Gold is called MAC Address and must be run with the Get Connectivity option checked. WhatsUp Gold takes longer to get a result as it tries to scan devices and gather information about connections across the entire network.

Ipswitch WhatsUp Gold Premium

Ipswitch WhatsUp Gold Premium
PROS: provides the most accurate results of the three competitors; allows you to create your own sensors; offers comprehensive monitoring tools for VMware systems; integrates with AD.
CONS: fewer built-in sensors and a higher price than the competitors (for licenses under 500 devices).
GRADE: 4.5 out of 5.
PRICE: $7,495 for 500 devices; $2,695 for 100 devices; $2,195 for 25 devices.
RECOMMENDATIONS: I recommend WhatsUp Gold to IT departments serving large VMware environments or looking to build their own sensors.
CONTACT INFORMATION: Ipswitch, www.ipswitch.com

When working with ipMonitor and OpManager, I occasionally came across puzzling readings. In ipMonitor, the dashboards could display negative values when the CPU utilization dropped sharply. In another case, when the processor load was close to zero, ipMonitor sent me a notification that the processor was 11.490% utilized! OpManager, while tracking and sending me correct information about the disk usage of the domain controllers, in some cases did not include any of the controllers in the list of the 10 servers using the most disk space, even though an adjacent panel indicated that one of my domain controllers belonged not just in the top ten but in the top three. With WhatsUp Gold I never ran into such situations. WhatsUp Gold tracks per-core processor utilization in its dashboards, and when I compared its results with Windows Performance Monitor, they matched exactly for each core. Likewise, hard disk usage was reported correctly in all the relevant panels of the workspace.

WhatsUp Gold has a built-in sensor library that allows you to build new sensors from existing ones. Large organizations may find this capability useful because it allows you to create a single set of sensors to monitor different types of devices — the most efficient way to configure sensors for a group of devices.

Unlike the OpManager suite, which has its own sensors for Dell, HP and IBM devices, WhatsUp Gold does not provide vendor-specific sensors (with the exception of one for APC UPS units), but it does allow you to create Active Script sensors. This sensor type lets you develop your own monitoring routines in VBScript or JScript, and there is an online support center where WhatsUp Gold users can find and download ready-made scripts.

The only thing I would like to see improved in WhatsUp Gold is the interface (Figure 1), mainly because it is too linear. For example, it can take up to five clicks on Cancel and Close buttons to get from the Active Monitor Library window back to the workspace. WhatsUp Gold also lacks a sensor (unless you write one yourself) that checks the status of a web site, which may be necessary, especially when the site is hosted on a third-party server and there is no other way to access it.


Figure 1: The WhatsUp Gold Premium Interface

For situations where devices have been unavailable for some time, you can set notifications to be sent after 2, 5 and 20 minutes. In this way you can draw the administrator's attention to critical nodes that have failed to respond for a certain period of time.

WhatsUp Gold is the only system under review that has the ability to integrate into an LDAP environment - this can be crucial when choosing a solution for large networks.

ManageEngine OpManager

ManageEngine OpManager
PROS: the best user interface of the three products; more built-in sensors than the other two systems; the lowest price for licenses of 50 or fewer devices.
CONS: during testing, not all device readings were displayed correctly; it may take some debugging time to make the system fully functional.
GRADE: 4.5 out of 5.
PRICE: $1,995 for 100 devices; $995 for 50 devices; $595 for 25 devices.
RECOMMENDATIONS: IT departments looking to get the most out of the box (excluding AD integration) will appreciate OpManager Professional. When you buy licenses in the 26-50 device range, its cost is almost half the cost of the other two products.
CONTACT INFORMATION: ManageEngine, www.manageengine.com

After installing OpManager, I found that despite its myriad features it was easy to configure and easy to navigate. In addition to e-mail and SMS, OpManager can send direct messages to a Twitter account - a nice alternative to e-mail. Using a Twitter account this way keeps me abreast of what is happening on the network, but since my phone does not ring when Twitter messages arrive, I still want text notifications for the most important events. Twitter messages let me view threshold information for any server and thus keep a log of current events on the network, but I would not use this scheme for alerts about critical situations.

In addition to standard sensors, OpManager offers SNMP performance monitoring technologies developed by vendors for devices such as Dell Power-Edge, HP Proliant and IBM Blade Center. OpManager can also be integrated with the Google Maps API so you can add your devices to the Google map. However, you will need to purchase a Google Maps API Premium account (unless you plan to make your map publicly available) in accordance with the licensing terms for the free version of the Google Maps API system.

To handle situations where an administrator receives an alert but does not respond to it within a specified amount of time, OpManager can be configured to send an additional alert to another administrator. For example, an employee usually responsible for handling critical events for a particular group of servers might be busy or sick. In such a case, it makes sense to set up an additional warning that will attract the attention of another administrator if the first warning has not been viewed or cleared within a specified number of hours / minutes.

Among the three products under consideration, only the OpManager system had a section designed to monitor the quality of VoIP exchanges in the global network. To use VoIP monitoring tools, devices on both the source and destination networks must support Cisco IP SLA technology. In addition, the OpManager system, as shown in Figure 2, includes more sensors and dashboards than any competing product.


Figure 2: OpManager Professional Interface

SolarWinds ipMonitor

SolarWinds ipMonitor
PROS: an unlimited number of devices at a very low price; ease of use.
CONS: no mechanism for coordinating administrators' actions.
GRADE: 4 out of 5.
PRICE: $1,995; the number of devices is not limited (25 sensors are free).
RECOMMENDATIONS: if your budget is tight, you need to monitor a large number of devices, your monitoring does not require complex solutions, and you can live with informal coordination of administrators' actions, SolarWinds ipMonitor is your system.
CONTACT INFORMATION: SolarWinds, www.solarwinds.com

My first encounter with ipMonitor left me confused by the interface shown in Figure 3. It took me almost an eternity to find where the polling frequency of individual sensors is configured (by default the poll was performed every 300 seconds). However, after using ipMonitor for several weeks, I found the system extremely easy to use and quite capable of high-quality network monitoring. With ipMonitor you can configure the default scan so that any service or performance parameter will always be included in future scans. In addition to the standard sensors mentioned above, ipMonitor offers a Windows Event Log sensor that can send alerts when critical events are detected.


Figure 3: SolarWinds ipMonitor Interface

On the other hand, ipMonitor has no mechanism for tracking and assigning alerts. That does not matter if the company has a single network administrator, but larger IT departments are likely to consider the inability to acknowledge alerts, assign them to specific administrators and clear them a significant drawback. If administrators forget to coordinate outside the system, several of them may receive the same alert and start working on the same problem. However, such conflicts can be resolved with a consistent alert-response procedure: for example, if responsibility for network devices is divided among the administrators, there is no question of who should handle a particular problem.

Time to make a decision

I have already decided for myself which of the three products is more suitable for my environment. I settled on the ManageEngine OpManager with a 50-device license for several reasons.

First of all, I need to be able to track as many parameters of my environment as possible, since that is the best way to avoid unexpected failures, and here OpManager is definitely ahead of the competition. The second reason is budget: I can continue to use our old up/down monitoring tools for workstations and printers and thus avoid the cost of additional licenses. Finally, I really liked ManageEngine's approach of developing OpManager to take advantage of new technologies, and I think it is worth investing in the annual maintenance and support package, which allows you to download updates as the product evolves.

Nate McAlmond ([email protected]) is director of IT at a social services agency (MCSE, Security+ and Network+); he specializes in thin-client solutions and medical databases.



ABSTRACT

This document is the technical design for the development and implementation of a network monitoring system for the Verkhnyaya Pyshma city public data transmission network of Gerkon LLC. The project surveys existing network monitoring systems, analyzes the current situation at the enterprise, and substantiates the choice of specific components for the network monitoring system.

The document contains a description of design solutions and equipment specifications.

The result of the design is a set of developed solutions for the implementation and use of the system:

§ Full description of all stages of design, development and implementation of the system;

§ System Administrator's Guide, which includes a description of the system user interface.

This document presents complete design solutions and can be used to implement the system.

LIST OF GRAPHIC DOCUMENT SHEETS

Table 1 - List of sheets of graphic documents

1. Network monitoring systems - 220100 401000
2. Logical structure of the network - 220100 401000
3. Algorithm of the network monitoring and alerting core - 220100 401000
4. Structure of the network interface load analyzer - 220100 401000
5. Structure of the system event log collector - 220100 401000
6. Nagios interface - 220100 401000

LIST OF ABBREVIATIONS, SYMBOLS AND TERMS

Ethernet - a data transmission standard adopted by the IEEE. It defines how data is sent to and received from a shared transmission medium, serves as the underlying transport for various higher-level protocols, and provides a data rate of 10 Mbit/s.

Fast Ethernet - a 100 Mbit/s data transmission technology that uses the CSMA/CD method, just like 10Base-T.

FDDI - Fiber Distributed Data Interface - a fiber-optic interface for distributed data transmission; a technology for transmitting data at 100 Mbit/s using the token ring method.

IEEE - Institute of Electrical and Electronics Engineers - an organization that develops and publishes standards.

MAC address - Media Access Control address - the identification number of a network device, usually assigned by the manufacturer.

RFC - Request for Comments - a series of documents issued by the IETF that includes descriptions of standards, specifications, etc.

TCP/IP - Transmission Control Protocol / Internet Protocol.

LAN - Local Area Network.

OS - Operating system.

Software - Software.

SCS - Structured cabling system.

DBMS - Database Management System.

Trend - Long-term statistics that allows you to build a so-called trend.

Computer - Electronic computer.

INTRODUCTION

The information infrastructure of a modern enterprise is a complex conglomeration of different-scale and heterogeneous networks and systems. To keep them running smoothly and efficiently, you need an enterprise-scale management platform with integrated tools. Until recently, however, the very structure of the network management industry impeded the creation of such systems - the "players" in this market sought to lead by releasing products of limited scope, using tools and technologies that are not compatible with systems from other suppliers.

Today the situation is changing for the better - there are products that claim to be flexible in managing the entire variety of corporate information resources, from desktop systems to mainframes and from local area networks to Internet resources. At the same time, there is a realization that control applications must be open to solutions from all vendors.

The relevance of this work is due to the fact that in connection with the proliferation of personal computers and the creation of automated workstations (AWS) on their basis, the importance of local computer networks (LAN) has increased, the diagnosis of which is the object of our study. The subject of the research is the main methods of organizing and diagnosing modern computer networks.

"Local network diagnostics" is a process of (continuous) analysis of the state of an information network. In the event of a malfunction of network devices, the fact of the malfunction is recorded, its place and type are determined. The fault is reported, the device is shut down and replaced with a backup.

The network administrator, who is most often responsible for diagnostics, should begin studying the particulars of his network already at the stage of its formation, i.e. he should know the network diagram and have a detailed description of the software configuration, indicating all parameters and interfaces. Special network documentation systems are suitable for recording and storing this information. Using them, the system administrator will know in advance all the possible "hidden defects" and "bottlenecks" of his system, so that in an abnormal situation he will know whether the problem lies in the hardware or the software, whether a program is damaged, or whether the error was caused by operator actions.

The network administrator should remember that from the point of view of users, the quality of the application software on the network is decisive. All other criteria, such as the number of data transmission errors, the degree of utilization of network resources, equipment performance, etc., are secondary. A "good network" is a network whose users do not notice how it works.

Company

The pre-graduation internship took place at Gerkon LLC, in the support department, in the position of system administrator. The company has been offering Internet access services in the cities of Verkhnyaya Pyshma and Sredneuralsk over Ethernet and dial-up channels since 1993 and is one of the first Internet service providers in these cities. The terms of service are governed by a public offer and regulations.

Scientific and production tasks of the division

The support department solves the following range of tasks within a given enterprise:

§ technical and technological organization of providing Internet access via dial-up and dedicated channels;

§ technical and technological organization of wireless Internet access;

§ allocation of disk space for storing and ensuring the operation of sites (hosting);

§ support for mailboxes or virtual mail server;

§ placement of client equipment at the provider's site (colocation);

§ lease of dedicated and virtual servers;

§ data backup;

§ deployment and support of corporate networks of private enterprises.

1. NETWORK MONITORING SYSTEMS

Despite the many techniques and tools for detecting and troubleshooting faults in computer networks, network administrators still find themselves on shaky ground. Computer networks increasingly include fiber-optic and wireless components, which render traditional technologies and tools designed for conventional copper cables useless. In addition, at speeds above 100 Mbit/s traditional diagnostic approaches often fail even when the transmission medium is ordinary copper cable. However, perhaps the most significant change in computer networking that administrators have had to face has been the inevitable shift from shared-media Ethernet to switched networks, in which individual servers or workstations often act as switched segments.

True, some old problems solved themselves as technology changed. Coaxial cable, in which electrical problems have always been harder to troubleshoot than in twisted pair, is becoming a rarity in corporate environments. Token Ring networks, whose main problem was their dissimilarity to Ethernet (rather than any technical weakness), are gradually being replaced by switched Ethernet. Protocols that generate numerous network-layer error messages, such as SNA, DECnet and AppleTalk, are giving way to IP. The IP stack itself has become more stable and easier to maintain, as millions of clients and billions of Web pages on the Internet have proven. Even die-hard Microsoft detractors have to admit that connecting a new Windows client to the Internet is much simpler and more reliable than installing legacy third-party TCP/IP stacks and separate dial-up software.

Although many modern technologies complicate troubleshooting and performance management, the situation could have been even worse had ATM become widespread at the PC level. It also helped that in the late 1990s several other high-speed data exchange technologies were rejected before gaining acceptance, including 100 Mbit/s Token Ring, 100VG-AnyLAN and advanced ARCnet networks. Finally, the US rejected the very complex OSI protocol stack (which, however, was mandated by a number of European governments).

Let's take a look at some of the topical issues faced by enterprise network administrators.

The hierarchical topology of computer networks, with Gigabit Ethernet backbones and dedicated 10 or even 100 Mbit/s switch ports for individual client systems, has increased the maximum bandwidth potentially available to users by at least a factor of 10-20. Of course, most computer networks still have bottlenecks at the server or access router level, so the bandwidth actually available per user is significantly less than 10 Mbit/s. Therefore, replacing a 10 Mbit/s hub port with a dedicated 100 Mbit/s switch port for an end node does not always bring a significant speed increase. However, considering that the cost of switches has recently fallen, that most enterprises have Category 5 cabling that supports 100 Mbit/s Ethernet, and that installed network cards can operate at 100 Mbit/s after a simple reboot, it becomes clear why the temptation to modernize is so hard to resist. In a traditional shared-media LAN, a protocol analyzer or monitor can examine all traffic on a given network segment.

Fig. 1.1 - Traditional shared-media LAN with a protocol analyzer

Although the performance advantage of switched networking is sometimes subtle, the proliferation of switched architectures has had disastrous consequences for traditional diagnostics. In a highly segmented network, protocol analyzers are only able to see unicast traffic on a single switch port, as opposed to a legacy network where they could scrutinize any packet in the collision domain. In such conditions, traditional monitoring tools cannot collect statistics on all "dialogues", because each "talking" pair of endpoints uses, in essence, its own network.

Fig. 1.2 - Switched network

In a switched network, a protocol analyzer can only "see" a single segment at one point if the switch is unable to mirror multiple ports at the same time.

To maintain control over highly segmented networks, switch vendors offer a variety of tools to restore full network "visibility", but there are many challenges along the way. Switches currently shipping usually support port mirroring, in which the traffic of one port is duplicated to a previously unused port to which a monitor or analyzer is connected.

However, "mirroring" has several disadvantages. First, only one port is visible at a time, so it is very difficult to identify problems that affect multiple ports at once. Second, mirroring can degrade the performance of the switch. Third, the mirror port usually does not reproduce physical layer faults, and sometimes VLAN designations are even lost. Finally, in many cases, full duplex Ethernet links may not be fully mirrored.

A partial solution for analyzing aggregated traffic parameters is to use the monitoring capabilities of mini-RMON agents, which are built into every port of most Ethernet switches. Although mini-RMON agents do not support the Capture object group of the RMON II specification, which provides full protocol analysis, they do give insight into resource utilization, error rates and multicast volume.

Some of the drawbacks of port mirroring can be overcome by installing "passive taps", such as those produced by Shomiti. These devices are pre-installed Y-connectors that allow the actual signal to be monitored by a protocol analyzer or other device.

The next pressing issue is the specifics of optical links. Network administrators usually use specialized optical diagnostic equipment only to solve problems with the optical cables themselves. Common off-the-shelf SNMP- or CLI-based device management software can diagnose problems on switches and routers with optical interfaces. Only a few network administrators ever face the need to diagnose SONET devices.

As for fiber-optic cables, there are far fewer possible causes of faults in them than in copper cables. Optical signals do not cause crosstalk - the effect whereby a signal in one conductor induces a signal in another - which is the factor that makes diagnostic equipment for copper cable so complex. Optical cables are immune to electromagnetic noise and induced signals, so they do not need to be routed away from elevator motors and fluorescent lamps, which means all these variables can be excluded from the diagnostic scenario.

The signal strength, or optical power, at a given point is actually the only variable that needs to be measured when troubleshooting optical networks. If it is possible to determine the signal loss along the entire length of the optical channel, then it will be possible to identify almost any problem. Inexpensive add-on modules for copper cable testers enable optical measurements.
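Because optical power is essentially the only quantity involved, a loss-budget check reduces to simple arithmetic. The sketch below uses made-up figures purely for illustration; real attenuation and sensitivity values come from the cable and transceiver specifications.

    import math

    def db_loss(p_in_mw, p_out_mw):
        """Attenuation in dB between two measured power levels (in milliwatts)."""
        return 10 * math.log10(p_in_mw / p_out_mw)

    print(f"measured attenuation: {db_loss(0.50, 0.32):.2f} dB")  # two meter readings

    # Illustrative link budget (all figures invented):
    launch_power_dbm = -3.0                 # transmitter output
    rx_sensitivity_dbm = -20.0              # weakest signal the receiver accepts
    fiber_km, loss_per_km = 2.5, 0.35       # dB/km of installed fiber
    connectors, loss_per_connector = 4, 0.5
    splices, loss_per_splice = 2, 0.1

    expected_loss = (fiber_km * loss_per_km
                     + connectors * loss_per_connector
                     + splices * loss_per_splice)
    budget = launch_power_dbm - rx_sensitivity_dbm
    print(f"expected loss {expected_loss:.2f} dB, budget {budget:.2f} dB, "
          f"margin {budget - expected_loss:.2f} dB")

If the margin is negative or close to zero, the link is the first place to look when errors appear.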

Enterprises deploying and maintaining a large optical infrastructure may need to purchase an optical time-domain reflectometer (OTDR), which performs the same functions for optical fiber as a time-domain reflectometer (TDR) does for copper cable. The device acts like radar: it sends pulsed signals down the cable and analyzes their reflections, detecting damage or other anomalies in the conductor and then telling the expert where to look for the source of the problem.

Although various cable and connector vendors have made it easier to terminate and fan out optical fiber, it still requires a certain level of specialized skill, and with a prudent policy, an enterprise with an advanced optical infrastructure will have to train its employees. No matter how well the cable network is laid, there is always the possibility of physical damage to the cable as a result of some unexpected incident.

Troubleshooting 802.11b wireless LANs can also cause problems. Diagnosis itself is as simple as with hub-based Ethernet networks, since the wireless medium is shared among all owners of client radio devices. Sniffer Technologies was the first to offer a protocol analysis solution for these networks with bandwidth of up to 11 Mbit/s, and most of the leading analyzer vendors have since introduced similar systems.

Unlike wired connections to an Ethernet hub, the quality of wireless client connections is far from stable. The microwave radio signals used in all variants of local wireless transmission are weak and sometimes unpredictable. Even small changes in antenna position can seriously affect connection quality. Wireless LAN access points come with a device management console, and this is often a more effective diagnostic method than visiting wireless clients and monitoring throughput and error conditions with a handheld analyzer.

While the data synchronization and device setup issues faced by PDA users naturally fall to the technical support team rather than the network administrator, it is easy to foresee that in the not too distant future many such devices will evolve from stand-alone aids that complement PCs into full-fledged network clients.

Typically, corporate WLAN operators will (or should) discourage the deployment of overly open systems in which any user within range who has a compatible interface card can access every information frame of the system. The Wired Equivalent Privacy (WEP) wireless security protocol provides user authentication, integrity assurance and data encryption, but, as is usually the case, tight security makes it harder to analyze the root cause of network problems. On WEP-protected networks the diagnostician needs to know the keys or passwords that protect the information assets and control access to the system. In promiscuous capture mode a protocol analyzer will be able to see all frame headers, but the information they carry will be meaningless without the keys.

When diagnosing tunneled links, which many vendors call remote-access VPNs, the problems encountered are similar to those of analyzing encrypted wireless networks. If traffic does not pass through a tunneled link, the cause is not easy to determine: it could be an authentication error, a failure at one of the endpoints, or congestion on the public Internet. Trying to use a protocol analyzer to detect higher-level errors in tunneled traffic is a waste of effort, because the data content as well as the application, transport and network layer headers are encrypted. In general, measures taken to improve the security of corporate networks tend to make faults and performance issues harder to identify. Firewalls, proxy servers and intrusion detection systems further complicate the localization of problems.

Thus, the problem of diagnosing computer networks remains relevant, and ultimately fault diagnosis is a management task. For most mission-critical corporate systems, lengthy recovery work is unacceptable, so the only solution is to use redundant devices and processes that can take over the necessary functions immediately after a failure. In some enterprises the network always has an additional redundant component in case a primary one fails, that is, n x 2 components, where n is the number of primary components needed for acceptable performance. If the mean time to repair (MTTR) is long enough, even more redundancy may be needed. The point is that troubleshooting time is hard to predict, and significant costs during an unpredictable recovery period are a sign of poor management.
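The effect of the "n x 2" redundancy mentioned above is easy to estimate with elementary probability; the MTBF and MTTR figures in the sketch are illustrative only.

    # Availability of one component from mean time between failures (MTBF)
    # and mean time to repair (MTTR); the numbers are invented.
    mtbf_h, mttr_h = 2000.0, 8.0
    a_single = mtbf_h / (mtbf_h + mttr_h)

    # Two redundant components in parallel: the pair is down only when both
    # are down at once (independent failures assumed).
    a_pair = 1 - (1 - a_single) ** 2

    def downtime_h_per_year(a):
        return (1 - a) * 365 * 24

    print(f"single: {a_single:.5f} ({downtime_h_per_year(a_single):.1f} h/year down)")
    print(f"pair:   {a_pair:.7f} ({downtime_h_per_year(a_pair):.2f} h/year down)")

The unpredictability of MTTR is exactly why the second component pays off: the pair's expected downtime shrinks from tens of hours a year to minutes.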

For less critical systems redundancy may not be economically justified; in that case it makes sense to invest in the most effective tools (and in training) to speed up troubleshooting and diagnosis in the enterprise. In addition, support for specific systems can be outsourced: to external data centers, application service providers (ASPs) or management service providers. Besides cost, the most significant factor in the decision to use third-party services is the competence level of the company's own staff. Network administrators must decide whether a particular function is so closely tied to the specific tasks of the enterprise that a third party cannot be expected to perform it better than the company's own employees.

Almost immediately after the first enterprise networks were deployed, the reliability of which left much to be desired, manufacturers and developers put forward the concept of "self-healing networks." Modern networks are certainly more reliable than they were in the 90s, but not because problems began to fix themselves. Eliminating software and hardware failures in today's networks still requires human intervention, and no major change is foreseen in this state of affairs in the near term. Diagnostic methods and tools are in line with current practice and technology, but they have not yet reached the level that would significantly save network administrators' time in dealing with network problems and performance shortages.

1.1 Software diagnostics

Among the software tools for diagnosing computer networks one can single out network management systems: centralized software systems that collect data on the state of network nodes and communication devices, as well as on the traffic circulating in the network. These systems not only monitor and analyze the network but also perform network management actions in automatic or semi-automatic mode, such as enabling and disabling device ports and changing the parameters of the address tables of bridges, switches and routers. Examples of such management systems are the popular HP OpenView, Sun NetManager and IBM NetView.

System management tools perform functions similar to those of network management systems, but in relation to the software and hardware of computers rather than to communication equipment. Nevertheless, some functions of these two kinds of management systems may overlap; for example, system management tools can perform elementary analysis of network traffic.

Expert systems. Systems of this kind accumulate human knowledge about identifying the causes of abnormal network operation and about possible ways of bringing the network back to a working state. Expert systems are often implemented as subsystems of various network monitoring and analysis tools: network management systems, protocol analyzers, network analyzers. The simplest variant of an expert system is a context-sensitive help system; more complex ones are so-called knowledge bases with elements of artificial intelligence. An example is the expert system built into the Cabletron Spectrum management system.

1.1.1 Protocol Analyzers

When designing a new network or modernizing an old one, it is often necessary to quantify certain characteristics of the network, such as the intensity of data flows over network links, the delays that arise at various stages of packet processing, response times to requests of one kind or another, the frequency of certain events, and other characteristics.

Various means can be used for these purposes, first of all the monitoring facilities of the network management systems discussed earlier. Some network measurements can also be performed by software meters built into the operating system; an example is the Windows Performance Monitor component. Even modern cable testers are capable of capturing packets and analyzing their contents.

But the most advanced network exploration tool is a protocol analyzer. The protocol analysis process involves capturing and examining the packets that are circulating in the network that implement a particular network protocol. Based on the results of the analysis, you can make informed and balanced changes to any network components, optimize its performance, and troubleshoot. Obviously, in order to be able to draw any conclusions about the impact of some change on the network, it is necessary to analyze the protocols both before and after the change.

A protocol analyzer is either a stand-alone specialized device or a personal computer, usually a portable notebook-class machine, equipped with a special network card and the appropriate software. The network card and software used must match the network topology (ring, bus, star). The analyzer connects to the network just like an ordinary node; the difference is that the analyzer can receive all data packets transmitted over the network, whereas an ordinary station receives only those addressed to it. The analyzer software consists of a kernel, which supports the network adapter and decodes the received data, and additional program code that depends on the topology of the network under investigation. In addition, a number of protocol-specific decoding routines, such as IPX, are supplied. Some analyzers may also include an expert system that can advise the user on which experiments to carry out in a given situation, what particular measurement results may mean, and how to eliminate some kinds of network faults.

Despite the relative diversity of protocol analyzers on the market, there are some features that are more or less inherent in all of them:

User interface. Most analyzers have a well-developed user-friendly interface, usually based on Windows or Motif. This interface allows the user to: display the results of traffic intensity analysis; get an instant and average statistical estimate of the network performance; set specific events and critical situations to track their occurrence; decode protocols of different levels and present the contents of packets in an understandable form.

Capture buffer. The buffers of the various analyzers differ in size. The buffer can be located on the installed network card, or it can be allocated space in the RAM of one of the computers on the network. If the buffer is located on a network card, then it is managed in hardware, and due to this, the input speed increases. However, this leads to an increase in the cost of the analyzer. In case of insufficient performance of the capture procedure, some of the information will be lost, and analysis will be impossible. The buffer size determines the analysis capabilities for more or less representative samples of the captured data. But no matter how large the capture buffer is, sooner or later it will fill up. In this case, either the capture stops, or the filling starts from the beginning of the buffer.

Filters. Filters allow you to control the data capture process and thereby save buffer space. Depending on the values of certain packet fields, specified as a filter condition, a packet is either ignored or written to the capture buffer. Using filters greatly speeds up and simplifies analysis, since it excludes viewing packets that are not needed at the moment (a short capture sketch follows this list of features).

Triggers (switches). These are conditions, set by the operator, for starting and stopping the data capture process. Such conditions can be manual commands to start and stop capture, the time of day, the duration of the capture process, or the appearance of certain values in data frames. Triggers can be used in conjunction with filters, allowing more detailed and subtle analysis and more productive use of a capture buffer of limited size.

Search. Some protocol analyzers allow you to automate the viewing of information in the buffer and find data in it according to specified criteria. While the filters check the input stream against the filtering conditions, the search functions are applied to the data already accumulated in the buffer.
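The effect of a capture filter described above is easy to demonstrate with the scapy packet manipulation library (an assumption of this sketch; it is not tied to any particular commercial analyzer): only frames matching the BPF expression reach the capture buffer.

    # pip install scapy; capturing normally requires administrator/root privileges.
    from scapy.all import sniff

    # Capture only HTTP traffic to or from one host instead of everything on the wire.
    packets = sniff(filter="tcp port 80 and host 192.0.2.10", count=20, timeout=60)

    packets.summary()           # one decoded line per captured frame
    if packets:
        packets[0].show()       # full layer-by-layer decode of the first frame

Everything that a dedicated analyzer adds on top of this (triggers, search, expert advice) operates on the same captured and decoded frames.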

The analysis methodology can be presented in the following six stages:

Capturing data.

View captured data.

Data analysis.

Search for errors. (Most analyzers make this easier by identifying error types and identifying the station from which the error packet came.)

Performance research. The utilization rate of the network bandwidth or the average response time to a request is calculated.

A detailed study of individual sections of the network. The content of this stage is refined as the analysis proceeds.

Usually, the process of analyzing the protocols takes relatively little time - 1-2 business days.

Most modern analyzers can analyze several WAN protocols at once, such as X.25, PPP, SLIP, SDLC/SNA, frame relay, SMDS, ISDN and bridge/router protocols (3Com, Cisco, Bay Networks and others). Such analyzers can measure various protocol parameters, analyze network traffic and the conversion between LAN and WAN protocols, the delay introduced by routers during these conversions, and so on. More advanced devices can simulate and decode WAN protocols, perform "stress" testing, measure maximum bandwidth and test the quality of the services provided. For the sake of versatility, almost all WAN protocol analyzers also implement LAN testing and support all the major interfaces. Some instruments can analyze telephony protocols, and the most advanced models can decode and conveniently present all seven OSI layers. The advent of ATM has led manufacturers to equip their analyzers with tools for testing these networks as well; such instruments can fully test E-1/E-3 ATM networks, with monitoring and simulation support. The set of an analyzer's service functions is very important, and some of them, such as the ability to control the device remotely, are simply indispensable.

Thus, modern WAN/LAN/ATM protocol analyzers can detect errors in the configuration of routers and bridges; determine the type of traffic sent over the WAN; determine the range of speeds in use and optimize the ratio between bandwidth and the number of channels; localize the source of incorrect traffic; test serial interfaces and perform full ATM testing; fully monitor and decode the main protocols on any channel; and analyze statistics in real time, including the analysis of local area network traffic across wide area networks.

1.1.2 Monitoring Protocols

SNMP (Simple Network Management Protocol) is a protocol for managing communication networks based on the TCP/IP architecture.

Based on the TMN concept, in the 1980s and 1990s various standardization bodies developed a number of protocols for managing data networks, implementing the TMN functions to varying degrees. SNMP is one such management protocol. It was developed to check the operation of network routers and bridges, and its scope later extended to other network devices such as hubs, gateways, terminal servers, LAN Manager servers, Windows NT machines, and so on. The protocol also allows changes to be made to the operation of these devices.

This technology is designed to provide management and control over devices and applications in a communication network by exchanging control information between agents located on network devices and managers located at control stations. SNMP defines a network as a collection of network management stations and network elements (hosts, gateways and routers, terminal servers) that collectively provide administrative links between network management stations and network agents.

In SNMP there are managing and managed systems. A managed system includes a component called an agent, which sends reports to the managing system. Essentially, SNMP agents transmit management information to the managing systems as variables (such as "free memory", "system name", "number of running processes").

The SNMP agent is a processing element that provides managers located at the management stations of the network with access to the values of the MIB variables, and thereby enables them to implement the functions of managing and monitoring the device.

A software agent is a resident program that performs management functions, as well as collects statistics for transferring it to the information base of a network device.

Hardware Agent - Embedded hardware (with a processor and memory) that stores software agents.

Variables accessible via SNMP are organized in a hierarchy. These hierarchies and other metadata (such as the type and description of a variable) are described by Management Information Bases (MIBs).
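For illustration, a single MIB variable can be read with the net-snmp command-line tools; the device address 192.0.2.1 and the community string public used below are placeholders:

% snmpget -v 2c -c public 192.0.2.1 SNMPv2-MIB::sysName.0

% snmpget -v 2c -c public 192.0.2.1 1.3.6.1.2.1.1.5.0

Both forms address the same object: the first uses the symbolic name sysName.0, the second its numeric object identifier in the MIB hierarchy.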

Today there are several standards for control information databases. The main ones are the MIB-I and MIB-II standards, as well as the RMON MIB version of the database for remote management. In addition, there are standards for specific device MIBs of a particular type (for example, MIBs for hubs or MIBs for modems), as well as proprietary MIBs of specific equipment manufacturers.

The original MIB-I specification only defined read operations on variable values. The operations of changing or setting the values of an object are part of the MIB-II specifications.

The MIB-I version (RFC 1156) defines up to 114 objects, which are classified into 8 groups:

System - general information about the device (for example, vendor ID, time of last system initialization).

Interfaces - describes the parameters of the device's network interfaces (for example, their number, types, exchange rates, maximum packet size).

AddressTranslationTable - describes the correspondence between network and physical addresses (for example, using the ARP protocol).

InternetProtocol - data related to the IP protocol (addresses of IP gateways, hosts, statistics about IP packets).

ICMP - data related to the ICMP control message exchange protocol.

TCP - data related to the TCP protocol (for example, about TCP connections).

UDP - data related to the UDP protocol (number of transmitted, received and erroneous UPD datagrams).

EGP - data related to the ExteriorGatewayProtocol routing exchange protocol used on the Internet (the number of messages received with and without errors).

From this list of variable groups, it can be seen that the MIB-I standard was developed with a rigid focus on managing routers that support the TCP / IP stack protocols.

In the MIB-II version (RFC 1213), adopted in 1992, the set of standard objects was significantly expanded (to 185), and the number of groups increased to 10.

RMON agents

The latest addition to SNMP functionality is the RMON specification, which provides remote communication with the MIB.

The RMON standard emerged in November 1991, when the Internet Engineering Task Force issued RFC 1271, "Remote Network Monitoring Management Information Base", which described RMON for Ethernet networks. RMON is a computer network monitoring protocol and an SNMP extension which, like SNMP, is based on the collection and analysis of information about the traffic transmitted over the network. As in SNMP, information is collected by hardware and software agents, whose data is sent to the computer where the network management application is installed. The difference between RMON and its predecessor lies, first of all, in the nature of the information collected: whereas in SNMP this information characterizes only the events occurring on the device where the agent is installed, RMON requires the collected data to characterize the traffic between network devices.

Prior to RMON, SNMP could not be used remotely; it only allowed local management of devices. The RMON MIB has an improved set of properties for remote management, since it contains aggregated information about the device, which does not require the transfer of large amounts of information over the network. RMON MIB objects include additional packet error counters, more flexible graphical trending and statistics analysis, more powerful filtering tools for capturing and analyzing individual packets, and more complex alert conditions. RMON MIB agents are more intelligent than MIB-I or MIB-II agents and do much of the device-information processing that managers used to do. These agents can be located inside various communication devices, and can also be implemented as separate software modules running on universal PCs and laptops (an example is Novell's LANalyzer).

The intelligence of RMON agents allows them to perform simple actions to diagnose faults and alert about possible failures - for example, using RMON technology, you can collect data on the normal functioning of the network (i.e., perform the so-called baselining), and then issue alerts when the network operating mode deviates from the baseline - this may indicate, in particular, that the equipment is not fully operational. By bringing together information from RMON agents, the management application can help a network administrator (located, for example, thousands of kilometers from the analyzed network segment) to localize a problem and develop an optimal action plan to fix it.

The collection of RMON information is carried out by hardware and software probes that are connected directly to the network. To perform the task of collecting and primary analysis of data, the probe must have sufficient computing resources and the amount of RAM. There are three types of probes currently on the market: embedded, computer-based, and stand-alone. A product is considered RMON capable if it implements at least one RMON group. Of course, the more RMON data groups are implemented in a given product, the more expensive it is, on the one hand, and, on the other hand, the more complete information about the network operation it provides.

Built-in probes are expansion modules for network devices. These modules are available from many manufacturers, including major companies such as 3Com, Cabletron, Bay Networks and Cisco. (Incidentally, 3Com and Bay Networks recently acquired Axon and ARMON, renowned leaders in the design and manufacture of RMON controls. Such interest in this technology from major network equipment manufacturers further demonstrates how much users need remote monitoring.) The decision to build RMON modules into hubs looks natural, because it is from observing these devices that one can form an idea of the segment's operation. The advantage of such probes is obvious: they allow obtaining information on all major RMON data groups at a relatively low price. The disadvantage, first of all, is their not very high performance, which manifests itself, in particular, in the fact that built-in probes often do not support all RMON data groups. 3Com recently announced its intention to release RMON-capable drivers for Etherlink III and Fast Ethernet network adapters. As a result, it will be possible to collect and analyze RMON data directly at workstations on the network.

Computer-based probes are simply networked computers that have the RMON software agent installed on them. These probes (which include, for example, Network General's Cornerstone Agent 2.5) offer better performance than built-in probes and generally support all RMON datasets. They are more expensive than built-in probes, but much less expensive than standalone probes. In addition, computer-based probes are quite large, which can sometimes limit their applicability.

Standalone probes are the best performing; as it is easy to understand, these are at the same time the most expensive products of all those described. Typically, a standalone probe is a processor (i486 class or RISC processor) equipped with sufficient RAM and a network adapter. The market leaders in this sector are Frontier and Hewlett-Packard. Probes of this type are small in size and very mobile - they are very easy to connect and disconnect from the network. When solving the problem of managing a network of a global scale, this, of course, is not a very important property, but if RMON tools are used to analyze the operation of a medium-sized corporate network, then (given the high cost of devices) the mobility of probes can play a very positive role.

The RMON object is assigned number 16 in the MIB object set, and the RMON object itself, in accordance with RFC 1271, consists of ten data groups.

Statistics - current accumulated statistics about packet characteristics, the number of collisions, etc.

History - statistical data saved at regular intervals for subsequent analysis of trends in their changes.

Alarms - statistic thresholds above which the RMON agent sends a message to the manager. Allows the user to define a number of threshold levels (these thresholds can refer to a variety of things - any parameter from the statistic group, its amplitude or rate of change, and much more), upon exceeding which an alarm is generated. The user can also determine under what conditions the exceeding of the threshold value should be accompanied by an alarm signal - this will avoid generating a signal "for nothing", which is bad, firstly, because no one pays attention to the constantly burning red light, and secondly because the transmission of unnecessary alarms over the network leads to unnecessary congestion of the communication lines. An alarm is usually passed on to a group of events, where it is determined what to do with it next.

Host - data about hosts on the network, including their MAC addresses.

HostTopN is a table of the busiest hosts on the network. It contains a list of the top N hosts with the highest value of the specified statistic for the specified interval. For example, you can request a list of the 10 hosts that have experienced the maximum number of errors in the last 24 hours. This list will be compiled by the agent itself, and the management application will receive only the addresses of these hosts and the values of the corresponding statistical parameters. It is clear to what extent this approach saves network resources.

TrafficMatrix - statistics about the traffic intensity between each pair of hosts on the network, sorted in a matrix. The rows of this matrix are numbered in accordance with the MAC addresses of the stations - the sources of the messages, and the columns - in accordance with the addresses of the recipient stations. Matrix elements characterize the traffic intensity between the respective stations and the number of errors. Having analyzed such a matrix, the user can easily find out which pairs of stations generate the most intensive traffic. This matrix, again, is formed by the agent itself, so there is no need to transfer large amounts of data to a central computer responsible for network management.

Filter - packet filtering conditions. The criteria by which packets are filtered can be very diverse - for example, you can require to filter out as erroneous all packets, the length of which is less than a certain specified value. We can say that the installation of the filter corresponds, as it were, to the organization of the channel for transmitting the packet. Where this channel leads is determined by the user. For example, all erroneous packets can be intercepted and sent to the appropriate buffer. In addition, the appearance of a packet matching the set filter can be viewed as an event to which the system must react in a predetermined manner.

PacketCapture - conditions for capturing packets. A packet capture group includes capture buffers, where packets are sent whose characteristics satisfy the conditions set forth in the filter group. In this case, not the entire packet may be captured, but, say, only the first several tens of bytes of the packet. The content of the interception buffers can be subsequently analyzed using various software tools, revealing a number of very useful characteristics of the network. By rearranging the filters for certain signs, it is possible to characterize different parameters of the network operation.

Event - conditions for registering and generating events. The events group defines when to send an alarm to the management application, when to intercept packets, and in general how to respond to certain events occurring in the network, for example, if the thresholds set in the alarms group are exceeded: whether to set the control application is notified, or you just need to log this event and continue working. Events may or may not be related to the transmission of alarms - for example, sending a packet to the capture buffer is also an event.

These groups are numbered in the order shown, so, for example, the Hosts group has the numeric name 1.3.6.1.2.1.16.4.
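Accordingly, the contents of, for example, the Statistics group can be read from an RMON-capable probe by walking the subtree 1.3.6.1.2.1.16.1 with the net-snmp tools (the probe address and the community string below are placeholders):

% snmpwalk -v 2c -c public 192.0.2.2 1.3.6.1.2.1.16.1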

The tenth group consists of special objects of the TokenRing protocol.

In total, the RMON MIB standard defines about 200 objects in 10 groups, recorded in two documents - RFC 1271 for Ethernet networks and RFC 1513 for TokenRing networks.

A distinctive feature of the RMON MIB standard is its independence from the network layer protocol (in contrast to the MIB-I and MIB-II standards, oriented to the TCP / IP protocols). Therefore, it is convenient to use in heterogeneous environments using different network layer protocols.

1.2 Popular network management systems

Network management system - hardware and / or software tools for monitoring and managing network nodes. The network management system software consists of agents localized on network devices and transmitting information to the network management platform. The communication method between management applications and agents on devices is determined by protocols.

Network management systems must have a number of qualities:

true distribution in accordance with the client / server concept,

scalability,

openness to cope with disparate hardware from desktops to mainframes.

The first two properties are closely related. Good scalability is achieved due to the distributed control system. Distributed means that a system can include multiple servers and clients. Servers (managers) collect data on the current state of the network from agents (SNMP, CMIP or RMON) built into the network equipment and accumulate them in their database. Clients are graphical consoles run by network administrators. The management system client software accepts requests for performing any actions from the administrator (for example, building a detailed map of a part of the network) and requests the necessary information from the server. If the server has the necessary information, then it immediately transmits it to the client, if not, then it tries to collect it from the agents.

Early versions of control systems combined all functions in one computer, operated by an administrator. For small networks or networks with a small amount of controlled equipment, such a structure is quite satisfactory, but with a large amount of controlled equipment the single computer to which information flows from all devices on the network becomes a bottleneck: the network cannot cope with the large data flow, and the computer itself does not have time to process it. In addition, a large network is usually managed by more than one administrator; therefore, in addition to several servers, a large network should have several consoles for network administrators, and each console should provide the specific information corresponding to the current needs of a particular administrator.

Support for dissimilar equipment is more a desirable than an actually existing feature of today's control systems. The most popular network management products include four systems: Spectrum from Cabletron Systems, OpenView from Hewlett-Packard, NetView from IBM, and Solstice from SunSoft, a division of Sun Microsystems. Three of the four companies produce communication equipment themselves. Naturally, Spectrum manages Cabletron equipment best, OpenView manages Hewlett-Packard equipment best, and NetView manages IBM equipment best.

When building a map of a network that includes equipment from other manufacturers, these systems begin to make mistakes and take some devices for others, and when managing such devices they support only their basic functions; the many useful additional functions that distinguish a particular device from the rest the control system simply does not understand and therefore cannot use.

To remedy this deficiency, management system developers include support not only for the standard MIBs I, MIB II, and RMON MIBs, but also for numerous proprietary MIBs from manufacturers. The leader in this area is the Spectrum system, which supports about 1000 MIBs from various manufacturers.

Another way to better support specific hardware is to use an application based on a control platform from the company that produces this hardware. Leading manufacturers of communication equipment have developed and supply highly sophisticated and multifunctional control systems for their equipment. The most famous systems of this class include Optivity from Bay Networks, CiscoWorks from Cisco Systems and Transcend from 3Com. Optivity, for example, allows you to monitor and manage networks of routers, switches and hubs from Bay Networks, taking full advantage of their capabilities and features. Equipment from other manufacturers is supported at the level of basic control functions. Optivity runs on top of Hewlett-Packard's OpenView and SunNet Manager (the predecessor of Solstice) from SunSoft. However, working with several systems such as Optivity on top of a control platform is quite complex and requires that the computers on which they run have very powerful processors and a lot of RAM.

However, if the network is dominated by equipment from a single manufacturer, then the availability of management applications from that manufacturer for a popular management platform allows network administrators to successfully solve many problems. Therefore, the developers of control platforms ship with them tools that simplify the development of applications, and the availability of such applications and their number is considered a very important factor in choosing a control platform.

The openness of the management platform also depends on how the collected network state data is stored. Most of the leading platforms allow data to be stored in commercial databases such as Oracle, Ingres, or Informix. The use of universal DBMS reduces the speed of the control system as compared to storing data in the operating system files, but it allows processing this data by any application that can work with these DBMS.

2. STATEMENT OF THE PROBLEM

In accordance with the current situation, it was decided to develop and implement a network monitoring system that would solve all the above problems.

2.1 Terms of Reference

To develop and implement a monitoring system that allows monitoring both switches, routers from different manufacturers, and servers of different platforms. Focus on the use of open protocols and systems, with the maximum use of ready-made developments from the free software fund.

2.2 Updated terms of reference

In the course of further formulating the problem and researching the subject area, taking into account economic and time investments, the technical task was clarified:

The system must meet the following requirements:

§ minimum requirements for hardware resources;

§ open source codes of all components of the complex;

§ extensibility and scalability of the system;

§ standard means of providing diagnostic information;

§ availability of detailed documentation for all software products used;

§ ability to work with equipment from different manufacturers.

3. PROPOSED SYSTEM

3.1 Choosing a network monitoring system

In accordance with the updated terms of reference, the Nagios system is best suited as the core of the network monitoring system, since it has the following qualities:

§ tools for generating diagrams are available;

§ there are reporting tools;

§ there is a possibility of logical grouping;

§ there is a built-in system for recording trends and predicting them;

§ it is possible to automatically add new devices (Autodiscovery) using the official plug-in;

§ there is the possibility of advanced monitoring of the host using an agent;

§ SNMP protocol support via plugin;

§ Syslog protocol support via plugin;

§ support for external scripts;

§ support for self-written plugins and the ability to create them quickly and easily;

§ built-in triggers and events;

§ full-featured web interface;

§ the possibility of distributed monitoring;

§ inventory via plugin;

§ the ability to store data both in files and in SQL databases, which is very important when increasing volumes;

§ GPL license, and therefore free basic delivery, support and open source codes of the system core and accompanying components;

§ dynamic and customizable maps;

§ access control;

§ built-in language for describing hosts, services and checks;

§ the ability to track users.

The Zabbix network monitoring system has a similar set of parameters, but at the time of implementation it had much less functionality than Nagios and had a beta-version status. In addition, a study of thematic forums and news feeds showed that Nagios is most prevalent among users, which means the presence of user-written documentation and the most detailed description of difficult points in setting up.

Nagios allows you to monitor network services such as SMTP, TELNET, SSH, HTTP, DNS, POP3, IMAP, NNTP and many others. In addition, you can monitor the use of server resources, such as disk space consumption, free memory and processor utilization. It is possible to create your own event handlers, which will be executed when certain events triggered by checks of services or servers occur. This approach allows you to actively respond to ongoing events and try to solve the problems that arise automatically; for example, you can create an event handler that automatically restarts a hung service. Another advantage of the Nagios monitoring system is the ability to control it remotely using the WAP interface of a mobile phone. Using the concept of "parent" hosts, it is easy to describe the hierarchy and dependencies between all hosts. This approach is extremely useful for large networks because it allows for complex diagnostics, which in turn helps to distinguish hosts that are actually down from those that are temporarily unreachable because of failures in intermediate links. Nagios is able to build graphs of monitored systems and maps of the monitored network infrastructure.
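As an illustration of the event-handler mechanism, below is a minimal sketch of a handler that restarts a hung service; the service name and script path are hypothetical, only the state values passed by Nagios follow the standard convention.

#!/bin/sh
# Hypothetical event handler: restart a service once its failure is confirmed.
# Nagios passes the macros as arguments: $1=$SERVICESTATE$ $2=$SERVICESTATETYPE$ $3=$SERVICEATTEMPT$
case "$1" in
    OK|WARNING|UNKNOWN)
        # Nothing to do
        ;;
    CRITICAL)
        # Act only on a confirmed (HARD) failure, not on the first retry
        if [ "$2" = "HARD" ]; then
            /etc/init.d/some-service restart
        fi
        ;;
esac
exit 0

The script is registered as a command whose command_line passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as arguments, and is attached to a service through the event_handler directive.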

From personal practice, the author can give an example of how useful Nagios turned out to be. Packet loss began to occur on the external network interface of the firewall at intervals of several hours; due to this malfunction, up to 20 percent of the passing traffic was lost, and after about a minute the interface resumed working as expected. Because of the floating nature of this issue, for several weeks it was not possible to find out why intermittent interruptions occurred when using the Internet. Without Nagios, troubleshooting would have taken a long time.

Many administrators are familiar with Nagios' ancestor, NetSaint. Although the NetSaint project site is still up and running, new development is based on the Nagios source code. Therefore, users are advised to gradually migrate to Nagios.

The documentation supplied with Nagios states that it will also work stably with many other Unix-like systems. To display the Nagios web interface we need an Apache server. You are free to use any other web server, but in this work we will consider Apache as the most widespread web server on Unix platforms. It is possible to install the monitoring system without a web interface at all, but we will not do this, because it significantly reduces usability.

4. SOFTWARE DEVELOPMENT

A regular IBM-compatible computer could be used as the hardware for the system being implemented; however, taking into account the possibility of a further increase in load, the requirements for reliability and MTBF, and the requirements of GosvyazNadzor, certified server equipment from Aquarius was purchased.

The organization's network actively uses the Debian operating system based on the Linux kernel; there is extensive experience in using this system, and most of the operations for managing, configuring and ensuring its stability have been worked out. In addition, this OS is distributed under the GPL license, which means that it is free and open source, which corresponds to the updated terms of reference for the design of the network monitoring system. GNU/Linux (also written "GNU+Linux" or "GNU-Linux" in some sources) is the general name for UNIX-like operating systems based on the kernel of the same name and the libraries and system programs developed within the GNU project. GNU/Linux runs on PC-compatible systems of the Intel x86 family, as well as on IA-64, AMD64, PowerPC, ARM and many others.

The name GNU/Linux is also often extended to the programs that complement this operating system and to the application programs that turn it into a full-fledged multifunctional operating environment.

Unlike most other operating systems, GNU / Linux does not come with a single "official" package. Instead, GNU / Linux comes in a large number of so-called distributions in which GNU programs are linked to the Linux kernel and other programs. The most famous GNU / Linux distributions are Ubuntu, Debian GNU / Linux, Red Hat, Fedora, Mandriva, SuSE, Gentoo, Slackware, Archlinux. Russian distributions - ALT Linux and ASPLinux.

Unlike Microsoft Windows (Windows NT), Mac OS (Mac OS X), and commercial UNIX-like systems, GNU / Linux does not have a geographic center of development. There is no organization that owns this system; there is not even a single focal point. Linux software is the result of thousands of projects. Some of these projects are centralized, some are concentrated in firms. Many projects bring together hackers from all over the world who are familiar only by correspondence. Anyone can create their own project or join an existing one and, if successful, the results of the work will become known to millions of users. Users take part in testing free software, communicate directly with developers, which allows them to quickly find and fix bugs and implement new features.

Figure: the history of the development of UNIX systems (GNU/Linux is UNIX-compatible, but based on its own source code).

It is this flexible and dynamic development system, impossible for closed-source projects, that determines the exceptional cost-effectiveness of GNU/Linux. The low cost of free development, streamlined testing and distribution mechanisms, the involvement of people from different countries with different visions of problems, and the protection of the code by the GPL license have all become reasons for the success of free software.

Of course, such a high development efficiency could not fail to interest large firms that began to open their projects. This is how Mozilla (Netscape, AOL), OpenOffice.org (Sun), a free clone of Interbase (Borland) - Firebird, SAP DB (SAP) appeared. IBM has been instrumental in porting GNU / Linux to its mainframes.

On the other hand, open source significantly reduces the cost of developing closed systems for GNU / Linux and allows you to reduce the cost of the solution for the user. This is why GNU / Linux has become the platform often recommended for products such as Oracle, DB2, Informix, SyBase, SAP R3, Domino.

The GNU / Linux community maintains communication through Linux user groups.

Most users use distributions to install GNU / Linux. A distribution kit is not just a set of programs, but a series of solutions for various user tasks, united by uniform systems for installing, managing and updating packages, configuring and supporting.

The most widespread distributions in the world are the following.

Ubuntu - a distribution that is rapidly gaining popularity, focused on ease of learning and use.

openSUSE - the freely distributed version of the SuSE distribution, owned by Novell. It is easy to set up and maintain thanks to the YaST utility.

Fedora - supported by the community and the Red Hat corporation; new features appear in it before they reach the commercial RHEL version.

Debian GNU/Linux - an international distribution developed by an extensive community of developers for non-commercial purposes. It served as the basis for many other distributions and is distinguished by a strict approach to the inclusion of non-free software.

Mandriva - a French-Brazilian distribution, a merger of the former Mandrake and Conectiva.

Slackware - one of the oldest distributions, characterized by a conservative approach to development and use.

Gentoo - a distribution built from source codes. It allows very flexible configuration of the final system and performance optimization, which is why it often calls itself a meta-distribution. It is aimed at experts and power users.

Arch Linux - geared towards the latest software versions and constantly updated, supporting both binary installation and installation from source, and built on the KISS philosophy of simplicity; this distribution is aimed at knowledgeable users who want the full power and modifiability of Linux without sacrificing maintenance time.

In addition to those listed, there are many other distributions, both based on the listed ones and created from scratch and often designed to perform a limited number of tasks.

Each of them has its own concept, its own set of packages, its own advantages and disadvantages. None of them can satisfy all users, and therefore other firms and associations of programmers happily exist alongside the leaders, offering their solutions, their distributions, and their services. There are many LiveCDs built on top of GNU / Linux, such as Knoppix. The LiveCD allows you to run GNU / Linux directly from a CD, without installing to a hard drive.

For those who want to thoroughly understand GNU / Linux, any of the distributions is suitable, however, quite often so-called source-based distributions are used for this purpose, that is, those that involve self-assembly of all (or part) of the components from source codes, such as LFS, Gentoo, ArchLinux or CRUX.

4.1 Installing the system kernel

Nagios can be installed in two ways - from source and from built packages. Both methods have advantages and disadvantages, let's look at them.

Pros of installing the package from its source codes:

§ the ability to configure the system in detail;

§ high degree of application optimization;

§ the most complete presentation of the program.

Cons of installing the package from its source codes:

§ additional time is required to assemble the package, often exceeding the time for its configuration and commissioning;

§ the inability to remove the package along with the configuration files;

§ the inability to update the package along with the configuration files;

§ impossibility of centralized control over installed applications.

When you install Nagios from a prebuilt package, the advantages of the source-based installation method become disadvantages, and vice versa. However, as practice has shown, the pre-assembled package meets all the requirements for the system, and there is no point in spending time on building the package manually.

Since both installation methods were initially tested, we will consider each of them in more detail.

4.1.1 Description of installing the system kernel from source codes

Required packages.

You need to make sure the following packages are installed prior to deploying Nagios. A detailed consideration of the process of their installation is beyond the scope of this work.

· Apache 2

· PHP

· GCC compiler and developer libraries

· GD developer libraries

You can use the apt-get utility (or, better, aptitude) to install them as follows:

% sudo apt-get install apache2

% sudo apt-get install libapache2-mod-php5

% sudo apt-get install build-essential

% sudo apt-get install libgd2-dev

1) Create a new unprivileged user account

A new account is created to run the Nagios service. It would also be possible to run the service under the superuser account, but this would pose a serious threat to the security of the system.

Become a superuser:

Create a new nagios user account and give it a password:

# /usr/sbin/useradd -m -s /bin/bash nagios

# passwd nagios

Create a nagios group and add the nagios user to it:

# /usr/sbin/groupadd nagios

# /usr/sbin/usermod -G nagios nagios

Let's create a nagcmd group to allow execution of external commands sent through the web interface. Add nagios and apache users to this group:

# /usr/sbin/groupadd nagcmd

# /usr/sbin/usermod -a -G nagcmd nagios

# /usr/sbin/usermod -a -G nagcmd www-data

2) Download Nagios and plugins for it

Create a directory for storing downloaded files:

# mkdir ~/downloads

# cd ~/downloads

Download the compressed source code archives of Nagios and its plugins from the official project site using the wget utility (the download URLs are omitted here).

3) Compile and install Nagios

Let's unpack the compressed Nagios source codes:

# cd ~/downloads

# tar xzf nagios-3.2.0.tar.gz

# cd nagios-3.2.0

We run the Nagios configuration script, passing it the name of the group that we created earlier:

# ./configure --with-command-group=nagcmd

Full list of parameters of the configuration script:

# ./configure --help

`configure' configures this package to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...

To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. Defaults for the options are specified in brackets.

Configuration:
  -h, --help              display this help and exit
      --help=short        display options specific to this package
      --help=recursive    display the short help of all the included packages
  -V, --version           display version information and exit
  -q, --quiet, --silent   do not print `checking...' messages
      --cache-file=FILE   cache test results in FILE
  -C, --config-cache      alias for `--cache-file=config.cache'
  -n, --no-create         do not create output files
      --srcdir=DIR        find the sources in DIR

Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX

By default, `make install' will install all the files in `/usr/local/nagios/bin', `/usr/local/nagios/lib' etc. You can specify an installation prefix other than `/usr/local/nagios' using `--prefix', for instance `--prefix=$HOME'. For better control, use the options below.

Fine tuning of the installation directories:
  --bindir=DIR            user executables
  --sbindir=DIR           system admin executables
  --libexecdir=DIR        program executables
  --datadir=DIR           read-only architecture-independent data
  --sysconfdir=DIR        read-only single-machine data
  --sharedstatedir=DIR    modifiable architecture-independent data
  --localstatedir=DIR     modifiable single-machine data
  --libdir=DIR            object code libraries
  --includedir=DIR        C header files
  --oldincludedir=DIR     C header files for non-gcc
  --infodir=DIR           info documentation
  --mandir=DIR            man documentation

System types:
  --build=BUILD           configure for building on BUILD
  --host=HOST             cross-compile to build programs to run on HOST

Optional Features:
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]  include FEATURE
  --disable-statusmap     disables compilation of statusmap CGI
  --disable-statuswrl     disables compilation of statuswrl (VRML) CGI
  --enable-DEBUG0         shows function entry and exit
  --enable-DEBUG1         shows general info messages
  --enable-DEBUG2         shows warning messages
  --enable-DEBUG3         shows scheduled events (service and host checks, etc.)
  --enable-DEBUG4         shows service and host notifications
  --enable-DEBUG5         shows SQL queries
  --enable-DEBUGALL       shows all debugging messages
  --enable-nanosleep      enables use of nanosleep (instead of sleep) in event timing
  --enable-event-broker   enables integration of event broker routines
  --enable-embedded-perl  will enable embedded Perl interpreter
  --enable-cygwin         enables building under the CYGWIN environment

Optional Packages:
  --with-PACKAGE[=ARG]    use PACKAGE
  --without-PACKAGE       do not use PACKAGE (same as --with-PACKAGE=no)
  --with-nagios-user=     sets user name to run nagios
  --with-nagios-group=    sets group name to run nagios
  --with-command-user=    sets user name for command access
  --with-command-group=   sets group name for command access
  --with-mail=            sets path to equivalent program to mail
  --with-init-dir=        sets directory to place init script into
  --with-lockfile=        sets path and file name for lock file
  --with-gd-lib=DIR       sets location of the gd library
  --with-gd-inc=DIR       sets location of the gd include files
  --with-cgiurl=          sets URL for cgi programs (do not use a trailing slash)
  --with-htmurl=          sets URL for public html
  --with-perlcache        turns on caching of internally compiled Perl scripts

Some influential environment variables:
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a nonstandard directory
  CPPFLAGS    C/C++ preprocessor flags, e.g. -I<include dir> if you have headers in a nonstandard directory
  CPP         C preprocessor

Use these variables to override the choices made by `configure' or to help it find libraries and programs with nonstandard names/locations.

Compile the Nagios source code:

# make all

Let's install binaries, an initialization script, examples of configuration files and set permissions on the external commands directory:

# make install

# make install-init

# make install-config

# make install-commandmode

4) Change the configuration

Sample configuration files are installed in the / usr / local / nagios / etc directory. They should be working right away. You only need to make one change before continuing.

Let's edit the configuration file /usr/local/nagios/etc/objects/contacts.cfg with any text editor and change the email address associated with the nagiosadmin contact definition to the address to which we are going to receive error messages.

# vi /usr/local/nagios/etc/objects/contacts.cfg

5) Configuring the web interface

Install the Nagios frontend configuration file in the Apache conf.d directory.

# make install-webconf

Create a nagiosadmin account to log into the Nagios web interface

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache for the changes to take effect.

# /etc/init.d/apache2 reload

It is necessary to take measures to strengthen the security of the CGI to prevent theft of this account, since the monitoring information is quite sensitive.

6) Compile and install the Nagios plugins

Let's unpack the compressed source codes for the Nagios plugins:

# cd ~ / downloads

# tar xzf nagios-plugins-1.4.11.tar.gz


Compile and install the plugins:

# cd nagios-plugins-1.4.11

# ./configure --with-nagios-user=nagios --with-nagios-group=nagios

# make

# make install

7) Start the Nagios service

Let's configure Nagios to boot automatically when the operating system is turned on:

# ln -s /etc/init.d/nagios /etc/rcS.d/S99nagios

Let's check the syntactic correctness of the example configuration files:

# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no errors, then start Nagios:

# /etc/init.d/nagios start

8) Enter the web interface

You can now log into the Nagios web interface at the URL configured for it on the Apache server (with a default installation this is http://<server address>/nagios/). You will be prompted to enter the username (nagiosadmin) and password that we set earlier.

9) Other settings

To receive e-mail notifications about Nagios events, you need to install the mailx and postfix packages:

% sudo apt-get install mailx

% sudo apt-get install postfix

You need to edit the Nagios notification commands in the /usr/local/nagios/etc/objects/commands.cfg file and change all references from "/bin/mail" to "/usr/bin/mail". After that, you need to restart the Nagios service:

# sudo /etc/init.d/nagios restart

Detailed configuration of the mail module is described in Appendix D.

4.1.2 Description of installing the system kernel from the repository

As shown above, installing Nagios from source takes a significant amount of time and makes sense only if you require careful optimization of the application or want to thoroughly understand how the system works. In production, most software is installed from the repositories as precompiled packages. In this case, the installation is reduced to entering one command:

% sudo aptitude install nagios

The package manager will independently satisfy all dependencies and install the necessary packages.

4.2 Configuring the system kernel

Before configuring the system in detail, you need to understand how the Nagios kernel works. Its graphical description is given in the figures below.

4.2.1 Description of the system kernel

The following figure shows a simplified diagram of how the Nagios service works.

Fig. 4.1 - System core

The Nagios service reads the main configuration file, which, in addition to the basic parameters of the service, contains links to resource files, object description files, and CGI configuration files.

The algorithm and logic of the network monitoring kernel are shown below.

Fig. 4.2 - Nagios Alert Algorithm

4.2.2 Description of the interaction of configuration files

The /etc/apache2/conf.d/ directory contains the nagios3.conf file, from which the apache web server takes settings for nagios.

The nagios configuration files are located in the /etc/nagios3 directory.

The /etc/nagios3/htpasswd.users file contains passwords for nagios users. The command that creates this file and sets the password for the default nagios user was given above. In the future, when setting a password for a new user, the "-c" argument must be omitted, otherwise the new file will overwrite the old one.
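For example, adding a second web-interface user (the user name here is hypothetical) looks like this:

# htpasswd /etc/nagios3/htpasswd.users operator2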

The /etc/nagios3/nagios.cfg file contains the basic configuration for nagios itself. For example, the event log files or the path to the rest of the configuration files that nagios reads at startup.

New hosts and services are defined in the /etc/nagios/objects directory.

4.2.3 Completing the descriptions of hosts and services

As shown above, you can configure the system kernel using one description file for hosts and services, but this method will not be convenient as the number of monitored equipment grows, so you need to create a kind of directory and file structure with descriptions of hosts and services.

The structure created is shown in Appendix H.

Hosts.cfg file

First, you need to describe the hosts that will be monitored. You can describe as many hosts as you like, but in this file we will restrict ourselves to general parameters for all hosts.

The host described here is not a real host, but a template on which all other host descriptions are based. The same mechanism can be found in other configuration files when the configuration is based on a predefined set of defaults.
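A minimal sketch of such a template and of a host based on it is shown below; the host name and address are hypothetical, while check-host-alive and the 24x7 time period follow the Nagios sample configuration.

define host{
        name                    generic-host          ; template referenced via "use"
        check_command           check-host-alive
        max_check_attempts      5
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    d,u,r
        contact_groups          admins
        register                0                     ; template only, not a real host
        }

define host{
        use                     generic-host          ; inherit the defaults above
        host_name               router-example
        alias                   Example border router
        address                 192.0.2.1
        }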

Hostgroups.cfg file

This is where hosts are added to the hostgroup. Even in a simple configuration, when there is only one host, you still need to add it to the group so that Nagios knows which contact group to use to send notifications. More details about the contact group below.
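A sketch of such a group (the group and member names are hypothetical):

define hostgroup{
        hostgroup_name  routers
        alias           Border and distribution routers
        members         router-example
        }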

Contactgroups.cfg file

We have defined a contact group and added users to this group. This configuration ensures that all users are alerted if something is wrong with the servers the group is responsible for. However, it should be borne in mind that individual settings for each user can override these settings.
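A sketch of such a contact group, assuming a single nagiosadmin contact:

define contactgroup{
        contactgroup_name       admins
        alias                   Network administrators
        members                 nagiosadmin
        }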

The next step is to provide contact information and alert settings.

Contacts.cfg file

In addition to providing additional contact information for users, one of the fields in this file, contact_name, has another purpose: CGI scripts use the names given in this field to determine whether a user has the right to access a certain resource. Authentication based on .htaccess must be set up, and the same names as used above must be specified in order for users to work through the web interface.
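A sketch of a contact definition; the e-mail address is a placeholder, and the notification command names follow the Nagios 3 sample configuration:

define contact{
        contact_name                    nagiosadmin   ; must match the name in htpasswd.users
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           admin@example.com
        }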

Now that the hosts and contacts are configured, you can proceed to configure the monitoring of the individual services that should be monitored.

Services.cfg file

Here we, as in the hosts.cfg file for the hosts, set only general parameters for all services.
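A sketch of a service template and of a PING check built on it; the host name is hypothetical, while check_ping with its thresholds comes from the standard plugin set:

define service{
        name                    generic-service       ; template
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        register                0
        }

define service{
        use                     generic-service
        host_name               router-example
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }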

There are a huge number of additional Nagios modules available, but if some check is still missing, you can always write it yourself. For example, there is no module to check whether Tomcat is running or not. You can write a script that loads a jsp page from a remote Tomcat server and returns a result depending on whether the loaded page contains a certain text or not; a sketch of such a check is given below. (When adding a new command, be sure to mention it in the checkcommand.cfg file, which we did not touch.)
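A minimal sketch of such a self-written check; the script name, page address and marker text are hypothetical, only the exit-code convention (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN) is fixed by Nagios.

#!/bin/sh
# Hypothetical plugin: load a page from a remote server and look for a marker string.
HOST="$1"        # server address
URI="$2"         # for example /probe.jsp
MARKER="$3"      # text expected on the page

PAGE=$(wget -q -O - --timeout=10 "http://${HOST}${URI}") || {
    echo "PAGE CRITICAL: cannot load http://${HOST}${URI}"
    exit 2
}

if echo "$PAGE" | grep -q "$MARKER"; then
    echo "PAGE OK: marker '${MARKER}' found"
    exit 0
else
    echo "PAGE CRITICAL: marker '${MARKER}' not found"
    exit 2
fi

The script is then registered as a command (for example, with a command_line of the form $USER1$/check_page $HOSTADDRESS$ $ARG1$ $ARG2$) in the command definition file mentioned above.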

Further, for each individual host, we create our own description file, in the same file we will store the descriptions of the services that we will monitor for this host. This is done for convenience and logical organization.

It is worth noting that Windows hosts are monitored via SNMP and the NSClient++ agent used together with Nagios. Below is a diagram of its operation.

Fig. 4.3 - Scheme of monitoring Windows hosts

At the same time, *nix hosts are monitored via SNMP and the NRPE plugin. The scheme of its operation is shown in the figure below.

Fig. 4.4 - Scheme of monitoring *nix hosts
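Sketches of service definitions for both cases; the host names are hypothetical, and the check_nt and check_nrpe commands are assumed to be defined as in the Nagios sample configuration:

# Windows host via the NSClient++ agent (the check_nt plugin ships with the Nagios plugins)
define service{
        use                     generic-service
        host_name               winserver-example
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90
        }

# *nix host via the NRPE agent
define service{
        use                     generic-service
        host_name               linuxserver-example
        service_description     Root Partition
        check_command           check_nrpe!check_disk
        }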

4.2.4 Writing plugins

In addition to writing initialization scripts, defining hosts and services, the following plugins were used:

├── check_disk

├── check_dns

├── check_http

├── check_icmp

├── check_ifoperstatus

├── check_ifstatus

├── check_imap -> check_tcp

├── check_linux_raid

├── check_load

├── check_mrtg

├── check_mrtgtraf

├── check_nrpe

├── check_nt

├── check_ping

├── check_pop -> check_tcp

├── check_sensors

├── check_simap -> check_tcp

├── check_smtp

├── check_snmp

├── check_snmp_load.pl

├── check_snmp_mem.pl

├── check_spop -> check_tcp

├── check_ssh

├── check_ssmtp -> check_tcp

├── check_swap

├── check_tcp

├── check_time

Most of them come with the Nagios package. Source texts of plugins not included in the delivery set and used in the system are presented in Appendix I.

4.2.5 Configuring SNMP on Remote Hosts

To be able to monitor using the SNMP protocol, you must first configure the agents of this protocol on the network equipment. The diagram of SNMP operation in conjunction with the core of the network monitoring system is shown in the figure below.

Fig. 4.5 - Scheme of monitoring via the SNMP protocol

The configuration parameters of the hosts are presented in Appendix H. Security is ensured by individually configuring the packet filter on each of the hosts and by organizing secure system subnets to which only authorized personnel of the enterprise have access. In addition, the configuration is made in such a way that the SNMP protocol can only be used to read parameters, not to write them.
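For the Linux hosts running the net-snmp agent, such read-only access can be expressed, for example, by a fragment of the following kind (the community string and the monitoring-server address are placeholders):

# /etc/snmp/snmpd.conf (fragment)
# Read-only community, accepted only from the monitoring server
rocommunity  monitoring-ro  10.0.0.10
# No rwcommunity is defined, so SNMP SET requests are rejected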

4.2.6 Configuring the agent on remote hosts

To get advanced monitoring of hosts and services, you need to install a Nagios agent called nagios-nrpe-server on them:

# aptitude install nagios-nrpe-server

The agent configuration is shown in Appendix K. The agent workflow is shown in Figure 4.5 above.
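A minimal sketch of the agent configuration; the monitoring-server address, thresholds and plugin paths are assumptions based on the Debian package layout:

# /etc/nagios/nrpe.cfg (fragment)
allowed_hosts=10.0.0.10

# Commands that the monitoring server may request via check_nrpe
command[check_load]=/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

After changing the configuration, the agent is restarted:

# /etc/init.d/nagios-nrpe-server restart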

4.4 Installing and configuring the interface load tracking module

MRTG (Multi Router Traffic Grapher) is a service that uses the SNMP protocol to receive information from several devices and displays channel load graphs (incoming, outgoing, maximum and average traffic) in the browser window with minute, hour, day and year granularity.

Installation Requirements

The following libraries are required for MRTG to work:

§ gd - a graph drawing library responsible for rendering the graphics;

§ libpng - required by gd to create graphics in PNG format.

In our case, the installation is reduced to the execution of one command, since the method of installing the precompiled package from the repository has been selected:

# aptitude install mrtg

You can create configuration files manually, or you can use the configuration generators included in the package:

# cfgmaker <community>@<host address> > <configuration file>

After generating the configuration file, it is recommended to check it, because it may contain descriptions of interfaces that we do not need to analyze for workload. In this case, certain lines in the file are commented out or deleted. An example MRTG configuration file is provided in Appendix M. Due to the large size of these files, only an example of one file is provided.

Index pages for viewing the graphs are generated with the indexmaker utility:

# indexmaker <configuration file> > <index page file>

Index pages are ordinary html files and their contents are not of particular interest, so it makes no sense to give examples of them. Appendix H shows an example of displaying interface load graphs.

Finally, it is necessary to organize a check of the load on the interfaces on a schedule. The easiest way to achieve this is by means of the operating system, namely the crontab parameters.
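For example, a cron entry of roughly the following form polls the devices every five minutes (the configuration-file and log paths are assumptions):

# /etc/cron.d/mrtg (sketch)
*/5 * * * *   root   env LANG=C /usr/bin/mrtg /etc/mrtg/router-example.cfg >> /var/log/mrtg/mrtg.log 2>&1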

4.5 Installing and configuring the module for collecting system event logs

The syslog-ng (syslog next generation) package was chosen as the module for collecting system event logs; it is a multifunctional service for logging system messages. Compared to the standard syslogd service, it has a number of differences:

§ improved configuration diagram

§ filtering messages not only by priority, but also by their content

§ regexps support (regular expressions)

§ more flexible manipulation and organization of logs

§ the ability to encrypt the data transmission channel using IPSec / Stunnel

The following table lists the supported hardware platforms.

Table 4.1 - Supported hardware platforms

Platform | x86 | x86_64 | SUN SPARC | ppc32 | ppc64 | PA-RISC
AIX 5.2 & 5.3 | No | No | No | Yes | On request | No
Debian etch | Yes | Yes | No | No | No | No
FreeBSD 6.1 * | Yes | On request | On request | No | No | No
HP-UX 11i | No | No | No | No | No | Yes
IBM System i | No | No | No | Yes | No | No
Red Hat ES 4 / CentOS 4 | Yes | Yes | No | No | No | No
Red Hat ES 5 / CentOS 5 | Yes | Yes | No | No | No | No
SLES 10 / openSUSE 10.0 | Yes | On request | No | No | No | No
SLES 10 SP1 / openSUSE 10.1 | Yes | Yes | No | No | No | No
Solaris 8 | No | No | Yes | No | No | No
Solaris 9 | On request | No | Yes | No | No | No
Solaris 10 | On request | Yes | Yes | No | No | No
Windows | Yes | Yes | No | No | No | No

Note: * Oracle database access is not supported.

A detailed comparison of technical features is given in Appendix A.

Files describing rules and filters, as well as configuration of remote hosts are given in Appendix P.

There is an RFC document that describes the syslog protocol in detail. In general, the operation of the system log collection module can be represented by the following diagram.

Fig. 4.6 - Diagram of the module for collecting system logs

On the client host, each individual application writes its own event log, thereby forming a source. The message flow then passes through the determination of the storage location and through the filters, where its network direction is determined; after reaching the logging server, the storage location is determined again for each message. The selected module is highly scalable and allows complex configurations: for example, filters can branch so that system event messages are sent in several directions depending on several conditions, as shown in the figure below.

Fig. 4.7 - Branching filters

Scalability implies that, in order to distribute the load, the administrator can deploy a network of auxiliary filtering servers, so-called relays.

Fig. 4.8 - Scaling and load balancing

Ultimately, in the most simplified form, the operation of the module can be described as follows: client hosts transmit messages from the event logs of different applications to offloading (relay) servers, which in turn can pass them along a chain of relays up to the central collection servers.

Fig. 4.9 - Generalized scheme of the module

In our case, the data flow is not large enough to justify deploying a system of offloading servers, so it was decided to use a simplified client-server scheme of operation.

Fig. 4.10 - Accepted work scheme
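A minimal sketch of such a client-server setup in syslog-ng terms; the server address, UDP port and log paths below are assumptions:

# Server side (fragment of syslog-ng.conf): accept messages from the network
# and sort them into per-host directories
source s_net {
    udp(ip(0.0.0.0) port(514));
};
destination d_hosts {
    file("/var/log/hosts/$HOST/$YEAR-$MONTH-$DAY.log" create_dirs(yes));
};
log { source(s_net); destination(d_hosts); };

# Client side (fragment): forward the local log stream to the collection server
source s_local {
    unix-stream("/dev/log");
    internal();
};
destination d_server {
    udp("10.0.0.10" port(514));
};
log { source(s_local); destination(d_server); };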

5. SYSTEM ADMINISTRATOR'S GUIDE

In general, the system administrator is advised to adhere to the existing hierarchy of configuration files and directories. Adding new hosts and services to the monitoring system comes down to creating new configuration files and initialization scripts, as was shown in Section 4 (Software Development), so there is no point in describing the parameters and principles of configuring the system again; instead, it is worth dwelling in more detail on the interfaces of the individual modules of the system.

5.1 Description of the system web interface

A web interface was integrated into the system to make interactive monitoring of services more convenient. The web interface is also good in that it gives a complete picture of the system thanks to the skillful use of graphical tools and the provision of additional statistical information.

When you log into the Nagios web page, you will be prompted for the username and password that we set up during the setup process. The start page of the web interface is shown in the figure below.

Fig. 5.1 - Start page of the system web interface

On the left is the navigation pane, on the right is the results of various views of data about the state of the network, hosts, and services. We will be primarily interested in the Monitoring section. Let's take a look at the Tactical Overview page.

Fig. 5.2 - Tactical Overview page of the web interface

This page contains summarizing information on all monitoring parameters and the status of hosts and services, while no details are provided, however, if any problems arise, they are highlighted in a special color and become a hyperlink leading to a detailed description of the problem that has arisen. In our case, at the moment, among all the hosts and services, there is one unresolved problem, we will follow this link (1 Unhandled Problems).

Fig. 5.3 - Service Problem Detected

Here, in tabular form, we see on which host the problem arose, which service caused it (in our case, a high processor load on the router), the error status (it can be normal, warning or critical), the time of the last check, the duration of the problem, the number of the check attempt in the cycle, and detailed information with the specific values returned by the plugin being used.

Fig. 5.4 - Detailed description of service status

Here we see a full description of the problem. This page is useful for in-depth analysis when the cause of the problem is not entirely clear: for example, it may lie in criticality thresholds that are set too strictly, or in incorrectly set plugin launch parameters, which the system will also evaluate as a critical state. In addition to the description, from this page it is possible to execute commands on the service: disable checks, schedule a different time for the next check, accept data passively, acknowledge the problem, disable alerts, send a manual alert, schedule a service shutdown, disable flapping detection, and add a comment.

Let's go to the Service Detail page.

Fig. 5.5 - Detailed view of all services

Here we see a list of all hosts and services, regardless of their current state. This feature can be useful, although looking through a long list of hosts and services is not very convenient; it is rather needed to visualize, from time to time, the amount of work performed by the system. Here, as in Figure 5.3, each host and service is a link leading to a more detailed description of the parameter.

Fig. 5.6 - Complete Detailed Host List

This table provides a complete detailed list of hosts, their statuses, the time of the last check, the duration of the current status, and additional information. In our system, it is accepted that the status of a host is checked by checking its availability via the ICMP protocol, that is, by using the ping command, but in general the check can be anything. The icons in the column to the right of the host name indicate the group to which it belongs; this is done for the convenience of reading the information. The traffic-light icon is a link leading to a detailed list of services for the given host; there is no point in describing that table separately, as it is exactly the same as in Figure 5.5, only the information concerns a single host.

The following links on the list are various modifications of the previous tables and it will not be difficult to understand their content. The most interesting feature of the web interface is the ability to build a network map in a semi-automatic mode.

Fig. 5.7 - Complete circular network map

Through the parent parameter of each host and service, we can describe the structure, or hierarchy, of our network; it determines both the logic of the monitoring kernel and the presentation of hosts and services on the network map. Besides the circular mode, there are several other display modes; the most convenient are the balanced tree and the spherical (ball) view.
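To illustrate how the parent parameter drives this logic, the sketch below distinguishes a host that is itself DOWN from one that is UNREACHABLE because a parent on the path from the monitoring server has failed. The host names are taken from our network, but the parent relationships shown here are only an assumed example:

# Assumed parent relationships, mirroring the "parents" parameter of the host definitions:
# key = host, value = its parent as seen from the monitoring server (None = directly attached).
PARENTS = {
    "router_su": None,
    "router_lenina58a": "router_su",
    "router_ur39a": "router_su",
    "ub13_router": "router_lenina58a",
}

def classify(host, failed):
    """For a host whose check failed: UNREACHABLE if a parent also failed, otherwise DOWN."""
    if host not in failed:
        return "UP"
    parent = PARENTS.get(host)
    while parent is not None:
        if parent in failed:
            return "UNREACHABLE"
        parent = PARENTS.get(parent)
    return "DOWN"

failed_checks = {"router_lenina58a", "ub13_router"}
for host in PARENTS:
    print(host, classify(host, failed_checks))

In this example router_lenina58a is reported as DOWN, while ub13_router behind it is reported as UNREACHABLE, which is how such failures are presented on the map.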

Fig. 5.8 - Network Map - Balanced Tree Mode

Fig. 5.9 - Network Map - Ball Mode

In all modes, the image of each host is a link to its table of services and their states.

The next important part of the monitoring engine's interface is the trend builder. With its help you can, for example, plan the replacement of equipment with a more productive one. Click the Trends link and select the report type - service.

Step 1: Select Report Type: Service

The final step is to select the reporting period and generate the report.

Fig. 5.10 - Trend

We have generated a trend of processor utilization on the router. From it we can conclude that over the past month this parameter has been steadily deteriorating, and measures must be taken either to optimize the host's operation or to prepare to replace it with a more productive one.

5.2 Description of the web interface of the interface load tracking module

The web interface of the interface load tracking module is a list of directories that contain index pages of monitored hosts with graphs of the load of each interface.

Fig. 5.11 - Start page of the interface load tracking module

By clicking on any of the links, we get the load graphs. Each graph is in turn a link leading to statistics for the week, month, and year.
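Behind these graphs are the interface octet counters, which the module polls over SNMP in the same way MRTG does: two samples of the counters are taken and the difference is converted into a rate. A minimal sketch of such a poll, assuming the net-snmp snmpget utility is installed and using a placeholder address, community string and interface index:

import subprocess
import time

# Standard IF-MIB octet counters for interface index 2 (the index is an assumption)
OIDS = {"in": "1.3.6.1.2.1.2.2.1.10.2", "out": "1.3.6.1.2.1.2.2.1.16.2"}

def snmp_get(host, community, oid):
    """Read a single value with the net-snmp snmpget utility and return it as an integer."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())

def interface_load(host, community="public", interval=30):
    """Sample the octet counters twice and return the load in kbit/s, as an MRTG poll does."""
    first = {k: snmp_get(host, community, oid) for k, oid in OIDS.items()}
    time.sleep(interval)
    second = {k: snmp_get(host, community, oid) for k, oid in OIDS.items()}
    return {k: (second[k] - first[k]) * 8 / interval / 1000 for k in OIDS}

print(interface_load("192.168.0.1"))  # placeholder address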

5.3 Description of the module for collecting system event logs

At the moment, advanced filtering of the system logs and the ability to search through them via a single web interface are not required, since problems that call for viewing these logs are rare. The development of a database for these logs and of a web interface for them has therefore been postponed. Currently, they are accessed via ssh by browsing the directories in the mc file manager.

As a result of the work of this module, we got the following directory structure:

├── apache2

├── asterix

├── bgp_router

├── dbconfig-common

├── installer

│ └── cdebconf

├── len58a_3lvl

├── monitoring

├── nagios3

│ └── archives

├── ocsinventory-client

├── ocsinventory-server

├── quagga

├── router_krivous36b

├── router_lenina58a

├── router_su

├── router_ur39a

├── shaper

├── ub13_router

├── univer11_router

└── voip

Each directory is a repository of event logs for each individual host.

Fig. 5.13 - Viewing data collected by the system event log collection module

6. SYSTEM TESTING

During the implementation of the system, each component was tested step by step, starting with the system core. Because of the many dependencies between the various subsystems, functionality was expanded only after the modules lower in the hierarchy of the network monitoring system had been finally adjusted. In general, the implementation and testing process can be described as follows:

1) Installation and debugging of the kernel based on Nagios;

2) Setting up monitoring of remote hosts with the basic functionality of Nagios;

3) Setting up the module for monitoring the load of network interfaces by means of MRTG;

4) Expansion of the functionality of the core of the system and its integration with the MRTG module;

5) Adjustment of the module for collecting system logs;

6) Writing a script for initializing the packet filter of the monitoring system in order to ensure system security.

7. LIFE SAFETY

7.1 Characteristics of the workplace

The harmful factors affecting the work when using a PC include:

· increased voltage in the electrical circuit;

· noise;

· electromagnetic radiation;

· electrostatic field.

To ensure efficient and safe work, it is necessary to create working conditions that are comfortable and that minimize the impact of these harmful factors. The listed harmful factors must comply with the established rules and standards.

7.2 Occupational safety

7.2.1 Electrical safety

The designed software tool is intended to run on an existing server located in a specially equipped technical room, which is fitted with cable ducts for cable routing. Each server is supplied with 220 V, 50 Hz power with a working ground. Circuit breakers that cut off the power supply in the event of a short circuit are installed before the power supply enters the room. Protective grounding is laid separately.

When connecting a computer, the equipment case must be connected to the protective earth conductor so that, in the event of insulation failure or for any other reason, a dangerous supply voltage on the case cannot drive a dangerous current through a person's body when they touch it.

To do this, use the third contact in electrical outlets, which is connected to the protective earth conductor. The equipment enclosures are grounded through the power supply cable using a dedicated conductor.

To protect against electric shock when touching the body of an electrical installation in the event of insulation breakdown of live parts, the following technical measures are applied:

· protective grounding;

· protective neutral bonding (zeroing);

· protective shutdown.

7.2.2 Noise protection

Research shows that hearing impairment is the most significant effect of noise, but its impact is not limited to hearing. Noise causes noticeable changes in a number of physiological and mental functions: it has a harmful effect on the nervous system, reduces the speed and accuracy of sensorimotor processes, and increases the number of errors made when solving intellectual tasks. Noise also noticeably affects a person's attention and causes negative emotions.

The main sources of noise in rooms where computers are located are air-conditioning equipment, printing and copying devices and, inside the computers themselves, the cooling fans.

The following noise control measures are actively used in the production area:

· the use of silent cooling mechanisms;

· isolation of noise sources from the environment by means of sound insulation and sound absorption;

· the use of sound-absorbing materials for facing the premises.

The following noise sources are present at the workplace:

· system unit (cooler (25dB), hard disk (29dB), power supply (20dB));

· printer (49dB).

The total noise level L produced by these devices is calculated by energetic summation of the individual levels:

L = 10 * lg(Σ 10^(Li/10)) (7.1)

where Li is the noise level of one device, dB.

L = 10 * lg(316.23 + 794.33 + 100 + 79432.82) = 10 * 4.91 = 49.1 dB
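The same summation can be verified with a short script using the source levels listed above:

import math

# Sound levels of the individual sources at the workplace, dB
levels_db = [25, 29, 20, 49]  # cooler, hard disk, power supply, printer

# Energetic summation of incoherent sources: L = 10 * lg(sum(10^(Li/10)))
total = 10 * math.log10(sum(10 ** (li / 10) for li in levels_db))
print(f"Total noise level: {total:.1f} dB")  # about 49.1 dB, below the 50 dB limit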

According to SN 2.2.4 / 2.1.8.562-96, the noise level at the workplace of mathematicians-programmers and video operators should not exceed 50 dB.

7.2.3 Protection against electromagnetic radiation

Protection against electromagnetic radiation is provided by screens with an electrically conductive surface and by the use of monitors equipped with the Low Radiation system, which minimizes the level of harmful radiation, as well as LCD monitors, which emit practically no electromagnetic radiation.

7.2.4 Protection against electrostatic field

To protect against electrostatic charges, a grounded protective filter and air humidifiers are used, and the floors have an antistatic coating. To maintain the standardized concentrations of positive and negative ions in rooms with computers, air conditioners and air-ionization devices are installed, and natural ventilation is provided for at least 10 minutes after every 2 hours of operation.

To prevent the harmful effect on workers of dust particles carrying air ions, the premises are wet-cleaned daily, and dust is removed from the screens at least once per shift, with the monitor switched off.

7.3 Working conditions

7.3.1 Microclimate of the production area

The equipment considered in this diploma project does not emit any harmful substances during operation. Thus, the air in the room where it is used has no harmful effect on the human body and meets the requirements for category I work according to GOST 12.1.005-88.

Optimal norms of temperature, relative humidity and air velocity in the working area of ​​industrial premises are standardized by GOST 12.1.005-88 and are shown in Table 7.1.

Table 7.1 - Microclimate parameters

Standardized parameter | Optimal | Allowable | Actual
Air temperature, °C | 20 - 22 | 18 - 20 | 20
Relative humidity, % | 40 - 60 | Not more than 80 | 45
Air velocity, m/s | 0.2 | 0.3 | 0 - 0.3

The microclimate corresponds to the optimal conditions.

7.3.2 Industrial lighting

For the calculation, we select the support department at Gerkon LLC in the city of Verkhnyaya Pyshma, where the development of this project took place:

· the area of the room is 60 m2;

· the area of the light openings is 10 m2;

· 4 automated workstations are installed.

Natural illumination is calculated according to the formula from SNiP 23.05-95:

S0 = (Sп * eн * Кз * N0 * KZD) / (100% * Т0 * Т1) (7.2)

where S0 is the area of the light openings, m2;

Sп - floor area of the room, m2: 60;

eн - coefficient of natural illumination: 1.6;

Кз - safety factor: 1.5;

N0 - light characteristic of the windows: 1;

KZD - coefficient taking into account the darkening of the windows by opposing buildings: 1.2;

Т0 - overall light transmission coefficient: 0.48;

Т1 - coefficient of reflection from the surfaces of the room: 1.2.

The values of all coefficients are taken from SNiP 23.05-95.

As a result of the calculation, we obtain the required area of the light openings: S0 = 3.4 m2. The actual area of the openings is 10 m2, which exceeds the minimum allowable area of light openings for this type of room and is sufficient during daylight hours.

Artificial lighting is calculated for a room illuminated by 15 fluorescent lamps of the LDTs-60 type with a power of 60 W each.

According to SNiP 23.05-95, the illumination provided by fluorescent lamps in the horizontal plane should be at least 300 lx for a general lighting system. Taking into account visual work of high accuracy, the illumination value can be increased up to 1000 lx.

The luminous flux of a fluorescent lamp is calculated using the formula from SNiP 23.05-95:

Phi = (En * S * Z * K) / (N * η) (7.3)

where En - normalized illumination of the room, lx: 200;

S - floor area of the room, m2: 60;

Z - coefficient taking into account the ratio of average to minimum illumination: 1.1;

K - safety factor taking into account air pollution: 1.3;

N - number of lamps: 15;

η - luminous flux utilization factor: 0.8.

As a result, we get Phi = 1340lm, the total luminous flux of all lamps is 3740lm, therefore, the illumination of the laboratory is higher than the minimum allowable.

7.4 Ergonomics of the workplace

7.4.1 Organization of the workplace

In accordance with SanPiN 2.2.2/4.2.1340-03, the VDT (video display terminal) must meet the following technical requirements:

· display brightness not less than 100 cd/m2;

· the minimum size of the light spot not more than 0.1 mm for a color display;

· the contrast of the character image not less than 0.8;

· vertical scan frequency not less than 7 kHz;

· the number of dots not less than 640;

· anti-reflective coating of the screen;

· screen size not less than 31 cm diagonally;

· the height of characters on the screen not less than 3.8 mm;

· the distance from the operator's eyes to the screen about 40 - 80 cm.

The VDT should be mounted on a swivel stand that allows it to be moved in the horizontal and vertical planes within 130 - 220 mm and the screen tilt angle to be changed by 10 - 15 degrees.

The diploma project was carried out on a computer with a 39cm ViewSonic VDT. This monitor is manufactured in accordance with global standards and meets all of the above technical requirements.

The following requirements are imposed on the keyboard:

· the body is painted in calm, soft colors with diffuse light scattering;

· the surface is matte, with a reflectance of 0.4 - 0.6, and has no shiny parts that could create glare.

The project was carried out on a Logitech brand keyboard that meets all of the above requirements.

The system units are installed at the workplace with easy access to the floppy drives and convenient access to the connectors and controls on the rear side. Frequently used floppy disks are stored near the system unit in a cell protected from dust and electromagnetic fields. The printer is located to the right of the user, so that the printed text is visible to the operator in the main working position. Blank paper and other necessary supplies are stored in dedicated compartments close to the printer.

The connecting cables are laid in special ducts. The arrangement of the channels must be such that the connectors do not impede the removal of the cables.

To the right of the user, a free area is provided on the tabletop for the mouse; it should be identical in shape and size to the screen surface.

The operator's workplace meets the requirements of GOST 12.2.032-78 SSBT.

The spatial organization of the workplace ensures an optimal working posture:

· the head is tilted forward by 10 - 20 degrees;

· the back has a support, and the angle between the shoulder and the forearm, as well as between the thigh and the lower leg, is a right angle.

The main parameters of the workplace must be adjustable. This makes it possible to create favorable working conditions for an individual, taking into account his or her anthropometric characteristics.

The main parameters of a workplace equipped with a personal computer are as follows (Fig. 7.1):

Fig. 7.1 - Workstation of the computer operator

· seat height 42 - 45 cm;

· keyboard height from the floor 70 - 85 cm;

· keyboard tilt angle from the horizontal 7 - 15 degrees;

· distance of the keyboard from the edge of the table 10 - 26 cm;

· distance from the center of the screen to the floor 90 - 115 cm;

· screen tilt angle from the vertical 0 - 30 degrees (optimal 15);

· distance of the screen from the edge of the table 50 - 75 cm;

· height of the working surface for notes 74 - 78 cm.

The workplace is provided with a footrest, which is recommended for all types of work involving prolonged sitting.

According to SanPiN 2.2.2.542-96, the work of a computer operator is classified as light and belongs to category 1A.

Breaks are provided 2 hours after the start of the work shift and 2 hours after the lunch break, each lasting 15 minutes. During these regulated breaks, exercise complexes are performed in order to reduce neuro-emotional stress and fatigue and to eliminate the effects of physical inactivity.

7.5 Fire safety

For the room in which the work on this project was carried out, a fire hazard category was established according to NPB 105-03: flammable and slow-burning liquids, solid combustible and slow-burning substances and materials, including dust and fibers, and substances and materials that can burn only when interacting with water, atmospheric oxygen or one another, provided that the premises in which they are present or formed do not belong to categories A or B. The building is of fire-resistance degree I according to SNiP 21-01-97.

The following safety rules are observed in the production area:

· passages, exits from the premises, access to fire extinguishing means are free;

· equipment in operation is in good working order and is checked every time before starting work;

· at the end of the work, the room is examined, the power supply is de-energized, the room is closed.

There are two evacuation exits from the premises of the building. The width of the emergency exit (door) is 2 m. The escape routes use ordinary stairs and swing doors. The stairwells contain no rooms, technological communications, or exits of passenger and freight elevators. The escape routes are equipped with both natural and artificial emergency lighting.

As the primary fire extinguishing means, the room is equipped with two hand-held carbon dioxide fire extinguishers.

To detect a fire at its initial stage and alert the fire service, an automatic fire alarm system (APS) is used. It independently activates the fire extinguishing installations before the fire grows large and notifies the city fire service.

In addition to the APS, the exhibition center's facilities must be equipped with stationary fire extinguishing installations. Gas fire extinguishing installations are used; their action is based on rapidly filling the room with a gaseous extinguishing agent, which reduces the oxygen content in the air.

7.6 Emergencies

A fire is the most likely emergency in this room. In the event of a fire, it is necessary to evacuate personnel and report the incident to the fire brigade. The evacuation plan is shown in Figure 7.2.

Rice. 7.2 - Fire escape plan

8. ECONOMIC SECTION

This section discusses the costs of developing a network monitoring system, its implementation and maintenance, as well as related materials and equipment.

The cost of the project reflects the cost of the means and objects of labor consumed in the development and production process (depreciation, the cost of equipment, materials, fuel, energy, etc.), a part of the cost of living labor (labor remuneration), the cost of purchased modules of the system.

As the company's activity and the volume of services provided grew, the problem arose of proactively detecting faulty and weak points in the network; that is, the task was to implement a solution that would predict the need to replace or upgrade network sections before faults affected the operation of subscriber nodes.

With the growth of the client base and, consequently, of the amount of active equipment, it became necessary to promptly monitor the state of the network as a whole and of its individual elements in detail. Before the implementation of the network monitoring system, the network administrator had to connect via telnet, http, snmp, ssh and other protocols to each network node of interest and use the built-in monitoring and diagnostic tools. At the moment, the network comprises 5,000 ports, 300 Layer 2 switches, 15 routers and 20 internal servers.

In addition, network congestion and intermittent ("floating") faults were detected only when serious user problems arose, which made it impossible to plan network upgrades.

All this led, first of all, to a constant deterioration in the quality of the services offered and an increase in the load on system administrators and technical support for users, which entailed colossal losses.

Given this situation, it was decided to develop and implement a network monitoring system that would solve all of the above problems, which can be summarized as follows:

It is necessary to develop and implement a monitoring system that covers switches and routers from different manufacturers as well as servers of different platforms. The focus is on using open protocols and systems, making maximum use of ready-made developments from the pool of free software, which from an economic point of view reduces the licensing cost of the final system to zero.

The system must meet the following economic requirements:

· minimum requirements for hardware resources (leads to lower costs for the hardware part of the project);

· open source codes of all components of the complex (allows you to independently change the principle of the system without resorting to the help of third-party proprietary developments and reduces the cost of product licensing);

· extensibility and scalability of the system (allows the scope of application to be expanded without resorting to third-party proprietary developments and without additional licensing costs);

· standard means of providing diagnostic information (allows you to reduce the cost of maintaining the system);

· availability of detailed documentation for all software products used (makes it possible to quickly train a new employee);

· the ability to work with equipment from different manufacturers (makes it possible to use one software product). (See Appendix B for a complete list of equipment).

In general, the development of the project took 112 hours (2 weeks). The implementation of this project will take 56 hours (1 week).

8.1 Calculation of project development costs

Project development costs consist of:

· payroll costs;

· depreciation costs of equipment and software products;

· the cost of paying for electricity;

· overhead costs.

Payroll expenses.

When calculating the cost of wages, we take into account that this project was developed by one person: a systems engineer.

The average market salary of a system engineer of the required level in the region is 30,000 rubles.

Let's calculate the cost of 1 hour of work of an engineer, based on the following data:

· premium 25%;

· regional coefficient 15%;

· the fund of working time in 2010, in accordance with the production calendar, is 1988 hours;

Thus, the hourly rate, taking into account the premium and the regional coefficient, will be:

RF = 30,000 * 1.25 * 1.15 * 12/1988 = 260 rubles

The calculation of wage costs also takes into account the deductions paid from accrued wages; the total rate of insurance contributions is taken as the maximum UST rate of 26% plus 0.2% for compulsory accident insurance, i.e. 26.2%, including:

· Pension Fund - 20%;

· FSSR - 2.9%

· FFOMS - 1.1%;

· GFOMS - 2%;

· Compulsory social insurance against accidents - 0.2%.

In total, the deductions will be:

CO = RF * 0.262 = 260 * 0.262 = 68 rubles

Taking into account the engineer's work time (112 hours for development and 56 hours for implementation), we will calculate the salary costs:

ZP = (112 + 56) * (RF + CO) = 168 * 328 = 55104 rubles
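The same payroll calculation can be reproduced with a short script; all input figures are the ones quoted above:

# Payroll calculation for the development (112 h) and implementation (56 h) of the system
salary = 30_000          # monthly salary of the systems engineer, rubles
premium = 0.25           # 25% premium
regional = 0.15          # 15% regional coefficient
hours_2010 = 1988        # working-time fund for 2010, hours
insurance = 0.262        # UST 26% plus 0.2% accident insurance

rate = salary * (1 + premium) * (1 + regional) * 12 / hours_2010  # ~260 rubles per hour
deductions = rate * insurance                                     # ~68 rubles per hour
payroll = (112 + 56) * (round(rate) + round(deductions))
print(round(rate), round(deductions), payroll)  # 260 68 55104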

Depreciation expenses for equipment and software products.

A personal computer and an AQUARIUS SERVER T40 S41 server were used as the main equipment at the development stage of the network project. The current cost of the computer is about 17,000 rubles, and that of the server is 30,000 rubles.

Thus, the cost of a one-time investment in equipment will be:

РВА = 47000 rubles

During the service life of the computer and the server, their modernization is allowed, and this type of cost is also taken into account in the calculation. We allocate 50% of РВА for modernization:

РМА = РВА * 0.5 = 23500 rubles

The computer was used in the following steps:

· literature search;

· search for solutions for the design of a network monitoring system;

· development of structures and subsystems;

· designing a network monitoring system;

· registration of the document.

The server was used during the implementation of the system and direct work with the system.

The software products used in the development were obtained under free licenses, which indicates their zero cost and the absence of the need for their depreciation.

Thus, the total cost of equipment, taking into account depreciation, will be:

OZA = РВА + РМА = 47000 + 23500 = 70500 rubles

The useful life is assumed to be 2 years. The cost of one hour of operation (assuming 22 working days per month and an 8-hour working day) is:

SOCHR = OZA / BP = 70500/4224 = 16.69 rubles

At the time of development and implementation, the cost of depreciation deductions will accordingly be:

SACHRV = SOCHR * TRV = 16.69 * 168 = 2803.92 rubles

Electricity costs.

Electricity costs are made up of the electricity consumed by the computer and the server and the electricity used for lighting. The electricity tariff is:

SEN = 0.80 rubles/kWh (by agreement with the owner of the premises)

Рк, с = 200 W - the power consumed by the computer or server.

Trk = 168 hours - computer operation time at the stage of system development and implementation.

Trs = 52 hours - server operation time at the stage of system development and implementation.

Thus, the cost of electricity at the stage of development and implementation of the project will be:

SENP = Рк * Trk * SEN + Рс * Trs * SEN = (200 * 168 * 0.80 + 200 * 52 * 0.80) / 1000 = (26880 + 8320) / 1000 = 35.2 rubles

The workplace where this work was carried out is equipped with a 100 W lamp. Let's calculate the cost of electricity consumed by the lighting device during the development and implementation of the system:

SENO = 100 * Trk * SEN = (100 * 168 * 0.80) / 1000 = 13.44 rubles

The total electricity costs were:

OZEN = SENP + SENO = 35.2 + 13.44 = 48.64 rubles

8.2 Calculation of overheads

This cost item covers the cost of other equipment and consumables, as well as incidental expenses.

Overhead costs in the company's budget are 400% of the accrued wages:

NR = ZP * 4 = 55104 * 4 = 220416 rubles.

Thus, the costs for the development and implementation of the project were:

SRV = ZP + SACHRV + OZEN + NR = 55104 + 2803.92 + 48.64 + 220416 = 278372.56 rubles
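For verification, the cost roll-up can be reproduced from the figures obtained above:

# Roll-up of the project development and implementation costs (figures from the text)
payroll = 55_104.00        # ZP: wages including insurance contributions
depreciation = 2_803.92    # SACHRV: depreciation of the PC and the server
electricity = 48.64        # OZEN: electricity for the equipment and lighting
overhead = payroll * 4     # NR: overheads, 400% of accrued wages

total = payroll + depreciation + electricity + overhead
print(f"{total:.2f} rubles")  # 278372.56 rubles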

8.3 Efficiency

As a result of the economic calculations, the minimum price for the development and implementation of the network monitoring system was set at 278372.56 rubles.

As can be seen from the calculations, the overwhelming part of the costs falls on materials and equipment. This is because the main equipment manufacturers are foreign companies, so the prices for these products are quoted in US dollars at the CBRF rate + 3%, and the increase in customs duties on imported products also negatively affects the price for end customers.

To justify the independent development of the system, let's compare its cost with the ready-made solutions available on the market:

· D-Link D-View - 360,000 rubles

Active network equipment must ensure long-term and uninterrupted functioning of the corporate network. Timely identification and elimination of faults is the key to the successful and efficient work of the company. That is why it is very important to pay special attention to the monitoring system, which would track the status of active equipment and notify the system administrator about deviations from normal indicators by SMS, e-mail or other means of notification.

A monitoring system is a set of technical means that continuously observe and collect information in a local area network, based on the analysis of statistical data, in order to identify faulty or incorrectly operating nodes and notify the responsible persons. The functionality of modern monitoring systems makes it possible to monitor the status of services such as:

1) Host availability - by periodically sending an ICMP Echo Request to the address of the network device;

2) Web server availability - by sending an HTTP request to retrieve a page;

3) Availability of mail services - by periodically sending diagnostic SMTP messages.

In addition, you can measure the response time of these services.
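As an illustration, a web server availability check with response-time measurement can be sketched as follows (the URL is a placeholder; real systems normally use dedicated plugins such as check_http):

import time
import urllib.request

def check_http(url, timeout=5):
    """Fetch a page and report availability and response time, as a monitoring check would."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = response.status == 200
    except OSError:
        ok = False
    return ok, time.monotonic() - start

ok, elapsed = check_http("http://192.168.0.10/")  # placeholder URL
print("OK" if ok else "CRITICAL", f"{elapsed:.3f} s")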

Periodic checks of this kind allow you to quickly determine at what level the problem has occurred and immediately begin to fix it.

The simplest example of a monitoring system implementation monitors only four devices. In real conditions, the fleet of active equipment can have many more nodes. For competent monitoring, different types of nodes are combined into groups, for example a group of web servers or a group of routers. This kind of grouping helps to organize statistical information and makes observation easier.

Most monitoring systems allow automating SNMP device checking and diagnostics using various plug-ins (including manually created ones).

SNMP (Simple Network Management Protocol) was created specifically for the needs of network equipment monitoring. All active L2 and L3 devices contain a so-called Management Information Base (MIB), which holds the main parameters of the equipment status: for example, CPU load, interface status, free disk space, and so on. Each such record corresponds to a unique OID (Object IDentifier). Given the required identifier, you can obtain information about the parameter of interest using the SNMP protocol. Modern monitoring systems make it possible to automate this process: the system connects to the device over SNMP, polls it for the OID of interest, gets the parameter value and compares it with the specified one. If a discrepancy between the two values is detected, the monitoring system reacts and starts the notification process.
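A minimal sketch of such a check, assuming the net-snmp snmpget utility, a placeholder device address and community string, and a hypothetical OID of a CPU-load value:

import subprocess

CPU_OID = "1.3.6.1.4.1.9.2.1.58.0"  # hypothetical/example OID of a CPU load value
THRESHOLD = 80                       # alert threshold, per cent

def snmp_get_int(host, community, oid):
    """Poll a single OID with the net-snmp snmpget utility and return its numeric value."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())

load = snmp_get_int("192.168.0.1", "public", CPU_OID)  # placeholder address and community
if load > THRESHOLD:
    print(f"ALERT: CPU load {load}% exceeds {THRESHOLD}% - starting notification")
else:
    print(f"OK: CPU load {load}%")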

Before the direct implementation of the monitoring system, it is necessary to conduct a LAN survey, the result of which should be a list of monitored equipment, parameters and an approved algorithm for escalating monitoring events. Based on the analysis of the customer's network infrastructure, the first decisions that determine the architecture of the future monitoring system are formed.

At the next stage, specifications and a package of design documentation are drawn up, taking into account the wishes of the customer.

The final stage is scaling the monitoring system, that is, expanding the volume of the monitored IT infrastructure to the one required by the customer.

Implementation of the monitoring system is an important step towards the complete automation of the IT infrastructure, which leads to an increase in the efficiency of its use. The specialists of our company have repeatedly developed solutions that meet the expectations of customers and have been working reliably for several years now.
