Data Warehousing and Data Mining Techniques for Cyber Security
A classification model, for example, can be built to categorize bank loan applications as either safe or risky. A prediction model can be built to predict the expenditures of potential customers on computer equipment given their income and occupation.
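
As a rough illustration of the loan-classification example above, the following Python sketch trains a decision tree on a small, invented set of applicant records. The feature names, the data values, and the scikit-learn dependency are assumptions made for illustration only.

    # Hypothetical sketch: classify loan applications as "safe" or "risky" with a decision tree.
    from sklearn.tree import DecisionTreeClassifier

    # Each record: [income (thousands), debt (thousands), years_employed]
    X_train = [
        [85, 10, 12],
        [30, 25, 1],
        [60, 5, 7],
        [22, 18, 0],
        [95, 2, 20],
    ]
    y_train = ["safe", "risky", "safe", "risky", "safe"]

    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X_train, y_train)

    # Predict the class of a new, previously unseen application.
    new_application = [[45, 20, 2]]
    print(model.predict(new_application))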

Some of the basic techniques for data classification are decision tree induction, Bayesian classification and neural networks. These techniques find a set of models that describe the different classes of objects. These models can be used to predict the class of an object for which the class is unknown. Clustering is based on the principle of maximizing the intraclass similarity and minimizing the interclass similarity.

In business, clustering can be used to identify customer groups based on their purchasing patterns. It can also be used to help classify documents on the web for information discovery. Due to the large amount of data collected, cluster analysis has recently become a highly active topic in data mining research.

As a branch of statistics, cluster analysis has been extensively studied for many years, focusing primarily on distance-based methods. In machine learning, clustering is an example of unsupervised learning; for this reason clustering is an example of learning by observation. Data objects that do not fit well into any cluster are called outliers. Identifying these outliers is useful for applications such as fraud detection and network intrusion detection.
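
As a minimal sketch of distance-based clustering and outlier detection, the fragment below groups invented customer records with k-means and treats a record that lies far from every cluster centre as a potential outlier. The data, the feature names, and the scikit-learn dependency are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    # Invented customer data: [annual_spend, visits_per_month]
    normal = np.array([
        [200, 2], [220, 3], [210, 2],      # low spenders
        [900, 8], [950, 9], [880, 7],      # high spenders
    ])
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(normal)

    def distance_to_nearest_centre(point):
        return np.min(np.linalg.norm(kmeans.cluster_centers_ - point, axis=1))

    # A record that falls far outside every cluster is treated as an outlier,
    # which is the intuition behind using clustering for fraud detection.
    candidate = np.array([5000, 1])
    print(distance_to_nearest_centre(candidate))   # large value -> likely outlier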

Data mining can help business managers find and reach suitable customers as well as develop special intelligence to improve market share and profits. Here are some applications of data mining. Recent research in DNA data analysis has enabled the discovery of genetic causes of many diseases as well as the discovery of new medicines. One of the important search problems in genetic analysis is similarity search and comparison among DNA sequences.

Data mining techniques can be used to solve these problems. Intrusion Detection and Network Security: This will be discussed further in later chapters.

Financial Data Analysis: Most financial institutions offer a variety of banking services such as credit and investment services. Data warehousing techniques can be used to gather the data to generate monthly reports. Data mining techniques can be used to predict loan payments and to analyze customer credit policy. Data Analysis for the Retail Industry: Retail is a major application area for data mining since it collects huge amounts of data on sales, shopping history and service records.

Data mining can also be used to analyze the effectiveness of sales campaigns. The remainder of this section briefly surveys several commercial data mining products. One product provides an application toolkit for neural network algorithms and data visualization, offers scalable mining algorithms, and is tightly integrated with IBM's DB2 relational database system. Another provides multiple data mining algorithms including regression, classification and statistical analysis packages.

One of its distinctive features is the variety of statistical analysis tools, which build on the long history of SAS in the market for statistical analysis.

MineSet also provides multiple data mining algorithms and advanced visualization tools. One distinguishing feature of MineSet is its set of robust graphics tools, such as the rule visualizer and tree visualizer. Clementine provides an integrated data mining development environment for end users and developers. Its object-oriented extended module interface allows users' algorithms and utilities to be added to Clementine's visual programming environment. DBMiner provides multiple data mining algorithms including discovery-driven OLAP analysis, association, classification and clustering.

A distinct feature of DBMiner is its data cube based analytical mining. Interested readers can consult surveys on data warehousing and data mining products. The primary functions supported by GEMS are: ticketing, proactive management of clients' networks, client asset management, network engineering and billing. GEMS applications use an Integrated Database to store fault tickets, assets and inventory management information. An SLA is a contract between the service provider and a customer (usually an enterprise) on the level of service quality that should be delivered.

An SLA can contain metrics such as the available network bandwidth, along with penalties that apply when service targets are not met. In addition, the provider must have the ability to drill down to detailed data in response to customer inquiries. The main reason to separate the decision support data from the operational data is performance. Operational databases are designed for known transaction workloads. Moreover, special data organization and access methods are required for optimizing the report generation process.

This project also required data integration and data fusion from many external sources such as operational databases and flat files. The main components used in our system are as follows. Ascential's DataStage tool is an Extraction-Transformation-Load-Management (ETLM) class of tool that defines how data is extracted from a data source, transformed by the application of functions, joins and possibly external routines, and then loaded into a target data source.

DataStage reads data from the source information repositories and applies transformations as it loads all data into an atomic data repository. Once the atomic data repository is loaded with all source information, a second level of ETL transformations is applied to various data streams to create one or more Data Marts. Data Marts are a special sub-component of a data warehouse in that they are highly de-normalized to support the fast execution of reports.
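
The following Python sketch mimics, in miniature, the two-level ETL flow described above: rows are extracted from a source file into an atomic store and then aggregated into a de-normalized data-mart table. The file names, column names, and the pandas dependency are assumptions made for illustration; the actual system used Ascential DataStage.

    import pandas as pd

    # Stage 1: extract source records into an "atomic" repository (here, a DataFrame).
    # 'tickets.csv' and its columns are hypothetical.
    atomic = pd.read_csv("tickets.csv")          # e.g. columns: client, month, outage_minutes
    atomic["outage_minutes"] = atomic["outage_minutes"].fillna(0)   # simple cleaning step

    # Stage 2: a second level of transformation builds a de-normalized data mart,
    # pre-aggregated so that monthly SLA reports run quickly.
    data_mart = (
        atomic.groupby(["client", "month"], as_index=False)["outage_minutes"]
              .sum()
              .rename(columns={"outage_minutes": "total_outage_minutes"})
    )
    data_mart.to_csv("sla_report_mart.csv", index=False)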

Some of these Data Marts are created using Star Schemas. Also, the execution of reports does not impact the applications that are using the source databases. The schemas in the DW are optimized by using de-normalization and pre-aggregation techniques. This results in much better execution times for reports. Some of the open research problems that we are currently investigating are the following. First, the time to refresh the data in the data warehouse was large, and report generation activity had to be suspended until the changes were propagated into the DW.

Therefore, there was a need to investigate incremental techniques for propagating the updates from the source databases. Second, loading the data into the data warehouse took a long time (10 to 15 hours).

In case of any crashes, the entire loading process had to be re-started. Third, there was no good support for tracing the data in the DW back to the source information repositories. The process that is used to design, deploy and manage the data marts is called the ETL (Extract, Transform and Load) process.

There are a number of open research problems in designing the ETL process. Maintenance of Data Consistency: Since source data repositories continuously evolve by modifying their content or changing their schema, one of the research problems is how to incrementally propagate these changes to the central data warehouse. Both re-computation and incremental view maintenance are well understood for centralized relational databases.

However, more complex algorithms are required when updates originate from multiple sources and affect multiple views in the Data Warehouse. The problem is further complicated if the source databases are going through schema evolution. Maintenance of Summary Tables: Decision support functions in a data warehouse involve complex queries. It is not feasible to execute these queries by scanning the entire data.

Therefore, a data warehouse builds a large number of summary tables to improve performance. As changes occur in the source databases, all summary tables in the data warehouse need to be updated. A critical problem in a data warehouse is how to update these summary tables efficiently and incrementally.
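
As a toy illustration of incremental summary-table maintenance, the sketch below keeps a running sales-by-region summary and folds in a batch of new source rows without rescanning the full history. The table layout and the data are hypothetical.

    # Hypothetical summary table: region -> [total_sales, row_count]
    summary = {"east": [10500.0, 42], "west": [8700.0, 35]}

    def apply_batch(summary, new_rows):
        """Incrementally fold a batch of (region, amount) source changes
        into the summary instead of recomputing it from scratch."""
        for region, amount in new_rows:
            total, count = summary.setdefault(region, [0.0, 0])
            summary[region] = [total + amount, count + 1]

    apply_batch(summary, [("east", 250.0), ("north", 90.0)])
    print(summary)   # {'east': [10750.0, 43], 'west': [8700.0, 35], 'north': [90.0, 1]}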

Incremental Resuming of Failed Loading Process: Warehouse creation and maintenance loads typically take hours to run. If the load is interrupted by failures, traditional recovery methods undo the changes. The administrator must then restart the load and hope that it does not fail again. More research is required into algorithms for resumption of the incomplete load so as to reduce the total load time. Tracing the Lineage of Data: Given data items in the data warehouse, analysts often want to identify the source items and source databases that produced those data items. Research is required for algorithms to trace the lineage of an item from a view back to the source data items in the multiple sources.

Data Reduction Techniques: If the input data is very large, data analysis can take a very long time, making such analysis impractical or infeasible. There is a need for data reduction techniques that can be used to reduce the data set so that analysis on the reduced set is more efficient and yet produces the same analytical results.

The following are examples of algorithmic techniques that can be used for data reduction; these operations reduce the amount of data in the DW and also improve the execution time for decision support queries on that data. Dimension Reduction: This is accomplished by detecting and removing irrelevant attributes that are not required for data analysis.

Data Compression: Use encoding mechanisms to reduce the data set size. Data Integration and Data Cleaning Techniques: Generally, the data analysis task includes data integration, which combines data from multiple sources into a coherent data store. These sources may include multiple databases or flat files. A number of problems can arise during data integration. Real-world entities in multiple data sources can be given different names. How does an analyst know that employee-id in one database is the same as employee-number in another database?

We plan to use meta-data to solve the problem of data integration.
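
A very small sketch of the metadata idea just mentioned: a mapping table records that differently named attributes in two sources refer to the same real-world field, so records can be reconciled on a common schema. The source names, column names, and record values are invented for illustration.

    # Hypothetical metadata: maps each source's column name to a canonical attribute.
    metadata = {
        "hr_db":      {"employee-id": "employee_key", "fname": "first_name"},
        "payroll_db": {"employee-number": "employee_key", "first": "first_name"},
    }

    def normalize(record, source):
        """Rename a source record's fields to the canonical schema."""
        return {metadata[source].get(col, col): value for col, value in record.items()}

    a = normalize({"employee-id": 1001, "fname": "Ada"}, "hr_db")
    b = normalize({"employee-number": 1001, "first": "Ada"}, "payroll_db")
    print(a == b)   # True: the two records can now be recognized as the same entity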

Data coming from input sources tends to be incomplete, noisy and inconsistent. If such data is directly loaded into the DW, it can cause errors during the analysis phase, resulting in incorrect results. Data cleaning methods attempt to smooth out the noise, identify outliers, and correct inconsistencies in the data. We are investigating the following techniques for noise reduction and data smoothing. One technique is clustering: intuitively, values that fall outside of the set of clusters may be considered outliers.

Another technique is regression: fitting a mathematical equation to the data helps smooth out the noise.
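
A minimal sketch of the regression-based smoothing just mentioned: a straight line is fitted to a noisy series and the fitted values serve as the smoothed series, while points far from the line can be flagged as outliers. The data and the use of NumPy's least-squares helper are illustrative assumptions.

    import numpy as np

    # Noisy, invented measurements indexed by time.
    t = np.arange(10, dtype=float)
    y = 3.0 * t + 5.0 + np.random.normal(0, 2.0, size=t.size)

    # Fit y ~ a*t + b by least squares and use the fitted line as the smoothed series.
    a, b = np.polyfit(t, y, deg=1)
    smoothed = a * t + b

    # Points far from the fitted line can additionally be flagged as outliers.
    residuals = np.abs(y - smoothed)
    print(smoothed.round(2), residuals.max().round(2))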

Data pre-processing is an important step for data analysis. Detecting data integration problems, rectifying them, and reducing the amount of data to be analyzed can result in great benefits during the data analysis phase. However, the problem of propagating changes in a DW environment is more complicated, due to the following reasons: (a) In a DW, data is not refreshed after every modification to the base data.

Rather, large batch updates to the base data must be considered, which requires new algorithms and techniques. (b) The data in the DW is typically derived from the sources through transformations; these transformations may include aggregating or summarizing the data. Therefore, techniques are required that can deal with both source data changes and schema changes. Lineage tracing enables users to "drill through" from the views in the DW all the way to the source data that was used to create the data in the DW.

However, these methods lack techniques to deal with historical source data or data from previous source versions. Data warehouses typically use a multidimensional data model to facilitate data analysis.

They are implemented using a three-tier architecture; the middle tier is an OLAP server and the top tier is a client containing query and reporting tools. Data mining is the task of discovering interesting patterns from large amounts of data, where the data can be stored in multiple repositories. Efficient data warehousing and data mining techniques are challenging to design and implement for large data sets.

We have also described some open research problems that need to be solved in order to efficiently extract data from distributed information repositories. We believe that there are several important research problems that need to be solved to build flexible, powerful and efficient data analysis applications using data warehousing and data mining techniques.


This chapter presents threats against networked applications, such as denial of service attacks and protocol attacks. It also presents a brief discussion of firewalls and intrusion detection systems. Key words: computer virus, worms, DOS attacks, firewall, intrusion detection. Computer security is of importance to a wide variety of practical domains, ranging from the banking industry to multinational corporations, from space exploration to the intelligence community, and so on.

The principal components of a computer that need to be protected are hardware, software and the communication links. This chapter describes different kinds of threats related to computer security and protection mechanisms that have been developed to protect the different components. We first present information about computer viruses and worms, followed by techniques to handle them.

A virus is a program that can "infect" other programs by modifying them and inserting a copy of itself into the program. This copy can then go on to infect other programs. Just like its biological counterpart, a computer virus carries in its instructional code the recipe for making perfect copies of itself.

A virus attaches itself to another program and then executes secretly when the host program is run. During its lifetime, a typical virus goes through the following stages. Dormant Phase: In this state the virus is idle, waiting for some event to happen before it gets activated.

Propagation Phase: In this stage the virus makes an identical copy of itself and attaches itself to another program. This infected program contains the virus and will in turn enter into a propagation phase to transmit the virus to other programs. Triggering Phase: In this phase the virus starts performing the function it was intended for.

The triggering phase can also be caused by a set of events. Execution Phase: In this phase the virus performs its function, such as damaging programs and data files. Viruses can be grouped into the following categories. Parasitic Virus: This is the most common kind of virus. It attaches itself to executable files and replicates when the infected program is executed.

Memory Resident Virus: This kind of virus resides in main memory. Whenever a program is loaded into memory for execution, it attaches itself to that program. Boot Sector Virus: This kind of virus infects the boot sector and it spreads when the system is booted from the disk.

Stealth Virus: This is a special kind of virus that is designed to evade detection by antivirus software. Polymorphic Virus: This kind of virus mutates as it spreads from one program to the next, making it difficult to detect using "signature" methods. Macro Virus: Applications such as Microsoft Word have a feature called a macro that people use to automate repetitive tasks.

The macro is written in a programming language such as Basic. The macro can be set up so that it is invoked when a certain function key is pressed. Certain kinds of macros are auto-executing: they are automatically executed upon some event, such as starting a program or opening a file.

These auto-executing macros are often used to spread the virus. Newer versions of MS Word provide mechanisms to protect against macro viruses. One example is the Macro Virus Protection tool, which can detect suspicious Word files and alert the customer about the potential risk of opening a file with macros.

As a result, such viruses can spread in a few hours, and it becomes very hard for anti-virus software to respond before damage is done. A worm, on the other hand, typically propagates by itself. A worm uses network connections to spread from one machine to another.

Some examples of these connections are the electronic mail facility, the remote execution facility and the remote login facility. A worm typically has phases similar to those of a virus: a dormant phase, a propagation phase, a triggering phase and an execution phase.

The propagation phase for a worm uses the following steps: search the host tables to determine other systems that can be infected; establish a connection with the remote system; copy the worm to the remote system and cause it to execute. Just like viruses, network worms are difficult to detect. However, properly designed system security applications can minimize the threat of worms.

One well-known early worm, the Morris worm, was designed to spread on UNIX systems and used a number of techniques to propagate. At the beginning of its execution, the worm would discover other hosts that are known to the current host. The worm performed this task by examining various lists and tables, such as the machines trusted by this host or users' mail forwarding files. For each discovered host, the worm would try a number of methods to log in to the remote host: Attempt to log on to a remote host as a legitimate user.

Use the finger protocol to report on the whereabouts of a remote user. Exploit a trapdoor in a remote process that sends and receives email. A more recent example is the Code Red worm, which probes random IP addresses to spread to other hosts. Also, during certain periods of time, it issues denial-of-service attacks against certain web sites by flooding them with packets from several hosts.

Code Red I infected nearly 360,000 servers in 14 hours. In late 2001, another worm called Nimda appeared. The worm spread itself using several different mechanisms: from client to client via email, from web server to client via browsing of web sites, and from client to web server via exploitation of Microsoft IIS vulnerabilities. The worm modifies Web documents and certain executable files on the infected system. As viruses have become more sophisticated, antivirus software packages have had to become more complex to detect them.

There are four generations of antivirus software. First Generation: This kind of scanner requires a specific signature to identify a virus; it can only detect known viruses. Second Generation: This kind of scanner does not rely on a specific signature. Rather, the scanner uses heuristic rules to search for probable virus infections. Another second generation approach to virus detection is to use integrity checking.

For example, a checksum can be appended to every program. If a virus infects the program without also updating the appended checksum, then an integrity check will detect the change.
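
The checksum-based integrity check described above can be sketched in a few lines of Python: a hash of each program file is recorded when the file is known to be clean, and any later mismatch indicates that the file was modified. The file path is a placeholder.

    import hashlib

    def checksum(path):
        """Return the SHA-256 digest of a file's contents."""
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # Recorded when the program is known to be clean (path is hypothetical).
    baseline = checksum("/usr/local/bin/someprogram")

    # Later, during a scan: a changed digest means the file was modified,
    # possibly by a virus that infected it.
    if checksum("/usr/local/bin/someprogram") != baseline:
        print("integrity check failed: file has been modified")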

Third Generation: These kinds of programs are memory resident and identify a virus by its actions rather than by its structure. The advantage of this approach is that it is not necessary to generate signatures or heuristics; the method works by identifying a set of actions that indicate malicious work is being performed and then intervening. Fourth Generation: These packages combine several antivirus techniques, including scanning and access control capability, which limits the ability of a virus to penetrate the system and to update files in order to propagate the infection.

Life without networks would be considerably less convenient and many activities would be impossible. In this chapter, we describe the basics of computer networks and how the concepts of confidentiality, integrity and availability can be applied to networks. There are several types of networks and they can be connected in different ways.

This section provides information on different classes of networks. Usually a LAN connects several computers, printers and storage devices. The primary advantage of a LAN to users is that it provides shared access to resources such as programs and devices such as printers. A WAN, in contrast, typically covers a wide geographical area. The hosts on a WAN may belong to a company with many offices in different cities, or they may be a cluster of independent organizations within a few miles of each other who would like to share the cost of networking.

Therefore, a WAN can be controlled by one organization or by multiple organizations. The Internet is a collection of networks that is loosely controlled by the Internet Society. The Internet Society enforces certain minimal rules to make sure that all users are treated fairly. The three different topologies are as follows. In a common bus, the information is broadcast and nodes must continually monitor the bus to get the information addressed to them.

All communication flows from the source node to the traffic controller node and from the traffic controller node to the other nodes. However, there is one drawback with this architecture.

If a node fails to pass along a message that it has received, the other nodes will not be able to receive that information. This section describes some of the popular network threats. Examples of spoofing are masquerading and the man-in-the-middle attack. A common example is URL confusion, in which very similar domain names (abc.com, abc.org, abc.net and so on) may lead users to different, possibly malicious, sites. In phishing scams, an attacker sets up a web site that masquerades as a legitimate site. By tricking a user, the phishing site obtains the user's cleartext password for the legitimate site.

Phishing has proven to be quite effective in stealing user passwords. Suppose two people have entered into a session, but then a third person intercepts the traffic and carries on the session in the name of one of them; this is called session hijacking.

For example, a malicious online merchant could use a wiretap to intercept packets between a user and Amazon.com. When the user has completed the order, the merchant can intercept the "ready to check out" packet and finish the order with the user, obtaining the shipping address, credit card details and other information.

In this case we say the online merchant has hijacked the session. The difference between man-in-the-middle and hijacking is that a man-in-the-middle usually participates from the start of the session, whereas session hijacking occurs after a session has been established. Man-in-the-middle attacks are frequently described in the context of protocols.

For example, suppose two parties want to exchange encrypted information. One party contacts the key server to get a secret key that will be used in the communication. The key server responds by sending the secret key to both parties. A malicious middleman intercepts the key as it is delivered and can then eavesdrop on the communication between the two parties. Since attacks on web sites can have a wide impact, they are often reported in the popular press.

Web sites are designed so that their code can be easily downloaded, enabling an attacker to obtain the full hypertext document. One popular attack against a web site is the buffer overflow. In this kind of attack the attacker feeds a program more data than it expects. The buffer size is exceeded and the excess data spills over into adjoining code and data locations. Message Confidentiality Threats: Misdelivery: Sometimes messages are misdelivered because of some flaw in the network hardware or software.

We need to design mechanisms to prevent this. Exposure: To protect the confidentiality of a message, we must track it all the way from its creation to its disposal. Traffic Flow Analysis: Consider the case during wartime: if the enemy sees a large amount of traffic between the headquarters and a particular unit, the enemy can infer that a significant action is being planned at that unit.

In these situations there is a need to protect the contents of the messages as well as how the messages are flowing in the network. Availability attacks in the network context are called denial-of-service attacks, and they can have a significant impact. The following are some sample denial-of-service attacks. Connection Flooding: This is the most primitive denial-of-service attack. If an attacker sends so much data that the communication system cannot handle it, the victim is prevented from receiving any other data.

Ping of Death: Since ping requires the recipient to respond to the ping request, all that an attacker needs to do is send a flood of pings to the intended victim.

Smurf: This is a variation of a ping attack. It uses the same vehicle, a ping packet, with two extra twists. First, the attacker chooses a network of victims. The attacker spoofs the source address in the ping packet so that it appears to come from the victim. Then, the attacker sends this request to the network in broadcast mode by setting the last byte of the address to all 1s; broadcast-mode packets are distributed to all hosts on the network.

Attackers using this approach do one more thing: they spoof a nonexistent return address in the initial SYN packet. Distributed denial-of-service attacks go one step further. In the first step, the attacker uses any convenient method, such as exploiting a buffer overflow, to plant a Trojan horse on a target machine.

The installation of the Trojan horse as a file or a process does not attract any attention. The attacker repeats this process with many targets. Each of these targets then becomes what is known as a zombie. The target systems carry out their work, unaware of the resident zombie. At some point, the attacker chooses a victim and sends a signal to all the zombies to launch the attack.

Then, instead of the victim trying to defend against one denial-of-service attack from one malicious host, the victim must try to counter n attacks from n zombies all acting at once.

This section gives a summary of a taxonomy of DDoS defense mechanisms proposed in the literature. Classification by Activity Level: Based on the activity level, defense mechanisms can be classified into preventive and reactive mechanisms.

Preventive Mechanisms The goal of these mechanisms is to either eliminate the possibility of DOS attacks or to endure the attack without denying services to legitimate clients.

Attack Prevention Mechanisms These mechanisms modify the system configuration to eliminate the possibility of a DOS attack.

System security mechanisms increase the overall security by guarding against illegitimate access from other machines.

Examples of system security mechanisms include monitored access to the machine, installation of security patches, and firewall systems. Protocol security mechanisms address the problem of bad protocol design, which can be misused to exhaust the resources of a server by initiating a large number of incomplete transactions.

An example of a protocol security mechanism is to have a design in which resources are committed to the client only after sufficient authentication is done. Reactive Mechanisms Reactive mechanisms alleviate the impact of an attack by detecting an attack and responding to it. Reactive mechanisms can be classified, based on the detection strategy they use, into pattern detection, anomaly detection and hybrid detection.

Mechanism with Pattern Attack Detection In this method, signatures of known attacks are stored in a database. Each communication is monitored and compared with the database entries to discover the occurrence of an attack. Occasionally, the database is updated with new attack signatures.
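
As a simplified sketch of pattern (signature) detection, the fragment below compares each observed payload against a database of known attack signatures; the signature strings and event format are invented for illustration, and real systems match far richer patterns.

    # Hypothetical signature database of byte patterns known to appear in attacks.
    signatures = {
        b"GET /default.ida?NNNN": "Code Red probe",
        b"/etc/passwd": "password file access attempt",
    }

    def match_signatures(payload):
        """Return the names of all known attack signatures found in a payload."""
        return [name for pattern, name in signatures.items() if pattern in payload]

    print(match_signatures(b"GET /default.ida?NNNNNNNN HTTP/1.0"))  # ['Code Red probe']
    print(match_signatures(b"GET /index.html HTTP/1.0"))            # [] -> no known attack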

The obvious drawback of this detection mechanism is that it can only detect known attacks. On the other hand the main advantage is that known attacks are reliably detected and no false positives are encountered. Mechanism with Anomaly Attack Detection Mechanisms that deploy anomaly detection have a model of normal system behavior such as traffic or system performance.

The current state of the system is periodically compared with the models to detect anomalies. The advantage of these techniques as compared to pattern detection is that unknown attacks can be discovered.
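
A bare-bones sketch of anomaly detection for traffic: recent request rates define a model of normal behavior, and the current rate is compared against a threshold derived from that model. The numbers and the mean-plus-three-standard-deviations rule are illustrative assumptions; choosing this threshold well is exactly the difficulty discussed next.

    import statistics

    # Model of normal behavior: requests per second observed during normal operation.
    normal_rates = [120, 135, 128, 140, 131, 125, 138]
    mean = statistics.mean(normal_rates)
    std = statistics.pstdev(normal_rates)
    threshold = mean + 3 * std     # anything far above normal is treated as anomalous

    def is_anomalous(current_rate):
        return current_rate > threshold

    print(is_anomalous(150))    # False: within normal variation
    print(is_anomalous(4000))   # True: possible flooding attack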

However, they have to solve the following problems. Threshold Setting: Anomalies are detected by comparing observations against threshold settings. The setting of a low threshold leads to many false positives, while a high threshold reduces the sensitivity of the detection mechanism. Model Update: Systems and communication patterns evolve with time, and models need to be updated to reflect this change. Mechanisms with Hybrid Attack Detection These techniques combine pattern-based and anomaly-based detection, using data about attacks discovered through an anomaly detection mechanism to devise new attack signatures and update the database.

Many intrusion detection systems use this technique, but they have to be carefully designed. Reactive mechanisms can also be classified, based on the response strategy, into agent identification, filtering and reconfiguration approaches. Agent Identification Mechanisms These mechanisms provide the victim with information about the identity of the machines that are responsible for performing the attacks. This information can be combined with other response approaches to reduce the impact of attacks.

Filtering Mechanism These techniques use the information provided by a detection mechanism to filter out the attack stream completely. A dynamically deployed firewall is an example of such a system. Reconfiguration System These mechanisms change the connectivity of the victim or the intermediate network topology to isolate the attack machines.

One example of such a system is a reconfigurable overlay network. Encryption is a fundamental tool for network security: it can provide privacy, authenticity, integrity and limited access to data. Encryption can be applied either between two hosts (link encryption) or between two applications (end-to-end encryption). Link Encryption: Link encryption protects the message in transit between two computers; however, the message is in cleartext inside the hosts. In this method, the data is encrypted just before it is placed on the physical communication link.

The encryption occurs at the lowest layers (layer 1 or 2) of the OSI model. Similarly, decryption occurs when the data arrives at the receiving computer. This mechanism is most useful when the transmission line is the point of greatest vulnerability. End-to-end Encryption: This mechanism provides security from one end of the transmission to the other. In this case, encryption is performed at the highest layers (layer 7 or layer 6).

Virtual Private Networks Link encryption can be used to give the same protection to a user as if they were on a private network, even when their communication links are part of a public network.

When a user first requests communication with a firewall, the user can request a VPN session with the firewall. The user and the firewall can agree on a session encryption key and the user can use that key for all subsequent communication.

With a VPN all communication passes through an encrypted tunnel. PKI and Certificates A public key infrastructure PKI is a process created to enable users to implement public key cryptography usually in a distributed environment.

PKI usually offers the following services: creating certificates that associate a user's identity with a cryptographic key; distributing certificates from its database; signing certificates to provide authenticity; and confirming whether a certificate is valid. PKI is really a set of policies, products and procedures, with some flexibility for interpretation. The policies define a set of rules under which the system operates; they define procedures on how to handle keys and how to manage risks.

SSH protects against spoofing attacks and modification of data during communication. SSL (the Secure Sockets Layer), also known as Transport Layer Security (TLS), interfaces between applications (e.g., a web browser) and the TCP/IP protocols. IPSec: The address space for the Internet is running out as more machines and domain names are added to the Internet. A new structure called IPv6 solves this problem by providing a 128-bit address space for IP addresses.

IPSec is implemented at the IP layer so it affects all layers above it. IPSec is somewhat similar to SSL, in that it supports authentication and confidentiality that does not necessitate significant changes either above it in applications or below it in the TCP protocols.

Just like SSL, it was designed to be independent of the cryptographic protocols and to allow the two communicating parties to agree on a mutually supported set of protocols. The basis of IPSec is the security association, which is basically a set of security parameters required to establish secure communication.

Generally, a firewall runs on a dedicated machine, which is a single point through which all traffic is channeled. The purpose of a firewall is to keep "malicious" things outside a protected environment. For example, a firewall may impose a policy that permits traffic coming from only certain IP addresses or users. Packet Filtering Firewall: It is the simplest form of firewall and in some situations it is the most effective.

It filters based on packet addresses (source or destination) or the transport protocol (for example, HTTP Web traffic).
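
The following sketch shows the flavor of packet-filtering rules: each rule matches on source address and destination port, and the first matching rule decides whether the packet is allowed. The addresses, ports, and rule format are hypothetical.

    # Hypothetical rule set: (source_prefix, destination_port, action), checked in order.
    rules = [
        ("10.0.0.",   None, "allow"),   # traffic from the internal network
        (None,        80,   "allow"),   # Web traffic to any server
        (None,        None, "deny"),    # default: drop everything else
    ]

    def filter_packet(src_ip, dst_port):
        for src_prefix, port, action in rules:
            if src_prefix is not None and not src_ip.startswith(src_prefix):
                continue
            if port is not None and dst_port != port:
                continue
            return action

    print(filter_packet("10.0.0.7", 22))      # allow (internal host)
    print(filter_packet("203.0.113.5", 80))   # allow (Web traffic)
    print(filter_packet("203.0.113.5", 23))   # deny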

Stateful Inspection Firewall: Filtering firewalls work on one packet at a time; they have no concept of "state" or "context" carried from one packet to the next. A stateful inspection firewall is more sophisticated and maintains state information to provide better filtering. Personal Firewall: A personal firewall is an application program that runs on a workstation or a PC to block unwanted traffic to a single workstation. A personal firewall can be configured to enforce a certain policy.

For example, a user may decide that certain sites (for example, computers on the company network) are trustworthy and that the firewall should allow traffic from only those sites. It is useful to combine a virus scanner with a personal firewall. For example, a firewall can direct all incoming email to a virus scanner, which examines every attachment the moment it reaches a particular host. Application Proxy Gateway An application proxy gateway is also called a bastion host.

It is a firewall that simulates the proper effects of an application so that the application will receive only requests to act properly. The proxies on a firewall can be tailored to specific requirements, such as logging details about the access. A proxy can demand strong authentication such as name, password and challenge-response. Guard A guard is another form of sophisticated firewall. It receives protocol data units, interprets them and passes through the same or different protocol data units that achieve either the same result or a modified result.

The guard decides what services to perform on user's behalf in accordance with its available knowledge. The following example illustrates the use of a guard. A university wants all students to restrict the size of email messages to a certain number of words or characters.

Although this rule can be implemented by modifying email handlers, it is more easily done by monitoring the common point through which all email flows. A firewall can only protect the perimeter of its environment against attacks from outsiders. The following are some important points about firewall-based protection. Firewalls can only protect if they control the entire perimeter. Even if one inside host connects to an outside address by a modem, the entire inside net can be vulnerable through the modem and its host.

Firewalls are the most visible parts of a network and therefore they are the most attractive target for attacks. Firewalls exercise only minor control over the content of the packets that are admitted inside the network. Therefore, inaccurate data or malicious code must be controlled by other means inside the perimeter. However, prevention is not a complete security solution.

Intrusion Detection systems complement these preventive controls by acting as the next line of defense. An IDS is a sensor, like a smoke detector, that raises an alarm if specific things occur.

Intrusion Detection is the process of identifying and responding to malicious activities targeted at computing and network resources. It involves technology, people and tools. An Intrusion Detection System basically monitors and collects data from a target system that should be protected, processes and correlates the gathered information, and initiates responses when an intrusion is detected. These resources and their components have vulnerabilities, and people exploit these vulnerabilities to stage attacks against them.

In this chapter we have discussed some of the salient features of security in networks and distributed applications. Since the world is becoming connected by computers, the significance of network security will continue to grow. When a network and its components are designed and architected well, the resulting system is quite resilient to attacks. A lot of work is being done to enhance computer security.

Products from vendor companies will lead to more secure boxes. There is a lot of research interest in the areas of authentication, access control and authorization. Another challenge for security is that networks are pervasive: cell phones, personal digital assistants and other consumer appliances are being connected. New applications lead to new protocol development. There is a need to make sure that these protocols are tested for security flaws and that security measures are incorporated as needed.

Intrusion Detection Systems and Firewalls have become popular products to secure networks. In the future, security of mobile code and web services will become an important issue as remote updates and patches become popular. Pfleeger C. Abrams, S. Jajodia, and H. Podell, eds. Menezes, P. Kaufman, R. Perlman, and M. Prentice Hall, 7. Cheswick, S. Bellovin, A. This chapter first provides a taxonomy of intrusion detection systems.

Second, architecture of IDS and their basic characteristics are presented. Third, a brief survey of different IDS products is discussed.

Finally, significant gaps and directions for future work are discussed. Key words: intrusion, signatures, anomaly, data mining. Intrusion Detection is the process of identifying and responding to malicious activity targeted at computing and networking resources. An IDS is a device, typically another computer, that monitors activities to identify malicious or suspicious events. It receives raw input from sensors, analyzes those inputs and then takes some action.

Since the cost of information processing and Internet accessibility is dropping, more and more organizations are becoming vulnerable to a wide variety of cyber threats. According to a recent survey by CERT, the rate of cyber attacks has been doubling every year in recent times. Therefore, it has become increasingly important to make our information systems, especially those used for critical functions such as military and commercial purposes, resistant to and tolerant of such attacks.

Intrusion Detection Systems (IDS) are an integral part of any security package of a modern networked information system. An IDS detects intrusions by monitoring a network or system and analyzing an audit stream collected from the network or system to look for clues of malicious behavior. An IDS can be characterized by the following aspects. Information Sources: The different sources of data that are used to determine the occurrence of an intrusion. The common sources are network, host and application monitoring.

Analysis: This part deals with the techniques that the system uses to detect an intrusion. The most common approaches are misuse detection and anomaly detection. Response: This refers to the set of actions that the system takes after it has detected an intrusion. The set of actions can be grouped into active and passive actions. An active action involves an automated intervention, whereas a passive action involves reporting IDS alerts to humans.

The humans are in turn expected to take action. Other IDSs analyze information generated by operating system or application software for signs of intrusion. This involves placing a set of traffic sensors within the network. The sensors typically perform local analysis and detection and report suspicious events to a central location.

The majority of commercial intrusion detection systems are network based. A disadvantage of a network-based IDS is that it cannot analyze encrypted information. Also, most network-based IDSs cannot tell whether an attack was successful; they can only detect that an attack was started. Since host-based systems directly monitor the host data files and operating system processes, they can determine exactly which host resources are the targets of a particular attack.

Due to the rapid development of computer networks, traditional single host intrusion detection systems have been modified to monitor a number of hosts on a network. They transfer the monitored information from multiple monitored hosts to a central site for processing. These are termed as distributed intrusion detection systems. One advantage of a host based IDS is that it can "observe" the outcome of an attempted attack, as it can directly access and monitor the data files and system processes that are usually targeted by attacks.

A disadvantage of a host-based IDS is that it is harder to manage and it is more vulnerable to attacks. An application-based IDS uses application log files to observe events. One advantage of application-based IDSs is that they can directly monitor the interaction between a user and an application, which allows them to trace individual users.

Misuse detection is used by most commercial IDS systems, and the analysis targets something that is known to be bad. Anomaly detection is one in which the analysis looks for abnormal forms of activity. It is the subject of a great deal of research and is used by a limited number of IDSs.

Misuse Detection This method finds intrusions by monitoring network traffic in search of direct matches to known patterns of attack called signatures or rules. A common form of misuse detection that is used in commercial products specifies each pattern of events that corresponds to an attack as a separate signature. However, there are more sophisticated approaches called state based analysis that can leverage a single signature to detect a group of attacks.

A disadvantage of this approach is that it can only detect intrusions that match a pre-defined rule. The set of signatures needs to be constantly updated to keep up with new attacks.

One advantage of these systems is that they have low false alarm rates. Anomaly Detection In this approach, the system defines the expected behavior of the network in advance. The profile of normal behavior is built using techniques that include statistical methods, association rules and neural networks. Any significant deviations from this expected behavior are reported as possible attacks. Some examples of these attributes include number of files accessed by a user in a given period, the number of failed attempts to login to the system, the amount of CPU utilized by a process.

Statistical Measures: In this case the distribution of profiled attributes is assumed to fit a pattern. Other Techniques: These include data mining, neural networks, genetic algorithms and immune system models.
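
A small sketch of the statistical-measure approach just mentioned: a per-user profile of one attribute (here, failed logins per day) is built from historical data, and a new observation that deviates strongly from the user's own profile is reported. The data and the z-score style cut-off are assumptions made for illustration.

    import statistics

    # Hypothetical history: failed login attempts per day for each user.
    history = {
        "alice": [0, 1, 0, 0, 2, 1, 0],
        "bob":   [3, 2, 4, 3, 2, 3, 4],
    }

    def deviation_score(user, observed):
        """How many standard deviations the observation is from the user's own profile."""
        profile = history[user]
        mean = statistics.mean(profile)
        std = statistics.pstdev(profile) or 1.0    # avoid division by zero
        return (observed - mean) / std

    # A large score suggests behavior unlike this user's normal pattern.
    print(round(deviation_score("alice", 9), 1))   # well above alice's normal range
    print(round(deviation_score("bob", 4), 1))     # within bob's normal range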

In principle, the primary advantage of anomaly based detection is the ability to detect novel attacks for which signatures have not been defined yet. However, in practice, this is difficult to achieve because it is hard to obtain accurate and comprehensive profiles of normal behavior.

This makes an anomaly detection system generate too many false alarms, and it can be very time consuming and labor intensive to sift through this data. Commercial IDSs support a wide range of response options, categorized as active responses, passive responses or a mixture of the two. Active Responses Active responses are automated actions taken when certain types of intrusions are detected.

There are three categories of active responses. Collect Additional Information The most common response to an attack is to collect additional information about a suspected attack. This might involve increasing the level of sensitivity of information sources, for example turning up the number of events logged by an operating system audit trail or increasing the sensitivity of a network monitor to capture all packets. The additional information collected can help in resolving and diagnosing whether an attack is taking place or not.

Change the Environment Another kind of active response is to halt an attack in progress and block subsequent access by the attacker. Typically, an IDS accomplishes this by blocking the IP address from which the attacker appears to be coming. Take Action Against the Intruder Some folks in the information warfare area believe that the first action in active response area is to take action against the intruder.

The most aggressive form of this response is to launch an attack against the attacker's host or site. Passive Responses Passive IDS responses provide information to system users, and they assume that human users will take subsequent action based on that information. Alarms and notifications are generated by an IDS to inform users when an attack is detected.

The most common form of an alarm is an on-screen alert or a popup window. Alerts can also be sent to network management consoles, where they are displayed to the user. The main components of a generic intrusion detection architecture are briefly described below. Target System: The system that is being analyzed for intrusion detection is considered the target system. Some examples of target systems are corporate intranets and servers.

Feed: A feed is an abstract notion of information from the target system to the intrusion detection system. Some examples of a feed are system log files on a host machine or network traffic and connections.
