Data Mining for Intrusion Detection

Data mining techniques have been successfully applied in many different fields, including marketing, manufacturing, process control, fraud detection, and network management. Over the past five years, a growing number of research projects have applied data mining to various problems in intrusion detection. This chapter surveys a representative cross section of these research efforts. It also identifies and critically discusses four characteristics of contemporary research, draws conclusions, and suggests directions for future research.
Author information
Authors and Affiliations
- IBM Research, Zurich Research Laboratory, Zurich, Switzerland Klaus Julisch