System Log File Reduction and Detection of Malicious Behavior

Ralph Ritchey

April 27, 2021

Abstract

Cybersecurity relies heavily on data for the detection of malicious behavior. Historically, intrusion detection systems (IDS) used sensors placed strategically within a network to monitor and capture network traffic. Various tools then processed the traffic in real time or in batch mode, generating alerts for security analysts to review. This methodology worked effectively until encryption of network traffic became prevalent. Encrypted traffic prevents signature-based IDS tools from inspecting packet payloads for signatures that indicate malicious activity or intent. As an alternative data source, system logs and web server logs capture the system-level indicators that cybersecurity tools and analysts need to detect possible malicious behavior. The research presented here examines the viability of reducing log file size, while retaining and flagging log entries that contain malicious activity, using truncated singular value decomposition (TSVD) and k-means clustering.

MS Thesis
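
The abstract above names truncated SVD and k-means clustering as the reduction technique. The following is a minimal illustrative sketch of that general idea, not the thesis implementation: the bag-of-words features, the farthest-point initialization, the parameter values, and all function names (`vectorize`, `truncated_svd`, `kmeans`, `reduce_log`) are assumptions made for this example, as are the sample log lines.

```python
import re
import numpy as np

def vectorize(lines):
    """Bag-of-words count matrix (assumed feature choice): rows are log lines."""
    tokens = [re.findall(r"[a-z]+", line.lower()) for line in lines]
    vocab = sorted({t for toks in tokens for t in toks})
    index = {t: j for j, t in enumerate(vocab)}
    X = np.zeros((len(lines), len(vocab)))
    for i, toks in enumerate(tokens):
        for t in toks:
            X[i, index[t]] += 1.0
    return X

def truncated_svd(X, k):
    """Keep only the top-k singular directions: Z = U_k * S_k."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def assign(Z, centers):
    """Index of the nearest center for every row of Z."""
    dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(Z, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization
    (a simplification chosen here so the example is reproducible)."""
    centers = [Z[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = assign(Z, centers)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(Z, centers), centers

def reduce_log(lines, n_components=2, n_clusters=2):
    """Return indices of one representative line per cluster, plus all labels."""
    Z = truncated_svd(vectorize(lines), n_components)
    labels, centers = kmeans(Z, n_clusters)
    keep = []
    for j in range(n_clusters):
        members = np.flatnonzero(labels == j)
        if members.size:
            dist = np.linalg.norm(Z[members] - centers[j], axis=1)
            keep.append(int(members[dist.argmin()]))
    return sorted(keep), labels

# Hypothetical sample log lines, invented for illustration only.
sample = [
    "Jan 1 10:00:01 sshd[101]: session opened for user alice",
    "Jan 1 10:00:05 sshd[102]: session opened for user alice",
    "Jan 1 10:00:09 sshd[103]: session opened for user alice",
    "Jan 1 10:00:12 sshd[104]: session opened for user alice",
    "Jan 1 10:00:13 sshd[105]: failed password for root from host",
    "Jan 1 10:00:14 sshd[106]: failed password for root from host",
]
keep, labels = reduce_log(sample)
for i in keep:
    print(labels[i], sample[i])
```

In this sketch the reduced log is the set of representative lines, one per cluster. Treating small clusters as candidates for analyst review is one possible way to flag unusual entries, but that rule is an assumption here and is not taken from the abstract.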


Machine Learning Toolkit for System Log File Reduction and Detection of Malicious Behavior

Ralph Ritchey and Richard Perry

May 11, 2021

Abstract

The increasing use of encryption prevents traditional network-based intrusion detection systems (IDS) from performing deep packet inspection, so an alternative data source for detecting malicious activity is necessary. Log files on servers and desktop systems provide such a source, recording activity that occurs on the device and over the network. These log files can be sizeable, which makes transport, storage, and analysis difficult. Malicious behavior may also appear in logs as normal events, without triggering an error or another obvious indicator, which makes automated detection challenging. The research described here uses a Python-based toolkit approach with unsupervised machine learning to reduce log file sizes and detect malicious behavior.

2021 IEEE INFOCOM: paper, slides, video


System Log File Reduction and Detection of Malicious Behavior

Ralph Ritchey

July 21, 2020

Abstract

Cybersecurity relies heavily on data for the detection of malicious behavior. Historically, intrusion detection systems (IDS) used sensors placed strategically within a network to monitor and capture network traffic. Various tools then processed the traffic in real time or in batch mode, generating alerts for security analysts to review. This methodology worked effectively until encryption of network traffic became prevalent. Encrypted traffic prevents signature-based IDS tools from inspecting packet payloads for signatures that indicate malicious activity or intent. As an alternative data source, system logs and web server logs capture the system-level indicators that cybersecurity tools and analysts need to detect possible malicious behavior. Research into reducing log file size, while retaining and flagging log entries that contain malicious activity, is necessary to avoid overwhelming cybersecurity systems, tools, and analysts with too much data.

Independent Study Report

Fig. 2.1 - PCA Identification of Outliers (Anomalies)