Pattern Matching

Overview

Pattern matching is used throughout IT for many purposes, including identifying data in transit or at rest. Matching a pattern often leads to classifying the data, and that classification can be wrong. A false positive occurs when data is assigned to a class because it matches a particular pattern, but it actually belongs to another class. In this lab, we will use the kddcup.data.corrected dataset, which contains traffic flows captured by tcpdump. We will use the Python programming language to detect patterns in this dataset, illustrate how a false positive can occur, and demonstrate how a Python program can be used to detect the error. Lastly, we will correct the error by altering the pattern within our program and verify that no false positives remain.
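
As a rough sketch of the idea, consider a few simplified, made-up records (they are illustrative only and are not copied from kddcup.data.corrected). A loose substring pattern flags a record that belongs to another class, and a stricter comparison on a single field removes the false positive. The names records, loose_matches, strict_matches, and false_positives are hypothetical, as is the three-field layout.

```python
# Hypothetical, simplified records: protocol, service, label.
# Illustrative only; the real dataset has many more fields per record.
records = [
    "tcp,http,normal.",
    "tcp,http_443,normal.",
    "udp,domain_u,normal.",
]

# Loose pattern: any record that contains the text "http".
loose_matches = [r for r in records if r.find("http") != -1]

# Stricter pattern: the service field must equal "http" exactly.
strict_matches = [r for r in records if r.split(",")[1] == "http"]

# Records caught by the loose pattern but rejected by the strict one
# are false positives for the class "service is http".
false_positives = [r for r in loose_matches if r not in strict_matches]

print(false_positives)
```

Here the loose pattern also matches the http_443 record, which is the kind of misclassification this lab sets out to detect and correct.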

Outcomes

In this lab, you will learn to: 

  • Understand how Python can be used to find patterns. 
  • Understand how Python can be used to detect false positives. 
  • Understand how to correct false positives by tweaking the code used for pattern matching. 

Key terms and descriptions

dataset
A dataset is a collection of related data. In this lab, the dataset is the kddcup.data.corrected file of captured traffic flows.
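pattern matching
Pattern matching is the process of checking data for the presence of a given sequence of characters. Python strings have a find method that returns the position of the first occurrence of a substring, or -1 if the substring is not present.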
for..in loop
A for..in loop processes each item in a list, or each line in a file, one at a time. It gives direct access to the current item without the need to manage an index.
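
The two Python terms above can be sketched together. This is a minimal illustration, not the lab's own program: the sample records are made up, and io.StringIO stands in for an open file so the code runs without the dataset. The names sample_file, pattern, and matches are hypothetical.

```python
import io

# Made-up records; io.StringIO behaves like an open text file.
sample_file = io.StringIO(
    "tcp,http,normal.\n"
    "icmp,ecr_i,smurf.\n"
    "tcp,smtp,normal.\n"
)

pattern = "normal."
matches = []

# The for..in loop hands over one line of the file per iteration.
for line in sample_file:
    # str.find returns the index of the first match, or -1 if there is none.
    position = line.find(pattern)
    if position != -1:
        matches.append((line.strip(), position))

print(matches)
```

Compare the result of find with -1 rather than testing it as true or false, because a match at the very start of a line returns 0. With the real dataset, a file object returned by open() can be iterated in the same way.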