Data Format – A Power Grid Dataset for IDS Research

The Sherlock datasets include a range of information. Most users will primarily use the train and test sets from the main vantage point, i.e., the switch of the control center. These two files are available in the downloaded directories as the compressed files train.n302.state.gz and test.n302.state.gz. The IPAL IDS framework can process these files directly. Alternatively, the files can be decompressed to reveal a JSON file containing a time series of each data value.
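For users who want to inspect the state files outside of IPAL, the following is a minimal sketch of reading one without decompressing it to disk first. It assumes that the file holds one JSON object per line; that layout is an assumption, not something stated here, and the helper name read_state_file is purely illustrative. If a file turns out to be a single JSON document instead, calling json.load on the opened handle would be the better fit.

```python
import gzip
import json
from typing import Iterator


def read_state_file(path: str) -> Iterator[dict]:
    """Yield one decoded JSON object per non-empty line of a gzip file.

    Assumption: the state file stores one JSON object per line.
    """
    # "rt" opens the gzip stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Iterating over read_state_file("train.n302.state.gz") then gives one entry at a time, which keeps memory use low for long simulation runs.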

Moreover, the Sherlock datasets include the raw data of the simulation runs in the raw-data folder and all the information necessary to extract IPAL state files in the ipal folder.

Each raw-data folder contains:

pcap: A folder of pcaps captured at a mirror port of the respective switches
log: Log files of the attackers and RTUs
docs: Visualizations of the power grid and ICT networks
control-center: A jsonl file of control center notifications
data-point-map.json: A mapping between IEC 104 IOAs and human-readable data point identifiers
events.jsonl: A list of events during the simulation run
physical.zip: Ground truth information about the physical grid state
sherlock-config.yml: The configuration for the Wattson simulator
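The internal layout of physical.zip is not described here, so a cautious first step is to list its members before extracting anything. The sketch below uses only the Python standard library; the helper name list_archive_members is illustrative.

```python
import zipfile
from typing import List


def list_archive_members(path: str) -> List[str]:
    """Return the member names of a zip archive without extracting it."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling list_archive_members("physical.zip") from inside a raw-data folder shows which ground truth files are available and how they are named.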

Each ipal folder contains:

attack.json: List of all events (attacks and benign events) in the test dataset in the format understood by IPAL
events.json: List of all benign events in the train dataset in the format understood by IPAL
initial_state.json: The initial values of all process values
rules.py: An IPAL rules file to convert between IEC 104 IOAs and human-readable data point identifiers (derived from data-point-map.json) and to replace all NaN readings with 0
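To make the two transformations attributed to rules.py concrete, here is a stand-alone sketch of the idea. It is not the IPAL rules interface and not the contents of rules.py: the function clean_readings, its arguments, and the example identifiers are all hypothetical, and the direction of the renaming (IOA to human-readable identifier) is an assumption.

```python
import math
from typing import Dict


def clean_readings(readings: Dict[str, float], names: Dict[str, str]) -> Dict[str, float]:
    """Rename keys via a lookup table and replace NaN values with 0.

    Hypothetical helper: `readings` maps IOAs to values and `names`
    maps IOAs to human-readable identifiers.
    """
    cleaned = {}
    for ioa, value in readings.items():
        label = names.get(ioa, ioa)  # keep the IOA if no name is known
        is_nan = isinstance(value, float) and math.isnan(value)
        cleaned[label] = 0.0 if is_nan else value
    return cleaned
```

For example, a reading of NaN under the made-up IOA "1001" with the made-up name "bus1.voltage" would end up as 0.0 under "bus1.voltage".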

To generate the test and train files from the pcaps, the following command was used:
ipal-transcriber --pcap [pcap file] --protocols iec104 --rules ipal/rules.py --malicious [events.json/attack.json] --malicious.default false --state.output - timeslice --timeslice.initial_state initial_state.json > out.state
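When regenerating several state files, it can help to assemble this command programmatically. The sketch below builds the same argument list as the command above and redirects standard output to a file; it uses only flags that appear in that command. The function names and the default paths are illustrative, and ipal-transcriber is assumed to be installed and on the PATH.

```python
import subprocess
from typing import List


def build_transcriber_command(
    pcap: str,
    events: str,
    rules: str = "ipal/rules.py",
    initial_state: str = "initial_state.json",
) -> List[str]:
    """Assemble the ipal-transcriber call shown above as an argument list.

    `events` is events.json for the train set or attack.json for the test set.
    """
    return [
        "ipal-transcriber",
        "--pcap", pcap,
        "--protocols", "iec104",
        "--rules", rules,
        "--malicious", events,
        "--malicious.default", "false",
        "--state.output", "-",  # "-" writes the state output to stdout
        "timeslice",
        "--timeslice.initial_state", initial_state,
    ]


def run_transcriber(command: List[str], output_path: str) -> None:
    """Run the command and store its standard output in output_path."""
    with open(output_path, "wb") as out:
        subprocess.run(command, stdout=out, check=True)
```

Passing a list to subprocess.run avoids shell quoting issues with file names, and check=True raises an error if the transcriber exits with a non-zero status.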