Data Format

The Sherlock datasets include a range of information. Most users will primarily use the train and test sets from the main vantage point, i.e. the switch of the control center. These two files are available in the downloaded directories as the compressed files train.n302.state.gz and test.n302.state.gz. The IPAL IDS framework can process these files directly. Alternatively, the files can be decompressed to reveal a JSON file containing a time series of each data value.
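
For users who want to inspect the state files outside of IPAL, the following is a minimal sketch of reading them in Python without decompressing to disk first. It assumes the file holds one JSON object per line, which is how IPAL state files are commonly laid out; if a file turns out to be a single JSON document, json.load on the opened handle would be the appropriate call instead. The function name read_state_file is illustrative and not part of the dataset or of IPAL.

```python
import gzip
import json


def read_state_file(path):
    """Yield one parsed record per non-empty line of a gzip-compressed state file.

    Assumption: the file is line-delimited JSON (one object per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Example usage (the path must point at a downloaded dataset directory):
# for record in read_state_file("train.n302.state.gz"):
#     print(record)
#     break
```

Reading lazily through a generator keeps memory use low, since the full time series does not have to be held in memory at once.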

Moreover, the Sherlock datasets include the raw data of the simulation runs in the raw-data folder and all the information necessary to extract IPAL state files in the ipal folder.

Each raw-data folder contains:

  • pcap: A folder of pcaps captured at a mirror port of the respective switches
  • log: Log files of the attackers and RTUs
  • docs: Visualizations of the power grid and ICT networks
  • control-center: A jsonl file of control center notifications
  • data-point-map.json: A mapping between IEC 104 IOAs and human-readable data point identifiers
  • events.jsonl: A list of events during the simulation run
  • physical.zip: Ground truth information about the physical grid state
  • sherlock-config.yml: The configuration for the Wattson simulator


Each ipal folder contains:

  • attack.json: List of all events (attacks and benign events) in the test dataset in the format understood by IPAL
  • events.json: List of all benign events in the train dataset in the format understood by IPAL
  • initial_state.json: The initial values of all process values
  • rules.py: An IPAL rules file that converts between IEC 104 IOAs and human-readable data point identifiers (derived from data-point-map.json) and replaces all NaN readings with 0
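
The two transformations attributed to rules.py above (renaming IOAs to human-readable identifiers and replacing NaN readings with 0) can be illustrated with a small standalone sketch. This is not the actual rules.py and does not use the IPAL rules API; the function name normalize_readings, the mapping layout, and the example identifiers are all hypothetical, and the real structure of data-point-map.json may differ.

```python
import math


def normalize_readings(readings, ioa_to_name):
    """Rename IOA keys to readable identifiers and replace NaN values with 0.

    readings:    dict mapping an IOA (int or str) to a reading
    ioa_to_name: dict mapping IOA strings to identifiers (hypothetical layout)
    """
    normalized = {}
    for ioa, value in readings.items():
        # Fall back to the IOA itself when no identifier is known.
        name = ioa_to_name.get(str(ioa), str(ioa))
        if isinstance(value, float) and math.isnan(value):
            value = 0
        normalized[name] = value
    return normalized


# Hypothetical mapping and readings, for illustration only.
example_map = {"1001": "bus1.voltage", "1002": "line3.current"}
example = normalize_readings({1001: float("nan"), 1002: 4.2}, example_map)
```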

To generate the train and test files from the pcaps, the following command was used (with events.json for the train dataset and attack.json for the test dataset):
ipal-transcriber --pcap [pcap file] --protocols iec104 --rules ipal/rules.py --malicious [events.json/attack.json] --malicious.default false --state.output - timeslice --timeslice.initial_state initial_state.json > out.state
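
When regenerating state files for several pcaps, it can help to assemble this command programmatically. The sketch below only builds the argument list from the flags shown in the command above; it introduces no additional options. The function name transcriber_command and the example file name capture.pcap are hypothetical, and the default paths simply mirror the command as written.

```python
import shlex


def transcriber_command(pcap, events_file,
                        rules="ipal/rules.py",
                        initial_state="initial_state.json"):
    """Return the ipal-transcriber invocation as an argument list.

    events_file is events.json for the train set or attack.json for the test set.
    """
    return [
        "ipal-transcriber",
        "--pcap", pcap,
        "--protocols", "iec104",
        "--rules", rules,
        "--malicious", events_file,
        "--malicious.default", "false",
        "--state.output", "-",
        "timeslice",
        "--timeslice.initial_state", initial_state,
    ]


# Hypothetical pcap name, for illustration only.
command = transcriber_command("capture.pcap", "attack.json")
printable = shlex.join(command)

# To actually run it and capture the state output (requires ipal-transcriber):
# import subprocess
# with open("out.state", "w") as out:
#     subprocess.run(command, stdout=out, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.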