Skip to main content

Map

Goal

Use map to create a mapping file describing how production data is anonymized, in key=value pairs of the form production_data=anonymized_data.

  • Use map with output to produce a mapping file containing key=value pairs resulting from an anonymization rule
  • Use map with input to update a column according to the key=value pairs in the provided mapping file
  • Use map with input-output to first update a column according to the mapping file, and then update the mapping file in cases there are missing key=value pairs
Warning
  • Mapping files contain production data. Protect it as such.
  • Run a map Work Task only once on a column of a data set
    • Running a map rule twice on the same column in the same data set may result in data integrity problems
      • the first map will work on production data, the second map will work on anonymized data
      • After the second run, in the cases of output and input-output, the mapping file created/updated will no longer properly map production values to anonymized values

Use cases for map

Use map when table values is to be anonymized equally across

  • Different databases or schemas
  • Different versions of the same database, e.g., equally over time
  • Tables in the same database, but that are not connected properly
  • Different .csv files

Encryption

You may encrypt the mapping file using the encrypted keyword, and provide your own encryption key when configuring the application.

Ano Structure and Examples

        ...
format <format-string>
<input-source>
map <map-file.txt> output encrypted?
...
format <format-string>
<input-source>
map <map-file.txt> input encrypted?
...
format <format-string>
<input-source>
map <map-file.txt> input-output encrypted?

structure

  • The type of <input-source> does not matter. It may be a file, random-[type], randomize-[type], shuffle, etc.
  • The result from the <format-string> is what is stored into the mapping file as anonymized_data

Mapping File Structure

The first two rows contain metadata about the mapping file. Further lines are anonymizations with one value per line, of the form production_data=anonymized_data.

#*** identity_number ***
#Fri Oct 15 11:26:58 CEST 2021
13071311390=241360601360
17041711981=948029768029
13081310255=816447276447
29012912169=24136061360
30103011145=905948545948
26062610647=90594895948