Map
Goal
Use map
to create a mapping file describing how production data is anonymized, in key=value
pairs of the form production_data=anonymized_data
.
- Use
map
withoutput
to produce a mapping file containingkey=value
pairs resulting from an anonymization rule - Use
map
withinput
to update a column according to thekey=value
pairs in the provided mapping file - Use
map
withinput-output
to first update a column according to the mapping file, and then update the mapping file in cases there are missingkey=value
pairs
Warning
- Mapping files contain production data. Protect it as such.
- Run a
map
Work Task only once on a column of a data set- Running a
map
rule twice on the same column in the same data set may result in data integrity problems- the first
map
will work on production data, the secondmap
will work on anonymized data - After the second run, in the cases of
output
andinput-output
, the mapping file created/updated will no longer properly map production values to anonymized values
- the first
- Running a
Use cases for map
Use map
when table values is to be anonymized equally across
- Different databases or schemas
- Different versions of the same database, e.g., equally over time
- Tables in the same database, but that are not connected properly
- Different .csv files
Encryption
You may encrypt the mapping file using the encrypted
keyword, and provide your own encryption key when configuring the application.
Ano Structure and Examples
- Structure
- Output Example
- Input Example
- Input-Output Example
...
format <format-string>
<input-source>
map <map-file.txt> output encrypted?
...
format <format-string>
<input-source>
map <map-file.txt> input encrypted?
...
format <format-string>
<input-source>
map <map-file.txt> input-output encrypted?
structure
- The type of
<input-source>
does not matter. It may be afile
,random-[type]
,randomize-[type]
,shuffle
, etc. - The result from the
<format-string>
is what is stored into the mapping file asanonymized_data
update ADDRESS
mask HOMEADDRESS
format %s
file src/main/resources/address.txt random-order
map address_map.txt output
output
- Creates anonymized values using values from
src/main/resources/address.txt
- Save the anonymizations as mappings of the form
production_data=anonymized_data
intoaddress_map.txt
...
update ADDRESS
mask HOMEADDRESS
format %s
file src/main/resources/address.txt random-order
map address_map.txt input
...
input
- Anonymize values using mappings in
address_map.txt
- If mappings does not exist, then create new anonymizations using values from the input-source
src/main/resources/address.txt
- Values created from the input source will not be stored to the mapping file. Use
input-output
for this.
- Values created from the input source will not be stored to the mapping file. Use
...
update ADDRESS
mask HOMEADDRESS
format %s
file src/main/resources/address.txt random-order
map address_map.txt input-output
...
input-output
- Anonymize values using mappings in
address_map.txt
- If mappings does not exist, then create a new anonymizations using values from the input-source
src/main/resources/address.txt
- Update the mapping file
address_map.txt
with the new anonymizations
Mapping File Structure
The first two rows contain metadata about the mapping file. Further lines are anonymizations with one value per line, of the form production_data=anonymized_data
.
- id_number_map.txt Example
#*** identity_number ***
#Fri Oct 15 11:26:58 CEST 2021
13071311390=241360601360
17041711981=948029768029
13081310255=816447276447
29012912169=24136061360
30103011145=905948545948
26062610647=90594895948