Overview of the randomize rules
Goal
The randomize rule applies noise to column values. The existing table values are treated as the input source. Noise is added based on a Gaussian distribution and is configured by the noise parameters offset, flat-noise, and percentage-noise.
Use Cases
The randomize rule is useful for scrambling data. Each value is changed slightly, by an amount that depends on the size of the noise variables. Supported column types for randomize are integer, decimal, date, time, and datetime.
ANO Structure
...
randomize <column-name> <rule-name?> // '?' means optional
type <type>
format <format>
convert <conversion>
transform? <transformation?>
unique?
offset? <offset-decimal?> // noise variable
flat-noise? <flat-noise-decimal?> // noise variable
percentage-noise? <percentage-noise-decimal?> // noise variable
...
- type tells randomize which type of value is to be anonymized: integer, decimal, date, time, or datetime.
- format describes the output of the randomize task: %d (integer), %f or %.2f (decimal), %tF (date), %tT (time), or "%1$tF %1$tT" (datetime).
- The randomize rule reads column values as strings and must therefore have a convert function specified.
- Select one of the built-in conversion functions: String2Integer, String2Decimal, String2Date, String2Time, or String2DateTime.
- transform refers to a Java transformation function in the DBmasker project that transforms the output of the randomize rule before it is stored back to the database.
- unique ensures that the values created by randomize are unique.
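The convert and format steps bracket the noise step: a column value arrives as a string, is converted to a typed value, and is written back through a format pattern. The sketch below imitates that round trip in Java. The method names string2Integer, string2Decimal, string2Date, string2Time and string2DateTime are hypothetical stand-ins for DBmasker's built-in functions, and the accepted input layouts (ISO-8601 dates and times, a space between date and time) are assumptions; the format patterns themselves are standard java.util.Formatter syntax.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical stand-ins for DBmasker's String2* conversion functions.
// The input layouts below are assumptions made for this sketch.
class ConvertAndFormat {
    // Assumed datetime input layout: ISO date and time separated by a space.
    static final DateTimeFormatter DATETIME_INPUT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static int string2Integer(String s)            { return Integer.parseInt(s.trim()); }
    static double string2Decimal(String s)         { return Double.parseDouble(s.trim()); }
    static LocalDate string2Date(String s)         { return LocalDate.parse(s.trim()); }  // e.g. 2024-03-05
    static LocalTime string2Time(String s)         { return LocalTime.parse(s.trim()); }  // e.g. 09:07:03
    static LocalDateTime string2DateTime(String s) { return LocalDateTime.parse(s.trim(), DATETIME_INPUT); }

    // Applies a format pattern such as %d or %tF; Locale.ROOT keeps '.' as the decimal separator.
    static String format(String pattern, Object value) {
        return String.format(Locale.ROOT, pattern, value);
    }

    public static void main(String[] args) {
        System.out.println(format("%d", string2Integer("42")));                 // 42
        System.out.println(format("%.2f", string2Decimal("3.14159")));          // 3.14
        System.out.println(format("%tF", string2Date("2024-03-05")));           // 2024-03-05
        System.out.println(format("%tT", string2Time("09:07:03")));             // 09:07:03
        System.out.println(format("%1$tF %1$tT",
                string2DateTime("2024-03-05 09:07:03")));                       // 2024-03-05 09:07:03
    }
}
```

Passing Locale.ROOT is a choice made for this sketch so that %.2f always prints a '.' separator; with the JVM default locale, a German locale, for example, would print 3,14.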
Noise
Noise Variables
offset
- Fixed value that is added to the existing value
flat-noise
- Fixed value that is multiplied by a randomly generated number r
percentage-noise
- Percentage of the existing value that is multiplied by a randomly generated number r
- Minimum 0.00
- Maximum 100.00
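The range constraint on percentage-noise can be enforced when a rule is loaded. This is a minimal sketch, assuming a hypothetical NoiseParameters class and assuming that an omitted optional variable behaves like 0; neither detail is confirmed by this page.

```java
// Hypothetical holder for the three noise variables of a randomize rule.
final class NoiseParameters {
    final double offset;
    final double flatNoise;
    final double percentageNoise;

    NoiseParameters(double offset, double flatNoise, double percentageNoise) {
        // percentage-noise must lie within the documented range 0.00 .. 100.00;
        // the negated comparison also rejects NaN.
        if (!(percentageNoise >= 0.00 && percentageNoise <= 100.00)) {
            throw new IllegalArgumentException(
                    "percentage-noise must be between 0.00 and 100.00, got " + percentageNoise);
        }
        this.offset = offset;
        this.flatNoise = flatNoise;
        this.percentageNoise = percentageNoise;
    }

    // Assumption: optional variables that are left out of the ANO file default to 0.
    static NoiseParameters defaults() {
        return new NoiseParameters(0.0, 0.0, 0.0);
    }

    public static void main(String[] args) {
        NoiseParameters p = new NoiseParameters(1.0, 2.5, 10.0);
        System.out.println(p.offset + " " + p.flatNoise + " " + p.percentageNoise);
        try {
            new NoiseParameters(0.0, 0.0, 150.0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```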
Note:
- When adding noise to types time and datetime, the noise is added in seconds.
- When adding noise to type date, the noise is added in days.
- For type integer, after noise is added, the resulting value is cast from double to integer.
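The integer rule can be checked directly in Java. This sketch assumes the "cast from double to integer" is a plain (int) cast; toInteger is a hypothetical helper name.

```java
class IntegerCast {
    // Assumption: the documented cast is a plain Java (int) cast.
    static int toInteger(double noisyValue) {
        // (int) drops the fractional part (truncation toward zero);
        // out-of-range values saturate at Integer.MIN_VALUE / Integer.MAX_VALUE, and NaN becomes 0.
        return (int) noisyValue;
    }

    public static void main(String[] args) {
        System.out.println(toInteger(41.7));   // 41
        System.out.println(toInteger(-2.9));   // -2
        System.out.println(Math.round(41.7));  // 42: rounding, shown only for contrast with the cast
    }
}
```

If a plain cast is used, results are truncated toward zero rather than rounded, so for a positive integer any noise between 0 and 1 leaves the value unchanged, while noise between -1 and 0 lowers it by one.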
Noise Generation Process
- A random number r is drawn at runtime from the standard Gaussian (normal) distribution, which has mean 0 and standard deviation 1.
- r is combined with the user-provided noise variables to add noise to the existing value.
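The sampling step can be sketched with java.util.Random#nextGaussian, which returns draws from the standard normal distribution. Whether DBmasker uses this exact method is an assumption; the sketch only checks that the draws have a mean close to 0 and a standard deviation close to 1.

```java
import java.util.Random;

// Sketch of drawing r, assuming java.util.Random#nextGaussian as the source of randomness.
class GaussianDraw {
    static double[] draw(long seed, int n) {
        Random random = new Random(seed); // fixed seed only to make the demo reproducible
        double[] rs = new double[n];
        for (int i = 0; i < n; i++) {
            rs[i] = random.nextGaussian(); // standard normal: mean 0, standard deviation 1
        }
        return rs;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / xs.length);
    }

    public static void main(String[] args) {
        double[] rs = draw(7L, 100_000);
        System.out.printf("mean=%.3f stdDev=%.3f%n", mean(rs), stdDev(rs)); // close to 0 and 1
    }
}
```

About 68% of standard normal draws fall within ±1, so flat-noise behaves like a standard deviation: with flat-noise 5, roughly two thirds of the flat noise terms stay within ±5. The fixed seed is only for reproducing the demo; a masking run would likely want an unseeded generator so that results cannot be replayed.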
Noise formula for integers and decimals
existing_value + offset + (flat-noise * r) + (existing_value * percentage-noise / 100) * r
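The integer and decimal formula translates line for line into Java. The class and parameter names are hypothetical, and r is passed in explicitly so the result can be checked by hand.

```java
import java.util.Random;

class NumericNoise {
    // existing_value + offset + (flat-noise * r) + (existing_value * percentage-noise / 100) * r
    static double apply(double existingValue, double offset, double flatNoise,
                        double percentageNoise, double r) {
        return existingValue + offset + (flatNoise * r) + (existingValue * percentageNoise / 100) * r;
    }

    public static void main(String[] args) {
        // Fixed r for a hand-checkable result: 100 + 1 + 2*0.5 + (100*10/100)*0.5 = 107.0
        System.out.println(apply(100, 1, 2, 10, 0.5));

        // Typical use: one Gaussian draw per value, then a cast for an integer column.
        double r = new Random().nextGaussian();
        int noisyInteger = (int) apply(250, 0, 5, 2, r);
        System.out.println(noisyInteger);
    }
}
```

As written, both noise terms share the same r, so they collapse into (flat-noise + existing_value * percentage-noise / 100) * r: the combined noise is a single Gaussian draw scaled by that factor, not the sum of two independent draws.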
Noise formula for dates, times and datetimes
existing_value + offset + (flat-noise * r)
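For dates, times and datetimes the same idea applies without the percentage term, using the units from the note above: days for date, seconds for time and datetime. This sketch assumes the noise is truncated toward zero to a whole number of days or seconds before it is added; the page does not say how fractional noise is handled. All names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Sketch of existing_value + offset + (flat-noise * r) for temporal values.
// Assumption: fractional noise is truncated toward zero before it is added.
class TemporalNoise {
    static long noise(double offset, double flatNoise, double r) {
        return (long) (offset + flatNoise * r);
    }

    static LocalDate applyToDate(LocalDate existingValue, double offset, double flatNoise, double r) {
        return existingValue.plusDays(noise(offset, flatNoise, r));     // date noise is in days
    }

    static LocalTime applyToTime(LocalTime existingValue, double offset, double flatNoise, double r) {
        return existingValue.plusSeconds(noise(offset, flatNoise, r));  // seconds; wraps around midnight
    }

    static LocalDateTime applyToDateTime(LocalDateTime existingValue, double offset, double flatNoise, double r) {
        return existingValue.plusSeconds(noise(offset, flatNoise, r));  // seconds; may change the date
    }

    public static void main(String[] args) {
        System.out.println(applyToDate(LocalDate.of(2024, 3, 5), 1, 2, 0.5));                   // 2024-03-07
        System.out.println(applyToTime(LocalTime.of(23, 59, 30), 60, 0, 0));                    // 00:00:30
        System.out.println(applyToDateTime(LocalDateTime.of(2024, 3, 5, 23, 59, 30), 30, 0, 0)); // 2024-03-06T00:00
    }
}
```

java.time.LocalTime.plusSeconds wraps around midnight, so in this sketch a time value near 23:59 can come out just after 00:00, while a datetime value rolls over into the next day.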