
Overview of the randomize rules

Goal

The randomize rule applies noise to column values. The existing values in the table are treated as the input source.

The noise is based on a random number drawn from a Gaussian distribution and is configured by the noise parameters offset, flat-noise, and percentage-noise.

Use Cases

The randomize rule is useful for scrambling data. Each value is changed slightly, by an amount that depends on the size of the noise variables. Supported column types for randomize are integer, decimal, date, time, and datetime.

ANO Structure

...
randomize <column-name> <rule-name?> // '?' means optional
type <type>
format <format>
convert <conversion>
transform? <transformation?>
unique?
offset? <offset-decimal?> // random variable
flat-noise? <flat-noise-decimal?> // random variable
percentage-noise? <percentage-noise-decimal?> // random variable
...
  • type tells randomize which type is to be anonymized: integer, decimal, date, time, or datetime
  • format describes the output of the randomize task: %d, %f / %.2f, %tF, %tT, "%1$tF %1$tT"
  • Randomize rules read column values as strings and must therefore have a convert function specified.
    • Select one of the built-in conversion functions String2Integer, String2Decimal, String2Date, String2Time, String2DateTime.
  • transform refers to a Java transformation function in the DBmasker project that transforms the output of the rule before it is stored back to the database
  • unique ensures uniqueness of values created by randomize
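
The format strings listed above look like java.util.Formatter patterns. Assuming that is how they are interpreted (the page does not say so explicitly), the following sketch shows what each pattern produces for a sample value. The class name FormatDemo and the helper render are hypothetical and not part of DBmasker.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Locale;

// Hypothetical demo class: not part of DBmasker.
class FormatDemo {

    // Renders one value with one pattern. Locale.ROOT keeps the decimal
    // separator independent of the machine's default locale.
    static String render(String format, Object value) {
        return String.format(Locale.ROOT, format, value);
    }

    public static void main(String[] args) {
        LocalDateTime sample = LocalDateTime.of(2020, 1, 31, 9, 5, 7);

        System.out.println(render("%d", 42));                        // integer
        System.out.println(render("%f", 3.14159));                   // decimal, six fraction digits
        System.out.println(render("%.2f", 3.14159));                 // decimal, two fraction digits
        System.out.println(render("%tF", LocalDate.of(2020, 1, 31))); // date as yyyy-MM-dd
        System.out.println(render("%tT", LocalTime.of(9, 5, 7)));     // time as HH:mm:ss
        System.out.println(render("%1$tF %1$tT", sample));            // datetime, one argument used twice
    }
}
```

In the datetime pattern, 1$ makes both conversions read the same argument, which is why a single value is enough.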

Noise

Noise Variables

  • offset - Fixed value
  • flat-noise - Fixed value that is multiplied by a randomly generated number r
  • percentage-noise - percentage of the original value that is multiplied by a randomly generated number r
    • minimum 0.00
    • maximum 100.00

Note:

  • When adding noise to type time and datetime, the noise is added in seconds
  • When adding noise to type date, the noise is added in days
  • For type integer, after noise is added, the resulting value is cast from double to integer

Noise Generation Process

  1. A random number r is drawn (at runtime) from the standard Gaussian distribution (mean 0, standard deviation 1)
  2. r is combined with the user-provided noise variables to add noise to the existing value
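
Step 1 can be sketched in Java with java.util.Random.nextGaussian(), which returns a value from a normal distribution with mean 0 and standard deviation 1. Whether DBmasker uses this exact call is an assumption. The class name GaussianDraw and the method drawR are hypothetical.

```java
import java.util.Random;

// Hypothetical demo class: not part of DBmasker.
class GaussianDraw {

    // Draws r from the standard Gaussian distribution.
    // Roughly 68% of draws fall between -1 and 1, and r can be negative,
    // so the noise can move a value in either direction.
    static double drawR(Random rng) {
        return rng.nextGaussian();
    }

    public static void main(String[] args) {
        Random rng = new Random(); // unseeded: a different r on every run
        for (int i = 0; i < 5; i++) {
            System.out.println(drawR(rng));
        }
    }
}
```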

Noise formula for integers and decimals

existing_value + offset + (flat-noise * r) + (existing_value * percentage-noise / 100) * r
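
As a worked illustration, the formula above can be written as a small Java method. This is a minimal sketch, not DBmasker's implementation: the class name NumericNoise and both method names are hypothetical, r is passed in as a parameter so the arithmetic can be followed, and the integer variant assumes a plain Java cast from double, which truncates toward zero.

```java
// Hypothetical demo class: not part of DBmasker.
class NumericNoise {

    // existing_value + offset + (flat-noise * r) + (existing_value * percentage-noise / 100) * r
    static double addNoise(double existingValue, double offset,
                           double flatNoise, double percentageNoise, double r) {
        return existingValue
                + offset
                + (flatNoise * r)
                + (existingValue * percentageNoise / 100.0) * r;
    }

    // For type integer the noisy double is cast back to an integer type.
    // Assumption: a plain cast, so the fractional part is dropped.
    static long addNoiseToInteger(long existingValue, double offset,
                                  double flatNoise, double percentageNoise, double r) {
        return (long) addNoise(existingValue, offset, flatNoise, percentageNoise, r);
    }

    public static void main(String[] args) {
        // 100 + 5 + (2 * 0.5) + (100 * 10 / 100) * 0.5 = 111.0
        System.out.println(addNoise(100.0, 5.0, 2.0, 10.0, 0.5));
        // 100 + (3 * 0.5) = 101.5, which the cast reduces to 101
        System.out.println(addNoiseToInteger(100L, 0.0, 3.0, 0.0, 0.5));
    }
}
```

With r = 0 only the offset is applied, so offset shifts every value by the same amount, while flat-noise and percentage-noise control how widely the values spread.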

Noise formula for dates, times and datetimes

existing_value + offset + (flat-noise * r)
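
The date and time formula can be sketched with java.time, using days for date and seconds for time and datetime as described in the notes. This is a sketch under assumptions: the class name TemporalNoise and its methods are hypothetical, and the page does not say how a fractional amount of noise is handled, so the fractional part is simply dropped here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical demo class: not part of DBmasker.
class TemporalNoise {

    // offset + (flat-noise * r), as a whole number of units.
    // Assumption: the fractional part is truncated toward zero.
    static long noiseUnits(double offset, double flatNoise, double r) {
        return (long) (offset + flatNoise * r);
    }

    // Type date: the noise is a number of days.
    static LocalDate addNoise(LocalDate value, double offset, double flatNoise, double r) {
        return value.plusDays(noiseUnits(offset, flatNoise, r));
    }

    // Type datetime: the noise is a number of seconds.
    static LocalDateTime addNoise(LocalDateTime value, double offset, double flatNoise, double r) {
        return value.plusSeconds(noiseUnits(offset, flatNoise, r));
    }

    // Type time: the noise is a number of seconds. LocalTime wraps around midnight.
    static LocalTime addNoise(LocalTime value, double offset, double flatNoise, double r) {
        return value.plusSeconds(noiseUnits(offset, flatNoise, r));
    }

    public static void main(String[] args) {
        // 1 + (2 * 0.5) = 2 days after 31 January 2020
        System.out.println(addNoise(LocalDate.of(2020, 1, 31), 1.0, 2.0, 0.5));
        // 30 + (10 * -1.5) = 15 seconds after noon
        System.out.println(addNoise(LocalTime.of(12, 0, 0), 30.0, 10.0, -1.5));
    }
}
```

Unlike the numeric formula, there is no percentage-noise term here, so the spread of the noise does not depend on the size of the existing value.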