MySQL Datamasking using ProxySQL – part 2

First of all I want to say thanks to Thomas that sent me feedback on part1 (see his comment). And indeed all three cases were able to defeat the datamasking.

I rewrote then the rules to also take those cases in consideration:

ProxySQL> DELETE FROM mysql_query_rules where rule_id < 50;
ProxySQL> INSERT INTO mysql_query_rules 
          (rule_id,active,username,match_pattern,replace_pattern,apply)  
          VALUES (1,1,'devel','`cc_num`',"cc_num",0);
ProxySQL> INSERT INTO mysql_query_rules 
          (rule_id,active,username,match_pattern,replace_pattern,apply)  
           VALUES (2,1,'devel','^[sS][eE][lL][eE][cC][tT] (.*)cc_num([ ,\n])(.*)',
           "SELECT \1CONCAT(LEFT(cc_num,2),REPEAT('X',10)) cc_num\2\3",1);

masking
However the current recursive implementation of ProxySQL (using flagIN & flagOUT) is not an option to face the case where a masked field is referenced twice. To handle this case, ProxySQL will have to implement PCRE (see issue 716).

Additionally, in part 1, we saw how to mask one column for a specific user.

But if I want to mask other fields in the same table ? Is that feasible ? Easily ?

Yes and yes 😉

The only necessarily steps are:

  • disable “apply” for the previous rule
  • add a new rule
  • load & save the rules

In the following example, I use the same table as in the previous article, but this time I want to also mask the last name, I decided to only show the first two character and use a fixed length.

ProxySQL> update mysql_query_rules set apply=0 where rule_id<3;
ProxySQL> INSERT INTO mysql_query_rules 
          (rule_id,active,username,match_pattern,replace_pattern,apply)  
          VALUES (4,1,'devel','`lastname`',"lastname",0);
ProxySQL> INSERT INTO mysql_query_rules 
          (rule_id,active,username,match_pattern,replace_pattern,apply)  
           VALUES (5,1,'devel','^[sS][eE][lL][eE][cC][tT] (.*)lastname([ ,\n])(.*)',
           "SELECT \1CONCAT(LEFT(lastname,2),REPEAT('X',10)) lastname\2\3",1);
ProxySQL> LOAD MYSQL QUERY RULES TO RUNTIME;  

Et voilà, done !

Let’s test it:

mysql1 mysql> select firstname, cc_num, lastname from myapp.CUSTOMERS;
+-----------+------------------+--------------+
| firstname | cc_num           | lastname     |
+-----------+------------------+--------------+
| Frederic  | XXXXXXXXXXXX5289 | DeXXXXXXXXXX |
| Dim0      | XXXXXXXXXXXX5289 | VaXXXXXXXXXX |
| Kenny     | XXXXXXXXXXXX4589 | GrXXXXXXXXXX |
+-----------+------------------+--------------+

Wooohoooo it worked ! 😉

With ProxySQL you can also easily check which rules have been hit and how many times:

ProxySQL> SELECT hits, mysql_query_rules.rule_id, match_digest, 
                      substr(match_pattern,1,20) "match",
                      substr(replace_pattern,1,20) "replace", apply 
               FROM mysql_query_rules 
               NATURAL JOIN stats.stats_mysql_query_rules 
               ORDER BY mysql_query_rules.rule_id;
+------+---------+--------------+----------------------+----------------------+-------+
| hits | rule_id | match_digest | match                | replace              | apply |
+------+---------+--------------+----------------------+----------------------+-------+
| 2    | 1       | NULL         | `cc_num`             | cc_num               | 0     |
| 2    | 2       | NULL         | ^[sS][eE][lL][eE][cC | SELECT \1CONCAT(LEFT | 0     |
| 1    | 4       | NULL         | `lastname`           | lastname             | 0     |
| 1    | 5       | NULL         | ^[sS][eE][lL][eE][cC | SELECT \1CONCAT(LEFT | 1     |
| 0    | 90      | NULL         | ^SELECT \*.*FROM.*CU | NULL                 | 0     |
+------+---------+--------------+----------------------+----------------------+-------+

So at the end, datamasking for MySQL is not so complicated.

Note: As Tom Krouper highlighted it to me by chat, it’s a VERY bad idea to store Credit Card numbers in clear into a database. This was just for the theory, but you have been warned !

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

7 Comments

  1. select * from table kind of breaks this. Data masking is more complicated than simply doing a regex replacement on column names.

    • Hi Justin,

      select * is banned by the last rule.

      But of course, I know that datamasking is more complicated than that (Thomas already replied me with some extra examples).

      I think the best datamasking should be done from the extracting layer… but that’s another story 🙂

      So meanwhile, with the help of René, I will try to make the more exhaustive list of regexp that will cover all the cases. So, yes please send me examples 🙂

      Thx Justin.

      Fred.

    • Hi dedy,

      This post is not about that, you will find more info about read/write splitting using HA Proxy on Percona’s blog. However, read/write splitting with HA Proxy is not transparent as it’s a TCP level 4 proxy.

  2. What about :

    select firstname, c.* from myapp.CUSTOMERS c;

    select cc_num, lastname, cc_num c, lastname from myapp.CUSTOMERS;

    select cc_num as cc_num from myapp.CUSTOMERS;

    select cc_num from myapp.CUSTOMERS; (With double space)

Leave a Reply

Your email address will not be published. Required fields are marked *

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.