First of all I want to say thanks to Thomas that sent me feedback on part1 (see his comment). And indeed all three cases were able to defeat the datamasking.
I rewrote then the rules to also take those cases in consideration:
ProxySQL> DELETE FROM mysql_query_rules where rule_id < 50; ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (1,1,'devel','`cc_num`',"cc_num",0); ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (2,1,'devel','^[sS][eE][lL][eE][cC][tT] (.*)cc_num([ ,\n])(.*)', "SELECT \1CONCAT(LEFT(cc_num,2),REPEAT('X',10)) cc_num\2\3",1);
However the current recursive implementation of ProxySQL (using flagIN & flagOUT) is not an option to face the case where a masked field is referenced twice. To handle this case, ProxySQL will have to implement PCRE (see issue 716).
Additionally, in part 1, we saw how to mask one column for a specific user.
But if I want to mask other fields in the same table ? Is that feasible ? Easily ?
Yes and yes 😉
The only necessarily steps are:
- disable “apply” for the previous rule
- add a new rule
- load & save the rules
In the following example, I use the same table as in the previous article, but this time I want to also mask the last name, I decided to only show the first two character and use a fixed length.
ProxySQL> update mysql_query_rules set apply=0 where rule_id<3; ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (4,1,'devel','`lastname`',"lastname",0); ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (5,1,'devel','^[sS][eE][lL][eE][cC][tT] (.*)lastname([ ,\n])(.*)', "SELECT \1CONCAT(LEFT(lastname,2),REPEAT('X',10)) lastname\2\3",1); ProxySQL> LOAD MYSQL QUERY RULES TO RUNTIME;
Et voilà, done !
Let’s test it:
mysql1 mysql> select firstname, cc_num, lastname from myapp.CUSTOMERS; +-----------+------------------+--------------+ | firstname | cc_num | lastname | +-----------+------------------+--------------+ | Frederic | XXXXXXXXXXXX5289 | DeXXXXXXXXXX | | Dim0 | XXXXXXXXXXXX5289 | VaXXXXXXXXXX | | Kenny | XXXXXXXXXXXX4589 | GrXXXXXXXXXX | +-----------+------------------+--------------+
Wooohoooo it worked ! 😉
With ProxySQL you can also easily check which rules have been hit and how many times:
ProxySQL> SELECT hits, mysql_query_rules.rule_id, match_digest, substr(match_pattern,1,20) "match", substr(replace_pattern,1,20) "replace", apply FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id; +------+---------+--------------+----------------------+----------------------+-------+ | hits | rule_id | match_digest | match | replace | apply | +------+---------+--------------+----------------------+----------------------+-------+ | 2 | 1 | NULL | `cc_num` | cc_num | 0 | | 2 | 2 | NULL | ^[sS][eE][lL][eE][cC | SELECT \1CONCAT(LEFT | 0 | | 1 | 4 | NULL | `lastname` | lastname | 0 | | 1 | 5 | NULL | ^[sS][eE][lL][eE][cC | SELECT \1CONCAT(LEFT | 1 | | 0 | 90 | NULL | ^SELECT \*.*FROM.*CU | NULL | 0 | +------+---------+--------------+----------------------+----------------------+-------+
So at the end, datamasking for MySQL is not so complicated.
Note: As Tom Krouper highlighted it to me by chat, it’s a VERY bad idea to store Credit Card numbers in clear into a database. This was just for the theory, but you have been warned !
select * from table kind of breaks this. Data masking is more complicated than simply doing a regex replacement on column names.
Hi Justin,
select * is banned by the last rule.
But of course, I know that datamasking is more complicated than that (Thomas already replied me with some extra examples).
I think the best datamasking should be done from the extracting layer… but that’s another story 🙂
So meanwhile, with the help of René, I will try to make the more exhaustive list of regexp that will cover all the cases. So, yes please send me examples 🙂
Thx Justin.
Fred.
Oh, I see the replacement is NULL. 🙂
You also forgot ANSI quotes. You need to replace “cc_num” with cc_num 😀
Hi
did you have documentation for splitting read/write using HA Proxy on Percona xtradb CLuster.
thanks,
Hi dedy,
This post is not about that, you will find more info about read/write splitting using HA Proxy on Percona’s blog. However, read/write splitting with HA Proxy is not transparent as it’s a TCP level 4 proxy.
What about :
select firstname, c.* from myapp.CUSTOMERS c;
select cc_num, lastname, cc_num c, lastname from myapp.CUSTOMERS;
select cc_num as cc_num from myapp.CUSTOMERS;
select cc_num from myapp.CUSTOMERS; (With double space)