Percona Live Amsterdam is just finished… there was an entire room full day about MySQL 8.0 which is wonderful but there was also another product that everybody was talking about: ProxySQL (great job René).
Ronald Bradford and the MySQL engineering team were discussing about what tool was missing in the MySQL ecosystem and Mark Leith highlighted that a datamasking utility was really missing.
Having used already ProxySQL for Group Replication and other tests, I knew that it would be possible to use it also for this purpose. After a very short discussion and a nightly chat with René, this is an example on how you can achieve this missing (and sometime heavy) task.
- one CUSTOMERS table with sensitive information like Credit Card numbers (those are fake)
- a developer user that doesn’t need to access those numbers
So we decided this:
- the developer can only use SELECT (and EXECUTE)
- the developer can access all columns
- but credit card numbers (cc_num) must be masked, but the last 4 digits
- the developer can’t perform SELECT * on that specific table
This is our table:
+----+-----------+-------------+------------+------------------+----------+ | id | firstname | lastname | cc_type | cc_num | cc_verif | +----+-----------+-------------+------------+------------------+----------+ | 1 | Frederic | Descamps | mastercard | 5275653223285289 | 456 | | 8 | Dim0 | Vanoverbeke | mastercard | 5345654523285289 | 123 | | 15 | Kenny | Gryp | visa | 4916066793184589 | 456 | +----+-----------+-------------+------------+------------------+----------+
So first we will create on our database server the developer’s user:
MySQL> GRANT SELECT, USE on myapp.* to 'devel'@'%' identified by 'password';
Now let’s configure ProxySQL (I assume here that you already have installed and started proxysql) from it’s admin interface:
# mysql -u admin -padmin -h 127.0.0.1 -P6032
The first step is to create a user here too:
ProxySQL> INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('devel','password',1);
and now our MySQL server if it's not yet referenced:
ProxySQL> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'mysql1',3306);
We can load those settings in runtime and then save then to disk too:
ProxySQL> LOAD MYSQL SERVERS TO RUNTIME; ProxySQL> LOAD MYSQL USERS TO RUNTIME; ProxySQL> SAVE SERVERS USERS TO DISK; ProxySQL> SAVE MYSQL USERS TO DISK;
We can already try and see that our developer can still read our data. Let’s work on this now.
ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,error_msg) VALUES (90,1,'devel','^SELECT \*.*FROM.*CUSTOMERS', 'Query not allowed due to sensitive information, please contact firstname.lastname@example.org');
Let’s load it in runtime and test:
ProxySQL> LOAD MYSQL QUERY RULES TO RUNTIME; # mysql -u devel -ppassword -h 127.0.0.1 -P 6033 mysql> select * from myapp.CUSTOMERS; ERROR 1148 (42000): Query not allowed due to sensitive information, please contact email@example.com
Wooohooo it worked 😉
And the datamasking ? This is our next step, let’s enter the following rule:
ProxySQL> INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply) VALUES (1,1,'devel','^[sS][eE][lL][eE][cC][tT] (.*)cc_num([ ,])(.*)', "SELECT \1CONCAT(REPEAT('X',12),RIGHT(cc_num,4)) cc_num\2\3",1); ProxySQL> LOAD MYSQL QUERY RULES TO RUNTIME;
We can test it again:
mysql> select firstname, cc_num from myapp.CUSTOMERS; +-----------+------------------+ | firstname | cc_num | +-----------+------------------+ | Frederic | XXXXXXXXXXXX5289 | | Dim0 | XXXXXXXXXXXX5289 | | Kenny | XXXXXXXXXXXX4589 | +-----------+------------------+
Wooohoooo it worked again 😉
Note: if you find ways to workaround the datamasking, let me know, thank you.
Don’t forget to also save your rules to disk:
ProxySQL> SAVE MYSQL QUERY RULES TO DISK;
As you can see it’s possible to have a very quick datamasking solution for MySQL using ProxySQL.
Would it work with a “select *”?
I think that a easier solution would be create a view “masked_customers” with the cc masked and rewrite select/update/delete/inserts to that view for “devel”.
As written in the article, select * is not allowed for that table.
View can indeed be an option but it can also cause performance issue in certain circumstances.
> if you find ways to workaround the datamasking, let me know, thank you.
Wouldn’t queries like these work :
* select firstname, cc_num, cc_num from myapp.CUSTOMERS;
* select firstname, `cc_num` from myapp.CUSTOMERS;
* select firstname, cc_num
Hey Thomas !
Thank you for this feedback and indeed you are right, good catch, let me try to improve the regexp !
[…] part 1, we saw how to mask one column for a specific […]
What will happen to these queries
(1) select firstname, /* cc_num */, from myapp.CUSTOMERS;
(2) select a.* from myapp.CUSTOMERS a
(3) /* */ select firstname, cc_num from myapp.CUSTOMERS;
Thank you for these examples, I’m currently working on new implementation. That will help me to improve it !
In addition, a user with the right permissions could create a view and access the data indirectly…
create view bleh as select * from customer;
select * from bleh;
i want to implementation of data masking using proxysql in linux o/s.i have created db and tables from my root and create a db_user also and granted all privileges to this user,first one i login into admin and create table mysql_users and mysql_servers also and insert the values from proxysql> prompt.then i load and save it.then after i have inserted mysql query rules on my created table (that i was created from root ).then load mysql query rules to runtime.after that i log into my db_user and created and insert the mysql_query_rules,but when i load mysql query rules for runtime….ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘MYSQL QUERY RULES TO RUNTIME’ at line 1 this error has come.
so please suggest me what i do ??
Really good article and very well explained and formatted well done!
I’ve implemented on one environment I have and found that when you break the lines like
This will show the data without masking it. It may be something simple like adding line breaks to the regex to sort this out.
Thank you Olivier for this use case.
thanks for the nice writeup. That was really helpful!
Today I played around with the regular expression for blocking all SELECT * type queries. This seems to work for quite a few standard cases:
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,error_msg)
‘Query not allowed due to sensitive information, please contact firstname.lastname@example.org‘)
The (?i) part is responsible for ignoring case sensitivity.
There seems a typo in Save syntax should be
ProxySQL> SAVE MYSQL SERVERS TO DISK;
ProxySQL> SAVE MYSQL USERS TO DISK;
ProxySQL> SAVE SERVERS USERS TO DISK;