lefred

I am MySQL Community Manager for EMEA & APAC. I joined the MySQL Community Team in May 2016. I have been an OpenSource and MySQL consultant for more than 15 years. My favorite topics are High Availability and Performance.

devops Meetup at Fosdem 2011

I will be at the devops meetup at FOSDEM 2011, which takes place on Friday, 4 February. I hope to see you there if you are in the devops mindset and want to share some experience with us! Register here

devops… to package or not to package… this is the question !

During the Devopsdays in Hamburg, one of the most recurring discussions was about "packaging vs non-packaging, when and what?" I won't try to convince anyone of what to do and when, nor will I claim to have the absolute best solution; this post just illustrates the solution I implemented with @zipkid. Some points aren't finished or implemented yet, and for others we have not yet decided which direction to follow.

First, let's start with the description of the environment:

A web-based application (J2EE) with a MySQL backend; this product is delivered to us as a tgz package. There are many interconnections between gateways, applications, databases, map servers, etc., all of them defined in configuration files. We are using SLES from 10 to 11 SP1 and we maintain a bunch of servers: physical machines of different types (Dell, IBM blades, ...) and virtual machines.

What tools do we use ?

- GNU/Linux
- redmine + kanban board plugin to define the tasks
- a PXE installation system (autoyast on SLES and cobbler on Red Hat/CentOS/Fedora) to (re)install the machines
- puppet to deploy the configurations
- git to store all our puppet configurations
- svn to store other things like spec files (this should be migrated to git)
- puppet-dashboard to have an overview of the deployed machines and of puppet, and to define some variables we use in our recipes
- rpmbuild to ... euh... build the rpms :)
- jmeter to perform load tests
- nagios to monitor the systems

What is the process then ?

To define the process, we must first divide it into several categories:
- OS installation and maintenance
- "our business product"

To install a machine, we install a basic image on it (virtual or physical) via PXE boot, using kickstart files for Red Hat based systems or autoyast for SLES. We create the node in the dashboard and add some variables if needed, like ip, environment, task. We add the server to puppet's autosign file. In the dashboard and in puppet we have several environments that are linked to git branches. This allows us to test recipes or settings without modifying production. Then puppet is started and takes care of everything: VLAN interfaces, bonding of the interfaces, DNS resolution, installation of the needed packages and changes to the configuration files. Nagios checks are also configured by puppet.

For our product, we first create the package (rpm) from the tgz provided by the developers and put it in our own repository. After installing it on the test servers, we start some load test scenarios.
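The autosign step mentioned above is easy to script. This is only a minimal sketch: the function name `allow_autosign` is hypothetical, and the default path `/etc/puppet/autosign.conf` is an assumption that should be checked against the `autosign` setting of your puppetmaster.

```shell
# Add a node to puppet's autosign file unless it is already listed.
# $1 = fully qualified node name
# $2 = autosign file (the default below is an assumption, not taken from our setup)
allow_autosign() {
    node=$1
    file=${2:-/etc/puppet/autosign.conf}
    # Make sure the file exists so that grep does not fail on a fresh master
    touch "$file" || return 1
    # -x matches the whole line, -F treats the name as a fixed string
    grep -qxF -- "$node" "$file" || printf '%s\n' "$node" >> "$file"
}
```

Calling `allow_autosign web01.example.com` twice leaves a single entry in the file, so the function can safely be run again when a machine is reinstalled.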

Back to the big question then: do we package ?

The answer is definitely YES! Packaging keeps us in control of what is installed on the system (package version and release, no orphaned files). BUT the default configuration files are overridden by the puppet run: conf files, xml, shell scripts and cron jobs are provided by puppet and available in git (which gives us version control too). Of course puppet runs regularly on every machine to guarantee the desired state, both on production and on the test machines! This is only dangerous if you don't test your puppet recipes enough during the development phase. We don't run the puppet client in daemon mode; we start it via cron jobs instead, to avoid the memory usage issues we encountered with puppetd in daemon mode.
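A cron-driven puppet run can be sketched as follows. This is an illustration, not our actual cron job: the helper names, the `/usr/sbin/puppetd` path and the half-hourly schedule are assumptions. The minute offset is derived from the hostname so that all nodes do not hit the puppetmaster at the same moment.

```shell
# Derive a stable minute offset (0-29) from a hostname.
splay_minute() {
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $(( sum % 30 ))
}

# Print an /etc/cron.d style entry that runs puppet once, twice an hour.
# The binary path and the schedule are assumptions for this sketch.
cron_line() {
    m=$(splay_minute "$1")
    echo "$m,$(( m + 30 )) * * * * root /usr/sbin/puppetd --onetime --no-daemonize --logdest syslog"
}

# Show the entry for the current machine
cron_line "$(uname -n)"
```

Because the offset depends only on the hostname, the same machine always gets the same slot, which keeps the entry stable when puppet itself manages the cron file.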

How to improve ?

We would like to improve the load tests and automate the build, installation and testing of "our product" on the test server. We plan to use hudson for the CI, with jmeter for unit tests and, why not, tsung for bigger load tests. One open question we still have if we deploy a CI system is how to link a build version to a puppet configuration. Using a new branch in git linked to a new environment in puppet (and puppet-dashboard) doesn't seem to be an optimal solution. We therefore opted for a git tag corresponding to the build release, and only the latest one in testing is deployed on the test machines. If needed, we can roll back to a previous tag and package. It would also be great to test our puppet recipes automatically with a tool like cucumber-puppet.

I think we are going in the right direction, but the road to a fully automated process with an overview of all aspects is still long. We all agree, though, that puppet has already helped us a lot in maintaining all our servers.
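The "one git tag per build release" idea could look like the sketch below. The naming scheme (`name-version-release`) and the three helper functions are hypothetical; only `git tag` and `git checkout` are real commands, and the functions are meant to be run inside the puppet configuration repository.

```shell
# Hypothetical naming scheme: one tag per packaged build, e.g. "myapp-1.4.2-3".
# $1 = package name, $2 = version, $3 = rpm release
release_tag() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Tag the current puppet configuration for a given build.
tag_build() {
    t=$(release_tag "$1" "$2" "$3")
    git tag -a "$t" -m "puppet configuration for $t"
}

# Roll the working copy back to the configuration of an earlier build.
rollback_to() {
    git checkout "$(release_tag "$1" "$2" "$3")"
}
```

Rolling back then means checking out the older tag and installing the rpm with the matching version and release.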

This is a diagram illustrating the process:

1. developers provide a tgz with their application (a compiled java application; they also use Hudson to test their package)
2. the "DEVOPS" machine is started! Devs and Ops collaborate to write the spec for the rpm package and the puppet recipe (dependencies, configuration settings)
3. test the package build and the puppet recipe (with cucumber-puppet)
4. add the package to the rpm repository and commit the puppet recipe to git (and the rpm spec to svn in our case)
5. puppetmaster gets updated with the new recipes
6. only in case of a new machine: the machine is automatically installed via PXE
7. the puppet client installs the needed packages and configures the system as needed
8. puppet also configures nagios, and nagios automatically starts monitoring the machine and the services; hudson also starts unit tests and load tests if needed
9. same as point 6
10. puppet installs the needed packages and configuration on the production machine; it also configures nagios to monitor the machine and its services

Ignite talks with impress!ve

During the last devopsdays in Hamburg, Gildas presented a session of ignite talks. He was using impress!ve, but it seems the software was not really designed for that purpose: you have to define the duration of the session manually and also calculate the duration of each slide... So I decided to patch this very nice product to fit the "ignite" needs :) The proposed patch automatically calculates the duration of the slides and adds a countdown for the slide display plus the slide number (see screenshots). New argument:
$ impressive --ignite 5m MySQL-spider.pdf
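For reference, the arithmetic that the patch takes over is simply the session length divided by the number of slides. The helper below is a sketch of that calculation only, not the code of the patch; `slide_seconds` is a hypothetical name and the 20-slide deck is an assumed example.

```shell
# Seconds available per slide: total session length divided by slide count.
# $1 = session length in seconds, $2 = number of slides
slide_seconds() {
    echo $(( $1 / $2 ))
}

# A 5 minute session over an assumed 20 slides
slide_seconds 300 20
```

Without the patch, this is the value you would have to work out yourself and pass to impress!ve as the auto-advance delay (the `-a`/`--auto` option, if your version provides it).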

Windows 7 and Samba

Today I tried to put into production an update of Samba 3 (3.5.4) to allow Windope 7 clients to join the domain. After following the steps on the samba wiki page about this topic [here], I could join the machine to the domain but I was not able to log in!? :( In the log:
[2010/08/20 16:55:20.682477,  0] rpc_server/srv_netlog_nt.c:714(_netr_ServerAuthenticate3)
  _netr_ServerAuthenticate3: netlogon_creds_server_check failed. Rejecting auth request from client RO-BACKUP machine account RO-BACKUP$
[2010/08/20 16:55:30.993850,  0] lib/util_sock.c:474(read_fd_with_timeout)
[2010/08/20 16:55:30.993958,  0] lib/util_sock.c:1432(get_peer_addr_internal)
  getpeername failed. Error was Transport endpoint is not connected
  read_fd_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
The problem was easy to solve but not easy to find: the clocks of the two machines were not in sync (30 seconds apart!). Fixing the time sync fixed the problem (even though I'm using neither Kerberos nor AD).
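A quick way to rule this out next time is to compare the two clocks before digging into the Samba logs. The sketch below only does the comparison; `clock_skew` is a hypothetical helper, and how you obtain the client's time depends on the client.

```shell
# Absolute difference, in seconds, between two epoch timestamps.
# $1 = local time, $2 = remote time (both as seconds since the epoch)
clock_skew() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}
```

On the Samba server, `date +%s` gives the local value; a result of 30 for a client, as in my case, was enough to break the login.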

the culprit is always… SELinux :)

After setting up squid and dansguardian (using clamd) on CentOS 5, I wasn't able to use it :( I always got the following error, even though dansguardian ran as the same user as clamd (clamav):
2010.7.9 12:22:41 - 10.0.200.6 http://www.eicar.org/anti_virus_test_file.htm
 *INFECTED* *DENIED* /tmp/tfIlR1j6: lstat() failed: Permission denied. 
ERROR GET 15590 0 Content scanning 1 403 text/html  
I only realized, after searching for far too long, that SELinux (I know, life is too short for it) was the culprit. It was my mistake, as I had completely forgotten that this machine had SELinux enabled :-S So in /var/log/audit/audit.log I had:
type=AVC msg=audit(1278673113.470:3489): avc:  denied  { getattr } for
pid=32164 comm="clamd" path="/tmp/tfCSCirx" dev=dm-3 ino=17 
scontext=user_u:system_r:clamd_t:s0 
tcontext=user_u:object_r:initrc_tmp_t:s0 tclass=file
type=SYSCALL msg=audit(1278673113.470:3489): arch=c000003e 
syscall=6 success=no exit=-13 a0=8cce370 a1=421f2dc0 a2=421f2dc0 
a3=8 items=0 ppid=1 pid=32164 auid=1004 uid=102 gid=114 euid=102 
suid=102 fsuid=102 egid=114 sgid=114 fsgid=114 tty=(none) ses=437 
comm="clamd" exe="/usr/sbin/clamd" subj=user_u:system_r:clamd_t:s0 
key=(null)
Note to myself: never forget to check audit.log! To create the SELinux policy, I used the following commands, which are quite easy:
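# Generate a type-enforcement module from the denials recorded in the audit log
audit2allow -a -m dansguardian > dansguardian.te
# Syntax check only (without -o no binary module is written)
checkmodule -M -m dansguardian.te
# Compile the module
checkmodule -M -m dansguardian.te -o dansguardian.mod
# Package the module and load it
semodule_package -o dansguardian.pp -m dansguardian.mod
semodule -i dansguardian.pp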
Et voilà ! Dansguardian is running and I didn't disable selinux :-)
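To make the audit.log check less painful, the AVC denials can be reduced to the fields that matter. This is a minimal sketch built on grep and sed; `summarize_avc` is a hypothetical helper and not part of the SELinux tools, and it assumes each AVC record sits on a single line, as it does in the real log file.

```shell
# Print "comm scontext tcontext tclass" for every AVC denial read on stdin.
summarize_avc() {
    grep 'avc:.*denied' |
        sed -n 's/.*comm="\([^"]*\)".*scontext=\([^ ]*\).*tcontext=\([^ ]*\).*tclass=\([^ ]*\).*/\1 \2 \3 \4/p'
}

# Typical use (needs read access to the audit log):
#   summarize_avc < /var/log/audit/audit.log | sort | uniq -c
```

Fed with the denial shown above, it prints the command (clamd), its context and the context of the temporary file, which is enough to see that clamd_t was refused access to a file labelled initrc_tmp_t.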

As MySQL Community Manager, I am an employee of Oracle and the views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

You can find articles I wrote on Oracle’s blog.