
Look for file changes and
On Guard
Watchman is an open source tool developed by Facebook and released under the terms of the Apache License 2.0. The Watchman website [1] states: "Watchman exists to watch files and record when they change. It can also trigger actions (such as rebuilding assets) when matching files change."
Written in C, Watchman is multiplatform: It works on Linux, Mac OS X, BSD, and Illumos/Solaris. Windows is not listed as a supported operating system. The use cases I describe in this article might not be what Watchman was primarily designed for, but its core job of triggering actions when a file changes fits them well.
Watchman might seem complex at first glance, with its options and configuration directives (yes, it is), but I'll start with some simple examples; then, you can delve deeper into the program on your own.
Replicating Files
Working with distributed or replicated systems is a common task these days. The magic word nowadays – the cloud – is an unclear term, but if you want to simplify the concept, the cloud is what years ago was called a "cluster." If you want to replicate a directory on a cluster of servers, a lot of software, besides shared storage like NFS, is at your disposal. However, the learning curve is often steep, even when the goal is simple.
The basic scenario is as follows: You have a directory on a server, and you want to replicate its content on another server. The simple way to achieve this is with a copy command (maybe using the rsync tool) triggered by a job scheduled in a crontab. This solution is not optimal, however: The scheduled job runs even if the directory has not changed, the interval between runs could be too long or too short, and a copy might start while a file is still being written, leaving the replica temporarily inconsistent. Also, why wait until night to perform a backup? Sometimes it is useful to copy a file as soon as possible.
In such a case, a handy tool like Watchman, which watches the tree of a directory (or a number of directories) and triggers an action when a change is detected, is a good solution.
Triggering Rsync
Install Watchman as instructed in the "Installation" box and start it with the command:
watchman watch /opt/repos
The path /opt/repos is an example of a directory that you might want to replicate to another server. In Watchman it is called the "root."
At this point, the /usr/local/var/run/watchman/<username>.state configuration file will look like Listing 1. The Watchman process is up and running, but right now, it is simply watching the directory and reporting changes in the logfiles. (You can also query the daemon for changed files from the command line.)
Listing 1: Configuration File
{
  "version": "3.0.0",
  "watched": [
    {
      "path": "/opt/repos",
      "triggers": []
    }
  ]
}
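The command-line query mentioned above can be sketched roughly as follows. This is a minimal sketch that assumes Watchman's since command and the JSON input mode of the client (watchman -j); the helper build_since_query and the cursor name repos_cursor are hypothetical names chosen for illustration, and the path is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch: ask the running Watchman daemon which files changed since the
# last time the same named cursor was used.
# build_since_query and the cursor name are illustrative, not part of Watchman.

# Print a JSON "since" request for the given root and named cursor.
# Assumes the root path needs no JSON escaping.
build_since_query() {
  local root="$1" cursor="$2"
  printf '["since", "%s", "n:%s"]\n' "$root" "$cursor"
}

# Only talk to the daemon when the watchman client is installed.
if command -v watchman >/dev/null 2>&1; then
  build_since_query /opt/repos repos_cursor | watchman -j
fi
```

Repeating the call with the same cursor name should report only what changed in between, which makes it a simple way to inspect activity before any trigger is defined.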
The next step is to define what action to trigger. The syntax is:
watchman -- trigger /opt/repos 'repos-sync' \
             -- /usr/local/sbin/sync.sh
Here, 'repos-sync' is the name of the trigger, and /usr/local/sbin/sync.sh is the script to invoke. Now the configuration file looks like Listing 2.
Listing 2: Configuration File with Trigger Action
{
  "version": "3.0.0",
  "watched": [
    {
      "path": "/opt/repos",
      "triggers": [
        {
          "name": "repos-sync",
          "command": [
            "/usr/local/sbin/sync.sh"
          ],
          "append_files": true,
          "stdin": [
            "name",
            "exists",
            "new",
            "size",
            "mode"
          ]
        }
      ]
    }
  ]
}
The script that performs the rsync operation, sync.sh, will look like Listing 3. Your script might be more complex (e.g., by adding some sort of logging or notification via email). Please note that the environment variable WATCHMAN_ROOT is set by Watchman itself and contains the "root" (i.e., the watched directory). Make sync.sh executable by entering:
chmod +x /usr/local/sbin/sync.sh
You also need to set up SSH key authentication so that rsync can run over SSH without prompting for a password. If you don't know how to do this, you'll find several articles on the Internet (e.g., at the TeachMeJoomla site [3]) describing this procedure.
Listing 3: sync.sh
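#!/bin/bash

# Host that receives the replica
twinserver="remoteserver.your.domain"

# Mirror the watched root to the same path on the twin server;
# quoting keeps paths with spaces intact
rsync -avp --delete -e "ssh -i ~/.ssh/id_rsa_rsync" \
   "${WATCHMAN_ROOT}/" "${twinserver}:${WATCHMAN_ROOT}/"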
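As one possible take on the "more complex" script mentioned above, the following sketch adds simple logging around the rsync call. The logfile location, the SYNC_LOG variable, and the function names log_msg and sync_root are assumptions made for illustration; the rsync options are the ones from Listing 3.

```shell
#!/bin/bash
# Sketch of a sync.sh variant with logging. The log path, the SYNC_LOG
# variable, and the function names are illustrative assumptions.

twinserver="remoteserver.your.domain"
logfile="${SYNC_LOG:-/tmp/watchman-sync.log}"

# Append a timestamped message to the logfile.
log_msg() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
}

# Mirror a root to the same path on the twin server and log the outcome.
sync_root() {
  local root="$1"
  log_msg "sync start: $root"
  if rsync -avp --delete -e "ssh -i ~/.ssh/id_rsa_rsync" \
       "${root}/" "${twinserver}:${root}/" >> "$logfile" 2>&1; then
    log_msg "sync ok: $root"
  else
    log_msg "sync FAILED: $root"
    return 1
  fi
}

# Watchman sets WATCHMAN_ROOT when it runs a trigger; do nothing otherwise.
if [ -n "${WATCHMAN_ROOT:-}" ]; then
  sync_root "$WATCHMAN_ROOT"
fi
```

Returning a non-zero status on failure leaves room for a notification step (e.g., a mailx call) to be added later without changing the rest of the script.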
Watching Multiple Directories
If you need to monitor more than one directory, you can reuse the same script, because the aforementioned WATCHMAN_ROOT variable contains the correct path. To do so, you need to run the commands as follows:
watchman watch /opt/repos2
watchman -- trigger /opt/repos2 'repos2-sync' \
         -- /usr/local/sbin/sync.sh
Now run the commands
watchman watch-list
watchman trigger-list <root>
to see the list of watched directories (without going to the .state file) and the list of triggers.
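When the number of roots grows, the watch and trigger commands shown above can be wrapped in a loop. The following is only a sketch: The list of directories and the helper trigger_name_for, which derives names like repos2-sync from the directory name, are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: register the same sync script for several roots.
# The directory list and the naming helper are illustrative assumptions.

roots=(/opt/repos /opt/repos2)
sync_script=/usr/local/sbin/sync.sh

# Derive a trigger name such as "repos2-sync" from a root path.
trigger_name_for() {
  printf '%s-sync\n' "$(basename "$1")"
}

for root in "${roots[@]}"; do
  name="$(trigger_name_for "$root")"
  if command -v watchman >/dev/null 2>&1; then
    watchman watch "$root"
    watchman -- trigger "$root" "$name" -- "$sync_script"
  else
    # Without the client installed, just show what would be run.
    echo "watchman watch $root"
    echo "watchman -- trigger $root $name -- $sync_script"
  fi
done
```

Because sync.sh reads the root from WATCHMAN_ROOT, nothing in the script itself has to change as directories are added to the list.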
Excluding Files
When you define a trigger, you can specify files to exclude; that is, you can prevent the trigger from executing if a file name matches a particular pattern. Note that rsync does not take such an exclusion into account: Once the trigger fires, the script still copies the whole directory. The following command can be performed at run time, and the configuration will be updated accordingly:
watchman -- trigger /opt/repos 'repos-sync' \
             -X '*.css' \
             -I '*' -- /usr/local/sbin/sync.sh
The -X option excludes a pattern, and -I includes a pattern. For a full accounting of pattern syntax, please refer to the Watchman website [4].
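Because the trigger patterns only decide when the script runs, rsync would still copy files matching the excluded pattern the next time it is invoked. One way to keep both sides consistent, sketched here under the assumption that the same glob should also be skipped during the copy, is rsync's --exclude option. The EXCLUDES array and the build_rsync_cmd helper are hypothetical names.

```shell
#!/bin/bash
# Sketch: mirror the trigger's exclusion pattern in the rsync call.
# EXCLUDES and build_rsync_cmd are illustrative names.

twinserver="remoteserver.your.domain"
EXCLUDES=('*.css')   # keep in step with the -X patterns of the trigger

# Print the rsync command line for a root, one argument per line.
# Assumes no argument contains a newline.
build_rsync_cmd() {
  local root="$1" pattern
  local -a cmd=(rsync -a --delete)
  for pattern in "${EXCLUDES[@]}"; do
    cmd+=("--exclude=${pattern}")
  done
  cmd+=(-e "ssh -i ~/.ssh/id_rsa_rsync" "${root}/" "${twinserver}:${root}/")
  printf '%s\n' "${cmd[@]}"
}

# Run the command only when invoked by Watchman (WATCHMAN_ROOT is set).
if [ -n "${WATCHMAN_ROOT:-}" ]; then
  mapfile -t rsync_cmd < <(build_rsync_cmd "$WATCHMAN_ROOT")
  "${rsync_cmd[@]}"
fi
```

Building the command as an array keeps the glob from being expanded by the shell before rsync sees it.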
When a new file or any change inside the watched directory is detected, Watchman waits for the filesystem (the directory) to be idle for a time before dispatching triggers. By default, this time is 20 milliseconds, and this option is named "settle". To increase the amount of time, you must set the "settle" option in the global configuration file – /etc/watchman.json by default. This file uses JSON syntax, as well:
{
   "settle": 60
}
The settle option is useful if you want to prevent the script from executing before a large file write has completed.
Real-Time Virus Scanner
Building a real-time virus scanner around the free antivirus ClamAV [5] or the free anti-malware tool Linux Malware Detect (a.k.a. maldet) [6] is another interesting use case for Watchman. The maldet tool uses the Linux inotify facility to perform real-time scanning; however, Solaris/Illumos do not have inotify, so you cannot use this functionality on those operating systems. In this case, Watchman is a good solution.
For example, suppose you want to scan files for viruses with the clamscan utility. Whenever a file is added, changed, or deleted, the Watchman service will trigger the scanner.sh script (Listing 4), which invokes the virus scanner on each file name that Watchman passes as an argument (available in the script as "$@").
Listing 4: scanner.sh
#!/bin/bash

# Scan every file name that Watchman appends as an argument;
# quoting "$@" keeps names with spaces intact
for file in "$@"
do
  clamscan --stdout --no-summary --infected "${WATCHMAN_ROOT}/${file}"
  EL=$?
  if [ $EL -eq 1 ]
  then
    # Virus found
    echo "Virus found: ${WATCHMAN_ROOT}/${file}" | \
         mailx -s "Virus found" mail@example.com
  elif [ $EL -eq 2 ]
  then
    # Clamscan error
    echo "Scan error: ${WATCHMAN_ROOT}/${file}" | \
         mailx -s "Scan error" mail@example.com
  fi
  # Else no virus found
done
However, when you delete a file, Watchman still kicks the trigger, and the name of the deleted file is passed to your script, even though you don't need to scan a non-existent file; therefore, you need to use extended trigger syntax (Listing 5). The key options are "exists" and "append_files": true. The append_files entry enables the arguments – that is, the list of new or changed files – to be passed to the script, and "exists" tells Watchman to trigger the action only if the file exists.
Now, if you execute
watchman watch /opt/share
Watchman passes a list of changed, but not deleted, files to the script, using the relative path to "root" (/opt/share). Of course, you could also use clamdscan, if the clamd service is up and running, or the aforementioned maldet tool on Linux.
Listing 5: Extended Trigger Syntax
watchman -j <<-EOT
["trigger", "/opt/share", {
  "name": "scanner",
  "expression": ["allof", ["match", "*"], ["exists"]],
  "command": ["/usr/local/sbin/scanner.sh"],
  "append_files": true
}]
EOT
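Even with the "exists" term in place, a file could still disappear between the moment the trigger is dispatched and the moment the scanner reaches it. As a belt-and-braces measure, the script itself can skip missing paths. The following is a minimal sketch; existing_files is a hypothetical helper, not part of Watchman or ClamAV.

```shell
#!/bin/bash
# Sketch: skip paths that have disappeared by the time the script runs.
# existing_files is a hypothetical helper, not part of Watchman or ClamAV.

# Print each argument that is still a regular file under the given root.
existing_files() {
  local root="$1" file
  shift
  for file in "$@"; do
    if [ -f "${root}/${file}" ]; then
      printf '%s\n' "${root}/${file}"
    fi
  done
}

# Watchman sets WATCHMAN_ROOT and appends the changed file names as arguments.
if [ -n "${WATCHMAN_ROOT:-}" ]; then
  existing_files "$WATCHMAN_ROOT" "$@" | while IFS= read -r path; do
    clamscan --stdout --no-summary --infected "$path"
  done
fi
```

The exit status handling and mail notification from Listing 4 could be added inside the loop in the same way.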
Shutdown and Reboot
The correct way to shut down Watchman is:
watchman shutdown-server
Remember that if you reboot your server, you need a way to start Watchman automatically at bootup. For SmartOS [7], you can import a sample SMF manifest from the watchman-pkgsrc GitHub site [8] (check the paths). For CentOS 7 with the systemd suite, you can create the file /etc/systemd/system/watchman.service containing something like Listing 6. Finally, you need to issue
systemctl enable watchman
systemctl start watchman
to enable Watchman on bootup and start the service.
Listing 6: watchman.service
[Unit]
Description=Watchman
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/bin/watchman --log-level 1 watch-list -f
ExecStop=/usr/local/bin/watchman shutdown-server

[Install]
WantedBy=multi-user.target
Conclusion
Watchman is a flexible tool under active development. Even if its usage seems unfriendly, learning some basic functionality can help you accomplish complex tasks with your own scripts. Here, I looked at two use cases: triggering rsync and a real-time virus scanner. Instead of triggering an action, you could also simply query Watchman for changes that happen inside a directory during a given period. The uses for Watchman are numerous: No doubt a number of applications have occurred to you already.
