Graph database

In one of the side projects I am involved in during my free time, which helps me learn a bunch of things outside my normal workflow, I faced the problem of working with some big sets of data (millions of rows per table, multiple tables, a bunch of relations). The old ways of improving performance, like tuning DB engines (MySQL, MariaDB), optimising code to use low-level queries, multi-row inserts, tweaks to data models, etc., didn’t give the desired results, so I went googling and discovered graph databases. Nothing new in general, as graph theory is well known, but the use case is pretty interesting.

Not that I am already deep into it, but it feels like I will spend some time looking into the technology. First I got a hint about Neo4j somewhere on StackOverflow, but I didn’t like something about it and went on googling the subject further. I ended up at Top 15 Free Graph Databases, where I first stopped on OrientDB, installed it for a test and played around. While it looks very promising, I have a couple of issues with it:

  • Java: it is a personal issue of mine, as I am not in love with Java at all. Installing a JDK on the server to run a DB is something I would do only if it is absolutely required for my complete happiness. Not that I had any issues during testing, but somehow I still don’t trust it.
  • Poor documentation: there is pretty extensive documentation on their web site, but it is a bit hard to navigate and find things, and when you ask Google for what you need, you mostly end up on a 404, so either Google has links to old versions that are no longer on the site or something else weird is going on.
  • Driver interfaces (outside of the Java world) are badly documented and/or badly implemented, at least for PHP. Both the official PHPOrient and the Doctrine ODM show only small snippets of usage with no clear overview of what is possible (apart from basic things).

While there are cons above, there are obviously some pros, like almost native SQL, easy install (even involving Java), nice tool-set, etc.

After reviewing the list of 15 databases, the second choice was ArangoDB, which:

  • Is written in C++
  • Has very good and solid documentation with lots of examples (even comparisons for people who come from a traditional SQL background)
  • Has lots of pre-built packages for different operating systems and a YUM repo for RedHat followers
  • Offers a convincing benchmark comparison of different DB engines and scenarios (I am not going to vouch for its truth, as benchmarks are always tricky, but who doesn’t like graphs?)

I still need to get my hands on it, but I think this will be a nice journey.

If you are into the topic, please leave your thoughts and ideas in the comments or send them to me via any other channel to save time and effort :-)

SSH via bastion host with ForwardAgent

While it is pretty common to have infrastructure behind load balancers and bastion hosts, there is still a lot of confusion around the actual configuration of the SSH client for fast and convenient use of such a setup. I am not going to talk about the actual advantages of bastion hosts here, but I will put down some clarifications on the SSH client setup.

Assuming you have a bastion.host that you use as a connection gateway to your private.host and you want to work with your default SSH key that is only on your local PC/laptop, you have two possible ways.

The first and most commonly used one is SSH agent forwarding, meaning you have to run ssh-agent on your laptop, add the SSH keys to it via the ssh-add command (use ssh-add -L to list all keys in the agent) and then use ForwardAgent yes in ~/.ssh/config, something like this:

Host bastion.host
    User ssh_user
    HostName bastion.host
    ProxyCommand none
    IdentityFile ~/.ssh/id_rsa
    PasswordAuthentication no
    ForwardAgent yes
Host private.host
    User ssh_user
    ProxyCommand ssh -q -A bastion.host -W %h:%p
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent yes

And while this is all cool from one point of view, this method has a few drawbacks:

  • Running ForwardAgent is not a good idea in terms of security (anyone with root access on the bastion host can use your forwarded agent to authenticate as you while you are connected), and you can read more about it here.
  • Running ForwardAgent requires you to actually configure and run ssh-agent on your local PC/laptop, which is not a big deal at all, but you will have to remember and check it all the time, or you will get all kinds of authentication errors and spend some time finding out the reason for them (an agent that is not running or is misconfigured).

The second method to achieve the same functionality in terms of the bastion host, while avoiding messing around with ssh-agent, is to use the SSH ProxyCommand. In this scenario, when configured properly, ssh will first run the ProxyCommand to establish the connection to bastion.host and then open a tunnel via this connection to private.host. This way bastion.host will know nothing about your keys or anything related to authentication, but will just make a tunnel (similar to SSH port forwarding) and keep it for you until you are done.

To get this to work, you would adjust the ~/.ssh/config as follows:

Host bastion.host
    User ssh_user
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent no
Host private.host
    User ssh_user
    ProxyCommand ssh -W %h:%p -q bastion.host
    IdentityFile ~/.ssh/id_rsa
    ForwardAgent no

So now that you have everything in place and configured, you can ssh private.host and enjoy the stay on your secure server. While this is all cool, it has a lot of defaults and assumptions behind the scenes that you don’t bother to learn until you face slightly different requirements. Assume that you need to keep the SSH configuration and per-host keys not in your home .ssh directory, but somewhere else. Let’s say you have some /very/secure/location with a separate ssh.conf (with the content from above) plus a bastion.id_rsa and a private.id_rsa to use for the connections. To make them work, you would assume that you only need to adjust the IdentityFile settings to point to the correct keys and then run your SSH as follows: ssh -F /very/secure/location/ssh.conf private.host

Bad news – it will not work and will give you an authentication error. Though you will still be able to access bastion.host with the above, you won’t be able to reach your final destination at private.host.

Good news – thanks to this lovely discussion on Stack Overflow, only a minor adjustment has to be made to your ProxyCommand: you need to pass the ssh config file to it as well, so it will now look like: ProxyCommand ssh -F /very/secure/location/ssh.conf -W %h:%p -q bastion.host

The reason is that by giving -F to the initial ssh command you instruct it to use a specific configuration file, but when it runs the ProxyCommand, that second instance of the ssh client has no clue whatsoever about your custom config and looks for the default one in ~/.ssh/config and the system-wide settings.

I spent quite some time before I figured out what was going on, so in order not to do that again, and hopefully to save some of your time, let this post be here for future reference.

mount and systemd

I had a task: double the size of a volume on an Amazon AWS EC2 instance. The process is still manual and is roughly as follows:

  • Create a new volume on AWS with double the size of the old one
  • Attach it to the instance
  • Create partition and filesystem on the new volume
  • Mount the new volume somewhere next to the old volume mount point
  • Rsync data from the old volume to the new volume
  • Adjust /etc/fstab to point to the new volume for the corresponding mount point
  • Unmount both volumes
  • Mount the new volume to the old mount point
  • Detach the old volume from the instance
  • Delete the old volume

All pretty simple and straightforward. BUT! The new volume is not mounting to the old mount point, while the mount command is silent about it!!!

Syslog gives a hint:

systemd: Unit var-whatever.mount is bound to inactive unit dev-xvdg1.device. Stopping, too.
lb1 systemd: Unmounting /var/whatever...
lb1 systemd: Unmounted /var/whatever.

That’s interesting. Why is it still bound to an inactive device (which I have already detached) and how can I unbind it?

Apparently all records in /etc/fstab are converted to systemd units and all mounting is (these ugly days) done via systemd. So when I changed /etc/fstab, systemd didn’t update the related unit and was still trying to mount the old device. To fix the problem, you need to make systemd regenerate its units from /etc/fstab by running:

systemctl daemon-reload

I am too old for this shit… Why are simple things getting more and more complicated (firewalld? ;-))

Robo.li: reducing code base and other tricks

Last time I showed some basic tricks on how to use Robo.li with ease, and that was a big post (compared to my other ones), but it still didn’t cover some of the very basic things that can save a lot of time and effort.

Abstract Base Task

To avoid doing the same thing over and over again, let’s start with the abstract base task that extends \Robo\Task\BaseTask. In the example from last time you saw that my tasks were extending Foo\Robo\AbstractTask, so let’s see the three main things that are done there:

<?php

namespace Foo\Robo;

use Robo\Common\ConfigAwareTrait;
use Robo\Task\BaseTask;
use Robo\Result;

/**
 * Foo base task.
 */
abstract class AbstractTask extends BaseTask
{
    use ConfigAwareTrait;

    /**
     * @var array $data Task data fields
     */
    protected $data = [];

    /**
     * @var array $requiredData List of required data fields keys
     */
    protected $requiredData = [];

    /**
     * @var string $configPrefix Config path prefix
     */
    protected static $configPrefix = "task.";

    /**
     * @var string $configClassRegexPattern Regex to extract class name for config
     */
    protected static $configClassRegexPattern = "/^.*Tasks?\.(.*)\.[^\.]+$/";

    /**
     * @var string $configClassRegexReplacement Regex match to use as extracted class name for config
     */
    protected static $configClassRegexReplacement = '${1}';

    public function __construct($params)
    {
    }

    /**
     * {@inheritdoc}
     */
    public function run()
    {
        // for any key defined in data
        // set it to config value, if available
        foreach ($this->data as $k => $v) {
            $default = $this->getConfigValue($k);

            // specifically check for null to avoid problems with false and 0
            // being overwritten
            if ($this->data[$k] === null && $default !== null) {
                $this->data[$k] = $default;
                continue;
            }
            // if key value is an array, merge the config value to it
            if (is_array($this->data[$k]) && is_array($default)) {
                $this->data[$k] = array_merge_recursive($this->data[$k], $default);
            }
        }

        // check if we have all required data fields
        $res = $this->checkRequiredData();
        if (!$res->wasSuccessful()) {
            return $res;
        }

        // generic success, to be overridden by child classes
        return Result::success($this, "Task completed successfully", $this->data);
    }


    /**
     * Magic setters via __call
     * Make sure only valid data passes through
     *
     * @param string $name data key name
     * @param mixed $value data value name
     */
    public function __call($name, $value)
    {
        // we use snake_case field keys
        // but camelCase setters
        $name = $this->decamelize($name);

        // only set values for predefined data keys
        if (array_key_exists($name, $this->data)) {
            $this->data[$name] = $value[0];
        }

        return $this;
    }

    /**
     * Check that all required data present
     *
     * @return \Robo\Result
     */
    protected function checkRequiredData()
    {
        $missing = [];
        foreach ($this->requiredData as $key) {
            // empty() also covers keys that are not set at all
            if (empty($this->data[$key])) {
                $missing[] = $key;
            }
        }

        return count($missing)
            ? Result::error($this, "Missing required data field(s) [" . implode(",", array_map([$this,"camelize"], $missing)) . "].", $this->data)
            : Result::success($this, "All required data fields present", $this->data);
    }


    /**
     * Ported from Ruby's String#decamelize
     *
     * @param string $word String to convert
     * @return string
     */
    protected function decamelize($word)
    {
        return preg_replace_callback(
            '/(^|[a-z])([A-Z])/',
            function ($matches) {
                return strtolower(strlen($matches[1]) ? $matches[1] . "_" . $matches[2] : $matches[2]);
            },
            $word
        );
    }

    /**
     * Ported from Ruby's String#camelize
     * (converts snake_case to camelCase)
     *
     * @param string $word String to convert
     * @return string
     */
    protected function camelize($word)
    {
        // turn every "_x" into "X"
        return preg_replace_callback(
            '/_([a-z0-9])/',
            function ($matches) {
                return strtoupper($matches[1]);
            },
            $word
        );
    }

    /**
     * Override of Robo\Common\ConfigAwareTrait configPrefix()
     */
    protected static function configPrefix()
    {
        return static::$configPrefix;
    }

    /**
     * Override of Robo\Common\ConfigAwareTrait configClassIdentifier($classname)
     */
    protected static function configClassIdentifier($classname)
    {
        return preg_replace(
            static::$configClassRegexPattern,
            static::$configClassRegexReplacement,
            str_replace(
                '\\',
                '.',
                $classname
            )
        );
    }

    /**
     * Override of Robo\Common\ConfigAwareTrait getClassKey()
     *
     * makes method protected instead of private
     */
    protected static function getClassKey($key)
    {
        return sprintf(
            "%s%s.%s", 
            static::configPrefix(),
            static::configClassIdentifier(get_called_class()),
            $key
        );
    }

    /**
     * A quick fix on printInfo, as it is not very friendly
     * when you use 'name' placeholders or even just have 'name'
     * set in the data
     */
    protected function printInfo($msg, $data = null)
    {
        // pass-through when no 'name' found in data
        if ($data == null || !isset($data['name'])) {
            return $this->printTaskInfo($msg, $data);
        }

        // doubt someone will use this ever in data
        $key = 'print_task_info_name_replacement_macro';

        // replace 'name' with above key both in data
        // and in msg placeholders
        $data[$key] = $data['name'];
        unset($data['name']);
        $msg = str_replace('{name}','{' . $key . '}', $msg);

        // print nice message
        $result = $this->printTaskInfo($msg, $data);

        return $result;
    }
}

Configuration

The first thing to note is that we use Robo’s ConfigAwareTrait, which gives all our tasks access to the configuration (covered in the previous post about Robo). This is very handy, as we don’t need to hard-code anything, and it enables a couple of interesting tricks, but more on that a bit later. Obviously we had to override a couple of methods from ConfigAwareTrait to make things work for us (and it actually took me some time to figure out how all of this works in Robo).

Dynamic Data

The second trick here is to avoid using plain class properties and ending up writing a hell of a lot of getters/setters/validators/etc. Instead we use the $data array property with a magic __call method that does the job for us. The only restriction we put in here is that __call will set data only if the key is already defined in the array. This restriction is there to avoid rubbish in our data, since we pass the data around as a whole set in many places and need to be sure we know what’s in there. Another sub-trick here is that we use the $requiredData property with a list of keys that are essential to us on run. While this is not something very tricky, it really helps in the child classes’ run() method when called like

public function run()
{
    $result = parent::run();
    if (!$result->wasSuccessful()) {
        return $result;
    }

    ...
}

So all base validation is done in parent class as long as you have your $data and $requiredData populated.

Please note that we also have two utility methods, camelize() and decamelize(), which are handy because the magic setters are in camelCase while the actual data keys are in snake_case; these two methods convert between the two.
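
To illustrate the magic setters in isolation, here is a minimal sketch that does not depend on Robo at all. The MagicData class, its two data keys and the getData() helper are made up for the example and are not part of the base task above; the key conversion is also a simplified one-liner rather than the decamelize() method.

```php
<?php

// Hypothetical stand-alone class (not part of Robo) that mimics the
// $data + __call approach described above.
class MagicData
{
    /** @var array Known data keys; only these can be set */
    protected $data = ['first_name' => null, 'age' => null];

    public function __call($name, $args)
    {
        // camelCase setter name -> snake_case data key
        $key = strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $name));

        // silently ignore unknown keys to keep rubbish out of $data
        if (array_key_exists($key, $this->data)) {
            $this->data[$key] = $args[0] ?? null;
        }

        return $this; // allow chaining
    }

    public function getData()
    {
        return $this->data;
    }
}

$task = (new MagicData())
    ->firstName('Some Guy')
    ->age(42)
    ->favouriteColour('green'); // not a predefined key, ignored

var_export($task->getData());
```

Only first_name and age end up in the data; the favouriteColour() call is accepted but changes nothing, which is exactly the restriction described above.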

Dynamic Data Configuration

The third trick, which actually happens even before the second one, is that since we have access to the configuration, whenever something is not set in $data during the run but we have it in the corresponding config path, we apply it to the data. Be careful, though, to apply only those configuration variables that are present as keys in our $data. This is very handy and very flexible. First of all, many of the defaults are defined in the config and are easy to change. The second advantage is that we can have other custom things in the config, used by any other custom command or component, that will not conflict with our tasks, for example the configuration of an API with keys/secrets. Moreover, since tasks can also write to the config, tasks can share things between them if required (for example some runtime cache data).
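
The merge rules can be sketched as a plain function, without Robo. This is only an illustration under assumptions: applyConfigDefaults() and the sample keys are hypothetical, and the config is passed in as a flat array instead of being looked up through the trait.

```php
<?php

/**
 * Hypothetical helper: fill unset (null) data keys from config defaults
 * and merge array values; config keys unknown to $data are ignored.
 */
function applyConfigDefaults(array $data, array $config)
{
    foreach ($data as $key => $value) {
        if (!array_key_exists($key, $config)) {
            continue;
        }
        $default = $config[$key];

        if ($value === null) {
            // only null counts as "not set", so false and 0 survive
            $data[$key] = $default;
        } elseif (is_array($value) && is_array($default)) {
            $data[$key] = array_merge_recursive($value, $default);
        }
    }

    return $data;
}

$data = ['host' => null, 'port' => 0, 'tags' => ['web']];
$config = [
    'host' => 'localhost',
    'port' => 8080,
    'tags' => ['default'],
    'api_secret' => 'xyz', // belongs to some other component
];

$result = applyConfigDefaults($data, $config);
var_export($result);
```

Here host is taken from the config, port stays 0 because it was explicitly set, tags are merged, and api_secret never reaches the task data.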

Task Information Output

Another handy feature of Robo tasks is the ability to inform about task activity via

$this->printTaskInfo("some message with {some_var} variable", ['some_var' => $this->some_var]);

This is a very cool feature and it looks awesome in the console, as some_var will be highlighted and so on. The above example is exactly in the format I found in the Robo.li documentation, but while using it, I found two problems:

Coming up with different {some_var} macros and using the second argument as above is just ugly and unproductive. That was one of the things that pushed me towards using the $this->data array instead, so in my case, when I need to print something, I just do this:

$this->printTaskInfo("This guy's name is {name}", $this->data);

Assuming we have $this->data['name'], of course. The trick with the data property saved my day, but later on I found one drawback: the output has a defined format, and if the above ran from \Foo\Robo\Task\Guy\PrintName, the output I would expect is:

[\Foo\Robo\Task\Guy\PrintName] This guy's name is Some Guy

But instead I was getting

[Some Guy] This guy's name is Some Guy

This is due to the way Robo loggers try to find their context and finally end up using the supplied name value as their own name. To fix the problem, I simply wrote a printInfo() method that replaces name with something else while printing.

Task Return Value

Make sure your task always returns a \Robo\Result instance, as it will make your life much easier when you use your tasks in commands. Make sure you always specify a reason/message that clearly describes why you return this type of result, both for success and for errors. Finally, make sure you return some kind of supporting data (passed as the third argument to Result::success() and Result::error()) to make the commands’ life even easier. Normally I would put $this->data there in most cases, for example:

public function run()
{
    try {
        $res = $this->tryToValidateDataExampleMethod();
        if (!$res) {
            return Result::error($this, "Failed to validate data", $this->data);
        }
        return Result::success($this, "Data successfully validated", $this->data);
    } catch (\Exception $e) {
        return Result::fromException($this, $e);
    }
}

There are a few interesting things you can do with the data from the commands, but I will probably cover those in later posts.

Asterisk: Initiate a call from extension by PHP script

If you have an Asterisk PBX and some kind of internal web-based system, like an intranet or a CRM, with contacts that your team needs to call from time to time, there is an easy way to let users quickly dial those contacts: link a telephone number on a web page to a script that will call Asterisk and instruct it to dial the user’s extension first and then connect them to the contact’s number.

The example code, that was originally taken from here and slightly modified looks like this:

# ip address that asterisk is on.
$strHost = "127.0.0.1"; 

# asterisk manager username and password
$strUser = "admin";
$strSecret = "secret_password"; 

# specify the channel (extension) you want to receive the call requests with
# e.g. SIP/XXX, IAX2/XXXX, ZAP/XXXX, etc
$strChannel = $_REQUEST['exten'];
$strContext = "from-internal";

$number = strtolower($_REQUEST['number']);
$strCallerId = $number;

# specify the amount of time (in seconds) you want to try calling the specified channel before hanging up
$strWaitTime = "30";

#specify the priority you wish to place on making this call
$strPriority = "1";

# validation
$valNumber = '/^\d+$/';
$valExt = '/^(SIP|IAX2|ZAP)\/\d+$/';

if (!preg_match($valNumber, $number)) {
    print "The number is incorrect, should match '$valNumber' pattern\n";
    exit();
}
if (!preg_match($valExt, $strChannel)) {
    print "The extension is incorrect, should match '$valExt' pattern\n";
    exit;
}

$errno = 0;
$errstr = '';
$oSocket = fsockopen ($strHost, 5038, $errno, $errstr, 20);

if (!$oSocket) {
    echo "$errstr ($errno)<br>\n";
    exit();
}

fputs($oSocket, "Action: login\r\n");
fputs($oSocket, "Events: off\r\n");
fputs($oSocket, "Username: $strUser\r\n");
fputs($oSocket, "Secret: $strSecret\r\n\r\n");
fputs($oSocket, "Action: originate\r\n");
fputs($oSocket, "Channel: $strChannel\r\n");
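# the manager Originate action takes "Timeout" in milliseconds ("WaitTime" is the call-file keyword)
fputs($oSocket, "Timeout: " . ((int) $strWaitTime * 1000) . "\r\n");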
fputs($oSocket, "CallerId: $strCallerId\r\n");
fputs($oSocket, "Exten: $number\r\n");
fputs($oSocket, "Context: $strContext\r\n");
fputs($oSocket, "Priority: $strPriority\r\n\r\n");
fputs($oSocket, "Action: Logoff\r\n\r\n");
sleep(2);
fclose($oSocket);

echo "Extension $strChannel should be calling $number." ;

So, for example, if you put this code on the Asterisk server in the web root as call.php and then call http://<your_asterisk_ip_address>/call.php?exten=SIP/737&number=77777777, Asterisk will attempt to connect extension 737 with the number 77777777.
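
On the intranet/CRM side, the link itself could be generated roughly like this. It is only a sketch: clickToCallUrl() and the pbx.example host are made-up names for illustration, so substitute the address that actually serves your call.php.

```php
<?php

// Hypothetical helper that builds the click-to-call URL for call.php.
// $baseUrl is an assumption; use whatever address serves your script.
function clickToCallUrl($baseUrl, $channel, $number)
{
    // http_build_query() takes care of URL-encoding the values
    return $baseUrl . '?' . http_build_query([
        'exten' => $channel,
        'number' => $number,
    ]);
}

$url = clickToCallUrl('http://pbx.example/call.php', 'SIP/737', '77777777');

// escape the URL for use inside an HTML attribute
echo '<a href="' . htmlspecialchars($url) . '">Call 77777777</a>', "\n";
```

The slash in SIP/737 gets encoded as %2F in the generated URL; PHP decodes it back before it reaches $_REQUEST, so the validation in call.php still sees SIP/737.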