TCP fine-tuning and its consequences

In the constant race to optimize resources, all of us look for ways to fine-tune different aspects of our systems to squeeze out better server performance. While some people are satisfied with minor adjustments in their applications’ configuration, more advanced folks go as deep as possible and alter low-level kernel flags to try and get the best of the best. That’s all kinda cool and fun, but as it is impossible to know everything about everything, we often ask Google for advice, and that’s where some problems begin. There are a bunch of howtos out there on the internet with different recipes for improving performance, but most of them don’t go deep enough to show the possible drawbacks of those recipes. As sysadmins are lazy people, RTFM, or better to say RTF-RFC, and digging into all the internals is not in our habit, at least until something breaks. And that’s what I had to face yesterday.

My brother called me for help with a problem he had been trying to figure out for some time already, but it seemed that nothing could save him. The problem itself was pretty tricky and is well described in Leonid’s blog post, so here I will focus on how it was tracked down and fixed:

The troubleshooting started with simple things and went all the way down to tcpdump. Given that his office server could successfully communicate with the Amazon server while none of the desktops/laptops behind the office server could, the fun began.

– traceroute shows that we have proper connectivity and routing in place
– tcptraceroute shows problems, and netcat confirms them with connection timeouts
– iptables rules look fine on all the parties
– logs on the Amazon server do not show anything useful

My first thought was: WTF! And when we have a WTF related to connectivity, we use tcpdump.
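
Something along these lines, captured on each hop (the interface name here is illustrative; the filter matches the SSH traffic from the quotes below):

tcpdump -v -nn -i eth0 'tcp port 22 and host yyy.yyy.yyy.yyy'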

The tcpdump shows that:
– the initial TCP SYN packet successfully leaves the office laptop
– the same packet successfully passes the firewall, coming in on the LAN interface and leaving via the WAN interface
– the TCP SYN packet successfully arrives at the destination Amazon server, but
– there is no TCP SYN-ACK reply ever leaving the Amazon server towards the office

When we leave the office laptop alone and try the same thing from the office server, the TCP SYN successfully reaches the Amazon server and we have a TCP SYN-ACK, as well as all the following TCP packets, successfully traveling between the communicating nodes.

After all of the above info was gathered, the problem was localized to the Amazon server, and the question was as simple as: why is the Amazon server not replying with TCP SYN-ACK to the office laptop, while it does reply with TCP SYN-ACK to everyone else? That was the point where my knowledge of TCP internals was exhausted and I turned to Google for a solution. As always, there are a bunch of articles out there, all with different ideas and very limited low-level explanations, so I came back to tcpdump on the Amazon server and started the game of “find 3 differences between the two TCP SYN packets”, one arriving from the office laptop and one from the office server. The only two differences I managed to see were:
– TCP window size of the packet from the laptop was way bigger (29200) than from the office server (5840)
– Timestamp value of the packet from the laptop was way smaller (64393040) than from the office server (809044567)

A quote from tcpdump, the first packet from the laptop, the second from the office server:

15:53:00.755020 IP (tos 0x0, ttl 50, id 55870, offset 0, flags [DF], proto TCP (6), length 60)
    xxx.xxx.xxx.xxx.55470 > yyy.yyy.yyy.yyy.22: Flags [S], cksum 0x8cb3 (correct), seq 3904091306, win 29200, options [mss 1460,sackOK,TS val 64393040 ecr 0,nop,wscale 8], length 0
15:53:00.755071 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    zzz.zzz.zzz.zzz.43952 > yyy.yyy.yyy.yyy.22: Flags [S], cksum 0xcfbf (correct), seq 1790824553, win 5840, options [mss 1460,sackOK,TS val 809044567 ecr 0,nop,wscale 8], length 0

With the above two facts I started investigating the TCP window size. I remembered that this metric can be dynamic and the difference is possible, but I thought it was more likely the problem than the timestamp, which is obviously different all the time, and who cares about the timestamp anyway? Google showed me a number of options to try with regards to sysctl, including but not limited to disabling TCP window scaling, adjusting the different buffers of the OS TCP stack and so on, which I tried to apply everywhere, including the Amazon server, the office server and the office laptop, all with no success. Finally, some post found via Google (I lost the original link) said that setting net.ipv4.tcp_tw_recycle to 0 solved the problem. Having no other alternatives, I applied the setting on the Amazon server and everything came back to normal: now everyone could connect to the server and all was working as it was supposed to.
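
For reference, this is how the flag can be checked and flipped at runtime, plus the persistent entry in /etc/sysctl.conf mentioned below:

# check the current value (1 means recycling is enabled)
sysctl net.ipv4.tcp_tw_recycle
# disable it at runtime
sysctl -w net.ipv4.tcp_tw_recycle=0
# and persist it across reboots
echo "net.ipv4.tcp_tw_recycle = 0" >> /etc/sysctl.conf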

Since the problem was gone, I reported to my brother that he could continue with his other tasks, one problem less, made sure that the flag is set permanently in /etc/sysctl.conf, and realized that now I need to learn more about TCP internals. Fortunately, there is an amazing article by Vincent Bernat, “Coping with the TCP TIME-WAIT state on busy Linux servers”, that dives into how the whole thing works, why we should not mess with TCP TIME-WAIT, and why, at the end of the day, changing this flag will not give you any visible advantage. In short, with tcp_tw_recycle enabled, the server starts dropping SYN packets from any host whose TCP timestamp is lower than the last one seen from the same IP address, which is exactly what happens to machines sharing one public address behind NAT, like our office laptops.

To sum up the above: before you change any kernel flags, make sure you really understand what you are doing. Before applying any configuration change proposed by some online howto, make sure you know exactly what it does and don’t trust anyone blindly. Finally, learn to troubleshoot with low-level tools that will help you spot the problem, or at least show the direction for further troubleshooting.

P.S.: Leonid, thanks for fun experience and something new! Was fun!

JavaScript, Node.js, Meteor, MongoDB and related

For the past few weeks I’ve been playing with the Meteor reactive framework, which heavily utilizes Node.js and MongoDB. It’s been a while since I did anything in JavaScript, and never before have I tried anything that could be called “reactive”. While a few things are pretty weird and a lot of concepts are familiar, there were a few moments that got me stuck for a bit, and I want to post them here just to remember them in the future:

Net package for sockets from Node.js

Since my task required some plain socket communication with another service, I touched the default net package from Node.js, and while at first look it seemed pretty easy, there are a couple of problems. Some of them are well known (like binding socket callbacks to fibers with bindEnvironment) and there are lots of posts around the forums and related sites, but the one that made me go crazy was reading data from a socket line by line.

The data event of the socket fires whenever some data arrives, but that doesn’t mean you will receive it line by line. You can receive part of a line or lots of lines, and the last one is not guaranteed to be terminated by a newline. The workaround I found with the help of Google was to use a backlog: whenever you receive some data, append it to the backlog and then shift data off the backlog line by line, leaving whatever remains that is not a complete line in the backlog. The idea is clear and should work, but what happens if more data arrives on the socket and the data event fires while you are still processing the previous call of that event? The hard way, I found out that I then have multiple callbacks manipulating the same backlog, and that ends up with a lot of mess in your data. The old-school ideas I thought of were all kinds of locks, tokens and so on to prevent such behaviour, but they did not work out very well. Finally, the easiest way was to pause the socket whenever some data is received, process that data, and resume the socket when the processing is done. To make my life even easier, whenever I have a full line extracted from the socket backlog, I just emit a line event on the socket and process it in a different place. The code I ended up with is as follows:

socket.on('data', Meteor.bindEnvironment(function (data) {
    // Stop further 'data' events while we slice the backlog.
    socket.pause();
    socketBackLog += data;

    // Extract every complete line from the backlog; whatever is left
    // after the last newline stays in the backlog for the next chunk.
    var n = socketBackLog.indexOf('\n');
    while (~n) {
        socket.emit('line', socketBackLog.substring(0, n));
        socketBackLog = socketBackLog.substring(n + 1);
        n = socketBackLog.indexOf('\n');
    }
    socket.resume();
}));
socket.on('line', Meteor.bindEnvironment(function (data) {
    processData(data.toString());
}));

Note the Meteor.bindEnvironment all around the socket callbacks – this is the way to keep things in fibers; otherwise Meteor will complain and fail.

Loose data types

I know it is pretty common practice now, and many languages do not force you to cast variables or declare them with a particular type, and that is somewhat cool in most cases, but sometimes I really miss C-style strictness. This time with JavaScript it was exactly the case. Did I ask to convert 1/0 in some cases to boolean true/false or to the strings “1”/”0″ when I stated it was 0/1? Since I need an integer type, my code is full of bitwise OR operations that force JavaScript to keep variables as 32-bit integers.

An example of inserting some stuff into MongoDB while forcing an integer:

Stuff = new Mongo.Collection("stuff");
Stuff.insert({
    name: "Test",
    number_of_kids: (1 | 0)
});

Basically, I bitwise-OR my “integer” variables with 0 and use the result.
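
A couple of standalone examples of what the | 0 trick does (the values are just for illustration):

var a = "3" | 0;   // 3 – numeric string coerced to a 32-bit integer
var b = 4.7 | 0;   // 4 – fractional part truncated
var c = true | 0;  // 1 – boolean coerced to an integer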

MongoDB object relationship and related

That is still a mystery to me, and I will try to sort it out eventually. Assume I have one child collection with two different parent collections or, in a more traditional way, two OneToMany relations where the “One” side is the same. In my case, here is how it would look:

ParentsA = new Mongo.Collection("parents_a");
ParentsB = new Mongo.Collection("parents_b");
Childs = new Mongo.Collection("childs");

var parent_a = ParentsA.insert({
    name: "parent 1 type 1",
    count: (0 | 0),
    childs: []
});

var parent_b = ParentsB.insert({
    name: "parent 1 type 2",
    count: (0 | 0),
    childs: []
});

var child = Childs.insert({
    name: "child 1",
    parent_a_id: parent_a,
    parent_b_id: parent_b
});
// Bump the counter and add the child ID to the set of children.
ParentsA.update(parent_a,{$inc: {count: (1 | 0)},$addToSet: {childs: child}});
ParentsB.update(parent_b,{$inc: {count: (1 | 0)},$addToSet: {childs: child}});

In my case, if I do find/findOne on the records, both parents will have in their childs field a list of child IDs (not child objects), which I assume is normal, but a strange thing happens with the child record itself: it has parent_a_id as a plain ID for parentA, while parent_b_id holds the whole parentB object. So to find the ID of parentA I can call child.parent_a_id, but for parentB I have to call child.parent_b_id._id, and to this day I don’t know what controls this behaviour.

Another problem I faced is that, to my knowledge, there is no way to count the number of items in a parent’s childs field in a query, so I have to keep track of it with the count field. The good thing is that there are a few query modifiers in Mongo that make my life easier: as you can see above, I use the $inc modifier to adjust the count, as well as $addToSet to make sure I don’t add the same child to a parent twice.

Setting session variables on client from server

I really love all these reactive things, the way the client reacts to collection changes and session adjustments, but one thing is still not clear to me: how can I adjust a client’s session variable from the server after some event happens? A simple example:

if (Meteor.isClient) {
    Template.some_template.events({
        'click .send_data': function (event,template) {
            Meteor.call('processData',template.find('input[name="data"]').value);
        }
    });
}
if (Meteor.isServer) {
    Meteor.methods({
        'processData': function (data) {
            var socket = new IMAP({....});
            socket.once('ready', Meteor.bindEnvironment(function () {
                // here I want to set clients session "imap_ready" to true
            }));
            socket.connect();
        }
    });
}

What happens: whenever the client clicks some .send_data button, we get the data from the input and pass it to the server’s processData method, which tries to establish a connection to the IMAP server, and if all goes well I want to update the client’s “imap_ready” session variable. The problem here is that we don’t really know when (if at all) the socket connection will emit the ready event, and processData will surely have returned by that time, so using the optional callback of Meteor.call is not an option either.

For the time being I solved the problem by introducing a MongoDB collection with session_id, key and value fields. Whenever the client calls a server method of this kind, it passes session_id as an additional argument (BTW, I had to use a persistent session add-on to avoid losing session data, and a UUID add-on to generate a nice session ID), and whenever the server has something it needs to pass back, it updates the relevant Mongo document; on the client side, I observe the collection to gather all the data and put it into the session. It sounds weird and I don’t like this way, but it somehow works; a sketch of the idea is below. If anyone can suggest a better way for the above problem, feel free to comment or contact me in any other way.
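
Roughly, the moving parts look like this (a minimal sketch, not my exact code: the collection name, the notifyClient helper and the session_id handling are made up for illustration):

SessionSync = new Mongo.Collection("session_sync");

if (Meteor.isServer) {
    // Called from async callbacks (e.g. the IMAP 'ready' event) to push
    // a key/value pair to the client identified by sessionId.
    notifyClient = function (sessionId, key, value) {
        SessionSync.upsert({session_id: sessionId, key: key},
                           {$set: {value: value}});
    };
}

if (Meteor.isClient) {
    var sessionId = Session.get('session_id'); // generated once with the UUID add-on

    // Mirror every server-side change for our session into the reactive Session.
    SessionSync.find({session_id: sessionId}).observe({
        added:   function (doc) { Session.set(doc.key, doc.value); },
        changed: function (doc) { Session.set(doc.key, doc.value); }
    });
}

With that in place, the IMAP ready handler from the example above would just call notifyClient(sessionId, 'imap_ready', true).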

I think that’s enough for the time being. Maybe one day I will post a follow-up with solutions (where applicable) to the problems above.

Plain SQL vs. Zabbix API for text history items

For one of my recent tasks I had the following requirement: a number of hosts monitored with Zabbix have an item of the text history type that provides a list of addresses (one per line), and I need to take all those lists and build one common list (table) where each row holds an address from the list, the hostname of the node where this address was last seen, and timestamps.

Initially I added in Zabbix the item to monitor on each node I needed, created a separate table in MySQL to hold the final list, and then made a cron script that would do the following (an example API call is sketched after the list):

  • retrieve the list of nodes that have the given item, by item key_, with the Zabbix API
  • retrieve all the items by the itemid found in the previous query with the Zabbix API
  • for each item, retrieve the latest history entry with the Zabbix API
  • for each history entry, split the text by newline to get the addresses, then add each address with its source host and timestamps to the final MySQL table, or update the timestamp and source in case the address is already in the list
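
Each of those steps is a plain JSON-RPC call to the Zabbix API; for reference, a sketch of the item lookup (the endpoint URL, auth token and key are placeholders):

curl -s -H 'Content-Type: application/json-rpc' http://zabbix.example.com/api_jsonrpc.php -d '
{
    "jsonrpc": "2.0",
    "method": "item.get",
    "params": {
        "search": {"key_": "my_item_key"},
        "output": ["itemid", "hostid"]
    },
    "auth": "<session token from user.login>",
    "id": 1
}'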

With a total list size of about 5K addresses, all of the above was taking around 4-5 minutes and consuming a lot of CPU and memory on the server. As I was limited on server resources and wanted the list to be updated every minute, I decided to avoid the Zabbix API and try to do the job with plain MySQL queries. As I am only interested in the latest history entry for each node, I recalled my own post from a while ago on SQL GROUP BY with subqueries. After checking the Zabbix SQL structure and a bit of playing with queries, I ended up with a single request that gives me all I need:

SELECT * FROM (SELECT h.host,hi.id,hi.value\
FROM hosts AS h, items AS i, history_text AS hi\
WHERE i.hostid=h.hostid AND hi.itemid=i.itemid\
AND h.status<>3 AND i.key_ LIKE 'my_item_key%'\
AND hi.value <> '' ORDER BY hi.clock DESC) tmp_table\
GROUP BY host;

The sub-query gives me the hostname, the history entry id and the history value from history_text for non-template hosts (status <> 3), with a non-empty value for the item key_ I want, ordered by time, newest first; the main query then takes that list and shrinks it down to one entry per host, which will be the newest one.

Now, having this list, for each result row I can split the value by newline to extract all the addresses and add or update them one by one in the final table. Here comes another trick that I described in my old post here: since I need to update the source for entries that already exist in the final table while adding anything that is not there, I run the insert with the following SQL statement, considering that the address field is a primary key and is unique:

INSERT INTO final_table (address,source)\
VALUES('$address','$source')\
ON DUPLICATE KEY UPDATE source='$source';

So whenever there is a conflict on the address field, the source field gets updated with the new value.

After changing the Zabbix API queries to native SQL, the script runs in a few seconds and consumes almost nothing, as it relies on the MySQL engine to do most of the job, and MySQL can do it much better.

Finally, if there is no interest in the history items after they are imported into final_table, it is possible to delete all the rows for these items from history_text for a given key_ with the following SQL query:

DELETE FROM history_text WHERE itemid IN\
(SELECT i.itemid FROM items AS i\
WHERE i.key_ LIKE 'my_item_key%');

This is an alternative to relying on the Zabbix housekeeper, which will do the same job, but a bit later. And if polling of the nodes for this item is frequent and the resulting values are big, that delay costs space in MySQL that we want to save.

VLC mosaic for multiple RTSP streams

Recently I had a task to open streams from 4 cameras over RTSP in a single window using VLC. There is a bunch of howtos on the net with relevant info and examples, but after trying many options, none of them worked out of the box. After tweaking things here and there for a while, I managed to come up with a working configuration, so I post it here for future reference, hoping it will be helpful to someone else.

My cameras’ streams are 1280×720, and as I want to fit 4 cameras on one screen, I will scale them in half to get a screen size of 1280×720 with each stream at 640×360.

The first thing to do is to create a background image with the exact size of the desired screen (1280×720 in my case) and save it somewhere nearby (bg.jpg in my case).
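
If there is no suitable image at hand, a plain black one of the right size can be generated, for example, with ImageMagick (the output path is just an example):

convert -size 1280x720 xc:black /home/user/Pictures/bg.jpg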

Then we need to create a VLM config file (let it be cam.vlm.conf) to tell VLC about the streams and how to deal with them:

new channel1 broadcast enabled
setup channel1 input "rtsp://x.x.x.x:554/?user=foo&password=bar&channel=1&stream=0.sdp"
setup channel1 output #mosaic-bridge{id=1,height=360,width=640}

new channel2 broadcast enabled
setup channel2 input "rtsp://x.x.x.y:554/?user=foo&password=bar&channel=1&stream=0.sdp"
setup channel2 output #mosaic-bridge{id=2,height=360,width=640}

new channel3 broadcast enabled
setup channel3 input "rtsp://x.x.x.z:554/?user=foo&password=bar&channel=1&stream=0.sdp"
setup channel3 output #mosaic-bridge{id=3,height=360,width=640}

new channel4 broadcast enabled
setup channel4 input "rtsp://x.x.x.w:554/?user=foo&password=bar&channel=1&stream=0.sdp"
setup channel4 output #mosaic-bridge{id=4,height=360,width=640}

new mosaic broadcast enabled
setup mosaic input file:///home/user/Pictures/bg.jpg
setup mosaic option image-duration=-1
setup mosaic option image-fps=0
setup mosaic option mosaic-rows=2
setup mosaic option mosaic-cols=2
setup mosaic option mosaic-position=1
setup mosaic output #transcode{sfilter=mosaic,vcodec=mp4v,VB=8500,acodec=none,fps=25,scale=1}:display

control channel1 play
control channel2 play
control channel3 play
control channel4 play
control mosaic play

The input paths to the camera streams, as well as the full path to the background image, should be adjusted accordingly.

For some reason VLC doesn’t want to recognize the mosaic-(height|width|order) parameters in the VLM file, so we need to supply them inline as arguments when calling VLC. Now that we have the VLM file ready, we can start the stream with the following command:

cvlc --vlm-conf /home/user/Desktop/cam.vlm.conf --mosaic-width 1280 --mosaic-order "1,2,3,4" --mosaic-height 720

Adjust the path to the VLM file accordingly, as well as the mosaic order or whatever else you want. For me, all of the above worked out perfectly well, and I can see all my 4 cameras in a single window.

Galaxy Nexus custom ROM

As you may know, Google announced the next Android (KitKat) some time ago, and at the same moment they said that the Galaxy Nexus will not receive this update. This was pretty sad for me, as I got my Nexus mainly for two reasons: raw Android (without all those apps that normally come pre-installed on branded phones like Sony, LG, whatever) and over-the-air updates to the latest version of Android.

In all the time I had the Nexus, I never bothered with rooting, custom ROMs or whatsoever, as I was pretty happy with the stock SW, but after Google refused to ship the latest SW for it, I had no choice. After checking all around and making a couple of attempts to find what suits me, I finally found what I like and what works for me.

The criteria for choosing one ROM or another were as follows:

  • Latest Android
  • Minimal non-stock apps included
  • Most customization
  • Stability

Due to the first criterion, the latest version of Android, there are not that many ROMs out there that pass the test. After checking a few, I finally chose the SlimRoms guys to support me. They have the latest Android (currently 4.4.2), frequent updates (even weekly builds), almost all the SW is stock (except Nova Launcher, Simple Browser, and SlimCenter to check for updates and some SlimRom IRC tool), and nice customization. For instance, they have tweaked the standard DPI in the build, so every item is smaller but more fits on the screen; they also have customizable ring shortcuts and so on, but overall it is not overcomplicated and they keep the ROM really slim.

After getting the custom ROM and playing around, I am really enjoying it and think I am going to stay with a rooted phone and such ROMs. Not that I have a lot of things that require root, but some are:

  • removing apps that I don’t need (the browser, as I use Chrome; SMS, as I use Hangouts; Gallery, as I use QuickPic; launchers, as I use Go Launcher; and so on)
  • a nice SSH client that supports private keys and identities (JuiceSSH – my dream as a system administrator)
  • custom app permissions (as I am fed up with Facebook and others polling GPS and making other mess)

Anyhow, if you own a Samsung Galaxy Nexus and want the latest Android, or you own any other Android-based phone and wanna play around – check it out, it is fun :-)