For one of my past tasks I had the following requirement: a number of hosts monitored with Zabbix have an item of text history type that provides a list of addresses (one per line), and I needed to take all those lists and build one common list (table) where each row holds an address, the hostname of the node where that address was last seen, and a timestamp.
Initially, I added the item in Zabbix on each node I needed to monitor, created a separate table in MySQL to hold the final list, and then wrote a cron script that did the following:
- retrieve the list of hosts that have the given item, looked up by item key_, with the Zabbix API
- retrieve all the matching items by itemid found in the previous query, again with the Zabbix API
- for each item, retrieve the latest history entry with the Zabbix API
- for each history entry, split the text by newline to get the addresses, then add each address with its source host and timestamp to the final MySQL table, or update the timestamp and source if the address is already in the list
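The last step, merging per-host history texts into one address map, can be sketched roughly like this (the data and function names are hypothetical, not from the actual script):

```python
# Sketch of the merge step: given the latest history text per host,
# split each value by newline and build one address -> (source, timestamp)
# map, keeping the most recent sighting of each address.

def merge_histories(histories):
    """histories: list of (host, clock, text) tuples, one per monitored node."""
    final = {}  # address -> (source_host, last_seen_clock)
    for host, clock, text in histories:
        for line in text.splitlines():
            address = line.strip()
            if not address:
                continue
            # Insert if the address is new, or update if this sighting is newer.
            if address not in final or clock > final[address][1]:
                final[address] = (host, clock)
    return final

histories = [
    ("node1", 1700000000, "10.0.0.1\n10.0.0.2"),
    ("node2", 1700000060, "10.0.0.2\n10.0.0.3"),
]
merged = merge_histories(histories)
```

Here 10.0.0.2 ends up attributed to node2, since that is where it was seen last.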
With a total list size of about 5K addresses, all of the above took around 4-5 minutes and consumed a lot of CPU and memory on the server. As I was limited on server resources and wanted the list to be updated every minute, I decided to avoid the Zabbix API and try to do the job with plain MySQL queries. Since I am only interested in the latest history entry for each node, I recalled my own post from a while ago on SQL GROUP BY with subqueries. After checking the Zabbix SQL schema and playing with queries a bit, I ended up with a single request that gives me all I need:
SELECT * FROM
  (SELECT h.host, hi.id, hi.value
   FROM hosts AS h, items AS i, history_text AS hi
   WHERE i.hostid = h.hostid AND hi.itemid = i.itemid
     AND h.status <> 3 AND i.key_ LIKE 'my_item_key%'
     AND hi.value <> ''
   ORDER BY hi.clock DESC) tmp_table
GROUP BY host;
The sub-query gives me the hostname, history entry id and history value from history_text for non-template hosts (status <> 3), with a non-empty value for the item key_ I want, ordered by time newest first; the main query then shrinks that list down to a single entry per host, which will be the newest one.
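In Python terms, the intent of the query is the following (note that MySQL happens to return the first row of the ordered sub-query for each group here, but that behavior is not guaranteed by the SQL standard; the data below is hypothetical):

```python
# Emulate "ORDER BY clock DESC in a sub-query, then GROUP BY host":
# sort history rows newest first, then keep only the first row per host.

def latest_per_host(rows):
    """rows: list of (host, history_id, value, clock) tuples."""
    rows = sorted(rows, key=lambda r: r[3], reverse=True)  # ORDER BY clock DESC
    seen = {}
    for host, hid, value, clock in rows:
        if host not in seen:  # GROUP BY host keeps the first (newest) row
            seen[host] = (hid, value, clock)
    return seen

rows = [
    ("node1", 1, "10.0.0.1", 100),
    ("node1", 2, "10.0.0.9", 200),  # newer entry for node1
    ("node2", 3, "10.0.0.3", 150),
]
latest = latest_per_host(rows)
```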
Now, having this list, I can split each result row's value by newline to extract all the addresses and add or update them one by one in the final table. Here comes another trick that I described in my old post here: since I need to update the source for entries that already exist in the final table while adding anything that is not there yet, I run the insert with the following SQL statement, relying on the address field being a unique primary key:
INSERT INTO final_table (address, source)
VALUES ('$address', '$source')
ON DUPLICATE KEY UPDATE source = '$source';
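A minimal runnable stand-in for this step, using Python's sqlite3 module (SQLite spells MySQL's ON DUPLICATE KEY UPDATE as ON CONFLICT ... DO UPDATE; the table layout follows the article, the data is made up):

```python
import sqlite3

# In-memory database standing in for the MySQL final_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_table (address TEXT PRIMARY KEY, source TEXT)")

def upsert(address, source):
    # Insert a new address, or update the source on a primary-key conflict.
    conn.execute(
        "INSERT INTO final_table (address, source) VALUES (?, ?) "
        "ON CONFLICT(address) DO UPDATE SET source = excluded.source",
        (address, source),
    )

upsert("10.0.0.1", "node1")
upsert("10.0.0.1", "node2")  # duplicate address: source gets updated
rows = conn.execute("SELECT address, source FROM final_table").fetchall()
```

Using bound parameters (the `?` placeholders) instead of interpolating '$address' into the statement also keeps the script safe from SQL injection.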
So whenever there is a conflict on the address field, the source field gets updated with the new value.
After changing the Zabbix API queries to native SQL, the script runs in a few seconds and consumes almost nothing, as it relies on the MySQL engine to do most of the job, and MySQL can do it much better.
Finally, if there is no interest in the history items after they have been imported into final_table, it is possible to delete all rows for those items from history_text for a given key_ with the following SQL query:
DELETE FROM history_text WHERE itemid IN
  (SELECT i.itemid FROM items AS i
   WHERE i.key_ LIKE 'my_item_key%');
This is an alternative to relying on the Zabbix housekeeper, which would do the same job, only a bit later. And if the nodes are polled frequently for this item and the resulting values are big, the accumulated history will consume space in MySQL that we want to avoid.