Monday, December 12, 2011

ps returns incorrect etime

I run ps -Ao pid,pcpu,rss,etime,args to check for long-running processes. If the etime (elapsed time) of a process is greater than, say, 10 hours, I kill the process. Lately I have been seeing valid processes getting killed. I noticed in the logs that etime was returning 49710-06:28:15, which is 4294967295 seconds, or 2^32-1 (what -1 looks like when read as an unsigned 32-bit integer). Any time I see one of these magic numbers, 2^N or 2^N-1, I know something weird is going on. Turns out I was right. The procps fix states: "the ps utility's "etime" field shows the elapsed time since a process was started. On heavily-loaded systems, it was possible for this value to return negative due to an integer overflow."
I didn't update procps; instead I fixed my Python script.
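
The fix in my script is not shown here, so the following is only a minimal sketch of one way to do it, not the exact change I made. It assumes etime arrives in the usual [[dd-]hh:]mm:ss form and that any value at or above 2^32-1 seconds is the overflow artifact rather than a real age. The names etime_to_seconds, is_long_running and OVERFLOW_SECONDS are made up for this illustration.

```python
import re

# ps prints etime as [[dd-]hh:]mm:ss; days and hours are optional.
_ETIME_RE = re.compile(r"^(?:(?:(\d+)-)?(\d+):)?(\d+):(\d+)$")

# 2^32 - 1 seconds, the bogus value seen in the logs (49710-06:28:15).
OVERFLOW_SECONDS = 2 ** 32 - 1


def etime_to_seconds(etime):
    """Convert a ps etime string to a number of seconds."""
    match = _ETIME_RE.match(etime.strip())
    if match is None:
        raise ValueError("unrecognized etime: %r" % etime)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def is_long_running(etime, limit_seconds=10 * 3600):
    """True if the process is older than the limit and etime looks sane."""
    elapsed = etime_to_seconds(etime)
    if elapsed >= OVERFLOW_SECONDS:
        # Assumption: treat the overflowed value as "unknown", never kill on it.
        return False
    return elapsed > limit_seconds
```

With this check, a process reporting 11:00:00 is still flagged, while one reporting 49710-06:28:15 is left alone. A stricter sanity bound (for example, anything older than the machine's uptime) would catch other wrapped values too, but I have only seen the exact 2^32-1 case.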

Saturday, December 10, 2011

Fetching EC2 on-demand instances (excluding spot instances)

The EC2 DescribeInstances API has no filter that returns only on-demand instances. Here is how I do it using the Python boto library:

from boto.ec2.connection import EC2Connection

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed to be defined elsewhere.
ec2 = EC2Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
reservations = ec2.get_all_instances(filters={"instance-state-name": "running"})
ondemand_instances = []
for r in reservations:
    for i in r.instances:
        # Only spot instances carry an instanceLifecycle attribute.
        if not hasattr(i, "instanceLifecycle"):
            ondemand_instances.append(i)

However, if I want to fetch all running spot instances, there is a nice filter to do so:

reservations = ec2.get_all_instances(filters = {"instance-lifecycle":"spot",
                                    "instance-state-name":"running"})
spot_instances = []
for r in reservations:
    for i in r.instances:
        spot_instances.append(i)
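
If both lists are needed, the same attribute check can split a single result set into the two groups. This is just a sketch: split_by_lifecycle is a name I made up, and the demo uses SimpleNamespace stand-ins instead of real boto objects so it runs without AWS credentials. It relies on the same assumption as above, that only spot instances have an instanceLifecycle attribute.

```python
from types import SimpleNamespace


def split_by_lifecycle(reservations):
    """Return (ondemand, spot) lists from an iterable of reservations."""
    ondemand, spot = [], []
    for reservation in reservations:
        for instance in reservation.instances:
            # Assumption: instanceLifecycle is present only on spot instances.
            if hasattr(instance, "instanceLifecycle"):
                spot.append(instance)
            else:
                ondemand.append(instance)
    return ondemand, spot


# Stand-in objects that mimic the shape of boto reservations and instances.
demo_reservations = [
    SimpleNamespace(instances=[
        SimpleNamespace(id="i-aaa"),
        SimpleNamespace(id="i-bbb", instanceLifecycle="spot"),
    ]),
    SimpleNamespace(instances=[SimpleNamespace(id="i-ccc")]),
]

demo_ondemand, demo_spot = split_by_lifecycle(demo_reservations)
print([i.id for i in demo_ondemand], [i.id for i in demo_spot])
```

With real data, the reservations returned by the running-instances query above would be passed in place of demo_reservations.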

Saturday, September 10, 2011

Sequence diagrams made easy

I use websequencediagrams.com a lot these days. Here is a sample Subscriber/Publisher interaction:

Subscriber->Publisher: subscribe
Publisher->Subscriber: notify
Publisher->Subscriber: notify
Subscriber->Publisher: unsubscribe

All I did was type the text above, and a nice sequence diagram was drawn for me.