Hi! Welcome...

Syndication of blogs and tweets by users of the Freenode ##infra-talk IRC channel

27 June 2010 ~ Comments Off

Velocity Conference 2010 takeaways

Velocity 2010 was an excellent conference. Following are my takeways from the conference. There is tons more but following are some of the things that made a good impression and are likely not hard to do

Web performance optimization

Mobile performance optimization

Most of the recommendations have been taken off Maximiliano Firtman's Mobile Web High Performance. You can view slides here.

  • Avoid JQuery unless you really need it. Check out slide 90. It takes 1.8 seconds on iPhone and 4 seconds on Android to download and parse JQuery. Use mobile optimized frameworks such as baseJS and XUI
  • Avoid DNS lookups and minimize number of requests since they are slow
  • Embed CSS and Javascript on the home page. After onload download external CSS and JS.
  • Use inline images (slide 56) and pictograms
  • Avoid redirects
  • Use native constructs especially for Webkit browsers e.g. -webkit-text-stroke
  • Keynote announced their Mobile Testing tool for desktops that looks promising http://mite.keynote.com/

SSL/Security

  • According to Google SSL overhead these days is pretty minimal. Around 1% on today's servers.
  • Pet peeve about the presentation is they were advising everyone to use less secure key lengths ie. 1024 bits and RC4 cipher to improve performance. It is true that adding SSL to insecure connections is certainly an improvement but it should be qualified. E-mail probably fine. Financial sites probably bad.

Scalability

  • Hidden Scalability Gotchas in Memcached and Friends by Neil Gunther (author of Guerilla Capacity Planning) and Shanti Subramanyam discussed their findings around memcached. They used quantitative analysis to analyze different memcache versions. Based on their analysis using Neil's model memcache 1.4.5 has higher contention than 1.2.8.

Culture

27 June 2010 ~ Comments Off

Tutorial: Writing MCollective Agents

I’ve recorded a screencast that walks you through the process of developing a SimpleRPC Agent, give it a DDL and also a simple client to communicate with it.

The tutorial creates a small echo agent that takes input and return it unmodified. It validates that you are sending a string and has a sample of dealing with intermittent failure.

Once you’ve watched this, or even during, you can use the following links are reference material: Writing Agents, Data Definition Language and Writing Clients.

You can view it directly on blip.tv which will hopefully be better quality.

I used a few VIM Snippets during the demo to boilerplate the agent and DDL, you’ll find these in the tarball for the upcoming 0.4.7 release in the ext/vim directory, they are already on GitHub too.

27 June 2010 ~ Comments Off

Scalable Internet Architecture

Scalable Internet Architecture View more presentations from postwait.

27 June 2010 ~ Comments Off

The ThoughtWorks Anthology – Short Review

The ThoughtWorks Anthology is a collection of short articles and essays written by a number of their employees (some of who are now ex-employees) about software development with a heavily agile slant. The topics range from the very high level "Lush Landscape of Languages" and "What is an Iteration manager anyway" to the more technical and technique focused "Refactoring Ant Build Files" and "Object Calisthenics".

While the general quality of the writing is very good, especially my favourite - 'Object Calisthenics', the biggest problem with a book like this is that a lot of the essays authors, and some of their also knowledgeable co-workers, have personal blogs where this quality of information is available on a (near) daily basis, in both greater depth and more a conversational nature.

Like this post? - Digg Me! | Add to del.icio.us! | reddit this!

24 June 2010 ~ Comments Off

MCollective Data Definition Language

I mentioned in my recent post about mcollective Road Map about the DDL.

The DDL is used to describe agents in a way that is accessible by other programs, web applications, client libraries and so forth to help those various client tools to configure themselves correctly.

An actual example of a DDL file can be found here if you want to have a good look at it and full docs here.

I’ve created a short video showing the DDL and some of the features of the upcoming 0.4.7 release, you probably want to view it full screen to really see what’s going on.

And a quick note about the colors, I know people tend to feel strongly about this kind of thing, you can disable them in the config file of the client :)

This is also my first attempt at using blip.tv, please let me know if you see any problems.

24 June 2010 ~ Comments Off

Bash pitfalls

Some kind soul on #bash on Freenode recently pointed me at this excellent document:

http://mywiki.wooledge.org/BashPitfalls

(I ran into #20, in case you're wondering)

23 June 2010 ~ Comments Off

A new brick in my infrastructure

eth0If you read this blog you may know I daily use puppet at $WORK. Puppet is made to maintain configuration on machines, but not for one shot actions. For a couple of month I used to work with fabric for this but It had a few drawbacks, mainly because you need to maintain the list of hosts you want to act on and that it hates dead hosts, even if ghantoos dropped a link showing how to get rid of this. So I replaced it with mcollective : not exactly the same (it uses an agent instead of SSH) but it dynamicaly knows which hosts are up, allows to filter on facts (from facter, the puppet companion).

You could think that needing an agent is a drawback but it allows much complex actions, fine grained logic. Moreover it does not require much work to be installed if you already have puppet running. If you have a high number of machines you gain scalability with real parallel actions : it does not take longer to run on 5 machines than on 100.

One more point is the active development : R.I. Pienaar released 2 versions recently, improving access control, adding DDL to create interfaces easily. You can give your unprivileged staff some power through a web interface in less than 100 lines of code.

A tool for sysadmins that appreciate the devops spirit and want to do more in less time !

I’ll be publishing my agent and related tools on my github account.

23 June 2010 ~ Comments Off

xdotool 2.20100623.2949 released

The latest release of xdotool includes a few bug fixes, minor improvements, and one major feature addition. All of the tests pass in the test suite (156 tests, 1892 assertions, 126 seconds).

The major feature is the new command chaining. That is, you can use multiple commands from the same xdotool invocation. Any window querying (search, getactivewindow, getwindowfocus) will save the result for use with future commands on the same invocation. For a simple example, we can resize the current window by doing this:

% xdotool getactivewindow windowsize 600 400
You can also activate the first Firefox window found:
% xdotool search --class firefox windowactivate
Or move all gimp windows to a specific desktop and then activate (switch to) gimp:
% xdotool search --class gimp set_desktop_for_window %@ 2 windowactivate %@

# the "%@" means all windows found in the search. The default when you do not
# specify a window is always "%1" aka the first window found by the search.
This idea for this feature came primarily from detailed suggestions by Henning Bekel. Huge thanks for dumping ideas my way :)

In addition, libxdo is now documented with Doxygen and is available here. Remember, if you have a feature request, it can't hurt to ask (or send patches!). If I have time/energy to work on it, I'll do what I can.

Download: xdotool-2.20100623.2949.tar.gz

As usual, if you find problems or have feature requests, please file bugs or send an email to the list. Changelist from previously-announced release:

2.20100623.*:
  - Added 'window stack' and 'command chaining' support. Basically lets you
    include multiple commands on a single xdotool invocation and saves the last
    window result from search, getactivewindow, and getwindowfocus. For example,
    the default window selector is "%1" meaning the first window. 'xdotool
    search --class xterm windowsize 500 500' will resize the first xterm found.
    All commands that take window arguments (as flags or otherwise) now default to
    "%1" if unspecified. See xdotool(1) manpage sections 'COMMAND CHAINING'
    and 'WINDOW STACK' for more details. This feature was suggested (with great
    detail) by Henning Bekel.
  - To simplify command chaining, all options now terminate at the first
    non-option argument. See getopt(3) near 'POSIXLY_CORRECT' for details.
  - Add --sync support to windowsize.
  - Update docs and output to indicate that 'search --title' is deprecated (but
    still functions). Use --name instead.
  - Fix mousemove --screen problem due to bug in XTEST. Reported by
    Philipp Specht, http://code.google.com/p/semicomplete/issues/detail?id=35
  - Fix segfault when invoking xdotool with an invalid command.
    http://code.google.com/p/semicomplete/issues/detail?id=34
    Reported by: Bruce Jerrick, Sven Lankes, Daniel Kahn Gillmor, and Thomas
    Schwery.
  - Fix bug --clearmodifiers bug caused by an uninitialized value being
    interpreted as the 'modmask' causing us to try to clear invalid modifiers.
    Reported by Hong-Leong Ong on the mailing list.
  - Lots of internal refactoring and documentation improvements.
  - Testing additions for several commands in addition to command chaining.
  - Documented libxdo through xdo.h. Docs can be generated by 'make docs'
    from the xdotool release.
  - libxdo: xdo_window_translate_with_sizehint
  - libxdo: xdo_window_wait_for_size

23 June 2010 ~ Comments Off

xdotool 2.20100623.2949 released

The latest release of xdotool includes a few bug fixes, minor improvements, and one major feature addition. All of the tests pass in the test suite (156 tests, 1892 assertions, 126 seconds).

The major feature is the new command chaining. That is, you can use multiple commands from the same xdotool invocation. Any window querying (search, getactivewindow, getwindowfocus) will save the result for use with future commands on the same invocation. For a simple example, we can resize the current window by doing this:

% xdotool getactivewindow windowsize 600 400
You can also activate the first Firefox window found:
% xdotool search --class firefox windowactivate
Or move all gimp windows to a specific desktop and then activate (switch to) gimp:
% xdotool search --class gimp set_desktop_for_window %@ 2 windowactivate %@

# the "%@" means all windows found in the search. The default when you do not
# specify a window is always "%1" aka the first window found by the search.
This idea for this feature came primarily from detailed suggestions by Henning Bekel. Huge thanks for dumping ideas my way :)

In addition, libxdo is now documented with Doxygen and is available here. Remember, if you have a feature request, it can't hurt to ask (or send patches!). If I have time/energy to work on it, I'll do what I can.

Download: xdotool-2.20100623.2949.tar.gz

As usual, if you find problems or have feature requests, please file bugs or send an email to the list. Changelist from previously-announced release:

2.20100623.*:
  - Added 'window stack' and 'command chaining' support. Basically lets you
    include multiple commands on a single xdotool invocation and saves the last
    window result from search, getactivewindow, and getwindowfocus. For example,
    the default window selector is "%1" meaning the first window. 'xdotool
    search --class xterm windowsize 500 500' will resize the first xterm found.
    All commands that take window arguments (as flags or otherwise) now default to
    "%1" if unspecified. See xdotool(1) manpage sections 'COMMAND CHAINING'
    and 'WINDOW STACK' for more details. This feature was suggested (with great
    detail) by Henning Bekel.
  - To simplify command chaining, all options now terminate at the first
    non-option argument. See getopt(3) near 'POSIXLY_CORRECT' for details.
  - Add --sync support to windowsize.
  - Update docs and output to indicate that 'search --title' is deprecated (but
    still functions). Use --name instead.
  - Fix mousemove --screen problem due to bug in XTEST. Reported by
    Philipp Specht, http://code.google.com/p/semicomplete/issues/detail?id=35
  - Fix segfault when invoking xdotool with an invalid command.
    http://code.google.com/p/semicomplete/issues/detail?id=34
    Reported by: Bruce Jerrick, Sven Lankes, Daniel Kahn Gillmor, and Thomas
    Schwery.
  - Fix bug --clearmodifiers bug caused by an uninitialized value being
    interpreted as the 'modmask' causing us to try to clear invalid modifiers.
    Reported by Hong-Leong Ong on the mailing list.
  - Lots of internal refactoring and documentation improvements.
  - Testing additions for several commands in addition to command chaining.
  - Documented libxdo through xdo.h. Docs can be generated by 'make docs'
    from the xdotool release.
  - libxdo: xdo_window_translate_with_sizehint
  - libxdo: xdo_window_wait_for_size

22 June 2010 ~ Comments Off

Debugging java threads with top(1) and jstack.

At work, we're testing some new real-time bidding on ADX and are working through some performance issues.

Our server is jetty + our code, and we're seeing performance problems at relatively low QPS.

The first profiling attempt used YJP, but when it was attached to our server, the system load went up quickly and looked like this:

load average: 2671.04, 1653.95, 771.93
Not good; the load average while running with the profiler attached jumps to a number roughly equal to the number of application threads (3000 jetty threads). Possible PEBCAK here, but needless to say we couldn't use YJP for any meaningful profiling.

Stepping back and using some simpler tools proved to be much more useful. The standard system tool "top" can, in most cases show per-thread stats instead of just per-process. Unfortunately, top(1) doesn't seem to have a "parent pid" option, but luckily for us, our java process was the only one running as the user 'rfi'. Further, some top implementations can take a single process (top -p PID) which would be useful here.

If we make top output threads and sort by cpu time, we get an idea of the threads that spend the most time being busy. Here's a sample:

top - 22:29:34 up 87 days, 21:45,  9 users,  load average: 10.31, 12.58, 8.45
Tasks: 3237 total,   5 running, 3231 sleeping,   0 stopped,   1 zombie
Cpu(s): 53.9%us,  2.6%sy,  0.0%ni, 42.3%id,  0.0%wa,  0.0%hi,  1.1%si,  0.0%st
Mem:  16432032k total, 10958512k used,  5473520k free,   149100k buffers
Swap: 18474740k total,    77556k used, 18397184k free,  5321140k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+    TIME COMMAND
32149 rfi       16   0 13.7g 6.8g  13m S 50.7 43.3 134:38.08 134:38 java
28993 rfi       15   0 13.7g 6.8g  13m S  5.5 43.3  13:26.85  13:26 java
28995 rfi       16   0 13.7g 6.8g  13m S  3.6 43.3   8:17.84   8:17 java
32151 rfi       15   0 13.7g 6.8g  13m S  1.9 43.3   5:47.59   5:47 java
29143 rfi       16   0 13.7g 6.8g  13m S  0.0 43.3   4:18.51   4:18 java
28990 rfi       15   0 13.7g 6.8g  13m S  1.0 43.3   3:58.86   3:58 java
28989 rfi       15   0 13.7g 6.8g  13m S  0.6 43.3   3:58.83   3:58 java
28992 rfi       15   0 13.7g 6.8g  13m S  1.0 43.3   3:58.72   3:58 java
28986 rfi       16   0 13.7g 6.8g  13m S  1.0 43.3   3:58.00   3:58 java
28988 rfi       16   0 13.7g 6.8g  13m S  0.6 43.3   3:57.94   3:57 java
28987 rfi       15   0 13.7g 6.8g  13m S  0.6 43.3   3:56.89   3:56 java
28991 rfi       15   0 13.7g 6.8g  13m S  1.0 43.3   3:56.71   3:56 java
28985 rfi       15   0 13.7g 6.8g  13m S  1.3 43.3   3:55.79   3:55 java
28994 rfi       15   0 13.7g 6.8g  13m S  1.0 43.3   2:05.35   2:05 java
29141 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   1:09.66   1:09 java
32082 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:56.29   0:56 java
28998 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:52.27   0:52 java
29359 rfi       15   0 13.7g 6.8g  13m S  3.2 43.3   0:45.11   0:45 java
31920 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:43.46   0:43 java
31863 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:41.67   0:41 java
31797 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:41.56   0:41 java
31777 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:41.31   0:41 java
30807 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:41.22   0:41 java
31222 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:40.70   0:40 java
30622 rfi       15   0 13.7g 6.8g  13m S  3.9 43.3   0:40.56   0:40 java
31880 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:39.77   0:39 java
29819 rfi       15   0 13.7g 6.8g  13m S  4.8 43.3   0:39.15   0:39 java
29438 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:39.11   0:39 java
30363 rfi       15   0 13.7g 6.8g  13m R  0.0 43.3   0:39.03   0:39 java
31554 rfi       15   0 13.7g 6.8g  13m S  0.0 43.3   0:38.82   0:38 java

I included lots of threads in the output above so you can get an idea, visually, of the time buckets you could place each thing in.

The important column is "TIME+" here. You can see that there's some uniqueness in the times that probably give you an indication of the similarity in duties between each thread with similar times (all the near 4-hour times are probably of similary duty, etc).

If you grab a stack trace from your java process, you can map each process id above to a specific thread. You can get a stack by using jstack(1) or sending SIGQUIT to the java process and reading the output.

Java stack traces, for each thread, have similar headers:

"thread name" prio=PRIORITY tid=THREADID nid=LWPID STATE ...
The thing we care about here is the LWPID (light weight process id) or also known as the process id.

Find my java process (must be run as the user running the process):

% sudo -u rfi jps
28982 ServerMain
3041 Jps
If I take the 2nd highest pid from the top output above, 28993, what thread does this belong to? If you look at the stack trace, you'll see that java outputs the 'nid' field in hex, so you'll have to convert your pid to hex first.
% printf "0x%x\n" 28993
0x7141

% grep nid=0x7141 stacktrace
"VM Thread" prio=10 tid=0x000000005e6d9000 nid=0x7141 runnable 

# What about the first pid?
% printf "%0x%x\n" 32149
0x7d95

% grep -A3 nid=0x7d95 stacktrace
2010-06-22 03:15:26.489485500 "1257787176@Jetty Thread Pool-2999 - Acceptor0 SocketConnector@0.0.0.0:2080" prio=10 tid=0x00002aad3dd43000 nid=0x7d95 runnable [0x00002aadc62d9000]
2010-06-22 03:15:26.489507500    java.lang.Thread.State: RUNNABLE
2010-06-22 03:15:26.489512500   at java.net.PlainSocketImpl.socketAccept(Native Method)
2010-06-22 03:15:26.489518500   at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
From the above output, I can conclude that the most cpu-intensive threads in my java process are the Jetty socket acceptor thread and the "VM Thread". A red flag should pop up in your brain if a 4-hour-old process has spent 2.25 hours spinning in accept().

FYI, googling for "VM Thread" and various java keywords doesn't seem to find much documentation, though most folks claim it has GC and other maintenance-related duties. I am not totally sure what it does.

In fact, if you look at the top 10 pids, none of them are actually anything that runs the application code - they are all management threads (thread pool, socket listener, garbage collection) outside of my application.

32149 1257787176@Jetty Thread Pool-2999 - Acceptor0 SocketConnector@0.0.0.0:2080
28993 VM Thread
28995 Finalizer
32151 pool-7-thread-1
29143 pool-3-thread-1
28990 GC task thread#5 (ParallelGC)
28989 GC task thread#4 (ParallelGC)
28992 GC task thread#7 (ParallelGC)
28986 GC task thread#1 (ParallelGC)
28988 GC task thread#3 (ParallelGC)
To be fair to the rest of this app, there are 3000 worker threads that each have around 40 minutes (or less) of cpu time, so the application is doing useful work, it's important to observe where you are wasting time - in this case, the jetty accept thread. The GC-related data may be an indication that garbage collection could be tuned or avoided (through thread-local storage, etc).

Lastly, since each of these IDs are LWP IDs, you can use other standard tools (strace/truss, systemtap/dtrace, etc) on them.

Knowing the accept thread is consuming lots of time should make you run tcpdump if you haven't already, and in this case I see a few hundred new connections per second, which isn't very high and leads us further down the road of debugging.