<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Apparently this installer wants a title</title>
	<atom:link href="http://davidben.net/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://davidben.net/blog</link>
	<description>Various ramblings from David Benjamin</description>
	<lastBuildDate>Wed, 24 Aug 2011 00:04:16 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>KDE SVN Access</title>
		<link>http://davidben.net/blog/2010/05/23/kde-svn-access/</link>
		<comments>http://davidben.net/blog/2010/05/23/kde-svn-access/#comments</comments>
		<pubDate>Mon, 24 May 2010 02:47:14 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[kde]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=461</guid>
		<description><![CDATA[As of some time last week, I am now a KDE committer. So far, I haven&#8217;t made a useful commit yet, but hopefully I&#8217;ll have time over the summer. KHelpCenter could use a lot of love, and I&#8217;d like to improve the startup time of various applications.]]></description>
				<content:encoded><![CDATA[<p>As of some time last week, I am now a <a href="http://websvn.kde.org/trunk/kde-common/accounts?view=markup">KDE committer</a>. So far, I haven&#8217;t made a useful commit yet, but hopefully I&#8217;ll have time over the summer. KHelpCenter could use a lot of love, and I&#8217;d like to improve the startup time of various applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/05/23/kde-svn-access/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BarnOwl, ncurses, and terminal resizing</title>
		<link>http://davidben.net/blog/2010/05/03/barnowl-ncurses-and-terminal-resizing/</link>
		<comments>http://davidben.net/blog/2010/05/03/barnowl-ncurses-and-terminal-resizing/#comments</comments>
		<pubDate>Mon, 03 May 2010 04:18:27 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[barnowl]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=400</guid>
		<description><![CDATA[Earlier this week, I hunted down a delicious little bug in BarnOwl. A friend of mine, Alex, reported occasional segmentation faults in his sessions. This week, we finally got a core dump to inspect. Off-screen windows Inspecting the backtrace, we see the crash occurred in the ncurses function wredrawln, called by our owl_function_full_redisplay. (Actually, we [...]]]></description>
				<content:encoded><![CDATA[<p>Earlier this week, I hunted down a delicious little bug in <a href="http://barnowl.mit.edu/">BarnOwl</a>. A friend of mine, <a href="http://alex.mit.edu/">Alex</a>, reported occasional segmentation faults in his sessions. This week, we finally got a core dump to inspect.</p>
<h3>Off-screen windows</h3>
<p>Inspecting the <a href="http://web.mit.edu/davidben/Public/barnowl-backtrace.txt">backtrace</a>, we see the crash occurred in the ncurses function <code>wredrawln</code>, called by our <code>owl_function_full_redisplay</code>. (Actually, we call <code>redrawwin</code>, which is a macro wrapper over <code>wredrawln</code>.)</p>
<p>Now, one of many causes of crashes in ncurses is off-screen windows. It&#8217;s not clear how well they are supported: the code does check for them in some cases, and <code>newwin</code> will allow you to create them. That said, you cannot move windows off-screen with <code>mvwin</code>, and parts of ncurses may crash on them.</p>
<p>In fact, <code>owl_function_full_redisplay</code> had the following code in place:</p>
<blockquote><pre>
  /* Work around curses segfaults with windows off the screen */
  if (g.lines >= owl_global_get_typwin_lines(&#038;g)+2)
      redrawwin(owl_global_get_curs_typwin(&#038;g));
</pre>
</blockquote>
<p>Furthermore, it was <em>that</em> call to <code>redrawwin</code> that crashed. (The <code>typwin</code> is the typing area; the name is a holdover from the <a href="http://www.ktools.org/owl/">Owl</a> days.) Perhaps the check wasn&#8217;t sufficient and we got an off-screen window again. So, some <code>gdb</code> dances later:</p>
<pre>
       (gdb) p g.lines
       $1 = 45
       [...]
       (gdb) print *(g->typpan->win)
       $5 = {_cury = 0, _curx = 0, _maxy = 7, _maxx = 176, _begy = 37, _begx = 0, [...]}
</pre>
<p>(<code>g</code> holds much of BarnOwl&#8217;s state. Again, a remnant from Owl.) Well, <code>7 + 37 = 44 = 45 - 1</code>, so that looks fine. But wait!</p>
<pre>
       (gdb) print *((WINDOW*) curscr)
       $2 = {_cury = 37, _curx = 0, _maxy = 43, _maxx = 176, _begy = 0, _begx = 0, [...]}
</pre>
<p>Ncurses thinks the screen is <code>43 + 1 = 44</code> lines tall, one less than BarnOwl&#8217;s 45. So the window, whose last row is 44, <em>is</em> off-screen. That triggers the aforementioned <code>wredrawln</code> crash.</p>
<h3>Diving into ncurses</h3>
<p>Clearly, this crash in ncurses shouldn&#8217;t happen in the first place. So let&#8217;s take a brief foray into the tangled forests of ncurses source. The function in question is implemented in <code>ncurses/base/lib_redrawln.c</code>. Here is a snippet of it:</p>
<blockquote><pre>
    end = beg + num;
    if (end > curscr->_maxy + 1)
        end = curscr->_maxy + 1;
    if (end > win->_maxy + 1)
        end = win->_maxy + 1;

    len = (win->_maxx + 1);
    if (len > (size_t) (curscr->_maxx + 1))
        len = (size_t) (curscr->_maxx + 1);
    len *= sizeof(curscr->_line[0].text[0]);

    for (i = beg; i < end; i++) {
        int crow = i + win->_begy;

        memset(curscr->_line[crow].text + win->_begx, 0, len);
        _nc_make_oldhash(crow);
    }
</pre>
</blockquote>
<p>Well, that&#8217;s odd. The library is already careful to clamp the dimensions to the screen size, so we shouldn&#8217;t have a problem. But <code>win->_beg{x,y}</code> is then added as an offset, and the checks ignore it. They only work for windows flush with the upper-left corner. The <code>typwin</code> is not, so they fail.</p>
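<p>To make the problem concrete, here is a simplified C++ model of just the row clamping. The <code>Win</code> struct and both functions are made up for illustration; this is neither the ncurses source nor necessarily the exact fix sent upstream. It uses the numbers from the gdb session above: a 44-line screen and an 8-line window starting at row 37.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the two WINDOW fields that matter here.
struct Win {
    int maxy;  // index of the last row, i.e. height - 1
    int begy;  // screen row of the window's top edge
};

// Same clamping as the quoted loop bounds: limit the row count by the screen
// height and the window height, then offset by begy. Returns one past the
// last screen row the loop would clear.
int buggy_end_row(const Win& curscr, const Win& win, int beg, int num) {
    int end = beg + num;
    if (end > curscr.maxy + 1) end = curscr.maxy + 1;
    if (end > win.maxy + 1) end = win.maxy + 1;
    return end + win.begy;
}

// A corrected clamp (a sketch): account for the window's offset before
// comparing against the screen height.
int fixed_end_row(const Win& curscr, const Win& win, int beg, int num) {
    int end = beg + num;
    if (end > curscr.maxy + 1 - win.begy) end = curscr.maxy + 1 - win.begy;
    if (end > win.maxy + 1) end = win.maxy + 1;
    return end + win.begy;
}
```

<p>With <code>curscr</code> at <code>_maxy = 43</code> and the window at <code>_begy = 37</code>, <code>_maxy = 7</code>, the buggy clamp lets the loop touch screen row 44, which does not exist. The corrected clamp stops at row 43.</p>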
<h3>Ncurses resizing</h3>
<p>Ncurses problems notwithstanding, we&#8217;re not finished. The main problem is yet to be addressed: how did BarnOwl and ncurses come to disagree on the window size? First, let&#8217;s discuss how ncurses handles resizes. There is an <code><a href="http://en.wikipedia.org/wiki/Ioctl">ioctl</a></code> for querying the size of a terminal, <code>TIOCGWINSZ</code> (as well as a handful of other mechanisms; the ncurses source code tries a ton of different ones). But we shouldn&#8217;t poll that value, so there is a <a href="http://en.wikipedia.org/wiki/Signal_(computing)">signal</a>, <a href="http://en.wikipedia.org/wiki/SIGWINCH"><code>SIGWINCH</code></a>, which informs a process that its terminal size has changed.</p>
<p>How do you react to a <code>SIGWINCH</code>? The traditional way was to do an <code>endwin</code>/<code>doupdate</code> pair. This is flickery, so there is an extension to curses, <code>resizeterm</code>, which performs the resize directly. (In ncurses, the former method is implemented using <code>resize_term</code>.) If you don&#8217;t register your own <code>SIGWINCH</code> handler, ncurses will do so on <code>initscr</code>, but then it&#8217;s very hard to atomically relayout and resize. (Ncurses doesn&#8217;t know enough about the application to handle resizes in full: a textbook violation of the <a href="http://en.wikipedia.org/wiki/End_to_end_principle">end-to-end principle</a>.)</p>
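<p>For the curious, here is a minimal sketch of the two POSIX pieces involved, assuming a Linux-like system. The first is a <code>SIGWINCH</code> handler that only sets a flag. Real work, such as re-querying the size and calling <code>resizeterm</code>, belongs in the main loop, because very little is async-signal-safe. The second is a <code>TIOCGWINSZ</code> query. The names <code>winch_pending</code>, <code>install_winch_handler</code>, and <code>query_terminal_size</code> are hypothetical, not BarnOwl&#8217;s.</p>

```cpp
#include <cassert>
#include <csignal>
#include <sys/ioctl.h>

// Hypothetical flag; the handler only records that a resize happened.
volatile std::sig_atomic_t winch_pending = 0;

extern "C" void on_sigwinch(int) {
    winch_pending = 1;  // async-signal-safe: just set the flag
}

// Install the handler with sigaction so it stays installed after it fires.
bool install_winch_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigwinch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted system calls where possible
    return sigaction(SIGWINCH, &sa, nullptr) == 0;
}

// Ask the terminal driver for the current size of the terminal on fd.
// Returns false if fd is not a terminal (e.g. output redirected to a file).
bool query_terminal_size(int fd, unsigned short* rows, unsigned short* cols) {
    struct winsize ws = {};
    if (ioctl(fd, TIOCGWINSZ, &ws) != 0) return false;
    *rows = ws.ws_row;
    *cols = ws.ws_col;
    return true;
}
```

<p>The event loop would check <code>winch_pending</code>, clear it, query the size once, and hand those same numbers to both the layout code and <code>resizeterm</code>.</p>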
<h3>Reproducing the bug</h3>
<p>Most people run BarnOwl inside a <a href="http://www.gnu.org/software/screen/">screen</a> session, so let&#8217;s (incorrectly) guess that screen was failing to send the signal on attach. Aided by the backtrace, we open BarnOwl in screen with a popup open. We then detach, resize the terminal, reattach, close the popup, and BOOM! It crashes! That was easy.</p>
<p>Unfortunately, none of the other developers were able to reproduce this. Also, this theory doesn&#8217;t account for ncurses and BarnOwl getting different values as the same signal handler updates both. Furthermore, the bug isn&#8217;t always hit, so there&#8217;s a race condition happening somewhere.</p>
<h3>Tracing</h3>
<p>Stepping back a bit, there are three sets of window sizes here. The first is the actual terminal size. Then, we have ncurses&#8217; record of the size (from which the size of the screen buffer is determined). Finally, we have BarnOwl&#8217;s own record of the size (from which the window layout is determined). <code>SIGWINCH</code> lets us synchronize the first with the other two. The crash we&#8217;re interested in comes from a discrepancy between the latter two. But BarnOwl is the only one calling <code>resizeterm</code>, so how can a discrepancy arise?</p>
<p>Quite conveniently, ncurses has a <a href="http://frank.harvard.edu/~coldwell/ncurses/ncurses-intro.html#debugging">trace</a> feature which is indispensable in debugging ncurses applications. By examining trace data, we notice something strange: when the bug is triggered, <em>BarnOwl runs its <code>SIGWINCH</code> handler once, but ncurses enters <code>resizeterm</code> twice.</em></p>
<p>So who made the second call? If you were reading carefully, you may already know the answer: <code>doupdate</code>. BarnOwl&#8217;s resize code not only called <code>resizeterm</code>, but also called <code>endwin</code>.</p>
<blockquote><pre>
  if (!isendwin()) {
    endwin();
  }

  /* get the new size */
  ioctl(STDIN_FILENO, TIOCGWINSZ, &#038;size);
  // [...]

#ifdef HAVE_RESIZETERM
  resizeterm(size.ws_row, size.ws_col);
#endif</pre>
</blockquote>
<p>Because we call an <code>endwin</code>, the next screen update (the <code>doupdate</code> at the end of the event loop iteration) triggers ncurses&#8217; internal resize routine. It then does its own <code>ioctl</code> and finds the size. Now, <em>if</em> we change terminal size twice in a row, such that the second happens while we are still reacting to the first, ncurses&#8217; later <code>ioctl</code> will return <em>different</em> values than BarnOwl&#8217;s query. As a result, we get our inconsistency.</p>
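<p>The race can be modeled without a terminal at all. In this hypothetical sketch, <code>FakeTerminal</code> stands in for the terminal driver and reports 45 lines and then 44, as if the terminal were resized twice in a row. If both BarnOwl and ncurses query the size, they can get two different heights. If the size is queried once and that value is used everywhere, they cannot.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the terminal driver: each query returns the next
// recorded height, mimicking two resizes in quick succession.
struct FakeTerminal {
    std::vector<int> heights;
    std::size_t next;

    explicit FakeTerminal(const std::vector<int>& h) : heights(h), next(0) {}

    int query() {  // plays the role of ioctl(TIOCGWINSZ)
        int h = heights[next];
        if (next + 1 < heights.size()) ++next;  // stay on the last size
        return h;
    }
};

struct Sizes {
    int app;     // the application's height (drives the window layout)
    int curses;  // the library's height (drives curscr)
};

// Buggy pattern: the application queries in its handler, and the library
// queries again later (as ncurses does in doupdate after endwin).
Sizes handle_resize_twice(FakeTerminal& term) {
    Sizes s;
    s.app = term.query();
    s.curses = term.query();
    return s;
}

// Fixed pattern: query once and give both sides the same value
// (in BarnOwl's case, pass it to resizeterm and never call endwin).
Sizes handle_resize_once(FakeTerminal& term) {
    int h = term.query();
    return Sizes{h, h};
}
```

<p>In the single-query version, the second resize still gets its own pass through the handler. The point is that each pass gives one consistent size to both the layout and <code>resizeterm</code>.</p>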
<p>Where does this quick resize come from? Screen has an option to display a status bar, and both Alex and I happen to have it enabled. Screen has (arguably) a bug where it resizes the window twice in quick succession, the second time to add the status bar. This also explains why the numbers of lines were off by exactly 1. And that is the final piece of the puzzle.</p>
<h3>Summary</h3>
<p>To help put these pieces together, I made a <a href="http://web.mit.edu/davidben/Public/barnowl-screen-race">diagram</a>. A fairly complex interaction between BarnOwl, ncurses, and screen was required here.</p>
<h4>Closing a popup window eventually calls <code>redrawwin</code> on a number of windows.</h4>
<p>This call is rather pointless. In BarnOwl 1.6, I landed some code to use the libpanel library to manage overlapping windows. The much safer <code>touchwin</code> can be used instead of <code>redrawwin</code>, and <code>owl_popwin_down</code> has no need to repaint the screen anyway.</p>
<h4><code>wredrawln</code> checks boundaries incorrectly</h4>
<p>I sent a patch upstream to correct this. It has been incorporated into <a href="ftp://invisible-island.net/ncurses/5.7/ncurses-5.7-20100501.patch.gz">ncurses-5.7-20100501</a>.</p>
<h4>Screen sends two <code>SIGWINCH</code>s in succession</h4>
<p>Screen&#8217;s code is really scary. I&#8217;m not touching that one.</p>
<h4>BarnOwl&#8217;s resize handler calls both <code>endwin</code> and <code>resizeterm</code></h4>
<p>This one is actually partially my fault. As part of the libpanel changes, I removed many of the extraneous explicit screen updates, including a <code>refresh</code> right after the <code>endwin</code>. This has now been changed. We no longer call <code>endwin</code> at all and require <code>resizeterm</code>. (As we already required ncurses for Unicode, this is not actually a change in dependencies.) We should also finally have flicker-free resizing now.</p>
<p>Had even one of these bugs not occurred, it is quite likely that we would never have noticed. But they all did and meshed together to form a crash. Some failures are simple and fairly easy to debug. Some are not. Sometimes, you have to dig quite far to understand the motions of all the players on the board.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/05/03/barnowl-ncurses-and-terminal-resizing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BarnOwl Locker Maintenance</title>
		<link>http://davidben.net/blog/2010/04/26/barnowl-locker-maintenance/</link>
		<comments>http://davidben.net/blog/2010/04/26/barnowl-locker-maintenance/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 06:59:20 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[barnowl]]></category>
		<category><![CDATA[deployment]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=385</guid>
		<description><![CDATA[So, Saturday, SIPB ran a hackathon, named Velocihacker following what is apparently becoming a trend. Last time, I ended up spending much of my time floundering over a Qt bug that apparently already got fixed the week before. This one was somewhat more productive. (Not to say the floundering wasn&#8217;t interesting or useful in itself. [...]]]></description>
				<content:encoded><![CDATA[<p>So, Saturday, <a href="http://sipb.mit.edu/">SIPB</a> ran a hackathon, named <a href="http://sipb.mit.edu/hackathons/velocihacker/">Velocihacker</a> following what is apparently becoming a <a href="http://sipb.mit.edu/hackathons/HackasaurusRex/">trend</a>. Last time, I ended up spending much of my time floundering over a Qt <a href="http://bugreports.qt.nokia.com/browse/QTBUG-8107?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel">bug</a> that apparently already got fixed the week before. This one was somewhat more productive. (Not to say the floundering wasn&#8217;t interesting or useful in itself. Source-diving a large project like Qt is always an adventure.)</p>
<p>As of this hackathon, I&#8217;m now a <a href="http://barnowl.mit.edu/">BarnOwl</a> maintainer. Yay! <a href="http://nelhage.com/">Nelson</a> helped me through the release <a href="http://github.com/nelhage/barnowl/commit/1ea0249f36575e3bee628ddba71ed6ebab247f03">process</a> for the <code>barnowl</code> locker. (The versions are, quite excitingly, parallel-installed.)</p>
<p>After that, I went to work on a problem we&#8217;ve been having. The locker contains builds for several different platforms, but they all need to show up under <code>bin</code>. To support this, <a href="http://en.wikipedia.org/wiki/Andrew_File_System">AFS</a> has some magic that translates <code>@sys</code> components in paths into your <dfn>sysname</dfn>, a string which identifies your platform and architecture. For instance, the SIPB-run <a href="http://web.mit.edu/linerva/www/">Linerva</a> dial-up has a sysname of <code>i386_deb50</code>. Lockers typically symlink <code>bin</code> to <code>arch/@sys</code> and everything magically just works.</p>
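<p>The substitution itself is simple. Here is a sketch of the idea as a pure string transformation. It assumes that only whole path components spelled exactly <code>@sys</code> are replaced. The <code>/mit/barnowl</code> path is just an example, and any additional behavior of a real AFS client is not modeled.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every path component that is exactly "@sys" with the sysname.
// Components that merely contain "@sys" are left alone.
std::string expand_at_sys(const std::string& path, const std::string& sysname) {
    std::string out;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t slash = path.find('/', start);
        if (slash == std::string::npos) slash = path.size();
        std::string component = path.substr(start, slash - start);
        out += (component == "@sys") ? sysname : component;
        if (slash < path.size()) out += '/';
        start = slash + 1;
    }
    return out;
}
```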
<p>Well, not quite. BarnOwl actually uses a <a href="http://github.com/davidben/barnowl-locker-bin/blob/8e3d60b1391c42dd303b557af74fcb76ea10a167/barnowl">wrapper script</a> which launches the executable at <code>BARNOWL_REAL</code> after modifying the environment as appropriate. But other than that, all is good.</p>
<p>Except that Athena groups adjacent Debian and Ubuntu releases under the same sysname. In particular, Debian Lenny and Ubuntu Karmic share the sysname suffix <code>deb50</code>, but they ship incompatible versions of libzephyr. Karmic includes zephyr 3 (with <a href="http://en.wikipedia.org/wiki/Soname">soname</a> <code>libzephyr.so.4</code>) while Lenny includes zephyr 2 (soname <code>libzephyr.so.3</code>). So, we spent much of the hackathon messing with the build scripts and wrapper scripts to work around this issue. We eventually decided to incorporate the zephyr soname into the version for the lenny/karmic sysnames and modified the <a href="http://github.com/nelhage/barnowl-locker-bin/blob/zephyr-soname/barnowl">wrapper script</a>.</p>
<p>A bit of work and fiddling later, and BarnOwl runs on both Karmic and Lenny out of the same sysname. It&#8217;s a fairly ugly hack, but it works. In the future, I hope to reorganize the BarnOwl locker to be a little cleaner; right now the folder hierarchy isn&#8217;t quite what I&#8217;d like. It&#8217;d also be nice to make this dispatch less of a hack, but I don&#8217;t know how to solve this problem in general. My preferred solution to these sorts of problems is to declare that we don&#8217;t care and parallel-install conflicting libraries as needed. However, zephyr requires a <code>zhm</code> to be running on the machine, so we have a dependency on the actual system; it&#8217;s no longer clear how to transform a global conflict into a local one. I guess we could try to run two <code>zhm</code>s, but that seems ridiculous.</p>
<p>I also fixed an annoying <a href="http://github.com/davidben/barnowl/commit/77e3cae25d46c5f34c9efb3cd2fbeebc9e5c0b0f">redraw/resize</a> issue, but that will likely have to wait until master is ready to take changes for the 1.7 release; 1.6 isn&#8217;t out yet. Hopefully I will also have time in the future to finish the massive graphics layer project I&#8217;ve been planning for many months now.</p>
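<p>Here is a sketch of the kind of dispatch involved: pick a build according to which libzephyr soname the machine has. The <code>arch/<var>sysname</var>.zephyr<var>N</var></code> layout and the <code>pick_build_dir</code> function are hypothetical. This is not the actual wrapper script linked above.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical layout: builds for a shared sysname live in per-soname
// directories such as arch/i386_deb50.zephyr4. `installed` lists the
// libzephyr sonames found on this machine.
std::string pick_build_dir(const std::string& sysname,
                           const std::vector<std::string>& installed) {
    // Prefer the zephyr 3 library (libzephyr.so.4) if both are present.
    for (const char* soname : {"libzephyr.so.4", "libzephyr.so.3"}) {
        for (const auto& lib : installed) {
            if (lib == soname) {
                // "libzephyr.so.4" -> ".zephyr4"
                return "arch/" + sysname + ".zephyr" + lib.substr(lib.size() - 1);
            }
        }
    }
    return "arch/" + sysname;  // fall back to the plain sysname directory
}
```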
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/04/26/barnowl-locker-maintenance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running KDE out of $HOME: D-Bus</title>
		<link>http://davidben.net/blog/2010/04/19/running-kde-dbus/</link>
		<comments>http://davidben.net/blog/2010/04/19/running-kde-dbus/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 07:09:48 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Packaging]]></category>
		<category><![CDATA[kde]]></category>
		<category><![CDATA[package management]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=372</guid>
		<description><![CDATA[A short continuation of the previous post on KDEDIRS. So, after the KDEDIRS game, I was able to get most of the old programs running. But I had some trouble with the newer ones. KDE is slowly moving all their PIM applications to this framework called Akonadi. As of 4.4, the contacts were maintained by [...]]]></description>
				<content:encoded><![CDATA[<p>A short continuation of the previous post on <a href="/blog/2010/04/12/running-kde-subtle-changes/"><code>KDEDIRS</code></a>.</p>
<p>So, after the <code>KDEDIRS</code> game, I was able to get most of the old programs running. But I had some trouble with the newer ones. KDE is slowly moving all their <abbr title="Personal information management">PIM</abbr> applications to this framework called <a href="http://pim.kde.org/akonadi/">Akonadi</a>. As of 4.4, the contacts were maintained by Akonadi.</p>
<p>Even though I don&#8217;t use much of KDE PIM, the pieces notoriously all integrate with each other, and KDE would constantly try to launch the Akonadi server, even at login (I have since disabled the offending <a href="http://userbase.kde.org/KRunner">KRunner</a> plugin). Each time it would fail, complaining that the <a href="http://www.freedesktop.org/wiki/Software/dbus">D-Bus</a> services were not configured, among other problems.</p>
<p>D-Bus is the IPC mechanism behind the modern free desktop. It was inspired by KDE3&#8217;s old <a href="http://developer.kde.org/documentation/other/dcop.html">DCOP</a> system and GNOME&#8217;s <a href="http://en.wikipedia.org/wiki/CORBA">CORBA</a> implementation, and has since replaced both of its predecessors. Now, D-Bus has a concept of <a href="http://dbus.freedesktop.org/doc/dbus-specification.html#message-bus-starting-services"><dfn>services</dfn></a>: a service file tells D-Bus which program to launch automatically when someone attempts to connect to a name that nobody owns yet.</p>
<p>While the services for the system bus are a hopeless cause for me, I should be able to influence my session bus as I wish. <code>dbus-daemon</code>&#8217;s man page does claim to follow the <a href="http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html">XDG Base Directory Specification</a>, so everything <em>should</em> Just Work: I set <code>XDG_DATA_DIRS</code>, and D-Bus picks up my session files.</p>
<p>But it doesn&#8217;t. Inspecting the process&#8217;s environment (via <code>/proc/<var>PID</var>/environ</code>) reveals that my changes don&#8217;t take effect.</p>
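<p>For reference, <code>/proc/<var>PID</var>/environ</code> holds the environment a process was started with, as <code>NAME=value</code> entries separated by NUL bytes. Later <code>setenv</code> calls inside the process generally do not show up there. Here is a small C++ sketch that reads and parses it. The function names are made up for illustration, and reading another process&#8217;s file requires permission.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <map>
#include <string>

// Parse a NUL-separated NAME=value buffer into a map.
std::map<std::string, std::string> parse_environ(const std::string& data) {
    std::map<std::string, std::string> env;
    std::size_t start = 0;
    while (start < data.size()) {
        std::size_t end = data.find('\0', start);
        if (end == std::string::npos) end = data.size();
        std::string entry = data.substr(start, end - start);
        std::size_t eq = entry.find('=');
        if (eq != std::string::npos)
            env[entry.substr(0, eq)] = entry.substr(eq + 1);
        start = end + 1;
    }
    return env;
}

// Read the raw environ file of a process; returns "" if it cannot be opened.
std::string read_environ(int pid) {
    std::ifstream in("/proc/" + std::to_string(pid) + "/environ",
                     std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

<p>For example, <code>parse_environ(read_environ(pid))["XDG_DATA_DIRS"]</code> shows what the daemon actually saw.</p>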
<p>The problem: D-Bus is not launched by my <code>.xsession</code>, where all the magic happens. D-Bus is launched by the login manager! (Well, indirectly.) In <code>/etc/X11/Xsession.d/</code> are files that get sourced by your login session. In particular, <code>/etc/X11/Xsession.d/75dbus_dbus-launch</code> puts <code>dbus-launch</code> into the startup sequence before I ever get to do anything. It is conditional on a <code>STARTDBUS</code> variable, but I am unaware of any way to modify that for my session alone. Removing this script or otherwise messing with it is not fair game; the goal is that I should be able to log back into system KDE safely having not modified any files it cares about.</p>
<p>So, lacking a better way to do this, my KDE 4.4 startup script contains this terrible little hack:</p>
<pre># Kill D-Bus
killall -u "$USER" dbus-daemon

# Launch KDE
exec dbus-launch --exit-with-session startkde</pre>
<p>At some point when I have time, I&#8217;ll investigate how <a href="http://nixos.org/">NixOS</a> manages this. I imagine they patch the display manager or session scripts at some level.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/04/19/running-kde-dbus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running KDE out of $HOME: Subtle effects of path changes</title>
		<link>http://davidben.net/blog/2010/04/12/running-kde-subtle-changes/</link>
		<comments>http://davidben.net/blog/2010/04/12/running-kde-subtle-changes/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 07:13:11 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Packaging]]></category>
		<category><![CDATA[kde]]></category>
		<category><![CDATA[package management]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=347</guid>
		<description><![CDATA[Continuing from the previous installment on lockers. So, you would think that, with the previous path setup alone, things would Just Work. Of course, there&#8217;s a minor issue needing a newer Qt than Ubuntu provides in Karmic, but that&#8217;s easy to fix with another locker. In fact, one could even download the LGPL SDK installer [...]]]></description>
				<content:encoded><![CDATA[<p>Continuing from the previous installment on <a href="/blog/2010/03/29/running-kde-lockers/">lockers</a>.</p>
<p>So, you would think that, with the previous path setup alone, things would Just Work. Of course, there&#8217;s a minor issue needing a newer <a href="http://qt.nokia.com/">Qt</a> than Ubuntu provides in Karmic, but that&#8217;s easy to fix with another locker. In fact, one could even download the LGPL <a href="http://qt.nokia.com/downloads">SDK installer</a> for Linux and use the folder as-is as a locker.</p>
<p>And, indeed, this mostly worked. However, I did not simply want to run my own build of a desktop. I wanted my original software to still work, but use the newer libraries. There, I ran into a problem. If I tried to launch <code>yakuake</code>, a terminal that I like to use for things like <a href="http://sipb.mit.edu/doc/zephyr/">zephyr</a>, I got this strange error:</p>
<div id="attachment_348" class="wp-caption aligncenter" style="width: 310px"><a href="/blog/wp-content/uploads/2010/04/yakuake.png"><img src="/blog/wp-content/uploads/2010/04/yakuake-300x92.png" alt="Yakuake&#039;s error message" title="Cannot Load Skin" width="300" height="92" class="size-medium wp-image-348" /></a><p class="wp-caption-text">I'm sorry Dave. I'm afraid I can't do that.</p></div>
<p>Well, that&#8217;s a bother. To understand what happened, let&#8217;s look at how KDE applications locate files.</p>
<p>At the heart of the core kdelibs library is <code>KStandardDirs</code>. (KDE&#8217;s API pages are down right now, so I shall direct you to this <a href="http://www.purinchu.net/kdelibs-apidocs/kdecore/html/classKStandardDirs.html">mirror</a> a developer set up.) When a KDE application wishes to locate a file, it does not hard-code a path or use a compiled-in <code><var>PREFIX</var></code> value. Instead, it asks KDECore to find it. You provide a resource type (such as <code>data</code>, <code>lib</code>, or <code>config</code>) and a file path, and <code>KStandardDirs</code> then locates the file for you.</p>
<p>Reading down the docs a bit, we see that the class works by checking a set of registered suffixes for the resource type against a set of roots. (It also does some other magic like appending the application name for some resources.) These roots include the compiled prefix and a colon-separated variable <code>KDEDIRS</code>. This prefix is the prefix <em><code>kdelibs</code></em> was compiled with, not the application. As I was using my own KDE, of course it could not find Yakuake&#8217;s files. Aha! So I add <code>/usr</code> to <code>KDEDIRS</code> and everything works.</p>
<div id="attachment_348" class="wp-caption aligncenter" style="width: 310px"><a href="/blog/wp-content/uploads/2010/04/yakuake.png"><img src="/blog/wp-content/uploads/2010/04/yakuake-300x92.png" alt="Yakuake&#039;s error message" title="Cannot Load Skin" width="300" height="92" class="size-medium wp-image-348" /></a><p class="wp-caption-text">I'm sorry Dave. I'm afraid I <em>still</em> can't do that.</p></div>
<p>Bah! What&#8217;s going on?</p>
<p>Well, if we look at the set of <a href="http://www.purinchu.net/kdelibs-apidocs/kdecore/html/classKStandardDirs.html#3f2eef83977134adf26ea2a6aacbdc59">prefixes</a>, the standard suffix for data is <code>share/apps</code>. This fairly KDE-specific namespace in a global install gets stuffed under <code>/usr/share/apps</code>, which is offensive to distributions, so they like to redirect it to <code>/usr/share/kde4/apps</code>. A few other directories get a similar treatment. In Ubuntu&#8217;s case, a snippet from <code>/usr/share/pkg-kde-tools/makefiles/1/variables.mk</code> reveals the cause:</p>
<blockquote><pre>

# Standard Debian KDE 4 cmake flags
DEB_CMAKE_KDE4_FLAGS += \
        -DCMAKE_BUILD_TYPE=Debian \
        -DKDE4_BUILD_TESTS=false \
        -DKDE_DISTRIBUTION_TEXT="Kubuntu packages" \
        -DCMAKE_SKIP_RPATH=true \
        -DKDE4_USE_ALWAYS_FULL_RPATH=false \
        -DCONFIG_INSTALL_DIR=$(DEB_CONFIG_INSTALL_DIR) \
        -DDATA_INSTALL_DIR=/usr/share/kde4/apps \
        -DHTML_INSTALL_DIR=/usr/share/doc/kde/HTML \
        -DKCFG_INSTALL_DIR=/usr/share/kde4/config.kcfg \
        -DLIB_INSTALL_DIR=/usr/lib \
        -DSYSCONF_INSTALL_DIR=/etc
</pre>
</blockquote>
<p>My kdelibs, however, were compiled directly from upstream sources (in fact, I compiled them from the 4.4 branch via <code>git-svn</code> and hack on them myself). Moreover, these settings only change a compiled-in install location; they do not change the standard suffixes. (Kubuntu also carries a patch that changes the system-wide <code>FindKDE4Internal.cmake</code>. It may actually register suffixes. I&#8217;m not sure.) When using the system kdelibs, these compiled values do their job and everything works fine. However, this makes the system KDE files special in that they are only a priori accessible via the <em>system</em> kdelibs. While I can inform KDE of the system root, the suffix is wrong.</p>
<p>So, I add a little hack. I have yet another locker, <code>kde-kubuntu-fake</code>, which contains a fake additional root for each of those directories. It contains merely a symlink farm:</p>
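<p>To see why adding <code>/usr</code> to <code>KDEDIRS</code> is not enough, here is a simplified, hypothetical model of the lookup. It is not <code>KStandardDirs</code>&#8217; real code. Each root is joined with the resource type&#8217;s suffix, so <code>/usr</code> plus <code>share/apps</code> yields <code>/usr/share/apps</code> and never Kubuntu&#8217;s <code>/usr/share/kde4/apps</code>. The sketch assumes <code>KDEDIRS</code> entries are searched before kdelibs&#8217; compiled-in prefix, and it ignores the user&#8217;s local directory.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model: split the colon-separated KDEDIRS, append kdelibs'
// compiled-in prefix, and join each root with suffix and relative path.
std::vector<std::string> candidate_paths(const std::string& kdedirs,
                                         const std::string& kdelibs_prefix,
                                         const std::string& suffix,
                                         const std::string& relpath) {
    std::vector<std::string> roots;
    auto add_root = [&roots](const std::string& r) {
        // Skip empty entries and duplicates.
        if (!r.empty() && std::find(roots.begin(), roots.end(), r) == roots.end())
            roots.push_back(r);
    };
    std::size_t start = 0;
    while (start <= kdedirs.size()) {
        std::size_t colon = kdedirs.find(':', start);
        if (colon == std::string::npos) colon = kdedirs.size();
        add_root(kdedirs.substr(start, colon - start));
        start = colon + 1;
    }
    add_root(kdelibs_prefix);  // assumed to be searched last

    std::vector<std::string> candidates;
    for (const auto& root : roots)
        candidates.push_back(root + "/" + suffix + "/" + relpath);
    return candidates;
}
```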
<pre>kde-kubuntu-fake
`-- share
    |-- apps -> /usr/share/kde4/apps/
    |-- config -> /usr/share/kde4/config
    |-- config.kcfg -> /usr/share/kde4/config.kcfg
    `-- doc
        `-- HTML -> /usr/share/doc/kde4/HTML
</pre>
<p>which also gets added to my <code>KDEDIRS</code>. Finally, after all that work, I can launch Yakuake.</p>
<p><img src="/blog/wp-content/uploads/2010/04/yakuake-yay.png" alt="Successful launch!" title="Success!" width="194" height="72" class="size-full wp-image-360" /></p>
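<p>Building such a farm is easy to script. Here is a sketch using C++17 <code>std::filesystem</code>. The <code>make_symlink_farm</code> helper is hypothetical, and the entries passed to it would mirror the listing above.</p>

```cpp
#include <cassert>
#include <filesystem>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: create each symlink root/rel -> target, creating
// parent directories as needed. Targets need not exist.
void make_symlink_farm(const fs::path& root,
                       const std::vector<std::pair<fs::path, fs::path>>& links) {
    for (const auto& link : links) {
        fs::path where = root / link.first;
        fs::create_directories(where.parent_path());
        // Replace a stale link so the farm can be rebuilt in place.
        if (fs::is_symlink(fs::symlink_status(where))) fs::remove(where);
        fs::create_directory_symlink(link.second, where);
    }
}
```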
<p>So, hopefully this will help convince you that random distribution patches like this are <em>sketchy</em>. Admittedly, given the mistake of trying to mush all packages into one single hierarchy under <code>/usr</code>, the namespace poisoning of <code>/usr/share/apps</code> is a little obnoxious, and this is a defensible change. Still, such changes hurt compatibility between distributions and upstream, and they make it very hard for a unified free desktop platform to ever emerge from the tangled mess we have now.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/04/12/running-kde-subtle-changes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with (somewhat pointless) type-safe offsets in C++</title>
		<link>http://davidben.net/blog/2010/04/05/type-safe-offsets/</link>
		<comments>http://davidben.net/blog/2010/04/05/type-safe-offsets/#comments</comments>
		<pubDate>Mon, 05 Apr 2010 07:02:20 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[c++]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=327</guid>
		<description><![CDATA[C++ is a hideously complicated language with many little-known features and details. One of these features is member pointers, which we shall play a few games with here. The feature itself is largely useless, but quite amusing. C++ allows pointers to a member of a class. This can be a pointer to a member function [...]]]></description>
				<content:encoded><![CDATA[<p>C++ is a hideously complicated language with many little-known features and details. One of these features is member pointers, which we shall play a few games with here. The feature itself is largely useless, but quite amusing.</p>
<p>C++ allows pointers to a member of a class. This can be a pointer to a member function (in which case you get a delegate) or a pointer to a data member. Now, <a href="http://www.codeproject.com/kb/cpp/FastDelegate.aspx">for reasons I won&#8217;t go into</a>, member function pointers are incredibly complex. Member <em>data</em> pointers, however, are fairly simple; they&#8217;re just type-safe offsets with strange syntax (and limited use).</p>
<p>The syntax is as follows:</p>
<p>
<pre>struct A {
    int a;
    int b;
};

int main() {
    A object;
    int A::* some_field = &#038;A::b;
    object.*some_field = 27;
}</pre>
</p>
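<p>The sketch below shows why a member pointer behaves like a "type-safe offset": the caller chooses which field to operate on at run time. The helper names <code>sum_field</code> and <code>set_field</code> are hypothetical and come from no library.</p>

```cpp
#include <cassert>
#include <vector>

struct A {
    int a;
    int b;
};

// Hypothetical helper: sums whichever int field of A the caller selects.
// The member pointer acts as a typed offset chosen at run time.
int sum_field(const std::vector<A>& items, int A::* field) {
    assert(field != nullptr);  // applying a null member pointer is undefined
    int total = 0;
    for (const A& item : items) {
        total += item.*field;  // .* applies the offset to a concrete object
    }
    return total;
}

// Hypothetical helper: ->* is the pointer-to-object counterpart of .*
void set_field(A* object, int A::* field, int value) {
    object->*field = value;
}
```

<p>Because the type is <code>int A::*</code>, passing <code>&#038;A::a</code> or <code>&#038;A::b</code> compiles. A member of another class, or a member of another type, is rejected at compile time.</p>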
<p>The declaration states that <code>some_field</code> is a pointer to a member of class <code>A</code> with type <code>int</code>. Underneath, this is essentially an <code>offsetof</code>.</p>
<p>The C++ language doesn&#8217;t allow you to easily cast these into <code>long</code>s, but you can use the usual <code>offsetof</code> pointer trick to do it.</p>
<p>
<pre>template&lt;class C, class V&gt;
inline unsigned long memptr_to_int(V C::*ptr) {
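    // Classic offsetof-style trick: formally undefined behavior (it dereferences
    // a null pointer), though in practice mainstream compilers accept it.
    return (unsigned long) &#038;(((C *)0)-&gt;*ptr);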
}
// ...
int A::*some_field = &#038;A::a;
unsigned long offset = memptr_to_int(some_field);</pre>
</p>
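<p>To avoid dereferencing a null pointer, one can instead apply the member pointer to a real object and subtract the two addresses. This is a minimal sketch that assumes the class is default-constructible and cheap to construct. <code>memptr_offset</code> and <code>Point</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>

struct Point {
    int x;
    int y;
    double weight;
};

// Hypothetical alternative to the null-pointer trick: apply the member
// pointer to a real, value-initialized object and subtract addresses.
// Assumes C is default-constructible.
template <class C, class V>
std::ptrdiff_t memptr_offset(V C::* ptr) {
    C object{};
    const char* base = reinterpret_cast<const char*>(&object);
    const char* member = reinterpret_cast<const char*>(&(object.*ptr));
    std::ptrdiff_t offset = member - base;
    assert(offset >= 0 && offset < static_cast<std::ptrdiff_t>(sizeof(C)));
    return offset;
}
```

<p>For a standard-layout struct like <code>Point</code>, the result should match the <code>offsetof</code> macro.</p>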
<p>The compiler will also allow you to change either of the types embedded in the pointer with a <code>reinterpret_cast</code>, for instance:</p>
<p>
<pre>int A::* foo = &#038;A::a;
char B::* bar = reinterpret_cast&lt;char B::*&gt;(foo);  // B is some other class</pre>
</p>
<p>Using this, we can convert any constant integer into a member pointer (strictly, the resulting byte offset is a multiple of <code>sizeof(int)</code>):</p>
<p>
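<pre>template &lt;int N&gt; struct offset_struct {
    int pad[N];  // N must be at least 1; zero-length arrays are ill-formed
    int field;   // sits N * sizeof(int) bytes in, assuming no padding
private:
    offset_struct() {} // forbid actual construction
};

template&lt;class C, class V, int N&gt;
inline V C::* const_to_memptr() {
    // No parentheses around the qualified-id: &#038;(X::m) does not form a member pointer.
    return reinterpret_cast&lt;V C::*&gt;(&#038;offset_struct&lt;N&gt;::field);
}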

// ...
int A::*some_field = const_to_memptr&lt;A, int, 5&gt;();</pre>
</p>
<p>Sadly, I have not yet found a way (short of terrible tricks with <code>union</code>s) to convert an arbitrary integer variable into such an offset. So, if you need offsets which (almost) prevent you from pointing to any undefined fields in a class, C++ data member pointers are what you want. <img src='/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>In practice, this construct sees considerably more use as a template argument, or something similarly inlined. The actual pointers tend not to be created. For instance, in the <a href="http://www.boost.org/doc/libs/1_42_0/libs/python/doc/index.html">Boost.Python</a> library, they&#8217;re used to create <a href="http://www.boost.org/doc/libs/1_42_0/libs/python/doc/tutorial/doc/html/python/exposing.html#python.class_data_members">properties</a> out of fields.</p>
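<p>Here is a sketch of the template-argument style. The field is fixed at compile time, so no member pointer object exists at run time. It only illustrates the idea and is not Boost.Python&#8217;s actual API. <code>FieldProperty</code> and <code>Person</code> are hypothetical names.</p>

```cpp
#include <string>

struct Person {
    std::string name;
    int age;
};

// Hypothetical property wrapper: the field is a compile-time template
// argument, so no member pointer object exists at run time.
template <class C, class V, V C::* Field>
struct FieldProperty {
    static const V& get(const C& object) { return object.*Field; }
    static void set(C& object, const V& value) { object.*Field = value; }
};

using AgeProperty = FieldProperty<Person, int, &Person::age>;
using NameProperty = FieldProperty<Person, std::string, &Person::name>;
```

<p>Since <code>Field</code> is a constant, the compiler can fold <code>object.*Field</code> into an ordinary member access.</p>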
<p>Isn&#8217;t C++ fun?</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/04/05/type-safe-offsets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Running KDE out of $HOME: Lockers</title>
		<link>http://davidben.net/blog/2010/03/29/running-kde-lockers/</link>
		<comments>http://davidben.net/blog/2010/03/29/running-kde-lockers/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 05:09:08 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Packaging]]></category>
		<category><![CDATA[kde]]></category>
		<category><![CDATA[package management]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=287</guid>
		<description><![CDATA[Spring break is over. As expected, I didn&#8217;t manage to finish most of my projects for the week, but I did manage one of them: my laptop is is now running an install of KDE 4.4 parallel to the system 4.3 provided by Kubuntu. Why I did this was described previously. Actually managing it was [...]]]></description>
				<content:encoded><![CDATA[<p>Spring break is over. As expected, I didn&#8217;t manage to finish most of my projects for the week, but I did manage one of them: my laptop is is now running an install of KDE 4.4 parallel to the system 4.3 provided by Kubuntu.</p>
<div id="attachment_288" class="wp-caption aligncenter" style="width: 310px"><a href="/blog/wp-content/uploads/2010/03/kde-4.4.png"><img src="/blog/wp-content/uploads/2010/03/kde-4.4-300x187.png" alt="KDE 4.4, out of home directory" title="KDE 4.4, out of home directory" width="300" height="187" class="size-medium wp-image-288" /></a><p class="wp-caption-text">A parallel KDE 4.4, with the, uh, most important apps in KDE: KNetWalk and Potato Guy</p></div>
<p>Why I did this was described <a href="http://davidben.scripts.mit.edu/blog/2010/03/22/parallel-installation/">previously</a>. Actually managing it was not that simple. We do not live in a perfect world, and indeed it&#8217;s silly to expect all of KDE to run without any root activity &#8212; any setuid portions, or global dbus configuration, for instance. Still, I wanted to try. For this and the next few posts, I&#8217;ll talk about the setup.</p>
<p>I have been managing software out of my home directory for quite some time now. To that end, I&#8217;ve built up a collection of functions in <a href="http://zsh.sourceforge.net/">zsh</a>, my primary shell. (There&#8217;s no particular reason why they&#8217;re in zsh; I just prefer it to bash.) They are inspired by the software lockers of MIT&#8217;s <a href="http://ist.mit.edu/services/athena">Athena</a> system and the runtime setup of <a href="http://zero-install.sourceforge.net/">Zero Install</a>. At some point, I expect this system will converge to something that smells very much like part of Zero Install.</p>
<p>Any time I need some software which Ubuntu does not provide, I build it myself (or, if I&#8217;m lucky, find a binary to unpack) isolated somewhere in my home directory. The convention so far has been <code>~/pkg/<var>PKGNAME</var></code> for random software or <code>~/proj-build/<var>PROJECTNAME</var></code> for things I&#8217;m working on, but I&#8217;m not particularly happy with this naming scheme. (It came about mostly by accident. I&#8217;ll likely move everything into <code>~/pkg</code> or something.) Each locker contains roughly a standard UNIX directory tree.</p>
<p>A set of (fairly hacky) shell functions then inject subdirectories, as appropriate, into the environment when a locker is added. Unlike Zero Install, the variables are not specified by the locker. Instead, the shell script looks for, e.g., <code>bin</code>, and adds it to, e.g., <code>PATH</code>, if it exists. This was mostly done out of laziness. At some point, variable choices will become the locker&#8217;s business. The variables currently set are:</p>
<ul>
<li><code>PATH</code></li>
<li><code>LD_LIBRARY_PATH</code></li>
<li><code>PKG_CONFIG_PATH</code></li>
<li><code>MANPATH</code></li>
<li><code>PYTHONPATH</code></li>
<li><code>XDG_DATA_DIRS</code></li>
<li><code>XDG_CONFIG_DIRS</code></li>
<li><code>INCLUDEPATH</code></li>
<li><code>CMAKE_PREFIX_PATH</code></li>
</ul>
<p>There are two commands, borrowed from Athena, to add a locker to an environment. The first is <code>dir_run</code> which runs a command with the given locker injected. The second is <code>dir_add</code> which injects a locker into your current environment. I primarily use <code>dir_run</code> with fancy completion scripts, but my dot files <code>dir_add</code> any lockers which I use often or want injected into my standard environment.</p>
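<p>The actual scripts are zsh. The C++ sketch below only models the injection step: for each conventional subdirectory that exists in a locker, it prepends that directory to a colon-separated variable. The table that maps subdirectories to variables is an assumption and is not taken from <code>lockers.zsh</code>. <code>PYTHONPATH</code> and <code>CMAKE_PREFIX_PATH</code> are left out because their paths depend on further details. <code>prepend_path_var</code> and <code>inject_locker</code> are hypothetical names, and the code relies on POSIX <code>setenv</code>.</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Prepends dir to a colon-separated variable such as PATH (POSIX setenv).
void prepend_path_var(const std::string& var, const std::string& dir) {
    assert(!var.empty());
    const char* old = std::getenv(var.c_str());
    std::string value = dir;
    if (old != nullptr && *old != '\0') {
        value += ':';
        value += old;
    }
    setenv(var.c_str(), value.c_str(), 1);
}

// Hypothetical dir_add-style injection. The subdirectory-to-variable table
// is an assumption for illustration, not the contents of lockers.zsh.
// Returns how many variables were modified.
int inject_locker(const fs::path& locker) {
    static const std::vector<std::pair<std::string, std::string>> mapping = {
        {"bin", "PATH"},
        {"lib", "LD_LIBRARY_PATH"},
        {"lib/pkgconfig", "PKG_CONFIG_PATH"},
        {"share/man", "MANPATH"},
        {"share", "XDG_DATA_DIRS"},
        {"etc/xdg", "XDG_CONFIG_DIRS"},
        {"include", "INCLUDEPATH"},
    };
    int modified = 0;
    for (const auto& [subdir, var] : mapping) {
        const fs::path candidate = locker / subdir;
        if (fs::is_directory(candidate)) {  // only inject what exists
            prepend_path_var(var, candidate.string());
            ++modified;
        }
    }
    return modified;
}
```

<p>In this model, <code>dir_add</code> corresponds to calling <code>inject_locker</code> in the current process. <code>dir_run</code> would do the same in a child process just before executing the command, leaving the parent&#8217;s environment untouched.</p>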
<p>So far, this setup has allowed me to run my system <a href="http://projects.gnome.org/evince/"><code>evince</code></a> and <a href="http://okular.kde.org/"><code>okular</code></a> against a development build of <a href="http://poppler.freedesktop.org/"><code>poppler</code></a> when I <a href="http://cgit.freedesktop.org/poppler/poppler/log/?qt=author&#038;q=David+Benjamin">hack</a> on it. It&#8217;s allowed me to maintain a local build of <a href="http://git-scm.com/">git</a>. It&#8217;s allowed me to parallel-install multiple snapshots of <a href="http://build.chromium.org/buildbot/snapshots/chromium-rel-linux-64/">Chromium</a>. It&#8217;s even allowed me to, via <code>dir_add</code>, replace my system&#8217;s PyKerberos, when a <a href="http://trac.calendarserver.org/ticket/355">bug</a> in the packaged version prevented system software from using it. And, indeed, it allows me to run KDE out of my home directory.</p>
<p>Of course, building KDE for this wasn&#8217;t simply a matter of stuffing things into a folder and launching it. There were numerous problems along the way which I had to address, which I&#8217;ll describe in later posts.</p>
<p>If anyone wants my hacky <a href="http://web.mit.edu/davidben/Public/lockers.zsh">scripts</a>, they can be found in my Athena Public. <em>A disclaimer: they are hairy and very much need a cleanup.</em> Also, they might need some tweaks to work well in bash; zsh lets me be lazy about <a href="http://zsh.sourceforge.net/Doc/Release/Expansion.html#SEC68">quoting</a> arguments. All that said, they are sufficient for my needs and, despite being far from a true package management system, I think they are superior to anything <code>apt</code> or <code>dpkg</code> offers when it comes to maintaining different software configurations in parallel.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/03/29/running-kde-lockers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel installation</title>
		<link>http://davidben.net/blog/2010/03/22/parallel-installation/</link>
		<comments>http://davidben.net/blog/2010/03/22/parallel-installation/#comments</comments>
		<pubDate>Mon, 22 Mar 2010 08:37:50 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Packaging]]></category>
		<category><![CDATA[package management]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=226</guid>
		<description><![CDATA[It&#8217;s no secret that I am unhappy with package management on Linux. One of these days, I&#8217;ll gather coherent enough thoughts on how things should work. In the meantime, here&#8217;s a glimpse at one of the biggest problems today. If you look at the package management stacks in use on Linux today, be it apt/dpkg [...]]]></description>
				<content:encoded><![CDATA[<p>It&#8217;s no secret that I am unhappy with package management on Linux. One of these days, I&#8217;ll gather coherent enough thoughts on how things should work. In the meantime, here&#8217;s a glimpse at one of the biggest problems today.</p>
<p>If you look at the package management stacks in use on Linux today, be it apt/dpkg or yum/rpm or whatever, they share a fundamental assumption: there will only ever be one version of any package on the system. I argue that this mode of thinking is simply <em>incorrect</em> for a package manager on the free desktop. We need a package manager which fundamentally assumes parallel installation of packages. While correct parallel-install semantics are difficult, the flaws it fixes are well worth the effort.</p>
<h3>Testing</h3>
<p>One important use for parallel installation is testing. Users on any platform run varied configurations, so multiple configurations should be tested. One possibility is to use a separate machine, but this is painful. Indeed, Microsoft has not solved this problem; web designers wishing to run IE 6 and 7 concurrently were advised to use a <a href="http://blogs.msdn.com/cwilso/archive/2006/02/01/522281.aspx">virtual machine</a>.</p>
<p>When I used Windows, I used <a href="http://portableapps.com/apps/internet/firefox_portable">Portable Firefox</a> for this. On Linux, I similarly download the <a href="http://www.mozilla.com/en-US/firefox/all.html">official tarballs</a> and run them out of my home directory, taking care not to eat my profile. But why should I manually manage this when I have a state-of-the-art (if the rumors on every Ubuntu advocate&#8217;s top 10 lists are true) package manager on my system!</p>
<h3>Safe fall-backs</h3>
<p>Related to the needs of testing environments is the ability to fall back when software breaks. For instance, my current browser of choice is <a href="http://www.google.com/chrome">Chrome</a>. Now, Google provides an apt repository for Chrome, and yet I use the Chromium <a href="http://build.chromium.org/buildbot/snapshots/chromium-rel-linux-64/">nightlies</a>. The apt repositories force me into a single-install setup. Chrome is a very fast-moving target, and things sometimes break. Yet, I appreciate the movement, as features I require, such as client-side certificates, get added quickly. There is a simple solution with parallel installation: I keep around my old version when updating to a new build. If the new one proves unstable, I just revert to using the old one.</p>
<p>(These days things are less unstable than before. Should YouTube&#8217;s <a href="http://www.youtube.com/html5">HTML5 video</a> fix its quality problems, I&#8217;ll likely start using Chrome proper. Of course, my Chromium setup still parallel-installs, so I can roll back at will.)</p>
<p>But why bother? I can just uninstall the new version and install the old version. In practice, this doesn&#8217;t work. A month or so ago, <a href="http://kde.org/announcements/4.4/">KDE 4.4</a> was released. I, being the avid KDE user that I am, was eager to try it. Well, Kubuntu offered backported <a href="http://www.kubuntu.org/news/kde-sc-4.4">packages</a>&#8230; why not? I can always go back to 4.3 if I want to, right? To make a long story short, no. When I rolled back, dpkg and apt got woefully confused and I ended up reinstalling most of the software on my machine. I am now in the process of creating a KDE 4.4 to run out of my home directory. When the last few remaining kinks are ironed out, I&#8217;ll describe the setup.</p>
<p>As they say, the best code is code you don&#8217;t have to write. Likewise, the best rollback procedure is one you don&#8217;t have to perform.</p>
<h3>Incompatibilities</h3>
<p>Finally, parallel installation acknowledges a fundamental fact of library compatibility: no two different versions of a piece of software are completely compatible. Distributions love to force every package to use the same copy of every library. Most of the time, this is a sound and sensible goal. But it often falls short of reality. Even if the author of a library is very careful to keep the API and ABI stable, programs may depend on subtle behavior.</p>
<p>Take, for instance, this hypothetical situation. libfoo has a bug which causes some functionality of bar to fail. Bar eventually diagnoses this and perhaps even sends a patch to libfoo. In the meantime, bar should still work, so bar adds a workaround for this bug. This is, sadly, not compatible with the fix, so the workaround is conditionalized on libfoo&#8217;s version. Now, a distributor comes along, packages an older libfoo for stability, but backports the fix. Now bar fails to work on that distribution. Think I&#8217;m exaggerating? Search for &#8220;Debian&#8221; on these Eclipse <a href="http://www.gentleware.com/fileadmin/media/archives/userguides/apolloforeclipse_installguide/ch02s04.html">release notes</a>. Indeed, their solution is to install a different version of GTK+. Would it not be better if we could parallel install GTK+ and only use this specially crafted one for Eclipse?</p>
<p>Sometimes a package may even be incompatible with itself. SQLite has countless incompatible <a href="http://www.kdedevelopers.org/node/4156">build options</a>. The only possible solution is for every program to bundle its own SQLite in parallel.</p>
<h3>Conclusion</h3>
<p>Linux package managers of today are inadequate for supporting a platform for developers, content producers, and users. We need package managers which allow as much of the system as possible to be parallel-installed to support the evolving, disorganized nature of the Linux desktop.</p>
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/03/22/parallel-installation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An almost-constant factor</title>
		<link>http://davidben.net/blog/2010/03/14/an-almost-constant-factor/</link>
		<comments>http://davidben.net/blog/2010/03/14/an-almost-constant-factor/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 02:10:06 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=164</guid>
		<description><![CDATA[Often when solving a problem, a useful approach is to do some local work which then reduces to a smaller instance. Sometimes this smaller instance can be complex to build. In the case of Kruskal&#8217;s algorithm for finding a minimum spanning tree, we pick edges and recurse over a reduced graph. Specifically, we identify two [...]]]></description>
				<content:encoded><![CDATA[<p>Often when solving a problem, a useful approach is to do some local work which then reduces to a smaller instance. Sometimes this smaller instance can be complex to build. In the case of <a href="http://en.wikipedia.org/wiki/Kruskal's_algorithm">Kruskal&#8217;s algorithm</a> for finding a <a href="http://en.wikipedia.org/wiki/Minimum_spanning_tree">minimum spanning tree</a>, we pick edges and recurse over a reduced graph. Specifically, we identify two vertices as &#8220;the same&#8221;, merge them together and continue. Well, we can literally merge the vertices, but if vertices have many degrees or the wrong edges are picked, it is easy to get a needless O(<var>N</var><sup>2</sup>) or the like. We would like to do this more efficiently.</p>
<p>Having a dynamic notion of &#8220;sameness&#8221; is common in many problems. Quite conveniently, there is a data structure that allows one to track this efficiently, called <dfn>union-find</dfn> or the <dfn>disjoint-set data structure</dfn>.</p>
<p>The spec of union-find is as follows: Initially, we have <var>N</var> objects and <var>N</var> buckets, with object <var>i</var> in bucket <var>i</var>. We then provide two operations, <code>union</code> and <code>find</code>. <code>find</code> takes an object and returns the bucket it lives in. <code>union</code> takes two buckets and merges them into one. Implementing this efficiently isn&#8217;t trivial: done naively, <code>union</code> must alter the state of many objects at once. To resolve this, we do what is often done in data structures and lazily postpone work from write time to read time.</p>
<p>For every object, we create a node with a single pointer coming out. This pointer means &#8220;I am in the same bucket as&#8221;. A self pointer means that this object is itself a bucket (that is, it is the representative element of its bucket). Initially, our structure looks like this:</p>
<p><img src="/blog/wp-content/uploads/2010/03/00_initial-state.png" title="Initial state" alt="Initial state" /></p>
<p>To union buckets, we take their representative elements and hang one under the other. For instance, if we were to union 1 with 2, and then 3 and 5 with 4, it may look like this:</p>
<p><img src="/blog/wp-content/uploads/2010/03/01_union_some.png" title="After unions" alt="After unions" /></p>
<p>Note that these are trees, but with only parent pointers. To find, we just walk up the structure until we reach a bucket and return. However, we&#8217;ve offloaded perhaps too much work at read time. It is very easy to reach this configuration, with an arbitrarily long chain:</p>
<p><img src="/blog/wp-content/uploads/2010/03/02_longchain.png" title="A problematic configuration" alt="A long chain" /></p>
<p>A find of 5 will take linear time, which is unacceptable: the whole point of this exercise is to avoid linear queries. However, we may add a simple optimization. When unioning two trees, we have a choice: either the first hangs below the second or vice versa. It is better to hang the shorter one underneath the taller one, to avoid increasing the height. (If the two trees are of equal height, the height must increase by 1, so pick arbitrarily.) To manage this, we do a little book-keeping and maintain the height of each node&#8217;s subtree. This is called <dfn>union by rank</dfn>. It turns out that this alone guarantees balanced trees: a tree only gets taller when two trees of equal height merge, so a tree of height <var>h</var> contains at least 2<sup><var>h</var></sup> nodes. Each operation therefore runs in time O(lg <var>N</var>). That&#8217;s already pretty good, but we can do better!</p>
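<p>Here is a minimal sketch of union by rank alone, with no path compression yet. The class name <code>RankedUnionFind</code> is hypothetical. The merge operation is called <code>unite</code> because <code>union</code> is a C++ keyword.</p>

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Union-find with union by rank only (no path compression yet).
class RankedUnionFind {
public:
    explicit RankedUnionFind(int n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), 0);  // each object is its own bucket
    }

    // Walk parent pointers until reaching a self-pointing representative.
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        while (parent_[x] != x) x = parent_[x];
        return x;
    }

    // Hang the shorter tree under the taller one; height grows only on ties.
    void unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
        parent_[ry] = rx;
        if (rank_[rx] == rank_[ry]) ++rank_[rx];
    }

    // Without path compression, the rank is exactly the tree's height.
    int height(int x) const { return rank_[find(x)]; }

private:
    std::vector<int> parent_;
    std::vector<int> rank_;
};
```

<p>Uniting 0 through 7 one after another no longer produces the long chain shown earlier; it yields a tree of height 1. Merging equal-height trees pairwise is the worst case, and it reaches height lg 8 = 3.</p>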
<p>Consider a fairly expensive find. We may traverse something like this:</p>
<p><img src="/blog/wp-content/uploads/2010/03/03_compression.png" title="The result of a find" alt="The result of a find" /></p>
<p>So, now we&#8217;ve learned that <code>find(5) = 1</code>. Now, I ask you this again. And again. And <em>again</em>. Each time, we walk up the tree. Why bother, when you can cache? We can just repoint 5 from 4 all the way to 1. But why stop there? We traversed 4, so why don&#8217;t we rewrite its pointers?</p>
<p><img src="/blog/wp-content/uploads/2010/03/04_compression2.png" title="Rewrite edges" alt="With rewritten edges" /></p>
<p>By rewriting every edge along any path we walk, we <em>never</em> traverse a path twice. By paying for one expensive <code>find</code> of 5, we speed up future queries to anyone on the path. Furthermore, queries to anyone underneath those nodes (7, 8, and 9) get sped up too.</p>
<p><img src="/blog/wp-content/uploads/2010/03/05_compression3.png" title="The resulting tree" alt="The resulting tree" /></p>
<p>This optimization is called <dfn>path compression</dfn>. By a fairly involved proof, one can show that, using path compression alone, we obtain <a href="http://en.wikipedia.org/wiki/Amortized">amortized</a> O(lg <var>N</var>) performance.</p>
<p>So what if we put them together? With both path compression and union by rank, we bring the amortized time per operation down to O(&alpha;(<var>N</var>)), where &alpha; is the inverse of the <a href="http://en.wikipedia.org/wiki/Ackermann_function">Ackermann</a> function. The Ackermann function grows ridiculously fast, so its inverse grows ridiculously slowly: A(4, 4) is already larger than 2<sup>2<sup>65,536</sup></sup>, far bigger than any number you will care about. Furthermore, you can prove that any implementation of this data structure must take &Omega;(&alpha;(<var>N</var>)) amortized time per operation. So, not only is this fancy expression for O(4) achievable, but it is <em>optimal</em>. It&#8217;s also extremely simple; a couple dozen lines of your favorite language will suffice.</p>
<aside>(There is a minor technicality in that maintaining tree heights along with path compression is messy. If you just ignore path compression in your book-keeping, things still work fine. Instead of storing the height, you just store an upper bound on it.)</aside>
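<p>Here is one way those couple dozen lines might look. It is a sketch that combines union by rank (with the rank treated as an upper bound, as the aside describes) and a two-pass path-compressing <code>find</code>. As a usage example, it also includes a small Kruskal&#8217;s algorithm that keeps an edge only when its endpoints are not yet &#8220;the same&#8221;. <code>UnionFind</code> and <code>kruskal_weight</code> are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <tuple>
#include <utility>
#include <vector>

// Union-find with both union by rank and path compression.
// rank_ is only an upper bound on height once paths get compressed.
class UnionFind {
public:
    explicit UnionFind(int n) : parent_(n), rank_(n, 0) {
        std::iota(parent_.begin(), parent_.end(), 0);
    }

    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(parent_.size()));
        int root = x;
        while (parent_[root] != root) root = parent_[root];
        // Second pass: repoint every node on the path directly at the root.
        while (parent_[x] != root) {
            int next = parent_[x];
            parent_[x] = root;
            x = next;
        }
        return root;
    }

    // Returns false if x and y were already in the same bucket.
    bool unite(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (rank_[rx] < rank_[ry]) std::swap(rx, ry);
        parent_[ry] = rx;
        if (rank_[rx] == rank_[ry]) ++rank_[rx];
        return true;
    }

private:
    std::vector<int> parent_;
    std::vector<int> rank_;
};

// Kruskal's MST weight: scan edges (weight, u, v) by increasing weight and
// keep an edge only if it joins two different buckets.
long long kruskal_weight(int n, std::vector<std::tuple<int, int, int>> edges) {
    std::sort(edges.begin(), edges.end());
    UnionFind uf(n);
    long long total = 0;
    for (const auto& [w, u, v] : edges) {
        if (uf.unite(u, v)) total += w;
    }
    return total;
}
```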
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/03/14/an-almost-constant-factor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tar-filled pipes</title>
		<link>http://davidben.net/blog/2010/02/28/tar-filled-pipes/</link>
		<comments>http://davidben.net/blog/2010/02/28/tar-filled-pipes/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 21:07:01 +0000</pubDate>
		<dc:creator>davidben</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://davidben.scripts.mit.edu/blog/?p=121</guid>
		<description><![CDATA[Follow-up to A Very Subtle Bug from nelhage, reposted from a discussion on zephyr. The tar format is, conceptually, a very simple one. You concatenate a bunch of files together and preface each with a metadata header (path, size, etc.). Partial extraction of a single file requires a linear walk across the archive until you [...]]]></description>
				<content:encoded><![CDATA[<p>Follow-up to <a href="http://blog.nelhage.com/archives/150">A Very Subtle Bug</a> from <a href="http://blog.nelhage.com/">nelhage</a>, reposted from a discussion on zephyr.</p>
<p>The <code>tar</code> format is, conceptually, a very simple one. You concatenate a bunch of files together and preface each with a metadata header (path, size, etc.). Partial extraction of a single file requires a linear walk across the archive until you find the record you want. Of course, once you&#8217;ve extracted it, the file can be closed and no more work need be done. This, combined with a piped <code>gzip</code> and Python&#8217;s odd <code>SIGPIPE</code> handling, gives the problem from nelhage&#8217;s <a href="http://blog.nelhage.com/archives/150">A Very Subtle Bug</a>.</p>
<p>But the details don&#8217;t quite seem to work that way. lbzip2 on reddit <a href="http://www.reddit.com/r/programming/comments/b7djd/stuff_like_this_makes_me_hate_python_subtle_bugs/c0lc0dy">notes</a> that, on a large file,</p>
<blockquote><p>
GNU tar 1.20 didn&#8217;t stop reading from lbzip2 after finding and extracting the file from the tar stream. (That stream continues after the specified file for another 270M or so, and the compressed tarball continues for another 47M or so.)
</p></blockquote>
<p>So what is going on here? Because it&#8217;s so much fun, let&#8217;s source-dive! The primary loop is a <code>read_and</code> function in <code>src/list.c</code> (abridged):</p>
<blockquote><pre>/* Main loop for reading an archive.  */
void
read_and (void (*do_something) (void))
{
  /* [Initialize some things...] */
  open_archive (ACCESS_READ);
  do
    {
      prev_status = status;
      tar_stat_destroy (&#038;current_stat_info);

      status = read_header (false);
      /* [Call do_something () per appropriate header] */
    }
  while (!all_names_found (&#038;current_stat_info));

  close_archive ();
  names_notfound ();            /* print names not found */
}</pre>
</blockquote>
<p>Certainly looks like we close the archive after we&#8217;ve seen everything we care about. Looking at <code>all_names_found</code> from <code>src/names.c</code>, it iterates over the arguments reasonably and checks if they&#8217;ve all been seen. However, there is one funny check before that loop:</p>
<blockquote><pre>  if (!p->file_name || occurrence_option == 0 || p->had_trailing_slash)
    return false;</pre>
</blockquote>
<p><code>occurrence_option</code> corresponds to the <code>--occurrence</code> option. Quoth the man page:</p>
<blockquote><pre> --occurrence
       process only the NUMBERth occurrence of each file in the archive;</pre>
</blockquote>
<p>What does that mean? Well, like I said, tar files are very simple. You concatenate files together. They are <em>so</em> simple that duplicate files are allowed. Both versions get extracted and the later ones override the earlier ones. <code>tar</code> does not, and cannot, abort upon seeing all files because there may be newer versions later. The <code>--occurrence</code> option allows you to specify that you want a particular set of versions. Only then will <code>tar</code> prematurely cut off the pipe.</p>
<p>Given that, why the occasional <code>SIGPIPE</code> bug? We&#8217;ve established that, by default, <code>tar</code> will not prematurely close the pipe after extracting, so there must be some place where we close the pipe. Looking back to <code>read_and</code>, it does break out of the loop in other cases: end of file (<code>HEADER_END_OF_FILE</code>) and NUL block (<code>HEADER_ZERO_BLOCK</code>). The latter is handled by this snippet (abridged):</p>
<blockquote><pre>
    case HEADER_ZERO_BLOCK:
      if (block_number_option)
        {
          char buf[UINTMAX_STRSIZE_BOUND];
          fprintf (stdlis, _("block %s: ** Block of NULs **\n"),
                   STRINGIFY_BIGINT (current_block_ordinal (), buf));
        }

      set_next_block_after (current_header);

      if (!ignore_zeros_option)
        {
          /* [Long comment about POSIX compatibility, disabled warning] */
          break;
        }
      status = prev_status;
      continue;</pre>
</blockquote>
<p>Unless one passes <code>-i</code> or <code>--ignore-zeros</code>, NUL blocks are treated as EOF. And indeed, if one inspects a random tar file with <code>-i</code> and <code>--block-number</code>,</p>
<blockquote><pre>davidben@rupert:/tmp% tar -tzf tar_1.22.orig.tar.gz -i --block-number | tail
block 22151: ** Block of NULs **
block 22152: ** Block of NULs **
block 22153: ** Block of NULs **
block 22154: ** Block of NULs **
block 22155: ** Block of NULs **
block 22156: ** Block of NULs **
block 22157: ** Block of NULs **
block 22158: ** Block of NULs **
block 22159: ** Block of NULs **
block 22160: ** End of File **</pre>
</blockquote>
<p>(This file appears to end in 22 of them.) And now we have the culprit. Tar files end with a few NUL blocks, signifying end-of-file. <code>tar</code> stops reading at the first one and closes the pipe, leaving the remaining blocks written by <code>gzip</code> unread. This race condition allows <code>tar</code> to exit before <code>gzip</code> has finished writing, triggering the Python problem.</p>
<p>A final note: don&#8217;t start passing <code>--occurrence</code> to all your <code>tar</code> calls. The logic in <code>all_names_found</code> behaves oddly with directories and with some tarballs. This will be the subject of a future post, possibly after some mail with <code>bug-tar@gnu.org</code>.</p>
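<p>To make the block accounting concrete, here is a simplified model of the read loop. It is not GNU tar&#8217;s code. It assumes ustar-style 512-byte headers whose octal size field sits in the 12 bytes at offset 124. POSIX archives end with two zero blocks, and GNU tar by default also pads the archive to a 20-block (10240-byte) record, which may account for a run of trailing NUL blocks like the one above. <code>blocks_consumed</code> and its helpers are hypothetical names.</p>

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Block = std::array<unsigned char, 512>;

// True if every byte of the 512-byte block is NUL.
bool is_zero_block(const Block& block) {
    for (unsigned char c : block) {
        if (c != 0) return false;
    }
    return true;
}

// Parses the octal size field: 12 bytes at offset 124 of a ustar header.
std::size_t header_size(const Block& header) {
    std::size_t i = 124;
    std::size_t size = 0;
    while (i < 136 && header[i] == ' ') ++i;  // tolerate leading spaces
    for (; i < 136 && header[i] >= '0' && header[i] <= '7'; ++i) {
        size = size * 8 + (header[i] - '0');
    }
    return size;
}

// Simplified model of a tar read loop: read a header, skip the member's
// data, repeat; stop at the first all-NUL block unless ignore_zeros is set.
// Returns the number of blocks actually consumed from the stream.
std::size_t blocks_consumed(const std::vector<Block>& archive, bool ignore_zeros) {
    std::size_t pos = 0;
    while (pos < archive.size()) {
        const Block& header = archive[pos++];
        if (is_zero_block(header)) {
            if (!ignore_zeros) break;  // treated as end of archive
            continue;
        }
        pos += (header_size(header) + 511) / 512;  // data is padded to whole blocks
    }
    return pos < archive.size() ? pos : archive.size();
}
```

<p>Take a 20-block archive holding one 600-byte file: one header block, two data blocks, and 17 NUL blocks. The default loop stops after 4 blocks, so the remaining 16 blocks that <code>gzip</code> is still writing never get read. With <code>ignore_zeros</code> set, the loop drains all 20.</p>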
]]></content:encoded>
			<wfw:commentRss>http://davidben.net/blog/2010/02/28/tar-filled-pipes/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
