<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blog</title>
	<atom:link href="http://blog.stuffedcow.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.stuffedcow.net</link>
	<description>Random stuff...</description>
	<lastBuildDate>Fri, 16 Mar 2012 03:30:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Thinkpad X61 Tablet LCD Bubbles</title>
		<link>http://blog.stuffedcow.net/2012/03/thinkpad-x61-tablet-lcd-bubbles/</link>
		<comments>http://blog.stuffedcow.net/2012/03/thinkpad-x61-tablet-lcd-bubbles/#comments</comments>
		<pubDate>Tue, 06 Mar 2012 22:50:54 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Fixing Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=317</guid>
		<description><![CDATA[The Lenovo Thinkpad X61 tablet with the "12.1-in Super Wide Angle SXGA+ TFT display with 1400 x 1050 resolution" option uses a BOE-Hydis HV121P01-101 AFFS display with glass bonded to the LCD panel. The display is prone to developing bubbles in the adhesive layer in between the LCD and glass. There's a long thread at Lenovo forums about the bubbles and the difficulty of getting it replaced or repaired. Here's my case of bubbles <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2012/03/thinkpad-x61-tablet-lcd-bubbles/">Thinkpad X61 Tablet LCD Bubbles</a></span>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/tablet.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/tablet-300x298.jpg" alt="" title="X61 Tablet" width="300" height="298" class="alignright size-medium wp-image-590" /></a></p>
<p>The Lenovo Thinkpad X61 tablet has a &#8220;12.1-in Super Wide Angle SXGA+ TFT display with 1400 x 1050 resolution&#8221; option. The display panel is the BOE-Hydis <a href="http://blog.stuffedcow.net/wp-content/uploads/2011/06/HV121P01-101.pdf">HV121P01-101</a>, an <a href="http://www.hydis.com/eng/04_rnd/rnd_03.asp">AFFS</a> display with glass bonded to the LCD panel. Its predecessor in the X60 tablet (HV121P01-100) uses the same LCD panel but does not bond the cover glass to the LCD.</p>
<p>The display is prone to developing bubbles in the adhesive layer in between the LCD and glass. There&#8217;s a long thread at Lenovo forums (<a href="http://forums.lenovo.com/t5/X-Series-Tablet-ThinkPad-Laptops/Air-Bubble-In-LCD-Screen-X61-Tablet/td-p/172982">Air Bubble In LCD Screen X61 Tablet</a>) about the bubbles and the difficulty of getting it replaced or repaired.<br style="clear: both;"/></p>
<h2>Pictures</h2>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble1.jpg"><img class="size-medium wp-image-538 alignright" title="Air bubble" src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble1-300x157.jpg" alt="" width="300" height="157" /></a><br />
Air bubbles appearing near the bottom edge of the display.<br style="clear: right;"/></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble2.jpg"><img class="size-medium wp-image-539 alignright" title="Air bubble" src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble2-300x158.jpg" alt="" width="300" height="158" /></a><br />
More air bubbles.<br style="clear: right;"/></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble3.jpg"><img class="size-medium wp-image-540 alignright" title="Air bubble" src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bubble3-300x142.jpg" alt="" width="300" height="142" /></a><br />
 This picture shows a new bubble forming inside an existing bubble. A new bubble would emerge every few days. The buttons below the display were also sticky from the leaked adhesive. <br style="clear: both;"/></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bezel.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/bezel-300x93.jpg" alt="" title="Sticky bezel" width="300" height="93" class="alignright size-medium wp-image-555" /></a></p>
<p>The bezel was sticky. This is the adhesive from the LCD, not the double-sided tape melting. It was gooey only along the bottom edge where the LCD adhesive was leaking.</p>
<p><br style="clear: both;"/></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/panel1.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/panel1-300x117.jpg" alt="" title="LCD Panel" width="300" height="117" class="alignright size-medium wp-image-562" /></a><br />
To see what&#8217;s going on inside, the bezel was removed. There was a lot of leaked adhesive, especially along the bottom-left edge. It appeared as though the adhesive was being wicked out of the bottom edge of the LCD by adhering to and flowing along the plastic bezel piece, gumming up the buttons. Air bubbles were drawn in to replace the lost adhesive.<br />
<br style="clear: both;"/></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/repair.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/repair-300x136.jpg" alt="" title="Repair attempt" width="300" height="136" class="alignright size-medium wp-image-566" /></a><br />
The most obvious repair method is to apply heat and pressure to squeeze out the air bubbles, with enough heat to redistribute the adhesive evenly across the entire display (resulting in a slightly thinner adhesive layer). That didn&#8217;t work: I successfully squeezed out the air bubbles, but the heated adhesive did not behave like a Newtonian fluid (it sustained a non-zero shear stress at rest), so it tended to flow back to its original position, refusing to be evenly redistributed.<br />
<br style="clear: both;"/><br />
<a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/postrepair1.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/postrepair1-300x103.jpg" alt="" title="More bubbles" width="300" height="103" class="alignright size-medium wp-image-568" /></a><br />
Some adhesive was lost during the repair attempt, adhesive kept leaking out of the bottom edge, and the remaining adhesive refused to flow, so even more air was drawn in to make up the adhesive deficit along the bottom edge. This resulted in even more bubbles after the repair attempt.<br />
<br style="clear: both;"/></p>
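<p>For comparison, a Newtonian fluid&#8217;s shear stress is proportional to its shear rate, so it carries no shear stress once it stops moving. The behaviour described above (stress at rest, flowing back toward its original position) looks more like a viscoelastic solid. As a rough sketch only, assuming the adhesive can be idealized as a Kelvin-Voigt material with placeholder shear modulus <em>G</em> and viscosity <em>η</em> (neither value is known here, and temperature dependence is ignored):</p>

```latex
\begin{aligned}
\text{Newtonian:}\quad & \tau = \mu\,\dot\gamma
  &&\Rightarrow\ \tau = 0 \text{ whenever } \dot\gamma = 0\\
\text{Kelvin-Voigt (assumed):}\quad & \tau = G\,\gamma + \eta\,\dot\gamma
  &&\Rightarrow\ \tau = G\,\gamma \ne 0 \text{ at rest if } \gamma \ne 0\\
\text{after unloading } (\tau = 0):\quad & \gamma(t) = \gamma_0\, e^{-G t/\eta}
\end{aligned}
```

<p>In the assumed model, a layer pushed out of shape while hot and pressed creeps back toward its undeformed state once the pressure is released, which would be consistent with the adhesive refusing to stay redistributed.</p>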
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/03/postrepair2.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/03/postrepair2-300x87.jpg" alt="" title="Big bubbles" width="300" height="87" class="alignright size-medium wp-image-569" /></a><br />
This is the current state, after a second repair attempt using higher temperatures, also failing in the same way. I&#8217;m hoping I can get some fresh adhesive (of the same kind) to fill in the (big) bubble, since it seems hopeless to try to redistribute the existing adhesive.<br />
<br style="clear: both;"/></p>
<h2>Which Adhesive?</h2>
<p>There are many suppliers for optical adhesive:</p>
<ul>
<li><a href="http://www.dymax.com/products/optical/index.php">Dymax</a></li>
<li><a href="https://www.norlandprod.com/adhesiveindex.html">Norland</a></li>
<li><a href="http://www.optical-cement.com/cements/products.html">Summers Optical</a></li>
<li><a href="http://solutions.3m.com/wps/portal/3M/en_US/electronics/home/productsandservices/products/ProductNavigator/Chemicals/?PC_7_RJH9U5230GE3E02LECIE20KAJ2_nid=SNK85KT2PRbeG7W0C8BLTFgl">3M</a></li>
<li><a href="http://www2.dupont.com/Displays/en_US/products_services/vertak/vertak_adhesive.html">DuPont Vertak</a></li>
<li><a href="http://www.loctite.com.au/cps/rde/xchg/henkel_aue/hs.xsl/product-4593.htm">Henkel Loctite</a></li>
<li><a href="http://www.hernon.com/hernonmfg/index.php?page=shop.browse&amp;category_id=26&amp;option=com_virtuemart&amp;Itemid=1">Hernon Ultrabond</a></li>
</ul>
<p>The X61 tablet uses a Hydis HV121P01-101 screen, the same panel used in the <a href="http://forum.tabletpcreview.com/motion-computing/29883-le1700-screen-upgrade-4.html#post194726" target="_blank">Motion Computing LE1700</a> tablet. Motion Computing advertises <a href="http://www.motioncomputing.ca/choose/spec_display_x5.htm" target="_blank">later Hydis displays</a> using DuPont Vertak bonding, so there&#8217;s a good chance the older Hydis HV121P01-101 also uses a DuPont adhesive (Vertak DBA1000/2000 or a predecessor?). </p>
<p>The <a href="http://www2.dupont.com/Displays/en_US/assets/downloads/pdf/AdhesivesDatasheet.pdf" target="_blank">Vertak adhesives</a> have low elastic and shear moduli and low strength, which might contribute to the adhesive&#8217;s tendency to flow out of the side of the display. Qualitatively, the gooey adhesive I see in my display agrees with the properties listed in the datasheet.</p>
<h2>Repair?</h2>
<p>I think I can make a decent repair of the bubbles by filling in the bubbles with fresh adhesive (ideally of the same kind), curing it, and then sealing the edge of the display with silicone to prevent future adhesive leaks. I don&#8217;t have a clean room, so I&#8217;m not able to completely replace the adhesive without getting dust into the display.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2012/03/thinkpad-x61-tablet-lcd-bubbles/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Ikea &#8220;Laver&#8221; Chair repair</title>
		<link>http://blog.stuffedcow.net/2012/01/ikea-laver-chair-repair/</link>
		<comments>http://blog.stuffedcow.net/2012/01/ikea-laver-chair-repair/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 01:45:45 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Fixing Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=514</guid>
		<description><![CDATA[Ikea "Laver" chair. $10. Metal frame, "PP-CO" (polypropylene/polyethylene copolymer?) seat and back. Spots near the two front corners are high-stress and fail (crack) easily. This is a repair attempt <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2012/01/ikea-laver-chair-repair/">Ikea &#8220;Laver&#8221; Chair repair</a></span>]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/laver.jpg"><img class="aligncenter size-medium wp-image-515" title="Ikea Laver chair" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/laver-300x298.jpg" alt="" width="300" height="298" /></a>Ikea &#8220;Laver&#8221; chair. $10. Metal frame, &#8220;PP-CO&#8221; (polypropylene/polyethylene copolymer?) seat and back. Spots near the two front corners are high-stress and fail (crack) easily.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1665.jpg"><img class="aligncenter size-medium wp-image-517" title="Ikea Laver chair" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1665-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1669.jpg"><img class="aligncenter size-medium wp-image-522" title="Chair frame" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1669-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Chair frame without seat.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1662.jpg"><img class="aligncenter size-medium wp-image-519" title="Cracked seat" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1662-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Front corners are high-stress points that crack. This one has already been welded back together (welded from the back, so the weld isn&#8217;t very visible.)</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1664.jpg"><img class="aligncenter size-medium wp-image-520" title="Welded and glued cracks" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1664-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>Back side also glued with polyurethane glue. Almost nothing sticks to polypropylene, so the glue might peel once it&#8217;s back in use&#8230; We&#8217;ll see if it&#8217;s good enough.<br />
<a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1663.jpg"><img class="aligncenter size-medium wp-image-521" title="Glue, bubbling..." src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1663-300x225.jpg" alt="" width="300" height="225" /></a></p>
<p>The plastic is scored with a knife to make a rougher surface so the glue might hold better.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1670.jpg"><img class="aligncenter size-medium wp-image-528" title="Window screen for reinforcement" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1670-300x225.jpg" alt="" width="300" height="225" /></a>Next step: Reinforce the weak spot using some fiberglass window screen soaked in epoxy. Or maybe epoxy-polyurethane mix. Something mostly rigid, but not so hard that it cracks when flexed slightly, hopefully.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1673.jpg"><img class="aligncenter size-medium wp-image-529" title="Partially-glued window screen" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/IMG_1673-300x225.jpg" alt="" width="300" height="225" /></a>Fibreglass window screen partially glued with a bit of epoxy.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/epoxy.jpg"><img class="aligncenter size-medium wp-image-531" title="Epoxy" src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/epoxy-300x225.jpg" alt="" width="300" height="225" /></a>Window screen covered with epoxy.</p>
<h3>Update</h3>
<p>Two months later, the crack hasn&#8217;t spread. Cosmetically, the crack is visible. Structurally, the repair seems good enough.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2012/01/ikea-laver-chair-repair/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mediawiki Parsers</title>
		<link>http://blog.stuffedcow.net/2012/01/mediawiki-parsers/</link>
		<comments>http://blog.stuffedcow.net/2012/01/mediawiki-parsers/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 02:58:40 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=474</guid>
		<description><![CDATA[<p>A parser is used to translate wikitext to HTML for viewing. Since there are a bunch of parser projects for MediaWiki&#8217;s markup, I&#8217;ll go benchmark some of them to see how fast they run.</p> <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2012/01/mediawiki-parsers/">Mediawiki Parsers</a></span>]]></description>
			<content:encoded><![CDATA[<p>A parser is used to translate wikitext to HTML for viewing. Since there are a bunch of parser projects for MediaWiki&#8217;s markup, I&#8217;ll go benchmark some of them to see how fast they run.<span id="more-474"></span></p>
<h2>Parsers</h2>
<table>
<tbody>
<tr>
<th>Parser</th>
<th>Language</th>
<th>Description</th>
</tr>
<tr>
<td><a href="http://www.mediawiki.org/">MediaWiki 1.18.0</a></td>
<td>PHP</td>
<td>Parser from the production MediaWiki, templates disabled.</td>
</tr>
<tr>
<td><a href="http://www.d2g.org.uk/index.php?plugin=home&amp;action=WikiParser">PHP5 Wiki Parser</a></td>
<td>PHP</td>
<td>A series of regular expression matches to replace various elements of wikitext.</td>
</tr>
<tr>
<td><a href="http://rendering.xwiki.org/">xWiki renderer</a></td>
<td>Java</td>
<td>Uses the JavaCC parser generator. Used in xWiki.</td>
</tr>
<tr>
<td><a href="http://wiki.eclipse.org/Mylyn/WikiText">Mylyn WikiText</a></td>
<td>Java</td>
<td>Used in Mylyn.</td>
</tr>
<tr>
<td><a href="http://www.sweble.org/">Sweble</a></td>
<td>Java</td>
<td>JFlex-generated lexer and Rats!-generated parser.</td>
</tr>
<tr>
<td><a href="http://code.google.com/p/gwtwiki/">Bliki</a> 3.0.16</td>
<td>Java</td>
<td></td>
</tr>
<tr>
<td><a href="https://github.com/aboutus/kiwi">Kiwi</a></td>
<td>C</td>
<td>Uses a <a href="http://piumarta.com/software/peg/">leg</a>-generated parser. Used on aboutus.org.</td>
</tr>
<tr>
<td><a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/parsers/graveyard/flexbisonparse/">flexbisonparse</a></td>
<td>C</td>
<td>flex-generated lexer and bison-generated parser.</td>
</tr>
</tbody>
</table>
<p>I also tried <a href="https://github.com/tanin47/wiky">Wiky</a> (Ruby), <a href="http://code.google.com/p/wikimodel/">WikiModel</a> 2.0.6 (Java), and <a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/parsers/graveyard/libmwparser/">libmwparser</a>. These crashed on some of the test documents&#8230;</p>
<p>Although this was intended to be a comparison between programming/scripting languages, the data isn&#8217;t really valid for this purpose. The parsers differ in their algorithms, in the subset of the syntax they support, and in the correctness of their output. Draw your own conclusions&#8230;</p>
<h2>Test documents</h2>
<p>I just chose a bunch of mostly-random documents (using <a href="http://en.wikipedia.org/wiki/Special:Random">Special:Random</a>) that exercised various features of the language (short/long documents, tables, images).</p>
<ul>
<li><a href="http://en.wikipedia.org/w/index.php?title=Agasanahalli_%28Dharwad%29&amp;oldid=445823561">Agasanahalli (Dharwad)</a> (1,587 bytes, a short article)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=Predicted_outcome_value_theory&amp;oldid=440560813">Predicted outcome value theory</a> (11,331 bytes, mainly text)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=Eurosong_%2706&amp;oldid=435274583">Eurosong &#8217;06</a> (12,186 bytes, lots of tables)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=Automobile&amp;oldid=468397325">Automobile</a> (49,471 bytes)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=MediaWiki&amp;oldid=469956408">Mediawiki</a> (80,149 bytes)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=List_of_tallest_buildings_in_New_York_City&amp;oldid=469762639">List of tallest buildings in New York City</a> (81,648 bytes, more tables with images)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=Nuclear_program_of_Iran&amp;oldid=469648134">Nuclear program of Iran</a> (292,201 bytes, a long article)</li>
<li><a href="http://en.wikipedia.org/w/index.php?title=The_Young_and_the_Restless_minor_characters&amp;oldid=470012183">The Young and the Restless minor characters</a> (386,012 bytes, a long article)</li>
</ul>
<h2>Results</h2>
<div id="attachment_491" class="wp-caption aligncenter" style="width: 660px"><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/parse_geomean.png"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/parse_geomean.png" alt="MediaWiki parser runtime chart" title="Parser runtime geomean" width="632" height="519" class="size-full wp-image-491" /></a><p class="wp-caption-text">Geometric mean of runtime over all 8 test documents for each parser</p></div>
<p>It&#8217;s not surprising that MediaWiki&#8217;s parser is the slowest of the bunch. It&#8217;s written in a scripting language (PHP), is the most feature-complete, and doesn&#8217;t use fancy parsing algorithms. PHP5 Wiki Parser is probably faster because it processes only a small subset of the syntax. As far as I know, a few of the others are in production use: xWiki (parser in xWiki), WikiText (Mylyn), and Kiwi (parser used on aboutus.org). Flexbisonparse stands out as being particularly fast (113x!), and it would be interesting to see whether it can robustly support a sufficient subset of the MediaWiki syntax in production without giving up all its speed. Flex and Bison are both around 25 years old, yet they&#8217;re both still alive and well.</p>
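<p>The chart summarizes each parser by the geometric mean of its runtimes over the test documents. A minimal Python sketch of that summary follows; the runtimes are made up (not the measured data), and it assumes a speedup such as the 113x above is quoted as a ratio of geomean runtimes. One convenient property of the geomean is that the ratio of two parsers&#8217; geomeans equals the geomean of their per-document ratios, so the summary doesn&#8217;t depend on which documents happen to be long.</p>

```python
import math


def geomean(values):
    """Geometric mean, computed in log space to avoid overflow/underflow."""
    if not values:
        raise ValueError("geomean of an empty sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline` (ratio of geomeans)."""
    return geomean(baseline) / geomean(candidate)


# Hypothetical runtimes in seconds for 8 documents (NOT the measured data).
baseline = [0.010, 0.05, 0.06, 0.30, 0.45, 0.50, 1.8, 2.4]
candidate = [0.0002, 0.0004, 0.0006, 0.002, 0.004, 0.003, 0.015, 0.02]

# The geomean of the per-document ratios gives the same number.
per_doc = [b / c for b, c in zip(baseline, candidate)]
print(speedup(baseline, candidate), geomean(per_doc))
```

<p>The two printed values agree up to floating-point rounding.</p>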
<div id="attachment_487" class="wp-caption aligncenter" style="width: 803px"><a href="http://blog.stuffedcow.net/wp-content/uploads/2012/01/parse_separate.png"><img src="http://blog.stuffedcow.net/wp-content/uploads/2012/01/parse_separate.png" alt="MediaWiki parser runtime by test document" title="Normalized parser runtime by document" width="793" height="519" class="size-full wp-image-487" /></a><p class="wp-caption-text">Normalized parser runtime by document</p></div>
<p>Here are some normalized runtimes broken down by document. The objective is to show whether certain parsers have particular strengths for particular document types. The data are normalized to the parsers&#8217; geomean runtime so the geomean for each parser is 1. The data are also normalized by document so that the geomean for each document is also 1. The relative runtimes appear quite random: none of the parsers seem to scale particularly well or poorly with document length.</p>
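<p>The double normalization can be sketched in Python, assuming the runtimes are stored as a parser-by-document table. The layout, function names, and numbers below are hypothetical, not the benchmark&#8217;s actual script or data. Dividing each row by its geomean and then each column by its geomean leaves both sets of geomeans at 1. In log space the second step subtracts column means whose average is already 0, so the row means stay at 0.</p>

```python
import math


def geomean(values):
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize(runtimes):
    """Normalize a parser x document runtime table.

    Step 1 divides each parser's row by that row's geomean.
    Step 2 divides each document's column by that column's geomean;
    this does not disturb the row geomeans set in step 1.
    """
    rows = [[t / geomean(row) for t in row] for row in runtimes]
    col_gm = [geomean([row[j] for row in rows]) for j in range(len(rows[0]))]
    return [[t / col_gm[j] for j, t in enumerate(row)] for row in rows]


# Hypothetical runtimes (seconds): 3 parsers x 4 documents, NOT measured data.
table = [
    [0.02, 0.10, 0.40, 3.00],
    [0.01, 0.06, 0.10, 1.20],
    [0.0002, 0.001, 0.004, 0.03],
]
norm = normalize(table)
for row in norm:
    print(["%.2f" % v for v in row])
```

<p>In the result, a value above 1 means that parser is relatively slower on that document than its overall average and the document&#8217;s overall difficulty would predict.</p>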
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2012/01/mediawiki-parsers/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Samsung Linux CUPS USB Printing</title>
		<link>http://blog.stuffedcow.net/2011/10/samsung-linux-cups-usb-printing/</link>
		<comments>http://blog.stuffedcow.net/2011/10/samsung-linux-cups-usb-printing/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 05:32:35 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Fixing Stuff]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[usb]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=420</guid>
		<description><![CDATA[Ever since upgrading from Mandriva 2010.0(?) to 2010.1 (and also 2010.2), both of my Samsung laser printers have been intermittent. Print jobs would often be silently discarded. CUPS logs show that the print jobs are completed, the printer would warm up, the printer's LED blinks once or twice, then the print job is "complete". But nothing gets printed. <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2011/10/samsung-linux-cups-usb-printing/">Samsung Linux CUPS USB Printing</a></span>]]></description>
			<content:encoded><![CDATA[<p>I have two Samsung USB laser printers connected to the same machine:</p>
<ul>
<li>ML-1740</li>
<li>ML-2010</li>
</ul>
<p>Ever since upgrading from Mandriva 2010.0(?) to 2010.1 (and also 2010.2), both printers have worked only intermittently. Print jobs are often silently discarded: CUPS logs show the print job completing, the printer warms up, its LED blinks once or twice, and then the print job is &#8220;complete&#8221;, but nothing gets printed. Unplugging the USB cable allows at least one more print job to print, but none of usb_clear_halt(), a printer-class SOFT_RESET, or usb_reset() gets the printer working again.</p>
<p>Since the same behaviour occurs for both printers, and started at the same time (after an upgrade), it&#8217;s probably not a random hardware failure.</p>
<p>The symptoms are similar to some posts out there:</p>
<ul>
<li><a href="https://bbs.archlinux.org/viewtopic.php?pid=778104">CUPS &#8211; can&#8217;t seem to print more than one page</a> (ML-2010, libusb, Arch Linux)</li>
<li><a href="http://groups.google.com/group/alt.os.linux.mandriva/browse_thread/thread/2d2f19e0e1b27e8c/a7960bf862a96ac3">Print jobs intermittently corrupted</a> (Several Samsung printers, Mandriva 2010.2)</li>
<li><a href="http://ubuntuforums.org/showthread.php?t=1709488">Printing from Windows 7 to shared printer on Ubuntu 10.10 &#8211; Completed but no print!</a> (ML-2510)</li>
<li><a href="http://foo2zjs.rkkda.com/forum/read.php?58,2803">Samsung CLP-315 problem on Mandriva 2010.1</a> (CLP-315, using libusb)</li>
<li><a href="http://www.cups.org/str.php?L3964">CUPS STR #3964</a> (Several printers using libusb, Ubuntu)</li>
</ul>
<p>The common factor in all of these seems to be printing with CUPS using libusb with a Samsung printer (Mandriva removed usblp somewhere around 2010.1). Indeed, the Arch Linux thread suggests that moving back to usblp printing solves the problem.</p>
<p>After tons of debugging, the problem appears to be an issue (a flaw?) in Samsung&#8217;s firmware, triggered by some behaviour of the CUPS USB backend (usb-libusb.c). Hint: <a href="http://www.mjmwired.net/kernel/Documentation/usb/usbmon.txt">usbmon</a> with <a href="http://www.wireshark.org/">Wireshark</a> is very useful for capturing USB traffic.</p>
<p>In the current USB (usb-libusb.c) backend, a typical print job performs the following steps:</p>
<ol>
<li>Loops through all USB buses, devices, configurations, and interfaces, to look for USB <a href="http://www.usb.org/developers/devclass_docs/usbprint11.pdf">printer class</a> interfaces, sends a printer class GET_DEVICE_ID request to it, and tries to find the desired printer based on the returned string.</li>
<li>Opens the device (usb_open)</li>
<li>Sends a Set Configuration request to the device to set it to the desired configuration. (usb_set_configuration)</li>
<li>Claims the desired interface (usb_claim_interface)</li>
<li>Sends a Set Interface request to the device to choose the desired interface. (usb_set_altinterface)</li>
<li>Dumps the print data in 8,192-byte chunks in one or more(?) USB bulk transactions using usb_bulk_write</li>
<li>Releases interfaces that were claimed earlier (usb_release_interface)</li>
<li>Closes the device (usb_close)</li>
</ol>
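<p>The matching in step 1 can be sketched in Python. This assumes the response layout from the USB printer class specification linked above: a two-byte big-endian length (which counts the two length bytes themselves) followed by an IEEE 1284 device ID string of <code>KEY:value;</code> pairs. The function names and the sample string are hypothetical, and the real matching logic in usb-libusb.c is more involved.</p>

```python
def parse_device_id(response: bytes) -> dict:
    """Parse a USB printer-class GET_DEVICE_ID response into a dict.

    The first two bytes are a big-endian length that includes the two
    length bytes; the rest is an IEEE 1284 "KEY:value;" device ID string.
    """
    if len(response) < 2:
        raise ValueError("response too short")
    length = int.from_bytes(response[:2], "big")
    # Never read past the bytes actually received.
    text = response[2:min(length, len(response))].decode("ascii", errors="replace")
    fields = {}
    for item in text.split(";"):
        key, sep, value = item.partition(":")
        if sep:
            fields[key.strip().upper()] = value.strip()
    return fields


def matches(fields: dict, make: str, model: str) -> bool:
    """True if the device ID names the wanted make and model (short or long keys)."""
    mfg = fields.get("MFG", fields.get("MANUFACTURER", ""))
    mdl = fields.get("MDL", fields.get("MODEL", ""))
    return mfg == make and mdl == model


# Made-up response, NOT captured from a real printer.
body = b"MFG:Samsung;MDL:ML-2010;CLS:PRINTER;"
resp = (len(body) + 2).to_bytes(2, "big") + body
print(parse_device_id(resp), matches(parse_device_id(resp), "Samsung", "ML-2010"))
```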
<p>On my printers (and I expect most other printers too), steps 3 and 5 aren&#8217;t really useful. There is only one configuration (bConfigurationValue=1), and its printer interface has only one alternate setting (bAlternateSetting=0). Normally some part of the OS already sends a Set Configuration to put the device into the <em>Configured</em> state (see the USB spec) when the device is plugged in, so there is no need to set the configuration again to its current value. Similarly, if there is only one altsetting, it is already the default and there is no point in selecting it.</p>
<p>According to the USB spec, the above sequence of operations, including setting the configuration and altsetting after the device is already configured, is legal. However, my Samsung printers do not seem to tolerate steps 3 and 5, even though the printer returns success for those requests. The redundant Set Configuration and Set Interface requests appear to cause the subsequent print job to be silently discarded, usually but not always. Removing both of those requests makes the printer behave normally.</p>
<p>It appears usblp&#8217;s print sequence is much simpler: It sends a GET_DEVICE_ID to the printer, then dumps the print data to the printer using a USB bulk transfer. It does not set configuration or interface, perhaps (correctly) assuming that it was already done when the printer was first enumerated by usblp. This difference would explain why CUPS-usblp works fine but CUPS-libusb does not. However, since both CUPS-usblp and CUPS-libusb follow the USB spec, it&#8217;s likely Samsung&#8217;s firmware that is flawed here.</p>
<p>One way to work around the problem is to simplify CUPS&#8217;s usb-libusb backend so it does not send Set Configuration or Set Interface requests when they aren&#8217;t necessary. Samsung printers don&#8217;t seem to have a problem with Get Configuration requests, so I first query the current configuration, then change configurations only if the desired configuration differs from the current one.</p>
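<p>The workaround&#8217;s decision logic can be modelled in Python. This is a sketch, not the actual patch: <code>FakePrinter</code> and <code>prepare()</code> are hypothetical stand-ins for the libusb calls, the request codes are the standard ones from chapter 9 of the USB 2.0 specification, and skipping Set Interface when the interface has only one alternate setting is an assumption based on the reasoning above. Claiming the interface is a host-side operation and is left out.</p>

```python
from dataclasses import dataclass, field

# Standard request codes from chapter 9 of the USB 2.0 specification.
GET_CONFIGURATION = 0x08
SET_CONFIGURATION = 0x09
SET_INTERFACE = 0x0B


@dataclass
class FakePrinter:
    """Hypothetical stand-in for a USB device; records each control request."""
    configuration: int = 1
    log: list = field(default_factory=list)

    def control(self, request, value=0, index=0):
        # value/index play the role of wValue/wIndex in a real setup packet.
        self.log.append(request)
        if request == GET_CONFIGURATION:
            return self.configuration
        if request == SET_CONFIGURATION:
            self.configuration = value
        return None


def prepare(dev, want_config, interface, altsetting, num_altsettings):
    """Send Set Configuration / Set Interface only when they change something."""
    # Get Configuration is harmless on these printers, so ask first.
    if dev.control(GET_CONFIGURATION) != want_config:
        dev.control(SET_CONFIGURATION, value=want_config)
    # With a single alternate setting, setting 0 is already in effect.
    if num_altsettings > 1:
        dev.control(SET_INTERFACE, value=altsetting, index=interface)


printer = FakePrinter(configuration=1)
prepare(printer, want_config=1, interface=0, altsetting=0, num_altsettings=1)
print(printer.log)  # [8]: only GET_CONFIGURATION was sent
```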
<h2>Upstream</h2>
<p>Submitted to CUPS as <a href="http://www.cups.org/str.php?L3965">STR #3965</a>.</p>
<h2>Downloads</h2>
<ul>
<li><strong>Patch</strong>: <a href="http://www.stuffedcow.net/files/cups-samsung/cups-1.4.3-usb-skip-interface-altsetting.patch">cups-1.4.3-usb-skip-interface-altsetting.patch</a>. For Mandriva 2010.2 cups-1.4.3.</li>
<li><strong>Patch</strong>: <a href="http://www.stuffedcow.net/files/cups-samsung/cups-1.6-r10087-usb-skip-interface-altsetting.patch">cups-1.6-r10087-usb-skip-interface-altsetting.patch</a>. For cups-1.6 svn r10087.</li>
<li><strong>Binary RPM</strong> for Mandriva 2010.2 x86_64: <a href="http://www.stuffedcow.net/files/cups-samsung/cups-1.4.3-3.1mdv2010.2.x86_64.rpm">cups-1.4.3-3.2mdv2010.2.x86_64.rpm</a></li>
<li><strong>Source RPM</strong> for Mandriva 2010.2: <a href="http://www.stuffedcow.net/files/cups-samsung/cups-1.4.3-3.1mdv2010.2.src.rpm">cups-1.4.3-3.2mdv2010.2.src.rpm</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2011/10/samsung-linux-cups-usb-printing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Teksavvy MLPPP Performance</title>
		<link>http://blog.stuffedcow.net/2011/09/teksavvy-mlppp-performance-issues/</link>
		<comments>http://blog.stuffedcow.net/2011/09/teksavvy-mlppp-performance-issues/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 13:23:52 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>
		<category><![CDATA[networking]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=397</guid>
		<description><![CDATA[MLPPP on Bell's DSL GAS network doesn't work very well because the GAS network appears to reorder PPP frames (which is forbidden by RFC 1661). Ideally, Bell should stop reordering packets. The next best option is for the ISP and user to configure MRU and MRRU settings to reduce packet/frame fragmentation. With multilink PPPoE, the client should use an IP-MTU of 1486 bytes (1484 on Linux 2.6.31+ due to a bug), an MRU of 1492, and an MRRU of 1486. The ISP should use an MRU of 1492, MRRU of 1486 (possibly 1484 to work around the Linux bug, until the bug gets fixed), and apply the IP-MTU correctly (MRU-0 = 1492 for PPP, MRU-6 = 1486 for MLPPP). <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2011/09/teksavvy-mlppp-performance-issues/">Teksavvy MLPPP Performance</a></span>]]></description>
			<content:encoded><![CDATA[<p>After having a 25/7 Mbps DSL line installed (really only 25/3.6 Mbps), I started noticing some performance issues that resembled improper MTU settings that only occurred when multilink PPP was used. It turns out there was &#8220;packet loss&#8221; associated with fragmenting packets.</p>
<h2>Speedtest.net Numbers</h2>
<table>
<tr>
<th>ISP</th>
<th>Downstream (Mbps)</th>
<th>Upstream (Mbps)</th>
</tr>
<tr>
<th>Bell PPP</th>
<td>24.7 +/- 1.7</td>
<td>3.61 +/- 0.02</td>
</tr>
<tr>
<th>Teksavvy PPP</th>
<td>24.3 +/- 1.2</td>
<td>3.62 +/- 0.005</td>
</tr>
<tr>
<th>Teksavvy MLPPP</th>
<td>19.9 +/- 3.2</td>
<td>3.55 +/- 0.12</td>
</tr>
</table>
<p>The speedtest.net numbers were measured on the same DSL line (with different logins) against the Nexicom server, which is even closer via Teksavvy&#8217;s routing than via Bell&#8217;s. Each entry is the average and standard deviation of many tests run over two days. All tests used an MTU of 1485, which works around all of the fragmentation issues. Multilink performs noticeably worse than the others.</p>
<h2>Packet Loss and Fragmentation</h2>
<p><img class="aligncenter size-full wp-image-398" title="packet_loss" src="http://blog.stuffedcow.net/wp-content/uploads/2011/09/packet_loss.png" alt="" width="706" height="369" /></p>
<p>Teksavvy negotiates the following options over LCP:</p>
<ul>
<li>MRU 1492</li>
<li>MRRU 32719</li>
</ul>
<p>This is a plot of packet loss rate vs. PPP frame size. With a large MRRU, the upstream direction consists of a single large ICMP echo request packet encapsulated into a single large PPP frame, then fragmented at the PPP layer into multiple MLPPP frames. In the downstream direction, Teksavvy does not fragment PPP frames, but fragments the IP packet with each fragment in its own PPP frame. MLPPP frames not greater than 1485 bytes are not fragmented. With MRRU > MRU, we can experiment with both PPP and IP fragmentation. </p>
<p>The periodic spikes in packet loss are abnormal. The spikes correspond to packet sizes at which the ICMP response packet is slightly larger than a multiple of the IP MTU, so a small fragment is left over after fragmentation. It is the presence of these small fragments that causes packet loss.</p>
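<p>To see where the small leftover fragments come from, here is a minimal Python sketch of IPv4 fragmentation. It assumes a fixed 20-byte IP header with no options and models only the IP layer, not the MLPPP framing around each fragment; the function name is hypothetical.</p>

```python
def ip_fragment_sizes(packet_len, mtu, header_len=20):
    """Return the on-the-wire sizes of the IPv4 fragments of one packet.

    Assumes a fixed header_len (no IP options) that is repeated in every fragment.
    """
    if packet_len <= mtu:
        return [packet_len]
    payload = packet_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_payload = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes


# List packet sizes that leave a small trailing fragment (the 64-byte threshold is arbitrary).
for size in range(1400, 3000, 4):
    frags = ip_fragment_sizes(size, mtu=1486)
    if len(frags) > 1 and frags[-1] < 64:
        print(size, frags)
```

<p>For example, a 1500-byte packet with a 1486-byte MTU becomes a 1484-byte fragment followed by a 36-byte fragment, which is exactly the kind of tiny trailing fragment associated with the loss spikes.</p>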
<h3>Reordering PPP frames causes packet loss</h3>
<p>The MLPPP session violates RFC 1990 Section 4.1 by delivering PPP frames out of order. When a large fragment is followed by a tiny one, the PPP frame containing the tiny fragment often arrives before the PPP frame containing the preceding large fragment. RFC 1990 forbids out-of-order MLPPP frames, so the only choice is to discard frames that arrive out of order, which causes the observed &#8220;packet loss&#8221;.</p>
<p>The same symptoms occur in both upstream and downstream directions, regardless of whether the small frames originated from IP packet fragments or PPP frame fragments. The reordering happens at the PPP layer: The IP fragment offsets are in-order relative to the MLPPP sequence number. It appears this PPP frame reordering is done by Teksavvy: Using the same DSL line, modem, and configuration, a Teksavvy login will reorder small packets ahead of a big one, but a Bell DSL login will not.</p>
<p>With plain IP over PPP (no multilink, compression, or encryption), this isn&#8217;t a huge issue: PPP doesn&#8217;t notice when packets are out of order, and IP doesn&#8217;t care. However, out-of-order delivery breaks everything else.</p>
<p>For MLPPP, short of getting the PPP layer to deliver packets in order, I think the next best thing is to avoid creating tiny frames, which reduces the probability of a reordered packet. Reducing fragmentation should help, but it probably won&#8217;t help if tiny ACK packets are being reordered too. MLPPP requires that the underlying PPP frames be delivered in order, as RFC 1661 requires: mostly in order isn&#8217;t sufficient.</p>
<p>This might be an attempt to improve performance by prioritizing small frames (likely ACK packets), but the PPP layer is the wrong place to do this.</p>
<h3>Reducing fragmentation via MRU/MRRU options</h3>
<p>If the network will deliver tiny MLPPP frames out of order, we can avoid some of the penalty by reducing the number of small frames. We can&#8217;t affect how many small IP packets (and empty ACKs) are sent, so the best we can do is avoid IP or PPP fragmentation.</p>
<p>There are two significant sources of fragmentation: downstream IP packet fragmentation and upstream PPP frame fragmentation. It appears Teksavvy&#8217;s router does not do downstream PPP frame fragmentation, and upstream IP fragmentation should be rare if Path MTU discovery works normally.</p>
<h4>Upstream</h4>
<ol>
<li>On single-link bundles, when MRRU &gt; MRU &#8722; 6 (the 6-byte multilink header), large PPP frames are fragmented to fit within the MRU. Since the MRRU is used as the IP-layer MTU, PPP frame fragmentation can be avoided by lowering the server MRRU to 1492 &#8722; 6 = 1486 while keeping the MRU at 1492.
<p>Workaround: Set the client IP-layer MTU to 1486 manually to avoid upstream PPP frame fragmentation.</li>
<li>Additionally, the Linux kernel (2.6.31-rc6 and newer) seems to have a bug in ppp_mp_explode() (drivers/net/ppp_generic.c) that fragments PPP frames into MLPPP fragments 2 bytes smaller than the MRU allows. Lowering the server MRRU (or client IP MTU) to 1484 may be even better.</li>
</ol>
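<p>The arithmetic behind items 1 and 2 can be sketched in Python for a single-link bundle. This is a simplified model using the figures above: a 6-byte multilink overhead per fragment, greedy splitting of the reconstructed packet, and the suspected Linux bug modelled as 2 extra bytes lost per fragment. The real ppp_mp_explode() logic is more involved, especially with several links, and the function name here is hypothetical.</p>

```python
MLPPP_OVERHEAD = 6  # bytes per fragment, the figure used in this post


def mlppp_fragment_sizes(packet_len, mru, linux_bug=False):
    """Split one reconstructed packet into MLPPP fragment payload sizes (single link)."""
    max_chunk = mru - MLPPP_OVERHEAD
    if linux_bug:
        max_chunk -= 2  # fragments come out 2 bytes smaller than the MRU allows
    sizes = []
    remaining = packet_len
    while remaining > 0:
        chunk = min(remaining, max_chunk)
        sizes.append(chunk)
        remaining -= chunk
    return sizes


print(mlppp_fragment_sizes(1486, mru=1492))                  # fits in one fragment
print(mlppp_fragment_sizes(1486, mru=1492, linux_bug=True))  # 1484 plus a 2-byte runt
print(mlppp_fragment_sizes(1484, mru=1492, linux_bug=True))  # one fragment again
```

<p>Under this model, a 1486-byte packet leaves a 2-byte trailing fragment when the bug is present, which is why 1484 is suggested for Linux clients.</p>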
<h4>Downstream</h4>
<ol start=3>
<li>It appears the client&#8217;s MRU is used incorrectly by Teksavvy&#8217;s router. The MRU value should apply to the PPP payload, but Teksavvy&#8217;s routers appear to use it for the IP layer&#8217;s MTU without subtracting off the 6-byte multilink headers. Because of this, Teksavvy&#8217;s routers will send PPP frames that are up to 6 bytes longer than the client&#8217;s MRU. The routers handle the MRRU option correctly. Normally, the client MRRU is used for the IP MTU and the PPP layer will fragment frames to fit within MRU, but Teksavvy does not fragment PPP frames.
<p>Workaround: At least one of the routers (206.248.154.106) seems to have hard-coded a maximum IP-layer MTU of 1486. Otherwise, the client MRRU should be set to 1486 (with MRU=1492).</p>
<ul>
<li>Teksavvy&#8217;s workaround has a side effect when multilink is disabled. When the MTU is really 1492, the router fragments IP packets at 1486 anyway.
<li>This IP-layer fragmentation ignores the Don&#8217;t Fragment (DF) bit, which breaks Path MTU discovery.
<p>To Teksavvy: I think the DF bit should be respected, but this might make bad configurations fail even harder.</p>
</ul>
<li>There is no hard-coded 1486-byte IP-MTU on 206.248.154.103. Workaround: The client should ask for a 1486-byte MRRU because the 1492-byte MRU option is handled incorrectly.
</li>
</ol>
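<p>The downstream behaviour in items 3 and 4 can be modelled the same way. This Python sketch assumes the router picks its IP MTU as the smaller of the client&#8217;s MRU and MRRU (capped by a hard-coded limit where one exists, as on 206.248.154.106), then adds the 6-byte multilink header instead of subtracting it from the MRU. The function names are hypothetical.</p>

```python
MLPPP_OVERHEAD = 6  # bytes, as in this post


def downstream_frame_len(client_mru, client_mrru, router_mtu_cap=None):
    """Largest PPP frame the router would send under the behaviour described above."""
    ip_mtu = min(client_mru, client_mrru)
    if router_mtu_cap is not None:
        ip_mtu = min(ip_mtu, router_mtu_cap)
    # The 6-byte header is added on top instead of being subtracted from the MRU.
    return ip_mtu + MLPPP_OVERHEAD


def bytes_over_mru(client_mru, client_mrru, router_mtu_cap=None):
    """How far the largest downstream frame exceeds the client's MRU (0 if it fits)."""
    frame = downstream_frame_len(client_mru, client_mrru, router_mtu_cap)
    return max(0, frame - client_mru)


print(bytes_over_mru(1492, 32719))        # large MRRU, no cap: frames 6 bytes too long
print(bytes_over_mru(1492, 32719, 1486))  # router with the hard-coded 1486 cap
print(bytes_over_mru(1492, 1486))         # client asks for an MRRU of 1486
```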
<h4>Summary</h4>
<p>Client:</p>
<ul>
<li>Set IP-MTU to 1486 (or 1484 on Linux 2.6.31+) to avoid upstream PPP frame fragmentation (#1, #2)</li>
<li>Set MRU=1492. This is the correct value even though Teksavvy doesn&#8217;t handle it correctly.</li>
<li>Set MRRU to 1486 because Teksavvy does not correctly handle an MRU of 1492. (#3, #4)</li>
</ul>
<p>Teksavvy:</p>
<ul>
<li>Setting the server MRRU to 1486 (or 1484 for Linux clients) would avoid upstream PPP fragmentation without the client manually setting the MTU. (#1, #2)</li>
<li>MRU should be 1492 as it is now.</li>
<li>Hard-coding the IP-MTU to 1486 bytes is a good workaround for MLPPP users, but not for non-MLPPP users who want IP-MTU=1492. Either way, please respect the DF flag! (#3)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2011/09/teksavvy-mlppp-performance-issues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hyper-Threading Performance</title>
		<link>http://blog.stuffedcow.net/2011/08/hyperthreading-performance/</link>
		<comments>http://blog.stuffedcow.net/2011/08/hyperthreading-performance/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 02:12:00 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=357</guid>
		<description><![CDATA[Intel uses Hyper-Threading (HT) as a feature for market segmentation: The desktop Core i5 processors differ from the Core i7 mainly by whether HT has been disabled, and Intel charges a significant price premium for the Core i7. Does the performance improvement of HT justify its cost? I test the performance of HT using a selection of cluster-type workloads. <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2011/08/hyperthreading-performance/">Hyper-Threading Performance</a></span>]]></description>
			<content:encoded><![CDATA[<p>Simultaneous multithreading (SMT, or Intel Hyper-Threading) is a method of improving the utilization and throughput of a processor by allowing two independent program threads to share the execution resources of one processor, so when one thread stalls the processor can execute ready instructions from a second thread instead of sitting idle. Because only the thread context state and a few other resources are replicated (unlike replicating entire processor cores), the throughput improvement depends on whether the shared execution resources are a bottleneck and is typically much less than 2x with two threads.</p>
<p>Currently, Intel uses HT as a feature for market segmentation: The desktop Core i5 processors differ from the Core i7 mainly by whether HT has been disabled, and Intel charges a significant price premium for the Core i7. Therefore, I want to know what the performance benefit of HT is. Since my workloads usually involve running many independent single-threaded processes on a cluster of machines, these measurements don&#8217;t use multithreaded workloads.</p>
<h2>Hardware</h2>
<table>
<tr>
<th>Processor
<td>Intel Core i7-860 (4 cores, 8 threads, 3.2 GHz with Turbo disabled, 8&nbsp;MB L3 cache, Lynnfield)</tr>
<tr>
<th>Memory
<td>8&nbsp;GB DDR3 1600 @ 1530</tr>
</table>
<h2>Workloads</h2>
<table>
<tr>
<th>Workload
<th>Description</tr>
<tr>
<td>Dhrystone
<td>Version 2.1. A synthetic integer benchmark. Compiled with Intel C Compiler 11.1</tr>
<tr>
<td><a href="http://www.coremark.org/">CoreMark</a>
<td>Version 1.0. Another integer CPU core benchmark, intended as a replacement for Dhrystone. Compiled with Intel C Compiler 12.0.3</tr>
<tr>
<td>Kernel Compile
<td>Compile kernel-tmb-2.6.34.8 using GCC 4.4.3</tr>
<tr>
<td><a href="http://www.eecg.utoronto.ca/vpr/">VPR</a>
<td>Academic FPGA packing, placement, and routing tool from the University of Toronto. Modified version 5.0. Compiled with Intel C Compiler 11.1</tr>
<tr>
<td><a href="http://www.altera.com/products/software/sfw-index.jsp">Quartus</a>
<td>Commercial FPGA design software for Altera FPGAs. Compile a 6,000-LUT circuit for the Stratix III FPGA. Includes logic synthesis and optimization (quartus_map), packing, placement, and routing (quartus_fit), and timing analysis (quartus_sta). Version 10.0, 64-bit.</tr>
<tr>
<td><a href="http://bochs.sourceforge.net/">Bochs</a>
<td>Instruction set (functional) simulator of an x86 PC system. This benchmark runs the first ~4 billion timesteps of a simulation of a system booting Windows XP. Modified version 2.4.6. Compiled with GCC 4.4.3</tr>
<tr>
<td><a href="http://www.simplescalar.com">SimpleScalar</a>
<td>Processor microarchitecture simulator. This test runs sim-outorder (a cycle-accurate simulation of a dynamically-scheduled RISC processor), simulating 100M instructions. Version 3.0. Compiled with GCC 4.4.3</tr>
<tr>
<td><a href="http://www.gpgpu-sim.org/">GPGPU-Sim</a>
<td>Cycle-level simulator of contemporary GPU microarchitectures running CUDA and OpenCL workloads. Version 3.0.9924.</tr>
</table>
<h2>Throughput Scaling with Multiple Threads</h2>
<p>With the exception of the kernel compile workload, all of these tests start multiple instances of the same task and measure the total throughput of the processor (number of tasks / average runtime per task). The kernel compile uses &#8220;make -j&#8221; to run multiple instances of GCC that independently compile each file, and the time to compile the entire kernel is measured.</p>
<p>The number of simultaneous tasks is varied and plotted. For workloads that are not memory-bound, we expect roughly linear improvement in throughput between 1 and 4 threads (for a 4-core processor), less improvement between 4 and 8 threads (the additional benefit of HT), and roughly no change in throughput beyond 8 threads (these tasks have little IO).</p>
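<p>As a rough illustration of how the throughput metric and the scaling ratios in the following charts relate, here is a small Python sketch. The runtimes are made-up placeholders, not measurements from this post, and the function names are hypothetical.</p>

```python
from statistics import mean


def throughput(runtimes):
    """Tasks per unit time: number of concurrent tasks / average runtime per task."""
    return len(runtimes) / mean(runtimes)


def scaling(tp, base_threads, threads):
    """Throughput ratio between two thread counts, e.g. 8 vs 4 for the HT gain."""
    return tp[threads] / tp[base_threads]


# Hypothetical runtimes in seconds for 1, 4 and 8 concurrent copies of one task.
runs = {
    1: [100.0],
    4: [102.0, 101.0, 103.0, 102.0],
    8: [165.0, 163.0, 166.0, 164.0, 165.0, 167.0, 163.0, 164.0],
}
tp = {n: throughput(r) for n, r in runs.items()}
print(f"multicore scaling (4 vs 1): {scaling(tp, 1, 4):.2f}x")
print(f"HT gain (8 vs 4): {scaling(tp, 4, 8):.2f}x")
```

<p>The kernel compile does not fit this per-task formula; its throughput would instead be the reciprocal of the total build time.</p>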
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/thread-scaling.png" alt="" title="Thread Scaling" width="577" height="385" class="aligncenter size-full wp-image-370" /></p>
<p>This line plot shows all of the data in one plot. The workload throughput scales reasonably close to linear with the number of real cores they use (1 to 4 threads), while throughput improvements due to HT vary between workloads. Interestingly, Dhrystone throughput decreases with HT, while CoreMark has the second-highest gain (behind VPR), yet both of them are small integer benchmarks that have little main memory traffic.</p>
<h2>Hyper-Threading Throughput Scaling</h2>
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/ht-speedup.png" alt="" title="HT Speedup" width="577" height="387" class="aligncenter size-full wp-image-373" /></p>
<p>This chart focuses on comparing the throughput at 8 threads vs. 4 threads for the different workloads. The median improvement for HT is 25%.</p>
<h2>Multicore Throughput Scaling</h2>
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/multicore-speedup.png" alt="" title="Multicore Speedup" width="577" height="387" class="aligncenter size-full wp-image-375" /></p>
<p>This plot compares the throughput at 4 threads (one thread on each core) vs. 1 thread. If independent processes are executing on independent cores, we would expect to see a 4x improvement in throughput when running 4 copies of the workload. In the Core i7, the L3 cache and memory system are shared between cores. Scaling less than 4x with 4 independent threads indicates that the workload is sensitive to L3 cache size or memory system bandwidth.</p>
<p>Note that the kernel compile workload isn&#8217;t strictly independent, so sub-linear scaling does not necessarily mean GCC is sensitive to cache size or memory system bandwidth. The kernel compile workload compiles different files in parallel, with some dependencies between tasks.</p>
<p>Most of the workloads scale close to 4x with 4 cores. Other than kernel compile, Quartus and GPGPU-Sim workloads scale significantly worse than linear. Quartus is known to be sensitive to memory performance. I don&#8217;t know about GPGPU-Sim&#8217;s characteristics, but this might be a hint that it, too, has fairly random access patterns on a large memory working set.</p>
<h2>Core i5 or Core i7?</h2>
<p>The above measurements were made using a Lynnfield Core i7, but future purchasing decisions would be for the next-generation Sandy Bridge. It is unknown how closely the performance gains for HT on Sandy Bridge processors match the HT gains for Lynnfield, although I would expect them to be similar.</p>
<p>As of today, a Sandy Bridge Core i7-2600K costs around $300, while a Core i5-2500K costs $210. The system cost is around 22% higher for the Core i7-2600K assuming each node has the same amount of RAM (so memory/thread is half on the HT system compared to the non-HT system). This indicates that price-performance is <i>slightly</i> higher for Core i7 with HT (22% more price for 25% more performance) when running cluster-type workloads. However, because price-performance is so close, there are other issues to consider:</p>
<ul>
<li>Hyper-Threading requires twice as many threads to achieve peak CPU utilization, requiring each system to have twice the amount of RAM to keep memory/thread constant. Cluster-type workloads run independent processes that don&#8217;t share memory, so memory consumption is nearly linear with the number of threads. Doubling the RAM for a HT system further adds to the system cost.</li>
<li>Hyper-Threading creates the potential for load imbalance, where one node has more tasks than physical cores (is using HT) while another node has physical cores idling. This is the same scheduling problem as discussed in my <a href="http://blog.stuffedcow.net/2011/08/linux-smt-aware-process-scheduling/">post on SMT-aware process schedulers</a>, but extended to scheduling between different compute nodes. This could be significant for long-running jobs, although with 4C-8T the likely impact should be small (probability theory escapes me for the moment, so I don&#8217;t know how much).</li>
<li>Hyper-Threading should be a net power-performance win. HT does consume significantly more power when used, but I believe the increase is less than 20%. This needs to be measured.</li>
<li>Although a net power win, HT CPUs have higher power density and are harder to cool. For temperature-limited overclocks, a non-HT CPU will likely clock slightly higher, even if total system power is higher.</li>
<li>Hyper-Threading is a performance-density win, because the alternative of having more nodes in a non-HT cluster occupies more space, even if it doesn&#8217;t cost more.</li>
</ul>
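<p>The price-performance comparison for the Core i5 vs. Core i7 is simple arithmetic; this Python sketch reproduces it using the 25% median HT gain and 22% system cost difference above. The 10% RAM share in the second case is a purely hypothetical figure, included to show how doubling RAM would shift the balance.</p>

```python
def relative_perf_per_dollar(perf_gain, cost_increase):
    """Price-performance of option B relative to A, given B's fractional gains and costs."""
    return (1 + perf_gain) / (1 + cost_increase)


# Figures from the post: ~25% median HT gain, ~22% higher system cost.
print(f"{relative_perf_per_dollar(0.25, 0.22):.3f}")

# Hypothetical: if RAM were 10% of the non-HT system cost and had to be doubled
# to keep memory per thread constant, the cost gap would widen to ~32%.
print(f"{relative_perf_per_dollar(0.25, 0.22 + 0.10):.3f}")
```

<p>With the post&#8217;s figures the Core i7 comes out only about 2.5% ahead per dollar, so a modest extra cost such as additional RAM is enough to tip it below parity.</p>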
<p>Conclusion: I still don&#8217;t know&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2011/08/hyperthreading-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux SMT-Aware Process Scheduling</title>
		<link>http://blog.stuffedcow.net/2011/08/linux-smt-aware-process-scheduling/</link>
		<comments>http://blog.stuffedcow.net/2011/08/linux-smt-aware-process-scheduling/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 03:00:01 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>
		<category><![CDATA[linux]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=322</guid>
		<description><![CDATA[<p>Process scheduling for multicore multithreaded (SMT or HT) systems adds a new challenge to an operating system&#8217;s process scheduler. Two threads scheduled on different cores will run faster than two threads scheduled onto different thread contexts of the same core because much of the hardware resources are shared between SMT thread contexts. This can be <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2011/08/linux-smt-aware-process-scheduling/">Linux SMT-Aware Process Scheduling</a></span>]]></description>
			<content:encoded><![CDATA[<p>Process scheduling for multicore multithreaded (SMT or HT) systems adds a new challenge to an operating system&#8217;s process scheduler. Two threads scheduled on different cores will run faster than two threads scheduled onto different thread contexts of the same core, because many of the hardware resources are shared between SMT thread contexts. This can be a problem when there are more than one but fewer than N-1 runnable threads (for N thread contexts): a scheduler can mistakenly schedule two threads on the same core while leaving another core idle.<span id="more-322"></span></p>
<p>The default process scheduler in Linux (as of kernel 2.6.34) is <a href="http://en.wikipedia.org/wiki/Completely_Fair_Scheduler">CFS (Completely Fair Scheduler)</a>. The optional &#8220;tmb&#8221; series of kernels in Mandriva distributions uses the <a href="http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler">BFS (Brain Fuck Scheduler)</a> by default. BFS claims to be SMT-aware, much simpler, and lower-latency than CFS, and it targets &#8220;interactive&#8221; desktop-style use on small systems.</p>
<p>Here, I measure how &#8220;SMT-aware&#8221; the BFS and CFS schedulers are when running a simple CPU-intensive workload.</p>
<h2>Workload</h2>
<p>Dhrystone 2.1 compiled for x86-64 with gcc. This is a simple integer benchmark with a small memory footprint and no I/O. I run multiple independent instances of Dhrystone at the same time. Interestingly, Dhrystone&#8217;s throughput actually falls with Hyper-Threading, by roughly 4% when running two threads on one core compared to one.</p>
<h2>System</h2>
<ul>
<li>Core i7 860. 4 cores, 2-way SMT</li>
<li>Mandriva Linux, kernel-tmb-server-2.6.34.7-3mdv. BFS and CFS process schedulers.</li>
</ul>
<h2>CFS and BFS</h2>
<p>When scheduling threads onto cores and thread contexts, there are three trivial cases (for N thread contexts): one runnable thread (placing it on any thread context is the same), N-1 runnable threads (leave exactly one thread context idle; which one doesn&#8217;t matter), and N or more threads (fill every thread context and rotate between them fairly). With several threads, the ideal schedule first gives each thread its own core, and then doubles up (uses both thread contexts of a core) only when there are more threads than cores. The &#8220;Manual Affinity&#8221; option below manually assigns Dhrystone tasks to processors using this rule.</p>
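<p>A minimal Python sketch of the &#8220;first one thread per core, then double up&#8221; rule is shown below. It groups logical CPUs by physical core using the Linux sysfs file <code>topology/thread_siblings_list</code>, then orders them so every core receives one task before any core receives a second. This is an illustration under those assumptions, not necessarily how the Manual Affinity runs were set up; the function names and the example CPU numbering are hypothetical, and real numbering varies between systems.</p>

```python
import glob


def read_sibling_groups():
    """Group logical CPUs by physical core using Linux sysfs topology files."""
    groups = set()
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        with open(path) as f:
            text = f.read().strip()  # e.g. "0,4" or "0-1"
        cpus = []
        for part in text.split(","):
            lo, _, hi = part.partition("-")
            cpus.extend(range(int(lo), int(hi or lo) + 1))
        groups.add(tuple(sorted(cpus)))
    return [list(g) for g in sorted(groups)]


def manual_affinity_order(sibling_groups):
    """Order logical CPUs so each core gets one task before any core gets a second."""
    order = []
    depth = max(len(g) for g in sibling_groups)
    for i in range(depth):
        for group in sibling_groups:
            if i < len(group):
                order.append(group[i])
    return order


# Assumed 4-core layout where core k owns logical CPUs k and k+4.
print(manual_affinity_order([[0, 4], [1, 5], [2, 6], [3, 7]]))
```

<p>Task number i could then be pinned to <code>order[i % len(order)]</code>, for example with <code>taskset -c</code> or Python&#8217;s Linux-only <code>os.sched_setaffinity(pid, {cpu})</code>.</p>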
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/dmips-scheduler.png" alt="" title="DMIPS Thread Scaling by Scheduler" width="577" height="385" class="aligncenter size-full wp-image-326" /></p>
<p>CFS and manual affinity perform identically. On a 4-core, 8-thread system, Dhrystone throughput increases linearly up to 4 instances. Beyond 4 instances, throughput drops slightly up to 8 instances, since Dhrystone performs worse with SMT than without. Most other applications would instead see a shallow upward slope and <i>gain</i> an extra 15-25% throughput between 4 and 8 instances.</p>
<p>BFS bounces the tasks around different thread contexts and often gets the scheduling wrong, leading to lower performance between 2 and 6 instances. Not good.</p>
<h2>Is BFS SMT-Aware?</h2>
<p>If BFS isn&#8217;t very good at scheduling for multicore SMT systems, we should ask whether BFS is SMT-aware at all. I compare BFS to a hypothetical random scheduler that randomly assigns threads to thread contexts. The plot is calculated using straightforward probability, assuming a thread runs at full speed when running alone on a core and gains no additional throughput from SMT (which is slightly better than Dhrystone&#8217;s -4%), then scaled to match the graph. The Manual Affinity results are also plotted for comparison.</p>
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/bfs-smt.png" alt="" title="BFS SMT Awareness" width="577" height="385" class="aligncenter size-full wp-image-335" /><br />
The graph shows that BFS is indeed SMT-aware. It achieves performance roughly halfway between the ideal scheduling and a random scheduler. It&#8217;s just not very good at fully utilizing all the cores before allowing threads to double up onto one core.</p>
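<p>The random-scheduler curve can be reproduced with a short Python sketch. It assumes the random scheduler places the threads on distinct thread contexts chosen uniformly at random and, as in the text, that a core delivers the same throughput with one or two threads, so throughput is proportional to the expected number of busy cores. The function names are hypothetical.</p>

```python
from math import comb


def random_busy_cores(cores, threads, contexts_per_core=2):
    """Expected number of cores with at least one thread when the threads are
    placed on distinct thread contexts chosen uniformly at random."""
    total = cores * contexts_per_core
    if not 0 <= threads <= total:
        raise ValueError("threads must be between 0 and the number of thread contexts")
    # Probability that a given core receives none of the threads.
    p_empty = comb(total - contexts_per_core, threads) / comb(total, threads)
    return cores * (1 - p_empty)


def ideal_busy_cores(cores, threads):
    """Ideal SMT-aware placement: one thread per core until every core is busy."""
    return min(cores, threads)


for k in range(1, 9):
    print(k, ideal_busy_cores(4, k), round(random_busy_cores(4, k), 2))
```

<p>The two curves agree at 1 thread and from 7 threads up, matching the trivial cases above; the gap in between is where scheduler quality matters.</p>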
<h2>Ubuntu 10.10 and Core i7 980X</h2>
<p>Out of curiosity, I ran the same test on a 6-core, 12-thread Core i7 980X overclocked to 4.22 GHz.<br />
<img src="http://blog.stuffedcow.net/wp-content/uploads/2011/08/dmips-980x.png" alt="" title="dmips-980x" width="577" height="385" class="aligncenter size-full wp-image-336" /><br />
The Manual Affinity and CFS plots are normal between 1 and 4 Dhrystone instances (linear increase), but performance is abnormally low at 5 or more instances. The reason: the processor exceeds 100&deg;C and is being thermally throttled.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2011/08/linux-smt-aware-process-scheduling/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PSU Fan Control</title>
		<link>http://blog.stuffedcow.net/2010/12/psu-fan-control/</link>
		<comments>http://blog.stuffedcow.net/2010/12/psu-fan-control/#comments</comments>
		<pubDate>Mon, 27 Dec 2010 23:35:58 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Fixing Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=301</guid>
		<description><![CDATA[The Sparkle Power SPI270LE Flex ATX power supply has a rather noisy 40&#160;mm fan that is temperature-controlled to reduce noise when cool. The fan control circuit failed and powered the fan at full speed regardless of temperature.  <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2010/12/psu-fan-control/">PSU Fan Control</a></span>]]></description>
			<content:encoded><![CDATA[<p>The Sparkle Power SPI270LE Flex ATX power supply has a rather noisy 40&nbsp;mm fan that is temperature-controlled to reduce noise when cool. The fan control circuit failed and powered the fan at full speed regardless of temperature. After much work, it has been repaired.</p>
<p>Power supplies usually use single-layer circuit boards, which makes reverse-engineering the circuit possible. Interestingly, this power supply implements fan speed control entirely with discrete devices, using two bipolar PNP transistors.<br />
<a href="http://blog.stuffedcow.net/wp-content/uploads/2010/12/fancontrol.png"><img src="http://blog.stuffedcow.net/wp-content/uploads/2010/12/fancontrol-300x132.png" alt="" title="Fan Speed Control Schematic" width="300" height="132" class="alignnone size-medium wp-image-305" /></a></p>
<p>A <a href="http://www.thinking.com.tw/documents/en-TTC05.pdf">TTC103</a> negative temperature coefficient thermistor is used (10k at 25Â°C, 2.5k at 60Â°C, 1k at 85Â°C). The circuit appears to use both the output voltage as negative feedback and the thermistor voltage to set the output current that controls fan speed. The diode is probably used to ensure the fan spins even at low temperatures.</p>
<p>The first (left) PNP transistor failed, and leaked current from the base to collector. This caused enough emitter-base current to turn on the fan even when the base voltage is high. The failed transistor was replace with a TO-92 PNP transistor (BC327). Surface mount components are hard to solder&#8230;</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2010/12/transistor.jpg"><img src="http://blog.stuffedcow.net/wp-content/uploads/2010/12/transistor-300x224.jpg" alt="" title="Transistor Replaced" width="300" height="224" class="alignnone size-medium wp-image-308" /></a></p>
<p>Curious how the fan speed control circuit really behaves, I simulated it using LTspice. The fan is modeled as a 100&nbsp;ohm resistance. Output voltage and thermistor voltage are plotted against thermistor resistance. Thermistor resistance is non-linear in temperature (roughly 10k at 25&deg;C, 1k at 85&deg;C). The temperature at which the fan speeds up can be adjusted using the 6.8k/36k divider and is strongly dependent on the base-emitter forward voltage of the transistors. Two BC327 transistors are simulated because I can&#8217;t read the marking on the original transistor. The circuit creates a nice speed vs. temperature curve, with a reasonably linear transition region (55&deg;C-95&deg;C), while still keeping the fan on (27&nbsp;mA) at low temperatures.</p>
<p><a href="http://blog.stuffedcow.net/wp-content/uploads/2010/12/spice.png"><img src="http://blog.stuffedcow.net/wp-content/uploads/2010/12/spice-300x218.png" alt="" title="Simulation" width="300" height="218" class="alignnone size-medium wp-image-310" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2010/12/psu-fan-control/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Refrigerators</title>
		<link>http://blog.stuffedcow.net/2010/11/refrigerators/</link>
		<comments>http://blog.stuffedcow.net/2010/11/refrigerators/#comments</comments>
		<pubDate>Sun, 28 Nov 2010 00:28:02 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=274</guid>
		<description><![CDATA[It has been claimed that new refrigerators use much less power than old ones. This is also the premise of <a href="http://everykilowattcounts.ca/residential/fridge/faq.php">The Great Refrigerator Roundup</a> program that encourages replacement of refrigerators older than 15 years. Here is one comparison, measured over about 3 days. <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2010/11/refrigerators/">Refrigerators</a></span>]]></description>
			<content:encoded><![CDATA[<p>It has been claimed that new refrigerators use much less power than old ones. This is also the premise of <a href="http://everykilowattcounts.ca/residential/fridge/faq.php">The Great Refrigerator Roundup</a> program that encourages replacement of refrigerators older than 15 years. Here is one comparison, measured over about 3 days.</p>
<table>
<tr>
<td>
<th>New
<th>Old</tr>
<tr>
<th>Model
<td>GE GTH18HBT2RWW
<td>GE VL15JYM</tr>
<tr>
<th>Year
<td>2010
<td>1992</tr>
<tr>
<th>Capacity (ft<sup>3</sup>)
<td>18.1
<td>~15</tr>
<tr>
<th>Duty Cycle (%)
<td>25
<td>43</tr>
<tr>
<th>Average On Power (W)<br />(Includes compressor and defrost)
<td>101
<td>155</tr>
<tr>
<th>Average Power (W)
<td>25
<td>67</tr>
<tr>
<th>Annual Energy Use (kWh/yr)
<td>221
<td>585</tr>
<tr>
<th>EnerGuide Rated Energy Use (kWh/yr)
<td>335
<td>816</tr>
</table>
<p>The new refrigerator uses 38% of the electricity of the old one, while also being 20% bigger. Assuming $0.10/kWh, the new refrigerator saves $36.40/year. The new and old refrigerators use about 66% and 72% of their rated consumption, respectively.</p>
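<p>The derived figures can be checked from the table with a few lines of Python. As a cross-check, annual energy is also recomputed from duty cycle &#215; average on power, assuming a 365.25-day year; small differences come from rounding in the table.</p>

```python
HOURS_PER_YEAR = 24 * 365.25
PRICE_PER_KWH = 0.10  # $/kWh, as assumed in the text

# Values from the table above.
new = {"kwh_yr": 221, "rated_kwh_yr": 335, "ft3": 18.1, "duty": 0.25, "on_w": 101}
old = {"kwh_yr": 585, "rated_kwh_yr": 816, "ft3": 15.0, "duty": 0.43, "on_w": 155}


def annual_kwh(duty, on_watts):
    """Average power (duty cycle x on power) converted to kWh per year."""
    return duty * on_watts * HOURS_PER_YEAR / 1000


usage_ratio = new["kwh_yr"] / old["kwh_yr"]
size_ratio = new["ft3"] / old["ft3"]
savings = (old["kwh_yr"] - new["kwh_yr"]) * PRICE_PER_KWH
print(f"energy ratio {usage_ratio:.2f}, size ratio {size_ratio:.2f}, saves ${savings:.2f}/yr")

for name, fridge in (("new", new), ("old", old)):
    share = fridge["kwh_yr"] / fridge["rated_kwh_yr"]
    print(name, round(annual_kwh(fridge["duty"], fridge["on_w"])), f"{share:.0%} of rating")
```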
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2010/11/refrigerators/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Seasonal Fuel Efficiency</title>
		<link>http://blog.stuffedcow.net/2010/11/seasonal-fuel-efficiency/</link>
		<comments>http://blog.stuffedcow.net/2010/11/seasonal-fuel-efficiency/#comments</comments>
		<pubDate>Sun, 21 Nov 2010 21:16:25 +0000</pubDate>
		<dc:creator>Henry</dc:creator>
				<category><![CDATA[Measuring Stuff]]></category>

		<guid isPermaLink="false">http://blog.stuffedcow.net/?p=254</guid>
		<description><![CDATA[<p>It is well-known that a car&#8217;s fuel efficiency decreases during the winter months. There are many potential contributors, including increased air density causing drag, excessively rich fuel mixture from cold starts taking a long time to warm up, increased pumping losses from dense cold air intake, increased engine oil viscosity, increased rolling friction from colder <span style="color:#777"> . . . &#8594; Read More: <a href="http://blog.stuffedcow.net/2010/11/seasonal-fuel-efficiency/">Seasonal Fuel Efficiency</a></span>]]></description>
			<content:encoded><![CDATA[<p>It is well-known that a car&#8217;s fuel efficiency decreases during the winter months. There are many potential contributors, including increased air density causing drag, excessively rich fuel mixture from cold starts taking a long time to warm up, increased pumping losses from dense cold air intake, increased engine oil viscosity, increased rolling friction from colder tire rubber, and probably more I don&#8217;t know about. This is an attempt at quantifying the overall effect, but it doesn&#8217;t distinguish between the causes.</p>
<h3>Test Methodology</h3>
<p>This is data from a 2003 Corolla CE, collected between Jan. 2009 and Nov. 2010. Each data point is the fuel efficiency for one tank of gasoline, plotted at the date of the refueling. The trips are mostly a mix of highway and city driving, about 60% highway by time traveled. Data points from long highway-only trips have been removed. The month and day of each fill-up are plotted with the year ignored, so the resulting graph is cyclical with a period of one year.</p>
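<p>Folding fill-up dates onto a single year can be sketched in Python as below. The post does not say which units its chart uses; litres per 100&nbsp;km, the per-tank calculation, and the sample fill-ups are all assumptions for illustration, and the function names are hypothetical.</p>

```python
from datetime import date


def year_fraction(d):
    """Position of a date within its year, from 0.0 (Jan 1) up to just under 1.0."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days  # handles leap years
    return (d - start).days / days_in_year


def litres_per_100km(litres, km):
    """Consumption for one tank: fuel added at this fill-up over distance since the last."""
    return 100 * litres / km


# Hypothetical fill-ups: (date, litres added, km driven since the previous fill-up).
fillups = [(date(2009, 1, 14), 38.2, 480.0), (date(2010, 1, 16), 37.5, 470.0)]
for d, litres, km in fillups:
    print(f"{year_fraction(d):.3f}", round(litres_per_100km(litres, km), 2))
```

<p>Because the year is discarded, fill-ups from mid-January 2009 and mid-January 2010 land at nearly the same x position, which produces the one-year cyclical plot.</p>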
<h3>Results</h3>
<p><img src="http://blog.stuffedcow.net/wp-content/uploads/2010/11/seasonal_fuel.png" alt="" title="Seasonal Fuel Efficiency" width="682" height="510" class="alignnone size-full wp-image-263" /></p>
<p>There is a noticeable seasonal dependence of fuel efficiency, which is correlated with average temperature and is worst in mid-January. Winter and summer fuel efficiency differ by about 15%, but there is also significant random variation of comparable magnitude between fill-ups. I don&#8217;t see evidence that using air conditioning during the hottest summer months has a big impact on fuel efficiency.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.stuffedcow.net/2010/11/seasonal-fuel-efficiency/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

