The Dude abides.

Posted
13 December 2007

Tagged
Current Events
Free Software
Technology
Travels

FOSS.in Conference Day 2

This is day 2 of the conference. Reports for previous days are as follows: Project Day 1, Project Day 2, Conference Day 1.

Union Mount: VFS based Filesystem Namespace Unification for Linux

Bharata B. Rao from IBM spoke on union mount, a form of filesystem namespace unification (i.e., merging the contents of two or more directories/filesystems to present a unified view). Some uses of union mount include:

  • live CD systems (writable RAM based FS combined with a read only FS on CD, thus allowing a writable disk-less system)
  • server consolidation, many servers sharing a common RO installation
  • disk-less NFS-root clients (set of machines sharing a single RO NFS root filesystem)
  • sandboxing - simulation of software updates, testing of OS updates

The union filesystem also offers unification at the filesystem layer, so ext3 and ReiserFS can both be abstracted behind a single union. Filesystems offer a namespace (a hierarchical view of the filesystem contents) and mounting (adding the filesystem on a device to the namespace tree).

Some examples of transparent mounts include the following:

mount /dev/sda1 /mnt
mount -union /dev/sda2 /mnt

So /mnt becomes the union mount point of sda1 and sda2: sda2 becomes the topmost writable layer and sda1 is the RO bottom layer of the union.

For directory listings (readdir), directory contents from the different layers are merged, and when the same file name exists in several layers only the topmost file is shown. Same-named directories are merged in turn. For file/directory lookups, the lookup starts at the topmost layer and proceeds downwards, stopping as soon as the required file is found. For directories, it also descends into the lower layers to create subdirectory-level unions.

All but the topmost layer are immutable RO layers. A write to a file in a lower layer results in the file being copied up to the topmost layer, with the write performed on the copy. Whiteouts are placeholders for files that logically don't exist: deleting a file/directory that exists only in a lower layer creates a whiteout for it in the topmost layer. Looking up a whiteout returns -ENOENT.
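The layering rules above can be modelled in a few lines of Python. This is only a toy sketch, not the kernel implementation: the `UnionMount` class, its method names, and the use of dicts as "layers" are all my own illustration, and I am assuming that deleting a file which exists in both layers also leaves a whiteout (the talk notes only cover the lower-only case).

```python
class UnionMount:
    """Toy model of a union of layers; layers[0] is the topmost, writable one."""

    WHITEOUT = object()  # placeholder marking a name as logically deleted

    def __init__(self, *layers):
        # Each layer is a plain dict mapping file name -> contents (hypothetical model).
        self.layers = list(layers)

    def lookup(self, name):
        # Search from the topmost layer downwards and stop at the first hit.
        for layer in self.layers:
            if name in layer:
                if layer[name] is self.WHITEOUT:
                    raise FileNotFoundError(name)  # the kernel would return -ENOENT
                return layer[name]
        raise FileNotFoundError(name)

    def readdir(self):
        # Merge names from all layers; upper entries and whiteouts hide lower ones.
        seen, hidden = [], set()
        for layer in self.layers:
            for name, value in layer.items():
                if name in hidden or name in seen:
                    continue
                if value is self.WHITEOUT:
                    hidden.add(name)
                else:
                    seen.append(name)
        return sorted(seen)

    def append(self, name, data):
        # Copy up (if the file lives only in a lower layer), then write to the top copy.
        self.layers[0][name] = self.lookup(name) + data

    def unlink(self, name):
        self.lookup(name)  # raises FileNotFoundError if the name does not exist
        top = self.layers[0]
        if any(name in lower for lower in self.layers[1:]):
            top[name] = self.WHITEOUT  # lower layers are RO, so mask the name instead
        else:
            del top[name]


lower = {"a.txt": "lower a", "b.txt": "lower b"}  # RO bottom layer
upper = {"b.txt": "upper b"}                      # writable top layer
union = UnionMount(upper, lower)
```

Appending to `a.txt` through `union` leaves `lower` untouched and puts the modified copy in `upper`; unlinking it afterwards replaces that copy with a whiteout, so it disappears from `readdir()` even though the lower layer still holds it.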

For renames, files/directories present only in the topmost layer use the traditional rename. The rename of a directory which is part of a union, or which is present only in a lower layer, is deferred to userspace by returning -EXDEV. A regular file present only in a lower layer is renamed by copying it up to the topmost layer.
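As a rough summary of the rename rules, here is a small decision function. It is a sketch from my notes, not the real VFS code: the function name, its parameters, and the string return values are made up for illustration, and the case of a regular file present in both layers was not covered in my notes, so the sketch returns None for it.

```python
def rename_action(in_top, in_lower, is_dir):
    """Return what the union mount does for a rename, per the rules from the talk.

    in_top / in_lower say where the source exists; is_dir says whether it is a directory.
    """
    if not (in_top or in_lower):
        raise FileNotFoundError("source does not exist in any layer")
    if in_top and not in_lower:
        return "rename"   # only in the topmost layer: traditional rename
    if is_dir:
        return "EXDEV"    # union or lower-only directory: deferred to userspace
    if in_lower and not in_top:
        return "copy-up"  # lower-only regular file: copy up to the topmost layer
    return None           # regular file in both layers: not covered in my notes
```

Returning -EXDEV is the same error a plain rename gives across filesystems, so userspace tools that already fall back to copy-and-delete in that situation can handle the directory case.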

There was a vigorous Q&A session, which I could not take down quickly enough. Needless to say, there was much interest from audience members.

Personal thoughts: A fairly lucid technical talk.

MySQL and the architecture of participation

Colin Charles spoke on MySQL and getting involved. Two years ago, MySQL had no contribution mechanism and working at MySQL was, for all intents and purposes, like working at a startup. MySQL has been open source since early 2000, GPLv2 even. However, development has been relatively closed, as contributors to MySQL are usually hired immediately. In addition, code reviews are performed in secret and there are legal hurdles to getting external contributors involved.

MySQL sought change at some point and opened up to getting the community involved, including making its bugs database, mailing list and forums publicly available. The developer zone (devzone) was the most immediately useful of the sites, with downloads, necessary documentation, articles and the like. Apparently, MySQL devzone still has some marketing in it, so it doesn’t serve developers as well as it should.

MySQL Forge is a SourceForge/Freshmeat equivalent tailored for projects that use MySQL. It allows for sharing of SQL snippets, stored procedures and UDFs, and also provides a wiki. Much of the MySQL internals documentation has been moved to the wiki (this includes localized documentation too).

Next up was the Quality Contribution Program (QCP), a MySQL effort to improve the MySQL product base through active community participation. Community members who take part get acknowledgement and possibly rewards. Participation is counted as activity in bug hunting/reports, test cases and patches over the last 12 months.

Particularly interesting was “Worklogs”, effectively development and roadmap tasks for MySQL. They describe features that MySQL developers are working on, with the ability for users to provide feedback on features being developed. Very cool!

For development and versioning purposes, MySQL still uses the proprietary BitKeeper. However, all of MySQL’s trees are public (http://mysql.bkbits.net) and should be up to date. Colin gave an overview of checking out MySQL sources and compiling the software. He also mentioned the use of MySQL sandbox, which is a testing playground for MySQL releases up to 6.0.

Colin went through some test cases and discussed MySQL storage engines. Of particular interest was the open source MySQL proxy which allows for monitoring and analysis of MySQL queries. Apparently, plugins are written in the Lua programming language (version 5.1).

Finally, he described ways in which external developers can contribute to MySQL.

Personal thoughts: Honest talk on where MySQL can improve in terms of community contributions.

PostgreSQL 8.3: A story of hundreds of patches

Josh Berkus, with his co-presenter Pavan Deolasee, spoke on the upcoming PostgreSQL 8.3 release. He started off by showing the PostgreSQL logo and asked whether a modified Lord Ganesha logo would suit an Indic PostgreSQL team (given the silence that greeted his suggestion, I am thinking no).

Josh started off quickly on PostgreSQL 8.3 by noting that it is in beta and is to be released on the 4th of January 2008. The reason for it being in beta for quite a while has simply been the many new patches the PostgreSQL team has received (280 patches and features). In that sense, Josh noted that (unlike the other open source database) PostgreSQL is a community project. Ouch! :)

One new feature is SQL/XML, which has actually been in development for some time (since 2002). Some time ago, Peter Eisentraut wrote a prototype XML export feature (to export a table to XML). Following up on that, Pavel Stehule wrote an SQL/XML syntax demo (the first standard syntax example, which was dependent on pl/Perl). In 2005, Nikolay Samokhvalov wrote updatable XML views in the RDBMS.

In 2005, Google funded 700 students to work on open source projects. PostgreSQL got a whole 1% of that loot, with 7 students sponsored (Nikolay was one of the 7). To build proper support for SQL/XML, PostgreSQL went looking for standards in this area and found an unpublished ANSI SQL standard dated 2006. The development of SQL/XML was guided to some extent by this standard, with lots of back and forth discussion between the standards developers and PostgreSQL. This shows the willingness of PostgreSQL to do it right and implement a proper standard instead of re-inventing a proprietary, non-interoperable one. There were code modifications to properly support SQL/XML, with many patches and subsequent revisions, and most importantly, before SQL/XML could be released, there was a need for proper documentation. Kudos on this policy, I must say, as PostgreSQL never officially releases anything without proper documentation.

Josh then gave an example of the use of SQL/XML. He dumped a whole lot of restaurant reviews in an XML format into PostgreSQL and used the built-in PostgreSQL XML functions to mine the data. XML data can be created out of a table by using the xmlforest() function. In fact, entire tables/queries can be exported to XML via the table_to_xml() function. XPath can also be used to mine XML data.
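I did not capture Josh's actual queries, so here is a rough Python analogue of the idea instead, using only the standard library rather than PostgreSQL itself. The `xmlforest_like` helper, the restaurant names and the ratings are all made up for illustration; the helper only mimics the spirit of xmlforest() (one XML element per column), and the XPath subset is whatever `xml.etree.ElementTree` supports, not PostgreSQL's XPath support.

```python
import xml.etree.ElementTree as ET


def xmlforest_like(row):
    """Build one XML element per column and concatenate them (hypothetical helper)."""
    parts = []
    for name, value in row.items():
        element = ET.Element(name)
        element.text = str(value)
        parts.append(ET.tostring(element, encoding="unicode"))
    return "".join(parts)


# Made-up review rows standing in for a table of restaurant reviews.
reviews = [
    {"name": "Dosa Corner", "rating": 4},
    {"name": "Curry House", "rating": 2},
]

# Export the whole "table" as a single XML document.
root = ET.Element("reviews")
for row in reviews:
    review = ET.SubElement(root, "review")
    for column, value in row.items():
        ET.SubElement(review, column).text = str(value)

# Mine the document with an XPath expression: names of reviews rated 4.
good = [node.text for node in root.findall("./review[rating='4']/name")]
```

In PostgreSQL the same kind of work stays inside the database: the table is turned into XML with the functions Josh named, and XPath expressions are evaluated against the stored XML values.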

Did I say this is fscking cool? :) Folk in the audience seemed to think so as well.

Next up was HOT (heap only tuples). Josh claimed PostgreSQL to be the fastest Open Source Database (OSDB) compared to MySQL, and certainly more scalable. However, he noted that because of the MVCC model, cleaning up older versions is a big performance hit (read: vacuuming). At this point Josh went into some significant detail regarding the nature of MVCC and why vacuuming is a big performance hit. The gist was that vacuuming can be solved by HOT which Pavan helped develop along with Simon Riggs, Heikki Linnakangas, Tom Lane and various other folk. Essentially, HOT provides the ability to do microvacuums which pretty much solve the performance hit problem (and HOT will be in 8.3!).

Josh also mentioned that an Indian team (CDE from IIT) came up with SkyLine which is an extension to the SQL syntax. However, as it was not part of the SQL standard, it was put into PgFoundry.

For future development, Josh noted that PostgreSQL is a mailing list driven project. However, he admitted that release cycles are slightly long, and in 2008 they are looking at doing two-month-long cycles, which would allow for feedback a lot sooner.

There were some questions from the audience. On specific performance tuning on multiprocessors, Josh suggested increasing shared buffers. Another member of the audience wanted to know whether the SQL/XML interface would allow for JSON dumps. The answer was in the negative, although it was noted that somebody is working on pl/JavaScript. On running PostgreSQL on handheld devices, there were no plans unless handheld devices get really powerful (a due nod to the excellent SQLite was made at this point).

What about PostgreSQL in low memory environments? Apparently that’s feasible, with PostgreSQL working under 20 MB of RAM (I’ve personally tested it on < 64 MB environments - PostgreSQL works wunnerfully!). Somebody wanted to know about Sun’s interest in PostgreSQL when Sun has JavaDB. Well, Josh replied that JavaDB is for embedded and PostgreSQL is for large systems. On Microsoft’s implementation of SQL/XML, Josh agreed that Microsoft sucks balls (I’m *ahem* paraphrasing) as their implementation is “completely non standard” (woah, who could have seen this coming, eh?).

On another question, Josh replied that PostgreSQL 8.3 will be able to execute any function in the XPath 1.0 standard. A low-hanging-fruit question came in on how to troubleshoot slow queries in PostgreSQL; well (I would say RTFM but Josh was polite-r) use “explain analyze” and use system level tools to determine IO, memory or CPU utilization. As to why so few hosting companies provide PostgreSQL hosting, some common reasons were that ISPs think their customers don’t need it and cPanel only offers MySQL (boo!).

There were several other questions but the talk soon ended and we went off to the PostgreSQL BOF.

Personal thoughts: A most excellent talk. My faith in PostgreSQL was always strong but now it’s root-firm!

BOF Session: PostgreSQL

The Birds-of-a-Feather (BOF) session was fun. Josh ran the PostgreSQL BOF and there were approximately 20 people present. He started off with some benchmarks showing how much PostgreSQL owns MySQL in the speed area. Josh was honest though in saying that Oracle still kicks ass and PostgreSQL has some distance to go in catching up.

Josh discussed a tool he was developing to help generate a useful postgresql.conf tailored to the machine’s architecture, as the default postgresql.conf is extremely conservative in its values. He then took us through some common postgresql.conf settings.

Personal thoughts: Very very informative discussion.

Lightning Talks

Next up were the lightning talks. Danese Cooper was organizing them. The lightning talks were great fun. And thanks to Aizatto’s tai-chi, I ended up becoming Malaysia’s representative at the talks. With Rusty Russell, Rasmus Lerdorf and like 100 other people in the room, it was a wee bit scary.

I spoke on CouchDB and Asterisk and how saving AMI events in CouchDB is a great technology fit. There were some other excellent lightning talks (a math teacher speaking about teaching math in this modern day and age, Rusty speaking on the ANTI-THREAD library, Rasmus on his trip, some chap on great places to eat in Bangalore and a whole lot more!).

Personal thoughts: All in all, great fun!

