S3FS on CentOS

So, we’re using CentOS 5 for some of our servers; the ones we need cPanel for.  These are our shared setups with people running blogs, Joomla, Drupal, and such.

I’ve never liked FTP for anything due to its insecurity, its slowness, and its inability to recover from even the simplest of errors.

So I finally got our server provider to build a kernel with FUSE support built in so that I could use s3fs to mount an Amazon S3 bucket as a normal mount.

It was a little annoying to set up the first time but, when I had to do it a second time and had to go all the way back to the beginning, I figured I’d better write it down.

Install Subversion

First step is to get s3fs from its site at: http://code.google.com/p/s3fs/wiki/FuseOverAmazon.

I prefer to check out from Subversion but Subversion was not installed on my server.

A simple:

	# yum install subversion

gave me an error about a missing dependency:

	Error: Missing Dependency: perl(URI) >= 1.17 is needed by package subversion-1.4.2-4.el5_3.1.x86_64 (base)

To make a long story short, I ended up downloading and installing the RPM directly with:

	# wget http://mirror.centos.org/centos/5/os/i386/CentOS/perl-URI-1.35-3.noarch.rpm
	# rpm -ivh perl-URI*

Download and Install s3fs

Once I had Subversion installed, I checked out and built s3fs:

	# svn checkout http://s3fs.googlecode.com/svn/trunk/ s3fs-read-only
	# cd s3fs-read-only/s3fs
	# make install

There are a handful of warnings from the compiler, but I ignored them since I wasn’t particularly interested in working on the code.

Setting up Keys

You can invoke s3fs with your Amazon credentials on the command line, in the environment, or in a configuration file. Since command lines and environments are too easy for bad guys to find, I opted for the configuration file approach.

Create a file /etc/passwd-s3fs with a line containing an accessKeyId:secretAccessKey pair.

You can have more than one set of credentials (i.e., credentials for more than one Amazon S3 account) in /etc/passwd-s3fs, in which case you’ll have to specify -o accessKeyId=aaa on the command line.
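For what it’s worth, here’s a minimal Python sketch of that file format: one accessKeyId:secretAccessKey pair per line, looked up by accessKeyId. The function names and the owner-only 0600 permissions are my own assumptions for illustration, not something I know s3fs to require, and it should be pointed at a scratch path rather than the real /etc/passwd-s3fs.

```python
import os


def write_credentials(path, pairs):
    """Write accessKeyId:secretAccessKey lines, readable by the owner only."""
    # O_EXCL refuses to overwrite an existing file; 0o600 keeps other users out.
    # (The permission choice is an assumption, not a documented s3fs rule.)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        for access_key_id, secret_key in pairs:
            handle.write(f"{access_key_id}:{secret_key}\n")


def read_credentials(path):
    """Return {accessKeyId: secretAccessKey} from a file of key:secret lines."""
    credentials = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            access_key_id, _, secret_key = line.partition(":")
            credentials[access_key_id] = secret_key
    return credentials
```

With two accounts in the file, read_credentials(path)["aaa"] picks out the secret for the account you’d name with -o accessKeyId=aaa.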

Once that’s all set up, you can mount the S3 bucket mybucket at the mountpoint /mnt/mybucket with:

	# /usr/bin/s3fs mybucket /mnt/mybucket

Now, you can treat /mnt/mybucket as a regular copy destination including using it for rsync!

If you ever want to get rid of the mount, the normal unix umount command does the trick:

	# umount /mnt/mybucket

Enjoy!

Setting up and Testing The s3fs Hosted at Fedora

Since there’s no real way to differentiate the two Amazon S3 via FUSE Python implementations, I’ve been referring to them as “the one at Fedora” and “the one at Google Code.” Until I’ve got better names, that’s what they are.

Setting up “The One At Fedora”

As I said, I’m starting with the one hosted at Fedora since it is much better organized. It’s a Git repository and can be gotten and built with:

    # git clone git://git.fedorahosted.org/s3fs s3fs-fedora
    # cd s3fs-fedora
    # make
    # sudo install -m644 -p doc/s3fs.1 /usr/share/man/man1/s3fs.1
    # man s3fs

Far as I can tell, the make command is not really necessary since the executable (Python source) is already in the src/ subdirectory. The sudo install... command I added installs the man page so you can type man s3fs afterwards, which is handy since that’s the only format the documentation is supplied in. I’ve made an HTML version just ’cause it’s easier to deal with. I used the first thing Google found, this ancient Perl script, to do it. Don’t laugh, I was in a hurry.

Next, we’ll actually mount an S3 FUSE volume and run some tests. Let me just say up front that the command line parameters for the s3fs program are abominable. The program is “mixed mode,” meaning if you pass “-C” as the first parameter it operates in “command mode” and if you don’t, it’s in “mount mode.”
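To make the “mixed mode” idea concrete, here’s a tiny Python sketch of that kind of dispatch. This is not the actual s3fs code; pick_mode is a hypothetical name, and the only rule it encodes is the one described above: -C as the first parameter means command mode, anything else means mount mode.

```python
def pick_mode(argv):
    """Return 'command' when -C is the first parameter, otherwise 'mount'."""
    args = argv[1:]  # drop the program name
    if args and args[0] == "-C":
        return "command"
    return "mount"
```

The annoyance is that the same executable means two different things depending on one flag’s position, which is why the order of the parameters matters so much.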

I’ll spare you the annoyance of messing around with it for about 20 minutes trying various permutations to say that this worked (I had a bucket named s3fs-python created on S3 via Transmit and was in the s3fs-fedora directory):

    # export AWS_ACCESS_KEY_ID="my AWS key"
    # export AWS_SECRET_ACCESS_KEY="my AWS secret key"
    # mkdir /Volumes/s3fs
    # src/s3fs -C -f s3fs-python
    # src/s3fs /Volumes/s3fs/ -o bucket=s3fs-python
    # ls -la /Volumes/s3fs
drwxr-xr-x   1 ssteiner  staff    0 Dec 31  1969 .
drwxrwxrwt@ 11 root      admin  374 Jan  2 13:38 ..

So far, so good. Let’s put the source to this utility up there for fun:

    # cp -R . /Volumes/s3fs
cp: ./.DS_Store: could not copy extended attributes to /Volumes/s3fs/./.DS_Store: Invalid argument
cp: /Volumes/s3fs/./.git/description: Socket is not connected
    ...and a whole bunch more.
    # ls /Volumes/s3fs
ls: /Volumes/s3fs: No such file or directory

So, basically, the .DS_Store file threw it for a loop, it got disconnected, unmounted and bye bye. That’s about as far as I needed to go, but, just for fun, I just reinitialized the bucket and tried a simple git init. Sat around for about 20 seconds, then gave a “Bus error.”
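Since the mount can vanish out from under you like that, a guard along these lines might be worth having before any scripted copy. It’s a rough Python sketch; copy_if_mounted is a name I made up, and it only checks that the mountpoint is live before the copy starts, so it won’t save you from a disconnect halfway through.

```python
import os
import shutil


def copy_if_mounted(source_file, mount_point):
    """Copy source_file into mount_point, but only if it is a live mount."""
    # os.path.ismount() is False for a plain directory and for a missing path,
    # which covers the "No such file or directory" case after a disconnect.
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not mounted; refusing to copy")
    return shutil.copy(source_file, mount_point)
```

Something like copy_if_mounted("README", "/Volumes/s3fs") would then fail loudly instead of quietly writing into an unmounted directory.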

UPDATE (2009-01-03): I went hunting around for “Bus Error” and found this thread.
I added:

threading.stack_size(1<<19)

That went in just after all the imports in src/s3fs, and I re-ran. The copy worked fine, all looks normal, and diff reports no differences. Git is still giving a Bus error on initialization but we're further along at least.
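Here’s a small stand-alone sketch of the same call, just to show where it has to sit. The worker function is a made-up stand-in, not anything from src/s3fs; the only real point is that threading.stack_size() affects threads created after the call, so it goes before any threads are started.

```python
import threading

# Must run before any worker threads are created; it only affects threads
# started afterwards. 1 << 19 is 524288 bytes (512 KiB).
threading.stack_size(1 << 19)


def worker(results):
    # Hypothetical stand-in for whatever the filesystem does in its threads.
    results.append(sum(range(1000)))


results = []
thread = threading.Thread(target=worker, args=(results,))
thread.start()
thread.join()
```

Calling threading.stack_size() with no argument afterwards reports the size that new threads will get, which is a cheap way to confirm the setting took.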

UPDATE (2009-01-04): After mucking around with this for a little while longer, I went back to Jungledisk. It works almost flawlessly (symlinks are a problem in 'compatibility mode' at least, more later), and costs $20/S3 account for Windows, Linux, and Mac OS X versions on any number of computers. More in a later post.

Amazon S3 With FUSE on OS X 10.5 (part 3)

As per parts one and two of this series, I’ve eliminated one of the competitors in the race, mostly due to a bug leading to 100% CPU usage while idle. The code is C++, written in a style I’m not even vaguely interested in working on, so I’ve moved on to the two Python and boto based implementations.

I’ve decided to start with the s3fs at Fedora implementation for all the reasons outlined in part two.

The first thing to do was to make sure to have the dependencies set up.

The first is the fuse-python library, which took me a good half-hour to find and longer to install.

I’ve broken that out into its own post, Installing fuse-python on OS X 10.5.

The next dependency was the boto library. Since I thought the developers seemed pretty good about not checking in broken code, I just:

	# svn checkout http://boto.googlecode.com/svn/trunk/ boto-read-only
	# cd boto-read-only
	# python setup.py install
running install
running bdist_egg
running egg_info
writing boto.egg-info/PKG-INFO
writing top-level names to boto.egg-info/top_level.txt
writing dependency_links to boto.egg-info/dependency_links.txt
Traceback (most recent call last):
  File "setup.py", line 48, in <module>
    'Topic :: Internet',
	...and many more errors...

So much for that theory.

	# wget http://boto.googlecode.com/files/boto-1.6b.tar.gz
	# tar zxvf boto-1.6b.tar.gz
	# cd boto-1.6b
	# sudo python setup.py install

And that was that.

Next is Setting up and Testing The s3fs Hosted at Fedora

Installing fuse-python on OS X 10.5

In the course of my exploration of Amazon S3 FUSE based filesystems, I needed to install the Python FUSE library fuse-python. It was a pain in the ass and hence this post.

Using Google, I found a version at the Debian site with a link to the code, but no project page. Further digging led to the main FUSE page, which shows links to various things Python, but they all just assume you’re using Debian and apt-get which, on OS X, I’m obviously not.

A little more digging led me to the actual package download page, where you can download fuse-python version 0.2 from June, 2007.

I downloaded fuse-python from the page above, read the INSTALL file and, per its recommendation, attempted to build the module, then run one of the test programs to make sure it could find FUSE on my system.

	# cd fuse-python-0.2
	# python setup.py build
	...
    from fuseparts/_fusemodule.c:35:
/System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h:562:23: error: osreldate.h: No such file or directory
	...

Net result is that osreldate.h is missing from the build path so the first trick was to figure out from whence it came and handle that.

Found an article about building GmailFS for Mac OS X that, in the “Phase 2: Install” section gives a blow-by-blow for making the FUSE Python bindings build. I’m not going to repeat that all here, just report the things I had to do differently.

  • For me, the PKG_CONFIG_PATH stuff was unnecessary. I already had a fuse.pc in /usr/local/lib/pkgconfig/. If you don’t have pkg-config set up on your system, see how I did it in a previous post.
  • The author of that post suggests just commenting out the osreldate.h include in /System/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5/pyport.h. I’m a little hesitant to do things like that but I decided to try it anyway.
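If you’re not sure whether you already have a fuse.pc lying around (the first point in the list above), a quick check like this Python sketch will tell you. find_pc_file is a hypothetical helper of mine; it only looks in PKG_CONFIG_PATH plus the /usr/local/lib/pkgconfig directory where mine lived, which is far less than what the real pkg-config searches.

```python
import os


def find_pc_file(name, extra_dirs=("/usr/local/lib/pkgconfig",)):
    """Return the first path to <name>.pc in the search dirs, or None."""
    # Directories from PKG_CONFIG_PATH are checked first, then extra_dirs.
    env_value = os.environ.get("PKG_CONFIG_PATH", "")
    env_dirs = [d for d in env_value.split(os.pathsep) if d]
    for directory in env_dirs + list(extra_dirs):
        candidate = os.path.join(directory, name + ".pc")
        if os.path.isfile(candidate):
            return candidate
    return None
```

If find_pc_file("fuse") comes back with a path, the PKG_CONFIG_PATH fiddling is probably unnecessary for you too.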

After commenting out the include of osreldate.h, fuse-python did build with some warnings I wasn’t thrilled about but that’s for another day.

Next, I decided to follow the advice in the INSTALL file and try something from the example directory to make sure I’d built something that worked. As per the instructions, I tried:

	# python example/xmp.py -h

And got a help screen, i.e. all the fuse-python imports worked properly.

Next, to install!

	# sudo python setup.py install

One dependency down!

Amazon S3 With FUSE on OS X 10.5 (part 2)

As per part one of this article, I’m looking for a convenient way to mount Amazon S3 storage as a regular volume on my OS X 10.5 machines.

I tried the s3fs hosted at Google Code with not-very-satisfactory results due to some sort of bug in the code that leads to 100% CPU consumption. This makes it, for now, completely unusable.

There are two other options that have seen fairly recent (within about a year) development. Both are written in Python and both use the boto library for the back-end.

The first, s3fs at Fedora, is interesting for a couple of reasons. First, there was a release as recently as May, 2008. Second, it was under consideration for inclusion in Fedora (see this thread). That matters because the developer had to meet Fedora’s stringent package quality standards, and the package is really well put together because of it.

We’ll see if the code measures up.

The second, s3fs-fuse at Google Code, is less polished, package-wise, and is missing some pretty basic stuff: README, INSTALL, setup.py, etc. It’s pretty much just the code and nothing else, with some documentation provided in the comments in the code itself.

Again, no testing done yet on the actual code.

I’m going to do this in several parts. Both Python implementations rely on some non-standard Python libraries and neither gives instructions for installing those dependencies, so I’ll start there, then take things as they come.

Amazon S3 With FUSE on OS X 10.5 (part 1)

I’ve been using Amazon’s S3 service for quite a while for backups and such but, other than Transmit, haven’t been able to find a nice way to mount an S3 bucket as a filesystem.

I use ExpanDrive for my ssh filesystem access but they’ve been slow to add S3 support though it’s been mentioned as ‘on their agenda’ more than once.

I know that ExpanDrive uses MacFuse as its back-end for file operations so I decided to go poke around for an S3 filesystem implementation.

The obvious name for such a project would be some variation on s3fs and I found several with similar names:

So, I decided to try two of them…

I started with s3fs at Google Code since it had been around for a while and had also been updated at the end of November, so I’m not likely looking at a project gone completely stale like some of the others.

The other I decided to try is s3fs at Fedora since it uses the boto Python library that I’ve used to do some of my own S3 fiddling.

This s3fs hasn’t seen a new release since May, 2008, and no development since about the same time. The interesting thing is that it is a single Python module working against the very active boto library, so it should be pretty straightforward to fix up and/or enhance if need be.

s3fs at Google Code

I checked out the source to my OS X 10.5 laptop with Subversion, using the link on the project’s Source Page.

A quick:

	# make

gave a bunch of errors I didn’t bother to record. One instructed me to modify the command line parameters to include:

-D_FILE_OFFSET_BITS=64

But I figured I’d resolve the other problems before making any changes to the makefile.

The other things that went wrong were:

  • No libcurl which resulted in zillions of unresolved symbols
  • Missing pkg-config utility which didn’t do whatever it does.

In any case, I downloaded the curl source tarball from the main cURL site, copied it to /usr/local/src where I keep all my self-installed stuff, did the sudo ./configure;make;make install dance, and that was that. It installed into the /usr/local/ tree as it should.

pkg-config can be found at their ‘releases’ download site. I used release 0.23.

Again, copy to /usr/local/src, sudo ./configure;make;make install, and all was well.

Back to s3fs directory and make made with only one warning about an unused function.

sudo make install copied the binary to /usr/bin, which I didn’t like, so I nuked the binary there, modified the makefile to copy it to /usr/local/bin and re-ran.

So far, so good.

Now, to actually mount an S3 bucket as a volume.

I’m going to move along pretty quickly here as it’s getting late and I want to get this down before I crash.

I created a test bucket named s3fs-test-bucket on my S3 account using Transmit.

Then, I mounted it, copied its own source to it, then inited a Git repository there. Thing about Git is it’s real persnickety about the sha1 matching and such so it’ll detect any type of corruption where other types of tools might miss it.

Then, from the checked out s3fs source directory:

	# mkdir /Volumes/s3
	# /usr/local/bin/s3fs s3fs-test-bucket -o accessKeyId=aaa -o secretAccessKey=bbb /Volumes/s3
	# cp -r . /Volumes/s3
	# diff -r . /Volumes/s3
	# cd /Volumes/s3
	# git init
	# git add .
	# git commit

Now, first time I did this (had an overheat crash while writing) I had captured all of the Git output and such but suffice it to say that it was all as expected and I now have the source I used to create the volume in a Git repository on that volume. How recursive of me.
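If you’d rather not lean on Git for the integrity check, the same idea can be sketched in a few lines of Python: hash every file on both sides and compare. This uses a plain SHA-1 of each file’s contents, which is not the same as Git’s own object hashing, and the function names are mine.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path relative to root to the SHA-1 of its contents."""
    digests = {}
    for directory, _subdirs, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(directory, filename)
            with open(full_path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            digests[os.path.relpath(full_path, root)] = digest
    return digests


def differing_files(source_root, copy_root):
    """Return sorted relative paths that are missing or differ between trees."""
    source = tree_digests(source_root)
    copy = tree_digests(copy_root)
    return sorted(
        path
        for path in source.keys() | copy.keys()
        if source.get(path) != copy.get(path)
    )
```

Run from the source directory, differing_files(".", "/Volumes/s3") returning an empty list would mean the copy on the volume matches byte for byte, much like a silent diff -r.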

I haven’t looked at the code at all but I can say that it is pretty noticeably slow — like 20 seconds to copy a few small files slow. I’ll have a look later and see if there’s anything to be done about that but in the meantime, I’m going to see how the other one stacks up.

UPDATE (2009-01-01): There is something in the release notes for s3fs about ‘fixed 100% CPU problem’. It’s not fixed: Finder consumes 100% CPU and lets up immediately when the s3fs drive is unmounted.

Verdict: Unusable, at least on OS X 10.5, in its current form. It’s not just too slow; it pegs the CPU at 100% while idle.

Next step will be to try s3fs at Fedora to see how a Python implementation, based on the boto library, stacks up.