03 October 2012

Dynamic size filesystem

I was looking for a way to create a dynamic-size file for Slax changes that could be loop-mounted. I didn't find anything useful, so I had to write my own filesystem driver for it using FUSE. And that's how dynsizeFS was born. I plan to include it in Slax as the default way to store changes.

How does it work? Once mounted, it provides a virtual 2GB file, which is in fact stored on disk in two other files: an index file (which remembers block locations) and a data file (which holds the data blocks). The data file grows on the fly as needed. It never shrinks, but for our purposes that is perfectly fine. After an EXT2 filesystem is created on it, it consumes only 4MB of disk space. I couldn't put XFS there, though, because busybox has no mkfs.xfs support.
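The index/data split described above can be sketched roughly as follows. This is only an illustration, not the actual dynsizeFS code: the names `store_init`, `store_write_block` and `store_read_block` are made up, the index lives in memory instead of in a file, and a temporary file stands in for the data file.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define VIRTUAL_BLOCKS 16      /* tiny virtual file, just for the demo */
#define UNALLOCATED UINT32_MAX /* index entry meaning "no data block yet" */

static uint32_t block_index[VIRTUAL_BLOCKS]; /* stands in for the index file */
static uint32_t blocks_used;                 /* data blocks allocated so far */
static FILE *data;                           /* stands in for the data file */

static void store_init(void)
{
    for (int i = 0; i < VIRTUAL_BLOCKS; i++)
        block_index[i] = UNALLOCATED;
    blocks_used = 0;
    data = tmpfile();
    assert(data != NULL);
}

/* Write one whole block; the data file grows only on the first touch. */
static int store_write_block(uint32_t vblock, const unsigned char *buf)
{
    if (vblock >= VIRTUAL_BLOCKS)
        return -1;
    if (block_index[vblock] == UNALLOCATED)
        block_index[vblock] = blocks_used++; /* append at the end */
    if (fseek(data, (long)block_index[vblock] * BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, BLOCK_SIZE, data) == BLOCK_SIZE ? 0 : -1;
}

/* Read one whole block; blocks never written read back as zeros. */
static int store_read_block(uint32_t vblock, unsigned char *buf)
{
    if (vblock >= VIRTUAL_BLOCKS)
        return -1;
    if (block_index[vblock] == UNALLOCATED) {
        memset(buf, 0, BLOCK_SIZE);
        return 0;
    }
    if (fseek(data, (long)block_index[vblock] * BLOCK_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, BLOCK_SIZE, data) == BLOCK_SIZE ? 0 : -1;
}
```

Writing virtual block 9 first puts it at position 0 of the data file, so the space used on disk follows the amount of data written rather than the virtual size.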

If you are interested, grab the dynsizeFS sources from here. It is a FUSE-based filesystem driver. For now, the 2GB maximum file size seems OK, but I would like to allow any size in general. Unfortunately, I am unable to. Any number bigger than 2GB causes integer overflows when compiling the filesystem. I am really confused by this, since -D_FILE_OFFSET_BITS=64 is defined properly. FUSE simply refuses to create a bigger file.

Am I missing anything? Is there any programmer who could show me a better way to support even bigger files?

User comments
Manfred 2012-10-03 04:42

Hi Tomas,

That's really great!

Using unsigned long long in the relevant part, I was able to compile dynsizeFS with 8GiB size.

Afterwards I was able to format the dynsize.fs with ext2
and mount it as a loop device, too. :D

Please find my changes below:

--- dynsizefs/dynsizefs.c 2012-10-03 17:24:08.000000000 +0000
+++ dynsizefs/dynsizefs.c.patched 2012-10-04 20:16:46.000000000 +0000
@@ -30,7 +30,8 @@
static pthread_mutex_t dynsize_mutex_ix;

#define BLOCK_SIZE 4096
-#define VIRTUAL_FILE_SIZE (1024*1024*1024-1+1024*1024*1024);
+#define SIZE (1024*1024*1024ULL)
+#define VIRTUAL_FILE_SIZE ((SIZE-1)*8);

FILE * fp; // data file
FILE * ix; // index files

Tomas M 2012-10-03 04:53

That really works, how is it possible? Thank you! I'm absolutely confused and shocked :) I defined VIRTUAL_FILE_SIZE as 1024*1024*1024*8 and that didn't work at all. What kind of magic is this ULL? :) I do not get it :)

Tomas M 2012-10-03 04:57

I can see with bigger files (like 8GB and more) the index gets filled rapidly. I think that I should consider another index for the first index :)

Manfred 2012-10-03 05:05

ULL = unsigned long long

int = 16bit integer
long int = 32bit integer
long long int = 64 bit integer

hxxp://en.wikipedia.org/wiki/C_data_types

in section 'Basic types' 4th row.

I didn't know it either. I found it while googling for a solution to your problem...

Manfred ;)

Iguleder 2012-10-03 07:34

Wrong. int = the native size of an integer. Under x86 - 32 bit (DWORD), under x86_64 - 64 bit, quadword.

long is a 32-bit integer and long long is a 64-bit one, always. "word" is a 16-bit integer.

In this case, the issue is using a signed number where an unsigned one was expected - read about how signed numbers are represented in binary form. You get overflows because signed numbers have a lower limit (negative numbers have to be represented somehow, don't they?). In the case of 32-bit architectures, the maximum number you can represent in an integer is 2^32-1, but here you put 2^34 in that poor 'lil integer.
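The widths quoted in this thread are not fixed by the C language: the standard only guarantees minimums (int at least 16 bits, long at least 32, long long at least 64), and the actual values depend on the compiler and platform. A small sketch for checking what a given build uses (the helper name `print_widths` is made up):

```c
#include <limits.h>
#include <stdio.h>

/* Width of a type in bits on the platform this is compiled for. */
#define BITS(type) (sizeof(type) * CHAR_BIT)

static void print_widths(void)
{
    printf("int:       %zu bits, INT_MAX   = %d\n", BITS(int), INT_MAX);
    printf("long:      %zu bits, LONG_MAX  = %ld\n", BITS(long), LONG_MAX);
    printf("long long: %zu bits, LLONG_MAX = %lld\n", BITS(long long), LLONG_MAX);
    printf("unsigned:  UINT_MAX = %u\n", UINT_MAX);
}
```

On typical Linux systems int is 32 bits on both x86 and x86_64, so the largest signed int is 2^31-1, while long is 32 bits on x86 and 64 bits on x86_64.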

Tomas M 2012-10-03 08:02

My main problem was that I didn't know that #defines should use a suffix (like ULL, as mentioned by Manfred) to explicitly set their types. I thought that the compiler would automatically understand that if I put 2^33 there (for 8GB), it should use an unsigned 64bit integer to fit the value. Instead, it forced a 32bit int and complained about overflow. Lesson learned :)
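To illustrate the point: a single decimal literal that is too large for int is given a wider type automatically, but an arithmetic expression is typed by its operands, so a product of plain int literals is computed as int. A minimal sketch, with macro names invented for the example:

```c
#include <stdio.h>

/* Every operand is a plain int, so the product is computed as int.
   2^30 still fits in a 32-bit int; multiplying this by 8 would not. */
#define ONE_GIB_INT (1024 * 1024 * 1024)

/* One ULL operand makes the whole product unsigned long long. */
#define ONE_GIB_ULL (1024ULL * 1024 * 1024)
#define EIGHT_GIB   (ONE_GIB_ULL * 8)

static void show_literal_types(void)
{
    printf("size of int product: %zu bytes\n", sizeof(ONE_GIB_INT));
    printf("size of ULL product: %zu bytes\n", sizeof(ONE_GIB_ULL));
    printf("8 GiB = %llu bytes\n", EIGHT_GIB);
}
```

Note that the macros carry no trailing semicolon, so they can be used inside larger expressions.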

lordwolf 2012-10-03 17:27

Just sharing...

Compilers always take the 'easiest' representation... for example,

float a; int b; b = 1; a = b / 2;

a will get the value of zero because both b and 2 are integers. We need to 'tell' the compiler that the result must be a float, which can be done by typecasting one of the operands or by using a = b / 2.0;

Back to the original ULL thingy, it's just a way to tell the compiler that this number needs to be in unsigned long long, else it gets basic int (or long).
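The division example can be tried out with two tiny helpers (the function names are made up for illustration):

```c
#include <stdio.h>

/* int / int is evaluated first (truncating), then converted to float. */
static float half_int(int b)   { return b / 2; }

/* The float operand makes the division itself a floating-point one. */
static float half_float(int b) { return b / 2.0f; }

static void show_division(void)
{
    printf("%.1f %.1f\n", half_int(1), half_float(1));
}
```

With b = 1, the first helper returns 0 and the second returns 0.5.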

valgor 2012-10-03 21:45

Try to define _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE too.

Manfred 2012-10-04 11:24

While fiddling with dynsizeFS, some questions came up.

Would it be possible to change the parameters this way:

dynsizefs [myfs_name] [myfs_size] [myfs_mount]

Then you could use myfs.[idx,data,fs] as vars and it
would be easier to separate the files if one wants
to create more than one dynsizeFS. If no parameters are
given, you could still use your hardcoded dynsizefs.[idx,data,fs].
The default folder for myfs.[idx,data] could be a subdir of /var.

But imho the cleanest solution would be if you would simply create
a /dyn directory in / which holds myfs_name.[idx,dat,fs].

The ability to store/mount more than one dynsizefs as

/dyn/myfs_name1.idx
/dyn/myfs_name1.dat
/dyn/myfs_name1.fs
/dyn/myfs_name2.idx
/dyn/myfs_name2.dat
/dyn/myfs_name2.fs
/dyn/myfs_name3.idx
/dyn/myfs_name3.dat
/dyn/myfs_name3.fs
...

would be ideal, too.

And now the last (and most egoistic) request:

myfs_size could be used for creating a file of variable size.

This way I could implement dynsizeFS into (S)InA in an easy way, too.

Just dreaming,

Manfred
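A rough sketch of how the proposed command line could be parsed. Everything here is hypothetical: `struct dyn_opts` and `parse_opts` do not exist in dynsizeFS, the size is assumed to be given in bytes, and the /dyn directory with the .idx/.dat naming is taken from the suggestion above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dyn_opts {
    char idx_path[256];
    char dat_path[256];
    unsigned long long size;   /* virtual file size in bytes */
    const char *mountpoint;    /* NULL when not given */
};

/* Hypothetical parser for: dynsizefs [myfs_name] [myfs_size] [myfs_mount] */
static int parse_opts(int argc, char **argv, struct dyn_opts *o)
{
    const char *name = argc > 1 ? argv[1] : "dynsizefs";
    char *end = NULL;

    if (strlen(name) > 200)    /* keep the generated paths within the buffers */
        return -1;

    o->size = 2147483647ULL;   /* default: the original hardcoded 2GB limit */
    if (argc > 2) {
        o->size = strtoull(argv[2], &end, 10);
        if (end == argv[2] || *end != '\0' || o->size == 0)
            return -1;         /* not a plain positive number */
    }
    o->mountpoint = argc > 3 ? argv[3] : NULL;

    snprintf(o->idx_path, sizeof o->idx_path, "/dyn/%s.idx", name);
    snprintf(o->dat_path, sizeof o->dat_path, "/dyn/%s.dat", name);
    return 0;
}
```

Keeping the size in an unsigned long long avoids reintroducing the 2GB limit at the option-parsing stage.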

Tomas M 2012-10-05 00:02

There appears to be another problem in the fuse filesystem: strange things happen after 2GB of data is stored. Even if I fix the bug in index file storage (using 8-byte addresses instead of 4), there is still some problem. Fuse's 'read()' function always passes the 'size' parameter as 4 bytes only, while I'm expecting it to use 8-byte addresses. Really strange things happen here... Not sure if the dynsizeFS filesystem can be of any use for files bigger than 2GB.
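For context: in the FUSE 2.x high-level API the read callback receives the request length as a size_t and the file position as a separate off_t. A 4-byte 'size' on a 32-bit build is therefore expected, and only the offset has to be 64 bits wide. The sketch below does not use libfuse; `span_for_read` is a made-up helper that mirrors that parameter shape (with int64_t standing in for off_t) and turns a request into block coordinates.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096

struct block_span {
    uint64_t first_block;  /* virtual block holding the first byte */
    size_t   skip;         /* bytes to skip inside that block */
    uint64_t block_count;  /* number of blocks the request touches */
};

/* size: how many bytes to read (small); offset: where to start (may exceed 2GB). */
static struct block_span span_for_read(size_t size, int64_t offset)
{
    struct block_span s;
    uint64_t start = (uint64_t)offset;
    uint64_t end = start + size;   /* one past the last requested byte */

    s.first_block = start / BLOCK_SIZE;
    s.skip = (size_t)(start % BLOCK_SIZE);
    s.block_count = size ? (end - 1) / BLOCK_SIZE - s.first_block + 1 : 0;
    return s;
}
```

All offset arithmetic stays in 64-bit types, so a request starting past 2GB maps to the right block even when size_t is only 32 bits.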

Tomas M 2012-10-05 00:50

I'm going to rewrite the whole thing to be more efficient and store everything in one file. Let's do it properly this time! :)

Manfred 2012-10-05 01:00

maybe this helps: :pray:

hxxp://code.google.com/p/cryptonite/issues/detail?id=18#c33

They are discussing the 2GiB file limit, too.

Manfred

Manfred 2012-10-05 02:49

Adding the defines below in dynsizefs.c I was able to copy > 5GiB:

...
#ifndef __USE_FILE_OFFSET64
#define __USE_FILE_OFFSET64 1
#endif
#ifndef __USE_LARGEFILE64
#define __USE_LARGEFILE64 1
#endif
#ifdef _FILE_OFFSET_BITS
#undef _FILE_OFFSET_BITS
#endif
#define _FILE_OFFSET_BITS 64
#ifndef _LARGEFILE64_SOURCE
#define _LARGEFILE64_SOURCE 1
#endif
...

Manfred
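A note on these macros: the `__USE_*` names are glibc-internal and are normally derived by `<features.h>` from the public feature-test macros, so the usual approach is to set only `_FILE_OFFSET_BITS`, either with `-D_FILE_OFFSET_BITS=64` on the command line or before the first `#include`. A minimal sketch, assuming a glibc-based Linux system, for checking that the setting took effect (`have_large_offsets` is a made-up helper):

```c
/* Must come before the first #include, otherwise it has no effect. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <stdio.h>
#include <sys/types.h>

/* Returns 1 when off_t can hold offsets beyond 2GB. */
static int have_large_offsets(void)
{
    return sizeof(off_t) >= 8;
}
```

If the driver positions its FILE * streams with fseek(), it is worth keeping in mind that fseek() takes its offset as a long, which is 32 bits on x86; POSIX fseeko() takes an off_t instead.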