The documents in the attachment cover UBI and UBIFS in full. This article, however, focuses on the parts of those documents that deal with data consistency of UBI and UBIFS under unclean reboots and power cuts.
Both UBI and UBIFS are designed with tolerance to power-cuts in mind.
UBI has an internal debugging infrastructure that can emulate power failures for testing. The advantage of the emulation is that it emulates power failures at the critical points where control data structures are written to the device, whereas the probability of interrupting the system at those precise moments with physical power-cut testing is rather low.
UBI supports power-cut emulation for testing, which emulates a power cut after a random number of writes. When a power cut is emulated, UBI switches to read-only mode and disallows any further writes to the UBI volume, thus emulating a power cut. The main idea of this mode is to emulate power cuts at interesting places, e.g. while writing the VID header.
|Emulation type|Flag value|
|---|---|
|Allow power-cut to be emulated during EC header write|1|
|Allow power-cut to be emulated during VID header write|2|
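If the kernel's UBI debugging support is built in, this emulation is typically controlled through debugfs. The following is a sketch only; the exact file names and paths are assumptions that depend on the kernel version (they come from `drivers/mtd/ubi/debug.c` in recent kernels), so check your kernel tree before relying on them.

```shell
# Sketch: enable UBI power-cut emulation on device ubi0 via debugfs.
# Paths and attribute names may differ between kernel versions.
mount -t debugfs none /sys/kernel/debug

# Bit 1 = emulate power-cuts on EC header writes,
# bit 2 = on VID header writes; 3 enables both.
echo 3 > /sys/kernel/debug/ubi/ubi0/tst_emulate_power_cut

# Bounds for the random number of writes before the emulated cut.
echo 1  > /sys/kernel/debug/ubi/ubi0/tst_emulate_power_cut_min
echo 10 > /sys/kernel/debug/ubi/ubi0/tst_emulate_power_cut_max
```

After the emulated cut triggers, UBI goes read-only; detaching and re-attaching the device exercises the recovery path.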
Tolerance to unclean reboots: UBIFS is a journaling file system and it tolerates sudden crashes and unclean reboots. UBIFS simply replays the journal and recovers. Mount time is a little slower in this case because of the need to replay the journal, but UBIFS does not need to scan the whole media, so mounting still takes only a fraction of a second. Note that the authors paid special attention to this aspect of UBIFS.
UBIFS has an internal debugging infrastructure to emulate power failures, and the authors used it for extensive, long-running power-fail testing. The advantage of the emulation is that it triggers power failures even in situations which occur only rarely, for example while the master node is updated or the log is changed. The probability of interrupting the system at exactly those moments is very low in real life.
There is also a powerful user-space test program called integck, which performs a lot of random I/O operations and checks the integrity of the file system after remount. This test can also handle emulated power-cuts and check the FS integrity.
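A minimal invocation might look like the following; the mount point is illustrative, and integck ships with the mtd-utils test suite rather than as a standalone package.

```shell
# Run random I/O plus integrity checking against a mounted UBIFS volume
# (mount point is an example; build integck from mtd-utils first)
integck /mnt/ubifs
```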
UBIFS supports write-back, which means that file changes do not go to the flash media straight away, but are cached and go to the flash later, when it is absolutely necessary. This greatly reduces the amount of I/O, which results in better performance. Write-back caching is a standard technique used by most modern file systems.
Write-buffer is an additional UBIFS buffer, which is implemented inside UBIFS, and it sits between the page cache and the flash. This means that write-back actually writes to the write-buffer, not directly to the flash.
The write-buffer implementation is a little more complex, and we actually have several of them - one for each journal head. But this does not change the basic idea behind the write-buffer.
A few notes with regard to synchronization:

- "sync()" also synchronizes all write-buffers;
- "fsync(fd)" also synchronizes all write-buffers which contain pieces of the "fd" file;
- synchronous files, as well as files opened with "O_SYNC", bypass write-buffers, so the I/O is indeed synchronous for these files;
- write-buffers are also bypassed if the file-system is mounted with the "-o sync" mount option.

Take into account that write-buffers delay the data synchronization timeout defined by "dirty_expire_centisecs" by 3-5 seconds. However, since write-buffers are small, only a small amount of data is delayed.
JFFS2 stores meta-data in the data node headers, so once JFFS2 has scanned to the latest node it knows the meta-data. If power is cut during a sequential write, only some data at the end of the file is lost.
In JFFS2 all the meta-data (like inode ctime, inode size, UID/GID, etc.) are stored in the data node headers. Data nodes carry 4KiB of (compressed) data. This means that the meta-data information is duplicated in many places, but it also means that every time JFFS2 writes a data node to the flash media, it updates the inode size as well. So when JFFS2 mounts, it scans the flash media, finds the latest data node, and fetches the inode size from there.
In practice this means that JFFS2 will write these 10MiB of data sequentially, from the beginning to the end. And if you have a power cut, you will just lose some amount of data at the end of the inode. For example, if JFFS2 starts writing those 10MiB of data, writes 5MiB, and a power cut happens, you will end up with a 5MiB f.dat file. You lose only the last 5MiB.
Every piece of information UBIFS writes to the media has a CRC-32 checksum. UBIFS protects both data and meta-data with CRC. Every time the meta-data is read, the CRC checksum is verified.
The data CRC is not verified by default. We do this to improve the default file-system read speed.
But UBIFS allows switching data verification on using the chk_data_crc mount option.
Note, currently UBIFS cannot disable CRC-32 calculation on write, because the UBIFS recovery process depends on it. When recovering from an unclean reboot and replaying the journal, UBIFS has to be able to detect broken and half-written nodes and drop them, and it relies on the CRC-32 checksum for this.
In other words, if you use UBIFS with data CRC-32 checking disabled, you still have a CRC-32 checksum attached to each piece of data, and you may mount UBIFS with the chk_data_crc option to enable CRC-32 checking at any time.
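For example, data CRC-32 checking can be enabled at mount time like this; the volume and mount-point names are illustrative.

```shell
# Mount with data CRC-32 verification enabled
# (volume/mount-point names are examples)
mount -t ubifs -o chk_data_crc ubi0:rootfs /mnt/ubifs
```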
Both meta-data and data nodes are always written with a CRC, but only the meta-data CRC is verified. Data node CRCs are not checked by default, though checking can be enabled with the chk_data_crc mount option.
Changing a file atomically means changing its contents in such a way that an unclean reboot cannot lead to any corruption or inconsistency in the file.
The only reliable way to do this in UBIFS (and in most of other file-systems, e.g. JFFS2 or ext3) is the following:
- make a copy of the file;
- change the copy;
- synchronize the copy (see the synchronization notes above);
- re-name the copy to the file (using the rename() libc function).
Note, if a power-cut happens during the re-naming, the original file stays intact because the rename operation is atomic. This is a POSIX requirement and UBIFS satisfies it.
OpenWrt's UCI uses this method: on uci commit it first writes the configuration to a temporary file and finally renames it over the original.
Zero-length files are a special case of corruption which happens when an application first truncates a file, then updates it. The truncation is synchronous in UBIFS, so it is written to the media straight away. But when the data are written, they go to the page cache, not to the flash media. So when an unclean reboot happens, the file becomes empty (truncated) because the data are lost.
Zero-length files also appear when an application creates a new file, then writes to the file, and a power cut happens. The reason is similar - file creation is a synchronous operation, data writing is not.
Well, the description is a bit simplified. Actually, when a file is created or truncated, the creation/truncation UBIFS information is written to the write-buffer, not straight to the media. So if a power cut happens before the write-buffer is synchronized, the file will disappear (creation case) or stay intact (truncation case). But since the write-buffer is small and all UBIFS writes go there, it is usually synchronized very soon. After this point the file is created/truncated for real.
【2】Thomas Gleixner, Frank Haverkamp, Artem Bityutskiy. UBI - Unsorted Block Images.
【4】Adrian Hunter, Artem Bityutskiy. UBIFS file system, NOKIA.
【5】Adrian Hunter. A Brief Introduction to the Design of UBIFS. 2008.
【6】UBI FAQ and HOWTO
【7】UBIFS FAQ and HOWTO
【9】Katsuki. Evaluation of UBI and UBIFS. TOSHIBA. 2009
Theodore Ts'o. Delayed allocation and the zero-length file problem. 2009
Theodore Ts'o. Don’t fear the fsync! 2009