Unicode NFC normalisation for Rclone on macOS

TODO

TL;DR: Apple devices create all filenames in Unicode Decomposed Normalisation Form (NFD), while every other major OS uses Composed Normalisation Form (NFC). This makes you, as a Mac user, the bad guy, because it is you who is incompatible with the rest of the world.

In a nutshell, the problem is this: whenever you create files with diacritics, they will be copied to other devices with filenames stored as decomposed strings. This is nonstandard for these OS'es, and you never know what problems it will cause.

This article presents my way of solving the problem by configuring Rclone to create all files in NFC (composed form) instead of NFD (decomposed form), which is not as straightforward as it would seem.
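To make the difference between the two forms concrete, here is a minimal Python sketch using the standard unicodedata module. The sample filename is made up for illustration; the point is only that the two forms render identically on screen but are different strings.

```python
import unicodedata

# Hypothetical filename containing a letter with a diacritic.
name = "s\u00fabor.txt"  # "súbor.txt" with a single precomposed "ú"

nfc = unicodedata.normalize("NFC", name)  # composed: U+00FA
nfd = unicodedata.normalize("NFD", name)  # decomposed: U+0075 + U+0301

# Both strings look the same on screen, yet they are not equal.
print(nfc == nfd)
print(len(nfc), len(nfd))
print(nfc.encode("utf-8"))
print(nfd.encode("utf-8"))

# Normalising both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

Any tool that compares filenames byte by byte will treat the two variants as two different files, which is where the trouble starts.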

Problem

  1. files1)
  2. diacritics2)

This article presents my own solution to the problem, based on a detailed walkthrough.

When you copy such files through Rclone, they are copied to your clouds with filenames stored as decomposed strings. This creates three different problems:

  1. First, if other users on different OS'es rename the files you created (or renamed), they need to press Backspace twice when they want to remove a letter with diacritics (e.g. á or ü). Renaming a filename with a lot of diacritics, like XXX, can become a pretty lengthy process. And note that this also applies to you when you access these clouds from a web client (i.e. your browser).
  2. Second,
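
The double Backspace from the first point follows directly from how the decomposed form is stored. The Python sketch below models one Backspace press as "delete the last code point", which is an assumption about how many editors and web clients behave, not a description of any particular one.

```python
import unicodedata

def backspace(text: str) -> str:
    """Model one Backspace press as removing the last code point (assumed behaviour)."""
    return text[:-1]

composed = unicodedata.normalize("NFC", "\u00e1")    # "á" as one code point
decomposed = unicodedata.normalize("NFD", "\u00e1")  # "a" + combining acute accent

# One press removes the whole composed letter ...
print(repr(backspace(composed)))
# ... but only the accent of the decomposed one, leaving a bare "a" behind.
print(repr(backspace(decomposed)))
# A second press is needed to remove the letter itself.
print(repr(backspace(backspace(decomposed))))
```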

https://www.unicode.org/reports/tr15/#Norm_Forms

Technical background

Due to some technical under-the-hood changes that Apple has made when it switched from HFS+ to APFS file system in its devices back in 2017,

First (naïve) attempt to solve the problem: Rclone with iconv module

Rclone has a special -o switch which forwards its parameters to the underlying macFUSE/FUSE-T system that provides the mounting functionality for the remote system.

This way, it is possible to instruct FUSE to load the iconv module and have it automatically convert all filenames to NFC when they are moved to the remote cloud. Thus, the straightforward way to solve the problem should be to use the following command when mounting the system:

$ rclone mount [] -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC

And this actually works, but with some problems.
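
For scripting the mount, the command above can also be assembled programmatically. The following Python sketch only builds the argument list and does not run anything; the remote name remote: and the mount point /Volumes/remote are placeholders of mine, not values from this article.

```python
import shlex

def build_mount_command(remote: str, mountpoint: str) -> list[str]:
    """Build an rclone mount argument list that asks FUSE to load the iconv module."""
    fuse_options = {
        "modules": "iconv",
        "from_code": "UTF-8",
        "to_code": "UTF-8-MAC",
    }
    # The -o switch takes a single comma-separated key=value string.
    option_string = ",".join(f"{key}={value}" for key, value in fuse_options.items())
    return ["rclone", "mount", remote, mountpoint, "-o", option_string]

# Placeholder remote and mount point, for illustration only.
command = build_mount_command("remote:", "/Volumes/remote")
print(shlex.join(command))
```

Keeping the options in a dictionary makes it easy to swap from_code and to_code when experimenting with the direction of the conversion.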

Problem: bugged Apple implementation of iconv

The problem is that macOS uses its own “Apple-tweaked” implementation of iconv, which (1) is very old; (2) is nonstandard; and (3) cannot convert significant parts of the Unicode character set, for example emoji. All three of these issues will be crucial in our attempt to deal with the problem. Moreover, iconv itself resides in /usr/bin/iconv, which is protected by SIP, so you cannot normally do anything with it, and the only way to update it is to update the whole macOS.3)

The first problem: the standard library versions Apple provides are almost always very obsolete. In the case of iconv, versions supplied with different macOS'es are these:

macOS version           macOS release date   iconv version (Apple)   iconv version (library)     library release date
macOS Sequoia 15        2024-09-16           libiconv-107            FreeBSD libiconv 1.11[?]    2009-03-03
macOS Sonoma 14         2023-09-26           libiconv-102            FreeBSD libiconv 1.11[?]    2009-03-03
macOS Ventura 13        2022-10-24           libiconv-64             GNU libiconv 1.11           2006-07-19
macOS Monterey 12       2021-10-25           libiconv-61             GNU libiconv 1.11           2006-07-19
macOS Big Sur 11        2020-11-17           libiconv-59             GNU libiconv 1.11           2006-07-19
macOS Catalina 10.15    2019-10-07           libiconv-59             GNU libiconv 1.11           2006-07-19
macOS Mojave 10.14      2018-09-24           libiconv-51.200.6       GNU libiconv 1.11           2006-07-19

In a nutshell, despite what the internal Apple versioning says, all macOS'es still use libiconv 1.11 released back in 2006.

⚠️‑TODO‑⚠️

$ nm -gU /usr/lib/libiconv.2.dylib
00000000000f2700 D __libiconv_version
0000000000002360 T _iconv
000000000000267a T _iconv_canonicalize
0000000000002382 T _iconv_close
0000000000001049 T _iconv_open
000000000000238f T _iconvctl
0000000000002488 T _iconvlist
0000000000013ff8 T _libiconv_set_relocation_prefix
$ nm -gU /usr/local/lib/libiconv.2.dylib
00000000000e3290 D __libiconv_version
0000000000003430 T _iconv_canonicalize
0000000000002ce0 T _libiconv
0000000000002d10 T _libiconv_close
00000000000016b0 T _libiconv_open
0000000000002d20 T _libiconv_open_into
0000000000015eb0 T _libiconv_set_relocation_prefix
0000000000003160 T _libiconvctl
0000000000003270 T _libiconvlist
0000000000015dd0 T _locale_charset
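
The __libiconv_version symbol in both listings corresponds to the C variable _libiconv_version (the extra leading underscore comes from the Mach-O symbol naming). The GNU libiconv header describes it as an integer of the form (major << 8) + minor, so it can be decoded as sketched below in Python. The read_libiconv_version helper is an untested assumption on my part: it presumes the library at the given path can be loaded with ctypes and really exports that variable.

```python
import ctypes

def decode_libiconv_version(raw: int) -> tuple[int, int]:
    """Split GNU libiconv's packed version, (major << 8) + minor, into its parts."""
    return raw >> 8, raw & 0xFF

def read_libiconv_version(path: str) -> tuple[int, int]:
    """Read _libiconv_version from a shared library (assumes the symbol is exported)."""
    library = ctypes.CDLL(path)
    raw = ctypes.c_int.in_dll(library, "_libiconv_version").value
    return decode_libiconv_version(raw)

# 0x010B is what a libiconv 1.11 build would report under this scheme.
print(decode_libiconv_version(0x010B))

# On a Mac, the two libraries listed above could be compared like this:
# print(read_libiconv_version("/usr/lib/libiconv.2.dylib"))
# print(read_libiconv_version("/usr/local/lib/libiconv.2.dylib"))
```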

Solution: libiconv with UTF-8-MAC support

There is a patched libiconv library on GitHub which adds support for the UTF-8-MAC encoding. Installing it allows you not only to convert between real UTF-8 and UTF-8-MAC encodings

Testing whether the patched iconv works correctly

$ echo "test📖" | /usr/bin/iconv -f utf-8 -t utf-8-mac
test�
 
$ echo "test📖" | /usr/local/bin/iconv -f utf-8 -t utf-8-mac
test📖
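
What a correct conversion should produce for this test string can be cross-checked without iconv at all. The Python sketch below uses canonical decomposition (NFD) as a stand-in for UTF-8-MAC; the two are close but not identical, so treat it as an approximation. The sample with "á" is my own addition.

```python
import unicodedata

def to_decomposed(text: str) -> str:
    """Approximate a UTF-8 to UTF-8-MAC conversion with plain NFD (an assumption)."""
    return unicodedata.normalize("NFD", text)

emoji_sample = "test\U0001F4D6"  # "test📖", the string used in the test above
accent_sample = "t\u00e1st"      # hypothetical sample with a precomposed "á"

# The emoji has no canonical decomposition, so it must pass through unchanged.
print(to_decomposed(emoji_sample) == emoji_sample)

# A precomposed letter, by contrast, is split into base letter + combining mark.
print([f"U+{ord(ch):04X}" for ch in to_decomposed(accent_sample)])
```

In other words, a converter that mangles the emoji is failing on a character it should not have to touch in the first place.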

Further reading

Tools for manual conversion of filenames between NFC/NFD
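
As one possible do-it-yourself tool, filenames can be converted between the two forms with a few lines of Python. This is a sketch of mine, not one of the tools this section is meant to list. It defaults to a dry run, skips a rename when the target name already exists, and on a normalisation-insensitive file system such as APFS it may therefore skip everything.

```python
import os
import unicodedata

def plan_renames(names: list[str], form: str = "NFC") -> list[tuple[str, str]]:
    """Return (old, new) pairs for every name that changes under the given form."""
    pairs = []
    for name in names:
        normalised = unicodedata.normalize(form, name)
        if normalised != name:
            pairs.append((name, normalised))
    return pairs

def normalise_tree(root: str, form: str = "NFC", dry_run: bool = True) -> list[tuple[str, str]]:
    """Walk root bottom-up; list (dry run) or perform renames to the given form."""
    results = []
    # topdown=False handles children before their parent directories.
    for directory, subdirs, files in os.walk(root, topdown=False):
        for old, new in plan_renames(subdirs + files, form):
            source = os.path.join(directory, old)
            target = os.path.join(directory, new)
            if dry_run:
                results.append((source, target))
            elif not os.path.exists(target):
                os.rename(source, target)
                results.append((source, target))
    return results

# Dry run over a hypothetical folder: prints what would be renamed.
# for source, target in normalise_tree("/path/to/folder"):
#     print(source, "->", target)
```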


1)
That is, files or folders – I will use the term file to mean any inode, whether it is a file or a directory.
2)
E.g. á or ü
3)
The Apple Open Source site provides listings of all of the open source software included in each release, together with their versions (these sometimes have some weird Apple-specific versioning, but when you go to the respective GitHub page, you can usually dig out the actual software version there).
blog/odborny/2024-09-22-unicode_nfc_normalisation_for_rclone_on_macos.txt · Last modified: 2024/11/12 15:19 by Róbert Toth