After some hemming and hawing, I came to a decision: converting my RAW files to DNG was not The Right Thing to do, for a number of reasons:
- DNG is actually a pain in the arse, compatibility wise. For better or worse - arguable worse - it’s never really had the uptake as a kind of universal RAW TIFF replacement. It’s actually easier to run different tools across RAW + sidecar files than DNG.
- On a related note, going to DNG locks you off from the manufacturer’s native tools. Which often aren’t great, but sometimes do a much better job of producing display and print files.
- While Lightroom is a great jack-of-all trades, the Adobe Camera Raw engine isn’t necessarily the best RAW converter out there. I’d like more freedom to work with (or at least trial) others, which can get better results; the fact the DNG format, while an open spec, is built around storing data in a form which assumes you’ve got the ACR/Lightroom pipeline as your internal representation only makes that worse.
- DNGs really slow everything down. They’re slower to browse in Explorer, they’re slower to import, they’re just slow.
So for those reasons, pulling my library back into RAWs seems like the way to go. Problem is, it ain’t as easy as you’d hope.
(But it is possible.)
Hello Darkness My Old Friend
So one way you could accomplish this is to assemble a new Lightroom catalog, re-import all your source files, re-create all your metadata - retag everything, re-apply location and facial recognition data, and so on and then… I’m sorry, I can’t type over my spasms of bitter laughter. This is Not An Option. I have around 20,000 photos, I’m not doing that shit all over again.
Another, which sort-of works, is to replace the DNG files with RAW files (ORFs and CR2s, in my case). This is a bit painful - you’re either using the Adobe DNG converter to re-extract them from the DNG or copying them from your imported file backups. A bit painful, a bit time-consuming, but a test run of a month and a couple of hundred photos wasn’t that bad.
Problem is, Lightroom then notes the DNGs are missing. Cool! You pick one, and say “PA098848.DNG is now PA098848.ORF, savvy?” It warns you the file name has changed, you tell it to go ahead anyway, and harmony is restored.
The problem is that you have. to. do. this. for. every. single. file. by. hand.
200 files? Maybe. 20,000?
The smart thing to do for someone as lazy as myself would be to make the computer do the work. They like repetitive tasks.
Journey to the Centre of the Lightroom
A good starting point might be the Lightroom SDK. Lightroom has an embedded Lua scripting engine; this seems like a delightful place to start, not least because I’ve wanted an excuse to learn Lua, and what better reason to learn a new programming language than laziness?1.
To the Moon!
The first annoyance is that you can’t just pop open a console in Lightroom and run Lua scripts. Adobe, for whatever reason, don’t want you to think in terms of scripts, they want you to think in terms of writing plug-ins. This doesn’t thrill me, because it involves a lot more fucking around with boilerplate to do what I want, but it’s still going to be worth the effort.
Unfortunately, having downloaded the SDK it appears I can’t actually do what I want through the API Adobe have chosen to expose; I can certainly get a list of the missing files (so long as I use the GUI to generate the Missing Files
temporary catalog first), but there doesn’t appear to be a supported way to iterate over the list and relink all the files programmatically.
Bugger.
There’s got to be another way.
SQLing with Delight
A bit of time with Google confirms that the Lightroom .lrcat
file (which contains the guts of your Lightroom catalog) is in actual fact a SQLite database. This is very, very, very good news:
- It means Adobe’s programmers are smart. Rather than hacking together a shithouse one-off DB, they’ve use one of the best, if not the best cross-platform relational database in existence. It’s shameful they don’t show up as a major sponsor on the SQLite home page given how much money it helps them make, though.
- Since SQLite is a well-documented DB with a lot of tools for poking at it, it ought to be possible to come up with a solution to my problem by going to the source data, unless Adobe have done anything horrific in there. (Spoiler: they haven’t.)
Of course, poking around an undocumented database full of data you care about, with a view to changing it in ways the vendor doesn’t want you to, is not for the faint-of-backups, so if you’re following along at home because you want to do the same thing:
- Back up your lrcat files before you do anything to them.
- Be using the same version of the DB as me (Creative Cloud 2016).
- Be aware this may not work with your data and may break everything.
Getting Started
With that warning out of the way, I took a few preparatory steps:
- I grabbed the excellent SQLite Manager for Firefox, which is completely brill.
- I did a manual part-migration by the method I outlined above:
- Copy RAWs into my main image folder heirarchy.
- Deleting the DNGs.
- Running
Library > Find All Missing Photos
.
- Exiting Lightroom and BACKING UP MY LRCAT FILE I CANNOT STRESS THIS ENOUGH.
Nosing around
SQLite Manager makes it easy to nose around the database. I opened a bunch of likely-looking table names and found a couple of good candidates:
AgLibraryFile
.extension
.idx_filename
.lc_idx_filename (always lower case)
.lc_idx_filenameExtension (always lower case)
Adobe_images
.fileFormat
AgLibraryFile
appears to be all of the files in your library. Adobe_images
appears to have all the file metadata, like size, aspect ratios, and so on. The columns I’ve noted above appear to be the only references to filenames or file formats in the database.
The next step is to see what a known-converted file looks like in this picture:
select *
from AgLibraryFile
where idx_filename = 'PA158995.ORF'
id_local
is 724827
, id_global is C43E142F-A0E0-4F5B-BB61-C39DBA11EEDE
; it’s probably reasonable to assume these are going to be primary keys for relations in the DB.
Let’s have a look in in Adobe_images
for these IDs:
select *
from Adobe_images
where id_local = '724827' or
id_global = 'C43E142F-A0E0-4F5B-BB61-C39DBA11EEDE'
…returns nothing. Huh. Maybe my assumption about what sort of files are in Adobe_images
is wrong. Let’s have a look what’s in there:
select distinct(fileFormat)
from Adobe_images
…returns DNG
, JPG
, TIFF
, VIDEO
, RAW
. No ORF
files appear to make it in there. Hum. I wonder what type of files are in my catalog?
select distinct(extension)
from AgLibraryFile
…returns dng
, JPG
, tif
, tiff
, mp4
, MOV
, ORF
, png
, jpeg
; files which are in the catalog, but not in one of the blessed formats don’t even show up. OK, that seems odd. We’ll put that to one side for a minute and see if we can find the Missing Files
collection that a Find All Missing Photos
automatically generates. Is it in the AgLibraryCollection
table?
select *
from AgLibraryCollection
where name like 'Missing%'
…nope. But it persists between restarts, and takes ages to re-create, so it must exist somewhere. OK, let’s go back to the files - this time, a manually migrated but un-fixed ORF
file:
select *
from AgLibraryFile
where baseName = 'PA098848'
returns: id_local = 722549
and id_global = E0C36439-E7A1-47CD-B6DF-BF272804CDF5
, with all the other attributes all other attributes (extensions etc) showing as .dnf
, not .ORF
. Hurm. Let’s compare rows in Adobe_images
select *
from Adobe_images
where id_local = '724827' or
id_local = '722549'
… huh. Neither exists. That seems peculiar. Surely the id_local
is the key between the tables? Let’s have a closer look at Adobe_images, to see if we’re going to find any rows we need to fix up later:
select *
from Adobe_images
where fileFormat = 'RAW'
That returns some stuff. Let’s see what we can learn about it:
select *
from Adobe_images ai, AgLibraryFile alf
where fileFormat = 'RAW' and
ai.id_global = alf.id_global
Nope. Nothing. It’s empty. Maybe I’ve done the query wrong.
select *
from Adobe_images ai, AgLibraryFile alf
where fileFormat = 'RAW' and
ai.id_local = alf.id_local
…nope. Ok, that’s weird. How is Lightroom even making sense of this data? Let’s look closer at the data.
The first file in the AgLibraryFile
table has an id_local
of 47, while the first file in Adobe_images
is 45. That’s a bit off. But looking at the table, there’s a column with ‘47’ in it, called the rootFile
. So let’s try…
select *
from Adobe_images ai, AgLibraryFile alf
where fileFormat = 'RAW' and
ai.rootFile = alf.id_local
…number of rows returned: 1079! Let’s check how that compares with the number of ORF files I have:
select count(*)
from AgLibraryFile
where extension = 'ORF'
…1138. Hmm. Close enough to be worth digging into.
select distinct(fileFormat)
from Adobe_images ai, AgLibraryFile alf
where alf.extension = 'ORF' and
ai.rootFile = alf.id_local
… gets me RAW
and DNG
, which is interesting, since it implies that when I remap the files through the GUI, it may not be updating this table, which could be a bit of a problem. Let’s check:
select fileFormat
from Adobe_images ai, AgLibraryFile alf
where alf.baseName = 'PA158995' and
ai.rootFile = alf.id_local
… DNG. Really interesting. Essential LightRoom isn’t sanity checking the case where a remapping of a missing file may also be a change of file format. That seems like it could cause trouble, unless Lightroom just ignores the data in the Adobe_images.fileFormat
column. How many files do I have in that state?
select count(*)
from Adobe_images ai, AgLibraryFile alf
where alf.lc_idx_filenameExtension = 'orf' and
ai.fileFormat = 'DNG' and
ai.rootFile = alf.id_local
… 59 records. Interesting, and essentially “all the ones I’ve migrated by hand before I decided fuck this for a joke”. So presumably MAKING ANOTHER CATALOG BACKUP BECAUSE I’M SUPERSTITIOUS and running:
update adobe_images
set fileFormat = 'RAW'
where rootFile in (
select rootFile
from Adobe_images ai, AgLibraryFile alf
where alf.lc_idx_filenameExtension = 'orf' and
ai.fileFormat = 'DNG' and
ai.rootFile = alf.id_local
)
… will fix that up. Let’s check:
select count(*)
from Adobe_images ai, AgLibraryFile alf
where alf.lc_idx_filenameExtension = 'orf' and
ai.fileFormat = 'DNG' and
ai.rootFile = alf.id_local
…returns 0, while:
select distinct(fileFormat)
from Adobe_images ai, AgLibraryFile alf
where alf.extension = 'ORF' and
ai.rootFile = alf.id_local
…noice. Let’s try that on the real catalog and see if LightRoom shits the bed:
/me starts Lightroom, trys messing around with some converted files
…and it works. Apparently.
Still leaves us with the question: how is it tracking the Missing Photographs
? Before spending too much time poking about the table structures, let’s see if we can turn up anything:
$ strings Collection\ -\ Working\ Copy.lrcat | grep Missing
hasMissingSidecars INTEGER,
3C3316D7F-A082-44AA-9E39-5B596765D41AtempCollectionNameMissing Photographs
…that looks promising. So if we .dump the DB to a text file, we can see that:
grep 'Missing Photo' Collection.dump
OneDrive Full Collection.dump:INSERT INTO "Adobe_variablesTable" VALUES(81,'C3316D7F-A082-44AA-9E39-5B596765D41A','tempCollectionName',NULL,'Missing Photographs');
…but that UUID doesn’t seem to be referred to in any of the collection-related tables, and neither does the local id2. This is annoying. Hmm, wasn’t there something in the API docs about temporary collections?
catalog.kTemporaryImages: (string) The “Temporary Images” collection, which might be named, for example, “Photos That Failed to Export”, or “Missing Photographs”; see LrCatalog:setActiveSources()."
And the doco for LrCatalog:setActiveSources()
mentions the kTemporaryImages
as a source. Lo and behold, there’s an agTempImages
.
select count(*)
from agTempImages
Yields 149, which is the same as the number of missing files in my collection. Well, well, well, how handy is that? If I manually resolve a couple of missing files and re-run the Find Missing Photos
function in Lightroom, the row count in this table decreases as you’d expect3.
Let’s try for a list of all the image data:
select *
from agTempImages t, AgLibraryFile afl
where t.image = afl.id_local
…hm. Empty set. Let’s try Adobe_images
instead.
select *
from agTempImages t, Adobe_images ai
where t.image = ai.id_local
… that’s more like it. Right number of rows returned and everything. So to get all the file data as well we need:
select *
from agTempImages t, Adobe_images ai, AgLibraryFile afl
where t.image = ai.id_local and
ai.rootFile = afl.id_local
… yus. Right number of rows, right data. Well, rather too many columns, but that’s not the point. The columns we need to update, just to remind ourselves, are:
select ai.fileFormat, afl.extension, afl.idx_filename, afl.lc_idx_filename, afl.lc_idx_filenameExtension
from agTempImages t, Adobe_images ai, AgLibraryFile afl
where t.image = ai.id_local and
ai.rootFile = afl.id_local
…note the afl.originalFilename
; that could be handy later. The key question is whether we can sanely do this in SQL or whether we need to bust out another language to make it all work. The first limitation is that we can only update one table at a time; that’s OK because we can wrap it in transactions to make it a single work unit. The hardest one to update will be the AgLibraryFile
data, so let’s have a look:
select substr(afl.originalFilename, -3) as extension, -- The extension can be derived from the original file
afl.originalFilename as idx_filename, -- This reverts to the pre-DNG filename
lower(afl.originalFilename) as lc_idx_filename, -- Happily lower casing is a SQL function
lower(substr(afl.originalFilename, -3)) as lc_idx_filenameExtension -- Again, this can be derived and lower cased.
from agTempImages t, Adobe_images ai, AgLibraryFile afl
where t.image = ai.id_local and
ai.rootFile = afl.id_local
That’s not too shabby, although it will fail with extensions that are more or less that three characters, which is a bit of a nuisance. Happily we can use instr()
to find the location of the seperator and make it a bit more dynamic, albeit that it will now fail if you have multiple periods in your filename. I don’t, so I’m going to call this good enough:
select substr(afl.originalFilename, instr(afl.originalFilename, '.') + 1) as extension, -- The extension can be derived from the original file
afl.originalFilename as idx_filename, -- This reverts to the pre-DNG filename
lower(afl.originalFilename) as lc_idx_filename, -- Happily lower casing is a SQL function
lower(substr(afl.originalFilename, instr(afl.originalFilename, '.') + 1)) as lc_idx_filenameExtension -- Again, this can be derived and lower cased.
from agTempImages t, Adobe_images ai, AgLibraryFile afl
where t.image = ai.id_local and
ai.rootFile = afl.id_local
The next challenge is to turn that into an update statement that will do something meaningful; The inner loop is pretty simple; we need a list of id_locals to update:
select ai.rootFile
from agTempImages t, Adobe_images ai
where t.image = ai.id_local
…and a quick sanity check of the rows shows what we’d expect. So the update should look like:
update AgLibraryFile
set extension = substr(originalFilename, instr(originalFilename, '.') + 1),
idx_filename = originalFilename,
lc_idx_filename = lower(originalFilename),
lc_idx_filenameExtension = lower(substr(originalFilename, instr(originalFilename, '.') + 1))
where id_local in (
select ai.rootFile
from agTempImages t, Adobe_images ai
where t.image = ai.id_local
)
…et voila. Well, the update did something, anyway. However, the Adobe_images.fileFormat
will now be out of whack, so we need to re-run:
update adobe_images
set fileFormat = 'RAW'
where rootFile in (
select rootFile
from Adobe_images ai, AgLibraryFile alf
where alf.lc_idx_filenameExtension in ('orf', 'cr2') and
ai.fileFormat = 'DNG' and
ai.rootFile = alf.id_local
)
One last check before we apply this to the real database instead of a working set, though: have we changed more than the expected scope of rows; let’s check the different extensions in the AgLibraryFile
:
select extension, count(*)
from AgLibraryFile
group by extension
Nope, looks good; I’m seeing the types and counts I’d expect.
Huge Success
So our two updates are:
begin transaction;
update AgLibraryFile
set extension = substr(originalFilename, instr(originalFilename, '.') + 1),
idx_filename = originalFilename,
lc_idx_filename = lower(originalFilename),
lc_idx_filenameExtension = lower(substr(originalFilename, instr(originalFilename, '.') + 1))
where id_local in (
select ai.rootFile
from agTempImages t, Adobe_images ai
where t.image = ai.id_local and
t.image not in (443890, 551293)
);
update adobe_images
set fileFormat = 'RAW'
where rootFile in (
select rootFile
from Adobe_images ai, AgLibraryFile alf
where alf.lc_idx_filenameExtension = 'orf' and
ai.fileFormat = 'DNG' and
ai.rootFile = alf.id_local
);
…and if we’re happy with the outcome, a COMMIT;
Take a deep breath, back up the catalog file, open the production catalog (with Lightroom closed!), and apply the SQL, aaaand…
Making a note: huge success. The photos in the “Missing Photographs” collection are showing that they have found their original copy, and re-running the Library > Find All Missing Photos
function now shows no missing photos. Lightroom functions (like developing and exporting) work, and previously applied metadata and develop settings have been preserved.
-
I realise that this might seem like a very peverse definition of laziness, but learning enough of a simple programming language to avoid weeks of tedious manual work is definitely laziness. The very best sort. ↩︎
-
In fact, as far as I can tell, the UUIDs aren’t really used anywhere in the relational structures, so appear to be pointless, unless there are XML blobs that make use of them, which is possible. There are certainly XML blobs in the DB. ↩︎
-
This does imply you can only have one temporary catalog object in Lightroom at a time, which does seem a little ill-thought-out. ↩︎