Reverting a Lightroom Library to RAW

After some hemming and hawing, I came to a decision: converting my RAW files to DNG was not The Right Thing to do, for a number of reasons:

  1. DNG is actually a pain in the arse, compatibility wise. For better or worse - arguable worse - it’s never really had the uptake as a kind of universal RAW TIFF replacement. It’s actually easier to run different tools across RAW + sidecar files than DNG.
  2. On a related note, going to DNG locks you off from the manufacturer’s native tools. Which often aren’t great, but sometimes do a much better job of producing display and print files.
  3. While Lightroom is a great jack-of-all trades, the Adobe Camera Raw engine isn’t necessarily the best RAW converter out there. I’d like more freedom to work with (or at least trial) others, which can get better results; the fact the DNG format, while an open spec, is built around storing data in a form which assumes you’ve got the ACR/Lightroom pipeline as your internal representation only makes that worse.
  4. DNGs really slow everything down. They’re slower to browse in Explorer, they’re slower to import, they’re just slow.

So for those reasons, pulling my library back into RAWs seems like the way to go. Problem is, it ain’t as easy as you’d hope.

(But it is possible.)

Hello Darkness My Old Friend

So one way you could accomplish this is to assemble a new Lightroom catalog, re-import all your source files, re-create all your metadata - retag everything, re-apply location and facial recognition data, and so on and then… I’m sorry, I can’t type over my spasms of bitter laughter. This is Not An Option. I have around 20,000 photos, I’m not doing that shit all over again.

Another, which sort-of works, is to replace the DNG files with RAW files (ORFs and CR2s, in my case). This is a bit painful - you’re either using the Adobe DNG converter to re-extract them from the DNG or copying them from your imported file backups. A bit painful, a bit time-consuming, but a test run of a month and a couple of hundred photos wasn’t that bad.

Problem is, Lightroom then notes the DNGs are missing. Cool! You pick one, and say “PA098848.DNG is now PA098848.ORF, savvy?” It warns you the file name has changed, you tell it to go ahead anyway, and harmony is restored.

The problem is that you have. to. do. this. for. every. single. file. by. hand.

200 files? Maybe. 20,000?

I’m sorry

The smart thing to do for someone as lazy as myself would be to make the computer do the work. They like repetitive tasks.

Journey to the Centre of the Lightroom

A good starting point might be the Lightroom SDK. Lightroom has an embedded Lua scripting engine; this seems like a delightful place to start, not least because I’ve wanted an excuse to learn Lua, and what better reason to learn a new programming language than laziness?1.

To the Moon!

The first annoyance is that you can’t just pop open a console in Lightroom and run Lua scripts. Adobe, for whatever reason, don’t want you to think in terms of scripts, they want you to think in terms of writing plug-ins. This doesn’t thrill me, because it involves a lot more fucking around with boilerplate to do what I want, but it’s still going to be worth the effort.

Unfortunately, having downloaded the SDK it appears I can’t actually do what I want through the API Adobe have chosen to expose; I can certainly get a list of the missing files (so long as I use the GUI to generate the Missing Files temporary catalog first), but there doesn’t appear to be a supported way to iterate over the list and relink all the files programmatically.

Bugger.

There’s got to be another way.

SQLing with Delight

A bit of time with Google confirms that the Lightroom .lrcat file (which contains the guts of your Lightroom catalog) is in actual fact a SQLite database. This is very, very, very good news:

  1. It means Adobe’s programmers are smart. Rather than hacking together a shithouse one-off DB, they’ve use one of the best, if not the best cross-platform relational database in existence. It’s shameful they don’t show up as a major sponsor on the SQLite home page given how much money it helps them make, though.
  2. Since SQLite is a well-documented DB with a lot of tools for poking at it, it ought to be possible to come up with a solution to my problem by going to the source data, unless Adobe have done anything horrific in there. (Spoiler: they haven’t.)

Of course, poking around an undocumented database full of data you care about, with a view to changing it in ways the vendor doesn’t want you to, is not for the faint-of-backups, so if you’re following along at home because you want to do the same thing:

  • Back up your lrcat files before you do anything to them.
  • Be using the same version of the DB as me (Creative Cloud 2016).
  • Be aware this may not work with your data and may break everything.

Getting Started

With that warning out of the way, I took a few preparatory steps:

  • I grabbed the excellent SQLite Manager for Firefox, which is completely brill.
  • I did a manual part-migration by the method I outlined above:
    1. Copy RAWs into my main image folder heirarchy.
    2. Deleting the DNGs.
    3. Running Library > Find All Missing Photos.
  • Exiting Lightroom and BACKING UP MY LRCAT FILE I CANNOT STRESS THIS ENOUGH.

Nosing around

SQLite Manager makes it easy to nose around the database. I opened a bunch of likely-looking table names and found a couple of good candidates:

AgLibraryFile
    .extension
    .idx_filename
    .lc_idx_filename (always lower case)
    .lc_idx_filenameExtension (always lower case)

Adobe_images
    .fileFormat

AgLibraryFile appears to be all of the files in your library. Adobe_images appears to have all the file metadata, like size, aspect ratios, and so on. The columns I’ve noted above appear to be the only references to filenames or file formats in the database.

The next step is to see what a known-converted file looks like in this picture:

select  *
from    AgLibraryFile 
where   idx_filename = 'PA158995.ORF'

id_local is 724827, id_global is C43E142F-A0E0-4F5B-BB61-C39DBA11EEDE; it’s probably reasonable to assume these are going to be primary keys for relations in the DB.

Let’s have a look in in Adobe_images for these IDs:

select  *
from    Adobe_images   
where   id_local = '724827' or 
        id_global = 'C43E142F-A0E0-4F5B-BB61-C39DBA11EEDE'

…returns nothing. Huh. Maybe my assumption about what sort of files are in Adobe_images is wrong. Let’s have a look what’s in there:

select  distinct(fileFormat)
from    Adobe_images

…returns DNG, JPG, TIFF, VIDEO, RAW. No ORF files appear to make it in there. Hum. I wonder what type of files are in my catalog?

select  distinct(extension)
from    AgLibraryFile

…returns dng, JPG, tif, tiff, mp4, MOV, ORF, png, jpeg; files which are in the catalog, but not in one of the blessed formats don’t even show up. OK, that seems odd. We’ll put that to one side for a minute and see if we can find the Missing Files collection that a Find All Missing Photos automatically generates. Is it in the AgLibraryCollection table?

select  *
from    AgLibraryCollection
where   name like 'Missing%'

…nope. But it persists between restarts, and takes ages to re-create, so it must exist somewhere. OK, let’s go back to the files - this time, a manually migrated but un-fixed ORFfile:

select  * 
from    AgLibraryFile
where   baseName = 'PA098848'

returns: id_local = 722549 and id_global = E0C36439-E7A1-47CD-B6DF-BF272804CDF5, with all the other attributes all other attributes (extensions etc) showing as .dnf, not .ORF. Hurm. Let’s compare rows in Adobe_images

select  *
from    Adobe_images
where   id_local = '724827' or 
        id_local = '722549'

… huh. Neither exists. That seems peculiar. Surely the id_local is the key between the tables? Let’s have a closer look at Adobe_images, to see if we’re going to find any rows we need to fix up later:

select  *
from    Adobe_images
where   fileFormat = 'RAW'

That returns some stuff. Let’s see what we can learn about it:

select  *
from    Adobe_images ai, AgLibraryFile alf
where   fileFormat = 'RAW' and
        ai.id_global = alf.id_global

Nope. Nothing. It’s empty. Maybe I’ve done the query wrong.

select  *
from    Adobe_images ai, AgLibraryFile alf
where   fileFormat = 'RAW' and
        ai.id_local = alf.id_local

…nope. Ok, that’s weird. How is Lightroom even making sense of this data? Let’s look closer at the data.

The first file in the AgLibraryFile table has an id_local of 47, while the first file in Adobe_images is 45. That’s a bit off. But looking at the table, there’s a column with ‘47’ in it, called the rootFile. So let’s try…

select  *
from    Adobe_images ai, AgLibraryFile alf
where   fileFormat = 'RAW' and
        ai.rootFile = alf.id_local

…number of rows returned: 1079! Let’s check how that compares with the number of ORF files I have:

select  count(*)
from    AgLibraryFile
where   extension = 'ORF' 

…1138. Hmm. Close enough to be worth digging into.

select  distinct(fileFormat)
from    Adobe_images ai, AgLibraryFile alf
where   alf.extension = 'ORF' and
        ai.rootFile = alf.id_local

… gets me RAW and DNG, which is interesting, since it implies that when I remap the files through the GUI, it may not be updating this table, which could be a bit of a problem. Let’s check:

select  fileFormat
from    Adobe_images ai, AgLibraryFile alf
where   alf.baseName = 'PA158995' and 
        ai.rootFile = alf.id_local

… DNG. Really interesting. Essential LightRoom isn’t sanity checking the case where a remapping of a missing file may also be a change of file format. That seems like it could cause trouble, unless Lightroom just ignores the data in the Adobe_images.fileFormat column. How many files do I have in that state?

select  count(*)
from    Adobe_images ai, AgLibraryFile alf
where   alf.lc_idx_filenameExtension = 'orf' and 
        ai.fileFormat = 'DNG' and
        ai.rootFile = alf.id_local

… 59 records. Interesting, and essentially “all the ones I’ve migrated by hand before I decided fuck this for a joke”. So presumably MAKING ANOTHER CATALOG BACKUP BECAUSE I’M SUPERSTITIOUS and running:

update  adobe_images
set     fileFormat = 'RAW'
where   rootFile in (
    select  rootFile
    from    Adobe_images ai, AgLibraryFile alf
    where   alf.lc_idx_filenameExtension = 'orf' and 
            ai.fileFormat = 'DNG' and
            ai.rootFile = alf.id_local    
)

… will fix that up. Let’s check:

select  count(*)
from    Adobe_images ai, AgLibraryFile alf
where   alf.lc_idx_filenameExtension = 'orf' and 
        ai.fileFormat = 'DNG' and
        ai.rootFile = alf.id_local

…returns 0, while:

select  distinct(fileFormat)
from    Adobe_images ai, AgLibraryFile alf
where   alf.extension = 'ORF' and
        ai.rootFile = alf.id_local

…noice. Let’s try that on the real catalog and see if LightRoom shits the bed:

/me starts Lightroom, trys messing around with some converted files

…and it works. Apparently.

Still leaves us with the question: how is it tracking the Missing Photographs? Before spending too much time poking about the table structures, let’s see if we can turn up anything:

$ strings Collection\ -\ Working\ Copy.lrcat | grep Missing
    hasMissingSidecars INTEGER,
3C3316D7F-A082-44AA-9E39-5B596765D41AtempCollectionNameMissing Photographs

…that looks promising. So if we .dump the DB to a text file, we can see that:

grep 'Missing Photo' Collection.dump 
OneDrive Full Collection.dump:INSERT INTO "Adobe_variablesTable" VALUES(81,'C3316D7F-A082-44AA-9E39-5B596765D41A','tempCollectionName',NULL,'Missing Photographs');

…but that UUID doesn’t seem to be referred to in any of the collection-related tables, and neither does the local id2. This is annoying. Hmm, wasn’t there something in the API docs about temporary collections?

catalog.kTemporaryImages: (string) The “Temporary Images” collection, which might be named, for example, “Photos That Failed to Export”, or “Missing Photographs”; see LrCatalog:setActiveSources()."

And the doco for LrCatalog:setActiveSources() mentions the kTemporaryImages as a source. Lo and behold, there’s an agTempImages.

select  count(*)
from    agTempImages

Yields 149, which is the same as the number of missing files in my collection. Well, well, well, how handy is that? If I manually resolve a couple of missing files and re-run the Find Missing Photos function in Lightroom, the row count in this table decreases as you’d expect3.

Let’s try for a list of all the image data:

select  *
from    agTempImages t, AgLibraryFile afl
where   t.image = afl.id_local

…hm. Empty set. Let’s try Adobe_images instead.

select  *
from    agTempImages t, Adobe_images ai
where   t.image = ai.id_local

… that’s more like it. Right number of rows returned and everything. So to get all the file data as well we need:

select  *
from    agTempImages t, Adobe_images ai, AgLibraryFile afl
where   t.image = ai.id_local and  
        ai.rootFile = afl.id_local

… yus. Right number of rows, right data. Well, rather too many columns, but that’s not the point. The columns we need to update, just to remind ourselves, are:

select  ai.fileFormat, afl.extension, afl.idx_filename, afl.lc_idx_filename, afl.lc_idx_filenameExtension
from    agTempImages t, Adobe_images ai, AgLibraryFile afl
where   t.image = ai.id_local and  
        ai.rootFile = afl.id_local 

…note the afl.originalFilename; that could be handy later. The key question is whether we can sanely do this in SQL or whether we need to bust out another language to make it all work. The first limitation is that we can only update one table at a time; that’s OK because we can wrap it in transactions to make it a single work unit. The hardest one to update will be the AgLibraryFile data, so let’s have a look:

select  substr(afl.originalFilename, -3) as extension, -- The extension can be derived from the original file
        afl.originalFilename as idx_filename, -- This reverts to the pre-DNG filename 
        lower(afl.originalFilename) as lc_idx_filename, -- Happily lower casing is a SQL function 
        lower(substr(afl.originalFilename, -3)) as lc_idx_filenameExtension -- Again, this can be derived and lower cased.
from    agTempImages t, Adobe_images ai, AgLibraryFile afl
where   t.image = ai.id_local and  
        ai.rootFile = afl.id_local 

That’s not too shabby, although it will fail with extensions that are more or less that three characters, which is a bit of a nuisance. Happily we can use instr() to find the location of the seperator and make it a bit more dynamic, albeit that it will now fail if you have multiple periods in your filename. I don’t, so I’m going to call this good enough:

select  substr(afl.originalFilename, instr(afl.originalFilename, '.') + 1) as extension, -- The extension can be derived from the original file
        afl.originalFilename as idx_filename, -- This reverts to the pre-DNG filename 
        lower(afl.originalFilename) as lc_idx_filename, -- Happily lower casing is a SQL function 
        lower(substr(afl.originalFilename, instr(afl.originalFilename, '.') + 1)) as lc_idx_filenameExtension -- Again, this can be derived and lower cased.
from    agTempImages t, Adobe_images ai, AgLibraryFile afl
where   t.image = ai.id_local and  
        ai.rootFile = afl.id_local 

The next challenge is to turn that into an update statement that will do something meaningful; The inner loop is pretty simple; we need a list of id_locals to update:

select  ai.rootFile 
from    agTempImages t, Adobe_images ai
where   t.image = ai.id_local

…and a quick sanity check of the rows shows what we’d expect. So the update should look like:

update  AgLibraryFile
set     extension = substr(originalFilename, instr(originalFilename, '.') + 1),
        idx_filename = originalFilename,
        lc_idx_filename = lower(originalFilename),
        lc_idx_filenameExtension = lower(substr(originalFilename, instr(originalFilename, '.') + 1))
where   id_local in (
    select  ai.rootFile 
    from    agTempImages t, Adobe_images ai
    where   t.image = ai.id_local 
)

…et voila. Well, the update did something, anyway. However, the Adobe_images.fileFormat will now be out of whack, so we need to re-run:

update  adobe_images
set     fileFormat = 'RAW'
where   rootFile in (
    select  rootFile
    from    Adobe_images ai, AgLibraryFile alf
    where   alf.lc_idx_filenameExtension in ('orf', 'cr2') and 
            ai.fileFormat = 'DNG' and
            ai.rootFile = alf.id_local    
)

One last check before we apply this to the real database instead of a working set, though: have we changed more than the expected scope of rows; let’s check the different extensions in the AgLibraryFile:

select  extension, count(*)
from    AgLibraryFile
group by extension

Nope, looks good; I’m seeing the types and counts I’d expect.

Huge Success

So our two updates are:

begin transaction;
update  AgLibraryFile
set     extension = substr(originalFilename, instr(originalFilename, '.') + 1),
        idx_filename = originalFilename,
        lc_idx_filename = lower(originalFilename),
        lc_idx_filenameExtension = lower(substr(originalFilename, instr(originalFilename, '.') + 1))
where   id_local in (
    select  ai.rootFile 
    from    agTempImages t, Adobe_images ai
    where   t.image = ai.id_local and
            t.image not in (443890, 551293)
);
update  adobe_images
set     fileFormat = 'RAW'
where   rootFile in (
    select  rootFile
    from    Adobe_images ai, AgLibraryFile alf
    where   alf.lc_idx_filenameExtension = 'orf' and 
            ai.fileFormat = 'DNG' and
            ai.rootFile = alf.id_local    
);

…and if we’re happy with the outcome, a COMMIT; Take a deep breath, back up the catalog file, open the production catalog (with Lightroom closed!), and apply the SQL, aaaand…

Making a note: huge success. The photos in the “Missing Photographs” collection are showing that they have found their original copy, and re-running the Library > Find All Missing Photos function now shows no missing photos. Lightroom functions (like developing and exporting) work, and previously applied metadata and develop settings have been preserved.


  1. I realise that this might seem like a very peverse definition of laziness, but learning enough of a simple programming language to avoid weeks of tedious manual work is definitely laziness. The very best sort. ↩︎

  2. In fact, as far as I can tell, the UUIDs aren’t really used anywhere in the relational structures, so appear to be pointless, unless there are XML blobs that make use of them, which is possible. There are certainly XML blobs in the DB. ↩︎

  3. This does imply you can only have one temporary catalog object in Lightroom at a time, which does seem a little ill-thought-out. ↩︎

Share