Help me, lazyweb.
May. 6th, 2014 11:08 am

Due to (inherited, not my fault) poor planning, I have a folder structure with a few million files in a few tens of thousands of nested folders.
These folders are all stored in an ext3 file system, which is case sensitive.
They are largely being accessed by Windows clients, which are not case sensitive, via Samba, which is kinda case sensitive but mostly defers to Windows.
These files are being collected from other places, and being dropped into this location by both Windows and Linux clients.
There are an unknown-but-at-least-three-so-far number of folders with the same name, differing only in case - e.g. "Pogo" and "POGO". And this is a royal pain in the ass when a Linux rsync job drops files from otherplace\POGO into thisplace\POGO, and then a Windows user clicks on POGO and gets the contents of Pogo, because it's alphabetically first and it's the same name and thus the same folder, right? Hey, where are my files? Why aren't my files there?
There's gotta be an easy way - some "find" flag or some reasonably non-stupid bash script - to get a list of all cases where there's multiple paths differing only in case, that I can wind up and let run on this for a week or so and find all the duplicates. Ideally there'd also be a way to trigger an automated rename on one of them, but a complete list would be a perfectly cromulent start.
I mean, I *could* write a script to do it. But I don't WANNA. This is a wheel that has to have been invented previously, right? Someone's got a magic spell to do this in a much simpler way?
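(Edited to add the spell that turned up, for posterity. This is a sketch, not a battle-tested tool: it assumes GNU coreutils, where `sort -f` folds case so variants of the same path sort next to each other, and `uniq -Di` prints every member of each case-insensitive duplicate group.)

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils: list every path under $ROOT that
# collides with another path when case is ignored.
# "sort -f" folds case so case-variants sort adjacent;
# "uniq -D" prints all repeated lines, "-i" compares ignoring case.
ROOT=${1:-.}
find "$ROOT" -print | sort -f | uniq -Di
```

One pass of `find` plus one big sort, so on a few million files this should take hours rather than the week I budgeted; dumping the `find` output to a file first makes it cheap to re-slice.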
Date: 2014-05-09 11:19 pm (UTC)

No, there's 800,000 duplicates. And I'm not 100% sure why your script outputs /POGO when /POGO doesn't have a dupe, but there's /POGO/pong.txt and /POGO/PonG.txt, and I missed that when looking because they weren't next to each other. Because, of course, "p" and "P" *shouldn't* be next to each other.
I'm a dumbass. Thanks for the help.
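(The gotcha above in miniature: under a plain byte-order sort, every uppercase letter sorts before every lowercase one, so case-variants of one name can land far apart in a listing. `sort -f` folds case first. `LC_ALL=C` is pinned here only so the demo's ordering is deterministic.)

```shell
# Plain C-locale sort scatters the case-variants...
printf 'PonG.txt\naardvark\npong.txt\n' | LC_ALL=C sort
# -> PonG.txt
#    aardvark
#    pong.txt

# ...while "sort -f" folds case so they sit together.
printf 'PonG.txt\naardvark\npong.txt\n' | LC_ALL=C sort -f
# -> aardvark
#    PonG.txt
#    pong.txt
```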
Date: 2014-05-07 10:27 pm (UTC)

https://lists.samba.org/archive/samba/2008-January/137622.html
Date: 2014-05-07 11:00 pm (UTC)

It does not appear to address "the directories are fucked up by non-Samba means, and then Samba shows them to the Windows users".
And it definitely doesn't fix "Well, the shit's fucked up NOW"
(I have eliminated the process that creates duplicate entries. Now, any duplicates will have to be deliberately created. However, that doesn't get rid of any past duplicates.)
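(For the past duplicates, the "automated rename" half might look like the sketch below - dry run only, same GNU coreutils assumptions as the detection pipeline, and the .CASEDUP suffix is just something I made up. It only *prints* mv commands: renaming a directory would break any deeper colliding paths still sitting in the list, so eyeball the output before letting anything actually move.)

```shell
#!/bin/sh
# Sketch only: print (do not execute) "mv" commands that would rename
# every case-colliding path after the first one in each group.
# Assumes GNU coreutils; ".CASEDUP" is an invented suffix.
ROOT=${1:-.}
find "$ROOT" -print | sort -f | uniq -Di |
awk 'tolower($0) == prev { print } { prev = tolower($0) }' |
while IFS= read -r p; do
  # Dry run: drop the "echo" only after reviewing the list. Renaming
  # a directory invalidates any deeper paths later in the list.
  echo mv -n -- "$p" "$p.CASEDUP"
done
```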