fuzzy duplicates

nikos
Site Admin
Posts: 15791
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

I was browsing my music collection and wondering how to tidy up the mess. I never used a consistent naming scheme for folders: some include the artist name, in some the artist name comes last, and so on.

I was thinking of an automatic way to group all these artists together, not one by one but across the whole music collection at once. There is no way to do that in xplorer2 at present.

the command should:

* break down keywords of folder names
* automatically organize them in groups of similar keywords

this is easier said than done, as you don't want John Travolta and John Abercrombie in the same group :)
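
A minimal sketch of the two steps above, assuming folder names can be compared as sets of lower-cased keywords and that a Jaccard overlap threshold decides who shares a group. The function names (`keywords`, `jaccard`, `group_names`) and the 0.5 threshold are illustrative assumptions, not anything xplorer2 provides.

```python
import re

def keywords(name):
    """Lower-case a folder name and split it into a set of keywords."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))

def jaccard(a, b):
    """Share of keywords two names have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_names(names, threshold=0.5):
    """Greedily put each name into the first group whose first member
    is similar enough; otherwise start a new group."""
    groups = []
    for name in names:
        kw = keywords(name)
        for group in groups:
            if jaccard(kw, keywords(group[0])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

folders = ["John Travolta", "Travolta, John",
           "John Abercrombie", "Abercrombie John - Timeless"]
for group in group_names(folders):
    print(group)
```

With the 0.5 threshold, a single shared keyword such as "john" is not enough to merge two names, so the two Johns stay apart while reordered names still meet. Comparing only against the first member of each group is a simplification; a real command would probably need something less order-dependent.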

anyway, does anybody have a need for this kind of thing or a variation thereof?
narayan
Platinum Member
Posts: 1430
Joined: 2002 Jun 04, 07:01

Post by narayan »

Yes, I would like that.

This would be useful for grouping PDF files too, where several keywords are present in their titles but in a different order.

Some considerations:

1. Some files may contain fewer keywords, but x2 should still be able to identify them as a possible match. (Provide a slider for fuzziness.)

2. A given file may be a member of more than one group. This should be allowed.

3. Consider a two-stage process, where in the first stage x2 prepares a list of keywords (single, or a comma-separated list) and allows the user to edit it. Let the user even add new groups, or simply paste a text file where each group is listed on a new line.

After that, the second stage finds files that could be members of those groups.

4. A false positive is better than a false negative. That is, if x2 mistakenly thinks that a file belongs to a group, you can manually remove it. That is better than missing it altogether.

5. All these groups should be in virtual folders (like a scrap pane list, or the results of a search operation.)
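
The considerations above could be sketched roughly as follows. This is only an illustration under stated assumptions: keywords are single words, the user-edited group list is plain text with one comma-separated group per line, and the "fuzziness slider" is a number between 0 and 1. All names (`propose_keywords`, `parse_groups`, `assign`) are hypothetical.

```python
import re
from collections import Counter

def keywords(name):
    """Lower-case a file name and split it into a set of keywords."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))

def propose_keywords(names, min_count=2):
    """Stage 1: suggest keywords that occur in at least min_count names,
    as a starting list for the user to edit."""
    counts = Counter(kw for name in names for kw in keywords(name))
    return sorted(kw for kw, n in counts.items() if n >= min_count)

def parse_groups(text):
    """Read the user-edited list: one group per line, keywords
    separated by commas. Blank lines are ignored."""
    groups = []
    for line in text.splitlines():
        group = frozenset(k.strip().lower() for k in line.split(",") if k.strip())
        if group:
            groups.append(group)
    return groups

def assign(names, groups, fuzziness=0.5):
    """Stage 2: a name joins every group for which it contains at least
    a (1 - fuzziness) share of that group's keywords, so one file may
    end up in several groups."""
    members = {group: [] for group in groups}
    for name in names:
        kw = keywords(name)
        for group in groups:
            if len(kw & group) / len(group) >= 1.0 - fuzziness:
                members[group].append(name)
    return members

files = ["fuzzy matching survey.pdf", "survey of matching.pdf", "cooking notes.pdf"]
groups = parse_groups("fuzzy, matching, survey\ncooking")
for group, found in assign(files, groups, fuzziness=0.5).items():
    print(sorted(group), found)
```

Raising `fuzziness` admits files with fewer of a group's keywords, which leans towards false positives as item 4 prefers; at 0.0 a file must contain every keyword of the group. The returned dictionary only lists names, in the spirit of a virtual folder, and moves nothing on disk.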
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

For music, you could just use MediaMonkey and feed its Auto-Organise function a mask like:

M:\Musicae\<Genre>\<Artist>\<Album>\<Track#:3> <Title>

...which then instantly reorganises/renames the folder structure according to the audiofile's Metadata matching the mask.  And I mean instantly - I've got a structure of about 15,000 music folders broken down and comprising some 300GB of files, which all get sorted and renamed automatically.  If I wanted to organise it some other way (using other metadata), I'd just define another mask and it would do it all again, no mess, no fuss.
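
To make the idea of a mask concrete, here is a rough sketch of how such a path template could be expanded from a file's tags. This is not MediaMonkey's code or API; `expand_mask` is a hypothetical helper, the tag values are placeholders, and reading `<Track#:3>` as "track number padded to three digits" is an assumption.

```python
import re

def expand_mask(mask, tags):
    """Replace each <Field> or <Field:N> placeholder with the matching
    tag value; N is taken to mean zero-padding to N digits.
    A field missing from tags raises KeyError."""
    def substitute(match):
        field, width = match.group(1), match.group(2)
        value = str(tags[field])
        return value.zfill(int(width)) if width else value
    return re.sub(r"<([^<>:]+)(?::(\d+))?>", substitute, mask)

mask = r"M:\Musicae\<Genre>\<Artist>\<Album>\<Track#:3> <Title>"
tags = {"Genre": "Jazz", "Artist": "Some Artist", "Album": "Some Album",
        "Track#": 7, "Title": "Some Title"}
print(expand_mask(mask, tags))
```

Changing the folder layout then only means changing the mask string, since the tags stay the same.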

There's something quite satisfying about it - and not as freaky as those self-driving cars.

That said, I've never had a use for anything like it outside of a music collection milieu, but that's just me. :shrug:  I would imagine people who like Photos would have a use for it, as they contain similar metadata - but no doubt those who like photos already have a programme that can do something similar.

And then there's Narayan who just wants everything in life to be as complicated as mentally possible. :twisted: :D
wperkins99
Bronze Member
Posts: 106
Joined: 2004 Jul 11, 14:55

Post by wperkins99 »

Kilmatead wrote:I've got a structure of about 15,000 music folders

FIFTEEN THOUSAND....!?
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

When you start out with a single folder called "Musicae", you then have subfolders for Genres (Singles, Classical, Rock, etc) which then have subfolders for Groups/Composers leading to subfolders for Albums/Collections, and so on and so on.  A simple Genre such as Classical may have upwards of 200 composers, all of whom have numerous opuses (for example, the "complete works" of J.S. Bach stretches to 155 CD's, and the complete organ works of Messiaen are seemingly endless, and all Medieval musics must "of course" be broken down by quarter-century of polytonal progression) - and that's just the "pop" end of the classical spectrum - try spending a few years "collecting" every string quartet by every obscure Eastern European composer, and - well, introspection has its cost - "My God... it's full of subfolders!"

When I discovered that MediaMonkey could sort it out with literally a single-mask and a single-click (would a whip be too sadistic?) I probably added as many years to my life as the cigarettes removed, so I reckon I broke even.  :D
narayan
Platinum Member
Posts: 1430
Joined: 2002 Jun 04, 07:01

Post by narayan »

Ahem! Look who's talking of making things complicated?  :twisted:
nikos
Site Admin
Posts: 15791
Joined: 2002 Feb 07, 15:57
Location: UK

Post by nikos »

So MediaMonkey is only applicable to music with reliable tags. But what if the 'artist' is "gismonti" in one case, "egberto gismonti" in another, or even "gismonti, egberto"? Will that lead to different categories?
Humbug!
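
One way to picture the tag problem: treat two artist tags as the same when the words of one are contained in the words of the other, ignoring case, order and punctuation. A minimal sketch, with the hypothetical helpers `tokens` and `same_artist`:

```python
import re

def tokens(artist):
    """Lower-case an artist tag and split it into a set of words."""
    return frozenset(re.findall(r"\w+", artist.lower()))

def same_artist(a, b):
    """True when one tag's words are a subset of the other's."""
    ta, tb = tokens(a), tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)

print(same_artist("gismonti", "egberto gismonti"))
print(same_artist("gismonti, egberto", "egberto gismonti"))
print(same_artist("egberto gismonti", "john abercrombie"))
```

The subset rule is deliberately loose: a bare "john" tag would match both John Travolta and John Abercrombie, so it trades false negatives for false positives.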
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

Admittedly, yes, it's not exactly "fuzzy" in the proper sense - but they get around that by allowing online automatic tag-mapping via multiple sources, etc - after all, the user is expected to bring some semblance of common-sense to their activities!

The first time I ran the organising routine I was - what's the word - a tad trepidatious? - but soon became a convert.

I'd hate to think of debugging the thing you're describing - considering by its nature it's not supposed to have exact results anyway.
narayan wrote:Look who's talking...
At least I don't organise my children with RegEx by day and then feign fear of a little scripting every five minutes by night!  Make up your mind, would ya? :wink:  Playing the devil's advocate is fine, but sooner or later you's-a-gotta-stump-up-da-readies and let go of the trainers! :twisted:
narayan
Platinum Member
Posts: 1430
Joined: 2002 Jun 04, 07:01

Post by narayan »

I leave scripting and programming to the programmers.

Even a virtuoso will never venture into setting the frets (or the bridge). He will only tune the instrument.

Some techniques belong to the kitchen (tempering, basting, cutting, paring, boiling, frying, steaming) and others belong to the dining table (carving, flambéing). We at the fine-dining table don't expect to do the cooking stuff!  :twisted:
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

So I take it any news of the class-system breaking down in India has been grossly exaggerated then, yeah?  Interesting that you're always egging Nikos on for more and more flexibility, yet when it comes to truly useful liberation (where us untouchables can teach the ruling classes a thing or two) you're happiest with the status quo of continental divide and a snappy GUI.

If the virtuoso knows not the value of a good linseed oil, Segovia would turn in his noble grave.

Get thee back into the kitchen!  You've been playing Guitar Hero for far too long on your iPhone! :D
Tuxman
Platinum Member
Posts: 1610
Joined: 2009 Aug 19, 07:49

Post by Tuxman »

Using genre names as part of your folder structure will, sooner or later, lead to massive structuring issues. Where do you put Genesis? Rock ("From Genesis to Revelation"), Progressive Rock ("Foxtrot"), Pop ("Invisible Touch")? Genre'ing music is for people who make their money by talking about music. Boring, old people.
Tux. ; tuxproject.de
registered xplorer² pro user since Oct 2009, ultimated in Mar 2012
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

That's why MediaMonkey allows for multiple Genres to be applied to any single grouping, with shared or individual sorting (without any file duplication).  Since the database is intended to be used from within the programme's interface itself, the actual sorting on the disc is more of a formality than a requirement - it would be just as happy to stuff 10,000 files into a single folder were I to allow it (while everything in the interface still appeared segregated), but that would be just silly.

Does anyone (besides Nikos) actually play music by navigating direct via the folder structure anyway?  I keep things structured so I know how to find them when others wish to copy stuff, so it's nice that MM automatically sorts the physical locations out by itself when I change metadata within the GUI.

I'd be scared shitless of trying to do it via Nikos' fuzzy-logic approach - where would all the unmatched/incomplete or partially matching items end up?  Never mind keeping track of rational grouping.
Tuxman
Platinum Member
Posts: 1610
Joined: 2009 Aug 19, 07:49

Post by Tuxman »

Kilmatead wrote:That's why MediaMonkey allows for multiple Genres to be applied to any single grouping, with shared or individual sorting (without any file duplication).
This implies that you think that you can actually assign specific genres to all music. Which is wrong (or, at least, philistine).
Kilmatead wrote:Does anyone (besides Nikos) actually play music by navigating direct via the folder structure anyway?
I actually do, because there is a "yet to listen" folder with many subfolders on my external drive. (Rather unsorted though.)
Kilmatead
Platinum Member
Posts: 4578
Joined: 2008 Sep 30, 06:52
Location: Dublin

Post by Kilmatead »

As I see it, genres are just guidelines anyway, and since they can always be personalised, one could consider "UnListened" as a genre itself (which when actually listened-to may have its metadata changed and it's automatically sorted into place, or deleted, as needed).

No matter how much artists themselves may vainly hate to be compartmentalised, the legacy of botanists has influenced our interpretations of organisational logic to such a degree that trying to escape it on principled-grounds is itself more petulant than embracing would be philistinic.  You can thank Aristotle for that.

That said, I have a similar idea I use for films - a basic folder affectionately called "Scuff" which holds (as yours does) a very haphazard "mush of stuff" awaiting the cold hard light of judgement day's shift-delete button.  Oh wait, x2 doesn't allow me to hold shift while clicking the toolbar delete button - no wonder Nikos' folders are in a mess. :wink:
Tuxman
Platinum Member
Posts: 1610
Joined: 2009 Aug 19, 07:49

Post by Tuxman »

Kilmatead wrote:No matter how much artists themselves may vainly hate to be compartmentalised, (...)
I hate compartmentalising myself. What is the advantage - a fuzzy folder structure with "well that almost fits 5 seconds of the album" genres?