Why File Duplication Occurs...

fubar_life123

File duplication occurs because of the use of a permanent hierarchical directory structure, full stop. The solution is to uniquely identify each file in a flat, directory-less file system...

In place of a permanent hierarchy, a "virtual folder" hierarchy can be used purely as an incidental aid for visualizing relationships between files. In addition to the more traditional folder-hierarchy modality, tagging can be more readily exploited. Essentially, what I'm describing IS tagging (a semantic file system) with an added layer of hierarchical relationships between the tags, purely for visualization. In reality, the disk is already managed this way; we just ruined it by adding an archaic form of file management on top (in the case of Windows and Linux)...

This would eliminate unintentional duplications. It would NOT, however, deal with files that have a unique name or ID but identical content. Another layer would need to be added, using checksums, to flag potential duplicates.
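To make that checksum layer concrete, here's a minimal sketch of flagging identical-content files in a flat folder. The folder path, the choice of SHA-256, and the function name are all illustrative assumptions, not anything VaM actually ships:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def flag_duplicates(folder: str) -> dict[str, list[Path]]:
    """Group files in a flat folder by content hash; any group with
    more than one entry is a set of potential duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

# Hypothetical usage: report identical-content files regardless of their names
for digest, paths in flag_duplicates("Custom/Atom/Person/Morphs").items():
    print(digest[:12], [p.name for p in paths])
```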

VaM file bloat....

The real question is: why hasn't this type of file management been implemented, given the benefits of more efficient disk usage?

Flat and unique is more efficient... Anything else is masochistic...

It also forces programmers to create unnecessarily complex scripts to manage file duplication. Dysfunction begets more dysfunction...

Unique identifiers can be created in much the same way we create compound words (something we already do naturally): by concatenating unique attributes of the file (or whatever the object is). In combination with a compression/hashing algorithm (like a checksum), one could crunch the essence of a "file" down into a concise descriptor. The file name can consist of a concatenation of [creator name | file purpose | creation date/time | and of course ext. (file type)]. Not unlike VAR naming, except that a VAR internally is still using a rigid hierarchical system...

A standard would need to be created for the type of compression algorithm (otherwise you would get inconsistent naming), and the attributes to concatenate would have to be standardized as well (agreed upon in advance) for consistency. Each attribute is then run through the compression algorithm to create a concise (size-limited) descriptor for each attribute category. The trickier attribute would be "File Purpose," which would be read directly from the file contents, presumably the file header, body, etc... Regardless of the fine details, it would ensure uniqueness, could easily be automated, and would be far less frustrating...
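For illustration only, here's a rough sketch of what such an ID scheme could look like. The attribute set, the separator, truncating SHA-256 to 8 hex characters, and keeping the timestamp human-readable are all assumptions I'm making to fill in the "agreed upon in advance" standard left open above:

```python
import hashlib
from datetime import datetime, timezone

def descriptor(value: str, length: int = 8) -> str:
    """Run one attribute through the agreed-upon algorithm and truncate it
    to a fixed-size descriptor (SHA-256 and the length are illustrative)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]

def build_file_id(creator: str, purpose: str, created: datetime, ext: str) -> str:
    """Concatenate per-attribute descriptors, compound-word style:
    [creator | file purpose | creation date/time | extension]."""
    parts = [
        descriptor(creator),
        descriptor(purpose),                 # in practice derived from the file contents
        created.strftime("%Y%m%dT%H%M%SZ"),  # kept readable rather than hashed
    ]
    return ".".join(parts) + ext

# Hypothetical example
print(build_file_id("fubar_life123", "left eyebrow raise morph",
                    datetime(2023, 5, 1, tzinfo=timezone.utc), ".vmi"))
```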

Some people love to live in chaos... I do not....

 
This is a terrible idea. A virtual file system works great as long as you ONLY use VaM. But as soon as you use external software like Blender, Unity, Photoshop, VisualStudio, Notepad++, Git and a bunch of other things, it gets super confusing, especially when using hash values as filenames. All external software would still rely on the physical file system, not some virtual thing you build on top.

VAR files, where each has its own filesystem hierarchy, are already a good solution. Many creators don't use VARs correctly or don't understand them, but that's a different problem that won't go away with an even more complicated approach.
 

It's already more or less done with Unity (or macOS). This is why you import things into Unity: so it can slap a UUID on an asset and allow for flexible naming and hierarchical associations. Access to the asset is completely independent of the directory hierarchy, because the hierarchy is virtual. If you rename something or change its place in the directory tree, the system locates the asset just the same, because the UUID of the asset never changes. Since you are dealing with a flat directory (provided that the naming is consistent), it would virtually eliminate duplications. It's already a proven method of managing files and is much more efficient than a rigid directory hierarchy.
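For what it's worth, here's a toy sketch of that idea, not Unity's actual implementation: assets sit in a flat store keyed by UUID, while the display name and "virtual folder" live in a separate index, so renaming or moving only rewrites the index and never touches the asset itself. All names here are made up for illustration:

```python
import uuid

class VirtualIndex:
    """Flat asset store keyed by UUID; names and virtual folders are just
    metadata, so they can change without the asset's identity changing."""

    def __init__(self) -> None:
        self.assets: dict[str, bytes] = {}  # UUID -> asset contents (the only real storage)
        self.meta: dict[str, dict] = {}     # UUID -> {"name": ..., "virtual_path": ...}

    def add(self, data: bytes, name: str, virtual_path: str) -> str:
        asset_id = str(uuid.uuid4())
        self.assets[asset_id] = data
        self.meta[asset_id] = {"name": name, "virtual_path": virtual_path}
        return asset_id

    def move(self, asset_id: str, new_virtual_path: str) -> None:
        # Only the index changes; the stored asset and its UUID stay put.
        self.meta[asset_id]["virtual_path"] = new_virtual_path

    def resolve(self, asset_id: str) -> bytes:
        return self.assets[asset_id]

# Hypothetical usage: move the asset around freely, lookups never break
idx = VirtualIndex()
aid = idx.add(b"...morph data...", "SmileWide", "Morphs/Face")
idx.move(aid, "Expressions/Happy")
assert idx.resolve(aid) == b"...morph data..."
```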

I managed to reduce all the duplicates in my morph folder simply by flattening it to ONE folder. There is no need for subfolders in the context of morphs anyway, since they're managed by VaM internally. Same thing with scripts, CUA, textures, etc... You know immediately when you have a duplicate!
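A minimal sketch of that flattening step (the morph folder path is just an example): move everything up into one folder and report name collisions instead of overwriting, which is exactly the point where duplicates become visible:

```python
import shutil
from pathlib import Path

def flatten(root: str) -> None:
    """Move every file from subfolders of `root` directly into `root`,
    reporting name collisions instead of overwriting anything."""
    root_path = Path(root)
    for path in sorted(root_path.rglob("*")):
        if path.is_file() and path.parent != root_path:
            target = root_path / path.name
            if target.exists():
                print(f"duplicate name, left in place: {path}")
            else:
                shutil.move(str(path), str(target))

# Hypothetical usage; VaM finds morphs regardless of subfolder layout
flatten("Custom/Atom/Person/Morphs")
```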

The naming scheme is just my idea for creating an infinite number of IDs based on attributes of the files. It doesn't completely eliminate the possibility of name duplications, given the way hashing works, but it practically eliminates the likelihood of naming collisions. Plus, it's automated!

Humans have an inborn bias toward rigid hierarchy... It doesn't mean the world actually operates this way. ;)
 
The real issue is getting people to think differently about something that has been ingrained in them for ages. It's a major paradigm shift, and there will always be resistance to radical change. Yes, not everyone is on board YET, but once people discover the benefits, they will inevitably make the switch... Now or later...
 