| The Technical Zone... The Geeky forum... Use this forum to discuss technical aspects of email, from authentication protocols to encryption. |
#1 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Comparing two folders of many EMLs
I have a tricky EML set reconciliation task.
I need a Windows 7 tool that will compare the filenames of two folders of files aligning the filenames by equality of the first n characters - ignoring the rest. And let me easily open any file in its associated reader. Beyond Compare is my usual tool of choice for this but it cannot auto-align on n chars, so would leave me manually aligning, taking an age. Any ideas? |
|
|
|
|
|
#2 |
|
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 858
|
I'd use the dir command to get a list of one folder's files into a text file, and a list of the other folder's files into a second text file.
Sort both files (or use appropriate dir options for that).

Use your text editor - assuming it has suitable commands - to either remove filename chars after column 'n', or change all columns after column 'n' to a fixed identical char on all lines. You'd probably need an editor which either allows you to restrict operations to just a certain column-range on each line, OR one with not just regex support, but regexes of a 'flavour' that allows you to specify patterns of a certain length.

Then eg if each line of one of the files contains a single filename & you only care about the first 13 characters of them, you could change
^(.{13,13})(.*)$
which (depending on the regex flavour) might mean, for each line, match
"^" - start of line,
".{13,13}" - any character (matched by "."), for {13,13} a min & max of 13 times,
".*" - any character (matched by "."), as many times as needed,
"$" - end of line.

Now the ".{13,13}" and ".*" sub-expressions above had brackets around them. In regexes, apart from grouping parts of an expression together, brackets also indicate to the regex processor that it should remember what text actually matched the first, second, third ... ()-enclosed pattern - assuming your regex engine supports that. That means for each line the processor would know what the first 13 characters were, & what the trailing characters were. So, you'd be able to specify a search & replace operation which changed each line from
13-arbitrary-chars other-characters
to
a copy of the first 13 chars

You do that to both lists of files, then compare the file lists.

Going from a truncated filename to being able to open a corresponding file is trickier. The simplest way, if you see a difference in the edited lists on, say, line 217, is to look at line 217 of the original unedited lists to see what the full filenames are.
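For anyone who would rather script the search & replace than do it in an editor, here is a minimal Python sketch of the same regex. The prefix length of 13 comes from the example above; the sample filenames are invented purely for illustration. Python writes the "remembered" first group as `\1` in the replacement; other regex flavours may use `$1`.

```python
import re

# Assumption: only the first 13 characters of each name matter.
PREFIX_LEN = 13

# Same shape as ^(.{13,13})(.*)$ ; {13} is shorthand for {13,13}.
PATTERN = re.compile(r"^(.{%d})(.*)$" % PREFIX_LEN)


def truncate_line(line):
    """Replace a line with a copy of its first PREFIX_LEN characters.

    Lines shorter than PREFIX_LEN do not match the pattern and are
    returned unchanged.
    """
    return PATTERN.sub(r"\1", line)


# Hypothetical sample names, not taken from the thread.
sample = [
    "20190412-0931 Invoice from supplier.eml",
    "20190412-0931 Invoice from supplier (copy 2).eml",
    "short.eml",
]
truncated = [truncate_line(name) for name in sample]
```

The first two sample names end up identical after truncation, which is what lets an ordinary text compare line them up.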
I don't know my way around spreadsheets, but I expect you could generate a file containing (say) on each line
the-13-chars tab the-full-filename-from-which-those-13-came
then load it into a spreadsheet. If BC offers compares of spreadsheets by columns you could run the compare on column 1 only, but still have the full filename in the next column ... & if BC displays that you might be able to click on it.

If you can write a simple program in eg python, or perl, or basic ... , then it might be simpler (especially if your text editor isn't clever/powerful enough) just to write something that reads a list of filenames & writes out a modified list of the truncated names. |
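A rough sketch of such a program in Python, producing the "truncated name, tab, full filename" list for one folder. The function names and the idea of reading the folder directly (rather than a saved dir listing) are assumptions made for the example.

```python
import os


def prefix_lines(folder, n):
    """Return sorted 'prefix<TAB>full filename' lines for the files in folder."""
    lines = []
    for name in sorted(os.listdir(folder)):
        # Skip sub-folders; only plain files are listed.
        if os.path.isfile(os.path.join(folder, name)):
            lines.append(name[:n] + "\t" + name)
    return lines


def write_list(folder, n, out_path):
    """Write the list for one folder to a text file, one entry per line.

    out_path should be outside the folder being listed, so the list
    file does not appear in its own listing on a later run.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        for line in prefix_lines(folder, n):
            out.write(line + "\n")
```

Running `write_list` once per folder gives two tab-separated files that a table compare can align on the first column while the full name stays visible in the second.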
|
|
|
|
|
#3 |
|
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 858
|
I had another idea; it still involves working with lists of filenames which need manipulated before comparison.
IF BC offers the facility (for programmers) to compare two program listings, looking only at the code but not comments on each line, you could present each list of filenames
C:this\that\other\such_and_such_a file.eml
as eg
C:this\that\other\suc /* h_and_such_a file.eml */
& give the file lists the ".c" (or whatever) extension so that BC applies language-specific parsing of the contents.

In other words you proxy the absolute column number that matters to you with some other feature that BC /is/ capable of using as a data-of-interest / not-of-interest delimiter. |
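A small Python sketch of that rewriting step, assuming the split point is a fixed column n. Whether the compare tool then really ignores the commented tail is not something this checks; the function name is made up for the example.

```python
def comment_out_tail(path, n):
    """Keep the first n characters as 'code' and wrap the rest in /* ... */."""
    head, tail = path[:n], path[n:]
    if not tail:
        # Nothing after column n, so there is nothing to hide in a comment.
        return head
    return head + " /* " + tail + " */"


# The example path from the post above, split after 21 characters.
example = comment_out_tail(r"C:this\that\other\such_and_such_a file.eml", 21)
```

Windows filenames cannot contain `*`, so a name should never close the comment early.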
|
|
|
|
|
#4 |
|
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 858
|
Or, try the demo version of a different compare tool; see
https://www.prestosoft.com/edp_features.asp There's a 30-day evaluation period, according to: https://www.prestosoft.com/edp_eval.asp - which I found with Google, not being able to find any obvious info about demo/evaluation copies etc on the usual pages of the website. Searching the FAQ page did find "evaluation" a few times though. There's also an online tool at: https://www.prestosoft.com/edp_diffnow.asp -- but it is limited in the size of files you can compare, and also in which clever features are present. Disclosure: I use EDP (though not in fact a terribly recent version). |
|
|
|
|
|
#5 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Thanks. That workaround of using just the filename text truncated is what I'm using at the moment.
The snag, as you say, is being able to go to the file. And no, Beyond Compare does not allow clicking on a filename. However, the spreadsheet suggestion reminded me that I have seen filename hotlinks in .XLSs. But I've not so far found a way to create them from a text file. Thanks. |
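One possible way to get such hotlinks, sketched in Python: write a CSV whose second column is an Excel HYPERLINK formula. This assumes Excel evaluates cells that start with "=" when it opens a CSV, which has not been checked here against any particular Excel version; the function names and column layout are invented for the example.

```python
import csv
import os


def hyperlink_formula(full_path, label):
    """Build an Excel HYPERLINK formula; double quotes are doubled, as Excel expects."""
    safe_path = full_path.replace('"', '""')
    safe_label = label.replace('"', '""')
    return '=HYPERLINK("%s","%s")' % (safe_path, safe_label)


def write_link_csv(folder, n, out_path):
    """Write rows of (first n chars of the name, clickable link to the file).

    out_path should be outside folder so the CSV does not list itself.
    """
    # utf-8-sig adds a BOM, which helps Excel detect the encoding.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as out:
        writer = csv.writer(out)
        for name in sorted(os.listdir(folder)):
            full = os.path.join(folder, name)
            if os.path.isfile(full):
                writer.writerow([name[:n], hyperlink_formula(full, name)])
```

If the formulas do evaluate, column 1 can be used for the alignment and column 2 for opening the file in its associated reader.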
|
|
|
|
|
#6 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Thanks, but BC doesn't let you open the files from those filenames. BC has Folder Compare - which can't make this alignment but can launch the file reader - and Text/Table Compare - which can make this alignment but can't launch the file reader.
|
|
|
|
|
|
#7 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Thanks.
Neither has directory comparison options that include partial names. |
|
|
|
|
|
#8 |
|
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 858
|
The edp option to compare only some columns (or ignore program comments) could simplify comparing lists of parts of filenames.
If you can't click on a filename in the compare text to launch it, will a click (or double or triple-click) select it (possibly that'd need full paths enclosed in double quotes?) ... but at worst drag-selecting a whole filepath - while irritating to have to - surely doesn't take too long, especially compared with how long you'll spend looking at such a file once it's been opened?

That changes the problem to needing something convenient somewhere else that you can click on, which will then run code you've supplied.

Or ... if the drag-selected filename always has quotes around it, and if ".eml" is associated with the reader application, just paste the quoted filename into a cmd.exe window & press enter. Windows will then run the file, which means do whatever the association is.

(I'd probably use a temporary new toolbar button in my text editor - I've always got a few of its windows open - & have the code associated with that button: grab the clipboard contents, check it is a modest length (& only one line), check it's for a filename, then issue the command to launch the reader for that file. But ... it's a scriptable editor ... & not many people have such a tool.)

I expect you could use AutoIt or AutoHotkey for something like that, if not whatever language you're proficient in. The problem is probably finding a tool that you already have & know, to take advantage of. |
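The "check it, then launch it" step could look roughly like this in Python. It takes the copied text as an argument rather than reading the clipboard itself; the 260-character limit and the function names are assumptions for the sketch. `os.startfile` is Windows-only and opens a file with its associated application.

```python
import os

# Assumption: treat anything longer than the classic Windows path limit as suspect.
MAX_LEN = 260


def path_from_selection(text):
    """Return a cleaned .eml path from pasted text, or None if it does not look like one."""
    stripped = text.strip()
    # Only accept a single line.
    if "\n" in stripped or "\r" in stripped:
        return None
    # Drop surrounding double quotes, if any.
    candidate = stripped.strip('"')
    if not candidate or len(candidate) > MAX_LEN:
        return None
    if not candidate.lower().endswith(".eml"):
        return None
    return candidate


def launch(text):
    """Open the file with its associated application; return True if it was launched."""
    path = path_from_selection(text)
    if path is None or not os.path.isfile(path):
        return False
    os.startfile(path)  # Windows only
    return True
```

Hooking `launch` up to a hotkey or toolbar button is left to whatever tool is already to hand.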
|
|
|
|
|
#9 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Quote (JeremyNicoll):
The edp option to compare only some columns (or ignore program comments) could simplify comparing lists of parts of filenames.
Sure, but it is absent from Directory compare https://www.prestosoft.com/ps.asp?pa...arison_options . Quote:
Thanks for the thought. |
|
|
|
|
|
|
#10 |
|
Cornerstone of the Community
Join Date: Dec 2017
Location: Scotland
Posts: 858
|
If eg you're trying to find files whose content is identical but filenames differ, I think you need a list of filepaths, sizes, & a hash of each one's contents. File search utilities typically do that, as well as de-dup utilities.
Otherwise if you're looking for - say - number of lines inside each file, or presence/absence of some key literals... surely it'd be better automated, even if that only allows you to discount a percentage of files & the rest still need eyeballed. What's the chance that your glances into so many files get less thorough after you've done a few hundred? |
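For the identical-content case, a minimal Python sketch of the "size plus hash" idea. SHA-256 and the function names are choices made for the example; any reasonable hash would do.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical_content_groups(folders):
    """Group files whose size and content hash match, whatever their names."""
    by_key = defaultdict(list)
    for folder in folders:
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                key = (os.path.getsize(path), file_digest(path))
                by_key[key].append(path)
    # Only groups with more than one member are interesting.
    return [paths for paths in by_key.values() if len(paths) > 1]
```

Each returned group lists the full paths of files with the same bytes, so renamed copies show up together.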
|
|
|
|
|
#11 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Quote:
Quote:
||
|
|
|
|
|
#12 |
|
Intergalactic Postmaster
Join Date: Oct 2002
Location: Holon, Israel.
Posts: 5,217
|
I never tried to use filters in WinMerge, but perhaps WinMerge with file filters might be able to do what you want.
|
|
|
|
|
|
#13 |
|
Master of the @
Join Date: Jul 2003
Posts: 1,079
|
Thanks, but filters just filter the set of pairs. https://manual.winmerge.org/en/Quick....html#id635807
They don't control the pairing (alignment) which I need. And that program has no filename alignment control as far as I can see. |
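The pairing itself, aligning two folders on the first n characters of each filename, is simple enough to sketch in Python. This only builds the aligned rows; displaying them and opening files is a separate matter. The function names are invented for the example.

```python
import os
from collections import defaultdict


def group_by_prefix(folder, n):
    """Map each n-character prefix to the filenames in folder that start with it."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        if os.path.isfile(os.path.join(folder, name)):
            groups[name[:n]].append(name)
    return groups


def align(left_folder, right_folder, n):
    """Return (prefix, left_names, right_names) rows, sorted by prefix.

    An empty list on one side means no file with that prefix exists there.
    """
    left = group_by_prefix(left_folder, n)
    right = group_by_prefix(right_folder, n)
    rows = []
    for prefix in sorted(set(left) | set(right)):
        rows.append((prefix, left.get(prefix, []), right.get(prefix, [])))
    return rows
```

Rows with an empty left or right list are the unmatched files; rows with more than one name on a side are prefixes that are not unique within that folder.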
|
|
|
|
|
#14 |
|
Junior Member
Join Date: May 2026
Posts: 1
|
I've had to compare big sets of EMLs before, and the trick that helped me was normalizing the filenames first so diff tools didn't choke on random naming patterns. I used https://renamer.io to batch-fix prefixes and date formats so everything lined up cleanly before running a file compare. After that, even simple folder diff utilities worked way smoother for spotting mismatches or missing messages.
Last edited by Xianoms : 3 Jun 2026 at 07:34 PM. |
|
|
|