EmailDiscussions.com  

Old 18 Jul 2025, 07:12 AM   #1
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Comparing two folders of many EMLs

I have a tricky EML set reconciliation task.

I need a Windows 7 tool that will compare the filenames of two folders of files aligning the filenames by equality of the first n characters - ignoring the rest. And let me easily open any file in its associated reader.

Beyond Compare is my usual tool of choice for this but it cannot auto-align on n chars, so would leave me manually aligning, taking an age.

Any ideas?

Old 18 Jul 2025, 09:03 AM   #2
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 858
I'd use the dir command to get a list of one folder's files into a text file, and a list of the other folder's files into a second text file.

Sort both files (or use appropriate dir options for that)

Use your text editor - assuming it has suitable commands - to either remove filename chars after column 'n', or change all columns after column 'n' to a fixed identical char on all lines.

You'd probably need an editor which either allows you to restrict operations to just a certain column-range on each line, OR one with not just regex support, but regexes of a 'flavour' that allows you to specify patterns of a certain length. Then eg if each line of one of the files contains a single filename & you only care about the first 13 characters of them, you could search for

^(.{13,13})(.*)$

which (depending on the regex flavour) might mean, for each line, match

"^" - start of line,
".{13,13}" - any character (matched by "."), for {13,13} a min & max of 13 times
".*" - any character (matched by "."), as many times as needed
"$" - end of line

Now the ".{13,13}" and ".*" sub-expressions above had brackets around them. In regexes, apart from grouping parts of an expression together, brackets also indicate to the regex processor that it should remember what text actually matched the first, second, third ... ()-enclosed pattern - assuming your regex engine supports that.
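As a small illustration of that "remembering", here is a minimal sketch using Python's re module as one example flavour. The sample filename is made up, and other flavours may spell things differently.

```python
import re

# Pattern from the post: group 1 = first 13 characters, group 2 = the rest.
PATTERN = re.compile(r"^(.{13,13})(.*)$")

# Hypothetical filename, for illustration only.
line = "20250718-0712 Re Comparing folders.eml"

match = PATTERN.match(line)
if match:
    prefix = match.group(1)  # text matched by the first ()-enclosed pattern
    rest = match.group(2)    # text matched by the second ()-enclosed pattern
    print(repr(prefix))
    print(repr(rest))
```

Note that in this flavour a line shorter than 13 characters does not match at all, so short names would need handling separately.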

That means for each line the processor would know what the first 13 characters were, & what the trailing characters were. So, you'd be able to specify a search & replace operation which changed each line from

13-arbitrary-chars other-characters

to

a copy of the first 13 chars


You do that to both lists of files then compare the file lists.
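If the editor route is awkward, the same truncate-then-compare step might look like this in Python. It is only a sketch: the function names and the sample listings are invented for illustration, and the comparison is case-sensitive even though Windows filenames are not.

```python
import re

def truncate_names(names, n):
    """Return each name cut to its first n characters (shorter names are kept whole)."""
    pattern = re.compile(r"^(.{%d}).*$" % n)
    return [pattern.sub(r"\1", name) for name in names]

def compare_truncated(left_names, right_names, n):
    """Return (only_left, only_right, both) as sorted lists of n-character prefixes."""
    left = set(truncate_names(left_names, n))
    right = set(truncate_names(right_names, n))
    return sorted(left - right), sorted(right - left), sorted(left & right)

# Hypothetical listings, e.g. as read from the two dir output files.
left_list = ["20250718-0712 Hello.eml", "20250718-0903 Re Hello.eml"]
right_list = ["20250718-0712 Hello (1).eml", "20250719-0325 WinMerge.eml"]

only_left, only_right, both = compare_truncated(left_list, right_list, 13)
print("only in left: ", only_left)
print("only in right:", only_right)
print("in both:      ", both)
```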


Going from a truncated filename to being able to open a corresponding file is trickier. The simplest way, if you see a difference in the edited lists on, say, line 217 is to look at line 217 of the original unedited lists to see what the full filenames are.


I don't know my way around spreadsheets, but I expect you could generate a file containing (say) on each line

the-13-chars tab the full filename from which those 13 came

then load it into a spreadsheet. If BC offers compares of spreadsheets by columns you could run the compare on column 1 only, but still have the full filename in the next column ... & if BC displays that you might be able to click on it.
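A rough sketch of generating such a tab-separated file, assuming Python is available; the function name and sample filenames are hypothetical. It writes to an in-memory buffer here, but an open text file would do equally well.

```python
import csv
import io

def write_key_table(names, n, out):
    """Write one 'key<TAB>full filename' row per name, sorted by filename."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for name in sorted(names):
        writer.writerow([name[:n], name])

# Hypothetical filenames, for illustration only.
sample = ["20250718-0903 Re Hello.eml", "20250718-0712 Hello.eml"]

buffer = io.StringIO()
write_key_table(sample, 13, buffer)
table_text = buffer.getvalue()
print(table_text, end="")
```

Column 1 is then the part to compare on, and column 2 keeps the full name alongside it.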


If you can write a simple program in eg python, or perl, or basic ... , then it might be simpler (especially if your text editor isn't clever/powerful enough) just to write something that reads a list of filenames & writes out a modified list of the truncated names.
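For what it's worth, such a program could skip the intermediate lists and read the two folders directly. The following is a sketch only, with invented function names; it matches prefixes case-sensitively and does not look into subfolders.

```python
import os
from collections import defaultdict

def group_by_prefix(folder, n):
    """Map each n-character filename prefix to the files in folder that start with it."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(folder)):
        if os.path.isfile(os.path.join(folder, name)):
            groups[name[:n]].append(name)
    return groups

def align_folders(left_dir, right_dir, n):
    """Return rows of (prefix, left_names, right_names), one row per distinct prefix."""
    left = group_by_prefix(left_dir, n)
    right = group_by_prefix(right_dir, n)
    rows = []
    for prefix in sorted(set(left) | set(right)):
        rows.append((prefix, left.get(prefix, []), right.get(prefix, [])))
    return rows
```

A row with an empty list on one side is a file that has no partner in the other folder; a row with several names on one side shows where n is too small to tell files apart.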
Old 18 Jul 2025, 06:09 PM   #3
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 858
I had another idea; it still involves working with lists of filenames which need manipulated before comparison.

IF BC offers the facility (for programmers) to compare two program listings, looking only at the code but not comments on each line, you could present each list of filenames

C:this\that\other\such_and_such_a file.eml

as eg

C:this\that\other\suc /* h_and_such_a file.eml */

& give the file lists the ".c" (or whatever) extension so that BC applies language-specific parsing of the contents.
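A sketch of producing lines in that form, assuming Python; the function name is made up, and n counts characters of the whole path exactly as in the example above (21 there).

```python
def as_commented_line(path, n):
    """Keep the first n characters as 'code' and wrap the rest in a C-style comment."""
    head, tail = path[:n], path[n:]
    if not tail:
        return head  # nothing left over to hide in a comment
    return f"{head} /* {tail} */"

# The example path from the post, split after 21 characters.
print(as_commented_line(r"C:this\that\other\such_and_such_a file.eml", 21))
```

Since Windows filenames cannot contain "*" or "/", the tail should never close the comment early.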


In other words you proxy the absolute column number that matters to you with some other feature that BC /is/ capable of using as a delimiter between data of interest and data not of interest.
Old 18 Jul 2025, 06:38 PM   #4
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 858
Or, try the demo version of a different compare tool; see

https://www.prestosoft.com/edp_features.asp


There's a 30-day evaluation period, according to: https://www.prestosoft.com/edp_eval.asp - which I found with Google, not being able to find any obvious info about demo/evaluation copies etc on the usual pages of the website. Searching the FAQ page did find "evaluation" a few times though.


There's also an online tool at: https://www.prestosoft.com/edp_diffnow.asp -- but it is limited in the size of files you can compare, and also in which clever features are present.


Disclosure: I use EDP (though not in fact a terribly recent version).
Old 18 Jul 2025, 06:42 PM   #5
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Thanks. That workaround of using just the filename text truncated is what I'm using at the moment.

The snag as you say is being able to go to the file.

And no, Beyond Compare does not allow click on filename.

However, the spreadsheet suggestion reminded me that I have seen filename hotlinks in .XLSs. But I've not so far found a way to create them from a text file.
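One possible route, sketched in Python and untested against any particular Excel version: write a CSV whose second column holds Excel's HYPERLINK worksheet function. This assumes Excel evaluates formulas when it opens a CSV and that the formula argument separator is a comma (some locales use a semicolon). The function names and the sample path are hypothetical.

```python
import csv
import io
import ntpath

def hyperlink_formula(full_path):
    """Build a HYPERLINK formula that shows the filename and targets the full path."""
    label = ntpath.basename(full_path)
    # Double any quote characters so they survive inside the formula's string literals.
    target = full_path.replace('"', '""')
    label = label.replace('"', '""')
    return f'=HYPERLINK("{target}","{label}")'

def write_link_sheet(paths, n, out):
    """Write rows of (n-character filename key, clickable formula) as CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["key", "file"])
    for path in sorted(paths):
        name = ntpath.basename(path)
        writer.writerow([name[:n], hyperlink_formula(path)])

# Hypothetical path, for illustration only.
sheet = io.StringIO()
write_link_sheet([r"C:\mail\a\20250718-0712 Hello.eml"], 13, sheet)
print(sheet.getvalue(), end="")
```

Whether a click then opens the file in its associated reader would depend on Excel's own security prompts and the .eml association.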

Thanks.
Old 18 Jul 2025, 06:45 PM   #6
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Quote:
Originally Posted by JeremyNicoll View Post
IF BC offers the facility (for programmers) to compare two program listings, looking only at the code but not comments on each line, you could present each list of filenames ...
Thanks but BC doesn't let you open the files from those filenames. BC has Folder Compare - which can't make this alignment but can launch the file reader - and Text/Table Compare - which can make this alignment but can't launch the file reader.
Old 18 Jul 2025, 06:49 PM   #7
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Thanks.

Neither has directory comparison options that include partial names.
Old 18 Jul 2025, 08:06 PM   #8
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 858
The edp option to compare only some columns (or ignore program comments) could simplify comparing lists of parts of filenames.

If you can't click on a filename in the compare text, to launch it, will click (or double or triple-click) select it (possibly that'd need full paths enclosed in double quotes?) .. but at worst drag-selecting a whole filepath - while irritating to have to - surely doesn't take too long, especially compared with how long you'll spend looking at such a file once it's been opened?


That changes the problem to needing something convenient somewhere else that you can click on, which will then run code you've supplied.

Or ... if the drag-selected filename always has quotes around it, and if ".eml" is associated with the reader application, just paste the quoted filename into a cmd.exe window & press enter. Windows will then run the file, which means do whatever the association is.


(I'd probably use a temporary new toolbar button in my text editor - I've always got a few of its windows open - & have the code associated with that button: grab the clipboard contents, check it is a modest length (& only one line), check it's for a filename, then issue the command to launch the reader for that file. But ... it's a scriptable editor ... & not many people have such a tool.)

I expect you could use AutoIt or AutoHotKey for something like that, if not whatever language you're proficient in. The problem is probably finding a tool that you already have & know, to take advantage of.
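In Python, the checks described above might be sketched like this. Getting the text off the clipboard is left out, the 260-character limit is just an assumed reading of "modest length", and os.startfile exists only on Windows, where it opens a file with whatever application is associated with it. The function names are invented.

```python
import os

MAX_LENGTH = 260  # assumed "modest length" limit (the classic Windows MAX_PATH)

def clean_candidate(text, extension=".eml"):
    """Return a usable file path from pasted text, or None if it does not look like one."""
    stripped = text.strip()
    if "\n" in stripped or "\r" in stripped:
        return None  # more than one line
    candidate = stripped.strip('"')
    if not candidate or len(candidate) > MAX_LENGTH:
        return None
    if not candidate.lower().endswith(extension):
        return None
    return candidate

def launch(text):
    """Open the file named in text with its associated application; return True on success."""
    path = clean_candidate(text)
    if path is None or not os.path.isfile(path):
        return False
    os.startfile(path)  # Windows only
    return True
```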
Old 18 Jul 2025, 09:01 PM   #9
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Quote:
Originally Posted by JeremyNicoll View Post
The edp option to compare only some columns (or ignore program comments) could simplify comparing lists of parts of filenames.

Sure, but it is absent from Directory compare https://www.prestosoft.com/ps.asp?pa...arison_options .

Quote:
Originally Posted by JeremyNicoll View Post
If you can't click on a filename in the compare text, to launch it, will click (or double or triple-click) select it (possibly that'd need full paths enclosed in double quotes?) .. but at worst drag-selecting a whole filepath - while irritating to have to - surely doesn't take too long, especially compared with how long you'll spend looking at such a file once it's been opened?
Surely does when I have X,000 files to glance into.

Thanks for the thought.
Old 18 Jul 2025, 10:10 PM   #10
JeremyNicoll
Cornerstone of the Community
 
Join Date: Dec 2017
Location: Scotland
Posts: 858
Quote:
Originally Posted by chrisjj View Post
Surely does when I have X,000 files to glance into.
If eg you're trying to find files whose content is identical but filenames differ, I think you need a list of filepaths, sizes, & a hash of each one's contents. File search utilities typically do that, as well as de-dup utilities.
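A minimal sketch of that size-plus-hash idea in Python, using SHA-256 from the standard hashlib module; the function names are made up and this is not how any particular de-dup utility works.

```python
import hashlib
import os
from collections import defaultdict

def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size in bytes, SHA-256 hex digest) for one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()

def find_identical(paths):
    """Group paths whose size and content hash are equal; return only groups of 2 or more."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_fingerprint(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```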

Otherwise if you're looking for - say - number of lines inside each file, or presence/absence of some key literals... surely it'd be better automated, even if that only allows you to discount a percentage of files & the rest still need eyeballed.
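As a sketch of that kind of automation in Python (the function name is invented): count the lines in a file and note which key literals occur in it. It searches the raw bytes, so text that is base64 or quoted-printable encoded, or split across folded header lines, would be missed.

```python
def summarise(path, literals):
    """Return (line_count, {literal: found}) for one file, scanning its raw bytes."""
    found = {literal: False for literal in literals}
    line_count = 0
    with open(path, "rb") as handle:
        for line in handle:
            line_count += 1
            for literal in literals:
                if literal.encode("utf-8") in line:
                    found[literal] = True
    return line_count, found
```

Files whose summary matches what you expect could then be discounted, leaving fewer to open by hand.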

What's the chance that your glances into so many files get less thorough after you've done a few hundred?
Old 18 Jul 2025, 10:23 PM   #11
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Quote:
Originally Posted by JeremyNicoll View Post
If eg you're trying to find files whose content is identical but filenames differ
Thanks, but I'm not. I'm just reviewing by eye.

Quote:
Originally Posted by JeremyNicoll View Post
What's the chance that your glances into so many files get less thorough after you've done a few hundred?
High
Old 19 Jul 2025, 03:25 AM   #12
hadaso
Intergalactic Postmaster
 
Join Date: Oct 2002
Location: Holon, Israel.
Posts: 5,217
I never tried to use filters in WinMerge, but perhaps WinMerge with file filters might be able to do what you want.
Old 19 Jul 2025, 03:29 AM   #13
chrisjj
Master of the @
 
Join Date: Jul 2003
Posts: 1,079
Thanks, but filters just filter the set of pairs. https://manual.winmerge.org/en/Quick....html#id635807

They don't control the pairing (alignment) which I need.

And that program has no filename alignment control as far as I can see.
Old 25 May 2026, 04:14 PM   #14
Xianoms
Junior Member
 
Join Date: May 2026
Posts: 1
I've had to compare big sets of EMLs before, and the trick that helped me was normalizing the filenames first so diff tools didn't choke on random naming patterns. I used https://renamer.io to batch-fix prefixes and date formats so everything lined up cleanly before running a file compare. After that, even simple folder diff utilities worked way smoother for spotting mismatches or missing messages.

Last edited by Xianoms : 3 Jun 2026 at 07:34 PM.
All times are GMT +9. The time now is 01:09 AM.

 

Copyright EmailDiscussions.com 1998-2022. All Rights Reserved. Privacy Policy