@sim642

sim642@lemm.ee · 11 months ago

Log files themselves don’t, but I’m just comparing it with simpler files, which have a simpler structure and therefore allow simpler algorithms with better complexity.

sim642@lemm.ee · 11 months ago

It’s not necessarily about the load, it’s about the algorithmic complexity. Going from lists (lines in a file, characters in a line) to trees introduces a potentially exponential increase in complexity due to the number of ways the same list of elements can be organized into a tree.
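
To put a rough number on "the number of ways", here is a minimal sketch that counts only binary tree shapes over a fixed, ordered list of leaves (the Catalan numbers). Real syntax trees have arbitrary arity, so treat this as an illustration of the growth rather than an exact model; the function names are made up for the example.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number, computed as C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def binary_tree_shapes(leaves: int) -> int:
    """Count the full binary trees whose leaves, read left to right,
    are one fixed list of `leaves` elements."""
    if leaves < 1:
        raise ValueError("need at least one leaf")
    return catalan(leaves - 1)


if __name__ == "__main__":
    for n in (1, 2, 4, 11):
        print(n, binary_tree_shapes(n))
```

Four elements already allow 5 shapes and eleven elements allow 16796; the count grows roughly like 4^n, whereas a flat list has exactly one shape.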

Also, you’re underestimating the amount of processing. It’s not about pure CPU computations but RAM access or even I/O. Even existing non-semantic diff implementations are unexpectedly inadequate in terms of performance. You clearly haven’t tried diffing multi-GB log files.
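
For a sense of scale, this is a sketch of the textbook longest-common-subsequence table that line diffs are built on. It is not what any particular diff tool runs (real tools typically use cleverer algorithms such as Myers'), and `lcs_length` is just a name picked for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Keeps only two rows of the dynamic-programming table, so memory is
    O(len(b)), but the running time is still O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    old = ["start", "connect", "query", "stop"]
    new = ["start", "query", "retry", "stop"]
    print(lcs_length(old, new))
```

With two files of ten million lines each, this naive table would need on the order of 10^14 cell updates, and every one of them touches memory. That is before any tree structure enters the picture.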

sim642@lemm.ee · 11 months ago

How do you expect it to be shown though?

sim642@lemm.ee · 11 months ago

Because text is text and all } are the same.
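
A small sketch of what that means in practice, using Python's `difflib` purely as a stand-in for a line-based diff (the `unchanged_lines` helper and the sample snippets are made up for illustration):

```python
from difflib import SequenceMatcher


def unchanged_lines(old, new):
    """Return the lines a line-based matcher considers common to both inputs."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    common = []
    for block in matcher.get_matching_blocks():
        # Each block says old[a:a+size] == new[b:b+size]; the last one is an empty sentinel.
        common.extend(old[block.a:block.a + block.size])
    return common


old = ["void f() {", "    a();", "}"]
new = ["void g() {", "    b();", "}"]

if __name__ == "__main__":
    print(unchanged_lines(old, new))
```

The only line reported as unchanged is the `}`, even though it closes a different function in each version. The matcher compares strings and has no idea which block a brace belongs to.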

sim642@lemm.ee · 11 months ago

Diffing algorithms on trees might not be as efficient, especially if they have to find arbitrary node moves.
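
As a rough illustration of the extra work, here is a sketch that only detects subtrees that moved without changing at all. Trees are assumed to be nested tuples of the form `(label, child, ...)`, and `subtrees` and `moved_subtrees` are hypothetical helpers, not taken from any real tree-diff tool. Subtrees that were moved and edited at the same time are the hard part, and this does not attempt them.

```python
def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    for i, child in enumerate(tree[1:]):
        yield from subtrees(child, path + (i,))


def moved_subtrees(old, new):
    """Find non-leaf subtrees of `new` that occur verbatim in `old` at another path."""
    old_paths = {}
    for path, sub in subtrees(old):
        old_paths.setdefault(sub, []).append(path)

    moves = []
    for path, sub in subtrees(new):
        is_leaf = len(sub) == 1
        if not is_leaf and sub in old_paths and path not in old_paths[sub]:
            moves.append((sub, old_paths[sub][0], path))
    return moves


old_tree = ("block", ("if", ("a",), ("f",)), ("while", ("b",), ("g",)))
new_tree = ("block", ("while", ("b",), ("g",)), ("if", ("a",), ("f",)))

if __name__ == "__main__":
    for sub, src, dst in moved_subtrees(old_tree, new_tree):
        print(sub[0], "moved from", src, "to", dst)
```

Even this easy case has to index every subtree of the old tree and look up every subtree of the new one, and hashing a nested tuple walks the whole subtree each time. A line diff never has to consider that an element might have been reparented somewhere else entirely.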

sim642@lemm.ee · 1 year ago

IT can look up the original (including all headers) based on the forwarded content. It’s on the same mail server.

sim642@lemm.ee · 1 year ago

Also Chevron.

sim642@lemm.ee · 1 year ago

And she was just running random exe files from emails.