In order to classify old email by month, I used archivemail as described in the following paragraphs.
I had several mbox files that I had once sorted by sender:
% file 2003/* |grep 'mail text'
2003/2003-09-01: ISO-8859 mail text
2003/2003-09-01-sent-messages: ISO-8859 mail text
2003/2003-10-08: ISO-8859 mail text
2003/2004-01-30: Non-ISO extended-ASCII mail text
2003/bygamx: ISO-8859 mail text, with very long lines
2003/ltsp-es: ISO-8859 mail text
2003/mbox: ISO-8859 mail text, with very long lines
2003/sgarcia: ISO-8859 mail text, with very long lines
...
So first of all you need a mess of old email; then concatenate all the 'mail text' files into one big mbox file (in this example, all the mail is from 2003):
% cat [MAIL TEXT FILES] > mbox
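One thing plain cat does not guard against: if a file does not end with a newline, the first "From " line of the next file gets glued onto the last line of the previous message. Below is a minimal sketch of a more careful concatenation, assuming a POSIX sh; concat_mboxes is a made-up name, not part of any tool mentioned here.

```shell
#!/bin/sh
# concat_mboxes is a hypothetical helper: it writes the given files to
# standard output one after another, adding a newline after any file
# whose last byte is not already a newline.
concat_mboxes() {
    for f in "$@"; do
        cat "$f"
        # $(...) strips a trailing newline, so the result is empty when
        # the file already ends with one (or is empty).
        [ -z "$(tail -c 1 "$f")" ] || echo
    done
}

# Concatenate the files named on the command line, if any.
if [ "$#" -gt 0 ]; then
    concat_mboxes "$@"
fi
```

Saved as, say, concat.sh (the file name is only an example), it would be used like cat: sh concat.sh [MAIL TEXT FILES] > mbox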
Note: you can download your mail from Gmail, Yahoo or another mail provider with fetchmail(1) over POP3.
archivemail(1) says:
archivemail - archive and compress your old email
For example, the following line:
archivemail --no-compress --suffix -2003-01 --date 2003-02-01 mbox
removes from mbox only the messages dated before 1 February 2003, i.e. (with mail from 2003 only) January's messages, and stores them in mbox-2003-01.
Because each run takes everything older than its --date, the months have to be peeled off mbox in order, from January to December:
archivemail --no-compress --suffix -2003-01 --date 2003-02-01 mbox
archivemail --no-compress --suffix -2003-02 --date 2003-03-01 mbox
archivemail --no-compress --suffix -2003-03 --date 2003-04-01 mbox
archivemail --no-compress --suffix -2003-04 --date 2003-05-01 mbox
archivemail --no-compress --suffix -2003-05 --date 2003-06-01 mbox
archivemail --no-compress --suffix -2003-06 --date 2003-07-01 mbox
archivemail --no-compress --suffix -2003-07 --date 2003-08-01 mbox
archivemail --no-compress --suffix -2003-08 --date 2003-09-01 mbox
archivemail --no-compress --suffix -2003-09 --date 2003-10-01 mbox
archivemail --no-compress --suffix -2003-10 --date 2003-11-01 mbox
archivemail --no-compress --suffix -2003-11 --date 2003-12-01 mbox
archivemail --no-compress --suffix -2003-12 --date 2004-01-01 mbox
I generated these commands with the Perl script included at mail-arch.
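As a sanity check before or after the split, a rough per-month tally can be taken from the shell. This is only a sketch: count_by_month is a hypothetical helper, and it assumes every "From " separator line carries a one-word sender followed by an asctime-style date. archivemail may read the date from message headers instead, so its split can differ from this count.

```shell
#!/bin/sh
# count_by_month is a hypothetical helper. It tallies messages per month
# using the date in each "From " separator line, assumed to look like
#   From sender@example.org Mon Jan  6 10:00:00 2003
# so that field 4 is the month name and field 7 is the year.
count_by_month() {
    awk '/^From / { n[$7 "-" $4]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Tally the file named on the command line, if any.
if [ "$#" -gt 0 ]; then
    count_by_month "$1"
fi
```

Each output line is a year and month name followed by a count, which can be compared with the number of "From " lines in the matching mbox-2003-NN file.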
archivemail can warn you about duplicated messages (its --warn-duplicate option), but it does not remove them. Since I wanted exactly one copy of each message, I used mutt:
% mutt -f mbox-2003-01
l
Limit to messages matching:~=
ddddddd...
q
That is: start mutt, press l, type ~= and press Enter; mutt should now show only the duplicated messages, which you can safely delete with the d key; finally, quit mutt with q.
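To get an idea of how many duplicates there are before opening mutt, repeated Message-ID headers can be listed from the shell. This is a rough sketch only: list_duplicate_ids is a hypothetical helper, it assumes duplicates share a Message-ID that sits on a single header line, and it will also pick up Message-ID lines that happen to appear inside message bodies.

```shell
#!/bin/sh
# list_duplicate_ids is a hypothetical helper, not part of mutt or
# archivemail. It prints every Message-ID value that occurs more than
# once in the given mbox file.
list_duplicate_ids() {
    grep -i '^Message-ID:' "$1" |
        sed 's/^[^:]*:[[:space:]]*//' |   # drop the header name
        sort | uniq -d                    # keep only repeated values
}

# Check the file named on the command line, if any.
if [ "$#" -gt 0 ]; then
    list_duplicate_ids "$1"
fi
```

An empty result for mbox-2003-01 would suggest there is nothing for mutt to delete in that file.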
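This is the Perl script that prints the archivemail commands:
#!/usr/bin/perl -lw
# prints archivemail commands that split a year's mail into monthly archives
use strict;
use Date::Manip;

my $year = $ARGV[0] || "2002";
my $cur  = ParseDate("$year-01-01") or die "can't parse $year";
my $err  = 0;
for (my $i = 0; $i < 12; $i++) {
    my $nxt = DateCalc($cur, "+1 month", \$err);
    # %Y is the calendar year; %G is the week-based year, which can
    # differ from it around January 1st
    my $sfx = UnixDate($cur, "-%Y-%m");
    my $dat = UnixDate($nxt, "%Y-%m-%d");
    print "archivemail --no-compress --suffix $sfx --date $dat mbox";
    $cur = $nxt;
}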
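If Date::Manip is not installed, the same twelve lines can be produced with plain shell arithmetic. This is a sketch only; print_archive_cmds is a made-up name, and like the Perl script it just prints the commands rather than running them.

```shell
#!/bin/sh
# print_archive_cmds is a hypothetical helper: it prints one archivemail
# command per month of the given year, each with the first day of the
# following month as its cut-off date.
print_archive_cmds() {
    year=$1
    m=1
    while [ "$m" -le 12 ]; do
        if [ "$m" -eq 12 ]; then
            # December's cut-off is January 1st of the next year
            next_year=$((year + 1)); next_month=1
        else
            next_year=$year; next_month=$((m + 1))
        fi
        printf 'archivemail --no-compress --suffix -%04d-%02d --date %04d-%02d-01 mbox\n' \
            "$year" "$m" "$next_year" "$next_month"
        m=$((m + 1))
    done
}

# Year from the command line, defaulting to the 2003 of the example.
print_archive_cmds "${1:-2003}"
```

After checking the printed lines, they can be piped to sh to actually run them.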
Luis Alfonso Vega Garcia