spamanalyzer.utils
analyze_subject(headers, wordlist)
Checks if the email has gappy words or forbidden words in the subject.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
headers | dict | a dictionary containing parsed email headers | required |
wordlist | list[str] | a list of words to be used as a spam filter in the subject | required |
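A minimal sketch of what a subject check of this kind could look like. It assumes that a "gappy" word means single letters separated by spaces or punctuation (such as "F R E E") and that the parsed headers store the subject under a "Subject" key. The helper name `subject_looks_suspect` and the regular expression are illustrative assumptions, not the library's implementation.

```python
import re

# Assumption: a "gappy" word is three or more single letters separated by
# one space or punctuation character, such as "F R E E" or "f.r.e.e".
GAPPY_WORD = re.compile(
    r"(?<![A-Za-z])(?:[A-Za-z][ ._*-]){2,}[A-Za-z](?![A-Za-z])"
)


def subject_looks_suspect(headers: dict, wordlist: list) -> bool:
    """Hypothetical helper: True if the subject has gappy or forbidden words."""
    subject = headers.get("Subject", "")  # assumed key name
    if GAPPY_WORD.search(subject):
        return True
    # Compare whole lowercase words against the forbidden word list.
    words = set(re.findall(r"[a-z0-9']+", subject.lower()))
    return any(word.lower() in words for word in wordlist)
```

This pattern is deliberately naive: dotted abbreviations such as "U.S.A" also match, so a real filter would likely need extra rules.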
dkim_pass(headers)
Checks if the email has a DKIM record.
dmarc_pass(headers)
Checks if the email has a DMARC record.
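The page does not say which header the DKIM and DMARC checks read. One common place to find such results is the Authentication-Results header, so the sketch below assumes the parsed headers expose it as a plain string. The helper name `auth_result` is hypothetical.

```python
import re
from typing import Optional


def auth_result(headers: dict, method: str) -> Optional[str]:
    """Hypothetical helper: return the result reported for `method`
    (for example "dkim" or "dmarc"), or None if it is not mentioned."""
    value = headers.get("Authentication-Results", "")  # assumed key name
    match = re.search(rf"\b{re.escape(method)}=(\w+)", value, re.IGNORECASE)
    return match.group(1).lower() if match else None


# Made-up header value, for illustration only.
sample = {
    "Authentication-Results": (
        "mx.example.org; dkim=pass header.d=example.com; dmarc=fail"
    )
}
print(auth_result(sample, "dkim"), auth_result(sample, "dmarc"))
```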
get_domain(field)
async
Extracts the domain from a field.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field | str | a string expected to contain a domain | required |
Returns:
Name | Type | Description |
---|---|---|
Domain | | a Domain object containing the domain name and the TLD |
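As a rough illustration, the sketch below pulls a domain out of an address-like field with the standard library. The `Domain` dataclass here is a hypothetical stand-in with only the two attributes mentioned above (name and TLD), and `extract_domain` is an assumed name. The function is declared `async` only to mirror the signature shown, since the page does not say what the real function awaits.

```python
import asyncio
from dataclasses import dataclass
from email.utils import parseaddr


@dataclass
class Domain:
    """Hypothetical stand-in for the library's Domain object."""
    name: str
    tld: str


async def extract_domain(field: str) -> Domain:
    # parseaddr splits "Alice <alice@example.com>" into (name, address).
    _, address = parseaddr(field)
    host = address.rpartition("@")[2].lower()
    # Naive split on the last dot; multi-label suffixes such as
    # "co.uk" are not handled in this sketch.
    name, _, tld = host.rpartition(".")
    return Domain(name=name, tld=tld)


domain = asyncio.run(extract_domain("Alice <alice@mail.example.com>"))
print(domain)
```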
has_auth_warning(headers)
Checks if the email has an authentication warning, which usually means that the sender claimed to be someone else.
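Sendmail, for example, records this kind of warning in an X-Authentication-Warning header. Assuming that is the header being looked for (the page does not say), a check could be as small as the sketch below. The name `carries_auth_warning` is hypothetical.

```python
def carries_auth_warning(headers: dict) -> bool:
    """Hypothetical helper: True if an X-Authentication-Warning header exists.

    Header names are compared case-insensitively, since the casing used
    by the parsed headers dictionary is not specified.
    """
    return any(name.lower() == "x-authentication-warning" for name in headers)
```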
has_html(body)
Checks if the email contains html tags.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the email contains html tags |
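One simple way to approximate this check is a regular expression that looks for anything shaped like an opening or closing tag. The sketch below is an assumption about how such a test could be written, not the library's code, and `contains_html_tag` is a hypothetical name.

```python
import re

# A tag name, then either attributes (introduced by whitespace) or the end
# of the tag. Angle-bracketed addresses like <alice@example.com> and
# comparisons like "2 < 3" are not treated as tags.
HTML_TAG = re.compile(r"<\s*/?\s*[a-zA-Z][a-zA-Z0-9]*(?:\s[^>]*)?/?>")


def contains_html_tag(body: str) -> bool:
    """Hypothetical helper: True if the body has something tag-shaped."""
    return bool(HTML_TAG.search(body))
```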
has_html_form(body)
Checks if the email has a form.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the email has a form |
has_images(body)
Checks if the email contains images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the email contains images |
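Checks for specific elements, such as the form and image checks above, can be sketched with the standard library's `html.parser` by collecting the tag names that occur in the body. The class `TagCollector` and the function `tags_in` are hypothetical names, and this is only one possible approach.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the (lowercased) names of all start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def tags_in(body: str) -> set:
    """Hypothetical helper: return the set of tag names found in the body."""
    parser = TagCollector()
    parser.feed(body)
    parser.close()
    return parser.tags


found = tags_in('<form action="/x"><input name="q"></form><IMG src="a.png">')
print("form" in found, "img" in found)
```

With the tag set in hand, a form check reduces to `"form" in tags` and an image check to `"img" in tags`.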
has_mailto_links(body)
Checks if the email has mailto links.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the email has mailto links |
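A mailto link is an `href` attribute whose value starts with `mailto:`. The sketch below looks for that pattern with a regular expression. It is an illustrative assumption rather than the library's code, and `contains_mailto` is a hypothetical name.

```python
import re

# Matches href="mailto:...", href='mailto:...' and unquoted href=mailto:...
MAILTO_HREF = re.compile(r"""href\s*=\s*["']?\s*mailto:""", re.IGNORECASE)


def contains_mailto(body: str) -> bool:
    """Hypothetical helper: True if any href points at a mailto: address."""
    return bool(MAILTO_HREF.search(body))
```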
has_script_tag(body)
Checks if the email has script tags or javascript code.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the email has script tags or javascript code |
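How "javascript code" is recognized is not specified here. The sketch below assumes two signals: an opening `<script` tag and a `javascript:` URI. Both patterns and the name `mentions_script` are assumptions made for illustration.

```python
import re

SCRIPT_TAG = re.compile(r"<\s*script\b", re.IGNORECASE)
JAVASCRIPT_URI = re.compile(r"javascript\s*:", re.IGNORECASE)


def mentions_script(body: str) -> bool:
    """Hypothetical helper: True on a <script> tag or a javascript: URI."""
    return bool(SCRIPT_TAG.search(body) or JAVASCRIPT_URI.search(body))
```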
inspect_attachments(attachments)
A detailed analysis of the email attachments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
attachments | List | a list of attachments | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, bool] | a dictionary containing the following information: |
- has_attachments (bool): True if the email has attachments
- attachment_is_executable (bool): True if the email has an attachment in executable format
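The element type of `attachments` is not documented here, so the sketch below assumes each attachment is represented by its file name and judges "executable" purely by extension. The suffix list and the name `summarize_attachments` are illustrative assumptions. Only the two dictionary keys come from the description above.

```python
from pathlib import PurePath

# Illustrative, not exhaustive.
EXECUTABLE_SUFFIXES = {".exe", ".bat", ".cmd", ".com", ".scr", ".vbs"}


def summarize_attachments(filenames: list) -> dict:
    """Hypothetical helper building the two documented flags."""
    return {
        "has_attachments": len(filenames) > 0,
        "attachment_is_executable": any(
            PurePath(name).suffix.lower() in EXECUTABLE_SUFFIXES
            for name in filenames
        ),
    }


print(summarize_attachments(["report.pdf", "invoice.pdf.exe"]))
```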
inspect_body(body, wordlist, domain)
A detailed analysis of the email body.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
wordlist | list[str] | a list of words to be used as a spam filter in the body | required |
domain | Domain | the domain of the sender | required |
Returns:
Name | Type | Description |
---|---|---|
dict | dict[str, Any] | a dictionary containing the following information: |
- has_http_links (bool): True if the email has http links
- has_script (bool): True if the email has script tags or javascript code
- forbidden_words_percentage (float): the percentage of forbidden words in the body
- has_form (bool): True if the email has a form
- contains_html (bool): True if the email contains html tags
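For callers, the keys listed above can be written down as a `TypedDict` so that a type checker catches misspelled keys. The class name `BodyReport`, the helper `raised_flags`, and the sample values are hypothetical. Only the key names and types are taken from the list above.

```python
from typing import TypedDict


class BodyReport(TypedDict):
    """Shape of the result, using only the keys listed above."""
    has_http_links: bool
    has_script: bool
    forbidden_words_percentage: float
    has_form: bool
    contains_html: bool


def raised_flags(report: BodyReport) -> list:
    """Hypothetical helper: names of the boolean checks that came back True."""
    return [key for key, value in report.items() if value is True]


# Made-up values, for illustration only.
sample: BodyReport = {
    "has_http_links": True,
    "has_script": False,
    "forbidden_words_percentage": 12.5,
    "has_form": True,
    "contains_html": True,
}
print(raised_flags(sample))
```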
inspect_headers(email, wordlist)
async
A detailed analysis of the email headers.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
headers | dict | a dictionary containing parsed email headers | required |
wordlist | Iterable[str] | a list of words to be used as a spam filter in the subject | required |
Returns:
Name | Type | Description |
---|---|---|
tuple | | a tuple containing all the results of the analysis: |
- has_spf (bool): True if the email has an SPF record
- has_dkim (bool): True if the email has a DKIM record
- has_dmarc (bool): True if the email has a DMARC record
- domain_matches (bool): True if the domain of the sender matches the domain of the server
- has_auth_warning (bool): True if the email has an authentication warning
- has_suspect_words (bool): True if the email has gappy words or forbidden words in the subject
- send_year (int): the year in which the email was sent (future versions should return a datetime object instead)
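Because the result is a positional tuple, giving the fields names on the caller's side can make the code easier to read. The sketch below assumes the tuple holds exactly the seven elements listed above, in that order. `HeaderReport` is a hypothetical name and the sample values are made up.

```python
from typing import NamedTuple


class HeaderReport(NamedTuple):
    """Field order follows the list above (an assumption about the tuple)."""
    has_spf: bool
    has_dkim: bool
    has_dmarc: bool
    domain_matches: bool
    has_auth_warning: bool
    has_suspect_words: bool
    send_year: int


# Stand-in for the awaited result of the header analysis.
raw = (True, True, False, True, False, False, 2015)
report = HeaderReport(*raw)
print(report.has_dmarc, report.send_year)
```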
parse_date(headers, timezone)
The date format should follow RFC 2822. This function expects a date in the format "Wed, 21 Oct 2015 07:28:00 -0700" and returns a tuple where:
1. the first element is the parsed date, or None if the date is not in the correct format
2. the second element is a boolean indicating whether the date is valid
Future versions may also specify the kind of error that occurred, as SpamAssassin does (e.g. "invalid date", "absurd tz", "future date").
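The tuple contract described above can be sketched with `email.utils.parsedate_to_datetime` from the standard library. This version takes the date string directly instead of the `headers` and `timezone` arguments, whose exact roles are not documented here. The name `parse_rfc2822_date` is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional, Tuple


def parse_rfc2822_date(value: str) -> Tuple[Optional[datetime], bool]:
    """Hypothetical helper: return (parsed date or None, is_valid)."""
    try:
        return parsedate_to_datetime(value), True
    except (TypeError, ValueError):
        # Older Python versions raise TypeError for unparseable input,
        # newer ones raise ValueError.
        return None, False


parsed, is_valid = parse_rfc2822_date("Wed, 21 Oct 2015 07:28:00 -0700")
print(parsed, is_valid)
```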
percentage_of_bad_words(body, wordlist)
Calculates the percentage of forbidden words in the body.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
body | str | the body of the email | required |
wordlist | list[str] | a list of words to be used as a spam filter in the body | required |
Returns:
Name | Type | Description |
---|---|---|
float | float | the percentage of forbidden words in the body |
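A sketch of the calculation, under two assumptions that the page does not confirm: words are compared case-insensitively, and the result is expressed on a 0 to 100 scale. The name `bad_word_percentage` is hypothetical.

```python
import re


def bad_word_percentage(body: str, wordlist: list) -> float:
    """Hypothetical helper: share of body words that are in the word list."""
    words = re.findall(r"[a-z0-9']+", body.lower())
    if not words:
        return 0.0  # avoid dividing by zero on an empty body
    forbidden = {word.lower() for word in wordlist}
    hits = sum(1 for word in words if word in forbidden)
    return 100.0 * hits / len(words)


print(bad_word_percentage("Buy cheap pills now", ["cheap", "pills"]))  # 50.0
```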
spf_pass(headers)
Checks if the email has an SPF record.
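Receiving servers often record the SPF outcome in a Received-SPF header whose value starts with the result keyword. Assuming that header is available in the parsed headers (the page does not say which header is read), the outcome could be extracted as in this sketch. The name `spf_result` is hypothetical.

```python
def spf_result(headers: dict) -> str:
    """Hypothetical helper: first word of Received-SPF, or "none"."""
    value = headers.get("Received-SPF", "")  # assumed key name
    if not value.strip():
        return "none"
    # The header value begins with the result, e.g. "pass (...)".
    return value.split(None, 1)[0].lower()
```

A pass check then becomes `spf_result(headers) == "pass"`.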