week-8

Viagra Regex

Had some trouble locating the regex checker within the VM, so I went to a website that does much the same thing: https://www.regextester.com. It can test against multiple strings, just like the regex checker. I’ll be testing against

v|agra
\/iagra
v|4gra

as the lecture indicates.
Here’s what I came up with!

/.*(i|\||1|L|l)(a|4|A)gr(a|4|A)/gm

This isn’t very precise, but it does the trick! It allows any number of characters in front and focuses on matching the “iagra” portion of the word. I’ve allowed any a to be a, 4, or A, and any i to be i, |, 1, L, or l. The g and r are matched literally, but this could be expanded if need be!
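To sanity-check the pattern outside the website, here is a minimal sketch in Python. It assumes Python's `re` module behaves closely enough to the online tester for this pattern; the `/.../gm` delimiters and flags are dropped because `re.search` is applied to one string at a time. The `is_match` helper and the extra sample strings are my own additions for illustration.

```python
import re

# The pattern from above, minus the /.../gm wrapper used by the website.
PATTERN = re.compile(r".*(i|\||1|L|l)(a|4|A)gr(a|4|A)")

def is_match(text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return PATTERN.search(text) is not None

# The three strings from the lecture, plus two extra checks.
samples = ["v|agra", r"\/iagra", "v|4gra", "hello world", "diagram"]

for s in samples:
    print(f"{s!r}: {is_match(s)}")
```

Note that "diagram" also matches, since it contains "iagra". That is the imprecision mentioned above: the pattern never checks the leading v.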

Continuing with the video, they add a new one!

v | a g r a

This can be handled by allowing optional whitespace between each character:

/.*(i|\||1|L|l)\s*(a|4|A)\s*g\s*r\s*(a|4|A)/gm
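A quick way to confirm the whitespace version still catches the earlier strings is to run it in Python. This is only a sketch: it assumes Python's `re` module, keeps the lowercase l in the first group as in the earlier pattern, and the `is_spam_word` helper name is made up for the example.

```python
import re

# Same idea as before, with \s* allowing optional whitespace between letters.
SPACED = re.compile(r".*(i|\||1|L|l)\s*(a|4|A)\s*g\s*r\s*(a|4|A)")

def is_spam_word(text: str) -> bool:
    """Return True if the spaced-out pattern is found anywhere in the text."""
    return SPACED.search(text) is not None

for s in ["v | a g r a", "v|agra", r"\/iagra", "v|4gra", "v i a g x a"]:
    print(f"{s!r}: {is_spam_word(s)}")
```

Because `\s*` also matches zero characters, the unspaced strings from the first round still match.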

Spam in general

I haven’t had to think about email spam in a long time. While watching the lectures this week I started wondering why that is. The last time I clicked into my spam folder was during my capstone project. One of our requirements is that our app needs to be able to send emails to customers. Password reset flows and PDFs are the main email contents of our application. When I was implementing the emailer, for a moment I thought that I had implemented it incorrectly. I’m using a tool called SendGrid to send my emails for me, but for whatever reason I wasn’t receiving any emails. I had an inkling my email client might be blocking them and, sure enough, there my emails were in the spam folder!

I haven’t had to think about email spam in a long time (for my personal email), so I thought I’d do a little research into why that is. As it turns out, the Wikipedia article that I found claims that the Gmail filter is not based on predefined rules or regex strings. Instead, it’s community driven! Whenever a user marks an email as spam, it gets integrated in some way into the Gmail rule structure, which propagates out to other users. Perhaps under the hood, Gmail has implemented some sort of machine learning algorithm that uses the user spam flags to “train” its filters. I couldn’t find any more information, but they sure are doing a pretty good job. Opening up my spam folder now, it’s just littered with tons of garbage that I never have to see on a day-to-day basis. Back in the day I certainly had to deal with a lot more spam than I do now!

Categorization Lab

Thought I’d take a stab at parsing through the Postgres lab trying to identify spam. As it turns out, I’m not very familiar with Postgres syntax. I had a tough time even getting a look at the message_data table description. At first I wasn’t really sure where to go, but I had let the lab run on in the background and the instructor recommended taking a look at the subject line to see if there was anything common in there. I found that the word “stock” seemed to show up a fair amount, so I thought I’d try to write a query highlighting subjects with the word stock in them.

SELECT COUNT(*) FROM message_data WHERE msubject LIKE '%tock%';

This ended up returning a count of around 68k messages (not ideal). As it also turned out, my raw Postgres is very raw. Feeling a bit dejected, I let the lab continue to roll and the instructor mentioned that the attachment hash turned out to be the golden ticket! So I thought I’d take a crack at writing a query identifying spam by attachment hashes.

The screen inside my psql client keeps overwriting itself when the line wraps around, so I ended up writing the query in Sublime and copying it in.
BUT! I finally got a query running to grab emails with the same attachment hash! The theory here is that a legitimate attachment should rarely be sent more than 3 or 4 times, so hashes that show up far more often than that are likely spam.

SELECT COUNT(*) as hash_count, attachment_hash FROM message_data
GROUP BY attachment_hash
ORDER BY hash_count DESC
LIMIT 10;
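To turn the "more than 3 or 4 times" theory into an actual cutoff, a HAVING clause can filter the grouped counts. The sketch below is not run against the lab database: it uses Python's built-in `sqlite3` with a tiny made-up table as a stand-in for Postgres, and the threshold of 4 and the sample rows are assumptions for illustration. Only the table name and the msubject and attachment_hash columns come from the lab.

```python
import sqlite3

# In-memory stand-in for the lab's message_data table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message_data (msubject TEXT, attachment_hash TEXT)")
rows = (
    [("hot stock tip", "aaa")] * 6
    + [("stock alert", "bbb")] * 5
    + [("lunch?", "ccc"), ("notes", "ddd"), ("report", "ddd")]
)
conn.executemany("INSERT INTO message_data VALUES (?, ?)", rows)

THRESHOLD = 4  # assumed cutoff, based on the "3 or 4 times" guess

# Keep only hashes that appear more often than the threshold.
suspicious = conn.execute(
    """
    SELECT attachment_hash, COUNT(*) AS hash_count
    FROM message_data
    WHERE attachment_hash IS NOT NULL
    GROUP BY attachment_hash
    HAVING COUNT(*) > ?
    ORDER BY hash_count DESC
    """,
    (THRESHOLD,),
).fetchall()

print(suspicious)
```

The same SELECT should carry over to psql with the `?` placeholder replaced by a literal number.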

This query could be expanded upon and used as a subquery to find the source IPs that sent those repeated attachments. Any email from those bad source IPs would then be marked as bad!
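Here is a rough sketch of what that subquery idea could look like. I have not run this against the lab data: it uses Python's `sqlite3` as a stand-in for Postgres, the `source_ip` column name is a guess at what the lab table calls it, and the sample rows and the cutoff of 4 are invented for the example.

```python
import sqlite3

# In-memory stand-in table; source_ip is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_data (msubject TEXT, attachment_hash TEXT, source_ip TEXT)"
)
rows = (
    [("buy now", "aaa", "10.0.0.1")] * 5
    + [("no attachment", None, "10.0.0.1")]
    + [("meeting", "bbb", "10.0.0.2")]
    + [("hello", None, "10.0.0.3")]
)
conn.executemany("INSERT INTO message_data VALUES (?, ?, ?)", rows)

# Innermost query: attachment hashes seen more than 4 times.
# Middle query: source IPs that sent any of those hashes.
# Outer query: every message from those IPs, with or without an attachment.
bad_ip_query = """
SELECT msubject, source_ip
FROM message_data
WHERE source_ip IN (
    SELECT DISTINCT source_ip
    FROM message_data
    WHERE attachment_hash IN (
        SELECT attachment_hash
        FROM message_data
        WHERE attachment_hash IS NOT NULL
        GROUP BY attachment_hash
        HAVING COUNT(*) > 4
    )
)
"""
flagged = conn.execute(bad_ip_query).fetchall()

for subject, ip in flagged:
    print(ip, subject)
```

In this toy data, the attachment-free message from 10.0.0.1 gets flagged too, which is the point of pivoting from hashes to source IPs.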

I’m going to jump over to working through some more Hack The Box challenges for the final! Until next week!