week-9

Wireless Broadband

I've always wanted to learn the differences between the different types of wireless broadband networks, especially with 5G on the horizon. This week is the perfect week to do a bit of research and write it up here!

1G

1G, not surprisingly, was the first wireless network to be rolled out, in the 1980s. 1G is an analog-signal-based system that first launched in Tokyo, Japan in 1979. When someone speaks into a handset on a 1G network, their voice is modulated to a much higher frequency and then sent to a radio tower for transit. Because the signal is analog and unencrypted, anyone savvy enough could listen in on other people's calls.

2G

2G was a lot like 1G except that instead of using an analog signal, it uses a digital signal. It was launched in Finland in the early 1990s. Instead of just modulating the caller's voice to a higher frequency like 1G, 2G digitally encodes and encrypts the caller's voice so that the call cannot be intercepted in a meaningful way; only the receiver is able to unscramble the messaging.

2G networks went through a series of partial version bumps:

2.5G implemented a different type of routing (packet switching instead of circuit switching). It didn't end up improving the service much, although switching from circuit to packet switching on a high-traffic network should generally free up resources, so perhaps the improvements were just minor.

2.75G improved data transmission rates by using a different type of packet encoding.

These 2G networks were rolled out in the Americas and many were only decommissioned in the last couple of years. For example, AT&T's 2G network was fully decommissioned in 2017, and Verizon's this year, in 2019.

3G

An international standard called the International Mobile Telecommunications-2000 specification (IMT-2000) was formulated and became the backbone of the 3G network. With the rise of mobile internet, companies jumped at the chance to implement a universal network layer that could send and receive internet-sized packets at a reasonable (for the time) speed.

The IMT-2000 standards guaranteed a speed of no less than 0.2 Mbit/s. A company could not claim to be selling a 3G network that did not satisfy this requirement.

The first 3G network was again launched in Japan, in 1998. It caught on with large companies like Verizon and AT&T in the early 2000s; however, because 3G at times uses different frequencies and potentially new equipment, companies were slow to adopt it due to the need to build new infrastructure when upgrading from their existing 2G towers.

3G offered better security standards, making sure the handset authenticated the network it was connecting to before beginning transmission. It also used a different cipher to encrypt its messaging; however, this cipher (KASUMI) was later found to have weaknesses.

4G

A group called the International Telecommunication Union Radiocommunication Sector (ITU-R) came together to create the specifications for a 4G network. A 4G network must be able to transmit at 100 megabits per second in a high-mobility setting (train, car, etc.) and 1 gigabit per second when stationary. The network must be an all-IP, packet-switched network. The spec also had a few requirements around smooth transitions between networks and suggestions about how resources should be shared to optimize for the maximum number of concurrent users.

The first 4G network is sort of hard to pin down, as several countries were able to demonstrate 4G speeds on certain test networks, but again Japan is at the top of that list in the mid 2000s. 4G networks began to roll out in the US in 2008/2009, with Sprint, Verizon, and AT&T among the players.

The early 4G didn’t actually meet the speed standards set out by the standards commission, but the implementation was in place and increased to reach those standards over time. This is the standard that most cell phone companies use today.

A common misconception is that 4G LTE is a better or equal version of 4G. It turns out to be a way for slimy mobile companies to advertise something with 4G in the title without actually having to reach the 4G standards set out by the standards commission, because LTE is tacked onto the end.

LTE-A (LTE Advanced) might be the closest thing on the market to the true 4G standard, but it still does not reach it in full.

5G

5G is not yet out, but optimists are hoping to see networks roll out in 2020. The IMT-2020 spec (the 5G spec) touts speeds of up to 20 Gbit/s.
5G networks achieve these high data transfer speeds by using a wider range of radio frequencies, from low bands around 700 MHz up through much higher millimeter-wave bands. Because of that extra bandwidth, things like augmented reality and VR for mobile become a much more exciting prospect than they would be on 4G.

There are some issues with the higher-frequency waves though: they have trouble passing through obstacles like buildings, so antennas may begin popping up more frequently across cities, on top of power poles and on the sides of buildings, to receive and transmit the 5G signals. This style of infrastructure relies on Multiple Input Multiple Output, or MIMO, systems. Companies leading the charge here in the US are Qualcomm, Nokia, Cisco, and Samsung, some heavy hitters to say the least.

Taxpayers have been dumping plenty of money into the pockets of many of these large telecommunications companies, so it would be a real disappointment if we didn't begin to see some of these systems roll out over the next few years, although I wouldn't be surprised if that's how it goes.

These paragraphs were researched from:
https://en.wikipedia.org/wiki/1G
https://en.wikipedia.org/wiki/2G
https://en.wikipedia.org/wiki/3G
https://en.wikipedia.org/wiki/4G
https://en.wikipedia.org/wiki/5G

Lecture

The lectures this week were a bit harder to follow, as they were a bit less lively than the others. The instructor seemed to be reading from the slides a bit more than in previous classes; however, the lecture topics were a nice cherry on top. When I was listening to Professor Ruiz discuss the FakeInstaller malware, I realized how far I'd come in this class. He identified the malware from Russia as polymorphic, and I actually knew what he was talking about! The malware would dynamically change itself on each install so it would be incredibly hard to pin down from a virus-protection perspective.

I also thought it was interesting how he discussed mobile malware trends over time. Mobile malware in the late 2000s was microscopic compared to the boom that happened in the early 2010s. This is likely because of the addition of mobile app stores as well as the surge of 4G network availability.

While I find the topics of mobile security interesting to a degree, I think the biggest takeaway for me is that mobile users are far more likely to tap or click on interesting-looking things because there is so much less taboo around mobile malware than there is around desktop viruses. For years it's been beaten into our brains to please not open any sketchy attachments or click on any weird links, but that level of caution is somewhat dampened with the pocket-computer model. Everything is quickly available on mobile these days, and while some phones take precautions like defaulting to HTTPS, users are (IMO) even more vulnerable to phishing and fake signup scams because it just seems like less of a risk to tap on interesting things on your phone. App stores can only do so much to flag malware. Especially on Android, where anyone can upload applications, it's just too easy to push malicious code and have people download it thinking it's been 100% vetted by Google or Apple.

I’m looking forward to getting back into hacking the box for the final so I will leave you now. Thanks for reading this quarter! It’s been a blast!

week-8

Viagra Regex

I had some trouble locating the regex checker within the VM, so I went to a website that presumably does the same thing: https://www.regextester.com. It can test against multiple strings just like the regex checker. I'll be testing against

v|agra
\/iagra
v|4gra

as the lecture indicates.
Here’s what I came up with!

/.*(i|\||1|L|l)(a|4|A)gr(a|4|A)/gm

This isn't very precise, but it does the trick! It allows any number of characters in front and focuses on matching the "iagra" portion of the word. The a's can be matched by a, 4, or A; the i can be i, |, 1, L, or l; and the g and r are literal, but this could be expanded if need be!

Continuing with the video, they add a new one!

v | a g r a

This can be fixed by adding optional whitespace between each character:

/.*(i|\||1|L)\s*(a|4|A)\s*g\s*r\s*(a|4|A)/gm
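
To sanity-check the final pattern outside the web tester, here's a quick Python sketch (the spam strings are the ones from the lecture; "hello world" is just a control I added):

    import re

    pattern = re.compile(r".*(i|\||1|L)\s*(a|4|A)\s*g\s*r\s*(a|4|A)")
    tests = ["v|agra", "\\/iagra", "v|4gra", "v | a g r a", "hello world"]
    for t in tests:
        print(t, "->", bool(pattern.search(t)))   # True for the spam variants, False for the control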

Spam in general

I haven't had to think about email spam in a long time. While watching the lectures this week I started thinking about why that is. The last time I clicked into my spam filter, it was during my capstone project. One of our requirements is that our app needs to be able to send emails to customers; password reset flows and PDFs are the main email contents of our application. When I was implementing the emailer, for a moment I thought that I had implemented it incorrectly. I'm using a tool called SendGrid to send my emails for me, but for whatever reason I wasn't receiving any emails. I had an inkling my email client might be blocking them, and sure enough, there my emails were in the spam folder!

I haven't had to think about email spam in a long time (for my personal email) so I thought I'd do a little research into why that is. As it turns out, the Wikipedia article that I found claims that the filter is not based on predefined rules or regex strings. Instead, it's community driven! Whenever a user marks an email as spam, it gets integrated in some way into the Gmail rule structure, which propagates out to other users. Perhaps under the hood, Gmail has implemented some sort of machine learning algorithm that uses the user spam flags to "train" their filters. I couldn't find any more information, but they sure are doing a pretty good job. Opening up my spam folder now, it's just littered with tons of garbage that I never have to see on a day-to-day basis. Back in the day I certainly had to deal with a lot more spam than I do now!

Categorization Lab

I thought I'd take a stab at parsing through the Postgres lab and trying to identify spam. As it turns out, I'm not very familiar with Postgres syntax; I had a tough time even getting a look at the message_data table description. At first I wasn't really sure where to go, but I had let the lab run on in the background and the instructor recommended taking a look at the subject line to see if there was anything common in there. I found that the word "stock" seemed to show up a fair amount, so I thought I'd try to write a query highlighting subjects with the word stock in them.

SELECT COUNT(*) FROM message_data WHERE msubject LIKE '%tock%';

This ended up returning around 68k rows (not ideal). As it also turned out, my raw Postgres skills are very raw. Feeling a bit dejected, I let the lab continue to roll, and the instructor mentioned that the attachment hash turned out to be the golden ticket! So I thought I'd take a crack at writing a query identifying spam by attachment hashes.

The screen inside of my psql client keeps overwriting itself when a long line wraps, so I ended up finally writing the query in Sublime and copying it in.
BUT! I finally got a query running to grab emails with the same attachment hash! The theory here is that there should rarely be a case where an email is sent with the same attachment more than 3 or 4 times.

SELECT COUNT(*) as hash_count, attachment_hash FROM message_data
GROUP BY attachment_hash
ORDER BY hash_count DESC
LIMIT 10;

This query could be expanded upon and used as a subquery to find all emails with matching source IPs. Any email from those bad source IPs would then be marked as bad!
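
As a rough sketch of what that might look like from Python (the connection details and the source_ip column name are assumptions on my part; message_data and attachment_hash are from the lab):

    import psycopg2  # hypothetical connection details below

    conn = psycopg2.connect(dbname="maildb", user="postgres")
    cur = conn.cursor()
    cur.execute("""
        SELECT DISTINCT source_ip                -- assumed column name for the sender IP
        FROM message_data
        WHERE attachment_hash IN (
            SELECT attachment_hash FROM message_data
            WHERE attachment_hash IS NOT NULL
            GROUP BY attachment_hash
            HAVING COUNT(*) > 4                  -- same attachment seen suspiciously often
        );
    """)
    bad_ips = [row[0] for row in cur.fetchall()]
    print(len(bad_ips), "suspicious source IPs")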

I’m going to jump on to working through some more hack the box challenges for the final! Until next week!

week-7

This week I spent a pretty large amount of time on both the lab and getting a first crack at the final! Near the end of the week, I was finally able to sit down with the WebGoat and inflict some damage in what turned out to be a really fun experience.

I had heard of OWASP before from my first company's head of security. When I started there, we had a brief hour-long seminar on what to do and what not to do. That was quite a while ago, but the basic gist of it was: please don't use raw queries when using Django. Stick with the ORM, because its SQL statements are prepared in a way that makes them much less vulnerable. Additionally, don't blindly render user input; make sure that the attributes for the display models are properly escaped to prevent cross-site scripting. He also recommended we parse through OWASP and see if anything looked interesting. Now to be fair, this was years ago and I really don't remember much else of what he said or how in depth the conversation really went, but it's been really fun to get back into that mindset and work on some web-based security issues.

Final: Hackthebox - Lernaean Challenge

Before I get into the WebGoat, I wanted to spend some time going over what I learned while doing my first hackthebox challenge. I chose the Lernaean web attack, where hackthebox spins up a docker container which exposes a single web page. The background is pink and there is a password input box with some text along the lines of "Please don't try and guess the password".

For about 2 days I made very little progress. I scoured the HTML, I read every key on the window object in the console, I tested a few endpoints that I thought might contain some interesting files (/static, /index.html, /routes come to mind) but to no avail. There wasn't any javascript loaded on the page, and the only way to communicate back to the server is the form post against the route '/'.

My first attempt to try guessing the password didn’t go very well. I started out building a simple script that iterates over a list of passwords and curls against the form submission endpoint, writing their results to a directory.

curl -XPOST -d "password=$testpw" $host >> passwords/$testpw

When the curls were all done, I tried to use grep to find a result that did not contain the bad string “Invalid password!”

This didn't turn out to be very reliable, as my computer kept going to sleep or the connection would get dropped, and then the password files would be empty. I tried extending the script to handle empty password files and retry them, but this didn't actually end up getting me any closer. I was feeling a bit defeated.
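
In hindsight, a small Python sketch with retries probably would have been sturdier than my curl loop. Something along these lines (the host is a placeholder for the hackthebox instance, and "Invalid password!" is the failure string from the page):

    import requests

    host = "http://TARGET_IP:PORT/"   # placeholder
    with open("passwords.txt") as f:
        candidates = [line.strip() for line in f if line.strip()]

    for pw in candidates:
        for attempt in range(3):      # retry dropped connections instead of leaving empty result files
            try:
                r = requests.post(host, data={"password": pw}, timeout=5)
            except requests.RequestException:
                continue
            if "Invalid password!" not in r.text:
                print("possible hit:", pw)
            break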

I took a step back and googled Lernaean and found out it comes from a Greek story about Hercules and the Hydra. I googled hercules hack and hydra hack and bingo, found out about the hydra hacking tool. As it turns out, this tool does pretty much exactly what I tried to do by hand! Fantastic. With this tool in hand, I set out to do pretty much the same thing I had attempted before. My tests on a small list showed that hydra was much, much faster than my original attempt. I then downloaded a much, muuuuch larger password list and set hydra to work.

After only a couple minutes, I had a hit! I couldn't believe it! I'll add it to my final project so I won't spoil it here, but I went ahead and entered it into the form submission and was once again thrown into despair. A page rendered saying oops, too slow. Sigh. I went back to hackthebox and restarted the vm, reset hydra, only to come back with the same password as before and the same results.

I opened the network panel to see if there were any redirects but no luck. I decided to try my old friend curl as a last-ditch effort and bingo! As it turned out there was some javascript embedded in the page that set a window.location when the page loads. With curl, I was able to capture the original html page and find the flag. I'm not sure why I couldn't see the redirects in the browser; maybe I didn't have my preserve network history option checked, or maybe I just didn't notice it because the pages look mostly the same. Anyway, it was a very pleasing experience on the whole.

Webgoat

I decided to try some XSS attacks against the webgoat to see how far I get.

Stage 1: Stored XSS

The goal here is to try and edit Tom’s profile and have it update the database record for Jerry! I decided to try the first basic attack that I’ve learned which revolves around finishing an entry with a statement like 1==1; and then adding a sql update statement. The first attempt didn’t go so well

Tom's name is now
Tom OR 1=1; UPDATE users SET stree="moose street" where first_name="Jerry"

After poking around, the server shows some interesting info, specifically that the field first_name is not correct; I should be targeting firstName.

I think instead of trying to update Jerry straight away, I'll try a more straightforward query, like DROP TABLE users;

TOM; DROP TABLE users;
and
TOM] DROP TABLE users;
both did not work.
I tried a lot more stuff but to no avail, and at this point I decided to reread the instructions. Sometimes, as it turns out, I need to read the directions more carefully. What the ask is, is for me (Tom) to add a javascript vulnerability to my information, so when Jerry goes to my page (or any time my info is loaded) my injected javascript will run. This turned out to be way easier than what I was trying before.

Yay!

The rest of the stages require a server-side developer version of the WebGoat, so I decided to move on to a separate portion of the WebGoat.

Stored XSS Attacks

This one is not staged so here is a reference to the one I’m attempting.

This one turned out to be pretty much the same as the last one.

By entering

moose;"<script>alert('hello')</script>

I was able to trigger the xss attack when a user goes to click on the message board to view the message. hurrah!

Just to make sure it didn't blindly run all scripts, I also tried the payload without the leading ;" and it didn't fire, so the ;" seems to be important.

I’m going to get back into working the final, as I’m finding the hackthebox challenges particularly interesting and hard. Until next week!

Week 6

This week I decided to try and familiarize myself with many of the standard port protocols. I was intrigued by how many of the IP addresses from the labs used SMTP, and I wanted to see what other types of applications also use SMTP. Most of my time was spent doing the homework/lab this week, so I didn't get through all of the suggested labs from the lectures as I normally do.

SMTP

One thing that was interesting to me was that many of the IP addresses used SMTP but only one main one used IMAP and POP3. This got me thinking that there's probably a difference between which protocols are used to send and which are used to receive. Wikipedia says that SMTP is typically used by clients to send or relay email, while POP3 and IMAP are used to receive or download emails. SMTP uses a store-and-forward methodology: a server holds on to a message until the whole thing has arrived and it is able to forward what it's holding to the next hop.
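
The sending side of that story in Python looks roughly like this (the relay host and credentials are placeholders):

    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = "me@example.com", "you@example.com", "hi"
    msg.set_content("delivered via SMTP; the recipient later pulls it down over IMAP or POP3")

    with smtplib.SMTP("smtp.example.com", 587) as server:   # placeholder relay host
        server.starttls()                                    # upgrade the connection to TLS
        server.login("me@example.com", "app-password")       # placeholder credentials
        server.send_message(msg)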

Rabbitmq and AMQP

I have used a project in the past called RabbitMQ to handle events on its message bus. I had misremembered that it used SMTP as its protocol; instead it uses a different protocol called AMQP. AMQP, or Advanced Message Queuing Protocol, looks to be defined on TCP port 5672. RabbitMQ has some really nice docs about how the protocol is set up.

Wikipedia put AMQP in interesting terms for me. In the same way that SMTP, FTP, and HTTP have standardized their areas of expertise so that different clients and servers can interact reliably, AMQP does that for the message queue domain. It defines different levels of delivery guarantees, such as at-most-once, at-least-once, and exactly-once. These can be very important distinctions, as they could mean the difference between sending a recipient no push notifications on their phone vs 1000 push notifications by accident.
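
With RabbitMQ's Python client (pika 1.x), for example, the difference between at-most-once and at-least-once mostly comes down to when the consumer acknowledges. A minimal sketch, assuming a broker on localhost and a queue name I made up:

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.queue_declare(queue="notifications", durable=True)

    # publisher: mark the message persistent so the broker keeps it across restarts
    ch.basic_publish(exchange="", routing_key="notifications", body=b"push: hello",
                     properties=pika.BasicProperties(delivery_mode=2))

    # consumer: acking only after the work is done gives at-least-once delivery
    def handle(channel, method, properties, body):
        print("sending push notification:", body)
        channel.basic_ack(delivery_tag=method.delivery_tag)  # a crash before this line means redelivery

    ch.basic_consume(queue="notifications", on_message_callback=handle)
    ch.start_consuming()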

ESMTP

I also stumbled across a protocol called ESMTP, or Extended SMTP, which as you can probably imagine extends the SMTP protocol. It adds more features, such as non-ASCII text capability with the 8BITMIME extension. I wonder why the creators didn't submit RFCs to extend SMTP itself instead of creating a new protocol, ESMTP. Perhaps it was like HTTP/2 in the sense that it was somewhat backwards incompatible, although it would appear that ESMTP handles all of the same functionality as SMTP. Go figure.

HTTP vs HTTPS

One area that I’ve always wanted to get more proficient in is the difference between http and https.

The obvious differences

One of the biggest differences obviously is that HTTP transmits over port 80 whereas HTTPS transmits over 443. The general problem with HTTP is that the traffic transmitted between client and server can be intercepted and read by anyone with enough know-how to sniff TCP packets and patch them together. This becomes especially critical with things like passwords and credit card info.

HTTPS introduces a security protocol to encrypt traffic between client and server using a certificate. I'm not quite sure why it's called a Secure Sockets Layer certificate, but I aim to find out!

SSL

Right out of the gate I came across some interesting information. SSL was renamed to TLS (Transport Layer Security), or maybe a better statement would be to say that SSL was the predecessor to TLS and SSL has been deprecated. The book Bulletproof SSL & TLS implies that SSL was renamed to TLS for purely political reasons around the internet behemoth of the 90s, Netscape. To remove some of the tension around SSL being Netscape-specific, they changed the name to disassociate it.

Generally speaking, the connection is initiated by a cryptographic negotiation between the client and the server. A unique secret is agreed upon at the start of the interaction and is used to encrypt information sent back to the server, which has the same shared secret and is able to decode it. All of this is done before the first byte of application data is transferred, so everything sent between the client and server is encrypted.
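
A minimal sketch of that handshake-before-data idea, using Python's ssl module against a generic host:

    import socket, ssl

    ctx = ssl.create_default_context()   # trusted CA certs, hostname checking on
    with socket.create_connection(("example.com", 443)) as raw_sock:
        # wrap_socket performs the TLS negotiation before any application data moves
        with ctx.wrap_socket(raw_sock, server_hostname="example.com") as tls:
            print("negotiated:", tls.version())   # e.g. TLSv1.3
            tls.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
            print(tls.recv(200))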

The secret is never transmitted directly across the network, so a man in the middle cannot capture it and act on behalf of the client or server. It was called SSL because that's exactly what it was doing: securing the socket layer. It would seem the name Transport Layer Security carries the same implication.

Firewalls

I wanted to see how firewalls relate to security groups on AWS. They both seem to restrict traffic, so I wanted to dig in a bit and see what I could find.

As I understand firewalls, they sit in between networks and restrict traffic between the two. That could be between the internet and an internal network, or between two private networks. Basically between any two networks.

AWS security groups also restrict traffic to resources (and are obviously AWS-specific). They can be IP-specific, but they don't always have to be. They are slightly more configurable, as they restrict or allow based on policies, which can be attached to clusters or new instances.

From what I can tell, they essentially do the same thing, except configuring a security group for your AWS cluster is probably leaps and bounds easier than configuring a firewall.
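
As a concrete point of comparison, allowing inbound HTTPS with a security group is just a small API call through boto3; a rough sketch (the group ID is a placeholder):

    import boto3

    ec2 = boto3.client("ec2")
    # allow inbound HTTPS (port 443) from anywhere to whatever instances carry this group
    ec2.authorize_security_group_ingress(
        GroupId="sg-0123456789abcdef0",   # placeholder group ID
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        }],
    )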

Network security at my job

One of the policies that's been discussed at work is whether or not network security on the whole is the direction we want to go, vs role-based cryptographic security. The argument against firewalls and whitelisted IPs is that, given an attacker who knows their stuff and the right opportunity, they will be able to break in to some allowed host and get the keys to the kingdom. The idea behind role-based authentication is somewhat different. In the most naive implementation, let anyone access anything they like! However, let the services only respond appropriately to requests that are correctly authenticated.

I'm no network security expert, but to me this seems like a whole different approach to network security in general. I'm sure that going purely with the second would never be a realistic option; a real setup would probably encompass both. This week has inspired me to learn more about this topic! I'm going to ask about it at work and hopefully add it to my writeup for next week.

Cheers!

Week-5

This week we're taking a look at rootkits. I'm glad that the instructor Aditya has included some review of past tools in the first lecture, as I'm still feeling a bit shaky on a lot of the subject matter. I have found the tools somewhat user-friendly to use in the moment, but I imagine, like many software tools, using them consistently over a longer period of time leads to a deeper understanding.

Cuckoo and Agony

Running the sample

This week I’ll start off by booting up the Agony sample from lecture and taking a look through cuckoo with it. The goal here being to determine whether cuckoo can pick up on any of the nuances of this particular rootkit. My hypothesis is, no, we won’t see anything particularly interesting or else why would we be learning alternative methods to analyze malware like this!

As always, I immediately run into an issue. The file requires a password; after piazza-ing around for a moment I found the password, 'infected', which I guess I should have tried to begin with. Fakenet has been started and I attempt to run cuckoo!

Watching the cuckoo terminal, I immediately see a windows permission error, although I do see cuckoo attempting to run the agony virus (yay!). The mouse starts moving around the vm as I expect, but eventually cuckoo hits an irrecoverable error, something like "bad status line", and exits. Even more interesting, it looks like "bad" has stopped working and has crashed.

Time to analyze some logs!

Log analysis

First I thought I'd take a look at fakenet to see if there were any clues before digging into the cuckoo logs. Right off the bat I see some interesting behavior: it looks like two requests were made, the first to toolbar.google.com on port 80 and the second to microsoft.com on port 80. It's hard for me to tell if there is anything really malicious going on here because, as far as I can see, these are legitimate hosts.

Looking at the cuckoo logs, it's hard for me to know exactly whether changing permissions on the system and updating registry entries is suspicious or not, but it surely looks suspicious to me!

To me it looks like agony checks whether it has permissions for something, adjusts the permissions, saves some values to the registry, then finally goes off to install something from the internet. This is pretty sketchy in my opinion, but again, I'm not very familiar with windows APIs or exe/dlls. This could be normal, but most programs I can think of can operate within their given permission boundaries just fine without the need to update them on the system they're executing on.

As was pointed out in lecture, there were three files created that cuckoo was able to capture

Searching the directory for *.sys, I was also able to find the hidden .sys files (pretty cool!)

Tuluka also turns up those suspicious running processes. Tuluka seems like a really helpful tool! I wonder how it would run on the first few samples we’ve run in the class from weeks 1 and 2.

Live Kernel Debugging

I've been trying to figure out what to connect windbg to; the lecture just quickly skips over that part. I have Tuluka up with the three suspicious processes running, and I have windbg open but not connected; the only options that I can see to connect ask for connection strings? I'll keep pushing forward with the lectures in the hopes that more will become clear.

Still having trouble:

I will just finish out this lecture and try to summarize my learnings.

In summary: this lab is intended to highlight the hooking abuse that some malware exploits. The malware has inserted itself to sit as a proxy between calls to (in this case) find next file and the actual execution of the find next file code. The debugging exercise is so that I, as an anti-virus expert, can step through the system function calls and watch as different instructions get run, noting their spot in memory. As I step through, I will see instructions that send the execution pointer outside of the kernel memory range. This is where the malicious code is being run. After this code gets executed, we will see the return address point back to a spot within the kernel. The ask by the instructor here is to step through and find the offset in memory where the hook has jumped us to outside of the kernel. I'm a bit fuzzy on why the offset is important and not the actual spot in memory where the code is executed from, but I'm hoping that more will be revealed in the next series of lectures.

I'm glad the instructor in the following lectures spent time going over the concept. I think I had it pretty clear, but the expansion of what a C program would look like to handle this problem was particularly illuminating for me. If I wanted to NOT do this by hand using windbg, I would write a C program that would find the kernel memory boundaries, read through the SSDT table and identify any memory addresses that fall outside of this boundary, then walk through the malicious non-kernel code and identify the return memory address (which presumably would be the original kernel address). If for whatever reason the return address also falls outside of the kernel, we could do the walkthrough all over again. Finally, once the kernel address has been identified, we would patch the SSDT table with the kernel address so that the API call would no longer be hooked. The malicious code would still exist, but it would essentially be inert (at least for the purpose of hooking that API call).

Week 5 Homework

A big portion of my learning this week was focused on working through the homework assignment! My work uses linux computers to run its servers, so I thought it would be a good idea for my career to work through this assignment on linux. The assignment asks for a fair amount of process statistics, so the first thing that came to mind was to use something like bash scripts. There are some pretty simple ones, like ps a, which could easily fulfill requirement 1, "Display running processes". I thought this felt a little too much like cheating, so I decided that I was going to write the whole thing in C!

/proc

As it turns out, the /proc directory isn't exactly your normal directory! It's a special directory that's managed by the operating system. It is where the system stores information regarding processes and the libraries on which those processes depend. This sounds exactly like what we want for this assignment. Each running process gets a directory in the /proc folder with its pid as the directory name. This means that to show which processes are currently running, we just need to iterate through the /proc folder and print any entries that are numeric in name.
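
In Python terms (my actual version is in C, but the idea is identical), the listing boils down to:

    import os

    # "Display running processes": every numeric entry in /proc is a PID directory
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:   # the process name lives here
                    print(entry, f.read().strip())
            except OSError:
                pass   # the process may have exited between listdir and open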

Threads

From what I can tell about the behavior of /proc, threads are not given their own top-level /proc directory. Instead, they are added to the parent process's /proc/<pid>/task directory. To display which threads are running for which process, we can reuse the read-directory code from before and simply read the task directory of the targeted process. Easy!
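
Again as a quick Python sketch of the same idea:

    import os

    def thread_ids(pid):
        # every thread of a process shows up as a directory under /proc/<pid>/task/
        return sorted(int(tid) for tid in os.listdir(f"/proc/{pid}/task"))

    print(thread_ids(os.getpid()))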

Child processes

Showing the currently running threads can give us a good amount of information about what's going on on the host computer, but I thought it would also be useful to include child processes in the display as well! As it turns out, within each /proc/<pid>/task/<tid>/ directory there's a children file that lists all the child processes! Easy! All we have to do is write this file to the screen.
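
Sketched out in Python (note the children file only exists on kernels built with CONFIG_PROC_CHILDREN):

    import os

    def child_pids(pid):
        kids = set()
        # each thread's children file is a space-separated list of the PIDs it forked
        for tid in os.listdir(f"/proc/{pid}/task"):
            with open(f"/proc/{pid}/task/{tid}/children") as f:
                kids.update(int(p) for p in f.read().split())
        return sorted(kids)

    print(child_pids(1))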

Shared libraries

This becomes a bit more complicated than the previous tasks. Once again the /proc directory has the answers we need! The /proc/<pid>/maps file lists all of the included binaries and the memory addresses used by the process! Perfect! Writing this to the screen should satisfy the requirement; however, I began running into permission issues. As it turns out, reading other processes' bits of memory is generally not okay for a regular user like me! To solve this, I popped into a docker container, which automatically runs as root. This way I could explore the /proc directory unhindered.
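
The Python version of that idea, filtering the maps file down to shared objects:

    import os

    def shared_libraries(pid):
        libs = set()
        with open(f"/proc/{pid}/maps") as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 6 and ".so" in parts[5]:   # 6th column is the backing file, if any
                    libs.add(parts[5])
        return sorted(libs)

    print("\n".join(shared_libraries(os.getpid())))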

Examine memory

The /proc/<pid>/maps file indicates which binaries are being used and at what memory addresses, so to examine memory I subtracted the starting location from the ending location. With the size of the mapping in hand, we can walk through the memory addresses from the library start and print each byte to the screen!
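
Roughly, in Python (reading /proc/<pid>/mem needs root or ptrace permission, which is why the docker-as-root trick mattered; the region name is just an example):

    def dump_region(pid, name="libc"):
        # find a mapped region by name in maps, then read exactly end - start bytes from mem
        with open(f"/proc/{pid}/maps") as maps:
            for line in maps:
                if name in line:
                    start, end = (int(x, 16) for x in line.split()[0].split("-"))
                    break
            else:
                return None
        with open(f"/proc/{pid}/mem", "rb") as mem:
            mem.seek(start)
            return mem.read(end - start)

    data = dump_region(1)
    print(len(data) if data else "region not found")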

Summary

This project turned out to be harder than I had anticipated. As it turned out, I'd completely forgotten most of what I knew about C, and furthermore, reading memory turns out to be somewhat complicated. I'm not entirely sure I was able to read the correct values, as I learned that each process has its own virtual memory addresses. Overall it was a good experience to dive into a linux machine for once and explore the system.

Until next week!

Week-4

WinDBG

This week we jumped right into labs, which was refreshing! I was able to load up my vsphere vm with the exploitme program running within half an hour of starting the hours and hours of lectures, so this was a nice turn of events. I'm looking forward to actually targeting some exploits this week, as it's always something that I've found fascinating. And now maybe I can reread the Girl with the Dragon Tattoo series with a different lens.

Running lab 1

My first task was to find the correct files to run. Exploit me seemed pretty straightforward, but I tried to follow the setup directions and ran into some oddities. The .pdb file was already located in my C:\Windows\Downloaded… programs folder, so I initially replaced it there but then was worried that I'd ruined the symlink that the next setup line instructed me to create. I ended up reverting my vm back to the most recent snapshot and attempting to forge onward with the assumption that the .pdb file in my Windows directory was put there to make my life easier.

With that in mind, I was able to get the debugger to break at the entrypoint provided in lab 1!

Now I’ll attempt to answer some of the questions outlined in the lab

1) What address is FSExploitme.ocx loaded at?
My initial guess was that the instructions had asked us to put a breakpoint at the entry to the ocx program, so my answer was 54431df0, but after peeking at the answer, I realized that there is probably a ton of setup that must take place in a program before finding bytes to set breakpoints at, so it would make sense that the program wouldn't start there. In fact it starts at 54430000.

2) How large is the stack?

It’s been a while since my assembly class but I believe we can analyze stack size by taking a look at how much room is cleared off of the stack at the start.

14h converted to decimal is 20 -> 20 bytes is my guess; let's see how I did.
And I was way off. Running !teb shows the stack size to be 1900 bytes.

I was having trouble running the answer code, as it looks like it can’t resolve the variables I’m interested in calculating “StackBase - StackLimit”

3) What is the starting address of the Process Heap?

This time instead of guessing I thought I'd take a look at the lecture slide cheat sheet. Here I found a command I can use to take a look at the process environment block: !peb.

At first I was sad because nothing useful seemed to show up in the command results, but it turned out my window was too small!


000e0000

4) What is the value of EIP

Well, I know this one's a register so it should print out when the breakpoint hits.

54431df0

5) How much space is issued for local variables?

Maybe the answer to this one is what I initially thought #2 was. I’m going to go with 20, which is the amount subtracted from the esp (or ebp I can’t remember which one goes first)

(Correct! feels good!)

6) Execute 5 instructions with t 5, what is the string at the top of the stack?

I ran a bunch of combinations of dd, da and du on all sorts of locations in memory, but I wasn’t able to turn up anything but gibberish. Going to take a look at the answer and analyze.

Ah interesting I had run du 020e9c14 but I had not made it a pointer. Running du poi(esp) I get

7) Trigger the breakpoint and enter the loop 11 instructions later. How many iterations do you think the loop will run?

My guess: the loop will run 10 times.
Here the address at ebp-4 is set to 0, moved into eax, 1 is added to it, then it's checked against the constant 10, jumping out of the loop when it's greater or equal.

Surprised I got this one correct!

8) What is the return value?

As stated in lecture, the return value of the function should be set in the eax register, so running pt then r shows that eax has the value 7a69h or 31337 decimal

9) Is the esi value within the Stack, Heap or Text Segment?

Took a gander at the lecture cheat sheet again, they have the !address helper there. Running that shows that esi is on the Stack.

Overall this was a pretty tough experience, although I was able to make it through more than half the questions. I'm running into my lack of general knowledge about how computers work. I'm looking forward to getting more clarity on how assembly plays into the whole picture.

I was happy to see that in the following lectures, Brad went into a review about how the stack works within an execution frame; however, I was still pretty lost. When it got to the last lectures and the discussion of heap sprays, my mind was pretty blown. I hadn't ever considered javascript exploits in such a way, and I felt like that one student in the videos who couldn't seem to pinpoint exactly what was going on, so I'll try and explain it as best I can.

There's a hacker who finds an exploit in some dll, call it program X. Program X can be run by internet explorer, even if the user has to click "okay" to turn the dll on. The hacker has verified that by writing some simple javascript code, they are able to push their own code into the dll's execution, thus giving them previously forbidden access.

In this situation, the hacker creates a website that innocently prompts the user whether it's okay to turn on program X while viewing the website. The user clicks okay and then interacts with the website. The hacker has already pinpointed an overflow vulnerability in program X, so once the user turns it on, the hacker's code overflows some bit of the stack and changes the return address of one of program X's stack frames. The return address points to some other bit of code that the hacker has written onto the stack; in this fake scenario it is the address of the calculator program.

I was trying to follow along with how the hacker was able to find their shellcode within the browser heap, but I still don't quite understand the heap spray and I feel a bit stuck on the lab, so I'll try and explain my thoughts. The heap spray adds a series of similarly sized blocks of memory to the heap so that when the heap is dumped for inspection, we can quickly skim through the normalized blocks of memory to find the interesting parts that we're looking for. My question is: how does this make the block of shellcode reproducibly findable? Why do we even need a heap spray, as adding more code to the heap seems like it might just pollute it? It could be my misunderstanding about how heaps work. Perhaps by adding so much into the heap, since the heap must reorganize itself and most added chunks are of the same size, the tree of heap data will eventually even out as the tree grows?

Well, this week was especially hard for me. I'm hoping to rewatch the lectures and walk back through the different exercises to get a better grasp on the material. I regret taking assembly and this class so far apart, as I think it would be really useful to have that type of knowledge fresh in my mind. Windbg and the page heap are super cool tools that I'd like to get more comfortable using! Windbg seems pretty complicated to me right now; it almost seems like it has its own programming language to learn, so hopefully over the course we'll keep using it and it will become second nature!

Until next week !

Week-3

Malware Defense

Yara

Introduction: Working through confusion

I started off this week (again) generally confused about what was going on. After following the lecture videos as closely as I could, I was able to determine I was supposed to scan the windows system32 directory for potential malware string matches. Turns out (again) I was overthinking it, as I dug into slack and piazza to see where and how we were supposed to run the malware examples. After rewatching the lab lectures I came to terms with running FileInsight on the malware to grab strings and then comparing the strings found across malware files.

Down to business: Finding the common strings

The first step, as I mentioned above, was to find the file samples and run FileInsight's strings plugin on each one. I ran strings All at first but realized there was a bit too much there for me to find anything useful, so I settled on the simpler strings plugin. I saved each of the 7 sample results to a .txt file so I could investigate, but pretty quickly I noticed a recurring string, "Jenna Jam". This caught my eye because, as some more innocent folk may or may not know, Jenna Jameson is an adult film actress, and the likelihood of a permutation of her name ending up in the system32 directory is minuscule.

As it turned out, this was indeed a reference to the film actress, as a list of movies popped up in a couple of the strings results. My guess is these are a list of popular and enticing movie titles that would get a lot of clicks if they were to be posted as free downloads. I owned the whole set of CKY Bam Margera DVDs when I was younger; I might have clicked on a download for these if I was feeling especially piratey back in my college years.

Another common string that came up a few times was jJo!, so I figured I’d keep that one in my back pocket in case “Jenna Jam” wasn’t found in all of the sample files. Turns out, it is indeed in every one so I set out to test my luck with yara.

Yara: Test against the malware samples

I had found a common string, "Jenna Jam", and I set out to test it using Yara. The first hurdle however didn't even come with the tooling, as I had trouble finding the executable. After using the windows search bar, I attempted for about 10 minutes to execute the directory containing the .exe file, without luck. Finally realizing that the snippet I was attempting to run was in fact targeting a directory, I was able to bring up the yara help options at last. I was also able to locate the yara editor next to the yara exe and wrote my first yara rule set.

The code here depicts my yara rule after a few frustrating iterations. I had remembered the professor describing in lecture a difference between the string types "ascii" and "wide", and I naively hoped he was just mentioning that to be informative, but alas, when I ran my yara rule as it was originally written, I was not able to match a single one of the sample files. I was sure the strings existed in there, as I'd pulled every strings plugin result to text and analyzed it, so I figured I was targeting the improper string types.

I was able to find some pretty spiffy documentation at the YARA site, which came in very useful. To catch all string types, it recommends passing both keywords 'wide' and 'ascii' in the string definition, and just like that, I had matches!

I finally had a rule matching against the malware samples. But would it be good enough to ignore all files within system32? (My guess, yes).

It turned out I was correct!

There were a few files that yara likely does not have permission to scan, and those show up in the output, but otherwise there were no hits! I contemplated this being a false positive, so I changed my yara rule to match the string "file" just to make sure; many hits were logged, so I knew that my yara rule had passed the test.

Pros: My yara rule is so simple! No windows files were hurt in the running of this yara scan.

Cons: Like the professor mentioned, what about the next iteration of this malware? Jenna Jameson films will probably fall out of favor at some point, and my yara rules will not fare well as time passes. This ruleset could be expanded to target the structure of the data rather than the film names themselves; I imagine a rule like that would be far more robust.

Cuckoo

Analyzing 068D5

The first sample that I took a look at seems to me not to be malicious. From what I can tell, it seems to be a keyboard driver. There's a chance that it's a key logger, but from what I can see there's nothing particularly malicious. I found an interesting method name, ImmDoThings, which doesn't strike me as very Microsoft-y, but poorly named functions hardly mean malware. There's a series of calls that loop through an array and call KeyHandle, which makes me think it's just attaching handlers to keyboard keys. I'm going to shelve this sample and move on to the second one.

To add, there was also no suspicious network traffic that I could see on fakenet.

Analyzing 00670F

I attempted to run this one, but unfortunately nothing showed in the cuckoo logs. I had the bad file on the desktop, but there was no network traffic captured in the fakenet trap and nothing added to the cuckoo logs. I'm going to move on from this one and on to the next sample. There was, however, a suspicious dx.bat file that appeared on the desktop.

Analyzing 4844FD

Okay, I think I’m having trouble with cuckoo in general now. Analyzer spins up, reports a failure and displays a quick stack trace and then exits. I’m going to try reverting my snapshot even further in case something got messed up. I keep getting this error:

Going back to #2 -> 00670F

I suspect this one is doing something malicious. It's adding a dx.bat file to the desktop and also adding a shortcut to a (probably fake) internet explorer. However, I still can't get any log lines to show in the C:\cuckoo\logs directory. I realized that I can't be running fakenet when I kick off the program, but for whatever reason the cuckoo program still failed. I'll try again on the third sample, 4844FD.

Going for #4 -> A1874F

I finally got cuckoo to output some logs. There are three files; the second largest has the bad process (the first is cmd.exe and the third has what I imagine is a pid of 292).

I started parsing through the csv lines and googling .dll libs that I came across. One that caught my eye was ntshrui….dll. When I typed in the beginning of the name and googled it, the results came back with a dll library that allows for file transfer via the network. This has the potential to be suspicious! Transferring files over a network is normal, but obviously more risky than moving files around on a single computer.

I had a feeling I didn't google the dll name correctly, so I googled the full name ntshruis2.dll, and the first hit claims that this lib is indeed dangerous! We have a winner. Further investigation on McAfee shows that this is likely a trojan.

Every time I'd run the program, it would delete itself from the desktop. I finally found the line in the program where it deletes the virus source.

A simple rule that I can imagine would naively catch this file specifically would be

rule fineBad
{
strings:
$a = "ntshruis2.dll" wide ascii
condition:
$a
}
Unfortunately, because the trojan deletes itself after running, I don't think this rule is actually all that useful.

Conclusion

There is still a lot for me to learn. I feel I will be able to get a better feel for yara and cuckoo over the next few weeks. I ran into a lot of logistical issues keeping the vm running, getting cuckoo to run properly and getting it to activate the bad files. I’m hoping that by exposing myself to more tooling and samples, the logistical part of the process will get easier.

On a side note, I would be interested to know if professional virus hunters use the same vms we're using. They are slow, the desktop area is really small (can't show many things at the same time), and the OS keeps freezing, not responding, and blacking out the screen. Do professionals use something different? In lecture, I remember the professor noting that they use a really old version of windows because it's usually guaranteed to allow the virus to run. If so, that is a rough job!

Until next week!

Week 2

Recovering Lost Files

I started out this lab a bit confused. I thought the question in lecture regarding whether or not Volatility created the memory dump for us was a good one, but the conversation around the question was tough to follow. I thought I'd do a little exploration on my own when the lab began and the instructor noted we needed 3 ingredients: OSFMount, the .fat.dd image, and PhotoRec. After looking through the windows vm for a while, I remembered back to the introductory syllabus and the mention of a shared mounted directory for some lab resources. There they were! OSFMount and the memory image to recover from. From here I kicked the lab video off once again and the professor was able to guide me to the PhotoRec executable.

I mounted the image at E:\ and opened up PhotoRec to begin the recovery process. It turned out to be pretty painless, with one small hiccup. On the first attempt I must have tried to write to a directory without correct permissions. I got this error:

I figured I should just try again and make sure to put it in a directory I know that I have control over (the desktop!). Writing there, the recovery process went just fine.

Mayflower

Question Summaries

1) What is/are the cyber targets found on the stick?
S-Oil Onsan Refinery
GS Caltex Oil Yeosu Refinery

2) Investigate possible malware and describe how it works
The malware has some targets, IP addresses, and switches in the csv file, protected by a password found in the dont tell ms il ung.jpg file. The malware likely opens up the csv with the password, then plugs those targets into the stuxnet backdoor exploit. The malware has some usernames and passwords baked into it (probably obtained by a password cracker) which it can plug into stuxnet. Once the backdoor is configured, stuxnet is deleted.

3) Display the list of username/passwords

Dayals-1 | London13!
JHKim4-1 | !Tomorrow33
KManku-1 | M@nday77
MMcLean3-1 | @Smiley91

4) What offset value did you find the list at?

XOR key 0x67
OFFSET 0x3ebbd

5) Which relevant files were deleted and can you recover them?

I was able to recover the GOV hacked background file, which led me to find the passwords within the .bin malware file. It seems to me that because these files were deleted, the attackers might have been using part of the SONY attack as a piece of their strategy. Perhaps the SONY attack had a sneaky backdoor that the attacker wanted to use, and once they obtained the backdoor they deleted the SONY malware.
I believe there are other relevant parts of the STUXNET exploit that were deleted, but I had a hard time determining which ones were which. A lot of the files deleted there were probably unrelated to the attack, maybe just a user's old information that they deleted (seems sloppy?).

6) What strategy would you give the targets?

I think it’s reasonable to ask that employees use randomly generated passwords from a character generator instead of using common words. I don’t know how these passwords were obtained but I find it suspicious that each of these 4 were regularish words. These passwords seem like they would be good targets for a password cracking program, so enforcing a randomly generated password policy might have helped obstruct some of the attack.

Investigation Notes

Hint 1:
Passwords recovered from photorec is fake data
Only 2 or 3 recovered files you really need
.bin file is the malware
Username and passwords are hidden within the .bin file

Hint 2:
.zip file contains the targets (answer to #1)
password to zip is hidden in another file
password is in .jpg dont tell ms ung

Hint 3?
wallpaper hacked by GOP holds clues!
Find company who got hacked here and search for the company name

Start - 10:37
Setup share network at \10.100.0.1\share
Install OSFMounts
Mount mayflower to E:\ from the network share
Running photorec on it and extracting to desktop 10:44

I started off by trying to google the file names to see if I could infer anything interesting from them; however, my VM was not connecting to the internet and I wasn't able to copy text from within the vm to my local computer. Probably for the better, as if I were able to copy something I might accidentally run something that I shouldn't run.

At 11:05, I had a lot of trouble trying to open and unzip the .zip file to get at the csv inside. It wasn’t that I couldn’t find the password, (infected123!) it was that every time I tried to open it to enter in a password, I would get a system error instead. I was finally given a password prompt using 7zip and got in to see the csv contents.

After looking around at the recovered files a bit more, I was able to find the GOV background. There I could see SPEData.zip as the extension mentioned in the hint. A quick google shows that SPE is actually Sony Pictures!

I took the hint to search the bin for SPE using FileInsight and the XOR-Search plugin to find the 4 usernames and passwords.

Slightly more cleaned up:

Dayals-1 | London13!
JHKim4-1 | !Tomorrow33
KManku-1 | M@nday77
MMcLean3-1 | @Smiley91

Next I started searching for files that could be relevant and were deleted. I used an XOR search for the term .exe within the bin and came up with an interesting line.

Cleaned up, this reads:
wmic.exe / node:”%s” / user: “%s” / password: “%s” PROCESS CALL CREATE “%s” > %s..%d_%d…ProcessId -s.WTSQueryUserToken…wtsapi.dll…

I googled wtsapi.dll and it looks like it's a library necessary for running a remote shell. My best guess currently is that this bit of code is attempting to create a backdoor, substituting in the usernames and passwords from the list above.
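
For my own reference, here's a rough Python sketch of what an XOR search like that plugin's is doing under the hood (the .bin filename is a placeholder):

    data = open("malware.bin", "rb").read()    # placeholder filename for the recovered .bin
    needle = b"SPE"                            # or b".exe", etc.
    for key in range(1, 256):
        decoded = bytes(b ^ key for b in data)
        idx = decoded.find(needle)
        if idx != -1:
            print(f"key=0x{key:02x} offset=0x{idx:x}")
            print(decoded[idx:idx + 200])      # peek at the surrounding decoded text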

I’m going to run volatility on the mem dump and see if I can see anything interesting.

Running pslist from Volatility, we can see evil.exe and svchest.exe running, just like in our first lab from last week. This is confusing to me; could it be this memory dump was taken from a computer that ran both? Are they actually the same virus? Maybe one virus is using part of the other, but I think evil.exe is something specific to our class, and this challenge was given as a sample from a real-world case.

Week 1

After attempting to use a number of different blog programs (Gatsby, Jekyll, Ghost), I finally found one that was pretty painless to set up and even integrates with GitHub Pages! I'm currently using Hexo, which I had never heard of before but has over 26k stars on GitHub. Very cool. It's all written with markdown and it's trivial to create new posts and publish. Anyway, enough about my setup, let's get started.

Learnings

Windows

It was immediately apparent that a fair amount of the class would be using the Windows environment. I'm pretty unproficient when it comes to the windows terminal, but I googled around and found PowerShell was installed on the windows VMs for the class! Although not completely similar to the unix shell, commands like ls and cd make it feel a bit more like home.

Tooling

I really enjoyed getting comfortable with the tools; specifically, fakenet became a pretty handy tool to see what the network traffic looked like during the attack. I was pretty confused for hours about why the fakenet html page kept ending up in my computer's /etc/hosts file. I slept on it and the next day I suddenly realized this was fakenet's response to a request. This meant that the virus was attempting to request something on the network and add it to my local /etc/hosts file.

It’s moments like the above that have me really excited about this class. I very much enjoy the problem solving aspect .

Another tool I found really useful was the antispy program. From an untrained perspective, it seems to be a compendium of generally useful information about the currently running processes, registries and open files. I found this tool to be incredibly useful when analyzing evil.exe, especially pulling the strings from the program itself! From the strings information, we can see things like file names that the program was targeting and investigate those files directly.

The windows explorer I found to be pretty useful too, as it allows you to sort by date modified. If the virus modifies a file it will likely show up as changed here; however, I have no doubt there are ways for a hacker to sneakily hide the fact that a file was modified.

One tool I was not able to get any utility out of was the attrib tool. I will continue to poke around with this tool on the next lab, but every time I would try to use it, I got a blank response.

Capturing Transient Files

I couldn't easily capture some files, as they would create and delete themselves. I could see them perhaps pop up quickly in the process explorer, but then they would disappear as they were deleted. I decided to write my own python script to capture these files when they appeared. The logic went roughly like this (shown here as a rough but runnable Python sketch; the watched paths are just example placeholders):

    import os, shutil, time

    # paths seen flashing by in Process Explorer (example placeholders)
    files_to_watch = ["C:\\evil_temp.dll", "C:\\dropped.bat"]
    os.makedirs("captured", exist_ok=True)

    while files_to_watch:
        for path in list(files_to_watch):
            if os.path.exists(path):
                print("found", path)
                shutil.copy(path, "captured")   # grab a copy before it gets deleted
                files_to_watch.remove(path)
        time.sleep(0.01)                        # poll fast enough to catch short-lived files

I would run this program alongside the virus, and it captured the transient files quite well. This program is simplistic though, as it wouldn't have captured different stages of the files if, for instance, any one of the files to capture was appended to after its creation. For the next assignment, I'll try to find a program that can do this for me more elegantly.

Overall Takeaways

The biggest takeaway for me this week is that viruses and worms often do not have a single strategy of attack. For instance, it would appear that the sample from this week had the IE exploit as well as a direct UDP connection backdoor attempt. Robert Morris' worm had three different programmatic attacks and would then fall back to a password dictionary attack. My point is, in the next analysis I'll be on the lookout for the bigger picture, which will most likely contain more than one avenue of evil.

Also, we have access to a linux VM on VSphere so I’m excited to perhaps analyze a virus/worm specific to a Linux environment at some point!

The Morris Worm

In lecture, the Morris Worm was referred to in a way that made it seem like most people had heard of it and knew its history and nature. Well, I certainly had no idea and was very interested to learn! I set out to read more about it and stumbled upon a video series from SourceFire called Chalk Talks. In the short series, the narrator goes over a brief history of Robert Morris the man, and then delves more deeply into the behavior of the worm.

The information for this section was gathered from the Chalk Talk Series as well as The Morris Worm Wiki and an article found on the University of Utah cs site about the worm by Donn Seeley.

Robert Morris was a graduate student at Cornell University who decided to experiment with some exploits he had found in the Unix system, namely in fingerd, sendmail, and rsh/rexec. He released the worm, and a mere two days later, the video reports, roughly a tenth of the computers connected to the internet had slowed down to the point where they became basically unusable. The worm would copy itself onto other computers and check whether a copy of itself was already running. If it found a running copy, it would randomly terminate itself or the copy. However, 1 in 7 times, it would keep both copies running and continue trying to replicate itself onto other computers. It was this behavior that made the worm harmful, as some computers became so bogged down with worm processes that they couldn't function on other tasks.

Robert Morris was convicted of felony computer fraud and was billed a whopping 10k (that's a lot of money now, but that was a LOT of money back in 1988!). He didn't serve any time, but he did get probation and community service. He had a pretty successful career afterwards, participated in founding a few companies like Y Combinator, and is currently a tenured professor of Computer Science at MIT. He claims his worm was never intended to be malicious.

I found the finger exploit to be particularly clever and fascinating. The gets function reads bytes from a stream and stuffs them into a buffer. The finger daemon expects the finger client to send it a username to request data about. Morris found that if he sent a request longer than the 512-byte buffer was intended to hold, gets would keep writing past the end of the buffer and overwrite the stack. He was able to exploit this behavior to trick the computer into returning from the function to a different address, which would then execute the sh program with a socket connected to the finger client.

Now that the finger client has a direct socket connection to a running sh program on the fingerd server computer, it would send a copy of itself via the socket, then request that the copy be run on the new computer.

The worm had three different attacks: rsh/rexec, the finger exploit above and sendmail.

The rsh attack banked on the idea that the current user might have the same account and password on a connected remote host. Morris exploited the fact that users often don't want to remember multiple passwords, so the tendency is for users to reuse the same username/password. If this failed, the worm would fall back to the finger attack. If the finger attack failed, it would fall back finally to a sendmail exploit.

The sendmail exploit took advantage of a loophole in the way sendmail was commonly compiled. When compiled with the DEBUG flag, a client is (was?) able to send commands as a username that could be run on the receiving machine. The worm would easily enough establish that the sendmail server was compiled with the DEBUG flag, then send a bootstrap copy of itself, which would be compiled and wait for a connection to fully create the worm.

The worm also had a second routine to infect computers involving a dictionary attack, exploiting simple user passwords; however, my favorite attack is still the finger exploit.

There are arguments on both sides about whether the worm was malicious or not. It didn't delete files, but it did replicate quite quickly. 1 in 7 is not a realistic attempt to control the spread of the worm, and could be seen as a mere attempt to avoid instantaneous symptoms of infection. On the other hand, it has been noted that there are pretty basic coding errors in the code, so the population control could have been yet another error or oversight.

Next Week

I hope to take some time to learn about the Stuxnet virus that was mentioned in lecture (as well as highlight some interesting points from the weeks material).