About iCode

So far iCode has created 13 blog entries.

Bubble sort and swapping two elements without a temporary variable

typedef int *PINT;

/* Temp-free swap via add/subtract. Caveats: signed overflow here is
   undefined behavior, and if i1 and i2 alias the same element it gets
   zeroed. Both are safe in the sort below, which only swaps distinct
   elements of an int array. */
void swap( PINT i1, PINT i2 ) {
  *i1 = *i1 + *i2;
  *i2 = *i1 - *i2;
  *i1 = *i1 - *i2;
}

/* Exchange sort in bubble-sort style: each outer pass moves the
   maximum of arr[0..i] into arr[i]. */
void bubblesort( PINT arr, int iN ) {
  for ( int i = iN - 1; i > 0; i-- ) {
    for ( int j = 0; j < i; j++ ) {
      if ( arr[j] > arr[i] ) {
        swap( &arr[j], &arr[i] );
      }
    }
  }
}
By |April 13th, 2016|Uncategorized|0 Comments

Closer look at Petya ransomware (MBR)

I took a closer look at the Petya ransomware MBR modification(s). It turns out that Petya is indeed just a poor “DOS-style” Master Boot Record virus; no “military-strength” full-disk encryption is involved AT ALL. In other words: Petya is a joke.

Fact is: after execution, the active MBR partition (Partition 0) gets infected with a custom Petya MBR; GPT-partitioned HDDs are not affected. The ransomware then “encrypts” the actual MBR, sector 0 (512 bytes) of the Windows drive, by XORing it with the key value 0x37.

The encrypted MBR is then copied to offset 0x7000 on the HDD. The original MBR is then overwritten by the Petya MBR bootloader: a stub which eventually jumps to the ransom payload:

[Screenshot 1]

Left: Original MBR, XOR’ed with 0x37 stored at 0x7000 (encrypted copy)
Right: Original Windows XP MBR (original)

[Screenshot 2]

Note: 0x04 (encrypted) XOR 0x37 (the key) = 0x33 (original value)

So there is nothing special here. After all, Petya is a weak and poorly implemented piece of ransomware. It can easily be removed:

  • Clone your infected HDD. Whatever you do, ALWAYS work with a copy of your HDD!
  • Un-XOR the encrypted original MBR stored at 0x7000, using the key byte 0x37.
  • Copy that sector from 0x7000 back to sector 0 (overwriting the Petya MBR with the original MBR).
  • Optional: zero out sectors 2 through NTBOOTLOADER-1.

Left: Petya MBR stub (infected)
Right: Original MBR, XOR’ed with key 37h saved at 0x7000 by the ransomware

[Screenshot 3]

By |March 27th, 2016|Uncategorized|0 Comments

Petya ransomware analysed (a Bitcoin Cryptolocker Variant)

Another annoying piece of ransomware, called Petya, is currently in the wild. This crypto-locker variant encrypts/destroys the MBR and boot records of Windows drives. It is being spread as an email attachment, disguised as a job seeker’s application. I took a short look at it.

More about Petya
http://blog.trendmicro.com/trendlabs-security-intelligence/petya-crypto-ransomware-overwrites-mbr-lock-users-computers/

The malware uses several layers of encryption plus anti-debugging and anti-emulation tricks. To avoid analysis, it makes use of the typical “ask for more than available” strategy: either wasting too much CPU time (causing a sandbox timeout) or allocating too much memory (out-of-memory failure), etc. It does not, however, try to detect AV emulation.

I removed the overhead to focus on the core of the malware. Once dumped, I patched the underlying DLL to act as a PE executable, changing the PE header and adding some code caves. Dealing with a PE32 executable makes the analysis much more convenient. Loading it into IDA didn’t provide much help, however, as the author put quite some effort into this piece of shit to keep its size small and to hinder analysis by NOT using any C++ std:: functions. So it took quite some time to fully reverse the binary and to spot functions like new(), delete(), memset(), etc. That binary plus IDA plus Olly concluded my overall analysis, which I am not going to release here. I don’t want to educate SKIDS about ransomware.

Download: Petya Ransomware / Malware, decrypted, unwrapped, DLL->PE32 .exe converted
http://ul.to/ud1zi7oi

Password: infected

Uploaded to Virustotal, for detection
https://www.virustotal.com/en/file/87f202a4c28b27437f505873d8dcbe1cc966a27efa4d79b440704cf341ab4a22/analysis/1458949913/

**** WARNING: Executing this executable will damage your Windows installation! Run it under VMware only! Use at your own risk; for educational purposes only. I’m not responsible for any damages. Do not download if you do not agree. ****

By |March 26th, 2016|Uncategorized|4 Comments

Quicksort: Using std::qsort

After playing with my own implementation of Quicksort, I compared it with qsort as implemented in the C++ standard library. To my disappointment, that implementation turned out to be a lot faster. But why? After taking a look at the source, I did some research.

SECURITYSAFECRITICAL_ATTRIBUTE
#ifdef __USE_CONTEXT
void __fileDECL qsort_s (
    void *base,
    size_t num,
    size_t width,
    int (__fileDECL *comp)(void *, const void *, const void *),
    void *context
    )
#else  /* __USE_CONTEXT */
void __fileDECL qsort (
    void *base,
    size_t num,
    size_t width,
    int (__fileDECL *comp)(const void *, const void *)
    )
#endif  /* __USE_CONTEXT */
{
    char *lo, *hi;              /* ends of sub-array currently sorting */
    char *mid;                  /* points to middle of subarray */
    char *loguy, *higuy;        /* traveling pointers for partition step */
    size_t size;                /* size of the sub-array */
    char *lostk[STKSIZ], *histk[STKSIZ];
    int stkptr;                 /* stack for saving sub-array to be processed */

    /* validation section */
    _VALIDATE_RETURN_VOID(base != NULL || num == 0, EINVAL);
    _VALIDATE_RETURN_VOID(width > 0, EINVAL);
    _VALIDATE_RETURN_VOID(comp != NULL, EINVAL);

    if (num < 2)
        return;                 /* nothing to do */

    stkptr = 0;                 /* initialize stack */

    lo = (char *)base;
    hi = (char *)base + width * (num-1);        /* initialize limits */

    /* this entry point is for pseudo-recursion calling: setting
       lo and hi and jumping to here is like recursion, but stkptr is
       preserved, locals aren't, so we preserve stuff on the stack */
recurse:

    size = (hi - lo) / width + 1;        /* number of el's to sort */

    /* below a certain size, it is faster to use a O(n^2) sorting method */
    if (size <= CUTOFF) {
        __SHORTSORT(lo, hi, width, comp, context);
    }
    else {
        /* First we pick a partitioning element.  The efficiency of the
           algorithm demands that we find one that is approximately the median
           of the values, but also that we select one fast.  We choose the
           median of the first, middle, and last elements, to avoid bad
           performance in the face of already sorted data, or data that is made
           up of multiple sorted runs appended together.  Testing shows that a
           median-of-three algorithm provides better performance than simply
           picking the middle element for the latter case. */

        mid = lo + (size / 2) * width;      /* find middle element */

        /* Sort the first, middle, last elements into order */
        if (__COMPARE(context, lo, mid) > 0) {
            swap(lo, mid, width);
        }
        if (__COMPARE(context, lo, hi) > 0) {
            swap(lo, hi, width);
        }
        if (__COMPARE(context, mid, hi) > 0) {
            swap(mid, hi, width);
        }

        /* We now wish to partition the array into three pieces, one consisting
           of elements <= partition element, one of elements equal to the
           partition element, and one of elements > than it.  This is done
           below; comments indicate conditions established at every step. */

        loguy = lo;
        higuy = hi;

        /* Note that higuy decreases and loguy increases on every iteration,
           so loop must terminate. */
        for (;;) {
            /* lo <= loguy < hi, lo < higuy <= hi,
               A[i] <= A[mid] for lo <= i <= loguy,
               A[i] > A[mid] for higuy <= i < hi,
               A[hi] >= A[mid] */

            /* The doubled loop is to avoid calling comp(mid,mid), since some
               existing comparison funcs don't work when passed the same
               value for both pointers. */

            if (mid > loguy) {
                do  {
                    loguy += width;
                } while (loguy < mid && __COMPARE(context, loguy, mid) <= 0);
            }
            if (mid <= loguy) {
                do  {
                    loguy += width;
                } while (loguy <= hi && __COMPARE(context, loguy, mid) <= 0);
            }

            /* lo < loguy <= hi+1, A[i] <= A[mid] for lo <= i < loguy,
               either loguy > hi or A[loguy] > A[mid] */

            do  {
                higuy -= width;
            } while (higuy > mid && __COMPARE(context, higuy, mid) > 0);

            /* lo <= higuy < hi, A[i] > A[mid] for higuy < i < hi,
               either higuy == lo or A[higuy] <= A[mid] */

            if (higuy < loguy)
                break;

            /* if loguy > hi or higuy == lo, then we would have exited, so
               A[loguy] > A[mid], A[higuy] <= A[mid],
               loguy <= hi, higuy > lo */

            swap(loguy, higuy, width);

            /* If the partition element was moved, follow it.  Only need
               to check for mid == higuy, since before the swap,
               A[loguy] > A[mid] implies loguy != mid. */

            if (mid == higuy)
                mid = loguy;

            /* A[loguy] <= A[mid], A[higuy] > A[mid]; so condition at top
               of loop is re-established */
        }

        /*     A[i] <= A[mid] for lo <= i < loguy,
               A[i] > A[mid] for higuy < i < hi,
               A[hi] >= A[mid]
               higuy < loguy
           implying:
               higuy == loguy-1
               or higuy == hi - 1, loguy == hi + 1, A[hi] == A[mid] */

        /* Find adjacent elements equal to the partition element.  The
           doubled loop is to avoid calling comp(mid,mid), since some
           existing comparison funcs don't work when passed the same value
           for both pointers. */

        higuy += width;
        if (mid < higuy) {
            do  {
                higuy -= width;
            } while (higuy > mid && __COMPARE(context, higuy, mid) == 0);
        }
        if (mid >= higuy) {
            do  {
                higuy -= width;
            } while (higuy > lo && __COMPARE(context, higuy, mid) == 0);
        }

        /* OK, now we have the following:
              higuy < loguy
              lo <= higuy <= hi
              A[i]  <= A[mid] for lo <= i <= higuy
              A[i]  == A[mid] for higuy < i < loguy
              A[i]  >  A[mid] for loguy <= i < hi
              A[hi] >= A[mid] */

        /* We've finished the partition, now we want to sort the subarrays
           [lo, higuy] and [loguy, hi].
           We do the smaller one first to minimize stack usage.
           We only sort arrays of length 2 or more.*/

        if ( higuy - lo >= hi - loguy ) {
            if (lo < higuy) {
                lostk[stkptr] = lo;
                histk[stkptr] = higuy;
                ++stkptr;
            }                           /* save big recursion for later */

            if (loguy < hi) {
                lo = loguy;
                goto recurse;           /* do small recursion */
            }
        }
        else {
            if (loguy < hi) {
                lostk[stkptr] = loguy;
                histk[stkptr] = hi;
                ++stkptr;               /* save big recursion for later */
            }

            if (lo < higuy) {
                hi = higuy;
                goto recurse;           /* do small recursion */
            }
        }
    }

    /* We have sorted the array, except for any pending sorts on the stack.
       Check if there are any, and do them. */

    --stkptr;
    if (stkptr >= 0) {
        lo = lostk[stkptr];
        hi = histk[stkptr];
        goto recurse;           /* pop subarray from stack */
    }
    else
        return;                 /* all subarrays done */
}

Apparently, the answer has nothing to do with the minor tricks and improvements of the algorithm, as I had assumed beforehand. The true reason it’s significantly faster is that it uses fewer subcalls than I do. Calling a function seems to be very expensive. Knowing this, I’ve created a different algorithm using only the two mandatory subcalls (to itself). And guess what: it’s even faster than the std::qsort algorithm. A few tweaks later, here’s the fastest solution I came up with. It sorts 30 million integers in about 5.3 seconds (std::qsort: 13.6 s) and 100 million integers in 14 s (std::qsort: 41.7 s). So it’s roughly 2.5–3x as fast.

I leave it up to you to multi-thread this algorithm. It’s gonna be super fast 🙂

// ------------------------------------------------------------------------

void QuickestSort( int a[], int n ) {
	// nothing to do!
	if ( n <= 1 ) return;

	// pivot index position
	int pp = 0;

	// only for big arrays: pick the median of a[0], a[n/2], a[n-1]
	// (a[n-1], not a[n] - the latter would read past the end)
	if ( n > 100 ) {
		if ( a[0] > a[n/2] ) {
			if ( a[0] > a[n-1] ) pp = (a[n/2] > a[n-1]) ? n/2 : n-1;
			else pp = 0;
		}
		else {
			if ( a[n/2] > a[n-1] ) pp = (a[0] > a[n-1]) ? 0 : n-1;
			else pp = n/2;
		}
	}

	// move pivot to a[0]
	int t = a[0]; a[0] = a[pp]; a[pp] = t;

	// pivot value
	int p = a[0];

	// initial left and right pointer
	int i = 0; int j = n;

	while ( 1 ) {
		// from left to right, find an element not smaller than the pivot
		// (bound check before the read, so a[n] is never touched)
		while ( ++i < n && a[i] < p ) {}

		// from right to left, find an element not greater than the pivot
		// (a[0] == p acts as a sentinel, so j cannot underrun)
		while ( a[--j] > p ) {}

		// pointers crossed -> partition done
		if ( i >= j ) break;

		// swap the two out-of-place elements
		t = a[i]; a[i] = a[j]; a[j] = t;
	}

	// move pivot between the two partitions
	t = a[i-1]; a[i-1] = a[0]; a[0] = t;

	// divide & conquer
	QuickestSort( a, i-1 );
	QuickestSort( a+i, n-i );
}
By |November 19th, 2015|Uncategorized|0 Comments

Multithreaded Quicksort: Measurement

QS0: Quicksort single threaded
QS2: Quicksort multithreaded on first call (2 threads)
QSn: Quicksort fully threaded
t: Time in seconds
#: Number of items in 10^6

Algorithm   t      t      t      CPU    t      t      t
QS0         38,6   38,3   38,6   13%    99,3   98,2   100,0
QS2         23,8   21,5   38,0   15%    83,9   92,5   91,0
QSn         6,5    6,6    6,4    95%    15,9   15,5   15,3
#           30     30     30            50     50     50
By |November 19th, 2015|Uncategorized|0 Comments