
Computer Science Colloquium Series

Thursday, November 19, 2009

12:30-1:15

CCT 208

“Statistical Tools for Linking Engine-generated Malware to its Engine”

The lecture will be given by Edna Milgo, a graduate student in the TSYS Department of Computer Science who conducts research on malware.

Two filtering (decision support) methods for linking engine-generated malware to its engine are proposed and evaluated. The proposed methods use the n-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware instance as a feature vector for the instance.
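As a rough illustration of what an n-gram frequency vector might look like, here is a minimal Python sketch. The function name `ngram_frequency_vector`, the toy opcode sequence, and the fixed `vocab` list are assumptions made for this example; the abstract does not say how opcodes are extracted or how frequencies are normalized.

```python
from collections import Counter

def ngram_frequency_vector(opcodes, n, vocabulary):
    """Relative frequency of each n-gram in `vocabulary` within `opcodes`.

    Hypothetical helper: normalizing by the total number of n-grams is an
    assumption, not something stated in the abstract.
    """
    # Slide a window of length n over the opcode mnemonic sequence.
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    if total == 0:
        return [0.0] * len(vocabulary)
    # One entry per vocabulary n-gram, in vocabulary order.
    return [counts[g] / total for g in vocabulary]

# Toy opcode sequence and 2-gram vocabulary (made up for illustration).
opcodes = ["mov", "push", "mov", "push", "call"]
vocab = [("mov", "push"), ("push", "mov"), ("push", "call")]
nfv = ngram_frequency_vector(opcodes, 2, vocab)
print(nfv)  # [0.5, 0.25, 0.25]
```

Fixing the vocabulary up front keeps every program's vector the same length, so vectors from different programs can be compared entry by entry.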

The first method implements a Bayesian-like classifier that uses optimized 1-gram frequency vectors of programs as feature vectors. This method was successfully evaluated on a sample of benign programs and samples of malicious programs from the W32.Simile malware, yielding a filtering accuracy of 100% for certain feature selection choices.

The second method uses optimized 2-gram frequency vectors as feature vectors and classifies a suspect program by computing its proximity to the average of the NFVs of all training instances of a given malware family. This method was successfully evaluated on four malware-generating engines: W32.Simile, W32.Evol, W32.NGCVK, and W32.VCL. The evaluation yielded a set of four 17-tuples of real numbers, one serving as the signature of each engine, and achieved a 95% discrimination accuracy between a sample of benign programs and samples of malware instances generated by these engines. Accuracies of 94.8% were achieved for engine signatures of 6, 8, and 14 doubles.
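The idea of comparing a suspect program to a per-family average can be sketched as a nearest-centroid filter. The sketch below is only an illustration under stated assumptions: Euclidean distance, the `threshold` cutoff, the labels `engine_a` and `engine_b`, and the three-entry toy vectors are all hypothetical, since the abstract does not name the proximity measure or the decision rule.

```python
import math

def centroid(vectors):
    """Entry-wise average of a family's training NFVs (its 'signature')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def distance(a, b):
    """Euclidean distance; the choice of metric is an assumption."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(nfv, signatures, threshold):
    """Return the closest family, or 'benign' if none is within threshold."""
    best_family, best_dist = None, float("inf")
    for family, signature in signatures.items():
        d = distance(nfv, signature)
        if d < best_dist:
            best_family, best_dist = family, d
    return best_family if best_dist <= threshold else "benign"

# Toy training NFVs for two hypothetical engines (made-up numbers).
training = {
    "engine_a": [[0.5, 0.3, 0.2], [0.7, 0.1, 0.2]],
    "engine_b": [[0.1, 0.1, 0.8], [0.1, 0.3, 0.6]],
}
signatures = {family: centroid(vs) for family, vs in training.items()}

near_a = classify([0.6, 0.2, 0.2], signatures, threshold=0.2)
far_from_all = classify([0.33, 0.33, 0.34], signatures, threshold=0.1)
print(near_a, far_from_all)  # engine_a benign
```

In this sketch each signature is a short list of floats, analogous in spirit to the tuples of doubles mentioned above, and a program far from every signature is treated as benign.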

This work was inspired by successful methods for attributing natural language texts to their respective authors.

Refreshments will be served!!!