This document addresses two of the biggest problems in IT today. On the one hand there is the increasing plague of viruses and worms, which is estimated to cost the European economy about 9 billion euro in 2004. As the battle between virus writers and anti-virus companies gets harder, the means used become more technologically advanced. In this paper the basic mechanism of worms and viruses, called "exploiting", will be covered.
On the other hand, worms and viruses nowadays install backdoors in infected machines and form huge remote-controllable networks which are then used for massive spamming. The loss of working time and the hit on infrastructure in Europe this year will be about 9.2 billion euro according to one estimate. The second part of this document describes the cross-relations between worm writers and spammers and their basic approach to cooperation, as well as the theory behind the currently used spam filters.
So-called exploits are gates which open up systems to virus attacks. Basically this is about programming glitches which can be used to introduce malicious code. In this section the term "exploit" is explained in detail and the currently existing countermeasures are described.
According to the Jargon File, an exploit is defined as follows:
"exploit n. [originally cracker slang] 1. A vulnerability in software that can be used for breaking security or otherwise attacking an Internet host over the network."
In particular this is about programming glitches where some circumstances were not considered. For example, in C/C++ you define an array of 200 elements. If an attempt is made to write to an element beyond the end of the array, an exploitable condition can occur. So if some input handling asks the user to enter a number from 1 to 200 to change an element of the array, and it is not checked whether the entered number is in bounds, this can be used to write to memory at positions where the program does not expect it.
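The missing check described above can be sketched as follows. This is only an illustration: Python already checks list bounds on its own, so the explicit test stands in for the check a C/C++ program would have to perform itself. The names `ARRAY_SIZE` and `set_element` are assumptions made for this example.

```python
ARRAY_SIZE = 200  # matches the 200-element array from the text


def set_element(values, user_position, new_value):
    """Store new_value at a 1-based position, but only after validating it."""
    # This is the check that is missing in the exploitable scenario.
    if not isinstance(user_position, int) or not 1 <= user_position <= ARRAY_SIZE:
        raise ValueError(f"position must be between 1 and {ARRAY_SIZE}")
    values[user_position - 1] = new_value


values = [0] * ARRAY_SIZE
set_element(values, 200, 42)      # last valid position is accepted

try:
    set_element(values, 201, 99)  # rejected instead of writing out of bounds
    rejected = False
except ValueError:
    rejected = True
```

In C/C++ the same comparison would have to be written before every array access that depends on external input, since the language itself does not perform it.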
The exploits that virus and worm writers deal with are mostly bugs in programs like Microsoft Outlook or in the operating system itself. As these consist of millions of lines of code, there are plenty of opportunities to find an exploit.
Basically this is about injecting executable code into the buggy program; this code is then executed and makes it possible to copy the actual program (such as a worm) onto the system.
Of course it is crucial to avoid these exploitable bugs, and there are several means to do so.
Of course, making no errors in code would disable all exploits, but for every thousand lines of code there are an estimated 100 to 150 bugs. Even with extreme caution, up to five undiscovered errors still remain in this amount of code.
To minimize the probability of a program being exploited, there are several types of software.
From the start, a programmer can ensure a low bug rate by making it a habit to check all external information such as user input or network communication. But there may still remain situations where such input is used without checks or where an exact examination is not possible.
For this case there are various quality assurance tools, which are mostly called "lint" tools. Besides their original purpose of detecting style glitches, there is one implementation called "splint" which specializes in recognizing dangerous parts in software and which covers the C programming language.
Tracing security concerns in software is a heavy task, and as the complexity of the programming language increases, the chances of detecting problems decrease. This is the reason why there are few such tools for more advanced languages.
Commercial CASE tools sometimes provide support for this, which is then called "auditing". One example of such a tool is Borland Together Control Center.
Even if a program is already written, there can still be some way of securing it. For a long time, tools like StackGuard and StackGhost have been available which are invoked as preprocessors to compilers and modify dangerous code parts so that they are less likely to be exploitable.
Mainly these add-ons try to randomize the address space (known as Address Space Layout Randomization or ASLR). By randomizing the position of variables in the compiled binary, it becomes much harder to guess which exact measures have to be taken to exploit a program.
In addition, on every call of a subprogram a cookie (also called a canary) is placed onto the stack. When the subprogram returns, it is checked whether the cookie is still present or was overwritten by some exploitation technique. If it was overwritten, the whole application is closed for security reasons, which prevents the execution of probably malicious code. But sadly there are still ways to circumvent this kind of protection.
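The cookie check can be illustrated with a toy model. The sketch below does not touch a real stack; it simulates a frame as a byte array with a buffer followed by the cookie, and all names and sizes (`BUFFER_SIZE`, `CANARY_SIZE`, `call_with_canary`, `CanaryViolation`) are assumptions made for this example.

```python
import secrets

BUFFER_SIZE = 8   # bytes reserved for the local buffer (hypothetical size)
CANARY_SIZE = 4   # bytes of cookie placed directly behind the buffer


class CanaryViolation(Exception):
    """Raised when the cookie was overwritten during the call."""


def call_with_canary(user_input, canary=None):
    """Simulate a subprogram call whose frame is guarded by a cookie."""
    if canary is None:
        canary = secrets.token_bytes(CANARY_SIZE)  # random, unknown to an attacker
    frame = bytearray(BUFFER_SIZE) + bytearray(canary)

    # Unchecked copy into the buffer, as a careless C routine would do it.
    for i, byte in enumerate(user_input[:len(frame)]):
        frame[i] = byte

    # On return: compare the cookie with the value stored at call time.
    if bytes(frame[BUFFER_SIZE:BUFFER_SIZE + CANARY_SIZE]) != canary:
        raise CanaryViolation("cookie overwritten, aborting")
    return bytes(frame[:BUFFER_SIZE])


fixed = b"\x01\x02\x03\x04"  # fixed cookie so the demonstration is repeatable
short_result = call_with_canary(b"hi", canary=fixed)

try:
    call_with_canary(b"A" * 12, canary=fixed)  # overflows into the cookie
    detected = False
except CanaryViolation:
    detected = True
```

An input that fits into the buffer leaves the cookie intact, while the 12-byte input overwrites it and is detected before the call returns.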
Most exploits can be avoided by using a programming language which works with virtual machines. Virtual machines are some sort of abstraction of the actual computer, either by completely simulating a computer or by separating the executed binary code from the code the machine knows. In addition, virtual machine-based languages automatically check the bounds of arrays to prevent exploitable parts (see bounds checking in Java and .NET).
The most modern variants of operating systems support some form of prevention against the injection of foreign code into applications. For example, Windows XP Service Pack 2 does this in two ways.
On the one side it supports the DEP/NX features of 64 bit processors (see below), and on the other side the Service Pack was compiled with a special configuration of Microsoft's own C/C++ compiler which uses so-called "cookies" to identify possible exploitation (see Compiler Add-ons above).
The problem with this feature is that it only prevents core functions of Windows from being exploited. Software from other vendors cannot benefit from it unless it is compiled with the cookie feature, which is only present in the most recent versions of Microsoft's compiler and in special gcc versions (StackGuard, SSP, ProPolice).
The same is true of PaX for Linux, which is a patch to be included in the main Linux kernel in order to secure it. But because of the more open infrastructure (gcc usage is almost mandatory and free of charge), it is very likely to be adopted throughout the whole community in quite a short time. Merging of the PaX support is said to occur within the 2.6 line of the Linux kernel.
Data Execution Prevention (DEP) describes a technique to protect programs from being exploited by marking memory regions as "non-executable". Due to the architecture of this protection, it is only usable on computers with 64 bit wide page table entries. The most current CPUs to allow DEP are the Intel Itanium and the AMD Athlon64, both being true 64 bit processors.
It works by using a reserved bit in the page table entries to mark areas as non-executable. As this bit is the highest bit in the 64 bit page table entries, DEP is only usable in 64 bit mode. For 32 bit mode there is currently only an implementation on servers which have the Physical Address Extension (PAE) enabled.
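The role of the highest bit can be sketched with plain integer arithmetic. The example below treats a page table entry as a 64 bit integer and ignores all other fields of a real entry; the helper names are assumptions made for this illustration.

```python
NX_BIT = 1 << 63  # highest bit of a 64 bit page table entry


def mark_non_executable(entry):
    """Return the entry with the no-execute bit set."""
    return entry | NX_BIT


def is_executable(entry):
    """A page may be executed only if the no-execute bit is clear."""
    return (entry & NX_BIT) == 0


code_page = 0x0000_0000_0040_0000                       # hypothetical entry value
data_page = mark_non_executable(0x0000_0000_0080_0000)  # hypothetical entry value

# A 32 bit wide entry has no room for bit 63, which is why 64 bit wide
# entries (64 bit mode or PAE) are required.
fits_in_32_bit_entry = NX_BIT.bit_length() <= 32
```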
On 32 bit platforms only the upcoming AMD Sempron processor will be able to use DEP techniques, under the name Enhanced Virus Protection (EVP), which is a bit misleading. As we have seen, DEP does not provide any anti-virus functionality but only limits the possibility of using exploits. This support on a 32 bit platform is possible because the Sempron processors are based on the Athlon64 and are merely stripped of the 64 bit mode, while support for 64 bit page tables remains.
Another method for preventing the execution of viruses and worms is still in preparation. The so-called Trusted Computing (TC) Initiative is a group of big hardware vendors which try to address all issues of Digital Rights Management (DRM), copy protection, confidentiality and security. In a TC-enabled computer the operating system only executes verified programs and denies execution of non-signed ones. Thus only the applications which were certified by e.g. Microsoft will be able to install and start up on a computer. As no worm or virus will ever get a valid signature, their execution in a TC environment will be impossible.
While this sounds good, Palladium and TC raise huge privacy concerns. In these secured environments the tracking of office documents is possible, private copying of CD-ROMs (which is probably allowed) is no longer possible, and most likely small software vendors and open source programmers will not be able to raise the money needed for certification and would be locked out forever.
The original term used for mass mailings that the receiver did not request was "Unsolicited Bulk Email", or "UBE" for short. Later on, the word "Bulk" (used for classifying mass mailings) was replaced by "Commercial", as most of these mails were and still are used to advertise goods.
Nowadays the term "spam" is used for these mails, as it is more memorable and abbreviations normally are not as handy. The original meaning of spam is "spiced pork and ham"; it describes canned meat which was very popular in times of World War II, when it became the primary source of food for the U.S. troops as well as the residents of the United Kingdom. After years of eating spam, as it was not rationed, people were fed up with it, and the famous comedians of Monty Python made a sketch about spam in which the word is mentioned dozens of times and is always present to annoy.
This property of being annoying in the sketch started the usage of "spam" for describing unwanted mailings, as these were just as annoying and present everywhere.
The estimates of the number of spam mails sent vary, but the value of 25 million per day sounds fairly reasonable. Other sources claim that about 32% of all mails sent are spam, costing $874 per person in lost productivity.
The amount of spam an email user receives depends on many factors, such as his participation on the net (especially in newsgroups) and his own handling of his address. As most sites nowadays require registration, and this usually includes the user's email address, users who register at many sites will get more spam due to the selling of personal data.
So the share of spam can vary between 10% and 90% (as in the inbox of this author).
Although one normally would not think so, there are strong connections between the writers of worms and viruses and spam nowadays. To explain this we have to take a short glance at the way spam works.
Spammers usually do not send any mail under their own address because of the replies they would get from angry users. So they use fake addresses which do not even exist and are chosen at random. This is possible because of design flaws in the protocol used to send e-mails on the Internet, which stem from the early times when the Internet was a network of trust.
The basic procedure for sending mail is: connect to a mail server, enter the recipient, enter additional information (sender email) and the mail itself. As we can see, there is no authentication or checking involved, and a malicious sender is able to simply use some nonexistent or foreign address.
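The procedure above corresponds to a short sequence of SMTP commands. The sketch below only builds this sequence as a list of strings and sends nothing; the function name `smtp_dialogue` and all addresses and host names are placeholders chosen for this example. It illustrates that the sender address is just a text field supplied by the client.

```python
def smtp_dialogue(sender, recipient, body):
    """Return the client side of a plain SMTP session as a list of lines."""
    return [
        "HELO client.example",      # introduce the client to the server
        f"MAIL FROM:<{sender}>",    # sender address: supplied by the client, not verified
        f"RCPT TO:<{recipient}>",   # recipient of the mail
        "DATA",                     # announce the message content
        body,
        ".",                        # a single dot ends the message
        "QUIT",
    ]


dialogue = smtp_dialogue("anyone@example.org", "user@example.com", "Hello")
```

Nothing in this exchange requires the client to prove that it owns the address given in `MAIL FROM`, which is the design flaw the text refers to.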
Of course it would be problematic if spammers used their own Internet provider's server for this purpose, and so they hunt for so-called "open relays" on the net. Open relays are misconfigured servers which accept mail for any domain and deliver it, instead of only accepting mail for the domain they are responsible for. Finding new open relay servers is rather hard, and open servers often get a new, secure configuration within days because the huge amount of mail is detected.
So spammers need another source of mail relays on the net. The easiest way to get one is to write specialized worms which open up a backdoor on infected machines, which can then be used to send spam.
On the one side there are worms which open up an ordinary SMTP server without the user of the computer knowing. A spammer then just has to connect to this SMTP service and can start sending mails. But there is a drawback: ordinary users normally have a rather slow connection compared to real mail servers, as they may use an old modem or ISDN for connecting. So a spammer who uses these backdoor worms has to use several at once, for example 20 or more infected machines in parallel, to get the desired throughput, which is usually above 500,000 mails per hour, as this is the speed of conventional mass mailing software.
Finding these widely distributed infected machines can be a pain, as they are spread all over the world and some of them are probably of little use because of slow speed or high packet loss. So the most sophisticated spam-related viruses use a much more modern technique.
Modern spam worms establish a complete peer-to-peer network, using modern algorithms from serious P2P research. They add an encryption layer to these, and then these large networks can be utilized for sending out massive spam waves.
There are custom tools for scanning the Internet for infected machines and then hooking into the established network, so only one machine has to be detected and the full power of thousands of machines is open to the mass mailer. To utilize these, the spammer can actually choose how many of the nodes should be used for his task (he is limited by his own upload bandwidth anyway) and then assign the individual mail delivery based on the detected bandwidth of each infected machine.
Most recently some kind of personal feud broke out between the writers of the Bagle, Netsky and MyDoom viruses. This is not only because Bagle and MyDoom are written to exploit the same bugs in Windows operating systems, but rather because the author of Netsky (who is claimed to be a hobbyist rather than a spam group member like the others) implemented his worm to delete existing instances of Bagle and MyDoom on infected machines. So the P2P networks of Bagle and MyDoom experience a high fluctuation rate and become less useful, as most of the worms' resources have to be used to battle the other worms instead of finding new victims. This personal feud can be clearly observed in excerpts of the worm code where the authors harass each other.
Information about the exact techniques used to build up and secure the P2P networks, as well as about the actual usage and the "cost" of the fluctuation on the networks, is of course hardly available. It is important for the spam groups and virus writers to keep this information secret. This is not only because anti-virus companies would be able to detect and clean infected machines better, but also to protect the networks from being used by rival groups.
There have been many attempts at filtering spam to re-establish a good overview of personal inboxes. In the early days of spam filters the technique of good/bad word lists was very common. For this, extensive lists of words that are unique to spam (bad list) and words that never occur in spam (good list) were kept. For every received mail, a program looked through it and classified the mail as spam if a certain number of bad words was detected.
Later on, when the senders of mass mailings adapted, this method started to fail.
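The word-list approach can be sketched in a few lines. The word list, the threshold and the function name below are assumptions made for this illustration, and the good list is left out for brevity.

```python
import re

BAD_WORDS = {"viagra", "casino", "lottery"}  # hypothetical bad list
THRESHOLD = 2                                # hypothetical number of hits needed


def is_spam(text, bad_words=BAD_WORDS, threshold=THRESHOLD):
    """Classify a mail as spam if it contains enough words from the bad list."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for word in words if word in bad_words)
    return hits >= threshold


spam_verdict = is_spam("Win the lottery at our casino")
ham_verdict = is_spam("Lunch at noon?")
```

The weakness is visible directly in the code: a sender only has to avoid or disguise the listed words to stay below the threshold.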
This is where the most recent technology of spam filters comes into play. The most popular is the Bayesian filter, which is based on probabilities and statistical data and is even able to learn the user's habits.
Bayes filters (also called Bayesian filters) use knowledge from statistics and probability theory to estimate whether an email contains unwanted content. These filters are the most-used type of anti-spam utility at the moment and already provide very high recognition rates with a low number of false positives.
Bayesian filters are named after Thomas Bayes (1702-1761), an English mathematician who did research in the area of probability, developing a theory on the inference of probabilities. The best-known formula which originated from him is
P(A|B) P(B) = P(A,B) = P(B|A) P(A)
which describes the relationship between the probabilities of two events, where P(A|B) is the probability of event A when event B has already happened and P(A,B) is the probability of both events having happened.
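Dividing the identity above by P(B), assuming P(B) > 0, gives the form that a spam filter can use. As an illustration only, read A as "the mail is spam" and B as "the mail contains a given word"; this reading is an assumption added here, not part of the formula itself.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
```

The quantities on the right-hand side can all be estimated by counting words in mails that have already been classified.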
The basic idea of a Bayesian filter is that a mail is not filtered because of some words on a blacklist; instead, the filter calculates the probability that a given mail is spam. For this purpose a Bayesian filter first has to learn about the habits of its user by tracking outgoing mail and the user's decisions on "spam or no spam" for incoming mail. With sufficient input the Bayes filter then starts to calculate probabilities for word combinations ("buy" and "viagra" in a mail will most likely indicate spam). There are some words which classify a mail almost certainly as spam. Although naive Bayesian filters are quite easy to implement, there is more math involved in more modern approaches to maximize the recognition rate.
By this means the filter adapts to the user's habits and his mail dialogues in a way that is generally called "context sensitive". Thus a urologist who uses "viagra" in non-spam mails will get an adaptive filter which does not touch important mail.
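A naive Bayesian filter of the kind described above can be sketched as follows. This is a minimal illustration with Laplace smoothing, not the method of any particular product; the class name, the tokenizer and the tiny training set are assumptions, and the sketch assumes that at least one spam and one ham mail have been trained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a mail into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from a mail that the user classified as 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.mail_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words), assuming both classes have been trained."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_mails = sum(self.mail_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.mail_counts[label] / total_mails)  # prior
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1  # Laplace smoothing
                score += math.log(count / (total_words + len(vocabulary)))
            log_score[label] = score
        # Convert the two log scores back into a normalized probability.
        highest = max(log_score.values())
        weight = {k: math.exp(v - highest) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["ham"])


mail_filter = NaiveBayesFilter()
mail_filter.train("buy viagra now", "spam")
mail_filter.train("cheap viagra buy", "spam")
mail_filter.train("viagra dosage for patient", "ham")   # the urologist's mail
mail_filter.train("meeting tomorrow at noon", "ham")

p_advert = mail_filter.spam_probability("buy viagra")
p_medical = mail_filter.spam_probability("viagra dosage for patient")
```

With this training data the advertisement scores well above 0.5, while the medical mail stays below it even though it contains "viagra", which is the context-sensitive behaviour described in the text.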
There are many ways to improve the described naive Bayesian filters. For example, an advanced filter could recognize conjugated verbs as being of the same origin. More modern filters also have to know Hypertext Markup Language (HTML), as most spam mails today arrive in HTML form. To cloak words, spammers interrupt them with (undefined) HTML tags. As a browser ignores these tags, the word looks fine in the mail reader, while a more naive filter will not recognize it. In addition, by defining so-called "magnets", a mail will immediately be sorted into either spam or ham (wanted mails) if the magnet word is encountered. This usage of a magnet can be very important, for example to sort press releases into the wanted mail folder.
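The cloaking trick and a simple countermeasure can be sketched like this. The regular expression below is a crude assumption made for illustration; a real filter would more likely use a proper HTML parser.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]*>")  # anything between angle brackets


def strip_tags(html_text):
    """Remove HTML tags so that interrupted words are joined again."""
    return TAG_PATTERN.sub("", html_text)


cloaked = "Buy vi<xyz>ag</xyz>ra now"
visible_text = strip_tags(cloaked)
```

After stripping, the filter sees the same text as the reader of the mail, so the word statistics apply to the words that are actually displayed.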
For the basic learning strategy of filters there are three categories:
In this way of learning the filter modifies the word probabilities based on its own classification. Thus it is always confirming its own decisions and can adapt more easily to new, varying spam.
On the more user-oriented side, the automatic learning of the filter is turned off. Usually the user, on noticing that the classification rate decreases, moves more recent spam and ham messages into the filter to adjust the probabilities.
The difference-learning approach uses two folders, into which the user sorts wrongly classified spam and approved ham. The learning of the filter then relies solely upon these two folders.
The most advanced spam currently cannot be filtered by Bayesian filters, as they rely on the text in a mail and do not check images. There are already some mailings which show the spam-like contents in an image while the text consists only of typical ham words; thus these mails pass the filters untouched.
[AC02] Mario Juarez et al Amy Carroll. Microsoft palladium: A business
Daniel Bachfeld. Surf-versicherung. cT Magazin fuer Computertechnik, page 105.
[Bus04] Peter Busser. Linux Magazine, 2004.
[Dyn04] DynamicSoftware. Faq for mail communicator. 2004.
[Gra02] Paul Graham. Better bayesian filtering. 2002.
[Inc03] Nucleus Research Inc. Spam: The silent roi killer. 2003.
JargonFile. The jargon file, a comprehensive compendium of hacker slang.
[Kru03] Karl S. Kruszelnicki. Great moments in science - software sucks. 2003.
[Lew04] David D. Lewis. (naive) bayesian text classification for spam filtering.
[Lia04] Zhenkai Liang. Defensing stack smashing attack. 2004.
Andreas Linke. Spam oder nicht spam. cT Magazin fuer Computertechnik, pages 150–153.
[Mic00] Sun Microsystems. The java language specification, second edition.
Net-Security.org. The creators of bagle, mydoom and netsky exchange pleasantries. 2004.
[Ric02] Gerardo Richarte. Four different tricks to bypass stackshield and stack-
[Stu04] IDC Studies. The true cost of spam and value of anti-spam solutions.
[Wae03] Graeme Waerden. Eu to lose billions through spam and viruses. 2003.
[Wei03] Kai Wei. A naive bayes spam filter. 2003.