The rise in the popularity of the digital marketplace has driven a rise in online crime, manifesting itself in many ways, including: the spread of virus software, websites that "phish" for personal information such as bank account details, malicious software that is capable of logging keystrokes, the theft of information through "ransomware", the sending of spam emails to solicit purchase of non-existent goods and so on. This exploitation is often carried out by criminal communities with access to large networks of distributed computers, commonly referred to as "botnets". Law enforcement agencies regularly employ computer forensic techniques against these botnets and the criminal communities that control them. This battleground has become more sophisticated over time and the software that powers a botnet now regularly deploys a growing library of anti-forensic techniques to make analysis harder. This research examines what anti-forensic techniques are in use by botnets throughout the botnet life-cycle. A number of botnets were analysed in a "safe" environment through a series of controlled experiments, using both static code analysis and dynamic execution of the malware. Throughout each experiment, the different types of anti-forensic techniques in use were recorded, and an attempt was made to identify the point in the botnet life-cycle when they were used. The experiments showed that a wide variety of anti-forensic techniques are indeed in use by botnets, offering considerable challenge to the forensic investigator. A catalogue of these techniques was produced with an indication of the difficulty each technique might present to the analyst. Program packing (obfuscating the executable code of the botnet) proved to be the most common anti-forensic technique in use; it also presented the greatest difficulty to the forensic analysis process.
Many of the other anti-forensic techniques in use by the sample botnets were observed throughout the entire botnet life-cycle, suggesting that when protecting a botnet from forensic analysis, the author is not concerned with what stage of the life-cycle the botnet is in. A correlation was also observed between the quantity and overall difficulty level of the anti-forensic techniques in use by a botnet and the criminal success that botnet has "in the wild".
J. Dokler. Identification of User Information Needs Based in the Analysis of Local Web Search Queries. http://computing-reports.open.ac.uk/2008/TR2008-02.pdf. 2008. M801 MSC Dissertation, 2008/02,
Together with the emergence of the World Wide Web some sixteen years ago there came the promise of instant feedback from users as to what information they want and when they want it. This information could then be used to determine the content and structure of web sites. As is usually the case, reality proved to be more complicated. Although user feedback was indeed instantaneous, the analysis was mostly limited to what people looked at as opposed to what they were looking for. Only in recent years has research focused on analysis of search queries that users submit to search engines. And search queries come close to representing what users are looking for. Still the majority of this research is based on queries from general purpose search engines. In this dissertation I explore the findings and ideas coming from the work on general search engines (and in parts on local search engines) and try to apply them in the context of a single web site to improve the structure and content of this web site. In particular I explore the idea that it is possible to determine the top level content elements of a web site from the analysis of the local search queries. Based on this I then proceed to explore the periodic change of web site’s content and how this information can be used to improve the web site. By implementing two different methods of search query analysis (manual and automatic clustering) and examining the results I show that search query analysis is a viable potential source of information about a web site’s structure and that identification of periodic changes of content can be used to amend a web site in advance before the next change occurs.
D. Foreman. Evaluating semantic and rhetorical features to determine author attitude in tabloid newspaper articles. http://computing-reports.open.ac.uk/2008/TR2008-23.pdf. 2008. M801 MSC Dissertation, 2008/23,
This dissertation investigates the potential of families of machine learning features to improve the accuracy of a semantic orientation classifier that assesses attitudes of tabloid journalists towards the subjects of their opinion piece articles. A category of language, "language of judgement", is defined by which a journalist expresses an opinion matching his overall opinion of an article's subject matter. When the existence of "language of judgement" was investigated, high inter-annotator agreement on per-document author attitude was found (values of Fleiss' and Cohen's kappa were both 0.845) along with moderate agreement on per-sentence classification of judgemental or non-judgemental language (Fleiss' kappa of 0.507 and Cohen's kappa of 0.499). Three families of feature sets were defined to detect this language. The first family, "semantic features", motivated by consideration of theory of journalism, tags repetitions of nouns that are either located in particular sections of the article or occur multiple times in the article as potential language of judgement. The second and third families, "rhetorical features", draw on Mann and Thompson's Rhetorical Structure Theory. For the second family, rhetorical relations are tagged to indicate the presence of potential language of judgement. For the third family, rhetorical relations are considered to mark potential shifts into and out of language of judgement. Areas of articles between tags from the first family and tags from the second family are tagged with features from this third family, to indicate that the sentence is potentially within an area of language of judgement bounded by these rhetorical relations. The feature sets were not very productive in acquiring judgemental language, together or separately. Precision of 0.405 for combined features was low but exceeded the overall percentage of judgemental language (32.8 percent). Recall of 0.162 was very low.
While experimentation with the testing corpus did not give strong evidence for the value of the feature sets, cross-validation tests on the training corpus showed greater potential, achieving precision of 0.520 and recall of 0.200. Inspection of learning curves created with the training corpus for the combination of all features showed that learning of judgemental language was taking place. This was also true for the "rhetorical" second and third families when they were investigated separately but was not seen for the first family of features. Weaknesses in corpora construction methodology are considered potentially responsible for differences in results between corpora: suggested changes to remedy this, if more opinion piece articles can be collected, are described. When classifying per-document author attitude, using human-annotated language of judgement was seen to improve the accuracy of a semantic orientation classifier that used Turney's PMI-IR algorithm (in comparison to use of all language in a document). However, classification using language selected by the machine learning method did not lead to a similar improvement. The low precision and recall for acquisition of language of judgement obtained on testing corpus data is considered a likely cause of this.
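As a brief illustration of the agreement and retrieval measures quoted in this abstract, the following Python sketch computes Cohen's kappa for two annotators, and precision and recall for a classifier against gold labels. It is not the dissertation's code: the function names, the label values "J" (judgemental) and "N" (non-judgemental), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def precision_recall(predicted, gold, positive):
    """Precision and recall of `predicted` against `gold` for one class."""
    true_pos = sum(p == positive and g == positive
                   for p, g in zip(predicted, gold))
    predicted_pos = sum(p == positive for p in predicted)
    gold_pos = sum(g == positive for g in gold)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Hypothetical per-sentence labels, invented for this example only.
annotator_a = ["J", "J", "N", "N"]
annotator_b = ["J", "N", "N", "N"]
kappa = cohen_kappa(annotator_a, annotator_b)
precision, recall = precision_recall(annotator_a, annotator_b, positive="J")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is reported alongside, or instead of, a simple percentage of matching labels.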
A. Kemble. Forensic Computing: Use of Linux Log Data in USB Portable Storage Device Artefact Analysis. http://computing-reports.open.ac.uk/2008/TR2008-24.pdf. 2008. M801 MSC Dissertation, 2008/24,
Portable storage devices (PSDs) can be very useful but they pose a big security risk. News reports regularly describe companies and government departments losing personal and confidential data. The consequences can involve potential for identity fraud, contract termination and threats to national security. In the event of a suspected security breach an organisation may investigate to determine the extent of the problem and find those responsible. Most computer use results in artefacts remaining on the computer long after the activity occurred. These artefacts may be used in a forensic investigation to identify the actions that took place. In an investigation of USB portable storage device usage, the user, storage device, time of use and purpose would need to be determined to identify a case of misuse. A series of experiments were performed to study the data available on a Linux computer with various logging configurations. A forensic investigation method was adopted from the current literature and evolved during the project. The results show the default configuration of a given Linux distribution does not provide enough evidence to satisfy a forensic investigation into USB flash drive usage, but improvements can be made by modifying the logging software configuration. The project delivers an evaluation of the native Linux logging software and recommends the most effective option for recording PSD artefacts. The project also provides a tested investigation procedure that helps determine what PSD usage has taken place on a Linux computer.
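To give a flavour of the kind of log artefact analysis described in this abstract, the Python sketch below extracts USB-related kernel messages from syslog-style lines. It is only an illustration under stated assumptions: the abstract does not say which log files or formats were examined, the sample lines are invented, and the exact message layout varies between distributions, kernel versions and logging configurations.

```python
import re

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> kernel: ... usb <port>: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:.*?"
    r"usb (?P<port>[\d.-]+): (?P<message>.*)$"
)


def usb_events(lines):
    """Return (timestamp, port, message) for each USB kernel log line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            events.append(
                (match["timestamp"], match["port"], match["message"])
            )
    return events


def serial_numbers(events):
    """Collect device serial numbers reported in the extracted events."""
    prefix = "SerialNumber: "
    return [msg[len(prefix):] for _, _, msg in events if msg.startswith(prefix)]


# Hypothetical log excerpt, invented for this example only.
SAMPLE = [
    "Mar  3 10:15:02 host1 kernel: [ 1234.567890] usb 1-1: New USB device found, idVendor=0781, idProduct=5567",
    "Mar  3 10:15:02 host1 kernel: [ 1234.567891] usb 1-1: SerialNumber: 0123456789AB",
    "Mar  3 10:15:09 host1 sshd[411]: Accepted password for alice",
    "Mar  3 10:20:41 host1 kernel: [ 1573.000100] usb 1-1: USB disconnect, device number 5",
]

events = usb_events(SAMPLE)
serials = serial_numbers(events)
```

Lines such as these can indicate which device was attached and when, but not who used it or for what purpose, so on their own they fall short of the four items (user, device, time and purpose) that the abstract says an investigation needs.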
R. Livermore. A multi-agent system approach to a simulation study comparing the performance of aircraft boarding using pre-assigned seating and free-for-all strategies. http://computing-reports.open.ac.uk/2008/TR2008-25.pdf. 2008. M801 MSC Dissertation, 2008/25,
Achieving true efficiency is an important commercial driver for airlines and can be of huge value in differentiating them in a competitive marketplace. The aircraft boarding process remains a relatively unstudied area in this regard and is perhaps one of the few remaining standard airline operations where significant improvements may still be delivered. Studies to date have focused on improving the process by applying varying levels of control to passenger ordering as they enter the aircraft. However, passenger actions and interactions are, by their nature, governed by an element of chance and so the natural state of the boarding system tends towards randomness. In acknowledgement of this fact, this simulation-based study investigates the performance of the boarding process when controls are relaxed to a greater or lesser degree. It investigates whether multi-agent systems are appropriate for simulating stochastic processes by comparison with baseline results and whether they allow real conclusions to be drawn on the relative merits of different boarding systems. The results produced by this work cannot be statistically proven to be the same as the baseline and thus it cannot be said in this context that multi-agent systems are appropriate for simulating stochastic processes. However, in relative terms, the findings of this work do appear to follow the patterns hypothesised in earlier studies: that is, boarding using pre-assigned seating but with no correlation between the order in which passengers enter the aircraft and the position of their seats is preferable, over a range of different scenarios, to Free-for-All boarding. This has allowed useful future work to be identified that will ensure that the results presented in this study are built upon in a more comprehensive manner to develop a fuller picture of the types of passenger interaction and interference that cause differential performance across boarding strategies.
A. Nkwocha. Design Rationale Capture with Problem Oriented Engineering: an Investigation into the Use of the POE Framework for the Capture of Design and Architectural Knowledge for Reuse within an Organisation. http://computing-reports.open.ac.uk/2008/TR2008-26.pdf. 2008. M801 MSC Dissertation, 2008/26,
Design rationale in software engineering fills in the gaps between the original requirements of a system and the finished product, encompassing decisions, constraints and other information that influences the outcome. Existing research in this field corroborates the importance of design rationale for the evolution of existing systems and creation of new systems. Despite this, the practice of design rationale capture and reuse is not as extensive as could be expected, for reasons which include time and budget constraints and a lack of standards and tools. The capture of Design Rationale during software design activities carried out using Problem Oriented Engineering (POE) was demonstrated with the use of a case study. POE is a formal system for engineering design that provides a framework for the resolution of software problems in a stepwise manner. A review of literature on Design Rationale, its capture and management yielded a list of elements used as the criteria for identifying design rationale in the information gathered during the case study. Examination of that information revealed that all the identified elements were recorded and led to the conclusion that Design Rationale is captured when solving a software problem using POE. Examination of the flow of information that occurred during the execution of the case study led to the conjecture that Design Rationale recorded during the case study could be reused. Successful reuse would, however, depend on the effectiveness of the categorisation, storage and organisation of the information gathered.
I. Ostacchini. Managing assumptions during agile software development. http://computing-reports.open.ac.uk/2008/TR2008-27.pdf. 2008. M801 MSC Dissertation, 2008/27,
Software plays an increasingly critical role in our world, yet the assumptions that underlie software development often go unrecorded; these assumptions can fail at any time, with serious consequences. This research evaluates a lightweight approach to assumption management (AM), designed to complement the agile software development methods that are gaining in popularity. Key AM tasks were drawn from previous research, and implemented over three months within a small, agile software development team. A simple database was developed for recording and monitoring assumption information. Thirty-three assumptions were recorded during the three months; a further 17 failed assumptions were recovered from the preceding three months. Two key indicators were proposed for measuring whether AM had been successful. Only one of these indicators was detected in the research results; a longer research timeframe would be required for a more conclusive analysis. A number of strong correlations were found between properties of assumptions. While the data collected depended to a large degree on the subjective estimates of the author, these judgements were validated with some success by his colleagues. In some ways, assumption management was found to be a good fit for agile development; however, the AM process was not successfully integrated into the team's development process, due to a difficulty in adapting to the required 'assumption-aware' way of thinking. Advice is offered to researchers seeking to ease this transition, and to those looking to conduct further studies in assumption management.
A. Thorpe. Synthesising Test-based justification of Problem Oriented Software Engineering. http://computing-reports.open.ac.uk/2008/TR2008-28.pdf. 2008. M801 MSC Dissertation, 2008/28,
Problem Oriented Software Engineering (POSE) is a young framework supporting requirement and design specification. POSE allows a blend of formal and non-formal. Much of POSE research has been concerned with safety-critical systems, where a justification case is required by legislation. Hall et al. (2007a) suggested that an alternative, and as yet undefined, method of justification based on testing may be cheaper than the existing approach. Also, to date there has been no research into the relationship between testing and POSE. The project identifies an approach to test-based justification of POSE. I arrived at this through a synthesis of observations (based on writing a POSE specification and associated test designs), professional experience, and literature review. My approach has three incremental levels of justification detail, with each representing a decrease in risk, and an increase in cost, from the previous one. These levels are framed to describe the relationship between quality assurance, project management, development methodology and POSE. A by-product of this work has been a clearer understanding of the relationship between POSE and Testing within the software development life-cycle. This project is likely to be of interest to those using POSE for a development project, including quality assurance members, project managers, development managers, designers, testers, clients, and the POSE research community.
J. Tredgold. An assessment of the analytical benefits afforded by a timeline visualisation of Semantic Web data with temporal properties. http://computing-reports.open.ac.uk/2008/TR2008-29.pdf. 2008. M801 MSC Dissertation, 2008/29,
The vast amount of data on the World Wide Web is, for the most part, authored to be easy for humans to comprehend, rather than for machines to parse. This is good when humans want to read a page, but not so good when they want machines to search out information on their behalf, for instance, to find the best route and price for an upcoming trip. Activity characterised as the Semantic Web is attempting to create a web of structured data that can be utilised by network-based software applications. Data such as that found on Wikipedia is now available on that web. This project sought to evaluate some of the potential benefits of being able to build rich interactive applications to access and analyse data obtained from this new web. To do this a prototype timeline application was built backed with a subset of Semantic Web Wikipedia data. It was evaluated alongside traditional Wikipedia access, with results showing efficiency and accuracy gains. This suggests that, for a class of queries, the approach taken could provide a useful addition to the traditional routes of data discovery.