== Design Notes ==

This document serves the following purposes:
* It discusses the high-level concepts of an experiment life cycle. This design is preliminary and constantly evolving; this document will be updated periodically.
* Building on the concepts and constructs within the experiment life cycle, it describes how ELM integrates with fedd, SEER, and CEDL.
* It discusses goals for the August review.

== Overview ==
The current experimental testbed services primarily focus on providing experimenters access to testbed resources, with little or no help to configure, correctly execute, and systematically analyze the experiment data and artifacts. Additionally, while it is well known that experimentation is inherently iterative, there are limited mechanisms to integrate and cumulatively build upon experimentation assets and artifacts during the configure-execute-analyze phases of the experiment lifecycle.

The Eclipse-based ELM plug-in provides an integrated environment with a (large) collection of tools and workbenches to support and manage artifacts from all three phases of the experiment life cycle. Each workbench or perspective integrates several tools to support a specific experimentation activity. It provides a consistent interface for easy invocation of tools and tool chains, along with access to data repositories to store and recall artifacts in a uniform way. For example, the topology perspective allows the experimenter to define a physical topology by merging topology elements based on the specified constraints and then validating the resultant topology.
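
The sketch below illustrates what such a merge-and-validate step could look like. The fragment format and function names are hypothetical, chosen only to make the idea concrete; they are not the plug-in's actual interfaces.
<pre>
# Hypothetical sketch of merging topology fragments and validating the result.
# The fragment format and helper names are illustrative, not ELM's actual API.

def merge_fragments(fragments):
    """Union the nodes and links of several topology fragments."""
    merged = {"nodes": set(), "links": set()}
    for frag in fragments:
        merged["nodes"] |= set(frag["nodes"])
        merged["links"] |= set(frag["links"])
    return merged

def validate(topology):
    """Check that every link references nodes defined in the topology."""
    errors = []
    for a, b in topology["links"]:
        for endpoint in (a, b):
            if endpoint not in topology["nodes"]:
                errors.append("link (%s, %s) references undefined node %s" % (a, b, endpoint))
    return errors

attack_net = {"nodes": {"attacker1", "gw"}, "links": {("attacker1", "gw")}}
server_net = {"nodes": {"gw", "ids", "server"}, "links": {("gw", "ids"), ("ids", "server")}}

topology = merge_fragments([attack_net, server_net])
assert validate(topology) == []   # the merged topology is internally consistent
</pre>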

The key capabilities of the ELM plug-in include:
* Mechanisms to record variations and derivations of the experiment assets and artifacts, along with their inter-relationships, for the entire set of tasks over which an experimenter iterates during the study.
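
One lightweight way to keep such derivation records, sketched below with hypothetical names and fields, is to give every artifact explicit links to the artifacts it was derived from, so that the full lineage of any variation can be reconstructed.
<pre>
# Illustrative sketch of recording derivations between experiment artifacts.
# The class and field names are hypothetical; ELM's records would be richer.
import datetime

class Artifact:
    def __init__(self, name, kind, derived_from=None, note=""):
        self.name = name
        self.kind = kind                        # e.g. "topology", "dataset", "script"
        self.derived_from = derived_from or []  # parent artifacts
        self.note = note
        self.created = datetime.datetime.now()

    def lineage(self):
        """Walk parent links to reconstruct how this artifact was derived."""
        chain = []
        for parent in self.derived_from:
            chain.extend(parent.lineage())
        chain.append(self.name)
        return chain

base = Artifact("topo-v1", "topology")
scaled = Artifact("topo-v2", "topology", [base], note="doubled attacker count")
print(scaled.lineage())   # ['topo-v1', 'topo-v2']
</pre>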

* Informed design and analysis tools to obtain maximum information from the minimum number of experiment trials for a particular study. Every measured value in an experiment is fundamentally a random variable, so measurements vary slightly across trials even when all experimentation factors are kept constant. To characterize such stochastic behavior, it is necessary to execute multiple repetitions and establish confidence levels. Leveraging the tools in the analysis phase, feedback from that phase can be used to control the number of repetitions required for statistically significant results.
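
A back-of-the-envelope sketch of how such feedback could decide whether more repetitions are needed is shown below. It uses a plain normal approximation (z = 1.96 for 95% confidence) and an arbitrary 5% precision target; a real tool would use a t-distribution for small samples and study-specific thresholds.
<pre>
# Sketch: decide whether more repetitions are needed for a desired precision.
# Normal approximation only; thresholds are illustrative, not ELM defaults.
from statistics import mean, stdev

def needs_more_trials(samples, relative_precision=0.05, z=1.96):
    """True if the 95% confidence half-width exceeds 5% of the sample mean."""
    m = mean(samples)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return half_width > relative_precision * abs(m)

response_times = [101.2, 98.7, 103.5, 99.9, 102.1]   # ms, from repeated trials
print(needs_more_trials(response_times))             # False: precision reached
</pre>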

* Facilitate composition of functional and structural elements of the experiment based on stated and unstated constraints. The ELM workbenches allow creating and linking functional elements of the experiment without specifying the underlying structure and topology. Resolving the constraints to configure a set of realizable and executable experiment trials is a complex constraint satisfaction problem.
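
As a minimal illustration of the composition problem, the sketch below enumerates candidate trial configurations and keeps only those that satisfy a stated constraint. The parameters and the constraint are invented for the example, and brute-force enumeration stands in for whatever constraint solver ELM ultimately uses.
<pre>
# Sketch: compose realizable trials by brute-force constraint filtering.
# A production tool would use a real CSP solver; names here are illustrative.
from itertools import product

parameters = {
    "attacker_count": [1, 5, 10],
    "link_bandwidth_mbps": [10, 100],
    "ids_mode": ["inline", "mirror"],
}

def satisfies_constraints(trial):
    # Example stated constraint: an inline IDS cannot be tested on the slow link.
    if trial["ids_mode"] == "inline" and trial["link_bandwidth_mbps"] < 100:
        return False
    return True

names = list(parameters)
trials = [dict(zip(names, values)) for values in product(*parameters.values())]
realizable = [t for t in trials if satisfies_constraints(t)]
print(len(trials), "candidate trials,", len(realizable), "realizable")   # 12, 9
</pre>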

* Facilitate experiment monitoring and analysis for accuracy of results and availability of resources and services. ELM+SEER will enable monitoring the experiment configuration and the performance of resources to ensure the experiment is executed correctly. While resource misconfiguration and outright failures are relatively easy to spot, identifying "incorrect performance" of a resource or service is extremely hard. For stochastic processes, as typically seen in networked systems, it is very important to be able to identify such experimentation errors, as they can significantly impact results and bias measurements.
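
One simple form such a check could take, sketched below with invented metrics and thresholds, is to flag a resource whose measurements drift far from a known-good baseline; this is only an illustration of the idea, not part of ELM or SEER.
<pre>
# Sketch: flag a resource whose measurements drift far from its baseline.
# Metric names and the z-score threshold are illustrative only.
from statistics import mean, stdev

def looks_anomalous(baseline, observed, z_threshold=3.0):
    """True if observed is more than z_threshold std. deviations from the
    baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold

baseline_throughput = [94.1, 95.3, 93.8, 94.9, 95.0]   # Mbps, known-good runs
print(looks_anomalous(baseline_throughput, 94.5))      # False: within range
print(looks_anomalous(baseline_throughput, 60.0))      # True: likely a faulty link
</pre>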

* Enable reuse of experiment assets and artifacts. Reuse is driven by the ability to discover the workflows, scenarios, and data. The ELM environment will provide registry and registry views, along with (RDF-based, DAML+OIL) metadata to facilitate the discovery process. ELM will provide tools to index and search semantically rich descriptions and identify experimentation components including models, workflows, services and specialized applications. To promote sharing, ELM will provide annotation workbenches that allow experimenters to add sufficient metadata and dynamically link artifacts based on these annotations.
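
The toy registry below only illustrates the discover-by-annotation idea; the entry names are made up, and the real ELM registry would store RDF-based descriptions with far richer semantics than flat tags.
<pre>
# Toy sketch of a registry keyed by metadata annotations (illustrative only;
# the real registry would use RDF-based descriptions rather than flat tags).
registry = []

def register(name, kind, tags):
    registry.append({"name": name, "kind": kind, "tags": set(tags)})

def discover(kind=None, tags=()):
    """Return names of entries matching the requested kind and tags."""
    hits = registry
    if kind is not None:
        hits = [e for e in hits if e["kind"] == kind]
    wanted = set(tags)
    hits = [e for e in hits if wanted <= e["tags"]]
    return [e["name"] for e in hits]

register("syn-flood-model", "model", ["attack", "volume", "tcp"])
register("ids-response-workflow", "workflow", ["ids", "latency"])
print(discover(kind="model", tags=["attack"]))   # ['syn-flood-model']
</pre>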

* Support for multi-party experiments, where a particular scenario can be personalized for a team in an ''appropriate'' way by providing restricted views and control over only certain aspects of the experiment. The registry view will allow the team to access only a restricted set of services. The analysis perspectives and views will present relevant animations and graphs to the team. Thus, by personalizing a scenario view, the same underlying scenario can be manipulated and observed in different ways by multiple teams.

We define a '''scenario''' to encompass related experiments used to explore a scientific inquiry. The scenario explicitly couples the experimenter's '''intent''' with the '''apparatus''' to create a series of related experiment trials.
The experimenter's intent is captured as '''workflows''' and '''invariants'''. A workflow is a sequence of interdependent actions or steps, and invariants are properties of an experiment that should remain unchanged throughout the lifecycle. The '''apparatus''', on the other hand, includes the topology and services that are instantiated on the testbed during the execution phase. Separating the experimentation intent from the apparatus also enables experiment portability, where the underlying apparatus could consist of heterogeneous, abstract, virtualized experiment elements.
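
The sketch below is one hypothetical way to make this separation concrete in code: the intent (workflows and invariants) is held apart from the apparatus, so the same intent could in principle be re-realized on different testbed resources. The class and field names are invented for illustration.
<pre>
# Hypothetical sketch of the scenario structure described above: intent
# (workflows and invariants) is kept separate from the apparatus.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Workflow:
    steps: List[str]                   # ordered, interdependent actions

@dataclass
class Invariant:
    description: str
    check: Callable[[Dict], bool]      # evaluated against trial state

@dataclass
class Apparatus:
    topology: Dict                     # nodes/links instantiated on the testbed
    services: List[str]

@dataclass
class Scenario:
    workflows: List[Workflow]
    invariants: List[Invariant]
    apparatus: Apparatus
    trials: List[Dict] = field(default_factory=list)

scenario = Scenario(
    workflows=[Workflow(["start services", "launch attack", "collect traces"])],
    invariants=[Invariant("IDS stays up", lambda s: s.get("ids_up", False))],
    apparatus=Apparatus(topology={"nodes": [], "links": []}, services=["ids", "web"]),
)
</pre>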

== Steps for creating an experiment ==
Given the above ELM environment, the basic process of creating a scenario consists of the following steps, iterated in a spiral (a sketch of this spiral follows the phase lists below):

'''Composition Phase'''
* defining the functional components and functional topology of the study
* defining the abstractions, models, parameters, and constraints for each functional component
* identifying/defining the experiment workflow and invariants
* identifying/defining the structural physical topology
* composing the experiment trials by resolving the constraints and exploring the parameter space

'''Execution Phase'''
* sequential or batched execution of experiment trials
* monitoring for errors and configuration problems

'''Analysis Phase'''
* analyzing completed trials (some trials may still be executing)
* presenting results to the experimenter
* feeding analysis parameters back into the composition tools
* annotating data and artifacts and storing them in the repositories
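
The sketch below captures only the control flow of this spiral under the assumption that each phase can be treated as a callable step; the phase functions are placeholders for the corresponding ELM workbenches, not real interfaces.
<pre>
# Sketch of the compose-execute-analyze spiral; phase functions are placeholders.
def run_spiral(compose, execute, analyze, max_iterations=5):
    feedback = None
    for _ in range(max_iterations):
        trials = compose(feedback)                 # composition phase
        results = [execute(t) for t in trials]     # execution phase (could be batched)
        feedback, done = analyze(results)          # analysis phase feeds back
        if done:                                   # e.g. results are significant
            break
    return feedback

# Trivial stubs, only to show the loop runs end to end.
compose = lambda fb: [{"rep": fb or 1}]
execute = lambda trial: trial["rep"] * 2
analyze = lambda results: (results[0] + 1, results[0] >= 8)
print(run_spiral(compose, execute, analyze))       # 15
</pre>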

== Integration with DETER Technologies ==
The diagram below describes how ELM, fedd, SEER, and CEDL interact:

 ELM --> CEDL --> fedd --> SEER
  |                          ^
  +--------------------------+

(Placeholder: this diagram needs to be updated)

== August Review Demo ==

Suppose my intent is to study the response time of an intrusion detection system. I design a scenario that connects attacker components to the IDS component through an internet-cloud component. The IDS component is then connected to a service component through a wan-cloud component, as shown below.

[[File:Attacker-ids.png]]

I am interested in exploring the effects of the attacker on the response time of the IDS and not in any other aspect of the experiment. The ELM framework should then enable me, the experimenter, to focus solely on creating a battery of experimentation trials by varying the number of attacker components, the attacker model, the model parameters, etc. All other aspects of the experiment should be defined, configured, controlled, and monitored based on standard experimentation methodologies and practices.

Each component that affects the response time of the IDS and has several alternatives is called a ''factor''. In the above example, there are four factors: attacker type, internet-cloud type, wan-cloud type, and service type. The models that a factor can assume are called its ''levels''. Thus the attacker type has two levels: volume attack and stealth attack. Each level can be further parameterized to give additional sub-levels, for example, low-volume vs. high-volume attacks.

Factors whose effects need to be quantified are called primary factors; for example, in the above study we are interested in quantifying the effects of the attack type. All other factors are secondary, and we are not currently interested in exploring or quantifying their effects.

Hence the experiment design task consists of defining individual trials by varying each factor and level (and possibly adding trial repetitions for statistical significance), creating a battery of experiment trials that explores every possible combination of all levels of the primary factors, as sketched below.
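
The sketch below generates such a battery as the cross product of factor levels, plus repetitions. The factor and level names are taken loosely from the example above, and the code is only an illustration of full-factorial trial generation, not the design tool itself.
<pre>
# Sketch: generate the battery of trials as the cross product of factor levels,
# with repetitions for statistical significance.  Names are illustrative.
from itertools import product

factors = {
    "attacker":       ["volume-low", "volume-high", "stealth"],
    "internet_cloud": ["low-latency", "high-latency"],
    "wan_cloud":      ["t1", "broadband"],
    "service":        ["web", "dns"],
}
primary = ["attacker"]          # effects we want to quantify
repetitions = 3

names = list(factors)
trials = []
for levels in product(*factors.values()):
    config = dict(zip(names, levels))
    for rep in range(repetitions):
        trials.append({**config, "rep": rep, "primary": primary})

print(len(trials))   # 3 * 2 * 2 * 2 level combinations x 3 repetitions = 72 trials
</pre>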