Architecture Working Group meeting. 28-11-02
GMO 29-11-02

- A short note, prepared by Benedetto and Giuseppe, was agreed as being a fair summary of the technical architectural variants relevant to scaling and multiplexing.

- Most of the discussion was around the scenario with the "sliced" scaling capability.
  1) We could not identify a problem which makes the "sliced" scenario undoable in the context of ATLAS TDAQ.
  2) We have identified a number of potential problems, however, which ought to be understood:
     - Allocation of sub-farms (or HLT processors) to a predefined trigger type. This would need the replication of the required functionality in each slice.
     - Partitioning. It becomes more complex: probably each slice has to be shared by all partitions, each partition with its own DFM and each DFM with its own TTC connections.
     - ROI mechanism. A sliced TDAQ will entail a sliced level-2; level-2 slices will be defined according to the same "slicing" scheme as used by the RODs (e.g. round robin). The two schemes for addressing slices will have to stay in step. The extent of the problem depends on whether a routing error is local to a particular event (i.e. no error for the subsequent events) or is a permanent one (it propagates to all the following events).
     - Implications on the ROD side. The read-out link interface, on the ROD side, has to understand S-LINK on one side and the network on the other, must have some buffering capability, and must be capable of some processing (e.g. calculate L1ID modulo the number of slices) and of translating network flow control signals into S-LINK Xon/Xoff.
     - Implications on the ROB side. It may need a custom(ized) interface (as the ROBin is already).
     - Technology. What technology could be used as the switched network between RODs and ROBs, taking also into account that, today, the required maximum bandwidth is ~160 MB/sec. Flow control (e.g. how to cope with a congested destination); note that the source will have, eventually, to run at full rate.
     - Faults. It is to be noted that there is a coupling between slices, i.e. the slice assignment algorithm. A faulty slice cannot be removed dynamically; removing it needs the "reprogramming" of the slice assignment algorithm.

- Features of the sliced system
  - smooth scaling of the detector read-out
  - fault tolerant system (no single point of failure)
  - smaller networks at level-2 & event builder
  - each network "sees" a fraction of the rate (1/N)

- Potential problems (summary) in the area of:
  - ROI mechanism (synchronisation)
  - partitioning & allocation of sub-farms to particular trigger types (duplication)
  - sender/receiver interfaces at ROD/ROB level (may be complex)
  - technology (flow control, bandwidth)

- Issues & sub-systems
  - It is recognized that the technical issues are related to the dataflow only; a sliced system does not affect the work in the HLT and online software areas (apart from farm load-balancing which, if done on smaller farms (farm slices), may need further study with respect to the current views on load-balancing).

- TDAQ week
  - It is agreed that in the deferral session on Tuesday: 1) Giuseppe will present the summary of what the WG has achieved and 2) David will present the data flow workplan, highlighting what additional issues should be covered.
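
Illustration (not discussed at the meeting): a minimal Python sketch of the round-robin slice assignment mentioned in the notes (L1ID modulo the number of slices). It is meant only to make three points concrete: the ROD side and the level-2 side must apply the same rule to stay in step, each slice sees about 1/N of the events, and removing a slice means reprogramming the assignment on both sides. The function name assign_slice, the slice numbering and the 4-slice configuration are assumptions made for the example, not part of any agreed design.

```python
from collections import Counter
from typing import Sequence


def assign_slice(l1id: int, active_slices: Sequence[int]) -> int:
    """Round-robin assignment: map an event's L1ID to one of the active slices."""
    return active_slices[l1id % len(active_slices)]


# Hypothetical configuration: four slices, numbered 0..3.
slices = [0, 1, 2, 3]

# The ROD side and the level-2 side evaluate the same rule independently;
# they stay in step only as long as both use the same slice table.
rod_view = [assign_slice(l1id, slices) for l1id in range(12)]
lvl2_view = [assign_slice(l1id, slices) for l1id in range(12)]
in_step = rod_view == lvl2_view

# Each slice (and hence each network) sees 1/N of the events.
load = Counter(assign_slice(l1id, slices) for l1id in range(1000))

# Removing a faulty slice (here slice 2) changes the slice table, so the
# assignment has to be "reprogrammed" on both sides at the same time.
reduced = [0, 1, 3]
stale_rod = [assign_slice(l1id, slices) for l1id in range(12)]     # old table
fresh_lvl2 = [assign_slice(l1id, reduced) for l1id in range(12)]   # new table
misrouted = sum(a != b for a, b in zip(stale_rod, fresh_lvl2))

print("in step:", in_step)
print("events per slice:", dict(load))
print("misrouted events if only one side is reprogrammed:", misrouted)
```

With a stale table on one side the mismatch is not confined to a single event: it persists for the following events until both tables agree again, which corresponds to the "permanent" routing error case in the notes.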