[go: nahoru, domu]

blob: cacf33ea1753e12759cfda58d0742e1255e96a3c [file] [log] [blame]
pkasting@chromium.org18cca202010-10-21 20:40:581// Copyright (c) 2010 The Chromium Authors. All rights reserved.
sra@chromium.org8cd96962009-06-03 23:28:142// Use of this source code is governed by a BSD-style license that can be
3// found in the LICENSE file.
4
5#include "courgette/adjustment_method.h"
6
7#include <algorithm>
8#include <limits>
9#include <list>
10#include <map>
11#include <set>
12#include <string>
13#include <vector>
sra@chromium.org8cd96962009-06-03 23:28:1414
15#include "base/basictypes.h"
evan@chromium.org34b2b002009-11-20 06:53:2816#include "base/format_macros.h"
sra@chromium.org8cd96962009-06-03 23:28:1417#include "base/logging.h"
18#include "base/string_util.h"
laforge@chromium.org7e6b9262010-03-15 17:33:4619#include "base/time.h"
sra@chromium.org8cd96962009-06-03 23:28:1420
21#include "courgette/assembly_program.h"
22#include "courgette/courgette.h"
23#include "courgette/encoded_program.h"
24#include "courgette/image_info.h"
25
26/*
27
28Shingle weighting matching.
29
30We have a sequence S1 of symbols from alphabet A1={A,B,C,...} called the 'model'
31and a second sequence of S2 of symbols from alphabet A2={U,V,W,....} called the
32'program'. Each symbol in A1 has a unique numerical name or index. We can
33transcribe the sequence S1 to a sequence T1 of indexes of the symbols. We wish
34to assign indexes to the symbols in A2 so that when we transcribe S2 into T2, T2
35has long subsequences that occur in T1. This will ensure that the sequence
36T1;T2 compresses to be only slightly larger than the compressed T1.
37
38The algorithm for matching members of S2 with members of S1 is eager - it makes
39matches without backtracking, until no more matches can be made. Each variable
40(symbol) U,V,... in A2 has a set of candidates from A1, each candidate with a
41weight summarizing the evidence for the match. We keep a VariableQueue of
42U,V,... sorted by how much the evidence for the best choice outweighs the
43evidence for the second choice, i.e. prioritized by how 'clear cut' the best
44assignment is. We pick the variable with the most clear-cut candidate, make the
45assignment, adjust the evidence and repeat.
46
47What has not been described so far is how the evidence is gathered and
48maintained. We are working under the assumption that S1 and S2 are largely
49similar. (A different assumption might be that S1 and S2 are dissimilar except
50for many long subsequences.)
51
52A naive algorithm would consider all pairs (A,U) and for each pair assess the
53benefit, or score, the assignment U:=A. The score might count the number of
54occurrences of U in S2 which appear in similar contexts to A in S1.
55
56To distinguish contexts we view S1 and S2 as a sequence of overlapping k-length
57substrings or 'shingles'. Two shingles are compatible if the symbols in one
58shingle could be matched with the symbols in the other symbol. For example, ABC
59is *not* compatible with UVU because it would require conflicting matches A=U
60and C=U. ABC is compatible with UVW, UWV, WUV, VUW etc. We can't tell which
61until we make an assignment - the compatible shingles form an equivalence class.
62After assigning U:=A then only UVW and UWV (equivalently AVW, AWV) are
63compatible. As we make assignments the number of equivalence classes of
64shingles increases and the number of members of each equivalence class
65decreases. The compatibility test becomes more restrictive.
66
67We gather evidence for the potential assignment U:=A by counting how many
68shingles containing U are compatible with shingles containing A. Thus symbols
69occurring a large number of times in compatible contexts will be assigned first.
70
71Finding the 'most clear-cut' assignment by considering all pairs symbols and for
72each pair comparing the contexts of each pair of occurrences of the symbols is
73computationally infeasible. We get the job done in a reasonable time by
74approaching it 'backwards' and making incremental changes as we make
75assignments.
76
77First the shingles are partitioned according to compatibility. In S1=ABCDD and
78S2=UVWXX we have a total of 6 shingles, each occuring once. (ABC:1 BCD:1 CDD:1;
79UVW:1 VWX: WXX:1) all fit the pattern <V0 V1 V2> or the pattern <V0 V1 V1>. The
80first pattern indicates that each position matches a different symbol, the
81second pattern indicates that the second symbol is repeated.
82
83 pattern S1 members S2 members
84 <V0 V1 V2>: {ABC:1, BCD:1}; {UVW:1, VWX:1}
85 <V0 V1 V1>: {CDD:1} {WXX:1}
86
87The second pattern appears to have a unique assignment but we don't make the
88assignment on such scant evidence. If S1 and S2 do not match exactly, there
89will be numerous spurious low-score matches like this. Instead we must see what
90assignments are indicated by considering all of the evidence.
91
92First pattern has 2 x 2 = 4 shingle pairs. For each pair we count the number
93of symbol assignments. For ABC:a * UVW:b accumulate min(a,b) to each of
94 {U:=A, V:=B, W:=C}.
95After accumulating over all 2 x 2 pairs:
96 U: {A:1 B:1}
97 V: {A:1 B:2 C:1}
98 W: {B:1 C:2 D:1 }
99 X: {C:1 D:1}
100The second pattern contributes:
101 W: {C:1}
102 X: {D:2}
103Sum:
104 U: {A:1 B:1}
105 V: {A:1 B:2 C:1}
106 W: {B:1 C:3 D:1}
107 X: {C:1 D:3}
108
109From this we decide to assign X:=D (because this assignment has both the largest
110difference above the next candidate (X:=C) and this is also the largest
111proportionately over the sum of alternatives).
112
113Lets assume D has numerical 'name' 77. The assignment X:=D sets X to 77 too.
114Next we repartition all the shingles containing X or D:
115
116 pattern S1 members S2 members
117 <V0 V1 V2>: {ABC:1}; {UVW:1}
118 <V0 V1 77>: {BCD:1}; {VWX:1}
119 <V0 77 77>: {CDD:1} {WXX:1}
120As we repartition, we recalculate the contributions to the scores:
121 U: {A:1}
122 V: {B:2}
123 W: {C:3}
124All the remaining assignments are now fixed.
125
126There is one step in the incremental algorithm that is still infeasibly
127expensive: the contributions due to the cross product of large equivalence
128classes. We settle for making an approximation by computing the contribution of
129the cross product of only the most common shingles. The hope is that the noise
130from the long tail of uncounted shingles is well below the scores being used to
131pick assignments. The second hope is that as assignment are made, the large
132equivalence class will be partitioned into smaller equivalence classes, reducing
133the noise over time.
134
135In the code below the shingles are bigger (Shingle::kWidth = 5).
136Class ShinglePattern holds the data for one pattern.
137
138There is an optimization for this case:
139 <V0 V1 V1>: {CDD:1} {WXX:1}
140
141Above we said that we don't make an assignment on this "scant evidence". There
142is an exception: if there is only one variable unassigned (more like the <V0 77
14377> pattern) AND there are no occurrences of C and W other than those counted in
144this pattern, then there is no competing evidence and we go ahead with the
145assignment immediately. This produces slightly better results because these
146cases tend to be low-scoring and susceptible to small mistakes made in
147low-scoring assignments in the approximation for large equivalence classes.
148
149*/
150
151namespace courgette {
152namespace adjustment_method_2 {
sra@chromium.org8cd96962009-06-03 23:28:14153
154////////////////////////////////////////////////////////////////////////////////
155
156class AssignmentCandidates;
157class LabelInfoMaker;
158class Shingle;
159class ShinglePattern;
160
161// The purpose of adjustment is to assign indexes to Labels of a program 'p' to
162// make the sequence of indexes similar to a 'model' program 'm'. Labels
163// themselves don't have enough information to do this job, so we work with a
164// LabelInfo surrogate for each label.
165//
166class LabelInfo {
167 public:
168 // Just a no-argument constructor and copy constructor. Actual LabelInfo
169 // objects are allocated in std::pair structs in a std::map.
170 LabelInfo()
171 : label_(NULL), is_model_(false), debug_index_(0), refs_(0),
172 assignment_(NULL), candidates_(NULL)
173 {}
174
175 ~LabelInfo();
176
177 AssignmentCandidates* candidates();
178
179 Label* label_; // The label that this info a surrogate for.
180
181 uint32 is_model_ : 1; // Is the label in the model?
182 uint32 debug_index_ : 31; // A small number for naming the label in debug
183 // output. The pair (is_model_, debug_index_) is
184 // unique.
185
sra@google.com54f1b822009-07-18 03:28:40186 int refs_; // Number of times this Label is referenced.
sra@chromium.org8cd96962009-06-03 23:28:14187
188 LabelInfo* assignment_; // Label from other program corresponding to this.
189
190 std::vector<uint32> positions_; // Offsets into the trace of references.
191
192 private:
193 AssignmentCandidates* candidates_;
194
195 void operator=(const LabelInfo*); // Disallow assignment only.
196 // Public compiler generated copy constructor is needed to constuct
197 // std::pair<Label*, LabelInfo> so that fresh LabelInfos can be allocated
198 // inside a std::map.
199};
200
201typedef std::vector<LabelInfo*> Trace;
202
203std::string ToString(const LabelInfo* info) {
204 std::string s;
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11205 base::StringAppendF(&s, "%c%d", "pm"[info->is_model_], info->debug_index_);
sra@chromium.org8cd96962009-06-03 23:28:14206 if (info->label_->index_ != Label::kNoIndex)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11207 base::StringAppendF(&s, " (%d)", info->label_->index_);
sra@chromium.org8cd96962009-06-03 23:28:14208
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11209 base::StringAppendF(&s, " #%u", info->refs_);
sra@chromium.org8cd96962009-06-03 23:28:14210 return s;
211}
212
213// LabelInfoMaker maps labels to their surrogate LabelInfo objects.
214class LabelInfoMaker {
215 public:
216 LabelInfoMaker() : debug_label_index_gen_(0) {}
217
218 LabelInfo* MakeLabelInfo(Label* label, bool is_model, uint32 position) {
219 LabelInfo& slot = label_infos_[label];
220 if (slot.label_ == NULL) {
221 slot.label_ = label;
222 slot.is_model_ = is_model;
223 slot.debug_index_ = ++debug_label_index_gen_;
224 }
225 slot.positions_.push_back(position);
226 ++slot.refs_;
227 return &slot;
228 }
229
230 void ResetDebugLabel() { debug_label_index_gen_ = 0; }
231
232 private:
233 int debug_label_index_gen_;
234
235 // Note LabelInfo is allocated 'flat' inside map::value_type, so the LabelInfo
236 // lifetimes are managed by the map.
237 std::map<Label*, LabelInfo> label_infos_;
238
239 DISALLOW_COPY_AND_ASSIGN(LabelInfoMaker);
240};
241
242struct OrderLabelInfo {
243 bool operator()(const LabelInfo* a, const LabelInfo* b) const {
244 if (a->label_->rva_ < b->label_->rva_) return true;
245 if (a->label_->rva_ > b->label_->rva_) return false;
246 if (a == b) return false;
247 return a->positions_ < b->positions_; // Lexicographic ordering of vector.
248 }
249};
250
251// AssignmentCandidates is a priority queue of candidate assignments to
252// a single program LabelInfo, |program_info_|.
253class AssignmentCandidates {
254 public:
255 explicit AssignmentCandidates(LabelInfo* program_info)
256 : program_info_(program_info) {}
257
258 LabelInfo* program_info() const { return program_info_; }
259
260 bool empty() const { return label_to_score_.empty(); }
261
262 LabelInfo* top_candidate() const { return queue_.begin()->second; }
263
264 void Update(LabelInfo* model_info, int delta_score) {
265 LOG_ASSERT(delta_score != 0);
266 int old_score = 0;
267 int new_score = 0;
268 LabelToScore::iterator p = label_to_score_.find(model_info);
269 if (p != label_to_score_.end()) {
270 old_score = p->second;
271 new_score = old_score + delta_score;
272 queue_.erase(ScoreAndLabel(old_score, p->first));
273 if (new_score == 0) {
274 label_to_score_.erase(p);
275 } else {
276 p->second = new_score;
277 queue_.insert(ScoreAndLabel(new_score, model_info));
278 }
279 } else {
280 new_score = delta_score;
281 label_to_score_.insert(std::make_pair(model_info, new_score));
282 queue_.insert(ScoreAndLabel(new_score, model_info));
283 }
284 LOG_ASSERT(queue_.size() == label_to_score_.size());
285 }
286
287 int TopScore() const {
288 int first_value = 0;
289 int second_value = 0;
290 Queue::const_iterator p = queue_.begin();
291 if (p != queue_.end()) {
292 first_value = p->first;
293 ++p;
294 if (p != queue_.end()) {
295 second_value = p->first;
296 }
297 }
298 return first_value - second_value;
299 }
300
301 bool HasPendingUpdates() { return !pending_updates_.empty(); }
302
303 void AddPendingUpdate(LabelInfo* model_info, int delta_score) {
304 LOG_ASSERT(delta_score != 0);
305 pending_updates_[model_info] += delta_score;
306 }
307
308 void ApplyPendingUpdates() {
309 // TODO(sra): try to walk |pending_updates_| and |label_to_score_| in
310 // lockstep. Try to batch updates to |queue_|.
311 size_t zeroes = 0;
312 for (LabelToScore::iterator p = pending_updates_.begin();
313 p != pending_updates_.end();
314 ++p) {
315 if (p->second != 0)
316 Update(p->first, p->second);
317 else
318 ++zeroes;
319 }
320 pending_updates_.clear();
321 }
322
323 void Print(int max) {
pkasting@chromium.org18cca202010-10-21 20:40:58324 VLOG(2) << "score " << TopScore() << " " << ToString(program_info_)
325 << " := ?";
sra@chromium.org8cd96962009-06-03 23:28:14326 if (!pending_updates_.empty())
pkasting@chromium.org18cca202010-10-21 20:40:58327 VLOG(2) << pending_updates_.size() << " pending";
sra@chromium.org8cd96962009-06-03 23:28:14328 int count = 0;
329 for (Queue::iterator q = queue_.begin(); q != queue_.end(); ++q) {
330 if (++count > max) break;
pkasting@chromium.org18cca202010-10-21 20:40:58331 VLOG(2) << " " << q->first << " " << ToString(q->second);
sra@chromium.org8cd96962009-06-03 23:28:14332 }
333 }
334
335 private:
336 typedef std::map<LabelInfo*, int, OrderLabelInfo> LabelToScore;
337 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
338 struct OrderScoreAndLabelByScoreDecreasing {
339 OrderLabelInfo tie_breaker;
340 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
341 if (a.first > b.first) return true;
342 if (a.first < b.first) return false;
343 return tie_breaker(a.second, b.second);
344 }
345 };
346 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
347
348 LabelInfo* program_info_;
349 LabelToScore label_to_score_;
350 LabelToScore pending_updates_;
351 Queue queue_;
352};
353
354AssignmentCandidates* LabelInfo::candidates() {
355 if (candidates_ == NULL)
356 candidates_ = new AssignmentCandidates(this);
357 return candidates_;
358}
359
360LabelInfo::~LabelInfo() {
361 delete candidates_;
362}
363
364// A Shingle is a short fixed-length string of LabelInfos that actually occurs
365// in a Trace. A Shingle may occur many times. We repesent the Shingle by the
366// position of one of the occurrences in the Trace.
367class Shingle {
368 public:
369 static const size_t kWidth = 5;
370
371 struct InterningLess {
372 bool operator()(const Shingle& a, const Shingle& b) const;
373 };
374
375 typedef std::set<Shingle, InterningLess> OwningSet;
376
377 static Shingle* Find(const Trace& trace, size_t position,
378 OwningSet* owning_set) {
379 std::pair<OwningSet::iterator, bool> pair =
380 owning_set->insert(Shingle(trace, position));
sra@google.com54f1b822009-07-18 03:28:40381 // pair.first iterator 'points' to the newly inserted Shingle or the
382 // previouly inserted one that looks the same according to the comparator.
383
384 // const_cast required because key is const. We modify the Shingle
385 // extensively but not in a way that affects InterningLess.
386 Shingle* shingle = const_cast<Shingle*>(&*pair.first);
387 shingle->add_position(position);
388 return shingle;
sra@chromium.org8cd96962009-06-03 23:28:14389 }
390
391 LabelInfo* at(size_t i) const { return trace_[exemplar_position_ + i]; }
bradnelson@google.com104a6082010-12-21 01:03:43392 void add_position(size_t position) {
393 positions_.push_back(static_cast<uint32>(position));
394 }
395 int position_count() const { return static_cast<int>(positions_.size()); }
sra@chromium.org8cd96962009-06-03 23:28:14396
397 bool InModel() const { return at(0)->is_model_; }
398
399 ShinglePattern* pattern() const { return pattern_; }
400 void set_pattern(ShinglePattern* pattern) { pattern_ = pattern; }
401
402 struct PointerLess {
403 bool operator()(const Shingle* a, const Shingle* b) const {
404 // Arbitrary but repeatable (memory-address) independent ordering:
405 return a->exemplar_position_ < b->exemplar_position_;
406 // return InterningLess()(*a, *b);
407 }
408 };
409
410 private:
411 Shingle(const Trace& trace, size_t exemplar_position)
412 : trace_(trace),
413 exemplar_position_(exemplar_position),
414 pattern_(NULL) {
415 }
416
417 const Trace& trace_; // The shingle lives inside trace_.
418 size_t exemplar_position_; // At this position (and other positions).
419 std::vector<uint32> positions_; // Includes exemplar_position_.
420
421 ShinglePattern* pattern_; // Pattern changes as LabelInfos are assigned.
422
423 friend std::string ToString(const Shingle* instance);
424
425 // We can't disallow the copy constructor because we use std::set<Shingle> and
426 // VS2005's implementation of std::set<T>::set() requires T to have a copy
427 // constructor.
428 // DISALLOW_COPY_AND_ASSIGN(Shingle);
429 void operator=(const Shingle&); // Disallow assignment only.
430};
431
432std::string ToString(const Shingle* instance) {
433 std::string s;
434 const char* sep = "<";
435 for (size_t i = 0; i < Shingle::kWidth; ++i) {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11436 // base::StringAppendF(&s, "%s%x ", sep, instance.at(i)->label_->rva_);
sra@chromium.org8cd96962009-06-03 23:28:14437 s += sep;
438 s += ToString(instance->at(i));
439 sep = ", ";
440 }
bradnelson@google.com104a6082010-12-21 01:03:43441 base::StringAppendF(&s, ">(%" PRIuS ")@{%d}",
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11442 instance->exemplar_position_,
443 instance->position_count());
sra@chromium.org8cd96962009-06-03 23:28:14444 return s;
445}
446
447
448bool Shingle::InterningLess::operator()(
449 const Shingle& a,
450 const Shingle& b) const {
451 for (size_t i = 0; i < kWidth; ++i) {
452 LabelInfo* info_a = a.at(i);
453 LabelInfo* info_b = b.at(i);
454 if (info_a->label_->rva_ < info_b->label_->rva_)
455 return true;
456 if (info_a->label_->rva_ > info_b->label_->rva_)
457 return false;
458 if (info_a->is_model_ < info_b->is_model_)
459 return true;
460 if (info_a->is_model_ > info_b->is_model_)
461 return false;
462 if (info_a != info_b) {
463 NOTREACHED();
464 }
465 }
466 return false;
467}
468
469class ShinglePattern {
470 public:
471 enum { kOffsetMask = 7, // Offset lives in low bits.
472 kFixed = 0, // kind & kVariable == 0 => fixed.
473 kVariable = 8 // kind & kVariable == 1 => variable.
474 };
475 // sequence[position + (kinds_[i] & kOffsetMask)] gives LabelInfo for position
476 // i of shingle. Below, second 'A' is duplicate of position 1, second '102'
477 // is duplicate of position 0.
478 //
479 // <102, A, 103, A , 102>
480 // --> <kFixed+0, kVariable+1, kFixed+2, kVariable+1, kFixed+0>
481 struct Index {
482 explicit Index(const Shingle* instance);
483 uint8 kinds_[Shingle::kWidth];
484 uint8 variables_;
485 uint8 unique_variables_;
486 uint8 first_variable_index_;
487 uint32 hash_;
488 int assigned_indexes_[Shingle::kWidth];
489 };
490
491 // ShinglePattern keeps histograms of member Shingle instances, ordered by
492 // decreasing number of occurrences. We don't have a pair (occurrence count,
493 // Shingle instance), so we use a FreqView adapter to make the instance
494 // pointer look like the pair.
495 class FreqView {
496 public:
497 explicit FreqView(const Shingle* instance) : instance_(instance) {}
bradnelson@google.com104a6082010-12-21 01:03:43498 int count() const { return instance_->position_count(); }
sra@chromium.org8cd96962009-06-03 23:28:14499 const Shingle* instance() const { return instance_; }
500 struct Greater {
501 bool operator()(const FreqView& a, const FreqView& b) const {
502 if (a.count() > b.count()) return true;
503 if (a.count() < b.count()) return false;
504 return resolve_ties(a.instance(), b.instance());
505 }
506 private:
507 Shingle::PointerLess resolve_ties;
508 };
509 private:
510 const Shingle* instance_;
511 };
512
513 typedef std::set<FreqView, FreqView::Greater> Histogram;
514
515 ShinglePattern() : index_(NULL), model_coverage_(0), program_coverage_(0) {}
516
517 const Index* index_; // Points to the key in the owning map value_type.
518 Histogram model_histogram_;
519 Histogram program_histogram_;
520 int model_coverage_;
521 int program_coverage_;
522};
523
524std::string ToString(const ShinglePattern::Index* index) {
525 std::string s;
526 if (index == NULL) {
527 s = "<null>";
528 } else {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11529 base::StringAppendF(&s, "<%d: ", index->variables_);
sra@chromium.org8cd96962009-06-03 23:28:14530 const char* sep = "";
531 for (size_t i = 0; i < Shingle::kWidth; ++i) {
532 s += sep;
533 sep = ", ";
534 uint32 kind = index->kinds_[i];
535 int offset = kind & ShinglePattern::kOffsetMask;
536 if (kind & ShinglePattern::kVariable)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11537 base::StringAppendF(&s, "V%d", offset);
sra@chromium.org8cd96962009-06-03 23:28:14538 else
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11539 base::StringAppendF(&s, "%d", index->assigned_indexes_[offset]);
sra@chromium.org8cd96962009-06-03 23:28:14540 }
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11541 base::StringAppendF(&s, " %x", index->hash_);
sra@chromium.org8cd96962009-06-03 23:28:14542 s += ">";
543 }
544 return s;
545}
546
547std::string HistogramToString(const ShinglePattern::Histogram& histogram,
548 size_t snippet_max) {
549 std::string s;
550 size_t histogram_size = histogram.size();
551 size_t snippet_size = 0;
552 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
553 p != histogram.end();
554 ++p) {
555 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
556 s += " ...";
557 break;
558 }
bradnelson@google.com104a6082010-12-21 01:03:43559 base::StringAppendF(&s, " %d", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14560 }
561 return s;
562}
563
564std::string HistogramToStringFull(const ShinglePattern::Histogram& histogram,
565 const char* indent,
566 size_t snippet_max) {
567 std::string s;
568
569 size_t histogram_size = histogram.size();
570 size_t snippet_size = 0;
571 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
572 p != histogram.end();
573 ++p) {
574 s += indent;
575 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
576 s += "...\n";
577 break;
578 }
bradnelson@google.com104a6082010-12-21 01:03:43579 base::StringAppendF(&s, "(%d) ", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14580 s += ToString(&(*p->instance()));
581 s += "\n";
582 }
583 return s;
584}
585
586std::string ToString(const ShinglePattern* pattern, size_t snippet_max = 3) {
587 std::string s;
588 if (pattern == NULL) {
589 s = "<null>";
590 } else {
591 s = "{";
592 s += ToString(pattern->index_);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11593 base::StringAppendF(&s, "; %d(%d):",
594 static_cast<int>(pattern->model_histogram_.size()),
595 pattern->model_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14596
597 s += HistogramToString(pattern->model_histogram_, snippet_max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11598 base::StringAppendF(&s, "; %d(%d):",
599 static_cast<int>(pattern->program_histogram_.size()),
600 pattern->program_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14601 s += HistogramToString(pattern->program_histogram_, snippet_max);
602 s += "}";
603 }
604 return s;
605}
606
607std::string ShinglePatternToStringFull(const ShinglePattern* pattern,
608 size_t max) {
609 std::string s;
610 s += ToString(pattern->index_);
611 s += "\n";
612 size_t model_size = pattern->model_histogram_.size();
613 size_t program_size = pattern->program_histogram_.size();
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11614 base::StringAppendF(&s, " model shingles %" PRIuS "\n", model_size);
sra@chromium.org8cd96962009-06-03 23:28:14615 s += HistogramToStringFull(pattern->model_histogram_, " ", max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11616 base::StringAppendF(&s, " program shingles %" PRIuS "\n", program_size);
sra@chromium.org8cd96962009-06-03 23:28:14617 s += HistogramToStringFull(pattern->program_histogram_, " ", max);
618 return s;
619}
620
621struct ShinglePatternIndexLess {
622 bool operator()(const ShinglePattern::Index& a,
623 const ShinglePattern::Index& b) const {
624 if (a.hash_ < b.hash_) return true;
625 if (a.hash_ > b.hash_) return false;
626
627 for (size_t i = 0; i < Shingle::kWidth; ++i) {
628 if (a.kinds_[i] < b.kinds_[i]) return true;
629 if (a.kinds_[i] > b.kinds_[i]) return false;
630 if ((a.kinds_[i] & ShinglePattern::kVariable) == 0) {
631 if (a.assigned_indexes_[i] < b.assigned_indexes_[i])
632 return true;
633 if (a.assigned_indexes_[i] > b.assigned_indexes_[i])
634 return false;
635 }
636 }
637 return false;
638 }
639};
640
641static uint32 hash_combine(uint32 h, uint32 v) {
642 h += v;
643 return (h * (37 + 0x0000d100)) ^ (h >> 13);
644}
645
646ShinglePattern::Index::Index(const Shingle* instance) {
647 uint32 hash = 0;
648 variables_ = 0;
649 unique_variables_ = 0;
650 first_variable_index_ = 255;
651
bradnelson@google.com104a6082010-12-21 01:03:43652 for (uint32 i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14653 LabelInfo* info = instance->at(i);
yusukes@google.com22c105912009-07-19 01:19:55654 uint32 kind = 0;
sra@chromium.org8cd96962009-06-03 23:28:14655 int code = -1;
656 size_t j = 0;
657 for ( ; j < i; ++j) {
658 if (info == instance->at(j)) { // Duplicate LabelInfo
659 kind = kinds_[j];
660 break;
661 }
662 }
663 if (j == i) { // Not found above.
664 if (info->assignment_) {
665 code = info->label_->index_;
666 assigned_indexes_[i] = code;
667 kind = kFixed + i;
668 } else {
669 kind = kVariable + i;
670 ++unique_variables_;
671 if (i < first_variable_index_)
672 first_variable_index_ = i;
673 }
674 }
675 if (kind & kVariable) ++variables_;
676 hash = hash_combine(hash, code);
677 hash = hash_combine(hash, kind);
678 kinds_[i] = kind;
679 assigned_indexes_[i] = code;
680 }
681 hash_ = hash;
682}
683
684struct ShinglePatternLess {
685 bool operator()(const ShinglePattern& a, const ShinglePattern& b) const {
686 return index_less(*a.index_, *b.index_);
687 }
688 ShinglePatternIndexLess index_less;
689};
690
691struct ShinglePatternPointerLess {
692 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
693 return pattern_less(*a, *b);
694 }
695 ShinglePatternLess pattern_less;
696};
697
698template<int (*Scorer)(const ShinglePattern*)>
699struct OrderShinglePatternByScoreDescending {
700 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
701 int score_a = Scorer(a);
702 int score_b = Scorer(b);
703 if (score_a > score_b) return true;
704 if (score_a < score_b) return false;
705 return break_ties(a, b);
706 }
707 ShinglePatternPointerLess break_ties;
708};
709
710// Returns a score for a 'Single Use' rule. Returns -1 if the rule is not
711// applicable.
712int SingleUseScore(const ShinglePattern* pattern) {
713 if (pattern->index_->variables_ != 1)
714 return -1;
715
716 if (pattern->model_histogram_.size() != 1 ||
717 pattern->program_histogram_.size() != 1)
718 return -1;
719
720 // Does this pattern account for all uses of the variable?
721 const ShinglePattern::FreqView& program_freq =
722 *pattern->program_histogram_.begin();
723 const ShinglePattern::FreqView& model_freq =
724 *pattern->model_histogram_.begin();
725 int p1 = program_freq.count();
726 int m1 = model_freq.count();
727 if (p1 == m1) {
728 const Shingle* program_instance = program_freq.instance();
729 const Shingle* model_instance = model_freq.instance();
730 size_t variable_index = pattern->index_->first_variable_index_;
731 LabelInfo* program_info = program_instance->at(variable_index);
732 LabelInfo* model_info = model_instance->at(variable_index);
733 if (!program_info->assignment_) {
734 if (program_info->refs_ == p1 && model_info->refs_ == m1) {
735 return p1;
736 }
737 }
738 }
739 return -1;
740}
741
742// The VariableQueue is a priority queue of unassigned LabelInfos from
743// the 'program' (the 'variables') and their AssignmentCandidates.
744class VariableQueue {
745 public:
746 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
747
748 VariableQueue() {}
749
750 bool empty() const { return queue_.empty(); }
751
752 const ScoreAndLabel& first() const { return *queue_.begin(); }
753
754 // For debugging only.
755 void Print() const {
756 for (Queue::const_iterator p = queue_.begin(); p != queue_.end(); ++p) {
757 AssignmentCandidates* candidates = p->second->candidates();
758 candidates->Print(std::numeric_limits<int>::max());
759 }
760 }
761
762 void AddPendingUpdate(LabelInfo* program_info, LabelInfo* model_info,
763 int delta_score) {
764 AssignmentCandidates* candidates = program_info->candidates();
765 if (!candidates->HasPendingUpdates()) {
766 pending_update_candidates_.push_back(candidates);
767 }
768 candidates->AddPendingUpdate(model_info, delta_score);
769 }
770
771 void ApplyPendingUpdates() {
772 for (size_t i = 0; i < pending_update_candidates_.size(); ++i) {
773 AssignmentCandidates* candidates = pending_update_candidates_[i];
774 int old_score = candidates->TopScore();
775 queue_.erase(ScoreAndLabel(old_score, candidates->program_info()));
776 candidates->ApplyPendingUpdates();
777 if (!candidates->empty()) {
778 int new_score = candidates->TopScore();
779 queue_.insert(ScoreAndLabel(new_score, candidates->program_info()));
780 }
781 }
782 pending_update_candidates_.clear();
783 }
784
785 private:
786 struct OrderScoreAndLabelByScoreDecreasing {
787 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
788 if (a.first > b.first) return true;
789 if (a.first < b.first) return false;
790 return OrderLabelInfo()(a.second, b.second);
791 }
792 };
793 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
794
795 Queue queue_;
796 std::vector<AssignmentCandidates*> pending_update_candidates_;
797
798 DISALLOW_COPY_AND_ASSIGN(VariableQueue);
799};
800
801
802class AssignmentProblem {
803 public:
804 AssignmentProblem(const Trace& trace, size_t model_end)
805 : trace_(trace),
806 model_end_(model_end) {
pkasting@chromium.org18cca202010-10-21 20:40:58807 VLOG(2) << "AssignmentProblem::AssignmentProblem " << model_end << ", "
808 << trace.size();
sra@chromium.org8cd96962009-06-03 23:28:14809 }
810
811 bool Solve() {
812 if (model_end_ < Shingle::kWidth ||
813 trace_.size() - model_end_ < Shingle::kWidth) {
814 // Nothing much we can do with such a short problem.
815 return true;
816 }
817 instances_.resize(trace_.size() - Shingle::kWidth + 1, NULL);
818 AddShingles(0, model_end_);
819 AddShingles(model_end_, trace_.size());
820 InitialClassify();
821 AddPatternsNeedingUpdatesToQueues();
822
823 patterns_needing_updates_.clear();
pkasting@chromium.org18cca202010-10-21 20:40:58824 while (FindAndAssignBestLeader())
sra@chromium.org8cd96962009-06-03 23:28:14825 patterns_needing_updates_.clear();
sra@chromium.org8cd96962009-06-03 23:28:14826 PrintActivePatterns();
827
828 return true;
829 }
830
831 private:
832 typedef std::set<Shingle*, Shingle::PointerLess> ShingleSet;
833
834 typedef std::set<const ShinglePattern*, ShinglePatternPointerLess>
835 ShinglePatternSet;
836
837 // Patterns are partitioned into the following sets:
838
839 // * Retired patterns (not stored). No shingles exist for this pattern (they
840 // all now match more specialized patterns).
841 // * Useless patterns (not stored). There are no 'program' shingles for this
842 // pattern (they all now match more specialized patterns).
843 // * Single-use patterns - single_use_pattern_queue_.
844 // * Other patterns - active_non_single_use_patterns_ / variable_queue_.
845
846 typedef std::set<const ShinglePattern*,
847 OrderShinglePatternByScoreDescending<&SingleUseScore> >
848 SingleUsePatternQueue;
849
850 void PrintPatternsHeader() const {
pkasting@chromium.org18cca202010-10-21 20:40:58851 VLOG(2) << shingle_instances_.size() << " instances "
852 << trace_.size() << " trace length "
853 << patterns_.size() << " shingle indexes "
854 << single_use_pattern_queue_.size() << " single use patterns "
855 << active_non_single_use_patterns_.size() << " active patterns";
sra@chromium.org8cd96962009-06-03 23:28:14856 }
857
858 void PrintActivePatterns() const {
859 for (ShinglePatternSet::const_iterator p =
860 active_non_single_use_patterns_.begin();
861 p != active_non_single_use_patterns_.end();
862 ++p) {
863 const ShinglePattern* pattern = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58864 VLOG(2) << ToString(pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14865 }
866 }
867
868 void PrintPatterns() const {
869 PrintAllPatterns();
870 PrintActivePatterns();
871 PrintAllShingles();
872 }
873
874 void PrintAllPatterns() const {
875 for (IndexToPattern::const_iterator p = patterns_.begin();
876 p != patterns_.end();
877 ++p) {
878 const ShinglePattern& pattern = p->second;
pkasting@chromium.org18cca202010-10-21 20:40:58879 VLOG(2) << ToString(&pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14880 }
881 }
882
883 void PrintAllShingles() const {
884 for (Shingle::OwningSet::const_iterator p = shingle_instances_.begin();
885 p != shingle_instances_.end();
886 ++p) {
887 const Shingle& instance = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58888 VLOG(2) << ToString(&instance) << " " << ToString(instance.pattern());
sra@chromium.org8cd96962009-06-03 23:28:14889 }
890 }
891
892
893 void AddShingles(size_t begin, size_t end) {
894 for (size_t i = begin; i + Shingle::kWidth - 1 < end; ++i) {
895 instances_[i] = Shingle::Find(trace_, i, &shingle_instances_);
896 }
897 }
898
899 void Declassify(Shingle* shingle) {
900 ShinglePattern* pattern = shingle->pattern();
901 if (shingle->InModel()) {
902 pattern->model_histogram_.erase(ShinglePattern::FreqView(shingle));
903 pattern->model_coverage_ -= shingle->position_count();
904 } else {
905 pattern->program_histogram_.erase(ShinglePattern::FreqView(shingle));
906 pattern->program_coverage_ -= shingle->position_count();
907 }
908 shingle->set_pattern(NULL);
909 }
910
911 void Reclassify(Shingle* shingle) {
912 ShinglePattern* pattern = shingle->pattern();
913 LOG_ASSERT(pattern == NULL);
914
915 ShinglePattern::Index index(shingle);
916 if (index.variables_ == 0)
917 return;
918
919 std::pair<IndexToPattern::iterator, bool> inserted =
920 patterns_.insert(std::make_pair(index, ShinglePattern()));
921
922 pattern = &inserted.first->second;
923 pattern->index_ = &inserted.first->first;
924 shingle->set_pattern(pattern);
925 patterns_needing_updates_.insert(pattern);
926
927 if (shingle->InModel()) {
928 pattern->model_histogram_.insert(ShinglePattern::FreqView(shingle));
929 pattern->model_coverage_ += shingle->position_count();
930 } else {
931 pattern->program_histogram_.insert(ShinglePattern::FreqView(shingle));
932 pattern->program_coverage_ += shingle->position_count();
933 }
934 }
935
936 void InitialClassify() {
937 for (Shingle::OwningSet::iterator p = shingle_instances_.begin();
938 p != shingle_instances_.end();
939 ++p) {
sra@google.com54f1b822009-07-18 03:28:40940 // GCC's set<T>::iterator::operator *() returns a const object.
941 Reclassify(const_cast<Shingle*>(&*p));
sra@chromium.org8cd96962009-06-03 23:28:14942 }
943 }
944
945 // For the positions in |info|, find the shingles that overlap that position.
946 void AddAffectedPositions(LabelInfo* info, ShingleSet* affected_shingles) {
947 const size_t kWidth = Shingle::kWidth;
948 for (size_t i = 0; i < info->positions_.size(); ++i) {
949 size_t position = info->positions_[i];
950 // Find bounds to the subrange of |trace_| we are in.
951 size_t start = position < model_end_ ? 0 : model_end_;
952 size_t end = position < model_end_ ? model_end_ : trace_.size();
953
954 // Clip [position-kWidth+1, position+1)
955 size_t low = position > start + kWidth - 1
956 ? position - kWidth + 1
957 : start;
958 size_t high = position + kWidth < end ? position + 1 : end - kWidth + 1;
959
960 for (size_t shingle_position = low;
961 shingle_position < high;
962 ++shingle_position) {
963 Shingle* overlapping_shingle = instances_.at(shingle_position);
964 affected_shingles->insert(overlapping_shingle);
965 }
966 }
967 }
968
969 void RemovePatternsNeedingUpdatesFromQueues() {
970 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
971 p != patterns_needing_updates_.end();
972 ++p) {
973 RemovePatternFromQueues(*p);
974 }
975 }
976
977 void AddPatternsNeedingUpdatesToQueues() {
978 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
979 p != patterns_needing_updates_.end();
980 ++p) {
981 AddPatternToQueues(*p);
982 }
983 variable_queue_.ApplyPendingUpdates();
984 }
985
986 void RemovePatternFromQueues(const ShinglePattern* pattern) {
987 int single_use_score = SingleUseScore(pattern);
988 if (single_use_score > 0) {
989 size_t n = single_use_pattern_queue_.erase(pattern);
990 LOG_ASSERT(n == 1);
991 } else if (pattern->program_histogram_.size() == 0 &&
992 pattern->model_histogram_.size() == 0) {
993 NOTREACHED(); // Should not come back to life.
994 } else if (pattern->program_histogram_.size() == 0) {
995 // Useless pattern.
996 } else {
997 active_non_single_use_patterns_.erase(pattern);
998 AddPatternToLabelQueue(pattern, -1);
999 }
1000 }
1001
1002 void AddPatternToQueues(const ShinglePattern* pattern) {
1003 int single_use_score = SingleUseScore(pattern);
1004 if (single_use_score > 0) {
1005 single_use_pattern_queue_.insert(pattern);
1006 } else if (pattern->program_histogram_.size() == 0 &&
1007 pattern->model_histogram_.size() == 0) {
1008 } else if (pattern->program_histogram_.size() == 0) {
1009 // Useless pattern.
1010 } else {
1011 active_non_single_use_patterns_.insert(pattern);
1012 AddPatternToLabelQueue(pattern, +1);
1013 }
1014 }
1015
1016 void AddPatternToLabelQueue(const ShinglePattern* pattern, int sign) {
1017 // For each possible assignment in this pattern, update the potential
1018 // contributions to the LabelInfo queues.
sra@chromium.org8cd96962009-06-03 23:28:141019
1020 // We want to find for each symbol (LabelInfo) the maximum contribution that
1021 // could be achieved by making shingle-wise assignments between shingles in
1022 // the model and shingles in the program.
1023 //
1024 // If the shingles in the histograms are independent (no two shingles have a
1025 // symbol in common) then any permutation of the assignments is possible,
1026 // and the maximum contribution can be found by taking the maximum over all
1027 // the pairs.
1028 //
1029 // If the shingles are dependent two things happen. The maximum
1030 // contribution to any given symbol is a sum because the symbol has
1031 // contributions from all the shingles containing it. Second, some
1032 // assignments are blocked by previous incompatible assignments. We want to
1033 // avoid a combinatorial search, so we ignore the blocking.
1034
sra@google.com54f1b822009-07-18 03:28:401035 const size_t kUnwieldy = 5;
sra@chromium.org8cd96962009-06-03 23:28:141036
1037 typedef std::map<LabelInfo*, int> LabelToScore;
1038 typedef std::map<LabelInfo*, LabelToScore > ScoreSet;
1039 ScoreSet maxima;
1040
1041 size_t n_model_samples = 0;
1042 for (ShinglePattern::Histogram::const_iterator model_iter =
1043 pattern->model_histogram_.begin();
1044 model_iter != pattern->model_histogram_.end();
1045 ++model_iter) {
1046 if (++n_model_samples > kUnwieldy) break;
1047 const ShinglePattern::FreqView& model_freq = *model_iter;
1048 int m1 = model_freq.count();
1049 const Shingle* model_instance = model_freq.instance();
1050
1051 ScoreSet sums;
1052 size_t n_program_samples = 0;
1053 for (ShinglePattern::Histogram::const_iterator program_iter =
1054 pattern->program_histogram_.begin();
1055 program_iter != pattern->program_histogram_.end();
1056 ++program_iter) {
1057 if (++n_program_samples > kUnwieldy) break;
1058 const ShinglePattern::FreqView& program_freq = *program_iter;
1059 int p1 = program_freq.count();
1060 const Shingle* program_instance = program_freq.instance();
1061
1062 // int score = p1; // ? weigh all equally??
1063 int score = std::min(p1, m1);
1064
1065 for (size_t i = 0; i < Shingle::kWidth; ++i) {
1066 LabelInfo* program_info = program_instance->at(i);
1067 LabelInfo* model_info = model_instance->at(i);
1068 if ((model_info->assignment_ == NULL) !=
1069 (program_info->assignment_ == NULL)) {
pkasting@chromium.org18cca202010-10-21 20:40:581070 VLOG(2) << "ERROR " << i
1071 << "\n\t" << ToString(pattern, 10)
1072 << "\n\t" << ToString(program_instance)
1073 << "\n\t" << ToString(model_instance);
sra@chromium.org8cd96962009-06-03 23:28:141074 }
1075 if (!program_info->assignment_ && !model_info->assignment_) {
1076 sums[program_info][model_info] += score;
1077 }
1078 }
1079
1080 for (ScoreSet::iterator assignee_iterator = sums.begin();
1081 assignee_iterator != sums.end();
1082 ++assignee_iterator) {
1083 LabelInfo* program_info = assignee_iterator->first;
1084 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1085 p != assignee_iterator->second.end();
1086 ++p) {
1087 LabelInfo* model_info = p->first;
1088 int score = p->second;
1089 int* slot = &maxima[program_info][model_info];
1090 *slot = std::max(*slot, score);
1091 }
1092 }
1093 }
1094 }
1095
1096 for (ScoreSet::iterator assignee_iterator = maxima.begin();
1097 assignee_iterator != maxima.end();
1098 ++assignee_iterator) {
1099 LabelInfo* program_info = assignee_iterator->first;
1100 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1101 p != assignee_iterator->second.end();
1102 ++p) {
1103 LabelInfo* model_info = p->first;
1104 int score = sign * p->second;
1105 variable_queue_.AddPendingUpdate(program_info, model_info, score);
1106 }
1107 }
1108 }
1109
1110 void AssignOne(LabelInfo* model_info, LabelInfo* program_info) {
1111 LOG_ASSERT(!model_info->assignment_);
1112 LOG_ASSERT(!program_info->assignment_);
1113 LOG_ASSERT(model_info->is_model_);
1114 LOG_ASSERT(!program_info->is_model_);
1115
pkasting@chromium.org18cca202010-10-21 20:40:581116 VLOG(3) << "Assign " << ToString(program_info)
1117 << " := " << ToString(model_info);
sra@chromium.org8cd96962009-06-03 23:28:141118
1119 ShingleSet affected_shingles;
1120 AddAffectedPositions(model_info, &affected_shingles);
1121 AddAffectedPositions(program_info, &affected_shingles);
1122
1123 for (ShingleSet::iterator p = affected_shingles.begin();
1124 p != affected_shingles.end();
1125 ++p) {
1126 patterns_needing_updates_.insert((*p)->pattern());
1127 }
1128
1129 RemovePatternsNeedingUpdatesFromQueues();
1130
1131 for (ShingleSet::iterator p = affected_shingles.begin();
1132 p != affected_shingles.end();
1133 ++p) {
1134 Declassify(*p);
1135 }
1136
1137 program_info->label_->index_ = model_info->label_->index_;
1138 // Mark as assigned
1139 model_info->assignment_ = program_info;
1140 program_info->assignment_ = model_info;
1141
1142 for (ShingleSet::iterator p = affected_shingles.begin();
1143 p != affected_shingles.end();
1144 ++p) {
1145 Reclassify(*p);
1146 }
1147
1148 AddPatternsNeedingUpdatesToQueues();
1149 }
1150
1151 bool AssignFirstVariableOfHistogramHead(const ShinglePattern& pattern) {
1152 const ShinglePattern::FreqView& program_1 =
1153 *pattern.program_histogram_.begin();
1154 const ShinglePattern::FreqView& model_1 = *pattern.model_histogram_.begin();
1155 const Shingle* program_instance = program_1.instance();
1156 const Shingle* model_instance = model_1.instance();
1157 size_t variable_index = pattern.index_->first_variable_index_;
1158 LabelInfo* program_info = program_instance->at(variable_index);
1159 LabelInfo* model_info = model_instance->at(variable_index);
1160 AssignOne(model_info, program_info);
1161 return true;
1162 }
1163
1164 bool FindAndAssignBestLeader() {
1165 LOG_ASSERT(patterns_needing_updates_.empty());
1166
1167 if (!single_use_pattern_queue_.empty()) {
1168 const ShinglePattern& pattern = **single_use_pattern_queue_.begin();
1169 return AssignFirstVariableOfHistogramHead(pattern);
1170 }
1171
1172 if (variable_queue_.empty())
1173 return false;
1174
1175 const VariableQueue::ScoreAndLabel best = variable_queue_.first();
1176 int score = best.first;
1177 LabelInfo* assignee = best.second;
1178
1179 // TODO(sra): score (best.first) can be zero. A zero score means we are
1180 // blindly picking between two (or more) alternatives which look the same.
1181 // If we exit on the first zero-score we sometimes get 3-4% better total
1182 // compression. This indicates that 'infill' is doing a better job than
1183 // picking blindly. Perhaps we can use an extended region around the
1184 // undistinguished competing alternatives to break the tie.
1185 if (score == 0) {
1186 variable_queue_.Print();
1187 return false;
1188 }
1189
1190 AssignmentCandidates* candidates = assignee->candidates();
1191 if (candidates->empty())
1192 return false; // Should not happen.
1193
1194 AssignOne(candidates->top_candidate(), assignee);
1195 return true;
1196 }
1197
1198 private:
1199 // The trace vector contains the model sequence [0, model_end_) followed by
1200 // the program sequence [model_end_, trace.end())
1201 const Trace& trace_;
1202 size_t model_end_;
1203
1204 // |shingle_instances_| is the set of 'interned' shingles.
1205 Shingle::OwningSet shingle_instances_;
1206
1207 // |instances_| maps from position in |trace_| to Shingle at that position.
1208 std::vector<Shingle*> instances_;
1209
1210 SingleUsePatternQueue single_use_pattern_queue_;
1211 ShinglePatternSet active_non_single_use_patterns_;
1212 VariableQueue variable_queue_;
1213
1214 // Transient information: when we make an assignment, we need to recompute
1215 // priority queue information derived from these ShinglePatterns.
1216 ShinglePatternSet patterns_needing_updates_;
1217
1218 typedef std::map<ShinglePattern::Index,
1219 ShinglePattern, ShinglePatternIndexLess> IndexToPattern;
1220 IndexToPattern patterns_;
1221
1222 DISALLOW_COPY_AND_ASSIGN(AssignmentProblem);
1223};
1224
1225class Adjuster : public AdjustmentMethod {
1226 public:
sra@chromium.orgd38c3a22009-07-22 01:30:061227 Adjuster() : prog_(NULL), model_(NULL) {}
sra@chromium.org8cd96962009-06-03 23:28:141228 ~Adjuster() {}
1229
1230 bool Adjust(const AssemblyProgram& model, AssemblyProgram* program) {
pkasting@chromium.org18cca202010-10-21 20:40:581231 VLOG(1) << "Adjuster::Adjust";
sra@chromium.org8cd96962009-06-03 23:28:141232 prog_ = program;
1233 model_ = &model;
1234 return Finish();
1235 }
1236
1237 bool Finish() {
1238 prog_->UnassignIndexes();
1239 Trace abs32_trace_;
1240 Trace rel32_trace_;
1241 CollectTraces(model_, &abs32_trace_, &rel32_trace_, true);
1242 size_t abs32_model_end = abs32_trace_.size();
1243 size_t rel32_model_end = rel32_trace_.size();
1244 CollectTraces(prog_, &abs32_trace_, &rel32_trace_, false);
1245 Solve(abs32_trace_, abs32_model_end);
1246 Solve(rel32_trace_, rel32_model_end);
1247 prog_->AssignRemainingIndexes();
1248 return true;
1249 }
1250
1251 private:
1252 void CollectTraces(const AssemblyProgram* program, Trace* abs32, Trace* rel32,
1253 bool is_model) {
1254 label_info_maker_.ResetDebugLabel();
1255 const std::vector<Instruction*>& instructions = program->instructions();
1256 for (size_t i = 0; i < instructions.size(); ++i) {
1257 Instruction* instruction = instructions.at(i);
1258 if (Label* label = program->InstructionAbs32Label(instruction))
1259 ReferenceLabel(abs32, label, is_model);
1260 if (Label* label = program->InstructionRel32Label(instruction))
1261 ReferenceLabel(rel32, label, is_model);
1262 }
1263 // TODO(sra): we could simply append all the labels in index order to
1264 // incorporate some costing for entropy (bigger deltas) that will be
1265 // introduced into the label address table by non-monotonic ordering. This
1266 // would have some knock-on effects to parts of the algorithm that work on
1267 // single-occurrence labels.
1268 }
1269
1270 void Solve(const Trace& model, size_t model_end) {
laforge@chromium.org7e6b9262010-03-15 17:33:461271 base::Time start_time = base::Time::Now();
sra@chromium.org8cd96962009-06-03 23:28:141272 AssignmentProblem a(model, model_end);
1273 a.Solve();
pkasting@chromium.org18cca202010-10-21 20:40:581274 VLOG(1) << " Adjuster::Solve "
1275 << (base::Time::Now() - start_time).InSecondsF();
sra@chromium.org8cd96962009-06-03 23:28:141276 }
1277
1278 void ReferenceLabel(Trace* trace, Label* label, bool is_model) {
1279 trace->push_back(
bradnelson@google.com104a6082010-12-21 01:03:431280 label_info_maker_.MakeLabelInfo(label, is_model,
1281 static_cast<uint32>(trace->size())));
sra@chromium.org8cd96962009-06-03 23:28:141282 }
1283
1284 AssemblyProgram* prog_; // Program to be adjusted, owned by caller.
1285 const AssemblyProgram* model_; // Program to be mimicked, owned by caller.
1286
1287 LabelInfoMaker label_info_maker_;
1288
1289 private:
1290 DISALLOW_COPY_AND_ASSIGN(Adjuster);
1291};
1292
1293////////////////////////////////////////////////////////////////////////////////
1294
1295} // namespace adjustment_method_2
1296
1297AdjustmentMethod* AdjustmentMethod::MakeShingleAdjustmentMethod() {
1298 return new adjustment_method_2::Adjuster();
1299}
1300
1301} // namespace courgette