[go: nahoru, domu]

blob: 99c1103defc6f6e5331b37fc01db830c1e2dc9bd [file] [log] [blame]
tommi@chromium.org43a9e242011-04-06 17:42:451// Copyright (c) 2011 The Chromium Authors. All rights reserved.
sra@chromium.org8cd96962009-06-03 23:28:142// Use of this source code is governed by a BSD-style license that can be
3// found in the LICENSE file.
4
5#include "courgette/adjustment_method.h"
6
aviab98dcc92015-12-21 19:35:337#include <stddef.h>
8#include <stdint.h>
9
sra@chromium.org8cd96962009-06-03 23:28:1410#include <algorithm>
11#include <limits>
12#include <list>
13#include <map>
14#include <set>
15#include <string>
16#include <vector>
17
evan@chromium.org34b2b002009-11-20 06:53:2818#include "base/format_macros.h"
sra@chromium.org8cd96962009-06-03 23:28:1419#include "base/logging.h"
aviab98dcc92015-12-21 19:35:3320#include "base/macros.h"
avi@chromium.org68c986e2013-06-11 13:26:2521#include "base/strings/stringprintf.h"
avi@chromium.orgb43e5562013-06-28 15:20:0222#include "base/time/time.h"
sra@chromium.org8cd96962009-06-03 23:28:1423#include "courgette/assembly_program.h"
24#include "courgette/courgette.h"
25#include "courgette/encoded_program.h"
sra@chromium.org8cd96962009-06-03 23:28:1426
27/*
28
29Shingle weighting matching.
30
31We have a sequence S1 of symbols from alphabet A1={A,B,C,...} called the 'model'
32and a second sequence of S2 of symbols from alphabet A2={U,V,W,....} called the
33'program'. Each symbol in A1 has a unique numerical name or index. We can
34transcribe the sequence S1 to a sequence T1 of indexes of the symbols. We wish
35to assign indexes to the symbols in A2 so that when we transcribe S2 into T2, T2
36has long subsequences that occur in T1. This will ensure that the sequence
37T1;T2 compresses to be only slightly larger than the compressed T1.
38
39The algorithm for matching members of S2 with members of S1 is eager - it makes
40matches without backtracking, until no more matches can be made. Each variable
41(symbol) U,V,... in A2 has a set of candidates from A1, each candidate with a
42weight summarizing the evidence for the match. We keep a VariableQueue of
43U,V,... sorted by how much the evidence for the best choice outweighs the
44evidence for the second choice, i.e. prioritized by how 'clear cut' the best
45assignment is. We pick the variable with the most clear-cut candidate, make the
46assignment, adjust the evidence and repeat.
47
48What has not been described so far is how the evidence is gathered and
49maintained. We are working under the assumption that S1 and S2 are largely
50similar. (A different assumption might be that S1 and S2 are dissimilar except
51for many long subsequences.)
52
53A naive algorithm would consider all pairs (A,U) and for each pair assess the
54benefit, or score, the assignment U:=A. The score might count the number of
55occurrences of U in S2 which appear in similar contexts to A in S1.
56
57To distinguish contexts we view S1 and S2 as a sequence of overlapping k-length
58substrings or 'shingles'. Two shingles are compatible if the symbols in one
59shingle could be matched with the symbols in the other symbol. For example, ABC
60is *not* compatible with UVU because it would require conflicting matches A=U
61and C=U. ABC is compatible with UVW, UWV, WUV, VUW etc. We can't tell which
62until we make an assignment - the compatible shingles form an equivalence class.
63After assigning U:=A then only UVW and UWV (equivalently AVW, AWV) are
64compatible. As we make assignments the number of equivalence classes of
65shingles increases and the number of members of each equivalence class
66decreases. The compatibility test becomes more restrictive.
67
68We gather evidence for the potential assignment U:=A by counting how many
69shingles containing U are compatible with shingles containing A. Thus symbols
70occurring a large number of times in compatible contexts will be assigned first.
71
72Finding the 'most clear-cut' assignment by considering all pairs symbols and for
73each pair comparing the contexts of each pair of occurrences of the symbols is
74computationally infeasible. We get the job done in a reasonable time by
75approaching it 'backwards' and making incremental changes as we make
76assignments.
77
78First the shingles are partitioned according to compatibility. In S1=ABCDD and
79S2=UVWXX we have a total of 6 shingles, each occuring once. (ABC:1 BCD:1 CDD:1;
80UVW:1 VWX: WXX:1) all fit the pattern <V0 V1 V2> or the pattern <V0 V1 V1>. The
81first pattern indicates that each position matches a different symbol, the
82second pattern indicates that the second symbol is repeated.
83
84 pattern S1 members S2 members
85 <V0 V1 V2>: {ABC:1, BCD:1}; {UVW:1, VWX:1}
86 <V0 V1 V1>: {CDD:1} {WXX:1}
87
88The second pattern appears to have a unique assignment but we don't make the
89assignment on such scant evidence. If S1 and S2 do not match exactly, there
90will be numerous spurious low-score matches like this. Instead we must see what
91assignments are indicated by considering all of the evidence.
92
93First pattern has 2 x 2 = 4 shingle pairs. For each pair we count the number
94of symbol assignments. For ABC:a * UVW:b accumulate min(a,b) to each of
95 {U:=A, V:=B, W:=C}.
96After accumulating over all 2 x 2 pairs:
97 U: {A:1 B:1}
98 V: {A:1 B:2 C:1}
99 W: {B:1 C:2 D:1 }
100 X: {C:1 D:1}
101The second pattern contributes:
102 W: {C:1}
103 X: {D:2}
104Sum:
105 U: {A:1 B:1}
106 V: {A:1 B:2 C:1}
107 W: {B:1 C:3 D:1}
108 X: {C:1 D:3}
109
110From this we decide to assign X:=D (because this assignment has both the largest
111difference above the next candidate (X:=C) and this is also the largest
112proportionately over the sum of alternatives).
113
114Lets assume D has numerical 'name' 77. The assignment X:=D sets X to 77 too.
115Next we repartition all the shingles containing X or D:
116
117 pattern S1 members S2 members
118 <V0 V1 V2>: {ABC:1}; {UVW:1}
119 <V0 V1 77>: {BCD:1}; {VWX:1}
120 <V0 77 77>: {CDD:1} {WXX:1}
121As we repartition, we recalculate the contributions to the scores:
122 U: {A:1}
123 V: {B:2}
124 W: {C:3}
125All the remaining assignments are now fixed.
126
127There is one step in the incremental algorithm that is still infeasibly
128expensive: the contributions due to the cross product of large equivalence
129classes. We settle for making an approximation by computing the contribution of
130the cross product of only the most common shingles. The hope is that the noise
131from the long tail of uncounted shingles is well below the scores being used to
132pick assignments. The second hope is that as assignment are made, the large
133equivalence class will be partitioned into smaller equivalence classes, reducing
134the noise over time.
135
136In the code below the shingles are bigger (Shingle::kWidth = 5).
137Class ShinglePattern holds the data for one pattern.
138
139There is an optimization for this case:
140 <V0 V1 V1>: {CDD:1} {WXX:1}
141
142Above we said that we don't make an assignment on this "scant evidence". There
143is an exception: if there is only one variable unassigned (more like the <V0 77
14477> pattern) AND there are no occurrences of C and W other than those counted in
145this pattern, then there is no competing evidence and we go ahead with the
146assignment immediately. This produces slightly better results because these
147cases tend to be low-scoring and susceptible to small mistakes made in
148low-scoring assignments in the approximation for large equivalence classes.
149
150*/
151
152namespace courgette {
153namespace adjustment_method_2 {
154
sra@chromium.org8cd96962009-06-03 23:28:14155////////////////////////////////////////////////////////////////////////////////
156
157class AssignmentCandidates;
158class LabelInfoMaker;
159class Shingle;
160class ShinglePattern;
161
162// The purpose of adjustment is to assign indexes to Labels of a program 'p' to
163// make the sequence of indexes similar to a 'model' program 'm'. Labels
164// themselves don't have enough information to do this job, so we work with a
165// LabelInfo surrogate for each label.
166//
167class LabelInfo {
168 public:
169 // Just a no-argument constructor and copy constructor. Actual LabelInfo
170 // objects are allocated in std::pair structs in a std::map.
171 LabelInfo()
172 : label_(NULL), is_model_(false), debug_index_(0), refs_(0),
173 assignment_(NULL), candidates_(NULL)
174 {}
175
176 ~LabelInfo();
177
178 AssignmentCandidates* candidates();
179
180 Label* label_; // The label that this info a surrogate for.
181
aviab98dcc92015-12-21 19:35:33182 uint32_t is_model_ : 1; // Is the label in the model?
183 uint32_t debug_index_ : 31; // A small number for naming the label in debug
184 // output. The pair (is_model_, debug_index_) is
185 // unique.
sra@chromium.org8cd96962009-06-03 23:28:14186
sra@google.com54f1b822009-07-18 03:28:40187 int refs_; // Number of times this Label is referenced.
sra@chromium.org8cd96962009-06-03 23:28:14188
189 LabelInfo* assignment_; // Label from other program corresponding to this.
190
aviab98dcc92015-12-21 19:35:33191 std::vector<uint32_t> positions_; // Offsets into the trace of references.
sra@chromium.org8cd96962009-06-03 23:28:14192
193 private:
194 AssignmentCandidates* candidates_;
195
196 void operator=(const LabelInfo*); // Disallow assignment only.
197 // Public compiler generated copy constructor is needed to constuct
198 // std::pair<Label*, LabelInfo> so that fresh LabelInfos can be allocated
199 // inside a std::map.
200};
201
202typedef std::vector<LabelInfo*> Trace;
203
204std::string ToString(const LabelInfo* info) {
205 std::string s;
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11206 base::StringAppendF(&s, "%c%d", "pm"[info->is_model_], info->debug_index_);
sra@chromium.org8cd96962009-06-03 23:28:14207 if (info->label_->index_ != Label::kNoIndex)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11208 base::StringAppendF(&s, " (%d)", info->label_->index_);
sra@chromium.org8cd96962009-06-03 23:28:14209
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11210 base::StringAppendF(&s, " #%u", info->refs_);
sra@chromium.org8cd96962009-06-03 23:28:14211 return s;
212}
213
214// LabelInfoMaker maps labels to their surrogate LabelInfo objects.
215class LabelInfoMaker {
216 public:
217 LabelInfoMaker() : debug_label_index_gen_(0) {}
218
aviab98dcc92015-12-21 19:35:33219 LabelInfo* MakeLabelInfo(Label* label, bool is_model, uint32_t position) {
sra@chromium.org8cd96962009-06-03 23:28:14220 LabelInfo& slot = label_infos_[label];
221 if (slot.label_ == NULL) {
222 slot.label_ = label;
223 slot.is_model_ = is_model;
224 slot.debug_index_ = ++debug_label_index_gen_;
225 }
226 slot.positions_.push_back(position);
227 ++slot.refs_;
228 return &slot;
229 }
230
231 void ResetDebugLabel() { debug_label_index_gen_ = 0; }
232
233 private:
234 int debug_label_index_gen_;
235
236 // Note LabelInfo is allocated 'flat' inside map::value_type, so the LabelInfo
237 // lifetimes are managed by the map.
238 std::map<Label*, LabelInfo> label_infos_;
239
240 DISALLOW_COPY_AND_ASSIGN(LabelInfoMaker);
241};
242
243struct OrderLabelInfo {
244 bool operator()(const LabelInfo* a, const LabelInfo* b) const {
245 if (a->label_->rva_ < b->label_->rva_) return true;
246 if (a->label_->rva_ > b->label_->rva_) return false;
247 if (a == b) return false;
248 return a->positions_ < b->positions_; // Lexicographic ordering of vector.
249 }
250};
251
252// AssignmentCandidates is a priority queue of candidate assignments to
253// a single program LabelInfo, |program_info_|.
254class AssignmentCandidates {
255 public:
256 explicit AssignmentCandidates(LabelInfo* program_info)
257 : program_info_(program_info) {}
258
259 LabelInfo* program_info() const { return program_info_; }
260
261 bool empty() const { return label_to_score_.empty(); }
262
263 LabelInfo* top_candidate() const { return queue_.begin()->second; }
264
265 void Update(LabelInfo* model_info, int delta_score) {
266 LOG_ASSERT(delta_score != 0);
267 int old_score = 0;
268 int new_score = 0;
269 LabelToScore::iterator p = label_to_score_.find(model_info);
270 if (p != label_to_score_.end()) {
271 old_score = p->second;
272 new_score = old_score + delta_score;
273 queue_.erase(ScoreAndLabel(old_score, p->first));
274 if (new_score == 0) {
275 label_to_score_.erase(p);
276 } else {
277 p->second = new_score;
278 queue_.insert(ScoreAndLabel(new_score, model_info));
279 }
280 } else {
281 new_score = delta_score;
282 label_to_score_.insert(std::make_pair(model_info, new_score));
283 queue_.insert(ScoreAndLabel(new_score, model_info));
284 }
285 LOG_ASSERT(queue_.size() == label_to_score_.size());
286 }
287
288 int TopScore() const {
289 int first_value = 0;
290 int second_value = 0;
291 Queue::const_iterator p = queue_.begin();
292 if (p != queue_.end()) {
293 first_value = p->first;
294 ++p;
295 if (p != queue_.end()) {
296 second_value = p->first;
297 }
298 }
299 return first_value - second_value;
300 }
301
302 bool HasPendingUpdates() { return !pending_updates_.empty(); }
303
304 void AddPendingUpdate(LabelInfo* model_info, int delta_score) {
305 LOG_ASSERT(delta_score != 0);
306 pending_updates_[model_info] += delta_score;
307 }
308
309 void ApplyPendingUpdates() {
310 // TODO(sra): try to walk |pending_updates_| and |label_to_score_| in
311 // lockstep. Try to batch updates to |queue_|.
312 size_t zeroes = 0;
313 for (LabelToScore::iterator p = pending_updates_.begin();
314 p != pending_updates_.end();
315 ++p) {
316 if (p->second != 0)
317 Update(p->first, p->second);
318 else
319 ++zeroes;
320 }
321 pending_updates_.clear();
322 }
323
324 void Print(int max) {
pkasting@chromium.org18cca202010-10-21 20:40:58325 VLOG(2) << "score " << TopScore() << " " << ToString(program_info_)
326 << " := ?";
sra@chromium.org8cd96962009-06-03 23:28:14327 if (!pending_updates_.empty())
pkasting@chromium.org18cca202010-10-21 20:40:58328 VLOG(2) << pending_updates_.size() << " pending";
sra@chromium.org8cd96962009-06-03 23:28:14329 int count = 0;
330 for (Queue::iterator q = queue_.begin(); q != queue_.end(); ++q) {
331 if (++count > max) break;
pkasting@chromium.org18cca202010-10-21 20:40:58332 VLOG(2) << " " << q->first << " " << ToString(q->second);
sra@chromium.org8cd96962009-06-03 23:28:14333 }
334 }
335
336 private:
337 typedef std::map<LabelInfo*, int, OrderLabelInfo> LabelToScore;
338 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
339 struct OrderScoreAndLabelByScoreDecreasing {
340 OrderLabelInfo tie_breaker;
341 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
342 if (a.first > b.first) return true;
343 if (a.first < b.first) return false;
344 return tie_breaker(a.second, b.second);
345 }
346 };
347 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
348
349 LabelInfo* program_info_;
350 LabelToScore label_to_score_;
351 LabelToScore pending_updates_;
352 Queue queue_;
353};
354
355AssignmentCandidates* LabelInfo::candidates() {
356 if (candidates_ == NULL)
357 candidates_ = new AssignmentCandidates(this);
358 return candidates_;
359}
360
361LabelInfo::~LabelInfo() {
362 delete candidates_;
363}
364
365// A Shingle is a short fixed-length string of LabelInfos that actually occurs
366// in a Trace. A Shingle may occur many times. We repesent the Shingle by the
367// position of one of the occurrences in the Trace.
368class Shingle {
369 public:
aviab98dcc92015-12-21 19:35:33370 static const uint8_t kWidth = 5;
sra@chromium.org8cd96962009-06-03 23:28:14371
372 struct InterningLess {
373 bool operator()(const Shingle& a, const Shingle& b) const;
374 };
375
376 typedef std::set<Shingle, InterningLess> OwningSet;
377
378 static Shingle* Find(const Trace& trace, size_t position,
379 OwningSet* owning_set) {
380 std::pair<OwningSet::iterator, bool> pair =
381 owning_set->insert(Shingle(trace, position));
sra@google.com54f1b822009-07-18 03:28:40382 // pair.first iterator 'points' to the newly inserted Shingle or the
383 // previouly inserted one that looks the same according to the comparator.
384
385 // const_cast required because key is const. We modify the Shingle
386 // extensively but not in a way that affects InterningLess.
387 Shingle* shingle = const_cast<Shingle*>(&*pair.first);
388 shingle->add_position(position);
389 return shingle;
sra@chromium.org8cd96962009-06-03 23:28:14390 }
391
392 LabelInfo* at(size_t i) const { return trace_[exemplar_position_ + i]; }
bradnelson@google.com104a6082010-12-21 01:03:43393 void add_position(size_t position) {
aviab98dcc92015-12-21 19:35:33394 positions_.push_back(static_cast<uint32_t>(position));
bradnelson@google.com104a6082010-12-21 01:03:43395 }
396 int position_count() const { return static_cast<int>(positions_.size()); }
sra@chromium.org8cd96962009-06-03 23:28:14397
398 bool InModel() const { return at(0)->is_model_; }
399
400 ShinglePattern* pattern() const { return pattern_; }
401 void set_pattern(ShinglePattern* pattern) { pattern_ = pattern; }
402
403 struct PointerLess {
404 bool operator()(const Shingle* a, const Shingle* b) const {
405 // Arbitrary but repeatable (memory-address) independent ordering:
406 return a->exemplar_position_ < b->exemplar_position_;
407 // return InterningLess()(*a, *b);
408 }
409 };
410
411 private:
412 Shingle(const Trace& trace, size_t exemplar_position)
413 : trace_(trace),
414 exemplar_position_(exemplar_position),
415 pattern_(NULL) {
416 }
417
418 const Trace& trace_; // The shingle lives inside trace_.
419 size_t exemplar_position_; // At this position (and other positions).
aviab98dcc92015-12-21 19:35:33420 std::vector<uint32_t> positions_; // Includes exemplar_position_.
sra@chromium.org8cd96962009-06-03 23:28:14421
422 ShinglePattern* pattern_; // Pattern changes as LabelInfos are assigned.
423
424 friend std::string ToString(const Shingle* instance);
425
426 // We can't disallow the copy constructor because we use std::set<Shingle> and
427 // VS2005's implementation of std::set<T>::set() requires T to have a copy
428 // constructor.
429 // DISALLOW_COPY_AND_ASSIGN(Shingle);
Chris Watkins2903a822017-11-30 03:19:57430 void operator=(const Shingle&) = delete; // Disallow assignment only.
sra@chromium.org8cd96962009-06-03 23:28:14431};
432
433std::string ToString(const Shingle* instance) {
434 std::string s;
435 const char* sep = "<";
aviab98dcc92015-12-21 19:35:33436 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11437 // base::StringAppendF(&s, "%s%x ", sep, instance.at(i)->label_->rva_);
sra@chromium.org8cd96962009-06-03 23:28:14438 s += sep;
439 s += ToString(instance->at(i));
440 sep = ", ";
441 }
bradnelson@google.com104a6082010-12-21 01:03:43442 base::StringAppendF(&s, ">(%" PRIuS ")@{%d}",
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11443 instance->exemplar_position_,
444 instance->position_count());
sra@chromium.org8cd96962009-06-03 23:28:14445 return s;
446}
447
448
449bool Shingle::InterningLess::operator()(
450 const Shingle& a,
451 const Shingle& b) const {
aviab98dcc92015-12-21 19:35:33452 for (uint8_t i = 0; i < kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14453 LabelInfo* info_a = a.at(i);
454 LabelInfo* info_b = b.at(i);
455 if (info_a->label_->rva_ < info_b->label_->rva_)
456 return true;
457 if (info_a->label_->rva_ > info_b->label_->rva_)
458 return false;
459 if (info_a->is_model_ < info_b->is_model_)
460 return true;
461 if (info_a->is_model_ > info_b->is_model_)
462 return false;
463 if (info_a != info_b) {
464 NOTREACHED();
465 }
466 }
467 return false;
468}
469
470class ShinglePattern {
471 public:
472 enum { kOffsetMask = 7, // Offset lives in low bits.
473 kFixed = 0, // kind & kVariable == 0 => fixed.
474 kVariable = 8 // kind & kVariable == 1 => variable.
475 };
476 // sequence[position + (kinds_[i] & kOffsetMask)] gives LabelInfo for position
477 // i of shingle. Below, second 'A' is duplicate of position 1, second '102'
478 // is duplicate of position 0.
479 //
480 // <102, A, 103, A , 102>
481 // --> <kFixed+0, kVariable+1, kFixed+2, kVariable+1, kFixed+0>
482 struct Index {
483 explicit Index(const Shingle* instance);
aviab98dcc92015-12-21 19:35:33484 uint8_t kinds_[Shingle::kWidth];
485 uint8_t variables_;
486 uint8_t unique_variables_;
487 uint8_t first_variable_index_;
488 uint32_t hash_;
sra@chromium.org8cd96962009-06-03 23:28:14489 int assigned_indexes_[Shingle::kWidth];
490 };
491
492 // ShinglePattern keeps histograms of member Shingle instances, ordered by
493 // decreasing number of occurrences. We don't have a pair (occurrence count,
494 // Shingle instance), so we use a FreqView adapter to make the instance
495 // pointer look like the pair.
496 class FreqView {
497 public:
498 explicit FreqView(const Shingle* instance) : instance_(instance) {}
bradnelson@google.com104a6082010-12-21 01:03:43499 int count() const { return instance_->position_count(); }
sra@chromium.org8cd96962009-06-03 23:28:14500 const Shingle* instance() const { return instance_; }
501 struct Greater {
502 bool operator()(const FreqView& a, const FreqView& b) const {
503 if (a.count() > b.count()) return true;
504 if (a.count() < b.count()) return false;
505 return resolve_ties(a.instance(), b.instance());
506 }
507 private:
508 Shingle::PointerLess resolve_ties;
509 };
510 private:
511 const Shingle* instance_;
512 };
513
514 typedef std::set<FreqView, FreqView::Greater> Histogram;
515
516 ShinglePattern() : index_(NULL), model_coverage_(0), program_coverage_(0) {}
517
518 const Index* index_; // Points to the key in the owning map value_type.
519 Histogram model_histogram_;
520 Histogram program_histogram_;
521 int model_coverage_;
522 int program_coverage_;
523};
524
525std::string ToString(const ShinglePattern::Index* index) {
526 std::string s;
527 if (index == NULL) {
528 s = "<null>";
529 } else {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11530 base::StringAppendF(&s, "<%d: ", index->variables_);
sra@chromium.org8cd96962009-06-03 23:28:14531 const char* sep = "";
aviab98dcc92015-12-21 19:35:33532 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14533 s += sep;
534 sep = ", ";
aviab98dcc92015-12-21 19:35:33535 uint32_t kind = index->kinds_[i];
sra@chromium.org8cd96962009-06-03 23:28:14536 int offset = kind & ShinglePattern::kOffsetMask;
537 if (kind & ShinglePattern::kVariable)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11538 base::StringAppendF(&s, "V%d", offset);
sra@chromium.org8cd96962009-06-03 23:28:14539 else
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11540 base::StringAppendF(&s, "%d", index->assigned_indexes_[offset]);
sra@chromium.org8cd96962009-06-03 23:28:14541 }
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11542 base::StringAppendF(&s, " %x", index->hash_);
sra@chromium.org8cd96962009-06-03 23:28:14543 s += ">";
544 }
545 return s;
546}
547
548std::string HistogramToString(const ShinglePattern::Histogram& histogram,
549 size_t snippet_max) {
550 std::string s;
551 size_t histogram_size = histogram.size();
552 size_t snippet_size = 0;
553 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
554 p != histogram.end();
555 ++p) {
556 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
557 s += " ...";
558 break;
559 }
bradnelson@google.com104a6082010-12-21 01:03:43560 base::StringAppendF(&s, " %d", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14561 }
562 return s;
563}
564
565std::string HistogramToStringFull(const ShinglePattern::Histogram& histogram,
566 const char* indent,
567 size_t snippet_max) {
568 std::string s;
569
570 size_t histogram_size = histogram.size();
571 size_t snippet_size = 0;
572 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
573 p != histogram.end();
574 ++p) {
575 s += indent;
576 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
577 s += "...\n";
578 break;
579 }
bradnelson@google.com104a6082010-12-21 01:03:43580 base::StringAppendF(&s, "(%d) ", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14581 s += ToString(&(*p->instance()));
582 s += "\n";
583 }
584 return s;
585}
586
587std::string ToString(const ShinglePattern* pattern, size_t snippet_max = 3) {
588 std::string s;
589 if (pattern == NULL) {
590 s = "<null>";
591 } else {
592 s = "{";
593 s += ToString(pattern->index_);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11594 base::StringAppendF(&s, "; %d(%d):",
595 static_cast<int>(pattern->model_histogram_.size()),
596 pattern->model_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14597
598 s += HistogramToString(pattern->model_histogram_, snippet_max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11599 base::StringAppendF(&s, "; %d(%d):",
600 static_cast<int>(pattern->program_histogram_.size()),
601 pattern->program_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14602 s += HistogramToString(pattern->program_histogram_, snippet_max);
603 s += "}";
604 }
605 return s;
606}
607
608std::string ShinglePatternToStringFull(const ShinglePattern* pattern,
609 size_t max) {
610 std::string s;
611 s += ToString(pattern->index_);
612 s += "\n";
613 size_t model_size = pattern->model_histogram_.size();
614 size_t program_size = pattern->program_histogram_.size();
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11615 base::StringAppendF(&s, " model shingles %" PRIuS "\n", model_size);
sra@chromium.org8cd96962009-06-03 23:28:14616 s += HistogramToStringFull(pattern->model_histogram_, " ", max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11617 base::StringAppendF(&s, " program shingles %" PRIuS "\n", program_size);
sra@chromium.org8cd96962009-06-03 23:28:14618 s += HistogramToStringFull(pattern->program_histogram_, " ", max);
619 return s;
620}
621
622struct ShinglePatternIndexLess {
623 bool operator()(const ShinglePattern::Index& a,
624 const ShinglePattern::Index& b) const {
625 if (a.hash_ < b.hash_) return true;
626 if (a.hash_ > b.hash_) return false;
627
aviab98dcc92015-12-21 19:35:33628 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14629 if (a.kinds_[i] < b.kinds_[i]) return true;
630 if (a.kinds_[i] > b.kinds_[i]) return false;
631 if ((a.kinds_[i] & ShinglePattern::kVariable) == 0) {
632 if (a.assigned_indexes_[i] < b.assigned_indexes_[i])
633 return true;
634 if (a.assigned_indexes_[i] > b.assigned_indexes_[i])
635 return false;
636 }
637 }
638 return false;
639 }
640};
641
aviab98dcc92015-12-21 19:35:33642static uint32_t hash_combine(uint32_t h, uint32_t v) {
sra@chromium.org8cd96962009-06-03 23:28:14643 h += v;
644 return (h * (37 + 0x0000d100)) ^ (h >> 13);
645}
646
647ShinglePattern::Index::Index(const Shingle* instance) {
aviab98dcc92015-12-21 19:35:33648 uint32_t hash = 0;
sra@chromium.org8cd96962009-06-03 23:28:14649 variables_ = 0;
650 unique_variables_ = 0;
651 first_variable_index_ = 255;
652
aviab98dcc92015-12-21 19:35:33653 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14654 LabelInfo* info = instance->at(i);
aviab98dcc92015-12-21 19:35:33655 uint8_t kind = 0;
sra@chromium.org8cd96962009-06-03 23:28:14656 int code = -1;
aviab98dcc92015-12-21 19:35:33657 uint8_t j = 0;
sra@chromium.org8cd96962009-06-03 23:28:14658 for ( ; j < i; ++j) {
659 if (info == instance->at(j)) { // Duplicate LabelInfo
660 kind = kinds_[j];
661 break;
662 }
663 }
664 if (j == i) { // Not found above.
665 if (info->assignment_) {
666 code = info->label_->index_;
667 assigned_indexes_[i] = code;
668 kind = kFixed + i;
669 } else {
670 kind = kVariable + i;
671 ++unique_variables_;
672 if (i < first_variable_index_)
673 first_variable_index_ = i;
674 }
675 }
676 if (kind & kVariable) ++variables_;
677 hash = hash_combine(hash, code);
678 hash = hash_combine(hash, kind);
679 kinds_[i] = kind;
680 assigned_indexes_[i] = code;
681 }
682 hash_ = hash;
683}
684
685struct ShinglePatternLess {
686 bool operator()(const ShinglePattern& a, const ShinglePattern& b) const {
687 return index_less(*a.index_, *b.index_);
688 }
689 ShinglePatternIndexLess index_less;
690};
691
692struct ShinglePatternPointerLess {
693 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
694 return pattern_less(*a, *b);
695 }
696 ShinglePatternLess pattern_less;
697};
698
699template<int (*Scorer)(const ShinglePattern*)>
700struct OrderShinglePatternByScoreDescending {
701 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
702 int score_a = Scorer(a);
703 int score_b = Scorer(b);
704 if (score_a > score_b) return true;
705 if (score_a < score_b) return false;
706 return break_ties(a, b);
707 }
708 ShinglePatternPointerLess break_ties;
709};
710
711// Returns a score for a 'Single Use' rule. Returns -1 if the rule is not
712// applicable.
713int SingleUseScore(const ShinglePattern* pattern) {
714 if (pattern->index_->variables_ != 1)
715 return -1;
716
717 if (pattern->model_histogram_.size() != 1 ||
718 pattern->program_histogram_.size() != 1)
719 return -1;
720
721 // Does this pattern account for all uses of the variable?
722 const ShinglePattern::FreqView& program_freq =
723 *pattern->program_histogram_.begin();
724 const ShinglePattern::FreqView& model_freq =
725 *pattern->model_histogram_.begin();
726 int p1 = program_freq.count();
727 int m1 = model_freq.count();
728 if (p1 == m1) {
729 const Shingle* program_instance = program_freq.instance();
730 const Shingle* model_instance = model_freq.instance();
731 size_t variable_index = pattern->index_->first_variable_index_;
732 LabelInfo* program_info = program_instance->at(variable_index);
733 LabelInfo* model_info = model_instance->at(variable_index);
734 if (!program_info->assignment_) {
735 if (program_info->refs_ == p1 && model_info->refs_ == m1) {
736 return p1;
737 }
738 }
739 }
740 return -1;
741}
742
743// The VariableQueue is a priority queue of unassigned LabelInfos from
744// the 'program' (the 'variables') and their AssignmentCandidates.
745class VariableQueue {
746 public:
747 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
748
Chris Watkins2903a822017-11-30 03:19:57749 VariableQueue() = default;
sra@chromium.org8cd96962009-06-03 23:28:14750
751 bool empty() const { return queue_.empty(); }
752
753 const ScoreAndLabel& first() const { return *queue_.begin(); }
754
755 // For debugging only.
756 void Print() const {
757 for (Queue::const_iterator p = queue_.begin(); p != queue_.end(); ++p) {
758 AssignmentCandidates* candidates = p->second->candidates();
759 candidates->Print(std::numeric_limits<int>::max());
760 }
761 }
762
763 void AddPendingUpdate(LabelInfo* program_info, LabelInfo* model_info,
764 int delta_score) {
765 AssignmentCandidates* candidates = program_info->candidates();
766 if (!candidates->HasPendingUpdates()) {
767 pending_update_candidates_.push_back(candidates);
768 }
769 candidates->AddPendingUpdate(model_info, delta_score);
770 }
771
772 void ApplyPendingUpdates() {
773 for (size_t i = 0; i < pending_update_candidates_.size(); ++i) {
774 AssignmentCandidates* candidates = pending_update_candidates_[i];
775 int old_score = candidates->TopScore();
776 queue_.erase(ScoreAndLabel(old_score, candidates->program_info()));
777 candidates->ApplyPendingUpdates();
778 if (!candidates->empty()) {
779 int new_score = candidates->TopScore();
780 queue_.insert(ScoreAndLabel(new_score, candidates->program_info()));
781 }
782 }
783 pending_update_candidates_.clear();
784 }
785
786 private:
787 struct OrderScoreAndLabelByScoreDecreasing {
788 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
789 if (a.first > b.first) return true;
790 if (a.first < b.first) return false;
791 return OrderLabelInfo()(a.second, b.second);
792 }
793 };
794 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
795
796 Queue queue_;
797 std::vector<AssignmentCandidates*> pending_update_candidates_;
798
799 DISALLOW_COPY_AND_ASSIGN(VariableQueue);
800};
801
802
803class AssignmentProblem {
804 public:
805 AssignmentProblem(const Trace& trace, size_t model_end)
806 : trace_(trace),
807 model_end_(model_end) {
pkasting@chromium.org18cca202010-10-21 20:40:58808 VLOG(2) << "AssignmentProblem::AssignmentProblem " << model_end << ", "
809 << trace.size();
sra@chromium.org8cd96962009-06-03 23:28:14810 }
811
812 bool Solve() {
813 if (model_end_ < Shingle::kWidth ||
814 trace_.size() - model_end_ < Shingle::kWidth) {
815 // Nothing much we can do with such a short problem.
816 return true;
817 }
818 instances_.resize(trace_.size() - Shingle::kWidth + 1, NULL);
819 AddShingles(0, model_end_);
820 AddShingles(model_end_, trace_.size());
821 InitialClassify();
822 AddPatternsNeedingUpdatesToQueues();
823
824 patterns_needing_updates_.clear();
pkasting@chromium.org18cca202010-10-21 20:40:58825 while (FindAndAssignBestLeader())
sra@chromium.org8cd96962009-06-03 23:28:14826 patterns_needing_updates_.clear();
sra@chromium.org8cd96962009-06-03 23:28:14827 PrintActivePatterns();
828
829 return true;
830 }
831
832 private:
833 typedef std::set<Shingle*, Shingle::PointerLess> ShingleSet;
834
835 typedef std::set<const ShinglePattern*, ShinglePatternPointerLess>
836 ShinglePatternSet;
837
838 // Patterns are partitioned into the following sets:
839
840 // * Retired patterns (not stored). No shingles exist for this pattern (they
841 // all now match more specialized patterns).
842 // * Useless patterns (not stored). There are no 'program' shingles for this
843 // pattern (they all now match more specialized patterns).
844 // * Single-use patterns - single_use_pattern_queue_.
845 // * Other patterns - active_non_single_use_patterns_ / variable_queue_.
846
847 typedef std::set<const ShinglePattern*,
848 OrderShinglePatternByScoreDescending<&SingleUseScore> >
849 SingleUsePatternQueue;
850
851 void PrintPatternsHeader() const {
pkasting@chromium.org18cca202010-10-21 20:40:58852 VLOG(2) << shingle_instances_.size() << " instances "
853 << trace_.size() << " trace length "
854 << patterns_.size() << " shingle indexes "
855 << single_use_pattern_queue_.size() << " single use patterns "
856 << active_non_single_use_patterns_.size() << " active patterns";
sra@chromium.org8cd96962009-06-03 23:28:14857 }
858
859 void PrintActivePatterns() const {
860 for (ShinglePatternSet::const_iterator p =
861 active_non_single_use_patterns_.begin();
862 p != active_non_single_use_patterns_.end();
863 ++p) {
864 const ShinglePattern* pattern = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58865 VLOG(2) << ToString(pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14866 }
867 }
868
869 void PrintPatterns() const {
870 PrintAllPatterns();
871 PrintActivePatterns();
872 PrintAllShingles();
873 }
874
875 void PrintAllPatterns() const {
876 for (IndexToPattern::const_iterator p = patterns_.begin();
877 p != patterns_.end();
878 ++p) {
879 const ShinglePattern& pattern = p->second;
pkasting@chromium.org18cca202010-10-21 20:40:58880 VLOG(2) << ToString(&pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14881 }
882 }
883
884 void PrintAllShingles() const {
885 for (Shingle::OwningSet::const_iterator p = shingle_instances_.begin();
886 p != shingle_instances_.end();
887 ++p) {
888 const Shingle& instance = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58889 VLOG(2) << ToString(&instance) << " " << ToString(instance.pattern());
sra@chromium.org8cd96962009-06-03 23:28:14890 }
891 }
892
893
894 void AddShingles(size_t begin, size_t end) {
pkasting8e3a26a2014-10-03 18:52:29895 for (size_t i = begin; i + Shingle::kWidth - 1 < end; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14896 instances_[i] = Shingle::Find(trace_, i, &shingle_instances_);
897 }
898 }
899
900 void Declassify(Shingle* shingle) {
901 ShinglePattern* pattern = shingle->pattern();
902 if (shingle->InModel()) {
903 pattern->model_histogram_.erase(ShinglePattern::FreqView(shingle));
904 pattern->model_coverage_ -= shingle->position_count();
905 } else {
906 pattern->program_histogram_.erase(ShinglePattern::FreqView(shingle));
907 pattern->program_coverage_ -= shingle->position_count();
908 }
909 shingle->set_pattern(NULL);
910 }
911
912 void Reclassify(Shingle* shingle) {
913 ShinglePattern* pattern = shingle->pattern();
914 LOG_ASSERT(pattern == NULL);
915
916 ShinglePattern::Index index(shingle);
917 if (index.variables_ == 0)
918 return;
919
920 std::pair<IndexToPattern::iterator, bool> inserted =
921 patterns_.insert(std::make_pair(index, ShinglePattern()));
922
923 pattern = &inserted.first->second;
924 pattern->index_ = &inserted.first->first;
925 shingle->set_pattern(pattern);
926 patterns_needing_updates_.insert(pattern);
927
928 if (shingle->InModel()) {
929 pattern->model_histogram_.insert(ShinglePattern::FreqView(shingle));
930 pattern->model_coverage_ += shingle->position_count();
931 } else {
932 pattern->program_histogram_.insert(ShinglePattern::FreqView(shingle));
933 pattern->program_coverage_ += shingle->position_count();
934 }
935 }
936
937 void InitialClassify() {
938 for (Shingle::OwningSet::iterator p = shingle_instances_.begin();
939 p != shingle_instances_.end();
940 ++p) {
sra@google.com54f1b822009-07-18 03:28:40941 // GCC's set<T>::iterator::operator *() returns a const object.
942 Reclassify(const_cast<Shingle*>(&*p));
sra@chromium.org8cd96962009-06-03 23:28:14943 }
944 }
945
946 // For the positions in |info|, find the shingles that overlap that position.
947 void AddAffectedPositions(LabelInfo* info, ShingleSet* affected_shingles) {
aviab98dcc92015-12-21 19:35:33948 const uint8_t kWidth = Shingle::kWidth;
sra@chromium.org8cd96962009-06-03 23:28:14949 for (size_t i = 0; i < info->positions_.size(); ++i) {
950 size_t position = info->positions_[i];
951 // Find bounds to the subrange of |trace_| we are in.
952 size_t start = position < model_end_ ? 0 : model_end_;
953 size_t end = position < model_end_ ? model_end_ : trace_.size();
954
955 // Clip [position-kWidth+1, position+1)
pkasting8e3a26a2014-10-03 18:52:29956 size_t low =
957 position > start + kWidth - 1 ? position - kWidth + 1 : start;
sra@chromium.org8cd96962009-06-03 23:28:14958 size_t high = position + kWidth < end ? position + 1 : end - kWidth + 1;
959
960 for (size_t shingle_position = low;
961 shingle_position < high;
962 ++shingle_position) {
963 Shingle* overlapping_shingle = instances_.at(shingle_position);
964 affected_shingles->insert(overlapping_shingle);
965 }
966 }
967 }
968
969 void RemovePatternsNeedingUpdatesFromQueues() {
970 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
971 p != patterns_needing_updates_.end();
972 ++p) {
973 RemovePatternFromQueues(*p);
974 }
975 }
976
977 void AddPatternsNeedingUpdatesToQueues() {
978 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
979 p != patterns_needing_updates_.end();
980 ++p) {
981 AddPatternToQueues(*p);
982 }
983 variable_queue_.ApplyPendingUpdates();
984 }
985
986 void RemovePatternFromQueues(const ShinglePattern* pattern) {
987 int single_use_score = SingleUseScore(pattern);
988 if (single_use_score > 0) {
989 size_t n = single_use_pattern_queue_.erase(pattern);
990 LOG_ASSERT(n == 1);
erg@google.comf6b8ce32011-03-02 00:03:18991 } else if (pattern->program_histogram_.empty() &&
992 pattern->model_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:14993 NOTREACHED(); // Should not come back to life.
erg@google.comf6b8ce32011-03-02 00:03:18994 } else if (pattern->program_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:14995 // Useless pattern.
996 } else {
997 active_non_single_use_patterns_.erase(pattern);
998 AddPatternToLabelQueue(pattern, -1);
999 }
1000 }
1001
1002 void AddPatternToQueues(const ShinglePattern* pattern) {
1003 int single_use_score = SingleUseScore(pattern);
1004 if (single_use_score > 0) {
1005 single_use_pattern_queue_.insert(pattern);
erg@google.comf6b8ce32011-03-02 00:03:181006 } else if (pattern->program_histogram_.empty() &&
1007 pattern->model_histogram_.empty()) {
1008 } else if (pattern->program_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:141009 // Useless pattern.
1010 } else {
1011 active_non_single_use_patterns_.insert(pattern);
1012 AddPatternToLabelQueue(pattern, +1);
1013 }
1014 }
1015
1016 void AddPatternToLabelQueue(const ShinglePattern* pattern, int sign) {
1017 // For each possible assignment in this pattern, update the potential
1018 // contributions to the LabelInfo queues.
sra@chromium.org8cd96962009-06-03 23:28:141019
1020 // We want to find for each symbol (LabelInfo) the maximum contribution that
1021 // could be achieved by making shingle-wise assignments between shingles in
1022 // the model and shingles in the program.
1023 //
1024 // If the shingles in the histograms are independent (no two shingles have a
1025 // symbol in common) then any permutation of the assignments is possible,
1026 // and the maximum contribution can be found by taking the maximum over all
1027 // the pairs.
1028 //
1029 // If the shingles are dependent two things happen. The maximum
1030 // contribution to any given symbol is a sum because the symbol has
1031 // contributions from all the shingles containing it. Second, some
1032 // assignments are blocked by previous incompatible assignments. We want to
1033 // avoid a combinatorial search, so we ignore the blocking.
1034
sra@google.com54f1b822009-07-18 03:28:401035 const size_t kUnwieldy = 5;
sra@chromium.org8cd96962009-06-03 23:28:141036
1037 typedef std::map<LabelInfo*, int> LabelToScore;
1038 typedef std::map<LabelInfo*, LabelToScore > ScoreSet;
1039 ScoreSet maxima;
1040
1041 size_t n_model_samples = 0;
1042 for (ShinglePattern::Histogram::const_iterator model_iter =
1043 pattern->model_histogram_.begin();
1044 model_iter != pattern->model_histogram_.end();
1045 ++model_iter) {
1046 if (++n_model_samples > kUnwieldy) break;
1047 const ShinglePattern::FreqView& model_freq = *model_iter;
1048 int m1 = model_freq.count();
1049 const Shingle* model_instance = model_freq.instance();
1050
1051 ScoreSet sums;
1052 size_t n_program_samples = 0;
1053 for (ShinglePattern::Histogram::const_iterator program_iter =
1054 pattern->program_histogram_.begin();
1055 program_iter != pattern->program_histogram_.end();
1056 ++program_iter) {
1057 if (++n_program_samples > kUnwieldy) break;
1058 const ShinglePattern::FreqView& program_freq = *program_iter;
1059 int p1 = program_freq.count();
1060 const Shingle* program_instance = program_freq.instance();
1061
1062 // int score = p1; // ? weigh all equally??
1063 int score = std::min(p1, m1);
1064
aviab98dcc92015-12-21 19:35:331065 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:141066 LabelInfo* program_info = program_instance->at(i);
1067 LabelInfo* model_info = model_instance->at(i);
1068 if ((model_info->assignment_ == NULL) !=
1069 (program_info->assignment_ == NULL)) {
pkasting@chromium.org18cca202010-10-21 20:40:581070 VLOG(2) << "ERROR " << i
1071 << "\n\t" << ToString(pattern, 10)
1072 << "\n\t" << ToString(program_instance)
1073 << "\n\t" << ToString(model_instance);
sra@chromium.org8cd96962009-06-03 23:28:141074 }
1075 if (!program_info->assignment_ && !model_info->assignment_) {
1076 sums[program_info][model_info] += score;
1077 }
1078 }
huangs2bffdb362015-08-18 14:05:341079 }
sra@chromium.org8cd96962009-06-03 23:28:141080
huangs2bffdb362015-08-18 14:05:341081 for (ScoreSet::iterator assignee_iterator = sums.begin();
1082 assignee_iterator != sums.end();
1083 ++assignee_iterator) {
1084 LabelInfo* program_info = assignee_iterator->first;
1085 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1086 p != assignee_iterator->second.end();
1087 ++p) {
1088 LabelInfo* model_info = p->first;
1089 int score = p->second;
1090 int* slot = &maxima[program_info][model_info];
1091 *slot = std::max(*slot, score);
sra@chromium.org8cd96962009-06-03 23:28:141092 }
1093 }
1094 }
1095
1096 for (ScoreSet::iterator assignee_iterator = maxima.begin();
1097 assignee_iterator != maxima.end();
1098 ++assignee_iterator) {
1099 LabelInfo* program_info = assignee_iterator->first;
1100 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1101 p != assignee_iterator->second.end();
1102 ++p) {
1103 LabelInfo* model_info = p->first;
1104 int score = sign * p->second;
1105 variable_queue_.AddPendingUpdate(program_info, model_info, score);
1106 }
1107 }
1108 }
1109
1110 void AssignOne(LabelInfo* model_info, LabelInfo* program_info) {
1111 LOG_ASSERT(!model_info->assignment_);
1112 LOG_ASSERT(!program_info->assignment_);
1113 LOG_ASSERT(model_info->is_model_);
1114 LOG_ASSERT(!program_info->is_model_);
1115
pkasting@chromium.org18cca202010-10-21 20:40:581116 VLOG(3) << "Assign " << ToString(program_info)
1117 << " := " << ToString(model_info);
sra@chromium.org8cd96962009-06-03 23:28:141118
1119 ShingleSet affected_shingles;
1120 AddAffectedPositions(model_info, &affected_shingles);
1121 AddAffectedPositions(program_info, &affected_shingles);
1122
1123 for (ShingleSet::iterator p = affected_shingles.begin();
1124 p != affected_shingles.end();
1125 ++p) {
1126 patterns_needing_updates_.insert((*p)->pattern());
1127 }
1128
1129 RemovePatternsNeedingUpdatesFromQueues();
1130
1131 for (ShingleSet::iterator p = affected_shingles.begin();
1132 p != affected_shingles.end();
1133 ++p) {
1134 Declassify(*p);
1135 }
1136
1137 program_info->label_->index_ = model_info->label_->index_;
1138 // Mark as assigned
1139 model_info->assignment_ = program_info;
1140 program_info->assignment_ = model_info;
1141
1142 for (ShingleSet::iterator p = affected_shingles.begin();
1143 p != affected_shingles.end();
1144 ++p) {
1145 Reclassify(*p);
1146 }
1147
1148 AddPatternsNeedingUpdatesToQueues();
1149 }
1150
1151 bool AssignFirstVariableOfHistogramHead(const ShinglePattern& pattern) {
1152 const ShinglePattern::FreqView& program_1 =
1153 *pattern.program_histogram_.begin();
1154 const ShinglePattern::FreqView& model_1 = *pattern.model_histogram_.begin();
1155 const Shingle* program_instance = program_1.instance();
1156 const Shingle* model_instance = model_1.instance();
1157 size_t variable_index = pattern.index_->first_variable_index_;
1158 LabelInfo* program_info = program_instance->at(variable_index);
1159 LabelInfo* model_info = model_instance->at(variable_index);
1160 AssignOne(model_info, program_info);
1161 return true;
1162 }
1163
1164 bool FindAndAssignBestLeader() {
1165 LOG_ASSERT(patterns_needing_updates_.empty());
1166
1167 if (!single_use_pattern_queue_.empty()) {
1168 const ShinglePattern& pattern = **single_use_pattern_queue_.begin();
1169 return AssignFirstVariableOfHistogramHead(pattern);
1170 }
1171
1172 if (variable_queue_.empty())
1173 return false;
1174
1175 const VariableQueue::ScoreAndLabel best = variable_queue_.first();
1176 int score = best.first;
1177 LabelInfo* assignee = best.second;
1178
1179 // TODO(sra): score (best.first) can be zero. A zero score means we are
1180 // blindly picking between two (or more) alternatives which look the same.
1181 // If we exit on the first zero-score we sometimes get 3-4% better total
1182 // compression. This indicates that 'infill' is doing a better job than
1183 // picking blindly. Perhaps we can use an extended region around the
1184 // undistinguished competing alternatives to break the tie.
1185 if (score == 0) {
1186 variable_queue_.Print();
1187 return false;
1188 }
1189
1190 AssignmentCandidates* candidates = assignee->candidates();
1191 if (candidates->empty())
1192 return false; // Should not happen.
1193
1194 AssignOne(candidates->top_candidate(), assignee);
1195 return true;
1196 }
1197
1198 private:
1199 // The trace vector contains the model sequence [0, model_end_) followed by
1200 // the program sequence [model_end_, trace.end())
1201 const Trace& trace_;
1202 size_t model_end_;
1203
1204 // |shingle_instances_| is the set of 'interned' shingles.
1205 Shingle::OwningSet shingle_instances_;
1206
1207 // |instances_| maps from position in |trace_| to Shingle at that position.
1208 std::vector<Shingle*> instances_;
1209
1210 SingleUsePatternQueue single_use_pattern_queue_;
1211 ShinglePatternSet active_non_single_use_patterns_;
1212 VariableQueue variable_queue_;
1213
1214 // Transient information: when we make an assignment, we need to recompute
1215 // priority queue information derived from these ShinglePatterns.
1216 ShinglePatternSet patterns_needing_updates_;
1217
1218 typedef std::map<ShinglePattern::Index,
1219 ShinglePattern, ShinglePatternIndexLess> IndexToPattern;
1220 IndexToPattern patterns_;
1221
1222 DISALLOW_COPY_AND_ASSIGN(AssignmentProblem);
1223};
1224
1225class Adjuster : public AdjustmentMethod {
1226 public:
sra@chromium.orgd38c3a22009-07-22 01:30:061227 Adjuster() : prog_(NULL), model_(NULL) {}
Chris Watkins2903a822017-11-30 03:19:571228 ~Adjuster() = default;
sra@chromium.org8cd96962009-06-03 23:28:141229
1230 bool Adjust(const AssemblyProgram& model, AssemblyProgram* program) {
pkasting@chromium.org18cca202010-10-21 20:40:581231 VLOG(1) << "Adjuster::Adjust";
sra@chromium.org8cd96962009-06-03 23:28:141232 prog_ = program;
1233 model_ = &model;
1234 return Finish();
1235 }
1236
1237 bool Finish() {
1238 prog_->UnassignIndexes();
1239 Trace abs32_trace_;
1240 Trace rel32_trace_;
1241 CollectTraces(model_, &abs32_trace_, &rel32_trace_, true);
1242 size_t abs32_model_end = abs32_trace_.size();
1243 size_t rel32_model_end = rel32_trace_.size();
1244 CollectTraces(prog_, &abs32_trace_, &rel32_trace_, false);
1245 Solve(abs32_trace_, abs32_model_end);
1246 Solve(rel32_trace_, rel32_model_end);
1247 prog_->AssignRemainingIndexes();
1248 return true;
1249 }
1250
1251 private:
1252 void CollectTraces(const AssemblyProgram* program, Trace* abs32, Trace* rel32,
1253 bool is_model) {
1254 label_info_maker_.ResetDebugLabel();
huangs99a5a8c2016-10-28 22:29:311255
huangsc4155eb2017-04-13 20:47:071256 for (Label* label : program->abs32_label_annotations())
1257 ReferenceLabel(abs32, is_model, label);
1258 for (Label* label : program->rel32_label_annotations())
1259 ReferenceLabel(rel32, is_model, label);
huangs99a5a8c2016-10-28 22:29:311260
sra@chromium.org8cd96962009-06-03 23:28:141261 // TODO(sra): we could simply append all the labels in index order to
1262 // incorporate some costing for entropy (bigger deltas) that will be
1263 // introduced into the label address table by non-monotonic ordering. This
1264 // would have some knock-on effects to parts of the algorithm that work on
1265 // single-occurrence labels.
1266 }
1267
1268 void Solve(const Trace& model, size_t model_end) {
laforge@chromium.org7e6b9262010-03-15 17:33:461269 base::Time start_time = base::Time::Now();
sra@chromium.org8cd96962009-06-03 23:28:141270 AssignmentProblem a(model, model_end);
1271 a.Solve();
pkasting@chromium.org18cca202010-10-21 20:40:581272 VLOG(1) << " Adjuster::Solve "
1273 << (base::Time::Now() - start_time).InSecondsF();
sra@chromium.org8cd96962009-06-03 23:28:141274 }
1275
huangs99a5a8c2016-10-28 22:29:311276 void ReferenceLabel(Trace* trace, bool is_model, Label* label) {
aviab98dcc92015-12-21 19:35:331277 trace->push_back(label_info_maker_.MakeLabelInfo(
1278 label, is_model, static_cast<uint32_t>(trace->size())));
sra@chromium.org8cd96962009-06-03 23:28:141279 }
1280
1281 AssemblyProgram* prog_; // Program to be adjusted, owned by caller.
1282 const AssemblyProgram* model_; // Program to be mimicked, owned by caller.
1283
1284 LabelInfoMaker label_info_maker_;
1285
1286 private:
1287 DISALLOW_COPY_AND_ASSIGN(Adjuster);
1288};
1289
1290////////////////////////////////////////////////////////////////////////////////
1291
1292} // namespace adjustment_method_2
1293
1294AdjustmentMethod* AdjustmentMethod::MakeShingleAdjustmentMethod() {
1295 return new adjustment_method_2::Adjuster();
1296}
1297
1298} // namespace courgette