[go: nahoru, domu]

blob: f0047dba4d6db1252db4e2c3476791f5093e772e [file] [log] [blame]
Avi Drissman0ff8a7e2022-09-13 18:35:421// Copyright 2011 The Chromium Authors
sra@chromium.org8cd96962009-06-03 23:28:142// Use of this source code is governed by a BSD-style license that can be
3// found in the LICENSE file.
4
Keishi Hattori0e45c022021-11-27 09:25:525#include "base/memory/raw_ptr.h"
Ali Hijazi55179192022-11-09 16:28:516#include "base/memory/raw_ref.h"
sra@chromium.org8cd96962009-06-03 23:28:147#include "courgette/adjustment_method.h"
8
aviab98dcc92015-12-21 19:35:339#include <stddef.h>
10#include <stdint.h>
11
sra@chromium.org8cd96962009-06-03 23:28:1412#include <algorithm>
13#include <limits>
14#include <list>
15#include <map>
16#include <set>
17#include <string>
18#include <vector>
19
evan@chromium.org34b2b002009-11-20 06:53:2820#include "base/format_macros.h"
sra@chromium.org8cd96962009-06-03 23:28:1421#include "base/logging.h"
avi@chromium.org68c986e2013-06-11 13:26:2522#include "base/strings/stringprintf.h"
avi@chromium.orgb43e5562013-06-28 15:20:0223#include "base/time/time.h"
sra@chromium.org8cd96962009-06-03 23:28:1424#include "courgette/assembly_program.h"
25#include "courgette/courgette.h"
26#include "courgette/encoded_program.h"
sra@chromium.org8cd96962009-06-03 23:28:1427
28/*
29
30Shingle weighting matching.
31
32We have a sequence S1 of symbols from alphabet A1={A,B,C,...} called the 'model'
33and a second sequence of S2 of symbols from alphabet A2={U,V,W,....} called the
34'program'. Each symbol in A1 has a unique numerical name or index. We can
35transcribe the sequence S1 to a sequence T1 of indexes of the symbols. We wish
36to assign indexes to the symbols in A2 so that when we transcribe S2 into T2, T2
37has long subsequences that occur in T1. This will ensure that the sequence
38T1;T2 compresses to be only slightly larger than the compressed T1.
39
40The algorithm for matching members of S2 with members of S1 is eager - it makes
41matches without backtracking, until no more matches can be made. Each variable
42(symbol) U,V,... in A2 has a set of candidates from A1, each candidate with a
43weight summarizing the evidence for the match. We keep a VariableQueue of
44U,V,... sorted by how much the evidence for the best choice outweighs the
45evidence for the second choice, i.e. prioritized by how 'clear cut' the best
46assignment is. We pick the variable with the most clear-cut candidate, make the
47assignment, adjust the evidence and repeat.
48
49What has not been described so far is how the evidence is gathered and
50maintained. We are working under the assumption that S1 and S2 are largely
51similar. (A different assumption might be that S1 and S2 are dissimilar except
52for many long subsequences.)
53
54A naive algorithm would consider all pairs (A,U) and for each pair assess the
55benefit, or score, the assignment U:=A. The score might count the number of
56occurrences of U in S2 which appear in similar contexts to A in S1.
57
58To distinguish contexts we view S1 and S2 as a sequence of overlapping k-length
59substrings or 'shingles'. Two shingles are compatible if the symbols in one
60shingle could be matched with the symbols in the other symbol. For example, ABC
61is *not* compatible with UVU because it would require conflicting matches A=U
62and C=U. ABC is compatible with UVW, UWV, WUV, VUW etc. We can't tell which
63until we make an assignment - the compatible shingles form an equivalence class.
64After assigning U:=A then only UVW and UWV (equivalently AVW, AWV) are
65compatible. As we make assignments the number of equivalence classes of
66shingles increases and the number of members of each equivalence class
67decreases. The compatibility test becomes more restrictive.
68
69We gather evidence for the potential assignment U:=A by counting how many
70shingles containing U are compatible with shingles containing A. Thus symbols
71occurring a large number of times in compatible contexts will be assigned first.
72
73Finding the 'most clear-cut' assignment by considering all pairs symbols and for
74each pair comparing the contexts of each pair of occurrences of the symbols is
75computationally infeasible. We get the job done in a reasonable time by
76approaching it 'backwards' and making incremental changes as we make
77assignments.
78
79First the shingles are partitioned according to compatibility. In S1=ABCDD and
80S2=UVWXX we have a total of 6 shingles, each occuring once. (ABC:1 BCD:1 CDD:1;
81UVW:1 VWX: WXX:1) all fit the pattern <V0 V1 V2> or the pattern <V0 V1 V1>. The
82first pattern indicates that each position matches a different symbol, the
83second pattern indicates that the second symbol is repeated.
84
85 pattern S1 members S2 members
86 <V0 V1 V2>: {ABC:1, BCD:1}; {UVW:1, VWX:1}
87 <V0 V1 V1>: {CDD:1} {WXX:1}
88
89The second pattern appears to have a unique assignment but we don't make the
90assignment on such scant evidence. If S1 and S2 do not match exactly, there
91will be numerous spurious low-score matches like this. Instead we must see what
92assignments are indicated by considering all of the evidence.
93
94First pattern has 2 x 2 = 4 shingle pairs. For each pair we count the number
95of symbol assignments. For ABC:a * UVW:b accumulate min(a,b) to each of
96 {U:=A, V:=B, W:=C}.
97After accumulating over all 2 x 2 pairs:
98 U: {A:1 B:1}
99 V: {A:1 B:2 C:1}
100 W: {B:1 C:2 D:1 }
101 X: {C:1 D:1}
102The second pattern contributes:
103 W: {C:1}
104 X: {D:2}
105Sum:
106 U: {A:1 B:1}
107 V: {A:1 B:2 C:1}
108 W: {B:1 C:3 D:1}
109 X: {C:1 D:3}
110
111From this we decide to assign X:=D (because this assignment has both the largest
112difference above the next candidate (X:=C) and this is also the largest
113proportionately over the sum of alternatives).
114
115Lets assume D has numerical 'name' 77. The assignment X:=D sets X to 77 too.
116Next we repartition all the shingles containing X or D:
117
118 pattern S1 members S2 members
119 <V0 V1 V2>: {ABC:1}; {UVW:1}
120 <V0 V1 77>: {BCD:1}; {VWX:1}
121 <V0 77 77>: {CDD:1} {WXX:1}
122As we repartition, we recalculate the contributions to the scores:
123 U: {A:1}
124 V: {B:2}
125 W: {C:3}
126All the remaining assignments are now fixed.
127
128There is one step in the incremental algorithm that is still infeasibly
129expensive: the contributions due to the cross product of large equivalence
130classes. We settle for making an approximation by computing the contribution of
131the cross product of only the most common shingles. The hope is that the noise
132from the long tail of uncounted shingles is well below the scores being used to
133pick assignments. The second hope is that as assignment are made, the large
134equivalence class will be partitioned into smaller equivalence classes, reducing
135the noise over time.
136
137In the code below the shingles are bigger (Shingle::kWidth = 5).
138Class ShinglePattern holds the data for one pattern.
139
140There is an optimization for this case:
141 <V0 V1 V1>: {CDD:1} {WXX:1}
142
143Above we said that we don't make an assignment on this "scant evidence". There
144is an exception: if there is only one variable unassigned (more like the <V0 77
14577> pattern) AND there are no occurrences of C and W other than those counted in
146this pattern, then there is no competing evidence and we go ahead with the
147assignment immediately. This produces slightly better results because these
148cases tend to be low-scoring and susceptible to small mistakes made in
149low-scoring assignments in the approximation for large equivalence classes.
150
151*/
152
153namespace courgette {
154namespace adjustment_method_2 {
155
sra@chromium.org8cd96962009-06-03 23:28:14156////////////////////////////////////////////////////////////////////////////////
157
158class AssignmentCandidates;
159class LabelInfoMaker;
160class Shingle;
161class ShinglePattern;
162
163// The purpose of adjustment is to assign indexes to Labels of a program 'p' to
164// make the sequence of indexes similar to a 'model' program 'm'. Labels
165// themselves don't have enough information to do this job, so we work with a
166// LabelInfo surrogate for each label.
167//
168class LabelInfo {
169 public:
170 // Just a no-argument constructor and copy constructor. Actual LabelInfo
171 // objects are allocated in std::pair structs in a std::map.
172 LabelInfo()
Raul Tambre5bd92982019-03-28 06:21:55173 : label_(nullptr),
174 is_model_(false),
175 debug_index_(0),
176 refs_(0),
177 assignment_(nullptr),
178 candidates_(nullptr) {}
sra@chromium.org8cd96962009-06-03 23:28:14179
180 ~LabelInfo();
181
182 AssignmentCandidates* candidates();
183
Keishi Hattori0e45c022021-11-27 09:25:52184 raw_ptr<Label> label_; // The label that this info a surrogate for.
sra@chromium.org8cd96962009-06-03 23:28:14185
aviab98dcc92015-12-21 19:35:33186 uint32_t is_model_ : 1; // Is the label in the model?
187 uint32_t debug_index_ : 31; // A small number for naming the label in debug
188 // output. The pair (is_model_, debug_index_) is
189 // unique.
sra@chromium.org8cd96962009-06-03 23:28:14190
sra@google.com54f1b822009-07-18 03:28:40191 int refs_; // Number of times this Label is referenced.
sra@chromium.org8cd96962009-06-03 23:28:14192
Arthur Sonzognie98d2142023-06-01 15:02:25193 raw_ptr<LabelInfo, DanglingUntriaged>
Keishi Hattori0e45c022021-11-27 09:25:52194 assignment_; // Label from other program corresponding to this.
sra@chromium.org8cd96962009-06-03 23:28:14195
aviab98dcc92015-12-21 19:35:33196 std::vector<uint32_t> positions_; // Offsets into the trace of references.
sra@chromium.org8cd96962009-06-03 23:28:14197
198 private:
Arthur Sonzognie98d2142023-06-01 15:02:25199 raw_ptr<AssignmentCandidates, DanglingUntriaged> candidates_;
sra@chromium.org8cd96962009-06-03 23:28:14200
201 void operator=(const LabelInfo*); // Disallow assignment only.
202 // Public compiler generated copy constructor is needed to constuct
203 // std::pair<Label*, LabelInfo> so that fresh LabelInfos can be allocated
204 // inside a std::map.
205};
206
Ali Hijazie63cbaf62023-12-20 19:29:35207typedef std::vector<raw_ptr<LabelInfo, VectorExperimental>> Trace;
sra@chromium.org8cd96962009-06-03 23:28:14208
209std::string ToString(const LabelInfo* info) {
210 std::string s;
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11211 base::StringAppendF(&s, "%c%d", "pm"[info->is_model_], info->debug_index_);
sra@chromium.org8cd96962009-06-03 23:28:14212 if (info->label_->index_ != Label::kNoIndex)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11213 base::StringAppendF(&s, " (%d)", info->label_->index_);
sra@chromium.org8cd96962009-06-03 23:28:14214
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11215 base::StringAppendF(&s, " #%u", info->refs_);
sra@chromium.org8cd96962009-06-03 23:28:14216 return s;
217}
218
219// LabelInfoMaker maps labels to their surrogate LabelInfo objects.
220class LabelInfoMaker {
221 public:
222 LabelInfoMaker() : debug_label_index_gen_(0) {}
223
Peter Boström896f1372021-11-05 01:12:30224 LabelInfoMaker(const LabelInfoMaker&) = delete;
225 LabelInfoMaker& operator=(const LabelInfoMaker&) = delete;
226
aviab98dcc92015-12-21 19:35:33227 LabelInfo* MakeLabelInfo(Label* label, bool is_model, uint32_t position) {
sra@chromium.org8cd96962009-06-03 23:28:14228 LabelInfo& slot = label_infos_[label];
Raul Tambre5bd92982019-03-28 06:21:55229 if (slot.label_ == nullptr) {
sra@chromium.org8cd96962009-06-03 23:28:14230 slot.label_ = label;
231 slot.is_model_ = is_model;
232 slot.debug_index_ = ++debug_label_index_gen_;
233 }
234 slot.positions_.push_back(position);
235 ++slot.refs_;
236 return &slot;
237 }
238
239 void ResetDebugLabel() { debug_label_index_gen_ = 0; }
240
241 private:
242 int debug_label_index_gen_;
243
244 // Note LabelInfo is allocated 'flat' inside map::value_type, so the LabelInfo
245 // lifetimes are managed by the map.
246 std::map<Label*, LabelInfo> label_infos_;
sra@chromium.org8cd96962009-06-03 23:28:14247};
248
249struct OrderLabelInfo {
250 bool operator()(const LabelInfo* a, const LabelInfo* b) const {
251 if (a->label_->rva_ < b->label_->rva_) return true;
252 if (a->label_->rva_ > b->label_->rva_) return false;
253 if (a == b) return false;
254 return a->positions_ < b->positions_; // Lexicographic ordering of vector.
255 }
256};
257
258// AssignmentCandidates is a priority queue of candidate assignments to
259// a single program LabelInfo, |program_info_|.
260class AssignmentCandidates {
261 public:
262 explicit AssignmentCandidates(LabelInfo* program_info)
263 : program_info_(program_info) {}
264
265 LabelInfo* program_info() const { return program_info_; }
266
267 bool empty() const { return label_to_score_.empty(); }
268
269 LabelInfo* top_candidate() const { return queue_.begin()->second; }
270
271 void Update(LabelInfo* model_info, int delta_score) {
272 LOG_ASSERT(delta_score != 0);
273 int old_score = 0;
274 int new_score = 0;
275 LabelToScore::iterator p = label_to_score_.find(model_info);
276 if (p != label_to_score_.end()) {
277 old_score = p->second;
278 new_score = old_score + delta_score;
279 queue_.erase(ScoreAndLabel(old_score, p->first));
280 if (new_score == 0) {
281 label_to_score_.erase(p);
282 } else {
283 p->second = new_score;
284 queue_.insert(ScoreAndLabel(new_score, model_info));
285 }
286 } else {
287 new_score = delta_score;
288 label_to_score_.insert(std::make_pair(model_info, new_score));
289 queue_.insert(ScoreAndLabel(new_score, model_info));
290 }
291 LOG_ASSERT(queue_.size() == label_to_score_.size());
292 }
293
294 int TopScore() const {
295 int first_value = 0;
296 int second_value = 0;
297 Queue::const_iterator p = queue_.begin();
298 if (p != queue_.end()) {
299 first_value = p->first;
300 ++p;
301 if (p != queue_.end()) {
302 second_value = p->first;
303 }
304 }
305 return first_value - second_value;
306 }
307
308 bool HasPendingUpdates() { return !pending_updates_.empty(); }
309
310 void AddPendingUpdate(LabelInfo* model_info, int delta_score) {
311 LOG_ASSERT(delta_score != 0);
312 pending_updates_[model_info] += delta_score;
313 }
314
315 void ApplyPendingUpdates() {
316 // TODO(sra): try to walk |pending_updates_| and |label_to_score_| in
317 // lockstep. Try to batch updates to |queue_|.
sra@chromium.org8cd96962009-06-03 23:28:14318 for (LabelToScore::iterator p = pending_updates_.begin();
319 p != pending_updates_.end();
320 ++p) {
321 if (p->second != 0)
322 Update(p->first, p->second);
sra@chromium.org8cd96962009-06-03 23:28:14323 }
324 pending_updates_.clear();
325 }
326
327 void Print(int max) {
pkasting@chromium.org18cca202010-10-21 20:40:58328 VLOG(2) << "score " << TopScore() << " " << ToString(program_info_)
329 << " := ?";
sra@chromium.org8cd96962009-06-03 23:28:14330 if (!pending_updates_.empty())
pkasting@chromium.org18cca202010-10-21 20:40:58331 VLOG(2) << pending_updates_.size() << " pending";
sra@chromium.org8cd96962009-06-03 23:28:14332 int count = 0;
333 for (Queue::iterator q = queue_.begin(); q != queue_.end(); ++q) {
334 if (++count > max) break;
pkasting@chromium.org18cca202010-10-21 20:40:58335 VLOG(2) << " " << q->first << " " << ToString(q->second);
sra@chromium.org8cd96962009-06-03 23:28:14336 }
337 }
338
339 private:
340 typedef std::map<LabelInfo*, int, OrderLabelInfo> LabelToScore;
341 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
342 struct OrderScoreAndLabelByScoreDecreasing {
343 OrderLabelInfo tie_breaker;
344 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
345 if (a.first > b.first) return true;
346 if (a.first < b.first) return false;
347 return tie_breaker(a.second, b.second);
348 }
349 };
350 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
351
Keishi Hattori0e45c022021-11-27 09:25:52352 raw_ptr<LabelInfo> program_info_;
sra@chromium.org8cd96962009-06-03 23:28:14353 LabelToScore label_to_score_;
354 LabelToScore pending_updates_;
355 Queue queue_;
356};
357
358AssignmentCandidates* LabelInfo::candidates() {
Raul Tambre5bd92982019-03-28 06:21:55359 if (candidates_ == nullptr)
sra@chromium.org8cd96962009-06-03 23:28:14360 candidates_ = new AssignmentCandidates(this);
361 return candidates_;
362}
363
364LabelInfo::~LabelInfo() {
365 delete candidates_;
366}
367
368// A Shingle is a short fixed-length string of LabelInfos that actually occurs
369// in a Trace. A Shingle may occur many times. We repesent the Shingle by the
370// position of one of the occurrences in the Trace.
371class Shingle {
372 public:
aviab98dcc92015-12-21 19:35:33373 static const uint8_t kWidth = 5;
sra@chromium.org8cd96962009-06-03 23:28:14374
375 struct InterningLess {
376 bool operator()(const Shingle& a, const Shingle& b) const;
377 };
378
379 typedef std::set<Shingle, InterningLess> OwningSet;
380
Peter Kastingecbdbf32021-07-05 17:20:58381 // We can't disallow the copy constructor because we use std::set<Shingle> and
382 // VS2005's implementation of std::set<T>::set() requires T to have a copy
383 // constructor.
384 Shingle(const Shingle&) = default;
385 Shingle& operator=(const Shingle&) = delete; // Disallow assignment only.
386
sra@chromium.org8cd96962009-06-03 23:28:14387 static Shingle* Find(const Trace& trace, size_t position,
388 OwningSet* owning_set) {
389 std::pair<OwningSet::iterator, bool> pair =
390 owning_set->insert(Shingle(trace, position));
sra@google.com54f1b822009-07-18 03:28:40391 // pair.first iterator 'points' to the newly inserted Shingle or the
392 // previouly inserted one that looks the same according to the comparator.
393
394 // const_cast required because key is const. We modify the Shingle
395 // extensively but not in a way that affects InterningLess.
396 Shingle* shingle = const_cast<Shingle*>(&*pair.first);
397 shingle->add_position(position);
398 return shingle;
sra@chromium.org8cd96962009-06-03 23:28:14399 }
400
Ali Hijazi55179192022-11-09 16:28:51401 LabelInfo* at(size_t i) const { return (*trace_)[exemplar_position_ + i]; }
bradnelson@google.com104a6082010-12-21 01:03:43402 void add_position(size_t position) {
aviab98dcc92015-12-21 19:35:33403 positions_.push_back(static_cast<uint32_t>(position));
bradnelson@google.com104a6082010-12-21 01:03:43404 }
405 int position_count() const { return static_cast<int>(positions_.size()); }
sra@chromium.org8cd96962009-06-03 23:28:14406
407 bool InModel() const { return at(0)->is_model_; }
408
409 ShinglePattern* pattern() const { return pattern_; }
410 void set_pattern(ShinglePattern* pattern) { pattern_ = pattern; }
411
412 struct PointerLess {
413 bool operator()(const Shingle* a, const Shingle* b) const {
414 // Arbitrary but repeatable (memory-address) independent ordering:
415 return a->exemplar_position_ < b->exemplar_position_;
416 // return InterningLess()(*a, *b);
417 }
418 };
419
420 private:
421 Shingle(const Trace& trace, size_t exemplar_position)
422 : trace_(trace),
423 exemplar_position_(exemplar_position),
Raul Tambre5bd92982019-03-28 06:21:55424 pattern_(nullptr) {}
sra@chromium.org8cd96962009-06-03 23:28:14425
Ali Hijazi55179192022-11-09 16:28:51426 const raw_ref<const Trace> trace_; // The shingle lives inside trace_.
sra@chromium.org8cd96962009-06-03 23:28:14427 size_t exemplar_position_; // At this position (and other positions).
aviab98dcc92015-12-21 19:35:33428 std::vector<uint32_t> positions_; // Includes exemplar_position_.
sra@chromium.org8cd96962009-06-03 23:28:14429
Arthur Sonzognie98d2142023-06-01 15:02:25430 raw_ptr<ShinglePattern, DanglingUntriaged>
Keishi Hattori0e45c022021-11-27 09:25:52431 pattern_; // Pattern changes as LabelInfos are assigned.
sra@chromium.org8cd96962009-06-03 23:28:14432
433 friend std::string ToString(const Shingle* instance);
sra@chromium.org8cd96962009-06-03 23:28:14434};
435
436std::string ToString(const Shingle* instance) {
437 std::string s;
438 const char* sep = "<";
aviab98dcc92015-12-21 19:35:33439 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11440 // base::StringAppendF(&s, "%s%x ", sep, instance.at(i)->label_->rva_);
sra@chromium.org8cd96962009-06-03 23:28:14441 s += sep;
442 s += ToString(instance->at(i));
443 sep = ", ";
444 }
bradnelson@google.com104a6082010-12-21 01:03:43445 base::StringAppendF(&s, ">(%" PRIuS ")@{%d}",
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11446 instance->exemplar_position_,
447 instance->position_count());
sra@chromium.org8cd96962009-06-03 23:28:14448 return s;
449}
450
451
452bool Shingle::InterningLess::operator()(
453 const Shingle& a,
454 const Shingle& b) const {
aviab98dcc92015-12-21 19:35:33455 for (uint8_t i = 0; i < kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14456 LabelInfo* info_a = a.at(i);
457 LabelInfo* info_b = b.at(i);
458 if (info_a->label_->rva_ < info_b->label_->rva_)
459 return true;
460 if (info_a->label_->rva_ > info_b->label_->rva_)
461 return false;
462 if (info_a->is_model_ < info_b->is_model_)
463 return true;
464 if (info_a->is_model_ > info_b->is_model_)
465 return false;
466 if (info_a != info_b) {
467 NOTREACHED();
468 }
469 }
470 return false;
471}
472
473class ShinglePattern {
474 public:
475 enum { kOffsetMask = 7, // Offset lives in low bits.
476 kFixed = 0, // kind & kVariable == 0 => fixed.
477 kVariable = 8 // kind & kVariable == 1 => variable.
478 };
479 // sequence[position + (kinds_[i] & kOffsetMask)] gives LabelInfo for position
480 // i of shingle. Below, second 'A' is duplicate of position 1, second '102'
481 // is duplicate of position 0.
482 //
483 // <102, A, 103, A , 102>
484 // --> <kFixed+0, kVariable+1, kFixed+2, kVariable+1, kFixed+0>
485 struct Index {
486 explicit Index(const Shingle* instance);
aviab98dcc92015-12-21 19:35:33487 uint8_t kinds_[Shingle::kWidth];
488 uint8_t variables_;
489 uint8_t unique_variables_;
490 uint8_t first_variable_index_;
491 uint32_t hash_;
sra@chromium.org8cd96962009-06-03 23:28:14492 int assigned_indexes_[Shingle::kWidth];
493 };
494
495 // ShinglePattern keeps histograms of member Shingle instances, ordered by
496 // decreasing number of occurrences. We don't have a pair (occurrence count,
497 // Shingle instance), so we use a FreqView adapter to make the instance
498 // pointer look like the pair.
499 class FreqView {
500 public:
501 explicit FreqView(const Shingle* instance) : instance_(instance) {}
bradnelson@google.com104a6082010-12-21 01:03:43502 int count() const { return instance_->position_count(); }
sra@chromium.org8cd96962009-06-03 23:28:14503 const Shingle* instance() const { return instance_; }
504 struct Greater {
505 bool operator()(const FreqView& a, const FreqView& b) const {
506 if (a.count() > b.count()) return true;
507 if (a.count() < b.count()) return false;
508 return resolve_ties(a.instance(), b.instance());
509 }
510 private:
511 Shingle::PointerLess resolve_ties;
512 };
513 private:
Keishi Hattoric1b00232022-11-22 09:04:26514 raw_ptr<const Shingle> instance_;
sra@chromium.org8cd96962009-06-03 23:28:14515 };
516
517 typedef std::set<FreqView, FreqView::Greater> Histogram;
518
Raul Tambre5bd92982019-03-28 06:21:55519 ShinglePattern()
520 : index_(nullptr), model_coverage_(0), program_coverage_(0) {}
sra@chromium.org8cd96962009-06-03 23:28:14521
Keishi Hattori0e45c022021-11-27 09:25:52522 raw_ptr<const Index>
523 index_; // Points to the key in the owning map value_type.
sra@chromium.org8cd96962009-06-03 23:28:14524 Histogram model_histogram_;
525 Histogram program_histogram_;
526 int model_coverage_;
527 int program_coverage_;
528};
529
530std::string ToString(const ShinglePattern::Index* index) {
531 std::string s;
Raul Tambre5bd92982019-03-28 06:21:55532 if (index == nullptr) {
sra@chromium.org8cd96962009-06-03 23:28:14533 s = "<null>";
534 } else {
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11535 base::StringAppendF(&s, "<%d: ", index->variables_);
sra@chromium.org8cd96962009-06-03 23:28:14536 const char* sep = "";
aviab98dcc92015-12-21 19:35:33537 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14538 s += sep;
539 sep = ", ";
aviab98dcc92015-12-21 19:35:33540 uint32_t kind = index->kinds_[i];
sra@chromium.org8cd96962009-06-03 23:28:14541 int offset = kind & ShinglePattern::kOffsetMask;
542 if (kind & ShinglePattern::kVariable)
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11543 base::StringAppendF(&s, "V%d", offset);
sra@chromium.org8cd96962009-06-03 23:28:14544 else
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11545 base::StringAppendF(&s, "%d", index->assigned_indexes_[offset]);
sra@chromium.org8cd96962009-06-03 23:28:14546 }
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11547 base::StringAppendF(&s, " %x", index->hash_);
sra@chromium.org8cd96962009-06-03 23:28:14548 s += ">";
549 }
550 return s;
551}
552
553std::string HistogramToString(const ShinglePattern::Histogram& histogram,
554 size_t snippet_max) {
555 std::string s;
556 size_t histogram_size = histogram.size();
557 size_t snippet_size = 0;
558 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
559 p != histogram.end();
560 ++p) {
561 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
562 s += " ...";
563 break;
564 }
bradnelson@google.com104a6082010-12-21 01:03:43565 base::StringAppendF(&s, " %d", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14566 }
567 return s;
568}
569
570std::string HistogramToStringFull(const ShinglePattern::Histogram& histogram,
571 const char* indent,
572 size_t snippet_max) {
573 std::string s;
574
575 size_t histogram_size = histogram.size();
576 size_t snippet_size = 0;
577 for (ShinglePattern::Histogram::const_iterator p = histogram.begin();
578 p != histogram.end();
579 ++p) {
580 s += indent;
581 if (++snippet_size > snippet_max && snippet_size != histogram_size) {
582 s += "...\n";
583 break;
584 }
bradnelson@google.com104a6082010-12-21 01:03:43585 base::StringAppendF(&s, "(%d) ", p->count());
sra@chromium.org8cd96962009-06-03 23:28:14586 s += ToString(&(*p->instance()));
587 s += "\n";
588 }
589 return s;
590}
591
592std::string ToString(const ShinglePattern* pattern, size_t snippet_max = 3) {
593 std::string s;
Raul Tambre5bd92982019-03-28 06:21:55594 if (pattern == nullptr) {
sra@chromium.org8cd96962009-06-03 23:28:14595 s = "<null>";
596 } else {
597 s = "{";
598 s += ToString(pattern->index_);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11599 base::StringAppendF(&s, "; %d(%d):",
600 static_cast<int>(pattern->model_histogram_.size()),
601 pattern->model_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14602
603 s += HistogramToString(pattern->model_histogram_, snippet_max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11604 base::StringAppendF(&s, "; %d(%d):",
605 static_cast<int>(pattern->program_histogram_.size()),
606 pattern->program_coverage_);
sra@chromium.org8cd96962009-06-03 23:28:14607 s += HistogramToString(pattern->program_histogram_, snippet_max);
608 s += "}";
609 }
610 return s;
611}
612
613std::string ShinglePatternToStringFull(const ShinglePattern* pattern,
614 size_t max) {
615 std::string s;
616 s += ToString(pattern->index_);
617 s += "\n";
618 size_t model_size = pattern->model_histogram_.size();
619 size_t program_size = pattern->program_histogram_.size();
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11620 base::StringAppendF(&s, " model shingles %" PRIuS "\n", model_size);
sra@chromium.org8cd96962009-06-03 23:28:14621 s += HistogramToStringFull(pattern->model_histogram_, " ", max);
tfarina@chromium.orga77fa2dc2010-11-15 12:11:11622 base::StringAppendF(&s, " program shingles %" PRIuS "\n", program_size);
sra@chromium.org8cd96962009-06-03 23:28:14623 s += HistogramToStringFull(pattern->program_histogram_, " ", max);
624 return s;
625}
626
627struct ShinglePatternIndexLess {
628 bool operator()(const ShinglePattern::Index& a,
629 const ShinglePattern::Index& b) const {
630 if (a.hash_ < b.hash_) return true;
631 if (a.hash_ > b.hash_) return false;
632
aviab98dcc92015-12-21 19:35:33633 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14634 if (a.kinds_[i] < b.kinds_[i]) return true;
635 if (a.kinds_[i] > b.kinds_[i]) return false;
636 if ((a.kinds_[i] & ShinglePattern::kVariable) == 0) {
637 if (a.assigned_indexes_[i] < b.assigned_indexes_[i])
638 return true;
639 if (a.assigned_indexes_[i] > b.assigned_indexes_[i])
640 return false;
641 }
642 }
643 return false;
644 }
645};
646
aviab98dcc92015-12-21 19:35:33647static uint32_t hash_combine(uint32_t h, uint32_t v) {
sra@chromium.org8cd96962009-06-03 23:28:14648 h += v;
649 return (h * (37 + 0x0000d100)) ^ (h >> 13);
650}
651
652ShinglePattern::Index::Index(const Shingle* instance) {
aviab98dcc92015-12-21 19:35:33653 uint32_t hash = 0;
sra@chromium.org8cd96962009-06-03 23:28:14654 variables_ = 0;
655 unique_variables_ = 0;
656 first_variable_index_ = 255;
657
aviab98dcc92015-12-21 19:35:33658 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:14659 LabelInfo* info = instance->at(i);
aviab98dcc92015-12-21 19:35:33660 uint8_t kind = 0;
sra@chromium.org8cd96962009-06-03 23:28:14661 int code = -1;
aviab98dcc92015-12-21 19:35:33662 uint8_t j = 0;
sra@chromium.org8cd96962009-06-03 23:28:14663 for ( ; j < i; ++j) {
664 if (info == instance->at(j)) { // Duplicate LabelInfo
665 kind = kinds_[j];
666 break;
667 }
668 }
669 if (j == i) { // Not found above.
670 if (info->assignment_) {
671 code = info->label_->index_;
672 assigned_indexes_[i] = code;
673 kind = kFixed + i;
674 } else {
675 kind = kVariable + i;
676 ++unique_variables_;
677 if (i < first_variable_index_)
678 first_variable_index_ = i;
679 }
680 }
681 if (kind & kVariable) ++variables_;
682 hash = hash_combine(hash, code);
683 hash = hash_combine(hash, kind);
684 kinds_[i] = kind;
685 assigned_indexes_[i] = code;
686 }
687 hash_ = hash;
688}
689
690struct ShinglePatternLess {
691 bool operator()(const ShinglePattern& a, const ShinglePattern& b) const {
692 return index_less(*a.index_, *b.index_);
693 }
694 ShinglePatternIndexLess index_less;
695};
696
697struct ShinglePatternPointerLess {
698 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
699 return pattern_less(*a, *b);
700 }
701 ShinglePatternLess pattern_less;
702};
703
704template<int (*Scorer)(const ShinglePattern*)>
705struct OrderShinglePatternByScoreDescending {
706 bool operator()(const ShinglePattern* a, const ShinglePattern* b) const {
707 int score_a = Scorer(a);
708 int score_b = Scorer(b);
709 if (score_a > score_b) return true;
710 if (score_a < score_b) return false;
711 return break_ties(a, b);
712 }
713 ShinglePatternPointerLess break_ties;
714};
715
716// Returns a score for a 'Single Use' rule. Returns -1 if the rule is not
717// applicable.
718int SingleUseScore(const ShinglePattern* pattern) {
719 if (pattern->index_->variables_ != 1)
720 return -1;
721
722 if (pattern->model_histogram_.size() != 1 ||
723 pattern->program_histogram_.size() != 1)
724 return -1;
725
726 // Does this pattern account for all uses of the variable?
727 const ShinglePattern::FreqView& program_freq =
728 *pattern->program_histogram_.begin();
729 const ShinglePattern::FreqView& model_freq =
730 *pattern->model_histogram_.begin();
731 int p1 = program_freq.count();
732 int m1 = model_freq.count();
733 if (p1 == m1) {
734 const Shingle* program_instance = program_freq.instance();
735 const Shingle* model_instance = model_freq.instance();
736 size_t variable_index = pattern->index_->first_variable_index_;
737 LabelInfo* program_info = program_instance->at(variable_index);
738 LabelInfo* model_info = model_instance->at(variable_index);
739 if (!program_info->assignment_) {
740 if (program_info->refs_ == p1 && model_info->refs_ == m1) {
741 return p1;
742 }
743 }
744 }
745 return -1;
746}
747
748// The VariableQueue is a priority queue of unassigned LabelInfos from
749// the 'program' (the 'variables') and their AssignmentCandidates.
750class VariableQueue {
751 public:
752 typedef std::pair<int, LabelInfo*> ScoreAndLabel;
753
Chris Watkins2903a822017-11-30 03:19:57754 VariableQueue() = default;
sra@chromium.org8cd96962009-06-03 23:28:14755
Peter Boström896f1372021-11-05 01:12:30756 VariableQueue(const VariableQueue&) = delete;
757 VariableQueue& operator=(const VariableQueue&) = delete;
758
sra@chromium.org8cd96962009-06-03 23:28:14759 bool empty() const { return queue_.empty(); }
760
761 const ScoreAndLabel& first() const { return *queue_.begin(); }
762
763 // For debugging only.
764 void Print() const {
765 for (Queue::const_iterator p = queue_.begin(); p != queue_.end(); ++p) {
766 AssignmentCandidates* candidates = p->second->candidates();
767 candidates->Print(std::numeric_limits<int>::max());
768 }
769 }
770
771 void AddPendingUpdate(LabelInfo* program_info, LabelInfo* model_info,
772 int delta_score) {
773 AssignmentCandidates* candidates = program_info->candidates();
774 if (!candidates->HasPendingUpdates()) {
775 pending_update_candidates_.push_back(candidates);
776 }
777 candidates->AddPendingUpdate(model_info, delta_score);
778 }
779
780 void ApplyPendingUpdates() {
781 for (size_t i = 0; i < pending_update_candidates_.size(); ++i) {
782 AssignmentCandidates* candidates = pending_update_candidates_[i];
783 int old_score = candidates->TopScore();
784 queue_.erase(ScoreAndLabel(old_score, candidates->program_info()));
785 candidates->ApplyPendingUpdates();
786 if (!candidates->empty()) {
787 int new_score = candidates->TopScore();
788 queue_.insert(ScoreAndLabel(new_score, candidates->program_info()));
789 }
790 }
791 pending_update_candidates_.clear();
792 }
793
794 private:
795 struct OrderScoreAndLabelByScoreDecreasing {
796 bool operator()(const ScoreAndLabel& a, const ScoreAndLabel& b) const {
797 if (a.first > b.first) return true;
798 if (a.first < b.first) return false;
799 return OrderLabelInfo()(a.second, b.second);
800 }
801 };
802 typedef std::set<ScoreAndLabel, OrderScoreAndLabelByScoreDecreasing> Queue;
803
804 Queue queue_;
Ali Hijazie63cbaf62023-12-20 19:29:35805 std::vector<raw_ptr<AssignmentCandidates, VectorExperimental>>
806 pending_update_candidates_;
sra@chromium.org8cd96962009-06-03 23:28:14807};
808
809
810class AssignmentProblem {
811 public:
812 AssignmentProblem(const Trace& trace, size_t model_end)
813 : trace_(trace),
814 model_end_(model_end) {
pkasting@chromium.org18cca202010-10-21 20:40:58815 VLOG(2) << "AssignmentProblem::AssignmentProblem " << model_end << ", "
816 << trace.size();
sra@chromium.org8cd96962009-06-03 23:28:14817 }
818
Peter Boström896f1372021-11-05 01:12:30819 AssignmentProblem(const AssignmentProblem&) = delete;
820 AssignmentProblem& operator=(const AssignmentProblem&) = delete;
821
sra@chromium.org8cd96962009-06-03 23:28:14822 bool Solve() {
823 if (model_end_ < Shingle::kWidth ||
Ali Hijazi55179192022-11-09 16:28:51824 trace_->size() - model_end_ < Shingle::kWidth) {
sra@chromium.org8cd96962009-06-03 23:28:14825 // Nothing much we can do with such a short problem.
826 return true;
827 }
Ali Hijazi55179192022-11-09 16:28:51828 instances_.resize(trace_->size() - Shingle::kWidth + 1, nullptr);
sra@chromium.org8cd96962009-06-03 23:28:14829 AddShingles(0, model_end_);
Ali Hijazi55179192022-11-09 16:28:51830 AddShingles(model_end_, trace_->size());
sra@chromium.org8cd96962009-06-03 23:28:14831 InitialClassify();
832 AddPatternsNeedingUpdatesToQueues();
833
834 patterns_needing_updates_.clear();
pkasting@chromium.org18cca202010-10-21 20:40:58835 while (FindAndAssignBestLeader())
sra@chromium.org8cd96962009-06-03 23:28:14836 patterns_needing_updates_.clear();
sra@chromium.org8cd96962009-06-03 23:28:14837 PrintActivePatterns();
838
839 return true;
840 }
841
842 private:
843 typedef std::set<Shingle*, Shingle::PointerLess> ShingleSet;
844
Ali Hijazi133b2d92024-02-09 14:01:52845 typedef std::set<raw_ptr<const ShinglePattern, SetExperimental>,
846 ShinglePatternPointerLess>
sra@chromium.org8cd96962009-06-03 23:28:14847 ShinglePatternSet;
848
849 // Patterns are partitioned into the following sets:
850
851 // * Retired patterns (not stored). No shingles exist for this pattern (they
852 // all now match more specialized patterns).
853 // * Useless patterns (not stored). There are no 'program' shingles for this
854 // pattern (they all now match more specialized patterns).
855 // * Single-use patterns - single_use_pattern_queue_.
856 // * Other patterns - active_non_single_use_patterns_ / variable_queue_.
857
Ali Hijazi133b2d92024-02-09 14:01:52858 typedef std::set<raw_ptr<const ShinglePattern, SetExperimental>,
859 OrderShinglePatternByScoreDescending<&SingleUseScore>>
sra@chromium.org8cd96962009-06-03 23:28:14860 SingleUsePatternQueue;
861
862 void PrintPatternsHeader() const {
Ali Hijazi55179192022-11-09 16:28:51863 VLOG(2) << shingle_instances_.size() << " instances " << trace_->size()
864 << " trace length " << patterns_.size() << " shingle indexes "
pkasting@chromium.org18cca202010-10-21 20:40:58865 << single_use_pattern_queue_.size() << " single use patterns "
866 << active_non_single_use_patterns_.size() << " active patterns";
sra@chromium.org8cd96962009-06-03 23:28:14867 }
868
869 void PrintActivePatterns() const {
870 for (ShinglePatternSet::const_iterator p =
871 active_non_single_use_patterns_.begin();
872 p != active_non_single_use_patterns_.end();
873 ++p) {
874 const ShinglePattern* pattern = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58875 VLOG(2) << ToString(pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14876 }
877 }
878
879 void PrintPatterns() const {
880 PrintAllPatterns();
881 PrintActivePatterns();
882 PrintAllShingles();
883 }
884
885 void PrintAllPatterns() const {
886 for (IndexToPattern::const_iterator p = patterns_.begin();
887 p != patterns_.end();
888 ++p) {
889 const ShinglePattern& pattern = p->second;
pkasting@chromium.org18cca202010-10-21 20:40:58890 VLOG(2) << ToString(&pattern, 10);
sra@chromium.org8cd96962009-06-03 23:28:14891 }
892 }
893
894 void PrintAllShingles() const {
895 for (Shingle::OwningSet::const_iterator p = shingle_instances_.begin();
896 p != shingle_instances_.end();
897 ++p) {
898 const Shingle& instance = *p;
pkasting@chromium.org18cca202010-10-21 20:40:58899 VLOG(2) << ToString(&instance) << " " << ToString(instance.pattern());
sra@chromium.org8cd96962009-06-03 23:28:14900 }
901 }
902
903
904 void AddShingles(size_t begin, size_t end) {
pkasting8e3a26a2014-10-03 18:52:29905 for (size_t i = begin; i + Shingle::kWidth - 1 < end; ++i) {
Ali Hijazi55179192022-11-09 16:28:51906 instances_[i] = Shingle::Find(*trace_, i, &shingle_instances_);
sra@chromium.org8cd96962009-06-03 23:28:14907 }
908 }
909
910 void Declassify(Shingle* shingle) {
911 ShinglePattern* pattern = shingle->pattern();
912 if (shingle->InModel()) {
913 pattern->model_histogram_.erase(ShinglePattern::FreqView(shingle));
914 pattern->model_coverage_ -= shingle->position_count();
915 } else {
916 pattern->program_histogram_.erase(ShinglePattern::FreqView(shingle));
917 pattern->program_coverage_ -= shingle->position_count();
918 }
Raul Tambre5bd92982019-03-28 06:21:55919 shingle->set_pattern(nullptr);
sra@chromium.org8cd96962009-06-03 23:28:14920 }
921
922 void Reclassify(Shingle* shingle) {
923 ShinglePattern* pattern = shingle->pattern();
Raul Tambre5bd92982019-03-28 06:21:55924 LOG_ASSERT(pattern == nullptr);
sra@chromium.org8cd96962009-06-03 23:28:14925
926 ShinglePattern::Index index(shingle);
927 if (index.variables_ == 0)
928 return;
929
930 std::pair<IndexToPattern::iterator, bool> inserted =
931 patterns_.insert(std::make_pair(index, ShinglePattern()));
932
933 pattern = &inserted.first->second;
934 pattern->index_ = &inserted.first->first;
935 shingle->set_pattern(pattern);
936 patterns_needing_updates_.insert(pattern);
937
938 if (shingle->InModel()) {
939 pattern->model_histogram_.insert(ShinglePattern::FreqView(shingle));
940 pattern->model_coverage_ += shingle->position_count();
941 } else {
942 pattern->program_histogram_.insert(ShinglePattern::FreqView(shingle));
943 pattern->program_coverage_ += shingle->position_count();
944 }
945 }
946
947 void InitialClassify() {
948 for (Shingle::OwningSet::iterator p = shingle_instances_.begin();
949 p != shingle_instances_.end();
950 ++p) {
sra@google.com54f1b822009-07-18 03:28:40951 // GCC's set<T>::iterator::operator *() returns a const object.
952 Reclassify(const_cast<Shingle*>(&*p));
sra@chromium.org8cd96962009-06-03 23:28:14953 }
954 }
955
956 // For the positions in |info|, find the shingles that overlap that position.
957 void AddAffectedPositions(LabelInfo* info, ShingleSet* affected_shingles) {
aviab98dcc92015-12-21 19:35:33958 const uint8_t kWidth = Shingle::kWidth;
sra@chromium.org8cd96962009-06-03 23:28:14959 for (size_t i = 0; i < info->positions_.size(); ++i) {
960 size_t position = info->positions_[i];
961 // Find bounds to the subrange of |trace_| we are in.
962 size_t start = position < model_end_ ? 0 : model_end_;
Ali Hijazi55179192022-11-09 16:28:51963 size_t end = position < model_end_ ? model_end_ : trace_->size();
sra@chromium.org8cd96962009-06-03 23:28:14964
965 // Clip [position-kWidth+1, position+1)
pkasting8e3a26a2014-10-03 18:52:29966 size_t low =
967 position > start + kWidth - 1 ? position - kWidth + 1 : start;
sra@chromium.org8cd96962009-06-03 23:28:14968 size_t high = position + kWidth < end ? position + 1 : end - kWidth + 1;
969
970 for (size_t shingle_position = low;
971 shingle_position < high;
972 ++shingle_position) {
973 Shingle* overlapping_shingle = instances_.at(shingle_position);
974 affected_shingles->insert(overlapping_shingle);
975 }
976 }
977 }
978
979 void RemovePatternsNeedingUpdatesFromQueues() {
980 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
981 p != patterns_needing_updates_.end();
982 ++p) {
983 RemovePatternFromQueues(*p);
984 }
985 }
986
987 void AddPatternsNeedingUpdatesToQueues() {
988 for (ShinglePatternSet::iterator p = patterns_needing_updates_.begin();
989 p != patterns_needing_updates_.end();
990 ++p) {
991 AddPatternToQueues(*p);
992 }
993 variable_queue_.ApplyPendingUpdates();
994 }
995
996 void RemovePatternFromQueues(const ShinglePattern* pattern) {
997 int single_use_score = SingleUseScore(pattern);
998 if (single_use_score > 0) {
999 size_t n = single_use_pattern_queue_.erase(pattern);
1000 LOG_ASSERT(n == 1);
erg@google.comf6b8ce32011-03-02 00:03:181001 } else if (pattern->program_histogram_.empty() &&
1002 pattern->model_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:141003 NOTREACHED(); // Should not come back to life.
erg@google.comf6b8ce32011-03-02 00:03:181004 } else if (pattern->program_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:141005 // Useless pattern.
1006 } else {
1007 active_non_single_use_patterns_.erase(pattern);
1008 AddPatternToLabelQueue(pattern, -1);
1009 }
1010 }
1011
1012 void AddPatternToQueues(const ShinglePattern* pattern) {
1013 int single_use_score = SingleUseScore(pattern);
1014 if (single_use_score > 0) {
1015 single_use_pattern_queue_.insert(pattern);
erg@google.comf6b8ce32011-03-02 00:03:181016 } else if (pattern->program_histogram_.empty() &&
1017 pattern->model_histogram_.empty()) {
1018 } else if (pattern->program_histogram_.empty()) {
sra@chromium.org8cd96962009-06-03 23:28:141019 // Useless pattern.
1020 } else {
1021 active_non_single_use_patterns_.insert(pattern);
1022 AddPatternToLabelQueue(pattern, +1);
1023 }
1024 }
1025
1026 void AddPatternToLabelQueue(const ShinglePattern* pattern, int sign) {
1027 // For each possible assignment in this pattern, update the potential
1028 // contributions to the LabelInfo queues.
sra@chromium.org8cd96962009-06-03 23:28:141029
1030 // We want to find for each symbol (LabelInfo) the maximum contribution that
1031 // could be achieved by making shingle-wise assignments between shingles in
1032 // the model and shingles in the program.
1033 //
1034 // If the shingles in the histograms are independent (no two shingles have a
1035 // symbol in common) then any permutation of the assignments is possible,
1036 // and the maximum contribution can be found by taking the maximum over all
1037 // the pairs.
1038 //
1039 // If the shingles are dependent two things happen. The maximum
1040 // contribution to any given symbol is a sum because the symbol has
1041 // contributions from all the shingles containing it. Second, some
1042 // assignments are blocked by previous incompatible assignments. We want to
1043 // avoid a combinatorial search, so we ignore the blocking.
1044
sra@google.com54f1b822009-07-18 03:28:401045 const size_t kUnwieldy = 5;
sra@chromium.org8cd96962009-06-03 23:28:141046
1047 typedef std::map<LabelInfo*, int> LabelToScore;
1048 typedef std::map<LabelInfo*, LabelToScore > ScoreSet;
1049 ScoreSet maxima;
1050
1051 size_t n_model_samples = 0;
1052 for (ShinglePattern::Histogram::const_iterator model_iter =
1053 pattern->model_histogram_.begin();
1054 model_iter != pattern->model_histogram_.end();
1055 ++model_iter) {
1056 if (++n_model_samples > kUnwieldy) break;
1057 const ShinglePattern::FreqView& model_freq = *model_iter;
1058 int m1 = model_freq.count();
1059 const Shingle* model_instance = model_freq.instance();
1060
1061 ScoreSet sums;
1062 size_t n_program_samples = 0;
1063 for (ShinglePattern::Histogram::const_iterator program_iter =
1064 pattern->program_histogram_.begin();
1065 program_iter != pattern->program_histogram_.end();
1066 ++program_iter) {
1067 if (++n_program_samples > kUnwieldy) break;
1068 const ShinglePattern::FreqView& program_freq = *program_iter;
1069 int p1 = program_freq.count();
1070 const Shingle* program_instance = program_freq.instance();
1071
1072 // int score = p1; // ? weigh all equally??
1073 int score = std::min(p1, m1);
1074
aviab98dcc92015-12-21 19:35:331075 for (uint8_t i = 0; i < Shingle::kWidth; ++i) {
sra@chromium.org8cd96962009-06-03 23:28:141076 LabelInfo* program_info = program_instance->at(i);
1077 LabelInfo* model_info = model_instance->at(i);
Raul Tambre5bd92982019-03-28 06:21:551078 if ((model_info->assignment_ == nullptr) !=
1079 (program_info->assignment_ == nullptr)) {
pkasting@chromium.org18cca202010-10-21 20:40:581080 VLOG(2) << "ERROR " << i
1081 << "\n\t" << ToString(pattern, 10)
1082 << "\n\t" << ToString(program_instance)
1083 << "\n\t" << ToString(model_instance);
sra@chromium.org8cd96962009-06-03 23:28:141084 }
1085 if (!program_info->assignment_ && !model_info->assignment_) {
1086 sums[program_info][model_info] += score;
1087 }
1088 }
huangs2bffdb362015-08-18 14:05:341089 }
sra@chromium.org8cd96962009-06-03 23:28:141090
huangs2bffdb362015-08-18 14:05:341091 for (ScoreSet::iterator assignee_iterator = sums.begin();
1092 assignee_iterator != sums.end();
1093 ++assignee_iterator) {
1094 LabelInfo* program_info = assignee_iterator->first;
1095 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1096 p != assignee_iterator->second.end();
1097 ++p) {
1098 LabelInfo* model_info = p->first;
1099 int score = p->second;
1100 int* slot = &maxima[program_info][model_info];
1101 *slot = std::max(*slot, score);
sra@chromium.org8cd96962009-06-03 23:28:141102 }
1103 }
1104 }
1105
1106 for (ScoreSet::iterator assignee_iterator = maxima.begin();
1107 assignee_iterator != maxima.end();
1108 ++assignee_iterator) {
1109 LabelInfo* program_info = assignee_iterator->first;
1110 for (LabelToScore::iterator p = assignee_iterator->second.begin();
1111 p != assignee_iterator->second.end();
1112 ++p) {
1113 LabelInfo* model_info = p->first;
1114 int score = sign * p->second;
1115 variable_queue_.AddPendingUpdate(program_info, model_info, score);
1116 }
1117 }
1118 }
1119
1120 void AssignOne(LabelInfo* model_info, LabelInfo* program_info) {
1121 LOG_ASSERT(!model_info->assignment_);
1122 LOG_ASSERT(!program_info->assignment_);
1123 LOG_ASSERT(model_info->is_model_);
1124 LOG_ASSERT(!program_info->is_model_);
1125
pkasting@chromium.org18cca202010-10-21 20:40:581126 VLOG(3) << "Assign " << ToString(program_info)
1127 << " := " << ToString(model_info);
sra@chromium.org8cd96962009-06-03 23:28:141128
1129 ShingleSet affected_shingles;
1130 AddAffectedPositions(model_info, &affected_shingles);
1131 AddAffectedPositions(program_info, &affected_shingles);
1132
1133 for (ShingleSet::iterator p = affected_shingles.begin();
1134 p != affected_shingles.end();
1135 ++p) {
1136 patterns_needing_updates_.insert((*p)->pattern());
1137 }
1138
1139 RemovePatternsNeedingUpdatesFromQueues();
1140
1141 for (ShingleSet::iterator p = affected_shingles.begin();
1142 p != affected_shingles.end();
1143 ++p) {
1144 Declassify(*p);
1145 }
1146
1147 program_info->label_->index_ = model_info->label_->index_;
1148 // Mark as assigned
1149 model_info->assignment_ = program_info;
1150 program_info->assignment_ = model_info;
1151
1152 for (ShingleSet::iterator p = affected_shingles.begin();
1153 p != affected_shingles.end();
1154 ++p) {
1155 Reclassify(*p);
1156 }
1157
1158 AddPatternsNeedingUpdatesToQueues();
1159 }
1160
1161 bool AssignFirstVariableOfHistogramHead(const ShinglePattern& pattern) {
1162 const ShinglePattern::FreqView& program_1 =
1163 *pattern.program_histogram_.begin();
1164 const ShinglePattern::FreqView& model_1 = *pattern.model_histogram_.begin();
1165 const Shingle* program_instance = program_1.instance();
1166 const Shingle* model_instance = model_1.instance();
1167 size_t variable_index = pattern.index_->first_variable_index_;
1168 LabelInfo* program_info = program_instance->at(variable_index);
1169 LabelInfo* model_info = model_instance->at(variable_index);
1170 AssignOne(model_info, program_info);
1171 return true;
1172 }
1173
1174 bool FindAndAssignBestLeader() {
1175 LOG_ASSERT(patterns_needing_updates_.empty());
1176
1177 if (!single_use_pattern_queue_.empty()) {
1178 const ShinglePattern& pattern = **single_use_pattern_queue_.begin();
1179 return AssignFirstVariableOfHistogramHead(pattern);
1180 }
1181
1182 if (variable_queue_.empty())
1183 return false;
1184
1185 const VariableQueue::ScoreAndLabel best = variable_queue_.first();
1186 int score = best.first;
1187 LabelInfo* assignee = best.second;
1188
1189 // TODO(sra): score (best.first) can be zero. A zero score means we are
1190 // blindly picking between two (or more) alternatives which look the same.
1191 // If we exit on the first zero-score we sometimes get 3-4% better total
1192 // compression. This indicates that 'infill' is doing a better job than
1193 // picking blindly. Perhaps we can use an extended region around the
1194 // undistinguished competing alternatives to break the tie.
1195 if (score == 0) {
1196 variable_queue_.Print();
1197 return false;
1198 }
1199
1200 AssignmentCandidates* candidates = assignee->candidates();
1201 if (candidates->empty())
1202 return false; // Should not happen.
1203
1204 AssignOne(candidates->top_candidate(), assignee);
1205 return true;
1206 }
1207
1208 private:
1209 // The trace vector contains the model sequence [0, model_end_) followed by
1210 // the program sequence [model_end_, trace.end())
Ali Hijazi55179192022-11-09 16:28:511211 const raw_ref<const Trace> trace_;
sra@chromium.org8cd96962009-06-03 23:28:141212 size_t model_end_;
1213
1214 // |shingle_instances_| is the set of 'interned' shingles.
1215 Shingle::OwningSet shingle_instances_;
1216
1217 // |instances_| maps from position in |trace_| to Shingle at that position.
Ali Hijazie63cbaf62023-12-20 19:29:351218 std::vector<raw_ptr<Shingle, VectorExperimental>> instances_;
sra@chromium.org8cd96962009-06-03 23:28:141219
1220 SingleUsePatternQueue single_use_pattern_queue_;
1221 ShinglePatternSet active_non_single_use_patterns_;
1222 VariableQueue variable_queue_;
1223
1224 // Transient information: when we make an assignment, we need to recompute
1225 // priority queue information derived from these ShinglePatterns.
1226 ShinglePatternSet patterns_needing_updates_;
1227
1228 typedef std::map<ShinglePattern::Index,
1229 ShinglePattern, ShinglePatternIndexLess> IndexToPattern;
1230 IndexToPattern patterns_;
sra@chromium.org8cd96962009-06-03 23:28:141231};
1232
1233class Adjuster : public AdjustmentMethod {
1234 public:
Raul Tambre5bd92982019-03-28 06:21:551235 Adjuster() : prog_(nullptr), model_(nullptr) {}
Peter Boströmc68c5aa2021-09-28 00:28:001236
1237 Adjuster(const Adjuster&) = delete;
1238 Adjuster& operator=(const Adjuster&) = delete;
1239
Chris Watkins2903a822017-11-30 03:19:571240 ~Adjuster() = default;
sra@chromium.org8cd96962009-06-03 23:28:141241
1242 bool Adjust(const AssemblyProgram& model, AssemblyProgram* program) {
pkasting@chromium.org18cca202010-10-21 20:40:581243 VLOG(1) << "Adjuster::Adjust";
sra@chromium.org8cd96962009-06-03 23:28:141244 prog_ = program;
1245 model_ = &model;
1246 return Finish();
1247 }
1248
1249 bool Finish() {
1250 prog_->UnassignIndexes();
1251 Trace abs32_trace_;
1252 Trace rel32_trace_;
1253 CollectTraces(model_, &abs32_trace_, &rel32_trace_, true);
1254 size_t abs32_model_end = abs32_trace_.size();
1255 size_t rel32_model_end = rel32_trace_.size();
1256 CollectTraces(prog_, &abs32_trace_, &rel32_trace_, false);
1257 Solve(abs32_trace_, abs32_model_end);
1258 Solve(rel32_trace_, rel32_model_end);
1259 prog_->AssignRemainingIndexes();
1260 return true;
1261 }
1262
1263 private:
1264 void CollectTraces(const AssemblyProgram* program, Trace* abs32, Trace* rel32,
1265 bool is_model) {
1266 label_info_maker_.ResetDebugLabel();
huangs99a5a8c2016-10-28 22:29:311267
huangsc4155eb2017-04-13 20:47:071268 for (Label* label : program->abs32_label_annotations())
1269 ReferenceLabel(abs32, is_model, label);
1270 for (Label* label : program->rel32_label_annotations())
1271 ReferenceLabel(rel32, is_model, label);
huangs99a5a8c2016-10-28 22:29:311272
sra@chromium.org8cd96962009-06-03 23:28:141273 // TODO(sra): we could simply append all the labels in index order to
1274 // incorporate some costing for entropy (bigger deltas) that will be
1275 // introduced into the label address table by non-monotonic ordering. This
1276 // would have some knock-on effects to parts of the algorithm that work on
1277 // single-occurrence labels.
1278 }
1279
1280 void Solve(const Trace& model, size_t model_end) {
laforge@chromium.org7e6b9262010-03-15 17:33:461281 base::Time start_time = base::Time::Now();
sra@chromium.org8cd96962009-06-03 23:28:141282 AssignmentProblem a(model, model_end);
1283 a.Solve();
pkasting@chromium.org18cca202010-10-21 20:40:581284 VLOG(1) << " Adjuster::Solve "
1285 << (base::Time::Now() - start_time).InSecondsF();
sra@chromium.org8cd96962009-06-03 23:28:141286 }
1287
huangs99a5a8c2016-10-28 22:29:311288 void ReferenceLabel(Trace* trace, bool is_model, Label* label) {
aviab98dcc92015-12-21 19:35:331289 trace->push_back(label_info_maker_.MakeLabelInfo(
1290 label, is_model, static_cast<uint32_t>(trace->size())));
sra@chromium.org8cd96962009-06-03 23:28:141291 }
1292
Keishi Hattori0e45c022021-11-27 09:25:521293 raw_ptr<AssemblyProgram> prog_; // Program to be adjusted, owned by caller.
1294 raw_ptr<const AssemblyProgram>
1295 model_; // Program to be mimicked, owned by caller.
sra@chromium.org8cd96962009-06-03 23:28:141296
1297 LabelInfoMaker label_info_maker_;
sra@chromium.org8cd96962009-06-03 23:28:141298};
1299
1300////////////////////////////////////////////////////////////////////////////////
1301
1302} // namespace adjustment_method_2
1303
1304AdjustmentMethod* AdjustmentMethod::MakeShingleAdjustmentMethod() {
1305 return new adjustment_method_2::Adjuster();
1306}
1307
1308} // namespace courgette