US20120137367A1 - Continuous anomaly detection based on behavior modeling and heterogeneous information analysis - Google Patents
- Publication number
- US20120137367A1 (application Ser. No. 12/941,849)
- Authority
- US
- United States
- Legal status: Abandoned (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
Definitions
- the present disclosure falls into the general area of anomaly detection, and more particularly, the detection of anomalies in human behavior as analyzed from electronic data. It can be applied to a number of domain-specific scenarios such as compliance monitoring or risk management in high-risk domains such as investment banking or intelligence. It is also applicable to malicious insider detection in areas ranging from corporate theft to counter-intelligence or the broader intelligence community.
- Anomaly detection is a general goal that can fulfill a number of missions, ranging from preventing activities perpetrated by malicious insiders to mitigating unintentional threats, as well as managing human and operational risk more generally.
- existing anomaly detection systems report threats only once they have harmed the organization or its employees.
- malicious insiders are typically flagged, if at all, only after they have perpetrated their actions, so the associated damage can then only be mitigated rather than prevented.
- the present system is considerably more difficult, and in most situations almost impossible, to circumvent. This is because it builds a holistic model, which covers both structured and unstructured data, but which also covers human behavior, including communications [ 123 ] among people and interactions between people and data. That model can be integrated with the organization's infrastructure to provide complete coverage of human and machine activities.
- a key aspect of this model is the determination of regularity in behavior, both at the individual and group level; behavior, especially in an organizational context, is anything but random.
- This disclosure is data-driven, and not primarily rule-driven, so that it can automatically adapt to changes in the analyzed data or in the environment, including when human behavior itself is changing. (It should be noted, however, that this disclosure also supports the definition of rules, an essential requirement since rules are an intrinsic part of many regulatory or internal policies in place in large organizations.)
- the problem of applicability scope mentioned above is also addressed by the present disclosure, as it is suitable for any type of domain. This is both because it does not require the explicit definition of a set of rules or anomalous patterns, and because it handles unstructured data in a generic manner and is able to correlate that data with the type of structured data to which vertical anomaly detection systems are often restricted.
- the present disclosure establishes a multi-dimensional behavioral model that is based on normalcy, i.e. on how people habitually communicate, work, and interact, rather than on expected behavior defined by a predefined number of dimensions. Every risk-generating activity is characterized by at least some change in behavior: even malicious insiders who go to great lengths to circumvent detection mechanisms in place will leave some trace in electronic form that will result in a deviation from baseline behavior—whether from their own established baseline or that of their peer group—even when no rule has been violated, and when the malicious activities in question do not fit any scenario known a priori.
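The baseline-deviation idea above can be illustrated with a minimal sketch. The z-score approach, the `min` combination of individual and peer-group deviation, and all names here are illustrative assumptions, not the patented method:

```python
from statistics import mean, stdev

def deviation_score(actor_history, peer_histories, current_value):
    """Score how far a current behavioral measurement deviates from
    an actor's own baseline and from that of their peer group.
    (Illustrative sketch; the disclosed system is multi-dimensional.)"""
    def z(history, value):
        if len(history) < 2:
            return 0.0
        s = stdev(history)
        return abs(value - mean(history)) / s if s else 0.0

    own = z(actor_history, current_value)
    peers = z([v for h in peer_histories for v in h], current_value)
    # Flag only measurements that deviate from BOTH baselines.
    return min(own, peers)

# e.g. 40 messages/day from an actor who averages ~10, peers ~12
score = deviation_score([9, 10, 11, 10], [[12, 13], [11, 12]], 40)
```

A real system would maintain such baselines per dimension (volume, timing, recipients, topics) and update them continuously rather than from fixed histories.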
- the present disclosure provides techniques that allow an organization to proactively detect such threats by spotting precursory signs of malicious activity, and thus intervene and prevent the threats from being executed.
- Such signs always precede the perpetration of the malicious insider's actions; however, they are only detectable by a system which performs holistic analysis and which detects any significant, and therefore possibly relevant, deviations from normal behavior—as opposed to a finite number of pre-defined rule violations (also known as patterns or “signatures”).
- An extensible application framework to efficiently visualize patterns derived from analyzing electronic data and human behavior, and to show how those patterns evolve over time and across the organization.
- the present disclosure describes a system and a method to enable a holistic and continuous analysis of information to accurately model human behavior and to detect anomalies in this behavior with respect to, but not limited to, temporal, organizational, or other dimensions.
- the system is able to infer potentially damaging activities, whether of an unintentional or malicious nature, without requiring the prior definition of the type and characteristics of these activities. It relies on the analysis of potentially massive volumes of heterogeneous electronic data (including both text-bearing and non-text-bearing records) stored inside or outside any organization. That analysis can be performed either in discrete increments or in real time.
- the present disclosure establishes structural and semantic patterns from the analyzed data and builds a predictive multi-dimensional model of both individual and collective behavior which allows detecting abnormal patterns in these behaviors as well.
- Some embodiments of this system are intended to run at a much smaller scale than much of what is described here.
- the smallest scale is at the level of individual personal use, to observe or monitor one's personal ecosystem through all available communication channels, including social media such as Facebook or Twitter.
- Such personal usage would not include a compliance system for example, but could be used to do things such as observe shifts in popularity, influence, stress within one's circle, and other things that are described in this application and in previous ones.
- Somewhat larger-scale usage by a small business would likewise be unlikely to involve enterprise systems such as an HR system, an accounting system, or a compliance system, but the system could nevertheless very beneficially be used to do things like identify which processes are causing friction amongst the staff, who is “buck passing”, etc.
- Some embodiments will therefore make suggestions directly to the user based on evidence.
- the system could point out that the user was very slow to respond to their grandmother—maybe it is time to call her or send her mail; that the user tends to “yell” about certain topics and should avoid doing so in future; or that they are spending way more time this year than they did last year on the NCAA betting pool during working hours, etc.
- FIG. 1 is a block diagram of the main data objects analyzed or created by the system in accordance with an embodiment of the present disclosure.
- FIG. 2 is a block diagram of the main concepts used for behavioral modeling and anomaly detection in accordance with an embodiment of the present disclosure.
- FIG. 3 is a block diagram of the main data structures presented to a user of the system in accordance with an embodiment of the present disclosure.
- FIG. 4 is an architecture diagram of the system in accordance with an embodiment of the present disclosure.
- FIG. 5 is a block diagram of the different data collection modes in accordance with an embodiment of the present disclosure.
- FIG. 6 is a block diagram showing the structure of the collection repository content in accordance with an embodiment of the present disclosure.
- FIG. 7 is a flowchart describing the continuous processing of new data in accordance with an embodiment of the present disclosure.
- FIG. 8 is a block diagram showing the different types of anomalies generated by the system in accordance with an embodiment of the present disclosure.
- FIG. 9 is a flowchart of the continuous periodic pattern detection process in accordance with an embodiment of the present disclosure.
- FIG. 10 is a flowchart of the feature collection process in accordance with an embodiment of the present disclosure.
- FIG. 11 is a flowchart of the periodic pattern frequency component update process in accordance with an embodiment of the present disclosure.
- FIG. 12 is a block diagram of a computer memory hierarchy example in accordance with an embodiment of the present disclosure.
- FIG. 13 is a block diagram of a computer network memory hierarchy example in accordance with an embodiment of the present disclosure.
- FIG. 14 is a flowchart showing the continuous categorization process in accordance with an embodiment of the present disclosure.
- FIG. 15 is a block diagram showing the different types of categorization components in accordance with an embodiment of the present disclosure.
- FIG. 16 is a block diagram of the categorization data model in accordance with an embodiment of the present disclosure.
- FIG. 17 is a flowchart of the pragmatic tagging component in accordance with an embodiment of the present disclosure.
- FIG. 18 is a state diagram of the pragmatic workflow model in accordance with an embodiment of the present disclosure.
- FIG. 19 is a flowchart showing a high-level process for detecting both textblock patterns and textblock hits in accordance with an embodiment of the present disclosure.
- FIG. 20 is a flowchart showing the process for producing a textblock graph, i.e. a graph of transitions between n-grams in a sliding window of size k, in accordance with an embodiment of the present disclosure.
- FIG. 21 is a flowchart showing the process for isolating textblock patterns from the textblock graph in accordance with an embodiment of the present disclosure.
- FIG. 22 is a flowchart showing the process for finding textblock hits in items within the universe in accordance with an embodiment of the present disclosure.
- FIG. 23 is a flowchart showing an alternate process for finding textblock patterns which uses bounded constant-access memory in accordance with an embodiment of the present disclosure.
- FIG. 24 is an illustration showing how n-gram transition edges are added to the textblock graph based upon a particular series of tokens in accordance with an embodiment of the present disclosure.
- FIG. 25 is an illustration showing an example of how local clusterability is calculated in accordance with an embodiment of the present disclosure.
- FIG. 26 is an illustration showing a method for limiting the size of the graph to be examined by only considering n-grams following function words in accordance with an embodiment of the present disclosure.
- FIG. 27 is an illustration showing a method for limiting the size of the graph to be examined by winnowing the list of n-grams to be considered in accordance with an embodiment of the present disclosure.
- FIG. 28 is a graph of information dissemination profiles computed by the system in accordance with an embodiment of the present disclosure.
- FIG. 29 is a block diagram showing the different types of features used for anomaly detection in accordance with an embodiment of the present disclosure.
- FIG. 30 is a block diagram showing the referential types used for anomaly detection in accordance with an embodiment of the present disclosure.
- FIG. 31 is a state diagram for anomalies generated by the system in accordance with an embodiment of the present disclosure.
- FIG. 32 is a flowchart for defining an anomaly class when entering user feedback in accordance with an embodiment of the present disclosure.
- FIG. 33 is a flowchart of the user feedback process for an anomaly by deviation in accordance with an embodiment of the present disclosure.
- FIG. 34 is a drawing of the continuous sequence viewer in accordance with an embodiment of the present disclosure.
- FIG. 35 is a drawing of the sequence snapshot contrast viewer in accordance with an embodiment of the present disclosure.
- FIG. 36 is a drawing of the alias usage browser in accordance with an embodiment of the present disclosure.
- FIG. 37 is a drawing illustrating the navigation through the alias usage browser in accordance with an embodiment of the present disclosure.
- FIG. 38 is a Hasse diagram showing an example of emotional intensity assessment in accordance with an embodiment of the present disclosure.
- FIG. 39 is a drawing of the animated graph of attention shift in accordance with an embodiment of the present disclosure.
- FIG. 40 is a drawing of the animated graph of delegation pattern changes in accordance with an embodiment of the present disclosure.
- FIG. 41 is a drawing of the animated graph of clique evolution in accordance with an embodiment of the present disclosure.
- FIG. 42 is a drawing of the continuous gap viewer for a single periodic pattern in accordance with an embodiment of the present disclosure.
- FIG. 43 is a drawing of the continuous gap viewer for correlated periodic patterns in accordance with an embodiment of the present disclosure.
- FIG. 44 is a drawing of the alert timeline visualization in accordance with an embodiment of the present disclosure.
- FIG. 45 is a drawing of the behavior-based alert visualization in accordance with an embodiment of the present disclosure.
- FIG. 46 is an illustration of the animation of the behavior-based alert visualization in accordance with an embodiment of the present disclosure.
- FIG. 47 is an illustration of the effect of behavioral metric tuning in the behavior-based alert visualization in accordance with an embodiment of the present disclosure.
- FIG. 48 is a screenshot of one embodiment of the social you-niverse visualization in accordance with an embodiment of the present disclosure.
- FIG. 49 is a screenshot of one embodiment of the social you-niverse visualization depicting a solar system around a star in accordance with an embodiment of the present disclosure.
- FIG. 50 is a screenshot of one embodiment of the social you-niverse visualization depicting icons or other visual indicators for distance in accordance with an embodiment of the present disclosure.
- FIG. 51 is a screenshot of one embodiment of the social you-niverse visualization depicting galaxies in accordance with an embodiment of the present disclosure.
- FIG. 52 is a screenshot of one embodiment of the social you-niverse visualization depicting planets orbiting more complex structures in accordance with an embodiment of the present disclosure.
- FIG. 53 is a screenshot of one embodiment of the social you-niverse visualization depicting binary or multiple stars in accordance with an embodiment of the present disclosure.
- FIG. 54 is a screenshot of one embodiment of the social you-niverse visualization depicting nebulas in accordance with an embodiment of the present disclosure.
- FIG. 55 is a screenshot of one embodiment of the social you-niverse visualization depicting an interstellar cloud of dust in accordance with an embodiment of the present disclosure.
- FIG. 56 is a screenshot of one embodiment of the social you-niverse visualization depicting a supernova explosion in accordance with an embodiment of the present disclosure.
- FIG. 57 is a screenshot of one embodiment of the social you-niverse visualization depicting gravitational pull on outer planets in accordance with an embodiment of the present disclosure.
- FIG. 58 is a screenshot of one embodiment of the social you-niverse visualization depicting wobbling planets in accordance with an embodiment of the present disclosure.
- FIG. 59 is a screenshot of one embodiment of the social you-niverse visualization depicting orbits that are stretched in the regions contiguous to the solar system that is exerting the pull in accordance with an embodiment of the present disclosure.
- FIG. 60 is a screenshot of one embodiment of the social you-niverse visualization depicting disappearance of planets from the galaxy or universe in accordance with an embodiment of the present disclosure.
- FIG. 61 is a screenshot of one embodiment of the social you-niverse visualization depicting solar systems exhibiting the greatest degree of change shifting automatically towards the visual center of the screen, so as to be more visible to the user, in accordance with an embodiment of the present disclosure.
- FIG. 62 is a screenshot of one embodiment of the social you-niverse visualization depicting the user's ability to specify which types of changes are shown, in accordance with an embodiment of the present disclosure.
- FIG. 63 is a screenshot of one embodiment of the social you-niverse visualization depicting a planet pulled into a new solar system with a trail of dust or other visual artifact to call attention to itself in accordance with an embodiment of the present disclosure.
- FIG. 64 is a screenshot of one embodiment of the social you-niverse visualization depicting clouds of dust used to cloak planets which represent individuals about whom little is known in accordance with an embodiment of the present disclosure.
- FIG. 65 is a screenshot of one embodiment of the social you-niverse visualization depicting dust rendered over as much space as necessary, for as long as necessary, in order to accurately portray the extent and duration of the data loss in accordance with an embodiment of the present disclosure.
- FIG. 66 is a screenshot of one embodiment of the social you-niverse visualization depicting moons orbiting other planets which represent the “followers” or entourage of that actor in accordance with an embodiment of the present disclosure.
- FIG. 67 is a screenshot of one embodiment of the social you-niverse visualization depicting the relative speed of orbits in accordance with an embodiment of the present disclosure.
- FIG. 68 is a screenshot of one embodiment of the social you-niverse visualization in accordance with an embodiment of the present disclosure.
- FIG. 69 is a screenshot of one embodiment of the social you-niverse visualization depicting a planet gradually drifting out of the orbit of the current solar system and disappearing in accordance with an embodiment of the present disclosure.
- FIG. 70 is a screenshot of one embodiment of the social you-niverse visualization depicting two actors experiencing conflict and the planets representing them smashing together in accordance with an embodiment of the present disclosure.
- FIG. 71 is a screenshot of one embodiment of the social you-niverse visualization depicting concepts or topics instead of moons, stars, or planets in accordance with an embodiment of the present disclosure.
- FIG. 72 is a screenshot of one embodiment of the social you-niverse visualization depicting optional sound effects in accordance with an embodiment of the present disclosure.
- FIG. 73 is a screenshot of one embodiment of the temperature gauge visualization in accordance with an embodiment of the present disclosure.
- FIG. 74 is a screenshot of one embodiment of the temperature gauge visualization depicting the notion of “neutral” and various types of negative sentiments in accordance with an embodiment of the present disclosure.
- FIG. 75 is a screenshot of one embodiment of the temperature gauge visualization depicting the expression of positive sentiments relative to the midpoint of the gauge in accordance with an embodiment of the present disclosure.
- FIG. 76 is a screenshot of one embodiment of the temperature gauge visualization depicting emoticons of different kinds instead of temperature gauge icons in accordance with an embodiment of the present disclosure.
- FIG. 77 is a screenshot of one embodiment of the temperature gauge visualization depicting emoticons of different kinds instead of temperature gauge icons in accordance with an embodiment of the present disclosure.
- FIG. 78 is a screenshot of one embodiment of the stressful topics visualization depicting a matrix representation in which actors and topics are respectively represented in rows and columns in accordance with an embodiment of the present disclosure.
- FIG. 79 is a screenshot of one embodiment of the stressful topics visualization in accordance with an embodiment of the present disclosure.
- FIG. 80 is a screenshot of one embodiment of the stressful topics visualization depicting change over the course of time in accordance with an embodiment of the present disclosure.
- FIG. 81 is a screenshot of one embodiment of the stressful topics visualization depicting changes in individual rows and columns in a matrix in accordance with an embodiment of the present disclosure.
- FIG. 82 is a screenshot of one embodiment of the stressful topics visualization depicting a way to account for languages with word orderings other than left to right in accordance with an embodiment of the present disclosure.
- FIG. 83 is a screenshot of one embodiment of the stressful topics visualization depicting color designation for rows and columns which have been swapped over time in accordance with an embodiment of the present disclosure.
- FIG. 84 is a screenshot of one embodiment of the stressful topics visualization depicting the ability to play a visualization which contains an arbitrary number of different matrices according to the same timeline in accordance with an embodiment of the present disclosure.
- FIG. 85 is a screenshot of one embodiment of the stressful topics visualization depicting user ability to select either/both matrices from different timeframes, and/or different timeframes from the same matrix and play these matrices all together in accordance with an embodiment of the present disclosure.
- FIG. 86 is a screenshot of one embodiment of the stressful topics visualization depicting display indicating the offset unit of time in accordance with an embodiment of the present disclosure.
- FIG. 87 is a screenshot of one embodiment of the stressful topics visualization depicting a heat map implementation in accordance with an embodiment of the present disclosure.
- FIG. 88 is a screenshot of one embodiment of the stressful topics visualization depicting the user's ability to determine whether they want to see a visual emphasis in accordance with an embodiment of the present disclosure.
- FIG. 89 is a screenshot of one embodiment of the stressful topics visualization in accordance with an embodiment of the present disclosure.
- FIG. 90 is a screenshot of one embodiment of the pecking order visualization in accordance with an embodiment of the present disclosure.
- FIG. 91 is a screenshot of one embodiment of the pecking order visualization depicting user ability to choose other animals in accordance with an embodiment of the present disclosure.
- FIG. 92 is a screenshot of one embodiment of the pecking order visualization depicting the user's ability to choose the animal type generally, or with respect to a particular actor, type of actor, or specific hierarchy instance in accordance with an embodiment of the present disclosure.
- FIG. 93 is a screenshot of one embodiment of the pecking order visualization depicting each individual pecking order represented by a building in accordance with an embodiment of the present disclosure.
- FIG. 94 is a screenshot of one embodiment of the pecking order visualization depicting user ability to specify the left-to-right order in which the buildings are rendered from choices in accordance with an embodiment of the present disclosure.
- FIG. 95 is a screenshot of one embodiment of the pecking order visualization depicting the building being built in accordance with an embodiment of the present disclosure.
- FIG. 96 is a screenshot of one embodiment of the pecking order visualization depicting the building accumulating broken windows, graffiti, and other signs of disuse in accordance with an embodiment of the present disclosure.
- FIG. 97 is a screenshot of one embodiment of the pecking order visualization depicting ledges or levels designated with labels such as “vice president” in accordance with an embodiment of the present disclosure.
- FIG. 98 is a screenshot of one embodiment of the pecking order visualization depicting a chicken flying between the different pecking order instances in accordance with an embodiment of the present disclosure.
- FIG. 99 is a screenshot of one embodiment of the pecking order visualization depicting a chicken, representing an actor who is no longer on the scene, falling to the ground in a manner that clearly suggests it is dead, in accordance with an embodiment of the present disclosure.
- FIG. 100 is a screenshot of one embodiment of the pecking order visualization depicting a chicken, representing an actor who is no longer on the scene, falling to the ground in a manner that clearly suggests it is dead and being carried away by vultures, in accordance with an embodiment of the present disclosure.
- FIG. 101 is a screenshot of one embodiment of the pecking order visualization depicting chickens (or other animals) ascending or descending from one level to the next according to the backing data in accordance with an embodiment of the present disclosure.
- FIG. 102 is a screenshot of one embodiment of the pecking order visualization depicting chickens ganging up on one or more other chickens if the actors they represent are engaged in an argument or power struggle in accordance with an embodiment of the present disclosure.
- FIG. 103 is a screenshot of one embodiment of the pecking order visualization in accordance with an embodiment of the present disclosure.
- FIG. 104 is a screenshot of one embodiment of the buck passing visualization in accordance with an embodiment of the present disclosure.
- FIG. 105 is a screenshot of one embodiment of the buck passing visualization, viewed as a graph in which two objects are connected by an arc, in accordance with an embodiment of the present disclosure.
- FIG. 106 is a screenshot of one embodiment of the buck passing visualization depicting an arc that becomes thin enough as a result of lack of buck passing that it will simply disappear from the view in accordance with an embodiment of the present disclosure.
- FIG. 107 is a screenshot of one embodiment of the buck passing visualization depicting buck passing relationships which have expanded or contracted over the course of the available data in accordance with an embodiment of the present disclosure.
- FIG. 108 is a screenshot of one embodiment of the buck passing visualization depicting horizontally-aligned pairs of arrows which point at one another if the buck passing has diminished, and point in opposite directions if it has increased in accordance with an embodiment of the present disclosure.
- FIG. 109 is a screenshot of one embodiment of the buck passing visualization depicting horizontally-aligned pairs of arrows which point at one another if the buck passing has diminished, and point in opposite directions if it has increased in accordance with an embodiment of the present disclosure.
- FIG. 110 is a screenshot of one embodiment of the buck passing visualization depicting various visual treatments to illustrate the buck passing relationship in accordance with an embodiment of the present disclosure.
- FIG. 111 is a screenshot of one embodiment of the buck passing visualization depicting user ability to specify types of topics and ad hoc workflow processes that should not be considered as instances of buck passing in accordance with an embodiment of the present disclosure.
- FIG. 112 is a screenshot of one embodiment of the buck passing visualization depicting different classes of identifiable tasks that can be specified to have differing visual treatments by the user so as to make them easily distinguishable from one another in accordance with an embodiment of the present disclosure.
- FIG. 113 is a screenshot of one embodiment of the buck passing visualization depicting a different visual treatment for nodes that represent actors who have changed roles and the arcs that represent pre-existing buck-passing relationships in accordance with an embodiment of the present disclosure.
- FIG. 114 is a screenshot of one embodiment of the love life visualization in accordance with an embodiment of the present disclosure.
- FIG. 115 is a Conceptual Diagram of the Hypergraph System in accordance with an embodiment of the present disclosure.
- FIG. 116 is a diagram of indexed Data Sources in accordance with an embodiment of the present disclosure.
- FIG. 117 is a diagram of Indexed Data Sources in accordance with an embodiment of the present disclosure.
- FIG. 118 is a diagram of featured query operators in accordance with an embodiment of the present disclosure.
- FIG. 119 is a diagram of the query matching procedure in accordance with an embodiment of the present disclosure.
- FIG. 120 is a diagram of the discussion building process in accordance with an embodiment of the present disclosure.
- FIG. 121 is a diagram of a faceted evidence representation in accordance with an embodiment of the present disclosure.
- the present disclosure efficiently performs continuous monitoring of data produced or circulating within an entity or a social network, whether inside a specific organization or on the world wide web, especially when relying on a high volume of electronic data, and uses novel behavioral analysis techniques to detect, report, and/or make users aware of possibly preventable damaging events of an accidental or fraudulent nature.
- FIGS. 1, 2, and 3 depict the key elements and concepts of the system described in accordance with an embodiment of the present disclosure.
- Event [ 100 ] The central unit of analysis of the present disclosure. Depending on its origin, an event can be an observed event [ 102 ] exogenous to the system, a derived event [ 104 ] produced by the system, or user input [ 106 ] manually entered through an embodiment of the disclosure.
- Evidence [ 108 ] is derived by the system after collecting, processing, and analyzing events [ 100 ].
- Evidence is represented by the system as order-sorted feature (OSF) structures [ 110 ], for which grammar [ 112 ] rules can be defined.
- OSFs [ 110 ] are stored in a hypergraph [ 114 ] model in one embodiment.
- Token [ 116 ] The smallest unit of analysis of the disclosure. In one embodiment, this atomic unit of analysis is a linguistic term. In another embodiment, it is a single character. N-grams [ 118 ] are contiguous sequences of tokens [ 116 ] of length n, where n is a fixed, pre-determined integer.
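The token and n-gram definitions above can be sketched directly. This is a minimal illustration assuming word-level tokens; the disclosure also contemplates character-level tokens:

```python
def ngrams(tokens, n):
    """Return all contiguous token sequences of length n.
    (Minimal sketch of the n-gram [118] definition.)"""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Bigrams over word-level tokens:
pairs = ngrams(["the", "quick", "brown", "fox"], 2)
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```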
- Pattern [ 120 ] A model for a number of features shared by multiple items [ 122 ] and to represent the structure underlying those items [ 122 ]. Each such pattern [ 120 ] can be matched by one or more pieces of evidence [ 108 ] derived by the system.
- a textblock pattern [ 124 ] is a model for a contiguous block of text that is associated with one author and is substantive enough to potentially be treated as an independent object in the system. Textblock patterns [ 124 ] are derived by the system from building a textblock graph [ 160 ] which contains transitions between n-grams [ 118 ].
- a periodic pattern [ 126 ] is a model of events [ 100 ] that are occurring at exactly or approximately regular intervals over time.
- a workflow process [ 128 ] is a formal or ad hoc source of evidence [ 108 ] that constrains specific events [ 100 ] to be performed according to a number of workflow stages [ 154 ].
- Items [ 122 ] such as electronic communications [ 123 ] collected from any kind of communication channel [ 156 ] or electronic documents [ 162 ] can, upon processing by this disclosure, be assigned a number of item tags [ 142 ] which are a form of metadata [ 140 ] computed by the system on the basis of categorization components [ 146 ] such as an ontology [ 148 ] composed of a set of ontology classifiers [ 150 ], or a topic detection [ 152 ] method.
- a discussion [ 136 ] in which items [ 122 ] have been marked using item tags [ 142 ] constitutes a tagged sequence [ 138 ].
- Actor [ 220 ] A human or computer system which produces items [ 122 ] and is associated with one or more distinct electronic identities [ 235 ] such as email accounts, IM handles, system logins, etc.
- An actor [ 220 ] may be deemed to have more than one personality [ 230 ] if the content created or received by at least one of the different electronic identities [ 235 ] varies significantly from that of the others, where an electronic identity [ 235 ] can be an electronic alias [ 240 ] on some communication channel [ 156 ] or any reference to an individual, for example by name [ 245 ].
- a group [ 225 ] is a container object for actors [ 220 ]. Groups can be formal groups [ 250 ], for example when an organizational chart is available to the system. Other types of groups [ 225 ] can be derived by the system from sociological and behavioral analysis, such as cliques [ 255 ], also known as circles of trust, which are sets of actors who consistently correspond in a closed loop with one another.
- a typed update [ 107 ] is a light representation of an incremental change to an evidence or event that can be forwarded to different components.
- a typed update [ 107 ] references one or more pieces of evidence or events that are affected by the changes. Because the system deals with continuous and very large streams of data, forwarding typed updates [ 107 ] instead of resending whole results each time a piece of evidence or an event is updated greatly reduces the amount of traffic between components, especially when updating previously computed results.
- typed updates [ 107 ] can take the form of deltas or simple incremental data changes.
- a typed update [ 107 ] can consist of a function or set of declarations and operations to be applied to existing data in order to mutate it.
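The typed-update notion above can be sketched in Python. The class name, field names, and update kinds below are hypothetical illustrations of a delta-style update, not the disclosure's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class TypedUpdate:
    """Lightweight incremental change referencing the affected
    evidence/event identifiers (illustrative field names)."""
    target_ids: list          # ids of the evidences or events affected
    kind: str                 # e.g. "add" or "remove"
    payload: dict = field(default_factory=dict)

def apply_update(store, update):
    """Mutate only the referenced records in place, instead of
    resending whole results every time something changes."""
    for eid in update.target_ids:
        record = store.setdefault(eid, {})
        if update.kind == "add":
            record.update(update.payload)
        elif update.kind == "remove":
            for key in update.payload:
                record.pop(key, None)
    return store
```

A downstream component holding `store` thus only receives and applies the small `TypedUpdate` objects, which is the traffic-reduction property described above.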
- anomaly detection means that the system is able to infer potentially damaging activities, whether of unintentional or malicious nature, without requiring the prior definition of the type and characteristics of these activities.
- heterogeneous information analysis means that all kinds of electronic information stored in a corporation or another kind of organization can be processed by the system in a unified manner and consistently lend themselves to the detection of abnormal patterns in the data. This includes both text-bearing and non-text-bearing records.
- multi-dimensional behavioral modeling means that in addition to analyzing pieces of information, the building of a predictive multi-dimensional model of both individual and collective behaviors allows detecting abnormal patterns in these behaviors as well.
- the present disclosure describes a method for building and continuously maintaining a behavioral model [ 200 ].
- This model represents assessed behavior [ 205 ] which can be either individual behavior [ 210 ] or collective behavior [ 215 ].
- the system establishes baseline behaviors [ 260 ] which are a synthetic representation of communication habits and normal interactions, then assesses deviations [ 265 ] by comparing assessed behaviors [ 205 ] to such a baseline. This allows the detection of anomalies in recent or past behavior, however the system also attempts to predict behavior [ 262 ] in the near future based on the behavioral model [ 200 ].
- the disclosure relies on a number of behavioral traits [ 295 ] which are dimensions according to which an actor's [ 220 ] personality and behavior can be reliably measured, and are computed based on a set of behavioral metrics [ 290 ]. Actors [ 220 ] and their assessed behaviors [ 205 ] can then be evaluated on an absolute scale using scores [ 285 ], and on a relative scale using ranking [ 275 ] mechanisms, in order to assess the relevance [ 280 ] of any detected anomaly [ 270 ].
- the behavioral model [ 200 ] computed by the system, as well as the anomalies [ 270 ] produced, are presented by the system using supporting evidence [ 202 ] and visualizations [ 204 ] in one embodiment.
- a visualization [ 204 ] is produced in several stages. Data is generated [ 365 ] over time either by an iterative process over batches of data [ 370 ], or on a continuous basis [ 375 ] using for example a sliding window mechanism [ 380 ]. Input data [ 365 ] for a particular visualization [ 204 ] is then selected [ 360 ] either automatically by the system or interactively by a user. A layout [ 355 ] is then produced to efficiently display [ 350 ] the visualization as part of the system's user interface [ 300 ].
- the system continuously raises alerts [ 305 ] about behavior flagged as anomalous, for which notifications can be automatically sent to the user.
- An alert [ 305 ] can be a past alert [ 306 ] based solely on already analyzed events [ 100 ], or a predicted alert [ 307 ] based on the likelihood of some events [ 100 ] occurring in the near future and associated with anomalous behavior or patterns.
- reports [ 310 ] on behavior and information analysis can be regularly scheduled or generated on-demand.
- the system can be set up to continuously monitor specific queries [ 325 ].
- a complete audit trail [ 330 ] is available that comprises collected data as well as all types of evidence [ 100 ] stored in the collection repository [ 320 ].
- FIG. 4 illustrates the general architecture of the system in accordance with an embodiment of the present disclosure.
- a user of the system [ 455 ] is typically an analyst or human operator whose role is to respond to alerts raised by the system before a malicious act is perpetrated, or right after unintentional damage has occurred, as well as to actively investigate any leads or patterns with the help of the system's analysis results.
- a central event passing infrastructure [ 460 ] is used by all components in the system to exchange data. That infrastructure can be distributed and ideally tries to keep data in flight as much as possible to maximize the system's throughput. That exchanged data comprises:
- Events [ 100 ] are serialized as OSFs [ 110 ].
- a set of scoping policies [ 485 ], such as sliding windows [ 380 ] over the incoming data stream, are used in some embodiments to regulate the processing of events [ 100 ] by downstream components.
- the data collection component [ 400 ] collects data continuously or in batch mode from a variety of heterogeneous data sources [ 401 ], extracts their content and their metadata and stores the extraction results for access by downstream components of the system.
- the continuous categorization component [ 420 ] analyzes the incoming stream of events [ 100 ] to assign one or more categories to those events [ 100 ], using any number and variety of categorization components [ 146 ], and maintaining the validity and quality of the results even in the case of categorization components [ 146 ] that are inherently data-dependent.
- the continuous discussion building component [ 410 ] establishes discussions [ 136 ] as a structure linking causally related items [ 122 ].
- the discussion-building mechanism described in the present disclosure builds on the disclosure described in U.S. Pat. No. 7,143,091 to support a continuous mode of operation [ 375 ] in a highly-scalable manner.
- the continuous clustering component [ 412 ] produces clusters of items [ 122 ] or events [ 100 ] from the incoming data stream on a continuous basis. It is a required stage of continuous discussion building [ 410 ].
- the continuous periodic patterns detection component [ 405 ] analyzes the incoming stream of events [ 100 ] to find event sequences [ 166 ] that are recurrent over time and occur at roughly periodic intervals, thereby constructing periodic patterns [ 126 ] on a continuous basis.
- the continuous workflow analysis component [ 465 ] automatically detects ad hoc workflow processes [ 128 ] from the incoming stream of events [ 100 ] and analyzes the workflow instances [ 134 ] corresponding to those processes [ 128 ], including the detection of anomalies [ 270 ] in their realization.
- the continuous emotive tone analysis component [ 435 ] is used in some embodiments by the system to identify and analyze occurrences of emotional expression in electronic communications [ 123 ], which provide valuable categorization information to other components of the system, particularly the behavioral modeling component [ 445 ].
- the pragmatic tagging component [ 430 ] is another component, used in some embodiments of the system, that is based on linguistic analysis: it categorizes the communicative and discourse properties of electronic communications [ 123 ]. In particular, its output produces an abstract workflow model that lets the system detect and analyze workflow processes [ 128 ] associated with the realization of specific tasks.
- the textblock detection component [ 470 ] automatically identifies maximum contiguous sequences of sentences or sentence fragments which can likely be attributed to a single author. Once these textblock patterns [ 124 ] have been detected, any item [ 122 ] that contains that textblock or a significant portion of it is flagged by the system as a textblock hit [ 130 ], which allows the system to assess how information is exchanged or disseminated by specific actors [ 220 ].
- the behavioral modeling component [ 445 ] builds and maintains a model [ 200 ] of individual behavior [ 210 ] and collective behavior [ 215 ], which is defined by any number of behavioral and personality traits [ 295 ] that can be determined in the specific scenario at hand.
- a user of the system [ 455 ] can view that behavioral model [ 200 ] using a number of visualizations [ 204 ] described in the present disclosure.
- the anomaly detection component [ 450 ] continuously monitors the incoming stream of events [ 100 ] (both observed [ 102 ] and derived [ 104 ], including the behavioral model [ 200 ]) with the main goal of spotting anomalous behavior and anomalous patterns in the data based on statistical, analytical, and other types of properties associated with both recent data and historical data.
- the anomaly detection component [ 450 ] also produces alerts [ 305 ] by aggregating anomalies [ 270 ] and reports [ 310 ] sent to the user [ 455 ].
- the system also comprises time-based and behavior-based continuous visualizations [ 204 ] of those alerts [ 305 ].
- Anomalies [ 270 ] detected by this component are also fed to visualizations [ 204 ] in order to highlight anomalous patterns to the user [ 455 ], and can optionally trigger mitigating or preventive actions.
- the anomaly detection component [ 450 ] includes an anomaly detection tuning scheme which maintains the relevance and the accuracy of produced anomalies [ 270 ] based among other things on anomaly feedback [ 158 ]. However, in most such embodiments, all alerts [ 305 ] are still calculated and the user [ 455 ] is informed when significant numbers of different types of anomalies [ 270 ] associated with the same actor [ 220 ] are observed by the system; at any rate, all such instances are logged.
- the continuous multi-dimensional scaling component [ 425 ] computes a low-dimensional layout of an incoming stream of events [ 100 ]. Its output is particularly useful for the sequence viewer [ 440 ] which shows a potentially massive number of tagged sequences [ 138 ], for example those corresponding to the realization of a particular workflow process [ 128 ], thereby outlining dominant patterns and outliers in the instances [ 134 ] of that workflow process [ 128 ].
- the alias usage browser [ 478 ] is a visualization [ 204 ] used in the present disclosure to efficiently display and navigate through the results of actor analysis [ 480 ], for example performed as described in U.S. Pat. No. 7,143,091 which is incorporated by reference herein for all purposes.
- continuous visualizations [ 204 ] show patterns [ 120 ] derived by the system from the data and from human interactions and behaviors, as well as the anomalies [ 270 ] that may have been detected in those patterns.
- These continuous visualizations [ 204 ] include, but are not limited to: animated actor graph visualizations [ 471 ], the social you-niverse visualization [ 472 ], the stressful topics visualization [ 473 ], the temperature gauges visualization [ 474 ], the buck passing visualization [ 475 ], the pecking order visualization [ 476 ], and the love life visualization [ 477 ].
- the system described in the present disclosure processes and analyzes electronic data continuously collected from any number of data sources [ 401 ].
- Those data sources can be of virtually any type: a particular embodiment of the present disclosure only needs to extract the data and metadata from types of data sources relevant to the scenario at hand.
- Types of data sources that can be leveraged by the system for behavioral modeling and anomaly detection purposes include, but are not limited to the following.
- Email sources: emails collected from email clients or from email transfer agents.
- Instant messaging, calendar events, etc.
- Electronic document sources: document management systems, file shares, desktop files, etc.
- Phone data sources, including phone logs, phone conversation transcripts, and voicemail.
- Log files: application log files, system log files such as syslog events [ 100 ], etc.
- Databases: changes to table rows in a relational database or, more generally, events [ 100 ] captured in real time during a transaction.
- Public social networks such as Facebook, Twitter, etc.
- Physical data sources: physical access logs (keycards, biometrics, etc.), sensors and sensor networks (including RFID readers), geo-location information collected from portable devices, etc.
- External monitoring systems: as described in this disclosure, any external monitoring system can be integrated as a data source, for example rule-based compliance systems or network intrusion detection systems.
- Personal communication channels such as email accounts, weblogs, or websites.
- Data publicly available on the Internet; data subpoenaed in the case of a law enforcement organization.
- Wiretaps and intelligence collected from the field in the case of an intelligence organization; etc.
- the first component which regulates how data flows through the system in many embodiments is the scoping policies component [ 485 ].
- In the presence of a quasi-infinite stream of data, as is usually the case with continuous applications, the system needs to possess extensive data flow management and data aging policies.
- This role is carried out by the scoping policies component [ 485 ]. Its functionality is transversally used by all other data-manipulating components of the system. Most embodiments maintain an audit trail of changes to the scoping policies.
- the scoping policies component may have multiple policies defined, with the highest-priority policy executed first.
- the scoping policies component [ 485 ] is essential to the continuous mode of operation of other components, such as the data collection component [ 400 ] and the continuous clustering component [ 412 ].
- One of the simplest examples of a scoping policy is an aging policy where a sliding window [ 380 ] of time is maintained over the incoming stream of data. Every time a piece of data falls out of the window [ 380 ], a notification message is sent to all the components notifying them that they are free to discard that data.
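A minimal Python sketch of such a sliding-window aging policy. The subscriber interface (`on_evict`) and class names are illustrative assumptions; the disclosure only specifies that components are notified when data falls out of the window:

```python
from collections import deque

class SlidingWindowPolicy:
    """Aging policy: when an event's timestamp falls out of the time
    window, notify subscribed components that they may discard it."""
    def __init__(self, window, subscribers):
        self.window = window              # window length, in time units
        self.subscribers = subscribers
        self.buffer = deque()             # (timestamp, event_id), time-ordered

    def observe(self, timestamp, event_id):
        """Register a new event; return the ids evicted by its arrival."""
        self.buffer.append((timestamp, event_id))
        evicted = []
        while self.buffer and timestamp - self.buffer[0][0] > self.window:
            _, old_id = self.buffer.popleft()
            evicted.append(old_id)
            for component in self.subscribers:
                component.on_evict(old_id)  # components decide what to do
        return evicted
```

Whether subscribers actually delete the data on `on_evict` is left to them, consistent with the non-enforcing configuration described below for the scoping policies component [ 485 ].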
- Another type of scoping policy can evict data based on a set of predicates. For example, a predicate may state that every email coming from a certain group of actors [ 225 ] should be the first to be discarded when resources are low, or that certain sensitive data be removed from the system upon an order from a properly authorized user of the system [ 455 ].
- Depending on its configuration, the scoping policies component [ 485 ] will or will not attempt to enforce actions to be taken by processing and analysis components. When the scoping policies component [ 485 ] is configured not to enforce its policies, the other components must decide for themselves what to do when they receive the notifications.
- asynchronous procedures can be launched to effectively “garbage collect” the discarded records from the caches.
- Some policies can also be coupled with archival systems or historical databases which will guarantee exploration of the data outside of the continuous flow. Setting up archival systems or historical databases is optional, however, and is not necessary for the proper operation of the system.
- the data collection stage of the present disclosure builds on the method described in U.S. Provisional Patent Application No. 61/280,791, the disclosure of which is incorporated by reference herein for all purposes, while integrating that method into an anomaly detection scenario and adapting it to a continuous mode of operation [ 375 ].
- the main implication of the latter requirement is that due to new data being processed continuously by the system, a pruning mechanism is necessary to keep the volume of persisted data bounded, or at least to keep the increase rate of that volume bounded.
- data collection is performed by the data collection component [ 400 ] within a collection session [ 500 ] in any combination of the following ways:
- any of these collection modes can be combined within the same collection session [ 500 ].
- collection will be initiated in a continuous mode [ 510 ] to collect future data (such as data captured in real-time from external data sources [ 401 ]) but the user can also at the same time, or at a later time, set up incremental collections [ 515 ] from data sources [ 401 ] hosting historical data.
- This allows the system to provide input data [ 365 ] to analytical visualizations [ 204 ] as early as possible, while other components such as the behavioral modeling component [ 445 ] require historical data to have been processed and analyzed in order to establish a behavioral baseline [ 260 ].
- the continuous data collection component [ 400 ] described in the present disclosure has a collection rate adaptation scheme which takes into account several elements to adjust the rate, including but not limited to the following:
- Data collection covers both structured and unstructured data available from locations, including but not limited to:
- the new, fundamentally different characteristic in the context of the present disclosure is that the model, in addition to being highly scalable, needs to be pruned over time since it represents an infinite data stream.
- Different strategies are described in the rest of this section which allow the present disclosure to prune, among other elements, the index, the item relationships (entity to instance, parent entity to child entity, etc.), and the actor and communication graph. Possible pruning strategies are explained in the section on Collection management. Additionally, the model also needs to deal with discussions [ 136 ] that are still in progress, i.e. have not reached a resolution or another form of completion.
- Collection instances [ 545 ] are central elements of the system's mode of operation. They are designed to fulfill the requirements of this invention, namely to provide a single data structure underlying all analytical tasks, including the storage of the complete revision history of documents and allowing the system of the present disclosure to determine the origin of all analyzed data.
- a collection comprises a number of elements and properties, which can be categorized under collection parameters and collection artifacts.
- the collection operation's definition and parameters [ 640 ] include:
- the processing artifacts [ 610 ] include:
- Collection instances [ 545 ] are stored in one or more secure collection repositories that are only accessible to administrators of the system, and any other users to whom an administrator has explicitly granted access.
- Such a repository may contain collections of different natures.
- dynamic collection instances [ 545 ] and static collection instances [ 545 ]: the former correspond to the results of ongoing collection using the data collection component in continuous mode [ 375 ], the latter to the results of a single collection operation or of a data intake from a set of physical media. It is important to provide a single point of storage for both kinds of collected data.
- Collection instances [ 545 ] thus contain information about the lifecycle of the original data items. This means in particular that each collected item is flagged in the collection repository with a number of attributes including the following: which custodians or systems currently have which revisions of the item [ 122 ] (if more than one revision exists), which custodians (or systems) deleted or edited the item [ 122 ] and when these edits were made, and optionally the full item [ 122 ] revision history. (This is optional because it can increase storage requirements and the volume of information to post-process.)
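The per-item lifecycle attributes above could be held in a record along the following lines. All class and field names are illustrative assumptions, not the disclosure's repository schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ItemLifecycle:
    """Lifecycle attributes flagged on a collected item [122]."""
    item_id: str
    # custodian or system -> revision identifiers they currently hold
    held_revisions: Dict[str, List[str]] = field(default_factory=dict)
    # (custodian, action, timestamp) entries for deletions and edits
    edit_log: List[Tuple[str, str, int]] = field(default_factory=list)
    # optional full revision history; None when not retained, since
    # keeping it increases storage and post-processing volume
    revisions: Optional[List[str]] = None
```

The optional `revisions` field mirrors the trade-off stated above: full revision history enables richer lifecycle anomaly detection at the cost of storage.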
- the persistence of the full information lifecycle allows the anomaly detection component of the system to detect anomalies [ 270 ] related to that lifecycle, including the following examples:
- the collection audit trail contains information about different types of data, namely the items [ 122 ] that have been collected and stored; the items [ 122 ] that have been collected and discarded; finally, the items [ 122 ] that have been ignored.
- one of the fundamental benefits of the system is that there is no need to exhaustively store all of the data that has been analyzed, since in most cases the patterns resulting from statistical analysis of that data are sufficient to establish a behavioral baseline [ 260 ] and thus to detect anomalies [ 270 ].
- Very coarse-grained collections can be configured in some specific scenarios, which in turn speeds up processing of collection results.
- a collection session is designed to support a long-term investigation or monitoring case based on a substantial amount of previously collected data.
- analysis results from that prior data can be leveraged by the anomaly detection component [ 450 ], hence it is of great benefit to store the full history of successive item revisions [ 625 ] in the repository [ 655 ], including but not limited to the following reasons:
- collection management is performed in the administration console, and offers various maintenance and administration functionalities, detailed in the rest of this section.
- Merging [ 530 ] collection instances [ 545 ]: A common and convenient operation is merging two independently created collection instances [ 545 ]. This means that the original data sets are merged into a new collection instance [ 545 ], along with all associated processing artifacts.
- Pruning [ 525 ] collection instances [ 545 ]: The size of collection artifacts from collection operations, in both batch mode [ 370 ] and continuous mode [ 375 ], tends to increase over the long term. After running for several weeks or months, they may contain extremely large volumes of extracted data and metadata, especially when complete item revision [ 625 ] histories are stored.
- collection instances [ 545 ] can be manually pruned, by removing information prior to a given date, or for particular manually performed investigations and data analysis operations that are no longer relevant, or for particular custodians of the data, etc.
- the system optionally performs automatic pruning [ 525 ] of the collection instances [ 545 ]. This is done by assigning a pruning mode to any given collection instance [ 545 ], which is enforced by the scoping policies component [ 485 ].
- the scoping policies component [ 485 ] provides a number of scoping policy types, which when applied to collection instances include but are not limited to the following:
- Dividing collection instances [ 545 ] according to certain criteria is useful when, for example, a new investigation or monitoring case has to be processed which corresponds to a subset of the data collected for a prior case.
- Complementing [ 520 ] collection instances [ 545 ]: This consists of creating a new collection instance [ 545 ] from a prior collection session [ 500 ] and running it in an incremental mode so as to collect data that has been added since the prior collection session [ 500 ], as well as metadata updates, deleted data, etc.
- collection instances [ 545 ] can finally be deleted in a safe and verifiable manner, for example to comply with a destruction order or a retention policy. In one embodiment, all constituents of the collection instance [ 545 ] are erased.
- the continuous clustering component [ 412 ] produces clusters of items [ 122 ] or more generally events [ 100 ] from a set or a data stream on a continuous basis.
- a cluster is defined as a grouping of events [ 100 ] similar with respect to some observed features.
- the similarity measures are configurable.
- a similarity measure is a function that ascertains similarity between events. In its simplest form, it takes two events [ 100 ] and returns true if they should be considered similar, and false otherwise. Similarity measures can also provide a degree of similarity between two events instead of a binary answer. A degree of similarity can be a number in a specific interval, usually from 0 to 1. This allows the system to perform fuzzy clustering. Custom similarity measures can be defined to fit the different types of data being processed.
- An embodiment of a similarity measure can be set up to receive two emails and return true if their tf-idf vectors' cosine exceeds a certain threshold, and false otherwise.
- Another embodiment can take two phone logs and return true if the caller and receiver on both phone logs are the same, and false otherwise.
- Another embodiment can be set to operate on any type of event [ 100 ] that represents a communication [ 123 ] between two actors [ 220 ] and return true if two events have exactly the same set of actors [ 220 ], regardless of the type or channel [ 156 ] of the event [ 100 ].
- a similarity measure definition is thus very flexible, and is not necessarily constrained by the use of heterogeneous event [ 100 ] types.
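Two of the example measures above can be sketched in Python. The event representation (plain dicts with `tfidf` and `actors` keys) and function names are assumptions made for illustration only:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def make_threshold_measure(threshold):
    """Build a binary similarity measure from a degree of similarity:
    true when the tf-idf cosine of two events exceeds the threshold."""
    def measure(a, b):
        return cosine(a["tfidf"], b["tfidf"]) >= threshold
    return measure

def same_actors(a, b):
    """Channel-independent measure: true when both events involve
    exactly the same set of actors."""
    return set(a["actors"]) == set(b["actors"])
```

Returning the raw `cosine` value instead of thresholding it yields a degree of similarity in [0, 1], which is what the fuzzy-clustering variant described above consumes.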
- the method is fully continuous, which implies that it produces usable results (sets of clusters) and updates them as it acquires the events [ 100 ] from the underlying data stream.
- the method is stable and incremental. Running it on the same set of events [ 100 ] streamed in different orders produces the same result.
- the method runs on heterogeneous types of electronic events [ 100 ] as long as a similarity measure can be defined on them.
- the method can be configured to prioritize the processing of certain events [ 100 ].
- the continuous clustering is organized around a flow of events [ 100 ] through a set of components that process the events [ 100 ] as they receive them, and immediately forward the results needed by the downstream components, usually in the form of typed updates.
- the continuous clustering component [ 412 ] connects to a provider of raw events [ 100 ] which in the default embodiment of the present disclosure is the event passing infrastructure [ 460 ] which relays data collected by the data collection component [ 400 ] from any other source of raw electronic events such as:
- a feature is a labeled and typed value copied directly or derived from an event [ 100 ] or its underlying items. In case of a derived value, the derivation is free to use information from other events [ 100 ] but a resulting feature is directly associated to the event [ 100 ] itself.
- a feature can be shared among multiple events [ 100 ]. Depending on their type, sets of operations can be defined between different features or structures of features, notably, some of those operations establish equality, equivalence or order between two or more features.
- the feature collection phase collects, for each observed event, the information needed by the downstream components. It is a fully configurable component allowing the specification of the subsets of the data stream to process, as well as the snippets of information to retain from each event. Its high-level functions are described in FIG. 10 :
- An embodiment of a feature collector can be set up to only pass emails and IMs to the downstream components (filtering out, for example, phone logs, key card accesses and any other non email or IM records).
- This feature collector could then be set up to extract, for each email and IM, the sent date (used as a time stamp), the title, the sender, receivers, and other participants. Instead of passing around the whole raw content, it would extract a shorter set of the named entities of the content and their modifying verbs. The feature collector would then compress the whole set of features and pass it along to the next component.
- An embodiment of a feature collector can be set up to automatically prioritize certain types of events [ 100 ] by attaching a priority level to every event that satisfies a prioritization predicate.
- prioritization predicates can be:
- Each event [ 100 ] not filtered out by the aforementioned feature collection phase now has an internal lightweight representation consisting of only the data needed by the downstream components.
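A feature collector along the lines of the embodiments above might look as follows in Python. The event field names and the prioritization predicate are illustrative assumptions:

```python
def collect_features(event):
    """Filter raw events and reduce the survivors to a lightweight
    internal representation for downstream components.
    Returns None for filtered-out events."""
    if event.get("type") not in ("email", "im"):
        return None                      # drop phone logs, key-card events, ...
    features = {
        "timestamp": event["sent_date"],       # used as the time stamp
        "title": event.get("title", ""),
        "participants": [event["sender"], *event.get("receivers", [])],
    }
    # attach a priority level to events satisfying a prioritization
    # predicate (a hypothetical keyword predicate here)
    if "urgent" in features["title"].lower():
        features["priority"] = 1
    return features
```

In a fuller embodiment the retained content would be the named entities and modifying verbs rather than the title, and the feature set would be compressed before being forwarded.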
- the continuous clustering component [ 412 ] creates and updates sets of clusters of events [ 100 ] continuously as it receives them (in our current setting, from the feature collector).
- the clustering method is an extension of the document [ 162 ] clustering method described in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes.
- the current method augments the previous one with the following functionalities:
- the main operations are performed using two main data structures:
- the continuous aspect is achieved by performing the following operations every time a new event [ 100 ] is acquired:
- an embodiment of the clustering component [ 412 ] could do the following:
- the continuous clustering component [ 412 ] also provides methods to query and manipulate its state while it is still running. In some embodiments, examples of those methods are
- Deltas are a form of typed update [ 107 ] used to avoid re-sending whole clusters when a change has been made to them.
- a delta is made of a cluster identifier, and a set of update declarations.
- the update declarations are either additions to the identified cluster or removals from the identified cluster.
- a simple example of delta can be represented as:
- Each update declaration is also interpreted as an idempotent operation; therefore, if, because of a forwarding or network glitch, the same delta is received twice or more by any component, it will have the same effect as if it had been received only once.
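As an illustration of the idempotence property described above, a delta can be modeled with set semantics so that re-applying it is harmless. The names `Delta` and `apply_delta` are hypothetical, introduced only for this sketch, and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delta:
    """A cluster identifier plus a set of update declarations:
    additions to and removals from the identified cluster."""
    cluster_id: str
    additions: frozenset = frozenset()
    removals: frozenset = frozenset()

def apply_delta(clusters: dict, delta: Delta) -> dict:
    """Apply a delta to a map of cluster id -> set of event ids.
    Set union and difference make the operation idempotent: a
    duplicate delivery of the same delta has no further effect."""
    members = clusters.setdefault(delta.cluster_id, set())
    members |= delta.additions
    members -= delta.removals
    return clusters

clusters = {"c1": {"e1", "e2"}}
d = Delta("c1", additions=frozenset({"e3"}), removals=frozenset({"e2"}))
apply_delta(clusters, d)
apply_delta(clusters, d)  # duplicate delivery: no further effect
# clusters["c1"] == {"e1", "e3"}
```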
- a unique identifier is synthesized for every event [ 100 ], calculated as described in U.S. Pat. No. 7,143,091.
- Examples of events [ 100 ] considered by the periodic patterns detection component [ 405 ] in one embodiment of the present disclosure are:
- Each association between a type of event [ 100 ] and the right type of time stamps is either directly inferred when there is no ambiguity, or can be configured. Some events [ 100 ] also have a duration.
- an event [ 100 ] does not require a one-to-one relationship with an electronic item.
- a simple class of events can be defined as a dialogue of a certain maximum duration, another class of events [ 100 ] can consist of all the electronic records constituting a financial transaction.
- the periodic patterns detection component [ 405 ] therefore allows any logical, identifiable and detectable bundle or derivative of electronic behavior, to be treated as an event [ 100 ].
- a periodic pattern [ 126 ] is an event [ 100 ] or a group of events that is recurrent over time and occurs at roughly periodic intervals. Examples of such patterns [ 126 ] are
- a periodic sequence [ 132 ] is an actual sequence of events that matches a periodic pattern [ 126 ].
- This section describes a method that finds within a set or a stream of events [ 100 ], the periodic patterns [ 126 ] and yields the actual sequences of events [ 100 ] that correspond to the realization of the periodic patterns [ 126 ]. It does this without any prior indication of the periodicity or possible periodic events [ 100 ] themselves.
- the method even finds the periodic patterns [ 126 ] that have changing frequency components over time.
- the method is fully continuous, updating relevant periodic patterns [ 126 ] as new events [ 100 ] enter the system.
- the method is robust against incomplete or locally irregular data and localized changes of frequencies of the events [ 100 ].
- results will be yielded and updated as soon as the data is processed; we do not need to wait until the end of the year to produce the first results:
- the results are available on a continuous basis.
- the resulting set of periodic patterns [ 126 ] can also be queried by its structural information: What periodic sequences share the same gap? What periodic sequences have a gap of size X during time period Y?
- This method can also recombine previously discovered periodic patterns [ 126 ] into higher order periodic patterns [ 126 ] that provide a richer picture of the regularities or irregularities of the data set or data stream.
- the present disclosure assumes a stream of events [ 100 ] flowing into the periodic patterns detection component [ 405 ]
- when the system needs to operate on a static dataset (i.e. in batch mode [ 370 ]), it simply iterates through the static dataset and provides the events [ 100 ] to the periodic patterns detection component [ 405 ], hence adding no additional difficulty when processing static datasets.
- the periodic patterns detection component [ 405 ] operates on a set of clusters of events [ 100 ] forwarded by the continuous clustering component [ 412 ] described in this disclosure. It receives a set of deltas [See section below for a definition of deltas] and updates the periodic patterns [ 126 ] that have the elements of the clusters as their underlying events [ 100 ].
- the periodic patterns detection component [ 405 ] performs the following actions:
- Upon reception of a delta, a process is spun off by the periodic patterns detection component [ 405 ] that locally reloads all necessary information for continuing the analysis of the corresponding clusters.
- the search for frequency components happens at different fixed time resolutions.
- the available time resolutions can, for example, range from a very fine grained resolution such as second or below, to coarser grained resolutions such as hour, day, week, month, etc.
- the analysis starts with the smallest resolution.
- the list of events [ 100 ] is sorted and binned according to the resolution.
- the resulting binned sequence can be treated as an array with each of its elements corresponding to the content of an individual time bin.
- Each element of the binned sequence can therefore represent:
- time bin will sometimes be used as a unit of time, just as a natural unit such as day or month would be used.
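The binning step described above can be sketched in a few lines. This is an illustrative sketch only: the event representation (a `(timestamp, event_class)` pair) and the function name are assumptions, not the disclosure's data model.

```python
from collections import defaultdict

def bin_events(events, resolution_seconds):
    """Sort time-stamped events and group them into fixed-width
    time bins at the given resolution; returns the binned
    sequence as a dense list of sets of event classes."""
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])  # (timestamp, event_class)
    start = events[0][0]
    bins = defaultdict(set)
    for ts, cls in events:
        bins[int((ts - start) // resolution_seconds)].add(cls)
    length = max(bins) + 1
    # empty bins are kept so the sequence can be read as an array
    return [bins.get(i, set()) for i in range(length)]

# Daily resolution (86400 s): three events spread over four days
seq = bin_events([(0, "e"), (90000, "f"), (260000, "e")], 86400)
# seq == [{"e"}, {"f"}, set(), {"e"}]
```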
- the sparse transition matrix T is a matrix that has event classes on its rows and integers on its columns.
- An event class is defined here as a set of events [ 100 ] that are similar.
- An example of event class is a set of all instances of a particular meeting notification email.
- An entry T[i,j] of the matrix is a structure s containing an integer indicating the number of times instances of events [ 100 ] of the class denoted by the index i follow each other separated by a distance of j time bins. This number contained in the entry will be referred to as a repetition count. If the integer contained in the structure s of the entry T[e, 4 ] is 10, that implies that events [ 100 ] of class e have been recorded to succeed each other every 4 time bins 10 times.
- the structure s also records the intervals where those successions occur in order to allow the location of the time intervals when those transitions occur. It also records, for each event class encountered in a time bin, the number of instances of that class observed.
- Reading this binned sequence will produce or update a sparse transition matrix with the following repetition counts (we omit detailing the whole structure of the entries).
- T[e, 2 ]: 6 repetitions, at intervals [ 1 , 7 ] and [ 11 , 17 ]
- T[e, 4 ]: 1 repetition, at interval [ 7 , 11 ]
- T[f, 3 ]: 2 repetitions, at interval [ 11 , 17 ]
- the creation and update of a sparse transition matrix is a linear operation with respect to the size of the binned sequence. Using two pointers along the binned sequence, one could record all transitions for each event class in one pass.
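The one-pass, linear-time construction just described might be sketched as follows. The layout is a simplification: the disclosure's structure s also records per-bin instance counts, which this sketch omits, and the intervals are kept as raw (start, end) pairs rather than merged runs.

```python
from collections import defaultdict

def build_transition_matrix(binned_sequence):
    """Single pass over a binned sequence. For each event class,
    successive occurrences separated by j bins increment the
    repetition count T[class][j] and record the interval where
    the transition occurred."""
    T = defaultdict(lambda: defaultdict(lambda: {"count": 0, "intervals": []}))
    last_seen = {}  # event class -> bin index of previous occurrence
    for i, bin_contents in enumerate(binned_sequence):
        for cls in bin_contents:
            if cls in last_seen:
                gap = i - last_seen[cls]
                entry = T[cls][gap]
                entry["count"] += 1
                entry["intervals"].append((last_seen[cls], i))
            last_seen[cls] = i
    return T

# e occurs every 2 bins, f every 3 bins
seq = [{"e"}, set(), {"e", "f"}, set(), {"e"}, {"f"}]
T = build_transition_matrix(seq)
# T["e"][2]["count"] == 2 and T["f"][3]["count"] == 1
```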
- a higher-order periodic pattern is any complex periodic pattern obtained via recombination of previously detected standalone periodic patterns.
- Previously built periodic patterns [ 126 ] can be automatically recombined if they satisfy two types of conditions:
- a semantic condition is any condition that triggers a recombination attempt while not being based on the pure periodic structure of the periodic patterns [ 126 ] to be recombined.
- Such conditions include, for example, an attempt to recombine two periodic patterns [ 126 ] because their underlying event classes have been declared similar and therefore merged into the same cluster by an upstream component.
- Semantic conditions, even though they serve as triggers, are sometimes not sufficient to mandate a recombination. Often, the recombination needs to be validated by satisfying further structural conditions.
- a structural condition is any condition based solely on the structure of the periodic pattern [ 126 ].
- Structural conditions are built around information about periodicity, time span, intervals of occurrence, disturbances or gaps. Everything else not related to the structure of the periodic pattern [ 126 ] is labeled as a semantic condition.
- This binned sequence yields at first two distinct periodic patterns [ 126 ]: A first one with a frequency component (e,*) which indicates that an event of class e occurs every 2 time bins and a second one (f,*,*,*) which indicates that an event of class f occurs every 4 time bins.
- the periodic patterns [ 126 ] (e,*) and (f,*,*,*) have respective period or length of 2 and 4 which are multiples. Furthermore, we can observe that every fi is always preceded and followed by a pair ei, ej. This satisfies the bounded phase difference variance condition, giving us enough confidence to recombine the two periodic patterns [ 126 ].
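A minimal sketch of this structural check follows, assuming occurrences are given as bin indices. The variance bound (`max_phase_variance`) and the exact formulation of the bounded phase difference variance condition are assumptions for illustration; the disclosure does not fix a particular threshold.

```python
from statistics import pvariance

def can_recombine(occurrences_a, period_a, occurrences_b, period_b,
                  max_phase_variance=0.25):
    """Two periodic patterns qualify for recombination when one
    period is a multiple of the other and the phase of the slower
    pattern relative to the faster one has a small, bounded
    variance."""
    if period_a > period_b:
        occurrences_a, occurrences_b = occurrences_b, occurrences_a
        period_a, period_b = period_b, period_a
    if period_b % period_a != 0:
        return False  # periods are not multiples of each other
    # phase of each slower-pattern occurrence within the longer period
    phases = [(b - occurrences_a[0]) % period_b for b in occurrences_b]
    return pvariance(phases) <= max_phase_variance

# e every 2 bins at 0,2,4,6,8; f every 4 bins at 1,5,9: recombinable
ok = can_recombine([0, 2, 4, 6, 8], 2, [1, 5, 9], 4)
# ok is True
```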
- Structural conditions are not limited to the one illustrated in the above example.
- other conditions such as alignments of disturbances or gaps, where the time intervals representing the disturbances or gaps of a set of periodic pattern line up with a very small variation, can also be used.
- recombinations can also be made offline by using the querying mechanism on periodic patterns [ 126 ] stored by the system in a periodic patterns [ 126 ] database.
- These attributes are indexed in a constant-access memory cache backed up by a persistent store by the cluster IDs and the periodic pattern IDs allowing a fast retrieval when clusters or periodic patterns need to be updated.
- an embodiment of the periodic patterns detection component [ 405 ] could remove the corresponding event [ 100 ] from its caches and persistent stores and keep only its identifier and time stamp within a compressed representation of the periodic patterns [ 126 ] the event [ 100 ] appears in. This is another way to save the amount of space used for caching results from continuous periodic pattern detection.
- every sparse matrix is continuously updatable and the frequency component detection happens incrementally.
- any binned sequence need not be passed to the system in its entirety in one pass. Being able to build the whole binned sequence in one shot is often impossible anyway because of the continuous nature of the system: the periodic patterns detection component [ 405 ] processes events [ 100 ] as it receives them.
- This type of incrementality can be achieved using a dirty flag for each matrix and setting it to true every time an update is made to the matrix, along with the set of two indices pointing to the modified entries. After a matrix is updated from a new binned sequence, process only the entries that have been marked as updated and reset the dirty flag to false.
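A small sketch of the dirty-flag scheme described above, with illustrative names: updates mark the matrix dirty and remember which (class, gap) entries changed, so the frequency analysis revisits only those entries before resetting the flag.

```python
class IncrementalMatrix:
    """Sketch of a continuously updatable transition matrix with a
    dirty flag and a set of touched entries."""

    def __init__(self):
        self.entries = {}     # (event_class, gap) -> repetition count
        self.dirty = False
        self.touched = set()  # entries modified since the last analysis

    def record_transition(self, event_class, gap):
        key = (event_class, gap)
        self.entries[key] = self.entries.get(key, 0) + 1
        self.dirty = True
        self.touched.add(key)

    def analyze_updates(self):
        """Process only the entries marked as updated, then reset
        the dirty flag to false."""
        updated = [(k, self.entries[k]) for k in self.touched]
        self.touched.clear()
        self.dirty = False
        return updated

m = IncrementalMatrix()
m.record_transition("e", 2)
m.record_transition("e", 2)
changes = m.analyze_updates()  # [(('e', 2), 2)]
# m.dirty is now False until the next update
```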
- the data structure representing a periodic pattern [ 126 ] has, but is not limited to, the following attributes:
- Segments and disturbances are also continuously indexed to allow querying based on time intervals.
- An embodiment of the data structure used to index segments and disturbances is a slightly modified version of an interval tree as described in [Cormen 2009] in which an extra annotation is added to every interval node to point to the corresponding periodic pattern ID.
- Periodic patterns [ 126 ] are not forwarded to other components or databases in their entirety. Because the present disclosure supports operating in a continuous mode where it must produce results as it receives events [ 100 ] and update results accordingly, creations and updates of periodic patterns [ 126 ] are forwarded in the form of periodic pattern mutations.
- a periodic pattern mutation is a form of typed update [ 107 ] consisting of enough information needed by any component receiving it to construct a new periodic pattern [ 126 ] or update an existing one. Its attributes include but are not limited to:
- periodic patterns database which upon reception of each periodic pattern mutation updates or creates the corresponding periodic pattern [ 126 ] and saves it in the database for future use.
- This process performs the following actions upon reception of a periodic pattern mutation data structure:
- the periodic patterns database [ 940 ] is the main repository for all periodic sequences [ 132 ] and their associated periodic patterns [ 126 ]. It is continuously updated.
- the present disclosure comprises a continuous categorization component [ 420 ], which is leveraged in multiple ways, including but not limited to the following use cases:
- the categorization model [ 1470 ] is initially built during system setup, and is later maintained throughout the system operation. Its primary purpose is to evaluate the results produced by the categorization components [ 1400 ] and to ensure their consistency and quality.
- the structure of the categorization model [ 1460 ] is summarized in FIG. 16 , which shows a particular snapshot of the model. The complete history of the categorization model [ 1460 ] is actually persisted, along with the system or user decisions that resulted in each new model version.
- This model is initially built on the results of knowledge engineering (which includes defining and tuning ontology classifiers [ 1410 ] and other categorization components [ 1400 ]) and on the reference data provided to the system, where reference data includes but is not limited to the following constituents:
- the relevance model stores elements of information including but not limited to:
- FIG. 14 shows the preferred embodiment of the categorization process, which can be broken down into three main phases: initialization, adaptation, and iteration.
- the set of categorization components [ 1420 ] available to the system comprises components of different types, as shown in FIG. 14 . This enumeration is not exhaustive and can be extended to include any method defining subsets of the data set.
- An initial set of components [ 1420 ] is built at the beginning of the continuous categorization process. These components [ 1420 ] are continuously maintained, meaning that some new components [ 1420 ] can be added and existing components [ 1420 ] can be deleted or modified. This happens either when a significant model change has been automatically detected in the data, or when an administrator needs to implement a new policy (whether internal or external to the organization).
- the following describes multiple methods provided by the system to assess quality of categorization results and automatically adapt in response to deteriorating quality or unexpected results.
- the system automatically evaluates the classification rules [ 1400 ] produced at the query fitting stage [ 1430 ] with respect to performance goals defined in the categorization scope.
- This comprises many variants of validity checks, for example by computing performance targets either on the basis of each classification rule [ 1400 ] retained by the algorithm, or on the basis of the final categorization decisions resulting from applying the whole set of rules [ 1400 ].
- Different validity checks are also available when some categorization codes include unknown values or values corresponding to a high level of uncertainty.
- Further statistical validation can optionally be performed to assess the generalization power of the classification rules [ 1400 ]: for example, verifying the self-similarity of the categorization results [ 1460 ] (by expanding categorization beyond the manual sample) ensures that there is no major over-fitting with respect to the manual sample [ 1425 ].
- a manual validation stage [ 1440 ] needs to be performed.
- This verification is a sanity check, mainly consisting of controlling that the associations established between categorization rules [ 1400 ] and category codes make sense and that there are no obvious indications of over-fitting, i.e. of excessive adaptation of the rules [ 1400 ] to the data sample that clearly would not generalize to the data in its entirety.
- manual validation of the classification rules [ 1400 ] is made easier by having the rules contain as much human-readable information as possible: for example, ontology classifiers [ 150 ] have an expressive name and a textual description.
- the continuous categorization component [ 420 ] automatically evaluates the categorization results against the categorized data. This evaluation is done in addition to the manual and automatic validation steps already performed on the classification rules themselves. Furthermore, the user [ 455 ] implicitly evaluates the global categorization results by reading the reports generated by the process.
- a basic technique consists of comparing results from a manual sample with results generalized over the whole data set.
- Profile change detection is a key step in assessing and guaranteeing quality of the output produced by the continuous categorization component [ 420 ].
- if the data profile, i.e. the statistical distribution of the data analyzed by the system in a continuous manner, changes over time, the classification rules and the categorization components [ 1420 ] themselves risk becoming obsolete and decreasing both in recall and in accuracy, thus deteriorating the quality of data processing and hence the overall quality of anomaly detection. Therefore, in such cases, an analyst needs to be notified that either manual re-sampling or component [ 1420 ] updating is necessary.
- profile changes can be automatically detected by the model: for example, when a connector allows data collection from HR applications, the appearance of a large volume of data associated with a new employee can be related by the categorization process to that employee's information recorded in the HR system. This case of profile change usually does not reveal any flaw in the categorization components [ 1420 ] and only requires sampling data for the new actor [ 220 ].
- categorization model [ 1470 ] refinements following a profile change include, but are not limited to, the following:
- a default set of data profile changes are monitored, including but not limited to the following list (which also suggests examples of typical cases where such a change occurs):
- each such profile change depends on a number of criteria.
- these criteria are available as detection thresholds including but not limited to the following.
- the hypergraph [ 114 ] system defined here is designed to be used in an environment where elements are continuously added to the hypergraph [ 114 ] and may trigger incremental computations to update derived structures in the hypergraph [ 114 ], of which discussions [ 136 ] are an important example.
- hypergraph operations [ 115 . 26 ] are particularly good for this. Most embodiments will opt to accumulate some number of changes to be triggered in batches at a later time.
- a set of hypergraph operations [ 115 . 26 ] are defined which facilitate different strategies for defining the neighborhood of an element.
- the continuous embodiment of discussion [ 136 ] building is based on accumulation of relationships between actors [ 220 ], events [ 100 ] and items [ 122 ] stored in a hypergraph [ 114 ] data structure. These relationships and the subsequent structures built on them are all considered evidences [ 108 ] to be used in the discussion [ 136 ] building procedure.
- the hypergraph [ 114 ] system is a generic framework supporting incremental hypergraph [ 114 ] computations, described here for the purpose of building discussions [ 136 ] on a continuous basis.
- the hypergraph [ 114 ] is represented as a set of OSF [ 110 ] values serialized and stored in records [ 115 . 24 ] in the hypergraph store [ 115 . 22 ], which in most embodiments consists of one or more large archives. Rather than using a keyed database, in these embodiments OSF [ 110 ] records [ 115 . 24 ] are referred to via their address in the hypergraph store [ 115 . 22 ].
- the hypergraph store [ 115 . 22 ] is tiled into segments [ 115 . 23 ], and addresses have the form of a segment [ 115 . 23 ] and an offset within that segment [ 115 . 23 ].
- a segment table [ 115 . 25 ] is stored which contains state information over all segments [ 115 . 23 ] allocated at any time. If a segment [ 115 .
- OSF [ 110 ] records [ 115 . 24 ] are immutable.
- a new version of a record to be updated is appended to the hypergraph store [ 115 . 22 ].
- new versions of those records [ 115 . 24 ] must be appended in order to make any change visible. In practice this fits very well with the continuous computation model, as new evidence [ 108 ] added to the hypergraph [ 114 ] will trigger revisiting existing discussions [ 136 ] or other structures for which it is relevant. Thus the relevant entities that may have to be updated will be visited anyway as part of the incremental algorithm.
- garbage collection is effected by simply removing segments [ 115 . 23 ] when they pass an age threshold.
- items [ 122 ] that have become too aged are no longer directly used in the computation of new discussions [ 136 ] or other structures.
- different embodiments will implement the notion of “too old” in different manners depending on their primary use cases. While some embodiments may simply use a calendar threshold, others will operate on the basis of obsolescence. For example, data relating to an actor [ 220 ] who is no longer on the scene may be deemed obsolete faster than the same sorts of data for an actor [ 220 ] who remains in the universe.
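The append-only addressing and segment-level collection described above can be sketched as follows. The class name, the in-memory list-of-lists layout, and the tiny segment capacity are all assumptions made for illustration; a real hypergraph store [ 115 . 22 ] would use large on-disk archives.

```python
class HypergraphStore:
    """Append-only store tiled into segments. Records are
    immutable and addressed by (segment, offset); updating a
    record means appending a new version that shadows the old."""

    def __init__(self, segment_capacity=2):
        self.segments = [[]]  # each segment is a list of records
        self.capacity = segment_capacity

    def append(self, record):
        """Append an immutable record; return its address."""
        if len(self.segments[-1]) >= self.capacity:
            self.segments.append([])  # open a fresh segment
        seg = len(self.segments) - 1
        self.segments[seg].append(record)
        return (seg, len(self.segments[seg]) - 1)

    def read(self, address):
        seg, off = address
        return self.segments[seg][off]

    def collect(self, segment_id):
        """Age-based garbage collection: drop a whole segment."""
        self.segments[segment_id] = None

store = HypergraphStore()
a1 = store.append({"id": "v1", "kind": "vertex"})
a2 = store.append({"id": "v1", "kind": "vertex", "label": "updated", "prev": a1})
# a1 == (0, 0) and a2 == (0, 1); the newer record shadows the older one
```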
- For purposes of scalability, the system focuses on locality in hypergraph operations [ 115 . 26 ], that is, operations based on looking at elements in the hypergraph [ 114 ] within a neighborhood of a triggering element or set of triggering elements.
- additional subsystems are used for storing the original data from which records [ 115 . 24 ] in the hypergraph [ 114 ] have been derived as well as for storing the final computed structures, such as discussions [ 136 ], used by downstream modules and applications.
- Other embodiments may choose to store all or some of this data in the hypergraph store [ 115 . 22 ] as well. In such embodiments, segments [ 115 . 23 ] would be tagged for different roles, and only those segments [ 115 . 23 ] involved in incrementally computed results would undergo a collection process. Other embodiments may flag collected segments [ 115 . 23 ] so as to redirect the referring process to an external, longer term data store.
- Multi-threaded or multi-process access to the hypergraph store [ 115 . 22 ] is greatly simplified, with minimal to no requirements for record locking.
- the use of the OSF [ 110 ] formalism and the concept of unification of structures creates many opportunities for sharding and layering of the hypergraph [ 114 ] in a consistent and efficient manner.
- hypergraph [ 114 ] elements can be split up in different ways and unification can be used to rejoin them.
- the type system implemented for OSFs [ 110 ] allows us to place constraints on allowable modifications to parts of an element that have been placed in separate locations, thus helping to enforce consistency when they are eventually rejoined.
- unification can be used to simplify sharding of a hypergraph [ 114 ], e.g. splitting it up into subsets to be distributed.
- Subsets can contain redundant information, or can split apart elements.
- New OSF [ 110 ] types can be created and sent with the shard to be used as constraints to make sure that modifications are handled consistently.
- Layering of the hypergraph [ 114 ] can be handled by splitting all elements in the hypergraph [ 114 ] based on subsets of features.
- Most embodiments of the system define a set of hypergraph operations [ 115 . 26 ] on the hypergraph [ 114 ] including but not limited to the following: closure, projection, traversal and a set of hypergraph [ 114 ] path algebra operations.
- these operations may specify additional OSF [ 110 ] constraints [ 115 . 9 ].
- the projection operation [ 115 . 28 ] would start with a set of edges [ 115 . 20 ] and select only those that unified with an OSF [ 110 ] value to be used as a constraint [ 115 . 9 ].
- the system uses an additional method of constraining the elements that these hypergraph operations [ 115 . 26 ] can access, called “boxing”.
- a box [ 115 . 15 ] is simply a set of atoms [ 115 . 19 ] from the hypergraph [ 114 ].
- the purpose of boxing [ 115 . 15 ] is to create groups of hypergraph [ 114 ] elements that are related by more complex relationships than can easily be expressed in queries [ 115 . 1 ]. These groups may also be determined by external factors not stored in or referenced from the hypergraph [ 114 ]. In many cases the boxes [ 115 . 15 ] may simply be useful for optimizing constraints during query [ 115 . 1 ] execution, by effectively pre-caching the relevant sets of items [ 122 ] conforming to the constraint. In the default embodiment, a box [ 115 . 15 ] is defined for each discussion [ 136 ].
- these boxes [ 115 . 15 ] may also include other elements of the hypergraph [ 114 ] that were considered to be the relevant evidence for the discussion [ 136 ] or related to the discussion [ 136 ] in other ways.
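The optimization role of boxes [ 115 . 15 ] can be illustrated with a small sketch: a box is just a precomputed set of atom identifiers that a query can be restricted to, avoiding re-evaluation of the complex relationship that defined the box. The function and record layout are hypothetical.

```python
def query_within_box(all_atoms, predicate, box):
    """Evaluate a query predicate only over atoms whose IDs are in
    the box, using the box as a pre-cached constraint set."""
    return [atom for atom in all_atoms
            if atom["id"] in box and predicate(atom)]

atoms = [
    {"id": "a1", "kind": "email"},
    {"id": "a2", "kind": "im"},
    {"id": "a3", "kind": "email"},  # not part of the discussion's box
]
discussion_box = {"a1", "a2"}  # atoms judged relevant to one discussion
hits = query_within_box(atoms, lambda a: a["kind"] == "email", discussion_box)
# hits == [{"id": "a1", "kind": "email"}]
```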
- a query [ 115 . 1 ] mechanism is used both for ad-hoc retrieval of records [ 115 . 24 ] from the hypergraph store [ 115 . 22 ] as well as to continuously evaluate an active query [ 115 . 1 ] set on new records [ 115 . 24 ] as they are added to the hypergraph store [ 115 . 22 ].
- vertices and edges can be indexed fairly simply.
- the system implements a set of query operators [ 115 . 30 ] to deal with the more complex nature of the hypergraph [ 114 ] elements.
- Incremental computations are triggered when records [ 115 . 24 ] are matched by the continuous queries [ 115 . 1 ].
- query matches [ 115 . 16 ] are produced as an OSF [ 110 ] value. These values include descriptions of the matched items [ 122 ] built up during query [ 115 . 1 ] evaluation, using unification constraints [ 115 . 9 ] in a manner similar to unification-based parsing systems, or logic languages like Prolog.
- the resulting match record [ 115 . 16 ] will determine which of the incremental update procedures should be triggered.
- the query [ 115 . 1 ] language is designed so that the query operators [ 115 . 30 ] can be used to build up a working set [ 115 . 17 ] of related hypergraph [ 114 ] elements during the evaluation of the query [ 115 . 1 ].
- a form of skip lists is used as the indexing [ 115 . 11 ] mechanism.
- the indexing [ 115 . 11 ] is used to speed up linear scans over records [ 115 . 24 ] from the hypergraph store [ 115 . 22 ] rather than fast random access.
- Some embodiments may also include index [ 115 . 11 ] schemes emphasizing random access, such as b-tree indices.
- Any of the skip list solutions described here could also be augmented to an inverted index [ 115 . 13 ] representation as commonly used in text search engines.
- Indexing [ 115 . 11 ] boxes [ 115 . 15 ] uses a form of skip lists as well.
- the primary goal for this default embodiment is to be able to build all of these indices by simply appending new entries without modification to the older entries.
- the indices are meant to be read backwards, from the newest entries to the oldest.
- This works well in conjunction with the structure of the hypergraph store [ 115 . 22 ] in which newer versions of a record shadow the older ones.
- Various embodiments may use different methods for detecting when elements have been shadowed.
- One embodiment assigns a logical ID to records [ 115 . 24 ]. When a new version of a record is written, both records [ 115 . 24 ] share the same ID. Many embodiments will simply include the address of the earlier version of the record as an attribute. In both cases the query matching procedure [ 115 .
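The logical-ID scheme combined with the backwards (newest-first) reading order can be sketched as follows. The record layout is illustrative; the point is only that the first version of each logical ID encountered when scanning from the newest end wins, and older versions are treated as shadowed.

```python
def latest_versions(records):
    """records are in append order (oldest first); scan backwards
    and keep only the newest record for each logical ID."""
    seen, result = set(), []
    for record in reversed(records):
        if record["logical_id"] in seen:
            continue  # shadowed by a newer version
        seen.add(record["logical_id"])
        result.append(record)
    result.reverse()  # restore append order for the survivors
    return result

log = [
    {"logical_id": "a", "value": 1},
    {"logical_id": "b", "value": 2},
    {"logical_id": "a", "value": 3},  # shadows the first record
]
live = latest_versions(log)
# live == [{"logical_id": "b", "value": 2}, {"logical_id": "a", "value": 3}]
```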
- OSF [ 110 ] can also use type information attached to a record to determine how it is related to the prior version.
- Many other schemes are possible, so long as they efficiently and effectively dispatch data events [ 100 ] to the appropriate handler based on the record types.
- OSF [ 110 ] values and OSF [ 110 ] records [ 115 . 24 ] are referenced frequently throughout the description of the hypergraph [ 114 ] system.
- the acronym OSF [ 110 ] refers to a particular type of feature structure called an “Order Sorted Feature Structure”, which is well documented in research literature. Feature structures are used throughout AI and natural language processing applications; again, we refer to the literature for standard definitions and usages, where these concepts are well developed and widely understood. The central operation on feature structures is unification, which in essence merges two feature structures (where the values for features that appear in both structures are themselves unified). Only those usages that differ from common practice will be noted here.
- Types in a feature structure system are handled somewhat differently than in a conventional programming language. Types are themselves represented as feature structures and are available as values which can be unified against. This also means that types can be defined dynamically at runtime. While types are defined within a type lattice, in the default OSF [ 110 ] embodiment described here, one or more types can be attached to an OSF [ 110 ] value as long as they are consistent with (i.e. can be unified with) the value, rather than the value being an instance of the type. In essence, types are used to tag values. However, types can also contain instance values in addition to the general classes (such as String, Number, Feature Structure and so on).
- these instantiated values are never stored directly in instances; instead, the implementation looks up the value from a type associated with the instance when the value is not present.
- Values can also be uninstantiated and in this way function similarly to a type, i.e. the value represents a class of potential values.
- the default embodiment defines some additional intrinsic types in addition to those normally found in feature structure implementations; one representing a set of Choices over an enumerated set of values, and another for ranges for values that have a natural order associated with them.
- the representation is that of a set of OSF [ 110 ] values that are the equivalent of a set of unification equations as seen in common practice.
- One of the characteristics of feature structures is that multiple features (or feature paths) in the structure may point to the same value instance. This is what distinguishes feature structures from representations such as nested property lists.
- the set of OSF [ 110 ] values each become a feature of a larger OSF [ 110 ] value, and some feature (paths) that exist in multiple of the OSF [ 110 ] values are modified to point to the same value instance.
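A toy sketch of unification over nested dictionaries may clarify the central operation: features present in both structures are unified recursively, atomic values must agree, and features present in only one side are carried over. This deliberately omits two things the OSF [ 110 ] formalism supports, namely types and shared value instances, so it illustrates only the merging behavior.

```python
def unify(a, b):
    """Recursively unify two feature structures represented as
    nested dicts; raise on a clash of atomic values."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = unify(out[key], value) if key in out else value
        return out
    if a == b:
        return a
    raise ValueError(f"unification failure: {a!r} vs {b!r}")

x = {"agr": {"num": "sg"}, "cat": "np"}
y = {"agr": {"per": 3}}
merged = unify(x, y)
# merged == {"agr": {"num": "sg", "per": 3}, "cat": "np"}
```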
- constraints [ 115 . 9 ] bi-directional, which is exploited in query [ 115 . 1 ] representations. If we have a chain of constraints [ 115 . 9 ], we can impose a restriction on all members of the chain by unifying a value against the final sub-feature value in the chain to produce a new, restricted constraint [ 115 . 9 ].
- the operator nodes [ 1151 ] are linked via a tree of dependent constraints [ 115 .
- one of the subfeatures is designated the right hand side of the equation (the RHS) and will be used as the OSF [ 110 ] value derived for the root node [ 115 . 7 ] after unifying values reported from the node's [ 115 . 7 ] children in the matching network [ 115 . 5 ].
- Hypergraph The mathematical definition extends standard graphs to allow for edges connecting any number of vertices.
- the hypergraph [ 114 ] system described here extends this definition further by allowing an edge [ 115 . 20 ] to connect any number of vertices [ 115 . 21 ] and edges [ 115 . 20 ].
- the list of atoms [ 115 . 19 ] composing an edge [ 115 . 20 ] are subject to an ordering constraint. There are three classes of ordering constraint. The simplest defines no ordering relationships at all, resulting in an unordered edge [ 115 . 20 ]. The most common case is that a total order is specified, which is straightforwardly represented as the order of occurrence of the atoms [ 115 .
- Hypergraph store The physical unit of storage for elements of the hypergraph [ 114 ]. It should be noted that there is no requirement that only hypergraph [ 114 ] elements appear in the hypergraph store [ 115 . 22 ]. The hypergraph [ 114 ] may be represented as one subset out of all the records [ 115 . 24 ] in the hypergraph store [ 115 . 22 ].
- Store address An important characteristic of the system is that records [ 115 . 24 ] in the hypergraph store [ 115 . 22 ] are accessed via a direct address rather than by keyed lookup as would be done in database systems. In one embodiment this address consists of segment id and offset parts, indicating an offset into a segment [ 115 . 23 ] of data in the hypergraph store [ 115 . 22 ]. In such embodiments the hypergraph store [ 115 . 22 ] maintains a master table.
- Atom [ 115 . 19 ] Any hypergraph[ 114 ] element.
- An embodiment that uses OSF [ 110 ] records [ 115 . 24 ] to store all atoms [ 115 . 19 ] will be referred to as the OSF [ 110 ] embodiment.
- Other embodiments may use data representations other than OSF [ 110 ] so long as they store a sufficient description of the atom [ 115 . 19 ].
- In OSF-based [ 110 ] embodiments a set of types will be predefined by the system that correspond to the two kinds of atoms [ 115 . 19 ]. There may be more than one type defined for each kind of atom [ 115 . 19 ] because, for example, the system may allow successive versions of an atom [ 115 . 19 ].
- atoms [ 115 . 19 ] may require an attribute that represents a “logical ID”, i.e. an id that uniquely identifies the atom [ 115 . 19 ] and is shared across successive versions of the atom [ 115 . 19 ] in the underlying hypergraph store [ 115 . 22 ].
- Vertex [ 115 . 21 ] The fundamental unit from which hypergraphs [ 114 ] are formed.
- the OSF [ 110 ] embodiment will predefine a base type for OSF [ 110 ] records [ 115 . 24 ] representing vertices [ 115 . 21 ].
- Edge [ 115 . 20 ] A relationship between a set of atoms [ 115 . 19 ]. Edges [ 115 . 20 ] minimally require an attribute containing a list of atoms [ 115 . 19 ]. In some embodiments the edges [ 115 . 20 ] may be ordered, the equivalent to directed edges in conventional graphs.
- the OSF [ 110 ] embodiment will predefine a base type for OSF [ 110 ] records [ 115 . 24 ] representing edges [ 115 . 20 ]. Additionally, subtypes for distinguishing ordered and unordered edges [ 115 . 20 ] may be defined. Ordered edges [ 115 . 20 ] in the hypergraph [ 114 ] are the hypergraph [ 114 ] equivalent to directed edges in a conventional graph, where the directed edge places an ordering on the pair of vertices that it relates.
- One point of difference from conventional graphs is that there is no advantage to storing a reversed version of an ordered edge [ 115 . 20 ].
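The vertex and edge definitions above can be pictured with a minimal sketch. Class and field names here are illustrative assumptions, not taken from the disclosure, and the OSF [ 110 ] record format is not modeled.

```python
# Minimal sketch of hypergraph atoms: vertices and edges, where an edge
# may connect both vertices and other edges.

class Vertex:
    def __init__(self, logical_id):
        # logical ID shared across successive versions of the atom
        self.logical_id = logical_id

class Edge:
    """A relationship over any number of atoms (vertices or edges)."""
    def __init__(self, atoms, ordered=False):
        self.atoms = list(atoms)   # for a total order, list order is the order
        self.ordered = ordered

    def arity(self):
        return len(self.atoms)

a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
e1 = Edge([a, b, c], ordered=True)   # ordered edge over three vertices
e2 = Edge([e1, a])                   # an edge may include another edge
```

Note that only one direction of an ordered edge is stored; the reverse ordering can be read off the same atom list, so there is no advantage to a reversed copy.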
- Because the hypergraph [ 114 ] has a more complex structure, traversal, retrieval and other operations on the hypergraph [ 114 ] need to use a more general model.
- the hypergraph [ 114 ] system relies on additional structures to aid in the location and retrieval of atoms [ 115 . 19 ] from the hypergraph store [ 115 . 22 ].
- the indices stored by the system do not necessarily aid in quickly finding a random element of the hypergraph store [ 115 . 22 ]. In particular they are more often used to iterate over elements in the hypergraph store [ 115 . 22 ] more efficiently.
- the goal for these indices is that they be very cheap to update as new data elements are added to the hypergraph store [ 115 . 22 ]. For this reason the skip list data structure is often useful; there is a standard body of practice on using skip lists to efficiently create balanced index structures.
- Hypergraph operations The system defines a generic set of high level operations for working with the hypergraph [ 114 ]. These operations work by modifying a working set [ 115 . 17 ] consisting of atoms [ 115 . 19 ]. There are three core operations listed below.
- Closure operation The set of elements reachable from an initial atom [ 115 . 19 ] or set of atoms [ 115 . 19 ].
- the system defines a closure operation [ 115 . 29 ] that computes and adds this set of atoms [ 115 . 19 ] to a working set [ 115 . 17 ].
- Closures potentially can be quite large and encompass the whole hypergraph [ 114 ], therefore most embodiments of the system use constraints [ 115 . 9 ] to only return elements of the closure that fall in some smaller neighborhood.
- the closure operation [ 115 . 29 ] implements some extra conditions which separate it from a traversal operation [ 115 . 27 ].
- Projection operation Removes elements from a working set [ 115 . 17 ].
- the elements are chosen via constraints [ 115 . 9 ] and other filters.
- Embodiments may use characteristics such as edge [ 115 . 20 ] weights, arity and so on to select edges [ 115 . 20 ] in the projection.
- edge [ 115 . 20 ] paths contained in the working set [ 115 . 17 ] are replaced with new edges [ 115 . 20 ]. This is accomplished through the use of a path algebra.
- the path algebra operations are specified via an OSF [ 110 ] value and a set of types are predefined for that purpose.
- Traversal operation Adds elements to a working set [ 115 . 17 ], starting from elements that are in the working set [ 115 . 17 ] and following edges [ 115 . 20 ] incident on those elements.
- the operation shares similarities with the closure operation [ 115 . 29 ], but is focused on more fine grained control of the traversal rather than properties of the resulting set.
- the operation can be parameterized to run according to a traversal pattern.
- the traversal pattern is specified via an OSF [ 110 ] record.
- Breadth-first patterns add elements from successive neighborhoods out to a certain number of layers or hops. Depth-first patterns are used to select paths out to a certain number of hops. For example, a depth-first pattern may specify the first n-many paths, or the n highest weighted paths.
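A breadth-first traversal pattern as described above can be sketched as follows. The adjacency form (a mapping from atom to incident edges) and all names are assumptions for illustration; a real embodiment would consult the hypergraph store [ 115 . 22 ] and its indices.

```python
# Sketch of a breadth-first traversal pattern: starting from a working
# set, follow incident edges out to `hops` neighborhood layers.

def breadth_first_traverse(working_set, incident_edges, hops):
    frontier = set(working_set)
    result = set(working_set)
    for _ in range(hops):
        next_frontier = set()
        for atom in frontier:
            for edge in incident_edges.get(atom, ()):
                for member in edge:          # every atom incident on the edge
                    if member not in result:
                        result.add(member)
                        next_frontier.add(member)
        frontier = next_frontier
    return result

# Toy graph: one edge relating "a" and "b", another relating "b" and "c".
incident = {"a": [("a", "b")], "b": [("a", "b"), ("b", "c")], "c": [("b", "c")]}
one_hop = breadth_first_traverse({"a"}, incident, 1)
two_hop = breadth_first_traverse({"a"}, incident, 2)
```

A depth-first pattern would instead expand individual paths, stopping after the first n-many or the n highest-weighted paths.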
- Hypergraph Query The system implements an engine for running queries [ 115 . 1 ] against atoms [ 115 . 19 ] in the hypergraph[ 114 ].
- a query [ 115 . 1 ] is represented as a nested expression [ 115 . 2 ].
- Expressions [ 115 . 2 ] consist of terms [ 115 . 4 ] and functions [ 115 . 3 ] over terms [ 115 . 4 ] and functions [ 115 . 3 ].
- Terms [ 115 . 4 ] define a match condition.
- In OSF [ 110 ] embodiments a term [ 115 . 4 ] is represented as a feature-value pair. The feature is used to select a value from the OSF [ 110 ] value representing an atom [ 115 . 19 ].
- Query expressions [ 115 . 2 ] map to a set of operators [ 115 . 30 ]. These operators [ 115 . 30 ] operate on match records [ 115 . 16 ] that contain a payload [ 115 . 18 ] value and a working set [ 115 . 17 ] of atoms [ 115 . 19 ].
- One set of operators [ 115 . 30 ] simply wrap the hypergraph operations [ 115 . 26 ] defined by the system.
- the modifications to the working set [ 115 . 17 ] performed by these operators [ 115 . 30 ] are limited by a set of constraints [ 115 . 9 ], i.e.
- constraints [ 115 . 9 ] are implemented via unification equations. This also means that a new OSF [ 110 ] record will be created as a result of evaluating the equation. Thus in OSF [ 110 ] embodiments constraints [ 115 . 9 ] both test and construct values.
- Query Operators These are the operators [ 115 . 30 ] used to specify queries [ 115 . 1 ] against hypergraph[ 114 ] elements. When evaluated a query operator [ 115 . 30 ] works on one working set [ 115 . 17 ], but a query procedure [ 115 . 10 ] maintains a list of working sets [ 115 . 17 ]. Each atom [ 115 . 19 ] initially matched in a query [ 115 . 1 ] spawns a working set [ 115 . 17 ] which may be expanded during the evaluation of query operators [ 115 . 30 ]. An embodiment of a set of query operators [ 115 . 30 ] appears later in this section.
- the hypergraph store [ 115 . 22 ] is an archive or set of archives that are split up into segments [ 115 . 23 ]. New segments [ 115 . 23 ] are allocated as the hypergraph [ 114 ] grows, and aged elements are removed from the hypergraph [ 114 ] by removing segments [ 115 . 23 ]. In order to allocate a new segment [ 115 . 23 ] the system first looks to see if there are any collected segments [ 115 . 23 ] (i.e. segments that have been removed from active use), or extends the archive to allocate a new segment [ 115 . 23 ]. The hypergraph store [ 115 . 22 ] has a master segment table, each entry of which is an OSF [ 110 ] value that indicates the type and status of a segment [ 115 . 23 ].
- New entries are always appended when a segment [ 115 . 23 ] is allocated. This is because there may be dangling references to collected segments [ 115 . 23 ] in the active part of the hypergraph store [ 115 . 22 ].
- the record for the freed segment [ 115 . 23 ] will contain information used to determine how dangling references are resolved as described earlier in this document.
- OSF [ 110 ] records [ 115 . 24 ] are stored at offsets into a segment [ 115 . 23 ], therefore they can be addressed with a segment id:offset pair as the address.
- the OSF [ 110 ] records [ 115 . 24 ] represent elements of the hypergraph [ 114 ], called atoms [ 115 . 19 ]. However they are not restricted to that purpose, therefore the hypergraph [ 114 ] may be augmented with additional record types, or these additional records [ 115 . 24 ] may be used to do bookkeeping for storing partial state of an ongoing computation, etc. In one OSF-based embodiment [ 110 ], the OSF [ 110 ] records [ 115 . 24 ] are immutable. As discussed above, this places additional requirements on how hypergraph [ 114 ] structures are to be updated, but greatly simplifies management of the hypergraph store [ 115 . 22 ], particularly in multi-threaded or multi-process implementations.
- Hypergraph [ 114 ] computations that are triggered by the addition of new elements will then use lazy algorithms that only build out structures that are directly needed.
- the motivating factor for this design is that in many cases it can take more time to compute a result and store it in a way that it can be efficiently retrieved than to simply re-derive it as and when necessary.
- the hypergraph [ 114 ] system is designed to take advantage of these cases wherever possible.
- OSF [ 110 ] records [ 115 . 24 ] are associated with one or more types.
- One set of types will determine what kind of atom [ 115 . 19 ] the record represents.
- additional types associated to the record can be used to determine how the record should be handled by the system.
- Two of these additional distinctions are how a record is related to prior versions and the inclusion of special processing instructions for individual atoms [ 115 . 19 ].
- new versions of a record can be added by either completely shadowing the prior version or by describing a set of differences from the old version. For example, when dealing with a large edge [ 115 . 20 ] the system may only store new members added to the edge [ 115 . 20 ] rather than storing an entirely new copy. Using this differential storage creates opportunities for synergies with the query matching procedure [ 115 . 10 ] described below. For example, in a case where the query [ 115 . 1 ] only specifies the traversal of the most recent n paths in a match [ 115 . 16 ], those paths will usually be found in the last delta added for an edge [ 115 . 20 ].
- Most OSF-based [ 110 ] embodiments of the system allow for "fixes" to be placed in the hypergraph [ 114 ] in the form of specialized processing instructions. Embodiments using other record formats may be able to implement a similar scheme.
- In OSF [ 110 ] embodiments a predefined set of OSF [ 110 ] types represents additional instructions that can be given to the various algorithms used by the system.
- a possible problem might be that a particular pair of aliases [ 240 ] is erroneously associated with the same actor identity [ 235 ].
- This system would allow an annotation to be placed on the respective alias [ 240 ] atoms [ 115 . 19 ] preventing the combination of the two. Such annotations could be added to elements due to user feedback or other procedures detecting inconsistencies or problems in derived hypergraph [ 114 ] structures.
- Queries [ 115 . 1 ] have two purposes: to retrieve elements from the hypergraph [ 114 ] and to trigger hypergraph [ 114 ] computations in response to new atoms [ 115 . 19 ] being added to the hypergraph store [ 115 . 22 ].
- the same query matching procedure [ 115 . 10 ] is used for both purposes.
- the query [ 115 . 1 ] functionality covers three areas in most embodiments: matching features of OSF [ 110 ] records [ 115 . 24 ], augmenting a query working set [ 115 . 17 ] with hypergraph operations [ 115 . 26 ], and placing constraints on the hypergraph operations [ 115 . 26 ] either by additional feature tests or by membership or non-membership in a box [ 115 . 15 ].
- the default embodiment described here represents query operators [ 115 . 30 ] as a set of functions [ 115 . 3 ], whose arguments are either other query operators [ 115 . 30 ] or feature structures to use as terms or to use as constraints [ 115 . 9 ].
- Functions [ 115 . 3 ] accept match records [ 115 . 16 ] from their constituent arguments and produce a match record [ 115 . 16 ] if their condition is met.
- Match records [ 115 . 16 ] are represented as an OSF [ 110 ] record. This record contains bookkeeping information about the partial match [ 115 . 16 ] as well as a “payload” [ 115 . 18 ] feature that pulls some or all of the values out of its arguments.
- the initial match records [ 115 . 16 ] are created by comparing the leaf feature structures to atoms [ 115 . 19 ] in the hypergraph store [ 115 . 22 ]. If the feature structure unifies, a match record [ 115 . 16 ] is created with the unified result as the payload [ 115 . 18 ]. If the operator [ 115 . 30 ] has a constraint [ 115 . 9 ], we attempt to unify the constraint [ 115 . 9 ] values against a match record [ 115 . 16 ] and the constraint [ 115 . 9 ] is satisfied if it succeeds. In one embodiment the constraint [ 115 . 9 ] is a unification equation that relates values in two feature structures.
- the resulting match record [ 115 . 16 ] contains the RHS value after successful unification of incoming match records [ 115 . 16 ] against the LHS. Note this same mechanism can be used to introduce new values into the match record [ 115 . 16 ] as well.
- the set of operators [ 115 . 30 ] is represented as a network [ 115 . 5 ] of vertices representing the operators [ 115 . 30 ] and edges that link operators [ 115 . 30 ] to any enclosing operators [ 115 . 30 ].
- the highest operator [ 115 . 30 ] is linked to a final reporting node [ 115 . 8 ]. Any match records [ 115 . 16 ] that flow through to that node [ 115 . 8 ] are reported as results.
- this structure can represent multiple queries [ 115 . 1 ], although the penultimate node [ 115 . 7 ] must be unique to each query [ 115 . 1 ].
- To deactivate a query [ 115 . 1 ] set the reference count on its penultimate node [ 115 . 7 ] to zero, and walk through its children decrementing the count. Only those nodes [ 115 . 7 ] with a positive reference count will report results forward.
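The reference-count deactivation described above can be sketched as follows; the node structure is an assumption for illustration.

```python
# Sketch of query deactivation by reference counting in the matching
# network: zero the penultimate node and decrement its descendants.

class OpNode:
    def __init__(self, children=()):
        self.children = list(children)
        self.refcount = 1

    def reports(self):
        return self.refcount > 0   # only positive counts report forward

def deactivate(penultimate):
    penultimate.refcount = 0
    stack = list(penultimate.children)
    while stack:
        node = stack.pop()
        node.refcount -= 1
        stack.extend(node.children)

leaf = OpNode()
mid = OpNode([leaf])
top = OpNode([mid])        # penultimate node, unique to one query
deactivate(top)

# A node shared with a second query starts with reference count 2 and
# survives deactivation of one query.
shared = OpNode()
shared.refcount = 2
other = OpNode([shared])
deactivate(other)
```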
- the query procedure [ 115 . 10 ] effectively creates a working set [ 115 . 17 ] for each atom [ 115 . 19 ] that it examines.
- Some query operators [ 115 . 30 ] are just wrappers around hypergraph operations [ 115 . 26 ] that expand or filter that working set [ 115 . 17 ].
- query operators [ 115 . 30 ] will include, but will not be limited to, the following:
- the embodiments of the system described here have all assumed the presence of an index [ 115 . 11 ] based on skip lists.
- the system in fact does not necessarily require the presence of an index [ 115 . 11 ]; any of the techniques described here can be used by scanning through the OSF [ 110 ] records [ 115 . 24 ] in the hypergraph store [ 115 . 22 ].
- the value of skip lists is twofold: they are very cheap to maintain, and they implement a technique to speed traversal of the indexes [ 115 . 11 ].
- the embodiment of skip lists described here orders the atoms [ 115 . 19 ] they index by time of arrival in the hypergraph store [ 115 . 22 ]. Alternatively they can support indexing based on other orders, but generally at a cost which is greater than the technique described here.
- the base entries [ 116 . 05 ] in the list each contain a pair of the feature value for an atom [ 115 . 19 ] and its address. Additionally entries [ 116 . 05 ] contain a variable length array of pointer and skip count pairs [ 116 . 10 ] that are used to link entries together.
- Skip list entries can therefore be a member of multiple linked lists. The technique used here is that each successive level in the skip list skips a larger number of entries in the list. The goal is to be able to skip over a large number of entries in a small number of steps. To skip a number of entries [ 116 . 05 ] from the current entry [ 116 . 05 ], the entry's level pairs [ 116 . 10 ] are followed, taking the largest skip that does not pass the target.
- queries [ 115 . 1 ] that are more restrictive and/or have a larger number of terms [ 115 . 4 ] and sub-expressions [ 115 . 2 ] will gain the most, due to being able to maximize skip lengths. This is because in general more skips will accumulate before revisiting one of the later index [ 115 . 11 ] lists.
- An embodiment may optimize query [ 115 . 1 ] execution speed by ordering term [ 115 . 4 ] tests such that indices referencing fewer atoms [ 115 . 19 ] or indices or tests that are determined to be more likely to fail are visited first. In this way the procedure can find skips both earlier and by visiting a smaller number of index [ 115 . 11 ] lists. As we only visit the later lists when the query [ 115 . 1 ] evaluation for an atom [ 115 . 19 ] has not already failed, this has the result of producing larger skips in those lists.
- Construction of the list is relatively simple and requires a counter that is incremented for each new item added to the hypergraph store [ 115 . 22 ].
- determine which skip lists to update. For each skip list, add a new entry [ 116 . 05 ] and determine how many levels of the skip list to include it in. For instance, to have skips of increasing orders of magnitude, place it in the second level if the current counter is a multiple of 10, in the third level if a multiple of 100, and so on.
- If there are atoms [ 115 . 19 ] that are not to be included in a list, this may result in more level entries than necessary.
- this problem is handled by keeping a separate relative counter for atoms [ 115 . 19 ] added to the list and using that counter to determine the number of levels, while recording the skip count based on the first counter. This requires that the last positions used at each level from the first counter be tracked.
- When adding entries to a skip list with a level array [ 116 . 10 ] of length x, we first increment the head's [ 116 . 15 ] skip counters for each level, then copy the first x level pairs from the head [ 116 . 15 ] of the list into the new entry's [ 116 . 05 ] level list [ 116 . 10 ], and update the first x level pointers in the head to point to the new entry [ 116 . 05 ]. Finally, we set the first x skip counters in the head to zero.
- Augmenting the skip list to be an inverted index [ 115 . 13 ] is fairly straightforward. It has the same structure as the skip list above (see [ 116 . 20 ] [ 116 . 25 ] [ 116 . 30 ]), however the level lists are constructed slightly differently.
- An inverted index [ 115 . 13 ] essentially acts like a collection of skip lists in this context.
- the construction procedure is the same as above with the change that we append together the level lists [ 116 . 30 ] for each of the skip lists that reference the atom [ 115 . 19 ].
- One embodiment will retain the feature value in skip list entries [ 116 . 20 ] so that values contained in the head table [ 116 . 35 ] may be used to denote a range of possible values (which can be used to reduce the number of posting lists).
- each value contained at the head table [ 116 . 35 ] must be able to successfully unify with all of the values contained in its associated set of levels [ 116 . 30 ] [ 116 . 25 ].
- the skip list index [ 115 . 31 ] used for boxes [ 115 . 15 ] has an additional purpose beyond the skip lists described above. It is desirable to be able to directly enumerate the members of a box [ 115 . 15 ]. To do this we add to the skip list representation above a set of levels consisting of <boxId, pointer> pairs. These allow us to chase pointers to recover the members of a box [ 115 . 15 ] as well as skipping using the other levels.
- One set of skip list levels [ 117 . 05 ][ 117 . 10 ] [ 117 . 15 ] is constructed similarly to the skip lists described above, however a feature value is not required to be stored in the base list entry [ 117 . 05 ].
- An additional set of levels [ 117 . 20 ] is added for each of the box [ 115 . 15 ] membership lists, along with an array of the heads of each level [ 117 . 25 ].
- an embodiment can choose to store only those containing boxes [ 115 . 15 ] at the bottom of the box [ 115 . 15 ] hierarchy.
- Box 1 , Box 2 , Box 4 are the only ones that would need to be placed in membership lists [ 117 . 20 ].
- Membership in other boxes [ 115 . 15 ] could be found by searching the parents of these boxes [ 115 . 15 ] in the hierarchy.
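When only the bottom-most containing boxes are stored in membership lists, membership in the remaining boxes is recovered by walking the hierarchy, which can be sketched as below. All names are illustrative assumptions.

```python
# Sketch of resolving box membership when only leaf boxes appear in the
# membership lists: walk up the parent hierarchy from each leaf box.

def boxes_containing(atom, leaf_boxes_of, parent_of):
    found = set()
    stack = list(leaf_boxes_of.get(atom, ()))
    while stack:
        box = stack.pop()
        if box not in found:
            found.add(box)
            if box in parent_of:
                stack.append(parent_of[box])
    return found

# Box4 is contained in Box2, which is contained in Box1.
leaf_boxes = {"atom1": ["Box4"]}
parents = {"Box4": "Box2", "Box2": "Box1"}
membership = boxes_containing("atom1", leaf_boxes, parents)
```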
- the embodiments described use a separate data structure to track the hierarchy of containment relationships between boxes [ 115 . 15 ].
- the box [ 115 . 15 ] hierarchy is simply kept in the hypergraph [ 114 ] itself.
- FIG. 119 defines the logical structure of the query matching procedure [ 115 . 10 ] that will be used in many embodiments.
- the figure contains branches for both indexed and continuous queries [ 115 . 1 ].
- For indexed queries [ 115 . 1 ] a list of indexes [ 115 . 11 ] is kept.
- An index [ 115 . 11 ] is marked as exhausted when the procedure [ 115 . 10 ] reaches the end of the index [ 115 . 11 ]. If an index [ 115 . 11 ] entry refers to an atom [ 115 . 19 ] from a segment [ 115 . 23 ] that has been collected, then depending on the garbage collection strategy used by the embodiment, the end of the list may have effectively been reached.
- Each index [ 115 . 11 ] is associated with a particular feature, and an additional list of the tests based on that feature is kept for each index [ 115 . 11 ].
- the procedure [ 115 . 10 ] has access to the OSF [ 110 ] record itself in most embodiments.
- An embodiment may choose to bypass the scan over indices and instead scan over atoms [ 115 . 19 ] in the hypergraph [ 114 ], retrieving the OSF [ 110 ] records [ 115 . 24 ] directly. Tests for the continuous queries [ 115 . 1 ] first retrieve a value from the atom [ 115 . 19 ] based on the feature.
- the test can specify whether or not the value is required to be non-empty. If the value may be empty, the test succeeds vacuously; otherwise the test succeeds if the test value can be unified against the retrieved value.
- the following text describes the workflow used in one embodiment.
- an indexed query procedure [ 115 . 10 ] enters a top level loop that scans through the set of indices. First it selects an index [ 115 . 11 ] [ 119 . 10 ]. If the index [ 115 . 11 ] is exhausted [ 119 . 15 ] then check to see if there are any more indices [ 119 . 20 ]. If not then the procedure [ 115 . 10 ] terminates. Otherwise we enter a loop that evaluates all the initial feature tests [ 119 . 35 ].
- a continuous query procedure [ 115 . 10 ] [ 119 . 30 ] will bypass the loop that scans through indices and start at [ 119 . 35 ] as well.
- the procedure [ 115 . 10 ] selects the next test and checks to see if it is satisfied [ 119 . 40 ]. If so then we create an initial match record [ 115 . 16 ] and advance it to all target nodes [ 115 . 7 ] associated with that test [ 119 . 45 ]. If a test appears in more than one immediately containing expression [ 115 . 2 ], then it is linked to the operator nodes [ 115 . 7 ] created for all those expressions [ 115 . 2 ].
- When a match record [ 115 . 16 ] is advanced to a node [ 115 . 7 ], the node may become active.
- the activation of an operator node [ 115 . 7 ] depends on its type. However the decision is generally based on receiving the appropriate arguments and on any initial tests or constraints [ 115 . 9 ] being satisfied. If a node [ 115 . 7 ] does not become active then the match record [ 115 . 16 ] advanced to it will be stored, in case other match records [ 115 . 16 ] are reported to the node [ 115 . 7 ] later in the procedure [ 115 . 10 ] and it then becomes active. The procedure [ 115 . 10 ] next moves all nodes [ 115 . 7 ] that became active into a queue.
- the operator nodes [ 115 . 7 ] corresponding to the outermost query expressions [ 115 . 2 ] are linked to a final reporting node [ 115 . 8 ].
- the final node [ 115 . 8 ] does not become active as operator nodes [ 115 . 7 ] do, rather when a match record [ 115 . 16 ] is advanced to the final node [ 115 . 8 ] it is reported as a query [ 115 . 1 ] match.
- an embodiment may detect shared sub-expressions [ 115 . 2 ] and connect the sub-network [ 115 . 5 ] generated for the sub-expression [ 115 . 2 ] to the operator nodes [ 115 . 7 ] generated for all immediately containing expressions [ 115 . 2 ]. Any nodes [ 115 . 7 ] that became active in the prior step are placed in the queue [ 119 . 90 ].
- any completed query matches [ 115 . 16 ] will have been advanced to the final node [ 115 . 8 ] and reported.
- the procedure [ 115 . 10 ] resets the matching network [ 115 . 5 ] state by removing all match records [ 115 . 16 ] from operator nodes [ 115 . 7 ] [ 119 . 100 ]. If this is a continuous query [ 115 . 1 ] [ 119 . 85 ] then we are finished. Otherwise, if there are no more atoms [ 115 . 19 ] [ 119 . 65 ], we are finished. Otherwise we advance to the next atom [ 115 . 19 ].
- the procedure [ 115 . 10 ] only need check that there are unexhausted indices and it will set the current atom [ 115 . 19 ] to the next atom [ 115 . 19 ] in the next entry of the first index [ 115 . 11 ] it chooses.
- This procedure [ 115 . 10 ] may be simplified by introducing a set of OSF [ 110 ] types for which optimizations have been determined.
- For continuous queries [ 115 . 1 ], one embodiment unifies several or all of the test conditions so that they can all be checked with only one unification operation. For example, several tests feeding into an "and" operator [ 115 . 30 ] may be combined this way. This lets the unification code find the most efficient strategy for the two values.
- the type system may be used within the unification implementation to dispatch to possible optimizations.
- the following procedure describes how the hypergraph [ 114 ] system is used to enable building of discussions [ 136 ] in a continuous usage mode. It should be noted that the hypergraph [ 114 ] system described here is a general purpose tool; it is used to continuously compute a large variety of complex structures in addition to discussions [ 136 ]. The embodiments referenced here reference discussions [ 136 ] but the use of the system is not limited to this one procedure.
- a large set of queries [ 115 . 1 ] are set up to be run continuously. These queries [ 115 . 1 ] are intended to trigger computations in response to new atoms [ 115 . 19 ] added to the hypergraph [ 114 ].
- the system contains a set of dispatch rules that determine which incremental computation (if any) should be run in response to query matches [ 115 . 16 ]. These rules trigger computations intended to synthesize new pieces of evidence as well as the procedure(s) used to create or update discussions [ 136 ].
- queries [ 115 . 1 ] produce OSF [ 110 ] values as part of the matching procedure [ 115 . 10 ]. These queries [ 115 . 1 ] also contain constraints [ 115 . 9 ] expressed as unification equations that are used to build up the resulting match record [ 115 . 16 ] values. These queries [ 115 . 1 ] do not have to look for particular values, and would not be very useful for incremental computation if they did. In the OSF [ 110 ] embodiment queries [ 115 . 1 ] can easily be set up to detect broad classes of evidence. This is because OSF [ 110 ] feature structures can contain empty, typed values as well as choice and range values that can unify against a set of possible values.
- This OSF [ 110 ] embodiment will also use constraints [ 115 . 9 ] to represent the dispatch rules triggering incremental computations. As elsewhere, these constraints [ 115 . 9 ] can be used to construct a new OSF [ 110 ] value based on the value that they unify against. In this context that functionality can be used to pass instructions on to the routine carrying out the incremental computation.
- One way of describing this system is as a feed-forward inference engine, and it is subject to the same sorts of optimizations as those systems.
- An example of the general flow of processing is as follows.
- An email is added to the hypergraph [ 114 ].
- Evidence rules are triggered which perform computations such as resolving aliases of the sender and recipients of the email to actors, possibly creating new actors [ 220 ] as a side effect.
- Various other relationships might be found for content in the email and so on.
- new query matches [ 115 . 16 ] may be produced.
- a query [ 115 . 1 ] which checks to see if the actors [ 220 ] associated to the email are consistent with actors [ 220 ] in one or more discussions [ 136 ] or emails (via the closure query operator [ 115 . 30 ]) triggers a computation to see if the email should be added to one or more of those discussions [ 136 ] or cause the formation of a new one.
- Embodiments have a great deal of flexibility in how to handle this flow. For example an embodiment may run match records [ 115 . 16 ] or incrementally triggered computations in batches to share work in computing updates. The order in which computations are triggered may be important as well. Embodiments may use a queueing methodology, embed conditions in the dispatch rules or use other mechanisms to establish orderings. Any such approaches are heavily dependent on the rules used, the set of computations that are available and details of how they work.
- the structure of the procedure is fairly straightforward.
- When a match record [ 115 . 16 ] is reported [ 120 . 05 ] it is compared to a list of dispatch rules. If it matches an evidence rule (e.g. one used to calculate some intermediate result), an evidence calculation procedure is triggered [ 120 . 10 ].
- the procedure may run immediately at this point or be scheduled to run later or in a batch. At the point the procedure runs, it may either enumerate the current evidences that are affected by the new data, or simply produce a new evidence, and in the next step the overlapping or otherwise affected evidences are determined [ 120 . 15 ]. Once this set is determined, a set of updates to the hypergraph [ 114 ] has to be decided.
- the sequence is somewhat more involved.
- This expansion of the working set [ 115 . 17 ] may also create new transitory edges [ 115 . 20 ] reflecting heuristics and other ad-hoc methods for specifying or limiting relationships.
- the discussion [ 136 ] is built by taking a projection from this evidence hypergraph [ 114 ].
- Projection in this sense is defined similarly to the hypergraph[ 114 ] projection operation [ 115 . 28 ].
- the working set [ 115 . 17 ] is transformed into a set of pairwise edges [ 115 . 20 ] and the projection is found by running a maximum spanning tree algorithm.
- Alternatively, the algorithm is modified to run on a hypergraph [ 114 ] directly.
- the algorithm is further augmented by conditions that disallow links to be used by the algorithm. The general result of this is that the algorithm will act as if the hypergraph [ 114 ] is split into components and spanning trees will be computed for each of the components.
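The constrained spanning-tree projection can be sketched with a Kruskal-style maximum spanning forest that skips disallowed links, so the hypergraph effectively splits into components with a tree computed per component. The pairwise-edge form and all names are illustrative assumptions.

```python
# Sketch of the projection step: a maximum spanning tree over pairwise
# weighted edges, with disallowed links skipped, yielding a spanning
# forest over the induced components (union-find based).

def max_spanning_forest(vertices, weighted_edges, disallowed=frozenset()):
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    chosen = []
    for w, u, v in sorted(weighted_edges, reverse=True):  # heaviest first
        if (u, v) in disallowed or (v, u) in disallowed:
            continue                        # condition disallows this link
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

edges = [(3, "a", "b"), (2, "b", "c"), (1, "a", "c"), (5, "c", "d")]
forest = max_spanning_forest("abcd", edges, disallowed={("c", "d")})
```

Because the heaviest link (c, d) is disallowed, "d" ends up in its own component and the result is a forest rather than a single tree.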
- the set of current affected discussions [ 136 ] is enumerated [ 120 . 35 ].
- the new discussions [ 136 ] are added to the hypergraph[ 114 ] [ 120 . 40 ].
- discussions [ 136 ] are represented as an edge [ 115 . 20 ] and a partial order on the items and other constituent structures in the discussion [ 136 ].
- the edge [ 115 . 20 ] is unordered and any process viewing the discussion [ 136 ] will decide how to order and display the members of the discussion [ 136 ] based on retrieving associated edges [ 115 . 20 ].
- an embodiment will simply define a discussion [ 136 ] as a box [ 115 . 15 ].
- a process can then construct a view of the discussion [ 136 ] or discussions [ 136 ] related to an item by running a query [ 115 . 1 ] constrained by the box [ 115 . 15 ], or simply return all the members of the box [ 115 . 15 ].
- the system assumes that the discussions [ 136 ] are reported [ 120 . 45 ] to an external process or store or other similar mechanism.
- the address of a discussion [ 136 ] edge [ 115 . 20 ] is reported. In this manner the client can be notified when there are changes to a discussion [ 136 ] or a new discussion [ 136 ] has been added.
- the client can directly retrieve the edge [ 115 . 20 ] and traverse to related atoms [ 115 . 19 ].
- atoms [ 115 . 19 ] describe a subset of relevant characteristics of source actors [ 220 ], items and events.
- the source data for these entities will be archived separately and atoms [ 115 . 19 ] will contain location data for the source data corresponding to each atom [ 115 . 19 ].
- an OSF [ 110 ] based embodiment will be assumed. Embodiments that use other representations will need to be able to at least minimally represent the characteristics described below.
- the core concept is that the structure of atoms [ 115 . 19 ] be defined by a set of facet types [ 121 . 40 ].
- the intent is that the structure of an atom [ 115 . 19 ] can be determined by unifying a list of facet types. The implication of this is that facet types do not define individual values contained in atoms [ 115 . 19 ], rather they are a slice of the final atom [ 115 . 19 ].
- the set of facet types associated with an atom [ 115 . 19 ] provide an efficient mechanism for dispatching to incremental computation procedures. The set of types is intended to have the effect that most dispatch rules need only test for one or more of these types in order to determine what to trigger for an atom [ 115 . 19 ].
- FIG. 121 provides a highly schematic example of how atoms [ 115 . 19 ] are associated with facet types in one embodiment.
- a type EmailItem [ 121 . 45 ] is defined as the unification of the types ItemIdentity [ 121 . 05 ], ActorBroadcast [ 121 . 20 ], ArchivedContent [ 121 . 25 ] and EdgeAtom [ 121 . 35 ].
- EdgeAtom [ 121 . 35 ] defines the set of characteristics necessary for an edge [ 115 . 20 ] atom, i.e. an address, a logical ID, a list of incident atoms [ 115 . 19 ] and so on.
- ItemIdentity [ 121 . 05 ] contains characteristics that identify the source item.
- ActorBroadcast [ 121 . 20 ] is used for messages that are broadcast from sender to a list of recipients.
- ArchivedContent [ 121 . 25 ] defines fields that specify where the email's content can be found.
- a type IMConversation [ 121 . 50 ] represents a set of IM messages that have been determined to be part of one conversation. Therefore it is associated with a collection of items and uses the CollectionIdentity [ 121 . 10 ] facet.
- the set of actors [ 220 ] associated with a conversation (in some IM systems conference rooms can be set up which allow for more than 2 participants) is an unstructured list, therefore the ActorPool [ 121 . 15 ] facet is used.
- the content of the conversation is structured into a hierarchy of exchanges, turns and so on which is represented via the HierarchicalContent [ 121 . 30 ] facet.
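The facet-type mechanism described above can be illustrated with a minimal sketch. Only the facet names follow FIG. 121; the feature names and "type" strings below are assumptions invented for the example, not the patent's actual fields:

```python
def unify_facets(*facet_types):
    """Compose an atom's structure by unifying facet-type 'slices'.
    Each facet type is a dict of feature name -> type name; unification
    fails if two facets assign conflicting types to the same feature."""
    result = {}
    for facet in facet_types:
        for feature, ftype in facet.items():
            if feature in result and result[feature] != ftype:
                raise ValueError(f"conflict on feature {feature!r}")
            result[feature] = ftype
    return result

# Facet slices (feature names are illustrative assumptions).
EdgeAtom        = {"address": "Address", "logical_id": "ID", "incident": "AtomList"}
ItemIdentity    = {"source_id": "ID", "source_system": "String"}
ActorBroadcast  = {"sender": "Actor", "recipients": "ActorList"}
ArchivedContent = {"archive_location": "URI"}

# EmailItem is the unification of four facet slices, as in FIG. 121.
EmailItem = unify_facets(ItemIdentity, ActorBroadcast, ArchivedContent, EdgeAtom)

def facets_of(atom_types, *required):
    """Dispatch test: does this atom carry all the required facet types?
    Most dispatch rules need only such a membership test to decide what
    incremental computation to trigger for an atom."""
    return set(required) <= set(atom_types)
```

Note that the facets are slices of the final atom, not value assignments: unification succeeds as long as no two slices disagree about a shared feature.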
- the Cataphora system is deployed in a world with an extremely broad variety of real world data sources and use cases.
- many companies and other environments will have unique types of items.
- consider an email system such as Lotus Notes, where templates for forms and other custom message types can be added to the system.
- messages that can be handled as conventional email are just a subset of the traffic carried by the internal, potentially highly customized Lotus Notes system.
- the Cataphora system will have a set of complex computational procedures that will be difficult to implement and change. For all practical purposes these are a fixed set, though they can be changed and updated over time. In order to bridge the gap, these procedures are built to recognize facets of items.
- the system does not have any intrinsic notion of email, but rather what types of relationships are relevant to a message that is broadcast to a list of recipients.
- the strategy employed to integrate data sources into most embodiments of the Cataphora system is to determine what set of facets best represents each item. Items are archived external to the hypergraph store [ 115 . 22 ], and an OSF [ 110 ] record is created conforming to the set of facet types chosen for each item or event or other object to be represented in the system.
- An advantage of the OSF [ 110 ] formalism as opposed to other formalisms for feature structure types is that it does not require total typing, i.e. an OSF [ 110 ] Value can have features and values in addition to those specified by the type of the value.
- the implication for the system is that we associate a set of types to a value rather than making it an instance of a type per se.
- An evidence source [ 108 ] is defined to be any system which produces regular records that can reliably be linked to one or more actors in the relevant universe. Most valid evidence sources [ 108 ] will have a time and date stamp associated with each unique event record that they generate, however some embodiments will support evidence sources [ 108 ] that lack this, and instead use indirect means to infer a time and date (such as the arrival of an event between two other events which do have such a timestamp.) Common examples of evidence sources [ 108 ] include but are certainly in no way limited to: transactional systems, scheduling systems, HR systems, accounting systems, intelligence monitoring systems, and systems which crawl the web looking for comments in areas of specific interest.
- Each new evidence source [ 108 ] represents at least one new dimension in the model; more precisely, each distinct type of event does. For example, transactions which are cancelled in a trading system are likely to be considered a different vector than transactions which exceed some externally specified level of risk. In many embodiments therefore, each broad class of event will be considered a vector in a minimal integration, though of course whoever is performing the integration can decide what vectors make sense for their particular purposes if they wish. Most embodiments will also allow non-orthogonal vectors to be expressed because so doing will often add significant value.
- a marked overall increase in the dimension of emotive tone on the part of an actor may be considered noteworthy in many instances of the system's deployment; that such increase is largely the result of a particular topic or is in relation to another specific actor is also often well worth knowing in such cases. While this of course can be set up manually, many embodiments will also automatically perform such combinations whenever it is merited by a statistically uneven distribution of data such as in the example just above. Some such embodiments will generalize and expand the dimensionality of the model based on such empirical observation. Some embodiments may opt to maintain a “virgin” model that factors in no user feedback as a security control.
- ad hoc workflows are processes [ 128 ] that differ from formal processes [ 128 ] in at least one of two ways: they are not documented (or not documented thoroughly) as formal workflow processes [ 128 ], and they are not subjected to such strict enforcement as formal workflow processes [ 128 ], i.e. they can tolerate various kinds of deviations from the expected model, such as missing stages [ 154 ], additional stages [ 154 ], unusual number of iterations, etc.
- the present invention expands on this definition to continuously detect the presence of significant workflow processes [ 128 ] and to assess how regularly these workflows are performed by various actors [ 220 ], which in turn allows detection of anomalies [ 270 ] in assessed behavior [ 205 ] in that regard.
- ad-hoc workflow models are detected during the analysis of the input event stream by the continuous workflow analysis component [ 465 ].
- a new workflow model is built whenever a statistically significant volume of event sequences [ 166 ] are identified which conform to a particular pattern, including but not limited to the following types of patterns:
- the workflow model is built as a higher-order Markov chain whose states are composed of the individual event [ 100 ] and item [ 122 ] patterns, including but not limited to:
- a probabilistic suffix automaton is used instead of a higher-order Markov chain.
- the model is trained on a training set defined for example as a group's baseline window of events.
- re-training occurs at regular intervals defined in the system configuration (which, depending on the business domain considered, may be in the order of weeks or months).
- a load-balancing component adapts the training frequency so as to find an adequate trade-off between maintaining a recent model and not overloading the machines hosting the various components of the processing and analysis layer [ 402 ].
- the ad-hoc workflow model thus built is leveraged by the present disclosure in two different ways: anomaly visualization and outlier detection.
- the ad-hoc workflow model is visualized so as to very efficiently spot any abnormal information flow or steps in any process, particularly a critical one.
- anomalies include but are not limited to:
- the model is used to detect outlier workflow instances [ 134 ]: these are instances [ 134 ] that match the informal process [ 128 ] definition (because they exhibit the properties previously described in terms of document types, actors [ 220 ] and groups [ 225 ], topics [ 144 ] and pragmatic tags [ 172 ], etc.) but have a very low probability of having been generated by the model.
- this outlier detection mechanism is straightforward as the generation probability is given by the model.
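As a sketch of the mechanism described above: a higher-order Markov chain is trained on event-pattern sequences, and a workflow instance [ 134 ] whose generation probability under the model is very low is an outlier candidate. The event symbols, model order, and additive smoothing below are illustrative assumptions, not the patent's configuration:

```python
import math
from collections import defaultdict

class HigherOrderMarkovModel:
    """A minimal order-m Markov chain over event-pattern symbols, used to
    score how probable a workflow instance is under the learned model."""

    def __init__(self, order=2, smoothing=1e-6):
        self.order = order
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, sequences):
        """Count order-m transitions over the training set, e.g. a group's
        baseline window of event sequences."""
        pad = ("<s>",) * self.order
        for seq in sequences:
            states = pad + tuple(seq)
            for i in range(self.order, len(states)):
                context = states[i - self.order:i]
                self.counts[context][states[i]] += 1
                self.vocab.add(states[i])

    def log_prob(self, seq):
        """Log-probability of the model generating seq; additive smoothing
        gives unseen transitions a small nonzero probability. Very low
        values flag outlier instances."""
        pad = ("<s>",) * self.order
        states = pad + tuple(seq)
        v = max(len(self.vocab), 1)
        lp = 0.0
        for i in range(self.order, len(states)):
            context = states[i - self.order:i]
            ctx = self.counts[context]
            total = sum(ctx.values())
            lp += math.log((ctx[states[i]] + self.smoothing) /
                           (total + self.smoothing * v))
        return lp
```

A probabilistic suffix automaton, as in the alternative embodiment, would replace the fixed-order context with a variable-length one but would be scored in the same way.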
- one embodiment of the present disclosure uses the following similarity measures for detecting outliers:
- outliers are defined by assessing the normalcy of an input workflow instance [ 134 ] with respect to a model derived from the observation of instances [ 134 ] as a training set.
- this training set or referential can be defined in a number of ways, including but not limited to the following important definitions:
- the system distinguishes emotional expression (subjective component) from appraisal (objective component). Identifying emotional expression depends on a variety of indicators: lexical choice (specific words and phrases, including the interjections and the "responsive cries" studied in [Goffman 1981]), person distinctions (first-person forms ("I", "me", "my", "mine", "we", etc.), where involvement is especially important), tense distinctions (favoring present tense), syntactic constructions (such as exclamative structures like "What a beautiful day it is!"), and modification. Different factors play different roles for different emotions.
- the emotive tone analysis component [ 435 ] recognizes a set of basic emotions and cognitive states. These include (among others) anger, surprise, fear, confusion, frustration. These overlap with, but are not coextensive with the set of basic emotions identified and studied by Paul Ekman that are typically communicated not linguistically, but by physiognomic gestures (see [Ekman 2003]).
- the range of any emotion can be modeled by the emotive tone analysis component [ 435 ] as an open interval.
- Basic individual expressions (“livid”, “angry”, “upset”, etc.) can be associated with a basic sub-interval of such an interval. Modifiers can be interpreted as affecting this range.
- the basic observation is that initial modification has a greater effect than subsequent modification. For example, the difference between A and B below is greater than the difference between C and D:
- the range of any emotion can be modeled by the emotive tone analysis component [ 435 ] as a partial ordering.
- the utility of this model is that in a partial ordering, different modes of increasing intensity may not be directly comparable.
- FIG. 38 shows a simple Hasse diagram (where points higher on the diagram indicate greater intensity) illustrating two modes of increasing intensity: capitalization and modification by intensifiers (here, "very"). These two modes are compatible (their combination forms an upper bound for both in the diagram), but the two modes are not easy to rank subjectively in relative intensity.
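The partial-order model can be sketched as a small reachability structure over a Hasse diagram like that of FIG. 38. The expression strings are illustrative; the point is that the two modes of intensification are each above the base form yet mutually incomparable:

```python
class IntensityOrder:
    """Partial ordering of emotive expressions, encoded as a Hasse
    diagram: add_cover(lower, upper) records that upper is directly
    more intense than lower; leq() tests reachability upward."""

    def __init__(self):
        self.above = {}  # expression -> set of expressions directly above

    def add_cover(self, lower, upper):
        self.above.setdefault(lower, set()).add(upper)

    def leq(self, a, b):
        """True iff a <= b in the partial order."""
        if a == b:
            return True
        stack, seen = [a], set()
        while stack:
            x = stack.pop()
            for y in self.above.get(x, ()):
                if y == b:
                    return True
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    def comparable(self, a, b):
        """Incomparable pairs (None of a<=b, b<=a) are exactly the point
        of using a partial rather than total ordering."""
        return self.leq(a, b) or self.leq(b, a)

order = IntensityOrder()
order.add_cover("angry", "very angry")       # intensifier mode
order.add_cover("angry", "ANGRY")            # capitalization mode
order.add_cover("very angry", "VERY ANGRY")  # combined modes: upper bound
order.add_cover("ANGRY", "VERY ANGRY")
```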
- emotional intensity varies across individual actors [ 220 ], across pairs of actors [ 220 ], across events [ 100 ], across social groups [ 225 ], etc.
- Measuring intensity makes it possible to filter and analyze emotionally expressive communication in various ways.
- the focus of interest may be the statistical outliers in emotional expression, on the assumption that these occurrences are most likely to correlate with other events of interest.
- the analysis of emotional expression performed by the emotive tone analysis component [ 435 ] is compatible without restriction with a wide variety of other analytic methods: topical [ 144 ] filtering, actor-based [ 220 ] or domain-based filtering, temporal filtering (by interval, by time of day, etc.).
- the emotive tone analysis component [ 435 ] can identify and analyze emotional expression either statically, i.e. on a fixed dataset, when the system is running in batch mode [ 370 ] or dynamically, i.e. on a data stream, when the system is running in continuous mode [ 375 ].
- in continuous mode the analysis of emotional expression can be carried out not only retrospectively (as in the case of a fixed static dataset), but also prospectively, so that future emotionally-involved events may be anticipated.
- the system described in the present invention includes a pragmatic tagging component [ 430 ] that will categorize the communicative and discourse properties of individual electronic communications [ 123 ].
- This pragmatic tagging component [ 430 ] is a further development and implementation of the system described in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes, and is designed to support a variety of functions, including but not limited to the following.
- Workflow analysis: an important feature of electronic communications [ 123 ] is how they relate to workflow processes [ 128 ], i.e. the set of corporate tasks associated with normal business. Salient aspects of such communications [ 123 ] include requests for information or deliverables, negotiations concerning such requests, status updates, delivery of results, acknowledgment of receipt of information or deliverables, etc., together with a range of communicative information related to the social relations of those communicating (such as positive and negative forms of politeness, including thanks, praise, etc.).
- sets of electronic communications [ 123 ] typically have various levels of structure, including a first-order structure involving the individual elements of the set, and higher order structures that link first-order structures together in various ways (see, e.g., U.S. Pat. No. 7,143,091 for details).
- a very simple example is a case in which an email message that ends with a request is linked to a subsequent message acknowledging the request and perhaps fulfilling it.
- Pragmatic tags [ 172 ] offer a different basis for postulating hypothetical links, one which can be used to strengthen or confirm hypotheses based on other sources of information.
- Lexical analytics: the form of words that individuals use in communicating about a workflow process [ 128 ] or topics [ 144 ] of mutual interest often reveals attitudes and presumptions that the communicants convey directly or indirectly; access to these overt or implicit attitudes is often useful in assessing motives with regard to the tasks at hand or to other actions or events.
- Relational analytics: linguistic aspects of email communication also convey mutual understanding of the personal and social relations among the participants, including points at which the strength of these relations is tested in some way and points at which such relations undergo significant changes.
- such a system can be realized as a cascade of transducers, as illustrated in FIG. 17 .
- an electronic document is passed through a linguistic filter [ 1700 ] token by token.
- Individual words and grammatical constructions (such as inverted sentences of the kind that occur in English yes-no questions or wh-questions) are detected.
- Each detected instance is replaced by an intermediate tag [ 1710 ]; non-matching tokens [ 116 ] are disregarded (though in some implementations it may be useful to count them or measure the distance from the onset of a message to the first recognized expression or the distance between recognized expressions).
- the resulting set of intermediate tags is then analyzed further, with the intermediate tags possibly replaced by a final set of projected tags [ 1720 ].
- Transducers of this kind have the property that they preserve the relative order of the input expressions: that is, if tag i follows tag k at a given level, then the evidence for tag i follows the evidence for tag k at the previous level. This is not a trivial property. If the word “Thanks” appears initially in a communication [ 123 ], it serves as an acknowledgment or appreciation of a previous action on the part of the addressee of the communication [ 123 ]. If it appears following a request (as in: “Please get back to me with the requested information by 5:00 this afternoon. Thanks.”), it serves to acknowledge a presumptive acceptance of the request it immediately follows. If the relative order of information is not preserved, this distinction is lost. And in the absence of correct order information, the effectiveness of the tagging in linking different messages together correctly would be degraded.
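A minimal two-stage cascade illustrating the order-preserving property might look like the following. The tag inventory and lexicon are invented for this example and are far smaller than any real implementation's; the "Thanks" behavior mirrors the case just described:

```python
import re

# Stage 1: a linguistic filter maps matching tokens to intermediate tags,
# preserving their relative order; non-matching tokens are discarded.
# This lexicon is a tiny illustrative fragment, not the actual one.
INTERMEDIATE_LEXICON = [
    (re.compile(r"^please$", re.I), "POLITE_MARKER"),
    (re.compile(r"^thanks?$", re.I), "THANKS"),
    (re.compile(r"^(get|send|give)", re.I), "ACTION_VERB"),
]

def stage1(tokens):
    tags = []
    for tok in tokens:
        for pattern, tag in INTERMEDIATE_LEXICON:
            if pattern.match(tok):
                tags.append(tag)
                break  # non-matching tokens are simply dropped
    return tags

def stage2(intermediate):
    """Project intermediate tags onto final pragmatic tags. Because stage 1
    preserved order, a THANKS that follows a request reads as acknowledging
    presumptive acceptance, while an initial THANKS is an appreciation of a
    previous action; without order information this distinction is lost."""
    final, request_seen = [], False
    for tag in intermediate:
        if tag == "POLITE_MARKER":
            request_seen = True
            final.append("REQUEST")
        elif tag == "THANKS":
            final.append("PRESUMPTIVE_ACCEPTANCE" if request_seen
                         else "APPRECIATION")
        # ACTION_VERB contributes evidence but no final tag in this sketch
    return final
```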
- the first can be used as a command or request.
- the second two cannot.
- the second can be used to request information.
- the first and third cannot.
- the third can be used to make an assertion.
- the first and second cannot. But in a variety of more complex cases, the relation between syntactic form and pragmatic interpretation is less straightforward.
- Each of these sentences can be used to inform the addressee or make the addressee aware that it's raining. But like the previous cases, they have different presumptions governing their appropriateness in context and they have different effects on these contexts. For example, in some cases, it is regarded as perhaps more tactful to be more indirect—but this is a way of saying that the possible risk to the social relations between speaker and addressee is playing a more prominent role.
- the first-level transducer is sensitive both to basic properties of syntactic form and to a set of sentence-initial operators:
- Each of these operators is associated with a pair of pragmatic tags [ 172 ]: one associates a basic speech act type with the sentence as a whole; the other provides scalar information associated with the pragmatic context—particularly the way the pragmatic interpretation of the sentence bears on the social relations between speaker and addressee.
- a third reason to have a cascade of transducers involves context-dependent information.
- the study of indirect speech acts has brought out the ambiguity or vagueness of such sentences as “Can you jump over that wall”, and has emphasized the social utility that indirectness involves. Yet context often reveals how the utterance of such a sentence is interpreted by the conversational participants. If the answer to “Can you jump over that wall” is “ok”, the answerer interprets the sentence as a request for a jump. If the answer is “I believe so”, it seems more likely in this case that the speaker interprets the sentence as a request for information, not action. It is simpler to resolve the vagueness or ambiguity of this question at an intermediate level than at the lowest level.
- a fourth reason to have a cascade of transducers, related to the third, involves the fact that information in dialogue is typically distributed across different dialogue acts (or “dialogue turns”).
- a central example of this involves questions and their answers. From a question like “Did you receive the documents?”, one cannot infer that the addressee either received the documents in question or did not receive the documents in question. Suppose the answer is “Yes, I did”. From this elliptical sentence, one can also not in general infer that the speaker either received the documents or did not receive the documents. But suppose the utterances of these two sentences are appropriately coupled in dialogue (or similar electronic communication [ 123 ]), as below:
- This structured dialogue supports the inference that B received the documents in question (assuming such things as that B is trustworthy).
- This inference process is enhanced by one or more intermediate levels of representation. To consider the nature of these levels of representation, it is useful to examine in more detail the nature of questions and their answers.
- Our system employs a novel framework for the analysis of question/answer dialogue structures. This system analyzes the overall problem into a set of cases, with each case being associated with a variety of linguistic forms (on the one hand) and with a set of inference-supporting properties (on the other). (For brevity, we focus here on direct yes/no questions.)
- Case 2 The answer resolves a more general question or a more particular question.
- with each relevant pair and each posited relation holding between them we may associate a shared agenda: a mutually recognized set of tasks or obligations or benefits that they are mutually committed to; in addition, each member A of a relevant pair {A, B} may be associated with a private agenda (not shared and not necessarily mutually committed to) which directs, governs, or constrains actions and responses by A toward B.
- This abstract workflow model corresponds to a simple finite state machine, illustrated in FIG. 18 .
- the required flow through this machine is from an initial state representing a new task [ 1800 ] through an intermediate required state representing the addition of a task to the agenda [ 1810 ] to a final state representing the removal of the task from the agenda [ 1830 ].
- All the subsidiary questions of the kind just discussed above correspond to loops around task request [ 1810 ], task acceptance [ 1820 ], or task completion [ 1830 ].
- the loops around task request [ 1810 ] correspond to negotiations concerning the nature of the task.
- the loops around task acceptance [ 1820 ] correspond to communications [ 123 ] among the participants during the period that the task is mutually recognized but not discharged.
- Loops around task completion [ 1830 ] correspond to acknowledgments and thanks and other later assessments.
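The finite state machine of FIG. 18, with its three kinds of loops, can be sketched as a transition table. The event labels are illustrative placeholders, not the patent's vocabulary:

```python
# States follow FIG. 18: a new task [1800] moves through task request
# [1810] and task acceptance [1820] to task completion [1830], with
# self-loops for negotiation, status updates, and acknowledgments.
TRANSITIONS = {
    ("new_task", "request"): "task_request",            # task enters agenda
    ("task_request", "negotiate"): "task_request",      # loop: negotiation
    ("task_request", "accept"): "task_acceptance",
    ("task_acceptance", "status_update"): "task_acceptance",  # loop: updates
    ("task_acceptance", "deliver"): "task_completion",  # task discharged
    ("task_completion", "thank"): "task_completion",    # loop: thanks etc.
}

def run_workflow(events, start="new_task"):
    """Run an event sequence through the FSM; return the final state, or
    None if some event has no transition from the current state."""
    state = start
    for event in events:
        state = TRANSITIONS.get((state, event))
        if state is None:
            return None
    return state
```

Extending the arcs to carry pairs (cause of transition, value to a party) turns this FSA into the finite state transducer mentioned below for mutually dependent agendas.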
- a specific task can fall off the agenda in more than one way, including but not limited to:
- the model of generic tasks may never reach task acceptance [ 1820 ] or may involve linking sub-models representing the agendas of individual members of a pair.
- Mutually dependent situations of this kind may be modeled to a first approximation by finite state transducers (like an FSA, but with arcs labeled by pairs), where the first element of any pair represents the cause of the transition, and the second represents the value to one or the other party.
- This behavioral modeling component [ 445 ] allows the system to compute several measures of influence for a particular actor [ 220 ] by assessing the changes in behavior [ 205 ] of people around that actor [ 220 ] before, during, and after a significant absence of some kind, both in terms of how completely their behavior returns to the norm after that period, and how quickly it does so.
- measuring the individual behavior [ 210 ] and collective behavior [ 215 ] of people around an actor [ 220 ] both before and after a period of absence or lack of interaction with the rest of the network is a very insightful measure of that actor's [ 220 ] influence level, because if the person is important enough then the behavior of the network will snap back to its prior behavior quickly—and completely—once the person has returned. Conversely, if the person's influence was fragile or not shared by the majority of people interacting with her, not only will the behavior in her neighborhood take much longer to go back to normal, but the new norm will also be significantly different from the former baseline behavior [ 260 ], with new connections created to third-parties, more erratic communication patterns, etc.
- a period of absence can be due to vacation, business travel that causes the person to be largely out of touch, dealing with family issues, etc.
- these periods of absence are derived using a number of methods including but not limited to:
- the system computes a divergence on the distribution of per-actor activities between a baseline period P 1 (e.g. sliding window of 1 year) and a period P 2 of inactivity (typically a small number of weeks) for the actor [ 220 ] which will be referred to as the subject in the following.
- the distributions are computed over all actors [ 220 ] having significantly interacted with the subject in one of these periods.
- for both P 1 and P 2 we prune the elements in which the subject is involved (for P 2 this is to get rid of the residual activity, such as a long-running discussion [ 136 ] that was started in the presence of the subject). Pruning is done either by only removing individual events [ 100 ] involving the subject, or by removing all discussions [ 136 ] containing at least one such event [ 100 ].
- any divergence metric can be used.
- the K-L divergence H(P 1 , P 2 ) − H(P 1 ) is used, where H(P 1 ) is the entropy of a distribution and H(P 1 , P 2 ) is the cross-entropy.
- the system uses the variation of information H(P 1 ) + H(P 2 ) − 2I(P 1 , P 2 ), where I(P 1 , P 2 ) is the mutual information between P 1 and P 2 .
- the divergence measured by this method constitutes an assessment of how the absence of the subject impacts her environment and her closest professional or social contacts.
- the activity model used in this method is also entirely configurable. Activities taken into account in the default embodiment cover all types of communication channels [ 156 ], from which the system derives events that are either individual communications [ 123 ] (i.e. random variables taking values such as “A emails B” or “A calls B”) or entire discussions [ 136 ] (i.e. random variables taking a value such as “A emails B,C; B emails A”).
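The divergence computation described above can be sketched as follows, here using the K-L divergence D(P1 || P2) = H(P1, P2) − H(P1) over per-actor activity distributions. The activity labels and the smoothing constant are illustrative assumptions:

```python
import math
from collections import Counter

def _dist(counts):
    """Normalize raw activity counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def kl_divergence(p1_counts, p2_counts, eps=1e-9):
    """K-L divergence D(P1 || P2) between two activity distributions,
    i.e. cross-entropy H(P1, P2) minus entropy H(P1). eps smooths
    activities observed in one period but not the other."""
    p1, p2 = _dist(p1_counts), _dist(p2_counts)
    support = set(p1) | set(p2)
    return sum(p1.get(k, 0.0) * math.log((p1.get(k, eps) or eps) /
                                         (p2.get(k, eps) or eps))
               for k in support)

# Baseline period P1 vs. absence period P2 (events pruned of the subject A
# would normally have been removed first; labels are illustrative).
baseline = Counter({"A emails B": 40, "A calls B": 10, "B emails C": 50})
absence  = Counter({"A emails B": 5, "B emails C": 55, "B emails D": 40})
divergence = kl_divergence(baseline, absence)
```

A large divergence indicates that the subject's absence substantially reshaped the activity of her closest contacts, which is the influence signal described above.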
- actors [ 220 ] such as managers whose sign-off is necessary to approve certain actions and events are excluded from consideration by the system. This is because such actors [ 220 ] will by definition impact the behavior of people reporting to them, without that causal relationship bearing any implication on their actual level of influence.
- Textblocks are defined in U.S. Pat. No. 7,143,091: “Textblocks consist of the maximum contiguous sequence of sentences or sentence fragments which can be attributed to a single author. In certain cases, especially emails, a different author may interpose responses in the midst of a textblock. However, the textblock retains its core identity for as long as it remains recognizable.”
- a method is also given there for detecting textblocks. That method has certain limitations in terms of both recall and memory footprint. The method described here is superior in both respects (see FIG. 19 ). This method is also amenable to continuous computation and hence provides the preferred embodiment for the textblock detection component [ 470 ].
- the idea of this method is to find collections of text fragments in different text-containing items [ 122 ] which are similar enough to infer that they were duplicated from a single original item [ 122 ], and hence have a single author. We shall distinguish between an abstraction of such a collection of fragments, which we shall call a textblock pattern [ 124 ], and a particular fragment of text in a particular item, which we shall call a textblock hit [ 130 ].
- the method begins by building a graph of n-gram [ 118 ] transitions within the universe of items [ 122 ] (see FIG. 20 ), also called a textblock graph [ 160 ].
- to form n-grams [ 118 ], examine each item's [ 122 ] text one token [ 116 ] at a time. Form n-grams [ 118 ] of successive tokens [ 116 ] for some value of n that will remain constant throughout and which is small compared to the size of a typical item [ 122 ]. Successive n-grams [ 118 ] may overlap. For example, in the text “one two three four five”, the 2-grams would be “one, two”, “two, three”, “three, four”, and “four, five”.
- maintain a sliding window of k n-grams [ 118 ]. This window will initially contain the first k n-grams [ 118 ] in the document [ 162 ]. At each step, the window will be moved forward so that the first n-gram [ 118 ] in the window will be removed and the next n-gram [ 118 ] in the document [ 162 ] will be added to the end of the window.
- each time the window moves forward, a new n-gram [ 118 ] is added to it. Add this n-gram [ 118 ] as a vertex to the graph if it was not already there.
- for each other n-gram [ 118 ] in the window, do a look-up for a directed, weighted edge in the graph pointing from that n-gram [ 118 ] to the one that was just added. If such an edge did not exist, add it to the graph and give it a weight of 1. If such an edge did exist, increase its weight by 1. In the example given above, if k = 3, we would create, or increment the weight of, the following edges:
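The construction just described can be sketched directly. For the five-token example with k = 3 it produces exactly five edges: writing gi for the i-th 2-gram, these are g1→g2, g1→g3, g2→g3, g2→g4, and g3→g4 (each with weight 1). The code is an illustrative reconstruction:

```python
from collections import defaultdict

def build_textblock_graph(tokens, n=2, k=3):
    """Build the weighted n-gram transition graph with a sliding window
    of k n-grams, following the steps above (illustrative sketch)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    edges = defaultdict(int)   # (from_ngram, to_ngram) -> weight
    window = []
    for gram in ngrams:
        if len(window) == k:
            window.pop(0)              # slide: drop the oldest n-gram
        for prev in window:
            edges[(prev, gram)] += 1   # edge from each remaining window
        window.append(gram)            # member to the newly added n-gram
    return edges

graph = build_textblock_graph("one two three four five".split())
```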
- Textblock patterns [ 124 ] consist of tokens [ 116 ] that are used in roughly the same sequence over and over again, with some modification. This means that their n-grams [ 118 ] will be used in roughly the same order. To the extent that these n-grams [ 118 ] are not commonly used in other contexts, we expect to see a particular pattern in the local environment of each such n-gram [ 118 ] within the graph.
- let N be the neighborhood of a vertex p: N contains all vertices which are connected to p by edges, as well as p itself.
- let M be the maximum number of directed edges possible with both head and tail in N, excluding loops; that is, M = |N|(|N| − 1).
- let w(q, r) be defined as:
- LC(p) provides a measure of how evenly interconnected the neighborhood of p is, and as such is key to this algorithm for detecting textblocks.
- N contains all the neighbors of P plus P itself, so |N| = 2k − 1 (P has k − 1 neighbors on each side).
- the edges that are within N are precisely those created by the k positions of the sliding window such that the window contains P.
- consider the k window positions as P goes from being the last n-gram [ 118 ] in the window to being the first.
- this window contains k(k − 1)/2 edges.
- a new n-gram [ 118 ] is added and is connected to every n-gram [ 118 ] before it in the window by one edge, so this adds k − 1 more edges (all other edges in this window were already present in the previous window).
- the window moves forward k − 1 times (once for each of the k positions under consideration except the initial position), so the total number of edges in N is k(k − 1)/2 + (k − 1)².
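The derivation above can be checked by simulation: over a sequence of unique n-grams (so every n-gram is a distinct vertex), a vertex away from the sequence boundaries should have |N| = 2k − 1 and k(k − 1)/2 + (k − 1)² edges within N. This sketch is illustrative, not the patent's code:

```python
def ngram_window_edges(num_grams, k):
    """Simulate sliding-window edge creation over a sequence of unique
    n-grams, represented simply by their positions 0..num_grams-1."""
    edges = set()
    window = []
    for g in range(num_grams):
        if len(window) == k:
            window.pop(0)              # slide: drop the oldest n-gram
        for prev in window:
            edges.add((prev, g))       # edge to the newly added n-gram
        window.append(g)
    return edges

def neighborhood_edge_count(edges, p):
    """Return (|N|, number of edges with both endpoints in N), where N is
    p plus every vertex connected to p by an edge in either direction."""
    nbrs = {p} | {a for a, b in edges if b == p} | {b for a, b in edges if a == p}
    inner = [(a, b) for a, b in edges if a in nbrs and b in nbrs]
    return len(nbrs), len(inner)

k = 4
edges = ngram_window_edges(30, k)
size_N, edges_N = neighborhood_edge_count(edges, p=15)
```

For k = 4 the prediction is |N| = 7 and 6 + 9 = 15 edges, which the simulation confirms for any interior vertex.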
- the sequence of tokens [ 116 ] we have considered is an idealized type of textblock: exactly the same sequence of tokens, which form n-grams [ 118 ] that are not used anyplace else.
- additional tokens [ 116 ] are added or removed, and/or tokens [ 116 ] are re-ordered.
- the graph connecting tokens [ 116 ] in the textblock will look similar to what we have considered, but the edges will not all have exactly the same weight, and there may be additional edges with low weights. This will affect local clusterability by an amount roughly proportional to how large such changes are.
- an n-gram [ 118 ] in a textblock may appear in more than one context within the universe of items [ 122 ]. If it only appears once within the textblock pattern [ 124 ] and appears in that pattern much more often than in other contexts, its local clusterability will be close to the expected value calculated above. Thus, most n-grams [ 118 ] within the textblock will have LC close to the expected value.
- the most time-consuming part of this algorithm is the local clusterability calculation. Because it may require comparing every pair of items [ 122 ] in a neighborhood N, its running time may be O(|N|²) per neighborhood.
- This method for finding patterns has two phases: a first phase in which the transition graph [ 160 ] is built, and a second phase in which the transition graph [ 160 ] is pruned and connected components are identified.
- the first phase considers only one item [ 122 ] at a time and hence can be performed against a pseudo-infinite stream of text-containing items [ 122 ] in a continuous monitoring context.
- the second phase cannot.
- the graph [ 160 ] is periodically cloned so that a separate process can perform the second phase and update the set of textblock patterns [ 124 ]. While this pruning occurs, the original process can continue to add new incoming items [ 122 ] to the graph [ 160 ].
- textblock patterns [ 124 ] are detected within overlapping time periods.
- textblock patterns [ 124 ] might be detected within bins of two months each, with each bin overlapping the bin after it by one month.
- patterns [ 124 ] are kept for some pre-determined amount of time and then discarded.
- patterns are discarded after some amount of time following when the most recent hit for that pattern was detected.
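- As a sketch of this overlapping-bin scheme (with the two-month/one-month example above, a new bin starts every month), the bins containing a given month can be enumerated as follows; the month indexing is illustrative:

```python
def bin_starts_containing(month_index, bin_len=2, step=1):
    """Start indices of every sliding time bin (bin_len months long, a new
    bin starting every `step` months) that contains the given month."""
    return [s for s in range(month_index - bin_len + 1, month_index + 1)
            if s >= 0 and s % step == 0]

# with two-month bins overlapping by one month, month 5 falls in the
# bins starting at months 4 and 5
assert bin_starts_containing(5) == [4, 5]
```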
- if the edges found so far constitute a match for each value in CountMap, then clear CountMap.
- a hit [ 130 ] found in this manner will not constitute an entire textblock under the definition given above, but rather will constitute a portion of a textblock.
- textblocks may be broken into pieces by n-grams [ 118 ] that are used in many contexts, or by the presence of even smaller segments which are also repeated by themselves. Also, at the beginning and end of the textblock n-grams [ 118 ] may not have the clusterability predicted here. But once we have found a set of items [ 122 ] containing hits [ 130 ] of a common textblock pattern [ 124 ], we can expand the hits [ 130 ] using standard methods for inexact string matching which would not be feasible on the entire universe. These methods may include dynamic programming or suffix tree construction.
- the textblock pattern [ 124 ] detection portion of this method will run poorly if the transition graph [ 160 ] is so large that it cannot be held in constant-access memory. In some embodiments, only a subset of transitions will be recorded as edges so as to reduce the total size of the graph [ 160 ]. In one embodiment a list of functional words appropriate to the language of the text is used (see FIG. 26 ). In English, for example, prepositions, articles, and pronouns might be used. Only n-grams [ 118 ] immediately following such functional words are placed in the sliding window.
- the full list of n-grams [ 118 ] is produced, but is then reduced to a smaller list using a winnowing method similar to that described in [Schleimer 2003] (see FIG. 27 ).
- N-grams [ 118 ] are hashed and the hashes are placed in a sliding window. The smallest hash at any given time will be noted, and the n-gram [ 118 ] it came from will be placed in the derived list. If, for a given window position, the n-gram [ 118 ] with the smallest hash is the same as it was in the last window position, then it is not added again. From the derived list, transitions will be recorded as edges.
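- A minimal sketch of this winnowing step (the window size and the use of CRC32 as the hash are illustrative choices, not taken from [Schleimer 2003]):

```python
import zlib

def winnow(ngrams, window=4):
    """Keep, for each window position, the n-gram with the smallest hash;
    if the minimum is unchanged from the previous position, do not re-add it."""
    hashes = [zlib.crc32(g.encode()) for g in ngrams]
    selected, prev_idx = [], None
    for start in range(len(hashes) - window + 1):
        # index of the rightmost minimal hash in the current window
        idx = min(range(start, start + window),
                  key=lambda i: (hashes[i], -i))
        if idx != prev_idx:
            selected.append(ngrams[idx])
            prev_idx = idx
    return selected
```

From the derived (much shorter) list, transitions are then recorded as edges as before.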
- a modification of this algorithm will perform well even when the transition graph [ 160 ] is too large to fit in random-access memory (see FIG. 23 ). Proceed as before, but place a maximum size on how large the transition graph [ 160 ] is allowed to grow. In one embodiment, when the graph [ 160 ] reaches this size, the entire graph [ 160 ] is written to disk (or some other slower portion of the memory hierarchy) and the constant-access memory structure is emptied. In another embodiment, vertices in the graph [ 160 ] are held in memory in a Least-Recently Used (LRU) cache. When a vertex is ejected from the cache, edges within its neighborhood are written to disk (or other form of slower memory) and then purged from constant-access memory. In any embodiment, the portion written to disk (or slower memory) is recorded as ordered triples of n-grams [ 118 ] connected by edges. For example, in the text “one two three four five”, using 1-grams we would record:
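- The LRU embodiment can be sketched as follows; the tab-separated spill format is a hypothetical stand-in for the ordered-triple records described above:

```python
import io
from collections import OrderedDict

class VertexCache:
    """LRU cache of transition-graph vertices; when a vertex is ejected,
    its incident edges are appended to slower storage (here, a file-like
    object with a hypothetical tab-separated format)."""
    def __init__(self, capacity, spill):
        self.capacity = capacity
        self.spill = spill
        self.adjacency = OrderedDict()  # n-gram -> {successor: weight}

    def add_edge(self, a, b):
        adj = self.adjacency.setdefault(a, {})
        adj[b] = adj.get(b, 0) + 1
        self.adjacency.move_to_end(a)
        while len(self.adjacency) > self.capacity:
            victim, edges = self.adjacency.popitem(last=False)
            for succ, w in edges.items():
                self.spill.write(f"{victim}\t{succ}\t{w}\n")

spill = io.StringIO()
cache = VertexCache(capacity=2, spill=spill)
for a, b in [("one", "two"), ("two", "three"), ("three", "four")]:
    cache.add_edge(a, b)
# "one" was least recently used, so its edge was spilled to slower storage
assert "one\ttwo\t1" in spill.getvalue()
```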
- Sorting and merging files on disk is a well-studied problem, and can generally be done in running time that is O(n*log(n)), where n is the total length of the files involved. Hence, the entire process so far will run in time O(n*log(n)), where n is the total length of all items [ 122 ] in the universe.
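- The sort-and-merge step can be sketched with bounded sorted runs and a k-way heap merge; a real implementation would write each run to a temporary file on disk rather than keep it in memory:

```python
import heapq

def external_sort(lines, run_size=1000):
    """Sort a stream of edge-triple lines: build sorted runs of bounded
    size, then k-way merge them -- O(n*log(n)) overall."""
    runs, buf = [], []
    for line in lines:
        buf.append(line)
        if len(buf) >= run_size:
            runs.append(sorted(buf))
            buf = []
    if buf:
        runs.append(sorted(buf))
    return list(heapq.merge(*runs))

triples = ["two\tthree\t1", "one\ttwo\t1", "three\tfour\t1"]
assert external_sort(triples, run_size=2) == sorted(triples)
```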
- the system creates and continuously maintains content re-use profiles that characterize the extent to which individual actors [ 220 ] produce a certain category of information, how they modify such information, and how they consume it.
- Valuable information can be defined in any number of ways.
- valuable content is generally defined as the result of filtering all items [ 122 ] collected and processed by ontology classifiers [ 150 ] associated to business-relevant topics [ 144 ].
- valuable content is defined as intellectual property assets such as research artifacts or software code.
- the system relies on continuous textblock detection as described in this invention.
- Two libraries of textblocks are maintained: a global library of textblocks and a library of recently disseminated textblocks.
- each content-relaying event is a (Textblock pattern, Date, Sender, Recipient) tuple.
- One such tuple is created for each (Sender, Recipient) pair associated to the item [ 122 ] in which the textblock hit [ 130 ] was found—by definition there can be one or more such pairs per item [ 122 ].
- the date of a content re-use event is defined as the representative date of the item [ 122 ] in which the textblock hit [ 130 ] occurs.
- the representative date is the sent date of an electronic communication [ 123 ] (such as an email message or an IM turn), the last modification date of an item [ 122 ] collected from a file (or from a database table), etc.
- the system will then compute scores, called create-score, consume-score, and relay-score, for each actor [ 220 ].
- a list of all received textblock patterns [ 124 ] is maintained for each actor [ 220 ]. To do this, it scans all content re-use events in date-ascending order.
- the create-score of the sender is updated.
- this update consists of a unit increment of the create-score.
- it is an increasing function of the textblock pattern [ 124 ] length.
- it is a decreasing function of the number of textblock hits [ 130 ] found for the pattern [ 124 ] throughout the whole dataset.
- the receive-score of the recipient is updated (the update function is similar to the update of the create-score described previously), the textblock pattern [ 124 ] is added to the list of received patterns [ 124 ] for the recipient if it was not already present, and if the textblock pattern [ 124 ] belongs to the list of received patterns [ 124 ] for the sender then the sender's relay-score is updated.
- the update of the relay-score consists of a unit increment. In another embodiment, it is proportional to the ratio of the textblock hit [ 130 ] length over the textblock pattern [ 124 ] length.
- the list of received patterns [ 124 ] is augmented to keep track of the token [ 116 ] range of each received pattern [ 124 ] that has also been relayed, and the update consists of adding the ratio of the textblock pattern [ 124 ] length that was not covered (in other words, this differs from the previous embodiment in that information relayed multiple times is only counted once).
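- Under one possible reading of the unit-increment embodiments above, the date-ordered scan over content re-use events can be sketched as follows (the event tuples are as defined earlier; the scoring details are illustrative):

```python
from collections import defaultdict

def dissemination_scores(events):
    """Scan (pattern, date, sender, recipient) content re-use events in
    date-ascending order, maintaining create-, receive-, and relay-scores
    with unit increments."""
    create = defaultdict(int)
    receive = defaultdict(int)
    relay = defaultdict(int)
    received = defaultdict(set)  # actor -> patterns received so far
    for pattern, date, sender, recipient in sorted(events, key=lambda e: e[1]):
        create[sender] += 1
        receive[recipient] += 1
        if pattern in received[sender]:
            relay[sender] += 1   # sender passes on a pattern she received
        received[recipient].add(pattern)
    return create, receive, relay

events = [("p1", "2010-01-01", "alice", "bob"),
          ("p1", "2010-02-01", "bob", "carol")]
create, receive, relay = dissemination_scores(events)
assert relay["bob"] == 1  # bob relayed a pattern he had received
```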
- the present invention is not restricted to a specific dissemination profile calculation method.
- the scores defined above simply measure the level and type of interaction among actors [ 220 ] by counting items exchanged through communication channels [ 156 ] or loose documents [ 162 ]. In another embodiment, they also take into account how often that content is being viewed, downloaded or copied, and, by contrast, which content is simply ignored.
- the resulting profiles are much more accurate and more difficult to game than simple counts in large enterprise networks, where actors [ 220 ] who tend to send superfluous content in large quantities are often not contributing to the overall productivity of that organization.
- scores are also a function of the actual roles and responsibilities of actors [ 220 ] as derived, for example, from their involvement in discussions [ 136 ] that represent workflow processes [ 128 ], including but not limited to how often they initiate, close, or contribute to discussions [ 136 ], whether they are decision-makers (as defined by the measure of that trait [ 295 ] in the behavioral model [ 200 ]), whether they review work products, etc.
- Possible enterprise application scenarios of information dissemination profiling include but are not limited to:
- in one embodiment, the information dissemination profiles are: creators, consumers, and curators.
- the ranking [ 275 ] mechanism can be configured by the user.
- actors [ 220 ] are ranked as follows: creators are ranked in decreasing order of their create-score, consumers in decreasing order of their receive-score, and curators in decreasing order of their relay-score.
- a library of recently disseminated textblocks is built at regular intervals (for example, on a monthly basis).
- the list of content re-use events is computed similarly to the global library construction, except that the create-score of the sender of a re-use event is updated only when the textblock pattern [ 124 ] is encountered for the first time in this scan and it is not present in the global library. If either condition is not satisfied, then the sender's relay-score is updated as in the global library construction.
- the result of this scan over re-use events is a ranking of actors on the corresponding time period.
- FIG. 28 shows a graph visualization of information dissemination profiles provided by the system.
- This visualization shows actors [ 220 ] as nodes in the graph, and dissemination relationships as edges.
- the identity [ 2820 ] (potentially anonymized depending on the anonymization scheme [ 340 ] in place) of the actor [ 220 ] decorates each node.
- annuli [ 2800 ] (also called donuts) are drawn around each actor [ 220 ]; the width of an annulus (i.e. the difference between its external and internal radii) denotes the relative amount of information respectively produced, received, or relayed.
- Color codes are used to distinguish profiles.
- blue circles indicate creators [ 2805 ]
- green circles indicate curators [ 2810 ]
- red circles indicate consumers [ 2815 ].
- a saturation level is used as a visual indicator of the content's value or relevance: the darker the color, the more valuable the information created, relayed, or consumed. This provides an additional dimension to the dissemination profile established by the system. For example, the darker the blue circle around an actor [ 220 ], the more likely that actor [ 220 ] is to be a thought leader; the darker a green circle around an actor [ 220 ], the more actively that actor [ 220 ] is contributing to spreading knowledge or expertise throughout the organization.
- valuable information can be further categorized by the system using any type of categorization component [ 146 ].
- a set of categories would be classification levels, thus adding another dimension to the visualization.
- Each annulus [ 2800 ] is split into one or more annular sectors [ 2825 ], with the angle of a sector proportional to the relative volume of the corresponding category found in that actor's [ 220 ] dissemination profile.
- the actor identified as “actor 13 ” creates significant volumes of information categorized as A or B [ 2825 ] in roughly equal proportions, but produces comparatively little if any information categorized as C.
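- The annular sectors can be computed directly from the per-category volumes in an actor's [ 220 ] dissemination profile; a minimal sketch:

```python
def sector_angles(category_volumes):
    """Split an annulus into annular sectors, the angle of each sector
    (in degrees) proportional to its category's share of the profile."""
    total = sum(category_volumes.values())
    return {cat: 360.0 * vol / total for cat, vol in category_volumes.items()}

# an actor producing categories A and B in equal volume, and no C,
# gets two 180-degree sectors
assert sector_angles({"A": 10, "B": 10}) == {"A": 180.0, "B": 180.0}
```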
- the behavioral modeling component [ 445 ] leverages the output of the other components in the processing and analysis layer [ 402 ] to establish a behavioral model [ 200 ] which is the core model from which a behavioral norm is determined and anomalies are detected by the anomaly detection component [ 450 ] based among other things on deviations from that norm.
- the present disclosure defines the concept of an individual behavior profile, which can be used for different purposes, including but not limited to the following.
- an individual behavior profile is useful in and of itself to show a behavioral portrait of the person (whether a snapshot of assessed behavior [ 205 ] over a recent period of time, or a baseline behavior [ 260 ] computed over a decade), while letting a human analyst derive any conclusions on his own. For example, certain organizations may want to investigate any actor [ 220 ] who has been deemed to possess an inflated ego and also appears to exhibit a low level of satisfaction with respect to his job.
- an individual behavior [ 210 ] can be leveraged to analyze changes over time in the individual's behavior, in order to derive a level of associated risk or alternatively to produce anomalies [ 270 ] that should be investigated further. For example, someone who isolates himself from the rest of the organization (whether socially, professionally, or in both respects) over a period of time has a situation worth investigating.
- an individual behavior [ 210 ] can be used to contrast the individual's behavior with her peers' behavior in order to yield another kind of assessment of anomalous behavior as with changes over time. For example, someone whose level of stress increases considerably more than his co-workers' stress is a significant anomaly, much more so than a collective increase in stress levels which might be imputable to internal tensions and difficulties or to exogenous circumstances.
- the primary function accomplished by the construction and maintenance of the behavioral model [ 200 ] in the present invention is to map each important actor [ 220 ] in the electronic dataset to one or more individual personality types.
- these personality types can also be called archetypes since they are a necessarily simplified model of any real human personality, in that the more complex traits have been omitted while emphasizing other traits more relevant to the particular scenario, for example psycho-pathological traits.
- An actor [ 220 ] that matches at least one of these archetypes would typically be flagged for investigation if for example the corresponding archetype(s) suggest a level of present or future insider threat, where an insider threat is defined as a series of malevolent or unintentional actions by a person trusted by the organization with access to sensitive or valuable information and/or assets.
- the behavioral model [ 200 ] can provide evidence suggesting that the individual in question is a malicious insider. This covers three main types of situations described below, each of which presents a significant threat to the organization if it goes undetected until irreversible malicious acts are committed; a system such as the one described in this invention mitigates this risk by flagging those individuals through alerts [ 305 ] raised on the basis of the established behavioral model [ 200 ].
- the set of personality archetypes is completely configurable, to allow for either a very generally applicable model of human personality, or a custom model more targeted toward a specific organization.
- the set of personality archetypes represents the Big Five factors, which are a scientifically validated definition of human personality along five broad domains: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness.
- the advantage of such a generic model, besides lending itself to cross-validation of results produced by the present invention and by another personality assessment scheme, is that it does not assume a unique categorical attribute to define an individual personality, since the Big Five factors are modeled as five numerical features (generally expressed as a percentile value).
- the system provides a picture of actors [ 220 ] prone to anger control issues.
- the set of personality archetypes is defined to represent the main behavioral risk factors present in any business organization, and are as follows:
- the behavioral model [ 200 ] involved in the present invention relies on the assessment of the presence and intensity of a number of behavioral and personality traits [ 295 ] for every individual actor [ 220 ] or group [ 225 ] for which a sufficient volume of data has been processed and analyzed. Each personality type—or archetype—as described previously is then detectable by the presence of behavior traits that are associated to such a personality.
- each behavioral trait is associated to a positive or negative correlation with each archetype based on empirical and/or theoretical data: for example, when using the Big Five factors as personality archetypes, undervaluation is a behavioral trait that is measured by the system as positively correlated to the Neuroticism factor; once all behavioral traits have been accounted for, numerical values are available for each individual along each factor—from which percentile values can further be deduced using a reference sample appropriate to the scenario at hand.
- a behavioral trait [ 295 ] might correlate to several archetypes; in some cases it might even correlate positively to an archetype and negatively to another.
- egocentric personalities are characterized (among other things) by a lack of empathy whereas influenceable personalities can be manipulated by others using their empathy (either towards the manipulator or a third party in case the manipulator resorts to coercion).
- the model assumes that each pair of random variables composed of a behavioral trait [ 295 ] and a personality type shows either no correlation, a positive correlation, or a negative correlation.
- all such correlations are assumed to be linear correlations in the sense that actors [ 220 ] are scored along a personality type using a weighted sum (with positive or negative coefficients) over all behavioral traits [ 295 ] for which a score [ 285 ] has been computed.
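- The linear-correlation model can be sketched as a signed weighted sum over trait scores; the trait and archetype names below are illustrative, not taken from the disclosure:

```python
import math

def archetype_scores(trait_scores, correlations):
    """Score an actor along each archetype as a weighted sum (positive or
    negative coefficients) over the behavioral-trait scores available."""
    return {archetype: sum(w * trait_scores.get(trait, 0.0)
                           for trait, w in weights.items())
            for archetype, weights in correlations.items()}

traits = {"undervaluation": 0.8, "sociability": 0.3}
correlations = {"Neuroticism": {"undervaluation": +1.0},
                "Extraversion": {"sociability": +1.0, "undervaluation": -0.5}}
scores = archetype_scores(traits, correlations)
assert math.isclose(scores["Extraversion"], -0.1)
```

Percentile values along each archetype can then be derived by ranking these raw scores against a reference sample.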
- the rest of this section lists a number of behavioral traits [ 295 ] provided by the default embodiment of this invention. This list is not limitative, and a key characteristic of this invention is to support augmenting the anomaly detection mechanism with any behavioral trait [ 295 ] that can be measured for a given actor [ 220 ]. For each trait, a brief explanation is given of how to score an actor [ 220 ] along that trait [ 295 ] in one embodiment among all possible embodiments; essentially each trait [ 295 ] can be measured along a number of vectors either directly observable in the data or derived during processing or post-processing by the system. For clarity, the behavioral traits [ 295 ] supported by the system are broken down into broad categories: in this default embodiment, the categories are job performance, job satisfaction, perception by peers, communication patterns, and character traits.
- Job performance traits correspond to professional achievements and work habits, but also a measure of reliability, i.e. how well the actor [ 220 ] performs her job.
- the rest of this section describes such traits [ 295 ] that can be measured in the default embodiment of this invention.
- an actor's [ 220 ] disengagement is measured by first filtering discussions [ 136 ] computed by the system to retain those that involve the actor [ 220 ] as a primary participant, are business-relevant, and optionally by filtering according to topics addressed in the elements of those discussions [ 136 ]. Then the system computes a number of behavioral metrics [ 290 ] including but not limited to:
- the system lets a user visualize patterns of disengagement for a given actor [ 220 ] by using the sequence viewer described in this invention to show discussions [ 136 ] involving that actor [ 220 ].
- Stability measures the regularity and stability in the distribution of time and effort for a given actor [ 220 ] across workflow processes [ 128 ] such as business workflows and activities.
- stability is measured using periodic patterns [ 126 ] derived by the periodic patterns detection component [ 405 ]; both a high frequency and an increasing frequency of gaps and disturbances in the business-relevant periodic patterns [ 126 ] involving the actor [ 220 ] denote an unstable behavior.
- the system lets a user visualize patterns of stability for a given actor [ 220 ] by using the gap viewer described in this invention to show periodic patterns [ 126 ] involving that actor.
- This behavioral trait [ 295 ] assesses how the actor [ 220 ] delegates professional tasks and responsibilities.
- the level of delegation for a given actor [ 220 ] is measured as the centrality measure in the graph of instruction relaying (as defined in U.S. Pat. No. 7,143,091).
- the graph can be filtered to only retain explicitly actionable instructions, such as those accompanied by an attached email or a list of tasks, which provide a more accurate reflection of work delegation than the transmission of more vague, non-directly actionable instructions, or mere forwards.
- the system allows the user to efficiently visualize delegation patterns, either when an anomaly [ 270 ] has been flagged, or on-demand by the means of a particular type of animated actor graph [ 471 ].
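- As a simple stand-in for the centrality measure referenced above (the cited patent defines the actual measure), out-degree centrality in the instruction-relaying graph can be sketched as:

```python
from collections import defaultdict

def delegation_centrality(instruction_edges):
    """Out-degree centrality: number of instructions an actor relays,
    normalized by the number of other actors in the graph (a simple
    proxy for the cited centrality measure)."""
    out = defaultdict(int)
    actors = set()
    for giver, receiver in instruction_edges:
        out[giver] += 1
        actors.update((giver, receiver))
    n = len(actors)
    return {a: out[a] / (n - 1) for a in actors}

edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
centrality = delegation_centrality(edges)
assert centrality["alice"] == 1.0  # alice delegates to everyone else
```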
- This behavioral trait [ 295 ] assesses the actor's [ 220 ] ability to respect deadlines and to complete tasks delegated to her.
- Measurements of an actor's [ 220 ] diligence include, but are not limited to the level of regularity in periodic sequences [ 132 ] originating from that actor [ 220 ], such as the submission of reports following a regularly scheduled meeting, or the frequency of indicators found in the analyzed data that the actor [ 220 ] was late or absent from important meetings without a valid reason being found for that absence or lateness.
- This behavioral trait [ 295 ] assesses the level of discipline shown by an actor [ 220 ] in the workplace, as well as her respect for procedures and hierarchy.
- Measures of an actor's [ 220 ] level of discipline include but are not limited to:
- two types of definitions are combined for assessing an actor's [ 220 ] job performance: an objective definition and a subjective definition.
- Objective performance is assessed based on criteria including but not limited to production of high-quality content (i.e. frequently and broadly re-used in content authored by other actors [ 220 ]), or the dominant archetype taken by the actor [ 220 ] using role assessment (e.g. leader who frequently initiates discussions [ 136 ] vs. follower who passively observes discussions [ 136 ]).
- high-quality content i.e. frequently and broadly re-used in content authored by other actors [ 220 ]
- role assessment e.g. leader who frequently initiates discussions [ 136 ] vs. follower who passively observes discussions [ 136 ].
- Subjective performance is assessed based on criteria including but not limited to results of performance review (as directly recorded in numerical values in an HR system or as a polarity value evaluated from linguistic analysis of those reviews' content), any sanctions received by the employee, as well as the expression of a particularly positive or negative judgment on the actor's [ 220 ] performance as inferred from hits produced by appropriate ontology classifiers [ 150 ].
- This behavioral trait [ 295 ] corresponds to an actor spending a lot of time and directing significant attention or effort towards non-business issues and topics [ 144 ] during work hours, as well as to repeated and impactful interferences of personal issues with behavior in the workplace.
- Job satisfaction describes how the considered actor [ 220 ] feels about her job at a given point in time or over time.
- the behavioral model [ 200 ] should contain as many traits [ 295 ] as possible in this category to reliably quantify the level of satisfaction as well as assess the topics [ 144 ] and related entities associated to the highest (resp. the lowest) degree of satisfaction for a particular actor [ 220 ].
- the rest of this section describes such traits [ 295 ] that can be measured in the default embodiment of this invention.
- This behavioral trait [ 295 ] corresponds to chronic discontentment transpiring in the organization.
- this is measured by the volume of negative language related to the actor's [ 220 ] current responsibilities, to organizational policies, to coworkers, etc.
- the system measures resentment expressed by the actor [ 220 ] about people higher up than she is.
- the system discounts any changes in sociability that are health- or family-related.
- An exception to this is when those personal issues become so critical that they turn into motivations for harming other actors [ 220 ], the organization, or the society, for example if they result from financial distress; this can be measured by linguistic analysis, or any method for monitoring the actor's [ 220 ] financial situation.
- This behavioral trait [ 295 ] corresponds to indicators of envy, shame, or any presence of grievance in an actor [ 220 ], including a vindictive attitude.
- the system lets a user visualize grievance patterns exhibited by a particular actor [ 220 ] using the stressful topics visualization described in this invention, using that actor [ 220 ] as the subject of interest.
- This behavioral trait [ 295 ] is defined as the manifestation of excessive greed, but also more generally of an overachievement behavior. This is particularly important for anomaly detection aimed at spotting malicious behavior since excessive greed is often characteristic of an individual ready to go to extreme lengths, including malevolent ones, to reach her goals; such individuals try to rationalize unreasonable financial aspirations, thereby suggesting that they are struggling with internal conflicts due to considering malicious or otherwise unauthorized actions.
- excessive greed is detected by ontology classifiers [ 150 ] capturing constructs such as repeated complaints of “wanting a better X” for various values of X, or reflecting an obsession about the recognition (financial or other) of one's achievements
- two types of definitions are combined for assessing an actor's [ 220 ] sense of undervaluation, similarly to the case of job performance assessment: an objective definition and a subjective definition.
- Subjective undervaluation can be measured by criteria including but not limited to:
- Objective undervaluation can be measured by criteria including but not limited to:
- This behavioral trait [ 295 ] corresponds to how well the actor [ 220 ] receives feedback and constructive criticism, as well as her willingness to acknowledge and learn from past errors.
- acceptance of criticism is measured by methods including but not limited to:
- This behavioral trait [ 295 ] corresponds to how closely the way others see the actor [ 220 ] matches her own self-perception.
- Measuring this trait [ 295 ] is particularly useful because a high contrast between how a person is perceived by others and their self-perception is often associated to discontentment and in more extreme cases to psychological troubles.
- the level of perceptive bias for an actor [ 220 ] is measured by methods including but not limited to:
- This behavioral trait [ 295 ] measures the breadth and the level of influence exerted by the actor [ 220 ] over his peers, for example his co-workers.
- a low level of influence occurs when the actor [ 220 ] has no significant impact on others, which depending on the context might be an important anomaly [ 270 ] to detect. Also, a suddenly increasing level of influence might reveal that the actor [ 220 ] is trying to become a power broker, possibly with the intent to extort information from others or coerce them for malicious purposes.
- the level of influence exerted by an actor [ 220 ] is measured by methods including but not limited to:
- Reliability as a behavioral trait [ 295 ] indicates how dependable the actor [ 220 ] is considered, in particular how much actual trust others put into her beyond purely professional interactions.
- the reliability of an actor [ 220 ] is measured by methods including but not limited to:
- This behavioral trait [ 295 ] indicates the status of an actor [ 220 ] in the eyes of her peers, particularly in the domain of their professional interactions.
- the popularity of an actor [ 220 ] is measured by methods including but not limited to:
- This behavioral trait [ 295 ] measures the extent to which an actor [ 220 ] is included in business workflows, and how much she interacts with coworkers.
- This trait [ 295 ] is an important component of an individual behavior [ 210 ] since isolation patterns, which reflect a significant lack of interaction or generally poor-quality interactions, are a worrying sign in most organizations. Conversely, the unexplained disappearance of isolation patterns is also suspicious in many situations, since e.g. someone reclusive who becomes very popular might actually be trying to manipulate people in the parent organization in order to gain unauthorized access to sensitive information.
- the connectedness of an actor [ 220 ] is measured by methods including but not limited to:
- the system allows the user to efficiently visualize connectedness assessment for a particular actor [ 220 ], either when an anomaly [ 270 ] has been flagged or on-demand by the means of a graph showing changes of position within the actor network, which is one of the animated actor graphs [ 471 ] provided by the system.
- This behavioral trait [ 295 ] reflects the tendency of an actor [ 220 ], in the face of conflicts involving other actors [ 220 ] in the organization, to either make those conflicts worse by her actions or speech, or, in contrast, to attempt to nip emerging conflicts in the bud, or more generally to solve existing interpersonal issues.
- an actor's [ 220 ] propensity to confrontation is measured by methods including but not limited to:
- the reply-based confrontation score [ 285 ] takes into account that actor's [ 220 ] response to negative sentiment. For example, if in the course of a discussion [ 136 ] that contains very little (if any) negative sentiment, the actor [ 220 ] sends out a very aggressive email, this will significantly increase her reply-based confrontation score [ 285 ].
- the effect-based confrontation score [ 285 ] takes into account that actor's [ 220 ] communications' [ 123 ] effects on others.
- This behavioral trait [ 295 ] assesses whether the actor [ 220 ] is only concerned with her own interests or has other people's interests in mind.
- an actor's [ 220 ] self-centered behavior is measured by methods including but not limited to:
- stress management issues are detected using linguistic markers including but not limited to:
- Polarized behavior corresponds to the presence of both highly negative and highly positive sentiments expressed by the actor [ 220 ] on a certain topic [ 144 ] or relative to a certain actor [ 220 ].
- a polarized attitude is detected by methods including but not limited to:
- Information dissemination analyzes how specific knowledge and data spreads through the organization, e.g. whether it typically travels horizontally through a single level of management or more vertically, how long it takes to become general knowledge, etc.
- This behavioral trait [ 295 ] is particularly useful for analyzing the spread of highly valuable or sensitive information, as well as the effectiveness of knowledge acquisition throughout the organization.
- actor profiles are built according to a model of knowledge creation and transfer.
- profiles are creators of information, curators (who relay it), and consumers.
- a method to rank actors [ 220 ] against each such profile is described in the section on Information dissemination profiles, along with a novel way to efficiently visualize these profiles on the whole actor network.
- the input data can be filtered by topic [ 144 ] or by ontology classifier [ 150 ], etc. to only retain information relevant to a specific scenario.
- This behavioral trait [ 295 ] measures a particularly interesting social network pattern, namely the emergence of cliques [ 255 ], i.e. groups of tightly connected people. This is especially relevant in relation to other significant patterns or topics [ 144 ], and especially when these cliques [ 255 ] have a secretive nature.
- clique [ 255 ] members will often share similar behavioral traits [ 295 ], which can provide supporting evidence for the system's behavioral model [ 200 ] construction that an actor [ 220 ] matches a particular profile type.
- anomalous patterns for this behavioral trait [ 295 ] include, but are not limited to, a sudden change in an actor's [ 220 ] tendency to form cliques [ 255 ], and the inclusion of an actor [ 220 ] in one or more cliques [ 255 ] around another actor [ 220 ] who has previously been flagged—manually or automatically—as suspicious.
- an actor's [ 220 ] tendency to form cliques [ 255 ] is assessed by detecting patterns including but not limited to:
- the system allows the user to efficiently visualize the emergence, decay, and evolution of cliques. This is done either upon detecting an anomaly in relation to cliques [ 255 ] including a particular actor [ 220 ], or on demand from the user, by the means of an animated graph of cliques, which is one of the animated actor graphs [ 471 ] provided by the system.
- This behavioral trait [ 295 ] measures an actor's [ 220 ] tendency to develop and sustain personal relationships with co-workers.
- the level of social proximity of an actor [ 220 ] is assessed by computing metrics [ 290 ] including but not limited to:
- the system allows the user to efficiently visualize and assess social proximity and its evolution for a particular actor [ 220 ] on a continuous basis, using the Social You-niverse visualization [ 472 ].
- elicitation is detected using methods such as:
- This character trait [ 295 ] corresponds to detecting extreme values of an actor's [ 220 ] apparent self-esteem. Both extremes result in anti-social behavior that might put the organization or other individuals at risk: a lack of self-esteem might lead the actor [ 220 ] to be easily influenced or to act irrationally, and when extremely low can even denote socio-pathological tendencies. Conversely, an inflated ego (and ego satisfaction issues in general) results in a lack of consideration for others and in a tendency to manipulate other actors [ 220 ] while rationalizing malicious actions. In addition, a major decrease in self-esteem is considered a cause for concern, and such a pattern is therefore flagged by the system as anomalous.
- the likely existence of self-esteem issues is assessed using the following methods:
- Based on a number of behavioral traits [ 295 ], including but not limited to the ones previously described, each of which can be scored along one or more behavioral metrics [ 290 ], the system establishes an assessment of the behavior of all individual actors [ 220 ] as follows.
- a per-actor [ 220 ] score [ 285 ] is computed as a linear combination of the values taken by each normalized metric [ 290 ] for that trait.
- Metrics [ 290 ] are normalized against all actors [ 220 ] for which they are defined (since no value might be available if for example no data relevant to the metric [ 290 ] is associated to a particular actor [ 220 ]) so as to have a zero mean and unit standard deviation.
- the set of weights involved in the linear combination of metric scores [ 285 ] is independent from the actor [ 220 ] considered.
- the user [ 455 ] completely specifies the weight for each metric [ 290 ] of a given trait.
- the system is initialized with identical weights, following which those weights are continuously adjusted based on user feedback [ 160 ], as described in the section on Anomaly detection tuning.
- This allows the system to fine-tune the assessment of a behavioral trait [ 295 ] based on input from a human expert, since the comparative reliability of, e.g., an actor network metric and an ontology classifier metric is subject to interpretation and therefore cannot be automatically determined by the system.
- This linear combination yields a scalar value for each actor [ 220 ] against a particular behavioral trait [ 295 ], which in turn does not need to be normalized since it constitutes a relative scoring mechanism across the actor set; the system thus uses these values to rank the actors [ 220 ] by decreasing score [ 285 ].
- These rankings [ 275 ] can then be leveraged in multiple ways, including having the system automatically generate alerts [ 305 ] for the top-ranked actors [ 220 ] against a particular trait [ 295 ] (as described in the section on Anomaly detection) or by letting the user query the behavioral model [ 200 ] against a trait [ 295 ] and returning the top-ranked actors [ 220 ].
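The individual behavior scoring described above (normalization of each metric to zero mean and unit standard deviation across actors, an actor-independent weighted linear combination, and a ranking by decreasing score) can be sketched as follows; the function and variable names are illustrative, not part of the original disclosure.

```python
import statistics

def rank_actors(metric_values, weights):
    """Score actors on one behavioral trait and rank them.

    metric_values: {metric_name: {actor_id: raw_value}} -- an actor is
    absent from an inner dict when no data relevant to that metric is
    associated to it.
    weights: {metric_name: weight} -- independent of the actor considered.
    """
    scores = {}
    for metric, per_actor in metric_values.items():
        values = list(per_actor.values())
        mean = statistics.mean(values)
        std = statistics.pstdev(values) or 1.0  # guard against zero spread
        for actor, value in per_actor.items():
            z = (value - mean) / std  # zero mean, unit standard deviation
            scores[actor] = scores.get(actor, 0.0) + weights[metric] * z
    # The combined score is a relative measure across the actor set, so
    # no further normalization is needed before ranking by decreasing score.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With two equally weighted metrics, an actor whose values are extreme on both metrics will rank first, which is the intended use for generating alerts on the top-ranked actors.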
- the system is also able to determine the main characteristics of actor groups [ 225 ] using a model similar to the one used in individual behavior [ 210 ] assessment. This is useful to measure behavior at a coarser granularity than on a per-actor [ 220 ] basis. It is also essential to application scenarios where risk is presumed to originate not only from malicious insiders acting on their own or with accomplices external to the organization but also from conspiracies involving multiple actors [ 220 ].
- the system defines and measures a number of collective behavioral traits [ 295 ] which together compose a collective behavior [ 215 ].
- Each of these traits [ 295 ] is derived from the individual behavior traits [ 295 ] of the group [ 225 ] members—which are measured as described previously—in a straightforward manner using simple aggregate metrics [ 290 ].
- This allows the system to rank actor groups [ 225 ] against each of these traits [ 295 ] and thus lets the anomaly detection component [ 450 ] raise behavioral alerts [ 305 ] as appropriate. Those alerts [ 305 ] will thus be imputed to a group [ 225 ] rather than a single actor [ 220 ].
- the system aggregates every single behavioral trait [ 295 ] for each actor [ 220 ] who belongs to a given group [ 225 ] by simply computing the average score [ 285 ]. This allows, for example, determining a score [ 285 ] and thus a ranking [ 275 ] for all formal or informal groups [ 225 ] against the disengagement trait described previously.
- the system presents the resulting collective behavior [ 215 ] model to the user by highlighting simple statistical measures over the corresponding behavioral traits [ 295 ], such as the average score [ 285 ] against that trait [ 295 ], the highest and lowest scores [ 285 ], and the standard deviation, for each actor group [ 225 ] for which the trait [ 295 ] has been measured.
- this last condition is defined as having a significant number of individual trait measurements, for example at least 6 individual actor [ 220 ] scores [ 285 ].
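A minimal sketch of the collective-trait summary described above, assuming the 6-measurement significance condition from the text; the function name and return format are illustrative.

```python
import statistics

def collective_trait_summary(scores):
    """Summarize one behavioral trait over the members of a group.

    scores: list of individual per-actor trait scores for the group.
    Returns None when the trait has not been measured for a significant
    number of actors (at least 6 in this sketch, per the text).
    """
    if len(scores) < 6:
        return None
    return {
        "average": statistics.mean(scores),   # also used as the group score
        "highest": max(scores),
        "lowest": min(scores),
        "std_dev": statistics.pstdev(scores),
    }
```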
- all visualizations [ 204 ] of behavioral patterns described in this invention can be adapted to represent group behavior [ 215 ] rather than individual behavior [ 210 ].
- the social universe visualization uses planets [ 4820 ] to represent either individual actors [ 220 ] or groups [ 225 ]; matrix visualizations such as stressful topics and emotive tones represent either individual actors [ 220 ] or groups [ 225 ] as rows in the matrix; and visualizations [ 204 ] based on selecting input data [ 360 ] according to the actors [ 220 ] involved, such as the sequence viewer, can also filter that data according to the groups [ 225 ] involved.
- FIG. 8 shows a partial hierarchy of anomalies [ 270 ] generated by the anomaly detection component [ 450 ] of the system.
- An anomaly [ 270 ] is associated to one or more events [ 100 ] that triggered the anomalous patterns, and to zero, one or more subjects [ 272 ], each subject [ 272 ] being an actor [ 220 ], a group [ 225 ], a workflow process [ 128 ], or an external event [ 170 ] to which the anomaly is imputed.
- the anomaly subject is defined as follows:
- an anomaly [ 270 ] generated by the present invention possesses a set of properties: confidence [ 870 ], relevance [ 880 ], and severity [ 875 ], which are described in the following.
- Confidence [ 870 ] is a property endogenous to the model and indicates the current likelihood (as a positive numerical value estimated by the system) that the facts underlying the anomaly [ 270 ] are deemed valid in the physical world. This includes, but is not limited to, the following considerations: how strong the chain of evidence [ 100 ] is, and how many associations are established to produce the anomaly [ 270 ].
- Relevance [ 280 ] is a property which is computed by combining user input [ 106 ] and information derived by the system. It represents the importance of the risk associated to the anomaly [ 270 ]. That is, a highly relevant anomaly [ 270 ] indicates behavior that can be malevolent or accidental but carries a significant risk (business risk, operational risk, etc.) whereas a low relevance [ 280 ] indicates abnormal but harmless behavior.
- Relevance [ 280 ] is a positive numerical value initialized to 1 for any newly detected anomaly [ 270 ].
- Severity [ 875 ] is a property defined by human users and indicates the impact (for example in terms of material or financial damage) that the anomaly [ 270 ] would lead to if it is actually confirmed and carries a risk.
- the severity [ 870 ] is defined by a set of system-configurable parameters and is assigned to alerts [ 305 ] and other actions posterior to anomaly detection, but is not used by the system in its computations.
- Anomalies [ 270 ] are generated by the anomaly detection component [ 450 ] and can be of different types, including but not limited to:
- Each of these types of anomalies [ 270 ] is described in more detail in the following sections, along with one or more proposed methods for detecting such anomalies [ 270 ]. It should be noted that these anomaly types are not mutually exclusive: for example, an anomaly [ 270 ] may be justified by a single anomalous event [ 835 ] which also triggered a categorization anomaly [ 825 ].
- One type of anomaly [ 270 ] that is raised by the anomaly detection component [ 450 ] is an atomic anomaly [ 830 ] due to an anomalous event [ 835 ].
- For example, emotive tone analysis as described in this invention may have detected derogatory or cursing language on certain internal communication channels [ 156 ], which can constitute anomalous behavior that should be further investigated, and thus results in an atomic anomaly [ 830 ] imputed to the author of these communications [ 123 ].
- the anomaly detection component [ 450 ] can also be configured to trigger atomic anomalies [ 830 ] based on rule violations [ 855 ], such as compliance rules [ 865 ].
- An example of a rule violation is when a particular topic [ 144 ] has been blacklisted so as to express that no communications [ 123 ] pertaining to that topic [ 144 ] should be exchanged among certain actors [ 220 ]. In this case, a communication [ 123 ] between two such actors [ 220 ], or alternatively the creation by one such actor [ 220 ] of a document [ 162 ] associated to that topic [ 144 ], is flagged by the system as a rule violation.
- Categorization anomalies [ 825 ] are produced on two types of categorical features.
- the first type of categorical features corresponds to categories produced by any type of categorization component [ 146 ] as part of the continuous categorization component [ 420 ], such as detected topics [ 144 ].
- An example of categorization anomaly in that case is when the topical map for a given subject [ 272 ] shows an abrupt and unjustified change at some point in time.
- the second type of categorical features corresponds to built-in features of the system, such as detected emotive tones or entities (event names, people names, geographical locations, etc.) which are derived from the analysis of data and metadata extracted from events [ 100 ], including periodic sequences [ 132 ] that match a periodic pattern [ 126 ].
- An example of categorization anomaly in that case is when the set of closest actors [ 220 ] has significantly changed for a given actor [ 220 ].
- Categorization anomalies [ 825 ] are produced for any profile change detected as explained in the section on Continuous categorization.
- Some events [ 100 ] can be flagged as anomalous not by themselves but because they are associated with another anomaly [ 270 ] by dint of some relationship inferred from one or more pieces of evidence [ 108 ], including but not limited to:
- anomalies by association [ 850 ] are computed as follows.
- the system configuration defines indirection levels that are incorporated into anomalies [ 270 ] when events [ 100 ] are correlated with other events [ 100 ].
- an indirection level is a value I between 0 and 1, which is used to compute the new confidence level [ 870 ] every time a new relationship is established using the following expression:
- conf A-B = Max(conf A , conf B ) + I × Min(conf A , conf B )
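This confidence-combination expression can be transcribed directly; the function name is an assumption.

```python
def combined_confidence(conf_a, conf_b, indirection):
    """Confidence of an anomaly by association.

    indirection: the value I in [0, 1] configured per association type.
    The result is at least the stronger of the two confidences, plus a
    fraction (controlled by I) of the weaker one.
    """
    if not 0.0 <= indirection <= 1.0:
        raise ValueError("indirection level must lie between 0 and 1")
    return max(conf_a, conf_b) + indirection * min(conf_a, conf_b)
```

With I = 0 the association adds no confidence beyond the stronger anomaly; with I = 1 both confidences add up fully.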
- Baseline values for all types of behavioral features (including statistical measures and periodic features), computed for a particular subject [ 272 ] called the target subject [ 272 ], are stored in the baseline repository. They are continuously adapted by the behavioral modeling component [ 445 ] using the accumulation of collected data.
- FIG. 30 shows the types of referentials [ 3000 ] provided by the anomaly detection component [ 450 ] in a default embodiment of the invention.
- the value tested for anomalies [ 270 ] is called the analyzed feature [ 3075 ], while the value representing the norm is called the reference feature [ 3070 ].
- features can correspond to absolute or relative values.
- the feature can be either the absolute number of emails sent or the fraction it represents relative to the total number of emails sent by all actors [ 220 ] or by actors [ 220 ] in one of the groups [ 225 ] to which he belongs.
- referentials [ 3000 ] can be created whose definitions themselves depend on the analyzed and reference features, such as a filter on their reference events [ 100 ] (for example, when these features are statistical features related to electronic communications [ 123 ], custom peer groups can be defined by restricting the actors [ 220 ] to those having discussed a particular topic [ 144 ]).
- the target subject's [ 272 ] baseline value [ 3035 ] (considered her normal behavior) and her current value [ 3045 ] are compared.
- the rationale for this kind of comparison is that individual actors [ 220 ] as well as actor groups [ 225 ] are creatures of habit who tend to develop their own patterns of communication, of interaction with the data and other actors [ 220 ], etc.
- the system thus aims at modeling as many idiosyncrasies of the considered actor [ 220 ] or group [ 225 ] as possible, and at detecting the cases where the current model diverges from the past model.
- observing permanent change, either individually or at a group level, is also of utmost importance, since it guarantees the accuracy of the anomaly detection process and often provides insightful information on the data itself.
- the referential [ 3000 ] is a historical referential [ 3010 ]
- the current value [ 3045 ] is compared to a historical value [ 3040 ] (i.e. a fixed amount of time in the past).
- the target subject's [ 272 ] baseline values before and after an external event [ 170 ] are compared: the value before that point in time [ 3050 ] is the reference feature, while the value after that point in time [ 3055 ] is the analyzed feature.
- the idea here is that if the behavior of a subject [ 272 ] tends to change significantly following a particularly sensitive external event [ 170 ], this may suggest that the subject [ 272 ] was specifically involved in that external event [ 170 ].
- the referential [ 3000 ] is a periodic event referential [ 3025 ]
- the changes in the target subject's [ 272 ] feature values [ 3065 ] around a more recent external event [ 170 ] are compared to the same feature values [ 3060 ] around an older external event [ 170 ].
- the target subject's [ 272 ] baseline value [ 3035 ] is compared to the baseline values of similar subjects [ 272 ] defined as a peer group [ 3030 ].
- This is particularly useful in cases where no normal behavior can easily be defined for the subject [ 272 ]. More generally, it provides a much more exhaustive way to detect behavioral anomalies, especially in the presence of malicious activities: intuitively, in order to escape the detection capabilities of the present invention, an individual who wants to commit fraud would need not only to hide his own malicious actions so that they do not appear suspicious with respect to his past behavior, but would also need to ensure that they do not appear suspicious in the light of other people's actions.
- the baseline values of similar subjects [ 272 ] represented as a peer group [ 3035 ] are computed as summaries of the behaviors of similar actors [ 220 ] or groups [ 225 ], for example by averaging each relevant (observed or derived) feature.
- a peer group [ 3035 ] can be defined in a number of ways, which include but are not limited to:
- Anomalies by deviation [ 805 ] can be detected on any type of feature [ 2900 ] associated to events [ 100 ].
- the following types of features [ 2900 ] are supported by the system:
- Periodicity features [ 2925 ] can always be treated as a combination of scalar features and categorical features by considering the time elapsed between successive instances of the periodic pattern [ 126 ] as a specific type of scalar feature [ 2915 ], as well as including all features [ 2900 ] attached to each occurrence (i.e. to each periodic sequence [ 132 ]). Therefore only numerical features [ 2905 ] and categorical features [ 2910 ] actually need to be considered.
- the amplitude of each deviation is mapped into the interval [0,1] where 1 represents the absence of deviation, and this value is used as a multiplier of the confidence level [ 870 ] of the considered anomaly [ 270 ].
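One plausible realization of this mapping, following the stated convention that 1 represents the absence of deviation; the exponential form and the scale parameter are assumptions, since the source does not specify the exact function.

```python
import math

def deviation_multiplier(amplitude, scale=1.0):
    """Map a deviation amplitude (>= 0) into the interval (0, 1].

    Returns 1 for an amplitude of 0 (absence of deviation), per the
    convention in the text; the result is used as a multiplier of the
    confidence level of the considered anomaly.
    """
    return math.exp(-amplitude / scale)
```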
- the first step in detecting anomalies by deviation is to define the reference features [ 3070 ] against which analyzed features [ 3075 ] will be compared.
- a feature descriptor [ 3080 ] is an aggregated value computed over a time interval. In one embodiment of the invention, one or more of the following definitions are available for a feature descriptor [ 3080 ]:
- this analyzed feature [ 3075 ] is usually defined as a sliding window [ 380 ] of recent data for the target subject [ 272 ], or as a window around an event [ 100 ] (unique or recurring) in the case where behaviors around two different events [ 100 ] have to be compared.
- the descriptor [ 3080 ] is the average feature value over all observations in time.
- the descriptor [ 3080 ] is the feature value of the best exemplar among the observations, i.e. the value that minimizes the sum of distances to the other observations.
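The best-exemplar descriptor (a medoid) can be sketched as follows for scalar observations; the use of absolute distance is an illustrative choice, as the source does not fix the distance function.

```python
def best_exemplar(observations):
    """Return the feature value of the best exemplar: the observed
    value that minimizes the sum of distances to all other observations."""
    return min(observations,
               key=lambda x: sum(abs(x - y) for y in observations))
```

Unlike the average descriptor, the best exemplar is always one of the actually observed values, which makes it robust to a few extreme observations.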
- the system only needs to maintain two descriptors [ 3080 ] for the target subject [ 272 ], in a time window of size w starting at the event's end time and a time window of size w ending at the event's end time.
- the event [ 100 ] is a recurring event [ 101 ]
- the positions of these windows are updated every time a new occurrence of the event has been detected.
- the case of a peer-group referential [ 3005 ] is the least simple one to compute.
- the reference group [ 3030 ] can be defined either exogenously or endogenously to the feature [ 2900 ] considered.
- the system may use an exogenous definition of the reference group [ 3030 ], or an endogenous definition, or both.
- exogenous definitions for a reference group [ 3030 ] include but are not limited to:
- the system sorts a list of all subjects [ 272 ] homogeneous to the target subject [ 272 ].
- Homogeneous means a subject of the same type (actor [ 220 ] to actor [ 220 ], group [ 225 ] to group [ 225 ], or workflow process [ 128 ] to workflow process [ 128 ]) or of a compatible type (actor [ 220 ] to group [ 225 ]).
- the list is sorted according to a distance defined over feature descriptors [ 3080 ]. In the default embodiment the distance is the Euclidean distance between the feature descriptors [ 3080 ]. In another embodiment it is a linear combination of the Euclidean distance between the feature descriptors [ 3080 ] and of the Euclidean distance between the variances of the feature descriptors [ 3080 ].
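An endogenous reference-group computation along these lines might look like the following sketch; the cutoff parameter k and the representation of descriptors as plain vectors are assumptions.

```python
import math

def nearest_peers(target, candidates, k=5):
    """Sort homogeneous subjects by Euclidean distance between their
    feature descriptors and keep the k nearest as the reference group.

    target: descriptor vector of the target subject.
    candidates: {subject_id: descriptor vector} for homogeneous subjects.
    """
    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    ranked = sorted(candidates.items(),
                    key=lambda item: euclidean(target, item[1]))
    return [subject for subject, _ in ranked[:k]]
```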
- the system re-computes the reference group [ 3030 ] (be it endogenously or exogenously defined) at regular intervals.
- this interval is a fixed multiple of w, for example 10w.
- the interval is adapted to the feature [ 2900 ] considered: for example if characteristic time scales are available for that feature then the median value of these time scales over the reference group is used as the time interval for the next computation of the reference group.
- the mechanism to detect anomalies by deviation [ 805 ] is executed at a fixed time interval w. Two cases have to be considered, depending on the type of referential [ 3000 ].
- a deviation is detected when the absolute value of the difference between the analyzed feature descriptor [ 3080 ] and the reference feature descriptor [ 3080 ] is larger than A times the variance of the reference feature descriptor [ 3080 ] across the reference observations.
- a deviation is detected using the same criterion after replacing the variance across the set of observations in time with the variance across the set of subjects comprising the reference group.
- the threshold multiplier A has a default value of 10, and is tuned to larger or smaller values based on feedback given by the user regarding detected anomalies by deviation (see the section on Anomaly detection tuning).
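The deviation test described in the preceding paragraphs can be sketched as follows for a baseline referential; the variance-based criterion and the default multiplier A = 10 follow the text, while the representation of the reference descriptor as the mean of past observations is an illustrative choice.

```python
import statistics

def deviation_detected(analyzed, reference_observations, a=10.0):
    """Detect an anomaly by deviation on a scalar feature descriptor.

    analyzed: the analyzed feature descriptor (current value).
    reference_observations: past descriptor values for the target subject.
    a: the threshold multiplier A (default 10, tunable via user feedback).
    """
    reference = statistics.mean(reference_observations)
    variance = statistics.pvariance(reference_observations)
    # Deviation when |analyzed - reference| exceeds A times the variance
    # of the reference descriptor across the reference observations.
    return abs(analyzed - reference) > a * variance
```

For a peer-group referential, the same criterion applies after replacing the variance across observations in time with the variance across the subjects of the reference group.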
- By leveraging behavioral norms and deviations from those norms in the behavioral model [ 200 ] built and continuously maintained by the behavioral modeling component [ 445 ], the system also predicts individual behavior; in other words, it uses the past to predict the near future.
- Predicted behavior [ 262 ] computed by this invention includes anomalous behavior and also more generally includes any kind of future events [ 100 ] that are deemed to have a high likelihood of occurring, based on past events [ 100 ].
- Examples of behavior predicted by the system in the default embodiment include but are not limited to:
- the system automatically assigns a confidence level [ 870 ] to predicted anomalies [ 270 ].
- the confidence [ 870 ] of a predicted anomaly [ 270 ] is always lower than if all events [ 100 ] had already been observed.
- the confidence level [ 870 ] is derived from the confidence [ 870 ] of the corresponding past anomaly [ 270 ] by applying an uncertainty factor, which is simply the prior probability that missing events [ 100 ] will be observed given all the events [ 100 ] that have been observed so far.
- When the missing events [ 100 ] are part of a workflow process [ 128 ], which is modeled by the default embodiment as a higher-order Markov chain (as described in the section on continuous workflow analysis), the probability of those missing events [ 100 ] is directly inferred from the parameters of that Markov chain.
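As a simplified illustration (first-order, rather than the higher-order Markov chain the text describes), the prior probability of the missing events can be read off the chain's transition parameters; all names are illustrative.

```python
def missing_event_probability(transition_probs, observed_prefix, missing):
    """Prior probability that the missing events of a workflow process
    will be observed, given the events observed so far.

    transition_probs: {(prev_event, next_event): probability} -- a
    first-order simplification of the higher-order Markov chain.
    observed_prefix: non-empty list of events observed so far.
    missing: the sequence of events not yet observed.
    """
    p = 1.0
    prev = observed_prefix[-1]
    for event in missing:
        p *= transition_probs.get((prev, event), 0.0)
        prev = event
    return p
```

This probability is then applied as the uncertainty factor that lowers the confidence of the predicted anomaly relative to the fully observed case.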
- anomalies by deviation [ 805 ] are detected on any kind of feature [ 2900 ] present in the stream of events [ 100 ].
- predicted behavior [ 202 ] is inferred on the basis of particular referentials [ 3000 ] whenever an analyzed feature [ 3075 ] matches the corresponding reference feature [ 3070 ], rather than when a deviation is observed between the analyzed feature [ 3075 ] and the reference feature [ 3070 ].
- reference features [ 3070 ] are statistically significant as they result from aggregating patterns of behavior over a significant period of time (for example in the case of a baseline referential [ 3010 ]) or over a large number of actors [ 220 ] (for example in the case of a peer-group referential [ 3005 ]).
- the system generates alerts [ 305 ] for detected anomalies [ 270 ] that meet specific criteria of importance.
- the unique criterion is Conf × Rel ≥ θ, where Conf is the confidence level [ 870 ] of the anomaly, Rel its relevance [ 280 ], and θ is a threshold for alert generation.
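This criterion can be transcribed directly; the default threshold value is an assumption, since the source only names the threshold.

```python
def should_alert(confidence, relevance, theta=1.0):
    """Generate an alert when the product of an anomaly's confidence
    and relevance reaches the alert-generation threshold theta."""
    return confidence * relevance >= theta
```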
- FIG. 31 shows the states in which an alert [ 305 ] generated by the system exists, and the transitions between those states.
- An alert [ 305 ] is generated by an alert profile, which is defined as follows:
- the alert [ 305 ] generation mechanism is parameterized by the following values:
- the user can modify the parameter t a on which anomalies [ 270 ] are accumulated to build up an alert [ 305 ], using a slider rule widget with a logarithmic scale with windows ranging from as small as 1 hour to as large as 1 month.
- the definition of these configuration parameters implies that, since the number of individual actors [ 220 ] covered by the system and the data processing throughput are bounded, the frequency of generated alerts [ 305 ] is in general bounded.
- the user can provide feedback [ 160 ] regarding the actual importance of one or more generated alerts [ 305 ], as described in the section on Anomaly detection tuning.
- the system can be configured to trigger mitigation and preventive actions as soon as it is notified of an anomaly.
- mitigation and preventive actions are twofold.
- Feedback given by a user of the system on an alert [ 305 ] consists of assessing the relevance of the alert [ 305 ], i.e. the risk it represents. Based on that feedback, the system updates the relevance [ 280 ] of the underlying anomalies.
- the system lets the user enter feedback [ 158 ] on:
- a key feature of the anomaly detection tuning mechanism described in this invention is the ability to define a whole anomaly class [ 3200 ] showing a particular similarity and providing specific feedback [ 158 ] for all the anomalies [ 270 ] contained in this class [ 3200 ].
- An anomaly class [ 3200 ] can be defined in a number of ways.
- a class is defined by one or more expansion criteria.
- Each criterion defines a set of anomalies [ 270 ] (including the set of initial anomalies [ 270 ] contained in the alert on which the user is entering feedback [ 158 ]), and the combination of multiple criteria is interpreted as a conjunction, so that the resulting anomaly class [ 3200 ] is the intersection of the anomaly sets.
- an additional operator allows several classes [ 3200 ] to be combined, in which case the combination is interpreted as a disjunction; this makes it possible to capture the union of anomaly classes [ 3200 ] using a single set of criteria.
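Using Python sets as a stand-in for sets of anomalies, the conjunction and disjunction semantics described above can be sketched as follows.

```python
def anomaly_class(criteria):
    """Anomaly class from expansion criteria: each criterion yields a
    set of anomalies, and combining criteria is a conjunction, so the
    resulting class is the intersection of the anomaly sets."""
    result = None
    for anomaly_set in criteria:
        result = set(anomaly_set) if result is None else result & anomaly_set
    return result if result is not None else set()

def combine_classes(classes):
    """Combining several classes is a disjunction: the union."""
    result = set()
    for cls in classes:
        result |= cls
    return result
```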
- an anomaly class [ 3200 ] can be defined in two different ways. The first is to define a class by relaxing constraints [ 3205 ] on a given anomaly [ 270 ], which allows the capture of very targeted classes [ 3200 ], in which all anomalies [ 270 ] share many properties with the initial anomaly [ 270 ]. The second is to define a class [ 3200 ] by specifying similarity constraints [ 3210 ] with respect to a given anomaly [ 270 ], which allows the system to capture broader classes [ 3200 ], in which anomalies [ 270 ] only share one or a few properties with the initial anomaly [ 270 ].
- An anomaly class [ 3200 ] can be defined by constraint relaxation [ 3205 ] on anomaly A by using any number of the following constraints [ 3260 ]:
- an anomaly class [ 3200 ] can be defined by constraint specification on the basis of anomaly [ 270 ] A by using any number of the following criteria:
- the feedback [ 158 ] given by the user on an anomaly [ 270 ] or an anomaly class [ 3200 ] is a binary value, “relevant” or “irrelevant”, meaning that the anomaly [ 270 ] is confirmed [ 3105 ] or refuted [ 3110 ].
- the next section explains how the system automatically tunes the anomaly detection component [ 450 ] using that feedback [ 158 ].
- the first type of anomaly [ 270 ] that the user can give feedback on is an atomic anomaly [ 830 ]. For example, if the event [ 100 ] corresponds to a previously blacklisted topic [ 144 ], the user may decide to relax the constraint on this topic [ 144 ] by applying a lower index of relevance to all events [ 100 ] corresponding to the detection of this topic [ 144 ].
- some abnormal events are defined with respect to a threshold value, so that another way to tune the detection of this type of anomaly [ 270 ] is to simply change the threshold at which an event [ 100 ] is considered abnormal: for example, what minimal value of a shared folder's volatility will constitute an anomaly [ 270 ].
- Anomalies by association can also be tuned by a user at a fine-grained level, in addition to the indirection level of each association type described in the section on Anomaly detection, which is part of the system configuration.
- FIG. 33 shows the process by which a user can give feedback [ 158 ] on an anomaly by deviation [ 805 ].
- Such feedback [ 158 ] includes the following options:
- the user may wish to apply the change to a whole class [ 3200 ] of similar anomalies, such as those concerning other actors [ 220 ] or those covering another topic [ 144 ].
- the anomaly detection component [ 450 ] in this invention contains an additional mechanism to fine-tune the detection of anomalies by deviation [ 805 ]. Rather than the specific types of Boolean feedback previously described (whereby the user indicates whether an anomaly [ 270 ] or anomaly class [ 3200 ] is absolutely relevant or absolutely irrelevant), this additional mechanism allows a user to manually change the threshold at which a deviation is detected on a given feature.
- the user [ 455 ] adjusts the threshold (as defined in the section on Anomaly detection tuning) used for detecting abnormal variations of the information flow between two actors [ 220 ], or abnormal variations of the measure of centrality of a given actor [ 220 ], or the breach in the theoretical workflow process [ 128 ] for a given type of document [ 162 ].
- the threshold as defined in the section on Anomaly detection tuning
- the user can adjust the weights associated with each behavioral metric [ 290 ] used to assess individual behavior [ 210 ] or collective behavior [ 215 ] associated with a particular trait [ 295 ], as described in the Individual Behavior Assessment Section. In one embodiment of the invention, this is done by manually setting the numeric weights of each metric [ 290 ]. In another embodiment, this is done using a slider widget acting as a lever to adjust the weight of each metric [ 290 ] while visually controlling the impact of those adjustments on the actor rankings [ 275 ], as described in the Alert Visualizations Section.
- the threshold adjustment mechanism is available only to advanced users of the application since it requires a lower-level understanding of the anomaly detection model.
- the benefit of this additional mechanism is that it allows more accurate feedback on which dimensions of a detected anomaly [ 270 ] need to be given more (or less) importance.
- this feedback [ 158 ] can be a Boolean value (relevant/irrelevant), in which case the anomaly relevance [ 280 ] is either set to 0 or left at its original value; a relative value (increase/decrease relevance [ 280 ]), for example in the case of periodic anomalies; or a numeric value whereby the user directly indicates the relevance [ 280 ] to assign.
- An additional mechanism aims to automatically prevent false negatives in the anomaly detection process in the case of anomaly bursts and is referred to as anomaly burst adjustment.
- the time elapsed since the last anomaly [ 270 ] is taken into account to adjust the relevance [ 280 ] of the newer anomaly.
- the goal of this mechanism is thus to account for the case of several anomalies detected within a short timeframe which are part of a burst phenomenon and which all have a common root cause: the user's attention should be focused on that event [ 100 ] (or sequence of events [ 100 ]) rather than on every individual anomaly [ 270 ].
- a burst interval is defined: when the interval between two successive anomalies [ 270 ] is longer than the burst interval, no burst adjustment is applied; when the interval is shorter, the adjustment is an exponential function of the time interval.
- the negative exponent factor in this exponential function is optimized by the system at regular intervals in order to minimize disagreements between successive anomalies [ 270 ], under the hypothesis that all such disagreements are due to burst phenomena and that all agreements occur outside of bursts.
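- As a minimal sketch, the burst adjustment described above might look like the following; the function name, the default burst interval, and the exact form of the discount are illustrative assumptions, since the disclosure states only that the adjustment is an exponential function of the time interval with a periodically re-optimized negative exponent factor:

```python
import math

# Hypothetical sketch of anomaly burst adjustment. The exponent factor
# k and the burst interval are illustrative defaults, not values from
# the disclosure.
def burst_adjusted_relevance(relevance, dt_seconds,
                             burst_interval=3600.0, k=1e-3):
    if dt_seconds >= burst_interval:
        return relevance              # outside a burst: no adjustment
    # the discount shrinks relevance toward zero as the gap between
    # successive anomalies shrinks
    return relevance * (1.0 - math.exp(-k * dt_seconds))
```

Anomalies separated by more than the burst interval keep their full relevance, while anomalies arriving in rapid succession are strongly discounted so that attention goes to the root-cause event rather than to every individual anomaly.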
- In addition to adjusting the relevance [ 280 ] of previously detected anomalies [ 270 ], the system also automatically adjusts the relevance of feedback [ 158 ] entered by the user in two different situations: feedback decay governs the evolution of feedback relevance over time, while feedback reinforcement describes the impact of recent user feedback on prior feedback.
- the other automated tuning mechanism (described in the next paragraph) is contingent on later anomalies [ 270 ] being presented to the user for confirmation or refutation; if future data happens not to contain any such similarly abnormal events, this alternative tuning mechanism will not be offered to the user. This is why anomaly feedback [ 158 ] entered by a user of the system is subjected to natural decay.
- Automatic tuning of the weight of user feedback [ 158 ] follows a simple decay scheme so that more recent decisions may optionally be given a greater weight (in this case, a value of relevance [ 280 ]) than older ones.
- the main goal of this mechanism is to avoid false negatives resulting from assigning a constant importance over time to past decisions even when the data profile has changed significantly (and also, in some cases, when these past decisions are no longer relevant to the users of the system).
- user feedback decay follows an exponential decay scheme—as commonly found for example in voting expert systems.
- the value for the half-life of such feedback [ 158 ] is one month: the relevance weight associated with these decisions is halved every month.
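- As a minimal sketch (the function name is illustrative), the exponential decay scheme with a one-month half-life can be written as:

```python
def decayed_feedback_weight(weight, age_days, half_life_days=30.0):
    # Exponential decay: the relevance weight of a feedback decision
    # is halved every half-life (one month in the example above).
    return weight * 0.5 ** (age_days / half_life_days)
```

A decision made three months ago thus carries one eighth of its original weight, so stale feedback fades naturally as the data profile evolves.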
- User feedback [ 160 ] is given on two main types of objects produced by the invention: anomalies (as part of the anomaly detection process) and changes in the data profile (as part of the data discovery mechanism optionally provided by the data collection component [ 400 ]).
- the half-life of feedback is optionally adjusted based on the nature of the anomaly, so that the characteristic timescales of the respective actors, data types, and processes involved in the anomaly will be taken into account to compute the feedback half-life for that anomaly.
- the system configuration governs the definition of the decay parameter.
- the configurations available include, but are not limited to:
- the system described in this invention automatically adapts past feedback [ 158 ] based on more recent feedback [ 160 ]entered by the same or another user: this mechanism, called feedback reinforcement, allows the incorporation of a dynamic knowledge base into the relevance model rather than a static knowledge base built on a case-by-case basis. Furthermore the system guarantees the consistency of relevance decisions with respect to past decisions; additionally, by boosting decisions that have been made multiple times, it also increases the recall rate of actual anomalies [ 270 ].
- reinforcement is strictly defined by the subsumption relation between instances of anomaly feedback [ 158 ]: since a partial order is defined on anomaly classes [ 3200 ], past feedback [ 158 ] corresponding to classes [ 3200 ] included in more recent feedback [ 158 ] are the only ones considered for reinforcement. If the feedback [ 158 ] decisions are identical, it is a case of positive reinforcement; if the decisions are opposite, it is a case of negative reinforcement. In another embodiment, reinforcement is more broadly defined, so as to include all pairs of feedback [ 158 ] with a non-empty intersection.
- an example of such an overlap measure consists of computing the ratio of shared features (e.g. actors [ 220 ], topics [ 144 ], groups [ 225 ], workflow processes [ 128 ], named entities, evidence [ 108 ] links and associations, etc.) over the total number of similar features in the older anomaly [ 270 ]. For example, if the older feedback [ 158 ] is related to an anomaly class [ 3200 ] comprising 8 actors [ 220 ], and this set of actors [ 220 ] overlaps with 3 actors [ 220 ] in the anomalies of the more recent feedback [ 158 ], then the reinforcement will be weighted by a factor of 3/8.
- the feedback reinforcement mechanism follows a multiplicative increase / multiplicative decrease scheme commonly found in voting expert systems: the relevance of a positively reinforced anomaly [ 270 ] is multiplied by a fixed amount and the relevance of a negatively reinforced anomaly [ 270 ] is divided by a given factor. For example, both multiplicative factors are set by default to 2. This factor is then optionally multiplied by a weight associated with the reinforcement operation as described above.
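- A hedged sketch of this reinforcement scheme follows; how the overlap weight modulates the multiplicative factor is an assumption here (we interpolate the factor toward 1 as the overlap shrinks), as the disclosure states only that the factor is optionally multiplied by the reinforcement weight:

```python
def reinforce(relevance, positive, overlap=1.0, factor=2.0):
    """Multiplicative increase / multiplicative decrease: relevance is
    multiplied by `factor` on positive reinforcement and divided by it
    on negative reinforcement. The overlap weight (e.g. 3/8 shared
    actors) scales the factor toward 1 -- an illustrative assumption."""
    eff = 1.0 + (factor - 1.0) * overlap
    return relevance * eff if positive else relevance / eff
```

With full overlap, repeated identical decisions double the relevance; opposite decisions halve it; partial overlap (e.g. 3/8) softens both effects.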
- the present invention relies on a multi-dimensional scaling (MDS) component [ 425 ] to compute the visualization layout [ 305 ] for such datasets.
- the multi-dimensional scaling component [ 425 ] uses an incremental multi-dimensional scaling algorithm that is an improvement over a method described in [Morrison 2003].
- the algorithm uses a sliding time window of period T.
- the size of T is taken as 3 months, which is the characteristic scale of a stable human behavior along a particular trait, and thus provides a reliable baseline from which deviations representing anomalous behavior can be detected.
- the continuous MDS computation is initialized by choosing a core subset of data items, using random sampling from the sliding window ending at the current time, and computing a static layout for those items.
- this initial computation uses a spring algorithm initialized with a random layout.
- the core subset is of size 300,000 data items
- the window is of length 3 months
- the window is updated on a weekly basis.
- the present invention uses an incremental multi-dimensional scaling algorithm to produce an updated layout each week without running a full static MDS algorithm over all 300,000 data items.
- the core subset is used to generate a layout using the basic, static spring-force algorithm.
- this core subset serves as a basic layout and the position of its member items is not modified.
- the position of items in the core subset is only updated if the removed item belonged to the core subset.
- the continuous MDS computation in this invention uses a variation of [Morrison, 2003] that improves on the interpolation method.
- the original method places the item to interpolate on a circle around the nearest core subset item, at an angle obtained through minimization of an error criterion on the difference between high-dimensional and low-dimensional distances against all core subset items; also, it computes a sample to refine the position of the interpolation item using an expensive method, even though the sample obtained rarely varies.
- This method is too computationally intensive to be run in real time, and has additional limitations—the main one being that it only works in two-dimensional spaces. Our proposed method is much faster and, on most datasets, gives results as good as the original for all error criteria used in evaluation.
- Item removal proceeds by the following steps:
- This method of performing item removal allows the basic layout to change over time while still preserving the original layout.
- the sequence viewer [ 440 ] is a real-time data visualization included in this invention that provides the user with a synthetic, continuously updated view of a potentially massive number of event sequences [ 166 ] (which typically reflect individual and collective behavior) by leveraging the visualization layout [ 355 ] produced by the multi-dimensional scaling (MDS) component [ 425 ] previously described.
- sequence viewer [ 440 ] is thus a unique and highly efficient visual representation of massive datasets that also has insightful analytical benefits. More precisely, it offers the following advantages:
- the sequence viewer [ 415 ] presents the user with a potentially huge number of sequences [ 166 ] of any type of events [ 100 ] in a compact and synthetic manner.
- sequence viewer [ 415 ] produces a layout that groups together event sequences [ 166 ] following the same pattern and brings closer together sequences [ 166 ] that exhibit similar patterns.
- unusual sequences stand out from the rest of the data and can be analyzed further, where unusual sequences [ 166 ] are understood as sequences [ 166 ] that do not belong to a cluster of sequences [ 166 ] matching a dominant pattern.
- sequence viewer [ 440 ] is used to display instances of a business workflow
- unusual sequences [ 166 ] correspond to deviations from the typical steps in the realization of an ad-hoc workflow process [ 128 ], or from a non-compliant realization of a formal process [ 128 ].
- the sequence viewer [ 415 ] increases the completeness of any categorization scheme by leveraging the structure represented by the underlying sequences [ 166 ], especially when they are based on discussions [ 136 ]. This is because the sequence viewer [ 415 ] can categorize items [ 122 ] which would otherwise have been missed, for example items [ 122 ] with very low text content. These items [ 122 ] usually cannot be categorized based on their sole content, however discussions [ 136 ] reveal different types of causal relationships items have with other items [ 122 ], such as messages constituting actions that agree, disagree, approve, or reject a topically relevant issue found in other items of the same discussion [ 136 ]. The sequence viewer [ 415 ] can even reveal to the user the existence of an altogether previously unknown workflow process [ 128 ].
- An event sequence [ 166 ] can represent any list of time-stamped events [ 100 ]. Possible types for such events [ 100 ] include but are not limited to: emails and other communications [ 123 ] (time-stamped using the date at which the communication [ 123 ] was sent), loose documents (which can be time-stamped using the last modification date), or external events [ 170 ] (using a natural time-stamp definition).
- the superset of all events [ 100 ] can be optionally filtered by any mechanism relevant to the particular application scenario, following which sequences are defined as lists of items [ 122 ] that pass the filter.
- discussions [ 136 ] provide the basis for defining event sequences [ 166 ]: for example, if the user can select a particular actor [ 220 ], the event sequences [ 166 ] will consist of all discussions [ 136 ] containing at least one data item [ 122 ] involving that actor [ 220 ].
- Once event sequences [ 166 ] are defined, a tagging scheme is applied to determine a unique item tag [ 142 ] for each element within an event sequence [ 166 ] and produce tagged sequences [ 138 ].
- This scheme can be defined in any way relevant to the particular application scenario.
- tagging is defined as an ordered list of queries [ 168 ] that are continuously run on the incoming stream of events [ 100 ], each query [ 168 ] being associated with a tag or label.
- Examples of tagging schemes include but are not limited to:
- the tagging scheme optionally includes an alignment position definition, which determines a position in each event sequence [ 166 ] that will be used for horizontal positioning of each sequence [ 166 ] in the sequence viewer's [ 415 ] layout.
- the alignment position is by default defined as the first item [ 122 ] matched by the highest-ranked query [ 168 ], and can be overridden by the user with another definition more specific to the particular type of events [ 100 ] considered.
- the sequence viewer [ 415 ] uses the result of a continuous MDS computation to produce a one-dimensional layout of those tagged sequences [ 138 ].
- an incremental one-dimensional MDS algorithm as defined in this invention is used and provided by the MDS component [ 425 ], in which an optimization is introduced to speed up the layout for this particular profile of data.
- This optimization phase stems from observing that over a very large number of tagged sequences [ 138 ] that represent common events [ 100 ], such as those associated with steps in a particular workflow process [ 128 ], many tagged sequences [ 138 ] will be identical.
- the MDS algorithm provided by the MDS component [ 425 ] is a modified version of the previously-described method: it maintains a cache of already-positioned data points (i.e.
- tagged sequences [ 138 ] so that for each data point incrementally added to the layout in the sliding-window-based [ 380 ] MDS algorithm, if that data point already exists in the cache, a counter is incremented but no new position is computed. Conversely, whenever a data point (i.e. a tagged sequence [ 138 ]) leaves the sliding window [ 380 ] and must be deleted from the layout, that counter is decremented and positions are only updated if that point belonged to the core subset in the MDS algorithm and the counter has reached zero.
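- The duplicate-point cache described above can be sketched as follows; the class and callback names are illustrative, not taken from the disclosure:

```python
class CachedLayout:
    """Sketch of the duplicate-point cache: identical tagged sequences
    share one layout position guarded by a reference counter, so a
    position is computed only for the first copy of a point, and
    deleted only when the last copy leaves the sliding window."""
    def __init__(self, place):
        self.place = place   # callback computing a position for a new point
        self.cache = {}      # point -> [position, reference count]

    def add(self, point):
        if point in self.cache:
            self.cache[point][1] += 1          # duplicate: bump the counter
        else:
            self.cache[point] = [self.place(point), 1]
        return self.cache[point][0]

    def remove(self, point):
        entry = self.cache[point]
        entry[1] -= 1
        if entry[1] == 0:                      # last copy left the window
            del self.cache[point]
```

Since many tagged sequences representing common workflow steps are identical, most additions and removals reduce to a counter update, avoiding any layout recomputation.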
- the metric definition is also an input to the MDS algorithm.
- the distance definition used in the MDS computation is the shared subsequence distance, which is defined as follows: for two tagged sequences [ 138 ] S 1 and S 2 , using their respective alignment positions compute the number of identically tagged items [ 122 ] in each tagged sequence [ 138 ] at the same position. Let us call this number c(S 1 , S 2 ). Then the distance between S 1 and S 2 is defined as
- length_left(S) is the number of items [ 122 ] in the tagged sequence [ 138 ] S occurring prior to the alignment position and length_right(S) the number of items in S occurring after the alignment position.
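- The shared subsequence distance can be sketched as below. Since the exact normalization formula does not survive in the text above, the normalization used here (dividing twice the shared count by the combined sequence lengths, so identical aligned sequences are at distance 0) is an assumption:

```python
def shared_count(s1, s2, a1, a2):
    # c(S1, S2): identically tagged items at the same offset relative
    # to each sequence's alignment position (a1, a2 are indices).
    c = 0
    for off in range(1, min(a1, a2) + 1):               # left of alignment
        if s1[a1 - off] == s2[a2 - off]:
            c += 1
    for off in range(min(len(s1) - a1, len(s2) - a2)):  # alignment & right
        if s1[a1 + off] == s2[a2 + off]:
            c += 1
    return c

def shared_subsequence_distance(s1, s2, a1, a2):
    # Assumed normalization: 0 for identical aligned sequences,
    # 1 for sequences sharing no identically tagged positions.
    return 1.0 - 2.0 * shared_count(s1, s2, a1, a2) / (len(s1) + len(s2))
```

Sequences following the same pattern thus end up close in the MDS layout, while sequences deviating from dominant patterns are pushed away.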
- a number of patterns are defined by the user in the form of regular expressions p 1 , . . . , p k using as symbols the list of all available item tags [ 142 ].
- These regular expressions are taken to represent particularly significant patterns of events [ 100 ], such as sequences [ 138 ] known to represent anomalous events [ 100 ], or alternatively to represent nominal, standard sequences [ 138 ].
- the distance between two tagged sequences [ 138 ] S 1 and S 2 is then computed as the L2 norm of the vectors P(S 1 ) and P(S 2 ) where for any tagged sequence [ 138 ] S, P(S) is a vector of length k where the i-th component is 0 if S does not match pattern pi and 1 if it does match pi.
- weights can be associated with each dimension in the L2 norm computation in order to reflect different levels of importance among the patterns.
- the distance between two tagged sequences [ 138 ] is defined as a weighted combination of the shared subsequence distance and of the shared patterns distance.
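- A minimal sketch of the shared patterns distance follows, assuming the L2 norm is taken of the component-wise difference P(S1) − P(S2) and that item tags are single symbols concatenated into a string for regular-expression matching (both are illustrative assumptions):

```python
import math
import re

def pattern_vector(tags, patterns):
    # P(S): component i is 1 if the concatenated tag string of S
    # matches regular expression p_i, else 0.
    s = "".join(tags)
    return [1.0 if re.search(p, s) else 0.0 for p in patterns]

def shared_patterns_distance(t1, t2, patterns, weights=None):
    # Weighted L2 norm of P(S1) - P(S2); uniform weights by default.
    w = weights or [1.0] * len(patterns)
    v1 = pattern_vector(t1, patterns)
    v2 = pattern_vector(t2, patterns)
    return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, v1, v2)))
```

A composite distance, as mentioned above, would then be a weighted combination of this value and the shared subsequence distance.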
- the continuous sequence viewer [ 415 ] is regularly updated so as to organize tagged sequences [ 138 ] analyzed in a sliding time window [ 380 ].
- real-time sequence layout computation relies on two types of update operations: major updates and minor updates.
- a full multi-dimensional algorithm computation is performed on an initial data set consisting of tagged sequences [ 138 ] in the sliding window [ 380 ].
- the algorithm used comprises an iterative step where a subset of tagged sequences [ 138 ] sampled from the whole data set is positioned in a number of iteration loops, and an interpolation step in which all tagged sequences [ 138 ] outside the sampled subset are positioned by comparing them only to their closest neighbors among subset sequences [ 138 ].
- the running time for this step is O(S^2), where S is the size of the interpolation subset.
- a list of new tagged sequences [ 138 ] is added to the data set and a list of old tagged sequences [ 138 ] is removed from the data set.
- New sequences [ 138 ] are positioned using the interpolation step only. An old sequence [ 138 ] can be removed without repositioning other sequences [ 138 ] if it did not belong to the initial sampled subset. Otherwise, the impact of its removal on its closest neighbors needs to be taken into account.
- the running time for this step is O(W^(3/2)), where W is the size of the data entering or leaving the window in a steady-state regime.
- the default embodiment of this invention performs minor updates using the modified spring-based multi-dimensional scaling algorithm included in the continuous MDS component [ 425 ].
- FIG. 34 shows a graphical depiction of the sequence viewer [ 415 ] that displays the results of computing the layout over a set of tagged sequences [ 138 ] as explained previously.
- This visualization is composed of several areas, which are described in the following.
- each tag [ 142 ] applied to categorize at least one item among all sequences [ 138 ] is assigned a specific color.
- the system assigns a random color using a random palette that maximizes the contrast between the most commonly occurring tags [ 142 ].
- a legend area shows an icon [ 3425 ] filled with the chosen solid color for each such tag [ 142 ]. In one embodiment, clicking on such an icon allows the user to modify the color assigned to that tag [ 142 ].
- the zoomed area shows a detailed view of the sequences [ 138 ] [ 3415 ] that are selected in the compressed area, i.e. that are covered by the zoom box [ 3405 ], which is a bounding box acting as a cursor, which allows the user to browse the set of sequences [ 138 ] vertically.
- sequence browsing [ 138 ] is done by using the up and down arrow keys.
- Each sequence [ 138 ] in the zoomed area is represented horizontally as a series of rectangular areas filled with a solid color selected for the tag [ 142 ] of the corresponding item [ 122 ].
- the compressed area [ 3410 ] shows a synthetic view of the whole set of sequences [ 138 ], optionally filtered by one or more criteria. Sequences [ 138 ] in the compressed area are aligned horizontally using the alignment position defined previously, just as in the zoomed area. However, because the compressed area shows a number of sequences [ 138 ] that is potentially several orders of magnitude larger than the height of the area in pixels, it relies on a color compression mechanism: a number of sequences [ 138 ] per pixel is computed, and each item [ 142 ] is drawn as a rectangle of unit height that has a color defined using a method called a blending scheme.
- a blending scheme takes as input the color of all items [ 142 ] represented at a particular integer position on a one-pixel horizontal line in the compressed area.
- the blending scheme is obtained by computing the average in the integer-valued RGB space of all input colors.
- the blending scheme computes the average in the decimal-valued HSB space of all input colors.
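- The first blending scheme (averaging in integer-valued RGB space) can be sketched as:

```python
def blend_rgb(colors):
    # Average blending in integer-valued RGB space: each output channel
    # is the mean of that channel over all input colors mapped to the
    # same pixel of the compressed area.
    n = len(colors)
    return tuple(sum(c[i] for c in colors) // n for i in range(3))
```

The HSB variant works the same way but averages hue, saturation, and brightness as decimal values instead, which tends to preserve perceived hue better when many tags share a pixel.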
- Another legend area lists the patterns [ 3430 ] that were used in the inter-sequence distance definition when using the shared patterns distance or a composite distance definition.
- search and filter area [ 3435 ] lets the user filter the set of sequences [ 138 ] displayed in the visualization.
- search and filter criteria are provided:
- An alternative sequence viewer [ 415 ] display allows the user to specify a past date for a sequence-layout snapshot with which to compare a snapshot of the current sequence layout results.
- two past dates can be specified.
- snapshots will be computed over a period of time equal to the length of the sliding time window [ 380 ]. This duration can be overridden by the user by specifying a longer time period.
- FIG. 35 shows an example of a sequence snapshot contrast viewer.
- a legend area [ 3510 ] indicates the color assigned to each tag [ 142 ] used to label the items [ 122 ] within each tagged sequence [ 138 ], as in the continuous sequence visualization.
- the compressed area is similar to the one described for the continuous variant, except that it is split vertically into two sections: the upper section [ 3515 ] shows sequences [ 138 ] in the older timeframe, while the lower section [ 3520 ] shows sequences [ 138 ] in the more recent timeframe.
- This contrast visualization lets a user compare salient characteristics of the underlying sequences [ 138 ] (which in turn can represent workflow instances [ 134 ]). In particular, large differences in the number of sequences [ 138 ] are often meaningful, since the contrasted periods have the same duration. The typical length of a sequence [ 138 ] in the two time periods is another feature of interest. Also, choosing a color palette that clearly differentiates two or more categories of tags [ 142 ] lets a user eyeball the dominant tags in each set of sequences [ 138 ], the order in which they appear, etc.
- the resulting visualization dramatically outlines any significant increase or decrease in that actor's [ 220 ] level of engagement in realizing the workflow process [ 128 ].
- sequence snapshot contrast is useful for providing an intuitive and straightforward visualization of the differences between the dominant patterns and the unusual sequences [ 138 ] around dates of particular interest, for example dates at which suspicious activities have been independently detected by another mechanism. This feature is particularly appropriate when the dates to compare are far apart, since the continuously updated view described previously does not provide such a perspective in that case.
- the alias usage browser [ 478 ] is a continuous visualization used by the present invention to efficiently and intuitively display the results of actor [ 220 ] analysis on a continuous basis.
- Actor alias identification and persona [ 230 ] identification are described in U.S. Pat. No. 7,143,091. These analysis methods allow the system to disambiguate electronic identities [ 235 ] based on the electronic aliases [ 240 ] they use to communicate on various channels [ 156 ], and to assign each actor [ 220 ] one or more personae [ 230 ], reflecting the fact that an actor [ 220 ] might project different personae [ 230 ] depending on the topic she is communicating about, with whom she is communicating, and more generally in what context and circumstances.
- the alias usage browser can be used both reactively and in a proactive, investigative manner. In particular, it lets a user drill down into the whole set of electronic identities [ 235 ] used by a specific actor [ 220 ].
- the alias usage browser [ 478 ] lets a user drill down into the data and filter that data in a very flexible way.
- the alias usage browser [ 478 ] relies on the following features resulting either from data and metadata extraction during the processing phase, or from analysis during the post-processing phase (such as alias identification and persona [ 230 ] identification):
- the user chooses what features are shown as rows and which ones are shown as columns.
- a time period of interest can be selected, otherwise the system will consider and play back all historical data processed and analyzed until the present time. Then the user has the possibility to fully specify the list of rows and the list of columns, or can let the system derive the most meaningful associations.
- the automatic derivation of matrix components, i.e. of groups of row elements and column elements
- the algorithm takes as input parameter the partition size (i.e. the number of groups), starts with initial random assignments of rows and columns among groups of identical size, and upon each iteration, re-assigns successively each row and each column so as to minimize the sum of cross-entropies over the whole partition.
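- A toy sketch of this iterative reassignment is given below; for brevity it reassigns rows only (the disclosure also reassigns columns), and the round-robin initialization, profile definition, and smoothing constant are illustrative assumptions:

```python
import math

def _cross_entropy(p, q, eps=1e-9):
    # H(p, q) for discrete distributions; q is smoothed to avoid log(0).
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def _profile(rows, width):
    # Normalized column profile of a group of matrix rows.
    totals = [0.0] * width
    for r in rows:
        for j, v in enumerate(r):
            totals[j] += v
    s = sum(totals)
    return [t / s for t in totals] if s else [1.0 / width] * width

def cocluster_rows(matrix, n_groups, n_iter=10):
    # Each row is successively moved to the group whose column profile
    # minimizes cross-entropy with the row's own profile.
    width = len(matrix[0])
    assign = [i % n_groups for i in range(len(matrix))]
    for _ in range(n_iter):
        for i, row in enumerate(matrix):
            p = _profile([row], width)
            best, best_ce = assign[i], float("inf")
            for g in range(n_groups):
                members = [r for j, r in enumerate(matrix)
                           if j != i and assign[j] == g]
                ce = _cross_entropy(p, _profile(members, width))
                if ce < best_ce:
                    best, best_ce = g, ce
            assign[i] = best
    return assign
```

Rows with similar communication profiles end up in the same group, which is what lets the browser surface the most meaningful row/column associations automatically.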
- the system lets the user go through these groups using navigation controls.
- FIG. 36 shows the alias usage browser [ 478 ] at some point in time.
- the title of the visualization [ 3605 ] summarizes the bound features in the current state (in this case, an author has been selected whose communications [ 123 ] are analyzed, and the topic feature has been bound too), as well as the column feature (electronic identity [ 235 ] in this case, which is resolved to the actor [ 220 ] as described in the prior patent noted above) and the row feature (pragmatic tag [ 166 ] in this case, which is computed by the pragmatic tagging component [ 430 ]).
- each matrix is visualized as a heat map, a very common representation in which a color saturation is assigned to each cell [ 3620 ] to indicate the amount of communication [ 123 ] for the corresponding row feature value and column feature value, as well as for all bound features.
- Alternative embodiments give other kinds of visual cues, such as the level of formality, or the intensity of negative or positive sentiment.
- Each matrix is animated and offers playback functionality available through a timeline control [ 3625 ], similar to those described in the sections on Stressful topics and Temperature gauges visualizations, so that the user can replay past events [ 100 ] at normal or faster speed.
- the system can determine the most anomalous time periods and highlight them for the user on the timeline control [ 3625 ] using a red color [ 3630 ]. In the default embodiment, this is done by leveraging the results of the cross-entropy computation described above. For a given matrix, the most anomalous periods correspond to local maxima of cross-entropy in the group considered. For the whole dataset, the most anomalous periods correspond to local maxima of the sum of cross-entropies over all groups.
- the user can interact with the alias usage browser [ 478 ] in multiple ways, including by drilling down and up into the data. More precisely, every time the alias usage browser [ 478 ] shows a particular matrix, the user can drill down into the data in three ways: by right-clicking on a column header [ 3610 ] (which binds the value of the column feature to that element and proposes a selection list for a new column feature among unbound features), on a row header [ 3615 ] (which binds the value of the row feature to that element and proposes a selection list for a new column feature among unbound features), or on a cell [ 3620 ] (which binds both the column and the row values).
- FIG. 37 illustrates the animation operated by the alias usage browser [ 478 ] whenever the user chooses to drill down into a particular cell [ 3705 ] of the current matrix by double-clicking on that cell.
- the cell which has been double-clicked corresponds to a pragmatic tag [ 166 ] “Expression of concern” and an electronic identity [ 235 ] “Katherine Bronson”.
- This means that the underlying communication [ 123 ] model is restricted to data items [ 122 ] matching these two criteria in addition to all other previously-bound features (which in this case were “Katherine Midas” as author and “Widget pricing” as topic).
- the rectangle bounding the double-clicked cell is expanded until it occupies the exact space of the original matrix.
- a new, zoomed-in matrix [ 3710 ] is then drawn in place of the original one, with new header values (corresponding to the values of the new row feature “Recipient” and to the values of the new column feature “Formality level” that have either been automatically selected by the system or specified by the user, as described above).
- each cell [ 3715 ] now represents aggregate information (such as communication [ 123 ] volume) for data items [ 122 ] having matched all bound features. Also, if the time animation was playing when the user double-clicked the cell, it is not interrupted and continues playing after the zoom-in animation completes. If it was paused, then it is still paused at the same point in time. Double-clicking on a row header [ 3615 ] works similarly, except that the animation is vertical only: the row corresponding to the double-clicked header is expanded until it occupies the whole height of the original matrix. The same is true for double-clicking on a column header [ 3610 ].
- the user can drill up by right-clicking on a column header [ 3610 ], on a row header [ 3615 ], or on a cell [ 3620 ].
- the system uses a continuous actor graph visualization to represent anomalies [ 270 ] of various types.
- the standard actor graph is decorated with one or more additional visual features.
- the standard continuous actor graph visualization displays actors [ 220 ] as nodes, and communications [ 123 ] and other events [ 100 ] as edges, as described in U.S. Pat. No. 7,143,091 in more detail.
- the display is updated at discrete intervals of time, the interval duration being continuously adjusted by the system: at the end of each interval, newly added edges are displayed, while previously existing edges are aged so that some of them disappear. In one embodiment, this adjustment is based on the following dynamic constraints:
- the system adjusts visualization update parameters (including the value of the interval duration and the decay rate of older edges) and can also decide to ignore some new edges during each interval to meet these constraints.
- Animated actor graph visualizations [ 471 ] are used to represent a variety of structural anomalies in the graph, both local and global, including but not limited to the following:
- this visualization consists of an animated graph representing changes in an individual actor's [ 220 ] position relative to other actors [ 220 ] in the network, and can be used for example to demonstrate the shift of that actor's attention towards outbound communications [ 123 ] (e.g. with actors outside the corporation).
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application No. 61/280,791 filed on Nov. 6, 2009, the full disclosure of which is incorporated by reference herein for all purposes.
- The present disclosure falls into the general area of anomaly detection, and more particularly, anomaly in human behavior analyzed from electronic data. It can be applied to a number of domain-specific scenarios such as compliance monitoring or risk management in high-risk domains such as investment banking or intelligence. It is also applicable to malicious insider detection in areas ranging from corporate theft to counter-intelligence or the broader intelligence community.
- Anomaly detection is a general goal that can fulfill a number of missions, ranging from the prevention of activities perpetrated by malicious insiders to unintentional threats, as well as more generally managing human and operational risk.
- Previous approaches to anomaly detection have included:
- Rule-based compliance systems based on keyword matches or content identification (for example using a fingerprinting technology).
- Correlation engines where different types of events are put in relation and the rules to trigger an alert are somewhat more flexible than the previous type of systems (producing for example one-dimensional anomalies based on histograms).
- There are essential limitations shared by all these systems. One of the main limitations is that they can be circumvented due to the static character of the rules used to configure them, so that they are unable to cover the infinite spectrum of potentially damaging behaviors and actions that can take place in a large organization.
- In addition, even the most efficient of these previous systems are restricted to a vertical business domain, as the definition of the rules and correlation methods is highly dependent on the underlying business data and processes. Furthermore, even very domain-specific work is usually done in a cross-media, cross-system manner, in which (for example) emails, instant messages and phone calls can be an integral part of processes which in theory are implemented with a dedicated application. Nevertheless such unstructured, out-of-band communication is usually ignored by existing systems. An example of such a vertical domain is the banking and financial services industry in which compliance systems are very often tied to specific transactional systems.
- Also, the fact that these systems rely on rules or patterns which, even when they are learned automatically by a machine, correspond to confirmed signs of malicious or otherwise risk-carrying activities, implies that they only detect these events after the fact, and that there is no way to anticipate them. This is a major limitation in their detection capabilities, since recent studies of espionage and IT sabotage cases have shown that nearly half of malicious insiders exhibited some inappropriate, unusual, or concerning behavior prior to the incident, but had no recorded incidents of violating organizational policies.
- Most of these systems are limited in their analysis scope to data sources located within a particular organization, without the capability to take into account external sources such as public websites or personal communication channels [156] on which insiders also interact and exchange information.
- Finally, most anomaly detection systems only report threats once they have harmed the organization or its employees. In particular, malicious insiders are typically flagged once they have perpetrated their actions (when they are flagged) and the associated damage can then only be mitigated rather than prevented.
- The present disclosure addresses the prior limitations noted above.
- Firstly, the present system is considerably more difficult, and in most situations almost impossible, to circumvent. This is because it builds a holistic model, which covers both structured and unstructured data, but which also covers human behavior, including communications [123] among people and interactions between people and data. That model can be integrated with the organization's infrastructure to provide complete coverage of human and machine activities. A key aspect of this model is the determination of regularity in behavior, both at the individual and group level; behavior, especially in an organizational context, is anything but random. This disclosure is data-driven, and not primarily rule-driven, so that it can automatically adapt to changes in the analyzed data or in the environment, including when human behavior itself is changing. (It should be noted however that this disclosure also supports the definition of rules as an essential requirement, since they are an intrinsic part of many regulatory or internal policies in place in large organizations.)
- Secondly, the problem of applicability scope mentioned above is also addressed by the present disclosure, as it is suitable for any type of domain. This is not only because it does not require the explicit definition of a set of rules or anomalous patterns, but also because it handles unstructured data in a generic manner, and is able to correlate that data with the type of structured data to which vertical anomaly detection systems are often restricted.
- Thirdly, the present disclosure establishes a multi-dimensional behavioral model that is based on normalcy, i.e. on how people habitually communicate, work, and interact, rather than on expected behavior defined by a predefined number of dimensions. Every risk-generating activity is characterized by at least some change in behavior: even malicious insiders who go to great lengths to circumvent detection mechanisms in place will leave some trace in electronic form that will result in a deviation from baseline behavior—whether from their own established baseline or that of their peer group—even when no rule has been violated, and when the malicious activities in question do not fit any scenario known a priori.
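By way of a non-limiting, hypothetical illustration of the deviation-from-baseline principle described above (this sketch is not the disclosure's implementation, which is multi-dimensional), a single behavioral metric can be scored against a baseline sample, where the baseline is either the actor's own history or the values observed for a comparable peer group:

```python
import statistics

def deviation_score(actor_value, baseline_values):
    """Z-score of an actor's current behavioral metric against a baseline
    sample, which may be the actor's own history or the values observed
    for a comparable peer group over the same period (hypothetical sketch)."""
    mean = statistics.fmean(baseline_values)
    stdev = statistics.pstdev(baseline_values)
    if stdev == 0:
        return 0.0 if actor_value == mean else float("inf")
    return (actor_value - mean) / stdev

# An actor who habitually sends ~40 messages per day suddenly sends 95:
own_history = [38, 42, 40, 39, 41, 43, 37]
print(deviation_score(95, own_history))  # prints 27.5, far outside the baseline
```

A real deployment would combine many such features across temporal, organizational, and other dimensions rather than relying on a single metric.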
- Finally, in contrast to the reactive mode of operation of most anomaly detection systems, the present disclosure provides techniques that allow an organization to proactively detect such threats by spotting precursory signs of malicious activity, and thus intervene and prevent the threats from being executed. Such signs always precede the perpetration of the malicious insider's actions; however, they are only detectable by a system which performs holistic analysis and which detects any significant, and therefore possibly relevant, deviations from normal behavior—as opposed to a finite number of pre-defined rule violations (also known as patterns or “signatures”).
- The novel way in which the systems and methods of the present disclosure solve these and other common limitations relies heavily on the following characteristics:
- A holistic analysis of information, which takes into account relationships between data items, between comparable individuals and groups, between formal or informal processes, and across these different entities (for example, a method called document lifecycle analysis allows the present disclosure to model relationships between people and the data they exchange or manipulate).
- A rich, extensible mechanism to accurately model human behavior and to detect anomalies in this behavior with respect to temporal, organizational, or other dimensions.
- A scalable method of applying these behavior modeling and holistic information analysis capabilities to massive volumes of data, processed in a continuous manner.
- A unique combination of reactive and proactive anomaly detection mechanisms—this combination produces a manageable quantity of alerts that are consistent and directly interpretable. Without being able to constrain the number of alerts and manage their quality, an anomaly detection system such as the one described here would not be realistically deployable.
- Automatic quality assessment and performance guarantees obtained by auto-adjustment of the system, both with respect to data characteristics and to user input.
- An extensible application framework to efficiently visualize patterns derived from analyzing electronic data and human behavior, and to show how those patterns evolve over time and across the organization.
- In one embodiment, the present disclosure describes a system and a method to enable a holistic and continuous analysis of information to accurately model human behavior and to detect anomalies in this behavior with respect to, but not limited to, temporal, organizational, or other dimensions.
- The system is able to infer potentially damaging activities, whether of unintentional or malicious nature, without requiring the prior definition of the type and characteristics of these activities. It relies on the analysis of potentially massive volumes of heterogeneous electronic data (including both text-bearing and non-text-bearing records) stored inside or outside any organization. That analysis can be performed either in discrete increments or in real time. The present disclosure establishes structural and semantic patterns from the analyzed data and builds a predictive multi-dimensional model of both individual and collective behavior, which allows abnormal patterns in these behaviors to be detected as well.
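As a non-limiting, hypothetical sketch of the incremental (real-time) mode of analysis mentioned above, a single behavioral feature can be tracked with an exponentially weighted running baseline that is updated in constant time per observed event (the class name and parameters below are illustrative assumptions, not part of the disclosure):

```python
class ContinuousBaseline:
    """Exponentially weighted running baseline for one behavioral feature.

    Each new observed event updates the model in O(1); an anomaly is
    flagged when the observation deviates strongly from the running mean.
    """

    def __init__(self, alpha=0.05, threshold=3.0):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # deviation (in stdevs) that raises an alert
        self.mean = None
        self.var = 0.0

    def observe(self, value):
        if self.mean is None:       # the first event initializes the baseline
            self.mean = value
            return False
        diff = value - self.mean
        anomalous = self.var > 0 and abs(diff) > self.threshold * self.var ** 0.5
        # standard exponentially weighted update of mean and variance
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous
```

Feeding a steady stream of daily activity counts produces no alerts, while a sudden spike is flagged; the disclosure's actual model correlates many such features across actors, groups, and data types.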
- Some embodiments of this system are intended to run at a much smaller scale than much of what is described here. The smallest scale is at the level of individual personal use, to observe or monitor one's personal ecosystem through all available communication channels, including social media such as Facebook or Twitter. Such personal usage would not include a compliance system for example, but could be used to do things such as observe shifts in popularity, influence, stress within one's circle, and other things that are described in this application and in previous ones. Somewhat larger scale usage for small business would likewise be unlikely to have enterprise systems present such as an HR system, an accounting system, or a compliance system, but could nevertheless very beneficially be used to do things like identify which processes were causing friction amongst the staff, who is “buck passing”, etc.
- Some embodiments will therefore make suggestions directly to the user based on evidence. To take a simple example, in single user use, the system could point out that the user was very slow to respond to their grandmother—maybe it is time to call her or send her mail; that the user tends to “yell” about certain topics and should avoid doing so in future; or that they are spending way more time this year than they did last year on the NCAA betting pool during working hours, etc.
- The present disclosure is illustrated by way of example, and not by way of limitation, in the accompanying drawings which are described below. Like reference numerals may refer to similar elements and the figures may not be drawn to scale.
FIG. 1 is a block diagram of the main data objects analyzed or created by the system in accordance with an embodiment of the present disclosure. -
FIG. 2 is a block diagram of the main concepts used for behavioral modeling and anomaly detection in accordance with an embodiment of the present disclosure. -
FIG. 3 is a block diagram of the main data structures presented to a user of the system in accordance with an embodiment of the present disclosure. -
FIG. 4 is an architecture diagram of the system in accordance with an embodiment of the present disclosure. -
FIG. 5 is a block diagram of the different data collection modes in accordance with an embodiment of the present disclosure. -
FIG. 6 is a block diagram showing the structure of the collection repository content in accordance with an embodiment of the present disclosure. -
FIG. 7 is a flowchart describing the continuous processing of new data in accordance with an embodiment of the present disclosure. -
FIG. 8 is a block diagram showing the different types of anomalies generated by the system in accordance with an embodiment of the present disclosure. -
FIG. 9 is a flowchart of the continuous periodic pattern detection process in accordance with an embodiment of the present disclosure. -
FIG. 10 is a flowchart of the feature collection process in accordance with an embodiment of the present disclosure. -
FIG. 11 is a flowchart of the periodic pattern frequency component update process in accordance with an embodiment of the present disclosure. -
FIG. 12 is a block diagram of a computer memory hierarchy example in accordance with an embodiment of the present disclosure. -
FIG. 13 is a block diagram of a computer network memory hierarchy example in accordance with an embodiment of the present disclosure. -
FIG. 14 is a flowchart showing the continuous categorization process in accordance with an embodiment of the present disclosure. -
FIG. 15 is a block diagram showing the different types of categorization components in accordance with an embodiment of the present disclosure. -
FIG. 16 is a block diagram of the categorization data model in accordance with an embodiment of the present disclosure. -
FIG. 17 is a flowchart of the pragmatic tagging component in accordance with an embodiment of the present disclosure. -
FIG. 18 is a state diagram of the pragmatic workflow model in accordance with an embodiment of the present disclosure. -
FIG. 19 is a flowchart showing a high-level process for detecting both textblock patterns and textblock hits in accordance with an embodiment of the present disclosure. -
FIG. 20 is a flowchart showing the process for producing a textblock graph, i.e. a graph of transitions between n-grams in a sliding window of size k, in accordance with an embodiment of the present disclosure. -
FIG. 21 is a flowchart showing the process for isolating textblock patterns from the textblock graph in accordance with an embodiment of the present disclosure. -
FIG. 22 is a flowchart showing the process for finding textblock hits in items within the universe in accordance with an embodiment of the present disclosure. -
FIG. 23 is a flowchart showing an alternate process for finding textblock patterns which uses bounded constant-access memory in accordance with an embodiment of the present disclosure. -
FIG. 24 is an illustration showing how n-gram transition edges are added to the textblock graph based upon a particular series of tokens in accordance with an embodiment of the present disclosure. -
FIG. 25 is an illustration showing an example of how local clusterability is calculated in accordance with an embodiment of the present disclosure. -
FIG. 26 is an illustration showing a method for limiting the size of the graph to be examined by only considering n-grams following function words in accordance with an embodiment of the present disclosure. -
FIG. 27 is an illustration showing a method for limiting the size of the graph to be examined by winnowing the list of n-grams to be considered in accordance with an embodiment of the present disclosure. -
FIG. 28 is a graph of information dissemination profiles computed by the system in accordance with an embodiment of the present disclosure. -
FIG. 29 is a block diagram showing the different types of features used for anomaly detection in accordance with an embodiment of the present disclosure. -
FIG. 30 is a block diagram showing the referential types used for anomaly detection in accordance with an embodiment of the present disclosure. -
FIG. 31 is a state diagram for anomalies generated by the system in accordance with an embodiment of the present disclosure. -
FIG. 32 is a flowchart for defining an anomaly class when entering user feedback in accordance with an embodiment of the present disclosure. -
FIG. 33 is a flowchart of the user feedback process for an anomaly by deviation in accordance with an embodiment of the present disclosure. -
FIG. 34 is a drawing of the continuous sequence viewer in accordance with an embodiment of the present disclosure. -
FIG. 35 is a drawing of the sequence snapshot contrast viewer in accordance with an embodiment of the present disclosure. -
FIG. 36 is a drawing of the alias usage browser in accordance with an embodiment of the present disclosure. -
FIG. 37 is a drawing illustrating the navigation through the alias usage browser in accordance with an embodiment of the present disclosure. -
FIG. 38 is a Hasse diagram showing an example of emotional intensity assessment in accordance with an embodiment of the present disclosure. -
FIG. 39 is a drawing of the animated graph of attention shift in accordance with an embodiment of the present disclosure. -
FIG. 40 is a drawing of the animated graph of delegation pattern changes in accordance with an embodiment of the present disclosure. -
FIG. 41 is a drawing of the animated graph of clique evolution in accordance with an embodiment of the present disclosure. -
FIG. 42 is a drawing of the continuous gap viewer for a single periodic pattern in accordance with an embodiment of the present disclosure. -
FIG. 43 is a drawing of the continuous gap viewer for correlated periodic patterns in accordance with an embodiment of the present disclosure. -
FIG. 44 is a drawing of the alert timeline visualization in accordance with an embodiment of the present disclosure. -
FIG. 45 is a drawing of the behavior-based alert visualization in accordance with an embodiment of the present disclosure. -
FIG. 46 is an illustration of the animation of the behavior-based alert visualization in accordance with an embodiment of the present disclosure. -
FIG. 47 is an illustration of the effect of behavioral metric tuning in the behavior-based alert visualization in accordance with an embodiment of the present disclosure. -
FIG. 48 is a screenshot of one embodiment of the social you-niverse visualization in accordance with an embodiment of the present disclosure. -
FIG. 49 is a screenshot of one embodiment of the social you-niverse visualization depicting a solar system around a star in accordance with an embodiment of the present disclosure. -
FIG. 50 is a screenshot of one embodiment of the social you-niverse visualization depicting icons or other visual indicators for distance in accordance with an embodiment of the present disclosure. -
FIG. 51 is a screenshot of one embodiment of the social you-niverse visualization depicting galaxies in accordance with an embodiment of the present disclosure. -
FIG. 52 is a screenshot of one embodiment of the social you-niverse visualization depicting planets orbiting more complex structures in accordance with an embodiment of the present disclosure. -
FIG. 53 is a screenshot of one embodiment of the social you-niverse visualization depicting binary or multiple stars in accordance with an embodiment of the present disclosure. -
FIG. 54 is a screenshot of one embodiment of the social you-niverse visualization depicting nebulas in accordance with an embodiment of the present disclosure. -
FIG. 55 is a screenshot of one embodiment of the social you-niverse visualization depicting an interstellar cloud of dust in accordance with an embodiment of the present disclosure. -
FIG. 56 is a screenshot of one embodiment of the social you-niverse visualization depicting a supernova explosion in accordance with an embodiment of the present disclosure. -
FIG. 57 is a screenshot of one embodiment of the social you-niverse visualization depicting gravitational pull on outer planets in accordance with an embodiment of the present disclosure. -
FIG. 58 is a screenshot of one embodiment of the social you-niverse visualization depicting wobbling planets in accordance with an embodiment of the present disclosure. -
FIG. 59 is a screenshot of one embodiment of the social you-niverse visualization depicting orbits that are stretched in the regions contiguous to the solar system that is exerting the pull in accordance with an embodiment of the present disclosure. -
FIG. 60 is a screenshot of one embodiment of the social you-niverse visualization depicting disappearance of planets from the galaxy or universe in accordance with an embodiment of the present disclosure. -
FIG. 61 is a screenshot of one embodiment of the social you-niverse visualization depicting how solar systems which are exhibiting the greatest degree of change shift automatically towards the visual center of the screen, so as to make themselves more visible to the user, in accordance with an embodiment of the present disclosure. -
FIG. 62 is a screenshot of one embodiment of the social you-niverse visualization depicting the user's ability to specify which types of changes are displayed in accordance with an embodiment of the present disclosure. -
FIG. 63 is a screenshot of one embodiment of the social you-niverse visualization depicting a planet pulled into a new solar system with a trail of dust or other visual artifact to call attention to itself in accordance with an embodiment of the present disclosure. -
FIG. 64 is a screenshot of one embodiment of the social you-niverse visualization depicting clouds of dust used to cloak planets which represent individuals about whom little is known in accordance with an embodiment of the present disclosure. -
FIG. 65 is a screenshot of one embodiment of the social you-niverse visualization depicting dust rendered over as much space as necessary, for as long as necessary, in order to accurately portray the extent and duration of the data loss in accordance with an embodiment of the present disclosure. -
FIG. 66 is a screenshot of one embodiment of the social you-niverse visualization depicting moons orbiting other planets which represent the “followers” or entourage of that actor in accordance with an embodiment of the present disclosure. -
FIG. 67 is a screenshot of one embodiment of the social you-niverse visualization depicting orbits relative speed in accordance with an embodiment of the present disclosure. -
FIG. 68 is a screenshot of one embodiment of the social you-niverse visualization in accordance with an embodiment of the present disclosure. -
FIG. 69 is a screenshot of one embodiment of the social you-niverse visualization depicting a planet gradually drifting out of the orbit of the current solar system and disappearing in accordance with an embodiment of the present disclosure. -
FIG. 70 is a screenshot of one embodiment of the social you-niverse visualization depicting two actors experiencing conflict and the planets representing them smashing together in accordance with an embodiment of the present disclosure. -
FIG. 71 is a screenshot of one embodiment of the social you-niverse visualization depicting concepts or topics instead of moons, stars, or planets in accordance with an embodiment of the present disclosure. -
FIG. 72 is a screenshot of one embodiment of the social you-niverse visualization depicting optional sound effects in accordance with an embodiment of the present disclosure. -
FIG. 73 is a screenshot of one embodiment of the temperature gauge visualization in accordance with an embodiment of the present disclosure. -
FIG. 74 is a screenshot of one embodiment of the temperature gauge visualization depicting the notion of “neutral” and various types of negative sentiments in accordance with an embodiment of the present disclosure. -
FIG. 75 is a screenshot of one embodiment of the temperature gauge visualization depicting expression of positive sentiments by having the midpoint in the gauge in accordance with an embodiment of the present disclosure. -
FIG. 76 is a screenshot of one embodiment of the temperature gauge visualization depicting emoticons of different kinds instead of temperature gauge icons in accordance with an embodiment of the present disclosure. -
FIG. 77 is a screenshot of one embodiment of the temperature gauge visualization depicting emoticons of different kinds instead of temperature gauge icons in accordance with an embodiment of the present disclosure. -
FIG. 78 is a screenshot of one embodiment of the stressful topics visualization depicting a matrix representation in which actors and topics are respectively represented in rows and columns in accordance with an embodiment of the present disclosure. -
FIG. 79 is a screenshot of one embodiment of the stressful topics visualization in accordance with an embodiment of the present disclosure. -
FIG. 80 is a screenshot of one embodiment of the stressful topics visualization depicting change over the course of time in accordance with an embodiment of the present disclosure. -
FIG. 81 is a screenshot of one embodiment of the stressful topics visualization depicting changes in individual rows and columns in a matrix in accordance with an embodiment of the present disclosure. -
FIG. 82 is a screenshot of one embodiment of the stressful topics visualization depicting a way to account for languages with word orderings other than left to right in accordance with an embodiment of the present disclosure. -
FIG. 83 is a screenshot of one embodiment of the stressful topics visualization depicting color designation for rows and columns which have been swapped over time in accordance with an embodiment of the present disclosure. -
FIG. 84 is a screenshot of one embodiment of the stressful topics visualization depicting ability to play a visualization which contains an arbitrary number of different matrices according to the same timeline in accordance with an embodiment of the present disclosure. -
FIG. 85 is a screenshot of one embodiment of the stressful topics visualization depicting user ability to select either/both matrices from different timeframes, and/or different timeframes from the same matrix and play these matrices all together in accordance with an embodiment of the present disclosure. -
FIG. 86 is a screenshot of one embodiment of the stressful topics visualization depicting display indicating the offset unit of time in accordance with an embodiment of the present disclosure. -
FIG. 87 is a screenshot of one embodiment of the stressful topics visualization depicting a heat map implementation in accordance with an embodiment of the present disclosure. -
FIG. 88 is a screenshot of one embodiment of the stressful topics visualization depicting the user's ability to determine whether they want to see a visual emphasis in accordance with an embodiment of the present disclosure. -
FIG. 89 is a screenshot of one embodiment of the stressful topics visualization in accordance with an embodiment of the present disclosure. -
FIG. 90 is a screenshot of one embodiment of the pecking order visualization in accordance with an embodiment of the present disclosure. -
FIG. 91 is a screenshot of one embodiment of the pecking order visualization depicting user ability to choose other animals in accordance with an embodiment of the present disclosure. -
FIG. 92 is a screenshot of one embodiment of the pecking order visualization depicting how the user may choose the animal type generally, or with respect to a particular actor, type of actor, or specific hierarchy instance, in accordance with an embodiment of the present disclosure. -
FIG. 93 is a screenshot of one embodiment of the pecking order visualization depicting how each individual pecking order is represented by a building in accordance with an embodiment of the present disclosure. -
FIG. 94 is a screenshot of one embodiment of the pecking order visualization depicting user ability to specify the left-to-right order in which the buildings are rendered from choices in accordance with an embodiment of the present disclosure. -
FIG. 95 is a screenshot of one embodiment of the pecking order visualization depicting the building being built in accordance with an embodiment of the present disclosure. -
FIG. 96 is a screenshot of one embodiment of the pecking order visualization depicting the building accumulating broken windows, graffiti, and other signs of disuse in accordance with an embodiment of the present disclosure. -
FIG. 97 is a screenshot of one embodiment of the pecking order visualization depicting ledges or levels designated with labels such as “vice president” in accordance with an embodiment of the present disclosure. -
FIG. 98 is a screenshot of one embodiment of the pecking order visualization depicting a chicken flying between the different pecking order instances in accordance with an embodiment of the present disclosure. -
FIG. 99 is a screenshot of one embodiment of the pecking order visualization depicting a chicken representing an actor who is no longer on the scene falling to the ground in a manner that clearly suggests it is dead in accordance with an embodiment of the present disclosure. -
FIG. 100 is a screenshot of one embodiment of the pecking order visualization depicting a chicken representing an actor who is no longer on the scene falling to the ground in a manner that clearly suggests it is dead, being carried away by vultures, and so on, in accordance with an embodiment of the present disclosure. -
FIG. 101 is a screenshot of one embodiment of the pecking order visualization depicting chickens (or other animals) ascending or descending from one level to the next according to the backing data in accordance with an embodiment of the present disclosure. -
FIG. 102 is a screenshot of one embodiment of the pecking order visualization depicting chickens ganging up on one or more other chickens if the actors they represent are engaged in an argument or power struggle in accordance with an embodiment of the present disclosure. -
FIG. 103 is a screenshot of one embodiment of the pecking order visualization in accordance with an embodiment of the present disclosure. -
FIG. 104 is a screenshot of one embodiment of the buck passing visualization in accordance with an embodiment of the present disclosure. -
FIG. 105 is a screenshot of one embodiment of the buck passing visualization, viewed as a graph in which two objects are connected together by an arc, in accordance with an embodiment of the present disclosure. -
FIG. 106 is a screenshot of one embodiment of the buck passing visualization depicting an arc that becomes thin enough as a result of lack of buck passing that it will simply disappear from the view in accordance with an embodiment of the present disclosure. -
FIG. 107 is a screenshot of one embodiment of the buck passing visualization depicting buck passing relationships which have expanded or contracted over the course of the available data in accordance with an embodiment of the present disclosure. -
FIG. 108 is a screenshot of one embodiment of the buck passing visualization depicting horizontally-aligned pairs of arrows which point at one another if the buck passing has diminished, and point in opposite directions if it has increased in accordance with an embodiment of the present disclosure. -
FIG. 109 is a screenshot of one embodiment of the buck passing visualization depicting horizontally-aligned pairs of arrows which point at one another if the buck passing has diminished, and point in opposite directions if it has increased in accordance with an embodiment of the present disclosure. -
FIG. 110 is a screenshot of one embodiment of the buck passing visualization depicting various visual treatments to illustrate the buck passing relationship in accordance with an embodiment of the present disclosure. -
FIG. 111 is a screenshot of one embodiment of the buck passing visualization depicting user ability to specify types of topics and ad hoc workflow processes that should not be considered as instances of buck passing in accordance with an embodiment of the present disclosure. -
FIG. 112 is a screenshot of one embodiment of the buck passing visualization depicting different classes of identifiable tasks that can be specified to have differing visual treatments by the user so as to make them easily distinguishable from one another in accordance with an embodiment of the present disclosure. -
FIG. 113 is a screenshot of one embodiment of the buck passing visualization depicting a different visual treatment for nodes that represent actors who have changed roles and the arcs that represent pre-existing buck-passing relationships in accordance with an embodiment of the present disclosure. -
FIG. 114 is a screenshot of one embodiment of the love life visualization in accordance with an embodiment of the present disclosure. -
FIG. 115 is a conceptual diagram of the hypergraph system in accordance with an embodiment of the present disclosure. -
FIG. 116 is a diagram of indexed data sources in accordance with an embodiment of the present disclosure. -
FIG. 117 is a diagram of indexed data sources in accordance with an embodiment of the present disclosure. -
FIG. 118 is a diagram of featured query operators in accordance with an embodiment of the present disclosure. -
FIG. 119 is a diagram of the query matching procedure in accordance with an embodiment of the present disclosure. -
FIG. 120 is a diagram of the discussion building process in accordance with an embodiment of the present disclosure. -
FIG. 121 is a diagram of a faceted evidence representation in accordance with an embodiment of the present disclosure.
- The present disclosure efficiently performs continuous monitoring of data produced or circulating within an entity or social network, whether within a specific organization or on the World Wide Web, especially when relying on a high volume of electronic data, and uses novel behavioral analysis techniques in order to detect, report, and/or make users aware of possibly preventable damaging events of an accidental or fraudulent nature.
FIGS. 1, 2, and 3 depict the key elements and concepts of the system described in accordance with an embodiment of the present disclosure.
- Event [100]: The central unit of analysis of the present disclosure. Depending on its origin, an event can be an observed event [102] exogenous to the system, a derived event [104] produced by the system, or user input [106] manually entered through an embodiment of the disclosure.
- Evidence [108] is derived by the system after collecting, processing, and analyzing events [100]. Evidence is represented by the system as OSFs [110], or order-sorted feature structures, for which grammar [112] rules can be defined. OSFs [110] are stored as a hypergraph [114] model in one embodiment.
- Token [116]: The smallest unit of analysis of the disclosure. In one embodiment, this atomic unit of analysis is a linguistic term. In another embodiment, it is a single character. N-grams [118] are contiguous sequences of tokens [116] of length n, where n is a fixed, pre-determined integer.
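The n-gram [118] construction above can be sketched as follows. This is an illustrative helper only, not the disclosure's implementation; the function name is chosen here for clarity.

```python
def ngrams(tokens, n):
    """Return the contiguous token sequences of length n (illustrative helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Token-level trigrams over a short message.
tokens = ["please", "approve", "the", "wire", "transfer"]
print(ngrams(tokens, 3))

# Character-level bigrams, for the embodiment where a token is a single character.
print(ngrams(list("risk"), 2))
```

The same helper covers both embodiments (linguistic terms or single characters) because the token type is opaque to the sliding window.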
- Pattern [120]: A model of a number of features shared by multiple items [122], representing the structure underlying those items [122]. Each such pattern [120] can be matched by one or more pieces of evidence [108] derived by the system. A textblock pattern [124] is a model for a contiguous block of text that is associated with one author and is substantive enough to potentially be treated as an independent object in the system. Textblock patterns [124] are derived by the system by building a textblock graph [160] which contains transitions between n-grams [118]. A periodic pattern [126] is a model of events [100] occurring at exactly or approximately regular intervals over time. A workflow process [128] is a formal or ad hoc source of evidence [108] that constrains specific events [100] to be performed according to a number of workflow stages [154].
- Discussion [136]: A possibly heterogeneous partially ordered set of electronic record items [122] for which it is presumed that any item [122] is causally related to all items [122] immediately following it by one or more sources of evidence [108].
- Items [122], such as electronic communications [123] collected from any kind of communication channel [156] or electronic documents [162] can, upon processing by this disclosure, be assigned a number of item tags [142] which are a form of metadata [140] computed by the system on the basis of categorization components [146] such as an ontology [148] composed of a set of ontology classifiers [150], or a topic detection [152] method. A discussion [136] in which items [122] have been marked using item tags [142] constitutes a tagged sequence [138].
- Actor [220]: A human or computer system which produces items [122] and is associated with one or more distinct electronic identities [235] such as email accounts, IM handles, system logins, etc. An actor [220] may be deemed to have more than one personality [230] if the content created or received by at least one of the different electronic identities [235] varies significantly from that of the others, where an electronic identity [235] can be an electronic alias [240] on some communication channel [156] or any reference to an individual, for example by name [245].
- A group [225] is a container object for actors [220]. A group can be a formal group [250], for example when an organizational chart is available to the system. Other types of groups [225] can be derived by the system from sociological and behavioral analysis, such as cliques [255], also known as circles of trust, which are sets of actors who consistently correspond in a closed loop with one another.
- A typed update [107] is a lightweight representation of an incremental change to a piece of evidence or an event that can be forwarded to different components. A typed update [107] references one or more pieces of evidence or events that are affected by the changes. Because the system deals with continuous and very large streams of data, forwarding typed updates [107] instead of resending whole results every time a piece of evidence or an event is updated greatly reduces the amount of traffic between components, especially when it comes to updating previously computed results. In some embodiments, typed updates [107] can take the form of deltas, or simple incremental data changes. In other embodiments, a typed update [107] can consist of a function or set of declarations and operations to be applied to existing data in order to mutate it.
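A delta-style typed update [107] of the kind described above can be sketched as follows. This is a minimal illustration assuming a simple set/unset delta shape; the field names and update format are hypothetical, not taken from the disclosure.

```python
import copy

def apply_typed_update(evidence, update):
    """Apply a delta-style typed update to an evidence record.

    The update carries only the incremental changes (fields to set or
    unset) rather than the whole record, mirroring the traffic-saving
    idea described above. Field names here are illustrative.
    """
    result = copy.deepcopy(evidence)          # leave the original untouched
    for field, value in update.get("set", {}).items():
        result[field] = value
    for field in update.get("unset", []):
        result.pop(field, None)
    return result

evidence = {"id": "ev-17", "score": 0.42, "tags": ["email"]}
delta = {"set": {"score": 0.61}, "unset": ["tags"]}
print(apply_typed_update(evidence, delta))
# → {'id': 'ev-17', 'score': 0.61}
```

The second embodiment mentioned above, where a typed update is a function to be applied to existing data, would simply replace the delta dictionary with a callable.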
- According to one aspect, anomaly detection means that the system is able to infer potentially damaging activities, whether of unintentional or malicious nature, without requiring the prior definition of the type and characteristics of these activities.
- According to one aspect, heterogeneous information analysis means that all kinds of electronic information stored in a corporation or another kind of organization can be processed by the system in a unified manner and consistently lend themselves to the detection of abnormal patterns in the data. This includes both text-bearing and non-text-bearing records.
- According to one aspect, multi-dimensional behavioral modeling means that in addition to analyzing pieces of information, the building of a predictive multi-dimensional model of both individual and collective behaviors allows detecting abnormal patterns in these behaviors as well.
- The present disclosure describes a method for building and continuously maintaining a behavioral model [200]. This model represents assessed behavior [205] which can be either individual behavior [210] or collective behavior [215]. In order to detect anomalies [270], the system establishes baseline behaviors [260] which are a synthetic representation of communication habits and normal interactions, then assesses deviations [265] by comparing assessed behaviors [205] to such a baseline. This allows the detection of anomalies in recent or past behavior, however the system also attempts to predict behavior [262] in the near future based on the behavioral model [200]. To compute that behavioral model [200], the disclosure relies on a number of behavioral traits [295] which are dimensions according to which an actor's [220] personality and behavior can be reliably measured, and are computed based on a set of behavioral metrics [290]. Actors [220] and their assessed behaviors [205] can then be evaluated on an absolute scale using scores [285], and on a relative scale using ranking [275] mechanisms, in order to assess the relevance [280] of any detected anomaly [270].
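One simple way to assess a deviation [265] of an assessed behavior [205] from a baseline [260] over a single behavioral metric [290], shown here purely as an illustration, is a standard-score comparison; the disclosure does not prescribe this particular formula.

```python
from statistics import mean, stdev

def deviation_score(history, observed):
    """Score how far an observed metric value deviates from its baseline,
    expressed in standard deviations from the historical mean.
    An illustrative choice of formula, not the disclosed method."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma

# Daily outbound message counts for one actor, then an unusual day.
baseline = [21, 19, 23, 20, 22, 18, 21]
print(deviation_score(baseline, 55))
```

In practice the behavioral model combines many such metrics across behavioral traits before any ranking or scoring, but each dimension reduces to a comparison of this general shape.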
- The behavioral model [200] computed by the system, as well as the anomalies [270] produced, are presented by the system using supporting evidence [202] and visualizations [204] in one embodiment.
- A visualization [204] is produced in several stages. Data is generated [365] over time either by an iterative process over batches of data [370], or on a continuous basis [375] using for example a sliding window mechanism [380]. Input data [365] for a particular visualization [204] is then selected [360] either automatically by the system or interactively by a user. A layout [355] is then produced to efficiently display [350] the visualization as part of the system's user interface [300].
- The system continuously raises alerts [305] about behavior flagged as anomalous, for which notifications can be automatically sent to the user. An alert [305] can be a past alert [306] based solely on already analyzed events [100], or a predicted alert [307] based on the likelihood of some events [100] occurring in a near future and associated to anomalous behavior or patterns. Alternatively, reports [310] on behavior and information analysis can be regularly scheduled or generated on-demand. The system can be set up to continuously monitor specific queries [325]. Finally, a complete audit trail [330] is available that comprises collected data as well as all types of evidence [100] stored in the collection repository [320]. Access to this audit trail [330], like the rest of the application data [315], is restricted by access controls [335] whose scope is determined by a coloring scheme [345] and whose restrictions are defined at a fine-grained level using an anonymization scheme [340].
-
FIG. 4 illustrates the general architecture of the system in accordance with an embodiment of the present disclosure. - A user of the system [455] is typically an analyst or human operator whose role is to respond to alerts raised by the system before a malicious act is perpetrated, or right after unintentional damage has occurred, as well as to actively investigate any leads or patterns with the help of the system's analysis results.
- A central event passing infrastructure [460] is used by all components in the system to exchange data. That infrastructure can be distributed and ideally tries to keep data in flight as much as possible to maximize the system's throughput. That exchanged data comprises:
-
- Observed events [102] that have been collected and captured by the data collection component, including but not limited to documents [162], messages exchanged on a communication channel [156] such as email, instant messages, phone logs and voicemail, as well as structured events such as database records;
- Derived events [104] including but not limited to the output of the processing and analysis layer [402] (discussions [136], patterns [120], item tags [142], detected anomalies [270]); and
- Any kind of user input [106].
- Data handled by the event passing infrastructure [460] is pushed to components in the processing and analysis layer [402]. In one embodiment, events [100] are serialized as OSFs [110].
- A set of scoping policies [485], such as sliding windows [380] over the incoming data stream, are used in some embodiments to regulate the processing of events [100] by downstream components. The data collection component [400] collects data continuously or in batch mode from a variety of heterogeneous data sources [401], extracts their content and their metadata and stores the extraction results for access by downstream components of the system.
- The continuous categorization component [420] analyzes the incoming stream of events [100] to assign one or more categories to those events [100], using any number and variety of categorization components [146], and maintaining the validity and quality of the results even in the case of categorization components [146] that are inherently data-dependent.
- The continuous discussion building component [410] establishes discussions [136] as a structure linking causally related items [122]. The discussion-building mechanism described in the present disclosure builds on the disclosure described in U.S. Pat. No. 7,143,091 to support a continuous mode of operation [375] in a highly-scalable manner.
- The continuous clustering component [412] produces clusters of items [122] or events [100] from the incoming data stream on a continuous basis. It is a required stage of continuous discussion building [410].
- The continuous periodic patterns detection component [405] analyzes the incoming stream of events [100] to find event sequences [166] that are recurrent over time and occur at roughly periodic intervals, thereby constructing periodic patterns [126] on a continuous basis.
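The notion of event sequences [166] recurring at approximately regular intervals can be illustrated with a simplified check on inter-event gaps. This stand-in uses a coefficient-of-variation threshold, which is an assumption of this sketch rather than the component's actual method.

```python
from statistics import mean, pstdev

def is_periodic(timestamps, tolerance=0.15):
    """Treat an event sequence as periodic when its inter-event gaps are
    nearly constant (coefficient of variation below `tolerance`).
    A simplified stand-in for the continuous detection described above."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return False
    return pstdev(gaps) / mean(gaps) < tolerance

# A log event recurring roughly every 60 seconds, versus irregular activity.
print(is_periodic([0, 61, 119, 181, 240]))   # nearly regular
print(is_periodic([0, 10, 95, 120, 400]))    # irregular
```

A continuous implementation would maintain these gap statistics incrementally as events arrive, rather than recomputing them over the full sequence.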
- The continuous workflow analysis component [465] automatically detects ad hoc workflow processes [128] from the incoming stream of events [100] and analyzes the workflow instances [134] corresponding to those processes [128], including the detection of anomalies [270] in their realization.
- The continuous emotive tone analysis component [435] is used in some embodiments by the system to identify and analyze occurrences of emotional expression in electronic communications [123], which provide valuable categorization information to other components of the system, particularly the behavioral modeling component [445].
- The pragmatic tagging component [430] is another component used in some embodiments of the system which is based on linguistic analysis: it categorizes the communicative and discourse properties of electronic communications [123]. In particular, its output produces an abstract workflow model that lets the system detect and analyze workflow processes [128] associated to the realization of specific tasks.
- The textblock detection component [470] automatically identifies maximum contiguous sequences of sentences or sentence fragments which can likely be attributed to a single author. Once these textblock patterns [124] have been detected, any item [122] that contains that textblock or a significant portion of it is flagged by the system as a textblock hit [130], which allows the system to assess how information is exchanged or disseminated by specific actors [220].
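A crude word-level version of the textblock hit [130] test described above might look as follows; the overlap ratio and substring-matching strategy are illustrative assumptions, not the disclosed textblock-graph method.

```python
def textblock_hit(item_text, textblock, min_overlap=0.8):
    """Flag an item as a textblock hit when it contains the textblock or a
    significant contiguous portion of it (word-level; the ratio is an
    illustrative parameter, not prescribed by the disclosure)."""
    block_words = textblock.split()
    item_words = item_text.split()
    n = len(block_words)
    needed = max(1, int(n * min_overlap))
    # Look for any contiguous run of `needed` block words inside the item.
    for start in range(n - needed + 1):
        window = block_words[start:start + needed]
        for i in range(len(item_words) - needed + 1):
            if item_words[i:i + needed] == window:
                return True
    return False

block = "please shred the q3 ledger before the audit team arrives"
print(textblock_hit("fyi " + block + " thanks", block))
```

The disclosed approach works over n-gram transitions in the textblock graph [160], which scales far better than this quadratic scan, but the hit criterion it evaluates is of the same nature.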
- The behavioral modeling component [445] builds and maintains a model [200] of individual behavior [210] and collective behavior [215], which is defined by any number of behavioral and personality traits [295] that can be determined in the specific scenario at hand. A user of the system [455] can view that behavioral model [200] using a number of visualizations [204] described in the present disclosure.
- The anomaly detection component [450] continuously monitors the incoming stream of events [100] (both observed [102] and derived [104], including the behavioral model [200]) with the main goal of spotting anomalous behavior and anomalous patterns in the data, based on statistical, analytical, and other types of properties associated with both recent data and historical data. The anomaly detection component [450] also produces alerts [305], by aggregating anomalies [270], as well as reports [310] sent to the user [455]. The system also comprises time-based and behavior-based continuous visualizations [204] of those alerts [305]. Anomalies [270] detected by this component are also fed to visualizations [204] in order to highlight anomalous patterns to the user [455], and can optionally trigger mitigating or preventive actions. In some embodiments, the anomaly detection component [450] includes an anomaly detection tuning scheme which maintains the relevance and the accuracy of produced anomalies [270] based, among other things, on anomaly feedback [158]. However, in most such embodiments, all alerts [305] are still calculated and the user [455] is informed when significant numbers of different types of anomalies [270] associated with the same actor [220] are observed by the system; at any rate, all such instances are logged. These measures prevent the system's behavior from largely defaulting to that of a traditional rule-based system should users [455] treat it as one (e.g. by specifying which types of anomalies [270] they want to see and which they do not, thus defeating the idea that change or deviation is by itself interesting and potentially critically important).
- The continuous multi-dimensional scaling component [425] computes a low-dimensional layout of an incoming stream of events [100]. Its output is particularly useful for the sequence viewer [440] which shows a potentially massive number of tagged sequences [138], for example those corresponding to the realization of a particular workflow process [128], thereby outlining dominant patterns and outliers in the instances [134] of that workflow process [128].
- The alias usage browser [478] is a visualization [204] used in the present disclosure to efficiently display and navigate through the results of actor analysis [480], for example performed as described in U.S. Pat. No. 7,143,091 which is incorporated by reference herein for all purposes.
- Finally, a number of continuous visualizations [204] show patterns [120] derived by the system from the data and from human interactions and behaviors, as well as the anomalies [270] that may have been detected in those patterns. These continuous visualizations [204] include, but are not limited to: animated actor graph visualizations [471], the social you-niverse visualization [472], the stressful topics visualization [473], the temperature gauges visualization [474], the buck passing visualization [475], the pecking order visualization [476], and the love life visualization [477].
- In the following sections, each of these components is specified in more detail, along with one or more embodiments and how it relates to the other components in the present disclosure.
- The system described in the present disclosure processes and analyzes electronic data continuously collected from any number of data sources [401]. Those data sources can be of virtually any type: a particular embodiment of the present disclosure only needs to extract the data and metadata from types of data sources relevant to the scenario at hand. Types of data sources that can be leveraged by the system for behavioral modeling and anomaly detection purposes include, but are not limited to the following.
- Electronic communication channels: emails (collected from email clients or from email transfer agents), instant messaging, calendar events, etc.
- Electronic document sources: document management systems, file shares, desktop files, etc.
- Phone data sources, including phone logs, phone conversation transcripts, and voicemail.
- Log files: application log files, system log files such as syslog events [100], etc.
- Databases: changes to table rows in a relational database or more generally events [100] captured in real time during a transaction.
- Public social networks, such as Facebook, Twitter, etc.
- Other public data feeds: market data feeds, news feeds, etc.
- Physical data sources: physical access logs (keycards, biometrics, etc.), sensors and sensor networks (including RFID readers), geo-location information collected from portable devices, etc.
- External monitoring system: as described in this disclosure, any external monitoring system can be integrated as a data source, for example rule-based compliance systems or network intrusion detection systems.
- Individual data collected from internal sources: travel and financial information, background check and psychological evaluations of individuals in the organization, results of internal investigations, etc.
- Individual data collected from external sources: personal communication channels (such as email accounts, weblogs or websites), data publicly available on the Internet, data subpoenaed in the case of a law enforcement organization, wiretaps and intelligence collected from the field in the case of an intelligence organization, etc.
- The first component that regulates how data flows through the system in many embodiments is the scoping policies component [485]. In the presence of a quasi-infinite stream of data, as is usually the case with continuous applications, the system needs to possess extensive data flow management and data aging policies.
- This role is carried out by the scoping policies component [485]. Its functionality is used transversally by all other data-manipulating components of the system. Most embodiments maintain an audit trail of changes to the scoping policies. The scoping policies component may have multiple policies defined, with the highest-priority policy executed first. The scoping policies component [485] is essential to the continuous mode of operation of other components, such as the data collection component [400] and the continuous clustering component [412].
- One of the simplest examples of a scoping policy is an aging policy where a sliding window [380] of time is maintained over the incoming stream of data. Every time a piece of data falls out of the window [380], a notification message is sent to all the components notifying them that they are free to discard that data.
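The aging policy with its discard notifications can be sketched as a small sliding-window [380] structure; the class and callback names below are illustrative, and the notification mechanism stands in for the message sent to downstream components.

```python
from collections import deque

class AgingPolicy:
    """Sliding time window over an event stream; events falling out of the
    window trigger a discard notification to registered components.
    A minimal sketch of the aging policy described above."""

    def __init__(self, window_seconds, notify):
        self.window = window_seconds
        self.notify = notify              # callback standing in for the notification message
        self.events = deque()             # (timestamp, event_id), in time order

    def observe(self, timestamp, event_id):
        self.events.append((timestamp, event_id))
        # Evict everything older than the window and notify downstream
        # components that they are free to discard that data.
        while self.events and timestamp - self.events[0][0] > self.window:
            _, expired = self.events.popleft()
            self.notify(expired)

discarded = []
policy = AgingPolicy(window_seconds=3600, notify=discarded.append)
for ts, ev in [(0, "e1"), (1800, "e2"), (5000, "e3")]:
    policy.observe(ts, ev)
print(discarded)
# → ['e1']
```

Note that, per the enforcement discussion below, the notification only grants permission to discard; whether eviction actually happens may be left to each receiving component.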
- Another type of scoping policy can evict data based on a set of predicates: for example, a predicate stating that every email coming from a certain group of actors [225] should be the first to be discarded when resources are low, or that data should be discarded upon an order from a properly authorized user of the system [455] requiring certain sensitive data to be removed from the system.
- Based on its configuration, the scoping policies component [485] will or will not attempt to enforce actions to be taken by the processing and analysis components. If the scoping policies component [485] is configured not to enforce its policies, the other components must decide for themselves what to do when they receive the notifications.
- In one embodiment, asynchronous procedures can be launched to effectively “garbage collect” the discarded records from the caches. Some policies, such as the aging policy illustrated above, can also be coupled with archival systems or historical databases which will guarantee exploration of the data outside of the continuous flow. Setting up archival systems or historical databases is optional, however, and is not necessary for the proper operation of the system.
- The data collection stage of the present disclosure builds on the method described in U.S. Provisional Patent Application No. 61/280,791, the disclosure of which is incorporated by reference herein for all purposes, while integrating that method into an anomaly detection scenario and adapting it to a continuous mode of operation [375]. The main implication of the latter requirement is that due to new data being processed continuously by the system, a pruning mechanism is necessary to keep the volume of persisted data bounded, or at least to keep the increase rate of that volume bounded.
- As shown in
FIG. 5 , data collection is performed by the data collection component [400] within a collection session [500] in any combination of the following ways: -
- A human user launching a collection session [500] from a machine hosting the data collection component [400] or remotely [505].
- The data collection component [400] automatically collecting data in a continuous mode [510].
- The data collection component [400] automatically collecting data in incremental mode [515], i.e. as a series of batch operations.
- Any of these collection modes can be combined within the same collection session [500]. In particular, when the system is being run for the first time, collection will be initiated in a continuous mode [510] to collect future data (such as data captured in real-time from external data sources [401]) but the user can also at the same time, or at a later time, set up incremental collections [515] from data sources [401] hosting historical data. This allows the system to provide input data [365] to analytical visualizations [204] as early as possible, while other components such as the behavioral modeling component [445] require historical data to have been processed and analyzed in order to establish a behavioral baseline [260].
- The continuous data collection component [400] described in the present disclosure has a collection rate adaptation scheme which takes into account several elements to adjust the rate, including but not limited to the following:
-
- 1. Data collection throughput and load from the various data sources [401], in order to avoid any disruption of live sources [401] (such as mail servers) and ensure a pre-defined level of quality of service. In one embodiment of the present disclosure, a set of performance goals is defined in the system configuration, consisting of a maximum CPU usage ratio and a maximum I/O usage ratio on each machine hosting all or part of a data source [401].
- 2. Data processing throughput, which is maximized with respect to the other constraints. In one embodiment, a performance goal is defined in the system configuration by a minimum processing throughput value. In another embodiment, the performance goal is the priority threshold above which a data source [401] must be fully processed (i.e. the entirety of its events [100] must be collected and analyzed by the system in any given time interval). To ensure that the performance goal for processing throughput is achieved, the data collection component [400] is able to enroll additional processing machines at any time to distribute the load; conversely, it is able to free processing machines when the performance goal is exceeded by a pre-defined value.
- 3. Priorities of the different data sources [401] (and the fine-grained locations within each data source [401]), depending both on user prioritization and on system judgments. In one embodiment of the present disclosure, the system checks at regular intervals whether data collection throughput is saturated; if it is, the data source [401] with the lowest priority is set to an inactive state; otherwise, the inactive source [401] with the highest priority is activated.
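A single step of the priority-based activation scheme in item 3 can be sketched as follows, under the assumption of a simple map from source name to a (priority, active) pair; the data layout is hypothetical.

```python
def adjust_sources(sources, saturated):
    """One adjustment step of the priority-based scheme sketched above:
    when collection throughput is saturated, deactivate the active source
    with the lowest priority; otherwise, activate the inactive source with
    the highest priority. `sources` maps name -> (priority, active)."""
    if saturated:
        active = [s for s, (_, on) in sources.items() if on]
        if active:
            victim = min(active, key=lambda s: sources[s][0])
            sources[victim] = (sources[victim][0], False)
    else:
        idle = [s for s, (_, on) in sources.items() if not on]
        if idle:
            chosen = max(idle, key=lambda s: sources[s][0])
            sources[chosen] = (sources[chosen][0], True)
    return sources

srcs = {"mail": (9, True), "syslog": (2, True), "filer": (5, False)}
print(adjust_sources(srcs, saturated=True))   # syslog, the lowest priority, is deactivated
```

Run at regular intervals, this step converges toward collecting from the highest-priority sources that the available throughput can sustain.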
- Data collection covers both structured and unstructured data available from locations, including but not limited to:
-
- At the organization's boundaries (internet, intranet)
- On servers (document repositories, file shares, etc.)
- At the endpoints (desktops, laptops, mobile devices, etc.)
- From public data sources [401] (for example, information about a company publicly available on the web, financial market data, data from social networking platforms—whether business-oriented or not)
- The data model underlying the collection process is detailed in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes. One of its most important characteristics is that it models associations between structured and unstructured data (more precisely, attributes and textual content are indexed and processed identically) as well as associations between business transactional data and generic professional data (in particular using actor analysis [480] and discussion building [410]).
- The new, fundamentally different characteristic in the context of the present disclosure is that the model, in addition to being highly scalable, needs to be pruned over time since it represents an infinite data stream. Different strategies are described in the rest of this section which allow the present disclosure to prune, among other elements, the index, the item relationships (entity to instance, parent entity to child entity, etc.), and the actor and communication graph. Possible pruning strategies are explained in the section on Collection management. Additionally, the model also needs to deal with discussions [136] that are still in progress, i.e. have not reached a resolution or another form of completion.
- Collection instances [545] are central elements of the system's mode of operation. They are designed to fulfill the requirements of this invention, namely to provide a single data structure underlying all analytical tasks, including the storage of the complete revision history of documents and allowing the system of the present disclosure to determine the origin of all analyzed data.
- As described in
FIG. 6 , a collection comprises a number of elements and properties, which can be categorized under collection parameters and collection artifacts. - The collection operation's definition and parameters [640] in some embodiments include:
-
- Configuration [645] of the data sources [401]
- Complete audit trail [650] of the collection, including which data has been collected and stored in the collection repository [655], which data has been collected and submitted for analysis [660], but then discarded to only retain the derived model, and which data has been ignored due to configuration (initial or overriding parameters) of the data collection component [670].
- User-specified properties of the collection [665] including its name, optional description, associated matters or projects, collection mode (batch mode [370] or continuous mode [375]), owner, delete-by-date property.
- The processing artifacts [610] include:
-
- The set of all original items [122].
- The results of extracting, processing, and post-processing those original items [122] (unless explicitly configured to skip some of these stages on a particular data set), including indexed content [615] (both data and metadata), relationships between data elements [620], and custodial information [635].
- Optionally, a collection instance [545] contains the full lifecycle of the original data items [122], i.e. the revisions [625] (deletion, creation, and data modification) associated with the relevant data sources [401] and the collection.
- Collection instances [545] are stored in one or more secure collection repositories that are only accessible to administrators of the system, and any other users to whom an administrator has explicitly granted access.
- Such a repository may contain collections of different natures. In particular, it holds both dynamic collection instances [545] and static collection instances [545]: the former corresponds to the results of ongoing collection using the data collection component in continuous mode [375], the latter to the results of a single collection operation or of a data intake from a set of physical media. It is important to provide a single point of storage for both kinds of collected data.
- Collection instances [545] thus contain information about the lifecycle of the original data items. This means in particular that each collected item is flagged in the collection repository with a number of attributes including the following: which custodians or systems currently have which revisions of the item [122] (if more than one revision exists), which custodians (or systems) deleted or edited the item [122] and when these edits were made, and optionally the full item [122] revision history. (This is optional because it can increase storage requirements and the volume of information to post-process.)
- Note that the persistence of the full information lifecycle allows the anomaly detection component of the system to detect anomalies [270] related to that lifecycle, including the following examples:
-
- A document [162] was edited an unusually large number of times, and in an unusual number of ways.
- A document [162] was edited by an unusual number of actors [220] relative to what is normal for this type of document [162]
- Many custodians of a particular document [162] all deleting it within a very short time span of one another.
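The last example, where many custodians delete a document [162] within a very short time span of one another, can be illustrated with a simple threshold check; both thresholds are hypothetical parameters, not values prescribed by the disclosure.

```python
def coordinated_deletions(deletions, max_span, min_custodians):
    """Flag a document whose custodians deleted it within a short time
    span of one another. `deletions` is a list of (custodian, timestamp)
    pairs; `max_span` and `min_custodians` are illustrative thresholds."""
    if len(deletions) < min_custodians:
        return False
    times = sorted(t for _, t in deletions)
    return times[-1] - times[0] <= max_span

events = [("alice", 100), ("bob", 130), ("carol", 160)]
print(coordinated_deletions(events, max_span=120, min_custodians=3))
# → True
```

The other lifecycle anomalies listed (unusual edit counts, unusual editor populations) reduce to similar comparisons of per-document lifecycle statistics against the norms for that document type.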
- As described above, the collection audit trail contains information about different types of data, namely the items [122] that have been collected and stored; the items [122] that have been collected and discarded; and finally, the items [122] that have been ignored.
- To illustrate the point of organizing the collected data in this way, the case of ephemeral data can be considered. For example, system and application logs produce far too much content to store persistently over the long run, however the patterns that can be automatically inferred from these logs (such as periodic usage patterns) are interesting to observe since any significant aberration from such patterns would be worth showing.
- More generally, one of the fundamental benefits of the system is that there is no need to exhaustively store all of the data that has been analyzed, since in most cases the patterns resulting from statistical analysis of that data are sufficient to establish a behavioral baseline [260] and thus to detect anomalies [270]. In particular, this yields, if desired, very strong privacy protection guarantees, because the original data items [122] collected from different sources are not necessarily persisted by the system, and because, even when they contain confidential data, some of their properties can still be leveraged by the behavioral model [200] without the system infringing on privacy by inspecting the data itself.
- Different usage scenarios will benefit from different levels of accuracy and completeness in the collection model, which is the main rationale for the previously defined flexible model.
- Very coarse-grained collections can be configured in some specific scenarios, which in turn speeds up processing of collection results. In a common application scenario, a collection session is designed to support a long-term investigation or monitoring case based on a substantial amount of previously collected data. In that case, analysis results from that prior data can be leveraged by the anomaly detection component [450], hence it is of great benefit to store the full history of successive item revisions [625] in the repository [655], including but not limited to the following reasons:
-
- A complex model of what constitutes normal behavior in the organization (part of the baseline model [200] built by the system) will have been inferred from months or years of collected data, thus allowing reliable detection of anomalous behavior as soon as continuous monitoring starts on recent data.
- Conversely, the system-derived model, including document lifecycles, allows the system to infer associations between actors [220] on the one hand and events [100] on the other hand, based on who actually does what, and how. In one embodiment of the present disclosure, this allows the system to determine which actors [220] are typically involved in the lifecycle of any particular type of document [162], for example the quarterly filing of a corporation's financial results.
- In one embodiment of the present disclosure, collection management is performed in the administration console, and offers various maintenance and administration functionalities, detailed in the rest of this section.
- Merging [530] collection instances [545]: A common and convenient operation is merging two independently created collection instances [545]. This means that the original data sets are merged into a new collection instance [545], along with all associated processing artifacts.
- Pruning [525] collection instances [545]: The size of collection artifacts from collection operations, both in batch mode [370] and in continuous mode [375], tends to increase over the long term. After running for several weeks or months, they may contain extremely large volumes of extracted data and metadata, especially when complete item revision [625] histories are stored.
- In that case, collection instances [545] can be manually pruned, by removing information prior to a given date, or for particular manually performed investigations and data analysis operations that are no longer relevant, or for particular custodians of the data, etc.
- In addition, the system optionally performs automatic pruning [525] of the collection instances [545]. This is done by assigning a pruning mode to any given collection instance [545], which is enforced by the scoping policies component [485]. As described earlier in this disclosure, the scoping policies component [485] provides a number of scoping policy types, which when applied to collection instances include but are not limited to the following:
-
- Aging policy specified by the length of the sliding window [380], so that the data collection component [400] will at any time only retain data in that collection instance [545] which is for example at most 6 months old.
- Predicate-based policy, for example a least-relevant-data-first pruning strategy: In conjunction with a maximum volume assigned to the collection instance [545], this policy enforces the predicate that data deemed the least relevant to the matter or project at hand will be pruned to keep the volume of a collection instance [545] below that limit.
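The two pruning policies above can be sketched as follows. This is an illustrative assumption of how such policies might be implemented; the item fields (`date`, `relevance`, `size`) and function names are hypothetical and not part of the disclosed system:

```python
from datetime import datetime, timedelta

def prune_by_age(items, now, max_age_days=180):
    """Aging policy: retain only items inside the sliding window,
    e.g. at most 6 months (180 days) old."""
    cutoff = now - timedelta(days=max_age_days)
    return [it for it in items if it["date"] >= cutoff]

def prune_least_relevant_first(items, max_volume):
    """Predicate-based policy: keep the most relevant items while the
    total volume stays under the collection instance's limit."""
    ranked = sorted(items, key=lambda it: it["relevance"], reverse=True)
    kept, total = [], 0
    for it in ranked:
        if total + it["size"] <= max_volume:
            kept.append(it)
            total += it["size"]
    return kept
```

In practice both policies would be evaluated by the scoping policies component [485] each time the collection instance is updated.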
- Splitting [540] collection instances [545]: Dividing collection instances [545] according to certain criteria is useful when, for example, a new investigation or monitoring case has to be processed which corresponds to a subset of the data collected for a prior case.
- Complementing [520] collection instances [545]: This consists of creating a new collection instance [545] from a prior collection session [500] and running it in an incremental mode so as to collect data that has been added since the prior collection session [500], but also metadata updates, deleted data, etc.
- Discarding [535] collection instances [545]: Finally, collection instances [545] can be deleted in a safe and verifiable manner, for example to comply with a destruction order or a retention policy. In one embodiment, all constituents of the collection instance [545] are erased.
- The continuous clustering component [412] produces clusters of items [122] or more generally events [100] from a set or a data stream on a continuous basis. A cluster is defined as a grouping of events [100] similar with respect to some observed features. The similarity measures are configurable.
- A similarity measure is a function that ascertains similarity between events. In its simplest form, it takes two events [100] and returns true if they should be considered similar, or false if not. Similarity measures can also provide a degree of similarity between two events instead of a binary answer. A degree of similarity can be a number in a specific interval, usually from 0 to 1. This allows us to perform fuzzy clustering. Custom similarity measures can be defined to fit the different types of data that are being processed.
- An embodiment of a similarity measure can be set up to receive two emails and return true if their tf-idf vectors' cosine exceeds a certain threshold, and false otherwise.
- Another embodiment can take two phone logs and return true if the caller and receiver on both phone logs are the same, and false otherwise.
- Another embodiment can be set to operate on any type of event [100] that represents a communication [123] between two actors [220] and return true if two events have exactly the same set of actors [220], regardless of the type or channel [156] of the event [100].
- A similarity measure definition is thus very flexible, and is not necessarily constrained by the use of heterogeneous event [100] types.
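As an illustration, the three embodiments above might be sketched as follows, assuming hypothetical event fields such as `tfidf`, `caller`, `receiver` and `actors`:

```python
import math

def cosine(u, v):
    """Cosine of two sparse tf-idf vectors given as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def email_similarity(a, b, threshold=0.8):
    """Binary measure: true if the emails' tf-idf cosine exceeds a threshold."""
    return cosine(a["tfidf"], b["tfidf"]) > threshold

def phone_log_similarity(a, b):
    """True if the caller and receiver on both phone logs are the same."""
    return a["caller"] == b["caller"] and a["receiver"] == b["receiver"]

def same_participants(a, b):
    """True if two communication events involve exactly the same set of
    actors, regardless of the type or channel of the event."""
    return set(a["actors"]) == set(b["actors"])
```

A degree-of-similarity variant would simply return the cosine itself (a value between 0 and 1) instead of comparing it against the threshold.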
- The method is fully continuous, which implies that it produces usable results (sets of clusters) and updates them as it acquires the events [100] from the underlying data stream.
- The method is stable and incremental. Running it on the same set of events [100] streamed in different orders produces the same result.
- The method runs on heterogeneous types of electronic events [100] as long as a similarity measure can be defined on them.
- The method can be configured to prioritize the processing of certain events [100].
- We assume a stream of events [100] flowing into the continuous clustering component [412]. In case the system needs to operate on a static dataset (i.e. in batch mode [370]), we can simply iterate through the static dataset and provide the events [100] to the system as a stream, hence adding no additional difficulty when processing static datasets.
- The continuous clustering is organized around a flow of events [100] through a set of components that process the events [100] as they receive them, and immediately forward the results needed by the downstream components, usually in the form of typed updates.
- Raw event stream [905]: The continuous clustering component [412] connects to a provider of raw events [100], which in the default embodiment of the present disclosure is the event passing infrastructure [460]. This infrastructure relays data collected by the data collection component [400] or from any other source of raw electronic events, such as:
-
- Financial market data feeds
- Online Press data feeds
- Others: news articles, editorials, blogs, tweets
- Feature collection phase [910]: A feature is a labeled and typed value copied directly or derived from an event [100] or its underlying items. In the case of a derived value, the derivation is free to use information from other events [100], but a resulting feature is directly associated with the event [100] itself. A feature can be shared among multiple events [100]. Depending on their type, sets of operations can be defined between different features or structures of features; notably, some of those operations establish equality, equivalence or order between two or more features.
- A few examples of features are:
-
- The set of all named entities directly extracted from an email text
- A list of key terms from the text of an email with their tf-idf score calculated by using document frequency information from other emails
- A duration of a phone call written on a phone log
- Feature collection: The feature collection phase collects, for each observed event, the information needed by the downstream components. It is a fully configurable component allowing the specification of all the subsets of the data stream to process, as well as the snippets of information to retain from each event. Its high-level functions, described in FIG. 10, are:
-
- 1. Filter incoming events [100] using filtering predicates [1010]
- 2. Prioritize incoming events [100] based on configurable prioritization predicates [1020]
- 3. Extract the features necessary for the remainder of the overall computation by the downstream components. This results in a lighter process since only the necessary subset of event data is forwarded [1035]
- 4. Extract time stamps needed by the downstream components [1040]
- 5. Allow update of the filtering and feature extraction configurations live, while the system is still running
- This example illustrates filtering and feature extraction. An embodiment of a feature collector can be set up to only pass emails and IMs to the downstream components (filtering out, for example, phone logs, key card accesses and any other non email or IM records).
- This feature collector could then be set up to extract, for each email and IM, the sent date (used as a time stamp), the title, the sender, the receivers and the other participants. Instead of passing around the whole raw content, it would extract a shorter set of the named entities of the content and their modifying verbs. The feature collector would then compress the whole set of features and pass it along to the next component.
- This example illustrates prioritization. An embodiment of a feature collector can be set up to automatically prioritize certain types of events [100] by attaching a priority level to every event that satisfies a prioritization predicate. Examples of such prioritization predicates can be:
-
- A communication [123] occurs within a specific group of actors [225].
- A financial order with a certain symbol and a price that surpasses a certain threshold from a high frequency trading firm is acquired
- A key card access to a critical server room within a software company is acquired
Every time an event is acquired and matches a specified prioritization predicate, the feature collector will prioritize its processing [1030]. If there already are prioritized events being processed, the latest prioritized event will be queued for rapid processing, along with the other prioritized events, in a position corresponding to its priority level [1080].
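The filtering, extraction and prioritization behavior described above can be sketched as a small priority queue. The predicates, field names and priority levels below are illustrative assumptions:

```python
import heapq
import itertools

class FeatureCollector:
    """Sketch of the feature collection phase: filter events, extract a
    lightweight feature set, and queue prioritized events first."""

    def __init__(self, keep, extract, priority_rules):
        self.keep = keep                      # filtering predicate
        self.extract = extract                # feature extraction function
        self.priority_rules = priority_rules  # [(predicate, level)], lower runs sooner
        self.queue = []                       # heap of (level, seq, features)
        self.seq = itertools.count()          # tie-breaker preserving arrival order

    def acquire(self, event):
        if not self.keep(event):
            return                            # event filtered out entirely
        # attach the highest priority (lowest level) of any matching predicate
        level = min((lvl for pred, lvl in self.priority_rules if pred(event)),
                    default=10)               # 10 = default, lowest priority
        heapq.heappush(self.queue, (level, next(self.seq), self.extract(event)))

    def next_event(self):
        """Return the lightweight representation of the next event to process."""
        return heapq.heappop(self.queue)[2]
```

For example, a collector configured with `keep = lambda e: e["type"] in ("email", "im")` drops phone logs and key card records, while a rule such as `(lambda e: e["sender"] == "ceo", 0)` moves matching events to the front of the queue.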
- Each event [100] not filtered out by the aforementioned feature collection phase now has an internal lightweight representation consisting of only the data needed by the downstream components.
- Those events [100] are forwarded as they come to the other components guaranteeing the continuous nature of the whole method.
- The continuous clustering component [412] creates and updates sets of clusters of events [100] continuously as it receives them (in our current setting, from the feature collector).
- The clustering method is an extension of the document [162] clustering method described in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes. At a high level, the current method augments the previous one with the following functionalities:
-
- The process is now made continuous and every new event [100] is immediately clustered, and the result is immediately forwarded to downstream components.
- The clustering component now listens to notifications from a scoping policies component [485], which notifies it of which events [100] are to be discarded.
- In an embodiment of the continuous clustering component [412], the main operations are performed using two main data structures:
-
- A main event index or lookup table which maps every event ID to the event data and allows its retrieval in constant time
- An inverted index, which is another lookup table that associates with every feature the set of event IDs that possess that feature. The associated set of IDs is called a posting.
- The continuous aspect is achieved by performing the following operations every time a new event [100] is acquired:
-
- 1. For every extracted feature of the newly acquired event [100], retrieve an initial set of other events [100] that share the same feature. That set is the posting that we can retrieve by using the inverted index.
- 2. Collect and de-duplicate all the postings associated with each of its features. The events [100] whose IDs constitute the postings are called a neighborhood. They can be looked up in constant time using the event index. The remaining operations will be performed only on the neighborhood which will greatly reduce the amount of computation since the neighborhood's size is smaller than the whole set of events [100].
- 3. Apply a similarity measure between elements of the neighborhood to reduce the set and maintain only a smaller set consisting of the events [100] that are close enough to the new event [100] with respect to the similarity measure. The small subset of the neighborhood will be referred to as a clump.
- 4. Check if the newly computed clump can be merged with existing clusters. Merge or split if needed.
- 5. Forward the changes to the downstream components in the form of deltas.
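A minimal sketch of the five steps above is given below. It assumes a simple shared-feature-count similarity measure and a naive merge policy, so it illustrates the data flow through the event index and inverted index rather than the full patented method:

```python
from collections import defaultdict

class ContinuousClusterer:
    """Incremental clustering sketch: events are feature sets; two events
    are similar when they share at least min_shared features."""

    def __init__(self, min_shared=2):
        self.events = {}                  # event index: id -> feature set
        self.postings = defaultdict(set)  # inverted index: feature -> ids
        self.cluster_of = {}              # event id -> cluster id
        self.clusters = defaultdict(set)  # cluster id -> event ids
        self.min_shared = min_shared

    def add(self, eid, features):
        # Steps 1-2: collect and de-duplicate the postings (the neighborhood)
        neighborhood = set()
        for f in features:
            neighborhood |= self.postings[f]
        # Step 3: reduce the neighborhood to a clump via the similarity measure
        clump = {n for n in neighborhood
                 if len(features & self.events[n]) >= self.min_shared}
        # Step 4: merge the clump's clusters into one cluster
        cid = min((self.cluster_of[n] for n in clump), default=eid)
        for n in clump:
            old = self.cluster_of[n]
            if old != cid:
                for m in self.clusters.pop(old):
                    self.cluster_of[m] = cid
                    self.clusters[cid].add(m)
        self.clusters[cid].add(eid)
        self.cluster_of[eid] = cid
        # register the new event in both indexes
        self.events[eid] = features
        for f in features:
            self.postings[f].add(eid)
        # Step 5 (not shown): forward the change as a delta downstream
        return cid
```

Because only the neighborhood is examined for each new event, the per-event cost stays far below a comparison against the whole event set.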
- The above operations can also be performed in other embodiments by using data structures other than the ones listed here. In essence, every data structure that allows constant time lookup of the events [100] coupled with any data structure that allows the retrieval of a given event's [100] neighborhood, based on similarity of their features, can be used.
- Upon reception of a discard notification from a scoping policies component [485], an embodiment of the clustering component [412] could do the following:
-
- 1. Locate the events [100] to be discarded from its index and remove them.
- 2. For every feature of every event [100] removed from the index, look up its posting and remove the event [100] from the list of associated events [100].
- 3. Create and send a delta notification with the appropriate removal declarations to all the components that use clustering results.
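Assuming the event index and inverted index described earlier (with hypothetical data layouts), the three discard steps might look like:

```python
def handle_discard(event_index, inverted_index, clusters, ids_to_discard):
    """On a scoping-policy discard notification: (1) remove the events from
    the event index, (2) remove their IDs from every affected posting,
    (3) build delta notifications declaring the removals."""
    deltas = []
    for eid in ids_to_discard:
        features = event_index.pop(eid, set())   # step 1
        for f in features:                       # step 2: clean the postings
            inverted_index[f].discard(eid)
        for cid, members in clusters.items():    # step 3: removal declarations
            if eid in members:
                members.discard(eid)
                deltas.append((cid, {("-", eid)}))
    return deltas
```

The returned deltas would then be sent to all components that use clustering results.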
- The continuous clustering component [412] also provides methods to query and manipulate its state while it is still running. In some embodiments, examples of those methods are:
-
- 1. Iterate through all current clusters
- 2. Get all events [100] of a cluster
- 3. Remove an event [100] from all clusters
- 4. Remove a cluster from the set of available clusters
- Creation and changes to clusters are propagated by sending change representations we will refer to as deltas [925].
- Deltas are a form of typed update [107] used in order to avoid re-sending whole clusters around when a change has been made to them. In its simplest embodiment, a delta is made of a cluster identifier, and a set of update declarations. The update declarations are either additions to the identified cluster or removals from the identified cluster. A simple example of delta can be represented as:
-
delta=(id0,{−event0,+event1,+event2}) (Eq-1)
- This will be interpreted as: the cluster with identifier id0 has gained two events [100], event1 and event2, and has lost the event [100] event0.
- This way of only propagating update information in the form of deltas is a significant gain on performance and overall load on the computational resources. On large scale deployments, where different components reside on different machines connected via a network, this greatly reduces the amount of network traffic and the amount of update information to be processed.
- Each update declaration is also interpreted as an idempotent operation, therefore, if by any forwarding or network glitch, the same delta is received twice or more by any component, it will have the same effect as if it was received only once.
- Notice that this representation of deltas still allows the representation of the whole content of any cluster if need be. Representing the whole cluster comes down to making sure all the events contained in the cluster are stated as additions in the delta.
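A sketch of applying a delta such as (Eq-1) over set-based clusters is shown below; idempotence follows directly from set semantics, and the structure names are illustrative:

```python
def apply_delta(clusters, delta):
    """Apply a delta (cluster_id, update declarations) to a dict mapping
    cluster IDs to sets of event IDs. Because additions and removals act
    on sets, applying the same delta twice has the same effect as once."""
    cid, updates = delta
    members = clusters.setdefault(cid, set())
    for op, event_id in updates:
        if op == "+":
            members.add(event_id)     # addition declaration
        else:
            members.discard(event_id)  # removal declaration (no-op if absent)
    return clusters
```

Representing a whole cluster is the degenerate case: a delta whose declarations are all additions.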
- This abstract structure is the basis on which the downstream components which use clustering information depend.
- We describe a method that finds, within a set or a stream of events [100], sequences of events [166] that are recurrent over time and occur at roughly periodic intervals. The method yields the actual sequences of events [166] that correspond to the realization of those periodic patterns [126]; these are called periodic sequences [132]. It does this without any prior indication of the periodicity (for example daily, weekly, yearly, or every six hours and twenty minutes) or any prior indication of what the possible recurring events [101] themselves might be.
- A unique identifier is synthesized for every event [100], calculated as described in U.S. Pat. No. 7,143,091. Examples of events [100] considered by the periodic patterns detection component [405] in one embodiment of the present disclosure are:
-
- An email with its associated sent date
- An IM session with its associated start date
- A phone log with its associated call time
- An electronic meeting invitation note with the associated meeting date
- An electronic meeting invitation note with its associated sent time (as opposed to the time when the actual meeting is supposed to take place)
- A Financial market order and its associated time stamp
- A Financial stock tick with its associated time stamp
- A key card usage record with its associated time stamp
- A document with its associated creation, update or delete time stamps
- A record of actor [220] A communicating with actor [220] B in private (regardless of the communication channel [156] used) with the communication's [123] date and time
- Notice the heterogeneous types of items or combination thereof that can be considered events [100] for the periodic patterns detection component [405].
- Each association between a type of event [100] and the right type of time stamps is either directly inferred when there is no ambiguity, or can be configured. Some events [100] also have a duration.
- It should be noted that the definition of an event [100] does not require a one-to-one relationship with an electronic item. A simple class of events can be defined as a dialogue of a certain maximum duration, another class of events [100] can consist of all the electronic records constituting a financial transaction. The periodic patterns detection component [405] therefore allows any logical, identifiable and detectable bundle or derivative of electronic behavior, to be treated as an event [100].
- A periodic pattern [126] is an event [100] or a group of events that is recurrent over time and occurs at roughly periodic intervals. Examples of such patterns [126] are
-
- A weekly meeting
- A yearly employee review process
- A periodic sequence [132] is an actual sequence of events that matches a periodic pattern [126].
- This section describes a method that finds within a set or a stream of events [100], the periodic patterns [126] and yields the actual sequences of events [100] that correspond to the realization of the periodic patterns [126]. It does this without any prior indication of the periodicity or possible periodic events [100] themselves.
- The method even finds the periodic patterns [126] that have changing frequency components over time.
- The method is fully continuous, updating relevant periodic patterns [126] as new events [100] enter the system.
- The method is robust against incomplete or locally irregular data and localized changes of frequencies of the events [100].
- To illustrate this previous point, let us imagine a particular sequence of meeting invitations in a company, scheduled every week, on Monday, for a year. We continuously receive all data as they become available, starting from the beginning of the year. All meeting-related data is buried among the other types of data, and we are not told in advance that this particular sequence of meetings will be a candidate for periodicity. If the sequence is ideal and always occurs every Monday, our method will identify a periodic pattern [126] of period one week that covers a year. Now, let us imagine that there is still a weekly company meeting but sometimes, perhaps due to schedule conflicts, it is held on another day of the week instead of Monday. Two successive meetings are then not necessarily separated by exactly one week anymore. In addition to those slight irregularities, it often happens that such real-life events are skipped for a considerable period of time, perhaps because the attendants of the meeting have gone on vacation for a few months. Finally, there can also be a different type of change: a frequency change, when the previously weekly meeting turns into a daily meeting for a few months, perhaps due to a tight deadline.
- This illustration presents some of the challenges solved by the present disclosure in most datasets, because regardless of those irregularities, it still identifies the relevant periodic patterns [126]. This method is designed from the ground up to take them into account and still come up with a comprehensive result.
- The method will return, for this illustration:
-
- A periodic pattern [126] consisting of two distinct frequencies, a weekly frequency followed by a daily frequency
- A clear indication of the small disturbances (when the meetings were not held on Mondays but later in the week) and gaps (when everyone was on vacation) within the periodic patterns [126]
- It should be noted that those results will be yielded and updated as soon as the data is processed and we do not need to wait until the end of the year to produce the first results: The results are available on a continuous basis.
- The resulting set of periodic patterns [126] can also be queried by both its structural information: What periodic sequences share the same gap? What periodic sequences have a gap of size X during time period Y?
- Or its semantic information: Which periodic sequences [132] correspond to meetings? Which ones correspond to professional activities? Which ones have changes that coincide with external events [170]?
- This method can also recombine previously discovered periodic patterns [126] into higher order periodic patterns [126] that provide a richer picture of the regularities or irregularities of the data set or data stream.
- The present disclosure assumes a stream of events [100] flowing into the periodic patterns detection component [405]. In case the system needs to operate on a static dataset (i.e. in batch mode [370]), it simply iterates through the static dataset and provides the events [100] to the periodic patterns detection component [405], hence adding no additional difficulty when processing static datasets.
- The periodic patterns detection component [405] operates on a set of clusters of events [100] forwarded by the continuous clustering component [412] described in this disclosure. It receives a set of deltas (defined earlier in this disclosure) and updates the periodic patterns [126] that have the elements of the clusters as their underlying events [100].
- In order to continuously build the clusters of events [100] and forward them to the periodic patterns detection component [405], the following steps are performed by the clustering component [412] described earlier in this disclosure:
-
- Raw event streams listener
- Feature collection phase
- Light event representation and forwarding
- Continuous clustering component
- Incremental updates to clusters
The following describes the periodic pattern [126] detection steps performed once the deltas are received from the clustering component [412].
- Detection: The periodic patterns detection component [405] performs the following actions:
-
- 1. Continuously receives a stream of deltas,
- 2. Searches for frequency components within each cluster,
- 3. Attempts to combine the periodic patterns found within different clusters to form higher order periodic patterns if possible.
- An embodiment of a method to find the periodic patterns [126] is outlined below.
- Upon reception of a delta, a process is spun off by the periodic patterns detection component [405] and locally reloads all necessary information for continuing the analysis of the corresponding clusters.
- The search for frequency components happens at different fixed time resolutions. The available time resolutions can, for example, range from a very fine grained resolution such as second or below, to coarser grained resolutions such as hour, day, week, month, etc. The analysis starts with the smallest resolution.
- For each resolution, the list of events [100] is sorted and binned according to the resolution. The resulting binned sequence can be treated as an array with each of its elements corresponding to the content of an individual time bin. Each element of the binned sequence can therefore represent:
-
- A single event [100] within a time bin
- A list of several events [100] that ended up together in the same time bin
- An empty time bin representing a non-event, which can be interpreted as a time bin where nothing relevant was observed.
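The binning step described above can be sketched as follows, assuming events are (timestamp, payload) pairs and the resolution is a bin width expressed in the same time units:

```python
def bin_events(events, resolution):
    """Sort time-stamped events and slot them into fixed-width time bins.
    Returns a list where each element is the (possibly empty) list of
    event payloads falling in that bin; empty lists are non-events."""
    if not events:
        return []
    events = sorted(events, key=lambda e: e[0])
    start = events[0][0]
    n_bins = int((events[-1][0] - start) // resolution) + 1
    bins = [[] for _ in range(n_bins)]
    for ts, payload in events:
        bins[int((ts - start) // resolution)].append(payload)
    return bins
```

Note how two events less than one resolution apart land in the same bin, absorbing small variances in time interval measurement.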
- Once a binned sequence has been built, the system proceeds to read it into a sparse transition matrix.
- From here the system only operates within a specific time resolution at a time, meaning the time axis is slotted into time bins of constant width. This helps reduce upfront a set of small variances and errors in time interval measurement. This also implies that several events [100] can end up in the same time bin. In the rest of this section, the term time bin will sometimes be used as a unit of measure or time just like a natural unit would be used such as day or month.
- The sparse transition matrix T is a matrix that has event classes on its rows and integers on its columns. An event class is defined here as a set of events [100] that are similar. An example of event class is a set of all instances of a particular meeting notification email.
- An entry T[i,j] of the matrix is a structure s containing an integer indicating the number of times instances of events [100] of the class denoted by the index i follow each other separated by a distance of j time bins. This number contained in the entry will be referred to as a repetition count. If the integer contained in the structure s of the entry T[e,4] is 10, that implies that events [100] of class e have been recorded to succeed each other every 4 time bins 10 times. The structure s also records the intervals where those successions occur, allowing the time intervals of those transitions to be located. It also records, for each event class encountered in a time bin, the number of instances of that class observed.
- Let us illustrate the creation and update of a sparse transition matrix with a small example:
- Let us consider a binned sequence containing two classes of events. The class e={e1, e2, e3, e4, e5, e6, e7, e8, . . . } and the class f={f1, f2, f3, . . . }
- (Portion 1) e1|*|e2|*|e3|*|e4|*|*|*|e5 f1|*|e6|f2|e7|*|e8 f3
- Reading this binned sequence will produce or update a sparse transition matrix with the following repetition counts (we omit detailing the whole structure of the entries).
- T[e,2]=6 repetitions, at intervals [1,7] [11,17]
- T[e,4]=1 repetition, at interval [7,11]
- T[f,3]=2 repetitions, at interval [11,17]
- The creation and update of a sparse transition matrix is a linear operation with respect to the size of the binned sequence. Using two pointers along the binned sequence, one could record all transitions for each event class in one pass.
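A minimal sketch of this one-pass reading step is shown below; it keeps only the repetition counts and omits the interval and per-bin instance-count bookkeeping of the structure s (one instance per class per bin is assumed):

```python
from collections import defaultdict

def build_transition_matrix(binned, event_class):
    """Read a binned sequence into a sparse transition matrix:
    T[class][distance] counts how often successive instances of an event
    class are separated by that many time bins."""
    last_seen = {}  # class -> bin index of the most recent instance
    T = defaultdict(lambda: defaultdict(int))
    for i, bin_content in enumerate(binned):
        for ev in bin_content:
            c = event_class(ev)
            if c in last_seen:
                T[c][i - last_seen[c]] += 1  # record one transition
            last_seen[c] = i
    return T
```

Running it on the binned sequence of (Portion 1), with the event class taken as the leading letter of each label, reproduces the repetition counts listed above.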
- From this matrix we only consider entries that have a minimum number of repetitions and satisfy a minimum set of contiguous disjoint occurrences (derived from looking at the intervals). The minimum number of repetitions can be adjusted for each time resolution. In the above example, if our minimum repetition count is 2, then we only consider the entries T[e,2] and T[f,3].
- We can automatically infer from these entries the frequency component (e,*) which means an event [100] of class e occurs every 2 time bins as well as the frequency component (f,*,*) meaning an event [100] of class f occurs every 3 time bins. (The symbol * denotes a contextual non-event, or more generally a time bin when observations do not participate in the periodic pattern considered).
- It should be noted that the frequency components will still be detected even if noise—in the form of a set of unrelated events [100] scattered across the binned sequence—is added to the binned sequence. This adds to the robustness of the method.
- Furthermore, because we record the number of instances of every event class per time bin, we are able to produce periodic patterns with accurate amplitude information. As a simple example, if we augment the sequence represented by (Portion 1) in order to have two instances of the event class e at each occurrence:
- (Portion 1 augmented) e1 e101|*|e2 e102|*|e3 e103|*|e4 e104|*|*|*|e5 e105 f1|*|e6 e106|f2|e7 e107|*|e8 e108 f3
- We would be able to detect the corresponding, more informative frequency component (2 e,*) indicating that two occurrences of events of class e occur every 2 time bins.
- Once the basic frequency components and periodic patterns [126] are detected, a new step is taken to recombine them into higher-order periodic patterns [126].
- A higher-order periodic pattern is any complex periodic pattern obtained via recombination of previously detected standalone periodic patterns.
- Previously built periodic patterns [126] can be automatically recombined if they satisfy two types of conditions:
-
- Semantic conditions
- Structural conditions
- A semantic condition is any condition that triggers a recombination attempt while not being based on the pure periodic structure of the periodic patterns [126] to be recombined. Such conditions include, for example, an attempt to recombine two periodic patterns [126] because their underlying event classes have been declared similar and therefore merged into the same cluster by an upstream component. Semantic conditions, even though they serve as triggers, are sometimes not sufficient to mandate a recombination. Often, the recombination needs to be validated by satisfying further structural conditions.
- A structural condition is any condition based solely on the structure of the periodic pattern [126]. Structural conditions are built around information about periodicity, time span, intervals of occurrence, disturbances or gaps. Everything else not related to the structure of the periodic pattern [126] is labeled as a semantic condition.
- To illustrate an embodiment of a recombination step, let us consider the following binned sequence:
- (Recombination) e1|f1|e2|*|e3|f2|e4|*|e5|f3|e6|*|e7|f4|e8|*|
- This binned sequence yields at first two distinct periodic patterns [126]: A first one with a frequency component (e,*) which indicates that an event of class e occurs every 2 time bins and a second one (f,*,*,*) which indicates that an event of class f occurs every 4 time bins.
- An automatic recombination of these two periodic patterns [126] will result in a richer higher order periodic pattern [126] (e,f,e,*) indicating an event [100] of class e followed by an event [100] of class f followed again by an event [100] of class e every 4 time bins.
- Structural conditions, in this example where both original periodic patterns [126] are flat, can be stated as follows:
-
- The length of one of the periodic patterns [126] must be a multiple of the other, with a bounded multiplicative factor selected based on the time resolution at which the binned sequence has been built.
- The two periodic sequences [132] must satisfy a set of structural invariants, one example of which is a notion of bounded phase difference variance meaning that the oriented difference of positions between pivotal events [100] of each occurrence of the periodic patterns [126] is constant or bounded to a small variation with respect to the time resolution at which the binned sequence has been built. A simple example of a pivotal event is the one that starts every occurrence or repetition of a periodic pattern.
- Those two conditions are sufficient for flat periodic patterns [126] such as the ones illustrated here to be automatically recombined. The periodic patterns [126] (e,*) and (f,*,*,*) have respective periods (lengths) of 2 and 4, which are multiples of one another. Furthermore, we can observe that every fi is always preceded and followed by a pair ei, ej. This satisfies the bounded phase difference variance condition, giving us enough confidence to recombine the two periodic patterns [126].
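The two structural checks above can be sketched as follows. This is a minimal illustration under simplified assumptions: the function names, the pattern and position representations, the bound of 8 on the multiplicative factor, and the variance tolerance are all hypothetical, not taken from the disclosure.

```python
# Hypothetical sketch of the two structural conditions for recombining
# two flat periodic patterns; thresholds are illustrative assumptions.

def lengths_compatible(p_short, p_long, max_factor=8):
    """First condition: one period (pattern length) must be a multiple of
    the other, with a bounded multiplicative factor."""
    if len(p_long) % len(p_short) != 0:
        return False
    return len(p_long) // len(p_short) <= max_factor

def bounded_phase_difference(positions_a, positions_b, period, max_variance=1):
    """Second condition: the oriented position difference between pivotal
    events of paired occurrences stays constant or nearly constant."""
    diffs = [(b - a) % period for a, b in zip(positions_a, positions_b)]
    return max(diffs) - min(diffs) <= max_variance

# (e,*) occurs at bins 0,2,4,...; pairing every other occurrence with the
# (f,*,*,*) occurrences at bins 1,5,9,13 at period 4:
e_positions = [0, 4, 8, 12]
f_positions = [1, 5, 9, 13]
ok = lengths_compatible([None] * 2, [None] * 4) and \
     bounded_phase_difference(e_positions, f_positions, period=4)
```

With the example sequence above, both checks pass and the patterns may be recombined into (e,f,e,*).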
- It should be noted that because we store the intervals and positions where the actual events [100] take place in our sparse transition matrix, the above structural conditions are tested in less than linear time (with respect to the binned sequence's length). These conditions can therefore be reevaluated as needed when the underlying event [100] sets are updated, to ascertain whether the periodic patterns [126] should remain combined.
- Structural conditions are not limited to those illustrated in the above example. In addition to bounded phase difference variance, other conditions can be used, such as alignments of disturbances or gaps, where the time intervals representing the disturbances or gaps of a set of periodic patterns [126] line up with very small variation.
- In one embodiment of the present disclosure, recombinations can also be made offline by using the querying mechanism on periodic patterns [126] stored by the system in a periodic patterns [126] database.
- Every time the system finishes wrapping up frequency components into periodic patterns [126], a set of attributes is cached until the underlying clusters that contain the events [100] are updated again, requiring an update of the associated periodic patterns [126]. The cached attributes contain:
-
- The current state of the different sparse transition matrices and their associated time resolutions, with flags on the entries allowing changes to be versioned. This allows operations such as: “provide from a sparse transition matrix only the entries modified after version X”
- The list of IDs of the underlying clusters
- A set of information kept around to avoid rebuilding the binned sequences all over again every time the underlying clusters change. This information allows the frequency component detection to be carried only on the new binned sequences built from the deltas associated with a cluster change. A summary of that information includes:
- saving the overall length of the binned sequence,
- saving the latest position of each last instance of every class of event,
- saving all events [100] in the most recent time bin for every time resolution.
- These attributes are indexed in a constant-access memory cache backed up by a persistent store by the cluster IDs and the periodic pattern IDs allowing a fast retrieval when clusters or periodic patterns need to be updated.
- Upon reception of a notification from a scoping policies component [485] declaring for example that an event [100] has been aged out, an embodiment of the periodic patterns detection component [405] could remove the corresponding event [100] from its caches and persistent stores and keep only its identifier and time stamp within a compressed representation of the periodic patterns [126] the event [100] appears in. This is another way to reduce the amount of space used for caching results from continuous periodic pattern detection.
- Because all the events [100] are only read and never modified during the usually very short frequency detection phase, which in addition operates on a partitioned subset of the available clusters, the system ends up with a high number of very localized and short processes. Every frequency detection and periodic pattern [126] update process is therefore spun off within its own parallel process for every subset. This factor allows the system to scale the computation through the parallelization and distribution of the frequency detection processes on the whole data stream. Depending on the concurrency mechanism available, an independent short lived process can be dedicated to any local set of classes of events for each update.
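As an illustration of this parallelization, each read-only cluster subset can be handed to its own short-lived worker. In this sketch the `detect_frequencies` body is a placeholder (real detection builds binned sequences and sparse transition matrices); the function names and use of a thread pool are assumptions, not the disclosure's concurrency mechanism.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def detect_frequencies(events):
    """Placeholder for frequency component detection: the real step builds
    binned sequences; here we just count event classes in a read-only subset."""
    return Counter(cls for cls, _ts in events)

def parallel_detect(cluster_subsets, max_workers=4):
    # Events are only read during detection, so workers need no locking;
    # each partitioned subset gets its own short-lived task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(detect_frequencies, cluster_subsets))
```

Because the subsets are disjoint partitions and the events are immutable during this phase, the same pattern distributes naturally across processes or machines.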
- In addition to the operations described above, every sparse matrix is continuously updatable and the frequency component detection happens incrementally.
- This also means that any binned sequence need not be passed to the system in its entirety in one pass. Being able to build the whole binned sequence in one shot is often impossible anyway because of the continuous nature of the system: the periodic patterns detection component [405] processes events [100] as it receives them.
- Therefore, small sections of the binned sequence are built as the underlying events [100] are acquired from the continuous clustering component [412] using for example deltas (see
FIG. 11 ): -
- In case the stream of events [100] needed to build a binned sequence arrives in increasing order with respect to their timestamps, the previously processed portion of the binned sequence is not needed since every newly arriving event falls at the end of the previously built binned sequence. A continuation of the binned sequence can be built only from the newly acquired events and used to update the sparse transition matrix [1115]. This operation is therefore akin to an “append”.
- In case the stream of events [100] arrives in no particular order, the newly acquired events are first sorted in increasing order. If the earliest event [100] from the newly acquired events [100] happened after the last time bin of the previous binned sequence, the previous portion of the binned sequence does not have to be reloaded or rebuilt [1140]. Otherwise, a sub-portion of the previously processed portion of the binned sequence is reloaded up to the position corresponding to the earliest of the newly acquired events [100].
- To continue with the above illustration, let us assume that new events [100] are added to the binned sequence (Portion 1) in the form of (Portion 2) below (We assume that we are using time bins of the same length). This will happen very often as deltas are received from the clustering component [412]:
- (Portion 2) e9|*|e10|*|e11|*|e12
- If the earliest of the newly acquired events [100], represented by e9, had a timestamp such that, relative to the previous portion (Portion 1) of the binned sequence, e9 falls after the time bin that contains e8 and f3, we can carry on without reloading any previous portion of the binned sequence, and updating the sparse transition matrix will proceed exactly as if we had received the whole binned sequence in one pass.
- However, if e9 fell within the previous portion (Portion 1), say at position 9, which was previously occupied by a non-event, we would have to reload only the sub-portion of the previous array up to, and including position 9, meaning:
- (Reloaded sub-portion) . . . *|e5 f1|*|e6|f2|e7|*|e8 f3 . . .
- And we would insert the newly acquired events [100] to obtain
- (Reloaded sub-portion updated) . . . e9|*|e5 f1 e10|*|e6 e11|f2|e12|*|e8 f3 . . .
- Notice that in this last case the number of occurrences in this time interval, as well as the previous gap at position 9, will have to be updated.
- Only the entry T[e,2] will have changed, triggering only an update of the frequency component (e,*) and its associated intervals of occurrence and position information.
- This type of incrementality can be achieved using a dirty flag for each matrix and setting it to true every time an update is made to the matrix, along with the set of two indices pointing to the modified entries. After a matrix is updated from a new binned sequence, process only the entries that have been marked as updated and reset the dirty flag to false.
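The dirty-flag mechanism just described might look like the following sketch. The `SparseTransitionMatrix` class, its method names, and the entry keys are assumptions for illustration only.

```python
# Minimal sketch of a sparse transition matrix with a dirty flag and a
# set of modified-entry indices, assuming entries keyed by
# (event_class, interval) as in the T[e,2] example above.
class SparseTransitionMatrix:
    def __init__(self):
        self.entries = {}          # (event_class, interval) -> occurrence count
        self.dirty = False
        self.dirty_entries = set()

    def update(self, event_class, interval, delta=1):
        key = (event_class, interval)
        self.entries[key] = self.entries.get(key, 0) + delta
        self.dirty = True
        self.dirty_entries.add(key)

    def consume_updates(self):
        """Return only the modified entries, then reset the dirty state,
        so frequency detection re-examines just what changed."""
        changed = {k: self.entries[k] for k in self.dirty_entries}
        self.dirty = False
        self.dirty_entries.clear()
        return changed
```

After each new binned-sequence section is absorbed, `consume_updates` yields exactly the entries whose frequency components need recomputation.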
- The ability to carry on the computation locally and still produce meaningful updates, without reloading or rebuilding previously used portions of the binned sequence, is yet another win in terms of computational performance: when the events being acquired are mostly sorted, it avoids many slow computer input/output operations and network transfers that a high demand for previously computed data (such as prior portions of binned sequences or unchanged entries in sparse transition matrices) would otherwise cause.
- The data structure representing a periodic pattern [126] has, but is not limited to, the following attributes:
-
- 1. A unique periodic pattern ID
- 2. A label which is a short, readable text typically used in displays to label the periodic pattern [126]. Labels are modifiable by external labeling programs even after a periodic pattern [126] has been created.
- 3. A set of cluster IDs representing the event [100] clusters from which the pattern [126] has been detected
- 4. A list of the frequency components found in the periodic pattern [126] and the time resolutions at which they were detected
- 5. For each frequency component, a list of segments and disturbances.
- a. A segment is an interval delimiting contiguous disjoint matches of a pattern [126] in a time binned sequence. It is therefore dependent on a specified time resolution. It simply corresponds to the intervals where a pattern [126] occurs when looking at a binned sequence derived from sorting and binning the events [100].
- b. A disturbance is an interval which delimits spans of time where a pattern [126] did not occur even though it should have when inferring from it frequency components.
- 6. The list of all event IDs constituting the periodic sequence [132]
- To these attributes can optionally be added any number of other attributes such as the different actors [220] linked to every periodic pattern [126].
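As a rough illustration, the attribute list above could be represented as a record such as the following. The field names and types are assumptions, not the disclosure's actual data structure.

```python
# Hypothetical record mirroring attributes 1-6 of a periodic pattern;
# optional extras (e.g. linked actors) can be added as further fields.
from dataclasses import dataclass, field

@dataclass
class PeriodicPattern:
    pattern_id: str                                     # 1. unique periodic pattern ID
    label: str = ""                                     # 2. short readable text, externally modifiable
    cluster_ids: set = field(default_factory=set)       # 3. source event clusters
    frequency_components: list = field(default_factory=list)  # 4. components + time resolutions
    segments: dict = field(default_factory=dict)        # 5a. per-component occurrence intervals
    disturbances: dict = field(default_factory=dict)    # 5b. per-component missed spans
    event_ids: list = field(default_factory=list)       # 6. underlying periodic sequence
```

The mutable defaults use `default_factory` so each pattern gets its own containers, which matters when many patterns are built and updated concurrently.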
- Segments and disturbances are also continuously indexed to allow querying based on time intervals. An embodiment of the data structure used to index segments and disturbances is a slightly modified version of an interval tree as described in [Cormen 2009] in which an extra annotation is added to every interval node to point to the corresponding periodic pattern ID.
- Periodic patterns [126] are not forwarded to other components or databases in their entirety. Because the present disclosure supports operating in a continuous mode where it must produce results as it receives events [100] and update results accordingly, creations and updates of periodic patterns [126] are forwarded in the form of periodic pattern mutations.
- A periodic pattern mutation is a form of typed update [107] consisting of enough information needed by any component receiving it to construct a new periodic pattern [126] or update an existing one. Its attributes include but are not limited to:
-
- 1. A unique periodic pattern ID
- 2. A set of cluster IDs representing the event [100] clusters from which the periodic pattern [126] has been detected
- 3. A list of newly added events [100]
- 4. A list of the frequency components to add to the periodic pattern [126] and the time resolutions at which they were detected
- 5. A list of frequency components to remove from the periodic pattern [126] and the time resolutions at which they were detected
- 6. For each frequency component, a list of segments and disturbances to add to the existing ones using interval merging operations (or create if there are no existing ones)
- 7. For each frequency component, a list of segments and disturbances to remove from the existing ones using interval subtraction operations
- 8. An optional new label.
- An example of a consumer of such periodic pattern mutations is the periodic patterns database which upon reception of each periodic pattern mutation updates or creates the corresponding periodic pattern [126] and saves it in the database for future use.
- This process performs the following actions upon reception of a periodic pattern mutation data structure:
-
- 1. For a newly found periodic pattern and its associated periodic sequence, store into database
- 2. For a continuous update to an existing periodic pattern, update stored information within the database.
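The two actions above might be sketched as follows, with the mutation represented as a plain dictionary mirroring the attribute list and the database as an in-memory map. The interval merging and subtraction operations are simplified here to set union and difference; this is an illustrative assumption, not the disclosure's implementation.

```python
# Hedged sketch of a periodic pattern mutation consumer: create the
# pattern if it is new (action 1), otherwise update it in place (action 2).
def apply_mutation(db, mutation):
    pattern = db.get(mutation["pattern_id"])
    if pattern is None:                       # action 1: newly found pattern
        pattern = {"cluster_ids": set(), "events": [],
                   "components": set(), "label": None}
        db[mutation["pattern_id"]] = pattern
    # action 2: continuous update of the stored pattern
    pattern["cluster_ids"] |= set(mutation.get("cluster_ids", []))
    pattern["events"] += mutation.get("new_events", [])
    pattern["components"] |= set(mutation.get("add_components", []))
    pattern["components"] -= set(mutation.get("remove_components", []))
    if mutation.get("label"):
        pattern["label"] = mutation["label"]
    return pattern
```

Because mutations carry only deltas, the same function handles both the first creation and every subsequent incremental update of a pattern.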
- The periodic patterns database [940] is the main repository for all periodic sequences [132] and their associated periodic patterns [126]. It is continuously updated.
- It is internally designed to allow:
-
- 1. Real-time queries and structural analysis based on structural information such as periodic sequence [132] length, periods, gaps, etc.
- 2. Real-time queries and semantic analysis based on semantic information
In the default embodiment, structural analysis queries take the following forms:
- 1. Given a time interval, find all periodic sequences [132] that have an overlap with that interval.
- 2. Given a periodic sequence [132], find all periodic sequences [132] that have an overlap with that periodic sequence [132]. The results can be sorted by decreasing overlap.
- 3. Bring up all periodic sequence [132] clusters (i.e. groups of most similar periodic sequences [132] with respect to the time intervals during which they occur or not)
The database [940] also allows ad hoc combinations of similar periodic sequences [132] to form new ones.
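Query form 1 above can be illustrated with a minimal sketch. A production store would use the annotated interval tree described earlier; a linear scan over (start, end, pattern ID) triples is enough here to show the overlap test and the annotation back to periodic pattern IDs. The function name and index layout are assumptions.

```python
# Hypothetical sketch of an interval-overlap query over indexed segments,
# each annotated with the ID of the periodic pattern it belongs to.
def overlapping_sequences(index, start, end):
    """index: iterable of (interval_start, interval_end, pattern_id).
    Two closed intervals [s, e] and [start, end] overlap iff
    s <= end and start <= e."""
    return sorted({pid for s, e, pid in index if s <= end and start <= e})

index = [(0, 10, "p1"), (5, 20, "p2"), (30, 40, "p3")]
```

An interval tree replaces the scan with an O(log n + k) search but returns the same annotated pattern IDs.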
- The present disclosure comprises a continuous categorization component [420], which is leveraged in multiple ways, including but not limited to the following use cases:
-
- Whenever categorization is used, its results are continuously evaluated and updated, and are taken into account in addition to the various features built and continuously updated in the data model. This allows the anomaly detection component [450] to produce categorization anomalies as appropriate.
- Whenever an alert [305] is raised or any other type of preventive or mitigation action is taken, categories that have been automatically associated to the corresponding data are reflected in the alert [305]. Thus the alerts [305] themselves, and similarly all types of post-analysis actions, are systematically categorized.
- Finally, a user of the system can also perform online queries on any processed data using the different categories available. Continuous categorization here guarantees that up-to-date results are returned.
- The categorization model [1470] is initially built during system setup, and is later maintained throughout the system operation. Its primary purpose is to evaluate the results produced by the categorization components [1400] and to ensure their consistency and quality. The structure of the categorization model [1460] is summarized in
FIG. 16 , which shows a particular snapshot of the model. The complete history of the categorization model [1460] is actually persisted, along with the system or user decisions that resulted in each new model version. - This model is initially built on the results of knowledge engineering (which includes defining and tuning ontology classifiers [1410] and other categorization components [1400]) and on the reference data provided to the system, where reference data includes but is not limited to the following constituents:
-
- Monitoring rules such as compliance rules [865] used by the anomaly detection component [450] to detect specific types of anomalies [270]
- Organizational structure information including an organizational chart if one was provided to the system, information on actors [220] and groups [225], and a list and definition of workflow processes [128].
- After the initial construction of the categorization model [1470] and once the system has been continuously running for a while, the relevance model stores elements of information including but not limited to:
-
- History of all successive versions of the knowledge base (ontology classifiers [150], custom queries [1530], etc.)
- History of all successive versions of the categorization components
- History of all successive versions of the classification rules [1680]
- History of all categorization results [1590], including detected profile changes [1465] as described later in this section.
-
FIG. 14 shows the preferred embodiment of the categorization process, which can be broken down into three main phases: initialization, adaptation, and iteration. - Initialization phase:
-
- Knowledge engineering [1405] is composed of several sub-steps and consists in building, maintaining, and enhancing a set of categorization components [1420] available to the system.
- The set of categorization components [1420] initially built or modified during maintenance of the process are manually validated by a series of sanity checks [1410].
- An analyst configures the process by defining the scope of the categorization process, its performance targets, and all parameters determining the exact process behavior [1415].
Adaptation phase: - The user reviews data sampled by the system [1425].
- The system computes a “best fit” of categorization components [1420] that generalizes sampling decisions to newly collected data in a near-optimal manner, as described for example in U.S. Provisional Patent Application No. 61/280,791, the disclosure of which is incorporated by reference herein for all purposes [1430].
- The user then performs a series of sanity checks on the classification rules that have been modified during an execution of this phase [1440].
Iteration phase: - Data is collected as described in another section of this disclosure [1445].
- Data is processed [1450], which includes data and metadata extraction and item tagging among other stages.
- The system then automatically applies the classification rules [1400] computed previously to all content extracted from the data [1455].
- This produces new categorization results [1460], which are then used to update the categorization model [1470].
- In parallel, the distribution of categorized data is monitored, which prevents deterioration of the process performance by detecting profile changes [1465] and re-running the adaptation phase when appropriate.
- The set of categorization components [1420] available to the system comprises components of different types, as shown in
FIG. 14 . This enumeration is not exhaustive and can be extended to include any method defining subsets of the data set. - An initial set of components [1420] is built at the beginning of the continuous categorization process. These components [1420] are continuously maintained, meaning that some new components [1420] can be added and existing components [1420] can be deleted or modified. This happens either when a significant model change has been automatically detected in the data, or when an administrator needs to implement a new policy (whether internal or external to the organization).
- The following describes multiple methods provided by the system to assess quality of categorization results and automatically adapt in response to deteriorating quality or unexpected results.
- The system automatically evaluates the classification rules [1400] produced at the query fitting stage [1430] with respect to performance goals defined in the categorization scope. This comprises many variants of validity checks, for example by computing performance targets either on the basis of each classification rule [1400] retained by the algorithm, or on the basis of the final categorization decisions resulting from applying the whole set of rules [1400]. Different validity checks are also available when some categorization codes include unknown values or values corresponding to a high level of uncertainty.
- Further statistical validation can optionally be performed to assess the generalization power of the classification rules [1400]: for example, verifying the self-similarity of the categorization results [1460] (by expanding categorization beyond the manual sample) ensures that there is no major over-fitting with respect to the manual sample [1425].
- In addition to these automatic validation mechanisms, a manual validation stage [1440] needs to be performed. This verification is a sanity check, mainly consisting of controlling that the associations established between categorization rules [1400] and category codes make sense and that there are no obvious indications of over-fitting, i.e. of excessive adaptation of the rules [1400] to the data sample that clearly would not generalize to the data in its entirety. Also, manual validation of the classification rules [1400] is made easier by having the rules contain as much human-readable information as possible: for example, ontology classifiers [150] have an expressive name and a textual description.
- Following the assignment of category codes to the content of a batch or of data collected in real-time, the continuous categorization component [420] automatically evaluates the categorization results against the categorized data. This evaluation is done in addition to the manual and automatic validation steps already performed on the classification rules themselves. Furthermore, the user [455] implicitly evaluates the global categorization results by reading the reports generated by the process.
- A variety of techniques are used in this assessment: a basic technique consists of comparing results from a manual sample with results generalized over the whole data set.
- Profile change detection is a key step in assessing and guaranteeing quality of the output produced by the continuous categorization component [420]. When the data profile, i.e. the statistical distribution of the data analyzed by the system in a continuous manner, changes over time, the classification rules and the categorization components [1420] themselves risk becoming obsolete and decrease both in recall and in accuracy, thus deteriorating the quality of data processing and hence the overall quality of anomaly detection. Therefore in such cases an analyst needs to be notified that either manual re-sampling or component [1420] updating is necessary.
- There are different types of profile changes. Some profile changes can be automatically detected by the model: for example when a connector allows data collection from HR applications, the appearance of a large volume of data associated to a new employee can be related by the categorization process to that employee's information recorded in the HR system. This case of profile change usually does not reveal any flaw in the categorization components [1420] and only requires sampling data for the new actor [220].
- In most cases however, profile changes cannot be directly explained and a manual decision is necessary: such changes may indicate changes in the categorization results [1460], changes in the set of topics detected, changes in the actors [220] and groups [225] involved with the data, etc. In order to automatically detect as many changes as possible, the system relies on the categorization model [1470] described previously.
- The different types of categorization model [1470] refinements following a profile change include, but are not limited to, the following:
-
- 1. Patching existing categorization components [1420]
- 2. Creation of a new component [1420]
- 3. Suppression of an obsolete or invalid component [1420]
- 4. Combination or re-organization of existing components [1420]
- A variety of profile changes can be monitored, depending on the particular usage scenario. In the default embodiment of this disclosure, a default set of data profile changes are monitored, including but not limited to the following list (which also suggests examples of typical cases where such a change occurs):
-
- The hit ratio of a categorization component [1420] decreases significantly: Insufficient coverage by a component [1420] may indicate an incompletely defined ontology [148].
- The hit ratio of a categorization component [1420] increases significantly: The component [1420] may overlap with unrelated content that has recently appeared in the data, such as a commonly occurring topic [144].
- Change in topic map has been detected: New topics [144] are discovered, or previously important topics [144] are no longer present in the data.
- Mismatch between ontology-based classification and unsupervised clustering: Several items [122] with highly similar content are categorized differently by the ontology [148]. This may be due to over-fitting of the ontology classifiers [150].
- A significant volume of data is categorized as relevant but not attributed to any actor [220]: Some new actors [220] have appeared that are involved with relevant data, or it may indicate any other kind of important change in actor [220] interactions and social status.
- The volume of data categorized as relevant for an actor [220] decreases significantly: This may reveal insufficient accuracy of a categorization component [1420] in the context of this actor's [220] data or it may indicate any other kind of important change in actor interactions and social network relationships.
- Instances of a formal or ad-hoc workflow disappear: this happens when business processes are formally updated or when work habits simply evolve over time.
- In the absence of such changes, re-sampling is still performed at a regular frequency, the interval between such re-samplings being optionally defined at the process configuration stage.
- It should be noted that each such profile change depends on a number of criteria. In the default embodiment, these criteria are available as detection thresholds including but not limited to the following.
-
- Absolute thresholds: for example, the minimum ratio of relevant data that should be attributed to an actor [220] (given that there will necessarily always be some content that cannot be assigned, such as unsolicited data or personal communications [123]).
- Relative thresholds: for example, the relative variation in the hit ratio of a categorization component [1420] that should be flagged as a profile change.
- Other metrics: for example, detection of a mismatch between the results of ontology classifiers [150] and those of an unsupervised clustering method [1510] depends on a clustering similarity metric. The most appropriate metric depends on the exact typology of the categorized data; however general scale-invariant metrics can also be used such as the variation of information distance.
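The absolute and relative thresholds above might be combined as in the following sketch. The threshold values, parameter names, and change labels are illustrative assumptions, not defaults from the disclosure.

```python
# Hypothetical profile change detector combining one relative threshold
# (hit ratio variation) and one absolute threshold (actor attribution).
def profile_changes(prev_hit_ratio, curr_hit_ratio, attributed_ratio,
                    rel_threshold=0.5, min_attributed=0.6):
    changes = []
    # Relative threshold: large swing in a categorization component's hit ratio
    if prev_hit_ratio > 0:
        variation = abs(curr_hit_ratio - prev_hit_ratio) / prev_hit_ratio
        if variation >= rel_threshold:
            changes.append("hit_ratio_change")
    # Absolute threshold: too little relevant data attributed to actors
    if attributed_ratio < min_attributed:
        changes.append("low_actor_attribution")
    return changes
```

Each flagged change would then either trigger automatic re-sampling or be routed to an analyst for a manual decision, as described above.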
- The previously described method in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes, works by building a sub-graph in a regular graph (that is, not a hypergraph [114]), and then finding within it skeletons which conform to discussions [136].
- We now extend the prior invention with the use of a hypergraph data structure, which greatly increases expressivity over a conventional graph, as edges [115.20] can now be incident on any number of vertices [115.21] or other edges [115.20]. Using this model, a much greater variety of structures can be represented in the hypergraph [114] in a more efficient and easier to use fashion. Many embodiments will also use the hypergraph [114] to cache partial results in computations where doing so has advantages over re-deriving them (though there are many cases where rederivation can be superior, there are tradeoffs in both directions). The hypergraph [114] system defined here is designed to be used in an environment where elements are continuously added to the hypergraph [114] and may trigger incremental computations to update derived structures in the hypergraph [114], of which discussions [136] are an important example.
- As discussed in more detail below, most embodiments rebuild the sections of the hypergraph [114] in a localized area around where a new evidence [108] is detected. The closure and traversal hypergraph operations [115.26] are particularly good for this. Most embodiments will opt to accumulate some number of changes to be triggered in batches at a later time. A set of hypergraph operations [115.26] are defined which facilitate different strategies for defining the neighborhood of an element.
- In most embodiments including the default one, the continuous embodiment of discussion [136] building is based on accumulation of relationships between actors [220], events [100] and items [122] stored in a hypergraph [114] data structure. These relationships and the subsequent structures built on them are all considered evidences [108] to be used in the discussion [136] building procedure. The hypergraph [114] system is a generic framework supporting incremental hypergraph [114] computations, described here for the purpose of building discussions [136] on a continuous basis. In the default embodiment the hypergraph [114] is represented as a set of OSF [110] values serialized and stored in records [115.24] in a hypergraph store [115.22], which in most embodiments consists of one or more large archives. Rather than using a keyed database, in these embodiments OSF [110] records [115.24] are referred to via their address in the hypergraph store [115.22]. In the default embodiment the hypergraph store [115.22] is tiled into segments [115.23], and addresses have the form of a segment [115.23] and an offset within that segment [115.23]. A segment table [115.25] is stored which contains state information over all segments [115.23] allocated at any time. If a segment [115.23] is unused it can be reallocated as an active segment [115.23], but the entries for prior uses of the segment [115.23] will be retained in the segment table [115.25]. New OSF [110] records [115.24] are added to the end of the newest segment [115.23] as they come in. Records [115.24] are only retained in the hypergraph [114] for a certain period of time; old records [115.24], and in some embodiments shadowed records [115.24] that are no longer accessible, are removed from the hypergraph store [115.22] via the garbage collection scheme. The other main feature is that OSF [110] records [115.24] are immutable.
When updating an existing record, a new version of a record to be updated is appended to the hypergraph store [115.22]. In the case where an updated record is used in the computation of derived records [115.24] or is otherwise referred to by existing records [115.24], new versions of those records [115.24] must be appended in order to make it visible. In practice this fits very well with the continuous computation model, as new evidence [108] added to the hypergraph [114] will trigger revisiting existing discussions [136] or other structures for which it is relevant. Thus the relevant entities that may have to be updated will be anyway visited as part of the incremental algorithm.
- In the default embodiment garbage collection is effected by simply removing segments [115.23] when they pass an age threshold. In this embodiment, items [122] that have become too aged are no longer directly used in the computation of new discussions [136] or other structures. It is important to note that different embodiments will implement the notion of “too old” in different manners depending on their primary use cases. While some embodiments may simply use a calendar threshold, others will operate on the basis of obsolescence. For example, data relating to an actor [220] who is no longer on the scene may be deemed obsolete faster than the same sorts of data for an actor [220] who remains in the universe. Likewise data relating to routine occurrences of something is less valuable than data which involves anomalous behavior [270], and therefore will be slower to obsolesce. One reason for doing this is to be able to make all computations responsive to recent trends in the data and not be completely overwhelmed by historical patterns. However, longer term trends are important as well, even if they have a smaller effect in most computations. Therefore a special set of segments [115.23] is set aside for storing models of these long term trends, patterns and other statistics. These segments [115.23] are handled differently; in most embodiments they don't store OSF [110] records [115.24] and are mutable and periodically updated, with information in old segments [115.23] being removed from the hypergraph store [115.22]. The segment table [115.25] entries for collected segments [115.23] are not removed but rather modified to indicate that any references to records [115.24] previously stored in those segments [115.23] will be handled specially. In most embodiments, the process attempting to access those records [115.24] will be notified that they are no longer active and will be directed to the relevant segment(s) [115.23] containing trend information.
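The append-only, segment-tiled store described above can be sketched roughly as follows. Record serialization, OSF values, and the special trend segments are omitted; the class name, interface, and segment capacity are hypothetical, chosen only to show (segment, offset) addressing, segment-table retention, and age-based collection.

```python
# Simplified sketch of an append-only hypergraph store: immutable records
# addressed by (segment, offset), with age-based garbage collection that
# marks segments "collected" in the segment table instead of deleting
# their table entries.
class HypergraphStore:
    def __init__(self, segment_capacity=2):
        self.segment_capacity = segment_capacity
        self.segments = []            # list of record lists, oldest first
        self.segment_table = {}       # segment id -> state ("active"/"collected")

    def append(self, record):
        if not self.segments or len(self.segments[-1]) >= self.segment_capacity:
            self.segments.append([])
            self.segment_table[len(self.segments) - 1] = "active"
        seg = len(self.segments) - 1
        self.segments[seg].append(record)
        return (seg, len(self.segments[seg]) - 1)     # the record's address

    def read(self, address):
        seg, off = address
        if self.segment_table[seg] == "collected":
            return None               # caller would be redirected to trend segments
        return self.segments[seg][off]

    def collect_older_than(self, seg_threshold):
        # Entries stay in the segment table; only their state changes.
        for seg in range(min(seg_threshold, len(self.segments))):
            self.segments[seg] = []
            self.segment_table[seg] = "collected"
```

Updating a record in this model means appending a new version and appending new versions of anything that refers to it, which matches the incremental revisiting of discussions described above.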
- For purposes of scalability the system focuses on locality in hypergraph operations [115.26], that is, operations based on looking at elements in the hypergraph [114] within a neighborhood of a triggering element or set of triggering elements. In the embodiment described here, additional subsystems are used for storing the original data from which records [115.24] in the hypergraph [114] have been derived as well as for storing the final computed structures, such as discussions [136], used by downstream modules and applications. Other embodiments may choose to store all or some of this data in the hypergraph store [115.22] as well. In such embodiments, segments [115.23] would be tagged for different roles, and only those segments [115.23] involved in incrementally computed results would undergo a collection process. Other embodiments may flag collected segments [115.23] so as to redirect the referring process to an external, longer term data store.
- There are many important advantages to immutability in the data store. Multi-threaded or multi-process access to the hypergraph store [115.22] becomes much simplified, with minimal to no requirements for record locking. The use of the OSF [110] formalism and the concept of unification of structures creates many opportunities for sharding and layering of the hypergraph [114] in a consistent and efficient manner. The key concept is that hypergraph [114] elements can be split up in different ways and unification can be used to rejoin them. The type system implemented for OSFs [110] allows us to place constraints on allowable modifications to parts of an element that have been placed in separate locations, thus helping to enforce consistency when they are eventually rejoined. These characteristics can be used to simplify sharding of a hypergraph [114], e.g. splitting it up into subsets to be distributed. Subsets can contain redundant information, or can split apart elements. New OSF [110] types can be created and sent with the shard to be used as constraints to make sure that modifications are handled consistently. Layering of the hypergraph [114] can be handled by splitting all elements in the hypergraph [114] based on subsets of features.
- Additionally, a general philosophy of recombining smaller pieces of an entity when the whole entity is needed allows us to make algorithms lazy in more ways. This is a benefit to incremental computations as we build up partial results during complex computations. In general those results will be stored as new hypergraph [114] elements. It is likely that only a small subset of those elements will be useful to more derived results such as discussions [136], e.g. not all actor identities [235] will show up in a given discussion [136]. If we spend fewer resources to place the partial results in the hypergraph [114], then we avoid extra work for those that are not used in later computations.
- Most embodiments of the system define a set of hypergraph operations [115.26] on the hypergraph [114] including but not limited to the following: closure, projection, traversal and a set of hypergraph [114] path algebra operations. In the OSF-based embodiment [110] described here these operations may specify additional OSF [110] constraints [115.9]. For example the projection operation [115.28] would start with a set of edges [115.20] and select only those that unified with an OSF [110] value to be used as a constraint [115.9]. The system uses an additional method of constraining the elements that these hypergraph operations [115.26] can access, called “boxing”. A box [115.15] is simply a set of atoms [115.19] from the hypergraph [114]. The purpose of boxing [115.15] is to create groups of hypergraph [114] elements that are related by more complex relationships than can easily be expressed in queries [115.1]. These groups may also be determined by external factors not stored in or referenced from the hypergraph [114]. In many cases the boxes [115.15] may simply be useful for optimizing constraints during query [115.1] execution, by effectively pre-caching the relevant sets of items [122] conforming to the constraint. In the default embodiment, a box [115.15] is defined for each discussion [136]. In addition to the core elements of the discussion [136] (e.g. the external items and relationships between them) these boxes [115.15] may also include other elements of the hypergraph [114] that were considered to be the relevant evidence for the discussion [136] or related to the discussion [136] in other ways.
- In most embodiments, a query [115.1] mechanism is used both for ad-hoc retrieval of records [115.24] from the hypergraph store [115.22] as well as to continuously evaluate an active query [115.1] set on new records [115.24] as they are added to the hypergraph store [115.22]. In a normal graph representation, vertices and edges can be indexed fairly simply. For the hypergraph [114] implementation the system implements a set of query operators [115.30] to deal with the more complex nature of the hypergraph [114] elements.
- Incremental computations are triggered when records [115.24] are matched by the continuous queries [115.1]. In the default OSF [110] based embodiment, query matches [115.16] are produced as an OSF [110] value. These values include descriptions of the matched items [122] built up during query [115.1] evaluation, using unification constraints [115.9] in a manner similar to unification based parsing systems, or logic languages like Prolog. When a query [115.1] is matched, the resulting match record [115.16] will determine which of the incremental update procedures should be triggered. The query [115.1] language is designed so that the query operators [115.30] can be used to build up a working set [115.17] of related hypergraph [114] elements during the evaluation of the query [115.1].
- In most embodiments, external indices are built to aid in query [115.1] execution. In the OSF [110] embodiment described here, skip lists are used as the indexing [115.11] mechanism. In this embodiment, the indexing [115.11] is used to speed up linear scans over records [115.24] from the hypergraph store [115.22] rather than fast random access. Some embodiments may also include index [115.11] schemes emphasizing random access, such as b-tree indices. Any of the skip list solutions described here could also be augmented to an inverted index [115.13] representation as commonly used in text search engines. Indexing [115.11] boxes [115.15] uses a form of skip lists as well.
- The primary goal for this default embodiment is to be able to build all of these indices by simply appending new entries without modification to the older entries. Thus the indices are meant to be read backwards, from the newest entries to the oldest. This works well in conjunction with the structure of the hypergraph store [115.22] in which newer versions of a record shadow the older ones. Various embodiments may use different methods for detecting when elements have been shadowed. One embodiment assigns a logical ID to records [115.24]. When a new version of a record is written, both records [115.24] share the same ID. Many embodiments will simply include the address of the earlier version of the record as an attribute. In both cases the query matching procedure [115.10] needs to keep a set of the first seen IDs or addresses that will indicate that later records [115.24] are shadowed. OSF [110] based embodiments can also use type information attached to a record to determine how it is related to the prior version. In one embodiment there are two types of records [115.24], records [115.24] storing just the changes between the two versions and records [115.24] that replace the prior version. Many other schemes are possible, so long as they efficiently and effectively dispatch data events [100] to the appropriate handler based on the record types.
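The logical-ID shadowing scheme, with indices read from newest entries to oldest, can be sketched as follows. This is a minimal illustration assuming records are held as `(logical_id, payload)` pairs in append order; the names are hypothetical:

```python
def newest_visible(records):
    """Scan an append-only record list from newest to oldest, keeping only
    the first (i.e. newest) version seen for each logical ID; older
    versions sharing that ID are shadowed and skipped.

    `records` is a list of (logical_id, payload) pairs in append order."""
    seen = set()
    visible = []
    for logical_id, payload in reversed(records):
        if logical_id in seen:
            continue  # shadowed by a newer version already reported
        seen.add(logical_id)
        visible.append((logical_id, payload))
    return visible
```

The address-based variant works identically, with the set holding addresses of earlier versions named by newer records instead of shared IDs.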
- OSF [110] values and OSF [110] records [115.24] (an OSF [110] value serialized and stored in an archive record) are referenced frequently throughout the description of the hypergraph [114] system. The acronym OSF [110] refers to a particular type of feature structure called an “Order Sorted Feature Structure”, which is well documented in the research literature. Feature structures are used throughout AI and natural language processing applications; again we refer to the literature for standard definitions and usages, as these concepts are well developed and widely understood. The central operation on feature structures is unification, which in essence merges two feature structures (where the values for features that appear in both structures are themselves unified). Only those usages that differ from common practice will be noted here.
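The central unification operation can be illustrated with a minimal sketch over nested dictionaries. This omits the type lattice and shared value instances of a full OSF implementation and is not the disclosure's representation, purely an illustration of the merge semantics:

```python
def unify(a, b):
    """Unify two feature structures modeled as nested dicts: features
    present in both sides must themselves unify recursively; atomic
    values must be equal. Returns the merged structure, or None on
    unification failure."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for feat, bval in b.items():
            if feat in out:
                merged = unify(out[feat], bval)
                if merged is None:
                    return None    # clash on a shared feature
                out[feat] = merged
            else:
                out[feat] = bval   # feature only on one side passes through
        return out
    return a if a == b else None
```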
- The most relevant feature of the OSF [110] formalism for purposes of the present disclosure is the type system. Types in a feature structure system are handled somewhat differently than in a conventional programming language. Types are themselves represented as feature structures and are available as values which can be unified against. This also means that types can be defined dynamically at runtime. While types are defined within a type lattice, in the default OSF [110] embodiment described here, one or more types can be attached to an OSF [110] value as long as they are consistent with (i.e. can be unified with) the value, rather than the value being an instance of the type. In essence types are used to tag values. However types can also contain instance values in addition to the general classes (such as String, Number, Feature Structure and so on). This can make types useful as templates that store values common to all members of the type. In some embodiments these instantiated values are never stored directly in instance values; instead the implementation looks up the value from a type associated with the instance when it is not present. Values can also be uninstantiated and in this way function similarly to a type, i.e. the value represents a class of potential values. The default embodiment defines some additional intrinsic types in addition to those normally found in feature structure implementations; one representing a set of Choices over an enumerated set of values, and another for ranges for values that have a natural order associated with them.
- When constraints [115.9] are referenced in the context of an OSF [110] embodiment the representation is that of a set of OSF [110] values that are the equivalent of a set of unification equations as seen in common practice. One of the characteristics of feature structures is that multiple features (or feature paths) in the structure may point to the same value instance. This is what distinguishes feature structures from representations such as nested property lists. When setting up a constraint [115.9], the set of OSF [110] values each become a feature of a larger OSF [110] value, and some features (or feature paths) that exist in more than one of the OSF [110] values are modified to point to the same value instance. In this way, when we unify against the value of one of the subfeatures in the constraint [115.9], some of the resulting feature values may then be visible in other subfeatures. This property makes constraints [115.9] bi-directional, which is exploited in query [115.1] representations. If we have a chain of constraints [115.9], we can impose a restriction on all members of the chain by unifying a value against the final sub-feature value in the chain to produce a new, restricted constraint [115.9]. When a matching network [115.5] in an OSF [110] embodiment is generated, the operator nodes [1151] are linked via a tree of dependent constraints [115.9] (e.g. the values produced from constraints [115.9] on sub-trees in the network [115.5] are used as inputs to the constraint [115.9] on their common root node [115.7]). In this case one of the subfeatures is designated the right hand side of the equation (the RHS) and will be used as the OSF [110] value derived for the root node [115.7] after unifying values reported from the node's [115.7] children in the matching network [115.5].
- Hypergraph: The mathematical definition extends standard graphs to allow for edges connecting any number of vertices. The hypergraph [114] system described here extends this definition further by allowing an edge [115.20] to connect any number of vertices [115.21] and edges [115.20]. The list of atoms [115.19] composing an edge [115.20] are subject to an ordering constraint. There are three classes of ordering constraint. The simplest defines no ordering relationships at all, resulting in an unordered edge [115.20]. The most common case is that a total order is specified, which is straightforwardly represented as the order of occurrence of the atoms [115.19] comprising an edge [115.20]. In the most extreme case a separate partial order is defined and can be referenced as the ordering constraint for the edge [115.20]. The partial order is defined separately so that it can be reused across edges [115.20].
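The three ordering classes for hyperedges might be modeled as in this sketch. The names are hypothetical; a real implementation would store atoms by store address and reference a separately defined, reusable partial order:

```python
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Edge:
    """One hyperedge: connects any number of atoms (vertices or other
    edges), subject to one of three ordering classes."""
    atoms: List[Any]                         # atom ids, in order of occurrence
    ordering: str = "total"                  # "unordered" | "total" | "partial"
    partial_order_ref: Optional[str] = None  # used only when ordering == "partial"

    def connects(self, atom_id: Any) -> bool:
        return atom_id in self.atoms
```

For a totally ordered edge the list order itself carries the constraint; an unordered edge treats `atoms` as a set; a partially ordered edge points at a shared partial-order definition via `partial_order_ref`.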
- Hypergraph store: the physical unit of storage for elements of the hypergraph [114]. It should be noted that there is no requirement that only hypergraph [114] elements appear in the hypergraph store [115.22]. The hypergraph [114] may be represented as one subset out of all the records [115.24] in the hypergraph store [115.22].
- Store address: An important characteristic of the system is that records [115.24] in the hypergraph store [115.22] are accessed via a direct address rather than by keyed lookup as would be done in database systems. In one embodiment this address consists of segment id and offset parts, indicating an offset into a segment [115.23] of data in the hypergraph store [115.22]. In such embodiments the hypergraph store [115.22] maintains a master segment table [115.25].
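A segment id plus offset address might be packed into a single integer as sketched below. The bit split is an arbitrary assumption for illustration, not something the disclosure specifies:

```python
def pack_address(segment_id, offset, offset_bits=32):
    """Pack a (segment id, offset) pair into one integer: high bits hold
    the segment id, low bits the byte offset into that segment. Records
    are then fetched by direct address rather than keyed lookup."""
    assert 0 <= offset < (1 << offset_bits), "offset out of range"
    return (segment_id << offset_bits) | offset

def unpack_address(addr, offset_bits=32):
    """Recover the (segment id, offset) pair from a packed address."""
    return addr >> offset_bits, addr & ((1 << offset_bits) - 1)
```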
- Atom [115.19]: Any hypergraph [114] element. As stated before, the default embodiment uses OSF [110] records [115.24] to store all atoms [115.19] and will be referred to as the OSF [110] embodiment. Other embodiments may use data representations other than OSF [110] so long as they store a sufficient description of the atom [115.19]. In OSF-based embodiments [110], a set of types will be predefined by the system that correspond to the two kinds of atoms [115.19]. There may be more than one type defined for each kind of atom [115.19] because for example the system may allow successive versions of an atom [115.19] to be represented either by storing only the differences between the pairs or by adding a record that completely shadows the old version. In such an embodiment, records [115.24] of multiple types could then be used in the same hypergraph store [115.22], allowing the system flexibility as to when to use one method versus the other (for instance small atoms [115.19] may use the shadowing method, whereas large atoms [115.19] may use the difference method). In some embodiments atoms [115.19] may require an attribute that represents a “logical ID”, i.e. an id that uniquely identifies the atom [115.19] and is shared across successive versions of the atom [115.19] in the underlying hypergraph store [115.22].
- Vertex [115.21]: The fundamental unit from which hypergraphs [114] are formed. The OSF [110] embodiment will predefine a base type for OSF [110] records [115.24] representing vertices [115.21].
- Edge [115.20]: A relationship between a set of atoms [115.19]. Edges [115.20] minimally require an attribute containing a list of atoms [115.19]. In some embodiments the edges [115.20] may be ordered, the equivalent to directed edges in conventional graphs. The OSF [110] embodiment will predefine a base type for OSF [110] records [115.24] representing edges [115.20]. Additionally subtypes for distinguishing ordered and unordered edges [115.20] may be defined. Ordered edges [115.20] are the hypergraph [114] equivalent to directed edges in a conventional graph, where the directed edge places an ordering on the pair of vertices that it relates. One point of difference from conventional graphs is that there is no advantage to storing a reversed version of an ordered edge [115.20]. In a conventional graph it is often an advantage to index edges by either head or tail, which can then entail storing reversed versions of edges. As the hypergraph [114] has a more complex structure, traversal, retrieval and other operations on the hypergraph [114] need to use a more general model.
- Index [115.11]: The hypergraph [114] system relies on additional structures to aid in the location and retrieval of atoms [115.19] from the hypergraph store [115.22]. The indices stored by the system do not necessarily aid in quickly finding a random element of the hypergraph store [115.22]. In particular they are more often used to iterate over elements in the hypergraph store [115.22] more efficiently. The goal for these indices is that they be very cheap to update as new data elements are added to the hypergraph store [115.22]. For this reason the skip list data structure is often useful; there is a standard body of practice on using skip lists to efficiently create balanced index structures.
- Hypergraph operations: The system defines a generic set of high level operations for working with the hypergraph [114]. These operations work by modifying a working set [115.17] consisting of atoms [115.19]. There are three core operations listed below.
- Closure operation: The set of elements reachable from an initial atom [115.19] or set of atoms [115.19]. The system defines a closure operation [115.29] that computes and adds this set of atoms [115.19] to a working set [115.17]. Closures potentially can be quite large and encompass the whole hypergraph [114], therefore most embodiments of the system use constraints [115.9] to only return elements of the closure that fall in some smaller neighborhood. In addition to the constraints [115.9], the closure operation [115.29] implements some extra conditions which separate it from a traversal operation [115.27]. An additional notion of consistency is often used when determining which atoms [115.19] are reachable. Different embodiments may choose from any number of consistency measures. In the OSF [110] embodiment described here, closure results in a set of edges [115.20]. The resulting set is restricted to be edges [115.20] that share a minimum number of atoms [115.19] in common. This measure is evaluated over the whole set. This calculation corresponds to a limited form of clustering and is intended to partially or fully replace the clustering operations that were used in prior implementations of the discussion [136] building process.
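The overlap-constrained closure could be sketched as follows, with edges represented as `(edge_id, atom_set)` pairs and a hypothetical `min_shared` threshold standing in for the consistency measure described above:

```python
def closure(seed_edge, all_edges, min_shared=1):
    """Constrained closure sketch: starting from a seed edge, repeatedly
    add edges sharing at least `min_shared` atoms with the atoms
    accumulated so far, until no more edges qualify.

    Edges are (edge_id, frozenset_of_atom_ids) pairs."""
    result = {seed_edge[0]}
    atoms = set(seed_edge[1])
    changed = True
    while changed:
        changed = False
        for edge_id, edge_atoms in all_edges:
            if edge_id not in result and len(atoms & edge_atoms) >= min_shared:
                result.add(edge_id)
                atoms |= edge_atoms
                changed = True
    return result
```

Raising `min_shared` (or replacing it with a percentage-overlap test evaluated over the whole set) tightens the consistency condition, which is what gives the operation its limited clustering behavior.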
- Projection operation: Removes elements from a working set [115.17]. The elements are chosen via constraints [115.9] and other filters. Embodiments may use characteristics such as edge [115.20] weights, arity and so on to select edges [115.20] in the projection. In some embodiments edge [115.20] paths contained in the working set [115.17] are replaced with new edges [115.20]. This is accomplished through the use of a path algebra. In the default embodiment the path algebra operations are specified via an OSF [110] value and a set of types are predefined for that purpose.
- Traversal operation: Adds elements to a working set [115.17], starting from elements that are in the working set [115.17] and following edges [115.20] incident on those elements. The operation shares similarities with the closure operation [115.29], but is focused on more fine grained control of the traversal rather than properties of the resulting set. The operation can be parameterized to run according to a traversal pattern. In the default OSF [110] embodiment the traversal pattern is specified via an OSF [110] record. Breadth-first patterns add elements from successive neighborhoods out to a certain number of layers or hops. Depth-first patterns are used to select paths out to a certain number of hops. For example, a depth-first pattern may specify the first n-many paths, or the n highest weighted paths.
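A breadth-first traversal pattern limited to a number of hops might look like the following sketch; the adjacency map is a simplification of edge incidence in the hypergraph, and the names are illustrative:

```python
from collections import deque

def breadth_first_traversal(working_set, incident, max_hops):
    """Expand a working set by following edges incident on its members,
    out to `max_hops` layers of neighborhoods.

    `incident` maps an atom id to the ids of atoms connected to it."""
    visited = set(working_set)
    frontier = deque((atom, 0) for atom in working_set)
    while frontier:
        atom, hops = frontier.popleft()
        if hops >= max_hops:
            continue  # hop budget exhausted on this path
        for neighbor in incident.get(atom, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return visited
```

A depth-first pattern would instead maintain explicit paths and cut off after the first n-many paths or the n highest weighted ones.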
- Hypergraph Query: The system implements an engine for running queries [115.1] against atoms [115.19] in the hypergraph [114]. A query [115.1] is represented as a nested expression [115.2]. Expressions [115.2] consist of terms [115.4] and functions [115.3] over terms [115.4] and functions [115.3]. Terms [115.4] define a match condition. In OSF [110] embodiments a term [115.4] is represented as a feature value pair. The feature is used to select a value from the OSF [110] value representing an atom [115.19] and the match condition is satisfied if the two values can be unified successfully. Query expressions [115.2] map to a set of operators [115.30]. These operators [115.30] operate on match records [115.16] that contain a payload [115.18] value and a working set [115.17] of atoms [115.19]. One set of operators [115.30] simply wrap the hypergraph operations [115.26] defined by the system. The modifications to the working set [115.17] performed by these operators [115.30] are limited by a set of constraints [115.9], i.e. only atoms [115.19] that pass the constraints [115.9] may be added/removed (or considered for addition/removal depending on what is most efficient for the operator [115.30] in question). Typically there will be several constraints [115.9] active. Most of the individual operators [115.30] will take a constraint [115.9] argument and there are other constraint mechanisms described below. As defined earlier, in OSF [110] embodiments constraints [115.9] are implemented via unification equations. This also means that a new OSF [110] record will be created as a result of evaluating the equation. Thus in OSF [110] embodiments constraints [115.9] can be used both as a mechanism for licensing an operation and for building up OSF [110] records [115.24] that describe the atoms [115.19] in a working set [115.17] in a match [115.16]. It should also be noted that operators [115.30] may define new transitory edges [115.20] that only exist in the working set [115.17].
In some cases those edges [115.20] may be placed back in the hypergraph store [115.22], but often this mechanism is used to reconstruct an edge [115.20] that was never directly placed in the hypergraph store [115.22]. This is used to implement lazy algorithms.
- Query Operators: These are the operators [115.30] used to specify queries [115.1] against hypergraph[114] elements. When evaluated a query operator [115.30] works on one working set [115.17], but a query procedure [115.10] maintains a list of working sets [115.17]. Each atom [115.19] initially matched in a query [115.1] spawns a working set [115.17] which may be expanded during the evaluation of query operators [115.30]. An embodiment of a set of query operators [115.30] appears later in this section.
- Box: An additional indexing [115.11] mechanism over atoms [115.19]. This mechanism is used to implement constraints that are in force during a query [115.1]. As stated earlier a box [115.15] is simply a set of atoms [115.19]. In some embodiments boxes [115.15] may simply be one type of edge [115.20] stored in the hypergraph [114]. In the embodiment described here boxes [115.15] are stored in a supplementary store. Boxes [115.15] are related to each other by containment, so the set of boxes [115.15] can be placed in their own graph structure with directed edges [115.20] from each box [115.15] to the largest boxes [115.15] contained within it. This means that if boxA contains boxB contains boxC, links only exist from A to B and B to C. Mathematically this structure is a finite lattice. Hypergraph [114] query operators [115.30] can specify that sub-operators [115.30] are constrained to work within (or outside of) a box [115.15]. Boxes [115.15] can be selected in several ways, and there are query operators [115.30] that can change the current box [115.15] by moving up or down in the hierarchy.
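The containment lattice over boxes, with links only from a box to the largest boxes it contains, could be computed as in this sketch. The names are hypothetical and boxes are plain sets here rather than stored atoms:

```python
def containment_links(boxes):
    """Build the box containment structure: a directed link goes from
    each box to the largest boxes strictly contained in it, so that if
    A contains B contains C, only A->B and B->C links exist.

    `boxes` maps box name -> set of atom ids."""
    links = {name: [] for name in boxes}
    for a, sa in boxes.items():
        # all boxes strictly contained in a
        contained = [b for b, sb in boxes.items() if b != a and sb < sa]
        for b in contained:
            # link a->b only if no intermediate box sits between them
            if not any(boxes[b] < boxes[c]
                       for c in contained if c != b):
                links[a].append(b)
    return links
```

Query operators can then restrict sub-operators to atoms inside (or outside) a box, and move up or down this hierarchy by following the links.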
- In one embodiment, the hypergraph store [115.22] is an archive or set of archives that are split up into segments [115.23]. New segments [115.23] are allocated as the hypergraph [114] grows, and aged elements are removed from the hypergraph [114] by removing segments [115.23]. In order to allocate a new segment [115.23] the system first looks for a collected segment [115.23] (i.e. one that has been removed from active use) to reuse, or extends the archive to allocate a new segment [115.23]. The hypergraph store [115.22] has a master segment table [115.25] that indicates the type and status of a segment [115.23]. New entries are always appended when a segment [115.23] is allocated. This is because there may be dangling references to collected segments [115.23] in the active part of the hypergraph store [115.22]. The record for the freed segment [115.23] will contain information used to determine how dangling references are resolved, as described earlier in this document. In the OSF [110] embodiment OSF [110] records [115.24] are stored at offsets into a segment [115.23], therefore they can be addressed with a segment id:offset pair as the address. As already noted, the OSF [110] records [115.24] represent elements of the hypergraph [114], called atoms [115.19]. However they are not restricted to that purpose, therefore the hypergraph [114] may be augmented with additional record types, or these additional records [115.24] may be used to do bookkeeping for storing partial state of an ongoing computation, etc. In one OSF-based embodiment [110], the OSF [110] records [115.24] are immutable. As discussed above, this places additional requirements on how hypergraph [114] structures are to be updated, but greatly simplifies management of the hypergraph store [115.22], particularly in multi-threaded or multi-process implementations.
As the hypergraph store [115.22] is to be continuously updated, we seek to minimize the cost of adding new elements to the hypergraph store [115.22]. Hypergraph [114] computations that are triggered by the addition of new elements will then use lazy algorithms that only build out structures that are directly needed. The motivating factor for this design is that in many cases it can take more time to compute a result and store it in a way that it can be efficiently retrieved than to simply re-derive it as and when necessary. The hypergraph [114] system is designed to take advantage of these cases wherever possible.
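Segment allocation with reuse of collected segments and an append-only segment table can be sketched as follows; the class and method names are hypothetical illustrations of the scheme, not the disclosure's implementation:

```python
class HypergraphStore:
    """Sketch of segment allocation: reuse a collected segment if one is
    available, otherwise extend the archive. Segment-table entries are
    always appended, never removed, so dangling references to collected
    segments can still be resolved."""

    def __init__(self):
        self.segment_table = []   # append-only log: (segment_id, status)
        self.collected = []       # ids of segments free for reuse
        self.next_id = 0

    def allocate_segment(self):
        if self.collected:
            seg_id = self.collected.pop()   # reuse a collected segment
        else:
            seg_id = self.next_id           # extend the archive
            self.next_id += 1
        self.segment_table.append((seg_id, "active"))
        return seg_id

    def collect_segment(self, seg_id):
        # Append a status entry rather than deleting the old one.
        self.segment_table.append((seg_id, "collected"))
        self.collected.append(seg_id)
```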
- As described earlier, all OSF [110] records [115.24] are associated with one or more types. One set of types will determine what kind of atom [115.19] the record represents. However additional types associated with the record can be used to determine how the record should be handled by the system. Here we describe two of these additional distinctions: how a record is related to prior versions and the inclusion of special processing instructions for individual atoms [115.19].
- In one OSF [110] embodiment, new versions of a record can be added by either completely shadowing the prior version or by describing a set of differences from the old version. For example when dealing with a large edge [115.20] the system may only store new members added to the edge [115.20] rather than storing an entirely new copy. Using this differential storage creates opportunities for new synergies with the query matching procedure [115.10] described below. For example, in a case where the query [115.1] only specifies the traversal of the most recent n paths in a match [115.16], those paths will usually be found in the last delta added for an edge [115.20]. In that case we have avoided retrieving and updating the edge [115.20] as well as any other atoms [115.19] dependent on it, by only storing the difference into the hypergraph [114]. This scheme has potential costs as well, since we need to retrieve multiple records [115.24] to reconstruct the state of the whole edge [115.20]. The OSF [110] type system enables the implementation to select a tradeoff between these methods atom [115.19] by atom [115.19]. The details of specialized handling of records [115.24] are hidden behind the hypergraph operations [115.26] described above. This allows the system to evolve and add new representations without modifying existing applications.
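Reconstruction of an edge from a mix of full-shadow and delta records can be illustrated with this sketch; the record kinds are labeled `"full"` and `"delta"` purely for illustration:

```python
def reconstruct_edge(records):
    """Rebuild an edge's current member list from its version history:
    a 'full' record completely shadows everything before it, while a
    'delta' record stores only the members added since the prior version.

    `records` is a list of (kind, atom_list) pairs in append order."""
    members = []
    for kind, atoms in records:
        if kind == "full":
            members = list(atoms)     # shadows all earlier records
        elif kind == "delta":
            members.extend(atoms)     # only the additions were stored
    return members
```

A query touching only the most recent paths can stop after reading the last delta, which is the synergy noted above; the cost is that a full reconstruction must walk back to the last `"full"` record.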
- When analyzing real world data, no algorithm is perfect. When problems are identified, most OSF-based [110] embodiments of the system allow for “fixes” to be placed in the hypergraph [114] in the form of specialized processing instructions. Embodiments using other record formats may be able to implement a similar scheme. In the OSF [110] embodiments, a predefined set of OSF [110] types represent additional instructions that can be given to the various algorithms used by the system. In an embodiment that computes actor identities [235] as elsewhere described, a possible problem might be that a particular pair of aliases [240] is erroneously associated with the same actor identity [235]. This system would allow an annotation to be placed on the respective alias [240] atoms [115.19] prohibiting the combination of the two. These could be added to elements due to user feedback or other procedures detecting inconsistencies or problems in derived hypergraph [114] structures.
- Queries [115.1] have two purposes: to retrieve elements from the hypergraph [114] and to trigger hypergraph [114] computations in response to new atoms [115.19] being added to the hypergraph store [115.22]. The same query matching procedure [115.10] is used for both purposes.
- The query [115.1] functionality covers three areas in most embodiments: matching features of OSF [110] records [115.24], augmenting a query working set [115.17] with hypergraph operations [115.26], and placing constraints on the hypergraph operations [115.26] either by additional feature tests or by membership or non-membership in a box [115.15].
- The default embodiment described here represents query operators [115.30] as a set of functions [115.3], whose arguments are either other query operators [115.30] or feature structures to use as terms [115.4] or as constraints [115.9]. Functions [115.3] accept match records [115.16] from their constituent arguments and produce a match record [115.16] if their condition is met. Match records [115.16] are represented as an OSF [110] record. This record contains bookkeeping information about the partial match [115.16] as well as a “payload” [115.18] feature that pulls some or all of the values out of its arguments. The initial match records [115.16] are created by comparing the leaf feature structures to atoms [115.19] in the hypergraph store [115.22]. If the feature structure unifies, a match record [115.16] is created with the unified result as the payload [115.18]. If the operator [115.30] has a constraint [115.9], we attempt to unify the constraint [115.9] values against a match record [115.16] and the constraint [115.9] is satisfied if it succeeds. In one embodiment the constraint [115.9] is a unification equation that relates values in two feature structures. The resulting match record [115.16] contains the RHS value after successful unification of incoming match records [115.16] against the LHS. Note this same mechanism can be used to introduce new values into the match record [115.16] as well.
- In one embodiment the set of operators [115.30] is represented as a network [115.5] of vertices representing the operators [115.30] and edges that link operators [115.30] to any enclosing operators [115.30]. The highest operator [115.30] is linked to a final reporting node [115.8]. Any match records [115.16] that flow through to that node [115.8] are reported as results.
- Note that this structure can represent multiple queries [115.1], as the penultimate node [115.7] must be unique to each query [115.1]. When building the network [115.5] we reuse sub-queries [115.1] and only add nodes [115.7] for new ones. We can also turn off queries [115.1] by keeping a reference count of the number of active downstream nodes [115.7]. To deactivate a query [115.1], set the reference count on its penultimate node [115.7] to zero, and walk through its children decrementing the count. Only those nodes [115.7] with a positive reference count will report results forward.
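The reference-counting deactivation walk might be sketched as follows. This is a minimal model of the matching network as a child map; the cascade into grandchildren when a child's own count reaches zero is an assumption about how shared sub-queries are fully retired:

```python
def deactivate_query(ref_counts, children, node):
    """Deactivate a query in a shared matching network: zero the
    penultimate node's reference count, then walk its children
    decrementing counts. A child shared with another active query keeps
    a positive count and continues reporting results; a child that
    reaches zero is retired and its own children are decremented."""
    ref_counts[node] = 0
    stack = list(children.get(node, ()))
    while stack:
        child = stack.pop()
        ref_counts[child] -= 1
        if ref_counts[child] == 0:
            stack.extend(children.get(child, ()))
    return ref_counts
```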
- The query procedure [115.10] effectively creates a working set [115.17] for each atom [115.19] that it examines. Some query operators [115.30] are just wrappers around hypergraph operations [115.26] that expand or filter that working set [115.17]. We do not distinguish between atom [115.19] types in most embodiments as that can be done by selecting the atoms [115.19] that go into an index [115.11], as well as by feature matching and constraints [115.9].
- In most embodiments query operators [115.30] will include, but will not be limited to, the following:
-
- Is (feat, value) [118.05]: if the atom [115.19] contains “feat” and its value unifies with value, then create a partial match record [115.16] containing the unified value.
- IsNot (feat, value, [opt] pattern) [118.10]: if the atom [115.19] doesn't contain “feat” or it does and the value does not unify, create a match record [115.16]. If the optional third argument is specified, return the part of the atom [115.19] matching pattern.
- And (operator+, constraint) [118.15]: if all operator arguments produce a match record [115.16], pass on a new match record [115.16] based on a merge of the incoming match records [115.16]. The constraint [115.9] maps values from the incoming match records [115.16] to the result.
- AndNot (operator, operator) [118.20]: if arg1 matches and arg2 does not, then pass on arg1.
- Or (operator+) [118.25]: pass through the first matching argument; alternatively, pass through the unification of all match records [115.16].
- Close (operator, constraint) [118.30]: calculate the closure of an atom [115.19], i.e. find the set of all directly connected elements. For example, if edge 123 shares an atom [115.19] with edge 234, they are both in the same closure. The operator allows for a feature structure that can be used as a constraint/filter on items [122] that are added to a closure. Closure calculation specifies the amount of overlap required between items [122] (e.g. only if they share x % of their atoms [115.19]). This is just a modified form of what the clustering framework does with hashes, so in fact the closure operator [115.30] can be used to do clustering.
- Project (operator, [opt] expression, constraint) [118.35]: take a subset of the working set [115.17]. As above, a feature argument can be used as a filter. A path algebra expression can be used in some embodiments to create a new working set [115.17] with edges [115.20] based on paths found in the current working set [115.17]. In OSF [110] embodiments the path algebra expression will be represented as an OSF [110] record.
- Traverse (operator, [opt] expression, constraint) [118.40]: expand the working set [115.17]. By default this just adds the neighborhood for atoms [115.19] in the working set [115.17]. The operator [115.30] accepts an optional expression controlling the traversal as described for the hypergraph [114] traversal operation [115.27]. In OSF [110] embodiments the traversal instructions are passed in as an OSF [110] record.
- Within (box, operator) [118.45]: a box [115.15] is selected based on arg1. All operations within arg2 are constrained to atoms [115.19] within this box [115.15]. This box [115.15] overrides any box [115.15] selected by enclosing operators [115.30]; similarly, a sub-operator [115.30] may specify a box [115.15] that overrides the one chosen at this level. In OSF [110] embodiments arg1 is an OSF [110] record that is to be matched against a list of box [115.15] descriptors.
- Outside (box, operator) [118.50]: as above but the constraint is reversed.
- Widen (constraint, operator, mode) [118.55]: it is an error if no enclosing operator [115.30] specifies a box [115.15]. Select a box [115.15] that is higher in the hierarchy, using arg1 as a constraint against the box [115.15] descriptor. Arg3 determines whether all boxes [115.15] matching the constraint are used or the first one.
- Narrow (constraint, operator, mode) [118.60]: as above but move down in the hierarchy
- Union (operator+, mode) [118.65]: the mode determines whether or not we try to unify match record [115.16] payloads [115.18]. If unification is indicated the operation proceeds only if the payloads [115.18] can be unified, and the unified result is used as the new match record's [115.16] payload [115.18]. Otherwise, a new OSF [110] value is created that contains each of the original payloads [115.18] as a feature. The union of the working sets [115.17] from argument match records [115.16] is returned.
- Intersection (operator+, mode) [118.70]: only succeeds if the intersection is non-empty. The mode argument specifies how to handle the payloads [115.18] of arguments that have members in the resulting working set [115.17]. If unification is indicated the operation proceeds only if the payloads [115.18] can be unified, and the unified result is used as the new match record's [115.16] payload [115.18]. Otherwise, a new OSF [110] value is created that contains each of the original payloads [115.18] as a feature. The intersection of the working sets [115.17] from argument match records [115.16] is returned.
- Annotate (operator, constraint) [118.75]: in the OSF [110] embodiments, the constraint [115.9] will create a new match record [115.16]. If the match record [115.16] unifies with the constraint [115.9] LHS pass on a new match record [115.16] constructed from the constraint's [115.9] RHS.
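- As one illustration of the operator style above, a simplified And [118.15] can be sketched as follows. Dictionaries stand in for match record [115.16] payloads [115.18], and the constraint is reduced to a feature-renaming map rather than a full unification equation; `and_op` is an illustrative name, not from the disclosure:

```python
def and_op(match_records, constraint=None):
    """Sketch of the And operator: succeed only if every argument
    produced a match record; merge their payloads into one record.
    The optional constraint maps a result feature to a source feature
    drawn from the incoming payloads."""
    if any(r is None for r in match_records):
        return None  # some argument failed to match
    merged = {}
    for rec in match_records:
        for feat, val in rec["payload"].items():
            if feat in merged and merged[feat] != val:
                return None  # conflicting values cannot be merged
            merged[feat] = val
    if constraint:
        merged.update({dst: merged[src] for dst, src in constraint.items()
                       if src in merged})
    return {"payload": merged}
```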
- The embodiments of the system described here have all assumed the presence of an index [115.11] based on skip lists. The system in fact does not necessarily require the presence of an index [115.11]; any of the techniques described here can be used by scanning through the OSF [110] records [115.24] in the hypergraph store [115.22]. The value of skip lists is twofold: they are very cheap to maintain, and they implement a technique to speed traversal of the indexes [115.11]. The embodiment of skip lists described here orders the atoms [115.19] they index by time of arrival in the hypergraph store [115.22]. Alternatively they can support indexing based on other orders, but generally at a cost which is greater than the technique described here.
- We describe an embodiment of the system in which the skip lists are built in reverse order. The normal skip list model needs to be augmented so that we know how far forward a skip will go and there also needs to be a way to coordinate skip lengths across the indices. All indices will add atoms [115.19] in the same order, and a simple solution to the second problem is just to add every atom [115.19] whether it's a useful member of the list or not. However, that simple solution is not very desirable and can be improved by counting the atoms [115.19] that are not members of the list in the skip lengths.
- The base entries [116.05] in the list each contain a pair of the feature value for an atom [115.19] and its address. Additionally, entries [116.05] contain a variable-length array of pointer and skip count pairs [116.10] that are used to link entries together. Skip list entries can therefore be members of multiple linked lists. The technique used here is that each successive level in the skip list skips a larger number of entries in the list. The goal is to be able to skip over a large number of entries in a small number of steps. In order to skip a number of entries [116.05] from the current entry [116.05], find the level with the largest skip count that is less than or equal to the remaining number of entries [116.05] to skip. Follow the pointer to the next entry on that level and subtract the skip counter from the remaining number of entries, terminating when the remaining count reaches zero. This procedure is used during query [115.1] execution to speed traversal of the hypergraph [114] when there are multiple query terms [115.4]. Every time query [115.1] matching fails for an atom [115.19], we can skip over the entries [116.05] for that atom [115.19] in the lists that have not yet been visited for that atom [115.19]. As a consequence of the design, queries [115.1] that are more restrictive and/or have a larger number of terms [115.4] and sub-expressions [115.2] will gain the most, due to being able to maximize skip lengths. This is because in general more skips will accumulate before revisiting one of the later index [115.11] lists. An embodiment may optimize query [115.1] execution speed by ordering term [115.4] tests such that indices referencing fewer atoms [115.19], or tests that are determined to be more likely to fail, are visited first. In this way the procedure can find skips both earlier and by visiting a smaller number of index [115.11] lists. As we only visit the later lists when the query [115.1] evaluation for an atom [115.19] has not already failed, this has the result of producing larger skips in those lists.
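- The skip procedure described above can be sketched as follows, using an assumed minimal representation in which each entry holds a list of (pointer, skip count) pairs [116.10]; `Entry` and `skip` are illustrative names:

```python
class Entry:
    def __init__(self, value):
        self.value = value
        self.levels = []  # (next_entry, skip_count) pairs, one per level

def skip(entry, n):
    """Skip forward n entries: repeatedly follow the level with the
    largest skip count that does not overshoot the remaining distance,
    subtracting each skip taken, until the remaining count reaches zero."""
    while n > 0:
        best = None
        for nxt, count in entry.levels:
            if nxt is not None and 1 <= count <= n:
                if best is None or count > best[1]:
                    best = (nxt, count)
        if best is None:
            return None  # list too short to skip that far
        entry, taken = best
        n -= taken
    return entry
```

With a higher level covering three entries at a time, a skip of four takes two steps instead of four.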
- Construction of the list is relatively simple and requires a counter that is incremented for each new item added to the hypergraph store [115.22]. When a new item is added to the hypergraph store [115.22], determine which skip lists to update. For each skip list, add a new entry [116.05], and determine how many levels of the skip list to include it in. For instance, to have skips of increasing orders of magnitude, place it in the second level if the current counter is a multiple of 10, in the third level if a multiple of 100, and so on. When there are atoms [115.19] that are not to be included in a list, this may result in more level entries than necessary. In one embodiment this problem is handled by keeping a separate relative counter for atoms [115.19] added to the list and using that counter to determine the number of levels, while still recording the skip count based on the first counter. This requires that the last positions used at each level from the first counter be tracked. When adding entries to a skip list with a level array [116.10] of length x, we first increment the head's [116.15] skip counters for each level, then copy the first x level pairs from the head [116.15] of the list into the new entry's [116.05] level list [116.10] and update the first x level pointers in the head to point to the new entry [116.05]. Finally, set the first x skip counters to zero.
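- A minimal sketch of this construction, assuming order-of-magnitude levels and the head update described above; `Node`, `num_levels` and `add_entry` are illustrative names, not from the disclosure:

```python
class Node:
    def __init__(self):
        self.levels = []  # (pointer, skip_count) pairs

def num_levels(counter, base=10):
    """Levels for the entry with this counter value: every entry gets
    level one, plus one extra level per power of `base` dividing it."""
    levels = 1
    while counter > 0 and counter % base == 0:
        levels += 1
        counter //= base
    return levels

def add_entry(head, entry, x):
    """Link a new entry with x levels in at the head of the list:
    bump every head skip counter, hand the head's first x level pairs
    to the new entry, then point the head's first x levels at it
    with skip count zero."""
    head.levels = [(ptr, cnt + 1) for ptr, cnt in head.levels]
    entry.levels = head.levels[:x]
    for i in range(x):
        head.levels[i] = (entry, 0)
```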
- Augmenting the skip list to be an inverted index [115.13] is fairly straightforward. It has the same structure as the skip list above (see [116.20] [116.25] [116.30]), however the level lists are constructed slightly differently. An inverted index [115.13] essentially acts like a collection of skip lists in this context. We add an extra table [116.35] that maps from each feature value entry to the head [116.30] of a set of skip list levels used for just that value. The construction procedure is the same as above with the change that we append together the level lists [116.30] for each of the skip lists that reference the atom [115.19]. One embodiment will retain the feature value in skip list entries [116.20] so that values contained in the head table [116.35] may be used to denote a range of possible values (which can be used to reduce the number of posting lists). In this embodiment each value contained at the head table [116.35] must be able to successfully unify with all of the values contained in its associated set of levels [116.30] [116.25].
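- The head table [116.35] idea can be sketched very simply, with plain posting lists standing in for the per-value skip list levels [116.30]; `InvertedIndex` is an illustrative name:

```python
from collections import defaultdict

class InvertedIndex:
    """Sketch: a head table mapping each feature value to the list of
    atom addresses carrying that value, in arrival order (posting
    lists stand in for the per-value skip list levels)."""
    def __init__(self):
        self.heads = defaultdict(list)

    def add(self, value, address):
        self.heads[value].append(address)

    def lookup(self, value):
        return self.heads.get(value, [])
```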
- The skip list index [115.31] used for boxes [115.15] has an additional purpose beyond the skip lists described above. It is desirable to be able to directly enumerate the members of a box [115.15]. To do this we add to the skip list representation above a set of levels consisting of <boxId, pointer> pairs. These allow us to chase pointers to recover the members of a box [115.15] as well as skipping using the other levels. One set of skip list levels [117.05] [117.10] [117.15] is constructed similarly to the skip lists described above; however, a feature value is not required to be stored in the base list entry [117.05]. An additional set of levels [117.20] is added for each of the box [115.15] membership lists, along with an array of the heads of each level [117.25]. In order to be more space efficient, an embodiment can choose to store only the containing boxes [115.15] at the bottom of the box [115.15] hierarchy. For example in FIG. 117, Box1, Box2 and Box4 are the only ones that would need to be placed in membership lists [117.20]. Membership in other boxes [115.15] could be found by searching the parents of these boxes [115.15] in the hierarchy. The embodiments described use a separate data structure to track the hierarchy of containment relationships between boxes [115.15]. In one embodiment the box [115.15] hierarchy is simply kept in the hypergraph [114] itself.
- The following description of query [115.1] matching is written assuming an OSF [110] embodiment. However, the logic of the procedure [115.10] is not dependent on an OSF [110] representation. There are some asides in the description below that apply to an OSF [110] record representation; they do not limit the applicability of the procedure [115.10].
-
FIG. 119 defines the logical structure of the query matching procedure [115.10] that will be used in many embodiments. The figure contains branches for both indexed and continuous queries [115.1]. For indexed queries [115.1] a list of indexes [115.11] is kept. An index [115.11] is marked as exhausted when the procedure [115.10] reaches the end of the index [115.11]. If an index [115.11] entry refers to an atom [115.19] from a segment [115.23] that has been collected, then depending on the garbage collection strategy used by the embodiment, the end of the list may have effectively been reached. Each index [115.11] is associated with a particular feature, and an additional list of the tests based on that feature is kept for each index [115.11]. - When running a continuous query [115.1] the procedure [115.10] has access to the OSF [110] record itself in most embodiments. An embodiment may choose to bypass the scan over indices and scan over atoms [115.19] in the hypergraph [114] directly, retrieving the OSF [110] records [115.24]. Tests for the continuous queries [115.1] first retrieve a value from the atom [115.19] based on the feature. The test can specify whether or not the value is required to be non-empty. If the value may be empty and none is present, the test succeeds vacuously; otherwise the test succeeds if the test value can be unified against the retrieved value. The following text describes the workflow used in one embodiment.
- Starting from [119.05] an indexed query procedure [115.10] enters a top level loop that scans through the set of indices. First it selects an index [115.11] [119.10]. If the index [115.11] is exhausted [119.15] then check to see if there are any more indices [119.20]. If not then the procedure [115.10] terminates. Otherwise we enter a loop that evaluates all the initial feature tests [119.35].
- A continuous query procedure [115.10] [119.30] will bypass the loop that scans through indices and start at [119.35] as well. At [119.35] the procedure [115.10] selects the next test and checks to see if it is satisfied [119.40]. If so then we create an initial match record [115.16] and advance it to all target nodes [115.7] associated with that test [119.45]. If a test appears in more than one immediately containing expression [115.2], then it is linked to the operator nodes [115.7] created for all those expressions [115.2]. When a match record [115.16] is advanced to a node [115.7] it may become active. The activation of an operator node [115.7] depends on its type. However the decision is generally based on receiving the appropriate arguments and any initial tests or constraints [115.9] to be satisfied. If a node [115.7] does not become active then the match record [115.16] advanced to it will be stored, in case other match records [115.16] are reported to the node [115.7] later in the procedure [115.10] and it then becomes active. The procedure [115.10] next moves all nodes [115.7] that became active in the last step into the active node queue [119.60]. If there are more tests [119.55] then the procedure [115.10] continues [119.35]. Otherwise we move to a loop [119.70] that continues attempting to activate operator nodes [115.7]. Note that if we are running an indexed query [115.1], then at [119.55] the test checks to see if any tests remain in the list for the current index [115.11].
- It should be noted that the operator nodes [115.7] corresponding to the outermost query expressions [115.2] are linked to a final reporting node [115.8]. The final node [115.8] does not become active as operator nodes [115.7] do; rather, when a match record [115.16] is advanced to the final node [115.8] it is reported as a query [115.1] match. While there are more nodes [115.7] in the active node queue [119.70], remove an operator node [115.7] from the queue and evaluate it [119.75]. If the evaluation fails [119.80] then continue with the loop. Otherwise, we create a match record [115.16] and advance it to any downstream nodes [115.7] [119.95]. As with the initial tests, during generation of the matching network [115.5] an embodiment may detect shared sub-expressions [115.2] and connect the sub-network [115.5] generated for the sub-expression [115.2] to the operator nodes [115.7] generated for all immediately containing expressions [115.2]. Any nodes [115.7] that became active in the prior step are placed in the queue [119.90].
- When the queue empties [119.35], any completed query matches [115.16] will have been advanced to the final node [115.8] and reported. The procedure [115.10] resets the matching network [115.5] state by removing all match records [115.16] from operator nodes [115.7] [119.100]. If this is a continuous query [115.1] [119.85] then we are finished. Otherwise, if there are no more atoms [115.19] [119.65] then we are finished. Otherwise we advance to the next atom [115.19] [119.50] and reenter the initial loop [119.10]. In one embodiment the procedure [115.10] only needs to check that there are unexhausted indices, and it will set the current atom [115.19] to the next atom [115.19] in the next entry of the first index [115.11] it chooses.
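- The overall loop can be sketched as follows. Node activation is reduced to a `ready` predicate and operator evaluation to an `eval` callable, both hypothetical simplifications of the behavior described above; records reaching the final node are reported, and the network state is reset between atoms:

```python
from collections import deque

def run_matching(atom, tests, network, final):
    """Sketch of the matching loop: satisfied feature tests seed match
    records at their target operator nodes; active nodes are drained
    from a queue, each successful evaluation advancing a record to
    downstream nodes, and records reaching `final` are reported."""
    queue = deque()
    results = []
    for test, targets in tests:
        rec = test(atom)
        if rec is None:
            continue  # test not satisfied for this atom
        for node in targets:
            node["pending"].append(rec)  # stored even if not yet active
            if node["ready"](node):
                queue.append(node)
    while queue:
        node = queue.popleft()
        rec = node["eval"](node)
        if rec is None:
            continue  # evaluation failed
        for nxt in node["out"]:
            if nxt is final:
                results.append(rec)  # report as a query match
            else:
                nxt["pending"].append(rec)
                if nxt["ready"](nxt):
                    queue.append(nxt)
    for node in network:
        node["pending"].clear()  # reset network state between atoms
    return results
```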
- There are several conceivable ways that embodiments can optimize this procedure [115.10]; the optimizations will in general be different for the continuous and indexed cases. One of the most valuable areas to optimize is the initial tests against feature values. One embodiment will substitute less expensive test routines for commonly recognized patterns in the OSF [110] values and thereby avoid the unification step. This procedure [115.10] may be simplified by introducing a set of OSF [110] types for which optimizations have been determined. In continuous queries [115.1] one embodiment unifies several or all of the test conditions so that they can all be checked with only one unification operation. For example, several tests feeding into an “and” operator [115.30] may be combined this way. This lets the unification code find the optimal strategy for the two values. Again, the type system may be used within the unification implementation to dispatch to possible optimizations.
- The following procedure describes how the hypergraph [114] system is used to enable building of discussions [136] in a continuous usage mode. It should be noted that the hypergraph [114] system described here is a general purpose tool; it is used to continuously compute a large variety of complex structures in addition to discussions [136]. The embodiments described here reference discussions [136], but the use of the system is not limited to this one procedure.
- For continuous processing a large set of queries [115.1] are set up to be run continuously. These queries [115.1] are intended to trigger computations in response to new atoms [115.19] added to the hypergraph [114]. The system contains a set of dispatch rules that determine which incremental computation (if any) should be run in response to query matches [115.16]. These rules trigger computations intended to synthesize new pieces of evidence as well as the procedure(s) used to create or update discussions [136].
- In one OSF [110] embodiment queries [115.1] produce OSF [110] values as part of the matching procedure [115.10]. These queries [115.1] also contain constraints [115.9] expressed as unification equations that are used to build up the resulting match record [115.16] values. These queries [115.1] do not have to look for particular values, and would not be very useful for incremental computation if they did. In the OSF [110] embodiment queries [115.1] can easily be set up to detect broad classes of evidence. This is because OSF [110] feature structures can contain empty, typed values as well as choice and range values that can unify against a set of possible values. This OSF [110] embodiment will also use constraints [115.9] to represent the dispatch rules triggering incremental computations. As elsewhere, these constraints [115.9] can be used to construct a new OSF [110] value based on the value that they unify against. In this context that functionality can be used to pass instructions on to the routine carrying out the incremental computation. One way of describing this system is as a feed-forward inference engine, and it is subject to the same sorts of optimizations as those systems. When a computation is triggered, both the payload [115.18] and working set [115.17] from the match record [115.16] are passed to the triggered procedure.
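- A sketch of the dispatch step, with a toy `simple_unify` standing in for OSF [110] unification and rules given as (constraint, procedure) pairs; all names here are illustrative, not from the disclosure:

```python
def simple_unify(constraint, payload):
    """Toy stand-in for OSF unification: succeed if every constrained
    feature agrees with the payload; return the combined record."""
    out = dict(payload)
    for feat, val in constraint.items():
        if feat in payload and payload[feat] != val:
            return None  # disagreement: unification fails
        out[feat] = val
    return out

def dispatch(match_record, rules):
    """Feed-forward dispatch sketch: rules whose constraint unifies
    with the record's payload trigger their procedure, which receives
    both the bound payload and the working set."""
    triggered = []
    for constraint, procedure in rules:
        bound = simple_unify(constraint, match_record["payload"])
        if bound is not None:
            triggered.append(procedure(bound, match_record["working_set"]))
    return triggered
```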
- An example of the general flow of processing is as follows. An email is added to the hypergraph [114]. Evidence rules are triggered which perform computations such as resolving aliases of the sender and recipients of the email to actors, possibly creating new actors [220] as a side effect. Various other relationships might be found for content in the email, and so on. As any new evidences are added to the hypergraph [114], new query matches [115.16] may be produced. At some point, a query [115.1] which checks to see if the actors [220] associated with the email are consistent with actors [220] in one or more discussions [136] or emails (via the closure query operator [115.30]) triggers a computation to see if the email should be added to one or more of those discussions [136] or cause the formation of a new one.
- Embodiments have a great deal of flexibility in how to handle this flow. For example, an embodiment may run match records [115.16] or incrementally triggered computations in batches to share work in computing updates. The order in which computations are triggered may be important as well. Embodiments may use a queueing methodology, embed conditions in the dispatch rules, or use other mechanisms to establish orderings. Any such approaches are heavily dependent on the rules used, the set of computations that are available, and the details of how they work.
- The structure of the procedure is fairly straightforward. When a match record [115.16] is reported [120.05] it is compared to a list of dispatch rules. For each rule that is matched, if it is an evidence rule (e.g. used to calculate some intermediate result), an evidence calculation procedure is triggered [120.10]. The procedure may run immediately at this point or be scheduled to run later or in a batch. At the point the procedure runs, it may either enumerate the current evidences that are affected by the new data, or simply produce a new evidence, in which case the overlapping or otherwise affected evidences are determined in the next step [120.15]. Once this set is determined, a set of updates to the hypergraph [114] has to be decided [120.20]; as noted elsewhere these updates may be new atoms [115.19] that shadow the old definition or atoms [115.19] that specify a set of changes to the earlier version of the atom [115.19]. The decisions here are dependent on the type and content of the new evidence atom(s) [115.19].
- If a matched rule triggers a discussion [136] building computation the sequence is somewhat more involved. First, the working set [115.17] of the query [115.1] is expanded to produce an on-the-fly hypergraph [114] of evidences (i.e. the working set [115.17]) related to the item and any discussions [136] that triggered the computation [120.25]. This expansion of the working set [115.17] may also create new transitory edges [115.20] reflecting heuristics and other ad-hoc methods for specifying or limiting relationships. At a broad level, the discussion [136] is built by taking a projection from this evidence hypergraph [114] [120.30]. Projection in this sense is defined similarly to the hypergraph [114] projection operation [115.28]. In one embodiment the working set [115.17] is transformed into a set of pairwise edges [115.20] and the projection is found by running a maximum spanning tree algorithm. In other embodiments the algorithm is modified to run on a hypergraph [114] directly. The algorithm is further augmented by conditions that disallow links to be used by the algorithm. The general result of this is that the algorithm will act as if the hypergraph [114] is split into components, and spanning trees will be computed for each of the components.
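- The projection step can be sketched with a standard Kruskal-style maximum spanning forest over pairwise edges [115.20]; the weights, edge tuples and `disallowed` set are illustrative assumptions, and disallowed links naturally split the graph into components with one spanning tree each:

```python
def max_spanning_forest(edges, disallowed=frozenset()):
    """Kruskal's algorithm on (a, b, weight) edges, heaviest first;
    pairs in `disallowed` are skipped, which can split the graph so a
    spanning tree is produced per remaining component."""
    parent = {}

    def find(x):
        # union-find root lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for a, b, w in sorted(edges, key=lambda e: -e[2]):
        if (a, b) in disallowed or (b, a) in disallowed:
            continue  # link condition forbids this edge
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            chosen.append((a, b, w))
    return chosen
```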
- As with the calculation of evidences, the set of currently affected discussions [136] is enumerated [120.35]. The new discussions [136] are added to the hypergraph [114] [120.40]. In one embodiment discussions [136] are represented as an edge [115.20] and a partial order on the items and other constituent structures in the discussion [136]. In an alternate embodiment the edge [115.20] is unordered, and any process viewing the discussion [136] will decide how to order and display the members of the discussion [136] based on retrieving associated edges [115.20]. In what is perhaps the most flexible scheme, an embodiment will simply define a discussion [136] as a box [115.15], and the reported result will be the ID of the box [115.15]. A process can then construct a view of the discussion [136] or discussions [136] related to an item by running a query [115.1] constrained by the box [115.15], or simply return all the members of the box [115.15]. The system assumes that the discussions [136] are reported [120.45] to an external process or store or other similar mechanism. In one embodiment the address of a discussion [136] edge [115.20] is reported. In this manner the client can be notified when there are changes to a discussion [136] or a new discussion [136] has been added. Then, rather than running a query [115.1], the client can directly retrieve the edge [115.20] and traverse to related atoms [115.19].
- An important assumption is that atoms [115.19] describe a subset of relevant characteristics of source actors [220], items and events. In most embodiments, the source data for these entities will be archived separately and atoms [115.19] will contain location data for the source data corresponding to each atom [115.19]. In the description below, as elsewhere, an OSF [110] based embodiment will be assumed. Embodiments that use other representations will need to be able to at least minimally represent the characteristics described below.
- The core concept is that the structure of atoms [115.19] is defined by a set of facet types [121.40]. The intent is that the structure of an atom [115.19] can be determined by unifying a list of facet types. The implication of this is that facet types do not define individual values contained in atoms [115.19]; rather, each is a slice of the final atom [115.19]. The set of facet types associated with an atom [115.19] provides an efficient mechanism for dispatching to incremental computation procedures. The set of types is intended to have the effect that most dispatch rules need only test for one or more of these types in order to determine what to trigger for an atom [115.19].
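- A sketch of facet composition and facet-based dispatch, with dictionaries standing in for OSF [110] feature structures; `compose_facets` and `matches_facets` are illustrative names, not from the disclosure:

```python
def compose_facets(*facets):
    """An atom's structure as the unification of its facet types:
    each facet contributes a slice of features, and composition fails
    if two facets disagree on a value."""
    atom = {"facets": set()}
    for name, features in facets:
        for feat, val in features.items():
            if feat in atom and atom[feat] != val:
                return None  # facets disagree: unification fails
            atom[feat] = val
        atom["facets"].add(name)
    return atom

def matches_facets(atom, required):
    """Dispatch test: most rules need only check facet membership."""
    return required <= atom["facets"]
```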
-
FIG. 121 provides a highly schematic example of how atoms [115.19] are associated with facet types in one embodiment. A type EmailItem [121.45] is defined as the unification of the types ItemIdentity [121.05], ActorBroadcast [121.20], ArchivedContent [121.25] and EdgeAtom [121.35]. EdgeAtom [121.35] defines the set of characteristics necessary for an edge [115.20] atom, i.e. an address, a logical ID, a list of incident atoms [115.19] and so on. ItemIdentity [121.05] contains characteristics that identify the source item. ActorBroadcast [121.20] is used for messages that are broadcast from a sender to a list of recipients. ArchivedContent [121.25] defines fields that specify where the email's content can be found. For comparison, a type IMConversation [121.50] represents a set of IM messages that have been determined to be part of one conversation. Therefore it is associated with a collection of items and uses the CollectionIdentity [121.10] facet. The set of actors [220] associated with a conversation (in some IM systems conference rooms can be set up which allow for more than two participants) is an unstructured list, therefore the ActorPool [121.15] facet is used. The content of the conversation is structured into a hierarchy of exchanges, turns and so on, which is represented via the HierarchicalContent [121.30] facet. - The Cataphora system is deployed in a world with an extremely broad variety of real world data sources and use cases. In the real world, many companies and other environments will have unique types of items. As an example, consider an email system such as Lotus Notes, where templates for forms and other custom message types can be added to the system. In such cases, messages that can be handled as conventional email are just a subset of the traffic carried by the internal, potentially highly customized Lotus Notes system.
- One embodiment of the Cataphora system will have a set of complex computational procedures that will be difficult to implement and change. For all practical purposes these are a fixed set, though they can be changed and updated over time. In order to bridge the gap, these procedures are built to recognize facets of items. The system does not have any intrinsic notion of email, but rather of what types of relationships are relevant to a message that is broadcast to a list of recipients.
- The strategy employed to integrate data sources into most embodiments of the Cataphora system is to determine what set of facets best represents each item. Items are archived external to the hypergraph store [115.22], and an OSF [110] record is created conforming to the set of facet types chosen for each item or event or other object to be represented in the system. An advantage of the OSF [110] formalism, as opposed to other formalisms for feature structure types, is that it does not require total typing, i.e. an OSF [110] value can have features and values in addition to those specified by the type of the value. The implication for the system is that we associate a set of types to a value rather than making it an instance of a type per se.
- Since every organizational computing environment is different, most embodiments support the addition of arbitrary new types of evidence sources [108]. An evidence source [108] is defined to be any system which produces regular records that can reliably be linked to one or more actors in the relevant universe. Most valid evidence sources [108] will have a time and date stamp associated with each unique event record that they generate, however some embodiments will support evidence sources [108] that lack this, and instead use indirect means to infer a time and date (such as the arrival of an event between two other events which do have such a timestamp.) Common examples of evidence sources [108] include but are certainly in no way limited to: transactional systems, scheduling systems, HR systems, accounting systems, intelligence monitoring systems, and systems which crawl the web looking for comments in areas of specific interest.
- Each new evidence source [108] represents at least one new dimension in the model; more precisely, each distinct type of event does. For example, transactions which are cancelled in a trading system are likely to be considered a different vector than transactions which exceed some externally specified level of risk. In many embodiments therefore, each broad class of event will be considered a vector in a minimal integration, though of course whoever is performing the integration can decide what vectors make sense for their particular purposes if they wish. Most embodiments will also allow non-orthogonal vectors to be expressed because doing so will often add significant value. For example, a marked overall increase in the dimension of emotive tone on the part of an actor may be considered noteworthy in many instances of the system's deployment; that such an increase is largely the result of a particular topic or is in relation to another specific actor is also often well worth knowing in such cases. While this of course can be set up manually, many embodiments will also automatically perform such combinations whenever it is merited by a statistically uneven distribution of data such as in the example just above. Some such embodiments will generalize and expand the dimensionality of the model based on such empirical observation. Some embodiments may opt to maintain a “virgin” model that factors in no user feedback as a security control.
- This approach is critical to being able to describe a rich, high-dimensional model of behavior which allows even relatively small, subtle changes in behavior to be trapped. Even more importantly, such a model can be defined without any definition of rules, which tend to have a very limiting effect. That is, rules look for behavior that is believed to be bad based on some prior incident, rather than aiming to identify unexplained changes in behavior that, especially when used in conjunction with other types of data or evidence sources [108], can help predict a dangerous incident before it occurs.
- As described in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein, ad hoc workflows are processes [128] that differ from formal processes [128] in at least one of two ways: they are not documented (or not documented thoroughly) as formal workflow processes [128], and they are not subjected to such strict enforcement as formal workflow processes [128], i.e. they can tolerate various kinds of deviations from the expected model, such as missing stages [154], additional stages [154], unusual number of iterations, etc. The present invention expands on this definition to continuously detect the presence of significant workflow processes [128] and to assess how regularly these workflows are performed by various actors [220], which in turn allows detection of anomalies [270] in assessed behavior [205] in that regard.
- More precisely, ad-hoc workflow models are detected during the analysis of the input event stream by the continuous workflow analysis component [465]. In the default embodiment, a new workflow model is built whenever a statistically significant number of event sequences [166] are identified which conform to a particular pattern, including but not limited to the following types of patterns:
-
- 1. A sequence of documents [162] (including templated documents) of given nature (i.e. document type) and function (i.e. business attributes) appear in the same relative order. For example, a recruitment process in a small or medium business may be characterized by the exchange of candidate resumes, of interviewer feedback following interviews, and of an offer letter—all these events involving actors [220] within the HR department as well as within another specific department.
- 2. A sequence of communications [123] or other events [100] involving actors [220] within a given group [225] and having topical and/or pragmatic tagging constraints. For example, the development of a service proposal for a customer may not be formalized within the corporation but follow an informal, iterative review process on topic [144] X, with pragmatic tags such as the following: a work product is sent, a work product's reception is acknowledged, agreement or disagreement with some work product is expressed, the final version of the work product is approved or rejected, etc.
- 3. A combination of any of the previous patterns. For example, transactions in a financial organization involving over-the-counter instruments (i.e. those not traded on a public exchange) tend to follow similar series of steps although these are not strictly formalized or constrained in the daily course of business: back-and-forth negotiation of a deal between a trader and one or more brokers (using any type of communication channel [156]), followed by a confirmation phone call, followed by a back-office operation to record the deal in the bank's database systems, etc.
- In one embodiment of the present disclosure, once a pattern has been detected as significant in the baseline data analyzed for the whole set of actors [220] or a subset of those, the workflow model is built as a higher-order Markov chain whose states are composed of the individual event [100] and item [122] patterns, including but not limited to:
-
- Type of event [100]
- Document [162] attributes in the case of the creation, modification, deletion, check-in, check-out, etc. of a document
- Topics [144] and pragmatic tags [172] associated to the event [100]
- Sender and recipient group [225] when the event [100] is an electronic communication [123]
- These states are augmented with primitives to accommodate the expressiveness of ad-hoc workflow processes [128]:
-
- Fork primitive to represent parallel paths in the workflow
- Iteration primitive to represent basic loops in the workflow (where the number of iterations is by default unbounded).
- Conditional loops, whereby each iteration step is conditioned on a predicate being satisfied.
- In another embodiment, for better completeness and space efficiency, a probabilistic suffix automaton is used instead of a higher-order Markov chain.
- In either case, as described in the following, the model is trained on a training set defined, for example, as a group's baseline window of events. In the default embodiment, re-training occurs at regular intervals defined in the system configuration (which, depending on the business domain considered, may be on the order of weeks or months). In another embodiment, a load-balancing component adapts the training frequency so as to find an adequate trade-off between maintaining a recent model and not overloading the machines hosting the various components of the processing and analysis layer [402].
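As a sketch under stated assumptions, the Markov-chain embodiment above can be trained by counting transitions between coarse event-type states. The function name, the `order` parameter, and the event labels below are hypothetical illustrations; a real embodiment would derive its states from the event [100] and item [122] patterns listed earlier:

```python
from collections import defaultdict

def train_markov(sequences, order=1):
    """Count transitions from each length-`order` history to the next
    event type, then normalize into conditional probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            history = tuple(seq[i:i + order])
            counts[history][seq[i + order]] += 1
    model = {}
    for history, nxt in counts.items():
        total = sum(nxt.values())
        model[history] = {event: c / total for event, c in nxt.items()}
    return model

# Each sequence is a list of coarse event types, e.g. the stages of the
# hypothetical recruitment workflow described above.
baseline = [
    ["resume", "feedback", "feedback", "offer"],
    ["resume", "feedback", "offer"],
]
model = train_markov(baseline, order=1)
```

Higher-order behavior (order > 1) simply lengthens the history tuple, at the cost of sparser counts; this is the trade-off that motivates the probabilistic suffix automaton alternative.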
- The ad-hoc workflow model thus built is leveraged by the present disclosure in two different ways: anomaly visualization and outlier detection.
- Firstly, as described in U.S. Pat. No. 7,519,589, the disclosure of which is incorporated by reference herein for all purposes, the ad-hoc workflow model is visualized so as to very efficiently spot any abnormal information flow or steps in any process, particularly a critical one. Such anomalies include but are not limited to:
-
- Missing stages [154] in the workflow.
- Additional, unexpected stages [154] in the workflow (including when the process unexpectedly morphs into several workflows).
- Unusual number of iterations in a looping step (for example, for a deal negotiation that usually requires at least 3 rounds of deal terms review, there is never more than a single round when a given sales person is in charge of the deal).
- Unexpected time intervals between events [100] (either significantly shorter or significantly longer than the usual delay).
- Anomaly in the workflow structure: for example if a parallel step is supposed to be performed under certain conditions (such as a background check when hiring for an executive position), that parallel step may be bypassed when the conditions are met, or conversely may be unduly performed to eliminate specific candidates from the hiring process.
- Anomalies detected outside the workflow process [128], but affecting some of the actors/groups involved in the workflow and within the same time frame: e.g. 2 actors systematically changing their communication patterns at a given step of the process, or an external event that exhibits a strong correlation with the process [128].
- Secondly, the model is used to detect outlier workflow instances [134]: these are instances [134] that match the informal process [128] definition (because they exhibit the properties previously described in terms of document types, actors [220] and groups [225], topics [144] and pragmatic tags [172], etc.) but have a very low probability of having been generated by the model.
- Using the implementation artifacts described above (higher-order Markov chains and probabilistic suffix automata), this outlier detection mechanism is straightforward as the generation probability is given by the model.
- For example, when the workflow process [128] is modeled by a probabilistic suffix automaton, one embodiment of the present disclosure uses the following similarity measures for detecting outliers:
-
- Compute the probability that an input workflow instance [134] has been generated by the model as the product of all transition probabilities, and compare this probability to the probability of its being a randomly generated sequence, computed as the product of the distribution probabilities of each state. If the first probability is lower than the second, the workflow instance [134] is flagged as an outlier.
- Alternatively, compute the normalized log-probability of the input workflow instance with respect to the workflow model. If this probability falls more than a given multiple of standard deviations (typically 3) below the average probability of workflow instances [134] within the training set, this instance [134] is flagged as an outlier.
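The second measure can be sketched as follows, assuming a first-order transition model stored as a dictionary of conditional probabilities. The function names, the smoothing floor for unseen transitions, and the default threshold are illustrative assumptions, not part of the disclosure:

```python
import math

def log_prob(model, seq, order=1, floor=1e-9):
    """Normalized log-probability of a sequence under the transition
    model; unseen transitions are smoothed with a small floor value."""
    lp, n = 0.0, 0
    for i in range(len(seq) - order):
        history = tuple(seq[i:i + order])
        p = model.get(history, {}).get(seq[i + order], floor)
        lp += math.log(p)
        n += 1
    return lp / max(n, 1)  # normalize by the number of transitions

def is_outlier(model, instance, training_instances, k=3.0):
    """Flag an instance whose normalized log-probability falls more
    than k standard deviations below the training-set average."""
    scores = [log_prob(model, s) for s in training_instances]
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return log_prob(model, instance) < mean - k * std
```

The normalization by transition count keeps long and short workflow instances [134] comparable under a single threshold.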
- It should be noted that outliers, as well as anomalies, are defined by assessing the normalcy of an input workflow instance [134] with respect to a model derived from the observation of instances [134] as a training set. As is the case for the general case of detecting anomalies by deviation [805], this training set or referential can be defined in a number of ways, including but not limited to the following important definitions:
-
- Instances for the same actor groups [225] at a prior time (peer-group referential, as formalized in the section on Anomaly detection). This allows detection of deviations from an informal workflow process [128] over time within a certain set of individual actors [220] in the organization, as well as specific actors [220] who do not follow the same workflow as the majority of the other actors [220] in the group [225] (for example, it might be interesting to detect actors [220] who are systematically sloppy and skip important stages [154] in a workflow process [128]).
- Instances for the exact same actors [220] at a prior time (called baseline referential). This allows the system to detect deviations associated with a particular actor [220]. The benefit of this referential compared to the previous one is that, by definition, ad-hoc workflow processes [128] do not impose strict constraints on the sequence of stages [154] that constitute them, and the fact that an actor usually performs a process in a different way from the rest of his peers does not constitute an anomaly [270] by itself but only denotes somewhat different working habits or modes of communication and interaction. However, the fact that the workflow performed by that particular actor [220] exhibits significant and a priori unexplained changes at some point in time is worth flagging as unusual by the system.
- We claim a method and system to identify and analyze occurrences of subjective emotional expression in electronic communication. Human evaluation involves subjective and objective components. If we judge a steak by saying “This steak is prime, by USDA standards”, the judgment is relatively objective. If we judge a steak by saying, “Wow! What a delicious steak!”, subjective elements are in play. Systems of “sentiment analysis” or “opinion mining” (comprehensively surveyed in [Pang 2008], pp. 1-135) typically don't distinguish these different (but sometimes overlapping) forms of evaluation and expression. While they often do recognize a distinction between subjective and objective aspects of appraisal, the relevant notion of subjectivity is decoupled from direct emotional expression. Focusing on direct emotional expression of an individual provides important and useful information about the relation of that individual to events in the immediate environment, as well as insight into the relation of that individual to those he or she is communicating with.
- The emotive tone analysis method included in the present invention has the following characteristics:
- It distinguishes emotional expression (subjective component) from appraisal (objective component). Identifying emotional expression depends on a variety of indicators: lexical choice (specific words and phrases, including the interjections and the “responsive cries” studied in [Goffman 1981]), person distinctions (first-person (“I”, “me”, “my”, “mine”, “we”, etc.) involvement is especially important), tense distinctions (favoring present tense), syntactic constructions (such as exclamative structures like “What a beautiful day it is!”), and modification. Different factors play different roles for different emotions.
- The emotive tone analysis component [435] recognizes a set of basic emotions and cognitive states. These include (among others) anger, surprise, fear, confusion, frustration. These overlap with, but are not coextensive with, the set of basic emotions identified and studied by Paul Ekman that are typically communicated not linguistically, but by physiognomic gestures (see [Ekman 2003]).
- Like the emotions signaled physiognomically, these basic emotions are scalar in nature. In the case of physiognomic gestures, greater intensity is indicated by increased magnitude of the gesture. In the case of linguistically communicated emotions, scalar properties are indicated in a variety of ways:
-
- lexically: “livid” is a description of a more extreme range of anger than “angry” (a claim that can be supported by the range of modifiers that occur with the two expressions);
- by augmentative modifiers like “very” or “extremely” that indicate an increase in the range of intensity;
- by mitigating modifiers like “a bit” or “somewhat” that indicate a diminution or decrease in the range of intensity;
- by punctuation or capitalization or other forms of Loud Talking (see the Digital Mirror section).
- In one embodiment of the system, the range of any emotion can be modeled by the emotive tone analysis component [435] as an open interval. Basic individual expressions (“livid”, “angry”, “upset”, etc.) can be associated with a basic sub-interval of such an interval. Modifiers can be interpreted as affecting this range. The basic observation is that initial modification has a greater effect than subsequent modification. For example, the difference between A and B below is greater than the difference between C and D:
-
- A: “I'm angry.”
- B: “I'm very angry.”
- C: “I'm very very very very very very very very angry.”
- D: “I'm very very very very very very very very very angry.”
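The open-interval embodiment, together with the diminishing effect of repeated modification illustrated by sentences A through D, can be sketched as follows. The base interval for “angry”, the step size, and the geometric decay factor are all illustrative assumptions; the disclosure specifies only that initial modification has a greater effect than subsequent modification:

```python
def intensity_interval(base, n_very, step=1.0, decay=0.5):
    """Shift a base (lo, hi) intensity interval upward once per 'very',
    each successive modifier contributing half the previous shift."""
    lo, hi = base
    shift, total = step, 0.0
    for _ in range(n_very):
        total += shift
        shift *= decay  # diminishing effect of repeated modification
    return (lo + total, hi + total)

ANGRY = (3.0, 5.0)  # hypothetical base range for "angry"

a = intensity_interval(ANGRY, 0)  # "I'm angry."
b = intensity_interval(ANGRY, 1)  # "I'm very angry."
c = intensity_interval(ANGRY, 8)  # eight occurrences of "very"
d = intensity_interval(ANGRY, 9)  # nine occurrences of "very"
# The gap between a and b exceeds the gap between c and d, matching
# the observation about sentences A-D above.
```

A mitigating modifier like “a bit” could be modeled symmetrically with a negative step.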
- In another embodiment of the system, the range of any emotion can be modeled by the emotive tone analysis component [435] as a partial ordering. The utility of this model is that in a partial ordering, different modes of increasing intensity may not be directly comparable.
FIG. 38 shows a simple Hasse diagram (where points higher on the diagram indicate greater intensity) illustrating two modes of increasing intensity: capitalization and modification by intensifiers (here, “very”). These two modes are compatible (their combination forms an upper bound for both in the diagram), but the two modes are not easy to rank subjectively in relative intensity. - In some cases, there are causal links between specific basic emotions and specific cognitive states. For example, ignorance is a cause of confusion. These causal links can be exploited in both directions: confessions of ignorance can be interpreted as evidence for confusion; expressions of anger may be linked to interesting causal events. (The existence of such stable links is an important reason to study and analyze emotional expression.)
- In the model computed by the emotive tone analysis component [435], emotional intensity varies across individual actors [220], across pairs of actors [220], across events [100], across social groups [225], etc. Measuring intensity (as described above) makes it possible to filter and analyze emotionally expressive communication in various ways. In one implementation, the focus of interest may be the statistical outliers in emotional expression, on the assumption that these occurrences are most likely to correlate with other events of interest.
- The analysis of emotional expression performed by the emotive tone analysis component [435] is compatible without restriction with a wide variety of other analytic methods: topical [144] filtering, actor-based [220] or domain-based filtering, temporal filtering (by interval, by time of day, etc.).
- Finally, the emotive tone analysis component [435] can identify and analyze emotional expression either statically, i.e. on a fixed dataset, when the system is running in batch mode [370] or dynamically, i.e. on a data stream, when the system is running in continuous mode [375]. In continuous mode, the analysis of emotional expression can be carried out not only retrospectively (as in the case of a fixed static dataset), but also prospectively so that future emotionally-involved events may be anticipated.
- The system described in the present invention includes a pragmatic tagging component [430] that will categorize the communicative and discourse properties of individual electronic communications [123]. This pragmatic tagging component [430] is a further development and implementation of the system described in U.S. Pat. No. 7,143,091, the disclosure of which is incorporated by reference herein for all purposes, and is designed to support a variety of functions, including but not limited to the following.
- Workflow analysis: an important feature of electronic communications [123] is how they relate to workflow processes [128], i.e. the set of corporate tasks associated with normal business. Salient aspects of such communications [123] include requests for information or deliverables, negotiations concerning such requests, status updates, delivery of results, acknowledgment of receipt of information or deliverables, etc., together with a range of communicative information related to the social relations of those communicating (such as positive and negative forms of politeness, including thanks, praise, etc.).
- Discussion building: sets of electronic communications [123] (on any kind of communication channel [156]) typically have various levels of structure, including a first-order structure involving the individual elements of the set, and higher order structures that link first-order structures together in various ways (see, e.g., U.S. Pat. No. 7,143,091 for details). A very simple example is a case in which an email message that ends with a request is linked to a subsequent message acknowledging the request and perhaps fulfilling it. It is possible to postulate hypothetical links between first-order electronic communications [123] based completely on their metadata [140]: what actors are involved, what temporal properties are involved, common properties of subject lines (including indications of forwarding or response added by a mailer or other software application used to format and transfer electronic communications [123]). Pragmatic tags [172] offer a different basis for postulating hypothetical links, one which can be used to strengthen or confirm hypotheses based on other sources of information.
- Lexical analytics: the form of words that individuals use in communicating about a workflow process [128] or topics [144] of mutual interest often reveals attitudes and presumptions that the communicants convey directly or indirectly; access to these overt or implicit attitudes is often useful in assessing motives with regard to the tasks at hand or to other actions or events.
- Relational analytics: linguistic aspects of email communication also convey mutual understanding of the personal and social relations among the participants, including points at which the strength of these relations is tested in some way and points at which such relations undergo significant changes.
- An adequate system of pragmatic tagging requires:
-
- An abstract model of social interaction and social communication;
- A set of pragmatic tags [172] that are, on the one hand, interpretable in this abstract model (so that the presence of a particular pragmatic tag provides evidence of the existence of a specific kind of social event or social interaction or social communication) and can, on the other hand, be defined directly by linguistic properties (that is, by the presence of specific expressions or specific syntactic combinations of specific expressions);
- A method of analysis of individual electronic communications [123] through which the linguistic reflexes of particular pragmatic tags [172] can be identified.
- In one implementation, such a system can be realized as a cascade of transducers, as illustrated in
FIG. 17 . For example, an electronic document is passed through a linguistic filter [1700] token by token. Individual words and grammatical constructions (such as inverted sentences of the kind that occur in English yes-no questions or wh-questions) are detected. Each detected instance is replaced by an intermediate tag [1710]; non-matching tokens [116] are disregarded (though in some implementations it may be useful to count them or measure the distance from the onset of a message to the first recognized expression or the distance between recognized expressions). The resulting set of intermediate tags is then analyzed further, with the intermediate tags possibly replaced by a final set of projected tags [1720]. - Transducers of this kind have the property that they preserve the relative order of the input expressions: that is, if tag i follows tag k at a given level, then the evidence for tag i follows the evidence for tag k at the previous level. This is not a trivial property. If the word “Thanks” appears initially in a communication [123], it serves as an acknowledgment or appreciation of a previous action on the part of the addressee of the communication [123]. If it appears following a request (as in: “Please get back to me with the requested information by 5:00 this afternoon. Thanks.”), it serves to acknowledge a presumptive acceptance of the request it immediately follows. If the relative order of information is not preserved, this distinction is lost. And in the absence of correct order information, the effectiveness of the tagging in linking different messages together correctly would be degraded.
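The two-level cascade can be sketched as follows. The lexical and projection rules are toy examples invented only to show how relative order is preserved from tokens through intermediate tags [1710] to projected tags [1720]; a real linguistic filter [1700] would carry far richer rule sets:

```python
import re

# First-level transducer: surface expressions -> intermediate tags,
# preserving their relative order; unmatched tokens are dropped.
LEXICAL_RULES = [
    (re.compile(r"^please$", re.I), "POLITE"),
    (re.compile(r"^thanks?$", re.I), "THANKS"),
    (re.compile(r"^(can|could|would)$", re.I), "MODAL"),
]

# Second-level transducer: intermediate tag sequences -> projected
# tags (longest patterns listed first so they match first).
PROJECTION_RULES = [
    (("MODAL", "POLITE"), "INDIRECT_REQUEST"),
    (("THANKS",), "ACKNOWLEDGMENT"),
    (("POLITE",), "POLITE_MARKER"),
    (("MODAL",), "QUERY"),
]

def first_level(tokens):
    out = []
    for tok in tokens:
        for pattern, tag in LEXICAL_RULES:
            if pattern.match(tok):
                out.append(tag)
                break
    return out

def second_level(tags):
    out, i = [], 0
    while i < len(tags):
        for pattern, final in PROJECTION_RULES:
            if tuple(tags[i:i + len(pattern)]) == pattern:
                out.append(final)
                i += len(pattern)
                break
        else:
            i += 1  # no rule applies; skip this intermediate tag
    return out

tags = second_level(first_level("Could you please send the file Thanks".split()))
# → ["INDIRECT_REQUEST", "ACKNOWLEDGMENT"]
```

Because each level only consumes left to right, a trailing “Thanks” projects after the request it follows, exactly the order-sensitivity the text describes.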
- A further reason to have a cascade of transducers (as in the implementation favored here) involves the complex relation between linguistic form and pragmatic interpretation (in terms of speech acts and dialogue acts). It is well known that speech act or dialogue act interpretation is constrained in important ways by syntactic form. Consider the examples:
-
- “Pass the salt!”
- “Did you pass me the salt?”
- “You passed me the salt.”
- The first can be used as a command or request. The second two cannot. The second can be used to request information. The first and third cannot. The third can be used to make an assertion. The first and second cannot. But in a variety of more complex cases, the relation between syntactic form and pragmatic interpretation is less straightforward.
- In the cases of commands, for example, we find a hierarchy based on directness and explicitness, very partially represented below:
-
- “Pass the salt, dammit!”
- “Pass the salt, please.”
- “Could you pass the salt.”
- “Can you please pass the salt.”
- “Do you think you could pass the salt.”
- “Would you mind passing the salt.”
- As many authors have noted, a sentence like “Can you pass the salt?” can be interpreted as a request for information or as a request for salt. (Of course, when “please” is inserted, it can only be a request for salt, not for information.) But as a request, it is less demanding than the explicit “Pass the salt, dammit.” We model this abstractly in the presumptions these speech acts make and the effect they have on the relations of the conversational participants. Theoretically, one would like to appeal to a logic of conversation (to use Grice's terminology) that allows one to derive the associated presumptions and effects from properties of form (the individual components of the expression and its structure) and properties of the context. But for practical purposes, it is enough to recognize and respect the relevant differences: that is, while each of the above sentences can be used to make a request, the requests are themselves organized in a way that reflects a decrease in directness and a concomitant increase in deference to the addressee.
- The same kind of distinction can be made with respect to other basic speech act types. In the case of requests for information, we have a hierarchy similar to the one above:
-
- “Who won the battle of Alma?”
- “Do you know who won the battle of Alma?”
- “Can you tell me who won the battle of Alma?”
- “Do you happen to know who won the battle of Alma?”
- The last of these represents less of an imposition on the addressee (and thus represents a decrease in potential threat to the addressee's “face”, to use terminology of Goffman adapted by Brown and Levinson.)
- And we have a similar hierarchy for representatives (to use a term of Searle's that covers assertions as a special case):
-
- “It's raining.”
- “Are you aware that it's raining?”
- “Did I mention that it's raining?”
- “You might want to check whether it's raining.” (Uttered when the speaker knows that it's raining.)
- Each of these sentences can be used to inform the addressee or make the addressee aware that it's raining. But like the previous cases, they have different presumptions governing their appropriateness in context and they have different effects on these contexts. For example, in some cases, it is regarded as perhaps more tactful to be more indirect—but this is a way of saying that the possible risk to the social relations between speaker and addressee is playing a more prominent role.
- As mentioned above, there are many possible approaches to such phenomena. At one extreme is an approach which attempts to derive all these effects from first principles. In another implementation, the grammatical system (first-level transducer) is sensitive both to basic properties of syntactic form and to a set of sentence-initial operators:
-
- [imperative sentence]
- can you (please) . . .
- could you (please) . . .
- do you think you can . . .
- do you think you could . . .
- would you mind . . .
- do you know . . .
-
- does anybody know . . .
- can you tell me . . .
- can anyone tell me . . .
- do you happen to know . . .
- would anyone know . . .
- Each of these operators is associated with a pair of pragmatic tags [172]: one associates a basic speech act type with the sentence as a whole; the other provides scalar information associated with the pragmatic context—particularly the way the pragmatic interpretation of the sentence bears on the social relations between speaker and addressee.
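A minimal sketch of such an operator table follows, assuming a hypothetical numeric directness rank as the second member of each tag pair (higher meaning more deferential); the rank values and tag names are illustrative only:

```python
# Hypothetical mapping from sentence-initial operators to a pair of
# tags: (basic speech act type, directness rank).
OPERATORS = [
    ("do you happen to know", ("INFO_REQUEST", 4)),
    ("can anyone tell me", ("INFO_REQUEST", 3)),
    ("can you tell me", ("INFO_REQUEST", 2)),
    ("do you know", ("INFO_REQUEST", 1)),
    ("would you mind", ("ACTION_REQUEST", 4)),
    ("do you think you could", ("ACTION_REQUEST", 3)),
    ("could you", ("ACTION_REQUEST", 2)),
    ("can you", ("ACTION_REQUEST", 1)),
]

def tag_sentence(sentence):
    """Longest-prefix match against the operator table (longer
    operators are listed before their prefixes)."""
    s = sentence.lower()
    for prefix, tags in OPERATORS:
        if s.startswith(prefix):
            return tags
    return ("ASSERTION", 0)  # default for declarative sentences

tag_sentence("Could you pass the salt")       # → ("ACTION_REQUEST", 2)
tag_sentence("Do you happen to know who won")  # → ("INFO_REQUEST", 4)
```

Listing “can you tell me” before “can you” is what keeps the information-request reading from being swallowed by the shorter action-request operator.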
- A third reason to have a cascade of transducers involves context-dependent information. The study of indirect speech acts has brought out the ambiguity or vagueness of such sentences as “Can you jump over that wall”, and has emphasized the social utility that indirectness involves. Yet context often reveals how the utterance of such a sentence is interpreted by the conversational participants. If the answer to “Can you jump over that wall” is “ok”, the answerer interprets the sentence as a request for a jump. If the answer is “I believe so”, it seems more likely in this case that the speaker interprets the sentence as a request for information, not action. It is simpler to resolve the vagueness or ambiguity of this question at an intermediate level than at the lowest level.
- A fourth reason to have a cascade of transducers, related to the third, involves the fact that information in dialogue is typically distributed across different dialogue acts (or “dialogue turns”). A central example of this involves questions and their answers. From a question like “Did you receive the documents?”, one cannot infer that the addressee either received the documents in question or did not receive the documents in question. Suppose the answer is “Yes, I did”. From this elliptical sentence, one can also not in general infer that the speaker either received the documents or did not receive the documents. But suppose the utterances of these two sentences are appropriately coupled in dialogue (or similar electronic communication [123]), as below:
- A: “Did you receive the documents?”
- B: “Yes, I did.”
- This structured dialogue supports the inference that B received the documents in question (assuming such things as that B is trustworthy). This inference process is enhanced by one or more intermediate levels of representation. To consider the nature of these levels of representation, it is useful to examine in more detail the nature of questions and their answers.
- Our system employs a novel framework for the analysis of question/answer dialogue structures. This system analyzes the overall problem into a set of cases, with each case being associated with a variety of linguistic forms (on the one hand) and with a set of inference-supporting properties (on the other). (For brevity, we focus here on direct yes/no questions.)
-
Case 1. The answer completely resolves the question. -
- Question: “Did you receive the documents?”
- Complete answers: “Yes”/“No”/“Absolutely”/“Not at all”/ . . .
-
Case 2. The answer resolves a more general question or a more particular question. -
- Question: “Did you receive the documents?”
- Answer to a narrower question: “Not from Jonathan.”
- Question: “Did you receive the documents?”
- Answer to a broader question: “And not just the documents.”
- Question: “Did you receive the documents?”
- Answer to a broader question: “Not yet.”
- [provides more information than a simple “no”.]
-
Case 3. Answerer responds with information about what he/she knows about the situation: -
- Question: “Did you receive the documents?”
- Answer: “I don't know. I haven't checked my mail yet today.”
- Question: “Did you receive the documents?”
- Answer: “Not that I know of.”
-
Case 4. Answerer cannot resolve the question, but can address the possibility of a positive or a negative answer: -
- Question: “Did you receive the documents?”
- Answer: “Possibly: I haven't checked my mail today.”
- Question: “Does 9 divide 333?”
- Answer: “Well, 3 divides 333 . . . ” (implicates that the speaker doesn't know (other things being equal), but that the answer is still possibly positive).
-
Case 5. Answerer cannot resolve the question, but can assess the probability of a positive or a negative answer -
- Question: “Did you receive the documents?”
- Answer: “They're probably in my mailbox.”
- (As used here, “possibility” involves what set of situations are relevant to the event space; “probability” involves an association of a scalar (or something like a scalar) with an element of the event space.)
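The five cases might be approximated by a crude cue-based classifier such as the sketch below. The cue lists are illustrative assumptions; a real implementation would rely on the transducer cascade described earlier rather than substring matching:

```python
def classify_answer(answer):
    """Assign an answer to one of the five cases by surface cues.
    Checks run from Case 5 down so that more specific cues (e.g.
    'probably') are not shadowed by generic ones (e.g. 'no')."""
    a = answer.lower()
    if any(cue in a for cue in ("probably", "likely", "unlikely")):
        return 5  # assesses the probability of an answer
    if any(cue in a for cue in ("possibly", "perhaps", "maybe")):
        return 4  # addresses the possibility of an answer
    if any(cue in a for cue in ("i don't know", "not that i know")):
        return 3  # reports the answerer's knowledge state
    if any(cue in a for cue in ("not just", "not from", "not yet")):
        return 2  # resolves a broader or narrower question
    if any(cue in a for cue in ("yes", "no", "absolutely", "not at all")):
        return 1  # completely resolves the question
    return 0  # unclassified

classify_answer("Not yet.")                         # → 2
classify_answer("They're probably in my mailbox.")  # → 5
```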
- We model workflow and other social interaction both abstractly and (in specific implementations) concretely.
- At the abstract level, for each relevant pair of actors [220] (or groups of actors [225], when the group plays the role of an individual), we recognize different types of relations holding between them (personal relations, familial relations, professional relations of different kinds). For example, it is not unknown for a doctor or a professor to have personal relations with a patient or a student. (A special role is played by the notion of a possibly distinguished personality [230] associated with a specific electronic alias [240]. The set of relations posited here is the counterpart of this notion at the level of personal interaction.)
- For each relevant pair and each posited relation holding between them, we may associate a shared agenda: a mutually recognized set of tasks or obligations or benefits that they are mutually committed to; in addition, each member A of a relevant pair {A,B} may be associated with a private agenda—not shared and not necessarily mutually committed to—which directs, governs, or constrains actions and responses by A toward B.
- Particular agenda items will in general have many specific particular properties. Items on the agenda may range from very general requirements (“share amusing pieces of information”) to very specific tasks (“read and sign the 10-K form by tomorrow at noon”), and these vary greatly across different kinds of social relations. One of the virtues of abstraction is that these particulars may be ignored in modeling properties of the postulated agenda. The basic questions are:
-
- How does a task get added to an agenda?
- How does a task get removed from an agenda?
- These basic questions generate a variety of subsidiary questions which may or may not arise in specific circumstances:
-
- How are the specifications of a task negotiated in advance?
- How is the addition of a task acknowledged by the parties concerned?
- What intermediate status reports are involved and how are they initiated and responded to?
- How is the completion/abandonment of a task communicated and acknowledged, so that the task is no longer regarded as part of the mutual agenda?
- This abstract workflow model corresponds to a simple finite state machine, illustrated in FIG. 18. The required flow through this machine is from an initial state representing a new task [1800], through an intermediate required state representing the addition of a task to the agenda [1810], to a final state representing the removal of the task from the agenda [1830].
- All the subsidiary questions of the kind just discussed correspond to loops around task request [1810], task acceptance [1820], or task completion [1830]. The loops around task request [1810] correspond to negotiations concerning the nature of the task. The loops around task acceptance [1820] correspond to communications [123] among the participants during the period that the task is mutually recognized but not discharged. Loops around task completion [1830] correspond to acknowledgments, thanks, and other later assessments.
- A specific task can fall off the agenda in more than one way, including but not limited to:
-
- The task can be satisfactorily completed;
- The task can be abandoned or die off (via a sunset clause, or . . . )
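The abstract workflow machine and its loops can be sketched as a small transition table. This is a minimal illustration only; the state and event names below are assumptions chosen for readability, not part of the specification:

```python
# Sketch of the abstract task workflow of FIG. 18. State and event names
# are illustrative assumptions; bracketed numbers refer to the figure.
TRANSITIONS = {
    ("new", "request"): "requested",          # new task [1800] -> task request [1810]
    ("requested", "negotiate"): "requested",  # loop: negotiate task specifications
    ("requested", "accept"): "accepted",      # task acceptance [1820]: added to agenda
    ("accepted", "status"): "accepted",       # loop: intermediate status reports
    ("accepted", "complete"): "done",         # task completion [1830]: off the agenda
    ("accepted", "abandon"): "done",          # task dies off (e.g. via a sunset clause)
    ("done", "acknowledge"): "done",          # loop: thanks and later assessments
}

def run_task(events, state="new"):
    """Advance a task through a sequence of workflow events."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

For example, `run_task(["request", "negotiate", "accept", "status", "complete"])` traverses the required flow and ends in the final state `"done"`.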
- In some embodiments, the model of generic tasks (such as marriage vows) may never reach task acceptance [1820] or may involve linking sub-models representing the agendas of individual members of a pair. Mutually dependent situations of this kind may be modeled to a first approximation by finite state transducers (like a FSA, but with arcs labeled by pairs), where the first element of any pair represents the cause of the transition, and the second represents the value to one or the other party.
- This behavioral modeling component [445] allows the system to compute several measures of influence for a particular actor [220] by assessing the changes in behavior [205] of people around that actor [220] before, during, and after a significant absence of some kind, both in terms of how completely their behavior returns to the norm after that period and how quickly it does so.
- In general, measuring the individual behavior [210] and collective behavior [215] of people around an actor [220] both before and after a period of absence or lack of interaction with the rest of the network is a very insightful measure of that actor's [220] influence level, because if the person is important enough then the behavior of the network will snap back to its prior behavior quickly—and completely—once the person has returned. Conversely, if the person's influence was fragile or not shared by the majority of people interacting with her, not only will the behavior in her neighborhood take much longer to go back to normal, but the new norm will also be significantly different from the former baseline behavior [260], with new connections created to third-parties, more erratic communication patterns, etc.
- A period of absence can be due to vacation, business travel that causes the person to be largely out of touch, dealing with family issues, etc. In one embodiment these periods of absence are derived using a number of methods including but not limited to:
-
- Data pulled from HR systems
- Automatic out-of-the-office email replies
- Detection of inactivity periods, for example using the very common statistical criterion that the number of events [100] originating from the actor is below the average over the baseline by at least k standard deviations, typically with k=3.
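The k-standard-deviation inactivity criterion above can be sketched as follows; the weekly granularity and the baseline counts are illustrative assumptions:

```python
from statistics import mean, stdev

def is_inactive(baseline_counts, current_count, k=3.0):
    """Flag a period as inactive when the actor's event count falls at
    least k standard deviations below the baseline average."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    return current_count <= mu - k * sigma

# Hypothetical weekly event counts over a baseline period (mean 50, stdev 2)
baseline = [48, 52, 50, 47, 53, 49, 51, 50]
print(is_inactive(baseline, 2))   # True: far below mean - 3*stdev
print(is_inactive(baseline, 49))  # False: within the norm
```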
- Every time a period of absence or inactivity has been detected, the system computes a divergence on the distribution of per-actor activities between a baseline period P1 (e.g. sliding window of 1 year) and a period P2 of inactivity (typically a small number of weeks) for the actor [220] which will be referred to as the subject in the following. The distributions are computed over all actors [220] having significantly interacted with the subject in one of these periods. Both for P1 and P2, we prune the elements in which the subject is involved (for P2 this is to get rid of the residual activity, such as a long-running discussion [136] that was started in the presence of the subject). Pruning is done either by only removing individual events [100] involving the subject, or by removing all discussions [136] containing at least one such event [100].
- Furthermore, any divergence metric can be used. In one embodiment the K-L divergence H(P1, P2)−H(P1) is used where H refers to the entropy of a distribution. In another embodiment the system uses the variation of information H(P1)+H(P2)−2I(P1, P2) where I(P1, P2) is the mutual information between P1 and P2. The divergence measured by this method constitutes an assessment of how the absence of the subject impacts her environment and her closest professional or social contacts.
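The K-L divergence embodiment can be sketched as follows, representing each per-actor activity distribution as a dictionary over event descriptions; the distributions shown are hypothetical. (The variation-of-information embodiment additionally requires the joint distribution of P1 and P2 and is omitted here.)

```python
import math

def kl_divergence(p1, p2):
    """D_KL(P1 || P2), i.e. the cross-entropy H(P1, P2) minus the
    entropy H(P1), in bits. p1 and p2 map activities to probabilities."""
    return sum(p * math.log2(p / p2[a]) for a, p in p1.items() if p > 0)

# Hypothetical activity distributions: P1 over a one-year baseline,
# P2 during the subject's absence (the subject's own events pruned).
p1 = {"A emails B": 0.5, "A calls B": 0.25, "B emails C": 0.25}
p2 = {"A emails B": 0.25, "A calls B": 0.25, "B emails C": 0.5}
print(kl_divergence(p1, p2))  # 0.25 bits
```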
- The activity model used in this method is also entirely configurable. Activities taken into account in the default embodiment cover all types of communication channels [156], from which the system derives events that are either individual communications [123] (i.e. random variables taking values such as “A emails B” or “A calls B”) or entire discussions [136] (i.e. random variables taking a value such as “A emails B,C; B emails A”).
- An additional, closely related assessment of the subject's influence takes P1 to be the period of inactivity and P2 to be the same-duration period following the actor's return: by comparing it to the baseline behavior [260], this gives an indication of how completely and quickly the environment returns to normal. This is called the snapback effect exerted by the subject on the rest of the network. When taken together, these two metrics evaluate the importance of the subject: how badly others miss her when she is away, and how easily others get over her instead of returning to normal behavior once she is back.
- Also, when evaluating influence, actors [220] such as managers whose sign-off is necessary to approve certain actions and events are excluded from consideration by the system. This is because such actors [220] will by definition impact the behavior of people reporting to them, without that causal relationship bearing any implication on their actual level of influence.
- Detecting Textblocks Via n-Gram Transition Graphs
- Textblocks are defined in U.S. Pat. No. 7,143,091: “Textblocks consist of the maximum contiguous sequence of sentences or sentence fragments which can be attributed to a single author. In certain cases, especially emails, a different author may interpose responses in the midst of a textblock. However, the textblock retains its core identity for as long as it remains recognizable.”
- A method is also given there for detecting textblocks. That method has certain limitations in terms of both recall and memory footprint. The method described here is superior in both respects (see FIG. 19). This method is also amenable to continuous computation and hence provides the preferred embodiment for the textblock detection component [470]. The idea of this method is to find collections of text fragments in different text-containing items [122] which are similar enough to infer that they were duplicated from a single original item [122], and hence have a single author. We shall distinguish between an abstraction of such a collection of fragments, which we shall call a textblock pattern [124], and a particular fragment of text in a particular item, which we shall call a textblock hit [130].
- We begin by constructing a graph of n-gram [118] transitions within the universe of items [122] (see FIG. 20), also called a textblock graph [160]. For each item [122], examine its text one token [116] at a time. Form n-grams [118] of successive tokens [116] for some value of n that will remain constant throughout and which is small compared to the size of a typical item [122]. Successive n-grams [118] may overlap. For example, in the text “one two three four five”, the 2-grams would be “one, two”, “two, three”, “three, four”, and “four, five”. Keep a sliding window of size k over the successive n-grams [118]. This window will initially contain the first k n-grams [118] in the document [162]. At each step, the window will be moved forward so that the first n-gram [118] in the window will be removed and the next n-gram [118] in the document [162] will be added to the end of the window.
- We will produce a graph of transitions between n-grams [118] in this sliding window. Begin with an empty graph. Add each of the k n-grams [118] initially in the window as vertices to the graph. For each of the k*(k−1)/2 pairs of n-grams [118] in the sliding window, add a directed edge with weight 1 between the corresponding vertices, with the head corresponding to the n-gram [118] appearing first in the window, and the tail corresponding to the later n-gram [118].
- Each time the sliding window moves forward, an n-gram [118] is added to it. Add this n-gram [118] as a vertex to the graph if it was not already there. For each of the up to k−1 other n-grams [118] in the window, do a look-up for a directed, weighted edge in the graph pointing from that n-gram [118] to the one that was just added. If such an edge did not exist, add it to the graph and give it a weight of 1. If such an edge did exist, increase its weight by 1. In the example given above, if k=3, we would create, or increment the weight of, the following edges:
- “one, two”->“two, three”
“one, two”->“three, four”
“two, three”->“three, four”
“two, three”->“four, five”
“three, four”->“four, five” - Continue in this way for each item [122] in the universe of items [122]. Textblock patterns [124] consist of tokens [116] that are used in roughly the same sequence over and over again, with some modification. This means that their n-grams [118] will be used in roughly the same order. To the extent that these n-grams [118] are not commonly used in other contexts, we expect to see a particular pattern in the local environment of each such n-gram [118] within the graph.
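The graph construction can be sketched as follows. This is an equivalent incremental formulation, assumed for brevity: each newly windowed n-gram is linked from the up to k−1 n-grams preceding it in the window, which reproduces the initial all-pairs window as a special case:

```python
from collections import Counter

def transition_graph(tokens, n=2, k=3):
    """Weighted directed graph of n-gram transitions under a sliding
    window of k successive n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(ngrams):
        # link each of the up to k-1 preceding n-grams in the window
        # to the n-gram just added
        for prev in ngrams[max(0, i - k + 1):i]:
            edges[(prev, gram)] += 1
    return edges

edges = transition_graph("one two three four five".split())
print(len(edges))  # the five edges listed above, each with weight 1
```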
- Consider the neighborhood N of an n-gram [118] p. That is, N contains all vertices which are connected to p by edges, as well as p itself. Let M be the maximum number of directed edges possible with both head and tail in N, excluding loops. Then M=|N|*(|N|−1), where |N| is the size of N. For each pair of vertices q and r in N, let w(q, r) be defined as:
- w(q, r)=weight of the edge from q to r, if such an edge exists
w(q, r)=0, otherwise
Let W be max(w(q, r)) for all q, r in N. Define the local clusterability of p, LC(p), to be (see FIG. 25):
LC(p) = (sum of w(q, r) over all q, r in N) / (M*W)
- LC(p) provides a measure of how evenly interconnected the neighborhood of p is, and as such is key to this algorithm for detecting textblocks. Now consider a relatively unique sequence of terms, such that its n-grams [118] are found within the universe of items [122] only within this sequence (although the sequence may be repeated in its entirety many times within the universe). Consider an n-gram [118] P that appears in the middle of this sequence. Then we expect that n-gram [118] P to be connected in the graph we are making to the k−1 n-grams [118] before it, and to the k−1 n-grams [118] after it, and nothing else. N contains all the neighbors of P plus P itself, so |N| = 2(k−1)+1 = 2k−1, and M = (2k−1)*(2k−2). All edges in N will have the same weight, so w(q, r) = W for any q, r in N such that q and r share an edge, and 0 otherwise.
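Under these definitions, LC(p) can be computed directly from the weighted edge set. The sketch below uses as its toy input the five-edge transition graph built earlier from “one two three four five”:

```python
def local_clusterability(edges, p):
    """LC(p) = (sum of w(q, r) over all q, r in N) / (M * W), where N is
    the closed neighborhood of p, M = |N|*(|N|-1), and W is the maximum
    edge weight within N."""
    nbhd = {p}
    for q, r in edges:
        if p in (q, r):
            nbhd.update((q, r))
    inside = [w for (q, r), w in edges.items()
              if q in nbhd and r in nbhd and q != r]
    m = len(nbhd) * (len(nbhd) - 1)
    w_max = max(inside)
    return sum(inside) / (m * w_max)

# Toy graph: the 2-gram transition edges of "one two three four five" (k=3)
edges = {
    (("one", "two"), ("two", "three")): 1,
    (("one", "two"), ("three", "four")): 1,
    (("two", "three"), ("three", "four")): 1,
    (("two", "three"), ("four", "five")): 1,
    (("three", "four"), ("four", "five")): 1,
}
print(local_clusterability(edges, ("two", "three")))  # 5 edges / (4*3*1)
```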
- Now consider how many edges there will be in N. The edges that are within N (i.e. both the head and tail of the edges are in N) are precisely those created by the k positions of the sliding window such that the window contains P. As the sliding window moves along, P goes from being the last n-gram [118] in the window to being the first. In the window's first such position, every n-gram [118] is connected by one edge to every other, so this window contains k*(k−1)/2 edges. Whenever the window moves forward, a new n-gram [118] is added and is connected to every n-gram [118] before it in the window by one edge, so this adds k−1 more edges (all other edges in this window were already present in the previous window). The window moves forward k−1 times (once for each of the k positions under consideration except the initial position), so the total number of edges in N is k*(k−1)/2 + (k−1)^2. Thus:
Expected LC(p) = (k*(k−1)/2 + (k−1)^2) / ((2k−1)*(2k−2)) = (3k−2) / (4*(2k−1))
- So the expected local clusterability depends only on the size of the sliding window.
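This expectation can be checked numerically from the counts just derived: the ratio of edges within N, k*(k−1)/2 + (k−1)^2, to M = (2k−1)*(2k−2) simplifies to (3k−2)/(4*(2k−1)), a function of k alone. A quick sketch:

```python
def expected_lc(k):
    """Expected local clusterability of an n-gram in the middle of a
    unique repeated sequence, for sliding-window size k."""
    edges_in_n = k * (k - 1) // 2 + (k - 1) ** 2  # edges within N
    m = (2 * k - 1) * (2 * k - 2)                 # M = |N| * (|N| - 1)
    return edges_in_n / m

for k in range(2, 8):
    # closed form: (3k - 2) / (4 * (2k - 1))
    assert abs(expected_lc(k) - (3 * k - 2) / (4 * (2 * k - 1))) < 1e-12

print(expected_lc(3))  # 7/20 = 0.35
```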
- The sequence of tokens [116] we have considered is a type of textblock—exactly the same sequence of tokens, which form n-grams [118] that are not used anyplace else. In practice, we are interested in finding cases where the sequence is sometimes different, additional tokens [116] are added or removed, and/or tokens [116] are re-ordered. In this case the graph connecting tokens [116] in the textblock will look similar to what we have considered, but the edges will not all have exactly the same weight, and there may be additional edges with low weights. This will affect local clusterability by an amount roughly proportional to how large such changes are. Likewise, an n-gram [118] in a textblock may appear in more than one context within the universe of items [122]. If it only appears once within the textblock pattern [124] and appears in that pattern much more often than in other contexts, its local clusterability will be close to the expected value calculated above. Thus, most n-grams [118] within the textblock will have LC close to the expected value.
- We obtain textblock patterns [124] from the transition graph [160] as follows (see FIG. 21). Examine the vertices of the graph [160] one at a time. For each vertex, compute its local clusterability. If this value differs from the expected value by less than some pre-chosen threshold ε, mark the vertex as a textblock-vertex. When this is done, remove any vertices that are not textblock-vertices. Now find and label the connected components within the remaining graph [160]. Each of these connected components represents a textblock pattern [124].
- The most time-consuming part of this algorithm is the local clusterability calculation. Because it may require comparing every pair of items [122] in a neighborhood N, its running time may be O(|N|^2). Since this must be done for every vertex, the running time of this calculation would be O(n^3), where n is the number of vertices in the graph [160], if the graph [160] were nearly completely connected. (However, in common natural language prose, few n-grams [118] will be this highly connected.) In one embodiment, the algorithm handles this problem by not considering vertices for which |N| is greater than some pre-chosen threshold. The most connected n-grams [118] are unlikely to be useful in detecting textblocks, so this has little effect on the accuracy of this method. The rest of the algorithm described so far runs in time that grows linearly with the total summed length of the items in the universe, provided that all graph look-up and modification steps run in constant time. This is possible if the graph [160] is small enough to fit in constant-access memory.
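The pruning and connected-component step can be sketched as follows. The LC values and edges are illustrative, and edge direction is ignored when forming components, an assumption the text does not spell out:

```python
def textblock_patterns(edges, lc, expected, eps):
    """Keep vertices whose local clusterability is within eps of the
    expected value; return connected components of the remaining graph
    (ignoring edge direction) as candidate textblock patterns."""
    vertices = {v for edge in edges for v in edge}
    keep = {v for v in vertices if abs(lc[v] - expected) < eps}
    parent = {v: v for v in keep}  # union-find over kept vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for q, r in edges:
        if q in keep and r in keep:
            parent[find(q)] = find(r)
    components = {}
    for v in keep:
        components.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in components.values())

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
lc = {"a": 0.35, "b": 0.36, "c": 0.90, "d": 0.34, "e": 0.35}
print(textblock_patterns(edges, lc, expected=0.35, eps=0.05))
```

Here vertex "c" has an LC far from the expected value, so removing it splits the chain into the two candidate patterns `[['a', 'b'], ['d', 'e']]`.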
- This method for finding patterns has two phases—a first phase in which the transition graph [160] is built, and a second phase in which the transition graph [160] is pruned and connected components are identified. The first phase considers only one item [122] at a time and hence can be performed against a pseudo-infinite stream of text-containing items [122] in a continuous monitoring context. The second phase cannot. In one embodiment, the graph [160] is periodically cloned so that a separate process can perform the second phase and update the set of textblock patterns [124]. While this pruning occurs, the original process can continue to add new incoming items [122] to the graph [160]. In a second embodiment, textblock patterns [124] are detected within overlapping time periods. For instance, textblock patterns [124] might be detected within bins of two months each, with each bin overlapping the bin after it by one month. In a third embodiment, patterns [124] are kept for some pre-determined amount of time and then discarded. In a fourth embodiment, patterns are discarded after some amount of time following when the most recent hit for that pattern was detected.
- We find textblock hits [130] for textblock patterns [124] in an item [122] as follows (see FIG. 22). Begin by creating an initially empty lookup-table “CountMap”. This table will have as keys the IDs of textblock patterns [124] currently being considered as potential matches, and maps each such ID “A” to a tuple CountMap(A)=[a, b, c], where “a” is the number of edges found which might be in “A”, “b” is the number of non-matching edges, and “c” is the number of trailing non-matching edges. Now proceed in a manner similar to that used to find the patterns [124]. Examine the tokens [116] of the item [122] in order. Run a sliding window of size k over the resulting n-grams [118], using the same values of n and k used to find the patterns [124]. For each transition edge E, do a lookup in the graph [160] to determine if the edge is part of a pattern [124]. If the edge is found in the graph [160] and has pattern [124] ID A, and if there is no current entry CountMap(A), then set CountMap(A)=[1, 0, 0]. If CountMap(A) already has a value [a, b, c], increment a by 1 and set c to 0. For every pattern ID X in CountMap where X does not equal A (or, if the edge E was not in a pattern [124], then for every pattern ID in CountMap), look up CountMap(X)=[a, b, c]. First increment both b and c by 1, then decide whether to drop this pattern [124] from consideration. In one embodiment, we drop from consideration any pattern [124] for which c is above a pre-set threshold. Whenever a pattern [124] drops from consideration, first determine whether the edges found so far constitute a hit [130], then remove the entry for that pattern [124] from CountMap. Likewise, when the end of the item [122] is found, consider whether the edges found so far constitute a match for each value in CountMap, then clear CountMap. In another embodiment, we determine whether edges found constitute a hit [130] using pre-chosen thresholds for recall (were enough edges from the pattern [124] found in the item [122]?) and precision (were enough of the edges from the item [122] in the pattern [124]?).
- It should be obvious to a practitioner skilled in the art that, assuming the computer running this algorithm has sufficient constant-access memory to allow constant time look-ups in the transition graph [160], the hit [130] finding algorithm will run in time linear in the length of the text of the item [122] so examined. Hence, finding hits [130] in the universe of items will run in time linear to the total length of all items [122] in the universe.
- As the method for finding textblock hits [130] considers only one text-containing item [122] at a time and compares it against a static graph, it can be used in a continuous monitoring context against a pseudo-infinite stream of items [122].
- The textblock pattern [124] detection portion of this method will run poorly if the transition graph [160] is so large that it cannot be held in constant-access memory. In some embodiments, only a subset of transitions will be recorded as edges so as to reduce the total size of the graph [160]. In one embodiment a list of functional words appropriate to the language of the text is used (see FIG. 26). In English, for example, prepositions, articles, and pronouns might be used. Only n-grams [118] immediately following such functional words are placed in the sliding window. In another embodiment, the full list of n-grams [118] is produced, but is then reduced to a smaller list using a winnowing method similar to that described in [Schleimer 2003] (see FIG. 27). N-grams [118] are hashed and the hashes are placed in a sliding window. The smallest hash at any given time will be noted, and the n-gram [118] it came from will be placed in the derived list. If, for a given window position, the n-gram [118] with the smallest hash is the same as it was in the last window position, then it is not added again. From the derived list, transitions will be recorded as edges.
- A modification of this algorithm will perform well even when the transition graph [160] is too large to fit in random-access memory (see FIG. 23). Proceed as before, but place a maximum size on how large the transition graph [160] is allowed to grow. In one embodiment, when the graph [160] reaches this size, the entire graph [160] is written to disk (or some other slower portion of the memory hierarchy) and the constant-access memory structure is emptied. In another embodiment, vertices in the graph [160] are held in memory in a Least-Recently Used (LRU) cache. When a vertex is ejected from the cache, edges within its neighborhood are written to disk (or other form of slower memory) and then purged from constant-access memory. In any embodiment, the portion written to disk (or slower memory) is recorded as ordered triples of n-grams [118] connected by edges. For example, in the text “one two three four five”, using 1-grams we would record:
- “one, two, three”
“one, two, four”
“one, two, five”
“one, three, four”
“one, three, five”
“one, four, five”
“two, three, four”
“two, three, five”
“two, four, five”
“three, four, five” - These ordered triples are written to a file or similar storage structure, one per line, followed by a weight, which will initially be one. Periodically, this file can be compressed. To do so, sort the file. Lines containing the same triplet will now be adjacent. In another pass over the file, replace n lines containing the same triplet with 1 line containing their summed total weight. This process can be parallelized by having multiple processors or different computers examine different sets of items [122], where such items [122] partition the universe of items [122], and each processor will write to its own file. When this process is done, the various files can be joined using the merge-sort algorithm, and lines containing the same triplet can be combined. Sorting and merging files on disk is a well-studied problem, and can generally be done in running time that is O(n*log(n)), where n is the total length of the files involved. Hence, the entire process so far will run in time O(n*log(n)), where n is the total length of all items [122] in the universe.
- Once the final list has been created, iterate over it from the beginning. Because the list is sorted, for any n-gram [118] P, all lines containing P in the first position will be adjacent in the list. Thus we can reconstruct the approximate neighborhood of each vertex, and calculate its local clusterability, one vertex at a time, without ever holding more than one neighborhood in constant-access memory at a time, and without doing random-access disk lookups. Some information about a neighborhood may be lost because P's neighbors are found adjacent in a window in which P itself is not found, but in general this will not unduly affect the results.
- The system creates and continuously maintains content re-use profiles that characterize the extent to which individual actors [220] produce a certain category of information, how they modify such information, and how they consume it.
- Valuable information can be defined in any number of ways. In the default embodiment, valuable content is generally defined as the result of filtering all items [122] collected and processed by ontology classifiers [150] associated with business-relevant topics [144]. Alternatively, in a more specific application scenario within a corporate environment, valuable content is defined as intellectual property assets such as research artifacts or software code.
- The system relies on continuous textblock detection as described in this invention. Two libraries of textblocks are maintained: a global library of textblocks and a library of recently disseminated textblocks.
- To build and maintain the global library of textblocks, the algorithm described previously is run over a large dataset, such as all historical valuable data or all valuable data in the past n years. Once the textblock patterns [124] have been discovered, all textblock hits [130] (which represent usage of that content) are scanned and sorted by date, then used as the basis for building content-relaying events. Each content-relaying event is a (Textblock pattern, Date, Sender, Recipient) tuple. One such tuple is created for each (Sender, Recipient) pair associated with the item [122] in which the textblock hit [130] was found—by definition there can be one or more such pairs per item [122]. The date of a content re-use event is defined as the representative date of the item [122] in which the textblock hit [130] occurs. In one embodiment of the present disclosure, the representative date is the sent date of an electronic communication [123] (such as an email message or an IM turn), the last modification date of an item [122] collected from a file (or from a database table), etc.
- The system will then compute scores, called create-score, consume-score, and relay-score, for each actor [220]. In addition, a list of all received textblock patterns [124] is maintained for each actor [220]. To do this, it scans all content re-use events in date-ascending order.
- The first time a given textblock pattern [124] is encountered during this scan, the create-score of the sender is updated. In one embodiment of the present disclosure, this update consists of a unit increment of the create-score. In another embodiment, it is an increasing function of the textblock pattern [124] length. In yet another embodiment, it is a decreasing function of the number of textblock hits [130] found for the pattern [124] throughout the whole dataset.
- For every re-use event, the receive-score of the recipient is updated (the update function is similar to the update of the send-score described previously), the textblock pattern [124] is added to the list of received patterns [124] for the recipient if it was not already present, and if the textblock pattern [124] belonged to the list of received patterns [124] for the sender then the sender's relay-score is updated. In one embodiment of the present disclosure, the update of the relay-score consists of a unit increment. In another embodiment, it is proportional to the ratio of the textblock hit [130] length over the textblock pattern [124] length. In a third embodiment, the list of received patterns [124] is augmented to keep track of the token [116] range of each received pattern [124] that has also been relayed, and the update consists of adding the ratio of the textblock pattern [124] length that was not covered (in other words, this differs from the previous embodiment in that information relayed multiple times is only counted once).
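The date-ordered scan over content re-use events can be sketched as follows, using the unit-increment embodiment for all three update functions; the event tuple layout follows the (Textblock pattern, Date, Sender, Recipient) definition above, and the sample events are hypothetical:

```python
from collections import defaultdict

def score_events(events):
    """Scan (pattern, date, sender, recipient) tuples in date-ascending
    order, maintaining create-, receive-, and relay-scores plus each
    actor's set of received patterns."""
    create = defaultdict(int)
    receive = defaultdict(int)
    relay = defaultdict(int)
    received = defaultdict(set)  # actor -> patterns received so far
    seen = set()                 # patterns already encountered in the scan
    for pattern, date, sender, recipient in sorted(events, key=lambda e: e[1]):
        if pattern not in seen:  # first occurrence: credit the creator
            seen.add(pattern)
            create[sender] += 1
        receive[recipient] += 1
        if pattern in received[sender]:  # sender passes on received content
            relay[sender] += 1
        received[recipient].add(pattern)
    return create, receive, relay

events = [("T1", 1, "alice", "bob"),
          ("T1", 2, "bob", "carol"),
          ("T1", 3, "carol", "bob")]
create, receive, relay = score_events(events)
print(dict(create), dict(receive), dict(relay))
```

In this toy run, alice is credited as the creator of T1, while bob and carol each earn a relay credit for passing on a pattern they previously received.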
- The present invention is not restricted to a specific dissemination profile calculation method. In the default embodiment of the system, the scores defined above simply measure the level and type of interaction among actors [220] by counting items exchanged through communication channels [156] or loose documents [162]. In another embodiment, they also take into account how often that content is being viewed, downloaded, or copied, and, by contrast, which content is simply ignored. The resulting profiles are much more accurate and more difficult to game than simple counts in large enterprise networks, where actors [220] who tend to send superfluous content in large quantities are often not contributing to the overall productivity of that organization. In yet another embodiment, scores are also a function of the actual roles and responsibilities of actors [220] as derived, for example, from their involvement in discussions [136] that represent workflow processes [128], including but not limited to how often they initiate, close, or contribute to discussions [136], whether they are decision-makers (as defined by the measure of that trait [295] in the behavioral model [200]), whether they review work products, etc.
- Possible enterprise application scenarios of information dissemination profiling include but are not limited to:
-
- Determination of whom to promote, demote, or lay off;
- Assessment of the impact of an organizational change, such as a merger or acquisition or a major restructuring;
- More generally, the rewarding of employees who are collaborative, and who focus on what is good for the overall organization rather than for their own promotion or their particular business unit.
- Finally, the system ranks each actor [220] along a number of information dissemination profiles based on the scores [285] that have just been computed. In one embodiment of the present disclosure, information dissemination profiles are:
-
- Creator, which represents an actor's [220] tendency to create valuable content, i.e. content that is re-used by others in meaningful ways;
- Curator, which indicates to what extent the actor [220] spots valuable information then relays part or all of it across the network, potentially modifying it along the way;
- Consumer, which quantifies the volume of valuable information which is simply received by the actor [220].
- The ranking [275] mechanism can be configured by the user. By default, actors [220] are ranked as follows: creators are ranked by decreasing create-score, consumers by decreasing receive-score, and curators by decreasing relay-score.
- Once the system has been initialized by building the global library of textblocks, it starts maintaining a library of recently disseminated textblocks, which allows analysis of trends in dissemination behaviors.
- In one embodiment of the invention, a library of recently disseminated textblocks is built at regular intervals (for example, on a monthly basis). The list of content re-use events is computed similarly to the global library construction, except that the create-score of the sender of a re-use event is updated only when the textblock pattern [124] is encountered for the first time in this scan and it is not present in the global library. If either condition is not satisfied, then the sender's relay-score is updated as in the global library construction. The result of this scan over re-use events is a ranking of actors on the corresponding time period.
- FIG. 28 shows a graph visualization of information dissemination profiles provided by the system. This visualization shows actors [220] as nodes in the graph, and dissemination relationships as edges. The identity [2820] (potentially anonymized depending on the anonymization scheme [340] in place) of the actor [220] decorates each node. Around a given actor [220], zero, one or more annuli [2800] (also called donuts) are drawn to represent that actor's [220] dissemination profiles computed as previously described. Some nodes, such as [2820], do not have any circle because they do not significantly contribute to dissemination of information deemed as valuable in the particular context of that visualization (i.e. they are not involved in items [122] filtered during input data selection [360]).
- The width of annuli [2800] (i.e. the difference between their external and internal radii) drawn around an actor denotes the relative amount of information respectively produced, received, or relayed.
- Color codes are used to distinguish profiles. In the default embodiment, blue circles indicate creators [2805], green circles indicate curators [2810], and red circles indicate consumers [2815]. In addition, a saturation level is used as a visual indicator of the content's value or relevance: the darker the color, the more valuable the information created, relayed, or consumed. This provides an additional dimension to the dissemination profile established by the system. For example, the darker the blue circle around an actor [220], the more likely that actor [220] is to be a thought leader; the darker a green circle around an actor [220], the more actively that actor [220] is contributing to spreading knowledge or expertise throughout the organization.
- Optionally, valuable information can be further categorized by the system using any type of categorization component [146]. In an intelligence application scenario for example, a set of categories would be classification levels, thus adding another dimension to the visualization. Each annulus [2800] is split into one or more annular sectors [2825], with the angle of a sector proportional to the relative volume of the corresponding category found in that actor's [220] dissemination profile. For instance in
FIG. 28, the actor identified as “actor 13” creates significant volumes of information categorized as A or B [2825] in roughly equal proportions, but produces comparatively little, if any, information categorized as C. - The evolution of rankings [275] over time can also be visualized in an animated variant of the graph of information dissemination profiles described previously.
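The proportionality rule for annular sectors [2825] — a sector's angle proportional to the relative volume of its category — can be sketched as follows (the formula beyond proportionality is an assumption):

```python
def sector_angles(category_volumes):
    """Angles (in degrees) of the annular sectors [2825] for one annulus.

    Each sector's angle is proportional to its category's share of the
    actor's total volume; a sketch, since the text only prescribes
    proportionality, not a specific formula.
    """
    total = sum(category_volumes.values())
    return {c: 360.0 * v / total for c, v in category_volumes.items()}
```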
- The behavioral modeling component [445] leverages the output of the other components in the processing and analysis layer [402] to establish a behavioral model [200]. This core model is used to determine a behavioral norm, from which the anomaly detection component [450] detects anomalies based, among other things, on deviations from that norm.
- To build that behavioral model [200], the present disclosure defines the concept of an individual behavior profile, which can be used for different purposes, including but not limited to the following.
- First, an individual behavior profile is useful in and of itself to show a behavioral portrait of the person (whether a snapshot of assessed behavior [205] over a recent period of time, or a baseline behavior [260] computed over a decade), while letting a human analyst derive any conclusions on his own. For example, certain organizations may want to investigate any actor [220] who has been deemed to possess an inflated ego and also appears to exhibit a low level of satisfaction with respect to his job.
- Second, an individual behavior [210] can be leveraged to analyze changes over time in the individual's behavior, in order to derive a level of associated risk or alternatively to produce anomalies [270] that should be investigated further. For example, someone who isolates himself from the rest of the organization (whether socially, professionally, or in both respects) over a period of time has a situation worth investigating.
- Third, an individual behavior [210] can be used to contrast the individual's behavior with her peers' behavior in order to yield another kind of assessment of anomalous behavior as with changes over time. For example, someone whose level of stress increases considerably more than his co-workers' stress is a significant anomaly, much more so than a collective increase in stress levels which might be imputable to internal tensions and difficulties or to exogenous circumstances.
- The primary function accomplished by the construction and maintenance of the behavioral model [200] in the present invention is to map each important actor [220] in the electronic dataset to one or more individual personality types. Note that these personality types can also be called archetypes, since they are necessarily simplified models of any real human personality: more complex traits are omitted while traits more relevant to the particular scenario, for example psychopathological traits, are emphasized.
- An actor [220] that matches at least one of these archetypes would typically be flagged for investigation if for example the corresponding archetype(s) suggest a level of present or future insider threat, where an insider threat is defined as a series of malevolent or unintentional actions by a person trusted by the organization with access to sensitive or valuable information and/or assets.
- In particular, the behavioral model [200] can provide evidence suggesting that the individual in question is a malicious insider. This covers three main types of situations described below, each of which presents a significant threat to the organization if it goes undetected until irreversible malicious acts are committed, unless a system such as the one described in this invention flags those individuals by raising alerts [305] based on the established behavioral model [200].
-
- A first type of malicious insider is an individual who starts working for the organization with the intention of committing malicious actions, but has gone undetected until now.
- A second type of malicious insider is an individual who starts with good intentions but is turned by another individual or organization for a variety of reasons (typically a mixture of greed, ideology, and other internal or external conflicts).
- A third type of malicious insider that the behavioral model [200] helps to detect is an individual with psychopathological tendencies or other predispositions to commit such malicious acts in the future. Such an individual may have entered the organization because he or she was not filtered out in the initial recruiting process, or may have developed those tendencies later.
- The set of personality archetypes is completely configurable, to allow for either a very generally applicable model of human personality, or a custom model more targeted toward a specific organization.
- In one embodiment, the set of personality archetypes represents the Big Five factors, which are a scientifically validated definition of human personality along five broad domains: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness. The advantage of such a generic model, besides lending itself to cross-validation of results produced by the present invention and by another personality assessment scheme, is that it does not assume a unique categorical attribute to define an individual personality, since the Big Five factors are modeled as 5 numerical features (generally expressed as a percentile value). Thus for example, by selecting the two factors Neuroticism and Agreeableness, the system provides a picture of actors [220] prone to anger control issues.
- In another embodiment, the set of personality archetypes is defined to represent the main behavioral risk factors present in any business organization, and are as follows:
-
- Personal dissatisfaction: this essentially consists of self-esteem issues, such as the actor [220] not living in harmony with reality and experiencing conflicts about himself or herself intrinsically (as opposed, for instance, to conflicts about physical appearance).
- Professional dissatisfaction: this archetype is similar to personal dissatisfaction but pertains to the actor's [220] job and overall work life as opposed to her personal life.
- Negativity: this archetype corresponds to the actor [220] repeatedly expressing high levels of negative sentiment and showing durable signs of physical, mental, or emotional tension.
- Sociability issues: this archetype corresponds to the actor [220] avoiding almost any interaction (or interacting very poorly) with other individual actors [220] or groups [225], thereby establishing no personal relationships or failing to sustain them.
- The behavioral model [200] involved in the present invention relies on assessing the presence and intensity of a number of behavioral and personality traits [295] for every individual actor [220] or group [225] for which a sufficient volume of data has been processed and analyzed. Each personality type, or archetype, as described previously is then detectable through the presence of the behavioral traits associated with that personality. In the default embodiment of the invention, each behavioral trait carries a positive or negative correlation with each archetype, based on empirical and/or theoretical data: for example, when using the Big Five factors as personality archetypes, undervaluation is a behavioral trait that the system measures as positively correlated with the Neuroticism factor. Once all behavioral traits have been accounted for, numerical values are available for each individual along each factor, from which percentile values can further be deduced using a reference sample appropriate to the scenario at hand.
- Note that a behavioral trait [295] might correlate to several archetypes; in some cases it might even correlate positively to an archetype and negatively to another. For instance, egocentric personalities are characterized (among other things) by a lack of empathy whereas influenceable personalities can be manipulated by others using their empathy (either towards the manipulator or a third party in case the manipulator resorts to coercion). In other words, the model assumes that each pair of random variables composed of a behavioral trait [295] and a personality type shows either no correlation, a positive correlation, or a negative correlation. In one embodiment, all such correlations are assumed to be linear correlations in the sense that actors [220] are scored along a personality type using a weighted sum (with positive or negative coefficients) over all behavioral traits [295] for which a score [285] has been computed.
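The linear-correlation embodiment above — scoring an actor along a personality type as a weighted sum over behavioral traits [295], with positive or negative coefficients — can be sketched as follows. The trait names and weights are illustrative and not taken from the specification:

```python
def archetype_score(trait_scores, correlations):
    """Weighted sum over behavioral traits [295] for one archetype.

    Positive or negative coefficients encode each trait's correlation
    with the archetype; traits without a coefficient contribute nothing.
    """
    return sum(correlations.get(t, 0.0) * s for t, s in trait_scores.items())

# Illustrative example: undervaluation is positively correlated with the
# Neuroticism factor (weights here are assumed, not from the patent).
neuroticism_weights = {"undervaluation": 0.6, "grievance": 0.5, "empathy": -0.2}
score = archetype_score(
    {"undervaluation": 0.8, "grievance": 0.4, "empathy": 0.5},
    neuroticism_weights,
)
```

Percentile values along each factor can then be derived by ranking such scores against a reference sample, as the text notes.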
- The rest of this section lists a number of behavioral traits [295] provided by the default embodiment of this invention. This list is not exhaustive, and a key characteristic of this invention is its support for augmenting the anomaly detection mechanism with any behavioral trait [295] that can be measured for a given actor [220]. For each trait, a brief explanation is given of how to score an actor [220] along that trait [295] in one embodiment among all possible embodiments; essentially, each trait [295] can be measured along a number of vectors either directly observable in the data or derived during processing or post-processing by the system. For clarity, the behavioral traits [295] supported by the system are broken down into broad categories: in this default embodiment, the categories are job performance, job satisfaction, perception by peers, communication patterns, and character traits.
- Job performance traits correspond to professional achievements and work habits, but also a measure of reliability, i.e. how well the actor [220] performs her job. The rest of this section describes such traits [295] that can be measured in the default embodiment of this invention.
- Disengagement measures the actor's [220] involvement in various professional responsibilities.
- In the default embodiment of the invention, an actor's [220] disengagement is measured by first filtering discussions [136] computed by the system to retain those that involve the actor [220] as a primary participant and are business-relevant, optionally filtering further by the topics addressed in the elements of those discussions [136]. Then the system computes a number of behavioral metrics [290] including but not limited to:
-
- The lengths of those discussions [136] either over time or with the discussions [136] involving the actor's [220] peers.
- Their frequency over time.
- The actor's [220] level of participation. The level of participation can be measured using the results of a tagging subsystem, for example pragmatic tags [172]: in one embodiment, this is computed as (A−U)/T, where A is the number of requests replied to by the actor [220] (by sending a work product or any relevant answer), U=R−A is the number of requests left unanswered, R is the total number of requests sent to the actor [220] (for example, requests for work product, for feedback, etc.), and T is the total number of communications [123] received and sent by the actor [220].
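Reading the expression above as (A − U)/T, with U = R − A, the participation level can be sketched as:

```python
def participation_level(answered, total_requests, total_communications):
    """Level of participation from pragmatic tags [172].

    Computes (A - U) / T, where U = R - A is the number of requests left
    unanswered; the (A - U)/T reading of the expression is an assumption.
    """
    unanswered = total_requests - answered  # U = R - A
    return (answered - unanswered) / total_communications
```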
- In the default embodiment, the system lets a user visualize patterns of disengagement for a given actor [220] by using the sequence viewer described in this invention to show discussions [136] involving that actor [220].
- Stability measures the regularity and stability in the distribution of time and effort for a given actor [220] across workflow processes [128] such as business workflows and activities.
- In one embodiment, stability is measured using periodic patterns [126] derived by the periodic patterns detection component [405]; both a high frequency and an increasing frequency of gaps and disturbances in the business-relevant periodic patterns [126] involving the actor [220] denote an unstable behavior.
- In the default embodiment, the system lets a user visualize patterns of stability for a given actor [220] by using the gap viewer described in this invention to show periodic patterns [126] involving that actor.
- This behavioral trait [295] assesses how the actor [220] delegates professional tasks and responsibilities.
- In one embodiment, the level of delegation for a given actor [220] is measured as the centrality measure in the graph of instruction relaying (as defined in U.S. Pat. No. 7,143,091). Optionally, the graph can be filtered to only retain explicitly actionable instructions, such as those accompanied by an attached email or a list of tasks, which provide a more accurate reflection of work delegation than the transmission of more vague, non-directly actionable instructions, or mere forwards.
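As one concrete choice of centrality (an assumption for illustration; the cited patent does not bind the system to it), a normalized out-degree over the instruction-relaying graph can be sketched as:

```python
def delegation_centrality(edges):
    """Out-degree centrality over a graph of instruction relaying.

    'edges' is a list of (delegator, delegatee) pairs; out-degree is used
    here as a simple stand-in for whichever centrality measure a given
    deployment selects. Scores are normalized by the number of possible
    delegation targets.
    """
    out = {}
    for src, _dst in edges:
        out[src] = out.get(src, 0) + 1
    n_nodes = len({n for e in edges for n in e})
    return {a: d / (n_nodes - 1) for a, d in out.items()}
```

Filtering the edge list to explicitly actionable instructions, as the text suggests, would simply shrink the input before this computation.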
- The system allows the user to efficiently visualize delegation patterns, either when an anomaly [270] has been flagged, or on-demand by the means of a particular type of animated actor graph [471].
- This behavioral trait [295] assesses the actor's [220] ability to respect deadlines and to complete tasks delegated to her.
- Measurements of an actor's [220] diligence include, but are not limited to the level of regularity in periodic sequences [132] originating from that actor [220], such as the submission of reports following a regularly scheduled meeting, or the frequency of indicators found in the analyzed data that the actor [220] was late or absent from important meetings without a valid reason being found for that absence or lateness.
- This behavioral trait [295] assesses the level of discipline shown by an actor [220] in the workplace, as well as her respect for procedures and hierarchy.
- Measures of an actor's [220] level of discipline include but are not limited to:
-
- Degree of absenteeism or unusual work schedules, as measured, for example, from a data source providing keycard access logs;
- Explicit disregard for authority, as measured, for example, from applying the appropriate ontology classifiers [150] to messages sent by the actor [220] but also messages about the actor [220], such as language used by other people and suggesting that the subject is “dragging his feet”, or that he should receive a warning;
- Non-compliance with internal policies, as measured, for example, from the output produced by a compliance system communicating with the present invention.
- In the default embodiment of the present invention, two types of definitions are combined for assessing an actor's [220] job performance: an objective definition and a subjective definition.
- Objective performance is assessed based on criteria including but not limited to production of high-quality content (i.e. frequently and broadly re-used in content authored by other actors [220]), or the dominant archetype taken by the actor [220] using role assessment (e.g. leader who frequently initiates discussions [136] vs. follower who passively observes discussions [136]).
- Subjective performance is assessed based on criteria including but not limited to results of performance review (as directly recorded in numerical values in an HR system or as a polarity value evaluated from linguistic analysis of those reviews' content), any sanctions received by the employee, as well as the expression of a particularly positive or negative judgment on the actor's [220] performance as inferred from hits produced by appropriate ontology classifiers [150].
- This behavioral trait [295] corresponds to an actor spending a lot of time and directing significant attention or effort towards non-business issues and topics [144] during work hours, as well as to repeated and impactful interferences of personal issues with behavior in the workplace.
- It should be noted that another interesting trait [295] to measure is the reverse of this one, i.e. where electronic data produced during time out of the office contains large amounts of work products and other job-related content.
- In the default embodiment of this invention, this is measured by criteria including but not limited to:
-
- The proportion of items pertaining to at least one personal and one professional topic [144], as derived, for example, from running ontology classifiers [150] on electronic communications [123] authored by the actor [220];
- The amount of time spent by the actor [220] on non-business-related activities using electronic systems and applications in the workplace;
- Attention shift toward external contacts, as observed from an increasing proportion of communications [123] being sent to actors [220] external to the organization.
- Job satisfaction describes how the considered actor [220] feels about her job at a given point in time or over time. The behavioral model [200] should contain as many traits [295] as possible in this category to reliably quantify the level of satisfaction as well as assess the topics [144] and related entities associated to the highest (resp. the lowest) degree of satisfaction for a particular actor [220]. The rest of this section describes such traits [295] that can be measured in the default embodiment of this invention.
- This behavioral trait [295] corresponds to chronic discontentment transpiring in the organization.
- In the default embodiment of this invention, this is measured by the volume of negative language related to the actor's [220] current responsibilities, to organizational policies, to coworkers, etc. In particular, the system measures resentment expressed by the actor [220] about people higher up than she is.
- When measuring this behavioral trait [295], the system discounts any changes in sociability that are health- or family-related. An exception to this is when those personal issues become so critical that they turn into motivations for harming other actors [220], the organization, or the society, for example if they result from financial distress; this can be measured by linguistic analysis, or any method for monitoring the actor's [220] financial situation.
- This behavioral trait [295] corresponds to indicators of envy, jealousy, or any presence of grievance in an actor [220], including a vindictive attitude.
- This is measured by criteria including but not limited to:
-
- Repeated expression of envy or jealousy towards other actors [220];
- Being often irritable, directing anger toward a multitude of unrelated topics [144] or actors [220], particularly in light of the success of others. In the default embodiment of this invention, the system detects this situation by observing internal and external events [100] such as someone else getting promoted, getting married, or anyone else in the actor's [220] community or workgroup getting lots of non-negative attention regardless of the specific reason;
- A passive-aggressive behavior, which can e.g. be detected as a combination of derogatory comments toward another actor [220] and repeatedly ignoring that other actor's [220] request for advice or for input in their professional tasks.
- In one embodiment, the system lets a user visualize grievance patterns exhibited by a particular actor [220] using the stressful topics visualization described in this invention, using that actor [220] as the subject of interest.
- This behavioral trait [295] is defined as the manifestation of excessive greed, but also more generally of an overachievement behavior. This is particularly important for anomaly detection aimed at spotting malicious behavior since excessive greed is often characteristic of an individual ready to go to extreme lengths, including malevolent ones, to reach her goals; such individuals try to rationalize unreasonable financial aspirations, thereby suggesting that they are struggling with internal conflicts due to considering malicious or otherwise unauthorized actions.
- In one embodiment, excessive greed is detected by ontology classifiers [150] capturing constructs such as repeated complaints of “wanting a better X” for various values of X, or reflecting an obsession about the recognition (financial or other) of one's achievements.
- In the default embodiment of the present invention, two types of definitions are combined for assessing an actor's [220] sense of undervaluation, similarly to the case of job performance assessment: an objective definition and a subjective definition.
- Subjective undervaluation can be measured by criteria including but not limited to:
-
- The actor [220] complaining in electronic communications that he is passed over for a promotion by less able individuals;
- The actor [220] complaining in electronic communications [123] that she feels underpaid;
- Objective undervaluation can be measured by criteria including but not limited to:
-
- Increase in actual responsibility without being promoted, as observed from a formal HR system if available, or from pragmatic analysis combined with the detection of work projects and responsibilities;
- How often the actor [220] is consulted on her domain of expertise;
- How often the actor [220] uses her specific skills.
- These traits [295] constitute the external perception of the actor by a number of different people and together provide a collective wisdom portrait which constitutes a far more accurate and reliable picture of an actor's [220] personality and baseline behavior [260] than when only looking at that actor's [220] own point of view. The rest of this section describes such traits [295] that can be measured in the default embodiment of this invention.
- This behavioral trait [295] corresponds to how well the actor [220] receives feedback and constructive criticism, as well as her willingness to acknowledge and learn from past errors.
- In one embodiment of the invention, acceptance of criticism is measured by methods including but not limited to:
-
- Detecting events [100] related to the assessment of the actor's [220] performance then assessing their emotional impact by analyzing mood changes following such events [100], especially the strongly negative ones. Relevant events [100] comprise job performance reviews, informal evaluation by peers or hierarchical superiors, and more indirect evidence [108] such as the presence of calendar events for one-to-one meetings with a manager;
- Assessing whether the actor [220] takes into account feedback and guidance that have been provided to her into her behavior or ignores them, and how often and how completely she incorporates into her work products changes suggested by others. The latter can be assessed using textblock analysis to compute the lineage of work-related documents [162] produced by the actor [220], and looking if the versions of such documents are updated to include a textblock pattern [124] found in content sent or otherwise submitted by a peer.
- This behavioral trait [295] corresponds to how closely the way others see the actor [220] matches her own self-perception.
- Measuring this trait [295] is particularly useful because a high contrast between how a person is perceived by others and their self-perception is often associated to discontentment and in more extreme cases to psychological troubles.
- In one embodiment of the invention, the level of perceptive bias for an actor [220] is measured by methods including but not limited to:
-
- Assessing the similarity of other actors' [220] negative (resp. positive) language relative to the actor's competence with her own expression of her skills and value;
- Analyzing how fairly and realistically the actor [220] presents her own achievements. This can be detected by linguistic constructs such as other people blaming the actor [220] for taking undue credit, and the actor [220] reacting very strongly or inappropriately to criticism from management or from peers.
- This behavioral trait [295] measures the breadth and the level of influence exerted by the actor [220] over his peers, for example his co-workers.
- A low level of influence occurs when the actor [220] has no significant impact on others, which depending on the context might be an important anomaly [270] to detect. Also, a suddenly increasing level of influence might reveal that the actor [220] is trying to become a power broker, possibly with the intent to extort information from others or coerce them for malicious purposes.
- In one embodiment of the invention, the level of influence exerted by an actor [220] is measured by methods including but not limited to:
-
- Computing the importance of the actor [220] in the communication graph filtered appropriately, for example using the graph of relaying implicit instructions (also called Mere Forwards, as described in U.S. Pat. No. 7,519,589, the disclosure of which is incorporated by reference herein for all purposes);
- Detecting memes that originate from the actor [220], which provides a measure of linguistic or topical influence;
- Assessing the role of the actor [220] in discussions [136] based on the relative frequency with which the actor [220] is a primary actor in those discussions [136], where the primary actor is defined as the originator of one or more resolution items (as described in U.S. Pat. No. 7,143,091).
- Assessing the impact of the actor's [220] presence or absence on the other actors [220], as described in the section on Actor influence analysis.
- Reliability as a behavioral trait [295] indicates how dependable the actor [220] is considered, in particular how much actual trust others put into her beyond purely professional interactions.
- In one embodiment of the invention, the reliability of an actor [220] is measured by methods including but not limited to:
-
- Identification of linguistic markers of respect and deference (and lack thereof);
- Detection of negative judgments, repeated criticism, and derogatory language about that actor [220], especially when coming from multiple actors [220] who are not closely related to one another.
- This behavioral trait [295] indicates the status of an actor [220] in the eyes of her peers, particularly in the domain of their professional interactions.
- In one embodiment of the invention, the popularity of an actor [220] is measured by methods including but not limited to:
-
- Assessing the level of formality of interactions between that actor [220] and other actors [220], especially when that level is asymmetric (e.g. when the actor [220] is very formal in his messages but replies from various other people are unusually terse and informal);
- Measuring the responsiveness of others to requests made by the actor [220];
- Measuring how often the actor [220] is blown off by others [220];
- Conversely, measuring how often the actor [220] is consulted by others;
- Assessing how often the actor [220] is included in events or distribution lists after the fact: for example, someone who is often added to email recipient lists, especially on sensitive issues, is typically more popular than the average since others think about including her in the loop and implicitly value her opinion;
- Computing the actor's [220] pecking order score (see section on Pecking order visualization);
- Computing a measure of authority for that actor [220] in the graph built from invitations to meetings, which can be done using the well-known hubs and authorities algorithm. It should be noted that meeting invitations are an essential behavioral indicator in many organizational environments, especially those where the more relevant or risky facts are rarely put in writing. Thus the graph of who invites, and demonstrably does not invite, whom to meetings is very valuable and can be used to assess an individual's perception by his peers. A measure of authority for the node representing an actor [220] in the graph of meeting invitations thus indicates how many people invite her who are themselves considered important enough to be invited to other meetings, thereby providing an important objective component in the assessment of that actor's [220] popularity.
Connectedness
- This behavioral trait [295] measures the extent to which an actor [220] is included in business workflows, and how much she interacts with coworkers.
- This trait [295] is an important component of an individual behavior [210] since isolation patterns, which reflect a significant lack of interaction or generally poor-quality interactions, are a worrying sign in most organizations. Conversely, the unexplained disappearance of isolation patterns is also suspicious in many situations, since e.g. someone reclusive who becomes very popular might actually be trying to manipulate people in the parent organization in order to gain unauthorized access to sensitive information.
- In one embodiment of the invention, the connectedness of an actor [220] is measured by methods including but not limited to:
-
- Computing the hub value for that actor [220] in the communication graph filtered by work-related content and where connections across distant groups [225] are more heavily weighted, once again using the classical hubs and authorities algorithm;
- Computing a measure of betweenness for that actor [220] in the graph of meeting invitations (described in the paragraph on popularity assessment). In a graph where nodes represent actors [220] and edges represent a certain type of connection, a measure of betweenness for a particular node indicates how many pairs of otherwise disconnected actors [220] the considered actor [220] connects with each other: therefore, the betweenness in the graph of meeting invitations is a reliable indicator of connectedness with the rest of the organization;
- Measuring how often the actor [220] makes regular and significant contributions to collective tasks, and how often she collaborates on coworkers' assignments. This can be measured for example using pragmatic tags [172] computed on workflow instances [134], as performed by the pragmatic tagging component [430].
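The hubs-and-authorities computation referenced in the popularity and connectedness assessments above is the classical HITS power iteration; a minimal pure-Python sketch over an edge-list graph:

```python
def hits(edges, iterations=50):
    """Classical hubs-and-authorities (HITS) power iteration.

    'edges' is a list of (source, target) pairs of a directed graph, e.g.
    (inviter, invitee) in the meeting-invitation graph. A high authority
    score means being pointed at by strong hubs; a high hub score means
    pointing at strong authorities.
    """
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes pointing at n.
        auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update: sum of authority scores of nodes n points at.
        hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

In the meeting-invitation graph, the authority score drives the popularity assessment, while the hub value over the filtered communication graph drives the connectedness assessment; the quadratic edge scan is kept for clarity rather than efficiency.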
- The system allows the user to efficiently visualize connectedness assessment for a particular actor [220], either when an anomaly [270] has been flagged or on-demand by the means of a graph showing changes of position within the actor network, which is one of the animated actor graphs [471] provided by the system.
- This behavioral trait [295] reflects the tendency of an actor [220], in the face of conflicts involving other actors [220] in the organization, to either make those conflicts worse by her actions or speech, or, in contrast, to attempt to nip emerging conflicts in the bud, or more generally to solve existing interpersonal issues.
- In one embodiment of the invention, an actor's [220] propensity to confrontation is measured by methods including but not limited to:
-
- Detecting language and behavior causing aggravation of conflicts, such as frequent use of an aggressive tone;
- Conversely, detecting language and behavior causing mitigation of conflicts, such as using a neutral tone in consecutive communications resulting in an agreement being reached between the parties involved in the conflict;
- Measuring the impact of the actor's [220] involvement in a discussion [136]. In the default embodiment, this is measured by computing two different scores for an actor [220], as explained in the following.
- When assessing an actor's [220] tendency towards confrontational behavior, the reply-based confrontation score [285] takes into account that actor's [220] response to negative sentiment. For example, if in the course of a discussion [136] that contains very little (if any) negative sentiment, the actor [220] sends out a very aggressive email, this will significantly increase her reply-based confrontation score [285]. By contrast, the effect-based confrontation score [285] takes into account the effects of that actor's [220] communications [123] on others. For example, if in the course of a highly tense discussion [136] the actor [220] sends a neutral reply that has a clear mitigating effect on the following items [122] in the discussion [136], this will significantly decrease the actor's [220] effect-based confrontation score [285]. These two scores [285] take positive values and are computed as follows in the default embodiment. We denote by n the length of a discussion [136], by stm_i the polarity of the i-th item [122] in the discussion [136] for 1 ≤ i ≤ n (positive values indicate positive sentiment), and by author_i its author (e.g. the sender of an email). Then, to compute A's reply-based confrontation score, we tally all values taken by the expression
-
1 ≤ i ≤ n, stm_i < 0, author_i = A, i < j ≤ n, author_j ≠ A: Max(−stm_j, 0) - and compute its mean over all discussions [136]. To compute A's effect-based confrontation score [285] we tally all values taken by the expression
-
1 ≤ i ≤ n, stm_i < 0, author_i = A, i < j ≤ n, author_j ≠ A: −stm_i/(n−j) · Σ_{j<k≤n} Max(−stm_k, 0) - and compute its mean over all discussions [136].
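As a minimal sketch, the two expressions can be read as follows in Python. This is an illustrative reading, not the patent's reference implementation: indices are 0-based, the tallied values are pooled across all discussions before averaging, and the effect-based divisor is guarded against the last-item case where n − j would be zero.

```python
def confrontation_scores(discussions, actor):
    """Pooled means of the reply-based and effect-based expressions.
    Each discussion is a list of (author, sentiment) pairs; negative
    sentiment values indicate negative polarity."""
    reply_vals, effect_vals = [], []
    for disc in discussions:
        n = len(disc)
        for i, (a_i, s_i) in enumerate(disc):
            # condition: stm_i < 0 and author_i = A
            if a_i != actor or s_i >= 0:
                continue
            for j in range(i + 1, n):  # i < j <= n, author_j != A
                a_j, s_j = disc[j]
                if a_j == actor:
                    continue
                reply_vals.append(max(-s_j, 0))
                divisor = n - (j + 1)  # equals (n - j) in 1-based terms
                if divisor > 0:        # guard against the last item
                    tail = sum(max(-s_k, 0) for _, s_k in disc[j + 1:])
                    effect_vals.append(-s_i / divisor * tail)
    mean = lambda vals: sum(vals) / len(vals) if vals else 0.0
    return mean(reply_vals), mean(effect_vals)
```

For instance, in a discussion `[("A", -1.0), ("B", -0.5), ("B", -0.2)]`, A's aggressive opening message followed by B's negative replies yields nonzero values on both measures.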
- The following describes behavioral traits [295] related to individual and collective communication patterns and how they can be measured in the default embodiment of this invention.
- This behavioral trait [295] assesses whether the actor [220] is only concerned with her own interests or has other people's interests in mind.
- In one embodiment of the invention, an actor's [220] self-centered behavior is measured by methods including but not limited to:
-
- The presence of generic linguistic markers of narcissism and egotism. In particular, someone who talks about herself more than the norm, especially in a strictly professional context, is suspicious;
- The absence of expressed interest for others;
- The lack of thankful language as detected using pragmatic tags [166] derived from discourse analysis.
- In one embodiment of the system, stress management issues are detected using linguistic markers including but not limited to:
-
- A significant increase in the amount of negative sentiment detected in communications [123] sent by the actor [220];
- The adoption of terse language structures, as observed e.g. from punctuation usage;
- More specific signs of stress, such as changes in the structure of the actor's [220] email replies, e.g. no longer taking care to in-line replies within the original email's content.
- Polarized behavior corresponds to the presence of both highly negative and highly positive sentiments expressed by the actor [220] on a certain topic [144] or relative to a certain actor [220].
- This is important in many situations as a factor of behavioral risk because polarized interactions and responses can suggest a vindictive attitude, but also more generally identity issues, especially when originating from the effort to maintain a façade of allegiance to the parent organization.
- In one embodiment of the invention, a polarized attitude is detected by methods including but not limited to:
-
- Measuring the presence of reverse sentiments expressed by the actor [220] toward other actors [220] or actor groups [225] when communicating with different audiences (for example, trusted persons and confidants vs. distant relationships);
- Detecting the use of deceptive language, including flattery, which might imply in some cases that the actor [220] is hiding or distorting the truth to promote a hidden agenda. In that embodiment, deceptive language is detected using linguistic cues generally associated with deceptive writing, such as reduced first-person usage, increased usage of negative-emotion terms, a lower volume of exclusive terms, and a higher volume of action verbs.
- Information dissemination analyzes how specific knowledge and data spreads through the organization, e.g. if it typically travels through a single level of management, or more vertically, and how long it takes to become general knowledge, etc.
- This behavioral trait [295] is particularly useful for analyzing the spread of highly valuable or sensitive information, as well as the effectiveness of knowledge acquisition throughout the organization.
- To measure information dissemination behavior, actor profiles are built according to a model of knowledge creation and transfer. In the default embodiment of this invention, profiles are creators of information, couriers, and consumers. A method to rank actors [220] against each such profile is described in the section on Information dissemination profiles, along with a novel way to efficiently visualize these profiles on the whole actor network.
- Optionally, the input data can be filtered by topic [144] or by ontology classifier [150], etc. to only retain information relevant to a specific scenario.
- This behavioral trait [295] measures a particularly interesting social network pattern, namely the emergence of cliques [255], i.e. groups of tightly connected people. This is especially relevant in relation to other significant patterns or topics [144], and especially when these cliques [255] have a secretive nature.
- In addition, people tend to aggregate around their likes, so that clique [255] members will often share similar behavioral traits [295]; this can provide supporting evidence, during construction of the system's behavioral model [200], that an actor [220] matches a particular profile type. In other words, anomalous patterns for this behavioral trait [295] include, but are not limited to, a sudden change in an actor's [220] tendency to form cliques [255], and the inclusion of an actor [220] in one or more cliques [255] around another actor [220] who has previously been flagged—manually or automatically—as suspicious.
- In the default embodiment of the invention, an actor's [220] tendency to form cliques [255] is assessed by detecting patterns including but not limited to:
-
- A small group of actors [220] who have no common professional responsibilities and mostly communicate on a personal level, except that they systematically regroup after the occurrence of some event in the organization (e.g. they exchange IM to agree to meet offline after every internal meeting on a particular project);
- Distinct communication channels [156] between certain groups [225], in comparison to official channels. This is considered particularly suspicious behavior when work groups [225] are supposed to be separated by clear boundaries, because somebody trying to gather information they should not have access to would go out of their way to socialize with members of other groups [225], which is often reflected in communication channels [156] that are clearly separate from the organizational ones.
- The system allows the user to efficiently visualize the emergence, decay, and evolution of cliques. This is done either upon detecting an anomaly in relation to cliques [255] including a particular actor [220], or on demand from the user, by the means of an animated graph of cliques, which is one of the animated actor graphs [471] provided by the system.
- This behavioral trait [295] measures an actor's [220] tendency to develop and sustain personal relationships with co-workers.
- In one embodiment of the invention, the level of social proximity of an actor [220] is assessed by computing metrics [290] including but not limited to:
-
- The number of topics discussed with other actors [220];
- The average level of formality between that actor [220] and all other actors [220] with whom she socially interacts;
- The variety of emotive tones in that actor's [220] social interactions;
- The number of communication channels [156] for each actor [220] she socially interacts with;
- The number of foreign languages spoken (when applicable);
- The time-spread of social interactions, such as time of day or day of week;
- The frequency of invitations to social events;
- The length of non-work-related discussions [136];
- The proportion of time spent in social interactions;
- The actor's [220] ability to form long-lasting relationships, e.g. measured as how often the actor [220] keeps in touch with former co-workers and for how long.
- The system allows the user to efficiently visualize and assess social proximity and its evolution for a particular actor [220] on a continuous basis, using the Social You-niverse visualization [472].
- Elicitation patterns detected by the system include but are not limited to:
-
- Use of flattery to force other actors [220] to cooperate or unknowingly provide information;
- Attempts to defuse a defensive reaction when seemingly trying to extort sensitive or otherwise valuable information.
- In one embodiment of the invention, elicitation is detected using methods such as:
-
- Flagging linguistic markers of flattery and elicitation attempts. For example, phrases such as “Can I ask you a favor” might be an attempt to defuse a defensive reaction when trying to extort sensitive information. Also, malicious insiders often manage to convince others that they are committing their acts for a good cause, e.g. that leaking confidential information outside the organization actually helps that organization by demonstrating the ineffectiveness of the data leak prevention systems it has put in place. Therefore, phrases such as “We're all on the same side here” are also positively correlated with elicitation attempts;
- Detecting drastic changes in the level of influence between two actors [220], which is one of many indicators that elicitation attempts could be succeeding.
- In addition to observed behavioral traits [295], some indicators of an actor's [220] personality or character are valuable for anomaly detection. The rest of this section describes such traits [295] and how they can be measured in the default embodiment of this invention.
- This character trait [295] corresponds to detecting extreme values of an actor's [220] apparent self-esteem. Both extremes result in anti-social behavior that might put the organization or other individuals at risk: a lack of self-esteem might lead the actor [220] to be easily influenced or to act irrationally, and when extremely low can even denote socio-pathological tendencies. Conversely, an inflated ego—and ego satisfaction issues in general—results in a lack of consideration for others and in a tendency to manipulate other actors [220] while rationalizing malicious actions. A major decrease in self-esteem is also considered a cause for concern, and such a pattern is therefore flagged by the system as anomalous.
- In one embodiment of the invention, the likely existence of self-esteem issues is assessed using the following methods:
-
- Detecting patterns where the actor [220] repeatedly seeks out those who flatter her. This can be done by measuring quality time (as shown in the Quality time visualization [478]) and computing proximity based on communication [123] initiation;
- Conversely, detecting patterns where the actor [220] tries to avoid others to the extent possible, especially after expressing negative feelings about herself;
- Significant mood swings according to whom the actor [220] is primarily interacting with;
- Signs of an inflated ego, detected using linguistic constructs and ontology classifiers [150], which include an inability to cope with delayed gratification, a strong sense of entitlement, and a constant search for validation by other actors [220].
- Based on a number of behavioral traits [295] including but not limited to the ones previously described, each of which can be scored along one or more behavioral metrics [290], the system establishes an assessment of the behavior of all individual actors [220] as follows.
- For each behavioral trait [295], a per-actor [220] score [285] is computed as a linear combination of the values taken by each normalized metric [290] for that trait. Metrics [290] are normalized against all actors [220] for which they are defined (no value might be available if, for example, no data relevant to the metric [290] is associated to a particular actor [220]) so as to have a zero mean and unit standard deviation. The set of weights involved in the linear combination of metric scores [285] is independent of the actor [220] considered. In one embodiment of the invention, the user [455] completely specifies the weight for each metric [290] of a given trait. In another embodiment, the system is initialized with identical weights, following which those weights are continuously adjusted based on user feedback [160], as described in the section on Anomaly detection tuning. This allows the system to fine-tune the assessment of a behavioral trait [295] based on input from a human expert, since the comparative reliability of, e.g., an actor network metric and an ontology classifier metric is subject to interpretation and therefore cannot be automatically determined by the system.
- This linear combination yields a scalar value for each actor [220] against a particular behavioral trait [295], which in turn does not need to be normalized since it constitutes a relative scoring mechanism across the actor set; the system thus uses these values to rank the actors [220] by decreasing score [285]. These rankings [275] can then be leveraged in multiple ways, including having the system automatically generate alerts [305] for the top-ranked actors [220] against a particular trait [295] (as described in the section on Anomaly detection) or by letting the user query the behavioral model [200] against a trait [295] and returning the top-ranked actors [220].
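The per-trait scoring just described — z-normalizing each metric across the actors for which it is defined, combining the normalized values with actor-independent weights, and ranking by decreasing score — might be sketched as follows. The dictionary-based data layout is an assumption for illustration.

```python
from statistics import mean, pstdev

def rank_actors_by_trait(metric_values, weights):
    """metric_values: {metric_name: {actor: raw_value}} for one trait;
    weights: {metric_name: weight}, identical across actors.
    Returns the actors ranked by decreasing trait score."""
    scores = {}
    for name, per_actor in metric_values.items():
        vals = list(per_actor.values())
        mu, sigma = mean(vals), pstdev(vals)
        for actor, v in per_actor.items():
            # normalize to zero mean / unit standard deviation over
            # the actors for which this metric is defined
            z = (v - mu) / sigma if sigma else 0.0
            scores[actor] = scores.get(actor, 0.0) + weights[name] * z
    return sorted(scores, key=scores.get, reverse=True)
```

The top of the returned ranking is then available for automatic alert generation or for on-demand queries against the behavioral model.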
- The system is also able to determine the main characteristics of actor groups [225] using a model similar to the one used in individual behavior [210] assessment. This is useful to measure behavior at a coarser granularity than on a per-actor [220] basis. It is also essential to application scenarios where risk is presumed to originate not only from malicious insiders acting on their own or with accomplices external to the organization but also from conspiracies involving multiple actors [220].
- To do this, the system defines and measures a number of collective behavioral traits [295] which together compose a collective behavior [215]. Each of these traits [295] is derived from the individual behavior traits [295] of the group [225] members—which are measured as described previously—in a straightforward manner using simple aggregate metrics [290]. This allows the system to rank actor groups [225] against each of these traits [295] and thus lets the anomaly detection component [450] raise behavioral alerts [305] as appropriate. Those alerts [305] will thus be imputed to a group [225] rather than a single actor [220].
- In the default embodiment of the invention, the system aggregates every single behavioral trait [295] for each actor [220] who belongs to a given group [225] by simply computing the average score [285]. This allows the system, for example, to determine a score [285] and thus a ranking [275] for all formal or informal groups [225] against the disengagement trait described previously.
- The system presents the resulting collective behavior [215] model to the user by underlining simple statistical measures over the corresponding behavioral traits [295], such as the average score [285] against that trait [295], highest and lowest score [285], and standard deviation, for each actor group [225] for which the trait [295] has been measured. In one embodiment, this last condition is defined as having a significant number of individual trait measurements, for example at least 6 individual actor [220] scores [285].
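The group-level summary just described might be sketched as follows, with the minimum-measurement condition (at least 6 individual scores in the cited embodiment) as a parameter. The data layout is an assumption for illustration.

```python
from statistics import mean, stdev

def group_trait_summary(scores_by_group, min_scores=6):
    """Per-group summary of one behavioral trait; a group is reported
    only when it has a significant number of individual actor scores
    (at least 6 in the default embodiment)."""
    summary = {}
    for group, scores in scores_by_group.items():
        if len(scores) < min_scores:
            continue
        summary[group] = {
            "mean": mean(scores), "max": max(scores),
            "min": min(scores), "stdev": stdev(scores),
        }
    return summary
```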
- In addition, all visualizations [204] of behavioral patterns described in this invention can be adapted to represent group behavior [215] rather than individual behavior [210]. For example, the social universe visualization uses planets [4820] to represent either individual actors [220] or groups [225]; matrix visualizations such as stressful topics and emotive tones represent either individual actors [220] or groups [225] as rows in the matrix; and visualizations [204] based on selecting input data [360] according to the actors [220] involved, such as the sequence viewer, can also filter that data according to the groups [225] involved.
-
FIG. 8 shows a partial hierarchy of anomalies [270] generated by the anomaly detection component [450] of the system. - An anomaly [270] is associated to one or more events [100] that triggered the anomalous patterns, and to zero, one or more subjects [272], each subject [272] being an actor [220], a group [225], a workflow process [128], or an external event [170] to which the anomaly is imputed. The anomaly subject is defined as follows:
-
- If the behavioral modeling component [445] has flagged an individual behavior as anomalous for a particular actor [220] or group [225], that actor [220] or group [225] is the subject of the anomaly.
- If the continuous workflow analysis component [465] has detected an outlier workflow instance [134], the corresponding workflow process [128] is the subject of the anomaly.
- If the anomaly detection component [450] has correlated the anomaly with a particular point in time or external circumstances, the corresponding external event [170] is the subject of the anomaly.
- The quality of the output produced by a reliable anomaly detection system cannot be evaluated by a single metric. Therefore, an anomaly [270] generated by the present invention possesses a set of properties: confidence [870], relevance [880], and severity [875], which are described in the following.
- Confidence [870] is a property endogenous to the model and indicates the current likelihood (as a positive numerical value estimated by the system) that the facts underlying the anomaly [270] are deemed valid in the physical world. This includes, but is not limited to, the following considerations: how strong the chain of evidence [100] is, and how many associations are established to produce the anomaly [270].
- Relevance [880] is a property which is computed by combining user input [106] and information derived by the system. It represents the importance of the risk associated to the anomaly [270]. That is, a highly relevant anomaly [270] indicates behavior that can be malevolent or accidental but carries a significant risk (business risk, operational risk, etc.) whereas a low relevance [880] indicates abnormal but harmless behavior. Relevance [880] is a positive numerical value initialized to 1 for any newly detected anomaly [270].
- Severity [875] is a property defined by human users and indicates the impact (for example in terms of material or financial damage) that the anomaly [270] would lead to if actually confirmed and carrying a risk. In the default embodiment of the invention, the severity [875] is defined by a set of system-configurable parameters and is assigned to alerts [305] and other actions posterior to anomaly detection, but is not used by the system in its computations.
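The three anomaly properties could be carried by a simple record such as the following hypothetical container; the field names are illustrative, and note that relevance starts at 1 for any new anomaly while severity is not used in the system's computations.

```python
from dataclasses import dataclass, field

@dataclass
class Anomaly:
    confidence: float                             # endogenous likelihood estimate, positive
    severity: float = 0.0                         # user-defined impact, unused in computations
    relevance: float = 1.0                        # initialized to 1 for any new anomaly
    events: list = field(default_factory=list)    # triggering events
    subjects: list = field(default_factory=list)  # actors, groups, processes, external events
```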
- Anomalies [270] are generated by the anomaly detection component [450] and can be of different types, including but not limited to:
-
- An atomic anomaly [830] is the most basic type of anomaly. An atomic anomaly [830] can be triggered by an anomalous event [835], which corresponds to observed events [102] having features which are abnormal in and of themselves. An atomic anomaly [830] can also be triggered by a rule violation [855] which corresponds to an observed event [102] having breached a compliance rule [865] or any part of an internal or regulatory policy.
- A categorization anomaly [825] corresponds to unexpected categorization results of the data associated with a subject [272] with respect to that subject's [272] past categorization results.
- An anomaly by association [850] groups several anomalies [270] that have a strong causal relationship as derived from evidence links [108].
- An anomaly by deviation [805] indicates that an individual actor [220], a group of actors [225], or even a workflow process [128] is showing characteristics that are unusual with respect to their normal characteristics, where normal is defined either with respect to past behavior or to the behavior of similar actors [220] or groups [225].
- A behavioral trait anomaly [840] indicates that relatively to other actors [220] in the organization, an actor [220] has one of the most anomalous scores [285] with respect to a particular behavioral trait [295], as described in the section on the Behavioral model.
- Each of these types of anomalies [270] is described in more detail in the following sections, along with one or more proposed methods for detecting such anomalies [270]. It should be noted that these anomaly types are not mutually exclusive: for example, an anomaly [270] may be justified by a single anomalous event [835] which also triggered a categorization anomaly [825].
- The simplest type of anomaly [270] that is raised by the anomaly detection component [450] is an atomic anomaly [830] due to an anomalous event [835]. For example, emotive tone analysis as described in this invention may have detected derogatory or cursing language on certain internal communication channels [156], which can constitute anomalous behavior which should be further investigated, and thus results in an atomic anomaly [830] imputed to the author of these communications [123].
- In one embodiment of the system, the anomaly detection component [450] can also be configured to trigger atomic anomalies [830] based on rule violations [855], such as compliance rules [865]. An example of rule violation is when a particular topic [144] may have been blacklisted so as to express that no communications [123] pertaining to that topic [144] should be exchanged among certain actors [220]. In this case, the communication [123] between two such actors [220], or alternatively the creation by one such actor [220] of a document [162] associated to that topic [144], is flagged by the system as a rule violation.
- Categorization anomalies [825] are produced on two types of categorical features.
- The first type of categorical features corresponds to categories produced by any type of categorization component [146] as part of the continuous categorization component [420], such as detected topics [144]. An example of categorization anomaly in that case is when the topical map for a given subject [272] shows an abrupt and unjustified change at some point in time.
- The second type of categorical features corresponds to built-in features of the system, such as detected emotive tones or entities (event names, people names, geographical locations, etc.) which are derived from the analysis of data and metadata extracted from events [100], including periodic sequences [132] that match a periodic pattern [126]. An example of categorization anomaly in that case is when the set of closest actors [220] has significantly changed for a given actor [220].
- Categorization anomalies [825] are produced for any profile change detected as explained in the section on Continuous categorization.
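As one illustrative way to detect the closest-actor-set change mentioned above, a drop in the Jaccard similarity between the old and new sets can be flagged. The 0.5 threshold is a hypothetical parameter, not specified by the text.

```python
def closest_set_changed(old_closest, new_closest, threshold=0.5):
    """Flag a categorization anomaly when the Jaccard similarity between
    the old and new closest-actor sets drops below `threshold`."""
    old_s, new_s = set(old_closest), set(new_closest)
    union = old_s | new_s
    similarity = len(old_s & new_s) / len(union) if union else 1.0
    return similarity < threshold
```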
- An event [100] can be flagged as anomalous not in itself but because it is associated with another anomaly [270] by dint of some relationship inferred from one or more pieces of evidence [108], including but not limited to:
-
- Content similarity
- Causal relationship (as detected by discussion building)
- Referential relationship (as detected by entity analysis)
- Temporal relationship (for example temporal proximity between the two events [100], or proximity with different instances of a periodic sequence [132])
- Geographical proximity
- Organizational proximity, including involvement of the same actor [220]
- Common topics [144].
- In one embodiment of the system, anomalies by association [850] are computed as follows. The system configuration defines indirection levels that are incorporated into anomalies [270] when events [100] are correlated with other events [100]. In one embodiment of the present invention, an indirection level is a value I between 0 and 1, which is used to compute the new confidence level [870] every time a new relationship is established using the following expression:
-
conf_A-B = Max(conf_A, conf_B) + I · Min(conf_A, conf_B) - Values of I are for example:
-
- I=1 for strong sources of causal evidence [108], such as formal or informal workflow processes [128], shared attachments or textblock patterns [124], and for any topical, organizational, or referential association.
- I=0.5 for weaker sources of causal evidence [108], such as lexical content similarity, or communications [123] joined based on pragmatic tagging.
- For temporal associations, I is a function of the time difference between the associated events [100], namely I = exp(−γ|t_A − t_B|), with γ dependent on the type of events [100] considered (since different event types correspond to very different time scales).
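The indirection levels and the confidence-combination expression above translate directly into code. The function names are illustrative; γ is the event-type-dependent decay constant.

```python
import math

def indirection_level(kind, t_a=0.0, t_b=0.0, gamma=1.0):
    """Indirection level I for an evidence relationship, using the
    example values given in the text."""
    if kind == "strong":    # workflows, shared attachments/textblocks, topical links
        return 1.0
    if kind == "weak":      # lexical similarity, pragmatic tagging
        return 0.5
    if kind == "temporal":  # decays with the time difference between events
        return math.exp(-gamma * abs(t_a - t_b))
    raise ValueError(f"unknown evidence kind: {kind}")

def combine_confidence(conf_a, conf_b, i_level):
    """conf_A-B = Max(conf_A, conf_B) + I * Min(conf_A, conf_B)."""
    return max(conf_a, conf_b) + i_level * min(conf_a, conf_b)
```

With a strong association (I = 1) the two confidences simply add; with a weak one (I = 0.5) the smaller confidence contributes only half its value.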
- Baseline values for all types of behavioral features (including statistical measures and periodic features), computed for a particular subject [272] called the target subject [272], are stored in the baseline repository. They are continuously adapted by the behavioral modeling component [445] using the accumulation of collected data.
- Every behavioral feature lends itself to one of several pair-wise comparisons, each based on the definition of a referential [3000].
FIG. 30 shows the types of referentials [3000] provided by the anomaly detection component [450] in a default embodiment of the invention. The value tested for anomalies [270] is called the analyzed feature [3075], while the value representing the norm is called the reference feature [3070]. - Note also that features can correspond to absolute or relative values. Taking a statistical measure as an example, such as the number of emails sent by an actor (individual or aggregate), the feature can be either the absolute number of emails sent or the fraction it represents relative to the total number of emails sent by all actors [220] or by actors [220] in one of the groups [225] to which he belongs.
- Many other referentials can be used. In an alternative embodiment, the system lets the user combine several of these built-in referentials (for example an around-event referential [3020] and a peer-group referential [3005]). In yet another alternative embodiment, referentials [3000] can be created whose definitions themselves depend on the analyzed and reference features, such as a filter on their reference events [100] (for example, when these features are statistical features related to electronic communications [123], custom peer groups can be defined by restricting the actors [220] to those having discussed a particular topic [144]).
- When the referential [3000] is a baseline referential [3015], the target subject's [272] baseline value [3035] (considered her normal behavior) and her current value [3045] are compared. The rationale for this kind of comparison is that individual actors [220] as well as actor groups [225] are creatures of habit who tend to develop their own patterns of communication, of interaction with the data and other actors [220], etc. The system thus aims at modeling as many idiosyncrasies of the considered actor [220] or group [225] as possible, and at detecting the cases where the current model diverges from the past model. In addition to this goal, observing permanent change, either individually or at a group level, is also of utmost importance since it guarantees the accuracy of the anomaly detection process as well as often provides insightful information on the data itself.
- When the referential [3000] is a historical referential [3010], the current value [3045] is compared to a historical value [3040] (i.e. a fixed amount of time in the past).
- When the referential [3000] is an around-event referential [3020], the target subject's [272] baseline values before and after an external event [170] are compared: the value before that point in time [3050] is the reference feature, while the value after that point in time [3055] is the analyzed feature. The idea here is that if the behavior of a subject [272] tends to change significantly following a particularly sensitive external event [170], this may suggest that the subject [272] was specifically involved in that external event [170].
- When the referential [3000] is a periodic event referential [3025], the changes in the target subject's [272] feature values [3065] around a more recent external event [170] are compared to the same feature values [3060] around an older external event [170].
- When the referential [3000] is a peer-group referential [3005], the target subject's [272] baseline value [3035] is compared to the baseline values of similar subjects [272] defined as a peer group [3030]. This is particularly useful in the cases where no normal behavior can be easily defined on the subject [272], but more generally it provides a much more exhaustive way to detect behavioral anomalies, especially in the presence of malicious activities: intuitively, in order to escape the detection capabilities of the present invention, an individual who wants to commit fraud would need not only to hide his own malicious actions so they do not appear suspicious with respect to his past behavior, but would also need to ensure that they do not appear suspicious in the light of other people's actions. The baseline values of similar subjects [272] represented as a peer group [3030] are computed as summaries of the behaviors of similar actors [220] or groups [225], for example by averaging each relevant (observed or derived) feature. A peer group [3030] can be defined in a number of ways, which include but are not limited to:
-
- Actors [220] with an identical function or a peer-level, different function in the company, as defined for example in an organizational chart.
- Actors [220] or groups of actors [225] having the same responsibilities within the company, as reflected in their work habits.
- Actors [220] who are closest to the considered actor [220] (based on a variety of properties such as the most frequent email recipients, the whole list being defined in U.S. Pat. No. 7,143,091). In one embodiment the n closest actors [220] are selected, in another embodiment the cutoff occurs at a proximity threshold.
- Other actors belonging to the same circle of trust [255] as the considered actor [220]. (Where circles of trust [255] are defined in U.S. Pat. No. 7,143,091 and comprise professional circles of trust, personal cliques or friendship-based circles of trust, and event- or crisis-motivated circles of trust.)
- Formal workflows belonging to the same business line as the considered workflow process [128].
- Ad-hoc workflow processes similar to the considered workflow process in any of the following meanings: shared workflow stages [154], shared media (i.e. type of communication channel [156] or of document [162]), involvement of the same actors [220], and similar timeframes.
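One simple reading of the peer-group referential [3005] is to summarize the peer group's baselines by averaging (as the text suggests) and express the target subject's baseline as a z-score against them. This sketch assumes scalar features; it is an illustration, not the system's actual comparison logic.

```python
from statistics import mean, pstdev

def peer_baseline(peer_feature_values):
    """Summarize a peer group's baselines by averaging the feature."""
    return mean(peer_feature_values)

def peer_group_deviation(target_value, peer_feature_values):
    """Z-score of the target subject's baseline value against the peer
    group's baseline values for the same scalar feature."""
    mu = peer_baseline(peer_feature_values)
    sigma = pstdev(peer_feature_values)
    return (target_value - mu) / sigma if sigma else 0.0
```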
- The methods available in the present invention to detect anomalies by deviation [805] are detailed in the rest of this section.
- Anomalies by deviation [805] can be detected on any type of feature [2900] associated to events [100]. As illustrated in
FIG. 29, in the default embodiment of the invention, the following types of features [2900] are supported by the system: -
- Scalar features [2915] are all numerical features [2905] that take a single value. An anomaly by deviation [805] on a scalar feature represents an unexpected trend in that feature [2900], intuitively meaning that the habits of particular actors [220] have changed drastically. An example of such an anomaly by deviation is when the daily volume of communications [123] sent by an individual actor [220] on a particular communication channel [156] has significantly changed over time.
- Vector features [2920] are the multidimensional variant of scalar features [2915].
- Periodicity features [2925] correspond to periodic patterns [126] that disappear or are modified, meaning that regularities in the occurrence of specific events [100] have changed or disappeared. An example of deviation from a periodic pattern [126] is when irregularities occur in a meeting scheduled at a fixed interval, or in recurring data archiving operations.
- Periodicity features [2925] can always be treated as a combination of scalar features and categorical features by considering the time elapsed between successive instances of the periodic pattern [126] as a specific type of scalar feature [2915], as well as including all features [2900] attached to each occurrence (i.e. to each periodic sequence [132]). Therefore only numerical features [2905] and categorical features [2910] actually need to be considered.
- The amplitude of each deviation is mapped into the interval [0,1] where 1 represents the absence of deviation, and this value is used as a multiplier of the confidence level [870] of the considered anomaly [270].
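A minimal sketch of this mapping, assuming an exponential decay from 1 (no deviation) toward 0 as the amplitude grows; the exponential form and the scale parameter are assumptions, since the text does not specify the mapping function:

```python
import math

def deviation_multiplier(amplitude, scale=1.0):
    """Map a non-negative deviation amplitude into (0, 1].

    1.0 corresponds to the absence of deviation; the result is used
    as a multiplier of the anomaly's confidence level [870].
    """
    return math.exp(-amplitude / scale)

def adjusted_confidence(confidence, amplitude, scale=1.0):
    # Apply the mapped amplitude to the confidence level.
    return confidence * deviation_multiplier(amplitude, scale)
```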
- In order to detect anomalies by deviation [805] on a given feature [2900], a few parameters are needed to configure the model for each actor [220] or group [225]. Note that, as with most parameters used for tuning the system, these parameters should not be exposed to the user and should remain as stable as possible so as to minimize their impact on recall and precision. Also, a default value is available for each of these parameters, but this value can be overridden to reflect the fact that different actors [220] have different behavioral characteristics (for example, the time scale for answering email messages can vary tremendously across individuals, which must be accounted for in the behavioral model [200]).
- These parameters include:
-
- v: length of the sliding time window [380] for baseline pattern computation
- w: length of the sliding time window [380] for computation of the current trend (w is typically an order of magnitude smaller than v)
- For each observed or derived feature [2920], a threshold multiplier A to detect abnormal deviations. Another embodiment of the invention defines two levels of anomaly, so that two thresholds are needed: the lower threshold indicates suspicious, potentially abnormal observations that need to be confirmed by subsequent abnormal observations before becoming flagrant anomalies. Possible values for the threshold multiplier are given in the next section.
- The first step in detecting anomalies by deviation is to define the reference features [3070] against which analyzed features [3075] will be compared.
- A feature descriptor [3080] is an aggregated value computed over a time interval. In one embodiment of the invention, one or more of the following definitions are available for a feature descriptor [3080]:
-
- The average value of the feature [2920];
- The linear regression of the feature [2920];
- The variance of the feature [2920];
- Any combination of the previous values.
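The descriptor definitions above (average, linear regression, variance) could be computed as follows; returning all three statistics together is one way to realize "any combination of the previous values":

```python
def feature_descriptor(values):
    """Compute descriptor statistics for a feature over a time interval.

    Returns the mean, the slope of a least-squares linear fit against
    the observation index (a simple stand-in for the linear regression),
    and the population variance.
    """
    n = len(values)
    mean = sum(values) / n
    x_mean = (n - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(n)) or 1.0
    slope = sum((x - x_mean) * (v - mean)
                for x, v in enumerate(values)) / denom
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "slope": slope, "variance": variance}
```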
- Once the descriptor [3080] has been defined for a given feature [2920], it is straightforward to compute it on an analyzed feature [3075]: this analyzed feature [3075] is usually defined as a sliding window [380] of recent data for the target subject [272], or as a window around an event [100] (unique or recurring) in the case where behaviors around two different events [100] have to be compared.
- Computing the descriptor [3080] on a reference feature [3070] is somewhat less obvious, and is described in the following paragraph. We can already distinguish two cases:
- To compute the descriptor [3080] on a reference feature [3070] in the case of a time-based referential (i.e. not a peer-group referential [3005]), in the default embodiment of the system, the descriptor [3080] is the average feature value over all observations in time. In another embodiment, the descriptor [3080] is the feature value of the best exemplar among the observations, i.e. the value that minimizes the sum of distances to the other observations.
- To compute the descriptor [3080] on a reference feature [3070] in the case of a peer-group referential [3005], the same method is used after replacing the set of observations at different points in time by the set of subjects [272] that constitute the reference group [3030].
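The best-exemplar embodiment, picking the observation that minimizes the sum of distances to the others, can be sketched with scalar observations and absolute-value distance (both simplifying assumptions):

```python
def best_exemplar(observations):
    """Return the observation minimizing the sum of distances
    to all other observations (the medoid of the set)."""
    return min(observations,
               key=lambda v: sum(abs(v - w) for w in observations))
```

For the peer-group case, the same function applies after passing the feature values of the subjects in the reference group instead of observations in time.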
- This paragraph describes in more detail how a descriptor [3080] is computed on a reference feature [3070], and in particular what information needs to be stored and updated for that computation.
- In the case of a baseline referential [3015], the system only needs to maintain two descriptors [3080] for the target subject [272], in a sliding time window [380] of size w and a sliding time window [380] of size v, both ending at the current time.
- In the case of a historical referential [3010], the system only needs to maintain two descriptors [3080] for the target subject [272], in a sliding time window [380] of size w ending at the current time t and a sliding time window [380] of size w ending at t−T. (Note that necessarily t≦w in the case of a recurring event.)
- In the case of an around-event referential [3020], the system only needs to maintain two descriptors [3080] for the target subject [272], in a time window of size w starting at the event's end time and a time window of size w ending at the event's start time. When the event [100] is a recurring event [101], the positions of these windows are updated every time a new occurrence of the event has been detected.
- The case of a peer-group referential [3005] is the most complex to compute. First, the reference group [3030] can be defined either exogenously or endogenously to the feature [2900] considered. For any given feature [2900], the system may use an exogenous definition of the reference group [3030], an endogenous definition, or both.
- As mentioned previously, exogenous definitions for a reference group [3030] include but are not limited to:
-
- Group [225] of closest actors of the target actor [220] (when the subject [272] is an actor [220])
- Set of actors [220] having the same role as the target actor [220]
- Set of ad-hoc workflows most similar to the target workflow process [128].
- To compute an endogenous reference group [3030], the system sorts a list of all subjects [272] homogeneous to the target subject [272]. Homogeneous means a subject of the same type (actor [220] to actor [220], group [225] to group [225], or workflow process [128] to workflow process [128]) or of a compatible type (actor [220] to group [225]). The list is sorted according to a distance defined over feature descriptors [3080]. In the default embodiment the distance is the Euclidean distance between the feature descriptors [3080]. In another embodiment it is a linear combination of the Euclidean distance between the feature descriptors [3080] and the Euclidean distance between the variances of the feature descriptors [3080].
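A sketch of the endogenous computation, assuming each subject's feature descriptor is a numeric vector; the cutoff to the k nearest subjects is an illustrative assumption, since the text only specifies the sorted list:

```python
import math

def endogenous_reference_group(target, candidates, k=3):
    """Sort subjects homogeneous to the target by the Euclidean
    distance between their feature descriptors, keeping the k nearest.

    `target` and every candidate value are equal-length descriptor
    vectors keyed by subject name.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(candidates.items(),
                    key=lambda kv: euclidean(target, kv[1]))
    return [name for name, _ in ranked[:k]]
```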
- Finally, the system re-computes the reference group [3030] (whether endogenously or exogenously defined) at regular intervals. In one embodiment of the invention this interval is a fixed multiple of w, for example 10w. In another embodiment, the interval is adapted to the feature [2900] considered: for example, if characteristic time scales are available for that feature, then the median value of these time scales over the reference group is used as the time interval for the next computation of the reference group.
- The mechanism to detect anomalies by deviation [805] is executed at a fixed time interval w. Two cases have to be considered depending on the type of referential [3000].
- In the case of a time-based referential (i.e. any referential [3000] which is not a peer-group referential [3005]), a deviation is detected when the absolute value of the difference between the analyzed feature descriptor [3080] and the reference feature descriptor [3080] is larger than A times the variance of the reference feature descriptor [3080] across the reference observations.
- In the case of a peer-group referential [3005], a deviation is detected using the same criterion after replacing the variance across the set of observations in time with the variance across the set of subjects comprising the reference group.
- In one embodiment, the threshold multiplier A has a default value of 10, and is tuned to larger or smaller values based on feedback given by the user regarding detected anomalies by deviation (see the section on Anomaly detection tuning).
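The detection criterion for a time-based referential reduces to a one-line test; scalar descriptors are assumed here, and the default A=10 follows the embodiment above:

```python
def is_deviation(analyzed, reference, reference_variance, A=10.0):
    """Detect a deviation: the absolute difference between the analyzed
    and reference descriptors exceeds A times the variance of the
    reference descriptor across the reference observations.

    For a peer-group referential, pass the variance across the
    subjects of the reference group instead.
    """
    return abs(analyzed - reference) > A * reference_variance
```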
- By leveraging behavioral norms and deviations from those norms in the behavioral model [200] built and continuously maintained by the behavioral modeling component [445], the system also predicts individual behavior; in other words it uses the past to predict the near future.
- Predicted behavior [262] computed by this invention includes anomalous behavior and also more generally includes any kind of future events [100] that are deemed to have a high likelihood of occurring, based on past events [100]. Examples of behavior predicted by the system in the default embodiment include but are not limited to:
-
- Using the behavioral trait [295] measuring acceptance of criticism and described previously in this invention, the system has recorded in the model that a particular actor [220] always reacts very aggressively to negative performance reviews. Then, when the occurrence of the next performance review is detected by any kind of mechanism (for example, an ontology classifier [150] flagging review language in electronic communication [123]), the system will raise an alert at a future point in time at which the subject's negativity is expected to peak, such as 2 weeks after the results of the review have been communicated.
- Using the behavioral trait [295] measuring social proximity between a pair of actors and described previously in this invention, the system has recorded several situations in the past in which members of a particular group of actors [225] severed a strong relationship with some other actor [220]. These past situations shared a number of precursory signs, such as the discussion of a (presumably sensitive) topic [144] in the month preceding the severance of the relationship, or the involvement of a common third-party in the electronic communications [123]. Then, when the same patterns are observed at some point in time for another member of that same group [225], the system will predict that the social relationship will be similarly damaged within a short time frame.
- When an anomaly [270] has been detected soon after several past occurrences of a recurring event [101], the system will predict a similar anomaly [270] when a new occurrence of the same recurring event [101] has been detected. For instance, if a particular group of actors [225] always takes a discussion [136] offline (which is described as a call-me event, along with a method to detect it, in U.S. Pat. No. 7,519,589, the disclosure of which is incorporated by reference herein for all purposes) after a regularly-occurring meeting, the system will predict a call-me event right after the next occurrence of the meeting. As a result, an analyst or operator of the system would have good reasons to investigate the content of that particular discussion [136] among those actors [220].
- Similarly to anomalies [270] raised on past events, the system automatically assigns a confidence level [870] to predicted anomalies [270]. In the default embodiment, the confidence [870] of a predicted anomaly [270] is always lower than if all events [100] had already been observed. The confidence level [870] is derived from the confidence [870] of the corresponding past anomaly [270] by applying an uncertainty factor, which is simply the prior probability that missing events [100] will be observed given all the events [100] that have been observed so far. For example, in the case in which the missing events [100] are part of a workflow process [128], which is modeled by the default embodiment as a higher-order Markov chain (as described in the section on continuous workflow analysis), the probability of those missing events [100] is directly inferred from the parameters of that Markov chain.
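The confidence derivation above can be sketched as follows; treating the prior probability of the missing events as a product of Markov-chain transition probabilities is a simplifying assumption of this sketch:

```python
def predicted_confidence(past_confidence, missing_event_probs):
    """Derive a predicted anomaly's confidence [870] from the matching
    past anomaly by applying an uncertainty factor: the prior
    probability that the not-yet-observed events will occur.

    For events in a workflow modeled as a Markov chain, that prior is
    approximated here as the product of the relevant transition
    probabilities, so the result is always lower than the past
    confidence.
    """
    uncertainty = 1.0
    for p in missing_event_probs:
        uncertainty *= p
    return past_confidence * uncertainty
```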
- As previously described, one fundamental type of anomalous behavior raised by the system is the anomaly by deviation [805], which is detected on any kind of feature [2900] present in the stream of events [100]. Similarly, predicted behavior [262] is inferred on the basis of particular referentials [3000] whenever an analyzed feature [3075] matches the corresponding reference feature [3070], rather than when a deviation is observed between the analyzed feature [3075] and the reference feature [3070]. The fact that the pair-wise comparison operates on baseline behavior [260] guarantees the reliability of this prediction: reference features [3070] are statistically significant because they result from aggregating patterns of behavior over a significant period of time (for example, in the case of a baseline referential [3015]) or over a large number of actors [220] (for example, in the case of a peer-group referential [3005]).
- The system generates alerts [305] for detected anomalies [270] that meet specific criteria of importance. In the default embodiment of the invention, the unique criterion is Conf*Rel≧κ where Conf is the confidence level [870] of the anomaly, Rel its relevance [280], and κ is a threshold for alert generation.
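The alert criterion Conf*Rel≧κ is directly expressible; the default value of κ used below is purely illustrative, since the text leaves it as a system parameter:

```python
def should_alert(confidence, relevance, kappa=0.5):
    """Generate an alert when Conf * Rel >= kappa, where Conf is the
    anomaly's confidence level [870] and Rel its relevance [280]."""
    return confidence * relevance >= kappa
```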
-
FIG. 31 shows the states in which an alert [305] generated by the system exists, and the transitions between those states.
- An alert [305] is generated by an alert profile, which is defined as follows:
-
- Alerts [305] apply to one or more actors [220], groups [225], or workflow processes [128], sorted by level of involvement in the alert [305]. In the default embodiment, the level of involvement is simply the actor's presence in the associated discussions [136] and singleton events [100]. In another embodiment, the number of different personae involved in the associated data increases an actor's [220] involvement in the alert [305].
- Alerts [305] are defined against a given time window.
- List of data sources considered for this alert profile.
- Alert production mode: email (SMTP or IMAP), syslog event, SNMP trap.
- In addition to the system-level parameter κ, the alert [305] generation mechanism is parameterized by the following values:
-
- A maximum aggregation time period ta is defined for anomaly detection (i.e. only events [100] within this timeframe are correlated to possibly trigger an alert [305]). In one embodiment, only events [100] occurring within a sliding window [380] of size ta are aggregated into an alert [305]. In an alternative embodiment, a window of size ta is considered around each occurrence of recurring external events, so that events [100] occurring in disjoint windows [380] can be aggregated into an alert [305].
- A minimum number of independent isolated events [100] to trigger an alert [305] can also be defined.
- In one embodiment of this invention, the user can modify the parameter ta over which anomalies [270] are accumulated to build up an alert [305], using a slider widget with a logarithmic scale, with windows ranging from as small as 1 hour to as large as 1 month.
- In particular, the definition of these configuration parameters implies that, since the number of individual actors [220] covered by the system and the data processing throughput are bounded, the frequency of generated alerts [305] is in general bounded.
- In addition, the user can provide feedback [158] regarding the actual importance of one or more generated alerts [305], as described in the section on Anomaly detection tuning.
- In addition to alert [305] generation, a number of various actions can be configured, including:
-
- Custom report [310] generation;
- Quarantine [312]: data items [122] and evidence [202] that are associated to a detected alert [305] can be stored in a quarantine for further investigation, for example by a compliance officer in the corporate organization;
- Logging [313]: when an anomaly is detected, the logging functionality stores either metadata about the data associated with the anomaly, or the native data itself. In the latter case, the way data items are stored can be configured more finely: data can be anonymized, and if the data item is an email, attachments can be stored as well or simply discarded;
- Notification [314]: the notification functionality notifies the actor [220] responsible for a suspicious activity, for example the sender of an email that violated a compliance rule. The objective here is to educate employees on corporate policies.
- Furthermore, the system can be configured to trigger mitigation and preventive actions as soon as it is notified of an anomaly. The benefits of mitigation and preventive actions are twofold.
- First, these actions provide an extra safety mechanism, further reducing operational risk.
- Moreover, these automatically-triggered actions can limit damages resulting from fraudulent or accidental behaviors in cases where human individuals have failed to react to the anomalies [270] (either by negligence or by malevolence).
- Examples of mitigation and preventive actions are:
-
- Blocking traffic: this action blocks the suspicious traffic, for example on the SMTP server if the data is email, or by sending a TCP reset if the data is transmitted over HTTP;
- Restricting credentials: this action restricts the authorization and access levels of an individual actor [220] or a whole group [225] that is the subject [385] of an alert [305];
- Inhibiting functionality: this action blocks specific functionalities of an enterprise application (either for an individual [220], for a group [225], or globally).
- Feedback given by a user of the system on an alert [305] consists of assessing the relevance of the alert [305], i.e. the risk it represents. Based on that feedback, the system updates the relevance [280] of the underlying anomalies.
- The system lets the user enter feedback [158] on:
-
- A single alert [305] instance;
- Or every instance of a periodically-occurring alert [305];
- Or a cluster of alerts [305] computed by the system;
- Or an anomaly class [3200] defined against that alert as described later in this section.
- A key feature of the anomaly detection tuning mechanism described in this invention is the ability to define a whole anomaly class [3200] whose members share a particular similarity, and to provide specific feedback [158] for all the anomalies [270] contained in this class [3200].
- An anomaly class [3200] can be defined in a number of ways. In the default embodiment, a class is defined by one or more expansion criteria. Each criterion defines a set of anomalies [270] (including the set of initial anomalies [270] contained in the alert on which the user is entering feedback [158]), and the combination of multiple criteria is interpreted as a conjunction, so that the resulting anomaly class [3200] is the intersection of the anomaly sets. In another embodiment, an additional operator allows the combination of several classes [3200], in which case the combination is interpreted as a disjunction; this makes it possible to capture the union of anomaly classes [3200] using a single set of criteria.
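The set semantics described above (criteria combined as an intersection, classes combined as a union) can be sketched directly over sets of anomaly identifiers:

```python
def anomaly_class(criteria_sets):
    """Combine expansion criteria as a conjunction: the resulting
    anomaly class is the intersection of the anomaly sets."""
    result = set(criteria_sets[0])
    for s in criteria_sets[1:]:
        result &= set(s)
    return result

def combine_classes(classes):
    """Combine several classes as a disjunction: the union
    of the anomaly classes."""
    result = set()
    for c in classes:
        result |= set(c)
    return result
```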
- In another embodiment, as illustrated in
FIG. 32, an anomaly class [3200] can be defined in two different ways. The first is to define a class by relaxing constraints [3205] on a given anomaly [270], which allows the capture of very targeted classes [3200], in which all anomalies [270] share many properties with the initial anomaly [270]. The second is to define a class [3200] by specifying similarity constraints [3210] with respect to a given anomaly [270], which allows the system to capture broader classes [3200], in which anomalies [270] only share one or a few properties with the initial anomaly [270].
- An anomaly class [3200] can be defined by constraint relaxation [3205] on anomaly A by using any number of the following constraints [3260]:
-
- Actor criteria [3220]: Include anomalies [270] involving any actors [220]; alternatively, include anomalies [270] involving only some of the actors [220] a1, . . . , an of A; alternatively, include anomalies [270] where actors [220] a1, . . . , an of A are expanded to the group [225] of similar actors (which can be defined in a number of ways including personal cliques [255], similar business roles, etc.).
- Time criteria [3225]: Include anomalies [270] with any time frame; alternatively, include anomalies [270] matching a given time specification (which is matched by A).
- Topic criteria [3215]: Include anomalies [270] having any associated topics [144]; alternatively, include anomalies [270] containing only some of the topics [144] t1, . . . , tn of A.
- Deviation referentials [3230]: In case A is an anomaly by deviation, include anomalies by deviation detected against other referentials [3000].
- Named entities criteria [3265]: Include anomalies referring only to some of the named entities E1, . . . , En of A.
- Evidence links [3235]: In case A is an anomaly by association [850], include anomalies [270] carrying only some of the evidence links [108] L1, . . . , Ln of A. For any type of anomaly [270], include anomalies [270] which are associated to A by a given evidence link [108], even if the confidence of that evidence [108] is not necessarily above the detection threshold. Types of evidence links that can be specified by the user include but are not limited to: causal relationship, shared workflow process [128], shared textblock pattern [124], shared periodic pattern [126], and association to an external event [170].
- System type criteria [3255]: Include anomalies [270] involving only some of the systems S1, . . . , Sn of A (where systems are either machines or applications).
- Item type criteria [3245]: Include anomalies [270] involving only some of the item types T1, . . . , Tn of A.
- Interaction type criteria [3240]: Include anomalies [270] involving only some of the action types A1, . . . , An appearing in A. In one embodiment, interaction types that can be specified by the user include but are not limited to: actor-actor communications [123] (exchanging some message, belonging to the same list of recipients), collaboration (shared meeting attendance, work on creating common content), actor-data interactions (check-in operations, data modifications), and system or application interactions (logging into a system, creating a document in an application, creating data on a particular machine).
- Geo-location criteria [3250]: Include anomalies involving only some of the geographical locations g1, . . . , gn appearing in A.
- Alternatively, an anomaly class [3200] can be defined by constraint specification on the basis of anomaly [270] A by using any number of the following criteria:
- Actor criteria [3220]: Capture all anomalies [270] involving the same actors [220] as A; alternatively, capture all anomalies [270] sharing actors [220] a1, . . . , an of A; alternatively, capture all anomalies [270] including actors [220] similar to actors a1, . . . , an of A (where similar actors [220] can be defined in a number of ways including personal cliques [255], similar business roles, etc.)
- Capture all anomalies [270] having textblock hits [130] for some textblock patterns [124] c1, . . . , cn that also correspond to textblock hits in A.
-
- Time criteria [3225]: Capture all anomalies [270] overlapping the time frame of A; alternatively, capture all anomalies [270] overlapping the time frame of events [100] e1, . . . , en of A; alternatively, capture all anomalies [270] in the temporal neighborhood of A (where proximity is defined for example by specifying the size of a time window).
- Topic criteria [3215]: Capture all anomalies [270] sharing topics t1, . . . , tn with A.
- Named entities criteria [3265]: Capture all anomalies [270] sharing named entities E1, . . . , En with A.
- System type criteria [3255]: Capture all anomalies [270] sharing systems S1, . . . , Sn with A (where systems are either machines or applications).
- Item type criteria [3245]: Capture all anomalies [270] sharing item types T1, . . . , Tn with A.
- Workflow criteria [3270]: Capture all anomalies [270] involving workflow processes [128] W1, . . . , Wn also present in A; alternatively, capture all anomalies [270] involving some specific workflow stages [154] within these workflow processes [128].
- Interaction type criteria [3240]: Capture all anomalies [270] sharing action types A1, . . . , An with A.
- Geo-location criteria [3250]: Capture all anomalies [270] sharing geographical locations g1, . . . , gn with A.
- The feedback [158] given by the user on an anomaly [270] or an anomaly class [3200] is a binary value, “relevant” or “irrelevant”, meaning that the anomaly [270] is confirmed [3105] or refuted [3110]. The next section explains how the system automatically tunes the anomaly detection component [450] using that feedback [158].
- The first type of anomaly [270] that the user can give feedback on is an atomic anomaly [830]. For example, if the event [100] corresponds to a previously blacklisted topic [144], the user may decide to relax the constraint on this topic [144] by applying a lower index of relevance to all events [100] corresponding to the detection of this topic [144].
- In addition, some abnormal events are defined with respect to a threshold value, so that another way to tune the detection of this type of anomaly [270] is to simply change the threshold at which an event [100] is considered abnormal: for example, what minimal value of a shared folder's volatility will constitute an anomaly [270].
- Anomalies by association can also be tuned by a user at a fine-grained level, in addition to the indirection level of each association type described in the section on Anomaly detection, which is part of the system configuration.
- For example, if a topical association indicates that two events [100] share a given topic [144] T, feedback [158] can then be given in the following ways:
-
- Either by indicating the relevance [280] of this whole class of association, so that all topical associations will be impacted.
- Or by indicating the relevance [280] of similar association instances, so that in this case all associations of events [100] sharing topic [144] T will be impacted.
- Or by indicating the relevance [280] of this specific association only.
-
FIG. 33 shows the process by which a user can give feedback [158] on an anomaly by deviation [805]. Such feedback [158] includes the following options: -
- 1. Confirm the anomaly [3305]: this validates that the anomaly [270] is indeed relevant and presents some risk;
- 2. Mark the anomaly as unique exception [3310]: this implies that this occurrence will be discarded, but also that any future occurrence of the same anomaly [270] will be detected;
- 3. Take an action with respect to the baseline:
- a. Decrease sensitivity [3330]: this means that the deviation threshold will be increased for the relevant feature (thus indicating that this occurrence is a false positive and will be refuted [3110]);
- b. Mark the anomaly as a temporary exception [3315]: for a given period of time, the relevant feature will not be monitored, after which the same baseline is expected to resume, so that the same anomaly [270] would be detected again;
- c. Mark this observation as new baseline [3320]: when choosing this option, the user indicates that a sudden baseline change is known to have happened, and that the data-driven monitor should override the baseline trend with the recent trend;
- d. Mark as ongoing baseline change [3325]: when choosing this option, the user indicates that a slow baseline change is known to be happening, and that the data-driven monitor should update the baseline trend as long as this change is occurring, then start monitoring again when the trend has stabilized.
- In addition, as in the general case previously described, when indicating a change to the baseline trend, the user may wish to apply the change to a whole class [3200] of similar anomalies, such as those concerning other actors [220] or those covering another topic [144].
- It should also be noted that the real-world facts known by an analyst can also be indirectly fed into the system, by giving feedback [158] including but not limited to the following types:
-
- A slow change or trend is taking place;
- Exceptional circumstances require making temporary exceptions in the definition of abnormality in a given context (topic, actors, time, etc.).
- The anomaly detection component [450] in this invention contains an additional mechanism to fine-tune the detection of anomalies by deviation [805]. Rather than the specific types of Boolean feedback previously described (whereby the user indicates whether an anomaly [270] or anomaly class [3200] is absolutely relevant or absolutely irrelevant), this additional mechanism allows a user to manually change the threshold at which a deviation is detected on a given feature.
- For example, the user [455] adjusts the threshold (as defined in the section on Anomaly detection tuning) used for detecting abnormal variations of the information flow between two actors [220], or abnormal variations of the measure of centrality of a given actor [220], or the breach in the theoretical workflow process [128] for a given type of document [162].
- Also, the user can adjust the weights associated with each behavioral metric [290] used to assess individual behavior [210] or collective behavior [215] associated with a particular trait [295], as described in the Individual Behavior Assessment Section. In one embodiment of the invention, this is done by manually setting the numeric weights of each metric [290]. In another embodiment, this is done using a slider widget acting as a lever to adjust the weight of each metric [290] while visually controlling the impact of those adjustments on the actor rankings [275], as described in the Alert Visualizations Section.
- In the default embodiment of the invention, the threshold adjustment mechanism is available only to advanced users of the application since it requires a lower-level understanding of the anomaly detection model. The benefit of this additional mechanism is that it allows more accurate feedback on which dimensions of a detected anomaly [270] need to be given more (or less) importance.
- The most straightforward mechanism by which the system adjusts the relevance level of detected anomalies [270] is based on user feedback [158].
- As described previously, this feedback [158] can take one of three forms: a Boolean value (relevant/irrelevant), in which case the anomaly relevance [280] is either left at its original value or set to 0; a relative value (increase/decrease relevance [280]), for example in the case of periodic anomalies; or a numeric value whereby the user directly indicates the relevance [280] to assign.
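The three feedback forms could be applied to an anomaly's relevance as sketched below; the step sizes for the relative adjustments are illustrative assumptions:

```python
def update_relevance(relevance, feedback):
    """Update an anomaly's relevance [280] from user feedback [158].

    feedback may be: False (irrelevant: relevance set to 0),
    True (relevant: relevance kept at its original value),
    "increase"/"decrease" (relative adjustment, with assumed steps),
    or a numeric value assigned directly.
    """
    if feedback is False:
        return 0.0
    if feedback is True:
        return relevance
    if feedback == "increase":
        return min(1.0, relevance * 1.5)
    if feedback == "decrease":
        return relevance * 0.5
    return float(feedback)  # direct numeric assignment
```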
- An additional mechanism aims to automatically prevent false negatives in the anomaly detection process in the case of anomaly bursts and is referred to as anomaly burst adjustment.
- Whenever a new anomaly [270] is detected by the system, the time elapsed since the last anomaly [270] is taken into account to adjust the relevance [280] of the newer anomaly. The goal of this mechanism is thus to account for the case of several anomalies detected within a short timeframe which are part of a burst phenomenon and which all have a root cause: the user's attention should be focused on this event [100] (or sequence of events [100]) rather than on every individual anomaly [270].
- In one embodiment of the invention, a burst interval is defined: when the interval between two successive anomalies [270] is longer than the burst interval, no burst adjustment is applied; when the interval is shorter, the adjustment is an exponential function of the time interval. The negative exponent factor in this exponential function is optimized by the system at regular intervals in order to minimize disagreements between successive anomalies [270], under the hypothesis that all such disagreements are due to burst phenomena and that all agreements occur outside of bursts.
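The burst adjustment can be sketched as follows. The damping form 1 − exp(−λ·Δt) is one plausible instantiation of the "exponential function of the time interval", with λ standing for the negative-exponent factor that the system re-optimizes at regular intervals; both the form and the default λ are assumptions.

```python
import math

def burst_adjusted_relevance(relevance, dt, burst_interval, lam=1.0):
    """Damp the relevance of an anomaly that follows closely after another.

    `dt` is the time elapsed since the previous anomaly; `lam` is the
    negative-exponent factor.  Anomalies arriving immediately after a
    previous one are damped to near zero; the adjustment vanishes as
    `dt` approaches the burst interval.
    """
    if dt >= burst_interval:
        return relevance                       # outside a burst: no adjustment
    return relevance * (1.0 - math.exp(-lam * dt))
```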
- In addition to adjusting the relevance [280] of previously detected anomalies [270], the system also automatically adjusts the relevance of feedback [158] entered by the user in two different situations: feedback decay governs the evolution of feedback relevance over time, while feedback reinforcement describes the impact of recent user feedback on prior feedback.
- First, every time new feedback [158] is given by the user to indicate the actual relevance [280] of a particular anomaly [270] or an entire anomaly class [3200], this feedback [158] will not be given the same importance in the future: this is because a variety of factors inside or outside the corporate environment may totally or partially invalidate that user decision at a later point in time. This mechanism is referred to as feedback decay. Ideally the user would then remove that feedback [158] from the system (which is allowed by the present invention). However, the reliability of the anomaly detection scheme cannot depend upon every user remembering to update their manual input whenever the context requires it. In addition, the other automated tuning mechanism (feedback reinforcement, described below) is contingent on later anomalies [270] being presented to the user for confirmation or refutation; if future data happens not to contain any such similarly abnormal events, this alternative tuning mechanism will not be offered to the user. This is why anomaly feedback [158] entered by a user of the system is subjected to natural decay.
- Automatic tuning of the weight of user feedback [158] follows a simple decay scheme so that more recent decisions may optionally be given a greater weight (in this case, a value of relevance [280]) than older ones. The main goal of this mechanism is to avoid false negatives resulting from assigning a constant importance over time to past decisions even when the data profile has changed significantly (and also, in some cases, when these past decisions are no longer relevant to the users of the system).
- In the default embodiment of the invention, user feedback decay follows an exponential decay scheme—as commonly found for example in voting expert systems. Thus, for example, if the value for the half-life of such feedback [158] is one month, the relevance weight associated with these decisions is halved every month.
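A minimal sketch of the half-life decay scheme described above:

```python
def decayed_weight(initial_weight, age_days, half_life_days=30.0):
    """Exponential decay of a feedback weight: the weight is halved
    every half-life (one month in the example from the text)."""
    return initial_weight * 0.5 ** (age_days / half_life_days)
```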
- User feedback [160] is given on two main types of objects produced by the invention: anomalies (as part of the anomaly detection process) and changes in the data profile (as part of the data discovery mechanism optionally provided by the data collection component [400]).
- For anomaly feedback [158], the half-life of feedback is optionally adjusted based on the nature of the anomaly, so that the characteristic timescales of the respective actors, data types, and processes involved in the anomaly will be taken into account to compute the feedback half-life for that anomaly.
- For data profile feedback [162], the system configuration governs the definition of the decay parameter. The configurations available include, but are not limited to:
-
- A decay parameter which is proportional to the time elapsed since the change (similarly to the case of anomaly feedback).
- A decay parameter which is proportional to the volume of data analyzed (so that for low data throughputs, the decisions will remain valid for a longer period of time).
- In addition to the natural adjustment of feedback weights described previously as feedback decay, the system described in this invention automatically adapts past feedback [158] based on more recent feedback [160] entered by the same or another user: this mechanism, called feedback reinforcement, allows the incorporation of a dynamic knowledge base into the relevance model rather than a static knowledge base built on a case-by-case basis. Furthermore, the system guarantees the consistency of relevance decisions with respect to past decisions; additionally, by boosting decisions that have been made multiple times, it also increases the recall rate of actual anomalies [270].
- More precisely, when past feedback [158] is positively reinforced by recent feedback [160], its relevance is increased; when it is negatively reinforced, its relevance is decreased.
- In the default embodiment of this invention, reinforcement is strictly defined by the subsumption relation between instances of anomaly feedback [158]: since a partial order is defined on anomaly classes [3200], past feedback [158] corresponding to classes [3200] included in more recent feedback [158] is the only feedback considered for reinforcement. If the feedback [158] decisions are identical, it is a case of positive reinforcement; if the decisions are opposite, it is a case of negative reinforcement. In another embodiment, reinforcement is more broadly defined, so as to include all pairs of feedback [158] with a non-empty intersection. The system then computes an overlap measure and uses that measure to weight the reinforcement of the older anomaly [270]: an example of such an overlap measure consists of computing the ratio of shared features (e.g. actors [220], topics [144], groups [225], workflow processes [128], named entities, evidence [108] links and associations, etc.) over the total number of similar features in the older anomaly [270]. For example, if the older feedback [158] is related to an anomaly class [3200] comprising 8 actors [220], and this set of actors [220] overlaps with 3 actors [220] in the anomalies of the more recent feedback [158], then the reinforcement will be weighted by a factor of 3/8.
- In one embodiment of the invention, the feedback reinforcement mechanism follows a multiplicative increase / multiplicative decrease scheme commonly found in voting expert systems: the relevance of a positively reinforced anomaly [270] is multiplied by a fixed amount and the relevance of a negatively reinforced anomaly [270] is divided by a given factor. For example, both multiplicative factors are set by default to 2. This factor is then optionally multiplied by a weight associated with the reinforcement operation as described above.
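The reinforcement scheme can be sketched as follows. How the overlap weight scales the multiplicative factor is not fully specified, so the linear interpolation between 1 (no overlap) and the full factor is an assumption.

```python
def reinforce(relevance, same_decision, factor=2.0, overlap=1.0):
    """Multiplicative-increase / multiplicative-decrease reinforcement.

    `overlap` weights the reinforcement (e.g. 3/8 when 3 of the older
    anomaly class's 8 actors appear in the newer feedback).  The way
    the overlap scales the factor is an assumed interpolation.
    """
    # Effective factor interpolated between 1 (no overlap) and `factor`.
    eff = 1.0 + (factor - 1.0) * overlap
    # Positive reinforcement multiplies; negative reinforcement divides.
    return relevance * eff if same_decision else relevance / eff
```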
- To efficiently display massive datasets in visualizations [204] such as the sequence viewer [440], the present invention relies on a multi-dimensional scaling (MDS) component [425] to compute the visualization layout [305] for such datasets.
- The multi-dimensional scaling component [425] uses an incremental multi-dimensional scaling algorithm that is an improvement over a method described in [Morrison 2003]. The algorithm uses a sliding time window of period T. In one embodiment of the invention, the size of T is taken as 3 months, which is the characteristic scale of a stable human behavior along a particular trait, and thus provides a reliable baseline to detect deviations from that baseline that represent anomalous behavior.
- The continuous MDS computation is initialized by choosing a core subset of data items, using random sampling from the sliding window ending at the current time, and computing a static layout for those items. In one embodiment, this initial computation uses a spring algorithm initialized with a random layout. By way of example, assume the sliding window holds 300,000 data items, the window is of length 3 months, and the window is updated on a weekly basis. Thus, each week, roughly 23,000 new items are added to the layout and 23,000 obsolete items from the first week in the window are deleted. The amount of data in the window stays roughly constant. The present invention uses an incremental multi-dimensional scaling algorithm to produce an updated layout each week without running a full static MDS algorithm over all 300,000 data items.
- In other words, the core subset is used to generate a layout using the basic, static spring-force algorithm. When the current low-dimensionality layout has to be updated by inserting new items, this core subset serves as a basic layout and the position of its member items is not modified. When the current low-dimensionality layout has to be updated by removing existing items, the positions of items in the core subset are only updated if the removed item belonged to the core subset.
- In the default embodiment of the invention, the core subset size is defined as Θ*√N where N is the number of items in the initial sliding window and Θ a small constant, for example Θ=10.
- Thus, in addition to the static MDS computation, two basic functions are used to update an existing MDS layout based on a sliding window: insertion of a data item entering the sliding window and deletion of an item leaving the sliding window.
- The continuous MDS computation in this invention uses a variation of [Morrison 2003] that improves on the interpolation method. The original method places the item to interpolate on a circle around the nearest core subset item, at an angle obtained through minimization of an error criterion on the difference between high-dimensional and low-dimensional distances against all core subset items; also, it computes a sample to refine the position of the interpolation item using an expensive method, even though the sample obtained rarely varies. This method is too computationally intensive to be run in real time, and has additional limitations—the main one being that it only works in two-dimensional spaces. The proposed method is much faster and, on most datasets, produces results of comparable quality under all error criteria used for evaluation.
- The method used for inserting a new data item is the same as in [Morrison, 2003] except that interpolation works as follows:
-
- Sort the core subset items by increasing distance to the item to interpolate.
- Set the position of the item to interpolate to be equal to the position of the nearest neighbor from the core subset (i.e. the first element in the list from the previous step)
- Run the base spring algorithm for the item to interpolate using only the k nearest neighbors from the core subset, where k is a predefined constant. (In the default embodiment of the invention, k=20 is shown to produce good results, comparable to what is produced by setting k=100.)
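The insertion steps above can be sketched as follows. The gradient-style spring update is a simplified stand-in for the base spring algorithm, and the distance functions and parameter values are illustrative assumptions.

```python
import math

def interpolate_position(item_hd, core, k=20, iters=50, step=0.1):
    """Position a new item in the low-dimensional layout (insertion).

    `core` is a list of (high_dim_vector, low_dim_position) pairs for
    the core subset.  The relaxation loop is a minimal gradient-style
    sketch, not the full force model of the underlying MDS component.
    """
    # 1. Sort core subset items by high-dimensional distance to the item.
    nearest = sorted(core, key=lambda c: math.dist(item_hd, c[0]))[:k]
    # 2. Start at the position of the nearest core neighbor.
    pos = list(nearest[0][1])
    # 3. Relax the item against its k nearest core neighbors only.
    for _ in range(iters):
        for hd, ld in nearest:
            target = math.dist(item_hd, hd)    # desired distance
            cur = math.dist(pos, ld) or 1e-9   # current distance (avoid /0)
            err = (cur - target) / cur         # signed relative stress
            for d in range(len(pos)):
                pos[d] -= step * err * (pos[d] - ld[d])
    return pos
```

With two core neighbors placed symmetrically, the new item settles at the equilibrium point between them, as the springs pull it until both low-dimensional distances match the high-dimensional ones.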
- Item removal proceeds by the following steps:
-
- If the item to remove is not in the core subset, remove it from the item set
- Otherwise, pick a random item which is not in the core subset
- Re-run the subset iterations from the static algorithm, except that the iteration does not start with random positions but with the current positions for the core subset items, and a random position only for the new item being added
- Locate areas of change in subset layout by comparing old and new positions and flagging data items exceeding a pre-defined threshold
- Re-run interpolation of the neighbor items belonging to an area of change, using the interpolation method described in the previous paragraph.
- This method of performing item removal allows the basic layout to evolve over time while still largely preserving the existing layout.
- The sequence viewer [440] is a real-time data visualization included in this invention that provides the user with a synthetic, continuously updated view of a potentially massive number of event sequences [166] (which typically reflect individual and collective behavior) by leveraging the visualization layout [355] produced by the multi-dimensional scaling (MDS) component [425] previously described.
- The sequence viewer [440] is thus a unique and highly efficient visual representation of massive datasets that also has insightful analytical benefits. More precisely, it offers the following advantages:
- The sequence viewer [415] presents the user with a potentially huge number of sequences [166] of any type of events [100] in a compact and synthetic manner.
- In addition, the sequence viewer [415] produces a layout that groups together event sequences [166] following the same pattern and brings closer together sequences [166] that exhibit similar patterns. The result is that unusual sequences stand out from the rest of the data and can be analyzed further, where unusual sequences [166] are understood as sequences [166] that do not belong to a cluster of sequences [166] matching a dominant pattern. For example, when the sequence viewer [440] is used to display instances of a business workflow, unusual sequences [166] correspond to deviations from the typical steps in the realization of an ad-hoc workflow process [128], or from a non-compliant realization of a formal process [128].
- Finally, in terms of categorization, the sequence viewer [415] increases the completeness of any categorization scheme by leveraging the structure represented by the underlying sequences [166], especially when they are based on discussions [136]. This is because the sequence viewer [415] can categorize items [122] which would otherwise have been missed, for example items [122] with very low text content. These items [122] usually cannot be categorized based on their sole content, however discussions [136] reveal different types of causal relationships items have with other items [122], such as messages constituting actions that agree, disagree, approve, or reject a topically relevant issue found in other items of the same discussion [136]. The sequence viewer [415] can even reveal to the user the existence of an altogether previously unknown workflow process [128].
- An event sequence [166] can represent any list of time-stamped events [100]. Possible types for such events [100] include but are not limited to: emails and other communications [123] (time-stamped using the date at which the communication [123] was sent), loose documents (which can be time-stamped using the last modification date), or external events [170] (using a natural time-stamp definition). The superset of all events [100] can be optionally filtered by any mechanism relevant to the particular application scenario, following which sequences are defined as lists of items [122] that pass the filter. In the default embodiment of this invention, discussions [136] provide the basis for defining event sequences [166]: for example, if the user selects a particular actor [220], the event sequences [166] will consist of all discussions [136] containing at least one data item [122] involving that actor [220].
- Once event sequences [166] are defined, a tagging scheme is applied to determine a unique item tag [142] for each element within an event sequence [166] and produce tagged sequences [138]. This scheme can be defined in any way relevant to the particular application scenario.
- In one embodiment of the application, tagging is defined as an ordered list of queries [168] that are continuously run on the incoming stream of events [100], each query [168] being associated with a tag or label. Examples of tagging schemes include but are not limited to:
-
- Discourse-function tagging, applied only to discussions [136] containing electronic communications [123] with extracted text such as emails, instant messages, or phone conversation transcripts;
- Workflow stages [154] in instances [134] of a particular workflow process [128];
- Revisions in document lifecycles computed by the textblock detection component [470].
- For that particular tagging scheme, whenever an item [122] has been hit by at least one query [168], it is tagged with the tag [142] of the first such query [168] appearing in the query list. Any item [122] missed by all queries [168] is either discarded from the event sequence [166] or assigned a default label. In addition, the tagging scheme optionally includes an alignment position definition, which determines a position in each event sequence [166] that will be used for horizontal positioning of each sequence [166] in the sequence viewer's [415] layout. In one embodiment, the alignment position is by default defined as the first item [122] matched by the highest-ranked query [168], and can be overridden by the user with another definition more specific to the particular type of events [100] considered.
- Once a potentially large (and increasing over time) number of tagged sequences [138] has been produced by the system, the sequence viewer [415] uses the result of a continuous MDS computation to produce a one-dimensional layout of those tagged sequences [138].
- In one embodiment of the invention, an incremental one-dimensional MDS algorithm as defined in this invention is used and provided by the MDS component [425], in which an optimization is introduced to speed up the layout for this particular profile of data. This optimization phase stems from observing that over a very large number of tagged sequences [138] that represent common events [100], such as those associated with steps in a particular workflow process [128], many tagged sequences [138] will be identical. Thus for sequence layout computation, the MDS algorithm provided by the MDS component [425] is a modified version of the previously-described method: it maintains a cache of already-positioned data points (i.e. tagged sequences [138]) so that for each data point incrementally added to the layout in the sliding-window-based [380] MDS algorithm, if that data point already exists in the cache, a counter is incremented but no new position is computed. Conversely, whenever a data point (i.e. a tagged sequence [138]) leaves the sliding window [380] and must be deleted from the layout, that counter is decremented and positions are only updated if that point belonged to the core subset in the MDS algorithm and the counter has reached zero.
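The cache-and-counter optimization can be sketched as follows. `compute_position` and `remove_position` are hypothetical stand-ins for the MDS component's insertion and deletion operations; the point of the sketch is that duplicate tagged sequences share one position and one counter.

```python
from collections import Counter

class LayoutCache:
    """Counter-based cache of already-positioned tagged sequences.

    A position is computed only for the first occurrence of a sequence;
    deletion from the layout happens only when the last copy of that
    sequence has left the sliding window.
    """
    def __init__(self, compute_position, remove_position):
        self.counts = Counter()
        self.positions = {}
        self.compute_position = compute_position
        self.remove_position = remove_position

    def add(self, seq):
        key = tuple(seq)
        if self.counts[key] == 0:
            # First occurrence: actually compute a layout position.
            self.positions[key] = self.compute_position(seq)
        self.counts[key] += 1
        return self.positions[key]

    def remove(self, seq):
        key = tuple(seq)
        self.counts[key] -= 1
        if self.counts[key] == 0:
            # Last copy left the sliding window: update the layout.
            self.remove_position(seq, self.positions.pop(key))
```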
- The metric definition is also an input to the MDS algorithm. In one embodiment of the invention, the distance definition used in the MDS computation is the shared subsequence distance, which is defined as follows: for two tagged sequences [138] S1 and S2, using their respective alignment positions compute the number of identically tagged items [122] in each tagged sequence [138] at the same position. Let us call this number c(S1, S2). Then the distance between S1 and S2 is defined as
-
c(S1, S2) / (Max(length_left(S1), length_left(S2)) + Max(length_right(S1), length_right(S2)))
- where length_left(S) is the number of items [122] in the tagged sequence [138] S occurring prior to the alignment position and length_right(S) the number of items in S occurring after the alignment position.
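A sketch of this computation follows. The boundary convention is not specified (whether the item at the alignment position counts toward length_right), so here it is taken to count, which makes two identical sequences score exactly 1.

```python
def shared_subsequence_score(s1, a1, s2, a2):
    """Shared-subsequence ratio for two tagged sequences.

    s1, s2: lists of item tags; a1, a2: alignment positions (indices).
    length_right is taken here to include the alignment item itself,
    an assumed boundary convention.
    """
    left = min(a1, a2)                        # aligned items before the anchor
    right = min(len(s1) - a1, len(s2) - a2)   # anchor plus items after it
    # c: identically tagged items at the same aligned offset.
    c = sum(1 for off in range(-left, right) if s1[a1 + off] == s2[a2 + off])
    denom = max(a1, a2) + max(len(s1) - a1, len(s2) - a2)
    return c / denom
```

As written, the ratio behaves as a normalized similarity in [0, 1]; a distance-based layout would typically use 1 minus this score.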
- In another embodiment, a number of patterns are defined by the user in the form of regular expressions p1, ..., pk using as symbols the list of all available item tags [142]. These regular expressions are taken to represent particularly significant patterns of events [100], such as sequences [138] known to represent anomalous events [100], or alternatively to represent nominal, standard sequences [138]. The distance between two tagged sequences [138] S1 and S2, called the shared patterns distance, is then computed as the L2 distance between the vectors P(S1) and P(S2) where for any tagged sequence [138] S, P(S) is a vector of length k where the i-th component is 0 if S does not match pattern pi and 1 if it does match pi. Optionally, weights can be associated with each dimension in the L2 norm computation in order to reflect different levels of importance among the patterns.
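A sketch of the shared patterns distance, assuming each tag is encoded as a single character so that a tagged sequence can be matched against the regular expressions as a string:

```python
import math
import re

def shared_patterns_distance(tags1, tags2, patterns, weights=None):
    """Weighted L2 distance between the 0/1 pattern-match vectors of
    two tagged sequences.

    tags1, tags2: strings of single-character tags (an assumed encoding);
    patterns: regular expressions p1, ..., pk over those tag symbols.
    """
    if weights is None:
        weights = [1.0] * len(patterns)
    # P(S): 1 if the sequence matches pattern pi, else 0.
    v1 = [1 if re.search(p, tags1) else 0 for p in patterns]
    v2 = [1 if re.search(p, tags2) else 0 for p in patterns]
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, v1, v2)))
```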
- In yet another embodiment, the distance between two tagged sequences [138] is defined as a weighted combination of the shared subsequence distance and of the shared patterns distance.
- The continuous sequence viewer [415] is regularly updated so as to organize tagged sequences [138] analyzed in a sliding time window [380]. In one embodiment of the invention, real-time sequence layout computation relies on two types of update operations: major updates and minor updates.
- During a major update, a full multi-dimensional algorithm computation is performed on an initial data set consisting of tagged sequences [138] in the sliding window [380]. The algorithm used comprises an iterative step where a subset of tagged sequences [138] sampled from the whole data set is positioned in a number of iteration loops, and an interpolation step in which all tagged sequences [138] outside the sampled subset are positioned by comparing them only to their closest neighbors among subset sequences [138].
- One type of multi-dimensional scaling algorithm that lends itself to such a two-step process is the family of force-based algorithms, but other types of algorithms can be used in this invention. Using a typical force-based algorithm, the running time for this step is O(S²) where S is the size of the interpolation subset.
- During a minor update, a list of new tagged sequences [138] is added to the data set and a list of old tagged sequences [138] is removed from the data set. New sequences [138] are positioned using the interpolation step only. An old sequence [138] can be removed without repositioning other sequences [138] if it did not belong to the initial sampled subset. Otherwise, the impact of its removal on its closest neighbors needs to be taken into account.
- Using a typical force-based algorithm, the running time for this step is O(W^(3/2)) where W is the size of the data entering or leaving the window in a steady-state regime. The default embodiment of this invention performs minor updates using the modified spring-based multi-dimensional scaling algorithm included in the continuous MDS component [425].
- Since in continuous mode [375] the system cannot rely on a user validating the resulting layout, a number of techniques are used to automatically assess the reliability of the results, and in particular to decide when items entering and leaving the sliding window result in a minor update and when a major update needs to be performed. In one embodiment of the invention, the following validation steps are performed:
-
- Error validation, for example using a global stress function appropriate to force-based algorithms. In one embodiment of the invention, the stress function is the sum of the absolute values of the differences between low-dimensional distance and high-dimensional distance, computed over a random sample of size √N, N being the total number of sequences [138] accumulated in the current layout. (It is important that sampling be done over all data items and not only on the core subset.) If the error exceeds a threshold, there is no need to perform further validation and a major update of the layout needs to be performed.
- Clustering tendency verification, so as to verify that the data exhibits a clustering structure. This verification is simply done using a statistical test that is fast to compute in real time, allowing comparison of the positioning resulting from multi-dimensional scaling against a reference positioning defined from a random position hypothesis. If the clustering tendency condition is not satisfied, a major update of the layout needs to be performed.
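The error-validation step above can be sketched as follows, with each item represented as a (high-dimensional vector, low-dimensional position) pair and the distance functions supplied by the caller; the sampling and pairing details are assumptions consistent with the text.

```python
import math
import random

def layout_stress(items, hd_dist, ld_dist, rng=None):
    """Global stress: sum of |low-dim distance - high-dim distance| over
    all pairs in a random sample of size ~sqrt(N), drawn from ALL data
    items (not only the core subset), per the error-validation step."""
    rng = rng or random.Random()
    n = max(2, math.isqrt(len(items)))
    sample = rng.sample(items, n)
    return sum(abs(ld_dist(a, b) - hd_dist(a, b))
               for i, a in enumerate(sample) for b in sample[i + 1:])
```

If the returned stress exceeds a configured threshold, a major update of the layout is triggered instead of further validation.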
- Experimentation using these validation steps has shown the resulting layout to be quite satisfying from a user's point of view for an initial data volume on the order of S2 (where S is the size of the interpolation subset used during each major update) and an incremental data volume in the same range.
-
FIG. 34 shows a graphical depiction of the sequence viewer [415] that displays the results of computing the layout over a set of tagged sequences [138] as explained previously.
- This visualization is composed of several areas, which are described in the following paragraphs.
- In that visualization, each tag [142] applied to categorize at least one item among all sequences [138] is assigned a specific color. In one embodiment of the invention, the system assigns a random color using a random palette that maximizes the contrast between the most commonly occurring tags [142]. A legend area shows an icon [3425] filled with the chosen solid color for each such tag [142]. In one embodiment, clicking on such an icon allows the user to modify the color assigned to that tag [142].
- The zoomed area shows a detailed view of the sequences [138] [3415] that are selected in the compressed area, i.e. that are covered by the zoom box [3405], a bounding box acting as a cursor that allows the user to browse the set of sequences [138] vertically. In one embodiment of the invention, browsing of sequences [138] is done using the up and down arrow keys. Each sequence [138] in the zoomed area is represented horizontally as a series of rectangular areas filled with the solid color selected for the tag [142] of the corresponding item [122]. For each such sequence, clicking on one of its member items [122] brings up a view showing the content of that item [122] (i.e. its extracted text, data and metadata), while clicking on the title area [3455] brings up a composite view of the underlying sequence [138].
- The compressed area [3410] shows a synthetic view of the whole set of sequences [138], optionally filtered by one or more criteria. Sequences [138] in the compressed area are aligned horizontally using the alignment position defined previously, just as in the zoomed area. However, because the compressed area shows a number of sequences [138] that is potentially several orders of magnitude larger than the height of the area in pixels, it relies on a color compression mechanism: a number of sequences [138] per pixel is computed, and each item [142] is drawn as a rectangle of unit height that has a color defined using a method called a blending scheme.
- A blending scheme takes as input the color of all items [142] represented at a particular integer position on a one-pixel horizontal line in the compressed area. In one embodiment of the invention, the blending scheme is obtained by computing the average in the integer-valued RGB space of all input colors. In an alternative embodiment, the blending scheme computes the average in the decimal-valued HSB space of all input colors. In a third embodiment, the blending scheme computes the resulting HSB value by averaging the hue and saturation, and by defining the brightness as Σk=1…n (1/k²) + B0, where B0 is a constant representing the default brightness for items represented in that area.
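The first blending scheme (averaging in integer-valued RGB space) can be sketched as:

```python
def blend_rgb(colors):
    """Average blending in integer RGB space: each output channel is the
    rounded mean of that channel over all input colors.

    `colors` is a non-empty list of (r, g, b) tuples with 0-255 channels.
    """
    n = len(colors)
    return tuple(round(sum(c[i] for c in colors) / n) for i in range(3))
```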
- Another legend area lists the patterns [3430] that were used in the inter-sequence distance definition when using the shared patterns distance or a composite distance definition.
- Finally, a search and filter area [3435] lets the user filter the set of sequences [138] displayed in the visualization. In the default embodiment of the invention, the following search and filter criteria are provided:
-
- A free text search;
- A criterion on whether only sequences [138] matching at least one pattern should be displayed, or those that do not match any pattern, or all sequences [138];
- A filter pattern that can be defined using a simplified regular expression syntax allowing basic constructs including but not limited to sequences [138] of particular items [142], repetitions of items [142], and missing items [142];
- A filter on emotive tones or other results from applying ontology classifiers [150].
- An alternative sequence viewer [415] display allows the user to specify a past date for a sequence-layout snapshot with which to compare a snapshot of the current sequence layout results. Alternatively, two past dates can be specified.
- By default, snapshots will be computed over a period of time equal to the length of the sliding time window [380]. This duration can be overridden by the user by specifying a longer time period.
-
FIG. 35 shows an example of a sequence snapshot contrast viewer.
- A legend area [3510] indicates the color assigned to each tag [142] used to label the items [122] within each tagged sequence [138], as in the continuous sequence visualization. The compressed area is similar to the one described for the continuous variant, except that it is split vertically into two sections: the upper section [3515] shows sequences [138] in the older timeframe, while the lower section [3520] shows sequences [138] in the more recent timeframe.
- This contrast visualization lets a user compare salient characteristics of the underlying sequences [138] (which in turn can represent workflow instances [134]). In particular, large differences in the number of sequences [138] are often meaningful, since the contrasted periods have the same duration. The typical length of a sequence [138] in the two time periods is another feature of interest. Also, choosing a color palette that clearly differentiates two or more categories of tags [142] lets a user eyeball the dominant tags in each set of sequences [138], the order in which they appear, etc. For instance, when displaying the occurrences of a workflow process [128] for a particular actor [220], it is usually very informative to choose warm colors (such as red tones) for item types in which that actor [220] is actively involved, such as sending a work product or performing a particular task, and to choose colder colors (such as blue tones) for item types where the actor [220] is not involved or only passively so: the resulting visualization dramatically outlines any significant increase or decrease in that actor's [220] level of engagement in realizing the workflow process [128].
- Thus, the sequence snapshot contrast is useful for providing an intuitive and straightforward visualization of the differences between the dominant patterns and the unusual sequences [138] around dates of particular interest, for example dates at which suspicious activities have been independently detected by another mechanism. This feature is particularly appropriate when the dates to compare are far apart, since the continuously updated view described previously does not provide such a perspective in that case.
- The alias usage browser [478] is a continuous visualization used by the present invention to efficiently and intuitively display the results of actor [220] analysis on a continuous basis.
- Actor alias identification and persona [230] identification are described in U.S. Pat. No. 7,143,091. These analysis methods allow the system to disambiguate electronic identities [235] based on the electronic aliases [240] they use to communicate on various channels [156], and to assign each actor [220] one or more personae [230], reflecting the fact that an actor [220] might project different personae [230] depending on the topic she is communicating about, with whom she communicates, and more generally in what context and circumstances.
- Like most visualizations [204] described in this invention, the alias usage browser [478] can be used both reactively and in a proactive, investigative manner. In particular, it lets a user drill down into the usage of the whole set of electronic identities [235] used by a specific actor [220].
- For instance, it shows whether electronic identity [235] usage (e.g. using a specific email account) correlates to particular periods of time, communication [123] recipients, topics [144], or other features. It also provides a useful component of social proximity assessment, since it underlines proximity at the level of individual electronic aliases [240] or actor names [245] rather than at the actor [220] level: for example, it can show with whom a given actor [220] exchanges the largest volume of emails on accounts having her maiden name, even if this volume is negligible compared to her overall communication volume since her work email has her married name. In a more proactive mode, the alias usage browser [478] lets a user drill down into the data and filter that data in a very flexible way.
- The alias usage browser [478] relies on the following features resulting either from data and metadata extraction during the processing phase, or from analysis during the post-processing phase (such as alias identification and persona [230] identification):
-
- Electronic identity [235] of the communication [123] initiator including but not limited to: display names, email aliases, IM monikers
- Electronic identity [235] of the communication [123] recipient(s)
- Topics detected using any topic detection [152] method on electronic communications [123]
- Formality of communications [123]
- Nature of the communications [123], such as personal vs. professional
- More generally, categories derived from any categorization component [146].
- When using the alias usage browser [478] in an investigative fashion, the user chooses which features are shown as rows and which as columns. Optionally, a time period of interest can be selected; otherwise the system will consider and play back all historical data processed and analyzed up to the present time. The user can then either fully specify the list of rows and the list of columns, or let the system derive the most meaningful associations. In the default embodiment of the invention, the automatic derivation of matrix components (i.e. of groups of row elements and column elements) relies on an iterative algorithm that computes cross-entropies of a bipartite graph, or equivalently of a binary matrix. This algorithm is efficient in this case because the binary matrices considered are typically very sparse. In essence, the algorithm takes as an input parameter the partition size (i.e. the number of groups), starts with an initial random assignment of rows and columns among groups of identical size, and upon each iteration successively re-assigns each row and each column so as to minimize the sum of cross-entropies over the whole partition. Once groups of row elements and column elements have been produced, the system lets the user go through these groups using navigation controls.
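The iterative cross-entropy minimization described above can be sketched as follows. This is only an illustrative reading of the algorithm: the per-block binary-entropy cost, the function names, and the brute-force re-evaluation of the total cost at every candidate re-assignment are assumptions made for clarity, not the patented implementation.

```python
import math
import random

def block_cost(ones, total):
    # Cost of one block: total * H(ones/total), with H the binary entropy.
    # A block that is all zeros or all ones costs nothing to describe.
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return total * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))

def total_cost(matrix, row_grp, col_grp, k):
    # Sum of block costs over the k x k partition of a binary matrix.
    ones = [[0] * k for _ in range(k)]
    size = [[0] * k for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            ones[row_grp[i]][col_grp[j]] += v
            size[row_grp[i]][col_grp[j]] += 1
    return sum(block_cost(ones[a][b], size[a][b])
               for a in range(k) for b in range(k))

def cocluster(matrix, k, iters=10, seed=0):
    # Start from a random assignment into groups of (roughly) identical
    # size, then repeatedly re-assign each row and each column to the
    # group that minimizes the total cost, until no assignment changes.
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_grp = [i % k for i in range(n_rows)]
    col_grp = [j % k for j in range(n_cols)]
    rng.shuffle(row_grp)
    rng.shuffle(col_grp)
    for _ in range(iters):
        changed = False
        for grp, count in ((row_grp, n_rows), (col_grp, n_cols)):
            for idx in range(count):
                old = grp[idx]
                costs = []
                for g in range(k):
                    grp[idx] = g
                    costs.append((total_cost(matrix, row_grp, col_grp, k), g))
                grp[idx] = min(costs)[1]
                changed = changed or grp[idx] != old
        if not changed:
            break
    return row_grp, col_grp
```

On the sparse matrices mentioned in the text, a production version would update block counts incrementally rather than recomputing the full cost for each candidate group.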
-
FIG. 36 shows the alias usage browser [478] at some point in time. The title of the visualization [3605] summarizes the bound features in the current state (in this case, an author has been selected whose communications [123] are analyzed, and the topic feature has been bound too), as well as the column feature (electronic identity [235] in this case, which is resolved to the actor [220] as described in the prior patent noted above) and the row feature (pragmatic tag [166] in this case, which is computed by the pragmatic tagging component [430]). - In the default embodiment, each matrix is visualized using a common representation known as a heat map, with a color saturation assigned to each cell [3620] to indicate the amount of communication [123] for the corresponding row feature value and column feature value, as well as for all bound features. Alternative embodiments give other kinds of visual cues, such as the level of formality, or the intensity of negative or positive sentiment.
- Each matrix is animated and offers playback functionality available through a timeline control [3625], similar to those described in the sections on the Stressful topics and Temperature gauges visualizations, so that the user can replay past events [100] at normal or faster speed. Optionally, the system can determine the most anomalous time periods and highlight them for the user on the timeline control [3625] using a red color [3630]. In the default embodiment, this is done by leveraging the results of the cross-entropy computation described above. For a given matrix, the most anomalous periods correspond to local maxima of cross-entropy in the group considered. For the whole dataset, the most anomalous periods correspond to local maxima of the sum of cross-entropies over all groups.
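Highlighting the most anomalous periods on the timeline then reduces to locating local maxima in the per-period cross-entropy series. A minimal sketch (the function name and the strict-inequality convention are assumptions):

```python
def anomalous_periods(cross_entropy_by_period):
    # Indices of local maxima: periods whose cross-entropy strictly
    # exceeds that of both neighboring periods are flagged for
    # highlighting (e.g. in red) on the timeline control.
    xs = cross_entropy_by_period
    return [i for i in range(1, len(xs) - 1)
            if xs[i] > xs[i - 1] and xs[i] > xs[i + 1]]
```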
- In addition to the timeline control [3625], the user can interact with the alias usage browser [478] in multiple ways, including by drilling down and up into the data. More precisely, every time the alias usage browser [478] shows a particular matrix, the user can drill down into the data in three ways: by right-clicking on a column header [3610] (which binds the value of the column feature to that element and proposes a selection list for a new column feature among unbound features), on a row header [3615] (which binds the value of the row feature to that element and proposes a selection list for a new row feature among unbound features), or on a cell [3620] (which binds both the column and the row values).
-
FIG. 37 illustrates the animation operated by the alias usage browser [478] whenever the user chooses to drill down into a particular cell [3705] of the current matrix by double-clicking on that cell. In this particular example, the cell which has been double-clicked corresponds to a pragmatic tag [166] “Expression of concern” and an electronic identity [235] “Katherine Bronson”. This means that the underlying communication [123] model is restricted to data items [122] matching these two criteria in addition to all other previously-bound features (which in this case were “Katherine Midas” as author and “Widget pricing” as topic). To visually represent this drilling-down navigation, the rectangle bounding the double-clicked cell is expanded until it occupies the exact space of the original matrix. A new, zoomed-in matrix [3710] is then drawn in place of the original one, with new header values (corresponding to the values of the new row feature “Recipient” and to the values of the new column feature “Formality level” that have either been automatically selected by the system or specified by the user, as described above). - The content of each cell [3715] now represents aggregate information (such as communication [123] volume) for data items [122] having matched all bound features. Also, if the time animation was playing when the user double-clicked the cell, it is not interrupted and continues playing after the zoom-in animation completes. If it was paused, then it remains paused at the same point in time. Double-clicking on a row header [3615] works similarly, except that the animation is vertical only: the row corresponding to the double-clicked header is expanded until it occupies the whole height of the original matrix. The same is true for double-clicking on a column header [3610].
- Similarly, if one or more features have already been bound, the user can drill up by right-clicking on a column header [3610], on a row header [3615], or on a cell [3620].
- The system uses a continuous actor graph visualization to represent anomalies [270] of various types. In each case, the standard actor graph is decorated with one or more additional visual features.
- The standard continuous actor graph visualization displays actors [220] as nodes, and communications [123] and other events [100] as edges, as described in U.S. Pat. No. 7,143,091 in more detail. The display is updated at discrete intervals of time, the interval duration being continuously adjusted by the system: at the end of each interval, newly added edges are displayed, while previously existing edges are aged so that some of them disappear. In one embodiment, this adjustment is based on the following dynamic constraints:
-
- The maximum number of new edges added at the end of each interval
- The maximum total number of edges displayed at any given time
- The maximum value of the time interval.
- Based on these constraints, the system adjusts visualization update parameters (including the value of the interval duration and the decay rate of older edges) and can also decide to ignore some new edges during each interval to meet these constraints.
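One possible reading of this adjustment loop is sketched below; the parameter names, the halving/growth factors, and the oldest-first expiry policy are illustrative assumptions, since the text leaves the exact policy open:

```python
def plan_update(pending_edges, displayed_edges, interval,
                max_new=50, max_total=500, max_interval=60.0):
    # Plan one display update under the three dynamic constraints:
    # cap on new edges per interval, cap on total displayed edges,
    # and cap on the interval duration itself.
    to_add = pending_edges[:max_new]            # constraint 1
    dropped = len(pending_edges) - len(to_add)  # edges ignored this interval

    # Constraint 2: age out the oldest edges to respect the total cap.
    overflow = len(displayed_edges) + len(to_add) - max_total
    to_expire = displayed_edges[:max(overflow, 0)]

    # Constraint 3: shrink the interval when edges had to be dropped,
    # grow it (up to the maximum) when traffic is light.
    if dropped > 0:
        next_interval = max(interval / 2, 1.0)
    else:
        next_interval = min(interval * 1.5, max_interval)
    return to_add, to_expire, next_interval
```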
- Animated actor graph visualizations [471] are used to represent a variety of structural anomalies in the graph, both local and global, including but not limited to the following:
- Changes of Position within the Network
- As illustrated in
FIG. 39 , this visualization consists of an animated graph representing changes in an individual actor's [220] position relative to other actors [220] in the network, and can be used for example to demonstrate the shift of that actor's attention towards outbound communications [123] (e.g. with actors outside the corporation). - For example, in the case of a disengaged employee: people who feel they are hitting the ceiling will often start to look for validation elsewhere, such as via industry standards bodies and conferences; their attention will shift more outwards over time.
- The decorations added in this graph visualization are:
-
- Blue edges represent communications [123] from and to the subject
- Red edges represent communications [123] exchanged within an outside industry or among business contacts. Thus an abnormally high volume of communications [123] directed towards outbound contacts will manifest itself by an increase in the number of blue edges between the considered actor [220] and actors [220] connected by red edges.
- Green edges denote purely social or family actors [220], meaning that no work-related content seems to be exchanged among them. Non-work-related content is useful to provide context, for example to assess whether the actor [220] is simply focusing more on anything not directly work-related, or rather diverting more of their professional attention outside of the company.
- As illustrated in
FIG. 40 , this visualization consists of an animated graph representing the evolution of delegation patterns for an individual actor [220] over time. - A typical usage example is that delegating more work without any obvious or apparent reason suggests disengagement of an employee. For example, abandoning a delegation pattern would be worrisome for a managerial position.
- To build this graph visualization, centrality is defined with respect to a delegation subgraph, and re-computed at the end of each interval of time between two updates of the layout. In one embodiment, the delegation subgraph is defined as business-related communications [123] with other actors [220] or groups [225] inside the company.
- The graph visualization is then decorated as follows:
-
- The node for the considered actor [220] is colored in green
- All other actors [220] are colored in blue
- The most central names have a red color highlight around them
- When the considered actor [220] either enters or leaves the group of most central actors [220], the node flashes in red to attract the user's attention to this change.
- As illustrated in
FIG. 41 , this visualization consists of an animated graph representing the evolution of the most distinct cliques [255] in the actor network. This is appropriate to show the emergence or extinction of cliques [255], as well as existing cliques [255] getting tighter or looser over time. - Cliques [255] are updated at the end of each interval of time between two updates of the layout based on new edges and on decayed edges.
- The decorations added by this graph visualization are:
-
- Cliques [255] are visualized by coloring the nodes they contain (a clique's [255] boundary being defined for example as the convex hull of these nodes)
- Edges corresponding to intra-clique communications [123] are colored in blue
- Edges corresponding to inter-clique communications [123] and communications [123] with actors [220] who are not members of any clique are colored in red
- When a threshold is passed on the emergence or extinction of one of the cliques [255] (i.e. in the relative variation of its size), that clique is highlighted.
- The gap viewer component [415] is a timeline-based visualization [204] that shows part of the output of the periodic patterns detection component [405].
- The gap viewer primarily achieves two goals: to show non-obvious, and in some cases previously unknown, recurring events [101] and activities; and to show where the regularity is broken for any regularly occurring event [101], for example when events [100] are missing, when unexpected events [100] are present, or when events [100] occur at unusual times.
- As illustrated in
FIG. 42 , the gap viewer [415] in its basic form uses the output of periodic patterns detection to show where there are gaps in the sequence of occurrence dates of a particular periodic sequence [132]. - Each flag [4210] represents a unique occurrence of the periodic sequence [132], for example an instance of a weekly meeting. The flags [4210] allow a user to view the content associated to that particular instance, including the extracted text, as well as other data and metadata.
- Circles filled with a solid color, or bubbles [4215], are used to represent gaps between two consecutive occurrences of a periodic sequence [132]. The larger the gap, the bigger the bubble [4215] drawn for the next occurrence of the periodic sequence [132]. In the default embodiment of this invention, the diameter of each bubble [4215] is proportional to the inter-event interval. Using bubbles [4215] makes a gap more prominent than if the occurrences were merely points arranged along a line. This is particularly interesting in the case of communication patterns, for which the presence of important gaps might prompt a further look into associated events [100].
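The proportionality rule for bubble diameters can be stated in a few lines (names are ours; `scale` is a hypothetical rendering constant):

```python
def gap_bubbles(occurrence_times, scale=1.0):
    # For each occurrence after the first, pair its timestamp with the
    # diameter of the bubble drawn at it: proportional to the time
    # elapsed since the previous occurrence of the periodic sequence.
    return [(t, scale * (t - prev))
            for prev, t in zip(occurrence_times, occurrence_times[1:])]
```

For a weekly meeting skipped once, e.g. occurrences at days 0, 7, 14 and 35, the bubble at day 35 is three times the diameter of the regular ones, making the gap stand out.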
- For each periodic sequence [132], a representative title [4220] is displayed, which is derived from the periodic patterns detection phase.
- The gap viewer shows the events [100] for one or more periodic sequences [132] along a timeline [4225]. The visualization is updated on a continuous basis, so that each time an event [100] occurs as part of a periodic sequence [132], a new bubble [4215] is drawn to indicate the time elapsed since the previous event [100] in that sequence [132]. Simultaneously, the index of periodic sequences [132] is updated so that any subsequent query can return other periodic sequences [132] overlapping this periodic sequence [132] with respect to segments, gaps, or disturbances.
- Additionally, the gap viewer can use the periodic patterns query component to display any periodic sequences [132] that are correlated with a particularly interesting or relevant periodic sequence [132]. To this end, the gap viewer supports the simultaneous visualization of multiple correlated sequences [132], as illustrated in
FIG. 43 . - In that case, each periodic sequence [132] is attributed its own color code, so that bubbles [4315] belonging to different periodic sequences [132] are filled using different colors and flags [4310] representing instances of different periodic sequences [132] are drawn in different colors as well.
- This is particularly useful when the system has detected correlations across two or more periodically occurring events [100], including but not limited to:
-
- A significant overlap of the segments in the respective sequences [132]. This is the most general category of correlations and might suggest many interesting patterns, such as the coordination of a priori independent activities.
- A significant overlap of the gaps in the respective sequences [132]. The fact that regularity is broken during similar time periods suggests that the underlying events [100] might have a correlation which is not obvious to the user, for example that they are associated with one or more common actors [220]. This in turn would suggest that these actors [220] were doing something unusual, maybe suspicious, around that time; in particular, it might suggest in some cases that these actors [220] attempted to erase electronic traces of their activities during the period of overlapping gaps. In one embodiment, such an overlap is considered significant when the ratio of the overlapping gaps to the total aggregated inter-event duration exceeds a threshold.
- A significant overlap of the disturbances in the respective sequences [132]. In this case, disturbances are defined as events [100] occurring more frequently than expected, or events [100] of another nature replacing expected events [100]. This makes it possible to detect patterns such as the same set of actors [220] attending a regular meeting B at unexpected times that coincide with dates when an occurrence of another regular meeting A is expected to occur.
- A significant alignment of the occurrence times in the respective sequences [132], combined with particular patterns such as the sequences [132] in question involving the same actors [220] in different places. This allows the detection of patterns such as in-person regular meetings involving the same employees in different subsidiaries of a corporation.
- Other types of correlations based on features of the individual events [100]. For example
FIG. 43 shows two periodic sequences [132] that involve the same actors [220] attending weekly meetings, where one periodic sequence [132] stops occurring when the other one is initiated. In some cases this is a totally innocuous pattern, while in other cases it might reflect an attempt to divert attention from a meeting on a sensitive topic [144], by renaming that meeting or otherwise moving it away from the spotlight.
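The gap-overlap test from the list above can be sketched as follows, with gaps represented as (start, end) intervals; the quadratic overlap computation and the default threshold are illustrative assumptions:

```python
def interval_overlap(gaps_a, gaps_b):
    # Total duration during which a gap of one sequence coincides
    # with a gap of the other.
    total = 0.0
    for s1, e1 in gaps_a:
        for s2, e2 in gaps_b:
            total += max(0.0, min(e1, e2) - max(s1, s2))
    return total

def gap_overlap_significant(gaps_a, gaps_b, total_inter_event, threshold=0.2):
    # Per the embodiment described: the overlap is significant when the
    # overlapping gap duration, taken as a ratio of the total aggregated
    # inter-event duration, exceeds a threshold.
    return interval_overlap(gaps_a, gaps_b) / total_inter_event > threshold
```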
- As depicted in
FIG. 48 , most social or organizational groups [225] have one or more persons at their center, around whom all other members of the group [225] are effectively in orbit. The types of person(s) at the center include the social butterfly, the strong natural leader, the guy who has all the answers to the homework problems, the actual boss. The person(s) at the center are the instigators, whether of ideas or parties, the persons of power of one kind or another who are pursued by others. - Computationally, these central persons, and the set of persons orbiting them, can be determined by analyzing social network graphs. In one embodiment, a communication graph is used, as described in U.S. Pat. No. 7,519,589. In another embodiment, edge weights are determined using a composite proximity score, such as that described in U.S. Pat. No. 7,519,589. From the graph, groupings are computed, such as (depending upon the embodiment) cliques [255], partial cliques, or modules, using standard methods. Within each grouping, we compute centrality scores for each actor [220] using some measure of centrality (in some embodiments this is eigenvector centrality, betweenness centrality, or degree centrality), and sort the actors [220] by their scores. From this list, we determine whether this grouping has a central actor [220] or actors [220]. In one embodiment, this is computed by looking at the ratio of each centrality score to the next in the list. If we encounter a ratio above a pre-selected threshold, we declare the actor(s) [220] above this point in the list to be the center of this group [225], which is now referred to as a solar system.
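The ratio-based center detection just described might look like this (the function name, dictionary input, and default threshold are assumptions; any of the centrality measures mentioned can supply the scores):

```python
def find_center(centrality, ratio_threshold=2.0):
    # Sort actors by descending centrality score, then scan the ratio of
    # each score to the next. The actors above the first ratio exceeding
    # the threshold form the center of the grouping ("solar system").
    # An empty list means no clear center was found.
    ranked = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
    for i in range(len(ranked) - 1):
        nxt = ranked[i + 1][1]
        if nxt > 0 and ranked[i][1] / nxt >= ratio_threshold:
            return [actor for actor, _ in ranked[:i + 1]]
    return []
```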
- As depicted in
FIG. 48 , the Social You-niverse visualization [472] represents such groups [225] as distinct solar systems [4805], in different galaxies [5110]. These solar systems can exist separately or as groups of related solar systems displayed as galaxies [5110]. Galaxies [5110] often represent real groups such as (including but not limited to) departments, office branches, or even different locations, which are subsets of the universe [5105]. - The Social You-niverse visualization [472] can be “played” so as to see the changes that occur in the universe [5105] over the duration of time for which data is available. These solar systems [4805] can be of varying sizes [4815], depending on how many planets [4820] they should logically include. While planets [4820] in many solar systems [4805] orbit around a single star [4905], or powerful actor [220], in some embodiments, the planets [4820] may orbit around more complex structures as depicted in
FIG. 52 . These include, but are not limited to, binary or even multiple stars as depicted in FIG. 53 , black holes (which represent the post-explosion state of a star [4905] or actor [220]) as depicted in FIG. 54 , and nebulas (which represent a nascent star). - A given solar system [4805] may arise out of an interstellar cloud of dust [5505], or disappear in a supernova explosion as depicted in
FIG. 56 . These creation and destruction events correspond to real-world events such as a company being acquired by a larger company—so a new division comes into existence within the acquirer—or a division being shut down and hence disappearing from the universe. - In some embodiments, one solar system [4805] can exert gravitational pull on the outer planets [4820] of neighboring solar systems [4805], just as (for example) a very popular manager's department tends to grow while those of adjacent managers who are less popular tend to wane. In some of these embodiments, these gravitational forces are indicated with various kinds of arrows [5705]; in others they are indicated by the outer planets [4820] wobbling as their orbit approaches the solar system [4805] in question; this embodiment is depicted in
FIG. 58 . - In still others the orbits (in either/both motion and rendered trajectory line) are stretched in the regions contiguous to the solar system [4805] that is exerting the pull [5905].
- In most embodiments, solar systems [4805] may shrink or expand due to either/both the movement of planets [4820] from one solar system [4805] to another and/or due to the appearance or disappearance of planets [4820] from the galaxy or universe altogether as depicted in
FIG. 60 . In some of these embodiments, such changes in the size or structure of the solar system [4805] are indicated with various kinds of arrows [6005]; in others, dust and other cosmic artifacts are rendered so as to visually suggest the expansion or contraction which has occurred [6010]. - In some embodiments, the solar systems [4805] (or galaxies [5110]) which are exhibiting the greatest degree of change shift automatically towards the visual center of the screen, so as to make themselves more visible to the user as depicted in
FIG. 61 . In these embodiments, the more slowly changing solar systems [4805] are pushed out towards the periphery of the view in order to make room for the rapidly evolving ones [6105]. In some of these embodiments, the user can specify which types of change warrant such promotion and which don't [6205]. For example, a planet [4820] leaving the universe entirely may not be seen as an interesting change, whereas a planet [4820] departing for a neighboring solar system [4805] is. - Some embodiments allow planets [4820] to be in multiple simultaneous orbits, as doing so allows the capture of some important real-world situations, such as people whose job includes two very distinct roles, each of which has them interacting with very different people. Other embodiments force a choice. In most of these embodiments, a planet [4820] will visibly be pulled into the new solar system [4805], often with a trail of dust or other artifact to call attention to itself [6305]. Still other embodiments allow the user [455] to choose the behavior in this regard. Some embodiments offer a simplified view from the point of view of the individual person. In these embodiments, the person is the sun, around which the actors [220] closest to them are represented as planets [4820] in orbit. This is an inherently egocentric view, since in reality, most people are in someone else's orbit. Many different measures, or blends of measures, can be used in order to determine the definition of closest, including measures for social and professional proximity as discussed in U.S. Pat. No. 7,519,589.
- In most of these embodiments, the orbits of different planets [4820] change as the distance among the real-world actors [220] varies; a planet [4820] may drift into a more distant orbit, or tighten its orbit. In some embodiments, the user may configure different characteristics of the planet [4820] so as to represent different characteristics of the related actors [220]. For example, the color of the planet [4820] could be used to specify actor [220] gender, and its size her organizational position. In some embodiments, clouds of dust are used to cloak planets [4820] which represent individuals about whom little is known; this embodiment is depicted in
FIG. 64 . In some embodiments such visual effects are also used to indicate temporary loss of visibility such as would occur if access to a particular stream were cut off for some reason. In these embodiments, the dust is rendered over as much space as necessary, for as long as necessary, in order to accurately portray the extent and duration of the data loss [6505]. - In some embodiments, orbiting planets [4820] can themselves have moons [6610] orbiting them which represent the “followers” or entourage of that actor. Likewise, if appropriate to the underlying data, the moons [6610] can have some smaller objects orbiting them. In some embodiments, orbiting planets [4820] with their own moons [6610] are rendered with Saturn-like rings as a visual cue to the existence of the orbiting objects [6605].
- In some embodiments, there is the notion of an “upwardly mobile” actor and the reverse. The former are represented as planets [4820] with relatively more rapid orbits, so as to be more eye-catching [6705]. Solar systems [4805] with faster moving planets [4820] will in this way seem less sleepy and stagnant than those with less rapidly moving planets [4820].
- In most embodiments, planets [4820] may be destroyed to correspond to the real-world event in which the relevant actor [220] has permanently left the scene [6805]. In some of these embodiments, the planet [4820] explodes in a fireball [6810], leaving behind a trail of cosmic dust as a reminder of its prior existence [6810]. In some embodiments, if the actor [220] has merely left a particular group [225] but continues to appear elsewhere in the observable universe, the planet [4820] will gradually drift out of the orbit of the current solar system [4805] and disappear (but not explode) [6905]. In some embodiments, a planet [4820] will flame out, leaving behind a trail of cosmic dust, if there is evidence of a severed relationship such as an argument, or increasing levels of hostility. In some embodiments, the visual intensity of the explosion will vary according to the extent of the hostility and observed negative emotion. In some embodiments, the planet [4820] will simply disappear from the orbit if the actor [220] disappears, regardless of the reason. In some embodiments, in the event two actors [220] are experiencing conflict, the planets [4820] representing them will smash together [7005]. In some of these embodiments, such smashing together only occurs as part of the destruction of one or both planets [4820] (at least with respect to the particular solar system [4805]).
- In some embodiments, planets [4820], moons [6610], and stars [4905] can represent groups [225] of actors rather than individual actors [220]. In other embodiments, these objects can also represent concepts or topics [7105]. For example, this would allow a group of planets [4820] to be orbiting (that is, bound together by) a topic [144] of immense common interest.
- In some embodiments, actors [220] represented by planets [4820] can be part of multiple solar systems [4805]. In such a case, an indicator or icon will be used to tell the user [455] that an actor [220] is represented in the universe multiple times.
- In some embodiments, a visual indication such as a gauge near each planet [4820] or an astronomical chart marking is used to indicate that an actor [220] continuously holds a certain distance from another actor [220] represented by a star [4905].
- In some embodiments, there are optional sound effects [7205]. In most of these embodiments, only the solar system [4805] or galaxy [5110] currently in focus will have their sound effects played. In these embodiments, such sound effects include, but are not limited to: explosions when planets [4820] flame out, a grinding noise if two planets [4820] are banging against one another, a “swoosh” sound if a planet [4820] is pulled from one solar system [4805] to another, and a “ping” sound if a new star comes into existence.
- In some embodiments, the anger level or negative tone in the communication [123] of an actor [220] is represented with wobbling stars [4905] and planets [4820], solar flares, and volcanic eruptions on planets [4820].
- Some embodiments of the invention will use this same visualization to depict relationships among actors other than social proximity. These other relationships include, but are not limited to: shared interests, shared opinions, common purchases. One application, for example, would allow commerce sites to take a user's identity, automatically look up their social media “friends”, compile data as to what they are broadcasting on different publicly available social media about their activities, and draw the visualization according to commonalities in recent activities and topics.
- Just as the Stressful Topics visualization [473] revolves around the idea of analyzing like-minded groups of people and observing how they exert influence over one another, some embodiments of the Temperature gauges visualization [474] are focused on examining divergences in opinion within a group of people.
- In some embodiments of the temperature gauges visualization [474] as pictured in
FIG. 73 , the user [455] provides the set of actors [220]—including potentially the set of all available actors [220]—to examine, and the system determines the topics [144] with the greatest amount of observable divergence of opinion. As elsewhere, topics [144] may be determined by any topic detection [152] mechanism, including but not limited to various types of topic clustering, ontologies [148] and taxonomies, or discussions [136]. Since accurately determining positive sentiment expression on most types of data is very difficult, many embodiments only recognize the notion of “neutral” and various types of negative sentiments, as pictured in the legend in FIG. 74 . However, some embodiments could also allow the expression of positive sentiments by having the midpoint in the gauge [7505] icon be neutral; anything to the left of that or below it (depending on orientation and exact style of the gauge) would be positive sentiment. - In other embodiments, the user [455] specifies which topics [144] are considered most important, so that these topics [144] will pre-empt other less important topics [144] which may show greater variation in sentiment, thus setting much lower thresholds—including no threshold at all—for such interesting topics [144]. In some embodiments, the user [455] may specify how many actors [220] and/or topics [144] to include in the same matrix. In some embodiments, the user [455] may specify that apparent chain reactions—meaning that actor [220] A expresses negative sentiment on a particular topic [144], then actors [220] B and C subsequently do—should be given different treatment than the case in which actor [220] A alone expresses such negative sentiments within the time frame for which data is available.
- In some embodiments, the user [455] may set various kinds of thresholds to establish what kinds and degrees of divergences of sentiment are useful or interesting ones. For example, if half of an actor [220] population is positive on a particular topic [144] such as a presidential candidate, but the other half is negative, the differences may not be particularly interesting, as there is a broad support basis for both positions. By contrast, in the case of a malicious insider, the sentiments expressed may run almost uniquely counter to those of the group [225] at large because they may be happy when something bad happens to the organization.
- As a result, in some embodiments, each actor [220] has only one gauge associated with them. This one gauge captures their overall negative sentiment level, so that as the timeline widget is used to play the temperature gauges visualization [474], the needle on the gauge adjusts accordingly. This is for the purpose of noting divergences in overall mood that are not apparently linked to a particular topic [144]. For example, on the day of the 9/11 terrorist attacks, the vast majority of people expressed various kinds of negative sentiments.
- In some embodiments, the gauges may be associated with groups [225] of actors such as departments rather than individual actors [220]. In some embodiments, there are optional sound cues which are played when a gauge that is in focus or visible (depending on the embodiment) exceeds a certain user-specified threshold, so as to draw further attention to it.
- Some embodiments of the stressful topics visualization [473] are used as described here, but with emoticons of different kinds instead of temperature gauge icons as depicted in
FIGS. 76 and 77 . In some of these embodiments, the emoticons are further refined so as to indicate varying levels of the negative emotion being expressed. For example, the emoticon used to represent anger could have varying hues or saturations of red to indicate intensity. Likewise, some embodiments of the temperature gauges visualization [474] can be used in the same manner as in the stressful topics visualization [473]. - As introduced in U.S. Pat. No. 7,519,589, a matrix representation in which actors [220] and topics [144] are respectively represented in rows and columns (or vice versa) as shown in
FIG. 78 and in which individual intersections between actors [220] and topics [144] in the matrix are adorned with emoticons [7805] to express various emotive tones employed by these actors [220] with respect to these topics [144], provides an effective and compact means of capturing friction-generating issues. - The stressful topics visualization [473] included in the present invention extends this concept in the following important ways:
-
- Each actor [220] who expresses a negative sentiment is said to have a group of “hot” topics [144]. In one embodiment, an actor's [220] “hot” topics [144] are those for which the actor [220] expresses any negative sentiment. In another embodiment, they are topics [144] for which the actor [220] expresses negative sentiment in a number of items [122] greater than some pre-chosen threshold. In another embodiment, a percentage of the actor's [220] items [122] greater than some pre-chosen threshold is used. In addition, an actor [220] who expresses negative sentiment is said to have an affinity set of other actors [220] who express similar sentiments on similar topics [144]. In one embodiment these groups [225] are computed by associating with each actor [220] a vector with one entry per topic [144] specifying the level of negative sentiment this actor [220] expresses with regard to this topic [144]. These vectors are clustered together using standard clustering techniques or using the continuous clustering component [412] that is part of this invention. Each cluster represents an affinity set of actors [220].
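The vector-clustering step above can be sketched as follows. The greedy cosine-similarity grouping here merely stands in for whatever standard clustering technique (or the continuous clustering component [412]) an embodiment actually uses; the threshold value and function names are illustrative assumptions.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length sentiment vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_sets(vectors, threshold=0.8):
    """Greedy single-pass grouping of per-actor negative-sentiment vectors
    (one entry per topic). An actor joins the first cluster whose seed
    vector is sufficiently similar; otherwise it seeds a new cluster.
    Each resulting cluster is an affinity set of actors."""
    clusters = []  # list of (seed_vector, [member actors])
    for actor, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(actor)
                break
        else:
            clusters.append((vec, [actor]))
    return [members for _, members in clusters]
```

Two actors venting about the same topics in similar proportions end up in the same affinity set, while an actor negative about an unrelated topic forms its own.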
- Within each affinity set, one or more actors [220] may be identified who start the cascade of negative commentary that then “bleeds” onto other actors [220]. In one embodiment, this is determined by first clustering the utterances of negative sentiment within an affinity set based upon the date they were made. Any actors [220] who originate at least one of the first k negative utterances in each cluster more than p percent of the time, where k and p are configured by the user, are considered to be starters of negative commentary cascades.
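A minimal sketch of that starter-detection method, using gap-based splitting of the timeline as a stand-in for the date clustering described above; the parameter defaults and names are illustrative, not prescribed by the disclosure.

```python
def cascade_starters(utterances, gap_days=3, k=2, p=50.0):
    """Identify actors who tend to start bursts of negative commentary.
    `utterances` is a list of (timestamp_in_days, actor) pairs within one
    affinity set. Utterances are grouped into bursts by splitting on time
    gaps longer than `gap_days`; an actor is a "starter" if they appear
    among the first `k` utterances of more than `p` percent of bursts."""
    utterances = sorted(utterances)  # order by time
    bursts, current = [], []
    for t, actor in utterances:
        if current and t - current[-1][0] > gap_days:
            bursts.append(current)
            current = []
        current.append((t, actor))
    if current:
        bursts.append(current)
    counts = {}
    for burst in bursts:
        # count an actor at most once per burst, over the first k utterances
        for actor in {a for _, a in burst[:k]}:
            counts[actor] = counts.get(actor, 0) + 1
    n = len(bursts)
    return {a for a, c in counts.items() if n and 100.0 * c / n > p}
```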
- Likewise, certain topics [144] may tend to provoke negative sentiments from the same group [225] of actors [220] over time, and tend to trend negatively together as a group. In one embodiment, these “hot actors” for this topic [144] are those who express negative sentiment in a number of items [122] within this topic [144], where the number of such items is greater than some pre-chosen threshold. In another embodiment, they are those actors [220] who express negative sentiment in some percentage of the items [122] within this topic [144], where that percentage is greater than some pre-chosen threshold. Further, just as with actors [220], negative sentiments expressed about one topic [144] may often start a cascade of negative sentiments expressed about other topics [144] in the same group. In one embodiment these groups of topics [144] are detected by the method described above for actors [220], with the roles of actors [220] and topics [144] reversed.
- Such “negative affinity” relationships are often important to understand, since things like mood are rarely constrained to the specific stimulus that caused the ill humor. They also provide a helpful construct to scale this style of visualization [204] to data sets which may contain arbitrary numbers of actors [220] expressing negative sentiment.
- In the original invention, it was specified that users could determine which specific topics [144] and actors [220] they wanted to have represented in the visualization; they could also just specify the top N relative to any group [225]. Users could also create aggregate actors [220] (e.g. the marketing department) and aggregate topics [144] (e.g. “contracts”). Certain classes of actors [220] and topics [144] could be determined by the user to be of the same broad category, which would influence the visual treatment of the appropriate rows or columns so as to provide a visual cue to the user of the similarity. This makes it easier to see at a glance or from a thumbnail that, for example, many of the stressful topics [144] pertain to a particular class of customer.
- As illustrated in
FIG. 79 , the stressful topics visualization [473] included in the present invention now adds the capability whereby, in some embodiments, the software automatically determines and visualizes the relevant matrices in response to the user [455] providing specific actors [220], specific topics [144], or both. Specifically, the selection of a particular topic [144] or actor [220] may now imply the inclusion of another topic [144] or actor [220] in the same group, and hence pull in additional topics [144]/actors [220] as well. This will result in matrices of varying sizes rather than a fixed size. Specifying actors [220] or topics [144] which do not overlap with one another will result in the creation of a new matrix for each; if the actors [220] and/or topics [144] logically intersect, they will be added to the same matrix. - In the original invention, the user could “play” the visualization, so that the emoticons [7805] came and went over time according to the emotions being expressed by different actors [220] at the time. As illustrated in
FIG. 80 , since both actors [220] and topics [144] can change over the course of time, the stressful topics visualization [473] now offers the user the option of having the software automatically determine a matrix in which the topics [144] and actors [220] are fixed for as long as makes sense; if, for example, an actor [220] drops out of circulation, the existing matrix will be automatically replaced with a new matrix without that actor [220] and which reflects any other appropriate changes. In addition, in one embodiment, individual rows and columns in a matrix may change as shown in FIG. 81 . In some such embodiments, the software automatically renders (for example) the row representing an actor [220] who is known to disappear at a later point to the right of the matrix so that the change is less distracting to users who read left to right. In some embodiments, the software is configurable to account for languages with word orderings other than left to right [8205]. In some embodiments, the system will automatically rearrange rows or columns as appropriate in order to visually emphasize patterns, such as two actors [220] reacting very similarly to the same topics [144]. In some embodiments, as illustrated in FIG. 83 , rows and columns which have been swapped over time are designated in a different color than those which have remained in place for the full time span of the matrix. - As depicted in
FIG. 84 , the stressful topics visualization [473] now offers the ability in some embodiments to “play” a visualization which contains an arbitrary number of different matrices according to the same timeline. In some embodiments, as illustrated in FIG. 85 , the user may select matrices from different timeframes, different timeframes from the same matrix, or both, and play these matrices all together. In these embodiments, as illustrated in FIG. 86 , when the visualization is played, the display indicates the offset unit of time, for example +1 week since the start, since the timeframes are different by definition. - The stressful topics visualization [473] now offers the ability in some embodiments for “bleeding” between topics [144] and actors [220] to leave a visual artifact as the visualization is played. This is important to allow recurring patterns in the data to remain highlighted as the visualization is played; the fact that a pattern has occurred and is recurring does not mean that it will be present all of the time, and is indeed unlikely to be. In some embodiments, as illustrated in
FIG. 87 , this is implemented as a heat map [8705]. As illustrated in FIG. 88 , the user can determine whether they want to see a visual emphasis on the actor [220] or topic [144] that generated the most negative sentiment or the one that appeared to initiate the cascade. Both have important, but different, applications. The heat map [8705] may be implemented in any number of ways, including color saturation level, use of different colors in the color spectrum, and so on. - In some embodiments, the heat map [8705] will decay over time if no further relevant activity is detected for extended periods of time. Otherwise, for data sets that span very long periods of time, the heat map [8705] would often saturate. Some embodiments will augment the heat map [8705] with arrows indicating the direction of “bleeding” in any orientation.
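One plausible realization of the decay behavior just described is exponential decay with a configurable half-life, so that cells cool off smoothly during idle periods instead of saturating. The disclosure does not prescribe a particular decay function, so the formula and names below are assumptions.

```python
def decay_heat(heat, idle_days, half_life_days=30.0):
    """Decay a heat-map cell's intensity after `idle_days` with no relevant
    activity, preventing saturation on long-running data sets. Exponential
    decay with a half-life is one plausible choice: intensity halves every
    `half_life_days` of inactivity."""
    return heat * 0.5 ** (idle_days / half_life_days)
```

With the default half-life, a fully hot cell (1.0) fades to 0.5 after a month of silence, and any fresh activity simply resets the idle clock.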
- We now offer the ability in some embodiments for the user to click on the name of an actor [220] (or actor group [225]) or topic [144] (or topic group) or individual emoticon [7805], individual matrix label or bleeding/heat map [8705] in order to see specific evidence [202] which supports the inclusion of the object in the particular matrix, the appearance of the particular emoticon, the grouping of these particular objects together in the same matrix, or the appearance of a bleeding/heat map [8705], as appropriate. In most embodiments this is done via direct manipulation with the mouse. “Hot” items result in a cursor change to the mouse. In most embodiments, evidence [202] can take a number of different forms, including but not limited to the following:
-
- Statistical output such as covariate analysis or clustering
- Charts reflecting statistical output
- Examples of specific communications [123]
- Examples, textual or graphical, of sequences of communications [123] so as to clearly establish the order of events [100].
- In some embodiments, non-lexical evidence may be added to the lexical evidence as further evidence [202] of stress or other negative emotion. In some embodiments this includes, but is not limited to, use of font markup such as bolding or red color, excessive use of punctuation, and all caps.
- Some embodiments support an alternate mode which allows the user to see the difference between the density of content which contains negative sentiment and the density of all content for that time period. This facilitates the user being able to discern the difference between topics [144] which are disproportionately stressful or friction-generating and those which merely have a very large amount of content associated with them, a roughly normal percentage of which contains negative sentiment. Some embodiments offer side by side comparisons of the same matrix, while others will divide the window in half, replicating each matrix in both halves; still others will render pairs of matrices side by side. Some embodiments may also combine use of the emoticons to express negative sentiment with the backdrop of a heat map [8705] based on all content related to the topic [144].
- In related embodiments, the type of visualization strategy described here can be used to capture the reactions of groups [225] of actors—determined by any type of means, and even if the actors [220] in question do not interact with one another—to different products or topics [144] as mined from specific e-commerce or social media sites or the Internet as a whole.
- A “pecking order” [9005] is defined as a partial ordering which reflects some kind of hierarchical relationship amongst a group [225] of strongly interrelated people. Such interrelated groups [225] of actors often correlate to specific topics [144] of interest or tasks, such as work groups, or hobbyist communities. As the phrase “pecking order” suggests, those people at the top of the hierarchy are the ones deemed to be the most important. In most embodiments, “importance” is determined by how often—and in what order—a particular actor is added to an email address field, and is cited or otherwise referred to in any kind of document [162] or text. Many embodiments give additional weight to consistency of behavior in this regard among different actors [220]. For example, it is more telling if 10 actors [220] often cite Jane first than if one person almost always does and others only rarely do, even if the number of “first cites” is the same or greater. In most embodiments, it is possible to exclude orderings which are purely alphabetical as well as those which directly reflect the official organizational hierarchy.
- The default embodiment for the pecking order computation is as follows:
-
- A first phase consists of collecting information from the stream of events [100] to analyze, and extracting the actors [220] involved with each event [100] and their respective (or local) order. This consists of extracting the “To”, “Cc” or “Bcc” recipient lists for emails, author lists for documents [162], etc.
- Each local order is then referred to as a “context”. From each context, every possible pair of actors [220] is recorded along with the context from which it was extracted. For contexts which have a very large number of actors [220], the relationship between the first actor [220] in the list and the last one does not carry as much meaning as the relationship between closely related actors [220] inside the list. In this case, sub-contexts of the original context are instead analyzed by applying a sliding window [380] over the original list of actors [220].
- The computation consists of analyzing the different contexts collected for each pair of actors [220]. This includes but is not limited to extracting the most common contexts, detecting common patterns among similar contexts, disregarding pairs for which the opposite pair has a similar or higher frequency, and disregarding uncommon actors. Similar contexts are combined based on context frequencies and actor frequencies inside each context.
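The context-collection and pair-analysis phases above can be sketched as follows. The sliding-window size and the frequency margin used to discard pairs whose opposite ordering is similarly frequent are illustrative parameters; an actual embodiment would combine this with the other filters described (common-context extraction, uncommon-actor removal).

```python
from collections import Counter
from itertools import combinations

def pecking_pairs(contexts, window=5, margin=2.0):
    """Collect ordered actor pairs from recipient/author lists ("contexts").
    Long contexts are broken into sliding-window sub-contexts so that only
    closely placed actors are compared; a pair (a, b) is kept only when it
    occurs more than `margin` times as often as its opposite (b, a)."""
    counts = Counter()
    for ctx in contexts:
        spans = [ctx] if len(ctx) <= window else [
            ctx[i:i + window] for i in range(len(ctx) - window + 1)]
        for span in spans:
            # combinations() preserves the order actors appear in the list
            for a, b in combinations(span, 2):
                counts[(a, b)] += 1
    order = set()
    for (a, b), c in counts.items():
        if c > margin * counts[(b, a)]:  # a missing opposite counts as 0
            order.add((a, b))
    return order
```

The surviving pairs form the edge set of the partial ordering that the pecking order visualization [476] then renders.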
- The purpose of the pecking order visualization [476] is to reflect evolution in any type of hierarchy among people. Such hierarchies include, but are not limited to, corporate hierarchies, non-corporate organizations (e.g. charity, volunteer, community, or church groups) and pure social hierarchies.
- In one embodiment of the pecking order visualization [476], chickens are used to represent individual humans as illustrated in
FIG. 90 . However, pigeons, other types of birds, squirrels, rats, snakes, or other types of animals as illustrated in FIG. 91 , objects and/or sounds could be used in other embodiments. Some embodiments will allow the user to choose the animal type generally, or with respect to a particular actor [220], type of actor [220], or specific hierarchy instance [9205]. - As illustrated in
FIG. 93 , each individual pecking order is represented by a building in one embodiment. Each building has a number of ledges that must be greater than or equal to the number of levels in the particular hierarchy it is housing. Each building will contain some number of animals, each of which represents a real-world actor [220] in the pecking order. Buildings may vary by color or style according to both embodiment and user preference. In some embodiments, older pecking orders will have older looking building facades; likewise newer ones will boast more modern facades. - Since most real-world data sets will contain multiple pecking orders, the visualization will generally contain several contiguous buildings or more.
- In some embodiments, a building displays the name of its associated pecking order. In some embodiments the name of a pecking order is the name of the most frequently discussed topic [144] among the actors [220] in question. In other embodiments, it is the name of the ad hoc workflow [128] that is most frequent among these actors [220]. In still other embodiments, it will be the name of the department of either the majority of actors [220] or the actor(s) [220] which appear on the highest level of the pecking order, depending on the exact embodiment.
- In some embodiments, as illustrated in
FIG. 94 , the user can specify the left-to-right order in which the buildings are rendered from choices which include, but are not limited to, alphabetical name of the pecking order instance [9405], the number of levels instantiated in the hierarchy, the amount of evidence [202] for the pecking order, the duration in time of the pecking order, the stability of the pecking order, and the number of animals in the pecking order.
- In one embodiment, a pecking order instance may change over time as the pecking order visualization [476] is “played” in the following ways:
-
- 1. A new instance is created—for example, a new business division comes into existence—causing the rendering of a new building with the appropriate features and animals. In some embodiments, the visualization depicts the building being built as shown in
FIG. 95 . - 2. Likewise an existing instance is destroyed because, using the same example, a division has been shut down. In some embodiments, the building collapses into dust, in others a wrecking crew may come along, and so on. However, as illustrated in
FIG. 96 in other instances, the building may remain, but accumulate broken windows, graffiti, and other signs of disuse [9605]. - 3. In some embodiments, ledges or “levels” may be designated with labels such as “vice president.” [9705].These names, and the levels with which they are associated may change as appropriate given the backing data. In some embodiments, these names may appear to be etched into the building, in others appear in overhangs; in still others it is according to the user's preference.
- 4. Chickens (or other animals) may appear or disappear from a particular pecking order instance, depending on the participation level of the person they are representing. In some embodiments, chickens are permitted to be concurrently present in more than one pecking order instance—for example, if an actor [220] plays two distinct roles in an organization. In other embodiments, as illustrated in
FIG. 98 , the chicken would fly (or otherwise move, depending on its animal type) [9805] between the different pecking order instances of which it is a member, proportionally spending time in each pecking order according to how much time the actor [220] spends in each role. In some embodiments, a chicken representing an actor [220] who is no longer on the scene will fall to the ground in a manner that clearly suggests it is dead [9905]. In some embodiments, the carcass may be removed in various ways according to user preference, including an undertaker, getting flattened by a car in a cartoon-like manner, being carried away by vultures, and so on [10005]. - 5. As illustrated in
FIG. 101 , chickens (or other animals) may ascend or descend from one level to the next according to the backing data [202]. The type of movement is consistent with the type of animal; for example a chicken would fly up to the next ledge. A chicken need not displace an existing chicken unless the backing data [202] warrants it. For example, in many pecking order instances there may be two equally important chickens at the top. - 6. In some embodiments, as illustrated in
FIG. 102 , one or more chickens can gang up on one or more other chickens if the actors [220] they represent are engaged in an argument or power struggle. In this event, the chickens will peck at one another [10205], drawing blood and plucking feathers—or whatever analogous actions would be appropriate based on the choice of animal. In some embodiments, text from items of evidence used to infer such power struggles is displayed in word balloons associated with the corresponding chickens. This is illustrated in FIG. 103 .
- “Buck passing” in this context refers to a task that is passed along or delegated without accompanying guidance or instruction. Whether such “buck passing” is considered appropriate depends on a variety of factors, including but not limited to: the reporting relationships connecting the individuals in question, the nature of the task being delegated, the state of the “buck passer” (for example, about to go out on medical leave), and whether or not there has been prior discussion of the task through another medium that is not available (for example, an in-person conversation) and that is unreferenced in the communication data that is available.
- Thus a software system is not able to determine whether or not any particular instance of buck passing is appropriate. However, it can determine whether or not any given instance of buck passing is typical with respect to the actors [220] involved, typical for those with their respective roles, or typical with respect to the organization at large. It can also determine, via various forms of topic detection [152] and ad hoc workflow process [128] identification, whether or not buck passing is usual for tasks relating to these particular topics [144] or in the particular type of workflow process [128]. Likewise, it can determine whether atypical buck passing is part of a deviation [265] from normal behavior for an individual by examining other divergences from expected behavior [260] during a particular interval of time—for example, lots of negative sentiment and stressful emotions being expressed.
- By “lack of accompanying guidance or instructions” in this context we mean the absence of any reference to specific details including, but not limited to: proper nouns, numbers (such as price, number of units, flight numbers), alphanumerics (such as model numbers), dates, or times. In some embodiments, a user-determined amount of lexical content that excludes salutations and other designated pleasantries (such as thanks) can be considered an indication of guidance.
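A heuristic version of this guidance test might look like the following. The salutation list, regular expressions, and word-count threshold are all assumptions for illustration, and only some of the listed cues (numbers, dates/times, alphanumeric codes, word count net of pleasantries) are sketched; proper-noun detection would require additional machinery.

```python
import re

# Illustrative pleasantries to exclude from the lexical-content count.
SALUTATIONS = {"hi", "hello", "dear", "thanks", "thank", "regards", "best", "cheers"}

# Numbers, dates/times (e.g. "3/15", "10:30"), and uppercase codes ("XJ9").
SPECIFICS = re.compile(r"\b\d[\d:/.-]*\b|\b[A-Z]{2,}\d*\b")

def has_guidance(text, min_words=8):
    """Heuristic for the 'lack of accompanying guidance' test: a message
    counts as guided if it references specific details, or carries more
    than `min_words` of content once pleasantries are removed."""
    if SPECIFICS.search(text):
        return True
    words = [w for w in re.findall(r"[A-Za-z']+", text.lower())
             if w not in SALUTATIONS]
    return len(words) > min_words
```

A bare "Please handle this. Thanks!" fails both tests and would be flagged as buck passing; a message citing quantities or dates passes immediately.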
- In continuous mode [375], the buck passing visualization [475] can be played with a timeline widget [10505]. As illustrated in
FIG. 105 , in some embodiments buck passing is viewed as a graph in which two objects are connected together by an arc if (and only if) there is buck passing occurring between the two objects, which may represent individual actors [220] or groups [225]. In some embodiments, this arc will get thicker or thinner depending on the amount of buck passing that occurs as the timeline is played [10510]. In some embodiments the color of the arc will change to reflect this. In some embodiments, an arc that becomes thin enough as a result of lack of buck passing will simply disappear from the view [10605]. In some embodiments there is a mode in which one can see buck passing relationships which have expanded or contracted over the course of the available data as illustrated in FIG. 107 . In some of these embodiments, this is accomplished visually through horizontally-aligned pairs of arrows which point at one another if the buck passing has diminished, and point in opposite directions if it has increased [10705]. - In most embodiments, as illustrated in
FIG. 107 , arrows reflect the direction of the buck passing; 2-way buck passing is represented by a double-sided arrow [10705]. In some embodiments, an animated arrow will be drawn—meaning that arrows move along the arc—to further emphasize a dramatic increase in buck passing [10805]. - Although the software cannot “know” what is appropriate buck-passing, many embodiments allow the input of an organizational chart with the effect that reporting relationships are indicated with a separate visual treatment in the view as shown in
FIG. 109 . This allows instances in which a manager is delegating tasks to one of her employees to be bucketed separately from instances in which one co-worker is buck passing to another without the benefit of a reporting relationship. As illustrated in FIG. 110 , separate visual treatment includes, but is not limited to: a different style of arc, a different color of arc, altering the transparency of the arc, and not displaying such “buck passing” from manager to employee altogether. - Likewise, in some embodiments, as illustrated in
FIG. 111 , users may specify types of topics [144] and ad hoc workflow processes [128] that should not be considered as instances of buck passing [11105]. For example, a user might decide that it is not buck passing if someone asks the secretary to go out and get sandwiches. In some embodiments, as illustrated in FIG. 112 , different classes of identifiable tasks can be specified to have differing visual treatments by the user so as to make them easily distinguishable from one another [11205]. In some embodiments, HR events are factored into the buck passing visualization [475] either to exclude certain buck passing instances from the view or to alter their visual treatment for a specified time period. These include, but are not limited to: the arrival of a new employee (who does not yet know the ropes), the departure of an employee, an employee leaving for a significant vacation, medical leave or sabbatical, and the transfer of someone to a new position within the company. The motivation for excluding such events [100] is that they are not likely to be meaningful outside of the specific context of their occurrence; they are thus not considered unexplained anomalies [270] in behavior but rather what is expected in the circumstance. - Different embodiments handle the issue of people coming and going or changing roles within the organization in different ways. Such transitions will often disrupt buck-passing routines, since the trusted resource is either less available or not available at all. Particularly hard to change buck-passing behaviors often will follow the actor [220] to their next role within the company. To that end, some embodiments provide a different visual treatment for nodes that represent actors [220] who have changed roles and the arcs that represent pre-existing buck-passing relationships [11305].
Some embodiments gray out or dim the node and connecting arcs to an actor [220] who is no longer in the same role and leave this treatment in place until the stream of buck passing is redirected to one or more others.
- Since apart from such extenuating circumstances, buck-passing behavior is often fairly static—meaning that as a graph it is fairly constant as long as the same actors [220] remain in place—changes in buck-passing behavior are an important input into the behavioral model [200].
- In some embodiments, any kind of detectable delegation is displayed rather than only buck passing. In some of these embodiments, arcs and arrows representing buck-passing are displayed in a different visual style than those which represent delegation with some amount of instruction or guidance. In the event that both apply between any pair of individual actors [220], some embodiments may render two distinct lines while others may combine different line styles on the same line, for example by alternating between the two styles in equal segments, or segments whose length is proportional to the respective degree of usage.
- In some embodiments, any visual change to an arc directed at a particular node also causes that change in visual style to propagate to the node for increased visual emphasis. This is especially helpful as the graphs become quite large.
- In this increasingly wired age, even brief flirtations often leave behind a thick electronic trail; longer relationships generally leave behind ample material even from the perspective of future biographers. Previously existing practical limitations in how frequently—and in what circumstances—one can communicate with the target of one's desire have been largely eradicated by the near ubiquity of mobile texting devices in some populations.
- This newly available excess of documentation provides material for introspection, enabling the detection of recurring patterns in one's relationships. Romantic relationships are also interesting from the perspective of investigation for a variety of reasons, including that they can account for significant changes in behavior.
- As pictured in
FIG. 114 , love life visualization [477] pictorially depicts a number of key aspects of a romantic and/or sexual relationship. (In most embodiments, the presence of such a relationship is inferred from specific romantic or sexual language, even if indirect (e.g. “last night was sensational.”). However, some embodiments may opt to use other sources of evidence, including but not limited to the burstiness of communication.) These aspects include, but are not limited to, the following: - Icons are used to indicate different types of language usage [11405] by frequency of occurrence, including but not limited to explicit sexual language, romantic language, domestic language, and breakup language. Neutral or unidentifiable language is not pictured, but is available as an indication in many embodiments. Other embodiments may use different categories including but not limited to flirtatious, affectionate, and platonic friendship. In some embodiments, the number of icons of a particular sort shown is calculated relative to other relationships, and/or relative to other types of language used in the same relationship. Some embodiments represent the absolute number of occurrences, but at the cost of having to effectively decrease the size of the icons in the event of large numbers of occurrences.
- Type of media used [11410] is indicated by different color wedges in some embodiments, including the one pictured. In some embodiments, annotations are used to designate the type of media. The choice of media is important, since some types of media are more aggressive or interactive than others.
- In some embodiments, both the amplitude and length of the relationship, as measured by electronic communications [123], are shown in a line graph; in other embodiments, a bar graph is used. Most embodiments allow for relationships that overlap in time to be visually displayed as overlapping. Some embodiments allow the view to be played with the timeline widget while others display the information statically due to its potential visual complexity.
- As described in the section on predicted behavior [262], some embodiments of this invention use the past to predict the future, either (or both) based on prior patterns of a particular actor [220], or a population that is demographically similar to him or her. In the love life visualization [477], for example, a given person may have a history of relationships which begin with a sexual encounter and end very shortly thereafter, but may at the same time have had several longer term relationships which did not start off as one night stands. In such embodiments, at least one new relationship must be underway in order to have an outcome to predict. Some of these embodiments use different visual effects so as to make clear that future visibility is much less clear than recording of historical fact. These visual effects include, but are not limited to, dimming or graying out, an overlay with partial opacity, and the use of partially obscuring transformations or filters.
- Many embodiments provide a filtering capability to deal with data sets in which the number of overlapping or concurrent romantic relationships is too large to be reasonably displayed at once. Different embodiments may take different approaches to this problem. These include, but are not limited to: the system automatically determining like groups of relationships based on their statistical similarities and cycling through them, allowing the user to specify which properties of the relationship should be used to assign priority for display, and showing one relationship of each discernible type, with a nearby icon allowing the user to see more relationships of the same type.
- Related embodiments may use a similar view to depict types of relationships that are not necessarily romantic in nature, but which peak and then often fall off. For example, some people have a pattern of acquiring “cool new” friends with whom they have a sort of platonic fascination before the novelty wears off and they lose interest; another example would be salespeople who put the hard press on customers until they sign up, but who may then quickly lose interest afterwards. In examples such as this last, “romantic” language might be replaced with flattering language, but the concept of being able to both observe similarities in relationships of the same general kind and to predict the near future based upon them remains the same.
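The relative icon-count calculation described above can be sketched as follows. This is an illustrative sketch only, not part of any claimed embodiment: the keyword lists and function names are hypothetical stand-ins for whatever language classifier a real system would use.

```python
from collections import Counter

# Hypothetical keyword lists standing in for a trained language classifier.
CATEGORY_KEYWORDS = {
    "romantic": {"love", "darling", "miss you"},
    "explicit": {"sensational"},
    "domestic": {"groceries", "laundry"},
    "breakup": {"over", "goodbye"},
}

def categorize(message):
    """Return the language categories whose keywords appear in a message."""
    text = message.lower()
    return {cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in text for w in words)}

def icon_counts(messages, max_icons=5):
    """Scale per-category occurrence counts relative to the dominant
    category, mirroring the relative (rather than absolute) display option."""
    counts = Counter()
    for m in messages:
        counts.update(categorize(m))
    peak = max(counts.values(), default=1)
    return {cat: round(max_icons * n / peak) for cat, n in counts.items()}
```

A system displaying absolute counts would skip the scaling step, at the cost of shrinking the icons when occurrence counts grow large.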
FIG. 44 represents alerts [305] produced by the system organized as a timeline view. - This view is based on anomalies [270] which have been detected with a sufficient level of confidence and whose relevance [280] (i.e., the likelihood of representing potentially harmful behavior, or at least behavior worth investigating, as opposed to unusual behavior explained by external factors), accumulated over a bounded time period, exceeds a threshold. The detection of those anomalies [270] is based on the results of various analysis stages, including but not limited to discussion building, ad hoc workflow analysis, and topic detection [152], augmented with continuous anomaly detection capabilities on heterogeneous data. The timeline view shows alerts [305], which are defined as aggregates of anomalies [270] meeting these criteria.
- The visualization is based on a timeline [4405], which is updated on a continuous basis [375] as new alerts [305] are raised; existing alerts [305] are confirmed or invalidated, etc.
- Each alert [305] is laid out horizontally according to its first detection date and vertically according to its relevance [4410].
- This view uses discretized occurrence dates [4425] (for example, the diagram here shows a discretization step of one week). In one embodiment, the discretization step is chosen by the system so as to keep the count of anomalies occurring at a given date below a predefined number. In another embodiment, the user is free to adjust the discretization step, which can be useful, for example, to get an overview of the anomalies [270] detected over a long period of time: even if many anomalies [270] are then associated with the same occurrence date, the user can quickly see whether the general trend is toward an increase or a decrease in the rate of anomalies [270] detected.
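The system-chosen discretization step in the first embodiment above might be implemented along these lines. This is an illustrative Python sketch; the candidate step values and the function name are assumptions, not part of the specification.

```python
from collections import Counter

def choose_step(dates, max_per_bin, candidates=(30, 7, 1)):
    """Try candidate steps (in days) from coarsest to finest and keep the
    first one for which no discretized date accumulates more than
    max_per_bin anomalies; fall back to the finest step otherwise."""
    for step in candidates:
        bins = Counter(d // step for d in dates)
        if max(bins.values(), default=0) <= max_per_bin:
            return step
    return candidates[-1]
```

Trying the coarsest step first keeps the overview compact; the bound on per-bin counts is what forces a finer step when anomalies cluster in time.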
- Each alert [305] is attributed to a unique entity called the alert subject [385], which is defined similarly to the corresponding anomaly subject, and can be of the following types:
-
- An individual actor [220], which means that the actor [220] should be further investigated to determine if the anomalous behavior corresponds to a potential threat coming from one or more malicious insiders or can be explained in any other way.
- A group [225] (either formally defined from a corporate data source, such as an organizational chart, or an informal group identified by the system), which similarly means that members of that group [225] should receive particular attention from system users or analysts.
- A workflow process [128] (either formally enforced or derived by the system).
- An external event [170]. In some cases, this external event partly or fully explains the anomalous patterns detected, such as a terrorist alert which causes a significant level of chaos within the organization. Conversely, the external event might be caused by the behavior flagged as anomalous, in which case evidence [202] provided by the system will help investigate the external event [170], such as when that external event [170] is an alert reported by an intrusion detection system or a corporate hotline complaint.
- Visually, each anomaly [270] is represented by a circle (which is also called the “orbit”, and whose diameter and width are determined as described below) drawn around a representation of the alert subject [385].
- When several alerts [305] fall into the same time bin, they are sorted vertically by relevance, with the most relevant alerts [305] placed higher up in the bin. The alert timeline view can be filtered on the main characteristics of the underlying data items [122], including but not limited to:
-
- Actor [220] roles and positions in the organizational chart;
- Behavioral patterns (e.g. communication patterns, emotive tones);
- Occurrence date (for example to only show anomalies [270] past a certain date, or to only show anomalies [270] occurring on weekends).
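The time-binning and relevance-based vertical sorting described above can be sketched as follows. This is illustrative Python under assumed data shapes (a triple per alert), not the patent's implementation.

```python
from collections import defaultdict

def layout_bins(alerts, step_days=7):
    """Group (first_detection_day, relevance, subject) triples into time
    bins, then sort each bin by descending relevance so the most relevant
    alert is placed highest in its column."""
    bins = defaultdict(list)
    for day, relevance, subject in alerts:
        bins[day // step_days].append((relevance, subject))
    return {b: sorted(items, reverse=True) for b, items in sorted(bins.items())}
```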
- The relative levels of importance of the alerts [305] are indicated by the diameter of the orbit drawn around each alert subject [385]: by default, the diameter of this circle is proportional to the alert relevance level.
- The relevance of an alert [305] is computed from the relevance [280] of the anomalies [270] it comprises. In the default embodiment of this invention, the relevance of an alert [305] attributed to a subject [385] S is the sum of the relevance levels of each anomaly [270] with respect to S. In another embodiment, the relevance level is simply the number of distinct anomalies [270] constituting the alert [305].
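The two relevance-aggregation embodiments above (sum of anomaly relevances, and distinct anomaly count) can be sketched together. Illustrative Python; the function name and the (subject, relevance) pair format are assumptions.

```python
from collections import defaultdict

def alert_relevance(anomalies):
    """Aggregate (subject, relevance) pairs two ways: the default
    sum-of-relevances embodiment, and the distinct-anomaly-count variant."""
    total, count = defaultdict(float), defaultdict(int)
    for subject, relevance in anomalies:
        total[subject] += relevance
        count[subject] += 1
    return dict(total), dict(count)
```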
- The orbit's drawing width indicates the severity of the alert [305]. At the center of this circle is an identifier of the alert subject [385].
- The representation scheme of each alert subject [385] is entirely configurable.
- In one embodiment, the application administrator can choose from the following modes to represent the alert subjects [385] which are an individual actor [220]:
-
- One mode where a photograph and the full name [245] of the actor is displayed;
- One fully anonymized mode where no picture is displayed and a unique identifier (for example “User #327”) designates the actor [220];
- One intermediate mode where only the actor group(s) [225] to which an individual actor [220] belongs are displayed;
- Another kind of intermediate mode where an actor's [220] role or title is displayed but not the exact electronic identity [235] (for example “Sales #92” to designate a sales role).
- In another embodiment, both individual actors [220] and groups [225] are represented as avatars matching the organizational entity (for example business unit or department) they belong to, and as in the previous case the caption shows their electronic identity [235], or an identifier, or nothing depending on the anonymization level configured in the anonymization scheme [340].
- The alert timeline visualization distinguishes predicted alerts [307] from past alerts [306] in a straightforward manner since those two types of alerts [305] are naturally separated by the time-based layout. In one embodiment, this is achieved by simply delineating the present time using a vertical separation line, without visually representing predicted alerts [307] differently from past alerts [306]. In another embodiment, the background of the visualization is grayed out on the right-hand side (i.e. for predicted alerts [307]) compared to the left-hand side (i.e. for past alerts [306]).
- Also, a contextual menu allows the user to discover the electronic identity [235] of each alert subject [385], as well as to open a new window showing information on the subject [385]. For example, the actor's [220] information page can be displayed, or the ad hoc workflow process [128] and workflow instances [134] as described in U.S. Pat. No. 7,519,589.
- Around the circle are positioned [4430] the target(s) [390] of the alert. An icon with a distinctive shape and color indicates the target [390] type (actor [220], group [225], external event [170], or workflow process [128]) and a tooltip indicates the number of targets [390] of the given type impacted by the alert [305]. For example, the tooltip might say “3 workflow-related anomalies” to indicate that anomalous behavior has been detected which spans 3 distinct workflow processes [128] (and possibly many more workflow instances [134] of those workflow processes [128]).
- As for the alert subject [385], a contextual menu allows the user to display additional information on the targets [390] of a given type, such as the name and description of each workflow process [128] involved in the alert [305] (their main workflow stages [154], the main actors [220] performing those workflow processes [128], etc.).
- The present invention relies on an alternative layout for alerts [305] which reflects the behavioral model [200] built as described in the Behavioral Model Section.
FIG. 45 illustrates the behavior-based alert visualization provided by the system. In that visualization, alerts [305] are no longer binned by occurrence time as in the previous visualization, but by the behavioral trait [295] represented on the X axis [4510]. Alerts [305] represented in this visualization are attributed to an individual actor [220] or a group [225], so that alert subjects [385] have the same interpretation as in the alert timeline view. - However, in the behavior-based alert visualization, alerts [305] are ranked by the level of anomaly attributed to the alert subject [385] along that particular behavioral trait [295]. This provides a highly condensed view of the most anomalous individual behaviors [210] and collective behaviors [215] flagged by the system at any point in time. In one embodiment, a parameter n dictates the number of subjects [385] (individual actors [220] and groups [225]) flagged for each behavioral trait [295], so that only the top n most anomalous actors [220] are displayed in a particular column. In another embodiment, the number of actors [220] for which an alert [305] is raised is not constant, but depends on a statistical test, such as requiring that the difference between an actor's [220] score [285] along a behavioral trait [295] and the mean score [285] over all other actors [220], measured in standard deviations, exceed a threshold. In yet another embodiment, the mean value and the standard deviation are computed over a group [225] of peers, such as actors [220] in the same organizational entity, actors [220] with the same job description, or groups [225] with a similar function.
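The standard-deviation test in the variable-count embodiment above can be sketched as follows. Illustrative Python; the function name, the use of the population standard deviation, and the score format are assumptions.

```python
import statistics

def flag_anomalous(scores, k):
    """Flag actors whose score along a behavioral trait exceeds the peer
    mean by more than k population standard deviations.
    `scores` maps actor -> score along one behavioral trait."""
    values = list(scores.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [actor for actor, s in scores.items() if (s - mean) / stdev > k]
```

Restricting `scores` to a peer group (same organizational entity or job description) yields the third embodiment described above.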
- In the behavior-based alert visualization, icons [4515] with a distinctive shape and color positioned around the circle representing an alert [305] have different semantics from the alert timeline view: each such shape here represents a behavioral metric [290], i.e. a particular method of scoring an actor [220] along a behavioral trait [295]. In one embodiment of the application, hovering over one of these icons with the mouse shows the actor's [220] absolute score [285] and relative ranking [275] along that behavioral metric [290], while clicking on one of these icons opens a visualization [204] associated to that metric [290] for that particular actor [220], and/or displays evidence [202] which supports the score [285] attributed to the actor [220], depending on whether the behavioral metric [290] has a visual interpretation.
- Unlike the alert timeline view, the behavior-based alert visualization does not show the evolution of anomalous behavior over time on a horizontal axis; instead, the visualization itself is updated to reflect temporal changes. Like the other visualizations described in this invention, this visualization can be played back to show historical changes in the behavioral model [200].
- As illustrated in
FIG. 46 , the presence of alerts [305] associated to specific individual actors [220] or groups [225], as well as what column they appear in (the behavioral trait [295] flagged as anomalous) and their vertical position, will be updated to reflect those changes. This temporal evolution is represented in the visualization by animating the graphical widgets representing alerts [305] with movements evocative of bubbles floating up and down, appearing and disappearing. - In particular, bubbles will form [4605] to signify that new alerts [305] have been detected: those bubbles will appear in the appropriate column and expand to reach the appropriate level based on the score [285] of the alert subject [385] along the given behavioral trait [295]. If necessary, these new bubbles will push existing bubbles to lower levels [4610].
- Secondly, some alerts [305] will disappear over time when an actor's [220] behavior has reached its baseline [260] again: the corresponding bubbles will explode [4615], and any bubbles below them will be pulled up to higher levels [4620].
- Thirdly, some anomalous behaviors might continue to be flagged but with an increased severity, or alternatively a decreased severity, whenever the score [285] along a given behavioral trait [295] has increased or decreased for the subject [385] while remaining in the top ranks: in that case, a bubble might pass over another bubble so that they exchange places [4625], with the former becoming larger (hence floating toward the surface) and the latter becoming smaller [4630] (hence sinking deeper).
- As described in
FIG. 47 , another interaction mechanism provided by the behavior-based alert visualization consists of providing feedback on manual adjustments to metric weights made by an expert user. In one embodiment of the invention, a set of slider controls [4705] appears which can be adjusted to update the weights, and every time the user changes one of the values the bubbles in the corresponding column move up or down [4710], appear or disappear to reflect the impact on the final scores [285] for that behavioral trait [295]. Along with the set of slider controls, a button [4715] lets the user reset weights to the values resulting from normalizing the behavioral metrics [290] as described in the section on Behavioral model. - As for the timeline alert visualization, visual effects are used in the behavioral alerts visualization to distinguish predicted alerts [307] from past alerts [306]. In one embodiment, predicted alerts [307] are visualized similarly to past alerts [306] but with a dimming effect applied whose intensity is a negative function of the confidence level [870] for the corresponding predicted alert [307].
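The weight adjustment and score recomputation driving the slider interaction above can be sketched as follows. Illustrative Python; uniform default weights stand in for the normalization described in the Behavioral Model section, and the names are hypothetical.

```python
def trait_score(metric_scores, weights=None):
    """Combine per-metric scores [285] into one behavioral-trait score.
    Omitting weights stands in for the default normalized weighting;
    passing new weights corresponds to a slider adjustment, after which
    the bubbles would be re-ranked from the recomputed scores."""
    if weights is None:
        weights = {m: 1.0 for m in metric_scores}
    total = sum(weights.values())
    return sum(metric_scores[m] * w for m, w in weights.items()) / total
```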
- In one embodiment, live operations are supported by the system. In some embodiments, operations such as querying the clustering component's current results, as well as its current state, are allowed. This is performed on top of the regular continuous forwarding of results. Other examples of live operations include live adjustment of key parameters such as similarity measures.
- In one embodiment, periodic event detection supports live operations. In some embodiments, live operations include real-time indexing and querying of the periodic patterns database based on structural information (including but not limited to periodic sequence length, periods, and gaps) or based on semantic information. These live operations are performed on top of the regular continuous forwarding of results. Other operations in some embodiments include recombination of existing periodic patterns to create richer, more informative, higher-order periodic patterns.
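A structural query over such a periodic-pattern store might look like the following. Illustrative Python only; the record fields (`length`, `period`, `gap`) and the function name are hypothetical.

```python
def query_patterns(patterns, min_length=None, period=None, max_gap=None):
    """Filter periodic-pattern records (dicts with hypothetical 'length',
    'period', and 'gap' fields) on structural properties, as in the
    structural-information queries described above."""
    out = []
    for p in patterns:
        if min_length is not None and p["length"] < min_length:
            continue
        if period is not None and p["period"] != period:
            continue
        if max_gap is not None and p["gap"] > max_gap:
            continue
        out.append(p)
    return out
```

Semantic queries and pattern recombination would layer additional matching logic on top of the same store.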
- The foregoing has outlined features of several embodiments so that those skilled in the art may better understand the detailed description that follows. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions and alterations herein without departing from the spirit and scope of the present disclosure.
Claims (63)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/941,849 US20120137367A1 (en) | 2009-11-06 | 2010-11-08 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
US14/034,008 US8887286B2 (en) | 2009-11-06 | 2013-09-23 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US28079109P | 2009-11-06 | 2009-11-06 | |
US12/941,849 US20120137367A1 (en) | 2009-11-06 | 2010-11-08 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/034,008 Continuation US8887286B2 (en) | 2009-11-06 | 2013-09-23 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120137367A1 true US20120137367A1 (en) | 2012-05-31 |
Family
ID=46127546
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/941,849 Abandoned US20120137367A1 (en) | 2009-11-06 | 2010-11-08 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
US14/034,008 Active US8887286B2 (en) | 2009-11-06 | 2013-09-23 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/034,008 Active US8887286B2 (en) | 2009-11-06 | 2013-09-23 | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
Country Status (1)
Country | Link |
---|---|
US (2) | US20120137367A1 (en) |
US20150254469A1 (en) * | 2014-03-07 | 2015-09-10 | International Business Machines Corporation | Data leak prevention enforcement based on learned document classification |
US20150261940A1 (en) * | 2014-03-12 | 2015-09-17 | Symantec Corporation | Systems and methods for detecting information leakage by an organizational insider |
US20150262474A1 (en) * | 2014-03-12 | 2015-09-17 | Haltian Oy | Relevance determination of sensor event |
US20150262184A1 (en) * | 2014-03-12 | 2015-09-17 | Microsoft Corporation | Two stage risk model building and evaluation |
CN104933095A (en) * | 2015-05-22 | 2015-09-23 | 中国电子科技集团公司第十研究所 | Heterogeneous information universality correlation analysis system and analysis method thereof |
US9152461B1 (en) | 2011-12-20 | 2015-10-06 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US9152460B1 (en) | 2011-12-20 | 2015-10-06 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US20150286928A1 (en) * | 2014-04-03 | 2015-10-08 | Adobe Systems Incorporated | Causal Modeling and Attribution |
US9158583B1 (en) | 2011-12-20 | 2015-10-13 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US20150293979A1 (en) * | 2011-03-24 | 2015-10-15 | Morphism Llc | Propagation Through Perdurance |
US20150294111A1 (en) * | 2014-04-11 | 2015-10-15 | Fuji Xerox Co., Ltd. | Unauthorized-communication detecting apparatus, unauthorized-communication detecting method and non-transitory computer readable medium |
CN105005578A (en) * | 2015-05-21 | 2015-10-28 | 中国电子科技集团公司第十研究所 | Multimedia target information visual analysis system |
US20150317337A1 (en) * | 2014-05-05 | 2015-11-05 | General Electric Company | Systems and Methods for Identifying and Driving Actionable Insights from Data |
US20150334129A1 (en) * | 2011-10-18 | 2015-11-19 | Mcafee, Inc. | User behavioral risk assessment |
US20150347602A1 (en) * | 2012-05-23 | 2015-12-03 | International Business Machines Corporation | Policy based population of genealogical archive data |
US9210183B2 (en) | 2013-12-19 | 2015-12-08 | Microsoft Technology Licensing, Llc | Detecting anomalous activity from accounts of an online service |
US20150373039A1 (en) * | 2014-06-23 | 2015-12-24 | Niara, Inc. | Entity Group Behavior Profiling |
US9225730B1 (en) * | 2014-03-19 | 2015-12-29 | Amazon Technologies, Inc. | Graph based detection of anomalous activity |
US20150379158A1 (en) * | 2014-06-27 | 2015-12-31 | Gabriel G. Infante-Lopez | Systems and methods for pattern matching and relationship discovery |
US20150381641A1 (en) * | 2014-06-30 | 2015-12-31 | Intuit Inc. | Method and system for efficient management of security threats in a distributed computing environment |
US20160004968A1 (en) * | 2014-07-01 | 2016-01-07 | Hitachi, Ltd. | Correlation rule analysis apparatus and correlation rule analysis method |
US9235866B2 (en) | 2012-12-12 | 2016-01-12 | Tata Consultancy Services Limited | Analyzing social network |
WO2016004744A1 (en) * | 2014-07-10 | 2016-01-14 | 同济大学 | Method for measuring user behavior consistency based on complex correspondence system |
US20160019561A1 (en) * | 2010-03-29 | 2016-01-21 | Companybook As | Method and arrangement for monitoring companies |
US20160019479A1 (en) * | 2014-07-18 | 2016-01-21 | Rebecca S. Busch | Interactive and Iterative Behavioral Model, System, and Method for Detecting Fraud, Waste, and Abuse |
US9256739B1 (en) * | 2014-03-21 | 2016-02-09 | Symantec Corporation | Systems and methods for using event-correlation graphs to generate remediation procedures |
US20160044061A1 (en) * | 2014-08-05 | 2016-02-11 | Df Labs | Method and system for automated cybersecurity incident and artifact visualization and correlation for security operation centers and computer emergency response teams |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
WO2016024268A1 (en) * | 2014-08-11 | 2016-02-18 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US9276840B2 (en) | 2013-10-30 | 2016-03-01 | Palo Alto Research Center Incorporated | Interest messages with a payload for a named data network |
US9276948B2 (en) * | 2011-12-29 | 2016-03-01 | 21Ct, Inc. | Method and apparatus for identifying a threatening network |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9286403B2 (en) * | 2014-02-04 | 2016-03-15 | Shoobx, Inc. | Computer-guided corporate governance with document generation and execution |
US9292616B2 (en) | 2014-01-13 | 2016-03-22 | International Business Machines Corporation | Social balancer for indicating the relative priorities of linked objects |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US9300682B2 (en) | 2013-08-09 | 2016-03-29 | Lockheed Martin Corporation | Composite analysis of executable content across enterprise network |
US9304989B2 (en) | 2012-02-17 | 2016-04-05 | Bottlenose, Inc. | Machine-based content analysis and user perception tracking of microcontent messages |
US9306965B1 (en) * | 2014-10-21 | 2016-04-05 | IronNet Cybersecurity, Inc. | Cybersecurity system |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US9319420B1 (en) * | 2011-06-08 | 2016-04-19 | United Services Automobile Association (Usaa) | Cyber intelligence clearinghouse |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9336332B2 (en) | 2013-08-28 | 2016-05-10 | Clipcard Inc. | Programmatic data discovery platforms for computing applications |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160132903A1 (en) * | 2014-11-11 | 2016-05-12 | Tata Consultancy Services Limited | Identifying an industry specific e-maven |
WO2016073383A1 (en) * | 2014-11-03 | 2016-05-12 | Vectra Networks, Inc. | A system for implementing threat detection using threat and risk assessment of asset-actor interactions |
US20160132827A1 (en) * | 2014-11-06 | 2016-05-12 | Xerox Corporation | Methods and systems for designing of tasks for crowdsourcing |
US9342796B1 (en) | 2013-09-16 | 2016-05-17 | Amazon Technologies, Inc. | Learning-based data decontextualization |
US9363086B2 (en) | 2014-03-31 | 2016-06-07 | Palo Alto Research Center Incorporated | Aggregate signing of data in content centric networking |
US9363179B2 (en) | 2014-03-26 | 2016-06-07 | Palo Alto Research Center Incorporated | Multi-publisher routing protocol for named data networks |
US20160164714A1 (en) * | 2014-12-08 | 2016-06-09 | Tata Consultancy Services Limited | Alert management system for enterprises |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US20160170961A1 (en) * | 2014-12-12 | 2016-06-16 | Behavioral Recognition Systems, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US20160170964A1 (en) * | 2014-12-12 | 2016-06-16 | Behavioral Recognition Systems, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
US9374304B2 (en) | 2014-01-24 | 2016-06-21 | Palo Alto Research Center Incorporated | End-to end route tracing over a named-data network |
US9379979B2 (en) | 2014-01-14 | 2016-06-28 | Palo Alto Research Center Incorporated | Method and apparatus for establishing a virtual interface for a set of mutual-listener devices |
EP3038023A1 (en) * | 2014-12-23 | 2016-06-29 | Telefonica Digital España, S.L.U. | A method, a system and computer program products for assessing the behavioral performance of a user |
US20160191450A1 (en) * | 2014-12-31 | 2016-06-30 | Socialtopias, Llc | Recommendations Engine in a Layered Social Media Webpage |
US20160188298A1 (en) * | 2013-08-12 | 2016-06-30 | Telefonaktiebolaget L M Ericsson (Publ) | Predicting Elements for Workflow Development |
US9390289B2 (en) | 2014-04-07 | 2016-07-12 | Palo Alto Research Center Incorporated | Secure collection synchronization using matched network names |
US9391896B2 (en) | 2014-03-10 | 2016-07-12 | Palo Alto Research Center Incorporated | System and method for packet forwarding using a conjunctive normal form strategy in a content-centric network |
US9391777B2 (en) | 2014-08-15 | 2016-07-12 | Palo Alto Research Center Incorporated | System and method for performing key resolution over a content centric network |
US20160210556A1 (en) * | 2015-01-21 | 2016-07-21 | Anodot Ltd. | Heuristic Inference of Topological Representation of Metric Relationships |
US20160210219A1 (en) * | 2013-06-03 | 2016-07-21 | Google Inc. | Application analytics reporting |
US9401864B2 (en) | 2013-10-31 | 2016-07-26 | Palo Alto Research Center Incorporated | Express header for packets with hierarchically structured variable-length identifiers |
US20160217056A1 (en) * | 2015-01-28 | 2016-07-28 | Hewlett-Packard Development Company, L.P. | Detecting flow anomalies |
US9407432B2 (en) | 2014-03-19 | 2016-08-02 | Palo Alto Research Center Incorporated | System and method for efficient and secure distribution of digital content |
US9407549B2 (en) | 2013-10-29 | 2016-08-02 | Palo Alto Research Center Incorporated | System and method for hash-based forwarding of packets with hierarchically structured variable-length identifiers |
WO2016123522A1 (en) * | 2015-01-30 | 2016-08-04 | Securonix, Inc. | Anomaly detection using adaptive behavioral profiles |
WO2016123528A1 (en) * | 2015-01-30 | 2016-08-04 | Securonix, Inc. | Risk scoring for threat assessment |
EP3055808A1 (en) * | 2013-10-08 | 2016-08-17 | Crowdstrike, Inc. | Event model for correlating system component states |
US9424612B1 (en) * | 2012-08-02 | 2016-08-23 | Facebook, Inc. | Systems and methods for managing user reputations in social networking systems |
US9426113B2 (en) | 2014-06-30 | 2016-08-23 | Palo Alto Research Center Incorporated | System and method for managing devices over a content centric network |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9436686B1 (en) * | 2012-08-07 | 2016-09-06 | Google Inc. | Claim evaluation system |
US20160267441A1 (en) * | 2014-04-13 | 2016-09-15 | Helixaeon Inc. | Visualization and analysis of scheduling data |
US9451032B2 (en) | 2014-04-10 | 2016-09-20 | Palo Alto Research Center Incorporated | System and method for simple service discovery in content-centric networks |
US9455835B2 (en) | 2014-05-23 | 2016-09-27 | Palo Alto Research Center Incorporated | System and method for circular link resolution with hash-based names in content-centric networks |
US9456054B2 (en) | 2008-05-16 | 2016-09-27 | Palo Alto Research Center Incorporated | Controlling the spread of interests and content in a content centric network |
US20160283589A1 (en) * | 2015-03-24 | 2016-09-29 | International Business Machines Corporation | Augmenting search queries based on personalized association patterns |
US9462006B2 (en) | 2015-01-21 | 2016-10-04 | Palo Alto Research Center Incorporated | Network-layer application-specific trust model |
US9459987B2 (en) | 2014-03-31 | 2016-10-04 | Intuit Inc. | Method and system for comparing different versions of a cloud based application in a production environment using segregated backend systems |
US9467492B2 (en) | 2014-08-19 | 2016-10-11 | Palo Alto Research Center Incorporated | System and method for reconstructable all-in-one content stream |
US9473576B2 (en) | 2014-04-07 | 2016-10-18 | Palo Alto Research Center Incorporated | Service discovery using collection synchronization with exact names |
US9473481B2 (en) | 2014-07-31 | 2016-10-18 | Intuit Inc. | Method and system for providing a virtual asset perimeter |
US9473475B2 (en) | 2014-12-22 | 2016-10-18 | Palo Alto Research Center Incorporated | Low-cost authenticated signing delegation in content centric networking |
US9473405B2 (en) | 2014-03-10 | 2016-10-18 | Palo Alto Research Center Incorporated | Concurrent hashes and sub-hashes on data streams |
US20160315822A1 (en) * | 2015-04-24 | 2016-10-27 | Goldman, Sachs & Co. | System and method for handling events involving computing systems and networks using fabric monitoring system |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US20160323395A1 (en) * | 2015-04-29 | 2016-11-03 | Facebook, Inc. | Methods and Systems for Viewing User Feedback |
US9497282B2 (en) | 2014-08-27 | 2016-11-15 | Palo Alto Research Center Incorporated | Network coding for content-centric network |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9503358B2 (en) | 2013-12-05 | 2016-11-22 | Palo Alto Research Center Incorporated | Distance-based routing in an information-centric network |
US9503365B2 (en) | 2014-08-11 | 2016-11-22 | Palo Alto Research Center Incorporated | Reputation-based instruction processing over an information centric network |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9501345B1 (en) | 2013-12-23 | 2016-11-22 | Intuit Inc. | Method and system for creating enriched log data |
US9516064B2 (en) | 2013-10-14 | 2016-12-06 | Intuit Inc. | Method and system for dynamic and comprehensive vulnerability management |
US9516144B2 (en) | 2014-06-19 | 2016-12-06 | Palo Alto Research Center Incorporated | Cut-through forwarding of CCNx message fragments with IP encapsulation |
US20160364733A1 (en) * | 2015-06-09 | 2016-12-15 | International Business Machines Corporation | Attitude Inference |
EP3107026A1 (en) * | 2015-06-17 | 2016-12-21 | Accenture Global Services Limited | Event anomaly analysis and prediction |
US20160381077A1 (en) * | 2014-11-04 | 2016-12-29 | Patternex, Inc. | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US9536059B2 (en) | 2014-12-15 | 2017-01-03 | Palo Alto Research Center Incorporated | Method and system for verifying renamed content using manifests in a content centric network |
US9535968B2 (en) | 2014-07-21 | 2017-01-03 | Palo Alto Research Center Incorporated | System for distributing nameless objects using self-certifying names |
US9537719B2 (en) | 2014-06-19 | 2017-01-03 | Palo Alto Research Center Incorporated | Method and apparatus for deploying a minimal-cost CCN topology |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9544380B2 (en) | 2013-04-10 | 2017-01-10 | International Business Machines Corporation | Data analytics and security in social networks |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US9548987B1 (en) * | 2012-06-11 | 2017-01-17 | EMC IP Holding Company LLC | Intelligent remediation of security-related events |
US9552493B2 (en) | 2015-02-03 | 2017-01-24 | Palo Alto Research Center Incorporated | Access control framework for information centric networking |
US9552243B2 (en) | 2014-01-27 | 2017-01-24 | International Business Machines Corporation | Detecting an abnormal subsequence in a data sequence |
US9552552B1 (en) | 2011-04-29 | 2017-01-24 | Google Inc. | Identification of over-clustered map features |
US9552490B1 (en) | 2011-12-20 | 2017-01-24 | Amazon Technologies, Inc. | Managing resource dependent workflows |
US9553812B2 (en) | 2014-09-09 | 2017-01-24 | Palo Alto Research Center Incorporated | Interest keep alives at intermediate routers in a CCN |
US9558244B2 (en) * | 2014-10-22 | 2017-01-31 | Conversable, Inc. | Systems and methods for social recommendations |
US20170032129A1 (en) * | 2015-07-30 | 2017-02-02 | IOR Analytics, LLC | Method and apparatus for data security analysis of data flows |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
GB2541649A (en) * | 2015-08-21 | 2017-03-01 | Senseye Ltd | User feedback for machine learning |
WO2017035455A1 (en) * | 2015-08-27 | 2017-03-02 | Dynology Corporation | System and method for electronically monitoring employees to determine potential risk |
US9590948B2 (en) | 2014-12-15 | 2017-03-07 | Cisco Systems, Inc. | CCN routing using hardware-assisted hash tables |
US9590887B2 (en) | 2014-07-18 | 2017-03-07 | Cisco Systems, Inc. | Method and system for keeping interest alive in a content centric network |
EP3139296A1 (en) * | 2015-09-07 | 2017-03-08 | Docapost DPS | Computer system of secure digital information managing |
US9596251B2 (en) | 2014-04-07 | 2017-03-14 | Intuit Inc. | Method and system for providing security aware applications |
US20170076202A1 (en) * | 2015-09-16 | 2017-03-16 | Adobe Systems Incorporated | Identifying audiences that contribute to metric anomalies |
US9602596B2 (en) | 2015-01-12 | 2017-03-21 | Cisco Systems, Inc. | Peer-to-peer sharing in a content centric network |
US9609014B2 (en) | 2014-05-22 | 2017-03-28 | Cisco Systems, Inc. | Method and apparatus for preventing insertion of malicious content at a named data network router |
US20170093899A1 (en) * | 2015-09-29 | 2017-03-30 | International Business Machines Corporation | Crowd-based detection of device compromise in enterprise setting |
US20170091244A1 (en) * | 2015-09-24 | 2017-03-30 | Microsoft Technology Licensing, Llc | Searching a Data Structure |
US9614807B2 (en) | 2011-02-23 | 2017-04-04 | Bottlenose, Inc. | System and method for analyzing messages in a network or across networks |
US9613447B2 (en) | 2015-02-02 | 2017-04-04 | International Business Machines Corporation | Identifying cyclic patterns of complex events |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US9621354B2 (en) | 2014-07-17 | 2017-04-11 | Cisco Systems, Inc. | Reconstructable content objects |
US9626413B2 (en) | 2014-03-10 | 2017-04-18 | Cisco Systems, Inc. | System and method for ranking content popularity in a content-centric network |
US20170111378A1 (en) * | 2015-10-20 | 2017-04-20 | International Business Machines Corporation | User configurable message anomaly scoring to identify unusual activity in information technology systems |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US20170116531A1 (en) * | 2015-10-27 | 2017-04-27 | International Business Machines Corporation | Detecting emerging life events and identifying opportunity and risk from behavior |
US9639335B2 (en) | 2013-03-13 | 2017-05-02 | Microsoft Technology Licensing, Llc. | Contextual typing |
US20170126821A1 (en) * | 2015-11-02 | 2017-05-04 | International Business Machines Corporation | Analyzing the Online Behavior of a User and for Generating an Alert Based on Behavioral Deviations of the User |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US9652616B1 (en) * | 2011-03-14 | 2017-05-16 | Symantec Corporation | Techniques for classifying non-process threats |
US9652207B2 (en) | 2013-03-13 | 2017-05-16 | Microsoft Technology Licensing, Llc. | Static type checking across module universes |
US20170140117A1 (en) * | 2015-11-18 | 2017-05-18 | Ucb Biopharma Sprl | Method and system for generating and displaying topics in raw uncategorized data and for categorizing such data |
US20170139887A1 (en) | 2012-09-07 | 2017-05-18 | Splunk, Inc. | Advanced field extractor with modification of an extracted field |
US9659085B2 (en) | 2012-12-28 | 2017-05-23 | Microsoft Technology Licensing, Llc | Detecting anomalies in behavioral network with contextual side information |
US9660825B2 (en) | 2014-12-24 | 2017-05-23 | Cisco Technology, Inc. | System and method for multi-source multicasting in content-centric networks |
US20170143239A1 (en) * | 2013-01-15 | 2017-05-25 | Fitbit, Inc. | Portable monitoring devices and methods of operating the same |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US20170154314A1 (en) * | 2015-11-30 | 2017-06-01 | FAMA Technologies, Inc. | System for searching and correlating online activity with individual classification factors |
US20170163663A1 (en) * | 2015-12-02 | 2017-06-08 | Salesforce.Com, Inc. | False positive detection reduction system for network-based attacks |
US9678998B2 (en) | 2014-02-28 | 2017-06-13 | Cisco Technology, Inc. | Content name resolution for information centric networking |
US20170169360A1 (en) * | 2013-04-02 | 2017-06-15 | Patternex, Inc. | Method and system for training a big data machine to defend |
US9686194B2 (en) | 2009-10-21 | 2017-06-20 | Cisco Technology, Inc. | Adaptive multi-interface use for content networking |
US9686301B2 (en) | 2014-02-03 | 2017-06-20 | Intuit Inc. | Method and system for virtual asset assisted extrusion and intrusion detection and threat scoring in a cloud computing environment |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9699198B2 (en) | 2014-07-07 | 2017-07-04 | Cisco Technology, Inc. | System and method for parallel secure content bootstrapping in content-centric networks |
US20170192872A1 (en) * | 2014-12-11 | 2017-07-06 | Hewlett Packard Enterprise Development Lp | Interactive detection of system anomalies |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9716622B2 (en) | 2014-04-01 | 2017-07-25 | Cisco Technology, Inc. | System and method for dynamic name configuration in content-centric networks |
US20170213025A1 (en) * | 2015-10-30 | 2017-07-27 | General Electric Company | Methods, systems, apparatus, and storage media for use in detecting anomalous behavior and/or in preventing data loss |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9729662B2 (en) | 2014-08-11 | 2017-08-08 | Cisco Technology, Inc. | Probabilistic lazy-forwarding technique without validation in a content centric network |
US9729616B2 (en) | 2014-07-18 | 2017-08-08 | Cisco Technology, Inc. | Reputation-based strategy for forwarding and responding to interests over a content centric network |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
WO2017139147A1 (en) * | 2016-02-08 | 2017-08-17 | Nec Laboratories America, Inc. | Ranking causal anomalies via temporal and dynamic analysis on vanishing correlations |
US9742794B2 (en) | 2014-05-27 | 2017-08-22 | Intuit Inc. | Method and apparatus for automating threat model generation and pattern identification |
US20170242932A1 (en) * | 2016-02-24 | 2017-08-24 | International Business Machines Corporation | Theft detection via adaptive lexical similarity analysis of social media data streams |
US20170257292A1 (en) * | 2013-07-31 | 2017-09-07 | Splunk Inc. | Systems and Methods For Displaying Metrics On Real-Time Data In An Environment |
US20170255695A1 (en) | 2013-01-23 | 2017-09-07 | Splunk, Inc. | Determining Rules Based on Text |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US20170262561A1 (en) * | 2014-09-11 | 2017-09-14 | Nec Corporation | Information processing apparatus, information processing method, and recording medium |
CN107196942A (en) * | 2017-05-24 | 2017-09-22 | 山东省计算中心(国家超级计算济南中心) | A kind of inside threat detection method based on user language feature |
US9773112B1 (en) * | 2014-09-29 | 2017-09-26 | Fireeye, Inc. | Exploit detection of malware and malware families |
US20170286856A1 (en) * | 2016-04-05 | 2017-10-05 | Omni Al, Inc. | Trend analysis for a neuro-linguistic behavior recognition system |
US20170286678A1 (en) * | 2014-03-17 | 2017-10-05 | Proofpoint, Inc. | Behavior Profiling for Malware Detection |
US9787640B1 (en) * | 2014-02-11 | 2017-10-10 | DataVisor Inc. | Using hypergraphs to determine suspicious user activities |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9794238B2 (en) | 2015-10-29 | 2017-10-17 | Cisco Technology, Inc. | System for key exchange in a content centric network |
US9800637B2 (en) | 2014-08-19 | 2017-10-24 | Cisco Technology, Inc. | System and method for all-in-one content stream in content-centric networks |
US9798882B2 (en) * | 2014-06-06 | 2017-10-24 | Crowdstrike, Inc. | Real-time model of states of monitored devices |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9807205B2 (en) | 2015-11-02 | 2017-10-31 | Cisco Technology, Inc. | Header compression for CCN messages using dictionary |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US9830533B2 (en) | 2015-12-30 | 2017-11-28 | International Business Machines Corporation | Analyzing and exploring images posted on social media |
US9832116B2 (en) | 2016-03-14 | 2017-11-28 | Cisco Technology, Inc. | Adjusting entries in a forwarding information base in a content centric network |
US9832291B2 (en) | 2015-01-12 | 2017-11-28 | Cisco Technology, Inc. | Auto-configurable transport stack |
US9832123B2 (en) | 2015-09-11 | 2017-11-28 | Cisco Technology, Inc. | Network named fragments in a content centric network |
US9836540B2 (en) | 2014-03-04 | 2017-12-05 | Cisco Technology, Inc. | System and method for direct storage access in a content-centric network |
US9836183B1 (en) * | 2016-09-14 | 2017-12-05 | Quid, Inc. | Summarized network graph for semantic similarity graphs of large corpora |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9846881B2 (en) | 2014-12-19 | 2017-12-19 | Palo Alto Research Center Incorporated | Frugal user engagement help systems |
US20170364693A1 (en) * | 2016-06-21 | 2017-12-21 | Unisys Corporation | Systems and methods for efficient access control |
US9852208B2 (en) | 2014-02-25 | 2017-12-26 | International Business Machines Corporation | Discovering communities and expertise of users using semantic analysis of resource access logs |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US20180004860A1 (en) * | 2016-06-30 | 2018-01-04 | Hitachi, Ltd. | Data generation method and computer system |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US9866581B2 (en) | 2014-06-30 | 2018-01-09 | Intuit Inc. | Method and system for secure delivery of information to computing environments |
US9875360B1 (en) | 2016-07-14 | 2018-01-23 | IronNet Cybersecurity, Inc. | Simulation and virtual reality based cyber behavioral systems |
US9882964B2 (en) | 2014-08-08 | 2018-01-30 | Cisco Technology, Inc. | Explicit strategy feedback in name-based forwarding |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US20180039774A1 (en) * | 2016-08-08 | 2018-02-08 | International Business Machines Corporation | Install-Time Security Analysis of Mobile Applications |
US9892533B1 (en) * | 2015-10-01 | 2018-02-13 | Hrl Laboratories, Llc | Graph visualization system based on gravitational forces due to path distance and betweenness centrality |
US9900322B2 (en) | 2014-04-30 | 2018-02-20 | Intuit Inc. | Method and system for providing permissions management |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US20180060437A1 (en) * | 2016-08-29 | 2018-03-01 | EverString Innovation Technology | Keyword and business tag extraction |
US9912776B2 (en) | 2015-12-02 | 2018-03-06 | Cisco Technology, Inc. | Explicit content deletion commands in a content centric network |
US9916457B2 (en) | 2015-01-12 | 2018-03-13 | Cisco Technology, Inc. | Decoupled name security binding for CCN objects |
US9916601B2 (en) | 2014-03-21 | 2018-03-13 | Cisco Technology, Inc. | Marketplace for presenting advertisements in a scalable data broadcasting system |
US20180074856A1 (en) * | 2016-09-15 | 2018-03-15 | Oracle International Corporation | Processing timestamps and heartbeat events for automatic time progression |
US9923909B2 (en) | 2014-02-03 | 2018-03-20 | Intuit Inc. | System and method for providing a self-monitoring, self-reporting, and self-repairing virtual asset configured for extrusion and intrusion detection and threat scoring in a cloud computing environment |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US20180084001A1 (en) * | 2016-09-22 | 2018-03-22 | Microsoft Technology Licensing, Llc. | Enterprise graph method of threat detection |
WO2018052984A1 (en) * | 2016-09-14 | 2018-03-22 | The Dun & Bradstreet Corporation | Geolocating entities of interest on geo heat maps |
US20180082193A1 (en) * | 2016-09-21 | 2018-03-22 | Scianta Analytics, LLC | Cognitive modeling apparatus for defuzzification of multiple qualitative signals into human-centric threat notifications |
US9930146B2 (en) | 2016-04-04 | 2018-03-27 | Cisco Technology, Inc. | System and method for compressing content centric networking messages |
US9935791B2 (en) | 2013-05-20 | 2018-04-03 | Cisco Technology, Inc. | Method and system for name resolution across heterogeneous architectures |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9949301B2 (en) | 2016-01-20 | 2018-04-17 | Palo Alto Research Center Incorporated | Methods for fast, secure and privacy-friendly internet connection discovery in wireless networks |
US9946743B2 (en) | 2015-01-12 | 2018-04-17 | Cisco Technology, Inc. | Order encoded manifests in a content centric network |
US9946895B1 (en) * | 2015-12-15 | 2018-04-17 | Amazon Technologies, Inc. | Data obfuscation |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9954795B2 (en) | 2015-01-12 | 2018-04-24 | Cisco Technology, Inc. | Resource allocation using CCN manifests |
US9954678B2 (en) | 2014-02-06 | 2018-04-24 | Cisco Technology, Inc. | Content-based transport security |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9959156B2 (en) | 2014-07-17 | 2018-05-01 | Cisco Technology, Inc. | Interest return control message |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9977809B2 (en) | 2015-09-24 | 2018-05-22 | Cisco Technology, Inc. | Information and data framework in a content centric network |
US9979744B1 (en) | 2012-03-20 | 2018-05-22 | United Services Automobile Association (USAA) | Dynamic risk engine |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US9986034B2 (en) | 2015-08-03 | 2018-05-29 | Cisco Technology, Inc. | Transferring state in content centric network stacks |
US9992281B2 (en) | 2014-05-01 | 2018-06-05 | Cisco Technology, Inc. | Accountable content stores for information centric networks |
US9992018B1 (en) | 2016-03-24 | 2018-06-05 | Electronic Arts Inc. | Generating cryptographic challenges to communication requests |
US9992097B2 (en) | 2016-07-11 | 2018-06-05 | Cisco Technology, Inc. | System and method for piggybacking routing information in interests in a content centric network |
US10003520B2 (en) | 2014-12-22 | 2018-06-19 | Cisco Technology, Inc. | System and method for efficient name-based content routing using link-state information in information-centric networks |
US10003507B2 (en) | 2016-03-04 | 2018-06-19 | Cisco Technology, Inc. | Transport session state protocol |
US10002177B1 (en) * | 2013-09-16 | 2018-06-19 | Amazon Technologies, Inc. | Crowdsourced analysis of decontextualized data |
US10007788B2 (en) * | 2015-02-11 | 2018-06-26 | Electronics And Telecommunications Research Institute | Method of modeling behavior pattern of instruction set in N-gram manner, computing device operating with the method, and program stored in storage medium to execute the method in computing device |
US10009446B2 (en) | 2015-11-02 | 2018-06-26 | Cisco Technology, Inc. | Header compression for CCN messages using dictionary learning |
US10009358B1 (en) * | 2014-02-11 | 2018-06-26 | DataVisor Inc. | Graph based framework for detecting malicious or compromised accounts |
US10009266B2 (en) | 2016-07-05 | 2018-06-26 | Cisco Technology, Inc. | Method and system for reference counted pending interest tables in a content centric network |
WO2018119068A1 (en) * | 2016-12-21 | 2018-06-28 | Threat Stack, Inc. | System and method for cloud-based operating system event and data access monitoring |
US20180181895A1 (en) * | 2016-12-23 | 2018-06-28 | Yodlee, Inc. | Identifying Recurring Series From Transactional Data |
CN108260155A (en) * | 2018-01-05 | 2018-07-06 | 西安电子科技大学 | A kind of wireless sense network method for detecting abnormality based on space-time similarity |
US10021222B2 (en) | 2015-11-04 | 2018-07-10 | Cisco Technology, Inc. | Bit-aligned header compression for CCN messages using dictionary |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US10027578B2 (en) | 2016-04-11 | 2018-07-17 | Cisco Technology, Inc. | Method and system for routable prefix queries in a content centric network |
US20180203847A1 (en) * | 2017-01-15 | 2018-07-19 | International Business Machines Corporation | Tone optimization for digital content |
US10033639B2 (en) | 2016-03-25 | 2018-07-24 | Cisco Technology, Inc. | System and method for routing packets in a content centric network using anonymous datagrams |
US10033642B2 (en) | 2016-09-19 | 2018-07-24 | Cisco Technology, Inc. | System and method for making optimal routing decisions based on device-specific parameters in a content centric network |
US10038633B2 (en) | 2016-03-04 | 2018-07-31 | Cisco Technology, Inc. | Protocol to query for historical network information in a content centric network |
US10037768B1 (en) | 2017-09-26 | 2018-07-31 | International Business Machines Corporation | Assessing the structural quality of conversations |
US10043016B2 (en) | 2016-02-29 | 2018-08-07 | Cisco Technology, Inc. | Method and system for name encryption agreement in a content centric network |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10043221B2 (en) | 2015-11-02 | 2018-08-07 | International Business Machines Corporation | Assigning confidence levels to online profiles |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10051071B2 (en) | 2016-03-04 | 2018-08-14 | Cisco Technology, Inc. | Method and system for collecting historical network information in a content centric network |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10055535B2 (en) * | 2016-09-27 | 2018-08-21 | Globalfoundries Inc. | Method, system and program product for identifying anomalies in integrated circuit design layouts |
US10063414B2 (en) | 2016-05-13 | 2018-08-28 | Cisco Technology, Inc. | Updating a transport stack in a content centric network |
US10069729B2 (en) | 2016-08-08 | 2018-09-04 | Cisco Technology, Inc. | System and method for throttling traffic based on a forwarding information base in a content centric network |
US10069933B2 (en) | 2014-10-23 | 2018-09-04 | Cisco Technology, Inc. | System and method for creating virtual interfaces based on network characteristics |
US10067948B2 (en) | 2016-03-18 | 2018-09-04 | Cisco Technology, Inc. | Data deduping in content centric networking manifests |
US10068022B2 (en) | 2011-06-03 | 2018-09-04 | Google Llc | Identifying topical entities |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10075402B2 (en) | 2015-06-24 | 2018-09-11 | Cisco Technology, Inc. | Flexible command and control in content centric networks |
US10075401B2 (en) | 2015-03-18 | 2018-09-11 | Cisco Technology, Inc. | Pending interest table behavior |
CN108519993A (en) * | 2018-03-02 | 2018-09-11 | 华南理工大学 | The social networks focus incident detection method calculated based on multiple data stream |
US10075521B2 (en) | 2014-04-07 | 2018-09-11 | Cisco Technology, Inc. | Collection synchronization using equality matched network names |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10078062B2 (en) | 2015-12-15 | 2018-09-18 | Palo Alto Research Center Incorporated | Device health estimation by combining contextual information with sensor data |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US10084764B2 (en) | 2016-05-13 | 2018-09-25 | Cisco Technology, Inc. | System for a secure encryption proxy in a content centric network |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089655B2 (en) | 2013-11-27 | 2018-10-02 | Cisco Technology, Inc. | Method and apparatus for scalable data broadcasting |
US10091330B2 (en) | 2016-03-23 | 2018-10-02 | Cisco Technology, Inc. | Interest scheduling by an information and data framework in a content centric network |
US10089651B2 (en) | 2014-03-03 | 2018-10-02 | Cisco Technology, Inc. | Method and apparatus for streaming advertisements in a scalable data broadcasting system |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US20180287877A1 (en) * | 2017-03-31 | 2018-10-04 | Bmc Software, Inc | Cloud service interdependency relationship detection |
US10097521B2 (en) | 2015-11-20 | 2018-10-09 | Cisco Technology, Inc. | Transparent encryption in a content centric network |
US10098051B2 (en) | 2014-01-22 | 2018-10-09 | Cisco Technology, Inc. | Gateways and routing in software-defined manets |
US10097346B2 (en) | 2015-12-09 | 2018-10-09 | Cisco Technology, Inc. | Key catalogs in a content centric network |
US10095980B1 (en) | 2011-04-29 | 2018-10-09 | Google Llc | Moderation of user-generated content |
US10103989B2 (en) | 2016-06-13 | 2018-10-16 | Cisco Technology, Inc. | Content object return messages in a content centric network |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10102374B1 (en) | 2014-08-11 | 2018-10-16 | Sentinel Labs Israel Ltd. | Method of remediating a program and system thereof by undoing operations |
US10102082B2 (en) | 2014-07-31 | 2018-10-16 | Intuit Inc. | Method and system for providing automated self-healing virtual assets |
CN108713310A (en) * | 2016-02-15 | 2018-10-26 | 策安保安有限公司 | Method and system for information security data in online and transmission to be compressed and optimized |
US10114959B2 (en) * | 2015-05-18 | 2018-10-30 | Ricoh Company, Ltd. | Information processing apparatus, information processing method, and information processing system |
US10116605B2 (en) | 2015-06-22 | 2018-10-30 | Cisco Technology, Inc. | Transport stack name scheme and identity management |
WO2018200113A1 (en) * | 2017-04-26 | 2018-11-01 | Elasticsearch B.V. | Anomaly and causation detection in computing environments |
US10122624B2 (en) | 2016-07-25 | 2018-11-06 | Cisco Technology, Inc. | System and method for ephemeral entries in a forwarding information base in a content centric network |
US20180322276A1 (en) * | 2017-05-04 | 2018-11-08 | Crowdstrike, Inc. | Least recently used (lru)-based event suppression |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10129365B2 (en) | 2013-11-13 | 2018-11-13 | Cisco Technology, Inc. | Method and apparatus for pre-fetching remote content based on static and dynamic recommendations |
US10127511B1 (en) * | 2017-09-22 | 2018-11-13 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10135948B2 (en) | 2016-10-31 | 2018-11-20 | Cisco Technology, Inc. | System and method for process migration in a content centric network |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10148572B2 (en) | 2016-06-27 | 2018-12-04 | Cisco Technology, Inc. | Method and system for interest groups in a content centric network |
US10148673B1 (en) * | 2015-09-30 | 2018-12-04 | EMC IP Holding Company LLC | Automatic selection of malicious activity detection rules using crowd-sourcing techniques |
US20180351978A1 (en) * | 2017-06-05 | 2018-12-06 | Microsoft Technology Licensing, Llc | Correlating user information to a tracked event |
US10152596B2 (en) | 2016-01-19 | 2018-12-11 | International Business Machines Corporation | Detecting anomalous events through runtime verification of software execution using a behavioral model |
WO2018226461A1 (en) * | 2017-06-05 | 2018-12-13 | Microsoft Technology Licensing, Llc | Validating correlation between chains of alerts using cloud view |
US10171510B2 (en) * | 2016-12-14 | 2019-01-01 | CyberSaint, Inc. | System and method for monitoring and grading a cybersecurity framework |
US10172068B2 (en) | 2014-01-22 | 2019-01-01 | Cisco Technology, Inc. | Service-oriented routing in software-defined MANETs |
US10171488B2 (en) * | 2017-05-15 | 2019-01-01 | Forcepoint, LLC | User behavior profile |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US20190005225A1 (en) * | 2017-06-29 | 2019-01-03 | Microsoft Technology Licensing, Llc | Detection of attacks in the cloud by crowd sourcing security solutions |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10187260B1 (en) | 2015-05-29 | 2019-01-22 | Quest Software Inc. | Systems and methods for multilayer monitoring of network function virtualization architectures |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10200252B1 (en) | 2015-09-18 | 2019-02-05 | Quest Software Inc. | Systems and methods for integrated modeling of monitored virtual desktop infrastructure systems |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10204013B2 (en) | 2014-09-03 | 2019-02-12 | Cisco Technology, Inc. | System and method for maintaining a distributed and fault-tolerant state over an information centric network |
US10212196B2 (en) | 2016-03-16 | 2019-02-19 | Cisco Technology, Inc. | Interface discovery and authentication in a name-based network |
US10212248B2 (en) | 2016-10-03 | 2019-02-19 | Cisco Technology, Inc. | Cache management on high availability routers in a content centric network |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10223644B2 (en) | 2014-09-29 | 2019-03-05 | Cisco Technology, Inc. | Behavioral modeling of a data center utilizing human knowledge to enhance a machine learning algorithm |
US10230601B1 (en) | 2016-07-05 | 2019-03-12 | Quest Software Inc. | Systems and methods for integrated modeling and performance measurements of monitored virtual desktop infrastructure systems |
US20190080352A1 (en) * | 2017-09-11 | 2019-03-14 | Adobe Systems Incorporated | Segment Extension Based on Lookalike Selection |
US10237226B2 (en) | 2015-11-30 | 2019-03-19 | International Business Machines Corporation | Detection of manipulation of social media content |
US10237295B2 (en) * | 2016-03-22 | 2019-03-19 | Nec Corporation | Automated event ID field analysis on heterogeneous logs |
US10237189B2 (en) | 2014-12-16 | 2019-03-19 | Cisco Technology, Inc. | System and method for distance-based interest forwarding |
US20190087750A1 (en) * | 2016-02-26 | 2019-03-21 | Nippon Telegraph And Telephone Corporation | Analysis device, analysis method, and analysis program |
US10243851B2 (en) | 2016-11-21 | 2019-03-26 | Cisco Technology, Inc. | System and method for forwarder connection information in a content centric network |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
EP3460769A1 (en) * | 2017-09-26 | 2019-03-27 | Netscout Systems, Inc. | System and method for managing alerts using a state machine |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US20190104156A1 (en) * | 2017-10-04 | 2019-04-04 | Servicenow, Inc. | Systems and methods for automated governance, risk, and compliance |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10257271B2 (en) | 2016-01-11 | 2019-04-09 | Cisco Technology, Inc. | Chandra-Toueg consensus in a content centric network |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10263965B2 (en) | 2015-10-16 | 2019-04-16 | Cisco Technology, Inc. | Encrypted CCNx |
US10262153B2 (en) * | 2017-07-26 | 2019-04-16 | Forcepoint, LLC | Privacy protection during insider threat monitoring |
US10268505B2 (en) | 2016-04-28 | 2019-04-23 | EntIT Software, LLC | Batch job frequency control |
US10268821B2 (en) * | 2014-08-04 | 2019-04-23 | Darktrace Limited | Cyber security |
US10268820B2 (en) * | 2014-06-11 | 2019-04-23 | Nippon Telegraph And Telephone Corporation | Malware determination device, malware determination system, malware determination method, and program |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US20190132351A1 (en) * | 2015-07-30 | 2019-05-02 | IOR Analytics, LLC. | Method and apparatus for data security analysis of data flows |
US10282463B2 (en) * | 2013-01-23 | 2019-05-07 | Splunk Inc. | Displaying a number of events that have a particular value for a field in a set of events |
US20190138580A1 (en) * | 2017-11-06 | 2019-05-09 | Microsoft Technology Licensing, Llc | Electronic document content augmentation |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
CN109753989A (en) * | 2018-11-18 | 2019-05-14 | 韩霞 | Power consumer electricity stealing analysis method based on big data and machine learning |
US10291493B1 (en) * | 2014-12-05 | 2019-05-14 | Quest Software Inc. | System and method for determining relevant computer performance events |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
CN109791587A (en) * | 2016-10-05 | 2019-05-21 | 微软技术许可有限责任公司 | Equipment is endangered via User Status detection |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US10305864B2 (en) | 2016-01-25 | 2019-05-28 | Cisco Technology, Inc. | Method and system for interest encryption in a content centric network |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10304263B2 (en) * | 2016-12-13 | 2019-05-28 | The Boeing Company | Vehicle system prognosis device and method |
US10305865B2 (en) | 2016-06-21 | 2019-05-28 | Cisco Technology, Inc. | Permutation-based content encryption with manifests in a content centric network |
US20190164094A1 (en) * | 2017-11-27 | 2019-05-30 | Promontory Financial Group Llc | Risk rating analytics based on geographic regions |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10313227B2 (en) | 2015-09-24 | 2019-06-04 | Cisco Technology, Inc. | System and method for eliminating undetected interest looping in information-centric networks |
US10318537B2 (en) | 2013-01-22 | 2019-06-11 | Splunk Inc. | Advanced field extractor |
US10320675B2 (en) | 2016-05-04 | 2019-06-11 | Cisco Technology, Inc. | System and method for routing packets in a stateless content centric network |
US10320636B2 (en) | 2016-12-21 | 2019-06-11 | Ca, Inc. | State information completion using context graphs |
US10320760B2 (en) | 2016-04-01 | 2019-06-11 | Cisco Technology, Inc. | Method and system for mutating and caching content in a content centric network |
US20190182279A1 (en) * | 2008-05-27 | 2019-06-13 | Yingbo Song | Detecting network anomalies by probabilistic modeling of argument strings with markov chains |
US10333820B1 (en) | 2012-10-23 | 2019-06-25 | Quest Software Inc. | System for inferring dependencies among computing systems |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10333840B2 (en) | 2015-02-06 | 2019-06-25 | Cisco Technology, Inc. | System and method for on-demand content exchange with adaptive naming in information-centric networks |
CN109951420A (en) * | 2017-12-20 | 2019-06-28 | 广东电网有限责任公司电力调度控制中心 | A kind of multistage flow method for detecting abnormality based on entropy and dynamic linear relationship |
US10338802B2 (en) * | 2017-02-08 | 2019-07-02 | International Business Machines Corporation | Monitoring an activity and determining the type of actor performing the activity |
WO2019133989A1 (en) * | 2017-12-29 | 2019-07-04 | DataVisor, Inc. | Detecting network attacks |
US10346450B2 (en) * | 2016-12-21 | 2019-07-09 | Ca, Inc. | Automatic datacenter state summarization |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10355999B2 (en) | 2015-09-23 | 2019-07-16 | Cisco Technology, Inc. | Flow control with network named fragments |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
CN110069480A (en) * | 2019-03-04 | 2019-07-30 | 广东恒睿科技有限公司 | A kind of parallel data cleaning method |
US10365780B2 (en) * | 2014-05-05 | 2019-07-30 | Adobe Inc. | Crowdsourcing for documents and forms |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10373091B2 (en) * | 2017-09-22 | 2019-08-06 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10372702B2 (en) * | 2016-12-28 | 2019-08-06 | Intel Corporation | Methods and apparatus for detecting anomalies in electronic data |
US20190251166A1 (en) * | 2018-02-15 | 2019-08-15 | International Business Machines Corporation | Topic kernelization for real-time conversation data |
US10387476B2 (en) * | 2015-11-24 | 2019-08-20 | International Business Machines Corporation | Semantic mapping of topic map meta-models identifying assets and events to include modeled reactive actions |
CN110162621A (en) * | 2019-02-22 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Disaggregated model training method, abnormal comment detection method, device and equipment |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US10397259B2 (en) | 2017-03-23 | 2019-08-27 | International Business Machines Corporation | Cyber security event detection |
US10394946B2 (en) | 2012-09-07 | 2019-08-27 | Splunk Inc. | Refining extraction rules based on selected text within events |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10404450B2 (en) | 2016-05-02 | 2019-09-03 | Cisco Technology, Inc. | Schematized access control in a content centric network |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10410135B2 (en) * | 2015-05-21 | 2019-09-10 | Software Ag Usa, Inc. | Systems and/or methods for dynamic anomaly detection in machine sensor data |
US10409980B2 (en) | 2012-12-27 | 2019-09-10 | Crowdstrike, Inc. | Real-time representation of security-relevant system state |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US10417569B2 (en) | 2014-01-26 | 2019-09-17 | International Business Machines Corporation | Detecting deviations between event log and process model |
US10419269B2 (en) | 2017-02-21 | 2019-09-17 | Entit Software Llc | Anomaly detection |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10425503B2 (en) | 2016-04-07 | 2019-09-24 | Cisco Technology, Inc. | Shared pending interest table in a content centric network |
US10423647B2 (en) | 2016-12-21 | 2019-09-24 | Ca, Inc. | Descriptive datacenter state comparison |
US10432605B1 (en) * | 2012-03-20 | 2019-10-01 | United Services Automobile Association (USAA) | Scalable risk-based authentication methods and systems |
US10432639B1 (en) * | 2017-05-04 | 2019-10-01 | Amazon Technologies, Inc. | Security management for graph analytics |
US10430839B2 (en) | 2012-12-12 | 2019-10-01 | Cisco Technology, Inc. | Distributed advertisement insertion in content-centric networks |
US10427048B1 (en) | 2015-03-27 | 2019-10-01 | Electronic Arts Inc. | Secure anti-cheat system |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10446135B2 (en) * | 2014-07-09 | 2019-10-15 | Genesys Telecommunications Laboratories, Inc. | System and method for semantically exploring concepts |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10447718B2 (en) | 2017-05-15 | 2019-10-15 | Forcepoint Llc | User profile definition and management |
US10447805B2 (en) | 2016-10-10 | 2019-10-15 | Cisco Technology, Inc. | Distributed consensus in a content centric network |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10454820B2 (en) | 2015-09-29 | 2019-10-22 | Cisco Technology, Inc. | System and method for stateless information-centric networking |
US20190325343A1 (en) * | 2018-04-19 | 2019-10-24 | National University Of Singapore | Machine learning using partial order hypergraphs |
US10462171B2 (en) | 2017-08-08 | 2019-10-29 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US10459827B1 (en) | 2016-03-22 | 2019-10-29 | Electronic Arts Inc. | Machine-learning based anomaly detection for heterogenous data sources |
US10460320B1 (en) | 2016-08-10 | 2019-10-29 | Electronic Arts Inc. | Fraud detection in heterogeneous information networks |
US10469514B2 (en) | 2014-06-23 | 2019-11-05 | Hewlett Packard Enterprise Development Lp | Collaborative and adaptive threat intelligence for computer security |
US10469523B2 (en) | 2016-02-24 | 2019-11-05 | Imperva, Inc. | Techniques for detecting compromises of enterprise end stations utilizing noisy tokens |
US20190342297A1 (en) * | 2018-05-01 | 2019-11-07 | Brighterion, Inc. | Securing internet-of-things with smart-agent technology |
US20190342308A1 (en) * | 2018-05-02 | 2019-11-07 | Sri International | Method of malware characterization and prediction |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10509712B2 (en) * | 2017-11-30 | 2019-12-17 | Vmware, Inc. | Methods and systems to determine baseline event-type distributions of event sources and detect changes in behavior of event sources |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10530786B2 (en) | 2017-05-15 | 2020-01-07 | Forcepoint Llc | Managing access to user profile information via a distributed transaction database |
US10528533B2 (en) * | 2017-02-09 | 2020-01-07 | Adobe Inc. | Anomaly detection at coarser granularity of data |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10540410B2 (en) * | 2017-11-15 | 2020-01-21 | Sap Se | Internet of things structured query language query formation |
US10547589B2 (en) | 2016-05-09 | 2020-01-28 | Cisco Technology, Inc. | System for implementing a small computer systems interface protocol over a content centric network |
US10552407B2 (en) * | 2014-02-07 | 2020-02-04 | Mackay Memorial Hospital | Computing device for data managing and decision making |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
WO2020033404A1 (en) * | 2018-08-07 | 2020-02-13 | Triad National Security, Llc | Modeling anomalousness of new subgraphs observed locally in a dynamic graph based on subgraph attributes |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
CN110826492A (en) * | 2019-11-07 | 2020-02-21 | 长沙品先信息技术有限公司 | Method for detecting abnormal behaviors of crowd in sensitive area based on behavior analysis |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US10587631B2 (en) * | 2013-03-11 | 2020-03-10 | Facebook, Inc. | Database attack detection tool |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10592666B2 (en) | 2017-08-31 | 2020-03-17 | Micro Focus Llc | Detecting anomalous entities |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10610144B2 (en) | 2015-08-19 | 2020-04-07 | Palo Alto Research Center Incorporated | Interactive remote patient monitoring and condition management intervention system |
US10623431B2 (en) * | 2017-05-15 | 2020-04-14 | Forcepoint Llc | Discerning psychological state from correlated user behavior and contextual information |
US10623428B2 (en) | 2016-09-12 | 2020-04-14 | Vectra Networks, Inc. | Method and system for detecting suspicious administrative activity |
US20200120151A1 (en) * | 2016-09-19 | 2020-04-16 | Ebay Inc. | Interactive Real-Time Visualization System for Large-Scale Streaming Data |
CN111026270A (en) * | 2019-12-09 | 2020-04-17 | 大连外国语大学 | User behavior pattern mining method under mobile context awareness environment |
US10628462B2 (en) * | 2016-06-27 | 2020-04-21 | Microsoft Technology Licensing, Llc | Propagating a status among related events |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
RU2721176C2 (en) * | 2016-03-04 | 2020-05-18 | Аксон Вайб Аг | Systems and methods for predicting user behavior based on location data |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679002B2 (en) * | 2017-04-13 | 2020-06-09 | International Business Machines Corporation | Text analysis of narrative documents |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10679067B2 (en) * | 2017-07-26 | 2020-06-09 | Peking University Shenzhen Graduate School | Method for detecting violent incident in video based on hypergraph transition |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US20200193264A1 (en) * | 2018-12-14 | 2020-06-18 | At&T Intellectual Property I, L.P. | Synchronizing virtual agent behavior bias to user context and personality attributes |
WO2020124010A1 (en) * | 2018-12-14 | 2020-06-18 | University Of Georgia Research Foundation, Inc. | Condition monitoring via energy consumption audit in electrical devices and electrical waveform audit in power networks |
CN111310178A (en) * | 2020-01-20 | 2020-06-19 | 武汉理工大学 | Firmware vulnerability detection method and system under cross-platform scene |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10701038B2 (en) | 2015-07-27 | 2020-06-30 | Cisco Technology, Inc. | Content negotiation in a content centric network |
US20200211141A1 (en) * | 2016-04-22 | 2020-07-02 | FiscalNote, Inc. | Systems and methods for analyzing policymaker influence |
CN111371594A (en) * | 2020-02-25 | 2020-07-03 | 成都西加云杉科技有限公司 | Equipment abnormity warning method and device and electronic equipment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US20200226512A1 (en) * | 2017-09-22 | 2020-07-16 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
CN111428049A (en) * | 2020-03-20 | 2020-07-17 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating event topic |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10726072B2 (en) | 2017-11-15 | 2020-07-28 | Sap Se | Internet of things search and discovery graph engine construction |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10742596B2 (en) | 2016-03-04 | 2020-08-11 | Cisco Technology, Inc. | Method and system for reducing a collision probability of hash-based names using a publisher identifier |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10749884B2 (en) | 2015-09-05 | 2020-08-18 | Mastercard Technologies Canada ULC | Systems and methods for detecting and preventing spoofing |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US10757133B2 (en) | 2014-02-21 | 2020-08-25 | Intuit Inc. | Method and system for creating and deploying virtual assets |
US10762200B1 (en) | 2019-05-20 | 2020-09-01 | Sentinel Labs Israel Ltd. | Systems and methods for executable code detection, automatic feature extraction and position independent code detection |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10769283B2 (en) | 2017-10-31 | 2020-09-08 | Forcepoint, LLC | Risk adaptive protection |
US10776708B2 (en) | 2013-03-01 | 2020-09-15 | Forcepoint, LLC | Analyzing behavior in light of social time |
US10776463B2 (en) * | 2015-08-12 | 2020-09-15 | Kryptowire LLC | Active authentication of users |
US10783200B2 (en) * | 2014-10-10 | 2020-09-22 | Salesforce.Com, Inc. | Systems and methods of de-duplicating similar news feed items |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
FR3094600A1 (en) * | 2019-03-29 | 2020-10-02 | Orange | Method of extracting at least one communication pattern in a communication network |
CN111767449A (en) * | 2020-06-30 | 2020-10-13 | 北京百度网讯科技有限公司 | User data processing method, device, computing equipment and medium |
US10802797B2 (en) | 2013-01-23 | 2020-10-13 | Splunk Inc. | Providing an extraction rule associated with a selected portion of an event |
US10803074B2 (en) | 2015-08-10 | 2020-10-13 | Hewlett Packard Enterprise Development LP | Evaluating system behaviour |
US10802872B2 (en) | 2018-09-12 | 2020-10-13 | At&T Intellectual Property I, L.P. | Task delegation and cooperation for automated assistants |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
WO2020214187A1 (en) * | 2019-04-19 | 2020-10-22 | Texas State University | Identifying and quantifying sentiment and promotion bias in social and content networks |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10832153B2 (en) | 2013-03-01 | 2020-11-10 | Forcepoint, LLC | Analyzing behavior in light of social time |
US10831785B2 (en) | 2016-04-11 | 2020-11-10 | International Business Machines Corporation | Identifying security breaches from clustering properties |
US20200356900A1 (en) * | 2019-05-07 | 2020-11-12 | Cerebri AI Inc. | Predictive, machine-learning, locale-aware computer models suitable for location- and trajectory-aware training sets |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US10841321B1 (en) * | 2017-03-28 | 2020-11-17 | Veritas Technologies Llc | Systems and methods for detecting suspicious users on networks |
CN111967362A (en) * | 2020-08-09 | 2020-11-20 | 电子科技大学 | Hypergraph feature fusion and ensemble learning human behavior identification method for wearable equipment |
US10846610B2 (en) * | 2016-02-05 | 2020-11-24 | Nec Corporation | Scalable system and method for real-time predictions and anomaly detection |
US10846623B2 (en) | 2014-10-15 | 2020-11-24 | Brighterion, Inc. | Data clean-up method for improving predictive model training |
US10848508B2 (en) | 2016-09-07 | 2020-11-24 | Patternex, Inc. | Method and system for generating synthetic feature vectors from real, labelled feature vectors in artificial intelligence training of a big data machine to defend |
US10853496B2 (en) | 2019-04-26 | 2020-12-01 | Forcepoint, LLC | Adaptive trust profile behavioral fingerprint |
US10862927B2 (en) | 2017-05-15 | 2020-12-08 | Forcepoint, LLC | Dividing events into sessions during adaptive trust profile operations |
US10866939B2 (en) * | 2015-11-30 | 2020-12-15 | Micro Focus Llc | Alignment and deduplication of time-series datasets |
US10868832B2 (en) | 2017-03-22 | 2020-12-15 | Ca, Inc. | Systems and methods for enforcing dynamic network security policies |
CN112084140A (en) * | 2020-09-03 | 2020-12-15 | 中国人民大学 | Fine-grained stream data processing method and system in heterogeneous system |
CN112115413A (en) * | 2020-09-07 | 2020-12-22 | 广西天懿智汇建设投资有限公司 | Termite quantity monitoring method based on iterative method |
CN112153343A (en) * | 2020-09-25 | 2020-12-29 | 北京百度网讯科技有限公司 | Elevator safety monitoring method and device, monitoring camera and storage medium |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
CN112214775A (en) * | 2020-10-09 | 2021-01-12 | 平安国际智慧城市科技股份有限公司 | Injection type attack method and device for graph data, medium and electronic equipment |
US10896421B2 (en) | 2014-04-02 | 2021-01-19 | Brighterion, Inc. | Smart retail analytics and commercial messaging |
CN112307435A (en) * | 2020-10-30 | 2021-02-02 | 三峡大学 | Method for judging and screening abnormal electricity consumption based on fuzzy clustering and trend |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
EP3772003A1 (en) * | 2019-08-02 | 2021-02-03 | CrowdStrike, Inc. | Mapping unbounded incident scores to a fixed range |
US10915643B2 (en) | 2017-05-15 | 2021-02-09 | Forcepoint, LLC | Adaptive trust profile endpoint architecture |
US10915435B2 (en) * | 2018-11-28 | 2021-02-09 | International Business Machines Corporation | Deep learning based problem advisor |
US10917423B2 (en) | 2017-05-15 | 2021-02-09 | Forcepoint, LLC | Intelligently differentiating between different types of states and attributes when using an adaptive trust profile |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US10929777B2 (en) | 2014-08-08 | 2021-02-23 | Brighterion, Inc. | Method of automating data science services |
CN112417099A (en) * | 2020-11-20 | 2021-02-26 | 南京邮电大学 | Method for constructing fraud user detection model based on graph attention network |
US10938817B2 (en) * | 2018-04-05 | 2021-03-02 | Accenture Global Solutions Limited | Data security and protection system using distributed ledgers to store validated data in a knowledge graph |
CN112463848A (en) * | 2020-11-05 | 2021-03-09 | 中国建设银行股份有限公司 | Method, system, device and storage medium for detecting abnormal user behavior |
US20210075812A1 (en) * | 2018-05-08 | 2021-03-11 | Abc Software, Sia | A system and a method for sequential anomaly revealing in a computer network |
US10949428B2 (en) | 2018-07-12 | 2021-03-16 | Forcepoint, LLC | Constructing event distributions via a streaming scoring operation |
US20210081554A1 (en) * | 2018-08-24 | 2021-03-18 | Bank Of America Corporation | Error detection of data leakage in a data processing system |
US20210081950A1 (en) * | 2018-08-15 | 2021-03-18 | Advanced New Technologies Co., Ltd. | Method and apparatus for identifying identity information |
US10956412B2 (en) | 2016-08-09 | 2021-03-23 | Cisco Technology, Inc. | Method and system for conjunctive normal form attribute matching in a content centric network |
CN112566307A (en) * | 2019-09-10 | 2021-03-26 | 酷矽半导体科技(上海)有限公司 | Safety display system and safety display method |
CN112579661A (en) * | 2019-09-29 | 2021-03-30 | 杭州海康威视数字技术股份有限公司 | Method and device for determining specific target pair, computer equipment and storage medium |
CN112580022A (en) * | 2020-12-07 | 2021-03-30 | 北京中电飞华通信有限公司 | Host system safety early warning method, device, equipment and storage medium |
US10972332B2 (en) * | 2015-08-31 | 2021-04-06 | Adobe Inc. | Identifying factors that contribute to a metric anomaly |
CN112651988A (en) * | 2021-01-13 | 2021-04-13 | 重庆大学 | Finger-shaped image segmentation, finger-shaped plate dislocation and fastener abnormality detection method based on double-pointer positioning |
US10977655B2 (en) | 2014-10-15 | 2021-04-13 | Brighterion, Inc. | Method for improving operating profits with better automated decision making with artificial intelligence |
US20210112074A1 (en) * | 2017-05-15 | 2021-04-15 | Forcepoint, LLC | Using a Behavior-Based Modifier When Generating a User Entity Risk Score |
US10984423B2 (en) | 2014-10-15 | 2021-04-20 | Brighterion, Inc. | Method of operating artificial intelligence machines to improve predictive model training and performance |
US10986113B2 (en) * | 2018-01-24 | 2021-04-20 | Hrl Laboratories, Llc | System for continuous validation and threat protection of mobile applications |
US10986121B2 (en) | 2019-01-24 | 2021-04-20 | Darktrace Limited | Multivariate network structure anomaly detector |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US10986110B2 (en) | 2017-04-26 | 2021-04-20 | Elasticsearch B.V. | Anomaly and causation detection in computing environments using counterfactual processing |
US20210126933A1 (en) * | 2018-06-22 | 2021-04-29 | Nec Corporation | Communication analysis apparatus, communication analysis method, communication environment analysis apparatus, communication environment analysis method, and program |
US10999317B2 (en) * | 2017-04-28 | 2021-05-04 | International Business Machines Corporation | Blockchain tracking of virtual universe traversal results |
US10997599B2 (en) | 2014-10-28 | 2021-05-04 | Brighterion, Inc. | Method for detecting merchant data breaches with a computer network server |
US10999297B2 (en) | 2017-05-15 | 2021-05-04 | Forcepoint, LLC | Using expected behavior of an entity when prepopulating an adaptive trust profile |
US11003717B1 (en) * | 2018-02-08 | 2021-05-11 | Amazon Technologies, Inc. | Anomaly detection in streaming graphs |
US11005738B1 (en) | 2014-04-09 | 2021-05-11 | Quest Software Inc. | System and method for end-to-end response-time analysis |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
CN112818017A (en) * | 2021-01-22 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Event data processing method and device |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
WO2021097041A1 (en) * | 2019-11-12 | 2021-05-20 | Aveva Software, Llc | Operational anomaly feedback loop system and method |
CN112837078A (en) * | 2021-03-03 | 2021-05-25 | 万商云集(成都)科技股份有限公司 | Cluster-based user abnormal behavior detection method |
US11017330B2 (en) | 2014-05-20 | 2021-05-25 | Elasticsearch B.V. | Method and system for analysing data |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US11025659B2 (en) | 2018-10-23 | 2021-06-01 | Forcepoint, LLC | Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors |
US11025638B2 (en) | 2018-07-19 | 2021-06-01 | Forcepoint, LLC | System and method providing security friction for atypical resource access requests |
US11023894B2 (en) | 2014-08-08 | 2021-06-01 | Brighterion, Inc. | Fast access vectors in real-time behavioral profiling in fraudulent financial transactions |
US11030886B2 (en) * | 2016-01-21 | 2021-06-08 | Hangzhou Hikvision Digital Technology Co., Ltd. | Method and device for updating online self-learning event detection model |
US11030527B2 (en) | 2015-07-31 | 2021-06-08 | Brighterion, Inc. | Method for calling for preemptive maintenance and for equipment failure prevention |
US20210174128A1 (en) * | 2018-09-19 | 2021-06-10 | Chenope, Inc. | System and Method for Detecting and Analyzing Digital Communications |
US20210174277A1 (en) * | 2019-08-09 | 2021-06-10 | Capital One Services, Llc | Compliance management for emerging risks |
CN112967062A (en) * | 2021-03-02 | 2021-06-15 | 东华大学 | User identity recognition method based on cautious degree |
US11044221B2 (en) * | 2012-12-12 | 2021-06-22 | Netspective Communications Llc | Integration of devices through a social networking platform |
CN113015195A (en) * | 2021-02-08 | 2021-06-22 | 安徽理工大学 | Wireless sensor network data acquisition method and system |
US11050768B1 (en) * | 2016-09-21 | 2021-06-29 | Amazon Technologies, Inc. | Detecting compute resource anomalies in a group of computing resources |
US20210203660A1 (en) * | 2018-05-25 | 2021-07-01 | Nippon Telegraph And Telephone Corporation | Specifying device, specifying method, and specifying program |
US11055754B1 (en) * | 2011-01-04 | 2021-07-06 | The Pnc Financial Services Group, Inc. | Alert event platform |
US11062336B2 (en) | 2016-03-07 | 2021-07-13 | Qbeats Inc. | Self-learning valuation |
US11062317B2 (en) | 2014-10-28 | 2021-07-13 | Brighterion, Inc. | Data breach detection |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
CN113168469A (en) * | 2018-12-10 | 2021-07-23 | 比特梵德知识产权管理有限公司 | System and method for behavioral threat detection |
US11075932B2 (en) | 2018-02-20 | 2021-07-27 | Darktrace Holdings Limited | Appliance extension for remote communication with a cyber security appliance |
US11080709B2 (en) | 2014-10-15 | 2021-08-03 | Brighterion, Inc. | Method of reducing financial losses in multiple payment channels upon a recognition of fraud first appearing in any one payment channel |
US11080032B1 (en) | 2020-03-31 | 2021-08-03 | Forcepoint Llc | Containerized infrastructure for deployment of microservices |
US11080793B2 (en) | 2014-10-15 | 2021-08-03 | Brighterion, Inc. | Method of personalizing, individualizing, and automating the management of healthcare fraud-waste-abuse to unique individual healthcare providers |
US11080109B1 (en) | 2020-02-27 | 2021-08-03 | Forcepoint Llc | Dynamically reweighting distributions of event observations |
EP3866394A1 (en) * | 2020-02-12 | 2021-08-18 | EXFO Solutions SAS | Detection, characterization, and prediction of real-time events occurring approximately periodically |
CN113283377A (en) * | 2021-06-10 | 2021-08-20 | 重庆师范大学 | Face privacy protection method, system, medium and electronic terminal |
US20210271769A1 (en) * | 2020-03-02 | 2021-09-02 | Forcepoint, LLC | Type-dependent event deduplication |
CN113344133A (en) * | 2021-06-30 | 2021-09-03 | 上海观安信息技术股份有限公司 | Method and system for detecting abnormal fluctuation of time sequence behavior |
US11115300B2 (en) * | 2017-09-12 | 2021-09-07 | Cisco Technology, Inc | Anomaly detection and reporting in a network assurance appliance |
US11115624B1 (en) | 2019-07-22 | 2021-09-07 | Salesloft, Inc. | Methods and systems for joining a conference |
WO2021188315A1 (en) * | 2020-03-19 | 2021-09-23 | Liveramp, Inc. | Cyber security system and method |
US11132681B2 (en) | 2018-07-06 | 2021-09-28 | At&T Intellectual Property I, L.P. | Services for entity trust conveyances |
US20210304207A1 (en) * | 2018-10-16 | 2021-09-30 | Mastercard International Incorporated | Systems and methods for monitoring machine learning systems |
CN113472582A (en) * | 2020-07-15 | 2021-10-01 | 北京沃东天骏信息技术有限公司 | System and method for alarm correlation and alarm aggregation in information technology monitoring |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US20210313056A1 (en) * | 2016-07-18 | 2021-10-07 | Abbyy Development Inc. | System and method for visual analysis of event sequences |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US11151502B2 (en) * | 2016-09-01 | 2021-10-19 | PagerDuty, Inc. | Real-time adaptive operations performance management system |
US11157846B2 (en) * | 2018-08-06 | 2021-10-26 | Sociometric Solutions, Inc. | System and method for transforming communication metadata and sensor data into an objective measure of the communication distribution of an organization |
US20210344695A1 (en) * | 2020-04-30 | 2021-11-04 | International Business Machines Corporation | Anomaly detection using an ensemble of models |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11170034B1 (en) * | 2020-09-21 | 2021-11-09 | Foxit Software Inc. | System and method for determining credibility of content in a number of documents |
US11171980B2 (en) * | 2018-11-02 | 2021-11-09 | Forcepoint Llc | Contagion risk detection, analysis and protection |
TWI746914B (en) * | 2017-12-28 | 2021-11-21 | 國立臺灣大學 | Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge |
US11179639B1 (en) | 2015-10-30 | 2021-11-23 | Electronic Arts Inc. | Fraud detection system |
US11184404B1 (en) * | 2018-09-07 | 2021-11-23 | Salt Stack, Inc. | Performing idempotent operations to scan and remediate configuration settings of a device |
CN113704233A (en) * | 2021-10-29 | 2021-11-26 | 飞狐信息技术(天津)有限公司 | Keyword detection method and system |
US11190589B1 (en) | 2020-10-27 | 2021-11-30 | Forcepoint, LLC | System and method for efficient fingerprinting in cloud multitenant data loss prevention |
CN113726814A (en) * | 2021-09-09 | 2021-11-30 | 中国电信股份有限公司 | User abnormal behavior identification method, device, equipment and storage medium |
US11188864B2 (en) * | 2016-06-27 | 2021-11-30 | International Business Machines Corporation | Calculating an expertise score from aggregated employee data |
US20210392152A1 (en) * | 2018-05-25 | 2021-12-16 | At&T Intellectual Property I, L.P. | Intrusion detection using robust singular value decomposition |
US11204929B2 (en) * | 2014-11-18 | 2021-12-21 | International Business Machines Corporation | Evidence aggregation across heterogeneous links for intelligence gathering using a question answering system |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
CN113849497A (en) * | 2021-08-02 | 2021-12-28 | 跨境云(横琴)科技创新研究中心有限公司 | Attribute weight and rule driving based exception aggregation method and system |
CN113869415A (en) * | 2021-09-28 | 2021-12-31 | 华中师范大学 | Problem behavior detection and early warning system |
US11218497B2 (en) | 2017-02-20 | 2022-01-04 | Micro Focus Llc | Reporting behavior anomalies |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11223646B2 (en) | 2020-01-22 | 2022-01-11 | Forcepoint, LLC | Using concerning behaviors when performing entity-based risk calculations |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
CN113963020A (en) * | 2021-09-18 | 2022-01-21 | 江苏大学 | Multi-intelligent-network-connected automobile cooperative target tracking method based on hypergraph matching |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US11232111B2 (en) | 2019-04-14 | 2022-01-25 | Zoominfo Apollo Llc | Automated company matching |
US11232364B2 (en) | 2017-04-03 | 2022-01-25 | DataVisor, Inc. | Automated rule recommendation engine |
US20220027575A1 (en) * | 2020-10-14 | 2022-01-27 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method of predicting emotional style of dialogue, electronic device, and storage medium |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11238351B2 (en) | 2014-11-19 | 2022-02-01 | International Business Machines Corporation | Grading sources and managing evidence for intelligence analysis |
US11244113B2 (en) | 2014-11-19 | 2022-02-08 | International Business Machines Corporation | Evaluating evidential links based on corroboration for intelligence analysis |
US11243849B2 (en) * | 2012-12-27 | 2022-02-08 | Commvault Systems, Inc. | Restoration of centralized data storage manager, such as data storage manager in a hierarchical data storage system |
US11244374B2 (en) * | 2018-08-31 | 2022-02-08 | Realm Ip, Llc | System and machine implemented method for adaptive collaborative matching |
US11243922B2 (en) * | 2016-12-01 | 2022-02-08 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for migrating data node in database cluster |
CN114039744A (en) * | 2021-09-29 | 2022-02-11 | 中孚信息股份有限公司 | Abnormal behavior prediction method and system based on user characteristic label |
US11258806B1 (en) * | 2019-06-24 | 2022-02-22 | Mandiant, Inc. | System and method for automatically associating cybersecurity intelligence to cyberthreat actors |
US11263650B2 (en) * | 2016-04-25 | 2022-03-01 | [24]7.ai, Inc. | Process and system to categorize, evaluate and optimize a customer experience |
US11265225B2 (en) * | 2018-10-28 | 2022-03-01 | Netz Forecasts Ltd. | Systems and methods for prediction of anomalies |
US11269943B2 (en) * | 2018-07-26 | 2022-03-08 | JANZZ Ltd | Semantic matching system and method |
US11271991B2 (en) | 2018-04-19 | 2022-03-08 | Pinx, Inc. | Systems, methods and media for a distributed social media network and system of record |
US11270211B2 (en) * | 2018-02-05 | 2022-03-08 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11275788B2 (en) * | 2019-10-21 | 2022-03-15 | International Business Machines Corporation | Controlling information stored in multiple service computing systems |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
WO2022061244A1 (en) * | 2020-09-18 | 2022-03-24 | Ethimetrix Llc | System and method for predictive corruption risk assessment |
US11288231B2 (en) * | 2014-07-09 | 2022-03-29 | Splunk Inc. | Reproducing datasets generated by alert-triggering search queries |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11297088B2 (en) * | 2015-10-28 | 2022-04-05 | Qomplx, Inc. | System and method for comprehensive data loss prevention and compliance management |
US11294700B2 (en) | 2014-04-18 | 2022-04-05 | Intuit Inc. | Method and system for enabling self-monitoring virtual assets to correlate external events with characteristic patterns associated with the virtual assets |
US20220108330A1 (en) * | 2020-10-06 | 2022-04-07 | Rebecca Mendoza Saltiel | Interactive and iterative behavioral model, system, and method for detecting fraud, waste, abuse and anomaly |
CN114329455A (en) * | 2022-03-08 | 2022-04-12 | 北京大学 | User abnormal behavior detection method and device based on heterogeneous graph embedding |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
CN114356642A (en) * | 2022-03-11 | 2022-04-15 | 军事科学院系统工程研究院网络信息研究所 | Abnormal event automatic diagnosis method and system based on process mining |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
CN114373186A (en) * | 2022-01-11 | 2022-04-19 | 北京新学堂网络科技有限公司 | Social software information interaction method, device and medium |
US20220121410A1 (en) * | 2016-03-31 | 2022-04-21 | Splunk Inc. | Technology add-on interface |
US11314787B2 (en) | 2018-04-18 | 2022-04-26 | Forcepoint, LLC | Temporal resolution of an entity |
US11323463B2 (en) * | 2019-06-14 | 2022-05-03 | Datadog, Inc. | Generating data structures representing relationships among entities of a high-scale network infrastructure |
US11329933B1 (en) * | 2020-12-28 | 2022-05-10 | Drift.com, Inc. | Persisting an AI-supported conversation across multiple channels |
US11330005B2 (en) * | 2019-04-15 | 2022-05-10 | Vectra Ai, Inc. | Privileged account breach detections based on behavioral access patterns |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11348110B2 (en) | 2014-08-08 | 2022-05-31 | Brighterion, Inc. | Artificial intelligence fraud management solution |
US11348016B2 (en) * | 2016-09-21 | 2022-05-31 | Scianta Analytics, LLC | Cognitive modeling apparatus for assessing values qualitatively across a multiple dimension terrain |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11372867B2 (en) * | 2020-09-09 | 2022-06-28 | Citrix Systems, Inc. | Bootstrapped relevance scoring system |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US20220222670A1 (en) * | 2021-01-08 | 2022-07-14 | Feedzai - Consultadoria e Inovacao Tecnologica, S. A. | Generation of divergence distributions for automated data analysis |
US11394725B1 (en) * | 2017-05-03 | 2022-07-19 | Hrl Laboratories, Llc | Method and system for privacy-preserving targeted substructure discovery on multiplex networks |
US20220232353A1 (en) * | 2021-01-19 | 2022-07-21 | Gluroo Imaginations, Inc. | Messaging-based logging and alerting system |
US11405368B2 (en) * | 2014-11-14 | 2022-08-02 | William J. Ziebell | Systems, methods, and media for a cloud based social media network |
US11411973B2 (en) | 2018-08-31 | 2022-08-09 | Forcepoint, LLC | Identifying security risks using distributions of characteristic features extracted from a plurality of events |
US20220261406A1 (en) * | 2021-02-18 | 2022-08-18 | Walmart Apollo, Llc | Methods and apparatus for improving search retrieval |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11429697B2 (en) | 2020-03-02 | 2022-08-30 | Forcepoint, LLC | Eventually consistent entity resolution |
WO2022180613A1 (en) * | 2021-02-26 | 2022-09-01 | Trackerdetect Ltd | Global iterative clustering algorithm to model entities' behaviors and detect anomalies |
US11436512B2 (en) | 2018-07-12 | 2022-09-06 | Forcepoint, LLC | Generating extracted features from an event |
CN115022055A (en) * | 2022-06-09 | 2022-09-06 | 武汉思普崚技术有限公司 | Network attack real-time detection method and device based on dynamic time window |
US11436656B2 (en) | 2016-03-18 | 2022-09-06 | Palo Alto Research Center Incorporated | System and method for a real-time egocentric collaborative filter on large datasets |
US20220286472A1 (en) * | 2021-03-04 | 2022-09-08 | Qatar Foundation For Education, Science And Community Development | Anomalous user account detection systems and methods |
US11443035B2 (en) * | 2019-04-12 | 2022-09-13 | Mcafee, Llc | Behavioral user security policy |
CN115081468A (en) * | 2021-03-15 | 2022-09-20 | 天津大学 | Multi-task convolutional neural network fault diagnosis method based on knowledge migration |
US20220309155A1 (en) * | 2021-03-24 | 2022-09-29 | International Business Machines Corporation | Defending against adversarial queries in a data governance system |
US20220309387A1 (en) * | 2021-03-26 | 2022-09-29 | Capital One Services, Llc | Computer-based systems for metadata-based anomaly detection and methods of use thereof |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US11463457B2 (en) | 2018-02-20 | 2022-10-04 | Darktrace Holdings Limited | Artificial intelligence (AI) based cyber threat analyst to support a cyber security appliance |
US11470103B2 (en) | 2016-02-09 | 2022-10-11 | Darktrace Holdings Limited | Anomaly alert system for cyber threat detection |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11477235B2 (en) | 2020-02-28 | 2022-10-18 | Abnormal Security Corporation | Approaches to creating, managing, and applying a federated database to establish risk posed by third parties |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11477222B2 (en) | 2018-02-20 | 2022-10-18 | Darktrace Holdings Limited | Cyber threat defense system protecting email networks with machine learning models using a range of metadata from observed email communications |
US20220335042A1 (en) * | 2017-04-27 | 2022-10-20 | Google Llc | Cloud inference system |
US11481186B2 (en) | 2018-10-25 | 2022-10-25 | At&T Intellectual Property I, L.P. | Automated assistant context and protocol |
US11481709B1 (en) | 2021-05-20 | 2022-10-25 | Netskope, Inc. | Calibrating user confidence in compliance with an organization's security policies |
US11481485B2 (en) * | 2020-01-08 | 2022-10-25 | Visa International Service Association | Methods and systems for peer grouping in insider threat detection |
US20220345476A1 (en) * | 2018-06-06 | 2022-10-27 | Reliaquest Holdings, Llc | Threat mitigation system and method |
US20220342966A1 (en) * | 2020-03-02 | 2022-10-27 | Abnormal Security Corporation | Multichannel threat detection for protecting against account compromise |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US11494711B2 (en) | 2014-11-19 | 2022-11-08 | Shoobx, Inc. | Computer-guided corporate relationship management |
US11494275B1 (en) * | 2020-03-30 | 2022-11-08 | Rapid7, Inc. | Automated log entry identification and alert management |
US11494381B1 (en) * | 2021-01-29 | 2022-11-08 | Splunk Inc. | Ingestion and processing of both cloud-based and non-cloud-based data by a data intake and query system |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11507663B2 (en) | 2014-08-11 | 2022-11-22 | Sentinel Labs Israel Ltd. | Method of remediating operations performed by a program and system thereof |
US20220374434A1 (en) * | 2021-05-19 | 2022-11-24 | Crowdstrike, Inc. | Real-time streaming graph queries |
WO2022246131A1 (en) * | 2021-05-20 | 2022-11-24 | Netskope, Inc. | Scoring confidence in user compliance with an organization's security policies |
US11516237B2 (en) | 2019-08-02 | 2022-11-29 | Crowdstrike, Inc. | Visualization and control of remotely monitored hosts |
US11516206B2 (en) | 2020-05-01 | 2022-11-29 | Forcepoint Llc | Cybersecurity system having digital certificate reputation system |
CN115440390A (en) * | 2022-11-09 | 2022-12-06 | 山东大学 | Method, system, equipment and storage medium for predicting number of cases of infectious diseases |
US11522898B1 (en) * | 2018-12-17 | 2022-12-06 | Wells Fargo Bank, N.A. | Autonomous configuration modeling and management |
US11526750B2 (en) | 2018-10-29 | 2022-12-13 | Zoominfo Apollo Llc | Automated industry classification with deep learning |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11544390B2 (en) | 2020-05-05 | 2023-01-03 | Forcepoint Llc | Method, system, and apparatus for probabilistic identification of encrypted files |
US20230004889A1 (en) * | 2017-09-22 | 2023-01-05 | 1Nteger, Llc | Systems and methods for risk data navigation |
US11552969B2 (en) | 2018-12-19 | 2023-01-10 | Abnormal Security Corporation | Threat detection platforms for detecting, characterizing, and remediating email-based threats in real time |
US11563756B2 (en) | 2020-04-15 | 2023-01-24 | Crowdstrike, Inc. | Distributed digital security system |
US11567847B2 (en) * | 2020-02-04 | 2023-01-31 | International Business Machines Corporation | Identifying anomalous device usage based on usage patterns
US11568136B2 (en) | 2020-04-15 | 2023-01-31 | Forcepoint Llc | Automatically constructing lexicons from unlabeled datasets |
US11568277B2 (en) * | 2018-12-16 | 2023-01-31 | Intuit Inc. | Method and apparatus for detecting anomalies in mission critical environments using word representation learning |
US11579857B2 (en) | 2020-12-16 | 2023-02-14 | Sentinel Labs Israel Ltd. | Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach |
US11582246B2 (en) | 2019-08-02 | 2023-02-14 | Crowdstrike, Inc. | Advanced incident scoring
US11588832B2 (en) | 2019-08-02 | 2023-02-21 | Crowdstrike, Inc. | Malicious incident visualization |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US11586878B1 (en) | 2021-12-10 | 2023-02-21 | Salesloft, Inc. | Methods and systems for cascading model architecture for providing information on reply emails |
CN115766145A (en) * | 2022-11-04 | 2023-03-07 | 中国电信股份有限公司 | Abnormality detection method and apparatus, and computer-readable storage medium |
US11605100B1 (en) * | 2017-12-22 | 2023-03-14 | Salesloft, Inc. | Methods and systems for determining cadences |
US11616812B2 (en) | 2016-12-19 | 2023-03-28 | Attivo Networks Inc. | Deceiving attackers accessing active directory data |
US11616790B2 (en) | 2020-04-15 | 2023-03-28 | Crowdstrike, Inc. | Distributed digital security system |
US20230100315A1 (en) * | 2021-09-28 | 2023-03-30 | Centurylink Intellectual Property Llc | Pattern Identification for Incident Prediction and Resolution |
US11621969B2 (en) | 2017-04-26 | 2023-04-04 | Elasticsearch B.V. | Clustering and outlier detection in anomaly and causation detection for computing environments |
US11631014B2 (en) * | 2019-08-02 | 2023-04-18 | Capital One Services, Llc | Computer-based systems configured for detecting, classifying, and visualizing events in large-scale, multivariate and multidimensional datasets and methods of use thereof |
CN115981970A (en) * | 2023-03-20 | 2023-04-18 | 建信金融科技有限责任公司 | Operation and maintenance data analysis method, device, equipment and medium |
US11630901B2 (en) | 2020-02-03 | 2023-04-18 | Forcepoint Llc | External trigger induced behavioral analyses |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11645397B2 (en) | 2020-04-15 | 2023-05-09 | Crowdstrike, Inc. | Distributed digital security system
US20230141849A1 (en) * | 2021-11-10 | 2023-05-11 | International Business Machines Corporation | Workflow management based on recognition of content of documents |
US11651149B1 (en) | 2012-09-07 | 2023-05-16 | Splunk Inc. | Event selection via graphical user interface control |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11663405B2 (en) * | 2018-12-13 | 2023-05-30 | Microsoft Technology Licensing, Llc | Machine learning applications for temporally-related events |
CN116232921A (en) * | 2023-05-08 | 2023-06-06 | 中国电信股份有限公司四川分公司 | Deterministic network data set construction device and method based on hypergraph |
US11683284B2 (en) | 2020-10-23 | 2023-06-20 | Abnormal Security Corporation | Discovering graymail through real-time analysis of incoming email |
CN116304641A (en) * | 2023-05-15 | 2023-06-23 | 山东省计算中心(国家超级计算济南中心) | Anomaly detection interpretation method and system based on reference point search and feature interaction |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11693958B1 (en) * | 2022-09-08 | 2023-07-04 | Radiant Security, Inc. | Processing and storing event data in a knowledge graph format for anomaly detection |
CN116386045A (en) * | 2023-06-01 | 2023-07-04 | 创域智能(常熟)网联科技有限公司 | Sensor information analysis method based on artificial intelligence and artificial intelligence platform system |
US11695800B2 (en) | 2016-12-19 | 2023-07-04 | SentinelOne, Inc. | Deceiving attackers accessing network data |
US11704387B2 (en) | 2020-08-28 | 2023-07-18 | Forcepoint Llc | Method and system for fuzzy matching and alias matching for streaming data sets |
US11711379B2 (en) | 2020-04-15 | 2023-07-25 | Crowdstrike, Inc. | Distributed digital security system |
US11709944B2 (en) | 2019-08-29 | 2023-07-25 | Darktrace Holdings Limited | Intelligent adversary simulator |
US11720599B1 (en) | 2014-02-13 | 2023-08-08 | Pivotal Software, Inc. | Clustering and visualizing alerts and incidents |
US11743294B2 (en) | 2018-12-19 | 2023-08-29 | Abnormal Security Corporation | Retrospective learning of communication patterns by machine learning models for discovering abnormal behavior |
US11755585B2 (en) | 2018-07-12 | 2023-09-12 | Forcepoint Llc | Generating enriched events using enriched data and extracted features |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11785098B2 (en) | 2019-09-30 | 2023-10-10 | Atlassian Pty Ltd. | Systems and methods for personalization of a computer application |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US20230344841A1 (en) * | 2016-06-06 | 2023-10-26 | Netskope, Inc. | Machine learning based anomaly detection initialization |
US11811804B1 (en) * | 2020-12-15 | 2023-11-07 | Red Hat, Inc. | System and method for detecting process anomalies in a distributed computation system utilizing containers |
US11810012B2 (en) | 2018-07-12 | 2023-11-07 | Forcepoint Llc | Identifying event distributions using interrelated events |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
CN117075872A (en) * | 2023-10-17 | 2023-11-17 | 北京长亭科技有限公司 | Method and device for creating security base line based on dynamic parameters |
US11831661B2 (en) | 2021-06-03 | 2023-11-28 | Abnormal Security Corporation | Multi-tiered approach to payload detection for incoming communications |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11836211B2 (en) | 2014-11-21 | 2023-12-05 | International Business Machines Corporation | Generating additional lines of questioning based on evaluation of a hypothetical link between concept entities in evidential data |
CN117290800A (en) * | 2023-11-24 | 2023-12-26 | 华东交通大学 | Timing sequence anomaly detection method and system based on hypergraph attention network |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11861019B2 (en) | 2020-04-15 | 2024-01-02 | Crowdstrike, Inc. | Distributed digital security system |
CN117372076A (en) * | 2023-08-23 | 2024-01-09 | 广东烟草广州市有限公司 | Abnormal transaction data monitoring method, device, equipment and storage medium |
US20240013075A1 (en) * | 2022-05-25 | 2024-01-11 | Tsinghua University | Method and apparatus for semantic analysis on confrontation scenario based on target-attribute-relation |
CN117421459A (en) * | 2023-12-14 | 2024-01-19 | 成都智慧锦城大数据有限公司 | Data mining method and system applied to digital city |
US11888859B2 (en) | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Associating a security risk persona with a phase of a cyber kill chain |
US11888897B2 (en) | 2018-02-09 | 2024-01-30 | SentinelOne, Inc. | Implementing decoys in a network environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US20240039914A1 (en) * | 2020-06-29 | 2024-02-01 | Cyral Inc. | Non-in line data monitoring and security services |
US11895158B2 (en) | 2020-05-19 | 2024-02-06 | Forcepoint Llc | Cybersecurity system having security policy visualization |
US11899782B1 (en) | 2021-07-13 | 2024-02-13 | SentinelOne, Inc. | Preserving DLL hooks |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US20240070130A1 (en) * | 2022-08-30 | 2024-02-29 | Charter Communications Operating, Llc | Methods And Systems For Identifying And Correcting Anomalies In A Data Environment |
US11924238B2 (en) | 2018-02-20 | 2024-03-05 | Darktrace Holdings Limited | Cyber threat defense system, components, and a method for using artificial intelligence models trained on a normal pattern of life for systems with unusual data sources |
US11936667B2 (en) | 2020-02-28 | 2024-03-19 | Darktrace Holdings Limited | Cyber security system applying network sequence prediction using transformers |
US11936668B2 (en) | 2021-08-17 | 2024-03-19 | International Business Machines Corporation | Identifying credential attacks on encrypted network traffic |
US11935522B2 (en) | 2020-06-11 | 2024-03-19 | Capital One Services, Llc | Cognitive analysis of public communications |
US11948048B2 (en) | 2014-04-02 | 2024-04-02 | Brighterion, Inc. | Artificial intelligence for context classifier |
US11949713B2 (en) | 2020-03-02 | 2024-04-02 | Abnormal Security Corporation | Abuse mailbox for facilitating discovery, investigation, and analysis of email-based threats |
CN117851958A (en) * | 2024-03-07 | 2024-04-09 | 中国人民解放军国防科技大学 | FHGS-based dynamic network edge anomaly detection method, device and equipment |
US11962552B2 (en) | 2018-02-20 | 2024-04-16 | Darktrace Holdings Limited | Endpoint agent extension of a machine learning cyber defense system for email |
CN117909912A (en) * | 2024-03-19 | 2024-04-19 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Detection method and system for two-stage abnormal user behavior analysis |
US11973772B2 (en) | 2018-12-19 | 2024-04-30 | Abnormal Security Corporation | Multistage analysis of emails to identify security threats |
US11973774B2 (en) | 2020-02-28 | 2024-04-30 | Darktrace Holdings Limited | Multi-stage anomaly detection for process chains in multi-host environments |
US11985142B2 (en) | 2020-02-28 | 2024-05-14 | Darktrace Holdings Limited | Method and system for determining and acting on a structured document cyber threat risk |
US20240160625A1 (en) * | 2022-11-10 | 2024-05-16 | Bank Of America Corporation | Event-driven batch processing system with granular operational access |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
CN118233317A (en) * | 2024-05-23 | 2024-06-21 | 四川大学 | Topology confusion defense method based on time-based network inference |
CN118297287A (en) * | 2024-06-05 | 2024-07-05 | 宁波财经学院 | Intelligent campus system based on student information index |
US12034767B2 (en) | 2019-08-29 | 2024-07-09 | Darktrace Holdings Limited | Artificial intelligence adversary red team |
US12058163B2 (en) | 2021-08-10 | 2024-08-06 | CyberSaint, Inc. | Systems, media, and methods for utilizing a crosswalk algorithm to identify controls across frameworks, and for utilizing identified controls to generate cybersecurity risk assessments |
US12063243B2 (en) | 2018-02-20 | 2024-08-13 | Darktrace Holdings Limited | Autonomous email report generator |
US12081522B2 (en) | 2020-02-21 | 2024-09-03 | Abnormal Security Corporation | Discovering email account compromise through assessments of digital activities |
US12126636B2 (en) | 2016-02-09 | 2024-10-22 | Darktrace Holdings Limited | Anomaly alert system for cyber threat detection |
Families Citing this family (279)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9710852B1 (en) | 2002-05-30 | 2017-07-18 | Consumerinfo.Com, Inc. | Credit report timeline user interface |
US9400589B1 (en) | 2002-05-30 | 2016-07-26 | Consumerinfo.Com, Inc. | Circular rotational interface for display of consumer credit information |
US8346593B2 (en) | 2004-06-30 | 2013-01-01 | Experian Marketing Solutions, Inc. | System, method, and software for prediction of attitudinal and message responsiveness |
US8732004B1 (en) | 2004-09-22 | 2014-05-20 | Experian Information Solutions, Inc. | Automated analysis of data to generate prospect notifications based on trigger events |
US8036979B1 (en) | 2006-10-05 | 2011-10-11 | Experian Information Solutions, Inc. | System and method for generating a finance attribute from tradeline data |
US8606666B1 (en) | 2007-01-31 | 2013-12-10 | Experian Information Solutions, Inc. | System and method for providing an aggregation tool |
US8606626B1 (en) | 2007-01-31 | 2013-12-10 | Experian Information Solutions, Inc. | Systems and methods for providing a direct marketing campaign planning environment |
US8156064B2 (en) | 2007-07-05 | 2012-04-10 | Brown Stephen J | Observation-based user profiling and profile matching |
US7996521B2 (en) | 2007-11-19 | 2011-08-09 | Experian Marketing Solutions, Inc. | Service for mapping IP addresses to user segments |
US9990674B1 (en) | 2007-12-14 | 2018-06-05 | Consumerinfo.Com, Inc. | Card registry systems and methods |
US8127986B1 (en) | 2007-12-14 | 2012-03-06 | Consumerinfo.Com, Inc. | Card registry systems and methods |
US9842204B2 (en) | 2008-04-01 | 2017-12-12 | Nudata Security Inc. | Systems and methods for assessing security risk |
CA2924049C (en) | 2008-04-01 | 2019-10-29 | Nudata Security Inc. | Systems and methods for implementing and tracking identification tests |
US8312033B1 (en) | 2008-06-26 | 2012-11-13 | Experian Marketing Solutions, Inc. | Systems and methods for providing an integrated identifier |
US9256904B1 (en) | 2008-08-14 | 2016-02-09 | Experian Information Solutions, Inc. | Multi-bureau credit file freeze and unfreeze |
US8060424B2 (en) | 2008-11-05 | 2011-11-15 | Consumerinfo.Com, Inc. | On-line method and system for monitoring and reporting unused available credit |
WO2010132492A2 (en) | 2009-05-11 | 2010-11-18 | Experian Marketing Solutions, Inc. | Systems and methods for providing anonymized user profile data |
US9652802B1 (en) | 2010-03-24 | 2017-05-16 | Consumerinfo.Com, Inc. | Indirect monitoring and reporting of a user's credit data |
CN102314633A (en) * | 2010-06-30 | 2012-01-11 | International Business Machines Corporation | Apparatus and method for analyzing data to be processed
US9152727B1 (en) | 2010-08-23 | 2015-10-06 | Experian Marketing Solutions, Inc. | Systems and methods for processing consumer information for targeted marketing applications |
US20150235312A1 (en) * | 2014-02-14 | 2015-08-20 | Stephen Dodson | Method and Apparatus for Detecting Rogue Trading Activity |
US9912718B1 (en) * | 2011-04-11 | 2018-03-06 | Viasat, Inc. | Progressive prefetching |
US9665854B1 (en) | 2011-06-16 | 2017-05-30 | Consumerinfo.Com, Inc. | Authentication alerts |
US9483606B1 (en) | 2011-07-08 | 2016-11-01 | Consumerinfo.Com, Inc. | Lifescore |
US9106691B1 (en) | 2011-09-16 | 2015-08-11 | Consumerinfo.Com, Inc. | Systems and methods of identity protection and management |
US8738516B1 (en) | 2011-10-13 | 2014-05-27 | Consumerinfo.Com, Inc. | Debt services candidate locator |
US9853959B1 (en) | 2012-05-07 | 2017-12-26 | Consumerinfo.Com, Inc. | Storage and maintenance of personal data |
US9654541B1 (en) | 2012-11-12 | 2017-05-16 | Consumerinfo.Com, Inc. | Aggregating user web browsing data |
US9916621B1 (en) | 2012-11-30 | 2018-03-13 | Consumerinfo.Com, Inc. | Presentation of credit score factors |
US10255598B1 (en) | 2012-12-06 | 2019-04-09 | Consumerinfo.Com, Inc. | Credit card account data extraction |
US20140222476A1 (en) * | 2013-02-06 | 2014-08-07 | Verint Systems Ltd. | Anomaly Detection in Interaction Data |
US10353765B2 (en) * | 2013-03-08 | 2019-07-16 | Insyde Software Corp. | Method and device to perform event thresholding in a firmware environment utilizing a scalable sliding time-window |
US20140257919A1 (en) * | 2013-03-09 | 2014-09-11 | Hewlett-Packard Development Company, L.P. | Reward population grouping
US9406085B1 (en) | 2013-03-14 | 2016-08-02 | Consumerinfo.Com, Inc. | System and methods for credit dispute processing, resolution, and reporting |
US9870589B1 (en) | 2013-03-14 | 2018-01-16 | Consumerinfo.Com, Inc. | Credit utilization tracking and reporting |
US10102570B1 (en) | 2013-03-14 | 2018-10-16 | Consumerinfo.Com, Inc. | Account vulnerability alerts |
US10685398B1 (en) | 2013-04-23 | 2020-06-16 | Consumerinfo.Com, Inc. | Presenting credit score information |
US9444722B2 (en) | 2013-08-01 | 2016-09-13 | Palo Alto Research Center Incorporated | Method and apparatus for configuring routing paths in a custodian-based routing architecture |
US9443268B1 (en) | 2013-08-16 | 2016-09-13 | Consumerinfo.Com, Inc. | Bill payment and reporting |
US9311377B2 (en) | 2013-11-13 | 2016-04-12 | Palo Alto Research Center Incorporated | Method and apparatus for performing server handoff in a name-based content distribution system |
US10101801B2 (en) | 2013-11-13 | 2018-10-16 | Cisco Technology, Inc. | Method and apparatus for prefetching content in a data stream |
US10325314B1 (en) | 2013-11-15 | 2019-06-18 | Consumerinfo.Com, Inc. | Payment reporting systems |
US9477737B1 (en) | 2013-11-20 | 2016-10-25 | Consumerinfo.Com, Inc. | Systems and user interfaces for dynamic access of multiple remote databases and synchronization of data based on user rules |
US10262362B1 (en) | 2014-02-14 | 2019-04-16 | Experian Information Solutions, Inc. | Automatic generation of code for attributes |
USD759689S1 (en) | 2014-03-25 | 2016-06-21 | Consumerinfo.Com, Inc. | Display screen or portion thereof with graphical user interface |
USD760256S1 (en) | 2014-03-25 | 2016-06-28 | Consumerinfo.Com, Inc. | Display screen or portion thereof with graphical user interface |
USD759690S1 (en) | 2014-03-25 | 2016-06-21 | Consumerinfo.Com, Inc. | Display screen or portion thereof with graphical user interface |
US9892457B1 (en) | 2014-04-16 | 2018-02-13 | Consumerinfo.Com, Inc. | Providing credit data in search results |
WO2015160367A1 (en) * | 2014-04-18 | 2015-10-22 | Hewlett-Packard Development Company, L.P. | Pre-cognitive security information and event management |
EP3152697A4 (en) * | 2014-06-09 | 2018-04-11 | Northrop Grumman Systems Corporation | System and method for real-time detection of anomalies in database usage |
US11257117B1 (en) | 2014-06-25 | 2022-02-22 | Experian Information Solutions, Inc. | Mobile device sighting location analytics and profiling system |
US10296861B2 (en) * | 2014-10-31 | 2019-05-21 | Microsoft Technology Licensing, Llc | Identifying the effectiveness of a meeting from a meetings graph |
WO2016073457A2 (en) | 2014-11-03 | 2016-05-12 | Level 3 Communications, Llc | Identifying a potential ddos attack using statistical analysis |
US10389573B2 (en) | 2014-11-14 | 2019-08-20 | Apstra, Inc. | Configuring a network |
US9232052B1 (en) * | 2014-11-21 | 2016-01-05 | Marchex, Inc. | Analyzing voice characteristics to detect fraudulent call activity and take corrective action without using recording, transcription or caller ID |
US9904584B2 (en) | 2014-11-26 | 2018-02-27 | Microsoft Technology Licensing, Llc | Performance anomaly diagnosis |
US10127903B2 (en) | 2014-12-02 | 2018-11-13 | International Business Machines Corporation | Discovering windows in temporal predicates |
US10467536B1 (en) | 2014-12-12 | 2019-11-05 | Go Daddy Operating Company, LLC | Domain name generation and ranking |
US9787634B1 (en) * | 2014-12-12 | 2017-10-10 | Go Daddy Operating Company, LLC | Suggesting domain names based on recognized user patterns |
US9990432B1 (en) | 2014-12-12 | 2018-06-05 | Go Daddy Operating Company, LLC | Generic folksonomy for concept-based domain name searches |
US10242019B1 (en) | 2014-12-19 | 2019-03-26 | Experian Information Solutions, Inc. | User behavior segmentation using latent topic detection |
US11556876B2 (en) * | 2015-01-05 | 2023-01-17 | International Business Machines Corporation | Detecting business anomalies utilizing information velocity and other parameters using statistical analysis |
US9853873B2 (en) | 2015-01-10 | 2017-12-26 | Cisco Technology, Inc. | Diagnosis and throughput measurement of fibre channel ports in a storage area network environment |
US10387794B2 (en) | 2015-01-22 | 2019-08-20 | Preferred Networks, Inc. | Machine learning with model filtering and model mixing for edge devices in a heterogeneous environment |
US10484406B2 (en) | 2015-01-22 | 2019-11-19 | Cisco Technology, Inc. | Data visualization in self-learning networks |
US10372906B2 (en) | 2015-02-17 | 2019-08-06 | International Business Machines Corporation | Behavioral model based on short and long range event correlations in system traces |
US20160253077A1 (en) * | 2015-02-27 | 2016-09-01 | General Electric Company | Synchronizing system and method for syncing event logs from different monitoring systems |
US9836599B2 (en) * | 2015-03-13 | 2017-12-05 | Microsoft Technology Licensing, Llc | Implicit process detection and automation from unstructured activity |
US9900250B2 (en) | 2015-03-26 | 2018-02-20 | Cisco Technology, Inc. | Scalable handling of BGP route information in VXLAN with EVPN control plane |
US10222986B2 (en) | 2015-05-15 | 2019-03-05 | Cisco Technology, Inc. | Tenant-level sharding of disks with tenant-specific storage modules to enable policies per tenant in a distributed storage system |
US9760426B2 (en) | 2015-05-28 | 2017-09-12 | Microsoft Technology Licensing, Llc | Detecting anomalous accounts using event logs |
US11588783B2 (en) | 2015-06-10 | 2023-02-21 | Cisco Technology, Inc. | Techniques for implementing IPV6-based distributed storage space |
US10282458B2 (en) * | 2015-06-15 | 2019-05-07 | Vmware, Inc. | Event notification system with cluster classification |
US9591014B2 (en) | 2015-06-17 | 2017-03-07 | International Business Machines Corporation | Capturing correlations between activity and non-activity attributes using N-grams |
US10360184B2 (en) | 2015-06-24 | 2019-07-23 | International Business Machines Corporation | Log file analysis to locate anomalies |
US10063428B1 (en) | 2015-06-30 | 2018-08-28 | Apstra, Inc. | Selectable declarative requirement levels |
US20170010930A1 (en) * | 2015-07-08 | 2017-01-12 | Cisco Technology, Inc. | Interactive mechanism to view logs and metrics upon an anomaly in a distributed storage system |
US9575828B2 (en) | 2015-07-08 | 2017-02-21 | Cisco Technology, Inc. | Correctly identifying potential anomalies in a distributed storage system |
US10216776B2 (en) | 2015-07-09 | 2019-02-26 | Entit Software Llc | Variance based time series dataset alignment |
WO2017011708A1 (en) * | 2015-07-14 | 2017-01-19 | Sios Technology Corporation | Apparatus and method of leveraging machine learning principals for root cause analysis and remediation in computer environments |
US10778765B2 (en) | 2015-07-15 | 2020-09-15 | Cisco Technology, Inc. | Bid/ask protocol in scale-out NVMe storage |
US11030584B2 (en) | 2015-07-17 | 2021-06-08 | Adp, Llc | System and method for managing events |
KR102045468B1 (en) * | 2015-07-27 | 2019-11-15 | 한국전자통신연구원 | Apparatus for detection of anomalous connection behavior based on network data analytics and method using the same |
US9699205B2 (en) | 2015-08-31 | 2017-07-04 | Splunk Inc. | Network security system |
EP3345116A4 (en) * | 2015-09-02 | 2019-01-16 | Nehemiah Security | Process launch, monitoring and execution control |
US10394897B2 (en) | 2015-09-11 | 2019-08-27 | International Business Machines Corporation | Visualization of serial processes |
US20170083815A1 (en) * | 2015-09-18 | 2017-03-23 | Ca, Inc. | Current behavior evaluation with multiple process models |
US11144834B2 (en) * | 2015-10-09 | 2021-10-12 | Fair Isaac Corporation | Method for real-time enhancement of a predictive algorithm by a novel measurement of concept drift using algorithmically-generated features |
US11666702B2 (en) | 2015-10-19 | 2023-06-06 | Medtronic Minimed, Inc. | Medical devices and related event pattern treatment recommendation methods |
US11501867B2 (en) * | 2015-10-19 | 2022-11-15 | Medtronic Minimed, Inc. | Medical devices and related event pattern presentation methods |
US11468368B2 (en) * | 2015-10-28 | 2022-10-11 | Qomplx, Inc. | Parametric modeling and simulation of complex systems using large datasets and heterogeneous data structures |
US9928155B2 (en) * | 2015-11-18 | 2018-03-27 | Nec Corporation | Automated anomaly detection service on heterogeneous log streams |
US9928625B2 (en) | 2015-11-19 | 2018-03-27 | International Business Machines Corporation | Visualizing temporal aspects of serial processes |
US9767309B1 (en) | 2015-11-23 | 2017-09-19 | Experian Information Solutions, Inc. | Access control system for implementing access restrictions of regulated database records while identifying and providing indicators of regulated database records matching validation criteria |
US11074529B2 (en) | 2015-12-04 | 2021-07-27 | International Business Machines Corporation | Predicting event types and time intervals for projects |
US9892075B2 (en) | 2015-12-10 | 2018-02-13 | Cisco Technology, Inc. | Policy driven storage in a microserver computing environment |
US11120460B2 (en) | 2015-12-21 | 2021-09-14 | International Business Machines Corporation | Effectiveness of service complexity configurations in top-down complex services design |
US10313206B1 (en) | 2015-12-23 | 2019-06-04 | Apstra, Inc. | Verifying service status |
US11074536B2 (en) * | 2015-12-29 | 2021-07-27 | Workfusion, Inc. | Worker similarity clusters for worker assessment |
US9436760B1 (en) * | 2016-02-05 | 2016-09-06 | Quid, Inc. | Measuring accuracy of semantic graphs with exogenous datasets |
US20170272453A1 (en) * | 2016-03-15 | 2017-09-21 | DataVisor Inc. | User interface for displaying network analytics |
US20170277997A1 (en) * | 2016-03-23 | 2017-09-28 | Nec Laboratories America, Inc. | Invariants Modeling and Detection for Heterogeneous Logs |
US10498752B2 (en) | 2016-03-28 | 2019-12-03 | Cisco Technology, Inc. | Adaptive capture of packet traces based on user feedback learning |
US9996531B1 (en) * | 2016-03-29 | 2018-06-12 | Facebook, Inc. | Conversational understanding |
US10656979B2 (en) * | 2016-03-31 | 2020-05-19 | International Business Machines Corporation | Structural and temporal semantics heterogeneous information network (HIN) for process trace clustering |
US10423789B2 (en) * | 2016-04-03 | 2019-09-24 | Palo Alto Networks, Inc. | Identification of suspicious system processes |
US10289509B2 (en) * | 2016-04-06 | 2019-05-14 | Nec Corporation | System failure prediction using long short-term memory neural networks |
US10140172B2 (en) | 2016-05-18 | 2018-11-27 | Cisco Technology, Inc. | Network-aware storage repairs |
US10257211B2 (en) | 2016-05-20 | 2019-04-09 | Informatica Llc | Method, apparatus, and computer-readable medium for detecting anomalous user behavior |
US10374872B2 (en) | 2016-05-24 | 2019-08-06 | Apstra, Inc. | Configuring system resources for different reference architectures |
US20170351639A1 (en) | 2016-06-06 | 2017-12-07 | Cisco Technology, Inc. | Remote memory access using memory mapped addressing among multiple compute nodes |
US10902446B2 (en) | 2016-06-24 | 2021-01-26 | International Business Machines Corporation | Top-down pricing of a complex service deal |
US10929872B2 (en) | 2016-06-24 | 2021-02-23 | International Business Machines Corporation | Augmenting missing values in historical or market data for deals |
US10248974B2 (en) | 2016-06-24 | 2019-04-02 | International Business Machines Corporation | Assessing probability of winning an in-flight deal for different price points |
US10664169B2 (en) | 2016-06-24 | 2020-05-26 | Cisco Technology, Inc. | Performance of object storage system by reconfiguring storage devices based on latency that includes identifying a number of fragments that has a particular storage device as its primary storage device and another number of fragments that has said particular storage device as its replica storage device |
US9569729B1 (en) * | 2016-07-20 | 2017-02-14 | Chenope, Inc. | Analytical system and method for assessing certain characteristics of organizations |
WO2018039377A1 (en) | 2016-08-24 | 2018-03-01 | Experian Information Solutions, Inc. | Disambiguation and authentication of device users |
US11563695B2 (en) | 2016-08-29 | 2023-01-24 | Cisco Technology, Inc. | Queue protection using a shared global memory reserve |
US20180060987A1 (en) * | 2016-08-31 | 2018-03-01 | International Business Machines Corporation | Identification of abnormal behavior in human activity based on internet of things collected data |
US11100438B2 (en) * | 2016-10-21 | 2021-08-24 | Microsoft Technology Licensing, Llc | Project entity extraction with efficient search and processing of projects |
US11250444B2 (en) * | 2016-11-04 | 2022-02-15 | Walmart Apollo, Llc | Identifying and labeling fraudulent store return activities |
US10587629B1 (en) * | 2016-11-06 | 2020-03-10 | Akamai Technologies, Inc. | Reducing false positives in bot detection |
US10740170B2 (en) * | 2016-12-08 | 2020-08-11 | Nec Corporation | Structure-level anomaly detection for unstructured logs |
US10217158B2 (en) | 2016-12-13 | 2019-02-26 | Global Healthcare Exchange, Llc | Multi-factor routing system for exchanging business transactions |
US10217086B2 (en) | 2016-12-13 | 2019-02-26 | Global Healthcare Exchange, Llc | Highly scalable event brokering and audit traceability system |
WO2018124672A1 (en) | 2016-12-28 | 2018-07-05 | Samsung Electronics Co., Ltd. | Apparatus for detecting anomaly and operating method for the same |
CN106846222B (en) * | 2016-12-31 | 2020-05-12 | University of Science and Technology of China | Motor vehicle exhaust telemetering equipment stationing method based on graph theory and Boolean algebra |
US10545914B2 (en) | 2017-01-17 | 2020-01-28 | Cisco Technology, Inc. | Distributed object storage |
US11567994B2 (en) | 2017-01-24 | 2023-01-31 | Apstra, Inc. | Configuration, telemetry, and analytics of a computer infrastructure using a graph model |
US10425353B1 (en) | 2017-01-27 | 2019-09-24 | Triangle Ip, Inc. | Machine learning temporal allocator |
US10326787B2 (en) | 2017-02-15 | 2019-06-18 | Microsoft Technology Licensing, Llc | System and method for detecting anomalies including detection and removal of outliers associated with network traffic to cloud applications |
US10616251B2 (en) * | 2017-02-23 | 2020-04-07 | Cisco Technology, Inc. | Anomaly selection using distance metric-based diversity and relevance |
US10243823B1 (en) | 2017-02-24 | 2019-03-26 | Cisco Technology, Inc. | Techniques for using frame deep loopback capabilities for extended link diagnostics in fibre channel storage area networks |
US10713203B2 (en) | 2017-02-28 | 2020-07-14 | Cisco Technology, Inc. | Dynamic partition of PCIe disk arrays based on software configuration / policy distribution |
US10254991B2 (en) | 2017-03-06 | 2019-04-09 | Cisco Technology, Inc. | Storage area network based extended I/O metrics computation for deep insight into application performance |
US10127373B1 (en) | 2017-05-05 | 2018-11-13 | Mastercard Technologies Canada ULC | Systems and methods for distinguishing among human users and software robots |
US9990487B1 (en) | 2017-05-05 | 2018-06-05 | Mastercard Technologies Canada ULC | Systems and methods for distinguishing among human users and software robots |
US10007776B1 (en) | 2017-05-05 | 2018-06-26 | Mastercard Technologies Canada ULC | Systems and methods for distinguishing among human users and software robots |
US10262154B1 (en) * | 2017-06-09 | 2019-04-16 | Microsoft Technology Licensing, Llc | Computerized matrix factorization and completion to infer median/mean confidential values |
US20180365622A1 (en) * | 2017-06-16 | 2018-12-20 | Hcl Technologies Limited | System and method for transmitting alerts |
US11086755B2 (en) * | 2017-06-26 | 2021-08-10 | Jpmorgan Chase Bank, N.A. | System and method for implementing an application monitoring tool |
US10303534B2 (en) | 2017-07-20 | 2019-05-28 | Cisco Technology, Inc. | System and method for self-healing of application centric infrastructure fabric memory |
US10229092B2 (en) | 2017-08-14 | 2019-03-12 | City University Of Hong Kong | Systems and methods for robust low-rank matrix approximation |
US10511556B2 (en) * | 2017-09-20 | 2019-12-17 | Fujitsu Limited | Bursty detection for message streams |
US20190102710A1 (en) * | 2017-09-30 | 2019-04-04 | Microsoft Technology Licensing, Llc | Employer ranking for inter-company employee flow |
US10404596B2 (en) | 2017-10-03 | 2019-09-03 | Cisco Technology, Inc. | Dynamic route profile storage in a hardware trie routing table |
US10942666B2 (en) | 2017-10-13 | 2021-03-09 | Cisco Technology, Inc. | Using network device replication in distributed storage clusters |
CN109697282B (en) | 2017-10-20 | 2023-06-06 | Alibaba Group Holding Limited | Sentence user intention recognition method and device |
US10733813B2 (en) | 2017-11-01 | 2020-08-04 | International Business Machines Corporation | Managing anomaly detection models for fleets of industrial equipment |
US11520880B2 (en) * | 2017-11-03 | 2022-12-06 | International Business Machines Corporation | Identifying internet of things network anomalies using group attestation |
US10042879B1 (en) * | 2017-11-13 | 2018-08-07 | Lendingclub Corporation | Techniques for dynamically enriching and propagating a correlation context |
US11250348B1 (en) * | 2017-12-06 | 2022-02-15 | Amdocs Development Limited | System, method, and computer program for automatically determining customer issues and resolving issues using graphical user interface (GUI) based interactions with a chatbot |
US10756983B2 (en) | 2017-12-08 | 2020-08-25 | Apstra, Inc. | Intent-based analytics |
CN109918688B (en) * | 2017-12-12 | 2023-09-08 | Shanghai Yirui Automotive Technology Co., Ltd. | Vehicle body shape uniform matching method based on entropy principle |
US10755324B2 (en) * | 2018-01-02 | 2020-08-25 | International Business Machines Corporation | Selecting peer deals for information technology (IT) service deals |
US11182833B2 (en) | 2018-01-02 | 2021-11-23 | International Business Machines Corporation | Estimating annual cost reduction when pricing information technology (IT) service deals |
US11190538B2 (en) | 2018-01-18 | 2021-11-30 | Risksense, Inc. | Complex application attack quantification, testing, detection and prevention |
US11163722B2 (en) * | 2018-01-31 | 2021-11-02 | Salesforce.Com, Inc. | Methods and apparatus for analyzing a live stream of log entries to detect patterns |
US11036605B2 (en) | 2018-02-21 | 2021-06-15 | International Business Machines Corporation | Feedback tuples for detecting data flow anomalies in stream computing environment |
US11023495B2 (en) * | 2018-03-19 | 2021-06-01 | Adobe Inc. | Automatically generating meaningful user segments |
US10929217B2 (en) * | 2018-03-22 | 2021-02-23 | Microsoft Technology Licensing, Llc | Multi-variant anomaly detection from application telemetry |
US11715070B2 (en) * | 2018-04-12 | 2023-08-01 | Kronos Technology Systems Limited Partnership | Predicting upcoming missed clockings and alerting workers or managers |
US20190334759A1 (en) * | 2018-04-26 | 2019-10-31 | Microsoft Technology Licensing, Llc | Unsupervised anomaly detection for identifying anomalies in data |
US20190378073A1 (en) * | 2018-06-08 | 2019-12-12 | Jpmorgan Chase Bank, N.A. | Business-Aware Intelligent Incident and Change Management |
US10686807B2 (en) * | 2018-06-12 | 2020-06-16 | International Business Machines Corporation | Intrusion detection system |
US10922204B2 (en) * | 2018-06-13 | 2021-02-16 | Ca, Inc. | Efficient behavioral analysis of time series data |
WO2020005250A1 (en) | 2018-06-28 | 2020-01-02 | Google Llc | Detecting zero-day attacks with unknown signatures via mining correlation in behavioral change of entities over time |
CN109003625B (en) * | 2018-07-27 | 2021-01-12 | Institute of Automation, Chinese Academy of Sciences | Speech emotion recognition method and system based on ternary loss |
US11693848B2 (en) * | 2018-08-07 | 2023-07-04 | Accenture Global Solutions Limited | Approaches for knowledge graph pruning based on sampling and information gain theory |
US11417415B2 (en) * | 2018-08-10 | 2022-08-16 | International Business Machines Corporation | Molecular representation |
US11003767B2 (en) * | 2018-08-21 | 2021-05-11 | Beijing Didi Infinity Technology And Development Co., Ltd. | Multi-layer data model for security analytics |
WO2020044269A2 (en) * | 2018-08-29 | 2020-03-05 | Credit Suisse Securities (Usa) Llc | Systems and methods for calculating consensus data on a decentralized peer-to-peer network using distributed ledger |
US10776196B2 (en) | 2018-08-29 | 2020-09-15 | International Business Machines Corporation | Systems and methods for anomaly detection in a distributed computing system |
US10880313B2 (en) | 2018-09-05 | 2020-12-29 | Consumerinfo.Com, Inc. | Database platform for realtime updating of user data from third party sources |
US11479243B2 (en) * | 2018-09-14 | 2022-10-25 | Honda Motor Co., Ltd. | Uncertainty prediction based deep learning |
CN109474755B (en) * | 2018-10-30 | 2020-10-30 | University of Jinan | Abnormal telephone active prediction method, system and computer readable storage medium based on sequencing learning and ensemble learning |
US11315179B1 (en) | 2018-11-16 | 2022-04-26 | Consumerinfo.Com, Inc. | Methods and apparatuses for customized card recommendations |
US11182707B2 (en) | 2018-11-19 | 2021-11-23 | Rimini Street, Inc. | Method and system for providing a multi-dimensional human resource allocation adviser |
US11182488B2 (en) | 2018-11-28 | 2021-11-23 | International Business Machines Corporation | Intelligent information protection based on detection of emergency events |
US11973778B2 (en) | 2018-12-03 | 2024-04-30 | British Telecommunications Public Limited Company | Detecting anomalies in computer networks |
EP3891638A1 (en) | 2018-12-03 | 2021-10-13 | British Telecommunications public limited company | Remediating software vulnerabilities |
US11960610B2 (en) | 2018-12-03 | 2024-04-16 | British Telecommunications Public Limited Company | Detecting vulnerability change in software systems |
EP3891636A1 (en) | 2018-12-03 | 2021-10-13 | British Telecommunications public limited company | Detecting vulnerable software systems |
EP3663951B1 (en) * | 2018-12-03 | 2021-09-15 | British Telecommunications public limited company | Multi factor network anomaly detection |
US11089034B2 (en) | 2018-12-10 | 2021-08-10 | Bitdefender IPR Management Ltd. | Systems and methods for behavioral threat detection |
US11323459B2 (en) | 2018-12-10 | 2022-05-03 | Bitdefender IPR Management Ltd. | Systems and methods for behavioral threat detection |
TWI660141B (en) * | 2018-12-12 | 2019-05-21 | Chaoyang University Of Technology | Spotlight device and spotlight tracking method |
US11122084B1 (en) | 2018-12-17 | 2021-09-14 | Wells Fargo Bank, N.A. | Automatic monitoring and modeling |
EP3681124B8 (en) | 2019-01-09 | 2022-02-16 | British Telecommunications public limited company | Anomalous network node behaviour identification using deterministic path walking |
US11336668B2 (en) * | 2019-01-14 | 2022-05-17 | Penta Security Systems Inc. | Method and apparatus for detecting abnormal behavior of groupware user |
US11238656B1 (en) | 2019-02-22 | 2022-02-01 | Consumerinfo.Com, Inc. | System and method for an augmented reality experience via an artificial intelligence bot |
US11561963B1 (en) | 2019-02-26 | 2023-01-24 | Intuit Inc. | Method and system for using time-location transaction signatures to enrich user profiles |
US10992543B1 (en) | 2019-03-21 | 2021-04-27 | Apstra, Inc. | Automatically generating an intent-based network model of an existing computer network |
US11171978B2 (en) | 2019-03-27 | 2021-11-09 | Microsoft Technology Licensing, Llc. | Dynamic monitoring, detection of emerging computer events |
US11514089B2 (en) * | 2019-04-16 | 2022-11-29 | Eagle Technology, Llc | Geospatial monitoring system providing unsupervised site identification and classification from crowd-sourced mobile data (CSMD) and related methods |
US11075805B1 (en) | 2019-04-24 | 2021-07-27 | Juniper Networks, Inc. | Business policy management for self-driving network |
US11082478B1 (en) * | 2019-05-07 | 2021-08-03 | PODTRAC, Inc. | System and method for using a variable analysis window to produce statistics on downloadable media |
US10990425B2 (en) * | 2019-05-08 | 2021-04-27 | Morgan Stanley Services Group Inc. | Simulated change of immutable objects during execution runtime |
US11315177B2 (en) * | 2019-06-03 | 2022-04-26 | Intuit Inc. | Bias prediction and categorization in financial tools |
US11620389B2 (en) | 2019-06-24 | 2023-04-04 | University Of Maryland Baltimore County | Method and system for reducing false positives in static source code analysis reports using machine learning and classification techniques |
US20210027302A1 (en) * | 2019-07-25 | 2021-01-28 | Intuit Inc. | Detecting life events by applying anomaly detection methods to transaction data |
US11295020B2 (en) | 2019-08-05 | 2022-04-05 | Bank Of America Corporation | System for integrated natural language processing and event analysis for threat detection in computing systems |
US11972346B2 (en) | 2019-08-26 | 2024-04-30 | Chenope, Inc. | System to detect, assess and counter disinformation |
US11941065B1 (en) | 2019-09-13 | 2024-03-26 | Experian Information Solutions, Inc. | Single identifier platform for storing entity data |
US12026076B2 (en) * | 2019-09-13 | 2024-07-02 | Rimini Street, Inc. | Method and system for proactive client relationship analysis |
US20210081972A1 (en) * | 2019-09-13 | 2021-03-18 | Rimini Street, Inc. | System and method for proactive client relationship analysis |
US10673886B1 (en) * | 2019-09-26 | 2020-06-02 | Packetsled, Inc. | Assigning and representing security risks on a computer network |
US11681965B2 (en) | 2019-10-25 | 2023-06-20 | Georgetown University | Specialized computing environment for co-analysis of proprietary data |
WO2021080615A1 (en) * | 2019-10-25 | 2021-04-29 | Georgetown University | Specialized computing environment for co-analysis of proprietary data |
US10713577B1 (en) | 2019-11-08 | 2020-07-14 | Capital One Services, Llc | Computer-based systems configured for entity resolution and indexing of entity activity |
US11698977B1 (en) | 2019-11-13 | 2023-07-11 | Ivanti, Inc. | Predicting and quantifying weaponization of software weaknesses |
US11483327B2 (en) * | 2019-11-17 | 2022-10-25 | Microsoft Technology Licensing, Llc | Collaborative filtering anomaly detection explainability |
US11455554B2 (en) | 2019-11-25 | 2022-09-27 | International Business Machines Corporation | Trustworthiness of artificial intelligence models in presence of anomalous data |
US11436611B2 (en) * | 2019-12-12 | 2022-09-06 | At&T Intellectual Property I, L.P. | Property archivist enabled customer service |
US20210182387A1 (en) * | 2019-12-12 | 2021-06-17 | International Business Machines Corporation | Automated semantic modeling of system events |
US12050605B2 (en) | 2019-12-26 | 2024-07-30 | Snowflake Inc. | Indexed geospatial predicate search |
US11681708B2 (en) | 2019-12-26 | 2023-06-20 | Snowflake Inc. | Indexed regular expression search with N-grams |
US11308090B2 (en) | 2019-12-26 | 2022-04-19 | Snowflake Inc. | Pruning index to support semi-structured data types |
US11567939B2 (en) | 2019-12-26 | 2023-01-31 | Snowflake Inc. | Lazy reassembling of semi-structured data |
US10769150B1 (en) | 2019-12-26 | 2020-09-08 | Snowflake Inc. | Pruning indexes to enhance database query processing |
US11372860B2 (en) | 2019-12-26 | 2022-06-28 | Snowflake Inc. | Processing techniques for queries where predicate values are unknown until runtime |
US11416874B1 (en) * | 2019-12-26 | 2022-08-16 | StratoKey Pty Ltd. | Compliance management system |
US10997179B1 (en) | 2019-12-26 | 2021-05-04 | Snowflake Inc. | Pruning index for optimization of pattern matching queries |
US11682041B1 (en) | 2020-01-13 | 2023-06-20 | Experian Marketing Solutions, Llc | Systems and methods of a tracking analytics platform |
US11599732B2 (en) | 2020-01-16 | 2023-03-07 | Bank Of America Corporation | Tunable system for monitoring threats to third-parties |
US11316881B2 (en) | 2020-01-17 | 2022-04-26 | Bank Of America Corporation | Just short of identity analysis |
CN111314173B (en) * | 2020-01-20 | 2022-04-08 | Tencent Technology (Shenzhen) Co., Ltd. | Monitoring information abnormity positioning method and device, computer equipment and storage medium |
US11593085B1 (en) | 2020-02-03 | 2023-02-28 | Rapid7, Inc. | Delta data collection technique for machine assessment |
US20210248703A1 (en) * | 2020-02-12 | 2021-08-12 | Motorola Solutions, Inc. | Methods and Systems for Detecting Inmate Communication Anomalies in a Correctional Facility |
US20210279633A1 (en) * | 2020-03-04 | 2021-09-09 | Tibco Software Inc. | Algorithmic learning engine for dynamically generating predictive analytics from high volume, high velocity streaming data |
US11184402B2 (en) | 2020-03-25 | 2021-11-23 | International Business Machines Corporation | Resource access policy enforcement using a hypergraph |
US20210318944A1 (en) * | 2020-04-13 | 2021-10-14 | UiPath, Inc. | Influence analysis of processes for reducing undesirable behavior |
CN111597224B (en) * | 2020-04-17 | 2023-09-15 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for generating structured information, electronic equipment and storage medium |
US12039276B2 (en) | 2020-04-29 | 2024-07-16 | Cisco Technology, Inc. | Anomaly classification with attendant word enrichment |
CN111586051B (en) * | 2020-05-08 | 2021-06-01 | Tsinghua University | Network anomaly detection method based on hypergraph structure quality optimization |
US12086016B2 (en) * | 2020-06-30 | 2024-09-10 | Salesforce, Inc. | Anomaly detection and root cause analysis in a multi-tenant environment |
US11886413B1 (en) | 2020-07-22 | 2024-01-30 | Rapid7, Inc. | Time-sliced approximate data structure for storing group statistics |
US11971893B1 (en) | 2020-07-22 | 2024-04-30 | Rapid7, Inc. | Group by operation on time series data using count-min sketch |
US11204851B1 (en) | 2020-07-31 | 2021-12-21 | International Business Machines Corporation | Real-time data quality analysis |
US11263103B2 (en) | 2020-07-31 | 2022-03-01 | International Business Machines Corporation | Efficient real-time data quality analysis |
US11755822B2 (en) | 2020-08-04 | 2023-09-12 | International Business Machines Corporation | Promised natural language processing annotations |
US11520972B2 (en) | 2020-08-04 | 2022-12-06 | International Business Machines Corporation | Future potential natural language processing annotations |
US11907319B2 (en) * | 2020-08-06 | 2024-02-20 | Gary Manuel Jackson | Internet accessible behavior observation workplace assessment method and system to identify insider threat |
US11586609B2 (en) | 2020-09-15 | 2023-02-21 | International Business Machines Corporation | Abnormal event analysis |
US20220092534A1 (en) * | 2020-09-18 | 2022-03-24 | International Business Machines Corporation | Event-based risk assessment |
US20220092612A1 (en) * | 2020-09-21 | 2022-03-24 | Larsen & Toubro Infotech Ltd | System and method for automatically detecting anomaly present within dataset(s) |
JP7093031B2 (en) * | 2020-09-23 | 2022-06-29 | Daikin Industries, Ltd. | Information processing equipment, information processing methods, and programs |
CN111930737B (en) * | 2020-10-13 | 2021-03-23 | Army Academy of Armored Forces, Chinese People's Liberation Army | Multidimensional correlation analysis method for equipment combat test data |
US11283691B1 (en) | 2020-10-21 | 2022-03-22 | Juniper Networks, Inc. | Model driven intent policy conflict detection and resolution through graph analysis |
US20220129923A1 (en) * | 2020-10-28 | 2022-04-28 | International Business Machines Corporation | Deep learning based behavior classification |
CN116888602A (en) * | 2020-12-17 | 2023-10-13 | UMNAI Limited | Interpretable transducer |
WO2022182916A1 (en) * | 2021-02-24 | 2022-09-01 | Lifebrand, Llc | System and method for determining the impact of a social media post across multiple social media platforms |
WO2022203650A1 (en) * | 2021-03-22 | 2022-09-29 | Jpmorgan Chase Bank, N.A. | Method and system for detection of abnormal transactional behavior |
US11989525B2 (en) | 2021-03-26 | 2024-05-21 | Oracle International Corporation | Techniques for generating multi-modal discourse trees |
US12047243B2 (en) | 2021-03-30 | 2024-07-23 | Rensselaer Polytechnic Institute | Synthetic network generator for covert network analytics |
US11481415B1 (en) * | 2021-03-30 | 2022-10-25 | International Business Machines Corporation | Corpus temporal analysis and maintenance |
US11556637B2 (en) | 2021-04-05 | 2023-01-17 | Bank Of America Corporation | Information security system and method for anomaly and security threat detection |
US11847111B2 (en) | 2021-04-09 | 2023-12-19 | Bitdefender IPR Management Ltd. | Anomaly detection systems and methods |
US11640387B2 (en) * | 2021-04-23 | 2023-05-02 | Capital One Services, Llc | Anomaly detection data workflow for time series data |
CN113505127B (en) * | 2021-06-22 | 2024-06-18 | Shiyi (Xiamen) Network Information Technology Co., Ltd. | Storage structure and method for data with associated objects, retrieval and visual display method |
US11475211B1 (en) | 2021-07-12 | 2022-10-18 | International Business Machines Corporation | Elucidated natural language artifact recombination with contextual awareness |
US11409593B1 (en) | 2021-08-05 | 2022-08-09 | International Business Machines Corporation | Discovering insights and/or resolutions from collaborative conversations |
US11922357B2 (en) | 2021-10-07 | 2024-03-05 | Charter Communications Operating, Llc | System and method for identifying and handling data quality anomalies |
US11528279B1 (en) | 2021-11-12 | 2022-12-13 | Netskope, Inc. | Automatic user directory synchronization and troubleshooting |
US11777823B1 (en) | 2021-11-24 | 2023-10-03 | Amazon Technologies, Inc. | Metric anomaly detection across high-scale data |
US12040934B1 (en) * | 2021-12-17 | 2024-07-16 | Juniper Networks, Inc. | Conversational assistant for obtaining network information |
US11930027B2 (en) * | 2021-12-28 | 2024-03-12 | Nozomi Networks Sagl | Method for evaluating quality of rule-based detections |
IT202200004961A1 (en) | 2022-03-15 | 2023-09-15 | Rpctech S R L | SYSTEM FOR FORECASTING AND MANAGEMENT OF CRITICAL EVENTS WITH IDENTIFICATION OF ANOMALIES |
US20230316196A1 (en) * | 2022-03-30 | 2023-10-05 | Microsoft Technology Licensing, Llc | Nested model structures for the performance of complex tasks |
US11588843B1 (en) | 2022-04-08 | 2023-02-21 | Morgan Stanley Services Group Inc. | Multi-level log analysis to detect software use anomalies |
WO2023250147A1 (en) * | 2022-06-23 | 2023-12-28 | Bluevoyant Llc | Devices, systems, and method for generating and using a queryable index in a cyber data model to enhance network security |
US20230419216A1 (en) * | 2022-06-27 | 2023-12-28 | International Business Machines Corporation | Closed loop verification for event grouping mechanisms |
US11947682B2 (en) | 2022-07-07 | 2024-04-02 | Netskope, Inc. | ML-based encrypted file classification for identifying encrypted data movement |
US20240039937A1 (en) * | 2022-08-01 | 2024-02-01 | At&T Intellectual Property I, L.P. | System and method for identifying communications network anomalies of connected cars |
WO2024097380A1 (en) * | 2022-11-04 | 2024-05-10 | Tree Goat Media, Inc. | Systems and methods for transforming digital audio content |
US11880369B1 (en) | 2022-11-21 | 2024-01-23 | Snowflake Inc. | Pruning data based on state of top K operator |
US12052211B1 (en) | 2023-01-17 | 2024-07-30 | International Business Machines Corporation | Systems and methods for intelligent message interaction |
US12010003B1 (en) | 2023-01-26 | 2024-06-11 | Bank Of America Corporation | Systems and methods for deploying automated diagnostic engines for identification of network controls status |
US11853173B1 (en) | 2023-03-20 | 2023-12-26 | Kyndryl, Inc. | Log file manipulation detection |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7519589B2 (en) | 2003-02-04 | 2009-04-14 | Cataphora, Inc. | Method and apparatus for sociological data analysis |
CA2475319A1 (en) | 2002-02-04 | 2003-08-14 | Cataphora, Inc. | A method and apparatus to visually present discussions for data mining purposes |
US7386439B1 (en) | 2002-02-04 | 2008-06-10 | Cataphora, Inc. | Data mining by retrieving causally-related documents not individually satisfying search criteria used |
US7421660B2 (en) | 2003-02-04 | 2008-09-02 | Cataphora, Inc. | Method and apparatus to visually present discussions for data mining purposes |
- 2010
  - 2010-11-08 US US12/941,849 patent/US20120137367A1/en not_active Abandoned
- 2013
  - 2013-09-23 US US14/034,008 patent/US8887286B2/en active Active
Cited By (1506)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US8718047B2 (en) | 2001-10-22 | 2014-05-06 | Apple Inc. | Text to speech conversion of text messages from mobile communication devices |
US10116595B2 (en) * | 2002-06-27 | 2018-10-30 | Oracle International Corporation | Method and system for processing intelligence information |
US20130227046A1 (en) * | 2002-06-27 | 2013-08-29 | Siebel Systems, Inc. | Method and system for processing intelligence information |
US11928604B2 (en) | 2005-09-08 | 2024-03-12 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US9501741B2 (en) | 2005-09-08 | 2016-11-22 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US20130110505A1 (en) * | 2006-09-08 | 2013-05-02 | Apple Inc. | Using Event Alert Text as Input to an Automated Assistant |
US9117447B2 (en) * | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US11012942B2 (en) | 2007-04-03 | 2021-05-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US11023513B2 (en) | 2007-12-20 | 2021-06-01 | Apple Inc. | Method and apparatus for searching using an active ontology |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9361886B2 (en) | 2008-02-22 | 2016-06-07 | Apple Inc. | Providing text input using speech data and non-speech data |
US8688446B2 (en) | 2008-02-22 | 2014-04-01 | Apple Inc. | Providing text input using speech data and non-speech data |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9456054B2 (en) | 2008-05-16 | 2016-09-27 | Palo Alto Research Center Incorporated | Controlling the spread of interests and content in a content centric network |
US10104041B2 (en) | 2008-05-16 | 2018-10-16 | Cisco Technology, Inc. | Controlling the spread of interests and content in a content centric network |
US10819726B2 (en) * | 2008-05-27 | 2020-10-27 | The Trustees Of Columbia University In The City Of New York | Detecting network anomalies by probabilistic modeling of argument strings with markov chains |
US20190182279A1 (en) * | 2008-05-27 | 2019-06-13 | Yingbo Song | Detecting network anomalies by probabilistic modeling of argument strings with markov chains |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9691383B2 (en) | 2008-09-05 | 2017-06-27 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8713119B2 (en) | 2008-10-02 | 2014-04-29 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8762469B2 (en) | 2008-10-02 | 2014-06-24 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US20100198375A1 (en) * | 2009-01-30 | 2010-08-05 | Apple Inc. | Audio user interface for displayless electronic device |
US8751238B2 (en) | 2009-03-09 | 2014-06-10 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US8423494B2 (en) * | 2009-04-15 | 2013-04-16 | Virginia Polytechnic Institute And State University | Complex situation analysis system that generates a social contact network, uses edge brokers and service brokers, and dynamically adds brokers |
US8682828B2 (en) | 2009-04-15 | 2014-03-25 | Virginia Polytechnic Institute And State University | Complex situation analysis system that spawns/creates new brokers using existing brokers as needed to respond to requests for data |
US9367805B2 (en) | 2009-04-15 | 2016-06-14 | Virginia Polytechnic Institute And State University | Complex situation analysis system using a plurality of brokers that control access to information sources |
US9870531B2 (en) | 2009-04-15 | 2018-01-16 | Virginia Polytechnic Institute And State University | Analysis system using brokers that access information sources |
US20100293123A1 (en) * | 2009-04-15 | 2010-11-18 | Virginia Polytechnic Institute And State University | Complex situation analysis system |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9686194B2 (en) | 2009-10-21 | 2017-06-20 | Cisco Technology, Inc. | Adaptive multi-interface use for content networking |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US8670985B2 (en) | 2010-01-13 | 2014-03-11 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10741185B2 (en) | 2010-01-18 | 2020-08-11 | Apple Inc. | Intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US20130110515A1 (en) * | 2010-01-18 | 2013-05-02 | Apple Inc. | Disambiguation Based on Active Input Elicitation by Intelligent Automated Assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US8799000B2 (en) * | 2010-01-18 | 2014-08-05 | Apple Inc. | Disambiguation based on active input elicitation by intelligent automated assistant |
US8660849B2 (en) | 2010-01-18 | 2014-02-25 | Apple Inc. | Prioritizing selection criteria by automated assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10049675B2 (en) | 2010-02-25 | 2018-08-14 | Apple Inc. | User profiling for voice input processing |
US8781990B1 (en) * | 2010-02-25 | 2014-07-15 | Google Inc. | Crowdsensus: deriving consensus information from statements made by a crowd of users |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US10692504B2 (en) | 2010-02-25 | 2020-06-23 | Apple Inc. | User profiling for voice input processing |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US20130232130A1 (en) * | 2010-03-18 | 2013-09-05 | Companybook As | Company network |
US20160019561A1 (en) * | 2010-03-29 | 2016-01-21 | Companybook As | Method and arrangement for monitoring companies |
US9141690B2 (en) * | 2010-05-14 | 2015-09-22 | Salesforce.Com, Inc. | Methods and systems for categorizing data in an on-demand database environment |
US10482106B2 (en) | 2010-05-14 | 2019-11-19 | Salesforce.Com, Inc. | Querying a database using relationship metadata |
US20110282872A1 (en) * | 2010-05-14 | 2011-11-17 | Salesforce.Com, Inc | Methods and Systems for Categorizing Data in an On-Demand Database Environment |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US9349097B2 (en) * | 2010-07-13 | 2016-05-24 | M8 | Processor for situational analysis |
US20130179389A1 (en) * | 2010-07-13 | 2013-07-11 | Jean-Pierre Malle | Processor for situational analysis |
US8954422B2 (en) * | 2010-07-30 | 2015-02-10 | Ebay Inc. | Query suggestion for E-commerce sites |
US9323811B2 (en) | 2010-07-30 | 2016-04-26 | Ebay Inc. | Query suggestion for e-commerce sites |
US9858608B2 (en) | 2010-07-30 | 2018-01-02 | Ebay Inc. | Query suggestion for e-commerce sites |
US20120036123A1 (en) * | 2010-07-30 | 2012-02-09 | Mohammad Al Hasan | Query suggestion for e-commerce sites |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US8751432B2 (en) * | 2010-09-02 | 2014-06-10 | Anker Berg-Sonne | Automated facilities management system |
US20120150788A1 (en) * | 2010-09-02 | 2012-06-14 | Pepperdash Technology Corporation | Automated facilities management system |
US8849771B2 (en) | 2010-09-02 | 2014-09-30 | Anker Berg-Sonne | Rules engine with database triggering |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US9075783B2 (en) | 2010-09-27 | 2015-07-07 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US20120116850A1 (en) * | 2010-11-10 | 2012-05-10 | International Business Machines Corporation | Causal modeling of multi-dimensional hierarchical metric cubes |
US10360527B2 (en) * | 2010-11-10 | 2019-07-23 | International Business Machines Corporation | Causal modeling of multi-dimensional hierarchical metric cubes |
US9197658B2 (en) * | 2010-11-18 | 2015-11-24 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US10218732B2 (en) | 2010-11-18 | 2019-02-26 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US20140165201A1 (en) * | 2010-11-18 | 2014-06-12 | Nant Holdings Ip, Llc | Vector-Based Anomaly Detection |
US20190238578A1 (en) * | 2010-11-18 | 2019-08-01 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US10542027B2 (en) * | 2010-11-18 | 2020-01-21 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US11228608B2 (en) | 2010-11-18 | 2022-01-18 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US11848951B2 (en) | 2010-11-18 | 2023-12-19 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US9716723B2 (en) | 2010-11-18 | 2017-07-25 | Nant Holdings Ip, Llc | Vector-based anomaly detection |
US9672284B2 (en) * | 2010-12-21 | 2017-06-06 | Facebook, Inc. | Categorizing social network objects based on user affiliations |
US8738705B2 (en) * | 2010-12-21 | 2014-05-27 | Facebook, Inc. | Categorizing social network objects based on user affiliations |
US10013729B2 (en) * | 2010-12-21 | 2018-07-03 | Facebook, Inc. | Categorizing social network objects based on user affiliations |
US20140222821A1 (en) * | 2010-12-21 | 2014-08-07 | Facebook, Inc. | Categorizing social network objects based on user affiliations |
US20120158851A1 (en) * | 2010-12-21 | 2012-06-21 | Daniel Leon Kelmenson | Categorizing Social Network Objects Based on User Affiliations |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US20120173004A1 (en) * | 2010-12-31 | 2012-07-05 | Brad Radl | System and Method for Real-Time Industrial Process Modeling |
US8457767B2 (en) * | 2010-12-31 | 2013-06-04 | Brad Radl | System and method for real-time industrial process modeling |
US8856807B1 (en) * | 2011-01-04 | 2014-10-07 | The Pnc Financial Services Group, Inc. | Alert event platform |
US11055754B1 (en) * | 2011-01-04 | 2021-07-06 | The Pnc Financial Services Group, Inc. | Alert event platform |
US20120174231A1 (en) * | 2011-01-04 | 2012-07-05 | Siemens Corporation | Assessing System Performance Impact of Security Attacks |
US8832839B2 (en) * | 2011-01-04 | 2014-09-09 | Siemens Aktiengesellschaft | Assessing system performance impact of security attacks |
US20130290380A1 (en) * | 2011-01-06 | 2013-10-31 | Thomson Licensing | Method and apparatus for updating a database in a receiving device |
US20130024512A1 (en) * | 2011-02-13 | 2013-01-24 | Georgi Milev | Feature-extended apparatus, system and method for social networking and secure resource sharing |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US9614807B2 (en) | 2011-02-23 | 2017-04-04 | Bottlenose, Inc. | System and method for analyzing messages in a network or across networks |
US8700629B2 (en) * | 2011-02-28 | 2014-04-15 | Battelle Memorial Institute | Automatic identification of abstract online groups |
US20130191390A1 (en) * | 2011-02-28 | 2013-07-25 | Battelle Memorial Institute | Automatic Identification of Abstract Online Groups |
US9652616B1 (en) * | 2011-03-14 | 2017-05-16 | Symantec Corporation | Techniques for classifying non-process threats |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US10417405B2 (en) | 2011-03-21 | 2019-09-17 | Apple Inc. | Device access using voice authentication |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US20120246054A1 (en) * | 2011-03-22 | 2012-09-27 | Gautham Sastri | Reaction indicator for sentiment of social media messages |
US20150293979A1 (en) * | 2011-03-24 | 2015-10-15 | Morphism Llc | Propagation Through Perdurance |
US20140283059A1 (en) * | 2011-04-11 | 2014-09-18 | NSS Lab Works LLC | Continuous Monitoring of Computer User and Computer Activities |
US9047464B2 (en) * | 2011-04-11 | 2015-06-02 | NSS Lab Works LLC | Continuous monitoring of computer user and computer activities |
US20150135048A1 (en) * | 2011-04-20 | 2015-05-14 | Panafold | Methods, apparatus, and systems for visually representing a relative relevance of content elements to an attractor |
US8862492B1 (en) | 2011-04-29 | 2014-10-14 | Google Inc. | Identifying unreliable contributors of user-generated content |
US9552552B1 (en) | 2011-04-29 | 2017-01-24 | Google Inc. | Identification of over-clustered map features |
US10095980B1 (en) | 2011-04-29 | 2018-10-09 | Google Llc | Moderation of user-generated content |
US11868914B2 (en) | 2011-04-29 | 2024-01-09 | Google Llc | Moderation of user-generated content |
US11443214B2 (en) | 2011-04-29 | 2022-09-13 | Google Llc | Moderation of user-generated content |
US20120284307A1 (en) * | 2011-05-06 | 2012-11-08 | Gopogo, Llc | String Searching Systems and Methods Thereof |
US20120303348A1 (en) * | 2011-05-23 | 2012-11-29 | Gm Global Technology Operation Llc | System and methods for fault-isolation and fault-mitigation based on network modeling |
DE102012102770B4 (en) | 2011-05-23 | 2019-12-19 | GM Global Technology Operations LLC (n. d. Gesetzen des Staates Delaware) | System and method for error isolation and error mitigation based on network modeling |
DE102012102770B9 (en) * | 2011-05-23 | 2020-03-19 | GM Global Technology Operations LLC (n. d. Gesetzen des Staates Delaware) | System and method for error isolation and error mitigation based on network modeling |
US8577663B2 (en) * | 2011-05-23 | 2013-11-05 | GM Global Technology Operations LLC | System and methods for fault-isolation and fault-mitigation based on network modeling |
US20120311474A1 (en) * | 2011-06-02 | 2012-12-06 | Microsoft Corporation | Map-based methods of visualizing relational databases |
US9009192B1 (en) * | 2011-06-03 | 2015-04-14 | Google Inc. | Identifying central entities |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10068022B2 (en) | 2011-06-03 | 2018-09-04 | Google Llc | Identifying topical entities |
US11350253B2 (en) | 2011-06-03 | 2022-05-31 | Apple Inc. | Active transport based notifications |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US9680857B1 (en) | 2011-06-08 | 2017-06-13 | United Services Automobile Association (USAA) | Cyber intelligence clearinghouse |
US9319420B1 (en) * | 2011-06-08 | 2016-04-19 | United Services Automobile Association (Usaa) | Cyber intelligence clearinghouse |
US8280891B1 (en) * | 2011-06-17 | 2012-10-02 | Google Inc. | System and method for the calibration of a scoring function |
US9286182B2 (en) * | 2011-06-17 | 2016-03-15 | Microsoft Technology Licensing, Llc | Virtual machine snapshotting and analysis |
US20120323853A1 (en) * | 2011-06-17 | 2012-12-20 | Microsoft Corporation | Virtual machine snapshotting and analysis |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US10217051B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, LLC | Systems and methods for determining social perception |
US10217050B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, LLC | Systems and methods for determining social perception |
US10217049B2 (en) | 2011-08-04 | 2019-02-26 | Smart Information Flow Technologies, LLC | Systems and methods for determining social perception |
US9053421B2 (en) | 2011-08-04 | 2015-06-09 | Smart Information Flow Technologies LLC | Systems and methods for determining social perception scores |
US8825584B1 (en) * | 2011-08-04 | 2014-09-02 | Smart Information Flow Technologies LLC | Systems and methods for determining social regard scores |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US9342603B2 (en) | 2011-09-13 | 2016-05-17 | Airtime Media, Inc. | Experience graph |
US8838572B2 (en) * | 2011-09-13 | 2014-09-16 | Airtime Media, Inc. | Experience graph |
US20130124497A1 (en) * | 2011-09-13 | 2013-05-16 | Airtime Media, Inc. | Experience graph |
US20130073594A1 (en) * | 2011-09-19 | 2013-03-21 | Citigroup Technology, Inc. | Methods and Systems for Assessing Data Quality |
US10248672B2 (en) * | 2011-09-19 | 2019-04-02 | Citigroup Technology, Inc. | Methods and systems for assessing data quality |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US20130085715A1 (en) * | 2011-09-29 | 2013-04-04 | Choudur Lakshminarayan | Anomaly detection in streaming data |
US9218527B2 (en) * | 2011-09-29 | 2015-12-22 | Hewlett-Packard Development Company, L.P. | Anomaly detection in streaming data |
US9690849B2 (en) * | 2011-09-30 | 2017-06-27 | Thomson Reuters Global Resources Unlimited Company | Systems and methods for determining atypical language |
US20140344279A1 (en) * | 2011-09-30 | 2014-11-20 | Thomson Reuters Global Resources | Systems and methods for determining atypical language |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US9635047B2 (en) * | 2011-10-18 | 2017-04-25 | Mcafee, Inc. | User behavioral risk assessment |
US9648035B2 (en) | 2011-10-18 | 2017-05-09 | Mcafee, Inc. | User behavioral risk assessment |
US10505965B2 (en) | 2011-10-18 | 2019-12-10 | Mcafee, Llc | User behavioral risk assessment |
US20150334129A1 (en) * | 2011-10-18 | 2015-11-19 | Mcafee, Inc. | User behavioral risk assessment |
US9529777B2 (en) * | 2011-10-28 | 2016-12-27 | Electronic Arts Inc. | User behavior analyzer |
US20130111019A1 (en) * | 2011-10-28 | 2013-05-02 | Electronic Arts Inc. | User behavior analyzer |
US10193772B1 (en) | 2011-10-28 | 2019-01-29 | Electronic Arts Inc. | User behavior analyzer |
US20130132560A1 (en) * | 2011-11-22 | 2013-05-23 | Sap Ag | Dynamic adaptations for network delays during complex event processing |
US9059935B2 (en) * | 2011-11-22 | 2015-06-16 | Sap Se | Dynamic adaptations for network delays during complex event processing |
US20140330968A1 (en) * | 2011-12-15 | 2014-11-06 | Telefonaktiebolaget L M Ericsson (Publ) | Method and trend analyzer for analyzing data in a communication network |
US9736132B2 (en) | 2011-12-20 | 2017-08-15 | Amazon Technologies, Inc. | Workflow directed resource access |
US9152461B1 (en) | 2011-12-20 | 2015-10-06 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US9158583B1 (en) | 2011-12-20 | 2015-10-13 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US8738775B1 (en) * | 2011-12-20 | 2014-05-27 | Amazon Technologies, Inc. | Managing resource dependent workflows |
US8788663B1 (en) | 2011-12-20 | 2014-07-22 | Amazon Technologies, Inc. | Managing resource dependent workflows |
US9128761B1 (en) | 2011-12-20 | 2015-09-08 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of resource dependent workflow |
US9152460B1 (en) | 2011-12-20 | 2015-10-06 | Amazon Technologies, Inc. | Management of computing devices processing workflow stages of a resource dependent workflow |
US9552490B1 (en) | 2011-12-20 | 2017-01-24 | Amazon Technologies, Inc. | Managing resource dependent workflows |
US9111218B1 (en) | 2011-12-27 | 2015-08-18 | Google Inc. | Method and system for remediating topic drift in near-real-time classification of customer feedback |
US9276948B2 (en) * | 2011-12-29 | 2016-03-01 | 21Ct, Inc. | Method and apparatus for identifying a threatening network |
US9578051B2 (en) * | 2011-12-29 | 2017-02-21 | 21Ct, Inc. | Method and system for identifying a threatening network |
US8832116B1 (en) | 2012-01-11 | 2014-09-09 | Google Inc. | Using mobile application logs to measure and maintain accuracy of business information |
US9304989B2 (en) | 2012-02-17 | 2016-04-05 | Bottlenose, Inc. | Machine-based content analysis and user perception tracking of microcontent messages |
US11069336B2 (en) | 2012-03-02 | 2021-07-20 | Apple Inc. | Systems and methods for name pronunciation |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US11159505B1 (en) * | 2012-03-20 | 2021-10-26 | United Services Automobile Association (Usaa) | Scalable risk-based authentication methods and systems |
US11792176B1 (en) * | 2012-03-20 | 2023-10-17 | United Services Automobile Association (Usaa) | Scalable risk-based authentication methods and systems |
US10834119B1 (en) | 2012-03-20 | 2020-11-10 | United Services Automobile Association (Usaa) | Dynamic risk engine |
US9979744B1 (en) | 2012-03-20 | 2018-05-22 | United Services Automobile Association (USAA) | Dynamic risk engine |
US11863579B1 (en) | 2012-03-20 | 2024-01-02 | United Services Automobile Association (Usaa) | Dynamic risk engine |
US10432605B1 (en) * | 2012-03-20 | 2019-10-01 | United Services Automobile Association (Usaa) | Scalable risk-based authentication methods and systems |
US10164999B1 (en) | 2012-03-20 | 2018-12-25 | United Services Automobile Association (Usaa) | Dynamic risk engine |
US20130290350A1 (en) * | 2012-04-30 | 2013-10-31 | Abdullah Al Mueen | Similarity Search Initialization |
US8972415B2 (en) * | 2012-04-30 | 2015-03-03 | Hewlett-Packard Development Company, L.P. | Similarity search initialization |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US20130318095A1 (en) * | 2012-05-14 | 2013-11-28 | WaLa! Inc. | Distributed computing environment for data capture, search and analytics |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
US9495464B2 (en) * | 2012-05-23 | 2016-11-15 | International Business Machines Corporation | Policy based population of genealogical archive data |
US9996625B2 (en) | 2012-05-23 | 2018-06-12 | International Business Machines Corporation | Policy based population of genealogical archive data |
US10546033B2 (en) | 2012-05-23 | 2020-01-28 | International Business Machines Corporation | Policy based population of genealogical archive data |
US20150347602A1 (en) * | 2012-05-23 | 2015-12-03 | International Business Machines Corporation | Policy based population of genealogical archive data |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US9548987B1 (en) * | 2012-06-11 | 2017-01-17 | EMC IP Holding Company LLC | Intelligent remediation of security-related events |
US20130339367A1 (en) * | 2012-06-14 | 2013-12-19 | Santhosh Adayikkoth | Method and system for preferential accessing of one or more critical entities |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US20140040281A1 (en) * | 2012-07-31 | 2014-02-06 | Bottlenose, Inc. | Discovering and ranking trending links about topics |
US9009126B2 (en) * | 2012-07-31 | 2015-04-14 | Bottlenose, Inc. | Discovering and ranking trending links about topics |
US9424612B1 (en) * | 2012-08-02 | 2016-08-23 | Facebook, Inc. | Systems and methods for managing user reputations in social networking systems |
US9436686B1 (en) * | 2012-08-07 | 2016-09-06 | Google Inc. | Claim evaluation system |
US20140067370A1 (en) * | 2012-08-31 | 2014-03-06 | Xerox Corporation | Learning opinion-related patterns for contextual and domain-dependent opinion detection |
US9292589B2 (en) * | 2012-09-04 | 2016-03-22 | Salesforce.Com, Inc. | Identifying a topic for text using a database system |
US20140067814A1 (en) * | 2012-09-04 | 2014-03-06 | salesforce.com, Inc. | Computer implemented methods and apparatus for identifying a topic for a text |
US9412067B2 (en) * | 2012-09-05 | 2016-08-09 | Numenta, Inc. | Anomaly detection in spatial and temporal memory system |
US11087227B2 (en) | 2012-09-05 | 2021-08-10 | Numenta, Inc. | Anomaly detection in spatial and temporal memory system |
US20140067734A1 (en) * | 2012-09-05 | 2014-03-06 | Numenta, Inc. | Anomaly detection in spatial and temporal memory system |
US20140351931A1 (en) * | 2012-09-06 | 2014-11-27 | Dstillery, Inc. | Methods, systems and media for detecting non-intended traffic using co-visitation information |
US9306958B2 (en) * | 2012-09-06 | 2016-04-05 | Dstillery, Inc. | Methods, systems and media for detecting non-intended traffic using co-visitation information |
US10783324B2 (en) | 2012-09-07 | 2020-09-22 | Splunk Inc. | Wizard for configuring a field extraction rule |
US9047181B2 (en) * | 2012-09-07 | 2015-06-02 | Splunk Inc. | Visualization of data from clusters |
US11651149B1 (en) | 2012-09-07 | 2023-05-16 | Splunk Inc. | Event selection via graphical user interface control |
US10783318B2 (en) | 2012-09-07 | 2020-09-22 | Splunk, Inc. | Facilitating modification of an extracted field |
US11423216B2 (en) | 2012-09-07 | 2022-08-23 | Splunk Inc. | Providing extraction results for a particular field |
US11972203B1 (en) | 2012-09-07 | 2024-04-30 | Splunk Inc. | Using anchors to generate extraction rules |
US20170139887A1 (en) | 2012-09-07 | 2017-05-18 | Splunk, Inc. | Advanced field extractor with modification of an extracted field |
US9043332B2 (en) * | 2012-09-07 | 2015-05-26 | Splunk Inc. | Cluster performance monitoring |
US11042697B2 (en) | 2012-09-07 | 2021-06-22 | Splunk Inc. | Determining an extraction rule from positive and negative examples |
US10394946B2 (en) | 2012-09-07 | 2019-08-27 | Splunk Inc. | Refining extraction rules based on selected text within events |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9830344B2 (en) * | 2012-09-17 | 2017-11-28 | Amazon Technologies, Inc. | Evaluation of nodes |
US8977622B1 (en) * | 2012-09-17 | 2015-03-10 | Amazon Technologies, Inc. | Evaluation of nodes |
US20150161187A1 (en) * | 2012-09-17 | 2015-06-11 | Amazon Technologies, Inc. | Evaluation of Nodes |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US20140095606A1 (en) * | 2012-10-01 | 2014-04-03 | Jonathan Arie Matus | Mobile Device-Related Measures of Affinity |
TWI618015B (en) * | 2012-10-01 | 2018-03-11 | Facebook, Inc. | Method, computer-readable non-transitory storage medium, and system for mobile device-related measures of affinity |
US9654591B2 (en) * | 2012-10-01 | 2017-05-16 | Facebook, Inc. | Mobile device-related measures of affinity |
US20170091645A1 (en) * | 2012-10-01 | 2017-03-30 | Facebook, Inc. | Mobile device-related measures of affinity |
US10257309B2 (en) * | 2012-10-01 | 2019-04-09 | Facebook, Inc. | Mobile device-related measures of affinity |
US10333820B1 (en) | 2012-10-23 | 2019-06-25 | Quest Software Inc. | System for inferring dependencies among computing systems |
US20150234883A1 (en) * | 2012-11-05 | 2015-08-20 | Tencent Technology (Shenzhen) Company Limited | Method and system for retrieving real-time information |
US20140129299A1 (en) * | 2012-11-06 | 2014-05-08 | Nice-Systems Ltd | Method and apparatus for detection and analysis of first contact resolution failures |
US10242330B2 (en) * | 2012-11-06 | 2019-03-26 | Nice-Systems Ltd | Method and apparatus for detection and analysis of first contact resolution failures |
US8931101B2 (en) | 2012-11-14 | 2015-01-06 | International Business Machines Corporation | Application-level anomaly detection |
US9317887B2 (en) * | 2012-11-14 | 2016-04-19 | Electronics And Telecommunications Research Institute | Similarity calculating method and apparatus |
US9141792B2 (en) | 2012-11-14 | 2015-09-22 | International Business Machines Corporation | Application-level anomaly detection |
US20140136534A1 (en) * | 2012-11-14 | 2014-05-15 | Electronics And Telecommunications Research Institute | Similarity calculating method and apparatus |
US9235866B2 (en) | 2012-12-12 | 2016-01-12 | Tata Consultancy Services Limited | Analyzing social network |
US11044221B2 (en) * | 2012-12-12 | 2021-06-22 | Netspective Communications Llc | Integration of devices through a social networking platform |
US10430839B2 (en) | 2012-12-12 | 2019-10-01 | Cisco Technology, Inc. | Distributed advertisement insertion in content-centric networks |
US11777894B2 (en) | 2012-12-12 | 2023-10-03 | Netspective Communications Llc | Integration of devices through a social networking platform |
US8935271B2 (en) * | 2012-12-21 | 2015-01-13 | Facebook, Inc. | Extract operator |
US20140181091A1 (en) * | 2012-12-21 | 2014-06-26 | Soren Bogh Lassen | Extract Operator |
US11243849B2 (en) * | 2012-12-27 | 2022-02-08 | Commvault Systems, Inc. | Restoration of centralized data storage manager, such as data storage manager in a hierarchical data storage system |
US10409980B2 (en) | 2012-12-27 | 2019-09-10 | Crowdstrike, Inc. | Real-time representation of security-relevant system state |
US9659085B2 (en) | 2012-12-28 | 2017-05-23 | Microsoft Technology Licensing, Llc | Detecting anomalies in behavioral network with contextual side information |
US11204952B2 (en) * | 2012-12-28 | 2021-12-21 | Microsoft Technology Licensing, Llc | Detecting anomalies in behavioral network with contextual side information |
US12002341B2 (en) | 2013-01-15 | 2024-06-04 | Fitbit, Inc. | Portable monitoring devices and methods of operating the same |
US20170143239A1 (en) * | 2013-01-15 | 2017-05-25 | Fitbit, Inc. | Portable monitoring devices and methods of operating the same |
US10134256B2 (en) * | 2013-01-15 | 2018-11-20 | Fitbit, Inc. | Portable monitoring devices and methods of operating the same |
US11423757B2 (en) | 2013-01-15 | 2022-08-23 | Fitbit, Inc. | Portable monitoring devices and methods of operating the same |
US11775548B1 (en) | 2013-01-22 | 2023-10-03 | Splunk Inc. | Selection of representative data subsets from groups of events |
US9582557B2 (en) * | 2013-01-22 | 2017-02-28 | Splunk Inc. | Sampling events for rule creation with process selection |
US10318537B2 (en) | 2013-01-22 | 2019-06-11 | Splunk Inc. | Advanced field extractor |
US20150234905A1 (en) * | 2013-01-22 | 2015-08-20 | Splunk Inc. | Sampling Events for Rule Creation with Process Selection |
US11709850B1 (en) | 2013-01-22 | 2023-07-25 | Splunk Inc. | Using a timestamp selector to select a time information and a type of time information |
US10585910B1 (en) | 2013-01-22 | 2020-03-10 | Splunk Inc. | Managing selection of a representative data subset according to user-specified parameters with clustering |
US11106691B2 (en) | 2013-01-22 | 2021-08-31 | Splunk Inc. | Automated extraction rule generation using a timestamp selector |
US11232124B2 (en) | 2013-01-22 | 2022-01-25 | Splunk Inc. | Selection of a representative data subset of a set of unstructured data |
US11100150B2 (en) | 2013-01-23 | 2021-08-24 | Splunk Inc. | Determining rules based on text |
US10282463B2 (en) * | 2013-01-23 | 2019-05-07 | Splunk Inc. | Displaying a number of events that have a particular value for a field in a set of events |
US11822372B1 (en) | 2013-01-23 | 2023-11-21 | Splunk Inc. | Automated extraction rule modification based on rejected field values |
US10579648B2 (en) | 2013-01-23 | 2020-03-03 | Splunk Inc. | Determining events associated with a value |
US10585919B2 (en) | 2013-01-23 | 2020-03-10 | Splunk Inc. | Determining events having a value |
US11514086B2 (en) | 2013-01-23 | 2022-11-29 | Splunk Inc. | Generating statistics associated with unique field values |
US11782678B1 (en) | 2013-01-23 | 2023-10-10 | Splunk Inc. | Graphical user interface for extraction rules |
US20170255695A1 (en) | 2013-01-23 | 2017-09-07 | Splunk, Inc. | Determining Rules Based on Text |
US12061638B1 (en) | 2013-01-23 | 2024-08-13 | Splunk Inc. | Presenting filtered events having selected extracted values |
US10802797B2 (en) | 2013-01-23 | 2020-10-13 | Splunk Inc. | Providing an extraction rule associated with a selected portion of an event |
US11210325B2 (en) | 2013-01-23 | 2021-12-28 | Splunk Inc. | Automatic rule modification |
US11556577B2 (en) | 2013-01-23 | 2023-01-17 | Splunk Inc. | Filtering event records based on selected extracted value |
US10769178B2 (en) * | 2013-01-23 | 2020-09-08 | Splunk Inc. | Displaying a proportion of events that have a particular value for a field in a set of events |
US11119728B2 (en) | 2013-01-23 | 2021-09-14 | Splunk Inc. | Displaying event records with emphasized fields |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10714117B2 (en) | 2013-02-07 | 2020-07-14 | Apple Inc. | Voice trigger for a digital assistant |
US9081957B2 (en) * | 2013-02-07 | 2015-07-14 | Raytheon BBN Technologies Corp | Dynamic operational watermarking for software and hardware assurance |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US20140223554A1 (en) * | 2013-02-07 | 2014-08-07 | Thomas Gilbert Roden, III | Dynamic operational watermarking for software and hardware assurance |
US10776708B2 (en) | 2013-03-01 | 2020-09-15 | Forcepoint, LLC | Analyzing behavior in light of social time |
US10832153B2 (en) | 2013-03-01 | 2020-11-10 | Forcepoint, LLC | Analyzing behavior in light of social time |
US11783216B2 (en) | 2013-03-01 | 2023-10-10 | Forcepoint Llc | Analyzing behavior in light of social time |
US10860942B2 (en) | 2013-03-01 | 2020-12-08 | Forcepoint, LLC | Analyzing behavior in light of social time |
US10587631B2 (en) * | 2013-03-11 | 2020-03-10 | Facebook, Inc. | Database attack detection tool |
US9195943B2 (en) * | 2013-03-12 | 2015-11-24 | Bmc Software, Inc. | Behavioral rules discovery for intelligent computing environment administration |
US10692007B2 (en) | 2013-03-12 | 2020-06-23 | Bmc Software, Inc. | Behavioral rules discovery for intelligent computing environment administration |
US20140282422A1 (en) * | 2013-03-12 | 2014-09-18 | Netflix, Inc. | Using canary instances for software analysis |
US10318399B2 (en) * | 2013-03-12 | 2019-06-11 | Netflix, Inc. | Using canary instances for software analysis |
US20140278729A1 (en) * | 2013-03-12 | 2014-09-18 | Palo Alto Research Center Incorporated | Multiple resolution visualization of detected anomalies in corporate environment |
US9563849B2 (en) | 2013-03-12 | 2017-02-07 | Bmc Software, Inc. | Behavioral rules discovery for intelligent computing environment administration |
US20140279797A1 (en) * | 2013-03-12 | 2014-09-18 | Bmc Software, Inc. | Behavioral rules discovery for intelligent computing environment administration |
US9563412B2 (en) * | 2013-03-13 | 2017-02-07 | Microsoft Technology Licensing, Llc. | Statically extensible types |
US20140282442A1 (en) * | 2013-03-13 | 2014-09-18 | Microsoft Corporation | Statically extensible types |
US9652207B2 (en) | 2013-03-13 | 2017-05-16 | Microsoft Technology Licensing, Llc. | Static type checking across module universes |
US9639335B2 (en) | 2013-03-13 | 2017-05-02 | Microsoft Technology Licensing, Llc. | Contextual typing |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US20140317019A1 (en) * | 2013-03-14 | 2014-10-23 | Jochen Papenbrock | System and method for risk management and portfolio optimization |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US8918339B2 (en) * | 2013-03-15 | 2014-12-23 | Facebook, Inc. | Associating an indication of user emotional reaction with content items presented by a social networking system |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US10298534B2 (en) | 2013-03-15 | 2019-05-21 | Facebook, Inc. | Associating an indication of user emotional reaction with content items presented by a social networking system |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US10931622B1 (en) | 2013-03-15 | 2021-02-23 | Facebook, Inc. | Associating an indication of user emotional reaction with content items presented by a social networking system |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US20140279418A1 (en) * | 2013-03-15 | 2014-09-18 | Facebook, Inc. | Associating an indication of user emotional reaction with content items presented by a social networking system |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US20170169360A1 (en) * | 2013-04-02 | 2017-06-15 | Patternex, Inc. | Method and system for training a big data machine to defend |
US9613322B2 (en) | 2013-04-02 | 2017-04-04 | Orbis Technologies, Inc. | Data center analytics and dashboard |
WO2014165601A1 (en) * | 2013-04-02 | 2014-10-09 | Orbis Technologies, Inc. | Data center analytics and dashboard |
US9904893B2 (en) * | 2013-04-02 | 2018-02-27 | Patternex, Inc. | Method and system for training a big data machine to defend |
US9544380B2 (en) | 2013-04-10 | 2017-01-10 | International Business Machines Corporation | Data analytics and security in social networks |
US9264442B2 (en) * | 2013-04-26 | 2016-02-16 | Palo Alto Research Center Incorporated | Detecting anomalies in work practice data by combining multiple domains of information |
US20140325643A1 (en) * | 2013-04-26 | 2014-10-30 | Palo Alto Research Center Incorporated | Detecting anomalies in work practice data by combining multiple domains of information |
US9935791B2 (en) | 2013-05-20 | 2018-04-03 | Cisco Technology, Inc. | Method and system for name resolution across heterogeneous architectures |
US20140351129A1 (en) * | 2013-05-24 | 2014-11-27 | Hewlett-Packard Development Company, L.P. | Centralized versatile transaction verification |
US9858171B2 (en) * | 2013-06-03 | 2018-01-02 | Google Llc | Application analytics reporting |
US20160210219A1 (en) * | 2013-06-03 | 2016-07-21 | Google Inc. | Application analytics reporting |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US11048473B2 (en) | 2013-06-09 | 2021-06-29 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10769385B2 (en) | 2013-06-09 | 2020-09-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
WO2014205421A1 (en) * | 2013-06-21 | 2014-12-24 | Arizona Board Of Regents For The University Of Arizona | Automated detection of insider threats |
US9516046B2 (en) | 2013-07-25 | 2016-12-06 | Splunk Inc. | Analyzing a group of values extracted from events of machine data relative to a population statistic for those values |
US11134094B2 (en) | 2013-07-25 | 2021-09-28 | Splunk Inc. | Detection of potential security threats in machine data based on pattern detection |
US10091227B2 (en) | 2013-07-25 | 2018-10-02 | Splunk Inc. | Detection of potential security threats based on categorical patterns |
US9215240B2 (en) * | 2013-07-25 | 2015-12-15 | Splunk Inc. | Investigative and dynamic detection of potential security-threat indicators from events in big data |
US20130326620A1 (en) * | 2013-07-25 | 2013-12-05 | Splunk Inc. | Investigative and dynamic detection of potential security-threat indicators from events in big data |
US10567412B2 (en) | 2013-07-25 | 2020-02-18 | Splunk Inc. | Security threat detection based on patterns in machine data events |
WO2015060994A3 (en) * | 2013-07-26 | 2015-06-18 | Nant Holdings Ip, Llc | Discovery routing systems and engines |
US10114925B2 (en) | 2013-07-26 | 2018-10-30 | Nant Holdings Ip, Llc | Discovery routing systems and engines |
US11017884B2 (en) | 2013-07-26 | 2021-05-25 | Nant Holdings Ip, Llc | Discovery routing systems and engines |
US12051485B2 (en) | 2013-07-26 | 2024-07-30 | Nant Holdings Ip, Llc | Discovery routing systems and engines |
US10332618B2 (en) | 2013-07-26 | 2019-06-25 | Nant Holdings Ip, Llc | Discovery routing systems and engines |
US20170257292A1 (en) * | 2013-07-31 | 2017-09-07 | Splunk Inc. | Systems and Methods For Displaying Metrics On Real-Time Data In An Environment |
US10574548B2 (en) | 2013-07-31 | 2020-02-25 | Splunk Inc. | Key indicators view |
US11831523B2 (en) | 2013-07-31 | 2023-11-28 | Splunk Inc. | Systems and methods for displaying adjustable metrics on real-time data in a computing environment |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US9704172B2 (en) | 2013-08-08 | 2017-07-11 | E-Valuation, Inc. | Systems and methods of simulating user intuition of business relationships using biographical imagery |
WO2015021449A3 (en) * | 2013-08-08 | 2015-07-30 | E-Valuation,Inc. | Systems and methods of communicating information regarding interpersonal relationships using biographical imagery |
US9300682B2 (en) | 2013-08-09 | 2016-03-29 | Lockheed Martin Corporation | Composite analysis of executable content across enterprise network |
US10013238B2 (en) * | 2013-08-12 | 2018-07-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Predicting elements for workflow development |
US20160188298A1 (en) * | 2013-08-12 | 2016-06-30 | Telefonaktiebolaget L M Ericsson (Publ) | Predicting Elements for Workflow Development |
US9485222B2 (en) * | 2013-08-20 | 2016-11-01 | Hewlett-Packard Development Company, L.P. | Data stream traffic control |
US20150058622A1 (en) * | 2013-08-20 | 2015-02-26 | Hewlett-Packard Development Company, L.P. | Data stream traffic control |
US9336332B2 (en) | 2013-08-28 | 2016-05-10 | Clipcard Inc. | Programmatic data discovery platforms for computing applications |
US20150074806A1 (en) * | 2013-09-10 | 2015-03-12 | Symantec Corporation | Systems and methods for using event-correlation graphs to detect attacks on computing systems |
US9141790B2 (en) * | 2013-09-10 | 2015-09-22 | Symantec Corporation | Systems and methods for using event-correlation graphs to detect attacks on computing systems |
US9342796B1 (en) | 2013-09-16 | 2016-05-17 | Amazon Technologies, Inc. | Learning-based data decontextualization |
US10002177B1 (en) * | 2013-09-16 | 2018-06-19 | Amazon Technologies, Inc. | Crowdsourced analysis of decontextualized data |
EP2866168A1 (en) * | 2013-09-17 | 2015-04-29 | Sap Se | Calibration of strategies for fraud detection |
WO2015051185A1 (en) * | 2013-10-04 | 2015-04-09 | Cyberflow Analytics, Inc. | Network intrusion detection |
EP3055808A4 (en) * | 2013-10-08 | 2017-04-26 | Crowdstrike, Inc. | Event model for correlating system component states |
EP3055808A1 (en) * | 2013-10-08 | 2016-08-17 | Crowdstrike, Inc. | Event model for correlating system component states |
US9516064B2 (en) | 2013-10-14 | 2016-12-06 | Intuit Inc. | Method and system for dynamic and comprehensive vulnerability management |
US9407549B2 (en) | 2013-10-29 | 2016-08-02 | Palo Alto Research Center Incorporated | System and method for hash-based forwarding of packets with hierarchically structured variable-length identifiers |
US9276840B2 (en) | 2013-10-30 | 2016-03-01 | Palo Alto Research Center Incorporated | Interest messages with a payload for a named data network |
US9401864B2 (en) | 2013-10-31 | 2016-07-26 | Palo Alto Research Center Incorporated | Express header for packets with hierarchically structured variable-length identifiers |
US10129365B2 (en) | 2013-11-13 | 2018-11-13 | Cisco Technology, Inc. | Method and apparatus for pre-fetching remote content based on static and dynamic recommendations |
US10089655B2 (en) | 2013-11-27 | 2018-10-02 | Cisco Technology, Inc. | Method and apparatus for scalable data broadcasting |
US9503358B2 (en) | 2013-12-05 | 2016-11-22 | Palo Alto Research Center Incorporated | Distance-based routing in an information-centric network |
US20150161024A1 (en) * | 2013-12-06 | 2015-06-11 | Qualcomm Incorporated | Methods and Systems of Generating Application-Specific Models for the Targeted Protection of Vital Applications |
US9652362B2 (en) | 2013-12-06 | 2017-05-16 | Qualcomm Incorporated | Methods and systems of using application-specific and application-type-specific models for the efficient classification of mobile device behaviors |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US11314370B2 (en) | 2013-12-06 | 2022-04-26 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9606893B2 (en) * | 2013-12-06 | 2017-03-28 | Qualcomm Incorporated | Methods and systems of generating application-specific models for the targeted protection of vital applications |
US20160080406A1 (en) * | 2013-12-19 | 2016-03-17 | Microsoft Technology Licensing, Llc | Detecting anomalous activity from accounts of an online service |
US9210183B2 (en) | 2013-12-19 | 2015-12-08 | Microsoft Technology Licensing, Llc | Detecting anomalous activity from accounts of an online service |
US9501345B1 (en) | 2013-12-23 | 2016-11-22 | Intuit Inc. | Method and system for creating enriched log data |
US10223410B2 (en) * | 2014-01-06 | 2019-03-05 | Cisco Technology, Inc. | Method and system for acquisition, normalization, matching, and enrichment of data |
US20150193497A1 (en) * | 2014-01-06 | 2015-07-09 | Cisco Technology, Inc. | Method and system for acquisition, normalization, matching, and enrichment of data |
US11150793B2 (en) | 2014-01-13 | 2021-10-19 | International Business Machines Corporation | Social balancer for indicating the relative priorities of linked objects |
US9292616B2 (en) | 2014-01-13 | 2016-03-22 | International Business Machines Corporation | Social balancer for indicating the relative priorities of linked objects |
US10324608B2 (en) | 2014-01-13 | 2019-06-18 | International Business Machines Corporation | Social balancer for indicating the relative priorities of linked objects |
US9379979B2 (en) | 2014-01-14 | 2016-06-28 | Palo Alto Research Center Incorporated | Method and apparatus for establishing a virtual interface for a set of mutual-listener devices |
US10172068B2 (en) | 2014-01-22 | 2019-01-01 | Cisco Technology, Inc. | Service-oriented routing in software-defined MANETs |
US10098051B2 (en) | 2014-01-22 | 2018-10-09 | Cisco Technology, Inc. | Gateways and routing in software-defined manets |
US9374304B2 (en) | 2014-01-24 | 2016-06-21 | Palo Alto Research Center Incorporated | End-to-end route tracing over a named-data network |
US10474956B2 (en) | 2014-01-26 | 2019-11-12 | International Business Machines Corporation | Detecting deviations between event log and process model |
US11354588B2 (en) * | 2014-01-26 | 2022-06-07 | International Business Machines Corporation | Detecting deviations between event log and process model |
US11514348B2 (en) | 2014-01-26 | 2022-11-29 | International Business Machines Corporation | Detecting deviations between event log and process model |
US10417569B2 (en) | 2014-01-26 | 2019-09-17 | International Business Machines Corporation | Detecting deviations between event log and process model |
US10467539B2 (en) | 2014-01-26 | 2019-11-05 | International Business Machines Corporation | Detecting deviations between event log and process model |
US10452987B2 (en) | 2014-01-26 | 2019-10-22 | International Business Machines Corporation | Detecting deviations between event log and process model |
US9552243B2 (en) | 2014-01-27 | 2017-01-24 | International Business Machines Corporation | Detecting an abnormal subsequence in a data sequence |
US9686301B2 (en) | 2014-02-03 | 2017-06-20 | Intuit Inc. | Method and system for virtual asset assisted extrusion and intrusion detection and threat scoring in a cloud computing environment |
US9923909B2 (en) | 2014-02-03 | 2018-03-20 | Intuit Inc. | System and method for providing a self-monitoring, self-reporting, and self-repairing virtual asset configured for extrusion and intrusion detection and threat scoring in a cloud computing environment |
US10360062B2 (en) | 2014-02-03 | 2019-07-23 | Intuit Inc. | System and method for providing a self-monitoring, self-reporting, and self-repairing virtual asset configured for extrusion and intrusion detection and threat scoring in a cloud computing environment |
US9286403B2 (en) * | 2014-02-04 | 2016-03-15 | Shoobx, Inc. | Computer-guided corporate governance with document generation and execution |
US9672524B2 (en) | 2014-02-04 | 2017-06-06 | Shoobx, Inc. | Computer-guided corporate governance with document generation and execution |
US9954678B2 (en) | 2014-02-06 | 2018-04-24 | Cisco Technology, Inc. | Content-based transport security |
WO2015119607A1 (en) * | 2014-02-06 | 2015-08-13 | Hewlett-Packard Development Company, L.P. | Resource management |
US10552407B2 (en) * | 2014-02-07 | 2020-02-04 | Mackay Memorial Hospital | Computing device for data managing and decision making |
US10135788B1 (en) * | 2014-02-11 | 2018-11-20 | Data Visor Inc. | Using hypergraphs to determine suspicious user activities |
US9787640B1 (en) * | 2014-02-11 | 2017-10-10 | DataVisor Inc. | Using hypergraphs to determine suspicious user activities |
US10009358B1 (en) * | 2014-02-11 | 2018-06-26 | DataVisor Inc. | Graph based framework for detecting malicious or compromised accounts |
US11720599B1 (en) | 2014-02-13 | 2023-08-08 | Pivotal Software, Inc. | Clustering and visualizing alerts and incidents |
US20150235152A1 (en) * | 2014-02-18 | 2015-08-20 | Palo Alto Research Center Incorporated | System and method for modeling behavior change and consistency to detect malicious insiders |
US11411984B2 (en) | 2014-02-21 | 2022-08-09 | Intuit Inc. | Replacing a potentially threatening virtual asset |
US20150242786A1 (en) * | 2014-02-21 | 2015-08-27 | International Business Machines Corporation | Integrating process context from heterogeneous workflow containers to optimize workflow performance |
US10757133B2 (en) | 2014-02-21 | 2020-08-25 | Intuit Inc. | Method and system for creating and deploying virtual assets |
US9852208B2 (en) | 2014-02-25 | 2017-12-26 | International Business Machines Corporation | Discovering communities and expertise of users using semantic analysis of resource access logs |
US9678998B2 (en) | 2014-02-28 | 2017-06-13 | Cisco Technology, Inc. | Content name resolution for information centric networking |
US10706029B2 (en) | 2014-02-28 | 2020-07-07 | Cisco Technology, Inc. | Content name resolution for information centric networking |
US10089651B2 (en) | 2014-03-03 | 2018-10-02 | Cisco Technology, Inc. | Method and apparatus for streaming advertisements in a scalable data broadcasting system |
US9836540B2 (en) | 2014-03-04 | 2017-12-05 | Cisco Technology, Inc. | System and method for direct storage access in a content-centric network |
US10445380B2 (en) | 2014-03-04 | 2019-10-15 | Cisco Technology, Inc. | System and method for direct storage access in a content-centric network |
US9626528B2 (en) * | 2014-03-07 | 2017-04-18 | International Business Machines Corporation | Data leak prevention enforcement based on learned document classification |
US20150254469A1 (en) * | 2014-03-07 | 2015-09-10 | International Business Machines Corporation | Data leak prevention enforcement based on learned document classification |
US9473405B2 (en) | 2014-03-10 | 2016-10-18 | Palo Alto Research Center Incorporated | Concurrent hashes and sub-hashes on data streams |
US9626413B2 (en) | 2014-03-10 | 2017-04-18 | Cisco Systems, Inc. | System and method for ranking content popularity in a content-centric network |
US9391896B2 (en) | 2014-03-10 | 2016-07-12 | Palo Alto Research Center Incorporated | System and method for packet forwarding using a conjunctive normal form strategy in a content-centric network |
US20150262474A1 (en) * | 2014-03-12 | 2015-09-17 | Haltian Oy | Relevance determination of sensor event |
US20150261940A1 (en) * | 2014-03-12 | 2015-09-17 | Symantec Corporation | Systems and methods for detecting information leakage by an organizational insider |
US20150262184A1 (en) * | 2014-03-12 | 2015-09-17 | Microsoft Corporation | Two stage risk model building and evaluation |
US9652597B2 (en) * | 2014-03-12 | 2017-05-16 | Symantec Corporation | Systems and methods for detecting information leakage by an organizational insider |
US9672729B2 (en) * | 2014-03-12 | 2017-06-06 | Haltian Oy | Relevance determination of sensor event |
US20170286678A1 (en) * | 2014-03-17 | 2017-10-05 | Proofpoint, Inc. | Behavior Profiling for Malware Detection |
US10102372B2 (en) * | 2014-03-17 | 2018-10-16 | Proofpoint, Inc. | Behavior profiling for malware detection |
US9407432B2 (en) | 2014-03-19 | 2016-08-02 | Palo Alto Research Center Incorporated | System and method for efficient and secure distribution of digital content |
US9225730B1 (en) * | 2014-03-19 | 2015-12-29 | Amazon Technologies, Inc. | Graph based detection of anomalous activity |
US9256739B1 (en) * | 2014-03-21 | 2016-02-09 | Symantec Corporation | Systems and methods for using event-correlation graphs to generate remediation procedures |
US9916601B2 (en) | 2014-03-21 | 2018-03-13 | Cisco Technology, Inc. | Marketplace for presenting advertisements in a scalable data broadcasting system |
US9363179B2 (en) | 2014-03-26 | 2016-06-07 | Palo Alto Research Center Incorporated | Multi-publisher routing protocol for named data networks |
US9363086B2 (en) | 2014-03-31 | 2016-06-07 | Palo Alto Research Center Incorporated | Aggregate signing of data in content centric networking |
US9459987B2 (en) | 2014-03-31 | 2016-10-04 | Intuit Inc. | Method and system for comparing different versions of a cloud based application in a production environment using segregated backend systems |
US9716622B2 (en) | 2014-04-01 | 2017-07-25 | Cisco Technology, Inc. | System and method for dynamic name configuration in content-centric networks |
US11948048B2 (en) | 2014-04-02 | 2024-04-02 | Brighterion, Inc. | Artificial intelligence for context classifier |
US10896421B2 (en) | 2014-04-02 | 2021-01-19 | Brighterion, Inc. | Smart retail analytics and commercial messaging |
US20150286928A1 (en) * | 2014-04-03 | 2015-10-08 | Adobe Systems Incorporated | Causal Modeling and Attribution |
US10949753B2 (en) * | 2014-04-03 | 2021-03-16 | Adobe Inc. | Causal modeling and attribution |
US9473576B2 (en) | 2014-04-07 | 2016-10-18 | Palo Alto Research Center Incorporated | Service discovery using collection synchronization with exact names |
US10075521B2 (en) | 2014-04-07 | 2018-09-11 | Cisco Technology, Inc. | Collection synchronization using equality matched network names |
US9596251B2 (en) | 2014-04-07 | 2017-03-14 | Intuit Inc. | Method and system for providing security aware applications |
US9390289B2 (en) | 2014-04-07 | 2016-07-12 | Palo Alto Research Center Incorporated | Secure collection synchronization using matched network names |
US11005738B1 (en) | 2014-04-09 | 2021-05-11 | Quest Software Inc. | System and method for end-to-end response-time analysis |
US9451032B2 (en) | 2014-04-10 | 2016-09-20 | Palo Alto Research Center Incorporated | System and method for simple service discovery in content-centric networks |
US9705901B2 (en) * | 2014-04-11 | 2017-07-11 | Fuji Xerox Co., Ltd. | Unauthorized-communication detecting apparatus, unauthorized-communication detecting method and non-transitory computer readable medium |
US20150294111A1 (en) * | 2014-04-11 | 2015-10-15 | Fuji Xerox Co., Ltd. | Unauthorized-communication detecting apparatus, unauthorized-communication detecting method and non-transitory computer readable medium |
US20160267441A1 (en) * | 2014-04-13 | 2016-09-15 | Helixaeon Inc. | Visualization and analysis of scheduling data |
US10055247B2 (en) | 2014-04-18 | 2018-08-21 | Intuit Inc. | Method and system for enabling self-monitoring virtual assets to correlate external events with characteristic patterns associated with the virtual assets |
US11294700B2 (en) | 2014-04-18 | 2022-04-05 | Intuit Inc. | Method and system for enabling self-monitoring virtual assets to correlate external events with characteristic patterns associated with the virtual assets |
US9900322B2 (en) | 2014-04-30 | 2018-02-20 | Intuit Inc. | Method and system for providing permissions management |
US9992281B2 (en) | 2014-05-01 | 2018-06-05 | Cisco Technology, Inc. | Accountable content stores for information centric networks |
US10365780B2 (en) * | 2014-05-05 | 2019-07-30 | Adobe Inc. | Crowdsourcing for documents and forms |
US20150317337A1 (en) * | 2014-05-05 | 2015-11-05 | General Electric Company | Systems and Methods for Identifying and Driving Actionable Insights from Data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US11017330B2 (en) | 2014-05-20 | 2021-05-25 | Elasticsearch B.V. | Method and system for analysing data |
US9609014B2 (en) | 2014-05-22 | 2017-03-28 | Cisco Systems, Inc. | Method and apparatus for preventing insertion of malicious content at a named data network router |
US10158656B2 (en) | 2014-05-22 | 2018-12-18 | Cisco Technology, Inc. | Method and apparatus for preventing insertion of malicious content at a named data network router |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9455835B2 (en) | 2014-05-23 | 2016-09-27 | Palo Alto Research Center Incorporated | System and method for circular link resolution with hash-based names in content-centric networks |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US9742794B2 (en) | 2014-05-27 | 2017-08-22 | Intuit Inc. | Method and apparatus for automating threat model generation and pattern identification |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10714095B2 (en) | 2014-05-30 | 2020-07-14 | Apple Inc. | Intelligent assistant for home automation |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US10657966B2 (en) | 2014-05-30 | 2020-05-19 | Apple Inc. | Better resolution when referencing to concepts |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US10699717B2 (en) | 2014-05-30 | 2020-06-30 | Apple Inc. | Intelligent assistant for home automation |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10417344B2 (en) | 2014-05-30 | 2019-09-17 | Apple Inc. | Exemplar-based natural language processing |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US9798882B2 (en) * | 2014-06-06 | 2017-10-24 | Crowdstrike, Inc. | Real-time model of states of monitored devices |
US10268820B2 (en) * | 2014-06-11 | 2019-04-23 | Nippon Telegraph And Telephone Corporation | Malware determination device, malware determination system, malware determination method, and program |
US9516144B2 (en) | 2014-06-19 | 2016-12-06 | Palo Alto Research Center Incorporated | Cut-through forwarding of CCNx message fragments with IP encapsulation |
US9537719B2 (en) | 2014-06-19 | 2017-01-03 | Palo Alto Research Center Incorporated | Method and apparatus for deploying a minimal-cost CCN topology |
US10212176B2 (en) * | 2014-06-23 | 2019-02-19 | Hewlett Packard Enterprise Development Lp | Entity group behavior profiling |
US20150373039A1 (en) * | 2014-06-23 | 2015-12-24 | Niara, Inc. | Entity Group Behavior Profiling |
US11323469B2 (en) | 2014-06-23 | 2022-05-03 | Hewlett Packard Enterprise Development Lp | Entity group behavior profiling |
US10469514B2 (en) | 2014-06-23 | 2019-11-05 | Hewlett Packard Enterprise Development Lp | Collaborative and adaptive threat intelligence for computer security |
US20150379158A1 (en) * | 2014-06-27 | 2015-12-31 | Gabriel G. Infante-Lopez | Systems and methods for pattern matching and relationship discovery |
US10262077B2 (en) * | 2014-06-27 | 2019-04-16 | Intel Corporation | Systems and methods for pattern matching and relationship discovery |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20150381641A1 (en) * | 2014-06-30 | 2015-12-31 | Intuit Inc. | Method and system for efficient management of security threats in a distributed computing environment |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9866581B2 (en) | 2014-06-30 | 2018-01-09 | Intuit Inc. | Method and system for secure delivery of information to computing environments |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9426113B2 (en) | 2014-06-30 | 2016-08-23 | Palo Alto Research Center Incorporated | System and method for managing devices over a content centric network |
US10050997B2 (en) | 2014-06-30 | 2018-08-14 | Intuit Inc. | Method and system for secure delivery of information to computing environments |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US20160004968A1 (en) * | 2014-07-01 | 2016-01-07 | Hitachi, Ltd. | Correlation rule analysis apparatus and correlation rule analysis method |
US9699198B2 (en) | 2014-07-07 | 2017-07-04 | Cisco Technology, Inc. | System and method for parallel secure content bootstrapping in content-centric networks |
US10446135B2 (en) * | 2014-07-09 | 2019-10-15 | Genesys Telecommunications Laboratories, Inc. | System and method for semantically exploring concepts |
US11288231B2 (en) * | 2014-07-09 | 2022-03-29 | Splunk Inc. | Reproducing datasets generated by alert-triggering search queries |
WO2016004744A1 (en) * | 2014-07-10 | 2016-01-14 | 同济大学 | Method for measuring user behavior consistency based on complex correspondence system |
US9621354B2 (en) | 2014-07-17 | 2017-04-11 | Cisco Systems, Inc. | Reconstructable content objects |
US9959156B2 (en) | 2014-07-17 | 2018-05-01 | Cisco Technology, Inc. | Interest return control message |
US10237075B2 (en) | 2014-07-17 | 2019-03-19 | Cisco Technology, Inc. | Reconstructable content objects |
US9590887B2 (en) | 2014-07-18 | 2017-03-07 | Cisco Systems, Inc. | Method and system for keeping interest alive in a content centric network |
US10305968B2 (en) | 2014-07-18 | 2019-05-28 | Cisco Technology, Inc. | Reputation-based strategy for forwarding and responding to interests over a content centric network |
US9729616B2 (en) | 2014-07-18 | 2017-08-08 | Cisco Technology, Inc. | Reputation-based strategy for forwarding and responding to interests over a content centric network |
US9929935B2 (en) | 2014-07-18 | 2018-03-27 | Cisco Technology, Inc. | Method and system for keeping interest alive in a content centric network |
US20160019479A1 (en) * | 2014-07-18 | 2016-01-21 | Rebecca S. Busch | Interactive and Iterative Behavioral Model, System, and Method for Detecting Fraud, Waste, and Abuse |
US9535968B2 (en) | 2014-07-21 | 2017-01-03 | Palo Alto Research Center Incorporated | System for distributing nameless objects using self-certifying names |
US9473481B2 (en) | 2014-07-31 | 2016-10-18 | Intuit Inc. | Method and system for providing a virtual asset perimeter |
US10102082B2 (en) | 2014-07-31 | 2018-10-16 | Intuit Inc. | Method and system for providing automated self-healing virtual assets |
US10268821B2 (en) * | 2014-08-04 | 2019-04-23 | Darktrace Limited | Cyber security |
EP3178033B1 (en) * | 2014-08-04 | 2020-01-29 | Darktrace Limited | Cyber security |
US11693964B2 (en) * | 2014-08-04 | 2023-07-04 | Darktrace Holdings Limited | Cyber security using one or more models trained on a normal behavior |
US20190251260A1 (en) * | 2014-08-04 | 2019-08-15 | Darktrace Limited | Cyber security using one or more models trained on a normal behavior |
US10412117B2 (en) * | 2014-08-05 | 2019-09-10 | Dflabs S.P.A. | Method and system for automated cybersecurity incident and artifact visualization and correlation for security operation centers and computer emergency response teams |
US20160044061A1 (en) * | 2014-08-05 | 2016-02-11 | Df Labs | Method and system for automated cybersecurity incident and artifact visualization and correlation for security operation centers and computer emergency response teams |
US11089063B2 (en) | 2014-08-05 | 2021-08-10 | Dflabs S.P.A. | Method and system for automated cybersecurity incident and artifact visualization and correlation for security operation centers and computer emergency response teams |
US10929777B2 (en) | 2014-08-08 | 2021-02-23 | Brighterion, Inc. | Method of automating data science services |
US11348110B2 (en) | 2014-08-08 | 2022-05-31 | Brighterion, Inc. | Artificial intelligence fraud management solution |
US11023894B2 (en) | 2014-08-08 | 2021-06-01 | Brighterion, Inc. | Fast access vectors in real-time behavioral profiling in fraudulent financial transactions |
US9882964B2 (en) | 2014-08-08 | 2018-01-30 | Cisco Technology, Inc. | Explicit strategy feedback in name-based forwarding |
US11507663B2 (en) | 2014-08-11 | 2022-11-22 | Sentinel Labs Israel Ltd. | Method of remediating operations performed by a program and system thereof |
WO2016024268A1 (en) * | 2014-08-11 | 2016-02-18 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US10664596B2 (en) | 2014-08-11 | 2020-05-26 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US11625485B2 (en) | 2014-08-11 | 2023-04-11 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US10417424B2 (en) | 2014-08-11 | 2019-09-17 | Sentinel Labs Israel Ltd. | Method of remediating operations performed by a program and system thereof |
US11886591B2 (en) | 2014-08-11 | 2024-01-30 | Sentinel Labs Israel Ltd. | Method of remediating operations performed by a program and system thereof |
US9710648B2 (en) | 2014-08-11 | 2017-07-18 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US12026257B2 (en) | 2014-08-11 | 2024-07-02 | Sentinel Labs Israel Ltd. | Method of malware detection and system thereof |
US10102374B1 (en) | 2014-08-11 | 2018-10-16 | Sentinel Labs Israel Ltd. | Method of remediating a program and system thereof by undoing operations |
US10977370B2 (en) | 2014-08-11 | 2021-04-13 | Sentinel Labs Israel Ltd. | Method of remediating operations performed by a program and system thereof |
US9729662B2 (en) | 2014-08-11 | 2017-08-08 | Cisco Technology, Inc. | Probabilistic lazy-forwarding technique without validation in a content centric network |
US9503365B2 (en) | 2014-08-11 | 2016-11-22 | Palo Alto Research Center Incorporated | Reputation-based instruction processing over an information centric network |
US9391777B2 (en) | 2014-08-15 | 2016-07-12 | Palo Alto Research Center Incorporated | System and method for performing key resolution over a content centric network |
US9800637B2 (en) | 2014-08-19 | 2017-10-24 | Cisco Technology, Inc. | System and method for all-in-one content stream in content-centric networks |
US10367871B2 (en) | 2014-08-19 | 2019-07-30 | Cisco Technology, Inc. | System and method for all-in-one content stream in content-centric networks |
US9467492B2 (en) | 2014-08-19 | 2016-10-11 | Palo Alto Research Center Incorporated | System and method for reconstructable all-in-one content stream |
US9497282B2 (en) | 2014-08-27 | 2016-11-15 | Palo Alto Research Center Incorporated | Network coding for content-centric network |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10204013B2 (en) | 2014-09-03 | 2019-02-12 | Cisco Technology, Inc. | System and method for maintaining a distributed and fault-tolerant state over an information centric network |
US11314597B2 (en) | 2014-09-03 | 2022-04-26 | Cisco Technology, Inc. | System and method for maintaining a distributed and fault-tolerant state over an information centric network |
US9553812B2 (en) | 2014-09-09 | 2017-01-24 | Palo Alto Research Center Incorporated | Interest keep alives at intermediate routers in a CCN |
US20170262561A1 (en) * | 2014-09-11 | 2017-09-14 | Nec Corporation | Information processing apparatus, information processing method, and recording medium |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10936959B2 (en) | 2014-09-16 | 2021-03-02 | Airbnb, Inc. | Determining trustworthiness and compatibility of a person |
US10169708B2 (en) | 2014-09-16 | 2019-01-01 | Airbnb, Inc. | Determining trustworthiness and compatibility of a person |
US9070088B1 (en) | 2014-09-16 | 2015-06-30 | Trooly Inc. | Determining trustworthiness and compatibility of a person |
US9773112B1 (en) * | 2014-09-29 | 2017-09-26 | Fireeye, Inc. | Exploit detection of malware and malware families |
US10223644B2 (en) | 2014-09-29 | 2019-03-05 | Cisco Technology, Inc. | Behavioral modeling of a data center utilizing human knowledge to enhance a machine learning algorithm |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US10453443B2 (en) | 2014-09-30 | 2019-10-22 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10438595B2 (en) | 2014-09-30 | 2019-10-08 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10390213B2 (en) | 2014-09-30 | 2019-08-20 | Apple Inc. | Social reminders |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10783200B2 (en) * | 2014-10-10 | 2020-09-22 | Salesforce.Com, Inc. | Systems and methods of de-duplicating similar news feed items |
US10846623B2 (en) | 2014-10-15 | 2020-11-24 | Brighterion, Inc. | Data clean-up method for improving predictive model training |
US11080709B2 (en) | 2014-10-15 | 2021-08-03 | Brighterion, Inc. | Method of reducing financial losses in multiple payment channels upon a recognition of fraud first appearing in any one payment channel |
US10977655B2 (en) | 2014-10-15 | 2021-04-13 | Brighterion, Inc. | Method for improving operating profits with better automated decision making with artificial intelligence |
US10984423B2 (en) | 2014-10-15 | 2021-04-20 | Brighterion, Inc. | Method of operating artificial intelligence machines to improve predictive model training and performance |
US11080793B2 (en) | 2014-10-15 | 2021-08-03 | Brighterion, Inc. | Method of personalizing, individualizing, and automating the management of healthcare fraud-waste-abuse to unique individual healthcare providers |
US9306965B1 (en) * | 2014-10-21 | 2016-04-05 | IronNet Cybersecurity, Inc. | Cybersecurity system |
US9558244B2 (en) * | 2014-10-22 | 2017-01-31 | Conversable, Inc. | Systems and methods for social recommendations |
US10069933B2 (en) | 2014-10-23 | 2018-09-04 | Cisco Technology, Inc. | System and method for creating virtual interfaces based on network characteristics |
US10715634B2 (en) | 2014-10-23 | 2020-07-14 | Cisco Technology, Inc. | System and method for creating virtual interfaces based on network characteristics |
US11062317B2 (en) | 2014-10-28 | 2021-07-13 | Brighterion, Inc. | Data breach detection |
US10997599B2 (en) | 2014-10-28 | 2021-05-04 | Brighterion, Inc. | Method for detecting merchant data breaches with a computer network server |
WO2016073383A1 (en) * | 2014-11-03 | 2016-05-12 | Vectra Networks, Inc. | A system for implementing threat detection using threat and risk assessment of asset-actor interactions |
US10050985B2 (en) | 2014-11-03 | 2018-08-14 | Vectra Networks, Inc. | System for implementing threat detection using threat and risk assessment of asset-actor interactions |
US20160381077A1 (en) * | 2014-11-04 | 2016-12-29 | Patternex, Inc. | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US9661025B2 (en) * | 2014-11-04 | 2017-05-23 | Patternex, Inc. | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US20160132827A1 (en) * | 2014-11-06 | 2016-05-12 | Xerox Corporation | Methods and systems for designing of tasks for crowdsourcing |
US10482403B2 (en) * | 2014-11-06 | 2019-11-19 | Conduent Business Services, Llc | Methods and systems for designing of tasks for crowdsourcing |
US20160132903A1 (en) * | 2014-11-11 | 2016-05-12 | Tata Consultancy Services Limited | Identifying an industry specific e-maven |
US11863537B2 (en) * | 2014-11-14 | 2024-01-02 | William Ziebell | Systems, methods, and media for a cloud based social media network |
US20240171552A1 (en) * | 2014-11-14 | 2024-05-23 | William J Ziebell | Systems, methods, and media for a cloud based social media network |
US20220329575A1 (en) * | 2014-11-14 | 2022-10-13 | William J Ziebell | Systems, methods, and media for a cloud based social media network |
US11405368B2 (en) * | 2014-11-14 | 2022-08-02 | William J. Ziebell | Systems, methods, and media for a cloud based social media network |
US11204929B2 (en) * | 2014-11-18 | 2021-12-21 | International Business Machines Corporation | Evidence aggregation across heterogeneous links for intelligence gathering using a question answering system |
US11238351B2 (en) | 2014-11-19 | 2022-02-01 | International Business Machines Corporation | Grading sources and managing evidence for intelligence analysis |
US11494711B2 (en) | 2014-11-19 | 2022-11-08 | Shoobx, Inc. | Computer-guided corporate relationship management |
US11244113B2 (en) | 2014-11-19 | 2022-02-08 | International Business Machines Corporation | Evaluating evidential links based on corroboration for intelligence analysis |
US11836211B2 (en) | 2014-11-21 | 2023-12-05 | International Business Machines Corporation | Generating additional lines of questioning based on evaluation of a hypothetical link between concept entities in evidential data |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10291493B1 (en) * | 2014-12-05 | 2019-05-14 | Quest Software Inc. | System and method for determining relevant computer performance events |
US10623236B2 (en) * | 2014-12-08 | 2020-04-14 | Tata Consultancy Services Limited | Alert management system for enterprises |
US20160164714A1 (en) * | 2014-12-08 | 2016-06-09 | Tata Consultancy Services Limited | Alert management system for enterprises |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US10884891B2 (en) * | 2014-12-11 | 2021-01-05 | Micro Focus Llc | Interactive detection of system anomalies |
US20170192872A1 (en) * | 2014-12-11 | 2017-07-06 | Hewlett Packard Enterprise Development Lp | Interactive detection of system anomalies |
KR20170094357A (en) * | 2014-12-12 | 2017-08-17 | 옴니 에이아이, 인크. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US11017168B2 (en) | 2014-12-12 | 2021-05-25 | Intellective Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
KR102440821B1 (en) | 2014-12-12 | 2022-09-05 | 인터렉티브 에이아이, 인크. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US20160170961A1 (en) * | 2014-12-12 | 2016-06-16 | Behavioral Recognition Systems, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US20160170964A1 (en) * | 2014-12-12 | 2016-06-16 | Behavioral Recognition Systems, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
US10409909B2 (en) * | 2014-12-12 | 2019-09-10 | Omni Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
US11847413B2 (en) | 2014-12-12 | 2023-12-19 | Intellective Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
US12032909B2 (en) | 2014-12-12 | 2024-07-09 | Intellective Ai, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US10409910B2 (en) * | 2014-12-12 | 2019-09-10 | Omni Ai, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
US9536059B2 (en) | 2014-12-15 | 2017-01-03 | Palo Alto Research Center Incorporated | Method and system for verifying renamed content using manifests in a content centric network |
US9590948B2 (en) | 2014-12-15 | 2017-03-07 | Cisco Systems, Inc. | CCN routing using hardware-assisted hash tables |
US10237189B2 (en) | 2014-12-16 | 2019-03-19 | Cisco Technology, Inc. | System and method for distance-based interest forwarding |
US9846881B2 (en) | 2014-12-19 | 2017-12-19 | Palo Alto Research Center Incorporated | Frugal user engagement help systems |
US9473475B2 (en) | 2014-12-22 | 2016-10-18 | Palo Alto Research Center Incorporated | Low-cost authenticated signing delegation in content centric networking |
US10003520B2 (en) | 2014-12-22 | 2018-06-19 | Cisco Technology, Inc. | System and method for efficient name-based content routing using link-state information in information-centric networks |
WO2016102161A1 (en) * | 2014-12-23 | 2016-06-30 | Telefonica Digital España, S.L.U. | A method, a system and computer program products for assessing the behavioral performance of a user |
EP3038023A1 (en) * | 2014-12-23 | 2016-06-29 | Telefonica Digital España, S.L.U. | A method, a system and computer program products for assessing the behavioral performance of a user |
US9660825B2 (en) | 2014-12-24 | 2017-05-23 | Cisco Technology, Inc. | System and method for multi-source multicasting in content-centric networks |
US10091012B2 (en) | 2014-12-24 | 2018-10-02 | Cisco Technology, Inc. | System and method for multi-source multicasting in content-centric networks |
US20160191450A1 (en) * | 2014-12-31 | 2016-06-30 | Socialtopias, Llc | Recommendations Engine in a Layered Social Media Webpage |
US9602596B2 (en) | 2015-01-12 | 2017-03-21 | Cisco Systems, Inc. | Peer-to-peer sharing in a content centric network |
US10440161B2 (en) | 2015-01-12 | 2019-10-08 | Cisco Technology, Inc. | Auto-configurable transport stack |
US9832291B2 (en) | 2015-01-12 | 2017-11-28 | Cisco Technology, Inc. | Auto-configurable transport stack |
US9954795B2 (en) | 2015-01-12 | 2018-04-24 | Cisco Technology, Inc. | Resource allocation using CCN manifests |
US9916457B2 (en) | 2015-01-12 | 2018-03-13 | Cisco Technology, Inc. | Decoupled name security binding for CCN objects |
US9946743B2 (en) | 2015-01-12 | 2018-04-17 | Cisco Technology, Inc. | Order encoded manifests in a content centric network |
US10891558B2 (en) * | 2015-01-21 | 2021-01-12 | Anodot Ltd. | Creation of metric relationship graph based on windowed time series data for anomaly detection |
US9462006B2 (en) | 2015-01-21 | 2016-10-04 | Palo Alto Research Center Incorporated | Network-layer application-specific trust model |
US20160210556A1 (en) * | 2015-01-21 | 2016-07-21 | Anodot Ltd. | Heuristic Inference of Topological Representation of Metric Relationships |
US20160217056A1 (en) * | 2015-01-28 | 2016-07-28 | Hewlett-Packard Development Company, L.P. | Detecting flow anomalies |
WO2016123528A1 (en) * | 2015-01-30 | 2016-08-04 | Securonix, Inc. | Risk scoring for threat assessment |
WO2016123522A1 (en) * | 2015-01-30 | 2016-08-04 | Securonix, Inc. | Anomaly detection using adaptive behavioral profiles |
US9544321B2 (en) | 2015-01-30 | 2017-01-10 | Securonix, Inc. | Anomaly detection using adaptive behavioral profiles |
US9613447B2 (en) | 2015-02-02 | 2017-04-04 | International Business Machines Corporation | Identifying cyclic patterns of complex events |
US9552493B2 (en) | 2015-02-03 | 2017-01-24 | Palo Alto Research Center Incorporated | Access control framework for information centric networking |
US10333840B2 (en) | 2015-02-06 | 2019-06-25 | Cisco Technology, Inc. | System and method for on-demand content exchange with adaptive naming in information-centric networks |
US10007788B2 (en) * | 2015-02-11 | 2018-06-26 | Electronics And Telecommunications Research Institute | Method of modeling behavior pattern of instruction set in N-gram manner, computing device operating with the method, and program stored in storage medium to execute the method in computing device |
US11231904B2 (en) | 2015-03-06 | 2022-01-25 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10930282B2 (en) | 2015-03-08 | 2021-02-23 | Apple Inc. | Competing devices responding to voice triggers |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US10529332B2 (en) | 2015-03-08 | 2020-01-07 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US10075401B2 (en) | 2015-03-18 | 2018-09-11 | Cisco Technology, Inc. | Pending interest table behavior |
US20160283589A1 (en) * | 2015-03-24 | 2016-09-29 | International Business Machines Corporation | Augmenting search queries based on personalized association patterns |
US11442977B2 (en) * | 2015-03-24 | 2022-09-13 | International Business Machines Corporation | Augmenting search queries based on personalized association patterns |
US10427048B1 (en) | 2015-03-27 | 2019-10-01 | Electronic Arts Inc. | Secure anti-cheat system |
US11040285B1 (en) | 2015-03-27 | 2021-06-22 | Electronic Arts Inc. | Secure anti-cheat system |
US11654365B2 (en) | 2015-03-27 | 2023-05-23 | Electronic Arts Inc. | Secure anti-cheat system |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10652103B2 (en) * | 2015-04-24 | 2020-05-12 | Goldman Sachs & Co. LLC | System and method for handling events involving computing systems and networks using fabric monitoring system |
US20160315822A1 (en) * | 2015-04-24 | 2016-10-27 | Goldman, Sachs & Co. | System and method for handling events involving computing systems and networks using fabric monitoring system |
US10630792B2 (en) | 2015-04-29 | 2020-04-21 | Facebook, Inc. | Methods and systems for viewing user feedback |
US9774693B2 (en) * | 2015-04-29 | 2017-09-26 | Facebook, Inc. | Methods and systems for viewing user feedback |
US20160323395A1 (en) * | 2015-04-29 | 2016-11-03 | Facebook, Inc. | Methods and Systems for Viewing User Feedback |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US10114959B2 (en) * | 2015-05-18 | 2018-10-30 | Ricoh Company, Ltd. | Information processing apparatus, information processing method, and information processing system |
CN104881711A (en) * | 2015-05-18 | 2015-09-02 | 中国矿业大学 | Underground early-warning mechanism based on miner behavioral analysis |
CN105005578A (en) * | 2015-05-21 | 2015-10-28 | 中国电子科技集团公司第十研究所 | Multimedia target information visual analysis system |
US10410135B2 (en) * | 2015-05-21 | 2019-09-10 | Software Ag Usa, Inc. | Systems and/or methods for dynamic anomaly detection in machine sensor data |
CN104933095A (en) * | 2015-05-22 | 2015-09-23 | 中国电子科技集团公司第十研究所 | Heterogeneous information universality correlation analysis system and analysis method thereof |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11127397B2 (en) | 2015-05-27 | 2021-09-21 | Apple Inc. | Device voice control |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10187260B1 (en) | 2015-05-29 | 2019-01-22 | Quest Software Inc. | Systems and methods for multilayer monitoring of network function virtualization architectures |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10356243B2 (en) | 2015-06-05 | 2019-07-16 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10681212B2 (en) | 2015-06-05 | 2020-06-09 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US20160364733A1 (en) * | 2015-06-09 | 2016-12-15 | International Business Machines Corporation | Attitude Inference |
EP3107026A1 (en) * | 2015-06-17 | 2016-12-21 | Accenture Global Services Limited | Event anomaly analysis and prediction |
US10909241B2 (en) | 2015-06-17 | 2021-02-02 | Accenture Global Services Limited | Event anomaly analysis and prediction |
US10192051B2 (en) | 2015-06-17 | 2019-01-29 | Accenture Global Services Limited | Data acceleration |
US10043006B2 (en) | 2015-06-17 | 2018-08-07 | Accenture Global Services Limited | Event anomaly analysis and prediction |
US10116605B2 (en) | 2015-06-22 | 2018-10-30 | Cisco Technology, Inc. | Transport stack name scheme and identity management |
US10075402B2 (en) | 2015-06-24 | 2018-09-11 | Cisco Technology, Inc. | Flexible command and control in content centric networks |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11010127B2 (en) | 2015-06-29 | 2021-05-18 | Apple Inc. | Virtual assistant for media playback |
US10701038B2 (en) | 2015-07-27 | 2020-06-30 | Cisco Technology, Inc. | Content negotiation in a content centric network |
US20170032129A1 (en) * | 2015-07-30 | 2017-02-02 | IOR Analytics, LLC | Method and apparatus for data security analysis of data flows |
US20190132351A1 (en) * | 2015-07-30 | 2019-05-02 | IOR Analytics, LLC. | Method and apparatus for data security analysis of data flows |
US10693903B2 (en) * | 2015-07-30 | 2020-06-23 | IOR Analytics, LLC. | Method and apparatus for data security analysis of data flows |
US10198582B2 (en) * | 2015-07-30 | 2019-02-05 | IOR Analytics, LLC | Method and apparatus for data security analysis of data flows |
US11030527B2 (en) | 2015-07-31 | 2021-06-08 | Brighterion, Inc. | Method for calling for preemptive maintenance and for equipment failure prevention |
US9986034B2 (en) | 2015-08-03 | 2018-05-29 | Cisco Technology, Inc. | Transferring state in content centric network stacks |
US10803074B2 (en) | 2015-08-10 | 2020-10-13 | Hewlett Packard Enterprise Development LP | Evaluating system behaviour
US10776463B2 (en) * | 2015-08-12 | 2020-09-15 | Kryptowire LLC | Active authentication of users |
US10610144B2 (en) | 2015-08-19 | 2020-04-07 | Palo Alto Research Center Incorporated | Interactive remote patient monitoring and condition management intervention system |
GB2541649A (en) * | 2015-08-21 | 2017-03-01 | Senseye Ltd | User feedback for machine learning |
WO2017035455A1 (en) * | 2015-08-27 | 2017-03-02 | Dynology Corporation | System and method for electronically monitoring employees to determine potential risk |
US11961029B2 (en) | 2015-08-27 | 2024-04-16 | Clearforce, Inc. | Systems and methods for electronically monitoring employees to determine potential risk |
US10972332B2 (en) * | 2015-08-31 | 2021-04-06 | Adobe Inc. | Identifying factors that contribute to a metric anomaly |
US10805328B2 (en) | 2015-09-05 | 2020-10-13 | Mastercard Technologies Canada ULC | Systems and methods for detecting and scoring anomalies |
US10749884B2 (en) | 2015-09-05 | 2020-08-18 | Mastercard Technologies Canada ULC | Systems and methods for detecting and preventing spoofing |
US10965695B2 (en) | 2015-09-05 | 2021-03-30 | Mastercard Technologies Canada ULC | Systems and methods for matching and scoring sameness |
FR3040810A1 (en) * | 2015-09-07 | 2017-03-10 | Docapost Dps | COMPUTER SYSTEM FOR SECURE MANAGEMENT OF DIGITAL INFORMATION |
US10417438B2 (en) * | 2015-09-07 | 2019-09-17 | Docapost Dps | Computer system of secure digital information managing |
EP3139296A1 (en) * | 2015-09-07 | 2017-03-08 | Docapost DPS | Computer system of secure digital information managing |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US10419345B2 (en) | 2015-09-11 | 2019-09-17 | Cisco Technology, Inc. | Network named fragments in a content centric network |
US9832123B2 (en) | 2015-09-11 | 2017-11-28 | Cisco Technology, Inc. | Network named fragments in a content centric network |
US11646947B2 (en) | 2015-09-16 | 2023-05-09 | Adobe Inc. | Determining audience segments of users that contributed to a metric anomaly |
US10985993B2 (en) * | 2015-09-16 | 2021-04-20 | Adobe Inc. | Identifying audiences that contribute to metric anomalies |
US20170076202A1 (en) * | 2015-09-16 | 2017-03-16 | Adobe Systems Incorporated | Identifying audiences that contribute to metric anomalies |
US10200252B1 (en) | 2015-09-18 | 2019-02-05 | Quest Software Inc. | Systems and methods for integrated modeling of monitored virtual desktop infrastructure systems |
US10355999B2 (en) | 2015-09-23 | 2019-07-16 | Cisco Technology, Inc. | Flow control with network named fragments |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US9977809B2 (en) | 2015-09-24 | 2018-05-22 | Cisco Technology, Inc. | Information and data framework in a content centric network |
US10313227B2 (en) | 2015-09-24 | 2019-06-04 | Cisco Technology, Inc. | System and method for eliminating undetected interest looping in information-centric networks |
US20170091244A1 (en) * | 2015-09-24 | 2017-03-30 | Microsoft Technology Licensing, Llc | Searching a Data Structure |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US20170093899A1 (en) * | 2015-09-29 | 2017-03-30 | International Business Machines Corporation | Crowd-based detection of device compromise in enterprise setting |
US10454820B2 (en) | 2015-09-29 | 2019-10-22 | Cisco Technology, Inc. | System and method for stateless information-centric networking |
US9888021B2 (en) * | 2015-09-29 | 2018-02-06 | International Business Machines Corporation | Crowd based detection of device compromise in enterprise setting |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10148673B1 (en) * | 2015-09-30 | 2018-12-04 | EMC IP Holding Company LLC | Automatic selection of malicious activity detection rules using crowd-sourcing techniques |
US9892533B1 (en) * | 2015-10-01 | 2018-02-13 | Hrl Laboratories, Llc | Graph visualization system based on gravitational forces due to path distance and betweenness centrality |
US10263965B2 (en) | 2015-10-16 | 2019-04-16 | Cisco Technology, Inc. | Encrypted CCNx |
US20170111378A1 (en) * | 2015-10-20 | 2017-04-20 | International Business Machines Corporation | User configurable message anomaly scoring to identify unusual activity in information technology systems |
US10169719B2 (en) * | 2015-10-20 | 2019-01-01 | International Business Machines Corporation | User configurable message anomaly scoring to identify unusual activity in information technology systems |
US20170116531A1 (en) * | 2015-10-27 | 2017-04-27 | International Business Machines Corporation | Detecting emerging life events and identifying opportunity and risk from behavior |
US11750631B2 (en) | 2015-10-28 | 2023-09-05 | Qomplx, Inc. | System and method for comprehensive data loss prevention and compliance management |
US11297088B2 (en) * | 2015-10-28 | 2022-04-05 | Qomplx, Inc. | System and method for comprehensive data loss prevention and compliance management |
US9794238B2 (en) | 2015-10-29 | 2017-10-17 | Cisco Technology, Inc. | System for key exchange in a content centric network |
US10129230B2 (en) | 2015-10-29 | 2018-11-13 | Cisco Technology, Inc. | System for key exchange in a content centric network |
US10192050B2 (en) * | 2015-10-30 | 2019-01-29 | General Electric Company | Methods, systems, apparatus, and storage media for use in detecting anomalous behavior and/or in preventing data loss |
US11786825B2 (en) | 2015-10-30 | 2023-10-17 | Electronic Arts Inc. | Fraud detection system |
US11179639B1 (en) | 2015-10-30 | 2021-11-23 | Electronic Arts Inc. | Fraud detection system |
US20170213025A1 (en) * | 2015-10-30 | 2017-07-27 | General Electric Company | Methods, systems, apparatus, and storage media for use in detecting anomalous behavior and/or in preventing data loss |
US9807205B2 (en) | 2015-11-02 | 2017-10-31 | Cisco Technology, Inc. | Header compression for CCN messages using dictionary |
US10009446B2 (en) | 2015-11-02 | 2018-06-26 | Cisco Technology, Inc. | Header compression for CCN messages using dictionary learning |
US20170126821A1 (en) * | 2015-11-02 | 2017-05-04 | International Business Machines Corporation | Analyzing the Online Behavior of a User and for Generating an Alert Based on Behavioral Deviations of the User |
US10043221B2 (en) | 2015-11-02 | 2018-08-07 | International Business Machines Corporation | Assigning confidence levels to online profiles |
US10021222B2 (en) | 2015-11-04 | 2018-07-10 | Cisco Technology, Inc. | Bit-aligned header compression for CCN messages using dictionary |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US20170140117A1 (en) * | 2015-11-18 | 2017-05-18 | Ucb Biopharma Sprl | Method and system for generating and displaying topics in raw uncategorized data and for categorizing such data |
US10681018B2 (en) | 2015-11-20 | 2020-06-09 | Cisco Technology, Inc. | Transparent encryption in a content centric network |
US10097521B2 (en) | 2015-11-20 | 2018-10-09 | Cisco Technology, Inc. | Transparent encryption in a content centric network |
US10387476B2 (en) * | 2015-11-24 | 2019-08-20 | International Business Machines Corporation | Semantic mapping of topic map meta-models identifying assets and events to include modeled reactive actions |
US20170154314A1 (en) * | 2015-11-30 | 2017-06-01 | FAMA Technologies, Inc. | System for searching and correlating online activity with individual classification factors |
US10866939B2 (en) * | 2015-11-30 | 2020-12-15 | Micro Focus Llc | Alignment and deduplication of time-series datasets |
US10237226B2 (en) | 2015-11-30 | 2019-03-19 | International Business Machines Corporation | Detection of manipulation of social media content |
US9912776B2 (en) | 2015-12-02 | 2018-03-06 | Cisco Technology, Inc. | Explicit content deletion commands in a content centric network |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10354652B2 (en) | 2015-12-02 | 2019-07-16 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US20170163663A1 (en) * | 2015-12-02 | 2017-06-08 | Salesforce.Com, Inc. | False positive detection reduction system for network-based attacks |
US10187403B2 (en) * | 2015-12-02 | 2019-01-22 | Salesforce.Com, Inc. | False positive detection reduction system for network-based attacks |
US10097346B2 (en) | 2015-12-09 | 2018-10-09 | Cisco Technology, Inc. | Key catalogs in a content centric network |
US10078062B2 (en) | 2015-12-15 | 2018-09-18 | Palo Alto Research Center Incorporated | Device health estimation by combining contextual information with sensor data |
US9946895B1 (en) * | 2015-12-15 | 2018-04-17 | Amazon Technologies, Inc. | Data obfuscation |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10942703B2 (en) | 2015-12-23 | 2021-03-09 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US9830533B2 (en) | 2015-12-30 | 2017-11-28 | International Business Machines Corporation | Analyzing and exploring images posted on social media |
US10581967B2 (en) | 2016-01-11 | 2020-03-03 | Cisco Technology, Inc. | Chandra-Toueg consensus in a content centric network |
US10257271B2 (en) | 2016-01-11 | 2019-04-09 | Cisco Technology, Inc. | Chandra-Toueg consensus in a content centric network |
US10152596B2 (en) | 2016-01-19 | 2018-12-11 | International Business Machines Corporation | Detecting anomalous events through runtime verification of software execution using a behavioral model |
US9949301B2 (en) | 2016-01-20 | 2018-04-17 | Palo Alto Research Center Incorporated | Methods for fast, secure and privacy-friendly internet connection discovery in wireless networks |
US11030886B2 (en) * | 2016-01-21 | 2021-06-08 | Hangzhou Hikvision Digital Technology Co., Ltd. | Method and device for updating online self-learning event detection model |
US10305864B2 (en) | 2016-01-25 | 2019-05-28 | Cisco Technology, Inc. | Method and system for interest encryption in a content centric network |
US10846610B2 (en) * | 2016-02-05 | 2020-11-24 | Nec Corporation | Scalable system and method for real-time predictions and anomaly detection |
WO2017139147A1 (en) * | 2016-02-08 | 2017-08-17 | Nec Laboratories America, Inc. | Ranking causal anomalies via temporal and dynamic analysis on vanishing correlations |
US12126636B2 (en) | 2016-02-09 | 2024-10-22 | Darktrace Holdings Limited | Anomaly alert system for cyber threat detection |
US11470103B2 (en) | 2016-02-09 | 2022-10-11 | Darktrace Holdings Limited | Anomaly alert system for cyber threat detection |
EP3417571A4 (en) * | 2016-02-15 | 2019-10-09 | Certis Cisco Security Pte Ltd | Method and system for compression and optimization of in-line and in-transit information security data |
CN108713310A (en) * | 2016-02-15 | 2018-10-26 | 策安保安有限公司 | Method and system for information security data in online and transmission to be compressed and optimized |
US20170242932A1 (en) * | 2016-02-24 | 2017-08-24 | International Business Machines Corporation | Theft detection via adaptive lexical similarity analysis of social media data streams |
US10469523B2 (en) | 2016-02-24 | 2019-11-05 | Imperva, Inc. | Techniques for detecting compromises of enterprise end stations utilizing noisy tokens |
US20190087750A1 (en) * | 2016-02-26 | 2019-03-21 | Nippon Telegraph And Telephone Corporation | Analysis device, analysis method, and analysis program |
US11868853B2 (en) * | 2016-02-26 | 2024-01-09 | Nippon Telegraph And Telephone Corporation | Analysis device, analysis method, and analysis program |
US10043016B2 (en) | 2016-02-29 | 2018-08-07 | Cisco Technology, Inc. | Method and system for name encryption agreement in a content centric network |
US10003507B2 (en) | 2016-03-04 | 2018-06-19 | Cisco Technology, Inc. | Transport session state protocol |
US11625629B2 (en) | 2016-03-04 | 2023-04-11 | Axon Vibe AG | Systems and methods for predicting user behavior based on location data |
US10038633B2 (en) | 2016-03-04 | 2018-07-31 | Cisco Technology, Inc. | Protocol to query for historical network information in a content centric network |
RU2721176C2 (en) * | 2016-03-04 | 2020-05-18 | Axon Vibe AG | Systems and methods for predicting user behavior based on location data |
US10051071B2 (en) | 2016-03-04 | 2018-08-14 | Cisco Technology, Inc. | Method and system for collecting historical network information in a content centric network |
US10742596B2 (en) | 2016-03-04 | 2020-08-11 | Cisco Technology, Inc. | Method and system for reducing a collision probability of hash-based names using a publisher identifier |
US10469378B2 (en) | 2016-03-04 | 2019-11-05 | Cisco Technology, Inc. | Protocol to query for historical network information in a content centric network |
US11062336B2 (en) | 2016-03-07 | 2021-07-13 | Qbeats Inc. | Self-learning valuation |
US11756064B2 (en) | 2016-03-07 | 2023-09-12 | Qbeats Inc. | Self-learning valuation |
US12118577B2 (en) | 2016-03-07 | 2024-10-15 | Qbeats, Inc. | Self-learning valuation |
US9832116B2 (en) | 2016-03-14 | 2017-11-28 | Cisco Technology, Inc. | Adjusting entries in a forwarding information base in a content centric network |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10129368B2 (en) | 2016-03-14 | 2018-11-13 | Cisco Technology, Inc. | Adjusting entries in a forwarding information base in a content centric network |
US10212196B2 (en) | 2016-03-16 | 2019-02-19 | Cisco Technology, Inc. | Interface discovery and authentication in a name-based network |
US11436656B2 (en) | 2016-03-18 | 2022-09-06 | Palo Alto Research Center Incorporated | System and method for a real-time egocentric collaborative filter on large datasets |
US10067948B2 (en) | 2016-03-18 | 2018-09-04 | Cisco Technology, Inc. | Data deduping in content centric networking manifests |
US10459827B1 (en) | 2016-03-22 | 2019-10-29 | Electronic Arts Inc. | Machine-learning based anomaly detection for heterogenous data sources |
US10237295B2 (en) * | 2016-03-22 | 2019-03-19 | Nec Corporation | Automated event ID field analysis on heterogeneous logs |
US10091330B2 (en) | 2016-03-23 | 2018-10-02 | Cisco Technology, Inc. | Interest scheduling by an information and data framework in a content centric network |
US9992018B1 (en) | 2016-03-24 | 2018-06-05 | Electronic Arts Inc. | Generating cryptographic challenges to communication requests |
US10033639B2 (en) | 2016-03-25 | 2018-07-24 | Cisco Technology, Inc. | System and method for routing packets in a content centric network using anonymous datagrams |
US20220121410A1 (en) * | 2016-03-31 | 2022-04-21 | Splunk Inc. | Technology add-on interface |
US10320760B2 (en) | 2016-04-01 | 2019-06-11 | Cisco Technology, Inc. | Method and system for mutating and caching content in a content centric network |
US10348865B2 (en) | 2016-04-04 | 2019-07-09 | Cisco Technology, Inc. | System and method for compressing content centric networking messages |
US9930146B2 (en) | 2016-04-04 | 2018-03-27 | Cisco Technology, Inc. | System and method for compressing content centric networking messages |
US20170286856A1 (en) * | 2016-04-05 | 2017-10-05 | Omni Al, Inc. | Trend analysis for a neuro-linguistic behavior recognition system |
US10425503B2 (en) | 2016-04-07 | 2019-09-24 | Cisco Technology, Inc. | Shared pending interest table in a content centric network |
US10841212B2 (en) | 2016-04-11 | 2020-11-17 | Cisco Technology, Inc. | Method and system for routable prefix queries in a content centric network |
US10831785B2 (en) | 2016-04-11 | 2020-11-10 | International Business Machines Corporation | Identifying security breaches from clustering properties |
US10027578B2 (en) | 2016-04-11 | 2018-07-17 | Cisco Technology, Inc. | Method and system for routable prefix queries in a content centric network |
US20200211141A1 (en) * | 2016-04-22 | 2020-07-02 | FiscalNote, Inc. | Systems and methods for analyzing policymaker influence |
US11263650B2 (en) * | 2016-04-25 | 2022-03-01 | [24]7.ai, Inc. | Process and system to categorize, evaluate and optimize a customer experience |
US10268505B2 (en) | 2016-04-28 | 2019-04-23 | EntIT Software, LLC | Batch job frequency control |
US10404450B2 (en) | 2016-05-02 | 2019-09-03 | Cisco Technology, Inc. | Schematized access control in a content centric network |
US10320675B2 (en) | 2016-05-04 | 2019-06-11 | Cisco Technology, Inc. | System and method for routing packets in a stateless content centric network |
US10547589B2 (en) | 2016-05-09 | 2020-01-28 | Cisco Technology, Inc. | System for implementing a small computer systems interface protocol over a content centric network |
US10404537B2 (en) | 2016-05-13 | 2019-09-03 | Cisco Technology, Inc. | Updating a transport stack in a content centric network |
US10063414B2 (en) | 2016-05-13 | 2018-08-28 | Cisco Technology, Inc. | Updating a transport stack in a content centric network |
US10693852B2 (en) | 2016-05-13 | 2020-06-23 | Cisco Technology, Inc. | System for a secure encryption proxy in a content centric network |
US10084764B2 (en) | 2016-05-13 | 2018-09-25 | Cisco Technology, Inc. | System for a secure encryption proxy in a content centric network |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US20230344841A1 (en) * | 2016-06-06 | 2023-10-26 | Netskope, Inc. | Machine learning based anomaly detection initialization |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10580409B2 (en) | 2016-06-11 | 2020-03-03 | Apple Inc. | Application integration with a digital assistant |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US10942702B2 (en) | 2016-06-11 | 2021-03-09 | Apple Inc. | Intelligent device arbitration and control |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10103989B2 (en) | 2016-06-13 | 2018-10-16 | Cisco Technology, Inc. | Content object return messages in a content centric network |
US11386213B2 (en) * | 2016-06-21 | 2022-07-12 | Unisys Corporation | Systems and methods for efficient access control |
US10305865B2 (en) | 2016-06-21 | 2019-05-28 | Cisco Technology, Inc. | Permutation-based content encryption with manifests in a content centric network |
US20220366065A1 (en) * | 2016-06-21 | 2022-11-17 | Unisys Corporation | Systems and methods for efficient access control |
US11977650B2 (en) * | 2016-06-21 | 2024-05-07 | Unisys Corporation | Systems and methods for efficient access control |
US20170364693A1 (en) * | 2016-06-21 | 2017-12-21 | Unisys Corporation | Systems and methods for efficient access control |
US10581741B2 (en) | 2016-06-27 | 2020-03-03 | Cisco Technology, Inc. | Method and system for interest groups in a content centric network |
US11188864B2 (en) * | 2016-06-27 | 2021-11-30 | International Business Machines Corporation | Calculating an expertise score from aggregated employee data |
US10148572B2 (en) | 2016-06-27 | 2018-12-04 | Cisco Technology, Inc. | Method and system for interest groups in a content centric network |
US10628462B2 (en) * | 2016-06-27 | 2020-04-21 | Microsoft Technology Licensing, Llc | Propagating a status among related events |
US10783184B2 (en) * | 2016-06-30 | 2020-09-22 | Hitachi, Ltd. | Data generation method and computer system |
US20180004860A1 (en) * | 2016-06-30 | 2018-01-04 | Hitachi, Ltd. | Data generation method and computer system |
US10009266B2 (en) | 2016-07-05 | 2018-06-26 | Cisco Technology, Inc. | Method and system for reference counted pending interest tables in a content centric network |
US10230601B1 (en) | 2016-07-05 | 2019-03-12 | Quest Software Inc. | Systems and methods for integrated modeling and performance measurements of monitored virtual desktop infrastructure systems |
US9992097B2 (en) | 2016-07-11 | 2018-06-05 | Cisco Technology, Inc. | System and method for piggybacking routing information in interests in a content centric network |
US9875360B1 (en) | 2016-07-14 | 2018-01-23 | IronNet Cybersecurity, Inc. | Simulation and virtual reality based cyber behavioral systems |
US9910993B2 (en) | 2016-07-14 | 2018-03-06 | IronNet Cybersecurity, Inc. | Simulation and virtual reality based cyber behavioral systems |
US20210313056A1 (en) * | 2016-07-18 | 2021-10-07 | Abbyy Development Inc. | System and method for visual analysis of event sequences |
US10122624B2 (en) | 2016-07-25 | 2018-11-06 | Cisco Technology, Inc. | System and method for ephemeral entries in a forwarding information base in a content centric network |
US20180039774A1 (en) * | 2016-08-08 | 2018-02-08 | International Business Machines Corporation | Install-Time Security Analysis of Mobile Applications |
US10621333B2 (en) * | 2016-08-08 | 2020-04-14 | International Business Machines Corporation | Install-time security analysis of mobile applications |
US10069729B2 (en) | 2016-08-08 | 2018-09-04 | Cisco Technology, Inc. | System and method for throttling traffic based on a forwarding information base in a content centric network |
US10956412B2 (en) | 2016-08-09 | 2021-03-23 | Cisco Technology, Inc. | Method and system for conjunctive normal form attribute matching in a content centric network |
US10460320B1 (en) | 2016-08-10 | 2019-10-29 | Electronic Arts Inc. | Fraud detection in heterogeneous information networks |
US20180060437A1 (en) * | 2016-08-29 | 2018-03-01 | EverString Innovation Technology | Keyword and business tag extraction |
US11093557B2 (en) * | 2016-08-29 | 2021-08-17 | Zoominfo Apollo Llc | Keyword and business tag extraction |
US11151502B2 (en) * | 2016-09-01 | 2021-10-19 | PagerDuty, Inc. | Real-time adaptive operations performance management system |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10848508B2 (en) | 2016-09-07 | 2020-11-24 | Patternex, Inc. | Method and system for generating synthetic feature vectors from real, labelled feature vectors in artificial intelligence training of a big data machine to defend |
US10623428B2 (en) | 2016-09-12 | 2020-04-14 | Vectra Networks, Inc. | Method and system for detecting suspicious administrative activity |
WO2018052984A1 (en) * | 2016-09-14 | 2018-03-22 | The Dun & Bradstreet Corporation | Geolocating entities of interest on geo heat maps |
US9836183B1 (en) * | 2016-09-14 | 2017-12-05 | Quid, Inc. | Summarized network graph for semantic similarity graphs of large corpora |
US20180074856A1 (en) * | 2016-09-15 | 2018-03-15 | Oracle International Corporation | Processing timestamps and heartbeat events for automatic time progression |
US11061722B2 (en) | 2016-09-15 | 2021-07-13 | Oracle International Corporation | Processing timestamps and heartbeat events for automatic time progression |
US10514952B2 (en) * | 2016-09-15 | 2019-12-24 | Oracle International Corporation | Processing timestamps and heartbeat events for automatic time progression |
US11503097B2 (en) * | 2016-09-19 | 2022-11-15 | Ebay Inc. | Interactive real-time visualization system for large-scale streaming data |
US20200120151A1 (en) * | 2016-09-19 | 2020-04-16 | Ebay Inc. | Interactive Real-Time Visualization System for Large-Scale Streaming Data |
US10033642B2 (en) | 2016-09-19 | 2018-07-24 | Cisco Technology, Inc. | System and method for making optimal routing decisions based on device-specific parameters in a content centric network |
US20180082193A1 (en) * | 2016-09-21 | 2018-03-22 | Scianta Analytics, LLC | Cognitive modeling apparatus for defuzzification of multiple qualitative signals into human-centric threat notifications |
US11050768B1 (en) * | 2016-09-21 | 2021-06-29 | Amazon Technologies, Inc. | Detecting compute resource anomalies in a group of computing resources |
US11348016B2 (en) * | 2016-09-21 | 2022-05-31 | Scianta Analytics, LLC | Cognitive modeling apparatus for assessing values qualitatively across a multiple dimension terrain |
US11017298B2 (en) * | 2016-09-21 | 2021-05-25 | Scianta Analytics Llc | Cognitive modeling apparatus for detecting and adjusting qualitative contexts across multiple dimensions for multiple actors |
US20180084001A1 (en) * | 2016-09-22 | 2018-03-22 | Microsoft Technology Licensing, Llc. | Enterprise graph method of threat detection |
US10771492B2 (en) * | 2016-09-22 | 2020-09-08 | Microsoft Technology Licensing, Llc | Enterprise graph method of threat detection |
US10553215B2 (en) | 2016-09-23 | 2020-02-04 | Apple Inc. | Intelligent automated assistant |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US10055535B2 (en) * | 2016-09-27 | 2018-08-21 | Globalfoundries Inc. | Method, system and program product for identifying anomalies in integrated circuit design layouts |
US10212248B2 (en) | 2016-10-03 | 2019-02-19 | Cisco Technology, Inc. | Cache management on high availability routers in a content centric network |
US10897518B2 (en) | 2016-10-03 | 2021-01-19 | Cisco Technology, Inc. | Cache management on high availability routers in a content centric network |
CN109791587A (en) * | 2016-10-05 | 2019-05-21 | 微软技术许可有限责任公司 | Equipment is endangered via User Status detection |
US10447805B2 (en) | 2016-10-10 | 2019-10-15 | Cisco Technology, Inc. | Distributed consensus in a content centric network |
US10721332B2 (en) | 2016-10-31 | 2020-07-21 | Cisco Technology, Inc. | System and method for process migration in a content centric network |
US10135948B2 (en) | 2016-10-31 | 2018-11-20 | Cisco Technology, Inc. | System and method for process migration in a content centric network |
US10243851B2 (en) | 2016-11-21 | 2019-03-26 | Cisco Technology, Inc. | System and method for forwarder connection information in a content centric network |
US11243922B2 (en) * | 2016-12-01 | 2022-02-08 | Tencent Technology (Shenzhen) Company Limited | Method, apparatus, and storage medium for migrating data node in database cluster |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10304263B2 (en) * | 2016-12-13 | 2019-05-28 | The Boeing Company | Vehicle system prognosis device and method |
US11138817B2 (en) * | 2016-12-13 | 2021-10-05 | The Boeing Company | Vehicle system prognosis device and method |
US10171510B2 (en) * | 2016-12-14 | 2019-01-01 | CyberSaint, Inc. | System and method for monitoring and grading a cybersecurity framework |
US11102249B2 (en) | 2016-12-14 | 2021-08-24 | CyberSaint, Inc. | System and method for monitoring and grading a cybersecurity framework |
US11616812B2 (en) | 2016-12-19 | 2023-03-28 | Attivo Networks Inc. | Deceiving attackers accessing active directory data |
US11997139B2 (en) | 2016-12-19 | 2024-05-28 | SentinelOne, Inc. | Deceiving attackers accessing network data |
US11695800B2 (en) | 2016-12-19 | 2023-07-04 | SentinelOne, Inc. | Deceiving attackers accessing network data |
US10791134B2 (en) | 2016-12-21 | 2020-09-29 | Threat Stack, Inc. | System and method for cloud-based operating system event and data access monitoring |
US10346450B2 (en) * | 2016-12-21 | 2019-07-09 | Ca, Inc. | Automatic datacenter state summarization |
US10320636B2 (en) | 2016-12-21 | 2019-06-11 | Ca, Inc. | State information completion using context graphs |
WO2018119068A1 (en) * | 2016-12-21 | 2018-06-28 | Threat Stack, Inc. | System and method for cloud-based operating system event and data access monitoring |
US10423647B2 (en) | 2016-12-21 | 2019-09-24 | Ca, Inc. | Descriptive datacenter state comparison |
US11283822B2 (en) | 2016-12-21 | 2022-03-22 | F5, Inc. | System and method for cloud-based operating system event and data access monitoring |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US20180181895A1 (en) * | 2016-12-23 | 2018-06-28 | Yodlee, Inc. | Identifying Recurring Series From Transactional Data |
US10902365B2 (en) * | 2016-12-23 | 2021-01-26 | Yodlee, Inc. | Identifying recurring series from transactional data |
US10372702B2 (en) * | 2016-12-28 | 2019-08-06 | Intel Corporation | Methods and apparatus for detecting anomalies in electronic data |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10831796B2 (en) * | 2017-01-15 | 2020-11-10 | International Business Machines Corporation | Tone optimization for digital content |
US20180203847A1 (en) * | 2017-01-15 | 2018-07-19 | International Business Machines Corporation | Tone optimization for digital content |
US20190235725A1 (en) * | 2017-02-08 | 2019-08-01 | International Business Machines Corporation | Monitoring an activity and determining the type of actor performing the activity |
US10338802B2 (en) * | 2017-02-08 | 2019-07-02 | International Business Machines Corporation | Monitoring an activity and determining the type of actor performing the activity |
US10684770B2 (en) * | 2017-02-08 | 2020-06-16 | International Business Machines Corporation | Monitoring an activity and determining the type of actor performing the activity |
US10528533B2 (en) * | 2017-02-09 | 2020-01-07 | Adobe Inc. | Anomaly detection at coarser granularity of data |
US11218497B2 (en) | 2017-02-20 | 2022-01-04 | Micro Focus Llc | Reporting behavior anomalies |
US10419269B2 (en) | 2017-02-21 | 2019-09-17 | Entit Software Llc | Anomaly detection |
US10868832B2 (en) | 2017-03-22 | 2020-12-15 | Ca, Inc. | Systems and methods for enforcing dynamic network security policies |
US10397259B2 (en) | 2017-03-23 | 2019-08-27 | International Business Machines Corporation | Cyber security event detection |
US10841321B1 (en) * | 2017-03-28 | 2020-11-17 | Veritas Technologies Llc | Systems and methods for detecting suspicious users on networks |
US10931532B2 (en) * | 2017-03-31 | 2021-02-23 | Bmc Software, Inc. | Cloud service interdependency relationship detection |
US20180287877A1 (en) * | 2017-03-31 | 2018-10-04 | Bmc Software, Inc | Cloud service interdependency relationship detection |
US11233702B2 (en) | 2017-03-31 | 2022-01-25 | Bmc Software, Inc. | Cloud service interdependency relationship detection |
US11232364B2 (en) | 2017-04-03 | 2022-01-25 | DataVisor, Inc. | Automated rule recommendation engine |
US10679002B2 (en) * | 2017-04-13 | 2020-06-09 | International Business Machines Corporation | Text analysis of narrative documents |
US11621969B2 (en) | 2017-04-26 | 2023-04-04 | Elasticsearch B.V. | Clustering and outlier detection in anomaly and causation detection for computing environments |
WO2018200113A1 (en) * | 2017-04-26 | 2018-11-01 | Elasticsearch B.V. | Anomaly and causation detection in computing environments |
US10986110B2 (en) | 2017-04-26 | 2021-04-20 | Elasticsearch B.V. | Anomaly and causation detection in computing environments using counterfactual processing |
US11783046B2 (en) | 2017-04-26 | 2023-10-10 | Elasticsearch B.V. | Anomaly and causation detection in computing environments |
US20220335042A1 (en) * | 2017-04-27 | 2022-10-20 | Google Llc | Cloud inference system |
US11734292B2 (en) * | 2017-04-27 | 2023-08-22 | Google Llc | Cloud inference system |
US10999317B2 (en) * | 2017-04-28 | 2021-05-04 | International Business Machines Corporation | Blockchain tracking of virtual universe traversal results |
US11394725B1 (en) * | 2017-05-03 | 2022-07-19 | Hrl Laboratories, Llc | Method and system for privacy-preserving targeted substructure discovery on multiplex networks |
US10432639B1 (en) * | 2017-05-04 | 2019-10-01 | Amazon Technologies, Inc. | Security management for graph analytics |
US20180322276A1 (en) * | 2017-05-04 | 2018-11-08 | Crowdstrike, Inc. | Least recently used (lru)-based event suppression |
US10635806B2 (en) * | 2017-05-04 | 2020-04-28 | Crowdstrike, Inc. | Least recently used (LRU)-based event suppression |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
US10332518B2 (en) | 2017-05-09 | 2019-06-25 | Apple Inc. | User interface for correcting recognition errors |
US10741181B2 (en) | 2017-05-09 | 2020-08-11 | Apple Inc. | User interface for correcting recognition errors |
US10847142B2 (en) | 2017-05-11 | 2020-11-24 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US10755703B2 (en) | 2017-05-11 | 2020-08-25 | Apple Inc. | Offline personal assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789945B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Low-latency intelligent automated assistant |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
US10410637B2 (en) | 2017-05-12 | 2019-09-10 | Apple Inc. | User-specific acoustic models |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
US10264012B2 (en) | 2017-05-15 | 2019-04-16 | Forcepoint, LLC | User behavior profile |
US11082440B2 (en) | 2017-05-15 | 2021-08-03 | Forcepoint Llc | User profile definition and management |
US10855693B2 (en) | 2017-05-15 | 2020-12-01 | Forcepoint, LLC | Using an adaptive trust profile to generate inferences |
US10855692B2 (en) * | 2017-05-15 | 2020-12-01 | Forcepoint, LLC | Adaptive trust profile endpoint |
US10645096B2 (en) | 2017-05-15 | 2020-05-05 | Forcepoint Llc | User behavior profile environment |
US11888859B2 (en) | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Associating a security risk persona with a phase of a cyber kill chain |
US11528281B2 (en) | 2017-05-15 | 2022-12-13 | Forcepoint Llc | Security analytics mapping system |
US11025646B2 (en) | 2017-05-15 | 2021-06-01 | Forcepoint, LLC | Risk adaptive protection |
US11888860B2 (en) * | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Correlating concerning behavior during an activity session with a security risk persona |
US11888863B2 (en) | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Maintaining user privacy via a distributed framework for security analytics |
US11888861B2 (en) * | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Using an entity behavior catalog when performing human-centric risk modeling operations |
US10862927B2 (en) | 2017-05-15 | 2020-12-08 | Forcepoint, LLC | Dividing events into sessions during adaptive trust profile operations |
US11546351B2 (en) | 2017-05-15 | 2023-01-03 | Forcepoint Llc | Using human factors when performing a human factor risk operation |
US10862901B2 (en) | 2017-05-15 | 2020-12-08 | Forcepoint, LLC | User behavior profile including temporal detail corresponding to user interaction |
US11888864B2 (en) | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Security analytics mapping operation within a distributed security analytics environment |
US11888862B2 (en) * | 2017-05-15 | 2024-01-30 | Forcepoint Llc | Distributed framework for security analytics |
US20210144153A1 (en) * | 2017-05-15 | 2021-05-13 | Forcepoint, LLC | Generating a Security Risk Persona Using Stressor Data |
US11979414B2 (en) | 2017-05-15 | 2024-05-07 | Forcepoint Llc | Using content stored in an entity behavior catalog when performing a human factor risk operation |
US10623431B2 (en) * | 2017-05-15 | 2020-04-14 | Forcepoint Llc | Discerning psychological state from correlated user behavior and contextual information |
US10834097B2 (en) * | 2017-05-15 | 2020-11-10 | Forcepoint, LLC | Adaptive trust profile components |
US10999297B2 (en) | 2017-05-15 | 2021-05-04 | Forcepoint, LLC | Using expected behavior of an entity when prepopulating an adaptive trust profile |
US10542013B2 (en) | 2017-05-15 | 2020-01-21 | Forcepoint Llc | User behavior profile in a blockchain |
US10999296B2 (en) | 2017-05-15 | 2021-05-04 | Forcepoint, LLC | Generating adaptive trust profiles using information derived from similarly situated organizations |
US10834098B2 (en) | 2017-05-15 | 2020-11-10 | Forcepoint, LLC | Using a story when generating inferences using an adaptive trust profile |
US20210120011A1 (en) * | 2017-05-15 | 2021-04-22 | Forcepoint, LLC | Using an Entity Behavior Catalog When Performing Human-Centric Risk Modeling Operations |
US10326776B2 (en) | 2017-05-15 | 2019-06-18 | Forcepoint, LLC | User behavior profile including temporal detail corresponding to user interaction |
US11902293B2 (en) | 2017-05-15 | 2024-02-13 | Forcepoint Llc | Using an entity behavior catalog when performing distributed security operations |
US10171488B2 (en) * | 2017-05-15 | 2019-01-01 | Forcepoint, LLC | User behavior profile |
US11516225B2 (en) | 2017-05-15 | 2022-11-29 | Forcepoint Llc | Human factors framework |
US10530786B2 (en) | 2017-05-15 | 2020-01-07 | Forcepoint Llc | Managing access to user profile information via a distributed transaction database |
US10326775B2 (en) | 2017-05-15 | 2019-06-18 | Forcepoint, LLC | Multi-factor authentication using a user behavior profile as a factor |
US20210112076A1 (en) * | 2017-05-15 | 2021-04-15 | Forcepoint, LLC | Distributed Framework for Security Analytics |
US11677756B2 (en) | 2017-05-15 | 2023-06-13 | Forcepoint Llc | Risk adaptive protection |
US20210112075A1 (en) * | 2017-05-15 | 2021-04-15 | Forcepoint, LLC | Correlating Concerning Behavior During an Activity Session with a Security Risk Persona |
US20210112074A1 (en) * | 2017-05-15 | 2021-04-15 | Forcepoint, LLC | Using a Behavior-Based Modifier When Generating a User Entity Risk Score |
US11843613B2 (en) * | 2017-05-15 | 2023-12-12 | Forcepoint Llc | Using a behavior-based modifier when generating a user entity risk score |
US11838298B2 (en) * | 2017-05-15 | 2023-12-05 | Forcepoint Llc | Generating a security risk persona using stressor data |
US11902294B2 (en) | 2017-05-15 | 2024-02-13 | Forcepoint Llc | Using human factors when calculating a risk score |
US11463453B2 (en) | 2017-05-15 | 2022-10-04 | Forcepoint, LLC | Using a story when generating inferences using an adaptive trust profile |
US11621964B2 (en) | 2017-05-15 | 2023-04-04 | Forcepoint Llc | Analyzing an event enacted by a data entity when performing a security operation |
US10482874B2 (en) | 2017-05-15 | 2019-11-19 | Apple Inc. | Hierarchical belief states for digital assistants |
US10915643B2 (en) | 2017-05-15 | 2021-02-09 | Forcepoint, LLC | Adaptive trust profile endpoint architecture |
US10798109B2 (en) | 2017-05-15 | 2020-10-06 | Forcepoint Llc | Adaptive trust profile reference architecture |
US11902295B2 (en) | 2017-05-15 | 2024-02-13 | Forcepoint Llc | Using a security analytics map to perform forensic analytics |
US10447718B2 (en) | 2017-05-15 | 2019-10-15 | Forcepoint Llc | User profile definition and management |
US10915644B2 (en) | 2017-05-15 | 2021-02-09 | Forcepoint, LLC | Collecting data for centralized use in an adaptive trust profile event via an endpoint |
US11902296B2 (en) | 2017-05-15 | 2024-02-13 | Forcepoint Llc | Using a security analytics map to trace entity interaction |
US11563752B2 (en) | 2017-05-15 | 2023-01-24 | Forcepoint Llc | Using indicators of behavior to identify a security persona of an entity |
US11575685B2 (en) | 2017-05-15 | 2023-02-07 | Forcepoint Llc | User behavior profile including temporal detail corresponding to user interaction |
US10917423B2 (en) | 2017-05-15 | 2021-02-09 | Forcepoint, LLC | Intelligently differentiating between different types of states and attributes when using an adaptive trust profile |
US10298609B2 (en) | 2017-05-15 | 2019-05-21 | Forcepoint, LLC | User behavior profile environment |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US11757902B2 (en) | 2017-05-15 | 2023-09-12 | Forcepoint Llc | Adaptive trust profile reference architecture |
US10944762B2 (en) | 2017-05-15 | 2021-03-09 | Forcepoint, LLC | Managing blockchain access to user information |
US11601441B2 (en) | 2017-05-15 | 2023-03-07 | Forcepoint Llc | Using indicators of behavior when performing a security operation |
US10943019B2 (en) | 2017-05-15 | 2021-03-09 | Forcepoint, LLC | Adaptive trust profile endpoint |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
US10303715B2 (en) | 2017-05-16 | 2019-05-28 | Apple Inc. | Intelligent automated assistant for media exploration |
US10909171B2 (en) | 2017-05-16 | 2021-02-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
CN107196942A (en) * | 2017-05-24 | 2017-09-22 | 山东省计算中心(国家超级计算济南中心) | A kind of inside threat detection method based on user language feature |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10447525B2 (en) | 2017-06-05 | 2019-10-15 | Microsoft Technology Licensing, Llc | Validating correlation between chains of alerts using cloud view |
US20180351978A1 (en) * | 2017-06-05 | 2018-12-06 | Microsoft Technology Licensing, Llc | Correlating user information to a tracked event |
WO2018226461A1 (en) * | 2017-06-05 | 2018-12-13 | Microsoft Technology Licensing, Llc | Validating correlation between chains of alerts using cloud view |
US20190005225A1 (en) * | 2017-06-29 | 2019-01-03 | Microsoft Technology Licensing, Llc | Detection of attacks in the cloud by crowd sourcing security solutions |
US10911478B2 (en) * | 2017-06-29 | 2021-02-02 | Microsoft Technology Licensing, Llc | Detection of attacks in the cloud by crowd sourcing security solutions |
US10262153B2 (en) * | 2017-07-26 | 2019-04-16 | Forcepoint, LLC | Privacy protection during insider threat monitoring |
US10318729B2 (en) | 2017-07-26 | 2019-06-11 | Forcepoint, LLC | Privacy protection during insider threat monitoring |
US11250158B2 (en) | 2017-07-26 | 2022-02-15 | Forcepoint, LLC | Session-based security information |
US11132461B2 (en) | 2017-07-26 | 2021-09-28 | Forcepoint, LLC | Detecting, notifying and remediating noisy security policies |
US10679067B2 (en) * | 2017-07-26 | 2020-06-09 | Peking University Shenzhen Graduate School | Method for detecting violent incident in video based on hypergraph transition |
US10733323B2 (en) | 2017-07-26 | 2020-08-04 | Forcepoint Llc | Privacy protection during insider threat monitoring |
US11244070B2 (en) | 2017-07-26 | 2022-02-08 | Forcepoint, LLC | Adaptive remediation of multivariate risk |
US10642998B2 (en) * | 2017-07-26 | 2020-05-05 | Forcepoint Llc | Section-based security information |
US11379607B2 (en) | 2017-07-26 | 2022-07-05 | Forcepoint, LLC | Automatically generating security policies |
US11379608B2 (en) | 2017-07-26 | 2022-07-05 | Forcepoint, LLC | Monitoring entity behavior using organization specific security policies |
US11290478B2 (en) | 2017-08-08 | 2022-03-29 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11722506B2 (en) | 2017-08-08 | 2023-08-08 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US10841325B2 (en) | 2017-08-08 | 2020-11-17 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11973781B2 (en) | 2017-08-08 | 2024-04-30 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11245714B2 (en) | 2017-08-08 | 2022-02-08 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11716341B2 (en) | 2017-08-08 | 2023-08-01 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11876819B2 (en) | 2017-08-08 | 2024-01-16 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11522894B2 (en) | 2017-08-08 | 2022-12-06 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11716342B2 (en) * | 2017-08-08 | 2023-08-01 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11245715B2 (en) | 2017-08-08 | 2022-02-08 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US20230007031A1 (en) * | 2017-08-08 | 2023-01-05 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US10462171B2 (en) | 2017-08-08 | 2019-10-29 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11838305B2 (en) | 2017-08-08 | 2023-12-05 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11838306B2 (en) | 2017-08-08 | 2023-12-05 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US11212309B1 (en) | 2017-08-08 | 2021-12-28 | Sentinel Labs Israel Ltd. | Methods, systems, and devices for dynamically modeling and grouping endpoints for edge networking |
US10592666B2 (en) | 2017-08-31 | 2020-03-17 | Micro Focus Llc | Detecting anomalous entities |
US20190080352A1 (en) * | 2017-09-11 | 2019-03-14 | Adobe Systems Incorporated | Segment Extension Based on Lookalike Selection |
US11115300B2 (en) * | 2017-09-12 | 2021-09-07 | Cisco Technology, Inc. | Anomaly detection and reporting in a network assurance appliance |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10373091B2 (en) * | 2017-09-22 | 2019-08-06 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US11948116B2 (en) * | 2017-09-22 | 2024-04-02 | 1Nteger, Llc | Systems and methods for risk data navigation |
US20230004889A1 (en) * | 2017-09-22 | 2023-01-05 | 1Nteger, Llc | Systems and methods for risk data navigation |
US10997541B2 (en) * | 2017-09-22 | 2021-05-04 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US20220222596A1 (en) * | 2017-09-22 | 2022-07-14 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US11734633B2 (en) * | 2017-09-22 | 2023-08-22 | Integer, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US11734632B2 (en) | 2017-09-22 | 2023-08-22 | Integer, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10127511B1 (en) * | 2017-09-22 | 2018-11-13 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US20200226512A1 (en) * | 2017-09-22 | 2020-07-16 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10311895B2 (en) | 2017-09-26 | 2019-06-04 | International Business Machines Corporation | Assessing the structural quality of conversations |
US10424319B2 (en) | 2017-09-26 | 2019-09-24 | International Business Machines Corporation | Assessing the structural quality of conversations |
EP3460769A1 (en) * | 2017-09-26 | 2019-03-27 | Netscout Systems, Inc. | System and method for managing alerts using a state machine |
US10297273B2 (en) | 2017-09-26 | 2019-05-21 | International Business Machines Corporation | Assessing the structural quality of conversations |
US10673689B2 (en) | 2017-09-26 | 2020-06-02 | Netscout Systems, Inc. | System and method for managing alerts using a state machine |
US10037768B1 (en) | 2017-09-26 | 2018-07-31 | International Business Machines Corporation | Assessing the structural quality of conversations |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US11611480B2 (en) | 2017-10-04 | 2023-03-21 | Servicenow, Inc. | Systems and methods for automated governance, risk, and compliance |
US10826767B2 (en) * | 2017-10-04 | 2020-11-03 | Servicenow, Inc. | Systems and methods for automated governance, risk, and compliance |
US20190104156A1 (en) * | 2017-10-04 | 2019-04-04 | Servicenow, Inc. | Systems and methods for automated governance, risk, and compliance |
US10769283B2 (en) | 2017-10-31 | 2020-09-08 | Forcepoint, LLC | Risk adaptive protection |
US10803178B2 (en) | 2017-10-31 | 2020-10-13 | Forcepoint Llc | Genericized data model to perform a security analytics operation |
US11301618B2 (en) | 2017-11-06 | 2022-04-12 | Microsoft Technology Licensing, Llc | Automatic document assistance based on document type |
US20190138580A1 (en) * | 2017-11-06 | 2019-05-09 | Microsoft Technology Licensing, Llc | Electronic document content augmentation |
US10579716B2 (en) * | 2017-11-06 | 2020-03-03 | Microsoft Technology Licensing, Llc | Electronic document content augmentation |
US10915695B2 (en) * | 2017-11-06 | 2021-02-09 | Microsoft Technology Licensing, Llc | Electronic document content augmentation |
US10909309B2 (en) | 2017-11-06 | 2021-02-02 | Microsoft Technology Licensing, Llc | Electronic document content extraction and document type determination |
US10984180B2 (en) | 2017-11-06 | 2021-04-20 | Microsoft Technology Licensing, Llc | Electronic document supplementation with online social networking information |
US10699065B2 (en) * | 2017-11-06 | 2020-06-30 | Microsoft Technology Licensing, Llc | Electronic document content classification and document type determination |
US10713310B2 (en) | 2017-11-15 | 2020-07-14 | Sap Se | Internet of things search and discovery using graph engine |
US10540410B2 (en) * | 2017-11-15 | 2020-01-21 | Sap Se | Internet of things structured query language query formation |
US10726072B2 (en) | 2017-11-15 | 2020-07-28 | Sap Se | Internet of things search and discovery graph engine construction |
US11170058B2 (en) * | 2017-11-15 | 2021-11-09 | Sap Se | Internet of things structured query language query formation |
US20190164094A1 (en) * | 2017-11-27 | 2019-05-30 | Promontory Financial Group Llc | Risk rating analytics based on geographic regions |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10509712B2 (en) * | 2017-11-30 | 2019-12-17 | Vmware, Inc. | Methods and systems to determine baseline event-type distributions of event sources and detect changes in behavior of event sources |
US11182267B2 (en) * | 2017-11-30 | 2021-11-23 | Vmware, Inc. | Methods and systems to determine baseline event-type distributions of event sources and detect changes in behavior of event sources |
CN109951420A (en) * | 2017-12-20 | 2019-06-28 | 广东电网有限责任公司电力调度控制中心 | A kind of multistage flow method for detecting abnormality based on entropy and dynamic linear relationship |
US11605100B1 (en) * | 2017-12-22 | 2023-03-14 | Salesloft, Inc. | Methods and systems for determining cadences |
TWI746914B (en) * | 2017-12-28 | 2021-11-21 | 國立臺灣大學 | Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge |
US11522873B2 (en) | 2017-12-29 | 2022-12-06 | DataVisor, Inc. | Detecting network attacks |
WO2019133989A1 (en) * | 2017-12-29 | 2019-07-04 | DataVisor, Inc. | Detecting network attacks |
CN108260155A (en) * | 2018-01-05 | 2018-07-06 | 西安电子科技大学 | A kind of wireless sense network method for detecting abnormality based on space-time similarity |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10986113B2 (en) * | 2018-01-24 | 2021-04-20 | Hrl Laboratories, Llc | System for continuous validation and threat protection of mobile applications |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US20220156598A1 (en) * | 2018-02-05 | 2022-05-19 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11270211B2 (en) * | 2018-02-05 | 2022-03-08 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11803763B2 (en) * | 2018-02-05 | 2023-10-31 | Microsoft Technology Licensing, Llc | Interactive semantic data exploration for error discovery |
US11003717B1 (en) * | 2018-02-08 | 2021-05-11 | Amazon Technologies, Inc. | Anomaly detection in streaming graphs |
US11888897B2 (en) | 2018-02-09 | 2024-01-30 | SentinelOne, Inc. | Implementing decoys in a network environment |
US10606954B2 (en) * | 2018-02-15 | 2020-03-31 | International Business Machines Corporation | Topic kernelization for real-time conversation data |
US20190251166A1 (en) * | 2018-02-15 | 2019-08-15 | International Business Machines Corporation | Topic kernelization for real-time conversation data |
US10956684B2 (en) | 2018-02-15 | 2021-03-23 | International Business Machines Corporation | Topic kernelization for real-time conversation data |
US11843628B2 (en) | 2018-02-20 | 2023-12-12 | Darktrace Holdings Limited | Cyber security appliance for an operational technology network |
US11716347B2 (en) | 2018-02-20 | 2023-08-01 | Darktrace Holdings Limited | Malicious site detection for a cyber threat response system |
US11477222B2 (en) | 2018-02-20 | 2022-10-18 | Darktrace Holdings Limited | Cyber threat defense system protecting email networks with machine learning models using a range of metadata from observed email communications |
US11522887B2 (en) | 2018-02-20 | 2022-12-06 | Darktrace Holdings Limited | Artificial intelligence controller orchestrating network components for a cyber threat defense |
US11418523B2 (en) | 2018-02-20 | 2022-08-16 | Darktrace Holdings Limited | Artificial intelligence privacy protection for cybersecurity analysis |
US11689556B2 (en) | 2018-02-20 | 2023-06-27 | Darktrace Holdings Limited | Incorporating software-as-a-service data into a cyber threat defense system |
US12063243B2 (en) | 2018-02-20 | 2024-08-13 | Darktrace Holdings Limited | Autonomous email report generator |
US11689557B2 (en) | 2018-02-20 | 2023-06-27 | Darktrace Holdings Limited | Autonomous report composer |
US11477219B2 (en) | 2018-02-20 | 2022-10-18 | Darktrace Holdings Limited | Endpoint agent and system |
US11075932B2 (en) | 2018-02-20 | 2021-07-27 | Darktrace Holdings Limited | Appliance extension for remote communication with a cyber security appliance |
US11546360B2 (en) | 2018-02-20 | 2023-01-03 | Darktrace Holdings Limited | Cyber security appliance for a cloud infrastructure |
US11924238B2 (en) | 2018-02-20 | 2024-03-05 | Darktrace Holdings Limited | Cyber threat defense system, components, and a method for using artificial intelligence models trained on a normal pattern of life for systems with unusual data sources |
US11457030B2 (en) | 2018-02-20 | 2022-09-27 | Darktrace Holdings Limited | Artificial intelligence researcher assistant for cybersecurity analysis |
US11902321B2 (en) | 2018-02-20 | 2024-02-13 | Darktrace Holdings Limited | Secure communication platform for a cybersecurity system |
US11606373B2 (en) | 2018-02-20 | 2023-03-14 | Darktrace Holdings Limited | Cyber threat defense system protecting email networks with machine learning models |
US11336670B2 (en) | 2018-02-20 | 2022-05-17 | Darktrace Holdings Limited | Secure communication platform for a cybersecurity system |
US11336669B2 (en) | 2018-02-20 | 2022-05-17 | Darktrace Holdings Limited | Artificial intelligence cyber security analyst |
US11463457B2 (en) | 2018-02-20 | 2022-10-04 | Darktrace Holdings Limited | Artificial intelligence (AI) based cyber threat analyst to support a cyber security appliance |
US11799898B2 (en) | 2018-02-20 | 2023-10-24 | Darktrace Holdings Limited | Method for sharing cybersecurity threat analysis and defensive measures amongst a community |
US11546359B2 (en) | 2018-02-20 | 2023-01-03 | Darktrace Holdings Limited | Multidimensional clustering analysis and visualizing that clustered analysis on a user interface |
US11962552B2 (en) | 2018-02-20 | 2024-04-16 | Darktrace Holdings Limited | Endpoint agent extension of a machine learning cyber defense system for email |
CN108519993A (en) * | 2018-03-02 | 2018-09-11 | 华南理工大学 | The social networks focus incident detection method calculated based on multiple data stream |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10938817B2 (en) * | 2018-04-05 | 2021-03-02 | Accenture Global Solutions Limited | Data security and protection system using distributed ledgers to store validated data in a knowledge graph |
US11314787B2 (en) | 2018-04-18 | 2022-04-26 | Forcepoint, LLC | Temporal resolution of an entity |
US11271991B2 (en) | 2018-04-19 | 2022-03-08 | Pinx, Inc. | Systems, methods and media for a distributed social media network and system of record |
US20190325343A1 (en) * | 2018-04-19 | 2019-10-24 | National University Of Singapore | Machine learning using partial order hypergraphs |
US11651273B2 (en) * | 2018-04-19 | 2023-05-16 | National University Of Singapore | Machine learning using partial order hypergraphs |
US11283857B2 (en) | 2018-04-19 | 2022-03-22 | William J. Ziebell | Systems, methods and media for a distributed social media network and system of record |
US11496480B2 (en) | 2018-05-01 | 2022-11-08 | Brighterion, Inc. | Securing internet-of-things with smart-agent technology |
US20190342297A1 (en) * | 2018-05-01 | 2019-11-07 | Brighterion, Inc. | Securing internet-of-things with smart-agent technology |
US11575688B2 (en) * | 2018-05-02 | 2023-02-07 | Sri International | Method of malware characterization and prediction |
US20190342308A1 (en) * | 2018-05-02 | 2019-11-07 | Sri International | Method of malware characterization and prediction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US20210075812A1 (en) * | 2018-05-08 | 2021-03-11 | Abc Software, Sia | A system and a method for sequential anomaly revealing in a computer network |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
US11870794B2 (en) * | 2018-05-25 | 2024-01-09 | Nippon Telegraph And Telephone Corporation | Specifying device, specifying method, and specifying program |
US20210392152A1 (en) * | 2018-05-25 | 2021-12-16 | At&T Intellectual Property I, L.P. | Intrusion detection using robust singular value decomposition |
US20210203660A1 (en) * | 2018-05-25 | 2021-07-01 | Nippon Telegraph And Telephone Corporation | Specifying device, specifying method, and specifying program |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US10684703B2 (en) | 2018-06-01 | 2020-06-16 | Apple Inc. | Attention aware virtual assistant dismissal |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11495218B2 (en) | 2018-06-01 | 2022-11-08 | Apple Inc. | Virtual assistant operation in multi-device environments |
US10403283B1 (en) | 2018-06-01 | 2019-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10504518B1 (en) | 2018-06-03 | 2019-12-10 | Apple Inc. | Accelerated task performance |
US10944859B2 (en) | 2018-06-03 | 2021-03-09 | Apple Inc. | Accelerated task performance |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
US20220345476A1 (en) * | 2018-06-06 | 2022-10-27 | Reliaquest Holdings, Llc | Threat mitigation system and method |
US20210126933A1 (en) * | 2018-06-22 | 2021-04-29 | Nec Corporation | Communication analysis apparatus, communication analysis method, communication environment analysis apparatus, communication environment analysis method, and program |
US11507955B2 (en) | 2018-07-06 | 2022-11-22 | At&T Intellectual Property I, L.P. | Services for entity trust conveyances |
US11132681B2 (en) | 2018-07-06 | 2021-09-28 | At&T Intellectual Property I, L.P. | Services for entity trust conveyances |
US11544273B2 (en) | 2018-07-12 | 2023-01-03 | Forcepoint Llc | Constructing event distributions via a streaming scoring operation |
US11810012B2 (en) | 2018-07-12 | 2023-11-07 | Forcepoint Llc | Identifying event distributions using interrelated events |
US11436512B2 (en) | 2018-07-12 | 2022-09-06 | Forcepoint, LLC | Generating extracted features from an event |
US11755585B2 (en) | 2018-07-12 | 2023-09-12 | Forcepoint Llc | Generating enriched events using enriched data and extracted features |
US11755584B2 (en) | 2018-07-12 | 2023-09-12 | Forcepoint Llc | Constructing distributions of interrelated event features |
US10949428B2 (en) | 2018-07-12 | 2021-03-16 | Forcepoint, LLC | Constructing event distributions via a streaming scoring operation |
US11025638B2 (en) | 2018-07-19 | 2021-06-01 | Forcepoint, LLC | System and method providing security friction for atypical resource access requests |
US11269943B2 (en) * | 2018-07-26 | 2022-03-08 | JANZZ Ltd | Semantic matching system and method |
US11157846B2 (en) * | 2018-08-06 | 2021-10-26 | Sociometric Solutions, Inc. | System and method for transforming communication metadata and sensor data into an objective measure of the communication distribution of an organization |
WO2020033404A1 (en) * | 2018-08-07 | 2020-02-13 | Triad National Security, Llc | Modeling anomalousness of new subgraphs observed locally in a dynamic graph based on subgraph attributes |
US11605087B2 (en) * | 2018-08-15 | 2023-03-14 | Advanced New Technologies Co., Ltd. | Method and apparatus for identifying identity information |
US20210081950A1 (en) * | 2018-08-15 | 2021-03-18 | Advanced New Technologies Co., Ltd. | Method and apparatus for identifying identity information |
US11657168B2 (en) * | 2018-08-24 | 2023-05-23 | Bank Of America Corporation | Error detection of data leakage in a data processing system |
US20210081554A1 (en) * | 2018-08-24 | 2021-03-18 | Bank Of America Corporation | Error detection of data leakage in a data processing system |
US11411973B2 (en) | 2018-08-31 | 2022-08-09 | Forcepoint, LLC | Identifying security risks using distributions of characteristic features extracted from a plurality of events |
US11244374B2 (en) * | 2018-08-31 | 2022-02-08 | Realm Ip, Llc | System and machine implemented method for adaptive collaborative matching |
US11811799B2 (en) | 2018-08-31 | 2023-11-07 | Forcepoint Llc | Identifying security risks using distributions of characteristic features extracted from a plurality of events |
US11184404B1 (en) * | 2018-09-07 | 2021-11-23 | Salt Stack, Inc. | Performing idempotent operations to scan and remediate configuration settings of a device |
US11579923B2 (en) | 2018-09-12 | 2023-02-14 | At&T Intellectual Property I, L.P. | Task delegation and cooperation for automated assistants |
US10802872B2 (en) | 2018-09-12 | 2020-10-13 | At&T Intellectual Property I, L.P. | Task delegation and cooperation for automated assistants |
US11321119B2 (en) | 2018-09-12 | 2022-05-03 | At&T Intellectual Property I, L.P. | Task delegation and cooperation for automated assistants |
US20210174128A1 (en) * | 2018-09-19 | 2021-06-10 | Chenope, Inc. | System and Method for Detecting and Analyzing Digital Communications |
US11594012B2 (en) * | 2018-09-19 | 2023-02-28 | Chenope, Inc. | System and method for detecting and analyzing digital communications |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US20210304207A1 (en) * | 2018-10-16 | 2021-09-30 | Mastercard International Incorporated | Systems and methods for monitoring machine learning systems |
US11025659B2 (en) | 2018-10-23 | 2021-06-01 | Forcepoint, LLC | Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors |
US11595430B2 (en) | 2018-10-23 | 2023-02-28 | Forcepoint Llc | Security system using pseudonyms to anonymously identify entities and corresponding security risk related behaviors |
US11481186B2 (en) | 2018-10-25 | 2022-10-25 | At&T Intellectual Property I, L.P. | Automated assistant context and protocol |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11265225B2 (en) * | 2018-10-28 | 2022-03-01 | Netz Forecasts Ltd. | Systems and methods for prediction of anomalies |
US11526750B2 (en) | 2018-10-29 | 2022-12-13 | Zoominfo Apollo Llc | Automated industry classification with deep learning |
US11171980B2 (en) * | 2018-11-02 | 2021-11-09 | Forcepoint Llc | Contagion risk detection, analysis and protection |
CN109753989A (en) * | 2018-11-18 | 2019-05-14 | 韩霞 | Power consumer electricity stealing analysis method based on big data and machine learning |
US10915435B2 (en) * | 2018-11-28 | 2021-02-09 | International Business Machines Corporation | Deep learning based problem advisor |
CN113168469A (en) * | 2018-12-10 | 2021-07-23 | 比特梵德知识产权管理有限公司 | System and method for behavioral threat detection |
US11663405B2 (en) * | 2018-12-13 | 2023-05-30 | Microsoft Technology Licensing, Llc | Machine learning applications for temporally-related events |
US11927609B2 (en) | 2018-12-14 | 2024-03-12 | University Of Georgia Research Foundation, Inc. | Condition monitoring via energy consumption audit in electrical devices and electrical waveform audit in power networks |
US20200193264A1 (en) * | 2018-12-14 | 2020-06-18 | At&T Intellectual Property I, L.P. | Synchronizing virtual agent behavior bias to user context and personality attributes |
WO2020124010A1 (en) * | 2018-12-14 | 2020-06-18 | University Of Georgia Research Foundation, Inc. | Condition monitoring via energy consumption audit in electrical devices and electrical waveform audit in power networks |
US11568277B2 (en) * | 2018-12-16 | 2023-01-31 | Intuit Inc. | Method and apparatus for detecting anomalies in mission critical environments using word representation learning |
US12052277B1 (en) | 2018-12-17 | 2024-07-30 | Wells Fargo Bank, N.A. | Autonomous configuration modeling and management |
US11522898B1 (en) * | 2018-12-17 | 2022-12-06 | Wells Fargo Bank, N.A. | Autonomous configuration modeling and management |
US11743294B2 (en) | 2018-12-19 | 2023-08-29 | Abnormal Security Corporation | Retrospective learning of communication patterns by machine learning models for discovering abnormal behavior |
US11552969B2 (en) | 2018-12-19 | 2023-01-10 | Abnormal Security Corporation | Threat detection platforms for detecting, characterizing, and remediating email-based threats in real time |
US11973772B2 (en) | 2018-12-19 | 2024-04-30 | Abnormal Security Corporation | Multistage analysis of emails to identify security threats |
US11824870B2 (en) | 2018-12-19 | 2023-11-21 | Abnormal Security Corporation | Threat detection platforms for detecting, characterizing, and remediating email-based threats in real time |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US10986121B2 (en) | 2019-01-24 | 2021-04-20 | Darktrace Limited | Multivariate network structure anomaly detector |
CN110162621A (en) * | 2019-02-22 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Disaggregated model training method, abnormal comment detection method, device and equipment |
CN110069480A (en) * | 2019-03-04 | 2019-07-30 | 广东恒睿科技有限公司 | A kind of parallel data cleaning method |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
FR3094600A1 (en) * | 2019-03-29 | 2020-10-02 | Orange | Method of extracting at least one communication pattern in a communication network |
US11443035B2 (en) * | 2019-04-12 | 2022-09-13 | Mcafee, Llc | Behavioral user security policy |
US11232111B2 (en) | 2019-04-14 | 2022-01-25 | Zoominfo Apollo Llc | Automated company matching |
US11330005B2 (en) * | 2019-04-15 | 2022-05-10 | Vectra Ai, Inc. | Privileged account breach detections based on behavioral access patterns |
US12093970B2 (en) | 2019-04-19 | 2024-09-17 | Texas State University | Identifying and quantifying sentiment and promotion bias in social and content networks |
WO2020214187A1 (en) * | 2019-04-19 | 2020-10-22 | Texas State University | Identifying and quantifying sentiment and promotion bias in social and content networks |
US10853496B2 (en) | 2019-04-26 | 2020-12-01 | Forcepoint, LLC | Adaptive trust profile behavioral fingerprint |
US10997295B2 (en) | 2019-04-26 | 2021-05-04 | Forcepoint, LLC | Adaptive trust profile reference architecture |
US11163884B2 (en) | 2019-04-26 | 2021-11-02 | Forcepoint Llc | Privacy and the adaptive trust profile |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US20230135619A1 (en) * | 2019-05-07 | 2023-05-04 | Cerebri AI Inc. | Predictive, machine-learning, locale-aware computer models suitable for location- and trajectory-aware training sets |
US11501213B2 (en) * | 2019-05-07 | 2022-11-15 | Cerebri AI Inc. | Predictive, machine-learning, locale-aware computer models suitable for location- and trajectory-aware training sets |
US20200356900A1 (en) * | 2019-05-07 | 2020-11-12 | Cerebri AI Inc. | Predictive, machine-learning, locale-aware computer models suitable for location- and trajectory-aware training sets |
US11636393B2 (en) | 2019-05-07 | 2023-04-25 | Cerebri AI Inc. | Predictive, machine-learning, time-series computer models suitable for sparse training sets |
US11580218B2 (en) | 2019-05-20 | 2023-02-14 | Sentinel Labs Israel Ltd. | Systems and methods for executable code detection, automatic feature extraction and position independent code detection |
US11790079B2 (en) | 2019-05-20 | 2023-10-17 | Sentinel Labs Israel Ltd. | Systems and methods for executable code detection, automatic feature extraction and position independent code detection |
US11210392B2 (en) | 2019-05-20 | 2021-12-28 | Sentinel Labs Israel Ltd. | Systems and methods for executable code detection, automatic feature extraction and position independent code detection |
US10762200B1 (en) | 2019-05-20 | 2020-09-01 | Sentinel Labs Israel Ltd. | Systems and methods for executable code detection, automatic feature extraction and position independent code detection |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11323463B2 (en) * | 2019-06-14 | 2022-05-03 | Datadog, Inc. | Generating data structures representing relationships among entities of a high-scale network infrastructure |
US11258806B1 (en) * | 2019-06-24 | 2022-02-22 | Mandiant, Inc. | System and method for automatically associating cybersecurity intelligence to cyberthreat actors |
US12063229B1 (en) | 2019-06-24 | 2024-08-13 | Google Llc | System and method for associating cybersecurity intelligence to cyberthreat actors through a similarity matrix |
US11115624B1 (en) | 2019-07-22 | 2021-09-07 | Salesloft, Inc. | Methods and systems for joining a conference |
US11792210B2 (en) | 2019-08-02 | 2023-10-17 | Crowdstrike, Inc. | Mapping unbounded incident scores to a fixed range |
US11588832B2 (en) | 2019-08-02 | 2023-02-21 | Crowdstrike, Inc. | Malicious incident visualization |
US11631014B2 (en) * | 2019-08-02 | 2023-04-18 | Capital One Services, Llc | Computer-based systems configured for detecting, classifying, and visualizing events in large-scale, multivariate and multidimensional datasets and methods of use thereof |
EP3772003A1 (en) * | 2019-08-02 | 2021-02-03 | CrowdStrike, Inc. | Mapping unbounded incident scores to a fixed range |
US11582246B2 (en) | 2019-08-02 | 2023-02-14 | Crowd Strike, Inc. | Advanced incident scoring |
US11516237B2 (en) | 2019-08-02 | 2022-11-29 | Crowdstrike, Inc. | Visualization and control of remotely monitored hosts |
US11669795B2 (en) * | 2019-08-09 | 2023-06-06 | Capital One Services, Llc | Compliance management for emerging risks |
US20210174277A1 (en) * | 2019-08-09 | 2021-06-10 | Capital One Services, Llc | Compliance management for emerging risks |
US11709944B2 (en) | 2019-08-29 | 2023-07-25 | Darktrace Holdings Limited | Intelligent adversary simulator |
US12034767B2 (en) | 2019-08-29 | 2024-07-09 | Darktrace Holdings Limited | Artificial intelligence adversary red team |
CN112566307A (en) * | 2019-09-10 | 2021-03-26 | 酷矽半导体科技(上海)有限公司 | Safety display system and safety display method |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
CN112579661A (en) * | 2019-09-29 | 2021-03-30 | 杭州海康威视数字技术股份有限公司 | Method and device for determining specific target pair, computer equipment and storage medium |
US11785098B2 (en) | 2019-09-30 | 2023-10-10 | Atlassian Pty Ltd. | Systems and methods for personalization of a computer application |
US11275788B2 (en) * | 2019-10-21 | 2022-03-15 | International Business Machines Corporation | Controlling information stored in multiple service computing systems |
CN110826492A (en) * | 2019-11-07 | 2020-02-21 | 长沙品先信息技术有限公司 | Method for detecting abnormal behaviors of crowd in sensitive area based on behavior analysis |
CN114787734A (en) * | 2019-11-12 | 2022-07-22 | 阿韦瓦软件有限责任公司 | Operational anomaly feedback ring system and method |
WO2021097041A1 (en) * | 2019-11-12 | 2021-05-20 | Aveva Software, Llc | Operational anomaly feedback loop system and method |
CN111026270A (en) * | 2019-12-09 | 2020-04-17 | 大连外国语大学 | User behavior pattern mining method under mobile context awareness environment |
US11481485B2 (en) * | 2020-01-08 | 2022-10-25 | Visa International Service Association | Methods and systems for peer grouping in insider threat detection |
CN111310178A (en) * | 2020-01-20 | 2020-06-19 | 武汉理工大学 | Firmware vulnerability detection method and system under cross-platform scene |
US11223646B2 (en) | 2020-01-22 | 2022-01-11 | Forcepoint, LLC | Using concerning behaviors when performing entity-based risk calculations |
US11570197B2 (en) | 2020-01-22 | 2023-01-31 | Forcepoint Llc | Human-centric risk modeling framework |
US11489862B2 (en) | 2020-01-22 | 2022-11-01 | Forcepoint Llc | Anticipating future behavior using kill chains |
US11630901B2 (en) | 2020-02-03 | 2023-04-18 | Forcepoint Llc | External trigger induced behavioral analyses |
US11567847B2 (en) * | 2020-02-04 | 2023-01-31 | International Business Machines Corporation | Identifying anomolous device usage based on usage patterns |
EP3866394A1 (en) * | 2020-02-12 | 2021-08-18 | EXFO Solutions SAS | Detection, characterization, and prediction of real-time events occurring approximately periodically |
US11416504B2 (en) | 2020-02-12 | 2022-08-16 | EXFO Solutions SAS | Detection, characterization, and prediction of real-time events occurring approximately periodically |
US12081522B2 (en) | 2020-02-21 | 2024-09-03 | Abnormal Security Corporation | Discovering email account compromise through assessments of digital activities |
CN111371594A (en) * | 2020-02-25 | 2020-07-03 | 成都西加云杉科技有限公司 | Equipment abnormity warning method and device and electronic equipment |
US11080109B1 (en) | 2020-02-27 | 2021-08-03 | Forcepoint Llc | Dynamically reweighting distributions of event observations |
US11973774B2 (en) | 2020-02-28 | 2024-04-30 | Darktrace Holdings Limited | Multi-stage anomaly detection for process chains in multi-host environments |
US11997113B2 (en) | 2020-02-28 | 2024-05-28 | Darktrace Holdings Limited | Treating data flows differently based on level of interest |
US11985142B2 (en) | 2020-02-28 | 2024-05-14 | Darktrace Holdings Limited | Method and system for determining and acting on a structured document cyber threat risk |
US11477235B2 (en) | 2020-02-28 | 2022-10-18 | Abnormal Security Corporation | Approaches to creating, managing, and applying a federated database to establish risk posed by third parties |
US12069073B2 (en) | 2020-02-28 | 2024-08-20 | Darktrace Holdings Limited | Cyber threat defense system and method |
US11936667B2 (en) | 2020-02-28 | 2024-03-19 | Darktrace Holdings Limited | Cyber security system applying network sequence prediction using transformers |
US11429697B2 (en) | 2020-03-02 | 2022-08-30 | Forcepoint, LLC | Eventually consistent entity resolution |
US11663303B2 (en) * | 2020-03-02 | 2023-05-30 | Abnormal Security Corporation | Multichannel threat detection for protecting against account compromise |
US20210271769A1 (en) * | 2020-03-02 | 2021-09-02 | Forcepoint, LLC | Type-dependent event deduplication |
US20220342966A1 (en) * | 2020-03-02 | 2022-10-27 | Abnormal Security Corporation | Multichannel threat detection for protecting against account compromise |
US11949713B2 (en) | 2020-03-02 | 2024-04-02 | Abnormal Security Corporation | Abuse mailbox for facilitating discovery, investigation, and analysis of email-based threats |
US11836265B2 (en) * | 2020-03-02 | 2023-12-05 | Forcepoint Llc | Type-dependent event deduplication |
US11790060B2 (en) * | 2020-03-02 | 2023-10-17 | Abnormal Security Corporation | Multichannel threat detection for protecting against account compromise |
WO2021188315A1 (en) * | 2020-03-19 | 2021-09-23 | Liveramp, Inc. | Cyber security system and method |
CN111428049A (en) * | 2020-03-20 | 2020-07-17 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating event topic |
US11494275B1 (en) * | 2020-03-30 | 2022-11-08 | Rapid7, Inc. | Automated log entry identification and alert management |
US11080032B1 (en) | 2020-03-31 | 2021-08-03 | Forcepoint Llc | Containerized infrastructure for deployment of microservices |
US11645397B2 (en) | 2020-04-15 | 2023-05-09 | Crowd Strike, Inc. | Distributed digital security system |
US12047399B2 (en) | 2020-04-15 | 2024-07-23 | Crowdstrike, Inc. | Distributed digital security system |
US11861019B2 (en) | 2020-04-15 | 2024-01-02 | Crowdstrike, Inc. | Distributed digital security system |
US12021884B2 (en) | 2020-04-15 | 2024-06-25 | Crowdstrike, Inc. | Distributed digital security system |
US11711379B2 (en) | 2020-04-15 | 2023-07-25 | Crowdstrike, Inc. | Distributed digital security system |
US11616790B2 (en) | 2020-04-15 | 2023-03-28 | Crowdstrike, Inc. | Distributed digital security system |
US11568136B2 (en) | 2020-04-15 | 2023-01-31 | Forcepoint Llc | Automatically constructing lexicons from unlabeled datasets |
US11563756B2 (en) | 2020-04-15 | 2023-01-24 | Crowdstrike, Inc. | Distributed digital security system |
US11575697B2 (en) * | 2020-04-30 | 2023-02-07 | Kyndryl, Inc. | Anomaly detection using an ensemble of models |
US20210344695A1 (en) * | 2020-04-30 | 2021-11-04 | International Business Machines Corporation | Anomaly detection using an ensemble of models |
US11516206B2 (en) | 2020-05-01 | 2022-11-29 | Forcepoint Llc | Cybersecurity system having digital certificate reputation system |
US12130908B2 (en) | 2020-05-01 | 2024-10-29 | Forcepoint Llc | Progressive trigger data and detection model |
US11544390B2 (en) | 2020-05-05 | 2023-01-03 | Forcepoint Llc | Method, system, and apparatus for probabilistic identification of encrypted files |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11895158B2 (en) | 2020-05-19 | 2024-02-06 | Forcepoint Llc | Cybersecurity system having security policy visualization |
US11935522B2 (en) | 2020-06-11 | 2024-03-19 | Capital One Services, Llc | Cognitive analysis of public communications |
US20240039914A1 (en) * | 2020-06-29 | 2024-02-01 | Cyral Inc. | Non-in line data monitoring and security services |
CN111767449A (en) * | 2020-06-30 | 2020-10-13 | 北京百度网讯科技有限公司 | User data processing method, device, computing equipment and medium |
CN113472582A (en) * | 2020-07-15 | 2021-10-01 | 北京沃东天骏信息技术有限公司 | System and method for alarm correlation and alarm aggregation in information technology monitoring |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
CN111967362A (en) * | 2020-08-09 | 2020-11-20 | 电子科技大学 | Hypergraph feature fusion and ensemble learning human behavior identification method for wearable equipment |
US11704387B2 (en) | 2020-08-28 | 2023-07-18 | Forcepoint Llc | Method and system for fuzzy matching and alias matching for streaming data sets |
CN112084140A (en) * | 2020-09-03 | 2020-12-15 | 中国人民大学 | Fine-grained stream data processing method and system in heterogeneous system |
CN112115413A (en) * | 2020-09-07 | 2020-12-22 | 广西天懿智汇建设投资有限公司 | Termite quantity monitoring method based on iterative method |
US11372867B2 (en) * | 2020-09-09 | 2022-06-28 | Citrix Systems, Inc. | Bootstrapped relevance scoring system |
WO2022061244A1 (en) * | 2020-09-18 | 2022-03-24 | Ethimetrix Llc | System and method for predictive corruption risk assessment |
US11170034B1 (en) * | 2020-09-21 | 2021-11-09 | Foxit Software Inc. | System and method for determining credibility of content in a number of documents |
CN112153343A (en) * | 2020-09-25 | 2020-12-29 | 北京百度网讯科技有限公司 | Elevator safety monitoring method and device, monitoring camera and storage medium |
US20220108330A1 (en) * | 2020-10-06 | 2022-04-07 | Rebecca Mendoza Saltiel | Interactive and iterative behavioral model, system, and method for detecting fraud, waste, abuse and anomaly |
CN112214775A (en) * | 2020-10-09 | 2021-01-12 | 平安国际智慧城市科技股份有限公司 | Injection type attack method and device for graph data, medium and electronic equipment |
US20220027575A1 (en) * | 2020-10-14 | 2022-01-27 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method of predicting emotional style of dialogue, electronic device, and storage medium |
US11683284B2 (en) | 2020-10-23 | 2023-06-20 | Abnormal Security Corporation | Discovering graymail through real-time analysis of incoming email |
US11190589B1 (en) | 2020-10-27 | 2021-11-30 | Forcepoint, LLC | System and method for efficient fingerprinting in cloud multitenant data loss prevention |
CN112307435A (en) * | 2020-10-30 | 2021-02-02 | 三峡大学 | Method for judging and screening abnormal electricity consumption based on fuzzy clustering and trend |
CN112463848A (en) * | 2020-11-05 | 2021-03-09 | 中国建设银行股份有限公司 | Method, system, device and storage medium for detecting abnormal user behavior |
CN112417099A (en) * | 2020-11-20 | 2021-02-26 | 南京邮电大学 | Method for constructing fraud user detection model based on graph attention network |
CN112417099B (en) * | 2020-11-20 | 2022-10-04 | 南京邮电大学 | Method for constructing fraud user detection model based on graph attention network |
CN112580022A (en) * | 2020-12-07 | 2021-03-30 | 北京中电飞华通信有限公司 | Host system safety early warning method, device, equipment and storage medium |
US11811804B1 (en) * | 2020-12-15 | 2023-11-07 | Red Hat, Inc. | System and method for detecting process anomalies in a distributed computation system utilizing containers |
US11579857B2 (en) | 2020-12-16 | 2023-02-14 | Sentinel Labs Israel Ltd. | Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach |
US11748083B2 (en) | 2020-12-16 | 2023-09-05 | Sentinel Labs Israel Ltd. | Systems, methods and devices for device fingerprinting and automatic deployment of software in a computing network using a peer-to-peer approach |
US11329933B1 (en) * | 2020-12-28 | 2022-05-10 | Drift.com, Inc. | Persisting an AI-supported conversation across multiple channels |
WO2022150061A1 (en) * | 2021-01-08 | 2022-07-14 | Feedzai-Consultadoria e Inovação Tecnológica, S.A. | Generation of divergence distributions for automated data analysis |
US12020256B2 (en) * | 2021-01-08 | 2024-06-25 | Feedzai—Consultadoria e Inovação Tecnológica, S.A. | Generation of divergence distributions for automated data analysis |
US20220222670A1 (en) * | 2021-01-08 | 2022-07-14 | Feedzai - Consultadoria e Inovacao Tecnologica, S. A. | Generation of divergence distributions for automated data analysis |
CN112651988A (en) * | 2021-01-13 | 2021-04-13 | 重庆大学 | Finger-shaped image segmentation, finger-shaped plate dislocation and fastener abnormality detection method based on double-pointer positioning |
US20220232353A1 (en) * | 2021-01-19 | 2022-07-21 | Gluroo Imaginations, Inc. | Messaging-based logging and alerting system |
CN112818017A (en) * | 2021-01-22 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Event data processing method and device |
US11494381B1 (en) * | 2021-01-29 | 2022-11-08 | Splunk Inc. | Ingestion and processing of both cloud-based and non-cloud-based data by a data intake and query system |
CN113015195A (en) * | 2021-02-08 | 2021-06-22 | 安徽理工大学 | Wireless sensor network data acquisition method and system |
US20220261406A1 (en) * | 2021-02-18 | 2022-08-18 | Walmart Apollo, Llc | Methods and apparatus for improving search retrieval |
WO2022180613A1 (en) * | 2021-02-26 | 2022-09-01 | Trackerdetect Ltd | Global iterative clustering algorithm to model entities' behaviors and detect anomalies |
CN112967062A (en) * | 2021-03-02 | 2021-06-15 | 东华大学 | User identity recognition method based on cautious degree |
CN112837078A (en) * | 2021-03-03 | 2021-05-25 | 万商云集(成都)科技股份有限公司 | Cluster-based user abnormal behavior detection method |
US20220286472A1 (en) * | 2021-03-04 | 2022-09-08 | Qatar Foundation For Education, Science And Community Development | Anomalous user account detection systems and methods |
US11991196B2 (en) * | 2021-03-04 | 2024-05-21 | Qatar Foundation For Education, Science And Community Development | Anomalous user account detection systems and methods |
CN115081468A (en) * | 2021-03-15 | 2022-09-20 | 天津大学 | Multi-task convolutional neural network fault diagnosis method based on knowledge migration |
US20220309155A1 (en) * | 2021-03-24 | 2022-09-29 | International Business Machines Corporation | Defending against adversarial queries in a data governance system |
US12056236B2 (en) * | 2021-03-24 | 2024-08-06 | International Business Machines Corporation | Defending against adversarial queries in a data governance system |
US20220309387A1 (en) * | 2021-03-26 | 2022-09-29 | Capital One Services, Llc | Computer-based systems for metadata-based anomaly detection and methods of use thereof |
US11836137B2 (en) * | 2021-05-19 | 2023-12-05 | Crowdstrike, Inc. | Real-time streaming graph queries |
US20220374434A1 (en) * | 2021-05-19 | 2022-11-24 | Crowdstrike, Inc. | Real-time streaming graph queries |
US11481709B1 (en) | 2021-05-20 | 2022-10-25 | Netskope, Inc. | Calibrating user confidence in compliance with an organization's security policies |
WO2022246131A1 (en) * | 2021-05-20 | 2022-11-24 | Netskope, Inc. | Scoring confidence in user compliance with an organization's security policies |
US11831661B2 (en) | 2021-06-03 | 2023-11-28 | Abnormal Security Corporation | Multi-tiered approach to payload detection for incoming communications |
CN113283377A (en) * | 2021-06-10 | 2021-08-20 | 重庆师范大学 | Face privacy protection method, system, medium and electronic terminal |
CN113344133A (en) * | 2021-06-30 | 2021-09-03 | 上海观安信息技术股份有限公司 | Method and system for detecting abnormal fluctuation of time sequence behavior |
US11899782B1 (en) | 2021-07-13 | 2024-02-13 | SentinelOne, Inc. | Preserving DLL hooks |
CN113849497A (en) * | 2021-08-02 | 2021-12-28 | 跨境云(横琴)科技创新研究中心有限公司 | Attribute weight and rule driving based exception aggregation method and system |
US12058163B2 (en) | 2021-08-10 | 2024-08-06 | CyberSaint, Inc. | Systems, media, and methods for utilizing a crosswalk algorithm to identify controls across frameworks, and for utilizing identified controls to generate cybersecurity risk assessments |
US11936668B2 (en) | 2021-08-17 | 2024-03-19 | International Business Machines Corporation | Identifying credential attacks on encrypted network traffic |
CN113726814A (en) * | 2021-09-09 | 2021-11-30 | 中国电信股份有限公司 | User abnormal behavior identification method, device, equipment and storage medium |
CN113963020A (en) * | 2021-09-18 | 2022-01-21 | 江苏大学 | Multi-intelligent-network-connected automobile cooperative target tracking method based on hypergraph matching |
CN113869415A (en) * | 2021-09-28 | 2021-12-31 | 华中师范大学 | Problem behavior detection and early warning system |
US20230100315A1 (en) * | 2021-09-28 | 2023-03-30 | Centurylink Intellectual Property Llc | Pattern Identification for Incident Prediction and Resolution |
CN114039744A (en) * | 2021-09-29 | 2022-02-11 | 中孚信息股份有限公司 | Abnormal behavior prediction method and system based on user characteristic label |
CN113704233A (en) * | 2021-10-29 | 2021-11-26 | 飞狐信息技术(天津)有限公司 | Keyword detection method and system |
US20230141849A1 (en) * | 2021-11-10 | 2023-05-11 | International Business Machines Corporation | Workflow management based on recognition of content of documents |
US11586878B1 (en) | 2021-12-10 | 2023-02-21 | Salesloft, Inc. | Methods and systems for cascading model architecture for providing information on reply emails |
CN114373186A (en) * | 2022-01-11 | 2022-04-19 | 北京新学堂网络科技有限公司 | Social software information interaction method, device and medium |
CN114329455A (en) * | 2022-03-08 | 2022-04-12 | 北京大学 | User abnormal behavior detection method and device based on heterogeneous graph embedding |
CN114356642A (en) * | 2022-03-11 | 2022-04-15 | 军事科学院系统工程研究院网络信息研究所 | Abnormal event automatic diagnosis method and system based on process mining |
US11915161B2 (en) * | 2022-05-25 | 2024-02-27 | Tsinghua University | Method and apparatus for semantic analysis of confrontation scenarios based on target-attribute-relation |
US20240013075A1 (en) * | 2022-05-25 | 2024-01-11 | Tsinghua University | Method and apparatus for semantic analysis of confrontation scenarios based on target-attribute-relation |
CN115022055A (en) * | 2022-06-09 | 2022-09-06 | 武汉思普崚技术有限公司 | Network attack real-time detection method and device based on dynamic time window |
US20240070130A1 (en) * | 2022-08-30 | 2024-02-29 | Charter Communications Operating, Llc | Methods And Systems For Identifying And Correcting Anomalies In A Data Environment |
US11693958B1 (en) * | 2022-09-08 | 2023-07-04 | Radiant Security, Inc. | Processing and storing event data in a knowledge graph format for anomaly detection |
CN115766145A (en) * | 2022-11-04 | 2023-03-07 | 中国电信股份有限公司 | Abnormality detection method and apparatus, and computer-readable storage medium |
CN115440390A (en) * | 2022-11-09 | 2022-12-06 | 山东大学 | Method, system, equipment and storage medium for predicting number of cases of infectious diseases |
US12111825B2 (en) * | 2022-11-10 | 2024-10-08 | Bank Of America Corporation | Event-driven batch processing system with granular operational access |
US20240160625A1 (en) * | 2022-11-10 | 2024-05-16 | Bank Of America Corporation | Event-driven batch processing system with granular operational access |
CN115981970A (en) * | 2023-03-20 | 2023-04-18 | 建信金融科技有限责任公司 | Operation and maintenance data analysis method, device, equipment and medium |
CN116232921A (en) * | 2023-05-08 | 2023-06-06 | 中国电信股份有限公司四川分公司 | Device and method for constructing deterministic network data sets based on hypergraphs |
CN116304641A (en) * | 2023-05-15 | 2023-06-23 | 山东省计算中心(国家超级计算济南中心) | Anomaly detection interpretation method and system based on reference point search and feature interaction |
CN116386045A (en) * | 2023-06-01 | 2023-07-04 | 创域智能(常熟)网联科技有限公司 | Sensor information analysis method based on artificial intelligence and artificial intelligence platform system |
CN117372076A (en) * | 2023-08-23 | 2024-01-09 | 广东烟草广州市有限公司 | Abnormal transaction data monitoring method, device, equipment and storage medium |
CN117075872A (en) * | 2023-10-17 | 2023-11-17 | 北京长亭科技有限公司 | Method and device for creating security base line based on dynamic parameters |
CN117290800A (en) * | 2023-11-24 | 2023-12-26 | 华东交通大学 | Timing sequence anomaly detection method and system based on hypergraph attention network |
CN117421459A (en) * | 2023-12-14 | 2024-01-19 | 成都智慧锦城大数据有限公司 | Data mining method and system applied to digital city |
CN117851958A (en) * | 2024-03-07 | 2024-04-09 | 中国人民解放军国防科技大学 | FHGS-based dynamic network edge anomaly detection method, device and equipment |
CN117909912A (en) * | 2024-03-19 | 2024-04-19 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Detection method and system for two-stage abnormal user behavior analysis |
CN118233317A (en) * | 2024-05-23 | 2024-06-21 | 四川大学 | Topology obfuscation defense method based on temporal network inference |
CN118297287A (en) * | 2024-06-05 | 2024-07-05 | 宁波财经学院 | Intelligent campus system based on student information index |
Also Published As
Publication number | Publication date |
---|---|
US8887286B2 (en) | 2014-11-11 |
US20140096249A1 (en) | 2014-04-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8887286B2 (en) | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis | |
CN113377850B (en) | Big data technology platform of cognitive Internet of things | |
Rodriguez et al. | A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data | |
Choraś et al. | Advanced Machine Learning techniques for fake news (online disinformation) detection: A systematic mapping study | |
Young et al. | Artificial discretion as a tool of governance: a framework for understanding the impact of artificial intelligence on public administration | |
Wenzel et al. | The double-edged sword of big data in organizational and management research: A review of opportunities and risks | |
US11093568B2 (en) | Systems and methods for content management | |
Paul et al. | Fake review detection on online E-commerce platforms: a systematic literature review | |
Viviani et al. | Credibility in social media: opinions, news, and health information—a survey | |
US9569729B1 (en) | Analytical system and method for assessing certain characteristics of organizations | |
US9269068B2 (en) | Systems and methods for consumer-generated media reputation management | |
US20180189691A1 (en) | Analytical system for assessing certain characteristics of organizations | |
WO2019222742A1 (en) | Real-time content analysis and ranking | |
Wiedemann | Proportional classification revisited: Automatic content analysis of political manifestos using active learning | |
US11816618B1 (en) | Method and system for automatically managing and displaying a hypergraph representation of workflow information | |
Johnsen | The future of Artificial Intelligence in Digital Marketing: The next big technological break | |
Pendyala | Veracity of big data | |
Hu et al. | How to find a perfect data scientist: A distance-metric learning approach | |
Harper | Metadata analytics, visualization, and optimization: Experiments in statistical analysis of the Digital Public Library of America (DPLA) | |
Chung et al. | A computational framework for social-media-based business analytics and knowledge creation: empirical studies of CyTraSS | |
Manoharan et al. | Insider threat detection using supervised machine learning algorithms | |
Brooks | Human centered tools for analyzing online social data | |
Leblanc et al. | Interpretability in machine learning: on the interplay with explainability, predictive performances and models | |
Weber | Artificial Intelligence for Business Analytics: Algorithms, Platforms and Application Scenarios | |
Ahmed et al. | The Narrow Depth and Breadth of Corporate Responsible AI Research |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: PLATINUM CAPITAL PARTNERS INC., CYPRUS Free format text: SECURITY AGREEMENT;ASSIGNOR:CATAPHORA, INC.;REEL/FRAME:025779/0930 Effective date: 20110111 |
|
AS | Assignment |
Owner name: CATAPHORA, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DUPONT, LAURENT;CHARNOCK, ELIZABETH;ROBERTS, STEVE;AND OTHERS;SIGNING DATES FROM 20110620 TO 20110621;REEL/FRAME:026491/0150 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
|
AS | Assignment |
Owner name: CATAPHORA, INC., CALIFORNIA Free format text: RELEASE OF SECURITY INTEREST IN PATENTS;ASSIGNOR:PLATINUM CAPITAL PARTNERS INC.;REEL/FRAME:032271/0582 Effective date: 20140217 |
|
AS | Assignment |
Owner name: SUNRISE, SERIES 54 OF ALLIED SECURITY TRUST I, CAL Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CATAPHORA INC.;REEL/FRAME:033710/0173 Effective date: 20140715 |