US20160127402A1 - Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system - Google Patents
- Publication number
- US20160127402A1 (application US14/532,812)
- Authority
- US
- United States
- Prior art keywords
- features
- enterprise
- pattern
- commerce system
- logs
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/046—Forward inferencing; Production systems
- G06N5/047—Pattern matching networks; Rete networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2101—Auditing as a secondary aspect
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/102—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying security measure for e-commerce
Definitions
- The invention relates generally to the field of identifying and detecting threats to an enterprise or e-commerce system. More particularly, the invention relates to a scalable method and a scalable apparatus for detecting threats by automatically creating statistical rules based on statistical outliers identified in one or more enterprise or e-commerce systems.
- A method for identifying and detecting threats to an enterprise or e-commerce system, comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; and using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
- An apparatus for identifying and detecting threats to an enterprise or e-commerce system, comprising: one or more processors; system memory coupled to the one or more processors; one or more non-transitory memory units coupled to the one or more processors; and threat identification and detection code stored on the one or more non-transitory memory units that, when executed by the one or more processors, is configured to perform a method comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; and using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
- An apparatus for identifying and detecting threats to an enterprise or e-commerce system, comprising: a pattern discoverer; one or more pattern normalizers coupled to the pattern discoverer; and one or more threat detectors coupled to the pattern discoverer; wherein at least one of the one or more pattern normalizers comprises: one or more pattern normalizer processors; pattern normalizer system memory coupled to the one or more pattern normalizer processors; one or more pattern normalizer non-transitory memory units coupled to the one or more pattern normalizer processors; a pattern normalizer communications device coupled to the one or more pattern normalizer processors, the pattern normalizer communications device being configured to communicate with the pattern discoverer; and pattern normalizer code stored on the one or more pattern normalizer non-transitory memory units that, when executed by the one or more pattern normalizer processors, is configured to perform a pattern normalizer method comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extract…
- FIG. 1 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- FIG. 2 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- FIG. 3 is a table showing a features table, in accordance with some embodiments.
- FIG. 4 is a table showing a labeled features table, in accordance with some embodiments.
- FIG. 5 is a table showing a ranked/labeled features table, in accordance with some embodiments.
- FIG. 6 is a block diagram illustrating a method for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- The embodiment or embodiments described herein address these problems, among others, by proposing a new method and apparatus for identifying and detecting threats to an enterprise or e-commerce system.
- The new method and apparatus use a multi-dimensional statistical analysis of multiple features extracted from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system to identify statistical outliers. These statistical outliers are where malicious or unauthorized usage may be found.
- One or more rules for identifying threats to the enterprise or e-commerce system are then created based on the labeling of the statistical outliers. The one or more rules may then be used in the real-time detection of malicious or unauthorized use of the enterprise or e-commerce system.
- FIG. 1 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- Apparatus 100 comprises one or more processors 105, system memory 110, and one or more non-transitory memory units 115, all of which are directly or indirectly coupled to each other.
- Streamed data 120 and/or batch data 125 is fed into the apparatus 100, where a pattern normalizer 130 processes it. The pattern normalizer 130 comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, is configured to parse the streamed data 120 and/or batch data 125 by grouping (bunching) log lines belonging to one or more log line parameters and then extracting one or more features from the grouped log lines into one or more features tables 135.
- The streamed data 120 comprises incoming data traffic to an enterprise or e-commerce system.
- The batch data 125 comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, antivirus (AV) logs, user logon audits, data loss prevention (DLP) logs, load balancer (LB) logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers.
- The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
- The features of a features table organized or grouped by session comprise at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session.
- The features of a features table organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query.
- The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added.
- The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
- The one or more features tables comprise a matrix in which the features are arranged as columns and the one or more log line parameters make up the rows.
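As an illustration of the grouping and feature-extraction steps above, the following Python sketch groups access-log lines by session and computes several of the listed session features. It then arranges them as a features table whose rows are sessions and whose columns are features. The `LogLine` record, the image-extension list, and the per-second click rate are assumptions made for this example; the source does not specify a log format or units.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogLine:
    # Hypothetical log record; real fields depend on the log source.
    session: str      # log line parameter used for grouping
    timestamp: float  # seconds since epoch
    method: str       # HTTP method, e.g. "GET" or "HEAD"
    path: str         # requested path
    status: int       # HTTP response status code


# Assumption: which paths count as image requests.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def session_features(lines):
    """Group log lines by session and return {session: {feature: value}}."""
    groups = defaultdict(list)
    for line in lines:
        groups[line.session].append(line)

    table = {}
    for session, group in groups.items():
        group.sort(key=lambda entry: entry.timestamp)
        n = len(group)
        duration = group[-1].timestamp - group[0].timestamp
        gaps = [b.timestamp - a.timestamp for a, b in zip(group, group[1:])]

        def pct(predicate):
            # Percentage of this session's requests that match the predicate.
            return 100.0 * sum(1 for entry in group if predicate(entry)) / n

        table[session] = {
            "duration_s": duration,
            "num_requests": n,
            "avg_time_between_clicks_s": sum(gaps) / len(gaps) if gaps else 0.0,
            # Requests per second (assumed unit); 0.0 if all requests share one timestamp.
            "click_rate_per_s": n / duration if duration > 0 else 0.0,
            "pct_image": pct(lambda e: e.path.lower().endswith(IMAGE_EXTENSIONS)),
            "pct_4xx": pct(lambda e: 400 <= e.status < 500),
            "pct_3xx": pct(lambda e: 300 <= e.status < 400),
            "pct_2xx": pct(lambda e: 200 <= e.status < 300),
            "pct_head": pct(lambda e: e.method == "HEAD"),
        }
    return table


def to_matrix(table):
    """Arrange the table as rows (sessions) by columns (features)."""
    rows = list(table)
    columns = list(table[rows[0]]) if rows else []
    return rows, columns, [[table[r][c] for c in columns] for r in rows]
```

The remaining session features (zip and binary responses) would follow the same `pct` pattern once the response content type is available in the log record.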
- The one or more features tables 135 are then sent to a pattern extractor 140, which comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, is configured to apply one or more statistical models 145, such as clustering models, hidden Markov models, and copula models, to the one or more features tables 135 to identify statistical outliers.
- When copula models are used, the pattern extractor 140 applies a copula function to all the features of the one or more features tables 135.
- The copula function comprises using various techniques to estimate a cumulative distribution function for each feature. In one embodiment, a kernel density estimation function is used to estimate the cumulative distribution function.
- The cumulative distribution function of each feature is used to calculate a U-matrix, in which each feature value is replaced by its estimated cumulative probability.
- The inverse of the U-matrix is then normalized, and a RHOHAT (the estimated correlation matrix) is computed.
- The RHOHAT and the U-matrix are then used to compute the joint probability distribution of each row of a features table.
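The copula steps above can be sketched in Python as follows. This is a minimal, standard-library-only sketch that assumes a Gaussian copula. Each feature's cumulative distribution function is estimated with a Gaussian kernel (bandwidth from Silverman's rule of thumb). The U-matrix is then mapped through the inverse standard-normal CDF, which is how "the inverse of the U-matrix is then normalized" is read here. RHOHAT is taken as the correlation matrix of those normal scores, and each row's log joint density is the log copula density plus the log kernel-density estimates of its marginals. Adding the marginal terms is also an assumption, made so that a row that is extreme in a single feature still scores as unlikely.

```python
import math
from statistics import NormalDist, pstdev

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def _kde(column):
    """Gaussian-kernel estimates of one feature's CDF and PDF."""
    n = len(column)
    sd = pstdev(column)
    h = 1.06 * sd * n ** -0.2 if sd > 0 else 1e-3  # Silverman's rule of thumb

    def cdf(x):
        return sum(STD_NORMAL.cdf((x - xi) / h) for xi in column) / n

    def pdf(x):
        return sum(STD_NORMAL.pdf((x - xi) / h) for xi in column) / (n * h)

    return cdf, pdf


def _correlation(columns):
    """Pearson correlation matrix of equally long columns."""
    n, k = len(columns[0]), len(columns)
    means = [sum(c) / n for c in columns]
    devs = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in d)) for d in devs]
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             if norms[i] and norms[j] else float(i == j)
             for j in range(k)] for i in range(k)]


def _invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; also returns the determinant."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("RHOHAT is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug], det


def copula_log_density(table, eps=1e-6):
    """table: rows = log line parameters, columns = features.
    Returns (RHOHAT, log joint density of each row) under an assumed Gaussian copula."""
    columns = [list(c) for c in zip(*table)]
    k = len(columns)
    kdes = [_kde(c) for c in columns]
    # U-matrix: each value replaced by its feature's estimated CDF, clipped to (0, 1).
    u = [[min(max(kdes[j][0](x), eps), 1 - eps) for j, x in enumerate(row)] for row in table]
    # Normal scores via the inverse standard-normal CDF.
    z = [[STD_NORMAL.inv_cdf(v) for v in row] for row in u]
    rhohat = _correlation([list(c) for c in zip(*z)])
    r_inv, det = _invert(rhohat)
    scores = []
    for row, zr in zip(table, z):
        # Gaussian copula log density: -0.5 log|R| - 0.5 z^T (R^-1 - I) z
        quad = sum(zr[i] * (r_inv[i][j] - (1.0 if i == j else 0.0)) * zr[j]
                   for i in range(k) for j in range(k))
        log_copula = -0.5 * math.log(det) - 0.5 * quad
        log_marginals = sum(math.log(max(kdes[j][1](x), 1e-300)) for j, x in enumerate(row))
        scores.append(log_copula + log_marginals)
    return rhohat, scores
```

Rows with the lowest scores are the statistical outliers. A production implementation would typically use a numerical library rather than hand-written matrix inversion.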
- The pattern extractor 140 then ranks and rearranges the one or more log line parameters of the one or more features tables 135 by probability.
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label to create one or more labeled features tables 150.
- The statistical outliers are presented on a user interface 155 so that an administrator of the enterprise or e-commerce system may manually label them as malicious, non-malicious, or another administrator-defined label.
- The one or more labeled features tables 150 are then sent to a rule generator 160, which comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, is configured to create, from the one or more labeled features tables, one or more rules 165 for identifying threats to the enterprise or e-commerce system.
- The one or more rules 165 comprise a random forest classifier, learning vector quantization, and/or a neural network.
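Of the rule types listed above, learning vector quantization is the simplest to illustrate. The sketch below is a minimal LVQ1 learner over a labeled features table. Each class keeps prototype vectors, and every training row pulls its nearest prototype closer when the labels match and pushes it away when they differ. The learning-rate schedule, the epoch count, and the one-prototype-per-class default are assumptions. The features are also assumed to have been scaled to comparable ranges beforehand.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_lvq1(rows, labels, prototypes_per_class=1, lr=0.3, epochs=30, seed=0):
    """Return a list of [prototype_vector, label] pairs learned by LVQ1."""
    rng = random.Random(seed)
    prototypes = []
    for label in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        # Initialise each class's prototypes from randomly chosen labeled rows.
        for r in rng.sample(members, min(prototypes_per_class, len(members))):
            prototypes.append([list(r), label])

    data = list(zip(rows, labels))
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate (assumption)
        rng.shuffle(data)
        for x, y in data:
            nearest = min(prototypes, key=lambda p: _dist2(p[0], x))
            step = alpha if nearest[1] == y else -alpha  # attract if same label, else repel
            nearest[0] = [w + step * (xi - w) for w, xi in zip(nearest[0], x)]
    return prototypes


def classify(prototypes, x):
    """Rule: the label of the nearest prototype."""
    return min(prototypes, key=lambda p: _dist2(p[0], x))[1]
```

The learned prototypes are compact enough to be distributed as rules. A random forest or neural network would instead typically come from a machine-learning library.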
- The one or more rules 165 that are created are essentially behavioral rules based on a multi-dimensional view of the incoming streamed data 120 and/or batch data 125.
- The one or more rules 165 may then be sent to one or more threat detectors 170 for real-time monitoring of the streamed data 120.
- The one or more rules 165 may also be posted to a cloud server 172 or distributed to other third parties 175 for use in their firewall rule sets. If threats are not detected by the one or more threat detectors 170, the incoming data traffic is allowed to continue to the enterprise or e-commerce system.
- If threats are detected, the incoming data traffic to the enterprise or e-commerce system may be blocked and/or challenged.
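A threat detector that applies such rules in real time might map each scored request or session to one of the outcomes described above. In the hypothetical sketch below, the rule is a nearest-prototype classifier whose prototypes are given as input. A row nearest to a "malicious" prototype is treated as a detected threat. It is blocked when the match is clear-cut and challenged (for example, with a CAPTCHA) when it is borderline. The `block_ratio` threshold and the block-versus-challenge policy are assumptions, not details from the source.

```python
import math
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


def detect(features, prototypes, block_ratio=0.5):
    """prototypes: list of (vector, label) pairs produced by a rule generator.
    block_ratio is a hypothetical tuning parameter."""
    ranked = sorted(prototypes, key=lambda p: math.dist(p[0], features))
    nearest_vec, nearest_label = ranked[0]
    if nearest_label != "malicious":
        return Action.ALLOW  # no threat detected: traffic continues
    d_malicious = math.dist(nearest_vec, features)
    benign = [vec for vec, label in ranked if label != "malicious"]
    d_benign = math.dist(benign[0], features) if benign else math.inf
    # Clear-cut match -> block; borderline match -> challenge (assumed policy).
    return Action.BLOCK if d_malicious <= block_ratio * d_benign else Action.CHALLENGE
```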
- In some embodiments, a detected threat may be used to modify the one or more statistical models 145 used by the pattern extractor 140 and/or the one or more rules 165 generated by the rule generator 160.
- The pattern extractor 140 may be fully distributed across multiple server-class machines in order to scale the processing to a large number (e.g., billions) of rows of log line parameters.
- Each node of the multiple server-class machines runs all of the one or more statistical models on a group of features, and a master node aggregates the results from all the nodes.
- The pattern extractor 140 may also intelligently sample the rows of the one or more log line parameters using a technique called the bag of little bootstraps, which averages the results of bootstrapping multiple small subsets of the log line parameters.
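The bag of little bootstraps can be sketched as follows for one simple statistic, the standard error of a feature's mean. Each of `s` small subsets of size b = n^γ is resampled `r` times with multinomial weights that sum to the full size n. The spread of the resampled estimates is computed per subset, and the per-subset results are averaged. The statistic, γ, `s`, and `r` are illustrative assumptions. Because drawing n indices per resample costs O(n), a production version would draw the counts directly, for example with NumPy's `Generator.multinomial`.

```python
import random
import statistics
from collections import Counter


def blb_stderr_of_mean(data, gamma=0.7, s=10, r=50, seed=0):
    """Bag-of-little-bootstraps estimate of the standard error of the mean.
    gamma, s and r are illustrative defaults, not values from the source."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # "little" subset size, b << n
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)  # drawn without replacement
        estimates = []
        for _ in range(r):
            # A size-n bootstrap resample of the subset, stored as counts per subset point.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(sum(subset[i] * c for i, c in counts.items()) / n)
        per_subset.append(statistics.stdev(estimates))  # quality estimate for this subset
    return statistics.fmean(per_subset)  # average across subsets
```

Only b distinct rows per subset are ever touched, which is what makes the technique attractive when the full table has billions of rows.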
- FIG. 2 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- Apparatus 200 comprises one or more pattern normalizers 205, a pattern extractor 210, a rule generator 215, and one or more threat detectors 220, separated over one or more computing systems.
- In the embodiment illustrated in FIG. 2, the pattern extractor 210 and the rule generator 215 are integrated together as a pattern discoverer 225 on a single computing system.
- The one or more pattern normalizers 205 comprise one or more pattern normalizer processors 206, pattern normalizer system memory 207, one or more pattern normalizer non-transitory memory units 208, and a pattern normalizer communications device 209, all of which are directly or indirectly coupled to each other, and pattern normalizer code stored on the one or more pattern normalizer non-transitory memory units that, when executed by the one or more pattern normalizer processors, is configured to perform a pattern normalizer method.
- The pattern discoverer 225 comprises one or more pattern discoverer processors 226, pattern discoverer system memory 227, one or more pattern discoverer non-transitory memory units 228, and a pattern discoverer communications device 229, all of which are directly or indirectly coupled to each other, and pattern discoverer code stored on the one or more pattern discoverer non-transitory memory units that, when executed by the one or more pattern discoverer processors, is configured to perform a pattern discoverer method.
- At least one of the one or more threat detectors 220 comprises one or more threat detector processors 221, threat detector system memory 222, one or more threat detector non-transitory memory units 223, and a threat detector communications device 224, all of which are directly or indirectly coupled to each other, and threat detector code stored on the one or more threat detector non-transitory memory units that, when executed by the one or more threat detector processors, is configured to perform a threat detector method.
- Streamed data 230 and/or batch data 235 is fed into the one or more pattern normalizers 205.
- The one or more pattern normalizers 205 parse the streamed data 230 and/or batch data 235 by grouping (bunching) log lines belonging to one or more log line parameters and extracting one or more features from the grouped log lines into one or more features tables 240.
- The streamed data 230 comprises incoming data traffic to an enterprise or e-commerce system.
- The batch data 235 comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers.
- The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
- The features of a features table 240 organized or grouped by session comprise at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session.
- The features of a features table 240 organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query.
- The features of a features table 240 organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added.
- The features of a features table 240 organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
- The one or more features tables 240 comprise a matrix in which the features are arranged as columns and the one or more log line parameters make up the rows.
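For the IP-address grouping, a features table in this matrix form can be built by counting events per address. In the sketch below, the `(ip, event)` input pairs and the event names `login_failure`, `login_success`, and `password_reset` are hypothetical. A real pattern normalizer would derive them from log sources such as the logon audits and web server access logs listed above.

```python
from collections import Counter, defaultdict

# Hypothetical event names; every log line also counts toward total_requests.
EVENTS = ("login_failure", "login_success", "password_reset")
COLUMNS = [*EVENTS, "total_requests"]


def ip_features_table(events):
    """events: iterable of (ip, event). Returns (row_ids, COLUMNS, matrix),
    with one row per IP address and one column per feature."""
    counts = defaultdict(Counter)
    for ip, event in events:
        counts[ip]["total_requests"] += 1
        if event in EVENTS:
            counts[ip][event] += 1
    row_ids = sorted(counts)
    matrix = [[counts[ip][col] for col in COLUMNS] for ip in row_ids]
    return row_ids, COLUMNS, matrix
```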
- The one or more features tables 240 are then sent to the pattern discoverer 225, where the pattern extractor 210 applies one or more statistical models 245, such as clustering models, hidden Markov models, and copula models, to the one or more features tables 240 to identify statistical outliers.
- When copula models are used, the pattern extractor 210 applies a copula function to all the features of the one or more features tables 240.
- The copula function comprises using various techniques to estimate a cumulative distribution function for each feature.
- A kernel density estimation function may be used to estimate the cumulative distribution function.
- The cumulative distribution function of each feature is used to calculate a U-matrix.
- The inverse of the U-matrix is then normalized, and a RHOHAT is computed.
- The pattern extractor 210 uses the RHOHAT and the U-matrix to compute the joint probability distribution of each row of a features table.
- The pattern extractor 210 then ranks and rearranges the one or more log line parameters of the one or more features tables 240 by probability.
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label to create one or more labeled features tables 250.
- The statistical outliers are presented on a user interface 255 so that an administrator of the enterprise or e-commerce system may manually label them as malicious, non-malicious, or another administrator-defined label.
- The one or more labeled features tables 250 are then sent to the rule generator 215, which comprises code stored on the one or more pattern discoverer non-transitory memory units that, when executed by the one or more pattern discoverer processors, is configured to create, from the one or more labeled features tables, one or more rules 265 for identifying threats to the enterprise or e-commerce system.
- The one or more rules 265 comprise a random forest classifier, learning vector quantization, and/or a neural network.
- The one or more rules 265 that are created are essentially behavioral rules based on a multi-dimensional view of the incoming streamed data 230 and/or batch data 235.
- The one or more rules 265 may then be sent to the one or more threat detectors 220 for real-time monitoring of incoming data traffic 270 to the enterprise or e-commerce system.
- The one or more rules 265 may also be posted to a cloud server 275 or distributed to other third parties 280 for use in their firewall rule sets. If threats are not detected by the one or more threat detectors 220, the incoming data traffic 270 is allowed to continue to the enterprise or e-commerce system.
- If threats are detected, the incoming data traffic 270 to the enterprise or e-commerce system may be blocked and/or challenged.
- In some embodiments, a detected threat may be used to modify the one or more statistical models 245 used by the pattern extractor 210 and/or the one or more rules 265 generated by the rule generator 215.
- The pattern discoverer 225 may be fully distributed across multiple server-class machines in order to scale the processing to a large number (e.g., billions) of rows of log line parameters.
- Each node of the multiple server-class machines runs all of the one or more statistical models on a group of features, and a master node aggregates the results from all the nodes.
- The pattern extractor 210 may also intelligently sample the rows of the one or more log line parameters using a technique called the bag of little bootstraps, which averages the results of bootstrapping multiple small subsets of the log line parameters.
- FIG. 3 is a table showing a features table, in accordance with some embodiments.
- A pattern normalizer parses streamed data and/or batch data by grouping (bunching) log lines belonging to one or more log line parameters and then extracting features from the one or more log line parameters into one or more features tables.
- The streamed data comprises incoming data traffic to an enterprise or e-commerce system.
- The batch data comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers.
- The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
- The one or more features tables may be created over one-day, seven-day, and/or thirty-day periods.
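Creating features tables over one-day, seven-day, and thirty-day periods amounts to computing the same features over several trailing windows. The sketch below does this for a single hypothetical feature, requests per IP address. The `(timestamp, ip)` input format is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOWS = {"1d": timedelta(days=1), "7d": timedelta(days=7), "30d": timedelta(days=30)}


def windowed_request_counts(lines, now):
    """lines: iterable of (timestamp, ip) pairs (assumed format).
    Returns {window_name: Counter(ip -> number of requests in that window)}."""
    tables = {name: Counter() for name in WINDOWS}
    for ts, ip in lines:
        age = now - ts
        for name, span in WINDOWS.items():
            if timedelta(0) <= age < span:  # inside this trailing window
                tables[name][ip] += 1
    return tables
```

Keeping all three windows lets the statistical models compare short-term behavior against a longer baseline for the same IP address.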
- The features table 300, organized or grouped by session from Session 1 to Session n, comprises one or more columns of session features, Features 1 to Features m, comprising at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session.
- The features of a features table organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query.
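The URL-query features listed above are simple character counts. The sketch below computes them with the Python standard library. The source does not define how "length" differs from "number of characters", so this example assumes that length is the raw query-string length and that the character count covers letters only.

```python
import string
from urllib.parse import urlsplit


def url_query_features(url):
    """Character-count features of a URL's query string.
    Assumption: 'num_letters' stands in for 'number of characters'."""
    query = urlsplit(url).query
    return {
        "length": len(query),
        "num_letters": sum(ch.isalpha() for ch in query),
        "num_digits": sum(ch.isdigit() for ch in query),
        "num_punctuation": sum(ch in string.punctuation for ch in query),
    }
```

Injection-style payloads tend to inflate the punctuation and digit counts relative to ordinary search queries, which is what makes such rows stand out as statistical outliers.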
- The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added.
- The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
- FIG. 4 is a table showing a labeled features table, in accordance with some embodiments.
- A pattern extractor applies one or more statistical models, such as clustering models, hidden Markov models, and copula models, to the features table of FIG. 3 to identify statistical outliers.
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label to create a labeled features table 400.
- The labeled features table 400, organized or grouped by session from Session 1 to Session n, comprises one or more columns of session features, Features 1 to Features m, and one or more columns of labels for the sessions.
- Other similar labeled features tables may be created for user ID, IP address, and URL query.
- FIG. 5 is a table showing a ranked/labeled features table, in accordance with some embodiments.
- A pattern extractor applies one or more statistical models, such as clustering models, hidden Markov models, and copula models, to the features table of FIG. 3 to identify statistical outliers.
- The pattern extractor then ranks and rearranges the one or more log line parameters by probability from Rank 1 to Rank n, where Rank 1 is the least likely log line parameter and Rank n is the most likely. Millions of log lines may be rearranged and ranked in this way.
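Ranking by probability is a sort on each row's estimated (log) probability, with Rank 1 being the least likely row. When millions of rows are ranked but only the most anomalous ones are shown to an administrator for labeling, a partial selection avoids a full sort. The function names and the top-k selection are illustrative assumptions.

```python
import heapq


def rank_by_probability(row_ids, log_probs):
    """Return (rank, row_id, log_prob) tuples; Rank 1 is the least likely row."""
    ordered = sorted(zip(row_ids, log_probs), key=lambda t: t[1])
    return [(rank, rid, lp) for rank, (rid, lp) in enumerate(ordered, start=1)]


def top_outliers(row_ids, log_probs, k):
    """The k least likely rows, selected without sorting the whole table."""
    return heapq.nsmallest(k, zip(row_ids, log_probs), key=lambda t: t[1])
```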
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label to create a ranked/labeled features table 500.
- FIG. 6 is a block diagram illustrating a method for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- The method illustrated in FIG. 6 for identifying and detecting threats to an enterprise or e-commerce system may be performed using one or more of the apparatuses and features tables illustrated in FIG. 1, FIG. 2, FIG. 3, FIG. 4, and FIG. 5.
- Processing begins at 600, whereupon, at block 605, log lines belonging to one or more log line parameters are grouped from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system.
- The one or more enterprise or e-commerce system data sources comprise at least one of: web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers.
- The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
- One or more features are extracted from the grouped log lines into one or more features tables.
- The features of a features table organized or grouped by session comprise at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session.
- The features of a features table organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query.
- The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added.
- The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
- One or more statistical models are applied to the one or more features tables to identify statistical outliers.
- The one or more statistical models comprise at least one of: clustering models, hidden Markov models, and copula models.
- When a copula model is used, a copula function is applied to all of the one or more extracted features.
- The copula function comprises using various techniques to estimate a cumulative distribution function for each feature.
- A kernel density estimation function may be used to estimate the cumulative distribution function.
- The cumulative distribution function of each feature is used to calculate a U-matrix. The inverse of the U-matrix is then normalized, and a RHOHAT is computed.
- The RHOHAT and the U-matrix are then used to compute the joint probability distribution of each row of a features table.
- The one or more log line parameters of the one or more features tables are then ranked and rearranged by probability.
- Using one or more statistical models on the one or more features tables from the one or more enterprise or e-commerce system data sources to identify statistical outliers comprises: distributing one or more features from the one or more features tables across two or more servers; using the one or more statistical models on the distributed one or more features; and aggregating the results of using the one or more statistical models on the distributed one or more features.
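The distribute-score-aggregate step can be sketched with a thread pool standing in for the servers. Each simulated node scores every row on its own group of feature columns, and the master sums the per-node log scores. The per-node model (an independent Gaussian per feature) and the summation rule (which treats feature groups as independent) are both assumptions; the source specifies neither. They were chosen so that the aggregate is the same however the features are partitioned.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def node_score(columns):
    """Work done on one simulated node: score every row on this node's feature group.
    Assumed stand-in model: an independent Gaussian fitted to each feature column."""
    scores = [0.0] * len(columns[0])
    for col in columns:
        dist = statistics.NormalDist(statistics.fmean(col), statistics.pstdev(col) or 1.0)
        for i, x in enumerate(col):
            scores[i] += math.log(max(dist.pdf(x), 1e-300))
    return scores


def distributed_scores(table, num_nodes=2):
    """Split feature columns across nodes, score in parallel, aggregate on the master."""
    columns = [list(c) for c in zip(*table)]
    groups = [columns[g::num_nodes] for g in range(num_nodes)]
    groups = [g for g in groups if g]  # drop empty groups if nodes outnumber features
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        partials = list(pool.map(node_score, groups))
    # Master node: sum each row's partial log scores (assumes independent feature groups).
    return [sum(parts) for parts in zip(*partials)]
```

With a real cluster, the thread pool would be replaced by whatever job framework distributes the features tables, but the split-by-feature-group and row-wise aggregation structure stays the same.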
- the statistical outliers are labeled to create one or more labeled features tables.
- the labeling of the statistical outliers comprises presenting the statistical outliers to an administrator for identification as malicious, non-malicious, or another administrator-defined label.
- the one or more labeled features tables are used to create one or more rules for identifying threats to the enterprise or e-commerce system.
- the one or more rules for identifying threats to the enterprise or e-commerce system comprise a random forest classifier, learning vector quantization, and/or a neural network.
- the one or more rules are used on incoming enterprise or e-commerce system data traffic to detect threats to the enterprise or e-commerce system.
- the threat detection is done in real time. If threats are detected, the incoming data traffic to the enterprise or e-commerce system may be blocked and/or challenged. In some embodiments, if a threat is detected, the detected threat may be used to modify the one or more statistical models and/or to modify the one or more rules. Processing subsequently ends at 699.
- Some embodiments described herein relate to a computer storage product with one or more non-transitory memory units having instructions or computer code thereon for performing various computer-implemented operations.
- the one or more memory units are non-transitory in the sense that they do not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable).
- the one or more memory units and computer code may be those designed and constructed for the specific purpose or purposes.
- Examples of one or more memory units include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM), and Random-Access Memory (RAM) devices.
- Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter.
- embodiments may be implemented using Java, C++, Python, C, or other programming languages (e.g., object-oriented programming languages) and development tools.
- Additional examples of computer code include, but are not limited to, control signals, encrypted code, database code, and compressed code.
- Embodiments of distributed database code may be implemented using Hadoop/HDFS, Cassandra, or other database technologies.
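The Copula steps listed above (a kernel density estimate of each feature's cumulative distribution function, a U-matrix, a normalized inverse, RHOHAT, then a joint probability per row, ranked by probability) can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the patent's implementation: it assumes a Gaussian copula, reads "the inverse of the U-matrix is then normalized" as applying the inverse standard normal CDF to the U-values, takes RHOHAT to be the correlation matrix of those normal scores, and multiplies the copula density by the KDE marginal densities to obtain each row's joint density. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def kde_marginals(column: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Kernel density estimate of one feature: CDF and density at each observed value."""
    kde = gaussian_kde(column)
    # integrate_box_1d(-inf, x) integrates the estimated density up to x, i.e. the CDF.
    cdf = np.array([kde.integrate_box_1d(-np.inf, x) for x in column])
    pdf = kde(column)
    return cdf, pdf


def copula_log_density(features: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Gaussian-copula joint log-density of each row (rows = log line parameters)."""
    marginals = [kde_marginals(features[:, j]) for j in range(features.shape[1])]
    # U-matrix: one column of CDF values per feature, kept strictly inside (0, 1).
    u = np.clip(np.column_stack([cdf for cdf, _ in marginals]), eps, 1 - eps)
    # Assumed reading of "inverse ... normalized": inverse standard normal CDF of U.
    z = norm.ppf(u)
    # RHOHAT (assumed): correlation matrix of the normal scores.
    rhohat = np.atleast_2d(np.corrcoef(z, rowvar=False))
    _, logdet = np.linalg.slogdet(rhohat)
    m = np.linalg.inv(rhohat) - np.eye(rhohat.shape[0])
    # Gaussian copula log-density: -0.5*log|R| - 0.5 * z^T (R^-1 - I) z per row.
    log_copula = -0.5 * logdet - 0.5 * np.einsum("ij,jk,ik->i", z, m, z)
    # Joint density = copula density x product of the KDE marginal densities.
    pdfs = np.column_stack([pdf for _, pdf in marginals])
    log_marginals = np.log(np.maximum(pdfs, 1e-300)).sum(axis=1)
    return log_copula + log_marginals


def rank_rows(features: np.ndarray) -> np.ndarray:
    """Row indices ordered from least likely (Rank 1) to most likely (Rank n)."""
    return np.argsort(copula_log_density(features))
```

Because the copula term scores dependence between features, a row can be ranked as unlikely even when every individual feature value looks ordinary, for example a session whose two normally correlated counts move in opposite directions. Perfectly collinear features make RHOHAT singular, so such columns would have to be dropped first.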
Abstract
Methods and apparatuses for identifying and detecting threats to an enterprise or e-commerce system are disclosed, including grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system; and using the one or more rules on incoming enterprise or e-commerce system data traffic to detect threats to the enterprise or e-commerce system. Other embodiments are described and claimed.
Description
- The invention relates generally to the field of identifying and detecting threats to an enterprise or e-commerce system. More particularly, the invention relates to a scalable method and scalable apparatus for detecting threats by automatically creating statistical rules based on statistical outliers of one or more enterprise or e-commerce systems.
- In one respect, disclosed is a method for identifying and detecting threats to an enterprise or e-commerce system, the method comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; and using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
- In one respect, disclosed is an apparatus for identifying and detecting threats to an enterprise or e-commerce system, the apparatus comprising: one or more processors; system memory coupled to the one or more processors; one or more non-transitory memory units coupled to the one or more processors; and threat identification and detection code stored on the one or more non-transitory memory units that, when executed by the one or more processors, is configured to perform a method, comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; and using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
- In another respect, disclosed is an apparatus for identifying and detecting threats to an enterprise or e-commerce system, the apparatus comprising: a pattern discoverer; one or more pattern normalizers coupled to the pattern discoverer; and one or more threat detectors coupled to the pattern discoverer; wherein at least one of the one or more pattern normalizers comprises: one or more pattern normalizer processors; pattern normalizer system memory coupled to the one or more pattern normalizer processors; one or more pattern normalizer non-transitory memory units coupled to the one or more pattern normalizer processors; a pattern normalizer communications device coupled to the one or more pattern normalizer processors, the pattern normalizer communications device being configured to communicate with the pattern discoverer; and pattern normalizer code stored on the one or more pattern normalizer non-transitory memory units that, when executed by the one or more pattern normalizer processors, is configured to perform a pattern normalizer method, comprising: grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system; extracting one or more features from the grouped log lines into one or more features tables; and sending the one or more features tables to the pattern discoverer; wherein the pattern discoverer comprises: one or more pattern discoverer processors; pattern discoverer system memory coupled to the one or more pattern discoverer processors; one or more pattern discoverer non-transitory memory units coupled to the one or more pattern discoverer processors; a pattern discoverer communications device coupled to the one or more pattern discoverer processors, the pattern discoverer communications device being configured to communicate with the one or more pattern normalizers; and pattern discoverer code stored on the one or more pattern discoverer non-transitory memory units that, when executed by the one or more pattern discoverer processors, is configured to perform a pattern discoverer method, comprising: using one or more statistical models on the one or more features tables to identify statistical outliers; labeling the statistical outliers to create one or more labeled features tables; using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system; and sending the one or more rules for identifying threats to the enterprise or e-commerce system to the one or more threat detectors; and wherein at least one of the one or more threat detectors comprises: one or more threat detector processors; threat detector system memory coupled to the one or more threat detector processors; one or more threat detector non-transitory memory units coupled to the one or more threat detector processors; a threat detector communications device coupled to the one or more threat detector processors, the threat detector communications device being configured to communicate with the pattern discoverer; and threat detector code stored on the one or more threat detector non-transitory memory units that, when executed by the one or more threat detector processors, is configured to perform a threat detector method, comprising: using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system.
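The threat detector method claimed above applies the rules received from the pattern discoverer to each piece of incoming traffic. The following is a minimal sketch under stated assumptions: each rule is modeled as a callable that maps a request's features to a probability of maliciousness, and two hypothetical thresholds separate allowing, challenging, and blocking the traffic. The patent itself says only that traffic in which a threat is detected may be blocked and/or challenged; the class, rule, and feature names here are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping, Sequence

Features = Mapping[str, float]
Rule = Callable[[Features], float]  # assumed interface: returns P(malicious)


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"
    BLOCK = "block"


@dataclass
class ThreatDetector:
    rules: Sequence[Rule]
    challenge_at: float = 0.5  # hypothetical threshold
    block_at: float = 0.9      # hypothetical threshold

    def decide(self, features: Features) -> Action:
        # The most suspicious verdict across all rules drives the action.
        score = max((rule(features) for rule in self.rules), default=0.0)
        if score >= self.block_at:
            return Action.BLOCK
        if score >= self.challenge_at:
            return Action.CHALLENGE
        return Action.ALLOW


def login_failure_rule(f: Features) -> float:
    """Hypothetical rule: many failed logins from one IP address look malicious."""
    return min(1.0, f.get("login_failures", 0.0) / 20.0)
```

Taking the maximum over rules means a single confident rule is enough to challenge or block; other combinations (averaging, voting) are equally possible and are not specified in the description.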
- Numerous additional embodiments are also possible.
- Other objects and advantages of the invention may become apparent upon reading the detailed description and upon reference to the accompanying drawings.
- FIG. 1 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- FIG. 2 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- FIG. 3 is a table showing a features table, in accordance with some embodiments.
- FIG. 4 is a table showing a labeled features table, in accordance with some embodiments.
- FIG. 5 is a table showing a ranked/labeled features table, in accordance with some embodiments.
- FIG. 6 is a block diagram illustrating a method for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- While the invention is subject to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and the accompanying detailed description. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to the particular embodiments. This disclosure is instead intended to cover all modifications, equivalents, and alternatives falling within the scope of the present invention as defined by the appended claims.
- One or more embodiments of the invention are described below. It should be noted that these and any other embodiments are exemplary and are intended to be illustrative of the invention rather than limiting. While the invention is widely applicable to different types of systems, it is impossible to include all of the possible embodiments and contexts of the invention in this disclosure. Upon reading this disclosure, many alternative embodiments of the present invention will be apparent to persons of ordinary skill in the art.
- Malicious or unauthorized use of enterprise or e-commerce systems is on the rise, as the daily reports of breaches and fraud show. Unfortunately, roughly 70% of these activities are discovered or detected by end users or by third parties. This is because current cyber security infrastructure uses simplistic, static rules and signatures that are backward looking and therefore cannot catch what has not been seen before. Typically, information about the malicious or unauthorized use of enterprise or e-commerce systems is captured by current cyber security infrastructure, but it is merely indexed and stored for search and retrieval during forensics. Once a malicious or unauthorized use is discovered and a new breach or fraud identified, new rules and signatures are added to the current cyber security infrastructure. Eventually, though, malicious or unauthorized users succeed in bypassing the new rules and signatures, and the whole cycle of discovery by a third party or end user, forensics by the victim, and subsequent rule creation starts again. Current cyber security infrastructure is not capable of identifying and detecting malicious or unauthorized usage that circumvents enterprise or e-commerce systems' firewalls and rules, and this is leading to an increase in breaches and fraud.
- The embodiment or embodiments described herein solve these problems and others by proposing a new method and apparatus for identifying and detecting threats to an enterprise or e-commerce system. The new method and apparatus use a multi-dimensional statistical analysis of multiple features extracted from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system to identify statistical outliers. These statistical outliers are where malicious or unauthorized usage may be found. One or more rules for identifying threats to the enterprise or e-commerce system are then created based on the labeling of the statistical outliers. The one or more rules may then be used in the real-time detection of malicious or unauthorized use of the enterprise or e-commerce system.
- FIG. 1 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- In some embodiments, apparatus 100 comprises one or more processors 105, system memory 110, and one or more non-transitory memory units 115, all of which are directly or indirectly coupled to each other. Streamed data 120 and/or batch data 125 is fed into the apparatus 100, where a pattern normalizer 130 parses the streamed data 120 and/or batch data 125 by grouping or bunching log lines belonging to one or more log line parameters and then extracting one or more features from the grouped log lines into one or more features tables 135. The pattern normalizer 130 comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, performs this parsing. The streamed data 120 comprises incoming data traffic to an enterprise or e-commerce system. The batch data 125 comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers. The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query. The features of a features table organized or grouped by session comprise at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session. The features of a features table organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query. The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests. In some embodiments, the one or more features tables comprise a matrix in which the features are arranged by column and the one or more log line parameters make up the rows. The one or more features tables 135 are then sent to a pattern extractor 140, which comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, uses one or more statistical models 145, such as Clustering models, Hidden Markov models, and Copula models, on the one or more features tables 135 to identify statistical outliers. In the embodiment where the pattern extractor 140 uses the Copula models on the one or more features tables 135, the pattern extractor 140 applies a Copula function to all the features of the one or more features tables 135. The Copula function comprises using various techniques to estimate a cumulative distribution function for each feature. In one embodiment, a kernel density estimation function is used to estimate the cumulative distribution function. Next, the cumulative distribution function for each feature is used to calculate a U-matrix. The inverse of the U-matrix is then normalized and a RHOHAT computed. The RHOHAT and U-matrix are then used to compute the joint probability distribution of each row of a feature table. In some embodiments, the one or more log line parameters of the one or more features tables 135 are ranked and rearranged by probability by the pattern extractor 140.
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label in order to create one or more labeled features tables 150. In some embodiments, the statistical outliers are presented on a user interface 155 so that an administrator of the enterprise or e-commerce system may manually identify them as malicious, non-malicious, or another administrator-defined label. The one or more labeled features tables 150 are then sent to a rule generator 160, which comprises code stored on the one or more non-transitory memory units that, when executed by the one or more processors, creates from the one or more labeled features tables one or more rules 165 for identifying threats to the enterprise or e-commerce system. The one or more rules 165 comprise a random forest classifier, learning vector quantization, and/or a neural network. The one or more rules 165 that are created are essentially behavioral rules based on a multi-dimensional view of the incoming streamed data 120 and/or batch data 125. The one or more rules 165 may then be sent to one or more threat detectors 170 for real-time monitoring of the streamed data 120. The one or more rules 165 may also be posted to a cloud server 172 or distributed to other third parties 175 to be used in their firewall rule sets. If threats are not detected by the one or more threat detectors 170, the incoming data traffic is allowed to continue to the enterprise or e-commerce system. If threats are detected by the one or more threat detectors 170, the incoming data traffic to the enterprise or e-commerce system may be blocked and/or challenged. In some embodiments, if a threat is detected, the detected threat may be used to modify the one or more statistical models 145 used by the pattern extractor 140 and/or to modify the one or more rules 165 generated by the rule generator 160.
- In some embodiments, the pattern extractor 140 may be fully distributed across multiple server-class machines in order to scale the processing of a large number (billions) of rows of log line parameters. To scale across the features of the one or more features tables 135, each node of the multiple server-class machines runs all of the one or more statistical models on a group of features, and a master node aggregates the results from all the nodes. To scale across the rows of log line parameters of the one or more features tables 135, the pattern extractor 140 may intelligently sample the rows by using a technique called the bag of little bootstraps, which works by averaging the results of bootstrapping multiple small subsets of log line parameters.
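The bag of little bootstraps mentioned above can be illustrated with a short NumPy sketch of the general technique: draw several small subsets of size b = n^γ, resample each one up to the full size n using multinomial counts (so only b distinct rows are ever touched), compute the statistic on every resample, and average the per-subset results. The statistic chosen here (the mean of one feature and its standard error) and the parameter values are hypothetical; the patent does not say which quantities are bootstrapped.

```python
import numpy as np


def blb_mean_stderr(x, n_subsets=10, gamma=0.6, n_resamples=50, seed=0):
    """Bag of little bootstraps: estimate a feature's mean and its standard error."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    b = max(2, int(n ** gamma))  # size of each little subset, b = n^gamma
    means, stderrs = [], []
    for _ in range(n_subsets):
        subset = rng.choice(x, size=b, replace=False)
        # Each bootstrap replicate draws n rows from the b-row subset, encoded as
        # multinomial counts so the full-size resample is never materialized.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)
        replicate_means = counts @ subset / n
        means.append(replicate_means.mean())
        stderrs.append(replicate_means.std(ddof=1))
    # Average the per-subset results.
    return float(np.mean(means)), float(np.mean(stderrs))
```

Each subset is independent of the others, so in a distributed setting the subsets could be processed on different nodes and only the per-subset results sent back for averaging.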
- FIG. 2 is a block diagram illustrating an apparatus for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- In some embodiments, apparatus 200 comprises one or more pattern normalizers 205, a pattern extractor 210, a rule generator 215, and one or more threat detectors 220 distributed over one or more computing systems. In one embodiment, the pattern extractor 210 and the rule generator 215 are integrated together as a pattern discoverer 225 on a single computing system. In the embodiment illustrated in FIG. 2, at least one of the one or more pattern normalizers 205 comprises one or more pattern normalizer processors 206, pattern normalizer system memory 207, one or more pattern normalizer non-transitory memory units 208, and a pattern normalizer communications device 209, all of which are directly or indirectly coupled to each other, as well as pattern normalizer code stored on the one or more pattern normalizer non-transitory memory units that, when executed by the one or more pattern normalizer processors, is configured to perform a pattern normalizer method. The pattern discoverer 225 comprises one or more pattern discoverer processors 226, pattern discoverer system memory 227, one or more pattern discoverer non-transitory memory units 228, and a pattern discoverer communications device 229, all of which are directly or indirectly coupled to each other, as well as pattern discoverer code stored on the one or more pattern discoverer non-transitory memory units that, when executed by the one or more pattern discoverer processors, is configured to perform a pattern discoverer method. At least one of the one or more threat detectors 220 comprises one or more threat detector processors 221, threat detector system memory 222, one or more threat detector non-transitory memory units 223, and a threat detector communications device 224, all of which are directly or indirectly coupled to each other, as well as threat detector code stored on the one or more threat detector non-transitory memory units that, when executed by the one or more threat detector processors, is configured to perform a threat detector method. In apparatus 200, streamed data 230 and/or batch data 235 is fed into the one or more pattern normalizers 205. The one or more pattern normalizers 205 parse the streamed data 230 and/or batch data 235 by grouping or bunching log lines belonging to one or more log line parameters and extracting one or more features from the grouped log lines into one or more features tables 240. The streamed data 230 comprises incoming data traffic to an enterprise or e-commerce system. The batch data 235 comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers. The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query. The features of a features table 240 organized or grouped by session comprise at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session. The features of a features table 240 organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query. The features of a features table 240 organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features table 240 organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests. In some embodiments, the one or more features tables 240 comprise a matrix in which the features are arranged by column and the one or more log line parameters make up the rows. The one or more features tables 240 are then sent to the pattern discoverer 225, where the pattern extractor 210 uses one or more statistical models 245, such as Clustering models, Hidden Markov models, and Copula models, on the one or more features tables 240 to identify statistical outliers. In the embodiment where the pattern extractor 210 uses the Copula models on the one or more features tables 240, the pattern extractor 210 applies a Copula function to all the features of the one or more features tables 240. The Copula function comprises using various techniques to estimate a cumulative distribution function for each feature. In one embodiment, a kernel density estimation function is used to estimate the cumulative distribution function. Next, the cumulative distribution function of each feature is used to calculate a U-matrix. The inverse of the U-matrix is then normalized and a RHOHAT computed. The pattern extractor 210 then uses the RHOHAT and U-matrix to compute the joint probability distribution of each row of a feature table. In some embodiments, the one or more log line parameters of the one or more features tables 240 are ranked and rearranged by probability by the pattern extractor 210.
- The statistical outliers are then labeled as malicious, non-malicious, or another administrator-defined label in order to create one or more labeled features tables 250. In some embodiments, the statistical outliers are presented on a user interface 255 so that an administrator of the enterprise or e-commerce system may manually identify them as malicious, non-malicious, or another administrator-defined label. The one or more labeled features tables 250 are then sent to the rule generator 215, which comprises code stored on the one or more pattern discoverer non-transitory memory units that, when executed by the one or more pattern discoverer processors, creates from the one or more labeled features tables one or more rules 265 for identifying threats to the enterprise or e-commerce system. The one or more rules 265 comprise a random forest classifier, learning vector quantization, and/or a neural network. The one or more rules 265 that are created are essentially behavioral rules based on a multi-dimensional view of the incoming streamed data 230 and/or batch data 235. The one or more rules 265 may then be sent to the one or more threat detectors 220 for real-time monitoring of incoming data traffic 270 to the enterprise or e-commerce system. The one or more rules 265 may also be posted to a cloud server 275 or distributed to other third parties 280 to be used in their firewall rule sets. If threats are not detected by the one or more threat detectors 220, the incoming data traffic 270 is allowed to continue to the enterprise or e-commerce system. If threats are detected by the one or more threat detectors 220, the incoming data traffic 270 to the enterprise or e-commerce system may be blocked and/or challenged. In some embodiments, if a threat is detected, the detected threat may be used to modify the one or more statistical models 245 used by the pattern extractor 210 and/or to modify the one or more rules 265 generated by the rule generator 215.
- In some embodiments, the pattern discoverer 225 may be fully distributed across multiple server-class machines in order to scale the processing of a large number (billions) of rows of log line parameters. To scale across the features of the one or more features tables 240, each node of the multiple server-class machines runs all of the one or more statistical models on a group of features, and a master node aggregates the results from all the nodes. To scale across the rows of log line parameters of the one or more features tables 240, the pattern extractor 210 may intelligently sample the rows by using a technique called the bag of little bootstraps, which works by averaging the results of bootstrapping multiple small subsets of log line parameters.
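Scaling across features, as described above, means that each node runs the statistical models on its own group of feature columns and a master node combines the per-node results. The sketch below is an assumption-laden illustration: a local `ThreadPoolExecutor` from Python's `concurrent.futures` stands in for separate server-class machines, a robust z-score model stands in for the Clustering, Hidden Markov, or Copula models, and the master aggregates by summing per-group scores. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def node_scores(block: np.ndarray) -> np.ndarray:
    """Work done on one node: a per-row outlier score for its group of features.

    Stand-in model: sum of squared robust z-scores (median / scaled MAD).
    """
    med = np.median(block, axis=0)
    mad = np.median(np.abs(block - med), axis=0) * 1.4826 + 1e-12
    return (((block - med) / mad) ** 2).sum(axis=1)


def distributed_scores(features: np.ndarray, n_nodes: int = 4) -> np.ndarray:
    """Split feature columns into groups, score each group in parallel, aggregate."""
    groups = np.array_split(np.arange(features.shape[1]), n_nodes)
    blocks = [features[:, cols] for cols in groups if cols.size]
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        partial = list(pool.map(node_scores, blocks))
    # Master node: combine the per-group scores for every row.
    return np.sum(partial, axis=0)
```

This stand-in model is separable by column, so the aggregated score equals what a single node would compute on all features. A Copula model is not separable in that way: dependence between features placed on different nodes would be lost unless the aggregation step accounts for it.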
- FIG. 3 is a table showing a features table, in accordance with some embodiments.
- In some embodiments, a pattern normalizer parses streamed data and/or batch data by grouping or bunching log lines belonging to one or more log line parameters and then extracting features from the one or more log line parameters into one or more features tables. The streamed data comprises incoming data traffic to an enterprise or e-commerce system. The batch data comprises web server access logs, firewall logs, packet captures per application, Active Directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, blacklisted URLs, blacklisted IP addresses, and blacklisted referrers. The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query. The one or more features tables may be created over one-day, seven-day, and/or thirty-day periods. The features table 300, organized or grouped by session from Session 1 to Session n, comprises one or more columns of session features, Features 1 to Features m, comprising at least one of: user session duration, number of requests in the user session, average time between clicks in the user session, user session click rate, percentage of image requests in the user session, percentage of 4xx responses in the user session, percentage of 3xx responses in the user session, percentage of 2xx responses in the user session, percentage of zip responses in the user session, percentage of binary responses in the user session, and percentage of HEAD requests in the user session. The features of a features table organized or grouped by URL query comprise at least one of: length of the user URL query, number of characters in the user URL query, number of digits in the user URL query, and number of punctuation characters in the user URL query. The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
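Grouping log lines by session and extracting several of the session and URL-query features listed above can be sketched with the Python standard library alone. The `LogLine` fields, the feature names, and two interpretations are assumptions: "click rate" is taken to mean requests per second, and "number of characters" in a URL query is taken to mean alphabetic characters, as distinct from its length. The patent specifies neither a log format nor these definitions.

```python
import string
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LogLine:
    """Hypothetical parsed access-log line."""
    session: str
    ts: float     # request timestamp, seconds
    status: int   # HTTP status code
    method: str   # e.g. "GET" or "HEAD"
    path: str
    query: str    # raw URL query string


def session_features(lines: list[LogLine]) -> dict[str, dict[str, float]]:
    """Group log lines by session; return one features-table row per session."""
    by_session: dict[str, list[LogLine]] = defaultdict(list)
    for ln in lines:
        by_session[ln.session].append(ln)

    table: dict[str, dict[str, float]] = {}
    for sid, group in by_session.items():
        group.sort(key=lambda ln: ln.ts)
        n = len(group)
        duration = group[-1].ts - group[0].ts

        def pct(pred) -> float:
            # Percentage of this session's requests that satisfy pred.
            return 100.0 * sum(1 for ln in group if pred(ln)) / n

        table[sid] = {
            "duration": duration,
            "requests": n,
            "avg_time_between_clicks": duration / (n - 1) if n > 1 else 0.0,
            "click_rate": n / duration if duration > 0 else 0.0,  # assumed: requests/s
            "pct_image": pct(lambda ln: ln.path.lower().endswith((".png", ".jpg", ".gif"))),
            "pct_4xx": pct(lambda ln: 400 <= ln.status < 500),
            "pct_3xx": pct(lambda ln: 300 <= ln.status < 400),
            "pct_2xx": pct(lambda ln: 200 <= ln.status < 300),
            "pct_head": pct(lambda ln: ln.method == "HEAD"),
        }
    return table


def url_query_features(query: str) -> dict[str, int]:
    """URL-query features; 'characters' is assumed to mean alphabetic characters."""
    return {
        "length": len(query),
        "characters": sum(c.isalpha() for c in query),
        "digits": sum(c.isdigit() for c in query),
        "punctuation": sum(c in string.punctuation for c in query),
    }
```

Each value of `session_features` corresponds to one row of a table like features table 300, with the dictionary keys playing the role of the Features 1 to Features m columns.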
FIG. 4 is a table showing a labeled features table, in accordance with some embodiments.
- In some embodiments, a pattern extractor applies one or more statistical models, such as Clustering models, Hidden Markov models, and Copula models, to the features table of FIG. 3 to identify statistical outliers. The statistical outliers are then labeled as malicious, non-malicious, or with another administrator-defined label to create a labeled features table 400. The labeled features table 400, organized or grouped by session from Session 1 to Session n, comprises one or more columns of session features, Features 1 to Features m, and one or more columns of labels for the sessions. Labeled features table 400 has a Malicious label, a Non-Malicious label, and Administrator Label 1 to Administrator Label p. Similar labeled features tables may be created for user ID, IP address, and URL query.
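One way to picture labeled features table 400 is as the feature columns followed by one indicator column per label. The sketch below assumes a hypothetical `build_labeled_table` helper that receives the administrator's decisions keyed by row (for example, by session); rows the administrator has not reviewed get all-zero label columns. The one-hot layout is an assumption, since the description only states that the table carries one or more label columns.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

BUILT_IN_LABELS = ("Malicious", "Non-Malicious")


def build_labeled_table(
    features: Mapping[str, Sequence[float]],
    admin_labels: Mapping[str, str],
    administrator_labels: Sequence[str] = (),
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Append one indicator column per label to each feature row (hypothetical layout)."""
    label_columns = list(BUILT_IN_LABELS) + [
        label for label in administrator_labels if label not in BUILT_IN_LABELS
    ]
    table: Dict[str, List[float]] = {}
    for key, row in features.items():
        chosen = admin_labels.get(key)  # None if the administrator has not reviewed this row
        if chosen is not None and chosen not in label_columns:
            raise ValueError(f"unknown label {chosen!r} for {key!r}")
        indicators = [1.0 if column == chosen else 0.0 for column in label_columns]
        table[key] = list(row) + indicators
    return label_columns, table
```

Keeping the label columns separate from the feature columns means the same table can later be split back into inputs and targets when rules are learned from it.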
FIG. 5 is a table showing a ranked/labeled features table, in accordance with some embodiments.
- In some embodiments, a pattern extractor applies one or more statistical models, such as Clustering models, Hidden Markov models, and Copula models, to the features table of FIG. 3 to identify statistical outliers. The pattern extractor then ranks and rearranges each of the one or more log line parameters by probability, from Rank 1 to Rank n, with Rank 1 being the least likely log line parameter and Rank n the most likely. Millions of log lines may be ranked and rearranged in this way. The statistical outliers are then labeled as malicious, non-malicious, or with another administrator-defined label to create a ranked/labeled features table 500.
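Block 615 of FIG. 6 describes the Copula path: estimate each feature's cumulative distribution function with a kernel density estimate, build a U-matrix from those CDFs, compute a RHOHAT, and combine RHOHAT with the U-matrix into a joint probability per row. The following is one possible reading of those steps, not the patented code. It assumes a Gaussian copula and Gaussian kernels with Silverman's rule-of-thumb bandwidth, interprets the normalized inverse of the U-matrix as normal scores (the inverse standard-normal CDF applied to U), and scores each row by its log joint density (copula density times the KDE marginal densities). Sorting ascending gives the FIG. 5 ordering, with Rank 1 as the least likely row. `copula_rank` and the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import List, Sequence, Tuple

STD_NORMAL = NormalDist()  # standard normal, mu=0, sigma=1


def bandwidth(values: Sequence[float]) -> float:
    """Silverman's rule-of-thumb bandwidth; 1.0 if the column is constant."""
    s = stdev(values)
    return 1.06 * s * len(values) ** -0.2 if s > 0 else 1.0


def kde_cdf(x: float, values: Sequence[float], h: float) -> float:
    """Gaussian-kernel estimate of P(X <= x)."""
    return fmean(STD_NORMAL.cdf((x - v) / h) for v in values)


def kde_pdf(x: float, values: Sequence[float], h: float) -> float:
    """Gaussian-kernel estimate of the density at x."""
    return fmean(STD_NORMAL.pdf((x - v) / h) for v in values) / h


def correlation(cols: List[List[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the given columns (RHOHAT in this sketch)."""
    k, n = len(cols), len(cols[0])
    means = [fmean(c) for c in cols]
    cov = [[sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / n
            for j in range(k)] for i in range(k)]
    return [[1.0 if i == j else
             (cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) if cov[i][i] > 0 and cov[j][j] > 0 else 0.0)
             for j in range(k)] for i in range(k)]


def invert_and_det(m: List[List[float]]) -> Tuple[List[List[float]], float]:
    """Inverse and determinant via Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("RHOHAT is singular; drop or combine collinear features")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def copula_rank(rows: Sequence[Sequence[float]]) -> Tuple[List[int], List[float]]:
    """Return row indices from least to most likely, plus each row's log joint density."""
    cols = [list(c) for c in zip(*rows)]
    hs = [bandwidth(c) for c in cols]
    eps = 1e-9
    # U-matrix: each entry is the estimated CDF of that feature value, kept inside (0, 1).
    u = [[min(max(kde_cdf(x, c, h), eps), 1 - eps) for x, c, h in zip(row, cols, hs)]
         for row in rows]
    # Normal scores: inverse standard-normal CDF applied to the U-matrix.
    z = [[STD_NORMAL.inv_cdf(p) for p in row] for row in u]
    rho_hat = correlation([list(c) for c in zip(*z)])
    rho_inv, rho_det = invert_and_det(rho_hat)
    k = len(cols)
    scores = []
    for row, zr in zip(rows, z):
        # Gaussian copula log density: -0.5*log|R| - 0.5 * z^T (R^-1 - I) z
        quad = sum(zr[i] * (rho_inv[i][j] - (i == j)) * zr[j]
                   for i in range(k) for j in range(k))
        log_copula = -0.5 * math.log(rho_det) - 0.5 * quad
        log_marginals = sum(math.log(kde_pdf(x, c, h)) for x, c, h in zip(row, cols, hs))
        scores.append(log_copula + log_marginals)
    order = sorted(range(len(rows)), key=scores.__getitem__)
    return order, scores
```

The in-sample Gaussian copula becomes ill-conditioned when two features are nearly collinear, so the elimination helper raises an error instead of returning a meaningless score. The pure-Python loops are quadratic in the number of rows and are meant only to show the arithmetic.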
FIG. 6 is a block diagram illustrating a method for identifying and detecting threats to an enterprise or e-commerce system, in accordance with some embodiments.
- In some embodiments, the method illustrated in FIG. 6 may be performed by one or more of the apparatuses and features tables illustrated in FIG. 1, FIG. 2, FIG. 3, FIG. 4, and FIG. 5. Processing begins at 600, whereupon, at block 605, log lines belonging to one or more log line parameters are grouped from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system. The one or more enterprise or e-commerce system data sources comprise at least one of: web server access logs, firewall logs, packet captures per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, black listed URLs, black listed IP addresses, and black listed referrers. The one or more log line parameters comprise at least one of: user ID, session, IP address, and URL query.
- At block 610, one or more features are extracted from the grouped log lines into one or more features tables. The features of a features table organized or grouped by session comprise at least one of: user session duration, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, and percentage of head requests in user session. The features of a features table organized or grouped by URL query comprise at least one of: length of user URL query, number of characters of user URL query, number of digits of user URL query, and number of punctuation marks in user URL query. The features of a features table organized or grouped by user ID comprise at least one of: number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, and number of times a new shipping address was added. The features of a features table organized or grouped by IP address comprise at least one of: number of login failures, number of login successes, number of password resets, and total number of requests.
- At block 615, one or more statistical models are applied to the one or more features tables to identify statistical outliers. The one or more statistical models comprise at least one of: Clustering models, Hidden Markov models, and Copula models. In the embodiment where a Copula model is used, a Copula function is applied to all of the extracted features. Applying the Copula function involves estimating a cumulative distribution function for each feature, using any of various techniques; in one embodiment, a kernel density estimation function is used. Next, the cumulative distribution function of each feature is used to calculate a U-matrix. The inverse of the U-matrix is then normalized, and a correlation-matrix estimate, RHOHAT, is computed. The RHOHAT and the U-matrix are then used to compute the joint probability distribution of each row of a features table. In some embodiments, the one or more log line parameters of the one or more features tables are ranked and rearranged by probability. In some embodiments, applying the one or more statistical models to the one or more features tables to identify statistical outliers comprises: distributing one or more features from the one or more features tables across two or more servers; applying the one or more statistical models to the distributed features; and aggregating the results of applying the one or more statistical models to the distributed features.
- At block 620, the statistical outliers are labeled to create one or more labeled features tables. In some embodiments, labeling the statistical outliers comprises presenting the statistical outliers to an administrator for identification as malicious, non-malicious, or another administrator-defined label.
- At block 625, the one or more labeled features tables are used to create one or more rules for identifying threats to the enterprise or e-commerce system. In some embodiments, the one or more rules for identifying threats to the enterprise or e-commerce system comprise a random forest classifier, learning vector quantization, and/or a neural network.
- At block 630, the one or more rules are applied to incoming enterprise or e-commerce system data traffic to detect threats to the enterprise or e-commerce system. In some embodiments, threat detection is performed in real time. If threats are detected, the incoming data traffic to the enterprise or e-commerce system may be blocked and/or challenged. In some embodiments, a detected threat may be used to modify the one or more statistical models and/or the one or more rules. Processing subsequently ends at 699.
- Some embodiments described herein relate to a computer storage product with one or more non-transitory memory units having instructions or computer code thereon for performing various computer-implemented operations. The one or more memory units are non-transitory in the sense that they do not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The one or more memory units and computer code (also referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of one or more memory units include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM), and Random-Access Memory (RAM) devices.
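Block 625 lists learning vector quantization among the possible rule types, and block 630 applies the rules to incoming traffic. As a minimal sketch, assuming numeric feature rows and labels taken from a labeled features table, plain LVQ1 learns one prototype per label, and checking incoming traffic reduces to a nearest-prototype lookup. The function names, the `"challenge"`/`"allow"` actions, and the training hyperparameters are hypothetical; a random forest classifier or a neural network, the other options named, would take the place of `train_lvq` and `nearest`.

```python
import math
import random
from typing import List, Sequence, Tuple

Prototype = Tuple[List[float], str]  # (prototype coordinates, label)


def nearest(prototypes: Sequence[Prototype], row: Sequence[float]) -> Prototype:
    """Prototype closest to the row in Euclidean distance."""
    return min(prototypes, key=lambda p: math.dist(p[0], row))


def train_lvq(rows, labels, epochs=30, learning_rate=0.3, seed=0) -> List[Prototype]:
    """LVQ1 with one prototype per label, initialized from a random member of that label."""
    rng = random.Random(seed)
    prototypes: List[Prototype] = []
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        prototypes.append((list(rng.choice(members)), label))
    data = list(zip(rows, labels))
    for epoch in range(epochs):
        rate = learning_rate * (1 - epoch / epochs)  # linearly decaying step size
        rng.shuffle(data)
        for row, label in data:
            proto, proto_label = nearest(prototypes, row)
            # Pull the winning prototype toward rows of its own label, push it away otherwise.
            sign = 1.0 if proto_label == label else -1.0
            for j, value in enumerate(row):
                proto[j] += sign * rate * (value - proto[j])
    return prototypes


def screen(prototypes: Sequence[Prototype], incoming_rows) -> List[str]:
    """Challenge incoming rows whose nearest prototype is labeled Malicious."""
    return ["challenge" if nearest(prototypes, row)[1] == "Malicious" else "allow"
            for row in incoming_rows]
```

Because LVQ relies on Euclidean distance, features on very different scales (for example, session duration in seconds versus a percentage) would normally be standardized before training.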
- Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. For example, embodiments may be implemented using Java, C++, Python, C, or other programming languages (e.g., object-oriented programming languages) and development tools. Additional examples of computer code include, but are not limited to, control signals, encrypted code, database code, and compressed code. Embodiments of distributed database code may be implemented using Hadoop/HDFS, Cassandra, or other database technologies.
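Claims 12 and 14 recite bag of little bootstraps sampling, and claim 13 and block 615 describe distributing features across two or more servers and aggregating the results. The sketch below simulates that pattern in a single process: each hypothetical "server" holds a random subset of b = n^0.7 values of one feature, runs full-size (n) weighted bootstrap resamples over its small subset, and reports an estimate and a standard error; the coordinator averages them. The statistic (the feature mean), the exponent 0.7, and all names are illustrative assumptions rather than details from the source.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def server_task(subset: Sequence[float], n: int, resamples: int,
                rng: random.Random) -> Tuple[float, float]:
    """Work one server would do: bootstrap the mean of size-n resamples of its subset."""
    b = len(subset)
    estimates = []
    for _ in range(resamples):
        # Multinomial counts: n draws spread over the b local points, so each
        # resample behaves like a full-size dataset without materializing it.
        counts = [0] * b
        for idx in rng.choices(range(b), k=n):
            counts[idx] += 1
        estimates.append(sum(c * x for c, x in zip(counts, subset)) / n)
    return statistics.fmean(estimates), statistics.stdev(estimates)


def blb_mean(data: Sequence[float], servers: int = 4, gamma: float = 0.7,
             resamples: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Bag of little bootstraps estimate of a feature's mean and its standard error."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))
    results: List[Tuple[float, float]] = []
    for _ in range(servers):
        subset = rng.sample(list(data), b)  # drawn without replacement, one per server
        results.append(server_task(subset, n, resamples, rng))
    # Aggregation step: average the per-server estimates and standard errors.
    estimate = statistics.fmean(r[0] for r in results)
    std_error = statistics.fmean(r[1] for r in results)
    return estimate, std_error
```

In an actual deployment each `server_task` call would run where its slice of the features table is stored, and only the two numbers per server would travel back for aggregation.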
- The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
- The benefits and advantages that may be provided by the present invention have been described above with regard to specific embodiments. These benefits and advantages, and any elements or limitations that may cause them to occur or to become more pronounced are not to be construed as critical, required, or essential features of any or all of the claims. As used herein, the terms “comprises,” “comprising,” or any other variations thereof, are intended to be interpreted as non-exclusively including the elements or limitations which follow those terms. Accordingly, a system, method, or other embodiment that comprises a set of elements is not limited to only those elements, and may include other elements not expressly listed or inherent to the claimed embodiment.
- While the present invention has been described with reference to particular embodiments, it should be understood that the embodiments are illustrative and that the scope of the invention is not limited to these embodiments. Many variations, modifications, additions and improvements to the embodiments described above are possible. It is contemplated that these variations, modifications, additions and improvements fall within the scope of the invention as detailed within the following claims.
Claims (46)
1. A method for identifying and detecting threats to an enterprise or e-commerce system, the method comprising:
grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system;
extracting one or more features from the grouped log lines into one or more features tables;
using one or more statistical models on the one or more features tables to identify statistical outliers;
labeling the statistical outliers to create one or more labeled features tables; and
using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
2. The method of claim 1, wherein the method further comprises ranking and rearranging the one or more log line parameters of the one or more features tables by probability.
3. The method of claim 1, the method further comprising using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system.
4. The method of claim 2, wherein the using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system is in real-time.
5. The method of claim 2, the method further comprising using the detected threats to modify the one or more statistical models and/or the one or more rules for identifying threats to the enterprise or e-commerce system.
6. The method of claim 2, the method further comprising blocking and/or challenging the packet flow of the detected threats.
7. The method of claim 1, wherein the one or more log line parameters comprises at least one of: user ID, session, IP address, and URL query.
8. The method of claim 1, wherein the one or more enterprise or e-commerce system data sources comprises at least one of: web server access logs, firewall logs, packet captures per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, black listed URLs, black listed IP addresses, and black listed referrers.
9. The method of claim 1, wherein the one or more features comprises at least one of: user session duration, length of user URL query, number of characters of user URL query, number of digits of user URL query, number of punctuations of user URL query, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, percentage of head requests in user session, number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, number of times new shipping address was added, number of login failures, number of login successes, number of password resets, and total number of requests.
10. The method of claim 1, wherein the one or more statistical models comprises at least one of: Clustering models, Hidden Markov model, and Copula models.
11. The method of claim 1, wherein the one or more rules for identifying threats to the enterprise or e-commerce system comprises a random forest classifier, learning vector quantization, and/or a neural network.
12. The method of claim 1, wherein the using one or more statistical models on the one or more features comprises using a bag of little bootstraps sampling.
13. The method of claim 1, wherein the using one or more statistical models on the one or more features tables to identify statistical outliers comprises:
distributing one or more features from the one or more features tables across two or more servers;
using the one or more statistical models on the distributed one or more features; and
aggregating results from the using the one or more statistical models on the distributed one or more features.
14. The method of claim 13, wherein the using the one or more statistical models on the distributed one or more features comprises using a bag of little bootstraps sampling.
15. The method of claim 1, wherein the labeling the statistical outliers to create one or more labeled features tables comprises presenting an administrator the statistical outliers for identification as malicious, non-malicious, or other administrator defined label.
16. An apparatus for identifying and detecting threats to an enterprise or e-commerce system, the apparatus comprising:
one or more processors;
system memory coupled to the one or more processors;
one or more non-transitory memory units coupled to the one or more processors; and
threat identification and detection code stored on the one or more non-transitory memory units that when executed by the one or more processors are configured to perform a method, comprising:
grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system;
extracting one or more features from the grouped log lines into one or more features tables;
using one or more statistical models on the one or more features tables to identify statistical outliers;
labeling the statistical outliers to create one or more labeled features tables; and
using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
17. The apparatus of claim 16, wherein the method further comprises ranking and rearranging the one or more log line parameters of the one or more features tables by probability.
18. The apparatus of claim 16, wherein the method further comprises using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system.
19. The apparatus of claim 17, wherein the using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system is in real-time.
20. The apparatus of claim 17, wherein the method further comprises using the detected threats to modify the one or more statistical models and/or the one or more rules for identifying threats to the enterprise or e-commerce system.
21. The apparatus of claim 17, wherein the method further comprises blocking and/or challenging the packet flow of the detected threats.
22. The apparatus of claim 16, wherein the one or more log line parameters comprises at least one of: user ID, session, IP address, and URL query.
23. The apparatus of claim 16, wherein the one or more enterprise or e-commerce system data sources comprises at least one of: web server access logs, firewall logs, packet captures per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, black listed URLs, black listed IP addresses, and black listed referrers.
24. The apparatus of claim 16, wherein the one or more features comprises at least one of: user session duration, length of user URL query, number of characters of user URL query, number of digits of user URL query, number of punctuations of user URL query, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, percentage of head requests in user session, number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, number of times new shipping address was added, number of login failures, number of login successes, number of password resets, and total number of requests.
25. The apparatus of claim 16, wherein the one or more statistical models comprises at least one of: Clustering models, Hidden Markov model, and Copula models.
26. The apparatus of claim 16, wherein the one or more rules for identifying threats to the enterprise or e-commerce system comprises a random forest classifier, learning vector quantization, and/or a neural network.
27. The apparatus of claim 16, wherein the using one or more statistical models on the one or more features comprises using a bag of little bootstraps sampling.
28. The apparatus of claim 16, wherein the using one or more statistical models on the one or more features tables to identify statistical outliers comprises:
distributing one or more features from the one or more features tables across two or more servers;
using the one or more statistical models on the distributed one or more features; and
aggregating results from the using the one or more statistical models on the distributed one or more features.
29. The apparatus of claim 28, wherein the using the one or more statistical models on the distributed one or more features comprises using a bag of little bootstraps sampling.
30. The apparatus of claim 16, wherein the labeling the statistical outliers to create one or more labeled features tables comprises presenting an administrator the statistical outliers for identification as malicious, non-malicious, or other administrator defined label.
31. An apparatus for identifying and detecting threats to an enterprise or e-commerce system, the apparatus comprising:
a pattern discoverer; and
one or more pattern normalizers coupled to the pattern discoverer;
wherein at least one of the one or more pattern normalizers comprise:
one or more pattern normalizer processors;
pattern normalizer system memory coupled to the one or more pattern normalizer processors;
one or more pattern normalizer non-transitory memory units coupled to the one or more pattern normalizer processors;
a pattern normalizer communications device coupled to the one or more pattern normalizer processors, the pattern normalizer communications device being configured to communicate with the pattern discoverer; and
pattern normalizer code stored on the one or more pattern normalizer non-transitory memory units that when executed by the one or more pattern normalizer processors are configured to perform a pattern normalizer method, comprising:
grouping log lines belonging to one or more log line parameters from one or more enterprise or e-commerce system data sources and/or from incoming data traffic to the enterprise or e-commerce system;
extracting one or more features from the grouped log lines into one or more features tables; and
sending the one or more features tables to the pattern discoverer; and
wherein the pattern discoverer comprises:
one or more pattern discoverer processors;
pattern discoverer system memory coupled to the one or more pattern discoverer processors;
one or more pattern discoverer non-transitory memory units coupled to the one or more pattern discoverer processors;
a pattern discoverer communications device coupled to the one or more pattern discoverer processors, the pattern discoverer communications device being configured to communicate with the one or more pattern normalizers; and
pattern discoverer code stored on the one or more pattern discoverer non-transitory memory units that when executed by the one or more pattern discoverer processors are configured to perform a pattern discoverer method, comprising:
using one or more statistical models on the one or more features tables to identify statistical outliers;
labeling the statistical outliers to create one or more labeled features tables; and
using the one or more labeled features tables to create one or more rules for identifying threats to the enterprise or e-commerce system.
32. The apparatus of claim 31, wherein the pattern discoverer method further comprises ranking and rearranging the one or more log line parameters of the one or more features tables by probability.
33. The apparatus of claim 31, the apparatus further comprising one or more threat detectors coupled to the pattern discoverer, wherein the pattern discoverer method further comprises sending to the one or more threat detectors, the one or more rules for identifying threats to the enterprise or e-commerce system; and wherein at least one of the one or more threat detectors comprise:
one or more threat detector processors;
threat detector system memory coupled to the one or more threat detector processors;
one or more threat detector non-transitory memory units coupled to the one or more threat detector processors;
a threat detector communications device coupled to the one or more threat detector processors, the threat detector communications device being configured to communicate with the pattern discoverer; and
threat detector code stored on the one or more threat detector non-transitory memory units that when executed by the one or more threat detector processors are configured to perform a threat detector method, comprising:
using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system.
34. The apparatus of claim 32, wherein the using the one or more rules on the incoming data traffic to the enterprise or e-commerce system to detect threats to the enterprise or e-commerce system is in real-time.
35. The apparatus of claim 32, wherein the threat detector method further comprises sending the detected threats to the pattern discoverer; and wherein the pattern discoverer method further comprises using the detected threats to modify the one or more statistical models and/or the one or more rules for identifying threats to the enterprise or e-commerce system.
36. The apparatus of claim 32, wherein the threat detector method further comprises blocking and/or challenging the packet flow of the detected threats.
37. The apparatus of claim 31, wherein the one or more log line parameters comprises at least one of: user ID, session, IP address, and URL query.
38. The apparatus of claim 31, the apparatus further comprising a cloud server linked to the pattern discoverer, the cloud server being configured to share the one or more rules for identifying threats to the enterprise or e-commerce system with one or more enterprise or e-commerce systems.
39. The apparatus of claim 31, wherein the one or more enterprise or e-commerce system data sources comprises at least one of: web server access logs, firewall logs, packet captures per application, active directory logs, DNS logs, forward proxy logs, external threat feeds, AV logs, user logon audits, DLP logs, LB logs, IPS/IDS logs, black listed URLs, black listed IP addresses, and black listed referrers.
40. The apparatus of claim 31, wherein the one or more features comprises at least one of: user session duration, length of user URL query, number of characters of user URL query, number of digits of user URL query, number of punctuations of user URL query, number of requests in user session, average time between clicks in user session, user session click rate, percentage of image requests in user session, percentage of 4xx responses in user session, percentage of 3xx responses in user session, percentage of 2xx responses in user session, percentage of zip responses in user session, percentage of binary responses in user session, percentage of head requests in user session, number of checkouts, number of credit cards added, number of promo codes added, number of gift cards added, number of times items were shipped overnight, number of times new shipping address was added, number of login failures, number of login successes, number of password resets, and total number of requests.
41. The apparatus of claim 31, wherein the one or more statistical models comprises at least one of: Clustering models, Hidden Markov model, and Copula models.
42. The apparatus of claim 31, wherein the one or more rules for identifying threats to the enterprise or e-commerce system comprises a random forest classifier, learning vector quantization, and/or a neural network.
43. The apparatus of claim 31, wherein the using one or more statistical models on the one or more features comprises using a bag of little bootstraps sampling.
44. The apparatus of claim 31, wherein the using one or more statistical models on the one or more features tables to identify statistical outliers comprises:
distributing one or more features from the one or more features tables across two or more servers;
using the one or more statistical models on the distributed one or more features; and
aggregating results from the using the one or more statistical models on the distributed one or more features.
45. The apparatus of claim 44, wherein the using the one or more statistical models on the distributed one or more features comprises using a bag of little bootstraps sampling.
46. The apparatus of claim 31, wherein the labeling the statistical outliers to create one or more labeled features tables comprises presenting an administrator the statistical outliers for identification as malicious, non-malicious, or other administrator defined label.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/532,812 US20160127402A1 (en) | 2014-11-04 | 2014-11-04 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/258,797 US9661025B2 (en) | 2014-11-04 | 2016-09-07 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/382,413 US9904893B2 (en) | 2013-04-02 | 2016-12-16 | Method and system for training a big data machine to defend |
US15/612,388 US10044762B2 (en) | 2014-11-04 | 2017-06-02 | Copula optimization method and apparatus for identifying and detecting threats to an enterprise or e-commerce system and other applications |
US15/662,323 US10264027B2 (en) | 2014-11-04 | 2017-07-28 | Computer-implemented process and system employing outlier score detection for identifying and detecting scenario-specific data elements from a dynamic data source |
PCT/US2018/035867 WO2018223133A1 (en) | 2014-11-04 | 2018-06-04 | Copula optimization method and apparatus for identifying and detecting threats to an enterprise or e-commerce system and other applications |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/532,812 US20160127402A1 (en) | 2014-11-04 | 2014-11-04 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
Related Child Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/258,797 Continuation US9661025B2 (en) | 2013-04-02 | 2016-09-07 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/258,797 Continuation-In-Part US9661025B2 (en) | 2013-04-02 | 2016-09-07 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/612,388 Continuation US10044762B2 (en) | 2014-11-04 | 2017-06-02 | Copula optimization method and apparatus for identifying and detecting threats to an enterprise or e-commerce system and other applications |
US15/662,323 Continuation US10264027B2 (en) | 2014-11-04 | 2017-07-28 | Computer-implemented process and system employing outlier score detection for identifying and detecting scenario-specific data elements from a dynamic data source |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160127402A1 (en) | 2016-05-05 |
Family
ID=55854020
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/532,812 Abandoned US20160127402A1 (en) | 2013-04-02 | 2014-11-04 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/258,797 Active US9661025B2 (en) | 2013-04-02 | 2016-09-07 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/612,388 Active - Reinstated US10044762B2 (en) | 2014-11-04 | 2017-06-02 | Copula optimization method and apparatus for identifying and detecting threats to an enterprise or e-commerce system and other applications |
US15/662,323 Active US10264027B2 (en) | 2014-11-04 | 2017-07-28 | Computer-implemented process and system employing outlier score detection for identifying and detecting scenario-specific data elements from a dynamic data source |
Family Applications After (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/258,797 Active US9661025B2 (en) | 2013-04-02 | 2016-09-07 | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US15/612,388 Active - Reinstated US10044762B2 (en) | 2014-11-04 | 2017-06-02 | Copula optimization method and apparatus for identifying and detecting threats to an enterprise or e-commerce system and other applications |
US15/662,323 Active US10264027B2 (en) | 2014-11-04 | 2017-07-28 | Computer-implemented process and system employing outlier score detection for identifying and detecting scenario-specific data elements from a dynamic data source |
Country Status (2)
Country | Link |
---|---|
US (4) | US20160127402A1 (en) |
WO (1) | WO2018223133A1 (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9690937B1 (en) * | 2015-03-30 | 2017-06-27 | EMC IP Holding Company LLC | Recommending a set of malicious activity detection rules in an automated, data-driven manner |
US20170364680A1 (en) * | 2016-06-20 | 2017-12-21 | Sap Se | Detecting attacks by matching of access frequencies and sequences in different software layers |
US20180150639A1 (en) * | 2015-05-28 | 2018-05-31 | Entit Software Llc | Security vulnerability detection |
US9996409B2 (en) * | 2016-03-28 | 2018-06-12 | Ca, Inc. | Identification of distinguishable anomalies extracted from real time data streams |
US10002144B2 (en) | 2016-03-25 | 2018-06-19 | Ca, Inc. | Identification of distinguishing compound features extracted from real time data streams |
CN108234524A (en) * | 2018-04-02 | 2018-06-29 | 广州广电研究院有限公司 | Method, apparatus, equipment and the storage medium of network data abnormality detection |
US10158657B1 (en) * | 2015-08-06 | 2018-12-18 | Microsoft Technology Licensing Llc | Rating IP addresses based on interactions between users and an online service |
US10204146B2 (en) | 2016-02-09 | 2019-02-12 | Ca, Inc. | Automatic natural language processing based data extraction |
CN109416763A (en) * | 2016-07-01 | 2019-03-01 | 英特尔公司 | Machine learning in antagonism environment |
CN109600382A (en) * | 2018-12-19 | 2019-04-09 | 北京知道创宇信息技术有限公司 | Webshell detection method and device, HMM model training method and device |
US10291646B2 (en) | 2016-10-03 | 2019-05-14 | Telepathy Labs, Inc. | System and method for audio fingerprinting for attack detection |
US10324956B1 (en) | 2015-11-11 | 2019-06-18 | Microsoft Technology Licensing, Llc | Automatically mapping organizations to addresses |
WO2019165883A1 (en) * | 2018-03-01 | 2019-09-06 | 中兴通讯股份有限公司 | Data processing method and apparatus |
WO2019228158A1 (en) * | 2018-05-29 | 2019-12-05 | 北京白山耘科技有限公司 | Method and apparatus for detecting dangerous information by means of text information, medium, and device |
CN110912908A (en) * | 2019-11-28 | 2020-03-24 | 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) | Network protocol anomaly detection method and device, computer equipment and storage medium |
US10681059B2 (en) * | 2016-05-25 | 2020-06-09 | CyberOwl Limited | Relating to the monitoring of network security |
CN111901329A (en) * | 2020-07-22 | 2020-11-06 | 浙江军盾信息科技有限公司 | Method and device for identifying network security event |
CN113364745A (en) * | 2021-05-21 | 2021-09-07 | 北京国联天成信息技术有限公司 | Log collecting and analyzing processing method |
CN113453076A (en) * | 2020-03-24 | 2021-09-28 | 中国移动通信集团河北有限公司 | User video service quality evaluation method and device, computing equipment and storage medium |
CN114338087A (en) * | 2021-12-03 | 2022-04-12 | 成都安恒信息技术有限公司 | Directional operation and maintenance auditing method and system based on firewall |
US20220200960A1 (en) * | 2020-12-21 | 2022-06-23 | Oracle International Corporation | Automatic web application firewall (waf) security suggester |
US20220247750A1 (en) * | 2021-01-29 | 2022-08-04 | Paypal, Inc. | Evaluating access requests using assigned common actor identifiers |
US20220417749A1 (en) * | 2019-11-20 | 2022-12-29 | Siemens Energy Global GmbH & Co. KG | Protected resetting of an iot device |
CN116389166A (en) * | 2023-05-29 | 2023-07-04 | 天翼云科技有限公司 | Malicious DOS traffic detection method and device, electronic equipment and storage medium |
US20240121267A1 (en) * | 2022-10-06 | 2024-04-11 | Palo Alto Networks, Inc. | Inline malicious url detection with hierarchical structure patterns |
CN118628168A (en) * | 2024-08-15 | 2024-09-10 | 长沙时代跳动科技有限公司 | Private domain flow management method |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160127402A1 (en) * | 2014-11-04 | 2016-05-05 | Patternex, Inc. | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US9866576B2 (en) | 2015-04-17 | 2018-01-09 | Centripetal Networks, Inc. | Rule-based network-threat detection |
RU2649793C2 (en) | 2016-08-03 | 2018-04-04 | ООО "Группа АйБи" | Method and system of detecting remote connection when working on web resource pages |
US10848508B2 (en) | 2016-09-07 | 2020-11-24 | Patternex, Inc. | Method and system for generating synthetic feature vectors from real, labelled feature vectors in artificial intelligence training of a big data machine to defend |
RU2637477C1 (en) * | 2016-12-29 | 2017-12-04 | Общество с ограниченной ответственностью "Траст" | System and method for detecting phishing web pages |
RU2671991C2 (en) | 2016-12-29 | 2018-11-08 | Общество с ограниченной ответственностью "Траст" | System and method for collecting information for detecting phishing |
CN108804469B (en) * | 2017-05-04 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Webpage identification method and electronic equipment |
RU2689816C2 (en) | 2017-11-21 | 2019-05-29 | ООО "Группа АйБи" | Method for classifying sequence of user actions (embodiments) |
CN108024156B (en) * | 2017-12-14 | 2020-04-14 | 四川大学 | Partially reliable video transmission method based on hidden Markov model |
RU2680736C1 (en) | 2018-01-17 | 2019-02-26 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | Malware files in network traffic detection server and method |
RU2677361C1 (en) | 2018-01-17 | 2019-01-16 | Общество с ограниченной ответственностью "Траст" | Method and system of decentralized identification of malware programs |
RU2676247C1 (en) | 2018-01-17 | 2018-12-26 | Общество С Ограниченной Ответственностью "Группа Айби" | Web resources clustering method and computer device |
RU2677368C1 (en) | 2018-01-17 | 2019-01-16 | Общество С Ограниченной Ответственностью "Группа Айби" | Method and system for automatic determination of fuzzy duplicates of video content |
RU2668710C1 (en) | 2018-01-17 | 2018-10-02 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | Computing device and method for detecting malicious domain names in network traffic |
RU2681699C1 (en) | 2018-02-13 | 2019-03-12 | Общество с ограниченной ответственностью "Траст" | Method and server for searching related network resources |
CN110324198B (en) * | 2018-03-30 | 2021-06-04 | 华为技术有限公司 | Packet loss processing method and packet loss processing device |
EP3795975B1 (en) | 2018-06-14 | 2023-08-02 | Mitsubishi Electric Corporation | Abnormality sensing apparatus, abnormality sensing method, and abnormality sensing program |
RU2708508C1 (en) | 2018-12-17 | 2019-12-09 | Общество с ограниченной ответственностью "Траст" | Method and a computing device for detecting suspicious users in messaging systems |
RU2701040C1 (en) | 2018-12-28 | 2019-09-24 | Общество с ограниченной ответственностью "Траст" | Method and a computer for informing on malicious web resources |
WO2020176005A1 (en) | 2019-02-27 | 2020-09-03 | Общество С Ограниченной Ответственностью "Группа Айби" | Method and system for identifying a user according to keystroke dynamics |
CN109981647B (en) * | 2019-03-27 | 2021-07-06 | 北京百度网讯科技有限公司 | Method and apparatus for detecting brute force cracking |
CN110163368B (en) * | 2019-04-18 | 2023-10-20 | 腾讯科技(深圳)有限公司 | Deep learning model training method, device and system based on mixed precision |
CN110070535A (en) * | 2019-04-23 | 2019-07-30 | 东北大学 | A kind of retinal vascular images dividing method of Case-based Reasoning transfer learning |
CN110120956B (en) * | 2019-05-28 | 2021-06-29 | 杭州迪普科技股份有限公司 | Message processing method and device based on virtual firewall |
CN110300106B (en) * | 2019-06-24 | 2021-11-23 | 中国人民解放军战略支援部队信息工程大学 | Moving target defense decision selection method, device and system based on Markov time game |
CN110516879A (en) * | 2019-08-29 | 2019-11-29 | 京东城市(北京)数字科技有限公司 | Cross-platform modeling method, system and device |
RU2728498C1 (en) | 2019-12-05 | 2020-07-29 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | Method and system for determining software belonging by its source code |
RU2728497C1 (en) | 2019-12-05 | 2020-07-29 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | Method and system for determining belonging of software by its machine code |
RU2743974C1 (en) | 2019-12-19 | 2021-03-01 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | System and method for scanning security of elements of network architecture |
SG10202001963TA (en) | 2020-03-04 | 2021-10-28 | Group Ib Global Private Ltd | System and method for brand protection based on the search results |
CN111611924B (en) * | 2020-05-21 | 2022-03-25 | 东北林业大学 | Mushroom identification method based on deep migration learning model |
CN111680742A (en) * | 2020-06-04 | 2020-09-18 | 甘肃电力科学研究院 | Attack data labeling method applied to new energy plant station network security field |
US11652832B2 (en) * | 2020-07-01 | 2023-05-16 | Vmware, Inc. | Automated identification of anomalous devices |
US11475090B2 (en) | 2020-07-15 | 2022-10-18 | Group-Ib Global Private Limited | Method and system for identifying clusters of affiliated web resources |
RU2743619C1 (en) | 2020-08-06 | 2021-02-20 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | Method and system for generating the list of compromise indicators |
US11712629B2 (en) * | 2020-09-03 | 2023-08-01 | Roblox Corporation | Determining game quality based on gameplay duration |
US11935077B2 (en) * | 2020-10-04 | 2024-03-19 | Vunet Systems Private Limited | Operational predictive scoring of components and services of an information technology system |
US11947572B2 (en) | 2021-03-29 | 2024-04-02 | Group IB TDS, Ltd | Method and system for clustering executable files |
NL2030861B1 (en) | 2021-06-01 | 2023-03-14 | Trust Ltd | System and method for external monitoring a cyberattack surface |
RU2769075C1 (en) | 2021-06-10 | 2022-03-28 | Общество с ограниченной ответственностью "Группа АйБи ТДС" | System and method for active detection of malicious network resources |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080229415A1 (en) * | 2005-07-01 | 2008-09-18 | Harsh Kapoor | Systems and methods for processing data flows |
US9015301B2 (en) * | 2007-01-05 | 2015-04-21 | Digital Doors, Inc. | Information infrastructure management tools with extractor, secure storage, content analysis and classification and method therefor |
US20120137367A1 (en) * | 2009-11-06 | 2012-05-31 | Cataphora, Inc. | Continuous anomaly detection based on behavior modeling and heterogeneous information analysis |
US8418249B1 (en) * | 2011-11-10 | 2013-04-09 | Narus, Inc. | Class discovery for automated discovery, attribution, analysis, and risk assessment of security threats |
US9276948B2 (en) * | 2011-12-29 | 2016-03-01 | 21Ct, Inc. | Method and apparatus for identifying a threatening network |
ES2745701T3 (en) * | 2012-05-15 | 2020-03-03 | Univ Of Lancaster | Bad system status identification |
US20130318584A1 (en) * | 2012-05-25 | 2013-11-28 | Qualcomm Incorporated | Learning information on usage by a user, of one or more device(s), for cumulative inference of user's situation |
US9361885B2 (en) * | 2013-03-12 | 2016-06-07 | Nuance Communications, Inc. | Methods and apparatus for detecting a voice command |
US9904893B2 (en) * | 2013-04-02 | 2018-02-27 | Patternex, Inc. | Method and system for training a big data machine to defend |
US20160127402A1 (en) * | 2014-11-04 | 2016-05-05 | Patternex, Inc. | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
2014
- 2014-11-04: US14/532,812 (US20160127402A1), Abandoned
2016
- 2016-09-07: US15/258,797 (US9661025B2), Active
2017
- 2017-06-02: US15/612,388 (US10044762B2), Active - Reinstated
- 2017-07-28: US15/662,323 (US10264027B2), Active
2018
- 2018-06-04: WO PCT/US2018/035867 (WO2018223133A1), Application Filing
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9690937B1 (en) * | 2015-03-30 | 2017-06-27 | EMC IP Holding Company LLC | Recommending a set of malicious activity detection rules in an automated, data-driven manner |
US20180150639A1 (en) * | 2015-05-28 | 2018-05-31 | Entit Software Llc | Security vulnerability detection |
US10614223B2 (en) * | 2015-05-28 | 2020-04-07 | Micro Focus Llc | Security vulnerability detection |
US10158657B1 (en) * | 2015-08-06 | 2018-12-18 | Microsoft Technology Licensing Llc | Rating IP addresses based on interactions between users and an online service |
US10324956B1 (en) | 2015-11-11 | 2019-06-18 | Microsoft Technology Licensing, Llc | Automatically mapping organizations to addresses |
US10204146B2 (en) | 2016-02-09 | 2019-02-12 | Ca, Inc. | Automatic natural language processing based data extraction |
US10002144B2 (en) | 2016-03-25 | 2018-06-19 | Ca, Inc. | Identification of distinguishing compound features extracted from real time data streams |
US9996409B2 (en) * | 2016-03-28 | 2018-06-12 | Ca, Inc. | Identification of distinguishable anomalies extracted from real time data streams |
US10681059B2 (en) * | 2016-05-25 | 2020-06-09 | CyberOwl Limited | Relating to the monitoring of network security |
US20170364680A1 (en) * | 2016-06-20 | 2017-12-21 | Sap Se | Detecting attacks by matching of access frequencies and sequences in different software layers |
US10061925B2 (en) * | 2016-06-20 | 2018-08-28 | Sap Se | Detecting attacks by matching of access frequencies and sequences in different software layers |
CN109416763A (en) * | 2016-07-01 | 2019-03-01 | 英特尔公司 | Machine learning in antagonism environment |
US10992700B2 (en) | 2016-10-03 | 2021-04-27 | Telepathy Ip Holdings | System and method for enterprise authorization for social partitions |
US11122074B2 (en) | 2016-10-03 | 2021-09-14 | Telepathy Labs, Inc. | System and method for omnichannel social engineering attack avoidance |
US10419475B2 (en) | 2016-10-03 | 2019-09-17 | Telepathy Labs, Inc. | System and method for social engineering identification and alerting |
US11165813B2 (en) | 2016-10-03 | 2021-11-02 | Telepathy Labs, Inc. | System and method for deep learning on attack energy vectors |
US10404740B2 (en) | 2016-10-03 | 2019-09-03 | Telepathy Labs, Inc. | System and method for deprovisioning |
US11818164B2 (en) | 2016-10-03 | 2023-11-14 | Telepathy Labs, Inc. | System and method for omnichannel social engineering attack avoidance |
US10291646B2 (en) | 2016-10-03 | 2019-05-14 | Telepathy Labs, Inc. | System and method for audio fingerprinting for attack detection |
WO2019165883A1 (en) * | 2018-03-01 | 2019-09-06 | 中兴通讯股份有限公司 | Data processing method and apparatus |
CN108234524A (en) * | 2018-04-02 | 2018-06-29 | 广州广电研究院有限公司 | Method, apparatus, equipment and the storage medium of network data abnormality detection |
WO2019228158A1 (en) * | 2018-05-29 | 2019-12-05 | 北京白山耘科技有限公司 | Method and apparatus for detecting dangerous information by means of text information, medium, and device |
CN109600382A (en) * | 2018-12-19 | 2019-04-09 | 北京知道创宇信息技术有限公司 | Webshell detection method and device, HMM model training method and device |
US12108253B2 (en) * | 2019-11-20 | 2024-10-01 | Siemens Energy Global GmbH & Co. KG | Protected resetting of an IoT device |
US20220417749A1 (en) * | 2019-11-20 | 2022-12-29 | Siemens Energy Global GmbH & Co. KG | Protected resetting of an iot device |
CN110912908A (en) * | 2019-11-28 | 2020-03-24 | 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) | Network protocol anomaly detection method and device, computer equipment and storage medium |
CN113453076A (en) * | 2020-03-24 | 2021-09-28 | 中国移动通信集团河北有限公司 | User video service quality evaluation method and device, computing equipment and storage medium |
CN111901329A (en) * | 2020-07-22 | 2020-11-06 | 浙江军盾信息科技有限公司 | Method and device for identifying network security event |
US20220200960A1 (en) * | 2020-12-21 | 2022-06-23 | Oracle International Corporation | Automatic web application firewall (waf) security suggester |
US20220247750A1 (en) * | 2021-01-29 | 2022-08-04 | Paypal, Inc. | Evaluating access requests using assigned common actor identifiers |
US12034731B2 (en) * | 2021-01-29 | 2024-07-09 | Paypal, Inc. | Evaluating access requests using assigned common actor identifiers |
CN113364745A (en) * | 2021-05-21 | 2021-09-07 | 北京国联天成信息技术有限公司 | Log collecting and analyzing processing method |
CN114338087A (en) * | 2021-12-03 | 2022-04-12 | 成都安恒信息技术有限公司 | Directional operation and maintenance auditing method and system based on firewall |
US20240121267A1 (en) * | 2022-10-06 | 2024-04-11 | Palo Alto Networks, Inc. | Inline malicious url detection with hierarchical structure patterns |
CN116389166A (en) * | 2023-05-29 | 2023-07-04 | 天翼云科技有限公司 | Malicious DOS traffic detection method and device, electronic equipment and storage medium |
CN118628168A (en) * | 2024-08-15 | 2024-09-10 | 长沙时代跳动科技有限公司 | Private domain flow management method |
Also Published As
Publication number | Publication date |
---|---|
US20170272471A1 (en) | 2017-09-21 |
US10044762B2 (en) | 2018-08-07 |
US9661025B2 (en) | 2017-05-23 |
WO2018223133A1 (en) | 2018-12-06 |
US10264027B2 (en) | 2019-04-16 |
US20160381077A1 (en) | 2016-12-29 |
US20170339192A1 (en) | 2017-11-23 |
Similar Documents
Publication | Title |
---|---|
US9661025B2 (en) | Method and apparatus for identifying and detecting threats to an enterprise or e-commerce system |
US10972495B2 (en) | Methods and apparatus for detecting and identifying malware by mapping feature data into a semantic space |
US10681012B2 (en) | Methods and systems for deep learning based API traffic security |
US11196756B2 (en) | Identifying notable events based on execution of correlation searches |
US11418525B2 (en) | Data processing method, device and storage medium |
Manadhata et al. | Detecting malicious domains via graph inference |
US10866849B2 (en) | System and method for automated computer system diagnosis and repair |
US20210203690A1 (en) | Phishing detection using certificates associated with uniform resource locators |
US20210203692A1 (en) | Phishing detection using uniform resource locators |
US12021894B2 (en) | Phishing detection based on modeling of web page content |
US10749882B2 (en) | Network security system and methods for encoding network connectivity for activity classification |
CN109155774A (en) | System and method for detecting security threat |
CN111885007B (en) | Information tracing method, device, system and storage medium |
US11470114B2 (en) | Malware and phishing detection and mediation platform |
US12010150B2 (en) | Multi-perspective security context per actor |
US11038803B2 (en) | Correlating network level and application level traffic |
Las-Casas et al. | A big data architecture for security data and its application to phishing characterization |
Mimura et al. | A practical experiment of the HTTP-based RAT detection method in proxy server logs |
CN108876314B (en) | Career professional ability traceable method and platform |
US11838313B2 (en) | Artificial intelligence (AI)-based malware detection |
Ismail et al. | Stateless malware packet detection by incorporating naive bayes with known malware signatures |
Mishra et al. | Cloud Computing Security: Machine and Deep Learning Models Analysis |
Sharma et al. | Network log clustering using k-means algorithm |
Ritchey et al. | Machine learning toolkit for system log file reduction and detection of malicious behavior |
US11886582B1 (en) | Malicious javascript detection based on abstract syntax trees (AST) and deep machine learning (DML) |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: PATTERNEX, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: VEERAMACHANENI, UDAY; BASSIAS, CONSTANTINOS; KORRAPATI, VAMSI; AND OTHERS; SIGNING DATES FROM 20150724 TO 20150817; REEL/FRAME: 036521/0926 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |