WO2016034935A1

WO2016034935A1 - Protecting against phishing attacks

Info

Publication number: WO2016034935A1
Application number: PCT/IB2015/001511
Authority: WO
Inventors: Rafael SANTIAGO DE SOUZA NETTO; Rafael Luiz FERNANDES LEANDRO JUNIOR; Thiago Guimaraes BRITO; Emilio CINI SIMONI; Carlos Henrique BAIA E SILVA
Original assignee: Gas Informatica Ltda
Priority date: 2014-09-02
Filing date: 2015-09-03
Publication date: 2016-03-10

Abstract

In an example embodiment, phishing attacks are detected by analyzing visual data associated with a web page. Patterns in the visual data are compared with a list of known patterns associated with known web pages. If there is a match, the web page is determined to be a malicious web page.

Description

PROTECTING AGAINST PHISHING ATTACKS

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of priority to U.S. Provisional Application No. 62/044,612, filed September 2, 2014.

TECHNICAL FIELD

[0002] The present disclosure relates generally to computer security.

BACKGROUND

[0003] Phishing is the attempt to acquire sensitive information such as usernames, passwords, bank account numbers, or credit card numbers by masquerading as a trustworthy entity. For example, a communication, such as an email claiming to be from a bank or online payment processor, may be employed to lure a customer into divulging account information. When the customer clicks on a link in the email, a counterfeit web page is presented. Another type of phishing attack involves the use of misspellings of domain names. For example, "amozon.com" may be used to present a counterfeit web page for "amazon.com".

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] The accompanying drawings incorporated herein and forming a part of the specification illustrate the example embodiments.

[0005] FIG. 1 is a block diagram illustrating an example of a network with a client and servers.

[0006] FIG. 2 is a block diagram of a computer system upon which an example embodiment can be implemented.

[0007] FIG. 3 is a block diagram of a method for determining whether a web page is a malicious web page.

[0008] FIG. 4 is a block diagram of a method for performing a visual analysis.

OVERVIEW OF EXAMPLE EMBODIMENTS

[0009] The following presents a simplified overview of the example embodiments in order to provide a basic understanding of some aspects of the example embodiments. This overview is not an extensive overview of the example embodiments. It is intended to neither identify key or critical elements of the example embodiments nor delineate the scope of the appended claims. Its sole purpose is to present some concepts of the example embodiments in a simplified form as a prelude to the more detailed description that is presented later.

[0010] In accordance with an example embodiment, there is disclosed herein, a method for detecting phishing attacks by analyzing visual data associated with a web page. Patterns in the visual data are compared with a list of known patterns associated with known web pages. If there is a match, the web page is determined to be a malicious web page. Other embodiments may include apparatuses and computer readable mediums of instructions that when executed implement the method.

DESCRIPTION OF EXAMPLE EMBODIMENTS

[0011] This description provides examples not intended to limit the scope of the appended claims. The figures generally indicate the features of the examples, where it is understood and appreciated that like reference numerals are used to refer to like elements. Reference in the specification to "one embodiment" or "an embodiment" or "an example embodiment" means that a particular feature, structure, or characteristic described is included in at least one embodiment described herein and does not imply that the feature, structure, or characteristic is present in all embodiments described herein.

[0012] FIG. 1 is a block diagram illustrating an example 100 of a network 110 coupled with a client 102 and a server 112 being evaluated. In an example embodiment, the client 102 comprises a transceiver 104 and anti-phishing logic 106. Anti-phishing logic 106 is coupled with transceiver 104 and is operable to send, receive, or both send and receive data via the transceiver 104. "Logic", as used herein, includes but is not limited to hardware, firmware, software and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another component. For example, based on a desired application or need, logic may include a software controlled microprocessor, discrete logic such as an application specific integrated circuit (ASIC), a programmable/programmed logic device, memory device containing instructions, or the like, or combinational logic embodied in hardware. Logic may also be fully embodied as software that performs the desired functionality when executed by a processor.

[0013] The transceiver 104 may be suitably any type of wired or wireless transceiver. The transceiver 104 is coupled via link 108 with the network 110. The server being evaluated (or target server) 112 is coupled via link 114 to the network 110. A security server 116 with a database of patterns 120 is coupled via link 118 to the network 110. Although links 108, 114, and 118 are illustrated as single links, this is merely for ease of illustration and those skilled in the art should readily appreciate that links 108, 114, and 118 may suitably comprise a wired, wireless, or any combination of wired and wireless links.

[0014] In an example embodiment, the browser 122 sends a request for a web page via transceiver 104. The anti-phishing logic 106, which in an example embodiment is a browser component, parses data, such as visual data, from the web page 112. In an example embodiment, the anti-phishing logic 106 collects screen shots and produces a raw data extract (or document object model "DOM") from the data collected from the web page 112. In particular embodiments, the anti-phishing logic 106 may use a computer vision model (e.g., screen snapshot).

[0014] In an example embodiment, the anti-phishing logic 106 crops the border of the image laterally and vertically and executes an algorithm to extract the usable area of the web page 112 in order to remove any unnecessary information (e.g., blank areas).
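
By way of illustration only, the cropping step might be sketched as follows. The sketch assumes the screen shot is held as a list of rows of RGB tuples and that "blank" means a single background color (white by default); the function name crop_usable_area and both of these choices are assumptions made for the example and are not taken from the disclosure.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def crop_usable_area(image: Image, blank: Pixel = (255, 255, 255)) -> Image:
    """Return the smallest rectangle that contains every non-blank pixel.

    Assumption: a pixel is 'blank' when it equals the given background color.
    """
    # Rows that contain at least one non-blank pixel (vertical extent).
    rows = [y for y, row in enumerate(image) if any(p != blank for p in row)]
    if not rows:
        return []  # the whole image is blank
    # Columns that contain at least one non-blank pixel (lateral extent).
    cols = [
        x for x in range(len(image[0]))
        if any(row[x] != blank for row in image)
    ]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

A practical implementation would likely tolerate near-blank pixels (e.g., compression noise) rather than require an exact color match.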

[0015] In an example embodiment, the anti-phishing logic 106 executes a bilinear interpolation algorithm in the image in order to scale to an expected size.
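
As a non-limiting sketch, bilinear interpolation to an expected size could look like the following. The image representation (rows of RGB tuples), the function name resize_bilinear, and the corner-aligned coordinate mapping are assumptions of the example; the disclosure does not specify them.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def resize_bilinear(image: Image, new_w: int, new_h: int) -> Image:
    """Scale an RGB image to new_w x new_h using bilinear interpolation."""
    src_h, src_w = len(image), len(image[0])
    out: Image = []
    for j in range(new_h):
        # Map the output row onto the source grid (corners are aligned).
        y = j * (src_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, src_h - 1)
        fy = y - y0
        row: List[Pixel] = []
        for i in range(new_w):
            x = i * (src_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, src_w - 1)
            fx = x - x0
            # Weighted average of the four surrounding source pixels.
            pixel = tuple(
                round(
                    image[y0][x0][c] * (1 - fx) * (1 - fy)
                    + image[y0][x1][c] * fx * (1 - fy)
                    + image[y1][x0][c] * (1 - fx) * fy
                    + image[y1][x1][c] * fx * fy
                )
                for c in range(3)
            )
            row.append(pixel)
        out.append(row)
    return out
```

Scaling every screen shot to the same size means that region percentages computed later are comparable between pages captured at different resolutions.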

[0016] In an example embodiment, the anti-phishing logic 106 executes a color distribution analysis in order to identify the percentage of predefined colors at positions associated with known web pages (e.g., the percentage of yellow in the top, left, center, middle, and total area). This set of information produces a "Page DNA".
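
For illustration, a color distribution analysis of this kind might be sketched as below. The palette, the nearest-color assignment, the division of the page into three horizontal bands ("top", "middle", "footer") plus the total area, and the names PALETTE, nearest_color, and page_dna are all assumptions of the example rather than details given in the disclosure.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Hypothetical set of predefined colors.
PALETTE: Dict[str, Pixel] = {
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def nearest_color(pixel: Pixel, palette: Dict[str, Pixel] = PALETTE) -> str:
    """Name of the palette color closest to the pixel (squared RGB distance)."""
    return min(
        palette,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(pixel, palette[name])),
    )


def page_dna(image: Image, palette: Dict[str, Pixel] = PALETTE) -> Dict[str, float]:
    """Percentage of each predefined color in each region of the page."""
    h = len(image)
    regions = {
        "top": image[: h // 3],
        "middle": image[h // 3: 2 * h // 3],
        "footer": image[2 * h // 3:],
        "total": image,
    }
    dna: Dict[str, float] = {}
    for region, rows in regions.items():
        pixels = [p for row in rows for p in row]
        counts = Counter(nearest_color(p, palette) for p in pixels)
        for name in palette:
            share = 100.0 * counts[name] / len(pixels) if pixels else 0.0
            dna[f"{region}_{name}"] = share
    return dna
```

The resulting dictionary is one possible concrete form of a Page DNA: a fixed set of named percentages that can be compared between pages.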

[0015] In an example embodiment, the anti-phishing logic 106 employs an artificial intelligence model to analyze the Page DNA, which classifies the web page 112 as either "trusted" or "malicious" (e.g., phishing). In an example embodiment, the artificial intelligence model is based on a machine learning algorithm, such as, for example, a RANDOM FORESTS algorithm, that searches for patterns inside the Page DNA (e.g., if there is more than 50% of yellow in the top, more than 20% of blue in the footer, and 2% of black in the middle, this can be considered a web page similar to a predetermined web page, such as, for example, a financial institution's web page).
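
The following is a deliberately reduced stand-in for such a model, offered only as a sketch: a bagged ensemble of one-split decision trees ("stumps"), each trained on a bootstrap sample and a randomly chosen feature, combined by majority vote. It is not a full random forest and is not asserted to be the model used in any embodiment. The feature names, the training values in TRAINING, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Dna = Dict[str, float]
Sample = Tuple[Dna, int]          # label 1 = malicious, 0 = trusted
Stump = Tuple[str, float, int]    # (feature, threshold, label above threshold)


def train_stump(samples: List[Sample], feature: str) -> Stump:
    """Pick the threshold and polarity on one feature with the fewest errors."""
    values = sorted({dna[feature] for dna, _ in samples})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in thresholds:
        for high in (1, 0):
            errors = sum(
                (high if dna[feature] > t else 1 - high) != label
                for dna, label in samples
            )
            if best is None or errors < best[0]:
                best = (errors, t, high)
    return feature, best[1], best[2]


def train_forest(samples: List[Sample], n_trees: int = 15, seed: int = 0) -> List[Stump]:
    """Train stumps on stratified bootstrap samples with a random feature each."""
    rng = random.Random(seed)
    features = sorted(samples[0][0])
    by_label = {lbl: [s for s in samples if s[1] == lbl] for lbl in (0, 1)}
    forest = []
    for _tree in range(n_trees):
        # Resample each class separately so both labels are always present.
        boot = [rng.choice(by_label[lbl]) for lbl in (0, 1) for _s in by_label[lbl]]
        forest.append(train_stump(boot, rng.choice(features)))
    return forest


def classify(forest: List[Stump], dna: Dna) -> str:
    """Majority vote of all stumps."""
    votes = sum(high if dna[f] > t else 1 - high for f, t, high in forest)
    return "malicious" if 2 * votes > len(forest) else "trusted"


# Made-up Page DNA samples, loosely following the yellow/blue example above.
TRAINING: List[Sample] = [
    ({"top_yellow": 55.0, "footer_blue": 25.0}, 1),
    ({"top_yellow": 60.0, "footer_blue": 30.0}, 1),
    ({"top_yellow": 70.0, "footer_blue": 22.0}, 1),
    ({"top_yellow": 5.0, "footer_blue": 2.0}, 0),
    ({"top_yellow": 10.0, "footer_blue": 8.0}, 0),
    ({"top_yellow": 0.0, "footer_blue": 12.0}, 0),
]
```

A production system would more plausibly rely on an established library implementation with deeper trees and many more features; the sketch only shows how thresholds over Page DNA percentages can be learned from labeled pages and combined by voting.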

[0016] For example, in an example embodiment, the anti-phishing logic 106 compares the visual information obtained from the web page 112 with a list of known patterns to determine whether the web page 112 is similar to a known web page. For example, the anti-phishing logic 106 may determine whether the web page 112 is similar to a financial institution's (e.g., bank, credit union, savings and loan, etc.) web page.
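
One simple way such a comparison could be realized is sketched below, assuming each known pattern is stored as a dictionary of expected percentages and that similarity is measured as the mean absolute difference in percentage points. The distance measure, the 5-point tolerance, the pattern names, and the function names are assumptions of the example.

```python
from typing import Dict, Optional

Dna = Dict[str, float]


def pattern_distance(dna: Dna, pattern: Dna) -> float:
    """Mean absolute difference, in percentage points, over the pattern's features."""
    return sum(abs(dna.get(k, 0.0) - v) for k, v in pattern.items()) / len(pattern)


def match_known_page(
    dna: Dna,
    known_patterns: Dict[str, Dna],
    max_distance: float = 5.0,
) -> Optional[str]:
    """Return the name of the closest known page, or None if nothing is close enough."""
    best_name, best_dist = None, float("inf")
    for name, pattern in known_patterns.items():
        dist = pattern_distance(dna, pattern)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical list of known patterns.
KNOWN_PATTERNS: Dict[str, Dna] = {
    "example-bank": {"top_yellow": 60.0, "footer_blue": 25.0},
    "example-credit-union": {"top_yellow": 5.0, "footer_blue": 70.0},
}
```

In this sketch a page whose Page DNA lies within the tolerance of a stored pattern is reported as similar to that known page; how that result is then used is a matter for the classification and corrective steps.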

[0017] In an example embodiment, if the anti-phishing logic 106 determines that the web page is malicious, the anti-phishing logic 106 re-directs the browser 122 to a predetermined web page. For example, the predetermined web page can be a blank web page, such as "about:blank". In particular embodiments, the anti-phishing logic 106 provides an alert indicating that the web page is a malicious web page. For example, a message may be displayed on a user interface. In particular embodiments, the anti-phishing logic 106 is operable to store data representative of a uniform resource locator ("URL") for the web page in a list of malicious web pages.
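
The three responses described above (re-direction, alert, and storing the URL) might be combined as in the following sketch. The class name AntiPhishingHandler, its fields, and the wording of the alert are hypothetical, and a real browser component would use the browser's own navigation and notification facilities rather than in-memory lists.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AntiPhishingHandler:
    """Illustrative holder for the redirect target, alerts, and malicious-URL list."""

    safe_url: str = "about:blank"
    malicious_urls: Set[str] = field(default_factory=set)
    alerts: List[str] = field(default_factory=list)

    def handle(self, url: str, verdict: str) -> str:
        """Return the URL the browser should load for the given verdict."""
        if verdict != "malicious":
            return url  # trusted pages load unchanged
        self.malicious_urls.add(url)  # remember the URL for later requests
        self.alerts.append(f"Warning: {url} appears to be a phishing page.")
        return self.safe_url  # re-direct to the predetermined page
```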

[0018] FIG. 2 is a block diagram that illustrates a computer system 200 upon which an example embodiment may be implemented. Computer system 200 may be employed for implementing the functionality of the anti-phishing logic 106 in FIG. 1.

[0019] Computer system 200 includes a bus 202 or other communication mechanism for communicating information and a processor 204 coupled with bus 202 for processing information. Computer system 200 also includes a main memory 206, such as random access memory (RAM) or other dynamic storage device coupled to bus 202 for storing information and instructions to be executed by processor 204. Main memory 206 also may be used for storing a temporary variable or other intermediate information during execution of instructions to be executed by processor 204. Computer system 200 further includes a read only memory (ROM) 208 or other static storage device coupled to bus 202 for storing static information and instructions for processor 204. A storage device 210, such as a magnetic disk or optical disk, is provided and coupled to bus 202 for storing information and instructions.

[0020] Computer system 200 may be coupled via bus 202 to a user interface 212. The user interface 212 may suitably comprise a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD), for displaying information to a computer user. The user interface 212 may also include an input device, such as a keyboard including alphanumeric and other keys, coupled to bus 202 for communicating information and command selections to processor 204. Another type of user input device is a cursor control, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 204 and for controlling cursor movement on a display. This type of input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allow the device to specify positions in a plane. In particular embodiments, the user interface 212 may be a touch screen. In an example embodiment, if the processor 204 determines that a web page is a malicious (e.g., phishing) web page, the processor may cause a warning to be output onto the user interface 212.

[0021] An aspect of the example embodiment is related to the use of computer system 200 for preventing phishing attacks. According to an example embodiment, preventing phishing attacks is provided by computer system 200 in response to processor 204 executing one or more sequences of one or more instructions contained in main memory 206. Such instructions may be read into main memory 206 from another computer-readable medium, such as storage device 210. Execution of the sequence of instructions contained in main memory 206 causes processor 204 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory 206. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement an example embodiment. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software.

[0022] The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to processor 204 for execution. Such a medium may take many forms, including but not limited to non-volatile media. Non-volatile media include for example optical or magnetic disks, such as storage device 210. Common forms of computer-readable media include for example floppy disk, a flexible disk, hard disk, magnetic cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASHPROM, CD, DVD or any other memory chip or cartridge, or any other medium from which a computer can read.

[0023] Computer system 200 also includes a communication interface 218 coupled to bus 202. Communication interface 218 provides a two-way data communication coupling computer system 200 to a network link 220 that is connected to a local network 222. For example, communication interface 218 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. As another example, communication interface 218 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. Wireless links may also be implemented. In any such implementation, communication interface 218 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.

[0024] Network link 220 typically provides data communication through one or more networks to other data devices. For example, network link 220 may provide a connection through local network 222 to data equipment operated by an Internet Service Provider (ISP) 226. ISP 226 in turn provides data communications through the worldwide packet data communication network, now commonly referred to as the "Internet" 228. Computer system 200 can send messages and receive data, including program codes, through the network(s), network link 220, and communication interface 218.

[0025] In view of the foregoing structural and functional features described above, methodologies in accordance with an example embodiment will be better appreciated with reference to FIGs. 3 and 4. While, for purposes of simplicity of explanation, the methodologies of FIGs. 3 and 4 are shown and described as executing serially, it is to be understood and appreciated that the example embodiments are not limited by the illustrated order, as some aspects could occur in different orders and/or concurrently with other aspects from that shown and described herein. Moreover, not all illustrated features may be required. The methodologies described herein are suitably adapted to be implemented in hardware, software when executed by a processor, or a combination thereof. For example, the methodologies of FIGs. 3 and 4 may be implemented by anti-phishing logic 106 in FIG. 1 or by processor 204 in FIG. 2.

[0026] FIG. 3 is a block diagram of a method 300 for determining whether a web page is a malicious web page. At 302, web page data is obtained. The web page data may suitably comprise visual information for the page. Screenshots may be collected to produce a raw-data extract of the web page. In particular embodiments, a document object model of the web page may be generated.

[0027] At 304, a visual analysis of the web page is performed. In an example embodiment, a color distribution analysis is executed in order to identify the percentage of predefined colors at positions associated with known web pages (e.g., the percentage of yellow in the top, left, center, middle, and total area). This set of information produces a "Page DNA" as described herein.

[0028] At 306, the Page DNA may be analyzed by an artificial intelligence ("AI") model as described herein to classify the web page as either trusted or malicious. If the page is classified as trusted (NO), at 308 the web page is allowed to load.

[0029] If, however, at 306, the web page is classified as malicious (YES), at 310 corrective action may be taken. The corrective action may include, but is not limited to, preventing the web page from loading, re-directing the browser to a known "safe" page, such as a blank page or "about:blank," providing an alert, or any combination of the aforementioned actions.
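
The control flow of method 300 can be summarized in a short sketch in which each step is supplied as a callable. The function name evaluate_page, the parameter names, and the "blocked"/"loaded" return values are assumptions of the example; the numbers in the comments refer to the blocks of FIG. 3.

```python
from typing import Any, Callable, Dict, Iterable


def evaluate_page(
    url: str,
    obtain: Callable[[str], Any],
    analyze: Callable[[Any], Dict[str, float]],
    classify: Callable[[Dict[str, float]], str],
    actions: Iterable[Callable[[str], None]],
) -> str:
    """Run the obtain / analyze / classify sequence and apply corrective actions."""
    page_data = obtain(url)           # 302: obtain web page data (e.g., a screen shot)
    dna = analyze(page_data)          # 304: visual analysis producing a Page DNA
    if classify(dna) == "malicious":  # 306: YES branch
        for act in actions:           # 310: any combination of corrective actions
            act(url)
        return "blocked"
    return "loaded"                   # 308: the page is allowed to load
```

Passing the steps in as callables keeps the sketch independent of how the screen shot is captured or which model performs the classification.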

[0030] FIG. 4 is a block diagram of a method 400 for performing a visual analysis. At 402, a color distribution analysis is executed in order to identify the percentage of predefined colors at positions associated with known web pages (e.g., the percentage of yellow in the top, left, center, middle, and total area). This set of information produces a "Page DNA".

[0031] At 404, the visual data is analyzed and a Page DNA is generated. The analysis comprises a statistical analysis of the visual elements of a page, including but not limited to, layout of page elements, color distribution, and other parameters resulting in a Page DNA as described herein.

[0032] At 406, the Page DNA is analyzed. The Page DNA is analyzed by an artificial intelligence model that classifies the DNA as trusted or malicious (e.g., a phishing attack). The Page DNA may be analyzed based on a machine learning approach based on an analysis of DNA collected from known phishing and trusted pages.

[0033] Described above are example embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the example embodiments, but one of ordinary skill in the art will recognize that many further combinations and permutations of the example embodiments are possible. Accordingly, it is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of any claims filed in applications claiming priority hereto interpreted in accordance with the breadth to which they are fairly, legally and equitably entitled.

Claims

CLAIM(S)

1. An apparatus, comprising:

a transceiver;

anti-phishing logic coupled with the transceiver and operable to obtain data from the transceiver;

a browser operably coupled with the anti-phishing logic and the transceiver;

wherein the browser is operable to request a web page;

wherein the anti-phishing logic is operable to obtain visual information for the web page; and

wherein the anti-phishing logic performs a statistical analysis of the visual information to determine whether the web page is similar to a known web page based on known patterns.

2. The apparatus according to claim 1, wherein the visual information comprises a color distribution; and

wherein the anti-phishing logic is operable to determine whether the color distribution matches a color distribution of the known web page.

3. The apparatus according to claim 1, wherein the transceiver is a wireless transceiver.

4. The apparatus according to claim 1, wherein the anti-phishing logic is operable to re-direct the browser to a predetermined web page responsive to determining the web page is attempting a phishing attack.

5. The apparatus according to claim 4, wherein the predetermined web page is a blank web page.

6. The apparatus according to claim 4, wherein the anti-phishing logic provides an alert indicating that the web page is a malicious web page.

7. The apparatus according to claim 4, wherein the anti-phishing logic is operable to store data representative of a uniform resource locator for the web page in a list of malicious web pages.

8. A method, comprising:

obtaining visual information for a web page requested by a browser; and

performing a statistical analysis of the visual information to determine whether the web page is similar to a known web page based on a list of known patterns.

9. The method set forth in claim 8, wherein the visual information comprises a color distribution; and

wherein the statistical analysis further comprises determining whether the color distribution matches a color distribution of the known web page.

10. A non-transitory, tangible computer readable medium of instructions with instructions encoded thereon for execution by a processor, and when executed operable to:

obtain data representative of a visual information of a web page requested by a browser;

statistically analyze the visual information to determine whether the web page is similar to a known web page based on a list of known patterns; and

determine, based on the statistical analysis, whether the web page is a malicious web page.

11. The computer readable medium set forth in claim 10, wherein the visual information comprises a color distribution; and

the instructions are further operable to determine whether the color distribution matches a color distribution of the known web page.

12. The computer readable medium set forth in claim 10, the instructions are further operable to re-direct the browser to a predetermined web page responsive to determining the web page is attempting a phishing attack.

13. The computer readable medium set forth in claim 12, the instructions are further operable to provide an alert indicating that the web page is a malicious web page.

14. The computer readable medium set forth in claim 12, the instructions are further operable to store data representative of a uniform resource locator for the web page in a list of malicious web pages.