CN115729560A - Program code processing method and device - Google Patents
- Publication number: CN115729560A
- Application number: CN202211465066.9A
- Authority
- CN
- China
- Prior art keywords
- function
- statement
- digest
- context
- program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Stored Programmes (AREA)
Abstract
Embodiments of the present specification provide a program code processing method and apparatus. In one implementation, the method includes: for a plurality of program statements included in a called first function in program code to be analyzed, identifying the context-sensitive statements and the context-insensitive statements among them; generating a first statement digest for the context-insensitive statements identified from the plurality of program statements; generating a first function digest of the first function, wherein the first function digest comprises the first statement digest and the context-sensitive statements identified from the plurality of program statements; and replacing a first function call point in the program code that calls the first function with the first function digest.
Description
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a program code processing method and device.
Background
Static program analysis generally refers to techniques that discover specific properties of a program without running it, by analyzing the program's source code or build artifacts. Depending on whether properties that span functions are handled, static program analysis can be divided into intra-function and inter-function analysis; inter-function static program analysis must trace data flow and control flow along program call links. As a basic capability for learning properties from programs, inter-function static program analysis has great application value. For example, in the security field it can track private information and prevent privacy leaks, and in the technical-risk field it can automatically discover possible bugs in programs.
Currently, many programs have function call links that are very long (e.g., up to hundreds of nodes) and very numerous (e.g., growing exponentially with the number of nodes). A reasonable and reliable scheme for processing the program code to be analyzed is therefore urgently needed, so as to improve the performance and precision of inter-function static program analysis.
Disclosure of Invention
Embodiments of the present specification provide a method and an apparatus for processing a program code, which can process a program code to be analyzed, so as to facilitate improving performance and accuracy of inter-function static program analysis.
In a first aspect, an embodiment of the present specification provides a program code processing method, including: for a plurality of program statements included in a called first function in program code to be analyzed, identifying context-sensitive statements and context-insensitive statements; generating a first statement digest for the context-insensitive statements identified from the plurality of program statements; generating a first function digest of the first function, the first function digest including the first statement digest and the context-sensitive statements identified from the plurality of program statements; and replacing a first function call point in the program code that calls the first function with the first function digest.
In some embodiments, after replacing the first function call point in the program code that calls the first function with the first function digest, the method further comprises: determining, based on context information of the context-sensitive statement in the first function digest, whether the context-sensitive statement satisfies a preset digest generation condition; and when the determination result is yes, generating a second statement digest for the context-sensitive statement based on the context information, and replacing, in the program code, the context-sensitive statement in the first function digest with the second statement digest.
In some embodiments, when the determination result is no, or after replacing the context-sensitive statement in the first function digest with the second statement digest, the method further comprises: generating a second function digest of a second function based on the current function body of the second function, the second function being the function in which the first function call point is located.
In some embodiments, after generating the second function digest of the second function, the method further comprises: replacing a second function call point in the program code that calls the second function with the second function digest.
In some embodiments, identifying the context-sensitive statements and the context-insensitive statements includes: for any program statement in the plurality of program statements, determining whether the program statement is a polymorphic call statement; if so, determining that the program statement is a context-sensitive statement; and if not, determining that the program statement is a context-insensitive statement.
In some embodiments, after replacing the first function call point in the program code that calls the first function with the first function digest, the method further comprises: if the input parameters of the first function call point include a first parameter that differs from the input parameters of the first function, updating, in the program code, the first statement digest in the first function digest based on the first parameter.
In a second aspect, an embodiment of the present specification provides a program code processing apparatus, including: an identification unit configured to identify context-sensitive statements and context-insensitive statements among a plurality of program statements included in a called first function in program code to be analyzed; a statement digest generation unit configured to generate a first statement digest for the context-insensitive statements identified from the plurality of program statements; a function digest generation unit configured to generate a first function digest of the first function, the first function digest including the first statement digest and the context-sensitive statements identified from the plurality of program statements; and a code processing unit configured to replace a first function call point in the program code that calls the first function with the first function digest.
In some embodiments, the apparatus further comprises: a determining unit configured to determine whether a context-sensitive statement in the first function digest satisfies a preset digest generation condition based on context information of the context-sensitive statement after the code processing unit replaces the first function call point in the program code with the first function digest; the statement digest generation unit is further configured to generate a second statement digest for the context-sensitive statement based on the context information when the determination result of the determination unit is yes; the code processing unit is further configured to replace, in the program code, the context sensitive statement in the first function digest with the second statement digest.
In a third aspect, the present specification provides a computer-readable storage medium, on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to execute the method described in any implementation manner of the first aspect.
In a fourth aspect, the present specification provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any implementation manner of the first aspect.
In a fifth aspect, the present specification provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the method described in any implementation manner of the first aspect.
The solution provided by the foregoing embodiments of this specification divides the program statements included in a called first function in the program code to be analyzed into context-sensitive statements and context-insensitive statements. A context-insensitive statement can be accurately abstracted into a statement digest without any context information. A context-sensitive statement is retained in the first function digest of the first function; replacing the first function call point that calls the first function with the first function digest then gives the context-sensitive statement context information, so that it can be accurately abstracted based on that information. This ensures high precision of the function digest. In addition, generating function digests for the functions in the program code to be analyzed enables digest-based inter-function static program analysis, which can improve performance compared with existing inter-function static program analysis techniques based on path traversal. The scheme therefore helps improve both the performance and the precision of inter-function static program analysis.
Drawings
To illustrate the technical solutions of the embodiments disclosed in this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments disclosed in this specification; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of one application scenario in which embodiments of the present description may be applied;
FIG. 2 is a flow diagram for one embodiment of a program code processing method;
FIG. 3 is a diagram of the presentation of updated program code;
FIG. 4 is a flow diagram of one embodiment of a program code processing method;
FIG. 5 is a schematic illustration of the presentation of updated program code;
FIG. 6 is a diagram of the presentation of updated program code;
FIG. 7 is a schematic structural diagram of the program code processing apparatus.
Detailed Description
The present specification will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. The described embodiments are only a subset of the embodiments described herein and not all embodiments described herein. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step are within the scope of the present application.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present description may be combined with each other without conflict.
As previously mentioned, many programs have function call links that are very long (e.g., up to hundreds of nodes) and large in number (e.g., growing exponentially with the number of nodes). To help improve the performance and accuracy of inter-function static program analysis, some embodiments of the present specification provide program code processing methods.
FIG. 1 is a schematic diagram of an application scenario in which embodiments of this specification may be applied. As shown in FIG. 1, the application scenario involves program code to be analyzed, which may include a plurality of functions, each including several program statements. Some of the functions contain function call points. Note that the program code shown in FIG. 1 is merely exemplary; the solutions provided in the embodiments of this specification can be applied to various program code.
Take the program statement "return foo(x, obj)" on line 28 of the program code shown in FIG. 1 as an example. The "foo(x, obj)" in this statement is a function call point that calls the foo function shown on line 22. With respect to this call point, the foo function is the called function, and the mid function, in which the call point is located, is the calling function. Similarly, in the program statement "return mid(new Y(), obj)" on line 31, "mid(new Y(), obj)" is a function call point that calls the mid function shown on line 27. With respect to this call point, the mid function is the called function, and the bar1 function, in which the call point is located, is the calling function.
In this application scenario, for a function called in the program code to be analyzed, for example, the foo function shown in line 22, a function digest may be generated for the foo function. To improve the precision of the function digest, before generating the function digest, each program statement of the foo function may be divided into a context-sensitive statement and a context-insensitive statement.
The context can be understood as the state preceding the execution of a statement, and generally includes the values of the input parameters of a function. A context-sensitive statement is one whose abstraction differs across contexts; a context-insensitive statement is one whose abstraction is the same in every context. In pointer analysis, for example, simple assignment statements are context-insensitive, while polymorphic call statements are context-sensitive.
Specifically, consider the program statement "X tx = id(x)" on line 24 in the foo function. Because the id function is non-polymorphic, executing this statement calls the id function defined in the class FacadeImpl (the class shown on line 17), regardless of the values of the input parameters (x and obj) of the foo function. The statement is therefore context-insensitive, and a statement digest can be generated for it directly.
For the program statement "return tx.poly(obj)" on line 25 in the foo function, the called function poly is polymorphic: whether the function call point "tx.poly(obj)" calls the poly function in class Y (see line 7) or the poly function in class Z (see line 13) depends on the value of the parameter x. The statement is therefore a polymorphic call statement and belongs to the context-sensitive statements.
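The distinction between the two statements can be made concrete with a small sketch. FIG. 1 is not reproduced in this text, so the following Python transliteration is an assumption: only the names X, Y, Z, FacadeImpl, id, poly, foo, mid and bar1 come from the description, while the class hierarchy (Y and Z as subclasses of X), the bar1 signature, and all method bodies are hypothetical.

```python
class X:
    def poly(self, obj):
        raise NotImplementedError


class Y(X):
    def poly(self, obj):
        # Hypothetical body: tags the result so the dispatch target is visible.
        return ("Y.poly", obj)


class Z(X):
    def poly(self, obj):
        return ("Z.poly", obj)


class FacadeImpl:
    def id(self, x):
        # Only FacadeImpl defines id, so this call has a single possible target.
        return x

    def foo(self, x, obj):
        tx = self.id(x)      # context-insensitive: same target for every x
        return tx.poly(obj)  # context-sensitive: target depends on the class of x

    def mid(self, x, obj):
        return self.foo(x, obj)

    def bar1(self, obj):
        return self.mid(Y(), obj)


facade = FacadeImpl()
print(facade.foo(Y(), "o")[0])
print(facade.foo(Z(), "o")[0])
```

Inside foo alone, nothing determines which poly is called; only a caller such as bar1, which passes a Y object, fixes the target.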
After each program statement in the foo function is divided into a context-sensitive statement and a context-insensitive statement, and a statement digest is generated for the context-insensitive statement, a function digest of the foo function may be generated, the function digest including the statement digest and the context-sensitive statement in each program statement. It is noted that by leaving the context sensitive statements intact in the function digest, a high accuracy of the function digest can be ensured.
After generating the function digest of the foo function, the function digest may be used to replace a function call point in the program code that calls the foo function, such as the function call point "foo (x, obj)" in line 28. Thus, context information can be given to the context sensitive statement in the function digest, so that the context sensitive statement can be accurately abstracted based on the given context information.
The specific steps of the above method are described below with reference to specific examples.
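Referring to FIG. 2, a flow 200 of one embodiment of a program code processing method is shown. The method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. The method comprises the following steps:
at step 202, context-sensitive statements and context-insensitive statements are identified for a number of program statements included in a called first function in the program code to be analyzed;
at step 204, a first statement digest is generated for the context-insensitive statements identified from the number of program statements;
at step 206, a first function digest of the first function is generated, the first function digest including the first statement digest and the context-sensitive statements identified from the number of program statements;
at step 208, a first function call point in the program code that calls the first function is replaced with the first function digest.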
The above steps are further explained below.
In step 202, context-sensitive statements and context-insensitive statements may be identified for a number of program statements included in a first function called in the program code to be analyzed.
Various identification modes can be adopted to identify whether the program statement is a context-sensitive statement or a context-insensitive statement.
Specifically, in one example, a context-sensitive statement identification service may be preset, the program code to be analyzed may be provided to the identification service, the identification service identifies a context-sensitive statement and a context-insensitive statement for a program statement included in a function in the program code, and returns an identification result. Thus, context-sensitive statements and context-insensitive statements of the several program statements comprised by the first function may be determined based on the recognition result.
In another example, for any of the program statements, it may be determined whether the program statement is a polymorphic call statement. If the program statement is determined to be a polymorphic call statement, it may be determined that the program statement is a context-sensitive statement. If it is determined that the program statement is not a polymorphic call statement, it may be determined that the program statement is a context insensitive statement.
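The polymorphic-call test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Statement` record, the `implementers` table, and the rule "a call is polymorphic when more than one class implements the callee" are all assumptions introduced here, since the text does not specify how polymorphism is detected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Statement:
    text: str                     # source text of the program statement
    callee: Optional[str] = None  # name of the called function, if any


def is_polymorphic_call(stmt: Statement, implementers: Dict[str, Set[str]]) -> bool:
    # Assumed rule: polymorphic means more than one class implements the callee.
    return stmt.callee is not None and len(implementers.get(stmt.callee, set())) > 1


def classify(body: List[Statement],
             implementers: Dict[str, Set[str]]) -> Tuple[List[Statement], List[Statement]]:
    """Split a function body into (context-sensitive, context-insensitive)."""
    sensitive = [s for s in body if is_polymorphic_call(s, implementers)]
    insensitive = [s for s in body if not is_polymorphic_call(s, implementers)]
    return sensitive, insensitive


# The two statements of foo as quoted in the description.
foo_body = [Statement("X tx = id(x)", callee="id"),
            Statement("return tx.poly(obj)", callee="poly")]
implementers = {"id": {"FacadeImpl"}, "poly": {"Y", "Z"}}
sensitive, insensitive = classify(foo_body, implementers)
```

With this table, the id call lands in the context-insensitive list and the poly call in the context-sensitive list, matching the classification given for the foo function.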
Next, in step 204, a statement digest (which may be referred to as a first statement digest) may be generated for the context-insensitive statements identified from the number of program statements. Note that the form of the digest varies with the analysis being performed; the specific form can be set according to actual requirements and is not limited herein.
Next, in step 206, a function digest of the first function (which may be referred to as the first function digest) may be generated. The first function digest may include a first statement digest and context-sensitive statements identified from the program statements.
Next, in step 208, each function call point in the program code that calls the first function (which may be referred to as a first function call point) may be replaced with the first function digest.
Take the foo function in the program code shown in FIG. 1 as an example. As described above, the program statement "X tx = id(x)" on line 24 in the foo function is a context-insensitive statement, and the program statement "return tx.poly(obj)" on line 25 is a context-sensitive statement. When the statement on line 24 is abstracted, the transfer relationship between the input parameter x and the return value tx (which follows from the program statements on lines 24 and 18-20) can be compressed into a single relation (given as a formula in the original publication and not reproduced here), namely that the points-to set of tx contains x. This relation may serve as the statement digest of the statement on line 24. The function digest generated for the foo function may then consist of, for example, this statement digest together with the retained context-sensitive statement "return tx.poly(obj)". Thereafter, the function digest may be used to replace the function call point "foo(x, obj)" on line 28 of the program code. The updated program code may be as shown in FIG. 3.
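Steps 204 to 208 can be sketched on the foo example. The representation below is hypothetical: a statement digest is modeled as a `Contains(target, source)` record meaning "the points-to set of target contains source", a function digest is a list mixing such records with retained statements, and call points are located through an explicit `callee` field. None of these names come from the original publication.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Statement:
    text: str
    callee: Optional[str] = None


@dataclass(frozen=True)
class Contains:
    """Statement digest: the points-to set of `target` contains `source`."""
    target: str
    source: str


DigestItem = Union[Contains, Statement]


def build_function_digest(body: List[Statement],
                          statement_digests: Dict[Statement, Contains]) -> List[DigestItem]:
    # Context-insensitive statements are replaced by their digest;
    # statements without a digest (context-sensitive ones) are kept verbatim.
    return [statement_digests.get(stmt, stmt) for stmt in body]


def replace_call_points(caller_body: List[DigestItem], callee: str,
                        digest: List[DigestItem]) -> List[DigestItem]:
    # Splice the callee's function digest in place of each call point.
    out: List[DigestItem] = []
    for item in caller_body:
        if isinstance(item, Statement) and item.callee == callee:
            out.extend(digest)
        else:
            out.append(item)
    return out


id_call = Statement("X tx = id(x)", callee="id")
poly_call = Statement("return tx.poly(obj)", callee="poly")
foo_digest = build_function_digest([id_call, poly_call],
                                   {id_call: Contains("tx", "x")})
mid_body = replace_call_points([Statement("return foo(x, obj)", callee="foo")],
                               "foo", foo_digest)
```

After the replacement, the body of mid holds the digest of foo, so the retained poly call now sits where the parameters of mid are visible.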
The scheme provided by the embodiment corresponding to FIG. 2 divides the program statements included in a called first function in the program code to be analyzed into context-sensitive statements and context-insensitive statements. A context-insensitive statement can be accurately abstracted into a statement digest without any context information. A context-sensitive statement is retained in the first function digest of the first function; replacing the first function call point that calls the first function with the first function digest then gives the context-sensitive statement context information, so that it can be accurately abstracted based on that information. This ensures high precision of the function digest. In addition, generating function digests for the functions in the program code to be analyzed enables digest-based inter-function static program analysis, which can improve performance compared with existing inter-function static program analysis techniques based on path traversal. The scheme therefore helps improve both the performance and the precision of inter-function static program analysis. It follows that the scheme helps improve the performance of the equipment that performs the inter-function static program analysis, as well as the accuracy of the data that equipment produces during the analysis.
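In one embodiment, after a first function call point in the program code that calls the first function is replaced with the first function digest, a context-sensitive statement may be abstracted as soon as its context information is sufficient. That is, a propagated context-sensitive statement is not always passed all the way to the root function.
Referring specifically to FIG. 4, a flow 400 of one embodiment of a program code processing method is shown. The method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. The method comprises the following steps:
at step 402, context-sensitive statements and context-insensitive statements are identified for a number of program statements included in a called first function in the program code to be analyzed;
at step 404, a first statement digest is generated for the context-insensitive statements identified from the number of program statements;
at step 406, a first function digest of the first function is generated, the first function digest including the first statement digest and the context-sensitive statements identified from the number of program statements;
at step 408, a first function call point in the program code that calls the first function is replaced with the first function digest;
at step 410, it is determined, based on context information of the context-sensitive statement in the first function digest, whether the context-sensitive statement satisfies a preset digest generation condition;
at step 412, when the determination result is yes, a second statement digest is generated for the context-sensitive statement based on the context information, and the context-sensitive statement in the first function digest is replaced in the program code with the second statement digest;
at step 414, a second function digest of the second function is generated based on the current function body of the second function in which the first function call point is located;
at step 416, a second function call point in the program code that calls the second function is replaced with the second function digest.
Steps 402-408 correspond to steps 202-208 in the embodiment corresponding to FIG. 2; for implementation details and technical effects, refer to the related description of that embodiment, which is not repeated here.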
In step 410, it may be determined whether the context-sensitive statement satisfies a preset digest generation condition based on the context information of the context-sensitive statement in the first function digest. The summary generation condition may be set according to actual requirements, and is not specifically limited herein. In one example, the digest generation condition may be, for example, that a class to which a called function in a context-sensitive statement belongs can be determined based on context information. When the result of the determination of step 410 is yes, step 412 may be performed. When the result of the determination of step 410 is negative, step 414 may be performed.
In step 412, a second statement digest may be generated for the context-sensitive statement based on the context information in response to a yes determination in step 410, and the context-sensitive statement in the first function digest is replaced in the program code with the second statement digest. Thereafter, step 414 may be performed.
When the determination result of step 410 is negative, or after step 412 is completed, a second function digest of the second function may be generated by executing step 414 based on the current function body of the second function where the first function call point is located. Wherein, when the determination result of step 410 is negative, the second function digest at least comprises the first function digest.
After steps 412 and 414 are executed in sequence, if there is no function call point in the program code for calling the second function, that is, if the second function is a root function, the execution of the process 400 may be ended. If there is a function call point in the program code that calls the second function, step 416 may then be performed.
In addition, when the determination result of step 410 is no, it may indicate that the context information of the context-sensitive statement in the first function digest is not enough for accurate abstraction of the context-sensitive statement, and it is necessary to continue to propagate the context-sensitive statement to the function that calls the second function. Based on this, after steps 410, 414 are performed in sequence, step 416 may be performed next.
In step 416, a second function call point in the program code that called the second function may be replaced with a second function digest. The subsequent execution process can be obtained by analogy based on the content related to the first function in the foregoing, and will not be described herein again.
In one embodiment, after step 408 is performed, and specifically before step 410 is performed, if the input parameters of the first function call point include a first parameter that differs from the input parameters of the first function, the first statement digest in the first function digest may be updated in the program code based on the first parameter.
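The parameter-based update can be read as a substitution of actual arguments for formal parameters. The sketch below assumes that reading and models a statement digest as a hypothetical `(target, source)` pair meaning "the points-to set of target contains source"; the function name and data shapes are illustrative only.

```python
from typing import List, Sequence, Tuple

ContainsPair = Tuple[str, str]  # (target, source)


def update_statement_digests(digests: List[ContainsPair],
                             formals: Sequence[str],
                             actuals: Sequence[str]) -> List[ContainsPair]:
    # Keep only the positions where the call point passes something other
    # than the formal parameter itself, then rewrite digest sources.
    renamed = {f: a for f, a in zip(formals, actuals) if f != a}
    return [(target, renamed.get(source, source)) for target, source in digests]


# Call point foo(x, obj): arguments equal the formals, so nothing changes.
unchanged = update_statement_digests([("tx", "x")], ["x", "obj"], ["x", "obj"])
# Call point mid(new Y(), obj): the formal x is bound to the allocation new Y().
updated = update_statement_digests([("tx", "x")], ["x", "obj"], ["new Y()", "obj"])
```

Only the second call point changes the digest, because it is the only one whose argument differs from the callee's formal parameter.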
Next, the steps performed after step 408 are described, taking the program code shown in FIG. 3 as an example. Referring to FIG. 3, the content of line 28 (shown as a formula in the original publication and not reproduced here) is the function digest of the foo function. The value of the input parameter x on line 27 can be regarded as the context of the context-sensitive statement "return tx.poly(obj)" in the function digest. In this example, tx is derived from the input parameter x of the mid function, so it remains undetermined whether the poly function in the context-sensitive statement comes from class Y or class Z. The context is therefore insufficient for accurate abstraction of the context-sensitive statement, and it may be determined that the statement does not satisfy the digest generation condition.
Next, a function digest of the mid function may be generated based on the current function body of the mid function. Here, the function digest of the mid function is the same as the function digest of the foo function. Thereafter, a function call point in the program code that calls the mid function, for example the function call point "mid(new Y(), obj)" on line 31, may be replaced with the function digest of the mid function. The updated program code may be as shown in FIG. 5.
Then, from the input parameter new Y() of the function call point, it can be known that tx comes from an object of class Y, so the statement digest on line 31 of the program code shown in FIG. 5 can be updated accordingly (the digests before and after the update are given as formulas in the original publication and are not reproduced here). Based on the context information of the context-sensitive statement "return tx.poly(obj)" on line 31, it can then be known that the poly function in the statement comes from class Y. At this point the context information can be considered sufficient, and the context-sensitive statement can be accurately abstracted. Specifically, the behavior of the poly function in class Y may be abstracted into a statement digest of the context-sensitive statement, and the context-sensitive statement on line 31 is replaced with this statement digest. The function digest on line 31 then consists of the updated statement digest and the newly generated statement digest. The updated program code may be as shown in FIG. 6.
Next, a function digest of the bar1 function may be generated based on the current function body of the bar1 function (the example digest is given as a formula in the original publication and is not reproduced here).
The scheme provided by the embodiment corresponding to fig. 4 can divide the program statements in the function into context-sensitive statements which can be accurately abstracted only by context information and context-insensitive statements which can be accurately abstracted without context information, so that the precision is not lost, and the high performance is maintained. In addition, when a context-sensitive statement is propagated, it is not always passed to the root function, but during propagation, if the context information is sufficient, it is directly abstracted.
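The propagation behavior of flow 400 can be sketched end to end on the foo, mid, bar1 chain. Everything below is a simplified model under stated assumptions: `Contains`, `PolyCall` and `Resolved` are hypothetical record types, the digest generation condition is taken to be "the receiver is bound to an allocation of the form new C()", and each call level is given as a pair of formal and actual parameter lists.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union


@dataclass(frozen=True)
class Contains:      # first statement digest: pts(target) contains source
    target: str
    source: str


@dataclass(frozen=True)
class PolyCall:      # retained context-sensitive statement
    receiver: str
    method: str


@dataclass(frozen=True)
class Resolved:      # second statement digest: call bound to one class
    cls: str
    method: str


Item = Union[Contains, PolyCall, Resolved]


def substitute(digest: List[Item], formals: Sequence[str],
               actuals: Sequence[str]) -> List[Item]:
    # Bind the formals of the callee to the arguments at the call point.
    binding = dict(zip(formals, actuals))
    return [Contains(i.target, binding.get(i.source, i.source))
            if isinstance(i, Contains) else i for i in digest]


def receiver_class(digest: List[Item], receiver: str) -> Optional[str]:
    # Assumed condition (step 410): the receiver flows from "new C()".
    for i in digest:
        if isinstance(i, Contains) and i.target == receiver:
            m = re.fullmatch(r"new (\w+)\(\)", i.source)
            if m:
                return m.group(1)
    return None


def abstract_if_possible(digest: List[Item]) -> List[Item]:
    out: List[Item] = []
    for i in digest:
        cls = receiver_class(digest, i.receiver) if isinstance(i, PolyCall) else None
        out.append(Resolved(cls, i.method) if cls else i)   # step 412, else keep
    return out


def propagate(digest: List[Item],
              call_chain: List[Tuple[Sequence[str], Sequence[str]]]) -> List[Item]:
    # Steps 408-416 repeated for each call point, from the innermost outward.
    for formals, actuals in call_chain:
        digest = abstract_if_possible(substitute(digest, formals, actuals))
    return digest


foo_digest = [Contains("tx", "x"), PolyCall("tx", "poly")]
in_mid = propagate(foo_digest, [(["x", "obj"], ["x", "obj"])])
in_bar1 = propagate(in_mid, [(["x", "obj"], ["new Y()", "obj"])])
```

In mid the poly call stays unresolved because tx still flows from a parameter; in bar1 the argument new Y() satisfies the assumed condition, so the call is abstracted there instead of being carried further up the call chain.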
With further reference to fig. 7, the present specification provides an embodiment of a program code processing apparatus, which corresponds to the method embodiment shown in fig. 2, and which may be applied to any device, platform, or cluster of devices, etc. having computing and processing capabilities.
As shown in FIG. 7, the program code processing apparatus 700 of the present embodiment includes: an identification unit 701, a statement digest generation unit 702, a function digest generation unit 703, and a code processing unit 704. The identification unit 701 is configured to identify context-sensitive statements and context-insensitive statements for a number of program statements included in a called first function in the program code to be analyzed; the statement digest generation unit 702 is configured to generate a first statement digest for the context-insensitive statements identified from the number of program statements; the function digest generation unit 703 is configured to generate a first function digest of the first function, the first function digest including the first statement digest and the context-sensitive statements identified from the number of program statements; and the code processing unit 704 is configured to replace a first function call point in the program code that calls the first function with the first function digest.
In some embodiments, the apparatus 700 may further include: a determining unit configured to determine, after the code processing unit 704 replaces the first function call point in the program code with the first function digest, whether the context-sensitive statement in the first function digest satisfies a preset digest generation condition based on context information of the context-sensitive statement. In addition, the statement digest generation unit 702 may be further configured to generate a second statement digest for the context-sensitive statement based on the context information when the determination result of the determining unit is yes. The code processing unit 704 may be further configured to replace, in the program code, the context-sensitive statement in the first function digest with the second statement digest.
In some embodiments, the function digest generation unit 703 may be further configured to: when the determination result of the determining unit is negative, or after the code processing unit 704 replaces the context sensitive statement in the first function digest with the second statement digest, a second function digest of the second function is generated based on the current function body of the second function where the first function call point is located.
In some embodiments, when the determination of the determining unit is negative, the second function digest comprises at least the first function digest, and the code processing unit 704 may be further configured to: after the function digest generation unit 703 generates the second function digest of the second function, the second function digest is used to replace the second function call point in the program code, at which the second function is called.
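By way of a hedged example, the propagation of a function digest along a call chain may be sketched as below. The tuple-based statement forms (`plain`, `new`, `call`, `vcall`), the rule that a virtual call can be abstracted once the concrete type of its receiver is known, and the reuse of the same variable name across functions are all simplifying assumptions of this sketch, not features stated in the embodiments.

```python
def resolve(item, known):
    # Abstract a pending virtual call once its receiver's concrete type is known.
    if isinstance(item, tuple) and item[0] == "vcall" and item[1] in known:
        return f"digest({known[item[1]]}.{item[2]})"
    return item  # already a digest, or context still insufficient


def summarize(name, program, cache=None):
    """Build the digest of `name`, inlining callee digests bottom-up."""
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    known = {}  # variable -> concrete type seen so far in this function body
    digest = []
    for stmt in program[name]:
        kind = stmt[0]
        if kind == "new":        # ("new", var, type)
            known[stmt[1]] = stmt[2]
            digest.append(f"digest({stmt[1]} = new {stmt[2]})")
        elif kind == "call":     # ("call", callee)
            for item in summarize(stmt[1], program, cache):
                digest.append(resolve(item, known))
        elif kind == "vcall":    # ("vcall", var, method)
            digest.append(resolve(stmt, known))
        else:                    # ("plain", text)
            digest.append(f"digest({stmt[1]})")
    cache[name] = digest
    return digest


program = {
    "leaf": [("plain", "x = 1"), ("vcall", "p", "run")],
    "mid": [("new", "p", "Dog"), ("call", "leaf")],
    "root": [("call", "mid")],
}
leaf_digest = summarize("leaf", program)
mid_digest = summarize("mid", program)
root_digest = summarize("root", program)
```

Here the digest of `leaf` still carries the unresolved call on `p`; the call is abstracted inside `mid`, where the type of `p` is known, so the digest that reaches `root` no longer contains any context-sensitive statement.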
In some embodiments, the identifying unit 701 may be further configured to: for any program statement in the plurality of program statements, determine whether the program statement is a polymorphic calling statement; if the program statement is determined to be a polymorphic calling statement, determine that the program statement is a context-sensitive statement; and if the program statement is determined not to be a polymorphic calling statement, determine that the program statement is a context-insensitive statement.
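The embodiments do not prescribe how a polymorphic calling statement is recognized. One possible approach, shown here purely as an assumption, is a class-hierarchy check: a call is treated as polymorphic when more than one class at or below the declared receiver type defines the called method. The `subclasses` and `defines` tables and all class names are hypothetical.

```python
def implementations(cls, method, subclasses, defines):
    """Collect the classes at or below `cls` that define `method`."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if method in defines.get(current, ()):
            found.add(current)
        stack.extend(subclasses.get(current, ()))
    return found


def is_context_sensitive(call, subclasses, defines):
    # `call` is a (declared receiver type, method name) pair.
    declared_type, method = call
    return len(implementations(declared_type, method, subclasses, defines)) > 1


subclasses = {"Animal": ["Dog", "Cat"]}
defines = {
    "Animal": {"run"},
    "Dog": {"run"},
    "Cat": {"run"},
    "Logger": {"log"},
}
```

Under these assumptions, a call to `run` on a receiver declared as `Animal` is classified as context-sensitive, whereas a call to `log` on `Logger`, or to `run` on a receiver declared as `Dog`, is classified as context-insensitive.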
In some embodiments, the code processing unit 704 may be further configured to: if the input parameters of the first function call point include a first parameter that differs from the input parameters of the first function, update the first statement digest in the first function digest in the program code based on the first parameter.
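A minimal sketch of such an update is given below, assuming that statement digests are plain strings and that the update amounts to renaming each formal parameter of the first function to the differing argument supplied at the call point. The string representation and the helper name `bind_arguments` are hypothetical.

```python
import re


def bind_arguments(statement_digests, formals, actuals):
    """Rename formal parameters to the call point's arguments inside each digest."""
    # Only parameters whose argument differs from the formal name need updating.
    mapping = {f: a for f, a in zip(formals, actuals) if f != a}
    if not mapping:
        return list(statement_digests)
    # Whole-word match, so the parameter `a` does not touch identifiers like `tainted`.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    # A single pass performs all renames simultaneously, so swapped names stay correct.
    return [pattern.sub(lambda m: mapping[m.group(1)], d) for d in statement_digests]


updated = bind_arguments(["ret <- a + 1", "b.tainted"], ["a", "b"], ["x", "b"])
swapped = bind_arguments(["a -> b"], ["a", "b"], ["b", "a"])
```

Performing the renames in one pass is a deliberate choice in this sketch: sequential replacement would corrupt the digest when an argument name coincides with another formal parameter.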
In the embodiment of the apparatus corresponding to fig. 7, the detailed processing of each unit and the technical effect thereof can refer to the related description of the method embodiment in the foregoing, and are not repeated herein.
Embodiments of the present specification also provide a computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to execute the program code processing method described in each of the above method embodiments.
The embodiment of the present specification further provides a computing device, which includes a memory and a processor, where the memory stores executable codes, and the processor executes the executable codes to implement the program code processing methods respectively described in the above method embodiments.
Embodiments of the present specification also provide a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the program code processing method described in each of the above method embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The foregoing specific embodiments further describe in detail the objects, technical solutions, and advantages of the embodiments disclosed in this specification. It should be understood that they are merely specific embodiments and are not intended to limit the scope of the embodiments disclosed in this specification; any modification, equivalent substitution, improvement, or the like made on the basis of the technical solutions of the embodiments disclosed in this specification shall fall within that scope.
Claims (10)
1. A program code processing method, comprising:
identifying context-sensitive statements and context-insensitive statements among a plurality of program statements included in a first function called in program code to be analyzed;
generating a first statement digest for a context-insensitive statement identified from the number of program statements;
generating a first function digest of the first function, including the first statement digest, and context-sensitive statements identified from the plurality of program statements;
replacing a first function call point in the program code that calls the first function with the first function digest.
2. The method of claim 1, wherein after said replacing a first function call point in said program code that called said first function, further comprising:
determining, based on context information of the context-sensitive statement in the first function digest, whether the context-sensitive statement satisfies a preset digest generation condition;
when the determination result is yes, generating a second statement digest for the context-sensitive statement based on the context information, and replacing, in the program code, the context-sensitive statement in the first function digest with the second statement digest.
3. The method of claim 2, wherein when the determination is negative, or after replacing the context-sensitive statement in the first function digest with the second statement digest, further comprising:
generating a second function digest of a second function based on the current function body of the second function in which the first function call point is located.
4. The method of claim 3, wherein after said generating a second function digest of said second function, further comprising:
replacing a second function call point in the program code that calls the second function with the second function digest.
5. The method of claim 1, wherein identifying context-sensitive statements and context-insensitive statements for a number of program statements included in the first function to be invoked in the program code to be analyzed comprises:
for any program statement in the plurality of program statements, determining whether the program statement is a polymorphic calling statement;
if the program statement is determined to be a polymorphic calling statement, determining that the program statement is a context-sensitive statement;
if the program statement is determined not to be a polymorphic calling statement, determining that the program statement is a context-insensitive statement.
6. The method of claim 1, wherein after said replacing a first function call point in said program code that called said first function, further comprising:
if a first parameter different from the input parameter of the first function exists in the input parameters of the first function call point, updating the first statement digest in the first function digest in the program code based on the first parameter.
7. A program code processing apparatus comprising:
an identifying unit configured to identify context-sensitive statements and context-insensitive statements among a plurality of program statements included in a first function called in program code to be analyzed;
a statement digest generation unit configured to generate a first statement digest for a context-insensitive statement identified from the number of program statements;
a function digest generation unit configured to generate a first function digest of the first function, including the first statement digest, and context-sensitive statements identified from the plurality of program statements;
a code processing unit configured to replace a first function call point in the program code that calls the first function with the first function digest.
8. The apparatus of claim 7, further comprising:
a determining unit configured to determine whether a context-sensitive statement in the first function digest satisfies a preset digest generation condition based on context information of the context-sensitive statement after the code processing unit replaces the first function call point in the program code with the first function digest;
the statement digest generation unit is further configured to generate a second statement digest for the context-sensitive statement based on the context information when the determination result of the determination unit is yes;
the code processing unit is further configured to replace, in the program code, the context sensitive statement in the first function digest with the second statement digest.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to carry out the method of any one of claims 1-6.
10. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211465066.9A CN115729560B (en) | 2022-11-22 | 2022-11-22 | Program code processing method and device |
PCT/CN2023/111941 WO2024109167A1 (en) | 2022-11-22 | 2023-08-09 | Program code processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211465066.9A CN115729560B (en) | 2022-11-22 | 2022-11-22 | Program code processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115729560A true CN115729560A (en) | 2023-03-03 |
CN115729560B CN115729560B (en) | 2024-05-17 |
Family
ID=85297227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211465066.9A Active CN115729560B (en) | 2022-11-22 | 2022-11-22 | Program code processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115729560B (en) |
WO (1) | WO2024109167A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024109167A1 (en) * | 2022-11-22 | 2024-05-30 | 支付宝(杭州)信息技术有限公司 | Program code processing method and apparatus |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7392545B1 (en) * | 2002-01-18 | 2008-06-24 | Cigital, Inc. | Systems and methods for detecting software security vulnerabilities |
US20080229286A1 (en) * | 2007-03-14 | 2008-09-18 | Nec Laboratories America, Inc. | System and method for scalable flow and context-sensitive pointer alias analysis |
CN101894064A (en) * | 2009-05-21 | 2010-11-24 | 北京邮电大学 | Method for testing software by applying across function analysis |
CN103744776A (en) * | 2013-11-04 | 2014-04-23 | 北京邮电大学 | Static analysis method and system based on symbolic function abstracts |
US20140344633A1 (en) * | 2013-05-15 | 2014-11-20 | Oracle International Corporation | Path-sensitive analysis framework for bug checking |
US20150067660A1 (en) * | 2013-08-27 | 2015-03-05 | International Business Machines Corporation | Building reusable function summaries for frequently visited methods to optimize data-flow analysis |
US20150220419A1 (en) * | 2014-02-06 | 2015-08-06 | NATIONAL ICT AUSTRALIA LlMITIED | Analysis of program code |
CN107193742A (en) * | 2017-05-23 | 2017-09-22 | 电子科技大学 | A kind of symbolism function digest algorithm of path-sensitive based on state |
US20200042706A1 (en) * | 2018-07-31 | 2020-02-06 | Oracle International Corporation | Taint analysis with access paths |
US10719424B1 (en) * | 2019-03-18 | 2020-07-21 | Oracle International Corporation | Compositional string analysis |
CN115098108A (en) * | 2022-06-22 | 2022-09-23 | 南京邮电大学 | Lightweight context sensitive pointer analysis method based on high-order function |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104335B (en) * | 2019-12-25 | 2021-08-24 | 清华大学 | C language defect detection method and device based on multi-level analysis |
CN115729560B (en) * | 2022-11-22 | 2024-05-17 | 支付宝(杭州)信息技术有限公司 | Program code processing method and device |
Non-Patent Citations (5)
Title |
---|
ONDŘEJ LHOTÁK等: "Evaluating the Benefits of Context-Sensitive Points-to Analysis Using a BDD-Based Implementation", 《ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY》, 7 October 2008 (2008-10-07), pages 1 - 53, XP058153117, DOI: 10.1145/1391984.1391987 * |
PENGHUI LI等: "LChecker:一种用于检测PHP松散比较错误的工具", pages 1 - 13, Retrieved from the Internet <URL:https://www.anquanke.com/post/id/243936> * |
王留帅: "基于函数摘要的C++过程间静态分析研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, 15 February 2018 (2018-02-15), pages 138 - 252 * |
肖庆等: "提高静态缺陷检测精度方法", 《计算机辅助设计与图形学学报》, no. 11, 15 November 2010 (2010-11-15), pages 2037 - 2044 * |
胡成杰: "Java语言基于函数摘要的全局分析静态测试方法", 《计算机研究与发展》, 15 June 2010 (2010-06-15), pages 64 - 68 * |
Also Published As
Publication number | Publication date |
---|---|
WO2024109167A1 (en) | 2024-05-30 |
CN115729560B (en) | 2024-05-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||