-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The current Java source file analyser doesn't detect implicit package usages #1
Comments
Spoon might help. |
Thanks for letting me know about this tool, I didn't know it, and it looks very interesting. I will investigate as soon as I will have time. The About page says Spoon is developed at Inria Lille, and your GitHub profile also says you're from Lille. |
Not a coincidence ;-) I have been using this library in past projects and know some team members. |
Hmm I think there is already a Java parser with the JDK as part of the Compiler API. What are the required inputs and outputs needed ? - Would it be that given a set of Java source files to parse, a unique list of import strings would be returned covering the standard imports plus any implicit import as in the example e.g. a.getB().getC() ? |
Yes, that's exactly it. For each Java file, I need the list of packages explicitly or implicitly used by that file. Does the JDK Java parser have an API directly usable from Java? |
I think it does yes, you can use the compile api parse() method to return a tree of compile unit declarations (classes etc) and then traverse the tree to get other tree parts such as packages etc. I don't know yet about the case of transitive but a little prototype may help discover more detail. |
Yes, if you can make a prototype, that would be wonderful. It can be a separate project in your own repo to start with. |
Hi Bruno, I tried out using the java parser and came to.a stop as the parser could not extract A.getB().grtC() as it had not enough information. I've looked at the Apache BCEL byte code engineering library which works on .class files. I think we can get the information using that. Each class file has a constant pool and a string pool. I think we can use BCEL to get the classes used from the constant pool - it has an index that points to a string in the string pool and then we get the package and class name from the sting pool. The strings need a little manipulation to cope with arrays etc. This means that the cli export would need to pass in .class files after the maven compile phase (so the pre package phase would do) |
Hi Alex, thanks for this research that saves me a lot of time. I don't think we can use any tool based on byte code such as Apache BCEL, because the whole idea of the WebFX CLI is to create the build chain from an analysis of the source code, and you can't get the byte code without the build chain... Byte code could be ok for third-party libraries but not for the project itself. It's only after a WebFX update pass that the build chain is ready to compile a WebFX project. So the WebFX CLI can't require any byte code as input to do its update pass. So I think we need to keep investigating alternative Java source parsers. |
Can I ask in the case of A.getB().getC() would the source files for A.java, B.java and C.java be supplied ? If this is the case then just parsing each class would build up a list of packages and classes using the prototype I've written so far with further work would do that. I suspect you would need a parser that follows the source path for each file to resolve the A.getB().getC() scenario - quick look at JavaParser I was only supplying a single source file.. Just thinking if there is another way to get the information needed. |
This is the tricky part!.. It would be inefficient in terms of performance to provide straightaway all the Java sources that may be used by the WebFX project. At the beginning, the input should be the project source files only. Then let's say one of these Java files uses A.getB().getC() and A is not part of the project but part of a library declared in webfx.xml, the parser should at least report at this point the package that class A is belonging to (but not B and C at this point). Ideally, it would be great if the parser had a callback feature to request more sources. Then the WebFX CLI could search which library this package belongs to among the declared libraries (this part is already implemented), and once found, pass it back to the parser so it can keep going, and eventually ask about B & C if these classes are also declared in other libraries. At the end of this process, we should have the complete information. That callback feature would be the ideal Java Parser API for us. Can you check if the parser you're investigating provides one? If not, maybe we can try to make a process with several passes. The first pass would be with the project sources only. The result of this first parsing would be partial because of the missing sources, but at least we should be able to collect all external packages. Then the WebFX CLI can resolve these packages, and we start another pass with the scope extended (project + libraries declaring these packages). And we repeat that process until there is no more external packages. It will be less efficient in terms of performance to have several passes, but it should work. |
Just to share my investigation on Spoon, I didn't go far as I was stopped by a few bugs (see the issues I opened here and here). These bugs are quite basic but I guess this is because Spoon has not been intensively tested with a partial set a sources (probably most people provide a full set). That said, I'm not concluding that Spoon is not the right tool. It's apparently possible to do incremental parsing (no need to parse everything again when the scope increases), which is a very good point for our use case with several passes. I didn't want to investigate more because of the bugs, but I will restart once they fix them (these issues are also a good opportunity to see how active their project is). |
I investigated JavaParser and it looked like up to JDK 13 is supported but beyond that not yet - lacking effort. The code itself is quite similar in some ways to the JDK compiler API regarding source level parsing and class API, there is some statement evaluation but not useful enough for our needs. The java compiler API is better as it is up to date i.e. JDK 20+ I have gone back to my prototype code and I'm working,albeit gradually, to get the prototype code to extract all class, field, method parameter, method variable, method return type, inner declared classes, exceptions, annotations, generic types, records, enum objects, interfaces etc from a file and hold them in a package and class list. I will then resolve packages the classes etc belong to with a set of rules and then have a callable interface to ask cli 'does this package and class exist ?' - if cli can't resolve to a package path and class then it would return null i.e package path / file not found. The parser would then try a different package and class combination based on the imports for the current file. If cli found a file then that would then be passed in to parse and the cycle continue recursively. I would also keep track of visited files to prevent a circularity error. In the webfx.xml I think there would still need to be a manual section for dynamically loaded classes as these would not get parsed and possibly other case where sources were not available but were needed in order to fully resolve a package. As it is Easter I'm away so possibly do some more next weekend. |
Yes and no. Spoon relies on the eclipse compiler (JDT) to analyze the source code and then transforms its internal representation into a more high-level format. If you give it an incomplete classpath, JDT will guess its way through it and spoon tries to fix up the results. That part of spoon is quite error prone compared to having the full classpath though — with an incomplete classpath many java classes are suddenly ambiguous. Historically, these issues were handled whenever they crept up (and were at least partially fixable). As spoon does not maintain its own compiler, the semantic analysis it can perform on invalid code is limited by what JDT offers. Spoon just massages the problem AST nodes created by JDT to squeeze out a more plausible AST, but it fundamentally cannot and does not always produce correct results. This area of spoon also hasn't seen the most love: Quite often there are semantic clues spoon could use to disambiguate cases. That being said, Spoon is able to pick up on classes in your classpath. If you have compiled libraries, just supplying them as additional classpath resources and only giving it the project's source code should also lift you into a much better position. Spoon does not need to analyze the whole source of all libraries, just reflectively introspect the types referenced by your code. Another project you could look at (though the docs are severely lacking) is IntelliJ PSI. I haven't used it much, but I would expect it to also handle incomplete types gracefully. It's the backbone of their IDE, after all. |
Thanks @I-Al-Istannen for your comment and explanation. Regarding your suggestion to supply the additional classpath to libraries, it doesn't fit in my use case as I'm writing a tool whose aim is to (re)create the Maven build chain (pom.xml, etc...) of a project from an analysis of its source code alone. So the starting point is necessarily an incomplete classpath (the project source code without the libraries) and the source code analysis must return the external missing packages that my tool will then resolve by finding the appropriate libraries in Maven remote repositories (using some minimal information). So it's a bit the opposite way as a normal usage. I saw you submitted a PR for the issue I raised, thank you very much for this contribution. I will restart my tests with Spoon once your PR will be merged. If it's still not matching my needs, I will give a try to IntelliJ PSI. Thanks also for that suggestion. |
Bruno, can you see my webfx-prototype project on AlexBelchUK GitHub or do I need to publish to make visible ? It's still work in progress at the moment though |
@AlexBelchUK, thanks for your previous feedback Alex, I was also unavailable during the Easter weekend. Ok for continuing your investigation with the JDK compiler API 👍 Yes I can see your prototype and clone it, let me know how it works. If you give it A.java and B.java as input (as described in that Spoon issue), will it detect that a.A is using c.C (and therefore a.A is using package c)? |
In theory yes - it does it by following each source file - if given
A.getB().getC() the parser will find the first class A store its name in
the list, it will then find method getB() and the return type B and add
it's name to the list, and then look in class B for getC() method and
return type C and store class C name in the list, and so on until class D
and finally store that. For each class stored, the main class is also
stored and the associated list of imports - either wildcard com.abc.*
Imports or imports with a class name com.abc.ClassName. The resolver class
then applies matching rules to see which package a stored class name comes
from. It matches by comparing import lists, primary class names to the end
of string for class name imports for example. Other resolve methods are
reflection of java.lang classes, other part compiled classes on the
classpath and will also ask cli if it has a file for the package and class
combination. Once a package name is known it is stored with the class once
all resolution is done a unique list of package names is returned. The
story is more complicated however and the parser will look at and find all
non primitive objects eg exceptions, records, interfaces, and try to
resolve them too. That's the theory :-)
Alex
…On Tue, 11 Apr 2023, 20:30 Bruno Salmon, ***@***.***> wrote:
@AlexBelchUK <https://github.com/AlexBelchUK>, thanks for your previous
feedback Alex, I was also unavailable during the Easter weekend. Ok for
continuing your investigation with the JDK compiler API 👍
Yes I can see your prototype and clone it, let me know how it works. If
you give it A.java and B.java as input (as described in that Spoon issue
<INRIA/spoon#5150>), will it detect that a.A is
using c.C (and therefore a.A is using package c)?
—
Reply to this email directly, view it on GitHub
<#1 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABXRM2EE5YQK4ENKZM65CI3XAWWN5ANCNFSM54IHGELQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@AlexBelchUK I tried to use your prototype but got a NPE ( I suggest we continue with concrete simple cases, so I created this webfx-cli-cases repository, where I will put some concrete cases we can work on. The first case I created this webfx-cli-spoon repository, and the class SpoonCase1 analyses
Could you please write a |
Hi Bruno,
|
Hi Alex, no worries and no pressure, we all have busy lives. It was just my wrong interpretation that you already had some results to share with your prototype repository, sorry... Yes, Spoon recognises records. It's based on the eclipse compiler (JDT), as I-Al-Istannen explained above, and it supports Java level up to 19. Yes the Spoon API is good for the WebFX CLI, but not the runtime, too buggy at the moment when used with incomplete classpath... But this week they fixed one of the 2 issues I submitted (the simplest one), so we will see how it goes. So, Spoon is uncertain at this stage, and I'm still interested in your prototype. I understand it's a lower level to directly work with the JDK compiler, but if we can make it work, it's better as we won't be dependent on another project to fix the bugs for our use case. Also it will be probably faster, because it looks like Spoon is building a lot of stuff on top of the compiler before giving the report, and the performance of Let me know if you make any progress, but take your time, I know it's an additional effort to your job, which makes your contribution even more appreciable :) |
The current Java source analyser is a simple implementation based on regular expressions. It can detect only packages that are explicitly written in the Java source file (such as the ones listed in the imports, or in the code when using a fully qualified class name). However, it's possible to have implicit usages of packages, and the CLI currently doesn't detect them, like for example in the following code:
So it needs to be replaced with a much more elaborate Java syntax parser, able to detect the getB() call and tell what type is the returned value (through an analyse of the ClassA source).
There are probably other cases of implicit package usage, such as those coming from inheritance.
Any volunteer to work on this issue?
The text was updated successfully, but these errors were encountered: