Plugin Architecture GSoC 2017 Final Report
Yash D. Saraf <yashdsaraf@gmail.com>
The purpose of this project was to create a decoupled plugin architecture for ScanCode, such that plugins can run at different stages of a scan and can be coupled at runtime. These stages were:
1. Format:
In this stage, the plugins are supposed to run after the scanning is done and after the post-scan plugins have been called. These plugins could be used for
- converting the scanned output to the given format (say csv, json, etc.)
Here, a plugin needs to add an entry in the scancode_output_writers entry point in the following format
'<format> = <module>:<function>'
- <format> is the format name which will be used as the command line option name (e.g. csv or json).
- <module> is a Python module which implements the output hook specification.
- <function> is the function to which the scan output will be passed if this plugin is called.
The <format> name will be automatically added to the --format command line option and (if called) the scanned data will be passed to the plugin.
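As a rough illustration, such a registration might look like the sketch below. The entry point group name comes from this report; the module path myplugin.formats.csv, the function name write_csv and its (scanned_files, output_file) signature are assumptions made for this example. The real arguments are defined by the output hook specification.

```python
import csv
import io

# Hypothetical setup.py fragment: only the group name is taken from the
# report; the module path and function name are made up for this example.
ENTRY_POINTS = {
    'scancode_output_writers': [
        'csv = myplugin.formats.csv:write_csv',
    ],
}


def write_csv(scanned_files, output_file):
    """
    Write one CSV row per scanned file to output_file.
    NOTE: this signature is an assumption, not the real hook signature.
    """
    writer = csv.writer(output_file, lineterminator='\n')
    writer.writerow(['path', 'type'])
    for scanned in scanned_files:
        writer.writerow([scanned['path'], scanned['type']])


def parse_entry_point(spec):
    """Split '<format> = <module>:<function>' into its three parts."""
    name, _, target = spec.partition('=')
    module, _, function = target.strip().partition(':')
    return name.strip(), module, function
```

Here parse_entry_point only demonstrates how the three parts of the entry map to the option name, the module and the function; in practice the packaging tools resolve entry points themselves.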
2. Post-scan:
In this stage, the plugins are supposed to run after the scanning is done. Some uses for these plugins were
- summarization of scan outputs
  e.g. a post-scan plugin for marking is_source as true for directories with ~90% source files
- simplification of scan outputs
  e.g. the --only-findings option, which returns only files or directories with findings for the requested scans. Files and directories without findings are omitted (basic file information is not considered a finding).
  This option already existed; I just ported it to a post-scan plugin.
Here, a plugin needs to add an entry in the scancode_post_scan entry point in the following format
'<name> = <module>:<function>'
- <name> is the command line option name (e.g. only-findings).
- <module> is a Python module which implements the post_scan hook specification.
- <function> is the function to which the scanned files will be passed if this plugin is called.
The command line option for this plugin will be automatically created using the <function>'s docstring as its help text and (if called) the scanned files will be passed to the plugin.
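A minimal sketch of such a plugin function is shown below, assuming that scanned files are plain dictionaries and that path, type and size count as basic file information. Both are assumptions for this example, as is the module path myplugin.post_scan; this is not the ported --only-findings implementation.

```python
import inspect


def only_findings(scanned_files):
    """
    Only return files or directories with findings for the requested scans.
    """
    # Keys treated as basic file information, not findings (assumed list).
    basic_info = {'path', 'type', 'size'}
    for scanned in scanned_files:
        # Keep the entry if any non-basic key holds a non-empty value.
        if any(value for key, value in scanned.items() if key not in basic_info):
            yield scanned


# Hypothetical registration line for the scancode_post_scan entry point.
ENTRY_POINT = 'only-findings = myplugin.post_scan:only_findings'

# The docstring doubles as the help text of the generated option.
help_text = inspect.getdoc(only_findings)
```

Keeping the help text in the docstring means a plugin author documents the option in one place only.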
3. Pre-scan:
In this stage, the plugins are supposed to run before the scan starts. So the potential uses for these types of plugins were to
- ignore files based on a given pattern (glob)
- ignore files based on their info i.e size, type etc.
- extract archives before scanning
Here, a plugin needs to add an entry in the scancode_pre_scan entry point in the following format
'<name> = <module>:<class>'
- <name> is the command line option name (e.g. ignore).
- <module> is a Python module which implements the pre_scan hook specification.
- <class> is the class which is instantiated and its appropriate method is invoked if this plugin is called. This needs to extend the plugincode.pre_scan.PreScanPlugin class.
The command line option for this plugin will be automatically created using the <class>'s docstring as its help text.
Since there isn't a single spot where pre-scan plugins can be plugged in, more methods can be added to the PreScanPlugin class to represent different hooks. For example, to add or delete a scan, there might be a method called process_scan.
If a plugin's option is passed by the user, then the <class> is instantiated with the user input and the appropriate hook methods described above are called.
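The class-based approach could be sketched as follows. The PreScanPlugin class here is a stand-in defined locally so that the example runs on its own; its constructor argument, the process_resources method name and the module path myplugin.pre_scan are all assumptions, not the actual plugincode.pre_scan API.

```python
import fnmatch


class PreScanPlugin:
    """
    Stand-in for plugincode.pre_scan.PreScanPlugin so that this sketch runs
    on its own. The constructor argument and method name are assumptions.
    """

    def __init__(self, user_input):
        self.user_input = user_input

    def process_resources(self, paths):
        # Hypothetical hook: by default, pass every path through unchanged.
        return paths


class Ignore(PreScanPlugin):
    """
    Ignore files matching <pattern>.
    """

    def process_resources(self, paths):
        # Drop every path whose file name matches the user-supplied glob.
        return [
            path for path in paths
            if not fnmatch.fnmatch(path.rsplit('/', 1)[-1], self.user_input)
        ]


# Hypothetical registration line for the scancode_pre_scan entry point.
ENTRY_POINT = 'ignore = myplugin.pre_scan:Ignore'
```

A call such as Ignore('*.pyc').process_resources(paths) would then filter the path list before the scan starts. Further hooks would be added as extra methods on the base class, each with a pass-through default.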
4. Scan (proper):
In this stage, the plugins are supposed to run before the scan starts and after the pre-scan plugins have been called. These plugins would have been used for
- adding or deleting scans
- adding dependency scans (whose data could be used in other scans)
No development has been done for this stage, but it will be quite similar to pre-scan.
Group cli options in cli help
Here, the goal was to add command line options to pre-defined groups such that they are displayed in their respective groups when scancode -h or scancode --help is called. This helped to better visually represent the command line options and determine more easily what context they belong to.
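The grouping idea can be illustrated with the standard library's argparse argument groups. This is a swapped-in technique for illustration only, not ScanCode's own command line code, and the group names are assumptions; the option names are the ones mentioned in this report.

```python
import argparse

parser = argparse.ArgumentParser(prog='scancode')

# Each group gets its own titled section in the help output.
pre_scan = parser.add_argument_group('pre-scan')
pre_scan.add_argument(
    '--ignore', metavar='<pattern>',
    help='Ignore files matching <pattern>.')

post_scan = parser.add_argument_group('post-scan')
post_scan.add_argument(
    '--only-findings', action='store_true',
    help='Only return files or directories with findings.')

output = parser.add_argument_group('output')
output.add_argument(
    '--format', metavar='<format>',
    help='Set the output format.')

# Render the help text; every option appears under its group title.
help_text = parser.format_help()
```

Printing help_text shows each option under its group title, which is the effect described above for scancode -h.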
Add a Resource class to hold all scanned info for a resource (Ongoing)
Here, the goal was to create a Resource class that holds all the scanned data for a resource (i.e. a file or a directory).
This class will eventually encapsulate the caching logic entirely. For now, it just holds the info and path of a resource.
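In its current form, such a class could be as small as the sketch below. The attribute names path and info follow the description above, but their types and the use of a dataclass are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """
    Hold the scanned data for one resource (a file or a directory).
    """
    # Location of the file or directory that was scanned.
    path: str
    # Scanned info for this resource; a plain dict is an assumed type.
    info: dict = field(default_factory=dict)


resource = Resource(path='src/a.py')
resource.info['size'] = 10
```

Using default_factory gives every Resource its own info dictionary, so data from one resource never leaks into another.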
- Pre-scan plugin for archive extraction
- Scan (proper) plugins
- More complex post-scan plugins
- Support plugins written in languages other than python
Additionally, all my commits can be found here.
See http://nexb.com for more.