Building a GitHub Dependents Scraper with Quarkus and Picocli

2020-07-31 in Java tagged Client / GitHub / Java / Eclipse JKube / Picocli / Quarkus by Marc Nuri | Last updated: 2021-10-09

Introduction

During the past few months, my team and I have been working very hard to release Eclipse JKube. JKube is the successor of the deprecated Fabric8 Maven Plugin, and as such, our main goal right now is to migrate the current user-base to the new project. You can learn more about JKube and how to get started in this other post.

GitHub provides some fancy stats and metrics, including information about the project’s dependency graph. This information is really valuable since we get to know which projects (within GitHub) depend on ours. So for our user-base migration use case, this information is spot on. Unfortunately, the GitHub developers API offers information about dependencies, but not about dependents.

In this blog post, I will show you how to create a simple web scraper using Picocli and Quarkus to build a native binary that will scrape dependents for any GitHub project.

Note: Make sure you comply with GitHub Scraping and API Usage Restrictions before using this tool.

Application bootstrap

The first step is to bootstrap the project. We can use the handy code.quarkus.io web interface to generate it. In this case, we only need the experimental Picocli extension. We’ll also need to add the jsoup dependency to our pom.xml.

Since the application will scrape the GitHub website, which is served over HTTPS, we need to enable the https URL protocol for the GraalVM build. The Quarkus Maven Plugin provides a simple configuration option that lets us pass additional build arguments to GraalVM.

<!-- ... -->
<build>
  <plugins>
    <plugin>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>package</id>
          <goals>
            <goal>native-image</goal>
          </goals>
          <configuration>
            <dockerBuild>${dockerBuild}</dockerBuild>
            <additionalBuildArgs>
              <additionalBuildArg>-H:EnableURLProtocols=https</additionalBuildArg>
              <additionalBuildArg>-H:EnableURLProtocols=http</additionalBuildArg>
            </additionalBuildArgs>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
<!-- ... -->

Quarkus Command line application

Next up, we will make our application command-line friendly by using Picocli. There are a few guides online that will help us achieve this, and even a full section in the quarkus-cheat-sheet.

Application Main Class

Our application is really simple and has a single entry point, so to make things easier, we’ll annotate the application’s main class with @CommandLine.Command. This is the resulting code:

import java.net.MalformedURLException;
import java.net.URL;
// Assumption: javax namespace, as used by Quarkus when this post was written
// (recent Quarkus versions use jakarta.inject.Inject instead)
import javax.inject.Inject;
import picocli.CommandLine;

@CommandLine.Command(name = "github-dependents")
public class Application implements Runnable {

  private final ScraperService scraperService;

  // Single required positional argument; label and description feed the usage help
  @CommandLine.Parameters(index = "0", paramLabel = "URL", arity = "1",
    description = "GitHub URL to the project's dependents list")
  String dependentsUrl;

  @Inject
  public Application(ScraperService scraperService) {
    this.scraperService = scraperService;
  }

  @Override
  public void run() {
    try {
      // Constructed only to validate the syntax; throws MalformedURLException if invalid
      new URL(dependentsUrl);
      scraperService.scrape(dependentsUrl);
    } catch(MalformedURLException ex) {
      System.err.printf("URL %s is invalid, please provide a valid URL.%n", dependentsUrl);
      CommandLine.usage(this, System.out);
    } catch (Exception e) {
      System.err.println(e.getMessage());
      e.printStackTrace();
    }
  }

}

Since I want this application to work with any GitHub project, the user needs to specify which page to scrape. For this purpose, I’ll annotate a dependentsUrl field with @CommandLine.Parameters. Note that Picocli also uses the annotation’s attributes, such as paramLabel and description, to generate the CLI usage help.
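The URL constructor call in run() only checks that the argument is syntactically a URL. As a minimal sketch of a stricter check, assuming the dependents page always lives at https://github.com/{owner}/{repo}/network/dependents (as in the example URL used later in this post), something like the following could reject other pages early. DependentsUrlValidator and isDependentsUrl are hypothetical names for illustration, not part of the project's source.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: not taken from the article's repository.
final class DependentsUrlValidator {

  private DependentsUrlValidator() {}

  /**
   * Returns true if the URL looks like
   * https://github.com/{owner}/{repo}/network/dependents (query string optional).
   */
  static boolean isDependentsUrl(String url) {
    if (url == null) {
      return false;
    }
    final URI uri;
    try {
      uri = new URI(url);
    } catch (URISyntaxException e) {
      return false; // not even a syntactically valid URI
    }
    if (!"https".equalsIgnoreCase(uri.getScheme())
      || !"github.com".equalsIgnoreCase(uri.getHost())) {
      return false;
    }
    final String path = uri.getPath();
    // Expected segments after splitting: "", owner, repo, "network", "dependents"
    final String[] segments = path == null ? new String[0] : path.split("/");
    return segments.length == 5
      && !segments[1].isEmpty()
      && !segments[2].isEmpty()
      && "network".equals(segments[3])
      && "dependents".equals(segments[4]);
  }

  public static void main(String[] args) {
    for (String candidate : args) {
      System.out.println(candidate + " -> " + isDependentsUrl(candidate));
    }
  }
}
```

A check like this could run right after the syntax validation, so that the user gets the usage help instead of a scraping error when pointing the tool at the wrong page.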

We’ll use the standard CDI @Inject annotation, supported by Quarkus, to inject the service class that performs the scraping.

Scraper Service

This class contains the actual logic that will scrape the GitHub dependents page. The program will recurse through the different pages and return a JSON representation of each dependent including organization, name, URL, stars, and forks.
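To make the page-by-page recursion concrete, here is a minimal sketch under stated assumptions. It is not the actual ScraperService: the DependentsPager, Dependent, and Page names are hypothetical, and fetching and parsing a page (done with jsoup in the real project) is abstracted behind a function so that the example only depends on the JDK. Each parsed page is assumed to expose its dependents and the URL of the next page, or null on the last page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names and structure are assumptions, not the article's ScraperService.
final class DependentsPager {

  /** One dependent repository with the fields listed in the article. */
  static final class Dependent {
    final String organization;
    final String name;
    final String url;
    final int stars;
    final int forks;

    Dependent(String organization, String name, String url, int stars, int forks) {
      this.organization = organization;
      this.name = name;
      this.url = url;
      this.stars = stars;
      this.forks = forks;
    }

    /** Minimal hand-rolled JSON; only escapes backslashes and double quotes. */
    String toJson() {
      return "{\"organization\":\"" + escape(organization)
        + "\",\"name\":\"" + escape(name)
        + "\",\"url\":\"" + escape(url)
        + "\",\"stars\":" + stars
        + ",\"forks\":" + forks + "}";
    }

    private static String escape(String value) {
      return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }
  }

  /** A parsed page: its dependents plus the next page URL, or null on the last page. */
  static final class Page {
    final List<Dependent> dependents;
    final String nextUrl;

    Page(List<Dependent> dependents, String nextUrl) {
      this.dependents = dependents;
      this.nextUrl = nextUrl;
    }
  }

  private DependentsPager() {}

  /** Follows "next" links recursively, starting at url, and returns every dependent found. */
  static List<Dependent> collect(String url, Function<String, Page> fetcher) {
    final List<Dependent> all = new ArrayList<>();
    collect(url, fetcher, all);
    return all;
  }

  private static void collect(String url, Function<String, Page> fetcher, List<Dependent> acc) {
    if (url == null) {
      return; // base case: the previous page had no "next" link
    }
    final Page page = fetcher.apply(url);
    acc.addAll(page.dependents);
    collect(page.nextUrl, fetcher, acc);
  }

  public static void main(String[] args) {
    // In-memory stand-in for fetching and parsing real pages
    final Map<String, Page> pages = new HashMap<>();
    pages.put("page-1", new Page(Arrays.asList(
      new Dependent("acme", "app", "https://github.com/acme/app", 3, 1)), "page-2"));
    pages.put("page-2", new Page(Arrays.asList(
      new Dependent("foo", "bar", "https://github.com/foo/bar", 0, 0)), null));
    for (Dependent dependent : collect("page-1", pages::get)) {
      System.out.println(dependent.toJson());
    }
  }
}
```

Recursion mirrors the description above, but for projects with a very large number of result pages a plain loop over the next-page URL would avoid deep call stacks.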

The service also performs some validations for the user inputs (valid URL, URL belongs to a GitHub dependents page, etc.) and has some logic to retry in case GitHub returns a 429 - Too Many Requests HTTP status code.
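The retry behavior can be sketched as follows. This is an illustration under assumptions, not the project's actual retry logic: RetryOn429, Response, and executeWithRetry are hypothetical names, the HTTP call is abstracted behind a Supplier, and the exponential backoff policy is my own choice for the example.

```java
import java.util.function.Supplier;

// Hypothetical sketch: not the article's actual retry logic.
final class RetryOn429 {

  static final int TOO_MANY_REQUESTS = 429;

  /** Minimal response holder; a real scraper would wrap its HTTP client's response. */
  static final class Response {
    final int status;
    final String body;

    Response(int status, String body) {
      this.status = status;
      this.body = body;
    }
  }

  private RetryOn429() {}

  /**
   * Executes the request and retries while the server answers 429, up to maxAttempts
   * executions in total. The wait doubles after every retry. Returns the last response,
   * which may still be a 429 if all attempts were used.
   */
  static Response executeWithRetry(Supplier<Response> request, int maxAttempts, long initialDelayMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long delay = initialDelayMillis;
    Response response = request.get();
    for (int attempt = 1; attempt < maxAttempts && response.status == TOO_MANY_REQUESTS; attempt++) {
      sleep(delay);
      delay *= 2; // exponential backoff
      response = request.get();
    }
    return response;
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
      throw new IllegalStateException("Interrupted while waiting to retry", e);
    }
  }

  public static void main(String[] args) {
    // Simulated server: rejects the first two calls, then succeeds
    final int[] calls = {0};
    final Response response = executeWithRetry(
      () -> new Response(++calls[0] < 3 ? TOO_MANY_REQUESTS : 200, "ok"), 5, 10L);
    System.out.println(response.status + " after " + calls[0] + " calls");
  }
}
```

If the server sends a Retry-After header with its 429 response, honoring that value instead of a fixed backoff would be a reasonable refinement.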

Running the application

JVM

To run the application using a Java Virtual Machine, first, we need to compile and package the application:

mvn clean package

Once we’ve packaged the application we can run it with a target repository of our choice:

java -jar target/github-dependents-scraper-uber.jar "https://github.com/eclipse/jkube/network/dependents?package_id=UGFja2FnZS0xMDY0ODYxMDkz"

Native binary

To run the application using a native binary, first, we need to compile, package and build the native image for the application:

mvn clean package -Pnative

Once the binary file is ready we can run it with a target repository of our choice:

./target/github-dependents-scraper-uber "https://github.com/eclipse/jkube/network/dependents?package_id=UGFja2FnZS0xMDY0ODYxMDkz"

The following GIF shows a quick demo of the application running:

A demo of this application running

Conclusion

In this article, you’ve seen how easy it is to create a simple but very useful command-line tool using Quarkus and Picocli, and how to create a native binary with no pain using Quarkus features.

You can check the full source code for this post in the github-dependents-scraper GitHub repository.
