Gridcoin Scraper

The Scraper described below meets or exceeds the definition of a blockchain oracle, and as such the Scraper system is being renamed the Gridcoin Oracle. For background on blockchain oracles, see https://medium.com/fabric-ventures/decentralised-oracles-a-comprehensive-overview-d3168b9a8841 and https://cryptobriefing.com/what-is-blockchain-oracle/.

The previous Visual Basic .NET-based distributed computing statistics gathering system, referred to as the Gridcoin “Neural Network,” was not really a neural network. It was actually a rules-based system for gathering third-party distributed computing statistics (currently from BOINC projects) off the blockchain, summarizing and normalizing them, and then providing a mechanism for the nodes in the network to agree on the statistics and put them on the blockchain in summarized form once a day. (This daily summary is referred to as a superblock.) The research rewards are then calculated and generated by staking wallets that perform research via the distributed computing platform BOINC, and confirmed by other nodes in accordance with blockchain protocols (a process referred to as Proof-of-Research).
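
To make the end product of this process concrete, here is a minimal conceptual sketch in C++ of the daily summary the nodes agree on. These types and names are hypothetical illustrations, not the wallet’s actual data structures:

    // Conceptual sketch only -- the real superblock format lives in the
    // Gridcoin-Research source tree; these types are hypothetical.
    #include <cstdint>
    #include <map>
    #include <string>

    // One day's agreed-upon summary of the off-chain BOINC statistics.
    struct HypotheticalSuperblock {
        int64_t timestamp;  // when the summary was formed
        // CPID (a researcher's cross-project BOINC ID) -> normalized
        // magnitude, i.e. the researcher's share of the verified work.
        std::map<std::string, double> magnitudes;
    };

Research rewards are then computed from a researcher’s magnitude when that researcher’s wallet stakes a block.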

This existing system had a number of serious defects and had been in need of replacement for some time. In October 2018 Jim Owens began a project whose original goal was to fully implement Paul Jensen’s prototype statistics proxy program. (See https://github.com/gridcoin-community/ScraperProxy). As the project progressed, it became apparent that the work was more properly scoped as a complete rewrite of the existing “Neural Network” subsystem, and that it should be written entirely in C++ as part of the core wallet. The scraper was developed on the integrated_scraper branch in the author’s GitHub fork of Gridcoin and was merged into the development branch of the official Gridcoin repository on February 25, 2019. (See https://github.com/jamescowens/Gridcoin-Research and https://github.com/gridcoin-community/Gridcoin-Research/commit/989665d699fb9753cd2d519c39ed347d4298652f).

After several months of testing and refinement on testnet, the Core Developers released the new Scraper as the highlight of the Denise milestone release (4.0.3.0). Despite the significant amount of new code (more than 10,000 lines of C++ representing more than 250 hours of development time), it was designed to be rolled out in a leisure (non-mandatory) release, as the initial version is protocol-compatible with the existing “Neural Network.”

The new Scraper consists of three major parts (a sketch of the statistics pipeline follows the list):

  1. The actual scraper, which handles downloading the stats files from the BOINC projects, then filters, compresses, and publishes them (with hashes and signatures) to the network. This was designed and written by Jim Owens. (“Scraper”)

  2. The scraper networking code, which uses the wallet messaging system to automatically distribute the compressed, hashed, and signed stats files to all of the nodes. The author is grateful to Tomas Brada (tomasbrod) for his very elegant approach to this part. (“Scraper Net”)

  3. The “neural network” interface, which connects the core wallet to the scraper and, together with existing functions in the core wallet, provides the core “neural network” functionality. The author is grateful to Marco Nilsson (ravon) for this contribution. (“NN”) In Elizabeth, this module is being renamed from “neural network” to the Research Rewards module to better reflect its functionality.
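
To make the flow in part 1 concrete, here is a minimal sketch of the filter-compress-hash stages, assuming zlib for compression and OpenSSL for hashing. The function names and the line-oriented filtering are illustrative assumptions, not the wallet’s actual code; the real scraper works on the BOINC statistics file formats and also signs the result with the scraper’s authorized key:

    // Illustrative sketch of the scraper pipeline: filter -> compress -> hash.
    // (The real scraper also signs the result with an authorized scraper key;
    // signing is omitted here for brevity.)
    #include <openssl/sha.h>   // SHA256()
    #include <zlib.h>          // compressBound(), compress2()

    #include <sstream>
    #include <string>
    #include <vector>

    // Keep only the statistics lines relevant to reward calculation.
    // (Assumes a line-oriented representation for brevity; real BOINC stats
    // files are XML, and the actual filter criteria are the wallet's.)
    std::string FilterStats(const std::string& raw,
                            const std::string& required_field)
    {
        std::ostringstream kept;
        std::istringstream in(raw);
        for (std::string line; std::getline(in, line); ) {
            if (line.find(required_field) != std::string::npos)
                kept << line << '\n';
        }
        return kept.str();
    }

    // Compress the filtered statistics with zlib before publishing.
    std::vector<unsigned char> CompressStats(const std::string& filtered)
    {
        uLongf dest_len = compressBound(filtered.size());
        std::vector<unsigned char> out(dest_len);
        if (compress2(out.data(), &dest_len,
                      reinterpret_cast<const Bytef*>(filtered.data()),
                      filtered.size(), Z_BEST_COMPRESSION) != Z_OK)
            return {};          // compression failed
        out.resize(dest_len);   // shrink to the actual compressed size
        return out;
    }

    // Hash the compressed payload so every node can verify its integrity.
    std::vector<unsigned char> HashStats(const std::vector<unsigned char>& data)
    {
        std::vector<unsigned char> digest(SHA256_DIGEST_LENGTH);
        SHA256(data.data(), data.size(), digest.data());
        return digest;
    }

The key design point is that filtering and compression happen once, at the scraper, so every downstream node receives a small, integrity-checked payload instead of the raw project statistics.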

Old vs. New Scraper Comparison

Scalability
  Old (VB .NET): Severely limited. No support for removal of the team requirement.
  New (Native C++): High - at least 20x current capacity while maintaining a constant, low load on the BOINC statistics sites. Fully supports removal of the team requirement, which is scheduled for the Elizabeth milestone (4.1.0.0).

Cross-Platform Compatibility
  Old (VB .NET): Windows only - requires the GUI.
  New (Native C++): Completely cross-platform - supports every platform the wallet supports - currently Win64, Win32, Linux (Intel 64- and 32-bit, ARM 64- and 32-bit), and macOS (Intel 64-bit) - and can run daemon-only (headless).

Reliability/Availability
  Old (VB .NET): Low - the old scraper was a single point of failure.
  New (Native C++): High - support for multiple scrapers, cross-verified by the nodes, with a configurable (nominally 48-hour) statistics retention period, ensures that scraper outages are transparent to the network.

Security
  Old (VB .NET): Poor - the single-scraper model allowed the possibility of a man-in-the-middle attack.
  New (Native C++): Very high - each scraper must be authorized to publish statistics to the network. Each scraper hashes and signs all statistics, and these hashes and signatures are checked and cross-verified by all nodes. Unauthorized scrapers’ statistics are deleted, and the offending scrapers are banned from the network.

Network Bandwidth Use
  Old (VB .NET): High - the original scraper simply forwarded uncompressed and unfiltered statistics files (>300 MB for a complete set), the same as when the nodes downloaded them directly.
  New (Native C++): Extremely low - the new scrapers download the stats, then filter and compress them, reducing >300 MB of statistics to 4-5 MB for a 48-hour retention period. Statistics are shared in two stages: the statistics directory is “pushed”, and then the actual statistics are “pulled” by the nodes, each node fetching only what it does not already have. This minimizes network traffic. Since the messages are signed, they can be forwarded by intermediate nodes, just like other network messages such as transactions. (A sketch of this two-stage exchange follows the table.)

Client CPU Use
  Old (VB .NET): High - the “Neural Net” on each node could consume a full CPU core for up to 30 minutes while processing the statistics.
  New (Native C++): Extremely low - normal nodes process the scraper statistics in under three seconds on a typical Intel CPU. This ensures the CPU goes toward computing, not administration.

Client Disk Use
  Old (VB .NET): High - up to 2 GB used on the client drive, with significant disk load during operation.
  New (Native C++): None - all scraper statistics are compressed and stored in memory.

Client Memory Use
  Old (VB .NET): Moderate - the .NET runtime added overhead to the wallet.
  New (Native C++): Low - very little additional memory is required (<50 MB).

BOINC Server Resource Use
  Old (VB .NET): High - the old scraper sometimes downloaded statistics files over and over even though they had already been downloaded, and if the single scraper was down, each node would fall back to downloading its own statistics, crushing the BOINC servers (250+ nodes at once).
  New (Native C++): Low - typically five scrapers are in operation. Each downloads statistics files during a 4-hour window before the superblock is due, downloading only statistics that have changed. This results in a constant, low load on the BOINC servers, confined to that window, regardless of the size of the Gridcoin network. (A conditional-download sketch also follows the table.)

Maintainability
  Old (VB .NET): Low - used a non-native, closed-source development and build toolchain (Microsoft Visual Studio .NET) that did not play well with the core wallet, hampering development, testing, and the release process.
  New (Native C++): High - 100% C++, written to conform to Gridcoin’s coding standards, well commented, with a modular design that is easily extensible, and completely integrated into the core wallet.
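
The two-stage exchange described in the Network Bandwidth Use row can be modeled as follows. This is a simplified illustration; the message formats and names (StatsDirectory, PartsToPull) are assumptions, not the wallet’s actual protocol types:

    // Simplified model of the two-stage statistics exchange:
    // 1. A scraper "pushes" a small directory (a list of part hashes) to peers.
    // 2. Each node compares it against the parts it already holds and "pulls"
    //    only the missing parts.
    #include <map>
    #include <string>
    #include <vector>

    using PartHash = std::string;  // hash identifying one stats part
    using PartStore = std::map<PartHash, std::vector<unsigned char>>;

    // Stage 1: the pushed directory -- metadata only, a few KB at most.
    struct StatsDirectory {
        std::vector<PartHash> parts;  // hashes of the compressed stats parts
        std::string signature;        // signed by an authorized scraper key
    };

    // Stage 2: given the pushed directory, decide which parts to pull.
    std::vector<PartHash> PartsToPull(const StatsDirectory& dir,
                                      const PartStore& already_have)
    {
        std::vector<PartHash> missing;
        for (const PartHash& h : dir.parts) {
            if (already_have.find(h) == already_have.end())
                missing.push_back(h);  // request only what we lack
        }
        return missing;
    }

Because the pushed directory is only metadata, the large payloads cross the network at most once per node, and a node that already holds current statistics pulls nothing at all.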
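
Similarly, the “downloading only statistics that have changed” behavior in the BOINC Server Resource Use row can be approximated with an HTTP conditional check. The sketch below uses libcurl; relying on the server’s ETag header is an assumption for illustration, and the real scraper’s download logic may differ in detail:

    // Sketch: ask the project server for the stats file's current ETag with
    // a HEAD request, and download only if it differs from the ETag saved
    // after the last download. (ETag-based checking is an assumption here.)
    #include <curl/curl.h>
    #include <string>

    // Collect the response headers so we can look for the ETag.
    static size_t HeaderCallback(char* buf, size_t size, size_t nitems,
                                 void* userdata)
    {
        static_cast<std::string*>(userdata)->append(buf, size * nitems);
        return size * nitems;
    }

    // Returns true if the file on the server appears to have changed.
    bool StatsFileChanged(const std::string& url, const std::string& saved_etag)
    {
        CURL* curl = curl_easy_init();
        if (!curl) return true;  // on error, fall back to downloading

        std::string headers;
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);  // HEAD request only
        curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, HeaderCallback);
        curl_easy_setopt(curl, CURLOPT_HEADERDATA, &headers);
        CURLcode rc = curl_easy_perform(curl);
        curl_easy_cleanup(curl);

        if (rc != CURLE_OK) return true;
        // Changed (or first download) if the saved ETag no longer appears.
        return saved_etag.empty() ||
               headers.find(saved_etag) == std::string::npos;
    }

This keeps the load on the project servers to a cheap header check except when the statistics have actually been updated.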

The Elizabeth milestone release is planned to be mandatory and will include a number of protocol upgrades that build on the new scraper functionality introduced in Denise, including removal of the Gridcoin team requirement.