From Deep Web to Deep Learning: The Props Revolution

By Ben Marsh - Sei Labs Research for Sei Research Initiative

Special thanks to Jake and Naman for their feedback and discussions.


The Sei blockchain is not just another high-performance network. It is the bedrock on which new, trust-anchored data solutions can thrive. Approximately 95% of humanity's data resides in the deep and dark web, while only 5% exists on the surface web. The potential impact of emerging solutions in this space cannot be overstated. One emerging concept that stands to benefit greatly from the Sei network's architecture is the idea of protected pipelines, or “props”, introduced by Juels and Koushanfar (2024). Props solve a critical challenge for both AI companies and data owners: they let machine learning models access high-quality private data while preserving privacy, authenticity, and scalability, so organizations can build better AI models without compromising sensitive information. By integrating props with the Sei protocol, developers and businesses can create reliable pipelines that feed ML systems the data they need without compromising on speed, trust, or confidentiality.

ML models rely on abundant, accurate data to generate high-quality predictions. Yet much of the most valuable data, such as patient health records, enterprise financial logs, and proprietary research, is hidden behind access controls, sensitive in nature, or stored in enclaves that cannot be easily shared. Props tackle this challenge by establishing a pipeline that can securely fetch and verify data from unindexed deep-web sources, all without requiring server modifications or exposing private details (Juels and Koushanfar, 2024). Instead of treating private data as off-limits, props show that sensitive information can be leveraged ethically and securely.

What does it mean to act ethically in this context? Props align well with a rules-based, deontological approach to data ethics. By embedding firm policies about privacy, security, and user consent into the pipeline, props help ensure that data is used only when doing so respects the fundamental moral rights of the individual, rather than simply maximizing overall benefit, as a purely utilitarian view might suggest.

A fundamental strength of props is their ability to produce cryptographic proofs of data authenticity, ensuring that ML models consume only verifiably legitimate data. To use the example Juels and Koushanfar give: suppose a user, Alice, wants to share her health data with a machine learning company. A prop can generate a proof that her records are authentic and originate from a deep-web source such as a payer or health provider. Props can also support privacy-preserving transformations, such as redacting personally identifiable information or aggregating multiple data points to protect individual users. Alice, for instance, may choose to redact her address through a prop-based filter before sharing her health records. This flexibility means props do not simply deliver raw data; they deliver data that has been refined, proven, and tailored to the model’s needs, as defined by the filters and constraints the developers put in place, while retaining trusted provenance through authenticated pipelines. In this way, props break down traditional barriers and enable ML pipelines that are both more secure and more informative than before.
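As a rough illustration of what such a prop-based filter might do, the sketch below strips identifying fields from a record and binds the original to a hash commitment. The record layout, the field names, and the use of a bare SHA-256 digest are all illustrative assumptions; an actual prop would produce a cryptographic proof that the redaction was applied faithfully, rather than relying on trust in whoever ran the filter.

```python
import hashlib
import json

# Hypothetical health record; the field names are illustrative only.
record = {
    "patient": "Alice",
    "address": "22 Example Lane",
    "diagnosis": "condition-x",
    "lab_results": [7.1, 6.8, 7.4],
}

REDACTED_FIELDS = {"patient", "address"}  # PII to strip before sharing

def canonical_digest(obj) -> str:
    """Hash a JSON-serializable object deterministically."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def redact(record: dict) -> dict:
    """Drop PII fields, keeping only what the model needs."""
    return {k: v for k, v in record.items() if k not in REDACTED_FIELDS}

# Commit to the original record, then share only the redacted view.
original_commitment = canonical_digest(record)
shared_view = redact(record)
print("commitment:", original_commitment)
print("shared view:", shared_view)
```

In a real prop, the link between the commitment and the shared view would be established by a proof that the redaction was computed correctly, not by trusting the party that ran `redact`.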

These pipelines are built on two core ideas: secure data sourcing and pinned models. Secure data sourcing ensures that data truly originates from the intended private source, while pinned models, fixed through methods such as zkML or trusted execution environments (TEEs), ensure that any inference or computation results are produced by a specific, known model. Together, these elements form a trust chain that stretches from the original data source to the final ML inference, making it possible to integrate data from challenging environments without losing confidence or control.
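To make the pinned-model half of this trust chain concrete, here is a minimal sketch under a deliberately simplified assumption: the model is “pinned” by hashing its weight file and comparing the digest against a value agreed upon in advance. In practice, zkML proofs or TEE attestations would replace this bare comparison; the `EXPECTED_DIGEST` value and the weight-file path are hypothetical.

```python
import hashlib
from pathlib import Path

# Digest of the approved model, agreed upon out of band (placeholder value).
EXPECTED_DIGEST = "0" * 64

def weights_digest(path: Path) -> str:
    """SHA-256 over the raw weight file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def load_pinned_model(path: Path):
    """Refuse to serve inference from anything but the pinned model."""
    digest = weights_digest(path)
    if digest != EXPECTED_DIGEST:
        raise ValueError(f"model is not the pinned version: {digest}")
    # ...deserialize the weights and return the model object here...
    return path
```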

Why run props on Sei specifically? Sei is a fast Layer 1 blockchain designed to handle large volumes of transactions and data operations (Sei Labs, 2024). ML pipelines often need rapid, continuous verification of data authenticity, and Sei’s architecture is well suited to this. With Sei, proof records and data attestations become tamper-resistant entries on the blockchain. Once a piece of data is verified through props and anchored on Sei, its authenticity cannot be revoked or quietly altered.
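A minimal sketch of that anchoring step, assuming a Sei EVM-compatible JSON-RPC endpoint and a simple attestation contract. The RPC URL, contract address, ABI, and `recordAttestation` function below are hypothetical placeholders rather than real deployments, and the example uses the web3.py library.

```python
from web3 import Web3

# Hypothetical Sei EVM RPC endpoint; replace with a real endpoint.
w3 = Web3(Web3.HTTPProvider("https://evm-rpc.sei.example"))

# Hypothetical attestation contract exposing recordAttestation(bytes32).
ATTESTATION_ABI = [{
    "name": "recordAttestation",
    "type": "function",
    "stateMutability": "nonpayable",
    "inputs": [{"name": "dataDigest", "type": "bytes32"}],
    "outputs": [],
}]
contract = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder
    abi=ATTESTATION_ABI,
)

def anchor(digest_hex: str, account: str, private_key: str) -> str:
    """Post a 32-byte data digest on chain as a tamper-resistant attestation."""
    tx = contract.functions.recordAttestation(
        bytes.fromhex(digest_hex)
    ).build_transaction({
        "from": account,
        "nonce": w3.eth.get_transaction_count(account),
    })
    signed = w3.eth.account.sign_transaction(tx, private_key)
    # Attribute is named rawTransaction on older web3.py releases.
    tx_hash = w3.eth.send_raw_transaction(signed.raw_transaction)
    return tx_hash.hex()
```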

Moreover, the Sei developer ecosystem makes it straightforward for builders to integrate props. Instead of wrestling with complicated cryptography or special-purpose hardware, developers can tap into well-documented tools and services. Sei’s consensus ensures that verification events, which lie at the heart of props, finalize quickly and reliably. This allows ML models to receive newly verified data frequently, keeping predictions and insights fresh in real time.

By combining props and Sei, ML developers can confidently include data sources that were once off-limits.

Consider a scenario where an AI-driven healthcare application wants to analyze patient records to detect patterns in rare diseases. Without props, this data would remain inaccessible, leaving the model with only public datasets of questionable relevance. With props, the system can securely fetch patient records after redacting personal identifiers, verifying through cryptographic proofs that the records are genuine and unaltered. Anchoring these proofs on Sei makes them indisputable, so stakeholders can trust the model’s learning process.

A second scenario might involve a financial AI that forecasts market trends. Instead of relying only on public price data, it could incorporate proprietary transaction logs from private exchanges. Props let each exchange prove the authenticity of every data point, while Sei stores the proofs. The ML model, now receiving reliable inputs, can improve the quality of its forecasts, potentially informing better investment decisions.
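As a simplified stand-in for the exchange’s authenticity proof, the sketch below has the exchange sign each log entry with an Ed25519 key whose public half the model operator already knows. A real prop derives authenticity from the TLS session with the data source, in the style of DECO, rather than from an exchange-managed signing key, so treat this purely as an illustration of the verification step.

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Exchange-side signing key; in practice only the public key is shared.
exchange_key = Ed25519PrivateKey.generate()
exchange_pub = exchange_key.public_key()

def _canonical(entry: dict) -> bytes:
    """Deterministic encoding so signer and verifier hash identical bytes."""
    return json.dumps(entry, sort_keys=True).encode("utf-8")

def sign_entry(entry: dict) -> bytes:
    """Exchange side: sign a canonical encoding of one log entry."""
    return exchange_key.sign(_canonical(entry))

def verify_entry(entry: dict, signature: bytes) -> bool:
    """Model side: accept the data point only if the signature checks out."""
    try:
        exchange_pub.verify(signature, _canonical(entry))
        return True
    except InvalidSignature:
        return False

entry = {"pair": "SEI/USDC", "price": 0.42, "volume": 1000}  # illustrative
assert verify_entry(entry, sign_entry(entry))
```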

This combination can also reduce the risks associated with adversarial data. ML models are vulnerable to adversarial examples that can trick them into making incorrect predictions (Goodfellow et al., 2014). With props on Sei, data inputs are continuously vetted, limiting the room for malicious actors to insert harmful inputs. This creates more stable and trustworthy ML pipelines where the emphasis is on transparency and verified authenticity.
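One simple way to picture this vetting, assuming the set of attested digests has already been mirrored from the chain (the `attested` set below stands in for that hypothetical local state):

```python
import hashlib
import json

def canonical_digest(obj) -> str:
    """Hash a JSON-serializable sample deterministically."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def vetted_batch(samples, attested):
    """Yield only samples whose digest was previously attested on chain.

    Unattested samples are dropped, shrinking the surface available to an
    adversary trying to slip poisoned inputs into the pipeline.
    """
    for sample in samples:
        if canonical_digest(sample) in attested:
            yield sample
```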

Under the hood, props rely on cryptographic tools and, optionally, trusted execution environments or other privacy-preserving systems to maintain data secrecy and generate proofs. The notion of privacy-preserving queries backed by zero-knowledge proofs, as proposed by Zhang et al. (2020) and refined by others, makes it possible to show that data came from a certain source without revealing it publicly. Sei’s fast finality and high throughput ensure these proofs are posted on chain efficiently, helping maintain a continuous flow of verified data to ML models.
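The zero-knowledge machinery itself is beyond a short example, but the commitment idea underneath it is easy to sketch: a salted hash commitment can be published without revealing the underlying data, then opened later for an authorized party. This is a deliberate simplification; DECO and the props design rely on TLS-session proofs and zero-knowledge proofs rather than bare commitments.

```python
import hashlib
import os

def commit(data: bytes) -> tuple[str, bytes]:
    """Return (commitment, salt); the commitment can be published safely."""
    salt = os.urandom(32)
    return hashlib.sha256(salt + data).hexdigest(), salt

def verify_opening(commitment: str, salt: bytes, data: bytes) -> bool:
    """An authorized recipient checks revealed data against the commitment."""
    return hashlib.sha256(salt + data).hexdigest() == commitment

c, salt = commit(b"record from a private source")
assert verify_opening(c, salt, b"record from a private source")
```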

Sei’s on-chain state can store references to these proofs, and smart contracts can enforce policies on what data is acceptable while supporting decentralized model pinning through oracle networks, allowing Sei to form the basis of a props implementation as outlined by Juels and Koushanfar. For example, a contract might require proof that certain regulatory conditions are met before allowing data into the off-chain ML pipeline. The entire system, from data source to final ML inference, can thus operate under a shared understanding of trust, speed, and careful validation.
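Off chain, the pipeline would mirror such a policy before admitting data. Below is a minimal sketch, with hypothetical attestation fields standing in for whatever a real Sei contract would expose:

```python
from dataclasses import dataclass

# Hypothetical attestation metadata mirrored from chain; the field names
# are illustrative, not a real Sei contract schema.
@dataclass
class Attestation:
    digest: str
    source_verified: bool   # prop proved the deep-web origin
    pii_redacted: bool      # privacy filter was applied
    consent_recorded: bool  # data owner's consent is on record

REQUIRED_FLAGS = ("source_verified", "pii_redacted", "consent_recorded")

def admissible(att: Attestation) -> bool:
    """Mirror of the on-chain policy: every required flag must be set."""
    return all(getattr(att, flag) for flag in REQUIRED_FLAGS)
```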

While the main focus here is on scaling secure and private data pipelines, there is also a broader vision at play. As Bostrom and Yudkowsky (2011) note, ensuring that AI systems align with human values and operate responsibly is crucial as these technologies grow more influential. Props on Sei help move us toward a world where AI can rely on verifiably authentic and private data inputs, reducing the chances of harmful consequences that arise from poor quality or manipulated information.

Props create conditions that support healthier ML ecosystems by enabling richer, more trustworthy datasets. Combined with Sei’s immutable audit trail, this encourages more transparent data use and, potentially, fewer biases and security breaches. Although the combination of props and Sei cannot solve every challenge, it provides a solid infrastructural layer that makes it easier to build AI systems that are not only high performing but also better aligned with the interests and needs of their users.

Props are still evolving, and so is Sei. As both mature, we can expect more straightforward integrations, additional tooling, and further automation in data verification. The ultimate outcome is a more robust environment for building AI-driven applications that can adapt, learn, and deliver insights with confidence.

This future represents a shift in how we approach ML data sourcing. With props on Sei, data is no longer an unverified jumble. It becomes a carefully validated resource that helps ensure ML models operate on solid ground. The result is a more stable, scalable, and dependable world for AI, one that acknowledges the complexities of modern data while embracing the trust and reliability that blockchain technologies like Sei can deliver.

Join the Sei Research Initiative

We invite developers, researchers, and community members to join us in this mission. This is an open invitation for open-source collaboration to build more scalable blockchain infrastructure. Check out Sei Protocol’s documentation, and explore Sei Foundation grant opportunities (Sei Creator Fund, Japan Ecosystem Fund). Get in touch: collaborate[at]seiresearch[dot]io

References

Bostrom, N., Yudkowsky, E. (2011). The ethics of artificial intelligence. In: The Cambridge Handbook of Artificial Intelligence. Cambridge University Press.

Goodfellow, I., Shlens, J., Szegedy, C. (2014). Explaining and harnessing adversarial examples. Available at: https://arxiv.org/pdf/1412.6572

Juels, A., Koushanfar, F. (2024). Props for Machine-Learning Security. Available at: https://arxiv.org/pdf/2410.20522v1

Zhang, F., Maram, D., Malvai, H., Goldfeder, S., Juels, A. (2020). DECO: Liberating web data using decentralized oracles for TLS. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS '20).

Sei Labs (2024). Available at: https://www.sei.io/