Communications of the ACM

ACM TechNews

Dark Web ChatGPT Unleashed: Meet DarkBERT

Artist's conception of a hacker on the dark web.

To train the model, the researchers crawled the Dark Web through the Tor network, then filtered the raw data (applying techniques such as deduplication, category balancing, and data pre-processing) to generate a Dark Web database.

Credit: freepik

Researchers at South Korea's Korea Advanced Institute of Science and Technology (KAIST) and data intelligence company S2W have created a large language model (LLM) trained on Dark Web data.

The researchers fed a database they compiled from the Dark Web via the Tor network into the RoBERTa framework to create the DarkBERT LLM, which can analyze new Dark Web content, written in its own dialects and heavily coded messages, and extract useful information from it.

They showed that DarkBERT outperforms other LLMs, which should help security researchers and law enforcement delve deeper into the Dark Web.
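
The filtering steps named above (deduplication and category balancing) can be illustrated with a minimal Python sketch. This is not the researchers' pipeline, which the summary does not describe in detail. It assumes exact-match deduplication on whitespace- and case-normalized text and a simple per-category cap; the page records, the category labels, and the function names are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(pages):
    """Keep the first page seen for each distinct normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, max_per_category):
    """Cap how many pages are kept per category (simple downsampling)."""
    counts = defaultdict(int)
    kept = []
    for page in pages:
        if counts[page["category"]] < max_per_category:
            counts[page["category"]] += 1
            kept.append(page)
    return kept


def build_corpus(pages, max_per_category=2):
    """Deduplicate first, then balance, so duplicates do not use up the cap."""
    return balance_categories(deduplicate(pages), max_per_category)


# Hypothetical crawled pages; real data would come from a crawler.
sample_pages = [
    {"category": "forum", "text": "Hello   World"},
    {"category": "forum", "text": "hello world"},  # duplicate after normalization
    {"category": "forum", "text": "second post"},
    {"category": "forum", "text": "third post"},   # dropped by the per-category cap
    {"category": "market", "text": "item listing"},
]

corpus = build_corpus(sample_pages)
print(len(corpus))  # 3
```

Running deduplication before balancing is a design choice in this sketch: it keeps repeated pages from counting toward a category's quota. A production pipeline would more likely use near-duplicate detection and random sampling rather than first-seen order.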

From Tom's Hardware


Abstracts Copyright © 2023 SmithBucklin, Washington, DC, USA

