In this article, the authors present a fully automated, large-scale, end-to-end pipeline that starts with provenance image filtering (over millions of images) and ends with provenance graphs. The most immediate application of provenance image filtering is forensics, where the detection of manipulated images ranges from traditional policing to strategic intelligence analysis. The question of where suspect images originate has recently become prominent with the rise of so-called “fake news” on the Internet.
A comprehensive set of experiments is provided for each stage of the pipeline, comparing the proposed solution with state-of-the-art results on previously published datasets. In addition, this work introduces a new dataset of real-world provenance cases from the social media site Reddit, along with baseline results.