url-citation-search
Description
Find papers and preprints that cite a given URL, especially blogs, docs pages, project pages, or other web content that standard citation indexes often miss. Use when the user provides a URL and wants confirmed citing works, evidence from PDF or HTML references, DOI or arXiv links, BibTeX, or a deduplicated citation report.
Code
Skill Code
---
name: url-citation-search
description: Find papers and preprints that cite a given URL, especially blogs, docs pages, project pages, or other web content that standard citation indexes often miss. Use when the user provides a URL and wants confirmed citing works, evidence from PDF or HTML references, DOI or arXiv links, BibTeX, or a deduplicated citation report.
---

# URL Citation Search

## Use this skill for

- A user gives a URL and asks which papers cited it.
- The target is a blog post, documentation page, project page, demo, GitHub page, or other web content rather than a standard paper.
- The user wants confirmed citations, not just likely matches from search results.

## Workflow

1. Resolve the target page.
   - Fetch the page and record the visible title.
   - Extract the canonical URL if present.
   - Keep obvious variants: `http/https`, `www/non-www`, trailing slash, moved domains, and mirrored or cross-posted URLs.
   - Keep direct-URL variants separate from mirror URLs. The final report should distinguish `direct citation` from `mirror citation`.
   - Record author and date only when they help disambiguate the page.
2. Build search keys.
   - Exact full URL.
   - Host plus path without protocol.
   - Stable slug or final path segment.
   - Exact page title.
   - Title variants exposed by the page metadata or mirror pages.
3. Search in this order.
   - Exact full URL and protocol-less URL in a general web search.
   - Exact title and slug in a general web search.
   - Site-restricted searches on academic sources such as `arxiv.org`, `openreview.net`, `aclanthology.org`, `proceedings.neurips.cc`, `proceedings.mlr.press`, `dl.acm.org`, `ieeexplore.ieee.org`, `nature.com`, `link.springer.com`, and `ceur-ws.org`.
   - Mirror or cross-post URLs too. Papers often cite the mirror instead of the current canonical URL.
   - If direct URL or title search is sparse, build a topical candidate pool and batch-scan likely papers. This is especially useful on arXiv when full-text search misses exact quoted strings.
4. Verify every candidate.
   - Keep a work only if its PDF, HTML reference list, or in-text bibliography contains the target URL, title, slug, or a verified mirror URL.
   - Search snippets alone are not enough.
   - PDF-only evidence is acceptable when the exact URL appears in extracted text or PDF link annotations.
   - Publisher article pages may expose references in HTML metadata such as `citation_reference` or dedicated bibliography pages. Use those before scraping PDFs.
   - arXiv HTML bibliography entries may hide the real target inside `External Links` even when the visible text omits the URL. Inspect the underlying href, not just the rendered text.
   - Bibliography pages, project pages, and search indexes are candidate finders, not final proof, unless they expose the actual reference entry for the citing work.
5. Deduplicate and normalize.
   - If the same work exists as both a preprint and a published paper, prefer the published version in the main list and keep the preprint as an access fallback.
   - Keep genuine preprints when no formal version exists.
   - Do not assume the published version preserves a web citation seen in the preprint. Verify each version separately.
   - Keep official same-site variants separate from mirrors. Examples: `/engineering/...` versus `/research/...`, renamed official paths, or root-versus-`/home` pages on the same site.
   - Separate `confirmed` from `candidate` when verification is incomplete.

## Output

Choose the lightest format that matches the user's ask.

- Brief list: title, year, type, and landing link.
- Citation report: title, link, DOI or arXiv ID, evidence note, and whether it is published or a preprint.
- When useful, distinguish `direct`, `official variant`, and `mirror` citation.
- Bib mode: return BibTeX for the citing papers. Return BibTeX for the target URL only if the user explicitly asks for it.

## Heuristics That Matter

- Standard citation indexes often miss web pages, so reverse search on URL, title, and slug is usually more reliable than `cited by` counts.
- arXiv full-text search misses some quoted strings. If it returns nothing, switch back to general search and inspect candidate PDFs or HTML directly.
- On arXiv, broad topic searches plus batch inspection of candidate HTML reference lists can recover citations that exact-string full-text search misses.
- Moved domains and cross-posts are common. Check canonical tags and obvious mirrors before concluding that nothing cites the page.
- Official variants are common too. A paper may cite a renamed or migrated first-party URL rather than the exact URL from the seed paper.
- PDF text extraction is lossy. If the visible citation text is missing, inspect PDF link annotations or raw extracted strings for the URL.
- Publisher HTML can be better than PDF for verification because references may already be normalized into page metadata.
- Reject false positives aggressively: a paper on the same topic is not a citation unless the reference is visible.
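
Workflow steps 1, 2, and 4 above can be sketched in Python. This is a minimal illustration and not part of the skill itself. The helper names (`url_variants`, `search_keys`, `normalize`, `classify_citation`) and the `example.com` URLs are hypothetical. It uses only the standard library, ignores query strings and fragments, and checks hrefs only, so title or slug matches in a reference list would still need a separate check.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def url_variants(url: str) -> set[str]:
    """Step 1: http/https, www/non-www, and trailing-slash variants."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return {
        f"{scheme}://{prefix}{host}{path}{slash}"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
        for slash in ("", "/")
    }


def search_keys(url: str, title: str) -> list[str]:
    """Step 2: quoted search strings for URL, host plus path, slug, title."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    host_path = parts.netloc + path
    slug = path.rsplit("/", 1)[-1]
    return [f'"{key}"' for key in (url, host_path, slug, title) if key]


def normalize(url: str) -> str:
    """Reduce a URL to host plus path so step 1 variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


class HrefCollector(HTMLParser):
    """Collect every <a href> value, including links with generic text."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def classify_citation(reference_html: str, target: str, mirrors: list[str]) -> str:
    """Step 4: inspect underlying hrefs rather than rendered link text."""
    collector = HrefCollector()
    collector.feed(reference_html)
    cited = {normalize(href) for href in collector.hrefs}
    if normalize(target) in cited:
        return "direct"
    if any(normalize(mirror) in cited for mirror in mirrors):
        return "mirror"
    return "candidate"  # no matching href found; verification incomplete


if __name__ == "__main__":
    target = "https://www.example.com/blog/my-post/"
    entry = '<li>Doe, J. <a href="http://example.com/blog/my-post">External Links</a></li>'
    print(search_keys(target, "My Post"))
    print(classify_citation(entry, target, mirrors=[]))  # direct
```

Comparing normalized host-plus-path strings keeps protocol, `www`, and trailing-slash differences from hiding a real citation, while mirrors are passed separately so the report can still label them `mirror` rather than `direct`.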
Usage Examples
Examples
Open https://github.com/zjsxply/skills/blob/main/url-citation-search/SKILL.md and review the skill instructions.
Use this skill when you need to find and verify papers that cite a given URL.
Installation
Add the skill definition shown under Skill Code above to your CLAUDE.md file (project root or ~/.claude/CLAUDE.md for global).
Get the full source code and documentation on GitHub:
Clone: git clone https://github.com/zjsxply/skills