Pywikibot/Pages
From charlesreid1
Page Objects
Page objects represent MediaWiki pages.
Many of the Site object's methods return Pages or PageGenerators (see Pywikibot/Sites).
To get a site to begin with, you need a user-config.py (see Pywikibot/Setup). Then you should run this from Python, in the directory where your user config file is located:
import pywikibot
s = pywikibot.Site()
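The sessions further down use a `page` variable that this page never defines. A minimal sketch of one way to create it, assuming the title 'AOCP/Binomial Coefficients' that appears in the backlinks output; `load_page` is a hypothetical helper name, not part of Pywikibot:

```python
def load_page(title, site=None):
    """Return a pywikibot Page for `title` (hypothetical helper)."""
    # Imported here rather than at module level, because pywikibot
    # looks for user-config.py as soon as it is imported.
    import pywikibot

    if site is None:
        site = pywikibot.Site()  # default site from user-config.py
    return pywikibot.Page(site, title)
```

With a working user-config.py in the current directory, `page = load_page('AOCP/Binomial Coefficients')` should give an object like the one used below.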
Revision Objects
Each page has an edit history, consisting of the versions of the page at different points in time. A Revision object holds the information about one revision of one page, such as its revision ID, timestamp, user, and edit comment.
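As a rough sketch of how revisions might be inspected, the helper below collects a few fields from the first revisions that `page.revisions()` yields. The attribute names (`revid`, `timestamp`, `user`, `comment`) are taken from the Revision listing at the end of this page; `recent_revisions` is a hypothetical name, and the order of the results is whatever `revisions()` produces:

```python
from itertools import islice


def recent_revisions(page, limit=5):
    """Return (revid, timestamp, user, comment) for up to `limit` revisions."""
    rows = []
    # islice stops iterating after `limit` items.
    for rev in islice(page.revisions(), limit):
        rows.append((rev.revid, rev.timestamp, rev.user, rev.comment))
    return rows
```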
Useful Actions
To build a graph of links on a given wiki, four Page methods are useful:
- page.backlinks - lists pages that link to the given page
- page.linkedPages - lists pages on this wiki that this page links to
- page.extlinks - lists external targets that this page links to
- page.getReferences - similar to backlinks, but includes transclusions too
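A minimal sketch of how these methods could feed a link graph, here using only `linkedPages()` and `title()` to build an adjacency dictionary. `build_link_graph` is a hypothetical helper, and the choice of a plain dict of sorted title lists is an assumption made for illustration:

```python
def build_link_graph(pages):
    """Map each page title to the sorted titles of the pages it links to."""
    graph = {}
    for page in pages:
        # A set removes duplicates if the same target is yielded twice.
        targets = {linked.title() for linked in page.linkedPages()}
        graph[page.title()] = sorted(targets)
    return graph
```

It accepts any iterable of Page objects, for example a short list of pages or the generator returned by the site's allpages() method.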
backlinks
In [43]: page.backlinks
Out[43]: <bound method deprecated_args.<locals>.decorator.<locals>.wrapper of Page('AOCP/Binomial Coefficients')>
In [44]: page.backlinks()
Out[44]: <itertools.chain at 0x10838c828>
In [45]: list(page.backlinks())
Out[45]:
[Page('Flags'),
Page('Algorithms/Combinatorics and Heuristics'),
Page('Algorithms/Combinatorics'),
Page('AOCP/Multisets'),
Page('AOCP/Permutations'),
Page('Template:AOCPFlag'),
Page('AOCP/Multinomial Coefficients'),
Page('AOCP/Harmonic Numbers'),
Page('AOCP/Fibonacci Numbers'),
Page('ACOP/Generating Functions'),
Page('AOCP'),
Page('AOCP/Generating Functions'),
Page('Generating Functions'),
Page('AOCP/Combinatorics'),
Page('Cards'),
Page('Binomial Coefficients'),
Page('AOCP/Generating Permutations and Tuples'),
Page('Letter Coverage'),
Page('Five Letter Words')]
linkedPages
Calling linkedPages() returns every page on this wiki that the current page links to. The method returns a PageGenerator object, like the site's allpages() method. As before, we pass it to list() to consume the generator and build a list of the results.
In [46]: list(page.linkedPages())
Out[46]:
[Page('AOCP/Boolean Functions'),
Page('AOCP/Combinatorial Algorithms'),
Page('AOCP/Infinite Series'),
Page('Algorithm Analysis/Randomized Quick Sort'),
Page('Algorithm Analysis/Substring Pattern Matching'),
Page('ACOP/Generating Functions'),
Page('AOCP/Combinatorics'),
Page('AOCP/Fibonacci Numbers'),
Page('AOCP/Five Letter Words'),
Page('AOCP/Generating Permutations and Tuples'),
Page('AOCP/Harmonic Numbers'),
Page('AOCP/Multinomial Coefficients'),
Page('AOCP/Multisets'),
Page('Algorithm Analysis/Matrix Multiplication'),
Page('Algorithm Analysis/Merge Sort'),
Page('Algorithm complexity'),
Page('Algorithmic Analysis of Sort Functions'),
Page('Algorithms'),
Page('Algorithms/Combinatorics'),
Page('Algorithms/Combinatorics and Heuristics'),
Page('Algorithms/Data Structures'),
Page('Algorithms/Graphs'),
Page('Algorithms/Optimization'),
Page('Algorithms/Search'),
Page('Algorithms/Sort'),
Page('Algorithms/Strings'),
Page('Amortization'),
Page('Amortization/Accounting Method'),
Page('Binary Search'),
Page('Binary Search Modifications'),
Page('CS'),
Page('Cards'),
Page('Divide and Conquer'),
Page('Divide and Conquer/Master Theorem'),
Page('Estimation'),
Page('Estimation/BitsAndBytes'),
Page('Five Letter Words'),
Page('Flags'),
Page('Heap Sort'),
Page('Letter Coverage'),
Page('Merge Sort'),
Page('Project Euler'),
Page('Quick Sort'),
Page('Rubiks Cube/Permutations'),
Page('Rubiks Cube/Tuples'),
Page('Skiena Chapter 4 Questions'),
Page('Theta vs Big O'),
Page('Template:AOCPFlag'),
Page('Template:AlgorithmsFlag'),
Category('Category:AOCP')]
In [47]: type(page.linkedPages())
Out[47]: pywikibot.data.api.PageGenerator
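Because linkedPages() returns a generator, the results can also be filtered as they arrive instead of being collected into a list first. A small sketch, assuming we only care about titles under a given prefix such as 'AOCP/'; `titles_with_prefix` is a hypothetical helper:

```python
def titles_with_prefix(pages, prefix):
    """Lazily yield the titles of `pages` that start with `prefix`."""
    for page in pages:
        title = page.title()
        if title.startswith(prefix):
            yield title
```

For example, `list(titles_with_prefix(page.linkedPages(), 'AOCP/'))` would keep only the AOCP subpages from the listing above.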
extlinks
Asking for the external links on a given page will return a plain generator:
In [48]: type(page.extlinks())
Out[48]: generator
In [49]: list(page.extlinks())
Out[49]:
['http://charlesreid1.com/w/index.php?title=Template:AOCPFlag&action=edit',
'http://charlesreid1.com/w/index.php?title=Template:AlgorithmsFlag&action=edit',
'https://charlesreid1.com:3000/cs/study-plan']
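Since extlinks() yields plain URL strings, the standard library can process them directly. A short sketch that groups external links by host name; `group_by_host` is a hypothetical helper, and ignoring the port number is a choice made for this example:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URL strings by host name, ignoring any port number."""
    hosts = defaultdict(list)
    for url in urls:
        hosts[urlsplit(url).hostname].append(url)
    return dict(hosts)
```

Called as `group_by_host(page.extlinks())`, this would place all three links shown above under the single host charlesreid1.com.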
getReferences
For this page the results are almost identical to backlinks: the only difference is that Page('Binomial Coefficients') appears in backlinks but not in getReferences.
In [50]: list(page.getReferences())
Out[50]:
[Page('Flags'),
Page('Algorithms/Combinatorics and Heuristics'),
Page('Algorithms/Combinatorics'),
Page('AOCP/Multisets'),
Page('AOCP/Permutations'),
Page('Template:AOCPFlag'),
Page('AOCP/Multinomial Coefficients'),
Page('AOCP/Harmonic Numbers'),
Page('AOCP/Fibonacci Numbers'),
Page('ACOP/Generating Functions'),
Page('AOCP'),
Page('AOCP/Generating Functions'),
Page('Generating Functions'),
Page('AOCP/Combinatorics'),
Page('Cards'),
Page('AOCP/Generating Permutations and Tuples'),
Page('Letter Coverage'),
Page('Five Letter Words')]
In [51]: type(page.getReferences())
Out[51]: itertools.islice
In [52]: type(page.backlinks())
Out[52]: itertools.chain
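Rather than comparing the two listings by eye, the difference can be computed from the page titles. A minimal sketch; `title_difference` is a hypothetical helper that relies only on each page's `title()` method:

```python
def title_difference(first, second):
    """Return the sorted titles found in `first` but missing from `second`."""
    second_titles = {page.title() for page in second}
    first_titles = {page.title() for page in first}
    return sorted(first_titles - second_titles)
```

Calling `title_difference(page.backlinks(), page.getReferences())` on the page above should, going by the two listings, leave only 'Binomial Coefficients'.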
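The built-in help explains the difference between the two methods: by default, getReferences also returns pages that embed this page as a template (withTemplateInclusion=True), while backlinks returns only pages that link to it: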
In [54]: help(page.getReferences)
Help on method getReferences in module pywikibot.page:
getReferences(follow_redirects=True, withTemplateInclusion=True, onlyTemplateInclusion=False, redirectsOnly=False, namespaces=None, total=None, content=False, step=NotImplemented) method of pywikibot.page.Page instance
Return an iterator all pages that refer to or embed the page.
If you need a full list of referring pages, use
C{pages = list(s.getReferences())}
@param follow_redirects: if True, also iterate pages that link to a
redirect pointing to the page.
@param withTemplateInclusion: if True, also iterate pages where self
is used as a template.
@param onlyTemplateInclusion: if True, only iterate pages where self
is used as a template.
@param redirectsOnly: if True, only iterate redirects to self.
@param namespaces: only iterate pages in these namespaces
@param total: iterate no more than this number of pages in total
@param content: if True, retrieve the content of the current version
of each referring page (default False)
In [55]: help(page.backlinks)
Help on method backlinks in module pywikibot.page:
backlinks(followRedirects=True, filterRedirects=None, namespaces=None, total=None, content=False, step=NotImplemented) method of pywikibot.page.Page instance
Return an iterator for pages that link to this page.
@param followRedirects: if True, also iterate pages that link to a
redirect pointing to the page.
@param filterRedirects: if True, only iterate redirects; if False,
omit redirects; if None, do not filter
@param namespaces: only iterate pages in these namespaces
@param total: iterate no more than this number of pages in total
@param content: if True, retrieve the content of the current version
of each referring page (default False)
How To Edit A Page
Suppose you want to change a page's text. How do you go about doing that?
First, you can access a page's text using the text attribute:
>>> print(page.text)
==Stage 1: Collecting System Data==
===COMPLETED Phase 1a: Netdata===
First, we set up [[Netdata]].
...
Once we've retrieved a page, we can update its text as follows:
>>> page.text = u"new page text"
>>> page.save(u"Log message for this edit")
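In practice an edit usually changes part of the existing text rather than replacing all of it. A hedged sketch built only on the `text` attribute and `save()` call shown above; `replace_in_page` is a hypothetical helper, and skipping the save when nothing changed is a design choice for this example:

```python
def replace_in_page(page, old, new, summary):
    """Replace `old` with `new` in the page text; save only if it changed."""
    updated = page.text.replace(old, new)
    if updated == page.text:
        return False  # nothing changed, so save() is not called
    page.text = updated
    page.save(summary)
    return True
```

For example, `replace_in_page(page, 'Netdata', 'netdata', 'Lowercase tool name')` would save one edit if the word occurs on the page and do nothing otherwise.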
All Available Methods
Page Object Methods
A list of all available methods for Page objects:
>>> dir(page)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_applicable_protections', '_cache_attrs', '_cmpkey', '_contentmodel', '_cosmetic_changes_hook', '_getInternals', '_get_parsed_page', '_isredir', '_latest_cached_revision', '_link', '_namespace_obj', '_pageid', '_protection', '_revid', '_revisions', '_save', '_timestamp', 'applicable_protections', 'aslink', 'autoFormat', 'backlinks', 'botMayEdit', 'canBeEdited', 'categories', 'change_category', 'clear_cache', 'content_model', 'contributingUsers', 'contributors', 'coordinates', 'data_item', 'data_repository', 'defaultsort', 'delete', 'depth', 'editTime', 'embeddedin', 'encoding', 'exists', 'expand_text', 'extlinks', 'fullVersionHistory', 'full_url', 'get', 'getCategoryRedirectTarget', 'getCreator', 'getDeletedRevision', 'getLatestEditors', 'getMovedTarget', 'getOldVersion', 'getRedirectTarget', 'getReferences', 'getRestrictions', 'getTemplates', 'getVersionHistory', 'getVersionHistoryTable', 'image_repository', 'imagelinks', 'interwiki', 'isAutoTitle', 'isCategory', 'isCategoryRedirect', 'isDisambig', 'isEmpty', 'isFlowPage', 'isImage', 'isIpEdit', 'isRedirectPage', 'isStaticRedirect', 'isTalkPage', 'is_categorypage', 'is_filepage', 'is_flow_page', 'iterlanglinks', 'itertemplates', 'langlinks', 'lastNonBotUser', 'latestRevision', 'latest_revision', 'latest_revision_id', 'linkedPages', 'loadDeletedRevisions', 'markDeletedRevision', 'merge_history', 'move', 'moved_target', 'namespace', 'oldest_revision', 'pageAPInfo', 'page_image', 'pageid', 'permalink', 'preloadText', 'previousRevision', 'previous_revision_id', 'properties', 'protect', 'protection', 'purge', 'put', 'put_async', 'raw_extracted_templates', 'removeImage', 'replaceImage', 'revision_count', 'revisions', 'save', 'section', 'sectionFreeTitle', 'set_redirect_target', 'site', 'templates', 'templatesWithParams', 'text', 'title', 'titleForFilename', 'titleWithoutNamespace', 'toggleTalkPage', 'touch', 'undelete', 'urlname', 'userName', 'version', 'watch']
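The raw dir() output mixes special and private names with the public API. A small sketch that keeps only the names without a leading underscore; `public_names` is a hypothetical helper and works on any Python object:

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]
```

`public_names(page)` would then start at 'applicable_protections' and skip everything from '__class__' through '_timestamp'.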
Revision Object Methods
>>> revs = list(page.revisions())
>>> dir(revs[0])
['FullHistEntry', 'HistEntry', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_content_model', '_parent_id', '_sha1', '_thank', 'anon', 'comment', 'content_model', 'full_hist_entry', 'hist_entry', 'minor', 'parent_id', 'revid', 'rollbacktoken', 'sha1', 'text', 'timestamp', 'user']