Pywikibot/Pages
From charlesreid1
Page Objects
Page objects represent individual MediaWiki pages.
Many of the Site object's methods return Page objects or PageGenerators (see Pywikibot/Sites).
To get a Site object to begin with, you need a user-config.py file (see Pywikibot/Setup). Then run the following from Python, in the directory where your user config file is located:
import pywikibot
s = pywikibot.Site()
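The examples on this page use a variable named page without showing where it comes from. Below is a minimal sketch of one way to create it, assuming the user-config.py setup described above. load_page is a hypothetical helper name. The title 'AOCP/Binomial Coefficients' is the page that appears in the backlinks output further down.

```python
def load_page(title, site=None):
    """Return a pywikibot Page object for `title` (hypothetical helper).

    If no site is given, pywikibot.Site() builds one from the settings
    in user-config.py.
    """
    # Imported here so the helper can be defined even where pywikibot is absent.
    import pywikibot

    if site is None:
        site = pywikibot.Site()
    return pywikibot.Page(site, title)
```

Usage, assuming a working configuration: page = load_page('AOCP/Binomial Coefficients'), or pass the site s from above as load_page('AOCP/Binomial Coefficients', s).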
Revision Objects
Each page has an edit history, made up of the versions of the page at different points in time. A Revision object holds the information about one revision of a given page: its revision ID, author, timestamp, edit comment, and so on.
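The hypothetical helper below sketches one way to read that history. It walks page.revisions() and collects a few fields for each revision. It assumes two things: that revisions() accepts a total limit, as the link methods below do, and that each Revision exposes revid, timestamp, user and comment. All four of those attributes appear in the Revision attribute listing at the end of this page.

```python
def revision_summaries(page, total=10):
    """Return (revid, timestamp, user, comment) tuples for up to `total` revisions.

    `page` is assumed to behave like a pywikibot Page whose revisions()
    yields Revision objects with revid, timestamp, user and comment attributes.
    """
    summaries = []
    for rev in page.revisions(total=total):
        summaries.append((rev.revid, rev.timestamp, rev.user, rev.comment))
    return summaries
```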
Useful Actions
The following Page methods are useful for building a graph of the links on a wiki:
- page.backlinks - lists pages that link to the given page
- page.linkedPages - lists pages on this wiki that this page links to
- page.extlinks - lists external targets that this page links to
- page.getReferences - similar to backlinks, but includes transclusions too
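These methods can be combined into a link graph by calling linkedPages() on every page of interest. The sketch below assumes pywikibot-style Page objects, meaning anything with title() and linkedPages(). These could come from the site's allpages() generator, for example. build_link_graph and invert_graph are hypothetical names. Inverting the outgoing-link graph gives a local approximation of backlinks without further API calls, but only for the pages you crawled.

```python
def build_link_graph(pages):
    """Map each page title to the sorted titles of the pages it links to.

    `pages` is any iterable of objects providing title() and linkedPages(),
    such as pywikibot Page objects.
    """
    graph = {}
    for page in pages:
        targets = {linked.title() for linked in page.linkedPages()}
        graph[page.title()] = sorted(targets)
    return graph


def invert_graph(graph):
    """Return a map from each linked-to title to the sorted titles linking to it.

    Titles that receive no links from the crawled pages are omitted.
    """
    inbound = {}
    for source, targets in graph.items():
        for target in targets:
            inbound.setdefault(target, set()).add(source)
    return {title: sorted(sources) for title, sources in inbound.items()}
```

For example, graph = build_link_graph(s.allpages(total=50)) would use the site s from above. Expect one or more API requests per page, so limit the crawl on large wikis.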
backlinks
Note that backlinks is a method. Referencing it without parentheses only returns the bound method. Calling it returns an iterator, which list() turns into a list of Page objects:
In [43]: page.backlinks
Out[43]: <bound method deprecated_args.<locals>.decorator.<locals>.wrapper of Page('AOCP/Binomial Coefficients')>
In [44]: page.backlinks()
Out[44]: <itertools.chain at 0x10838c828>
In [45]: list(page.backlinks())
Out[45]: [Page('Flags'), Page('Algorithms/Combinatorics and Heuristics'), Page('Algorithms/Combinatorics'), Page('AOCP/Multisets'), Page('AOCP/Permutations'), Page('Template:AOCPFlag'), Page('AOCP/Multinomial Coefficients'), Page('AOCP/Harmonic Numbers'), Page('AOCP/Fibonacci Numbers'), Page('ACOP/Generating Functions'), Page('AOCP'), Page('AOCP/Generating Functions'), Page('Generating Functions'), Page('AOCP/Combinatorics'), Page('Cards'), Page('Binomial Coefficients'), Page('AOCP/Generating Permutations and Tuples'), Page('Letter Coverage'), Page('Five Letter Words')]
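backlinks() accepts a total parameter (see its help() output in the getReferences section), so you can cap how many inbound links are counted. The sketch below ranks pages by how many pages link to them. rank_by_backlinks is a hypothetical helper name, and it assumes objects providing title() and backlinks(total=...), as pywikibot Page objects do.

```python
def rank_by_backlinks(pages, limit=None):
    """Return (title, backlink_count) pairs, most-linked first.

    `limit` caps how many backlinks are counted per page (None means no cap).
    """
    counts = []
    for page in pages:
        n = sum(1 for _ in page.backlinks(total=limit))
        counts.append((page.title(), n))
    # Highest count first; ties are broken alphabetically by title.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts
```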
linkedPages
Calling linkedPages() returns all pages that the current page links to. It returns a PageGenerator object, like the site's allpages() method. As before, we pass the generator to list() to consume it and build a list of the results:
In [46]: list(page.linkedPages())
Out[46]: [Page('AOCP/Boolean Functions'), Page('AOCP/Combinatorial Algorithms'), Page('AOCP/Infinite Series'), Page('Algorithm Analysis/Randomized Quick Sort'), Page('Algorithm Analysis/Substring Pattern Matching'), Page('ACOP/Generating Functions'), Page('AOCP/Combinatorics'), Page('AOCP/Fibonacci Numbers'), Page('AOCP/Five Letter Words'), Page('AOCP/Generating Permutations and Tuples'), Page('AOCP/Harmonic Numbers'), Page('AOCP/Multinomial Coefficients'), Page('AOCP/Multisets'), Page('Algorithm Analysis/Matrix Multiplication'), Page('Algorithm Analysis/Merge Sort'), Page('Algorithm complexity'), Page('Algorithmic Analysis of Sort Functions'), Page('Algorithms'), Page('Algorithms/Combinatorics'), Page('Algorithms/Combinatorics and Heuristics'), Page('Algorithms/Data Structures'), Page('Algorithms/Graphs'), Page('Algorithms/Optimization'), Page('Algorithms/Search'), Page('Algorithms/Sort'), Page('Algorithms/Strings'), Page('Amortization'), Page('Amortization/Accounting Method'), Page('Binary Search'), Page('Binary Search Modifications'), Page('CS'), Page('Cards'), Page('Divide and Conquer'), Page('Divide and Conquer/Master Theorem'), Page('Estimation'), Page('Estimation/BitsAndBytes'), Page('Five Letter Words'), Page('Flags'), Page('Heap Sort'), Page('Letter Coverage'), Page('Merge Sort'), Page('Project Euler'), Page('Quick Sort'), Page('Rubiks Cube/Permutations'), Page('Rubiks Cube/Tuples'), Page('Skiena Chapter 4 Questions'), Page('Theta vs Big O'), Page('Template:AOCPFlag'), Page('Template:AlgorithmsFlag'), Category('Category:AOCP')]
In [47]: type(page.linkedPages())
Out[47]: pywikibot.data.api.PageGenerator
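The linkedPages() output mixes ordinary pages with templates and a Category object (Category('Category:AOCP')). The sketch below separates out the category links. It assumes each item provides title() and is_categorypage(), both of which appear in the Page method listing at the end of this page. split_categories is a hypothetical name.

```python
def split_categories(linked_pages):
    """Separate category links from all other links.

    Returns (categories, others) as two lists of titles, in input order.
    """
    categories, others = [], []
    for page in linked_pages:
        if page.is_categorypage():
            categories.append(page.title())
        else:
            others.append(page.title())
    return categories, others
```

Usage: cats, others = split_categories(page.linkedPages()).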
extlinks
Asking for the external links on a given page returns a plain Python generator of URL strings:
In [48]: type(page.extlinks())
Out[48]: generator
In [49]: list(page.extlinks())
Out[49]: ['http://charlesreid1.com/w/index.php?title=Template:AOCPFlag&action=edit', 'http://charlesreid1.com/w/index.php?title=Template:AlgorithmsFlag&action=edit', 'https://charlesreid1.com:3000/cs/study-plan']
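Because extlinks() yields plain URL strings, the standard library is enough to summarize them. The sketch below counts external links per host, ignoring scheme and port. count_link_hosts is a hypothetical name.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_link_hosts(urls):
    """Count URLs per host name (lowercased, without scheme or port)."""
    # urlsplit(...).hostname is None for URLs without a host; count those under ''.
    return Counter(urlsplit(url).hostname or "" for url in urls)
```

For the three links shown above, count_link_hosts(page.extlinks()) would count all three under 'charlesreid1.com'.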
getReferences
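It is not obvious from the output how getReferences differs from backlinks. The two results are almost identical: only one page, 'Binomial Coefficients', appears in backlinks but not in getReferences. The two methods also return different iterator types: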
In [50]: list(page.getReferences())
Out[50]: [Page('Flags'), Page('Algorithms/Combinatorics and Heuristics'), Page('Algorithms/Combinatorics'), Page('AOCP/Multisets'), Page('AOCP/Permutations'), Page('Template:AOCPFlag'), Page('AOCP/Multinomial Coefficients'), Page('AOCP/Harmonic Numbers'), Page('AOCP/Fibonacci Numbers'), Page('ACOP/Generating Functions'), Page('AOCP'), Page('AOCP/Generating Functions'), Page('Generating Functions'), Page('AOCP/Combinatorics'), Page('Cards'), Page('AOCP/Generating Permutations and Tuples'), Page('Letter Coverage'), Page('Five Letter Words')]
In [51]: type(page.getReferences())
Out[51]: itertools.islice
In [52]: type(page.backlinks())
Out[52]: itertools.chain
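Calling help() on each method shows how they differ. getReferences can also include pages that use this page as a template (withTemplateInclusion), while backlinks has a filterRedirects option for including or excluding redirects: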
In [54]: help(page.getReferences)
Help on method getReferences in module pywikibot.page:
getReferences(follow_redirects=True, withTemplateInclusion=True, onlyTemplateInclusion=False, redirectsOnly=False, namespaces=None, total=None, content=False, step=NotImplemented) method of pywikibot.page.Page instance
    Return an iterator all pages that refer to or embed the page.
    If you need a full list of referring pages, use C{pages = list(s.getReferences())}
    @param follow_redirects: if True, also iterate pages that link to a redirect pointing to the page.
    @param withTemplateInclusion: if True, also iterate pages where self is used as a template.
    @param onlyTemplateInclusion: if True, only iterate pages where self is used as a template.
    @param redirectsOnly: if True, only iterate redirects to self.
    @param namespaces: only iterate pages in these namespaces
    @param total: iterate no more than this number of pages in total
    @param content: if True, retrieve the content of the current version of each referring page (default False)
In [55]: help(page.backlinks)
Help on method backlinks in module pywikibot.page:
backlinks(followRedirects=True, filterRedirects=None, namespaces=None, total=None, content=False, step=NotImplemented) method of pywikibot.page.Page instance
    Return an iterator for pages that link to this page.
    @param followRedirects: if True, also iterate pages that link to a redirect pointing to the page.
    @param filterRedirects: if True, only iterate redirects; if False, omit redirects; if None, do not filter
    @param namespaces: only iterate pages in these namespaces
    @param total: iterate no more than this number of pages in total
    @param content: if True, retrieve the content of the current version of each referring page (default False)
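To find exactly which pages differ, compare the two result sets by title. reference_differences is a hypothetical helper that works with pywikibot-style Page objects. One guess, which is an assumption the output above does not confirm: backlinks() does not filter redirects by default (filterRedirects=None), so the extra page may be a redirect to this page. Calling isRedirectPage() on that page, a method that appears in the Page method listing, would tell you.

```python
def reference_differences(page):
    """Compare page.backlinks() and page.getReferences() by title.

    Returns (only_in_backlinks, only_in_references) as sorted lists of titles.
    """
    back = {p.title() for p in page.backlinks()}
    refs = {p.title() for p in page.getReferences()}
    return sorted(back - refs), sorted(refs - back)
```

For the page above, the first list would be ['Binomial Coefficients'] and the second would be empty, going by the two outputs shown.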
How To Edit A Page
Suppose you want to change a page's text. How do you go about doing that?
First, you can access a page's text using the text attribute:
>>> print(page.text)
==Stage 1: Collecting System Data==
===COMPLETED Phase 1a: Netdata===
First, we set up [[Netdata]].
...
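page.text is plain wikitext, so ordinary string tools work on it. For example, the sketch below lists a page's section headings. It assumes each heading sits on its own line between matching runs of 1 to 6 '=' signs, as in the output above, and it ignores edge cases such as headings inside comments or nowiki tags. section_headings is a hypothetical name.

```python
import re

# A heading line: 1-6 '=' signs, the title, then the same run of '=' signs.
# [ \t]* is used instead of \s* so a match can never span a line break.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_headings(wikitext):
    """Return (level, title) pairs for each heading line in `wikitext`."""
    return [(len(m.group(1)), m.group(2)) for m in HEADING_RE.finditer(wikitext)]
```

Usage: section_headings(page.text).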
Once we've retrieved a page, we can update its text by assigning to the text attribute and then calling save(), whose first argument is the edit summary:
>>> page.text = u"new page text"
>>> page.save(u"Log message for this edit")
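A slightly safer editing pattern is to change the text programmatically and skip the save when nothing changed. Below is a minimal sketch using replace_and_save, a hypothetical helper. It works with any object that has a text attribute and a save(summary) method, such as a pywikibot Page.

```python
def replace_and_save(page, old, new, summary):
    """Replace every occurrence of `old` with `new` in page.text and save.

    Returns True if an edit was saved, or False if the text would not change
    (skipping the save avoids a pointless null edit).
    """
    original = page.text
    updated = original.replace(old, new)
    if updated == original:
        return False
    page.text = updated
    page.save(summary)  # first argument of save() is the edit summary
    return True
```

Usage: replace_and_save(page, "COMPLETED", "DONE", u"Rename status markers").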
All Available Methods
Page Object Methods
A list of all available attributes and methods of Page objects, as reported by dir():
>>> dir(page)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_applicable_protections', '_cache_attrs', '_cmpkey', '_contentmodel', '_cosmetic_changes_hook', '_getInternals', '_get_parsed_page', '_isredir', '_latest_cached_revision', '_link', '_namespace_obj', '_pageid', '_protection', '_revid', '_revisions', '_save', '_timestamp', 'applicable_protections', 'aslink', 'autoFormat', 'backlinks', 'botMayEdit', 'canBeEdited', 'categories', 'change_category', 'clear_cache', 'content_model', 'contributingUsers', 'contributors', 'coordinates', 'data_item', 'data_repository', 'defaultsort', 'delete', 'depth', 'editTime', 'embeddedin', 'encoding', 'exists', 'expand_text', 'extlinks', 'fullVersionHistory', 'full_url', 'get', 'getCategoryRedirectTarget', 'getCreator', 'getDeletedRevision', 'getLatestEditors', 'getMovedTarget', 'getOldVersion', 'getRedirectTarget', 'getReferences', 'getRestrictions', 'getTemplates', 'getVersionHistory', 'getVersionHistoryTable', 'image_repository', 'imagelinks', 'interwiki', 'isAutoTitle', 'isCategory', 'isCategoryRedirect', 'isDisambig', 'isEmpty', 'isFlowPage', 'isImage', 'isIpEdit', 'isRedirectPage', 'isStaticRedirect', 'isTalkPage', 'is_categorypage', 'is_filepage', 'is_flow_page', 'iterlanglinks', 'itertemplates', 'langlinks', 'lastNonBotUser', 'latestRevision', 'latest_revision', 'latest_revision_id', 'linkedPages', 'loadDeletedRevisions', 'markDeletedRevision', 'merge_history', 'move', 'moved_target', 'namespace', 'oldest_revision', 'pageAPInfo', 'page_image', 'pageid', 'permalink', 'preloadText', 'previousRevision', 'previous_revision_id', 'properties', 'protect', 'protection', 'purge', 'put', 'put_async', 'raw_extracted_templates', 'removeImage', 'replaceImage', 'revision_count', 'revisions', 'save', 'section', 'sectionFreeTitle', 'set_redirect_target', 'site', 'templates', 'templatesWithParams', 'text', 'title', 'titleForFilename', 'titleWithoutNamespace', 'toggleTalkPage', 'touch', 'undelete', 'urlname', 'userName', 'version', 'watch']
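Many of the names in this listing are private or dunder attributes. To see only the public API, you can filter out names that start with an underscore. public_names is a hypothetical helper.

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]
```

Usage: public_names(page).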
Revision Object Methods
>>> revs = list(page.revisions())
>>> dir(revs[0])
['FullHistEntry', 'HistEntry', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_content_model', '_parent_id', '_sha1', '_thank', 'anon', 'comment', 'content_model', 'full_hist_entry', 'hist_entry', 'minor', 'parent_id', 'revid', 'rollbacktoken', 'sha1', 'text', 'timestamp', 'user']
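Because each Revision exposes a text attribute, two revisions can be compared with the standard library's difflib. revision_diff is a hypothetical helper. One assumption: the example above calls revisions() without requesting content, which may leave text unset. For that reason the helper treats a missing text as empty, and the usage note asks for content explicitly.

```python
import difflib


def revision_diff(old_text, new_text, old_label="old", new_label="new"):
    """Return a unified diff between two revision texts as one string."""
    # Treat missing text (e.g. content not requested) as an empty page.
    old_text = old_text or ""
    new_text = new_text or ""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=old_label,
        tofile=new_label,
    )
    return "".join(lines)
```

Usage, assuming the default newest-first ordering: revs = list(page.revisions(total=2, content=True)), then print(revision_diff(revs[1].text, revs[0].text, str(revs[1].revid), str(revs[0].revid))).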