Pywikibot: Difference between revisions
From charlesreid1
Revision as of 18:43, 31 January 2018
Setting this up is confusing as hell, mainly because the documentation is lacking.
Pywikibot is a single standalone Python script that works a little bit like a framework. To use it, you assemble various "actions", and run each action through the pywikibot Python script. The first action you'll usually run is the login action, which stores credentials for a wiki. Then, you can run any of the other built-in actions, or define your own actions.
Getting, Configuring, Installing
I have the pywikibot software set up with two remotes: one official (Wikimedia gerrit), and one unofficial (my own git repo).
Link to pywikibot on Wikimedia Foundation's gerrit: https://gerrit.wikimedia.org/r/pywikibot/core.git
Link to pywikibot on git.charlesreid1.com: https://charlesreid1.com:3000/wiki/pywikibot
Wikimedia gerrit
Note the official pywikibot repo is also mirrored on GitHub: https://github.com/wikimedia/pywikibot-core/
Start by checking it out:
$ git clone https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot
$ cd pywikibot
Install all the pip stuff that you may need:
$ pip install -r requirements.txt
Update git submodules:
$ git submodule update --init
Add a custom family file to the big directory of family files:
$ ls pywikibot/families
...
wikivoyage_family.py wiktionary_family.py wowwiki_family.py
This is where you will put your custom family file. Here's what the custom family file looks like:
from pywikibot import family

class Family(family.Family):
    def __init__(self):
        family.Family.__init__(self)
        self.name = 'charlesreid1'
        self.langs = {
            'en': 'charlesreid1.com',
        }
Copy and paste this into pwb/pywikibot/families/charlesreid1_family.py (where pwb is the name of the directory where you checked out the git repository).
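If you keep more than one wiki around, the family file can be generated rather than pasted by hand. The sketch below is only an illustration: render_family_file and family_file_path are hypothetical helper names (not part of pywikibot), and the template just reproduces the minimal family file shown above. It builds the text and the target path but does not write anything to disk.

```python
from pathlib import Path
from textwrap import dedent

# Template mirroring the minimal family file shown above.
FAMILY_TEMPLATE = dedent('''\
    from pywikibot import family


    class Family(family.Family):
        def __init__(self):
            family.Family.__init__(self)
            self.name = {name!r}
            self.langs = {langs!r}
''')


def render_family_file(name, langs):
    """Return the source text of a minimal family file (hypothetical helper)."""
    return FAMILY_TEMPLATE.format(name=name, langs=langs)


def family_file_path(checkout_dir, name):
    """Return <checkout>/pywikibot/families/<name>_family.py (hypothetical helper)."""
    return Path(checkout_dir) / "pywikibot" / "families" / f"{name}_family.py"


source = render_family_file("charlesreid1", {"en": "charlesreid1.com"})
target = family_file_path("pwb", "charlesreid1")
print(target)
print(source)
```

To actually install the file, write source to target yourself (for example with target.write_text(source)) once you have checked the output.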
Now you should be able to log into the wiki as your bot:
$ python pwb.py login
Password for user Bleep bloop on charlesreid1:en (no characters will be shown):
Logging in to charlesreid1:en as Bleep bloop
WARNING: /Users/charles/codes/pywikibot/pywikibot/tools/__init__.py:1717: UserWarning: File /Users/charles/codes/pywikibot/pywikibot.lwp had 644 mode; converted to 600 mode.
Logged in on charlesreid1:en as Bleep bloop.
git.charlesreid1.com
Link to pywikibot on git.charlesreid1.com: https://charlesreid1.com:3000/wiki/pywikibot
To push changes to the pywikibot on git.charlesreid1.com I set up the repo with another remote:
$ git remote add cmr https://charlesreid1.com:3000/wiki/pywikibot
$ git push cmr master
Running Simple Scripts
There are two ways to use pywikibot:
- Write your own custom actions
- Use a bundle of scripts that come packaged with pywikibot
Using Provided Scripts
Here is a list of all the pre-written scripts for MediaWiki wikis: https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Pywikibot/Scripts
These are also located in the scripts/ folder of the repository.
To run a given script, you actually run it THROUGH the pwb.py script. See example below.
Redirect.py Script
Suppose we wanted to run the script redirect.py to programmatically deal with redirects on our wiki. We can start by looking at the documentation for this file, which shows us there are many options for this script:
Script to resolve double redirects, and to delete broken redirects.
Requires access to MediaWiki's maintenance pages or to a XML dump file.
Delete function requires adminship.
Syntax:

    python pwb.py redirect action [-arguments ...]

where action can be one of these:

double         Fix redirects which point to other redirects.
do             Shortcut action command is "do".

broken         Tries to fix redirect which point to nowhere by using the last
br             moved target of the destination page. If this fails and the
               -delete option is set, it either deletes the page or marks it
               for deletion depending on whether the account has admin rights.
               It will mark the redirect not for deletion if there is no speedy
               deletion template available. Shortcut action command is "br".

both           Both of the above. Retrieves redirect pages from live wiki,
               not from a special page.

and arguments can be:

-xml           Retrieve information from a local XML dump
               (https://download.wikimedia.org). Argument can also be given as
               "-xml:filename.xml". Cannot be used with -fullscan or -moves.

-fullscan      Retrieve redirect pages from live wiki, not from a special page
               Cannot be used with -xml.

-moves         Use the page move log to find double-redirect candidates. Only
               works with action "double", does not work with -xml.

               NOTE: You may use only one of these options above.
               If neither of -xml -fullscan -moves is given, info will be
               loaded from a special page of the live wiki.

-page:title    Work on a single page

-namespace:n   Namespace to process. Can be given multiple times, for several
               namespaces. If omitted, only the main (article) namespace is
               treated.

-offset:n      With -moves, the number of hours ago to start scanning moved
               pages. With -xml, the number of the redirect to restart with
               (see progress). Otherwise, ignored.

-start:title   The starting page title in each namespace. Page need not exist.

-until:title   The possible last page title in each namespace. Page needs not
               exist.

-total:n       The maximum count of redirects to work upon. If omitted, there
               is no limit.

-delete        Prompt the user whether broken redirects should be deleted (or
               marked for deletion if the account has no admin rights) instead
               of just skipping them.

-sdtemplate:x  Add the speedy deletion template string including brackets.
               This enables overriding the default template via i18n or
               to enable speedy deletion for projects other than wikipedias.

-always        Don't prompt you for each replacement.
Suppose we want to eliminate double-redirects. To do this, we run the redirect script through pwb.py, and pass it the double argument like so:
$ python pwb.py redirect double
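If you want to drive these scripts from Python instead of typing the command each time, one option is to assemble the same command line and hand it to subprocess. This is a rough sketch, not something pywikibot provides: build_pwb_command is a hypothetical helper that only formats arguments in the "-name:value" style shown in the help text above and does not validate them.

```python
import sys


def build_pwb_command(script, *positional, **options):
    """Build an argv list like: python pwb.py <script> <action> -opt:value.

    Hypothetical helper: it only assembles the command line and does not
    check that the options are valid for the chosen script.
    """
    argv = [sys.executable, "pwb.py", script, *positional]
    for name, value in options.items():
        if value is True:
            argv.append(f"-{name}")          # bare flag, e.g. -always
        else:
            argv.append(f"-{name}:{value}")  # valued option, e.g. -total:10
    return argv


cmd = build_pwb_command("redirect", "double", total=10, always=True)
print(" ".join(cmd[1:]))

# To actually run it, execute from the pywikibot checkout (not done here):
#   import subprocess
#   subprocess.run(cmd, cwd="/path/to/pywikibot", check=True)
```

Keeping the command as a list (rather than one string) avoids shell quoting problems with page titles that contain spaces.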
Using Custom Scripts from Pywikibot Directory
Site and Page Objects
NOTE: for the following commands, you must be in the pywikibot repository folder (otherwise the pwb.py script will not be importable and the first command will fail).
We'll start out by creating Site and Page objects to obtain pages from the wiki.
Start ipython:
$ ipython
import the pwb module (the session below then reaches the pywikibot package as pwb.pywikibot):
In [1]: import pwb
if you want to import a custom script in the scripts folder, import it like this (supposing it is in scripts/beavo.py):
In [4]: import scripts.beavo
In [6]: scripts.beavo.test_match_any()
now create a Site object to represent the wiki you're dealing with (you have to have run the login script, to store the credentials to view/edit the wiki):
In [12]: pwb.pywikibot.Site()
Out[12]: APISite("en", "charlesreid1")
next, we can create a Page object to represent pages on the wiki (here site is the Site object from the previous step, stored in a variable):
In [15]: flag1 = pwb.pywikibot.Page(site,'Template:AircrackFlag')
this Page object has a very long list of available methods:
In [27]: dir(flag1)
Out[27]:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', '__weakref__', '_cache_attrs', '_cmpkey', '_cosmetic_changes_hook', '_getInternals', '_get_parsed_page', '_latest_cached_revision', '_link', '_namespace_obj', '_revisions', '_save', 'applicable_protections', 'aslink', 'autoFormat', 'backlinks', 'botMayEdit', 'canBeEdited', 'categories', 'change_category', 'clear_cache', 'content_model', 'contributingUsers', 'contributors', 'coordinates', 'data_item', 'data_repository', 'defaultsort', 'delete', 'depth', 'editTime', 'embeddedin', 'encoding', 'exists', 'expand_text', 'extlinks', 'fullVersionHistory', 'full_url', 'get', 'getCategoryRedirectTarget', 'getCreator', 'getDeletedRevision', 'getLatestEditors', 'getMovedTarget', 'getOldVersion', 'getRedirectTarget', 'getReferences', 'getRestrictions', 'getTemplates', 'getVersionHistory', 'getVersionHistoryTable', 'image_repository', 'imagelinks', 'interwiki', 'isAutoTitle', 'isCategory', 'isCategoryRedirect', 'isDisambig', 'isEmpty', 'isFlowPage', 'isImage', 'isIpEdit', 'isRedirectPage', 'isStaticRedirect', 'isTalkPage', 'is_categorypage', 'is_filepage', 'is_flow_page', 'iterlanglinks', 'itertemplates', 'langlinks', 'lastNonBotUser', 'latestRevision', 'latest_revision', 'latest_revision_id', 'linkedPages', 'loadDeletedRevisions', 'markDeletedRevision', 'merge_history', 'move', 'moved_target', 'namespace', 'oldest_revision', 'pageAPInfo', 'page_image', 'pageid', 'permalink', 'preloadText', 'previousRevision', 'previous_revision_id', 'properties', 'protect', 'protection', 'purge', 'put', 'put_async', 'raw_extracted_templates', 'removeImage', 'replaceImage', 'revision_count', 'revisions', 'save', 'section', 'sectionFreeTitle', 'set_redirect_target', 'site', 'templates', 'templatesWithParams', 'text', 'title', 'titleForFilename', 'titleWithoutNamespace', 'toggleTalkPage', 'touch', 'undelete', 'urlname', 'userName', 'version', 'watch']
Getting a Page's Version History
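To get a page's version history, use fullVersionHistory(). Note that it is deprecated: the warning in the output below recommends Page.revisions(content=True) instead.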
In [29]: hist = flag1.fullVersionHistory()
WARNING: /Users/charles/Library/Python/3.6/bin/ipython:1: DeprecationWarning: pywikibot.page.BasePage.fullVersionHistory is deprecated; use Page.revisions(content=True) instead.
#!/usr/local/opt/python3/bin/python3.6
In [30]: print(type(hist))
<class 'list'>
In [31]: print(type(hist[0]))
<class 'pywikibot.page.FullHistEntry'>
In [32]: item = hist[0]
In [33]: print(dir(item))
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '_asdict', '_fields', '_make', '_replace', '_source', 'count', 'index', 'revid', 'rollbacktoken', 'text', 'timestamp', 'user']
In [34]: item.timestamp
Out[34]: Timestamp(2017, 4, 15, 21, 13, 50)
In [36]: item._asdict()
Out[36]:
OrderedDict([('revid', 16270),
('timestamp', Timestamp(2017, 4, 15, 21, 13, 50)),
('user', 'Admin'),
('text',
"<br />\n\n{{Flag\n|header=aircrack-ng\n|text=a suite of tools for wireless cracking. \n\n\n[[Aircrack]] {{,}} [[:Category:Aircrack]]\n\n\n\n'''aircrack-ng'''\n\nMany Ways to Crack a Wifi: [[Cracking Wifi]]\n\nAircrack Benchmarking: [[Aircrack/Benchmarking]]\n\nWEP Attacks with Aircrack: [[Aircrack/WEP Cracking]]\n\nWPA Attacks with Aircrack: [[Aircrack/WPA Cracking]]\n\nAircrack Hardware: [[Aircrack/Packet Injection Testing]]\n\n[[Harvesting Wireless Network Information]]\n\n\n\n'''airodump-ng'''\n\nBasic Usage of [[Airodump]]\n\n\n\n[[:Category:Security]] {{,}} [[:Category:Wireless]] {{,}} [[:Category:Passwords]]\n\n\n<small>[[Flags]]</small> {{,}} <small>[[Template:AircrackFlag]]</small> {{,}} <small>[http://charlesreid1.com/w/index.php?title=Template:AircrackFlag&action=edit e]</small>\n}}\n\n[[Category:Aircrack]]\n[[Category:Security]]\n[[Category:Wireless]]\n[[Category:Networking]]\n\n<br />"),
('rollbacktoken', None)])
In [37]: item.timestamp.isoformat()
Out[37]: '2017-04-15T21:13:50Z'
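Since the deprecation warning above points to Page.revisions(content=True), here is a rough sketch of how the same information could be pulled out with that call. It is written against any object that has a revisions(content=...) method, and it runs offline using StubPage, a hypothetical stand-in (not a pywikibot class) seeded with the revid, user and timestamp shown above and a shortened placeholder text. With a real Page the timestamp would be a pywikibot Timestamp, so the exact isoformat() string may differ (the session above shows a trailing Z).

```python
from datetime import datetime
from itertools import islice
from types import SimpleNamespace


def revision_summaries(page, limit=3):
    """Return up to `limit` revisions as plain dicts.

    `page` is anything with a revisions(content=...) method whose items
    expose revid, user, timestamp and text attributes.
    """
    summaries = []
    for rev in islice(page.revisions(content=True), limit):
        summaries.append({
            "revid": rev.revid,
            "user": rev.user,
            "timestamp": rev.timestamp.isoformat(),
            "text_length": len(rev.text or ""),
        })
    return summaries


class StubPage:
    """Offline stand-in for a pywikibot Page (hypothetical, for illustration)."""

    def revisions(self, content=False):
        # Placeholder text; the real revision text is much longer.
        text = "{{Flag|header=aircrack-ng}}" if content else None
        yield SimpleNamespace(revid=16270, user="Admin",
                              timestamp=datetime(2017, 4, 15, 21, 13, 50),
                              text=text)


for summary in revision_summaries(StubPage()):
    print(summary)
```

Using islice keeps the loop from walking the full history of a page with thousands of revisions.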
Getting All Pages on a Wiki
The allpages() method returns a generator object that yields every page on the wiki:
In [49]: all_pages = list(site.allpages())
In [50]: print(type(all_pages))
<class 'list'>
In [51]: print(len(all_pages))
2292
In [52]: print(type(all_pages[0]))
<class 'pywikibot.page.Page'>
The Page generator object works like a normal Python generator - it won't return page names until it needs to. Wrapping it in a list() call above will walk through every page before constructing and returning the list. Don't do this on big wiki sites like Wikipedia!!!
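To make the laziness concrete, here is a small offline sketch. fake_allpages is a hypothetical stand-in for site.allpages() that yields made-up titles and records each one it produces; itertools.islice then takes only the first few items, so the rest are never generated.

```python
from itertools import islice

produced = []  # records every title the generator actually produced


def fake_allpages(total=2292):
    """Stand-in for site.allpages(): yields titles lazily (made-up data)."""
    for number in range(total):
        title = f"Page {number}"
        produced.append(title)
        yield title


first_five = list(islice(fake_allpages(), 5))
print(first_five)
print(len(produced))  # 5: the remaining titles were never generated
```

The same islice pattern should apply to a real allpages() generator, which is a safer way to peek at a large wiki than wrapping the whole thing in list().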
Using Custom Scripts from Outside Pywikibot Directory
You can install pywikibot and import it from anywhere (e.g., /tmp), but you will need to include a user config file. If you try to import pywikibot without a user-config file,
import pywikibot
you'll see this RuntimeError:
RuntimeError: No user-config.py found in directory '/tmp'.
Please check that user-config.py is stored in the correct location.
Directory where user-config.py is searched is determined as follows:
Return the directory in which user-specific information is stored.
This is determined in the following order:
1. If the script was called with a -dir: argument, use the directory
provided in this argument.
2. If the user has a PYWIKIBOT2_DIR environment variable, use the value
of it.
3. If user-config is present in current directory, use the current
directory.
4. If user-config is present in pwb.py directory, use that directory
5. Use (and if necessary create) a 'pywikibot' folder under
'Application Data' or 'AppData\Roaming' (Windows) or
'.pywikibot' directory (Unix and similar) under the user's home
directory.
Set PYWIKIBOT2_NO_USER_CONFIG=1 to disable loading user-config.py
@param test_directory: Assume that a user config file exists in this
directory. Used to test whether placing a user config file in this
directory will cause it to be selected as the base directory.
@type test_directory: str or None
@rtype: unicode
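The search order in that message is easier to follow as code. The function below is a simplified sketch of the five steps as quoted, not pywikibot's actual implementation: find_base_dir and its parameters are hypothetical names, step 5 only covers the Unix-style '.pywikibot' fallback, and nothing is created on disk.

```python
import os
import tempfile
from pathlib import Path


def find_base_dir(dir_arg=None, environ=None, cwd=None, pwb_dir=None, home=None):
    """Simplified sketch of the search order quoted above (not pywikibot code)."""
    environ = os.environ if environ is None else environ

    if dir_arg:                                    # 1. -dir: argument
        return Path(dir_arg)
    if environ.get("PYWIKIBOT2_DIR"):              # 2. environment variable
        return Path(environ["PYWIKIBOT2_DIR"])
    cwd = Path(cwd) if cwd else Path.cwd()
    if (cwd / "user-config.py").exists():          # 3. current directory
        return cwd
    if pwb_dir and (Path(pwb_dir) / "user-config.py").exists():
        return Path(pwb_dir)                       # 4. pwb.py directory
    home = Path(home) if home else Path.home()
    return home / ".pywikibot"                     # 5. Unix-style fallback


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    work.mkdir()
    # No config file anywhere yet, so the search falls through to step 5.
    print(find_base_dir(environ={}, cwd=work, home=tmp))
    # Once user-config.py exists in the working directory, step 3 wins.
    (work / "user-config.py").write_text("# placeholder\n")
    print(find_base_dir(environ={}, cwd=work, home=tmp) == work)
```

This is why running a script from /tmp fails while running it from the checkout works: only the checkout has a user-config.py at steps 3 or 4.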
When you are in the pywikibot directory, and you run the login script,
python pwb.py login
this will create a user-config.py configuration file with (almost entirely superfluous) information about the wiki you log into.
From there, it will work pretty much like it did before:
In [1]: import pywikibot
In [2]: s = pywikibot.Site()
In [3]: s
Out[3]: APISite("en", "charlesreid1")
In [5]: pywikibot.Page(s,'Linux/Wireless')
Out[5]: Page('Linux/Wireless')
In [8]: hist = list(p.fullVersionHistory())
WARNING: /Users/charles/Library/Python/3.6/bin/ipython:1: DeprecationWarning: pywikibot.page.BasePage.fullVersionHistory is deprecated; use Page.revisions(content=True) instead.
#!/usr/local/opt/python3/bin/python3.6
In [10]: hist = list(p.revisions(content=False))
In [11]: hist[0:3]
Out[11]:
[{'revid': 16262, 'text': None, 'timestamp': Timestamp(2017, 4, 15, 6, 35, 29), 'user': 'Admin', 'anon': False, 'comment': '/* Flags */', 'minor': False, 'rollbacktoken': None, '_parent_id': 16261, '_content_model': 'wikitext', '_sha1': 'df790e36c30e7895fea4e114d40d3515c0345b23'},
{'revid': 16261, 'text': None, 'timestamp': Timestamp(2017, 4, 15, 6, 32, 26), 'user': 'Admin', 'anon': False, 'comment': '/* Joining network with WPA encryption */', 'minor': False, 'rollbacktoken': None, '_parent_id': 16260, '_content_model': 'wikitext', '_sha1': '430d06d81ecca002199b895b4ab8de76615f86a2'},
{'revid': 16260, 'text': None, 'timestamp': Timestamp(2017, 4, 15, 6, 31, 10), 'user': 'Admin', 'anon': False, 'comment': '/* WPA Supplicant Method */', 'minor': False, 'rollbacktoken': None, '_parent_id': 16259, '_content_model': 'wikitext', '_sha1': 'ca1ee6ab00d2305992c71c14127e0fc474476a84'}]
In [12]: print(type(hist[0]))
<class 'pywikibot.page.Revision'>
In [19]: revision_dictionary = dict(hist[0].__dict__)
In [20]: print(revision_dictionary.keys())
dict_keys(['revid', 'text', 'timestamp', 'user', 'anon', 'comment', 'minor', 'rollbacktoken', '_parent_id', '_content_model', '_sha1'])
..............................................
In [49]: all_pages = list(site.allpages())
In [50]: print(type(all_pages))
<class 'list'>
In [51]: print(len(all_pages))
2292
In [52]: print(type(all_pages[0]))
<class 'pywikibot.page.Page'>
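Once revisions are turned into plain dictionaries like revision_dictionary above, ordinary Python is enough to analyze them. The sketch below uses the three revisions from the Linux/Wireless history shown earlier (trimmed to a few keys) and pulls the edited section name out of each comment, relying on MediaWiki's "/* Section */" prefix for section edits. edited_section is a hypothetical helper name, and the regular expression is only a first approximation.

```python
import re
from collections import Counter

# MediaWiki prefixes section edits with "/* Section name */".
SECTION_COMMENT = re.compile(r"^/\*\s*(.*?)\s*\*/")


def edited_section(comment):
    """Return the section name from an edit summary, or None if absent."""
    match = SECTION_COMMENT.match(comment or "")
    return match.group(1) if match else None


# Trimmed copies of the revision dictionaries shown above.
revisions = [
    {"revid": 16262, "user": "Admin", "comment": "/* Flags */"},
    {"revid": 16261, "user": "Admin",
     "comment": "/* Joining network with WPA encryption */"},
    {"revid": 16260, "user": "Admin", "comment": "/* WPA Supplicant Method */"},
]

for rev in revisions:
    print(rev["revid"], edited_section(rev["comment"]))

edits_per_user = Counter(rev["user"] for rev in revisions)
print(edits_per_user)
```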