Posts Tagged ‘python’

mongoDB: whoa.

June 17th, 2010

Yesterday, I timed inserting metadata about 172,201 wiki edits from the Fedora Project wiki into a local SQLite database with SQLAlchemy and Python. The process took about 8 minutes, used over 600 MB of RAM, and kept the disk busy the entire time.

Obviously not the sort of database engine we want running datanommer, which will have that many rows in the MediaWiki table alone.

So I set up mongoDB locally: I rebuilt the RPM that’s available in this review request, packaged up pymongo, and rewrote my Python script to use mongoDB.

Not only can the process of inserting my JSON dump of wiki edits be written in 10 lines of code, but it took less than 6 seconds to do.

Wow.
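
For the curious, here is a minimal sketch of what a loader like that could look like. It is not my actual script: the function names, the file name, and the database and collection names are all made up for illustration, and it assumes the JSON dump is a list of dicts and that a reasonably recent pymongo (one that has insert_many) is installed.

```python
import json


def load_edits(path):
    """Read a JSON dump that holds a list of edit-metadata dicts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def insert_edits(collection, edits, batch_size=1000):
    """Insert edits in batches and return how many documents were sent.

    `collection` is anything with an insert_many(list_of_dicts) method,
    such as a pymongo Collection.
    """
    sent = 0
    for start in range(0, len(edits), batch_size):
        batch = edits[start:start + batch_size]
        collection.insert_many(batch)
        sent += len(batch)
    return sent


# With pymongo installed and mongod running locally, usage might look like
# this (database, collection, and file names are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient()["datanommer"]["mediawiki"]
#   insert_edits(collection, load_edits("wiki-edits.json"))
```

Batching is only there to keep each request to the server a sensible size; for a dump this small, a single insert_many call would probably do as well.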


FAD NA 2010: “Can you see…”

May 26th, 2010

One of the great advantages of having membership within the Fedora Project (including all the little subgroups like ambassadors) centralized in FAS is that you can write a simple script to get some meaningful numbers.

We were discussing ambassador mentoring at FAD NA 2010 and one of the many proposals tossed back and forth was to require that people be in the project for a period of time before they apply to be an ambassador. David Nalley asked the group: “How long do people wait now before they join the ambassadors group in FAS?” Three seconds later he turned to me and asked me to write a script to find out. Here it is.

This script downloads a bunch of group data from FAS (which takes a little while because it needs to grab cla_done), then finds users who have signed the CLA (approved in cla_done) and who have applied to be an ambassador but have not yet been approved. For each of them, it determines the amount of time between signing the CLA and applying to be an ambassador in FAS (what we’ll call the “delta”). It prints two lines: the first is a sorted Python list of the deltas, converted to seconds; the second is the number of users the list describes (a count of the elements in the list).

(It should be noted that there is a cutoff for the usability of time-based data in FAS. For some reason or another, whether because FAS1 didn’t track times or because the upgrade to FAS2 overwrote them, timestamps for group joins and approvals are all horribly wrong before March 12, 2008 at 02:06 UTC. See line 11 in the script.)
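
If you just want the gist of the delta calculation without reading the script, here is a rough sketch of it on plain Python data. This is not the script itself and it does not talk to FAS: the function name and the two dict shapes are invented for illustration, and skipping every record with a timestamp before the cutoff is only my guess at a reasonable way to handle the bad data.

```python
from datetime import datetime, timezone

# Timestamps before this moment are unreliable (see the note above).
CUTOFF = datetime(2008, 3, 12, 2, 6, tzinfo=timezone.utc)


def pending_deltas(cla_approved, ambassador_applications):
    """Return sorted deltas, in whole seconds, for unapproved applicants.

    cla_approved: {username: datetime the CLA was approved}
    ambassador_applications: {username: (datetime applied, approved?)}
    Both shapes are hypothetical stand-ins for the FAS group data.
    """
    deltas = []
    for user, (applied, approved) in ambassador_applications.items():
        signed = cla_approved.get(user)
        if approved or signed is None:
            continue  # already an ambassador, or never signed the CLA
        if signed < CUTOFF or applied < CUTOFF:
            continue  # timestamps this old cannot be trusted
        deltas.append(int((applied - signed).total_seconds()))
    return sorted(deltas)


def report(deltas):
    """Print the same two lines the post describes: the list, then its size."""
    print(deltas)
    print(len(deltas))
```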

As of the FAD, here’s the data it produced (with line breaks added):

[65, 71, 90, 100, 117, 157, 177, 359, 367, 390, 432, 455, 518, 1032, 4174, 4327,
10162, 18168, 21257, 66571, 120267, 122254, 230746, 451587, 904754, 1293886,
1378508, 2001388, 2619665, 3862083, 6272559, 10794330, 15915004, 19977760,
36867582, 39432762]
36

Some conclusions we can make based on this data:

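  • The median delta was 19,712.5 seconds, which is about 5.5 hours. (The mean, dragged up by a handful of very long waits, was about 3,954,837 seconds, or roughly 46 days.)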
  • 20 of the 36 users (55.6%) had a delta of less than a day (86400 seconds). 7 of the 36 users (19.4%) had a delta of less than 5 minutes (300 seconds).
  • The maximum delta was 456.4 days (about 15 months).
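
Figures like these are easy to recompute from the list printed above. The following is a small sketch using only the standard library; the names DELTAS, DAY, and summarize are mine, not from the script.

```python
from statistics import mean, median

# The 36 deltas (in seconds) printed above.
DELTAS = [
    65, 71, 90, 100, 117, 157, 177, 359, 367, 390, 432, 455, 518, 1032, 4174,
    4327, 10162, 18168, 21257, 66571, 120267, 122254, 230746, 451587, 904754,
    1293886, 1378508, 2001388, 2619665, 3862083, 6272559, 10794330, 15915004,
    19977760, 36867582, 39432762,
]

DAY = 86400  # seconds in a day


def summarize(deltas):
    """Return a handful of summary figures for a list of deltas in seconds."""
    return {
        "count": len(deltas),
        "mean": mean(deltas),
        "median": median(deltas),
        "under_a_day": sum(1 for d in deltas if d < DAY),
        "under_five_minutes": sum(1 for d in deltas if d < 300),
        "max_days": max(deltas) / DAY,
    }


for name, value in summarize(DELTAS).items():
    print(name, value)
```

With a distribution this lopsided, the mean and the median land very far apart, so it is worth printing both before drawing conclusions.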

If you look to the comment on line 19 of the script, it’s a simple one-line change to get data for those who already have become ambassadors as well. Here’s the data for that, as of now(ish), with line breaks added:

[-7046682, -2244969, -2169415, -2105694, -1210664, -946193, -171773, -132781,
-105235, -88070, -11491, -2193, -380, -70, -31, -30, -13, 18, 19, 22, 26, 26,
26, 33, 33, 33, 36, 39, 40, 41, 43, 46, 47, 47, 47, 57, 59, 60, 61, 62, 62, 66,
66, 66, 67, 68, 69, 71, 71, 75, 76, 76, 77, 80, 85, 90, 90, 90, 92, 93, 95, 96,
98, 104, 105, 106, 109, 109, 109, 110, 111, 118, 119, 119, 120, 120, 127, 128,
131, 134, 135, 137, 139, 143, 145, 145, 146, 150, 150, 152, 155, 156, 158, 159,
168, 169, 176, 183, 185, 189, 191, 194, 194, 196, 198, 199, 205, 210, 211, 214,
217, 222, 222, 225, 237, 240, 243, 245, 252, 256, 258, 262, 264, 270, 272, 272,
278, 283, 294, 294, 295, 296, 297, 304, 306, 319, 321, 321, 323, 328, 335, 343,
346, 353, 361, 374, 378, 400, 400, 402, 412, 421, 441, 450, 452, 452, 455, 478,
484, 491, 520, 531, 575, 589, 607, 607, 621, 648, 658, 663, 705, 720, 722, 724,
732, 733, 738, 749, 753, 814, 827, 832, 874, 880, 929, 950, 956, 1012, 1014,
1041, 1046, 1131, 1286, 1381, 1408, 1430, 1559, 1577, 1821, 1845, 1887, 1906,
1971, 2028, 2165, 2195, 2424, 2479, 2640, 2901, 2934, 3094, 3339, 3354, 3364,
3413, 3414, 3711, 4874, 5386, 5426, 5577, 6329, 7416, 8916, 11001, 18324, 18575,
19330, 19936, 21462, 24887, 27708, 28870, 31331, 37117, 37872, 43673, 45269,
45565, 48128, 49488, 63696, 66359, 68765, 69655, 69813, 70958, 73441, 75468,
76693, 78022, 80469, 81074, 83926, 84313, 85884, 94732, 97918, 109199, 132682,
153970, 159001, 159096, 166200, 167190, 172526, 203033, 209366, 232599, 254839,
298215, 335812, 338047, 346164, 347030, 350391, 373753, 390049, 402758, 419056,
419722, 426483, 473510, 516436, 573911, 602051, 677595, 692417, 760878, 763579,
765369, 856220, 857455, 988386, 988834, 1000077, 1100141, 1208640, 1209160,
1296560, 1298298, 1391236, 1399265, 1409442, 1462069, 1468372, 1475776, 1549503,
1551292, 1556641, 1570053, 1644704, 1724047, 1727078, 1736449, 1819393, 1852417,
1883617, 1908922, 1969031, 1989497, 2075824, 2122750, 2139385, 2145740, 2186876,
2267192, 2292659, 2410660, 2430179, 2503012, 2594221, 2644249, 2699353, 2711578,
2826634, 2905727, 2917899, 2926825, 2928264, 3087834, 3130616, 3133132, 3772561,
4058559, 4446452, 4477283, 4590461, 4666894, 4771861, 4809502, 4868847, 5005004,
5058314, 5092264, 5183777, 5196236, 5411273, 5593249, 5628497, 5873109, 5947922,
6105292, 6240295, 6368175, 6488855, 7137656, 7348233, 7412019, 7524910, 7695694,
7712467, 7743736, 7950337, 8184019, 8226472, 8898541, 9143874, 9157720, 9354098,
9481789, 9552013, 9850428, 10295579, 10468848, 11302343, 11365382, 11483738,
11680912, 12374970, 12556286, 12776962, 12916884, 14004298, 14098912, 14506093,
14567374, 14836520, 15074649, 15868294, 16877210, 16920294, 17261366, 17462813,
17654050, 18496770, 18578171, 19207671, 19240507, 20335751, 20650780, 21510299,
21576474, 22797578, 25967324, 26705809, 26819684, 27315401, 27475767, 27628951,
28697835, 29272369, 29484943, 30322585, 30675304, 31282206, 31359463, 35558509,
36867582, 37016239, 37389204, 40520264, 43289246, 45256091, 45268939, 49846083,
56418326]
438

The first thing I noticed was that there were negative numbers. (lolwut?) These probably date from before FAS could require that you be in cla_done before joining ambassadors.

The main reason I’m posting this is to show that it’s really easy to pull group information from FAS and start messing with numbers. Take a look at pydoc fedora.client.fas2 and some other modules inside python-fedora. Looking at numbers can help you figure out what you can do within Fedora to help the project move along. (As for the proposal to require a certain amount of time as a contributor before becoming an ambassador, I’m not sure where that ended up. I think we determined it was unneeded, but I can’t quite remember.)


New awesomeness: mw

December 7th, 2009

During an extremely long hackfest today at FUDCon Toronto 2009, I planned to work on resurrecting fuse-mediawiki from its 15-month slumber.

I failed.

After talking with Jesus M. Rodriguez for an hour or so, we agreed that FUSE is not the right tool for what I want to accomplish. The only thing we were planning to use FUSE for so far was downloading the wiki pages; everything else would be done with helper scripts.

We discussed things like “pull” and “commit”. It started to sound like a bastardized VCS. So we wrote a bastardized VCS. :)

Introducing mw: a command-line program with subcommands like “fetch” and “commit” to work with MediaWiki installations. I spent all day creating the framework for commands and all sorts of things, and ended up creating the init and fetch commands to start a mw repo and fetch some pages.
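
To give a feel for what a subcommand framework involves, here is a tiny sketch of the general pattern using argparse. It is not mw’s actual code: the handler functions, their arguments (including init taking a URL), and the strings they return are all placeholders I made up for illustration.

```python
import argparse


def cmd_init(args):
    # Placeholder: a real command would set up the repo here.
    return "would start a repo for {}".format(args.url)


def cmd_fetch(args):
    # Placeholder: a real command would download the pages here.
    return "would fetch: {}".format(", ".join(args.pages))


def build_parser():
    parser = argparse.ArgumentParser(prog="mw")
    subcommands = parser.add_subparsers(dest="command", required=True)

    init = subcommands.add_parser("init", help="start a mw repo")
    init.add_argument("url")
    init.set_defaults(func=cmd_init)

    fetch = subcommands.add_parser("fetch", help="fetch wiki pages")
    fetch.add_argument("pages", nargs="+")
    fetch.set_defaults(func=cmd_fetch)
    return parser


def run(argv):
    """Parse argv and dispatch to whichever subcommand was named."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Adding another subcommand such as “commit” is then just one more add_parser block plus a handler function.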

Currently: useless. Future: promising. I’m hoping that I can get the committing portion ready to roll within the week, and have fetch get all the pages of wikis and categories soonish.

Some key awesomeness: attempts to merge instead of just giving up (haha, you suck, MediaWiki), unified diffs, logs, and anything you really feel like doing.
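
As an illustration of the unified diff part, Python’s standard difflib module can already produce one from two versions of a page’s text. This is a sketch of the idea only, not how mw does it; page_diff and the a/ and b/ prefixes are my own choices.

```python
import difflib


def page_diff(fetched, edited, title):
    """Return a unified diff between the fetched and edited page text."""
    return "".join(difflib.unified_diff(
        fetched.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile="a/" + title,
        tofile="b/" + title,
    ))


print(page_diff("one\ntwo\n", "one\n2\n", "Sandbox.wiki"))
```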

Clone it now and read the README and HACKING:

git clone git://github.com/ianweller/mw.git

Edit: If you want to discuss this with me at FUDCon tomorrow, by all means do. Ping me on IRC to see where I’m at. :)


fuse-mediawiki 0.1

July 27th, 2008

A fun personal project of mine, fuse-mediawiki, has been pushed to 0.1. It’s most likely still very broken, but it’d be nice if people would be able to test it a bit, submit patches, whatever.

Fetch the source with

$ git clone git://repo.or.cz/fuse-mediawiki.git

and play around. This’ll get you started with the Fedora Project wiki:

$ mkdir ~/wiki/
$ python fuse-mediawiki.git https://fedoraproject.org/w/index.php ~/wiki/ --auth-basic -u FAS_USERNAME
$ mkdir -p ~/wiki/content/User:Ianweller/
$ vim ~/wiki/content/User:Ianweller/fuse-mediawiki_playground.wiki

and after a :wq your changes will be committed to the wiki. Exit the filesystem with

$ fusermount -u ~/wiki/

Do NOT, under any circumstances, use this for real work and blame me for any damage caused. However, please do test it in places where it doesn’t matter what happens, and let me know what breaks.

There is currently nothing to prevent you from overwriting somebody else’s changes. There is currently nothing that clears out the cache of a page unless you remount it.

I have no clue how this works in Emacs, or gedit, or anything else. Patches welcome to fix it. :)

If you’re trying to debug something, append the -f option to the command line; it’ll keep the program in the foreground and print fun debugging information. Read the README for more info.

This may be a personal project, but if somebody would like to work on this with me, that’d be great! Shoot me an email.

Edit: I fail. The correct option for HTTP basic authentication is --http-basic, not --auth-basic as shown in the command above.
