The Blog




I've been listening to Lawrence Lessig giving an interview to Digital Village and part of it is about the current arguments around Google Print.

I have a slightly different take on this. Most of the arguments from both sides and from commentators have been about copyright and copyright law, focusing on fair use. I think this is a red herring. I think what's really happening here is horse trading in public between Google and the publishers. And the publishers' real worry is that Google is creating a new form of their product which they really should be creating themselves. In a few years, Google will hold a digital copy of the publishers' product and the publishers won't. And at some time in the future after that, the publishers will want to buy a digital copy from Google rather than create one themselves. At which point it may be rather more expensive than they would like, because Google will have sole control.

One point in the copyright arguments that I find particularly interesting is that fair use of the written word is fairly well understood and has a long history. Provided you give attribution where you can and limit what you quote, it's held to be entirely reasonable to include a snippet of text from a copyrighted work in your own work and then do whatever you like with it, including selling it. However there appears to be no equivalent fair use for any other form of media and communication, or at least that's what the major media companies would like you to believe, particularly where it relates to sound and vision. To make it completely clear: including a sample of music or video in your music or video is not allowed, while including a sample of text in your text is allowed. Now why is that, and is it right? As we move to an increasingly multimedia world and away from a pure text world, this limits our freedom of expression and our ability to create new works. It also limits Google's (and others') ability to provide search metadata for audio and video in comparison with their ability with text. This is the core of Lessig's arguments for a remix culture and against current copyright law.

Has anyone figured out bulk upload via FTP? I can't seem to make it work reliably.

What I want to do is a daily upload from a PHP script. The script runs, it appears to upload OK and ftp returns no errors, but Google rarely seems to process the file. And there's precious little feedback about what's going on. And when there is feedback, it comes 2, 6 or 12 hours later.

1) Are you supposed to use the same file over and over again? Or does it need a new unique filename for each upload?

2) Do you have to do one manual upload and then ftp over the same filename later? Or if you do the manual upload does that prevent using ftp later?

3) After doing an FTP upload by hand, FileZilla, Firefox and IE usually show no entry on the server side. What? But if you use Linux ftp from the command line and do a dir, the new file is there with the correct timestamps. If you try to upload the file twice in quick succession, the second upload fails with a permissions error. Try it an hour later and it goes through, as if Google were still processing the previous one. But the timestamps on the web bulk upload file display don't change.

Agh! This is getting extremely frustrating and Google support are not responding to trouble tickets.
The only mailing list I can find doesn't seem to be very helpful either.
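For reference, the daily upload step I'm aiming for looks roughly like this. A minimal sketch in Python rather than the PHP I'm actually using; the host, credentials and filename scheme are all placeholders, not a confirmed Google endpoint, and the date-stamped remote name is just one answer to open question 1 above:

```python
from ftplib import FTP
from datetime import date

def remote_name_for(day):
    """Date-stamped remote filename, in case Google requires each
    upload to have a unique name (open question 1 above)."""
    return "bulkfeed-%s.xml" % day.isoformat()

def upload_feed(host, user, password, local_path):
    """Upload the day's bulk feed file over FTP.
    host/user/password are placeholders for the real account details."""
    ftp = FTP(host)
    ftp.login(user, password)
    with open(local_path, "rb") as f:
        ftp.storbinary("STOR " + remote_name_for(date.today()), f)
    ftp.quit()
```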

If anyone can help please email julian_bond at voidstar.com




Quantum mechanics appears to work. The equations predict things we can test and these check out. But numerous anomalies and paradoxes appear if we try to scale them up to the real world. There are two classic problems.

1) Put a cat in a box with a radioactive pellet. If the pellet decays, release a poison capsule. After one half-life of the pellet, the cat has a 50% probability of being dead. The equations predict that the cat is both alive and dead until we open the box. Not either alive or dead, but both alive and dead.

2) There are lots of atomic interactions that generate a matched pair of particles with opposite values of some parameter, like the polarization of light photons. Before we measure one of them we don't know their state, but after measuring one we know the state of the other. Now let them move light years apart. When we measure one, we know the state of the other instantaneously. So something (information perhaps) has travelled across the gap faster than the speed of light.

There are a bunch of major views on all this.

- The Copenhagen interpretation. The equations don't reflect reality. They reflect a reality we need to create in order to think about what's happening. They're nothing but maths, although they are useful maths.

- Bell's Theorem. Particles that have been in touch continue to influence each other. They can only do this if the communication employs no known form of energy, since otherwise it would violate Relativity. Which leads to:

- Everett-Wheeler-Graham. Everything that can happen does but in another universe. When we open the box and discover a live cat we choose which universe to live in and it's the one with a live cat. Next door there is a universe where we found a dead cat.

- Hidden variable. There is an invisible hand below the quantum level that is manipulating reality to appear the way it does. Some people think this is consciousness.

- Non-objectivity. The universe has no reality aside from observation. If the tree falls in the wood and nobody hears it, there is no sound.

Now Reed's law suggests that the utility of large networks, particularly social networks, can scale exponentially with the size of the network. The reason is that the number of possible sub-groups of network participants is 2^N - N - 1, where N is the number of participants. This grows much more rapidly than either

* the number of participants, N, or
* the number of possible pair connections, N (N - 1) / 2, (which follows Metcalfe's law)
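The divergence is easy to see numerically. A quick Python check of the two formulas above:

```python
def reed(n):
    """Reed: number of possible sub-groups with two or more members."""
    return 2 ** n - n - 1

def metcalfe(n):
    """Metcalfe: number of possible pair connections."""
    return n * (n - 1) // 2

for n in (10, 20, 30):
    print(n, metcalfe(n), reed(n))
# At N=10 Metcalfe gives 45 pairs while Reed gives 1013 sub-groups,
# and the gap widens exponentially from there.
```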

But this requires that every entity in the network is in touch with every other entity, and all possible combinations of entities simultaneously. This violates several social relativity laws such as the number of people any one person can "know" and track. And as this is all happening in time rather than as a snapshot it assumes simultaneous communication across time and space at faster than light speed. So while the reality of the value of large networks does appear to be somewhere between Metcalfe and Reed, we can draw some parallels with the Quantum Physics paradoxes and theorems. :)

- Reed and Metcalfe are just mathematical formalisms. We don't actually know what they mean by value. But they seem to be handy when predicting the success of various networks.

- Two people who meet at a Social Networking meeting continue to influence each other across space and time (limited only by their ability to use email)

- Every possible combination of people at Social Networking meetings does in fact happen. Just not all in the same room.

- The network owners are actually manipulating all their members. They just don't all notice this.

- If you don't keep your eyes open at Social Networking meetings, you'll miss the details. If you're asleep in the corner and don't observe it, it's as if the meeting never happened.




I'm sure I've seen reports that Sony have only been selling their XCP Rootkit infected CDs in the USA.

So what's this then on the UK Amazon website?

Amazon.co.uk: XCP: Search Results Music

And why doesn't this page say anything about refunds in the UK?





I've been playing with Google Base and also tracking Amazon because I've been trying to find a book equivalent of last.fm. I settled on librarything.com and was looking at the Amazon API, lists and recommendation system. What's interesting in all this is categorization systems for general SKU#s. This also got raised in the Microformats group. The underlying problem is that a job, an event, a book, an iPod, a piece of music, and so on, all have different sets of attributes.

So back to Google Base. They've got their own set of top categories. Each category has a set of attributes that can be provided as part of an RSS or Atom upload using namespace extensions (Why isn't Dave Winer all over this? Somebody should tell him.). You can then also provide arbitrary tags that you can use in Google Base searches as pivots. These are kind of like Flickr/del.icio.us tags but the user interface is very different and there's no community feedback at all.

Now, inside Ecademy (and like Tribes) we've got our own copy of Craigslist. We use user-entered tags (unlike Tribes and Craigslist) to navigate these and to avoid me having to constantly maintain a category hierarchy. I'm working on uploading these automatically into Google Base. But if I'm going to do this really effectively I have to duplicate Google's top level categories and category attributes.

And so I finally get to the point. Google, Amazon and eBay with their APIs and uploads are forcing a kind of category imperialism on the rest of us. As much smaller developers, we have to match their category-attribute schemes in our systems or maintain lookup tables of the differences. And I really can't decide if this is a bad thing or a good thing. We do actually need standards for how to describe common SKUs. And the Microformat people could do worse than create an hGoogleJob format to match Google's idea of job. But then you get into the whole issue that Google's descriptions aren't always very good. For instance, they currently only support USA addresses and telephone numbers (surprise!). And Google are pretty much impossible to have a conversation with to try and improve them. There's no standards process here at all. We get to use whatever Google decides with no reference to us.

Much to ponder here.

Separately, Danny Ayers (I think) made a comment (which of course I can no longer find) about OPML, XOXO and RDF. He pointed out that a hierarchy is just a special case of a mesh, and that all real world problems are meshes. I'd take that further: we as humans have a hard time understanding and grokking meshes. It's as though they're multi-dimensional when our brains want to work in 3D, and preferably in 2D. And we work best manipulating symbols on 2D paper. Which is partly why we insist on reducing complex mesh problems to 2D hierarchies. And it's why I consider Outliners to be harmful. Somewhere in there is a big thesis on why we seem to like hierarchical command structures that support an Alpha Male. But I digress! One of the things that makes tag based folksonomies interesting is that they represent a way of navigating and understanding multi-dimensional meshes in a 2D user interface.




2005 Notebook Drive Roundup
I'm sure I'm going to be updating the drive sooner rather than later. You can never get enough storage, right? [from: del.icio.us]




Another quick thought about Google Base. It's good that they support RSS and Atom as transfer formats, but the Google-specific extensions are category specific. Which means knowing about their categories in my system and storing, retrieving and formatting the data specific to them for each category. This is going to be a form of category imperialism where our systems will have to be designed round their category data elements.

And FTPing a bulk file is a pretty old fashioned API. Why can’t they just subscribe to my RSS?




I've been messing around this afternoon with taking entries out of Ecademy marketplace and posting them into Google Base.

I've taken the last 15 items in the marketplace and written some code to produce a custom RSS feed and saved it locally. I've then uploaded it using an ftp program. Google is currently processing the entries and accepted the file. So I think I can do a daily update of the file, save it and then use ftp_put to upload it. There's probably a way of using ftp_fput to stream it straight from the code.
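As a sketch of the feed-generation side, here's roughly what one item looks like. This is Python rather than the PHP the site actually runs, and the g:item_type element is my reading of Google Base's namespace extensions, so treat the element names as illustrative:

```python
from xml.sax.saxutils import escape

def base_item(title, link, description, item_type):
    """Render one RSS <item> carrying a Google Base namespace
    extension (the g:item_type element name is an assumption)."""
    return ("<item><title>%s</title><link>%s</link>"
            "<description>%s</description>"
            "<g:item_type>%s</g:item_type></item>") % (
        escape(title), escape(link), escape(description), escape(item_type))
```

The escape() calls matter: marketplace titles routinely contain ampersands, and an unescaped one makes the whole feed invalid XML.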

Some discoveries.

1) The lack of a global location element is annoying. I've emailed Google about this but not heard anything. But then it is Thanksgiving. I think they ought to accept address strings that work in Google Maps, break them out into street, city, county/state and country, and/or provide a lat/long element. This all arises because the location element requires a full US address ("Should include street, city, state, postal code, and country, in that order") and partial data is apparently not allowed. So "Anytown, CA, 12345, USA" works, but "SG12 7DB, UK" or "London, UK" doesn't.

2) They've added new categories Blogs and News Articles. This seems to suggest that they want pretty much any arbitrary data. That includes all the blogs, meetings etc from Ecademy. Lots of overlap here with Blogsearch and Search.

3) There's a missing category for "Listing". They've got "Wanted Ads" and a bunch of others but no obvious way to say "it's a listing and I don't know what type". I can imagine that as other people do this we're all going to end up using their categories to describe our stuff in our systems. I feel vaguely uncomfortable about this.




So via Digg, I'm quickly scanning "The 'Phinest' source to the golden section, golden mean, divine proportion, Fibonacci series and phi. Explore its application to art, design, life, beauty, mathematics, geometry, stock markets, theology, cosmology and more."

I came across this
Phi, curiously, can also be expressed all in fives as:
5 ^ .5 * .5 + .5 = Phi

Friends of Robert Anton Wilson and the Principia Discordia will understand why I burst out laughing and splatted coffee over the desk.
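For the record, the all-fives identity checks out exactly: 5^0.5 × 0.5 + 0.5 is just (1 + √5)/2, the usual definition of phi, rearranged. A one-liner confirms it:

```python
phi = (1 + 5 ** 0.5) / 2        # the usual definition of the golden ratio
all_fives = 5 ** .5 * .5 + .5   # the "all in fives" form quoted above
assert abs(phi - all_fives) < 1e-15   # same expression, rearranged
```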

Tim Ireland does it again.
BB VII - The World According To Leo Blair

Bloody brilliant mate! Sheer genius to look at Blair's works through the eyes of his 5-year-old son.

What did you do in the war, Daddy?

ps. bonus Dr Who reference: Are you my Mummy?




HearFromYourMP.com - Sign up to hear from your MP about local issues, and to discuss them with other constituents
Those clever guys at MySociety have come up with another cool tool. Sign up and register with an email address and a postcode. They work out who your MP is. When enough people in their constituency do this they start badgering the MP via email to tell their constituents what they are up to. [from: del.icio.us]




You know what to do with a Link to Sony

AggregateKnowledge.com
This is very very interesting. Watch it. [from: del.icio.us]




Custom Scooter Girls Team!
Awesome collection of photos of customised megascooters owned by Japanese girls. [from: del.icio.us]




In Microsoft Netscape Doc Searls comments on Microsoft's plans for the web and the portrayal of this as a "war" with Netscape by the media.

Except that it's 1995. How naive we were then. How innocent.

What struck me as I read it was how few of the links in the document still work. Damn linkrot! Now think about an equivalent document being written and posted on the web now. How many of the links will still work in 2015?

About a month ago, we started taking the first paragraph of people's postings in Profiles, Blogs, Marketplace entries and elsewhere and putting it into the page description. We also started, or confirmed, that Ecademy was "pinging" the various blog update services. The end result is that Google is paying much more attention to us, and the page description in Google results reflects the first few lines of the body text. This has some implications:

1) Pay attention to the title and first sentence of your Profile, Blogs and Marketplace entries. This has a big effect on how easy the post is to find in Google and the description when it has been found.

2) The beta Google Blogsearch is indexing Ecademy blogs and Marketplace extremely quickly. A new post typically appears in the Blogsearch within an hour. It's not clear yet whether Google routes this through to the main index as well but I rather think they do now.

Technically, what we do is strip HTML tags from the body text of the entry, convert newlines to spaces, then take the first 180 characters broken at a word boundary and put the result into the Meta Description tag at the top of the page. So I'll stress this again. Think carefully about the first sentence and paragraph of your posts and profile.
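A minimal Python sketch of that pipeline (the production code is PHP, and the regex tag strip here is a simplification rather than what the site literally runs):

```python
import re

def meta_description(body_html, limit=180):
    """Strip tags, collapse newlines and runs of whitespace to single
    spaces, then truncate at a word boundary within the limit."""
    text = re.sub(r"<[^>]+>", "", body_html)   # crude HTML tag strip
    text = " ".join(text.split())              # newlines -> single spaces
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit + 1)        # last word boundary in range
    return text[:cut] if cut > 0 else text[:limit]
```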

The ping service "pings" a collection of services every 15 minutes if there are any new posts. We're currently pinging Weblogs.com, Pingomatic, Technorati and Blo.gs. This is belt and braces but ensures we come to the attention of most of the major search and aggregation engines. [from: JB Ecademy]

Good analysis that rips Sony a new one. Sony DRM is worse than you might think : "Sony compromised your system and will not directly allow you to remove it without compromising your privacy. It also will not replace your defective CDs with non-infected ones. If you hose your computer or network with this infection, and want to play your music, do not pass go, do not collect $200."

You might want to look here. The UK company that supplied the DRM software.

info@first4internet.co.uk
sales@first4internet.co.uk
webmaster@first4internet.co.uk
By Phone
Tel: +44 (0)1295 255777
Fax: +44 (0)1295 262682

By Post
6 South Bar Street
Banbury
Oxfordshire
OX16 9AA
United Kingdom

Management Team
Nick Bingham Chairman
Mathew Gilliat-Smith CEO
Tony Miles Operations & Technical Director
Peter Worrall Marketing & Research Director
Nick Drew ICA Business Development Manager

Anyway, yet another reason not to buy Sony products.

Cool! Yahoo! Maps Web Services - Geocoding API Convert any address to Lat/Long.

USA only. Oh Pants! How can anyone release a maps system that is USA only? So I thought I'd try http://maps.yahoo.co.uk/beta/#trf=0 but that redirects to a German page. "Yahoo! Deutschland - Seite nicht gefunden" (Page not found).

Come on guys.

Dave says, let's make the Google API an open standard. Actually, let's not. Let's push Google and the other search engines to do a very similar API in ReST and in XMLRPC. Who needs SOAP? And the API itself could do with some simplification. And while we're at it, let's push Google and the others to put APIs on their other services like Image Search, News Search, Blog Search, comparison shopping (Froogle), Maps, Mail, etc. And let's not forget that most of these services need RSS with a simple ReST interface as well. And while I'm grumbling, that goes for Amazon and eBay too.

Which raises the big problem with Google right now. Why don't they ever finish anything? They launch some really neat stuff but so much of it hardly ever gets any more attention and rarely gets changed. Microsoft has forgotten how to ship early and ship often. But arguably Google still hasn't learnt this one.




The Big Picture: DRM Crippled CD: A bizarre tale in 4 parts : Variety writes that "the new copy protection scheme which makes it difficult to rip CDs and listen to them with an iPod is designed to put pressure on Apple to open the iPod to other music services, rather than making it dependent on the iTunes Music Store for downloads."

It's DRM day today. Ludicrous as this entire story is, let's not forget that Apple is also guilty here, for whatever reasons, by using DRM to manipulate the market and protect its position as the exclusive distributor of music to iPod owners.
