Should everyone have access to models like DALL-E, or are the risks too great? And what does the law say about it? Also in this issue: news from Amazon (including a big purchase), China's regulation of algorithms, and a great new repository for learning System Design.
1. Has DALL-E opened Pandora’s box for us?
Lately, not a week has gone by without some news about DALL-E. That just shows how much the world has been turned upside down by the possibilities of image synthesis, especially at the level offered by the models unveiled this year. Alongside the stunning results, however, these models come with a host of problems that have no easy solution.
Until now, projects like DALL-E, Imagen, or even Midjourney were backed by companies and institutions that tried, as far as was possible in such a situation, to ensure their models would not be used for (broadly understood) malicious purposes: registrations, closed betas, a kind of gatekeeping around the results of their work. However, once a technology is taken out of the box, it is difficult to put it back. Such was the case with deepfakes (whose first public use was a synthesized erotic piece, so the beginnings were already “difficult”), and so it is with image synthesis. The first indications are that the devil is already out of the box and cannot easily be put back in.
That aforementioned devil is Stability AI, a startup whose goal is to “democratize image generation.” What does this phrase mean in practice? Their solution is not only meant to be accessible to everyone but also lacks some of the filters typical of similar solutions. For example, in the graphics below, you can find generated examples of Boris Johnson holding a gun, something you won’t get in DALL-E, at least not in such a realistic form.
Despite the initial concerns raised by the announcement of “democratization”, the model is not entirely without safeguards (e.g., it does not allow the creation of sexual or hateful content). Still, it pushes the boundaries a bit further than previous solutions. The Overton window will probably keep moving unless market regulators get a handle on the subject. But even then, I’m sure someone will eventually release widely available open-source or darknet solutions. The TechCrunch article lays out the risks of popularizing ML models very accurately. Sad to say, it won’t be long before you really can’t trust anything. The post-truth era is in full swing.
The regulation of how models are used is also an extremely interesting topic. Even the question of copyright in the generated images is wildly debatable. After all, who owns the images: the creators of the model, the model itself, or the users who generated them with their prompts? This is the subject of the VentureBeat article “Who owns DALL-E images? Legal AI experts weigh in”. What emerges from it is a rather sad picture: DALL-E usage will remain in a gray area until the first high-profile court case, whose outcome will shape legislation for years to come. That outcome, however, probably won’t rest on objective legal principles. It will be a battlefield of lawyers and lobbyists, who will ultimately influence the decisions.
Sources
- Who owns DALL-E images? Legal AI experts weigh in
- This startup is setting a DALL-E 2-like AI free, consequences be damned
2. New System Design course on GitHub
System Design is widely considered (alongside algorithmic tasks) to be the most difficult stage of a recruitment interview. In my experience, people dread it because it’s really hard to know what to expect. That’s why, in between the news, I wanted to share a very interesting new repository on the subject.
The whole thing bears the perhaps not-so-creative name System Design Course, but it is very well-structured educational material. Why is it worth your interest? I was captivated by the fact that it is divided into very specific subchapters and focuses on things that usually get neglected: a lot of space is devoted to lower-level topics such as the OSI model and RAID. It also goes beyond the standard repertoire. Where most courses end the topic of database trade-offs at CAP, System Design Course goes on to its extension, PACELC. Overall, it feels like the whole thing was written from scratch, so we get a kind of “The State of System Design in 2022”. Additionally, it deserves credit for its very clear illustrations (is it Excalidraw?).
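For readers who haven't met PACELC yet, here is a minimal sketch of the idea in Python. It is my own illustration, not something taken from the repository. PACELC says: if there is a Partition, a system trades Availability against Consistency; Else, in normal operation, it trades Latency against Consistency. The class and store names below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Choice(Enum):
    AVAILABILITY = "A"
    CONSISTENCY = "C"
    LATENCY = "L"


@dataclass(frozen=True)
class Pacelc:
    """Which property a system favors in each of the two PACELC branches."""

    on_partition: Choice  # the "PAC" half: A or C
    otherwise: Choice     # the "ELC" half: L or C

    def __post_init__(self):
        if self.on_partition not in (Choice.AVAILABILITY, Choice.CONSISTENCY):
            raise ValueError("during a partition the choice is A or C")
        if self.otherwise not in (Choice.LATENCY, Choice.CONSISTENCY):
            raise ValueError("without a partition the choice is L or C")

    @property
    def label(self) -> str:
        # Conventional shorthand, e.g. "PA/EL" or "PC/EC".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"

    def describe(self) -> str:
        return (
            f"If partitioned, favor {self.on_partition.name.lower()}; "
            f"else, favor {self.otherwise.name.lower()}."
        )


# Hypothetical example: a store that stays available during partitions
# and prefers low latency over strict consistency the rest of the time.
hypothetical_kv_store = Pacelc(Choice.AVAILABILITY, Choice.LATENCY)
print(hypothetical_kv_store.label)
print(hypothetical_kv_store.describe())
```

The point of the extension is the second field: CAP only talks about what happens during a partition, while PACELC also forces you to name the trade-off you make when everything is healthy.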
As a bonus (if you still want some more), I have two additional links on this topic (although regulars of our review probably already know them well). The first is an absolute classic: System Design Primer, which has almost 200,000 stars on GitHub. The second is Gergely Orosz’s article, Preparing for the Systems Design and Coding Interview, which collects books and links worth knowing before an interview (and of course, the System Design Primer itself couldn’t be missing among them).
3. China takes a look at the algorithms of its own companies
As early as September (in theory), the Digital Services Act package will take effect in the European Union. Under its rules, companies (especially large ones, especially US-based ones) will have to meet several conditions if they want to stay in Europe. One of them is to provide insight into how their internal algorithms work and whether they are being used in an anti-competitive manner. It turns out that while the EU is still preparing for this, someone else is putting such a law into effect faster. That someone is China, well known for its defense of citizens’ rights against big corporations.
I’m joking a little, but that doesn’t change the fact that China is quite effective at regulating its technology companies, as it has shown once again. The Cyberspace Administration of China already collected information from major companies about their internal operations back in March, and last week it published a list of nearly thirty algorithms from companies such as Alibaba (China’s Amazon), Baidu (China’s Google), ByteDance (the maker of TikTok) and Tencent (the maker of WeChat), along with a brief description of how they work.
You can find the list on this page if you can read Chinese.
It’s worth adding that it’s not just China that is looking at the activities of its companies. Do you still remember how, during Donald Trump’s presidency, the American part of TikTok was almost sold to Oracle? In the end, the deal only required US citizens’ data to be kept in the United States, with the whole migration based on Oracle’s cloud. After a lengthy tug-of-war, the transfer finally got underway this July. As part of the deal, Oracle gained insight into TikTok’s internal processes, such as the operation of the recommendation system and moderation methods. Axios reports that Oracle is eager to take advantage of this. I wonder if the public will ever find out what they find there.
Sources
- China’s internet regulator details algorithms used by local Big Tech players
- Scoop: Oracle begins auditing TikTok’s algorithms
4. Amazon gets into IPv6, invests in IaC tools and buys itself a Roomba
Although its lead is no longer as clear-cut, Amazon remains the most popular cloud platform. So, since AWS made some really exciting announcements this week, I decided to summarize the most interesting ones.
Amazon is slowly entering the world of IPv6. Last week it announced support for the standard in its database services, both Amazon RDS and Amazon Aurora. Continuing the theme of protocol updates, I’m sure many will also be happy to see native HTTP/3 support in Amazon CloudFront, AWS’s CDN. I expect Amazon’s involvement will help popularize the protocol among developers (and DevOps).
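If you want to check from the client side whether an endpoint is reachable over IPv6 or advertises HTTP/3, a rough sketch could look like the one below. It uses only the Python standard library. The function names are my own, and the Alt-Svc parsing is deliberately simplified. It assumes the server advertises HTTP/3 through the Alt-Svc response header, which is the usual discovery mechanism, but I have not verified how any particular AWS endpoint behaves.

```python
import ipaddress
import socket


def ip_versions(host, port=443, resolver=socket.getaddrinfo):
    """Return the set of IP versions (4 and/or 6) that `host` resolves to.

    `resolver` is injectable so the function can be tested without DNS.
    """
    versions = set()
    for _family, _type, _proto, _canon, sockaddr in resolver(
        host, port, proto=socket.IPPROTO_TCP
    ):
        # sockaddr[0] is the textual address; drop any "%scope" suffix.
        address = sockaddr[0].split("%", 1)[0]
        versions.add(ipaddress.ip_address(address).version)
    return versions


def advertises_http3(alt_svc_header):
    """Return True if an Alt-Svc header value lists an HTTP/3 alternative.

    Simplified parsing: entries are split on commas, and only the protocol
    identifier before "=" is inspected ("h3" or a draft such as "h3-29").
    """
    for entry in alt_svc_header.split(","):
        protocol = entry.strip().split("=", 1)[0].strip().lower()
        if protocol == "h3" or protocol.startswith("h3-"):
            return True
    return False


if __name__ == "__main__":
    # Needs network access; the hostname is just a placeholder.
    print(ip_versions("example.com"))
    print(advertises_http3('h3=":443"; ma=86400'))
```

If `ip_versions` returns a set containing 6, the hostname has at least one IPv6 address. Whether your own network can actually route to it is a separate question.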
DevOps engineers, moreover, have some news of their own. I’m sure many people have been waiting for Kubernetes 1.23, which has finally become available in Amazon Elastic Kubernetes Service (EKS). Among the significant changes on the horizon, the removal of dockershim stands out, although upstream Kubernetes only drops it in version 1.24. Another big launch is a surprising combination: the official release of CDK for Terraform. Why surprising, you ask? Because the two tools are, in theory, competitors. Despite this, the new solution aims to let you use Terraform’s existing ecosystem of providers (plug-ins) and modules from within CDK.
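To give a feel for why the combination makes sense, here is a small sketch of the underlying idea: describe infrastructure in a general-purpose language and emit configuration that Terraform can read. This is not the cdktf API. It is a standard-library illustration that produces Terraform's JSON configuration syntax directly, which, as far as I know, is also the format CDK for Terraform synthesizes to. The helper names and bucket names are hypothetical.

```python
import copy
import json


def s3_bucket(logical_name, bucket_name):
    """Return a Terraform JSON fragment declaring one aws_s3_bucket resource."""
    return {"resource": {"aws_s3_bucket": {logical_name: {"bucket": bucket_name}}}}


def _deep_merge(target, extra):
    """Recursively merge `extra` into `target` without aliasing nested dicts."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def synth(*fragments):
    """Combine fragments into one Terraform JSON document (as a string)."""
    merged = {}
    for fragment in fragments:
        _deep_merge(merged, fragment)
    return json.dumps(merged, indent=2, sort_keys=True)


if __name__ == "__main__":
    # An ordinary loop replaces copy-pasted resource blocks.
    fragments = [
        s3_bucket(f"logs_{env}", f"example-logs-{env}") for env in ("dev", "prod")
    ]
    # Saved as main.tf.json, this output could be fed to the usual Terraform workflow.
    print(synth(*fragments))
```

The real tool adds typed bindings for providers and modules on top of this, which is where the existing Terraform ecosystem comes in.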
And finally, a piece of news that is a little older but ties in with the Amazon theme. The company has decided to spend some of its money and buy iRobot. While the company’s name may not be recognizable to everyone, I suspect most of you have heard of its flagship product, the Roomba vacuum cleaner, which launched a wave of similar devices. This is another move by Amazon to “take over our homes”: last year, the company showed the Astro robot, which is capable of monitoring the home. The Alexa-powered robot, however, is a fairly niche product. Gaining access to the sensors (and stored room maps) of the popular Roomba takes the company into a new league in this regard.