An interview with David Mertz, senior developer, data scientist, and author
🎙️ Check out this insightful interview with David Mertz, senior developer, data scientist, and author, who explores the fascinating world of generative AI and its applications in coding and AI coding assistants like Copilot and ChatGPT.
📚 David’s recent book focuses on how these AI coding assistants handle regular expressions, shedding light on their strengths, weaknesses, and how they can both help and hinder developers.
Generative AI, employing transformer-based neural networks, has revolutionized code suggestion and completion. However, acknowledging its limitations, such as hallucinations and misleading output, is crucial. The dangers of relying on AI coding assistants include potential plagiarism, a reduction in code quality and innovation, and the easy generation of misinformation. Nevertheless, they’re powerful tools when used with caution and expertise.
When it comes to code generation, the line between human-generated and AI-generated code is blurry. Ultimately, the responsibility lies with the human developer to ensure the quality and functionality of the final output.
Explore the complex world of generative AI and its impact on programming, writing, and society by listening to David Mertz’s fascinating interview! Listen here on Spotify or YouTube:
YouTube Transcription
This is an interview with David Mertz, senior developer, data scientist, and author. David recently published Regular Expression Puzzles and AI Coding Assistants, which we’ll talk about in this podcast. He got his PhD in political philosophy in 1999 before switching fields to machine learning and development. He is also on the board of directors of the Python Software Foundation and does a lot of other things. In this interview we go over many different issues related to generative AI, including legal issues, but also copyright, attribution, and moral rights. I hope you enjoy this interview.

So, I’d love it if you could introduce yourself and give your background.

Yeah, thank you, Louis. My name is David Mertz, and I’ve done a number of different things in my career and my life, but lately in particular I do some work in training, in writing books, and in actually writing code in areas of machine learning. That comes in part out of my time at Anaconda, which is something of a player in the Python and machine learning space. Since then I’ve gone on to do a variety of other things. I actually have a background that’s very different from that, from decades ago: I got a doctorate in political philosophy, and naturally, after the many years of completing a doctorate, I discovered that you don’t actually get paid to have a doctorate in philosophy. So, writing computer programs.

And did you transition from political philosophy to coding mainly because of the money, or did you always have an interest in coding and machine learning models and things like that?
Why did you get into the field of political science?
There was a lot of practicality about it, but nonetheless I have of course been programming since I was a teenager. My junior high school had access to the mainframe that was across town, and I was writing Fortran and things like that; this was in the 80s or so. Also, some of my background was in mathematics: I did a bit of mostly pure math, set theory, and a little bit of model theory and logic and things like that.

Was the transition difficult, coming from such a different field?

Oh, it really wasn’t. And this is more life advice, possibly, for your listeners, but many of the people who are the strongest and the most wonderful to work with in software development and machine learning areas are actually people who have come out of entirely different domains, people from the humanities or music or other things. Obviously there are things you learn in education programs that are going to be relevant to what you do in that area, but for people who have, as I hope I do, engaged and open minds and a willingness to learn, it’s not a particularly difficult transition, actually.

Yeah, especially with everything available online right now; it’s easier than ever to learn something new and practice and improve. So, you mentioned that you recently wrote a book, and I believe it is related to generative AI. I’d love it if you could give us a brief overview of what the book is about, a bit more precisely, and also I’d love it if you could maybe start with: what is generative AI to you?

Sure. So let me say first, I did this
What is AI-generated?
book with Manning. I don’t know if it’s useful to hold it up, but this is what it looks like, and folks can go buy it from Manning. It was a kind of puzzle book about regular expressions, where I would give these sort of edge cases, or complicated things, that would allow you to think about, sometimes, the weird theoretical edges of understanding what regular expressions are really capable of doing. Hopefully that’s a way for developers to gain skills in that particular mini-language that’s embedded in almost every other programming language. In the first iteration of the book, it was my own discussion of what the solutions were, or in some cases of why it was impossible to solve the problem using that technique.

I was approached by Manning late last year, at the end of 2022, and through that discussion we decided to do a sort of enhancement, an update to the book that had been just about regular expressions, that would look at how effective these new tools, these AI coding assistants like Copilot and ChatGPT, are. They had just come out, and there will be more in the future; there’s one called Tabnine I used a couple of years ago that’s still around, which is a similar kind of concept. Many developers will be familiar with these, either because you’ve read about them or because you actually use them: you’re in some sort of coding development environment, and there is an artificial intelligence, a large language model, behind the scenes that makes suggestions for how you might complete your code. Whether it’s fleshing out the full function from the function signature, or you give a comment in the code and it will try to write code that matches your comment, or in some cases the reverse, where it composes the comments from the actual structure of the code that exists. A variety of things. It’s sometimes quite impressive what these tools can do. In some cases, like Copilot, they’re integrated into the editors you might be using anyway; in others, like ChatGPT, you go to a separate interface, which is sort of modeled on a chat client like you might talk with your friends in, but it’s talking to the AI. In either case it’ll provide these possible answers to what we software developers actually do.

And I thought that this case of the regular expressions I had written the earlier book on was a nice test case, because regular expressions are so compact and so interdependent. Of course it doesn’t matter what the specific token length is, whether it’s one character, or word-based, or byte-pair encoded, or something like that; really, in terms of the complexity, it’s the kind of deep interdependence in regular expressions that is definitely challenging for these language models. I took a look at how Copilot and ChatGPT in particular perform; in the future there will be other tools that are similar and equally relevant to evaluate. And it’s interesting to look not only at where they produce really good and interesting results, but, more often, at where they dramatically fail to, trying to understand the “mind” of the machine and what the modes of these failures are. So I have some discussion about that, and some advice to programmers and developers about how to actually utilize these assistants.

Now, I don’t know exactly the machine that is underneath Copilot, or underneath ChatGPT, or GPT-4, which just came out, but I have some feelings about what’s going on. The general idea is to use a system called transformers, which is a kind of neural network architecture. In particular, it allows something called an attention mechanism, and a self-attention mechanism, where you can look at not just the last few symbols, like you might get in a kind of machine called a recurrent neural network, but rather a very large number, perhaps thousands, of previous tokens, and contextually give more weight to prior tokens, perhaps quite a ways before the current production that’s involved in the current output of the model. And that turns out to be very effective in making these sort of magical chatbots and code assistants and things like that.
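As an aside, the self-attention idea David describes, where every output position can weigh all prior positions rather than only the most recent ones, can be sketched in a few lines of numpy. This is an illustrative toy, not the architecture of any actual product: real transformers add learned query/key/value projections, multiple heads, causal masks, and positional encodings.

```python
import numpy as np

def self_attention(x):
    """Toy scaled dot-product self-attention.

    x: (seq_len, d) token embeddings. Each output row is a weighted
    average of ALL rows of x, so a token far back in the sequence can
    still influence the current output -- unlike an RNN, which mostly
    sees the recent past.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                  # token-pair affinities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # softmax over positions
    return w @ x                                   # context-mixed embeddings

tokens = np.random.default_rng(0).normal(size=(5, 8))  # 5 tokens, dim 8
out = self_attention(tokens)
print(out.shape)  # (5, 8): one context-aware vector per input token
```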
Could you share some interesting insights and limitations that you’ve seen using ChatGPT and Copilot during this work, or basically
What are the best insights from your books?
share some of the insights from your book?

There’s certainly a big tendency, and I found this not just with the regular expression examples, which I think push things to their boundaries, but even in regular code, and I can even say something of the same about general prose text. You ask these machines questions about general conversational knowledge, like you would talk with a human, and this issue of hallucination is extremely prominent. You imagine you’re having a conversation as if with a human, and the AI will very freely just make up entirely nonsensical “facts” to put in there, because they are the kinds of words, the kinds of tokens, that typically occur in that context within the corpus it was trained on. But they’re not actually true. There’s no sort of epistemic basis that these machines can utilize; it’s simply not how they work, and I think in the popular press, and for many users of these tools, that is a misunderstanding. Truth is simply not a concern; what they’re trying to do is produce the most typical, expected completions given prompts or prior text.

Well, when it comes to programming, that means you see lots of stuff that looks like valid syntactic code, but it doesn’t actually do, conceptually, the thing you’re interested in. And the ways that it fails to do that can be quite subtle: you have to actually know what the code is supposed to be, and actually know the constructs of the particular language, or mini-language like regular expressions, that you’re interested in, to understand where it’s gone amiss. In both cases, the natural language prose and the programming language “prose,” the symbols that make it up are superficially very plausible. And that’s pretty consistent in the state of the machines now: if you’re not paying a lot of attention, if you don’t really think through or research what is actually true and what’s not, or what is actually functional relative to the particular program you’re trying to develop, it looks like the right thing. So it can be very easy and tempting to just assume that it’s doing the right thing, because the errors are not obvious.

Would you say that your main experience with generative AI is with coding, or have you also tried it a lot with writing?

I’ve tried it a little bit with writing, but many of the same kinds of toy cases that everyone else who’s subscribed to one of these things has done. I’ve asked it to write a biography of myself, or of my friends, or of famous people, or a little bit of, say, a historical question, things like that. And it gives answers that, as I said, are kind of the most typical things one would say about that topic. So if I ask, or your listeners ask, for a biography of me: well, I’m someone who’s published, who’s done some things that have been described in the corpora that go into the training of these models, and so it’ll pick up on certain elements of that. It probably knows I was a director of the Python Software Foundation, one of several directors in a given term, because that’s something that probably occurs in proximity to my name relatively often within the corpora in the training data. So, okay, it finds that true fact, but not because of the truthfulness of it, simply because of the proximity and attention relations of the different parts of it that occur in the training set.

But, just to continue that, it also makes up lots of things that, if you didn’t know me very well, could be true: the sorts of things you say about people who have done the sorts of things I’ve done. It says I went to university and grad school, but completely different ones than the ones I actually went to, because it’s plausible; it could be true, I could have gone to those other ones. And it’s much the same if you ask it a history question: mostly it just sort of spits back the high-school-level answer, what one typically says about that historical question. So it’s probably not wrong, because a lot of people have said relatively inane but relatively similar things about that historical issue. But, and this is equally true of the code, if it doesn’t know a sort of most typical pattern that actually has some verisimilitude, it will just make up words that sort of make sense in the context.

Yeah, that definitely makes sense. And for writing, it’s pretty clear how it can help spread misinformation, just because it will hallucinate false statements, as you just mentioned. But it’s a bit less clear to me how it can impact code generation, in the sense that it can definitely just produce bad code examples that cannot be used, but what could be the dangers of
What are the dangers of using ChatGPT as a coding assistant?
using such an AI coding assistant?

Well, take the example of regular expressions, which are the subject of my book and are notoriously subtle. It’s very easy to create a pattern that you think, even as a human being, is going to match that particular pattern in the text that you hope to identify and pull out, or modify, or whatever. As a human, it’s very easy to write something that will get many of the cases correct but miss edge cases, and using the code assistants is just sort of an amplification of that kind of error that humans make. They’ll very often get a little bit right; they’ll catch some of the cases that are at issue, especially if you give a relatively clear prompt of what you’re trying to do. But they’ll miss many edge cases, even ones that actually are logically implied by the description in the prompt used to produce the pattern.
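As a hypothetical illustration of that kind of miss (my own example, not one from the book): suppose the prompt says “match a number.” An assistant will often propose something like the naive pattern below, which handles the common case but silently drops signs, bare integers, leading dots, and exponents, all of which the description logically implies.

```python
import re

# A naive "match a number" pattern -- the kind of mostly-right answer
# an assistant often suggests: it handles the obvious case...
naive = re.compile(r"\d+\.\d+")

# ...but misses edge cases implied by the spec:
cases = ["3.14", "-2.5", ".5", "10", "1e-3"]
print([bool(naive.fullmatch(c)) for c in cases])
# [True, False, False, False, False]

# A fuller pattern (still a sketch -- a real spec needs more care):
better = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")
print([bool(better.fullmatch(c)) for c in cases])
# [True, True, True, True, True]
```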
So they’ll do something that’s right, but it doesn’t actually capture the underlying logic or the underlying intention.

And where would you draw the line between code that is AI-generated and code where a human is just using an AI to help produce the code? Where can we say that it is AI-generated, or that it’s just human but with some help, just as if I had asked on Stack Overflow for help with my code? Where is the line where it’s officially AI-
Where do you draw the line from using a writing assistant to AI-generated?
generated code, whether the AI is the author, or say that I created it?

I mean, it’s an assistant. It’s providing a possible suggestion for what I might want to occur in my code, and I have to make the decision to accept it. I certainly should; there may be lazy programmers, or not-as-good programmers, who will too automatically accept it, and the code quality will suffer as a consequence. Of course, people write bad code all on their own too, so that’s not unique to the AIs. But I don’t really think that there’s such an issue about giving credit to a machine. I don’t think, even from an auditing or security or regulatory perspective, that it’s necessarily important to say this was written by a human versus written by an AI. It’s code; it’s going to do something. And obviously there’s a human driving it in some way, but how much is done for them is not entirely relevant.

Yeah, that definitely makes sense. And would you say that, since it’s becoming more and more powerful, but it’s based on probabilities and statistics over existing code, would you say that it may hurt the innovation of our coding? More people may rely on this tool and not produce some sort of better programming than before; it may limit the innovation towards the
Will AI coding assistants affect programming innovation?
different programming?

Oh, that’s definitely a danger: code quality suffering as a consequence of too much use of these things. And there is an inherent tendency to produce the most mediocre response possible. I mean, by definition, you’re trying to produce the most typical completion of a given prior sequence of tokens, and the most typical one is going to be, in some way, the most mediocre one; nothing can ever really stand out. But that’s probably more a question of, like, human prose. If I’m reading a Borges novel, or no, the short stories, he has a very characteristic writing style, and it has a feel, and it has originality, and it has unexpectedness. I mean, I’ve read him in the translations, not in the Spanish, but whatever, I could pick an English-language writer as well. There’s something very distinctive about it, and you’re never going to get that kind of distinctiveness for prose or poetry or something like that from the machines. Although, unless you’re an expert and a connoisseur of those kinds of writing, you may not notice the difference.

But with code, I’m not sure that that focus on originality is really the same. I mean, I’ve definitely seen code that I think is beautiful code, but you very quickly get into: as soon as it’s too original, it’s probably too clever, and it’s not something that people can easily understand and easily reuse, even if it actually does the right thing in an interesting and creative way. You actually don’t want code to be creative.

Yeah, definitely. And do you foresee any other limitations of using generative AI, of using AI coding assistants, other than innovation? As we said, it will surely use the easier way of programming a function, or the most common type,
Limitations of AI coding assistants…
but what would be another limitation of it?

I mean, I do think there’s an issue here from a sort of social or ethical perspective, which I’m going to sort of transition our conversation towards, and which I think is interesting: there is a certain question of authenticity. One can imagine using these assistants to fake your school papers, or to, not fake, but produce political speeches, or newspaper articles, or other kinds of things where there is a reputational stake, a social basis for a reputational stake, in your actual production of a particular text. And these machines might make it easy to not do that. I mean, think particularly of an education context: you really want a responsibility for students to do their own work, not plagiarize. In a sense it’s not literally plagiarism, because there’s not an author with a copyright stake in the same way, but plagiarism is actually broader than that; it’s a requirement for originality. In my old field of political philosophy, I could take an author from hundreds of years ago who wrote very interesting philosophical thoughts. Of course that work is not under copyright, so it’s not as if I would in any event owe a payment for quoting them, but nonetheless, if I’m claiming this is my original thought, it should be my original words that express it. So in some sense, if I’m quoting the AI without attribution, it’s the same kind of issue.

And I do think, and this is a thing I think is interesting to talk about, that there is, technically, a pretty good approach to authenticating, or watermarking, AI productions. It depends a little bit on the kind of production. Some of the things that get generated, we haven’t really talked about it as much in this conversation, but it was very popular just a few months ago, or maybe on the order of a year: people would use these image generator programs, where you type in a few words of a prompt and they’ll produce an interesting picture generated from that description. It’s sort of humorously amazing how you write just a few words, and they can be very fanciful and kind of incongruous words together, and it will make some sort of picture that really does seem to be a representation of that particular phrase you’ve come up with. And that’s fun to look at. I’m not sure what actual social value it has, but people play games and so on too, so not everything has to have a productive social value in that sense.

But in that case, the watermarking question, that is, whether we can say of a given output, a given production, whether it was in fact generated by an AI or whether it was created by a human, is, for the image issue, actually pretty well understood and pretty easy. When you have a photographic kind of image, something with continuous tones in it, or a digital composition from a drawing program or something like that as well, but something that emphasizes continuous tones, human eyes don’t perceive very small…
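The thought is cut off in the recording, but the idea David is gesturing at is presumably that human eyes don’t perceive very small differences in tone, which is exactly what least-significant-bit watermarking exploits. A minimal, hypothetical sketch (real watermarking schemes are far more elaborate and robust to compression, cropping, and so on):

```python
import numpy as np

def embed_bits(image, bits):
    """Hide a bit-string in the least significant bit of each pixel.

    Flipping a pixel's lowest bit changes its brightness by at most
    1/255 of full scale -- imperceptible in a continuous-tone image,
    but trivially machine-readable.
    """
    flat = image.flatten()                      # flatten() returns a copy
    assert len(bits) <= flat.size, "image too small for payload"
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.array(bits, dtype=flat.dtype)
    return flat.reshape(image.shape)

def extract_bits(image, n):
    return [int(b) for b in image.flatten()[:n] & 1]

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
payload = [1, 0, 1, 1, 0, 0, 1, 0]              # e.g. an "AI-generated" tag
marked = embed_bits(img, payload)
print(extract_bits(marked, 8) == payload)       # True: mark recovered
print(int(np.abs(marked.astype(int) - img.astype(int)).max()))  # 0 or 1
```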
You gave the example of education, obviously, of not plagiarizing but using something else to do your homework, etc. So that’s definitely a problem for education. But, first, other than education, is this a real problem? And second, regarding education, how do you think this will affect our current education system?

I think, in the short term, I mean, these potential laws, which don’t exist, I’m only discussing how they might conceptually work, certainly aren’t going to be in place this year or even next year. Maybe they will be at some point in the future, but laws don’t move quickly, typically. So I do think we’re going to see a whole lot of cases where students, everywhere from elementary school to grad school, start writing papers, school papers or professional papers, or things like journalists producing articles, that aren’t really written by them, which I do think raises some social concerns. And it’s not going to be very easy to detect that fact currently. Even though there are many, many cases where it’s not really socially important to have this attribution to the generative AI, there are enough cases where one should admit to it, where one should document that that was the approach used, that I actually think it’s kind of a good idea to pass laws along this line. Education isn’t the only thing that people write text about, but it’s one, and it’s an important one; it has a significant social function. And you couldn’t have a switch that says “watermark it now, because I’m a student”: you would have to do it systematically, because otherwise students would turn off the switch.

So how do you think this new trend of using generative AI will affect copyrights? And also, another question after that: how is it different from other new technologies, such as printing and other things that
How will AI coding affect copyrighting vs. other changes like the invention of the printer?
allowed us to scale writing? Basically, how is this copyright issue different, from then to now, with using AI?

There’s a concern that’s been expressed by many people I know, and I’m sure by many people that the listeners know of: creators, particularly visual artists, but to some degree also authors, people who write prose, feel that the productions of these artificial intelligences, and I don’t like to say “artificial intelligence,” because it suggests a generality that I think is not yet where we are, but these machine learning models that produce particular outputs that have perhaps useful value or cultural significance, are derived in some sense from human-produced works. I know someone who is a visual artist and has done paintings and drawings and things like that, and those were used in the training data for these various machines that produce images as outputs. And they’re not getting any royalties from a copyright perspective, but they’re also not even getting any attribution credit.

It’s a similar concept if you talk about writing, where the particular kinds of words that I choose to use in my writing, or the particular style of grammatical and linguistic composition of words that I characteristically have in my writing, can be mimicked in some sense by the machine, and I don’t get any credit for that. There’s a sort of academic tradition of giving credit where credit is due, and the feeling of many people is that these machines are taking that content without giving the credit that is merited. And for some of them, my acquaintances, and people I see in general discourse, in articles and op-eds and things like that, these are people who are alive and actually make their living from doing these creative works. The AIs mimic those creative works, and they feel that it’s undercutting their ability to do it, the market for their productions, and they feel that there’s something unjust in that.

So I think it’s interesting to consider. I’m not sure where I would actually come down, or how I feel about the copyright issue; that seems much less plausible in my mind, that, you know, OpenAI makes a machine that produces some text, and it should send me a thousandth-of-a-penny royalty check for having based that text on the thing that I created before. But certainly in the attribution sense, and in this case it’s much more work to actually do this than it is for the watermarking question, I’ve had some thoughts about approaches that might be possible, stipulating that there were laws being written, and that these were actually enforced on the producers of the generative AI: how would they assign this attribution?

And it’s not necessarily easy. I mean, it’s not actually easy in the sense of computational resources in particular. So, when you have any kind of neural network, and honestly you probably understand this better than I do, I understand you’re doing your doctorate on neural networks specifically, but there’s always, and this is true of convolutional networks, this is true of transformers, this is true of any architecture of neural networks, some sort of feed-forward step. The particular arrangement of neurons can be very different in these different arrangements, but you have a backpropagation step where you adjust the weights of parameters: you assign to the different neurons a particular parameter weight. And in all of these machines you tend to find specialization of particular neurons. In very simple ones, in an image-analysis kind of network, they’re the detector for, say, a vertical edge: this neuron, or these few neurons, sort of specialize in detecting vertical edges, and other ones detect horizontal edges, and other ones detect color gradients. This is all sort of toy; this is about simple neural networks, versus the kinds with billions of neurons that are the current ones. But if the image generator with billions of neurons makes things that are in the style of van Gogh, I think there is some collection of the neurons that really light up much more when you create that production than most of the others, which mostly remain with little output, because they’re not the ones triggered by that particular specialization. So whenever those neurons are particularly lit up during a production, you give the footnote.

But the question is, how do you discover that? And here we’re really hurt by very large batch sizes. Of course, very large batch sizes are completely required to do this training in realistic time frames. With a complex neural network, in concept you could have batch sizes of one sample, and do the feed-forward and the backpropagation every time you had one sample, not a million samples, or even 32. But your machine would be trained a hundred thousand times slower, and these big ones already take months on thousands of GPUs to train. We don’t want to multiply that by a hundred or a thousand or a million in terms of the training time. And I don’t think the math quite works for the size of batches that current machines use: what I describe obviously works for batch size one; I don’t think it works at a batch size of 100,000.

Speaking of the laws in place, and the upcoming laws that will be decided by governments and everything: is there, in your opinion, something that one of us, either people using those models or people developing them, can do to help with the progress regarding attribution, copyright, and moral rights of all this
What can people who use or develop AI models do to help progress with copyrights and moral rights?
34:41
generated content I wish I had a better answer to that I mean I there may be an
34:46
answer that I have in the future I’ve had a conversation with um
34:51
yeah sometimes I can’t disclose more information exactly but um I I could
34:58
potentially be working on exactly this kind of thing if
35:03
someone I know decides to go forward with funding this and there’s been some coordination with lawyers and lobbyists
35:10
and obviously they need technology people to evaluate what
35:15
um if it happens at all relative to this you know to my contact
35:20
um but it’ll probably whether or not I have any involvement or this person has
35:25
any involvement in the actual process I suspect that a lot of people are having similar
35:31
thoughts so I think that there probably will be things like
35:38
you know standards bodies you know uh IEEE or ietf or you know things along
35:44
those lines um I don't really know exactly what the relevant body would be it would probably be something novel I mean a new
35:51
organization but you’d have to produce technical standards to describe
35:58
things like the techniques that I’ve talked about in this conversation and then of course you would have to
36:03
have lobbyists and lawyers and politicians and things involved in diplomats for treaty negotiation and
36:09
things that actually led to the enforcement of these technical requirements because I don’t think
36:16
there’s the watermarking is sort of cheap I mean like the computational resources that a
36:22
private company would have to spend to do the watermarking is negligible the computational requirements for using any
36:29
of the kind of techniques for you know marking attribution in neuron
36:34
specialization and stuff is computationally expensive I mean maybe prohibitively so that it’s
36:42
just simply implausible to do it or maybe just expensive but it’s not going
36:47
to be a negligible cost if this becomes a requirement so
36:52
in the latter case of the attribution and moral rights and copyright I just
36:58
don't see private companies just doing it on their own um I can only imagine that happening
37:06
um as a result of an actual requirement placed upon them but then again
37:12
would some people like um bad people just try to apply
37:19
their own watermark on tons of images or
37:25
anything else to try to say that they generated them and would that be an issue that
Watermarking in AI
37:31
that could come up with those watermarking uh for the images yeah definitely I mean if if the images are
37:37
watermarked in the style that I described of you know you do something like a
37:43
cryptographic hash that decides which pixels get modified slightly and then someone else takes the image and does
37:50
their own cryptographic hash with their own identity and adjust their set of pixels then you have two watermarks on
37:57
it one of which was authentic and another of which was not
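The scheme David describes, a cryptographic hash deciding which pixels get nudged, can be sketched in a few lines. This is a minimal illustration of the idea as described in the conversation; the key names and the HMAC construction are my own assumptions, not any vendor's actual watermarking scheme.

```python
import hashlib
import hmac

def watermark_pixels(key: bytes, image_id: bytes, n_pixels: int, n_marks: int):
    """Derive a deterministic set of pixel indices to nudge from a keyed hash.

    The same (key, image_id) always selects the same pixels, so the marking
    party can later re-derive the set and check those pixels."""
    chosen = set()
    counter = 0
    while len(chosen) < n_marks:
        digest = hmac.new(key, image_id + counter.to_bytes(4, "big"),
                          hashlib.sha256).digest()
        # Consume the 32-byte digest four bytes at a time as pixel indices.
        for i in range(0, len(digest), 4):
            chosen.add(int.from_bytes(digest[i:i + 4], "big") % n_pixels)
            if len(chosen) == n_marks:
                break
        counter += 1
    return chosen

# Two parties watermark the same image with different keys:
authentic = watermark_pixels(b"generator-secret", b"img-0001", 1_000_000, 64)
forged = watermark_pixels(b"someone-else", b"img-0001", 1_000_000, 64)

print(len(authentic), len(forged), len(authentic & forged))
```

Both pixel sets verify against their respective keys, and nothing in the image itself says which key was applied first, which is exactly the two-watermark ambiguity raised in the conversation.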
38:03
um and distinguishing those technically does seem like it would be very difficult I think for the text
38:09
productions there's really not such an issue I mean the only way you can apply
38:14
a different watermark in the red-green token marking style that I described is to create an entirely
38:21
different text I mean you come along with you know the
38:27
token plus Louis and a SHA hash of that to get your parity
38:33
um that's just gonna produce entirely different words well not
38:38
entirely different but you know there's not going to be a whole lot of overlap yes and you
38:45
can't just in a text production randomly change a word in the middle and have it
38:50
not be obviously wrong you know maintaining the quality of the text
38:57
so yeah so I think it’s not an issue for the texts but it probably is an issue for the images
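The red/green token marking David refers to can be sketched as a toy. The code below is my own illustration (in the spirit of the published red-list/green-list language-model watermark) with a made-up ten-word vocabulary; a real generator would bias sampling probabilities rather than sample only green tokens, and a real detector would run a statistical test rather than a raw fraction.

```python
import hashlib
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "mat", "fast", "slowly"]

def green_list(prev_token: str, key: str, fraction: float = 0.5):
    """A hash of (key + previous token) seeds a split of the vocabulary
    into a 'green' (favored) half and a 'red' (avoided) half."""
    seed = int.from_bytes(
        hashlib.sha256((key + prev_token).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def generate(n_tokens: int, key: str):
    """Toy generator: every token is sampled from the green list."""
    rng = random.Random(42)
    out = ["the"]
    for _ in range(n_tokens):
        out.append(rng.choice(sorted(green_list(out[-1], key))))
    return out

def green_fraction(tokens, key: str) -> float:
    hits = sum(t in green_list(prev, key)
               for prev, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

text = generate(50, key="watermark-key")
print(green_fraction(text, "watermark-key"))  # 1.0: every transition is green
print(green_fraction(text, "other-key"))      # ~0.5: looks like chance
```

This also shows the point made above: a second party with a different key would need to regenerate essentially the whole text to make it score green under their key, whereas an image can simply carry a second overlaid pixel watermark.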
39:03
do you see um any do you foresee any upcoming legal issues or
39:09
or ways we can get in trouble by using such generative models either
39:15
through coding or writing images if we are not careful enough
39:22
um yes I think that there’s going to be lawsuits
39:28
in particular well there already are lawsuits um there
39:33
I don't know the exact legal status but there is some particular artist I know of in um the United States I think in
39:41
what's called the U.S. Ninth Circuit there's 12 circuits for uh federal level court cases in the
39:50
United States um they're just like geographic regions of the United States anyway so this particular artist who does sort
39:56
of um covers for like fantasy novels that’s his schtick to do that and some people
40:04
who are using the various image generators either describe subject matter that’s similar to his and he
40:09
feels like it looks like his or they actually say in the style of his name and it you know makes things that look
40:16
like his pictures and he wants payment for that and there’s a variety of other
40:22
people who are visual artists who are in a similar position so you know my guess is that'll
40:29
probably be dismissed at some point that case I don’t think it’s probably going to go super far but you know you never
40:35
really know with Court decisions but I’m pretty sure that the same kind
40:40
of things are going to start happening with the language models as well and perhaps it’ll be dismissed and not
40:48
go anywhere but perhaps it’ll you know go through all kinds of Courts
40:53
would you recommend someone using ChatGPT and Copilot even though there
41:00
are maybe lawsuits would you suggest someone to use them
Would you recommend using ChatGPT and/or Copilot?
41:07
there’s sort of a saying amongst you know lawyers that
41:13
I mean or I mean this is oh it’s not really a saying it’s a truism that
41:19
certainly in the United States anybody can sue anybody for any reason
41:26
um so nothing you ever do in life is going to be free of the risk of litigation
41:33
I mean that’s just sort of the reality I don’t think it’s I mean there’s a
41:39
bit of like anti-SLAPP laws in some U.S. states and in Canada and in other jurisdictions that somewhat
41:46
limit that and like in European jurisdictions they have a little stronger
41:51
protection for defendants against malicious litigation
41:57
um but you know in general anybody can be sued for normal activity I don't
42:03
have to use the generative AI I can just you know write some code today
42:10
and it doesn’t initially have to be in the interest of being restricted let’s say that you um
42:16
wrote this brilliant algorithm and you released it as GPL
42:21
which of course is a copyright term and it carries restrictions namely that the derived works have to have a compatible
42:28
license they have to remain free software um so I’m working on this proprietary
42:34
software and I use your brilliant algorithm I’ve never used in any generative AI I just you know but I
42:42
read your GitHub repository and I use this and you know maybe you’re gonna sue
42:48
me and it really is a improper use I’m using this GPL code in my proprietary
42:54
software and I really shouldn’t do that that is genuinely a bad thing so I mean I guess it does raise additional risk
43:00
because I don't think that you know running something through ChatGPT absolves you of your GPL
43:08
obligations so you cannot really say that you didn't know that in the sense
43:13
that if you are using copilot or any other it won’t be the the company that is
43:19
responsible it will ultimately be you that is using their tool to generate
What legal issues should the people using such technologies be afraid or careful of?
43:25
code OpenAI has a lot more money than David Mertz does so if I'm sort of turning around the
43:32
hypothetical scenario but if you take this code that you know David wrote and
43:39
um whichever way I tell the story um you feel there’s I guess it’s the same
43:45
way but anyway there’s misappropriation oh yeah that’s right we’re still in the scenario sorry yeah
43:50
cut a little bit of my rambling um Louis wrote the brilliant algorithm David used it
43:56
but did not know that I was using it because I asked copilot to generate it for me
44:02
um you could sue me you know I’m somehow using it in an improper manner I don’t
44:07
have that much money um Microsoft and OpenAI have a lot more money who is more appealing for you to sue
44:14
yeah that makes sense so an individual like ourselves
44:22
shouldn't really be scared of using the proprietary data of OpenAI it's
44:27
mainly these companies it's mainly the companies building the assistants or ultimately the
44:34
entity that has the most money that should be scared of
44:40
that and not just users like us
44:46
seems likely in some sense but of course I mean it’s litigation is complicated I mean you
44:52
have this alleged misappropriation of your code because it went through the generative AI
44:57
I mean okay David Mertz doesn’t have that much money but maybe David Mertz works for
45:03
you know IBM and indirectly IBM is the misappropriator well they
45:09
have a lot of money too um I could look at the you know whatever the balance sheets of the various
45:16
companies but the point is that in in all these cases there’s companies with billions of dollars
45:23
um so as a litigation Target that certainly becomes appealing to choose any one of
45:28
these so whether it’s Microsoft open AI or whether it’s you know David Mertz IBM
45:34
I don’t currently work for them but hypothetically I could um
45:39
you know there’s some Deep Pockets that your lawsuit is looking for
45:44
are you ultimately optimistic regarding those
Are you optimistic regarding generative AI?
45:49
generative assistant as well as the legal components that that comes with
45:57
them yeah I haven’t I have no idea how any
46:04
litigation like this is going to turn out I mean I I’m not optimistic but I’m not pessimistic either I’m just curious
46:09
really in terms of their utility though I do think that and this is actually an area
46:14
where they’re getting better and they’re often useful now
46:19
but they do have to be used with great caution because they also produce subtle but terrible mistakes that if you
46:27
just you know don’t think about what it did well enough you get code in production that can cause bad things to
46:33
happen um so I actually think in education there
46:40
should be I mean like I don't mean like primary education like school kids writing a school paper I mean like you
46:46