Database Application 10/24/01 I came in late, so Kim took notes for me. Here are all of her notes and then my notes from the time I came in. Thanks, Kim!!! Kim's notes begin … Midterm topics – Data Definition Language – comes out of the handout from Lab 1. 1. In DQL DDL, you would be expected to write a statement that creates a table – CREATE TABLE statement that he's worried about – here is where we specify constraints… - attributes (data types) - Primary keys (entity integrity) - Foreign keys (referential integrity) - check constraints (domain integrity) (set of allowable values) Pseudocode is acceptable. Check PhoneNumber Like ([]…) Check Gender in ('M', 'F') (IN operator) Check VotingAge >= 18 Important thing is understand how to specify constraints and why they're needed. Won't expect us to know how to do this in VB, only SQL. 2. Fat Server/Fat Client Handout Need to Understand how these are "divvied up" in a distributed application and be able to explain the advantages and disadvantages of each architecture. Single tier – all goes together 2-tier – what we did last year – logically separated it into a server and a front end last year, even though it was all on the same machine. Not scalable. One advantage is that it's easier than 3-tier. Easier to write because probably less project people involved. 1st tier - Front End handled presentation and business logic 2nd tier (back end) handled database logic 3 tier – brings database component to a single server, only one place it needs to be changed. Scales well. Web "stuff" scales very well. - Presentation logic (how do we presentt the data to the user, etc.)? - Business or application logic (policiies for backorders if not in stock – how is that specified in the application) - Database Logic (how things get storedd, updated, deleted) 3. Somewhat handled in number 1. Must understand the necessity of constraints. Should now have a notion of why data needs to be constrained and what mechanisms you can use to do this. Q: Why do we have validation in all three (two) tiers? Because the application (in the real world) has other users, other applications, and even though you do your job, they might not do theirs, and your application could suffer as a result of other applications' bad data. The mindset we have to get out of is that we own the database and that we're the only people using the DB. Get away from the single user application mindset. *** Review CH 10 (NO view processing or security and control) *** Cross 7 off the list – database recovery. This will not be on the test. 4. When I delete something, what happens if there is a foreign key pointing to that (child record)? We have a number of choices on how to handle that. Most DB's do a "restrict delete" which won't let you delete it. This is probably the most common way of dealing with this. If you want to delete a parent record, cascade delete will delete all child records when the parent is deleted. Another possibility – Nullify delete - if I delete the parent record, set the foreign keys to null. (like "these" customers don't have a sales rep. He doesn't think SQL server supports the Nullify delete – the first two are commonly supported. Even with a restrict delete, a stored procedure can be written to delete the parent records "upwards" – this method is common. DRI – Declarative Referential Integrity – can specify when I build the schema – check off "cascade delete" You should understand why you're running into these problems with your database. You see them as errors, but they're really not. Cascade deletes are covered in Ch 10 he believes. 5. Ch 12… Inserting (updating) something into the customer table Insert something into the order table Insert one or more lines into OrderDetails table Insert one or more in Inventory table In terms of a transaction, we want all of that to happen. What if it all doesn't go? Have this notion of transaction properties – first one is always that a transaction is Atomic. Atomicity (non divisible) SQL by default does auto-commits. So it's up to the programmer to mark the transaction boundaries. Do this by starting with "Begin transaction" and "End Transaction" – marks it as one transaction. Hand out talks about things you have to do to prevent parts of the transaction and roll it back if something fails. Consistent - All integrity constraints need to be satisfied when it commits. Isolation – If you haven't marked transaction boundaries, it's possible for multiple transactions to become interleaved. Isolation says if that's the case, the other transaction that's happening can't see what you're doing. It's in an intermediate state. Other transactions are not allowed to see the intermediate steps that are going on (the innards). (Serializable) Durablity – once it's committed, it's there. Transaction log guarantees Atomicity, Consistency, Durability. These are used in the recover procedure. By looking at what hasn't been committed, you can roll those back. To guarantee isolation, you need locks. Notion is simple – if I'm going to do something to a resource, it's mine and you can't have it til I'm done. Different types of locks.. Look at handout. Lock granularity – how fine is the lock. Row – high degree of concurrency. - page (8 kb) - table – typical with bulk updates to a table Low – lock database. No concurrency. I'm the only one that has it. Up to server to decide how it implements locks, even if you "make hints" at how you want it handled. We have these because the higher degree of concurrency you have, the harder it is to manage it - more costly in terms of resources to manage. SQL is pretty efficient. Locks come in two forms - shared… READ only - exclusive – update or insert – this iis mine until I'm done with it. - update – read about this one also - schema – not too interested in this 6. Deadlock 7. Cursors (added) CH 12 pg 318 Cursor is what allows you to move through a recordset. Cursors are an expensive process. Lots of "overhead". Cursor types: - Forward Only – any changes to the acttual data are NOT reflected in the data I have. All I can do is move through these one at a time. Very cheap cursor. Changes AHEAD of the cursor can be seen. (State table, don't care, not likely that things will change) - Static – Will allow you to move back and forth. A little more expensive. Changes from other sources are NOT visible. (again, don't need to see changes immediately) - Keyset – expensive. Allows you to seee all changes. When the cursor points to a record, you have to go get a current copy of the record. What you don't see are the INSERTS cause they're not part of the original set. - Dynamic – I'm always checking back wiith the DB to see if anything has been changed. What do I need to be able to see? Often, not very much. Client-side cursors – either a forward only or static. Couldn't give you the other two. This is less expensive. Takes work away from the back end and puts it on the front end. Andi's notes begin … Transaction properties: A transaction is atomic (ACID) Atomic Atomicity – non-divisable This sequence of actions is viewed as something that has to all complete or not at all. The problem here is that SQL Server, by default, does auto-commits. If you don't tell it not to, it won't. By default, each of these things commits as this runs. It's up to the programmer to mark the transaction boundaries. We did that with "begin transaction" and "end transaction". Conceptionally, now the whole thing is viewed as all one thing. Begin transaction Insert (updating) tblCustomers X Insert tblOrders X Insert tblOrderDetails X Update tblInventory End transaction SQL doesn't view a violation of referencial integrity as reason enough to abort the transaction. When you read your handout, you'll find that there's special code to do this. Do operation Check error code If it's not 0, bail out (or roll out the transaction). There's some code in there to handle this that needs to be added to the sample code he gave us. Consistent All integrity constraints (all types of other constraints) need to be satisfied. Anything that would make this consistent is not allowed here, and you can't use a transaction to do that. Isolation You have to understand how things are being processed to understand this one. The server sets aside memory and other resources to allow things to happen at the same time Part of your transaction happens at the same time somebody else's transaction happens. It's possible for other transactions to happen in the middle of yours. Isolation says that the other transaction can see what you're doing. There are values in here that may change before this commits. Isolation = other transactions are not allowed to see the intermediate steps that are going on. It's not allowed to take data out of the intermediate stages. Durability Once you commit, it's there. Once the database issues a commit, it's written back to the data store, and it's a permanent update. 3 of these properties we get because of transaction log. We have one in our db. When ever you change things, this log gets filled up with the values before and after the changes, where are the commit points, which things haven't been committed, keeps track of when the computer crashed, etc. Transaction log allows the recovery to take place and helps guarantee C, A, and D. This is automatic. If it crashes, it goes to the transaction log and gets the information it needs. Isolation is a different story. The log is not any good for that. We need locks to guarantee isolation. See handout 9 If I lock something in the database, nobody else can access it until I release it from being locked. There are different types of locks: We want a high degree of concurrency. Depending on how much we lock, we restrict activity. How fine is the lock? This is defined by the lock granularity We can lock a row (RID) If I lock the row, I get a high degree of concurrency. Lots of users can be on the database. Locks are very short. They're so short that it's hard to catch them. Way at the bottom, I could lock the database (DB). When I do this, I am the only person who can be in the database. This would only be applicable for major updates to the DB. You could lock a table. I could lock the customer table. That's typically a lock that would occur because there are bulk updates. You don't typically define how these things will be handled. When you write transactions against the db, the server decides how to lock it. You can make suggestions ("hints"). The server doesn't always take your suggestion. Page lock Some extent of the DB (like 8 KB or something like that) Why do we have all of tehse different types? The higher degree of concurrency you have, the harder it is to manage it. If you lock a row, you might have to lock a customer, an order, an order line, etc. Locks escalate a lot at the row level. It takes mechanisms to manage that. The higher the concurrency, the more expensive it is. SQL is pretty efficient. It locks by row by default if possible. Locks basically come in two forms. Shared lock – all I want to do is read it. I'll lock it and then give up the lock almost immediately in case you want to look at it. You can have a shared locked that changes into an exclusive lock. Exclusive lock – when you update or insert something, you need an exclusive lock so they can't make changes to the same record at the same time you're making changes to it. On p. 3 of the handout, see "Shared", "Update", and "Exclusive". Locks are what implement the isolation property – to keep us from undoing other people's stuff while they're working on it. Deadlock … TblCustomer TblOrder I have a transaction here … Trans A locks the customer table Trans B locks the order table The intent of both of these transaction was to process a customer order. Trans A needs the order table. Trans B needs the customer table. But, both of those resources are already locked. This is deadlock. Neither trans is willing to give up. SQL Server is pretty good at detecting these things when it's about to happen. But, when it does happen, and SQL hasn't detected it, you only have one choice. One transaction has to die. SQL will push one out and give the user an error message back. Users have to understand what this kind of message means. You can't (and don't want to) prevent it. This used to be a constant problem. Cursors: All this stuff is in ch. 12. Cursors start in ch. 18 Suppose I have a "select" statement … Select * from tblCustomer where Zip = '12345' I just selected a bunch of records from my customer table This group is called a record set. Q: Where does that record set live? A: Memory The table data still resides in the table. You have a copy of it in memory that includes just the records you selected. You need some mechanism for moving through these records individually. On the SQL side, moving the cursor is called "fetching". In ADO you have move first, move last, move previous, etc. So, you can look at this one row at a time. All of this and all of the mechanisms it takes to support it is in memory, and so are dozens of other. The server has to allocate all of this memory. This is an extensive process. Cursors don't come cheap. It's a major load on your server if you have a lot of concurrency. Look at p. 319 I have a copy of the data. I don't have the actual data. What happens if a transaction comes in and deletes 20 from the table while I have 20 in memory. I can have "forward only" cursor. Any changes in the actual data are not reflected in the actual data that I have. All I can do is move through these things one at a time. This is a very cheap cursor. If I want to be able to see the changes, I have to rebuild the query every time I move to the next record. That would be expensive. ADO – changes that happen in the table can be seen if they happen ahead of the cursor. You can get a "static cusor", and it will allow you to move back and forth. It's a little more expensive. Changes from other sources are not visible. If 20 is deleted, I don't see it. Is it likely that anybody is going in and change a state or a tax rate? No. You don't need to see these changes because you'll see them the next morning, and that's good enough. The other cursors have to do with being able to see updates and stuff. The next cursor is the "key set" cursor. You go out and fetch the data each time you move to the key field. You're going to get the most accurate information here. But, if they do an insert, you won't see it. This is sort of expensive. Each time you access the row by the key, you go back and check. Dynamic always goes back and checks the data. What do I need to be able to see? Most of the time, it's not very much. The server has to be able to let you know if there's anything going on that's going to affect you. We can have client side cursors (by default – they'll probably give you forward-only or static cursors – they can't give you the other 2). This runs on the client side. We've taken the burden of managing the cursor on our side. On the client side, it's less expensive, it takes work away from the back end and puts it in the front end, but it's applicable at times. This will work with state table, for example. StateTable.Cursor = ad ??? For the most part, the data will remain stable long enough for us to get our work done. Surprisingly, you can get away with not looking at the database very often during a transaction. Access is always in constant connection with the DB. We use the client-side cursor. Many people, including Waldrop, don't like to use them. They're pretty expensive. SQL is a set-base language. Triggers seem like a nice idea. But, who knows what goes on inside the trigger and how that affects the current transaction. Waldrop avoids them, though. Test – Ch. 10, 12 and handouts. He mostly used Ch. 10 this year to address the "constraints" issue All DB access is going to go on through us through middle tiers. Users should only have access to what they need – no more and no less. Look on "Transaction Processing" handout, there are things that the user can do on the SQL side, but we need to do them in the code on our side. There's a lot of responsibility on the back-end and middle-end developers. The sure sign of a beginner is no referential integrity and no constraints. Database Applications Programming 10/24/01 Page 1 of 8