-*- Mode:Text; Fonts:(MEDFNT) -*-
(hardcopy-file "it:keith;kodak-problem.text" :page-headings nil)

MEMORANDUM

TO:    Mike Grandfield
FROM:  Keith Corbett
DATE:  August 11, 1986
SUBJ:  Eastman-Kodak TCP/IP Problems -- Second Request for Help
COPY:  John Salis, Ron Rando, Roger Frye, John Insabella, Sarah Long

I have not received a response to my memo of 8/5, as follows:

Thank you for trying to get some help for Paul Mazarella, who has been at Kodak trying to solve recurring TCP problems. Unfortunately, he is unable to solve their problems with the tools available to him.

TCP is an integral part of Kodak's expansion plans, so it is especially important that we convince the customer that LMI can find a solution. In short, we need escalated attention from Engineering.

Kodak is presently trying to communicate with a Symbolics, which is a known and operational configuration elsewhere. Full details will be available from Paul when he returns. As far as I can tell there are two distinct problems:

1) Directory listings (e.g. LISTF and DIRED) from the Lambda to the Symbolics do not work at all. [Has this been tested??] Two symptoms alternate: a "soft" error, when a pathname is returned but no directory information is listed; and a "hard" error, when the function hangs in "Exos Reply" and eventually times out. There is no rhyme or reason: either error may occur any number of times, followed by the other.

2) TELNET and FTP functions work properly, but not consistently. Periodically the situation degenerates such that every attempted TCP transaction hangs in "Exos Reply" and times out. Re-booting does not always help. The only method we know for reinitializing the board from software does not work:

    (setq tcp:dma-initialized-p nil)
    (tcp:start)

To complicate the picture, sometimes this situation seems to cure itself. Re-booting *both* the Lambda and Symbolics has also helped on one occasion.
Worse, the system manager (Sarah Freedman) has twice rebuilt the Symbolics software, and this helped for a little while.

Paul has seen the same symptoms while running on two bands built by the customer, and on a band he built himself. He also tried swapping Excelan controllers, which did not help, and reseated the transceiver cables, without effect.

Paul described the problems to Peter DeWolf, and they agree that this is an unusually flaky situation. Peter said, in essence:

1) because Kodak and Customer Service don't have the "netstat" utility, we cannot debug the problem;

2) he hasn't been working on the 1.4 release in months, and they don't have it running in Engineering.

Apparently there are utilities of the kind we have requested in the past which have not been made available in the release and/or to us. This is very unfortunate, since we have had to send a person on-site without the tools he needs to debug and solve the problem. Paul feels that we cannot work on this class of problems in the field without the additional tools he has specified. [Reference: TCP/IP meeting minutes.]

The second concern Peter described has been heard before; apparently few or no resources are available in Cambridge to work on maintenance of the current releases. This is a "meta-problem" that we should discuss some time.

Regarding Kodak, it is critical that we get help in isolating this to software, hardware, or the network (e.g. a problem on the Symbolics). Please let me know what action plan Engineering proposes.

Since that memo, we have sent a field rep on-site to replace a suspect TCP/IP bulkhead assembly. The one on the customer's system was definitely built incorrectly. After it was replaced, the customer experienced the same flaky symptoms as described above. Then the system crashed doing a (TCP:START) and had to be reset hard (no response at the monitor).

We understand that George has seen this kind of flaky behavior even in Cambridge, and does not yet know the cause.

We have done everything we know how to do. Please advise as to how Engineering will escalate this problem.

KC