Wednesday, September 2, 2020

File/FTP/MFT Adapter in SOA IQA

File and FTP Adapter

The File and FTP adapters are used to exchange (read/write) files on local file systems or, using FTP (File Transfer Protocol), on remote file systems.

FTP : File Transfer Protocol

It is a standard protocol used to transfer files between computers and servers over a network.

Operation types:

Oracle File Adapter:

o   Read File (inbound operation) [Polls for incoming files in the local file system.]
o   Write File (outbound operation) [Creates outgoing files.]
o   Synchronous Read File (outbound operation) [Reads the current content of a file.]
o   List Files (outbound operation) [Lists file names in the specified locations.]

Chunked Read: synchronously reads file data in chunks and can be used only with BPEL.

Oracle FTP Adapter:

o   Get File (inbound operation) [Polls for incoming files on the FTP server (read).]
o   Put File (outbound operation) [Creates outgoing files (write).]
o   Synchronous Get File (outbound operation) [Reads the current content of a file.]
o   List Files (outbound operation) [Lists file names in the specified FTP locations.]

Chunked Get File: synchronously reads file data in chunks and can be used only with BPEL.

The Oracle File and FTP adapters can read and write the following file formats:

o   XML (both XSD- and DTD-based)
o   Delimited
o   Fixed positional
o   Binary data
o   COBOL Copybook data

Q) What are the differences between the File and FTP adapters?
The major differences between the File and FTP adapters are:

File Adapter
1. It works with the local file system only.
2. It does not require any server-side configuration to work.
FTP Adapter
1. It works with remote file systems.
2. An Outbound Connection Pool must be configured with the FTP server details before the adapter can work.
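
The configuration difference shows up in the connection-factory element of each adapter's .jca file. The following is a minimal sketch, assuming the default JNDI names that ship with SOA Suite (eis/FileAdapter and eis/Ftp/FtpAdapter); the adapter-config names are hypothetical and the operation definitions are left out.

```xml
<!-- File adapter: default connection factory, no extra server setup -->
<adapter-config name="ReadLocalFile"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/FileAdapter"/>
  <!-- endpoint-activation / endpoint-interaction omitted -->
</adapter-config>

<!-- FTP adapter: the JNDI name must match an Outbound Connection Pool
     entry that holds the FTP host, port, and credentials -->
<adapter-config name="GetRemoteFile"
                xmlns="http://platform.integration.oracle/blocks/adapter/fw/metadata">
  <connection-factory location="eis/Ftp/FtpAdapter"/>
  <!-- endpoint-activation / endpoint-interaction omitted -->
</adapter-config>
```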

Q) What is synchronous file read?

The Sync Read option of the File adapter lets a BPEL process read a file in the middle of the process.

When the File adapter is designed for Sync Read, the wizard asks for a file name, which is static.

This means that only the file with the given name is read.

Q) What is the Sync Read option in the File adapter? How is it different from Read?

A file can be read with either the "Synchronous Read" or the "Read" operation:

1. Sync Read lets a BPEL process read a file from the middle of the process. Read, by contrast, polls for new files and is the start of the BPEL process.

2. With Sync Read, the file details are provided inside the BPEL process and the file is read on demand.
With Read, the adapter polls for new files, and each file it picks up starts the BPEL process.

3. When a file has to be read in the middle of a BPEL process, use the SyncFileRead operation.
It is an outbound operation, so something else must start the process first; a process cannot begin with a Sync File Read.

Because the Sync Read wizard asks for a static file name, only the file with that name is read.

Now suppose we need to read files that have the same format but different names.

How are we going to do that?

Dynamic file name for the Sync Read operation in the FTP adapter
How to use a dynamically created file name for the Sync Read operation of the FTP adapter:

In your BPEL file:

1. Create a variable which will contain the file name

<variable name="FileName" type="xsd:string"/>


2. Use an assign activity and assign the file name you want to use to FileName variable

<assign name="Assign_FileName">
    <copy>
      <from>'TestFile'</from>
      <to>$FileName</to>
    </copy>
</assign>

3. Right-click the invoke activity that invokes the FTP adapter and go to Edit -> Properties -> To.
Create a new property with Name jca.ftp.FileName and Value $FileName.
In the source it appears as:

<bpelx:toProperties>
        <bpelx:toProperty name="jca.ftp.FileName" variable="FileName"/>
</bpelx:toProperties>

Now at run time TestFile is read, regardless of the file name specified for the FileName property in your {xyz}_ftp.jca file.

The file name can be supplied at run time as an input parameter, or it can be defined in the deployment descriptor.

Note: The file name can be passed only when reading a file in the middle of a running BPEL process (Sync Read). It cannot be passed when polling for files.

Q) Can a process begin with a SyncFileRead operation?

No. It is an outbound operation, so it has to be invoked from within a BPEL process.

The configuration wizard also differs between Read and Sync Read.

For Read, the wizard asks for the polling frequency and the minimum file age; for Sync Read there are no such options.

Q) Is it an inbound or an outbound operation?

SyncFileRead is an outbound operation.
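
The inbound/outbound distinction is visible in the adapter's .jca file: Read is an endpoint-activation with polling properties, while Sync Read is an endpoint-interaction with a fixed file name. The sketch below assumes hypothetical port types, directories, and file names; the property and class names follow the Oracle File Adapter conventions, but check them against your SOA Suite version.

```xml
<!-- Read (inbound): the adapter polls and starts the process -->
<endpoint-activation portType="Read_ptt" operation="Read">
  <activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
    <property name="PhysicalDirectory" value="/tmp/in"/>
    <property name="IncludeFiles" value=".*\.txt"/>
    <!-- Polling options exist only for the inbound Read operation -->
    <property name="PollingFrequency" value="60"/>
    <property name="MinimumAge" value="0"/>
    <property name="DeleteFile" value="true"/>
  </activation-spec>
</endpoint-activation>

<!-- Synchronous Read (outbound): invoked from inside the process -->
<endpoint-interaction portType="SynchRead_ptt" operation="SynchRead">
  <interaction-spec className="oracle.tip.adapter.file.outbound.FileReadInteractionSpec">
    <property name="PhysicalDirectory" value="/tmp/in"/>
    <!-- Static file name; can be overridden at run time via jca.file.FileName -->
    <property name="FileName" value="TestFile"/>
    <property name="DeleteFile" value="false"/>
  </interaction-spec>
</endpoint-interaction>
```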

Q) Can we use a File Adapter to get a file without reading its content?

Yes, by selecting the "Do not read file content" check box in the JDeveloper wizard while configuring the Read operation.

Use this when you do not need the file's payload but only need information about it, such as the file name, directory path, or file size.

Q) What do you mean by Native Format Builder?
Native format means the file's original, non-XML format (for example delimited or fixed-length text).
The Native Format Builder wizard generates an XSD for a native format, that is, non-XML, message.
"The generated grammar or XSD is used by the translator at runtime to translate a native
format message into an XML message and vice versa."
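
To make this concrete, here is a sketch of the kind of schema the Native Format Builder might generate for a comma-delimited file with two fields per line. The element names and the example.com namespace are hypothetical; the nxsd attributes follow the Oracle NXSD conventions for delimited data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
            xmlns:tns="http://example.com/orders"
            targetNamespace="http://example.com/orders"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified"
            nxsd:version="NXSD"
            nxsd:stream="chars"
            nxsd:encoding="US-ASCII">
  <xsd:element name="Orders">
    <xsd:complexType>
      <xsd:sequence>
        <!-- One Order element per line of the native file -->
        <xsd:element name="Order" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- First field ends at the comma -->
              <xsd:element name="OrderId" type="xsd:string"
                           nxsd:style="terminated" nxsd:terminatedBy=","/>
              <!-- Last field ends at the end of the line -->
              <xsd:element name="Amount" type="xsd:string"
                           nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With a schema like this, a line such as `1001,250.00` would be translated into an Order element with OrderId and Amount children.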

Q) What is the output of the List Files operation?
 The List Files operation shows which files are available in the specified folder, along with the file name, file size, and timestamp of each file.

Q) What is file debatching?

When a file contains multiple messages, you can choose to publish the messages in batches of a specific size. This is referred to as debatching.

During debatching, the file reader, on restart, proceeds from where it left off in the previous run, thereby avoiding duplicate messages.

File debatching is supported for files in XML and native formats.

For example, if a single file contains multiple records, it can be processed in batches instead of as one message.
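
In the .jca file, debatching is typically expressed with the PublishSize property on the inbound activation spec. A minimal sketch, assuming a hypothetical directory, file pattern, and batch size of 10:

```xml
<activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
  <property name="PhysicalDirectory" value="/tmp/in"/>
  <property name="IncludeFiles" value=".*\.csv"/>
  <property name="PollingFrequency" value="60"/>
  <!-- Publish the records of each file in batches of 10 messages -->
  <property name="PublishSize" value="10"/>
</activation-spec>
```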

How to poll a single file from an FTP/File location that has multiple files?

For example, suppose the include-files wildcard is *.txt and two files are available at the File/FTP location. The Read operation reads both files and creates two instances with the same creation time. To avoid this and read only one file at a time, add the following properties to the .jca file.

<property name="SingleThreadModel" value="true"/>
<property name="MaxRaiseSize" value="1"/>

Q) Can we read flat files using these adapters?
 Yes, we can read both flat and XML files using these adapters.

Q) Do we have support for multiple directories in the File and FTP adapters?
 Yes, we can specify more than one directory for these adapters. This applies to both physical and logical directories.

Q) What is the difference between a physical and a logical directory?
The major differences between physical and logical paths are:
Physical Path
1. As the name suggests, we specify the actual full (physical) path of the directory.
2. Not flexible.
3. The path must be changed manually when different environments have different paths.

Logical Path
1. We specify a logical name, and the actual value of that path is defined in the composite.xml file.
2. Flexible, as it can be changed from the EM console.
3. The path can easily be replaced with a config plan when different environments have different paths.
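
A logical directory can be sketched as follows. The logical name InputFileDir, the service name, and the path are hypothetical, and the exact placement of the property in composite.xml should be checked against your SOA Suite version.

```xml
<!-- In the adapter's .jca file: refer to the directory by a logical name -->
<activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
  <property name="LogicalDirectory" value="InputFileDir"/>
  <property name="IncludeFiles" value=".*\.txt"/>
  <property name="PollingFrequency" value="60"/>
</activation-spec>

<!-- In composite.xml: bind the logical name to a real path.
     This value is what a config plan or the EM console would override. -->
<service name="ReadOrders">
  <binding.jca config="ReadOrders_file.jca">
    <property name="InputFileDir" type="xs:string" many="false"
              override="may">/u01/data/in</property>
  </binding.jca>
</service>
```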

Q) How do we read large files using these adapters?
 We can use the streaming option to read large files using these adapters. We can also read files as attachments.

Q) Can we append to an existing file using these adapters?
 Yes, we can append to an existing file using these adapters.
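
For a Write operation, appending is usually switched on with the Append property in the outbound interaction spec. A minimal sketch, assuming a hypothetical output directory and a static file name:

```xml
<interaction-spec className="oracle.tip.adapter.file.outbound.FileInteractionSpec">
  <property name="PhysicalDirectory" value="/tmp/out"/>
  <!-- A fixed name, so every message goes to the same file -->
  <property name="FileNamingConvention" value="orders.txt"/>
  <!-- Append each outgoing message to the existing file -->
  <property name="Append" value="true"/>
</interaction-spec>
```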

Q) What is the minimum file age property?
This property ensures proper handshaking: the adapter reads only files that clients have finished writing
and skips incomplete files. For example, if the property is set to 2 minutes, the adapter reads only files in that folder that are at least 2 minutes old.
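
The 2-minute example can be sketched in the .jca file as below. MinimumAge is given in seconds; the directory, file pattern, and polling frequency are hypothetical.

```xml
<activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
  <property name="PhysicalDirectory" value="/tmp/in"/>
  <property name="IncludeFiles" value=".*\.txt"/>
  <property name="PollingFrequency" value="30"/>
  <!-- Skip files modified within the last 120 seconds (2 minutes) -->
  <property name="MinimumAge" value="120"/>
</activation-spec>
```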

Q) What is Trigger file option?
 By default, polling by inbound Oracle File and FTP Adapters starts as soon as the endpoint is activated. However, if you want more control over polling, you can use a file-based trigger. Once the Oracle File or FTP Adapter finds the specified trigger file in a local or remote directory, it starts polling for the files in the inbound directory.

For example, a BPEL process is writing files to a directory and a second BPEL process is polling the same directory for files. If you want the second process to start polling the directory only after the first process has written all the files, then you can use a trigger file. You can configure the first process to create a trigger file at the end. The second process starts polling the inbound directory once it finds the trigger file.
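
A trigger file setup along these lines could look as follows in the second process's .jca file. The directories and the trigger file name done.trg are hypothetical.

```xml
<activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
  <property name="PhysicalDirectory" value="/tmp/in"/>
  <property name="IncludeFiles" value=".*\.txt"/>
  <property name="PollingFrequency" value="60"/>
  <!-- Polling of /tmp/in starts only once done.trg appears in /tmp/trigger -->
  <property name="TriggerFilePhysicalDirectory" value="/tmp/trigger"/>
  <property name="TriggerFile" value="done.trg"/>
</activation-spec>
```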

Q) Can we change the file name and directory path at run time?
 Yes. Open the Invoke activity that calls the File/FTP adapter, go to the Properties tab, and set the following properties:

jca.file.FileName / jca.ftp.FileName
jca.file.Directory / jca.ftp.Directory
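
In the BPEL source, these properties end up inside the invoke activity. The sketch below mirrors the jca.ftp.FileName example earlier in this post; the invoke name, partner link, port type, and variable names are hypothetical.

```xml
<invoke name="Invoke_WriteFile" partnerLink="WriteFile"
        portType="ns1:Write_ptt" operation="Write"
        inputVariable="Invoke_WriteFile_InputVariable">
  <bpelx:toProperties>
    <!-- Override the file name and directory defined in the .jca file -->
    <bpelx:toProperty name="jca.file.FileName" variable="FileName"/>
    <bpelx:toProperty name="jca.file.Directory" variable="DirectoryPath"/>
  </bpelx:toProperties>
</invoke>
```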

What are the HA File and FTP Adapters?

In a clustered environment, the File and FTP adapters should be used in HA (High Availability) mode.

Inbound: Controlled by control files, which prevent the managed servers from racing each other to read the same files. References to the files already read by the managed servers are maintained in the control directory.

Outbound: Controlled by a DB mutex table in the SOA dehydration store, which prevents duplicates from being written to the same file when all the managed servers in the cluster process the same messages.
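
In the .jca file, HA mode is normally selected by pointing the connection factory at the HA JNDI entry instead of the default one. A sketch, assuming the standard HA JNDI names eis/HAFileAdapter and eis/Ftp/HAFtpAdapter are configured on the server:

```xml
<!-- File adapter in a cluster (the non-HA default is eis/FileAdapter) -->
<connection-factory location="eis/HAFileAdapter"/>

<!-- FTP adapter in a cluster (the non-HA default is eis/Ftp/FtpAdapter) -->
<connection-factory location="eis/Ftp/HAFtpAdapter"/>
```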

Oracle Managed File Transfer (MFT):

Oracle introduced the MFT (Managed File Transfer) tool in Oracle SOA 12c. MFT enables secure file exchange between two endpoints, which can be internal or external.

In this release of Oracle SOA 12c, Java DB cannot be used for MFT; an Oracle database is required.

The prerequisites for MFT installation are:

Oracle Database installed
Oracle SOA 12c installed
RCU (Repository Creation Utility)

Q) What is Managed File Transfer?

MFT is a simple and secure end-to-end managed file gateway.

At its base, MFT uses an embedded (S)FTP/SSH server that supports HA clustering.

MFT has a scalable architecture, which means it can easily be expanded by adding another WebLogic node to the cluster.

It also includes an extensible framework for pre- and post-processing of files.
MFT integrates with standards-based middleware such as (S)FTP, SOA, B2B, Service Bus, and Web Services.

It is a highly available, clusterable solution.

Managed File Transfer supports delivery of very large files (~500GB+), which can be ZIP compressed/decompressed and encrypted/decrypted using PGP.

One main feature is the ability to send files via Web Services using pass-by-reference (Claim Check pattern).

The reference can point to an FTP or file location, but inline (base64) payloads are also supported.

File transfers can be scheduled, and delivery to target endpoints can be paused, resumed, and resubmitted.

If delivery fails, the transfer can be retried automatically.

MFT can send notifications when files are delivered or when transfers fail.

Deliveries can be done through HTTP, JCA, FTP or in-memory.

Managed File Transfer supports (custom) callouts to archive, move, and delete files.

In-depth look into Managed File Transfer
There is a growing problem with FTP in the enterprise: a lack of control, visibility, security, and reliability.
The lack of control is due to the uncontrolled proliferation of FTP servers and clients.
Departments create stand-alone FTP servers and configure users where needed.

There is no central FTP server.
Because of this, there is no global visibility of the exchange of crucial data files, including customer data.

These FTP servers are often not aligned with enterprise security standards,
and they are rarely integrated with directories.

Because these FTP servers run stand-alone, they are a single point of failure and rarely offer HA capabilities, which affects reliability.

Disadvantages of FTP
It is not a secure protocol.
Data being transferred is not encrypted; it is sent in clear text.

These problems can be tackled using Managed File Transfer.

Set up the FTP Adapter configuration in the SOA WebLogic console

1. Open http://hostname:port/console
Click Deployments -> FtpAdapter

2. Go to Configuration -> Outbound Connection Pools
Click New to create a new FTP configuration

3. Select javax.resource.cci.ConnectionFactory
Click Next

4. Enter the JNDI name and click Finish

5. Click the new connection (eis/ftp/read in this example) to enter its configuration
Select either public key authentication or password authentication and
click Next

6. Enter the host name
Click Next

7. Enter the password and the port number (21 for FTP, 22 for SFTP)
Click Next

8. Review the remaining outbound connection properties
Click Next

9. Enter the server type: win for Windows, or unix
Click Next

10. Enter the user name; if you are using SFTP, set useSftp to true
Click Save

11. Once the configuration is saved, update the FTP Adapter
Select FtpAdapter and click Update

12. Select "Update this application in place with new deployment plan changes"
Click Next

13. Complete the Update Application Assistant

14. Once the update is done, click Activate Changes

FTP Adapter Configuration for SFTP
Two configuration steps are required:
A. Create a private and public key file
B. Add an FTP Adapter Outbound Configuration for SFTP

A. Create a private and public key file (Linux/Unix):
1. Log in at a command prompt as the oracle user (the user under which WebLogic runs) on the WebLogic server
2. Navigate to the .ssh directory under the user's home directory: cd ~/.ssh
3. Generate a public and private key with ssh-keygen, accepting the defaults: ssh-keygen (4 x Enter).
Two files are created: id_rsa and id_rsa.pub
4. Add the public key to the authorized keys: cat id_rsa.pub >> authorized_keys
In a production environment these files should be write-protected (even for the oracle account itself).

B. Add an FTP Adapter Outbound Configuration for SFTP:
1. Log in to the WebLogic console with admin privileges
2. Click “Deployments” (second item in the menu on the left side of the page)
3. Search the list of deployments for “FtpAdapter”
(you may have to navigate to the next page with “Next”) and click its name (it is a link)
(hint: Customize this table -> Number of rows 100 -> Apply)
4. Select the “Configuration” tab and its “Outbound Connection Pools” subtab
5. Expand “javax.resource.cci.ConnectionFactory” by clicking the + icon
6. We are going to make configuration changes, so create a session with the “Lock & Edit” button (top left of the screen)
7. The “New” button is now enabled. Click it.
8. Select the only option, “javax.resource.cci.ConnectionFactory”, and click “Next”
9. Enter a decent JNDI name, e.g. eis/Ftp/TimeCardsSftp (Be precise! This name is used in the software to connect)
10. Click “Finish”
11. Expand “javax.resource.cci.ConnectionFactory” again
12. Click the outbound connection you have just created, e.g. eis/Ftp/TimeCardsSftp.
We are going to change some properties.

Attention! The UI is a little awkward.
Be sure to press ENTER after changing a property, otherwise the change will not be saved!
Unfortunately this causes the UI to navigate back to the first property page.
Property adjustments (different from default values):

host: <host name or ip address of ftp server> 
password: <password of ftp account> 
privateKeyFile: <path to private key file> e.g. /home/oracle/.ssh/id_rsa 
username: <username of ftp account> 
useSftp: true 

13. Now press the “Save” button to store these settings in the deployment plan of the FtpAdapter.

(The first time, you are asked in which file these settings should be saved. My advice is to rename Plan.xml to a more descriptive name, e.g. FtpAdapterPlan.xml.)

14. The FtpAdapter has to be redeployed with these new settings. Go back to the list of “Deployments”
(second item in the menu on the left side, or use the breadcrumbs at the top of the page).
15. Do NOT click on the FtpAdapter, but select it (tick its check box)!
16. Click the “Update” button (at the top or bottom of the list)
17. Accept the already chosen option “Redeploy this application using the following deployment files”
 and click “Finish”
18. Finally, activate these changes by clicking the “Activate Changes” button (top left of the page)

Note: If no password is used, the FTP adapter authenticates with the public/private key. The relevant settings are:
1.    Authentication type
2.    Host
3.    Private key file
4.    User name
5.    useSftp

Example:
1.    User name = FileZilla user name (ex: file Zilla 243)
2.    Password = FileZilla password (ex: welcome1)
3.    Port = 21 (default)
4.    Host name = localhost / 192.***.***.3 (default)

File Adapter Wizard:

Steps
1. Service Type: File Adapter
    Service Name

2. Adapter Interface
Define from operation and schema (specified later)

Import an existing WSDL
WSDL URL
Port Type
Operation

3. Operation
Read File
Write File
Synchronous Read File
List Files
Chunked Read

Read File
Operation Name: Read
checkbox Do not read file content
checkbox Use file streaming [improves adapter performance when reading large documents from the file system]
checkbox Read file as attachment

Write File
Operation Name: Write
checkbox Add output header

Synchronous Read File
Operation Name: SynchRead
checkbox Read file as attachment

List Files
Operation Name: FileListing

Chunked Read
Operation Name: Chunk Read
Chunk Size: 1

4. File Configuration
Read
Directory for incoming files - physical path or logical name
checkbox Process files recursively
checkbox Archive processed files
Archive directory for processed files (physical path)
checkbox Delete files after successful retrieval

Write
Directory for outgoing files - physical path or logical name
File naming convention
checkbox Append to existing file
checkbox Number of messages equals
checkbox Elapsed time exceeds
checkbox File size exceeds

Use file streaming
Improves adapter performance when reading large documents from the file system.

Read file as attachment
Use this when the content of the file does not need to be examined and the file only needs to be passed through (file movement).

File Filtering
Files contain multiple messages:
  Publish messages in batches of: 1, 2, 3, ... n

-> If this is checked with a batch size of 1, the adapter takes the first record from the input file
and puts it into one output file, then creates another output file for the next record, and so on.
To transfer the whole data set or all records together, do not select this.

Add output headers
Returns the response in headers.

Polling frequency: seconds, minutes, hours, days, weeks.

Append to existing file
If, for example, there are two input files, their data is appended to give a single output file.

Define Schema from Native Format: generates an XSD for non-XML messages.

Publish messages in batches of 1:
 The adapter reads one row (one record) at a time from the input file, so a separate output file is created for each record.
If you want to transfer the whole data set into a single output file, leave this unchecked.

File polling
How often (in seconds or minutes, for example) SOA checks the input folder for new files.
[seconds, minutes, hours, days, weeks]

Write File
Creates outgoing files.

Synchronous Read File
Reads the current contents of a file.

List Files
Lists file names in the specified locations.
It retrieves a list of files from a target directory.
The list is returned as an XML document and contains information
such as file name, directory name, file size, and last modified time.

Chunked Read
Synchronously reads file data in chunks and can be used only with BPEL.

File ChunkedRead
This is a feature of the Oracle File and FTP Adapters that uses an invoke activity within a while loop to process the target file.
This feature enables you to process arbitrarily large files.
If an invalid payload is provided, ChunkedRead scenarios do not throw an exception.
When a translation exception (a bad record violating the NXSD specification) is encountered,
the return header is populated with the translation exception message, which includes details such as the line and column where the error occurred.

You must check the jca.file.IsMessageRejected and jca.file.RejectionReason header values to ascertain whether an exception has occurred.
Additionally, you can also check the jca.file.NoDataFound header value.
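
On the adapter side, a Chunked Read is an outbound interaction with a chunk size. The sketch below assumes a hypothetical port type, directory, file name, and chunk size; the while loop that calls it repeatedly and inspects the headers lives in the BPEL process and is not shown.

```xml
<endpoint-interaction portType="ChunkedRead_ptt" operation="ChunkedRead">
  <interaction-spec className="oracle.tip.adapter.file.outbound.ChunkedInteractionSpec">
    <property name="PhysicalDirectory" value="/tmp/chunked/in"/>
    <property name="FileName" value="large_input.txt"/>
    <!-- Number of records returned by each invoke in the loop -->
    <property name="ChunkSize" value="100"/>
  </interaction-spec>
</endpoint-interaction>
```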

Other R&D and examples

Processing large files in Oracle SOA

Explanation 1:
Processing large files through SOA Suite using Synchronous File Read
Reading files using SOA Suite is very easy, as the file adapter is a powerful adapter.
However, processing large files is less trivial.
You don’t want to read a huge file into memory and then process it.
Preferably, you process it in smaller chunks.
Chunking the file using the “Read File” option of the file adapter is pretty straightforward:
all you need to do is specify the publish size.
Working with chunks for the “Synchronous Read File” option used from BPEL is less easy.

Explanation 2:
Read large XML files in chunks
There are three approaches to processing them: through BPEL, OSB, and MFT.

For BPEL and OSB the concept is the same: use chunking.

Chunking is different from debatching.
When you debatch a file, you create multiple instances for the file.
When you chunk read, however, you read the whole file in chunks within a single instance.

Since I was chunk reading, the transformation also ran on smaller chunks of the file, and at the end I appended all the transformed output.

I used this final output to call a package in the DB.

With chunk reading this was very smooth and I didn't see any performance lag.

To give you a better idea:

I tried to process a 1MB file without chunking and it took around 8 minutes (you have to increase the timeout) just for the transformation of the file.
With chunk reading, however, my whole process completed in 30 seconds.

If you are planning to use chunk reading, there are a few things you should know beforehand:

1. Chunk reading does not allow re-reading the same file name, so if you want to reprocess the same file you need to rename it.
This happens because it creates a reference on the server.
WebLogic has its own cycle for deleting these references.

2. It does not move the file by itself; you have to move the file manually.

Q) I need to pick up files from two different locations with the same FTP adapter and
place them in a local location using the File adapter.

If you know when the files will arrive (that is, their frequency), you can use the Sync File Read option within your BPEL process, which can be scheduled with the Quartz scheduler. Otherwise you will have to use one FTP adapter per path, as the path cannot be changed for an inbound FTP adapter.

I came up with a different scenario: I have a 20MB file consisting of 3 lakh (300,000) records
that needs to be inserted into a database. The file does not satisfy the schema, but it still has to be inserted. I am handling this type of file through valves and pipelines (normalizing each row by adding the required fields) and am able to insert into the database, but while inserting the 20MB file the server goes down. Please suggest how to overcome this issue.

Consider using ODI in scenarios where large payloads need to be processed.

If there are many different FTP paths (more than 100), then the Oracle FTP adapter is NOT the right choice.

But if there are fewer paths, the Oracle FTP adapter is a good solution, as it can be orchestrated within Oracle SOA Suite with more powerful engines like BPEL and BPM.

To solve the problem, we tried varying the polling frequency from 1 minute to 5 minutes.

To improve the solution further, we started exploring the Oracle FTP adapter's JCA binding properties. The ones that stand out are MaxRaiseSize and the threading model, which play a very important role in processing the files.

We switched to the Partitioned Threading Model, but we still had polling issues because the number of files was high.

Therefore, the next property we explored was MaxRaiseSize, the most important property: it controls how many files are submitted in each processing cycle. We configured MaxRaiseSize to 40, so that during each polling cycle only 40 files are submitted for processing to 12 processor threads. This gave better performance and the files were processed more smoothly.
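
The tuning described here could be expressed in the inbound FTP .jca file roughly as follows. This is a sketch under assumptions: the directory, file pattern, and polling frequency are hypothetical, and ThreadCount is used as the property that enables the partitioned threading model with 12 threads.

```xml
<activation-spec className="oracle.tip.adapter.ftp.inbound.FTPActivationSpec">
  <property name="PhysicalDirectory" value="/inbound"/>
  <property name="IncludeFiles" value=".*\.txt"/>
  <property name="PollingFrequency" value="60"/>
  <!-- Partitioned threading model with 12 processor threads -->
  <property name="ThreadCount" value="12"/>
  <!-- Submit at most 40 files per polling cycle -->
  <property name="MaxRaiseSize" value="40"/>
</activation-spec>
```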

In the end, however, this was still not enough: we had to revisit the solution design and moved the whole solution to SQL Server Integration Services (SSIS), which was able to process the files better than the Oracle FTP adapter.

Large Payload Handling in SOA

Processing large XML (with repeating structures): the requirement was to concurrently process more than 10 large XML files (each > 1GB). Oracle recommends the following approaches for this use case:

De-batching XML
Chunked Read
Streaming XPath functions

Controlling the Order in which Files Are Processed
The File/FTP adapter lets you control the order in which files are processed through a sorter property (ListSorter) that you define in the JCA file of your inbound File/FTP adapter service.
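
As an illustration, the sketch below processes files oldest-first by modification time. It assumes the sorter property is named ListSorter and uses the built-in timestamp sorter class from Oracle's adapter documentation; verify both names against your SOA Suite version. The directory and file pattern are hypothetical.

```xml
<activation-spec className="oracle.tip.adapter.file.inbound.FileActivationSpec">
  <property name="PhysicalDirectory" value="/tmp/in"/>
  <property name="IncludeFiles" value=".*\.txt"/>
  <property name="PollingFrequency" value="60"/>
  <!-- Sort the file listing by last-modified time, oldest first -->
  <property name="ListSorter"
            value="oracle.tip.adapter.file.inbound.listing.TimestampSorterAscending"/>
  <!-- A single thread keeps the sorted order during processing -->
  <property name="SingleThreadModel" value="true"/>
</activation-spec>
```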



Minimum file age example: if you know that it takes three to four minutes for a file to be written, then set the minimum age to five minutes.

If a file is detected in the input directory and its modification time is less than five minutes before the current time, the file is not retrieved because it is potentially still being written.

To obtain more control over polling, you can use a file-based trigger.

For example, a BPEL process is writing files to a directory and a second BPEL process is polling the same directory for files. To have the second process start polling the directory only after the first process has written all the files, you can use a trigger file. Configure the first process to create a trigger file at the end; the second process starts polling the inbound directory after it finds the trigger file.



