Journal articles - International Journal of Information Technology and Computer Science

Total articles: 1165

A hybrid technique for cleaning missing and misspelling Arabic data in data warehouse

Mohammed Abdullah Al-Hagery, Latifah Abdullah Alreshoodi, Maram Abdullah Almutairi, Suha Ibrahim Alsharekh, Emtenan Saad Alkhowaiter

Research article

Real-world datasets accumulated over a number of years tend to be incomplete, inconsistent, and noisy, which in turn causes inconsistency in data warehouses. Data owners hold hundreds of millions to billions of records written in different languages, so the need for comprehensive, efficient techniques to maintain data consistency and increase data quality grows continuously. Data cleaning is known to be a very complex and difficult task, especially for data written in Arabic, a morphologically complex language, where various types of unclean data can occur in the contents: for example, missing values, dummy values, redundancy, inconsistent values, misspellings, and noise. The ultimate goal of this paper is to improve data quality by cleaning the contents of Arabic datasets from various types of errors, producing data for better analysis and highly accurate results; this, in turn, leads to discovering correct patterns of knowledge and accurate decision-making. The approach is established by merging different algorithms and ensures that reliable methods are used for data cleansing. It cleans Arabic datasets through multi-level cleaning using the Arabic Misspelling Detection and Correction Model (AMDCM) and Decision Tree Induction (DTI), and can solve the problems of Arabic misspellings, cryptic values, dummy values, and unification of naming styles. A sample of data before and after cleaning is presented.
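
The abstract outlines a multi-level cleaning pipeline (dummy and cryptic values, missing values, misspelling correction). The AMDCM and DTI components are not detailed in this listing, so the sketch below is a generic pandas illustration of such levels, with a hypothetical mini-dictionary standing in for the spelling model and made-up dummy markers.

```python
import pandas as pd

# Hypothetical stand-ins; the real AMDCM spelling model and DTI step are not shown here.
CORRECTIONS = {"جده": "جدة", "الرياص": "الرياض"}      # misspelling -> correct form
DUMMY_VALUES = {"", "N/A", "غير معروف", "?"}           # dummy / cryptic markers

def clean_arabic_column(series: pd.Series) -> pd.Series:
    s = series.astype("string").str.strip()
    s = s.mask(s.isin(DUMMY_VALUES))                   # level 1: dummy values become missing
    s = s.fillna(s.mode().iloc[0])                     # level 2: impute missing with the mode
    return s.map(lambda w: CORRECTIONS.get(w, w))      # level 3: dictionary-based correction

df = pd.DataFrame({"city": ["جده", "غير معروف", "الرياض", None]})
print(clean_arabic_column(df["city"]))
```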

Free

A new Measuring Method of Flux Linkage of SRM

Chuntang Zhang, Hui Kong

Research article

This paper presents an indirect method of measuring flux linkage characteristics based on a DSP. By measuring the current and voltage on a phase winding circuit, transferring them to a PC through a communication program, and applying Simpson’s rule, the magnetization characteristics are obtained. The instruments needed for this method are common and the test platform is easy to build, which lowers the expense of flux measurement. The test indicates that the measuring process is simple to implement and that the experimental results are accurate.
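
The computation described amounts to numerically integrating the induced EMF, ψ(T) = ∫₀ᵀ (v - R·i) dt, with Simpson’s rule. Below is a minimal Python sketch under that reading; the winding resistance R, the sampling period, and the toy waveforms are illustrative assumptions, not values from the paper.

```python
import numpy as np

def flux_linkage_simpson(v, i, R, dt):
    """Flux linkage psi(T) = integral of (v - R*i) dt over [0, T], composite Simpson's rule.
    v, i: sampled phase voltage/current (odd number of samples); dt: sample period."""
    e = np.asarray(v) - R * np.asarray(i)          # induced EMF on the phase winding
    n = len(e) - 1                                 # number of intervals, must be even
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    return dt / 3 * (e[0] + e[-1] + 4 * e[1:-1:2].sum() + 2 * e[2:-1:2].sum())

# Toy check: a constant 1 V EMF over 10 ms should give ~0.01 Wb.
t = np.linspace(0, 0.01, 101)
print(flux_linkage_simpson(np.ones_like(t), np.zeros_like(t), R=0.5, dt=t[1] - t[0]))
```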

Free

A new similarity measure based on simple matching coefficient for improving the accuracy of collaborative recommendations

Vijay Verma, Rajesh Kumar Aggarwal

Research article

Recommender Systems (RSs) are essential tools of an e-commerce portal for making intelligent decisions and providing an individual with product recommendations. Neighborhood-based approaches are traditional techniques for collaborative recommendations and are very popular due to their simplicity and efficiency. Neighborhood-based recommender systems use numerous kinds of similarity measures for finding similar users or items. However, the existing similarity measures operate only on the common ratings between a pair of users (i.e., they ignore the uncommon ratings) and thus do not utilize all ratings made by the pair. Furthermore, the existing similarity measures may either provide inadequate results in many situations that frequently occur in sparse data or involve very complex calculations. Therefore, there is a compelling need for a similarity measure that can deal with such issues. This research proposes a new similarity measure for defining the similarities between users or items by using the rating data available in the user-item matrix. Firstly, we describe a way of applying the simple matching coefficient (SMC) to the common ratings between users or items. Secondly, the structural information between the rating vectors is exploited using the Jaccard index. Finally, these two factors are combined to define the proposed similarity measure for better recommendation accuracy. To evaluate the effectiveness of the proposed method, several experiments were performed using standardized benchmark datasets (MovieLens-1M, 10M, and 20M). The results obtained demonstrate that the proposed method provides better predictive accuracy (in terms of MAE and RMSE) along with improved classification accuracy (in terms of precision-recall).
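
A minimal sketch of the general idea rather than the paper's exact formula: apply a simple-matching-style agreement score to the co-rated items (here, agreement means both ratings fall on the same side of the scale midpoint, which is an assumption) and weight it by the Jaccard index of the two users' rated-item sets.

```python
import numpy as np

def smc_jaccard_similarity(ru, rv, midpoint=3.0):
    """Sketch of an SMC x Jaccard user similarity for one pair of ratings-matrix rows.
    ru, rv: 1-D arrays with np.nan for unrated items."""
    rated_u, rated_v = ~np.isnan(ru), ~np.isnan(rv)
    common = rated_u & rated_v
    union = (rated_u | rated_v).sum()
    if common.sum() == 0 or union == 0:
        return 0.0
    # SMC-style agreement on common ratings: both ratings on the same side of the midpoint
    agree = ((ru[common] >= midpoint) == (rv[common] >= midpoint)).mean()
    jaccard = common.sum() / union               # structural overlap of rated-item sets
    return agree * jaccard

u = np.array([5, np.nan, 4, 1, np.nan])
v = np.array([4, 2, np.nan, 2, np.nan])
print(smc_jaccard_similarity(u, v))
```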

Free

A novel dynamic KCi - slice publishing prototype for retaining privacy and utility of multiple sensitive attributes

N.V.S. Lakshmipathi Raju, M.N. Seetaramanath, P. Srinivasa Rao

Research article

Data publishing plays a major role in establishing a path between current-world scenarios and next-generation requirements, and it is desirable to preserve individual privacy in the released content without reducing its utility. The existing KC and KCi models concentrate on multiple categorical sensitive attributes, and both have their own merits and demerits. This paper proposes a new method, a novel KCi-slice model, to enhance the existing KCi approach with better utility and the required privacy levels. The proposed model publishes the data in two rounds. Anatomization is used to separate the sensitive attributes from the quasi attributes. The first round uses a novel approach, an enhanced semantic l-diversity technique, to bucketize the tuples and determine the correlations among the sensitive attributes in order to build different sensitive tables. The second round generates multiple quasi tables by slicing the concatenated correlated quasi attributes. It concatenates the attributes of the quasi tables with the IDs of the buckets from the different sensitive tables and performs random permutations on the buckets of the quasi tables. The proposed model publishes the data with more privacy and higher utility than the existing models.
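
As a rough illustration of the anatomization step only (the enhanced semantic l-diversity bucketization, correlation analysis, and slicing are not reproduced here), the sketch below splits a toy table into a quasi-identifier table and a sensitive table linked by bucket IDs and shuffles the quasi rows; all column names and data are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":     [29, 31, 45, 47, 52, 55],
    "zip":     ["100", "100", "200", "200", "300", "300"],
    "disease": ["flu", "hiv", "flu", "cancer", "hiv", "cancer"],   # sensitive attribute
})

k = 2                                              # toy bucket size, not the paper's grouping
df["bucket"] = np.arange(len(df)) // k

sensitive_table = df[["bucket", "disease"]]        # published sensitive table (ST)
quasi_table = (df[["age", "zip", "bucket"]]
               .sample(frac=1, random_state=7)     # shuffle rows so the exact
               .reset_index(drop=True))            # quasi-to-sensitive linkage is hidden

print(quasi_table, sensitive_table, sep="\n\n")
```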

Free

A novel interactive communication system realization through smart low noise block downconverter

Krishn Kumar Gupt

Research article

Interactive communication is the basic motivation behind a smart communication system, which requires simultaneous downlink and uplink capability. The Smart LNB is a popular topic of discussion leading towards Know Your DTH (KY-DTH). A low noise block-downconverter (LNB) is the signal receiving device, mounted on satellite dishes, used for satellite TV reception. For broadcasters, the smart LNB opens the door to operating their own linear TV ecosystem and other services connected directly by satellite. Unlike a conventional LNB, this new-generation Smart LNB comprises both a transmitter and a receiver to provide interactive TV experiences and M2M services. Having uplink and downlink capability, it enables full-duplex communication, leading to various additional applications such as live interaction; live viewing; 24-hour TV servicing; remote monitoring solutions; control in mission-critical applications in the energy and utility sectors; natural gas monitoring; smart grids; etc. A DVB-S2 source and sink are analyzed using the Agilent SystemVue platform. This paper describes the study and design of a smart low noise block downconverter (LNB) used for satellite communication, with transmission in the Ka band (29.5 to 30 GHz) and reception in the Ku band (10.7 to 12.75 GHz). The LNB design considers important characteristics such as spectrum comparison. The proposed design results in an enhanced working lifetime of the Smart LNB system, with the capability to receive all signals within the range. The design and simulation were carried out using Agilent SystemVue. A summary of the simulation work and results for the Smart LNB in the Ka and Ku bands is presented.

Free

A novel musculoskeletal imbalance identification mechanism for lower body analyzing gait cycle by motion tracking

Hiranthi Tennakoon, Charitha Paranamana, Maheshya Weerasinghe, Damitha Sandaruwan, Kalpani Mahindaratne

Research article

Muscles in the human body come in pairs. Musculoskeletal imbalances, caused by repetitive use of one muscle of such a pair and by incorrect posture that a person adopts on a regular basis, lead to severe injuries such as neuro-musculoskeletal problems, hamstring strains, lower back tightness, repetitive stress injuries, altered movement patterns, postural dysfunctions, and trapped nerves, and both neurological and physical performance are severely affected as time progresses. In the clinical domain, muscle imbalances are determined by gait and posture analysis, movement analysis, joint range-of-motion analysis, and muscle length analysis, which require expert knowledge and experience. X-rays and CT scans likewise require domain experts to interpret the results of a checkup. Kinect is a motion capturing device able to track the human skeleton, its joints, and body movements within its sensory range. The purpose of this research is to provide a mechanism for identifying muscle imbalances based on gait analysis tracked via the Kinect motion capture device, by measuring deviation from a healthy person’s gait patterns. Primarily, the outcome of this study will be a self-identification method for human skeletal imbalance.

Free

A parallel evolutionary search for shortest vector problem

Gholam Reza Moghissi, Ali Payandeh

Research article

The security of many lattice-based cryptographic primitives reduces to the hardness of approximating the shortest vector problem (SVP) to within a polynomial factor in polynomial time, so solving this problem breaks these primitives. In this paper, we investigate the suitability of combining the best techniques in general search/optimization, lattice theory, and parallelization technologies into a single algorithm for solving the SVP. Our proposed algorithm repeats three steps in a loop: an evolutionary search (a parallelized Genetic Algorithm), brute-force tiny full enumerations (playing the role of many local searches with random starting points over the lattice vectors), and a single main enumeration. The test results showed that our proposed algorithm is better than LLL reduction and may be worse than the BKZ variants (except for some very small block sizes). The main drawback of these test results is the insufficient tuning of the various parameters, which prevents them from showing the potential strength of our contribution. We therefore enumerate the main problems and weaknesses of our work so that further studies can obtain clearer and better results. We also propose a pure Genetic Algorithm model with a more solid and stable design for the SVP, which future work can build on.
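
The paper's full pipeline (parallel GA plus the enumeration stages) is not reproduced in this listing; the sketch below illustrates only the evolutionary component, a toy Genetic Algorithm searching for integer coefficient vectors x that make the lattice vector xB short for a random example basis B. The basis and all GA parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.integers(-20, 21, size=(8, 8))            # toy lattice basis (rows are basis vectors)

def fitness(x):
    n = np.linalg.norm(x @ B)
    return n if n > 0 else np.inf                 # exclude the zero vector

def ga_short_vector(pop_size=60, gens=200, span=3, mut_rate=0.2):
    pop = rng.integers(-span, span + 1, size=(pop_size, B.shape[0]))
    for _ in range(gens):
        scores = np.array([fitness(x) for x in pop])
        elite = pop[np.argsort(scores)[: pop_size // 2]]          # truncation selection
        parents = elite[rng.integers(0, len(elite), size=(pop_size, 2))]
        cut = rng.integers(1, B.shape[0], size=pop_size)          # one-point crossover
        children = np.where(np.arange(B.shape[0]) < cut[:, None], parents[:, 0], parents[:, 1])
        mutate = rng.random(children.shape) < mut_rate            # small random perturbations
        children[mutate] += rng.integers(-1, 2, size=mutate.sum())
        pop = children
    best = min(pop, key=fitness)
    return best, fitness(best)

x, norm = ga_short_vector()
print("coefficients:", x, "norm:", norm)
```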

Free

A reliable solution to load balancing with trust based authentication enhanced by virtual machines

Rakhi, G.L.Pahuja

Research article

Vehicular ad hoc networks are among the fastest growing networks and open up fresh engineering opportunities such as smart traffic control, optimal resource maintenance, and improved service for customers. The Vehicular Ad hoc Network (VANET) is one of the most popular ad hoc networks. A vehicular ad hoc network generally faces problems such as trust modeling, congestion, and battery optimization. When there are comparatively few nodes, the network handles the traffic well and transfers data at a rapid rate; but under high-density traffic, a vehicular network always faces congestion. This paper tries to find a reliable solution to traffic management by adding virtual gears into the network, and it mitigates the congestion problem by using a trust queue that is updated through broadcast hello packets in order to remove unwanted nodes from the list. The network performance has been measured with QoS parameters such as delay and throughput to demonstrate the validity of the trust-based authentication.

Free

A robust functional minimization technique to protect image details from disturbances

Robiul Islam, Chen Xu, Yu Han, Sanjida Sultana Putul, Rana Aamir Raza

Research article

Image capture with faulty systems or in vulnerable environments degrades image quality and distorts the true details of the original imaging signal. Thus a robust approach to image enhancement and edge preservation is an essential requirement for smooth imaging operations. Although many techniques have been deployed in this area over the decades, the key challenge remains a better trade-off between image enhancement and detail protection. Therefore, this study inspects the existing limitations and proposes a robust technique based on a functional minimization scheme in a variational framework, ensuring better performance in image enhancement and detail preservation simultaneously. A rigorous way to solve the minimization problem is also developed to ensure that the proposed technique is more efficient than some other traditional techniques.
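
The abstract does not give the paper's exact functional, so as an illustration of functional minimization in a variational framework the sketch below runs gradient descent on the classic ROF-type energy E(u) = 0.5*||u - f||^2 + lam*TV(u), which balances fidelity to the observed image against smoothing while preserving edges; the parameters and the toy image are assumptions.

```python
import numpy as np

def tv_denoise(f, lam=0.12, step=0.2, iters=300, eps=1e-6):
    """Gradient descent on E(u) = 0.5*||u - f||^2 + lam * TV(u) (smoothed TV)."""
    u = f.astype(float).copy()
    for _ in range(iters):
        ux = np.roll(u, -1, axis=1) - u                  # forward differences
        uy = np.roll(u, -1, axis=0) - u
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * ((u - f) - lam * div)                # descend the energy gradient
    return u

# Toy noisy block image; the denoised output should have lower variance while keeping edges.
noisy = np.clip(np.kron(np.eye(2), np.ones((16, 16))) + 0.3 * np.random.randn(32, 32), 0, 1)
print(noisy.std(), tv_denoise(noisy).std())
```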

Free

A stochastic model for simple document processing

Pierre Moukeli Mbindzoukou, Arsène Roland Moukoukou, David Naccache, Nino Tskhovrebashvili

Research article

This work focuses on the stationary behavior of a simple document processing system. By a simple document we mean any document whose processing, at each stage of its progression through its processing graph, is handled by a single person. Our simple document processing system derives from the general model described by MOUKELI and NEMBE; it adapts that general model to determine, in terms of metrics and performance, its behavior in the particular case of simple document processing. By way of illustration, data relating to a station of a central administration of a ministry, observed over six (6) years, are presented. The need to study this specific case comes from the fact that the processing of simple documents is based on a hierarchical organization and the use of priority queues. As in the general model proposed by MOUKELI and NEMBE, our model has a static component and a dynamic component. The static component is a tree that represents the hierarchical organization of the processing stations. The dynamic component consists of a Markov process and a network of priority queues which model the waiting lines at each processing unit. Key performance indicators were defined and studied point by point and on average, and issues specific to the hierarchical model associated with priority queues, mainly infinite loops, were analyzed and solutions proposed.
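
To make the queueing component concrete, here is a minimal discrete-event sketch of a single processing station with two priority classes and exponential service, reporting the mean waiting time per class; the arrival and service rates are illustrative assumptions, and the tree of stations and the Markov analysis from the paper are not modeled.

```python
import heapq
import random

def simulate_priority_station(rates=(0.3, 0.5), mu=1.2, n_jobs=20000, seed=1):
    """Single station, two priority classes (0 = urgent), non-preemptive M/M/1-style service.
    Returns the mean waiting time per class (time in queue before service starts)."""
    random.seed(seed)
    arrivals = []                                    # (arrival_time, class)
    for cls, rate in enumerate(rates):
        t = 0.0
        for _ in range(n_jobs):
            t += random.expovariate(rate)
            arrivals.append((t, cls))
    arrivals.sort()
    free_at, i = 0.0, 0
    waiting, waits = [], {0: [], 1: []}              # waiting: heap of (class, arrival_time)
    while i < len(arrivals) or waiting:
        # admit every job that has arrived by the time the server frees up
        while i < len(arrivals) and (arrivals[i][0] <= free_at or not waiting):
            heapq.heappush(waiting, (arrivals[i][1], arrivals[i][0]))
            i += 1
        cls, arr = heapq.heappop(waiting)            # highest-priority job present
        start = max(free_at, arr)
        waits[cls].append(start - arr)
        free_at = start + random.expovariate(mu)
    return {c: sum(w) / len(w) for c, w in waits.items()}

print(simulate_priority_station())                  # urgent class should wait less on average
```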

Free

A study and performance comparison of MapReduce and apache spark on twitter data on Hadoop cluster

Nowraj Farhan, Ahsan Habib, Arshad Ali

Research article

We explore Apache Spark, the newest tool for analyzing big data, which lets programmers perform in-memory computation on large data sets in a fault-tolerant manner. MapReduce is a high-performance distributed big data programming framework which is preferred by many big data analysts, has been available for a long time, and is very well documented. The purpose of this project was to evaluate the scalability of open-source distributed data management systems like Apache Hadoop on small and medium data sets and to compare their performance against Apache Spark, a scalable distributed in-memory data processing engine. To make this comparison, experiments were executed on data sets ranging from 5GB to 43GB, both on a single machine and on a Hadoop cluster. The results show that the cluster outperforms a single machine by a wide margin. Apache Spark outperforms MapReduce by a dramatic margin, and as the data grow, Spark becomes more reliable and fault tolerant. We also observed the interesting result that increasing the number of blocks on the Hadoop Distributed File System increases the run-time of both the MapReduce and Spark programs, and even in this case Spark performs far better than MapReduce. This suggests Spark as a possible replacement for MapReduce in the near future.
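
For flavor, a minimal PySpark sketch of the kind of Twitter-data job such a comparison typically runs (a hashtag count); the input path and the "text" field are hypothetical assumptions, and the equivalent MapReduce job would require separate mapper and reducer classes plus HDFS I/O between stages.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, lower, split

spark = SparkSession.builder.appName("tweet-hashtags").getOrCreate()

# Hypothetical input: one JSON tweet object per line with a "text" field.
tweets = spark.read.json("hdfs:///data/tweets/*.json")

hashtags = (tweets
            .select(explode(split(lower(col("text")), r"\s+")).alias("token"))
            .filter(col("token").startswith("#"))
            .groupBy("token")
            .count()
            .orderBy(col("count").desc()))

hashtags.show(20)      # top hashtags, computed in memory across the cluster
spark.stop()
```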

Free

A study on diagnosis of Parkinson’s disease from voice dysphonias

Kemal Akyol

Research article

Parkinson’s disease, which occurs at older ages, is a neurological disorder and one of the most painful, dangerous, and incurable diseases. One symptom suggesting that a person may have Parkinson’s disease is a voice disorder known as dysphonia. In this study, an application based on assessing the importance of features was carried out using a dataset of multiple types of sound recordings for the diagnosis of Parkinson’s disease from voice disorders. The sub-datasets obtained from these recordings, which were divided into 70% training and 30% testing data, include the important features. According to the experimental results, the Random Forest and Logistic Regression algorithms were successful in general. Besides, one or two of these algorithms were found to be more successful for each sound; for example, the Logistic Regression algorithm is more successful for the ‘a’ voice, and the Artificial Neural Network algorithm is more successful for the ‘o’ voice.
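
A minimal scikit-learn sketch of the described protocol, a 70/30 split with Random Forest and Logistic Regression plus a feature-importance check; the synthetic data stand in for the voice-recording features, which are not part of this listing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dysphonia feature sub-datasets.
X, y = make_classification(n_samples=300, n_features=26, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, stratify=y, random_state=0)

for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))

# Feature importance, in the spirit of the paper's importance-based feature assessment.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(np.argsort(rf.feature_importances_)[::-1][:5])     # indices of the top-5 features
```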

Free

A study on the diagnosis of parkinson’s disease using digitized wacom graphics tablet dataset

Kemal Akyol

Research article

Parkinson’s disease is a neurological disorder, one of the most painful, dangerous, and incurable diseases, which occurs at older ages. The Static Spiral Test, Dynamic Spiral Test, and Stability Test on Certain Point records were used in the application developed for the diagnosis of this disease. These datasets were divided into 80% training and 20% testing data within the framework of the 10-fold cross-validation technique. The training data were fed as input to the Random Forest, Logistic Regression, and Artificial Neural Network classifier algorithms. After this step, the performance of these classifier algorithms was evaluated on the testing data, and analysis of new data was carried out. According to the results obtained, Artificial Neural Networks are more successful than the Random Forest and Logistic Regression algorithms in the analysis of new data.

Free

A systematic study of data wrangling

Malini M. Patil, Basavaraj N. Hiremath

Research article

The paper presents the theory, design, and usage aspects of the data wrangling process used in data warehousing and business intelligence. Data wrangling is defined as the art of data transformation or data preparation. It is a method adopted for basic data management in which data are properly processed, shaped, and made available for the most convenient consumption by potential future users. Large volumes of historical data are either aggregated or stored as facts or dimensions in data warehouses to accommodate large ad hoc queries. Data wrangling enables fast processing of business queries with the right solutions for both analysts and end users. A wrangler provides an interactive language and recommends predictive transformation scripts, which gives the user insight into how manual iterative processes can be reduced; decision support systems are the best examples here. The methodologies for preparing data for mining insights are highly influenced by the impact of big data concepts, from the data source layer to self-service analytics and visualization tools.

Free

A task scheduling model for multi-CPU and multi-hard disk drive in soft real-time systems

Zeynab Mohseni, Vahdaneh Kiani, Amir Masoud Rahmani

Research article

In recent years, with increasing demands on CPUs and I/O devices, running multiple tasks simultaneously has become a crucial issue. This paper presents a new task scheduling algorithm for multi-CPU and multi-Hard Disk Drive (HDD) soft Real-Time (RT) systems, which reduces the number of missed tasks. The aim of this paper is to execute more tasks in parallel by considering an efficient trade-off between energy consumption and total execution time. For study purposes, we analyzed the proposed scheduling algorithm, named HCS (Hard disk drive and CPU Scheduling), in terms of task set utilization, total execution time, average waiting time, and the number of tasks that miss their deadlines. The results show that the HCS algorithm improves the above-mentioned criteria compared to the HCS_UE (Hard disk drive and CPU Scheduling_Unchanged Execution time) algorithm.

Free

ADPBC: Arabic Dependency Parsing Based Corpora for Information Extraction

Sally Mohamed, Mahmoud Hussien, Hamdy M. Mousa

Research article

There is a massive amount of different information and data on the World Wide Web, and the number of Arabic users and the amount of Arabic content are increasing widely. Information extraction is essential for accessing and sorting the data on the web. In this regard, information extraction becomes a challenge, especially for languages with a complex morphology such as Arabic. Consequently, the trend today is to build new corpora that make information extraction easier and more precise. This paper presents a linguistically analyzed Arabic corpus, including dependency relations. The collected data cover five domains: sport, religion, weather, news, and biomedicine. The output is in the CoNLL universal lattice file format (CoNLL-UL). The corpus contains an index of the sentences and their linguistic metadata to enable quick mining and search across the corpus. The corpus has seventeen morphological annotations and eight features based on the identification of textual structures, which help to recognize and understand the grammatical characteristics of the text and to derive the dependency relations. The parsing and dependency annotation were conducted with the universal dependency model and corrected manually. The designed Arabic corpus makes it possible to obtain linguistic annotations for a text quickly and makes information extraction techniques easy and clear to learn. The results illustrate the average enhancement in the dependency relation corpus.

Free

APESS - A Service-Oriented Data Mining Platform: Application for Medical Sciences

Mohammed Sabri, Sidi Ahmed Rahal

Research article

The medical and public health domain remains a principal preoccupation of the world’s population. It draws on several means from various disciplines, including, for instance, epidemiology, to help clinicians in decision processes. This paper proposes an Assistance Platform for Epidemiological Searches and Surveillance (APESS) for service-oriented data mining in the field of epidemiology. The main aim of the platform is to build a flexible and scalable system for extracting predictive rules to aid decision-making by domain experts. The results show that the current system provides prediction models of chronic diseases (epidemiological prediction rules) using classification algorithms.

Free

AQUAZONE: A Spatial Decision Support System for Aquatic Zone Management

Sekhri A. Arezki, Hamdadou B. Djamila, Beldjilali C. Bouziane

Research article

During the last years, the Sebkha lake of Oran (Algeria) has been the subject of many studies aimed at its protection and recovery. Many environmental and wetland experts pin their hopes on the integration of this ecologically rich and fragile space as a pilot project in the "management of water tides". Supporting the large Sebkha (lake) of Oran is a major concern for governments looking to make it a protected natural area and a viable place. The question is one of putting in place a management policy that responds to the requirements of economic, agricultural, and urban development and preserves this natural site through the management of its water and the preservation of its quality. The objective of this study is to design and develop a Spatial Decision Support System, namely AQUAZONE, able to assist decision makers in various natural resource management projects. The proposed system integrates remote sensing image processing methods, from display operations to analysis of results, in order to extract knowledge useful for better natural resource management, and in particular to define the extension of the Sebkha lake of Oran (Algeria). Two methods were applied to classify LANDSAT 5 TM images of Oran (Algeria): Fuzzy C-Means (FCM), applied to multispectral images, and a second, manually assisted one, the Ordered Queue-based Watershed (OQW). FCM serves as an initialization phase to automatically discover the different classes (urban, forest, water, etc.) from the LANDSAT 5 TM images, minimize ambiguity in grayscale, and establish a land cover map of this region.
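
As a small illustration of the FCM initialization phase described above, the sketch below implements plain Fuzzy C-Means on toy "multispectral" pixel vectors and derives a hard land-cover labeling from the fuzzy memberships; the band values, class count, and parameters are assumptions, and the OQW step is not shown.

```python
import numpy as np

def fuzzy_c_means(X, c=4, m=2.0, iters=100, seed=0, eps=1e-9):
    """Plain Fuzzy C-Means on pixel feature vectors X of shape (n_pixels, n_bands)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                        # fuzzy memberships per pixel
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]         # weighted class centers
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + eps
        # standard FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U = 1.0 / (d ** (2 / (m - 1)) * (1.0 / d ** (2 / (m - 1))).sum(axis=1, keepdims=True))
    return U, centers

# Toy stand-in for multispectral LANDSAT pixels: 3 "bands", 4 latent classes.
X = np.vstack([np.random.default_rng(i).normal(loc=i * 3, scale=0.5, size=(200, 3))
               for i in range(4)])
U, centers = fuzzy_c_means(X)
labels = U.argmax(axis=1)                                    # hard land-cover labeling
print(centers.round(2), np.bincount(labels))
```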

Free

ATAM-based Architecture Evaluation Using LOTOS Formal Method

Muhammad Usman Ashraf, Wajdi Aljedaibi

Research article

System architecture evaluation and formal specification are significant processes and practical endeavors in all domains. Many methods and formal description techniques have been proposed for comprehensive analysis and formal representation of a system architecture. This paper consists of two main parts. In the first, we evaluate system performance and quality attributes in a Remote Temperature Sensor client-server architecture by applying the ATAM model, which provides comprehensive support for the evaluation of architecture designs by considering design quality attributes and how they can be represented in the architecture. In the second part, we specify the selected system architecture in the ISO-standard formal description technique LOTOS, in which a system is specified by the temporal relations between its interactions and its behavior. Our proposed approach reduces ambiguity, inconsistency, and incompleteness in the current system architecture.

Free

AWG Based Optical Packet Switch Architecture

Pallavi S, M. Lakshmi

Research article

This paper discusses an optical packet switch (OPS) architecture which utilizes components such as optical reflectors, tunable wavelength converters (TWCs), an arrayed waveguide grating (AWG), and pieces of fiber to realize the switching action. The architecture uses the routing pattern of the AWG, and its symmetric nature, to simplify switch operation significantly. It is also shown that, using a multi-wavelength optical reflector, the length of the delay lines can be reduced to half of its original value. This reduction in length is useful for comparatively large packets, for which the delay-line length can grow to some kilometers. The considered architecture is compared with an already published architecture. Finally, modifications to the architecture are suggested so that the switch can be placed efficiently in the backbone network.

Free
